MATLAB Decision Tree Regression Analysis: The Decision Tree Regression Model (Decision Tree - Regression)


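This article introduces decision tree regression analysis with MATLAB, including the ID3 algorithm and the concept of standard deviation reduction. A decision tree builds regression or classification models by partitioning a dataset; decision nodes represent tests on attributes and leaf nodes represent numerical targets. Standard deviation reduction is used to evaluate the purity of the subsets after a split, and the tree is built recursively until a termination condition is met, such as the standard deviation falling below a given fraction or too few instances remaining.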

Decision Tree - Regression

A decision tree builds regression or classification models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.

A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing a value of the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
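The structure described above can be sketched as two small node types plus a lookup routine. This is only an illustration: the class names `DecisionNode` and `Leaf`, the `Windy` attribute, and all leaf values are hypothetical and do not come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds the predicted numerical target."""
    value: float


@dataclass
class DecisionNode:
    """Decision node: tests one attribute, with one branch per attribute value."""
    attribute: str
    branches: Dict[str, Union["DecisionNode", Leaf]] = field(default_factory=dict)


def predict(node, instance):
    """Follow the branch matching the instance at each decision node until a leaf."""
    while isinstance(node, DecisionNode):
        node = node.branches[instance[node.attribute]]
    return node.value


# Hypothetical tree: the root node tests Outlook; the leaf values are made up.
tree = DecisionNode("Outlook", {
    "Sunny": Leaf(30.0),
    "Overcast": Leaf(45.0),
    "Rainy": DecisionNode("Windy", {"True": Leaf(25.0), "False": Leaf(40.0)}),
})

print(predict(tree, {"Outlook": "Rainy", "Windy": "False"}))  # 40.0
```

Prediction walks from the root node down a single path, so its cost depends on the depth of the tree rather than on the size of the training data.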

Decision Tree Algorithm

The core algorithm for building decision trees is ID3, by J. R. Quinlan, which employs a top-down, greedy search through the space of possible branches with no backtracking. The ID3 algorithm can be used to construct a decision tree for regression by replacing Information Gain with Standard Deviation Reduction.

Standard Deviation

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). We use standard deviation to calculate the homogeneity of a numerical sample. If the numerical sample is completely homogeneous, its standard deviation is zero.

a) Standard deviation for one attribute:

b) Standard deviation for two attributes:
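The formulas themselves are not reproduced in this text. The block below is a reconstruction under two assumptions: the one-attribute case uses the population form of the standard deviation (dividing by n), and the two-attribute case is the sum of the per-value standard deviations of the target, each weighted by the share of instances taking that value. The symbols T (target), X (predictor), c (one value of X), P(c) (fraction of instances with X = c) and S(c) (standard deviation of T among those instances) are notation chosen here.

```latex
% a) one attribute: standard deviation of the target values x_1, ..., x_n
S = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

% b) two attributes: target T split by the values c of predictor X
S(T, X) = \sum_{c \in X} P(c)\, S(c)
```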

Standard Deviation Reduction

The standard deviation reduction is based on the decrease in standard deviation after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest standard deviation reduction (i.e., the most homogeneous branches).

Step 1: The standard deviation of the target is calculated.

Standard deviation (Hours Played) = 9.32
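Step 1 can be checked with a few lines of Python. The 14 values below are an assumption: they are the Hours Played column of the commonly circulated golf example and are not listed in this text. The helper name `std_dev` is also chosen here, and it uses the population form (dividing by n).

```python
import math


def std_dev(values):
    """Population standard deviation (divides by n, not n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Assumed data: Hours Played values from the commonly circulated
# 14-row golf example; they are not given in the text itself.
hours_played = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]

print(round(std_dev(hours_played), 2))  # 9.32
print(std_dev([40, 40, 40]))            # 0.0 for a completely homogeneous sample
```

With these values the population form reproduces the quoted 9.32; the sample form (dividing by n - 1) would give a larger number.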

Step 2: The dataset is then split on each of the different attributes. The standard deviation for each branch is calculated. The resulting standard deviation for the split (the branch standard deviations weighted by the share of instances in each branch) is subtracted from the standard deviation before the split. The result is the standard deviation reduction.
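A minimal sketch of Step 2 for a single attribute follows. It assumes the branch standard deviations are combined as a size-weighted sum before being subtracted; the function name `sdr` and the six-row toy data are hypothetical.

```python
import math
from collections import defaultdict


def std_dev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sdr(attribute_values, targets):
    """Standard deviation reduction obtained by splitting targets on one attribute."""
    branches = defaultdict(list)
    for a, t in zip(attribute_values, targets):
        branches[a].append(t)
    n = len(targets)
    # Weight each branch's standard deviation by its share of the instances.
    weighted = sum(len(b) / n * std_dev(b) for b in branches.values())
    return std_dev(targets) - weighted


# Hypothetical toy data, not taken from the text.
outlook = ["Sunny", "Sunny", "Overcast", "Overcast", "Rainy", "Rainy"]
hours = [20, 30, 50, 50, 40, 40]

print(sdr(outlook, hours))
```

Here only the Sunny branch has any spread left (standard deviation 5), so the weighted value after the split is 2/6 of 5 and most of the original standard deviation is removed.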

Step 3: The attribute with the largest standard deviation reduction is chosen for the decision node.
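Step 3 amounts to taking the maximum over the candidate attributes. The sketch below assumes rows are stored as dictionaries; `best_attribute`, the `Windy` column and the toy rows are hypothetical.

```python
import math
from collections import defaultdict


def std_dev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sdr(rows, attribute, target):
    """Standard deviation reduction for splitting rows on one attribute."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[attribute]].append(row[target])
    n = len(rows)
    weighted = sum(len(b) / n * std_dev(b) for b in branches.values())
    return std_dev([row[target] for row in rows]) - weighted


def best_attribute(rows, attributes, target):
    """Pick the attribute with the largest standard deviation reduction."""
    return max(attributes, key=lambda a: sdr(rows, a, target))


# Hypothetical toy rows, not taken from the text.
rows = [
    {"Outlook": "Sunny", "Windy": "False", "Hours": 20},
    {"Outlook": "Sunny", "Windy": "True", "Hours": 30},
    {"Outlook": "Overcast", "Windy": "False", "Hours": 50},
    {"Outlook": "Overcast", "Windy": "True", "Hours": 50},
    {"Outlook": "Rainy", "Windy": "False", "Hours": 40},
    {"Outlook": "Rainy", "Windy": "True", "Hours": 40},
]

print(best_attribute(rows, ["Outlook", "Windy"], "Hours"))  # Outlook
```

Splitting on Windy leaves both branches almost as spread out as the full toy dataset, so Outlook is selected.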

Step 4a: The dataset is divided based on the values of the selected attribute.

Step 4b: A branch whose standard deviation is greater than 0 needs further splitting.

In practice, we need some termination criteria. For example, splitting can stop when the standard deviation for the branch becomes smaller than a certain fraction (e.g., 5%) of the standard deviation for the full dataset, or when too few instances remain in the branch (e.g., 3).
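The two termination criteria could be expressed as a small predicate. The name `should_stop`, its parameter names, and the choice to treat "too few" as "at most `min_instances`" are assumptions; the text only gives 5% and 3 as example thresholds.

```python
import math


def std_dev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def should_stop(branch_targets, full_sd, sd_fraction=0.05, min_instances=3):
    """Return True when a branch should become a leaf instead of being split again."""
    # Criterion 1: too few instances remain (boundary handling is an assumption).
    if len(branch_targets) <= min_instances:
        return True
    # Criterion 2: the branch is already homogeneous relative to the full dataset.
    return std_dev(branch_targets) < sd_fraction * full_sd


print(should_stop([38, 40, 41], full_sd=10.0))        # True: only 3 instances
print(should_stop([40, 40, 40, 40], full_sd=10.0))    # True: standard deviation is 0
print(should_stop([10, 20, 30, 40], full_sd=10.0))    # False: still heterogeneous
```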

Step 5: The process is run recursively on the non-leaf branches, until all data is processed.

When the number of instances at a leaf node is more than one, we calculate the average as the final value for the target.
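Putting the steps together, a recursive builder might look like the sketch below. It is a plain Python illustration under stated assumptions, not the original author's implementation: the tree is represented as nested dictionaries, a leaf is simply the average of its targets, an attribute is not reused further down a path, and the thresholds and toy rows are hypothetical.

```python
import math
from collections import defaultdict


def std_dev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def partition(rows, attribute):
    """Step 4a: group rows by their value of the attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups


def sdr(rows, attribute, target):
    """Step 2: standard deviation reduction for one candidate split."""
    weighted = sum(
        len(g) / len(rows) * std_dev([r[target] for r in g])
        for g in partition(rows, attribute).values()
    )
    return std_dev([r[target] for r in rows]) - weighted


def build(rows, attributes, target, full_sd, sd_fraction=0.05, min_instances=3):
    """Step 5: recursively build the tree; leaves hold the average target."""
    targets = [r[target] for r in rows]
    # Stop on the termination criteria, or when no attributes are left to test.
    if (not attributes or len(rows) <= min_instances
            or std_dev(targets) < sd_fraction * full_sd):
        return sum(targets) / len(targets)
    # Step 3: greedy choice of the attribute with the largest reduction.
    best = max(attributes, key=lambda a: sdr(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        "attribute": best,
        "branches": {
            value: build(subset, remaining, target, full_sd, sd_fraction, min_instances)
            for value, subset in partition(rows, best).items()
        },
    }


# Hypothetical toy rows, not taken from the text.
rows = [
    {"Outlook": "Sunny", "Windy": "False", "Hours": 20},
    {"Outlook": "Sunny", "Windy": "True", "Hours": 30},
    {"Outlook": "Overcast", "Windy": "False", "Hours": 50},
    {"Outlook": "Overcast", "Windy": "True", "Hours": 50},
    {"Outlook": "Rainy", "Windy": "False", "Hours": 40},
    {"Outlook": "Rainy", "Windy": "True", "Hours": 40},
]
full_sd = std_dev([r["Hours"] for r in rows])
tree = build(rows, ["Outlook", "Windy"], "Hours", full_sd)
print(tree)
```

On this toy data the root splits on Outlook, and each branch then holds only two instances, so it becomes a leaf whose value is the average of its targets (25.0 for Sunny, 50.0 for Overcast, 40.0 for Rainy).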
