Decision Tree Regression Analysis in MATLAB: the Decision Tree Regression Model (Decision Tree - Regression)

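This article describes decision tree regression analysis with MATLAB, covering the ID3 algorithm and the concept of standard deviation reduction. A decision tree builds a regression or classification model by partitioning the dataset; decision nodes represent tests on an attribute, and leaf nodes represent a numerical target. Standard deviation reduction measures how homogeneous the data become after a split, and the tree is built recursively until a termination criterion is met, such as the standard deviation falling below a given fraction or too few instances remaining.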

Decision Tree - Regression

A decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.

A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing a value of the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
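To make the node terminology concrete, here is a minimal Python sketch of one possible in-memory representation. The class names `Leaf` and `DecisionNode`, the `predict` helper, and the tree itself (its shape and leaf values) are hypothetical and chosen only for illustration; they are not taken from the article.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds the predicted numerical target."""
    value: float


@dataclass
class DecisionNode:
    """Decision node: tests one attribute, with one branch per attribute value."""
    attribute: str
    branches: Dict[str, Union["DecisionNode", Leaf]] = field(default_factory=dict)


def predict(node, instance):
    """Follow the branch matching the instance until a leaf is reached."""
    while isinstance(node, DecisionNode):
        node = node.branches[instance[node.attribute]]
    return node.value


# Hypothetical tree: structure and leaf values are invented for illustration.
tree = DecisionNode("Outlook", {
    "Sunny": DecisionNode("Windy", {"False": Leaf(40.0), "True": Leaf(25.0)}),
    "Overcast": Leaf(45.0),
    "Rainy": Leaf(30.0),
})

print(predict(tree, {"Outlook": "Sunny", "Windy": "True"}))      # 25.0
print(predict(tree, {"Outlook": "Overcast", "Windy": "False"}))  # 45.0
```

The root node here is the `Outlook` test; prediction simply walks from the root to a leaf.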


Decision Tree Algorithm

The core algorithm for building decision trees is ID3, by J. R. Quinlan, which employs a top-down, greedy search through the space of possible branches with no backtracking. The ID3 algorithm can be used to construct a decision tree for regression by replacing Information Gain with Standard Deviation Reduction.

Standard Deviation

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). We use standard deviation to calculate the homogeneity of a numerical sample. If the numerical sample is completely homogeneous, its standard deviation is zero.

a) Standard deviation for one attribute:
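As a rough illustration, the standard deviation of a single numerical sample can be computed as below. This sketch assumes the population form (dividing by n rather than n - 1); the function name `std_dev` and the sample values are hypothetical and are not the article's dataset.

```python
import math


def std_dev(values):
    """Population standard deviation: sqrt(sum((x - mean)^2) / n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


# Hypothetical target values, not taken from the article's dataset.
print(std_dev([30, 30, 30, 30]))            # 0.0 (completely homogeneous)
print(round(std_dev([20, 30, 40, 50]), 2))  # 11.18
```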


b) Standard deviation for two attributes:
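One common way to define the standard deviation of a target T with respect to a predictor X is as a weighted sum over the values c of X: S(T, X) = sum of P(c) * S(c), where P(c) is the share of instances with X = c and S(c) is the standard deviation of the target within that group. The sketch below assumes this definition and the population form of the standard deviation; the function name and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def weighted_std_dev(attribute_values, targets):
    """S(T, X): branch standard deviations weighted by branch share P(c)."""
    groups = defaultdict(list)
    for a, t in zip(attribute_values, targets):
        groups[a].append(t)
    n = len(targets)
    return sum(len(g) / n * pstdev(g) for g in groups.values())


# Hypothetical toy data, not the article's dataset.
outlook = ["Sunny", "Sunny", "Rainy", "Rainy"]
hours = [20, 30, 40, 50]
print(weighted_std_dev(outlook, hours))  # 5.0
```

Each of the two groups has a standard deviation of 5 and holds half of the instances, so the weighted result is 5.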


Standard Deviation Reduction

The standard deviation reduction is based on the decrease in standard deviation after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest standard deviation reduction (i.e., the most homogeneous branches).

Step 1: The standard deviation of the target is calculated.

Standard deviation (Hours Played) = 9.32

Step 2: The dataset is then split on the different attributes. The standard deviation for each branch is calculated. The resulting standard deviation (the branch standard deviations weighted by the share of instances in each branch) is subtracted from the standard deviation before the split. The result is the standard deviation reduction.
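Step 2 can be sketched in a few lines of Python. The sketch assumes SDR(T, X) = S(T) - S(T, X), with S(T, X) taken as the branch standard deviations weighted by branch size and the population form of the standard deviation. The function name `sdr` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def sdr(attribute_values, targets):
    """Standard deviation reduction: S(T) minus the weighted branch deviation."""
    groups = defaultdict(list)
    for a, t in zip(attribute_values, targets):
        groups[a].append(t)
    n = len(targets)
    after_split = sum(len(g) / n * pstdev(g) for g in groups.values())
    return pstdev(targets) - after_split


# Hypothetical toy data, not the article's dataset.
hours = [20, 30, 40, 50]
print(round(sdr(["Sunny", "Sunny", "Rainy", "Rainy"], hours), 2))  # 6.18
print(round(sdr(["True", "False", "True", "False"], hours), 2))    # 1.18
```

The first attribute separates low values from high values, so its branches are more homogeneous and its reduction is larger.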


Step 3: The attribute with the largest standard deviation reduction is chosen for the decision node.
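A minimal sketch of this selection step follows, assuming each instance is a dictionary mapping attribute names to values. The helper names `sdr` and `best_attribute`, the column names, and the rows are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def sdr(rows, attribute, target):
    """Standard deviation reduction obtained by splitting rows on attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    n = len(rows)
    after_split = sum(len(g) / n * pstdev(g) for g in groups.values())
    return pstdev([row[target] for row in rows]) - after_split


def best_attribute(rows, attributes, target):
    """Pick the attribute with the largest standard deviation reduction."""
    return max(attributes, key=lambda a: sdr(rows, a, target))


# Hypothetical toy rows, not the article's dataset.
rows = [
    {"Outlook": "Sunny", "Windy": "True", "Hours": 20},
    {"Outlook": "Sunny", "Windy": "False", "Hours": 30},
    {"Outlook": "Rainy", "Windy": "True", "Hours": 40},
    {"Outlook": "Rainy", "Windy": "False", "Hours": 50},
]
print(best_attribute(rows, ["Outlook", "Windy"], "Hours"))  # Outlook
```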


Step 4a: The dataset is divided based on the values of the selected attribute.


Step 4b: A branch whose subset has a standard deviation greater than 0 needs further splitting.

In practice, we need some termination criteria: for example, stop when the standard deviation for the branch becomes smaller than a certain fraction (e.g., 5%) of the standard deviation for the full dataset, or when too few instances remain in the branch (e.g., 3).
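The two termination criteria can be expressed as a small predicate. This is only a sketch: the function name `should_stop` and its parameter names are hypothetical, and it reads "too few instances (e.g., 3)" as "three or fewer", which is one possible interpretation of the text.

```python
from statistics import pstdev


def should_stop(branch_targets, full_std, min_fraction=0.05, min_instances=3):
    """Return True when a branch should become a leaf instead of being split."""
    # Assumption: "too few" is read as min_instances or fewer.
    if len(branch_targets) <= min_instances:
        return True
    # Stop when the branch is nearly homogeneous relative to the full dataset.
    return pstdev(branch_targets) < min_fraction * full_std


full_std = 9.32  # standard deviation of Hours Played quoted in Step 1
print(should_stop([25, 30, 35], full_std))              # True: only 3 instances
print(should_stop([40, 40, 40, 41, 40], full_std))      # True: std 0.4 < 0.466
print(should_stop([20, 30, 40, 50, 60], full_std))      # False
```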


Step 5: The process is run recursively on the non-leaf branches until all data are processed.

When more than one instance remains at a leaf node, we calculate the average as the final value for the target.
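Putting the steps together, the recursive procedure might be sketched as follows. This is an illustrative outline under stated assumptions, not a reference implementation: it handles categorical attributes only, uses the population standard deviation, treats "too few instances" as `min_instances` or fewer, and represents the tree as nested dictionaries. All function names, parameter names, and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def split(rows, attribute):
    """Group rows by the value they take on the given attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups


def sdr(rows, attribute, target):
    """Standard deviation reduction obtained by splitting rows on attribute."""
    n = len(rows)
    after_split = sum(len(g) / n * pstdev([r[target] for r in g])
                      for g in split(rows, attribute).values())
    return pstdev([r[target] for r in rows]) - after_split


def build(rows, attributes, target, full_std, min_fraction=0.05, min_instances=3):
    """Return a nested dict for a decision node, or a float for a leaf."""
    values = [r[target] for r in rows]
    # Leaf: no attributes left, too few instances, or nearly homogeneous.
    if (not attributes or len(values) <= min_instances
            or pstdev(values) < min_fraction * full_std):
        return float(mean(values))  # the leaf predicts the average target
    best = max(attributes, key=lambda a: sdr(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: build(subset, remaining, target, full_std,
                                min_fraction, min_instances)
                   for value, subset in split(rows, best).items()}}


# Hypothetical toy rows, not the article's dataset.
rows = [
    {"Outlook": "Sunny", "Windy": "True", "Hours": 20},
    {"Outlook": "Sunny", "Windy": "False", "Hours": 30},
    {"Outlook": "Rainy", "Windy": "True", "Hours": 40},
    {"Outlook": "Rainy", "Windy": "False", "Hours": 50},
]
full_std = pstdev([r["Hours"] for r in rows])
print(build(rows, ["Outlook", "Windy"], "Hours", full_std))
# {'Outlook': {'Sunny': 25.0, 'Rainy': 45.0}}
```

With the default `min_instances=3`, each Outlook branch holds only two instances, so it becomes a leaf whose value is the average of its targets.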
