Original post. Please credit the source when reposting: https://www.cnblogs.com/agilestyle/p/12719231.html
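Prepare the data
This example uses the Boston housing dataset bundled with sklearn. The dataset records several factors that affect house prices, such as the crime rate and the property tax rate, together with the house price itself, which serves as the target. Based on these features, we use a CART regression tree to predict Boston house prices.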
# Note: load_boston was removed in scikit-learn 1.2, so this code needs an older release.
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

boston = load_boston()
features = boston.data
labels = boston.target

# (506, 13)
features.shape
# (506,)
labels.shape
Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.33, random_state=0)
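To make the effect of test_size=0.33 concrete, here is a minimal pure-Python sketch of a shuffled hold-out split. It is only an illustration, not scikit-learn's implementation: the helper name shuffle_split and its seed handling are hypothetical, and the shuffled order will not match what train_test_split produces with random_state=0. The one detail taken from scikit-learn is that a fractional test_size is rounded up to a whole number of samples.

```python
import math
import random


def shuffle_split(n_samples, test_size, seed=0):
    """Return (train_indices, test_indices) for a shuffled hold-out split.

    Hypothetical helper for illustration; it uses Python's own RNG,
    so the indices differ from scikit-learn's for the same seed.
    """
    n_test = math.ceil(test_size * n_samples)  # test count is rounded up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_test shuffled indices form the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffle_split(506, 0.33)
print(len(train_idx), len(test_idx))  # 339 167
```

With the 506 samples of the Boston dataset, this leaves 339 rows for training and 167 for testing.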
Build and train the model
dtr = DecisionTreeRegressor()
# DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
#                       max_leaf_nodes=None, min_impurity_decrease=0.0,
#                       min_impurity_split=None, min_samples_leaf=1,
#                       min_samples_split=2, min_weight_fraction_leaf=0.0,
#                       presort=False, random_state=None, splitter='best')
dtr.fit(X_train, y_train)
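The criterion='mse' setting shown above means each split is chosen to minimize the squared error of the resulting child nodes. The following is a simplified sketch of that idea for a single feature, written in plain Python. It is not scikit-learn's code: the helpers sse and best_split are hypothetical names, the data is made up, and a real implementation scans every feature and is heavily optimized.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split on one feature."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # baseline: no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Made-up toy data: two clearly separated groups of prices.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, total = best_split(xs, ys)
print(threshold, total)  # 6.5 4.0
```

Minimizing the summed squared error of the two children is equivalent to minimizing their sample-weighted MSE, since the two differ only by the constant factor of the node's sample count. A tree is built by applying this search recursively to each child.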
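Evaluate the model
predict_price = dtr.predict(X_test)
print('Regression tree mean squared error:', mean_squared_error(y_test, predict_price))
print('Regression tree mean absolute error:', mean_absolute_error(y_test, predict_price))
Output (results may differ between runs, because the DecisionTreeRegressor is created without a fixed random_state)
Regression tree mean squared error: 24.67646706586826
Regression tree mean absolute error: 3.1670658682634736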
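For readers who want to see what the two metrics compute, here is a small pure-Python sketch. The function names and the four sample prices are made up for illustration; they are not taken from the Boston dataset or from the run above.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error_manual(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up prices, for illustration only.
y_true = [24.0, 21.5, 30.0, 18.0]
y_pred = [22.0, 21.5, 33.0, 19.0]

mse = mean_squared_error_manual(y_true, y_pred)
mae = mean_absolute_error_manual(y_true, y_pred)
print(mse, mae)  # 3.5 1.5
```

Because the errors are squared before averaging, the mean squared error penalizes large misses more heavily than the mean absolute error, which stays in the same unit as the target.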
Visualize the decision tree
from sklearn.tree import export_graphviz

# Write the fitted tree to a Graphviz DOT file.
with open('boston.dot', 'w') as f:
    export_graphviz(dtr, out_file=f)
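Drawing the regression tree from this file produces a diagram of the fitted model. The Boston housing dataset has quite a few features and the tree is grown without a depth limit, so the resulting tree is fairly large.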
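When the full tree is too large to read, one option is to limit how much of it is exported. The sketch below uses the max_depth and feature_names parameters of export_graphviz. It makes several assumptions: synthetic data from make_regression stands in for the Boston dataset, the feature names are hypothetical placeholders, and the depth of 2 is an arbitrary choice.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Synthetic stand-in data (an assumption; this is not the Boston dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# out_file=None makes export_graphviz return the DOT source as a string.
# max_depth only truncates the drawing; the fitted tree itself is unchanged.
dot_source = export_graphviz(
    tree,
    out_file=None,
    max_depth=2,
    feature_names=feature_names,
    filled=True,
    rounded=True,
)
print(dot_source[:200])
```

If Graphviz is installed, a DOT file such as boston.dot can be turned into an image on the command line with `dot -Tpng boston.dot -o boston.png`.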