Random Forest in Practice with Python

Code implementation:

# -*- coding: utf-8 -*-
"""
Created on Tue Sep  4 09:38:57 2018

@author: zhen
"""
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load the iris dataset and keep only the first two features
# (sepal length and sepal width).
iris = load_iris()
x = iris.data[:, :2]
y = iris.target

# Hold out 33% of the samples for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

# n_estimators: number of trees in the forest (default 10 before
# scikit-learn 0.22, 100 since); the author suggests an odd number.
# n_jobs: number of jobs run in parallel for both fit and predict;
# the default None means 1, and -1 uses all available cores.
rnd_clf = RandomForestClassifier(n_estimators=15, max_leaf_nodes=16, n_jobs=1)
rnd_clf.fit(x_train, y_train)

y_predict_rf = rnd_clf.predict(x_test)
print(accuracy_score(y_test, y_predict_rf))

# Only two features were used for training, so zip pairs the first two
# feature names with the two importance scores.
for name, score in zip(iris['feature_names'], rnd_clf.feature_importances_):
    print(name, score)

# Visualization: true vs. predicted class against sepal length
plt.plot(x_test[:, 0], y_test, 'r.', label='real')
plt.plot(x_test[:, 0], y_predict_rf, 'b.', label='predict')
plt.xlabel('sepal-length', fontsize=15)
plt.ylabel('type', fontsize=15)
plt.legend(loc="upper left")
plt.show()

# Visualization: true vs. predicted class against sepal width
plt.plot(x_test[:, 1], y_test, 'r.', label='real')
plt.plot(x_test[:, 1], y_predict_rf, 'b.', label='predict')
plt.xlabel('sepal-width', fontsize=15)
plt.ylabel('type', fontsize=15)
plt.legend(loc="upper right")
plt.show()
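To get a feel for what n_estimators and n_jobs do, the sketch below trains the same kind of forest with several tree counts and prints the test accuracy for each. The list of tree counts and the fixed random_state are assumptions added for illustration; they are not part of the original script, and the resulting numbers will depend on the scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
x = iris.data[:, :2]  # sepal length and sepal width only
y = iris.target
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=42)

# Candidate forest sizes; illustrative values, not tuned.
tree_counts = [1, 5, 15, 50, 100]
scores = {}
for n in tree_counts:
    # n_jobs=-1 uses all cores; random_state is fixed (an addition to
    # the original script) so that repeated runs give the same forest.
    clf = RandomForestClassifier(n_estimators=n, max_leaf_nodes=16,
                                 n_jobs=-1, random_state=0)
    clf.fit(x_train, y_train)
    scores[n] = accuracy_score(y_test, clf.predict(x_test))

for n, acc in scores.items():
    print(n, round(acc, 3))
```

On a dataset this small the differences between forest sizes are likely to be modest, and parallelism mostly pays off with more trees or more data.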

Results:

Visualization (examining the effect of each predictor):

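Analysis: when the sepal length is below 6, predictions are highly accurate. Between 6 and 7 the error rate rises noticeably, and above 7 the predictions become good again. Sepal width shows a similar pattern, with more errors between 3 and 3.5. Samples in these middle ranges are hard to tell apart, so it is advisable to add a moderate amount of training data there in order to obtain a better model.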
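The per-range observation can also be checked numerically rather than by eye. The following is a minimal sketch that groups the test samples by sepal length and reports the accuracy in each group. The bin edges (6 and 7) follow the ranges mentioned in the analysis; the fixed random_state and the variable names are assumptions introduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
x = iris.data[:, :2]
y = iris.target
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=42)

# random_state is fixed here (not in the original) for repeatability.
clf = RandomForestClassifier(n_estimators=15, max_leaf_nodes=16, random_state=0)
clf.fit(x_train, y_train)
correct = clf.predict(x_test) == y_test  # boolean array, one entry per test sample

# Bins on sepal length: x < 6, 6 <= x < 7, x >= 7.
edges = [6.0, 7.0]
labels = ['< 6', '6 to 7', '>= 7']
bin_index = np.digitize(x_test[:, 0], edges)  # 0, 1 or 2 per sample

bin_accuracy = {}
for i, label in enumerate(labels):
    mask = bin_index == i
    if mask.any():  # skip empty bins to avoid dividing by zero
        bin_accuracy[label] = float(correct[mask].mean())
        print(label, int(mask.sum()), round(bin_accuracy[label], 3))
```

Each printed row shows the bin, the number of test samples in it, and the fraction predicted correctly. Replacing x_test[:, 0] with x_test[:, 1] and adjusting the edges gives the same breakdown for sepal width.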
