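A validation curve is a tool for improving model performance through hyperparameter tuning. It closely resembles a learning curve, but instead of plotting the score (for example, accuracy) against the training set size, it plots the score against different values of a single model parameter. The validation_curve function uses k-fold cross-validation to evaluate the model at each parameter value.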
sklearn.model_selection.validation_curve(estimator, X, y, *, param_name, param_range, groups=None, cv=None, scoring=None, n_jobs=None, pre_dispatch='all', verbose=0, error_score=nan)
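Parameters:
param_name: str, the name of the parameter to vary. For example, when the model is an SVC, you can vary gamma to find the value that performs best.
param_range: the values of that parameter to evaluate.
Returns:
train_scores: scores on the training folds, an array of shape (n_ticks, n_cv_folds), with one row per parameter value and one column per fold.
test_scores: scores on the held-out (validation) folds, an array of the same shape.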
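The SVC and gamma case mentioned above can be sketched as follows. This is a minimal illustration, not a tuned setup: the digits dataset, the gamma grid from np.logspace, and cv=3 are all assumptions chosen to keep the run short. It mainly shows the shape of the returned arrays.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

# Assumed example data: the handwritten digits dataset
X, y = load_digits(return_X_y=True)

# Assumed grid: 5 gamma values spaced logarithmically from 1e-6 to 1e-1
gamma_range = np.logspace(-6, -1, 5)

train_scores, test_scores = validation_curve(
    SVC(), X, y,
    param_name="gamma", param_range=gamma_range,
    cv=3, scoring="accuracy",
)

# One row per gamma value, one column per cross-validation fold
print(train_scores.shape)
print(test_scores.shape)
```

Each row holds the per-fold scores for one gamma value, which is why the scores are usually averaged along axis=1 before plotting.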
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve
import numpy as np
import matplotlib.pyplot as plt

# Load the handwritten digits dataset
X, y = datasets.load_digits(return_X_y=True)
# print(X[:2, :])

# Numbers of trees to evaluate
param_range = [10, 20, 40, 80, 160, 250]
train_score, test_score = validation_curve(
    RandomForestClassifier(), X, y,
    param_name='n_estimators', param_range=param_range,
    cv=10, scoring='accuracy')

# Average the scores over the 10 folds (one column per fold)
train_score = np.mean(train_score, axis=1)
test_score = np.mean(test_score, axis=1)

# Plot training and cross-validation accuracy against the number of trees
plt.plot(param_range, train_score, 'o-', color='r', label='training')
plt.plot(param_range, test_score, 'o-', color='g', label='testing')
plt.legend(loc='best')
plt.xlabel('number of trees')
plt.ylabel('accuracy')
plt.show()
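Since the curve is mainly used for tuning, it can help to read the best parameter value off the scores rather than off the plot. The sketch below is one possible way to do that. The helper summarize_validation_curve is a hypothetical name introduced here, not part of scikit-learn, and the decision tree with a max_depth grid is an assumption chosen only because it runs quickly.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier


def summarize_validation_curve(param_range, test_scores):
    """Hypothetical helper: return (best_value, mean_scores, std_scores)."""
    test_scores = np.asarray(test_scores)
    mean_scores = test_scores.mean(axis=1)  # average over folds
    std_scores = test_scores.std(axis=1)    # spread across folds
    best_index = int(np.argmax(mean_scores))  # first maximum on ties
    return param_range[best_index], mean_scores, std_scores


X, y = load_digits(return_X_y=True)
depth_range = [2, 4, 8, 16]  # assumed grid of tree depths

_, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depth_range,
    cv=3, scoring="accuracy",
)

best_depth, mean_scores, std_scores = summarize_validation_curve(
    depth_range, test_scores
)
for depth, mean, std in zip(depth_range, mean_scores, std_scores):
    print(f"max_depth={depth}: {mean:.3f} +/- {std:.3f}")
print("best max_depth:", best_depth)
```

The standard deviation across folds gives a rough sense of whether the difference between two parameter values is meaningful or within fold-to-fold noise.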