Sklearn

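sklearn (scikit-learn) is a machine learning library built on NumPy and SciPy. It is elegantly designed: every algorithm it provides is called through the same interface. This post starts with an overview of how sklearn's modules are organized and of the top-level design of its algorithm classes.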

Using sklearn

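In sklearn, different machine learning algorithms are used through exactly the same interface. Roughly, the workflow is:

1. Load and preprocess the data
2. Define a classifier (or a regressor, etc.)
3. Train the model on the training set by calling its fit method
4. Use the trained model to make predictions
5. Evaluate the model's performance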
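The five steps above can be sketched as one short script. This is only a minimal illustration and not the code used in the rest of this post: the helper name `evaluate`, the use of `train_test_split` for a single hold-out split, and the `random_state` values are assumptions added here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def evaluate(clf, x_train, y_train, x_test, y_test):
    """Hypothetical helper: fit any sklearn classifier, return test accuracy."""
    clf.fit(x_train, y_train)            # step 3: train
    pred = clf.predict(x_test)           # step 4: predict
    return accuracy_score(y_test, pred)  # step 5: evaluate


# Step 1: load (here: generate) the data and hold out 100 test samples.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=100, random_state=0)

# Step 2: define the estimators; they all expose the same fit/predict methods.
classifiers = {
    "GaussianNB": GaussianNB(),
    "SVC": SVC(C=1.0, kernel="rbf", gamma=0.1),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {name: evaluate(clf, x_train, y_train, x_test, y_test)
          for name, clf in classifiers.items()}
for name, acc in scores.items():
    print(name, "accuracy:", acc)
```

Because `evaluate` only relies on `fit` and `predict`, swapping in another classifier means changing a single entry in the dictionary.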

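Loading the data

from sklearn import datasets
from sklearn import metrics

# Generate a synthetic binary classification problem:
# 1000 samples, 10 features (2 informative, 2 redundant, none repeated).
dataset = datasets.make_classification(n_samples=1000, n_features=10, n_informative=2,
                                       n_redundant=2, n_repeated=0, n_classes=2)
print(dataset[0])  # feature matrix
print(dataset[1])  # class labels
print('\n')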
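`make_classification` returns a tuple `(X, y)`, which is why the code above indexes `dataset[0]` for the features and `dataset[1]` for the labels. A small sketch for inspecting what was generated follows; the names `X` and `y` and the `random_state` argument are assumptions added here for reproducibility.

```python
import numpy as np
from sklearn import datasets

# random_state is an assumption added so repeated runs produce the same data.
X, y = datasets.make_classification(n_samples=1000, n_features=10,
                                    n_informative=2, n_redundant=2,
                                    n_repeated=0, n_classes=2,
                                    random_state=0)

print(X.shape)         # feature matrix: one row per sample, one column per feature
print(y.shape)         # label vector: one entry per sample
print(np.unique(y))    # the distinct class labels
print(np.bincount(y))  # number of samples in each class
```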

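Splitting the data

from sklearn import model_selection

# sklearn.cross_validation was removed in scikit-learn 0.20; KFold now lives in
# sklearn.model_selection and produces the index arrays through split().
kf = model_selection.KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(dataset[0]):
    x_train, y_train = dataset[0][train_index], dataset[1][train_index]
    x_test, y_test = dataset[0][test_index], dataset[1][test_index]
    print(x_train, '\n')
    print(y_train, '\n')
    print(x_test, '\n')
    print(y_test, '\n')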
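Note that the loop above overwrites `x_train`, `y_train`, `x_test` and `y_test` on every iteration, so the sections that follow train and evaluate on the last fold only. To score a model on all ten folds instead, one option is `cross_val_score`. The sketch below uses GaussianNB purely as an example; the `random_state` values are assumptions added here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)

# Same 10-fold shuffled split as above, with a fixed seed (assumption).
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# Fits a fresh copy of the estimator on each training fold and
# scores it on the matching held-out fold.
scores = cross_val_score(GaussianNB(), X, y, cv=kf, scoring="accuracy")
print(scores)
print("Mean accuracy:", scores.mean())
```

Averaging over folds gives a less noisy estimate than a single 100-sample test fold.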

GaussianNB

Model training, prediction, and evaluation

from sklearn.naive_bayes import GaussianNB

print("GaussianNB")
# x_train/x_test hold the last fold produced by the KFold loop above.
clf = GaussianNB()
clf.fit(x_train, y_train)
pred = clf.predict(x_test)
print(pred)
print(y_test)

acc = metrics.accuracy_score(y_test, pred)
print("Accuracy:", acc)
f1 = metrics.f1_score(y_test, pred)
print("F1-score：", f1)
auc = metrics.roc_auc_score(y_test, pred)
print("AUC ROC:", auc)
print("\n")
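The code above passes the hard 0/1 predictions to `roc_auc_score`. That function also accepts continuous scores, and the ROC curve is normally built from those, so the AUC computed from labels reflects only a single decision threshold. A sketch comparing the two, using `predict_proba`, is shown below; the single stratified hold-out split and the `random_state` values are assumptions made to keep the example self-contained.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=100, stratify=y, random_state=0)

clf = GaussianNB().fit(x_train, y_train)
pred = clf.predict(x_test)               # hard class labels
proba = clf.predict_proba(x_test)[:, 1]  # estimated probability of class 1

auc_from_labels = metrics.roc_auc_score(y_test, pred)
auc_from_scores = metrics.roc_auc_score(y_test, proba)
print("AUC from labels:", auc_from_labels)
print("AUC from probabilities:", auc_from_scores)
```

For SVC, `decision_function` can play the same role as `predict_proba` here.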

Results

GaussianNB
[1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1
 1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1
 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.84
F1-score： 0.8518518518518519
AUC ROC: 0.8445512820512822

SVC

Model training, prediction, and evaluation

from sklearn.svm import SVC

print("SVC")
# Try several values of the regularization parameter C with an RBF kernel.
C_values = [1e-02, 1e-01, 1e00, 1e01, 1e02]
for Cs in C_values:
    clf = SVC(C=Cs, kernel='rbf', gamma=0.1)
    clf.fit(x_train, y_train)
    pred = clf.predict(x_test)
    print(pred)
    print(y_test)

    acc = metrics.accuracy_score(y_test, pred)
    print("Accuracy:", acc)
    f1 = metrics.f1_score(y_test, pred)
    print("F1-score：", f1)
    auc = metrics.roc_auc_score(y_test, pred)
    print("AUC ROC:", auc)
    print("\n")

Results

SVC
[1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1
 1 1 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1
 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.82
F1-score： 0.8363636363636364
AUC ROC: 0.8253205128205129


[1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1
 1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1
 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.84
F1-score： 0.8518518518518519
AUC ROC: 0.8445512820512822


[1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1
 1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1
 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.85
F1-score： 0.8598130841121496
AUC ROC: 0.8541666666666666


[1 1 0 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 0 0 1
 1 0 1 0 0 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1
 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.85
F1-score： 0.854368932038835
AUC ROC: 0.8525641025641024


[1 1 0 0 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 0 0 1
 0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 0 1 0 1
 1 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.84
F1-score： 0.8400000000000001
AUC ROC: 0.8413461538461539
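The loop over `C_values` compares settings on a single test fold. As an alternative sketch, `GridSearchCV` can run the same search with cross-validation and report the best value. The grid below reuses the C values from the code above; the `random_state`, the choice of `cv=10`, and scoring by accuracy are assumptions added here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)

# Same candidate values of C as in the manual loop.
param_grid = {"C": [1e-2, 1e-1, 1e0, 1e1, 1e2]}

# Each candidate is scored by 10-fold cross-validated accuracy.
search = GridSearchCV(SVC(kernel="rbf", gamma=0.1), param_grid,
                      cv=10, scoring="accuracy")
search.fit(X, y)

print("Best C:", search.best_params_["C"])
print("Best mean CV accuracy:", search.best_score_)
```

Adding a `"gamma"` entry to `param_grid` would search over the kernel width in the same way.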

Random Forest

Model training, prediction, and evaluation

from sklearn.ensemble import RandomForestClassifier

print("Random Forest")
# Try forests with different numbers of trees.
n_estimators = [10, 100, 1000]
for value in n_estimators:
    clf = RandomForestClassifier(n_estimators=value)
    clf.fit(x_train, y_train)
    pred = clf.predict(x_test)
    print(pred)
    print(y_test)

    acc = metrics.accuracy_score(y_test, pred)
    print("Accuracy:", acc)
    f1 = metrics.f1_score(y_test, pred)
    print("F1-score：", f1)
    auc = metrics.roc_auc_score(y_test, pred)
    print("AUC ROC:", auc)
    print("\n")

Results

Random Forest
[1 1 1 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 0
 0 0 1 0 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.9
F1-score： 0.9
AUC ROC: 0.9014423076923077


[1 1 1 1 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 1
 1 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.9
F1-score： 0.9019607843137256
AUC ROC: 0.9022435897435898


[1 1 1 1 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 1
 1 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
[1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
Accuracy: 0.9
F1-score： 0.9019607843137256
AUC ROC: 0.9022435897435898
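Since the dataset was generated with only 2 informative and 2 redundant features out of 10, it can be instructive to look at which columns the forest relies on. A fitted `RandomForestClassifier` exposes this through its `feature_importances_` attribute. The sketch below is an addition to the original experiment; the `random_state` values are assumptions, and because `make_classification` shuffles the feature columns by default, the column indices printed do not by themselves identify which features were the informative ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Impurity-based importances: one value per feature, normalized to sum to 1.
importances = clf.feature_importances_
order = np.argsort(importances)[::-1]  # feature indices, most important first
for idx in order:
    print("feature", idx, "importance:", importances[idx])
```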

Original article: https://www.cnblogs.com/lijianming180/p/12302263.html
