• Sklearn


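    sklearn is a machine learning library built on NumPy and SciPy. Its design is elegant: different algorithms are all called through the same interface. This post starts by introducing how sklearn's modules are organized and the top-level design of its algorithm classes.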

    Using Sklearn

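    In sklearn, different machine learning algorithms are used through exactly the same interface. Informally, the workflow is:

    1. Load and preprocess the data
    2. Define a classifier (or regressor, etc.)
    3. Train the model on the training set by calling its fit method
    4. Use the trained model to make predictions
    5. Evaluate the model's performance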
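The five steps above can be sketched end to end in a few lines. This is a minimal illustration, not the post's exact script: it assumes a simple hold-out split via `train_test_split` instead of the K-fold split used later, and the `random_state` values are added here only to make the run reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# 1. Load (here: generate) the data; random_state is an assumption for reproducibility
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 2. Define the classifier
clf = GaussianNB()

# 3. Train it on the training set
clf.fit(X_train, y_train)

# 4. Predict labels for the held-out samples
pred = clf.predict(X_test)

# 5. Evaluate the predictions
acc = accuracy_score(y_test, pred)
print("Accuracy:", acc)
```

Swapping `GaussianNB()` for another classifier leaves steps 3 to 5 unchanged, which is the point of the shared interface.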

    Loading the data

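    from sklearn import datasets
    from sklearn import metrics

    # Generate a synthetic binary classification problem: 1000 samples, 10 features
    dataset = datasets.make_classification(n_samples=1000, n_features=10, n_informative=2,
                                           n_redundant=2, n_repeated=0, n_classes=2)
    print(dataset[0])  # feature matrix, shape (1000, 10)
    print(dataset[1])  # label vector, shape (1000,)
    print('\n')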

    Splitting the data

    # sklearn.cross_validation was removed in scikit-learn 0.20; KFold now lives in model_selection
    from sklearn.model_selection import KFold

    kf = KFold(n_splits=10, shuffle=True)
    # Each pass overwrites the variables, so the sections below use the last fold
    for train_index, test_index in kf.split(dataset[0]):
        x_train, y_train = dataset[0][train_index], dataset[1][train_index]
        x_test, y_test = dataset[0][test_index], dataset[1][test_index]
        print(x_train, '\n')
        print(y_train, '\n')
        print(x_test, '\n')
        print(y_test, '\n')
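Because the loop keeps only the final fold, every score reported later comes from a single 900/100 split. As a hedged alternative sketch, `cross_val_score` can fit and score a model on each of the ten folds and return one score per fold. The fixed `random_state` values are assumptions added here for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Same synthetic problem as in the post; random_state is an added assumption
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=0)

# One accuracy value per fold: the model is refit on each training split
scores = cross_val_score(GaussianNB(), X, y, cv=kf, scoring="accuracy")
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean(), "+/-", scores.std())
```

Averaging over folds gives a steadier estimate than the score of any single fold.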

    GaussianNB

    Training, prediction, and evaluation

    from sklearn.naive_bayes import GaussianNB

    print("GaussianNB")
    clf = GaussianNB()
    clf.fit(x_train, y_train)    # train on the training fold
    pred = clf.predict(x_test)   # predict labels for the test fold
    print(pred)
    print(y_test)

    acc = metrics.accuracy_score(y_test, pred)
    print("Accuracy:", acc)
    f1 = metrics.f1_score(y_test, pred)
    print("F1-score:", f1)
    auc = metrics.roc_auc_score(y_test, pred)  # computed from hard 0/1 predictions
    print("AUC ROC:", auc)
    print("\n")
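The code above passes hard 0/1 predictions to `roc_auc_score`. The function also accepts continuous scores, and ROC AUC is usually computed from them because it measures how well the model ranks samples. The sketch below compares the two on a hold-out split; the split and the `random_state` values are assumptions made for this illustration, not part of the original script.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

clf = GaussianNB().fit(X_train, y_train)

# Probability of the positive class (column 1) for each test sample
proba = clf.predict_proba(X_test)[:, 1]

auc_from_labels = roc_auc_score(y_test, clf.predict(X_test))
auc_from_scores = roc_auc_score(y_test, proba)
print("AUC from hard labels:  ", auc_from_labels)
print("AUC from probabilities:", auc_from_scores)
```

The same pattern applies to any classifier that exposes `predict_proba` or `decision_function`.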

    Results

    GaussianNB
    [1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1
    1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1
    1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0]
    [1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
    1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    Accuracy: 0.84
    F1-score: 0.8518518518518519
    AUC ROC: 0.8445512820512822

    SVC

    Training, prediction, and evaluation

    from sklearn.svm import SVC

    print("SVC")
    C_values = [1e-02, 1e-01, 1e00, 1e01, 1e02]
    # Train and evaluate one RBF-kernel SVC per regularization strength C
    for Cs in C_values:
        clf = SVC(C=Cs, kernel='rbf', gamma=0.1)
        clf.fit(x_train, y_train)
        pred = clf.predict(x_test)
        print(pred)
        print(y_test)

        acc = metrics.accuracy_score(y_test, pred)
        print("Accuracy:", acc)
        f1 = metrics.f1_score(y_test, pred)
        print("F1-score:", f1)
        auc = metrics.roc_auc_score(y_test, pred)
        print("AUC ROC:", auc)
        print("\n")
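The manual loop over `C` can also be written with `GridSearchCV`, which cross-validates every candidate value and keeps the best one. This is a sketch under assumptions: the 5-fold setting, accuracy as the selection metric, and `random_state=0` are choices made here, not taken from the post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)

# Same candidate values for C as in the loop above
param_grid = {"C": [1e-2, 1e-1, 1e0, 1e1, 1e2]}

# cv=5 and scoring="accuracy" are illustrative choices
search = GridSearchCV(SVC(kernel="rbf", gamma=0.1), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)

print("Best C:", search.best_params_["C"])
print("Best mean CV accuracy:", search.best_score_)
```

After fitting, `search.best_estimator_` is an `SVC` refit on all of the data with the selected `C`.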

    Results

    SVC
    [1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1
    1 1 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1
    1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0]
    [1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
    1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    Accuracy: 0.82
    F1-score: 0.8363636363636364
    AUC ROC: 0.8253205128205129


    [1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1
    1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1
    1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0]
    [1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
    1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    Accuracy: 0.84
    F1-score: 0.8518518518518519
    AUC ROC: 0.8445512820512822


    [1 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 1 1
    1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1
    1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0]
    [1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
    1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    Accuracy: 0.85
    F1-score: 0.8598130841121496
    AUC ROC: 0.8541666666666666


    [1 1 0 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 0 0 1
    1 0 1 0 0 0 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1
    1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    [1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
    1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    Accuracy: 0.85
    F1-score: 0.854368932038835
    AUC ROC: 0.8525641025641024


    [1 1 0 0 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 0 0 1
    0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 0 1 0 1
    1 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 0]
    [1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
    1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    Accuracy: 0.84
    F1-score: 0.8400000000000001
    AUC ROC: 0.8413461538461539

    Random Forest

    Training, prediction, and evaluation

    from sklearn.ensemble import RandomForestClassifier

    print("Random Forest")
    n_estimators = [10, 100, 1000]
    # Train and evaluate one forest per tree count
    for value in n_estimators:
        clf = RandomForestClassifier(n_estimators=value)
        clf.fit(x_train, y_train)
        pred = clf.predict(x_test)
        print(pred)
        print(y_test)

        acc = metrics.accuracy_score(y_test, pred)
        print("Accuracy:", acc)
        f1 = metrics.f1_score(y_test, pred)
        print("F1-score:", f1)
        auc = metrics.roc_auc_score(y_test, pred)
        print("AUC ROC:", auc)
        print("\n")

    Results

    Random Forest
    [1 1 1 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 0
    0 0 1 0 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0]
    [1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
    1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    Accuracy: 0.9
    F1-score: 0.9
    AUC ROC: 0.9014423076923077


    [1 1 1 1 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 1
    1 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    [1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
    1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    Accuracy: 0.9
    F1-score: 0.9019607843137256
    AUC ROC: 0.9022435897435898


    [1 1 1 1 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 1
    1 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    [1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0
    1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0
    1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0]
    Accuracy: 0.9
    F1-score: 0.9019607843137256
    AUC ROC: 0.9022435897435898
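The three sections repeat the same fit, predict, and score steps, so they can be folded into one helper that works for any classifier. The sketch below is illustrative: the `evaluate` helper, the hold-out split, the specific hyperparameters, and the `random_state` values are all assumptions introduced here.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def evaluate(clf, X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit clf, predict, and return the three metrics."""
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    return {
        "accuracy": metrics.accuracy_score(y_test, pred),
        "f1": metrics.f1_score(y_test, pred),
        "auc": metrics.roc_auc_score(y_test, pred),
    }


X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# Every model exposes the same fit/predict interface
models = {
    "GaussianNB": GaussianNB(),
    "SVC": SVC(C=1.0, kernel="rbf", gamma=0.1),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {name: evaluate(clf, X_train, y_train, X_test, y_test)
           for name, clf in models.items()}
for name, scores in results.items():
    print(name, scores)
```

Adding another classifier to the comparison only requires one more entry in `models`.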
  • Original post: https://www.cnblogs.com/lijianming180/p/12302263.html