• Introduction to Machine Learning: Logistic Regression and Stochastic Gradient Ascent


    # -*- coding: utf-8 -*-
    import pandas
    import numpy
    # Create the list of column (feature) names
    column_name = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape','Marginal Adhesion','Single Epithelial Cell Size','Bare Nuclei','Bland Chromatin','Normal Nucleoli','Mitoses','Class']
    # Load the data
    data = pandas.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data',  names=column_name)
    # print(type(data))
    # Missing values are marked with '?' in the raw file; replace them with NaN
    data = data.replace(to_replace='?', value=numpy.nan)
    # Drop every row that contains a missing value
    data = data.dropna(how='any')
    
    from sklearn.model_selection import train_test_split
    # Split the dataset into training and test sets
    # (X_train: training features, y_train: training labels; X_test: test features, y_test: test labels)
    X_train, X_test, y_train, y_test = train_test_split(data[column_name[1:10]], data[column_name[10]], test_size=0.25, random_state=33)
    
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.linear_model import SGDClassifier
    
    ss = StandardScaler()
    X_train = ss.fit_transform(X_train)
    X_test = ss.transform(X_test)
    
    lr = LogisticRegression()
    sc = SGDClassifier()
    
    lr.fit(X_train, y_train)
    lr_y_predict = lr.predict(X_test)
    
    sc.fit(X_train, y_train)
    sc_y_predict = sc.predict(X_test)
    
    from sklearn.metrics import classification_report
    
    # In Python 3, str has no .decode(), so the labels are printed directly
    print('Logistic regression model accuracy:', lr.score(X_test, y_test))
    print(classification_report(y_test, lr_y_predict, target_names=['Benign', 'Malignant']))
    
    print('Stochastic gradient model accuracy:', sc.score(X_test, y_test))
    print(classification_report(y_test, sc_y_predict, target_names=['Benign', 'Malignant']))
    # print(lr_y_predict)
    # print(sc_y_predict)
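
The accuracy and the per-class precision, recall and F1 figures that the script prints can be reproduced by hand, which helps when reading the report. The following is a minimal sketch, not the scikit-learn implementation: `binary_report` is a hypothetical helper, and the label lists are made-up toy values that only reuse the dataset's coding (2 = benign, 4 = malignant).

```python
def binary_report(y_true, y_pred, positive=4):
    """Compute accuracy, precision, recall and F1 for one class.

    `positive` is the label treated as the class of interest
    (4 = malignant in the breast cancer data used above).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (not results from the dataset).
y_true_toy = [2, 2, 4, 4, 4, 2, 4, 2]
y_pred_toy = [2, 4, 4, 4, 2, 2, 4, 2]

report = binary_report(y_true_toy, y_pred_toy)
print(report)
```

On real data the same function could be called as `binary_report(list(y_test), list(lr_y_predict))`, assuming the variables from the script above.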

    Reading notes on the book 《机器学习及实践》.

    Linear classification:

      1. Logistic regression;

      2. Stochastic gradient ascent;

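    Linear classifiers are among the most commonly used and most basic machine learning models. The two used here differ in how they fit the parameters. LogisticRegression solves for the parameters more exactly over the full training set, so it takes longer to compute but the resulting model tends to perform better. SGDClassifier estimates the parameters with stochastic gradient updates, so it takes less time to compute but the resulting model tends to perform slightly worse. (The book describes the second method as stochastic gradient ascent; the scikit-learn class name stands for stochastic gradient descent, which minimizes a loss instead of maximizing a likelihood.)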
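
To make the stochastic update concrete, here is a minimal pure-Python sketch of logistic regression fitted by stochastic gradient ascent on the log-likelihood. It is an illustration under stated assumptions, not what `SGDClassifier` does internally (that class minimizes a loss by gradient descent and uses the hinge loss by default). The function names, the learning rate, the epoch count and the toy data are all hypothetical choices.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_sga(X, y, learning_rate=0.1, epochs=200, seed=0):
    """Fit weights and bias with one-sample-at-a-time gradient ascent.

    X is a list of feature lists, y a list of 0/1 labels.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    bias = 0.0
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order
        for i in indices:
            z = sum(w * x for w, x in zip(weights, X[i])) + bias
            error = y[i] - sigmoid(z)  # gradient of the log-likelihood w.r.t. z
            # Ascent step: move the parameters in the direction of the gradient.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, X[i])]
            bias += learning_rate * error
    return weights, bias


def predict(X, weights, bias):
    """Return 1 where the predicted probability is at least 0.5, else 0."""
    return [1 if sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) >= 0.5
            else 0
            for row in X]


# Toy, linearly separable data (made-up values, not the breast cancer data).
X_toy = [[-2.0, -1.0], [-1.5, -0.5], [-1.0, -1.5],
         [1.0, 1.5], [1.5, 0.5], [2.0, 1.0]]
y_toy = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic_sga(X_toy, y_toy)
predictions = predict(X_toy, weights, bias)
print(predictions)
```

Each update touches a single sample, which is why this style of training is cheap per step; the price is a noisier estimate of the parameters than a solver that uses the whole training set at every step.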

  • Original article: https://www.cnblogs.com/lilili/p/6413310.html