• 100 Days of Machine Learning, days 4, 5, 6 and 8: logistic regression



     1. Importing the data

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
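    # Raw string: in a normal string '\100' would be parsed as an octal escape ('@').
    # Adjust the path to wherever Social_Network_Ads.csv is stored.
    dataset = pd.read_csv(r'D:\100DaysdatasetsSocial_Network_Ads.csv')
    print(dataset.head(5))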
        User ID  Gender  Age  EstimatedSalary  Purchased
    0  15624510    Male   19            19000          0
    1  15810944    Male   35            20000          0
    2  15668575  Female   26            43000          0
    3  15603246  Female   27            57000          0
    4  15804002    Male   19            76000          0

    Convert the categorical variable (Gender) into dummy variables

    # dtype=int keeps the dummy columns as 0/1 (pandas >= 2.0 returns booleans by default)
    dataset = pd.get_dummies(dataset, columns=['Gender'], dtype=int)
    print(dataset.head())
        User ID  Age  EstimatedSalary  Purchased  Gender_Female  Gender_Male
    0  15624510   19            19000          0              0            1
    1  15810944   35            20000          0              0            1
    2  15668575   26            43000          0              1            0
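Gender_Female and Gender_Male always sum to 1, so one of them is redundant for a linear model. A minimal sketch of dropping the redundant column, assuming a small hypothetical frame (`df`) that only mirrors the column names above rather than the real CSV:

```python
import pandas as pd

# Hypothetical three-row frame mirroring the columns shown above.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female"],
    "Age": [19, 35, 26],
})

# drop_first=True drops the first category in sorted order ('Female'),
# leaving a single 0/1 column; dtype=int keeps 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["Gender"], drop_first=True, dtype=int)
print(encoded)
```

Whether to drop a column is a modelling choice; the post itself keeps both dummy columns.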

    Check for NaN values

    print(dataset.isnull().sum())
    User ID            0
    Age                0
    EstimatedSalary    0
    Purchased          0
    Gender_Female      0
    Gender_Male        0
    dtype: int64
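This dataset has no missing values, so nothing further is needed here. For comparison, a small sketch of what the same check reports when a value is missing, and one simple way to fill it; the frame `df` and the median-fill strategy are illustrative assumptions, not part of the original walkthrough:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing Age, for illustration only.
df = pd.DataFrame({
    "Age": [19.0, np.nan, 26.0],
    "EstimatedSalary": [19000, 20000, 43000],
})

missing = df.isnull().sum()   # per-column count of missing values
print(missing)

# One simple option: fill the numeric gap with the column median.
filled = df.fillna({"Age": df["Age"].median()})
print(filled)
```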

     Splitting the dataset

    # Split into training and test sets
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    X = dataset[['Age','EstimatedSalary','Gender_Female','Gender_Male']]
    ss = StandardScaler()
    X = ss.fit_transform(X)
    Y = dataset['Purchased']
    X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.25,random_state=0)

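    Note that X is standardized (zero mean, unit variance per column) with StandardScaler before the split. Because the scaler is fitted on all rows here, test-set statistics leak into training; fitting it on the training split only is the stricter approach.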
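One way to keep the scaler from seeing the test rows is to put it in a pipeline with the classifier. The sketch below assumes synthetic stand-in data (the arrays `X` and `y` are generated, not read from Social_Network_Ads.csv), so its score says nothing about the real dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (Age, EstimatedSalary); not the real dataset.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(18, 60, 400),
    rng.integers(15000, 150000, 400),
]).astype(float)
y = (X[:, 0] + X[:, 1] / 3000 + rng.normal(0, 5, 400) > 65).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# fit() fits StandardScaler on X_train only; score()/predict() reuse
# those training statistics to transform X_test.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(score)

scaler = model.named_steps["standardscaler"]
```

The fitted `scaler.mean_` matches the column means of `X_train`, which confirms that no test rows contributed to the scaling.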

    2. Logistic regression model

    from sklearn.linear_model import LogisticRegression
    logistic = LogisticRegression()
    logistic.fit(X_train,Y_train)
    y_pred = logistic.predict(X_test)
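To make explicit what `predict` does in the binary case, the sketch below recomputes the class-1 probability by hand as sigmoid(w·x + b) and compares it with `predict_proba`. It uses synthetic data from `make_classification` as an assumption, since the CSV is not available here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, used only to illustrate the computation.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

# Linear score z = w.x + b, squashed through the sigmoid.
z = X @ clf.coef_.ravel() + clf.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

p_sklearn = clf.predict_proba(X)[:, 1]    # probability of class 1
labels = (p_manual > 0.5).astype(int)     # class 1 when probability exceeds 0.5

print(np.allclose(p_manual, p_sklearn))
```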

    3. Evaluating the predictions

     Generating the confusion matrix

    from sklearn import metrics
    cm = metrics.confusion_matrix(Y_test,y_pred)
    print(cm)
    print(metrics.accuracy_score(Y_test,y_pred))
    [[65  3]
     [ 6 26]]
    0.91

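    A confusion matrix is a standard tool in machine learning, and in statistical classification in particular, for judging how well a classifier performs. Taking Purchased = 1 as the positive class:

    TP (True Positive): actual 1, predicted 1

    FN (False Negative): actual 1, predicted 0

    FP (False Positive): actual 0, predicted 1

    TN (True Negative): actual 0, predicted 0

     Matrix (scikit-learn layout, rows = actual class, columns = predicted class):

                  predicted 0   predicted 1
    actual 0          TN            FP
    actual 1          FN            TP

     Overall accuracy:

    accuracy = (TP + TN) / (TP + TN + FP + FN)

     Applied to the example: TN = 65, FP = 3, FN = 6, TP = 26, so accuracy = (26 + 65) / 100 = 0.91.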
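As a check on the numbers, the sketch below unpacks the printed matrix and recomputes accuracy, plus precision and recall, which the post does not report. It assumes scikit-learn's layout (rows = actual class, columns = predicted class) and Purchased = 1 as the positive class:

```python
import numpy as np

# Confusion matrix printed in the example, in scikit-learn's layout:
# rows = actual class (0, 1), columns = predicted class (0, 1).
cm = np.array([[65, 3],
               [6, 26]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # of the predicted buyers, how many actually bought
recall = tp / (tp + fn)      # of the actual buyers, how many were found

print(accuracy, precision, recall)
```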

    4. Logistic regression in detail (day 8)

     Recommended reading

    A translated article: https://blog.csdn.net/Neuf_Soleil/article/details/81712097 (the page links to the original English text).

     

  • Original post: https://www.cnblogs.com/1113127139aaa/p/10273807.html