• R: Logistic Regression


    > ############### Logistic regression
    > setwd("/Users/yaozhilin/Downloads/R_edu/data")
    > accepts<-read.csv("accepts.csv")
    > names(accepts)
     [1] "application_id" "account_number" "bad_ind"        "vehicle_year"   "vehicle_make"  
     [6] "bankruptcy_ind" "tot_derog"      "tot_tr"         "age_oldest_tr"  "tot_open_tr"   
    [11] "tot_rev_tr"     "tot_rev_debt"   "tot_rev_line"   "rev_util"       "fico_score"    
    [16] "purch_price"    "msrp"           "down_pyt"       "loan_term"      "loan_amt"      
    [21] "ltv"            "tot_income"     "veh_mileage"    "used_ind"      
    > accepts<-accepts[complete.cases(accepts),]
    > select<-sample(1:nrow(accepts),length(accepts$application_id)*0.7)
    > train<-accepts[select,]  ### 70% used for model fitting
    > test<-accepts[-select,]  ### 30% held out for testing
    > attach(train)
    > ### Fit with glm(y~x,family=binomial(link="logit"))
    > gl<-glm(bad_ind~fico_score,family=binomial(link = "logit"))
    > summary(gl)
    
    Call:
    glm(formula = bad_ind ~ fico_score, family = binomial(link = "logit"))
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -2.0794  -0.6790  -0.4937  -0.3073   2.6028  
    
    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
    (Intercept)  9.049667   0.629120   14.38   <2e-16 ***
    fico_score  -0.015407   0.000938  -16.43   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 2989.2  on 3046  degrees of freedom
    Residual deviance: 2665.9  on 3045  degrees of freedom
    AIC: 2669.9
    
    Number of Fisher Scoring iterations: 5
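
The coefficients above are on the log-odds scale, so they are easier to read once exponentiated. The following is a minimal sketch that only reuses the numbers printed by `summary(gl)`; the FICO values 600, 700 and 800 are hypothetical inputs chosen for illustration, and the AIC check relies on the fact that for ungrouped 0/1 data the residual deviance equals -2 times the log-likelihood.

```r
# Estimates copied from the summary(gl) output above
b0 <- 9.049667    # intercept
b1 <- -0.015407   # slope for fico_score

# Odds ratios: multiplicative change in the odds of bad_ind = 1
or_1  <- exp(b1)        # per 1-point increase in fico_score
or_50 <- exp(50 * b1)   # per 50-point increase

# Predicted probability of bad_ind = 1 at a few hypothetical scores
fico  <- c(600, 700, 800)
p_bad <- plogis(b0 + b1 * fico)   # plogis(x) = 1 / (1 + exp(-x))

# Likelihood-ratio statistic against the intercept-only model (1 df)
lr_stat <- 2989.2 - 2665.9
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# AIC = residual deviance + 2 * (number of estimated coefficients)
aic_check <- 2665.9 + 2 * 2

print(round(c(or_1 = or_1, or_50 = or_50), 4))
print(round(setNames(p_bad, fico), 3))
print(c(lr_stat = lr_stat, aic_check = aic_check))
```

An odds ratio below 1 means that a higher `fico_score` lowers the odds of a bad loan, which matches the negative sign of the estimate.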

    Multiple logistic regression

    > ### Multiple logistic regression
    > gls<-glm(bad_ind~fico_score+bankruptcy_ind+age_oldest_tr+
    +            tot_derog+rev_util+veh_mileage,family = binomial(link = "logit"))
    > summary(gls)
    
    Call:
    glm(formula = bad_ind ~ fico_score + bankruptcy_ind + age_oldest_tr + 
        tot_derog + rev_util + veh_mileage, family = binomial(link = "logit"))
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -2.2646  -0.6743  -0.4647  -0.2630   2.8177  
    
    Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
    (Intercept)      8.205e+00  7.433e-01  11.039  < 2e-16 ***
    fico_score      -1.338e-02  1.092e-03 -12.260  < 2e-16 ***
    bankruptcy_indY -3.771e-01  1.855e-01  -2.033   0.0421 *  
    age_oldest_tr   -4.458e-03  6.375e-04  -6.994 2.68e-12 ***
    tot_derog        3.012e-02  1.552e-02   1.941   0.0523 .  
    rev_util         3.763e-04  5.252e-04   0.717   0.4737    
    veh_mileage      2.466e-06  1.381e-06   1.786   0.0741 .  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 2989.2  on 3046  degrees of freedom
    Residual deviance: 2601.4  on 3040  degrees of freedom
    AIC: 2615.4
    
    Number of Fisher Scoring iterations: 5
    
    > glss<-step(gls,direction = "both")
    Start:  AIC=2615.35
    bad_ind ~ fico_score + bankruptcy_ind + age_oldest_tr + tot_derog + 
        rev_util + veh_mileage
    
                     Df Deviance    AIC
    - rev_util        1   2601.9 2613.9
    <none>                2601.3 2615.3
    - veh_mileage     1   2604.4 2616.4
    - tot_derog       1   2605.1 2617.1
    - bankruptcy_ind  1   2605.7 2617.7
    - age_oldest_tr   1   2655.9 2667.9
    - fico_score      1   2763.8 2775.8
    
    Step:  AIC=2613.88
    bad_ind ~ fico_score + bankruptcy_ind + age_oldest_tr + tot_derog + 
        veh_mileage
    
                     Df Deviance    AIC
    <none>                2601.9 2613.9
    - veh_mileage     1   2604.9 2614.9
    + rev_util        1   2601.3 2615.3
    - tot_derog       1   2605.7 2615.7
    - bankruptcy_ind  1   2606.1 2616.1
    - age_oldest_tr   1   2656.9 2666.9
    - fico_score      1   2773.2 2783.2
    > # predict() returns values on the logit (log-odds) scale, so they must be converted to probabilities
    > train$pre<-predict(glss,train)
    > summary(train$pre)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     -4.868  -2.421  -1.671  -1.713  -1.011   2.497 
    > train$pre_p<-1/(1+exp(-1*train$pre))
    > summary(train$pre_p)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.00763 0.08157 0.15823 0.19298 0.26677 0.92395 
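
The manual inverse-logit conversion above can also be obtained from `predict()` directly with `type = "response"`, and the same call can score the 30% hold-out set, which the transcript creates but does not use. The sketch below illustrates both steps on simulated data, since `accepts.csv` is not available here; the data frame `sim`, its generating coefficients and the 0.5 cutoff are all assumptions made for the example, not values from the original analysis.

```r
set.seed(42)

# Simulated stand-in for accepts.csv (hypothetical data, not the original file)
n <- 2000
sim <- data.frame(fico_score = round(rnorm(n, mean = 690, sd = 55)))
sim$bad_ind <- rbinom(n, 1, plogis(9 - 0.015 * sim$fico_score))

# 70/30 split, mirroring the transcript
idx       <- sample(seq_len(n), floor(0.7 * n))
sim_train <- sim[idx, ]
sim_test  <- sim[-idx, ]

# Passing data = ... avoids the need for attach()
fit <- glm(bad_ind ~ fico_score, data = sim_train,
           family = binomial(link = "logit"))

# Manual inverse logit vs. predict(type = "response") on the hold-out set
p_manual <- 1 / (1 + exp(-predict(fit, newdata = sim_test)))
p_direct <- predict(fit, newdata = sim_test, type = "response")
same     <- all.equal(unname(p_manual), unname(p_direct))

# Confusion table at a 0.5 cutoff (an arbitrary choice for illustration)
pred_class <- as.integer(p_direct > 0.5)
conf_mat   <- table(actual = sim_test$bad_ind, predicted = pred_class)
accuracy   <- mean(pred_class == sim_test$bad_ind)

print(same)
print(conf_mat)
print(accuracy)
```

With a bad rate near 19%, as in the training summary above, a 0.5 cutoff will label few cases as bad, so in practice the cutoff is usually tuned to the cost of each type of error.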
    > # Logistic regression does not require checks on the error term, but multicollinearity still needs to be checked
    > library(car)
    > vif(glss)
        fico_score bankruptcy_ind  age_oldest_tr      tot_derog    veh_mileage 
          1.271283       1.144846       1.075603       1.423850       1.003616 
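
All VIF values above are close to 1, which suggests little collinearity among the retained predictors. To show what the statistic measures, here is a minimal base-R sketch of the textbook definition VIF_j = 1 / (1 - R_j^2), computed on simulated predictors (`x1`, `x2`, `x3` are hypothetical). For a `glm`, `car::vif` works from the estimated coefficient covariance matrix, so its values may differ somewhat from this unweighted calculation.

```r
set.seed(1)

# Simulated predictors (hypothetical): x2 is built to correlate with x1
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)                      # independent of x1 and x2
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors
manual_vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

print(round(manual_vif, 3))
```

In this simulation the correlated pair `x1` and `x2` should show clearly higher values than `x3`, which stays near 1.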
  • Original post: https://www.cnblogs.com/ye20190812/p/13925635.html