• R: Decision Trees


    C5.0

    > ########################### Decision trees
    > ########C5.0
    > setwd("/Users/yaozhilin/Downloads/R_edu/data")
    > orgdata<-read.csv("Allelectronics.csv")
    > summary(orgdata)
          age       income            student          credit_rating      buys_computer     
     Min.   :1   Length:14          Length:14          Length:14          Length:14         
     1st Qu.:1   Class :character   Class :character   Class :character   Class :character  
     Median :2   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
     Mean   :2                                                                              
     3rd Qu.:3                                                                              
     Max.   :3                                                                              
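    > # C5.0 requires a factor outcome. Numeric predictors are allowed, but here every column
    > # (including age, coded 1-3) is converted to a factor so that it is treated as categorical.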
    > orgdata<-as.data.frame(lapply(orgdata[,1:5],as.factor))
    > library(C50)
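    > # Set the control parameters for the tree.
    > # minCases: smallest number of cases that must fall into at least two branches of a split.
    > # CF: confidence factor in (0, 1); larger values prune less, so the tree gets larger.
    > # winnow: whether to screen predictors (feature selection) before fitting.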
    > tr<-C5.0Control(minCases = 1,CF=0.95,winnow = FALSE,noGlobalPruning = TRUE)
    > model<-C5.0(buys_computer~.,data = orgdata,trials=1,rules=FALSE,control=tr)
    > plot(model)

    > # Generate a rule set instead of a tree
    > rules<-C5.0(buys_computer~.,data = orgdata,trials=1,rules=TRUE,control=tr)
    > summary(rules)
    
    Call:
    C5.0.formula(formula = buys_computer ~ ., data = orgdata, trials = 1, rules = TRUE,
     control = tr)
    
    
    C5.0 [Release 2.07 GPL Edition]      Wed Nov  4 15:12:38 2020
    -------------------------------
    
    Class specified by attribute `outcome'
    
    Read 14 cases (5 attributes) from undefined.data
    
    Rules:
    
    Rule 1: (3, lift 2.2)
        age = 1
        student = no
        ->  class no  [0.800]
    
    Rule 2: (2, lift 2.1)
        age = 3
        credit_rating = excellent
        ->  class no  [0.750]
    
    Rule 3: (4, lift 1.3)
        age = 2
        ->  class yes  [0.833]
    
    Rule 4: (3, lift 1.2)
        age = 3
        credit_rating = fair
        ->  class yes  [0.800]
    
    Rule 5: (7/1, lift 1.2)
        student = yes
        ->  class yes  [0.778]
    
    Default class: yes
    
    
    Evaluation on training data (14 cases):
    
                Rules     
          ----------------
            No      Errors
    
             5    1( 7.1%)   <<
    
    
           (a)   (b)    <-classified as
          ----  ----
             4     1    (a): class no
                   9    (b): class yes
    
    
        Attribute usage:
    
         85.71%    age
         71.43%    student
         35.71%    credit_rating
    
    
    Time: 0.0 secs
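
The bracketed confidences and the lift values in the rule listing above can be checked by hand. The sketch below assumes that C5.0 reports the Laplace ratio (n - m + 1) / (n + 2) for a rule covering n training cases with m of them misclassified, and that lift is this confidence divided by the relative frequency of the predicted class in the training data (5 "no" and 9 "yes", read off the confusion matrix). The helper name `laplace_conf` is made up for this illustration and is not part of the C50 package.

```r
# Laplace-corrected confidence for a rule covering n cases, m of them wrong
# (hypothetical helper, not a C50 function)
laplace_conf <- function(n, m = 0) (n - m + 1) / (n + 2)

# (n, m) pairs and predicted classes read off the rule headers in the summary
n          <- c(3, 2, 4, 3, 7)
m          <- c(0, 0, 0, 0, 1)
rule_class <- c("no", "no", "yes", "yes", "yes")

# Class frequencies in the 14 training cases: 5 "no", 9 "yes"
prior <- c(no = 5 / 14, yes = 9 / 14)

conf <- laplace_conf(n, m)
lift <- conf / unname(prior[rule_class])

print(data.frame(rule = 1:5, class = rule_class,
                 confidence = round(conf, 3), lift = round(lift, 1)))
# confidence: 0.800 0.750 0.833 0.800 0.778
# lift:       2.2   2.1   1.3   1.2   1.2
```

Both columns match the values printed by `summary(rules)`, which supports the assumed formulas for this small example.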

    CART

    ###### Decision tree with CART
    library(rpart)
    library(rpart.plot)
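    # minsplit: minimum number of observations a node must contain before a split is attempted.
    # minbucket: minimum number of observations in any terminal (leaf) node.
    # cp: complexity parameter; a split that does not improve the overall fit by at least cp is not attempted.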
    tc<-rpart.control(minsplit = 1,minbucket = 1,cp=0.001,maxdepth = 6,xval = 10)
    cmodel<-rpart(buys_computer~.,orgdata,parms = list(split="gini"),
                  method = "class",control = tc)
    rpart.plot(cmodel,branch=1,extra=106,under=TRUE,faclen=0,cex=0.8)
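
With `split="gini"`, rpart chooses splits by their reduction in Gini impurity. As a minimal sketch of the quantity involved, the code below computes the impurity of the root node from the class counts shown in the C5.0 output (5 "no", 9 "yes") and of the `age = 2` subset, which Rule 3 reports as 4 cases that are all "yes". The function name `gini` is made up for this illustration and is not an rpart function.

```r
# Gini impurity of a node from its class counts: 1 - sum(p_k^2)
# (hypothetical helper for illustration)
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

root <- c(no = 5, yes = 9)   # all 14 training cases
age2 <- c(no = 0, yes = 4)   # cases with age = 2, all of class "yes"

gini(root)   # 90/196, about 0.459
gini(age2)   # 0, a pure node
```

A pure node has impurity 0, so a split that isolates the `age = 2` cases removes all impurity for those 4 observations.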

    # Prune the tree
    cmodel_p<-prune(cmodel,cp=0.3)
    rpart.plot(cmodel_p)
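
The value `cp=0.3` above is fixed by hand. A common alternative is to read the cross-validated error from the fitted model's `cptable` and prune at the `cp` with the lowest `xerror`. The sketch below uses the built-in `iris` data, not the post's `Allelectronics.csv`, so that it runs on its own. The variable names are chosen for this illustration.

```r
library(rpart)

set.seed(1)  # the cross-validation folds are drawn at random

# Grow a deliberately large tree on a built-in dataset (assumption: iris stands in for orgdata)
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0.001, xval = 10))

# cptable columns: CP, nsplit, rel error, xerror, xstd
cp_table <- fit$cptable
best_row <- which.min(cp_table[, "xerror"])
best_cp  <- cp_table[best_row, "CP"]

# Prune back to the subtree with the lowest cross-validated error
pruned <- prune(fit, cp = best_cp)
printcp(pruned)
```

With only 14 training cases, as in the AllElectronics example, 10-fold cross-validation estimates are very noisy, so a hand-picked `cp` may be just as defensible there.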

  • Original article: https://www.cnblogs.com/ye20190812/p/13926992.html