• Apriori算法实例----Weka,R, Using Weka in my javacode


    学习数据挖掘工具中,下面使用4种工具来对同一个数据集进行研究。

    数据描述:下面这些数据是15个同学选修课程情况,在课程大纲中共有10门课程供学生选择,下面给出具体的选课情况,以ARFF数据文件保存,名称为TestStudenti.arff。我使用Apriori算法期望挖掘出学生选课的关联规则。

    @relation test_studenti


    @attribute Arbori_binari_de_cautare {TRUE, FALSE}
    @attribute Arbori_optimali {TRUE, FALSE}
    @attribute Arbori_echilibrati_in_inaltime {TRUE, FALSE}
    @attribute Arbori_Splay {TRUE, FALSE}
    @attribute Arbori_rosu_negru {TRUE, FALSE}
    @attribute Arbori_2_3 {TRUE, FALSE}
    @attribute Arbori_B {TRUE, FALSE}
    @attribute Arbori_TRIE {TRUE, FALSE}
    @attribute Sortare_topologica {TRUE, FALSE}
    @attribute Algoritmul_Dijkstra {TRUE, FALSE}


    @data
    TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE
    TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE
    FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE
    FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE
    TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE
    TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE
    FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE
    TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,FALSE
    FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE
    TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE
    FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
    TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE
    FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE
    TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE
    TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE

    (一) Weka 使用实例

    在Apriori算法中,设置minSupprot=50%, 最小置信度 minimum confidence 也设置为50%。Weka配置路径为 Explore-》Openfile(TestStudenti.arff)->Associate 点击配置参数信息

    在算法完成之后,我们得到以下结果:

    Best rules found:

    1. Sortare_topologica=FALSE 13 ==> Arbori_TRIE=TRUE 13 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    2. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE 11 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    3. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    4. Arbori_optimali=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
    5. Arbori_echilibrati_in_inaltime=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    6. Arbori_optimali=TRUE Sortare_topologica=FALSE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    7. Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
    8. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
    9. Arbori_binari_de_cautare=TRUE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    10. Arbori_B=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    11. Arbori_rosu_negru=TRUE Sortare_topologica=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    12. Arbori_TRIE=TRUE 15 ==> Sortare_topologica=FALSE 13 <conf:(0.87)> lift:(1) lev:(0) [0] conv:(0.67)
    13. Arbori_rosu_negru=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
    14. Arbori_rosu_negru=TRUE Arbori_TRIE=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
    15. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
    16. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
    17. Arbori_TRIE=TRUE Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
    18. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
    19. Arbori_TRIE=TRUE 15 ==> Arbori_rosu_negru=TRUE 11 <conf:(0.73)> lift:(1) lev:(0) [0] conv:(0.8)
    20. Sortare_topologica=FALSE 13 ==> Arbori_rosu_negru=TRUE 9 <conf:(0.69)> lift:(0.94) lev:(-0.04) [0] conv:(0.69)

    分析第一条结果,我们可以得出关联规则: 如果一个学生不参加Sortare topologica 课程,那么他的一个趋向是肯定不会参加 Arbori TRIE课程。这条关联规则的置信度是100%,是非常可信的。

    (二) Using Weka in my Javacode 

    展示Java代码,运行程序可以得到和上面一样的结果

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import weka.associations.Apriori;
    import weka.core.Instances;
    public class Main{
      public static void main(String[] args) {
        Instances data = null;
        try {
        BufferedReader reader = new BufferedReader( new FileReader( "TestStudenti.arff" ) );
        data = new Instances(reader);
        reader.close();
        data.setClassIndex(data.numAttributes() - 1);
      }
      catch ( IOException e ) {
        e.printStackTrace();
      }
      double deltaValue = 0.05;
      double lowerBoundMinSupportValue = 0.1;
      double minMetricValue = 0.5;
      int numRulesValue = 20;
      double upperBoundMinSupportValue = 1.0;
      String resultapriori;
      Apriori apriori = new Apriori();
      apriori.setDelta(deltaValue);
      apriori.setLowerBoundMinSupport(lowerBoundMinSupportValue);
      apriori.setNumRules(numRulesValue);
      apriori.setUpperBoundMinSupport(upperBoundMinSupportValue);
      apriori.setMinMetric(minMetricValue);
      try{
        apriori.buildAssociations( data );
      }
      catch ( Exception e ) {
        e.printStackTrace();
      }
      resultapriori = apriori.toString();
      System.out.println(resultapriori);

      }

    }

    (三) Using Weka in R

    程序很简单,仅仅三行代码搞定。

    library(RWeka);
    data <- read.arff("D:/test.studenti.arff");
    Apriori(data,control=Weka_control(N=20,T =0,C =0.5,D =0.05, U= 1.0,M =0.5, S =-1.0, c =-1))

    运行结果:

    Apriori
    =======

    Minimum support: 0.6 (9 instances)
    Minimum metric <confidence>: 0.5
    Number of cycles performed: 8

    Generated sets of large itemsets:

    Size of set of large itemsets L(1): 7

    Size of set of large itemsets L(2): 8

    Size of set of large itemsets L(3): 2

    Best rules found:

    1. Sortare_topologica=FALSE 13 ==> Arbori_TRIE=TRUE 13 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    2. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE 11 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    3. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    4. Arbori_optimali=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
    5. Arbori_echilibrati_in_inaltime=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    6. Arbori_optimali=TRUE Sortare_topologica=FALSE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    7. Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
    8. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
    9. Arbori_binari_de_cautare=TRUE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    10. Arbori_B=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    11. Arbori_rosu_negru=TRUE Sortare_topologica=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
    12. Arbori_TRIE=TRUE 15 ==> Sortare_topologica=FALSE 13 <conf:(0.87)> lift:(1) lev:(0) [0] conv:(0.67)
    13. Arbori_rosu_negru=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
    14. Arbori_rosu_negru=TRUE Arbori_TRIE=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
    15. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
    16. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
    17. Arbori_TRIE=TRUE Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
    18. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
    19. Arbori_TRIE=TRUE 15 ==> Arbori_rosu_negru=TRUE 11 <conf:(0.73)> lift:(1) lev:(0) [0] conv:(0.8)
    20. Sortare_topologica=FALSE 13 ==> Arbori_rosu_negru=TRUE 9 <conf:(0.69)> lift:(0.94) lev:(-0.04) [0] conv:(0.69)

  • 相关阅读:
    基于visual Studio2013解决C语言竞赛题之1054抽牌游戏
    基于visual Studio2013解决C语言竞赛题之1053洗牌
    基于visual Studio2013解决C语言竞赛题之1052求根
    CSS样式
    CSS引入方式,高级选择器
    CSS基础,选择器
    html基础
    sql server链接查询
    [置顶]动态网页开发基础【笔记】
    sql server链接查询
  • 原文地址:https://www.cnblogs.com/7899-89/p/3551418.html
Copyright © 2020-2023  润新知