• A Python Implementation of the Apriori Algorithm


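    Apriori is the pioneering algorithm for frequent-pattern mining in data mining. It was proposed by Agrawal and Srikant in 1994, and the idea behind it is simple and straightforward. First mine the frequent patterns of length 1. Then, starting at k = 2, merge the frequent patterns of length k-1 into candidate patterns of length k, count how often each candidate occurs in the transactions, and make sure that all of its subsets of length k-1 are frequent.

    Note that, to avoid generating the same candidate twice, the merge only combines two patterns whose first k-2 items are identical and whose (k-1)-th item in one pattern is smaller than the (k-1)-th item in the other.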
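
    The join and prune rules described above can be sketched in a few lines. This is a minimal illustration rather than the author's code: itemsets are assumed to be stored as sorted tuples, and the helper names `join_step` and `prune`, as well as the sample set `L2`, are hypothetical.

```python
from itertools import combinations


def join_step(frequent_prev):
    """Join frequent (k-1)-itemsets that share their first k-2 items."""
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            # Requiring a[-1] < b[-1] means each candidate is generated once.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.add(a + (b[-1],))
    return candidates


def prune(candidates, frequent_prev):
    """Keep only candidates whose every (k-1)-subset is frequent."""
    return {
        c for c in candidates
        if all(s in frequent_prev for s in combinations(c, len(c) - 1))
    }


# Hypothetical frequent 2-itemsets, each stored as a sorted tuple.
L2 = {('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd')}
C3 = join_step(L2)
print(sorted(C3))             # candidates produced by the join
print(sorted(prune(C3, L2)))  # candidates that survive the subset check
```

    Here the join yields two candidates, and the subset check discards the one containing the pair ('c', 'd'), which is not in `L2`.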

    Below is a Python implementation of the algorithm:

    __author__ = 'linfuyuan'

    # Python 3 version. Items are assumed to be single characters (e.g. a,b,c),
    # because each itemset is stored as a string of its items in sorted order.
    min_frequency = int(input('please input min_frequency:'))
    file_name = input('please input the transaction file:')
    transactions = []


    def all_subsets_frequent(candidate, Lk):
        """Return True if every (k-1)-subset of the sorted candidate is in Lk."""
        for i in range(len(candidate)):
            subset = candidate[:i] + candidate[i + 1:]
            if ''.join(subset) not in Lk:
                return False
        return True


    def count_frequency(candidate, transactions):
        """Count the transactions that contain every item of the candidate."""
        count = 0
        for transaction in transactions:
            if transaction.issuperset(candidate):
                count += 1
        return count


    # Read one comma-separated transaction per line, skipping blank lines.
    with open(file_name) as f:
        for line in f:
            tokens = [token.strip() for token in line.split(',') if token.strip()]
            if tokens:
                transactions.append(set(tokens))

    # Count every single item and keep those that reach the threshold.
    item_counts = {}
    for transaction in transactions:
        for item in transaction:
            item_counts[item] = item_counts.get(item, 0) + 1
    Lk = set()
    for item, count in item_counts.items():
        if count >= min_frequency:
            Lk.add(item)
    print(', '.join(sorted(Lk)))

    while Lk:
        newLk = set()
        for itemset1 in Lk:
            for itemset2 in Lk:
                # Join only if the first k-2 items match and the last item of
                # itemset1 sorts before the last item of itemset2 (no duplicates).
                if itemset1[:-1] == itemset2[:-1] and itemset1[-1] < itemset2[-1]:
                    newitemset = list(itemset1) + [itemset2[-1]]
                    if (all_subsets_frequent(newitemset, Lk)
                            and count_frequency(newitemset, transactions) >= min_frequency):
                        newLk.add(''.join(newitemset))
        if newLk:
            print(', '.join(sorted(newLk)))
        Lk = newLk
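
    The script above stores each itemset as a joined string, so it only works reliably when every item is a single character. The following is a minimal sketch of one way around that limitation, assuming itemsets are kept as sorted tuples instead. The function name `apriori` and the sample transactions are hypothetical and do not come from the original post.

```python
from itertools import combinations


def apriori(transactions, min_frequency):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    transactions = [set(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_frequency}

    frequent = {}
    while level:
        frequent.update(level)
        next_level = {}
        for a in level:
            for b in level:
                # Join: same first k-2 items, last item of a sorts before b's.
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    candidate = a + (b[-1],)
                    # Prune: every (k-1)-subset must already be frequent.
                    if all(s in level
                           for s in combinations(candidate, len(candidate) - 1)):
                        support = sum(1 for t in transactions
                                      if t.issuperset(candidate))
                        if support >= min_frequency:
                            next_level[candidate] = support
        level = next_level
    return frequent


# Hypothetical sample data with multi-character item names.
sample = [
    ['bread', 'milk'],
    ['bread', 'diapers', 'beer', 'eggs'],
    ['milk', 'diapers', 'beer', 'cola'],
    ['bread', 'milk', 'diapers', 'beer'],
    ['bread', 'milk', 'diapers', 'cola'],
]
result = apriori(sample, min_frequency=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, support)
```

    Sorted tuples keep the prefix comparison in the join rule meaningful for items of any length, and returning the support counts alongside the itemsets avoids recounting them later.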
    


    Copyright notice: This is an original blog post and may not be reproduced without permission.

  • Original article: https://www.cnblogs.com/mfrbuaa/p/4620279.html