• K-means clustering


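    1) Randomly pick K of the N documents as the initial centroids.
    2) For each remaining document, measure its distance to every centroid and assign it to the cluster of the nearest centroid.
    3) Recompute the centroid of each resulting cluster.
    4) Repeat steps 2 and 3 until the new centroids equal the previous ones, or the change between them falls below a specified threshold; the algorithm then ends.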
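The four steps above can be sketched directly in Python. This is only a minimal illustration under stated assumptions: the function name `kmeans_steps`, the Euclidean distance, the `threshold`, `max_iter` and `seed` parameters, and the choice to keep the old centroid of an empty cluster are all hypothetical and do not come from the original post. It follows the steps literally (initial centroids are K of the documents, and iteration stops once no centroid moves by more than the threshold).

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_steps(docs, k, threshold=1e-6, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k of the N documents as the initial centroids
    centroids = [list(d) for d in rng.sample(docs, k)]
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every document to the cluster of its nearest centroid
        groups = [[] for _ in range(k)]
        for d in docs:
            nearest = min(range(k), key=lambda i: euclidean(d, centroids[i]))
            groups[nearest].append(d)
        # Step 3: recompute each centroid as the mean of its members
        new_centroids = []
        for i, members in enumerate(groups):
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[i])  # empty cluster: keep old centroid
        # Step 4: stop once no centroid moved by more than the threshold
        shift = max(euclidean(c, n) for c, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= threshold:
            break
    return centroids, groups


if __name__ == "__main__":
    docs = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [9.0, 9.0], [9.2, 8.8], [8.8, 9.2]]
    centroids, groups = kmeans_steps(docs, k=2)
    print(centroids)
    print(groups)
```

The result depends on which documents are drawn in step 1, so different seeds can give different clusterings; k-means only guarantees a local optimum.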

    Python implementation of k-means clustering:

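    import random
    from math import sqrt

    def pearson(v1, v2):
        # Assumed helper (the original post does not show it): Pearson distance,
        # i.e. 1 - r, so that smaller values mean more similar vectors
        n = len(v1)
        sum1, sum2 = sum(v1), sum(v2)
        sum1_sq = sum(v * v for v in v1)
        sum2_sq = sum(v * v for v in v2)
        p_sum = sum(a * b for a, b in zip(v1, v2))
        num = p_sum - sum1 * sum2 / float(n)
        den = sqrt(max((sum1_sq - sum1 ** 2 / float(n)) * (sum2_sq - sum2 ** 2 / float(n)), 0.0))
        if den == 0:
            return 1.0  # assumption: treat an undefined correlation as "uncorrelated"
        return 1.0 - num / den

    def kcluster(rows, distance=pearson, k=4):
        # Determine the minimum and maximum values for each column
        ncols = len(rows[0])
        ranges = [(min(row[i] for row in rows), max(row[i] for row in rows))
                  for i in range(ncols)]

        # Create k randomly placed centroids inside those ranges
        # (note: random points, not k of the input rows as in step 1 of the description)
        clusters = [[random.random() * (ranges[i][1] - ranges[i][0]) + ranges[i][0]
                     for i in range(ncols)] for j in range(k)]

        lastmatches = None
        for t in range(100):
            print('Iteration %d' % t)
            bestmatches = [[] for i in range(k)]

            # Find which centroid is the closest for each row
            # (each distance is computed once; ties go to the lowest cluster index)
            for j, row in enumerate(rows):
                dists = [distance(clusters[i], row) for i in range(k)]
                bestmatches[dists.index(min(dists))].append(j)

            # If the assignments are the same as last time, clustering is complete
            if bestmatches == lastmatches:
                break
            lastmatches = bestmatches

            # Move each non-empty centroid to the average of its members
            for i in range(k):
                if len(bestmatches[i]) > 0:
                    members = [rows[rowid] for rowid in bestmatches[i]]
                    clusters[i] = [sum(col) / float(len(members)) for col in zip(*members)]

        # One list of row indices per cluster
        return bestmatches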
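The function returns, for each cluster, a list of row indices rather than the rows themselves, and a cluster's list may be empty. As a small sketch of how that result could be turned into one cluster label per row, the helper below uses a hand-written example result; the name `labels_from_matches` and the sample data are hypothetical and not part of the original post.

```python
def labels_from_matches(bestmatches, n_rows):
    """Turn per-cluster lists of row indices into one cluster label per row."""
    labels = [None] * n_rows
    for cluster_id, members in enumerate(bestmatches):
        for row_id in members:
            labels[row_id] = cluster_id
    return labels


# Hypothetical result for 5 rows and k=3 (the third cluster ended up empty)
example_matches = [[0, 3], [1, 2, 4], []]
print(labels_from_matches(example_matches, 5))  # [0, 1, 1, 0, 1]
```

Rows 0 and 3 fall in cluster 0 and rows 1, 2 and 4 in cluster 1, which is why the empty third cluster never appears among the labels.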
  • Original post: https://www.cnblogs.com/huanhuanang/p/5253055.html