K-Nearest Neighbors Classification in Python

K-Nearest Neighbors (KNN): a classification algorithm

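* KNN is a non-parametric classifier (it makes no assumption about the form of the distribution and estimates the probability density directly from the data), and it is a form of memory-based learning.

* KNN is poorly suited to high-dimensional data (the curse of dimensionality).

* There are many Python machine learning libraries, such as mlpy, among other packages; the implementation here is only meant as a way to learn the method.

* For calling KNN from MATLAB, see the post "MATLAB分类器大全(svm,knn,随机森林等)" (a roundup of MATLAB classifiers: SVM, KNN, random forest, and so on).

* KNN is computationally expensive, since a brute-force query compares the test point against every training sample (this can be sped up with a KD-tree; in C, libkdtree or ANN can be used).

* The smaller k is, the more easily the classifier overfits, but a very large k lowers classification accuracy (consider the two extremes: k=1 and k=N, where N is the number of samples).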
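
To make the two extremes in the last bullet concrete, here is a minimal sketch in plain Python (no NumPy). The helper `knn_predict` and the eight-point dataset, including one deliberately mislabeled point, are made up for illustration and are not part of the original post.

```python
from collections import Counter


def knn_predict(query, train_x, train_y, k):
    """Majority vote among the k training points closest to query (Euclidean)."""
    def sq_dist(point):
        # Squared distance gives the same ordering as the true distance.
        return sum((q - p) ** 2 for q, p in zip(query, point))

    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: a 'B' cluster near (0, 0), an 'A' cluster near (1, 1),
# plus one mislabeled 'A' point sitting inside the 'B' cluster.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2),
           (0.15, 0.15)]
train_y = ['B', 'B', 'B', 'A', 'A', 'A', 'A', 'A']
n = len(train_x)

near_noise = (0.15, 0.16)
pred_k1 = knn_predict(near_noise, train_x, train_y, 1)  # trusts the single noisy neighbour
pred_k3 = knn_predict(near_noise, train_x, train_y, 3)  # the neighbours outvote the noise
pred_kn = knn_predict((0.0, 0.0), train_x, train_y, n)  # always the global majority class

print(pred_k1, pred_k3, pred_kn)
```

With k=1 the prediction next to the mislabeled point simply copies the noise. With k=3 the surrounding 'B' points win the vote. With k=N every query, even one in the middle of the 'B' cluster, receives the overall majority class 'A'.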

This post does not cover the theory; see the comments in the code.

KNN.py
```python
import numpy as np
import operator


class KNN:
    def createDataset(self):
        group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
        labels = ['A', 'A', 'B', 'B']
        return group, labels

    def KnnClassify(self, testX, trainX, labels, K):
        [N, M] = trainX.shape

        # calculate the distance between testX and every training sample
        # np.tile repeats testX N times (like repmat in MATLAB)
        difference = np.tile(testX, (N, 1)) - trainX
        difference = difference ** 2      # square each coordinate difference
        distance = difference.sum(1)      # sum over all dimensions
        distance = distance ** 0.5
        sortdiffidx = distance.argsort()  # indices ordered from nearest to farthest

        # find the k nearest neighbours
        vote = {}  # create the dictionary
        for i in range(K):
            ith_label = labels[sortdiffidx[i]]
            # get(ith_label, 0): return vote[ith_label] if the key exists, else 0
            vote[ith_label] = vote.get(ith_label, 0) + 1
        # items() works on both Python 2 and Python 3 (iteritems() is Python 2 only)
        sortedvote = sorted(vote.items(), key=lambda x: x[1], reverse=True)
        # 'key = lambda x: x[1]' can be substituted by operator.itemgetter(1)
        return sortedvote[0][0]


k = KNN()  # create KNN object
group, labels = k.createDataset()
cls = k.KnnClassify([0, 0], group, labels, 3)
print(cls)
```
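
As a hand check of what `KnnClassify` computes for the query `[0,0]` with K=3, the following sketch traces the same steps in plain Python (no NumPy) on the post's four training points. The variable names `distances`, `order`, and `nearest_labels` are only illustrative.

```python
import math

# Same toy data as createDataset() in the post.
train = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ['A', 'A', 'B', 'B']
query = [0.0, 0.0]

# Step 1: Euclidean distance from the query to each training sample.
distances = [math.sqrt(sum((q - t) ** 2 for q, t in zip(query, row)))
             for row in train]

# Step 2: indices sorted from nearest to farthest (what argsort() returns).
order = sorted(range(len(train)), key=lambda i: distances[i])

# Step 3: labels of the K = 3 nearest samples, which are then counted.
nearest_labels = [labels[i] for i in order[:3]]

print([round(d, 3) for d in distances])  # [1.487, 1.414, 0.0, 0.1]
print(order)                             # [2, 3, 1, 0]
print(nearest_labels)                    # ['B', 'B', 'A']
```

Two of the three nearest neighbours are labeled B, so the vote returns B.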
-------------------
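Running:

1. KNN.py can be run from the Python shell:

>>> import os

>>> os.chdir("/Users/mba/Documents/Study/Machine_Learning/Python/KNN")

>>> exec(open("KNN.py").read())  # on Python 2, execfile("KNN.py") also works

This prints B, the predicted class label.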

2. Or run it directly from a terminal:

$ python KNN.py

3. Alternatively, leave the print out of KNN.py and get the result in the shell instead, i.e.,

>>> import KNN

>>> KNN.k.KnnClassify([0,0],KNN.group,KNN.labels,3)
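
Option 3 works because `import KNN` executes the module-level statements. One common way to keep the printed result for `python KNN.py` while importing silently is an `if __name__ == "__main__":` guard. The sketch below shows that layout with a stand-in `classify` function; it is a hypothetical helper written in plain Python, not the `KnnClassify` method from the post.

```python
from collections import Counter


def classify(test_x, train_x, labels, k):
    """Stand-in KNN vote (hypothetical helper, not the post's KnnClassify)."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(test_x, row))

    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Module-level data stays importable, e.g. as KNN.group and KNN.labels.
group = [[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]
labels = ['A', 'A', 'B', 'B']

if __name__ == "__main__":
    # Runs for `python KNN.py`, skipped for `import KNN`.
    print(classify([0, 0], group, labels, 3))
```

With this layout the shell session in option 3 stays quiet on import, and the result only appears when the function is called explicitly.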

from: http://blog.csdn.net/abcjennifer/article/details/19757987
Original post: https://www.cnblogs.com/GarfieldEr007/p/5354722.html