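How it works:
Given a training data set and a new input instance, find the K instances in the training set that are closest to the input (the K neighbors mentioned above). The input is then assigned to the class that the majority of those K instances belong to.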
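The procedure described above can be sketched in a few lines. This is only a minimal illustration, not the kNN.py implementation: the function name knn_classify, the use of Euclidean distance, collections.Counter for the vote, and the 'A'/'B' labels are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def knn_classify(query, data, labels, k):
    """Classify `query` by majority vote among its k nearest rows of `data`."""
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training instance
    # (an assumed metric; the text does not name one).
    distances = np.sqrt(((data - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated clusters.
data = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ["A", "A", "B", "B"]
print(knn_classify([0.1, 0.2], data, labels, k=3))  # prints B
```

With k=3 the three closest points to [0.1, 0.2] carry the labels B, B and A, so the majority vote gives B.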
Code example:
kNN.py
from numpy import *
import operator

def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['B', 'B', 'B', 'B']
    return group, labels
>>> import kNN
>>> group, labels = kNN.createDataSet()
>>> group
array([[ 1. ,  1.1],
       [ 1. ,  1. ],
       [ 0. ,  0. ],
       [ 0. ,  0.1]])
>>> labels
['B', 'B', 'B', 'B']
Reading data
Length of the first dimension of the array (the number of rows):
>>> group.shape[0]
4
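Matrix operations:
>>> import numpy as np
>>> a = np.matrix('1 2 7; 3 4 8; 0 3 10')
>>> a.min(0)
matrix([[0, 2, 7]])
>>> a.min(1)
matrix([[1],
        [3],
        [0]])
min([axis, out]): returns the minimum along the given axis (axis 0 gives the minimum of each column, axis 1 the minimum of each row).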
Replicating arrays
tile(A, reps) builds a new array by repeating A the number of times given by reps.
>>> from numpy import *
>>> a = [0, 1, 2]
>>> b = tile(a, 2)
>>> b
array([0, 1, 2, 0, 1, 2])
>>> b = tile(a, (1, 2))
>>> b
array([[0, 1, 2, 0, 1, 2]])
>>> b = tile(a, (2, 1))
>>> b
array([[0, 1, 2],
       [0, 1, 2]])
>>> b = tile(a, (2, 2))
>>> b
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])
Constructing arrays
>>> zeros((3,4))
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
sum
>>> c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> print(c.sum())
45
>>> print(c.sum(axis=0))
[12 15 18]
>>> print(c.sum(axis=1))
[ 6 15 24]
argsort
Returns the indices that would sort the array in ascending order.
>>> x = np.array([3, 1, 2])
>>> np.argsort(x)
array([1, 2, 0])
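As a rough illustration of how tile, sum and argsort might be combined for the distance step of kNN, the sketch below computes the Euclidean distance from one query point to every row of the group array. The query point and the choice of Euclidean distance are assumptions made for the example.

```python
import numpy as np

group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
query = [0.0, 0.0]  # hypothetical query point

# Repeat the query once per row so it has the same shape as group.
tiled = np.tile(query, (group.shape[0], 1))
diff = tiled - group

# Sum the squared differences along each row, then take the square root.
distances = np.sqrt((diff ** 2).sum(axis=1))

# Row indices ordered from nearest to farthest.
order = np.argsort(distances)
print(order)  # prints [2 3 1 0]
```

Slicing the result, for example order[:3], would give the indices of the three nearest neighbors.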
sorted
Sort by a key.
>>> students = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
>>> from operator import itemgetter
>>> sorted(students, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
>>> sorted(students, key=itemgetter(1))
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
get
Returns the value stored under a dictionary key, or the given default if the key is missing.
>>> a = {1: 1, 3: 1}
>>> a.get(1, 0)
1
>>> a.get(3, 0)
1
>>> a.get(4, 0)
0
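One plausible use of get and sorted together is the voting step of kNN: tally how often each label occurs among the nearest neighbors, then pick the most frequent one. The neighbor labels below are made up for the example, and the variable names are illustrative only.

```python
from operator import itemgetter

# Hypothetical labels of the k nearest neighbors.
neighbor_labels = ["B", "A", "B"]

class_count = {}
for label in neighbor_labels:
    # get() returns the default 0 the first time a label is seen.
    class_count[label] = class_count.get(label, 0) + 1

# Sort the (label, count) pairs by count, largest first.
ranked = sorted(class_count.items(), key=itemgetter(1), reverse=True)
print(ranked[0][0])  # prints B
```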