声明:如有侵权,请与本人联系,会尽快删除。
致谢:感谢网络中各位老师与前辈的分享,其中某些部分是在研读老师与前辈的文章之后才明白的,具体链接附在了对应代码附近。
KNN : 邻近算法,或者说K最近邻(kNN,k-NearestNeighbor)分类算法
图解
KNN算法中不使用循环的方法原理:完全平方公式
参考链接:https://blog.csdn.net/qq_27261889/article/details/84891734
代码实现
分三个文件:1.数据加载 2.距离计算 3.训练与测试
所用数据集:cifar-10-batches-py
1. 数据加载部分:data_utils.py
import pickle
import numpy as np
import os
def load_cifar_batch(filename):
with open(filename,'rb') as f :
datadict=pickle.load(f,encoding='bytes')
x=datadict[b'data'] # b表示以二进制的方式读取数据
y=datadict[b'labels']
x=x.reshape(10000,3,32,32).transpose(0,2,3,1).astype('float')
y=np.array(y)
return x,y
def load_cifar10(root):
xs=[]
ys=[]
for b in range(1,6):
f=os.path.join(root,'data_batch_%d' % (b,))
x,y=load_cifar_batch(f)
xs.append(x)
ys.append(y)
Xtrain=np.concatenate(xs) #1
Ytrain=np.concatenate(ys)
del x ,y
Xtest,Ytest=load_cifar_batch(os.path.join(root,'test_batch')) #2
return Xtrain,Ytrain,Xtest,Ytest
2. 预测与距离计算实现:knn.py (利用到了 python 的广播机制)
import numpy as np
class KNearestNeighbor:
def __init__(self):
pass
def train(self,X,y):
self.X_train=X
self.y_train=y
def predict(self,X,k=1,num_loops=0): #1
if num_loops== 0:
dists=self.compute_distances_no_loops(X)
elif num_loops==1:
dists=self.compute_distances_one_loop(X)
elif num_loops==2:
dists=self.compute_distances_two_loops(X)
else:
raise ValueError('Invalid value %d for num_loops' % num_loops)
return self.predict_labels(dists,k=k)
def cumpute_distances_two_loops(self,X):
num_test=X.shape[0]
num_train=self.X_train.shape[0]
# 建立一个num_test 行,num_train列的二维矩阵,
# 矩阵中i行j列表示:第i 个测试数据与第j个训练数据之间的距离
dists=np.zeros((num_test,num_train))
print(X.shape,self.X_train.shape)
for i in range(num_test):
for j in range(num_train):
dists[i,j]=np.sqrt(np.sum((X[i,:]-self.X_train[j,:])**2))
return dists
def compute_distances_one_loop(self,X):
num_test=X.shape[0]
num_train=self.X_train.shape[0]
dists=np.zeros((num_test,num_train))
for i in