1. Copy the code to F:\studio\MachineLearningInAction\ch14.
2. Start ipython.
3. In ipython, change the working directory to F:\studio\MachineLearningInAction\ch14:
In [17]: cd F:\studio\MachineLearningInAction\ch14
F:\studio\MachineLearningInAction\ch14
4. Create a new file svdRec.py in the working directory and add the following code:
from numpy import *
from numpy import linalg as la

# Small example matrix: rows are users, columns are items.
def loadExData():
    return [[0, 0, 0, 2, 2],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1],
            [1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 1, 0, 0]]
5. Perform the SVD and verify the decomposition:
In [18]: import svdRec

In [19]: Data=svdRec.loadExData()

In [20]: U,Sigma,VT=linalg.svd(Data)

In [21]: Sigma
Out[21]:
array([ 9.64365076e+00, 5.29150262e+00, 9.11145502e-16,
        1.40456183e-16, 3.09084552e-17])

In [22]: Sig2=mat([[Sigma[0],0],[0,Sigma[2]]])

In [23]: Sig2
Out[23]:
matrix([[ 9.64365076e+00, 0.00000000e+00],
        [ 0.00000000e+00, 9.11145502e-16]])

In [24]: Sig2=mat([[Sigma[0],0],[0,Sigma[1]]])

In [25]: Sig2
Out[25]:
matrix([[ 9.64365076, 0. ],
        [ 0. , 5.29150262]])

In [26]: U[:,:2]*Sig2*VT[:2,:]
Out[26]:
matrix([[ -1.36157966e-16, -8.59140046e-16, -8.59140046e-16, 2.00000000e+00, 2.00000000e+00],
        [ 7.22982080e-16, -3.61491040e-16, -3.61491040e-16, 3.00000000e+00, 3.00000000e+00],
        [ 2.40994027e-16, -1.20497013e-16, -1.20497013e-16, 1.00000000e+00, 1.00000000e+00],
        [ 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, -8.60707644e-18, -8.60707644e-18],
        [ 2.00000000e+00, 2.00000000e+00, 2.00000000e+00, -1.72141529e-17, -1.72141529e-17],
        [ 5.00000000e+00, 5.00000000e+00, 5.00000000e+00, -1.39716789e-16, -1.39716789e-16],
        [ 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, -8.60707644e-18, -8.60707644e-18]])
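As the output shows, U[:,:2]*Sig2*VT[:2,:] is a very good approximation of the original Data matrix: only the first two singular values are significantly greater than zero, and the reconstructed entries differ from Data only by floating-point noise on the order of 1e-16.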
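The check above can also be done numerically instead of by eye. The following is a minimal sketch in Python 3 with plain NumPy arrays (an assumption; the post itself uses Python 2 and numpy.matrix). The names `k`, `approx`, and `max_error` are illustrative and not part of svdRec.py.

```python
import numpy as np

# Same 7x5 matrix that loadExData() returns in the post.
data = np.array([[0, 0, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1],
                 [1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 1, 0, 0]], dtype=float)

u, sigma, vt = np.linalg.svd(data)

k = 2  # number of singular values to keep
# Rebuild the matrix from the first k singular triplets.
approx = u[:, :k] @ np.diag(sigma[:k]) @ vt[:k, :]

# Largest absolute difference between the rank-k rebuild and the original.
max_error = np.abs(approx - data).max()
print("largest reconstruction error:", max_error)
```

Because this matrix has only two linearly independent rows, keeping two singular values loses nothing beyond rounding error.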
6. Add the following code to svdRec.py:
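# Similarity from Euclidean distance: 1 when the vectors are identical.
def ecludSim(inA,inB):
    return 1.0/(1.0 + la.norm(inA - inB))

# Pearson correlation rescaled from [-1, 1] to [0, 1].
def pearsSim(inA,inB):
    if len(inA) < 3 : return 1.0
    return 0.5+0.5*corrcoef(inA, inB, rowvar = 0)[0][1]

# Cosine similarity rescaled from [-1, 1] to [0, 1].
def cosSim(inA,inB):
    num = float(inA.T*inB)
    denom = la.norm(inA)*la.norm(inB)
    return 0.5+0.5*(num/denom)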
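The code above defines three similarity measures: ecludSim (based on Euclidean distance), pearsSim (Pearson correlation), and cosSim (cosine similarity). Each is scaled so that a larger value means more similar, with a maximum of 1.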
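To get a feel for how the three measures behave, here is a small sketch on 1-D NumPy arrays. It is written for Python 3, and the snake_case function names and the sample vectors `a`, `b`, `c` are assumptions made for illustration; the formulas mirror the ones in svdRec.py.

```python
import numpy as np

def euclid_sim(a, b):
    # 1 / (1 + distance): equals 1 for identical vectors, tends to 0 as they move apart.
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def pearson_sim(a, b):
    # Too few points for a meaningful correlation: treat as fully similar.
    if len(a) < 3:
        return 1.0
    # Rescale the correlation from [-1, 1] to [0, 1].
    return 0.5 + 0.5 * np.corrcoef(a, b)[0, 1]

def cos_sim(a, b):
    # Rescale the cosine of the angle from [-1, 1] to [0, 1].
    return 0.5 + 0.5 * (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the length
c = np.array([3.0, 2.0, 1.0])  # a reversed

print(euclid_sim(a, b))   # below 1: the vectors are 3.74 apart
print(cos_sim(a, b))      # approximately 1: same direction
print(pearson_sim(a, c))  # approximately 0: perfectly anti-correlated
```

Note that the cosine and Pearson measures ignore scale (`a` and `b` count as fully similar), while the Euclidean measure does not.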
7. Generate recommendations with the naive similarity-based method:
In [44]: reload(svdRec)
Out[44]: <module 'svdRec' from 'svdRec.py'>

In [45]: myMat=mat(svdRec.loadExData())

In [46]: myMat
Out[46]:
matrix([[0, 0, 0, 2, 2],
        [0, 0, 0, 3, 3],
        [0, 0, 0, 1, 1],
        [1, 1, 1, 0, 0],
        [2, 2, 2, 0, 0],
        [5, 5, 5, 0, 0],
        [1, 1, 1, 0, 0]])

In [47]: myMat[0,1]=myMat[0,0]=myMat[1,0]=myMat[2,0]=4

In [48]: myMat[3,3]=2

In [49]: myMat
Out[49]:
matrix([[4, 4, 0, 2, 2],
        [4, 0, 0, 3, 3],
        [4, 0, 0, 1, 1],
        [1, 1, 1, 2, 0],
        [2, 2, 2, 0, 0],
        [5, 5, 5, 0, 0],
        [1, 1, 1, 0, 0]])

In [50]: svdRec.recommend(myMat,2)
the 1 and 0 similarity is: 1.000000
the 1 and 3 similarity is: 0.928746
the 1 and 4 similarity is: 1.000000
the 2 and 0 similarity is: 1.000000
the 2 and 3 similarity is: 1.000000
the 2 and 4 similarity is: 0.000000
Out[50]: [(2, 2.5), (1, 2.0243290220056256)]

In [53]: svdRec.recommend(myMat,2,simMeas=svdRec.ecludSim)
the 1 and 0 similarity is: 1.000000
the 1 and 3 similarity is: 0.309017
the 1 and 4 similarity is: 0.333333
the 2 and 0 similarity is: 1.000000
the 2 and 3 similarity is: 0.500000
the 2 and 4 similarity is: 0.000000
Out[53]: [(2, 3.0), (1, 2.8266504712098603)]

In [54]: svdRec.recommend(myMat,2,simMeas=svdRec.pearsSim)
the 1 and 0 similarity is: 1.000000
the 1 and 3 similarity is: 1.000000
the 1 and 4 similarity is: 1.000000
the 2 and 0 similarity is: 1.000000
the 2 and 3 similarity is: 1.000000
the 2 and 4 similarity is: 0.000000
Out[54]: [(2, 2.5), (1, 2.0)]
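The session above calls svdRec.recommend, whose source is not listed in this post. The following is a hedged sketch of what such a function could look like, inferred only from the call signature (`simMeas`, `estMethod`) and the printed output: for each item the user has not rated, it averages the user's existing ratings, weighted by item-to-item similarity over the users who rated both items. It is written for Python 3 with plain NumPy arrays, and the names `stand_est`, `recommend`, and `cos_sim` are assumptions, not the post's actual code.

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity rescaled to [0, 1].
    return 0.5 + 0.5 * (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def stand_est(data, user, sim_meas, item):
    """Estimate the rating `user` would give `item` (hypothetical helper)."""
    sim_total = 0.0
    rat_sim_total = 0.0
    for j in range(data.shape[1]):
        user_rating = data[user, j]
        if user_rating == 0:
            continue  # the user has not rated item j
        # Rows (users) that rated both `item` and item j.
        overlap = np.nonzero((data[:, item] > 0) & (data[:, j] > 0))[0]
        if len(overlap) == 0:
            similarity = 0.0
        else:
            similarity = sim_meas(data[overlap, item], data[overlap, j])
        sim_total += similarity
        rat_sim_total += similarity * user_rating
    if sim_total == 0:
        return 0.0
    return rat_sim_total / sim_total  # similarity-weighted average rating

def recommend(data, user, n=3, sim_meas=cos_sim, est_method=stand_est):
    """Return up to n (item, estimated rating) pairs, best first."""
    unrated = np.nonzero(data[user, :] == 0)[0]
    scores = [(int(item), float(est_method(data, user, sim_meas, item)))
              for item in unrated]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# The modified matrix from the session (Out[49]).
my_mat = np.array([[4, 4, 0, 2, 2],
                   [4, 0, 0, 3, 3],
                   [4, 0, 0, 1, 1],
                   [1, 1, 1, 2, 0],
                   [2, 2, 2, 0, 0],
                   [5, 5, 5, 0, 0],
                   [1, 1, 1, 0, 0]], dtype=float)

recs = recommend(my_mat, 2)
print(recs)  # item 2 first (about 2.5), then item 1 (about 2.02)
```

Under these assumptions the sketch ranks the items in the same order as Out[50], which suggests the actual function works along similar lines.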
8. Use SVD to improve the recommendations
Add the following code to svdRec.py:
def loadExData2():
    return [[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
            [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
            [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
            [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
            [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
            [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
            [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
            [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
            [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
            [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
            [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]]
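The matrix above is fairly sparse. Next, work out how many singular values (feature dimensions) are needed to retain 90% of the matrix's energy, that is, 90% of the sum of the squared singular values: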
In [57]: reload(svdRec)
Out[57]: <module 'svdRec' from 'svdRec.py'>

In [58]: U,Sigma,VT=la.svd(mat(svdRec.loadExData2()))

In [59]: Sigma
Out[59]:
array([ 15.77075346, 11.40670395, 11.03044558, 4.84639758,
        3.09292055, 2.58097379, 1.00413543, 0.72817072,
        0.43800353, 0.22082113, 0.07367823])

In [60]: mat(svdRec.loadExData2())
Out[60]:
matrix([[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
        [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
        [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
        [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
        [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
        [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
        [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
        [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
        [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
        [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
        [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]])

In [61]: Sig2=Sigma**2

In [62]: Sig2
Out[62]:
array([ 2.48716665e+02, 1.30112895e+02, 1.21670730e+02,
        2.34875695e+01, 9.56615756e+00, 6.66142570e+00,
        1.00828796e+00, 5.30232598e-01, 1.91847092e-01,
        4.87619735e-02, 5.42848136e-03])

In [63]: sum(Sig2)
Out[63]: 541.99999999999955

In [64]: sum(Sig2)*0.9
Out[64]: 487.79999999999961

In [65]: sum(Sig2[:2])
Out[65]: 378.82955951135784

In [66]: sum(Sig2[:3])
Out[66]: 500.50028912757921
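Reading the numbers in the session: the first two squared singular values sum to about 378.83, which is below the 90% threshold of 487.8, while the first three sum to about 500.50, which is above it. Three dimensions therefore already capture 90% of the energy. The same search can be automated; the sketch below is Python 3, and the helper name `components_for_energy` is hypothetical. The singular values are copied from Out[59].

```python
import numpy as np

def components_for_energy(sigma, fraction=0.9):
    """Smallest k such that the first k squared singular values
    reach `fraction` of the total energy (hypothetical helper)."""
    energy = np.asarray(sigma, dtype=float) ** 2
    cumulative = np.cumsum(energy)
    threshold = fraction * cumulative[-1]
    # searchsorted returns the first index whose cumulative energy >= threshold.
    return int(np.searchsorted(cumulative, threshold)) + 1

# Singular values copied from Out[59] in the session.
sigma = [15.77075346, 11.40670395, 11.03044558, 4.84639758,
         3.09292055, 2.58097379, 1.00413543, 0.72817072,
         0.43800353, 0.22082113, 0.07367823]

k = components_for_energy(sigma, 0.9)
print(k)  # 3
```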
9. Estimate ratings using SVD:
Add the following code to svdRec.py:
def svdEst(dataMat, user, simMeas, item):
    n = shape(dataMat)[1]
    simTotal = 0.0; ratSimTotal = 0.0
    U,Sigma,VT = la.svd(dataMat)
    Sig4 = mat(eye(4)*Sigma[:4]) #arrange Sig4 into a diagonal matrix
    xformedItems = dataMat.T * U[:,:4] * Sig4.I #create transformed items
    for j in range(n):
        userRating = dataMat[user,j]
        if userRating == 0 or j==item: continue
        similarity = simMeas(xformedItems[item,:].T, xformedItems[j,:].T)
        print 'the %d and %d similarity is: %f' % (item, j, similarity)
        simTotal += similarity
        ratSimTotal += similarity * userRating
    if simTotal == 0: return 0
    else: return ratSimTotal/simTotal
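This function estimates a rating from item similarities computed in the SVD-reduced space: each item is first mapped to a 4-dimensional vector (a row of xformedItems), and the similarity measure is applied to those vectors instead of the raw rating columns.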
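The transformation step in svdEst can be examined on its own. The sketch below (Python 3, plain NumPy arrays; the names `k`, `sig_k_inv`, and `xformed_items` are illustrative assumptions) rebuilds the projected item matrix for the loadExData2() data and checks a consequence of the SVD identity: multiplying the transposed data by the first k columns of U and the inverse of the k leading singular values yields the first k right singular vectors.

```python
import numpy as np

# Same 11x11 matrix that loadExData2() returns in the post.
data = np.array([[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
                 [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
                 [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
                 [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
                 [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
                 [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
                 [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
                 [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
                 [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
                 [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
                 [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]], dtype=float)

u, sigma, vt = np.linalg.svd(data)

k = 4  # svdEst keeps four dimensions
sig_k_inv = np.diag(1.0 / sigma[:k])  # inverse of the diagonal matrix Sig4

# One row per item, one column per retained dimension.
xformed_items = data.T @ u[:, :k] @ sig_k_inv

print(xformed_items.shape)                     # (11, 4)
print(np.allclose(xformed_items, vt[:k, :].T))  # True
```

So each item ends up represented by its coordinates along the four strongest right singular vectors, which is a much denser description than its sparse column of raw ratings.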
10. Test the result:
In [69]: myMat=mat(svdRec.loadExData2())

In [70]: svdRec.recommend(myMat,1,estMethod=svdRec.svdEst)
the 0 and 3 similarity is: 0.490950
the 0 and 5 similarity is: 0.484274
the 0 and 10 similarity is: 0.512755
the 1 and 3 similarity is: 0.491294
the 1 and 5 similarity is: 0.481516
the 1 and 10 similarity is: 0.509709
the 2 and 3 similarity is: 0.491573
the 2 and 5 similarity is: 0.482346
the 2 and 10 similarity is: 0.510584
the 4 and 3 similarity is: 0.450495
the 4 and 5 similarity is: 0.506795
the 4 and 10 similarity is: 0.512896
the 6 and 3 similarity is: 0.743699
the 6 and 5 similarity is: 0.468366
the 6 and 10 similarity is: 0.439465
the 7 and 3 similarity is: 0.482175
the 7 and 5 similarity is: 0.494716
the 7 and 10 similarity is: 0.524970
the 8 and 3 similarity is: 0.491307
the 8 and 5 similarity is: 0.491228
the 8 and 10 similarity is: 0.520290
the 9 and 3 similarity is: 0.522379
the 9 and 5 similarity is: 0.496130
the 9 and 10 similarity is: 0.493617
Out[70]: [(4, 3.3447149384692283), (7, 3.3294020724526971), (9, 3.3281008763900695)]

In [71]: svdRec.recommend(myMat,1,estMethod=svdRec.svdEst,simMeas=svdRec.pearsSim)
the 0 and 3 similarity is: 0.341942
the 0 and 5 similarity is: 0.124132
the 0 and 10 similarity is: 0.116698
the 1 and 3 similarity is: 0.345560
the 1 and 5 similarity is: 0.126456
the 1 and 10 similarity is: 0.118892
the 2 and 3 similarity is: 0.345149
the 2 and 5 similarity is: 0.126190
the 2 and 10 similarity is: 0.118640
the 4 and 3 similarity is: 0.450126
the 4 and 5 similarity is: 0.528504
the 4 and 10 similarity is: 0.544647
the 6 and 3 similarity is: 0.923822
the 6 and 5 similarity is: 0.724840
the 6 and 10 similarity is: 0.710896
the 7 and 3 similarity is: 0.319482
the 7 and 5 similarity is: 0.118324
the 7 and 10 similarity is: 0.113370
the 8 and 3 similarity is: 0.334910
the 8 and 5 similarity is: 0.119673
the 8 and 10 similarity is: 0.112497
the 9 and 3 similarity is: 0.566918
the 9 and 5 similarity is: 0.590049
the 9 and 10 similarity is: 0.602380
Out[71]: [(4, 3.3469521867021736), (9, 3.3353796573274703), (6, 3.307193027813037)]