7.3 Ridge Regression
7.3.1 Verifying Multicollinearity
7.3.2 Ridge Regression Theory
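As a reminder of what the implementation in 7.3.6 computes, the closed-form ridge estimator can be written as follows. The notation is an assumption made here for illustration: X is the standardized feature matrix, y the centered label vector, k >= 0 the ridge parameter, and I the identity matrix.

```latex
\hat{w}(k) = \left( X^{T} X + k I \right)^{-1} X^{T} y
           = \arg\min_{w} \; \lVert y - X w \rVert_2^2 + k \lVert w \rVert_2^2
```

For k = 0 this reduces to the ordinary least-squares solution. For any k > 0 the matrix X^T X + kI is invertible even when the columns of X are collinear.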
7.3.3 Ridge Trace Analysis
7.3.4 Choosing the Value of k
7.3.5 Helper Functions
(1) Import the multidimensional dataset: load the dataset
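    def loadDataSet(filename):
        # Every column but the last is a feature; the last column is the label.
        # Fields are split on a single space, as in the original; change the
        # delimiter if the file is tab-separated.
        with open(filename) as fr:
            lines = fr.readlines()
        numFeat = len(lines[0].split(' ')) - 1 if lines else 0  # number of feature fields
        dataMat = []
        labelMat = []
        for line in lines:
            curLine = line.strip().split(' ')
            lineArr = [float(curLine[i]) for i in range(numFeat)]
            dataMat.append(lineArr)
            labelMat.append(float(curLine[-1]))
        return dataMat, labelMat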
(2) Standardize the dataset matrix
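    from numpy import mat, mean, var

    # Standardize the dataset
    def normData(xArr, yArr):
        xMat = mat(xArr)
        yMat = mat(yArr).T
        yMean = mean(yMat, 0)
        xMean = mean(xMat, 0)
        ynorm = yMat - yMean              # center the labels
        xVar = var(xMat, 0)               # per-column variance
        # Note: the features are divided by the variance, as in the original;
        # a conventional z-score divides by the standard deviation instead.
        xnorm = (xMat - xMean) / xVar
        return xnorm, ynorm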
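To see what this step does to the data, here is a minimal sketch on made-up toy values. The name `standardize` and the toy numbers are assumptions for illustration; the function applies the same centering and variance scaling as `normData`, but on plain NumPy arrays.

```python
import numpy as np

def standardize(x, y):
    """Center y; center each column of x and divide it by its variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    x_centered = x - x.mean(axis=0)
    y_centered = y - y.mean(axis=0)
    return x_centered / x.var(axis=0), y_centered

# Made-up toy data: 4 samples, 2 features on very different scales
x_toy = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0], [4.0, 700.0]]
y_toy = [1.0, 2.0, 3.0, 4.0]

x_norm, y_norm = standardize(x_toy, y_toy)
col_means = x_norm.mean(axis=0)   # each column is now centered on zero
y_mean = float(y_norm.mean())     # the labels are centered as well
```

A column with zero variance (a constant feature) would cause a division by zero here, so such columns would need to be dropped or handled separately.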
(3) Plot the figure
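    import matplotlib.pyplot as plt
    from numpy import shape

    def scatterplot(wMat, k):
        # Plot each column of wMat (one curve per feature) against k
        fig = plt.figure()
        ax = fig.add_subplot(111)
        wMatT = wMat.T
        m, n = shape(wMatT)
        for i in range(m):
            ax.plot(k, wMatT[i, :])
            # Label each curve at its k = 0 value
            ax.annotate("feature[" + str(i) + "]", xy=(0, wMatT[i, 0]), color='black')
        plt.show()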
7.3.6 Ridge Regression Implementation and Determining k
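    import sys
    from numpy import zeros, shape, eye, linalg

    # The first 8 columns are xArr; the last column is yArr
    xArr, yArr = loadDataSet('abalone.txt')
    xMat, yMat = normData(xArr, yArr)        # standardize the dataset

    Knum = 30                                # number of k values to try
    wMat = zeros((Knum, shape(xMat)[1]))     # one row of weights per k
    klist = zeros((Knum, 1))
    for i in range(Knum):
        k = float(i) / 500                   # the goal of the algorithm is to determine k
        klist[i] = k                         # list of k values
        xTx = xMat.T * xMat
        denom = xTx + eye(shape(xMat)[1]) * k
        if linalg.det(denom) == 0.0:
            print("This matrix is singular, cannot do inverse")
            sys.exit(0)
        ws = linalg.inv(denom) * (xMat.T * yMat)
        wMat[i, :] = ws.T
    print(klist)
    scatterplot(klist, klist)                # plots k against itself
    scatterplot(wMat, klist)                 # ridge trace: weights as a function of k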
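The loop above can also be packaged as a small function. The following is a minimal sketch under stated assumptions, not the book's implementation: the names `ridge_weights` and `ridge_trace` are hypothetical, the data is synthetic rather than `abalone.txt`, and `np.linalg.solve` is swapped in for the explicit determinant check and `linalg.inv` (it raises `LinAlgError` if the system is singular).

```python
import numpy as np

def ridge_weights(x, y, k):
    """Solve (X^T X + k I) w = X^T y for w."""
    n_features = x.shape[1]
    a = x.T @ x + k * np.eye(n_features)
    return np.linalg.solve(a, x.T @ y)

def ridge_trace(x, y, k_values):
    """Return one row of weights per value of k."""
    return np.array([ridge_weights(x, y, k) for k in k_values])

# Synthetic data with two nearly collinear features (illustration only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
x = np.column_stack([x1, x2])
x = x - x.mean(axis=0)                    # center the features
y = x1 + x2 + 0.1 * rng.normal(size=200)
y = y - y.mean()                          # center the labels

k_values = np.arange(30) / 500.0          # same grid as above: 0, 0.002, ..., 0.058
trace = ridge_trace(x, y, k_values)
norms = np.linalg.norm(trace, axis=1)     # length of the weight vector for each k
```

The row `trace[0]` (k = 0) is the ordinary least-squares solution, and the norm of the weight vector does not grow as k increases. Plotting the columns of `trace` against `k_values` gives the same kind of ridge trace that `scatterplot(wMat, klist)` draws.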
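Reference: 郑捷 (Zheng Jie), 《机器学习算法原理与编程实践》 (Machine Learning Algorithms: Principles and Programming Practice). For study and research purposes only.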