This post focuses on the three programming assignment questions, Q18~Q20, all of which are about Logistic Regression.
Q18~19 implement Logistic Regression with full-batch gradient descent; Q20 asks for an implementation using stochastic gradient descent.
The code for all three questions is combined into a single .py file. Personally, I think the program is designed in a fully vectorized style, so it runs fast: 2000 iterations finish in roughly one second.
#encoding=utf8
import numpy as np

# read input data (train or test): features separated by spaces, label in the last column
def read_input_data(path):
    x = []
    y = []
    for line in open(path).readlines():
        items = line.strip().split(' ')
        tmp_x = []
        for i in range(0, len(items) - 1):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
        y.append(float(items[-1]))
    return np.array(x), np.array(y)

# calculate gradient of Ein
def calculate_gradient(w, x, y):
    s = np.dot(w, x.transpose()) * y
    theta = 1.0 / (1 + np.exp(s))
    gradient_all = theta.reshape(-1, 1) * (-1) * y.reshape(-1, 1) * x
    gradient_average = np.sum(gradient_all, axis=0)
    return gradient_average / gradient_all.shape[0]

# update w using the gradient and the learning rate (ita)
def update_W(w, ita, gradient):
    return w - ita * gradient

# test result
def calculate_Eout(w, x, y):
    scores = 1 / (1 + np.exp((-1) * np.dot(w, x.transpose())))
    predicts = np.where(scores >= 0.5, 1.0, -1.0)
    Eout = sum(predicts != y)
    return (Eout * 1.0) / predicts.shape[0]

if __name__ == '__main__':
    # read train data
    x, y = read_input_data("train.dat")
    # add '1' column
    x = np.hstack((np.ones(x.shape[0]).reshape(-1, 1), x))

    ## fixed learning rate gradient descent
    T1 = 2000
    ita1 = 0.01
    w1 = np.zeros(x.shape[1])
    for i in range(0, T1):
        gradient = calculate_gradient(w1, x, y)
        w1 = update_W(w1, ita1, gradient)

    ## fixed learning rate stochastic gradient descent
    T2 = 20
    ita2 = 0.1
    w2 = np.zeros(x.shape[1])
    for i in range(0, T2):
        x_n = x[i % x.shape[0]]
        y_n = y[i % y.shape[0]]
        gradient = calculate_gradient(w2, x_n, y_n)
        w2 = update_W(w2, ita2, gradient)

    # test
    test_x, test_y = read_input_data("test.dat")
    test_x = np.hstack((np.ones(test_x.shape[0]).reshape(-1, 1), test_x))
    Eout1 = calculate_Eout(w1, test_x, test_y)
    Eout2 = calculate_Eout(w2, test_x, test_y)
    print(Eout1)
    print(Eout2)
The program is quite efficient, mainly thanks to the convenient matrix operations that Python's numpy provides.
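To make the "vectorized" point concrete, here is a minimal sketch (not part of the homework script; the synthetic data and function names are my own) that computes the same gradient once with an explicit per-sample loop and once with numpy broadcasting, as calculate_gradient does:

import numpy as np

def gradient_loop(w, x, y):
    # explicit per-sample loop: theta(-y_n * w.x_n) * (-y_n * x_n), averaged over n
    grad = np.zeros_like(w)
    for n in range(x.shape[0]):
        s = y[n] * np.dot(w, x[n])
        grad += (1.0 / (1.0 + np.exp(s))) * (-y[n]) * x[n]
    return grad / x.shape[0]

def gradient_vectorized(w, x, y):
    # the same computation expressed with numpy broadcasting, as in the script above
    s = np.dot(w, x.T) * y
    theta = 1.0 / (1.0 + np.exp(s))
    return np.sum(theta.reshape(-1, 1) * (-y.reshape(-1, 1)) * x, axis=0) / x.shape[0]

if __name__ == "__main__":
    rng = np.random.RandomState(0)
    x = rng.randn(100, 5)
    y = np.where(rng.rand(100) > 0.5, 1.0, -1.0)
    w = rng.randn(5)
    # both versions agree; the vectorized one avoids the Python-level loop
    print(np.allclose(gradient_loop(w, x, y), gradient_vectorized(w, x, y)))

The vectorized version pushes the inner loop into numpy's C code, which is why 2000 full-batch iterations take only about a second.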
Through these assignment questions,
(1) I became familiar with the formula for the gradient of Logistic Regression:
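Restated in LaTeX (this is exactly what calculate_gradient above computes, with theta denoting the logistic function):

\nabla E_{\text{in}}(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N} \theta\!\left(-y_n \mathbf{w}^T \mathbf{x}_n\right)\left(-y_n \mathbf{x}_n\right), \qquad \theta(s) = \frac{1}{1+e^{-s}}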
(2) I got a feel for the role of the learning rate. If the learning rate is very small, it may take many iterations to reach a satisfactory result (raising the number of iterations in Q18 to 200,000 brings the error rate down to about 0.22).
However, my previous experience is that one does not dare choose a learning rate that is too large (0.01 is usually already quite large). Picking the learning rate is really a craft: it depends on the algorithm, and also on the actual data.
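One quick way to see this interaction on the homework data is a small, hypothetical sweep like the one below. It is only a sketch, assuming it is appended to the script above (so read_input_data, calculate_gradient, update_W and calculate_Eout are in scope) and that train.dat / test.dat are present; the grid of learning rates and iteration counts is my own choice, not part of the assignment.

import numpy as np

def train_gd(x, y, ita, T):
    # fixed learning rate, full-batch gradient descent, starting from w = 0
    w = np.zeros(x.shape[1])
    for _ in range(T):
        w = update_W(w, ita, calculate_gradient(w, x, y))
    return w

if __name__ == "__main__":
    x, y = read_input_data("train.dat")
    x = np.hstack((np.ones(x.shape[0]).reshape(-1, 1), x))
    test_x, test_y = read_input_data("test.dat")
    test_x = np.hstack((np.ones(test_x.shape[0]).reshape(-1, 1), test_x))
    # compare test error across a few (learning rate, iteration count) pairs
    for ita in (0.001, 0.01, 0.1):
        for T in (2000, 20000):
            w = train_gd(x, y, ita, T)
            print(ita, T, calculate_Eout(w, test_x, test_y))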