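For reference: the complete code for the cs231n assignment1 SVM.
cs231n assignment: the loss function and gradient computation for the multiclass SVM (the best write-up); https://blog.csdn.net/NODIECANFLY/article/details/82927119 (also good)
Implement the loop-based gradient computation in svm_loss_naive
Implement the vectorized gradient computation in svm_loss_vectorized
Implement stochastic gradient descent in the SGDLinearClassifier.train() function in linear_classifier.py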
import numpy as np


def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                # Wrong class j violates the margin: its column gains X[i].
                dW[:, j] += X[i]
                # Correct class: subtract X[i] once per violating class.
                dW[:, y[i]] -= X[i]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add the regularization term; reg is the regularization strength.
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W  # gradient of reg * np.sum(W * W) with respect to W

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it dW.                #
    # Rather that first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################

    return loss, dW
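The two gradient updates inside the loop follow from differentiating the hinge loss term by term. As a reference sketch, writing $w_j$ for column $j$ of W, $x_i$ for row $i$ of X, $s_j = x_i^\top w_j$ for the score of class $j$, and $\lambda$ for reg (this notation is introduced here and is not from the original post), with delta = 1:

```latex
L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + 1\right)

\nabla_{w_j} L_i = \mathbb{1}\left[s_j - s_{y_i} + 1 > 0\right] x_i
\qquad (j \neq y_i)

\nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}\left[s_j - s_{y_i} + 1 > 0\right]\Big)\, x_i

L = \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda \sum_{k,l} W_{kl}^2
\qquad
\nabla_W L = \frac{1}{N} \sum_{i=1}^{N} \nabla_W L_i + 2 \lambda W
```

Each wrong class that violates the margin adds $x_i$ to its own column and subtracts $x_i$ from the correct class column, which is what the two dW lines accumulate.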
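Typically we subtract the per-pixel mean image (computed from the training data) from every image, both the training data and the data to be classified, so that the data is zero-centered; this usually improves the model's results. The centered data can additionally be normalized so that its values fall in the range [-1, 1], which can improve the results further.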
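The preprocessing just described can be sketched as follows. This is a minimal illustration, not the assignment's code: the function name preprocess and the toy arrays are hypothetical, and scaling by the largest absolute centered training value is only one possible way to reach [-1, 1].

```python
import numpy as np


def preprocess(X_train, X_test):
    """Zero-center with the training mean image, then scale to [-1, 1].

    X_train, X_test: arrays of shape (N, D) holding raw pixel values.
    (Hypothetical helper, written for illustration only.)
    """
    X_train = X_train.astype(np.float64)
    X_test = X_test.astype(np.float64)

    # Per-pixel mean over the training set, shape (D,).
    mean_image = X_train.mean(axis=0)
    X_train = X_train - mean_image
    X_test = X_test - mean_image

    # Assumption: scale by the largest absolute centered training value.
    scale = np.abs(X_train).max()
    if scale > 0:
        X_train = X_train / scale
        X_test = X_test / scale
    return X_train, X_test, mean_image


# Toy data standing in for flattened images (values in 0..255).
rng = np.random.default_rng(0)
X_train_raw = rng.integers(0, 256, size=(5, 12))
X_test_raw = rng.integers(0, 256, size=(2, 12))
X_tr, X_te, mean_image = preprocess(X_train_raw, X_test_raw)
```

The same mean and scale are reused for the test data, so test values may fall slightly outside [-1, 1].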
Mini-batch gradient descent
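A mini-batch training loop might look roughly like the sketch below. It is not the assignment's train() method: train_sgd, svm_loss, the hyperparameter defaults, and the toy data are all assumptions made for illustration. Each iteration samples a batch, evaluates the loss and gradient on that batch only, and takes one step against the gradient.

```python
import numpy as np


def svm_loss(W, X, y, reg):
    """Compact vectorized multiclass SVM loss (delta = 1) and its gradient."""
    N = X.shape[0]
    rows = np.arange(N)
    scores = X.dot(W)
    margins = np.maximum(0, scores - scores[rows, y][:, None] + 1.0)
    margins[rows, y] = 0
    loss = margins.sum() / N + 0.5 * reg * np.sum(W * W)
    coeff = (margins > 0).astype(np.float64)
    coeff[rows, y] = -coeff.sum(axis=1)
    dW = X.T.dot(coeff) / N + reg * W
    return loss, dW


def train_sgd(X, y, num_classes, learning_rate=1e-2, reg=1e-4,
              num_iters=200, batch_size=32, seed=0):
    """Hypothetical mini-batch SGD loop; returns weights and loss history."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    W = 0.001 * rng.standard_normal((D, num_classes))
    loss_history = []
    for _ in range(num_iters):
        # Sample a mini-batch (with replacement, which is cheap).
        idx = rng.choice(N, batch_size, replace=True)
        loss, dW = svm_loss(W, X[idx], y[idx], reg)
        loss_history.append(loss)
        W -= learning_rate * dW  # step against the gradient
    return W, loss_history


# Toy problem: three well-separated 2-D clusters plus a bias column.
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
y_toy = rng.integers(0, 3, size=300)
X_toy = centers[y_toy] + 0.3 * rng.standard_normal((300, 2))
X_toy = np.hstack([X_toy, np.ones((300, 1))])

W_toy, history = train_sgd(X_toy, y_toy, num_classes=3)
accuracy = np.mean(np.argmax(X_toy.dot(W_toy), axis=1) == y_toy)
```

Because each step sees only batch_size samples, the recorded loss is noisy; what matters is its overall downward trend.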
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.
    Computes the loss and the gradient without explicit loops.

    Inputs (all numpy arrays):
    - W: weight matrix of shape (D, C), here 3073 x 10
    - X: minibatch of data of shape (N, D), here 200 x 3073
    - y: label vector of shape (N,), here 200; y[i] = c means that X[i] has
      label c, where 0 <= c < C.
    - reg: (float) regularization strength (it cannot be set by guesswork;
      try several values and pick the best one by cross-validation)

    From these shapes, X.dot(W) holds the scores of all samples: row i is the
    scores of sample i, and element (i, j) is the score of sample i for class j.

    Variables:
    - delta: hyperparameter, the SVM margin, usually set to 1
    """
    num_train = X.shape[0]    # N, the minibatch size (200 here)
    num_classes = W.shape[1]  # C, the number of classes
    loss = 0.0
    dW = np.zeros(W.shape)    # zero-initialized gradient, shape (D, C)
    delta = 1

    #############################################################################
    # TODO: compute the loss                                                    #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    scores = X.dot(W)  # scores of all samples, shape (N, C), here 200 x 10
    # num_train and y both have size N; this picks the correct-class score of
    # every sample, giving a vector of shape (N,).
    scores_y = scores[range(num_train), y]
    scores_y = np.reshape(scores_y, (num_train, 1))  # (N, 1), for broadcasting
    # (N, C) matrix; entry (i, j) is the hinge loss of sample i for class j.
    margins = np.maximum(0, scores - scores_y + delta)
    margins[range(num_train), y] = 0      # zero the entry of the correct class
    loss += np.sum(margins) / num_train   # data loss
    # Regularization; the 0.5 cancels the factor 2 from differentiating W * W.
    # (svm_loss_naive uses reg * np.sum(W * W) without the 0.5, so the two
    # losses differ by that term unless one convention is used for both.)
    loss += 0.5 * reg * np.sum(W * W)
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################

    #############################################################################
    # TODO: compute the gradient dW                                             #
    # Implement a vectorized version of the gradient for the structured SVM     #
    # loss, storing the result in dW.                                           #
    #                                                                           #
    # Hint: Instead of computing the gradient from scratch, it may be easier    #
    # to reuse some of the intermediate values that you used to compute the     #
    # loss.                                                                     #
    #############################################################################
    margins[margins > 0] = 1.0  # 1 where the margin is violated, else 0
    # Per row, the number of violating classes, negated: the coefficient of
    # X[i] in the correct-class column (X[i] is subtracted once per violation).
    row_sum = -np.sum(margins, axis=1)
    margins[range(num_train), y] = row_sum  # overwrite the correct-class entries, (N, C)
    dW += np.dot(X.T, margins) / num_train  # X is (N, D), so X.T.dot(margins) is (D, C)
    dW += reg * W
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################

    return loss, dW
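To make the indexing and broadcasting steps concrete, here is a small hand-checkable example with N = 2 samples and C = 3 classes. The score values are made up for illustration, and the variable coeff is a name introduced here for the matrix that the code above stores back into margins.

```python
import numpy as np

# Made-up scores for 2 samples and 3 classes; labels are class 0 and class 2.
scores = np.array([[3.0, 1.0, 2.0],
                   [1.0, 4.0, 0.0]])
y = np.array([0, 2])
num_train = scores.shape[0]
delta = 1.0

# Fancy indexing picks one entry per row: scores[0, 0] and scores[1, 2].
scores_y = scores[range(num_train), y].reshape(num_train, 1)  # [[3.], [0.]]

# Broadcasting subtracts each row's correct-class score from that whole row.
margins = np.maximum(0, scores - scores_y + delta)
margins[range(num_train), y] = 0
# margins is now [[0., 0., 0.],
#                 [2., 5., 0.]]
data_loss = np.sum(margins) / num_train  # (0 + 7) / 2 = 3.5

# Gradient coefficients: 1 per violating class, minus the count at the label.
coeff = (margins > 0).astype(np.float64)
coeff[range(num_train), y] = -np.sum(coeff, axis=1)
# coeff is now [[0., 0.,  0.],
#               [1., 1., -2.]]
```

Sample 0 already satisfies every margin, so it contributes nothing to the gradient; sample 1 violates two margins, so its row of X is added to columns 0 and 1 of dW and subtracted twice from column 2.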