The true class label is the vector \(y\) (one-hot encoded: exactly one entry is 1 and all others are 0), of dimension \(m\), i.e. there are \(m\) classes:
\[y=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_m\end{bmatrix}
\]
The vector \(z\) is the input to the softmax function, with the same dimension as the label vector \(y\), namely \(m\):
\[z=\begin{bmatrix}z_1\\ z_2\\ \vdots\\ z_m\end{bmatrix}
\]
The vector \(s\) is the output of the softmax function, again with the same dimension \(m\) as the label vector \(y\):
\[s=\begin{bmatrix}s_1\\ s_2\\ \vdots\\ s_m\end{bmatrix}
\]
\[s_{i}=\frac{e^{z_{i}}}{\sum_{k=1}^{m}e^{z_{k}}}
\]
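As a quick sanity check of the softmax formula, here is a minimal NumPy sketch (the library choice and function name are assumptions, not code from this post); subtracting \(\max(z)\) before exponentiating leaves every ratio unchanged but avoids overflow:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Compute s_i = exp(z_i) / sum_k exp(z_k) for a 1-D score vector z."""
    shifted = z - np.max(z)   # shifting by max(z) cancels in the ratio but prevents overflow
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

z = np.array([2.0, 1.0, 0.1])  # example scores for m = 3 classes
s = softmax(z)
print(s, s.sum())              # the components of s sum to 1
```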
The cross-entropy loss is:
\[c=-\sum_{j=1}^{m}y_j\ln s_j
\]
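A matching sketch of the loss (again an illustrative NumPy snippet, not the post's code); with a one-hot \(y\), only the true-class term survives, so \(c=-\ln s_{\text{true}}\):

```python
import numpy as np

def cross_entropy(y: np.ndarray, s: np.ndarray, eps: float = 1e-12) -> float:
    """c = -sum_j y_j * ln(s_j); eps guards against log(0)."""
    return float(-np.sum(y * np.log(s + eps)))

y = np.array([0.0, 1.0, 0.0])  # one-hot label: the true class is index 1
s = np.array([0.2, 0.7, 0.1])  # an example softmax output
print(cross_entropy(y, s))     # equals -ln(0.7), since only the true-class term survives
```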
Take the partial derivative of the loss with respect to each component \(z_i\) of the vector \(z\):
\[\begin{aligned}
\frac{\partial c}{\partial z_i}&=-\sum_{j=1}^{m}\frac{\partial (y_j\ln s_j)}{\partial s_j}\cdot\frac{\partial s_j}{\partial z_i}\\
&=-\sum_{j=1}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i}
\end{aligned}\]
When \(j=i\):
\[\begin{aligned}
\frac{\partial s_j}{\partial z_i}&=\frac{\partial}{\partial z_i}\left(\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\right)\\
&=\frac{e^{z_i}\cdot\sum_{k=1}^{m}e^{z_k}-e^{z_i}\cdot e^{z_i}}{\left(\sum_{k=1}^{m}e^{z_k}\right)^{2}}\\
&=\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\cdot\frac{\sum_{k=1}^{m}e^{z_k}-e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\\
&=\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\cdot\left(1-\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\right)\\
&=s_i(1-s_i)
\end{aligned}\]
When \(j\neq i\):
\[\begin{aligned}
\frac{\partial s_j}{\partial z_i}&=\frac{\partial}{\partial z_i}\left(\frac{e^{z_j}}{\sum_{k=1}^{m}e^{z_k}}\right)\\
&=\frac{0\cdot\sum_{k=1}^{m}e^{z_k}-e^{z_j}\cdot e^{z_i}}{\left(\sum_{k=1}^{m}e^{z_k}\right)^{2}}\\
&=-\frac{e^{z_j}}{\sum_{k=1}^{m}e^{z_k}}\cdot\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\\
&=-s_js_i
\end{aligned}\]
Therefore:
\[\frac{\partial s_j}{\partial z_i}=\begin{cases}s_i(1-s_i)& j=i\\ -s_js_i& j\neq i\end{cases}
\]
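Taken together, the two cases say the Jacobian of softmax is \(\operatorname{diag}(s)-ss^{T}\). The following sketch (illustrative NumPy, reusing the softmax helper assumed above) compares that closed form against a central finite-difference Jacobian:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([0.5, -1.2, 2.0, 0.3])
s = softmax(z)

# closed form: J[j, i] = ds_j/dz_i = s_i(1 - s_i) if j == i else -s_j * s_i
J_analytic = np.diag(s) - np.outer(s, s)

# numerical Jacobian by central differences; column i holds d softmax / d z_i
h = 1e-6
J_numeric = np.zeros((len(z), len(z)))
for i in range(len(z)):
    dz = np.zeros_like(z)
    dz[i] = h
    J_numeric[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * h)

print(np.allclose(J_analytic, J_numeric, atol=1e-6))  # True
```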
Substituting these two cases back into the partial derivative of the loss with respect to each \(z_i\):
\[\begin{aligned}
\frac{\partial c}{\partial z_i}&=-\sum_{j=1}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i}\\
&=-\left(\frac{y_i}{s_i}\cdot\frac{\partial s_i}{\partial z_i}+\sum_{j\neq i}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i}\right)\\
&=-\left(\frac{y_i}{s_i}\cdot s_i(1-s_i)+\sum_{j\neq i}\frac{y_j}{s_j}\cdot(-s_js_i)\right)\\
&=-y_i(1-s_i)+\sum_{j\neq i}y_js_i\\
&=-y_i+y_is_i+\sum_{j\neq i}y_js_i\\
&=-y_i+s_i\sum_{j=1}^{m}y_j\\
&=s_i-y_i
\end{aligned}\]
The last step uses \(\sum_{j=1}^{m}y_j=1\), which holds because \(y\) is one-hot.
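Finally, a sketch that checks the result \(\partial c/\partial z_i=s_i-y_i\) against a finite-difference gradient of the composed softmax plus cross-entropy (helper names are assumptions carried over from the earlier sketches):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(z, y):
    """Cross-entropy of the softmax output against a one-hot label y."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.5, -0.3, 0.8])
y = np.array([0.0, 0.0, 1.0])      # one-hot label: the true class is index 2

grad_closed_form = softmax(z) - y  # the derived result: dc/dz = s - y

# finite-difference gradient of the composed loss, one coordinate at a time
h = 1e-6
grad_numeric = np.zeros_like(z)
for i in range(len(z)):
    dz = np.zeros_like(z)
    dz[i] = h
    grad_numeric[i] = (loss(z + dz, y) - loss(z - dz, y)) / (2 * h)

print(np.allclose(grad_closed_form, grad_numeric, atol=1e-6))  # True
```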