Neural Network Study Notes - Loss Function Definition and Differentiation Proof
Loss function (cross-entropy loss)
The loss function, backpropagation, and gradient computation together make up the training process of a recurrent neural network.
The softmax activation function and the loss function are used together.
The activation function takes its input (a vector scoring how likely each class is) and computes a probability in (0, 1) for each class.
The loss function takes the softmax output \(\hat{y}\) and the expected result \(y\), and computes the loss \(L\) by the cross-entropy method.
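For reference, softmax here is the standard normalized exponential (the notes use it without spelling it out):

\[
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}
\]

Each output lies in (0, 1) and all outputs sum to 1, which is why they can be read as class probabilities.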
The cross-entropy loss function
\[
L_t(y_t, \hat{y}_t) = - y_t \log \hat{y}_t \\
L(y, \hat{y}) = - \sum_{t} y_t \log \hat{y}_t \\
\frac{ \partial L_t }{ \partial z_t } = \hat{y}_t - y_t \\
\text{where} \\
z_t = s_t V \\
\hat{y}_t = \mathrm{softmax}(z_t) \\
y_t : \text{the expected result at time } t \text{ for training input } x \text{, taken from the training data}
\]
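As a quick illustration (not from the original notes), here is a minimal numpy sketch of the two formulas above; the function names and the example values of `z` and `y` are made up, and `z` stands in for a precomputed \(z_t = s_t V\):

```python
import numpy as np

def softmax(z):
    # softmax(z)_i = exp(z_i) / sum_j exp(z_j); shift by max(z) for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y, y_hat, eps=1e-12):
    # L = -sum_t y_t * log(hat{y}_t); eps guards against log(0)
    return -np.sum(y * np.log(y_hat + eps))

z = np.array([2.0, 1.0, 0.1])   # example logits, standing in for z_t = s_t V
y = np.array([0.0, 1.0, 0.0])   # one-hot expected result: true class is index 1
y_hat = softmax(z)              # predicted class probabilities
print(cross_entropy(y, y_hat))  # the loss L for this single step
```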
Proof
\[
\begin{aligned}
\frac{ \partial L_t }{ \partial z_t }
& = \frac{ \partial \left( - \sum_{k} y_k \log \hat{y}_k \right) }{ \partial z_t } \\
& = - \sum_{k} y_k \frac{ \partial \log \hat{y}_k }{ \partial z_t } \\
& = - \sum_{k} y_k \frac{1}{\hat{y}_k} \cdot \frac{ \partial \hat{y}_k }{ \partial z_t } \\
& = - \left( y_t \frac{1}{\hat{y}_t} \cdot \frac{ \partial \hat{y}_t }{ \partial z_t } \right) - \left( \sum_{k \ne t} y_k \frac{1}{\hat{y}_k} \cdot \frac{ \partial \hat{y}_k }{ \partial z_t } \right) \\
& \because \text{softmax differentiation: } \frac{ \partial \hat{y}_t }{ \partial z_t } = (1 - \hat{y}_t) \hat{y}_t, \quad \frac{ \partial \hat{y}_k }{ \partial z_t } = -\hat{y}_t \hat{y}_k \ (k \ne t) \\
& = - \left( y_t \frac{1}{\hat{y}_t} \cdot (1 - \hat{y}_t) \hat{y}_t \right) - \left( \sum_{k \ne t} y_k \frac{1}{\hat{y}_k} \cdot (-\hat{y}_t \hat{y}_k) \right) \\
& = - \left( y_t \cdot (1 - \hat{y}_t) \right) - \left( \sum_{k \ne t} y_k \cdot (-\hat{y}_t) \right) \\
& = - y_t + y_t \hat{y}_t + \left( \sum_{k \ne t} y_k \hat{y}_t \right) \\
& = - y_t + \hat{y}_t \left( \sum_{k} y_k \right) \\
& \because \sum_{k} y_k = 1 \text{ (one-hot label)} \\
& = \hat{y}_t - y_t
\end{aligned}
\]
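The closed form \(\frac{\partial L_t}{\partial z_t} = \hat{y}_t - y_t\) can be sanity-checked numerically. The sketch below (my own, not part of the notes) compares it against a centered finite-difference estimate of the gradient:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y, y_hat):
    return -np.sum(y * np.log(y_hat))

def numerical_grad(z, y, h=1e-6):
    # centered finite differences: dL/dz_i ~ (L(z_i + h) - L(z_i - h)) / (2h)
    g = np.zeros_like(z)
    for i in range(z.size):
        zp, zm = z.copy(), z.copy()
        zp[i] += h
        zm[i] -= h
        g[i] = (cross_entropy(y, softmax(zp)) - cross_entropy(y, softmax(zm))) / (2 * h)
    return g

z = np.array([2.0, 1.0, 0.1])
y = np.array([0.0, 1.0, 0.0])
analytic = softmax(z) - y       # hat{y}_t - y_t from the proof
numeric = numerical_grad(z, y)
print(np.allclose(analytic, numeric, atol=1e-6))  # expected: True
```

The two estimates agree to within finite-difference error, matching the derivation.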