The cost function of the neural network is
\[J(\Theta) = -\frac{1}{m}\left[ \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\left(h_\Theta(x^{(i)})\right)_k + \left(1 - y_k^{(i)}\right) \log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\Theta_{ji}^{(l)}\right)^2\]
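As a concrete sketch, this cost can be computed with NumPy roughly as follows (the function names and the convention that column 0 of each Θ multiplies the bias unit are my assumptions, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Thetas, X, Y, lam):
    """Regularized cost J(Theta).

    Thetas: list of weight matrices Theta^(1), ..., Theta^(L-1);
            column 0 of each matrix multiplies the bias unit.
    X: (m, n) inputs;  Y: (m, K) one-hot labels;  lam: lambda.
    """
    m = X.shape[0]
    a = X
    for Theta in Thetas:                       # forward propagation
        a = sigmoid(np.hstack([np.ones((m, 1)), a]) @ Theta.T)
    h = a                                      # (m, K): h_Theta(x^(i))_k
    # cross-entropy term, summed over all m examples and K outputs
    cost = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # regularization: all weights except the bias column (j = 0)
    reg = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return cost + reg
```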
We want to minimize J(Θ):
\[\mathop{\min}\limits_{\Theta} J(\Theta)\]
To do so we need to compute
\[J(\Theta)\]
\[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]
The crux of the problem is computing
\[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]
We have the following neural network (the connections between layers are omitted from the figure).
Take a single training example (x, y).
First compute the forward propagation:
\[\begin{array}{l}
a^{(1)} = x\\
z^{(2)} = \Theta^{(1)} a^{(1)}\\
a^{(2)} = g(z^{(2)}) \quad (\text{add } a_0^{(2)})\\
z^{(3)} = \Theta^{(2)} a^{(2)}\\
a^{(3)} = g(z^{(3)}) \quad (\text{add } a_0^{(3)})\\
z^{(4)} = \Theta^{(3)} a^{(3)}\\
a^{(4)} = h_\Theta(x) = g(z^{(4)})
\end{array}\]
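Under the same assumptions (sigmoid activation g, a bias unit a_0 prepended at each hidden layer; the function name `forward` is mine), the pass above can be sketched as:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2, Theta3):
    """Forward propagation through the 4-layer network above."""
    a1 = np.concatenate([[1.0], x])             # a^(1) = x, with bias a_0
    z2 = Theta1 @ a1
    a2 = np.concatenate([[1.0], sigmoid(z2)])   # a^(2) = g(z^(2)), add a_0^(2)
    z3 = Theta2 @ a2
    a3 = np.concatenate([[1.0], sigmoid(z3)])   # a^(3) = g(z^(3)), add a_0^(3)
    z4 = Theta3 @ a3
    a4 = sigmoid(z4)                            # a^(4) = h_Theta(x) = g(z^(4))
    return a1, a2, a3, a4
```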
The backpropagation algorithm
Define
\[\delta_j^{(l)} = \text{the ``error'' of node } j \text{ in layer } l.\]
Thus, for each output-layer unit (L = 4),
\[\delta_j^{(4)} = a_j^{(4)} - y_j\]
where y_j is the true label.
Next, compute the "errors" of the earlier layers:
\[\begin{array}{l}
\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} \;.*\; g'(z^{(3)})\\
\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} \;.*\; g'(z^{(2)})
\end{array}\]
The first layer has no "error" term, since it is the input layer.
It can also be shown (I have not proved it) that for the sigmoid, g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)}), which gives
\[\begin{array}{l}
\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} \;.*\; \left(a^{(3)} .* (1 - a^{(3)})\right)\\
\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} \;.*\; \left(a^{(2)} .* (1 - a^{(2)})\right)
\end{array}\]
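A sketch of these two equations in NumPy (the bias handling, i.e. dropping the first row of (Θ^(l))^T δ^(l+1) before the element-wise product, and the function name are my assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward_deltas(a2, a3, a4, y, Theta2, Theta3):
    """'Error' terms for the 4-layer network, using g'(z) = a .* (1 - a).

    a2, a3 are the hidden activations WITHOUT the bias unit;
    a4 = h_Theta(x) is the output activation.
    """
    d4 = a4 - y                                   # delta^(4) = a^(4) - y
    # (Theta^(3))^T d4 has an extra bias row; drop it ([1:]) before the
    # element-wise product with a^(3) .* (1 - a^(3))
    d3 = (Theta3.T @ d4)[1:] * (a3 * (1 - a3))    # delta^(3)
    d2 = (Theta2.T @ d3)[1:] * (a2 * (1 - a2))    # delta^(2)
    return d2, d3, d4
```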
We also have (ignoring the regularization term, i.e. taking λ = 0)
\[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = a_j^{(l)} \delta_i^{(l+1)}\]
Summary of the backpropagation algorithm
Given the training set
\[\left\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \right\}\]
1. Set
\[\Delta_{ij}^{(l)} = 0\]
(Δ is the capital form of δ.)
2. Compute
For i = 1 to m {
    Set a^(1) = x^(i)
    Perform forward propagation to compute a^(l) for l = 2, 3, ..., L
    Using y^(i), compute δ^(L) = a^(L) - y^(i)
    Compute δ^(L-1), δ^(L-2), ..., δ^(2)
    \[\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}\]
}
3. Compute
if j ≠ 0
\[D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)} + \lambda \Theta_{ij}^{(l)}\]
if j = 0 (the bias column, which is not regularized)
\[D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)}\]
Here
\[D_{ij}^{(l)} = \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]
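Putting steps 1-3 together, a minimal NumPy sketch (same assumptions as before: sigmoid units, bias weights in column 0; `backprop_gradients` is my name for it). The test below checks one entry of D against a numerical gradient of the unregularized cost:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(Thetas, X, Y, lam):
    """Steps 1-3 above: accumulate Delta over the m examples, then form D."""
    m = X.shape[0]
    Deltas = [np.zeros_like(T) for T in Thetas]        # step 1: Delta = 0
    for i in range(m):                                 # step 2
        # forward propagation, keeping every activation (bias prepended)
        acts = [np.concatenate([[1.0], X[i]])]         # a^(1) = x^(i)
        for T in Thetas:
            acts.append(np.concatenate([[1.0], sigmoid(T @ acts[-1])]))
        d = acts[-1][1:] - Y[i]                        # delta^(L) = a^(L) - y^(i)
        for l in range(len(Thetas) - 1, -1, -1):
            Deltas[l] += np.outer(d, acts[l])          # Delta += delta a^T
            if l > 0:                                  # no delta for the input layer
                a = acts[l][1:]
                d = (Thetas[l].T @ d)[1:] * (a * (1 - a))
    Ds = []                                            # step 3
    for T, Delta in zip(Thetas, Deltas):
        D = Delta / m
        D[:, 1:] += lam * T[:, 1:]                     # regularize only j != 0
        Ds.append(D)
    return Ds
```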