• Neural Networks: The Backpropagation Algorithm


    The cost function of the neural network is

    \[J(\Theta) = -\frac{1}{m}\left[ \sum\limits_{i=1}^{m} \sum\limits_{k=1}^{K} y_k^{(i)} \log\left( h_\Theta(x^{(i)}) \right)_k + \left( 1 - y_k^{(i)} \right) \log\left( 1 - \left( h_\Theta(x^{(i)}) \right)_k \right) \right] + \frac{\lambda}{2m} \sum\limits_{l=1}^{L-1} \sum\limits_{i=1}^{s_l} \sum\limits_{j=1}^{s_{l+1}} \left( \Theta_{ji}^{(l)} \right)^2\]
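    For reference, here is a minimal numpy sketch of this cost function. It is not part of the original post: the nn_cost name, the (s_{l+1}, s_l + 1) weight shapes with the bias column first, and the one-hot label layout are all my assumptions.

```python
import numpy as np

def nn_cost(thetas, X, Y, lam):
    """Regularized cross-entropy cost J(Theta).

    thetas: list of weight matrices Theta^(l), each of shape
            (s_{l+1}, s_l + 1), with the bias column first (assumed layout).
    X: (m, n) inputs; Y: (m, K) one-hot labels; lam: lambda.
    """
    m = X.shape[0]
    # Forward-propagate all m examples at once to get h_Theta(x).
    A = X
    for Theta in thetas:
        A = np.hstack([np.ones((A.shape[0], 1)), A])  # prepend bias unit a_0 = 1
        A = 1.0 / (1.0 + np.exp(-A @ Theta.T))        # sigmoid activation g(z)
    H = A                                             # (m, K) network outputs
    # Cross-entropy term of J(Theta).
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization term: the bias column (j = 0) is not penalized.
    reg = (lam / (2 * m)) * sum(np.sum(T[:, 1:] ** 2) for T in thetas)
    return cost + reg
```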

    We want to minimize J(Θ):

    \[\min_\Theta J(\Theta)\]

    To do so we need to compute

    \[J(\Theta)\]

    \[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]


    The crux of the problem is computing

    \[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]


    Consider the following neural network (the connections between layers are omitted).

    Take a single training example (x, y).

    First, run forward propagation:

    \[\begin{array}{l}
    a^{(1)} = x\\
    z^{(2)} = \Theta^{(1)} a^{(1)}\\
    a^{(2)} = g(z^{(2)}) \;\; (+\, a_0^{(2)})\\
    z^{(3)} = \Theta^{(2)} a^{(2)}\\
    a^{(3)} = g(z^{(3)}) \;\; (+\, a_0^{(3)})\\
    z^{(4)} = \Theta^{(3)} a^{(3)}\\
    a^{(4)} = h_\Theta(x) = g(z^{(4)})
    \end{array}\]
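    The same chain, written as a minimal numpy sketch (the forward/sigmoid names and the bias-column-first Θ layout are my assumptions, not the author's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2, Theta3):
    """Forward pass for the 4-layer network above.

    Each Theta^(l) has shape (s_{l+1}, s_l + 1); column 0 multiplies
    the bias unit, matching the "+ a_0" steps in the equations.
    """
    a1 = np.concatenate([[1.0], x])            # a^(1) = x, with bias a_0 = 1
    z2 = Theta1 @ a1
    a2 = np.concatenate([[1.0], sigmoid(z2)])  # a^(2) = g(z^(2)), add a_0^(2)
    z3 = Theta2 @ a2
    a3 = np.concatenate([[1.0], sigmoid(z3)])  # a^(3) = g(z^(3)), add a_0^(3)
    z4 = Theta3 @ a3
    a4 = sigmoid(z4)                           # a^(4) = h_Theta(x)
    return a4
```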

     The backpropagation algorithm

    Define

    \[\delta_j^{(l)} = \text{the “error” of node } j \text{ in layer } l\]

    Thus, for each output-layer unit (here L = 4):

    \[\delta_j^{(4)} = a_j^{(4)} - y_j\]

    where y_j is the true label.

    Next, compute the “errors” of the earlier layers:

    \[\begin{array}{l}
    \delta^{(3)} = \left( \Theta^{(3)} \right)^T \delta^{(4)} .* \, g'(z^{(3)})\\
    \delta^{(2)} = \left( \Theta^{(2)} \right)^T \delta^{(3)} .* \, g'(z^{(2)})
    \end{array}\]

    The first layer has no “error” term (it is the input).

    It can further be shown (I have not proved it myself) that

    \[\begin{array}{l}
    \delta^{(3)} = \left( \Theta^{(3)} \right)^T \delta^{(4)} .* \left( a^{(3)} .* \left( 1 - a^{(3)} \right) \right)\\
    \delta^{(2)} = \left( \Theta^{(2)} \right)^T \delta^{(3)} .* \left( a^{(2)} .* \left( 1 - a^{(2)} \right) \right)
    \end{array}\]
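    The proof the author skipped is short, given that g is the sigmoid (which the a .* (1 − a) form already presumes):

    \[g(z) = \frac{1}{1 + e^{-z}}, \qquad g'(z) = \frac{e^{-z}}{\left( 1 + e^{-z} \right)^2} = \frac{1}{1 + e^{-z}} \cdot \left( 1 - \frac{1}{1 + e^{-z}} \right) = g(z)\left( 1 - g(z) \right)\]

    Applied elementwise, g'(z^{(l)}) = a^{(l)} .* (1 − a^{(l)}), which turns the previous pair of equations into this one.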

    We also have (ignoring the regularization term)

    \[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = a_j^{(l)} \delta_i^{(l+1)}\]


    Summary of the backpropagation algorithm

    Given the training set

    \[\left\{ \left( x^{(1)}, y^{(1)} \right), \ldots, \left( x^{(m)}, y^{(m)} \right) \right\}\]

    1. Set

    \[\Delta_{ij}^{(l)} = 0\]

    (Δ is the uppercase of δ.)

    2. Compute

    For i = 1 to m {

      Set a(1) = x(i)

      Perform forward propagation to compute a(l) for l = 2, 3, ..., L

      Using y(i), compute δ(L) = a(L) - y(i)

      Compute δ(L-1), δ(L-2),...,δ(2)

      \[\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}\]

     }

    3. Compute

    if j ≠ 0

    \[D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)} + \lambda \Theta_{ij}^{(l)}\]

    if j = 0

    \[D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)}\]

    Here

    \[D_{ij}^{(l)} = \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]
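    Putting steps 1 through 3 together, here is a minimal numpy sketch of the whole procedure. The backprop name, the shapes, and the bias handling are my assumptions; the arithmetic follows the loop above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(thetas, X, Y, lam):
    """Gradients D^(l) = dJ/dTheta^(l) via the algorithm above.

    thetas: list of Theta^(l), shape (s_{l+1}, s_l + 1), bias column first.
    X: (m, n) inputs; Y: (m, K) one-hot labels; lam: lambda.
    """
    m = X.shape[0]
    Deltas = [np.zeros_like(T) for T in thetas]       # step 1: Delta^(l) = 0
    for i in range(m):                                # step 2: loop over examples
        # Forward propagation, keeping every a^(l) (bias units included).
        a = np.concatenate([[1.0], X[i]])             # a^(1) = x^(i)
        activations = [a]
        for T in thetas:
            a = np.concatenate([[1.0], sigmoid(T @ a)])
            activations.append(a)
        delta = activations[-1][1:] - Y[i]            # delta^(L) = a^(L) - y^(i)
        # Walk backwards: accumulate Delta, then propagate delta one layer down.
        for l in range(len(thetas) - 1, -1, -1):
            a_l = activations[l]
            Deltas[l] += np.outer(delta, a_l)         # Delta += delta^(l+1) (a^(l))^T
            if l > 0:
                # delta^(l) = (Theta^(l))^T delta^(l+1) .* a^(l) .* (1 - a^(l));
                # drop the bias component, which has no delta of its own.
                delta = ((thetas[l].T @ delta) * a_l * (1 - a_l))[1:]
    # Step 3: average, adding the regularization term only for j != 0.
    Ds = [Delta / m for Delta in Deltas]
    for D, T in zip(Ds, thetas):
        D[:, 1:] += lam * T[:, 1:]
    return Ds
```

    A common sanity check is to compare each returned D_{ij}^{(l)} against a finite-difference estimate of ∂J/∂Θ_{ij}^{(l)} before trusting the gradients in an optimizer.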

  • Original post: https://www.cnblogs.com/qkloveslife/p/9872785.html