• review backpropagation


      The goal of backpropagation is to compute the partial derivatives ∂C/∂w and ∂C/∂b of the cost function C with respect to any weight w or bias b in the network.

     we use the quadratic cost function

          C = (1/2n) Σ_x ‖y(x) − a^L(x)‖^2

      where n is the total number of training examples, y(x) is the desired output for input x, L is the number of layers, and a^L(x) is the vector of activations output from the network when x is input.
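      As a small illustrative sketch (not from the original text; the function name and use of numpy arrays are my own choices), the per-example cost C_x = (1/2)‖y − a^L‖^2 can be computed like this:

          import numpy as np

          def quadratic_cost(y, a_L):
              # Per-example quadratic cost C_x = 1/2 * ||y - a_L||^2.
              # The full cost C is the average of C_x over all n training examples.
              return 0.5 * np.linalg.norm(y - a_L) ** 2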

    two assumptions:

      1: The first assumption we need is that the cost function can be written as an average C = (1/n) Σ_x C_x over cost functions C_x for individual training examples x.

            (this is the case for the quadratic cost function, where C_x = (1/2)‖y(x) − a^L(x)‖^2)

        The reason we need this assumption is that what backpropagation actually lets us do is compute the partial derivatives ∂C_x/∂w and ∂C_x/∂b for a single training example. We then recover ∂C/∂w and ∂C/∂b by averaging over training examples. In fact, with this assumption in mind, we'll suppose the training example x has been fixed, and drop the x subscript, writing the cost C_x as C. We'll eventually put the x back in, but for now it's a notational nuisance that is better left implicit.

      2: The second assumption is that the cost function can be written as a function of the outputs from the neural network, C = C(a^L), where a^L is the vector of activations in the output layer.

    the Hadamard product

          (s ⊙ t)_j = s_j t_j

      
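      In numpy the Hadamard product is just elementwise multiplication with *, for example:

          import numpy as np

          s = np.array([1, 2])
          t = np.array([3, 4])
          print(s * t)  # elementwise (Hadamard) product: [3 8]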

    The four fundamental equations behind backpropagation

       

    BP1 

       We define the error δ^l_j of the jth neuron in the lth layer by

          δ^l_j ≡ ∂C/∂z^l_j

       where z^l_j is the weighted input to that neuron.

        You might wonder why the demon is changing the weighted input z^l_j. Surely it'd be more natural to imagine the demon changing the output activation a^l_j, with the result that we'd be using ∂C/∂a^l_j as our measure of error. In fact, if you do this, things work out quite similarly to the discussion below. But it turns out to make the presentation of backpropagation a little more algebraically complicated. So we'll stick with δ^l_j = ∂C/∂z^l_j as our measure of error.

       An equation for the error in the output layer, δ^L: the components of δ^L are given by

          δ^L_j = (∂C/∂a^L_j) σ'(z^L_j)   (BP1)

       It's easy to rewrite the equation in a matrix-based form, as

          δ^L = ∇_a C ⊙ σ'(z^L)   (BP1a)

       For the quadratic cost we have ∇_a C = (a^L − y), so BP1 becomes δ^L = (a^L − y) ⊙ σ'(z^L).
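      A minimal sketch of BP1 in numpy, assuming the quadratic cost (so ∇_a C = a^L − y) and sigmoid activations; the helper names are my own, not from the original text:

          import numpy as np

          def sigmoid(z):
              return 1.0 / (1.0 + np.exp(-z))

          def sigmoid_prime(z):
              # Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z)).
              return sigmoid(z) * (1 - sigmoid(z))

          def output_error(a_L, y, z_L):
              # BP1 for the quadratic cost: delta^L = (a^L - y) ⊙ sigma'(z^L).
              return (a_L - y) * sigmoid_prime(z_L)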

    BP2

       An equation for the error δ^l in terms of the error in the next layer, δ^{l+1}:

          δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ'(z^l)   (BP2)

       By combining BP2 with BP1 we can compute the error δ^l for any layer in the network, starting from δ^L and working backward.
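      A sketch of BP2, reusing the hypothetical sigmoid_prime helper from the BP1 sketch above; w_next stands for w^{l+1} and delta_next for δ^{l+1}:

          def backpropagate_error(w_next, delta_next, z_l):
              # BP2: delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l).
              return np.dot(w_next.transpose(), delta_next) * sigmoid_prime(z_l)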

    BP3

       An equation for the rate of change of the cost with respect to any bias in the network:

          ∂C/∂b^l_j = δ^l_j   (BP3)

    BP4

       An equation for the rate of change of the cost with respect to any weight in the network:

          ∂C/∂w^l_{jk} = a^{l−1}_k δ^l_j   (BP4)
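      Once δ^l is known, BP3 and BP4 give the gradients directly. A sketch continuing the numpy examples above, assuming errors and activations are stored as column vectors of shape (n, 1):

          def layer_gradients(delta_l, a_prev):
              # BP3: dC/db^l_j = delta^l_j.
              nabla_b = delta_l
              # BP4: dC/dw^l_{jk} = a^{l-1}_k * delta^l_j (an outer product for column vectors).
              nabla_w = np.dot(delta_l, a_prev.transpose())
              return nabla_b, nabla_w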

    The backpropagation algorithm

        1. Input x: set the corresponding activation a^1 for the input layer.
        2. Feedforward: for each l = 2, 3, …, L compute z^l = w^l a^{l−1} + b^l and a^l = σ(z^l).
        3. Output error δ^L: compute the vector δ^L = ∇_a C ⊙ σ'(z^L).   (BP1)
        4. Backpropagate the error: for each l = L−1, L−2, …, 2 compute δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ'(z^l).   (BP2)
        5. Output: the gradient of the cost function is given by ∂C/∂b^l_j = δ^l_j and ∂C/∂w^l_{jk} = a^{l−1}_k δ^l_j.   (BP3, BP4)

          Of course, to implement stochastic gradient descent in practice you also need an outer loop generating mini-batches

        of training examples, and an outer loop stepping through multiple epochs of training. I've omitted those for simplicity.
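      Putting the five steps together, here is a self-contained sketch of backpropagation for a single training example, assuming sigmoid activations and the quadratic cost; the function and variable names are mine, though the structure follows the algorithm above:

          import numpy as np

          def sigmoid(z):
              return 1.0 / (1.0 + np.exp(-z))

          def sigmoid_prime(z):
              return sigmoid(z) * (1 - sigmoid(z))

          def backprop(x, y, weights, biases):
              # weights and biases are lists of numpy arrays, one per layer after
              # the input layer; x and y are column vectors. Returns the per-example
              # gradients dC_x/db and dC_x/dw, layer by layer.
              # 1. Input: set the activation for the input layer.
              activation = x
              activations = [x]   # activations a^l, layer by layer
              zs = []             # weighted inputs z^l, layer by layer
              # 2. Feedforward: z^l = w^l a^{l-1} + b^l and a^l = sigma(z^l).
              for w, b in zip(weights, biases):
                  z = np.dot(w, activation) + b
                  zs.append(z)
                  activation = sigmoid(z)
                  activations.append(activation)
              # 3. Output error (BP1), quadratic cost: delta^L = (a^L - y) ⊙ sigma'(z^L).
              delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
              nabla_b = [None] * len(biases)
              nabla_w = [None] * len(weights)
              nabla_b[-1] = delta                                       # BP3
              nabla_w[-1] = np.dot(delta, activations[-2].transpose())  # BP4
              # 4. Backpropagate the error (BP2), from layer L-1 down to layer 2.
              for l in range(2, len(weights) + 1):
                  delta = np.dot(weights[-l + 1].transpose(), delta) * sigmoid_prime(zs[-l])
                  nabla_b[-l] = delta                                           # BP3
                  nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())  # BP4
              # 5. Output: the gradient of C_x with respect to every bias and weight.
              return nabla_b, nabla_w

      To recover ∂C/∂w and ∂C/∂b you would average these per-example gradients over a mini-batch, which is exactly the averaging step from the first assumption above.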

     reference: http://neuralnetworksanddeeplearning.com/chap2.html

    ------------------------------------------------------------------------------------------------

    reference: Machine Learning by Andrew Ng

  • Original post: https://www.cnblogs.com/cbattle/p/9385919.html