• Understanding and Applying Momentum (动量/冲量)


    1. Basic Concepts (Momentum vs SGD)

    Momentum is used to accelerate SGD (stochastic gradient descent) along the relevant search direction and to dampen oscillations.

    • GD(gradient descent)

      \theta_t = \theta_{t-1} - \eta \nabla_\theta J(\theta)

      # batch gradient descent: one parameter update per full pass over the data
      for i in range(num_epochs):
          params_grad = evaluate_gradient(loss_function, data, params)
          params = params - learning_rate * params_grad
    • SGD(stochastic gradient descent)

      \theta_t = \theta_{t-1} - \eta \nabla_\theta J(\theta; x^{(i)}, y^{(i)})

      # stochastic gradient descent: one parameter update per training example
      for i in range(num_epochs):
          np.random.shuffle(data)
          for example in data:
              params_grad = evaluate_gradient(loss_function, example, params)
              params = params - learning_rate * params_grad
    • Momentum(冲量/动量)

      v_t = \gamma v_{t-1} + \eta \nabla_\theta J(\theta), \qquad \theta_t = \theta_{t-1} - v_t

      # momentum: accumulate an exponentially decaying average of past gradients
      v = 0  # velocity, initialized to zero before the first update
      for i in range(num_epochs):
          params_grad = evaluate_gradient(loss_function, data, params)
          v = gamma * v + learning_rate * params_grad
          params = params - v

      Here γ is the momentum coefficient. It must satisfy γ < 1; a common choice is γ = 0.9 or a smaller value. As Section 2 below shows, γ can also be varied during the iterations. A runnable comparison of plain GD and the momentum update is sketched right after this list.
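    To make the effect concrete, here is a minimal, self-contained NumPy sketch (not from the original post) that runs plain gradient descent (γ = 0) and the momentum update above on an ill-conditioned quadratic; the objective, starting point, learning rate, and step count are all illustrative assumptions.

      import numpy as np

      # Toy objective J(theta) = 0.5*(1*x^2 + 50*y^2): curvature differs sharply
      # between the two axes, so plain GD oscillates along y and crawls along x.
      def grad(theta):
          return np.array([1.0, 50.0]) * theta

      def run(gamma, lr=0.03, steps=100):
          theta = np.array([10.0, 1.0])   # assumed starting point
          v = np.zeros_like(theta)        # velocity accumulator
          for _ in range(steps):
              v = gamma * v + lr * grad(theta)   # gamma = 0 recovers plain GD
              theta = theta - v
          return theta

      print("plain GD (gamma=0.0):", run(gamma=0.0))
      print("momentum (gamma=0.9):", run(gamma=0.9))

    With these assumed settings, the momentum run should end noticeably closer to the minimum along the flat x direction, which is the acceleration described above.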

    2. Variable Momentum Schedule

    maxepoch = 50;
    initialmomentum = 0.5;   % momentum used during the first half of training
    finalmomentum = 0.9;     % larger momentum once the search direction has stabilized
    
    for i = 1:maxepoch
        ...
        % switch from the small to the large momentum halfway through training
        if i < maxepoch/2
            momentum = initialmomentum;
        else
            momentum = finalmomentum;
        end
        ... 
    end
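
    For readers following the Python pseudocode in Section 1, a minimal translation of the same two-stage schedule is sketched below; the toy gradient, learning rate, and epoch count are illustrative assumptions rather than settings from the original post.

    import numpy as np

    # Same kind of ill-conditioned toy quadratic as in the Section 1 sketch.
    def grad(theta):
        return np.array([1.0, 50.0]) * theta

    max_epoch = 50
    initial_momentum = 0.5    # momentum for the first half of training
    final_momentum = 0.9      # momentum for the second half
    learning_rate = 0.01      # assumed value

    theta = np.array([10.0, 1.0])
    v = np.zeros_like(theta)
    for i in range(max_epoch):
        # switch to the larger momentum halfway through, as in the MATLAB snippet
        gamma = initial_momentum if i < max_epoch / 2 else final_momentum
        v = gamma * v + learning_rate * grad(theta)
        theta = theta - v

    print("theta after", max_epoch, "epochs:", theta)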
  • Original article: https://www.cnblogs.com/mtcnn/p/9421807.html