• BFGS(1)


    • Algorithm characteristics:
      BFGS uses only first-order information of the objective $f(\vec{x})$ to build an approximation to its (second-order) Hessian matrix. Combined with the Armijo rule for step-size selection, it achieves superlinear convergence in the optimization process.
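      For reference (the post does not state it explicitly; this is the standard form that the code below implements), the Armijo sufficient-decrease rule accepts a step size $\alpha$ along a descent direction $\vec{d}$ when
      \begin{equation}
      f(\vec{x} + \alpha\vec{d}) \le f(\vec{x}) + c\,\alpha\,\nabla f(\vec{x})^T \vec{d}
      \end{equation}
      where $c$ is a small constant (the code uses $c = 10^{-4}$) and $\alpha$ is shrunk geometrically (here by a factor $v = 0.5$) until the condition holds.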
    • Algorithm derivation:
      For convenience, introduce two symbols $B$ and $D$ for the approximate Hessian matrix and its inverse, respectively:
      \begin{equation}\label{eq_1}
      B \approx H; \quad D \approx H^{-1}
      \end{equation}
      Note that both $B$ and $D$ are symmetric matrices.
      At the $k$-th iteration, denote the differences between the optimization variables and the gradients of step $k$ and step $k-1$ by $\vec{s}_k$ and $\vec{y}_k$, respectively:
      \begin{equation}\label{eq_2}
      \vec{s}_k = \vec{x}_k - \vec{x}_{k-1} \\
      \vec{y}_k = \nabla{f(\vec{x}_k)} - \nabla{f(\vec{x}_{k-1})}
      \end{equation}
      Introduce the secant condition (analogous to the mean value theorem):
      \begin{equation}\label{eq_3}
      \vec{y}_k = B_k \cdot \vec{s}_k
      \end{equation}
      Since $D_k = B_k^{-1}$, multiplying both sides by $D_k$ gives:
      \begin{equation}\label{eq_4}
      D_k \cdot \vec{y}_k = \vec{s}_k
      \end{equation}
      Between two consecutive iterations, the change of the matrix is measured with the Frobenius norm. To keep the approximate Hessian stable, the update should be conservative: subject to the constraints above, the change in $B$ or $D$ between consecutive iterations should be as small as possible. This leads to the following optimization problem (vector arrows are omitted from here on):
      \begin{equation}\label{eq_5}
      \min \quad \left\| D_k - D_{k-1} \right\|_F^2 \\
      s.t. \quad D_k \cdot y_k = s_k \\
      \quad\quad D_k \ \text{is symmetric}
      \end{equation}
      Here $D_k$, the optimization variable, is the inverse of the approximate Hessian at step $k$. The optimal solution takes the following form (the author is not entirely certain of this; strictly, standard treatments derive the BFGS update from a weighted Frobenius norm, and readers are encouraged to verify it):
      \begin{equation}\label{eq_6}
      D_k = (I - \rho_k s_k y_k^T) D_{k-1} (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T
      \end{equation}
      where $\rho_k = (y_k^T s_k)^{-1}$.
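      As a quick sanity check (added here, not part of the original derivation), the minimal sketch below verifies numerically that the update in \eqref{eq_6} satisfies the secant condition $D_k y_k = s_k$ and keeps $D_k$ symmetric; $s_k$, $y_k$, and $D_{k-1}$ are arbitrary test data.

        # Minimal numerical check of the update in eq. (6):
        # it satisfies D_k * y_k = s_k and preserves symmetry.
        import numpy

        numpy.random.seed(0)
        n = 4
        sk = numpy.random.rand(n, 1)
        yk = numpy.random.rand(n, 1)
        DPrev = numpy.identity(n)                              # any symmetric D_{k-1}

        rk = 1 / numpy.matmul(yk.T, sk)[0, 0]                  # rho_k
        I = numpy.identity(n)
        Dk = numpy.matmul(numpy.matmul(I - rk * numpy.matmul(sk, yk.T), DPrev),
                          I - rk * numpy.matmul(yk, sk.T)) + rk * numpy.matmul(sk, sk.T)

        print(numpy.allclose(numpy.matmul(Dk, yk), sk))        # True: secant condition holds
        print(numpy.allclose(Dk, Dk.T))                        # True: D_k stays symmetric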
    • Code implementation:
        # BFGS algorithm implementation
        # Step size determined by the Armijo rule

        import numpy
        from matplotlib import pyplot as plt


        # 0th-order information of the objective function
        def func(x1, x2):
            funcVal = 5 * x1 ** 2 + 2 * x2 ** 2 + 3 * x1 - 10 * x2 + 4
            return funcVal


        # 1st-order information (gradient) of the objective function
        def grad(x1, x2):
            gradVal = numpy.array([[10 * x1 + 3], [4 * x2 - 10]])
            return gradVal


        class BFGS(object):

            def __init__(self, seed=None, epsilon=1.e-6, maxIter=300):
                self.__seed = seed                         # starting point of the iteration
                self.__epsilon = epsilon                   # convergence tolerance
                self.__maxIter = maxIter                   # maximum number of iterations

                self.__xPath = list()                      # path of the optimization variables
                self.__fPath = list()                      # path of the objective values

            def solve(self):
                self.__init_path()

                xCurr = self.__init_seed(self.__seed)
                fCurr = func(xCurr[0, 0], xCurr[1, 0])
                gCurr = grad(xCurr[0, 0], xCurr[1, 0])
                DCurr = self.__init_D(xCurr.shape[0])      # initialize D with the identity matrix
                self.__save_path(xCurr, fCurr)

                for i in range(self.__maxIter):
                    if self.__is_converged(gCurr):
                        self.__print_MSG()
                        break

                    dCurr = -numpy.matmul(DCurr, gCurr)
                    alpha = self.__calc_alpha_by_ArmijoRule(xCurr, fCurr, gCurr, dCurr)

                    xNext = xCurr + alpha * dCurr
                    fNext = func(xNext[0, 0], xNext[1, 0])
                    gNext = grad(xNext[0, 0], xNext[1, 0])
                    DNext = self.__update_D_by_BFGS(xCurr, gCurr, xNext, gNext, DCurr)

                    xCurr, fCurr, gCurr, DCurr = xNext, fNext, gNext, DNext
                    self.__save_path(xCurr, fCurr)
                else:
                    if self.__is_converged(gCurr):
                        self.__print_MSG()
                    else:
                        print("BFGS not converged after {} steps!".format(self.__maxIter))

            def show(self):
                if not self.__xPath:
                    self.solve()

                fig = plt.figure(figsize=(10, 4))
                ax1 = plt.subplot(1, 2, 1)
                ax2 = plt.subplot(1, 2, 2)

                ax1.plot(numpy.arange(len(self.__fPath)), self.__fPath, "k.")
                ax1.plot(0, self.__fPath[0], "go", label="starting point")
                ax1.plot(len(self.__fPath) - 1, self.__fPath[-1], "r*", label="solution")
                ax1.set(xlabel="$iterCnt$", ylabel="$iterVal$")
                ax1.legend()

                x1 = numpy.linspace(-100, 100, 500)
                x2 = numpy.linspace(-100, 100, 500)
                x1, x2 = numpy.meshgrid(x1, x2)
                f = func(x1, x2)
                ax2.contour(x1, x2, f, levels=36)
                x1Path = list(item[0] for item in self.__xPath)
                x2Path = list(item[1] for item in self.__xPath)
                ax2.plot(x1Path, x2Path, "k--", lw=2)
                ax2.plot(x1Path[0], x2Path[0], "go", label="starting point")
                ax2.plot(x1Path[-1], x2Path[-1], "r*", label="solution")
                ax2.set(xlabel="$x_1$", ylabel="$x_2$")
                ax2.legend()

                fig.tight_layout()
                fig.savefig("bfgs.png", dpi=300)
                plt.close()

            def __print_MSG(self):
                print("Iteration steps: {}".format(len(self.__xPath) - 1))
                print("Seed: {}".format(self.__xPath[0].reshape(-1)))
                print("Solution: {}".format(self.__xPath[-1].reshape(-1)))

            def __is_converged(self, gCurr):
                if numpy.linalg.norm(gCurr) <= self.__epsilon:
                    return True
                return False

            def __update_D_by_BFGS(self, xCurr, gCurr, xNext, gNext, DCurr):
                sk = xNext - xCurr
                yk = gNext - gCurr
                rk = 1 / numpy.matmul(yk.T, sk)[0, 0]

                term1 = rk * numpy.matmul(sk, yk.T)
                term2 = rk * numpy.matmul(yk, sk.T)
                I = numpy.identity(term1.shape[0])
                term3 = numpy.matmul(I - term1, DCurr)
                term4 = numpy.matmul(term3, I - term2)
                term5 = rk * numpy.matmul(sk, sk.T)

                Dk = term4 + term5
                return Dk

            def __calc_alpha_by_ArmijoRule(self, xCurr, fCurr, gCurr, dCurr, c=1.e-4, v=0.5):
                i = 0
                alpha = v ** i
                xNext = xCurr + alpha * dCurr
                fNext = func(xNext[0, 0], xNext[1, 0])

                while True:
                    if fNext <= fCurr + c * alpha * numpy.matmul(dCurr.T, gCurr)[0, 0]: break
                    i += 1
                    alpha = v ** i
                    xNext = xCurr + alpha * dCurr
                    fNext = func(xNext[0, 0], xNext[1, 0])

                return alpha

            def __init_D(self, n):
                D = numpy.identity(n)
                return D

            def __init_seed(self, seed):
                if seed is None:
                    seed = numpy.random.uniform(-100, 100, 2)

                seed = numpy.array(seed).reshape((2, 1))
                return seed

            def __init_path(self):
                self.__xPath.clear()
                self.__fPath.clear()

            def __save_path(self, xCurr, fCurr):
                self.__xPath.append(xCurr)
                self.__fPath.append(fCurr)


        if __name__ == "__main__":
            obj = BFGS()
            obj.solve()
            obj.show()

      The example function used in this post is:
      \begin{equation}\label{eq_7}
      f(x_1, x_2) = 5x_1^2 + 2x_2^2 + 3x_1 - 10x_2 + 4
      \end{equation}
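      For comparison (added here as a quick check, not in the original post), the minimizer of this quadratic can be read off by setting the gradient to zero:
      \begin{equation}
      \frac{\partial f}{\partial x_1} = 10x_1 + 3 = 0, \quad
      \frac{\partial f}{\partial x_2} = 4x_2 - 10 = 0
      \quad\Rightarrow\quad x_1^* = -0.3, \ x_2^* = 2.5
      \end{equation}
      so the solution reported by the code should converge to $(-0.3, 2.5)$.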

    • Results:
      (Figure bfgs.png produced by the code above: left panel, objective value per iteration; right panel, optimization path over the contour plot of $f$.)
    • Usage suggestions:
      1. As a quasi-Newton method, BFGS is mainly used when the exact Hessian of the objective cannot be obtained directly. One then settles for the BFGS update, which builds an approximate Hessian from gradient information alone.
      2. Note that the Hessian reflects the curvature of the function itself, so it is very sensitive to noise in the function response. If the response is noisy, it should be smoothed before BFGS is applied (see the sketch after this list).
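      A minimal sketch of one possible smoothing strategy (an assumption for illustration, not from the original post): average several evaluations of a noisy response before handing it to the optimizer. The noise model, `noisy_func`, and the number of repetitions are hypothetical.

        import numpy

        def noisy_func(x1, x2, sigma=0.05):
            # hypothetical noisy response: the example objective plus Gaussian noise
            return (5 * x1 ** 2 + 2 * x2 ** 2 + 3 * x1 - 10 * x2 + 4
                    + numpy.random.normal(0.0, sigma))

        def smoothed_func(x1, x2, nRepeat=20):
            # average repeated evaluations to damp the noise before it reaches BFGS
            return numpy.mean([noisy_func(x1, x2) for _ in range(nRepeat)])

      Averaging $n$ repeated evaluations reduces the noise standard deviation by roughly $1/\sqrt{n}$; for strongly noisy or expensive responses, fitting a smooth surrogate model is usually the better choice.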
    • References:
      http://aria42.com/blog/2014/12/understanding-lbfgs
      https://blog.csdn.net/itplus/article/details/21897443