• Conjugate Gradient Descent (1)


    • Algorithm features:
      ①. Recast the linear system as an equivalent optimization problem; ②. Search along conjugate directions.
    • Algorithm derivation:
      Part Ⅰ Algorithm details
      We derive the algorithm for the following linear system,
      \begin{equation}
      Ax = b
      \label{eq_1}
      \end{equation}
      If the above system $\eqref{eq_1}$ has a solution, it is equivalent to the following optimization problem,
      \begin{equation}
      \min\quad \frac{1}{2}\|Ax - b\|_2^2 \quad\Rightarrow\quad \min\quad \frac{1}{2}x^\mathrm{T}A^\mathrm{T}Ax - b^\mathrm{T}Ax
      \label{eq_2}
      \end{equation}
      The optimal solution $x^\star$ of $\eqref{eq_2}$ is the solution of $\eqref{eq_1}$. To simplify notation, let $C = A^\mathrm{T}A$ and $c = A^\mathrm{T}b$; then $\eqref{eq_2}$ becomes
      \begin{equation}
      \min\quad \frac{1}{2}x^\mathrm{T}Cx - c^\mathrm{T}x
      \label{eq_3}
      \end{equation}
      where $C$ is a symmetric positive semidefinite matrix. Problem $\eqref{eq_3}$ is an unconstrained convex quadratic program; denoting its objective function by $J$, the gradient is
      \begin{equation}
      g = \nabla J = Cx - c
      \label{eq_4}
      \end{equation}
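      As a quick sanity check on $\eqref{eq_4}$ (a sketch added here, not part of the original derivation), the snippet below compares the analytic gradient against a central finite-difference approximation for a random $C$, $c$:
        import numpy

        numpy.random.seed(1)
        n = 5
        M = numpy.random.uniform(-1, 1, (n, n))
        C = numpy.matmul(M.T, M)                   # symmetric positive semidefinite, as in eq_3
        c = numpy.random.uniform(-1, 1, (n, 1))

        def J(x):                                  # objective of eq_3
            return (numpy.matmul(x.T, numpy.matmul(C, x)) / 2 - numpy.matmul(c.T, x))[0, 0]

        def grad(x):                               # analytic gradient, eq_4
            return numpy.matmul(C, x) - c

        x = numpy.random.uniform(-1, 1, (n, 1))
        eps = 1.e-6
        numGrad = numpy.zeros_like(x)
        for i in range(n):                         # central finite differences
            e = numpy.zeros_like(x)
            e[i] = eps
            numGrad[i] = (J(x + e) - J(x - e)) / (2 * eps)
        print(numpy.allclose(numGrad, grad(x), atol=1.e-6))   # expected: True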
      Without loss of generality, write the $k$-th iteration as
      \begin{equation}
      x_{k+1} = x_k + \alpha_kd_k
      \label{eq_5}
      \end{equation}
      where $\alpha_k$ and $d_k$ denote the step size and the search direction at step $k$, respectively.
      The exact line-search condition requires
      \begin{equation}
      \alpha_k = \mathop{argmin}_{\alpha\in \mathbb{R}} J(x_k+\alpha d_k)\quad\Rightarrow\quad
      \frac{\mathrm{d}J}{\mathrm{d}\alpha}\bigg|_{k+1} = 0
      \label{eq_6}
      \end{equation}
      Expanding the derivative gives $\frac{\mathrm{d}J}{\mathrm{d}\alpha} = d_k^\mathrm{T}\big(C(x_k+\alpha d_k) - c\big) = d_k^\mathrm{T}g_k + \alpha\,d_k^\mathrm{T}Cd_k$; setting it to zero yields
      \begin{equation}
      \alpha_k = \frac{-g_k^\mathrm{T}d_k}{d_k^\mathrm{T}Cd_k}
      \label{eq_7}
      \end{equation}
      The conjugate-direction conditions require
      \begin{gather}
      d_k = -g_k + \beta_kd_{k-1}\label{eq_8} \\
      d_{k}^\mathrm{T}Cd_{k-1} = 0\label{eq_9}
      \end{gather}
      Substituting $\eqref{eq_8}$ into $\eqref{eq_9}$, and noting from $\eqref{eq_4}$ and $\eqref{eq_5}$ that $g_k - g_{k-1} = \alpha_{k-1}Cd_{k-1}$, we obtain
      \begin{equation}
      \beta_k = \frac{g_k^\mathrm{T}Cd_{k-1}}{d_{k-1}^\mathrm{T}Cd_{k-1}} =
      \frac{g_k^\mathrm{T}(g_k-g_{k-1})}{d_{k-1}^\mathrm{T}(g_k-g_{k-1})}
      \label{eq_10}
      \end{equation}
      Combining $\eqref{eq_5}$, $\eqref{eq_7}$, $\eqref{eq_8}$ and $\eqref{eq_10}$, problem $\eqref{eq_3}$ can be solved iteratively; its optimal solution $x^\star$ is the solution of $\eqref{eq_1}$.
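      As a minimal illustration (an added sketch with hand-picked numbers): for a symmetric positive definite system, updates $\eqref{eq_5}$, $\eqref{eq_7}$, $\eqref{eq_8}$ and $\eqref{eq_10}$ reach the exact solution in at most $n$ steps. For a $2\times 2$ example,
        import numpy

        C = numpy.array([[4., 1.], [1., 3.]])      # symmetric positive definite
        c = numpy.array([[1.], [2.]])

        x = numpy.zeros((2, 1))
        g = numpy.matmul(C, x) - c
        d = -g
        for k in range(2):                         # at most n = 2 steps
            alpha = -numpy.matmul(g.T, d)[0, 0] / numpy.matmul(d.T, numpy.matmul(C, d))[0, 0]   # eq_7
            x = x + alpha * d                      # eq_5
            gNew = numpy.matmul(C, x) - c
            beta = numpy.matmul(gNew.T, gNew - g)[0, 0] / numpy.matmul(d.T, gNew - g)[0, 0]     # eq_10
            d = -gNew + beta * d                   # eq_8
            g = gNew
        print(x.flatten())                         # ~[0.0909, 0.6364], i.e. C^{-1}c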
      Part Ⅱ Algorithm flow
      Initialize the convergence tolerance $\epsilon$ and the starting point $x_0$.
      Compute the current gradient $g_0=Cx_0 - c$, set $d_0 = -g_0$, $k=0$, and repeat the following steps,
        step1: if $\|g_k\|<\epsilon$, the iteration has converged; stop
        step2: compute the step size $\displaystyle \alpha_k=\frac{-g_k^\mathrm{T}d_k}{d_k^\mathrm{T}Cd_k}$
        step3: update the iterate $\displaystyle x_{k+1}=x_k + \alpha_kd_k$
        step4: update the gradient $g_{k+1}=Cx_{k+1}-c$
        step5: compute the combination coefficient $\displaystyle \beta_{k+1}=\frac{g_{k+1}^\mathrm{T}(g_{k+1}-g_k)}{d_{k}^\mathrm{T}(g_{k+1}-g_k)}$
        step6: update the conjugate direction $\displaystyle d_{k+1}=-g_{k+1}+\beta_{k+1}d_k$
        step7: set $k = k+1$, go to step1
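      For reference (an added cross-check, assuming SciPy is available), SciPy ships a production CG solver that can be fed the normal equations of eq_3 to validate the flow above:
        import numpy
        from scipy.sparse.linalg import cg         # SciPy's conjugate gradient solver

        numpy.random.seed(0)
        A = numpy.random.uniform(-10, 10, (30, 30))
        b = numpy.random.uniform(-10, 10, (30,))
        C, c = numpy.matmul(A.T, A), numpy.matmul(A.T, b)   # C is SPD when A is nonsingular

        x, info = cg(C, c)                         # info == 0 signals convergence; raise maxiter for ill-conditioned C
        print(info, numpy.max(numpy.abs(numpy.matmul(C, x) - c)))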
    • Code implementation:
      The author solves a randomly generated linear system $Ax = b$; the conjugate gradient descent implementation follows,
        # Implementation of conjugate gradient descent
        # Solves the linear system Ax = b

        import numpy
        from matplotlib import pyplot as plt


        numpy.random.seed(0)


        # randomly generate the linear system Ax = b to be solved
        def get_A_b(n=30):
            A = numpy.random.uniform(-10, 10, (n, n))
            b = numpy.random.uniform(-10, 10, (n, 1))
            return A, b


        # implementation of the conjugate gradient method
        class CGD(object):

            def __init__(self, A, b, epsilon=1.e-6, maxIter=30000):
                self.__A = A                               # coefficient matrix
                self.__b = b                               # right-hand-side vector
                self.__epsilon = epsilon                   # convergence tolerance
                self.__maxIter = maxIter                   # maximum number of iterations

                self.__C, self.__c = self.__build_C_c()    # parameters of the equivalent optimization problem
                self.__JPath = list()


            def get_solu(self):
                '''
                compute the numerical solution
                '''
                self.__JPath.clear()

                x = self.__init_x()
                JVal = self.__calc_JVal(self.__C, self.__c, x)
                self.__JPath.append(JVal)
                grad = self.__calc_grad(self.__C, self.__c, x)
                d = -grad
                for idx in range(self.__maxIter):
                    # print("iterCnt: {:3d},   JVal: {}".format(idx, JVal))
                    if self.__converged1(grad, self.__epsilon):
                        self.__print_MSG(x, JVal, idx)
                        return x, JVal, True

                    alpha = self.__calc_alpha(self.__C, grad, d)
                    xNew = x + alpha * d
                    JNew = self.__calc_JVal(self.__C, self.__c, xNew)
                    self.__JPath.append(JNew)
                    if self.__converged2(xNew - x, JNew - JVal, self.__epsilon ** 2):
                        self.__print_MSG(xNew, JNew, idx + 1)
                        return xNew, JNew, True

                    gNew = self.__calc_grad(self.__C, self.__c, xNew)
                    beta = self.__calc_beta(gNew, grad, d)
                    dNew = self.__calc_d(gNew, d, beta)

                    x, JVal, grad, d = xNew, JNew, gNew, dNew
                else:
                    if self.__converged1(grad, self.__epsilon):
                        self.__print_MSG(x, JVal, self.__maxIter)
                        return x, JVal, True

                print("CGD not converged after {} steps!".format(self.__maxIter))
                return x, JVal, False


            def get_path(self):
                return self.__JPath


            def __print_MSG(self, x, JVal, iterCnt):
                print("Iteration steps: {}".format(iterCnt))
                print("Solution:\n{}".format(x.flatten()))
                print("JVal: {}".format(JVal))


            def __converged1(self, grad, epsilon):
                if numpy.linalg.norm(grad, ord=numpy.inf) < epsilon:
                    return True
                return False


            def __converged2(self, xDelta, JDelta, epsilon):
                val1 = numpy.linalg.norm(xDelta, ord=numpy.inf)
                val2 = numpy.abs(JDelta)
                if val1 < epsilon or val2 < epsilon:
                    return True
                return False


            def __init_x(self):
                x0 = numpy.zeros((self.__C.shape[0], 1))
                return x0


            # alternative objective, the quadratic form of eq_3:
            # def __calc_JVal(self, C, c, x):
            #     term1 = numpy.matmul(x.T, numpy.matmul(C, x))[0, 0] / 2
            #     term2 = numpy.matmul(c.T, x)[0, 0]
            #     JVal = term1 - term2
            #     return JVal


            def __calc_JVal(self, C, c, x):
                # evaluates the left-hand expression of eq_2, ||Ax - b||^2 / 2
                term1 = numpy.matmul(self.__A, x) - self.__b
                JVal = numpy.sum(term1 ** 2) / 2
                return JVal


            def __calc_grad(self, C, c, x):
                grad = numpy.matmul(C, x) - c
                return grad


            def __calc_d(self, grad, dOld, beta):
                d = -grad + beta * dOld
                return d


            def __calc_alpha(self, C, grad, d):
                term1 = numpy.matmul(grad.T, d)[0, 0]
                term2 = numpy.matmul(d.T, numpy.matmul(C, d))[0, 0]
                alpha = -term1 / term2
                return alpha


            def __calc_beta(self, grad, gOld, dOld):
                term0 = grad - gOld
                term1 = numpy.matmul(grad.T, term0)[0, 0]
                term2 = numpy.matmul(dOld.T, term0)[0, 0]
                beta = term1 / term2
                return beta


            def __build_C_c(self):
                C = numpy.matmul(self.__A.T, self.__A)     # fixed: use the instance attributes, not the globals A, b
                c = numpy.matmul(self.__A.T, self.__b)
                return C, c


        class CGDPlot(object):

            @staticmethod
            def plot_fig(cgdObj):
                x, JVal, tab = cgdObj.get_solu()
                JPath = cgdObj.get_path()

                fig = plt.figure(figsize=(6, 4))
                ax1 = plt.subplot()

                ax1.plot(numpy.arange(len(JPath)), JPath, "k.")
                ax1.plot(0, JPath[0], "go", label="starting point")
                ax1.plot(len(JPath)-1, JPath[-1], "r*", label="solution")

                ax1.legend()
                ax1.set(xlabel="$iterCnt$", ylabel="$JVal$", title="JVal-Final = {}".format(JVal))
                fig.tight_layout()
                fig.savefig("plot_fig.png", dpi=100)



        if __name__ == "__main__":
            A, b = get_A_b()

            cgdObj = CGD(A, b)
            CGDPlot.plot_fig(cgdObj)
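      As a quick validation (an added snippet, run in the same session as the script above), the numerical solution can be compared against NumPy's direct solver:
        # cross-check against the direct solver; assumes A, b, cgdObj from the script above
        x, JVal, ok = cgdObj.get_solu()
        xRef = numpy.linalg.solve(A, b)
        print(ok, numpy.max(numpy.abs(x - xRef)))  # small gap expected when ok is True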
    • Results:
      [Figure: JVal versus iteration count, saved by the script as plot_fig.png]
      Note that JVal above is computed with the left-hand expression of $\eqref{eq_2}$; if this value tends to 0 at convergence, the numerical solution is reliable.
    • Usage notes:
      ①. Well suited to the iterative solution of large linear systems;
      ②. For a convex quadratic objective, the expanding-subspace theorem can further simplify the computation of the combination coefficient $\beta$, as sketched below.
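      Concretely (an added sketch using the standard identities, not stated in the original post): with exact line search on a quadratic, successive gradients are mutually orthogonal and $d_k^\mathrm{T}g_{k+1} = 0$, so $\eqref{eq_10}$ collapses to the Fletcher–Reeves form
      \begin{equation*}
      \beta_{k+1} = \frac{g_{k+1}^\mathrm{T}(g_{k+1}-g_k)}{d_{k}^\mathrm{T}(g_{k+1}-g_k)} = \frac{g_{k+1}^\mathrm{T}g_{k+1}}{g_{k}^\mathrm{T}g_{k}}
      \end{equation*}
      which drops the inner products with $d$. A drop-in replacement for __calc_beta in the class above would be:
        def __calc_beta(self, grad, gOld, dOld):
            # Fletcher-Reeves form: valid under exact line search on a quadratic,
            # where g_{k+1}^T g_k = 0 and d_k^T g_k = -g_k^T g_k
            return numpy.matmul(grad.T, grad)[0, 0] / numpy.matmul(gOld.T, gOld)[0, 0]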
    • References:
      https://zhuanlan.zhihu.com/p/338838078
  • Original post: https://www.cnblogs.com/xxhbdk/p/15008748.html