• Extension: Vector Derivatives


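    Vector Derivatives

    Acknowledgements

    [The essence of matrix derivatives and of numerator layout vs. denominator layout (Matrix Derivatives: The Essentials) - Zhihu (zhihu.com)](https://zhuanlan.zhihu.com/p/263777564)

    07 Automatic Differentiation [Dive into Deep Learning v2] - bilibili

    The gradient points in the direction in which the function value increases fastest. Everything below uses numerator layout: the components of the numerator (the function value) run down the rows and the denominator (the variable) is laid out transposed, so, for example, the derivative of a scalar with respect to a column vector is a row vector.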

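    I. Function Evaluation, Derivatives, Vectors and Matrices

    Consider a function

    \[\text{function(input)} \]

    We can classify such functions by the type of \(\text{function}\) (its output value) and by the type of \(\text{input}\) (its argument).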

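    1 \(\text{function}\) is a scalar

    We call \(\text{function}\) a real-valued scalar function and denote it by a lowercase non-bold letter \(f\).

    1.1 \(\text{input}\) is a scalar

    We say the argument of \(\text{function}\) is a scalar and denote it by a lowercase non-bold letter \(x\).

    Evaluation: the input is a scalar (\((1,)\)), the function is a real-valued scalar function, and the result is a single value, i.e. a scalar (\((1,)\)).

    Derivative: the numerator (the function value) is a scalar (\((1,)\)), the denominator (the variable) is a scalar (\((1,)\)), and the result is a scalar (\((1,)\)).

    Example 1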

    \[f(x) = 2x+2 \\ f'(x)=2 \]
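    A central finite difference is a quick way to sanity-check a hand-derived derivative such as the one in Example 1. The helper `derivative` and the step size `h` are illustrative choices. They do not come from the original notes.

```python
def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x) for a scalar function of a scalar."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x):
    # Example 1: f(x) = 2x + 2, so f'(x) = 2 everywhere
    return 2 * x + 2


print(derivative(f, 3.0))  # approximately 2
```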

    1.2 \(\text{input}\) is a vector

    We say the argument of \(\text{function}\) is a vector and denote it by a bold lowercase letter \(\mathbfcal{x}\).

    Evaluation: the input is a column vector (\((n,1)\), i.e. \(n\times 1\)), the function is a real-valued scalar function, and the result is a scalar (\((1,)\)).

    Derivative: the numerator (the function value) is a scalar (\((1,)\)), the denominator (the variable) is a column vector (\((n,1)\), i.e. \(n\times 1\)), and the result is a row vector (\((1,n)\), i.e. \(1\times n\)).

    \[\mathbfcal{x}= \left[ \begin {array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right ]_{(n,1)} \\ y=f(\mathbfcal{x})_{(1,)} \\ f'(\mathbfcal{x})= \frac{\partial y}{ \partial \mathbfcal{x}} = \left[ \frac{\partial y}{\partial x_1} , \frac{\partial y}{\partial x_2}, \cdots , \frac{\partial y}{\partial x_n} \right ]_{(1,n)} \]

    Example 2

    \[\mathbfcal{x}= \left[ \begin {array}{c} x_1 \\ x_2 \end{array} \right ] \\ y=f(\mathbfcal{x}) = a_1x_1^2+a_2x_2^2+a_3x_1x_2+a_4x_1+a_5x_2+a_6 \\ f'(\mathbfcal{x})= \frac{\partial y}{ \partial \mathbfcal{x}} = \left[ \frac{\partial y}{\partial x_1} , \frac{\partial y}{\partial x_2} \right ] = \left[ 2a_1x_1+a_3x_2+a_4, 2a_2x_2+a_3x_1+a_5\right ] \]
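    The numerator-layout gradient of Example 2 can be checked numerically. Perturb one component of \(\mathbfcal{x}\) at a time and collect the partial derivatives into a row vector. The coefficient values `a1` to `a6` and the evaluation point `x` are hypothetical.

```python
def grad_row(f, x, h=1e-6):
    """Numerator-layout gradient of a scalar function of a vector: a row vector of length n."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))  # partial derivative w.r.t. x_i
    return g


# Hypothetical coefficients for Example 2
a1, a2, a3, a4, a5, a6 = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0


def f(x):
    x1, x2 = x
    return a1 * x1**2 + a2 * x2**2 + a3 * x1 * x2 + a4 * x1 + a5 * x2 + a6


def analytic(x):
    # [2 a1 x1 + a3 x2 + a4, 2 a2 x2 + a3 x1 + a5]
    x1, x2 = x
    return [2 * a1 * x1 + a3 * x2 + a4, 2 * a2 * x2 + a3 * x1 + a5]


x = [0.5, -1.5]
print(grad_row(f, x))
print(analytic(x))
```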

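    1.3 \(\text{input}\) is a matrix

    We say the argument of \(\text{function}\) is a matrix and denote it by a bold uppercase letter \(\symbf{X}\).

    Evaluation: the input is a matrix (\((n,k)\), i.e. \(n\times k\)), the function is a real-valued scalar function, and the result is a scalar (\((1,)\)).

    Derivative: the numerator (the function value) is a scalar (\((1,)\)), the denominator (the variable) is a matrix (\((n,k)\), i.e. \(n\times k\)), and the result is a matrix of the transposed shape (\((k,n)\), i.e. \(k\times n\)).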

    \[\symbf{X} = \left ( \begin{matrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \\ \end{matrix} \right )_{(n,k)n\times k} \\ y=f(\symbf{X})_{(1,)} \\ f'(\symbf{X}) =\frac{\partial y}{\partial \symbf{X}} = \left ( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{n1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{n2}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y}{\partial x_{1k}} & \frac{\partial y}{\partial x_{2k}} & \cdots & \frac{\partial y}{\partial x_{nk}} \\ \end{matrix} \right )_{(k,n)k\times n} \]

    Example 3

    \[\symbf{X} = \left ( \begin{matrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \\ \end{matrix} \right )_{(3,2) 3\times 2} \\ y=f(\symbf{X})_{(1,)}=a_1x_{11}^2+a_2x_{12}^2+a_3x_{21}^2+a_4x_{22}^2+a_5x_{31}^2+a_6x_{32}^2 \\ f'(\symbf{X}) =\frac{\partial y}{\partial \symbf{X}} = \left ( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{31}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \frac{\partial y}{\partial x_{32}} \\ \end{matrix} \right) _{(2,3),2\times 3} = \left ( \begin{matrix} 2a_1x_{11} & 2a_3x_{21} & 2a_5x_{31} \\ 2a_2x_{12} & 2a_4x_{22} & 2a_6x_{32} \\ \end{matrix} \right) _{(2,3),2\times 3} \]
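    Below is a sketch for Example 3 that represents matrices as plain nested lists. Each entry \(x_{pq}\) is perturbed in turn, and the partial derivative is written to position `[q][p]`. This produces the transposed \(k\times n\) layout used above. The coefficient matrix `a` (holding \(a_1\) to \(a_6\)) and the input `X` are hypothetical.

```python
def grad_matrix_numerator(f, X, h=1e-6):
    """Numerator-layout derivative of a scalar f w.r.t. an n x k matrix X.

    Entry G[q][p] holds dy/dx_{pq}, so G is k x n (the transpose of X's shape).
    """
    n, k = len(X), len(X[0])
    G = [[0.0] * n for _ in range(k)]
    for p in range(n):
        for q in range(k):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[p][q] += h
            Xm[p][q] -= h
            G[q][p] = (f(Xp) - f(Xm)) / (2 * h)
    return G


# Hypothetical coefficients: a[p][q] multiplies x_{pq}^2 (a1, a2 in row 0; a3, a4 in row 1; a5, a6 in row 2)
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]


def f(X):
    return sum(a[p][q] * X[p][q] ** 2 for p in range(3) for q in range(2))


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
G = grad_matrix_numerator(f, X)
print(len(G), len(G[0]))  # 2 3
print(G)  # entries approximately 2 * a[p][q] * X[p][q], arranged as [q][p]
```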

    2 \(\text{function}\) is a vector

    We call \(\text{function}\) a real vector function and denote it by a bold lowercase letter \(\mathbfcal{f}\).

    Meaning: \(\mathbfcal{f}\) is a vector whose components are scalar functions \(f\).

    2.1 \(\text{input}\) is a scalar

    Evaluation: the input (argument) is a scalar (\((1,)\)), the function is a column-vector function (\((m,1)\), i.e. \(m \times 1\)), and the output is a column vector (\((m,1)\), i.e. \(m \times 1\)).

    Derivative: the numerator (the function value) is a column vector (\((m,1)\), i.e. \(m \times 1\)), the denominator (the variable) is a scalar (\((1,)\)), and the result is a column vector (\((m,1)\), i.e. \(m \times 1\)).

    \[x \\ \mathbfcal{y} = \mathbfcal{f}(x)= \left[ \begin {array}{c} y_1 \\ y_2 \\ \vdots \\ y_m \end{array} \right ] \\ \frac{\partial \mathbfcal{y}}{ \partial x} = \left[ \begin {array}{c} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x}\\ \vdots \\ \frac{\partial y_m}{\partial x} \end{array} \right ] \]

    Example 4

    \[x \\ \mathbfcal{y}=\mathbfcal{f}(x)= \left[ \begin {array}{c} x+1 \\ 2x^2+1 \\ 3x^3+1 \end{array} \right ]_{(3,1)3 \times 1} \\ \mathbfcal{f}'(x)=\frac{\partial \mathbfcal{y}}{ \partial x} = \left[ \begin {array}{c} 1 \\ 4x \\ 9x^2 \end{array} \right ]_{(3,1)3\times 1} \]
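    Example 4 can be checked numerically as well. A single perturbation of the scalar \(x\) yields all \(m\) component derivatives at once, stacked as a column. The helper name and the evaluation point \(x=2\) are illustrative.

```python
def derivative_vec_wrt_scalar(f, x, h=1e-6):
    """Derivative of a vector function of a scalar: a column vector (list) of length m."""
    yp, ym = f(x + h), f(x - h)
    return [(a - b) / (2 * h) for a, b in zip(yp, ym)]


def f(x):
    # Example 4: [x + 1, 2x^2 + 1, 3x^3 + 1]
    return [x + 1, 2 * x**2 + 1, 3 * x**3 + 1]


x = 2.0
print(derivative_vec_wrt_scalar(f, x))  # close to [1, 4x, 9x^2] = [1, 8, 36]
print([1, 4 * x, 9 * x**2])
```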

    2.2 \(\text{input}\) is a vector

    Evaluation: the input (argument) is a vector (\((n,1)\), i.e. \(n\times 1\)), the function is a column-vector function (\((m,1)\), i.e. \(m \times 1\)), and the output is a column vector (\((m,1)\), i.e. \(m \times 1\)).

    Derivative: the numerator (the function value) is a column vector (\((m,1)\), i.e. \(m \times 1\)), the denominator (the variable) is a column vector (\((n,1)\), i.e. \(n \times 1\)), and the result is a matrix (\((m,n)\), i.e. \(m \times n\)), the Jacobian matrix.

    Jacobian matrix

    • The Jacobian matrix can be viewed as a way of organizing gradient vectors: row \(i\) is the gradient (a row vector) of the component \(y_i\).
    • A gradient vector can be viewed as a way of organizing partial derivatives.
    • Hence the Jacobian matrix can be viewed as a matrix that organizes all the partial derivatives \(\frac{\partial y_i}{\partial x_j}\).

    \[\mathbfcal{x}= \left[ \begin {array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right ]_{(n,1)n\times 1} \\ \mathbfcal{y} = \mathbfcal{f}(\mathbfcal{x})= \left[ \begin {array}{c} y_1 \\ y_2 \\ \vdots \\ y_m \end{array} \right ] = \left[ \begin {array}{c} f_1(x_1,x_2,\cdots,x_n) \\ f_2(x_1,x_2,\cdots,x_n) \\ \vdots \\ f_m(x_1,x_2,\cdots,x_n) \end{array} \right ]_{(m,1)m\times 1} \\ \frac{\partial \mathbfcal{y}}{ \partial \mathbfcal{x}} = \left ( \begin{matrix} \frac{\partial y_1}{\partial x_{1}} & \frac{\partial y_1}{\partial x_{2}} & \cdots & \frac{\partial y_1}{\partial x_{n}} \\ \frac{\partial y_2}{\partial x_{1}} & \frac{\partial y_2}{\partial x_{2}} & \cdots & \frac{\partial y_2}{\partial x_{n}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_m}{\partial x_{1}} & \frac{\partial y_m}{\partial x_{2}} & \cdots & \frac{\partial y_m}{\partial x_{n}} \\ \end{matrix} \right )_{(m,n)m\times n} \]

    Example 5

    \[\mathbfcal{x}= \left[ \begin {array}{c} x_1 \\ x_2 \end{array} \right ]_{(2,1)2 \times 1} \\ \mathbfcal{y}=\mathbfcal{f}(\mathbfcal{x})= \left[ \begin {array}{c} x_1+x_2 \\ 2x_1^2+2x_2^2 \\ 3x_1^3+3x_2^3 \end{array} \right ]_{(3,1)3 \times 1} \\ \mathbfcal{f}'(\mathbfcal{x})=\frac{\partial \mathbfcal{y}}{ \partial \mathbfcal{x}} = \left[ \begin {array}{cc} 1 &1 \\ 4x_1 &4x_2 \\ 9x_1^2 &9x_2^2 \end{array} \right ]_{(3,2)3\times 2} \]
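    Here is a numerical Jacobian for Example 5, given as a sketch. Column \(j\) comes from perturbing \(x_j\), and row \(i\) is the gradient of \(y_i\), which matches the description of the Jacobian above. The helper `jacobian` and the point `x` are hypothetical.

```python
def jacobian(f, x, h=1e-6):
    """Numerator-layout Jacobian of f: R^n -> R^m as an m x n nested list, J[i][j] = dy_i/dx_j."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2 * h)  # fills column j
    return J


def f(x):
    # Example 5
    x1, x2 = x
    return [x1 + x2, 2 * x1**2 + 2 * x2**2, 3 * x1**3 + 3 * x2**3]


x = [1.0, 2.0]
J = jacobian(f, x)
# Analytic Jacobian from Example 5: [[1, 1], [4 x1, 4 x2], [9 x1^2, 9 x2^2]]
expected = [[1.0, 1.0], [4 * x[0], 4 * x[1]], [9 * x[0] ** 2, 9 * x[1] ** 2]]
print(J)
print(expected)
```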

    2.3 \(\text{input}\) is a matrix

    Evaluation: the input (argument) is a matrix (\((n,k)\), i.e. \(n\times k\)), the function is a column-vector function (\((m,1)\), i.e. \(m \times 1\)), and the output is a column vector (\((m,1)\), i.e. \(m \times 1\)).

    Derivative: the numerator (the function value) is a column vector (\((m,1)\), i.e. \(m \times 1\)), the denominator (the variable) is a matrix (\((n,k)\), i.e. \(n \times k\)), and the result is a third-order tensor (\((m,k,n)\), i.e. \(m \times k \times n\)): one \(m\times k\) slice for each of the \(n\) rows of \(\symbf{X}\).

    \[\symbf{X} = \left ( \begin{matrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \\ \end{matrix} \right )_{(n,k)n\times k} \\ \mathbfcal{y} = \mathbfcal{f}(\symbf{X})= \left[ \begin {array}{c} y_1 \\ y_2 \\ \vdots \\ y_m \end{array} \right ] = \left[ \begin {array}{c} f_1(x_{11},x_{12},\cdots,x_{21},\cdots,x_{nk}) \\ f_2(x_{11},x_{12},\cdots,x_{21},\cdots,x_{nk}) \\ \vdots \\ f_m(x_{11},x_{12},\cdots,x_{21},\cdots,x_{nk}) \end{array} \right ]_{(m,1)m\times 1} \\ \frac{\partial \mathbfcal{y}}{ \partial \symbf{X}} = \left ( \begin{matrix} \frac{\partial y_1}{\partial x_{11}} & \frac{\partial y_1}{\partial x_{12}} & \cdots & \frac{\partial y_1}{\partial x_{1k}} \\ \frac{\partial y_2}{\partial x_{11}} & \frac{\partial y_2}{\partial x_{12}} & \cdots & \frac{\partial y_2}{\partial x_{1k}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_m}{\partial x_{11}} & \frac{\partial y_m}{\partial x_{12}} & \cdots & \frac{\partial y_m}{\partial x_{1k}} \\ \end{matrix} \right ) \left ( \begin{matrix} \frac{\partial y_1}{\partial x_{21}} & \frac{\partial y_1}{\partial x_{22}} & \cdots & \frac{\partial y_1}{\partial x_{2k}} \\ \frac{\partial y_2}{\partial x_{21}} & \frac{\partial y_2}{\partial x_{22}} & \cdots & \frac{\partial y_2}{\partial x_{2k}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_m}{\partial x_{21}} & \frac{\partial y_m}{\partial x_{22}} & \cdots & \frac{\partial y_m}{\partial x_{2k}} \\ \end{matrix} \right ) \cdots \left ( \begin{matrix} \frac{\partial y_1}{\partial x_{n1}} & \frac{\partial y_1}{\partial x_{n2}} & \cdots & \frac{\partial y_1}{\partial x_{nk}} \\ \frac{\partial y_2}{\partial x_{n1}} & \frac{\partial y_2}{\partial x_{n2}} & \cdots & \frac{\partial y_2}{\partial x_{nk}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_m}{\partial x_{n1}} & \frac{\partial y_m}{\partial x_{n2}} & \cdots & \frac{\partial y_m}{\partial x_{nk}} \\ \end{matrix} \right )_{(m,k,n)m\times k\times n} \]
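    The index bookkeeping for the \((m,k,n)\) layout is easy to get wrong. The sketch below stores \(\frac{\partial y_i}{\partial x_{pq}}\) at `D[i][q][p]`, following the slices written above (slice \(p\) is an \(m\times k\) matrix). The function `f` is a hypothetical example: it returns the sum of all entries and the sum of their squares, so \(m=2\).

```python
def derivative_vec_wrt_matrix(f, X, h=1e-6):
    """Derivative of a vector function y = f(X) (length m) w.r.t. an n x k matrix X.

    D[i][q][p] = dy_i / dx_{pq}, so D has shape (m, k, n).
    """
    n, k = len(X), len(X[0])
    m = len(f(X))
    D = [[[0.0] * n for _ in range(k)] for _ in range(m)]
    for p in range(n):
        for q in range(k):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[p][q] += h
            Xm[p][q] -= h
            yp, ym = f(Xp), f(Xm)
            for i in range(m):
                D[i][q][p] = (yp[i] - ym[i]) / (2 * h)
    return D


def f(X):
    # hypothetical y = [sum of entries, sum of squared entries]
    return [sum(sum(row) for row in X), sum(v * v for row in X for v in row)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # n = 3, k = 2
D = derivative_vec_wrt_matrix(f, X)
print(len(D), len(D[0]), len(D[0][0]))  # 2 2 3
```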

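    3 \(\text{function}\) is a matrix

    We call \(\text{function}\) a real matrix function and denote it by a bold uppercase letter \(\mathbf{F}\).

    Meaning: \(\mathbf{F}\) is a matrix whose entries are scalar functions \(f\).

    3.1 \(\text{input}\) is a scalar

    Evaluation: the input is a scalar (\((1,)\)), the function is a real matrix function, and the result is a matrix (\((m,l)\)).

    Derivative: the numerator (the function value) is a matrix (\((m,l)\)), the denominator (the variable) is a scalar (\((1,)\)), and the result is a matrix (\((m,l)\)) obtained by differentiating each entry.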

    \[x \\ \symbf{Y} = \mathbf{F}(x)= \left ( \begin{matrix} f_{11}(x) & f_{12}(x) & \cdots & f_{1l}(x) \\ f_{21}(x) & f_{22}(x) & \cdots & f_{2l}(x) \\ \vdots & \vdots & & \vdots \\ f_{m1}(x) & f_{m2}(x) & \cdots & f_{ml}(x) \\ \end{matrix} \right ) = \left ( \begin{matrix} y_{11} & y_{12} & \cdots & y_{1l} \\ y_{21} & y_{22} & \cdots & y_{2l} \\ \vdots & \vdots & & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{ml} \\ \end{matrix} \right )_{(m,l)m\times l} \\ \frac{ \partial\symbf{Y}}{\partial x} =\mathbf{F}'(x) = \left ( \begin{matrix} f'_{11}(x) & f'_{12}(x) & \cdots & f'_{1l}(x) \\ f'_{21}(x) & f'_{22}(x) & \cdots & f'_{2l}(x) \\ \vdots & \vdots & & \vdots \\ f'_{m1}(x) & f'_{m2}(x) & \cdots & f'_{ml}(x) \\ \end{matrix} \right ) = \left ( \begin{matrix} \frac{ \partial y_{11}}{ \partial x} & \frac{ \partial y_{12}}{ \partial x} & \cdots & \frac{ \partial y_{1l}}{ \partial x} \\ \frac{ \partial y_{21}}{ \partial x} & \frac{ \partial y_{22}}{ \partial x} & \cdots & \frac{ \partial y_{2l}}{ \partial x} \\ \vdots & \vdots & & \vdots \\ \frac{ \partial y_{m1}}{ \partial x} & \frac{ \partial y_{m2}}{ \partial x} & \cdots & \frac{ \partial y_{ml}}{ \partial x} \\ \end{matrix} \right )_{(m,l)m\times l} \]
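    For a matrix function of a scalar, the derivative is taken entry by entry and keeps the \(m\times l\) shape. Here is a minimal sketch with a hypothetical \(2\times 2\) function \(\mathbf{F}(x)\) that does not appear in the original notes.

```python
import math


def derivative_mat_wrt_scalar(F, x, h=1e-6):
    """Entry-wise central-difference derivative of a matrix function of a scalar (same m x l shape)."""
    Fp, Fm = F(x + h), F(x - h)
    return [[(a - b) / (2 * h) for a, b in zip(rp, rm)] for rp, rm in zip(Fp, Fm)]


def F(x):
    # hypothetical 2 x 2 matrix function
    return [[x, x**2], [math.sin(x), 1.0]]


x = 0.5
print(derivative_mat_wrt_scalar(F, x))
print([[1.0, 2 * x], [math.cos(x), 0.0]])  # analytic entry-wise derivative
```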

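    3.2 \(\text{input}\) is a vector

    Evaluation: the input is a vector (\((n,1)\)), the function is a real matrix function, and the result is a matrix (\((m,l)\)).

    Derivative: the numerator (the function value) is a matrix (\((m,l)\)), the denominator (the variable) is a vector (\((n,1)\)), and the result is a third-order tensor (\((m,l,n)\), i.e. \(m\times l \times n\)).

    3.3 \(\text{input}\) is a matrix

    Evaluation: the input is a matrix (\((n,k)\)), the function is a real matrix function, and the result is a matrix (\((m,l)\)).

    Derivative: the numerator (the function value) is a matrix (\((m,l)\)), the denominator (the variable) is a matrix (\((n,k)\)), and the result is a fourth-order tensor (\((m,l,k,n)\), i.e. \(m\times l \times k \times n\)).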

    Summary
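    The shapes listed in sections 1 to 3 appear to follow a single rule. First, treat a column vector as having one axis (drop its trailing 1). Then append the denominator's axes, in reverse order, to the numerator's axes. There are two special cases: a scalar over a vector gives \((1,n)\), and a vector over a scalar gives \((m,1)\). The function below encodes this reading of the notes. Its name is hypothetical, and shapes use the document's notation: \((1,)\) for a scalar, \((n,1)\) for a column vector, \((n,k)\) for a matrix.

```python
def numerator_layout_shape(y_shape, x_shape):
    """Shape of dy/dx under the conventions listed in sections 1 to 3 of these notes.

    Shapes: (1,) scalar, (n, 1) column vector, (n, k) matrix.
    """
    def axes(shape):
        shape = tuple(shape)
        if shape == (1,):
            return ()            # scalar: contributes no axis
        if len(shape) == 2 and shape[1] == 1:
            return (shape[0],)   # column vector: one axis
        return shape             # matrix: two axes

    y_axes, x_axes = axes(y_shape), axes(x_shape)
    if not y_axes and not x_axes:
        return (1,)                        # scalar w.r.t. scalar
    if not y_axes and len(x_axes) == 1:
        return (1,) + x_axes               # scalar w.r.t. column vector: row vector (1, n)
    if len(y_axes) == 1 and not x_axes:
        return y_axes + (1,)               # column vector w.r.t. scalar: column vector (m, 1)
    return y_axes + x_axes[::-1]           # numerator axes, then reversed denominator axes


m, l, n, k = 3, 4, 5, 6
for y_shape, y_name in [((1,), "scalar"), ((m, 1), "vector"), ((m, l), "matrix")]:
    for x_shape, x_name in [((1,), "scalar"), ((n, 1), "vector"), ((n, k), "matrix")]:
        print(f"{y_name} w.r.t. {x_name}: {numerator_layout_shape(y_shape, x_shape)}")
```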

    Examples

    Derivative of a scalar with respect to a vector (case 1.2)

    Derivative of a vector with respect to a vector (case 2.2)

    II. Vector Chain Rule

    Scalar chain rule

    Here \(x,u,y\) are all scalars:

    \[y=f(u) , u=g(x) \\ \frac{\partial y}{\partial x} = \frac{\partial y}{\partial u}\frac{\partial u}{\partial x} \]
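    Here is a numerical check of the scalar chain rule. It assumes the hypothetical functions \(u=g(x)=x^2\) and \(y=f(u)=\sin u\), so that \(\frac{\partial y}{\partial x}=\cos(x^2)\cdot 2x\).

```python
import math


def derivative(f, x, h=1e-6):
    """Central-difference approximation of the derivative of a scalar function."""
    return (f(x + h) - f(x - h)) / (2 * h)


def g(x):
    return x ** 2          # hypothetical u = g(x)


def f(u):
    return math.sin(u)     # hypothetical y = f(u)


x = 1.3
chain = derivative(f, g(x)) * derivative(g, x)   # dy/du evaluated at u = g(x), times du/dx
direct = derivative(lambda t: f(g(t)), x)        # differentiate the composition directly
analytic = math.cos(x ** 2) * 2 * x
print(chain, direct, analytic)
```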

    Vector chain rule

    Derivative of a scalar with respect to a vector

    • The intermediate variable is a scalar (the two factors are cases 1.1 and 1.2)

      \[y=f(u) , u=g(\mathbfcal{x}) \\ \mathbfcal{x}_{(n,1)}, u_{(1,)}, y_{(1,)} \\ \frac{\partial y}{\partial \mathbfcal{x}}_{(1,n)} = \frac{\partial y}{\partial u}_{(1,)} \frac{\partial u}{\partial \mathbfcal{x}}_{(1,n)} \]

    • The intermediate variable is a vector (the two factors are cases 1.2 and 2.2)

      \[y=f(\mathbfcal{u}) , \mathbfcal{u}_{(k,1)}=\mathbfcal{g}(\mathbfcal{x}) \\ \mathbfcal{x}_{(n,1)}, \mathbfcal{u}_{(k,1)}, y_{(1,)} \\ \frac{\partial y}{\partial \mathbfcal{x}}_{(1,n)} = \frac{\partial y}{\partial \mathbfcal{u}}_{(1,k)} \frac{\partial \mathbfcal{u}}{\partial \mathbfcal{x}}_{(k,n)} \]

      The shapes compose like an ordinary matrix product: \((1,k)\) times \((k,n)\) gives \((1,n)\).
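    Below is a sketch of the second case with two hypothetical choices. The first is \(\mathbfcal{u}=\symbf{A}\mathbfcal{x}\), so \(\frac{\partial \mathbfcal{u}}{\partial \mathbfcal{x}}=\symbf{A}\), with \(k=3\) and \(n=2\). The second is \(y=\mathbfcal{u}^T\mathbfcal{u}\), so \(\frac{\partial y}{\partial \mathbfcal{u}}=2\mathbfcal{u}^T\). Multiplying the \((1,k)\) row vector by the \((k,n)\) Jacobian gives the \((1,n)\) gradient.

```python
def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Hypothetical choice: u = A x with A of shape (k, n) = (3, 2), and y = u^T u
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
x = [[0.5], [2.0]]                      # column vector (n, 1)
u = matmul(A, x)                        # column vector (k, 1)

dy_du = [[2 * u_i[0] for u_i in u]]     # row vector (1, k): gradient of u^T u is 2 u^T
du_dx = A                               # Jacobian (k, n) of the linear map x -> A x
dy_dx = matmul(dy_du, du_dx)            # (1, k) x (k, n) -> (1, n)
print(dy_dx)
```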

    Derivative of a vector with respect to a vector

    • The intermediate variable is a vector (both factors are case 2.2)

      \[\mathbfcal{y}_{(m,1)}=\mathbfcal{f}_{(m,1)}(\mathbfcal{u}_{(k,1)}) , \mathbfcal{u}_{(k,1)}=\mathbfcal{g}_{(k,1)}(\mathbfcal{x}_{(n,1)}) \\ \mathbfcal{x}_{(n,1)}, \mathbfcal{u}_{(k,1)}, \mathbfcal{y}_{(m,1)} \\ \frac{\partial \mathbfcal{y}}{\partial \mathbfcal{x}}_{(m,n)} = \frac{\partial \mathbfcal{y}}{\partial \mathbfcal{u}}_{(m,k)} \frac{\partial \mathbfcal{u}}{\partial \mathbfcal{x}}_{(k,n)} \]
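    In the vector-by-vector case, the chain rule becomes a product of Jacobians. As a hypothetical example, take \(\mathbfcal{u}=\symbf{A}\mathbfcal{x}\) and let \(\mathbfcal{y}\) be the element-wise square of \(\mathbfcal{u}\), so that \(\frac{\partial \mathbfcal{y}}{\partial \mathbfcal{u}}=\operatorname{diag}(2\mathbfcal{u})\). The sketch compares the product of the two Jacobians with a finite-difference Jacobian of the composition.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def jacobian(f, x, h=1e-6):
    """Finite-difference Jacobian, J[i][j] = dy_i/dx_j."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2 * h)
    return J


A = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]   # hypothetical g(x) = A x, so du/dx = A, shape (k, n) = (3, 2)


def g(x):
    return matvec(A, x)


def f(u):
    return [v ** 2 for v in u]              # hypothetical element-wise square, so dy/du = diag(2u), (m, k) = (3, 3)


x = [0.5, 2.0]
u = g(x)
dy_du = [[2 * u[i] if i == j else 0.0 for j in range(len(u))] for i in range(len(u))]
dy_dx_chain = matmul(dy_du, A)                     # (m, k) x (k, n) -> (m, n)
dy_dx_direct = jacobian(lambda t: f(g(t)), x)      # Jacobian of the composition
print(dy_dx_chain)
print(dy_dx_direct)
```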

    Examples

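    III. Automatic Differentiation

    Automatic differentiation computes the derivative of a function at a specified input value. It differs from symbolic differentiation, which produces a formula for the derivative, and from numerical differentiation, which approximates the derivative with finite differences.

    Computational graph: the computation is broken down into primitive operations, and the graph records how each intermediate value depends on its inputs. Each primitive has a simple local derivative, and the chain rule combines them.

    Two modes: the chain rule \(\frac{\partial y}{\partial x} = \frac{\partial y}{\partial u_n}\frac{\partial u_n}{\partial u_{n-1}}\cdots\frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x}\) can be evaluated in two orders, forward accumulation (from \(x\) toward \(y\)) or reverse accumulation (from \(y\) back toward \(x\), also known as backpropagation).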
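    Forward accumulation can be sketched with dual numbers. Each value carries its derivative with respect to one chosen input, so one pass is needed per input variable. The `Dual` class, the `sin` helper and the function `f` are hypothetical illustrations, not a library API.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """Dual number val + dot*eps with eps^2 = 0: a value and its derivative w.r.t. the seeded input."""
    val: float
    dot: float

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # product rule: (a b)' = a' b + a b'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def sin(d):
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)


def f(x1, x2):
    # hypothetical example: y = x1 * x2 + sin(x1)
    return x1 * x2 + sin(x1)


# One forward pass per input: seed that input's dot with 1 and the others with 0
dy_dx1 = f(Dual(2.0, 1.0), Dual(3.0, 0.0)).dot   # x2 + cos(x1)
dy_dx2 = f(Dual(2.0, 0.0), Dual(3.0, 1.0)).dot   # x1
print(dy_dx1, dy_dx2)
```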

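    Reverse accumulation: a forward pass computes and stores all intermediate values, then a backward pass propagates the derivative of the output from \(y\) back to the inputs, visiting each node of the graph once.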
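    Here is a minimal reverse-accumulation sketch. The forward pass builds the computational graph and stores each node's local derivatives. A single backward pass in reverse topological order then accumulates the gradient of the output with respect to every input. `Var`, `sin` and `backward` are hypothetical names; frameworks such as PyTorch implement the same idea far more completely.

```python
import math


class Var:
    """Node in a computational graph for reverse-mode automatic differentiation."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs (parent_node, local_derivative d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Propagate dy/d(node) from the output back to every node in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad   # chain rule, summed over all paths


x1, x2 = Var(2.0), Var(3.0)
y = x1 * x2 + sin(x1)   # hypothetical example: y = x1 * x2 + sin(x1)
backward(y)             # one backward pass yields both partial derivatives
print(x1.grad, x2.grad)  # x2 + cos(x1), x1
```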

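    Complexity

    Reverse accumulation: the computational cost is \(O(n)\), where \(n\) is the number of operations, and it is usually comparable to the cost of the forward pass. The memory cost is also \(O(n)\), because every intermediate result of the forward pass must be stored for the backward pass. Forward accumulation, by comparison, costs \(O(n)\) time to obtain the gradient with respect to one variable, with \(O(1)\) extra memory.

    Code implementation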

  • Original article: https://www.cnblogs.com/zuti666/p/16095607.html