Second Derivatives of Multivariate Composite Functions and Vector Calculus
Introduction
For a multivariate composite function of the form \(z=f(u_1,u_2,\dots,u_n)\), where \(u_i=g_i(x_1,x_2,\dots,x_m)\), computing second derivatives usually involves tedious, repetitive calculation, and it is easy to make mistakes when applying the chain rule repeatedly. This article presents a general solution for this class of problems, along with its theoretical derivation.
Example 1: Let \(z=f(x^2-y^2,e^{xy})\), where \(f\) has continuous second-order partial derivatives; find \(\frac{\partial ^2z}{\partial x \partial y}\).
By the chain rule, we obtain \(\frac{\partial ^2z}{\partial x \partial y}=-4xyf''_{11}+2(x^2-y^2)e^{xy}f''_{12}+xye^{2xy}f''_{22}+e^{xy}(1+xy)f'_2\).
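This result can be sanity-checked numerically. Below is a minimal sketch, assuming a hypothetical concrete choice \(f(u_1,u_2)=u_1^2+u_1u_2\) (any \(C^2\) function would do), which compares the chain-rule expression against a central-difference approximation of the mixed partial:

```python
import math

# Hypothetical concrete f (not from the article): f(u1, u2) = u1^2 + u1*u2,
# so f1 = 2*u1 + u2, f2 = u1, f11 = 2, f12 = 1, f22 = 0.
def z(x, y):
    u1, u2 = x**2 - y**2, math.exp(x * y)
    return u1**2 + u1 * u2

x, y = 0.5, 0.3
u1, u2 = x**2 - y**2, math.exp(x * y)
f2, f11, f12, f22 = u1, 2.0, 1.0, 0.0

# Value predicted by the chain-rule formula for d^2 z / (dx dy)
analytic = (-4 * x * y * f11
            + 2 * (x**2 - y**2) * math.exp(x * y) * f12
            + x * y * math.exp(2 * x * y) * f22
            + math.exp(x * y) * (1 + x * y) * f2)

# Central-difference approximation of the mixed partial
h = 1e-4
numeric = (z(x + h, y + h) - z(x + h, y - h)
           - z(x - h, y + h) + z(x - h, y - h)) / (4 * h * h)

print(abs(analytic - numeric) < 1e-5)  # True
```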
The appearance of \(f''_{11}\) and \(f''_{12}\) in this expression is reminiscent of matrix subscripts, which suggests looking for a simplified form of the expression, and indeed for a general solution to this class of problems.
The Gradient Matrix
Following the definition in [1], for a function \(f: ℝ^n \rightarrow ℝ,\ \pmb{x} \mapsto f(\pmb{x})\), with \(\pmb{x}\in ℝ^n\), i.e. \(\pmb{x}=[x_1,x_2,x_3,\dots,x_n]^T\), the partial derivatives are:
\[\frac{\partial f}{\partial x_1}= \lim_{h\rightarrow 0} \frac{f(x_1+h,x_2,\dots,x_n)-f(\pmb{x})}{h}\\
\vdots\\
\frac{\partial f}{\partial x_n}= \lim_{h\rightarrow 0}\frac{f(x_1,x_2,\dots,x_n+h)-f(\pmb{x})}{h} \tag{2.1}
\]
We collect them in a row vector, written as:
\[\nabla_{\pmb{x}}f=\operatorname{grad} f=\left[\begin{matrix}\frac{\partial f(\pmb{x})}{\partial x_1} & \frac{\partial f(\pmb{x})}{\partial x_2} & \cdots & \frac{\partial f(\pmb{x})}{\partial x_n}\end{matrix}\right] \in ℝ^{1×n} \tag{2.2}
\]
For example, for the function \(f(x,y)=(x+2y^3)^2\), we have:
\[\nabla f=\left[\begin{matrix}2(x+2y^3) & 12(x+2y^3)y^2\end{matrix}\right] \in ℝ^{1×2} \tag{2.3}
\]
To arrive at a general solution of the problem posed at the beginning, one unavoidable step is differentiating the gradient matrix \(\nabla f\) itself; we will analyze that step separately in the course of the derivation.
Second Derivatives of Multivariate Composite Functions and the Hessian Matrix
Let \(z=f(u_1,u_2,\dots,u_n)\), where \(u_i=g_i(x_1,x_2,\dots,x_m)\); find \(\frac{\partial ^2z}{\partial x_i \partial x_j}\).
\[\frac{\partial z}{\partial x_i}=\frac{\partial z}{\partial \pmb{u}}\cdot\frac{\partial \pmb{u}}{\partial x_i} =\left[\begin{matrix}\frac{\partial f}{\partial u_1} & \frac{\partial f}{\partial u_2} & \cdots & \frac{\partial f}{\partial u_n}\end{matrix}\right] \left[\begin{matrix}\frac{\partial u_1}{\partial x_i} \\ \frac{\partial u_2}{\partial x_i} \\ \vdots \\ \frac{\partial u_n}{\partial x_i}\end{matrix}\right] \tag{3.1}
\]
To simplify notation, let:
\[\pmb{X_i}=\left[\begin{matrix}\frac{\partial u_1}{\partial x_i} & \frac{\partial u_2}{\partial x_i} & \cdots & \frac{\partial u_n}{\partial x_i}\end{matrix}\right]^T \tag{3.2}
\]
Then:
\[\frac{\partial z}{\partial x_i}=\nabla_{\pmb{u}}f\cdot\pmb{X_i} \tag{3.3}
\]
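Formula (3.3) can be verified on the example from the introduction, \(z=f(x^2-y^2,e^{xy})\). The sketch below again assumes a hypothetical \(f(u_1,u_2)=u_1^2+u_1u_2\) and checks that \(\partial z/\partial x\) equals the dot product \(\nabla_{\pmb{u}}f\cdot\pmb{X}_x\):

```python
import math

# Hypothetical f(u1, u2) = u1^2 + u1*u2 composed with u = (x^2 - y^2, e^{xy}).
def z(x, y):
    u1, u2 = x**2 - y**2, math.exp(x * y)
    return u1**2 + u1 * u2

x, y = 0.7, 0.2
u1, u2 = x**2 - y**2, math.exp(x * y)
grad_u = [2 * u1 + u2, u1]            # nabla_u f (row vector)
X_x = [2 * x, y * math.exp(x * y)]    # X_i for x_i = x: [du1/dx, du2/dx]
dz_dx = sum(g * xi for g, xi in zip(grad_u, X_x))

# Central-difference approximation of dz/dx
h = 1e-6
numeric = (z(x + h, y) - z(x - h, y)) / (2 * h)
print(abs(dz_dx - numeric) < 1e-6)  # True
```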
Next, we need to evaluate
\[\frac{\partial}{\partial x_j}\left(\nabla_{\pmb{u}}f\cdot\pmb{X_i}\right) \tag{3.4}
\]
\[\frac{\partial}{\partial x_j}\left(\nabla_{\pmb{u}}f\cdot\pmb{X_i}\right)=\left(\frac{\partial}{\partial x_j}\nabla_{\pmb{u}}f\right)\cdot\pmb{X_i} + \nabla_{\pmb{u}}f\cdot\frac{\partial}{\partial x_j}\pmb{X_i} \tag{3.5}
\]
The term \(\frac{\partial}{\partial x_j}\pmb{X_i}\) is easy to obtain, so we concentrate on \(\frac{\partial}{\partial x_j}\nabla_{\pmb{u}}f\cdot\pmb{X_i}\), and in particular on \(\frac{\partial}{\partial x_j}\nabla_{\pmb{u}}f\).
By the chain rule:
\[\frac{\partial}{\partial x_j}\nabla_{\pmb{u}}f=\frac{\partial}{\partial \pmb{u}^T}\cdot\frac{\partial \pmb{u}^T}{\partial x_j}\cdot\nabla_{\pmb{u}}f=\frac{\partial \pmb{u}^T}{\partial x_j}\cdot\frac{\partial}{\partial \pmb{u}^T}\cdot\nabla_{\pmb{u}}f \tag{3.6}
\]
The problem is thus reduced to differentiating the vector \(\nabla_{\pmb{u}}f\) with respect to the vector \(\pmb{u}^T\).
Examining this operation further: it amounts to differentiating each entry of the gradient matrix with respect to each \(u_i\) in turn, so the result is clearly an \(n×n\) square matrix. This matrix is known as the Hessian matrix, written \(H(f)\); its explicit form is:
\[H(f)= \left[\begin{matrix} \frac{\partial^2 f}{\partial x_1\partial x_1} & \frac{\partial^2 f}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n}\\ \frac{\partial^2 f}{\partial x_2\partial x_1} & \frac{\partial^2 f}{\partial x_2\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_2\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \frac{\partial^2 f}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n\partial x_n} \end{matrix}\right] \tag{3.7}
\]
The pattern is evident: entry \((i,j)\) is \(\frac{\partial^2 f}{\partial x_i\partial x_j}\).
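The \(n×n\) structure of (3.7) can be illustrated with a generic finite-difference Hessian, checked here on the earlier example \(f(x,y)=(x+2y^3)^2\) (the test point and tolerances are arbitrary choices):

```python
# Generic finite-difference Hessian: entry (i, j) approximates d^2 f / (dx_i dx_j).
def hessian(f, x, h=1e-4):
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

f = lambda v: (v[0] + 2 * v[1]**3)**2
H = hessian(f, [1.0, 0.5])
# For this f: d2f/dx2 = 2, d2f/dxdy = 12*y^2 = 3 at y = 0.5, and H is symmetric.
print(abs(H[0][0] - 2.0) < 1e-3 and abs(H[0][1] - 3.0) < 1e-3)  # True
```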
With \(H(f)\) introduced, we can continue the simplification:
\[\frac{\partial \pmb{u}^T}{\partial x_j}\cdot\frac{\partial}{\partial \pmb{u}^T}\cdot\nabla_{\pmb{u}}f=\frac{\partial \pmb{u}^T}{\partial x_j}\cdot\left[\begin{matrix} \frac{\partial^2 f}{\partial u_1\partial u_1} & \frac{\partial^2 f}{\partial u_1\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_1\partial u_n}\\ \frac{\partial^2 f}{\partial u_2\partial u_1} & \frac{\partial^2 f}{\partial u_2\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_2\partial u_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial u_n\partial u_1} & \frac{\partial^2 f}{\partial u_n\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_n\partial u_n} \end{matrix}\right]=\pmb{X_j}^T\cdot H_{\pmb{u}}(f) \tag{3.8}
\]
Therefore,
\[\frac{\partial ^2z}{\partial x_i \partial x_j}=\pmb{X_j}^T\cdot H_{\pmb{u}}(f)\cdot\pmb{X_i}+\nabla_{\pmb{u}}f\cdot\frac{\partial}{\partial x_j}\pmb{X_i}=\pmb{X_j}^T\cdot H_{\pmb{u}}(f)\cdot\pmb{X_i}+\nabla_{\pmb{u}}f\cdot\pmb{X_{ij}} \tag{3.9}
\]
where
\[\pmb{X_{ij}}=\left[\begin{matrix}\frac{\partial^2 u_1}{\partial x_i\partial x_j} & \frac{\partial^2 u_2}{\partial x_i\partial x_j} & \cdots & \frac{\partial^2 u_n}{\partial x_i\partial x_j}\end{matrix}\right]^T \tag{3.10}
\]
Of course, in practice \(\pmb{X_i}\) has already been computed by this point, so it may be more convenient to differentiate it directly to obtain \(\frac{\partial}{\partial x_j}\pmb{X_i}\).
Summary
Let \(z=f(u_1,u_2,\dots,u_n)\), where \(u_i=g_i(x_1,x_2,\dots,x_m)\); to find \(\frac{\partial ^2z}{\partial x_i \partial x_j}\):
\[\frac{\partial z}{\partial x_i}=\nabla_{\pmb{u}}f\cdot\pmb{X_i} \\ \frac{\partial ^2z}{\partial x_i \partial x_j}=\pmb{X_j}^T\cdot H_{\pmb{u}}(f)\cdot\pmb{X_i}+\nabla_{\pmb{u}}f\cdot\pmb{X_{ij}} \tag{end}
\]
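As a closing check, the boxed formula can be verified end to end on the running example \(z=f(x^2-y^2,e^{xy})\), once more assuming the hypothetical \(f(u_1,u_2)=u_1^2+u_1u_2\):

```python
import math

# Hypothetical f(u1, u2) = u1^2 + u1*u2 composed with u = (x^2 - y^2, e^{xy}).
def z(x, y):
    u1, u2 = x**2 - y**2, math.exp(x * y)
    return u1**2 + u1 * u2

x, y = 0.5, 0.3
u1, u2 = x**2 - y**2, math.exp(x * y)

grad_u = [2 * u1 + u2, u1]                    # nabla_u f
H = [[2.0, 1.0], [1.0, 0.0]]                  # H_u(f) for this f
X_x = [2 * x, y * math.exp(x * y)]            # X_i (i = x)
X_y = [-2 * y, x * math.exp(x * y)]           # X_j (j = y)
X_xy = [0.0, math.exp(x * y) * (1 + x * y)]   # X_ij: mixed partials of u1, u2

# X_j^T . H_u(f) . X_i  +  nabla_u f . X_ij
quad = sum(X_y[a] * H[a][b] * X_x[b] for a in range(2) for b in range(2))
formula = quad + sum(g * s for g, s in zip(grad_u, X_xy))

# Central-difference approximation of d^2 z / (dx dy)
h = 1e-4
numeric = (z(x + h, y + h) - z(x + h, y - h)
           - z(x - h, y + h) + z(x - h, y - h)) / (4 * h * h)
print(abs(formula - numeric) < 1e-5)  # True
```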
References
- [1] Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong, *Mathematics for Machine Learning*, Cambridge University Press.