Loss Functions and Their Gradients

Typical Loss
MSE
- Derivative
- MSE Gradient
Softmax
- Derivative

Typical Loss

Mean Squared Error
Cross Entropy Loss
- binary
- multi-class
- with softmax

MSE

$loss = \sum [y - (xw + b)]^{2}$
$L_{2\text{-}norm} = \| y - (xw + b) \|_{2}$
$loss = norm(y - (xw + b))^{2}$
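
The first and third forms above describe the same quantity: summing the squared residuals gives the same number as squaring the L2 norm of the residual vector. A minimal sketch in plain Python (standard library only, so it runs without TensorFlow); the values of `y` and `pred` are made up for illustration, with `pred` standing in for $xw + b$:

```python
import math

# Hypothetical targets and predictions; `pred` stands in for x w + b.
y = [1.0, 2.0, 3.0]
pred = [1.5, 1.5, 2.0]

residual = [yi - pi for yi, pi in zip(y, pred)]

# Form 1: sum of squared residuals.
sum_of_squares = sum(r * r for r in residual)

# Form 2: Euclidean (L2) norm of the residual vector, then squared.
# math.hypot accepts any number of coordinates on Python 3.8+.
l2_norm = math.hypot(*residual)
norm_squared = l2_norm ** 2

print(sum_of_squares, norm_squared)
```

Dividing `sum_of_squares` by the number of elements turns the summed form into a mean squared error.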

Derivative

$loss = \sum [y - f_{θ}(x)]^{2}$
$\frac{\nabla loss}{\nabla θ} = -2 \sum [y - f_{θ}(x)] * \frac{\nabla f_{θ}(x)}{\nabla θ}$
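
The chain rule contributes a factor of 2 from the square and a factor of $-\frac{\nabla f_{θ}(x)}{\nabla θ}$ from the inner term $y - f_{θ}(x)$, so the gradient carries a minus sign. A finite-difference check makes the sign easy to confirm. The sketch below assumes a hypothetical scalar model $f_{θ}(x) = θ x$ and made-up data, purely for illustration:

```python
# Hypothetical 1-D data and a scalar linear model f_theta(x) = theta * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]

def loss(theta):
    # Sum of squared residuals, matching loss = sum [y - f_theta(x)]^2.
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

def analytic_grad(theta):
    # Chain rule: d/dtheta [y - f]^2 = 2 [y - f] * (-df/dtheta), with df/dtheta = x.
    return -2.0 * sum((y - theta * x) * x for x, y in zip(xs, ys))

def numeric_grad(theta, eps=1e-6):
    # Central finite difference approximation of d loss / d theta.
    return (loss(theta + eps) - loss(theta - eps)) / (2.0 * eps)

theta = 0.5
print(analytic_grad(theta), numeric_grad(theta))
```

At `theta = 0.5` the model underestimates every target, so the gradient is negative and a gradient-descent step would increase `theta`.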

MSE Gradient

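import tensorflow as tf

# Toy batch: 2 samples with 4 features each, 3 output classes.
x = tf.random.normal([2, 4])
w = tf.random.normal([4, 3])
b = tf.zeros([3])
y = tf.constant([2, 0])  # integer class labels

with tf.GradientTape() as tape:
    # w and b are plain tensors, not tf.Variable, so they must be watched explicitly.
    tape.watch([w, b])
    prob = tf.nn.softmax(x @ w + b, axis=1)
    # MSE between the one-hot targets and the predicted probabilities.
    loss = tf.reduce_mean(tf.losses.MSE(tf.one_hot(y, depth=3), prob))

grads = tape.gradient(loss, [w, b])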

grads[0]

<tf.Tensor: id=92, shape=(4, 3), dtype=float32, numpy=
array([[ 0.01156707, -0.00927749, -0.00228957],
       [ 0.03556816, -0.03894382,  0.00337564],
       [-0.02537526,  0.01924876,  0.00612648],
       [-0.0074787 ,  0.00161515,  0.00586352]], dtype=float32)>

grads[1]

<tf.Tensor: id=90, shape=(3,), dtype=float32, numpy=array([-0.01552947,  0.01993286, -0.00440337], dtype=float32)>

Softmax

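Softmax is a soft version of max.
Larger scores are pushed relatively higher, while smaller scores shrink and are squeezed closer together.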

[Figure: softmax]
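
To make the "soft max" behaviour concrete, here is a small sketch in plain Python. The `softmax` helper and the example scores are illustrative assumptions; subtracting the maximum before exponentiating is a common stability trick that leaves the result unchanged.

```python
import math

def softmax(logits):
    # Shift by the max for numerical stability; the output is unaffected.
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical raw scores
probs = softmax(logits)

# approximately [0.659, 0.242, 0.099]
print([round(p, 3) for p in probs])
```

The outputs sum to 1 and keep the ordering of the inputs. The top score is 2 times the second one before softmax, while its probability is roughly 2.7 times larger afterwards.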

Derivative

$p_{i} = \frac{e^{a_{i}}}{\sum_{k = 1}^{N} e^{a_{k}}}$

$i = j$

$\frac{\partial p_{i}}{\partial a_{j}} = \frac{\partial \frac{e^{a_{i}}}{\sum_{k = 1}^{N} e^{a_{k}}}}{\partial a_{j}} = p_{i} (1 - p_{j})$

$i \neq j$

$\frac{\partial p_{i}}{\partial a_{j}} = \frac{\partial \frac{e^{a_{i}}}{\sum_{k = 1}^{N} e^{a_{k}}}}{\partial a_{j}} = - p_{j} * p_{i}$
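
The two derivative formulas above can be written together as $\frac{\partial p_{i}}{\partial a_{j}} = p_{i} (\delta_{ij} - p_{j})$, where $\delta_{ij}$ is 1 when $i = j$ and 0 otherwise. As a rough sanity check, the sketch below compares that expression against central finite differences in plain Python. The helper names and the input vector `a` are assumptions made for this illustration.

```python
import math

def softmax(a):
    m = max(a)  # shift for numerical stability
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def analytic_jacobian(a):
    # J[i][j] = p_i * (1 - p_j) when i == j, and -p_j * p_i otherwise.
    p = softmax(a)
    n = len(a)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(a, eps=1e-6):
    # Perturb one input a_j at a time and difference every output p_i.
    n = len(a)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(a), list(a)
        up[j] += eps
        down[j] -= eps
        p_up, p_down = softmax(up), softmax(down)
        for i in range(n):
            jac[i][j] = (p_up[i] - p_down[i]) / (2.0 * eps)
    return jac

a = [2.0, 1.0, 0.1]  # hypothetical logits
J = analytic_jacobian(a)
J_num = numeric_jacobian(a)
max_err = max(abs(J[i][j] - J_num[i][j]) for i in range(3) for j in range(3))
print(max_err)
```

Diagonal entries come out positive and off-diagonal entries negative: raising one logit raises its own probability and lowers all the others.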

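# Same setup as the MSE example; assumes `import tensorflow as tf` from above.
x = tf.random.normal([2, 4])
w = tf.random.normal([4, 3])
b = tf.zeros([3])
y = tf.constant([2, 0])  # integer class labels

with tf.GradientTape() as tape:
    tape.watch([w, b])
    logits = x @ w + b
    # from_logits=True: the loss applies softmax internally, so pass raw logits.
    loss = tf.reduce_mean(
        tf.losses.categorical_crossentropy(
            tf.one_hot(y, depth=3), logits, from_logits=True))

grads = tape.gradient(loss, [w, b])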

grads[0]

<tf.Tensor: id=226, shape=(4, 3), dtype=float32, numpy=
array([[-0.38076094,  0.33844548,  0.04231545],
       [-1.0262716 , -0.6730384 ,  1.69931   ],
       [ 0.20613424, -0.50421923,  0.298085  ],
       [ 0.5800004 , -0.22329211, -0.35670823]], dtype=float32)>

grads[1]

<tf.Tensor: id=224, shape=(3,), dtype=float32, numpy=array([-0.3719653 ,  0.53269935, -0.16073406], dtype=float32)>
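
Combining the softmax derivative with cross entropy $-\log p_{label}$ gives a compact gradient with respect to the logits for a single sample: $\frac{\partial loss}{\partial a_{j}} = p_{j} - y_{j}$, where $y$ is the one-hot target. The following plain-Python sketch checks this against finite differences. The helper names, logits, and label are illustrative assumptions and do not reproduce the random tensors used above.

```python
import math

def softmax(a):
    m = max(a)  # shift for numerical stability
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, label):
    # Single-sample cross entropy: -log p_label with p = softmax(logits).
    return -math.log(softmax(logits)[label])

logits = [2.0, 1.0, 0.1]  # hypothetical logits
label = 2                 # hypothetical class index

p = softmax(logits)
one_hot = [1.0 if k == label else 0.0 for k in range(len(logits))]

# Closed form: gradient w.r.t. each logit is p_j - y_j.
analytic = [pk - yk for pk, yk in zip(p, one_hot)]

# Central finite differences, perturbing one logit at a time.
eps = 1e-6
numeric = []
for j in range(len(logits)):
    up, down = list(logits), list(logits)
    up[j] += eps
    down[j] -= eps
    numeric.append((cross_entropy_from_logits(up, label)
                    - cross_entropy_from_logits(down, label)) / (2.0 * eps))

print(analytic)
print(numeric)
```

Because both $p$ and the one-hot target sum to 1, the entries of $p - y$ sum to zero. The printed `grads[1]` above also sums to roughly zero, which is what this formula would predict for the bias gradient.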

Original article: https://www.cnblogs.com/abdm-989/p/14123298.html