• Divergences


    Forgive me, writing in Chinese is too tiring, and I trust that everyone here has a reasonable command of English.

    KL divergence

      Consider an unknown distribution $p(x)$, and suppose we have modelled it with an approximating distribution $q(x)$. If we use $q(x)$ to build a coding scheme for transmitting values of $x$ to a receiver, then, because we used $q(x)$ rather than the true distribution $p(x)$, some additional information is needed to specify the value of $x$. The average additional amount of information required (in nats) is:

        $\begin{aligned}D_{K L}(p \| q) &=-\int p(x) \log q(x)\,dx-\left(-\int p(x) \log p(x)\,dx\right) \\&=-\int p(x) \log \frac{q(x)}{p(x)}\,dx\end{aligned}$

      Hence:

        $D_{K L}(p \| q)=-\int p(x) \log \frac{q(x)}{p(x)}\,dx$

        $D_{K L}(q \| p)=-\int q(x) \log \frac{p(x)}{q(x)}\,dx$

      The KL divergence measures, to some extent, the discrepancy between two distributions, and has the following properties:

      • $D_{K L}(p \| q) \geq 0$, with equality if and only if $p = q$.
      • It is not symmetric, i.e. $D_{K L}(p \| q) \neq D_{K L}(q \| p)$ in general, so the direction must be chosen with care when using it to measure the gap between two distributions.
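
      A minimal numerical sketch of the two properties above, assuming discrete distributions given as normalized numpy arrays (the helper name `kl_divergence` is mine, not from the source):

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) in nats.

    p and q are 1-D probability vectors over the same support,
    assumed strictly positive and summing to 1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.6, 0.3, 0.1])

print(kl_divergence(p, q))  # non-negative, zero only when p == q
print(kl_divergence(q, p))  # generally a different value: KL is not symmetric
```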

    $\alpha$-divergence

      Given $\alpha \in \mathbb{R}$ with $\alpha \neq 0, 1$, the $\alpha$-divergence can be defined as

        $\frac{1}{\alpha(1-\alpha)}\left(1-\sum\limits _{x} p_{2}(x)\left(\frac{p_{1}(x)}{p_{2}(x)}\right)^{\alpha}\right)$

      The KL divergence is a limiting case of the $\alpha$-divergence: $K L\left(P_{1}, P_{2}\right)$ and $K L\left(P_{2}, P_{1}\right)$ are recovered in the limits $\alpha \rightarrow 1$ and $\alpha \rightarrow 0$, respectively.

      The Amari divergence is obtained from the above by the substitution $\alpha=\frac{1+t}{2}$.
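
      As a quick check of the limiting behaviour, a small sketch with discrete numpy distributions (the function names are mine):

```python
import numpy as np

def alpha_divergence(p1, p2, alpha):
    """Discrete alpha-divergence, valid for alpha not in {0, 1}."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return (1.0 - np.sum(p2 * (p1 / p2) ** alpha)) / (alpha * (1.0 - alpha))

def kl(p, q):
    return np.sum(p * np.log(p / q))

p1 = np.array([0.4, 0.4, 0.2])
p2 = np.array([0.6, 0.3, 0.1])

# alpha near 1 approaches KL(P1, P2); alpha near 0 approaches KL(P2, P1)
print(alpha_divergence(p1, p2, 0.999), kl(p1, p2))
print(alpha_divergence(p1, p2, 0.001), kl(p2, p1))
```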

    JS divergence

      To obtain a symmetric measure, the two directions of the KL divergence can be combined; the result is the JS divergence (Jensen-Shannon divergence), with the following expression:

        $D_{J S}(p \| q)=\frac{1}{2} D_{K L}\left(p \| \frac{p+q}{2}\right)+\frac{1}{2} D_{K L}\left(q \| \frac{p+q}{2}\right)$

      Properties:

    • The JS divergence is symmetric.
    • The JS divergence is bounded: it lies in $[0, \log 2]$ when measured in nats, i.e. in $[0, 1]$ when the logarithm is taken base 2.
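
      A small sketch checking both properties, using base-2 logarithms so the values fall in $[0, 1]$ (the helper names are mine):

```python
import numpy as np

def kl_bits(p, q):
    """Discrete KL divergence in bits (base-2 log)."""
    return np.sum(p * np.log2(p / q))

def js_divergence(p, q):
    """Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.6, 0.3, 0.1])

print(js_divergence(p, q))  # same value in both directions ...
print(js_divergence(q, p))  # ... and always between 0 and 1
```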

    f-divergence

      Given a convex function $f(t): \mathbb{R}_{\geq 0} \rightarrow \mathbb{R}$ with $f(1)=0$, $f^{\prime}(1)=0$, $f^{\prime \prime}(1)=1$, the $f$-divergence on $\mathcal{P}$ is defined by

        $\sum \limits _{x} p_{2}(x) f\left(\frac{p_{1}(x)}{p_{2}(x)}\right)$

    • The case $f(t)=t \ln t$ corresponds to the Kullback-Leibler distance.
    • The case $f(t)=(t-1)^{2}$ corresponds to the $\chi^{2}$-distance.
    • The case $f(t)=|t-1|$ corresponds to the variational distance.
    • The case $f(t)=4(1-\sqrt{t})$ (as well as $f(t)=2(t+1)-4 \sqrt{t}$) corresponds to the squared Hellinger metric.
    • The case $f(t)=(t-1)^{2} /(t+1)$ corresponds to the Vajda–Kus semimetric.
    • The case $f(t)=\left|t^{a}-1\right|^{1 / a}$ with $0<a \leq 1$ corresponds to the generalized Matusita distance.
    • The case $f(t)=\frac{\left(t^{a}+1\right)^{1 / a}-2^{(1-a) / a}(t+1)}{1-1 / a}$ corresponds to the Osterreicher semimetric.
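
      A generic sketch of the definition, checking two of the special cases listed above against the divergences defined earlier (the helper names are mine):

```python
import numpy as np

def f_divergence(p1, p2, f):
    """Discrete f-divergence: sum_x p2(x) * f(p1(x) / p2(x))."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return np.sum(p2 * f(p1 / p2))

p1 = np.array([0.4, 0.4, 0.2])
p2 = np.array([0.6, 0.3, 0.1])

# f(t) = t ln t recovers the KL divergence KL(P1, P2)
print(f_divergence(p1, p2, lambda t: t * np.log(t)))
# f(t) = (t - 1)^2 recovers the Pearson chi^2-distance
print(f_divergence(p1, p2, lambda t: (t - 1) ** 2))
```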

    Harmonic mean similarity

      The harmonic mean similarity is a similarity on $\mathcal{P}$ defined by

        $2 \sum \limits _{x} \frac{p_{1}(x) p_{2}(x)}{p_{1}(x)+p_{2}(x)} .$
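
      A tiny sketch of this formula (the helper name is mine); the value equals 1 exactly when the two distributions coincide:

```python
import numpy as np

def harmonic_mean_similarity(p1, p2):
    """2 * sum_x p1(x) * p2(x) / (p1(x) + p2(x))."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return 2.0 * np.sum(p1 * p2 / (p1 + p2))

p1 = np.array([0.4, 0.4, 0.2])
p2 = np.array([0.6, 0.3, 0.1])
print(harmonic_mean_similarity(p1, p2))   # strictly less than 1
print(harmonic_mean_similarity(p1, p1))   # equals 1 when the distributions are equal
```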

    Fidelity similarity

      The fidelity similarity (or Bhattacharya coefficient, Hellinger affinity) on $\mathcal{P}$ is

        $\rho\left(P_{1}, P_{2}\right)=\sum_{x} \sqrt{p_{1}(x) p_{2}(x)} .$

    Hellinger metric

      In terms of the fidelity similarity $\rho$ , the Hellinger metric (or Matusita distance, Hellinger-Kakutani metric) on $\mathcal{P}$ is defined by

        $\left(\sum\limits_{x}\left(\sqrt{p_{1}(x)}-\sqrt{p_{2}(x)}\right)^{2}\right)^{\frac{1}{2}}=\sqrt{2\left(1-\rho\left(P_{1}, P_{2}\right)\right)}$

    Bhattacharya distance 1

      In terms of the fidelity similarity $\rho$ , the Bhattacharya distance 1 (1946) is

        $\left(\arccos \rho\left(P_{1}, P_{2}\right)\right)^{2} $
      for $P_{1}, P_{2} \in \mathcal{P}$. Twice this distance is the Rao distance. It is also used in statistics and machine learning, where it is called the Fisher distance.

    Bhattacharya distance 2

      The Bhattacharya distance 2 (1943) on $\mathcal{P}$ is defined by

        $-\ln \rho\left(P_{1}, P_{2}\right)$
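
      A combined sketch of the fidelity similarity and the three quantities derived from it above (Hellinger metric, Bhattacharya distance 1, Bhattacharya distance 2); the helper names are mine:

```python
import numpy as np

def fidelity(p1, p2):
    """Bhattacharya coefficient rho(P1, P2) = sum_x sqrt(p1(x) * p2(x))."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return np.sum(np.sqrt(p1 * p2))

def hellinger(p1, p2):
    """Hellinger metric computed directly from its definition."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p1) - np.sqrt(p2)) ** 2))

p1 = np.array([0.4, 0.4, 0.2])
p2 = np.array([0.6, 0.3, 0.1])

rho = fidelity(p1, p2)
print(hellinger(p1, p2), np.sqrt(2.0 * (1.0 - rho)))  # the two expressions agree
print(np.arccos(rho) ** 2)                            # Bhattacharya distance 1
print(-np.log(rho))                                   # Bhattacharya distance 2
```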

    $\chi^{2}$ -distance 

      The $\chi^{2}$ -distance (or Pearson $\chi^{2} $-distance) is a quasi-distance on $\mathcal{P}$ , defined by

        $\sum_{x} \frac{\left(p_{1}(x)-p_{2}(x)\right)^{2}}{p_{2}(x)}$

      The Neyman $\chi^{2}$ -distance is a quasi-distance on $\mathcal{P} $, defined by

        $\sum_{x} \frac{\left(p_{1}(x)-p_{2}(x)\right)^{2}}{p_{1}(x)} .$

      Half of the $\chi^{2}$-distance is also called Kagan's divergence.

      The probabilistic symmetric $\chi^{2}$ -measure is a distance on $\mathcal{P} $, defined by

        $2 \sum_{x} \frac{\left(p_{1}(x)-p_{2}(x)\right)^{2}}{p_{1}(x)+p_{2}(x)} .$
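
      A short sketch of the three $\chi^{2}$-type quantities above (the helper names are mine):

```python
import numpy as np

def pearson_chi2(p1, p2):
    """Pearson chi^2-distance: sum_x (p1 - p2)^2 / p2."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return np.sum((p1 - p2) ** 2 / p2)

def neyman_chi2(p1, p2):
    """Neyman chi^2-distance: sum_x (p1 - p2)^2 / p1."""
    return pearson_chi2(p2, p1)

def symmetric_chi2(p1, p2):
    """Probabilistic symmetric chi^2-measure: 2 * sum_x (p1 - p2)^2 / (p1 + p2)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return 2.0 * np.sum((p1 - p2) ** 2 / (p1 + p2))

p1 = np.array([0.4, 0.4, 0.2])
p2 = np.array([0.6, 0.3, 0.1])
print(pearson_chi2(p1, p2), neyman_chi2(p1, p2), symmetric_chi2(p1, p2))
```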

      Since I have no immediate use for the remaining entries, I have not written them up.

      This article is based on the Encyclopedia of Distances (chapter "Distances on Distribution Laws", p. 261); contact the blogger if you need the e-book.

      I also consulted another "borrowing" blogger's post,《机器学习中的数学
