• Perplexity Vs Cross-entropy


    Evaluating a Language Model: Perplexity

    We have a series of \(m\) sentences:

    \[ s_1, s_2, \cdots, s_m \]

    We could look at the probability of these sentences under our model, \(\prod_{i=1}^m p(s_i)\), or, more conveniently, at the log probability:

    \[ \log \prod_{i=1}^m p(s_i) = \sum_{i=1}^m \log p(s_i) \]

    where \(p(s_i)\) is the probability of sentence \(s_i\).
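    The log form is more than a notational convenience: sentence probabilities are tiny, and multiplying many of them underflows floating point. The minimal Python sketch below uses made-up probabilities (10 sentences, each with \(p(s_i) = 10^{-40}\)) standing in for what a real model would assign.

```python
import math

# Hypothetical per-sentence probabilities p(s_i) for m = 10 test sentences.
sentence_probs = [1e-40] * 10

# Direct product: 1e-400 is below the smallest positive double, so it underflows to 0.0.
product = math.prod(sentence_probs)

# Log form: log prod p(s_i) = sum log p(s_i), which stays finite.
log_prob = sum(math.log(p) for p in sentence_probs)

print(product)   # 0.0
print(log_prob)  # about -921.03
```

    The product loses all information, while the sum of logs still distinguishes models that assign different probabilities to the test set.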

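    In practice, the usual evaluation measure is perplexity:

    \[ PPL = 2^{-l} \]

    \[ l = \frac{1}{M} \sum_{i=1}^m \log_2 p(s_i) \]

    where \(M\) is the total number of words in the test data and the logarithm is taken to base 2, matching the base of the exponent in \(PPL\).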
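    A minimal Python sketch of the perplexity formula, assuming we already have \(\log_2 p(s_i)\) for each test sentence from some model. The `perplexity` helper and the toy numbers are hypothetical: two sentences of 3 and 5 words (\(M = 8\)) in which the model assigns every word probability \(1/4\).

```python
import math

def perplexity(sentence_log2_probs, total_words):
    """PPL = 2^{-l}, where l = (1/M) * sum_i log2 p(s_i).

    sentence_log2_probs: log2 p(s_i) for each of the m test sentences.
    total_words: M, the total number of words in the test data.
    """
    l = sum(sentence_log2_probs) / total_words
    return 2 ** (-l)

# Hypothetical test set: sentences of 3 and 5 words, every word with probability 1/4.
log2_probs = [3 * math.log2(1 / 4), 5 * math.log2(1 / 4)]
print(perplexity(log2_probs, total_words=8))  # 4.0
```

    A perplexity of 4 here means the model is, on average, as uncertain as if it were choosing uniformly among 4 words at each position.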

    Cross-Entropy

    Given words \(x_1, \cdots, x_t\), a language model predicts the following word \(x_{t+1}\) by modeling:

    \[ P(x_{t+1} = v_j \mid x_t, \cdots, x_1) = \hat y_j^t \]

    where \(v_j\) is a word in the vocabulary.
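    The text does not say how \(\hat y^t\) is produced. One common choice, assumed here purely for illustration, is to have the model score every vocabulary word and normalize those scores with a softmax. The toy vocabulary and logits below are hypothetical.

```python
import math

# Hypothetical vocabulary V and unnormalized scores (logits) that some model
# produced after reading x_1, ..., x_t. The softmax is an assumption, not the source's method.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 1.0, 0.5, -1.0]

def softmax(scores):
    """Turn arbitrary real scores into a probability distribution."""
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

y_hat = softmax(logits)  # y_hat[j] plays the role of y_hat_j^t
for word, p in zip(vocab, y_hat):
    print(f"P(x_(t+1) = {word!r} | x_t, ..., x_1) = {p:.3f}")
```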

    The predicted output vector \(\hat y^t \in \mathbb{R}^{|V|}\) is a probability distribution over the vocabulary, and we optimize the cross-entropy loss:

    \[ \mathcal{L}^t(\theta) = CE(y^t, \hat y^t) = -\sum_{i=1}^{|V|} y_i^t \log \hat y_i^t \]

    where \(y^t\) is the one-hot vector corresponding to the target word. This is a point-wise loss: to evaluate model performance, we sum the cross-entropy loss across all predictions in a sequence and across all sequences in the dataset.
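    The point-wise loss and its sum over a sequence can be sketched in Python as follows. The predictions and targets are hypothetical, and the natural logarithm is assumed.

```python
import math

def cross_entropy(y, y_hat):
    """CE(y, y_hat) = -sum_i y_i * log(y_hat_i), using the natural log.

    Terms with y_i = 0 are skipped (0 * log 0 is taken as 0), which avoids
    calling log(0) for non-target words the model gives zero probability.
    """
    return -sum(yi * math.log(pi) for yi, pi in zip(y, y_hat) if yi > 0)

def one_hot(index, size):
    """One-hot target vector y^t for the word at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Hypothetical model outputs y_hat^t over a 3-word vocabulary at two time steps,
# and the index of the correct next word at each step.
predictions = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]
targets = [0, 2]

# Point-wise loss at each step, then summed over the sequence.
step_losses = [cross_entropy(one_hot(t, 3), p) for t, p in zip(targets, predictions)]
total_loss = sum(step_losses)
print(f"{total_loss:.4f}")  # 1.2730
```

    With a one-hot target only the correct word's term survives, so each step's loss is just \(-\log\) of the probability assigned to that word.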

    The Relationship Between Cross-Entropy and Perplexity

    \[ PP^t = \frac{1}{P(x_{t+1}^{pred} = x_{t+1} \mid x_t, \cdots, x_1)} = \frac{1}{\sum_{j=1}^{|V|} y_j^t \cdot \hat y_j^t} \]

    which is the inverse probability of the correct word, according to the model distribution \(P\).
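    A small sketch of \(PP^t\) for a single time step: the dot product \(y^t \cdot \hat y^t\) with a one-hot \(y^t\) picks out the probability of the correct word. The `step_perplexity` helper and the distribution are hypothetical.

```python
def step_perplexity(y, y_hat):
    """PP^t = 1 / sum_j y_j * y_hat_j, i.e. the inverse of the probability
    the model assigns to the correct next word when y is one-hot."""
    prob_correct = sum(yj * pj for yj, pj in zip(y, y_hat))
    return 1.0 / prob_correct

y = [0.0, 0.0, 1.0, 0.0]       # one-hot target: the correct word is v_3
y_hat = [0.1, 0.2, 0.5, 0.2]   # hypothetical model distribution over |V| = 4 words
print(step_perplexity(y, y_hat))  # 2.0
```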

    Suppose \(y_i^t\) is the only nonzero element of \(y^t\), i.e., \(v_i\) is the correct next word. Then note that:

    \[ CE(y^t, \hat y^t) = -\log \hat y_i^t = \log \frac{1}{\hat y_i^t} \]

    \[ PP(y^t, \hat y^t) = \frac{1}{\hat y_i^t} \]

    Then, it follows that:

    \[ CE(y^t, \hat y^t) = \log PP(y^t, \hat y^t) \]
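    A quick numerical check of \(CE = \log PP\) for a few hypothetical values of \(\hat y_i^t\), assuming the natural logarithm. The `ce_and_pp` helper is illustrative only.

```python
import math

def ce_and_pp(p_correct):
    """Return (CE, PP) for a one-hot target whose word gets probability p_correct."""
    ce = -math.log(p_correct)   # CE(y^t, y_hat^t) = -log y_hat_i^t
    pp = 1.0 / p_correct        # PP(y^t, y_hat^t) = 1 / y_hat_i^t
    return ce, pp

for p in [0.5, 0.1, 0.01]:
    ce, pp = ce_and_pp(p)
    print(f"p={p}: CE={ce:.4f}, log PP={math.log(pp):.4f}")
```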

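    In fact, minimizing the arithmetic mean of the cross-entropy is identical to minimizing the geometric mean of the perplexity: averaging \(CE = \log PP\) over all time steps gives the log of the geometric mean of the per-step perplexities, and the logarithm is monotonically increasing. If the model's predictions are uniformly random, \(\hat y_i^t = \frac{1}{|V|}\) for every word, so the cross-entropy is \(\log |V|\) and the perplexity is \(|V|\) (for a vocabulary of 10,000 words, \(\log 10000 \approx 9.21\) with the natural logarithm).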
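    Both claims can be checked numerically. The per-step probabilities below are hypothetical; \(|V| = 10000\) matches the example in the text, and the natural logarithm is assumed.

```python
import math

# Hypothetical probabilities the model assigned to the correct word at T = 4 steps.
p_correct = [0.5, 0.25, 0.125, 0.5]

cross_entropies = [-math.log(p) for p in p_correct]   # CE^t = log PP^t
perplexities = [1.0 / p for p in p_correct]           # PP^t

# Arithmetic mean of the per-step cross-entropies ...
mean_ce = sum(cross_entropies) / len(cross_entropies)
# ... equals the log of the geometric mean of the per-step perplexities.
geo_mean_pp = math.prod(perplexities) ** (1.0 / len(perplexities))
print(math.isclose(mean_ce, math.log(geo_mean_pp)))  # True

# A uniformly random model over |V| words gives every word probability 1/|V|.
V = 10000
uniform_ce = -math.log(1.0 / V)     # log |V|
uniform_pp = math.exp(uniform_ce)   # |V|, up to floating-point rounding
print(round(uniform_ce, 2))         # 9.21
```

    The uniform model is a useful sanity baseline: a trained model whose cross-entropy is not well below \(\log |V|\) has learned little about the data.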

  • Original article: https://www.cnblogs.com/ZJUT-jiangnan/p/5612096.html