Evaluating a Language Model: Perplexity
We have a set of \(m\) sentences:

\[s_1, s_2, \cdots, s_m\]
We could look at the probability of the data under our model, \(\prod_{i=1}^{m} p(s_i)\). Or, more conveniently, the log probability:

\[\log \prod_{i=1}^{m} p(s_i) = \sum_{i=1}^{m} \log p(s_i),\]

where \(p(s_i)\) is the probability of sentence \(s_i\).
In fact, the usual evaluation measure is perplexity:

\[\text{perplexity} = 2^{-l}, \quad \text{where} \quad l = \frac{1}{M} \sum_{i=1}^{m} \log_2 p(s_i)\]

and \(M\) is the total number of words in the test data.
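As a sanity check, here is a minimal Python sketch of this computation; the sentence probabilities and lengths are made-up toy values, not real model outputs:

```python
import math

# Hypothetical per-sentence probabilities under the model, and sentence
# lengths (word counts); the numbers are illustrative only.
sentence_probs = [1e-4, 5e-6, 2e-5]   # p(s_i) for each test sentence
sentence_lengths = [5, 8, 6]          # number of words in each sentence

M = sum(sentence_lengths)                             # total words in the test data
l = sum(math.log2(p) for p in sentence_probs) / M     # average log2-probability per word
perplexity = 2 ** -l

print(f"l = {l:.4f}, perplexity = {perplexity:.2f}")
```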
Cross-Entropy
Given words \(x_1, \cdots, x_t\), a language model predicts the following word \(x_{t+1}\) by modeling:

\[P(x_{t+1} = v_j \mid x_t, \cdots, x_1),\]

where \(v_j\) is a word in the vocabulary.
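In practice this distribution typically comes from a softmax over per-word scores. The following sketch only illustrates that normalization step; the logits are arbitrary stand-ins for a real model's output:

```python
import numpy as np

def next_word_distribution(logits: np.ndarray) -> np.ndarray:
    """Softmax over vocabulary scores -> P(x_{t+1} = v_j | x_t, ..., x_1)."""
    z = logits - logits.max()   # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

# Toy vocabulary of 5 words with arbitrary scores standing in for a real model.
logits = np.array([2.0, 0.5, -1.0, 0.0, 1.5])
probs = next_word_distribution(logits)
print(probs, probs.sum())   # a valid distribution over the vocabulary: sums to 1
```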
The predicted output vector \(\hat{y}^t \in \mathbb{R}^{|V|}\) is a probability distribution over the vocabulary, and we optimize the cross-entropy loss:

\[CE(y^t, \hat{y}^t) = -\sum_{j=1}^{|V|} y_j^t \log \hat{y}_j^t,\]

where \(y^t\) is the one-hot vector corresponding to the target word. This is a point-wise loss, and we sum the cross-entropy loss across all examples in a sequence, and across all sequences in the dataset, in order to evaluate model performance.
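Because \(y^t\) is one-hot, the sum over the vocabulary collapses to a single term. A minimal sketch with toy numbers:

```python
import numpy as np

def cross_entropy(y: np.ndarray, y_hat: np.ndarray) -> float:
    """CE(y, y_hat) = -sum_j y_j * log(y_hat_j), with y one-hot."""
    return -float(np.sum(y * np.log(y_hat)))

y_hat = np.array([0.1, 0.2, 0.6, 0.1])   # model's predicted distribution, |V| = 4
y = np.zeros(4)
y[2] = 1.0                               # one-hot target: correct word at index 2

print(cross_entropy(y, y_hat))   # equals -log(0.6): only the target entry survives
```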
The Relationship Between Cross-Entropy and Perplexity
Perplexity at a single time step is:

\[PP(y^t, \hat{y}^t) = \frac{1}{P(x_{t+1} \mid x_t, \cdots, x_1)},\]

which is the inverse probability of the correct word \(x_{t+1}\), according to the model distribution \(P\).
Suppose \(y_i^t\) is the only nonzero element of \(y^t\), i.e., index \(i\) marks the correct word. Then, note that:

\[CE(y^t, \hat{y}^t) = -\log \hat{y}_i^t = \log \frac{1}{\hat{y}_i^t} = \log PP(y^t, \hat{y}^t).\]

Then, it follows that:

\[PP(y^t, \hat{y}^t) = \exp\left(CE(y^t, \hat{y}^t)\right).\]
In fact, minimizing the arithmetic mean of the cross-entropy is identical to minimizing the geometric mean of the perplexity. If the model's predictions are completely random, \(E[\hat{y}_i^t] = \frac{1}{|V|}\), and the expected cross-entropy is \(\log |V|\) (for \(|V| = 10000\), \(\log 10000 \approx 9.21\)).
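Both claims are easy to verify numerically. In this sketch the per-step probabilities of the correct word are made up; it checks that exponentiating the average cross-entropy reproduces the geometric mean of the per-step perplexities, and prints the uniform-model baseline:

```python
import numpy as np

# Hypothetical probabilities the model assigns to the correct word at each step.
p_correct = np.array([0.3, 0.05, 0.6, 0.1])

ce = -np.log(p_correct)   # per-step cross-entropies
pp = 1.0 / p_correct      # per-step perplexities (inverse probability of the correct word)

# exp(arithmetic mean of the cross-entropies) == geometric mean of the perplexities
print(np.exp(ce.mean()), pp.prod() ** (1.0 / len(pp)))   # the two values match

# Uniformly random predictions: each correct word gets probability 1/|V|,
# so the expected cross-entropy is log|V|.
V = 10000
print(np.log(V))   # ~9.21
```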