    Beam Search

    Greedy decoding: generate (or “decode”) the target sentence by taking the argmax on each step of the decoder (sketched in code after the example below)

    Problem with greedy decoding:

    • Greedy decoding has no way to undo decisions!
      • Input: il a m’entarté (he hit me with a pie)
      • → he ____
      • → he hit ____
      • → he hit a ____ (whoops! no going back now…)
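
    To make the failure mode concrete, here is a minimal sketch of greedy decoding. The `model.step(prefix, x)` method and `model.vocab` list are hypothetical stand-ins for whatever decoder API you have; the point is that each argmax decision is committed immediately.

    ```python
    import numpy as np

    def greedy_decode(model, x, max_len=50, end_token="<END>"):
        """Pick the single most probable token at every step; never backtrack."""
        prefix = []
        for _ in range(max_len):
            log_probs = model.step(prefix, x)   # log P(y_t | y_1..y_{t-1}, x), shape (V,)
            token = model.vocab[int(np.argmax(log_probs))]
            if token == end_token:
                break
            prefix.append(token)                # committed: no going back now
        return prefix
    ```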

    Exhaustive search decoding

    Ideally, we want to find a (length T) translation y that maximizes:

    \[\begin{aligned} P(y | x) &= P\left(y_{1} | x\right) P\left(y_{2} | y_{1}, x\right) P\left(y_{3} | y_{1}, y_{2}, x\right) \ldots P\left(y_{T} | y_{1}, \ldots, y_{T-1}, x\right) \\ &= \prod_{t=1}^{T} P\left(y_{t} | y_{1}, \ldots, y_{t-1}, x\right) \end{aligned}\]

    We could try computing all possible sequences y:

    • This means that on each step t of the decoder, we’re tracking \(V^t\) possible partial translations, where \(V\) is the vocab size
    • This \(O(V^T)\) complexity is far too expensive, as the quick calculation below shows!
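
    As a quick back-of-the-envelope check (the vocab size and target length here are illustrative assumptions, not figures from the text):

    ```python
    V, T = 50_000, 20   # assumed vocab size and target length, for illustration
    print(V ** T)       # ~9.5e93 candidate sequences: hopeless to enumerate
    ```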

    Beam search decoding

    • Core idea: On each step of the decoder, keep track of the k most probable partial translations (which we call hypotheses), where \(k\) is the beam size (in practice around 5 to 10)

    • A hypothesis \(y_1, \ldots, y_t\) has a score which is its log probability:

      \[\operatorname{score}\left(y_{1}, \ldots, y_{t}\right) = \log P_{\mathrm{LM}}\left(y_{1}, \ldots, y_{t} | x\right) = \sum_{i=1}^{t} \log P_{\mathrm{LM}}\left(y_{i} | y_{1}, \ldots, y_{i-1}, x\right)\]

      • Scores are all negative, and higher score is better
      • We search for high-scoring hypotheses, tracking the top \(k\) on each step
    • Beam search is not guaranteed to find the optimal solution

    • But it is much more efficient than exhaustive search! (A single-step sketch follows.)
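
    As referenced above, here is a sketch of a single beam-search step, reusing the hypothetical `model.step`/`model.vocab` interface from the greedy sketch. Each hypothesis is a `(score, tokens)` pair, where the score is the running sum of log probabilities:

    ```python
    import heapq

    def beam_step(model, x, beam, k):
        """Expand every hypothesis by every vocab token; keep the k best."""
        candidates = []
        for score, tokens in beam:
            log_probs = model.step(tokens, x)   # log P(y_t | y_1..y_{t-1}, x), shape (V,)
            for i, lp in enumerate(log_probs):
                candidates.append((score + lp, tokens + [model.vocab[i]]))
        return heapq.nlargest(k, candidates, key=lambda c: c[0])
    ```

    In practice you would keep only the top k continuations of each hypothesis before merging (k² candidates instead of k·V), but the brute-force version keeps the scoring transparent.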

    Beam search decoding: stopping criterion

    • In greedy decoding, usually we decode until the model produces an <END> token
    • In beam search decoding, different hypotheses may produce <END> tokens on different timesteps
      • When a hypothesis produces <END>, that hypothesis is complete.
      • Place it aside and continue exploring other hypotheses via beam search.
    • Usually we continue beam search until:
      • We reach timestep T (where T is some pre-defined cutoff), or
      • We have at least n completed hypotheses (where n is a pre-defined cutoff); a loop sketch follows this list
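
    A sketch of the outer loop with both stopping criteria, building on `beam_step` above (`max_steps` and `n_complete` play the roles of T and n):

    ```python
    def beam_search(model, x, k=5, max_steps=50, n_complete=5, end_token="<END>"):
        """Run beam search until the timestep cutoff or enough finished hypotheses."""
        beam = [(0.0, [])]                  # start with one empty hypothesis, score 0
        completed = []
        for _ in range(max_steps):          # timestep cutoff T
            expanded = beam_step(model, x, beam, k)
            beam = []
            for score, tokens in expanded:
                if tokens[-1] == end_token:
                    completed.append((score, tokens))   # complete: set aside
                else:
                    beam.append((score, tokens))        # keep exploring
            if len(completed) >= n_complete or not beam:
                break
        return completed or beam            # fall back if nothing finished in time
    ```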

    Beam search decoding: finishing up

    • We have our list of completed hypotheses.
    • How do we select the top one, i.e. the hypothesis with the highest score?
    • Each hypothesis \(y_1, \ldots, y_t\) on our list has a score

      \[\operatorname{score}\left(y_{1}, \ldots, y_{t}\right) = \log P_{\mathrm{LM}}\left(y_{1}, \ldots, y_{t} | x\right) = \sum_{i=1}^{t} \log P_{\mathrm{LM}}\left(y_{i} | y_{1}, \ldots, y_{i-1}, x\right)\]

    Problem with this: longer hypotheses have lower scores, since each additional term in the sum is a negative log probability

    Fix: normalize by length, and use this to select the top one instead (a selection sketch follows the formula):

    \[\frac{1}{t} \sum_{i=1}^{t} \log P_{\mathrm{LM}}\left(y_{i} | y_{1}, \ldots, y_{i-1}, x\right)\]
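
    Continuing the sketch, selecting the final output from the completed hypotheses with the length-normalized score:

    ```python
    def select_best(completed):
        """Pick the hypothesis maximizing (1/t) * sum of per-token log probs."""
        return max(completed, key=lambda h: h[0] / len(h[1]))   # h = (score, tokens)
    ```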
