• Sequence Model


    Various Sequence To Sequence Architectures

    Basic Models

    Sequence to sequence model

    (figure: seq2seq)

    Image captioning

    use a CNN (e.g. AlexNet) first to encode the image into a 4096-dimensional feature vector, then feed that vector into an RNN that generates the caption word by word

    Picking the Most Likely Sentence

    Translate a French sentence \(x\) into the most likely English sentence \(y\).

    That is, find

    \[\operatorname*{argmax}_{y^{<1>}, \dots, y^{<T_y>}} P(y^{<1>}, \dots, y^{<T_y>} \mid x)\]

    • Why not a greedy search?

      (Picking the most likely word one at a time.) Greedy choices maximize each word's probability locally, not the joint probability of the whole sentence, so common words win early and the output tends to be verbose and suboptimal.

    • set \(B = 3\) (the beam width) and keep the \(3\) most likely first words of the English output

    • for each candidate, consider every possible second word, and keep the \(B\) most likely (first word, second word) pairs overall

      (figure: beam_search)
    • repeat until \(<EOS>\) is generated

    If \(B = 1\), it's just greedy search.
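    The loop above can be sketched as a minimal toy implementation. Here `step_fn` is a stand-in for the decoder RNN's next-word distribution (an assumption of this sketch, not part of the course code):

```python
import math

def beam_search(start, step_fn, beam_width=3, max_len=10, eos="<EOS>"):
    """Keep the beam_width most probable partial sentences at each step."""
    beams = [(0.0, [start])]          # (log-probability, token sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == eos:        # sequence already ended
                finished.append((logp, seq))
                continue
            # step_fn(seq) returns {next_token: P(next_token | x, seq)}
            for tok, p in step_fn(seq).items():
                candidates.append((logp + math.log(p), seq + [tok]))
        if not candidates:
            break
        beams = sorted(candidates, reverse=True)[:beam_width]
    finished.extend(beams)
    return max(finished)[1]           # most probable complete sequence

# Toy "model": a fixed next-word table keyed by the last token.
probs = {"<s>": {"jane": 0.6, "in": 0.4},
         "jane": {"<EOS>": 1.0},
         "in": {"<EOS>": 1.0}}
best = beam_search("<s>", lambda seq: probs[seq[-1]])  # ['<s>', 'jane', '<EOS>']
```

    With `beam_width=1` the same code degenerates into greedy search.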

    Length normalization

    \[\operatorname*{argmax}_{y} \prod_{t = 1}^{T_y} P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t - 1>})\]

    Each \(P\) is much smaller than \(1\), so the product quickly approaches \(0\) (numerical underflow); take the \(\log\):

    \[\operatorname*{argmax}_{y} \sum_{t = 1}^{T_y} \log P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t - 1>})\]

    Both objectives tend to favor short sentences, since every extra word multiplies in another probability (or adds another negative log term).

    So normalize by the output length (\(\alpha\) is a hyperparameter):

    \[\operatorname*{argmax}_{y} \frac{1}{T_y^{\alpha}} \sum_{t = 1}^{T_y} \log P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t - 1>})\]
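    A quick sketch of why the normalization matters, with toy log-probabilities chosen purely for illustration:

```python
import math

def normalized_score(token_logprobs, alpha=0.7):
    # (1 / T_y^alpha) * sum_t log P(y^<t> | x, y^<1>, ..., y^<t-1>)
    T_y = len(token_logprobs)
    return sum(token_logprobs) / (T_y ** alpha)

short = [math.log(0.5)] * 2   # 2-word candidate, each word with P = 0.5
long_ = [math.log(0.7)] * 6   # 6-word candidate, each word with P = 0.7

# Without normalization the short sentence wins (fewer negative terms);
# after dividing by T_y^alpha, the longer, more fluent candidate wins.
raw_prefers_short = sum(short) > sum(long_)
norm_prefers_long = normalized_score(long_) > normalized_score(short)
```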

    Beam search discussion

    • large \(B\): better result, slower
    • small \(B\): worse result, faster

    Let \(y^*\) be the high-quality human translation and \(\hat y\) the algorithm's output.

    • \(P(y^* \mid x) > P(\hat y \mid x)\): beam search is at fault
    • \(P(y^* \mid x) \le P(\hat y \mid x)\): the RNN model is at fault

    Bleu (bilingual evaluation understudy) Score

    Given one or more good reference translations, the Bleu score measures how close the machine output is to them.

    \[p_n = \frac{\sum_{\text{n-gram} \in \hat y} \text{Count}_{\text{clip}}(\text{n-gram})}{\sum_{\text{n-gram} \in \hat y} \text{Count}(\text{n-gram})}\]

    Bleu details

    combine the modified precisions as \(BP \cdot \exp\!\left(\frac{1}{4} \sum_{n = 1}^4 \log p_n\right)\)

    BP = brevity penalty

    \[BP = \begin{cases} 1 & \text{if MT\_output\_length} > \text{reference\_output\_length}\\ \exp(1 - \text{reference\_output\_length} / \text{MT\_output\_length}) & \text{otherwise} \end{cases}\]

    It penalizes translations that are too short.
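    The clipped precision and the combined score can be sketched as follows (sentences as lists of tokens; an illustration of the formulas above, not a reference implementation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, references, n):
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # Clip each candidate n-gram count by its max count in any reference
    max_ref = Counter()
    for ref in references:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, N=4):
    # Brevity penalty against the closest reference length
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    ps = [clipped_precision(candidate, references, n) for n in range(1, N + 1)]
    if min(ps) == 0:
        return 0.0
    return bp * math.exp(sum(math.log(p) for p in ps) / N)

# Classic clipping example: "the" occurs at most twice in the reference,
# so only 2 of the candidate's 4 unigrams count.
p1 = clipped_precision(["the"] * 4,
                       [["the", "cat", "is", "on", "the", "mat"]], 1)  # 0.5
```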

    Attention Model Intuition

    It is hard for an encoder network to memorize a whole long sentence in one fixed-length vector, so translation quality drops on long sentences.

    (figure: bleu_table)

    Compute attention weights so that each output word is predicted from a context that focuses on the relevant parts of the input.

    (figure: Attention_Model_Intuition)

    Attention Model

    Use a BiRNN or BiLSTM as the encoder.

    \[\begin{aligned} a^{<t'>} &= (\overrightarrow{a}^{<t'>}, \overleftarrow{a}^{<t'>})\\ \sum_{t'} \alpha^{<i, t'>} &= 1\\ c^{<i>} &= \sum_{t'} \alpha^{<i, t'>} a^{<t'>} \end{aligned}\]

    (figure: attention_model)

    Computing attention

    \[\begin{aligned} \alpha^{<t, t'>} &= \text{amount of "attention" } y^{<t>} \text{ should pay to } a^{<t'>}\\ &= \frac{\exp(e^{<t, t'>})}{\sum_{t' = 1}^{T_x} \exp(e^{<t, t'>})} \end{aligned}\]

    Train a very small network to compute \(e^{<t, t'>}\) from \(s^{<t - 1>}\) and \(a^{<t'>}\), since we don't know what the right alignment function is.

    The complexity is \(\mathcal O(T_x T_y)\), quadratic in the sentence lengths, which is expensive for long sequences.

    (figure: computing_attention)
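    The softmax over alignment scores and the resulting context vector can be sketched in plain Python. The dot-product score below is only a stand-in for the small learned network that produces \(e^{<t, t'>}\):

```python
import math

def attention_weights(scores):
    # Softmax over e^{<t, t'>} for t' = 1..T_x, so the weights sum to 1
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]

def context_vector(alpha, a):
    # c^{<t>} = sum_{t'} alpha^{<t, t'>} * a^{<t'>}
    dim = len(a[0])
    return [sum(w * a_t[d] for w, a_t in zip(alpha, a)) for d in range(dim)]

a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # encoder activations a^{<t'>}
s = [1.0, 0.0]                              # previous decoder state s^{<t-1>}
# Stand-in score function: dot product of s with each a^{<t'>}
scores = [sum(si * ai for si, ai in zip(s, a_t)) for a_t in a]
alpha = attention_weights(scores)
c = context_vector(alpha, a)                # weighted sum of the a^{<t'>}
```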

    Speech Recognition - Audio Data

    Speech recognition

    \(x (\text{audio clip}) \to y (\text{transcript})\)

    Attention model for speech recognition

    generate character by character

    CTC cost for speech recognition

    CTC (connectionist temporal classification)

    "ttt_h_eee___ ____qqq(dots)" ( ightarrow) "the quick brown fox"

    Basic rule: collapse repeated characters not separated by "blank"
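    The collapse rule can be sketched as follows (underscore standing in for the blank symbol):

```python
def ctc_collapse(seq, blank="_"):
    """Collapse repeated characters not separated by a blank, then drop blanks."""
    out = []
    prev = None
    for ch in seq:
        # Emit a character only when it differs from the previous symbol
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

short_demo = ctc_collapse("ttt_h_eee")  # 'the'
double_t = ctc_collapse("tt_tt")        # 'tt': the blank separates two real t's
```

    Note that a blank between two identical characters keeps both, which is how CTC can output words like "tt" in "letter".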

    Trigger Word Detection

    Label the trigger word in the audio: set the target output to \(1\) for a short period right after the trigger word is said, and \(0\) elsewhere.

  • Original source: https://www.cnblogs.com/zjp-shadow/p/15178221.html