【NLP】Conditional Language Models


    A language model estimates the probability that a sequence of words forms a sentence a human might actually say. As a by-product of training one, we also obtain embeddings for the whole vocabulary.


    An unconditional language model simply assigns probabilities to sequences of words. That is to say, given the first n-1 words, it predicts the probability distribution of the next word.

    Because of the chain rule of probability, we only need to train this:

    $p(w_1, w_2, \ldots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_1, \ldots, w_{t-1})$
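
    To make the chain rule concrete, here is a tiny sketch that scores a sentence under a bigram (Markov) approximation of the full history; the probability table is made up purely for illustration.

```python
import math

# Toy bigram probabilities -- made-up values, not from any real corpus.
bigram = {
    ("<s>", "the"): 0.4,
    ("the", "cat"): 0.1,
    ("cat", "sat"): 0.3,
    ("sat", "</s>"): 0.2,
}

def sentence_log_prob(words):
    """log p(w_1..w_T) = sum_t log p(w_t | w_{t-1}) under a bigram LM."""
    tokens = ["<s>"] + words + ["</s>"]
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        logp += math.log(bigram.get((prev, cur), 1e-12))  # tiny floor for unseen pairs
    return logp

print(sentence_log_prob(["the", "cat", "sat"]))  # ~ log(0.4 * 0.1 * 0.3 * 0.2)
```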


    Conditional LMs

    A conditional language model assigns probabilities to sequences of words, $W = (w_1, w_2, \ldots, w_T)$, given some conditioning context x:

    $p(W \mid x) = \prod_{t=1}^{T} p(w_t \mid x, w_1, \ldots, w_{t-1})$


    For example, in a translation task we are given the original sentence and its translation. The original sentence is the conditioning context, and using it, we predict the target sentence.


    Data for training conditional LMs:

      To train conditional language models, we need paired samples, e.g.:

    image

    Tasks of this kind include: translation, summarisation, caption generation, speech recognition.


    How do we evaluate conditional LMs?

    • Traditional methods: cross-entropy or perplexity (hard to interpret, easy to implement); see the sketch after this list.
    • Task-specific evaluation: compare the model's most likely output to a human-generated reference, e.g. 【BLEU】, METEOR, ROUGE… (okay to interpret, easy to implement).
    • Human evaluation: hard to implement.
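
    As a concrete illustration of the perplexity metric, a minimal sketch with made-up per-token probabilities:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp(mean negative log-likelihood) over the evaluation tokens."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy numbers: per-token p(w_t | x, w_<t) as a conditional LM might produce them.
token_log_probs = [math.log(p) for p in [0.2, 0.1, 0.4, 0.3]]
print(perplexity(token_log_probs))  # lower is better
```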


    Algorithmic challenges:

    Given the conditioning context x, we want the output word sequence with maximum probability. We cannot rely on greedy search (always picking the locally most probable next word), because it may fail to produce a good sentence.

    Instead we use 【Beam Search】 (a sketch appears after the decoding tricks below).


    We draw attention to "encoder-decoder" models, which learn a function that maps x into a fixed-size vector and then use a language model to "decode" that vector into a sequence of words.

    image


    Model: K&B 2013 (Kalchbrenner & Blunsom)

    image

    A simple encoder: just take the (cumulative) sum of the word embeddings (very easy).

    image
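
    A minimal sketch of this additive encoder, assuming a toy vocabulary and random embeddings (both purely illustrative):

```python
import numpy as np

# The context vector is just the (cum)sum of the source word embeddings.
rng = np.random.default_rng(0)
vocab = {"ich": 0, "mag": 1, "katzen": 2}
E = rng.normal(size=(len(vocab), 8))        # embedding matrix, d = 8

source = ["ich", "mag", "katzen"]
ids = [vocab[w] for w in source]
prefix_states = np.cumsum(E[ids], axis=0)   # running sum after each word
context = prefix_states[-1]                 # fixed-size vector for the decoder
print(context.shape)                        # (8,)
```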

    Another encoder, the CSM encoder: use a CNN to encode the sentence.

    image
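
    A minimal sketch of a convolutional sentence encoder in this spirit; the sizes, padding, and pooling choice here are illustrative assumptions, not the exact CSM architecture:

```python
import torch
import torch.nn as nn

# Convolve over word embeddings, then pool over time to a fixed-size vector.
vocab_size, emb_dim, hid_dim, seq_len = 100, 16, 32, 7

embed = nn.Embedding(vocab_size, emb_dim)
conv = nn.Conv1d(emb_dim, hid_dim, kernel_size=3, padding=1)

tokens = torch.randint(0, vocab_size, (1, seq_len))   # (batch, time)
x = embed(tokens).transpose(1, 2)                     # (batch, emb, time)
h = torch.tanh(conv(x))                               # (batch, hid, time)
context = h.max(dim=2).values                         # max-pool over time -> (batch, hid)
print(context.shape)                                  # torch.Size([1, 32])
```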

    The Decoder – RNN Decoder

    image

    The computation graph:

    image
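
    A minimal sketch of an RNN decoder conditioned on the encoder's context vector; all sizes are assumptions:

```python
import torch
import torch.nn as nn

# The context vector initialises the hidden state; at each step the RNN
# predicts a distribution over the next target word.
vocab_size, emb_dim, hid_dim = 100, 16, 32

embed = nn.Embedding(vocab_size, emb_dim)
rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
out = nn.Linear(hid_dim, vocab_size)

context = torch.randn(1, 1, hid_dim)               # stands in for an encoder output
prev_words = torch.randint(0, vocab_size, (1, 5))  # teacher-forced target prefix
h, _ = rnn(embed(prev_words), context)             # (batch, time, hid)
logits = out(h)                                    # scores for p(w_t | x, w_<t) per step
print(logits.shape)                                # torch.Size([1, 5, 100])
```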


    Sutskever et al. Model (2014):

    - An important, classic model.

    image

    Computation graph:

    image
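
    A minimal sketch in the spirit of this model, assuming toy sizes; the original used deep LSTMs, large vocabularies, and careful SGD training, so treat this only as the shape of the idea:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """LSTM encoder reads the source; its final (h, c) state conditions an
    LSTM decoder that generates the target."""
    def __init__(self, src_vocab, tgt_vocab, emb=32, hid=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.proj = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.src_embed(src))  # (h, c): fixed-size summary of x
        h, _ = self.decoder(self.tgt_embed(tgt_in), state)
        return self.proj(h)                           # logits for p(w_t | x, w_<t)

model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (1, 6))
tgt_in = torch.randint(0, 120, (1, 7))                # <s> + target prefix (teacher forcing)
print(model(src, tgt_in).shape)                       # torch.Size([1, 7, 120])
```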


    Some tricks for the Sutskever et al. model:

    • Read the input sequence 'backwards': +4 BLEU

      image

    • Use an ensemble of m 【independently trained】 models (at decoding time):
    1. Ensemble of 2 models: +3 BLEU
    2. Ensemble of 5 models: +4.5 BLEU


        For example:

          image

    • We want to find the most probable (MAP) output given the input, i.e.

          $w^* = \arg\max_{w} p(w \mid x)$

      We use beam search: +1 BLEU. (A sketch follows this list.)

        For example, with beam size 2:

          image
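
    A minimal beam-search sketch with beam size 2, as in the example above. The `step_probs` table is a hypothetical stand-in for a trained conditional LM that returns p(next word | x, prefix):

```python
import math

def step_probs(prefix):
    # Hypothetical next-word distributions for each prefix, in place of a real model.
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"c": 0.3, "</s>": 0.7},
        ("b",): {"c": 0.9, "</s>": 0.1},
        ("a", "c"): {"</s>": 1.0},
        ("b", "c"): {"</s>": 1.0},
    }
    return table[prefix]

def beam_search(beam_size=2, max_len=5):
    beams = [((), 0.0)]                              # (prefix, log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            if prefix and prefix[-1] == "</s>":
                candidates.append((prefix, logp))    # finished hypothesis: keep as-is
                continue
            for word, p in step_probs(prefix).items():
                candidates.append((prefix + (word,), logp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams

print(beam_search())  # the beam keeps both "a..." and "b..." hypotheses alive
```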


    An example application: image caption generation

    Encoder: a CNN.

    Decoder: an RNN, or a conditional n-gram LM (different from the RNN, but still useful).

                 image

                 image
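
    A minimal sketch of the CNN-encoder / RNN-decoder captioning setup; the tiny CNN below is an illustrative stand-in for a real pretrained network, and every size is an assumption:

```python
import torch
import torch.nn as nn

# CNN encodes the image to a fixed-size vector, which conditions an RNN
# decoder over caption words.
cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),    # -> (batch, 8)
    nn.Linear(8, 32),                         # -> decoder hidden size
)
embed = nn.Embedding(1000, 16)                # caption vocabulary of 1000 words
rnn = nn.GRU(16, 32, batch_first=True)
proj = nn.Linear(32, 1000)

image = torch.randn(1, 3, 64, 64)
h0 = cnn(image).unsqueeze(0)                  # (1, batch, hid): image vector as initial state
caption_prefix = torch.randint(0, 1000, (1, 4))
h, _ = rnn(embed(caption_prefix), h0)
logits = proj(h)                              # p(next caption word | image, prefix)
print(logits.shape)                           # torch.Size([1, 4, 1000])
```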


    We must already have suitable paired datasets (images with captions).

    The Kiros et al. model did exactly this.



















