• RNN comprehension


    RNN

    https://www.tensorflow.org/tutorials/text/text_classification_rnn

    Create the model

    (Figure: a drawing of the information flow in the model.)

    Above is a diagram of the model.

    1. This model can be built as a tf.keras.Sequential.

    2. The first layer is the encoder, which converts the text to a sequence of token indices.

    3. After the encoder is an embedding layer. An embedding layer stores one vector per word. When called, it converts the sequences of word indices to sequences of vectors. These vectors are trainable. After training (on enough data), words with similar meanings often have similar vectors.

      This index-lookup is much more efficient than the equivalent operation of passing a one-hot encoded vector through a tf.keras.layers.Dense layer (a short sketch contrasting the two follows this list).

    4. A recurrent neural network (RNN) processes sequence input by iterating through the elements. RNNs pass the outputs from one timestep to their input on the next timestep.

      The tf.keras.layers.Bidirectional wrapper can also be used with an RNN layer. This propagates the input forward and backwards through the RNN layer and then concatenates the final output.

      • The main advantage of a bidirectional RNN is that the signal from the beginning of the input doesn't need to be processed all the way through every timestep to affect the output.

      • The main disadvantage of a bidirectional RNN is that you can't efficiently stream predictions as words are being added to the end.

    5. After the RNN has converted the sequence to a single vector, the two tf.keras.layers.Dense layers do some final processing, converting from this vector representation to a single logit as the classification output.
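
    Before the model code, here is the index-lookup comparison promised in step 3: a minimal NumPy sketch (the sizes are illustrative, and the Dense layer's bias and activation are omitted) showing that an embedding lookup returns the same vector as pushing a one-hot vector through a linear layer, without the full matrix multiply:

    import numpy as np

    vocab_size, embed_dim = 1000, 64            # illustrative sizes
    W = np.random.rand(vocab_size, embed_dim)   # the embedding matrix

    token = 42                                  # a token index from the encoder
    one_hot = np.zeros(vocab_size)
    one_hot[token] = 1.0

    dense_out = one_hot @ W    # one-hot through a linear layer: O(vocab * dim) work
    lookup_out = W[token]      # embedding lookup: just reads one row

    assert np.allclose(dense_out, lookup_out)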

    The code to implement this is below:

    import tensorflow as tf

    # Build the encoder first, as the tutorial does: a TextVectorization layer
    # adapted on the training text. train_dataset is the tutorial's
    # tf.data.Dataset of (text, label) pairs.
    VOCAB_SIZE = 1000
    encoder = tf.keras.layers.TextVectorization(max_tokens=VOCAB_SIZE)
    encoder.adapt(train_dataset.map(lambda text, label: text))

    model = tf.keras.Sequential([
        encoder,
        tf.keras.layers.Embedding(
            input_dim=len(encoder.get_vocabulary()),
            output_dim=64,
            # Use masking to handle the variable sequence lengths
            mask_zero=True),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1)
    ])
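
    Compiling and trying the model on one sentence follows the tutorial's setup (the sample text is the tutorial's own; an untrained model will just output a near-arbitrary logit):

    import numpy as np

    model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  optimizer=tf.keras.optimizers.Adam(1e-4),
                  metrics=['accuracy'])

    sample_text = ('The movie was cool. The animation and the graphics '
                   'were out of this world. I would recommend this movie.')
    prediction = model.predict(np.array([sample_text]))
    print(prediction[0])  # a positive logit leans toward the positive class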

    LSTM

    https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

    The Problem, Short-term Memory

    Recurrent Neural Networks suffer from short-term memory. If a sequence is long enough, they’ll have a hard time carrying information from earlier time steps to later ones. So if you are trying to process a paragraph of text to make predictions, RNNs may leave out important information from the beginning.

    During back propagation, recurrent neural networks suffer from the vanishing gradient problem. Gradients are the values used to update a neural network’s weights. The vanishing gradient problem is when the gradient shrinks as it back propagates through time. If a gradient value becomes extremely small, it doesn’t contribute much to learning.

    Gradient Update Rule
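
    (The article illustrates this rule with an image; the step it depicts is the standard gradient-descent update:)

    new weight = weight - learning rate × gradient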

    So in recurrent neural networks, layers that get a small gradient update stop learning. Those are usually the earlier layers. Because these layers don’t learn, RNNs can forget what they have seen in longer sequences, thus having a short-term memory.
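
    A minimal numeric sketch of the shrinkage (the 0.9 per-timestep factor is purely illustrative): backpropagation through time multiplies one local derivative per timestep, so any factor below 1 drives the overall gradient toward zero.

    local_derivative = 0.9                # illustrative per-timestep factor < 1
    for timesteps in (10, 50, 100):
        gradient = local_derivative ** timesteps
        print(timesteps, gradient)        # ~0.35, ~0.0052, ~0.0000266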

    https://aditi-mittal.medium.com/understanding-rnn-and-lstm-f7cdf6dfc14e

    What is a Recurrent Neural Network (RNN)?

    Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. In other neural networks, all the inputs are independent of each other. But in an RNN, all the inputs are related to each other.

     

    First, it takes X(0) from the input sequence and outputs h(0), which together with X(1) is the input for the next step. Similarly, h(1) from that step, together with X(2), is the input for the step after that, and so on. This way, the network keeps remembering the context while training.
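
    A minimal sketch of that recurrence (the weight names, shapes, and tanh nonlinearity are illustrative, not from the article):

    import numpy as np

    def rnn_forward(xs, W_x, W_h, b):
        """Compute h(t) = tanh(W_x·x(t) + W_h·h(t-1) + b) over a sequence."""
        h = np.zeros(W_h.shape[0])              # initial state h(-1)
        for x in xs:                            # X(0), X(1), X(2), ...
            h = np.tanh(W_x @ x + W_h @ h + b)  # h carries the context forward
        return h

    # Illustrative shapes: 3 timesteps of 5-dim inputs, 4-dim hidden state
    xs = np.random.rand(3, 5)
    h_final = rnn_forward(xs, np.random.rand(4, 5), np.random.rand(4, 4), np.zeros(4))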

    Word Embedding

    https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa

     


    https://easyai.tech/ai-definition/word-embedding/

    CBOW

    https://thinkinfi.com/continuous-bag-of-words-cbow-single-word-model-how-it-works/

    To implement Word2Vec, there are two flavors: Continuous Bag-Of-Words (CBOW) and continuous Skip-gram (SG).
    In this post I will explain only the Continuous Bag of Words (CBOW) model, with a one-word window, to make CBOW clear. If you can understand CBOW with the single-word model, then the multi-word CBOW model will be easy for you.
    While explaining, I will present a few small examples with a text containing a few words. However, keep in mind that word2vec is typically trained with billions of words.

    Continuous Bag of Words (CBOW):

    It attempts to guess the output (target word) from its neighboring words (context words). You can think of it as a fill-in-the-blank task, where you guess the missing word by observing the nearby words.
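
    A minimal sketch of how single-word-window CBOW training pairs could be built (the sentence is illustrative, not from the article):

    text = "i like playing football with friends"
    tokens = text.split()

    # With a one-word window, each word's immediate neighbor is its context:
    pairs = []
    for i in range(len(tokens) - 1):
        pairs.append((tokens[i], tokens[i + 1]))  # (context, target)
        pairs.append((tokens[i + 1], tokens[i]))  # and the reverse direction
    print(pairs)  # [('i', 'like'), ('like', 'i'), ('like', 'playing'), ...]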

     