• Dinosaurus_Island_Character_level_language_model_final_v3b


    Character level language model - Dinosaurus Island

    Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment they are back. You are in charge of a special task. Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go berserk, so choose wisely!

    ![](https://i.imgur.com/jWnLsAT.jpg) style="250;height:300px;">

    Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this dataset. (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath!

    By completing this assignment you will learn:

    • How to store text data for processing using an RNN
    • How to synthesize data, by sampling predictions at each time step and passing it to the next RNN-cell unit
    • How to build a character-level text generation recurrent neural network
    • Why clipping the gradients is important

    We will begin by loading in some functions that we have provided for you in rnn_utils. Specifically, you have access to functions such as rnn_forward and rnn_backward which are equivalent to those you've implemented in the previous assignment.

    Updates

    If you were working on the notebook before this update...

    • The current notebook is version "3b".
    • You can find your original work saved in the notebook with the previous version name ("v3a")
    • To view the file directory, go to the menu "File->Open", and this will open a new tab that shows the file directory.

    List of updates 3b

    • removed redundant numpy import
    • clip
      • change test code to use variable name 'mvalue' rather than 'maxvalue' and deleted it from namespace to avoid confusion.
    • optimize
      • removed redundant description of clip function to discourage use of using 'maxvalue' which is not an argument to optimize
    • model
      • added 'verbose mode to print X,Y to aid in creating that code.
      • wordsmith instructions to prevent confusion
        • 2000 examples vs 100, 7 displayed vs 10
        • no randomization of order
    • sample
      • removed comments regarding potential different sample outputs to reduce confusion.
    import numpy as np
    from utils import *
    import random
    import pprint
    

    1 - Problem Statement

    1.1 - Dataset and Preprocessing

    Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size.

    data = open('dinos.txt', 'r').read()
    data= data.lower()
    chars = list(set(data))
    data_size, vocab_size = len(data), len(chars)
    print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))
    
    There are 19909 total characters and 27 unique characters in your data.
    
    • The characters are a-z (26 characters) plus the " " (or newline character).
    • In this assignment, the newline character " " plays a role similar to the <EOS> (or "End of sentence") token we had discussed in lecture.
      • Here, " " indicates the end of the dinosaur name rather than the end of a sentence.
    • char_to_ix: In the cell below, we create a python dictionary (i.e., a hash table) to map each character to an index from 0-26.
    • ix_to_char: We also create a second python dictionary that maps each index back to the corresponding character.
      • This will help you figure out what index corresponds to what character in the probability distribution output of the softmax layer.
    chars = sorted(chars)
    print(chars)
    
    ['
    ', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    
    char_to_ix = { ch:i for i,ch in enumerate(chars) }
    ix_to_char = { i:ch for i,ch in enumerate(chars) }
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(ix_to_char)
    
    {   0: '
    ',
        1: 'a',
        2: 'b',
        3: 'c',
        4: 'd',
        5: 'e',
        6: 'f',
        7: 'g',
        8: 'h',
        9: 'i',
        10: 'j',
        11: 'k',
        12: 'l',
        13: 'm',
        14: 'n',
        15: 'o',
        16: 'p',
        17: 'q',
        18: 'r',
        19: 's',
        20: 't',
        21: 'u',
        22: 'v',
        23: 'w',
        24: 'x',
        25: 'y',
        26: 'z'}
    

    1.2 - Overview of the model

    Your model will have the following structure:

    • Initialize parameters
    • Run the optimization loop
      • Forward propagation to compute the loss function
      • Backward propagation to compute the gradients with respect to the loss function
      • Clip the gradients to avoid exploding gradients
      • Using the gradients, update your parameters with the gradient descent update rule.
    • Return the learned parameters


    style="450;height:300px;">

    Figure 1: Recurrent Neural Network, similar to what you had built in the previous notebook "Building a Recurrent Neural Network - Step by Step".
    • At each time-step, the RNN tries to predict what is the next character given the previous characters.
    • The dataset (mathbf{X} = (x^{langle 1 angle}, x^{langle 2 angle}, ..., x^{langle T_x angle})) is a list of characters in the training set.
    • (mathbf{Y} = (y^{langle 1 angle}, y^{langle 2 angle}, ..., y^{langle T_x angle})) is the same list of characters but shifted one character forward.
    • At every time-step (t), (y^{langle t angle} = x^{langle t+1 angle}). The prediction at time (t) is the same as the input at time (t + 1).

    2 - Building blocks of the model

    In this part, you will build two important blocks of the overall model:

    • Gradient clipping: to avoid exploding gradients
    • Sampling: a technique used to generate characters

    You will then apply these two functions to build the model.

    2.1 - Clipping the gradients in the optimization loop

    In this section you will implement the clip function that you will call inside of your optimization loop.

    Exploding gradients

    • When gradients are very large, they're called "exploding gradients."
    • Exploding gradients make the training process more difficult, because the updates may be so large that they "overshoot" the optimal values during back propagation.

    Recall that your overall loop structure usually consists of:

    • forward pass,
    • cost computation,
    • backward pass,
    • parameter update.

    Before updating the parameters, you will perform gradient clipping to make sure that your gradients are not "exploding."

    gradient clipping

    In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of gradients if needed.

    • There are different ways to clip gradients.
    • We will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie between some range [-N, N].
    • For example, if the N=10
      • The range is [-10, 10]
      • If any component of the gradient vector is greater than 10, it is set to 10.
      • If any component of the gradient vector is less than -10, it is set to -10.
      • If any components are between -10 and 10, they keep their original values.


    style="400;height:150px;">

    Figure 2: Visualization of gradient descent with and without gradient clipping, in a case where the network is running into "exploding gradient" problems.

    Exercise:
    Implement the function below to return the clipped gradients of your dictionary gradients.

    • Your function takes in a maximum threshold and returns the clipped versions of the gradients.
    • You can check out numpy.clip.
      • You will need to use the argument "out = ...".
      • Using the "out" parameter allows you to update a variable "in-place".
      • If you don't use "out" argument, the clipped variable is stored in the variable "gradient" but does not update the gradient variables dWax, dWaa, dWya, db, dby.
    ### GRADED FUNCTION: clip
    
    def clip(gradients, maxValue):
        '''
        Clips the gradients' values between minimum and maximum.
        
        Arguments:
        gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
        maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue
        
        Returns: 
        gradients -- a dictionary with the clipped gradients.
        '''
        
        dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
       
        ### START CODE HERE ###
        # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
        for gradient in [dWaa, dWax, dWya, db, dby]:
            gradient = np.clip(gradient,-maxValue,maxValue,out = gradient)
        ### END CODE HERE ###
        
        gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
        
        return gradients
    
    # Test with a maxvalue of 10
    mValue = 10
    np.random.seed(3)
    dWax = np.random.randn(5,3)*10
    dWaa = np.random.randn(5,5)*10
    dWya = np.random.randn(2,5)*10
    db = np.random.randn(5,1)*10
    dby = np.random.randn(2,1)*10
    gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
    gradients = clip(gradients, mValue)
    print("gradients["dWaa"][1][2] =", gradients["dWaa"][1][2])
    print("gradients["dWax"][3][1] =", gradients["dWax"][3][1])
    print("gradients["dWya"][1][2] =", gradients["dWya"][1][2])
    print("gradients["db"][4] =", gradients["db"][4])
    print("gradients["dby"][1] =", gradients["dby"][1])
    
    gradients["dWaa"][1][2] = 10.0
    gradients["dWax"][3][1] = -10.0
    gradients["dWya"][1][2] = 0.29713815361
    gradients["db"][4] = [ 10.]
    gradients["dby"][1] = [ 8.45833407]
    

    ** Expected output:**

    gradients["dWaa"][1][2] = 10.0
    gradients["dWax"][3][1] = -10.0
    gradients["dWya"][1][2] = 0.29713815361
    gradients["db"][4] = [ 10.]
    gradients["dby"][1] = [ 8.45833407]
    
    # Test with a maxValue of 5
    mValue = 5
    np.random.seed(3)
    dWax = np.random.randn(5,3)*10
    dWaa = np.random.randn(5,5)*10
    dWya = np.random.randn(2,5)*10
    db = np.random.randn(5,1)*10
    dby = np.random.randn(2,1)*10
    gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
    gradients = clip(gradients, mValue)
    print("gradients["dWaa"][1][2] =", gradients["dWaa"][1][2])
    print("gradients["dWax"][3][1] =", gradients["dWax"][3][1])
    print("gradients["dWya"][1][2] =", gradients["dWya"][1][2])
    print("gradients["db"][4] =", gradients["db"][4])
    print("gradients["dby"][1] =", gradients["dby"][1])
    del mValue # avoid common issue
    
    gradients["dWaa"][1][2] = 5.0
    gradients["dWax"][3][1] = -5.0
    gradients["dWya"][1][2] = 0.29713815361
    gradients["db"][4] = [ 5.]
    gradients["dby"][1] = [ 5.]
    

    ** Expected Output: **

    gradients["dWaa"][1][2] = 5.0
    gradients["dWax"][3][1] = -5.0
    gradients["dWya"][1][2] = 0.29713815361
    gradients["db"][4] = [ 5.]
    gradients["dby"][1] = [ 5.]
    

    2.2 - Sampling

    Now assume that your model is trained. You would like to generate new text (characters). The process of generation is explained in the picture below:


    style="500;height:300px;">

    Figure 3: In this picture, we assume the model is already trained. We pass in (x^{langle 1 angle} = vec{0}) at the first time step, and have the network sample one character at a time.

    Exercise: Implement the sample function below to sample characters. You need to carry out 4 steps:

    • Step 1: Input the "dummy" vector of zeros (x^{langle 1 angle} = vec{0}).

      • This is the default input before we've generated any characters.
        We also set (a^{langle 0 angle} = vec{0})
    • Step 2: Run one step of forward propagation to get (a^{langle 1 angle}) and (hat{y}^{langle 1 angle}). Here are the equations:

    hidden state:

    [ a^{langle t+1 angle} = anh(W_{ax} x^{langle t+1 angle } + W_{aa} a^{langle t angle } + b) ag{1}]

    activation:
    [ z^{langle t + 1 angle } = W_{ya} a^{langle t + 1 angle } + b_y ag{2}]

    prediction:
    [ hat{y}^{langle t+1 angle } = softmax(z^{langle t + 1 angle }) ag{3}]

    • Details about (hat{y}^{langle t+1 angle }):
      • Note that (hat{y}^{langle t+1 angle }) is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1).
      • (hat{y}^{langle t+1 angle}_i) represents the probability that the character indexed by "i" is the next character.
      • We have provided a softmax() function that you can use.

    Additional Hints

    • (x^{langle 1 angle}) is x in the code. When creating the one-hot vector, make a numpy array of zeros, with the number of rows equal to the number of unique characters, and the number of columns equal to one. It's a 2D and not a 1D array.
    • (a^{langle 0 angle}) is a_prev in the code. It is a numpy array of zeros, where the number of rows is (n_{a}), and number of columns is 1. It is a 2D array as well. (n_{a}) is retrieved by getting the number of columns in (W_{aa}) (the numbers need to match in order for the matrix multiplication (W_{aa}a^{langle t angle}) to work.
    • numpy.dot
    • numpy.tanh

    Using 2D arrays instead of 1D arrays

    • You may be wondering why we emphasize that (x^{langle 1 angle}) and (a^{langle 0 angle}) are 2D arrays and not 1D vectors.
    • For matrix multiplication in numpy, if we multiply a 2D matrix with a 1D vector, we end up with with a 1D array.
    • This becomes a problem when we add two arrays where we expected them to have the same shape.
    • When two arrays with a different number of dimensions are added together, Python "broadcasts" one across the other.
    • Here is some sample code that shows the difference between using a 1D and 2D array.
    matrix1 = np.array([[1,1],[2,2],[3,3]]) # (3,2)
    matrix2 = np.array([[0],[0],[0]]) # (3,1) 
    vector1D = np.array([1,1]) # (2,) 
    vector2D = np.array([[1],[1]]) # (2,1)
    print("matrix1 
    ", matrix1,"
    ")
    print("matrix2 
    ", matrix2,"
    ")
    print("vector1D 
    ", vector1D,"
    ")
    print("vector2D 
    ", vector2D)
    
    matrix1 
     [[1 1]
     [2 2]
     [3 3]] 
    
    matrix2 
     [[0]
     [0]
     [0]] 
    
    vector1D 
     [1 1] 
    
    vector2D 
     [[1]
     [1]]
    
    print("Multiply 2D and 1D arrays: result is a 1D array
    ", 
          np.dot(matrix1,vector1D))
    print("Multiply 2D and 2D arrays: result is a 2D array
    ", 
          np.dot(matrix1,vector2D))
    
    Multiply 2D and 1D arrays: result is a 1D array
     [2 4 6]
    Multiply 2D and 2D arrays: result is a 2D array
     [[2]
     [4]
     [6]]
    
    print("Adding (3 x 1) vector to a (3 x 1) vector is a (3 x 1) vector
    ",
          "This is what we want here!
    ", 
          np.dot(matrix1,vector2D) + matrix2)
    
    Adding (3 x 1) vector to a (3 x 1) vector is a (3 x 1) vector
     This is what we want here!
     [[2]
     [4]
     [6]]
    
    print("Adding a (3,) vector to a (3 x 1) vector
    ",
          "broadcasts the 1D array across the second dimension
    ",
          "Not what we want here!
    ",
          np.dot(matrix1,vector1D) + matrix2
         )
    
    Adding a (3,) vector to a (3 x 1) vector
     broadcasts the 1D array across the second dimension
     Not what we want here!
     [[2 4 6]
     [2 4 6]
     [2 4 6]]
    
    • Step 3: Sampling:

      • Now that we have (y^{langle t+1 angle}), we want to select the next letter in the dinosaur name. If we select the most probable, the model will always generate the same result given a starting letter. To make the results more interesting, we will use np.random.choice to select a next letter that is likely, but not always the same.
      • Pick the next character's index according to the probability distribution specified by (hat{y}^{langle t+1 angle }).
      • This means that if (hat{y}^{langle t+1 angle }_i = 0.16), you will pick the index "i" with 16% probability.
      • Use np.random.choice.

      Example of how to use np.random.choice():

      np.random.seed(0)
      probs = np.array([0.1, 0.0, 0.7, 0.2])
      idx = np.random.choice(range(len((probs)), p = probs)
      • This means that you will pick the index (idx) according to the distribution:

      (P(index = 0) = 0.1, P(index = 1) = 0.0, P(index = 2) = 0.7, P(index = 3) = 0.2).

      • Note that the value that's set to p should be set to a 1D vector.
      • Also notice that (hat{y}^{langle t+1 angle}), which is y in the code, is a 2D array.
      • Also notice, while in your implementation, the first argument to np.random.choice is just an ordered list [0,1,.., vocab_len-1], it is Not appropriate to use char_to_ix.values(). The order of values returned by a python dictionary .values() call will be the same order as they are added to the dictionary. The grader may have a different order when it runs your routine than when you run it in your notebook.
    Additional Hints
    • range
    • numpy.ravel takes a multi-dimensional array and returns its contents inside of a 1D vector.

      arr = np.array([[1,2],[3,4]])
      print("arr")
      print(arr)
      print("arr.ravel()")
      print(arr.ravel())

      Output:

      arr
      [[1 2]
      [3 4]]
      arr.ravel()
      [1 2 3 4]
    • Note that append is an "in-place" operation. In other words, don't do this:

      fun_hobbies = fun_hobbies.append('learning')  ## Doesn't give you what you want
      
    • Step 4: Update to (x^{langle t angle })

      • The last step to implement in sample() is to update the variable x, which currently stores (x^{langle t angle }), with the value of (x^{langle t + 1 angle }).
      • You will represent (x^{langle t + 1 angle }) by creating a one-hot vector corresponding to the character that you have chosen as your prediction.
      • You will then forward propagate (x^{langle t + 1 angle }) in Step 1 and keep repeating the process until you get a " " character, indicating that you have reached the end of the dinosaur name.
    Additional Hints
    • In order to reset x before setting it to the new one-hot vector, you'll want to set all the values to zero.
    # GRADED FUNCTION: sample
    
    def sample(parameters, char_to_ix, seed):
        """
        Sample a sequence of characters according to a sequence of probability distributions output of the RNN
    
        Arguments:
        parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b. 
        char_to_ix -- python dictionary mapping each character to an index.
        seed -- used for grading purposes. Do not worry about it.
    
        Returns:
        indices -- a list of length n containing the indices of the sampled characters.
        """
        
        # Retrieve parameters and relevant shapes from "parameters" dictionary
        Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
        vocab_size = by.shape[0]
        n_a = Waa.shape[1]
        
        ### START CODE HERE ###
        # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
        x = np.zeros(( vocab_size, 1))
        # Step 1': Initialize a_prev as zeros (≈1 line)
        a_prev = np.zeros(( n_a, 1))
        
        # Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)
        indices = []
        
        # Idx is a flag to detect a newline character, we initialize it to -1
        idx = -1 
        
        # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append 
        # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well 
        # trained model), which helps debugging and prevents entering an infinite loop. 
        counter = 0
        newline_character = char_to_ix['
    ']
        
        while (idx != newline_character and counter != 50):
            
            # Step 2: Forward propagate x using the equations (1), (2) and (3)
            a = np.tanh( np.dot(Waa, a_prev) + np.dot(Wax, x) + b)
            z = np.dot(Wya, a) + by
            y = softmax(z)
            
            # for grading purposes
            np.random.seed(counter+seed) 
            
            # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
            idx = np.random.choice(range(vocab_size), p = y.ravel())
     
            # Append the index to "indices"
            indices.append(idx)
            
            # Step 4: Overwrite the input character as the one corresponding to the sampled index.
            x = np.zeros(( vocab_size, 1))
            x[idx,:] = 1
            
            # Update "a_prev" to be "a"
            a_prev = a
            
            # for grading purposes
            seed += 1
            counter +=1
            
        ### END CODE HERE ###
     
        if (counter == 50):
            indices.append(char_to_ix['
    '])
        
        return indices
    
    np.random.seed(2)
    _, n_a = 20, 100
    Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
    b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
    parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
    
    
    indices = sample(parameters, char_to_ix, 0)
    print("Sampling:")
    print("list of sampled indices:
    ", indices)
    print("list of sampled characters:
    ", [ix_to_char[i] for i in indices])
    
    Sampling:
    list of sampled indices:
     [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 17, 24, 12, 13, 24, 0]
    list of sampled characters:
     ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'q', 'x', 'l', 'm', 'x', '
    ']
    

    ** Expected output:**

    Sampling:
    list of sampled indices:
     [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 17, 24, 12, 13, 24, 0]
    list of sampled characters:
     ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'q', 'x', 'l', 'm', 'x', '
    ']
    

    3 - Building the language model

    It is time to build the character-level language model for text generation.

    3.1 - Gradient descent

    • In this section you will implement a function performing one step of stochastic gradient descent (with clipped gradients).
    • You will go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent.

    As a reminder, here are the steps of a common optimization loop for an RNN:

    • Forward propagate through the RNN to compute the loss
    • Backward propagate through time to compute the gradients of the loss with respect to the parameters
    • Clip the gradients
    • Update the parameters using gradient descent

    Exercise: Implement the optimization process (one step of stochastic gradient descent).

    The following functions are provided:

    def rnn_forward(X, Y, a_prev, parameters):
        """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
        It returns the loss' value as well as a "cache" storing values to be used in backpropagation."""
        ....
        return loss, cache
        
    def rnn_backward(X, Y, parameters, cache):
        """ Performs the backward propagation through time to compute the gradients of the loss with respect
        to the parameters. It returns also all the hidden states."""
        ...
        return gradients, a
    
    def update_parameters(parameters, gradients, learning_rate):
        """ Updates parameters using the Gradient Descent Update Rule."""
        ...
        return parameters
    

    Recall that you previously implemented the clip function:

    parameters

    • Note that the weights and biases inside the parameters dictionary are being updated by the optimization, even though parameters is not one of the returned values of the optimize function. The parameters dictionary is passed by reference into the function, so changes to this dictionary are making changes to the parameters dictionary even when accessed outside of the function.
    • Python dictionaries and lists are "pass by reference", which means that if you pass a dictionary into a function and modify the dictionary within the function, this changes that same dictionary (it's not a copy of the dictionary).
    # GRADED FUNCTION: optimize
    
    def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
        """
        Execute one step of the optimization to train the model.
        
        Arguments:
        X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
        Y -- list of integers, exactly the same as X but shifted one index to the left.
        a_prev -- previous hidden state.
        parameters -- python dictionary containing:
                            Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                            Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                            Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                            b --  Bias, numpy array of shape (n_a, 1)
                            by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
        learning_rate -- learning rate for the model.
        
        Returns:
        loss -- value of the loss function (cross-entropy)
        gradients -- python dictionary containing:
                            dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                            dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                            dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                            db -- Gradients of bias vector, of shape (n_a, 1)
                            dby -- Gradients of output bias vector, of shape (n_y, 1)
        a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
        """
        
        ### START CODE HERE ###
        
        # Forward propagate through time (≈1 line)
        loss, cache = rnn_forward(X, Y, a_prev, parameters)
        
        # Backpropagate through time (≈1 line)
        gradients, a = rnn_backward(X, Y, parameters, cache)
        
        # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
        gradients = clip(gradients, 5)
        
        # Update parameters (≈1 line)
        parameters = update_parameters(parameters, gradients, learning_rate)
        
        ### END CODE HERE ###
        
        return loss, gradients, a[len(X)-1]
    
    np.random.seed(1)
    vocab_size, n_a = 27, 100
    a_prev = np.random.randn(n_a, 1)
    Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
    b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
    parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
    X = [12,3,5,11,22,3]
    Y = [4,14,11,22,25, 26]
    
    loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
    print("Loss =", loss)
    print("gradients["dWaa"][1][2] =", gradients["dWaa"][1][2])
    print("np.argmax(gradients["dWax"]) =", np.argmax(gradients["dWax"]))
    print("gradients["dWya"][1][2] =", gradients["dWya"][1][2])
    print("gradients["db"][4] =", gradients["db"][4])
    print("gradients["dby"][1] =", gradients["dby"][1])
    print("a_last[4] =", a_last[4])
    
    Loss = 126.503975722
    gradients["dWaa"][1][2] = 0.194709315347
    np.argmax(gradients["dWax"]) = 93
    gradients["dWya"][1][2] = -0.007773876032
    gradients["db"][4] = [-0.06809825]
    gradients["dby"][1] = [ 0.01538192]
    a_last[4] = [-1.]
    

    ** Expected output:**

    Loss = 126.503975722
    gradients["dWaa"][1][2] = 0.194709315347
    np.argmax(gradients["dWax"]) = 93
    gradients["dWya"][1][2] = -0.007773876032
    gradients["db"][4] = [-0.06809825]
    gradients["dby"][1] = [ 0.01538192]
    a_last[4] = [-1.]
    

    3.2 - Training the model

    • Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example.
    • Every 2000 steps of stochastic gradient descent, you will sample several randomly chosen names to see how the algorithm is doing.

    Exercise: Follow the instructions and implement model(). When examples[index] contains one dinosaur name (string), to create an example (X, Y), you can use this:

    Set the index idx into the list of examples
    • Using the for-loop, walk through the shuffled list of dinosaur names in the list "examples".
    • For example, if there are n_e examples, and the for-loop increments the index to n_e onwards, think of how you would make the index cycle back to 0, so that we can continue feeding the examples into the model when j is n_e, n_e + 1, etc.
    • Hint: n_e + 1 divided by n_e is zero with a remainder of 1.
    • % is the modulus operator in python.
    Extract a single example from the list of examples
    • single_example: use the idx index that you set previously to get one word from the list of examples.
    Convert a string into a list of characters: single_example_chars
    • single_example_chars: A string is a list of characters.
    • You can use a list comprehension (recommended over for-loops) to generate a list of characters. Python str = 'I love learning'
      list_of_chars = [c for c in str]
      print(list_of_chars)
    ['I', ' ', 'l', 'o', 'v', 'e', ' ', 'l', 'e', 'a', 'r', 'n', 'i', 'n', 'g']
    
    Convert list of characters to a list of integers: single_example_ix
    • Create a list that contains the index numbers associated with each character.
    • Use the dictionary char_to_ix
    • You can combine this with the list comprehension that is used to get a list of characters from a string.
    Create the list of input characters: X
    • rnn_forward uses the None value as a flag to set the input vector as a zero-vector.
    • Prepend the list [None] in front of the list of input characters.
    • There is more than one way to prepend a value to a list. One way is to add two lists together: ['a'] + ['b']
    Get the integer representation of the newline character ix_newline
    • ix_newline: The newline character signals the end of the dinosaur name.
      • get the integer representation of the newline character ' '.
      • Use char_to_ix
    Set the list of labels (integer representation of the characters): Y
    • The goal is to train the RNN to predict the next letter in the name, so the labels are the list of characters that are one time step ahead of the characters in the input X.
      • For example, Y[0] contains the same value as X[1]
    • The RNN should predict a newline at the last letter so add ix_newline to the end of the labels.
      • Append the integer representation of the newline character to the end of Y.
      • Note that append is an in-place operation.
      • It might be easier for you to add two lists together.
    # GRADED FUNCTION: model
    
    def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27, verbose = False):
        """
        Trains the model and generates dinosaur names. 
        
        Arguments:
        data -- text corpus
        ix_to_char -- dictionary that maps the index to a character
        char_to_ix -- dictionary that maps a character to an index
        num_iterations -- number of iterations to train the model for
        n_a -- number of units of the RNN cell
        dino_names -- number of dinosaur names you want to sample at each iteration. 
        vocab_size -- number of unique characters found in the text (size of the vocabulary)
        
        Returns:
        parameters -- learned parameters
        """
        
        # Retrieve n_x and n_y from vocab_size
        n_x, n_y = vocab_size, vocab_size
        
        # Initialize parameters
        parameters = initialize_parameters(n_a, n_x, n_y)
        
        # Initialize loss (this is required because we want to smooth our loss)
        loss = get_initial_loss(vocab_size, dino_names)
        
        # Build list of all dinosaur names (training examples).
        with open("dinos.txt") as f:
            examples = f.readlines()
        examples = [x.lower().strip() for x in examples]
        
        # Shuffle list of all dinosaur names
        np.random.seed(0)
        np.random.shuffle(examples)
        
        # Initialize the hidden state of your LSTM
        a_prev = np.zeros((n_a, 1))
        
        # Optimization loop
        for j in range(num_iterations):
            
            ### START CODE HERE ###
            
            # Set the index `idx` (see instructions above)
            idx = j % len(examples)
            
            # Set the input X (see instructions above)
            single_example = examples[idx]
            single_example_chars = [c for c in single_example]
            single_example_ix = [char_to_ix[c] for c in single_example_chars]
            X = [None]+ [single_example_ix]
            X = [None] + [char_to_ix[ch] for ch in examples[idx]]; 
            # Set the labels Y (see instructions above)
            ix_newline = char_to_ix["
    "]
            Y = X[1:]+ [ix_newline]
    
            # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
            # Choose a learning rate of 0.01
            curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate=0.01)
            
            ### END CODE HERE ###
            
            # debug statements to aid in correctly forming X, Y
            if verbose and j in [0, len(examples) -1, len(examples)]:
                print("j = " , j, "idx = ", idx,) 
            if verbose and j in [0]:
                print("single_example =", single_example)
                print("single_example_chars", single_example_chars)
                print("single_example_ix", single_example_ix)
                print(" X = ", X, "
    ", "Y =       ", Y, "
    ")
            
            # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
            loss = smooth(loss, np.mean(curr_loss))
    
            # Every 2000 Iteration, generate "n" characters thanks to sample() to check if the model is learning properly
            if j % 2000 == 0:
                
                print('Iteration: %d, Loss: %f' % (j, loss) + '
    ')
                
                # The number of dinosaur names to print
                seed = 0
                for name in range(dino_names):
                    
                    # Sample indices and print them
                    sampled_indices = sample(parameters, char_to_ix, seed)
                    print_sample(sampled_indices, ix_to_char)
                    
                    seed += 1  # To get the same result (for grading purposes), increment the seed by one. 
          
                print('
    ')
            
        return parameters
    

    Run the following cell, you should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations, your model should learn to generate reasonable-looking names.

    parameters = model(data, ix_to_char, char_to_ix, verbose = True)
    
    j =  0 idx =  0
    single_example = turiasaurus
    single_example_chars ['t', 'u', 'r', 'i', 'a', 's', 'a', 'u', 'r', 'u', 's']
    single_example_ix [20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19]
     X =  [None, [20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19]] 
     Y =        [[20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19], 0] 
    
    Iteration: 0, Loss: 23.054379
    
    Nkzxwtdmfqoeyhsqwasjkjvu
    Kneb
    Kzxwtdmfqoeyhsqwasjkjvu
    Neb
    Zxwtdmfqoeyhsqwasjkjvu
    Eb
    Xwtdmfqoeyhsqwasjkjvu
    
    
    j =  1535 idx =  1535
    j =  1536 idx =  0
    Iteration: 2000, Loss: 42.405948
    
    Rouuusarasrouossuasooossaosouoo
    Oraaaasuoaaruouoaouraauaosuaausraronsruroorsroross
    Ouuusarosrauossuasooous
    Roa
    Uuusarasrouossuasooossaosouoo
    Naansuoaaruouoaouraauaosuaausraronsruroorsrorossro
    Uusarosrauossuasooous
    
    
    
    
    /home/jovyan/work/Week 1/Dinosaur Island -- Character-level language model/utils.py:96: RuntimeWarning: divide by zero encountered in log
      loss -= np.log(y_hat[t][Y[t],0])
    
    
    Iteration: 4000, Loss: inf
    
    Rruuusarossouossu
    Rsoaaasuraaruruooruroauarsuaauurorrnssurrossrosous
    Ruuusarossourssuasrrrus
    Roa
    Uuusarossouossu
    Oaaasuraaruruooruroauarsuaauurorrnssurrossrosousrr
    Uusnrossourssuasrrrus
    
    
    Iteration: 6000, Loss: inf
    
    Rruuusorossnuossuasorousaououoo
    Rroaansuoaaououooouronunosuaauuroronsrurrnssrnsnus
    Ruuusnrnsrouossuasrorus
    Roa
    Uuusorossnuossuasorousaououoo
    Oaansuoaaououooouronunosuaauuroronsrurrnssrnsnusro
    Uusnrnsrouossuasrorus
    
    
    Iteration: 8000, Loss: inf
    
    Rruuusaoarraunsru
    Rra
    Ruuusarnrrauosruarrooss
    Roa
    Uuusaoarraunsru
    Oaa
    Uusarnrrauosruarrooss
    
    
    Iteration: 10000, Loss: inf
    
    Rruuusaoorrauosrs
    Oro
    Ruuusararrouosruasoooss
    Roa
    Uuusaoorrauosrs
    Oaa
    Uusararrouosruasoooss
    
    
    Iteration: 12000, Loss: inf
    
    Rruuusaoarrauasrs
    Ora
    Ruuusararrauasruasoooss
    Raa
    Uuusaoarrauasrs
    Aaa
    Uusararrauasruasoooss
    
    
    Iteration: 14000, Loss: inf
    
    Rouussaoarrauarrs
    Ora
    Russr
    Roa
    Uussaoarrauarrs
    Aaa
    Usr
    
    
    Iteration: 16000, Loss: inf
    
    Rruuusanaos
    Osa
    Ruuus
    Roa
    Uuusanaos
    Aaa
    Uus
    
    
    Iteration: 18000, Loss: inf
    
    Rouuusaoarrauasru
    Ora
    Ruuus
    Raa
    Uuusaoarrauasru
    Aaa
    Uus
    
    
    Iteration: 20000, Loss: inf
    
    Rruuusaoorrauosrs
    Rro
    Ruuusararrouosruasooous
    Roa
    Uuusaoorrauosrs
    Oaa
    Uusararrouosruasooous
    
    
    Iteration: 22000, Loss: inf
    
    Roussraaaor
    Ora
    Ousss
    Raa
    Ussraaaor
    Aaa
    Uss
    
    
    Iteration: 24000, Loss: inf
    
    Roussraaanr
    Ora
    Oussr
    Raa
    Ussraaanr
    Aaa
    Usr
    
    
    Iteration: 26000, Loss: inf
    
    Rouuusanaoraunsrs
    Ora
    Ruussarasoauosssaronoss
    Roa
    Uuusanaoraunsrs
    Aaa
    Uusarasoauosssaronoss
    
    
    Iteration: 28000, Loss: inf
    
    Rruussaaarr
    Rra
    Ruuss
    Raa
    Uussaaarr
    Aaa
    Uus
    
    
    Iteration: 30000, Loss: inf
    
    Rouuuuaaaor
    Osa
    Ouuuu
    Raa
    Uuuuaaaor
    Aaa
    Uuu
    
    
    Iteration: 32000, Loss: inf
    
    Rruuuranaor
    Ora
    Ruuus
    Raa
    Uuuranaor
    Aaa
    Uus
    
    
    Iteration: 34000, Loss: inf
    
    Rruuusaaarr
    Ora
    Ruuus
    Raa
    Uuusaaarr
    Aaa
    Uus
    

    ** Expected Output**

    j =  0 idx =  0
    single_example = turiasaurus
    single_example_chars ['t', 'u', 'r', 'i', 'a', 's', 'a', 'u', 'r', 'u', 's']
    single_example_ix [20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19]
     X =  [None, 20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19] 
     Y =        [20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19, 0] 
    
    Iteration: 0, Loss: 23.087336
    
    Nkzxwtdmfqoeyhsqwasjkjvu
    Kneb
    Kzxwtdmfqoeyhsqwasjkjvu
    Neb
    Zxwtdmfqoeyhsqwasjkjvu
    Eb
    Xwtdmfqoeyhsqwasjkjvu
    
    
    j =  1535 idx =  1535
    j =  1536 idx =  0
    Iteration: 2000, Loss: 27.884160
    
    ...
    
    Iteration: 34000, Loss: 22.447230
    
    Onyxipaledisons
    Kiabaeropa
    Lussiamang
    Pacaeptabalsaurus
    Xosalong
    Eiacoteg
    Troia
    

    Conclusion

    You can see that your algorithm has started to generate plausible dinosaur names towards the end of the training. At first, it was generating random characters, but towards the end you could see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results. Our implementation generated some really cool names like maconucon, marloralus and macingsersaurus. Your model hopefully also learned that dinosaur names tend to end in saurus, don, aura, tor, etc.

    If your model generates some non-cool names, don't blame the model entirely--not all actual dinosaur names sound cool. (For example, dromaeosauroides is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest!

    This assignment had used a relatively small dataset, so that you could train an RNN quickly on a CPU. Training a model of the english language requires a much bigger dataset, and usually needs much more computation, and could run for many hours on GPUs. We ran our dinosaur name for quite some time, and so far our favorite name is the great, undefeatable, and fierce: Mangosaurus!


    style="250;height:300px;">

    4 - Writing like Shakespeare

    The rest of this notebook is optional and is not graded, but we hope you'll do it anyway since it's quite fun and informative.

    A similar (but more complicated) task is to generate Shakespeare poems. Instead of learning from a dataset of Dinosaur names you can use a collection of Shakespearian poems. Using LSTM cells, you can learn longer term dependencies that span many characters in the text--e.g., where a character appearing somewhere a sequence can influence what should be a different character much much later in the sequence. These long term dependencies were less important with dinosaur names, since the names were quite short.


    style="500;height:400px;">

    Let's become poets!

    We have implemented a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a few minutes.

    from __future__ import print_function
    from keras.callbacks import LambdaCallback
    from keras.models import Model, load_model, Sequential
    from keras.layers import Dense, Activation, Dropout, Input, Masking
    from keras.layers import LSTM
    from keras.utils.data_utils import get_file
    from keras.preprocessing.sequence import pad_sequences
    from shakespeare_utils import *
    import sys
    import io
    
    Using TensorFlow backend.
    
    
    Loading text data...
    Creating training set...
    number of training examples: 31412
    Vectorizing training set...
    Loading model...
    

    To save you some time, we have already trained a model for ~1000 epochs on a collection of Shakespearian poems called "The Sonnets".

    Let's train the model for one more epoch. When it finishes training for an epoch---this will also take a few minutes---you can run generate_output, which will prompt asking you for an input (<40 characters). The poem will start with your sentence, and our RNN-Shakespeare will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense " (don't enter the quotation marks). Depending on whether you include the space at the end, your results might also differ--try it both ways, and try other inputs as well.

    print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
    
    model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])
    
    Epoch 1/1
    31412/31412 [==============================] - 320s - loss: 2.5707   
    
    
    
    
    
    <keras.callbacks.History at 0x7fbeb1377080>
    
    # Run this cell to try with different inputs without having to re-train the model 
    generate_output()
    
    Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: here is the
    
    
    Here is your poem: 
    
    here is thes,
    i eiding nides ade'm my galin thrime,
    whosce gomirs of formied your mine,
    co nkolasy gees whe murtige nove,
    that dimy eor pars lur how thum or featanes
    thy beautys end thou of sather'e dabes,
    now hatouss beharbed pribe as you lof pry,
    why fors an weme som with thy save's of kyound.
    ser fally sthere to rose bey eye's brand,
    thy sourthy reall betide the dead.
    is id hase durh worlled himy is wring
    

    The RNN-Shakespeare model is very similar to the one you have built for dinosaur names. The only major differences are:

    • LSTMs instead of the basic RNN to capture longer-range dependencies
    • The model is a deeper, stacked LSTM model (2 layer)
    • Using Keras instead of python to simplify the code

    If you want to learn more, you can also check out the Keras Team's text generation implementation on GitHub: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py.

    Congratulations on finishing this notebook!

    References:

    
    
  • 相关阅读:
    SMTP协议简介
    Debian
    TCP/IP协议基础
    CentOS(Community ENTerprise Operating System)
    IO
    【备忘】Windows网络命令行操作
    repeater中绑定dropdownlist事件
    MultipleActiveResultSets
    .NET UEditor使用方法
    Asp.net中时间格式化的几种方法
  • 原文地址:https://www.cnblogs.com/tolshao/p/dinosaurusislandcharacterlevellanguagemodelfinalv3.html
Copyright © 2020-2023  润新知