课程二(Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization)，第三周（Hyperparameter tuning, Batch Normalization and Programming Frameworks） —— 2.Programming assignments

Tensorflow

Welcome to the Tensorflow Tutorial! In this notebook you will learn all the basics of Tensorflow. You will implement useful functions and draw the parallel with what you did using Numpy. You will understand what Tensors and operations are, as well as how to execute them in a computation graph.

After completing this assignment you will also be able to implement your own deep learning models using Tensorflow. In fact, using our brand new SIGNS dataset, you will build a deep neural network model to recognize numbers from 0 to 5 in sign language with a pretty impressive accuracy.

【中文翻译】

Tensorflow

欢迎来到 Tensorflow 教程!在本笔记本中, 您将学习 Tensorflow 的所有基础知识。您将实现有用的函数，并与您使用 Numpy 实现并行一样，实现并行化。您将了解什么是张量和运算, 以及如何在计算图中执行它们。

完成此任务后, 您还可以使用 Tensorflow 实现自己的深入学习模型。事实上, 使用我们的全新标志SIGNS数据集, 您将建立一个深的神经网络模型, 以识别数字从0到5的符号语言，并取得相当可观的准确性。

TensorFlow Tutorial

Welcome to this week's programming assignment. Until now, you've always used numpy to build neural networks. Now we will step you through a deep learning framework that will allow you to build neural networks more easily. Machine learning frameworks like TensorFlow, PaddlePaddle, Torch, Caffe, Keras, and many others can speed up your machine learning development significantly. All of these frameworks also have a lot of documentation, which you should feel free to read. In this assignment, you will learn to do the following in TensorFlow:

Initialize variables
Start your own session
Train algorithms
Implement a Neural Network

Programing frameworks can not only shorten your coding time, but sometimes also perform optimizations that speed up your code.

【中文】

TensorFlow 教程

欢迎来到本周的编程任务。直到现在, 你一直使用 numpy 建立神经网络。现在, 我们将通过一个深入的学习框架来帮助您更轻松地构建神经网络。机器学习框架, 如 TensorFlow, PaddlePaddle, Torch, Caffe, Keras, 和许多其他的，可以显著地加快你的机器学习过程。所有这些框架也有大量的文档, 您应该随时阅读。在本作业中, 您将学习在 TensorFlow 中执行以下操作:

　　初始化变量

　　启动您自己的进程

　　训练算法

　　实现神经网络

编程框架不仅可以缩短编码时间, 而且有时还会执行加速代码的优化。

1 - Exploring the Tensorflow Library

To start, you will import the library:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict

%matplotlib inline
np.random.seed(1)

Now that you have imported the library, we will walk you through its different applications. You will start with an example, where we compute for you the loss of one training example.

【中文翻译】

既然您已经导入了库, 我们将带您浏览它的不同应用程序。您将从一个示例开始, 在这里我们将为您计算一个训练样本的损失。

【code】

y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y')                    # Define y. Set to 39

loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss

init = tf.global_variables_initializer()         # When init is run later (session.run(init)),
                                                 # the loss variable will be initialized and ready to be computed
with tf.Session() as session:                    # Create a session and print the output
    session.run(init)                            # Initializes the variables
    print(session.run(loss))                     # Prints the loss

【result】

Writing and running programs in TensorFlow has the following steps:

Create Tensors (variables) that are not yet executed/evaluated.
Write operations between those Tensors.
Initialize your Tensors.
Create a Session.
Run the Session. This will run the operations you'd written above.

Therefore, when we created a variable for the loss, we simply defined the loss as a function of other quantities, but did not evaluate its value. To evaluate it, we had to run init=tf.global_variables_initializer(). That initialized the loss variable, and in the last line we were finally able to evaluate the value of loss and print its value.[

【中文翻译】

在 TensorFlow 中编写和运行程序有以下步骤:

　　1、创建尚未执行/评估的张量 (变量)。

　　2、在这些张量之间写入操作（运算）。

　　3、初始化您的张量。

　　4、创建一个线程。

　　5、运行线程。这将运行您在上面编写的操作（，进行运算）。

因此, 当我们为损失创建一个变量时, 我们简单地将损失定义为其他量的函数, 但没有评估它的值。要评估它, 我们必须运行 init = tf. global_variables_initializer ()。初始化损失变量, 并在最后一行, 我们终于能够评估损失的值和打印其值。

Now let us look at an easy example. Run the cell below:

【code】

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)

【result】

Tensor("Mul:0", shape=(), dtype=int32)

As expected, you will not see 20! You got a tensor saying that the result is a tensor that does not have the shape attribute, and is of type "int32". All you did was put in the 'computation graph', but you have not run this computation yet. In order to actually multiply the two numbers, you will have to create a session and run it.

【中文翻译】

正如所料, 你不会看到 20!你得到一个张量说, 结果是一个不具有形状属性的张量, 并且是 "int32" 类型。你所做的只是把 "计算图", 但你还没有运行这个计算。为了将两个数字相乘, 您必须创建一个线程并运行它。

【code】

sess = tf.Session()
print(sess.run(c))

【result】

Great! To summarize, remember to initialize your variables, create a session and run the operations inside the session.

Next, you'll also have to know about placeholders. A placeholder is an object whose value you can specify only later. To specify values for a placeholder, you can pass in values by using a "feed dictionary" (feed_dict variable). Below, we created a placeholder for x. This allows us to pass in a number later when we run the session.

【中文翻译】

好!总结一下 , 请记住初始化变量, 创建线程并在线程内运行操作(计算)。

接下来, 您还必须了解占位符。占位符是一个对象, 其值只能在以后指定。若要指定占位符的值, 可以使用 "feed dictionary" (feed_dict 变量) 传入值。下面, 我们为 x 创建了一个占位符。这允许我们稍后在运行进程时传入一个数字。

【code】

# Change the value of x in the feed_dict

x = tf.placeholder(tf.int64, name = 'x')
print(sess.run(2 * x, feed_dict = {x: 3}))
sess.close()

【result】

When you first defined x you did not have to specify a value for it. A placeholder is simply a variable that you will assign data to only later, when running the session. We say that you feed data to these placeholders when running the session.

Here's what's happening: When you specify the operations needed for a computation, you are telling TensorFlow how to construct a computation graph. The computation graph can have some placeholders whose values you will specify only later. Finally, when you run the session, you are telling TensorFlow to execute the computation graph.

【中文翻译】

第一次定义 x 时, 不必为其指定值。占位符只是一个变量, 您将在后面运行进程时将数据分配给x。我们说, 您在运行进程时将数据送到这些占位符。

下面是正在发生的情况: 当您指定计算所需的操作时, 您将告诉 TensorFlow 如何构造计算图。计算图可以有一些占位符, 其值将仅在以后指定。最后, 当您运行进程时, 您要告诉 TensorFlow 执行计算图。

1.1 - Linear function

Lets start this programming exercise by computing the following equation:

Exercise: Compute

X = tf.constant(np.random.randn(3,1), name = "X")

You might find the following functions helpful:

tf.matmul(..., ...) to do a matrix multiplication
tf.add(..., ...) to do an addition
np.random.randn(...) to initialize randomly

【code】

# GRADED FUNCTION: linear_function

def linear_function():
    """
    Implements a linear function: 
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns: 
    result -- runs the session for Y = WX + b 
    """
    
    np.random.seed(1)
    
    ### START CODE HERE ### (4 lines of code)
    X = tf.constant(np.random.randn(3,1), name = "X")
    W = tf.constant(np.random.randn(4,3), name = "W")
    b = tf.constant(np.random.randn(4,1), name = "b")
    Y = tf.constant(np.random.randn(4,1), name = "Y")
    ### END CODE HERE ### 
    
    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate
    
    ### START CODE HERE ###
    sess = tf.Session()
    result = sess.run(tf.add(tf.matmul(W,X),b)) 
    ### END CODE HERE ### 
    
    # close the session 
    sess.close()

    return result

print( "result = " + str(linear_function()))

【result】

result = [[-2.15657382]
 [ 2.95891446]
 [-1.08926781]
 [-0.84538042]]

Expected Output :

result

[[-2.15657382] [ 2.95891446] [-1.08926781] [-0.84538042]]

1.2 - Computing the sigmoid

Great! You just implemented a linear function. Tensorflow offers a variety of commonly used neural network functions like tf.sigmoid and tf.softmax. For this exercise lets compute the sigmoid function of an input.

You will do this exercise using a placeholder variable x. When running the session, you should use the feed dictionary to pass in the input z. In this exercise, you will have to

(i) create a placeholder x,

(ii) define the operations needed to compute the sigmoid using tf.sigmoid, and then

(iii) run the session.

Exercise : Implement the sigmoid function below. You should use the following:

tf.placeholder(tf.float32, name = "...") #如果有其他参数，例如shape,则 tf.placeholder(dtypr=tf.float32, shape=(n_x,n_y),name = "...")
tf.sigmoid(...)
sess.run(..., feed_dict = {x: z}) #如果有多个参数，则 sess.run(..., feed_dict = {x: z，y:w, ...})

Note that there are two typical ways to create and use sessions in tensorflow:

Method 1:

sess = tf.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session

Method 2:

with tf.Session() as sess: 
    # run the variables initialization (if needed), run the operations
    result = sess.run(..., feed_dict = {...})
    # This takes care of closing the session for you :)

【code】

# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Computes the sigmoid of z
    
    Arguments:
    z -- input value, scalar or vector
    
    Returns: 
    results -- the sigmoid of z
    """
    
    ### START CODE HERE ### ( approx. 4 lines of code)
    # Create a placeholder for x. Name it 'x'.
    x = tf.placeholder(tf.float32, name = "x")
    # compute sigmoid(x)
    sigmoid = tf.sigmoid(x)                 #   1/(1 + math.e**(- x))

    # Create a session, and run it. Please use the method 2 explained above. 
    # You should use a feed_dict to pass z's value to x. 
    with tf.Session() as sess: 
        # Run session and call the output "result"
        result =sess.run( sigmoid, feed_dict = {x:z})
    
    ### END CODE HERE ###
    
    return result

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(12) = " + str(sigmoid(12)))

【result】

sigmoid(0) = 0.5
sigmoid(12) = 0.999994

Expected Output :

sigmoid(0)	0.5
sigmoid(12)	0.999994

To summarize, you how know how to:

Create placeholders
Specify the computation graph corresponding to operations you want to compute
Create the session
Run the session, using a feed dictionary if necessary to specify placeholder variables' values.

【中文翻译】

总结一下, 您如何知道如何:

　　创建占位符

　　指定与要计算的运算对应的计算图

　　创建进程

　　运行进程, 如果需要, 请使用feed dictionary来指定占位符变量的值。

1.3 - Computing the Cost

You can also use a built-in function to compute the cost of your neural network. So instead of needing to write code to compute this as a function of

you can do it in one line of code in tensorflow!

Exercise: Implement the cross entropy loss. The function you will use is:

tf.nn.sigmoid_cross_entropy_with_logits(logits = ..., labels = ...)

Your code should input z, compute the sigmoid (to get a) and then compute the cross entropy cost

【code】

# GRADED FUNCTION: cost

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy
    
    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0) 
    
    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels" 
    in the TensorFlow documentation. So logits will feed into z, and labels into y. 
    
    Returns:
    cost -- runs the session of the cost (formula (2))
    """
    
    ### START CODE HERE ### 
    
    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z =  tf.placeholder(tf.float32, name = "z")
    y =  tf.placeholder(tf.float32, name = "y")
    
    # Use the loss function (approx. 1 line)
    cost =tf.nn.sigmoid_cross_entropy_with_logits(logits = z,  labels = y)
    
    # Create a session (approx. 1 line). See method 1 above.
    sess = tf.Session()
    
    # Run the session (approx. 1 line).
    cost = sess.run(cost, feed_dict={z:logits,y:labels})
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    
    return cost

【code】

logits = sigmoid(np.array([0.2,0.4,0.7,0.9]))
cost = cost(logits, np.array([0,0,1,1]))
print ("cost = " + str(cost))

【result】

cost = [ 1.00538719  1.03664088  0.41385433  0.39956614]

Expected Output :

cost

[ 1.00538719 1.03664088 0.41385433 0.39956614]

1.4 - Using One Hot encodings

Many times in deep learning you will have a y vector with numbers ranging from 0 to C-1, where C is the number of classes. If C is for example 4, then you might have the following y vector which you will need to convert as follows:

This is called a "one hot" encoding, because in the converted representation exactly one element of each column is "hot" (meaning set to 1). To do this conversion in numpy, you might have to write a few lines of code. In tensorflow, you can use one line of code:

tf.one_hot(labels, depth, axis)

【中文翻译】

这称 "1热" 编码, 因为在被转换的元素中，表示法确切将地每列的一个元素设置成为 "热的" (意思是设置到 1)。要在 numpy 中进行此转换, 您可能需要编写几行代码。在 tensorflow 中, 您可以使用一行代码。

Exercise: Implement the function below to take one vector of labels and the total number of classes

【code】

# GRADED FUNCTION: one_hot_matrix

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j had a label i. Then entry (i,j) 
                     will be 1.  【创建一个矩阵, 其中第i行对应于第 i 类,和 j 列
                     对应于 j 训练样本。所以, 如果 j 样本有一个标签 i，那么坐标(i, j)对应的值将是1】
                     
    Arguments:
    labels -- vector containing the labels 
    C -- number of classes, the depth of the one hot dimension
    
    Returns: 
    one_hot -- one hot matrix
    """
    
    ### START CODE HERE ###
    
    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name="C")
    
    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(labels, C, axis=0)
    
    # Create the session (approx. 1 line)
    sess = tf.Session()
    
    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    
    return one_hot

【code】

labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C = 4)
print ("one_hot = " + str(one_hot))

【result】

one_hot = [[ 0.  0.  0.  1.  0.  0.]
 [ 1.  0.  0.  0.  0.  1.]
 [ 0.  1.  0.  0.  1.  0.]
 [ 0.  0.  1.  0.  0.  0.]]

Expected Output:

one_hot

[[ 0. 0. 0. 1. 0. 0.] [ 1. 0. 0. 0. 0. 1.] [ 0. 1. 0. 0. 1. 0.] [ 0. 0. 1. 0. 0. 0.]]

1.5 - Initialize with zeros and ones

Now you will learn how to initialize a vector of zeros and ones. The function you will be calling is tf.ones(). To initialize with zeros you could use tf.zeros() instead. These functions take in a shape and return an array of dimension shape full of zeros and ones respectively.

Exercise: Implement the function below to take in a shape and to return an array (of the shape's dimension of ones).

tf.ones(shape)

【code】

# GRADED FUNCTION: ones

def ones(shape):
    """
    Creates an array of ones of dimension shape
    
    Arguments:
    shape -- shape of the array you want to create
        
    Returns: 
    ones -- array containing only ones
    """
    
    ### START CODE HERE ###
    
    # Create "ones" tensor using tf.ones(...). (approx. 1 line)
    ones = tf.ones(shape)
    
    # Create the session (approx. 1 line)
    sess = tf.Session()
    
    # Run the session to compute 'ones' (approx. 1 line)
    ones = sess.run(ones)
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    return ones

print ("ones = " + str(ones([3])))

【result】

ones = [ 1.  1.  1.]

Expected Output:

ones

[ 1. 1. 1.]

2 - Building your first neural network in tensorflow

In this part of the assignment you will build a neural network using tensorflow. Remember that there are two parts to implement a tensorflow model:

Create the computation graph
Run the graph

Let's delve into the problem you'd like to solve!

One afternoon, with some friends we decided to teach our computers to decipher sign language(破译手语). We spent a few hours taking pictures in front of a white wall and came up with the following dataset. It's now your job to build an algorithm that would facilitate communications from a speech-impaired person to someone who doesn't understand sign language(现在你的工作是建立一个算法, 这将有助于一个语音受损的人与不懂手语的人之间的沟通。).

Training set: 1080 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (180 pictures per number).
Test set: 120 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (20 pictures per number).

Note that this is a subset of the SIGNS dataset. The complete dataset contains many more signs.

Here are examples for each number, and how an explanation of how we represent the labels. These are the original pictures, before we lowered the image resolutoion to 64 by 64 pixels.

【中文翻译】

一天下午, 和一些朋友一起, 我们决定教我们的计算机破译手语。我们花了几个小时在白色的墙前拍照, 并得到了下面的数据集。现在你的工作是建立一个算法, 这将有助于一个语音受损的人与不懂手语的人之间的沟通。

　　训练集: 1080 张图片 (64 乘64像素) ，标志代表数字从0到 5 (180 张图片每个数字)。

　　测试集: 120 张图片 (64 乘64像素) ，标志代表数字从0到 5 (20 张图片每个数字)。

请注意, 这是符号数据集的子集。完整的数据集包含许多更多的符号。

这里是每个数字的例子, 以及如何解释我们如何代表标签。这些是原始图片。后来，我们降低图像到 64 *64 像素。

Run the following code to load the dataset.

【code】

# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

Change the index below and run the cell to visualize some examples in the dataset.

【code】

# Example of a picture
index = 0
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

【result】

y = 5

As usual you flatten the image dataset, then normalize it by dividing by 255. On top of that, you will convert each label to a one-hot vector as shown in Figure 1. Run the cell below to do so.

【code】

# Flatten the training and test images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T     # X_train_orig.shape = (1080, 64, 64, 3)
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten/255.
X_test = X_test_flatten/255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print ("number of training examples = " + str(X_train.shape[1]))
print ("number of test examples = " + str(X_test.shape[1]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

【result】

number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)

Note that 12288 comes from

Your goal is to build an algorithm capable of recognizing a sign with high accuracy. To do so, you are going to build a tensorflow model that is almost the same as one you have previously built in numpy for cat recognition (but now using a softmax output). It is a great occasion to compare your numpy implementation to the tensorflow one.

The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX. The SIGMOID output layer has been converted to a SOFTMAX. A SOFTMAX layer generalizes SIGMOID to when there are more than two classes.

【中文翻译】

你的目标是建立一个算法, 能够高精度地识别手语。要这样做, 您将构建一个 tensorflow 模型, 它与以前在 numpy 中构建的 cat 识别 (但现在使用 softmax 输出) 几乎相同。这是一个很好的机会来比较你的 numpy 实现与tensorflow 实现。

模型是 LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX。SIGMOID 输出层已转换为 SOFTMAX。当有两个以上的类时, SOFTMAX 层将SIGMOID 推广。

2.1 - Create placeholders

Your first task is to create placeholders for X and Y. This will allow you to later pass your training data in when you run your session.

Exercise: Implement the function below to create the placeholders in tensorflow.

【code】

# GRADED FUNCTION: create_placeholders

def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.
    
    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)
    
    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"
    
    Tips:
    - You will use None because it let's us be flexible on the number of examples you will for the placeholders.
      In fact, the number of examples during test/train is different.
    """

    ### START CODE HERE ### (approx. 2 lines)
    X = tf.placeholder(dtype=tf.float32,shape=(n_x, None), name = "Placeholder_1")
    Y = tf.placeholder(dtype=tf.float32,shape=(n_y, None), name = "Placeholder_2")
    ### END CODE HERE ###
    
    return X, Y

X, Y = create_placeholders(12288, 6)
print ("X = " + str(X))
print ("Y = " + str(Y))

【result】

X = Tensor("Placeholder_1_1:0", shape=(12288, ?), dtype=float32)
Y = Tensor("Placeholder_2_1:0", shape=(6, ?), dtype=float32)

Expected Output:

X	Tensor("Placeholder_1:0", shape=(12288, ?), dtype=float32) (not necessarily Placeholder_1)
Y	Tensor("Placeholder_2:0", shape=(10, ?), dtype=float32) (not necessarily Placeholder_2)

2.2 - Initializing the parameters

Your second task is to initialize the parameters in tensorflow.

Exercise: Implement the function below to initialize the parameters in tensorflow. You are going use Xavier Initialization for weights and Zero Initialization for biases. The shapes are given below. As an example, to help you, for W1 and b1 you could use:

W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())

Please use seed = 1 to make sure your results match ours.

【code】

# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]
    
    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """
    
    tf.set_random_seed(1)                   # so that your "random" numbers match ours
        
    ### START CODE HERE ### (approx. 6 lines of code)
    W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12,25], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b2 = tf.get_variable("b2", [12,1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6,12], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b3 = tf.get_variable("b3", [6,1], initializer = tf.zeros_initializer())
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    
    return parameters

tf.reset_default_graph()
with tf.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

【result】

W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>

Expected Output:

W1	< tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref >
b1	< tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref >
W2	< tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref >
b2	< tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref >

As expected, the parameters haven't been evaluated yet.

2.3 - Forward propagation in tensorflow

You will now implement the forward propagation module in tensorflow. The function will take in a dictionary of parameters and it will complete the forward pass. The functions you will be using are:

tf.add(...,...) to do an addition
tf.matmul(...,...) to do a matrix multiplication
tf.nn.relu(...) to apply the ReLU activation

Question: Implement the forward pass of the neural network. We commented for you the numpy equivalents so that you can compare the tensorflow implementation to numpy. It is important to note that the forward propagation stops at z3. The reason is that in tensorflow the last linear layer output is given as input to the function computing the loss. Therefore, you don't need a3!

【中文翻译】

2.3-在 tensorflow中的正向传播

现在, 您将在 tensorflow 中实现正向传播模块。该函数将接受一个参数字典, 它将完成前向传递。您将使用的功能有:

tf.add(...,...) to 做加法
tf.matmul(...,...) to 做矩阵乘法
tf.nn.relu(...) to 应用 relu 激活函数

问题: 实现神经网络的前向传播。我们为您评论了 numpy 等效项, 以便您可以将 tensorflow 实现与 numpy 进行比较。重要的是要注意, 向前传播停止在 z3。原因在于, 在 tensorflow 中, 最后的线性层输出作为损失函数的输入。因此, 你不需要 a3!

【code】

# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
    
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    
    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
    
    ### START CODE HERE ### (approx. 5 lines)               # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1,X) ,b1)                        # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                                     # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2,A1) ,b2)                       # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                                     # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3,A2) ,b3)                        # Z3 = np.dot(W3,A2) + b3
    ### END CODE HERE ###
    
    return Z3

tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))

【result】

Z3 = Tensor("Add_2:0", shape=(6, ?), dtype=float32)      #  "Add_2:0"  ???

Expected Output:

Z3	Tensor("Add_2:0", shape=(6, ?), dtype=float32)

You may have noticed that the forward propagation doesn't output any cache. You will understand why below, when we get to brackpropagation.

2.4 Compute cost

As seen before, it is very easy to compute the cost using:

tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))

Question: Implement the cost function below.

It is important to know that the "logits" and "labels" inputs of tf.nn.softmax_cross_entropy_with_logits are expected to be of shape (number of examples, num_classes). We have thus transposed Z3 and Y for you.
Besides, tf.reduce_mean basically does the summation over the examples.

【code】

# GRADED FUNCTION: compute_cost 

def compute_cost(Z3, Y):
    """
    Computes the cost
    
    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3
    
    Returns:
    cost - Tensor of the cost function
    """
    
    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    
    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits =  logits, labels = labels))
    ### END CODE HERE ###
    
    return cost

tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))

【result】

cost = Tensor("Mean:0", shape=(), dtype=float32)  #"Mean:0" ???

Expected Output:

cost

Tensor("Mean:0", shape=(), dtype=float32)

2.5 - Backward propagation & parameter updates

This is where you become grateful to programming frameworks. All the backpropagation and the parameters update is taken care of in 1 line of code. It is very easy to incorporate this line in the model.

After you compute the cost function. You will create an "optimizer" object. You have to call this object along with the cost when running the tf.session. When called, it will perform an optimization on the given cost with the chosen method and learning rate.

For instance, for gradient descent the optimizer would be:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

To make the optimization you would do:

_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

This computes the backpropagation by passing through the tensorflow graph in the reverse order. From cost to inputs.

Note When coding, we often use _ as a "throwaway" variable to store values that we won't need to use later. Here, _ takes on the evaluated value of optimizer, which we don't need (and c takes the value of the cost variable).

【中文翻译】

2.5-向后传播和参数更新

这就是你对编程框架的感激之处。所有的反向和参数更新在1行代码中得到了实现。在模型中加入这行代码很容易。

在计算成本函数后。您将创建一个 "优化器" 对象。在运行 tf. sessiom时, 您必须调用此对象以及cost。当被调用时, 它将用选择的方法和学习速率对给定的cost进行优化。

例如, 对于梯度下降, 优化器将是:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

要进行优化, 请执行以下操作:

_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

这将通过 tensorflow 计算图以相反的顺序来计算反向传播。从cost到输入。

当编码时, 我们通常使用 _ 作为 "一次性" 变量来存储我们以后不需要使用的值。这里, _ 接受优化器的评估值, 我们不需要 (c 采用cost变量的值)。

2.6 - Building the model

Now, you will bring it all together!

Exercise: Implement the model. You will be calling the functions you had previously implemented.

【code】

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
    
    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- test set, of shape (output size = 6, number of training examples = 1080)
    X_test -- training set, of shape (input size = 12288, number of training examples = 120)
    Y_test -- test set, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables[能够在不覆盖 tf 变量的情况下重新运行模型]
    tf.set_random_seed(1)                             # to keep consistent results
    seed = 3                                          # to keep consistent results
    (n_x, m) = X_train.shape                          # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]                            # n_y : output size
    costs = []                                        # To keep track of the cost
    
    # Create Placeholders of shape (n_x, n_y)
    ### START CODE HERE ### (1 line)
    X, Y =  create_placeholders(n_x, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###
    
    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###
    
    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###
    
    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    ### END CODE HERE ###
    
    # Initialize all the variables
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        
        # Run the initialization
        sess.run(init)
        
        # Do the training loop
        for epoch in range(num_epochs):

            epoch_cost = 0.                       # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost", the feedict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###
                
                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every epoch
            if print_cost == True and epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)
                
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # lets save the parameters in a variable
        parameters = sess.run(parameters)
        print ("Parameters have been trained!")

        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
        
        return parameters

Run the following cell to train your model! On our machine it takes about 5 minutes. Your "Cost after epoch 100" should be 1.016458. If it's not, don't waste time; interrupt the training by clicking on the square (⬛) in the upper bar of the notebook, and try to correct your code. If it is the correct cost, take a break and come back in 5 minutes!

parameters = model(X_train, Y_train, X_test, Y_test)

【result】

Cost after epoch 0: 1.855702
Cost after epoch 100: 1.016458
Cost after epoch 200: 0.733102
Cost after epoch 300: 0.572940
Cost after epoch 400: 0.468774
Cost after epoch 500: 0.381021
Cost after epoch 600: 0.313822
Cost after epoch 700: 0.254158
Cost after epoch 800: 0.203829
Cost after epoch 900: 0.166421
Cost after epoch 1000: 0.141486
Cost after epoch 1100: 0.107580
Cost after epoch 1200: 0.086270
Cost after epoch 1300: 0.059371
Cost after epoch 1400: 0.052228

Parameters have been trained!
Train Accuracy: 0.999074
Test Accuracy: 0.716667

Expected Output:

Train Accuracy	0.999074
Test Accuracy	0.716667

Amazing, your algorithm can recognize a sign representing a figure between 0 and 5 with 71.7% accuracy.

Insights:

Your model seems big enough to fit the training set well. However, given the difference between train and test accuracy, you could try to add L2 or dropout regularization to reduce overfitting.
Think about the session as a block of code to train the model. Each time you run the session on a minibatch, it trains the parameters. In total you have run the session a large number of times (1500 epochs) until you obtained well trained parameters.

【中文翻译】

令人惊异的是, 您的算法可以识别一个符号, 表示一个介于0和5之间的数字, 具有71.7% 的精确度。

见解:

　　你的模型看起来很大, 足以适应训练组。然而, 鉴于训练和测试的准确性之间的差异, 你可以尝试添加 L2 或dropout正则化, 以减少过拟合。

　　将session看作是一个用于训练模型的代码块。每次在 minibatch 上运行session时, 它都会训练参数。在总体上, 您已经运行了大量的时间 (1500 个世纪), 直到您获得良好的训练参数。

2.7 - Test with your own image (optional / ungraded exercise)

Congratulations on finishing this assignment. You can now take a picture of your hand and see the output of your model. To do that:

1. Click on "File" in the upper bar of this notebook, then click "Open" to go on your Coursera Hub.
2. Add your image to this Jupyter Notebook's directory, in the "images" folder
3. Write your image's name in the following code
4. Run the code and check if the algorithm is right!

【code】

import scipy
from PIL import Image
from scipy import ndimage

## START CODE HERE ## (PUT YOUR IMAGE NAME) 
my_image = "thumbs_up.jpg"
## END CODE HERE ##

# We preprocess your image to fit your algorithm.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(64,64)).reshape((1, 64*64*3)).T
my_image_prediction = predict(my_image, parameters)

plt.imshow(image)
print("Your algorithm predicts: y = " + str(np.squeeze(my_image_prediction)))

【result】

Your algorithm predicts: y = 3

You indeed deserved a "thumbs-up" although as you can see the algorithm seems to classify it incorrectly. The reason is that the training set doesn't contain any "thumbs-up", so the model doesn't know how to deal with it! We call that a "mismatched data distribution" and it is one of the various of the next course on "Structuring Machine Learning Projects".

What you should remember:

Tensorflow is a programming framework used in deep learning
The two main object classes in tensorflow are Tensors and Operators.
When you code in tensorflow you have to take the following steps:
- Create a graph containing Tensors (Variables, Placeholders ...) and Operations (tf.matmul, tf.add, ...)
- Create a session
- Initialize the session
- Run the session to execute the graph
You can execute the graph multiple times as you've seen in model()
The backpropagation and optimization is automatically done when running the session on the "optimizer" object.

相关阅读:
SPRING IN ACTION 第4版笔记-第九章Securing web applications-008-使用非关系型数据库时如何验证用户(自定义UserService)
[转]苹果商店审核规则，你触犯了哪一条？
Xcode itunes完美打包api方法
 Swift中构造器的继承和重写
 Swift中类的两段式构造(类的构造过程)
Swift中的便利构造器和构造器链
 iOS在UITableViewController里使用UISearchDisplayController报错"[UISearchResultsTableView dequeueReusableCellWithIdentifier:forIndexPath:]"
Swift缩水版MJExtension
构造器相关
 可选链
原文地址：https://www.cnblogs.com/hezhiyao/p/8082980.html