• deep learning and neural networks -- handwriting recognition


    use an algorithm to separate the individual letters/digits out of a sentence image (a kind of classification problem), then recognize each character separately.

    the architecture of neural networks:

      input layer; hidden layer; output layer

    different kinds of neural networks:

      feedforward neural networks; recurrent neural networks; and so on

      Usually, we use an optimizer (e.g. SGD, stochastic gradient descent, in this chapter) to find a good solution. The network here is trained by supervised learning: we first learn from the training dataset to obtain the model, and then evaluate the result on the test dataset; a minimal sketch of this workflow follows.
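
      As a rough sketch of that train-then-test workflow (using the Network class from the network.py listing later in these notes; the layer sizes and hyperparameters -- 30 hidden neurons, 30 epochs, mini-batch size 10, learning rate 3.0 -- are the book's example values, and training_data / test_data are assumed to be already loaded as described below):

        # Sketch only: assumes training_data is a list of (x, y) tuples and
        # test_data a list of (x, label) tuples, loaded as described below.
        net = Network([784, 30, 10])   # 784 input, 30 hidden, 10 output neurons
        net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
        # epochs=30, mini_batch_size=10, eta=3.0; accuracy is printed each epoch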

    training dataset: the MNIST data (Modified NIST) comes in two parts: the first part contains 60,000 images to be used as training data; the second part contains 10,000 images to be used as test data.
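
    In the book's accompanying code repository the data is usually loaded with a small helper module; a sketch of that step (mnist_loader and load_data_wrapper come from that repository, not from these notes, and it further splits the 60,000 training images into 50,000 for training and 10,000 for validation):

      # Assumes the mnist_loader module from the book's code repository.
      import mnist_loader
      training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
      # training_data: (x, y) pairs with x a 784x1 vector and y a 10x1 one-hot vector
      # test_data: (x, label) pairs where label is the digit itself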

    We use the notation x to denote a training input; each input is regarded as a 28*28 = 784-dimensional vector (the flattened image). The corresponding desired output y = y(x) is a 10-dimensional vector whose elements are either 0 or 1, with exactly one 1 in the vector. (E.g. if the input x is an image of the digit 6 and we recognize it correctly, the output y will be (0,0,0,0,0,0,1,0,0,0)T.)
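
    In NumPy terms the encoding might look like this (a small illustrative sketch; the variable names are only for illustration):

      import numpy as np

      image = np.random.rand(28, 28)   # stand-in for one 28x28 MNIST image
      x = image.reshape(784, 1)        # flattened 784-dimensional column vector

      y = np.zeros((10, 1))            # desired output for the digit 6
      y[6] = 1.0                       # gives (0,0,0,0,0,0,1,0,0,0)^T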

    For each neuron there are two kinds of parameters: weights and biases. w^l_jk denotes the weight of the connection from the kth neuron in the (l−1)th layer to the jth neuron in the lth layer. For a given input, the neuron first forms the weighted input z = wx + b, and then the activation function (the sigmoid function in this chapter) is applied to compute the output σ(z), which determines how strongly the neuron fires.
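
    A single layer's forward step then looks like the following sketch, which mirrors the feedforward method in the network.py listing below (the layer sizes here are an arbitrary illustration):

      import numpy as np

      def sigmoid(z):
          # same definition as in network.py below
          return 1.0 / (1.0 + np.exp(-z))

      w = np.random.randn(3, 784)   # weights of a layer with 3 neurons and 784 inputs
      b = np.random.randn(3, 1)     # one bias per neuron
      x = np.random.randn(784, 1)   # input column vector

      z = np.dot(w, x) + b          # weighted input z = wx + b
      a = sigmoid(z)                # the layer's output (activations)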

    In the cost function below, w denotes the collection of all weights in the network, b all the biases, n the total number of training inputs, a the vector of outputs from the network when x is input, and the sum is over all training inputs x.
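
    Written out with these symbols, this is the quadratic cost (mean squared error) that the chapter minimizes:

      C(w, b) = (1 / 2n) Σ_x ‖ y(x) − a ‖²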

    Why not try to maximize that number directly, rather than minimizing a proxy measure like the quadratic cost? The problem with that is that the number of images correctly classified is not a smooth function of the weights and biases in the network. For the most part, making small changes to the weights and biases won't cause any change at all in the number of training images classified correctly. That makes it difficult to figure out how to change the weights and biases to get improved performance. If we instead use a smooth cost function like the quadratic cost it turns out to be easy to figure out how to make small changes in the weights and biases so as to get an improvement in the cost.

    When we change the position v by small amounts Δv1 and Δv2, the cost function C changes approximately as ΔC ≈ (∂C/∂v1) Δv1 + (∂C/∂v2) Δv2 (equation (7) in the book); we then need to choose Δv1 and Δv2 so as to make this change negative.
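
    Gradient descent makes that choice by moving against the gradient: with ∇C = (∂C/∂v1, ∂C/∂v2)T and a small learning rate η > 0, choosing

      Δv = −η ∇C    gives    ΔC ≈ ∇C · Δv = −η ‖∇C‖² ≤ 0,

    so every such step decreases (or at worst leaves unchanged) the cost.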

    An idea called stochastic gradient descent can be used to speed up learning. The idea is to estimate the gradient ∇C by computing ∇C_x for a small sample of randomly chosen training inputs. By averaging over this small sample it turns out that we can quickly get a good estimate of the true gradient ∇C, and this helps speed up gradient descent, and thus learning.
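
    Concretely, for a mini-batch of m randomly chosen training inputs X1, …, Xm, the gradient estimate and the resulting update rules are (these are what update_mini_batch implements below, with η/m appearing as eta/len(mini_batch)):

      ∇C ≈ (1/m) Σ_j ∇C_Xj

      w_k → w_k − (η/m) Σ_j ∂C_Xj/∂w_k
      b_l → b_l − (η/m) Σ_j ∂C_Xj/∂b_l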

    Then stochastic gradient descent works by repeatedly picking out a randomly chosen mini-batch of training inputs and training with those (in other words, using backpropagation to compute the gradients and update w and b for each mini-batch). After all mini-batches and epochs have been processed, we take the final set of weights and biases of the neural network as the result.

      # %load network.py

      """
      network.py
      ~~~~~~~~~~
      IT WORKS

      A module to implement the stochastic gradient descent learning
      algorithm for a feedforward neural network.  Gradients are calculated
      using backpropagation.  Note that I have focused on making the code
      simple, easily readable, and easily modifiable.  It is not optimized,
      and omits many desirable features.
      """

      #### Libraries
      # Standard library
      import random

      # Third-party libraries
      import numpy as np

      class Network(object):

          def __init__(self, sizes):
              """The list ``sizes`` contains the number of neurons in the
              respective layers of the network.  For example, if the list
              was [2, 3, 1] then it would be a three-layer network, with the
              first layer containing 2 neurons, the second layer 3 neurons,
              and the third layer 1 neuron.  The biases and weights for the
              network are initialized randomly, using a Gaussian
              distribution with mean 0, and variance 1.  Note that the first
              layer is assumed to be an input layer, and by convention we
              won't set any biases for those neurons, since biases are only
              ever used in computing the outputs from later layers."""
              self.num_layers = len(sizes)
              self.sizes = sizes
              self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
              self.weights = [np.random.randn(y, x)
                              for x, y in zip(sizes[:-1], sizes[1:])]

          def feedforward(self, a):
              """Return the output of the network if ``a`` is input."""
              for b, w in zip(self.biases, self.weights):
                  a = sigmoid(np.dot(w, a)+b)
              return a

          def SGD(self, training_data, epochs, mini_batch_size, eta,
                  test_data=None):
              """Train the neural network using mini-batch stochastic
              gradient descent.  The ``training_data`` is a list of tuples
              ``(x, y)`` representing the training inputs and the desired
              outputs.  The other non-optional parameters are
              self-explanatory.  If ``test_data`` is provided then the
              network will be evaluated against the test data after each
              epoch, and partial progress printed out.  This is useful for
              tracking progress, but slows things down substantially."""

              training_data = list(training_data)
              n = len(training_data)

              if test_data:
                  test_data = list(test_data)
                  n_test = len(test_data)

              for j in range(epochs):
                  random.shuffle(training_data)
                  mini_batches = [
                      training_data[k:k+mini_batch_size]
                      for k in range(0, n, mini_batch_size)]
                  for mini_batch in mini_batches:
                      self.update_mini_batch(mini_batch, eta)
                  if test_data:
                      print("Epoch {} : {} / {}".format(j, self.evaluate(test_data), n_test))
                  else:
                      print("Epoch {} complete".format(j))

          def update_mini_batch(self, mini_batch, eta):
              """Update the network's weights and biases by applying
              gradient descent using backpropagation to a single mini batch.
              The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
              is the learning rate."""
              nabla_b = [np.zeros(b.shape) for b in self.biases]
              nabla_w = [np.zeros(w.shape) for w in self.weights]
              for x, y in mini_batch:
                  delta_nabla_b, delta_nabla_w = self.backprop(x, y)
                  nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
                  nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
              self.weights = [w-(eta/len(mini_batch))*nw
                              for w, nw in zip(self.weights, nabla_w)]
              self.biases = [b-(eta/len(mini_batch))*nb
                             for b, nb in zip(self.biases, nabla_b)]

          def backprop(self, x, y):
              """Return a tuple ``(nabla_b, nabla_w)`` representing the
              gradient for the cost function C_x.  ``nabla_b`` and
              ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
              to ``self.biases`` and ``self.weights``."""
              nabla_b = [np.zeros(b.shape) for b in self.biases]
              nabla_w = [np.zeros(w.shape) for w in self.weights]
              # feedforward
              activation = x
              activations = [x] # list to store all the activations, layer by layer
              zs = [] # list to store all the z vectors, layer by layer
              for b, w in zip(self.biases, self.weights):
                  z = np.dot(w, activation)+b
                  zs.append(z)
                  activation = sigmoid(z)
                  activations.append(activation)
              # backward pass
              delta = self.cost_derivative(activations[-1], y) * \
                  sigmoid_prime(zs[-1])
              nabla_b[-1] = delta
              nabla_w[-1] = np.dot(delta, activations[-2].transpose())
              # Note that the variable l in the loop below is used a little
              # differently to the notation in Chapter 2 of the book.  Here,
              # l = 1 means the last layer of neurons, l = 2 is the
              # second-last layer, and so on.  It's a renumbering of the
              # scheme in the book, used here to take advantage of the fact
              # that Python can use negative indices in lists.
              for l in range(2, self.num_layers):
                  z = zs[-l]
                  sp = sigmoid_prime(z)
                  delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
                  nabla_b[-l] = delta
                  nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
              return (nabla_b, nabla_w)

          def evaluate(self, test_data):
              """Return the number of test inputs for which the neural
              network outputs the correct result. Note that the neural
              network's output is assumed to be the index of whichever
              neuron in the final layer has the highest activation."""
              test_results = [(np.argmax(self.feedforward(x)), y)
                              for (x, y) in test_data]
              return sum(int(x == y) for (x, y) in test_results)

          def cost_derivative(self, output_activations, y):
              """Return the vector of partial derivatives partial C_x /
              partial a for the output activations."""
              return (output_activations-y)

      #### Miscellaneous functions
      def sigmoid(z):
          """The sigmoid function."""
          return 1.0/(1.0+np.exp(-z))

      def sigmoid_prime(z):
          """Derivative of the sigmoid function."""
          return sigmoid(z)*(1-sigmoid(z))