• BP neural network: cross-entropy as the cost function


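    The sigmoid function

    When a neuron's output is close to 1, the sigmoid curve becomes very flat, so σ′(z) is very small, which in turn makes ∂C/∂w and ∂C/∂b very small. Learning therefore slows down: in the plot of the quadratic cost below, the cost changes very little between epoch 15 and epoch 50.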
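    The saturation described above can be checked numerically. The following is a minimal sketch, not part of the original post: the helper names `sigmoid`, `sigmoid_prime` and `quadratic_output_delta` are chosen here for illustration, and the target y = 0 is an assumed worst case in which the neuron is confidently wrong.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    # Derivative of the sigmoid with respect to the weighted input z.
    s = sigmoid(z)
    return s * (1.0 - s)


def quadratic_output_delta(z, y):
    # Output-layer error for the quadratic cost C = 0.5 * (a - y)^2.
    return (sigmoid(z) - y) * sigmoid_prime(z)


# As z grows, the output a approaches 1 and sigma'(z) collapses toward 0,
# so the delta stays tiny even though the output is wrong for y = 0.
for z in [0.0, 2.0, 5.0, 10.0]:
    print("z = %5.1f  a = %.6f  sigma'(z) = %.6f  delta = %.6f"
          % (z, sigmoid(z), sigmoid_prime(z), quadratic_output_delta(z, 0.0)))
```

    The error a − y is close to 1 at z = 10, yet the delta that drives the weight update is several orders of magnitude smaller, which is the slowdown seen in the cost plot.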

    Introducing the cross-entropy cost function

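    To address this problem, we want a weight update for the output layer that does not contain the sigmoid derivative σ′(z), so that

        ∂C/∂w = x(a − y),  ∂C/∂b = a − y

    By the chain rule,

        ∂C/∂b = ∂C/∂a · σ′(z)

    Using σ′(z) = σ(z)(1 − σ(z)) and σ(z) = a, this becomes

        ∂C/∂b = ∂C/∂a · a(1 − a)

    and comparing with the target ∂C/∂b = a − y gives ∂C/∂a = (a − y) / (a(1 − a)).

    Integrating this with respect to a gives

        C = −[y ln a + (1 − y) ln(1 − a)] + constant

    Averaging over the n training samples gives the cross-entropy cost function

        C = −(1/n) Σ_x [y ln a + (1 − y) ln(1 − a)]

    Compared with the earlier output-layer delta, δ = (a − y)σ′(z), this amounts to dropping the leading factor σ′(z), so that δ = a − y.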
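    The result of the derivation can be sanity-checked with a finite-difference gradient. This is a sketch for a single sigmoid neuron, not code from the original post; the function names and the values z = 4, y = 0 are assumptions chosen to put the neuron in the saturated, wrong-answer regime.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(z, y):
    # Per-sample cross-entropy as a function of the weighted input z.
    a = sigmoid(z)
    return -(y * np.log(a) + (1.0 - y) * np.log(1.0 - a))


def quadratic(z, y):
    # Per-sample quadratic cost as a function of the weighted input z.
    return 0.5 * (sigmoid(z) - y) ** 2


def numeric_grad(cost, z, y, eps=1e-6):
    # Central-difference approximation of dC/dz.
    return (cost(z + eps, y) - cost(z - eps, y)) / (2.0 * eps)


z, y = 4.0, 0.0
a = sigmoid(z)
ce_grad = numeric_grad(cross_entropy, z, y)
q_grad = numeric_grad(quadratic, z, y)

print("a - y                = %.6f" % (a - y))
print("dC/dz, cross-entropy = %.6f" % ce_grad)
print("dC/dz, quadratic     = %.6f" % q_grad)
```

    The cross-entropy gradient matches a − y, while the quadratic gradient is the same quantity multiplied by a(1 − a) and is therefore much smaller when the neuron saturates.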

    Only one line of the code changes (line 57 of the listing below is replaced by line 58). The new cost plot is shown below.

    With 5000 training samples and 5000 test samples, the number of correctly recognized digits rose slightly, from 4347 to 4476.

     

     1 # coding:utf8
     2 import pickle  # Python 3 replacement for cPickle
     3 import numpy as np
     4 import matplotlib.pyplot as plt
     5 
     6 
     7 class Network(object):
     8     def __init__(self, sizes):
     9         self.num_layers = len(sizes)
    10         self.sizes = sizes
    11         self.biases = [np.random.randn(y, 1) for y in sizes[1:]]  # L(n-1)->L(n)
    12         self.weights = [np.random.randn(y, x)
    13                         for x, y in zip(sizes[:-1], sizes[1:])]
    14 
    15     def feedforward(self, a):
    16         for b_, w_ in zip(self.biases, self.weights):
    17             a = self.sigmoid(np.dot(w_, a) + b_)
    18         return a
    19 
    20     def SGD(self, training_data, test_data, epochs, mini_batch_size, eta):
    21         n_test = len(test_data)
    22         n = len(training_data)
    23         plt.xlabel('epoch')
    24         plt.title('cost')
    25         cy = []  # mean cost per epoch
    26         cx = list(range(epochs))
    27         for j in cx:
    28             self.cost = 0.0
    29             np.random.shuffle(training_data)  # shuffle in place
    30             for k in range(0, n, mini_batch_size):
    31                 mini_batch = training_data[k:k+mini_batch_size]
    32                 self.update_mini_batch(mini_batch, eta)
    33             cy.append(self.cost / n)
    34             print("Epoch {0}: {1} / {2}".format(
    35                 j, self.evaluate(test_data), n_test))
    36         plt.plot(cx, cy)
    37         plt.scatter(cx, cy)
    38         plt.show()
    39 
    40     def update_mini_batch(self, mini_batch, eta):
    41         for x, y in mini_batch:
    42             delta_b, delta_w, cost = self.backprop(x, y)
    43             self.weights = [w_ - eta/len(mini_batch)*dw for w_, dw in zip(self.weights, delta_w)]
    44             self.biases = [b_ - eta/len(mini_batch)*db for b_, db in zip(self.biases, delta_b)]
    45             self.cost += cost
    46 
    47     def backprop(self, x, y):
    48         b = [np.zeros_like(b_) for b_ in self.biases]   # per-layer bias gradients
    49         w = [np.zeros_like(w_) for w_ in self.weights]  # per-layer weight gradients
    50         a_ = x
    51         a = [x]  # activations of every layer
    52         for b_, w_ in zip(self.biases, self.weights):
    53             a_ = self.sigmoid(np.dot(w_, a_) + b_)
    54             a.append(a_)
    55         for l in range(1, self.num_layers):
    56             if l == 1:
    57                 # delta = self.sigmoid_prime(a[-1])*(a[-1]-y)  # quadratic cost; O(k)=a[-1], t(k)=y
    58                 delta = a[-1] - y  # cross-entropy
    59             else:
    60                 sp = self.sigmoid_prime(a[-l])   # O(j)=a[-l]
    61                 delta = np.dot(self.weights[-l+1].T, delta) * sp
    62             b[-l] = delta
    63             w[-l] = np.dot(delta, a[-l-1].T)
    64         cost = 0.5*np.sum(b[-1]**2)  # b[-1] = a[-1]-y here, so this tracks 0.5*||a-y||^2
    65         return (b, w, cost)
    66 
    67     def evaluate(self, test_data):
    68         test_results = [(np.argmax(self.feedforward(x)), y)
    69                         for (x, y) in test_data]
    70         return sum(int(x == y) for (x, y) in test_results)
    71 
    72     def sigmoid(self, z):
    73         return 1.0/(1.0+np.exp(-z))
    74 
    75     def sigmoid_prime(self, a):
    76         return a*(1-a)  # takes the activation a = sigmoid(z), not z itself
    77 
    78 if __name__ == '__main__':
    79 
    80     def get_label(i):
    81         c = np.zeros((10, 1))  # one-hot target vector
    82         c[i] = 1
    83         return c
    84 
    85     def get_data(data):
    86         return [np.reshape(x, (784, 1)) for x in data[0]]
    87 
    88     with open('mnist.pkl', 'rb') as f:  # assumed to be pickled by Python 2, hence encoding='latin1'
    89         training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
    90     training_inputs = get_data(training_data)
    91     training_label = [get_label(y_) for y_ in training_data[1]]
    92     data = list(zip(training_inputs, training_label))
    93     test_inputs = get_data(test_data)
    94     test = list(zip(test_inputs, test_data[1]))
    95     net = Network([784, 30, 10])
    96     net.SGD(data[:5000], test[:5000], 50, 10, 3.0)   # 4476/5000 (4347/5000)
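    Note that the `cost` returned by `backprop` in the listing is 0.5*np.sum(b[-1]**2), which with the cross-entropy delta equals the quadratic error 0.5*||a − y||², not the cross-entropy itself. To monitor the cross-entropy directly, a helper along the following lines could be used. This is only a sketch: the name `cross_entropy_cost` and the clipping constant `eps` are assumptions introduced here, not part of the original code.

```python
import numpy as np


def cross_entropy_cost(a, y, eps=1e-12):
    # a: output activations, y: one-hot target; both of shape (10, 1)
    # in the listing. Clipping keeps log() finite when a hits 0 or 1.
    a = np.clip(a, eps, 1.0 - eps)
    return float(np.sum(-y * np.log(a) - (1.0 - y) * np.log(1.0 - a)))


# Hypothetical outputs for a sample whose true label is 3.
y = np.zeros((10, 1))
y[3] = 1.0

good = np.full((10, 1), 0.01)
good[3] = 0.99           # confident and correct

bad = np.full((10, 1), 0.01)
bad[7] = 0.99            # confident and wrong

print("good prediction:", cross_entropy_cost(good, y))
print("bad prediction: ", cross_entropy_cost(bad, y))
```

    Summing this value over the samples of an epoch and dividing by n would give the averaged cross-entropy cost from the derivation.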
  • Original article: https://www.cnblogs.com/qw12/p/6107553.html