Deep Learning from Scratch: Book Notes


    title: Deep Learning from Scratch book notes
    date: 2022-08-02 12:57:39
    mathjax: true
    tags:
    - deep learning
    - python
    

    Chapter 3: Neural Networks

    3.2 Activation Functions

    A function that converts the sum of weighted input signals into an output signal is generally called an activation function.

    3.2.3 Graph of the Step Function

    p44

    import numpy as np
    import matplotlib.pyplot as plt
    
    def step_function(x):
        return np.array(x > 0, dtype=int)  # np.int was removed from recent NumPy; use the builtin int
    
    x = np.arange(-5.0,5.0,0.1)
    y = step_function(x)
    plt.plot(x,y)
    plt.ylim(-0.1,1.1)
    plt.show()
    

    [Figure: graph of the step function]

    3.2.4 Implementing the Sigmoid Function

    p45

    The sigmoid function, also called the logistic function, is used for the outputs of hidden-layer neurons. Its range is (0, 1): it maps any real number into the interval (0, 1).

    import numpy as np
    from matplotlib import pyplot as plt
    
    
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))
    
    x = np.array([-1.0,1.0,2.0])
    print(sigmoid(x)) # [0.26894142 0.73105858 0.88079708]
    
    x = np.arange(-5.0,5.0,0.1)
    y = sigmoid(x)
    plt.plot(x,y)
    plt.ylim(-0.1,1.1)
    plt.show()
    

    [Figure: graph of the sigmoid function]

    3.2.7 The ReLU Function

    p49

    import numpy as np
    from matplotlib import pyplot as plt
    
    
    def relu(x):
        return np.maximum(0,x)
    
    x = np.arange(-5.0,5.0,0.1)
    y = relu(x)
    plt.plot(x,y)
    plt.ylim(0,2)
    plt.show()
    

    [Figure: graph of the ReLU function]

    3.3 Operations on Multidimensional Arrays

    3.3.1 Multidimensional Arrays

    p50

    import numpy as np
    A = np.array([1,2,3,4])
    print(np.ndim(A)) # 1
    
    print(A.shape) # (4,)
    
    print(A.shape[0]) # 4
    
    B = np.array([[1,2],[3,4],[5,6]])
    print(B)
    # [[1 2]
    #  [3 4]
    #  [5 6]]
    
    print(np.ndim(B)) # 2
    
    print(B.shape) # (3, 2)
    

    3.3.2 Matrix Multiplication

    p52

    import numpy as np
    
    A = np.array([[1,2],[3,4]])
    print(A.shape) # (2, 2)
    
    B = np.array([[5,6],[7,8]])
    print(B.shape) # (2, 2)
    
    print(np.dot(A,B))
    # [[19 22]
    #  [43 50]]
    

    3.4 Implementing a 3-Layer Neural Network

    p62

    A single layer first:

    [Figure: signal flow from the input layer to layer 1]

    Code:

    import numpy as np
    
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))
    
    X = np.array([1.0,0.5])
    W1 = np.array([[0.1,0.3,0.5],[0.2,0.4,0.6]])
    B1 = np.array([0.1,0.2,0.3])
    
    print(W1.shape) # (2, 3)
    print(B1.shape) # (3,)
    print(X.shape) # (2,)
    
    A1 = np.dot(X,W1) + B1
    print(A1) #[0.3 0.7 1.1]
    
    Z1 = sigmoid(A1)
    print(Z1) # [0.57444252 0.66818777 0.75026011]
    

    Full 3-layer implementation:

    import numpy as np
    
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))
    
    # identity_function is the identity function; it has no particular meaning here and is used only to keep the flow consistent with the previous layers
    def identity_function(x):
        return x
    
    def init_network():
        network = {}
        network['W1'] = np.array([[0.1,0.3,0.5],[0.2,0.4,0.6]])
        network['B1'] = np.array([0.1,0.2,0.3])
        network['W2'] = np.array([[0.1,0.4],[0.2,0.5],[0.3,0.6]])
        network['B2'] = np.array([0.1,0.2])
        network['W3'] = np.array([[0.1,0.3],[0.2,0.4]])
        network['B3'] = np.array([0.1,0.2])
    
        return network
    
    def forward(network,X):
        W1,W2,W3 = network['W1'],network['W2'],network['W3']
        B1,B2,B3 = network['B1'],network['B2'],network['B3']
    
        # layer 1
        a1 = np.dot(X,W1) + B1
        z1 = sigmoid(a1)
        # layer 2
        a2 = np.dot(z1,W2) + B2
        z2 = sigmoid(a2)
        # layer 3
        a3 = np.dot(z2,W3) + B3
        y = identity_function(a3)
    
        return y
    
    network = init_network()
    x = np.array([1.0,0.5])
    y = forward(network,x)
    
    print(y)  # [0.31682708 0.69627909]
    

    The init_network() function initializes the weights and biases and stores them in the dictionary variable network.

    The forward() function encapsulates the processing from the input signal to the output signal; the word forward refers to propagation in the input-to-output direction, as opposed to backward.

    3.5 Designing the Output Layer

    3.5.1 Implementing the Softmax Function

    p64

    import numpy as np

    # naive version
    def softmax(a):
        x = np.exp(a)
        sum_exp = np.sum(x)  # avoid shadowing the builtin sum()
        return x / sum_exp
    
    # overflow-safe version (p67)
    def softmax(a):
        c = np.max(a)
        exp_a = np.exp(a - c) # guard against overflow
        sum_exp_a = np.sum(exp_a)
        y = exp_a / sum_exp_a
        
        return y
    
    print(softmax(np.array([0.3,2.9,4.0]))) # [0.01821127 0.24519181 0.73659691]
    

    The outputs of softmax() are real numbers between 0.0 and 1.0, and they sum to 1, which is why they can be interpreted as "probabilities": here the first output can be read as a probability of about 1.8%, the second about 25%, and the third about 74%.
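
    A quick check, reusing the overflow-safe softmax defined above, confirms that the outputs sum to 1:

    print(np.sum(softmax(np.array([0.3,2.9,4.0])))) # 1.0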

    3.6 Handwritten Digit Recognition

    3.6.1 The MNIST Dataset

    p70

    import sys,os
    sys.path.append(os.pardir)
    import numpy as np  # needed for np.ndim below
    from dataset.mnist import load_mnist
    
    (x_train,t_train),(x_test,t_test) = load_mnist(flatten=True,normalize=False)
    
    print(x_train.shape) # (60000, 784)
    print(t_train.shape) # (60000,)
    print(x_test.shape) # (10000, 784)
    print(t_test.shape) # (10000,)
    print(np.ndim(x_train)) # 2
    

    The load_mnist() function returns (training images, training labels), (test images, test labels).

    normalize controls whether the input images are normalized to values between 0.0 and 1.0; if False, the pixels keep their original values between 0 and 255.

    flatten controls whether the images are flattened into one-dimensional arrays; if False, each input is a 1×28×28 three-dimensional array, and if True, it is a one-dimensional array of 784 elements.

    Output of x_train:

    [Figure: printed contents of x_train]

    Displaying an image:

    import sys,os
    sys.path.append(os.pardir)
    import numpy as np
    from dataset.mnist import load_mnist
    from PIL import Image
    
    def img_show(img):
        pil_img = Image.fromarray(np.uint8(img))
        pil_img.show()
    
    (x_train,t_train),(x_test,t_test) = load_mnist(flatten=True,normalize=False)
    img = x_train[0]
    label = t_train[0]
    print(label) #5
    
    print(img.shape) # (784,)
    img = img.reshape(28,28)
    print(img.shape) # (28, 28)
    
    img_show(img)
    

    Part of the img array when its shape is (784,):

    [Figure: img printed as a flat array of shape (784,)]

    Part of the img array after reshaping to (28, 28):

    [Figure: img printed as a 28x28 array]

    3.6.3 Batch Processing

    p77

    # coding: utf-8
    import sys, os
    sys.path.append(os.pardir)  # so that files in the parent directory can be imported
    import numpy as np
    import pickle
    from dataset.mnist import load_mnist
    from common.functions import sigmoid, softmax
    
    
    def get_data():
        (x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, flatten=True, one_hot_label=False)
        return x_test, t_test
    
    
    def init_network():
        with open("sample_weight.pkl", 'rb') as f:
            network = pickle.load(f)
        return network
    
    
    def predict(network, x):
        w1, w2, w3 = network['W1'], network['W2'], network['W3']
        b1, b2, b3 = network['b1'], network['b2'], network['b3']
    
        a1 = np.dot(x, w1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, w2) + b2
        z2 = sigmoid(a2)
        a3 = np.dot(z2, w3) + b3
        y = softmax(a3)
    
        return y
    
    
    x, t = get_data()
    print(x.shape) # (10000, 784)
    network = init_network()
    print(network)
    
    batch_size = 100 # batch size
    accuracy_cnt = 0
    
    for i in range(0, len(x), batch_size):
        x_batch = x[i:i+batch_size]
        print(x_batch.shape) # (100, 784): 100 rows of 784 elements each
        y_batch = predict(network, x_batch)  
        p = np.argmax(y_batch, axis=1)
        accuracy_cnt += np.sum(p == t[i:i+batch_size])
    
    print("Accuracy:" + str(float(accuracy_cnt) / len(x)))
    
    

    Chapter 4: Neural Network Training

    4.2 Loss Functions

    4.2.1 Mean Squared Error

    p86

    import numpy as np
    
    def mean_squared_error(y,t):
        return 0.5 * np.sum((y-t)**2)
    y = [0.1,0.05,0.6,0.0,0.05,0.1,0.0,0.1,0.0,0.0,0.0]
    t = [0,0,0,1,0,0,0,0,0,0,0]
    
    print(mean_squared_error(np.array(y),np.array(t))) # 0.6974999999999999
    

    4.2.2 Cross-Entropy Error

    p87

    import numpy as np
    
    def cross_entropy_error(y,t):
        delta = 1e-7 # tiny value added so that np.log(0) (negative infinity) never occurs
        return -np.sum(t * np.log(y + delta))
    

    4.2.3 Mini-Batch Learning

    Taking cross-entropy error as the loss function, it can be written for N examples as:

    \[E = -\frac{1}{N}\sum_{n} \sum_{k} t_{nk} \log y_{nk} \]

    With a batch of size N, the errors are summed and finally divided by N, which normalizes the sum into the average loss per example. This averaging gives a single metric that does not depend on the number of training examples.

    p89

    When the dataset is large, we select a portion of it as an "approximation" of the whole. Neural network training likewise selects a batch of data (a mini-batch) from the training data and learns on that batch. For example, randomly picking 100 examples out of 60,000 training examples and training on those 100 is called mini-batch learning.

    import sys,os
    sys.path.append(os.pardir)
    import numpy as np
    from dataset.mnist import load_mnist
    
    (x_train,t_train),(x_test,t_test) = load_mnist(normalize=True,one_hot_label=True)
    
    print(x_train.shape) # (60000, 784)
    print(t_train.shape) # (60000, 10)
    
    ###### randomly pick 10 examples #######
    train_size = x_train.shape[0] # 60000
    batch_size = 10
    batch_mask = np.random.choice(train_size,batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    

    np.random.choice(a, b): randomly selects b values from 0 to a-1.

    4.2.4 Mini-Batch Version of Cross-Entropy Error

    p91

    When y is one-dimensional, i.e. when the cross-entropy error of a single example is being computed, the data are reshaped first. When the input is a mini-batch, the sum is divided by the batch size to obtain the average cross-entropy error per example.

    When the labels are in one-hot form:

    import numpy as np
    
    def cross_entropy_error_mini_batch(y,t):
        if y.ndim == 1:
            t = t.reshape(1,t.size)
            y = y.reshape(1,y.size)
    
        batch_size = y.shape[0]
        return -np.sum(t * np.log(y + 1e-7)) / batch_size
    
    
    t = np.array([1,2,3])
    
    print(np.ndim(t))  # 1
    print(t) # [1 2 3]
    print(t.shape) # (3,)
    
    t = t.reshape(1,t.size)
    print(t) # [[1 2 3]]
    print(t.shape) # (1, 3)
    

    When the labels are stored as class indices (not one-hot, e.g. "2", "7", "8"):

    def cross_entropy_error_mini_batch_label(y,t):
        if y.ndim == 1:
            t = t.reshape(1,t.size)
            y = y.reshape(1,y.size)
            
        batch_size = y.shape[0]
        return -np.sum(np.log(y[np.arange(batch_size),t]+1e-7)) / batch_size
    

    np.arange(batch_size) returns the array [0, 1, ..., batch_size-1].

    Because t stores the labels as class indices such as [2,7,0,9,4], y[np.arange(batch_size),t] extracts, for each example, the network output corresponding to its correct label; here it produces the NumPy array [y[0,2],y[1,7],y[2,0],y[3,9],y[4,4]].
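
    A small illustrative example of this indexing (not from the book):

    import numpy as np
    
    y = np.array([[0.1,0.6,0.3],
                  [0.8,0.1,0.1]])   # network outputs for 2 examples
    t = np.array([1,0])             # labels stored as class indices
    batch_size = y.shape[0]
    # picks y[0,1] and y[1,0], i.e. each example's output for its correct class
    print(y[np.arange(batch_size),t]) # [0.6 0.8]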

    4.3 Numerical Differentiation

    4.3.1 Derivatives

    p95

    import numpy as np
    from matplotlib import pyplot as plt
    
    
    def numerical_diff(f,x):
        h = 1e-4 # 0.0001
        return (f(x+h) - f(x-h)) / (2*h)
    
    def function_1(x):
        return x**2
    
    x = np.arange(0.0,20.0,0.1) # array from 0 to 20 in steps of 0.1
    y = function_1(x)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.plot(x,y)
    plt.show()
    
    # derivative of y = x^2 at x = 2
    
    print(numerical_diff(function_1,2)) # 4.000000000004
    

    4.4 Gradients

    p101

    import numpy as np
    
    # y = x1^2+x2^2
    def function_2(x):
        return x[0]**2+x[1]**2
    
    def numerical_gradient(f,x):
        h = 1e-4 # 0.0001
        grad = np.zeros_like(x) # array of zeros with the same shape as x
    
        for idx in range(x.size):
            tmp_val = x[idx]
            # compute f(x+h)
            x[idx] = tmp_val + h
            fxh1 = f(x)
    
            # compute f(x-h)
            x[idx] = tmp_val - h
            fxh2 = f(x)
    
            grad[idx] = (fxh1 -fxh2) / (2*h)
            x[idx] = tmp_val # restore the original value
    
        return grad
    
    x = np.array([3.0,4.0])
    # gradient of y = x1^2 + x2^2 at the point (3, 4)
    print(numerical_gradient(function_2,x)) # [6. 8.]
    

    4.4.1 Gradient Descent

    p105

    import numpy as np
    from matplotlib import pyplot as plt
    
    
    def function_2(x):
        return x[0]**2 + x[1]**2
    
    def numerical_gradient(f,x):
        h = 1e-4
        grad = np.zeros_like(x)
    
        for idx in range(x.size):
            tmp_val = x[idx]
            x[idx] = tmp_val + h
            fxh1 = f(x)
    
            x[idx] = tmp_val - h
            fxh2 = f(x)
    
            grad[idx] = (fxh1 - fxh2) / (2*h)
            x[idx] = tmp_val # restore the original value so x is not left perturbed
    
        return grad
    
    
    def gradient_descent(f,init_x,lr=0.01,step_num=100):
        x = init_x
        x_history = []
        for i in range(step_num):
            x_history.append(x.copy())
            grad = numerical_gradient(f,x)
            x -= lr*grad
    
        return x,np.array(x_history)
    
    x, x_history = gradient_descent(function_2, np.array([-3.0, 4.0]))
    
    plt.plot( [-5, 5], [0,0], '--b')
    plt.plot( [0,0], [-5, 5], '--b')
    plt.plot(x_history[:,0], x_history[:,1], 'o')
    
    plt.xlim(-3.5, 3.5)
    plt.ylim(-4.5, 4.5)
    plt.xlabel("X0")
    plt.ylabel("X1")
    plt.show()
    

    X[:,0] is NumPy slicing notation: for a two-dimensional array it takes everything along the first axis and index 0 along the second axis. Intuitively, X[:,0] takes element 0 of every row, and X[:,1] takes element 1 of every row.
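
    A small illustrative example of this slicing (not from the book):

    import numpy as np
    
    X = np.array([[1,2],
                  [3,4],
                  [5,6]])
    print(X[:,0]) # [1 3 5], element 0 of every row
    print(X[:,1]) # [2 4 6], element 1 of every row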

    To find a minimum, use gradient descent; to find a maximum, use gradient ascent.

    4.4.2 Gradients of a Neural Network

    p107

    import sys,os
    import time
    
    sys.path.append(os.pardir)
    import numpy as np
    
    def cross_entropy_error(y, t):
        if y.ndim == 1:  # without this reshape the operations below would fail
            t = t.reshape(1, t.size)
            y = y.reshape(1, y.size)
    
        # if the labels are one-hot vectors, convert them to class indices
        if t.size == y.size:
            t = t.argmax(axis=1)
    
        batch_size = y.shape[0]
        return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size
    
    def softmax(x):
        if x.ndim == 2:
            x = x.T
            x = x - np.max(x, axis=0)
            y = np.exp(x) / np.sum(np.exp(x), axis=0)
            return y.T
    
        x = x - np.max(x) # guard against overflow
        return np.exp(x) / np.sum(np.exp(x))
    
    def numerical_gradient(f, x):
        h = 1e-4  # 0.0001
        grad = np.zeros_like(x)
    
        it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
        while not it.finished:
            idx = it.multi_index
            tmp_val = x[idx]
            x[idx] = float(tmp_val) + h
            fxh1 = f(x)  # f(x+h)
    
            x[idx] = tmp_val - h
            fxh2 = f(x)  # f(x-h)
            grad[idx] = (fxh1 - fxh2) / (2 * h)
    
            x[idx] = tmp_val  # restore the original value
            it.iternext()
    
        return grad
    
    class SimpleNet:
        def __init__(self):
            self.W = np.random.randn(2,3) # initialize with a Gaussian distribution
    
        def predict(self,x):
            return np.dot(x,self.W)
    
        def loss(self,x,t):
            z = self.predict(x)
            y = softmax(z)
            loss = cross_entropy_error(y,t)
    
            return loss
    
    # quick test of the label handling
    t = np.array([1,2,3])
    # print(t.argmax(axis=1))  # raises numpy.AxisError: axis 1 is out of bounds for array of dimension 1; it only works after the reshape below
    print(np.ndim(t))  # 1
    print(t) # [1 2 3]
    print(t.shape) # (3,)
    
    # test again after reshaping
    t = t.reshape(1,t.size)
    print(t) # [[1 2 3]]
    print(t.shape) # (1, 3)
    print(t.argmax(axis=1)) # 2
    
    net = SimpleNet()
    print(net.W) # weight parameters
    x = np.array([0.6,0.9])
    p = net.predict(x)
    print(p)
    print(np.argmax(p))  # 2
    print(np.ndim(p))    # 1
    
    t = np.array([0,0,1])
    net.loss(x,t)
    
    

    4.5 Implementing the Learning Algorithm

    p109

    Neural network learning consists of four steps:

    Step 1 (mini-batch)

    Randomly select a portion of the training data.

    Step 2 (compute the gradients)

    To reduce the value of the loss function on the mini-batch, compute the gradient of each weight parameter. The gradient points in the direction that decreases the loss the most.

    Step 3 (update the parameters)

    Update the weight parameters by a small amount in the direction of the gradient.

    Step 4 (repeat)

    Repeat the steps above.

    4.5.1 A Two-Layer Neural Network Class

    p111

    import sys,os
    
    import numpy as np
    
    sys.path.append(os.pardir)
    from common.functions import *
    from common.gradient import numerical_gradient
    
    class TwoLayerNet:
    
        def __init__(self,input_size,hidden_size,output_size,weight_init_std=0.01):
            # initialize the weights
            self.params = {}
            self.params['W1'] = weight_init_std * np.random.randn(input_size,hidden_size)
            self.params['b1'] = np.zeros(hidden_size)
    
            self.params['W2'] = weight_init_std * np.random.randn(hidden_size,output_size)
            self.params['b2'] = np.zeros(output_size)
    
        def predict(self,x):
            W1,W2 = self.params['W1'],self.params['W2']
            b1,b2 = self.params['b1'],self.params['b2']
    
            a1 = np.dot(x,W1) + b1
            z1 = sigmoid(a1)
            a2 = np.dot(z1,W2) + b2
            y = softmax(a2)
    
            return y
    
        # x: input data, t: labels
        def loss(self,x,t):
            y = self.predict(x)
            return cross_entropy_error(y,t)
    
        def accuracy(self,x,t):
            y = self.predict(x)
            y = np.argmax(y,axis=1)
            t = np.argmax(t,axis=1)
    
            accuracy = np.sum(y == t) / float(x.shape[0])
            return accuracy
    
        # x: input data, t: labels
        def numerical_gradient(self,x,t):
            loss_W = lambda W:self.loss(x,t)
    
            grads = {}
            grads['W1'] = numerical_gradient(loss_W,self.params['W1'])
            grads['b1'] = numerical_gradient(loss_W,self.params['b1'])
            grads['W2'] = numerical_gradient(loss_W,self.params['W2'])
            grads['b2'] = numerical_gradient(loss_W,self.params['b2'])
    
            return grads
    
    net = TwoLayerNet(input_size=784,hidden_size=100,output_size=10)
    print(net.params['W1'].shape) #(784, 100)
    print(net.params['b1'].shape) #(100,)
    print(net.params['W2'].shape) #(100, 10)
    print(net.params['b2'].shape) #(10,)
    
    
    x = np.random.randn(100,784) # dummy input data (100 examples)
    t = np.random.randn(100,10)  # dummy labels (100 examples)
    
    grads = net.numerical_gradient(x,t)
    print(grads['W1'].shape) #(784, 100)
    print(grads['b1'].shape)#(100,)
    print(grads['W2'].shape) #(100, 10)
    print(grads['b2'].shape)#(10,)
    
    

    input_size: the number of neurons in the input layer; each input image has 784 (28×28) pixels

    hidden_size: the number of neurons in the hidden layer; any reasonable value will do

    output_size: the number of neurons in the output layer; there are 10 output classes

    np.random.randn(m,n): generates an m×n matrix of standard-normal random values

    4.5.2 Mini-Batch Implementation

    p115

    The mini-batch size is set to 100: each iteration randomly draws 100 examples from the 60,000 training examples, and this is repeated 10,000 times. After every update, the loss function is evaluated and the value is appended to a list.

    import sys,os
    
    import numpy as np
    
    sys.path.append(os.pardir)
    from common.functions import *
    from common.gradient import numerical_gradient
    from dataset.mnist import load_mnist
    
    class TwoLayerNet:
    
        def __init__(self,input_size,hidden_size,output_size,weight_init_std=0.01):
    
            # initialize the weights
            self.params = {}
            self.params['W1'] = weight_init_std * np.random.randn(input_size,hidden_size)
            self.params['b1'] = np.zeros(hidden_size)
    
            self.params['W2'] = weight_init_std * np.random.randn(hidden_size,output_size)
            self.params['b2'] = np.zeros(output_size)
    
        def predict(self,x):
            W1,W2 = self.params['W1'],self.params['W2']
            b1,b2 = self.params['b1'],self.params['b2']
            a1 = np.dot(x,W1) + b1
            z1 = sigmoid(a1)
            a2 = np.dot(z1,W2) + b2
            y = softmax(a2)
    
            return y
    
        # x: input data, t: labels
        def loss(self,x,t):
            y = self.predict(x)
            return cross_entropy_error(y,t)
    
        def accuracy(self,x,t):
            y = self.predict(x)
            y = np.argmax(y,axis=1)
            t = np.argmax(t,axis=1)
    
            accuracy = np.sum(y == t) / float(x.shape[0])
            return accuracy
    
        # x: input data, t: labels
        def numerical_gradient(self,x,t):
            loss_W = lambda W:self.loss(x,t)
    
            grads = {}
            grads['W1'] = numerical_gradient(loss_W,self.params['W1'])
            grads['b1'] = numerical_gradient(loss_W,self.params['b1'])
            grads['W2'] = numerical_gradient(loss_W,self.params['W2'])
            grads['b2'] = numerical_gradient(loss_W,self.params['b2'])
    
            return grads
    
    (x_train,t_train),(x_test,t_test) = load_mnist(normalize=True,one_hot_label=True)
    
    train_loss_list = []
    
    # hyperparameters
    iters_num = 10000
    train_size = x_train.shape[0]
    batch_size = 100
    learning_rate = 0.1
    network = TwoLayerNet(input_size=784,hidden_size=50,output_size=10)
    
    for i in range(iters_num):
        # get a mini-batch
        batch_mask = np.random.choice(train_size,batch_size)
        x_batch = x_train[batch_mask]
        t_batch = t_train[batch_mask]

        # compute the gradients
        grad = network.numerical_gradient(x_batch,t_batch)

        # update the parameters
        for key in ('W1','b1','W2','b2'):
            network.params[key] -= learning_rate * grad[key]

        # record the learning progress
        loss = network.loss(x_batch,t_batch)
        train_loss_list.append(loss)
    

    4.5.3 Evaluation on the Test Data

    p117

    import sys,os
    
    import numpy as np
    
    sys.path.append(os.pardir)
    from common.functions import *
    from common.gradient import numerical_gradient
    from dataset.mnist import load_mnist
    
    class TwoLayerNet:
    
        def __init__(self,input_size,hidden_size,output_size,weight_init_std=0.01):
    
            # initialize the weights
            self.params = {}
            self.params['W1'] = weight_init_std * np.random.randn(input_size,hidden_size)
            self.params['b1'] = np.zeros(hidden_size)
    
            self.params['W2'] = weight_init_std * np.random.randn(hidden_size,output_size)
            self.params['b2'] = np.zeros(output_size)
    
        def predict(self,x):
            W1,W2 = self.params['W1'],self.params['W2']
            b1,b2 = self.params['b1'],self.params['b2']
            a1 = np.dot(x,W1) + b1
            z1 = sigmoid(a1)
            a2 = np.dot(z1,W2) + b2
            y = softmax(a2)
    
            return y
    
        # x: input data, t: labels
        def loss(self,x,t):
            y = self.predict(x)
            return cross_entropy_error(y,t)
    
        def accuracy(self,x,t):
            y = self.predict(x)
            y = np.argmax(y,axis=1)
            t = np.argmax(t,axis=1)
    
            accuracy = np.sum(y == t) / float(x.shape[0])
            return accuracy
    
        # x: input data, t: labels
        def numerical_gradient(self,x,t):
            loss_W = lambda W:self.loss(x,t)
    
            grads = {}
            grads['W1'] = numerical_gradient(loss_W,self.params['W1'])
            grads['b1'] = numerical_gradient(loss_W,self.params['b1'])
            grads['W2'] = numerical_gradient(loss_W,self.params['W2'])
            grads['b2'] = numerical_gradient(loss_W,self.params['b2'])
    
            return grads
    
    (x_train,t_train),(x_test,t_test) = load_mnist(normalize=True,one_hot_label=True)
    
    train_loss_list = []
    train_acc_list = []
    test_acc_list = []
    
    
    # hyperparameters
    iters_num = 10000
    train_size = x_train.shape[0]
    batch_size = 100
    learning_rate = 0.1
    
    # number of iterations per epoch
    iter_per_epoch = max(train_size / batch_size,1)
    
    network = TwoLayerNet(input_size=784,hidden_size=50,output_size=10)
    
    for i in range(iters_num):
        print("iteration {}".format(i))
        # get a mini-batch
        batch_mask = np.random.choice(train_size,batch_size)
        x_batch = x_train[batch_mask]
        t_batch = t_train[batch_mask]

        # compute the gradients
        grad = network.numerical_gradient(x_batch,t_batch)
        print(grad)
        # update the parameters
        for key in ('W1','b1','W2','b2'):
            network.params[key] -= learning_rate * grad[key]

        # record the learning progress
        loss = network.loss(x_batch,t_batch)
        train_loss_list.append(loss)

        # compute the accuracy once per epoch
        if i % iter_per_epoch == 0:
            train_acc = network.accuracy(x_train,t_train)
            test_acc = network.accuracy(x_test,t_test)
            train_acc_list.append(train_acc)
            test_acc_list.append(test_acc)
            print(str(train_acc),str(test_acc))
    

    Chapter 5: Error Backpropagation

    5.2 The Chain Rule

    5.2.1 Backward Propagation on a Computational Graph

    p127

    [Figure: backward propagation on a computational graph]

    The order of computation in backward propagation is: multiply the incoming signal E by the node's local derivative, then pass the result on to the next node.
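
    As a simple illustration (not taken from the book's figure): if a node computes \(y = x^2\), its local derivative is \(\frac{\partial y}{\partial x} = 2x\), so an upstream signal \(E\) is passed on to the next node as \(2xE\).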

    5.3 Backward Propagation

    5.3.1 Backward Propagation at an Addition Node

    p130

    The backward pass of an addition node simply multiplies the incoming value by 1, so it flows unchanged to the next node.

    For example, for z(x, y) = x + y, the partial derivatives with respect to both x and y are 1.

    5.4 Implementing Simple Layers

    5.4.1 The Multiplication Layer

    p135

    [Figure: computational graph for the apple purchase example]

    class MulLayer:
        def __init__(self):
            self.x = None
            self.y = None
    
        def forward(self,x,y):
            self.x = x
            self.y = y
            out = x * y
            return out
    
        def backward(self,dout):
            dx = dout * self.y
            dy = dout * self.x
    
            return dx,dy
    
    apple = 100
    apple_num = 2
    tax = 1.1
    
    mul_apple_layer = MulLayer()
    mul_tax_layer = MulLayer()
    
    # forward
    apple_price = mul_apple_layer.forward(apple,apple_num)
    price = mul_tax_layer.forward(apple_price,tax)
    print(price) # 220
    
    # backward
    dprice = 1
    dapple_price,dtax = mul_tax_layer.backward(dprice)
    dapple,dapple_num = mul_apple_layer.backward(dapple_price)
    print(dapple_price,dtax,dapple,dapple_num) # 1.1 200 2.2 110
    

    5.4.2 The Addition Layer

    p137

    class AddLayer:
        def __init__(self):
            pass  # pass means do nothing
    
        def forward(self,x,y):
            out = x + y
            return out
    
        def backward(self,dout):
            dx = dout * 1
            dy = dout * 1
            return dx,dy
    
    

    5.4.3 Combining the Layers

    p138

    [Figure: computational graph for buying apples and oranges]

    class AddLayer:
        def __init__(self):
            pass  # pass means do nothing
    
        def forward(self,x,y):
            out = x + y
            return out
    
        def backward(self,dout):
            dx = dout * 1
            dy = dout * 1
            return dx,dy
    
    class MulLayer:
        def __init__(self):
            self.x = None
            self.y = None
    
        def forward(self,x,y):
            self.x = x
            self.y = y
            out = x * y
            return out
    
        def backward(self,dout):
            dx = dout * self.y
            dy = dout * self.x
    
            return dx,dy
    
    apple = 100
    apple_num = 2
    orange = 150
    orange_num = 3
    tax = 1.1
    
    # layer
    mul_apple_layer = MulLayer()
    mul_orange_layer = MulLayer()
    add_apple_orange_layer = AddLayer()
    mul_tax_layer = MulLayer()
    
    # forward
    apple_price = mul_apple_layer.forward(apple,apple_num)
    orange_price = mul_orange_layer.forward(orange,orange_num)
    all_price = add_apple_orange_layer.forward(apple_price,orange_price)
    price = mul_tax_layer.forward(all_price,tax)
    
    # backward
    dprice = 1
    dall_price,dtax = mul_tax_layer.backward(dprice)
    dapple_price,dorange_price = add_apple_orange_layer.backward(dall_price)
    dorange,dorange_num = mul_orange_layer.backward(dorange_price)
    dapple,dapple_num = mul_apple_layer.backward(dapple_price)
    
    print(price) # 715
    print(dapple_num,dapple,dorange,dorange_num,dtax) # 110,2.2,3.3,165,650
    

    5.5 Implementing the Activation Function Layers

    5.5.1 The ReLU Layer

    p140

    import numpy as np
    
    
    class Relu:
        def __init__(self):
            self.mask = None
    
        def forward(self,x):
            self.mask  = (x <= 0)
            out = x.copy()
            out[self.mask] = 0
            return out
    
        def backward(self,dout):
            dout[self.mask] = 0
            dx = dout
            return dx
    
    x = np.array([[1.0,-0.5],[-2.0,3.0]])
    print(x)
    # [[ 1.  -0.5]
    #  [-2.   3. ]]
    
    mask = (x <= 0)
    print(mask)
    # [[False  True]
    #  [True   False]]
    

    The mask variable is a NumPy array of True/False values: the positions where the forward-pass input x was less than or equal to 0 are stored as True, and all other positions as False.

    If an input was less than or equal to 0 in the forward pass, the corresponding backward value is 0. The backward pass therefore reuses the mask saved during the forward pass and sets the elements of the upstream dout to 0 wherever the mask is True.

    5.5.2 The Sigmoid Layer

    p141

    Backward propagation through the sigmoid layer:

    [Figure: computational graph of the sigmoid layer's backward pass]

    [Figure: the backward pass written out step by step]

    Simplified computational graph of the sigmoid layer

    The simplified version is more efficient, because we no longer have to worry about the internal details of the sigmoid layer.

    [Figure: simplified computational graph of the sigmoid layer]

    Rearranging the formula:

    [Figure: the rearranged backward formula, dout multiplied by y(1 - y)]

    Code:

    import numpy as np
    
    class Sigmoid:
        def __init__(self):
            self.out = None
    
        def forward(self,x):
            out = 1 / (1 + np.exp(-x))
            self.out = out  # keep the output for the backward pass
            return out
    
        def backward(self,dout):
            dx = dout * (1.0 - self.out) * self.out
            return dx
    

    5.6 Implementing the Affine/Softmax Layers

    5.6.1 The Affine Layer

    p145

    The matrix product computed in a neural network's forward pass is known in geometry as an affine transformation, so the layer that performs it is implemented here as the "Affine layer".

    Here X, W, and B are arrays of shape (2,), (2, 3), and (3,) respectively.

    [Figure: computational graph of the Affine layer]

    Y = (X · W) + B has shape (3,), and its derivative also has shape (3,); working backwards from this gives the expressions below.

    [Figure: backward-pass formulas of the Affine layer]

    Because X has shape (2,), the expression on the right-hand side must also produce shape (2,); that is why it is written as it is, since only that arrangement makes the matrix product come out with shape (2,). Here X is a single data sample.

    [Figure: shape check for the Affine layer's backward pass]

    5.6.2 The Batch Version of the Affine Layer

    p148

    The Affine layer above took a single sample X as input. Now consider forward-propagating N samples at once, i.e. the batch version of the Affine layer.

    [Figure: computational graph of the batch Affine layer]

    Array shapes

    b has 3 elements, so its shape is (3,); x_dot_w contains 2 rows of 3 elements, so its shape is (2, 3); y contains 3 blocks, each with 2 rows of 4 elements, so its shape is (3, 2, 4).

    b = np.array([1,2,3])
    print(b.shape,np.ndim(b)) # (3,) 1
    
    x_dot_w = np.array([[1,2,3],
                  [4,5,6]])
    print(x_dot_w.shape,np.ndim(x_dot_w)) # (2, 3) 2
    
    y = np.array([
        [
            [1,2,3,0],
            [4,5,6,0]
        ],
        [
            [7,8,9,0],
            [10,11,12,0]
        ],
        [
            [17, 28, 29, 20],
            [210, 121, 122, 20]
        ]
    ])
    print(y.shape,np.ndim(y)) # (3, 2, 4) 3
    

    Implementation of the Affine layer:

    import numpy as np
    
    class Affine:
        def __init__(self,W,b):
            self.W = W
            self.b = b
            self.x = None
            self.dW = None
            self.db = None
    
        def forward(self,x):
            self.x = x
            out = np.dot(x,self.W) + self.b
            return out
            
        def backward(self,dout):
            dx = np.dot(dout,self.W.T)
            self.dW = np.dot(self.x.T,dout)
            self.db = np.sum(dout,axis=0)
            
            return dx
        
    b = np.array([[1,2,3],
                  [4,5,6]])
    print(b.T)
    # [[1 4]
    #  [2 5]
    #  [3 6]]
    

    5.6.3 The Softmax-with-Loss Layer

    p152

    The goal of training is to adjust the weights so that the network's output (the softmax output) approaches the supervised labels, so the error between output and labels must be passed efficiently back to the earlier layers. When cross-entropy error is used as the loss for the softmax function, backpropagation yields the clean result (y1 - t1, y2 - t2, y3 - t3); the cross-entropy error function was designed precisely so that this happens. Similarly, for regression the output layer uses the identity function with sum-of-squared error as its loss, and backpropagation again yields (y1 - t1, y2 - t2, y3 - t3).

    Code:

    import numpy as np
    
    def softmax(x):
        if x.ndim == 2:
            x = x.T
            x = x - np.max(x, axis=0)
            y = np.exp(x) / np.sum(np.exp(x), axis=0)
            return y.T
    
        x = x - np.max(x)  # guard against overflow
        return np.exp(x) / np.sum(np.exp(x))
    
    def cross_entropy_error(y, t):
        if y.ndim == 1:
            t = t.reshape(1, t.size)
            y = y.reshape(1, y.size)
    
        # if the labels are one-hot vectors, convert them to class indices
        if t.size == y.size:
            t = t.argmax(axis=1)
    
        batch_size = y.shape[0]
        return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size
    
    class SoftmaxWithLoss:
        def __init__(self):
            self.loss = None   # loss value
            self.y = None # output of softmax
            self.t = None # labels (one-hot vectors)
    
        def forward(self,x,t):
            self.t = t
            self.y = softmax(x)
            self.loss = cross_entropy_error(self.y,self.t)
    
            return self.loss
    
        def backward(self,dout=1):
            batch_size = self.t.shape[0]
            dx = (self.y - self.t) / batch_size
    
            return dx
    
    

    In the backward pass, the propagated value is divided by the batch size, so what is passed to the earlier layers is the error per single example.

    5.7 Implementing Error Backpropagation

    5.7.2 A Neural Network Using Error Backpropagation

    p156

    import sys,os
    sys.path.append(os.pardir)
    import numpy as np
    from common.layers import *
    from common.gradient import numerical_gradient
    from collections import OrderedDict
    
    class TwoLayerNet:
        def __init__(self,input_size,hidden_size,output_size,weight_init_std=0.01):
            # initialize the weights
            self.params = {}
            self.params['W1'] = weight_init_std * np.random.randn(input_size,hidden_size)
            self.params['b1'] = np.zeros(hidden_size)
            self.params['W2'] = weight_init_std * np.random.randn(hidden_size,output_size)
            self.params['b2'] = np.zeros(output_size)
    
            # create the layers
            self.layers = OrderedDict() # ordered dictionary: remembers insertion order
            self.layers['Affine1'] = Affine(self.params['W1'],self.params['b1'])
            self.layers['Relu1'] = Relu()
            self.layers['Affine2'] = Affine(self.params['W2'],self.params['b2'])
            self.lastLayer = SoftmaxWithLoss()
    
        def predict(self,x):
            for layer in self.layers.values():
                x = layer.forward(x)
    
            return x
    
        # x: input data, t: labels
        def loss(self,x,t):
            y = self.predict(x)
            return self.lastLayer.forward(y,t)
    
        def accuracy(self,x,t):
            y = self.predict(x)
            y = np.argmax(y,axis=1)
            if t.ndim != 1 : t = np.argmax(t,axis=1)
            accuracy = np.sum(y == t) / float(x.shape[0])
            return accuracy
    
        # x: input data, t: labels
        def numerical_gradient(self,x,t):
            loss_W = lambda W:self.loss(x,t)
    
            grads = {}
            grads['W1'] = numerical_gradient(loss_W,self.params['W1'])
            grads['b1'] = numerical_gradient(loss_W,self.params['b1'])
            grads['W2'] = numerical_gradient(loss_W,self.params['W2'])
            grads['b2'] = numerical_gradient(loss_W,self.params['b2'])
    
            return grads
    
        def gradient(self,x,t):
            # forward
            self.loss(x,t)
    
            # backward
            dout = 1
            dout = self.lastLayer.backward(dout)
    
            layers = list(self.layers.values())
            layers.reverse() # traverse the layers in reverse order
            for layer in layers:
                dout = layer.backward(dout)
    
            # collect the gradients computed by each layer
            grads = {}
            grads['W1'] = self.layers['Affine1'].dW
            grads['b1'] = self.layers['Affine1'].db
            grads['W2'] = self.layers['Affine2'].dW
            grads['b2'] = self.layers['Affine2'].db
    
            return grads
    

    5.7.3 Gradient Check for Error Backpropagation

    There are two ways to compute the gradients:

    • Numerical differentiation: simple to implement and unlikely to contain mistakes, but very slow
    • Solving the equations analytically (error backpropagation): computes the gradients efficiently even with huge numbers of parameters, but the implementation is complex and error-prone

    Numerical differentiation is used to confirm that the backpropagation implementation is correct: the results of the two methods are compared to verify the backpropagation code.

    Gradient checking means confirming that the gradients from numerical differentiation and from error backpropagation agree (strictly speaking, that they are very close).

    p158

    import sys,os
    sys.path.append(os.pardir)
    import numpy as np
    from dataset.mnist import load_mnist
    from twolayernet import TwoLayerNet
    
    # load the data
    (x_train,t_train),(x_test,t_test) = load_mnist(normalize=True,one_hot_label=True)
    
    network = TwoLayerNet(input_size=784,hidden_size=50,output_size=10)
    
    x_batch = x_train[:3]
    t_batch = t_train[:3]
    
    grad_numerical = network.numerical_gradient(x_batch,t_batch)
    grad_backprop = network.gradient(x_batch,t_batch)
    
    # mean absolute difference between the two gradients for each parameter
    for key in grad_numerical.keys():
        diff = np.average(np.abs(grad_backprop[key] - grad_numerical[key]))
        print(key + ":"+str(diff))
    
    

    5.7.4 Training with Error Backpropagation

    p160

    import sys,os
    sys.path.append(os.pardir)
    import numpy as np
    from dataset.mnist import load_mnist
    from twolayernet import TwoLayerNet
    
    # load the data
    (x_train,t_train),(x_test,t_test) = load_mnist(normalize=True,one_hot_label=True)
    
    network = TwoLayerNet(input_size=784,hidden_size=50,output_size=10)
    
    # hyperparameters
    iters_num = 10000
    train_size = x_train.shape[0]
    batch_size = 100
    learning_rate = 0.1
    train_loss_list = []
    train_acc_list = []
    test_acc_list = []
    
    iter_per_epoch = max(train_size / batch_size,1)
    
    for i in range(iters_num):
        batch_mask = np.random.choice(train_size,batch_size)
        x_batch = x_train[batch_mask]
        t_batch = t_train[batch_mask]
    
        # compute the gradients by error backpropagation
        grad = network.gradient(x_batch,t_batch)
    
        # update the parameters
        for key in ('W1','b1','W2','b2'):
            network.params[key] -= learning_rate * grad[key]
    
        loss = network.loss(x_batch,t_batch)
        train_loss_list.append(loss)
    
        if i % iter_per_epoch == 0:
            train_acc = network.accuracy(x_train,t_train)
            test_acc = network.accuracy(x_test,t_test)
            train_acc_list.append(train_acc)
            test_acc_list.append(test_acc)
            print(train_acc,test_acc)
    

    Chapter 6: Techniques Related to Training

    6.1 Updating Parameters

    6.1.2 SGD

    The fundamental reason SGD can be inefficient is that the direction of the gradient does not always point toward the minimum.

    p165

    class SGD:
        def __init__(self,lr=0.01):
            self.lr = lr
        
        def update(self,params,grads):
            for key in params.keys():
                params[key] -= self.lr * grads[key]
    

    6.1.4 Momentum

    p168

    The update rule is:

    \[v \gets \alpha v - \eta \frac{\partial L}{\partial W} \\ W \gets W + v \]

    The term \(\alpha v\) gradually slows the "object" down; it plays the role of ground friction or air resistance in physics.

    Code:

    import numpy as np
    
    
    class Momentum:
        def __init__(self,lr=0.01,momentum=0.9):
            self.lr = lr
            self.momentum = momentum
            self.v = None
            
        def update(self,params,grads):
            if self.v is None:
                self.v = {}
                for key, val in params.items():
                    self.v[key] = np.zeros_like(val)
            
            for key in params.keys():
                self.v[key] = self.momentum*self.v[key] - self.lr * grads[key]
                params[key] += self.v[key]
    

    v stores the velocity of the object. At initialization v holds nothing; the first time update() is called, v is created as a dictionary with the same structure as the parameters.

    6.1.5 AdaGrad

    p170

    AdaGrad adjusts the learning rate for each individual parameter. It accumulates the sum of squares of all past gradients, so the further learning progresses, the smaller the updates become; if learning continued forever, the update would eventually shrink to 0 and the parameters would stop changing. RMSProp fixes this by gradually forgetting old gradients (a sketch follows the AdaGrad code below).

    The update rule is:

    \[h \gets h + \frac{\partial L}{\partial W} \odot \frac{\partial L}{\partial W} \\ W \gets W - \eta \frac{1}{\sqrt{h}} \frac{\partial L}{\partial W} \]

    Here h accumulates the squares of all past gradients, and the factor \(\frac{1}{\sqrt{h}}\) makes the learning rate smaller for parameters that have already moved a lot and larger for parameters that have moved little.

    Code:

    import numpy as np
    
    
    class AdaGrad:
        def __init__(self,lr=0.01):
            self.lr = lr
            self.h = None
    
        def update(self,params,grads):
            if self.h is None:
                self.h = {}
                for key,val in params.items():
                    self.h[key] = np.zeros_like(val)
    
            for key in params.keys():
                self.h[key] += grads[key] * grads[key]
                params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
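
    The notes only name RMSProp above; the following is a minimal sketch of it under my own assumptions: instead of accumulating every past squared gradient, it keeps an exponentially decaying average, so old gradients are gradually forgotten.

    import numpy as np
    
    
    class RMSProp:
        def __init__(self,lr=0.01,decay_rate=0.99):
            self.lr = lr
            self.decay_rate = decay_rate
            self.h = None
    
        def update(self,params,grads):
            if self.h is None:
                self.h = {}
                for key,val in params.items():
                    self.h[key] = np.zeros_like(val)
    
            for key in params.keys():
                # decay the accumulated squared gradients, then add the new one
                self.h[key] *= self.decay_rate
                self.h[key] += (1 - self.decay_rate) * grads[key] * grads[key]
                params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)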
    

    6.1.6 Adam

    p173

    Put simply, Adam combines the ideas of Momentum and AdaGrad.
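
    A minimal sketch of an Adam-style update in that spirit (my own illustration, not the book's code; the bias correction of the published algorithm is folded into the adjusted learning rate lr_t):

    import numpy as np
    
    
    class Adam:
        def __init__(self,lr=0.001,beta1=0.9,beta2=0.999):
            self.lr = lr
            self.beta1 = beta1   # decay rate of the Momentum-like first moment
            self.beta2 = beta2   # decay rate of the AdaGrad-like second moment
            self.iter = 0
            self.m = None
            self.v = None
    
        def update(self,params,grads):
            if self.m is None:
                self.m,self.v = {},{}
                for key,val in params.items():
                    self.m[key] = np.zeros_like(val)
                    self.v[key] = np.zeros_like(val)
    
            self.iter += 1
            lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)
    
            for key in params.keys():
                self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
                self.v[key] += (1 - self.beta2) * (grads[key]**2 - self.v[key])
                params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)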

    6.2 Initial Weight Values

    6.2.1 Can the Initial Weights Be Set to 0?

    p176

    No. Strictly speaking, the weights must not all be given the same value: they would then all be updated to identical values and stay symmetric (duplicated), which destroys the point of having many different weights. The initial values must therefore be generated randomly.

    6.2.2 Distribution of Hidden-Layer Activations

    p177

    import numpy as np
    import matplotlib.pyplot as plt
    
    def sigmoid(x):
        return 1/(1+np.exp(-x))
    
    x = np.random.randn(1000,100) # a 1000x100 input matrix
    node_num = 100 # number of nodes (neurons) in each hidden layer
    hidden_layer_size = 5 # five hidden layers
    activations = {}  # the activations are stored here
    
    for i in range(hidden_layer_size):
        if i!= 0:
            x = activations[i-1]
    
        w = np.random.randn(node_num,node_num) * 1     # Gaussian with standard deviation 1 (used for the first figure below)
        # w = np.random.randn(node_num,node_num) * 0.01  # Gaussian with standard deviation 0.01
        z = np.dot(x,w)
        a = sigmoid(z)
        activations[i] = a
    
    # plot the histograms
    for i,a in activations.items():
        plt.subplot(1,len(activations),i+1)
        plt.title(str(i+1)+"-layer")
        plt.hist(a.flatten(),30,range=(0,1))
    plt.show()
    
    

    Output:

    [Figure: activation histograms for each layer with the std-1 Gaussian initialization]

    As the figure shows, the activations in each layer are biased toward 0 and 1. As the sigmoid's output approaches 0 or 1, its derivative approaches 0, so such a distribution makes the gradients in backpropagation shrink and eventually vanish. This problem is called vanishing gradients.

    With a Gaussian of standard deviation 0.01, the distribution looks like this:

    w = np.random.randn(node_num,node_num) * 0.01

    [Figure: activation histograms with the std-0.01 Gaussian initialization]

    This time the activations are concentrated around 0.5, so the gradients do not vanish; but because the values all cluster near 0.5, the layers suffer from limited expressive power (many neurons output almost the same value).

    With a Gaussian of standard deviation $\frac{1}{\sqrt{n}}$ (the Xavier initial value), the distribution looks like this:

    w = np.random.randn(node_num, node_num) / np.sqrt(node_num)

    [Figure: activation histograms with the Xavier initialization]

    With the Xavier initial value (standard deviation $\frac{1}{\sqrt{n}}$), the distribution is broader than before, so the expressive power of the sigmoid is no longer restricted. The same applies if sigmoid is replaced by tanh.

    When the activation function is ReLU, use the He initial value (standard deviation $\sqrt{\frac{2}{n}}$); when it is an S-shaped function such as sigmoid or tanh, use the Xavier initial value.
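
    In code, assuming node_num is the number of nodes in the previous layer as in the experiment above:

    import numpy as np
    
    node_num = 100
    
    # Xavier initial value: suited to sigmoid / tanh
    w_xavier = np.random.randn(node_num,node_num) / np.sqrt(node_num)
    
    # He initial value: suited to ReLU
    w_he = np.random.randn(node_num,node_num) * np.sqrt(2.0 / node_num)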

    6.3 Batch Normalization

    p184

    • Learning can proceed faster (a larger learning rate can be used)
    • It is less dependent on the initial weight values
    • It suppresses overfitting
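
    As a rough illustration of what a Batch Normalization layer does during training (a simplified sketch of my own, omitting the backward pass and the running statistics used at inference time): each feature is normalized to zero mean and unit variance over the mini-batch, then scaled by gamma and shifted by beta.

    import numpy as np
    
    def batchnorm_forward(x,gamma,beta,eps=1e-7):
        mu = x.mean(axis=0)                  # per-feature mean over the batch
        var = x.var(axis=0)                  # per-feature variance over the batch
        x_hat = (x - mu) / np.sqrt(var + eps)
        return gamma * x_hat + beta
    
    x = np.random.randn(100,5)               # mini-batch of 100 samples, 5 features
    out = batchnorm_forward(x,np.ones(5),np.zeros(5))
    print(out.mean(axis=0))                  # approximately 0 for every feature
    print(out.std(axis=0))                   # approximately 1 for every feature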

    6.4 Regularization

    6.4.1 Overfitting

    p189

    The main causes of overfitting:

    • The model has a large number of parameters and high expressive power
    • The training data are small

    6.4.2 Weight Decay

    p191

    Weight decay suppresses overfitting by penalizing large weights during learning; overfitting often occurs precisely because some weight parameters take on large values. One of the following regularization terms is added to the loss function (an L2 sketch follows the list below).

    Regularization terms

    • The L1 norm is the sum of the absolute values of the elements
    • The L2 norm is the sum of the squares of the elements
    • The L∞ norm is the largest absolute value among the elements
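
    A minimal sketch of L2 weight decay under the description above (illustrative only; weight_decay_lambda is a hypothetical regularization strength): half of lambda times the squared weights is added to the loss, and lambda times W is added to W's gradient.

    import numpy as np
    
    weight_decay_lambda = 0.1                # hypothetical strength of the penalty
    W = np.random.randn(784,50)
    
    # term added to the loss in the forward pass
    weight_decay = 0.5 * weight_decay_lambda * np.sum(W ** 2)
    
    # term added to the gradient of W in the backward pass
    dW_penalty = weight_decay_lambda * W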

    6.4.3 Dropout

    p193

    Dropout is another method for suppressing overfitting: it randomly deletes neurons during training.

    import numpy as np
    
    
    class Dropout:
        def __init__(self,dropout_ratio=0.5):
            self.dropout_ratio = dropout_ratio
            self.mask = None
    
        def forward(self,x,train_flg=True):
            if train_flg:
                self.mask = np.random.rand(*x.shape) > self.dropout_ratio
                print(self.mask)
                # [[True False]
                #  [True  True]]
                return x * self.mask
    
            else:
                return x * (1.0 - self.dropout_ratio)
    
        def backward(self,dout):
            return dout * self.mask
    
    d = Dropout()
    x = np.array([[1.0,-0.5],[-2.0,3.0]])
    print(d.forward(x))
    # [[ 1. -0.]
    #  [-2.  3.]]
    print(d.backward(1))
    # [[1 0]
    #  [1 1]]
    

    Here the * operator unpacks the tuple x.shape into separate positional arguments, so np.random.rand(*x.shape) is the same as np.random.rand(2, 2) for a 2x2 array.

    import numpy as np
    x = np.array([[1,2],
                  [3,4]])
    print(np.random.rand(*x.shape))
    # [[0.52955823 0.80267996]
    #  [0.03912289 0.12095148]]
    

    6.5 Validating Hyperparameters

    Typical hyperparameters are the number of neurons per layer, the batch size, the learning rate, and the weight-decay strength. The test data must not be used to evaluate hyperparameters: tuning them against the test data would fit the hyperparameters to the test set, and the model would generalize poorly to other data.

    6.5.1 Validation Data

    p195

    When tuning hyperparameters, their quality must be evaluated on validation data. The training data are used to learn the parameters (weights and biases), the validation data are used to evaluate the hyperparameters, and the test data are used (ideally only once) to confirm generalization performance.

    A dataset is usually split into training, validation, and test data. If the dataset is not already split, you have to split it yourself; here, using MNIST as an example, 20% of the training data is set aside as validation data.

    (x_train,t_train),(x_test,t_test) = load_mnist()
    
    # shuffle the training data
    x_train,t_train = shuffle_dataset(x_train,t_train)
    
    # split off the validation data
    validation_rate = 0.20
    validation_num = int(x_train.shape[0]*validation_rate)
    
    x_val = x_train[:validation_num]
    t_val = t_train[:validation_num]
    x_train = x_train[validation_num:]
    t_train = t_train[validation_num:]
    
    

    6.5.2 Optimizing Hyperparameters

    p197

    Step 0

    Set a range for each hyperparameter.

    Step 1

    Sample hyperparameter values at random from the ranges.

    Step 2

    Train with the sampled values and evaluate the accuracy on the validation data (keeping the number of epochs small).

    Step 3

    Repeat steps 1 and 2 (around 100 times, for example) and narrow the hyperparameter ranges based on the resulting accuracies. A rough code sketch of this loop follows.
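
    A minimal sketch of this procedure with log-scale random sampling (the ranges are my own illustrative choices):

    import numpy as np
    
    # step 3: repeat the trial many times
    for _ in range(100):
        # steps 0-1: sample each hyperparameter from a log-scale range
        weight_decay = 10 ** np.random.uniform(-8,-4)
        lr = 10 ** np.random.uniform(-6,-2)
    
        # step 2 would train briefly with (lr, weight_decay) and record the validation
        # accuracy; afterwards the ranges are narrowed around the best samples
        print("lr: {:.2e}, weight decay: {:.2e}".format(lr,weight_decay))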
