• [Dive into Deep Learning with PyTorch] Study Notes 8.6. Concise Implementation of Recurrent Neural Networks


    8.6. Concise Implementation of Recurrent Neural Networks — Dive into Deep Learning 2.0.0-beta0 documentation (d2l.ai)

    Use the functions provided by a deep learning framework's high-level API to implement the same language model more efficiently: given a text prefix supplied by the user, generate the text that follows.

    Key API: nn.RNN(input_size, hidden_size, num_layers)

    Some of the points below come from Chapter 9 and are used here ahead of time; treat them as a warm-up and study them in detail when you reach Chapter 9.

    LSTM: nn.LSTM() belongs to Section 9.2, so for now we gloss over it and move on. See 9.2. Long Short-Term Memory (LSTM) — Dive into Deep Learning 2.0.0-beta0 documentation (d2l.ai).
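
    Since nn.LSTM comes up again below in begin_state, here is a minimal warm-up sketch (shapes chosen to mirror the nn.RNN test further down): unlike nn.RNN and nn.GRU, whose hidden state is a single tensor, nn.LSTM carries its state as a tuple (h, c).

    import torch
    from torch import nn
    
    lstm = nn.LSTM(2, 3, 1)  # (input_size, hidden_size, num_layers)
    x = torch.randn(5, 1, 2)   # (seq_len, batch_size, input_size)
    h0 = torch.zeros(1, 1, 3)  # (num_layers, batch_size, hidden_size)
    c0 = torch.zeros(1, 1, 3)  # the cell state has the same shape
    output, (hn, cn) = lstm(x, (h0, c0))  # state in and out is the tuple (h, c)
    print(output.shape, hn.shape, cn.shape)  # (5,1,3), (1,1,3), (1,1,3)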

    Detailed explanation of the RNN parameters in PyTorch (lwgkzl's blog, CSDN)

    Analysis of the torch.nn.RNN(input_size, hidden_size, num_layers) function (Hanjieee's blog, CSDN)


    Basic usage, inputs, and outputs of nn.RNN() in PyTorch (Fantine_Deng's blog, CSDN)

    Testing nn.RNN():

    import torch
    from torch import nn
    
    ######### Define the model and input #########
    rnn = nn.RNN(2, 3, 1)  # (input_size, hidden_size, num_layers)
    
    input = torch.randn(5, 1, 2)  # (seq_len, batch_size, input_size)
    h0 = torch.randn(1, 1, 3)  # (num_layers, batch_size, hidden_size)
    
    ######### Feed the input into the model #########
    output, hn = rnn(input, h0)
    
    ######### Inspect the model parameters #########
    print(rnn._parameters)
    
    # https://blog.csdn.net/Fantine_Deng/article/details/111356280

    OrderedDict([

    ('weight_ih_l0', Parameter containing:
    tensor([[-0.0892,  0.1417],
            [-0.3719,  0.1958],
            [-0.0948, -0.2139]], requires_grad=True)),

    ('weight_hh_l0', Parameter containing:
    tensor([[ 0.4076,  0.2693,  0.0957],
            [ 0.0461,  0.2012,  0.1977],
            [-0.3464, -0.3319, -0.5038]], requires_grad=True)),

    ('bias_ih_l0', Parameter containing:
    tensor([ 0.2128, -0.5474,  0.5349], requires_grad=True)),

    ('bias_hh_l0', Parameter containing:
    tensor([0.4805, 0.4561, 0.0080], requires_grad=True))])
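
    Note the shapes: weight_ih_l0 is (hidden_size, input_size) = (3, 2), weight_hh_l0 is (hidden_size, hidden_size) = (3, 3), and each bias is (hidden_size,) = (3,). As a quick sanity check on the outputs of the call above (a continuation of the test, not part of the book's code):

    # output stacks the hidden state at every time step; hn is the state after the last step
    print(output.shape)  # torch.Size([5, 1, 3])  -> (seq_len, batch_size, hidden_size)
    print(hn.shape)      # torch.Size([1, 1, 3])  -> (num_layers, batch_size, hidden_size)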

    Source code for the concise implementation of the RNN:

    import torch
    from torch import nn
    from torch.nn import functional as F
    from d2l import torch as d2l
    
    batch_size, num_steps = 32, 35
    train_iter, vocab = d2l.load_data_time_machine(batch_size, num_steps)
    
    num_hiddens = 256  # 256 hidden units
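    # num_layers defaults to 1 (and nonlinearity to 'tanh'), so this is a single-layer RNN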
    rnn_layer = nn.RNN(len(vocab), num_hiddens)
    
    state = torch.zeros((1, batch_size, num_hiddens))  # shape: (num_layers, batch_size, num_hiddens)
    print('state.shape', state.shape)
    
    X = torch.rand(size=(num_steps, batch_size, len(vocab)))
    print('X.shape', X.shape)
    Y, state_new = rnn_layer(X, state)
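    # Y stacks the hidden state at every time step: (num_steps, batch_size, num_hiddens);
    # state_new is the hidden state after the final step: (num_layers, batch_size, num_hiddens)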
    print('Y.shape', Y.shape)
    print('state_new.shape', state_new.shape)
    
    
    class RNNModel(nn.Module):
        """循环神经网络模型"""
        def __init__(self, rnn_layer, vocab_size, **kwargs):
            super(RNNModel, self).__init__(**kwargs)
            self.rnn = rnn_layer
            self.vocab_size = vocab_size
            self.num_hiddens = self.rnn.hidden_size
            # If the RNN is bidirectional (introduced later), num_directions is 2; otherwise it is 1
            if not self.rnn.bidirectional:
                self.num_directions = 1
                self.linear = nn.Linear(self.num_hiddens, self.vocab_size)
            else:
                self.num_directions = 2
                self.linear = nn.Linear(self.num_hiddens * 2, self.vocab_size)
    
        def forward(self, inputs, state):
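            # inputs arrives as (batch_size, num_steps); transposing puts the time axis
            # first, and one-hot encoding yields (num_steps, batch_size, vocab_size)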
            X = F.one_hot(inputs.T.long(), self.vocab_size)
            X = X.to(torch.float32)
            Y, state = self.rnn(X, state)
            # The fully connected layer first reshapes Y to (num_steps * batch_size, num_hiddens);
            # its output has shape (num_steps * batch_size, vocab_size).
            output = self.linear(Y.reshape((-1, Y.shape[-1])))
            return output, state
    
        def begin_state(self, device, batch_size=1):
            if not isinstance(self.rnn, nn.LSTM):
                # nn.GRU uses a tensor as the hidden state
                return torch.zeros((self.num_directions * self.rnn.num_layers,
                                    batch_size, self.num_hiddens), device=device)
            else:
                # nn.LSTM uses a tuple (h, c) as the hidden state
                return (torch.zeros((
                    self.num_directions * self.rnn.num_layers,
                    batch_size, self.num_hiddens), device=device),
                        torch.zeros((
                            self.num_directions * self.rnn.num_layers,
                            batch_size, self.num_hiddens), device=device))
    
    
    device = d2l.try_gpu()
    net = RNNModel(rnn_layer, vocab_size=len(vocab))
    net = net.to(device)
    print(d2l.predict_ch8('time traveller', 10, net, vocab, device))
    
    
    num_epochs, lr = 500, 1
    d2l.train_ch8(net, train_iter, vocab, lr, num_epochs, device)
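
    For reference, here is a minimal sketch of the greedy decoding that d2l.predict_ch8 performs (it relies on d2l's Vocab, which maps a token to its index via vocab[token] and back via vocab.idx_to_token): first run the prefix through the network to warm up the state, then repeatedly feed the most likely character back in.

    def predict(prefix, num_preds, net, vocab, device):
        """Generate num_preds characters after `prefix` (sketch of d2l.predict_ch8)."""
        state = net.begin_state(batch_size=1, device=device)
        outputs = [vocab[prefix[0]]]
        # the most recent output index, shaped (batch_size=1, num_steps=1)
        get_input = lambda: torch.tensor([outputs[-1]], device=device).reshape((1, 1))
        for y in prefix[1:]:  # warm-up: consume the prefix, building up the state
            _, state = net(get_input(), state)
            outputs.append(vocab[y])
        for _ in range(num_preds):  # greedy decoding: argmax over the vocabulary
            y, state = net(get_input(), state)
            outputs.append(int(y.argmax(dim=1).reshape(1)))
        return ''.join([vocab.idx_to_token[i] for i in outputs])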

    As before, the prediction from the untrained model is terrible: time traveller<unk>pzppppppp

    First 50 epochs:

    time traveller the the the the the the the the the the the the t
    time traveller and and and and and and and and and and and and a
    time traveller and the the that the the the the the the the the 
    time traveller the this this thing the this this this this this 
    time traveller the and he this thith sime thave the this the thi

    Epochs 450-500:

    time traveller held in his hand was a glitteree and so ou vabkt 
    time traveller held in his hald was that sxistertare pals gs our
    time traveller held in timy beti he trivellem but sowing and wny
    time travellerit s against reason said the medical man there are
    time traveller after the pauserequired for the grome was e begin

    perplexity 1.4, 83962.5 tokens/sec on cpu
    time traveller after the pauserequired for the grome was e begin

    Textbook conclusion: compared with the previous section, this model reaches a lower perplexity in less time, because the deep learning framework's high-level API applies more optimizations to the code.

  • Original post: https://www.cnblogs.com/hbuwyg/p/16366328.html