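• Translation with a seq2seq attention model in PyTorch


    Below is my walkthrough of the PyTorch 1.0 seq2seq + attention model for French-to-English translation (the code also runs on PyTorch 0.4):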

# -*- coding: utf-8 -*-
"""
Translation with a Sequence to Sequence Network and Attention
*************************************************************
**Author**: `Sean Robertson <https://github.com/spro/practical-pytorch>`_

In this project we will be teaching a neural network to translate from
French to English.

::

    [KEY: > input, = target, < output]

    > il est en train de peindre un tableau .
    = he is painting a picture .
    < he is painting a picture .

    > pourquoi ne pas essayer ce vin delicieux ?
    = why not try that delicious wine ?
    < why not try that delicious wine ?

    > elle n est pas poete mais romanciere .
    = she is not a poet but a novelist .
    < she not not a poet but a novelist .

    > vous etes trop maigre .
    = you re too skinny .
    < you re all alone .

... to varying degrees of success.

This is made possible by the simple but powerful idea of the `sequence
to sequence network <http://arxiv.org/abs/1409.3215>`__, in which two
recurrent neural networks work together to transform one sequence to
another. An encoder network condenses an input sequence into a vector,
and a decoder network unfolds that vector into a new sequence.

.. figure:: /_static/img/seq-seq-images/seq2seq.png
   :alt:

To improve upon this model we'll use an `attention
mechanism <https://arxiv.org/abs/1409.0473>`__, which lets the decoder
learn to focus on a specific range of the input sequence.

**Recommended Reading:**

I assume you have at least installed PyTorch, know Python, and
understand Tensors:

-  https://pytorch.org/ for installation instructions
-  :doc:`/beginner/deep_learning_60min_blitz` to get started with PyTorch in general
-  :doc:`/beginner/pytorch_with_examples` for a wide and deep overview
-  :doc:`/beginner/former_torchies_tutorial` if you are a former Lua Torch user


It would also be useful to know about Sequence to Sequence networks and
how they work:

-  `Learning Phrase Representations using RNN Encoder-Decoder for
   Statistical Machine Translation <http://arxiv.org/abs/1406.1078>`__
-  `Sequence to Sequence Learning with Neural
   Networks <http://arxiv.org/abs/1409.3215>`__
-  `Neural Machine Translation by Jointly Learning to Align and
   Translate <https://arxiv.org/abs/1409.0473>`__
-  `A Neural Conversational Model <http://arxiv.org/abs/1506.05869>`__

You will also find the previous tutorials on
:doc:`/intermediate/char_rnn_classification_tutorial`
and :doc:`/intermediate/char_rnn_generation_tutorial`
helpful as those concepts are very similar to the Encoder and Decoder
models, respectively.


**Requirements**
"""
from __future__ import unicode_literals, print_function, division
from io import open
import unicodedata
import string
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

######################################################################
# Loading data files
# ==================
#
# The data for this project is a set of many thousands of English to
# French translation pairs.
#
# `This question on Open Data Stack
# Exchange <http://opendata.stackexchange.com/questions/3888/dataset-of-sentences-translated-into-many-languages>`__
# pointed me to the open translation site http://tatoeba.org/ which has
# downloads available at http://tatoeba.org/eng/downloads - and better
# yet, someone did the extra work of splitting language pairs into
# individual text files here: http://www.manythings.org/anki/
#
# The English to French pairs are too big to include in the repo, so
# download to ``data/eng-fra.txt`` before continuing. The file is a
# tab-separated list of translation pairs:
#
# ::
#
#     I am cold.    J'ai froid.
#
# .. Note::
#    Download the data from
#    `here <https://download.pytorch.org/tutorial/data.zip>`_
#    and extract it to the current directory.

######################################################################
# Similar to the character encoding used in the character-level RNN
# tutorials, we will be representing each word in a language as a one-hot
# vector, or giant vector of zeros except for a single one (at the index
# of the word). Compared to the dozens of characters that might exist in a
# language, there are many more words, so the encoding vector is much
# larger. We will however cheat a bit and trim the data to only use a few
# thousand words per language.
#
# .. figure:: /_static/img/seq-seq-images/word-encoding.png
#    :alt:
#
#

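```python
######################################################################
# A minimal sketch of the one-hot encoding described above, using a tiny
# hypothetical vocabulary (``toy_word2index`` and ``toy_embedding`` are
# made up for illustration). The models below never build one-hot vectors
# explicitly: ``nn.Embedding`` looks up a row of its weight matrix by word
# index. That gives the same result as multiplying a one-hot vector by the
# matrix, as the last line checks.

def one_hot(index, vocab_size):
    """Return a vector of zeros with a single 1 at ``index``."""
    if not 0 <= index < vocab_size:
        raise IndexError("word index outside the vocabulary")
    vec = [0] * vocab_size
    vec[index] = 1
    return vec


def one_hot_times_matrix(vec, matrix):
    """Row vector (1 x V) times matrix (V x D), written out by hand."""
    n_cols = len(matrix[0])
    return [sum(v * matrix[r][c] for r, v in enumerate(vec)) for c in range(n_cols)]


toy_word2index = {"SOS": 0, "EOS": 1, "je": 2, "suis": 3, "froid": 4}
toy_vocab_size = len(toy_word2index)
# Hypothetical 5 x 3 "embedding" matrix, one row per word.
toy_embedding = [[r + 0.1 * c for c in range(3)] for r in range(toy_vocab_size)]

toy_suis_vec = one_hot(toy_word2index["suis"], toy_vocab_size)
print(toy_suis_vec)  # [0, 0, 0, 1, 0]
# Multiplying by a one-hot vector just selects one row (an embedding lookup).
assert one_hot_times_matrix(toy_suis_vec, toy_embedding) == toy_embedding[3]
```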

######################################################################
# We'll need a unique index per word to use as the inputs and targets of
# the networks later. To keep track of all this we will use a helper class
# called ``Lang`` which has word → index (``word2index``) and index → word
# (``index2word``) dictionaries, as well as a count of each word
# ``word2count`` to use to later replace rare words.
#

SOS_token = 0
EOS_token = 1


# A Lang object represents the source or target language. It holds three
# dictionaries: word2index (word → id), index2word (id → word) and
# word2count (word → frequency). word2count can be used to filter out
# low-frequency words (e.g. by mapping them to an "unknown" token).

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)  # add each word of the sentence

    def addWord(self, word):
        if word not in self.word2index:  # is this a new word?
            # Not in word2index yet: register it under the next free index
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1  # the next new word gets index n_words
        else:
            self.word2count[word] += 1  # count another occurrence of a known word

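```python
######################################################################
# ``Lang`` records ``word2count`` but never uses it. A minimal sketch of
# the rare-word replacement it is meant for, assuming a hypothetical
# ``UNK`` token and ``min_count`` threshold (neither is part of ``Lang``):
# count the words first, then map every word seen fewer than ``min_count``
# times to ``UNK`` before building the index.

from collections import Counter


def replace_rare_words(sentences, min_count=2, unk="UNK"):
    """Return sentences with rare words replaced by ``unk`` (hypothetical helper)."""
    counts = Counter(word for s in sentences for word in s.split(' '))
    return [' '.join(w if counts[w] >= min_count else unk for w in s.split(' '))
            for s in sentences]


toy_sentences = ["i am cold .", "i am tired .", "he is cold ."]
print(replace_rare_words(toy_sentences))
# ['i am cold .', 'i am UNK .', 'UNK UNK cold .']
```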

######################################################################
# The files are all in Unicode. To simplify, we will turn Unicode
# characters to ASCII, make everything lowercase, and trim most
# punctuation.
#

# Turn a Unicode string to plain ASCII, thanks to
# http://stackoverflow.com/a/518232/2809427
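def unicodeToAscii(s):
    # NFD splits an accented letter into its base letter plus a combining
    # mark (Unicode category 'Mn'); dropping the marks leaves plain ASCII letters
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )


# Lowercase, trim, and remove non-letter characters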
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)  # put a space before . ! ? so they become separate tokens
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)  # replace every other run of characters with one space
    return s

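```python
######################################################################
# A quick check of the two regular expressions in ``normalizeString``
# (the sample sentences are made up). ``r" \1"`` is a backreference: it
# re-inserts the matched punctuation mark after a space, so "froid."
# becomes "froid ." and the period becomes its own token. Without the
# backslash, ``r" 1"`` would replace every ".", "!" or "?" with the digit
# 1, which the second regex then turns into a space, losing the punctuation.

import re
import unicodedata


def normalize_sketch(s):
    # Same steps as unicodeToAscii + normalizeString above.
    s = ''.join(c for c in unicodedata.normalize('NFD', s.lower().strip())
                if unicodedata.category(c) != 'Mn')
    s = re.sub(r"([.!?])", r" \1", s)      # split punctuation into its own token
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)  # collapse everything else to one space
    return s


print(normalize_sketch("J'ai froid."))        # j ai froid .
print(normalize_sketch("Ça va? Très bien!"))  # ca va ? tres bien !
```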

######################################################################
# To read the data file we will split the file into lines, and then split
# lines into pairs. The files are all English → Other Language, so to
# translate from Other Language → English I added the ``reverse`` flag,
# which reverses the pairs.
#

def readLangs(lang1, lang2, reverse=False):
    print("Reading lines...")

    # Read the file and split into lines
    with open('data/%s-%s.txt' % (lang1, lang2), encoding='utf-8') as f:
        lines = f.read().strip().split('\n')

    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]

    # Reverse pairs, make Lang instances
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
        input_lang = Lang(lang2)
        output_lang = Lang(lang1)
    else:
        input_lang = Lang(lang1)
        output_lang = Lang(lang2)

    return input_lang, output_lang, pairs

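```python
######################################################################
# A minimal sketch of what ``readLangs`` does to each line, using an
# in-memory string instead of ``data/eng-fra.txt`` (the second line is
# made up, and normalization is skipped). Each line holds
# "English<TAB>French". With ``reverse=True`` every pair becomes
# [French, English], so French is the input and English the target.

def split_pairs(text, reverse=False):
    lines = text.strip().split('\n')
    pairs = [line.split('\t') for line in lines]
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
    return pairs


toy_text = "I am cold.\tJ'ai froid.\nHe is tall.\tIl est grand.\n"
print(split_pairs(toy_text, reverse=True))
# [["J'ai froid.", 'I am cold.'], ['Il est grand.', 'He is tall.']]
```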

######################################################################
# Since there are a *lot* of example sentences and we want to train
# something quickly, we'll trim the data set to only relatively short and
# simple sentences. Here sentences must be shorter than ``MAX_LENGTH`` = 10
# words (ending punctuation counts as a word), and we're filtering to
# sentences that translate to the form "I am" or "He is" etc. (accounting
# for apostrophes replaced earlier).
#

MAX_LENGTH = 10

eng_prefixes = (
    "i am ", "i m ",
    "he is", "he s ",
    "she is", "she s",
    "you are", "you re ",
    "we are", "we re ",
    "they are", "they re "
)


def filterPair(p):
    # Keep a pair only if both sentences are shorter than MAX_LENGTH words and
    # the English side (p[1], after reversing) starts with one of eng_prefixes
    return len(p[0].split(' ')) < MAX_LENGTH and \
        len(p[1].split(' ')) < MAX_LENGTH and \
        p[1].startswith(eng_prefixes)


def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

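```python
######################################################################
# A small check of the filter with made-up pairs (``toy_prefixes`` is a
# subset of ``eng_prefixes``). Sentences are counted in space-separated
# tokens, with the final "." counting as one. The check is a strict ``<``,
# so a kept sentence has at most MAX_LENGTH - 1 tokens. After
# ``tensorFromSentence`` appends EOS it is at most MAX_LENGTH long, which
# is why ``train`` can store one encoder output per input position in a
# buffer with MAX_LENGTH rows.

TOY_MAX_LENGTH = 10
toy_prefixes = ("i am ", "i m ", "he is", "he s ")


def keep_pair(pair, max_length=TOY_MAX_LENGTH, prefixes=toy_prefixes):
    src, tgt = pair
    return (len(src.split(' ')) < max_length and
            len(tgt.split(' ')) < max_length and
            tgt.startswith(prefixes))


toy_pairs = [
    ["j ai froid .", "i am cold ."],                                  # kept
    ["il fait beau .", "it is sunny ."],                              # wrong prefix
    ["je suis " + "tres " * 8 + "fatigue .", "i am very tired ."],    # 12 tokens: too long
]
print([keep_pair(p) for p in toy_pairs])  # [True, False, False]
```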

######################################################################
# The full process for preparing the data is:
#
# -  Read text file and split into lines, split lines into pairs
# -  Normalize text, filter by length and content
# -  Make word lists from sentences in pairs
#

def prepareData(lang1, lang2, reverse=False):
    # Read the lang1-lang2 file, reversing the pairs if requested
    input_lang, output_lang, pairs = readLangs(lang1, lang2, reverse)
    print("Read %s sentence pairs" % len(pairs))  # number of pairs read
    pairs = filterPairs(pairs)  # keep only the pairs that pass filterPair
    print("Trimmed to %s sentence pairs" % len(pairs))
    print("Counting words...")
    for pair in pairs:
        input_lang.addSentence(pair[0])
        output_lang.addSentence(pair[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, pairs


# Preprocess the data: French is the input and English the target (reverse=True)
input_lang, output_lang, pairs = prepareData('eng', 'fra', True)
print(random.choice(pairs))  # show one random pair


######################################################################
# The Seq2Seq Model
# =================
#
# A Recurrent Neural Network, or RNN, is a network that operates on a
# sequence and uses its own output as input for subsequent steps.
#
# A `Sequence to Sequence network <http://arxiv.org/abs/1409.3215>`__, or
# seq2seq network, or `Encoder Decoder
# network <https://arxiv.org/pdf/1406.1078v3.pdf>`__, is a model
# consisting of two RNNs called the encoder and decoder. The encoder reads
# an input sequence and outputs a single vector, and the decoder reads
# that vector to produce an output sequence.
#
# .. figure:: /_static/img/seq-seq-images/seq2seq.png
#    :alt:
#
# Unlike sequence prediction with a single RNN, where every input
# corresponds to an output, the seq2seq model frees us from sequence
# length and order, which makes it ideal for translation between two
# languages.
#
# Consider the sentence "Je ne suis pas le chat noir" → "I am not the
# black cat". Most of the words in the input sentence have a direct
# translation in the output sentence, but are in slightly different
# orders, e.g. "chat noir" and "black cat". Because of the "ne/pas"
# construction there is also one more word in the input sentence. It would
# be difficult to produce a correct translation directly from the sequence
# of input words.
#
# With a seq2seq model the encoder creates a single vector which, in the
# ideal case, encodes the "meaning" of the input sequence into a single
# vector: a single point in some N dimensional space of sentences.
#


######################################################################
# The Encoder
# -----------
#
# The encoder of a seq2seq network is an RNN that outputs some value for
# every word from the input sentence. For every input word the encoder
# outputs a vector and a hidden state, and uses the hidden state for the
# next input word.
#
# .. figure:: /_static/img/seq-seq-images/encoder-network.png
#    :alt:
#
#

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size  # size of the hidden state (and of each embedding)
        self.embedding = nn.Embedding(input_size, hidden_size)
        # nn.Embedding(num_words, dim) is a num_words x dim lookup table:
        # e.g. nn.Embedding(2, 4) is a 2x4 matrix (2 words, 4 dimensions each),
        # and 100 words with 10 dimensions each would be nn.Embedding(100, 10).
        # These vectors start out randomly initialized; training updates them
        # so that each word ends up with its own useful representation.
        self.gru = nn.GRU(hidden_size, hidden_size)  # single-layer GRU

    def forward(self, input, hidden):
        # view(1, 1, -1) reshapes the embedding to (seq_len=1, batch=1, hidden_size);
        # -1 lets PyTorch infer that dimension
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):  # initial hidden state: all zeros
        return torch.zeros(1, 1, self.hidden_size, device=device)

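```python
######################################################################
# A minimal sketch of the encoding loop that ``train`` runs over
# ``EncoderRNN``, with a scalar tanh RNN standing in for ``nn.GRU`` (the
# weights ``w_x``/``w_h`` and the inputs are made up). The encoder is
# called once per word, and each step's output goes into a buffer with
# ``max_length`` slots. Unused slots stay 0, and the last hidden state is
# what the decoder starts from. For a single-layer RNN the per-step output
# equals the new hidden state.

import math


def toy_rnn_step(x, h, w_x=0.5, w_h=0.8):
    """One step of a scalar tanh RNN: returns (output, new_hidden)."""
    h_new = math.tanh(w_x * x + w_h * h)
    return h_new, h_new


def toy_encode(inputs, max_length=10):
    outputs = [0.0] * max_length  # plays the role of encoder_outputs
    hidden = 0.0                  # plays the role of encoder.initHidden()
    for ei, x in enumerate(inputs):
        out, hidden = toy_rnn_step(x, hidden)
        outputs[ei] = out
    return outputs, hidden


toy_outs, toy_last_hidden = toy_encode([1.0, -1.0, 0.5])
print(toy_outs[:4], toy_last_hidden)
```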

######################################################################
# The Decoder
# -----------
#
# The decoder is another RNN that takes the encoder output vector(s) and
# outputs a sequence of words to create the translation.
#


######################################################################
# Simple Decoder
# ^^^^^^^^^^^^^^
#
# In the simplest seq2seq decoder we use only the last output of the
# encoder. This last output is sometimes called the *context vector* as it
# encodes context from the entire sequence. This context vector is used as
# the initial hidden state of the decoder.
#
# At every step of decoding, the decoder is given an input token and
# hidden state. The initial input token is the start-of-string ``<SOS>``
# token, and the first hidden state is the context vector (the encoder's
# last hidden state).
#
# .. figure:: /_static/img/seq-seq-images/decoder-network.png
#    :alt:
#
#

class DecoderRNN(nn.Module):
    # DecoderRNN is structured much like EncoderRNN; the figure above shows the data flow
    def __init__(self, hidden_size, output_size):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        # view(1, 1, -1) reshapes to (seq_len=1, batch=1, hidden_size); -1 is inferred
        output = self.embedding(input).view(1, 1, -1)
        output = F.relu(output)
        output, hidden = self.gru(output, hidden)  # one GRU step
        # Project to vocabulary size and apply log-softmax
        # (the second-to-last box on the left of the figure)
        output = self.softmax(self.out(output[0]))
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

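```python
######################################################################
# A minimal sketch of how a decoder is driven at inference time (greedy
# decoding). Start from SOS, feed each predicted token back in as the next
# input, and stop at EOS or after ``max_length`` steps. ``toy_next_token``
# is a hypothetical stand-in for ``DecoderRNN.forward`` followed by
# ``topk(1)``; a real decoder would also carry its hidden state along.
# Here EOS is not included in the result.

TOY_SOS, TOY_EOS = 0, 1


def toy_next_token(prev_token):
    # Made-up "model": a fixed table of next-token predictions.
    table = {TOY_SOS: 5, 5: 7, 7: 9, 9: TOY_EOS}
    return table.get(prev_token, TOY_EOS)


def greedy_decode(step, max_length=10):
    decoded = []
    token = TOY_SOS
    for _ in range(max_length):
        token = step(token)   # the prediction becomes the next input
        if token == TOY_EOS:
            break
        decoded.append(token)
    return decoded


print(greedy_decode(toy_next_token))  # [5, 7, 9]
```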

######################################################################
# I encourage you to train and observe the results of this model, but to
# save space we'll be going straight for the gold and introducing the
# Attention Mechanism.
#


######################################################################
# Attention Decoder
# ^^^^^^^^^^^^^^^^^
#
# If only the context vector is passed between the encoder and decoder,
# that single vector carries the burden of encoding the entire sentence.
#
# Attention allows the decoder network to "focus" on a different part of
# the encoder's outputs for every step of the decoder's own outputs. First
# we calculate a set of *attention weights*. These will be multiplied by
# the encoder output vectors to create a weighted combination. The result
# (called ``attn_applied`` in the code) should contain information about
# that specific part of the input sequence, and thus help the decoder
# choose the right output words.
#
# .. figure:: https://i.imgur.com/1152PYf.png
#    :alt:
#
# Calculating the attention weights is done with another feed-forward
# layer ``attn``, using the decoder's input and hidden state as inputs.
# Because there are sentences of all sizes in the training data, to
# actually create and train this layer we have to choose a maximum
# sentence length (input length, for encoder outputs) that it can apply
# to. Sentences of the maximum length will use all the attention weights,
# while shorter sentences will only use the first few.
#
# .. figure:: /_static/img/seq-seq-images/attention-decoder-network.png
#    :alt:
#
#

class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):
        # Embed the input token and apply dropout
        # (dropout randomly zeroes some activations during training)
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        # Attention weights: the attn layer scores all max_length encoder
        # positions from the current input embedding and hidden state, and
        # softmax normalizes the scores. torch.cat joins tensors along an
        # existing dimension (here dim 1), whereas torch.stack adds a new one.
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)

        # Apply the attention weights to encoder_outputs.
        # torch.bmm is a batched matrix multiply: if batch1 is b×n×m and
        # batch2 is b×m×p, the result is b×n×p. Here
        # (1×1×max_length) @ (1×max_length×hidden_size) -> (1×1×hidden_size).
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))
        # Concatenate the embedded input with the attention-weighted context
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        # Combine back to hidden_size; unsqueeze(0) inserts a new dimension
        # of size 1 at position 0 to restore the (1, 1, hidden_size) shape
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

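```python
######################################################################
# A minimal pure-Python sketch of the two attention steps in
# ``AttnDecoderRNN.forward``, with made-up numbers (no PyTorch needed).
# ``toy_softmax`` turns the ``max_length`` scores that ``self.attn`` would
# produce into weights summing to 1. ``weighted_sum`` is what the
# ``torch.bmm`` call computes: a (1 x L) by (L x H) product, i.e. a
# weighted sum of encoder output rows. Rows beyond the input length are
# still zero in ``encoder_outputs``, so any weight they receive adds
# nothing to ``attn_applied``.

import math


def toy_softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, rows):
    """(1 x L) @ (L x H): sum over i of weights[i] * rows[i]."""
    width = len(rows[0])
    return [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(width)]


# max_length = 4, hidden_size = 2: two real encoder outputs, two zero rows.
toy_encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
toy_scores = [2.0, 0.0, 1.0, 1.0]  # hypothetical output of self.attn
toy_attn_weights = toy_softmax(toy_scores)
toy_attn_applied = weighted_sum(toy_attn_weights, toy_encoder_outputs)
print([round(w, 3) for w in toy_attn_weights])
print([round(v, 3) for v in toy_attn_applied])
```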

######################################################################
# .. note:: There are other forms of attention that work around the length
#   limitation by using a relative position approach. Read about "local
#   attention" in `Effective Approaches to Attention-based Neural Machine
#   Translation <https://arxiv.org/abs/1508.04025>`__.
#
# Training
# ========
#
# Preparing Training Data
# -----------------------
#
# To train, for each pair we will need an input tensor (indexes of the
# words in the input sentence) and target tensor (indexes of the words in
# the target sentence). While creating these vectors we will append the
# EOS token to both sequences.
#

def indexesFromSentence(lang, sentence):
    return [lang.word2index[word] for word in sentence.split(' ')]


def tensorFromSentence(lang, sentence):
    # Look up the index of every word
    indexes = indexesFromSentence(lang, sentence)
    # Append the EOS token to mark the end of the sequence
    indexes.append(EOS_token)
    # Column tensor of shape (len(indexes), 1): one row per time step
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)


def tensorsFromPair(pair):
    # For each pair: an input tensor (indexes of the words in the input
    # sentence) and a target tensor (indexes of the words in the target sentence)
    input_tensor = tensorFromSentence(input_lang, pair[0])
    target_tensor = tensorFromSentence(output_lang, pair[1])
    return (input_tensor, target_tensor)

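```python
######################################################################
# A minimal sketch of the conversion done by ``tensorFromSentence``. It
# uses a hypothetical vocabulary (``toy_fr_word2index``) and plain lists
# instead of tensors: look up each word's index, then append EOS.
# ``view(-1, 1)`` in the original only reshapes the result into a column
# (one row per time step). A word that never occurred in the training
# pairs has no entry in ``word2index``, so the lookup raises ``KeyError``.

TOY_EOS_TOKEN = 1
# Indices 0 and 1 are reserved for SOS and EOS, as in Lang.
toy_fr_word2index = {"j": 2, "ai": 3, "froid": 4, ".": 5}


def indexes_with_eos(word2index, sentence):
    indexes = [word2index[word] for word in sentence.split(' ')]
    indexes.append(TOY_EOS_TOKEN)
    return [[i] for i in indexes]  # column shape, like .view(-1, 1)


print(indexes_with_eos(toy_fr_word2index, "j ai froid ."))  # [[2], [3], [4], [5], [1]]
```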

######################################################################
# Training the Model
# ------------------
#
# To train we run the input sentence through the encoder, and keep track
# of every output and the latest hidden state. Then the decoder is given
# the ``<SOS>`` token as its first input, and the last hidden state of the
# encoder as its first hidden state.
#
# "Teacher forcing" is the concept of using the real target outputs as
# each next input, instead of using the decoder's guess as the next input.
# Using teacher forcing causes it to converge faster but `when the trained
# network is exploited, it may exhibit
# instability <http://minds.jacobs-university.de/sites/default/files/uploads/papers/ESNTutorialRev.pdf>`__.
#
# You can observe outputs of teacher-forced networks that read with
# coherent grammar but wander far from the correct translation -
# intuitively it has learned to represent the output grammar and can "pick
# up" the meaning once the teacher tells it the first few words, but it
# has not properly learned how to create the sentence from the translation
# in the first place.
#
# Because of the freedom PyTorch's autograd gives us, we can randomly
# choose to use teacher forcing or not with a simple if statement. Turn
# ``teacher_forcing_ratio`` up to use more of it.
#

teacher_forcing_ratio = 0.5


# teacher_forcing_ratio is the probability that a given training pair is
# trained with teacher forcing (see above for the trade-off).
# train() performs one training step on a single (input, target) pair.


def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion,
          max_length=MAX_LENGTH):
    # encoder is EncoderRNN(input_lang.n_words, hidden_size)
    # decoder is AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1)
    # hidden_size = 256 in this tutorial
    encoder_hidden = encoder.initHidden()

    # encoder_optimizer is optim.SGD(encoder.parameters(), lr=learning_rate)
    # decoder_optimizer is optim.SGD(decoder.parameters(), lr=learning_rate)
    # parameters() yields every nn.Parameter assigned as a module attribute
    # (nn.Parameter is a Tensor subclass; a Variable subclass before PyTorch 0.4).
    # Plain tensors assigned to a module, such as a cached RNN hidden state, are
    # not registered, so they are neither returned by parameters() nor updated
    # by the optimizer.
    # Clear the gradients left over from the previous step:
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    # Sequence lengths (number of time steps, including EOS)
    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    # Buffer for one encoder output per input position (unused rows stay zero)
    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    loss = 0

    # Encoding loop: feed the input one word at a time
    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
        # encoder_output has shape (1, 1, hidden_size); [0, 0] takes the
        # hidden_size vector for this time step
        encoder_outputs[ei] = encoder_output[0, 0]

    # The decoder's first input is the SOS token
    decoder_input = torch.tensor([[SOS_token]], device=device)

    # ... and its first hidden state is the encoder's last hidden state
    decoder_hidden = encoder_hidden

    use_teacher_forcing = random.random() < teacher_forcing_ratio

    if use_teacher_forcing:
        # Teacher forcing: Feed the target as the next input
        for di in range(target_length):
            # One decoder step
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            # Accumulate the loss against the target word
            loss += criterion(decoder_output, target_tensor[di])
            decoder_input = target_tensor[di]  # Teacher forcing

    else:
        # Without teacher forcing: use its own predictions as the next input
        for di in range(target_length):
            # One decoder step
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)

            # topk(1) returns the largest value along the last dimension and its
            # index as a (values, indices) tuple; the index is the predicted word
            topv, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze().detach()  # detach from history as input

            loss += criterion(decoder_output, target_tensor[di])
            if decoder_input.item() == EOS_token:
                break
    # Backpropagate
    loss.backward()

    # Update the parameters
    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length

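```python
######################################################################
# A minimal sketch of what teacher forcing changes in ``train``: only the
# input fed to the decoder at each step. ``toy_predict`` is a hypothetical
# stand-in for one decoder step plus ``topk(1)``; it maps each input token
# to a prediction and gets the third word wrong (9 instead of 7). With
# teacher forcing the next input is always the target token. Without it,
# the model's own prediction is fed back, and decoding stops once it
# predicts EOS.

import random

TOY_SOS, TOY_EOS = 0, 1


def toy_predict(prev_token):
    table = {TOY_SOS: 5, 5: 6, 6: 9, 7: 8, 8: TOY_EOS, 9: TOY_EOS}
    return table.get(prev_token, TOY_EOS)


def decoder_inputs(target, teacher_forcing):
    """Return the sequence of tokens fed into the decoder."""
    inputs = []
    token = TOY_SOS
    for t in target:
        inputs.append(token)
        prediction = toy_predict(token)
        if teacher_forcing:
            token = t             # feed the ground-truth word
        else:
            token = prediction    # feed the model's own guess
            if prediction == TOY_EOS:
                break
    return inputs


toy_target = [5, 6, 7, 8, TOY_EOS]
print(decoder_inputs(toy_target, teacher_forcing=True))   # [0, 5, 6, 7, 8]
print(decoder_inputs(toy_target, teacher_forcing=False))  # [0, 5, 6, 9]

# teacher_forcing_ratio is a per-pair probability: with 0.5, roughly half
# of the calls to train() use teacher forcing.
toy_rng = random.Random(0)
toy_share = sum(toy_rng.random() < 0.5 for _ in range(10000)) / 10000
print(toy_share)
```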

######################################################################
# This is a helper function to print time elapsed and estimated time
# remaining given the current time and progress %.
#

import time
import math


def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def timeSince(since, percent):
    now = time.time()
    s = now - since          # elapsed seconds
    es = s / (percent)       # estimated total seconds at this rate
    rs = es - s              # estimated remaining seconds
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))

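```python
######################################################################
# A worked example of the estimate in ``timeSince``, with made-up numbers.
# If 30 s have passed at 25 % progress, the projected total is
# 30 / 0.25 = 120 s, so about 90 s remain. The sketch repeats the
# arithmetic with an explicit elapsed time instead of ``time.time()`` so
# that the result is reproducible.

import math


def as_minutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def time_since_sketch(elapsed, percent):
    estimated_total = elapsed / percent
    remaining = estimated_total - elapsed
    return '%s (- %s)' % (as_minutes(elapsed), as_minutes(remaining))


print(time_since_sketch(30, 0.25))  # 0m 30s (- 1m 30s)
```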

######################################################################
# The whole training process looks like this:
#
# -  Start a timer
# -  Initialize optimizers and criterion
# -  Create set of training pairs
# -  Start empty losses array for plotting
#
# Then we call ``train`` many times and occasionally print the progress (%
# of examples, time so far, estimated time) and average loss.
#

def trainIters(encoder, decoder, n_iters, print_every=1000, plot_every=100, learning_rate=0.01):
    start = time.time()
    plot_losses = []
    print_loss_total = 0  # Reset every print_every
    plot_loss_total = 0  # Reset every plot_every

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)

    # Sample n_iters training pairs (with replacement) and convert them to tensors
    training_pairs = [tensorsFromPair(random.choice(pairs))
                      for i in range(n_iters)]
    # Negative log-likelihood loss, matching the decoder's log-softmax output
    criterion = nn.NLLLoss()

    for iter in range(1, n_iters + 1):
        training_pair = training_pairs[iter - 1]
        input_tensor = training_pair[0]
        target_tensor = training_pair[1]

        # One training step; returns the loss averaged over the target length
        loss = train(input_tensor, target_tensor, encoder,
                     decoder, encoder_optimizer, decoder_optimizer, criterion)
        print_loss_total += loss
        plot_loss_total += loss

        if iter % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0
            # Print progress (elapsed and remaining time, iteration, % done) and average loss
            print('%s (%d %d%%) %.4f' % (timeSince(start, iter / n_iters),
                                         iter, iter / n_iters * 100, print_loss_avg))

        if iter % plot_every == 0:
            plot_loss_avg = plot_loss_total / plot_every
            plot_losses.append(plot_loss_avg)
            plot_loss_total = 0
    # Plot the loss curve
    showPlot(plot_losses)

    761 
    ######################################################################
    # Plotting results
    # ----------------
    #
    # Plotting is done with matplotlib, using the array of loss values
    # ``plot_losses`` saved while training.
    #

    import matplotlib.pyplot as plt

    # Non-interactive backend (e.g. for headless servers). With 'agg',
    # plt.show() opens no window; save figures with plt.savefig() instead.
    plt.switch_backend('agg')
    import matplotlib.ticker as ticker
    import numpy as np


    def showPlot(points):
        # Create a figure with a single axes.
        fig, ax = plt.subplots()
        # this locator puts ticks at regular intervals (every 0.2 on the y axis)
        loc = ticker.MultipleLocator(base=0.2)
        ax.yaxis.set_major_locator(loc)
        ax.plot(points)


    ######################################################################
    # Evaluation
    # ==========
    #
    # Evaluation is mostly the same as training, but there are no targets so
    # we simply feed the decoder's predictions back to itself for each step.
    # Every time it predicts a word we add it to the output string, and if it
    # predicts the EOS token we stop there. We also store the decoder's
    # attention outputs for display later.
    #

    def evaluate(encoder, decoder, sentence, max_length=MAX_LENGTH):
        # No gradients are needed at inference time.
        with torch.no_grad():
            # Convert the sentence into a tensor of word indices.
            input_tensor = tensorFromSentence(input_lang, sentence)
            # Number of input tokens (must not exceed max_length).
            input_length = input_tensor.size()[0]

            # In this script, `encoder` is EncoderRNN(input_lang.n_words, hidden_size)
            # and `decoder` is AttnDecoderRNN(hidden_size, output_lang.n_words,
            # dropout_p=0.1), with hidden_size = 256.
            encoder_hidden = encoder.initHidden()

            # Buffer for the encoder outputs, zero-padded up to max_length.
            encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

            # Encoder pass: feed the input one token at a time.
            for ei in range(input_length):
                encoder_output, encoder_hidden = encoder(input_tensor[ei],
                                                         encoder_hidden)
                encoder_outputs[ei] += encoder_output[0, 0]

            # The first decoder input is the start-of-sentence token.
            decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

            # The decoder starts from the encoder's final hidden state.
            decoder_hidden = encoder_hidden

            decoded_words = []
            decoder_attentions = torch.zeros(max_length, max_length)

            for di in range(max_length):
                # One decoder step.
                decoder_output, decoder_hidden, decoder_attention = decoder(
                    decoder_input, decoder_hidden, encoder_outputs)

                # Store this step's attention weights (one row per output step).
                decoder_attentions[di] = decoder_attention.data
                # Greedy choice: topk(1) returns the highest score (topv) and its token index (topi).
                topv, topi = decoder_output.data.topk(1)
                if topi.item() == EOS_token:
                    decoded_words.append('<EOS>')
                    break
                else:
                    # Map the token index back to a word.
                    decoded_words.append(output_lang.index2word[topi.item()])

                # Feed the prediction back as the next input.
                decoder_input = topi.squeeze().detach()

            return decoded_words, decoder_attentions[:di + 1]


    ######################################################################
    # We can evaluate random sentences from the training set and print out the
    # input, target, and output to make some subjective quality judgements:
    #

    def evaluateRandomly(encoder, decoder, n=10):
        for i in range(n):
            pair = random.choice(pairs)
            print('>', pair[0])  # input (French)
            print('=', pair[1])  # target (English)
            output_words, attentions = evaluate(encoder, decoder, pair[0])
            output_sentence = ' '.join(output_words)
            print('<', output_sentence)  # model output
            print('')


    ######################################################################
    # Training and Evaluating
    # =======================
    #
    # With all these helper functions in place (it looks like extra work, but
    # it makes it easier to run multiple experiments) we can actually
    # initialize a network and start training.
    #
    # Remember that the input sentences were heavily filtered. For this small
    # dataset we can use relatively small networks of 256 hidden nodes and a
    # single GRU layer. After about 40 minutes on a MacBook CPU we'll get some
    # reasonable results.
    #
    # .. Note::
    #    If you run this notebook you can train, interrupt the kernel,
    #    evaluate, and continue training later. Comment out the lines where the
    #    encoder and decoder are initialized and run ``trainIters`` again.
    #

    hidden_size = 256
    # Encoder
    encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
    # Decoder with attention
    attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)
    # Train for 75,000 iterations, printing progress every 5,000
    trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

    ######################################################################
    # Evaluate on a few random training pairs.
    evaluateRandomly(encoder1, attn_decoder1)

    ######################################################################
    # Visualizing Attention
    # ---------------------
    #
    # A useful property of the attention mechanism is its highly interpretable
    # outputs. Because it is used to weight specific encoder outputs of the
    # input sequence, we can look at where the network focuses most at each
    # time step.
    #
    # You could simply run ``plt.matshow(attentions)`` to see attention output
    # displayed as a matrix, with the columns being input steps and rows being
    # output steps:
    #

    output_words, attentions = evaluate(encoder1, attn_decoder1, "je suis trop froid .")
    plt.matshow(attentions.numpy())


    ######################################################################
    # For a better viewing experience we will do the extra work of adding axes
    # and labels:

    def showAttention(input_sentence, output_words, attentions):
        # Set up figure with colorbar
        fig = plt.figure()
        ax = fig.add_subplot(111)
        cax = ax.matshow(attentions.numpy(), cmap='bone')
        fig.colorbar(cax)

        # Set up axes: one tick per input token (columns) and per output word (rows).
        # Explicit tick positions keep each label aligned with its matrix cell.
        input_labels = input_sentence.split(' ') + ['<EOS>']
        ax.set_xticks(range(len(input_labels)))
        ax.set_xticklabels(input_labels, rotation=90)
        ax.set_yticks(range(len(output_words)))
        ax.set_yticklabels(output_words)

        plt.show()


    def evaluateAndShowAttention(input_sentence):
        output_words, attentions = evaluate(
            encoder1, attn_decoder1, input_sentence)
        print('input =', input_sentence)
        print('output =', ' '.join(output_words))
        showAttention(input_sentence, output_words, attentions)


    evaluateAndShowAttention("elle a cinq ans de moins que moi .")
    evaluateAndShowAttention("elle est trop petit .")
    evaluateAndShowAttention("je ne crains pas de mourir .")
    evaluateAndShowAttention("c est un jeune directeur plein de talent .")

    ######################################################################
    # Exercises
    # =========
    #
    # -  Try with a different dataset
    #
    #    -  Another language pair
    #    -  Human → Machine (e.g. IOT commands)
    #    -  Chat → Response
    #    -  Question → Answer
    #
    # -  Replace the embeddings with pre-trained word embeddings such as word2vec or
    #    GloVe
    # -  Try with more layers, more hidden units, and more sentences. Compare
    #    the training time and results.
    # -  If you use a translation file where pairs have two of the same phrase
    #    (``I am test \t I am test``), you can use this as an autoencoder. Try
    #    this:
    #
    #    -  Train as an autoencoder
    #    -  Save only the Encoder network
    #    -  Train a new Decoder for translation from there
    #
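A note on `timeSince` in the listing: it extrapolates linearly. If `s` seconds have covered the fraction `percent` of all iterations, the whole run is estimated at `s / percent` seconds, so roughly `s / percent - s` remain. The sketch below restates that arithmetic; `as_minutes`, `time_since` and the extra `now` parameter are hypothetical additions so the result can be checked without a real clock, and the range check on `percent` is an extra guard the original does not have.

```python
import math
import time


def as_minutes(seconds):
    """Format a duration in seconds as 'Xm Ys', like asMinutes."""
    minutes = math.floor(seconds / 60)
    seconds -= minutes * 60
    return '%dm %ds' % (minutes, seconds)


def time_since(since, percent, now=None):
    """Elapsed time plus linearly extrapolated remaining time.

    `now` is a hypothetical extra parameter (defaults to time.time()) so the
    function can be tested without a real clock.
    """
    if not 0 < percent <= 1:
        raise ValueError('percent must be in (0, 1]')
    if now is None:
        now = time.time()
    elapsed = now - since
    estimated_total = elapsed / percent  # assumes a constant speed per iteration
    remaining = estimated_total - elapsed
    return '%s (- %s)' % (as_minutes(elapsed), as_minutes(remaining))


# 5 minutes spent on the first 25% of the iterations -> about 15 minutes left.
print(time_since(since=0.0, percent=0.25, now=300.0))  # 5m 0s (- 15m 0s)
```

The estimate is only as good as the constant-speed assumption; early iterations that run slower or faster than average skew it.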
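`trainIters` keeps two running sums of the per-iteration loss: one is averaged over `print_every` iterations for the progress line, the other over `plot_every` iterations for `plot_losses`, and each is reset after use. This pure-Python sketch isolates that bookkeeping on a made-up list of losses; `average_losses` is a hypothetical helper, not part of the tutorial.

```python
def average_losses(losses, print_every, plot_every):
    """Replicate trainIters' loss bookkeeping on a list of per-iteration losses.

    Returns (printed, plotted): the averages that would be printed every
    `print_every` iterations and appended to plot_losses every `plot_every`.
    """
    printed, plotted = [], []
    print_total = 0.0
    plot_total = 0.0
    for it, loss in enumerate(losses, start=1):  # iterations are 1-based
        print_total += loss
        plot_total += loss
        if it % print_every == 0:
            printed.append(print_total / print_every)
            print_total = 0.0  # reset after each printed average
        if it % plot_every == 0:
            plotted.append(plot_total / plot_every)
            plot_total = 0.0   # reset after each plotted point


    return printed, plotted


losses = [4.0, 2.0, 3.0, 1.0, 2.0, 3.0]
print(average_losses(losses, print_every=3, plot_every=2))  # ([3.0, 2.0], [3.0, 2.0, 2.5])
```

Iterations left over after the last full `print_every` or `plot_every` window are never averaged, which matches the loop in `trainIters`.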
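`evaluate` decodes greedily: at each step `topk(1)` keeps only the highest-scoring token, which is fed back as the next input until `EOS_token` is produced or `max_length` steps have run, and one row of attention weights is stored per step. The framework-free sketch below mirrors that control flow; `greedy_decode`, the `step` callback, `toy_step` and the four-word vocabulary are hypothetical stand-ins for the decoder and `output_lang`, chosen only to make the loop and the `[:di + 1]` trimming visible.

```python
SOS_TOKEN, EOS_TOKEN = 0, 1


def greedy_decode(step, hidden, index2word, max_length=10):
    """Greedy decoding loop with the same structure as evaluate().

    `step(token, hidden)` stands in for the decoder and must return
    (scores, new_hidden, attention_row), where scores[i] scores token i.
    """
    decoded_words, attentions = [], []
    token = SOS_TOKEN
    for _ in range(max_length):
        scores, hidden, attention_row = step(token, hidden)
        attentions.append(attention_row)  # one row per output step
        best = max(range(len(scores)), key=scores.__getitem__)  # like topk(1)
        if best == EOS_TOKEN:
            decoded_words.append('<EOS>')
            break
        decoded_words.append(index2word[best])
        token = best  # feed the prediction back as the next input
    # Only rows actually produced are kept, like decoder_attentions[:di + 1].
    return decoded_words, attentions


def toy_step(token, hidden):
    """Hypothetical decoder: emits tokens 2, 3, then EOS; `hidden` counts steps."""
    plan = [2, 3, EOS_TOKEN]
    scores = [0.0] * 4
    scores[plan[hidden]] = 1.0
    return scores, hidden + 1, [1.0 / 3] * 3


index2word = {0: 'SOS', 1: 'EOS', 2: 'i', 3: 'm'}
words, attn = greedy_decode(toy_step, 0, index2word)
print(words, len(attn))  # ['i', 'm', '<EOS>'] 3
```

Greedy decoding is cheap, but it cannot revise an early mistake, because each step commits to a single token.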
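The attention visualization relies on each row of weights being a probability distribution over encoder positions (non-negative, summing to 1) that weights the encoder outputs. The sketch below shows only that generic softmax and weighted-sum step in pure Python. The raw `scores` are made up; how `AttnDecoderRNN` actually produces them is defined in the model code and is not reproduced here, and `softmax`/`context_vector` are hypothetical helper names.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights, encoder_outputs):
    """Weighted sum of encoder output vectors (one vector per input position)."""
    hidden_size = len(encoder_outputs[0])
    return [sum(w * out[k] for w, out in zip(weights, encoder_outputs))
            for k in range(hidden_size)]


scores = [2.0, 0.5, -1.0]  # hypothetical raw attention scores
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = softmax(scores)
print([round(w, 3) for w in weights])
print(context_vector(weights, encoder_outputs))
```

Because the weights sum to 1, the context vector is a blend of encoder outputs dominated by the highest-scoring positions, which is what the bright cells in the attention plot show.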
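Reading the matrix returned by `evaluate`: rows are output steps and columns are input positions, so the input token the network relied on most for a given output word is the column with the largest weight in that row. The sketch below does that lookup on a made-up matrix; `strongest_alignment` is a hypothetical helper. Like `showAttention`, it labels the column after the last input word `<EOS>`, and it ignores any remaining columns, which correspond to the zero-padded positions of `encoder_outputs` up to `max_length`.

```python
def strongest_alignment(input_sentence, output_words, attentions):
    """Pair each output word with the input token that has its largest weight.

    `attentions` is a list of rows, one per output step; columns are input
    positions. Columns after the labelled tokens (padding up to max_length
    in the tutorial) are ignored.
    """
    input_tokens = input_sentence.split(' ') + ['<EOS>']
    alignment = []
    for word, row in zip(output_words, attentions):
        row = row[:len(input_tokens)]  # drop padded positions
        best = max(range(len(row)), key=row.__getitem__)
        alignment.append((word, input_tokens[best]))
    return alignment


# Hypothetical weights: 4 output steps x 5 input positions (the last one is padding).
example_attn = [
    [0.7, 0.2, 0.1, 0.0, 0.0],
    [0.1, 0.6, 0.3, 0.0, 0.0],
    [0.0, 0.1, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.2, 0.7, 0.1],
]
print(strongest_alignment("je suis froid", ["i", "m", "cold", "<EOS>"], example_attn))
```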
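For the autoencoder exercise, the training file only needs two identical, tab-separated columns per line, as in the `I am test \t I am test` example. A minimal sketch, assuming a plain-text source with one sentence per line; `make_autoencoder_lines` is a hypothetical helper, and you would still need to write its output to whatever file the data-loading code reads.

```python
import io


def make_autoencoder_lines(sentences):
    """Turn sentences into 'sentence<TAB>sentence' lines for an autoencoder dataset."""
    lines = []
    for sentence in sentences:
        sentence = sentence.strip()
        if sentence:  # skip blank lines
            lines.append(sentence + '\t' + sentence)
    return lines


# io.StringIO stands in for an open text file with one sentence per line.
source = io.StringIO("I am test\nyou are here\n\n")
for line in make_autoencoder_lines(source):
    print(line)
```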
  • Original article: https://www.cnblogs.com/www-caiyin-com/p/10123346.html