pytorch -- Attention Mechanism


    1. Paper: Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

    Encoder

      At each time step the encoder reads one word, and its hidden state is updated as h_t = f(h_{t-1}, x_t), where the activation function f can be sigmoid, tanh, ReLU, softplus, etc.
      After reading the last word of the sequence, a fixed-length context vector c = tanh(V h_N) is obtained.
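
    As a quick illustration of this recurrence, here is a minimal sketch in plain PyTorch (the weight matrices W, U, V and all sizes below are made-up for illustration, not the paper's exact parameterization):

      import torch

      input_size, hidden_size = 4, 8                  # toy sizes, assumed
      W = torch.randn(hidden_size, input_size)        # input-to-hidden weights
      U = torch.randn(hidden_size, hidden_size)       # hidden-to-hidden weights
      V = torch.randn(hidden_size, hidden_size)       # hidden-to-context weights

      def encode(xs):
          # xs: [seq_len, input_size]; one concrete choice of f: h_t = tanh(W x_t + U h_{t-1})
          h = torch.zeros(hidden_size)
          for x in xs:
              h = torch.tanh(W @ x + U @ h)
          return torch.tanh(V @ h)                    # c = tanh(V h_N)

      c = encode(torch.randn(6, input_size))          # fixed-length summary of a 6-step sequence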
    Decoder

      As the architecture diagram in the paper shows, the hidden state h_t at time t is determined by h_{t-1}, y_{t-1}, and c: h_t = f(h_{t-1}, y_{t-1}, c), with h_0 = tanh(V'c).
      The final output y_t is determined by h_t, y_{t-1}, and c:
      P(y_t | y_{t-1}, y_{t-2}, ..., y_1, c) = g(h_t, y_{t-1}, c)

    Here f and g are both activation functions; g is typically a softmax.
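
    A matching sketch of one decoder step (again with assumed weight names and toy sizes; g is realized here as a softmax over the hidden state only, a common simplification):

      import torch
      import torch.nn.functional as F

      hidden_size, emb_size, vocab_size = 8, 4, 29    # toy sizes, assumed
      Wd = torch.randn(hidden_size, emb_size)         # y_{t-1} -> hidden
      Ud = torch.randn(hidden_size, hidden_size)      # h_{t-1} -> hidden
      Cd = torch.randn(hidden_size, hidden_size)      # c -> hidden
      Wo = torch.randn(vocab_size, hidden_size)       # hidden -> output logits

      def decode_step(h_prev, y_prev, c):
          # h_t = f(h_{t-1}, y_{t-1}, c), with f = tanh
          h = torch.tanh(Wd @ y_prev + Ud @ h_prev + Cd @ c)
          # g(h_t, y_{t-1}, c), simplified to softmax(Wo h_t)
          p = F.softmax(Wo @ h, dim=0)
          return h, p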

    Based on this, I implemented the original seq2seq model in a PyTorch environment:

    (Reference: https://github.com/graykode/nlp-tutorial)

      import numpy as np
      import torch
      import torch.nn as nn

      # S: symbol that marks the start of the decoder input
      # E: symbol that marks the end of the decoder output
      # P: padding symbol used when a word is shorter than the number of time steps

      char_arr = [c for c in 'SEPabcdefghijklmnopqrstuvwxyz']
      num_dic = {n: i for i, n in enumerate(char_arr)}

      seq_data = [['man', 'women'], ['black', 'white'], ['king', 'queen'], ['girl', 'boy'], ['up', 'down'], ['high', 'low']]

      # Seq2Seq parameters
      n_step = 5                    # max word length; shorter words are padded with 'P'
      n_hidden = 128
      n_class = len(num_dic)        # 29
      batch_size = len(seq_data)    # 6

      def make_batch(seq_data):
          input_batch, output_batch, target_batch = [], [], []

          for seq in seq_data:
              for i in range(2):
                  seq[i] = seq[i] + 'P' * (n_step - len(seq[i]))  # pad both words to n_step

              input = [num_dic[n] for n in seq[0]]           # encoder input
              output = [num_dic[n] for n in ('S' + seq[1])]  # decoder input, prefixed with 'S'
              target = [num_dic[n] for n in (seq[1] + 'E')]  # decoder target, suffixed with 'E'

              input_batch.append(np.eye(n_class)[input])     # one-hot
              output_batch.append(np.eye(n_class)[output])   # one-hot
              target_batch.append(target)                    # class indices, not one-hot

          # make tensors
          return torch.Tensor(input_batch), torch.Tensor(output_batch), torch.LongTensor(target_batch)

      # Model
      class Seq2Seq(nn.Module):
          def __init__(self):
              super(Seq2Seq, self).__init__()

              # single-layer RNNs; nn.RNN's dropout only acts between stacked layers,
              # so it is omitted here (with num_layers=1 it would have no effect)
              self.enc_cell = nn.RNN(input_size=n_class, hidden_size=n_hidden)
              self.dec_cell = nn.RNN(input_size=n_class, hidden_size=n_hidden)
              self.fc = nn.Linear(n_hidden, n_class)

          def forward(self, enc_input, enc_hidden, dec_input):
              enc_input = enc_input.transpose(0, 1)  # [n_step, batch_size, n_class]
              dec_input = dec_input.transpose(0, 1)  # [n_step+1, batch_size, n_class]

              # enc_states : [num_layers(=1) * num_directions(=1), batch_size, n_hidden]
              _, enc_states = self.enc_cell(enc_input, enc_hidden)
              # outputs : [n_step+1(=6), batch_size, num_directions(=1) * n_hidden(=128)]
              outputs, _ = self.dec_cell(dec_input, enc_states)

              model = self.fc(outputs)  # [n_step+1(=6), batch_size, n_class]
              return model


      input_batch, output_batch, target_batch = make_batch(seq_data)

      model = Seq2Seq()
      criterion = nn.CrossEntropyLoss()
      optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

      for epoch in range(5000):
          # hidden shape: [num_layers * num_directions, batch_size, n_hidden]
          hidden = torch.zeros(1, batch_size, n_hidden)

          # input_batch  : [batch_size, n_step, n_class]
          # output_batch : [batch_size, n_step+1, n_class]  (+1 because of 'S')
          # target_batch : [batch_size, n_step+1], not one-hot  (+1 because of 'E')
          output = model(input_batch, hidden, output_batch)
          # output : [n_step+1, batch_size, n_class] -> [batch_size, n_step+1, n_class]
          output = output.transpose(0, 1)
          loss = 0
          for i in range(len(target_batch)):
              # output[i] : [n_step+1, n_class], target_batch[i] : [n_step+1]
              loss += criterion(output[i], target_batch[i])
          if (epoch + 1) % 1000 == 0:
              print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))

          optimizer.zero_grad()
          loss.backward()
          optimizer.step()


      # Test
      def translate(word):
          input_batch, output_batch, _ = make_batch([[word, 'P' * len(word)]])

          # hidden shape: [num_layers * num_directions, batch_size(=1), n_hidden]
          hidden = torch.zeros(1, 1, n_hidden)
          output = model(input_batch, hidden, output_batch)
          # output : [n_step+1(=6), batch_size(=1), n_class]

          predict = output.data.max(2, keepdim=True)[1]  # argmax over the n_class dimension
          decoded = [char_arr[i] for i in predict]
          # cut off at the first 'E'; if the model never emits 'E', keep everything
          end = decoded.index('E') if 'E' in decoded else len(decoded)
          translated = ''.join(decoded[:end])

          return translated.replace('P', '')

      print('test')
      print('man ->', translate('man'))
      print('mans ->', translate('mans'))
      print('king ->', translate('king'))
      print('black ->', translate('black'))
      print('upp ->', translate('upp'))
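
    One detail worth calling out: make_batch uses the np.eye trick to turn a list of character indices into one-hot rows. A tiny standalone illustration (indices chosen to match num_dic above):

      import numpy as np

      n_class = 29
      indices = [3, 0, 1]               # 'a', 'S', 'E' under the num_dic above
      one_hot = np.eye(n_class)[indices]
      print(one_hot.shape)              # (3, 29): one one-hot row per character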

    Later, the attention mechanism was proposed on top of this seq2seq model.

    Paper: Neural Machine Translation by Jointly Learning to Align and Translate
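
    As a preview of that paper's core idea, here is a minimal sketch of additive (Bahdanau-style) attention, with assumed dimension names; it scores every encoder state against the current decoder state instead of compressing the source into a single fixed c:

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class AdditiveAttention(nn.Module):
          # score(s, h_j) = v^T tanh(W s + U h_j)
          def __init__(self, dec_hidden, enc_hidden, attn_dim):
              super().__init__()
              self.W = nn.Linear(dec_hidden, attn_dim, bias=False)
              self.U = nn.Linear(enc_hidden, attn_dim, bias=False)
              self.v = nn.Linear(attn_dim, 1, bias=False)

          def forward(self, dec_state, enc_outputs):
              # dec_state   : [batch, dec_hidden]            (s_{t-1})
              # enc_outputs : [batch, src_len, enc_hidden]   (h_1 .. h_N)
              scores = self.v(torch.tanh(
                  self.W(dec_state).unsqueeze(1) + self.U(enc_outputs)))  # [batch, src_len, 1]
              weights = F.softmax(scores.squeeze(-1), dim=1)              # alignment weights a_tj
              context = torch.bmm(weights.unsqueeze(1), enc_outputs)      # weighted sum of h_j
              return context.squeeze(1), weights                          # c_t : [batch, enc_hidden]

    Each decoder step then receives its own context vector c_t = sum_j a_tj * h_j, so long source sentences no longer have to squeeze through one fixed-length vector.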
