Introduction
What does understanding language mean?
This “understanding” of text is achieved mainly by transforming text into usable computational representations: discrete or continuous combinatorial structures such as vectors, tensors, graphs, and trees.
Computational Graphs
Technically, a computational graph is an abstraction that models mathematical expressions. Let’s see how a computational graph models an expression. Consider, for example, the expression y = wx + b.
We can then represent this expression as a directed acyclic graph (DAG) in which the nodes are the mathematical operations, such as multiplication and addition.
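As a quick illustration (a minimal sketch of my own, assuming the expression y = wx + b above), PyTorch builds such a graph automatically as we compute:

import torch

w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(3.0)
z = w * x        # multiplication node
y = z + b        # addition node
print(y)         # tensor(7., grad_fn=...)
print(y.grad_fn) # the addition node that produced y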
PyTorch Basics
Unlike Theano, Caffe, and TensorFlow, PyTorch implements a tape-based
automatic differentiation method that allows us to define and execute
computational graphs dynamically.
Static frameworks like Theano, Caffe, and TensorFlow require the computational graph to be first declared, compiled, and then executed. Although this leads to extremely efficient implementations (useful in production and mobile settings), it can become quite cumbersome during research and development. Modern frameworks like Chainer, DyNet, and PyTorch implement dynamic computational graphs to allow for a more flexible, imperative style of development, without needing to compile the models before every execution. Dynamic computational graphs are especially useful in modeling NLP tasks for which each input could potentially result in a different graph structure.
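A minimal sketch (my own, not from the book) of why this matters for NLP: the number of operations, and therefore the graph structure, can depend on the input itself, for example on the number of tokens in a sentence.

import torch

def encode(tokens, h0):
    # toy "encoder": one addition per token, so a 3-token input
    # builds a different graph than a 5-token input
    state = h0
    for t in tokens:
        state = state + t      # each iteration adds nodes to the graph on the fly
    return state.sum()

h0 = torch.zeros(4, requires_grad=True)
loss = encode([torch.rand(4) for _ in range(3)], h0)
loss.backward()                # backpropagation follows whatever graph was built
print(h0.grad)                 # tensor([1., 1., 1., 1.])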
Tensor
A tensor of order zero is just a number, or a scalar. A tensor of order one (1st-order tensor) is an array of numbers, or a vector. Similarly, a 2nd-order tensor is an array of vectors, or a matrix.
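For example (a small sketch of my own):

import torch

scalar = torch.tensor(3.14)                      # 0th-order tensor: a single number
vector = torch.tensor([1.0, 2.0, 3.0])           # 1st-order tensor: an array of numbers
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # 2nd-order tensor: an array of vectors
print(scalar.dim(), vector.dim(), matrix.dim())  # 0 1 2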
Create Tensors
import torch

def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape: {}".format(x.shape))
    print("Values: {}".format(x))

# 1. create a tensor of the given shape; its values are uninitialized (whatever is in memory)
describe(torch.Tensor(2,3))
# 2. uniform random on the interval [0, 1)
describe(torch.rand(2,3))
# 3. random values from the standard normal distribution
describe(torch.randn(2,3))
Any PyTorch method whose name ends with an underscore (_) performs an in-place operation; that is, it modifies the content in place without creating a new object:
x = torch.ones(2,3)
x.fill_(5) # x has been changed
# creating and initializing a tensor from lists
x = torch.Tensor([[1,2,3],[2,3,4]])
# creating and initializing a tensor from a NumPy array; note that x_t is a
# torch.DoubleTensor rather than the default FloatTensor
import numpy as np
x = np.random.rand(2,3)
x_t = torch.from_numpy(x)
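One detail worth noting (my addition, not from the book): torch.from_numpy shares memory with the NumPy array, and .float() converts the resulting DoubleTensor to the default FloatTensor.

x[0, 0] = 0.0
print(x_t[0, 0])           # also 0 -- the tensor and the array share storage
print(x_t.float().type())  # torch.FloatTensor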
Tensor Types and Size
x = torch.FloatTensor([[1,2,3]])
# Type: torch.FloatTensor
x = x.long()
# Type: torch.LongTensor
x = torch.tensor([[1,2,3]],dtype=torch.int64)
# Type: torch.LongTensor
We use the shape property and size() method of a tensor object to access the
measurements of its dimensions. The two ways of accessing these measurements
are mostly synonymous.
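For example (a quick check of that claim):

x = torch.rand(2,3)
print(x.shape)    # torch.Size([2, 3])
print(x.size())   # torch.Size([2, 3])
print(x.size(0))  # 2 -- size() also accepts a dimension index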
Tensor Operations
x = torch.arange(6)
# tensor([0, 1, 2, 3, 4, 5])
x = x.view(2,3)
# tensor([[0, 1, 2],
# [3, 4, 5]])
torch.sum(x,dim=0)
# tensor([3, 5, 7])  -- reduces along dimension 0 (sums down each column)
torch.sum(x,dim=1)
# tensor([ 3, 12])   -- reduces along dimension 1 (sums across each row)
Indexing, Slicing, and Joining
describe(x[:1,:2])
# Type: torch.LongTensor
# Shape: torch.Size([1, 2])
# Values: tensor([[0, 1]])
describe(x[0,1])
# Type: torch.LongTensor
# Shape: torch.Size([])
# Values: 1
# complex indexing
indices = torch.LongTensor([0,2])
describe(torch.index_select(x,dim=1,index=indices))
# Type: torch.LongTensor
# Shape: torch.Size([2, 2])
# Values: tensor([[0, 2],
# [3, 5]])
indices = torch.LongTensor([0,0,0])
describe(torch.index_select(x,dim=0,index=indices))
# Type: torch.LongTensor
# Shape: torch.Size([3, 3])
# Values: tensor([[0, 1, 2],
# [0, 1, 2],
# [0, 1, 2]])
Concatenating Tensors
x.shape
# torch.Size([2, 3])
describe(torch.cat([x,x],dim=0))
# Type: torch.LongTensor
# Shape: torch.Size([4, 3])
# Values: tensor([[0, 1, 2],
# [3, 4, 5],
# [0, 1, 2],
# [3, 4, 5]])
describe(torch.cat([x,x],dim=1))
# Type: torch.LongTensor
# Shape: torch.Size([2, 6])
# Values: tensor([[0, 1, 2, 0, 1, 2],
# [3, 4, 5, 3, 4, 5]])
describe(torch.stack([x,x]))
# Type: torch.LongTensor
# Shape: torch.Size([2, 2, 3])
# Values: tensor([[[0, 1, 2],
# [3, 4, 5]],
# [[0, 1, 2],
# [3, 4, 5]]])
Linear algebra on tensors: multiplication
x = torch.arange(6).view(2,3)
# unlike the examples in the book, x here is a LongTensor, whereas x in the book is a FloatTensor
x2 = torch.ones(3,2)
x2[:,1] += 1
# tensor([[1., 2.],
# [1., 2.],
# [1., 2.]])
torch.mm(x,x2)
# this raises: RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'mat2'
# so we should instead create x as x = torch.arange(6.0).view(2,3)
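With x recreated as a FloatTensor, the multiplication works; the values below follow from x = [[0, 1, 2], [3, 4, 5]] and x2 as above.

x = torch.arange(6.0).view(2,3)
torch.mm(x,x2)
# tensor([[ 3.,  6.],
#         [12., 24.]])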
Tensors and Computational Graphs
In the computational graph setting, gradients
exist for each parameter in the model and can be thought of as the parameter's
contribution to the error signal.
x = torch.ones(2,2,requires_grad=True)
y = (x+2)*(x+5) + 3
# tensor([[21., 21.],
# [21., 21.]], grad_fn=<AddBackward>)
z = y.mean()
# tensor(21., grad_fn=<MeanBackward1>)
z.backward()
print(x.grad is None)
# False
x.grad
# tensor([[2.2500, 2.2500],
# [2.2500, 2.2500]])
How are the values in x.grad computed? Since z = mean((x + 2)(x + 5) + 3), the derivative is dz/dx = (2x + 7) / 4 for each element; substituting x = 1 gives 9/4 = 2.25, matching the gradient values above.
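A quick numerical check of that derivative (my addition):

print(((2 * x + 7) / 4).detach())
# tensor([[2.2500, 2.2500],
#         [2.2500, 2.2500]])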
CUDA Tensors
torch.cuda.is_available()
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# select a specific GPU by its index (equivalent to torch.device("cuda:6"))
device = torch.device(6)
x = torch.rand(2,2).to(device)
# Type: torch.cuda.FloatTensor
note: To operate on CUDA and non-CUDA objects, we need to ensure that they are on the same device. If we don’t, the computations will break.
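A minimal sketch of that rule (my own, assuming device refers to the CUDA device selected above):

x_gpu = torch.rand(2,2).to(device)
y_cpu = torch.rand(2,2)
# x_gpu + y_cpu              # raises a RuntimeError: the operands live on different devices
z = x_gpu + y_cpu.to(device) # move y to the same device first, then operate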
# restrict which GPUs are visible to the process
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
Exercise
# 1. Create a 2D tensor and then add a dimension of size 1 inserted at dimension 0.
a = torch.ones(2,2)
b = a.unsqueeze(0)
# torch.Size([1, 2, 2])
# 2. Remove the extra dimension you just added to the previous tensor.
b.squeeze(0)
# 3. Create a random tensor of shape 5x3 in the interval [3, 7)
3 + torch.rand(5,3)*(7-3)
# 4. Create a tensor with values from a normal distribution (mean=0, std=1).
a = torch.rand(3,3)
a.normal_()
# 5. Retrieve the indexes of all the nonzero elements in the tensor
a = torch.Tensor([1, 1, 1, 0, 1])
torch.nonzero(a)
# tensor([[0],
# [1],
# [2],
# [4]])
# 6. Create a random tensor of size (3,1) and then horizontally stack four copies together.
a = torch.rand(3,1)
# tensor([[0.5543],
# [0.1504],
# [0.6194]])
a.expand(3,4)
# tensor([[0.5543, 0.5543, 0.5543, 0.5543],
# [0.1504, 0.1504, 0.1504, 0.1504],
# [0.6194, 0.6194, 0.6194, 0.6194]])
# 7. Return the batch matrix-matrix product of two three-dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).
a = torch.rand(3,4,5); b = torch.rand(3,5,4)
torch.bmm(a,b)
# torch.Size([3, 4, 4])
# 8. Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5),b=torch.rand(5,4)).
a=torch.rand(3,4,5)
b=torch.rand(5,4)
torch.bmm(a,b.unsqueeze(0).expand(a.size(0),*b.size()))
note: expand can only expand dimensions of size 1
*b.size() unpacks the shape into separate positional arguments and can only be used when passing arguments to a function; for example, if shape = (1, 2), then f(*shape) is equivalent to calling f(1, 2).
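A quick shape check of that solution (my addition):

b3 = b.unsqueeze(0).expand(a.size(0), *b.size())
print(b3.shape)                # torch.Size([3, 5, 4])
print(torch.bmm(a, b3).shape)  # torch.Size([3, 4, 4])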