    Learning PyTorch Tensors

    Tensors are data structures similar to arrays and matrices, much like NumPy's ndarray, except that tensors can run on GPUs. In fact, tensors and NumPy arrays often share the same underlying memory, which eliminates the need to copy data. Tensors are also optimized for automatic differentiation.

    import torch
    import numpy as np
    

    Initializing a Tensor

    • Directly from data
    data=[[1,2],[3,4]]
    
    x_data=torch.tensor(data)
    
    x_data
    
    tensor([[1, 2],
            [3, 4]])
    
    • From a NumPy array
    np_array=np.array(data)
    
    x_np=torch.tensor(np_array)
    
    x_np
    
    tensor([[1, 2],
            [3, 4]], dtype=torch.int32)
    
    x_np=torch.from_numpy(np_array)
    
    x_np
    
    tensor([[1, 2],
            [3, 4]], dtype=torch.int32)
    
    • From another tensor

    The new tensor retains the properties (shape, datatype) of the argument tensor unless they are explicitly overridden:

    x_ones=torch.ones_like(x_data);x_ones
    
    tensor([[1, 1],
            [1, 1]])
    
    x_rand=torch.rand_like(x_data,dtype=torch.float);x_rand
    
    tensor([[0.1462, 0.1567],
            [0.6331, 0.8472]])
    
    • With random or constant values

    Here, shape is a tuple describing the tensor's dimensions:

    shape=(2,3)
    rand_tensor=torch.rand(shape)
    ones_tensor=torch.ones(shape)
    zeros_tensor=torch.zeros(shape)
    print(rand_tensor)
    print(ones_tensor)
    print(zeros_tensor)
    
    tensor([[0.4811, 0.5744, 0.8909],
            [0.6602, 0.9882, 0.1145]])
    tensor([[1., 1., 1.],
            [1., 1., 1.]])
    tensor([[0., 0., 0.],
            [0., 0., 0.]])
    

    Tensor attributes

    Tensor attributes include its shape, its datatype, and the device on which it is stored:

    tensor=torch.rand(3,4)
    
    tensor.shape
    
    torch.Size([3, 4])
    
    tensor.dtype
    
    torch.float32
    
    tensor.device
    
    device(type='cpu')
    

    Tensor operations

    PyTorch provides well over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling, and more. Each of them can run on the GPU (often much faster than on the CPU).

    By default, tensors are created on the CPU. We need to move them to the GPU explicitly with the .to method. Keep in mind that copying large tensors across devices is expensive in both time and memory.

    if torch.cuda.is_available():
        tensor=tensor.to('cuda')
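    
    Because cross-device copies are costly, one way to avoid an extra copy (a minimal sketch, assuming a CUDA device is available; the name gpu_tensor is just illustrative) is to create the tensor on the target device directly:
    
    if torch.cuda.is_available():
        gpu_tensor=torch.rand(3,4,device='cuda') # created directly on the GPU, no host-to-device copy
        print(gpu_tensor.device)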
    

    NumPy-like indexing and slicing:

    tensor=torch.ones((4,4));tensor
    
    tensor([[1., 1., 1., 1.],
            [1., 1., 1., 1.],
            [1., 1., 1., 1.],
            [1., 1., 1., 1.]])
    
    tensor[0]
    
    tensor([1., 1., 1., 1.])
    
    tensor[:,0]
    
    tensor([1., 1., 1., 1.])
    
    tensor[...,-1]=100;tensor
    
    tensor([[  1.,   1.,   1., 100.],
            [  1.,   1.,   1., 100.],
            [  1.,   1.,   1., 100.],
            [  1.,   1.,   1., 100.]])
    
    tensor[:,1]=10;tensor
    
    tensor([[  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.]])
    

    Besides the usual indexing for selecting data, PyTorch also provides some advanced selection functions:

    help(torch.index_select)
    
    Help on built-in function index_select:
    
    index_select(...)
        index_select(input, dim, index, *, out=None) -> Tensor
        
        Returns a new tensor which indexes the :attr:`input` tensor along dimension
        :attr:`dim` using the entries in :attr:`index` which is a `LongTensor`.
        
        The returned tensor has the same number of dimensions as the original tensor
        (:attr:`input`).  The :attr:`dim` th dimension has the same size as the length
        of :attr:`index`; other dimensions have the same size as in the original tensor.
        
        .. note:: The returned tensor does **not** use the same storage as the original
                  tensor.  If :attr:`out` has a different shape than expected, we
                  silently change it to the correct shape, reallocating the underlying
                  storage if necessary.
        
        Args:
            input (Tensor): the input tensor.
            dim (int): the dimension in which we index
            index (IntTensor or LongTensor): the 1-D tensor containing the indices to index
        
        Keyword args:
            out (Tensor, optional): the output tensor.
        
        Example::
        
            >>> x = torch.randn(3, 4)
            >>> x
            tensor([[ 0.1427,  0.0231, -0.5414, -1.0009],
                    [-0.4664,  0.2647, -0.1228, -1.1068],
                    [-1.1734, -0.6571,  0.7230, -0.6004]])
            >>> indices = torch.tensor([0, 2])
            >>> torch.index_select(x, 0, indices)
            tensor([[ 0.1427,  0.0231, -0.5414, -1.0009],
                    [-1.1734, -0.6571,  0.7230, -0.6004]])
            >>> torch.index_select(x, 1, indices)
            tensor([[ 0.1427, -0.5414],
                    [-0.4664, -0.1228],
                    [-1.1734,  0.7230]])
    

    help(torch.masked_select)
    
    Help on built-in function masked_select:
    
    masked_select(...)
        masked_select(input, mask, *, out=None) -> Tensor
        
        Returns a new 1-D tensor which indexes the :attr:`input` tensor according to
        the boolean mask :attr:`mask` which is a `BoolTensor`.
        
        The shapes of the :attr:`mask` tensor and the :attr:`input` tensor don't need
        to match, but they must be :ref:`broadcastable <broadcasting-semantics>`.
        
        .. note:: The returned tensor does **not** use the same storage
                  as the original tensor
        
        Args:
            input (Tensor): the input tensor.
            mask  (BoolTensor): the tensor containing the binary mask to index with
        
        Keyword args:
            out (Tensor, optional): the output tensor.
        
        Example::
        
            >>> x = torch.randn(3, 4)
            >>> x
            tensor([[ 0.3552, -2.3825, -0.8297,  0.3477],
                    [-1.2035,  1.2252,  0.5002,  0.6248],
                    [ 0.1307, -2.0608,  0.1244,  2.0139]])
            >>> mask = x.ge(0.5)
            >>> mask
            tensor([[False, False, False, False],
                    [False, True, True, True],
                    [False, False, False, True]])
            >>> torch.masked_select(x, mask)
            tensor([ 1.2252,  0.5002,  0.6248,  2.0139])
    

    help(torch.gather)
    
    Help on built-in function gather:
    
    gather(...)
        gather(input, dim, index, *, sparse_grad=False, out=None) -> Tensor
        
        Gathers values along an axis specified by `dim`.
        
        For a 3-D tensor the output is specified by::
        
            out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
            out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
            out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2
        
        :attr:`input` and :attr:`index` must have the same number of dimensions.
        It is also required that ``index.size(d) <= input.size(d)`` for all
        dimensions ``d != dim``.  :attr:`out` will have the same shape as :attr:`index`.
        Note that ``input`` and ``index`` do not broadcast against each other.
        
        Args:
            input (Tensor): the source tensor
            dim (int): the axis along which to index
            index (LongTensor): the indices of elements to gather
        
        Keyword arguments:
            sparse_grad (bool, optional): If ``True``, gradient w.r.t. :attr:`input` will be a sparse tensor.
            out (Tensor, optional): the destination tensor
        
        Example::
        
            >>> t = torch.tensor([[1, 2], [3, 4]])
            >>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
            tensor([[ 1,  1],
                    [ 4,  3]])
    

    torch.cat can be used to concatenate tensors along a given dimension. There is also torch.stack, which behaves slightly differently from torch.cat.

    t1=torch.cat([tensor,tensor,tensor],dim=1);t1
    
    tensor([[  1.,  10.,   1., 100.,   1.,  10.,   1., 100.,   1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.,   1.,  10.,   1., 100.,   1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.,   1.,  10.,   1., 100.,   1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.,   1.,  10.,   1., 100.,   1.,  10.,   1., 100.]])
    
    torch.cat([tensor,tensor,tensor],dim=0)
    
    tensor([[  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.],
            [  1.,  10.,   1., 100.]])
    

    The difference between cat and stack is that the former enlarges an existing dimension (think of it as appending), while the latter adds a new dimension (think of it as stacking).

    a=torch.arange(0,12).reshape(3,4)
    
    a
    
    tensor([[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11]])
    
    torch.cat([a,a]).shape
    
    torch.Size([6, 4])
    
    torch.stack([a,a]).shape
    
    torch.Size([2, 3, 4])
    
    torch.cat([a,a])
    
    tensor([[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11],
            [ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11]])
    
    torch.stack([a,a])
    
    tensor([[[ 0,  1,  2,  3],
             [ 4,  5,  6,  7],
             [ 8,  9, 10, 11]],
    
            [[ 0,  1,  2,  3],
             [ 4,  5,  6,  7],
             [ 8,  9, 10, 11]]])
    
    • Arithmetic operations
    tensor=torch.arange(0,9).reshape(3,3);tensor
    
    tensor([[0, 1, 2],
            [3, 4, 5],
            [6, 7, 8]])
    

    The following computes matrix multiplication between tensors; y1 and y2 have the same value:

    y1=tensor@tensor.T
    
    y1
    
    tensor([[  5,  14,  23],
            [ 14,  50,  86],
            [ 23,  86, 149]])
    
    y2=tensor.matmul(tensor.T)
    
    y2
    
    tensor([[  5,  14,  23],
            [ 14,  50,  86],
            [ 23,  86, 149]])
    
    y3=torch.empty(3,3)
    torch.add(tensor,tensor.T,out=y3)
    print(y3)
    
    tensor([[ 0.,  4.,  8.],
            [ 4.,  8., 12.],
            [ 8., 12., 16.]])
    

    Single-element tensors: if, for example, you aggregate all values of a tensor into one value, you can use item() to convert it into a Python number.

    agg=tensor.sum();agg
    
    tensor(36)
    
    agg_item=agg.item();agg_item
    
    36
    

    In-place operations: operations that store the result in the operand are called in-place operations and are marked with a trailing _. For example, x.copy_(y) and x.t_() will change x.

    tensor
    
    tensor([[0, 1, 2],
            [3, 4, 5],
            [6, 7, 8]])
    
    tensor.add_(5)
    
    tensor([[ 5,  6,  7],
            [ 8,  9, 10],
            [11, 12, 13]])
    
    tensor
    
    tensor([[ 5,  6,  7],
            [ 8,  9, 10],
            [11, 12, 13]])
    

    In-place operations may save memory, but they can cause errors when computing derivatives, so their use is discouraged.
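    
    A minimal sketch (not from the original) of how an in-place operation can break autograd: exp saves its output for the backward pass, so modifying that output in place makes backward() raise a RuntimeError.
    
    x=torch.ones(3,requires_grad=True)
    y=x.exp()           # ExpBackward saves the output y for use in the backward pass
    y.add_(1)           # in-place change invalidates the saved value
    y.sum().backward()  # RuntimeError: a variable needed for gradient computation has been modified by an inplace operation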

    Converting to and from NumPy arrays

    Use numpy() and from_numpy() to convert between tensors and NumPy arrays. Note, however, that the tensor and the NumPy array produced by these two functions share the same memory (which is why the conversion is fast), so changing one also changes the other!

    Tensor to Numpy array

    t=torch.ones(5)
    
    t
    
    tensor([1., 1., 1., 1., 1.])
    
    n=t.numpy();n
    
    array([ 1.,  1.,  1.,  1.,  1.], dtype=float32)
    
    t.add_(1)
    
    tensor([2., 2., 2., 2., 2.])
    

    Numpy array to Tensor

    n=np.ones(5)
    t=torch.from_numpy(n)
    t
    
    tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
    
    np.add(n,1,out=n)
    
    array([ 2.,  2.,  2.,  2.,  2.])
    
    t
    
    tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
    
    n
    
    array([ 2.,  2.,  2.,  2.,  2.])
    

    In addition to the methods above, another common approach is to call torch.tensor() directly to convert a NumPy array into a tensor. Note that this method always copies the data, so the returned tensor no longer shares memory with the original array.

    a=np.arange(9).reshape(3,3)
    c=torch.tensor(a)
    a+=1
    print(c)
    print(a)
    
    tensor([[0, 1, 2],
            [3, 4, 5],
            [6, 7, 8]], dtype=torch.int32)
    [[1 2 3]
     [4 5 6]
     [7 8 9]]
    

    View()

    view() changes the shape of a tensor. The new tensor it returns shares memory with the source tensor (they use the same underlying data), so modifying one also changes the other. reshape offers the same functionality, but it likewise cannot guarantee that the result is a copy (it may still share memory with the source).

    x=torch.randn(5,3);x
    
    tensor([[-0.5722, -0.4844,  1.5515],
            [-0.2504,  0.2010,  0.0182],
            [ 0.0400,  0.0397,  2.0167],
            [ 1.8868, -0.4670,  0.5968],
            [ 0.9070,  0.5825, -1.0549]])
    
    y=x.view(15);y
    
    tensor([-0.5722, -0.4844,  1.5515, -0.2504,  0.2010,  0.0182,  0.0400,  0.0397,
             2.0167,  1.8868, -0.4670,  0.5968,  0.9070,  0.5825, -1.0549])
    
    y[0]=100
    
    x
    
    tensor([[ 1.0000e+02, -4.8445e-01,  1.5515e+00],
            [-2.5042e-01,  2.0102e-01,  1.8231e-02],
            [ 3.9969e-02,  3.9711e-02,  2.0167e+00],
            [ 1.8868e+00, -4.6697e-01,  5.9683e-01],
            [ 9.0702e-01,  5.8254e-01, -1.0549e+00]])
    
    z=x.view(-1,5);z
    
    tensor([[ 1.0000e+02, -4.8445e-01,  1.5515e+00, -2.5042e-01,  2.0102e-01],
            [ 1.8231e-02,  3.9969e-02,  3.9711e-02,  2.0167e+00,  1.8868e+00],
            [-4.6697e-01,  5.9683e-01,  9.0702e-01,  5.8254e-01, -1.0549e+00]])
    
    q=x.reshape(15);q
    
    tensor([ 1.0000e+02, -4.8445e-01,  1.5515e+00, -2.5042e-01,  2.0102e-01,
             1.8231e-02,  3.9969e-02,  3.9711e-02,  2.0167e+00,  1.8868e+00,
            -4.6697e-01,  5.9683e-01,  9.0702e-01,  5.8254e-01, -1.0549e+00])
    
    q[0]=250;x
    
    tensor([[ 2.5000e+02, -4.8445e-01,  1.5515e+00],
            [-2.5042e-01,  2.0102e-01,  1.8231e-02],
            [ 3.9969e-02,  3.9711e-02,  2.0167e+00],
            [ 1.8868e+00, -4.6697e-01,  5.9683e-01],
            [ 9.0702e-01,  5.8254e-01, -1.0549e+00]])
    

    If we want a genuinely new copy (one that does not share memory), we can first create a copy with clone and then call view:

    x_cp=x.clone().view(15)
    x-=1
    print(x)
    print(x_cp)
    
    tensor([[ 2.4900e+02, -1.4844e+00,  5.5149e-01],
            [-1.2504e+00, -7.9898e-01, -9.8177e-01],
            [-9.6003e-01, -9.6029e-01,  1.0167e+00],
            [ 8.8677e-01, -1.4670e+00, -4.0317e-01],
            [-9.2979e-02, -4.1746e-01, -2.0549e+00]])
    tensor([ 2.5000e+02, -4.8445e-01,  1.5515e+00, -2.5042e-01,  2.0102e-01,
             1.8231e-02,  3.9969e-02,  3.9711e-02,  2.0167e+00,  1.8868e+00,
            -4.6697e-01,  5.9683e-01,  9.0702e-01,  5.8254e-01, -1.0549e+00])
    

    Another benefit of clone is that the copy is recorded in the computation graph, so gradients flowing back to the copy also propagate to the source tensor (a short sketch follows the item() example below).
    Another commonly used function is item(), which converts a scalar tensor into a Python number:

    x=torch.randn(1);x
    
    tensor([-0.9871])
    
    x.item()
    
    -0.9870905876159668
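    
    As mentioned above, clone keeps the copy in the computation graph, so gradients propagate back to the source tensor. A minimal sketch (not from the original):
    
    x=torch.ones(3,requires_grad=True)
    y=x.clone()             # clone is recorded in the computation graph
    (y*2).sum().backward()
    print(x.grad)           # tensor([2., 2., 2.]) -- the gradient flows back through the clone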
    

    Linear algebra

    • Trace: torch.trace
    help(torch.trace)
    
    Help on built-in function trace:
    
    trace(...)
        trace(input) -> Tensor
        
        Returns the sum of the elements of the diagonal of the input 2-D matrix.
        
        Example::
        
            >>> x = torch.arange(1., 10.).view(3, 3)
            >>> x
            tensor([[ 1.,  2.,  3.],
                    [ 4.,  5.,  6.],
                    [ 7.,  8.,  9.]])
            >>> torch.trace(x)
            tensor(15.)
    

    • Diagonal elements: torch.diag
    help(torch.diag)
    
    Help on built-in function diag:
    
    diag(...)
        diag(input, diagonal=0, *, out=None) -> Tensor
        
        - If :attr:`input` is a vector (1-D tensor), then returns a 2-D square tensor
          with the elements of :attr:`input` as the diagonal.
        - If :attr:`input` is a matrix (2-D tensor), then returns a 1-D tensor with
          the diagonal elements of :attr:`input`.
        
        The argument :attr:`diagonal` controls which diagonal to consider:
        
        - If :attr:`diagonal` = 0, it is the main diagonal.
        - If :attr:`diagonal` > 0, it is above the main diagonal.
        - If :attr:`diagonal` < 0, it is below the main diagonal.
        
        Args:
            input (Tensor): the input tensor.
            diagonal (int, optional): the diagonal to consider
        
        Keyword args:
            out (Tensor, optional): the output tensor.
        
        .. seealso::
        
                :func:`torch.diagonal` always returns the diagonal of its input.
        
                :func:`torch.diagflat` always constructs a tensor with diagonal elements
                specified by the input.
        
        Examples:
        
        Get the square matrix where the input vector is the diagonal::
        
            >>> a = torch.randn(3)
            >>> a
            tensor([ 0.5950,-0.0872, 2.3298])
            >>> torch.diag(a)
            tensor([[ 0.5950, 0.0000, 0.0000],
                    [ 0.0000,-0.0872, 0.0000],
                    [ 0.0000, 0.0000, 2.3298]])
            >>> torch.diag(a, 1)
            tensor([[ 0.0000, 0.5950, 0.0000, 0.0000],
                    [ 0.0000, 0.0000,-0.0872, 0.0000],
                    [ 0.0000, 0.0000, 0.0000, 2.3298],
                    [ 0.0000, 0.0000, 0.0000, 0.0000]])
        
        Get the k-th diagonal of a given matrix::
        
            >>> a = torch.randn(3, 3)
            >>> a
            tensor([[-0.4264, 0.0255,-0.1064],
                    [ 0.8795,-0.2429, 0.1374],
                    [ 0.1029,-0.6482,-1.6300]])
            >>> torch.diag(a, 0)
            tensor([-0.4264,-0.2429,-1.6300])
            >>> torch.diag(a, 1)
            tensor([ 0.0255, 0.1374])
    

    • triu: upper triangular part
    help(torch.triu)
    
    Help on built-in function triu:
    
    triu(...)
        triu(input, diagonal=0, *, out=None) -> Tensor
        
        Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices
        :attr:`input`, the other elements of the result tensor :attr:`out` are set to 0.
        
        The upper triangular part of the matrix is defined as the elements on and
        above the diagonal.
        
        The argument :attr:`diagonal` controls which diagonal to consider. If
        :attr:`diagonal` = 0, all elements on and above the main diagonal are
        retained. A positive value excludes just as many diagonals above the main
        diagonal, and similarly a negative value includes just as many diagonals below
        the main diagonal. The main diagonal are the set of indices
        :math:`\lbrace (i, i) \rbrace` for :math:`i \in [0, \min\{d_{1}, d_{2}\} - 1]` where
        :math:`d_{1}, d_{2}` are the dimensions of the matrix.
        
        Args:
            input (Tensor): the input tensor.
            diagonal (int, optional): the diagonal to consider
        
        Keyword args:
            out (Tensor, optional): the output tensor.
        
        Example::
        
            >>> a = torch.randn(3, 3)
            >>> a
            tensor([[ 0.2309,  0.5207,  2.0049],
                    [ 0.2072, -1.0680,  0.6602],
                    [ 0.3480, -0.5211, -0.4573]])
            >>> torch.triu(a)
            tensor([[ 0.2309,  0.5207,  2.0049],
                    [ 0.0000, -1.0680,  0.6602],
                    [ 0.0000,  0.0000, -0.4573]])
            >>> torch.triu(a, diagonal=1)
            tensor([[ 0.0000,  0.5207,  2.0049],
                    [ 0.0000,  0.0000,  0.6602],
                    [ 0.0000,  0.0000,  0.0000]])
            >>> torch.triu(a, diagonal=-1)
            tensor([[ 0.2309,  0.5207,  2.0049],
                    [ 0.2072, -1.0680,  0.6602],
                    [ 0.0000, -0.5211, -0.4573]])
        
            >>> b = torch.randn(4, 6)
            >>> b
            tensor([[ 0.5876, -0.0794, -1.8373,  0.6654,  0.2604,  1.5235],
                    [-0.2447,  0.9556, -1.2919,  1.3378, -0.1768, -1.0857],
                    [ 0.4333,  0.3146,  0.6576, -1.0432,  0.9348, -0.4410],
                    [-0.9888,  1.0679, -1.3337, -1.6556,  0.4798,  0.2830]])
            >>> torch.triu(b, diagonal=1)
            tensor([[ 0.0000, -0.0794, -1.8373,  0.6654,  0.2604,  1.5235],
                    [ 0.0000,  0.0000, -1.2919,  1.3378, -0.1768, -1.0857],
                    [ 0.0000,  0.0000,  0.0000, -1.0432,  0.9348, -0.4410],
                    [ 0.0000,  0.0000,  0.0000,  0.0000,  0.4798,  0.2830]])
            >>> torch.triu(b, diagonal=-1)
            tensor([[ 0.5876, -0.0794, -1.8373,  0.6654,  0.2604,  1.5235],
                    [-0.2447,  0.9556, -1.2919,  1.3378, -0.1768, -1.0857],
                    [ 0.0000,  0.3146,  0.6576, -1.0432,  0.9348, -0.4410],
                    [ 0.0000,  0.0000, -1.3337, -1.6556,  0.4798,  0.2830]])
    

    • tril: lower triangular part
    help(torch.tril)
    
    Help on built-in function tril:
    
    tril(...)
        tril(input, diagonal=0, *, out=None) -> Tensor
        
        Returns the lower triangular part of the matrix (2-D tensor) or batch of matrices
        :attr:`input`, the other elements of the result tensor :attr:`out` are set to 0.
        
        The lower triangular part of the matrix is defined as the elements on and
        below the diagonal.
        
        The argument :attr:`diagonal` controls which diagonal to consider. If
        :attr:`diagonal` = 0, all elements on and below the main diagonal are
        retained. A positive value includes just as many diagonals above the main
        diagonal, and similarly a negative value excludes just as many diagonals below
        the main diagonal. The main diagonal are the set of indices
        :math:`\lbrace (i, i) \rbrace` for :math:`i \in [0, \min\{d_{1}, d_{2}\} - 1]` where
        :math:`d_{1}, d_{2}` are the dimensions of the matrix.
        
        Args:
            input (Tensor): the input tensor.
            diagonal (int, optional): the diagonal to consider
        
        Keyword args:
            out (Tensor, optional): the output tensor.
        
        Example::
        
            >>> a = torch.randn(3, 3)
            >>> a
            tensor([[-1.0813, -0.8619,  0.7105],
                    [ 0.0935,  0.1380,  2.2112],
                    [-0.3409, -0.9828,  0.0289]])
            >>> torch.tril(a)
            tensor([[-1.0813,  0.0000,  0.0000],
                    [ 0.0935,  0.1380,  0.0000],
                    [-0.3409, -0.9828,  0.0289]])
        
            >>> b = torch.randn(4, 6)
            >>> b
            tensor([[ 1.2219,  0.5653, -0.2521, -0.2345,  1.2544,  0.3461],
                    [ 0.4785, -0.4477,  0.6049,  0.6368,  0.8775,  0.7145],
                    [ 1.1502,  3.2716, -1.1243, -0.5413,  0.3615,  0.6864],
                    [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024,  0.0978]])
            >>> torch.tril(b, diagonal=1)
            tensor([[ 1.2219,  0.5653,  0.0000,  0.0000,  0.0000,  0.0000],
                    [ 0.4785, -0.4477,  0.6049,  0.0000,  0.0000,  0.0000],
                    [ 1.1502,  3.2716, -1.1243, -0.5413,  0.0000,  0.0000],
                    [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024,  0.0000]])
            >>> torch.tril(b, diagonal=-1)
            tensor([[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
                    [ 0.4785,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
                    [ 1.1502,  3.2716,  0.0000,  0.0000,  0.0000,  0.0000],
                    [-0.0614, -0.7344, -1.3164,  0.0000,  0.0000,  0.0000]])
    

    Broadcasting
    
    When two tensors of different shapes are combined in an element-wise operation, PyTorch first broadcasts them to a common shape by (conceptually) replicating entries along size-1 dimensions, and then applies the operation:

    x=torch.arange(1,3).view(1,2);x
    
    tensor([[1, 2]])
    
    y=torch.arange(1,4).view(3,1);y
    
    tensor([[1],
            [2],
            [3]])
    
    x+y
    
    tensor([[2, 3],
            [3, 4],
            [4, 5]])
    

    Memory overhead of operations

    Indexing and view do not allocate new memory, whereas an operation like y=x+y allocates new memory and then makes y point to it.

    x=torch.tensor([1,2])
    y=torch.tensor([3,4])
    id_before=id(y)
    y=y+x
    id(y)==id_before
    
    False
    

    If we want the result to be written into y's original memory, we can use indexing to perform the replacement:

    x=torch.tensor([1,2])
    y=torch.tensor([3,4])
    id_before=id(y)
    y[:]=y+x
    id_before==id(y)
    
    True
    

    We can also use the out parameter of the operator's full-name function, or in-place addition (+= and add_):

    x=torch.tensor([1,2])
    y=torch.tensor([3,4])
    id_before=id(y)
    torch.add(x,y,out=y)
    id(y)==id_before
    
    True
    
    y+=x
    id(y)==id_before
    
    True
    
    y.add_(x)
    id(y)==id_before
    
    True
    
    y.requires_grad
    
    False
    

    Automatic gradient computation

    PyTorch's autograd package automatically builds a computation graph from the inputs and the forward pass, and performs backpropagation through it.

    If a Tensor's .requires_grad attribute is set to True, all operations on it are tracked (so gradients can later be propagated via the chain rule). Once the computation is finished, calling .backward() computes all the gradients, which are accumulated into the tensor's .grad attribute.

    Note that when calling y.backward(), if y is a scalar, no argument needs to be passed; otherwise a tensor w of the same shape as y must be passed in. In that case y.backward(w) means: first compute L=torch.sum(y*w), which is a scalar, and then take the derivative of L with respect to the independent variable x.
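    
    A minimal sketch (values chosen purely for illustration, not from the original) verifying that y.backward(w) produces the gradient of torch.sum(y*w):
    
    x=torch.tensor([1.0,2.0,3.0],requires_grad=True)
    y=x**2
    w=torch.tensor([1.0,0.1,0.01])
    y.backward(w)
    print(x.grad)   # tensor([2.0000, 0.4000, 0.0600]), i.e. 2*w*x, the gradient of sum(w*x**2)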

    If you no longer want a tensor to be tracked, you can call .detach() to separate it from the tracking history; future computations on it will not be tracked, and gradients will not flow through it. Alternatively, you can wrap the code you do not want tracked in with torch.no_grad(). This is very common when evaluating a model, because during evaluation we do not need gradients for the trainable parameters (those with requires_grad=True).
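    
    A minimal sketch (not from the original) of .detach():
    
    x=torch.tensor(2.0,requires_grad=True)
    y=x**2
    y_det=y.detach()             # shares data with y but is cut off from the computation graph
    print(y_det.requires_grad)   # False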

    Function is another very important class. Tensor and Function combine to build a directed acyclic graph (DAG) that records the entire computation. Every tensor has a .grad_fn attribute referring to the Function that created it: if the tensor was produced by some operation, grad_fn is an object associated with that operation; otherwise it is None.

    x=torch.ones(2,2,requires_grad=True)
    print(x)
    print(x.grad_fn)
    print(x.grad) # None until gradients have been computed
    print(x.dtype)
    
    tensor([[1., 1.],
            [1., 1.]], requires_grad=True)
    None
    None
    torch.float32
    
    y=x+2
    print(y)
    print(y.grad_fn)
    
    tensor([[3., 3.],
            [3., 3.]], grad_fn=<AddBackward0>)
    <AddBackward0 object at 0x000001BA1B94F860>
    

    Note that x was created directly, so it has no grad_fn, whereas y was created by an addition, so it does have a grad_fn. Tensors created directly like x are called leaf nodes, and a leaf node's grad_fn is None.
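    
    A quick check (a sketch using the is_leaf attribute, which the original does not show):
    
    print(x.is_leaf)   # True  -- created directly, a leaf node
    print(y.is_leaf)   # False -- produced by an operation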

    z=y*y*3
    out=z.mean()
    print(z,out)
    
    tensor([[27., 27.],
            [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
    

    Use .requires_grad_() to change the requires_grad attribute in place:

    a=torch.randn(2,2)
    a=((a*3)/(a-1))
    print(a.requires_grad)
    a.requires_grad_(True)
    print(a.requires_grad)
    b=(a*a).sum()
    print(b.grad_fn)
    
    False
    True
    <SumBackward0 object at 0x000001BA1B92FBA8>
    

    Gradients

    Because out is a scalar, backward() can be called without specifying a gradient argument:

    out
    
    tensor(27., grad_fn=<MeanBackward0>)
    
    out.backward()
    
    print(x.grad)
    
    tensor([[4.5000, 4.5000],
            [4.5000, 4.5000]])
    
    x
    
    tensor([[1., 1.],
            [1., 1.]], requires_grad=True)
    

    Writing out as o, we have:

    $$ o = \frac{1}{4}\sum_{i=1}^{4} 3(x_i+2)^2 $$

    so:

    $$ \left.\frac{\partial o}{\partial x_i}\right|_{x_i=1} = \frac{9}{2} = 4.5 $$

    The gradient of a function whose output is a vector, with respect to a vector input, is a Jacobian matrix J, and the torch.autograd package is used to compute products involving such Jacobian matrices. For example, if v is the gradient of a scalar function $$ l=g(\vec{y}) $$ with respect to $$ \vec{y} $$:

    $$ v = \left( \frac{\partial l}{\partial y_1} \cdots \frac{\partial l}{\partial y_m} \right) $$

    then by the chain rule, the gradient of l with respect to $$ \vec{x} $$ is the vector-Jacobian product:

    $$ vJ = \left( \frac{\partial l}{\partial x_1} \cdots \frac{\partial l}{\partial x_n} \right) $$

    Note: grad is accumulated during backpropagation, which means each backward pass adds to the previously stored gradients, so the gradients usually need to be zeroed out before each backward pass.

    out2=x.sum();out2
    
    tensor(4., grad_fn=<SumBackward0>)
    
    out2.backward()
    print(x.grad)
    
    tensor([[5.5000, 5.5000],
            [5.5000, 5.5000]])
    
    out3=x.sum()
    x.grad.data.zero_()
    out3.backward()
    print(x.grad)
    
    tensor([[1., 1.],
            [1., 1.]])
    

    A small exercise:

    a=torch.tensor([1,2,3],requires_grad=True,dtype=torch.float32)
    
    print(a.grad)
    
    None
    
    b=a**2;b
    
    tensor([1., 4., 9.], grad_fn=<PowBackward0>)
    
    b.requires_grad
    
    True
    
    w=torch.tensor([0.1,0.2,0.3])
    
    b.backward(w)
    
    print(a.grad)
    
    tensor([0.2000, 0.8000, 1.8000])
    
    d=b.sum();d
    
    tensor(14., grad_fn=<SumBackward0>)
    
    d.requires_grad
    
    True
    

    d.backward()


    RuntimeError Traceback (most recent call last)
    in
    ----> 1 d.backward()

    E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253 create_graph=create_graph,
    254 inputs=inputs)
    --> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256
    257 def register_hook(self, hook):

    E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147 Variable._execution_engine.run_backward(
    148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
    --> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
    150
    151

    RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

    d=2*x
    for i in range(11):
        d.backward(retain_graph=True)
        print(x.grad)
    
    tensor(4.)
    tensor(6.)
    tensor(8.)
    tensor(10.)
    tensor(12.)
    tensor(14.)
    tensor(16.)
    tensor(18.)
    tensor(20.)
    tensor(22.)
    tensor(24.)
    
    
    d=2*x
    for i in range(11):
        d.backward()
        print(x.grad)
    
    tensor(26.)
    

    RuntimeError Traceback (most recent call last)
    in
    1 d=2*x
    2 for i in range(11):
    ----> 3 d.backward()
    4 print(x.grad)

    E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253 create_graph=create_graph,
    254 inputs=inputs)
    --> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256
    257 def register_hook(self, hook):

    E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147 Variable._execution_engine.run_backward(
    148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
    --> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
    150
    151

    RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

    c=a.sum();c
    
    tensor(6., grad_fn=<SumBackward0>)
    
    c.backward()
    
    a.grad
    
    tensor([1.2000, 1.8000, 2.8000])
    
    a.grad.data.zero_()
    
    tensor([0., 0., 0.])
    
    c=a.sum()
    c.backward()
    print(a.grad)
    
    tensor([1., 1., 1.])
    
    torch.arange(0,9).view(3,3)
    
    tensor([[0, 1, 2],
            [3, 4, 5],
            [6, 7, 8]])
    
    torch.arange(0,9).view(3,3).sum()
    
    tensor(36)
    
    • A more practical example
    x=torch.tensor([1.0,2.0,3.0,4.0],requires_grad=True) # note the values are 1.0, 2.0, ... rather than integers, otherwise the dtype would not be torch.float
    
    x.dtype
    
    torch.float32
    
    y=2*x
    z=y.view(2,2)
    print(z)
    v=torch.tensor([[1.0,0.1],[0.01,0.001]],dtype=torch.float)
    z.backward(v)
    print(x.grad)
    
    tensor([[2., 4.],
            [6., 8.]], grad_fn=<ViewBackward>)
    tensor([2.0000, 0.2000, 0.0200, 0.0020])
    
    • An example of interrupting gradient tracking
    x=torch.tensor(1.0,requires_grad=True)
    y1=x**2
    with torch.no_grad():
        y2=x**3
    y3=y1+y2
    
    print(x.requires_grad)
    print(y1,y1.requires_grad)
    print(y2,y2.requires_grad)
    print(y3,y3.requires_grad)
    
    True
    tensor(1., grad_fn=<PowBackward0>) True
    tensor(1.) False
    tensor(2., grad_fn=<AddBackward0>) True
    
    y3.backward()
    print(x.grad)
    
    tensor(2.)
    
    
    Since $$ y_3=y_1+y_2=x^2+x^3 $$, shouldn't $$ \frac{d y_3}{dx} $$ at x=1 be 5? In fact, because the definition of y2 is wrapped in `torch.no_grad()`, the gradient through y2 is not propagated back; only the gradient through y1 flows back.
    

    As mentioned above, y2.requires_grad is False, so y2.backward() cannot be called; it raises an error:

    y2.backward()


    RuntimeError Traceback (most recent call last)
    in
    ----> 1 y2.backward()

    E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253 create_graph=create_graph,
    254 inputs=inputs)
    --> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256
    257 def register_hook(self, hook):

    E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147 Variable._execution_engine.run_backward(
    148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
    --> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
    150
    151

    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

    In addition, if we want to modify a tensor's values without autograd recording the change (i.e., without affecting backpropagation), we can operate on tensor.data:

    x=torch.ones(1,requires_grad=True)
    print(x.data) # still a tensor
    print(x.data.requires_grad) # but already detached from the computation graph
    
    y=2*x
    x.data*=100 # only changes the value; not recorded in the computation graph, so gradient propagation is unaffected
    
    y.backward()
    print(x)
    print(x.grad)
    
    tensor([1.])
    False
    tensor([100.], requires_grad=True)
    tensor([2.])
    

    Notes on using reshape

    Consider $$ y=\sum_{i=1}^{n} x_i $$

    example 1:

    x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True)
    
    y=x.sum()
    print(y)
    y.backward()
    print(x.grad)
    
    tensor(15., grad_fn=<SumBackward0>)
    tensor([[1., 1., 1., 1., 1.]])
    

    example 2: deliberately add an extra step that reshapes the input

    x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True).reshape(-1,1);x
    
    tensor([[1.],
            [2.],
            [3.],
            [4.],
            [5.]], grad_fn=<ViewBackward>)
    
    y=x.sum()
    y.backward()
    print(x.grad)
    
    None
    
    
    E:\software\Anaconda\envs\pytorch_env\lib\site-packages\ipykernel\__main__.py:3: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.
      app.launch_new_instance()
    

    If reshape is applied right at creation time, the variable actually being differentiated is the tensor before the reshape, not x; that leaf tensor has no variable name bound to it, so its .grad cannot be accessed this way. The correct approach:

    x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True)
    print(x)
    z=x.reshape(-1,1)
    print(z)
    y=z.sum()
    y.backward()
    x.grad
    
    
    tensor([[1., 2., 3., 4., 5.]], requires_grad=True)
    tensor([[1.],
            [2.],
            [3.],
            [4.],
            [5.]], grad_fn=<ViewBackward>)
    
    
    
    
    
    tensor([[1., 1., 1., 1., 1.]])
    
    
    