• [Note] Moving numpy operations onto PyTorch tensors can speed them up


    For simplicity, only matrix addition and broadcasting were tested; other operations were not tried.

    The conclusions so far:

    • Converting numpy arrays to PyTorch tensors gives a speedup (0.22s -> 0.12s)
    • Loading the tensors onto the GPU speeds things up much more (0.22s -> 0.0005s), but the time spent copying between host memory and GPU memory is not negligible

    The conclusions hold in both environments tested:

    • Win10, 64 bit
    • Ubuntu 18.04, 64 bit

    However, according to a colleague who tried this under the Windows 10 Linux subsystem (WSL), converting numpy to PyTorch tensors was reportedly slower than plain numpy there; the suspicion is that this is an artifact of the subsystem implementation.

    The verification procedure follows.

    import time
    import numpy as np
    import torch
    
    print(torch.__version__)
    
    1.4.0
    
    def check_time(func, run_times=10):
        t = time.time()
        for i in range(run_times):
            func()
        print('avg time = %s sec' % ((time.time()-t)/run_times))
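
    One caveat about this helper, which is reused for the GPU tests below: CUDA kernels launch asynchronously, so time.time() can return before the GPU has actually finished the work. A synchronized variant (a sketch, not used for the numbers in this note) would look like this:

    def check_time_cuda(func, run_times=10):
        torch.cuda.synchronize()  # wait for any pending GPU work first
        t = time.time()
        for i in range(run_times):
            func()
        torch.cuda.synchronize()  # block until queued kernels finish
        print('avg time = %s sec' % ((time.time()-t)/run_times))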
    
    shape = (5000,5000)
    # np.float was removed in numpy 1.24; np.float64 is the same dtype
    a = np.ones(shape, dtype=np.float64)
    b = np.ones(shape, dtype=np.float64)
    k = np.ones((shape[0],1), dtype=np.float64)
    
    # - plain numpy ndarray addition
    
    def test_np_1():
        c = a+b
        return c
    
    check_time(test_np_1)
    
    avg time = 0.21692438125610353 sec
    
    # - numpy ndarray addition with broadcasting
    
    def test_np_2():
        c = a+b+k
        return c
    
    check_time(test_np_2)
    
    avg time = 0.45278918743133545 sec
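
    Part of the extra cost here is that a+b+k materializes two full 5000x5000 temporaries. A sketch (not part of the original experiment) that reuses the first result's buffer via numpy's out= parameter:

    def test_np_2_inplace():
        c = np.add(a, b)     # one full-size temporary
        np.add(c, k, out=c)  # broadcast k into the same buffer
        return c

    check_time(test_np_2_inplace)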
    
    # - use pytorch tensor
    
    def test_torch_1():
        ta = torch.from_numpy(a)
        tb = torch.from_numpy(b)
        tc = ta+tb
        c = tc.numpy()
        return c
    
    check_time(test_torch_1)
    
    avg time = 0.11778402328491211 sec
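
    Worth noting: torch.from_numpy does not copy data; the tensor shares memory with the source array, and .numpy() on a CPU tensor shares memory as well, so the conversions themselves are nearly free. A quick check (a sketch, not in the original):

    x = np.zeros((2, 2))
    tx = torch.from_numpy(x)
    tx[0, 0] = 1.0     # write through the tensor...
    print(x[0, 0])     # ...prints 1.0: the numpy array sees the change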
    
    # - use pytorch tensor and broadcast
    
    def test_torch_2():
        ta = torch.from_numpy(a)
        tb = torch.from_numpy(b)
        tk = torch.from_numpy(k)
        tc = ta+tb+tk
        c = tc.numpy()
        return c
    
    check_time(test_torch_2)
    
    avg time = 0.2651021957397461 sec
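
    As in the numpy case, each + above allocates a fresh temporary. An in-place variant (a sketch, not from the original; it must clone first, because from_numpy shares memory with a) saves one allocation:

    def test_torch_2_inplace():
        tc = torch.from_numpy(a).clone()  # clone so a is not modified
        tc.add_(torch.from_numpy(b))      # in-place add, no new buffer
        tc.add_(torch.from_numpy(k))      # broadcasting works in-place too
        return tc.numpy()

    check_time(test_torch_2_inplace)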
    
    # - check gpu
    
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)

    cuda:0

    ga = torch.from_numpy(a).float().to(device)
    gb = torch.from_numpy(b).float().to(device)
    gk = torch.from_numpy(k).float().to(device)
    
    # - try tensor on gpu
    
    def test_torch_cuda_1():
        ca = torch.from_numpy(a).float().to(device)
        cb = torch.from_numpy(b).float().to(device)
        cc = ca+cb
        c = cc.cpu().numpy()
        return c
    
    check_time(test_torch_cuda_1)
    
    avg time = 0.44039239883422854 sec
    
    # - try tensor on gpu and broadcast
    
    def test_torch_cuda_2():
        ca = torch.from_numpy(a).float().to(device)
        cb = torch.from_numpy(b).float().to(device)
        ck = torch.from_numpy(k).float().to(device)
        cc = ca+cb+ck
        c = cc.cpu().numpy()
        return c
    
    check_time(test_torch_cuda_2)
    
    avg time = 0.4477779150009155 sec
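
    Both GPU tests above are dominated by transfers: each call copies two or three arrays to the device and the 5000x5000 result back, roughly 100 MB each way as float32. (Note also that .float() casts to float32, so these runs are not strictly like-for-like with the float64 CPU tests.) A sketch, not in the original, that times just the host-to-device copies:

    def test_copy_only():
        ca = torch.from_numpy(a).float().to(device)
        cb = torch.from_numpy(b).float().to(device)
        torch.cuda.synchronize()  # make sure the copies have completed
        return ca, cb

    check_time(test_copy_only)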
    
    # - tensor on gpu with broadcasting; preloaded to gpu before the call, result not copied back to cpu
    
    def test_torch_cuda_3():
        cc = ga+gb+gk
        return cc
    
    check_time(test_torch_cuda_3)
    
    avg time = 0.0004986286163330078 sec
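
    One caveat about this last number: test_torch_cuda_3 never synchronizes and never reads the result back, so 0.0005s largely measures kernel launch overhead rather than the addition itself. A fairer measurement (a sketch) forces the GPU to finish before the clock is read:

    def test_torch_cuda_3_sync():
        cc = ga+gb+gk
        torch.cuda.synchronize()  # block until the kernels actually finish
        return cc

    check_time(test_torch_cuda_3_sync)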
  • Original post: https://www.cnblogs.com/journeyonmyway/p/12520818.html