• [PyTorch][continuously updated] A running list of PyTorch pitfalls


    1. A BatchNorm layer needs more than one sample per batch
    File "/home/user02/wildkid1024/haq/models/mobilenet.py", line 71, in forward
        x = self.features(x)
      File "/home/user02/anaconda2/envs/py3_dl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/user02/anaconda2/envs/py3_dl/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
        input = module(input)
      File "/home/user02/anaconda2/envs/py3_dl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/user02/anaconda2/envs/py3_dl/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
        input = module(input)
      File "/home/user02/anaconda2/envs/py3_dl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/user02/wildkid1024/haq/lib/utils/utils.py", line 244, in lambda_forward
        return m.old_forward(x)
      File "/home/user02/anaconda2/envs/py3_dl/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 76, in forward
        exponential_average_factor, self.eps)
      File "/home/user02/anaconda2/envs/py3_dl/lib/python3.6/site-packages/torch/nn/functional.py", line 1619, in batch_norm
        raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
    ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
    

    Analysis: the model uses batch normalization, and during mini-batch training the last batch can be left with a single sample. For example, with 17 samples in the dataset and a batch_size of 8, the final batch contains only 1 sample; BatchNorm cannot compute statistics from a single value per channel, so this error is raised.
    Solutions: 1. Set the DataLoader's drop_last parameter to True. 2. Manually discard the trailing batch when only one sample is left. 3. If this happens during validation, call model.eval(), which makes the BN layers use their running statistics instead of batch statistics. 4. If training really must run with a single sample, replace BatchNorm with InstanceNorm.
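    A minimal sketch of fixes 1 and 3 above, using a synthetic dataset with the same 17-sample / batch_size-8 shape as the example:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# 17 samples with batch_size=8 leaves a final batch of 1 sample,
# which crashes BatchNorm in training mode.
dataset = TensorDataset(torch.randn(17, 3, 8, 8), torch.randint(0, 2, (17,)))

# Fix 1: drop_last=True silently discards the incomplete final batch.
loader = DataLoader(dataset, batch_size=8, drop_last=True)
print(sum(1 for _ in loader))  # 2 full batches instead of 3

# Fix 3: in eval mode BN uses running statistics, so batch size 1 is fine.
bn = nn.BatchNorm2d(3)
bn.eval()
out = bn(torch.randn(1, 3, 1, 1))  # no error in eval mode
print(out.shape)
```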

    2. Calling backward() on a tensor that was never marked as differentiable
      File "/home/user02/anaconda2/envs/py3_dl/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/user02/anaconda2/envs/py3_dl/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
    

    Analysis: the tensors involved were created without requires_grad=True, so no computation graph (grad_fn) is attached to them and backward() has nothing to differentiate.
    Solution: check whether model.eval() or torch.no_grad() is in effect and remove it if so; otherwise mark the input as differentiable, e.g. x.requires_grad = True (note the spelling: requires_grad, not required_grad).
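    A short sketch of the failure and the fix:

```python
import torch

x = torch.randn(3)            # requires_grad defaults to False
y = (x * 2).sum()
print(y.requires_grad)        # False: y.backward() here would raise
                              # "element 0 of tensors does not require grad"

# Fix: mark the input as differentiable before building the graph.
x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()
y.backward()
print(x.grad)                 # gradient of sum(2*x) is 2 for each element
```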

    3. After training for one epoch, the model cannot continue: the DataLoader hangs, or the program exits with a non-zero status
      Analysis: this is related to PyTorch's multi-process data loading; worker processes can deadlock while reading data.
      Solutions: 1. Check whether the data is read with cv2.imread; switching to PIL's Image is recommended, or disable OpenCV's internal multithreading with cv2.setNumThreads(0) and cv2.ocl.setUseOpenCL(False). 2. Set num_workers to 0, at the cost of slower data loading; if you do not want to set it to 0, set pin_memory=True so pinned memory is pre-allocated.
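    A sketch of the safer DataLoader configuration described above (the dataset here is synthetic; the cv2 calls only apply if OpenCV is actually used in your pipeline, so they are shown as comments):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# If OpenCV is used anywhere in the data pipeline, disable its internal
# threading before the DataLoader spawns workers:
#   import cv2
#   cv2.setNumThreads(0)
#   cv2.ocl.setUseOpenCL(False)

dataset = TensorDataset(torch.randn(32, 3, 8, 8))

# Fix 2: num_workers=0 loads data in the main process (slower, but no
# worker deadlocks); pin_memory=True pre-allocates pinned host memory.
loader = DataLoader(dataset, batch_size=8, num_workers=0, pin_memory=True)

for (batch,) in loader:      # iterates cleanly, no worker processes involved
    pass
print(len(loader))           # 32 samples / batch_size 8 = 4 batches
```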
  • Original post: https://www.cnblogs.com/wildkid1024/p/12842272.html