• Notes on errors encountered while using PyTorch


    • Error 1
    Traceback (most recent call last):
      File "train.py", line 131, in <module>
        for _, (input_images, ground_truths, masks) in enumerate(data_loader):
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
        data = self._next_data()
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
        return self._process_data(data)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
        data.reraise()
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
        raise self.exc_type(msg)
    OSError: Caught OSError in DataLoader worker process 3.
    Original Traceback (most recent call last):
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
        data = fetcher.fetch(index)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/guoxiefan/PyTorch/ImageInpainting/LBAM/src/dataset.py", line 76, in __getitem__
        ground_truth = self.image_files_transforms(image.convert('RGB'))
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/PIL/Image.py", line 873, in convert
        self.load()
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/PIL/ImageFile.py", line 247, in load
        "(%d bytes not processed)" % len(b)
    OSError: image file is truncated (16 bytes not processed)
    

    Solution: [Link]
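
The linked solution is not reproduced here. One commonly used workaround for this PIL error, shown as a minimal sketch, is to let Pillow load truncated files by setting `ImageFile.LOAD_TRUNCATED_IMAGES` before any image is opened. The helper name `load_rgb` is hypothetical and only mirrors the `image.convert('RGB')` call from the traceback. This assumes that a partially decoded image is acceptable as training data; otherwise the damaged file should be removed or re-downloaded.

```python
import io

from PIL import Image, ImageFile

# Assumption: tolerating partially written image files is acceptable here.
# With this flag set, Pillow stops decoding at the end of the available data
# instead of raising "OSError: image file is truncated".
ImageFile.LOAD_TRUNCATED_IMAGES = True


def load_rgb(path_or_file):
    """Hypothetical helper: open an image and convert it to RGB."""
    with Image.open(path_or_file) as image:
        # convert() forces the pixel data to load and returns a new image.
        return image.convert('RGB')
```

The flag is process-wide, so it should be set in the module that defines the dataset so that every DataLoader worker picks it up on import.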

    • Error 2
    Traceback (most recent call last):
      File "train.py", line 136, in <module>
        outputs = generator(input_images, masks)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 148, in forward
        inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 159, in scatter
        return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 36, in scatter_kwargs
        inputs = scatter(inputs, target_gpus, dim) if inputs else []
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
        res = scatter_map(inputs)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
        return list(zip(*map(scatter_map, obj)))
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
        return Scatter.apply(target_gpus, None, dim, obj)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
        outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/cuda/comm.py", line 147, in scatter
        return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
    RuntimeError: cuda runtime error (60) : peer mapping resources exhausted at /opt/conda/conda-bld/pytorch_1579022051443/work/aten/src/THC/THCGeneral.cpp:141
    

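    Solution: for an nn.Module wrapped in nn.DataParallel, the arguments passed to it should generally be plain numbers (scalars) or raw data tensors of shape [B * C * H * W]. How the arguments are passed is tied to the parallelism, so this needs special care: nn.DataParallel splits its inputs along the B (batch) dimension and sends one slice to each GPU.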
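
As a minimal sketch of the calling convention described above, the example below wraps a small module in `nn.DataParallel` and passes it two tensors that both carry the batch dimension first. `TinyGenerator` is a hypothetical stand-in for the real generator, and the tensor sizes are made up for illustration. On a machine without GPUs, `nn.DataParallel` simply calls the wrapped module directly, so the sketch also runs on CPU.

```python
import torch
import torch.nn as nn


class TinyGenerator(nn.Module):
    """Hypothetical stand-in for the generator in train.py."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(4, 3, kernel_size=3, padding=1)

    def forward(self, input_images, masks):
        # Each replica sees only its own slice of the batch (dim 0).
        return self.conv(torch.cat([input_images, masks], dim=1))


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
generator = nn.DataParallel(TinyGenerator()).to(device)

# Both arguments are [B, C, H, W] tensors with the same batch size B = 8,
# so they are split consistently across the available GPUs.
input_images = torch.randn(8, 3, 32, 32, device=device)
masks = torch.ones(8, 1, 32, 32, device=device)

outputs = generator(input_images, masks)  # gathered back to [8, 3, 32, 32]
```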

    • Problem 3
    Traceback (most recent call last):
      File "train.py", line 123, in <module>
        for _, (input_images, ground_truths, masks) in enumerate(data_loader):
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
        data = self._next_data()
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
        idx, data = self._get_data()
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 808, in _get_data
        success, data = self._try_get_data()
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
        data = self._data_queue.get(timeout=timeout)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/queues.py", line 113, in get
        return _ForkingPickler.loads(res)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
        fd = df.detach()
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
        return reduction.recv_handle(conn)
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
        return recvfds(s, 1)[0]
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
        len(ancdata))
    RuntimeError: received 0 items of ancdata
    

    Solution: [Link] [Link]
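
The linked solutions are not reproduced here. A workaround often suggested for "received 0 items of ancdata" is to raise the per-process limit on open file descriptors, since DataLoader workers pass tensors to the main process through file descriptors. The sketch below does this with the standard `resource` module on Unix-like systems; the function name `raise_open_file_limit` and the target of 4096 are assumptions chosen for illustration.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft open-file limit toward `target` (hypothetical helper).

    Returns the soft limit in effect afterwards. The soft limit can never
    exceed the hard limit, so `target` is capped accordingly.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, nothing to do
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


# Call once at the top of the training script, before creating the DataLoader.
limit = raise_open_file_limit()

# Another commonly suggested option (requires torch) is to switch the
# tensor sharing strategy away from file descriptors:
#   import torch.multiprocessing
#   torch.multiprocessing.set_sharing_strategy('file_system')
```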

    Traceback (most recent call last):
      File "train.py", line 124, in <module>
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 808, in _get_data
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/queues.py", line 113, in get
      File "/data/guoxiefan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 310, in rebuild_storage_filename
    RuntimeError: unable to open shared memory object </torch_24388_2219814394> in read-write mode
    

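    Avoid re-creating model objects (and then moving them to the GPU with .cuda()) inside functions that are called repeatedly. For example, a VGG model used for feature extraction is best created once and then passed in as an argument.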
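
The advice above can be sketched as follows: the feature extractor is built and moved to the device once, and the loss function receives it as a parameter instead of constructing it on every call. `FeatureExtractor` is a hypothetical, much smaller stand-in for VGG, and `feature_loss` is an illustrative name, not a function from the original project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureExtractor(nn.Module):
    """Hypothetical stand-in for a VGG feature extractor."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)


def feature_loss(extractor, outputs, ground_truths):
    # The extractor is passed in; nothing is created or moved to the GPU here.
    return F.l1_loss(extractor(outputs), extractor(ground_truths))


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Build the extractor once, outside the training loop, and freeze it.
extractor = FeatureExtractor().to(device).eval()
for param in extractor.parameters():
    param.requires_grad_(False)

for _ in range(3):  # stand-in for the training loop
    outputs = torch.rand(2, 3, 32, 32, device=device)
    ground_truths = torch.rand(2, 3, 32, 32, device=device)
    loss = feature_loss(extractor, outputs, ground_truths)
```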

    • Problem 4

    A writer created with writer = SummaryWriter(log_dir) must be closed with close(); otherwise a "too many open files" error can occur.
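
One way to make sure `close()` is always called is a `try`/`finally` block, sketched below. This assumes the `SummaryWriter` from `torch.utils.tensorboard` (the original post does not say which package was used), and the temporary `log_dir` and the logged values are placeholders for illustration.

```python
import tempfile

from torch.utils.tensorboard import SummaryWriter

log_dir = tempfile.mkdtemp()  # placeholder log directory for this sketch

writer = SummaryWriter(log_dir)
try:
    for step in range(3):  # stand-in for the training loop
        writer.add_scalar('loss', 1.0 / (step + 1), step)
finally:
    # Releases the event file handle even if training raises an exception.
    writer.close()
```

Creating one writer per run and reusing it keeps the number of open event files constant, whereas creating a new writer inside a frequently called function leaks a file handle each time unless it is closed.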

  • Original post: https://www.cnblogs.com/solvit/p/12397546.html