• [tf.keras] Resource exhausted: OOM when allocating tensor with shape [9216,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc


    Running code similar to the following, which rebuilds the model on every loop iteration:

    while True:
        inputs, outputs = get_AlexNet()
        model = tf.keras.Model(inputs=inputs, outputs=outputs)
        
        model.summary()
    
        adam_opt = tf.keras.optimizers.Adam(learning_rate)
        # The compile step specifies the training configuration.
        model.compile(optimizer=adam_opt, loss='categorical_crossentropy', metrics=['accuracy'])
    
        # load weights from h5 file
        model.load_weights('alexnet_weights.h5')
    

    eventually fails with this error:

    OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[9216,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    

    Solution:

    # Import the backend bundled with tf.keras rather than the standalone keras package
    from tensorflow.keras import backend as K
    K.clear_session()
    

    For example:

    import tensorflow as tf
    from tensorflow.keras import backend as K
    
    while True:
        # Free the memory held by the previous model to prevent OOM
        K.clear_session()
    
        inputs, outputs = get_AlexNet()
        model = tf.keras.Model(inputs=inputs, outputs=outputs)
        
        model.summary()
    
        adam_opt = tf.keras.optimizers.Adam(learning_rate)
        # The compile step specifies the training configuration.
        model.compile(optimizer=adam_opt, loss='categorical_crossentropy', metrics=['accuracy'])
    
        # load weights from h5 file
        model.load_weights('alexnet_weights.h5')
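
The effect of `clear_session()` on Keras global state can be observed without a GPU. The following is a minimal sketch, not the original training script: `build_small_model` is a hypothetical stand-in for `get_AlexNet()`, and the auto-generated layer name is used only as a visible proxy for the accumulated state. It does not measure GPU memory.

```python
import tensorflow as tf


def build_small_model():
    """Hypothetical stand-in for get_AlexNet(): a tiny functional model."""
    inputs = tf.keras.Input(shape=(8,))
    hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(3, activation="softmax")(hidden)
    return inputs, outputs


def first_dense_names(rounds, clear):
    """Rebuild the model `rounds` times and record the first Dense layer's name."""
    names = []
    for _ in range(rounds):
        if clear:
            # Drop the state left behind by the previous iteration's model
            tf.keras.backend.clear_session()
        inputs, outputs = build_small_model()
        model = tf.keras.Model(inputs=inputs, outputs=outputs)
        model.compile(optimizer="adam", loss="categorical_crossentropy")
        # layers[0] is the input layer, layers[1] is the first Dense layer
        names.append(model.layers[1].name)
    return names


tf.keras.backend.clear_session()  # start from a clean slate
names_without_clear = first_dense_names(3, clear=False)
names_with_clear = first_dense_names(3, clear=True)
print(names_without_clear)
print(names_with_clear)
```

Without the call, the name counter keeps growing across iterations, which indicates that earlier models are still registered in the global state. With the call, every iteration starts from the same name.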
    

    The full error log follows:

    2019-06-03 21:54:24.789150: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 144.00MiB.  Current allocation summary follows.
    2019-06-03 21:54:24.804684: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (256): 	Total Chunks: 243, Chunks in use: 243. 60.8KiB allocated for chunks. 60.8KiB in use in bin. 6.6KiB client-requested in use in bin.
    2019-06-03 21:54:24.813190: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (512): 	Total Chunks: 19, Chunks in use: 19. 14.3KiB allocated for chunks. 14.3KiB in use in bin. 14.3KiB client-requested in use in bin.
    2019-06-03 21:54:24.841197: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (1024): 	Total Chunks: 52, Chunks in use: 52. 62.5KiB allocated for chunks. 62.5KiB in use in bin. 60.6KiB client-requested in use in bin.
    2019-06-03 21:54:24.843308: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (2048): 	Total Chunks: 2, Chunks in use: 2. 5.0KiB allocated for chunks. 5.0KiB in use in bin. 3.0KiB client-requested in use in bin.
    2019-06-03 21:54:24.844847: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (4096): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-06-03 21:54:24.846267: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (8192): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-06-03 21:54:24.848125: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (16384): 	Total Chunks: 31, Chunks in use: 31. 511.0KiB allocated for chunks. 511.0KiB in use in bin. 496.0KiB client-requested in use in bin.
    2019-06-03 21:54:24.849356: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (32768): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-06-03 21:54:24.850511: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (65536): 	Total Chunks: 16, Chunks in use: 16. 1.43MiB allocated for chunks. 1.43MiB in use in bin. 1.42MiB client-requested in use in bin.
    2019-06-03 21:54:24.852015: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (131072): 	Total Chunks: 23, Chunks in use: 23. 3.72MiB allocated for chunks. 3.72MiB in use in bin. 3.46MiB client-requested in use in bin.
    2019-06-03 21:54:24.863147: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (262144): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-06-03 21:54:24.864633: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (524288): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-06-03 21:54:24.865992: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (1048576): 	Total Chunks: 17, Chunks in use: 17. 21.15MiB allocated for chunks. 21.15MiB in use in bin. 19.92MiB client-requested in use in bin.
    2019-06-03 21:54:24.867384: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (2097152): 	Total Chunks: 52, Chunks in use: 52. 144.75MiB allocated for chunks. 144.75MiB in use in bin. 137.86MiB client-requested in use in bin.
    2019-06-03 21:54:24.868803: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (4194304): 	Total Chunks: 3, Chunks in use: 3. 17.16MiB allocated for chunks. 17.16MiB in use in bin. 10.13MiB client-requested in use in bin.
    2019-06-03 21:54:24.870144: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-06-03 21:54:24.871061: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (16777216): 	Total Chunks: 3, Chunks in use: 2. 62.97MiB allocated for chunks. 42.20MiB in use in bin. 37.19MiB client-requested in use in bin.
    2019-06-03 21:54:24.871849: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (33554432): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-06-03 21:54:24.874994: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (67108864): 	Total Chunks: 21, Chunks in use: 21. 1.40GiB allocated for chunks. 1.40GiB in use in bin. 1.31GiB client-requested in use in bin.
    2019-06-03 21:54:24.875718: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (134217728): 	Total Chunks: 20, Chunks in use: 20. 2.98GiB allocated for chunks. 2.98GiB in use in bin. 2.81GiB client-requested in use in bin.
    2019-06-03 21:54:24.876800: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-06-03 21:54:24.877455: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:646] Bin for 144.00MiB was 128.00MiB, Chunk State: 
    2019-06-03 21:54:24.877906: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000B03E00000 of size 1280
    2019-06-03 21:54:24.878316: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000B03E00500 of size 256
    2019-06-03 21:54:24.879415: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000B03E00600 of size 256
    2019-06-03 21:54:24.879816: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000B03E00700 of size 256
    ...
    2019-06-03 21:54:24.998647: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 256733696 totalling 244.84MiB
    2019-06-03 21:54:24.998857: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:678] Sum Total of in-use chunks: 4.60GiB
    2019-06-03 21:54:24.999076: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:680] Stats: 
    Limit:                  4965636505
    InUse:                  4943860224
    MaxInUse:               4943860224
    NumAllocs:                 2362778
    MaxAllocSize:            516972544
    
    2019-06-03 21:54:24.999520: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:279] ********x************************************************************************x*****************x
    2019-06-03 21:54:25.001526: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1275] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[9216,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    2019-06-03 21:54:25.108672: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 372.96MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    2019-06-03 21:54:25.129713: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 482.40MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    2019-06-03 21:54:25.145367: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 331.52MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    Traceback (most recent call last):
      File "E:/PycharmProjects/ActiveLearning/AlexNet_AL.py", line 156, in <module>
        validation_data=(x_val, y_val))
      File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1363, in fit
        validation_steps=validation_steps)
      File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 264, in fit_loop
        outs = f(ins_batch)
      File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\keras\backend.py", line 2914, in __call__
        fetched = self._callable_fn(*array_vals)
      File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1382, in __call__
        run_metadata_ptr)
      File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[9216,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    	 [[Node: training_4/Adam/gradients/dense/kernel/Regularizer/Square_grad/Mul_1 = Mul[T=DT_FLOAT, _class=["loc:@training_4/Adam/gradients/AddN_5"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense/kernel/Regularizer/Square/ReadVariableOp, training_4/Adam/gradients/dense/kernel/Regularizer/Square_grad/Mul)]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
    	 [[Node: metrics_4/acc/Mean/_1023 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1423_metrics_4/acc/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
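
Two figures in the log can be checked by hand: the size of the tensor that failed to allocate, and how much of the allocator's limit was still free at that moment. The sketch below uses only numbers copied from the log. It assumes "type float" means 32-bit float (TensorFlow's `DT_FLOAT`), and the "free" figure ignores fragmentation, so the largest contiguous free chunk may be smaller still.

```python
# Figures copied from the error log above.
shape = (9216, 4096)          # tensor shape in the OOM message
bytes_per_float32 = 4         # "type float" is assumed to be 32-bit (DT_FLOAT)
limit_bytes = 4965636505      # "Limit" in the allocator stats
in_use_bytes = 4943860224     # "InUse" in the allocator stats

MIB = 1024 ** 2
GIB = 1024 ** 3

requested_bytes = shape[0] * shape[1] * bytes_per_float32
free_bytes = limit_bytes - in_use_bytes

print(f"requested: {requested_bytes / MIB:.2f} MiB")  # 144.00 MiB
print(f"in use:    {in_use_bytes / GIB:.2f} GiB")     # 4.60 GiB
print(f"free:      {free_bytes / MIB:.2f} MiB")       # 20.77 MiB
```

The requested 144.00 MiB matches the first line of the log, and the 4.60 GiB in use matches the "Sum Total of in-use chunks" line. With only about 20.77 MiB left under the limit, the allocation cannot succeed, which is consistent with models from earlier loop iterations still occupying GPU memory.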
    

    References

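    Keras解决OOM超内存问题 (Solving out-of-memory (OOM) problems in Keras) -- silent56_th
    Keras 循环训练模型跑数据时内存泄漏的问题解决办法 (Fixing the memory leak when training Keras models in a loop) -- jemmie_w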

  • Original article: https://www.cnblogs.com/wuliytTaotao/p/10970519.html