• Stopping a TensorFlow program from taking unlimited GPU memory


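    Today I ran into something odd: while using tensorflow-gpu, I got out-of-memory errors. That would be understandable if I were training on some large dataset, but all I had written was y = W*x. The output is shown in the log below.

    The program: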

    import tensorflow as tf
    
    w = tf.Variable([[1.0,2.0]])
    b = tf.Variable([[2.],[3.]])
    
    y = tf.multiply(w,b)
    
    init_op = tf.global_variables_initializer()
    
    with tf.Session() as sess:
        sess.run(init_op)
        print(sess.run(y))
    

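    Error output:

    • Memory usage kept climbing. After the program crashed, the whole machine went down with it, because all of the GPU memory had been eaten up.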
    2018-06-10 18:28:00.263424: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    2018-06-10 18:28:00.598075: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
    name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
    pciBusID: 0000:01:00.0
    totalMemory: 6.00GiB freeMemory: 4.97GiB
    2018-06-10 18:28:00.598453: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
    2018-06-10 18:28:01.265600: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-06-10 18:28:01.265826: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
    2018-06-10 18:28:01.265971: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
    2018-06-10 18:28:01.266220: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4740 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
    2018-06-10 18:28:01.331056: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 4.63G (4970853120 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.399111: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 4.17G (4473767936 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.468293: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 3.75G (4026391040 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.533138: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 3.37G (3623751936 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.602452: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 3.04G (3261376768 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.670225: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 2.73G (2935238912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.733120: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 2.46G (2641714944 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.800101: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 2.21G (2377543424 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.862064: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 1.99G (2139789056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.925434: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 1.79G (1925810176 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:01.986180: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 1.61G (1733229056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:02.043456: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 1.45G (1559906048 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:02.103531: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 1.31G (1403915520 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:02.168973: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 1.18G (1263524096 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:02.229387: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 1.06G (1137171712 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:02.292997: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 976.04M (1023454720 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:02.356714: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 878.44M (921109248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:02.418167: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 790.59M (828998400 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-06-10 18:28:02.482394: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 711.54M (746098688 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
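
    The failed requests in the log are not random: each one looks like a smaller retry of the one before it. The sketch below is only a quick reading of the byte counts copied from this log, not a statement about how TF's allocator is implemented. The names FAILED_BYTES and backoff_ratios are made up for illustration.

```python
# Byte counts copied from the "failed to allocate" lines in the log above.
FAILED_BYTES = [
    4970853120, 4473767936, 4026391040, 3623751936, 3261376768,
    2935238912, 2641714944, 2377543424, 2139789056, 1925810176,
    1733229056, 1559906048, 1403915520, 1263524096, 1137171712,
    1023454720, 921109248, 828998400, 746098688,
]


def backoff_ratios(sizes):
    """Return sizes[i + 1] / sizes[i] for every consecutive pair."""
    return [later / earlier for earlier, later in zip(sizes, sizes[1:])]


ratios = backoff_ratios(FAILED_BYTES)

# Size of the very first request, in GiB.
print("first request: %.2f GiB" % (FAILED_BYTES[0] / 1024 ** 3))
# How much each retry shrinks relative to the previous one.
print("ratio range: %.4f to %.4f" % (min(ratios), max(ratios)))
```

    Every retry asks for roughly 90% of the previous request, and all of them still fail. So the process starts by trying to reserve nearly all of the memory at once and then keeps shrinking the request without success.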
    

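    Analyzing the cause:

      1. The GPU driver is not the latest version. Update it with a driver-management tool, or download and install the update yourself.
      2. Too many TF processes are running. Close all of them and start again.
      3. Because of how the TF core is written, it claims all of the GPU's memory by default for its own training, a "me first" policy of sorts.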
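
      To check cause 2 before starting a new run, it helps to see how much GPU memory is already in use. A minimal sketch, assuming the NVIDIA driver's nvidia-smi tool is installed and on the PATH (on Windows it may not be). The helper names parse_gpu_memory and query_gpu_memory are hypothetical.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' pairs in MiB, one GPU per line."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used/total memory (MiB) of each GPU.

    Assumes nvidia-smi is on the PATH; raises an error otherwise.
    """
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_gpu_memory(out)
```

      Calling query_gpu_memory() returns one (used, total) tuple per GPU. If most of the memory is already used before the script starts, leftover TF processes are a likely suspect.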

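      At this point there are two things to set:

      1. Which allocation mode to use: grab memory up front, or allocate on demand?
      2. The maximum share of GPU memory to use, because taking too much crashes other programs. Preferably keep it below 0.7.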
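
      To make the 0.7 figure concrete, here is a rough calculation for the card in the log above. It assumes the fraction is applied to the card's total memory (6.00 GiB here). That is my understanding of the setting, not something the log confirms. The function name memory_cap_gib is made up for illustration.

```python
def memory_cap_gib(total_gib, fraction):
    """Upper bound on GPU memory for one process, in GiB.

    Assumption: the fraction is taken of the card's total memory.
    """
    return total_gib * fraction


total_gib = 6.00  # totalMemory reported in the log
cap_gib = memory_cap_gib(total_gib, 0.7)

print("cap at 0.7: %.2f GiB (%.0f MiB)" % (cap_gib, cap_gib * 1024))
```

      Under that assumption, a fraction of 0.7 leaves nearly a third of the card for the desktop and other programs.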

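    First, update the driver:

    Then configure the TF program:

    Note: setting only one of the two options is not enough! I followed a well-known blog post online and tried that, and the result was still poor (a lot of GPU memory in use).

    TF settings:

    • Allocate on demand
    • Use at most 70% of the GPU memory

    The modified code: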

    import tensorflow as tf
    
    w = tf.Variable([[1.0,2.0]])
    b = tf.Variable([[2.],[3.]])
    
    y = tf.multiply(w,b)
    
    init_op = tf.global_variables_initializer()
    
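    config = tf.ConfigProto(allow_soft_placement=True)
    # Cap this process at 70% of the GPU memory
    config.gpu_options.per_process_gpu_memory_fraction = 0.7
    # Allocate GPU memory on demand instead of reserving it all up front
    config.gpu_options.allow_growth = True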
    with tf.Session(config=config) as sess:
        sess.run(init_op)
        print(sess.run(y))
    

    Solved:

    2018-06-10 18:21:17.532630: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    2018-06-10 18:21:17.852442: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
    name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
    pciBusID: 0000:01:00.0
    totalMemory: 6.00GiB freeMemory: 4.97GiB
    2018-06-10 18:21:17.852817: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
    2018-06-10 18:21:18.511176: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-06-10 18:21:18.511397: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
    2018-06-10 18:21:18.511544: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
    2018-06-10 18:21:18.511815: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4740 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
    [[2. 4.]
     [3. 6.]]
    


  • Original post: https://www.cnblogs.com/wjy-lulu/p/9164160.html