• Installing the GeForce GTX 1080 driver on Ubuntu 16.04: problems encountered and how to solve them


    1. After installing the GPU in the machine, check the device:

    $ nvidia-smi
    Tue Dec  5 10:36:43 2017       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
    |  0%   34C    P8     8W / 200W |    284MiB /  8112MiB |      1%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |    0      1008    G   /usr/lib/xorg/Xorg                             117MiB |
    |    0      1614    G   compiz                                         155MiB |
    |    0      1886    G   fcitx-qimpanel                                   9MiB |
    +-----------------------------------------------------------------------------+

    The system has detected the GeForce GTX 1080.

    Note also that this machine previously carried a GTX 1060; the output above shows its driver, NVIDIA 375.66, is still installed. For the GTX 1080, the driver to install is NVIDIA 367.27:

    $ sudo add-apt-repository ppa:graphics-drivers/ppa
    $ sudo apt-get update

    When prompted with Y/n during this process, just press Enter to continue.

    Then install the nvidia-367 driver:

    $ sudo apt-get install nvidia-367

    At this step, a conflict with the previously installed nvidia-375 driver causes an error:

    Building initial module for 4.10.0-32-generic
    ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-384.0.crash'
    Error! Bad return status for module build on kernel: 4.10.0-32-generic (x86_64)
    Consult /var/lib/dkms/nvidia-384/384.98/build/make.log for more information.
    dpkg: error processing package nvidia-384 (--configure):
     subprocess installed post-installation script returned error exit status 10
    dpkg: dependency problems prevent configuration of libcuda1-384:
     libcuda1-384 depends on nvidia-384 (>= 384.98); however:
      Package nvidia-384 is not configured yet.
    
    dpkg: error processing package libcuda1-384 (--configure):
     dependency problems - leaving unconfigured
    dpkg: dependency problems prevent configuration of nvidia-367:
     nvidia-367 depends on nvidia-384; however:
      Package nvidia-384 is not configured yet.
    
    dpkg: error processing package nvidia-367 (--configure):
     dependency problems - leaving unconfigured
    dpkg: dependency problems prevent configuration of nvidia-opencl-icd-384:
     nvidia-opencl-icd-384 depends on nvidia-384 (>= 384.98); however:
      Package nvidia-384 is not configured yet.
    
    dpkg: error processing package nvidia-opencl-icd-384 (--configure):
     dependency problems - leaving unconfigured
    Setting up nvidia-prime (0.8.2) ...
    No apport report written because the error message indicates its a followup error from a previous failure.
    No apport report written because the error message indicates its a followup error from a previous failure.
    No apport report written because MaxReports is reached already
    Processing triggers for libc-bin (2.23-0ubuntu9) ...
    Processing triggers for initramfs-tools (0.122ubuntu8.8) ...
    update-initramfs: Generating /boot/initrd.img-4.10.0-32-generic
    Errors were encountered while processing:
     nvidia-384
     nvidia-375
     libcuda1-384
     libcuda1-375
     nvidia-367
     nvidia-opencl-icd-384
     nvidia-opencl-icd-375
    E: Sub-process /usr/bin/dpkg returned an error code (1)

    To fix this, first remove the old driver:

    $ sudo apt-get remove --purge nvidia-375
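    If leftovers from several driver versions are present (the error above involves both nvidia-375 and nvidia-384 packages), a broader purge may be safer. A sketch, assuming the standard Ubuntu package naming:

```shell
# Purge every installed NVIDIA driver package, not just nvidia-375. The glob is
# quoted so that apt-get, not the shell, matches it against package names.
sudo apt-get remove --purge 'nvidia-*'
# Drop now-unneeded dependencies such as libcuda1-375 / libcuda1-384.
sudo apt-get autoremove --purge
```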

    Then check the log file to see why the kernel module build failed:

    $ vim /var/lib/dkms/nvidia-384/384.98/build/make.log
    ......
     CONFTEST: drm_atomic_available
     CONFTEST: drm_atomic_modeset_nonblocking_commit_available
     CONFTEST: is_export_symbol_gpl_refcount_inc
     CONFTEST: is_export_symbol_gpl_refcount_dec_and_test
      CC [M]  /var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-instance.o
      CC [M]  /var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-gpu-numa.o
    cc: error: unrecognized command line option ‘-fstack-protector-strong’
    scripts/Makefile.build:294: recipe for target '/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-instance.o' failed
    make[2]: *** [/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-instance.o] Error 1
    make[2]: *** Waiting for unfinished jobs....
      CC [M]  /var/lib/dkms/nvidia-384/384.98/build/nvidia/nv.o
      CC [M]  /var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-frontend.o
    cc: error: unrecognized command line option ‘-fstack-protector-strong’
    scripts/Makefile.build:294: recipe for target '/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-gpu-numa.o' failed
    make[2]: *** [/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-gpu-numa.o] Error 1
    cc: error: unrecognized command line option ‘-fstack-protector-strong’
    cc: error: unrecognized command line option ‘-fstack-protector-strong’
    scripts/Makefile.build:294: recipe for target '/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-frontend.o' failed
    make[2]: *** [/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-frontend.o] Error 1
    scripts/Makefile.build:294: recipe for target '/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv.o' failed
    make[2]: *** [/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv.o] Error 1
    Makefile:1524: recipe for target '_module_/var/lib/dkms/nvidia-384/384.98/build' failed
    make[1]: *** [_module_/var/lib/dkms/nvidia-384/384.98/build] Error 2
    make[1]: Leaving directory '/usr/src/linux-headers-4.10.0-32-generic'
    Makefile:81: recipe for target 'modules' failed
    make: *** [modules] Error 2

    A web search shows that the '-fstack-protector-strong' option was only added in gcc 4.9, which means gcc 4.9 or later is needed for the module to compile.

    Running gcc -v shows this machine has gcc 4.8, confirming the compiler version is the problem, so upgrade gcc to 4.9:

    $ sudo apt-get install gcc-4.9
    $ cd /usr/bin/
    $ sudo ln -sf /usr/bin/gcc-4.9 /usr/bin/gcc
    $ gcc -v
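    As an alternative to overwriting the /usr/bin/gcc symlink by hand, Debian's update-alternatives mechanism keeps both compilers installed and switchable. A sketch (the priorities 10 and 20 are arbitrary; the higher one becomes the default):

```shell
# Register both compilers; gcc-4.9 wins because it has the higher priority.
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 20
gcc --version   # should now report 4.9.x
```

    Later, `sudo update-alternatives --config gcc` switches back interactively if some other build still needs 4.8.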

    Then continue with the driver installation:

    $ sudo apt-get install nvidia-367
    $ sudo apt-get install mesa-common-dev
    $ sudo apt-get install freeglut3-dev

    Finally, reboot the system so the GTX 1080 driver takes effect.

    2. Downloading and installing CUDA 8 (which supports the GTX 1080)

    (This machine already had CUDA 8 installed, so here we go straight to testing; I will redo the installation on a fresh machine when I have time and update this section.)

    3. Testing

    nvidia-smi now reports the driver as nvidia-384 (some people see nvidia-367 here; although the displayed version differs, the build output above shows that nvidia-367 depends on nvidia-384, and the tests and subsequent use all worked, so it makes no difference):

    $ nvidia-smi
    Tue Dec  5 15:27:51 2017       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.98                 Driver Version: 384.98                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
    | 33%   62C    P2   139W / 200W |   7898MiB /  8112MiB |     57%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1008      G   /usr/lib/xorg/Xorg                           188MiB |
    |    0      1508      G   compiz                                       110MiB |
    |    0      4491      C   python                                      7587MiB |
    +-----------------------------------------------------------------------------+

    Sample test 1:

    $ cd NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
    $ make
    $ ./deviceQuery
    ./deviceQuery Starting...
    
     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "GeForce GTX 1080"
      CUDA Driver Version / Runtime Version          9.0 / 8.0
      CUDA Capability Major/Minor version number:    6.1
      Total amount of global memory:                 8113 MBytes (8506769408 bytes)
      (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
      GPU Max Clock rate:                            1848 MHz (1.85 GHz)
      Memory Clock rate:                             5005 Mhz
      Memory Bus Width:                              256-bit
      L2 Cache Size:                                 2097152 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080
    Result = PASS
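    As a quick sanity check on the deviceQuery numbers: for compute capability 6.1, each multiprocessor has 128 CUDA cores, so the total core count is simply SMs × cores per SM:

```shell
# 20 multiprocessors x 128 cores/MP, matching the "(20) Multiprocessors,
# (128) CUDA Cores/MP: 2560 CUDA Cores" line above.
awk 'BEGIN { print 20 * 128 }'   # → 2560
```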

    Sample test 2:

    $ cd NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
    $ make
    $ ./nbody -benchmark -numbodies=256000 -device=0
    Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance) 
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation) 
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)
    
    NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
    
    > Windowed mode
    > Simulation data stored in video memory
    > Single precision floating point simulation
    > 1 Devices used for simulation
    gpuDeviceInit() CUDA Device [0]: "GeForce GTX 1080
    > Compute 6.1 CUDA device: [GeForce GTX 1080]
    number of bodies = 256000
    256000 bodies, total time for 10 iterations: 2981.761 ms
    = 219.790 billion interactions per second
    = 4395.792 single-precision GFLOP/s at 20 flops per interaction
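    The benchmark's two summary lines follow directly from the raw numbers: each of the 10 iterations evaluates N² pairwise interactions, and each interaction is counted as 20 flops. Reproducing that arithmetic:

```shell
# interactions/s = N^2 * iterations / elapsed time; GFLOP/s = that * 20 flops.
awk 'BEGIN {
    n = 256000; iters = 10; t = 2.981761   # seconds, from the run above
    ips = n * n * iters / t
    printf "%.3f billion interactions per second\n", ips / 1e9
    printf "%.3f GFLOP/s\n", ips * 20 / 1e9
}'
# → 219.790 billion interactions per second
# → 4395.792 GFLOP/s
```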

    4. Checking the GPU's working state

    Simply use the nvidia-smi command.

    To display it periodically, for example refreshing the GPU status every 10 s:

    $ watch -n 10 nvidia-smi

    The main fields to watch are temperature, memory usage, and GPU utilization.

    Also attached: a breakdown of the nvidia-smi command output.
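    For unattended monitoring, nvidia-smi can also print just the key fields as CSV at a fixed interval through its query interface (the available field names are listed by `nvidia-smi --help-query-gpu`):

```shell
# Append timestamp, temperature, memory use and utilization every 10 seconds.
nvidia-smi --query-gpu=timestamp,temperature.gpu,memory.used,memory.total,utilization.gpu \
           --format=csv -l 10 >> gpu_stats.csv
```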

    ======================================================================================

    Addendum (2018-02-03)

    A problem encountered recently after installing a GTX 1060 in another server:

    ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

    Cause: the machine has CUDA 8.0 (and its matching cuDNN) installed, while tensorflow-gpu 1.5 requires CUDA 9.0.
    Fix: roll tensorflow-gpu back to version 1.4:

    pip install tensorflow-gpu==1.4 -i https://pypi.tuna.tsinghua.edu.cn/simple gevent
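    To catch this mismatch up front, check which CUDA toolkit is installed before choosing a tensorflow-gpu version. A minimal sketch, assuming the stock /usr/local/cuda/version.txt file (a single line such as `CUDA Version 8.0.61`):

```shell
# Third whitespace-separated field is the version; int() keeps the major part.
cuda_major=$(awk '{ print int($3) }' /usr/local/cuda/version.txt)
if [ "$cuda_major" -lt 9 ]; then
    echo "CUDA $cuda_major detected: use tensorflow-gpu==1.4, not 1.5"
fi
```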

    References:

    Deep learning workstation setup: Ubuntu 16.04 + Nvidia GTX 1080 + CUDA 8.0

    Updating gcc/g++ to 4.9.2 on Ubuntu 16.04

    Monitoring NVIDIA GPU usage on Linux

    Commands for viewing real-time GPU status

    http://blog.csdn.net/w5688414/article/details/79187499

  • Original post: https://www.cnblogs.com/bymo/p/7987415.html