• Troubleshooting log | Reinstalling the NVIDIA driver and CUDA on deepin 15.10


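    Problem description:

    nvidia-smi prints its usual output, so the GPU driver is installed, and nvcc reports CUDA 9.0 as expected (not 9.1). Even so, PyTorch cannot use CUDA: torch.cuda.is_available() returns False, as the check further down shows. I could not find the cause, so I simply reinstalled everything.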

    sudo tee /proc/acpi/bbswitch <<<ON
    # ON
    nvidia-smi
    

    Output:

    Tue May 28 22:21:07 2019       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
    | N/A   50C    P0    N/A /  N/A |      0MiB /  2004MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
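
    The driver version in the nvidia-smi header can be compared against the minimum driver that CUDA 9.0 needs. The following is a minimal sketch, assuming the header format shown above. The names parse_driver_version, driver_supports_cuda_9_0 and MIN_DRIVER_FOR_CUDA_9_0 are hypothetical, and the 384.81 minimum is taken from NVIDIA's CUDA 9.0 release notes for Linux (the same number appears in the runfile name used later in this log).

```python
import re

# Minimum Linux driver for CUDA 9.0 (assumption taken from NVIDIA's release
# notes; verify it against the notes for the exact release you install).
MIN_DRIVER_FOR_CUDA_9_0 = (384, 81)


def parse_driver_version(smi_output):
    """Return the driver version found in nvidia-smi text as a tuple of ints, or None."""
    match = re.search(r"Driver Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_supports_cuda_9_0(smi_output):
    """True if the reported driver is at least the assumed CUDA 9.0 minimum."""
    version = parse_driver_version(smi_output)
    return version is not None and version >= MIN_DRIVER_FOR_CUDA_9_0


# Header line copied from the nvidia-smi output above.
header = "| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |"
print(parse_driver_version(header), driver_supports_cuda_9_0(header))
```

    For the output above, 390.67 is newer than 384.81, so under this assumption the driver itself was not too old for CUDA 9.0.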
    
    nvcc --version
    

    Output:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2017 NVIDIA Corporation
    Built on Fri_Sep__1_21:08:03_CDT_2017
    Cuda compilation tools, release 9.0, V9.0.176
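
    To compare the toolkit version with what a framework expects, the release number can be pulled out of the nvcc text. This is a small sketch that assumes the "release X.Y, VX.Y.Z" wording shown above; parse_nvcc_release is a hypothetical helper name.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (release, build) strings from `nvcc --version` text, or None."""
    match = re.search(r"release\s+([0-9.]+),\s*V([0-9.]+)", nvcc_output)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Last line of the nvcc output above.
sample = "Cuda compilation tools, release 9.0, V9.0.176"
print(parse_nvcc_release(sample))
```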
    
    lspci | grep -i nvidia
    

    Output:

    01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 950M] (rev a2)
    

    Check whether PyTorch can use CUDA:

    python -c 'import torch; print(torch.cuda.is_available())'
    

    Output:

    False
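
    A bare False says little about the cause. The sketch below gathers a few more facts from PyTorch itself, such as which CUDA version the installed build was compiled against. It is not part of the original steps; torch_cuda_report is a hypothetical helper, and it only uses torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count() and torch.cuda.get_device_name().

```python
def torch_cuda_report():
    """Collect facts that often explain a False from torch.cuda.is_available()."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the PyTorch build targets; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in torch_cuda_report().items():
    print(f"{key}: {value}")
```

    If built_with_cuda is None, the installed PyTorch is a CPU-only build and reinstalling the driver or CUDA will not change the result.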
    

    Uninstall CUDA

    sudo /usr/local/cuda-9.0/bin/uninstall_cuda_9.0.pl  
    # After this, only the cuDNN files remain; they can be deleted as well.
    sudo rm -rf /usr/local/cuda-9.0/
    

    Uninstall the NVIDIA driver and Bumblebee

    sudo apt-get remove --purge nvidia-cuda-dev nvidia-cuda-toolkit nvidia-nsight nvidia-visual-profiler
    sudo apt autoremove --purge bumblebee-nvidia nvidia-driver nvidia-settings
    

    Install the GPU driver and Bumblebee

    sudo apt-get install nvidia-smi
    sudo apt-get install bumblebee-nvidia nvidia-driver nvidia-settings
    

    Install the test utilities for the GPU driver

    sudo apt-get install mesa-utils
    

    Show information about the NVIDIA GPU:

    optirun glxinfo|grep NVIDIA
    

    Run the test program

    optirun glxgears -info
    

    The GPU driver is invoked successfully; the output is:

    GL_RENDERER   = GeForce GTX 950M/PCIe/SSE2
    GL_VERSION    = 4.6.0 NVIDIA 390.67
    GL_VENDOR     = NVIDIA Corporation
    

    Install CUDA

    Download the runfile (local) installer from:
    https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

    sudo ./cuda_9.0.176_384.81_linux.run
    

    During installation, answer no only to this prompt, since the driver was already installed through apt:

    Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
    (y)es/(n)o/(q)uit: n
    

    Download and install cuDNN

    <https://developer.nvidia.com/rdp/cudnn-archive>

    Log in and download the release that matches your CUDA version. I chose:

    cudnn-9.0-linux-x64-v7.5.0.56

    From inside the extracted archive directory, copy the cuDNN libraries and headers into the corresponding CUDA directories:

    sudo cp lib64/* /usr/local/cuda/lib64/
    sudo cp include/* /usr/local/cuda/include/
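
    After copying, it can help to confirm which cuDNN version the installed header declares. This is a hedged sketch that assumes a cuDNN 7.x style cudnn.h with CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines; parse_cudnn_version is a hypothetical helper and the sample text is an illustrative excerpt, not the real file.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines of a cuDNN header, or None."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


# Illustrative excerpt only. On the machine, read the real header instead, e.g.
# open("/usr/local/cuda/include/cudnn.h").read()
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
"""
print(parse_cudnn_version(sample))
```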
    

    Then check the environment variables and switch the NVIDIA GPU on

    # Check LD_LIBRARY_PATH and PATH (no sudo needed for your own ~/.bashrc)
    vim ~/.bashrc
    
    # Power on the NVIDIA GPU through bbswitch (used by Bumblebee)
    sudo tee /proc/acpi/bbswitch <<<ON
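
    The environment check can also be done programmatically instead of reading ~/.bashrc by eye. This is a minimal sketch, assuming CUDA lives under /usr/local/cuda as in the cp commands above; missing_cuda_entries and its cuda_home parameter are hypothetical names.

```python
import os


def missing_cuda_entries(env, cuda_home="/usr/local/cuda"):
    """Return the CUDA directories absent from PATH / LD_LIBRARY_PATH in env."""
    wanted = {
        "PATH": f"{cuda_home}/bin",
        "LD_LIBRARY_PATH": f"{cuda_home}/lib64",
    }
    missing = {}
    for variable, directory in wanted.items():
        # Split the colon-separated list and ignore trailing slashes.
        entries = [e.rstrip("/") for e in env.get(variable, "").split(":") if e]
        if directory not in entries:
            missing[variable] = directory
    return missing


# An empty dict means both directories are already on the search paths.
print(missing_cuda_entries(dict(os.environ)))
```

    If the variables point at /usr/local/cuda-9.0 rather than the /usr/local/cuda symlink, pass cuda_home="/usr/local/cuda-9.0".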
    

    Check again whether PyTorch can use CUDA

    python -c "import torch;print(torch.cuda.is_available())"
    

    Output:

    True
    

    Check whether TensorFlow can use the GPU

    python3 -c "import tensorflow as tf;print(tf.test.is_gpu_available());print(tf.test.gpu_device_name())"
    

    Output:

    2019-05-28 22:52:25.862539: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-05-28 22:52:26.319239: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-05-28 22:52:26.319674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
    name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
    pciBusID: 0000:01:00.0
    totalMemory: 1.96GiB freeMemory: 1.92GiB
    2019-05-28 22:52:26.319696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
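
    The log lines above are easy to misread, so listing the GPU devices explicitly can be clearer. This is a sketch only: tensorflow_gpu_names is a hypothetical helper, and it relies on device_lib.list_local_devices() from the tensorflow.python.client module, which is widely used but is not a documented public API.

```python
def tensorflow_gpu_names():
    """Return the GPU device names TensorFlow can see, or None if TensorFlow is missing."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Each entry describes one local device; keep only the GPUs.
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


print(tensorflow_gpu_names())
```

    An empty list means TensorFlow loaded but found no usable GPU.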
    

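    Everything works now. It probably does not get more involved than this: a full uninstall followed by a full reinstall, with both procedures recorded above.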

  • Original article: https://www.cnblogs.com/ManWingloeng/p/10941075.html