• CUDA 9.2 and cuDNN 7.0: Notes on the Official Documentation


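    This post walks through the official documentation and the prerequisites only; it does not cover configuration.
    For the installation itself, see my next post.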

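    Preparation (downloads)

    CUDA Toolkit 9.2 download page

    Downloads are slow from within China; my earlier post on speeding up git describes a workaround.

    cuDNN download page

    Log in with a Google account and choose the build you need:

    [Screenshot: cuDNN download page]

    Pick the first entry:

    [Screenshot: the first download entry]

    If anything is unclear, go straight to the official guides:

    Official cuDNN installation guide

    Official CUDA installation guide

    Official TensorFlow installation guide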

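    CUDA 9.2 support

    [Image: CUDA 9.2 support table]

    CUDA 9.0 support

    [Image: CUDA 9.0 support table]

    Reinstalling the graphics driver

    This step is to confirm that the graphics driver works properly.

    Installing CUDA

    System requirements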

    • A CUDA-capable GPU
    • A supported version of Linux, such as Ubuntu 16.04
    • The NVIDIA CUDA Toolkit
    • Check that you have a CUDA-capable GPU
    lspci | grep -i nvidia
    

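    If the command prints an NVIDIA device, the machine has an NVIDIA GPU; it is CUDA-capable if that model appears on NVIDIA's list of CUDA GPUs.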
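As a minimal sketch of the check above, the helper below counts NVIDIA entries in lspci-style output read from stdin. The function name count_nvidia_devices is hypothetical and not part of any NVIDIA tool; it only confirms that an NVIDIA device is present, not that the model supports CUDA.

```shell
# count_nvidia_devices: hypothetical helper; reads lspci-style text on stdin
# and prints how many lines mention NVIDIA (case-insensitive).
count_nvidia_devices() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -ci 'nvidia' || true
}

# Usage on a real machine:
#   lspci | count_nvidia_devices
```

A result of 0 means lspci reported no NVIDIA device at all.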

    • Check that the Linux release is supported
    uname -m && cat /etc/*release
    

    This prints:

    x86_64
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=16.04
    DISTRIB_CODENAME=xenial
    DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
    NAME="Ubuntu"
    VERSION="16.04.5 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.5 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial
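The same check can be scripted instead of read by eye. The sketch below assumes the os-release key/value format shown in the output above; is_ubuntu_1604 is a hypothetical helper name, and it only tests for Ubuntu 16.04 rather than for every release NVIDIA supports.

```shell
# is_ubuntu_1604 FILE: hypothetical helper; succeeds if FILE (os-release
# format, e.g. /etc/os-release) describes Ubuntu 16.04.
is_ubuntu_1604() (
    file=$1
    # Take the value after the key and strip any surrounding double quotes.
    id=$(sed -n 's/^ID=//p' "$file" | tr -d '"')
    version=$(sed -n 's/^VERSION_ID=//p' "$file" | tr -d '"')
    [ "$id" = "ubuntu" ] && [ "$version" = "16.04" ]
)

# Usage on a real machine:
#   is_ubuntu_1604 /etc/os-release && echo "Ubuntu 16.04 detected"
```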

    • Check that gcc is installed
    gcc --version
    

    This prints:

    gcc (Ubuntu 4.9.3-13ubuntu2) 4.9.3
    Copyright (C) 2015 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

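    • Check the kernel headers

    The CUDA driver requires that the kernel headers and development packages for the running kernel version be installed when the driver is installed, and again whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.

    The Runfile installation performs no package validation. The RPM and Deb driver installations, by contrast, try to install the kernel headers and development packages if no version of them is present. However, they install the latest version of these packages, which may or may not match the kernel your system is running. It is therefore best to make sure by hand that the correct version of the kernel headers and development packages is installed before installing the CUDA driver, and again whenever you change the kernel version.

    The version of the kernel your system is running can be found with the following command: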

    uname -r
    

    Install the kernel headers and development packages that match the running kernel:

    sudo apt-get install linux-headers-$(uname -r)
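To confirm that the headers for the running kernel are actually present, a small check such as the following can help. It is a sketch that assumes Ubuntu's layout, where the linux-headers-<release> package installs into /usr/src/linux-headers-<release>; headers_installed is a hypothetical helper, and its optional second argument (a root prefix) exists only to make the function easy to try against a scratch directory.

```shell
# headers_installed RELEASE [ROOT]: hypothetical helper; succeeds if the
# header directory for kernel RELEASE exists under ROOT (default: /).
headers_installed() (
    release=$1
    root=${2:-}
    # Assumed Ubuntu layout: /usr/src/linux-headers-<release>
    [ -d "$root/usr/src/linux-headers-$release" ]
)

# Usage on a real machine:
#   headers_installed "$(uname -r)" || sudo apt-get install "linux-headers-$(uname -r)"
```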
    
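    • Verify the download

    Verify a download by comparing the MD5 checksums published at

    (9.2) http://developer.nvidia.com/cuda-downloads/checksums

    (9.0) https://developer.download.nvidia.com/compute/cuda/9.0/Prod/docs/sidebar/md5sum.txt

    with the checksum of the downloaded file. If the two checksums differ, the downloaded file is corrupt and needs to be downloaded again.

    MD5 checksums for 9.0: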

    7a00187b2ce5c5e350e68882f42dd507 cuda_9.0.176_384.81_linux.run
    19369a391a7475cace0f3c377aebbecb cuda_9.0.176_mac.dmg
    fca8046970f3e27539802aa0c06a3158 cuda_9.0.176_mac_network.dmg
    ecba5d6c7d86ab5c3c3be97330ca85a0 cuda_9.0.176_win10.exe
    019d6e230bccef462fd0d7af61238c35 cuda_9.0.176_win10_network.exe
    48d85427ddb4c0eae8ee46aea9d3126e cuda_9.0.176_windows.exe
    8acb5a71367f83d7838723385742342f cuda_9.0.176_windows_network.exe
    be0f029e95881adc28cc31381207b522 cuda_cluster_pkgs_9.0.176_rhel6.tar.gz
    d0c6a642bd0f0367f90924704226e8c0 cuda_cluster_pkgs_9.0.176_rhel7.tar.gz
    95a0042baddd9a5a265e5266f45a5d17 cuda_cluster_pkgs_9.0.176_ubuntu1604.tar.gz
    d76d08b6da82e54326851d43435a0c61 cuda-repo-fedora25-9.0.176-1.x86_64.rpm
    a58a6f09f8a1d56f5e6da1d87460e867 cuda-repo-fedora25-9-0-local-9.0.176-1.x86_64.rpm
    918c8a71a00f89c5a733ef58011ad65c cuda-repo-opensuse422-9.0.176-1.x86_64.rpm
    91f5e497e119ac22d9dfa6af1a117b47 cuda-repo-opensuse422-9-0-local-9.0.176-1.x86_64.rpm
    4a1114e243d0e0908495d1ad6c92c239 cuda-repo-rhel6-9.0.176-1.x86_64.rpm
    43e548a51e3705dd51d21b9926208600 cuda-repo-rhel6-9-0-local-9.0.176-1.x86_64.rpm
    f9b8e8c0de6863e7cb1ca47c5d9b8589 cuda-repo-rhel7-9.0.176-1.ppc64le.rpm
    960afa71b11da89980990736c77e0495 cuda-repo-rhel7-9.0.176-1.x86_64.rpm
    8c9d14558cf63e077a61bf78a5bcfb2a cuda-repo-rhel7-9-0-local-9.0.176-1.ppc64le.rpm
    f78a97891c183c47b0cc187a76eaa715 cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64.rpm
    dc2784693028df4855db8b31b3937c24 cuda-repo-sles122-9.0.176-1.x86_64.rpm
    c7a2664834c20cfe75f2d1956fc87ae8 cuda-repo-sles122-9-0-local-9.0.176-1.x86_64.rpm
    4b8004dd747dbb664e85b17dec3fee51 cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
    70e7d1027e225043678e7150f7a6fd7b cuda-repo-ubuntu1604_9.0.176-1_ppc64el.deb
    e78e6ff56582f09a0cbc607049bdb2fd cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
    405d20e594b87f44fb8ed3b137e88707 cuda-repo-ubuntu1604-9-0-local_9.0.176-1_ppc64el.deb
    3544486e0b99e633572a65bfb24058ed cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
    d717d6649cbf90d32e3fd2c0c7202be0 cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64.deb
    

    MD5 checksums for 9.2:

    8254739a33574820ff27628453fe33f3 cuda_9.2.64_mac.dmg
    be64422cf5c0bd2cfaa46da7846ac69b cuda_9.2.64_mac_network.dmg
    dd6e33e10d32a29914b7700c7b3d1ca0 cuda_9.2.88_396.26_linux.run
    af15dd3f975786f458532f3e76ddabb8 cuda_9.2.88_win10.exe
    078adfc1368a4574f707f740cab0f7d0 cuda_9.2.88_win10_network.exe
    d087c023d2a2228b1677f574267b7dde cuda_9.2.88_windows.exe
    5580c6cc8fcdcfbff57c26eae8c314d4 cuda_9.2.88_windows_network.exe
    155f07336f25e6fda0a4cabe089d1714 cuda_cluster_pkgs_9.2.88_rhel6.tar.gz
    b08c07f824a42562a8381fb41a4f908d cuda_cluster_pkgs_9.2.88_rhel7.tar.gz
    92953b1d3693651acb752fa28f0cc6dc cuda_cluster_pkgs_9.2.88_ubuntu1604.tar.gz
    4843e002fd0cb5055d9c39cd00694206 cuda-repo-fedora27-9.2.88-1.x86_64.rpm
    ee2ac2f7fe6a5aad34ac2d531d0731a4 cuda-repo-fedora27-9-2-local-9.2.88-1.x86_64.rpm
    ab95956ffd611a7ebb38f550852fc309 cuda-repo-opensuse423-9.2.88-1.x86_64.rpm
    bc0d7fbf211a4b5a07b4a3b63cc89c10 cuda-repo-opensuse423-9-2-local-9.2.88-1.x86_64.rpm
    eec36b2c075269665e41149ea6c67fe6 cuda-repo-rhel6-9.2.88-1.x86_64.rpm
    1340694af5858c9bd509315dd5b1266c cuda-repo-rhel6-9-2-local-9.2.88-1.x86_64.rpm
    1121b80c9a16d58d009c042056482fb0 cuda-repo-rhel7-9.2.88-1.ppc64le.rpm
    7bf25a80b002511501af8f555f384f86 cuda-repo-rhel7-9.2.88-1.x86_64.rpm
    48f44700fbe87ba54f727bd5ddd89718 cuda-repo-rhel7-9-2-local-9.2.88-1.ppc64le.rpm
    c16d67b69f491053534abbdd35bf9696 cuda-repo-rhel7-9-2-local-9.2.88-1.x86_64.rpm
    12a708a88fbd4f99c40f001794ff585b cuda-repo-sles123-9.2.88-1.x86_64.rpm
    35dbcab7f84106673e3f1345ba3eede1 cuda-repo-sles123-9-2-local-9.2.88-1.x86_64.rpm
    62937c222edc19cb49df06978348fea8 cuda-repo-ubuntu1604_9.2.88-1_amd64.deb
    50fb1ccc35e098ff4499cbf954d66bcf cuda-repo-ubuntu1604_9.2.88-1_ppc64el.deb
    3d3e8b7920780e35b839811cc9dd5b46 cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
    465cf1ddd9d6cc186edd6f5d7c96711f cuda-repo-ubuntu1604-9-2-local_9.2.88-1_ppc64el.deb
    5cbe16157b4941289ca3eb5406633f23 cuda-repo-ubuntu1710_9.2.88-1_amd64.deb
    1a5da1a97cbd322f91e87df565a7f669 cuda-repo-ubuntu1710-9-2-local_9.2.88-1_amd64.deb
    2e222e145ef3232912679fa03a320c24 cuda_9.2.64.1_mac.dmg
    0e615d99152b9dfce5da20dfece6b7ea cuda_9.2.88.1_linux.run
    4e1a26062537287e0edb653c40a1cc29 cuda_9.2.88.1_windows.exe
    bb7ecdbac12aa2e7b485611901d1a7eb cuda-repo-fedora27-9-2-local-cublas-update-1-1.0-1.x86_64.rpm
    3125beed3d35629dbc1046fbf9505707 cuda-repo-opensuse423-9-2-local-cublas-update-1-1.0-1.x86_64.rpm
    54fe8c72d4d8f7731ad8e52fd1040342 cuda-repo-rhel6-9-2-local-cublas-update-1-1.0-1.x86_64.rpm
    b579c048ad60d79daa0eaf25733b212a cuda-repo-rhel7-9-2-local-cublas-update-1-1.0-1.ppc64le.rpm
    783b48d12726130d6e1b3ac5d4511bac cuda-repo-rhel7-9-2-local-cublas-update-1-1.0-1.x86_64.rpm
    c581b59f0e724b7936e4c099aa72cba2 cuda-repo-sles123-9-2-local-cublas-update-1-1.0-1.x86_64.rpm
    0ad9c90134f1fc2e8eb249256855c21a cuda-repo-ubuntu1604-9-2-local-cublas-update-1_1.0-1_amd64.deb
    d1761f5229b96d2fe4b8fc9654fed928 cuda-repo-ubuntu1604-9-2-local-cublas-update-1_1.0-1_ppc64el.deb
    9d1174a17f0721dc520c2c6775eba342 cuda-repo-ubuntu1710-9-2-local-cublas-update-1_1.0-1_amd64.deb
    

    To calculate the MD5 checksum of the downloaded file, run the following command:

    $ md5sum <file>
    

    I downloaded cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb, so the checksum to compare against is e78e6ff56582f09a0cbc607049bdb2fd. If you downloaded cuda_9.0.176_384.81_linux.run, the corresponding checksum is 7a00187b2ce5c5e350e68882f42dd507.
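Picking the matching line out of a long checksum list by eye is error-prone, so the comparison can be scripted. The sketch below assumes the list has been saved locally in the "checksum filename" format shown above; expected_md5 and verify_download are hypothetical helper names, and md5sum.txt in the usage line is an assumed local file name.

```shell
# expected_md5 LIST NAME: print the checksum that LIST records for NAME.
expected_md5() {
    awk -v name="$2" '$2 == name { print $1; exit }' "$1"
}

# verify_download LIST FILE: succeed only if FILE's MD5 equals the entry
# that LIST holds for FILE's basename (a missing entry counts as failure).
verify_download() (
    list=$1
    file=$2
    want=$(expected_md5 "$list" "$(basename "$file")")
    got=$(md5sum "$file" | awk '{ print $1 }')
    [ -n "$want" ] && [ "$want" = "$got" ]
)

# Usage:
#   verify_download md5sum.txt cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb \
#       && echo "checksum OK" || echo "checksum mismatch, download again"
```

Looking the entry up by file name avoids comparing against the checksum of a neighbouring line in the list.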

    Handling a previous CUDA installation

    A previously installed version conflicts with the new one and must be uninstalled first.

    Use the following command to uninstall a Toolkit runfile installation:

    $ sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
    

    Use the following command to uninstall a Driver runfile installation:

    $ sudo /usr/bin/nvidia-uninstall
    

    Use the following commands to uninstall a RPM/Deb installation:

    $ sudo yum remove <package_name>                      # Redhat/CentOS
    $ sudo dnf remove <package_name>                      # Fedora
    $ sudo zypper remove <package_name>                   # OpenSUSE/SLES
    $ sudo apt-get --purge remove <package_name>          # Ubuntu
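For runfile installations, the X.Y in the uninstaller path is the toolkit version. The sketch below lists the uninstaller scripts that exist on a machine, assuming toolkits live in /usr/local/cuda-X.Y as in the command above; list_runfile_uninstallers is a hypothetical helper, and the optional root prefix is only there so the function can be tried against a scratch directory.

```shell
# list_runfile_uninstallers [ROOT]: hypothetical helper; prints the path of
# every uninstall_cuda_X.Y.pl found under ROOT/usr/local/cuda-X.Y/bin.
list_runfile_uninstallers() (
    root=${1:-}
    for dir in "$root"/usr/local/cuda-*; do
        # Derive X.Y from the directory name, e.g. cuda-9.0 -> 9.0
        version=${dir##*/cuda-}
        script="$dir/bin/uninstall_cuda_$version.pl"
        # Report only scripts that actually exist.
        if [ -f "$script" ]; then
            printf '%s\n' "$script"
        fi
    done
)

# Usage on a real machine (prints nothing if no runfile toolkit is installed):
#   list_runfile_uninstallers
```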
    

    Installing from the downloaded deb

    Installation Instructions (as shown on the download page):
    sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
    sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
    sudo apt-get update
    sudo apt-get install cuda

    From the official guide:

    1. Install the repository metadata
    sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb
    
    2. Install the CUDA public GPG key

    When installing from the local repository:

    $ sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
    

    When installing from the network repository:

    $ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/7fa2af80.pub
    
    3. Update the apt repository cache
    sudo apt-get update
    
    4. Install CUDA
    sudo apt-get install cuda
    

    Installing from the downloaded runfile

    • Remove any existing NVIDIA driver (optional): sudo apt-get remove --purge 'nvidia*'
      Some packages needed for development (optional):
    sudo apt-get install build-essential
    sudo apt-get install vim cmake git
    sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev
    
    • Blacklist nouveau: run sudo vim /etc/modprobe.d/blacklist-nouveau.conf and add
    blacklist nouveau
    blacklist lbm-nouveau
    options nouveau modeset=0
    alias nouveau off
    alias lbm-nouveau off
    
    • Disable the nouveau kernel module
    echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
    sudo update-initramfs -u
    

    Reboot the machine: sudo reboot
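After the reboot it is worth confirming that nouveau really is no longer loaded before running the installer. A minimal sketch, assuming the usual lsmod output where the module name is the first column; nouveau_loaded is a hypothetical helper name.

```shell
# nouveau_loaded: hypothetical helper; reads lsmod-style text on stdin and
# succeeds if a module named exactly "nouveau" is listed.
nouveau_loaded() {
    # The module name is the first column; awk exits 0 only when it is found.
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage on a real machine:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded; recheck the blacklist and initramfs"
#   fi
```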

    • Temporarily stop the graphical interface
    sudo service lightdm stop
    
    • Run the installer (note the flags: the OpenGL libraries are not installed)
    sudo sh cuda_<version>_linux.run --override --no-opengl-libs
    Do you accept the previously read EULA? (accept/decline/quit): accept
    Install NVIDIA Accelerated Graphics Driver for Linux-x86_64? ((y)es/(n)o/(q)uit): y
    Install the CUDA  Toolkit? ((y)es/(n)o/(q)uit): y
    Enter Toolkit Location [ default is /usr/local/cuda-9.1 ]:
    Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
    Install the CUDA Samples? ((y)es/(n)o/(q)uit): y
    Enter CUDA Samples Location [ default is /home/user ]:
    

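    If the installation finishes without errors, reboot so that the driver takes effect. Check the driver status: nvidia-smi

    • Set the environment variables (edit .bashrc or /etc/profile; replace cuda-9.1 with the version you installed):
    $ export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
    $ export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    

    Build and run the CUDA samples (see the official guide).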

    Post-installation actions

    The PATH variable needs to include /usr/local/cuda-9.0/bin

    To add this path to the PATH variable:

    $ export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
    

    In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-9.0/lib64 on a 64-bit system, or /usr/local/cuda-9.0/lib on a 32-bit system

    • To change the environment variables for 64-bit operating systems:

      $ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
      
    • To change the environment variables for 32-bit operating systems:

      $ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
      

    Note that the above paths change when using a custom install path with the runfile installation method.
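When these exports go into .bashrc, every new nested shell prepends the same directory again. One way to avoid that is an idempotent prepend, sketched below; prepend_once is a hypothetical helper, not something the CUDA guide provides, and the cuda-9.0 paths in the usage lines assume the default install location.

```shell
# prepend_once CURRENT DIR: hypothetical helper; prints the colon-separated
# list CURRENT with DIR prepended, unless DIR is already one of its entries.
prepend_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;            # already present: unchanged
        *)        printf '%s\n' "$2${1:+:$1}" ;;   # prepend, no stray colon
    esac
}

# Usage in .bashrc (assumes the default cuda-9.0 location):
#   PATH=$(prepend_once "$PATH" /usr/local/cuda-9.0/bin); export PATH
#   LD_LIBRARY_PATH=$(prepend_once "$LD_LIBRARY_PATH" /usr/local/cuda-9.0/lib64)
#   export LD_LIBRARY_PATH
```

The ${1:+:$1} expansion plays the same role as ${PATH:+:${PATH}} in the official commands: it adds the separator only when the existing value is non-empty.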

    Verifying the installation

    • Check the driver version

    When the driver is loaded, the driver version can be found by executing the command

    $ cat /proc/driver/nvidia/version
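To compare the loaded driver against the version a given CUDA release expects, the version number can be pulled out of that file. This is a sketch: it assumes the file contains an "NVRM version:" line with the driver version as a separate dotted-number field, and driver_version is a hypothetical helper name.

```shell
# driver_version [FILE]: hypothetical helper; prints the first dotted number
# found on the "NVRM version" line of FILE (or stdin when no FILE is given).
driver_version() {
    awk '/NVRM version/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }' "$@"
}

# Usage on a real machine:
#   driver_version /proc/driver/nvidia/version
```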
    
    • Build the samples to test the installation

    The version of the CUDA Toolkit can be checked by running nvcc -V in a terminal window. The nvcc command runs the compiler driver that compiles CUDA programs. It calls the gcc compiler for C code and the NVIDIA PTX compiler for the CUDA code.

    The NVIDIA CUDA Toolkit includes sample programs in source form. You should compile them by changing to ~/NVIDIA_CUDA-9.0_Samples and typing make. The resulting binaries will be placed under ~/NVIDIA_CUDA-9.0_Samples/bin.

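    • Run the binaries

    After compiling, find and run deviceQuery under ~/NVIDIA_CUDA-9.0_Samples. On systems where SELinux is enabled, the official guide notes that you may first need to run setenforce 0 from the command line as the superuser.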

    Running the bandwidthTest program ensures that the system and the CUDA-capable device are able to communicate correctly.
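Both deviceQuery and bandwidthTest end their output with a "Result = PASS" line when they succeed, which makes the check easy to script. A minimal sketch; sample_passed is a hypothetical helper name, and the binary location in the usage lines is left for the reader to fill in.

```shell
# sample_passed [FILE]: hypothetical helper; succeeds if the sample output
# (FILE, or stdin when no FILE is given) contains a "Result = PASS" line.
sample_passed() {
    grep -q '^Result = PASS' "$@"
}

# Usage on a real machine, from the directory that holds the built binaries:
#   ./deviceQuery   | sample_passed && echo "deviceQuery OK"
#   ./bandwidthTest | sample_passed && echo "bandwidthTest OK"
```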

    Installing cuDNN

    Unzip the cuDNN package.

    $ tar -xzvf cudnn-<>.tgz
    

    Copy the following files into the CUDA Toolkit directory (mind the paths).

    $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    $ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
    $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
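After copying, the installed cuDNN version can be read back from the header. The sketch below assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros, as the cuDNN 7 header does; cudnn_version is a hypothetical helper name.

```shell
# cudnn_version HEADER: hypothetical helper; prints MAJOR.MINOR.PATCHLEVEL
# as defined by the CUDNN_* macros in HEADER.
cudnn_version() (
    header=$1
    # get_define NAME: print the value from the line "#define NAME <value>".
    get_define() {
        awk -v name="$1" '$1 == "#define" && $2 == name { print $3; exit }' "$header"
    }
    printf '%s.%s.%s\n' \
        "$(get_define CUDNN_MAJOR)" \
        "$(get_define CUDNN_MINOR)" \
        "$(get_define CUDNN_PATCHLEVEL)"
)

# Usage on a real machine:
#   cudnn_version /usr/local/cuda/include/cudnn.h
```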
    

    Build and run the CUDA samples (see the official guide).

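    How to approach problems

    Main problems: the NVIDIA driver fails to install (or the graphical interface breaks after installing it), or the driver installs but does not match the version CUDA requires.

    1. Be cautious with the newest NVIDIA driver and with the additional drivers offered by Ubuntu
    2. Prefer the GPU driver bundled with CUDA over installing a driver separately
    3. Follow the official guide when installing
    4. Tutorials written for older versions are of little use
    5. Stop the graphical interface before installing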

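    Troubleshooting

    1. If the installer reports "The driver installation is unable to locate the kernel source.", check the kernel version with uname -r. If it is newer than 4.10 (4.13.0, for example), the driver bundled with CUDA 9.1 does not support that kernel. Downgrade by removing the newer kernel image and headers: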

      sudo apt-get purge linux-image-4.13.0-26-generic
      sudo apt-get purge linux-headers-4.13.0-26-generic
      

      Update initramfs image:

      sudo update-initramfs -u
      

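      Reboot, then run uname -r to confirm that the kernel version is now correct.

    2. If the kernel version is confirmed to be fine and the error in (1) still appears, try installing the following packages:

      sudo apt-get install linux-source
      apt-get source linux-image-$(uname -r)
      sudo apt-get install linux-headers-$(uname -r)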
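The "newer than 4.10" test in item 1 can be made explicit. The sketch below compares only the MAJOR.MINOR part of the kernel release and relies on sort -V from GNU coreutils; kernel_newer_than is a hypothetical helper, and the 4.10 limit is the figure quoted above for the CUDA 9.1 bundled driver, not something this function verifies.

```shell
# kernel_newer_than RELEASE LIMIT: hypothetical helper; succeeds if the
# MAJOR.MINOR part of RELEASE is strictly greater than LIMIT (e.g. 4.10).
kernel_newer_than() (
    # Keep only MAJOR.MINOR, e.g. 4.13.0-26-generic -> 4.13
    release=$(printf '%s\n' "$1" | cut -d. -f1,2)
    limit=$2
    # Equal versions are not "newer".
    [ "$release" != "$limit" ] || exit 1
    # sort -V orders version strings numerically, so 4.4 sorts before 4.10.
    newest=$(printf '%s\n%s\n' "$release" "$limit" | sort -V | tail -n 1)
    [ "$newest" = "$release" ]
)

# Usage on a real machine:
#   kernel_newer_than "$(uname -r)" 4.10 && echo "kernel may be too new for the bundled driver"
```

A plain string comparison would rank 4.4 above 4.10, which is why version sort is used here.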
      

    TensorFlow version compatibility

    [Image: TensorFlow version compatibility table]

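    Reference

    Official cuDNN installation guide

    Official CUDA installation guide (9.2)

    CUDA 9.0

    Official TensorFlow installation guide

    A fairly recent blog post

    A well-organized blog post (recommended)

    Version compatibility

    CUDA installation and testing

    Uninstalling an existing CUDA and installing a newer version

    Changing the cuDNN version

    How to check on Ubuntu whether the graphics card is installed correctly

    Install NVIDIA Driver and CUDA.md

    Installing the NVIDIA driver

    Quickly setting up a deep learning environment (part 1)

    Quickly setting up a deep learning environment (part 2)

    Later

    Installing Caffe2

    install caffe2 with anaconda

    Installing Facebook's latest network

    Blog recommended by my teacher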

  • Original article: https://www.cnblogs.com/pprp/p/9540500.html