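00. Introduction to CUDA
CUDA is NVIDIA's parallel computing platform. It uses the parallel processing capability of the GPU to accelerate deep learning and other compute-intensive applications.
01. CPU+GPU Cooperative Architecture
02. Deployment Environment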
[docker@lab-250 ~]$ cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.0 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.0"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.0 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.0:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION=7.0
Red Hat Enterprise Linux Server release 7.0 (Maipo)
Red Hat Enterprise Linux Server release 7.0 (Maipo)
[docker@lab-250 ~]$ uname -r
3.10.0-123.el7.x86_64
[docker@lab-250 ~]$ uname -a
Linux lab-250 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Note: a GPU card must be physically installed in the server.
03. Download the CUDA Toolkit
https://developer.nvidia.com/cuda-toolkit-archive
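CUDA Toolkit 9.0 (Sept 2017), Online Documentation // This lab uses this version. Download the installer that matches your system; the local (all-in-one) installer package is recommended.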
https://developer.nvidia.com/cuda-toolkit
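Note: in the installation below, the system was RHEL 7.0 but was mistakenly treated as CentOS 7.0, so some RPMs were not installed and had to be downloaded separately. When the package matches the system version, no extra RPM downloads are normally needed.
cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64-rpm # Also used for CentOS 7: CentOS is an open-source distribution based on RHEL 7, so the package name says rhel7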
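Because the mix-up described above came from misidentifying the distribution, it can help to read the ID and VERSION_ID fields before choosing a package. The following is a minimal sketch, not an official NVIDIA tool: the function names os_release_field and cuda_repo_tag are hypothetical, and the rhel/centos mapping only reflects the naming noted above.

```shell
#!/bin/sh
# Print the value of KEY from an os-release style file (KEY=value or KEY="value").
# Usage: os_release_field KEY [FILE]   (FILE defaults to /etc/os-release)
os_release_field() {
    key=$1
    file=${2:-/etc/os-release}
    # Keep only the line starting with "KEY=", drop the key and surrounding quotes.
    sed -n "s/^${key}=//p" "$file" | sed 's/^"//; s/"$//' | head -n 1
}

# Hypothetical helper: derive the distro tag that appears in the CUDA repo
# package name (e.g. "rhel7") from the ID and the major part of VERSION_ID.
cuda_repo_tag() {
    id=$(os_release_field ID "$1")
    ver=$(os_release_field VERSION_ID "$1")
    case "$id" in
        rhel|centos) echo "rhel${ver%%.*}" ;;   # CentOS uses the rhel-named package
        *)           echo "unsupported" ;;
    esac
}
```

Calling cuda_repo_tag with no argument reads /etc/os-release on the current machine; any other distribution is reported as "unsupported" rather than guessed.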
04. Setup
Installation Instructions:
rpm -i cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64.rpm
yum clean all && yum makecache
yum install cuda
Other installation options are available in the form of meta-packages.
For example, to install all the library packages, replace "cuda" with the "cuda-libraries-9-0" meta package
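Note: the CUDA installation detects the NVIDIA card automatically; there is no need to set the NVIDIA card as the default display adapter beforehand.
Error handling: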
https://mirrors.aliyun.com/epel/7/aarch64/Packages/d/dkms-2.6.1-1.el7.noarch.rpm
https://mirrors.aliyun.com/centos/7.6.1810/os/x86_64/Packages/libvdpau-1.1.1-3.el7.x86_64.rpm
--> Finished Dependency Resolution
Error: Package: 1:nvidia-kmod-384.81-2.el7.x86_64 (cuda-9-0-local)
Requires: dkms
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
[root@lab-250 ~]# rz -E
rz waiting to receive.
[root@lab-250 ~]# rpm -ivh dkms-2.6.1-1.el7.noarch.rpm
warning: dkms-2.6.1-1.el7.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
error: Failed dependencies:
elfutils-libelf-devel is needed by dkms-2.6.1-1.el7.noarch
[root@lab-250 ~]#
[root@lab-250 ~]# yum install -y elfutils-libelf-devel
Resolving Dependencies
--> Running transaction check
---> Package elfutils-libelf-devel.x86_64 0:0.158-3.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
[root@lab-250 ~]# rpm -ivh dkms-2.6.1-1.el7.noarch.rpm
warning: dkms-2.6.1-1.el7.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:dkms-2.6.1-1.el7 ################################# [100%]
[root@lab-250 ~]#
[root@lab-250 ~]# yum install -y cuda
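The dependency errors above surfaced one at a time. A small pre-check can list everything that is missing before running the install. This is only a sketch: missing_pkgs and the PKG_QUERY override are names made up for illustration, and the query defaults to rpm -q, which returns a non-zero status for a package that is not installed.

```shell
#!/bin/sh
# Print every package from the argument list that is not installed.
# PKG_QUERY is an assumed override hook (default: "rpm -q") so the check
# can be exercised on machines without rpm.
missing_pkgs() {
    for pkg in "$@"; do
        # Left unquoted on purpose so "rpm -q" splits into command + option.
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example (run manually):
#   missing_pkgs dkms elfutils-libelf-devel libvdpau
```

The package names in the example are the ones that came up in the log above; an empty result means all of them are already installed.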
05. Set Environment Variables
/usr/local/cuda-9.0 # default installation location
vim /etc/profile
export CUDA_HOME="/usr/local/cuda-9.0"
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
source /etc/profile
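One thing to watch with the exports above: every time /etc/profile is sourced, the same directories are prepended again, and if LD_LIBRARY_PATH was empty the result ends in a ":" (an empty entry, which is typically treated as the current directory). A possible alternative, as a sketch only, is a small helper that prepends a directory once; prepend_path is a hypothetical name, not part of CUDA or the shell.

```shell
#!/bin/sh
# Prepend DIR to the colon-separated list held in variable VAR,
# unless it is already there. Leaves no trailing ":" when VAR is empty.
# Usage: prepend_path VAR DIR
prepend_path() {
    var=$1
    dir=$2
    eval "cur=\${$var-}"                          # current value, or empty if unset
    case ":$cur:" in
        *":$dir:"*) ;;                            # already present: leave unchanged
        *) eval "$var=\$dir\${cur:+:\$cur}" ;;    # add ":" only if cur is non-empty
    esac
    export "$var"
}

CUDA_HOME="/usr/local/cuda-9.0"
export CUDA_HOME
prepend_path PATH "$CUDA_HOME/bin"
prepend_path LD_LIBRARY_PATH "$CUDA_HOME/lib64"
```

Sourcing this twice leaves PATH and LD_LIBRARY_PATH unchanged the second time.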
[docker@lab-250 ~]$ nvcc -V # verify the environment variables
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
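If the check needs to be scripted, the release number can be pulled out of the nvcc -V output shown above. A minimal sketch, assuming the "release X.Y," wording stays as in that output; nvcc_release is a hypothetical helper name.

```shell
#!/bin/sh
# Read `nvcc -V` output on stdin and print the toolkit release number,
# i.e. the X.Y in a line containing "release X.Y,".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Example (run manually):
#   nvcc -V | nvcc_release
```

Nothing is printed if no line matches, so an empty result can be treated as "nvcc output not recognized".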
[docker@lab-250 ~]$ nvidia-smi # show this machine's GPU information; the failure below is because the test machine has no GPU card installed
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
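The nvidia-smi message alone does not say whether the card is missing or only the driver is not loaded. One way to tell the two apart is to combine the nvidia-smi exit status with the PCI device listing. The sketch below is illustrative only: gpu_diagnosis is a made-up name, and matching the word "nvidia" in the lspci output is a simplifying assumption.

```shell
#!/bin/sh
# Classify the GPU situation from two inputs:
#   stdin : output of `lspci`
#   $1    : exit status of `nvidia-smi`
gpu_diagnosis() {
    smi_status=$1
    if ! grep -qi 'nvidia'; then
        echo "no NVIDIA device on the PCI bus"
    elif [ "$smi_status" -ne 0 ]; then
        echo "NVIDIA device present, but the driver is not responding"
    else
        echo "NVIDIA device and driver OK"
    fi
}

# Example (run manually):
#   nvidia-smi >/dev/null 2>&1; status=$?
#   lspci | gpu_diagnosis "$status"
```

On the test machine used here, the expected outcome would be the first message, since no GPU card was installed.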
References:
https://baijiahao.baidu.com/s?id=1610852365402771191&wfr=spider&for=pc
https://www.jianshu.com/p/34a504af8d51