Setting Up a Deep Learning Environment

A memo on configuring a deep learning environment.

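Installing Ubuntu 16.04

I recommend Linux: it saves you some trouble, and if you intend to make a living as a programmer, learning Linux is unavoidable anyway.

There are many Linux distributions, and the exact choice does not matter much. Still, a stable and widely used one is the safer bet, for example Ubuntu 16.04 LTS.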

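First make a bootable USB drive, then partition the disks. Ideally put the system on an SSD and add a 4 TB mechanical hard drive. One possible layout:

An EFI system partition, around 200 MB to 300 MB
A swap partition, around 32 GB
The remaining space formatted as ext4 and mounted at the root directory /
Finally, mount the mechanical hard drive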
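
Mounting the mechanical drive permanently usually means adding a line to /etc/fstab. A minimal sketch, assuming the data partition is ext4 and should appear at /data; the mount point, the placeholder UUID and the helper name fstab_line are illustrative assumptions, not part of the original setup:

```shell
# fstab_line UUID MOUNTPOINT: print an /etc/fstab entry for an ext4 data partition.
# Hypothetical helper; the real UUID is reported by `sudo blkid`.
fstab_line() {
    printf 'UUID=%s %s ext4 defaults 0 2\n' "$1" "$2"
}

# Example with a placeholder UUID (replace it with the value from blkid).
fstab_line "REPLACE-WITH-UUID" "/data"
```

Append the printed line to /etc/fstab, create the mount point with sudo mkdir -p /data, and run sudo mount -a to confirm that the entry mounts without errors.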

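After the system is installed, a few more steps are needed:

Switch the default package sources to a faster mirror
Install the Sogou input method, which makes typing Chinese much easier
Install the OpenSSH server, since you will certainly connect remotely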
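
Switching the package source comes down to rewriting the host names in /etc/apt/sources.list. A minimal sketch, assuming the stock file points at archive.ubuntu.com (possibly with a country prefix such as cn.archive.ubuntu.com); the mirror URL and the helper name switch_mirror are examples chosen for illustration:

```shell
# switch_mirror FILE MIRROR: point every *archive.ubuntu.com entry in FILE at MIRROR.
# A backup of the original is kept as FILE.bak.
switch_mirror() {
    sed -i.bak "s|http://[a-z.]*archive\.ubuntu\.com|$2|g" "$1"
}

# Try it on a scratch copy first instead of the live /etc/apt/sources.list.
demo=$(mktemp)
echo "deb http://cn.archive.ubuntu.com/ubuntu/ xenial main restricted" > "$demo"
switch_mirror "$demo" "https://mirrors.tuna.tsinghua.edu.cn"
cat "$demo"
```

On the real system the same function would be run with sudo against /etc/apt/sources.list, followed by sudo apt-get update. The SSH server from the list is installed with sudo apt-get install openssh-server.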

There is more one could tinker with, but that is enough for the machine setup.

Libraries That Often Cause Version Problems

A few libraries cause trouble regularly, notably numpy, cuda/cudnn and protobuf.

numpy

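NumPy is the indispensable Python library for scientific computing, and nearly every deep learning framework depends on it. At the time of writing it has reached version 1.16.

Tip: when NumPy breaks, it is almost always a version problem. Frameworks do not necessarily support the latest release, so avoid installing the newest version and pin a specific version yourself.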
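
Pinning means passing an exact version to pip (for example pip install numpy==1.16.4, where the version number is only an illustration) and then checking that the installed version is not newer than what the framework supports. A small sketch of such a check, assuming GNU sort with the -V option is available; version_le is a hypothetical helper name:

```shell
# version_le A B: succeed when version A is less than or equal to version B.
# Relies on the version sort of GNU coreutils (sort -V), so 1.9.0 < 1.16.0.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example: is an installed 1.16.4 within a framework's (assumed) upper bound of 1.17.0?
if version_le "1.16.4" "1.17.0"; then
    echo "numpy version is within the supported range"
else
    echo "numpy is too new; reinstall a pinned version"
fi
```

The installed version can be read with python -c 'import numpy; print(numpy.__version__)'.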

cuda/cudnn

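Deep learning needs a GPU. I see many people still using CUDA 8, but I recommend installing CUDA 10; the recently released TensorFlow 2.0 requires CUDA 10. Install the NVIDIA driver first, then CUDA.

Note: the NVIDIA driver must be installed before CUDA; see the section "NVIDIA Environment Configuration".

Official download link: https://developer.nvidia.com/cuda-downloads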

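# Installing the GPU driver

1. Download the Linux .run installer from the official site
2. cd xxx/xxx  # e.g. if the installer is in usr/ser, enter it with cd usr/ser
3. sudo service lightdm stop  # stop the graphical interface
4. Ctrl+Alt+F1  # if the screen stays black for a long time, press this key combination to reach a text console
5. cd xxx/xxx  # in the console, enter the directory that holds the installer
6. sudo bash NVIDIA-Linux-x86_64-xx.xx.run --no-opengl-files

# Press Enter to accept the default at each prompt; you end up back at the command line
# --no-opengl-files installs only the driver files and skips the OpenGL files. This is the most important option
7. sudo reboot  # restart the machine
8. nvidia-smi or nvidia-settings  # prints the driver details, which confirms the installation succeeded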

protobuf

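When I first installed Caffe, protobuf version problems gave me a lot of trouble. Run protoc --version to check the current version; on a stock system it should be 2.6.1, which is rather old.

I suggest installing version 3.4 or later into a directory of your own, isolated from the system copy, and pointing packages such as Caffe at that copy when they are compiled.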

https://github.com/protocolbuffers/protobuf/releases
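
One way to keep a self-built protobuf separate from the system copy is to install it under its own prefix and expose it through environment variables. A sketch, assuming a release tarball (which ships a configure script) has already been unpacked and that $HOME/opts/protobuf is the chosen prefix; both the path and the function name build_protobuf are illustrative:

```shell
# Hypothetical install prefix, kept apart from /usr and /usr/local.
PROTOBUF_HOME="$HOME/opts/protobuf"

# build_protobuf SRC_DIR: configure, build and install protobuf into $PROTOBUF_HOME.
# Only defined here; call it with the unpacked source directory when ready.
build_protobuf() {
    ( cd "$1" && ./configure --prefix="$PROTOBUF_HOME" && make -j4 && make install )
}

# Make the private copy visible to the shell, the dynamic linker and pkg-config.
export PATH="$PROTOBUF_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$PROTOBUF_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PKG_CONFIG_PATH="$PROTOBUF_HOME/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
```

Once build_protobuf has finished, protoc --version should report the new version, and the include and lib directories under the prefix are the ones to list in Caffe's INCLUDE_DIRS and LIBRARY_DIRS.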

Common Software

Next you can install opencv, caffe, tensorflow, pytorch and anaconda.

OpenCV

I wrote a separate post on installing OpenCV: "Configuring the OpenCV Environment on Ubuntu 16.04".

There is also a video tutorial: http://space.bilibili.com/365916694?

NVIDIA Environment Configuration

nvidia-smi

Set up the NVIDIA driver. Official download link: https://www.nvidia.com/Download/index.aspx?lang=cn

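# From the command line: list the detected devices and recommended drivers
sudo ubuntu-drivers devices

# The command above may fail with:
The program 'ubuntu-drivers' is currently not installed. You can install it by typing:
sudo apt install ubuntu-drivers-common

# Switch to a text console with Ctrl + Alt + F1
# The X server has to be stopped before the NVIDIA driver can be installed
sudo service lightdm stop

# The command above may fail with:
Failed to stop lightdm.service: Unit lightdm.service not loaded.
how-to-install-nvidia-run

# Install command
# Install only the driver files, not the OpenGL files
sudo ./NVIDIA*.run --no-opengl-files
# Restart the X server
sudo service lightdm start
# Check the result of the driver installation
nvidia-smi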

Ubuntu 16.04: Stopping the X Server

What Is LightDM in Ubuntu (repost)

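Common NVIDIA Driver Installation Problems

Dual-GPU laptop: the login screen loops endlessly and the desktop never loads
- Ordinary laptops use the integrated GPU for video output by default. If the OpenGL files are not excluded during installation, the system keeps trying to use Ubuntu's default nouveau driver, which has already been disabled
- sudo ./NVIDIA*.run --no-opengl-files installs only the driver files, not the OpenGL files
The Nouveau Kernel driver is currently in use by your system
- Disable Ubuntu's default nouveau driver
- sudo vim /etc/modprobe.d/blacklist.conf
- blacklist nouveau # add this line to disable the nouveau driver
- sudo update-initramfs -u # rebuild the initramfs
- lsmod | grep nouveau # check whether it took effect; no output after a reboot means nouveau is disabled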
https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07
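
The blacklist step can also be scripted rather than edited by hand. A minimal sketch that writes the entries to a scratch file for inspection; the line options nouveau modeset=0 is a commonly used addition that the steps above do not mention, and the function name write_nouveau_blacklist is hypothetical:

```shell
# write_nouveau_blacklist FILE: write modprobe rules that keep nouveau from loading.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Generate the file in a scratch location and review it before installing it.
conf=$(mktemp)
write_nouveau_blacklist "$conf"
cat "$conf"
```

Copy the result into /etc/modprobe.d/ with sudo (for example as blacklist-nouveau.conf), run sudo update-initramfs -u, reboot, and confirm that lsmod | grep nouveau prints nothing.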

Caffe

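Caffe is fairly hard to install. You can fork this project https://github.com/longpeng2008/Caffe_Long and follow its Makefile. It is best to specify the locations of opencv and protobuf explicitly there.

nvidia-driver, cuda and caffe form a chain of dependencies: each one depends on the one before it. Installing cudnn is optional
The installation method in this post applies only to Ubuntu
When the installation does not go smoothly, the problem is usually in the NVIDIA driver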

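Installing Caffe

For installing Caffe, see Caffe Installation. Ubuntu 17.04 and later can install Caffe in a single step, but here we install it on Ubuntu 16.04.

Installing Caffe requires many dependencies, and the order matters:

Upgrade pip and pip3 and sort out the Python environment that ships with Ubuntu
Build protobuf yourself: https://github.com/protocolbuffers/protobuf/blob/master/src/README.md
Build OpenBLAS yourself: https://blog.csdn.net/y5492853/article/details/79558194
Install opencv: http://wykvictor.github.io/2018/08/01/OpenCV-6.html (this still needs expanding: which modules to install and which to skip)
Install the NVIDIA driver and cuda
Install caffe
Install Anaconda last, otherwise it interferes with building opencv: How To Install the Anaconda Python Distribution on Ubuntu 16.04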

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libatlas-base-dev
# install OpenBLAS
sudo apt-get install libopenblas-dev
==============================================
# Use San Ge's (三哥) customized version of caffe
git clone https://github.com/YujieShui/Caffe_Long
# Then edit the Makefile

==============================================
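# This is the official version of caffe; I use San Ge's customized version instead
git clone https://github.com/BVLC/caffe.git
cd caffe
# Compiling at this point runs into many problems; read the fixes given below before compiling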
cp Makefile.config.example Makefile.config
make -j4
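
Instead of editing Makefile.config by hand after copying it, the few settings that usually change can be applied with sed. A sketch run against a scratch file; set_make_var is a hypothetical helper, and the two settings shown are the ones this post enables (OpenCV 3 and OpenBLAS):

```shell
# set_make_var FILE NAME VALUE: set "NAME := VALUE" in FILE,
# uncommenting the line if it was commented out. A backup is kept as FILE.bak.
set_make_var() {
    sed -i.bak "s|^#* *$2 *:=.*|$2 := $3|" "$1"
}

# Demonstrate on a scratch file that mimics two lines of Makefile.config.example.
cfg=$(mktemp)
printf '%s\n' '# OPENCV_VERSION := 3' 'BLAS := atlas' > "$cfg"
set_make_var "$cfg" OPENCV_VERSION 3
set_make_var "$cfg" BLAS open
cat "$cfg"
```

The pattern rewrites every line that assigns the given variable, so for variables that appear more than once in the file (such as CUDA_DIR with its commented alternative) a manual edit is the safer choice.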

Caffe Configuration File Explained

## Refer to http://caffe.berkeleyvision.org/installation.html 
# Contributions simplifying and improving our build system are welcome! 
# cuDNN acceleration switch (uncomment to build with cuDNN). 
# Whether to use cuDNN acceleration
# USE_CUDNN := 1 
# CPU-only switch (uncomment to build without GPU support). 
# CPU_ONLY := 1 
# uncomment to disable IO dependencies and corresponding data layers 
# Whether to use OpenCV
# USE_OPENCV := 0 
# Whether to use the LevelDB input format
# USE_LEVELDB := 0 
# Whether to use the LMDB input format
# USE_LMDB := 0 
# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary) 
# You should not set this flag if you will be reading LMDBs with any 
# possibility of simultaneous read and write 
# ALLOW_LMDB_NOLOCK := 1 
# Uncomment if you're using OpenCV 3 
# Whether to use OpenCV 3; enabled here
OPENCV_VERSION := 3 
# To customize your choice of compiler, uncomment and set the following. 
# N.B. the default for Linux is g++ and the default for OSX is clang++ 
# CUSTOM_CXX := g++ 
# CUDA directory contains bin/ and lib/ directories that we need. 
# CUDA installation directory
CUDA_DIR := /usr/local/cuda-8.0 
# On Ubuntu 14.04, if cuda tools are installed via 
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead: 
# CUDA_DIR := /usr 
# CUDA architecture setting: going with all of them. 
# For CUDA < 6.0, comment the *_50 lines for compatibility. 
# Depends on the CUDA architecture: http://www.caffecn.cn/?/question/1077
# With CUDA 10, drop the first two entries (compute_20)
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
            -gencode arch=compute_20,code=sm_21 \
            -gencode arch=compute_30,code=sm_30 \
            -gencode arch=compute_35,code=sm_35 \
            -gencode arch=compute_50,code=sm_50 \
            -gencode arch=compute_50,code=compute_50
# BLAS choice: 
# Choice of matrix acceleration library
# atlas for ATLAS (default) 
# mkl for MKL 
# open for OpenBlas 
BLAS := open 
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories. 
# Leave commented to accept the defaults for your choice of BLAS 
# (which should work)! 
BLAS_INCLUDE := /opt/OpenBLAS/include 
BLAS_LIB := /opt/OpenBLAS/lib 
# Homebrew puts openblas in a directory that is not on the standard search path 
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include 
# BLAS_LIB := $(shell brew --prefix openblas)/lib 
# This is required only if you will compile the matlab interface. 
# MATLAB directory should contain the mex binary in /bin. 
# MATLAB_DIR := /usr/local/MATLAB/R2014b 
# MATLAB_DIR := /Applications/MATLAB_R2012b.app 
# NOTE: this is required only if you will compile the python interface. 
# We need to be able to find Python.h and numpy/arrayobject.h. 
# Python paths and Python interface: PYTHON_INCLUDE, PYTHON_LIB, WITH_PYTHON_LAYER
PYTHON_INCLUDE := /usr/include/python2.7 \
            /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path: 
# Verify anaconda location, sometimes it's in root. 
# ANACONDA_HOME := $(HOME)/anaconda 
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
# $(ANACONDA_HOME)/include/python2.7 \
# $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include
# We need to be able to find libpythonX.X.so or .dylib. 
PYTHON_LIB := /usr/lib 
# PYTHON_LIB := $(ANACONDA_HOME)/lib 
# Homebrew installs numpy in a non standard path (keg only) 
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include 
# PYTHON_LIB += $(shell brew --prefix numpy)/lib 
# Uncomment to support layers written in Python (will link against Python libs) 
WITH_PYTHON_LAYER := 1 
# Whatever else you find you need goes here. 
# Other dependencies; here they point at the self-built protobuf and opencv
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/local/include /usr/include/hdf5/serial /home/longpeng/opts/opencv3.2/include /home/longpeng/opts/protobuf3.1/include /usr/local/lib/python2.7/dist-packages/numpy/core/include/ 
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/local/lib64 /usr/lib/x86_64-linux-gnu/hdf5/serial /usr/local/cuda-8.0/lib64 /home/longpeng/opts/opencv3.2/lib /home/longpeng/opts/protobuf3.1/lib 
# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies 
# INCLUDE_DIRS += $(shell brew --prefix)/include 
# LIBRARY_DIRS += $(shell brew --prefix)/lib 
# Uncomment to use `pkg-config` to specify OpenCV library paths. 
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.) 
# USE_PKG_CONFIG := 1 
BUILD_DIR := build 
DISTRIBUTE_DIR := distribute 
#build with support for Python layers 
WITH_PYTHON_LAYER:=1 
# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171 
# DEBUG := 1 
# The ID of the GPU that 'make runtest' will use to run unit tests. 
TEST_GPUID := 0 
# enable pretty build (comment to see full commands) 
Q ?= @

Caffe Build Problems

# Without a build folder, linking fails: libcaffe.o cannot find "xxx"
https://github.com/BVLC/caffe/issues/2348#issuecomment-97093859

# fatal error: hdf5.h: No such file or directory
http://homeway.me/2018/01/25/setup-caffe-for-deep-learning/

# Unsupported gpu architecture 'compute_20'
https://blog.csdn.net/kemgine/article/details/78781377
http://www.caffecn.cn/?/question/1077
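
This error appears because CUDA 9 and later no longer accept compute_20, so CUDA_ARCH must only name architectures that the installed toolkit supports. A small sketch that prints the flags for a chosen set of compute capabilities; gencode_flags is a hypothetical helper and the capability list is an example:

```shell
# gencode_flags CC...: print one -gencode flag per compute capability (e.g. 30 35 50).
gencode_flags() {
    flags=""
    for cc in "$@"; do
        flags="${flags:+$flags }-gencode arch=compute_${cc},code=sm_${cc}"
    done
    printf '%s\n' "$flags"
}

# Example: the sm_* entries that remain in the config once compute_20 is dropped.
gencode_flags 30 35 50
```

The list in the configuration file also ends with a code=compute_50 entry, which embeds PTX for forward compatibility; add that one by hand if it is needed.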

# Use OpenCV 3
OPENCV_VERSION := 3
https://github.com/BVLC/caffe/issues/3517

# Whether networks can be defined in Python; this must be enabled
# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

Anaconda

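Install Anaconda after OpenCV and Caffe are in place. Otherwise the Python bindings of OpenCV and Caffe end up depending on Anaconda, whereas we want them to use the system Python.

I wrote a separate post on the installation itself: "How to Use Anaconda".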
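
Whether the system Python or Anaconda's Python is picked up depends on which one comes first on PATH. A quick check, assuming Anaconda lives in a directory whose name contains "conda"; the function name which_python is hypothetical:

```shell
# which_python: print the path of the python that the shell would run, if any.
which_python() {
    command -v python || echo "no python on PATH"
}

# An Anaconda interpreter typically sits under a directory such as ~/anaconda3.
case "$(which_python)" in
    *conda*) echo "Anaconda's Python comes first on PATH" ;;
    *)       echo "Anaconda's Python is not first on PATH" ;;
esac
```

Running this before building OpenCV and Caffe shows whether their Python bindings would be built against the system interpreter.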

TensorFlow

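TensorFlow 2.0 is out now, but there is no official stable release yet, so it cannot be installed through anaconda and has to be installed with pip.

Once the environment above is configured, installing TensorFlow is not difficult.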

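Miscellaneous

You may also want Docker and some everyday software. The principle is to install what you actually use, keep track of what you have installed, and avoid messing up the environment.

There is no need to be afraid, though: outside production the worst case is reconfiguring from scratch, which only costs some time. Still, it pays to be careful.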

Ubuntu Platform Notes: CUDA Installation

Original article: https://www.cnblogs.com/shuiyj/p/13185108.html