• Installing nvidia-docker2 on Ubuntu 16.04


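    If docker-ce, the NVIDIA driver, CUDA and the rest are all installed, and the service starts under docker, runs normally and returns predictions, then the setup is fine. If everything installed successfully but starting the service with the docker command keeps failing, nvidia-docker2 is probably not installed:
    Error message: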
    tf-serving@tfserving-KVM:~/model/yolo$ docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/f40efd9bd62fc00e53e1d48ea0cbbf8e2c76efdac28238239c6a0c49f52aaebc/log.json: no such file or directory): fork/exec /usr/bin/nvidia-container-runtime: no such file or directory: : unknown.
    $ systemctl status docker.service
    
    Output:
    docker.service - Docker Application Container Engine
       Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
      Drop-In: /etc/systemd/system/docker.service.d
               └─override.conf
       Active: failed (Result: start-limit-hit) since 四 2019-06-20 11:43:26 CST; 8s ago
         Docs: https://docs.docker.com
      Process: 8024 ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime (code=exited, status=1/FAILURE)
     Main PID: 8024 (code=exited, status=1/FAILURE)
    
    6月 20 11:43:26 tfserving-KVM systemd[1]: Failed to start Docker Application Container Engine.
    6月 20 11:43:26 tfserving-KVM systemd[1]: docker.service: Unit entered failed state.
    6月 20 11:43:26 tfserving-KVM systemd[1]: docker.service: Failed with result 'exit-code'.
    6月 20 11:43:26 tfserving-KVM systemd[1]: docker.service: Service hold-off time over, scheduling restart.
    6月 20 11:43:26 tfserving-KVM systemd[1]: Stopped Docker Application Container Engine.
    6月 20 11:43:26 tfserving-KVM systemd[1]: docker.service: Start request repeated too quickly.
    6月 20 11:43:26 tfserving-KVM systemd[1]: Failed to start Docker Application Container Engine.
    6月 20 11:43:26 tfserving-KVM systemd[1]: docker.service: Unit entered failed state.
    6月 20 11:43:26 tfserving-KVM systemd[1]: docker.service: Failed with result 'start-limit-hit'.
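
    The first error above says that /usr/bin/nvidia-container-runtime does not exist, so a quick check of that path can confirm the diagnosis before reinstalling anything. The following is a minimal sketch; the helper name runtime_installed is hypothetical and is not part of any NVIDIA or Docker tool.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: succeed if the given runtime binary exists and is executable.
    # The default path is the one named in the error message above.
    runtime_installed() {
        local path="${1:-/usr/bin/nvidia-container-runtime}"
        [ -x "$path" ]
    }

    if runtime_installed; then
        echo "nvidia-container-runtime found"
    else
        echo "nvidia-container-runtime missing: nvidia-docker2 is probably not installed"
    fi
    ```

    If the binary is reported missing, installing nvidia-docker2 as described in this article is the likely fix.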

    If you see errors like these, try the following:

    (1) Reinstall docker-ce and tensorflow_model_server;

    (2) Install nvidia-docker2 as follows:

    0. If nvidia-docker 1.0 is installed, remove it together with all existing GPU containers
    docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
    sudo apt-get purge -y nvidia-docker
     
    1. Create the docker group
    sudo groupadd docker
     
    2. Add the current user to the docker group so that the user can run docker (log out and back in for the change to take effect)
    sudo gpasswd -a ${USER} docker
     
    3. Restart the service
    sudo service docker restart
     
    4. Edit the Docker registry mirror and storage directory
    sudo vim /etc/docker/daemon.json
     
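    5. Edit the file as follows, where https://cwoel6s9.mirror.aliyuncs.com is the registry mirror address obtained from Alibaba Cloud and /data/docker is the storage directory. On Docker 17.05 and later the graph key is deprecated in favor of data-root.
    {
        "registry-mirrors": [
            "https://cwoel6s9.mirror.aliyuncs.com"
        ],
        "graph":"/data/docker",
        "storage-driver": "overlay",
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }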
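
    A missing brace or a trailing comma in /etc/docker/daemon.json can stop dockerd from starting, so it may be worth checking that the file parses before restarting Docker. The following is a minimal sketch, assuming python3 is available (json.tool ships with its standard library); the helper name valid_json is hypothetical.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: succeed if the given file parses as JSON.
    # Assumes python3 is installed; json.tool is part of its standard library.
    valid_json() {
        python3 -m json.tool "$1" > /dev/null 2>&1
    }

    config=/etc/docker/daemon.json
    if valid_json "$config"; then
        echo "$config parses as JSON"
    else
        echo "$config is missing or is not valid JSON" >&2
    fi
    ```

    This only checks syntax; it does not tell you whether dockerd accepts the individual keys.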
     
    6. Restart Docker and use docker info to check that the changes took effect
    systemctl daemon-reload
    systemctl restart docker
    docker info
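
    The docker info output is long, so filtering it for the runtime list can make the check quicker. The sketch below assumes that docker info prints a line starting with "Runtimes:" that lists the registered runtimes; the helper name has_nvidia_runtime is hypothetical.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: read `docker info` text on stdin and succeed if the
    # "Runtimes:" line lists a runtime named nvidia.
    has_nvidia_runtime() {
        grep -i '^ *Runtimes:' | grep -qw nvidia
    }

    if docker info 2> /dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered with Docker"
    else
        echo "nvidia runtime is not registered (or docker info failed)" >&2
    fi
    ```

    The same output also contains a "Docker Root Dir" line, which should show the storage directory set in daemon.json.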
     
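    7. Install nvidia-docker: add the package repository
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey |
      sudo apt-key add -
     
    # Set the distribution ID used in the repository URL, e.g. ubuntu16.04
    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list |
      sudo tee /etc/apt/sources.list.d/nvidia-docker.list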
     
    sudo apt-get update
     
    8. Install the nvidia-docker2 package and reload the Docker daemon configuration
    sudo apt-get install nvidia-docker2
    sudo pkill -SIGHUP dockerd
     
    After installation, whenever you need the GPU, use nvidia-docker in place of docker!

    # Note: if the NVIDIA driver is not installed, this step fails with an error like:
    # docker:ERROR response from Daemon....
    # Installing the NVIDIA driver fixes it.
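
    One way to confirm that containers can reach the GPU is to run nvidia-smi inside a throwaway container through the nvidia runtime. The following is a sketch under stated assumptions: the image tag nvidia/cuda:9.0-base is only an example and may not suit your driver or still be published, and the helper name run_gpu_test is hypothetical.

    ```shell
    #!/usr/bin/env bash
    # Assumption: the image tag is only an example; pick a CUDA image that
    # matches the installed driver.
    IMAGE="${IMAGE:-nvidia/cuda:9.0-base}"

    # Hypothetical helper: run nvidia-smi in a throwaway container
    # through the nvidia runtime.
    run_gpu_test() {
        docker run --runtime=nvidia --rm "$1" nvidia-smi
    }

    if command -v docker > /dev/null 2>&1; then
        run_gpu_test "$IMAGE" || echo "GPU test failed: check the NVIDIA driver" >&2
    else
        echo "docker is not installed" >&2
    fi
    ```

    If the driver and runtime are working, nvidia-smi should print its usual table of GPUs from inside the container.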
  • Original article: https://www.cnblogs.com/aidenzdly/p/10564374.html