Original article by 未名小宇宙, 2018-01-20 22:48:35, quoted from Toutiao (今日头条)
Ask not where it came from; ask not where it is going.
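This article demonstrates how to set up a TensorFlow cluster on top of a Kubernetes cluster.
The cluster is laid out as follows (please don't ask where node1 went; it is on strike):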
I. Install Docker
Install: yum install docker
Start:
systemctl enable docker
systemctl start docker
Verify:
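The original verification output is not preserved here. As a minimal sketch of one way to check the installation, the helper below reports whether the Docker CLI is present and whether the daemon answers. The function name check_docker is hypothetical and not part of the original article.

```shell
# Hypothetical helper (not from the original article): check that Docker is
# installed and that the daemon is reachable.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: not installed"
        return 1
    fi
    systemctl is-active docker   # prints "active" when the service is running
    docker version               # lists client and server versions if the daemon responds
}

# Typical use after the start step above:
#   check_docker
```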
II. Install etcd
Install: yum install etcd
Configure: /etc/etcd/etcd.conf
Start:
systemctl enable etcd
systemctl start etcd
Verify:
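The contents of the configuration file and the verification output are not preserved here. The sketch below writes a sample file with the kind of settings the etcd package reads from /etc/etcd/etcd.conf, then defines a health check. Every value is an illustrative assumption, not the author's configuration, and the file name etcd.conf.example and the function check_etcd are hypothetical.

```shell
# Write a sample config to the current directory for review. The values are
# assumptions for a single-member etcd; adjust the addresses to your master.
cat > etcd.conf.example <<'EOF'
ETCD_NAME=default
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://127.0.0.1:2379"
EOF

# Hypothetical helper: ask etcd for its health once the service is running.
check_etcd() {
    if ! command -v etcdctl >/dev/null 2>&1; then
        echo "etcdctl: not installed"
        return 1
    fi
    # cluster-health is a v2-API subcommand; with ETCDCTL_API=3 the
    # equivalent is "etcdctl endpoint health".
    etcdctl cluster-health
}
```

Listening on 0.0.0.0 is only reasonable on a trusted network, since this sketch configures no TLS or authentication.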
III. Install K8S
Install:
yum install kubernetes-master
yum install kubernetes-node
Configure:
/etc/kubernetes/config:
/etc/kubernetes/apiserver:
/etc/kubernetes/controller-manager:
Start:
systemctl enable kube-apiserver
systemctl start kube-apiserver
systemctl enable kube-controller-manager
systemctl start kube-controller-manager
systemctl enable kube-scheduler
systemctl start kube-scheduler
systemctl enable kubelet
systemctl start kubelet
systemctl enable kube-proxy
systemctl start kube-proxy
Verify:
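The ten enable/start commands above follow one pattern, so they can be folded into a loop, and the same service list can drive a verification pass. This is a sketch under the assumption that master and node components run on the same machine, as the start commands suggest. The function names are hypothetical.

```shell
# Services named in the start step above, in the order they are started.
K8S_SERVICES="kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy"

# Hypothetical helper: enable and start every service in the list.
start_k8s_services() {
    for svc in $K8S_SERVICES; do
        systemctl enable "$svc"
        systemctl start "$svc"
    done
}

# Hypothetical helper: show each service's state, then ask the API server
# which nodes and control-plane components it can see.
verify_k8s() {
    for svc in $K8S_SERVICES; do
        printf '%s: ' "$svc"
        systemctl is-active "$svc"
    done
    kubectl get nodes
    kubectl get componentstatuses
}

# Typical use, as root:
#   start_k8s_services && verify_k8s
```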
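IV. Install TF
1. Pull the TF image in advance: docker pull tensorflow/tensorflow:latest
Note: downloads from the official Docker registry can be very slow; the image can be pulled from another mirror instead.
2. Install rhsm in advance; otherwise the pod-infrastructure image cannot be downloaded
yum install *rhsm*
3. Write the RC (ReplicationController) definition: tf-rc.yaml
4. Create the RC: kubectl create -f tf-rc.yaml
5. Verify the RC: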
kubectl get rc
kubectl get pod
6. Write the Service definition: tf-svc.yaml
7. Create the Service: kubectl create -f tf-svc.yaml
8. Verify the Service:
kubectl get svc
curl http://192.168.0.180:30000
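The contents of tf-rc.yaml and tf-svc.yaml from steps 3 and 6 are not preserved here. The sketch below generates one plausible pair of manifests. The image name, port 8888, and node port 30000 come from the commands in this article. The object name, the app label, the replica count, and the image pull policy are assumptions chosen for illustration.

```shell
# ReplicationController: runs the TensorFlow image pulled in step 1.
# IfNotPresent is an assumption so that the pre-pulled image is reused.
cat > tf-rc.yaml <<'EOF'
apiVersion: v1
kind: ReplicationController
metadata:
  name: tensorflow
spec:
  replicas: 1
  selector:
    app: tensorflow
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8888
EOF

# Service: exposes the notebook port of the pod on node port 30000,
# matching the curl check in step 8.
cat > tf-svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: tensorflow
spec:
  type: NodePort
  selector:
    app: tensorflow
  ports:
  - port: 8888
    targetPort: 8888
    nodePort: 30000
EOF
```

The Service selector has to match the pod labels in the RC template; if it does not, kubectl get svc still lists the Service but the curl check gets no response.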
V. Use TF
1. Open the notebook in a browser (the token can be found with cat /var/log/messages | grep 8888):
2. Write a TensorFlow program and click "Run":
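For step 1, the grep output can be long, so it may help to isolate the token itself. This is a minimal sketch that assumes the log lines contain a URL of the form ...:8888/?token=<hex string>, which is how Jupyter usually prints it. The function name extract_jupyter_token is hypothetical.

```shell
# Hypothetical helper: print the unique Jupyter tokens found in log text
# read from standard input.
extract_jupyter_token() {
    grep -o 'token=[0-9a-f][0-9a-f]*' | sed 's/^token=//' | sort -u
}

# Typical use on the node that runs the pod:
#   grep 8888 /var/log/messages | extract_jupyter_token
```

The printed value can then be entered on the notebook login page.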