docker on spark
Pull the image from the Docker registry
docker pull sequenceiq/spark:1.4.0

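Build the Docker image
docker build --rm -t sequenceiq/spark:1.4.0 .
The -t option sets the tag of the sequenceiq/spark image you are building, in the same way as ubuntu:13.10. The --rm option tells Docker to delete the temporary containers once the build finishes. Each instruction in the Dockerfile creates a temporary container, and you normally have no use for them afterwards.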

Run the image

If you are using boot2docker, make sure your VM has more than 2GB of memory.

In your /etc/hosts file, add $(boot2docker ip) as host 'sandbox' to make it easier to access your sandbox UI.

Open the YARN UI ports when running the container:

docker run -it -p 8088:8088 -p 8042:8042 -h sandbox sequenceiq/spark:1.4.0 bash
or
docker run -d -h sandbox sequenceiq/spark:1.4.0 -d
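The /etc/hosts step mentioned earlier can be sketched roughly as follows. This is only an illustration: hosts_entry is a hypothetical helper name, and 192.168.59.103 is an example value standing in for whatever boot2docker ip reports on your machine.

```shell
#!/bin/sh
# Hypothetical helper: format one /etc/hosts line from an IP and a host name.
hosts_entry() {
    printf '%s %s\n' "$1" "$2"
}

# Real use on a boot2docker host (commented out; needs boot2docker and sudo):
#   hosts_entry "$(boot2docker ip)" sandbox | sudo tee -a /etc/hosts

# Demonstration with an example IP address.
hosts_entry 192.168.59.103 sandbox
```

Printing the line first and appending it by hand keeps the sketch from touching /etc/hosts until you have checked the address.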

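For interactive use (for example, a shell), you must pass -i -t so that you can exchange data with the container. When you interact with the container through a pipe, the -t option is not needed.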

-h sets the hostname.

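If you use -p or -P, the container publishes some of its ports on the host. Anyone who can connect to the host can then connect to the container.
With -P, Docker picks an unused port on the host at random, between 49153 and 65535, and binds it to the container.
You can use docker port to look up the randomly bound port.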
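As a rough sketch of that lookup, the snippet below extracts the host port from a line in the format docker port prints (such as 0.0.0.0:49153). The host_port helper is a hypothetical name, the sample line is an example value, and the commented commands assume the image exposes port 8088.

```shell
#!/bin/sh
# Hypothetical helper: read `docker port` output on stdin, take the first
# line (for example "0.0.0.0:49153") and print only the host port.
host_port() {
    head -n 1 | sed 's/.*://'
}

# Typical use against a running container (commented out; needs Docker):
#   cid=$(docker run -d -P -h sandbox sequenceiq/spark:1.4.0 -d)
#   docker port "$cid" 8088 | host_port

# Demonstration with a sample line in the same format.
echo "0.0.0.0:49153" | host_port
```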

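If you add -d=true or -d to docker run, the container runs in the background (detached mode). All I/O then has to go through the network or shared volumes, because the container no longer listens to the terminal from which you ran docker run.
You can still reattach to the container's session by running docker attach.
Note that in older Docker releases (before 1.13) a container running in detached mode cannot use the --rm option.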

-p 8088:8088 publishes the ResourceManager (cluster) UI port, and -p 8042:8042 publishes the NodeManager port.
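A small sketch of how the two published ports map to UI addresses, assuming the host name sandbox resolves (for example through the /etc/hosts entry). ui_url is a hypothetical helper, and the curl check is left commented out because it needs a running container.

```shell
#!/bin/sh
# Hypothetical helper: build a web UI URL from a host name and a published port.
ui_url() {
    printf 'http://%s:%s/\n' "$1" "$2"
}

ui_url sandbox 8088   # ResourceManager UI
ui_url sandbox 8042   # NodeManager UI

# Optional reachability check (commented out; needs the container running):
#   curl -s -o /dev/null -w '%{http_code}\n' "$(ui_url sandbox 8088)"
```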

Versions
Hadoop 2.6.0 and Apache Spark v1.4.0 on CentOS

Testing
There are two deploy modes that can be used to launch Spark applications on YARN.

YARN-cluster mode

In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.

Estimating Pi (yarn-cluster mode):

# execute the following command, which should write "Pi is roughly 3.1418" into the logs
# note: you must specify the --files argument in cluster mode to enable metrics
spark-submit --class org.apache.spark.examples.SparkPi --files $SPARK_HOME/conf/metrics.properties --master yarn-cluster --driver-memory 1g --executor-memory 1g --executor-cores 1 $SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar

YARN-client mode

In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

Estimating Pi (yarn-client mode):

# execute the following command, which should print "Pi is roughly 3.1418" to the screen
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1 $SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar
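Because the yarn-client run prints its result among many log lines, it can help to filter the output. The sketch below is one possible way to do that; pi_line is a hypothetical helper, and the sample lines only imitate the shape of the output.

```shell
#!/bin/sh
# Hypothetical helper: keep only the result line from spark-submit output.
pi_line() {
    grep 'Pi is roughly'
}

# Typical use inside the container (commented out; needs Spark and YARN):
#   spark-submit --class org.apache.spark.examples.SparkPi \
#       --master yarn-client --driver-memory 1g --executor-memory 1g \
#       --executor-cores 1 \
#       $SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar 2>&1 | pi_line

# Demonstration with sample lines standing in for real output.
printf '%s\n' 'INFO example log line' 'Pi is roughly 3.1418' | pi_line
```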
Original article: https://www.cnblogs.com/mfmdaoyou/p/6931820.html