• Running Spark on Mesos -- Learning the Distributed Computing System Spark (5)


    For Mesos cluster deployment, see the previous post.

    Running on Mesos differs from Spark standalone mode as follows:

    1) Standalone

    You start the Spark master yourself.

    You start the Spark slaves yourself (i.e. the workers that do the actual work).

    2) Running on Mesos

    Start the Mesos master.

    Start the Mesos slaves.

    Start Spark's dispatcher: ./sbin/start-mesos-dispatcher.sh -m mesos://127.0.0.1:5050

    Configure the path to a Spark binary package (the EXECUTOR in Mesos terms) so that Mesos can download and run it. A sketch of both startup sequences follows below.
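
    A minimal sketch of the two startup sequences, assuming the default script locations in the Spark and Mesos installs (the Mesos flags shown here are typical defaults, not taken from this post):

    # Standalone: Spark runs its own cluster manager
    ./sbin/start-master.sh                               # Spark master
    ./sbin/start-slave.sh spark://<master-host>:7077     # one worker per node

    # Mesos: Mesos is the cluster manager, Spark only adds a dispatcher
    mesos-master --work_dir=/var/lib/mesos               # on the master node
    mesos-slave --master=10.230.136.197:5050             # on every slave node
    ./sbin/start-mesos-dispatcher.sh -m mesos://10.230.136.197:5050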

    The workflow on Mesos:

    1) Submit the job to the spark-mesos-dispatcher via spark-submit.

    2) The spark-mesos-dispatcher submits the driver to the Mesos master and receives a task ID.

    3) The Mesos master assigns the task to a slave, which runs it.

    4) The spark-mesos-dispatcher queries the task status by that task ID (see the sketch below).
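
    As a rough sketch of step 4: spark-submit itself should be able to query (or kill) a cluster-mode driver by its ID through the dispatcher's REST port; the driver ID below is the one that appears later in the logs.

    # Query the status of a driver the dispatcher has accepted
    ./bin/spark-submit --master mesos://10.230.136.197:7077 --status driver-20151105154326-0002

    # Kill it if necessary
    ./bin/spark-submit --master mesos://10.230.136.197:7077 --kill driver-20151105154326-0002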

    Preliminary configuration:

    1) spark-env.sh

    Configure the Mesos native library and the Spark binary package that the executors will run. Out of laziness I used the package from the official site; the URI can also be hdfs or http (see the HDFS sketch after the snippet).

    # cd into the Spark installation directory
    vim conf/spark-env.sh
    
    # Options read by executors and drivers running inside the cluster
    # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    # - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    # - SPARK_CLASSPATH, default classpath entries to append
    # - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
    # - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos
    export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
    export SPARK_EXECUTOR_URI=http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
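
    If you would rather not have every executor pull the tarball from the public CloudFront URL, the same setting can point at HDFS. A sketch, assuming an HDFS namenode reachable at namenode:8020 (hypothetical address and path):

    # Upload the Spark binary package once
    hadoop fs -mkdir -p /spark
    hadoop fs -put spark-1.5.1-bin-hadoop2.6.tgz /spark/

    # Then in conf/spark-env.sh use the hdfs:// URI instead of http://
    # export SPARK_EXECUTOR_URI=hdfs://namenode:8020/spark/spark-1.5.1-bin-hadoop2.6.tgz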

    2)spark-defaults.conf

    # Default system properties included when running spark-submit.
    # This is useful for setting default environmental settings.
    
    # Example:
    # spark.master                     spark://master:7077
    # spark.eventLog.enabled           true
    # spark.eventLog.dir               hdfs://namenode:8021/directory
    # spark.serializer                 org.apache.spark.serializer.KryoSerializer
    # spark.driver.memory              5g
    # spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
    
    spark.executor.uri              http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
    spark.master                    mesos://10.230.136.197:5050

    3) Modify the test WordCount.java

    For this test example, see the earlier post on submitting jobs to Spark.

    /**
     * Illustrates a wordcount in Java
     */
    package com.oreilly.learningsparkexamples.mini.java;
    
    import java.util.Arrays;
    import java.util.List;
    import java.lang.Iterable;
    
    import scala.Tuple2;
    
    import org.apache.commons.lang.StringUtils;
    
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.PairFunction;
    
    
    public class WordCount {
      public static void main(String[] args) throws Exception {
        String inputFile = args[0];
        String outputFile = args[1];
        // Create a Java Spark Context.
        SparkConf conf = new SparkConf()
          .setMaster("mesos://10.230.136.197:5050")
          .setAppName("wordCount")
          .set("spark.executor.uri", "http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Load our input data.
        JavaRDD<String> input = sc.textFile(inputFile);
        // Split up into words.
        JavaRDD<String> words = input.flatMap(
          new FlatMapFunction<String, String>() {
            public Iterable<String> call(String x) {
              return Arrays.asList(x.split(" "));
            }});
        // Transform into word and count.
        JavaPairRDD<String, Integer> counts = words.mapToPair(
          new PairFunction<String, String, Integer>(){
            public Tuple2<String, Integer> call(String x){
          return new Tuple2<String, Integer>(x, 1);
            }}).reduceByKey(new Function2<Integer, Integer, Integer>(){
                public Integer call(Integer x, Integer y){ return x + y;}});
        // Save the word count back out to a text file, causing evaluation.
        counts.saveAsTextFile(outputFile);
      }
    }

    Go into examples/mini-complete-example and rebuild the jar (a Maven sketch follows below).
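
    The mini example from Learning Spark is a Maven project, so the rebuild is roughly the following (a sketch, assuming the pom.xml that ships with the book's example code):

    cd examples/mini-complete-example
    mvn clean package
    # the jar ends up under target/, e.g.
    # target/learning-spark-mini-example-0.0.1.jar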

    4) Start the spark-mesos-dispatcher

    ./sbin/start-mesos-dispatcher.sh -m mesos://10.230.136.197:5050
    Spark Command: /app/otter/jdk1.7.0_80/bin/java -cp /home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.mesos.MesosClusterDispatcher --host vg-log-analysis-prod --port 7077 -m mesos://10.230.136.197:5050
    ========================================
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/11/05 11:31:36 INFO MesosClusterDispatcher: Registered signal handlers for [TERM, HUP, INT]
    15/11/05 11:31:36 WARN Utils: Your hostname, vg-log-analysis-prod resolves to a loopback address: 127.0.0.1; using 10.230.136.197 instead (on interface eth0)
    15/11/05 11:31:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    15/11/05 11:31:36 INFO MesosClusterDispatcher: Recovery mode in Mesos dispatcher set to: NONE
    15/11/05 11:31:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/11/05 11:31:37 INFO SecurityManager: Changing view acls to: qingpingzhang
    15/11/05 11:31:37 INFO SecurityManager: Changing modify acls to: qingpingzhang
    15/11/05 11:31:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(qingpingzhang); users with modify permissions: Set(qingpingzhang)
    15/11/05 11:31:37 INFO SecurityManager: Changing view acls to: qingpingzhang
    15/11/05 11:31:37 INFO SecurityManager: Changing modify acls to: qingpingzhang
    15/11/05 11:31:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(qingpingzhang); users with modify permissions: Set(qingpingzhang)
    15/11/05 11:31:37 INFO Utils: Successfully started service on port 8081.
    15/11/05 11:31:37 INFO MesosClusterUI: Started MesosClusterUI at http://10.230.136.197:8081
    WARNING: Logging before InitGoogleLogging() is written to STDERR
    W1105 03:31:37.927594  3374 sched.cpp:1487]
    **************************************************
    Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
    **************************************************
    I1105 03:31:37.931098  3408 sched.cpp:164] Version: 0.24.0
    I1105 03:31:37.939507  3406 sched.cpp:262] New master detected at master@10.230.136.197:5050
    I1105 03:31:37.940353  3406 sched.cpp:272] No credentials provided. Attempting to register without authentication
    I1105 03:31:37.943528  3406 sched.cpp:640] Framework registered with 20151105-021937-16777343-5050-32543-0001
    15/11/05 11:31:37 INFO MesosClusterScheduler: Registered as framework ID 20151105-021937-16777343-5050-32543-0001
    15/11/05 11:31:37 INFO Utils: Successfully started service on port 7077.
    15/11/05 11:31:37 INFO MesosRestServer: Started REST server for submitting applications on port 7077

    5) Submit the job

    ./bin/spark-submit  --master mesos://10.230.136.197:7077 --deploy-mode cluster --class com.oreilly.learningsparkexamples.mini.java.WordCount  /home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/examples/mini-complete-example/target/learning-spark-mini-example-0.0.1.jar  /home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/README.md /home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/wordcounts.txt

    In the mesos-master output we can then see it receive the job, send it to a slave, and finally update the task status:

    I1105 05:08:33.312283  7490 master.cpp:2094] Received SUBSCRIBE call for framework 'Spark Cluster' at scheduler-fae58488-7d56-4661-a078-938d12871930@10.230.136.197:57392
    I1105 05:08:33.312641  7490 master.cpp:2164] Subscribing framework Spark Cluster with checkpointing enabled and capabilities [  ]
    I1105 05:08:33.313761  7486 hierarchical.hpp:391] Added framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:08:33.315335  7490 master.cpp:4613] Sending 1 offers to framework 20151105-050710-3314083338-5050-7469-0000 (Spark Cluster) at scheduler-fae58488-7d56-4661-a078-938d12871930@10.230.136.197:57392
    I1105 05:08:33.426009  7489 master.cpp:2739] Processing ACCEPT call for offers: [ 20151105-050710-3314083338-5050-7469-O0 ] on slave 20151105-045733-3314083338-5050-7152-S0 at slave(1)@10.29.23.28:5051 (ip-10-29-23-28.ec2.internal) for framework 20151105-050710-3314083338-5050-7469-0000 (Spark Cluster) at scheduler-fae58488-7d56-4661-a078-938d12871930@10.230.136.197:57392
    I1105 05:08:33.427104  7486 hierarchical.hpp:814] Recovered cpus(*):8; mem(*):5986; disk(*):196338; ports(*):[31000-32000] (total: cpus(*):8; mem(*):5986; disk(*):196338; ports(*):[31000-32000], allocated: ) on slave 20151105-045733-3314083338-5050-7152-S0 from framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:08:39.177790  7484 master.cpp:4613] Sending 1 offers to framework 20151105-050710-3314083338-5050-7469-0000 (Spark Cluster) at scheduler-fae58488-7d56-4661-a078-938d12871930@10.230.136.197:57392
    I1105 05:08:39.181149  7489 master.cpp:2739] Processing ACCEPT call for offers: [ 20151105-050710-3314083338-5050-7469-O1 ] on slave 20151105-045733-3314083338-5050-7152-S0 at slave(1)@10.29.23.28:5051 (ip-10-29-23-28.ec2.internal) for framework 20151105-050710-3314083338-5050-7469-0000 (Spark Cluster) at scheduler-fae58488-7d56-4661-a078-938d12871930@10.230.136.197:57392
    I1105 05:08:39.181699  7485 hierarchical.hpp:814] Recovered cpus(*):8; mem(*):5986; disk(*):196338; ports(*):[31000-32000] (total: cpus(*):8; mem(*):5986; disk(*):196338; ports(*):[31000-32000], allocated: ) on slave 20151105-045733-3314083338-5050-7152-S0 from framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:08:44.183100  7486 master.cpp:4613] Sending 1 offers to framework 20151105-050710-3314083338-5050-7469-0000 (Spark Cluster) at scheduler-fae58488-7d56-4661-a078-938d12871930@10.230.136.197:57392
    I1105 05:08:44.186468  7484 master.cpp:2739] Processing ACCEPT call for offers: [ 20151105-050710-3314083338-5050-7469-O2 ] on slave 20151105-045733-3314083338-5050-7152-S0 at slave(1)@10.29.23.28:5051 (ip-10-29-23-28.ec2.internal) for framework 20151105-050710-3314083338-5050-7469-0000 (Spark Cluster) at scheduler-fae58488-7d56-4661-a078-938d12871930@10.230.136.197:57392
    I1105 05:08:44.187100  7485 hierarchical.hpp:814] Recovered cpus(*):8; mem(*):5986; disk(*):196338; ports(*):[31000-32000] (total: cpus(*):8; mem(*):5986; disk(*):196338; ports(*):[31000-32000], allocated: ) on slave 20151105-045733-3314083338-5050-7152-S0 from framework 20151105-050710-3314083338-5050-7469-0000
    
    I1105 05:12:18.668609  7489 master.cpp:4069] Status update TASK_FAILED (UUID: 8d30c637-b885-487b-b174-47232cc0e49f) for task driver-20151105131213-0002 of framework 20151105-050710-3314083338-5050-7469-0000 from slave 20151105-045733-3314083338-5050-7152-S0 at slave(1)@10.29.23.28:5051 (ip-10-29-23-28.ec2.internal)
    I1105 05:12:18.668689  7489 master.cpp:4108] Forwarding status update TASK_FAILED (UUID: 8d30c637-b885-487b-b174-47232cc0e49f) for task driver-20151105131213-0002 of framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:12:18.669001  7489 master.cpp:5576] Updating the latest state of task driver-20151105131213-0002 of framework 20151105-050710-3314083338-5050-7469-0000 to TASK_FAILED
    I1105 05:12:18.669373  7483 hierarchical.hpp:814] Recovered cpus(*):1; mem(*):1024 (total: cpus(*):8; mem(*):5986; disk(*):196338; ports(*):[31000-32000], allocated: ) on slave 20151105-045733-3314083338-5050-7152-S0 from framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:12:18.670912  7489 master.cpp:5644] Removing task driver-20151105131213-0002 with resources cpus(*):1; mem(*):1024 of framework 20151105-050710-3314083338-5050-7469-0000 on slave 20151105-045733-3314083338-5050-7152-S0 at slave(1)@10.29.23.28:5051 (ip-10-29-23-28.ec2.internal)

    The mesos-slave output: it receives the task, fails to download the Spark executable package, and the task ends up failing:

    I1105 05:11:31.363765 17084 slave.cpp:1270] Got assigned task driver-20151105131130-0001 for framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:11:31.365025 17084 slave.cpp:1386] Launching task driver-20151105131130-0001 for framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:11:31.376075 17084 slave.cpp:4852] Launching executor driver-20151105131130-0001 of framework 20151105-050710-3314083338-5050-7469-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-050710-3314083338-5050-7469-0000/executors/driver-20151105131130-0001/runs/461ceb14-9247-4a59-b9d4-4b0a7947e353'
    I1105 05:11:31.376448 17085 containerizer.cpp:640] Starting container '461ceb14-9247-4a59-b9d4-4b0a7947e353' for executor 'driver-20151105131130-0001' of framework '20151105-050710-3314083338-5050-7469-0000'
    I1105 05:11:31.376878 17084 slave.cpp:1604] Queuing task 'driver-20151105131130-0001' for executor driver-20151105131130-0001 of framework '20151105-050710-3314083338-5050-7469-0000
    I1105 05:11:31.379096 17083 linux_launcher.cpp:352] Cloning child process with flags =
    I1105 05:11:31.382968 17083 containerizer.cpp:873] Checkpointing executor's forked pid 17098 to '/tmp/mesos/meta/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-050710-3314083338-5050-7469-0000/executors/driver-20151105131130-0001/runs/461ceb14-9247-4a59-b9d4-4b0a7947e353/pids/forked.pid'
    E1105 05:11:31.483093 17078 fetcher.cpp:515] Failed to run mesos-fetcher: Failed to fetch all URIs for container '461ceb14-9247-4a59-b9d4-4b0a7947e353' with exit status: 256
    E1105 05:11:31.483355 17079 slave.cpp:3342] Container '461ceb14-9247-4a59-b9d4-4b0a7947e353' for executor 'driver-20151105131130-0001' of framework '20151105-050710-3314083338-5050-7469-0000' failed to start: Failed to fetch all URIs for container '461ceb14-9247-4a59-b9d4-4b0a7947e353' with exit status: 256
    I1105 05:11:31.483444 17079 containerizer.cpp:1097] Destroying container '461ceb14-9247-4a59-b9d4-4b0a7947e353'
    I1105 05:11:31.485548 17084 cgroups.cpp:2433] Freezing cgroup /sys/fs/cgroup/freezer/mesos/461ceb14-9247-4a59-b9d4-4b0a7947e353
    I1105 05:11:31.487112 17080 cgroups.cpp:1415] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/461ceb14-9247-4a59-b9d4-4b0a7947e353 after 1.48992ms
    I1105 05:11:31.488673 17082 cgroups.cpp:2450] Thawing cgroup /sys/fs/cgroup/freezer/mesos/461ceb14-9247-4a59-b9d4-4b0a7947e353
    I1105 05:11:31.490102 17082 cgroups.cpp:1444] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/461ceb14-9247-4a59-b9d4-4b0a7947e353 after 1.363968ms
    I1105 05:11:31.583328 17082 containerizer.cpp:1284] Executor for container '461ceb14-9247-4a59-b9d4-4b0a7947e353' has exited
    I1105 05:11:31.583977 17081 slave.cpp:3440] Executor 'driver-20151105131130-0001' of framework 20151105-050710-3314083338-5050-7469-0000 exited with status 1
    I1105 05:11:31.585134 17081 slave.cpp:2717] Handling status update TASK_FAILED (UUID: 0c429164-a159-4e72-9872-c5fbd2ef9004) for task driver-20151105131130-0001 of framework 20151105-050710-3314083338-5050-7469-0000 from @0.0.0.0:0
    W1105 05:11:31.585384 17084 containerizer.cpp:988] Ignoring update for unknown container: 461ceb14-9247-4a59-b9d4-4b0a7947e353
    I1105 05:11:31.585605 17084 status_update_manager.cpp:322] Received status update TASK_FAILED (UUID: 0c429164-a159-4e72-9872-c5fbd2ef9004) for task driver-20151105131130-0001 of framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:11:31.585911 17084 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_FAILED (UUID: 0c429164-a159-4e72-9872-c5fbd2ef9004) for task driver-20151105131130-0001 of framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:11:31.596305 17081 slave.cpp:3016] Forwarding the update TASK_FAILED (UUID: 0c429164-a159-4e72-9872-c5fbd2ef9004) for task driver-20151105131130-0001 of framework 20151105-050710-3314083338-5050-7469-0000 to master@10.230.136.197:5050
    I1105 05:11:31.611620 17083 status_update_manager.cpp:394] Received status update acknowledgement (UUID: 0c429164-a159-4e72-9872-c5fbd2ef9004) for task driver-20151105131130-0001 of framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:11:31.611702 17083 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_FAILED (UUID: 0c429164-a159-4e72-9872-c5fbd2ef9004) for task driver-20151105131130-0001 of framework 20151105-050710-3314083338-5050-7469-0000
    I1105 05:11:31.616345 17083 slave.cpp:3544] Cleaning up executor 'driver-20151105131130-0001' of framework 20151105-050710-3314083338-5050-7469-0000

    So how do we fix this?

    Look at the run logs on the Mesos slave (by default under /tmp/mesos/slaves/). It turns out this works differently from standalone mode: in standalone mode the Spark master runs an HTTP server and serves the jar to the Spark workers for download, but in Mesos cluster mode it does not, so the jar has to live somewhere the slaves can fetch it themselves, such as an HDFS or HTTP path. That seems a bit unreasonable, to be honest.

    /tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105151012-0001/runs/bed1c620-e849-4076-b130-c95c47133599$ cat stderr
    I1105 07:10:17.387609 17897 fetcher.cpp:414] Fetcher Info: {"cache_directory":"/tmp/mesos/fetch/slaves/20151105-045733-3314083338-5050-7152-S0/qingpingzhang","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/examples/mini-complete-example/target/learning-spark-mini-example-0.0.1.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz"}}],"sandbox_directory":"/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105151012-0001/runs/bed1c620-e849-4076-b130-c95c47133599","user":"qingpingzhang"}
    I1105 07:10:17.390316 17897 fetcher.cpp:369] Fetching URI '/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/examples/mini-complete-example/target/learning-spark-mini-example-0.0.1.jar'
    I1105 07:10:17.390344 17897 fetcher.cpp:243] Fetching directly into the sandbox directory
    I1105 07:10:17.390384 17897 fetcher.cpp:180] Fetching URI '/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/examples/mini-complete-example/target/learning-spark-mini-example-0.0.1.jar'
    I1105 07:10:17.390418 17897 fetcher.cpp:160] Copying resource with command:cp '/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/examples/mini-complete-example/target/learning-spark-mini-example-0.0.1.jar' '/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105151012-0001/runs/bed1c620-e849-4076-b130-c95c47133599/learning-spark-mini-example-0.0.1.jar'
    cp: cannot stat ‘/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/examples/mini-complete-example/target/learning-spark-mini-example-0.0.1.jar’: No such file or directory
    Failed to fetch '/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/examples/mini-complete-example/target/learning-spark-mini-example-0.0.1.jar': Failed to copy with command 'cp '/home/qingpingzhang/dev/spark-1.5.1-bin-hadoop2.6/examples/mini-complete-example/target/learning-spark-mini-example-0.0.1.jar' '/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105151012-0001/runs/bed1c620-e849-4076-b130-c95c47133599/learning-spark-mini-example-0.0.1.jar'', exit status: 256
    Failed to synchronize with slave (it's probably exited)

    All right. To get the test to pass for now, copy the jar and the README.md file over to the Mesos slave machine.

    ./bin/spark-submit  --master mesos://10.230.136.197:7077 --deploy-mode cluster --class com.oreilly.learningsparkexamples.mini.java.WordCount  /tmp/learning-spark-mini-example-0.0.1.jar  /tmp/README.md /tmp/wordcounts.txt

    Sure enough, the job now runs successfully. The mesos-slave output is as follows:

    I1105 07:43:26.748515 17244 slave.cpp:1270] Got assigned task driver-20151105154326-0002 for framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:26.749575 17244 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000' from gc
    I1105 07:43:26.749703 17247 gc.cpp:84] Unscheduling '/tmp/mesos/meta/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000' from gc
    I1105 07:43:26.749825 17246 slave.cpp:1386] Launching task driver-20151105154326-0002 for framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:26.760673 17246 slave.cpp:4852] Launching executor driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105154326-0002/runs/910a4983-2732-41dd-b014-66827c044c16'
    I1105 07:43:26.760967 17244 containerizer.cpp:640] Starting container '910a4983-2732-41dd-b014-66827c044c16' for executor 'driver-20151105154326-0002' of framework '20151105-070418-3314083338-5050-12075-0000'
    I1105 07:43:26.761265 17246 slave.cpp:1604] Queuing task 'driver-20151105154326-0002' for executor driver-20151105154326-0002 of framework '20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:26.763134 17249 linux_launcher.cpp:352] Cloning child process with flags =
    I1105 07:43:26.766726 17249 containerizer.cpp:873] Checkpointing executor's forked pid 18129 to '/tmp/mesos/meta/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105154326-0002/runs/910a4983-2732-41dd-b014-66827c044c16/pids/forked.pid'
    I1105 07:43:33.153153 17244 slave.cpp:2379] Got registration for executor 'driver-20151105154326-0002' of framework 20151105-070418-3314083338-5050-12075-0000 from executor(1)@10.29.23.28:54580
    I1105 07:43:33.154284 17246 slave.cpp:1760] Sending queued task 'driver-20151105154326-0002' to executor 'driver-20151105154326-0002' of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:33.160464 17243 slave.cpp:2717] Handling status update TASK_RUNNING (UUID: eb83c08d-19fc-4d20-ab17-6f69ee8d90b2) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000 from executor(1)@10.29.23.28:54580
    I1105 07:43:33.160643 17242 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: eb83c08d-19fc-4d20-ab17-6f69ee8d90b2) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:33.160940 17242 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_RUNNING (UUID: eb83c08d-19fc-4d20-ab17-6f69ee8d90b2) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:33.168092 17243 slave.cpp:3016] Forwarding the update TASK_RUNNING (UUID: eb83c08d-19fc-4d20-ab17-6f69ee8d90b2) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000 to master@10.230.136.197:5050
    I1105 07:43:33.168218 17243 slave.cpp:2946] Sending acknowledgement for status update TASK_RUNNING (UUID: eb83c08d-19fc-4d20-ab17-6f69ee8d90b2) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000 to executor(1)@10.29.23.28:54580
    I1105 07:43:33.171906 17247 status_update_manager.cpp:394] Received status update acknowledgement (UUID: eb83c08d-19fc-4d20-ab17-6f69ee8d90b2) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:33.172025 17247 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_RUNNING (UUID: eb83c08d-19fc-4d20-ab17-6f69ee8d90b2) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:36.454344 17249 slave.cpp:3926] Current disk usage 1.10%. Max allowed age: 6.223128215149259days
    I1105 07:43:39.174698 17247 slave.cpp:1270] Got assigned task 0 for framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:39.175014 17247 slave.cpp:1386] Launching task 0 for framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:39.185343 17247 slave.cpp:4852] Launching executor 20151105-045733-3314083338-5050-7152-S0 of framework 20151105-070418-3314083338-5050-12075-0001 with resources cpus(*):1; mem(*):1408 in work directory '/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0001/executors/20151105-045733-3314083338-5050-7152-S0/runs/8b233e11-54a6-41b6-a8ad-f82660d640b5'
    I1105 07:43:39.185643 17247 slave.cpp:1604] Queuing task '0' for executor 20151105-045733-3314083338-5050-7152-S0 of framework '20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:39.185694 17246 containerizer.cpp:640] Starting container '8b233e11-54a6-41b6-a8ad-f82660d640b5' for executor '20151105-045733-3314083338-5050-7152-S0' of framework '20151105-070418-3314083338-5050-12075-0001'
    I1105 07:43:39.185786 17247 slave.cpp:1270] Got assigned task 1 for framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:39.185931 17247 slave.cpp:1386] Launching task 1 for framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:39.185968 17247 slave.cpp:1604] Queuing task '1' for executor 20151105-045733-3314083338-5050-7152-S0 of framework '20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:39.187925 17245 linux_launcher.cpp:352] Cloning child process with flags =
    I1105 07:43:46.388809 17249 slave.cpp:2379] Got registration for executor '20151105-045733-3314083338-5050-7152-S0' of framework 20151105-070418-3314083338-5050-12075-0001 from executor(1)@10.29.23.28:36458
    I1105 07:43:46.389571 17244 slave.cpp:1760] Sending queued task '0' to executor '20151105-045733-3314083338-5050-7152-S0' of framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:46.389883 17244 slave.cpp:1760] Sending queued task '1' to executor '20151105-045733-3314083338-5050-7152-S0' of framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:49.534858 17246 slave.cpp:2717] Handling status update TASK_RUNNING (UUID: b6d77a8f-ef6f-414b-9bdc-0fa1c2a96841) for task 1 of framework 20151105-070418-3314083338-5050-12075-0001 from executor(1)@10.29.23.28:36458
    I1105 07:43:49.535087 17246 slave.cpp:2717] Handling status update TASK_RUNNING (UUID: 6633c11b-c403-45c6-82c5-f495fcc9ae70) for task 0 of framework 20151105-070418-3314083338-5050-12075-0001 from executor(1)@10.29.23.28:36458
    # ...more log output...
    I1105 07:43:53.012852 17245 slave.cpp:2946] Sending acknowledgement for status update TASK_FINISHED (UUID: 72c9805d-ea9d-49c1-ac2a-5c3940aa77f5) for task 3 of framework 20151105-070418-3314083338-5050-12075-0001 to executor(1)@10.29.23.28:36458
    I1105 07:43:53.016926 17244 status_update_manager.cpp:394] Received status update acknowledgement (UUID: 5f175f09-2a35-46dd-9ac9-fbe648d9780a) for task 2 of framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:53.017357 17247 status_update_manager.cpp:394] Received status update acknowledgement (UUID: 72c9805d-ea9d-49c1-ac2a-5c3940aa77f5) for task 3 of framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:53.146410 17246 slave.cpp:1980] Asked to shut down framework 20151105-070418-3314083338-5050-12075-0001 by master@10.230.136.197:5050
    I1105 07:43:53.146461 17246 slave.cpp:2005] Shutting down framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:53.146515 17246 slave.cpp:3751] Shutting down executor '20151105-045733-3314083338-5050-7152-S0' of framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:53.637172 17247 slave.cpp:2717] Handling status update TASK_FINISHED (UUID: 9fad65ef-d651-4d2c-91a7-bac8cf4b55a4) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000 from executor(1)@10.29.23.28:54580
    I1105 07:43:53.637706 17246 status_update_manager.cpp:322] Received status update TASK_FINISHED (UUID: 9fad65ef-d651-4d2c-91a7-bac8cf4b55a4) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:53.637755 17246 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_FINISHED (UUID: 9fad65ef-d651-4d2c-91a7-bac8cf4b55a4) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:53.643517 17242 slave.cpp:3016] Forwarding the update TASK_FINISHED (UUID: 9fad65ef-d651-4d2c-91a7-bac8cf4b55a4) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000 to master@10.230.136.197:5050
    I1105 07:43:53.643635 17242 slave.cpp:2946] Sending acknowledgement for status update TASK_FINISHED (UUID: 9fad65ef-d651-4d2c-91a7-bac8cf4b55a4) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000 to executor(1)@10.29.23.28:54580
    I1105 07:43:53.647647 17246 status_update_manager.cpp:394] Received status update acknowledgement (UUID: 9fad65ef-d651-4d2c-91a7-bac8cf4b55a4) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:53.647703 17246 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_FINISHED (UUID: 9fad65ef-d651-4d2c-91a7-bac8cf4b55a4) for task driver-20151105154326-0002 of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:54.689678 17248 containerizer.cpp:1284] Executor for container '910a4983-2732-41dd-b014-66827c044c16' has exited
    I1105 07:43:54.689708 17248 containerizer.cpp:1097] Destroying container '910a4983-2732-41dd-b014-66827c044c16'
    I1105 07:43:54.691368 17248 cgroups.cpp:2433] Freezing cgroup /sys/fs/cgroup/freezer/mesos/910a4983-2732-41dd-b014-66827c044c16
    I1105 07:43:54.693023 17245 cgroups.cpp:1415] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/910a4983-2732-41dd-b014-66827c044c16 after 1.624064ms
    I1105 07:43:54.694628 17249 cgroups.cpp:2450] Thawing cgroup /sys/fs/cgroup/freezer/mesos/910a4983-2732-41dd-b014-66827c044c16
    I1105 07:43:54.695976 17249 cgroups.cpp:1444] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/910a4983-2732-41dd-b014-66827c044c16 after 1312us
    I1105 07:43:54.697335 17246 slave.cpp:3440] Executor 'driver-20151105154326-0002' of framework 20151105-070418-3314083338-5050-12075-0000 exited with status 0
    I1105 07:43:54.697371 17246 slave.cpp:3544] Cleaning up executor 'driver-20151105154326-0002' of framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:54.697621 17245 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105154326-0002/runs/910a4983-2732-41dd-b014-66827c044c16' for gc 6.99999192621333days in the future
    I1105 07:43:54.697669 17245 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105154326-0002' for gc 6.99999192552296days in the future
    I1105 07:43:54.697700 17245 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105154326-0002/runs/910a4983-2732-41dd-b014-66827c044c16' for gc 6.99999192509333days in the future
    I1105 07:43:54.697713 17246 slave.cpp:3633] Cleaning up framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:54.697726 17245 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000/executors/driver-20151105154326-0002' for gc 6.99999192474667days in the future
    I1105 07:43:54.697756 17245 status_update_manager.cpp:284] Closing status update streams for framework 20151105-070418-3314083338-5050-12075-0000
    I1105 07:43:54.697813 17244 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000' for gc 6.99999192391407days in the future
    I1105 07:43:54.697856 17244 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20151105-045733-3314083338-5050-7152-S0/frameworks/20151105-070418-3314083338-5050-12075-0000' for gc 6.99999192340148days in the future
    I1105 07:43:58.147456 17248 slave.cpp:3820] Killing executor '20151105-045733-3314083338-5050-7152-S0' of framework 20151105-070418-3314083338-5050-12075-0001
    I1105 07:43:58.147546 17246 containerizer.cpp:1097] Destroying container '8b233e11-54a6-41b6-a8ad-f82660d640b5'
    I1105 07:43:58.149194 17246 cgroups.cpp:2433] Freezing cgroup /sys/fs/cgroup/freezer/mesos/8b233e11-54a6-41b6-a8ad-f82660d640b5
    I1105 07:43:58.200407 17245 containerizer.cpp:1284] Executor for container '8b233e11-54a6-41b6-a8ad-f82660d640b5' has exited

    And, as hoped, the word-count results appear under /tmp/wordcounts.txt/:

    ll /tmp/wordcounts.txt/
    total 28
    drwxr-xr-x  2 qingpingzhang qingpingzhang 4096 Nov  5 07:43 ./
    drwxrwxrwt 14 root          root          4096 Nov  5 07:43 ../
    -rw-r--r--  1 qingpingzhang qingpingzhang 1970 Nov  5 07:43 part-00000
    -rw-r--r--  1 qingpingzhang qingpingzhang   24 Nov  5 07:43 .part-00000.crc
    -rw-r--r--  1 qingpingzhang qingpingzhang 1682 Nov  5 07:43 part-00001
    -rw-r--r--  1 qingpingzhang qingpingzhang   24 Nov  5 07:43 .part-00001.crc
    -rw-r--r--  1 qingpingzhang qingpingzhang    0 Nov  5 07:43 _SUCCESS
    -rw-r--r--  1 qingpingzhang qingpingzhang    8 Nov  5 07:43 ._SUCCESS.crc
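
    To sanity-check the result, just look at one of the part files; each line is a Tuple2 printed as (word,count):

    head -n 5 /tmp/wordcounts.txt/part-00000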

    At this point, running Spark on the Mesos framework is working end to end.

    I have not included screenshots of the Mesos and Spark web UIs here.

    To summarize:

    1) Set up the Mesos cluster.

    2) Install Spark and start its mesos-dispatcher (pointed at the Mesos master).

    3) Submit jobs with spark-submit (when submitting, the jar must be somewhere the slaves can download it from, e.g. HDFS or HTTP; see the sketch below).
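
    For reference, here is a sketch of the HDFS-based submit mentioned in point 3; the namenode address and paths are hypothetical:

    # Make the application jar (and input) fetchable by every Mesos slave
    hadoop fs -mkdir -p /apps /data
    hadoop fs -put learning-spark-mini-example-0.0.1.jar /apps/
    hadoop fs -put README.md /data/

    # Submit with hdfs:// URIs instead of local paths
    ./bin/spark-submit --master mesos://10.230.136.197:7077 --deploy-mode cluster \
      --class com.oreilly.learningsparkexamples.mini.java.WordCount \
      hdfs://namenode:8020/apps/learning-spark-mini-example-0.0.1.jar \
      hdfs://namenode:8020/data/README.md hdfs://namenode:8020/data/wordcounts.txt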

    -----------------

    The main benefit of Mesos is that it can launch and allocate tasks dynamically, making the best possible use of the machines' resources.

    Since our log analysis setup only has two machines, we will stick with Spark's standalone mode for now.

  • Original post: https://www.cnblogs.com/zhangqingping/p/4939264.html