• Spark on YARN in detail


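    1. Reference documentation:
    spark-1.3.0: http://spark.apache.org/docs/1.3.0/running-on-yarn.html
    spark-1.6.0: http://spark.apache.org/docs/1.6.0/running-on-yarn.html

    Note: starting with spark-1.6.0 the Spark on YARN commands changed slightly; see the official documentation for details. This walkthrough is based on a spark 1.3.0 cluster.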

    2. Prerequisites
    Building Spark, see: http://www.cnblogs.com/wcwen1990/p/7688027.html
    Installing and deploying Spark (local mode and standalone mode): http://www.cnblogs.com/wcwen1990/p/6889521.html

    3. Spark on YARN configuration:

    1) Start the Hadoop cluster:

    sbin/hadoop-daemon.sh start namenode
    sbin/hadoop-daemon.sh start datanode

    sbin/yarn-daemon.sh start resourcemanager
    sbin/yarn-daemon.sh start nodemanager

    sbin/mr-jobhistory-daemon.sh start historyserver

    2) Start the Spark history server:

    sbin/start-history-server.sh

    3) Check the running processes:

    $ jps
    3182 DataNode
    3734 JobHistoryServer
    3949 Jps
    3555 NodeManager
    3295 ResourceManager
    3857 HistoryServer
    3094 NameNode
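The jps listing above can also be checked mechanically. The following is a minimal sketch, assuming the six daemons shown in that listing are the ones you expect on this node; `missing_daemons` is a hypothetical helper name, not part of Hadoop or Spark.

```shell
# Print every expected daemon name that does NOT appear in the jps output read from stdin.
# Assumption: the daemon list mirrors the single-node jps listing above; adjust it for your cluster.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode ResourceManager NodeManager JobHistoryServer HistoryServer; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # HistoryServer is not satisfied by a JobHistoryServer line.
        if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}
```

Running `jps | missing_daemons` prints nothing when all six processes are up, and otherwise prints one missing daemon name per line.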

    4. Submitting an application to YARN with spark-submit (an application can be submitted in either client mode or cluster mode):

    1)spark-1.3.0:

    $ ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]

    For example:

    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
         --master yarn-cluster \
         --num-executors 3 \
         --driver-memory 4g \
         --executor-memory 2g \
         --executor-cores 1 \
         --queue thequeue \
         lib/spark-examples*.jar \
         10

    2)spark-1.6.0:

    $ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]

    For example:

    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
         --master yarn \
         --deploy-mode cluster \
         --driver-memory 4g \
         --executor-memory 2g \
         --executor-cores 1 \
         --queue thequeue \
         lib/spark-examples*.jar \
         10
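The only difference between the two forms above is how the master and deploy mode are spelled. A script that has to work against both clusters could choose the spelling from the Spark version. This is a rough sketch, assuming the 1.6.0 cutoff stated in the note in section 1; `yarn_master_args` is a hypothetical helper, not a Spark command.

```shell
# Echo the --master arguments for a given Spark version and deploy mode.
# Assumption: versions before 1.6 use yarn-client / yarn-cluster, and 1.6 or
# later use --master yarn --deploy-mode <mode>, as this article describes.
yarn_master_args() {
    version=$1   # e.g. 1.3.0 or 1.6.0
    mode=$2      # client or cluster
    case "$mode" in
        client|cluster) ;;
        *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
    esac
    major=${version%%.*}   # text before the first dot
    rest=${version#*.}     # text after the first dot
    minor=${rest%%.*}      # second version component
    if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }; then
        echo "--master yarn --deploy-mode $mode"
    else
        echo "--master yarn-$mode"
    fi
}
```

For example, `./bin/spark-submit --class org.apache.spark.examples.SparkPi $(yarn_master_args 1.3.0 cluster) lib/spark-examples*.jar 10` expands to the spark-1.3.0 form; the command substitution is left unquoted on purpose so that it splits into separate arguments.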

    5. Running spark-shell on YARN (spark-shell can only run in client mode):

    1)spark-1.3.0:

    $ ./bin/spark-shell --master yarn-client

    2)spark-1.6.0:

    $ ./bin/spark-shell --master yarn --deploy-mode client
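The spark-submit and spark-shell commands above rely on Spark finding the cluster's client-side Hadoop configuration; the official running-on-yarn page asks for HADOOP_CONF_DIR or YARN_CONF_DIR to point at that directory. The article does not show this step, so the following pre-flight check is only a sketch: `check_hadoop_conf` is a hypothetical helper, and checking for core-site.xml and yarn-site.xml is an assumption about what a usable directory contains.

```shell
# Succeed only if the given directory looks like a client-side Hadoop config directory.
# Assumption: a usable directory contains at least core-site.xml and yarn-site.xml.
check_hadoop_conf() {
    conf_dir=$1
    [ -n "$conf_dir" ] || { echo "HADOOP_CONF_DIR / YARN_CONF_DIR is not set" >&2; return 1; }
    for f in core-site.xml yarn-site.xml; do
        [ -f "$conf_dir/$f" ] || { echo "missing $conf_dir/$f" >&2; return 1; }
    done
}
```

A possible use is `check_hadoop_conf "${HADOOP_CONF_DIR:-$YARN_CONF_DIR}" && ./bin/spark-shell --master yarn-client`, which launches the shell only when the directory is set and complete.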

    6. Test, using spark-1.3.0 as the example:

    $ ./bin/spark-shell --master yarn-client

    Run a wordcount program in Spark on YARN mode:

    scala> sc.textFile("/user/hadoop/mapreduce/wordcount/input/wc.input").flatMap(_.split(" ")).map((_,1)).reduceByKey(_ + _).map(x => (x._2,x._1)).sortByKey(false).map(x => (x._2,x._1)).collect
    ... ...
    res0: Array[(String, Int)] = Array((scala,1), (hive,1), (oozie,1), (mapreduce,1), (zookeeper,1), (hue,1), (yarn,1), (sqoop,1), (kafka,1), (spark,1), (hadoop,1), (flume,1), (hdfs,1), (storm,1), (hbase,1))

    scala> sc.stop
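To cross-check the result of the Spark job above on a small input, the same count can be reproduced with standard shell tools. This is a sketch rather than an exact equivalent: `word_count` is a hypothetical helper, it drops empty tokens (whereas `split(" ")` in the Scala line keeps them when spaces repeat), and it orders ties alphabetically, which Spark does not guarantee.

```shell
# Count space-separated words read from stdin, most frequent first.
# Prints "<word> <count>" per line; words with equal counts are sorted alphabetically.
word_count() {
    tr -s ' ' '\n' |            # one word per line, squeezing repeated spaces
        grep -v '^$' |          # drop empty lines
        sort | uniq -c |        # count identical words
        sort -k1,1nr -k2,2 |    # highest count first, then by word
        awk '{ print $2, $1 }'  # reorder to "<word> <count>"
}
```

With the input file from the example, this could be run as `hdfs dfs -cat /user/hadoop/mapreduce/wordcount/input/wc.input | word_count`.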

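    The execution of the program above can be inspected in detail through the web UIs at the following addresses:

    YARN: http://chavin.king:8088
    Spark application monitoring: http://chavin.king:4040
    History server: http://chavin.king:18080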
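A quick way to see which of these UIs is reachable is to probe each port with curl. This is a minimal sketch: `ui_urls` and `probe_uis` are hypothetical helper names, and the ports are simply the three listed above. Note that the application UI on port 4040 only answers while a SparkContext is running, so it is expected to be unreachable after `sc.stop`.

```shell
# Print the three web UI addresses for a host, using the ports listed above.
ui_urls() {
    host=$1
    # printf reuses the format for each (host, port) pair
    printf 'http://%s:%s\n' "$host" 8088 "$host" 4040 "$host" 18080
}

# Print "<http status> <url>" for each UI; curl reports 000 when there is no response.
probe_uis() {
    for url in $(ui_urls "$1"); do
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
        printf '%s %s\n' "$code" "$url"
    done
}
```

For the host used in this article the call would be `probe_uis chavin.king`.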

  • Original article: https://www.cnblogs.com/wcwen1990/p/7835319.html