• Spark standalone mode installation and application submission


    Spark standalone mode installation

    Install a basic Spark cluster in standalone mode and verify how to submit a job to it.
    Prerequisites: JDK 1.7.0_80 and Scala 2.11.8 installed beforehand.
    The official documentation is a useful reference: http://spark.apache.org/docs/latest/spark-standalone.html

    1. Download the Spark installation package from the official website

    http://spark.apache.org/downloads.html

    spark-2.0.2-bin-hadoop2.7.tgz

    2. Extract the archive

    cd /home/hadoop/soft
    tar -zxvf spark-2.0.2-bin-hadoop2.7.tgz
    ln -s /home/hadoop/soft/spark-2.0.2-bin-hadoop2.7 /usr/local/spark

    3. Configure environment variables

    su - hadoop
    vi ~/.bashrc

    export SPARK_HOME="/usr/local/spark"
    export PATH="$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH"
    

    source ~/.bashrc
    which spark-shell

    4. Edit the Spark configuration

    Go to the Spark configuration directory and make the following changes:

    cd /usr/local/spark/conf
    cp log4j.properties.template log4j.properties  ## then set log4j.rootCategory=WARN, console in log4j.properties
    
    cp spark-env.sh.template spark-env.sh
    
    

    vi spark-env.sh  ## set Spark environment variables by adding the following to spark-env.sh:

    export SPARK_HOME=/usr/local/spark
    export SCALA_HOME=/usr/local/scala
    

    At this point, Spark is installed.

    5. Run Spark:

    The spark-shell command starts the Spark shell; press Ctrl-D to exit it:
    spark-shell

    hadoop@ubuntuServer01:~$ spark-shell 
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel).
    16/12/08 16:44:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/12/08 16:44:44 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
    Spark context Web UI available at http://192.168.17.50:4040
    Spark context available as 'sc' (master = local[*], app id = local-1481186684381).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
          /_/
             
    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> 
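
    To confirm that the shell is working, you can evaluate a small expression at the prompt. This is a minimal sketch (not part of the original post), relying only on the SparkContext sc that the shell creates for you:

    scala> val nums = sc.parallelize(1 to 1000)    // distribute the numbers 1..1000 as an RDD
    scala> nums.filter(_ % 3 == 0).count()         // count the multiples of 3; expect a result of 333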
    
    

    Start the Spark master:
    start-master.sh

    hadoop@ubuntuServer01:~$ start-master.sh 
    starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-ubuntuServer01.out
    hadoop@ubuntuServer01:~$ jps
    2630 Master
    2683 Jps
    

    This starts the master node; jps now shows an additional Spark Master process.
    If the master started successfully, its web UI is available by default at http://ubuntuServer01:8080.
    [Figure: Spark Master web UI]

    The spark://ubuntuServer01:7077 URL shown on that page is the argument used to start the worker nodes.
    High availability for the master can be implemented with either ZooKeeper or a local file system; see the official documentation: http://spark.apache.org/docs/latest/spark-standalone.html#high-availability.
    Start a Spark worker (slave) node:
    start-slave.sh spark://ubuntuServer01:7077

    hadoop@ubuntuServer01:~$ start-slave.sh spark://ubuntuServer01:7077
    starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-ubuntuServer01.out
    hadoop@ubuntuServer01:~$ jps
    2716 Worker
    2765 Jps
    2630 Master
    hadoop@ubuntuServer01:~$ 
    

    Running jps now shows an additional Spark Worker process, and a new entry appears in the Workers list of the master's web UI.

    6. Run an application on the Spark cluster.

    Run an interactive Spark shell against the cluster with the following command:
    spark-shell --master spark://ubuntuServer01:7077

    hadoop@ubuntuServer01:~$ spark-shell --master spark://ubuntuServer01:7077
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel).
    16/12/08 17:51:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/12/08 17:51:05 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
    Spark context Web UI available at http://192.168.17.50:4040
    Spark context available as 'sc' (master = spark://ubuntuServer01:7077, app id = app-20161208175104-0000).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
          /_/
             
    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> 
    
    

    The startup log shows the Spark web UI address for this application, http://192.168.17.50:4040,
    and the application id "app-20161208175104-0000". When the application finishes, its web UI is shut down along with it.
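
    As an additional, hedged example (not part of the original walkthrough), you can run a small job from this shell and watch its stages appear in the web UI above; the Monte Carlo estimate of Pi below assumes nothing beyond the sc provided by the shell:

    scala> val n = 1000000
    scala> val hits = sc.parallelize(1 to n).map { _ =>
         |   val x = math.random
         |   val y = math.random
         |   if (x * x + y * y <= 1) 1 else 0
         | }.reduce(_ + _)
    scala> println(s"Pi is roughly ${4.0 * hits / n}")

    Each task draws random points in the unit square and counts how many fall inside the quarter circle, so 4 * hits / n approximates Pi; this is the same idea as the SparkPi example submitted below.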

    Use the spark-submit script to run a Spark job:

    spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master spark://ubuntuServer01:7077 \
      --executor-memory 1G \
      --total-executor-cores 1 \
      $SPARK_HOME/examples/jars/spark-examples_2.11-2.0.2.jar \
      10
    

    For more details on submitting applications with spark-submit, see the official documentation:
    http://spark.apache.org/docs/latest/submitting-applications.html
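
    If you want to submit your own code instead of the bundled example, the sketch below shows what such an application could look like; it is an illustrative assumption, not taken from the original post. The object name SimplePi and the jar name in the usage line are hypothetical, and the jar is assumed to be built for Scala 2.11 (for example with sbt) before it is submitted:

    import org.apache.spark.sql.SparkSession

    // Minimal standalone Spark application: Monte Carlo estimate of Pi.
    object SimplePi {
      def main(args: Array[String]): Unit = {
        // Do not hard-code the master URL here; pass it via spark-submit --master.
        val spark = SparkSession.builder.appName("SimplePi").getOrCreate()
        val sc = spark.sparkContext

        // Optional first argument controls the number of partitions (default 10).
        val slices = if (args.nonEmpty) args(0).toInt else 10
        val n = 100000 * slices

        // Count random points of the unit square that fall inside the quarter circle.
        val hits = sc.parallelize(1 to n, slices).map { _ =>
          val x = math.random
          val y = math.random
          if (x * x + y * y <= 1) 1 else 0
        }.reduce(_ + _)

        println(s"Pi is roughly ${4.0 * hits / n}")
        spark.stop()
      }
    }

    Packaged into a jar, it would be submitted just like the example above, e.g. spark-submit --class SimplePi --master spark://ubuntuServer01:7077 target/scala-2.11/simple-pi_2.11-0.1.jar 10 (the jar path is a placeholder for whatever your build actually produces).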

  • Original post: https://www.cnblogs.com/honeybee/p/6146161.html