• Set up a Spark standalone cluster and run the SparkSQL example



    Prepare the Spark standalone cluster environment

    Upload spark-2.1.1-bin-hadoop2.7.tgz to /root

    tar zxvf spark-2.1.1-bin-hadoop2.7.tgz

    This is a demo; a production setup differs slightly.

    cd spark-2.1.1-bin-hadoop2.7/conf

    cp spark-env.sh.template spark-env.sh

    echo SPARK_MASTER_HOST=0.0.0.0 >> spark-env.sh
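    Beyond SPARK_MASTER_HOST, spark-env.sh accepts a few other commonly tuned settings. A minimal sketch — only SPARK_MASTER_HOST comes from the step above; the worker values are illustrative, not taken from this walkthrough:

```shell
# spark-env.sh -- illustrative fragment; only SPARK_MASTER_HOST is from
# the steps above, the other values are example settings.
SPARK_MASTER_HOST=0.0.0.0    # bind address for the master (demo only)
SPARK_MASTER_PORT=7077       # master RPC port (Spark's default)
SPARK_WORKER_CORES=4         # cores each worker offers (example value)
SPARK_WORKER_MEMORY=8g       # memory each worker offers (example value)
```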

    cd ../sbin

    systemctl stop firewalld.service

    systemctl disable firewalld.service

    On the development machine, make sure telnet masterip 7077 connects.
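    If telnet is not installed on the development machine, bash's /dev/tcp pseudo-device gives an equivalent check. A small helper — the function name and the 3-second timeout are my own choices, not part of the walkthrough:

```shell
# Return success only if the given host:port accepts a TCP connection.
check_port() {
  local host="$1" port="$2"
  timeout 3 bash -c "cat < /dev/null > /dev/tcp/${host}/${port}" 2>/dev/null
}

# Example: check_port 192.168.1.110 7077 && echo "master reachable"
```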

    cd spark-2.1.1-bin-hadoop2.7/sbin

    [root@t430 sbin]# ./start-all.sh

    starting org.apache.spark.deploy.master.Master, logging to /root/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-t430.out

    localhost: starting org.apache.spark.deploy.worker.Worker, logging to /root/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-t430.out

    [root@t430 sbin]# cat /root/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-t430.out

    Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64/jre/bin/java -cp /root/spark-2.1.1-bin-hadoop2.7/conf/:/root/spark-2.1.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host t430 --port 7077 --webui-port 8080

    ========================================

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

    17/05/27 11:01:43 INFO Master: Started daemon with process name: 25154@t430

    17/05/27 11:01:43 INFO SignalUtils: Registered signal handler for TERM

    17/05/27 11:01:43 INFO SignalUtils: Registered signal handler for HUP

    17/05/27 11:01:43 INFO SignalUtils: Registered signal handler for INT

    17/05/27 11:01:43 WARN Utils: Your hostname, t430 resolves to a loopback address: 127.0.0.1; using 192.168.1.110 instead (on interface wlp3s0)

    17/05/27 11:01:43 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

    17/05/27 11:01:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    17/05/27 11:01:44 INFO SecurityManager: Changing view acls to: root

    17/05/27 11:01:44 INFO SecurityManager: Changing modify acls to: root

    17/05/27 11:01:44 INFO SecurityManager: Changing view acls groups to:

    17/05/27 11:01:44 INFO SecurityManager: Changing modify acls groups to:

    17/05/27 11:01:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()

    17/05/27 11:01:44 INFO Utils: Successfully started service 'sparkMaster' on port 7077.

    17/05/27 11:01:44 INFO Master: Starting Spark master at spark://t430:7077

    17/05/27 11:01:44 INFO Master: Running Spark version 2.1.1

    17/05/27 11:01:44 INFO Utils: Successfully started service 'MasterUI' on port 8080.

    17/05/27 11:01:44 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://192.168.1.110:8080

    17/05/27 11:01:44 INFO Utils: Successfully started service on port 6066.

    17/05/27 11:01:44 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066

    17/05/27 11:01:44 INFO Master: I have been elected leader! New state: ALIVE

    17/05/27 11:01:47 INFO Master: Registering worker 192.168.1.110:15325 with 8 cores, 14.4 GB RAM

    Export the job jar

    Using JavaSparkHiveExample as the example

    Package: org.apache.spark.examples.sql.hive

    Check JavaSparkHiveExample.java

    and make sure it contains:

    .config("spark.master", "local") //HERE

    spark.sql("LOAD DATA LOCAL INPATH '/tmp/examples/src/main/resources/kv1.txt' INTO TABLE src");
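    Before exporting, it is easy to verify both lines are still present with grep. A small helper — the function name and usage are my own, not part of the example project:

```shell
# Succeeds only if the source file still contains both of the lines
# this walkthrough depends on.
check_example_source() {
  local src="$1"
  grep -q 'spark.master' "$src" && grep -q 'LOAD DATA LOCAL INPATH' "$src"
}

# Example: check_example_source JavaSparkHiveExample.java && echo "ok to export"
```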

    Figure 1: Right-click JavaSparkHiveExample.java and choose Export


    Figure 2: Note the exported jar file


    Upload the job and data files

    Upload the project's examples directory, in its entirety, to /tmp on the server; the exact path comes from the code change shown above.

    Upload the exported sparksql.jar to /home.
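    With SSH access, both uploads can be done with scp. The helper below only prints the commands so they can be reviewed before running; the function name is my own, and 192.168.1.110 is the master host used throughout this walkthrough:

```shell
# Emit the two copy commands for review instead of running them directly;
# pass the server address as the argument.
upload_cmds() {
  echo "scp -r examples root@$1:/tmp/"
  echo "scp sparksql.jar root@$1:/home/"
}

upload_cmds 192.168.1.110
```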

    [root@t430 ~]# ls -R /tmp/examples/

    /tmp/examples/:

    src

    /tmp/examples/src:

    main

    /tmp/examples/src/main:

    resources

    /tmp/examples/src/main/resources:

    employees.json full_user.avsc kv1.txt people.json people.txt user.avsc users.avro users.parquet

    [root@t430 ~]# ls /home/sparksql.jar

    /home/sparksql.jar

    [root@t430 ~]#

    Submit the job jar to the cluster

    After the resource files and job jar have been uploaded to the cluster, run the following command:

    [root@t430 spark-2.1.1-bin-hadoop2.7]# pwd

    /root/spark-2.1.1-bin-hadoop2.7

    [root@t430 spark-2.1.1-bin-hadoop2.7]#

    bin/spark-submit --class org.apache.spark.examples.sql.hive.JavaSparkHiveExample --master spark://192.168.1.110:7077 --executor-memory 10G --total-executor-cores 6 /home/sparksql.jar
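    The long spark-submit invocation is easier to reuse as a small script. A sketch with the values from above pulled into variables — the script itself is my addition, not part of the original walkthrough:

```shell
#!/bin/bash
# Parameters from the walkthrough; adjust for your own cluster.
MASTER_URL="spark://192.168.1.110:7077"
APP_CLASS="org.apache.spark.examples.sql.hive.JavaSparkHiveExample"
APP_JAR="/home/sparksql.jar"
EXEC_MEM="10G"
TOTAL_CORES="6"

# Build the command as an array so arguments stay intact.
submit_cmd=(bin/spark-submit
  --class "$APP_CLASS"
  --master "$MASTER_URL"
  --executor-memory "$EXEC_MEM"
  --total-executor-cores "$TOTAL_CORES"
  "$APP_JAR")

# Print the command for review; replace echo with "${submit_cmd[@]}" to run it.
echo "${submit_cmd[@]}"
```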

    Partial output of the run:

    17/05/27 15:34:11 INFO CodeGenerator: Code generated in 8.29917 ms

    +---+------+---+------+

    |key| value|key| value|

    +---+------+---+------+

    | 2| val_2| 2| val_2|

    | 2| val_2| 2| val_2|

    | 4| val_4| 4| val_4|

    | 4| val_4| 4| val_4|

    | 5| val_5| 5| val_5|

    | 5| val_5| 5| val_5|

    | 5| val_5| 5| val_5|

    | 5| val_5| 5| val_5|

    | 5| val_5| 5| val_5|

    | 5| val_5| 5| val_5|

    | 8| val_8| 8| val_8|

    | 8| val_8| 8| val_8|

    | 9| val_9| 9| val_9|

    | 9| val_9| 9| val_9|

    | 10|val_10| 10|val_10|

    | 10|val_10| 10|val_10|

    | 11|val_11| 11|val_11|

    | 11|val_11| 11|val_11|

    | 12|val_12| 12|val_12|

    | 12|val_12| 12|val_12|

    +---+------+---+------+

    only showing top 20 rows

    17/05/27 15:34:11 INFO SparkUI: Stopped Spark web UI at http://192.168.1.110:4040

    17/05/27 15:34:11 INFO StandaloneSchedulerBackend: Shutting down all executors

    17/05/27 15:34:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down

    17/05/27 15:34:11 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

    17/05/27 15:34:11 INFO MemoryStore: MemoryStore cleared

    17/05/27 15:34:11 INFO BlockManager: BlockManager stopped

    17/05/27 15:34:11 INFO BlockManagerMaster: BlockManagerMaster stopped

    17/05/27 15:34:11 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

    17/05/27 15:34:11 INFO SparkContext: Successfully stopped SparkContext

    17/05/27 15:34:11 INFO ShutdownHookManager: Shutdown hook called

    17/05/27 15:34:11 INFO ShutdownHookManager: Deleting directory /tmp/spark-2c2b1724-6cc4-4e3f-8677-097a49c32709

    [root@t430 spark-2.1.1-bin-hadoop2.7]#

    Summary

    This post demonstrated running the SparkSQL Hive example on a Spark standalone cluster. HDFS was not used.

  • Original post: https://www.cnblogs.com/wifi0/p/6950162.html