Set up a Spark cluster and run the SparkSQL example
Prepare the Spark standalone cluster environment
Upload spark-2.1.1-bin-hadoop2.7.tgz to /root:
tar zxvf spark-2.1.1-bin-hadoop2.7.tgz
This is a demo setup; a production environment would differ slightly.
cd spark-2.1.1-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh
echo "SPARK_MASTER_HOST=0.0.0.0" >> spark-env.sh
cd ../sbin
systemctl stop firewalld.service
systemctl disable firewalld.service
On the development machine, verify that telnet masterip 7077 connects once the master is running.
cd spark-2.1.1-bin-hadoop2.7/sbin

[root@t430 sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /root/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-t430.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /root/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-t430.out
[root@t430 sbin]# cat /root/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-t430.out
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64/jre/bin/java -cp /root/spark-2.1.1-bin-hadoop2.7/conf/:/root/spark-2.1.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host t430 --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/05/27 11:01:43 INFO Master: Started daemon with process name: 25154@t430
17/05/27 11:01:43 INFO SignalUtils: Registered signal handler for TERM
17/05/27 11:01:43 INFO SignalUtils: Registered signal handler for HUP
17/05/27 11:01:43 INFO SignalUtils: Registered signal handler for INT
17/05/27 11:01:43 WARN Utils: Your hostname, t430 resolves to a loopback address: 127.0.0.1; using 192.168.1.110 instead (on interface wlp3s0)
17/05/27 11:01:43 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/05/27 11:01:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/27 11:01:44 INFO SecurityManager: Changing view acls to: root
17/05/27 11:01:44 INFO SecurityManager: Changing modify acls to: root
17/05/27 11:01:44 INFO SecurityManager: Changing view acls groups to:
17/05/27 11:01:44 INFO SecurityManager: Changing modify acls groups to:
17/05/27 11:01:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/05/27 11:01:44 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
17/05/27 11:01:44 INFO Master: Starting Spark master at spark://t430:7077
17/05/27 11:01:44 INFO Master: Running Spark version 2.1.1
17/05/27 11:01:44 INFO Utils: Successfully started service 'MasterUI' on port 8080.
17/05/27 11:01:44 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://192.168.1.110:8080
17/05/27 11:01:44 INFO Utils: Successfully started service on port 6066.
17/05/27 11:01:44 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
17/05/27 11:01:44 INFO Master: I have been elected leader! New state: ALIVE
17/05/27 11:01:47 INFO Master: Registering worker 192.168.1.110:15325 with 8 cores, 14.4 GB RAM
Export the job jar
We use JavaSparkHiveExample as the example.
package: org.apache.spark.examples.sql.hive
Check JavaSparkHiveExample.java and make sure the LOAD DATA statement uses the absolute path where the resource files will be uploaded (the stock example uses a relative path):

spark.sql("LOAD DATA LOCAL INPATH '/tmp/examples/src/main/resources/kv1.txt' INTO TABLE src");
Figure 1: Right-click JavaSparkHiveExample.java and choose Export.
Figure 2: Note the exported jar file.
Upload the job and data files
Upload the project's examples directory, in its entirety, to /tmp on the server; this is the exact location referenced by the code change above.
Upload the exported sparksql.jar to /home.
[root@t430 ~]# ls -R /tmp/examples/
/tmp/examples/:
src

/tmp/examples/src:
main

/tmp/examples/src/main:
resources

/tmp/examples/src/main/resources:
employees.json  full_user.avsc  kv1.txt  people.json  people.txt  user.avsc  users.avro  users.parquet

[root@t430 ~]# ls /home/sparksql.jar
/home/sparksql.jar
[root@t430 ~]#
Submit the job jar to the cluster
With the resource files and job jar uploaded, run the following command from the Spark home directory:
[root@t430 spark-2.1.1-bin-hadoop2.7]# pwd
/root/spark-2.1.1-bin-hadoop2.7
[root@t430 spark-2.1.1-bin-hadoop2.7]#
bin/spark-submit \
  --class org.apache.spark.examples.sql.hive.JavaSparkHiveExample \
  --master spark://192.168.1.110:7077 \
  --executor-memory 10G \
  --total-executor-cores 6 \
  /home/sparksql.jar
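If you prefer to trigger the submission from Java code rather than the shell, Spark also ships a launcher API; below is a minimal sketch mirroring the spark-submit invocation above. The class name SubmitSparkSqlJob is made up for illustration; paths and the master URL are the ones used in this walkthrough:

import org.apache.spark.launcher.SparkLauncher;

// Hypothetical driver class; mirrors the spark-submit command above.
public class SubmitSparkSqlJob {
    public static void main(String[] args) throws Exception {
        Process job = new SparkLauncher()
            .setSparkHome("/root/spark-2.1.1-bin-hadoop2.7")
            .setAppResource("/home/sparksql.jar")
            .setMainClass("org.apache.spark.examples.sql.hive.JavaSparkHiveExample")
            .setMaster("spark://192.168.1.110:7077")
            .setConf(SparkLauncher.EXECUTOR_MEMORY, "10G")
            .setConf("spark.cores.max", "6") // standalone equivalent of --total-executor-cores
            .launch();
        job.waitFor(); // block until the application finishes
    }
}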
Partial output of the job:
17/05/27 15:34:11 INFO CodeGenerator: Code generated in 8.29917 ms
+---+------+---+------+
|key| value|key| value|
+---+------+---+------+
|  2| val_2|  2| val_2|
|  2| val_2|  2| val_2|
|  4| val_4|  4| val_4|
|  4| val_4|  4| val_4|
|  5| val_5|  5| val_5|
|  5| val_5|  5| val_5|
|  5| val_5|  5| val_5|
|  5| val_5|  5| val_5|
|  5| val_5|  5| val_5|
|  5| val_5|  5| val_5|
|  8| val_8|  8| val_8|
|  8| val_8|  8| val_8|
|  9| val_9|  9| val_9|
|  9| val_9|  9| val_9|
| 10|val_10| 10|val_10|
| 10|val_10| 10|val_10|
| 11|val_11| 11|val_11|
| 11|val_11| 11|val_11|
| 12|val_12| 12|val_12|
| 12|val_12| 12|val_12|
+---+------+---+------+
only showing top 20 rows

17/05/27 15:34:11 INFO SparkUI: Stopped Spark web UI at http://192.168.1.110:4040
17/05/27 15:34:11 INFO StandaloneSchedulerBackend: Shutting down all executors
17/05/27 15:34:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
17/05/27 15:34:11 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/05/27 15:34:11 INFO MemoryStore: MemoryStore cleared
17/05/27 15:34:11 INFO BlockManager: BlockManager stopped
17/05/27 15:34:11 INFO BlockManagerMaster: BlockManagerMaster stopped
17/05/27 15:34:11 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/05/27 15:34:11 INFO SparkContext: Successfully stopped SparkContext
17/05/27 15:34:11 INFO ShutdownHookManager: Shutdown hook called
17/05/27 15:34:11 INFO ShutdownHookManager: Deleting directory /tmp/spark-2c2b1724-6cc4-4e3f-8677-097a49c32709
[root@t430 spark-2.1.1-bin-hadoop2.7]#
Summary
This article demonstrated running the SparkSQL Hive example on a Spark standalone cluster. HDFS was not used.