• Downloading and installing Spark and running the examples (Spark, part 1)


    1. Official website

    http://spark.apache.org/

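    2. Download

    Download the latest version (2.4.3 at the time of writing).
    This Spark package is pre-built for Hadoop 2.7 and later. I installed Hadoop 3.1.2 earlier and have not yet checked whether the two are compatible; I will test that later.
    Download page: http://spark.apache.org/downloads.html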

    On the page you are redirected to, choose one of the download links.

    Upload the downloaded Spark package to the virtual machine.

    The upload succeeded:

    [shaozhiqi@hadoop102 opt]$ cd software/
    [shaozhiqi@hadoop102 software]$ ll
    total 739668
    -rw-rw-r--. 1 shaozhiqi shaozhiqi 332433589 Jun 23 19:59 hadoop-3.1.2.tar.gz
    -rw-rw-r--. 1 shaozhiqi shaozhiqi 194990602 Jun 23 19:59 jdk-8u211-linux-x64.tar.gz
    -rw-rw-r--. 1 shaozhiqi shaozhiqi 229988313 Jun 30 17:46 spark-2.4.3-bin-hadoop2.7.tgz
    

    Extract the archive:

    [shaozhiqi@hadoop102 software]$ tar -zxvf spark-2.4.3-bin-hadoop2.7.tgz -C /opt/module/
    

    Enter the extracted Spark directory:

    [shaozhiqi@hadoop102 module]$ pwd
    /opt/module
    [shaozhiqi@hadoop102 module]$ ll
    total 12
    drwxr-xr-x. 15 shaozhiqi shaozhiqi 4096 Jun 30 10:48 hadoop-3.1.2
    drwxr-xr-x. 7 shaozhiqi shaozhiqi 4096 Jun 23 15:46 jdk1.8.0_211
    drwxr-xr-x. 13 shaozhiqi shaozhiqi 4096 May 1 13:19 spark-2.4.3-bin-hadoop2.7
    [shaozhiqi@hadoop102 module]$ cd spark-2.4.3-bin-hadoop2.7/
    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ ls
    bin data jars LICENSE NOTICE R RELEASE yarn
    conf examples kubernetes licenses python README.md sbin
    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$
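Every command later in this post is run relative to the Spark directory. As an optional convenience, which is not part of the setup described here, a minimal sketch of exporting SPARK_HOME and putting bin on the PATH, assuming the install path from the listing above. Only bin is added: Hadoop's sbin directory also contains a start-all.sh, so having both sbin directories on the PATH would make that name ambiguous.

```shell
# Assumed install location, taken from the directory listing above.
export SPARK_HOME=/opt/module/spark-2.4.3-bin-hadoop2.7

# Add only bin (spark-submit, spark-shell, ...) to PATH; sbin is left off
# because Hadoop ships scripts with the same names (e.g. start-all.sh).
export PATH="$SPARK_HOME/bin:$PATH"

echo "SPARK_HOME=$SPARK_HOME"
```

To make this permanent, the two export lines could go into the shell profile of the user that runs Spark.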
    

    3. Directory layout

    3.1 There are two command directories, bin and sbin. sbin holds the scripts that manage the cluster.

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ cd sbin/
    [shaozhiqi@hadoop102 sbin]$ ls
    slaves.sh start-mesos-shuffle-service.sh stop-mesos-dispatcher.sh
    spark-config.sh start-shuffle-service.sh stop-mesos-shuffle-service.sh
    spark-daemon.sh start-slave.sh stop-shuffle-service.sh
    spark-daemons.sh start-slaves.sh stop-slave.sh
    start-all.sh start-thriftserver.sh stop-slaves.sh
    start-history-server.sh  stop-all.sh stop-thriftserver.sh
    start-master.sh stop-history-server.sh
    start-mesos-dispatcher.sh stop-master.sh
    [shaozhiqi@hadoop102 sbin]$
    

    3.2 bin holds the commands for working with Spark itself, such as submitting jobs

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ cd bin/
    [shaozhiqi@hadoop102 bin]$ ls
    beeline load-spark-env.sh spark-class spark-shell spark-submit
    beeline.cmd pyspark spark-class2.cmd spark-shell2.cmd spark-submit2.cmd
    docker-image-tool.sh pyspark2.cmd spark-class.cmd spark-shell.cmd spark-submit.cmd
    find-spark-home pyspark.cmd sparkR spark-sql
    find-spark-home.cmd run-example sparkR2.cmd spark-sql2.cmd
    load-spark-env.cmd run-example.cmd sparkR.cmd spark-sql.cmd
    [shaozhiqi@hadoop102 bin]$
    

    3.3 conf holds Spark's configuration files

    [shaozhiqi@hadoop102 conf]$ ll
    total 36
    -rw-r--r--. 1 shaozhiqi shaozhiqi 996 May 1 13:19 docker.properties.template
    -rw-r--r--. 1 shaozhiqi shaozhiqi 1105 May 1 13:19 fairscheduler.xml.template
    -rw-r--r--. 1 shaozhiqi shaozhiqi 2025 May  1 13:19 log4j.properties.template
    -rw-r--r--. 1 shaozhiqi shaozhiqi 7801 May 1 13:19 metrics.properties.template
    -rw-r--r--. 1 shaozhiqi shaozhiqi 865 May 1 13:19 slaves.template
    -rw-r--r--. 1 shaozhiqi shaozhiqi 1292 May 1 13:19 spark-defaults.conf.template
    -rwxr-xr-x. 1 shaozhiqi shaozhiqi 4221 May 1 13:19 spark-env.sh.template
    [shaozhiqi@hadoop102 conf]$ pwd
    /opt/module/spark-2.4.3-bin-hadoop2.7/conf
    [shaozhiqi@hadoop102 conf]$
    

    4. Setup

    4.1 Rename these three configuration templates:

    [shaozhiqi@hadoop102 conf]$ mv slaves.template slaves
    [shaozhiqi@hadoop102 conf]$ mv spark-defaults.conf.template spark-defaults.conf
    [shaozhiqi@hadoop102 conf]$ mv spark-env.sh.template spark-env.sh
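Renaming with mv removes the templates, so their commented defaults are no longer available for reference. A minimal sketch of an alternative that copies instead; the function name activate_templates is made up for this illustration, and the conf path in the example is assumed from the listing above.

```shell
# Copy each *.template to its live name, keeping the template as a reference.
# $1 = Spark conf directory. Existing live files are left untouched.
activate_templates() {
  conf_dir=$1
  for name in slaves spark-defaults.conf spark-env.sh; do
    if [ -f "$conf_dir/$name.template" ] && [ ! -e "$conf_dir/$name" ]; then
      cp -p "$conf_dir/$name.template" "$conf_dir/$name"
      echo "created $conf_dir/$name"
    fi
  done
}

# Example (path assumed from the listing above):
# activate_templates /opt/module/spark-2.4.3-bin-hadoop2.7/conf
```

Because existing files are skipped, running it a second time does not overwrite edits made in the meantime.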
    

    4.2 Edit slaves (configure the workers)

    [shaozhiqi@hadoop102 conf]$ vim slaves
    # A Spark Worker will be started on each of the machines listed below.
    hadoop102
    hadoop103
    hadoop104
    

    4.3 Edit spark-env.sh to configure the master

    [shaozhiqi@hadoop102 conf]$ vim spark-env.sh
    SPARK_MASTER_HOST=hadoop102
    SPARK_MASTER_PORT=7077
    # Options for the daemons used in the standalone deploy mode
    # - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
    # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
    

    4.4 Distribute the directory to the other machines

    [shaozhiqi@hadoop102 module]$ testxsync spark-2.4.3-bin-hadoop2.7/
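testxsync is my own distribution script and its contents are not shown in this post. For readers without it, a hypothetical stand-in built on rsync over SSH might look like the sketch below. The function name sync_dir and the DRY_RUN switch are inventions for this illustration, and it assumes passwordless SSH to each host and the same parent path on every machine.

```shell
# Hypothetical stand-in for testxsync (whose contents are not shown here).
# Copies a directory to the same parent path on each host with rsync over SSH.
# Usage: sync_dir <absolute-dir> <host>...
# Set DRY_RUN=1 to print the rsync commands instead of running them.
sync_dir() {
  src=${1%/}            # strip a trailing slash so rsync copies the directory itself
  shift
  parent=$(dirname "$src")
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "rsync -a $src $host:$parent/"
    else
      rsync -a "$src" "$host:$parent/"
    fi
  done
}

# Example, run on hadoop102 (hosts taken from conf/slaves):
# sync_dir /opt/module/spark-2.4.3-bin-hadoop2.7 hadoop103 hadoop104
```

Pass an absolute path: with a relative one, the remote parent would resolve against the remote home directory rather than /opt/module.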
    

    4.5 Check that the distribution succeeded

    hadoop103 now has spark-2.4.3-bin-hadoop2.7:

    [shaozhiqi@hadoop103 module]$ ll
    total 12
    drwxr-xr-x. 15 shaozhiqi shaozhiqi 4096 Jun 30 10:30 hadoop-3.1.2
    drwxr-xr-x. 7 shaozhiqi shaozhiqi 4096 Jun 23 15:19 jdk1.8.0_211
    drwxr-xr-x. 13 shaozhiqi shaozhiqi 4096 Jun 30 18:35 spark-2.4.3-bin-hadoop2.7
    [shaozhiqi@hadoop103 module]$
    

    hadoop104 has it as well:

    [shaozhiqi@hadoop104 ~]$ cd /opt/module/
    [shaozhiqi@hadoop104 module]$ ll
    total 12
    drwxr-xr-x. 15 shaozhiqi shaozhiqi 4096 Jun 30 10:27 hadoop-3.1.2
    drwxr-xr-x. 7 shaozhiqi shaozhiqi 4096 Jun 23 15:23 jdk1.8.0_211
    drwxr-xr-x. 13 shaozhiqi shaozhiqi 4096 Jun 30 18:35 spark-2.4.3-bin-hadoop2.7
    [shaozhiqi@hadoop104 module]$
    

    4.6 Start Spark on its own (neither the Hadoop NameNode nor the DataNodes are running)

    [shaozhiqi@hadoop102 hadoop-3.1.2]$ jps
    12022 Jps
    [shaozhiqi@hadoop102 hadoop-3.1.2]$
    

    From the Spark directory:

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/start-all.sh
    starting org.apache.spark.deploy.master.Master, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.master.Master-1-hadoop102.out
    hadoop104: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop104.out
    hadoop103: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop103.out
    hadoop102: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop102.out
    hadoop104: failed to launch: nice -n 0 /opt/module/spark-2.4.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop102:7077
    hadoop104: JAVA_HOME is not set
    hadoop104: full log in /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop104.out
    hadoop103: failed to launch: nice -n 0 /opt/module/spark-2.4.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop102:7077
    hadoop103: JAVA_HOME is not set
    hadoop103: full log in /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop103.out
    hadoop102: failed to launch: nice -n 0 /opt/module/spark-2.4.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop102:7077
    hadoop102: JAVA_HOME is not set
    hadoop102: full log in /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop102.out
    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$
    

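    The startup log reports failures, and the Master web UI confirms it: no workers are listed, so the cluster did not start. The log gives the cause for each host: JAVA_HOME is not set.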

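    4.7 Stop Spark first, then fix the configuration by setting JAVA_HOME in conf/spark-env.sh

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/stop-all.sh
    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ vim conf/spark-env.sh
    export JAVA_HOME=/opt/module/jdk1.8.0_211
    export SPARK_MASTER_HOST=hadoop102
    export SPARK_MASTER_PORT=7077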
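The workers are launched over SSH, and a likely reason for the "JAVA_HOME is not set" error is that such non-interactive sessions do not load the login profile where JAVA_HOME is usually exported. Setting it in spark-env.sh avoids depending on that. Before redistributing, a quick check that the file really sets it can save another failed start. A minimal sketch; the function name sets_java_home is made up for this illustration.

```shell
# Returns 0 if the given spark-env.sh sets JAVA_HOME on an uncommented line.
# $1 = path to spark-env.sh
sets_java_home() {
  grep -Eq '^[[:space:]]*(export[[:space:]]+)?JAVA_HOME=' "$1"
}

# Example (path assumed from the listings above):
# sets_java_home /opt/module/spark-2.4.3-bin-hadoop2.7/conf/spark-env.sh \
#   || echo "JAVA_HOME missing from spark-env.sh"
```

The check only looks for the assignment; it does not verify that the directory exists on every host.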
    

    4.8 Redistribute the modified configuration

    [shaozhiqi@hadoop102 module]$ testxsync spark-2.4.3-bin-hadoop2.7/
    

    4.9 Restart Spark

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/start-all.sh
    starting org.apache.spark.deploy.master.Master, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.master.Master-1-hadoop102.out
    hadoop103: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop103.out
    hadoop104: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop104.out
    hadoop102: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop102.out
    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$
    

    4.10 Verify: check the Master web UI again; the workers should now be listed.

    4.11 Check the processes:

    hadoop102

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ jps
    13217 Worker
    13297 Jps
    13135 Master
    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$
    

    hadoop103

    [shaozhiqi@hadoop103 conf]$ jps
    10528 Worker
    10601 Jps
    [shaozhiqi@hadoop103 conf]$
    

    hadoop104

    [shaozhiqi@hadoop104 module]$ jps
    11814 Jps
    11741 Worker
    [shaozhiqi@hadoop104 module]$
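Reading the jps output by eye works for three machines. For a scripted check, a minimal sketch that tests whether a given daemon appears in jps output; the function name has_daemon is made up for this illustration.

```shell
# Reads jps output on stdin and succeeds if a JVM with the given name is listed.
# $1 = daemon name as jps prints it, e.g. Master or Worker
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example, on the master node:
# jps | has_daemon Master && jps | has_daemon Worker && echo "master and worker up"
```

On hadoop103 and hadoop104 only the Worker check is expected to pass, matching the listings above.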
    

    4.12 Run one of the official examples

    Check the version of the examples jar:

    [shaozhiqi@hadoop102 examples]$ cd jars
    [shaozhiqi@hadoop102 jars]$ ll
    total 2132
    -rw-r--r--. 1 shaozhiqi shaozhiqi 153982 May 1 13:19 scopt_2.11-3.7.0.jar
    -rw-r--r--. 1 shaozhiqi shaozhiqi 2023919 May 1 13:19 spark-examples_2.11-2.4.3.jar
    

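    Submit the job. The command, annotated argument by argument (the // notes are explanations and not part of the command):
    bin/spark-submit
    --class org.apache.spark.examples.SparkPi // main class to run
    --master spark://hadoop102:7077 // master URL of the cluster to submit to
    --executor-memory 1G // memory per executor; optional
    --total-executor-cores 2 // total number of cores across all executors
    ./examples/jars/spark-examples_2.11-2.4.3.jar // jar that contains the main class
    100 // argument passed to the main class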

    bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://hadoop102:7077 \
    --executor-memory 1G \
    --total-executor-cores 2 \
    ./examples/jars/spark-examples_2.11-2.4.3.jar \
    100
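SparkPi estimates pi with a Monte Carlo method: it throws random points into a square and counts how many land inside the inscribed circle. As far as I can tell from the example's source, the trailing 100 is the number of slices (partitions) the sampling is split into. The sketch below is not Spark code; it is a single-process awk version of the same idea, with a made-up function name estimate_pi, just to show what the job computes.

```shell
# Single-process sketch of the estimate SparkPi computes (not Spark code).
# $1 = number of random points; prints the estimate of pi.
estimate_pi() {
  awk -v n="$1" 'BEGIN {
    srand(42)                      # fixed seed so runs are repeatable
    inside = 0
    for (i = 0; i < n; i++) {
      x = 2 * rand() - 1           # uniform in [-1, 1)
      y = 2 * rand() - 1
      if (x * x + y * y <= 1) inside++
    }
    # circle area / square area = pi / 4
    printf "%.4f\n", 4 * inside / n
  }'
}

estimate_pi 100000
```

Spark distributes the loop across executors and sums the per-partition counts, which is why more slices mean more samples and more parallel tasks.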
    

    Check the Spark monitoring page: the job we just submitted shows up as running.

    4.13 spark-shell can also submit jobs. It opens an interactive Scala shell (REPL), so we can write code and submit it directly.

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ bin/spark-shell --master spark://hadoop102:7077
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://hadoop102:4040
    Spark context available as 'sc' (master = spark://hadoop102:7077, app id = app-20190630044455-0001).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
          /_/
    Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_211)
    Type in expressions to have them evaluated.
    Type :help for more information.
    scala>
    

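    Open the web UI shown in the 4.13 output: http://hadoop102:4040

    In the browser I replace the hostname with the machine's IP address, because my Windows 10 host has no IP-to-hostname mapping configured. I will explain what this page is for in a later post.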

  • Original post: https://www.cnblogs.com/shaozhiqi/p/11534882.html