• Configure the Spark History Server (Spark Part 2)


    1. Edit the spark-defaults.conf configuration file

    Add the spark.eventLog.enabled and spark.eventLog.dir settings,
    and set spark.eventLog.dir to the HDFS NameNode address (host and port) we configured earlier.
    For the HDFS setup, see hadoop(七)集群配置同步(hadoop完全分布式四)|9.

    [shaozhiqi@hadoop102 conf]$ pwd
    /opt/module/spark-2.4.3-bin-hadoop2.7/conf
    [shaozhiqi@hadoop102 conf]$ vim spark-defaults.conf
    # spark.master spark://master:7077
    # spark.eventLog.enabled true
    # spark.eventLog.dir hdfs://namenode:8021/directory
    # spark.serializer org.apache.spark.serializer.KryoSerializer
    # spark.driver.memory 5g
    # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
    spark.eventLog.enabled true
    spark.eventLog.dir hdfs://hadoop102:9000/directory
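
    If you are not sure which address HDFS is listening on, one quick way to check (assuming Hadoop lives at /opt/module/hadoop-3.1.2, as elsewhere in this post) is to read fs.defaultFS from core-site.xml; spark.eventLog.dir must use the same host and port:

    [shaozhiqi@hadoop102 conf]$ grep -A1 fs.defaultFS /opt/module/hadoop-3.1.2/etc/hadoop/core-site.xml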
    

    2. Distribute the modified configuration files under conf

    For how to distribute files, see hadoop(六)rsync远程同步|xsync集群分发(完全分布式准备三)|8.

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ testxsync conf/
    

    Pick another machine and check that the sync succeeded:

    [shaozhiqi@hadoop103 spark-2.4.3-bin-hadoop2.7]$ cd conf
    [shaozhiqi@hadoop103 conf]$ cat spark-defaults.conf
    # spark.master spark://master:7077
    # spark.eventLog.enabled true
    # spark.eventLog.dir     hdfs://namenode:8021/directory
    # spark.serializer org.apache.spark.serializer.KryoSerializer
    # spark.driver.memory 5g
    # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
    spark.eventLog.enabled true
    spark.eventLog.dir hdfs://hadoop102:9000/directory
    [shaozhiqi@hadoop103 conf]$
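
    If you don't have the xsync/testxsync helper script, a plain rsync loop does the same distribution (a sketch; the hostnames, user, and path are assumed from this cluster):

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ for host in hadoop103 hadoop104; do
    >   rsync -av conf/ shaozhiqi@${host}:/opt/module/spark-2.4.3-bin-hadoop2.7/conf/
    > done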
    

    3. Start HDFS

    To avoid startup errors, first delete the data and logs directories, then format the NameNode:
    bin/hdfs namenode -format

    [shaozhiqi@hadoop102 hadoop-3.1.2]$ start-dfs.sh
    

    Startup succeeds; here is the output, plus the processes via jps:

    [shaozhiqi@hadoop102 hadoop-3.1.2]$ start-dfs.sh
    Starting namenodes on [hadoop102]
    Starting datanodes
    hadoop103: WARNING: /opt/module/hadoop-3.1.2/logs does not exist. Creating.
    hadoop104: WARNING: /opt/module/hadoop-3.1.2/logs does not exist. Creating.
    Starting secondary namenodes [hadoop104]
    [shaozhiqi@hadoop102 hadoop-3.1.2]$ jps
    3088 Master
    3168 Worker
    4452 Jps
    3366 CoarseGrainedExecutorBackend
    4200 DataNode
    4076 NameNode
    3773 GetConf
    [shaozhiqi@hadoop102 hadoop-3.1.2]$
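
    Before moving on, you can also sanity-check HDFS with the standard report command, which lists live DataNodes and capacity:

    [shaozhiqi@hadoop102 hadoop-3.1.2]$ bin/hdfs dfsadmin -report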
    

    We'll start YARN later, when we submit jobs to it.

    4. Check the HDFS NameNode web UI

    (screenshot: HDFS NameNode web UI)

    5. Create the HDFS directory matching what we configured in spark-defaults.conf above

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ hadoop fs -mkdir /directory
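
    A quick listing confirms the directory exists (standard HDFS shell command):

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ hadoop fs -ls /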
    

    Check the UI again:

    (screenshot: /directory now visible in the HDFS file browser)

    6. Edit spark-env.sh again to add the history server options

    [shaozhiqi@hadoop102 conf]$ vi spark-env.sh
    export JAVA_HOME=/opt/module/jdk1.8.0_211
    export SPARK_MASTER_HOST=hadoop102
    export SPARK_MASTER_PORT=7077
    export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://hadoop102:9000/directory"
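
    For reference, what the three options do (per the Spark docs; without spark.history.ui.port the history UI would default to 18080):

    # spark.history.ui.port               port the history server web UI listens on
    # spark.history.retainedApplications  number of application UIs kept in memory;
    #                                     older ones are rebuilt from event logs on demand
    # spark.history.fs.logDirectory       directory the history server reads event logs from;
    #                                     must match spark.eventLog.dir in spark-defaults.conf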
    

    7. Sync spark-env.sh to the other nodes

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ testxsync conf/spark-env.sh
    

    8. Submit a Spark application

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ bin/spark-submit \
    > --class org.apache.spark.examples.SparkPi \
    > --master spark://hadoop102:7077 \
    > --executor-memory 1G \
    > --total-executor-cores 2 \
    > ./examples/jars/spark-examples_2.11-2.4.3.jar \
    > 100
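
    The flags, briefly (standard spark-submit options; the trailing 100 is SparkPi's own argument, the number of partitions to sample):

    # --class                  fully qualified main class to run
    # --master                 standalone master URL
    # --executor-memory        memory per executor
    # --total-executor-cores   total cores across all executors (standalone mode)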
    

    9. The Spark master UI now shows our application

     

    (screenshot: SparkPi running in the Spark master UI)


    Click the SparkPi application; since the job is still running, the link takes us straight to its live UI

    (screenshot: the running SparkPi application's job page)

    10. It runs for a long time without finishing; check the logs

    19/07/01 07:15:53 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    

    Are we out of cluster resources? Most likely the spark-shell left running earlier is still registered with the master and holding the cluster's cores, so the new job can't get any executors.
    In the UI, kill the spark-shell and our SparkPi, then submit the SparkPi job again on its own:

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ bin/spark-submit \
    > --class org.apache.spark.examples.SparkPi \
    > --master spark://hadoop102:7077 \
    > --executor-memory 1G \
    > --total-executor-cores 2 \
    > ./examples/jars/spark-examples_2.11-2.4.3.jar \
    > 100
    
    (screenshot: SparkPi now completes)

    This time it finishes in 50-odd seconds.
    Once the job has ended, though, visiting Spark's port 4000 no longer works: a finished application's UI disappears when the driver exits, so we need the history server to browse it.

    11. Start the history server so finished applications can be viewed

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/start-history-server.sh
    starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.history.HistoryServer-1-hadoop102.out
    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ jps
    

    jps now shows a HistoryServer process:

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ jps
    3505 Worker
    4708 HistoryServer
    4775 Jps
    4027 DataNode
    3437 Master
    3901 NameNode
    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$
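
    You can also confirm it is serving on the port we configured (curl assumed available; expect an HTTP 200 once it is up):

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ curl -s -o /dev/null -w '%{http_code}\n' http://hadoop102:4000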
    

    12. Visit the history UI: it works

    (screenshot: history server UI at hadoop102:4000 listing the finished applications)

    13. Check whether the event log files were written to HDFS

    The files are there, so the history server is configured successfully.

    (screenshot: event log files under /directory in HDFS)
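
    From the shell, the same check is a listing of the event-log directory:

    [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ hadoop fs -ls /directory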
  • Original post: https://www.cnblogs.com/shaozhiqi/p/11534895.html