• Spark on YARN


    Overview

    The following configuration targets Spark on YARN in cluster mode.

    Version matrix

    Spark   Scala   Hadoop
    2.3.1   2.11    2.9.1

    Installation requirement

    Install Spark on the same hosts that run the YARN NodeManagers, so the two node sets stay identical (a quick check follows).
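
    A simple way to list the active NodeManager hosts, and therefore where Spark needs to be installed, assuming the Hadoop binaries are on the PATH:

    yarn node -list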

    Installation steps

    1. Build Spark with YARN support

    Compile the YARN dependencies into the Spark distribution (a Hive-enabled variant follows the command):

    ./build/mvn -Pyarn -Phadoop-2.9 -Dhadoop.version=2.9.1 -DskipTests clean package
    
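    Note: the spark-sql shell used later in this guide also needs Hive support compiled in. A variant of the same build command with the Hive profiles added (an assumption, since the original command omits them):

    ./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-2.9 -Dhadoop.version=2.9.1 -DskipTests clean package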
    

    2. Hook up HDFS

    • Copy the HDFS configuration files into Spark's conf directory
    scp hadoop/etc/hadoop/core-site.xml spark/conf
    scp hadoop/etc/hadoop/hdfs-site.xml spark/conf
    
    
    • Configure spark-env.sh

    Add the Hadoop environment variables:

    export HADOOP_HOME=/ddhome/bin/hadoop
    export HADOOP_CONF_DIR=/ddhome/bin/hadoop/etc/hadoop
    
    

    3. Hook up YARN

    • Upload the Spark jars to HDFS (a sanity check follows the commands)
    hadoop fs -mkdir -p /spark/jars
    hadoop fs -put spark/jars/*.jar /spark/jars
    
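    A quick sanity check that the upload is complete: the jar count in HDFS should match the local directory (tail skips the "Found N items" header that hadoop fs -ls prints):

    hadoop fs -ls /spark/jars | tail -n +2 | wc -l
    ls spark/jars/*.jar | wc -l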
    
    • Configure spark-defaults.conf

    Add the spark.yarn.jars property (an archive-based alternative is sketched below):

    spark.yarn.jars hdfs://masters/spark/jars/*.jar
    
    
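    Alternatively, Spark also accepts spark.yarn.archive, which points at a single archive of the jars instead of a directory of individual files. A minimal sketch; the archive name spark-libs.jar is illustrative:

    # pack the jars uncompressed into one archive and upload it
    jar cv0f spark-libs.jar -C spark/jars/ .
    hadoop fs -put spark-libs.jar /spark/
    # then in conf/spark-defaults.conf, use this instead of spark.yarn.jars:
    # spark.yarn.archive hdfs://masters/spark/spark-libs.jar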

    4. HA

    • Configure spark-env.sh

    Add the ZooKeeper parameters (a connectivity check follows the snippet):

    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=dddbva:2181,dddbvb:2181,dddcva:2181 -Dspark.deploy.zookeeper.dir=/spark"
    
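    Before relying on ZooKeeper for recovery, it is worth confirming that each node answers; ruok is a standard ZooKeeper four-letter command that replies imok (assumes nc is installed):

    for h in dddbva dddbvb dddcva; do printf '%s: ' "$h"; echo ruok | nc "$h" 2181; echo; done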
    

    Running an example

    spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 1024m --executor-memory 1024m --executor-cores 1 spark/examples/jars/spark-examples_2.11-2.3.1.jar 10
    
    
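    In cluster mode the driver runs inside YARN, so the "Pi is roughly ..." line appears in the application logs rather than on the console. The logs can be pulled back with the YARN CLI, assuming log aggregation is enabled (substitute the application ID that spark-submit prints):

    yarn logs -applicationId application_1535967010469_0004 | grep "Pi is roughly"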

    Common runtime errors

    • 1 Log level: the default is WARN; raise it to INFO/DEBUG
    # only a few lines of output with no detail; at first it looked like the process had hung
    [root@ddcve hadoop]# spark-sql --master yarn
    18/09/04 08:57:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/09/04 08:57:58 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
    18/09/04 08:57:58 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
    18/09/04 08:58:01 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
    
    # Fix: change the log level to INFO/DEBUG to see the detailed output (see the sketch below)
    
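    One way to make the change, assuming the stock log4j template that ships with Spark 2.x:

    cp spark/conf/log4j.properties.template spark/conf/log4j.properties
    # raise verbosity from the shell default of WARN to INFO (or DEBUG)
    sed -i 's/^log4j.rootCategory=.*/log4j.rootCategory=INFO, console/' spark/conf/log4j.properties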
    
    • 2 Cannot reach the YARN ResourceManager: the Hadoop configuration is missing on the Spark node
    ERROR:
    2014-08-11 20:10:59,795 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
    2014-08-11 20:11:01,838 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    
    # Fix: add the Hadoop/YARN configuration on the Spark node so it can locate the ResourceManager (sketch below)
    
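    A sketch of that fix, reusing the paths from step 2 (an assumption that the layout is the same):

    # give Spark the YARN configuration so it can resolve the real ResourceManager address
    scp hadoop/etc/hadoop/yarn-site.xml spark/conf
    # or, in spark-env.sh, point Spark at the full Hadoop conf directory:
    export YARN_CONF_DIR=/ddhome/bin/hadoop/etc/hadoop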
    
    • 3 java.net.UnknownHostException: masters

    Fix: copy core-site.xml and hdfs-site.xml from Hadoop into Spark's conf directory (a verification follows)
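
    "masters" is the HDFS HA nameservice, so it only resolves through those files. One way to verify that the copied configuration exposes the nameservice, using the standard Hadoop CLI:

    # should print "masters" when run against the copied files
    HADOOP_CONF_DIR=spark/conf hdfs getconf -confKey dfs.nameservices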

    • 4 Spark's YARN jars cannot be found; rebuild Spark from source with the YARN profile

    Build and configure manually:

    WARN:
    2018-09-04 08:59:05 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
    2018-09-04 08:59:07 INFO  Client:54 - Uploading resource file:/tmp/spark-73483914-a54f-4e9e-ad19-5f0326a65c43/__spark_libs__1336817845101923206.zip -> hdfs://masters/user/root/.sparkStaging/application_1535967010469_0004/__spark_libs__1336817845101923206.zip
    
    # Fix:
    # rebuild Spark with the YARN profile
    ./build/mvn -Pyarn -Phadoop-2.9 -Dhadoop.version=2.9.1 -DskipTests clean package
    
    # upload every jar under spark/jars to HDFS
    hadoop fs -mkdir -p /spark/jars
    hadoop fs -put jars/*.jar /spark/jars
    
    # copy spark-defaults.conf.template to spark-defaults.conf, then add the spark.yarn.jars entry inside it
    cp conf/spark-defaults.conf.template conf/spark-defaults.conf
    spark.yarn.jars hdfs://masters/spark/jars/*.jar
    
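    After these changes, resubmitting a job should no longer print the fallback warning; a quick check that reuses the example jar from above:

    # no output means the warning is gone
    spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster \
        spark/examples/jars/spark-examples_2.11-2.3.1.jar 10 2>&1 | grep "falling back to uploading"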
    
  • Original article: https://www.cnblogs.com/dzqk/p/10008823.html