• Building and installing Spark against CDH5 [spark-1.0.2, hadoop-2.3.0, cdh5.1.0]


    Prerequisite: Hadoop must already be installed. My version is hadoop-2.3.0-cdh5.1.0.

    1. Download the Maven package.

    2. Set the M2_HOME environment variable and add Maven's bin directory to the PATH.
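    Steps 1 and 2 amount to a couple of lines in the shell profile. A minimal sketch, assuming Maven was unpacked to /home/hadoop/apache-maven-3.0.5 (the path is hypothetical; adjust to your layout):

```shell
# Assumed install location -- adjust to wherever you unpacked Maven.
export M2_HOME=/home/hadoop/apache-maven-3.0.5
# Put Maven's bin directory on the PATH.
export PATH=$M2_HOME/bin:$PATH
```

    Append the lines to ~/.bashrc and source it; afterwards `mvn -v` should print the Maven version.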

    3. export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

    4. Download the spark-1.0.2.tgz source archive from the official site and extract it.

    5. Enter the extracted Spark directory.

    6. Run ./make-distribution.sh --hadoop 2.3.0-cdh5.1.0 --with-yarn --tgz

    7. Wait; the build takes a long time.

    8. When it finishes, spark-1.0.2-bin-2.3.0-cdh5.1.0.tgz is generated in the current directory.
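    Steps 4 through 8 can be strung together as one script. The download URL below is my assumption of the Apache archive location, and the slow commands are left commented out; the final echo shows how the output tarball name is assembled from the two versions:

```shell
# Versions used throughout this article.
SPARK_VERSION=1.0.2
CDH_HADOOP_VERSION=2.3.0-cdh5.1.0

# Download and unpack the source release (URL is an assumption --
# check the Apache archive for the actual location).
# wget https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}.tgz
# tar -zxf spark-${SPARK_VERSION}.tgz
# cd spark-${SPARK_VERSION}

# Build a distribution against the CDH Hadoop version (takes a long time).
# ./make-distribution.sh --hadoop ${CDH_HADOOP_VERSION} --with-yarn --tgz

# Name of the tarball the build drops in the current directory:
echo "spark-${SPARK_VERSION}-bin-${CDH_HADOOP_VERSION}.tgz"
```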

    9. Copy the tarball to the installation directory and extract it.

    10. Edit the configuration files under conf:

    cp spark-env.sh.template spark-env.sh

    vim spark-env.sh

    Set the following parameters, adjusting the values to match your environment:

    export JAVA_HOME=/home/hadoop/jdk
    export HADOOP_HOME=/home/hadoop/hadoop-2.3.0-cdh5.1.0
    export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.3.0-cdh5.1.0/etc/hadoop
    export SPARK_YARN_APP_NAME=spark-on-yarn
    export SPARK_EXECUTOR_INSTANCES=1
    export SPARK_EXECUTOR_CORES=2
    export SPARK_EXECUTOR_MEMORY=3500m
    export SPARK_DRIVER_MEMORY=3500m
    export SPARK_MASTER_IP=master
    export SPARK_MASTER_PORT=7077
    export SPARK_WORKER_CORES=2
    export SPARK_WORKER_MEMORY=3500m
    export SPARK_WORKER_INSTANCES=1
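    Wrong paths in spark-env.sh are a common source of startup failures. This small helper is my own sketch (not part of Spark): it sources the file and checks that each directory it points at actually exists on the node:

```shell
# check_spark_env FILE -- source a spark-env.sh and verify that the
# JAVA_HOME / HADOOP_HOME / HADOOP_CONF_DIR directories exist.
check_spark_env() {
  local conf=$1
  . "$conf"
  local d
  for d in "$JAVA_HOME" "$HADOOP_HOME" "$HADOOP_CONF_DIR"; do
    if [ ! -d "$d" ]; then
      echo "missing directory: $d"
      return 1
    fi
  done
  echo "spark-env.sh paths OK"
}
```

    Run it as `check_spark_env /home/hadoop/spark/conf/spark-env.sh` on each node after distributing the install.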

    11. Configure the slaves file:

    slave01
    slave02
    slave03
    slave04
    slave05

    12. Distribute

    Copy the Spark installation directory to every slave node.
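    A dry-run sketch of the distribution step (my own loop, assuming passwordless SSH and identical paths on every node). It prints one rsync command per host listed in the slaves file; removing the leading `echo` performs the actual copy:

```shell
# distribute SPARK_DIR SLAVES_FILE -- print (dry run) an rsync command
# for each non-empty host line in the slaves file.
distribute() {
  local spark_dir=$1 slaves_file=$2 host
  while read -r host; do
    [ -z "$host" ] && continue
    echo rsync -a "$spark_dir/" "$host:$spark_dir/"
  done < "$slaves_file"
}
```

    Usage: `distribute /home/hadoop/spark /home/hadoop/spark/conf/slaves`.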

    13. Start the cluster:

    sbin/start-all.sh

    14. Run an example:

    $SPARK_HOME/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn-client \
      --num-executors 3 \
      --driver-memory 4g \
      --executor-memory 2g \
      --executor-cores 1 \
      /home/hadoop/spark/lib/spark-examples-1.0.2-hadoop2.3.0-cdh5.1.0.jar \
      100

    15. Submitting the example failed, surprisingly.

    Clicking into the logs from the YARN monitoring UI turned up a pile of errors like this:

    INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).


    16. Troubleshooting

    I copied the Spark assembly jar from the lib directory of the install to my local machine and found a yarn-default.xml file inside it:

      <!-- Resource Manager Configs -->
      <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>0.0.0.0</value>
      </property> 

    No wonder: with this default, the client looks for the ResourceManager on the local machine. Unless the job happens to run on the ResourceManager node itself, it can never connect.

    17. Change the setting as follows:

      <!-- Resource Manager Configs -->
      <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
      </property> 

    18. Repackage the jar and redistribute Spark to every node.

  • Original article: https://www.cnblogs.com/ningbj/p/3939888.html