• A Brief Guide to Spark Cluster Setup


    Spark Cluster Setup

    1 Building Spark

    1.1 Download the source code

    git clone https://github.com/apache/spark.git -b branch-1.6
    

    1.2 Modify the pom file

    Add a profile for cdh5.0.2, as follows:
    <profile>
      <id>cdh5.0.2</id>
      <properties>
    	<hadoop.version>2.3.0-cdh5.0.2</hadoop.version>
    	<hbase.version>0.96.1.1-cdh5.0.2</hbase.version>
    	<flume.version>1.4.0-cdh5.0.2</flume.version>
    	<zookeeper.version>3.4.5-cdh5.0.2</zookeeper.version>
      </properties>
    </profile>
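
    A quick way to confirm the edit took effect is to search the pom for the new profile id. The sketch below assumes the profile was added to the top-level pom.xml of the Spark source tree; `has_profile` is a hypothetical helper name, and it does a plain text match rather than parsing the XML, so treat it as a sanity check only.

```shell
# Hypothetical helper: succeed if a profile with the given <id> appears in the pom.
# Usage: has_profile PROFILE_ID [POM_FILE]
has_profile() {
    profile_id="$1"
    pom="${2:-pom.xml}"
    # -F matches the string literally, so the dots in "cdh5.0.2" are not wildcards.
    grep -qF "<id>${profile_id}</id>" "$pom"
}
```

    For example, `has_profile cdh5.0.2 pom.xml && echo "profile found"` can be run from the source root before starting the build.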
    

    1.3 Build

    build/mvn -Pyarn -Pcdh5.0.2 -Phive -Phive-thriftserver -Pnative -DskipTests package
    

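    If the command above fails because maven.twttr.com is unreachable (it is blocked from mainland China), add the entry "199.16.156.89 maven.twttr.com" to the hosts file and run the build again.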
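
    The hosts workaround can be scripted so that rerunning it does not add duplicate lines. This is a minimal sketch: `add_hosts_entry` is a hypothetical helper, and the hosts file path is a parameter so the function can be tried on a copy before touching /etc/hosts.

```shell
# Hypothetical helper: append "IP HOSTNAME" to a hosts file unless the
# hostname is already mapped on a non-comment line.
# Usage: add_hosts_entry IP HOSTNAME [HOSTS_FILE]
add_hosts_entry() {
    ip="$1"
    host="$2"
    hosts_file="${3:-/etc/hosts}"
    # Compare the hostname against every field after the address, as whole words.
    if awk -v h="$host" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$hosts_file"; then
        echo "already present: $host"
    else
        printf '%s %s\n' "$ip" "$host" >> "$hosts_file"
        echo "added: $ip $host"
    fi
}
```

    Run as root, `add_hosts_entry 199.16.156.89 maven.twttr.com` would add the entry from the text; the build command can then be repeated.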

    2 Spark Cluster Setup [Spark on YARN]

    2.1 Edit the configuration files

    --spark-env.sh--
    export SPARK_SSH_OPTS="-p9413"
    export HADOOP_CONF_DIR=/opt/hadoop/hadoop-cluster/modules/hadoop-2.3.0-cdh5.0.2/etc/hadoop
    export SPARK_EXECUTOR_INSTANCES=1
    export SPARK_EXECUTOR_CORES=4
    export SPARK_EXECUTOR_MEMORY=1G
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/
    --slaves--
    192.168.3.211 hadoop-dev-211
    192.168.3.212 hadoop-dev-212
    192.168.3.213 hadoop-dev-213
    192.168.3.214 hadoop-dev-214
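
    Spark on YARN reads the cluster's client-side configuration from the directory named by HADOOP_CONF_DIR, so it can help to check that directory before going further. The sketch below is an assumption-laden sanity check: `check_hadoop_conf` is a hypothetical helper, and the two file names it looks for are the usual Hadoop client files, not something the configuration above spells out.

```shell
# Hypothetical helper: report which expected Hadoop client config files are
# missing from a directory. Returns 0 if all are present, 1 otherwise.
# Usage: check_hadoop_conf CONF_DIR
check_hadoop_conf() {
    conf_dir="$1"
    missing=0
    for f in core-site.xml yarn-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing: $conf_dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

    With the settings above this would be called as `check_hadoop_conf /opt/hadoop/hadoop-cluster/modules/hadoop-2.3.0-cdh5.0.2/etc/hadoop`.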
    

    2.2 Plan and start the cluster

    --Cluster plan--
    hadoop-dev-211  Master, Worker
    hadoop-dev-212  Worker
    hadoop-dev-213  Worker
    hadoop-dev-214  Worker
    --Start the Master--
    sbin/start-master.sh
    --Start the Workers--
    sbin/start-slaves.sh
    

    2.3 View the web UI
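
    When the Master and Workers are started with the standalone scripts, their web UIs listen on ports 8080 and 8081 by default. A minimal sketch that prints the URLs to open, assuming those defaults have not been overridden; `ui_urls` is a hypothetical helper.

```shell
# Hypothetical helper: print the web UI URL for the Master and each Worker,
# assuming the default ports (Master 8080, Worker 8081).
# Usage: ui_urls MASTER_HOST [WORKER_HOST...]
ui_urls() {
    master="$1"
    shift
    echo "master http://$master:8080"
    for w in "$@"; do
        echo "worker http://$w:8081"
    done
}
```

    For the cluster plan in this post the call would be `ui_urls hadoop-dev-211 hadoop-dev-211 hadoop-dev-212 hadoop-dev-213 hadoop-dev-214`; each URL can be opened in a browser or probed with curl.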

    3 Integrate Hive

    Copy hive-site.xml and hive-log4j.properties into Spark's conf directory.
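
    The copy step can be sketched as a small function. `copy_hive_conf` is a hypothetical helper, and both directory arguments are assumptions to be replaced with the actual Hive and Spark locations on the machine.

```shell
# Hypothetical helper: copy the two Hive config files into Spark's conf directory.
# Stops at the first file that cannot be copied.
# Usage: copy_hive_conf HIVE_CONF_DIR SPARK_CONF_DIR
copy_hive_conf() {
    hive_conf="$1"
    spark_conf="$2"
    for f in hive-site.xml hive-log4j.properties; do
        cp "$hive_conf/$f" "$spark_conf/$f" || return 1
    done
}
```

    If Hive lives under the apache-hive-1.2.1-bin directory used later in this post, the call might look like `copy_hive_conf /opt/hadoop/hadoop-cluster/modules/apache-hive-1.2.1-bin/conf conf` from the Spark root.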
    

    4 Spark Example

    4.1 Load MySQL data into Hive

    # Step 1: start spark-shell
    bin/spark-shell --jars lib_managed/jars/hadoop-lzo-0.4.17.jar \
      --driver-class-path /opt/hadoop/hadoop-cluster/modules/apache-hive-1.2.1-bin/lib/mysql-connector-java-5.6-bin.jar
    // Step 2: read the MySQL table into a DataFrame
    val jdbcDF = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop-dev-212:3306/hive", "dbtable" -> "VERSION", "user" -> "hive", "password" -> "123456")).load()
    // Step 3: save the DataFrame as a Hive table
    jdbcDF.write.saveAsTable("test")
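
    To check that the table landed in Hive, one option is to count its rows from the command line. This is a sketch under assumptions: it relies on the bin/spark-sql launcher that comes with the -Phive-thriftserver build, and `count_rows` is a hypothetical helper.

```shell
# Hypothetical helper: run a row count against a table through bin/spark-sql.
# Usage: count_rows TABLE [SPARK_HOME]
count_rows() {
    table="$1"
    spark_home="${2:-${SPARK_HOME:-.}}"
    # -e passes the query on the command line instead of opening a prompt.
    "$spark_home/bin/spark-sql" -e "SELECT COUNT(*) FROM $table"
}
```

    From the Spark root, `count_rows test` would query the table created in step 3.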
  • Original article: https://www.cnblogs.com/riordon/p/5670206.html