Hive on Spark


    I. Hive on Spark means running Hive on top of Spark: the Spark execution engine replaces MapReduce, on the same principle as Hive on Tez.
    The client is the $HIVE_HOME/bin/hive command, run from the Linux shell.
     
    For this mode you need to download the Spark source and rebuild it as a version WITHOUT Hive support.
     
    Steps:
    1. Download the Spark 1.4.1 source:
    https://github.com/apache/spark/tree/v1.4.1

    and extract it.

    2. Build it with:
    ./make-distribution.sh --name "hadoop-2.6.0" --tgz "-Dyarn.version=2.6.0 -Dhadoop.version=2.6.0 -Pyarn"
     
    3. Configure spark-env.sh:
     
    export JAVA_HOME=/usr/local/soft/jdk1.7.0
    #export SPARK_MASTER_IP=hadoop-spark01
    export SPARK_MASTER_WEBUI_PORT=8099
    #export SPARK_MASTER_IP=localhost
    export SPARK_MASTER_PORT=7077
    export SPARK_WORKER_CORES=2
    export SPARK_WORKER_INSTANCES=2
    export SPARK_WORKER_MEMORY=1g
    #export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/nfs/spark/recovery"
    # Spark HA configuration (ZooKeeper-based master recovery):
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop-spark01:2181,hadoop-spark02:2181,hadoop-spark03:2181 -Dspark.deploy.zookeeper.dir=/home/data/spark/zkdir"
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HIVE_CONF_DIR=$HIVE_HOME/conf
    export SPARK_HOME=/usr/local/soft/spark-1.4.1-bin-hadoop-2.6.0
    export SPARK_CLASSPATH=/usr/local/soft/sparkclasspath/mysql-connector-java-5.1.38-bin.jar:/usr/local/soft/sparkclasspath/hive-hbase-handler-1.2.1.jar:/usr/local/soft/sparkclasspath/hbase-common-1.1.2.jar:/usr/local/soft/sparkclasspath/hbase-client-1.1.2.jar:/usr/local/soft/sparkclasspath/hbase-protocol-1.1.2.jar:/usr/local/soft/sparkclasspath/hbase-server-1.1.2.jar:/usr/local/soft/sparkclasspath/protobuf-java-2.5.0.jar:/usr/local/soft/sparkclasspath/htrace-core-3.1.0-incubating.jar:/usr/local/soft/sparkclasspath/guava-12.0.1.jar:/usr/local/soft/sparkclasspath/hive-exec-1.2.1.jar
    #export SPARK_LIBRARY_PATH=/usr/local/soft/hbase-1.1.2/lib
    export SPARK_JAR=/usr/local/soft/spark-1.4.1-bin-hadoop-2.6.0/lib/spark-assembly-1.4.1-hadoop2.6.0.jar
    export PATH=$SPARK_HOME/bin:$PATH
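Hand-maintaining the long SPARK_CLASSPATH above is error-prone. As a sketch (assuming every needed jar sits in that one directory), the colon-separated list can instead be generated from the directory contents:

```shell
# Build the colon-separated classpath from every jar in the directory,
# instead of hand-writing each entry (path from this setup; adjust as needed).
JAR_DIR=/usr/local/soft/sparkclasspath
SPARK_CLASSPATH=$(printf '%s:' "$JAR_DIR"/*.jar)
export SPARK_CLASSPATH=${SPARK_CLASSPATH%:}   # drop the trailing colon
```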
     
    4. Copy spark-assembly-1.4.1-hadoop2.6.0.jar into $HIVE_HOME/lib.
     
    5. Edit hive-site.xml:
    <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop-spark01:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
     
    <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>5</value>
    <description>Minimum number of Thrift worker threads</description>
    </property>
     
    <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>500</value>
    <description>Maximum number of Thrift worker threads</description>
    </property>
     
    <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
    </property>
     
    <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop-spark01</value>
    <description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
    </property>
    <property>
    <name>spark.serializer</name>
    <value>org.apache.spark.serializer.KryoSerializer</value>
    </property>
    <property>
    <name>spark.eventLog.enabled</name>
    <value>true</value>
    </property>
    <property>
    <name>spark.eventLog.dir</name>
    <value>hdfs://founder/sparklog/logs</value>
    </property>
    <property>
    <name>spark.master</name>
    <value>spark://hadoop-spark01:7077,hadoop-spark02:7077</value>
    </property>
    These parameters also need to be set:
    <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/opt/hive-1.2/tmp</value>
    </property>
    <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/opt/hive-1.2/resources</value>
    </property>
     
     
     
    Configure the MySQL metastore connection:
    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop-spark01:3306/hive_db</value>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    </property>
     
     
    6. Start everything.
    Start Spark:
    ./start-all.sh
    On the backup-master node:
    ./start-master.sh
     
    Start Hive:
    ./hive
     
     
    ------------------------------------------------------------------------------------------------------------------------------------------------------
     
    II. Connecting with Beeline. This approach is more practical, because clients can connect over JDBC.
    Note that it does NOT require recompiling the Spark source; here Spark must be built WITH Hive support.
    1. Start Spark.
    2. Start the Thrift server:
       cd $SPARK_HOME/sbin
       ./start-thriftserver.sh --master spark://hadoop-spark01:7077 --executor-memory 1g
 
    3. Start the Hive metastore:
        hive --service metastore > metastore.log 2>&1 &
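The redirection used when starting the metastore ("> metastore.log 2>&1 &") sends stdout to the log file, routes stderr into the same file, and backgrounds the process so the shell stays usable. A minimal local demonstration of the same pattern:

```shell
# Same redirection pattern as the metastore command, on a throwaway command:
# stdout and stderr both end up in demo.log, and the job runs in the background.
sh -c 'echo to-stdout; echo to-stderr 1>&2' > demo.log 2>&1 &
wait            # block until the background job finishes
cat demo.log    # shows both lines
```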
     
     
    Connect with Beeline:
            
      [root@hadoop-spark01 logs]# beeline
        beeline> !connect jdbc:hive2://hadoop-spark01:10000
        0: jdbc:hive2://hadoop-spark01:10000> select count(*) from t_trackinfo;
    +------+--+
    | _c0  |
    +------+--+
    | 188  |
    +------+--+
    1 row selected (16.738 seconds)
     
    A few points to note:
    1. The data in my Hive tables was synced over from HBase.
    2. There is no need to recompile the Hive source; the build downloaded from the Apache site works as-is.
    3. The thriftserver2 approach is what is generally used: client programs operate on Hive through JDBC. So no source compilation is needed; the right configuration is enough.
     
    The settings below are reported as deprecated; it is enough to write their replacements in spark-defaults.conf:
     
    SPARK_CLASSPATH was detected (set to '/usr/local/soft/sparkclasspath/mysql-connector-java-5.1.38-bin.jar:/usr/local/soft/sparkclasspath/hive-hbase-handler-1.2.1.jar:/usr/local/soft/sparkclasspath/hbase-common-1.1.2.jar:/usr/local/soft/sparkclasspath/hbase-client-1.1.2.jar:/usr/local/soft/sparkclasspath/hbase-protocol-1.1.2.jar:/usr/local/soft/sparkclasspath/hbase-server-1.1.2.jar:/usr/local/soft/sparkclasspath/protobuf-java-2.5.0.jar:/usr/local/soft/sparkclasspath/htrace-core-3.1.0-incubating.jar:/usr/local/soft/sparkclasspath/guava-12.0.1.jar:/usr/local/soft/sparkclasspath/hive-exec-1.2.1.jar'). This is deprecated in Spark 1.0+.
     
    Please instead use:
     - ./spark-submit with --driver-class-path to augment the driver classpath
     - spark.executor.extraClassPath to augment the executor classpath
     
    SPARK_WORKER_INSTANCES was detected (set to '2').
    This is deprecated in Spark 1.0+.
     
    Please instead use:
     - ./spark-submit with --num-executors to specify the number of executors
     - Or set SPARK_EXECUTOR_INSTANCES
     - spark.executor.instances to configure the number of instances in the spark config.
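    Following those warnings, the deprecated settings from this setup map to spark-defaults.conf entries roughly as follows (a sketch using this setup's paths and worker count; adjust to your cluster):

```properties
# spark-defaults.conf equivalents of the deprecated env vars above
spark.executor.instances      2
spark.executor.extraClassPath /usr/local/soft/sparkclasspath/mysql-connector-java-5.1.38-bin.jar:/usr/local/soft/sparkclasspath/hive-hbase-handler-1.2.1.jar:/usr/local/soft/sparkclasspath/hbase-common-1.1.2.jar:/usr/local/soft/sparkclasspath/hbase-client-1.1.2.jar:/usr/local/soft/sparkclasspath/hbase-protocol-1.1.2.jar:/usr/local/soft/sparkclasspath/hbase-server-1.1.2.jar:/usr/local/soft/sparkclasspath/protobuf-java-2.5.0.jar:/usr/local/soft/sparkclasspath/htrace-core-3.1.0-incubating.jar:/usr/local/soft/sparkclasspath/guava-12.0.1.jar:/usr/local/soft/sparkclasspath/hive-exec-1.2.1.jar
```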
  • Original post: https://www.cnblogs.com/wangliansong/p/5075046.html