• 调用javaAPI访问hive


    jdbc远程连接hiveserver2

    2016-04-26 15:59 本站整理 浏览(425)
     
     
    在之前的学习和实践Hive中,使用的都是CLI或者hive –e的方式,该方式仅允许使用HiveQL执行查询、更新等操作,并且该方式比较笨拙单一。幸好Hive提供了轻客户端的实现,通过HiveServer或者HiveServer2,客户端可以在不启动CLI的情况下对Hive中的数据进行操作,两者都允许远程客户端使用多种编程语言如Java、Python向Hive提交请求,取回结果。HiveServer或者HiveServer2都是基于Thrift的,但HiveSever有时被称为Thrift
    server,而HiveServer2却不会。既然已经存在HiveServer为什么还需要HiveServer2呢?这是因为HiveServer不能处理多于一个客户端的并发请求,这是由于HiveServer使用的Thrift接口所导致的限制,不能通过修改HiveServer的代码修正。因此在Hive-0.11.0版本中重写了HiveServer代码得到了HiveServer2,进而解决了该问题。HiveServer2支持多客户端的并发和认证,为开放API客户端如JDBC、ODBC提供了更好的支持。

    所以本文将以HiveServer2为例,介绍并编写远程操作的Hive的Java API。

    首先先列出并本文使用的hive的关键的配置信息:

    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/usr/hive/warehouse</value>               //(hive中的数据库和表在HDFS中存放的文件夹的位置)
      <description>location of default database for the warehouse</description>
    </property>
    <property>
      <name>hive.server2.thrift.port</name>
      <value>10000</value>                               //(HiveServer2远程连接的端口,默认为10000)
      <description>Port number of HiveServer2 Thrift interface.
      Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
    </property>
    
    <property>
      <name>hive.server2.thrift.bind.host</name>
      <value>**.**.**.**</value>                          //(hive所在集群的IP地址)
      <description>Bind host on which to run the HiveServer2 Thrift interface.
      Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
    </property>
    <property>
      <name>hive.server2.long.polling.timeout</name>
      <value>5000</value>                                // (默认为5000L,此处修改为5000,不然程序会报错)
      <description>Time in milliseconds that HiveServer2 will wait, before responding to asynchronous calls that use long polling</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>  //(Hive的元数据库,我采用的是本地Mysql作为元数据库)
      <description>JDBC connect string for a JDBC metastore</description>
    </property>
    
    <property>                         
      <name>javax.jdo.option.ConnectionDriverName</name>          //(连接元数据的驱动名)
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>             //(连接元数据库用户名)
      <value>hive</value>
      <description>username to use against metastore database</description>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>             // (连接元数据库密码)
      <value>hive</value>
      <description>password to use against metastore database</description>
    </property>


    
    
    
    

    确保上述正确配置后,下面启动HiveServer2服务:

    先启动元数据库,在命令行中键入:hive --service metastore & (&符号表示该进程将在后台运行,因为执行此命令后命令行会卡住,如果没加此符号,用ctrl+C退回命令行输入界面后会自动shotdown 该服务)

    如下图:



    之后命令行会卡住,此时查看日志文件hive.log,显示如下:

    2016-04-26 04:44:53,956 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5060)) - Starting hive metastore on port 9083
    2016-04-26 04:44:54,174 WARN  [main]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
    2016-04-26 04:44:54,326 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(494)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    2016-04-26 04:44:54,412 INFO  [main]: metastore.ObjectStore (ObjectStore.java:initialize(245)) - ObjectStore, initialize called
    2016-04-26 04:44:57,240 WARN  [main]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
    2016-04-26 04:44:57,246 INFO  [main]: metastore.ObjectStore (ObjectStore.java:getPMF(314)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    2016-04-26 04:45:03,597 INFO  [main]: metastore.ObjectStore (ObjectStore.java:setConf(228)) - Initialized ObjectStore
    2016-04-26 04:45:03,806 WARN  [main]: metastore.ObjectStore (ObjectStore.java:checkSchema(6273)) - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.0
    2016-04-26 04:45:04,811 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(552)) - Added admin role in metastore
    2016-04-26 04:45:04,828 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(561)) - Added public role in metastore
    2016-04-26 04:45:04,984 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(589)) - No user is added in admin role, since config is empty
    2016-04-26 04:45:05,361 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5182)) - Starting DB backed MetaStore Server
    2016-04-26 04:45:05,369 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5194)) - Started the new metaserver on port [9083]...
    2016-04-26 04:45:05,369 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5196)) - Options.minWorkerThreads = 200
    2016-04-26 04:45:05,370 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5198)) - Options.maxWorkerThreads = 100000
    2016-04-26 04:45:05,370 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5200)) - TCP keepalive = true
    此时证明metastore已经开启。

    接下来开启hiveserver2服务:

    在命令行中键入:hive --service hiveserver2 &

    同上,也会出现命令行卡住的现象。查看日志文件如下:

    2016-04-26 04:53:24,212 INFO  [main]: server.HiveServer2 (HiveStringUtils.java:startupShutdownMessage(605)) - STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting HiveServer2
    STARTUP_MSG:   host = master/(你之前配置的IP)
    STARTUP_MSG:   args = []
    STARTUP_MSG:   version = 0.13.0
    STARTUP_MSG:   classpath = /opt/modules/hadoop-2.2.0/etc/hadoop:/opt/modules/hadoop-2.2.0/share/hadoop/common/lib
    //(……中间略掉classpath内容,日志信息太长……)
    STARTUP_MSG:   build = file:///Users/hbutani/svn/branch-0.13 -r Unknown; compiled by 'hbutani' on Tue Apr 15 13:55:42 PDT 2014
    ************************************************************/
    2016-04-26 04:53:24,553 WARN  [main]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
    2016-04-26 04:53:25,258 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(494)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    2016-04-26 04:53:25,325 INFO  [main]: metastore.ObjectStore (ObjectStore.java:initialize(245)) - ObjectStore, initialize called
    2016-04-26 04:53:28,312 WARN  [main]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
    2016-04-26 04:53:28,313 INFO  [main]: metastore.ObjectStore (ObjectStore.java:getPMF(314)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    2016-04-26 04:53:31,537 INFO  [main]: metastore.ObjectStore (ObjectStore.java:setConf(228)) - Initialized ObjectStore
    2016-04-26 04:53:32,064 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(552)) - Added admin role in metastore
    2016-04-26 04:53:32,079 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(561)) - Added public role in metastore
    2016-04-26 04:53:32,205 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(589)) - No user is added in admin role, since config is empty
    2016-04-26 04:53:33,887 INFO  [main]: session.SessionState (SessionState.java:start(358)) - No Tez session required at this point. hive.execution.engine=mr.
    2016-04-26 04:53:34,168 WARN  [main]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
    2016-04-26 04:53:34,241 INFO  [main]: service.CompositeService (SessionManager.java:init(70)) - HiveServer2: Async execution thread pool size: 100
    2016-04-26 04:53:34,241 INFO  [main]: service.CompositeService (SessionManager.java:init(72)) - HiveServer2: Async execution wait queue size: 100
    2016-04-26 04:53:34,242 INFO  [main]: service.CompositeService (SessionManager.java:init(74)) - HiveServer2: Async execution thread keepalive time: 10
    2016-04-26 04:53:34,244 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:OperationManager is inited.
    2016-04-26 04:53:34,247 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:SessionManager is inited.
    2016-04-26 04:53:34,247 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:CLIService is inited.
    2016-04-26 04:53:34,247 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:ThriftBinaryCLIService is inited.
    2016-04-26 04:53:34,247 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:HiveServer2 is inited.
    2016-04-26 04:53:34,248 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:OperationManager is started.
    2016-04-26 04:53:34,248 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:SessionManager is started.
    2016-04-26 04:53:34,248 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:CLIService is started.
    2016-04-26 04:53:34,698 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(589)) - No user is added in admin role, since config is empty
    2016-04-26 04:53:34,699 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(624)) - 0: get_databases: default
    2016-04-26 04:53:34,701 INFO  [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(306)) - ugi=hh	ip=unknown-ip-addr	cmd=get_databases: default	
    2016-04-26 04:53:34,725 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(494)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    2016-04-26 04:53:34,728 INFO  [main]: metastore.ObjectStore (ObjectStore.java:initialize(245)) - ObjectStore, initialize called
    2016-04-26 04:53:34,745 INFO  [main]: metastore.ObjectStore (ObjectStore.java:setConf(228)) - Initialized ObjectStore
    2016-04-26 04:53:34,795 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:ThriftBinaryCLIService is started.
    2016-04-26 04:53:34,796 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:HiveServer2 is started.
    2016-04-26 04:53:34,947 WARN  [Thread-5]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
    2016-04-26 04:53:35,584 INFO  [Thread-5]: thrift.ThriftCLIService (ThriftBinaryCLIService.java:run(88)) - ThriftBinaryCLIService listening on /(你的IP):10000

    你也可以通过下述命令查看hiveserver2是否已经开启:

    [hh@master Desktop]$ netstat -nl |grep 10000
    tcp        0      0 (你的IP):10000        0.0.0.0:*                   LISTEN

    
    

    此时证明hiveserver2服务已经开启!
    (注意:一定要去查看日志信息,因为命令行并不会报错,如果启动失败,相应的异常会在日志信息中显示,日志文件hive.log的路径在$HIVE_HOME/conf/hive-log4j.properties中配置)

    下面开始编写java API:

    首先列出本程序依赖的Jar包:

    hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
    $HIVE_HOME/lib/hive-exec-0.11.0.jar 
    $HIVE_HOME/lib/hive-jdbc-0.11.0.jar 
    $HIVE_HOME/lib/hive-metastore-0.11.0.jar 
    $HIVE_HOME/lib/hive-service-0.11.0.jar 
    $HIVE_HOME/lib/libfb303-0.9.0.jar 
    $HIVE_HOME/lib/commons-logging-1.0.4.jar 
    $HIVE_HOME/lib/slf4j-api-1.6.1.jar


    下面贴出java代码:

    JDBCToHiveUtils.java

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    
    public class JDBCToHiveUtils {
    	private static String driverName ="org.apache.hive.jdbc.HiveDriver";
    	private static String Url="jdbc:hive2://**.**.**.**:10000/default";    //填写hive的IP,之前在配置文件中配置的IP
    	private static Connection conn;
    	public static Connection getConnnection()
    	{
    		try
                   {
        	          Class.forName(driverName);
        	          conn = DriverManager.getConnection(Url,"hh","");        //此处的用户名一定是有权限操作HDFS的用户,否则程序会提示"permission deny"异常
                   }
    		catch(ClassNotFoundException e)  {
        	           e.printStackTrace();
        	           System.exit(1);
                    }
    		 catch (SQLException e) {
    			e.printStackTrace();
    		}
    		return conn;
    	}
    	public static PreparedStatement prepare(Connection conn, String sql) {
    		PreparedStatement ps = null;
    		try {
    			ps = conn.prepareStatement(sql);
    		} catch (SQLException e) {
    			e.printStackTrace();
    		}
    		return ps;
    	}
    }


    QueryHiveUtils.java

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    
    public class QueryHiveUtils {
    	private static Connection conn=JDBCToHiveUtils.getConnnection();
    	private static PreparedStatement ps;
    	private static ResultSet rs;
    	public static void getAll(String tablename)
    	{
    		String sql="select * from "+tablename;
    		System.out.println(sql);
    		try {
    			ps=JDBCToHiveUtils.prepare(conn, sql);
    			rs=ps.executeQuery();
    			int columns=rs.getMetaData().getColumnCount();
    			while(rs.next())
    		    {
    			    for(int i=1;i<=columns;i++)
    			    {
    			        System.out.print(rs.getString(i));
    			        System.out.print("		");
    			    }
    			    System.out.println();
    			}
    		} catch (SQLException e) {
    			// TODO Auto-generated catch block
    			e.printStackTrace();
    		}
    
    	}
    
    }


    QuerHiveTest.java

    public class QueryHiveTest {
    
    	public static void main(String[] args) {
    		String tablename="test1";
                    QueryHiveUtils.getAll(tablename);
    	}
    
    }


    运行结果如下:
    select * from test1
    1		张三		男		20.0				
    2		李四		女		35.0				
    3		王五		男		null				
    4		赵六		null		70.0
  • 相关阅读:
    spark 系列之六 SparkStreaming数据源之socket流
    spark 系列之五 SparkStreaming数据源之文件流
    spark 系列之四 Spark数据源之关系型数据库
    spark 系列之三 RDD,DataFrame与DataSet之间的转化
    spark 系列之二 Dataframe的使用
    spark 系列之一 RDD的使用
    Java Int类型与字符,汉字之间的转换
    Java实现栈
    名字的漂亮度
    在字符串中找出连续最长的数字串
  • 原文地址:https://www.cnblogs.com/superAng/p/5939642.html
Copyright © 2020-2023  润新知