• Hive Study Notes: Hive Hooks


    Hive hooks are Hive's callback functions; they can be invoked at the points in the query lifecycle listed below.

     Reference:

    https://www.slideshare.net/julingks/apache-hive-hooksminwookim130813
    

    The execution order of these hooks during a Hive SQL run is shown below (a minimal driver-run hook sketch follows the list):

    Driver.run()
    
    => HiveDriverRunHook.preDriverRun()(hive.exec.driver.run.hooks)
    
    => Driver.compile()
    
    => HiveSemanticAnalyzerHook.preAnalyze()(hive.semantic.analyzer.hook)
    
    => SemanticAnalyze(QueryBlock, LogicalPlan, PhyPlan, TaskTree)
    
    => HiveSemanticAnalyzerHook.postAnalyze()(hive.semantic.analyzer.hook)
    
    => QueryString redactor(hive.exec.query.redactor.hooks)
    
    => QueryPlan Generation
    
    => Authorization
    
    => Driver.execute()
    
    => ExecuteWithHookContext.run() || PreExecute.run() (hive.exec.pre.hooks)
    
    => TaskRunner
    
    => if failed, ExecuteWithHookContext.run()(hive.exec.failure.hooks)
    
    => ExecuteWithHookContext.run() || PostExecute.run() (hive.exec.post.hooks)
    
    => HiveDriverRunHook.postDriverRun()(hive.exec.driver.run.hooks)
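
    As an illustration of the first and last links in this chain, a minimal HiveDriverRunHook might look like the sketch below (MyDriverRunHook is a hypothetical class for illustration; register its name under hive.exec.driver.run.hooks):

    package com.bigdata.hive;

    import org.apache.hadoop.hive.ql.HiveDriverRunHook;
    import org.apache.hadoop.hive.ql.HiveDriverRunHookContext;

    // A sketch of a driver-run hook: preDriverRun fires before compilation,
    // postDriverRun fires after the whole command has been executed.
    public class MyDriverRunHook implements HiveDriverRunHook {

        @Override
        public void preDriverRun(HiveDriverRunHookContext ctx) throws Exception {
            System.out.println("before Driver.run(): " + ctx.getCommand());
        }

        @Override
        public void postDriverRun(HiveDriverRunHookContext ctx) throws Exception {
            System.out.println("after Driver.run(): " + ctx.getCommand());
        }
    }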
    

    Reference:

    https://my.oschina.net/kavn/blog/1514648
    

    1. The ExecuteWithHookContext interface

    Below we implement a hook through the ExecuteWithHookContext interface. Other hooks can be built by implementing the HiveSemanticAnalyzerHook interface, extending the AbstractSemanticAnalyzerHook abstract class, and so on.

    ExecuteWithHookContext can back three hook types: pre-execution, post-execution, and execution-failure, all of which fire during the last few steps of Hive SQL execution.
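
    Since HookContext exposes which phase invoked the hook, a single class can serve all three points. A minimal sketch (PhaseAwareHook is a hypothetical class for illustration; register its name under hive.exec.pre.hooks, hive.exec.post.hooks, and/or hive.exec.failure.hooks):

    package com.bigdata.hive;

    import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
    import org.apache.hadoop.hive.ql.hooks.HookContext;

    // One hook class handling all three phases by inspecting the hook type.
    public class PhaseAwareHook implements ExecuteWithHookContext {

        @Override
        public void run(HookContext hookContext) throws Exception {
            switch (hookContext.getHookType()) {
                case PRE_EXEC_HOOK:      // fired via hive.exec.pre.hooks
                    System.out.println("before execution");
                    break;
                case POST_EXEC_HOOK:     // fired via hive.exec.post.hooks
                    System.out.println("after successful execution");
                    break;
                case ON_FAILURE_HOOK:    // fired via hive.exec.failure.hooks
                    System.out.println("after failed execution");
                    break;
            }
        }
    }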

    Dependencies

    Note that the dependency versions must match the cluster; my CDH cluster runs cdh5.16.2.

    If you use the Apache 1.1.0 build of hive-exec, the generated parser constants differ in places from Cloudera's 1.1.0-cdh5.16.2 build. For example:

    in 1.1.0-cdh5.16.2, TOK_QUERY=789,

    but in Apache 1.1.0, TOK_QUERY=777.
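
    Rather than relying on these numeric values, you can print the token IDs of the exact hive-exec jar on your classpath. A small check (TokenIdCheck is a hypothetical helper class, not part of the module below):

    package com.bigdata.hive;

    import org.apache.hadoop.hive.ql.parse.HiveParser;

    // Prints the token IDs compiled into the hive-exec jar actually on the
    // classpath, so hook code can be verified against the cluster's version.
    public class TokenIdCheck {
        public static void main(String[] args) {
            System.out.println("TOK_QUERY = " + HiveParser.TOK_QUERY);
            System.out.println("TOK_FROM  = " + HiveParser.TOK_FROM);
        }
    }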

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <parent>
            <artifactId>interview-parent</artifactId>
            <groupId>com.interview</groupId>
            <version>1.0-SNAPSHOT</version>
        </parent>
        <modelVersion>4.0.0</modelVersion>
    
        <artifactId>interview-bigdata</artifactId>
    
        <dependencies>
            <!-- logback -->
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>log4j-over-slf4j</artifactId>
                <version>1.7.25</version>
            </dependency>
            <!--hive-->
            <dependency>
                <groupId>org.apache.hive</groupId>
                <artifactId>hive-exec</artifactId>
                <version>1.1.0-cdh5.16.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hive</groupId>
                <artifactId>hive-metastore</artifactId>
                <version>1.1.0-cdh5.16.2</version>
            </dependency>
            <!--hadoop-->
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>2.6.0-cdh5.16.2</version>
            </dependency>
    
        </dependencies>
    
        <!--<build>-->
            <!--<plugins>-->
                <!--<plugin>-->
                    <!--<groupId>org.apache.maven.plugins</groupId>-->
                    <!--<artifactId>maven-shade-plugin</artifactId>-->
                    <!--<executions>-->
                        <!-- Run shade goal on package phase -->
                        <!--<execution>-->
                            <!--<phase>package</phase>-->
                            <!--<goals>-->
                                <!--<goal>shade</goal>-->
                            <!--</goals>-->
                            <!--<configuration>-->
                                <!--<filters>-->
                                    <!--<filter>-->
                                        <!-- Do not copy the signatures in the META-INF folder.
                                             Otherwise, this might cause SecurityExceptions when using the JAR. -->
                                        <!--<artifact>*:*</artifact>-->
                                        <!--<excludes>-->
                                            <!--<exclude>META-INF/*.SF</exclude>-->
                                            <!--<exclude>META-INF/*.DSA</exclude>-->
                                            <!--<exclude>META-INF/*.RSA</exclude>-->
                                        <!--</excludes>-->
                                    <!--</filter>-->
                                <!--</filters>-->

                                <!--<createDependencyReducedPom>false</createDependencyReducedPom>-->
                            <!--</configuration>-->
                        <!--</execution>-->
                    <!--</executions>-->
                <!--</plugin>-->

                <!--<plugin>-->
                    <!--<groupId>org.apache.maven.plugins</groupId>-->
                    <!--<artifactId>maven-compiler-plugin</artifactId>-->
                    <!--<configuration>-->
                        <!--<source>1.8</source>-->
                        <!--<target>1.8</target>-->
                    <!--</configuration>-->
                <!--</plugin>-->

            <!--</plugins>-->
        <!--</build>-->
    
    </project>
    

    The code simply logs a single line:

    package com.bigdata.hive;
    
    import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
    import org.apache.hadoop.hive.ql.hooks.HookContext;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    public class MyHiveHook implements ExecuteWithHookContext {
    
        private static Logger logger = LoggerFactory.getLogger(MyHiveHook.class);
    
        @Override
        public void run(HookContext hookContext) throws Exception {
            logger.info("this is my hive hook");
        }
    }
    

    Package the module:

    mvn clean package
    

    Upload the built jar to the /var/lib/hive directory on every machine running HiveServer2, or place it in an HDFS directory:

    root@master:/var/lib/hive# ls
    examples.desktop  interview-bigdata-1.0-SNAPSHOT.jar
    

    Change the owner to hive:

    sudo chown hive:hive ./interview-bigdata-1.0-SNAPSHOT.jar
    

    In Cloudera Manager, configure Hive's auxiliary jar directory and the hook:

    The jar directory:

    The hook: here it is configured under hive.exec.pre.hooks, so the hook will run before the SQL executes.
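
    For reference, outside Cloudera Manager the same two settings can be sketched in hive-site.xml roughly as below (the property names are standard Hive; the jar path is the one used above):

        <property>
            <name>hive.exec.pre.hooks</name>
            <value>com.bigdata.hive.MyHiveHook</value>
        </property>
        <property>
            <name>hive.aux.jars.path</name>
            <value>file:///var/lib/hive/interview-bigdata-1.0-SNAPSHOT.jar</value>
        </property>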

     

    After the first configuration you need to restart the Hive service; afterwards, when replacing the jar, running the reload command in Hue is enough.

    Run a SQL statement.

    Check the HiveServer2 log:

    tail -n 100 /var/log/hive/hadoop-cmf-hive-HIVESERVER2-master.log.out
    

    You can see the line printed by the hook.

    Next, let's capture the query from the hook context and print it.

    Reference:

    https://towardsdatascience.com/apache-hive-hooks-and-metastore-listeners-a-tale-of-your-metadata-903b751ee99f
    

    The code (when replacing the jar, Hive must be restarted for the new version to take effect):

    package com.bigdata.hive;
    
    import org.apache.hadoop.hive.metastore.api.Database;
    import org.apache.hadoop.hive.ql.QueryPlan;
    import org.apache.hadoop.hive.ql.hooks.*;
    import org.apache.hadoop.hive.ql.plan.HiveOperation;
    
    import org.codehaus.jackson.map.ObjectMapper;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import java.util.HashSet;
    import java.util.Set;
    
    public class MyHiveHook implements ExecuteWithHookContext {
    
        private static Logger logger = LoggerFactory.getLogger(MyHiveHook.class);
        private static final HashSet<String> OPERATION_NAMES = new HashSet<>();
    
        static {
            OPERATION_NAMES.add(HiveOperation.CREATETABLE.getOperationName());
            OPERATION_NAMES.add(HiveOperation.ALTERDATABASE.getOperationName());
            OPERATION_NAMES.add(HiveOperation.ALTERDATABASE_OWNER.getOperationName());
            OPERATION_NAMES.add(HiveOperation.ALTERTABLE_ADDCOLS.getOperationName());
            OPERATION_NAMES.add(HiveOperation.ALTERTABLE_LOCATION.getOperationName());
            OPERATION_NAMES.add(HiveOperation.ALTERTABLE_PROPERTIES.getOperationName());
            OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAME.getOperationName());
            OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAMECOL.getOperationName());
            OPERATION_NAMES.add(HiveOperation.ALTERTABLE_REPLACECOLS.getOperationName());
            OPERATION_NAMES.add(HiveOperation.CREATEDATABASE.getOperationName());
            OPERATION_NAMES.add(HiveOperation.DROPDATABASE.getOperationName());
            OPERATION_NAMES.add(HiveOperation.DROPTABLE.getOperationName());
        }
    
        @Override
        public void run(HookContext hookContext) throws Exception {
            // This assert comes from the referenced post-exec example; JVM assertions
            // are disabled by default, and in this walkthrough the class is registered
            // as a pre hook, so the check never fires.
            assert (hookContext.getHookType() == HookContext.HookType.POST_EXEC_HOOK);
    
            QueryPlan plan = hookContext.getQueryPlan();
    
            String operationName = plan.getOperationName();
            logWithHeader("Query executed: " + plan.getQueryString());
            logWithHeader("Operation: " + operationName);
            if (OPERATION_NAMES.contains(operationName)
                    && !plan.isExplain()) {
                logWithHeader("Monitored Operation");
                Set<ReadEntity> inputs = hookContext.getInputs();
                Set<WriteEntity> outputs = hookContext.getOutputs();
    
                for (Entity entity : inputs) {
                    logWithHeader("Hook metadata input value: " +  toJson(entity));
                }
    
                for (Entity entity : outputs) {
                    logWithHeader("Hook metadata output value: " +  toJson(entity));
                }
    
            } else {
                logWithHeader("Non-monitored Operation, ignoring hook");
            }
        }
    
        private static String toJson(Entity entity) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            switch (entity.getType()) {
                case DATABASE:
                    Database db = entity.getDatabase();
                    return mapper.writeValueAsString(db);
                case TABLE:
                    return mapper.writeValueAsString(entity.getTable().getTTable());
            }
            // Other entity types (e.g. PARTITION or DFS_DIR) are ignored here.
            return null;
        }
    
        private void logWithHeader(Object obj){
            logger.info("[CustomHook][Thread: "+Thread.currentThread().getName()+"] | " + obj);
        }
    }
    

    Output: you can see that executing select * from test results in two operations.

    One is SWITCHDATABASE:

    2020-03-22 23:50:33,374 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Handler-Pool: Thread-39]: <PERFLOG method=PreHook.com.bigdata.hive.MyHiveHook from=org.apache.hadoop.hive.ql.Driver>
    2020-03-22 23:50:33,374 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Handler-Pool: Thread-39]: [CustomHook][Thread: HiveServer2-Handler-Pool: Thread-39] | Query executed: USE `default`
    2020-03-22 23:50:33,374 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Handler-Pool: Thread-39]: [CustomHook][Thread: HiveServer2-Handler-Pool: Thread-39] | Operation: SWITCHDATABASE
    2020-03-22 23:50:33,374 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Handler-Pool: Thread-39]: [CustomHook][Thread: HiveServer2-Handler-Pool: Thread-39] | Non-monitored Operation, ignoring hook
    2020-03-22 23:50:33,375 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Handler-Pool: Thread-39]: </PERFLOG method=PreHook.com.bigdata.hive.MyHiveHook start=1584892233374 end=1584892233375 duration=1 from=org.apache.hadoop.hive.ql.Driver>
    

    The other is QUERY:

    2020-03-22 23:50:35,282 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-50]: Executing command(queryId=hive_20200322235050_83cdb414-52bd-4990-9d20-87f5dc0d76dc): SELECT * from test
    2020-03-22 23:50:35,283 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-50]: <PERFLOG method=PreHook.com.bigdata.hive.MyHiveHook from=org.apache.hadoop.hive.ql.Driver>
    2020-03-22 23:50:35,283 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-50]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-50] | Query executed: SELECT * from test
    2020-03-22 23:50:35,283 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-50]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-50] | Operation: QUERY
    2020-03-22 23:50:35,283 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-50]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-50] | Non-monitored Operation, ignoring hook
    2020-03-22 23:50:35,283 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-50]: </PERFLOG method=PreHook.com.bigdata.hive.MyHiveHook start=1584892235283 end=1584892235283 duration=0 from=org.apache.hadoop.hive.ql.Driver>
    

    If you execute a CREATE TABLE statement, for example:

    CREATE TABLE `test1`(
    	  `id` int, 
    	  `name` string)
    	ROW FORMAT SERDE 
    	  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    	STORED AS INPUTFORMAT 
    	  'org.apache.hadoop.mapred.TextInputFormat' 
    	OUTPUTFORMAT 
    	  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    	LOCATION
    	  'hdfs://master:8020/user/hive/warehouse/test'
    	TBLPROPERTIES (
    	  'COLUMN_STATS_ACCURATE'='true', 
    	  'numFiles'='3', 
    	  'numRows'='6', 
    	  'rawDataSize'='36', 
    	  'totalSize'='42', 
    	  'transient_lastDdlTime'='1584893066')
    

    then the hook's output will be:

    2020-03-23 00:08:16,486 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-94]: <PERFLOG method=PreHook.com.bigdata.hive.MyHiveHook from=org.apache.hadoop.hive.ql.Driver>
    2020-03-23 00:08:16,486 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-94]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-94] | Query executed: CREATE TABLE `test1`(
              `id` int,
              `name` string)
            ROW FORMAT SERDE
              'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
            STORED AS INPUTFORMAT
              'org.apache.hadoop.mapred.TextInputFormat'
            OUTPUTFORMAT
              'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
            LOCATION
              'hdfs://master:8020/user/hive/warehouse/test'
            TBLPROPERTIES (
              'COLUMN_STATS_ACCURATE'='true',
              'numFiles'='3',
              'numRows'='6',
              'rawDataSize'='36',
              'totalSize'='42',
              'transient_lastDdlTime'='1584893066')
    2020-03-23 00:08:16,486 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-94]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-94] | Operation: CREATETABLE
    2020-03-23 00:08:16,486 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-94]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-94] | Monitored Operation
    2020-03-23 00:08:16,487 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-94]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-94] | Hook metadata input value: null
    2020-03-23 00:08:16,497 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-94]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-94] | Hook metadata output value: {"description":"Default Hive database","name":"default","parameters":{},"ownerType":"ROLE","setName":true,"setDescription":true,"locationUri":"hdfs://master:8020/user/hive/warehouse","setLocationUri":true,"setOwnerName":true,"ownerName":"public","setPrivileges":false,"setOwnerType":true,"parametersSize":0,"setParameters":true,"privileges":null}
    2020-03-23 00:08:16,519 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-94]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-94] | Hook metadata output value: {"lastAccessTime":0,"parameters":{},"owner":"hive","ownerType":"USER","dbName":"default","tableType":"MANAGED_TABLE","tableName":"test1","setTableName":true,"setOwner":true,"retention":0,"setRetention":false,"partitionKeysSize":0,"partitionKeysIterator":[],"setPartitionKeys":true,"viewOriginalText":null,"setViewOriginalText":false,"viewExpandedText":null,"setViewExpandedText":false,"setTableType":true,"setPrivileges":false,"setTemporary":false,"setOwnerType":true,"setDbName":true,"createTime":1584893296,"setCreateTime":true,"setLastAccessTime":false,"setSd":true,"parametersSize":0,"setParameters":true,"privileges":null,"sd":{"location":null,"parameters":{},"numBuckets":-1,"inputFormat":"org.apache.hadoop.mapred.SequenceFileInputFormat","outputFormat":"org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat","compressed":false,"sortCols":[],"parametersSize":0,"setParameters":true,"cols":[],"colsSize":0,"serdeInfo":{"setSerializationLib":true,"name":null,"parameters":{"serialization.format":"1"},"setName":false,"parametersSize":1,"setParameters":true,"serializationLib":"org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe"},"skewedInfo":{"skewedColNamesSize":0,"skewedColNamesIterator":[],"setSkewedColNames":true,"skewedColValuesSize":0,"skewedColValuesIterator":[],"setSkewedColValues":true,"skewedColValueLocationMapsSize":0,"setSkewedColValueLocationMaps":true,"skewedColValueLocationMaps":{},"skewedColNames":[],"skewedColValues":[]},"bucketCols":[],"setNumBuckets":true,"setSerdeInfo":true,"bucketColsSize":0,"bucketColsIterator":[],"setBucketCols":true,"sortColsSize":0,"sortColsIterator":[],"setSortCols":true,"setSkewedInfo":true,"storedAsSubDirectories":false,"setStoredAsSubDirectories":false,"colsIterator":[],"setCols":true,"setLocation":false,"setInputFormat":true,"setOutputFormat":true,"setCompressed":false},"temporary":false,"partitionKeys":[]}
    2020-03-23 00:08:16,519 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-94]: </PERFLOG method=PreHook.com.bigdata.hive.MyHiveHook start=1584893296486 end=1584893296519 duration=33 from=org.apache.hadoop.hive.ql.Driver>
    2020-03-23 00:08:16,519 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-94]: </PERFLOG method=TimeToSubmit start=1584893296477 end=1584893296519 duration=42 from=org.apache.hadoop.hive.ql.Driver>
    

    If you execute a column-change statement, for example:

    ALTER TABLE test1 CHANGE id id String COMMENT "test"
    

    then the hook's output will be:

    2020-03-28 14:19:12,912 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Handler-Pool: Thread-39]: </PERFLOG method=compile start=1585376352892 end=1585376352912 duration=20 from=org.apache.hadoop.hive.ql.Driver>
    2020-03-28 14:19:12,912 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Handler-Pool: Thread-39]: Completed compiling command(queryId=hive_20200328141919_d2739f08-478e-4f95-949b-e0bd176e4eab); Time taken: 0.02 seconds
    2020-03-28 14:19:12,913 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-79]: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
    2020-03-28 14:19:12,913 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-79]: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
    2020-03-28 14:19:12,913 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-79]: <PERFLOG method=acquireReadWriteLocks from=org.apache.hadoop.hive.ql.Driver>
    2020-03-28 14:19:12,922 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-79]: </PERFLOG method=acquireReadWriteLocks start=1585376352913 end=1585376352922 duration=9 from=org.apache.hadoop.hive.ql.Driver>
    2020-03-28 14:19:12,922 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-79]: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
    2020-03-28 14:19:12,922 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-79]: Executing command(queryId=hive_20200328141919_d2739f08-478e-4f95-949b-e0bd176e4eab): ALTER TABLE test1 CHANGE id id String COMMENT "test"
    2020-03-28 14:19:12,923 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-79]: <PERFLOG method=PreHook.com.bigdata.hive.MyHiveHook from=org.apache.hadoop.hive.ql.Driver>
    2020-03-28 14:19:12,923 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-79]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-79] | Query executed: ALTER TABLE test1 CHANGE id id String COMMENT "test"
    2020-03-28 14:19:12,923 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-79]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-79] | Operation: ALTERTABLE_RENAMECOL
    2020-03-28 14:19:12,923 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-79]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-79] | Monitored Operation
    2020-03-28 14:19:12,928 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-79]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-79] | Hook metadata input value: {"lastAccessTime":0,"ownerType":"USER","parameters":{"last_modified_time":"1585376257","totalSize":"0","numRows":"-1","rawDataSize":"-1","COLUMN_STATS_ACCURATE":"false","numFiles":"0","transient_lastDdlTime":"1585376257","last_modified_by":"hive"},"owner":"hive","tableName":"test1","dbName":"default","tableType":"MANAGED_TABLE","privileges":null,"sd":{"location":"hdfs://master:8020/user/hive/warehouse/test","parameters":{},"inputFormat":"org.apache.hadoop.mapred.TextInputFormat","outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","compressed":false,"numBuckets":-1,"sortCols":[],"cols":[{"comment":"test","name":"id","type":"string","setName":true,"setType":true,"setComment":true},{"comment":null,"name":"name","type":"string","setName":true,"setType":true,"setComment":false}],"colsSize":2,"serdeInfo":{"setSerializationLib":true,"name":null,"parameters":{"serialization.format":"1"},"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","parametersSize":1,"setParameters":true,"setName":false},"skewedInfo":{"skewedColNamesSize":0,"skewedColNamesIterator":[],"setSkewedColNames":true,"skewedColValuesSize":0,"skewedColValuesIterator":[],"setSkewedColValues":true,"skewedColValueLocationMapsSize":0,"setSkewedColValueLocationMaps":true,"skewedColValueLocationMaps":{},"skewedColNames":[],"skewedColValues":[]},"bucketCols":[],"parametersSize":0,"setParameters":true,"colsIterator":[{"comment":"test","name":"id","type":"string","setName":true,"setType":true,"setComment":true},{"comment":null,"name":"name","type":"string","setName":true,"setType":true,"setComment":false}],"setCols":true,"setLocation":true,"setInputFormat":true,"setOutputFormat":true,"setCompressed":true,"setNumBuckets":true,"setSerdeInfo":true,"bucketColsSize":0,"bucketColsIterator":[],"setBucketCols":true,"sortColsSize":0,"sortColsIterator":[],"setSortCols":true,"setSkewedInfo":true,"storedAsSubDirectories":false,"setStoredAsSubDirectories":true},"temporary":false,"partitionKeys":[],"setTableName":true,"setOwner":true,"retention":0,"setRetention":true,"partitionKeysSize":0,"partitionKeysIterator":[],"setPartitionKeys":true,"viewOriginalText":null,"setViewOriginalText":false,"viewExpandedText":null,"setViewExpandedText":false,"setTableType":true,"setPrivileges":false,"setTemporary":false,"setOwnerType":true,"setDbName":true,"createTime":1584893296,"setCreateTime":true,"setLastAccessTime":true,"setSd":true,"parametersSize":8,"setParameters":true}
    2020-03-28 14:19:12,935 INFO  com.bigdata.hive.MyHiveHook: [HiveServer2-Background-Pool: Thread-79]: [CustomHook][Thread: HiveServer2-Background-Pool: Thread-79] | Hook metadata output value: {"lastAccessTime":0,"ownerType":"USER","parameters":{"last_modified_time":"1585376257","totalSize":"0","numRows":"-1","rawDataSize":"-1","COLUMN_STATS_ACCURATE":"false","numFiles":"0","transient_lastDdlTime":"1585376257","last_modified_by":"hive"},"owner":"hive","tableName":"test1","dbName":"default","tableType":"MANAGED_TABLE","privileges":null,"sd":{"location":"hdfs://master:8020/user/hive/warehouse/test","parameters":{},"inputFormat":"org.apache.hadoop.mapred.TextInputFormat","outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","compressed":false,"numBuckets":-1,"sortCols":[],"cols":[{"comment":"test","name":"id","type":"string","setName":true,"setType":true,"setComment":true},{"comment":null,"name":"name","type":"string","setName":true,"setType":true,"setComment":false}],"colsSize":2,"serdeInfo":{"setSerializationLib":true,"name":null,"parameters":{"serialization.format":"1"},"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","parametersSize":1,"setParameters":true,"setName":false},"skewedInfo":{"skewedColNamesSize":0,"skewedColNamesIterator":[],"setSkewedColNames":true,"skewedColValuesSize":0,"skewedColValuesIterator":[],"setSkewedColValues":true,"skewedColValueLocationMapsSize":0,"setSkewedColValueLocationMaps":true,"skewedColValueLocationMaps":{},"skewedColNames":[],"skewedColValues":[]},"bucketCols":[],"parametersSize":0,"setParameters":true,"colsIterator":[{"comment":"test","name":"id","type":"string","setName":true,"setType":true,"setComment":true},{"comment":null,"name":"name","type":"string","setName":true,"setType":true,"setComment":false}],"setCols":true,"setLocation":true,"setInputFormat":true,"setOutputFormat":true,"setCompressed":true,"setNumBuckets":true,"setSerdeInfo":true,"bucketColsSize":0,"bucketColsIterator":[],"setBucketCols":true,"sortColsSize":0,"sortColsIterator":[],"setSortCols":true,"setSkewedInfo":true,"storedAsSubDirectories":false,"setStoredAsSubDirectories":true},"temporary":false,"partitionKeys":[],"setTableName":true,"setOwner":true,"retention":0,"setRetention":true,"partitionKeysSize":0,"partitionKeysIterator":[],"setPartitionKeys":true,"viewOriginalText":null,"setViewOriginalText":false,"viewExpandedText":null,"setViewExpandedText":false,"setTableType":true,"setPrivileges":false,"setTemporary":false,"setOwnerType":true,"setDbName":true,"createTime":1584893296,"setCreateTime":true,"setLastAccessTime":true,"setSd":true,"parametersSize":8,"setParameters":true}
    

    2. The HiveSemanticAnalyzerHook interface

    Reference: https://www.iteye.com/blog/crazymatrix-2092830

    Implementing the HiveSemanticAnalyzerHook interface lets you insert hooks before and after Hive's semantic analysis, namely preAnalyze and postAnalyze.

    Sometimes a user's SQL depends on session context. In select * from test, for instance, the table test belongs to whichever database the current session has selected (e.g. via use default). You can use

    private final SessionState ss = SessionState.get();
    

     to obtain the current user session, from which the current database can be resolved.

    package com.bigdata.hive;
    
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;
    
    import org.apache.commons.lang.StringUtils;
    import org.apache.hadoop.hive.metastore.api.FieldSchema;
    import org.apache.hadoop.hive.ql.exec.Task;
    import org.apache.hadoop.hive.ql.metadata.Hive;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.metadata.Table;
    import org.apache.hadoop.hive.ql.parse.*;
    import org.apache.hadoop.hive.ql.session.SessionState;
    import org.apache.hadoop.hive.ql.session.SessionState.LogHelper;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    public class MyHiveHook2 implements HiveSemanticAnalyzerHook {
    
        private final static String NO_PARTITION_WARNING = "WARNING: HQL is not efficient, Please specify partition condition! HQL:%s ;USERNAME:%s";
    
        private final SessionState ss = SessionState.get();
        private final LogHelper console = SessionState.getConsole();
        private Hive hive = null;
        private String username;
        private String currentDatabase = "default";
        private String hql;
        private String whereHql;
        private String tableAlias;
        private String tableName;
        private String tableDatabaseName;
        private Boolean needCheckPartition = false;
    
        private static Logger logger = LoggerFactory.getLogger(MyHiveHook2.class);
    
    
        @Override
        public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast) throws SemanticException {
            try {
                logger.info(context.getCommand()); // 当前query
                logger.info(ss.getUserName());  // 当前user
            hql = StringUtils.replaceChars(context.getCommand(), '\n', ' ');
                logger.info("hql: " + hql);
            // Naive extraction: assumes the keyword "where" appears in lowercase.
            if (hql.contains("where")) {
                whereHql = hql.substring(hql.indexOf("where"));
            }
                
                username = context.getUserName();
                logger.info(ast.getToken().getText()); // TOK_QUERY
                logger.info(String.valueOf(ast.getToken().getType())); // token code
                logger.info("" + (ast.getToken().getType() == HiveParser.TOK_QUERY));
                if (ast.getToken().getType() == HiveParser.TOK_QUERY) {
                    try {
                        hive = context.getHive();
    
                        currentDatabase = hive.getDatabaseCurrent().getName();
                        logger.info("current database: " + currentDatabase); // session db
                    } catch (HiveException e) {
                        throw new SemanticException(e);
                    }
    
                    extractFromClause((ASTNode) ast.getChild(0));
    
                    String dbname = StringUtils.isEmpty(tableDatabaseName) ? currentDatabase : tableDatabaseName;
                    String tbname = tableName;
    
                // Escape the dot: String.split takes a regex, so "." alone would match every character.
                String[] parts = tableName.split("\\.");
                    if (parts.length == 2) {
                        dbname = parts[0];
                        tbname = parts[1];
                    }
                    logger.info("this is hive database name: " + dbname); // current db
                    logger.info("this is hive table name: " + tbname);  // current table
                    Table t = hive.getTable(dbname, tbname);
                    if (t.isPartitioned()) {
                        if (StringUtils.isBlank(whereHql)) {
                            console.printError(String.format(NO_PARTITION_WARNING, hql, username));
                        } else {
                            List<FieldSchema> partitionKeys = t.getPartitionKeys();
                            List<String> partitionNames = new ArrayList<String>();
                            for (int i = 0; i < partitionKeys.size(); i++) {
                                partitionNames.add(partitionKeys.get(i).getName().toLowerCase());
                            }
    
                            if (!containsPartCond(partitionNames, whereHql, tableAlias)) {
                                console.printError(String.format(NO_PARTITION_WARNING, hql, username));
                            }
                        }
                    }
    
    
                }
            } catch (Exception ex) {
                logger.info("error: ", ex);
            }
            return ast;
        }
    
        private boolean containsPartCond(List<String> partitionKeys, String sql, String alias) {
            for (String pk : partitionKeys) {
                if (sql.contains(pk)) {
                    return true;
                }
                if (!StringUtils.isEmpty(alias) && sql.contains(alias + "." + pk)) {
                    return true;
                }
            }
            return false;
        }
    
        private void extractFromClause(ASTNode ast) {
            if (HiveParser.TOK_FROM == ast.getToken().getType()) {
                ASTNode refNode = (ASTNode) ast.getChild(0);
                if (refNode.getToken().getType() == HiveParser.TOK_TABREF && ast.getChildCount() == 1) {
                    ASTNode tabNameNode = (ASTNode) (refNode.getChild(0));
                    int refNodeChildCount = refNode.getChildCount();
                    if (tabNameNode.getToken().getType() == HiveParser.TOK_TABNAME) {
                        if (tabNameNode.getChildCount() == 2) {
                            tableDatabaseName = tabNameNode.getChild(0).getText().toLowerCase();
                            tableName = BaseSemanticAnalyzer.getUnescapedName((ASTNode) tabNameNode.getChild(1))
                                    .toLowerCase();
                        } else if (tabNameNode.getChildCount() == 1) {
                            tableName = BaseSemanticAnalyzer.getUnescapedName((ASTNode) tabNameNode.getChild(0))
                                    .toLowerCase();
                        } else {
                            return;
                        }
    
                        if (refNodeChildCount == 2) {
                            tableAlias = BaseSemanticAnalyzer.unescapeIdentifier(refNode.getChild(1).getText())
                                    .toLowerCase();
                        }
                        needCheckPartition = true;
                    }
                }
            }
        }
    
        @Override
        public void postAnalyze(HiveSemanticAnalyzerHookContext context,
                                List<Task<? extends Serializable>> rootTasks) throws SemanticException {
            logger.info(context.getCommand());
    
        }
    
    }
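
    To activate this hook, register it under hive.semantic.analyzer.hook; in plain Hive that would be a hive-site.xml entry like the sketch below (in Cloudera Manager, use the same safety-valve approach as before):

        <property>
            <name>hive.semantic.analyzer.hook</name>
            <value>com.bigdata.hive.MyHiveHook2</value>
        </property>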
    

    The output is:

    2020-04-01 00:41:41,485 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Handler-Pool: Thread-39]: Compiling command(queryId=hive_20200401004141_bf4c578a-6872-4947-8396-7c11b0530539): select * from test
    2020-04-01 00:41:41,486 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Handler-Pool: Thread-39]: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
    2020-04-01 00:41:41,501 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Handler-Pool: Thread-39]: </PERFLOG method=parse start=1585672901486 end=1585672901501 duration=15 from=org.apache.hadoop.hive.ql.Driver>
    2020-04-01 00:41:41,502 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Handler-Pool: Thread-39]: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
    2020-04-01 00:41:41,550 INFO  com.bigdata.hive.MyHiveHook2: [HiveServer2-Handler-Pool: Thread-39]: select * from test
    2020-04-01 00:41:41,551 INFO  com.bigdata.hive.MyHiveHook2: [HiveServer2-Handler-Pool: Thread-39]: hive
    2020-04-01 00:41:41,551 INFO  com.bigdata.hive.MyHiveHook2: [HiveServer2-Handler-Pool: Thread-39]: hql: select * from test
    2020-04-01 00:41:41,551 INFO  com.bigdata.hive.MyHiveHook2: [HiveServer2-Handler-Pool: Thread-39]: TOK_QUERY
    2020-04-01 00:41:41,551 INFO  com.bigdata.hive.MyHiveHook2: [HiveServer2-Handler-Pool: Thread-39]: 789
    2020-04-01 00:41:41,551 INFO  com.bigdata.hive.MyHiveHook2: [HiveServer2-Handler-Pool: Thread-39]: true
    2020-04-01 00:41:41,561 INFO  com.bigdata.hive.MyHiveHook2: [HiveServer2-Handler-Pool: Thread-39]: current database: default
    2020-04-01 00:41:41,561 INFO  com.bigdata.hive.MyHiveHook2: [HiveServer2-Handler-Pool: Thread-39]: this is hive database name: default
    2020-04-01 00:41:41,561 INFO  com.bigdata.hive.MyHiveHook2: [HiveServer2-Handler-Pool: Thread-39]: this is hive table name: test
    2020-04-01 00:41:41,621 INFO  org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: [HiveServer2-Handler-Pool: Thread-39]: Starting Semantic Analysis
    2020-04-01 00:41:41,623 INFO  org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: [HiveServer2-Handler-Pool: Thread-39]: Completed phase 1 of Semantic Analysis
    2020-04-01 00:41:41,623 INFO  org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: [HiveServer2-Handler-Pool: Thread-39]: Get metadata for source tables
    2020-04-01 00:41:41,648 INFO  org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: [HiveServer2-Handler-Pool: Thread-39]: Get metadata for subqueries
    2020-04-01 00:41:41,656 INFO  org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: [HiveServer2-Handler-Pool: Thread-39]: Get metadata for destination tables
    2020-04-01 00:41:41,708 INFO  hive.ql.Context: [HiveServer2-Handler-Pool: Thread-39]: New scratch dir is hdfs://master:8020/tmp/hive/hive/3fc884ec-0f81-49e7-ae87-45ce85ca4139/hive_2020-04-01_00-41-41_485_2813780198360550808-1
    2020-04-01 00:41:41,710 INFO  org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: [HiveServer2-Handler-Pool: Thread-39]: Completed getting MetaData in Semantic Analysis
    