• Hadoop-Impala学习笔记之SQL参考


    参考:Apache Impala Guide--Impala SQL Language Reference。

    Impala使用和Hive一样的元数据存储,Impala可以访问使用原生Impala CREATE TABLE创建的表和Hive DDL创建的表;

    Impala支持和HiveQL类似的DML;

    支持TRUNCATE(至少在5.5及之前,不支持Kudu表)

    UPDATE/UPSERT(5.10开始支持Kudu表)

    Impala提供了很多和HiveQL等价的内置函数;

    支持大多数子句如WITH, JOIN, AGGREGATE,DISTINCT, UNION ALL, ORDER BY, LIMIT(相关子查询除外);

    支持和Hive一样的数据类型,如STRING, TINYINT,SMALLINT, INT, BIGINT, FLOAT, DOUBLE, BOOLEAN, STRING, TIMESTAMP(需要注意Kudu和Parquet文件格式的限制);

    impala支持分区表、外部表;

    impala支持绝大部分SQL 92语句;

    查看各种上下文信息,可以使用SHOW XXX;如:

    SHOW DATABASES [[LIKE] 'pattern']
    SHOW SCHEMAS [[LIKE] 'pattern'] - an alias for SHOW DATABASES
    SHOW TABLES [IN database_name] [[LIKE] 'pattern']
    SHOW [AGGREGATE | ANALYTIC] FUNCTIONS [IN database_name] [[LIKE] 'pattern']
    SHOW CREATE TABLE [database_name].table_name
    SHOW CREATE VIEW [database_name].view_name
    SHOW TABLE STATS [database_name.]table_name
    SHOW COLUMN STATS [database_name.]table_name
    SHOW [RANGE] PARTITIONS [database_name.]table_name
    SHOW FILES IN [database_name.]table_name [PARTITION (key_col_expression [,
    key_col_expression]]
    SHOW ROLES
    SHOW CURRENT ROLES
    SHOW ROLE GRANT GROUP group_name
    SHOW GRANT ROLE role_name

    和RDBMS一样,支持统计信息收集COMPUTE [INCREMENTAL] STATS [db_name.]table_name(不同于Hive的ANALYZE TABLE需要分别为列和表收集统计信息,会一次性收集);增量统计信息收集适合于分区表(在ETL的最后应该收集统计信息)。下列语句可用来查看表的统计信息:

    [quickstart.cloudera:21000] > show table stats t1;
    Query: show table stats t1
    +-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------------------+
    | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                               |
    +-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------------------+
    | -1    | 7      | 512B | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/t1 |
    +-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------------------+
    Fetched 1 row(s) in 3.17s
    [quickstart.cloudera:21000] > show column stats t1;
    Query: show column stats t1
    +--------+------+------------------+--------+----------+----------+
    | Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
    +--------+------+------------------+--------+----------+----------+
    | x      | INT  | -1               | -1     | 4        | 4        |
    +--------+------+------------------+--------+----------+----------+
    Fetched 1 row(s) in 0.11s

    支持优化器提示(但是比较弱,要掌握优化器提示,必须先掌握执行计划)

    有好几种格式(有点乱):

    SELECT STRAIGHT_JOIN select_list FROM
    join_left_hand_table
    JOIN [{ /* +BROADCAST */ | /* +SHUFFLE */ }]
    join_right_hand_table
    remainder_of_query;
    
    INSERT insert_clauses
    [{ /* +SHUFFLE */ | /* +NOSHUFFLE */ }]
    [/* +CLUSTERED */]
    SELECT remainder_of_query;
    
    SELECT select_list FROM
    table_ref
    /* +{SCHEDULE_CACHE_LOCAL | SCHEDULE_DISK_LOCAL | SCHEDULE_REMOTE}
    [,RANDOM_REPLICA] */
    remainder_of_query;
    
    -- 最近的优化器提示支持下列格式:
    SELECT select_list FROM
    join_left_hand_table
    JOIN -- +BROADCAST|SHUFFLE
    join_right_hand_table
    remainder_of_query;
    
    INSERT insert_clauses
    /* +SHUFFLE|NOSHUFFLE */
    SELECT remainder_of_query;

    有针对Impala执行本身的、也有针对控制HDFS调度的。

    支持的各种hint可以参考Query Hints in Impala SELECT Statements

    数据移动操作,LOAD DATA语句可以用来将HDFS目录的数据移动到Impala数据目录(目前不支持从本地文件系统移动,真正的移动、不是复制)可以直接映射到HDFS文件,那什么时候需要这个操作??(其实和外部表性质类似,只不过移动后文件为Impala管辖,否则为HDFS管辖)。

    LOAD DATA INPATH 'hdfs_file_or_directory_path' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

     会话上下文设置:

     它原先是一个impala-shell命令,后来被扩展为SQL语句,主要用于JDBC和ODBC API中。

    不带任何参数可输出所有可设置的上下文选项列表,如下:

    [quickstart.cloudera:21000] > set
                                > ;
    Query options (defaults shown in []):
        ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
        ABORT_ON_ERROR: [0]
        ALLOW_UNSUPPORTED_FORMATS: [0]
        APPX_COUNT_DISTINCT: [0]
        BATCH_SIZE: [0]
        BUFFER_POOL_LIMIT: [0]
        COMPRESSION_CODEC: [NONE]
        DEBUG_ACTION: []
        DECIMAL_V2: [0]
        DEFAULT_JOIN_DISTRIBUTION_MODE: [0]
        DEFAULT_ORDER_BY_LIMIT: [-1]
        DEFAULT_SPILLABLE_BUFFER_SIZE: [2097152]
        DISABLE_CACHED_READS: [0]
        DISABLE_CODEGEN: [0]
        DISABLE_CODEGEN_ROWS_THRESHOLD: [50000]
        DISABLE_OUTERMOST_TOPN: [0]
        DISABLE_ROW_RUNTIME_FILTERING: [0]
        DISABLE_STREAMING_PREAGGREGATIONS: [0]
        DISABLE_UNSAFE_SPILLS: [0]
        ENABLE_EXPR_REWRITES: [1]
        EXEC_SINGLE_NODE_ROWS_THRESHOLD: [100]
        EXPLAIN_LEVEL: [1]
        HBASE_CACHE_BLOCKS: [0]
        HBASE_CACHING: [0]
        MAX_ERRORS: [100]
        MAX_IO_BUFFERS: [0]
        MAX_NUM_RUNTIME_FILTERS: [10]
        MAX_ROW_SIZE: [524288]
        MAX_SCAN_RANGE_LENGTH: [0]
        MEM_LIMIT: [0]
        MIN_SPILLABLE_BUFFER_SIZE: [65536]
        MT_DOP: [0]
        NUM_NODES: [0]
        NUM_SCANNER_THREADS: [0]
        OPTIMIZE_PARTITION_KEY_SCANS: [0]
        PARQUET_ANNOTATE_STRINGS_UTF8: [0]
        PARQUET_ARRAY_RESOLUTION: [2]
        PARQUET_DICTIONARY_FILTERING: [1]
        PARQUET_FALLBACK_SCHEMA_RESOLUTION: [0]
        PARQUET_FILE_SIZE: [0]
        PARQUET_READ_STATISTICS: [1]
        PREFETCH_MODE: [1]
        QUERY_TIMEOUT_S: [0]
        REPLICA_PREFERENCE: [0]
        REQUEST_POOL: []
        RESERVATION_REQUEST_TIMEOUT: [0]
        RM_INITIAL_MEM: [0]
        RUNTIME_BLOOM_FILTER_SIZE: [1048576]
        RUNTIME_FILTER_MAX_SIZE: [16777216]
        RUNTIME_FILTER_MIN_SIZE: [1048576]
        RUNTIME_FILTER_MODE: [2]
        RUNTIME_FILTER_WAIT_TIME_MS: [0]
        S3_SKIP_INSERT_STAGING: [1]
        SCAN_NODE_CODEGEN_THRESHOLD: [1800000]
        SCHEDULE_RANDOM_REPLICA: [0]
        SCRATCH_LIMIT: [-1]
        SEQ_COMPRESSION_MODE: [0]
        STRICT_MODE: [0]
        SUPPORT_START_OVER: [false]
        SYNC_DDL: [0]
        V_CPU_CORES: [0]
    
    Shell Options
        LIVE_PROGRESS: False
        LIVE_SUMMARY: False
    
    Variables:
        No variables defined.

    作为命令行参数值不需要带引号,否则需要引号。所有可用的选项可以参考“Query Options for the SET Statement”。

  • 相关阅读:
    滴滴日送400万红包,仅仅为人群不冷漠?
    C++提供的四种新式转换--const_cast dynamic_cast reinterpret_cast static_cast
    GreenDao开源ORM框架浅析
    Python 计数器
    Linux虚拟内存的添加
    Linux iptables
    Python set
    Python dict get items pop update
    Python contains
    Python reverse
  • 原文地址:https://www.cnblogs.com/zhjh256/p/10664450.html
Copyright © 2020-2023  润新知