• Hive桶列BucketedTables


    The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.

     In general, distributing rows based on the hash will give you a even distribution(均匀分布) in the buckets.

    set mapred.reduce.tasks = 3;

    set hive.enforce.bucketing = true;

    CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)

    COMMENT 'A bucketed copy of user_info'

    PARTITIONED BY(ds STRING)

    CLUSTERED BY(user_id) INTO 3 BUCKETS;

    INSERT into TABLE user_info_bucketed

    PARTITION (ds='2015-07-25')

    values

    (100,'python','postgresql'), (101,'python','postgresql'), (102,'python','postgresql'), (103,'python','postgresql'), (104,'python','postgresql'), (105,'python','postgresql'), (106,'python','postgresql'), (107,'python','postgresql'), (108,'python','postgresql'), (109,'python','postgresql'), (111,'python','postgresql'), (112,'python','postgresql'), (113,'python','postgresql'), (114,'python','postgresql'), (115,'python','postgresql'), (116,'python','postgresql'), (117,'python','postgresql'), (118,'python','postgresql'), (119,'python','postgresql'), (120,'python','postgresql'), (121,'python','postgresql'), (122,'python','postgresql'), (2000,'R','Oracle'), (2001,'R','Oracle'), (2002,'R','Oracle'), (2003,'R','Oracle'), (2004,'R','Oracle'), (2005,'R','Oracle'), (2006,'R','Oracle'), (2007,'R','Oracle'), (2008,'R','Oracle'), (2009,'R','Oracle'), (2010,'R','Oracle'), (2011,'R','Oracle'), (2012,'R','Oracle'), (2013,'R','Oracle'), (2014,'R','Oracle'), (2015,'R','Oracle'), (2016,'R','Oracle'), (2017,'R','Oracle'), (2018,'R','Oracle'), (2019,'R','Oracle'), (2020,'R','Oracle'), (2030,'R','Oracle'), (2040,'R','Oracle'), (2050,'R','Oracle');

    [spark01 ~]$ hadoop fs -ls -R /user/hive/warehouse/test.db/user_info_bucketed
    drwxrwxrwx   - huai supergroup          0 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0
    -rwxrwxrwx   3 huai supergroup        288 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0

    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0 |sort
    102pythonpostgresql
    105pythonpostgresql
    108pythonpostgresql
    111pythonpostgresql
    114pythonpostgresql
    117pythonpostgresql
    120pythonpostgresql
    2001ROracle
    2004ROracle
    2007ROracle
    2010ROracle
    2013ROracle
    2016ROracle
    2019ROracle
    2040ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0 |sort
    100pythonpostgresql
    103pythonpostgresql
    106pythonpostgresql
    109pythonpostgresql
    112pythonpostgresql
    115pythonpostgresql
    118pythonpostgresql
    121pythonpostgresql
    2002ROracle
    2005ROracle
    2008ROracle
    2011ROracle
    2014ROracle
    2017ROracle
    2020ROracle
    2050ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0 |sort
    101pythonpostgresql
    104pythonpostgresql
    107pythonpostgresql
    113pythonpostgresql
    116pythonpostgresql
    119pythonpostgresql
    122pythonpostgresql
    2000ROracle
    2003ROracle
    2006ROracle
    2009ROracle
    2012ROracle
    2015ROracle
    2018ROracle
    2030ROracle

  • 相关阅读:
    洛谷 P1903 【模板】分块/带修改莫队(数颜色)
    BZOJ 2038: [2009国家集训队]小Z的袜子(hose)
    LibreOJ #6208. 树上询问
    LibreOJ #6002. 「网络流 24 题」最小路径覆盖
    hdu 3861 The King’s Problem
    洛谷 P2868 [USACO07DEC]观光奶牛Sightseeing Cows
    洛谷 P2905 [USACO08OPEN]农场危机Crisis on the Farm
    洛谷 U3348 A2-回文数
    洛谷 P1001 A+B Problem
    LibreOJ #2130. 「NOI2015」软件包管理器
  • 原文地址:https://www.cnblogs.com/wwxbi/p/4662996.html
Copyright © 2020-2023  润新知