• Greeplum 系列(四) 实战


    Greeplum 系列(四) 实战

    表结构

    (1) 拉链表结构

    create table public.member_fatdt0 (
        member_id varchar(64),
        phoneno varchar(64),
        dw_beg_date date,
        dw_end_date date,
        dtype char(1),
        dw_status char(1),
        dw_ins_date date
    ) with (appendonly=true, compresslevel=5)
    distributed by(member_id)
    partition by range (dw_end_date)
    (
        partition p20111201 start (date '2011-12-01') inclusive,
        partition p20111202 start (date '2011-12-02') inclusive,
        partition p20111203 start (date '2011-12-03') inclusive,
        partition p20111204 start (date '2011-12-04') inclusive,
        partition p20111205 start (date '2011-12-05') inclusive,
        partition p20111206 start (date '2011-12-06') inclusive,
        partition p20111207 start (date '2011-12-07') inclusive,
        partition p30001231 start (date '3000-12-31') inclusive
        end (date '3001-01-01') exclusive
    );
    

    (2) 增量表结构

    create table public.member_delta (
        member_id varchar(64),
        phoneno varchar(64),
        action char(1),
        dw_ins_date date
    ) with (appendonly=true, compresslevel=5)
    distributed by(member_id);
    

    (3) 临时表结构

    create table public.member_tmp0 (
        member_id varchar(64),
        phoneno varchar(64),
        dw_beg_date date,
        dw_end_date date,
        dtype char(1),
        dw_status char(1),
        dw_ins_date date
    ) with (appendonly=true, compresslevel=5)
    distributed by(member_id)
    partition by list (dtype)
    (
        partition phis values('H'),
        partition pcur values('C'),
        default partition other
    );
    

    (4) 临时表结构

    create table public.member_tmp1 (
        member_id varchar(64),
        phoneno varchar(64),
        dw_beg_date date,
        dw_end_date date,
        dtype char(1),
        dw_status char(1),
        dw_ins_date date
    ) with (appendonly=true, compresslevel=5)
    distributed by(member_id);
    

    数据导入

    1. insert

    insert into public.member_delta values('mem006', '13100000006', 'I', date'2011-12-03');
    insert into public.member_delta values('mem002', '13100000002', 'D', date'2011-12-03');
    insert into public.member_delta values('mem003', '13100000003', 'U', date'2011-12-03');
    

    2. insert

    mem001,13100000001,2011-12-01,3000-12-31,C,I,2011-12-01
    mem002,13100000002,2011-12-01,3000-12-31,C,I,2011-12-01
    mem003,13100000003,2011-12-01,3000-12-31,C,I,2011-12-01
    mem004,13100000004,2011-12-01,3000-12-31,C,I,2011-12-01
    mem005,13100000005,2011-12-01,3000-12-31,C,I,2011-12-01
    
    copy public.member_fatdt0_1_prt_p30001231 from '/home/gpadmin/member_his_init.dat' with delimiter ',';
    

    3. 外部表

    首先,启动 gpfdist 服务

    nohup gpfdist -d /home/gpadmin/data -p 8888 -l /home/gpadmin/data/gpfdist.log &
    

    其次,创建外部表

    drop external table if exists public.member_ext;
    create external table public.member_ext (
        member_id varchar(64),
        phoneno varchar(64),
        action char(1),
        dw_ins_date date
    ) 
    location ('gpfdist://localhost:8888/member_delta.dat')
    format 'text' (delimiter ',' null as '' escape 'off')
    encoding 'utf-8'
    log errors into member_err segment reject limit 2 rows;
    

    再编辑 /home/gpadmin/data/member_delta.dat 文件

    mem001,13100000001,I,2011-12-01
    

    最后,将外部表的数据加载到内部表

    insert into public.member_delta select * from public.member_ext;
    

    4. gpload

    gpload 是对外部表的一层封装,首先编写 gpload 控制文件 gpload.yml,代码如下:

    ---  
    VERSION: 1.0.0.1  
    DATABASE: test  
    USER: gpadmin  
    HOST: localhost 
    PORT: 5432  
    GPLOAD:  
      INPUT:  
        - SOURCE:  
            LOCAL_HOSTNAME:  
              - master  
            PORT: 8888  
            FILE:  
              - /home/gpadmin/data/member_delta.dat
        - COLUMNS:  
            - member_id: varchar(64),
            - phoneno: varchar(20),
            - action: char(1),
            - dw_ins_date: date
        - FORMAT: text  
        - DELIMITER: ','  
        - ERROR_LIMIT: 2
        - ERROR_TABLE: public.member_err 
      OUTPUT:  
        - TABLE: public.member_delta  
        - MODE: INSERT  
      SQL:
        - BEFORE: "truncate table public.member_delta"
        - AFTER: "analyze public.member_delta"
    

    执行 gpload 命令,结果如下:

    gpload -f gpload.yml
    
    2018-05-18 20:53:10|INFO|gpload session started 2018-05-18 20:53:10
    2018-05-18 20:53:10|INFO|started gpfdist -p 8888 -P 8889 -f "/home/gpadmin/data/member_delta.dat" -t 30
    2018-05-18 20:53:11|INFO|running time: 0.85 seconds
    2018-05-18 20:53:11|INFO|rows Inserted          = 1
    2018-05-18 20:53:11|INFO|rows Updated           = 0
    2018-05-18 20:53:11|INFO|data formatting errors = 0
    2018-05-18 20:53:11|INFO|gpload succeeded
    

    5. 可执行的外部表

    drop external table if exists public.member_ext;
    create external web table public.member_ext (
        member_id varchar(64),
        phoneno varchar(64),
        action char(1),
        dw_ins_date date
    ) 
    execute 'cat /home/gpadmin/data/member_delta.dat' on master
    format 'text' (delimiter ',' null as '' escape 'off')
    encoding 'utf-8';
    

    create table public.test_partition_list (
    member_id numeric,
    city character varying(32)
    ) distributed by(member_id)
    partition by list(city)
    (
    partition guangzhou values('guangzhou'),
    partition hangzhou values('hangzhou'),
    partition shanghai values('shanghai'),
    partition beijing values('beijing'),
    default partition other_city
    );


    每天用心记录一点点。内容也许不重要,但习惯很重要!

  • 相关阅读:
    mysql 数据库的简单操作 2
    mysql简单操作,增删查改.
    当mysq启动时出现错误1067时应如何解决
    JS中 逻辑或 || 逻辑与 && 的使用方法总结
    编写一段程序,运行时向用户提问“你考了多少分?(0~100)”,接受输入后判断其等级并显示出来。判断依据如下:等级={优 (90~100分);良 (80~89分);中 (60~69分);差 (0~59分);}
    IF的使用
    第一个输出程序 Console.WriteLine
    Day2_and_Day3 文件操作
    linux安装VLAN,系统怎么划分VLAN打标签上交换机
    Python3.5+selenium(11)脚本模块化&参数化
  • 原文地址:https://www.cnblogs.com/binarylei/p/9069419.html
Copyright © 2020-2023  润新知