• greenplum 5.7 + create table + insert into


    os: centos 7.4
    gp: gpdb-5.7.0

    三台机器
    node1 为master host
    node2、node3为segment host

    psql 登录 node1 master

    $ psql -d peiybdb
    peiybdb=# select current_database();
     current_database 
    ------------------
     peiybdb
    (1 row)
    
    create table tmp_t0(
    c1 varchar(100),
    c2 varchar(100),
    c3 varchar(100)
    );
    NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'c1' as the Greenplum Database data distribution key for this table.
    HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
    CREATE TABLE
    
    peiybdb=# d+ tmp_t0
                            Table "public.tmp_t0"
     Column |          Type          | Modifiers | Storage  | Description 
    --------+------------------------+-----------+----------+-------------
     c1     | character varying(100) |           | extended | 
     c2     | character varying(100) |           | extended | 
     c3     | character varying(100) |           | extended | 
    Has OIDs: no
    Distributed by: (c1)
    

    查看node1、node2、node3的文件目录大小

    [gpadmin@node1 gpseg-1]$ pwd
    /u01/greenplum-data/gpseg-1
    [gpadmin@node1 gpseg-1]$ du -sh
    109M    .
    
    [gpadmin@node2 gpseg0]$ pwd
    /u01/greenplum-data/gpseg0
    [gpadmin@node2 gpseg0]$ du -sh
    109M    .
    
    [gpadmin@node3 gpseg1]$ pwd
    /u01/greenplum-data/gpseg1
    [gpadmin@node3 gpseg1]$ du -sh
    109M    .

    插入数据

    peiybdb=# select gp_opt_version();
                    gp_opt_version                 
    -----------------------------------------------
     GPOPT version: 2.55.20, Xerces version: 3.1.2
    (1 row)
    
    
    set optimizer=on;
    set optimizer_enumerate_plans=on;
    set optimizer_minidump=always;
    set optimizer_enable_constant_expression_evaluation=off;
    set client_min_messages='debug5';
    
    
    insert into tmp_t0
    (c1,c2,c3)
    select generate_series(1, 1000000) as c1,
           md5(random()::text) as c2 ,
           md5(random()::text) as c3
    ;
    

    node1 master上查询pg_stat_activity

    peiybdb=# x
    Expanded display is on.
    peiybdb=# select * from pg_stat_activity;
    -[ RECORD 1 ]----+-------------------------------------------
    datid            | 16384
    datname          | peiybdb
    procpid          | 10904
    sess_id          | 20
    usesysid         | 10
    usename          | gpadmin
    current_query    | insert into tmp_t0
                     | (c1,c2,c3)
                     | select generate_series(1, 1000000) as c1, 
                     |        md5(random()::text) as c2 , 
                     |        md5(random()::text) as c3 
                     | 
    waiting          | f
    query_start      | 2018-05-02 07:20:33.325637+00
    backend_start    | 2018-05-02 07:13:17.573451+00
    client_addr      | 192.168.56.1
    client_port      | 63217
    application_name | 
    xact_start       | 2018-05-02 07:20:33.3212+00
    waiting_reason   | 
    rsgid            | 0
    rsgname          | 
    rsgqueueduration | 
    -[ RECORD 2 ]----+-------------------------------------------
    datid            | 16384
    datname          | peiybdb
    procpid          | 10906
    sess_id          | 21
    usesysid         | 10
    usename          | gpadmin
    current_query    | <IDLE>
    waiting          | f
    query_start      | 2018-05-02 07:17:17.675179+00
    backend_start    | 2018-05-02 07:13:17.580506+00
    client_addr      | 192.168.56.1
    client_port      | 63218
    application_name | 
    xact_start       | 
    waiting_reason   | 
    rsgid            | 0
    rsgname          | 
    rsgqueueduration | 
    -[ RECORD 3 ]----+-------------------------------------------
    datid            | 16384
    datname          | peiybdb
    procpid          | 11095
    sess_id          | 22
    usesysid         | 10
    usename          | gpadmin
    current_query    | select * from pg_stat_activity;
    waiting          | f
    query_start      | 2018-05-02 07:28:07.30111+00
    backend_start    | 2018-05-02 07:27:54.501407+00
    client_addr      | 
    client_port      | -1
    application_name | psql
    xact_start       | 2018-05-02 07:28:07.30111+00
    waiting_reason   | 
    rsgid            | 0
    rsgname          | 
    rsgqueueduration | 

    再次查看node1、node2、node3的文件目录大小

    [gpadmin@node1 gpseg-1]$ du -sh
    109M    .
    
    [gpadmin@node2 gpseg0]$ du -sh
    224M    .
    
    [gpadmin@node3 gpseg1]$ du -sh
    224M    .

    可以观察到 master 节点node1的数据文件大小并没有发生变化,segment的node2、node3的数据文件增长了不少。
    主要就是由于 master 节点是用来存储定义,segment是用来存储数据的。

    查看tmp_t0的定义

    peiybdb=# 
    peiybdb=# d
                  List of relations
     Schema |  Name  | Type  |  Owner  | Storage 
    --------+--------+-------+---------+---------
     public | tmp_t0 | table | gpadmin | heap
    (1 row)
    
    peiybdb=# d+ tmp_t0;
                            Table "public.tmp_t0"
     Column |          Type          | Modifiers | Storage  | Description 
    --------+------------------------+-----------+----------+-------------
     c1     | character varying(100) |           | extended | 
     c2     | character varying(100) |           | extended | 
     c3     | character varying(100) |           | extended | 
    Has OIDs: no
    Distributed by: (c1)

    Distributed by 这个就是tmp_t0表的分布列,表的分布列一定要合理,能够降数据比较均匀的分布到各个segment节点上。
    检索数据时能够在多个节点并发处理数据。

  • 相关阅读:
    python3.4+pyspider爬58同城(二)
    pyspider安装后,点击run,报pyhton has stop working或python已停止运行的错误
    PermissionError: [WinError 32] 另一个程序正在使用此文件,进程无法访问。: 'C:\Users\video\AppData\Local\Temp\tmpfipzk8ma'--问题解决
    使用firefoxprofile,selenium设置firefox,初始化firefox
    排序算法讲解
    Java寫聊天小程序
    csproj项目工程文件的脚本/动态链接库设置
    常见的内存加密防破解及安全方案
    Animator直接引用FBX下的AnimClip与直接引用单独的AnimClip的对比
    Jupyter多内核的手动配置(Python多版本)
  • 原文地址:https://www.cnblogs.com/ctypyb2002/p/9792995.html
Copyright © 2020-2023  润新知