• lightdb copy from where\values多值加载时过滤数据


    copy可用于快速导入和导出数据,主要用途如下:

    • The COPY command moves data between PostgreSQL tables and standard file system files.
    • COPY TO copies the contents of the table to the file.
    • COPY TO can also copy the results of the SELECT query. That is, if the column list is specified, COPY TO only copies the data in the specified columns to the file.
    • The COPY command instructs the PostgreSQL server to read from or write to the file directly. Therefore, the file must be accessible to the PostgreSQL user.
    • COPY FROM copies the data from the file to the table.
    • When using the COPY FROM command, each field in the file is inserted sequentially to the specified column. Table columns not specified in the COPY FROM column list get their default values.
    • It is necessary to grant SELECT privilege on the table read by COPY TO, and the INSERT privilege in the table where the values are inserted with COPY FROM.
    • COPY TO can only be used with tables, not views. However, if we want to copy the contents of the view, we must feed the COPY command with the sql query.
    • Files named in the COPY command are read or written directly by the server, not by the client application. Therefore, it must be located on or accessible to the database server machine, not the client either.
    • We shouldn’t confuse COPY with \copy in psql. \copy calls COPY FROM STDIN or COPY TO STDOUT and then retrieves and stores the data from a file accessible by the psql client. Therefore, file accessibility and access rights depend on the client rather than the server when using \copy.

      copy from除了直接导入数据外,还能包含where子句进行数据过滤,如下:

    zjh@postgres=# create table big_table(id int, v varchar(100));
    CREATE TABLE
    
    zjh@postgres=# insert into big_table select i, i || '-' || i from generate_series(1,100000) i;
    INSERT 0 100000
    zjh@postgres=# exit
    [zjh@hs-10-20-30-193 ~]$ ltsql -p14000 postgres
    ltsql (13.3-22.2)
    Type "help" for help.
    
    zjh@postgres=# \copy ( select * from big_table) to '/home/zjh/big_table.csv' delimiter '|' csv header;
    COPY 100000
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
    COPY 99000
    zjh@postgres=# \timing on
    Timing is on.
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
    COPY 99000
    Time: 44.725 ms
    zjh@postgres=# 
    zjh@postgres=# \timing on
    Timing is on.
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
    COPY 99000
    Time: 45.757 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true);
    COPY 100000
    Time: 41.660 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true);
    COPY 100000
    Time: 41.146 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
    COPY 99000
    Time: 42.736 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
    COPY 99000
    Time: 42.753 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
    COPY 99000
    Time: 42.475 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v like '99%';
    COPY 1111
    Time: 28.995 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v not like '99%';
    COPY 98889
    Time: 43.442 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v not like '99%';
    COPY 98889
    Time: 40.352 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where lower(v) not like '99%';
    COPY 98889
    Time: 60.274 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where lower(v) not like '99%';   -- 不同函数对性能影响很大
    COPY 98889
    Time: 61.503 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where substr(v,1,10) not like '99%';  -- 不同函数对性能影响大
    COPY 98889
    Time: 53.131 ms
    zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where substr(v,1,10) not like '99%';
    COPY 98889
    Time: 49.655 ms
    zjh@postgres=# 

    pg_bulkload是没有该特性的。如下:

    其filter函数在记录不满足要求的时候返回null记录,而不是丢弃该记录,所以并不符合该语义。

    无论是copy还是pg_bulkload,相比oracle sql loader,都存在很大的不足,常见的特性包括如下:

    • 基于绝对位置的分隔符
    • pg_bulkload不支持过滤记录
    • pg_bulkload/copy都不支持对字段调用函数
    • 不支持replace
    • copy不支持parallel/direct等特性
    • 不支持指定列的默认值
    • 缓冲大小优化等

    lightdb 22.4中将完整的支持上述特性,通过ltldr提供。

    当前版本可以通过insert values多值实现该特性。如下:

    insert into big_table select * from (VALUES(1, 200000, 1.2), (2, 400000, 1.4)) AS v (depno, target, increase) where depno = 1;

    需要注意,pg不支持超过1000个绑定变量值,这极大地限制了insert values多值的价值,甚至不如jdbc batch的优化。 

    此时,可以考虑使用file_fdw+nfs实现,如下:

    http://www.light-pg.com/docs/lightdb/13.3-22.2/file-fdw.html

    http://www.light-pg.com/docs/lightdb/13.3-22.2/sql-values.html

    https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

    http://www.light-pg.com/docs/lightdb/13.3-22.2/sql-copy.html

    https://docs.oracle.com/en/database/oracle/oracle-database/19/sutil/oracle-sql-loader-field-list-contents.html#GUID-83FF6EDC-C7F8-4F29-8994-59153BE31924 # 应用SQL函数到字段

    https://www.depesz.com/2019/02/03/waiting-for-postgresql-12-allow-copy-from-to-filter-data-using-where-conditions/

    https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/moving-data-with-postgresql-copy-and-copy-commands/ba-p/1561266

  • 相关阅读:
    jquery 添加关键字小插件
    打印出所有每一位都与其他位不重复的自然数
    尾递归版,斐波那契数列
    如何在移动端宽度自适应实现正方型?
    css隐藏元素的六类13种方法
    如何给行内元素设置宽高?
    css实现垂直水平居中的方法
    pwa
    目录树生成工具treer
    服务端返回的json数据,导致前端报错的原因及解决方法
  • 原文地址:https://www.cnblogs.com/zhjh256/p/16705038.html
Copyright © 2020-2023  润新知