• PostgreSQL中的not in、not exists、left join/is null


    哪种方法是找出在一张表中,而不在另一张表中的记录的最佳方法呢?

    SELECT  l.*
    FROM    t_left l
    LEFT JOIN
            t_right r
    ON      r.value = l.value
    WHERE   r.value IS NULL;
    

      

    SELECT  l.*
    FROM    t_left l
    WHERE   l.value NOT IN
            (
            SELECT  value
            FROM    t_right r
            );
    

      

    SELECT l.*
    FROM  t_left l
    WHERE  NOT EXISTS
        (
        SELECT NULL
        FROM  t_right r
        WHERE  r.value = l.value
        );
    

      

    我们先把环境准备一下:

    postgres 11.9

    CREATE TABLE t_left (
        id INT NOT NULL PRIMARY KEY,
        value INT NOT NULL,
        stuffing VARCHAR(200) NOT NULL
    );
    CREATE TABLE t_right (
        id INT NOT NULL PRIMARY KEY,
        value INT NOT NULL,
        stuffing VARCHAR(200) NOT NULL
    );
    

      

    CREATE INDEX ix_left_value ON t_left (value);
    CREATE INDEX ix_right_value ON t_right (value);
    

      

    INSERT
    INTO  t_left
    SELECT s, s % 10000, RPAD('Value '|| s || ' ', 200, '*')
    FROM generate_series(1, 100000) s;
    

      

    INSERT
    INTO  t_right
    SELECT s, s % 10000 + 1, RPAD('Value '|| s || ' ', 200, '*')
    FROM  generate_series(1, 1000000) s;
    

      

    ANALYZE t_left;
    ANALYZE t_right;
    

    表t_left包含100000行记录,其中有1000个不同的记录;

    只是测试

    表t_right包含100000行记录,其中有1000个不同的记录。

    t_left表有10条记录不存在与t_right表中。

    NOT IN

    先看看not in

    postgres=# SELECT  l.id, l.value
    postgres-# FROM    t_left l
    postgres-# WHERE   value NOT IN
    postgres-#         (
    postgres(#         SELECT  value
    postgres(#         FROM    t_right r
    postgres(#         );
       id   | value 
    --------+-------
      20000 |     0
      10000 |     0
      60000 |     0
      30000 |     0
      40000 |     0
      50000 |     0
      70000 |     0
      80000 |     0
      90000 |     0
     100000 |     0
    (10 rows)
    
    postgres=# explain analyze SELECT  l.id, l.value
    postgres-# FROM    t_left l
    postgres-# WHERE   value NOT IN
    postgres-#         (
    postgres(#         SELECT  value
    postgres(#         FROM    t_right r
    postgres(#         );
                                                                   QUERY PLAN                                                               
    ----------------------------------------------------------------------------------------------------------------------------------------
     Gather  (cost=1000.00..1520921531.00 rows=50000 width=8) (actual time=7329.726..40554.033 rows=10 loops=1)
       Workers Planned: 1
       Workers Launched: 1
       ->  Parallel Seq Scan on t_left l  (cost=0.00..1520915531.00 rows=29412 width=8) (actual time=5506.368..40405.539 rows=5 loops=2)
             Filter: (NOT (SubPlan 1))
             Rows Removed by Filter: 49995
             SubPlan 1
               ->  Materialize  (cost=0.00..49211.00 rows=1000000 width=4) (actual time=0.001..0.534 rows=4084 loops=100000)
                     ->  Seq Scan on t_right r  (cost=0.00..40304.00 rows=1000000 width=4) (actual time=0.002..0.575 rows=3739 loops=37433)
     Planning Time: 0.155 ms
     Execution Time: 40557.112 ms
    (11 rows)
    
    postgres=# 
    

    这里之所以先讨论not in,是因为其对待null的方式特殊。

    不幸的是,postgresql优化器不能利用到t_right.value被定义为not null的事实。因此,不可以返回null值。(即not in不能返回null值)

    这里可以做一个小测试:

    postgres=# create table aa(id int,age int);
    postgres=# insert into aa values(1,1);
    postgres=# insert into aa values(2,2);
    postgres=# insert into aa(id) values(3);
    postgres=# select * from aa where age not in (1,2);
     id | age 
    ----+-----
    (0 rows)
    
    postgres=# select * from aa where age not in (1);
     id | age 
    ----+-----
      2 |   2
    (1 row)
    
    postgres=# 
    

    这就是为什么postgresql对not in查询使用一个特殊的访问方法。

    NOT EXISTS

    postgres=# SELECT  l.id, l.value
    postgres-# FROM    t_left l
    postgres-# WHERE   NOT EXISTS
    postgres-#         (
    postgres(#         SELECT  select
    postgres(#         FROM    t_right r
    postgres(#         WHERE   r.value = l.value
    postgres(#         );
       id   | value 
    --------+-------
      10000 |     0
      30000 |     0
      20000 |     0
      40000 |     0
      60000 |     0
      50000 |     0
      80000 |     0
      70000 |     0
      90000 |     0
     100000 |     0
    (10 rows)
    
    postgres=# explain analyze SELECT  l.id, l.value
    postgres-# FROM    t_left l
    postgres-# WHERE   NOT EXISTS
    postgres-#         (
    postgres(#         SELECT  value
    postgres(#         FROM    t_right r
    postgres(#         WHERE   r.value = l.value
    postgres(#         );
                                                                        QUERY PLAN                                                                    
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Gather  (cost=1000.42..34410.22 rows=10 width=8) (actual time=31.555..262.307 rows=10 loops=1)
       Workers Planned: 1
       Workers Launched: 1
       ->  Nested Loop Anti Join  (cost=0.42..33409.22 rows=6 width=8) (actual time=43.314..257.792 rows=5 loops=2)
             ->  Parallel Seq Scan on t_left l  (cost=0.00..3619.24 rows=58824 width=8) (actual time=0.007..10.066 rows=50000 loops=2)
             ->  Index Only Scan using ix_right_value on t_right r  (cost=0.42..3.53 rows=100 width=4) (actual time=0.005..0.005 rows=1 loops=100000)
                   Index Cond: (value = l.value)
                   Heap Fetches: 99990
     Planning Time: 0.328 ms
     Execution Time: 262.348 ms
    (10 rows)
    
    postgres=# 
    

    LEFT JOIN / IS NULL

    postgres=# SELECT  l.id, l.value
    postgres-# FROM    t_left l
    postgres-# WHERE   NOT EXISTS
    postgres-#         (
    postgres(#         SELECT  value
    postgres(#         FROM    t_right r
    postgres(#         WHERE   r.value = l.value
    postgres(#         );
       id   | value 
    --------+-------
      80000 |     0
      10000 |     0
      20000 |     0
      30000 |     0
      40000 |     0
      50000 |     0
      60000 |     0
      70000 |     0
      90000 |     0
     100000 |     0
    (10 rows)
    
    postgres=# explain analyze SELECT  l.id, l.value
    postgres-# FROM    t_left l
    postgres-# WHERE   NOT EXISTS
    postgres-#         (
    postgres(#         SELECT  value
    postgres(#         FROM    t_right r
    postgres(#         WHERE   r.value = l.value
    postgres(#         );
                                                                        QUERY PLAN                                                                    
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Gather  (cost=1000.42..34410.22 rows=10 width=8) (actual time=32.109..259.553 rows=10 loops=1)
       Workers Planned: 1
       Workers Launched: 1
       ->  Nested Loop Anti Join  (cost=0.42..33409.22 rows=6 width=8) (actual time=43.744..255.254 rows=5 loops=2)
             ->  Parallel Seq Scan on t_left l  (cost=0.00..3619.24 rows=58824 width=8) (actual time=0.008..9.777 rows=50000 loops=2)
             ->  Index Only Scan using ix_right_value on t_right r  (cost=0.42..3.53 rows=100 width=4) (actual time=0.005..0.005 rows=1 loops=100000)
                   Index Cond: (value = l.value)
                   Heap Fetches: 99990
     Planning Time: 0.396 ms
     Execution Time: 259.608 ms
    (10 rows)
    
    postgres=# 
    

      

    postgresql按照相同的方式对待left join和not exists,使用相同的执行计划(nested loop anti join)。

    至于NOT IN,这在语义上是不同的, PostgreSQL试图考虑这一点,并限制自己对子计划使用过滤器。

  • 相关阅读:
    文件系统管理
    软件包管理
    用户和用户组管理
    权限管理
    漏洞验证系列--MongoDB未授权访问
    【Jenkins】三、设置定时任务
    在CentOS Linux 7.5上安装MySQL
    CentOS7使用yum时File contains no section headers.解决办法
    Windows批处理(cmd/bat)常用命令学习
    Fiddler抓包工具总结
  • 原文地址:https://www.cnblogs.com/abclife/p/14230697.html
Copyright © 2020-2023  润新知