• Postgres的索引01


    一.PG 9.3有以下索引类型

    1.b-tree
    • 1.1支持前导模糊查询,如xxx%或者^'xxx'
    • 1.2忽略大小写字符前导模糊查询,如ILIKE 'XXX%'或者~*'^xxx'
    • 1.3支持常见的条件运算符< = <= = >= >
    2.hash
    • 仅支持=条件运算符
    3.gin
    • 支持多列值索引,例如数据类型,全文检索类型
    • <@ 被包含 array[1,2,3] <@ array[2,3,4]
    • @> 包含 array[1,2,3] @> array[2]
    • = 相等 array[1,2,3] = array[1,2,3]
    • && 相交 array[1,2,3]&& array[2]
    4.gist
    • 不是单类索引,算是一种索引框架,支持许多不同的索引策略,可以自定义条件运算符
    • 支持近邻排序,如取某一个点的10个近邻
    select * from places order by localtion <-> point '(101,456)' limit 10;
    
    • << -- 严格在左侧, 例如circle '((0,0),1)' << circle '((5,0),1)'
    • &< -- 表示左边的平面体不会扩展到超过右边的平面体的右边. 例如box '((0,0),(1,1))' &< box '((0,0),(2,2))'
    • &> -- 表示左边的平面体不会扩展到超过右边的平面体的左边. 例如box '((0,0),(3,3))' &> box '((0,0),(2,2))'
    • >> -- 严格在右
    • <<| -- 严格在下
    • &<| -- 不会扩展到超出上面
    • |&> -- 不会扩展到超出下面
    • |>> -- 严格在上
    • @> -- 包含
    • <@ -- 被包含
    • ~= -- 相同
    • && -- 相交

      http://www.postgresql.org/docs/9.3/static/functions-geometry.html

    5.sp-gist
    • 与gist类似,也是一张索引框架,支持基于磁盘存储的非平衡数据结构,如四叉树、k-d树、radix树
    • 支持操作符 << >> ~= <@
    • <^ 在下面,circle'((0,0),1)' <^ circle'((0,5),1) 左边的圆在右边的圆的下边
    • >^ 在上面,circle'((0,5),1)' 》^ circle'((0,0),1) 左边的圆在右边的圆的上边

    二.使用索引的好处

    1.利用索引进行排序减少CPU开销
    • 1.1 查询条件就是索引列
    postgres=# c db1
    You are now connected to database "db1" as user "yzw".
    db1=# create table test(id int,info text,crt_time timestamp);
    CREATE TABLE
    db1=# insert into test select generate_series(1,10000), md5(random()::text),clock_timestamp();
    INSERT 0 10000
    db1=# create index idx_test_1 on test(id);
    CREATE INDEX
    db1=# explain analyze select * from test where id<100 order by id;
                                                              QUERY PLAN                                                          
    ------------------------------------------------------------------------------------------------------------------------------
     Sort  (cost=396.80..405.13 rows=3333 width=44) (actual time=0.106..0.111 rows=99 loops=1)
       Sort Key: id
       Sort Method: quicksort  Memory: 32kB
       ->  Bitmap Heap Scan on test  (cost=66.12..201.78 rows=3333 width=44) (actual time=0.050..0.059 rows=99 loops=1)
             Recheck Cond: (id < 100)
             Heap Blocks: exact=1
             ->  Bitmap Index Scan on idx_test_1  (cost=0.00..65.28 rows=3333 width=0) (actual time=0.036..0.036 rows=99 loops=1)
                   Index Cond: (id < 100)
     Planning time: 0.520 ms
     Execution time: 0.178 ms
    (10 rows)
    
    • 1.2 查询条件不是索引列
    db1=# explain analyze select * from test where info='c969799412fed1c8f91eff5e65353a85' order by id;
                                                  QUERY PLAN                                               
    -------------------------------------------------------------------------------------------------------
     Sort  (cost=219.01..219.01 rows=1 width=45) (actual time=1.112..1.112 rows=1 loops=1)
       Sort Key: id
       Sort Method: quicksort  Memory: 25kB
       ->  Seq Scan on test  (cost=0.00..219.00 rows=1 width=45) (actual time=0.011..1.104 rows=1 loops=1)
             Filter: (info = 'c969799412fed1c8f91eff5e65353a85'::text)
             Rows Removed by Filter: 9999
     Planning time: 0.081 ms
     Execution time: 1.129 ms
    (8 rows)
    
    > 为何都有排序的节点Sort Key?
    
    # 关闭enable_seqscan全表扫描后,查询索引列没有了排序节点
    db1=# set enable_seqscan=off;
    SET
    db1=# explain analyze select * from test where id<100 order by id;
                                                          QUERY PLAN                                                      
    ----------------------------------------------------------------------------------------------------------------------
     Index Scan using idx_test_1 on test  (cost=0.29..10.04 rows=100 width=45) (actual time=0.005..0.016 rows=99 loops=1)
       Index Cond: (id < 100)
     Planning time: 0.119 ms
     Execution time: 0.034 ms
    (4 rows)
    

    enable_seqscan 9.4默认是on,9.3是off?

    2.加速带条件的查询,删除,更新
    • 2.1 正常开启全表扫描和索引扫描情况下,有索引的列查找走索引
    db1=# set enable_seqscan=on;
    SET
    db1=# explain analyze select * from test where id=1;
                                                        QUERY PLAN                                                    
    ------------------------------------------------------------------------------------------------------------------
     Index Scan using idx_test_1 on test  (cost=0.29..8.30 rows=1 width=45) (actual time=0.014..0.015 rows=1 loops=1)
       Index Cond: (id = 1)
     Planning time: 0.067 ms
     Execution time: 0.032 ms
    (4 rows)
    
    • 2.2在没有索引条件下的查询效率,即使有索引列也会走全表扫描
    db1=# show enable_indexscan;
     enable_indexscan 
    ------------------
     on
    (1 row)
    
    db1=# show enable_bitmapscan;
     enable_bitmapscan 
    -------------------
     on
    (1 row)
    
    db1=# set enable_indexscan=off,enable_bitmapscan=off;
    db1=# set enable_indexscan=off;set enable_bitmapscan=off;
    SET
    SET
    db1=# show enable_indexscan;show enable_bitmapscan;
     enable_indexscan 
    ------------------
     off
    (1 row)
    
     enable_bitmapscan 
    -------------------
     off
    (1 row)
    # 关闭索引后,变成全表扫描了
    db1=# explain analyze select * from test where id=1;
                                               QUERY PLAN                                            
    -------------------------------------------------------------------------------------------------
     Seq Scan on test  (cost=0.00..219.00 rows=1 width=45) (actual time=0.012..0.943 rows=1 loops=1)
       Filter: (id = 1)
       Rows Removed by Filter: 9999
     Planning time: 0.138 ms
     Execution time: 0.971 ms
    (5 rows)
    
    • 2.3 加速join操作
    db1=# set enable_indexscan=on;set enable_bitmapscan=on;
    SET
    SET
    db1=# insert into test1 select generate_series(1,10000), md5(random()::text),clock_timestamp();
    INSERT 0 10000
    

    test1表没有建索引,走全表扫描,test表走id索引,并且出现嵌套循环

    db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on(t1.id=t2.id and t2.id=1);
                                                            QUERY PLAN                                                         
    ---------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.29..227.31 rows=1 width=90) (actual time=0.032..0.896 rows=1 loops=1)
       ->  Index Scan using idx_test_1 on test t1  (cost=0.29..8.30 rows=1 width=45) (actual time=0.019..0.020 rows=1 loops=1)
             Index Cond: (id = 1)
       ->  Seq Scan on test1 t2  (cost=0.00..219.00 rows=1 width=45) (actual time=0.010..0.873 rows=1 loops=1)
             Filter: (id = 1)
             Rows Removed by Filter: 9999
     Planning time: 0.124 ms
     Execution time: 0.927 ms
    (8 rows)
    

    给test1表增加索引后,也走索引,test1表的索引数据在内存,因此速度更快

    db1=# create index idx_test1_id on test1(id);
    CREATE INDEX
    db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on(t1.id=t2.id and t2.id=1);
                                                              QUERY PLAN                                                          
    ------------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.57..16.62 rows=1 width=90) (actual time=0.033..0.034 rows=1 loops=1)
       ->  Index Scan using idx_test_1 on test t1  (cost=0.29..8.30 rows=1 width=45) (actual time=0.011..0.012 rows=1 loops=1)
             Index Cond: (id = 1)
       ->  Index Scan using idx_test1_id on test1 t2  (cost=0.29..8.30 rows=1 width=45) (actual time=0.020..0.020 rows=1 loops=1)
             Index Cond: (id = 1)
     Planning time: 0.240 ms
     Execution time: 0.059 ms
    (7 rows)
    

    merge join,两个join的表按照join列做好排序后,再进行join,也能用上索引,通常来说,能够使用merge join的地方,使用hash join更快

    db1=# show enable_hashjoin;
     enable_hashjoin 
    -----------------
     on
    (1 row)
    
    db1=# show enable_mergejoin;
     enable_mergejoin 
    ------------------
     on
    (1 row)
    # 关闭hashjoin
    set enable_hashjoin=off;
    db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on t1.id=t2.id;
                                                                   QUERY PLAN                                                               
    ----------------------------------------------------------------------------------------------------------------------------------------
     Merge Join  (cost=0.57..884.57 rows=10000 width=90) (actual time=0.020..10.837 rows=10000 loops=1)
       Merge Cond: (t1.id = t2.id)
       ->  Index Scan using idx_test_1 on test t1  (cost=0.29..367.29 rows=10000 width=45) (actual time=0.006..2.453 rows=10000 loops=1)
       ->  Index Scan using idx_test1_id on test1 t2  (cost=0.29..367.29 rows=10000 width=45) (actual time=0.006..3.625 rows=10000 loops=1)
     Planning time: 0.309 ms
     Execution time: 11.304 ms
    (6 rows)
    # 如果没有索引,效率最差,先全表扫描,然后排序,再join
    db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on t1.id=t2.id;
                                                           QUERY PLAN                                                        
    -------------------------------------------------------------------------------------------------------------------------
     Merge Join  (cost=1716.77..1916.77 rows=10000 width=90) (actual time=3.090..7.286 rows=10000 loops=1)
       Merge Cond: (t1.id = t2.id)
       ->  Sort  (cost=858.39..883.39 rows=10000 width=45) (actual time=1.571..2.007 rows=10000 loops=1)
             Sort Key: t1.id
             Sort Method: quicksort  Memory: 1166kB
             ->  Seq Scan on test t1  (cost=0.00..194.00 rows=10000 width=45) (actual time=0.005..0.789 rows=10000 loops=1)
       ->  Sort  (cost=858.39..883.39 rows=10000 width=45) (actual time=1.514..2.039 rows=10000 loops=1)
             Sort Key: t2.id
             Sort Method: quicksort  Memory: 1166kB
             ->  Seq Scan on test1 t2  (cost=0.00..194.00 rows=10000 width=45) (actual time=0.003..0.748 rows=10000 loops=1)
     Planning time: 0.171 ms
     Execution time: 7.614 ms
    (12 rows)
    # 自动使用hash join
    db1=# set enable_hashjoin=on;set enable_indexscan=on;set enable_bitmapscan=on;
    SET
    db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on t1.id=t2.id;
                                                           QUERY PLAN                                                        
    -------------------------------------------------------------------------------------------------------------------------
     Hash Join  (cost=319.00..763.00 rows=10000 width=90) (actual time=2.208..7.150 rows=10000 loops=1)
       Hash Cond: (t1.id = t2.id)
       ->  Seq Scan on test t1  (cost=0.00..194.00 rows=10000 width=45) (actual time=0.005..0.966 rows=10000 loops=1)
       ->  Hash  (cost=194.00..194.00 rows=10000 width=45) (actual time=2.160..2.160 rows=10000 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 782kB
             ->  Seq Scan on test1 t2  (cost=0.00..194.00 rows=10000 width=45) (actual time=0.003..0.959 rows=10000 loops=1)
     Planning time: 0.211 ms
     Execution time: 7.502 ms
    (8 rows)
    
    3.加速外键约束更新和删除操作
    create table p(id int primary key, info text, crt_time timestamp);
    create table f(id int primary key, p_id int references p(id) on delete cascade on update cascade, info text, crt_time timestamp);
    insert into p select generate_series(1,10000), md5(random()::text), clock_timestamp();
    insert into f select generate_series(1,10000), generate_series(1,10000), md5(random()::text), clock_timestamp();
    

    f表的p_id列未加索引情况下

    db1=# explain (analyze,verbose,costs,buffers,timing) update p set id=1 where id=0;
                                                           QUERY PLAN                                                       
    ------------------------------------------------------------------------------------------------------------------------
     Update on public.p  (cost=0.29..8.30 rows=1 width=47) (actual time=0.053..0.053 rows=0 loops=1)
       Buffers: shared hit=7
       ->  Index Scan using p_pkey on public.p  (cost=0.29..8.30 rows=1 width=47) (actual time=0.019..0.019 rows=1 loops=1)
             Output: 1, info, crt_time, ctid
             Index Cond: (p.id = 0)
             Buffers: shared hit=3
     Planning time: 0.068 ms
     Trigger RI_ConstraintTrigger_a_16424 for constraint f_p_id_fkey on p: time=1.225 calls=1 # p表上耗时长
     Trigger RI_ConstraintTrigger_c_16426 for constraint f_p_id_fkey on f: time=0.068 calls=1
     Execution time: 1.377 ms
    (10 rows)
    

    增加p表索引后

    create index idx_f_1 on f(p_id);
    db1=#  explain (analyze,verbose,costs,buffers,timing) update p set id=0 where id=1;
                                                           QUERY PLAN                                                       
    ------------------------------------------------------------------------------------------------------------------------
     Update on public.p  (cost=0.29..8.30 rows=1 width=47) (actual time=0.055..0.055 rows=0 loops=1)
       Buffers: shared hit=7
       ->  Index Scan using p_pkey on public.p  (cost=0.29..8.30 rows=1 width=47) (actual time=0.022..0.023 rows=1 loops=1)
             Output: 0, info, crt_time, ctid
             Index Cond: (p.id = 1)
             Buffers: shared hit=3
     Planning time: 0.079 ms
     Trigger RI_ConstraintTrigger_a_16424 for constraint f_p_id_fkey on p: time=0.132 calls=1 # p表耗时短
     Trigger RI_ConstraintTrigger_c_16426 for constraint f_p_id_fkey on f: time=0.085 calls=1
     Execution time: 0.307 ms
    (10 rows)
    
    4.索引在排他约束中的使用
    • 要求左右操作符互换对结果没有影响,例如x=y,y=x结果都是true或者unknown
    db1=# CREATE TABLE test2(id int,geo point,EXCLUDE USING btree (id WITH pg_catalog.=));
    CREATE TABLE
    db1=# insert into test2 (id) values (1);
    INSERT 0 1
    db1=# insert into test2 (id) values (1);
    ERROR:  conflicting key value violates exclusion constraint "test2_id_excl"
    DETAIL:  Key (id)=(1) conflicts with existing key (id)=(1).
    
    > 模拟unique
    
    5.加速唯一值约束、排他约束
    • 主键
    • 唯一键
    CREATE TABLE test3(id int,geo point,EXCLUDE USING spGIST (geo WITH pg_catalog.~=));
    select * from pg_indexes where tablename='test3';
    db1=# select * from pg_indexes where tablename='test3';
     schemaname | tablename |   indexname    | tablespace |                        indexdef                         
    ------------+-----------+----------------+------------+---------------------------------------------------------
     public     | test3     | test3_geo_excl |            | CREATE INDEX test3_geo_excl ON test3 USING spgist (geo)
    (1 row)
    

    三.索引的弊端

    • 随着表的记录块的变迁需要更新,因此会对这类操作带来一定的性能影响
    • 块不变更的情况下触发hot特性,可以不需要更新索引
    • 写多读少的场景,索引弊端可能大于其好处

    四.注意事项

    • 1.正常创建索引时,会阻断除查询意外的其他操作
    • 2.使用并行CONCURRENTLY选项后,可以允许同时对标的DML操作,但是对于频繁DML的表,这种创建索引的时间非常长
    • 3.某些索引不记录WAL,所以如果有利于WAL进行数据恢复的情况,如crash recovery,流复制,warm standby等,这类索引在使用前需要重建(HASH索引)
  • 相关阅读:
    c语言使用指针交换数值
    OD
    ADO Connection failure
    Javascript 日期 加减
    RAD C++Builder xe7 std::map xtree BUG
    c++ map
    c++ vector
    TDictionary 是delphi用的,c++builder用起来太吃力。
    datasnap 如何监控客户端的连接情况
    DataSnap高级技术(7)—TDSServerClass中Lifecycle生命周期三种属性说明
  • 原文地址:https://www.cnblogs.com/jenvid/p/10180534.html
Copyright © 2020-2023  润新知