• X 关于 BRIN 索引的实际测试


    网上的测试文章:
    https://blog.csdn.net/weixin_39540651/article/details/103770484

    关于 BRIN 索引的实际测试

    pg9.5增加的brin索引,存储了被索引字段在块级别的边界值(最大值、最小值)以及其他统计信息。
    BRIN索引的扫描原理很简单,扫描BRIN的元数据,根据元数据和用户输入的条件进行比较,过滤不符合条件的HEAP PAGE,只扫描需要扫描的HEAP PAGE。
    所以brin索引仅仅适合存储与值线性相关性很好的列。

    例子:
    我们创建两张表,一张表插入线性相关很好的数据,另一张表插入离散的数据来测试brin索引的性能。

    –建表插入数据:

    bill=# create table t_brin1(id int);
    CREATE TABLE
    bill=# create table t_brin2(id int);
    CREATE TABLE
    bill=# insert into t_brin1 select random()*1000000 from generate_series(1,10000000);
    INSERT 0 10000000
    bill=# insert into t_brin2 select generate_series(1,10000000);
    INSERT 0 10000000

    –在两张表上都创建brin索引

    bill=# create index idx_t_brin1 on t_brin1 using brin(id);
    CREATE INDEX
    bill=# create index idx_t_brin2 on t_brin2 using brin(id);
    CREATE INDEX

    –通过pg_stats表查看两张表索引列的线性相关性
    可以发现t_brin1表中数据比较离散,而t_brin2中数据线性相关性很好。

    bill=# select correlation from pg_stats where tablename='t_brin1';
    correlation
    ---------------
    -0.0061385944
    (1 row)

    bill=# select correlation from pg_stats where tablename='t_brin2';
    correlation
    -------------
    1
    (1 row)

    –测试性能

    t_brin1表:
    bill=# explain (analyze ,verbose,timing,costs,buffers) select * from t_brin1 where id between 1 and 100000;
    QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------
    Bitmap Heap Scan on bill.t_brin1 (cost=262.18..194509.84 rows=988836 width=4) (actual time=1.332..1244.478 rows=1000823 loops=1)
    Output: id
    Recheck Cond: ((t_brin1.id >= 1) AND (t_brin1.id <= 100000))
    Rows Removed by Index Recheck: 8999177
    Heap Blocks: lossy=44248
    Buffers: shared hit=16324 read=27932 written=31
    -> Bitmap Index Scan on idx_t_brin1 (cost=0.00..14.97 rows=9999977 width=0) (actual time=1.299..1.299 rows=442880 loops=1)
    Index Cond: ((t_brin1.id >= 1) AND (t_brin1.id <= 100000))
    Buffers: shared hit=8
    Planning Time: 0.083 ms
    Execution Time: 1302.285 ms
    (11 rows)


    t_brin2表:

    bill=# explain (analyze ,verbose,timing,costs,buffers) select * from t_brin2 where id between 1 and 100000;
    QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------
    Bitmap Heap Scan on bill.t_brin2 (cost=26.92..46009.03 rows=91581 width=4) (actual time=0.166..16.945 rows=100000 loops=1)
    Output: id
    Recheck Cond: ((t_brin2.id >= 1) AND (t_brin2.id <= 100000))
    Rows Removed by Index Recheck: 15712
    Heap Blocks: lossy=512
    Buffers: shared hit=1 read=513
    -> Bitmap Index Scan on idx_t_brin2 (cost=0.00..4.03 rows=115607 width=0) (actual time=0.142..0.142 rows=5120 loops=1)
    Index Cond: ((t_brin2.id >= 1) AND (t_brin2.id <= 100000))
    Buffers: shared hit=1 read=1
    Planning Time: 0.274 ms
    Execution Time: 22.391 ms
    (11 rows)


    可以发现t_brin2表上时间少了60多倍!其原因也很简单,因为t_brin1表上数据分散比较离散,所以使用brin索引扫描时必然要过滤掉大量的数据(t_brin1过滤了8999177条记录),自然性能就差很多了。这就好比我们用渔网去捕鱼,t_brin1的情况就类似我们收网上来一看渔网中鱼没几条,而大部分都是不需要的垃圾,而t_brin2的情况恰恰相反,一渔网拉上来全部是鱼。

    看到这你可能会感慨brin索引带来的性能提升十分巨大,但是不要急,我们还能继续优化上面的这个查询。

    BRIN索引有一个参数pages_per_range可以用来近一步提升brin索引的性能。这个参数用来决定每多少个数据块统计一次边界,例如pages_per_range默认值是128即表示每128个数据块统计一次边界。
    1、BRIN索引的精确度。pages_per_range=1,说明边界精确到1个数据块。pages_per_range越小,精度越高,过滤性就越好(注意过滤性越好取决于列的线性相关性很好的情况下,否则就是白瞎)。
    2、BRIN索引本身的大小。pages_per_range越小,BRIN索引本身就越大。BRIN越大,单次走BRIN索引扫描BRIN块的成本就越高。

    所以我们可以设置合适的pages_per_range来优化上面查询。

    bill=# drop index idx_t_brin2;
    DROP INDEX
    bill=# create index idx_t_brin2 on t_brin2 using brin(id) with (pages_per_range=16);
    CREATE INDEX
    bill=# explain (analyze ,verbose,timing,costs,buffers) select * from t_brin2 where id between 1 and 100000;
    QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------
    Bitmap Heap Scan on bill.t_brin2 (cost=34.50..44354.49 rows=82814 width=4) (actual time=0.662..14.559 rows=100000 loops=1)
    Output: id
    Recheck Cond: ((t_brin2.id >= 1) AND (t_brin2.id <= 100000))
    Rows Removed by Index Recheck: 1248
    Heap Blocks: lossy=448
    Buffers: shared hit=462
    -> Bitmap Index Scan on idx_t_brin2 (cost=0.00..13.79 rows=83153 width=0) (actual time=0.648..0.648 rows=4480 loops=1)
    Index Cond: ((t_brin2.id >= 1) AND (t_brin2.id <= 100000))
    Buffers: shared hit=14
    Planning Time: 0.151 ms
    Execution Time: 20.102 ms
    (11 rows)

    可以看到查询时间又近一步缩短了!

  • 相关阅读:
    VMware workstation安装linux(ubuntu)配置详解
    虚拟机的三种网络模式
    虚拟机(VMware Workstation)的使用方法
    零基础学习hadoop到上手工作线路指导(初级篇)
    Hadoop到底能做什么?怎么用hadoop?
    什么是云计算技术
    什么是云计算
    静态数据库迁移
    apache-mysql-php安装与配置
    CentOS6.6下的authpuppy源码安装与配置完全纯净版,内有问题解答
  • 原文地址:https://www.cnblogs.com/chendian0/p/13595294.html
Copyright © 2020-2023  润新知