• lightdb/postgresql中的事务回卷原理解析及避免


      在pg中,由于事务id采用32位实现,所以是采用循环复用的,如下:

      虽然最大支持4 billion个事务(32位无符号数,xid最大可以到40亿),但是新老事务相差2 billion是上限,当达到2 billion的时候,PostgreSQL就会强制关闭。 在此之前,pg会有一系列的提示(以下为默认值):

    • 当小于10 million的时候,日志中会告警。
    • 当小于1 million的时候,PostgreSQL 会进入只读状态。

      只要保证日常维护freeze,确保当前xid-oldestxid不超过20亿这个边界,是能够完全正常运行的。

      实际和冻结相关的参数如下:

    zjh@lt_test=# show %freez%;
                    name                 |  setting  |                                  description                                  
    -------------------------------------+-----------+-------------------------------------------------------------------------------
     autovacuum_freeze_max_age           | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
     autovacuum_multixact_freeze_max_age | 400000000 | Multixact age at which to autovacuum a table to prevent multixact wraparound.
     vacuum_freeze_min_age               | 50000000  | Minimum age at which VACUUM should freeze a table row.  --接近很新的记录不用处理,可以降低不必要的冻结
     vacuum_freeze_table_age             | 150000000 | Age at which VACUUM should scan whole table to freeze tuples. --因为正常可能会被关闭全局autovacuum,而是人工调度,所以这个参数是这个用途
     vacuum_multixact_freeze_min_age     | 5000000   | Minimum age at which VACUUM should freeze a MultiXactId in a table row.
     vacuum_multixact_freeze_table_age   | 150000000 | Multixact age at which VACUUM should scan whole table to freeze tuples.
    (6 rows)
    
    zjh@lt_test=#  select * from pg_xid_assignment();
     next_xid | oldest_xid | xid_vac_limit | xid_warn_limit | xid_stop_limit | xid_wrap_limit | oldest_xid_db 
    ----------+------------+---------------+----------------+----------------+----------------+---------------
     41119082 |        480 |     200000480 |     2136484127 |     2146484127 |     2147484127 |             1
    (1 row)
    zjh@lt_test=# select * from pg_xid_assignment();
      next_xid  | oldest_xid | xid_vac_limit | xid_warn_limit | xid_stop_limit | xid_wrap_limit | oldest_xid_db 
    ------------+------------+---------------+----------------+----------------+----------------+---------------
     2941154270 | 2891144668 |    3091144668 |      732661019 |      742661019 |      743661019 |         13604
    (1 row)
    -- 40 billion 取模之后,相减和没有回卷是一样的。
    zjh@lt_test=# SELECT datname,                   
     datfrozenxid,
     age(datfrozenxid),                                             
     (age(datfrozenxid)::numeric/1000000000*100)::numeric(4,2) as "% WRAPAROUND RISK"
    FROM pg_database ORDER BY 3 DESC;
      datname  | datfrozenxid |   age    | % WRAPAROUND RISK 
    -----------+--------------+----------+-------------------
     postgres  |   2891144668 | 50009950 |              5.00
     test      |   2891144737 | 50009881 |              5.00
     lt_test   |   2891144799 | 50009819 |              5.00
     template1 |   2941144860 |     9758 |              0.00
     template0 |   2941144921 |     9697 |              0.00
    (5 rows)

      xid_wrap_limit:发生回卷的上限,减去oldest_xid=2147483647。

      xid_stop_limit:进入只读状态的上线,减去oldest_xid=2146483647。也就是10 million。

      xid_warn_limit:log里面打告警,提示需要做freeze操作了。也就是1 million。

      下列查询可以每个库的已冻结事务id:

    zjh@lt_test=# SELECT datname,
     datfrozenxid,
     age(datfrozenxid),  // 32位整型
     (age(datfrozenxid)::numeric/1000000000*100)::numeric(4,2) as "% WRAPAROUND RISK"  /*这里的基数到底是10亿还是20亿?下同*/
    FROM pg_database ORDER BY 3 DESC;
      datname  | datfrozenxid |    age    | % WRAPAROUND RISK 
    -----------+--------------+-----------+-------------------
     postgres  |   2091130736 | 150012652 |             15.00
     test      |   2091130804 | 150012584 |             15.00
     lt_test   |   2091130862 | 150012526 |             15.00
     template1 |   2141130946 | 100012442 |             10.00
     template0 |   2141131007 | 100012381 |             10.00
    (5 rows)
    
    zjh@lt_test=# SELECT 
    zjh@lt_test-# c.oid::regclass as table_name,
    zjh@lt_test-# greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as "TXID age",
    zjh@lt_test-# (greatest(age(c.relfrozenxid),age(t.relfrozenxid))::numeric/1000000000*100)::numeric(4,2) as "% WRAPAROUND RISK"
    zjh@lt_test-# FROM pg_class c
    zjh@lt_test-# LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
    zjh@lt_test-# WHERE c.relkind IN ('r', 'm')
    zjh@lt_test-# ORDER BY 2 DESC;
                     table_name                 | TXID age  | % WRAPAROUND RISK 
    --------------------------------------------+-----------+-------------------
     utl_file.utl_file_dir                      | 150012644 |             15.00
     pg_statistic                               | 150012644 |             15.00
     pg_type                                    | 150012644 |             15.00
     pg_foreign_table                           | 150012644 |             15.00
     pg_authid                                  | 150012644 |             15.00
     pg_statistic_ext_data                      | 150012644 |             15.00

      通常是有几张表的冻结事务id特别小(因为vacuum正常不会做freeze,只有超过参数后才会做freeze,所以持续累计),导致整个库的事务冻结事务id无法推进、进而年龄越来越大直至超过为止。

      所以,当vacuum进程无法跟上数据库DML活动的速度时,当到达一个瓶颈时,就可能发生事务ID回卷(Transaction ID Wraparound),此时pg数据库会强制重启。原因是此时已经可以确定mvcc无法在保证正确性。通常来说事务ID回卷会在下列情况下可能会发生:

    • autovacuum被关闭
    • 长时间未提交事务,如OLAP
    • 复制槽保留了事务id
    • DML活动导致autovacuum操作被频繁取消

      1、3、4是比较容易出问题,需要注意。4要特别注意监控,不要该告警。

      当当前的事务id - 表冻结事务id>=autovacuum_freeze_max_age时,会触发防止事务ID回卷的激进vacuum。如下:

    2022-11-26 20:36:09.283575T,,,,,autovacuum worker,,00000,2022-11-26 20:35:59 CST,0,118166,LOG:  automatic aggressive vacuum to prevent wraparound of table "lt_test.pg_toast.pg_toast_100508": index scans: 0
        pages: 0 removed, 0 remain, 0 skipped due to pins, 0 skipped frozen
        tuples: 0 removed, 0 remain, 0 are dead but not yet removable, oldest xmin: 241134359
        buffer usage: 25 hits, 1 misses, 0 dirtied
        avg read rate: 108.507 MB/s, avg write rate: 0.000 MB/s
        system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
        WAL usage: 1 records, 0 full page images, 188 bytes

      这是一个非常消耗io和cpu的操作,务必要安排在低峰期调度。

    https://github.com/MasaoFujii/pg_cheat_funcs

    https://developer.aliyun.com/article/560513

    https://www.postgresql.org/docs/15/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

    https://www.percona.com/blog/overcoming-vacuum-wraparound/

  • 相关阅读:
    慢性肾炎患者的家庭养护
    中国居民五谷膳食指南
    第二单元丨对环境的察觉
    一个人是否靠谱,闭环很重要
    打赢营销胜仗
    治本修身──中醫內分泌調節養生法
    理想的激励
    卓越演讲:打动听众必知必会的策略和方法(原书第3版)
    每日一题_191008
    每日一题_191007
  • 原文地址:https://www.cnblogs.com/lightdb/p/16928126.html
Copyright © 2020-2023  润新知