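Today I read a colleague's post on replication slots and hot_standby_feedback and talked through with him the parts I had never fully understood. I am writing the discussion down here so that I do not get confused again:
Reference: my colleague's WeChat official account article (search for the account: PostgreSQL学徒):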
https://mp.weixin.qq.com/s?__biz=MzUyOTAyMzMyNg==&mid=2247483746&idx=1&sn=0dc832d9f3b65877f4605964e4fbc644&chksm=fa662953cd11a045616e38941042cbbd48368a10bb9deb01df68b05dda1e79d35a711a34b52b&mpshare=1&scene=1&srcid=&sharer_sharetime=1585038808502&sharer_shareid=79f55170ad37c5ec25af8a7b37a9222e&key=c967bf038ee4b907c263d306b687d31f07bc9627332581eb31df109c33a680f9a5172b11ce8bb69559902a4c921aa73a536664d3abfe62ebd908cbbc76438dc616889a04409c1697a53c24ee4ab2be39&ascene=1&uin=NTI1MzU3MjE1&devicetype=Windows+10&version=62080079&lang=zh_CN&exportkey=A34952GR4X0U8jQgP0qcFMY%3D&pass_ticket=tQftMthkFUZop2iYDlEAy1atZOVcMAZTZIJCzERsZPCg%2BFHLq0yl2IMTweDYZma%2B
I. When hot_standby_feedback is on
The gist:
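When hot_standby_feedback is on, the standby reports to the primary, at most once every wal_receiver_status_interval, the smallest xmin across the snapshots of all queries currently running on the standby. If the standby streams through a replication slot, the primary records this value in the xmin column of pg_replication_slots; without a slot it shows up as backend_xmin in pg_stat_replication.
After that, when the primary runs a DELETE immediately followed by VACUUM, the deleted tuples are protected from immediate cleanup (a dead tuple whose xmax >= OldestXmin is not removed).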
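The visibility check that vacuum applies to each tuple (HeapTupleSatisfiesVacuum in the PostgreSQL source) tests whether the tuple's xmax is greater than or equal to OldestXmin. xmax is the ID of the transaction that deleted the tuple. OldestXmin is computed by GetOldestXmin: it is the smallest transaction ID in the set made up of the IDs of all active transactions plus the xmin of every transaction. A deleting transaction whose ID is not below OldestXmin counts as "recent": other transactions may still need to read the old version for their queries, so the tuple cannot be physically removed, and the function returns HEAPTUPLE_RECENTLY_DEAD to keep it. In other words, if the transaction that produced the dead tuple (normally the xmax in the dead tuple's header) is greater than or equal to the smallest (backend_xmin, backend_xid) in the database at the time vacuum started, the dead tuple cannot be reclaimed.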
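In my own words:
Vacuum removes a deleted tuple only once every transaction and query that was active anywhere in the system (primary and standby alike) at the moment of the DELETE has finished.
That is, the tuple is removed only when the OldestXmin seen by vacuum is greater than the tuple's xmax; OldestXmin first has to advance past the ID of the deleting transaction. Any transaction or query whose snapshot starts after xmax already sees the result of the deleting transaction and treats the tuple as deleted, so physically removing it affects nobody.
So this logic is not specific to the standby; it applies to the primary in exactly the same way.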
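To make the comparison above concrete, here is a minimal sketch of the xmax test in Python. It is a toy model, not PostgreSQL's actual code: the function name vacuum_verdict and the string results are my own, it only handles deleters that have committed, and it ignores XID wraparound.

```python
from typing import Optional


def vacuum_verdict(xmax: Optional[int], oldest_xmin: int) -> str:
    """Toy version of the xmax test vacuum applies to one tuple.

    xmax is the ID of the committed transaction that deleted the tuple,
    or None if nothing deleted it. Aborted or in-progress deleters and
    XID wraparound are deliberately left out of this sketch.
    """
    if xmax is None:
        return "LIVE"
    if xmax >= oldest_xmin:
        # Some snapshot may still need the old version: keep the tuple.
        return "RECENTLY_DEAD"
    # Every snapshot already sees the delete: vacuum may reclaim it.
    return "DEAD"


# Sample values: a tuple deleted by transaction 106, checked against
# three different horizons.
for horizon in (100, 106, 107):
    print(horizon, vacuum_verdict(106, horizon))
```

The tuple only becomes reclaimable once the horizon has moved strictly past the deleting transaction, which is the "OldestXmin greater than xmax" condition described above.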
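An example:
1) The standby runs the query select * from test with snapshot 100..105, so the xmin reported to the primary is 100 (assuming this is the only query; with several queries it is the lowest xmin across all of them).
2) While the standby query is still running, the primary deletes every row of the table in transaction 106 and immediately runs vacuum.
# At this point every row of test on the primary has its xmax set to 106.
# While cleaning, vacuum compares each tuple's xmax with OldestXmin and keeps the tuple if xmax >= OldestXmin. Here 106 > 100, so the tuples of test are not removed, and the query running on the standby does not lose the rows it is reading.
# By the next vacuum, or a few vacuums later, the query has probably finished and OldestXmin has moved past the xmax of these tuples (106), so they are then removed.
The tuples are removed once all transactions and queries that were running when the rows were deleted have finished.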
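The example can be replayed with a small Python sketch of how the standby's feedback enters the primary's cleanup horizon. This is a simplified model under my own assumptions: primary_horizon and removable are hypothetical helpers, the value 107 for the next transaction ID is made up for illustration, and real PostgreSQL takes more inputs into account.

```python
from typing import Iterable, Optional


def primary_horizon(next_xid: int,
                    primary_xmins: Iterable[int],
                    standby_feedback_xmin: Optional[int]) -> int:
    """Oldest xid that some snapshot may still need (toy OldestXmin).

    next_xid: the next transaction ID the primary would assign.
    primary_xmins: xmin of each snapshot open on the primary.
    standby_feedback_xmin: xmin reported by the standby, or None when
    hot_standby_feedback is off.
    """
    candidates = [next_xid, *primary_xmins]
    if standby_feedback_xmin is not None:
        candidates.append(standby_feedback_xmin)
    return min(candidates)


def removable(xmax: int, horizon: int) -> bool:
    """A dead tuple may be reclaimed only if xmax is below the horizon."""
    return xmax < horizon


DELETE_XID = 106   # transaction that deleted all rows of test
NEXT_XID = 107     # assumed: nothing else is running on the primary

# Feedback on, standby query still running: horizon is held back at 100.
running = primary_horizon(NEXT_XID, [], 100)
# Feedback on, standby query finished: the reported xmin has advanced.
finished = primary_horizon(NEXT_XID, [], 107)
# Feedback off: the primary knows nothing about the standby query.
no_feedback = primary_horizon(NEXT_XID, [], None)

print(running, removable(DELETE_XID, running))
print(finished, removable(DELETE_XID, finished))
print(no_feedback, removable(DELETE_XID, no_feedback))
```

In the last case the rows become removable while the standby query may still be reading them, which is the situation that leads to a recovery conflict on the standby.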
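II. When hot_standby_feedback is off
1) The standby runs select * from test;
2) The primary deletes the rows of test and then runs vacuum;
3) The standby waits for roughly max_standby_streaming_delay and then cancels the standby query. As for why it is only "roughly", I will be lazy and quote my colleague's explanation and experiment:
When the standby runs a query, it can conflict with the WAL that is being replayed. With the default setting, the query is terminated if it has not finished within 30s. Note that 30s is not the maximum run time allowed for a single query on the standby;
it is the maximum time that WAL replay on the standby may be delayed, so a query on the standby can be terminated before it has run for that long. The parameter can be set to -1, in which case the WAL replay process waits for as long as it takes whenever it conflicts with a query running on the standby.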
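The difference between "maximum query run time" and "maximum replay delay" is easier to see with numbers. Below is a rough Python sketch, assuming the deadline is simply the time the pending WAL was received plus max_standby_streaming_delay; the function grace_seconds and its parameters are hypothetical names of mine, and the real server logic has more detail.

```python
import math


def grace_seconds(received_at: float, now: float, max_delay_ms: int) -> float:
    """Seconds a conflicting standby query may still run (toy model).

    received_at: when the WAL now waiting to be replayed arrived.
    now: current time on the standby, same clock as received_at.
    max_delay_ms: max_standby_streaming_delay in milliseconds; -1 means
    replay waits forever.
    """
    if max_delay_ms == -1:
        return math.inf
    deadline = received_at + max_delay_ms / 1000.0
    return max(0.0, deadline - now)


# Replay is caught up: the conflicting query gets the full 30 s.
print(grace_seconds(received_at=0.0, now=0.0, max_delay_ms=30000))
# Replay already waited 25 s on an earlier conflict: 5 s are left.
print(grace_seconds(received_at=0.0, now=25.0, max_delay_ms=30000))
# Replay is already more than 30 s behind: the query is cancelled at once.
print(grace_seconds(received_at=0.0, now=40.0, max_delay_ms=30000))
```

This is why a query can be cancelled long before it has run for 30 seconds: the budget belongs to WAL replay, not to the individual query.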
The experiment (load the data on the primary):
postgres=# insert into test values(generate_series(1,40000000));
INSERT 0 40000000
postgres=# analyze test;
ANALYZE
Run the following query on the standby:
postgres=# show hot_standby_feedback ;
hot_standby_feedback
----------------------
off
(1 row)
postgres=# select count(*) from test where id = 6666666;
(the query hangs here...)
On the primary, delete the row with id 6666666 and then run vacuum:
postgres=# delete from test where id = 6666666;
DELETE 2
postgres=# vacuum test;
VACUUM
Back on the standby, the query now fails:
postgres=# select count(*) from test where id = 6666666;
FATAL: terminating connection due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
Relevant parameters in detail:
hot_standby (boolean)
Specifies whether or not you can connect and run queries during recovery, as described in Section 26.5. The default value is on. This parameter can only be set at server start. It only has effect during archive recovery or in standby mode.
max_standby_archive_delay (integer)
When Hot Standby is active, this parameter determines how long the standby server should wait before canceling standby queries that conflict with about-to-be-applied WAL entries, as described in Section 26.5.2. max_standby_archive_delay applies when WAL data is being read from WAL archive (and is therefore not current). If this value is specified without units, it is taken as milliseconds. The default is 30 seconds. A value of -1 allows the standby to wait forever for conflicting queries to complete. This parameter can only be set in the postgresql.conf file or on the server command line.
Note that max_standby_archive_delay is not the same as the maximum length of time a query can run before cancellation; rather it is the maximum total time allowed to apply any one WAL segment's data. Thus, if one query has resulted in significant delay earlier in the WAL segment, subsequent conflicting queries will have much less grace time.
max_standby_streaming_delay (integer)
When Hot Standby is active, this parameter determines how long the standby server should wait before canceling standby queries that conflict with about-to-be-applied WAL entries, as described in Section 26.5.2. max_standby_streaming_delay applies when WAL data is being received via streaming replication. If this value is specified without units, it is taken as milliseconds. The default is 30 seconds. A value of -1 allows the standby to wait forever for conflicting queries to complete. This parameter can only be set in the postgresql.conf file or on the server command line.
Note that max_standby_streaming_delay is not the same as the maximum length of time a query can run before cancellation; rather it is the maximum total time allowed to apply WAL data once it has been received from the primary server. Thus, if one query has resulted in significant delay, subsequent conflicting queries will have much less grace time until the standby server has caught up again.
wal_receiver_status_interval (integer)
Specifies the minimum frequency for the WAL receiver process on the standby to send information about replication progress to the primary or upstream standby, where it can be seen using the pg_stat_replication view. The standby will report the last write-ahead log location it has written, the last position it has flushed to disk, and the last position it has applied. This parameter's value is the maximum amount of time between reports. Updates are sent each time the write or flush positions change, or at least as often as specified by this parameter. Thus, the apply position may lag slightly behind the true position. If this value is specified without units, it is taken as seconds. The default value is 10 seconds. Setting this parameter to zero disables status updates completely. This parameter can only be set in the postgresql.conf file or on the server command line.
hot_standby_feedback (boolean)
Specifies whether or not a hot standby will send feedback to the primary or upstream standby about queries currently executing on the standby. This parameter can be used to eliminate query cancels caused by cleanup records, but can cause database bloat on the primary for some workloads. Feedback messages will not be sent more frequently than once per wal_receiver_status_interval. The default value is off. This parameter can only be set in the postgresql.conf file or on the server command line.
If cascaded replication is in use the feedback is passed upstream until it eventually reaches the primary. Standbys make no other use of feedback they receive other than to pass upstream.
This setting does not override the behavior of old_snapshot_threshold on the primary; a snapshot on the standby which exceeds the primary's age threshold can become invalid, resulting in cancellation of transactions on the standby. This is because old_snapshot_threshold is intended to provide an absolute limit on the time which dead rows can contribute to bloat, which would otherwise be violated because of the configuration of a standby.
wal_receiver_timeout (integer)
Terminate replication connections that are inactive for longer than this amount of time. This is useful for the receiving standby server to detect a primary node crash or network outage. If this value is specified without units, it is taken as milliseconds. The default value is 60 seconds. A value of zero disables the timeout mechanism. This parameter can only be set in the postgresql.conf file or on the server command line.
wal_retrieve_retry_interval (integer)
Specifies how long the standby server should wait when WAL data is not available from any sources (streaming replication, local pg_wal or WAL archive) before trying again to retrieve WAL data. If this value is specified without units, it is taken as milliseconds. The default value is 5 seconds. This parameter can only be set in the postgresql.conf file or on the server command line.
This parameter is useful in configurations where a node in recovery needs to control the amount of time to wait for new WAL data to be available. For example, in archive recovery, it is possible to make the recovery more responsive in the detection of a new WAL log file by reducing the value of this parameter. On a system with low WAL activity, increasing it reduces the amount of requests necessary to access WAL archives, something useful for example in cloud environments where the amount of times an infrastructure is accessed is taken into account.
recovery_min_apply_delay (integer)
By default, a standby server restores WAL records from the sending server as soon as possible. It may be useful to have a time-delayed copy of the data, offering opportunities to correct data loss errors. This parameter allows you to delay recovery by a specified amount of time. For example, if you set this parameter to 5min, the standby will replay each transaction commit only when the system time on the standby is at least five minutes past the commit time reported by the master. If this value is specified without units, it is taken as milliseconds. The default is zero, adding no delay.
It is possible that the replication delay between servers exceeds the value of this parameter, in which case no delay is added. Note that the delay is calculated between the WAL time stamp as written on master and the current time on the standby. Delays in transfer because of network lag or cascading replication configurations may reduce the actual wait time significantly. If the system clocks on master and standby are not synchronized, this may lead to recovery applying records earlier than expected; but that is not a major issue because useful settings of this parameter are much larger than typical time deviations between servers.
The delay occurs only on WAL records for transaction commits. Other records are replayed as quickly as possible, which is not a problem because MVCC visibility rules ensure their effects are not visible until the corresponding commit record is applied.
The delay occurs once the database in recovery has reached a consistent state, until the standby is promoted or triggered. After that the standby will end recovery without further waiting.
This parameter is intended for use with streaming replication deployments; however, if the parameter is specified it will be honored in all cases except crash recovery. hot_standby_feedback will be delayed by use of this feature which could lead to bloat on the master; use both together with care.