os: centos 7.4
postgresql: 10.4
pg_waldump 是 postgres 10.x 之后的命令,对应之前版本的 pg_xlogdump。
postgresql 从 10 版本开始,将所用xlog相关的全部用wal替换了,同时大版本的命名规则也做了调整。
实际工作总用户难免会误操作,当寻求用户误操作时间时,基本只能得到一个大概的时间范围。
但是通过pg_waldump可以定位到具体的xid或者lsn,然后使用recovery.conf 设置 recovery_target_inclusive = false 进行pitr异机恢复。
具体如下:
# vi recovery.conf
standby_mode = on
restore_command = 'cp /mnt/walbackup/%f %p'
recovery_target_inclusive = false
#大概恢复
#recovery_target_time = '2017-12-21 10:00:00 +08'
#精确恢复
recovery_target_xid = '1485020'
pause_at_recovery_target = true
pg_waldump –help
$ pg_waldump --help
pg_waldump decodes and displays PostgreSQL write-ahead logs for debugging.
Usage:
pg_waldump [OPTION]... [STARTSEG [ENDSEG]]
Options:
-b, --bkp-details output detailed information about backup blocks
-e, --end=RECPTR stop reading at WAL location RECPTR
-f, --follow keep retrying after reaching end of WAL
-n, --limit=N number of records to display
-p, --path=PATH directory in which to find log segment files or a
directory with a ./pg_wal that contains such files
(default: current directory, ./pg_wal, $PGDATA/pg_wal)
-r, --rmgr=RMGR only show records generated by resource manager RMGR;
use --rmgr=list to list valid resource manager names
-s, --start=RECPTR start reading at WAL location RECPTR
-t, --timeline=TLI timeline from which to read log records
(default: 1 or the value used in STARTSEG)
-V, --version output version information, then exit
-x, --xid=XID only show records with transaction ID XID
-z, --stats[=record] show statistics instead of records
(optionally, show per-record statistics)
-?, --help show this help, then exit
分析
select pg_current_wal_lsn(),
pg_walfile_name(pg_current_wal_lsn()),
pg_walfile_name_offset(pg_current_wal_lsn());
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+------------------------------------
0/16A9330 | 000000010000000000000001 | (000000010000000000000001,6984496)
(1 row)
postgres=# create database peiybdb;
postgres=# create table tmp_t0(c1 varchar(100),c2 varchar(100));
postgres=# insert into tmp_t0(c1,c2) values('123','456');
select pg_current_wal_lsn(),
pg_walfile_name(pg_current_wal_lsn()),
pg_walfile_name_offset(pg_current_wal_lsn());
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+------------------------------------
0/16C16D8 | 000000010000000000000001 | (000000010000000000000001,7083736)
(1 row)
使用pg_waldump观察输出
$ pg_waldump -s 0/16A9330 $PGDATA/pg_wal/000000010000000000000001
挑选了一些关键信息如下
rmgr: Database len (rec/tot): 42/ 42, tx: 559, lsn: 0/016A9B50, prev 0/016A9AE0, desc: CREATE copy dir 1/1663 to 16388/1663
rmgr: Storage len (rec/tot): 42/ 42, tx: 0, lsn: 0/016A9CA8, prev 0/016A9C70, desc: CREATE base/13806/16389
rmgr: Heap len (rec/tot): 63/ 63, tx: 561, lsn: 0/016C1638, prev 0/016C1600, desc: INSERT+INIT off 1, blkref #0: rel 1663/13806/16389 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 561, lsn: 0/016C1678, prev 0/016C1638, desc: COMMIT 2018-06-10 10:46:52.926390 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/016C16A0, prev 0/016C1678, desc: RUNNING_XACTS nextXid 562 latestCompletedXid 561 oldestRunningXid 562
再次插入一行
select pg_current_wal_lsn(),
pg_walfile_name(pg_current_wal_lsn()),
pg_walfile_name_offset(pg_current_wal_lsn());
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+------------------------------------
0/16C17B8 | 000000010000000000000001 | (000000010000000000000001,7083960)
(1 row)
insert into tmp_t0(c1,c2) values('111','222');
select pg_current_wal_lsn(),
pg_walfile_name(pg_current_wal_lsn()),
pg_walfile_name_offset(pg_current_wal_lsn());
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+------------------------------------
0/16C1878 | 000000010000000000000001 | (000000010000000000000001,7084152)
(1 row)
再次 pg_waldump
$ pg_waldump -s 0/16C17B8 $PGDATA/pg_wal/000000010000000000000001
挑选了一些关键信息如下
rmgr: Heap len (rec/tot): 54/ 150, tx: 562, lsn: 0/016C17B8, prev 0/016C1780, desc: INSERT off 2, blkref #0: rel 1663/13806/16389 blk 0 FPW
rmgr: Transaction len (rec/tot): 34/ 34, tx: 562, lsn: 0/016C1850, prev 0/016C17B8, desc: COMMIT 2018-06-10 10:54:03.267369 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/016C1878, prev 0/016C1850, desc: RUNNING_XACTS nextXid 563 latestCompletedXid 562 oldestRunningXid 563
插入两行
select pg_current_wal_lsn(),
pg_walfile_name(pg_current_wal_lsn()),
pg_walfile_name_offset(pg_current_wal_lsn());
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+------------------------------------
0/16C1AB8 | 000000010000000000000001 | (000000010000000000000001,7084728)
(1 row)
insert into tmp_t0(c1,c2) values('111','222'),('112','333');
select pg_current_wal_lsn(),
pg_walfile_name(pg_current_wal_lsn()),
pg_walfile_name_offset(pg_current_wal_lsn());
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+------------------------------------
0/16C1BD0 | 000000010000000000000001 | (000000010000000000000001,7085008)
(1 row)
$ pg_waldump -s 0/16C1AB8 $PGDATA/pg_wal/000000010000000000000001
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/016C1AB8, prev 0/016C1A90, desc: RUNNING_XACTS nextXid 564 latestCompletedXid 563 oldestRunningXid 564
rmgr: Heap len (rec/tot): 63/ 63, tx: 564, lsn: 0/016C1AF0, prev 0/016C1AB8, desc: INSERT off 5, blkref #0: rel 1663/13806/16389 blk 0
rmgr: Heap len (rec/tot): 63/ 63, tx: 564, lsn: 0/016C1B30, prev 0/016C1AF0, desc: INSERT off 6, blkref #0: rel 1663/13806/16389 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 564, lsn: 0/016C1B70, prev 0/016C1B30, desc: COMMIT 2018-06-10 10:58:37.405144 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/016C1B98, prev 0/016C1B70, desc: RUNNING_XACTS nextXid 565 latestCompletedXid 564 oldestRunningXid 565
可以看出,wal记录的格式是row
checkpoint 的日志输出
随后发现有一些 checkpoint 的日志输出
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/016C1BD0, prev 0/016C1B98, desc: RUNNING_XACTS nextXid 565 latestCompletedXid 564 oldestRunningXid 565
rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 0/016C1C08, prev 0/016C1BD0, desc: CHECKPOINT_ONLINE redo 0/16C1BD0; tli 1; prev tli 1; fpw true; xid 0:565; oid 24576; multi 1; offset 0; oldest xid 548 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 565; online
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/016C1C78, prev 0/016C1C08, desc: RUNNING_XACTS nextXid 565 latestCompletedXid 564 oldestRunningXid 565
pg_waldump: FATAL: error in WAL record at 0/16C1C78: invalid record length at 0/16C1CB0: wanted 24, got 0
wal日志记录了数据库集群操作的整个流水,是pitr的关键文件。