• LOG: invalid primary checkpoint record/PANIC: could not locate a valid checkpoint record恢复方法


    正常情况下lightdb/postgresql进程被杀或--mode=immediate模式停止的时候,是会进行正常恢复的。如下:

    2022-04-23 08:27:28.085797C [unknown] zjh@postgres ::1(51378) walsender  57P03[2022-04-23 08:27:28 UTC] 0 [82739] FATAL:  the database system is in recovery mode
    2022-04-23 08:27:58.093597C [unknown] zjh@postgres ::1(52034) walsender  57P03[2022-04-23 08:27:58 UTC] 0 [82851] FATAL:  the database system is in recovery mode
    2022-04-23 08:28:28.098248C [unknown] zjh@postgres ::1(52690) walsender  57P03[2022-04-23 08:28:28 UTC] 0 [82974] FATAL:  the database system is in recovery mode
    2022-04-23 08:28:58.106009C [unknown] zjh@postgres ::1(53290) walsender  57P03[2022-04-23 08:28:58 UTC] 0 [83085] FATAL:  the database system is in recovery mode
    2022-04-23 08:29:28.112561C [unknown] zjh@postgres ::1(53948) walsender  57P03[2022-04-23 08:29:28 UTC] 0 [83212] FATAL:  the database system is in recovery mode
    2022-04-23 08:29:53.571639C  @  postmaster  00000[2022-04-22 05:48:38 UTC] 0 [207357] LOG:  all server processes terminated; reinitializing
    2022-04-23 08:29:54.990799C  @  startup  00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG:  database system was interrupted; last known up at 2022-04-22 12:23:26 UTC
    2022-04-23 08:29:55.023361C  @  startup  00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG:  database system was not properly shut down; automatic recovery in progress
    2022-04-23 08:29:55.023954C  @  startup  00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG:  redo starts at 19/7F15410
    2022-04-23 08:29:55.023972C  @  startup  00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG:  invalid record length at 19/7F15448: wanted 24, got 0
    2022-04-23 08:29:55.023978C  @  startup  00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG:  redo done at 19/7F15410
    2022-04-23 08:29:55.024782C  @  startup  00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG:  checkpoint starting: end-of-recovery immediate
    2022-04-23 08:29:55.062822C  @  startup  00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.037 s, sync=0.001 s, total=0.039 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
    2022-04-23 08:29:55.067348C  @  postmaster  00000[2022-04-22 05:48:38 UTC] 0 [207357] LOG:  database system is ready to accept connections

    但是某些情况下,异常终止后,可能会导致主检查点记录无效(断电、坏道都有可能),例如:

    2022-04-19 19:05:26.104431T [unknown] lightdb@repmgr 127.0.0.1(39248) client backend 57P03[2022-04-19 19:05:26 CST] 0 [45004] FATAL: the database system is shutting down
    2022-04-19 19:05:36.105421T [unknown] lightdb@repmgr 127.0.0.1(39250) client backend 57P03[2022-04-19 19:05:36 CST] 0 [45036] FATAL: the database system is shutting down
    2022-04-19 19:07:59.027435T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: server process (PID 90426) was terminated by signal 9: 已杀死
    2022-04-19 19:07:59.027435T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] DETAIL: Failed process was running: select pg_backend_pid();
    2022-04-19 19:07:59.027818T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: terminating any other active server processes
    2022-04-19 19:07:59.030308T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: archiver process (PID 47644) exited with exit code 2
    2022-04-19 19:07:59.121224T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: abnormal database system shutdown
    2022-04-19 19:07:59.212224T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: database system is shut down

    [lightdb@hs-10-19-36-28 log]$ tail -fn 100 lightdb-2022-04-19_190953.log
    2022-04-19 19:09:53.964212T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: starting LightDB 13.3-22.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (NeoKylin 4.8.5-36), 64-bit
    2022-04-19 19:09:53.964485T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: listening on IPv4 address "0.0.0.0", port 5432
    2022-04-19 19:09:53.964500T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: listening on IPv6 address "::", port 5432
    2022-04-19 19:09:53.965481T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: listening on Unix socket "/tmp/.s.LIGHTDB.5432"
    2022-04-19 19:09:53.999301T @ startup 00000[2022-04-19 19:09:53 CST] 0 [45593] LOG: database system was interrupted; last known up at 2022-04-19 18:19:48 CST
    2022-04-19 19:09:55.838765T @ startup 00000[2022-04-19 19:09:53 CST] 0 [45593] LOG: invalid primary checkpoint record
    2022-04-19 19:09:55.838796T @ startup XX000[2022-04-19 19:09:53 CST] 0 [45593] PANIC: could not locate a valid checkpoint record
    2022-04-19 19:09:55.873920T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: startup process (PID 45593) was terminated by signal 6: Aborted
    2022-04-19 19:09:55.873956T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: aborting startup due to startup process failure
    2022-04-19 19:09:55.879035T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: database system is shut down

      此时可以通过lt_resetwal -f $PGDATA强制清理wal,然后启动,虽然这样会丢失部分数据(在业务量较大的系统中,总是具有高可用环境或至少使用lt_probackup增量备份到其它服务器是非常重要的)。

     https://postgrespro.com/list/thread-id/2559322

    https://cdmana.com/2022/03/202203241421490061.html

    https://github.com/pgbackrest/pgbackrest/issues/839

    https://dockerquestions.com/2020/11/15/docker-postgres-database-invalid-primary-checkpoint-record/

  • 相关阅读:
    js写入和读取cookie
    算法笔记汇总精简版
    廖雪峰Git教程3
    廖雪峰Git教程2
    廖雪峰Git教程1
    PHP fastcgi_finish_request 方法
    PHP获取远程文件的大小,通过ob_get_contents实现
    PHP 浮点型运算相关问题
    php中的func_num_args、func_get_arg与func_get_args函数
    C# ListView用法详解
  • 原文地址:https://www.cnblogs.com/zhjh256/p/16169442.html
Copyright © 2020-2023  润新知