• postgresql 数据库无法启动


    在数据库无法启动时,一般可以根据报错信息,采取对应措施即可,下面列出一些在数据库启动时报出错误比较严重而解决方式又不那么明显的处理方法。

    模拟错误,查到pg_class系统表中一个索引在磁盘中的位置,通过vim任意修改其中内容。

    
    postgres=# select pg_relation_filepath('pg_class_oid_index');
     pg_relation_filepath
    ----------------------
     base/13219/36870
    (1 row)
    
    
    $ cd $PGDATA
    $ vim  base/13219/36870
    

    重启数据库。

    $pg_ctl restart -m fast
    postgres@db-192-168-173-230-> pg_ctl restart -m fast
    waiting for server to shut down.... done
    server stopped
    waiting for server to start....2019-03-12 11:59:17.312 CST [5688] LOG:  00000: listening on IPv4 address "0.0.0.0", port 1921
    2019-03-12 11:59:17.312 CST [5688] LOCATION:  StreamServerPort, pqcomm.c:593
    2019-03-12 11:59:17.312 CST [5688] LOG:  00000: listening on IPv6 address "::", port 1921
    2019-03-12 11:59:17.312 CST [5688] LOCATION:  StreamServerPort, pqcomm.c:593
    2019-03-12 11:59:17.314 CST [5688] LOG:  00000: listening on Unix socket "./.s.PGSQL.1921"
    2019-03-12 11:59:17.314 CST [5688] LOCATION:  StreamServerPort, pqcomm.c:587
    2019-03-12 11:59:17.400 CST [5688] LOG:  00000: redirecting log output to logging collector process
    2019-03-12 11:59:17.400 CST [5688] HINT:  Future log output will appear in directory "log".
    2019-03-12 11:59:17.400 CST [5688] LOCATION:  SysLogger_Start, syslogger.c:667
     done
    server started
    
    

    数据库可以正常启动,日志也没有报错。

    但连接数据库时,会报出错误:

    $ psql
    psql: FATAL:  could not read block 1 in file "base/13219/36870": read only 32756 of 32768 bytes
    

    由于上面是模拟的错误,我们自然是知道出错的是哪个表或索引,但突然遇到该问题又进不去数据库时,可以使用oid2name来确定对应的数据库和对象。

    $ oid2name
    All databases:
        Oid  Database Name  Tablespace
    ----------------------------------
      13219       postgres  pg_default
      16393           swrd  pg_default
      13218      template0  pg_default
          1      template1  pg_default
    $ oid2name  -f 36870
    From database "postgres":
      Filenode          Table Name
    ------------------------------
         36870  pg_class_oid_index
    

    我上面的情况是数据库可以启动,但是无法进入,当遇到无法启动但遇到类似错误的方法也适用。

    1. 单用户启动数据库

    下面通过单用户模式进入数据库:

    $ postgres --single   -P -d 1
    

    -P 参数是关闭系统索引。

    -d 1是设置debug日志级别为1。级别是从1-5,数字越高日志越详尽。

    $ postgres --single -P -d 1
    2019-03-12 11:17:16.677 CST [1092] DEBUG:  00000: mmap(12998148096) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
    2019-03-12 11:17:16.677 CST [1092] LOCATION:  CreateAnonymousSegment, pg_shmem.c:485
    2019-03-12 11:17:16.759 CST [1092] NOTICE:  00000: database system was shut down at 2019-03-12 11:16:54 CST
    2019-03-12 11:17:16.759 CST [1092] LOCATION:  StartupXLOG, xlog.c:6363
    2019-03-12 11:17:16.759 CST [1092] DEBUG:  00000: checkpoint record is at 2/67000028
    2019-03-12 11:17:16.759 CST [1092] LOCATION:  StartupXLOG, xlog.c:6646
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: redo record is at 2/67000028; shutdown true
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  StartupXLOG, xlog.c:6724
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: next transaction ID: 0:46060157; next OID: 36864
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  StartupXLOG, xlog.c:6728
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: next MultiXactId: 1; next MultiXactOffset: 0
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  StartupXLOG, xlog.c:6731
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: oldest unfrozen transaction ID: 561, in database 1
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  StartupXLOG, xlog.c:6734
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: oldest MultiXactId: 1, in database 1
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  StartupXLOG, xlog.c:6737
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: commit timestamp Xid oldest/newest: 0/0
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  StartupXLOG, xlog.c:6741
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: transaction ID wrap limit is 2147484208, limited by database with OID 1
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  SetTransactionIdLimit, varsup.c:368
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: MultiXactId wrap limit is 2147483648, limited by database with OID 1
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  SetMultiXactIdLimit, multixact.c:2269
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: starting up replication slots
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  StartupReplicationSlots, slot.c:1110
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: MultiXactId wrap limit is 2147483648, limited by database with OID 1
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  SetMultiXactIdLimit, multixact.c:2269
    2019-03-12 11:17:16.760 CST [1092] DEBUG:  00000: MultiXact member stop limit is now 4294757632 based on MultiXact 1
    2019-03-12 11:17:16.760 CST [1092] LOCATION:  SetOffsetVacuumLimit, multixact.c:2632
    
    PostgreSQL stand-alone backend 11.0
    backend> reindex table pg_class;
    2019-03-12 11:18:34.181 CST [1092] DEBUG:  00000: building index "pg_class_oid_index" on table "pg_class" serially
    2019-03-12 11:18:34.181 CST [1092] LOCATION:  index_build, index.c:2297
    2019-03-12 11:18:34.188 CST [1092] DEBUG:  00000: building index "pg_class_relname_nsp_index" on table "pg_class" serially
    2019-03-12 11:18:34.188 CST [1092] LOCATION:  index_build, index.c:2297
    2019-03-12 11:18:34.191 CST [1092] DEBUG:  00000: building index "pg_class_tblspc_relfilenode_index" on table "pg_class" serially
    2019-03-12 11:18:34.191 CST [1092] LOCATION:  index_build, index.c:2297
    backend> 2019-03-12 11:18:47.832 CST [1092] NOTICE:  00000: shutting down
    2019-03-12 11:18:47.832 CST [1092] LOCATION:  ShutdownXLOG, xlog.c:8459
    2019-03-12 11:18:47.986 CST [1092] LOG:  00000: checkpoint starting: shutdown immediate
    2019-03-12 11:18:47.986 CST [1092] LOCATION:  LogCheckpointStart, xlog.c:8508
    2019-03-12 11:18:47.988 CST [1092] DEBUG:  00000: performing replication slot checkpoint
    2019-03-12 11:18:47.988 CST [1092] LOCATION:  CheckPointReplicationSlots, slot.c:1074
    2019-03-12 11:18:48.000 CST [1092] DEBUG:  00000: checkpoint sync: number=1 file=base/13219/1259 time=1.022 msec
    2019-03-12 11:18:48.000 CST [1092] LOCATION:  mdsync, md.c:1251
    2019-03-12 11:18:48.004 CST [1092] LOG:  00000: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.010 s, sync=0.001 s, total=0.019 s; sync files=1, longest=0.001 s, average=0.001 s; distance=16384 kB, estimate=16384 kB
    2019-03-12 11:18:48.004 CST [1092] LOCATION:  LogCheckpointEnd, xlog.c:8590
    2019-03-12 11:18:48.029 CST [1092] NOTICE:  00000: database system is shut down
    2019-03-12 11:18:48.029 CST [1092] LOCATION:  UnlinkLockFiles, miscinit.c:860
    $ pg_ctl start
    waiting for server to start....2019-03-12 11:18:53.104 CST [1185] LOG:  00000: listening on IPv4 address "0.0.0.0", port 1921
    2019-03-12 11:18:53.104 CST [1185] LOCATION:  StreamServerPort, pqcomm.c:593
    2019-03-12 11:18:53.104 CST [1185] LOG:  00000: listening on IPv6 address "::", port 1921
    2019-03-12 11:18:53.104 CST [1185] LOCATION:  StreamServerPort, pqcomm.c:593
    2019-03-12 11:18:53.107 CST [1185] LOG:  00000: listening on Unix socket "./.s.PGSQL.1921"
    2019-03-12 11:18:53.107 CST [1185] LOCATION:  StreamServerPort, pqcomm.c:587
    2019-03-12 11:18:53.191 CST [1185] LOG:  00000: redirecting log output to logging collector process
    2019-03-12 11:18:53.191 CST [1185] HINT:  Future log output will appear in directory "log".
    2019-03-12 11:18:53.191 CST [1185] LOCATION:  SysLogger_Start, syslogger.c:667
     done
    server started
    

    可以看到pg_class索引已修复。然后启动数据库即可,数据库已恢复正常。这里测试的是系统表的索引,至于我们自定义的非系统对象,即使删掉在数据库启动或进入时,都不会报错,只有在用到时才会报错。如果不是磁盘坏道,在报错后,通常reindex一下即可。

    2. 使用物理备份恢复

    将之前的数据目录mv一下,创建新的数据库目录,然后使用备份恢复启动。

    3. 可能碰到是pg的bug,尝试升级到小版本的最高版本。

    4. 搜索邮件列表或在列表提问,寻求帮助

    5. 修改源码,将数据库报错的部分,修改成警告,使之可以正常启动。

    我尝试搜索源码,修改了几处,但由于报同类型错误的地方太多,没有进行所有的修改。下面是修改几处后报出的错误。

    $ /opt/pgsql11_modify/bin/psql
    WARNING:  could not read block 1 in file "base/13219/36864": read only 32756 of 32768 bytes
    psql: FATAL:  could not open file "base/13219/36864.1" (target block 131072): previous segment is only 1 blocks
    

    6.使用pg_resetxlog工具

    这是最后的办法。在数据库启动后,应及时将数据导出,然后在其他集群中恢复。

  • 相关阅读:
    Java 泛型学习四 自动擦除
    Java 泛型学习三 通配符
    ava的动态性支持学习七 方法句柄调用的性能对比
    Java 泛型学习一 泛型概念
    Java中正则Matcher类的matches()、lookAt()和find()的区别
    Java 泛型学习五 泛型擦除带来的不自然
    Java I/O 流的简单介绍及示例
    Java 泛型学习二 泛型的限制
    Java 泛型学习六 泛型类型系统
    字节存储机制:bigedian 和 littleendian
  • 原文地址:https://www.cnblogs.com/xiaotengyi/p/10518925.html
Copyright © 2020-2023  润新知