• Building a high-availability cluster with repmgr + PG12 (3)


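    1. The basic repmgr cluster was built in the previous parts. Checking the cluster status and the repmgr service status now shows that repmgrd is not running yet: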

    [postgres@localhost bin]$ ./repmgr cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    ----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------
     1  | node1 | primary | * running |          | default  | 100      | 1        | host=192.168.101.9 port=5432 user=postgres  dbname=postgres
     2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=192.168.101.7 port=5432 user=postgres  dbname=postgres
    [postgres@localhost bin]$ ./repmgr service status
     ID | Name  | Role    | Status    | Upstream | repmgrd     | PID | Paused? | Upstream last seen
    ----+-------+---------+-----------+----------+-------------+-----+---------+--------------------
     1  | node1 | primary | * running |          | not running | n/a | n/a     | n/a
     2  | node2 | standby |   running | node1    | not running | n/a | n/a     | n/a
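    You can filter the output of `repmgr service status` with awk instead of reading the table by eye. This is a minimal sketch. It assumes the column layout shown above, with repmgrd as the sixth `|`-separated column. The function name repmgrd_not_running is hypothetical and is not part of repmgr:

```shell
# Hypothetical helper: list the nodes whose repmgrd column is not "running".
# Assumes the plain-text layout of `repmgr service status` shown above
# (repmgrd is the 6th "|"-separated column) and reads the table on stdin;
# the header line and the "----+----" separator line are skipped.
repmgrd_not_running() {
  awk -F'|' 'NR > 2 && NF >= 6 {
    name = $2; state = $6
    # trim the padding around the Name and repmgrd cells
    gsub(/^[ \t]+|[ \t]+$/, "", name)
    gsub(/^[ \t]+|[ \t]+$/, "", state)
    if (state != "running") print name
  }'
}
```

    For example, `./repmgr service status | repmgrd_not_running` prints node1 and node2 for the output above. It should print nothing once repmgrd has been started on both nodes.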

    2. Edit the repmgr.conf parameters

    vim /etc/repmgr/12/repmgr.conf
    failover='automatic'  
    promote_command='/usr/pgsql-12/bin/repmgr standby promote' 
    follow_command='/usr/pgsql-12/bin/repmgr standby follow'

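    The failover parameter takes one of two values:
    automatic: enable automatic failover
    manual: disable automatic failover
    promote_command is the command repmgrd runs to promote this standby to primary. follow_command is the command it runs to make this standby follow a newly promoted primary.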
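    A missing or misspelled setting silently leaves failover in manual mode, so it can help to check which value a node will actually use. This is a minimal sketch with two assumptions. First, the last uncommented failover= line is the one that takes effect. Second, an absent setting falls back to repmgr's default of manual. failover_mode is a hypothetical helper:

```shell
# Hypothetical helper: print the effective failover value from a repmgr.conf.
# Assumptions: the last uncommented "failover=" line is the one that counts,
# and a missing setting means "manual" (repmgr's default).
failover_mode() {
  local conf="$1" value
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
  # take the last matching line, drop "key =", trailing comment/space, quotes
  value=$( { grep -E '^[[:space:]]*failover[[:space:]]*=' "$conf" || true; } |
    tail -n 1 |
    sed -E "s/^[^=]*=[[:space:]]*//; s/[[:space:]]*(#.*)?\$//; s/^['\"]//; s/['\"]\$//")
  echo "${value:-manual}"
}
```

    For example, after the edit above, `failover_mode /etc/repmgr/12/repmgr.conf` should print automatic on both nodes.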

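    With automatic failover disabled, the standby logs the following after it detects that the primary has failed. The standby does not promote itself to primary: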

    [2020-04-24 22:49:17] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
    [2020-04-24 22:49:17] [WARNING] unable to reconnect to node 1 after 6 attempts
    [2020-04-24 22:49:17] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate, and will not follow the new primary
    [2020-04-24 22:49:17] [DETAIL] "failover" is set to "manual" in repmgr.conf
    [2020-04-24 22:49:17] [HINT] manually execute "repmgr standby follow" to have this node follow the new primary
    [2020-04-24 22:49:17] [INFO] follower node awaiting notification from a candidate node
    [2020-04-24 22:50:17] [WARNING] no notification received from new primary after 60 seconds

    3. Now start repmgrd on the cluster nodes

    On both the primary and the standby, run the following from the bin directory:
    ./repmgrd -d
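    A slightly more defensive way to start the daemon is to pass the configuration file explicitly with -f and to skip the start if repmgrd is already running. This is a minimal sketch. It assumes the paths used earlier in this article and that pgrep is installed. start_repmgrd, REPMGR_BIN, REPMGR_CONF and DRY_RUN are hypothetical names:

```shell
# Hypothetical wrapper around "repmgrd -d": start the daemon with an explicit
# config file unless a repmgrd owned by the current user is already running.
# Assumes the PGDG paths used in this article and that pgrep is available.
REPMGR_BIN=${REPMGR_BIN:-/usr/pgsql-12/bin}
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr/12/repmgr.conf}

start_repmgrd() {
  # DRY_RUN=1 only prints the command that would be executed
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$REPMGR_BIN/repmgrd -f $REPMGR_CONF -d"
    return 0
  fi
  if pgrep -u "$(id -un)" -x repmgrd >/dev/null 2>&1; then
    echo "repmgrd is already running" >&2
    return 0
  fi
  "$REPMGR_BIN/repmgrd" -f "$REPMGR_CONF" -d
}
```

    Run it as the postgres user on each node. With DRY_RUN=1 it only prints /usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -d.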

    4. After starting it, check the service status again:

    [postgres@localhost bin]$ ./repmgr service status
     ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
    ----+-------+---------+-----------+----------+---------+-------+---------+--------------------
     1  | node1 | primary | * running |          | running | 11558 | no      | n/a
     2  | node2 | standby |   running | node1    | running | 10818 | no      | 0 second(s) ago

    5. Now simulate a failure of the primary. The standby logs the following:

    [2020-04-24 23:14:02] [INFO] monitoring connection to upstream node "node1" (ID: 1)
    [2020-04-24 23:14:38] [WARNING] unable to ping "host=192.168.101.9 port=5432 user=postgres  dbname=postgres"
    [2020-04-24 23:14:38] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
    [2020-04-24 23:14:38] [WARNING] unable to connect to upstream node "node1" (ID: 1)
    [2020-04-24 23:14:38] [INFO] checking state of node 1, 1 of 6 attempts
    [2020-04-24 23:14:38] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
    [2020-04-24 23:14:38] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
    [2020-04-24 23:14:38] [INFO] sleeping 10 seconds until next reconnection attempt
    [2020-04-24 23:14:48] [INFO] checking state of node 1, 2 of 6 attempts
    [2020-04-24 23:14:48] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
    [2020-04-24 23:14:48] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
    [2020-04-24 23:14:48] [INFO] sleeping 10 seconds until next reconnection attempt
    [2020-04-24 23:14:58] [INFO] checking state of node 1, 3 of 6 attempts
    [2020-04-24 23:14:58] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
    [2020-04-24 23:14:58] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
    [2020-04-24 23:14:58] [INFO] sleeping 10 seconds until next reconnection attempt
    [2020-04-24 23:15:08] [INFO] checking state of node 1, 4 of 6 attempts
    [2020-04-24 23:15:08] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
    [2020-04-24 23:15:08] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
    [2020-04-24 23:15:08] [INFO] sleeping 10 seconds until next reconnection attempt
    [2020-04-24 23:15:18] [INFO] checking state of node 1, 5 of 6 attempts
    [2020-04-24 23:15:18] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
    [2020-04-24 23:15:18] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
    [2020-04-24 23:15:18] [INFO] sleeping 10 seconds until next reconnection attempt
    [2020-04-24 23:15:28] [INFO] checking state of node 1, 6 of 6 attempts
    [2020-04-24 23:15:28] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
    [2020-04-24 23:15:28] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
    [2020-04-24 23:15:28] [WARNING] unable to reconnect to node 1 after 6 attempts
    [2020-04-24 23:15:28] [INFO] 0 active sibling nodes registered
    [2020-04-24 23:15:28] [INFO] primary node  "node1" (ID: 1) and this node have the same location ("default")
    [2020-04-24 23:15:28] [INFO] no other sibling nodes - we win by default
    [2020-04-24 23:15:28] [NOTICE] this node is the only available candidate and will now promote itself
    [2020-04-24 23:15:28] [INFO] promote_command is:
      "/usr/pgsql-12/bin/repmgr standby promote"
    NOTICE: promoting standby to primary
    DETAIL: promoting server "node2" (ID: 2) using pg_promote()
    NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
    NOTICE: STANDBY PROMOTE successful
    DETAIL: server "node2" (ID: 2) was successfully promoted to primary
    [2020-04-24 23:15:29] [INFO] 0 followers to notify
    [2020-04-24 23:15:29] [INFO] switching to primary monitoring mode
    [2020-04-24 23:15:29] [NOTICE] monitoring cluster primary "node2" (ID: 2)

    The standby was promoted to primary as expected and is now serving clients.
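    The log also shows how long detection took. The first failed connection is logged at 23:14:38, and repmgrd runs promote_command at 23:15:28, 50 seconds later. That matches 6 checks spaced 10 seconds apart, since the first check happens immediately: (6 - 1) × 10 = 50 s. These values match repmgr's defaults for the repmgr.conf parameters reconnect_attempts (6) and reconnect_interval (10 seconds).

    Below is a minimal sketch for measuring this delay from a log. It assumes the "[YYYY-MM-DD HH:MM:SS]" prefix shown above and that both events fall on the same day. failover_delay is a hypothetical name:

```shell
# Hypothetical helper: read a repmgrd log on stdin and print the seconds
# between the first "unable to connect to upstream node" warning and the
# "promote_command is:" line. Assumes the "[YYYY-MM-DD HH:MM:SS]" prefix
# shown above and that both events happen on the same day.
failover_delay() {
  awk '
    function secs(ts,  t) { split(ts, t, ":"); return t[1]*3600 + t[2]*60 + t[3] }
    # $2 is "HH:MM:SS]"; keep the first 8 characters
    /unable to connect to upstream node/ && !seen { start = secs(substr($2, 1, 8)); seen = 1 }
    /promote_command is:/ && seen { print secs(substr($2, 1, 8)) - start; exit }
  '
}
```

    Fed the log from step 5, it prints 50. One way to supply it is the file named by log_file in repmgr.conf, if one is configured.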

    6. Check the cluster status:

    [postgres@localhost bin]$ ./repmgr cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    ----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------
     1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=192.168.101.9 port=5432 user=postgres  dbname=postgres
     2  | node2 | primary | * running |          | default  | 100      | 2        | host=192.168.101.7 port=5432 user=postgres  dbname=postgres
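    node1 is still listed with the role primary; only its status (- failed) shows that it is down. A script that needs the current primary should therefore filter on both columns. This is a minimal sketch that assumes the column layout above. running_primaries is a hypothetical helper:

```shell
# Hypothetical helper: print every node that `repmgr cluster show` lists with
# Role "primary" and a Status containing "running". Assumes the layout above:
#   ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
running_primaries() {
  awk -F'|' 'NR > 2 && NF >= 4 {
    name = $2; role = $3; status = $4
    # trim the padding around the Name and Role cells
    gsub(/^[ \t]+|[ \t]+$/, "", name)
    gsub(/^[ \t]+|[ \t]+$/, "", role)
    if (role == "primary" && status ~ /running/) print name
  }'
}
```

    For the output above, `./repmgr cluster show | running_primaries` prints node2. If it printed more than one name, two nodes would both be reporting themselves as running primaries, and that would need investigating.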
    
    
    
    Just do good deeds, and don't ask about what lies ahead.
  • Original article: https://www.cnblogs.com/mingfan/p/12770905.html