• PostgreSQL two-node hot standby and high availability with Pacemaker + Corosync


Environment

    PostgreSQL streaming replication has been configured as described in the previous chapter; stop the postgres service before continuing.

    su - postgres
    pg_ctl -D /data/postgresql/data/ stop -m fast
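
    To confirm the server is actually down on each node, pg_ctl's status subcommand can be used (a quick check, not part of the original steps):

    # exits non-zero and prints "no server running" when postgres is stopped
    pg_ctl -D /data/postgresql/data/ status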

Configure the cluster environment

    Install pacemaker and corosync

    Run on all nodes:

    yum install -y pacemaker pcs psmisc policycoreutils-python postgresql-server
Disable the firewall

    Run on all nodes:

    systemctl disable firewalld.service
    systemctl stop firewalld.service
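
    If disabling the firewall is not acceptable in your environment, opening the cluster ports should work as well. A sketch assuming firewalld's built-in high-availability service (which covers the pcsd, corosync and pacemaker ports):

    firewall-cmd --permanent --add-service=high-availability
    firewall-cmd --reload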
Enable pcsd

    Run on all nodes:

    [root@master ~]# systemctl start pcsd.service
    [root@master ~]# systemctl enable pcsd.service
    Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
    [root@master ~]# echo hacluster |  passwd hacluster --stdin
Changing password for user hacluster.
    passwd: all authentication tokens updated successfully.
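
    The hacluster password must be set on every node. As an optional sanity check (not in the original steps), pcsd listens on TCP port 2224:

    ss -tlnp | grep 2224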
    
Cluster authentication

    Run on any one node; here master is used:

    [root@master ~]# pcs cluster auth -u hacluster -p hacluster 192.168.174.200 192.168.174.201
    192.168.174.200: Authorized
    192.168.174.201: Authorized
Cluster setup

    Run on master:

    [root@master ~]# pcs cluster setup --last_man_standing=1 --name pgcluster 192.168.174.200 192.168.174.201
    Destroying cluster on nodes: 192.168.174.200, 192.168.174.201...
    192.168.174.200: Stopping Cluster (pacemaker)...
    192.168.174.201: Stopping Cluster (pacemaker)...
    192.168.174.200: Successfully destroyed cluster
    192.168.174.201: Successfully destroyed cluster
    
    Sending 'pacemaker_remote authkey' to '192.168.174.200', '192.168.174.201'
    192.168.174.200: successful distribution of the file 'pacemaker_remote authkey'
    192.168.174.201: successful distribution of the file 'pacemaker_remote authkey'
    Sending cluster config files to the nodes...
    192.168.174.200: Succeeded
    192.168.174.201: Succeeded
    
    Synchronizing pcsd certificates on nodes 192.168.174.200, 192.168.174.201...
    192.168.174.200: Success
    192.168.174.201: Success
    Restarting pcsd on the nodes in order to reload the certificates...
    192.168.174.200: Success
    192.168.174.201: Success
Start the cluster

    Run on master:

    [root@master ~]# pcs cluster start --all
    192.168.174.200: Starting Cluster...
    192.168.174.201: Starting Cluster...
Verification

    Verify corosync; run on master:

    [root@master ~]# pcs status corosync
    
    Membership information
    ----------------------
        Nodeid      Votes Name
             1          1 192.168.174.200 (local)
             2          1 192.168.174.201

Verify pacemaker; run on master:

    [root@master ~]# pcs status
    Cluster name: pgcluster
    WARNING: no stonith devices and stonith-enabled is not false
    WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
    Stack: unknown
    Current DC: NONE
    Last updated: Fri Apr 13 21:24:51 2018		Last change: 
    
    0 nodes configured
    0 resources configured
    
    
    No resources
    
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled
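
    corosync and pacemaker show active/disabled here because pcs cluster start launched them without enabling them at boot. If you want the cluster stack to come up automatically after a reboot (optional; some operators prefer to start it manually after investigating a failure), pcs can enable it on all nodes:

    pcs cluster enable --all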

Configure automatic failover

    Note: before configuring automatic failover, be sure to stop the PostgreSQL service on both master and slave1.

    Run on master:

vim cluster_setup.sh
    # Save the CIB configuration to a file
    pcs cluster cib pgsql_cfg
    # Ignore loss of quorum at the pacemaker level
    pcs -f pgsql_cfg property set no-quorum-policy="ignore"
    # Disable STONITH
    pcs -f pgsql_cfg property set stonith-enabled="false"
    # Set resource stickiness so resources do not migrate back after a failed node recovers
    pcs -f pgsql_cfg resource defaults resource-stickiness="INFINITY"
    # Number of failures after which a resource is migrated away
    pcs -f pgsql_cfg resource defaults migration-threshold="3"
    # Virtual IP for the master node
    pcs -f pgsql_cfg resource create vip-master IPaddr2 ip="192.168.174.210" cidr_netmask="24"    op start   timeout="60s" interval="0s"  on-fail="restart"    op monitor timeout="60s" interval="10s" on-fail="restart"    op stop    timeout="60s" interval="0s"  on-fail="block"
    # Virtual IP for the slave node
    pcs -f pgsql_cfg resource create vip-slave IPaddr2 ip="192.168.174.211" cidr_netmask="24"    op start   timeout="60s" interval="0s"  on-fail="restart"    op monitor timeout="60s" interval="10s" on-fail="restart"    op stop    timeout="60s" interval="0s"  on-fail="block"
    # Create the pgsql cluster resource
    # Adjust pgctl, psql, pgdata, config, etc. for your environment; node_list takes the node hostnames, master_ip takes the master virtual IP
    pcs -f pgsql_cfg resource create pgsql pgsql pgctl="/usr/bin/pg_ctl" psql="/usr/bin/psql" pgdata="/var/lib/pgsql/data/" config="/var/lib/pgsql/data/postgresql.conf" rep_mode="sync" node_list="master slave1" master_ip="192.168.174.210"  repuser="replicator" primary_conninfo_opt="password=123456 keepalives_idle=60 keepalives_interval=5 keepalives_count=5" restore_command="cp /var/lib/data/pgsql/xlog_archive/%f %p" restart_on_promote='true' op start   timeout="60s" interval="0s"  on-fail="restart" op monitor timeout="60s" interval="4s" on-fail="restart" op monitor timeout="60s" interval="3s"  on-fail="restart" role="Master" op promote timeout="60s" interval="0s"  on-fail="restart" op demote  timeout="60s" interval="0s"  on-fail="stop" op stop    timeout="60s" interval="0s"  on-fail="block"
    # Master/slave set; clone-max=2 for the two nodes
    pcs -f pgsql_cfg resource master pgsql-cluster pgsql master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
    # Resource group for the master VIP
    pcs -f pgsql_cfg resource group add master-group vip-master
    # Resource group for the slave VIP
    pcs -f pgsql_cfg resource group add slave-group vip-slave
    # Colocate master-group with the Master role of pgsql-cluster
    pcs -f pgsql_cfg constraint colocation add master-group with master pgsql-cluster INFINITY
    # Start master-group after pgsql-cluster is promoted
    pcs -f pgsql_cfg constraint order promote pgsql-cluster then start master-group symmetrical=false score=INFINITY
    # Stop master-group when pgsql-cluster is demoted
    pcs -f pgsql_cfg constraint order demote  pgsql-cluster then stop  master-group symmetrical=false score=0
    # Colocate slave-group with the Slave role of pgsql-cluster
    pcs -f pgsql_cfg constraint colocation add slave-group with slave pgsql-cluster INFINITY
    # Start slave-group after pgsql-cluster is promoted
    pcs -f pgsql_cfg constraint order promote pgsql-cluster then start slave-group symmetrical=false score=INFINITY
    # Stop slave-group when pgsql-cluster is demoted
    pcs -f pgsql_cfg constraint order demote  pgsql-cluster then stop  slave-group symmetrical=false score=0
    # Push the configuration file to the CIB
    pcs cluster cib-push pgsql_cfg
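
    Before pushing, the generated CIB file can be sanity-checked with pacemaker's validator (an optional step, not in the original script; pgsql_cfg is the XML file written by pcs cluster cib):

    crm_verify --xml-file pgsql_cfg --verbose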

Note: the pgsql resource line deserves particular attention. pgctl, psql, pgdata and config must match your own environment; node_list takes the node hostnames and master_ip takes the master virtual IP. The same command, broken across lines for readability:

    pcs -f pgsql_cfg resource create pgsql pgsql \
        pgctl="/usr/bin/pg_ctl" psql="/usr/bin/psql" \
        pgdata="/var/lib/pgsql/data/" config="/var/lib/pgsql/data/postgresql.conf" \
        rep_mode="sync" node_list="master slave1" master_ip="192.168.174.210" \
        repuser="replicator" \
        primary_conninfo_opt="password=123456 keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
        restore_command="cp /var/lib/data/pgsql/xlog_archive/%f %p" \
        restart_on_promote='true' \
        op start   timeout="60s" interval="0s"  on-fail="restart" \
        op monitor timeout="60s" interval="4s"  on-fail="restart" \
        op monitor timeout="60s" interval="3s"  on-fail="restart" role="Master" \
        op promote timeout="60s" interval="0s"  on-fail="restart" \
        op demote  timeout="60s" interval="0s"  on-fail="stop" \
        op stop    timeout="60s" interval="0s"  on-fail="block"

Run the script

    sh cluster_setup.sh

Verify

    [root@master ~]# pcs status
    Cluster name: pgcluster
    WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
    Stack: corosync
    Current DC: slave1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
    Last updated: Fri Apr 13 21:30:29 2018
    Last change: Fri Apr 13 21:29:56 2018 by root via crm_attribute on master
    
    2 nodes configured
    4 resources configured
    
    Online: [ master slave1 ]
    
    Full list of resources:
    
     Master/Slave Set: pgsql-cluster [pgsql]
         Masters: [ master ]
         Slaves: [ slave1 ]
     Resource Group: master-group
         vip-master	(ocf::heartbeat:IPaddr2):	Started master
     Resource Group: slave-group
         vip-slave	(ocf::heartbeat:IPaddr2):	Started slave1
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

Note: when both vip-master and vip-slave show Started, the cluster is configured correctly; otherwise, recheck your configuration.
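
    The pgsql resource agent also records replication state in node attributes, which crm_mon can display. On a healthy cluster, pgsql-status should read PRI on the master and HS:sync on the standby (attribute names per the pgsql OCF agent):

    crm_mon -Afr -1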

Check replication status

    On master, run: select * from pg_stat_replication;

# Result (columns: pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state)
    6305	16384	replicator	slave1	192.168.174.201		60052	2018-04-13 13:29:47.76656+00	streaming	0/5000408	0/5000408	0/5000408	0/5000408	1	sync
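
    For a quicker read, selecting only the interesting columns works too (a convenience query assuming PostgreSQL 9.2+ column names, not from the original):

    select pid, application_name, client_addr, state, sync_state from pg_stat_replication;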

Test failover

    On master, simulate a failure by stopping the service:

    [root@master ~]# su - postgres
Last login: Fri Apr 13 22:01:13 CST 2018
    -bash-4.2$ pg_ctl -D /var/lib/pgsql/data/ stop -m fast
waiting for server to shut down..... done
    server stopped
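
    Pacemaker's monitor operation detects the stopped database and promotes slave1. Re-running pcs status until the Masters line shows slave1 confirms the failover:

    pcs status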

The database on master is now unreachable, while the database on slave1 serves both reads and writes.

    Insert a few rows on slave1:

    insert into mytab(id) values (4);
    insert into mytab(id) values (5);
    insert into mytab(id) values (6);

Restore the service; run on master:

    -bash-4.2$ pg_ctl -D /var/lib/pgsql/data/ start
server starting
    -bash-4.2$ LOG:  database system was shut down at 2018-04-13 14:01:25 GMT
    cp: cannot stat '/var/lib/data/pgsql/xlog_archive/00000002.history': No such file or directory
    LOG:  entering standby mode
    cp: cannot stat '/var/lib/data/pgsql/xlog_archive/000000010000000000000006': No such file or directory
    LOG:  consistent recovery state reached at 0/6000080
    LOG:  record with zero length at 0/6000080
    cp: cannot stat '/var/lib/data/pgsql/xlog_archive/000000010000000000000006': No such file or directory
    LOG:  database system is ready to accept read only connections
    cp: cannot stat '/var/lib/data/pgsql/xlog_archive/00000002.history': No such file or directory
    LOG:  streaming replication successfully connected to primary
    LOG:  redo starts at 0/6000080

On master, run: select * from mytab; the rows inserted on slave1 have already been replicated back.


On slave1, run: select * from pg_stat_replication;

# Result (same columns as above)
    58621	16384	replicator	master	192.168.174.200		52578	2018-04-13 14:02:39.701236+00	streaming	0/6000548	0/6000548	0/6000548	0/6000548	1	sync

Compare the two query results:

    --6305	16384	replicator	slave1	192.168.174.201		60052	2018-04-13 13:29:47.76656+00	streaming	0/5000408	0/5000408	0/5000408	0/5000408	1	sync
    --58621	16384	replicator	master	192.168.174.200		52578	2018-04-13 14:02:39.701236+00	streaming	0/6000548	0/6000548	0/6000548	0/6000548	1	sync

After the simulated failure, master has switched roles from primary to standby, with slave1 now acting as primary.
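
    Note that the old master was restarted here manually with pg_ctl. If you instead let pacemaker bring it back, the pgsql resource agent typically requires clearing its failure state first, and by that agent's convention a lock file left on the demoted node may need to be removed (the path below is the agent's default tmpdir and is an assumption; adjust for your setup):

    # run on the recovered node
    rm -f /var/lib/pgsql/tmp/PGSQL.lock   # pgsql RA lock file (assumed default path)
    pcs resource cleanup pgsql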

The two-node hot-standby, high-availability setup is now complete!

