• postgresql 9.1 基于 async stream 的 master/salve 切换过程


    vm: Oracle VM VirtualBox 5.2.4 r119785 (Qt5.6.2)
    os: debian 8.2
    postgresql: 9.1.22

    ip 规划如下:两个网段
    # 192.168.56.x
    # 192.168.165.x

    node1
    bond0 192.168.56.100
    bond0:1 192.168.56.101
    eth0
    eth1
    bond1 192.168.165.100
    bond1:1 192.168.165.101
    eth2
    eth3

    node2
    bond0 192.168.56.200
    bond0:1
    eth0
    eth1
    bond1 192.168.165.200
    bond1:1
    eth2
    eth3

    node3
    eth0 192.168.56.50
    eth1 192.168.165.50

    其中 192.168.56.101、192.168.165.101 为 pgsql的 vip,应用程序只需要连接这两个ip中的一个就ok了。

    #######################################################################
    debian 8.2 安装 pgsql 9.1

    #vi /etc/apt/sources.list.d/pgdg.list
    deb http://apt.postgresql.org/pub/repos/apt/ jessie-pgdg main
    
    #wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | 
    apt-key add -
    #apt-get update
    #apt-get install postgresql-9.1

    包含了如下包
    postgresql-9.1
    postgresql-client-9.1
    postgresql-client-common
    postgresql-common
    postgresql-contrib-9.1

    手动安装如下包

    #apt-get install postgresql-server-dev-9.1

    包含了如下包
    libpq-dev
    postgresql-server-dev-9.1

    使用bonding,安装 ifenslave

    # apt-get install ifenslave

    bonding 配置,可以参考另外一篇blog。

    #######################################################################
    搭建 pgsql stream 环境,一主两从

    node1 节点创建复制用户

    $ psql
    psql (9.1.22)
    postgres=# CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'rep123';

    node1、node1、node3 节点修改 pg_hba.conf

    $ cd /etc/postgresql/9.1/main
    $ vi pg_hba.conf
    
    host     replication     replicator     192.168.56.100/32         md5
    host     replication     replicator     192.168.165.100/32        md5
    host     replication     replicator     192.168.56.200/32         md5
    host     replication     replicator     192.168.165.200/32        md5
    host     replication     replicator     192.168.56.50/32          md5
    host     replication     replicator     192.168.165.50/32         md5
    host     replication     replicator     192.168.56.101/32         md5
    host     replication     replicator     192.168.165.101/32        md5

    node2,node3 两个节点上修改 postgresql.conf省略。

    $ ls -l
    drwx------ 5 postgres postgres 4096 1月  16 16:12 base
    drwx------ 2 postgres postgres 4096 1月  16 18:51 global
    drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_clog
    drwx------ 4 postgres postgres 4096 1月  16 16:12 pg_multixact
    drwx------ 2 postgres postgres 4096 1月  16 18:48 pg_notify
    drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_serial
    drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_stat_tmp
    drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_subtrans
    drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_tblspc
    drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_twophase
    -rw------- 1 postgres postgres    4 1月  16 16:12 PG_VERSION
    drwx------ 3 postgres postgres 4096 1月  16 16:12 pg_xlog
    -rw------- 1 postgres postgres  133 1月  16 18:48 postmaster.opts
    lrwxrwxrwx 1 root     root       36 116 16:12 server.crt -> /etc/ssl/certs/ssl-cert-snakeoil.pem
    lrwxrwxrwx 1 root     root       38 116 16:12 server.key -> /etc/ssl/private/ssl-cert-snakeoil.key

    node2,node3 两个节点上使用pg_basebackup来拉取文件

    $ pg_basebackup -h 192.168.56.101 -D /var/lib/postgresql/9.1/main -U replicator -v -P

    node2,node3 两个节点上配置 recovery.conf

    $ cd /var/lib/postgresql/9.1/main
    $ vi recovery.conf 
    recovery_target_timeline = 'latest'
    standby_mode = on
    primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=rep123'
    trigger_file = '/tmp/postgresql.trigger.5432'

    #######################################################################
    node1 节点 pgbench 生成数据过程中关闭 pgsql

    $ psql
    psql (9.1.22)
    postgres=# create database peiybdb
    $ pgbench -i -s 200 peiybdb 

    node2 查看 lsn

    $ psql
    psql (9.1.22)
    postgres=# select pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location(),
           pg_last_xact_replay_timestamp(), 
           now() - pg_last_xact_replay_timestamp() AS replication_delay;
    -[ RECORD 1 ]-----------------+------------------------------
    pg_last_xlog_receive_location | 0/84001D8
    pg_last_xlog_replay_location  | 0/84001D8
    pg_last_xact_replay_timestamp | 2018-01-16 22:07:30.637413+08
    replication_delay             | 00:03:41.62903

    node3 查看 lsn

    $ psql
    psql (9.1.22)
    postgres=# select pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location(),
           pg_last_xact_replay_timestamp(), 
           now() - pg_last_xact_replay_timestamp() AS replication_delay;
    -[ RECORD 1 ]-----------------+------------------------------
    pg_last_xlog_receive_location | 0/84001D8
    pg_last_xlog_replay_location  | 0/84001D8
    pg_last_xact_replay_timestamp | 2018-01-16 22:07:30.637413+08
    replication_delay             | 00:04:04.309157

    node1 节点 使用 pg_controldata 查看最后生成的 lsn

    $ /usr/lib/postgresql/9.1/bin/pg_controldata /var/lib/postgresql/9.1/main

    #######################################################################
    node1 的 postgresql 会话关闭

    $ psql
    psql (9.1.22)
    postgres=# SELECT 'select pg_terminate_backend('|| psa.procpid || ');' as term_spid, 
           psa.usename <> 'postgres'
           psa.procpid,
           psa.waiting,
           psa.client_addr,
           psa.current_query,
           now() - psa.xact_start as xact_time,
           now() - psa.query_start as query_time,
           psa.* 
    FROM pg_stat_activity psa
    WHERE 1=1
      and psa.usename <> 'postgres'
    order by case when psa.current_query = '<IDLE>' then 99
                  when psa.current_query = '<insufficient privilege>' then 98
                  else 1
              end,
             now() - psa.xact_start desc 
    ;

    然后手动执行 select pg_terminate_backend(”);

    node1 的文件 recovery.done

    $ cd /var/lib/postgresql/9.1/main/
    $ vi recovery.done
    recovery_target_timeline = 'latest'
    standby_mode = on
    primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=rep123'
    trigger_file = '/tmp/postgresql.trigger.5432'

    node1 的 postgresql 关闭

    # service postgresql stop 
    或者
    $ /usr/lib/postgresql/9.1/bin/pg_ctl stop -m fast -D /var/lib/postgresql/9.1/main 

    node1 的文件 recovery.done 重命名为 recovery.conf,谨防启动后变成双主

    $ cd /var/lib/postgresql/9.1/main/
    $ mv recovery.done recovery.conf

    node1 的 bond 网卡子接口 vip 删除

    # ip addr del 192.168.56.101/24 dev bond0 label bond0:1
    # ip addr del 192.168.165.101/24 dev bond1 label bond1:1
    # ifconfig
    # ip a

    node2 的 bond 网卡子接口 vip 添加

    # ip addr add 192.168.56.101/24 brd 192.168.56.255 dev bond0 label bond0:1
    # ip addr add 192.168.165.101/24 brd 192.168.165.255 dev bond1 label bond1:1
    # ifconfig
    # ip a

    node2 进行 arping 广播

    # arping -q -A -c 1 -I bond0:1 192.168.56.101
    # arping -q -A -c 1 -I bond1:1 10.10.165.101

    node2 的从库提升为主库

    $ /usr/lib/postgresql/9.1/bin/pg_ctl promote -D /var/lib/postgresql/9.1/main

    node2 查看pgsql 状态

    $ /usr/lib/postgresql/9.1/bin/pg_controldata /var/lib/postgresql/9.1/main  

    node2 上生成的 时间线拷贝到其余的从节点上
    pgsql 9.1 还需要手动 拷贝这个时间线文件,pgsql 9.3 开始就不需要手动拷贝生成的时间线文件了,会自动传输。这个一定要注意,否则再 log 文件会生成这样的FATAL 提示

    timeline 2 of the primary does not match recovery target timeline 1

    $ scp ./00000002.history postgres@192.168.56.100:/var/lib/postgresql/9.1/main/pg_xlog/
    $ scp ./00000002.history postgres@192.168.56.50:/var/lib/postgresql/9.1/main/pg_xlog/

    node1 启动pgsql 9.1

    # service postgresql start 
    或者
    $ /usr/lib/postgresql/9.1/bin/pg_ctl start -D /var/lib/postgresql/9.1/main -o "-c config_file=/etc/postgresql/9.1/main/postgresql.conf"

    node2 查看从库复制情况

    $ psql
    psql (9.1.22)
    postgres=# select client_addr,
           pg_xlog_location_diff(pg_current_xlog_insert_location(), pg_current_xlog_location() ) as local_noflush_delay,
           pg_xlog_location_diff(pg_current_xlog_location(), sent_location) as local_sent_delay,
           pg_xlog_location_diff(sent_location, write_location) as stream_write_delay,
           pg_xlog_location_diff(sent_location, flush_location) as stream_flush_delay,
           pg_xlog_location_diff(sent_location, replay_location) as stream_replay_delay 
    from pg_stat_replication;

    node2 查看master当前的 lsn

    $ psql
    psql (9.1.22)
    postgres=# select pg_current_xlog_insert_location(), 
           pg_current_xlog_location() 
    ;

    node3 查看slave当前的 lsn

    $ psql
    psql (9.1.22)
    postgres=# select pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location(),
           pg_last_xact_replay_timestamp(), 
           now() - pg_last_xact_replay_timestamp() AS replication_delay
    ;

    至此,pgsql 9.1 的master/slave 切换成功。由于使用了vip,应用不需要做任何的修改,只是在应用的日志里会输出到数据库的连接丢失或者无法连接数据库的提示。

    #######################################################################
    ##### 参考
    #######################################################################
    1、查看ip

    # ifconfig
    或者
    # ip -a

    注意 scope global bond0
    scope global secondary bond0:1

    2、取消bond0:1 (192.168.56.101)

    # ip addr del 192.168.56.101/24 dev bond0 label bond0:1
    或者
    # ifconfig bond0:1 192.168.56.101 down

    3、添加bond0:1 (192.168.56.101)

    # ip addr add 192.168.56.101/24 brd 192.168.56.255 dev bond0 label bond0:1
    或者
    # ifconfig bond0:1 192.168.56.101/24 up

    4、postgresql 关闭

    # service postgresql stop 
    或者
    $ /usr/lib/postgresql/9.1/bin/pg_ctl stop -m fast -D /var/lib/postgresql/9.1/main 

    5、postgresql 启动

    # service postgresql start 
    或者
    $ /usr/lib/postgresql/9.1/bin/pg_ctl start -D /var/lib/postgresql/9.1/main -o "-c config_file=/etc/postgresql/9.1/main/postgresql.conf"

    6、postgresql slave提升为master

    $ /usr/lib/postgresql/9.1/bin/pg_ctl promote -D /var/lib/postgresql/9.1/main  
    或者生成recovery.conf 的 trigger_file
    $ touch /tmp/postgresql.trigger.5432

    7、recovery.conf 文件

    vi recovery.conf 
    recovery_target_timeline = 'latest'
    standby_mode = on
    primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=rep123'
    trigger_file = '/tmp/postgresql.trigger.5432'

    注意,recovery_target_timeline 一定要设置为 latest ,否则就算把新的时间线文件拷贝到slave上,也会报 timeline 2 of the primary does not match recovery target timeline 1 的错误

    8、bond0:1、bond1:1 变更后执行 arping

    # arping -q -A -c 1 -I bond0:1 192.168.56.101
    # arping -q -A -c 1 -I bond1:1 10.10.165.101

    关于 arping 命令,可以参考 http://man.linuxde.net/arping

  • 相关阅读:
    GNSS学习笔记-观测量模型和定位定速方程
    矩阵学习-基本矩阵分类
    矩阵学习-QR分解和最小二乘问题求解
    ARM 处理器 MIPS/DMIPS/MFLOPS 理解
    GNSS学习笔记-信号频率分配表
    左移右移为负数的情况
    GNSS学习笔记-坐标转换
    GCC预编译宏查看
    Windows Store无法联网时安装WSL
    Virtual box中Ubuntu虚拟机磁盘碎片整理和空间清理方法
  • 原文地址:https://www.cnblogs.com/ctypyb2002/p/9793065.html
Copyright © 2020-2023  润新知