Common commands:
Change the virtual IP:
pcs resource update virtual_ip ip=10.16.10.200
Restart node services:
# Restart (clean up) the ClusterIP resource on a node — cleanup clears its failure history and re-probes its state
pcs resource cleanup ClusterIP
# Clean up all resources on the node
pcs resource cleanup
When both nodes are online and the service is running on node1, switch it to node2 with:
pcs cluster standby node1
Reactivate node1 afterwards:
pcs cluster unstandby node1
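To move a single resource instead of putting a whole node into standby, pcs can also do this directly (a sketch; resource name as above — on very old pcs versions the constraint may instead need removing via pcs constraint):
# Move virtual_ip to node2 (this pins it there via a location constraint)
pcs resource move virtual_ip node2
# Remove the pin afterwards so the resource is allowed to fail back
pcs resource clear virtual_ip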
Common commands:
Show cluster status: # pcs status
Show the current cluster configuration: # pcs config
Enable cluster autostart at boot: # pcs cluster enable --all
Start the cluster: # pcs cluster start --all
Stop the cluster: # pcs cluster stop --all (do not confuse with pcs cluster destroy, which permanently deletes the cluster configuration)
Show cluster resource status: # pcs resource show
Put a node into standby: # pcs cluster standby node1
Take a node out of standby: # pcs cluster unstandby node1
(Example — create a Filesystem resource: pcs resource create umail_data ocf:heartbeat:Filesystem device="/dev/mapper/umail_vg-umail_lv" directory="/umaildata" fstype="xfs")
----------------------------------------------------------------------------------------------------------
1. Install the usual convenience packages (node1, node2)
yum install vim wget tmux lrzsz unzip -y
2. Configure the servers' hosts records
Environment: (node1, node2)
192.168.1.181 drbd1.cspcs.com node1
192.168.1.216 drbd2.cspcs.com node2
(Important: running `hostname` must print exactly the hostname configured here)
(Remember to replace these values with your own environment; copying them blindly is a classic pitfall)
# /etc/hosts
127.0.0.1 localhost drbd1.cspcs.com localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.181 drbd1.cspcs.com node1
192.168.1.216 drbd2.cspcs.com node2
# /etc/sysconfig/network (on node1; use drbd2.cspcs.com on node2)
# Created by anaconda
NETWORKING=yes
HOSTNAME=drbd1.cspcs.com
node1
node2
3. Disable the firewall and SELinux to avoid errors during the installation; they can be re-enabled once deployment is complete (node1, node2)
# systemctl stop firewalld
# systemctl disable firewalld
# setenforce 0
# vi /etc/selinux/config
---------------
SELINUX=disabled
---------------
Then reboot both nodes (node1, node2):
reboot
4. Create the LVM volume (node1, node2); the interactive fdisk answers are sketched after the command list below
fdisk /dev/sdb
pvdisplay
pvcreate /dev/sdb1
vgcreate umail_vg /dev/sdb1
vgdisplay
lvcreate -l 4095 -n umail_lv umail_vg
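For reference, the fdisk step above is interactive; a minimal sketch of the answers, assuming /dev/sdb is a fresh disk dedicated to this volume group:
# fdisk /dev/sdb
#   n       -> new partition
#   p       -> primary
#   1       -> partition number 1
#   <Enter> -> accept the default first sector
#   <Enter> -> accept the default last sector (use the whole disk)
#   t       -> change the partition type
#   8e      -> Linux LVM
#   w       -> write the table and exit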
5. Time synchronization (node1, node2)
# yum install -y rdate
# rdate -s time-b.nist.gov
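To keep the clocks aligned afterwards, the same command can be scheduled via cron (a sketch; use whatever time server you prefer, and check the rdate path on your system):
# /etc/crontab entry: re-sync the clock daily at 03:00
0 3 * * * root /usr/bin/rdate -s time-b.nist.gov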
6. Install DRBD (node1, node2)
Since building from source did not succeed, install via yum instead (node1, node2):
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
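Note: kmod-drbd84 and drbd84-utils are shipped by the ELRepo repository. If ELRepo is not configured on the machines yet, install its release package first (a sketch, assuming CentOS 7; the URL is ELRepo's documented release package):
# yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm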
# yum install -y kmod-drbd84 drbd84-utils
# systemctl enable drbd
Configuration files:
# /etc/drbd.conf # main configuration file
# /etc/drbd.d/global_common.conf # global configuration file
Load the DRBD module and check that it is present in the kernel: (node1, node2)
# modprobe drbd
# lsmod |grep drbd
drbd 397041 0
libcrc32c 12644 2 xfs,drbd
If loading the DRBD module fails with the following error:
# modprobe drbd
FATAL: Module drbd not found.
Note: the kernel package is installed along with the dependencies, so this error normally does not occur. If it does, try a reboot first; if the problem persists after rebooting, proceed as follows:
Cause: the running kernel does not support the module, so the kernel must be updated.
Update the kernel with: yum install kernel (note: do not update if there was no error)
After updating, be sure to reboot the operating system!
After the reboot, check the kernel version again:
# uname -r
Now try loading the drbd module again:
# modprobe drbd
Resource configuration: (node1, node2)
# vi /etc/drbd.d/db.res
resource r0 {
    protocol C;
    startup { wfc-timeout 0; degr-wfc-timeout 120; }
    disk { on-io-error detach; }
    net {
        timeout 60;
        connect-int 10;
        ping-int 10;
        max-buffers 2048;
        max-epoch-size 2048;
    }
    syncer { rate 200M; }
    on drbd1.cspcs.com {
        device /dev/drbd0;
        disk /dev/mapper/umail_vg-umail_lv;
        address 192.168.1.181:7788;
        meta-disk internal;
    }
    on drbd2.cspcs.com {
        device /dev/drbd0;
        disk /dev/mapper/umail_vg-umail_lv;
        address 192.168.1.216:7788;
        meta-disk internal;
    }
}
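Before promoting anything, the resource metadata has to be created and the resource brought up on both nodes; the usual DRBD 8.4 sequence for the r0 resource defined above is (a sketch; the create-md step is the one referenced just below):
# drbdadm create-md r0
# drbdadm up r0
(after this, cat /proc/drbd on both nodes should show cs:Connected ro:Secondary/Secondary, with ds:Inconsistent/Inconsistent until the first sync)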
(If drbdadm create-md r0 reports an error about existing data on the device, wipe the start of the LV first with: dd if=/dev/zero of=/dev/mapper/umail_vg-umail_lv bs=1M count=1)
======================================================================
Promote drbd1.cspcs.com to primary: (node1 only; be sure to wait until the status described below is reached before going on to the next step)
You can watch the disk synchronization progress with cat /proc/drbd.
drbdsetup /dev/drbd0 primary --force
(on DRBD 8.4 the equivalent drbdadm form is: drbdadm primary --force r0)
[node1, node2]
mkdir /store
[node1]
mkfs.xfs /dev/drbd0
mount /dev/drbd0 /store
df -h
umount /store/
df -h
drbdadm secondary r0
cat /proc/drbd
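(At this point cat /proc/drbd should show ro:Secondary/Secondary on both nodes, with ds:UpToDate/UpToDate once the initial sync has finished.)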
7. Test failover (DRBD failover)
[node2]
mkdir /store
cat /proc/drbd
drbdadm primary r0
mount /dev/drbd0 /store
cat /proc/drbd
8. Install the PCS suite (node1, node2)
# yum install corosync pcs pacemaker -y
# The yum install above creates the hacluster user, which pcs uses to configure the cluster nodes
# Set the hacluster account password on both servers
# echo "password" | passwd --stdin hacluster
# Start the pcsd service on both servers
# systemctl start pcsd
From this point on, the commands only need to be run on node1
# Authorize the cluster nodes so they can communicate with each other
# pcs cluster auth node1 node2
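(pcs will prompt for the hacluster user's credentials; they can also be passed inline: pcs cluster auth node1 node2 -u hacluster -p password)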
# Add the cluster members: create a cluster named cluster_umail and add node1 and node2 as members
#pcs cluster setup --name cluster_umail node1 node2 --force
# Start the cluster nodes. Once the command below succeeds, the corosync and pacemaker daemons are started as well; check with systemctl status corosync and systemctl status pacemaker
(# pcs cluster start node1 — starts a single node)
#pcs cluster start --all
Check the cluster status after starting:
#pcs status cluster
Check the status of the cluster nodes:
#pcs status nodes
#corosync-cmapctl |grep members
#pcs status corosync
---------------------------
Cluster configuration
Check the configuration for errors:
#pcs status corosync
#crm_verify -L -V
#pcs property set stonith-enabled=false
#crm_verify -L -V
#pcs property set no-quorum-policy=ignore (in a two-node cluster there is no majority left once a node fails, so quorum must be ignored for resources to keep running)
#pcs property
For reference, this is what crm_verify reports before stonith-enabled=false is set:
[root@mail1 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@mail1 ~]#
Create the virtual IP and check its status
#pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.1.191 cidr_netmask=32 op monitor interval=30s
#pcs status resources
#ip addr
#pcs status — check the state of the virtual IP; run pcs cluster stop on one node and verify the virtual IP fails over to the other
Run pcs status on the other node to confirm the virtual IP has moved there
Add the DRBD resource to the cluster, mount it, and add the colocation/order constraints
[root@drbd1 ~]# pcs cluster cib add_drbd
[root@drbd1 ~]# ls -al add_drbd
-rw-r--r-- 1 root root 4083 Dec 12 21:11 add_drbd
[root@drbd1 ~]# pcs -f add_drbd resource create umaildata ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
[root@drbd1 ~]# pcs -f add_drbd resource master umaildata_sync umaildata master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
[root@drbd1 ~]# pcs -f add_drbd resource show
virtual_ip (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: umaildata_sync [umaildata]
Stopped: [ node1 node2 ]
[root@drbd1 ~]# pcs status
Cluster name: cluster_umail
Stack: corosync
Current DC: node1 (version 1.1.19-8.el7_6.1-c3c624ea3d) - partition with quorum
Last updated: Wed Dec 12 21:16:41 2018
Last change: Wed Dec 12 21:07:26 2018 by root via cibadmin on node1
2 nodes configured
1 resource configured
Online: [ node1 node2 ]
Full list of resources:
virtual_ip (ocf::heartbeat:IPaddr2): Started node1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
[root@drbd1 ~]# pcs -f add_drbd resource show
virtual_ip (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: umaildata_sync [umaildata]
Stopped: [ node1 node2 ]
[root@drbd1 ~]# pcs cluster cib-push add_drbd
CIB updated
[root@drbd1 ~]#
pcs cluster cib add_fs
pcs -f add_fs resource create umail_fs Filesystem device="/dev/drbd0" directory="/store" fstype="xfs"
pcs -f add_fs constraint colocation add umail_fs umaildata_sync INFINITY with-rsc-role=Master
df -h
pcs -f add_fs constraint order promote umaildata_sync then start umail_fs
pcs -f add_fs constraint colocation add virtual_ip umail_fs INFINITY
# The constraint below makes umail_fs start first, followed by virtual_ip
pcs -f add_fs constraint order umail_fs then virtual_ip
pcs cluster cib-push add_fs
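After the push, the resulting constraints can be reviewed at any time (a sketch; output varies with the configuration):
pcs constraint list --full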
ip addr
# First use pcs status to see which node currently has the mount, then run pcs cluster stop on that node and check that everything fails over to the other node; finally bring it back with pcs cluster start
pcs status
pcs cluster stop
pcs cluster start
##############################################################
# Configuring pcs and adding the virtual IP and DRBD resources: the corresponding command history
##############################################################
yum install corosync pcs pacemaker -y
echo "password" | passwd --stdin hacluster
systemctl start pcsd
pcs cluster auth node1 node2
pcs cluster setup --name cluster_umail node1 node2 --force
pcs cluster start --all
pcs status cluster
pcs status nodes
pcs status
corosync-cmapctl |grep 'members'
pcs status corosync
crm_verify -L -V
pcs property set stonith-enabled=false
crm_verify -L -V
pcs property set no-quorum-policy=ignore
pcs property
pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.1.191
pcs status
pcs cluster stop
pcs cluster start
pcs status
pcs cluster cib add_drbd
ls -al add_drbd
pcs -f add_drbd resource create umaildata ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
pcs -f add_drbd resource master umaildata_sync umaildata master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f add_drbd resource show
pcs cluster cib-push add_drbd
pcs status
# The steps below mount the DRBD device and add the related constraints
pcs cluster cib add_fs
pcs -f add_fs resource create umail_fs Filesystem device="/dev/drbd0" directory="/store" fstype="xfs"
pcs -f add_fs constraint colocation add umail_fs umaildata_sync INFINITY with-rsc-role=Master
df -h
pcs -f add_fs constraint order promote umaildata_sync then start umail_fs
pcs -f add_fs constraint colocation add virtual_ip umail_fs INFINITY
# The constraint below makes umail_fs start first, followed by virtual_ip
pcs -f add_fs constraint order umail_fs then virtual_ip
pcs cluster cib-push add_fs
ip addr
# First use pcs status to see which node currently has the mount, then run pcs cluster stop on that node and check that everything fails over to the other node; finally bring it back with pcs cluster start
pcs status
pcs cluster stop
pcs cluster start
##############################################################
9. U-Mail directory configuration
Install U-Mail on both servers and update it to the latest version
1. On node1 (first mount /store on node1):
# Move the data directories to /store
mv /usr/local/u-mail/data/mailbox /store
mv /usr/local/u-mail/data/backup /store <this directory may not exist by default; skip if absent>
mv /usr/local/u-mail/data/www/webmail/attachment /store
mv /usr/local/u-mail/data/www/webmail/netdisk /store
mv /usr/local/u-mail/data/mysql/default/umail /store
mv /usr/local/u-mail/data/mysql/default/ibdata1 /store
mv /usr/local/u-mail/data/mysql/default/ib_logfile0 /store
mv /usr/local/u-mail/data/mysql/default/ib_logfile1 /store
# Create symlinks
ln -s /store/mailbox /usr/local/u-mail/data/mailbox
ln -s /store/backup /usr/local/u-mail/data/backup <this directory may not exist by default; skip if absent>
ln -s /store/attachment /usr/local/u-mail/data/www/webmail/attachment
ln -s /store/netdisk /usr/local/u-mail/data/www/webmail/netdisk
ln -s /store/umail /usr/local/u-mail/data/mysql/default/umail
ln -s /store/ibdata1 /usr/local/u-mail/data/mysql/default/ibdata1
ln -s /store/ib_logfile0 /usr/local/u-mail/data/mysql/default/ib_logfile0
ln -s /store/ib_logfile1 /usr/local/u-mail/data/mysql/default/ib_logfile1
# Fix ownership
chown -R umail.root /usr/local/u-mail/data/mailbox/
chown -R umail.umail /usr/local/u-mail/data/backup/ <this directory may not exist by default; skip if absent>
chown -R umail_apache.umail_apache /usr/local/u-mail/data/www/webmail/attachment/
chown -R umail_apache.umail_apache /usr/local/u-mail/data/www/webmail/netdisk/
chown -R umail_mysql.umail_mysql /usr/local/u-mail/data/mysql/default/umail
chown -R umail_mysql.umail_mysql /usr/local/u-mail/data/mysql/default/ibdata1
chown -R umail_mysql.umail_mysql /usr/local/u-mail/data/mysql/default/ib_logfile0
chown -R umail_mysql.umail_mysql /usr/local/u-mail/data/mysql/default/ib_logfile1
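Before moving on, a quick sanity check that the links point into /store (a sketch):
# These paths should now display as symlinks (-> /store/...)
ls -l /usr/local/u-mail/data/mailbox /usr/local/u-mail/data/www/webmail/attachment
ls -l /usr/local/u-mail/data/mysql/default/umail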
2. On node2 (now mount /store on node2):
# Move the original contents out of the way
mv /usr/local/u-mail/data/mailbox{,_bak}
mv /usr/local/u-mail/data/backup{,_bak} <this directory may not exist by default; skip if absent>
mv /usr/local/u-mail/data/www/webmail/attachment{,_bak}
mv /usr/local/u-mail/data/www/webmail/netdisk{,_bak}
mv /usr/local/u-mail/data/mysql/default/umail{,_bak}
mv /usr/local/u-mail/data/mysql/default/ibdata1{,_bak}
mv /usr/local/u-mail/data/mysql/default/ib_logfile0{,_bak}
mv /usr/local/u-mail/data/mysql/default/ib_logfile1{,_bak}
# Create symlinks
ln -s /store/mailbox /usr/local/u-mail/data/mailbox
ln -s /store/backup /usr/local/u-mail/data/backup <this directory may not exist by default; skip if absent>
ln -s /store/attachment /usr/local/u-mail/data/www/webmail/attachment
ln -s /store/netdisk /usr/local/u-mail/data/www/webmail/netdisk
ln -s /store/umail /usr/local/u-mail/data/mysql/default/umail
ln -s /store/ibdata1 /usr/local/u-mail/data/mysql/default/ibdata1
ln -s /store/ib_logfile0 /usr/local/u-mail/data/mysql/default/ib_logfile0
ln -s /store/ib_logfile1 /usr/local/u-mail/data/mysql/default/ib_logfile1
10. Add the U-Mail services to the cluster
# Add the umail_mysqld service
pcs resource create umail_mysqld_server service:umail_mysqld op monitor interval="30" timeout="60" op start interval="0" timeout="60" op stop interval="0" timeout="60" meta target-role="Started"
# Add the umail_nginx service
pcs resource create umail_nginx_server service:umail_nginx op monitor interval="30" timeout="60" op start interval="0" timeout="60" op stop interval="0" timeout="60" meta target-role="Started"
# Add the umail_apache service
pcs resource create umail_apache_server service:umail_apache op monitor interval="30" timeout="60" op start interval="0" timeout="60" op stop interval="0" timeout="60" meta target-role="Started"
# Add the umail_postfix service
pcs resource create umail_postfix_server service:umail_postfix op monitor interval="30" timeout="60" op start interval="0" timeout="60" op stop interval="0" timeout="60" meta target-role="Started"
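These four resources use the service: standard, which on CentOS 7 means Pacemaker starts and stops them through systemd; the umail_* names are assumed to match the service units installed by U-Mail.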
# Add colocation constraints: umail_nginx_server, umail_apache_server, umail_postfix_server, and umail_mysqld_server must run on the same node as umail_fs
pcs constraint colocation add umail_nginx_server with umail_fs
pcs constraint colocation add umail_apache_server with umail_fs
pcs constraint colocation add umail_postfix_server with umail_fs
pcs constraint colocation add umail_mysqld_server with umail_fs
# Add an order constraint: start umail_fs first, then umail_mysqld_server
pcs constraint order umail_fs then umail_mysqld_server
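If the web and mail services should likewise start only after the filesystem is mounted, analogous order constraints can be added (an optional sketch using the same resource names):
pcs constraint order umail_fs then umail_nginx_server
pcs constraint order umail_fs then umail_apache_server
pcs constraint order umail_fs then umail_postfix_server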
11. Enable services at boot (node1, node2)
systemctl enable pcsd
systemctl enable corosync
systemctl enable pacemaker
vim /usr/lib/systemd/system/corosync.service
Add the following after line 8 (the ExecStartPre directive must go in the [Service] section):
ExecStartPre=/usr/bin/sleep 10
# Reload the systemd units
systemctl daemon-reload
cat /usr/lib/systemd/system/corosync.service
12. Reboot and test for yourself
Good luck!
------------------------------------------------------------------------------------------------------------------------
Appendix (written by Mr. Ma): common DRBD maintenance
1. Server maintenance recommendations:
1. Do not reboot both servers at the same time, or they may fight over resources (the term is split-brain); leave an interval of about 5 minutes.
2. Do not power on both servers at the same time, for the same reason (split-brain); leave an interval of about 5 minutes.
3. The heartbeat link currently runs over the 10.0.100.0 network. It is recommended to add an extra NIC to each server later and connect the two servers directly with a cable (configured on a different subnet), so that a failure of the 10.0.100.0 network cannot trigger a fight over resources (split-brain).
2. Upgrade notes:
1. After upgrading one server to the latest version, switch over to the other server and upgrade it to the latest version as well.
3. How to check whether synchronization is healthy:
The most basic check is to run df -h on both servers and look at the storage mounts:
Normal: one server has the DRBD-backed partition mounted and the other does not, drbd is running on both, and cat /proc/drbd reports a healthy state.
Abnormal case 1: both servers have the DRBD-backed partition mounted. This means split-brain has occurred; contact technical support.
Abnormal case 2: one server has the partition mounted and the other does not, but the drbd service is stopped and cat /proc/drbd reports an unhealthy state.
In the abnormal cases the drbd state is typically:
(1) both nodes report connection state StandAlone, or
(2) one node reports WFConnection while the other reports StandAlone.
Check the DRBD status on the primary and secondary servers:
/etc/init.d/drbd status
or
cat /proc/drbd
4. Causes of DRBD synchronization problems:
(1) automatic failover in the HA environment leading to split-brain;
(2) operator error or misconfiguration leading to split-brain;
(3) limited experience, admittedly — only the two split-brain causes above have been encountered so far;
(4) the drbd service was stopped.
5. Recovery procedure:
A typical problem state looks like this:
Secondary server (hlt1):
[root@hlt1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt1.holitech.net, 2016-10-31 10:43:50
m:res cs ro ds p mounted fstype
0:r0 WFConnection Secondary/Unknown UpToDate/DUnknown C
[root@hlt1 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt1.holitech.net, 2016-10-31 10:43:50
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:383860
Primary server (hlt2):
[root@hlt2 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt2.holitech.net, 2016-10-31 10:49:30
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
ns:0 nr:0 dw:987208 dr:3426933 al:1388 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1380568204
[root@hlt2 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt2.holitech.net, 2016-10-31 10:49:30
m:res cs ro ds p mounted fstype
0:r0 StandAlone Primary/Unknown UpToDate/DUnknown r----- ext4
1. On the secondary server (substitute your resource name; here it is r0):
[root@hlt1 ~]# drbdadm secondary r0
[root@hlt1 ~]# drbdadm --discard-my-data connect r0 (this discards the secondary's divergent data so it resyncs from the primary; if it returns an error, simply run it again)
2. On the primary server:
[root@hlt2 ~]# drbdadm connect r0
[root@hlt2 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by root@master.luodi.com, 2013-11-03 00:03:40
1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:6852 nr:0 dw:264460 dr:8393508 al:39 bm:512 lo:0 pe:2 ua:0 ap:0 ep:1 wo:d oos:257728
[>....................] sync'ed: 4.7% (257728/264412)K
finish: 0:03:47 speed: 1,112 (1,112) K/sec
3. Check on both servers that DRBD has returned to normal:
Secondary server:
[root@hlt1 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt1.holitech.net, 2016-10-31 10:43:50
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
ns:0 nr:1455736720 dw:1455736720 dr:0 al:0 bm:140049 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
Primary server:
[root@hlt2 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt2.holitech.net, 2016-10-31 10:49:30
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:1455737960 nr:0 dw:85995012 dr:1403665281 al:113720 bm:139737 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
If you run into the following errors:
'r0' not defined in your config (for this host). — check that the hosts file and the hostnames in the config match.
Exclusive open failed. Do it anyways — check whether the drbd service is already running and holding the device; stop it first.
Routine DRBD administration: