Building a highly available HTTP service with the Linux HA cluster software heartbeat

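There are many kinds of high-availability cluster software for Linux, such as heartbeat, corosync, RHCS, and keepalived, all of which help keep production services available. This article briefly walks through building a simple HTTP high-availability cluster with heartbeat v2 (release 2.1.4 here, configured through haresources).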

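Before building the HTTP high-availability cluster you need at least 2 hosts, and there are three main preparation steps:
1. Set the node names, and make sure every node can resolve every host in the cluster by its node name. So that name resolution does not itself become a point of failure, /etc/hosts is used here. The output of uname -n must match the configured hostname.
2. Set up mutual SSH trust between the two hosts.
3. Synchronize the clocks.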

Complete the preparation above before installing heartbeat. Two hosts (192.168.1.201 and 192.168.1.202) make up the cluster in this article.

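1. Log in to 192.168.1.201 and set the hostname with hostname test1.qiguo.com; also set HOSTNAME=test1.qiguo.com in /etc/sysconfig/network so the name survives a reboot. Add these two lines to /etc/hosts: 192.168.1.201 test1.qiguo.com test1 and 192.168.1.202 test2.qiguo.com test2. Then do the same on 192.168.1.202, using test2.qiguo.com as its hostname.
2. On 192.168.1.201 run ssh-keygen -t rsa, then ssh-copy-id -i ~/.ssh/id_rsa.pub root@test2 to set up key-based SSH login. Repeat on the other machine (targeting root@test1) so that trust works in both directions.
3. Synchronize the time on both servers with ntpdate 133.100.11.8 (any reachable NTP server address will do).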
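The name-resolution requirement from step 1 can be checked with a small script before going further. This is a minimal sketch, assuming the node names and aliases used above; hosts_has_names and node_name_matches are hypothetical helper names, not part of heartbeat.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every given name appears as a
# hostname or alias on a non-comment line of the given hosts file.
hosts_has_names() {
    hosts_file=$1
    shift
    for name in "$@"; do
        awk -v n="$name" '
            $1 ~ /^#/ { next }              # skip comment lines
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break    # stop at a trailing comment
                    if ($i == n) found = 1
                }
            }
            END { exit(found ? 0 : 1) }
        ' "$hosts_file" || { echo "missing: $name" >&2; return 1; }
    done
}

# Hypothetical helper: succeed only if the name is what `uname -n` reports.
node_name_matches() {
    [ "$(uname -n)" = "$1" ]
}
```

On test1 this would be called as hosts_has_names /etc/hosts test1.qiguo.com test2.qiguo.com followed by node_name_matches test1.qiguo.com; a non-zero exit status points at the step that still needs fixing.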

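Once these three steps are done, heartbeat can be installed. The packages can be downloaded from EPEL; four are needed: heartbeat-2.1.4-11.el5.i386, heartbeat-gui-2.1.4-11.el5.i386, heartbeat-pils-2.1.4-11.el5.i386, and heartbeat-stonith-2.1.4-11.el5.i386. They depend on two other packages, perl-MailTools-1.77-1.el5.noarch and libnet-1.1.6-7.el5.i386, so install those two first.

Running rpm -ivh perl-MailTools-1.77-1.el5.noarch.rpm fails with a dependency error, so install it with yum --nogpgcheck localinstall perl-MailTools-1.77-1.el5.noarch.rpm instead, which lets yum resolve the dependencies.

Then install the remaining packages together in the same way.

Note: these packages must be installed on both servers.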

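After installation, heartbeat's configuration directory is /etc/ha.d.

Under ha.d, rc.d holds the scripts related to resource management, resource.d holds the resource agent scripts, and the service script is at /etc/ha.d/heartbeat.

A fresh install has no configuration files, but the samples ha.cf, authkeys, and haresources can be copied from /usr/share/doc/heartbeat-2.1.4/ into /etc/ha.d.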

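The three configuration files do the following:
authkeys: the key file. Its permissions must be 600, otherwise the heartbeat service will not start.
ha.cf: the configuration file for the heartbeat service itself.
haresources: the resource configuration file.
Configuring these three files is all it takes to get the HTTP high-availability cluster running.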
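Copying the samples and fixing the authkeys permissions can be done in one step. A minimal sketch, assuming the sample files sit directly in the documentation directory named above; install_ha_samples is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: copy the three sample files from the doc directory
# into the config directory, then restrict authkeys to its owner.
install_ha_samples() {
    doc_dir=$1
    conf_dir=$2
    for f in ha.cf authkeys haresources; do
        cp "$doc_dir/$f" "$conf_dir/$f" || return 1
    done
    # heartbeat refuses to start unless authkeys is mode 600.
    chmod 600 "$conf_dir/authkeys"
}
```

Run as root, the call would be install_ha_samples /usr/share/doc/heartbeat-2.1.4 /etc/ha.d.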

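First, the authkeys file. The sample contains:
#auth 1 selects which key is used for authentication
#1 crc cyclic redundancy check
#2 sha1 HI! sha1 authentication
#3 md5 Hello! md5 authentication
sha1 or md5 is the better choice here: crc only detects corrupted packets and offers no real authentication. With md5, the file looks like this:
auth 1 # use the line below that starts with 1 as the authentication key
1 md5 9adc3f50d9bb9e9c795fce0a839aa766
To generate an md5 string to use as the key, run echo "qiguo" | md5sum in a shell.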
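As an alternative to hashing a fixed word, the key can be derived from random bytes. This is a sketch under stated assumptions rather than the author's method: it uses sha1 instead of md5, reads from /dev/urandom, and write_authkeys is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write an authkeys file holding a single sha1 key
# (index 1) derived from random bytes, readable only by its owner.
write_authkeys() {
    out=$1
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | cut -d' ' -f1)
    [ -n "$key" ] || return 1
    # Create the file with a restrictive umask, then force mode 600 in
    # case the file already existed with wider permissions.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" ) || return 1
    chmod 600 "$out"
}
```

Both nodes need the same key, so generate the file once and copy it to the second node rather than running the helper on each machine.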

The second file, ha.cf, has many options. A brief overview:

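    #debugfile   /var/log/ha-debug  # debug log file; uncomment to enable debug logging
    logfile   /var/log/ha-log  # where the log file is written
    #logfacility   local0  # syslog facility; left disabled here because logfile is used
    keepalive   2   # interval between heartbeats, in seconds
    #deadtime   30   # how long a node may stay silent before it is declared dead
    #warntime   10   # how long before a late-heartbeat warning is logged
    #initdead   120  # dead time applied right after startup, to give the other node time to come up
    #udpport    694  # UDP port used for heartbeat traffic
    #baud     19200  # speed of the serial line
    bcast    eth0   # send heartbeats by broadcast (used here; just enable bcast eth0, but note that broadcast is wasteful on a LAN with many machines)
    #mcast    eth0 225.0.0.1 694 1 0  # send heartbeats by multicast
    #ucast    eth0 192.168.1.2 # send heartbeats by unicast
    auto_failback on  # whether resources move back to the primary node once it recovers; on means they do
    #stonith baytech /etc/ha.d/conf/stonith.baytech  # STONITH definition: how to fence a node that has dropped out
    #node ken3  # cluster node names; one node directive per node, and each value must match uname -n
    node test1.qiguo.com
    node test2.qiguo.com
    #ping 10.10.10.254  # address of a ping node
    ping 192.168.1.1    # the gateway address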
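Because every node directive has to match uname -n exactly, it is worth checking ha.cf on each host before starting the service. A minimal sketch; hacf_nodes and hacf_has_node are hypothetical helper names, and the parsing only looks at active (uncommented) node lines.

```shell
#!/bin/sh
# Hypothetical helper: print the node names from active "node" lines,
# one per line, ignoring commented-out directives and trailing comments.
hacf_nodes() {
    awk '$1 == "node" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break
            print $i
        }
    }' "$1"
}

# Hypothetical helper: succeed if the given name is listed in ha.cf.
hacf_has_node() {
    hacf_nodes "$1" | grep -Fxq "$2"
}
```

On each node, hacf_has_node /etc/ha.d/ha.cf "$(uname -n)" should succeed; if it fails, either the hostname or the node line needs correcting.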

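The third file, haresources, defines the cluster resources.

The sample file contains many examples. Take one of them: #node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
node1 is the name of the primary node, 10.0.0.170 is the VIP, and Filesystem is a resource agent (resource agents are looked up in /etc/ha.d/resource.d and /etc/init.d/, and "::" separates the agent from its arguments). For the HTTP cluster here, a single line is enough:
test1.qiguo.com IPaddr::192.168.1.210/24/eth0 httpd
Once the three files are ready, copy them to 192.168.1.202, then install the httpd service on both hosts.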
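To make the haresources syntax concrete, the sketch below splits one line into the preferred node and its resources, breaking each resource into agent and arguments at "::". It is only an illustration of the format described above, not how heartbeat itself parses the file; parse_haresources_line is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "node<TAB>name" for the preferred node, then
# one line per resource: the agent name, followed by a TAB and its
# space-separated arguments when it has any.
parse_haresources_line() {
    printf '%s\n' "$1" | awk '{
        print "node\t" $1
        for (i = 2; i <= NF; i++) {
            n = split($i, part, "::")
            args = ""
            for (j = 2; j <= n; j++)
                args = args (j > 2 ? " " : "") part[j]
            print part[1] (args == "" ? "" : "\t" args)
        }
    }'
}
```

For the line used in this article, the output lists test1.qiguo.com as the node, IPaddr with the argument 192.168.1.210/24/eth0, and httpd with no arguments, in the same left-to-right order in which the resources appear on the line.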

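The httpd service must not be set to start automatically at boot on either host (for example, disable it with chkconfig httpd off), because heartbeat is what starts and stops it.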

Once everything is configured, make sure httpd is stopped and then start the heartbeat service. The log shows:

heartbeat[4825]: 2014/05/11_23:54:35 info: Version 2 support: false
heartbeat[4825]: 2014/05/11_23:54:35 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[4825]: 2014/05/11_23:54:35 info: **************************
heartbeat[4825]: 2014/05/11_23:54:35 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat[4826]: 2014/05/11_23:54:35 info: heartbeat: version 2.1.4
heartbeat[4826]: 2014/05/11_23:54:35 info: Heartbeat generation: 1399811242
heartbeat[4826]: 2014/05/11_23:54:35 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[4826]: 2014/05/11_23:54:35 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[4826]: 2014/05/11_23:54:35 info: glib: ping heartbeat started.
heartbeat[4826]: 2014/05/11_23:54:35 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[4826]: 2014/05/11_23:54:35 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[4826]: 2014/05/11_23:54:35 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[4826]: 2014/05/11_23:54:35 info: Local status now set to: 'up'
heartbeat[4826]: 2014/05/11_23:54:36 info: Link test1.qiguo.com:eth0 up.
heartbeat[4826]: 2014/05/11_23:54:36 info: Link 192.168.1.1:192.168.1.1 up.
heartbeat[4826]: 2014/05/11_23:54:36 info: Status update for node 192.168.1.1: status ping
heartbeat[4826]: 2014/05/11_23:54:41 info: Link test2.qiguo.com:eth0 up.
heartbeat[4826]: 2014/05/11_23:54:41 info: Status update for node test2.qiguo.com: status up
harc[4835]:     2014/05/11_23:54:41 info: Running /etc/ha.d/rc.d/status status
heartbeat[4826]: 2014/05/11_23:54:42 info: Comm_now_up(): updating status to active
heartbeat[4826]: 2014/05/11_23:54:42 info: Local status now set to: 'active'
heartbeat[4826]: 2014/05/11_23:54:42 info: Status update for node test2.qiguo.com: status active
harc[4853]:     2014/05/11_23:54:42 info: Running /etc/ha.d/rc.d/status status
heartbeat[4826]: 2014/05/11_23:54:53 info: remote resource transition completed.
heartbeat[4826]: 2014/05/11_23:54:53 info: remote resource transition completed.
heartbeat[4826]: 2014/05/11_23:54:53 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[4907]:   2014/05/11_23:54:53 INFO:  Resource is stopped
heartbeat[4871]: 2014/05/11_23:54:53 info: Local Resource acquisition completed.
harc[4957]:     2014/05/11_23:54:53 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[4957]:  2014/05/11_23:54:53 received ip-request-resp IPaddr::192.168.1.210/24/eth0 OK yes
ResourceManager[4976]:  2014/05/11_23:54:53 info: Acquiring resource group: test1.qiguo.com IPaddr::192.168.1.210/24/eth0 httpd
IPaddr[5002]:   2014/05/11_23:54:53 INFO:  Resource is stopped
ResourceManager[4976]:  2014/05/11_23:54:53 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.210/24/eth0 start
IPaddr[5097]:   2014/05/11_23:54:53 INFO: Using calculated netmask for 192.168.1.210: 255.255.255.0
IPaddr[5097]:   2014/05/11_23:54:53 INFO: eval ifconfig eth0:0 192.168.1.210 netmask 255.255.255.0 broadcast 192.168.1.255
IPaddr[5068]:   2014/05/11_23:54:53 INFO:  Success
ResourceManager[4976]:  2014/05/11_23:54:53 info: Running /etc/init.d/httpd  start

The log shows that the highly available HTTP cluster is now up.
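In a long log it can help to pull out only the commands that ResourceManager actually ran. A minimal sketch, assuming the log line format shown above and the log path configured in ha.cf; resource_actions is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print, in order, the commands that ResourceManager
# logged as "Running ...", dropping the process tag and timestamp.
resource_actions() {
    sed -n 's/^ResourceManager\[[0-9]*\]:.* info: Running \(.*\)$/\1/p' "$1"
}
```

Called as resource_actions /var/log/ha-log, it lists the IPaddr start followed by the httpd start for the startup above, which confirms that the VIP is brought up before the web server.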

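Now shut down test1 manually with shutdown -h now and watch how the log on the standby server changes. (You can also switch over with the hb_standby script that ships with heartbeat, found in /usr/lib/heartbeat by default.)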

heartbeat[11796]: 2014/05/11_20:56:46 info: Received shutdown notice from 'test1.qiguo.com'.
heartbeat[11796]: 2014/05/11_20:56:46 info: Resources being acquired from test1.qiguo.com.
heartbeat[11862]: 2014/05/11_20:56:46 info: acquire local HA resources (standby).
heartbeat[11863]: 2014/05/11_20:56:46 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys test2.qiguo.com] to acquire.
heartbeat[11862]: 2014/05/11_20:56:46 info: local HA resource acquisition completed (standby).
heartbeat[11796]: 2014/05/11_20:56:46 info: Standby resource acquisition done [all].
harc[11888]:    2014/05/11_20:56:46 info: Running /etc/ha.d/rc.d/status status
mach_down[11903]:       2014/05/11_20:56:46 info: Taking over resource group IPaddr::192.168.1.210/24/eth0
ResourceManager[11928]: 2014/05/11_20:56:46 info: Acquiring resource group: test1.qiguo.com IPaddr::192.168.1.210/24/eth0 httpd
IPaddr[11954]:  2014/05/11_20:56:46 INFO:  Resource is stopped
ResourceManager[11928]: 2014/05/11_20:56:46 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.210/24/eth0 start
IPaddr[12049]:  2014/05/11_20:56:46 INFO: Using calculated netmask for 192.168.1.210: 255.255.255.0
IPaddr[12049]:  2014/05/11_20:56:46 INFO: eval ifconfig eth0:0 192.168.1.210 netmask 255.255.255.0 broadcast 192.168.1.255
IPaddr[12020]:  2014/05/11_20:56:46 INFO:  Success
ResourceManager[11928]: 2014/05/11_20:56:46 info: Running /etc/init.d/httpd  start
mach_down[11903]:       2014/05/11_20:56:46 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[11903]:       2014/05/11_20:56:46 info: mach_down takeover complete for node test1.qiguo.com.
heartbeat[11796]: 2014/05/11_20:56:46 info: mach_down takeover complete.

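The standby server's log shows that test2 has taken over all the resources; accessing 192.168.1.210 now returns the content served by test2. When test1 comes back online it takes the resources back, because auto_failback is set to on in ha.cf; that log is omitted here. At this point a simple highly available httpd service is in place.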

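A highly available httpd service often needs its content to be shared between the nodes, so a shared file system is sometimes required. That only takes one more resource, a file system, in haresources. Here an NFS export is mounted:

test1.qiguo.com IPaddr::192.168.1.210/24/eth0 Filesystem::192.168.1.230:/html::/var/www/html::nfs httpd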

Original article: https://www.cnblogs.com/wgwyanfs/p/7009962.html