http://www.5dmail.net/html/2004-11-1/200411193630.htm linux 双机热备份
Red hat 9 linux的集群安装比较简单,需要的安装文件有以下几个:
net-snmp-5.0.6-17.i386.rpm
按顺序一次安装
1、heartbeat-pils-1.0.4-2.rh.9.um.1.i386.rpm
2、net-snmp-5.0.6-17.i386.rpm
#rpm -ivh heartbeat-pils-1.0.4-2.rh.9.um.1.i386.rpm
#rpm -ivh net-snmp-5.0.6-17.i386.rpm
#rpm -ivh heartbeat-1.0.4-2.rh.9.um.1.i386.rpm
安装完成之后,开始配置主服务器。配置文件位于/etc/ha.d下,用rpm安装之后不会产生配置文件,需要从/usr/share/doc
/heartbeat-1.0.4下,把ha.cf,,,,authkeys,,,,,,,,haresources,,,,三个文件cp到/etc
/ha.d下面。
文件在ha.cf是主要heartbeat的配置文件,authkeys是heartbeat的安全配置文件,haresource文件是heartbeat的资源文件
其文件说明如下:
ha.cf
#############################################################################################
#
# There are lots of options in this file. All you have to have is a set
# of nodes listed {"node ...}
# and one of {serial, bcast, mcast, or ucast}
#
# ATTENTION: As the configuration file is read line by line,
# THE ORDER OF DIRECTIVE MATTERS!
#
# In particular, make sure that the timings and udpport
# et al are set before the heartbeat media are defined!
# All will be fine if you keep them ordered as in this
# example.
#
#
# Note on logging:
# If any of debugfile, logfile and logfacility are defined then they
# will be used. If debugfile and/or logfile are not defined and
# logfacility is defined then the respective logging and debug
# messages will be loged to syslog. If logfacility is not defined
# then debugfile and logfile will be used to log messges. If
# logfacility is not defined and debugfile and/or logfile are not
# defined then defaults will be used for debugfile and logfile as
# required and messages will be sent there.
#
# File to write debug messages to
debugfile /var/log/ha-debug 【heartbeat的debug信息记录文件】
#
#
# File to write other messages to
#
logfile /var/log/ha-log 【日志文件】
#
#
# Facility to use for syslog()/logger
#
logfacility local0 【记录日志在syslog中,可选项】
#
#
# A note on specifying "how long" times below...
#
# The default time unit is seconds
# 10 means ten seconds
#
# You can also specify them in milliseconds
# 1500ms means 1.5 seconds
#
#
# keepalive: how long between heartbeats?
#
keepalive 3 【每3秒发送一次keeplive消息】
#
# deadtime: how long-to-declare-host-dead?
#
deadtime 15 【如果15秒没有收到keeplive消息将会认为节点已经失效】
#
# warntime: how long before issuing "late heartbeat" warning?
# See the FAQ for how to use warntime to tune deadtime.
#
warntime 10 【在日志中记录最后心跳last heartbeat-best 前的警告时间】
#
#
# Very first dead time (initdead)
#
# On some machines/OSes, etc. the network takes a while to come up
# and start working right after you've been rebooted. As a result
# we have a separate dead time for when things first come up.
# It should be at least twice the normal dead time.
#
initdead 60 【如果节点的机器重启后,可能需要一些时间启动网络,这个时间与deadtime不一样,要单独对待】
#
#
# nice_failback: determines whether a resource will
# automatically fail back to its "primary" node, or remain
# on whatever node is serving it until that node fails.
#
# The default is "off", which means that it WILL fail
# back to the node which is declared as primary in haresources
#
# "on" means that resources only move to new nodes when
# the nodes they are served on die. This is deemed as a
# "nice" behavior (unless you want to do active-active).
#
nice_failback on 【如果主节点失效之后,重新恢复后,不会再成为主节点,只有当当前主节点失效,此节点才可恢复
# There are lots of options in this file. All you have to have is a set
# of nodes listed {"node ...}
# and one of {serial, bcast, mcast, or ucast}
#
# ATTENTION: As the configuration file is read line by line,
# THE ORDER OF DIRECTIVE MATTERS!
#
# In particular, make sure that the timings and udpport
# et al are set before the heartbeat media are defined!
# All will be fine if you keep them ordered as in this
# example.
#
#
# Note on logging:
# If any of debugfile, logfile and logfacility are defined then they
# will be used. If debugfile and/or logfile are not defined and
# logfacility is defined then the respective logging and debug
# messages will be loged to syslog. If logfacility is not defined
# then debugfile and logfile will be used to log messges. If
# logfacility is not defined and debugfile and/or logfile are not
# defined then defaults will be used for debugfile and logfile as
# required and messages will be sent there.
#
# File to write debug messages to
debugfile /var/log/ha-debug 【heartbeat的debug信息记录文件】
#
#
# File to write other messages to
#
logfile /var/log/ha-log 【日志文件】
#
#
# Facility to use for syslog()/logger
#
logfacility local0 【记录日志在syslog中,可选项】
#
#
# A note on specifying "how long" times below...
#
# The default time unit is seconds
# 10 means ten seconds
#
# You can also specify them in milliseconds
# 1500ms means 1.5 seconds
#
#
# keepalive: how long between heartbeats?
#
keepalive 3 【每3秒发送一次keeplive消息】
#
# deadtime: how long-to-declare-host-dead?
#
deadtime 15 【如果15秒没有收到keeplive消息将会认为节点已经失效】
#
# warntime: how long before issuing "late heartbeat" warning?
# See the FAQ for how to use warntime to tune deadtime.
#
warntime 10 【在日志中记录最后心跳last heartbeat-best 前的警告时间】
#
#
# Very first dead time (initdead)
#
# On some machines/OSes, etc. the network takes a while to come up
# and start working right after you've been rebooted. As a result
# we have a separate dead time for when things first come up.
# It should be at least twice the normal dead time.
#
initdead 60 【如果节点的机器重启后,可能需要一些时间启动网络,这个时间与deadtime不一样,要单独对待】
#
#
# nice_failback: determines whether a resource will
# automatically fail back to its "primary" node, or remain
# on whatever node is serving it until that node fails.
#
# The default is "off", which means that it WILL fail
# back to the node which is declared as primary in haresources
#
# "on" means that resources only move to new nodes when
# the nodes they are served on die. This is deemed as a
# "nice" behavior (unless you want to do active-active).
#
nice_failback on 【如果主节点失效之后,重新恢复后,不会再成为主节点,只有当当前主节点失效,此节点才可恢复
为主节点】
#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
#
# Baud rate for serial ports...
# (must precede "serial" directives)
#
#baud 19200
#
# serial serialportname ...
#serial /dev/ttyS0 # Linux
#serial /dev/cuaa0 # FreeBSD
#serial /dev/cua/a # Solaris
#
# What UDP port to use for communication?
# [used by bcast and ucast]
#
#udpport 694
#
# What interfaces to broadcast heartbeats over?
#
#bcast eth1 # Linux
#bcast eth1 eth2 # Linux
#bcast le0 # Solaris
#bcast le1 le2 # Solaris
#
# Set up a multicast heartbeat medium
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
# 224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (no reason to differ
# from the port used for broadcast heartbeats)
# [ttl] the ttl value for outbound heartbeats. This affects
# how far the multicast packet will propagate. (1-255)
# [loop] toggles loopback for outbound multicast heartbeats.
# if enabled, an outbound packet will be looped back and
# received by the interface it was sent on. (0 or 1)
# This field should always be set to 0.
#
#
mcast eth1 225.0.0.22 694 1 0 【使用组播225.0.0.22,端口694发送keeplive消息】
#
# Set up a unicast / udp heartbeat medium
# ucast [dev] [peer-ip-addr]
#
# [dev] device to send/rcv heartbeats on
# [peer-ip-addr] IP address of peer to send packets to
#
#ucast eth0 192.168.1.2
#
#
# Watchdog is the watchdog timer. If our own heart doesn't beat for
# a minute, then our machine will reboot.
#
#watchdog /dev/watchdog
#
# "Legacy" STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
#
# stonith <stonith_type> <configfile>
#
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
# STONITH support
# You can configure multiple stonith devices using this directive.
# The format of the line is:
# stonith_host <hostfrom> <stonith_type> <params...>
# <hostfrom> is the machine the stonith device is attached
# to or * to mean it is accessible from any host.
# <stonith_type> is the type of stonith device (a list of
# supported drives is in /usr/lib/stonith.)
# <params...> are driver specific parameters. To see the
# format for a particular device, run:
# stonith -l -t <stonith_type>
#
#
# Note that if you put your stonith device access information in
# here, and you make this file publically readable, you're asking
# for a denial of service attack ;-)
#
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node rh-9-a 【定义节点名称,必须是节点的主机名】
node rh-9-b
#
# Less common options...
#
# Treats 10.10.10.254 as a psuedo-cluster-member
#
#ping www.163.com www.google.com
#
# Started and stopped with heartbeat. Restarted unless it exits
# with rc=100
#
#respawn userid /path/name/to/run
#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
#
# Baud rate for serial ports...
# (must precede "serial" directives)
#
#baud 19200
#
# serial serialportname ...
#serial /dev/ttyS0 # Linux
#serial /dev/cuaa0 # FreeBSD
#serial /dev/cua/a # Solaris
#
# What UDP port to use for communication?
# [used by bcast and ucast]
#
#udpport 694
#
# What interfaces to broadcast heartbeats over?
#
#bcast eth1 # Linux
#bcast eth1 eth2 # Linux
#bcast le0 # Solaris
#bcast le1 le2 # Solaris
#
# Set up a multicast heartbeat medium
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
# 224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (no reason to differ
# from the port used for broadcast heartbeats)
# [ttl] the ttl value for outbound heartbeats. This affects
# how far the multicast packet will propagate. (1-255)
# [loop] toggles loopback for outbound multicast heartbeats.
# if enabled, an outbound packet will be looped back and
# received by the interface it was sent on. (0 or 1)
# This field should always be set to 0.
#
#
mcast eth1 225.0.0.22 694 1 0 【使用组播225.0.0.22,端口694发送keeplive消息】
#
# Set up a unicast / udp heartbeat medium
# ucast [dev] [peer-ip-addr]
#
# [dev] device to send/rcv heartbeats on
# [peer-ip-addr] IP address of peer to send packets to
#
#ucast eth0 192.168.1.2
#
#
# Watchdog is the watchdog timer. If our own heart doesn't beat for
# a minute, then our machine will reboot.
#
#watchdog /dev/watchdog
#
# "Legacy" STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
#
# stonith <stonith_type> <configfile>
#
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
# STONITH support
# You can configure multiple stonith devices using this directive.
# The format of the line is:
# stonith_host <hostfrom> <stonith_type> <params...>
# <hostfrom> is the machine the stonith device is attached
# to or * to mean it is accessible from any host.
# <stonith_type> is the type of stonith device (a list of
# supported drives is in /usr/lib/stonith.)
# <params...> are driver specific parameters. To see the
# format for a particular device, run:
# stonith -l -t <stonith_type>
#
#
# Note that if you put your stonith device access information in
# here, and you make this file publically readable, you're asking
# for a denial of service attack ;-)
#
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node rh-9-a 【定义节点名称,必须是节点的主机名】
node rh-9-b
#
# Less common options...
#
# Treats 10.10.10.254 as a psuedo-cluster-member
#
#ping www.163.com www.google.com
#
# Started and stopped with heartbeat. Restarted unless it exits
# with rc=100
#
#respawn userid /path/name/to/run
##################################################################
authkeys
#
# Authentication file. Must be mode 600
#
#
# Must have exactly one auth directive at the front.
# auth send authentication using this method-id
#
# Then, list the method and key that go with that method-id
#
# Available methods: crc sha1, md5. Crc doesn't need/want a key.
#
# You normally only have one authentication method-id listed in this file
#
# Put more than one to make a smooth transition when changing auth
# methods and/or keys.
#
#
# sha1 is believed to be the "best", md5 next best.
#
# crc adds no security, except from packet corruption.
# Use only on physically secure networks.
#
auth 3 【指定认证加密方式,3 表示加密方式的行号】
#1 crc
#2 sha1 HI!
3 md5 Hello! 【使用md5加密,密码为hello!】
# Authentication file. Must be mode 600
#
#
# Must have exactly one auth directive at the front.
# auth send authentication using this method-id
#
# Then, list the method and key that go with that method-id
#
# Available methods: crc sha1, md5. Crc doesn't need/want a key.
#
# You normally only have one authentication method-id listed in this file
#
# Put more than one to make a smooth transition when changing auth
# methods and/or keys.
#
#
# sha1 is believed to be the "best", md5 next best.
#
# crc adds no security, except from packet corruption.
# Use only on physically secure networks.
#
auth 3 【指定认证加密方式,3 表示加密方式的行号】
#1 crc
#2 sha1 HI!
3 md5 Hello! 【使用md5加密,密码为hello!】
####################################################################################################################################
#
# This is a list of resources that move from machine to machine as
# nodes go down and come up in the cluster. Do not include
# "administrative" or fixed IP addresses in this file.
#
# <VERY IMPORTANT NOTE>
# The haresources files MUST BE IDENTICAL on all nodes of the cluster.
#
# The node names listed in front of the resource group information
# is the name of the preferred node to run the service. It is
# not necessarily the name of the current machine. If you are running
# nice_failback OFF then these services will be started
# up on the preferred nodes - any time they're up.
#
# If you are running with nice_failback ON, then the node information
# will be used in the case of a simultaneous start-up.
#
# BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
# If your files are different then almost certainly something
# won't work right.
# </VERY IMPORTANT NOTE>
#
#
# We refer to this file when we're coming up, and when a machine is being
# taken over after going down.
#
# You need to make this right for your installation, then install it in
# /etc/ha.d
#
# Each logical line in the file constitutes a "resource group".
# A resource group is a list of resources which move together from
# one node to another - in the order listed. It is assumed that there
# is no relationship between different resource groups. These
# resource in a resource group are started left-to-right, and stopped
# right-to-left. Long lists of resources can be continued from line
# to line by ending the lines with backslashes ("\").
#
# These resources in this file are either IP addresses, or the name
# of scripts to run to "start" or "stop" the given resource.
#
# The format is like this:
#
#node-name resource1 resource2 ... resourceN
#
#
# If the resource name contains an :: in the middle of it, the
# part after the :: is passed to the resource script as an argument.
# Multiple arguments are separated by the :: delimeter
#
# In the case of IP addresses, the resource script name IPaddr is
# implied.
#
# For example, the IP address 135.9.8.7 could also be represented
# as IPaddr::135.9.8.7
#
# THIS IS IMPORTANT!! vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#
# The given IP address is directed to an interface which has a route
# to the given address. This means you have to have a net route
# set up outside of the High-Availability structure. We don't set it
# up here -- we key off of it.
#
# The broadcast address for the IP alias that is created to support
# an IP address defaults to the highest address on the subnet.
#
# The netmask for the IP alias that is created defaults to the same
# netmask as the route that it selected in in the step above.
#
# The base interface for the IPalias that is created defaults to the
# same netmask as the route that it selected in in the step above.
#
# If you want to specify that this IP address is to be brought up
# on a subnet with a netmask of 255.255.255.0, you would specify
# this as IPaddr::135.9.8.7/24 .
#
# If you wished to tell it that the broadcast address for this subnet
# was 135.9.8.210, then you would specify that this way:
# IPaddr::135.9.8.7/24/135.9.8.210
#
# If you wished to tell it that the interface to add the address to
# is eth0, then you would need to specify it this way:
# IPaddr::135.9.8.7/24/eth0
#
# And this way to specify both the broadcast address and the
# interface:
# IPaddr::135.9.8.7/24/eth0/135.9.8.210
#
# The IP addresses you list in this file are called "service" addresses,
# since they're they're the publicly advertised addresses that clients
# use to get at highly available services.
#
# For a hot/standby (non load-sharing) 2-node system with only
# a single service address,
# you will probably only put one system name and one IP address in here.
# The name you give the address to is the name of the default "hot"
# system.
#
# Where the nodename is the name of the node which "normally" owns the
# resource. If this machine is up, it will always have the resource
# it is shown as owning.
#
# The string you put in for nodename must match the uname -n name
# of your machine. Depending on how you have it administered, it could
# be a short name or a FQDN.
#
#-------------------------------------------------------------------
#
# Simple case: One service address, default subnet and netmask
# No servers that go up and down with the IP address
#
#just.linux-ha.org 135.9.216.110
#
#-------------------------------------------------------------------
#
# Assuming the adminstrative addresses are on the same subnet...
# A little more complex case: One service address, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 http
#-------------------------------------------------------------------
#
# A little more complex case: Three service addresses, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
# One service address, with the subnet, interface and bcast addr
# explicitly defined.
#
#just.linux-ha.org 135.9.216.3/28/eth0/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
# An example where a shared filesystem is to be used.
# Note that multiple aguments are passed to this script using
# the delimiter '::' to separate each argument.
#
rh-9-a 11.1.1.96/24/eth0 【定义主节点使用的公网IP,掩码和接口名称】
#
# Regarding the node-names in this file:
#
# They must match the names of the nodes listed in ha.cf, which in turn
# must match the `uname -n` of some node in the cluster. So they aren't
# virtual in any sense of the word.
#
# This is a list of resources that move from machine to machine as
# nodes go down and come up in the cluster. Do not include
# "administrative" or fixed IP addresses in this file.
#
# <VERY IMPORTANT NOTE>
# The haresources files MUST BE IDENTICAL on all nodes of the cluster.
#
# The node names listed in front of the resource group information
# is the name of the preferred node to run the service. It is
# not necessarily the name of the current machine. If you are running
# nice_failback OFF then these services will be started
# up on the preferred nodes - any time they're up.
#
# If you are running with nice_failback ON, then the node information
# will be used in the case of a simultaneous start-up.
#
# BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
# If your files are different then almost certainly something
# won't work right.
# </VERY IMPORTANT NOTE>
#
#
# We refer to this file when we're coming up, and when a machine is being
# taken over after going down.
#
# You need to make this right for your installation, then install it in
# /etc/ha.d
#
# Each logical line in the file constitutes a "resource group".
# A resource group is a list of resources which move together from
# one node to another - in the order listed. It is assumed that there
# is no relationship between different resource groups. These
# resource in a resource group are started left-to-right, and stopped
# right-to-left. Long lists of resources can be continued from line
# to line by ending the lines with backslashes ("\").
#
# These resources in this file are either IP addresses, or the name
# of scripts to run to "start" or "stop" the given resource.
#
# The format is like this:
#
#node-name resource1 resource2 ... resourceN
#
#
# If the resource name contains an :: in the middle of it, the
# part after the :: is passed to the resource script as an argument.
# Multiple arguments are separated by the :: delimeter
#
# In the case of IP addresses, the resource script name IPaddr is
# implied.
#
# For example, the IP address 135.9.8.7 could also be represented
# as IPaddr::135.9.8.7
#
# THIS IS IMPORTANT!! vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#
# The given IP address is directed to an interface which has a route
# to the given address. This means you have to have a net route
# set up outside of the High-Availability structure. We don't set it
# up here -- we key off of it.
#
# The broadcast address for the IP alias that is created to support
# an IP address defaults to the highest address on the subnet.
#
# The netmask for the IP alias that is created defaults to the same
# netmask as the route that it selected in in the step above.
#
# The base interface for the IPalias that is created defaults to the
# same netmask as the route that it selected in in the step above.
#
# If you want to specify that this IP address is to be brought up
# on a subnet with a netmask of 255.255.255.0, you would specify
# this as IPaddr::135.9.8.7/24 .
#
# If you wished to tell it that the broadcast address for this subnet
# was 135.9.8.210, then you would specify that this way:
# IPaddr::135.9.8.7/24/135.9.8.210
#
# If you wished to tell it that the interface to add the address to
# is eth0, then you would need to specify it this way:
# IPaddr::135.9.8.7/24/eth0
#
# And this way to specify both the broadcast address and the
# interface:
# IPaddr::135.9.8.7/24/eth0/135.9.8.210
#
# The IP addresses you list in this file are called "service" addresses,
# since they're they're the publicly advertised addresses that clients
# use to get at highly available services.
#
# For a hot/standby (non load-sharing) 2-node system with only
# a single service address,
# you will probably only put one system name and one IP address in here.
# The name you give the address to is the name of the default "hot"
# system.
#
# Where the nodename is the name of the node which "normally" owns the
# resource. If this machine is up, it will always have the resource
# it is shown as owning.
#
# The string you put in for nodename must match the uname -n name
# of your machine. Depending on how you have it administered, it could
# be a short name or a FQDN.
#
#-------------------------------------------------------------------
#
# Simple case: One service address, default subnet and netmask
# No servers that go up and down with the IP address
#
#just.linux-ha.org 135.9.216.110
#
#-------------------------------------------------------------------
#
# Assuming the adminstrative addresses are on the same subnet...
# A little more complex case: One service address, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 http
#-------------------------------------------------------------------
#
# A little more complex case: Three service addresses, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
# One service address, with the subnet, interface and bcast addr
# explicitly defined.
#
#just.linux-ha.org 135.9.216.3/28/eth0/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
# An example where a shared filesystem is to be used.
# Note that multiple aguments are passed to this script using
# the delimiter '::' to separate each argument.
#
rh-9-a 11.1.1.96/24/eth0 【定义主节点使用的公网IP,掩码和接口名称】
#
# Regarding the node-names in this file:
#
# They must match the names of the nodes listed in ha.cf, which in turn
# must match the `uname -n` of some node in the cluster. So they aren't
# virtual in any sense of the word.
#
根据情况更改配置文件,两台服务器的heartbeat配置必须一样,这样才能启动heartbeat,
启动heartbeat:
/etc/rc.d/init.d/heartbeat start [stop|restart]