Wednesday, March 28, 2012

RedHat Cluster Configuration

In this tutorial we will configure a 4-node cluster with shared storage and a heartbeat over a separate NIC (not the main data link).
Cluster configuration goals
  • Shared storage
  • HA-LVM: an LVM failover configuration (like HP ServiceGuard), which is different from the clustered logical volume manager (clvm)
  • Bonded main data link (eg. bond0 –> eth0 + eth1)
  • Heartbeat on a different data link (eg. eth2) 
OS installation
First we performed a full CentOS 5.5 installation using kickstart. We also installed the cluster packages, such as:
  • cman
  • rgmanager
  • qdiskd
  • ccs_tools
Networking Configuration
We configure 2 different data links:
  1. Main data link (for applications)
  2. Heartbeat data link (for cluster communication)
The main data link (bond0) uses ethernet bonding over 2 physical interfaces (eth0, eth1). This configuration keeps the network available when one of the network paths fails.
Cluster communication (heartbeat) uses a dedicated ethernet link (eth2), configured in a different network and VLAN.
To obtain this configuration, create the file /etc/sysconfig/network-scripts/ifcfg-bond0 from scratch and fill it in as below:

# DEVICE, BOOTPROTO and ONBOOT are the usual entries for a bonding master; adjust them to your setup
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=<your server main IP address>
NETMASK=<your server main network mask>
NETWORK=<your server main network>
BROADCAST=<your server main network broadcast>
BONDING_OPTS='miimon=100 mode=1'
GATEWAY=<your server main default gateway>

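You can customize BONDING_OPTS. Here mode=1 selects active-backup and miimon=100 checks the link state every 100 ms. Please see the bonding documentation for the other options.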

Modify /etc/sysconfig/network-scripts/ifcfg-eth{0,1}:
DEVICE=<eth0 or eth1, etc...>
HWADDR=<your eth MAC address (eg. 00:23:7d:3c:18:40)>
# MASTER and SLAVE enslave the interface to bond0
MASTER=bond0
SLAVE=yes

Modify the heartbeat NIC file /etc/sysconfig/network-scripts/ifcfg-eth2:
DEVICE=eth2
HWADDR=<your eth MAC address (eg. 00:23:7D:3C:CE:96)>
NETMASK=<your server heartbeat network mask>
IPADDR=<your server heartbeat IP address>

Note that the heartbeat interface eth2 has no default gateway configured. A gateway is normally not required, unless this node sits outside the other nodes’ heartbeat network and no specific static routes exist.

Add this line to /etc/modprobe.conf:

alias bond0 bonding
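
Once the bonding configuration is in place and the network has been restarted, the kernel exposes the bond state in /proc/net/bonding/bond0. The following is a minimal sketch, not part of the original procedure, of a helper that prints the currently active slave of an active-backup bond. The function name active_slave is hypothetical, and it takes the status file as an argument so it can also be tried on a saved copy.

```shell
# Print the active slave interface recorded in a bonding status file.
# Usage: active_slave /proc/net/bonding/bond0
# (active_slave is an illustrative name, not a standard tool.)
active_slave() {
    # The status file contains a line such as "Currently Active Slave: eth0".
    awk -F': ' '/^Currently Active Slave/ { print $2 }' "$1"
}
```

Unplugging the cable of the reported interface and running the helper again should show the other slave taking over.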

Add the information about each cluster node to /etc/hosts and replicate the file on all nodes:

# These are examples! server1 server2 server2 server3 server3 server4 server4

Logical Volume Manager configuration

    We chose not to use the clustered logical volume manager (clvmd) but HA-LVM (LVM failover) instead. HA-LVM is totally different from clvmd and behaves much like HP ServiceGuard.

    HA-LVM features

    No need to run any daemon (such as clvmd)
    Each volume group can be activated exclusively on one node at a time
    Volume group configuration is not replicated automatically among the nodes (you need to run vgscan on the nodes)
    The implementation does not depend on the cluster status (it can work without the cluster running at all)

    HA-LVM howto

    Configure /etc/lvm/lvm.conf as below:

    Substitute existing filter with:

    filter = [ "a/dev/mpath/.*/", "a/c[0-9]d[0-9]p[0-9]$/", "a/sd*/", "r/.*/" ]

    check locking_type:

    locking_type = 1

    substitute existing volume_list with:

    volume_list = [ "vg00", "<quorum disk volume group>", "@<hostname related to heartbeat nic>" ]

  • vg00 is the name of the root volume group (always active)
  • <quorum disk volume group> is the name of the quorum disk volume group (always active)
  • @<hostname related to heartbeat nic> is a tag. Each volume group can have one tag at a time. The cluster LVM agents tag a volume group with the hostname (as present in the cluster configuration) in order to activate it. LVM activates only the volume groups that carry this tag. In this way each tagged volume group can be activated and accessed by one node at a time (because of the volume_list setting).
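
Because the tag differs on every node, the volume_list line has to be adapted per host. As a sketch only, the hypothetical helper below prints the line for a given heartbeat hostname and a set of always-active volume groups. The names node1-hb and vg_qdisk in the usage comment are assumptions for illustration, and the output still has to be placed into /etc/lvm/lvm.conf by hand.

```shell
# Build the volume_list line for lvm.conf.
# Arguments: heartbeat hostname (used as the tag), then the always-active VG names.
# (make_volume_list is an illustrative helper, not part of LVM.)
make_volume_list() {
    tag_host="$1"
    shift
    entries=""
    for vg in "$@"; do
        # Append each always-active volume group as a quoted entry.
        entries="${entries}\"${vg}\", "
    done
    # The tag entry is prefixed with "@" and goes last.
    printf 'volume_list = [ %s"@%s" ]\n' "$entries" "$tag_host"
}

# Example: make_volume_list node1-hb vg00 vg_qdisk
```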

At the end remember to regenerate initrd!

# mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)

Storage configuration
Depending on your storage system, you should configure multipath, and each node should be able to access the same LUNs.

Quorum disk
The quorum disk is a 20MB LUN on the storage, shared with all cluster nodes. The cluster uses this disk as a tie-breaker in case of split-brain events. Each node writes its own status information to the quorum disk. If some nodes experience network problems, the quorum disk ensures that only one group of nodes forms the cluster, never two separate groups at once (split-brain).

Quorum disk creation

First be sure that each node can see the same 20MB LUN. Then, on the first node, create a physical volume:

# pvcreate /dev/mpath1

create a dedicated volume group:

# vgcreate -s 8 vg_qdisk /dev/mpath1

create a logical volume that uses the whole volume group:

# lvcreate -l <max_vg_pe> -n lv_qdisk vg_qdisk
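
The <max_vg_pe> value in the lvcreate command above is the number of free physical extents in the volume group, which vgdisplay reports on its "Free  PE / Size" line. A small sketch of extracting that number follows. The function name free_extents is hypothetical, and it reads vgdisplay output on standard input.

```shell
# Read `vgdisplay <vg>` output on stdin and print the free extent count.
# (free_extents is an illustrative helper name.)
free_extents() {
    # The matching line looks like: "  Free  PE / Size       2 / 16.00 MB"
    # so the extent count is the fifth whitespace-separated field.
    awk '/Free +PE/ { print $5 }'
}

# Example: vgdisplay vg_qdisk | free_extents
```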

Make sure that this volume group is listed in volume_list inside /etc/lvm/lvm.conf. It must be activated on all nodes!

On the other nodes run:

# vgscan

The quorum disk volume group should appear in the output.

Quorum disk configuration

Now we have to initialize the quorum disk with the right information. To do this, type:

# mkqdisk -c /dev/vg_qdisk/lv_qdisk -l <your_cluster_name>

Note that it is not required to use your cluster name as the quorum disk label, but it is recommended.

You also need to create a heuristic script to help qdiskd when it acts as tie-breaker. Create /usr/share/cluster/

#!/bin/sh
# Network link status checker: succeeds only if the interface given as $1 reports link up

ethtool "$1" | grep -q "Link detected.*yes"
exit $?

Now activate the quorum disk:

# service qdiskd start
# chkconfig qdiskd on

Logging configuration

To get clean logging you can have rgmanager log to a dedicated file.

Add these lines to /etc/syslog.conf:

# Red Hat Cluster
local4.* /var/log/rgmanager

Add /var/log/rgmanager to logrotate syslog settings in /etc/logrotate.d/syslog:

/var/log/messages /var/log/secure /var/log/maillog /var/log/spooler /var/log/boot.log /var/log/cron /var/log/rgmanager {
/bin/kill -HUP `cat /var/run/ 2> /dev/null` 2> /dev/null || true
/bin/kill -HUP `cat /var/run/ 2> /dev/null` 2> /dev/null || true

  • Modify this line in /etc/cluster/cluster.conf:

    <rm log_facility="local4" log_level="5">

    Increment config_version in /etc/cluster/cluster.conf and update all nodes:

    # ccs_tool update /etc/cluster/cluster.conf
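
Forgetting to increment config_version before running ccs_tool update is an easy mistake. The following is a minimal sketch, under the assumption that the version appears once as config_version="N" in the file; the function name bump_config_version is hypothetical. It rewrites the file in place and prints the new version.

```shell
# Increment the config_version attribute of a cluster.conf file by 1.
# Usage: bump_config_version /etc/cluster/cluster.conf
# (bump_config_version is an illustrative helper, not a cluster tool.)
bump_config_version() {
    conf="$1"
    # Extract the current numeric version from the first matching line.
    current=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    [ -n "$current" ] || return 1
    next=$((current + 1))
    # Write the updated file next to the original, then replace it.
    sed "s/config_version=\"$current\"/config_version=\"$next\"/" "$conf" > "$conf.new" \
        && mv "$conf.new" "$conf"
    echo "$next"
}
```

After the bump, the file is distributed with ccs_tool update /etc/cluster/cluster.conf as shown above. Keeping a backup copy of cluster.conf before editing it is a sensible precaution.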

    Cluster configuration
    To configure the cluster you can use either:

    Luci web interface
    Manual xml configuration

    Configuring cluster using luci

    To use the luci web interface you need to enable the ricci service on all nodes and luci on one node only:

    (on all nodes)
    # chkconfig ricci on
    # service ricci start

    (on one node only)
    # chkconfig luci on
    # luci_admin init
    # service luci restart

    Please note that luci_admin init must be executed only once, before the luci service is started for the first time; otherwise luci will be unusable.

    Now connect to luci with a web browser. Here you can create a cluster, add nodes, create services, failover domains etc…

    How to log in to the luci web interface after initialization / default password of luci
    When you access the luci web interface it asks for credentials. To create them, open a terminal and run:
    # /etc/init.d/luci stop
    # luci_admin init
    (this asks for the admin password; enter a password of your choice)
    # /etc/init.d/luci start

    Now log in to the luci web interface as user admin, with the password you entered during initialization.

Configuring the cluster by editing the XML

You can also configure a cluster manually by editing its main config file, /etc/cluster/cluster.conf. To create the config skeleton use:

# ccs_tool create <your_cluster_name>

The newly created config file is not yet usable: you still have to configure the cluster settings, add nodes, create services, failover domains etc…

When the config file is complete, copy it to all nodes and start the cluster in this way:

(on all nodes)
# chkconfig cman on
# chkconfig rgmanager on
# service cman start
# service rgmanager start

See Recommended cluster configuration to learn the right settings for the cluster.

See Useful cluster commands to learn some useful console cluster commands to use.

Recommended cluster configuration

Below is the /etc/cluster/cluster.conf file of a fully configured cluster.

For commenting purposes, the file is split into several consecutive parts:
<?xml version="1.0"?>
<cluster alias="jcaps_prd" config_version="26" name="jcaps_prd">
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
<clusternode name="" nodeid="1" votes="1">
<clusternode name="" nodeid="2" votes="1">
<clusternode name="" nodeid="3" votes="1">
<cman expected_votes="4"/>

This is the first part of the XML cluster config file.

The first line describes the cluster name and the config_version. Each time you modify the XML you must increment config_version by 1 before updating the config on all nodes.
The fence daemon line is the default one.
The cluster node stanzas contain the nodes of the cluster. Note that the name property contains the FQDN of the node. This name determines the interface used for cluster communication. In this example we don’t use the main hostname but the hostname bound to the interface we chose as the cluster communication channel.
Note also that the line <fence/> is required. Here we do not use any fence device: due to the nature of HA-LVM, access to the data should be exclusive to one node at a time.
The cman expected_votes value is 4 because each node gives 1 vote.
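
To make the vote arithmetic concrete: CMAN considers the cluster quorate when the votes present reach expected_votes / 2 + 1, using integer division. The sketch below only does that arithmetic; the function name quorum_threshold is hypothetical.

```shell
# Print the number of votes needed for quorum, given expected_votes.
# (quorum_threshold is an illustrative helper name.)
quorum_threshold() {
    # Shell arithmetic uses integer division, which matches the rule above.
    echo $(( $1 / 2 + 1 ))
}

# Example: quorum_threshold 4
```

With expected_votes="4" as in the fragment above, 3 votes are needed, so a partition holding only 2 votes cannot keep running services on its own.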

<rm log_facility="local4" log_level="5">
<failoverdomain name="jcaps_prd" nofailback="0" ordered="0" restricted="1">
<failoverdomainnode name="" priority="1"/>
<failoverdomainnode name="" priority="1"/>
<failoverdomainnode name="" priority="1"/>

This section begins the resource manager configuration (<rm ...>).
The resource manager section can be configured for logging. rgmanager logs to syslog; here we set the log_facility and the log_level. The facility we specified allows us to log to a separate file (see Logging configuration).
We also configured a failover domain containing all cluster nodes. We want a service to be able to switch to any cluster node, but you can configure different behaviours here.

<service autostart="1" domain="jcaps_prd" exclusive="0" name="subversion" recovery="relocate">
<ip address="" monitor_link="1"/>
<lvm name="vg_subversion_apps" vg_name="vg_subversion_apps"/>
<lvm name="vg_subversion_data" vg_name="vg_subversion_data"/>
<fs device="/dev/vg_subversion_apps/lv_apps" force_fsck="1" force_unmount="1" fsid="61039" fstype="ext3" mountpoint="/apps/subversion" name="svn_apps" self_fence="0">
<fs device="/dev/vg_subversion_data/lv_repositories" force_fsck="1" force_unmount="1" fsid="3193" fstype="ext3" mountpoint="/apps/subversion/repositories" name="svn_repositories" self_fence="0"/>
<script file="/my_cluster_scripts/subversion/" name="subversion"/>

This section contains the services in the cluster (like HP ServiceGuard packages)

We choose the failover domain (in this case our failover domain contains all nodes, so the service can run on any node).
We add an IP address resource (always use monitor_link!).
We also use an HA-LVM resource (<lvm ...>). Every VG specified here is tagged with the node name when it is activated. This means it can be activated only on the node where the service is running (only on that node!). Note: if you do not specify any LV, all the LVs inside the VG will be activated!
Next come the <fs ...> tags for mounting filesystem resources. It is recommended to use force_unmount and force_fsck.
You can also specify a custom script for starting applications/services and so on. Please note that the script must be LSB compliant, which means that it must handle start|stop|status. Note also that the default cluster behaviour is to run the script with the status parameter every 30 seconds. If the status call does not return 0, the service is marked as failed (and will probably be restarted/relocated).
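
As an illustration of the start|stop|status contract described above, here is a minimal skeleton of such a script. It is a sketch only: the marker file, the STATE_FILE variable and the function name service_action are hypothetical stand-ins for whatever actually starts, stops and checks your application.

```shell
#!/bin/sh
# Hypothetical skeleton of a script resource for rgmanager.
# STATE_FILE stands in for a real PID file or health check.
STATE_FILE="${STATE_FILE:-/tmp/demo_service.state}"

service_action() {
    case "$1" in
        start)
            # Replace with the real start command of the application.
            touch "$STATE_FILE"
            ;;
        stop)
            # Replace with the real stop command; stopping twice must not fail.
            rm -f "$STATE_FILE"
            ;;
        status)
            # Return 0 when the service is healthy, non-zero otherwise.
            [ -f "$STATE_FILE" ]
            ;;
        *)
            echo "Usage: $0 {start|stop|status}" >&2
            return 2
            ;;
    esac
}

# In a real script the last lines would be:
#   service_action "$1"
#   exit $?
```

The important design point is that status reflects the real health of the application, because a non-zero result is what triggers recovery.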


This section closes the resource manager configuration (closes XML tag).

<totem consensus="4800" join="60" token="20000" token_retransmits_before_loss_const="20"/>

This is a crucial part of the cluster configuration. Here you specify the failure detection time of the cluster.

  • Red Hat recommends that the CMAN membership (token) timeout be at least twice the qdiskd timeout value. Here the token timeout is 20 seconds (token="20000", in milliseconds).
<quorumd interval="2" label="jcaps_prd_qdisk" min_score="2" tko="5" votes="1">
<heuristic interval="2" program="/usr/share/cluster/ bond0" score="3"/>

Here we configure the quorum disk to be used by the cluster.

We choose a quorum disk timeout of 10 seconds (quorumd interval * quorumd tko = 2 * 5), which is half of the token timeout (20 seconds).
We also add a heuristic script to determine the network health. This helps qdiskd take a decision when a split-brain happens.
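
The relation between the two timeouts is easy to get wrong when tuning interval, tko or token later. The sketch below checks the ratio used in this example (token timeout at least twice the qdiskd timeout). The function name check_timeouts is hypothetical; it takes the quorumd interval in seconds, the tko count and the totem token in milliseconds.

```shell
# Check that the totem token timeout is at least twice the qdiskd timeout.
# Usage: check_timeouts <interval_seconds> <tko> <token_ms>
# (check_timeouts is an illustrative helper name.)
check_timeouts() {
    interval_s="$1"
    tko="$2"
    token_ms="$3"
    # qdiskd timeout = interval * tko, converted to milliseconds.
    qdisk_ms=$(( interval_s * tko * 1000 ))
    if [ "$token_ms" -ge $(( qdisk_ms * 2 )) ]; then
        echo "ok: token ${token_ms} ms >= 2 x qdiskd timeout ${qdisk_ms} ms"
    else
        echo "too short: token ${token_ms} ms < 2 x qdiskd timeout ${qdisk_ms} ms"
        return 1
    fi
}

# Example with the values from this configuration: check_timeouts 2 5 20000
```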


This concludes the configuration file closing XML tags still opened.

Useful cluster commands

ccs_tool update /etc/cluster/cluster.conf (update cluster.conf among all nodes)
clustat (see cluster status)
clusvcadm -e <service> (enable/start a service)
clusvcadm -d <service> (disable/stop service)
vgs -o vg_name,vg_size,vg_tags (show all volume groups names, size and tags)

DISCLAIMER: The information provided on this website comes without warranty of any kind and is distributed AS IS. Every effort has been made to provide the information as accurate as possible, but no warranty or fitness is implied. The information may be incomplete, may contain errors or may have become out of date. The use of this information described herein is your responsibility, and to use it in your own environments do so at your own risk.

Copyright © 2012 LINUXHOWTO.IN
