Something like a recipe for creating database and application servers that fail over to each other. This recipe is for Lighttpd and MySQL, but it can easily be extended to other applications. Later, I'll do one for Tryton and PostgreSQL. I use Ubuntu 11.04 64-bit on two Linode virtual private servers.
Repeat the following procedure for each member of the pair:
Go to the Remote Access tab on the Linode dashboard.
Note the public IP. We'll need that later.
Add a private IP, and note it. We'll need that later, too.
If by any chance you already have an IPv6 address, note that too; we're definitely going to use it. If you don't have one, click on "Enable IPv6" and get one.
Set a Lish password. I use APG[2] to generate new passwords, e.g. apg -n 100.
Add an SSH public key to the Keys box. This is in addition to the Lish password. Or instead of it, if you prefer.
This is also a good time to write to Linode support. You'll need to order an additional public IP address, and by default you can't purchase an extra IP address until you've requested and justified it specifically with support. You'll also need another private IP address, which doesn't cost anything but takes a support person to do - there's no way to automatically add more than one private IP address. These addresses will float between the pair, so it doesn't matter which member gets them. Make note of these floating addresses.
Go to the Settings tab in the Linode dashboard.
Give the Linode a good name.
Name a display group for your pair. Generally we use "location pair", e.g. "newark pair", "dallas pair", etc.
Change the email alert thresholds
CPU Usage: 50% (Default 80% over two hours is too high)
Disk I/O Rate: 2000 I/O Ops/sec (Default 1000 ops over two hours is too low)
Go to the Dashboard tab in the Linode dashboard, and click on "Create a new Disk Image".
Create a disk image called "var-www", type "unformatted / raw", with a size that's half your available space (just as a rule of thumb).
Create another disk image called "var-mysql", also type "unformatted / raw", with a size that's half your remaining available space (again, a rule of thumb).
Also on the Linode dashboard tab, choose to deploy a Linux distribution.
I like the latest Ubuntu[3]. Right now, that's 11.04 64bit.
Set the swap partition size next. Max it out, which on Linode seems to be 1 x RAM.
Use everything that's left for the deployment disk size.
Set a good root password.
Press the Deploy button. That takes you back to the dashboard.
The configuration profile name is too pedestrian. "My Ubuntu blah blah blah". I already knew it was mine, after all I made it, didn't I? Click the "Edit" link to the right of the profile.
Change the label to something with some teeth. Something like "Ubuntu 11.04-64bit high availability pair".
Assign block devices for our DRBD drives. Put var-mysql on /dev/xvdc, and var-www on /dev/xvdd. Save the changes, and you'll be taken back to the dashboard.
Each server needs to be able to assume the IP address from the other. From the "Remote Access" tab for each server, select all of the addresses, both the public and private, for IP Failover.
Boot the system and log in, either through SSH using the root password you just assigned, or through Lish.
This is a good time to repeat the previous six steps on the other member of the pair. Once that is done, continue to the final step in the Linode dashboard.
## Some notes before we get into it
We have a number of implementation-specific details, stuff that changes for each cluster. I'd like to refer to things here in a way that is internally consistent, yet easy to use. So here is an index of the variable data:
Server names: alice and gertrude. If you're running vi or vim against this document and wanted to name your systems Fred and Wilma, you should be able to run :%s/alice/fred/g followed by :%s/gertrude/wilma/g (or see the sed one-liner after the index below).
IP addresses:
Public IP on alice: pub.a.a.a
Private IP on alice: priv.a.a.a
IPv6 on alice: ipv6:a
Public IP on gertrude: pub.g.g.g
Private IP on gertrude: priv.g.g.g
IPv6 on gertrude: ipv6:g
Floating public IP: pub.f.f.f
Floating private IP: priv.f.f.f
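If you'd rather not do the renaming inside the editor, a sed one-liner over a local copy of this recipe does the same thing. This is just a convenience sketch; ha-recipe.txt is a made-up file name standing in for wherever you keep your copy of these notes.
# sed -i -e 's/alice/fred/g' -e 's/gertrude/wilma/g' ha-recipe.txt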
## Initial configuration, operating system
Again, repeat the following steps for each member of the pair:
Name the server.[4]
# echo 'alice' > /etc/hostname
Fix (as in "make permanent") the networking in /etc/network/interfaces:
# vi /etc/network/interfaces
# The loopback interface
auto lo
iface lo inet loopback
# Configuration for eth0 and aliases
# This line ensures that the interface will be brought up during boot.
auto eth0 eth0:0
# eth0 - This is the main IP address that will be used for most outbound connections.
# The address, netmask and gateway are all necessary.
iface eth0 inet static
address pub.a.a.a
netmask 255.255.255.0
gateway gate.a.a.a
# eth0:0
# This is a private IP
iface eth0:0 inet static
address priv.a.a.a
netmask 255.255.128.0
# eth0 ipv6
iface eth0 inet6 static
address ipv6:pub:a
netmask 64
gateway fe80::1
Add entries to /etc/hosts so each server can reach the other - and the floating addresses - by name. This makes life easier later.
# vi /etc/hosts
127.0.0.1 localhost.localdomain localhost
pub.a.a.a alice.version2beta.com # FQDN is public
priv.a.a.a alice # short name is private
pub.g.g.g gertrude.version2beta.com # FQDN is public
priv.g.g.g gertrude # short name is private
pub.f.f.f ag.version2beta.com # public floating IP
priv.f.f.f ag # private floating IP
#IPv6 addresses
ipv6:a ip6-alice
ipv6:g ip6-gertrude
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
SSH Keys
# ssh-keygen -t rsa
Copy root's /root/.ssh/id_rsa.pub into the other server's /root/.ssh/authorized_keys, which you'll need to create. While you're at it, put your own public key in there too.
Restart networking. (If there's a mistake in the previous step, we'll catch it either here or in the next few steps.)
# /etc/init.d/networking restart
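A few quick sanity checks once both servers have gotten this far; nothing here is required, it just catches typos in the interfaces and hosts files early and confirms the SSH keys work in both directions:
# ip addr show eth0
# ping -c 2 gertrude
# ping6 -c 2 ip6-gertrude
# ssh root@gertrude hostname # run from alice; repeat from gertrude toward alice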
Bring the package database, and installed packages, up to date.
# apt-get update && apt-get upgrade
Install the tools
First, try:
# apt-get install drbd8-utils
# drbdadm -V
The output reports both the DRBD kernel module version and the userland (tools) version. If these match, you're golden. If they don't (as is currently the case for me), you need to build the userland tools from source.
# drbdadm -V
DRBD module version: 8.3.10
userland version: 8.3.9
you should upgrade your drbd tools!
Building drbd8-utils from source:
# apt-get remove drbd8-utils
# apt-get install psmisc build-essential flex git xsltproc
# wget http://oss.linbit.com/drbd/8.3/drbd-8.3.10.tar.gz
# tar xvzf drbd-8.3.10.tar.gz
# cd drbd-8.3.10
# ./configure
# make
# make install
All good? Try drbdadm -V again:
# drbdadm -V
...
Version: 8.3.10 (api:88)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root@alice, 2011-08-06 22:35:11
Configure DRBD global settings in /usr/local/etc/drbd.d/global_common.conf:
# vi /usr/local/etc/drbd.d/global_common.conf
global {
usage-count yes;
# minor-count dialog-refresh disable-ip-verification
}
common {
protocol C;
handlers {
pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
# fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
split-brain "/usr/lib/drbd/notify-split-brain.sh root";
# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
}
startup {
# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
wfc-timeout 15;
degr-wfc-timeout 60;
}
disk {
# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
# no-disk-drain no-md-flushes max-bio-bvecs
}
net {
# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
cram-hmac-alg sha1;
}
syncer {
# rate after al-extents use-rle cpu-mask verify-alg csums-alg
rate 4M;
}
}
Configure DRBD resource settings for mysql in /usr/local/etc/drbd.d/mysql.res:
# vi /usr/local/etc/drbd.d/mysql.res
resource mysql {
net {
shared-secret "DONTTELL!";
}
on alice {
device /dev/drbd0;
disk /dev/xvdc;
address ipv6 [ipv6:a]:7801;
meta-disk internal;
}
on gertrude {
device /dev/drbd0;
disk /dev/xvdc;
address ipv6 [ipv6:g]:7801;
meta-disk internal;
}
}
And create resource settings for www in /usr/local/etc/drbd.d/www.res:
# vi /usr/local/etc/drbd.d/www.res
resource www {
net {
shared-secret "DON'T TELL!";
}
on alice {
device /dev/drbd1;
disk /dev/xvdd;
address ipv6 [ipv6:a]:7802;
meta-disk internal;
}
on gertrude {
device /dev/drbd1;
disk /dev/xvdd;
address ipv6 [ipv6:g]:7802;
meta-disk internal;
}
}
On this setup, drbd expects configuration files to be in /usr/local/etc/, and won't recognize them in /etc/. So we've put all our new stuff in the right place. Now get rid of the old. BUT, the cluster resource manager IS going to want them in /etc/, so we'll create symbolic links.
# rm -rd /etc/drbd.*
# ln -s /usr/local/etc/drbd.conf /etc/drbd.conf
# ln -s /usr/local/etc/drbd.d /etc/drbd.d
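Before starting anything, it doesn't hurt to confirm drbdadm actually finds and parses the configuration from its new home; drbdadm dump prints the parsed resources, or complains if something is off:
# drbdadm dump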
Start up DRBD
Create metadata for the devices (do this on both hosts):
# drbdadm create-md mysql
# drbdadm create-md www
Modify the init script for drbd to accommodate a problem where IPv6 might not be ready when drbd starts:
# vi /etc/init.d/drbd # Add a sleep to the top of the script
### END INIT INFO
sleep 5
On each server, at roughly the same time, start drbd:
# /etc/init.d/drbd start
On the server that will be primarily a database server:
# drbdadm -- --overwrite-data-of-peer primary mysql
# drbdadm disconnect mysql
# drbdadm connect mysql
# drbdadm disconnect www
# drbdadm connect www
On the server that will be primarily a web application server:
# drbdadm -- --overwrite-data-of-peer primary www
# drbdadm disconnect www
# drbdadm connect www
# drbdadm disconnect mysql
# drbdadm connect mysql
On either or both servers, check to make sure the primaries are primary, the secondaries are secondary, and the sync is syncing:
# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
built-in
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:3116484 nr:0 dw:0 dr:3593516 al:0 bm:219 lo:0 pe:2 ua:1 ap:0 ep:1 wo:f oos:1649980
[============>.......] sync'ed: 65.5% (1608/4652)Mfinish: 0:05:08 speed: 5,328 (5,056) K/sec
1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
ns:0 nr:2976256 dw:2976256 dr:0 al:0 bm:181 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:7509148
[====>...............] sync'ed: 28.4% (7332/10236)Mfinish: 0:24:20 speed: 5,136 (5,060) want: 4,096 K/sec
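If you'd rather watch the sync progress than keep re-running cat, watch works fine here:
# watch -n 2 cat /proc/drbd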
Format the DRBD devices, each on the server that is primary for that device. On the database server:
# apt-get install jfsutils
# mkfs.jfs /dev/drbd0
And on the web application server:
# apt-get install jfsutils
# mkfs.ext4 /dev/drbd1
On both servers, create the mount points:
# mkdir /var/www
# mkdir /var/mysql
# vi /etc/fstab # Add two lines
/dev/drbd/by-res/www /var/www ext4 noauto,noatime 0 0
/dev/drbd/by-res/mysql /var/mysql jfs noauto 0 0
Mount the drives, each on its primary server:
# mount /dev/drbd/by-res/mysql /var/mysql/
# mount /dev/drbd/by-res/www /var/www
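A quick df on each server confirms the DRBD device is mounted where we expect:
# df -h /var/mysql # on the database server
# df -h /var/www # on the web application server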
And there you have it - RAID 1 drives over a network. Failover comes later, after we install and configure the server software.
Install stuff. Not too much stuff. Just the right stuff.
# apt-get install bc whois apg s3cmd lsof traceroute exim4 mailutils mutt \
aspell-doc wamerican spellutils ispell mysql-server mysql-client lighttpd \
lighttpd-doc lighttpd-mod-magnet lua5.1 php5-cli php5-cgi php5 php5-mcrypt \
php5-mhash php5-gd php5-mysql php5-imagick php5-curl php5-memcache php5-ps \
php5-pspell php5-tidy php5-xmlrpc php5-xsl libgd-tools libmagickcore3-extra \
libmcrypt-dev mcrypt rrdtool
# dpkg-reconfigure exim4-config
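Several of the alerts and handlers later on send mail to root, so it's worth confirming that exim actually delivers before we depend on it. A minimal test, assuming root's mail ends up somewhere you can read it:
# echo "test from alice" | mail -s "exim test" root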
Configure MySQL.
Edit /etc/mysql/my.cnf:
# vi /etc/mysql/my.cnf
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
user = mysql
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/mysql
tmpdir = /tmp
skip-external-locking
bind-address = priv.f.f.f
key_buffer = 16M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
myisam-recover = BACKUP
#max_connections = 100
#table_cache = 64
#thread_concurrency = 10
query_cache_limit = 1M
query_cache_size = 16M
#general_log_file = /var/log/mysql/mysql.log
#general_log = 1
log_error = /var/log/mysql/error.log
#server-id = 1
#log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M
#binlog_do_db = include_database_name
#binlog_ignore_db = include_database_name
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
[mysql]
#no-auto-rehash # faster start of mysql but no tab completion
[isamchk]
key_buffer = 16M
!includedir /etc/mysql/conf.d/
We changed the default data directory for MySQL, so we need to move its files into the correct directory.
# stop mysql
# chown mysql:mysql /var/mysql
# cp -Rp /var/lib/mysql/* /var/mysql/
# ifconfig eth0:1 priv.f.f.f netmask 255.255.128.0
# start mysql
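At this point MySQL should be listening on the floating private IP and serving data out of /var/mysql. A couple of quick checks; the root password is whatever you set when the package was installed:
# netstat -tlnp | grep 3306
# mysql -h priv.f.f.f -u root -p -e 'SHOW DATABASES;'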
Sync these settings with the other server (both need the same configuration so that either can run the service off the same data).
# scp -r /etc/mysql root@gertrude:/etc/
Configure Lighttpd. (Substitute other instructions for nginx, apache, etc.)
Edit /etc/lighttpd/lighttpd.conf on the web application server:
# vi /etc/lighttpd/lighttpd.conf
server.modules = (
"mod_access",
"mod_alias",
"mod_compress",
"mod_redirect",
"mod_rewrite",
)
server.document-root = "/var/www"
server.upload-dirs = ( "/var/www/cache/uploads" )
server.errorlog = "/var/www/logs/error.log"
server.pid-file = "/var/run/lighttpd.pid"
server.username = "www-data"
server.groupname = "www-data"
index-file.names = ( "index.php", "index.html")
url.access-deny = ( "~", ".inc" )
static-file.exclude-extensions = ( ".php", ".pl", ".fcgi" )
## Use ipv6 if available
include_shell "/usr/share/lighttpd/use-ipv6.pl"
dir-listing.encoding = "utf-8"
server.dir-listing = "enable"
compress.cache-dir = "/var/www/cache/compress/"
compress.filetype = ( "application/x-javascript", "text/css", "text/html", "text/plain" )
include_shell "/usr/share/lighttpd/create-mime.assign.pl"
include_shell "/usr/share/lighttpd/include-conf-enabled.pl"
magnet.attract-physical-path-to = ( "/etc/lighttpd/modx.lua" )
Enable some more modules:
# lighttpd-enable-mod fastcgi
# lighttpd-enable-mod simple-vhost
# lighttpd-enable-mod accesslog
# lighttpd-enable-mod magnet
# lighttpd-enable-mod status
Edit more config files:
# vi /etc/lighttpd/conf-enabled/10-accesslog.conf
server.modules += ( "mod_accesslog" )
accesslog.filename = "/var/www/logs/access.log"
# vi /etc/lighttpd/conf-enabled/10-simple-vhost.conf
server.modules += ( "mod_simple_vhost" )
simple-vhost.server-root = "/var/www/servers/"
simple-vhost.document-root = "htdocs"
simple-vhost.default-host = "{hostname}"
# vi /etc/lighttpd/conf-enabled/10-fastcgi.conf
server.modules += ( "mod_fastcgi" )
fastcgi.server = ( ".php" =>
((
"bin-path" => "/usr/bin/php-cgi",
"socket" => "/tmp/php.socket",
"max-procs" => 2,
"idle-timeout" => 20,
"bin-environment" => (
"PHP_FCGI_CHILDREN" => "4",
"PHP_FCGI_MAX_REQUESTS" => "10000"
),
"bin-copy-environment" => (
"PATH", "SHELL", "USER"
),
"broken-scriptfilename" => "enable"
))
)
# vi /etc/lighttpd/modx.lua
attr = lighty.stat(lighty.env["physical.doc-root"] .. "manager/includes/config.inc.php") -- Appears to be a ModX site
if (attr) then
attr = lighty.stat(lighty.env["physical.path"])
if (not attr) then -- Requested resource doesn't exist in the file system
path = "/index.php"
uri = lighty.env["request.uri"]
uri2 = string.gsub(lighty.env["request.uri"], "\?", "\&")
-- print("Original request.uri: " .. uri .. " Replaced with: " .. uri2)
lighty.env["uri.query"] = "q=" .. string.gsub(uri, "\?", "\&")
lighty.env["uri.path"] = path
lighty.env["request.uri"] = path .. "?" .. lighty.env["uri.query"]
-- print("New request.uri: " .. lighty.env["request.uri"])
lighty.env["physical.rel-path"] = path
lighty.env["physical.path"] = lighty.env["physical.doc-root"] .. string.sub(lighty.env["physical.rel-path"], 2)
-- print("New physical.path: " .. lighty.env["physical.path"])
end
end
# vi /etc/lighttpd/wp-rewrite.conf
# Use when a site has a blog
# Example:
# $HTTP["host"] =~ "www\.example\.com" {
# var.wpdir = "/blog/"
# include "wp-rewrite.conf"
# }
url.rewrite-once = (
"^" + wpdir + "(wp-.+).*/?" => "$0",
"^" + wpdir + "(sitemap.xml)" => "$0",
"^" + wpdir + "(xmlrpc.php)" => "$0",
"^" + wpdir + "keyword/([A-Za-z_0-9-])/?$" => wpdir + "index.php?keyword=$1",
"^" + wpdir + "(.+)/?$" => wpdir + "index.php/$1"
)
Edit /etc/php5/cgi/php.ini so that these lines are correct:
display_errors = stderr
error_log = /var/www/logs/php_errors.log
Create some directories we just specified, but don't yet have:
# mkdir /var/www/run /var/www/cache /var/www/cache/compress /var/www/cache/uploads /var/www/logs /var/www/servers /var/www/servers/{hostname} /var/www/servers/{hostname}/htdocs
# chown -R www-data:www-data /var/www/run /var/www/cache /var/www/logs/ /var/www/servers
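Before restarting, lighttpd can check the merged configuration (including everything pulled in by the include_shell lines) for syntax errors:
# lighttpd -t -f /etc/lighttpd/lighttpd.conf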
Restart the local server and sync these settings with the other server (both need the same configuration so that either can run the service off the same data).
# /etc/init.d/lighttpd restart
# scp -r /etc/php5 /etc/lighttpd root@alice:/etc/
Test that either server can run either service.
On the database server:
# stop mysql
# ifconfig eth0:1 down
# umount /var/mysql/
# drbdadm secondary mysql
On the web application server:
# drbdadm primary mysql
# mount /dev/drbd0 /var/mysql/
# ifconfig eth0:1 priv.f.f.f netmask 255.255.128.0
# start mysql
# mysql -p
# stop mysql
# ifconfig eth0:1 down
# umount /var/mysql/
# drbdadm secondary mysql
# /etc/init.d/lighttpd stop
# umount /var/www
# drbdadm secondary www
On the database server:
# drbdadm primary mysql
# drbdadm primary www
# mount /dev/drbd0 /var/mysql/
# mount /dev/drbd1 /var/www/
# /etc/init.d/lighttpd start
# ps ax | grep light
# /etc/init.d/lighttpd stop
# umount /var/www
# drbdadm secondary www
# ifconfig eth0:1 priv.f.f.f netmask 255.255.128.0
# start mysql
On the web application server:
# drbdadm primary www
# mount /dev/drbd1 /var/www
# /etc/init.d/lighttpd start
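With each service back on its home server, a couple of quick end-to-end checks. wget is already on the system; even a 404 here just means lighttpd is answering and there's no content yet:
# wget -S -O /dev/null http://localhost/ # on the web application server
# mysql -h priv.f.f.f -u root -p -e 'SELECT VERSION();' # from either server, via the floating private IP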
Just a note here before we start. I would have liked to run corosync[5] rather than heartbeat[6], based solely on my understanding of pacemaker[7] and how and why it split from heartbeat. However, corosync seems to mostly require multicast networking to work[8], and after some struggles I've learned that Linode doesn't support multicast.
Set up heartbeat and pacemaker
Install and configure software on both servers
# apt-get install heartbeat pacemaker
Edit /usr/lib/ocf/resource.d/heartbeat/Filesystem to comment out this whole block. This will fix a problem with the script's handling of jfs filesystems. Besides, they say right in the code, "Why should a filesystem resource agent magically load a kernel module?" I agree. Lemme handle that part and just mount the drive, please.
# vi /usr/lib/ocf/resource.d/heartbeat/Filesystem
# if [ "X${HOSTOS}" != "XOpenBSD" ];then
# # Insert SCSI module
# # TODO: This probably should go away. Why should the filesystem
# # RA magically load a kernel module?
# $MODPROBE scsi_hostadapter >/dev/null
#
# if [ -z "$FSTYPE" -o "$FSTYPE" = none ]; then
# : No FSTYPE specified, rely on the system has the right file-system support already
# else
# # Insert Filesystem module
# $MODPROBE $FSTYPE >/dev/null
# grep -e "$FSTYPE"'$' /proc/filesystems >/dev/null
# if [ $? -ne 0 ] ; then
# ocf_log err "Couldn't find filesystem $FSTYPE in /proc/filesystems"
# return $OCF_ERR_INSTALLED
# fi
# fi
# fi
Configuration files:
# vi /etc/ha.d/authkeys
auth 1
1 sha1 Don'tTell
# chmod 600 /etc/ha.d/authkeys
# vi /etc/ha.d/ha.cf
autojoin none
logfacility daemon
keepalive 2
deadtime 15
warntime 5
initdead 120
udpport 694
ucast eth0 priv.a.a.a
ucast eth0 priv.g.g.g
node alice
node gertrude
auto_failback on
use_logd yes
crm respawn
Propagate the configuration, then restart heartbeat on each server:
# scp -r /etc/ha.d/* root@gertrude:/etc/ha.d/
# /etc/init.d/heartbeat restart
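Give heartbeat a little while to settle (initdead above is 120 seconds), then confirm the nodes see each other. crm_mon comes with pacemaker; at this point it should show alice and gertrude online and no resources yet:
# crm_mon -1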
Disable automatic start of Lighttpd (using LSB) and MySQL (using Upstart)
# update-rc.d lighttpd disable
# vi /etc/init/mysql.conf
# Comment out the startup
#start on (net-device-up
# and local-filesystems
# and runlevel [2345])
Configure the resources
# vi /usr/lib/ocf/resource.d/linbit/drbd
OCF_RESKEY_drbdconf_default="/usr/local/etc/drbd.conf"
# crm configure
primitive db_alert ocf:heartbeat:MailTo \
params email="root" subject="(db)"
primitive db_drbd ocf:linbit:drbd \
params drbd_resource="mysql" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100"
primitive db_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/var/mysql" fstype="jfs" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="120"
primitive db_ip ocf:heartbeat:IPaddr \
params ip="priv.f.f.f" cidr_netmask="24"
primitive db_mysql lsb:mysql \
op monitor interval="0" enabled="false"
primitive www_alert ocf:heartbeat:MailTo \
params email="root" subject="(www)"
primitive www_drbd ocf:linbit:drbd \
params drbd_resource="www" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100"
primitive www_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/var/www" fstype="ext4" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="120"
primitive www_ip ocf:heartbeat:IPaddr \
params ip="pub.f.f.f" cidr_netmask="24"
primitive www_lighty lsb:lighttpd \
op monitor interval="0" enabled="false"
group db db_ip db_fs db_mysql db_alert \
meta target-role="Started"
group www www_fs www_ip www_lighty www_alert \
meta target-role="Started"
ms ms_db_drbd db_drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms ms_www_drbd www_drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location db_gertrude db 10: gertrude
location db_alice db 100: alice
location www_gertrude www 100: gertrude
location www_alice www 10: alice
colocation db_on_drbd inf: db ms_db_drbd:Master
colocation www_on_drbd inf: www ms_www_drbd:Master
order db_after_drbd inf: ms_db_drbd:promote db:start
order www_after_drbd inf: ms_www_drbd:promote www:start
property cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
default-resource-stickiness="5"
commit
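Once the configuration is committed, check that pacemaker has started the groups where the location scores say they belong, then try a controlled failover. This is just a sketch using the resource and node names defined above; migrate leaves a location constraint behind, so unmigrate when you're satisfied:
# crm_mon -1
# crm resource migrate www alice
# crm_mon -1
# crm resource unmigrate www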
I didn't realize that Linode has a referral program, so I'm updating this footnote! If you sign up, kindly use this link: Linode.com. ↩
I built a pronounceable password generator too, in Perl, more than a decade ago. Got lots of reputation points on PerlMonks.org. But APG is cool too. ↩
I used to use Debian, and got the install down to 1.3GB comfortably. But since I started running Ubuntu on my desktop, I've simplified my life and gone with Ubuntu Server. ↩
We name our servers after couples who inspired each other in their art and life. The example shown refers to Alice B. Toklas, partner and lover of Gertrude Stein for nearly 40 years, and author of The Alice B. Toklas Cookbook, which contains, among other recipes, what we often refer to as "magic brownies" (she called it "Haschich Fudge"). Gertrude was pretty cool too. ↩
This cluster management stuff gets complicated. I have to really wrap my head around it when I'm making it work, and I still only have 80% confidence I understand what it's doing, even if I have 100% confidence what I've done works. And this is a how-to, not a why-for, so I don't want to bog it down a lot. So I'll just send you to the Wikipedia page for the Corosync project. ↩
Keeping in line with the last footnote, here's the Wikipedia page for Linux-ha (high availability Linux), the project that puts out heartbeat. ↩
Okay, so you get some theory anyway. Heartbeat is how the system knows which members of a cluster are online. Pacemaker is how it determines which resources belong on which cluster node. Got it? Pacemaker strikes me as the bad boy of Linux high availability. It got spun off from the Linux-ha project back in '07 in a brouhaha that involved one guy leaving the project and not replying to emails, the head of kernel R&D at SuSE talking street, really just a big clusterf**k. Well, I may be overstating it, but the first time I configured one of these pairs, I didn't have to deal with the politics of project development to try to figure out where all the pieces came from and how they went together.
Oh, Pacemaker doesn't have a Wikipedia page, but there's a Wiki page at clusterlabs.org. ↩
IP Multicasting is a way for one computer to talk to more than one computer at a time, in one transmission. It's kinda like tuning into a TV channel - you pick the channel you want to listen to, and then there's all this stuff happening. Except not at Linode. ↩