Performing failovers

Once you have learned how to replicate tables and add them to sets, it is time to learn about failover. Basically, we can distinguish between two types of failover:

  • Planned failovers
  • Unplanned failovers and crashes

In this section we will learn about both scenarios.

Planned failovers

A planned failover is more of a luxury scenario. In many cases you will not be so lucky, and you will have to rely on automatic failover or face unplanned outages.

Basically, a planned failover can be seen as moving a set of tables to some other node. Once that other node is in charge of those tables, you can reconfigure your applications accordingly.

In our example we want to move all tables from node 1 to node 2. In addition to that we want to drop the first node. Here is the code:

slonik<<_EOF_
cluster name = first_cluster;

node 1 admin conninfo = 'dbname=$MASTERDB host=$HOST1 user=$DBUSER';
node 2 admin conninfo = 'dbname=$SLAVEDB host=$HOST2 user=$DBUSER';

lock set (id = 1, origin = 1);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2, wait on = 1);

drop node (id = 1, event node = 2);
_EOF_

After our standard introduction we can call move set. The key point here is that we have to lock the set first; the lock protects us against changes made to the system while the failover is performed. You must not forget this lock, otherwise you might find yourself in a truly bad situation. Just as in all of our previous examples, nodes and sets are referenced by their numbers.

Once we have moved the set to its new location, we have to wait for the event to be confirmed, and finally we can drop the node (if this is desired).

If the script is correct, it executes cleanly:

hs@hs-VirtualBox:~/slony$ ./slony_move_set.sh
debug: waiting for 1,5000016417 on 2

Once we have failed over to the second node, we can delete data right away; Slony has removed the triggers that prevented this operation:

db2=# DELETE FROM t_second;
DELETE 1

The same has happened to the table on the first node. There are no more triggers, but the table itself is still in place:

db1=# \d t_second
      Table "public.t_second"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | not null
 name   | text    | 
Indexes:
    "t_second_pkey" PRIMARY KEY, btree (id)

You can now take the node offline and use it for other purposes.

Tip

A planned failover is also the strategy to apply when you want to upgrade a database to a new version of PostgreSQL with little downtime. Simply replicate the entire database to an instance running the new version and perform a controlled failover. The actual downtime of such an upgrade is minimal, which makes this approach feasible even for large amounts of data.
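
As a sketch of this approach (assuming the cluster, nodes, and connection variables from our previous examples, and that node 2 runs the new PostgreSQL version), the procedure boils down to subscribing the set to the new instance and then moving it once the subscriber has caught up:

slonik<<_EOF_
cluster name = first_cluster;

node 1 admin conninfo = 'dbname=$MASTERDB host=$HOST1 user=$DBUSER';
node 2 admin conninfo = 'dbname=$SLAVEDB host=$HOST2 user=$DBUSER';

# replicate set 1 to node 2, which runs the new PostgreSQL version
subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);

# make sure node 2 has fully caught up before switching over
sync (id = 1);
wait for event (origin = 1, confirmed = 2, wait on = 1);

# perform the controlled failover
lock set (id = 1, origin = 1);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2, wait on = 1);
_EOF_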

Unplanned failovers

In the case of an unplanned failover, you have not been so lucky. The cause could be a power outage, a hardware failure, or a site failure. Whatever it might be, there is no need to be afraid: you can still bring the cluster back to a reasonable state easily.

To do so, Slony provides the failover command:

  • failover (id = 1, backup node = 2);
  • drop node (id = 1, event node = 2);

This is all you need to execute on one of the remaining nodes to fail over from the dead node to a surviving one and to remove the failed node from the cluster. It is a safe and reliable procedure.
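
Wrapped in the same preamble as our earlier scripts, a minimal failover script could look like this (a sketch assuming the connection variables from the previous examples and that node 1 is the failed node):

slonik<<_EOF_
cluster name = first_cluster;

node 1 admin conninfo = 'dbname=$MASTERDB host=$HOST1 user=$DBUSER';
node 2 admin conninfo = 'dbname=$SLAVEDB host=$HOST2 user=$DBUSER';

# promote node 2 to be the new origin of the sets originating on node 1
failover (id = 1, backup node = 2);

# remove the dead node from the cluster configuration
drop node (id = 1, event node = 2);
_EOF_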
