Handling failovers and dropping nodes

In this section, we will take a look at how failovers can be handled. We will also see how nodes can be added to and removed from a Postgres-XC setup in a safe and reliable way.

Handling node failovers

If you execute a query in Postgres-XC, it might be dispatched to many different nodes inside the cluster. For example, performing a sequential scan on a highly partitioned table will involve many different nodes. The question now is: What happens if one or more of those data nodes are down?
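To make this more tangible, consider a table distributed by hash: its rows are spread across all data nodes, so even a simple aggregate has to visit every one of them. The table and column names in this sketch are made up for illustration:

test=# CREATE TABLE t_sales (id int4, amount numeric)
           DISTRIBUTE BY HASH (id);
CREATE TABLE
test=# -- this scan has to read a fragment of t_sales from every
test=# -- data node; if one of them is down, the query will fail
test=# SELECT count(*) FROM t_sales;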

The answer is pretty simple: Postgres-XC will not be able to serve requests that make use of failed nodes. This is a problem for both reads and writes. A query trying to fetch data from a failed node will return an error indicating that no connection is available.

For you as a user, this means that if you are running Postgres-XC, you have to come up with a proper failover and High Availability (HA) strategy for your system. We recommend creating replicas of all the nodes to make sure that the coordinators can always reach an alternative node in case the primary data node fails. Linux-HA is a good option for making nodes failsafe and achieving fast failovers.

At the moment, it is not possible to solely rely on Postgres-XC to create an HA strategy.

Replacing the nodes

Once in a while, it might happen that you want to drop a node. To do so, you can simply call DROP NODE from your psql shell:

test=# \h DROP NODE
Command:     DROP NODE
Description: drop a cluster node
Syntax:
DROP NODE nodename

If you want to perform this kind of operation, you have to make sure that you are a superuser. Normal users are not allowed to remove a node from the cluster.
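As a superuser, the statement itself is trivial. The node name in the following sketch is just an assumption; depending on your version, you may also have to reload the coordinator's connection pool afterwards:

test=# DROP NODE node4;
DROP NODE
test=# -- refresh the pooler's view of the cluster (version dependent)
test=# SELECT pgxc_pool_reload();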

Whenever you drop a node, make sure that there is no data left on it that you still need. Removing a node is merely a change in Postgres-XC's metadata, so the operation will be quick, but the data stored on that node will no longer be visible to the cluster.

One issue remains: How can you actually figure out where the data is located? Postgres-XC provides a set of system tables that allow you to retrieve information about nodes, data distribution, and so on. The following example shows how a table can be created and how we can find out where it resides:

test=# CREATE TABLE t_location (id int4)
           DISTRIBUTE BY REPLICATION;
CREATE TABLE
test=# SELECT node_name, pcrelid, relname
       FROM   pgxc_class AS a, pgxc_node AS b, pg_class AS c
       WHERE  a.pcrelid = c.oid
          AND b.oid = ANY (a.nodeoids);
 node_name | pcrelid |  relname
-----------+---------+------------
 node2     |   16406 | t_location
 node3     |   16406 | t_location
 node4     |   16406 | t_location
(3 rows)

In our case, the table has been replicated to all nodes.
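If you are not sure which nodes exist in the first place, pgxc_node can be queried directly. The column list below matches Postgres-XC 1.x; double-check it against your release:

test=# SELECT oid, node_name, node_type, node_host, node_port
       FROM   pgxc_node;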

There is one tricky thing you have to keep in mind when dropping nodes: If you drop a node and recreate it with the same name and connection parameters, it will not be the same node. When a new node is created, it will get a new object ID, and in PostgreSQL, the name is not as relevant as the object ID. This means that if you drop a node accidentally and recreate it using the same name, you will still face problems. Of course, you can always work around this by tweaking the system tables by hand, but this is not something you should do.
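The following sketch demonstrates the problem; the host and port are assumptions. After recreating node4, pgxc_node shows a new object ID, while pgxc_class still carries the old one in its nodeoids column:

test=# DROP NODE node4;
DROP NODE
test=# CREATE NODE node4 WITH (TYPE = 'datanode',
           HOST = 'node4.mydomain.com', PORT = 5432);
CREATE NODE
test=# -- the recreated node has a brand new oid ...
test=# SELECT oid, node_name FROM pgxc_node WHERE node_name = 'node4';
test=# -- ... but existing tables still reference the old oid here
test=# SELECT pcrelid, nodeoids FROM pgxc_class;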

Therefore, we highly recommend being very cautious when dropping nodes from a production system.

Running a GTM standby

A Datanode is not the only component whose failure can cause downtime. The Global Transaction Manager (GTM) should also be made failsafe to make sure that nothing can go wrong in case of a disaster. If the transaction manager is gone, there is no way to make use of your Postgres-XC cluster.

To make sure that the GTM cannot be a single point of failure, you can use a GTM standby. Configuring a GTM standby is not hard to do. All you have to do is to create a GTM config on a spare node and set a handful of parameters in gtm.conf:

startup = STANDBY
active_host = 'somehost.somedomain.com'
active_port = '6666'
synchronous_backup = off

First of all, we have to set the startup parameter to STANDBY. This tells the GTM to behave as a standby. Then we have to tell the standby where to find the active production GTM, which we do by setting its hostname and port.

Finally, we can decide whether the GTM should be replicated synchronously or asynchronously.

To start the standby, we can use gtm_ctl again. This time, we pass -Z gtm_standby to mark the node as a standby.
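Assuming the standby's configuration lives in /data/gtm_standby (an arbitrary path chosen for this example), the call might look like this:

gtm_ctl start -Z gtm_standby -D /data/gtm_standby

gtm_ctl also offers a promote operation to turn the standby into the active GTM during a failover; check the documentation of your release for the exact invocation.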
