Postgres-XC is not just a fancy version of PostgreSQL but a truly distributed system. This means that you cannot just store data and expect things to be fast and efficient out of the box. If you want to optimize for speed, it can be highly beneficial to think about how data is stored behind the scenes and how queries are executed.
Sure, you can just load data and things will work, but if performance really matters, you should think carefully about how you use your data. Keep in mind that there is no point in using a distributed database system if your load is low. So, if you are a user of Postgres-XC, we expect your load and your requirements to be high.
One of the most important questions is where to store data. Postgres-XC cannot know what you are planning to do with your data and what kind of access pattern you are planning to run. To make sure that users get some control over where to store data, CREATE TABLE offers some syntax extensions:
[ DISTRIBUTE BY { REPLICATION | ROUND ROBIN | { [ HASH | MODULO ] ( column_name ) } } ]
[ TO { GROUP groupname | NODE nodename [, ... ] } ]
The DISTRIBUTE BY clause allows you to specify where to store a table. If you want to tell Postgres-XC that a table has to be on every node in the cluster, we recommend using REPLICATION. This is especially useful if you are creating a small lookup table or some table that is frequently used in many queries.
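A replicated lookup table can be created like this (the table and column names are invented for this sketch):

```
-- hypothetical lookup table; a full copy is kept on every Datanode
CREATE TABLE t_currency (
    code    char(3),
    name    text
) DISTRIBUTE BY REPLICATION;
```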
If the goal is to scale out, it is recommended to spread a table across a list of nodes. Why would anybody want to split a table? The reason is actually quite simple: if you have full replicas of a table on all the Datanodes, every write has to be performed on every node. Clearly, this is not more scalable than a single node because each node has to take the entire write load. For large tables facing heavy writes, it can therefore be beneficial to split the table across various nodes. Postgres-XC offers various ways to do that.
ROUND ROBIN will just spread the data more or less evenly without inspecting its content, HASH will dispatch rows based on the hash value of a key column, and MODULO will dispatch rows based on the value of a key column modulo the number of nodes.
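For a large, write-heavy table, hash distribution could look like this (the table name and columns are assumptions for illustration):

```
-- hypothetical write-heavy table; rows are spread across
-- the Datanodes based on a hash of the id column
CREATE TABLE t_event (
    id       int8,
    payload  text
) DISTRIBUTE BY HASH (id);
```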
To make management a little easier, Postgres-XC allows you to group nodes into so-called node groups. This can come in pretty handy if a table is not supposed to reside on all the nodes inside the cluster, but just on, say, half of them.
To group nodes, you can call CREATE NODE GROUP:
test=# \h CREATE NODE GROUP
Command:     CREATE NODE GROUP
Description: create a group of cluster nodes
Syntax:
CREATE NODE GROUP groupname
    WITH nodename [, ... ]
Keep in mind that a node group is static; you cannot add nodes to it later on. So, when you start to organize your cluster, you have to think beforehand about which node groups your cluster will need.
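Putting the pieces together, creating a node group and placing a table on it might look like this (the group, node, and table names are assumptions for this sketch):

```
-- hypothetical node group covering half of a four-node cluster
CREATE NODE GROUP left_half WITH node1, node2;

-- this table resides only on the nodes in that group
CREATE TABLE t_log (
    id   int8,
    msg  text
) DISTRIBUTE BY HASH (id) TO GROUP left_half;
```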
In addition to that, it is pretty hard to reorganize the data once it has been dispatched. If a table is spread to, say, four nodes, you cannot just easily add a fifth node to handle the table. First of all, adding a fifth node would require a rebalance, and secondly, most of those features are still under construction and not yet fully available to end users.
Dispatching your data cleverly is essential if you want to join the data efficiently. Let us assume a simple scenario consisting of three tables: t_person, t_person_payment, and t_postal_code.
Let us assume that we have to join this data frequently. In this scenario, it is highly recommended to partition t_person and t_person_payment by the very same join key. Doing that will enable Postgres-XC to join and merge a lot of data locally on the Datanodes instead of having to ship rows around inside the cluster. Of course, we can also create a full replica of the t_person table if it is read so often that this makes sense.
t_postal_code is a typical example of a table that might be replicated to all the nodes. We can expect postal codes to be pretty static; in real life, postal codes basically never change (certainly not at a rate of 1,000 changes per second). The table will also be really small, and it will be needed by many other joins as well. A full replica makes perfect sense here.
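Under these assumptions, the distribution scheme for the three tables could be expressed like this (all column names are invented for this sketch):

```
-- co-locate persons and their payments by the join key,
-- so joins can be executed locally on each Datanode
CREATE TABLE t_person (
    id           int8,
    name         text,
    postal_code  text
) DISTRIBUTE BY HASH (id);

CREATE TABLE t_person_payment (
    person_id  int8,
    amount     numeric
) DISTRIBUTE BY HASH (person_id);

-- small, static lookup table: replicate it everywhere
CREATE TABLE t_postal_code (
    postal_code  text,
    place        text
) DISTRIBUTE BY REPLICATION;
```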
When coming up with a proper partitioning logic, we just want to remind you of a simple rule: Try to do calculations locally, that is, try to avoid moving data around at any cost.
If your goal is to use Postgres-XC to do business intelligence and data warehousing, you have to make sure that your scan speed will stay high. This can be achieved by using as much hardware as possible at the same time. Scaling out your fact tables to many hosts will make perfect sense here.
We also suggest fully replicating fairly small lookup tables so that as much work as possible can be performed on the Datanodes. What does small mean in this context? Let us imagine that you are storing information about millions of people around the world. You might want to split that data across many nodes; however, it would clearly not make sense to split the list of potential countries. The number of countries on this planet is limited, so it is simply more viable to have a copy of this data on all the nodes.
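In a warehousing setup, this typically means spreading the fact table while replicating the small dimension (the table layout below is, again, just an assumption):

```
-- small, rarely changing dimension table: replicate it
CREATE TABLE t_country (
    code  char(2),
    name  text
) DISTRIBUTE BY REPLICATION;

-- large fact table: spread it across all Datanodes so that
-- scans can use the hardware of the entire cluster
CREATE TABLE t_sales (
    id            int8,
    country_code  char(2),
    amount        numeric
) DISTRIBUTE BY HASH (id);
```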
Requesting transaction IDs from the GTM is a fairly expensive process. If you are running a large Postgres-XC setup that is supposed to handle online transaction processing (OLTP) workloads, the GTM can actually turn out to be a bottleneck. The more global transaction IDs we need, the more important the performance of the GTM will be.
To get around this issue, we can introduce a GTM Proxy. The core idea is that transaction IDs are requested in larger blocks, which helps us to avoid network traffic and especially latency. The concept is pretty similar to how group commit works in PostgreSQL.
How can a simple GTM Proxy be set up? First of all, we have to create a directory where the config is supposed to exist. Then we can make the following call:
initgtm -D /path_to_gtm_proxy/ -Z gtm_proxy
This will create a sample config, which we can easily adapt. After defining a nodename, we should set gtm_host and gtm_port to point to the active GTM. Then we can set the number of worker threads to a reasonable value to make sure that we can handle more load. Usually, we configure the GTM Proxy in such a way that the number of worker_threads matches the number of nodes in the system. This has proven to be a robust configuration.
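The relevant part of the resulting configuration file could then look roughly like this (the node name, host, port, and thread count are example values, not defaults to rely on):

```
# gtm_proxy.conf (excerpt)
nodename = 'gtm_proxy1'    # name of this proxy instance
gtm_host = 'gtm-master'    # host running the active GTM (assumed hostname)
gtm_port = 6666            # port the active GTM listens on
worker_threads = 4         # e.g. one thread per node in a four-node cluster
```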
Finally we can start the proxy infrastructure:
gtm_ctl -D /path_to_gtm_proxy/ -Z gtm_proxy start
The GTM proxy is now available to our system.