After this little introduction we can move forward and replicate our first database. To do so we can create two databases in a database instance. We want to simply replicate between these two databases.
Creating those two databases should be an easy task once your instance is up and running:
hs@hs-VirtualBox:~$ createdb db1
hs@hs-VirtualBox:~$ createdb db2
Now we can create a table, which should be replicated from database db1 to database db2:
db1=# CREATE TABLE t_test (id serial, name text, PRIMARY KEY (id));
NOTICE:  CREATE TABLE will create implicit sequence "t_test_id_seq" for serial column "t_test.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t_test_pkey" for table "t_test"
CREATE TABLE
Create this table in both databases in an identical way because the table structure won't be replicated automatically.
Once this has been done we can write a slonik script to tell the cluster about our two nodes. slonik is a command-line interface that we can use to talk to Slony directly. You could also work with it interactively, but that is far from comfortable.
A script to register these nodes would look as follows:
#!/bin/sh

MASTERDB=db1
SLAVEDB=db2
HOST1=localhost
HOST2=localhost
DBUSER=hs

slonik<<_EOF_
cluster name = first_cluster;

# define nodes
node 1 admin conninfo = 'dbname=$MASTERDB host=$HOST1 user=$DBUSER';
node 2 admin conninfo = 'dbname=$SLAVEDB host=$HOST2 user=$DBUSER';

# init cluster
init cluster (id=1, comment = 'Master Node');

# group tables into sets
create set (id=1, origin=1, comment='Our tables');
set add table (set id=1, origin=1, id=1,
    fully qualified name = 'public.t_test',
    comment='sample table');

store node (id=2, comment = 'Slave node', event node=1);
store path (server = 1, client = 2,
    conninfo='dbname=$MASTERDB host=$HOST1 user=$DBUSER');
store path (server = 2, client = 1,
    conninfo='dbname=$SLAVEDB host=$HOST2 user=$DBUSER');
_EOF_
First of all we define a handful of environment variables. This is not strictly necessary, but it is quite handy: if a connection parameter changes, nothing is forgotten. Then our slonik script starts.
The first thing we have to do is define a cluster name. This is important: in Slony a cluster is more of a virtual entity; it is not necessarily related to physical hardware. We will find out what this means later on when we talk about failover.
In the next step we have to define the nodes of this cluster. The idea is that each node is assigned a number and a connection string. Once this has been done, we can call init cluster. During this step Slony deploys all the infrastructure needed for replication; nothing has to be installed manually.
Now that the cluster has been initialized we can organize our tables into replication sets, which are simply groups of tables. In Slony we always work with replication sets: tables are grouped into sets and replicated together. This layer of abstraction allows us to quickly move groups of tables around, which in many cases is a lot easier than moving individual tables one by one.
Finally we have to define paths. A path is basically the connection string used to get from A to B. The main question is why paths are needed at all: we have already defined the nodes, so why define paths on top of that? The point is that the route from A to B is not necessarily the same as the route from B to A. This is especially important if one of the servers sits in a DMZ while the other one does not. In other words, by defining paths you can easily replicate between different private networks and cross firewalls, doing some NAT if necessary.
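To illustrate the point, the two directions do not have to be symmetric. In a NAT'ed setup the relevant part of a slonik script might look roughly like this (the addresses and port below are made up for the sake of the example):

```
# Hypothetical asymmetric paths: node 2 reaches node 1 through a
# NAT'ed public address, while node 1 reaches node 2 directly.
store path (server = 1, client = 2,
    conninfo = 'dbname=db1 host=203.0.113.10 port=6432 user=hs');
store path (server = 2, client = 1,
    conninfo = 'dbname=db2 host=slave.internal user=hs');
```

Each store path describes how the client node connects to the server node, so each direction can use whatever route actually works through the firewall.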
As the script is a simple shell script we can easily execute it:
hs@hs-VirtualBox:~/slony$ sh slony_first.sh
Slony has done some work in the background. When looking at our test table we can see what has happened:
db1=# \d t_test
                            Table "public.t_test"
 Column |  Type   |                      Modifiers
--------+---------+-----------------------------------------------------
 id     | integer | not null default nextval('t_test_id_seq'::regclass)
 name   | text    |
Indexes:
    "t_test_pkey" PRIMARY KEY, btree (id)
Triggers:
    _first_cluster_logtrigger AFTER INSERT OR DELETE OR UPDATE ON t_test
        FOR EACH ROW EXECUTE PROCEDURE
        _first_cluster.logtrigger('_first_cluster', '1', 'k')
    _first_cluster_truncatetrigger BEFORE TRUNCATE ON t_test
        FOR EACH STATEMENT EXECUTE PROCEDURE _first_cluster.log_truncate('1')
Disabled triggers:
    _first_cluster_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON t_test
        FOR EACH ROW EXECUTE PROCEDURE
        _first_cluster.denyaccess('_first_cluster')
    _first_cluster_truncatedeny BEFORE TRUNCATE ON t_test
        FOR EACH STATEMENT EXECUTE PROCEDURE _first_cluster.deny_truncate()
A handful of triggers have been deployed automatically to keep track of these changes. Each event is covered by a trigger.
Now that this table is under Slony's control we can start to replicate it. To do so we have to come up with a slonik script again:
#!/bin/sh

MASTERDB=db1
SLAVEDB=db2
HOST1=localhost
HOST2=localhost
DBUSER=hs

slonik<<_EOF_
cluster name = first_cluster;

node 1 admin conninfo = 'dbname=$MASTERDB host=$HOST1 user=$DBUSER';
node 2 admin conninfo = 'dbname=$SLAVEDB host=$HOST2 user=$DBUSER';

subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
_EOF_
After stating the cluster name and listing the nodes, we can call subscribe set. In our example, set number 1 is replicated from node 1 (the provider) to node 2 (the receiver). The forward keyword is important to mention here: it indicates whether the new subscriber should store log information during replication, which makes it a candidate for the provider role for future nodes. Any node that is intended to be a candidate for FAILOVER must have forward = yes. In addition, this keyword is essential for cascaded replication (meaning A replicates to B, and B replicates to C).
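A cascaded setup could be subscribed as sketched below. Note that node 3 and its connection details are hypothetical here and would first have to be registered with store node and store path:

```
# Node 2 subscribes with forward = yes so it can act as a provider:
subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
# The hypothetical node 3 then receives the set from node 2, not node 1:
subscribe set (id = 1, provider = 2, receiver = 3, forward = no);
```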
If you execute this script, Slony will truncate the table on the slave and reload all the data to make sure that things are in sync. In many cases you already know that you are in sync and want to avoid copying gigabytes of data over and over again. To achieve that we can add OMIT COPY = yes, which tells Slony that we are sufficiently confident that the data is already in sync.
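In our subscription script the relevant line would then read roughly as follows (assuming a Slony version that supports this option):

```
subscribe set (id = 1, provider = 1, receiver = 2,
    forward = no, omit copy = yes);
```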
After defining what we want to replicate, we can fire up the two slon daemons in our favorite UNIX shell:

$ slon first_cluster 'host=localhost dbname=db1'
$ slon first_cluster 'host=localhost dbname=db2'
This can also be done before the replication route is defined, so the order of these steps is not the primary concern here.
Now we can move forward and check if replication is working nicely:
db1=# INSERT INTO t_test (name) VALUES ('anna');
INSERT 0 1
db1=# SELECT * FROM t_test;
 id | name
----+------
  1 | anna
(1 row)

db1=# \q
hs@hs-VirtualBox:~/slony$ psql db2
psql (9.2.4)
Type "help" for help.

db2=# SELECT * FROM t_test;
 id | name
----+------
(0 rows)

db2=# SELECT * FROM t_test;
 id | name
----+------
  1 | anna
(1 row)
We add a row to the master, quickly disconnect, and check whether the data is already there on the slave. If you are quick enough you will see that the data arrives with a small delay. In our example we managed to catch an empty table, which demonstrates what asynchronous replication really means.
Let us assume you are running a book shop. Your application connects to server A to create a new user. Then the user is redirected to a new page, which queries some information about the new user. Be prepared for the possibility that the data is not there yet on server B. This is a common mistake in many web applications dealing with load balancing. The same kind of delay occurs with asynchronous streaming replication.
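A defensive application therefore retries the read instead of assuming the row is instantly visible on the slave. The following shell sketch is our own illustration, not part of Slony: the wait_for_row helper and the psql invocation in the comment are assumptions based on the example above.

```shell
#!/bin/sh
# Sketch: retry a check command a few times before giving up, to
# tolerate asynchronous replication lag on the slave.
wait_for_row () {
    attempts=0
    while [ "$attempts" -lt 10 ]; do
        if "$@"; then
            return 0        # the row has arrived on the slave
        fi
        attempts=$((attempts + 1))
        sleep 1             # give replication a moment to catch up
    done
    return 1                # still missing after 10 tries
}

# Hypothetical usage against the slave from our example:
# wait_for_row sh -c \
#     "psql db2 -tAc \"SELECT 1 FROM t_test WHERE name = 'anna'\" | grep -q 1"
wait_for_row true && echo "row visible"
```

The check command is passed as arguments, so the same helper works for any query; tune the retry count and sleep interval to the lag you are willing to tolerate.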