Setting up PL/Proxy

After this brief theoretical introduction, we can move forward and run some simple PL/Proxy setups. To do so, we simply install PL/Proxy and see how it can be utilized.

Installing PL/Proxy is an easy task. First of all, we have to download the source code from http://pgfoundry.org/projects/plproxy/. Of course, you can also install binary packages if prebuilt packages are available for your operating system. However, in this section, we will simply perform an installation from the source and see how things work on a very basic level.

The first step in the installation process is to unpack the TAR archive. This can be easily done using the following command:

tar xvfz plproxy-2.5.tar.gz

Once the TAR archive has been unpacked, we can enter the newly created directory and start the compilation process by simple calling make && make install.

Tip

Please make sure that your PATH variable points to the PostgreSQL binary directory. Depending on your current setup, it might also be necessary to run your installation procedure as root.

If you want to make sure that your installation is really fine, you can also run make installcheck. It runs some simple tests to make sure your system is operating correctly.

A basic example

To get you started, we want to set up PL/Proxy in such a way that we can fetch random numbers from all the four partitions. This is the most basic example. It will show all the basic concepts of PL/Proxy.

To enable PL/Proxy, we have to load the extension into the database first:

test=# CREATE EXTENSION plproxy;
CREATE EXTENSION

This will install all the relevant code and infrastructure you need to make this work. Then we want to create four databases, which will carry the data we want to partition:

test=# CREATE DATABASE p0;
CREATE DATABASE
test=# CREATE DATABASE p1;
CREATE DATABASE
test=# CREATE DATABASE p2;
CREATE DATABASE
test=# CREATE DATABASE p3;
CREATE DATABASE

Once we have created those databases, we can run CREATE SERVER. The question is: What is a SERVER? Well, in this context you can see a SERVER as some kind of remote data source providing you with the data you need. A SERVER is always based on a module (in our case, PL/Proxy) and may carry a handful of options. In the case of PL/Proxy, those options are just a list of partitions; there can be some additional parameters as well, but the list of nodes is by far the most important thing here:

CREATE SERVER samplecluster FOREIGN DATA WRAPPER plproxy
    OPTIONS (   partition_0 'dbname=p0 host=localhost',
partition_1 'dbname=p1 host=localhost',
partition_2 'dbname=p2 host=localhost',
partition_3 'dbname=p3 host=localhost'),

Once we have created the server, we can move ahead and create ourselves a nice user mapping. The purpose of a user mapping is to tell the system what user we are going to be on the remote data source. It might very well happen that we are user A on the proxy, but user B on the underlying database servers. If you are using a foreign data wrapper to connect to, say, Oracle, this will be essential. In the case of PL/Proxy, it is quite often the case that the users on those partitions and the proxy are simply the same.

So, we can create a mapping as follows:

CREATE USER MAPPING FOR hs SERVER samplecluster;

If we are working as a superuser on the system, this will be enough. If we are not a superuser, we have to grant permissions to the user who is supposed to use our virtual server. We have to grant USAGE permissions to get this done:

GRANT USAGE ON FOREIGN SERVER samplecluster TO hs;

To see if our server has been created successfully, we can check out the pg_foreign_server system table. It holds all the relevant information about our virtual server. Whenever you want to figure out which partitions are present, you can simply consult the system table and inspect srvoptions:

test=# x
Expanded display is on.
test=# SELECT * FROM pg_foreign_server;
-[ RECORD 1 ]
srvname    | samplecluster
srvowner   | 10
srvfdw     | 16744
srvtype    | 
srvversion | 
srvacl     | {hs=U/hs}
srvoptions | {"partition_0=dbname=p0 host=localhost","partition_1=dbname=p1 host=localhost","partition_2=dbname=p2 host=localhost","partition_3=dbname=p3 host=localhost"}

As we have mentioned before, PL/Proxy is primarily a stored procedure language. We have to run a stored procedure to fetch data from our cluster. In our example, we want to run a simple SELECT statement on all the nodes of samplecluster:

CREATE OR REPLACE FUNCTION get_random() RETURNS setof text AS $$
    CLUSTER 'samplecluster';
    RUN ON ALL;
    SELECT random();
$$ LANGUAGE plproxy;

The procedure is just like an ordinary stored procedure. The only special thing here is that it has been written in PL/Proxy. The CLUSTER keyword will tell the system which cluster to take. In many cases, it can be useful to have more than just one cluster (maybe if different data sets are present on different sets of servers).

Then we have to define where to run the code. We can run on ANY (any server), ALL (on all servers) or on a specific server. In our example we have decided to run on all servers.

The most important thing here is that when the procedure is called we will get one row per node because we used RUN ON ALL. In the case of RUN ON ANY, we would have just got one row because the query would have been executed on any node inside the cluster:

test=# SELECT * FROM get_random();
get_random
-------------------
 0.879995643626899
 0.442110917530954
 0.215869579929858
 0.642985367681831
(4 rows)

Partitioned reads and writes

After this example, we want to focus on using PL/Proxy to partition the reads. Remember, the purpose of PL/Proxy is to spread the load that we want to scale out to more than just one database system.

To demonstrate how this works, we want to distribute user data to our four databases. In the first step, we have to create a simple table on all the four databases inside the cluster:

p0=# CREATE TABLE t_user (
  username  text, 
  password  text, 
  PRIMARY KEY (username)
);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t_user_pkey" for table "t_user"
CREATE TABLE

Once we have created the data structure, we can come up with a procedure to actually dispatch data into this cluster. A simple PL/Proxy procedure will do the job:

CREATE OR REPLACE FUNCTION create_user(name text, 
pass text) RETURNS void AS $$
        CLUSTER 'samplecluster';
        RUN ON hashtext($1);
$$ LANGUAGE plproxy;

The point here is that PL/Proxy will inspect the first input parameter and run a procedure called create_user on the desired node. RUN ON hashtext($1) will be our partitioning function. So the goal here is to find the right node and execute the very same procedure there. The important part is that on the desired node, the create_user procedure won't be written in PL/Proxy but simply in SQL, PL/pgSQL, or any other language. The only purpose of the PL/Proxy function is to find the right node to execute the underlying procedure.

The procedure on each of the nodes that actually puts the data into the table is pretty simple:

CREATE OR REPLACE FUNCTION create_user(name text, 
pass text) 
     RETURNS void AS $$
        INSERT INTO t_user VALUES ($1, $2);
$$ LANGUAGE sql;

It is simply an INSERT statement wrapped into a stored procedure that can do the actual work on those nodes.

Once we have deployed this procedure on all the four nodes, we can give it a try:

SELECT create_user('hans', 'paul'),

The PL/Proxy procedure in the test database will hash the input value and figure out that the data has to be on p3, which is the fourth node:

p3=# SELECT * FROM t_user;
username | password 
----------+----------
hans     | paul
(1 row)

The following SQL statement will reveal why the fourth node is correct:

test=# SELECT hashtext('hans')::int4::bit(2)::int4;
hashtext
----------
        3
(1 row)

Please keep in mind that we will start counting at 0, so the fourth node is actually number 3.

Tip

Keep in mind that the partitioning function can be any deterministic routine. However, we strongly advise keeping it as simple as possible.

In our example, we have executed a procedure on the proxy and relied on the fact that a procedure with the same name will be executed on the slave. But what if you want to call a procedure on the proxy that is supposed to execute some other procedure in the desired node? To map a proxy procedure to some other procedure, there is a command called TARGET.

To map create_user to create_new_user, just add the following line to your PL/Proxy function:

CREATE OR REPLACE FUNCTION create_user(name text, 
pass text) RETURNS void AS $$
        CLUSTER 'samplecluster';
  TARGET create_new_user; 
        RUN ON hashtext($1);
$$ LANGUAGE plproxy;
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.104.27