Creating the tables and issuing the queries

After this introduction to Postgres-XC and its underlying ideas, it is time to create our first table and see how the cluster behaves. The next example shows a simple table, which will be distributed using a hash of the id column:

test=# CREATE TABLE t_test (id int4)
DISTRIBUTE BY HASH (id);
CREATE TABLE
test=# INSERT INTO t_test
SELECT * FROM generate_series(1, 1000);
INSERT 0 1000

Once the table has been created, we can add data to it. After completion, we can check if the data has been written correctly to the cluster:

test=# SELECT count(*) FROM t_test;
count
-------
  1000
(1 row)

Not surprisingly, we got 1000 rows in our table.
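If you want to check how those 1000 rows were actually spread across the datanodes, Postgres-XC provides EXECUTE DIRECT, which runs a statement on one named node only. The following is a sketch; the node name (node2) is taken from the execution plan shown later in this section, and the exact syntax may vary between Postgres-XC versions, so adjust it to your setup:

```sql
-- Ask a single datanode how many rows it holds locally.
-- node2 is an assumption based on this example cluster's node names.
EXECUTE DIRECT ON (node2) 'SELECT count(*) FROM t_test';
```

Running this against each datanode in turn should show that every node holds roughly a third of the rows.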

The interesting thing here is to see how the data is returned by the database engine. Let us take a look at the execution plan of our query:

test=# EXPLAIN (VERBOSE TRUE, ANALYZE TRUE, NODES TRUE, NUM_NODES TRUE)
       SELECT count(*) FROM t_test;
                          QUERY PLAN
-------------------------------------------------------
 Aggregate  (cost=2.50..2.51 rows=1 width=0) (actual time=5.967..5.970 rows=1 loops=1)
   Output: pg_catalog.count(*)
   ->  Materialize  (cost=0.00..0.00 rows=0 width=0) (actual time=5.840..5.940 rows=3 loops=1)
         Output: (count(*))
         ->  Data Node Scan (primary node count=0, node count=3) on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=1000 width=0) (actual time=5.833..5.915 rows=3 loops=1)
               Output: count(*)
               Node/s: node2, node3, node4
               Remote query: SELECT count(*) FROM (SELECT id FROM ONLY t_test WHERE true) group_1
 Total runtime: 6.033 ms
(9 rows)

PostgreSQL will perform a so-called Data Node Scan, meaning that the coordinator collects data from all the relevant datanodes in the cluster. If you look closely at the Remote query line, you can see exactly which query is pushed down to those nodes. The important point is that the count(*) itself is shipped to the remote nodes: each datanode counts its own rows, and the coordinator then folds those partial counts into a single result, so only one number per node travels over the network. Such a plan is more complex than a simple local query, but it pays off as the amount of data grows, because each node performs only a subset of the work. This is especially valuable when many queries are running in parallel.
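The scatter-gather pattern described above can be sketched in a few lines of Python. This is a conceptual model only: the modulo hash and the three-node layout are illustrative assumptions, and Postgres-XC uses its own internal hash function, not this one.

```python
# Conceptual model of DISTRIBUTE BY HASH plus a pushed-down count(*).
# The hash (modulo) and node names are assumptions for illustration.

NODES = ["node2", "node3", "node4"]

def distribute(ids, num_nodes):
    """Assign each id to a datanode bucket, as DISTRIBUTE BY HASH would."""
    shards = [[] for _ in range(num_nodes)]
    for i in ids:
        shards[i % num_nodes].append(i)  # modulo stands in for the real hash
    return shards

def count_all(shards):
    """Each node counts its own rows; the coordinator folds the partials."""
    partial_counts = [len(shard) for shard in shards]  # pushed-down count(*)
    return sum(partial_counts)  # final Aggregate step on the coordinator

shards = distribute(range(1, 1001), len(NODES))
print(count_all(shards))  # 1000
```

Note that the coordinator never sees the individual rows here, only one partial count per node, which is exactly why pushing the aggregate down is cheap on the network.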

The Postgres-XC optimizer can push operations down to the datanodes in many cases, which is good for performance. However, you should still keep an eye on your execution plans to make sure they are reasonable.
