JSON is a universal data structure that is both human- and machine-readable. It is supported by almost all modern programming languages, embraced as a data interchange format, and heavily used in RESTful web services.
XML and JSON are both used to define the data structure of exchanged documents. JSON grammar is simpler than that of XML, JSON documents are more compact, and JSON is easier to read and write. On the other hand, XML can have a defined data structure enforced by an XML Schema Definition (XSD). The two formats thus have different usages as exchange formats; based on personal experience, JSON is often used within the organization itself or with web services and mobile applications due to its simplicity, while XML is used to define highly structured documents and formats that guarantee interoperability in data exchange between different organizations. For example, several Open Geospatial Consortium (OGC) standards, such as the web map service, use XML as an exchange format with a defined XSD schema.
PostgreSQL supports two JSON data types, namely JSON and JSONB, both of which are implementations of RFC 7159 and can be used to enforce JSON rules. The two types are almost identical; however, JSONB is more efficient, as it stores JSON documents in a binary format, and it also supports indexes. When using JSON, it is preferable to have UTF8 as the database encoding to ensure that the JSON type conforms to the RFC 7159 standard. When data is stored as JSON, the JSON object is kept in textual format, whereas when it is stored as JSONB, the JSON primitive data types, namely string, Boolean, and number, are mapped to text, Boolean, and numeric, respectively.
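This mapping can be verified with the jsonb_typeof function, which reports the JSON type of a given value; a minimal sketch:

-- jsonb_typeof reports the JSON type of a value: string, number, boolean, object, array, or null
SELECT jsonb_typeof('"abc"'::jsonb) AS s, jsonb_typeof('1.5'::jsonb) AS n, jsonb_typeof('true'::jsonb) AS b;
-- Returns: string | number | boolean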
When casting text to the json type, the text is stored and rendered without any processing, so white space, numeric formatting, and the order of elements are preserved. JSONB does not preserve these details; in addition, JSONB keeps only a single value for a duplicated key, as the second insert in the following example demonstrates:
CREATE TABLE test_json (doc json);
CREATE TABLE test_jsonb (doc jsonb);

INSERT INTO test_json VALUES ('{"car_id":1, "model":"BMW"}'::json), ('{"name":"some name", "name":"some name"}'::json);
INSERT INTO test_jsonb VALUES ('{"car_id":1, "model":"BMW"}'::jsonb), ('{"name":"some name", "name":"some name"}'::jsonb);

SELECT * FROM test_json;
                    doc
-------------------------------------------
 {"car_id":1, "model":"BMW"}
 {"name":"some name", "name":"some name"}
(2 rows)

SELECT * FROM test_jsonb;
              doc
-------------------------------
 {"model": "BMW", "car_id": 1}
 {"name": "some name"}
(2 rows)
JSON objects can contain other nested JSON objects, arrays, nested arrays, arrays of JSON objects, and so on. JSON arrays and objects can be nested arbitrarily, allowing the developer to construct complex JSON documents. The array elements in a JSON document can also be of different types. The following example shows how to construct an account with name as a text value, address as a JSON object, and rank as an array:
INSERT INTO test_jsonb VALUES ('{"name":"John", "Address":{"Street":"Some street", "city":"Some city"}, "rank":[5,3,4,5,2,3,4,5]}'::jsonb);
INSERT 0 1
A JSON object field can be retrieved as a JSON object or as text. Also, JSON fields can be retrieved using either the field index or the field name. The following table summarizes the JSON retrieval operators:
| Return type: JSON | Return type: text | Description |
|---|---|---|
| -> | ->> | Returns a JSON field, using either the field index or the field name |
| #> | #>> | Returns a JSON field defined by a specified path |
To get the Address and city from the JSON object created before, one could use two methods, as follows (note that the field names of JSON objects are case sensitive):
SELECT doc->'Address'->>'city', doc#>>'{Address, city}' FROM test_jsonb WHERE doc->>'name' = 'John';
 ?column?  | ?column?
-----------+-----------
 Some city | Some city
(1 row)
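The field-index form of these operators works on arrays as well; for instance, the following sketch fetches the first element of the rank array (JSON array indexes are zero-based):

-- Fetch the first element of the rank array as text
SELECT doc->'rank'->>0 FROM test_jsonb WHERE doc->>'name' = 'John';
-- Returns: 5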
Currently, there is no way to update or delete a single field in a JSON object; however, one could work around this by deconstructing and reconstructing the JSON object. This can be done by converting the JSON object to a key/value set using the json_each and jsonb_each functions. The following example shows how to decompose the JSON object in the previous example:
WITH RECURSIVE json_get(key, value, parent) AS (
    SELECT (jsonb_each(doc)).*, 'null'::text
    FROM test_jsonb
    WHERE doc->>'name' = 'John'
  UNION ALL
    SELECT (jsonb_each(value)).*, key
    FROM json_get
    WHERE jsonb_typeof(value) = 'object'
)
SELECT * FROM json_get;
   key   |                     value                      | parent
---------+------------------------------------------------+---------
 name    | "John"                                         | null
 rank    | [5, 3, 4, 5, 2, 3, 4, 5]                       | null
 Address | {"city": "Some city", "Street": "Some street"} | null
 city    | "Some city"                                    | Address
 Street  | "Some street"                                  | Address
(5 rows)
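Once decomposed, the key/value pairs can be reassembled with the json_object_agg aggregate. The following sketch, for instance, rebuilds the document with the name field replaced; note that the aggregate returns json, so the result is cast back to jsonb through text:

-- Rebuild the JSON object pair by pair, swapping in a new value for "name"
SELECT json_object_agg(key,
         CASE WHEN key = 'name' THEN '"Jane"'::jsonb ELSE value END
       )::text::jsonb AS doc
FROM test_jsonb, jsonb_each(doc)
WHERE doc->>'name' = 'John';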
Another simpler approach is to convert the JSON object to text, then use regular expressions to replace or delete an element, and finally cast the text to JSON again. To delete the rank from the account object, one can do the following:
SELECT (regexp_replace(doc::text, '"rank":(.*)],', ''))::jsonb
FROM test_jsonb WHERE doc->>'name' = 'John';
                               regexp_replace
-----------------------------------------------------------------------------
 {"name": "John", "Address": {"city": "Some city", "Street": "Some street"}}
(1 row)
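The same technique covers replacement; for example, the following sketch renames the account. It is inherently fragile, as regular expressions are unaware of the JSON structure:

-- Replace the value of the "name" field via plain text substitution
SELECT (regexp_replace(doc::text, '"name": "John"', '"name": "Jane"'))::jsonb
FROM test_jsonb WHERE doc->>'name' = 'John';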
A third option to manipulate JSON objects is to use a procedural language, such as PL/Python, or manipulate the JSON object in the business logic.
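To illustrate the procedural route, the following is a minimal PL/Python sketch; it assumes that the plpythonu language is installed, and remove_key is a hypothetical helper name:

CREATE OR REPLACE FUNCTION remove_key(doc jsonb, key_to_drop text) RETURNS jsonb AS $$
    # PL/Python receives the jsonb argument as a string: parse it, drop the key, and serialize it back
    import json
    obj = json.loads(doc)
    obj.pop(key_to_drop, None)
    return json.dumps(obj)
$$ LANGUAGE plpythonu;

-- Usage: SELECT remove_key(doc, 'rank') FROM test_jsonb;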
JSONB documents can be indexed using the GIN index, and the index can be used for the following operators:
- @>: Does the left JSON value contain the right value?
- ?: Does the key string exist in the JSON doc?
- ?|: Do any of the elements in the text array exist in the JSON doc?
- ?&: Do all of the keys/elements in the text array exist in the JSON doc?

To see the effect of indexing on the test_jsonb table, let's create an index and disable sequential scans, as follows:
CREATE INDEX ON test_jsonb USING GIN (doc);
SET enable_seqscan = off;

EXPLAIN SELECT * FROM test_jsonb WHERE doc @> '{"name":"John"}';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on test_jsonb  (cost=12.00..16.01 rows=1 width=32)
   Recheck Cond: (doc @> '{"name": "John"}'::jsonb)
   ->  Bitmap Index Scan on test_jsonb_doc_idx  (cost=0.00..12.00 rows=1 width=0)
         Index Cond: (doc @> '{"name": "John"}'::jsonb)
(4 rows)
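The same index serves the existence operators as well; for instance, the following sketch checks which documents contain a rank key, and the plan should again show a bitmap index scan on test_jsonb_doc_idx:

-- The ? operator tests for the existence of a top-level key
EXPLAIN SELECT * FROM test_jsonb WHERE doc ? 'rank';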
It is convenient to provide an interface that shares commonly used data between several applications via a RESTful API. Let's assume we have a table that is used by several applications; one way to make these applications aware of this table is to create a data access object (DAO) for it, wrap it in a library, and then reuse this library in these applications. This approach has some disadvantages, such as resolving library dependencies and mismatching library versions. Also, deploying a new version of the library requires a lot of effort, because the applications using it need to be recompiled, tested, and redeployed.
The advantage of providing a RESTful API for the PostgreSQL database is that it allows easy access to data. It also allows developers to adopt a microservice architecture, which leads to better agility.
There are several open source frameworks to set up a RESTful API for PostgreSQL. To set up a complete CRUD API using Nginx, one can use the Nginx PostgreSQL upstream module, ngx_postgres, which is very easy to use and well supported. Unfortunately, it is not available as a Debian package, which means that one needs to compile Nginx and install it manually.
In the following example, a RESTful API to retrieve data is presented using Nginx and memcached, assuming that the data fits completely in memory. PostgreSQL pushes data to the memcached server, and Nginx pulls it from there; memcached thus caches the data and acts as an intermediary layer between PostgreSQL and Nginx.
As memcached uses a key/value data structure, one needs to identify the key to get the data. Nginx uses the URI and the passed arguments to map a certain request to a memcached value. The following example does not cover RESTful API best practices; it is only a technical demonstration. Also, in a production environment, one could combine caching with the PostgreSQL Nginx upstream module, ngx_postgres, to achieve a highly performant CRUD system; combining memcached and ngx_postgres also removes the limitation imposed by the RAM size.
In order to push data to a memcached server, the PostgreSQL pgmemcache module is required. Take a look at the following code:
apt-get install postgresql-9.4-pgmemcache
psql -d template1 -c "CREATE EXTENSION pgmemcache"
To install the Nginx and memcached server on Linux, one can execute the following commands:
apt-get install nginx
apt-get install memcached
Also, to register the memcached server permanently with the PostgreSQL server, one can add the following custom variable in the customized options section of the postgresql.conf file:
$ echo "pgmemcache.default_servers = 'localhost'" >> /etc/postgresql/9.4/main/postgresql.conf
$ /etc/init.d/postgresql reload
To test the memcached and pgmemcache setup, one can use the memcache_add(key, value) and memcache_get(key) functions to populate and retrieve a dummy memcached value, as follows:
template1=# SELECT memcache_add('/1', 'hello');
 memcache_add
--------------
 t
(1 row)

template1=# SELECT memcache_get('/1');
 memcache_get
--------------
 hello
(1 row)
In order to allow Nginx to access memcached, Nginx needs to be configured, and its configuration needs to be reloaded for the changes to take effect. The following is a minimal Nginx configuration file, nginx.conf, that allows access to the memcached server. Note that in Ubuntu, the Nginx configuration file is located at /etc/nginx/nginx.conf:
user www-data;
worker_processes 4;
pid /run/nginx.pid;

events {
    worker_connections 800;
}

http {
    server {
        location / {
            set $memcached_key "$uri";
            memcached_pass 127.0.0.1:11211;
            default_type application/json;
            add_header x-header-memcached true;
        }
    }
}
In the preceding example, the Nginx web server serves responses from the memcached server specified by the memcached_pass directive. The response is looked up in memcached by key, which here is the uniform resource identifier (URI). The default response type is JSON. Finally, the x-header-memcached header is set to true to ease troubleshooting.
To test the Nginx server settings, let's retrieve the value of the /1 key generated earlier by the pgmemcache PostgreSQL extension, as follows:
$ curl -I -X GET http://127.0.0.1/1
HTTP/1.1 200 OK
Server: nginx/1.4.6 (Ubuntu)
Date: Wed, 22 Jul 2015 14:50:26 GMT
Content-Type: application/json
Content-Length: 5
Connection: keep-alive
x-header-memcached: true
Note that the server responded successfully to the request, the response type is marked as JSON, and the response was served by memcached, as shown by the header.
Let's assume that we want a RESTful web service presenting user account information, including the account ID, first name, last name, and e-mail, from the account table. The row_to_json() function can be used to construct a JSON document from a relational row, as follows:
car_portal=> SELECT row_to_json(row(account_id, first_name, last_name, email)) FROM car_portal_app.account LIMIT 1;
                         row_to_json
-------------------------------------------------------------
 {"f1":1,"f2":"James","f3":"Butt","f4":"[email protected]"}
(1 row)

car_portal=> SELECT row_to_json(account) FROM car_portal_app.account LIMIT 1;
 row_to_json
-------------------------------------------------------------------------------------------------------------------------------------
 {"account_id":1,"first_name":"James","last_name":"Butt","email":"[email protected]","password":"1b9ef408e82e38346e6ebebf2dcc5ece"}
(1 row)
In the preceding example, the usage of the row(account_id, first_name, last_name, email) construct caused the row_to_json function to be unable to determine the attribute names, so the names were replaced with f1, f2, and so on. To work around this, one needs to give a name to the row. This can be done in several ways, such as using subqueries or giving an alias to the result; the following example shows one way to resolve the issue by specifying aliases using a CTE:
car_portal=> WITH account_info(account_id, first_name, last_name, email) AS (
    SELECT account_id, first_name, last_name, email
    FROM car_portal_app.account LIMIT 1
)
SELECT row_to_json(account_info) FROM account_info;
                                      row_to_json
---------------------------------------------------------------------------------------
 {"account_id":1,"first_name":"James","last_name":"Butt","email":"[email protected]"}
(1 row)
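For comparison, the following sketch is the subquery variant; naming the row through a subquery alias yields the same result:

-- The alias "t" names the row, so row_to_json can pick up the attribute names
SELECT row_to_json(t)
FROM (SELECT account_id, first_name, last_name, email
      FROM car_portal_app.account LIMIT 1) AS t;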
To generate entries for the account table using account_id, the primary key, as the hash key, one can use the following:
SELECT memcache_add('/' || account_id,
    (SELECT row_to_json(foo)
     FROM (SELECT account_id, first_name, last_name, email) AS foo)::text)
FROM car_portal_app.account;
At this point, one can access the data from the Nginx server. To test this, let's get the JSON representation of the account with account_id equal to 1, as follows:
$ curl -sD - -X GET http://127.0.0.1/1
HTTP/1.1 200 OK
Server: nginx/1.4.6 (Ubuntu)
Date: Fri, 24 Jul 2015 16:38:13 GMT
Content-Type: application/json
Content-Length: 82
Connection: keep-alive
x-header-memcached: true

{"account_id":1,"first_name":"James","last_name":"Butt","email":"[email protected]"}
To keep the account table data consistent with the memcached server, one can add a trigger that reflects changes performed on the table in memcached. One also needs to be careful when handling transaction rollbacks, and come up with an appropriate caching strategy to reduce data inconsistency between the cache and the database, because pgmemcache operations are not transactional. The following example shows the effect of a rollback on memcached data:
car_portal=# BEGIN;
BEGIN
car_portal=# SELECT memcache_add('is_transactional?', 'No');
 memcache_add
--------------
 t
(1 row)

car_portal=# ROLLBACK;
ROLLBACK
car_portal=# SELECT memcache_get('is_transactional?');
 memcache_get
--------------
 No
(1 row)
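As mentioned above, a trigger can keep the cache in sync with the table. The following is a minimal sketch of such a synchronization trigger, covering only INSERT and UPDATE; account_cache_sync is a hypothetical name, and memcache_set overwrites the cached value unconditionally:

CREATE OR REPLACE FUNCTION account_cache_sync() RETURNS trigger AS $$
BEGIN
  -- Overwrite the cached JSON document for the affected account
  PERFORM memcache_set('/' || NEW.account_id,
    (SELECT row_to_json(t)
     FROM (SELECT NEW.account_id AS account_id, NEW.first_name AS first_name,
                  NEW.last_name AS last_name, NEW.email AS email) AS t)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER account_cache_sync AFTER INSERT OR UPDATE ON car_portal_app.account
  FOR EACH ROW EXECUTE PROCEDURE account_cache_sync();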