Chapter 8. JSON and NoSQL

In Chapter 7, we zoomed the lens to the client side in the client–server relationship of the Web. In this chapter, we are going to zoom that lens to the server side to a type of database that not only uses JSON documents to store data, but interfaces with the outside world through a web API.

We live in the “information age” where vast amounts of data and knowledge are easily accessible. We can query for information on a specific subject like “naked mole-rat” on sites like Wikipedia. The information on “naked mole-rat” has to be stored somewhere, and that somewhere is ultimately a database.

If you’ve worked with a database in the past and are familiar with SQL, you were most likely working with a relational database. Relational databases are structured with tables, columns, and rows. Each table is representative of something, such as an account. The table representing an account may have a relationship with a table that represents addresses for accounts. A key in each table, such as a column that contains an account identifier, forms a relationship between the two tables.

To create, manipulate, and query these relational databases, we use Structured Query Language (SQL). With SQL, I can query for columns and rows from one or more tables of the database (Example 8-1).

Example 8-1. A simple SQL query that will return columns and rows for AccountID, FirstName, and LastName from the Account table
SELECT accountId, firstName, lastName 
FROM Account

With the relationships between the tables in a relational database, I may also query columns and rows from multiple tables (Example 8-2).

Example 8-2. A query joining two tables together; this will return columns and rows for first name, last name, street address, and zip code of all accounts and their related addresses
SELECT Account.firstName, Account.lastName, Address.street, Address.zip  
FROM Account 
JOIN Address 
ON Account.accountId = Address.accountId

The “NoSQL” title given to a database tells us that it is not a relational database. We cannot use SQL to ask the database for columns and rows of tables joined together. What the title does not tell us is what it is. It simply says, “I’ve found an alternative way to store data that is not relational.” Naturally, there is more than one “alternative way.”

The NoSQL databases out in the world today vary in nature as much as a tiger from an elephant. These types of databases vary in nature because data has different sizes, shapes, and purposes. The creators of these NoSQL databases realized this and thus created homes for data to be stored and retrieved that break away from the traditional relational mold.

One example of a NoSQL database is a key-value store. The key-value store models simple data as key-value pairs. If we were to take the English dictionary and make a database out of it, it could fit well in a key-value store. Each word could be a key, and each definition could be a value. This type of database is a great alternative to the overhead of a relational database for simple data structures.

The type of NoSQL database we are going to explore in this chapter is a document store. This alternative to a relational database structures data around the concept of a document rather than representing and relating pieces of data as tables. There are currently 20+ different document store databases being used out in the world today, some using XML documents and others JSON documents.

In this chapter, we will take a look at one type of document store database that uses JSON documents to store data. Let’s take a look at CouchDB.

The CouchDB Database

As CouchDB is a NoSQL database, it has a nonrelational way of storing data with JSON documents.

Let’s say I need to store account data for customers. With a relational database, account information could quickly span across several tables. If I needed to have more than one address for an account, then a separate address table would be necessary to support this one-to-many relationship.

In a relational database, a one-to-many relationship such as an account with multiple addresses must be queried with a join to put the data back together again. Additionally, if I were to query the database for a single account and all of its address records joined, the results would contain multiple rows. Each row would have a repeat of the same account fields, with a different address for each row. Thus, the structure of the data exists inside of the database, but is flattened into columns and rows when queried.

With CouchDB, the relationships within the data do not require the data to be separated for storage and reassembled later for reading. The relationships are expressed within the data, and the structure is maintained as the data moves in and out of the database.

In Example 8-3, the one-to-many relationship of account addresses is represented as objects in an array.

Example 8-3. A JSON document representing an account
{
  "firstName": "Bob",
  "lastName": "Barker",
  "age": 91,
  "addresses": [
    {
      "street": "123 fake st",
      "city": "Somewhere",
      "state": "OR",
      "zip": 97520
    },
    {
      "street": "456 fakey ln",
      "city": "Some Place",
      "state": "CA",
      "zip": 96026
    }
  ]
}

With the CouchDB document store, when I ask the database for an account, I get a structured document in return. There is no assembly required. This is both convenient and faster.

However, a document store is not a one-size-fits-all solution for data storage. This model can quickly turn troublesome if multiple associations are needed. What if I wanted to have relationships between cities, states, and zip codes for an address? As soon as these relationships are needed, then the relational model would better suit the data, as it would be difficult to express complex relationships within a single document.

Another thing that CouchDB does well is the facilitation of evolving data. Some data evolves over time. For example, in 1990, I could have probably gotten away with these fields for phone numbers in a record for an account:

  • Home Phone
  • Work Phone
  • Fax

Now that we aren’t in 1990, we need  a field for a mobile phone. The data needs to evolve.

In a relational database, we would need to modify the schema of the address table to support this mobile phone field. We might also decide to get rid of the fields on the account table and move the phone numbers to their own table.

With CouchDB, our data is allowed to evolve without having to modify schema. I could easily represent phone numbers in JSON as an array of objects (Example 8-4), and allow an account to have as many phone records as they truly have.

Example 8-4. An array of phone number objects
{
  "phoneNumbers": [
    {
      "description": "home phone",
      "number": "555-555-5555"
    },
    {
      "description": "cell phone",
      "number": "123-456-7890"
    },
    {
      "description": "fax",
      "number": "456-789-1011"
    }
  ]
}

If in the future an account gains a new number, we can just add it to the array, as shown in Example 8-5.

Example 8-5. The array of phone number objects gets a new “space phone”
{
  "phoneNumbers": [
    {
      "description": "home phone",
      "number": "555-555-5555"
    },
    {
      "description": "cell phone",
      "number": "123-456-7890"
    },
    {
      "description": "fax",
      "number": "456-789-1011"
    },
    {
      "description": "space phone",
      "number": "932-932-932"
    }
  ]
}

Additionally, if we needed a new field on the account record, such as “galaxy,” we would just add it to all future records (Example 8-6). No schema modification required.

Example 8-6. The addition of a “galaxy” field to an account record
{
  "firstName": "Octavia",
  "lastName": "Wilson",
  "age": 26,
  "addresses": [
    {
      "street": "123 fake st",
      "city": "Somewhere",
      "state": "OR",
      "zip": 97520
    }
  ],
  "galaxy": "Milky Way"
}

Data moves in and out of the CouchDB database as JSON documents. The question now is, how do we put data in, and how do we get it out? Couch DB uses HTTP for an API. Let’s take a look at the CouchDB API.

The CouchDB API

With CouchDB, the way we ask the database for data is by requesting resources with HTTP. With HTTP, we make requests for resources with a URL. The resource we request from the CouchDB API is a JSON document (application/json).

If I have a database named “accounts” in my locally installed CouchDB, I can use the URL http://localhost:5984/accounts/ to get information about the database. In Example 8-7, information about the accounts database is returned in JSON format.

Example 8-7. Response from http://localhost:5984/accounts/
{
    "db_name": "accounts",
    "doc_count": 3,
    "doc_del_count": 0,
    "update_seq": 7,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 28773,
    "data_size": 1248,
    "instance_start_time": "1432493477586600",
    "disk_format_version": 6,
    "committed_update_seq": 7
}

Notice the "doc_count" name-value pair. This specifies how many documents are in my database, and there are currently 3. I can query for each document in the database by the unique identifier for each document, with the URL http://localhost:5984/accounts/<unique_identifier>. If I do not know the unique identifiers for my documents, I can use the URL http://localhost:5984/accounts/_all_docs to retrieve an array of the row identifiers.

In Example 8-8, the JSON resource at URL http://localhost:5984/accounts/_all_docs contains an array of the document identifiers for each document in my database.

Example 8-8. Array of document identifiers
{
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {
            "id": "3636fa3c716f9dd4f7407bd6f7000552",
            "key": "3636fa3c716f9dd4f7407bd6f7000552",
            "value": {
                "rev": "1-8a9527cbfc22e28984dfd3a3e6032635"
            }
        },
        {
            "id": "ddc14efcf71396463f53c0f880001538",
            "key": "ddc14efcf71396463f53c0f880001538",
            "value": {
                "rev": "1-3aef6f6ae7fff90dac3ff5d6c4460ceb"
            }
        },
        {
            "id": "ddc14efcf71396463f53c0f8800019ea",
            "key": "ddc14efcf71396463f53c0f8800019ea",
            "value": {
                "rev": "5-c38761b818edaf9842a63574927b7d38"
            }
        }
    ]
}

If I grab the first identifier (3636fa3c716f9dd4f7407bd6f7000552) from this array, I can then structure a URL to make a request for the document (Example 8-9).

Example 8-9. JSON resource representing an account at URL http://localhost:5984/accounts/3636fa3c716f9dd4f7407bd6f7000552
{
    "_id": "3636fa3c716f9dd4f7407bd6f7000552",
    "_rev": "1-8a9527cbfc22e28984dfd3a3e6032635",
    "firstName": "Billy",
    "lastName": "Bob",
    "address": {
        "street": "123 another st",
        "city": "Somewhere",
        "state": "OR",
        "zip": "97501"
    },
    "age": 54,
    "gender": "male",
    "famous": false
}

We’ve seen how to request data from a CouchDB database. Now let’s take a look at how we deliver data.

If I want to add another account document to my accounts database, I would do so by posting to the URL http://localhost:5984/accounts/ (see Examples 8-10 and 8-11).

Example 8-10. HTTP headers for request (http://localhost:5984/accounts/) using POST method
POST /accounts/ HTTP/1.1
Host: localhost:5984
Content-Type: application/json
Cache-Control: no-cache
Example 8-11. HTTP body for request (http://localhost:5984/accounts/) using POST method
{
    "firstName": "Janet",
    "lastName": "Jackson",
    "address": {
        "street": "456 Fakey Fake st",
        "city": "Somewhere",
        "state": "CA",
        "zip": "96520"
    },
    "age": 54,
    "gender": "female",
    "famous": true
}
Note

If you use Chrome as your Internet browser, check out “Postman” in the Chrome Web Store. Postman gives you a simple interface for building up and testing HTTP requests to APIs with all the HTTP methods (GET, POST, PUT, DELETE, etc.). In addition, Postman saves a history of your requests, so you can test the same requests over and over without having to re-create them.

After the HTTP request is successful, the CouchDB API will deliver a response message in JSON format that includes the newly generated identifier for the document (Example 8-12).

Example 8-12. The JSON response from http://localhost:5984/accounts/
{
    "ok": true,
    "id": "3636fa3c716f9dd4f7407bd6f700076c",
    "rev": "1-363f3b4bf90183781d08fe22487f3c90"
}

I may now form a URL using the new unique identifier to request the JSON document from the accounts database (Example 8-13).

Example 8-13. Resource at URL http://localhost:5984/accounts/3636fa3c716f9dd4f7407bd6f700076c
{
    "_id": "3636fa3c716f9dd4f7407bd6f700076c",
    "_rev": "1-363f3b4bf90183781d08fe22487f3c90",
    "firstName": "Janet",
    "lastName": "Jackson",
    "address": {
        "street": "456 Fakey Fake st",
        "city": "Somewhere",
        "state": "CA",
        "zip": "96520"
    },
    "age": 54,
    "gender": "female",
    "famous": true
}

If I want to update my new JSON document, I can simply include the "_id" and "_rev" name-value pairs in the body of my request and POST to the URL of the resource. For example, if I wanted to update Janet Jackson’s age to 55, I would send the same document in the last example with the updated age. The API will then respond with a JSON status document that includes the updated "rev" name-value pair (see Example 8-14).

Example 8-14. Response after posting the updated JSON document to http://localhost:5984/accounts/3636fa3c716f9dd4f7407bd6f700076c
{
    "ok": true,
    "id": "3636fa3c716f9dd4f7407bd6f700076c",
    "rev": "3-29ede949ceed9df62bd7caecb095bffe"
}

If I want to delete a document, I use the HTTP DELETE method and pass the revision identifier into the URL as a query string parameter (Example 8-15).

Example 8-15. “rev” is communicated as a query string parameter
http://localhost:5984/accounts/3636fa3c716f9dd4f7407bd6f700076c?
       rev=3-29ede949ceed9df62bd7caecb095bffe

If I request the resource at URL http://localhost:5984/accounts/ once more, it will now show that there are four documents (Example 8-16).

Example 8-16. JSON document at URL http://localhost:5984/accounts/; the “doc_count” name-value pair now has a value of 4
{
    "db_name": "accounts",
    "doc_count": 4,
    "doc_del_count": 0,
    "update_seq": 8,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 32869,
    "data_size": 1560,
    "instance_start_time": "1432493477586600",
    "disk_format_version": 6,
    "committed_update_seq": 8
}

These examples showed how we can view and create the JSON documents that are stored in a CouchDB database. If this were all we could do, we would quickly run into problems. Perhaps I need a list of last names that only includes famous people. In CouchDB, we can achieve this with “views.” See Example 8-17.

Views are how we can restructure and query the data from a CouchDB database. The views are stored in a JSON document called a design document. The design document specifies the language, and may include multiple views.

Example 8-17. A CouchDB design document for the accounts database that includes a view for famous people
{
   "_id": "_design/lists",
   "_rev": "8-4124de7756277c6a937004a763d6247d",
   "language": "javascript",
   "views": {
       "famous": {
           "map": "function(doc){if(doc.lastName!==null&&doc.famous){emit
           (doc.lastName,null)}}"
       }
   }
}

Each view is an object that may contain a map and reduce function. In Example 8-17, there is only a map function. The map function takes each document as a parameter, and then the emit() function is called. This essentially transforms the data (Example 8-18).

Example 8-18. A closer look at the “map” function for the “famous” view
function(doc) {
    if (doc.lastName !== null && doc.famous) {
        emit(doc.lastName, null)
    }
}

In Example 8-18, the function checks each JSON document in the database to make sure we have a last name, and that the name-value pair for "famous" is true. The parameters for emit() are key and value, and these will display as name-value pairs in the transformed document result.

I can then request this view as a resource via the URL http://localhost:5984/accounts/_design/lists/_view/famous (see Example 8-19). The structure is /<db-name>/_design/<design_document_name>/_view/<view_name>/.

Example 8-19. The resource at URL http://localhost:5984/accounts/_design/lists/_view/famous
{
    "total_rows": 2,
    "offset": 0,
    "rows": [
        {
            "id": "ddc14efcf71396463f53c0f880001538",
            "key": "Barker",
            "value": null
        },
        {
            "id": "3636fa3c716f9dd4f7407bd6f700076c",
            "key": "Jackson",
            "value": null
        }
    ]
}

In addition to the map step, you may optionally reduce the results. CouchDB has three built-in reduce functions: _count, _sum, and _stats. These are useful after transforming the data with the map step to retrieve statistical information on your data set.

For example, I may want information on how many accounts there are based on gender. I can add a new view to my design document (see Example 8-20).

Example 8-20. The design document with an addition of a “gender_count” view
{
   "famous": {
       "map": "function(doc){if(doc.lastName!==null&&doc.famous){emit
       (doc.lastName,null)}}"
   },
   "gender_count": {
       "map": "function(doc){if(doc.gender!==null) emit(doc.gender);}",
       "reduce": "_count"
   }
}

In the map step for the "gender_count" view, the function is emitting the value from the "gender" name-value pair. Without the reduce step, this would produce an array of objects that have the gender value for "key" (seeExample 8-21).

Example 8-21. The “gender_count” view after the map step
{
    "total_rows": 4,
    "offset": 0,
    "rows": [
        {
            "id": "3636fa3c716f9dd4f7407bd6f700076c",
            "key": "female",
            "value": null
        },
        {
            "id": "ddc14efcf71396463f53c0f8800019ea",
            "key": "female",
            "value": null
        },
        {
            "id": "3636fa3c716f9dd4f7407bd6f7000552",
            "key": "male",
            "value": null
        },
        {
            "id": "ddc14efcf71396463f53c0f880001538",
            "key": "male",
            "value": null
        }
    ]
}

After the reduce step with the built-in "_count" function, we get a result that provides a count of all the records (see Example 8-22).

Example 8-22. Resource at URL http://localhost:5984/accounts/_design/lists/_view/gender_count
{
    "rows": [
        {
            "key": null,
            "value": 4
        }
    ]
}

What we are looking for is a count for each unique gender, so we need to pass in the group flag to group the results by key. To achieve this, we add a query string parameter ?group=true to the URL (see Example 8-23).

Example 8-23. Resource at URL http://localhost:5984/accounts/_design/lists/_view/gender_count?group=true
{
    "rows": [
        {
            "key": "female",
            "value": 2
        },
        {
            "key": "male",
            "value": 2
        }
    ]
}

With the CouchDB API, we can create any number of design documents and views for a database. Each of the views within the design documents optionally contain a map and reduce function to transform the data. We can then create our custom set of URL resources for transformed data sets from our stored JSON documents. Essentially, we can build our own  custom API for our data with the CouchDB API.

Key Terms and Concepts

This chapter covered the following key terms:

Relational Database
A type of database that is structured to recognize relationships in the stored data.
NoSQL Database
A type of database that is not structured on relationships in the stored data.
CouchDB
A document store type of NoSQL database that stores data in the form of JSON documents.

We also discussed these key concepts:

  • In a relational database, there are usually tables, columns, and rows, and relationships between each of those tables, columns, and rows. There are primary keys and foreign keys.
  • There are many types of NoSQL databases that break away from the relational model.

We also discussed CouchDB. Here are some key points to remember:

  • It is a document store type of NoSQL database.
  • It stores and manages JSON documents.
  • It maintains the structure of the data as it is stored and retrieved.
  • It uses an HTTP API for accessing data as JSON document resources.
  • It uses JavaScript as its query language with the map and reduce functions for views that can be accessed through the HTTP API.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.125.51