In Chapter 7, we zoomed the lens to the client side in the client–server relationship of the Web. In this chapter, we are going to zoom that lens to the server side to a type of database that not only uses JSON documents to store data, but interfaces with the outside world through a web API.
We live in the “information age” where vast amounts of data and knowledge are easily accessible. We can query for information on a specific subject like “naked mole-rat” on sites like Wikipedia. The information on “naked mole-rat” has to be stored somewhere, and that somewhere is ultimately a database.
If you’ve worked with a database in the past and are familiar with SQL, you were most likely working with a relational database. Relational databases are structured with tables, columns, and rows. Each table is representative of something, such as an account. The table representing an account may have a relationship with a table that represents addresses for accounts. A key in each table, such as a column that contains an account identifier, forms a relationship between the two tables.
To create, manipulate, and query these relational databases, we use Structured Query Language (SQL). With SQL, I can query for columns and rows from one or more tables of the database (Example 8-1).
SELECT accountId, firstName, lastName FROM Account
With the relationships between the tables in a relational database, I may also query columns and rows from multiple tables (Example 8-2).
SELECT Account.firstName, Account.lastName, Address.street, Address.zip FROM Account JOIN Address ON Account.accountId = Address.accountId
The “NoSQL” title given to a database tells us that it is not a relational database. We cannot use SQL to ask the database for columns and rows of tables joined together. What the title does not tell us is what it is. It simply says, “I’ve found an alternative way to store data that is not relational.” Naturally, there is more than one “alternative way.”
The NoSQL databases out in the world today vary in nature as much as a tiger from an elephant. These types of databases vary in nature because data has different sizes, shapes, and purposes. The creators of these NoSQL databases realized this and thus created homes for data to be stored and retrieved that break away from the traditional relational mold.
One example of a NoSQL database is a key-value store. The key-value store models simple data as key-value pairs. If we were to take the English dictionary and make a database out of it, it could fit well in a key-value store. Each word could be a key, and each definition could be a value. This type of database is a great alternative to the overhead of a relational database for simple data structures.
The type of NoSQL database we are going to explore in this chapter is a document store. This alternative to a relational database structures data around the concept of a document rather than representing and relating pieces of data as tables. There are currently 20+ different document store databases being used out in the world today, some using XML documents and others JSON documents.
In this chapter, we will take a look at one type of document store database that uses JSON documents to store data. Let’s take a look at CouchDB.
As CouchDB is a NoSQL database, it has a nonrelational way of storing data with JSON documents.
Let’s say I need to store account data for customers. With a relational database, account information could quickly span across several tables. If I needed to have more than one address for an account, then a separate address table would be necessary to support this one-to-many relationship.
In a relational database, a one-to-many relationship such as an account with multiple addresses must be queried with a join to put the data back together again. Additionally, if I were to query the database for a single account and all of its address records joined, the results would contain multiple rows. Each row would have a repeat of the same account fields, with a different address for each row. Thus, the structure of the data exists inside of the database, but is flattened into columns and rows when queried.
With CouchDB, the relationships within the data do not require the data to be separated for storage and reassembled later for reading. The relationships are expressed within the data, and the structure is maintained as the data moves in and out of the database.
In Example 8-3, the one-to-many relationship of account addresses is represented as objects in an array.
{ "firstName": "Bob", "lastName": "Barker", "age": 91, "addresses": [ { "street": "123 fake st", "city": "Somewhere", "state": "OR", "zip": 97520 }, { "street": "456 fakey ln", "city": "Some Place", "state": "CA", "zip": 96026 } ] }
With the CouchDB document store, when I ask the database for an account, I get a structured document in return. There is no assembly required. This is both convenient and faster.
However, a document store is not a one-size-fits-all solution for data storage. This model can quickly turn troublesome if multiple associations are needed. What if I wanted to have relationships between cities, states, and zip codes for an address? As soon as these relationships are needed, then the relational model would better suit the data, as it would be difficult to express complex relationships within a single document.
Another thing that CouchDB does well is the facilitation of evolving data. Some data evolves over time. For example, in 1990, I could have probably gotten away with these fields for phone numbers in a record for an account:
Now that we aren’t in 1990, we need a field for a mobile phone. The data needs to evolve.
In a relational database, we would need to modify the schema of the address table to support this mobile phone field. We might also decide to get rid of the fields on the account table and move the phone numbers to their own table.
With CouchDB, our data is allowed to evolve without having to modify schema. I could easily represent phone numbers in JSON as an array of objects (Example 8-4), and allow an account to have as many phone records as they truly have.
{ "phoneNumbers": [ { "description": "home phone", "number": "555-555-5555" }, { "description": "cell phone", "number": "123-456-7890" }, { "description": "fax", "number": "456-789-1011" } ] }
If in the future an account gains a new number, we can just add it to the array, as shown in Example 8-5.
{ "phoneNumbers": [ { "description": "home phone", "number": "555-555-5555" }, { "description": "cell phone", "number": "123-456-7890" }, { "description": "fax", "number": "456-789-1011" }, { "description": "space phone", "number": "932-932-932" } ] }
Additionally, if we needed a new field on the account record, such as “galaxy,” we would just add it to all future records (Example 8-6). No schema modification required.
{ "firstName": "Octavia", "lastName": "Wilson", "age": 26, "addresses": [ { "street": "123 fake st", "city": "Somewhere", "state": "OR", "zip": 97520 } ], "galaxy": "Milky Way" }
Data moves in and out of the CouchDB database as JSON documents. The question now is, how do we put data in, and how do we get it out? Couch DB uses HTTP for an API. Let’s take a look at the CouchDB API.
With CouchDB, the way we ask the database for data is by requesting resources with HTTP. With HTTP, we make requests for resources with a URL. The resource we request from the CouchDB API is a JSON document (application/json
).
If I have a database named “accounts” in my locally installed CouchDB, I can use the URL http://localhost:5984/accounts/ to get information about the database. In Example 8-7, information about the accounts database is returned in JSON format.
{ "db_name": "accounts", "doc_count": 3, "doc_del_count": 0, "update_seq": 7, "purge_seq": 0, "compact_running": false, "disk_size": 28773, "data_size": 1248, "instance_start_time": "1432493477586600", "disk_format_version": 6, "committed_update_seq": 7 }
Notice the "doc_count"
name-value pair. This specifies how many documents are in my database, and there are currently 3. I can query for each document in the database by the unique identifier for each document, with the URL http://localhost:5984/accounts/<unique_identifier>. If I do not know the unique identifiers for my documents, I can use the URL http://localhost:5984/accounts/_all_docs to retrieve an array of the row identifiers.
In Example 8-8, the JSON resource at URL http://localhost:5984/accounts/_all_docs contains an array of the document identifiers for each document in my database.
{ "total_rows": 3, "offset": 0, "rows": [ { "id": "3636fa3c716f9dd4f7407bd6f7000552", "key": "3636fa3c716f9dd4f7407bd6f7000552", "value": { "rev": "1-8a9527cbfc22e28984dfd3a3e6032635" } }, { "id": "ddc14efcf71396463f53c0f880001538", "key": "ddc14efcf71396463f53c0f880001538", "value": { "rev": "1-3aef6f6ae7fff90dac3ff5d6c4460ceb" } }, { "id": "ddc14efcf71396463f53c0f8800019ea", "key": "ddc14efcf71396463f53c0f8800019ea", "value": { "rev": "5-c38761b818edaf9842a63574927b7d38" } } ] }
If I grab the first identifier (3636fa3c716f9dd4f7407bd6f7000552) from this array, I can then structure a URL to make a request for the document (Example 8-9).
{ "_id": "3636fa3c716f9dd4f7407bd6f7000552", "_rev": "1-8a9527cbfc22e28984dfd3a3e6032635", "firstName": "Billy", "lastName": "Bob", "address": { "street": "123 another st", "city": "Somewhere", "state": "OR", "zip": "97501" }, "age": 54, "gender": "male", "famous": false }
We’ve seen how to request data from a CouchDB database. Now let’s take a look at how we deliver data.
If I want to add another account document to my accounts database, I would do so by posting to the URL http://localhost:5984/accounts/ (see Examples 8-10 and 8-11).
POST /accounts/ HTTP/1.1 Host: localhost:5984 Content-Type: application/json Cache-Control: no-cache
{ "firstName": "Janet", "lastName": "Jackson", "address": { "street": "456 Fakey Fake st", "city": "Somewhere", "state": "CA", "zip": "96520" }, "age": 54, "gender": "female", "famous": true }
If you use Chrome as your Internet browser, check out “Postman” in the Chrome Web Store. Postman gives you a simple interface for building up and testing HTTP requests to APIs with all the HTTP methods (GET
, POST
, PUT
, DELETE
, etc.). In addition, Postman saves a history of your requests, so you can test the same requests over and over without having to re-create them.
After the HTTP request is successful, the CouchDB API will deliver a response message in JSON format that includes the newly generated identifier for the document (Example 8-12).
{ "ok": true, "id": "3636fa3c716f9dd4f7407bd6f700076c", "rev": "1-363f3b4bf90183781d08fe22487f3c90" }
I may now form a URL using the new unique identifier to request the JSON document from the accounts database (Example 8-13).
{ "_id": "3636fa3c716f9dd4f7407bd6f700076c", "_rev": "1-363f3b4bf90183781d08fe22487f3c90", "firstName": "Janet", "lastName": "Jackson", "address": { "street": "456 Fakey Fake st", "city": "Somewhere", "state": "CA", "zip": "96520" }, "age": 54, "gender": "female", "famous": true }
If I want to update my new JSON document, I can simply include the "_id"
and "_rev"
name-value pairs in the body of my request and POST
to the URL of the resource. For example, if I wanted to update Janet Jackson’s age to 55, I would send the same document in the last example with the updated age. The API will then respond with a JSON status document that includes the updated "rev"
name-value pair (see Example 8-14).
{ "ok": true, "id": "3636fa3c716f9dd4f7407bd6f700076c", "rev": "3-29ede949ceed9df62bd7caecb095bffe" }
If I want to delete a document, I use the HTTP DELETE
method and pass the revision identifier into the URL as a query string parameter (Example 8-15).
http://localhost:5984/accounts/3636fa3c716f9dd4f7407bd6f700076c? rev=3-29ede949ceed9df62bd7caecb095bffe
If I request the resource at URL http://localhost:5984/accounts/ once more, it will now show that there are four documents (Example 8-16).
{ "db_name": "accounts", "doc_count": 4, "doc_del_count": 0, "update_seq": 8, "purge_seq": 0, "compact_running": false, "disk_size": 32869, "data_size": 1560, "instance_start_time": "1432493477586600", "disk_format_version": 6, "committed_update_seq": 8 }
These examples showed how we can view and create the JSON documents that are stored in a CouchDB database. If this were all we could do, we would quickly run into problems. Perhaps I need a list of last names that only includes famous people. In CouchDB, we can achieve this with “views.” See Example 8-17.
Views are how we can restructure and query the data from a CouchDB database. The views are stored in a JSON document called a design document. The design document specifies the language, and may include multiple views.
{ "_id": "_design/lists", "_rev": "8-4124de7756277c6a937004a763d6247d", "language": "javascript", "views": { "famous": { "map": "function(doc){if(doc.lastName!==null&&doc.famous){emit (doc.lastName,null)}}" } } }
Each view is an object that may contain a map and reduce function. In Example 8-17, there is only a map function. The map function takes each document as a parameter, and then the emit()
function is called. This essentially transforms the data (Example 8-18).
function(doc) { if (doc.lastName !== null && doc.famous) { emit(doc.lastName, null) } }
In Example 8-18, the function checks each JSON document in the database to make sure we have a last name, and that the name-value pair for "famous"
is true. The parameters for emit()
are key
and value
, and these will display as name-value pairs in the transformed document result.
I can then request this view as a resource via the URL http://localhost:5984/accounts/_design/lists/_view/famous (see Example 8-19). The structure is /<db-name>/_design/<design_document_name>/_view/<view_name>/.
{ "total_rows": 2, "offset": 0, "rows": [ { "id": "ddc14efcf71396463f53c0f880001538", "key": "Barker", "value": null }, { "id": "3636fa3c716f9dd4f7407bd6f700076c", "key": "Jackson", "value": null } ] }
In addition to the map step, you may optionally reduce the results. CouchDB has three built-in reduce functions: _count
, _sum
, and _stats
. These are useful after transforming the data with the map step to retrieve statistical information on your data set.
For example, I may want information on how many accounts there are based on gender. I can add a new view to my design document (see Example 8-20).
{ "famous": { "map": "function(doc){if(doc.lastName!==null&&doc.famous){emit (doc.lastName,null)}}" }, "gender_count": { "map": "function(doc){if(doc.gender!==null) emit(doc.gender);}", "reduce": "_count" } }
In the map step for the "gender_count"
view, the function is emitting the value from the "gender"
name-value pair. Without the reduce step, this would produce an array of objects that have the gender value for "key"
(seeExample 8-21).
{ "total_rows": 4, "offset": 0, "rows": [ { "id": "3636fa3c716f9dd4f7407bd6f700076c", "key": "female", "value": null }, { "id": "ddc14efcf71396463f53c0f8800019ea", "key": "female", "value": null }, { "id": "3636fa3c716f9dd4f7407bd6f7000552", "key": "male", "value": null }, { "id": "ddc14efcf71396463f53c0f880001538", "key": "male", "value": null } ] }
After the reduce step with the built-in "_count"
function, we get a result that provides a count of all the records (see Example 8-22).
{ "rows": [ { "key": null, "value": 4 } ] }
What we are looking for is a count for each unique gender, so we need to pass in the group flag to group the results by key. To achieve this, we add a query string parameter ?group=true to the URL (see Example 8-23).
{ "rows": [ { "key": "female", "value": 2 }, { "key": "male", "value": 2 } ] }
With the CouchDB API, we can create any number of design documents and views for a database. Each of the views within the design documents optionally contain a map and reduce function to transform the data. We can then create our custom set of URL resources for transformed data sets from our stored JSON documents. Essentially, we can build our own custom API for our data with the CouchDB API.
This chapter covered the following key terms:
We also discussed these key concepts:
We also discussed CouchDB. Here are some key points to remember:
3.145.125.51