NoSQL databases

Now that we have covered the essential attributes of the two main NoSQL services on GCP, let's look at how their internal data models differ from that of a traditional RDBMS. As an example, consider the relational representation of simple data about individuals in a table called Persons:

If we had additional information about the children and pets of these individuals, we would have additional tables, and each of those tables would reference the PersonID field of the Persons table as a foreign key. That would lead to a fairly typical star schema:

Here is how the same data would be represented in a few different types of NoSQL databases:

  • Key-value data stores: Each individual column and its associated value would be stored as a key-value pair. Redis, for instance, is a key-value store, and so is Memcache on the Google Cloud Platform. Key-value stores are optimized for queries of the form "please give me the value corresponding to this particular key".
  • Document stores: This time, rather than storing individual key-value pairs for each column, the entire document is stored in the database. The whole point of document stores is that they can perform extremely fast hierarchical queries; they are optimized for queries of the form "please give me the value at this particular path from the root node of the document". Such hierarchical queries are common in JavaScript, for instance, where the programmer uses the Document Object Model (DOM) to navigate the elements of an HTML page:
  • Wide-column stores: These are an entirely different beast from the two other categories we discussed. The emphasis here is on flexible schemas and data sorted on a particular key, called the row key:
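The first two access patterns in the list above can be sketched in plain Python. This is only an illustration using in-memory dictionaries, not the API of Redis, Memcache, or any document database; the record contents, the `person:1:name` key format, and the helper names `kv_get` and `doc_get` are assumptions made up for this example.

```python
# Hypothetical sample data; names and values are invented for illustration.

# Key-value store: one entry per (row, column) pair, looked up by exact key.
kv_store = {
    "person:1:name": "Alice",
    "person:1:city": "Seattle",
}


def kv_get(store, key):
    """Return the value stored under an exact key, or None if absent."""
    return store.get(key)


# Document store: the whole record, including nested children and pets,
# is stored as a single document keyed by the person's ID.
documents = {
    1: {
        "name": "Alice",
        "city": "Seattle",
        "children": [{"name": "Bob", "age": 7}],
        "pets": [{"name": "Rex", "species": "dog"}],
    }
}


def doc_get(document, path):
    """Follow a path of dict keys and list indexes from the document root."""
    node = document
    for step in path:
        node = node[step]
    return node


name = kv_get(kv_store, "person:1:name")
pet_species = doc_get(documents[1], ["pets", 0, "species"])
print(name, pet_species)
```

The key-value lookup needs the full key up front, while the document lookup walks a path from the root, which is the hierarchical query style described above.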

This wide-column table is key to understanding the differences between relational and columnar databases, so let's look at it more closely:

  • In the columnar world, a column family is roughly analogous to a table (relation)
  • Dynamic schemas: Columns can be added on the fly without expensive DDL operations such as ALTER TABLE
  • Less redundancy: Default values or NULLs need not be in the data at all
  • No normalization: The previous format has no foreign keys, and violates just about every normal form (Boyce and Codd would be turning in their graves looking at this)
  • In reality, each value is timestamped, so it is also possible to retrieve specific versions of a particular data item.
  • For this reason, this data model is said to be four-dimensional; any data item can be accessed if we have four pieces of information: row key, column family, column name and timestamp
  • Data is stored in sorted order of row key; this is a very important point to keep in mind
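To make the four-dimensional addressing and the sorted row keys concrete, here is a minimal in-memory sketch. It is a toy model under stated assumptions, not how Bigtable or HBase is implemented and not their client API: the class `WideColumnTable`, its methods, the integer timestamps, and the `person#001`-style row keys are all hypothetical.

```python
import bisect


class WideColumnTable:
    """Toy wide-column table: values are addressed by
    (row key, column family, column name, timestamp)."""

    def __init__(self):
        self._row_keys = []  # row keys, always kept in sorted order
        self._rows = {}      # row_key -> {(family, column): {timestamp: value}}

    def put(self, row_key, family, column, value, timestamp):
        # New columns can appear on any row at any time; no schema change needed.
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        versions = self._rows[row_key].setdefault((family, column), {})
        versions[timestamp] = value

    def get(self, row_key, family, column, timestamp=None):
        # Absent cells are simply not stored, so there are no NULL placeholders.
        versions = self._rows.get(row_key, {}).get((family, column))
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # default to the latest version
        return versions.get(timestamp)

    def scan(self, start_key, end_key):
        # Sorted row keys make a range scan a contiguous slice.
        lo = bisect.bisect_left(self._row_keys, start_key)
        hi = bisect.bisect_left(self._row_keys, end_key)
        return self._row_keys[lo:hi]


table = WideColumnTable()
table.put("person#002", "profile", "name", "Bob", timestamp=1)
table.put("person#001", "profile", "city", "Seattle", timestamp=1)
table.put("person#001", "profile", "city", "Portland", timestamp=2)
table.put("pet#001", "profile", "name", "Rex", timestamp=1)

latest_city = table.get("person#001", "profile", "city")
old_city = table.get("person#001", "profile", "city", timestamp=1)
person_rows = table.scan("person#", "person#~")
print(latest_city, old_city, person_rows)
```

Because rows are kept in row-key order, all keys sharing the `person#` prefix sit next to each other and can be read with one range scan, which is why row-key design matters so much in this data model.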

The Google Cloud Platform offers two options for those of us who'd like to store our data in non-relational, distributed, and horizontally scalable structures:

  • Cloud Bigtable
  • Datastore

In this chapter, we will explore the implementation, features, and functionality of both of these NoSQL storage options, starting with Bigtable, Google's counterpart to HBase (the open source HBase project was itself modeled on Bigtable).
