102 | Big Data Simplified
super-columns, but not the actual columns within them. As a result, the set of columns can differ
from row to row. However, the column families or super-columns, each of which groups data of a
particular category or domain, must be declared when the table is designed.
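To make this rule concrete, here is a minimal Python sketch of a toy in-memory wide column table. The class WideColumnTable, its put and get_row methods, and the "Nickname" and "Phone" names are hypothetical, chosen only for illustration; this is not the API of HBase, Cassandra or any other product. Column families are fixed when the table is created, while the columns inside a family may vary from row to row.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy in-memory wide column table (hypothetical, for illustration only).

    Column families are fixed when the table is created; the columns
    inside each family may differ from row to row.
    """

    def __init__(self, name, column_families):
        self.name = name
        self.column_families = frozenset(column_families)
        # row_id -> column family -> {column name: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_id, family, column, value):
        # Reject families that were not declared at design time.
        if family not in self.column_families:
            raise ValueError(f"undeclared column family: {family}")
        # Within a declared family, any column name is accepted, per row.
        self.rows[row_id][family][column] = value

    def get_row(self, row_id):
        # Copy the row into plain dicts; unknown rows come back empty.
        row = self.rows.get(row_id, {})
        return {family: dict(cols) for family, cols in row.items()}


customers = WideColumnTable("Customers", ["Name", "Address"])
customers.put(101, "Name", "First_Name", "John")
customers.put(101, "Address", "House_No", "123")
customers.put(102, "Name", "First_Name", "Jane")
customers.put(102, "Name", "Nickname", "JD")  # a column that row 101 lacks

try:
    customers.put(101, "Phone", "Mobile", "555-0100")
    phone_rejected = False
except ValueError:
    phone_rejected = True  # "Phone" was never declared as a family
```

Row 102 gains a column that row 101 never has, yet writing to the undeclared "Phone" family is refused. That is the split between flexible columns and fixed column families described above.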
Apache HBase, which is part of virtually every Hadoop distribution, is one of the most
prominent examples of Wide Column stores or Columnar Databases. Cassandra is the second
most notable example.
5.3.3 Document Stores
Next, we look at Document Stores, where data is stored as documents instead of rows.
Conceptually, the documents in a Document Store are similar to rows: they contain key and value
pairs. One difference is that the value of a key can itself be a document, so documents can be
nested. The documents are often JavaScript objects, encoded using JavaScript Object Notation
(JSON), and JavaScript often ends up being used as the internal language of these databases as
well. Documents can also be stored in XML or other semi-structured formats. Because all these
technologies are very familiar to web developers, document stores are widely used in web
applications: the static content of a website and its data-driven content end up having a lot in
common.
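As an illustration, the sketch below stores one customer as a single JSON document using Python's standard json module. The field names and data are hypothetical, loosely echoing the customer and order in Figure 5.3. The find helper is a toy query-by-example function, not the query API of any particular document database.

```python
import json

# One customer as a single document (hypothetical data). The value of
# "address" is itself a document, and "orders" is a list of nested documents.
customer_doc = {
    "_id": 101,
    "name": {"first": "John", "last": "Doe"},
    "address": {"house_no": 123, "street": "Park Street"},
    "orders": [
        {"order_id": 1701, "price_usd": 1000, "item_ids": [2345, 7890]},
    ],
}

# Documents travel as JSON text...
encoded = json.dumps(customer_doc)
# ...and decode back into nested key-value structures.
decoded = json.loads(encoded)


def find(collection, **criteria):
    """Return documents whose top-level keys match every criterion (toy query)."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]


# The second document has no "orders" key at all; no fixed schema is enforced.
collection = [decoded, {"_id": 102, "name": {"first": "Jane", "last": "Doe"}}]
matches = find(collection, _id=102)
```

Embedding the orders inside the customer document keeps related data together in one readable unit. A web application can send that unit straight to a browser as JSON.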
Figure 5.3 Wide column stores

Table: Customers
  Row ID: 101
    Super Column: Name
      Column: First_Name: John
      Column: Second_Name: Doe
    Super Column: Address
      Column: House_No: 123
      Column: Street_Name: Park Street
    Super Column: Orders
      Column: Last_Order_ID: 1701
  Row ID: 102
    Super Column: Name
      Column: First_Name: Jane
      Column: Second_Name: Doe
    Super Column: Address
      Column: House_No: 456
      Column: Street_Name: Green Street
    Super Column: Orders
      Column: Last_Order_ID: 1702

Table: Orders
  Row ID: 1701
    Super Column: Pricing
      Column: Price: 1000 USD
    Super Column: Items
      Column: Item_ID: 2345
      Column: Item_ID: 7890
  Row ID: 1702
    Super Column: Pricing
      Column: Price: 700 USD
    Super Column: Items
      Column: Item_ID: 4321
      Column: Item_ID: 5446
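The data in Figure 5.3 can be modelled as nested Python dictionaries: table, then row ID, then super column, then column. This is only an illustrative model of the logical layout, not how HBase or Cassandra store data on disk. The helper last_order_price is a hypothetical name. A dictionary cannot hold two keys both named Item_ID, so the two item columns of each order are kept as a list here.

```python
# Figure 5.3 as nested dicts: row ID -> super column -> column -> value.
customers = {
    101: {
        "Name": {"First_Name": "John", "Second_Name": "Doe"},
        "Address": {"House_No": "123", "Street_Name": "Park Street"},
        "Orders": {"Last_Order_ID": 1701},
    },
    102: {
        "Name": {"First_Name": "Jane", "Second_Name": "Doe"},
        "Address": {"House_No": "456", "Street_Name": "Green Street"},
        "Orders": {"Last_Order_ID": 1702},
    },
}

orders = {
    # The figure shows two columns both called Item_ID; a dict cannot repeat
    # a key, so both item IDs are kept in one list here.
    1701: {"Pricing": {"Price": "1000 USD"}, "Items": {"Item_ID": [2345, 7890]}},
    1702: {"Pricing": {"Price": "700 USD"}, "Items": {"Item_ID": [4321, 5446]}},
}


def last_order_price(customer_id):
    """Follow a customer's Last_Order_ID into the Orders table (hypothetical helper)."""
    order_id = customers[customer_id]["Orders"]["Last_Order_ID"]
    return orders[order_id]["Pricing"]["Price"]


print(last_order_price(101))  # 1000 USD
```

Reading a customer's latest price touches only the Orders super column of the customer row and the Pricing super column of the order row. The other super columns are never read.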
M05 Big Data Simplified XXXX 01.indd 102 5/20/2019 7:42:43 PM