Introducing NoSQL | 113
not a problem because we are not really updating these databases. We are just reading them.
Meanwhile, fast reads are important. So, we are comfortable sacrificing consistency
for availability, which is the trade-off that many NoSQL databases make.
Therefore, while NoSQL databases are probably not going to serve the BI workload directly,
they can in fact be useful data sources in the BI pipeline. You will need to ‘squash’ the multi-
structured data into a relational structure to augment the structured data sets in your BI program.
For advanced analytical practices, regardless of whether the data is structured or multi-structured,
you often need to prepare the data for analytical models, so there is no reason that you cannot
use multi-structured data.
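As a minimal sketch of what 'squashing' multi-structured data into a relational shape might look like, the Python snippet below flattens a nested document into a single row with dotted column names. The `flatten` helper and the sample `order` document are hypothetical and serve only as an illustration; a real BI pipeline would also have to deal with arrays, missing fields and type conversion.

```python
def flatten(doc, prefix=""):
    """Flatten a nested dict into one flat row with dotted column names."""
    row = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested sub-document: recurse and prefix its keys with the parent key.
            row.update(flatten(value, prefix=f"{column}."))
        else:
            row[column] = value
    return row


# Hypothetical document, as it might come out of a document store.
order = {
    "order_id": 17,
    "customer": {"name": "Asha", "city": "Pune"},
    "total": 250.0,
}

row = flatten(order)
print(row)
```

Each key of the resulting row can then be mapped to a column of a relational table that sits next to the structured data sets.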
5.8 BIG DATA AND NoSQL
Big Data and NoSQL are absolutely not the same thing. So, how are they related, if at all? Recall
that Big Data involves datasets that are large enough to be disruptive to relational databases and
cannot be handled efciently by them. Similarly, in extreme scaling situations, NoSQL databases
can also handle large data loads where relational databases tend to break down.
The consideration of distributed file systems is very relevant here. Because NoSQL databases
tend to distribute their data, they work really well in a Big Data scenario: all
those pieces of data on different servers can be counted, or otherwise processed, at the same
time by computing agents running on those servers.
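The idea of counting pieces of data on different servers at the same time can be sketched in Python as follows. This is only an illustration under stated assumptions: the `partitions` dictionary stands in for data spread over three hypothetical nodes, and threads on one machine stand in for the computing agents that would run on separate servers in a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands for the data held on one node.
partitions = {
    "node-1": ["click", "view", "click"],
    "node-2": ["view", "view"],
    "node-3": ["click", "purchase"],
}


def count_local(events):
    """Work done by the agent on a single node: count only its own data."""
    return Counter(events)


def count_cluster(parts):
    """Run the local counts concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(count_local, parts.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


totals = count_cluster(partitions)
print(dict(totals))
```

Only the small partial counts need to be brought together; the bulk of the data never leaves the node that stores it.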
The fact that NoSQL uses commodity hardware and commodity disks again works really well
for Big Data, because Big Data is handled by scaling out across a large number of nodes in a cluster.
Therefore, it is sustainable only if the nodes are based on commodity hardware that is relatively
affordable.
Although the Big Data scenario is not always a web-scale scenario in terms of its users, it is
still a huge scale-out scenario in terms of computations. So, the scaling features and strengths of
NoSQL tend to work very well in a Big Data scenario.
Thus, we see that there is a huge amount of overlap between NoSQL and Big Data. In fact, a
combination of HBase, which is a NoSQL wide-column store database, and Hadoop, which is
a distributed computing platform for breaking up work and spreading it over a whole array of
machines in a cluster, is rather common.
Summary
• NoSQL allows a flexible schema, which is
quicker and cheaper to set up for a global
scale of operations, and supports massive
scalability, higher performance and
availability.
• A NoSQL product can be installed on a
number of commodity servers and then be
federated.
• CAP stands for Consistency, Availability
and Partition tolerance. Consistency means
that the data is not corrupted and it is not
in an inconsistent state across different
nodes in a cluster. Availability means we
can get to that data rather quickly. Partition
tolerance means that the data need not all
be in one place together. CAP theorem