28 | Big Data Simplified
2.7.3 Hadoop Affiliate Technologies
A few other technologies are not strictly part of the Hadoop stack, but they are frequently used
together with it.
One such technology is the R programming language. R is an open source programming language
for statistical analysis. There are libraries for R called rmr (as in R MapReduce), rhdfs and rhbase.
With these three libraries, one can create MapReduce jobs in R and have those jobs access raw
HDFS files or data stored in HBase through HBase's APIs. Although this is an open source
project, one company, Revolution Analytics, has probably contributed the most source code and
probably has the largest number of committers on the project.
Another technology worth mentioning is Lucene. It has been around for a long time as an open
source technology for building full-text indexes on databases. The wrapper on top of that core,
which provides the high-level indexing services, is called Solr. Interestingly, Lucene and Hadoop
were created by the same person, Doug Cutting.
2.7.4 Massively Parallel Processing
Moving away from the Hadoop discussion, this section covers a more conventional approach to
processing huge volumes of data, called Massively Parallel Processing (MPP).
An MPP system is clustered: it splits a big query into sub-queries and then distributes those
sub-queries to individual database engines. So, MPP products are not just appliances; each
appliance contains a cluster of database nodes. However, one is not going to program MPP
engines in Java.
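The split-and-distribute idea can be illustrated with a small sketch. This is not how any particular MPP product is implemented; it assumes three in-memory SQLite databases standing in for database nodes, a made-up sales table that is already partitioned across them, and a Python coordinator (the names SHARDS, run_on_node and run_mpp_query are hypothetical). Each node runs the same SQL sub-query on its own partition, and the coordinator merges the partial results.

```python
import sqlite3
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (region, amount) rows, already partitioned across three nodes.
SHARDS = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
    [("south", 8), ("east", 2), ("north", 1)],
]

# The sub-query every node runs against its own partition.
SUB_QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region"


def run_on_node(rows):
    """Stand-in for one database node: load its partition, run the sub-query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(SUB_QUERY).fetchall()
    finally:
        conn.close()


def run_mpp_query(shards):
    """Coordinator: send the sub-query to every node, then merge partial sums."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(run_on_node, shards))
    totals = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial:
            totals[region] += subtotal
    return dict(totals)


totals = run_mpp_query(SHARDS)
print(sorted(totals.items()))
```

The work is expressed in SQL, and the Python code only plays the role of the coordinator that an MPP system provides internally.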
FIGURE 2.7 Functioning of the reduce step
[Figure: the map tasks (M) emit the pairs {Mary, 1}, {little, 1} and {lamb, 1}, one per occurrence of each word; the reduce task (R) sums them into {Mary, 5}, {little, 4} and {lamb, 4}.]
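The arithmetic in Figure 2.7 can be reproduced with a minimal sketch. It assumes the intermediate pairs arrive as one flat list and does not model which map task produced which pair; the function name reduce_counts is hypothetical and this is not Hadoop's API.

```python
from collections import defaultdict

# Intermediate (word, 1) pairs as listed in Figure 2.7.
mapped_pairs = [
    ("Mary", 1), ("little", 1), ("lamb", 1),
    ("little", 1), ("lamb", 1),
    ("little", 1), ("lamb", 1),
    ("Mary", 1), ("little", 1), ("lamb", 1),
    ("Mary", 1), ("Mary", 1), ("Mary", 1),
]


def reduce_counts(pairs):
    """Sum the counts of all pairs that share the same word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


word_totals = reduce_counts(mapped_pairs)
print(word_totals)  # {'Mary': 5, 'little': 4, 'lamb': 4}
```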