74 | Big Data Simplied
Data Types: Java vs. MapReduce

Java      MapReduce
------    ------------
String    Text
int       IntWritable
long      LongWritable
null      NullWritable
4.2.3 End-to-End Technical Anatomy of a MapReduce Job
1. Driver/Controller Class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
/** Entry point for a MapReduce program. Constructs a Job object
representing a single MapReduce job and asks Hadoop to run it.
When running on a cluster, the final waitForCompletion call
distributes the code for this job across the cluster through the
ResourceManager.
**/
public class WordCountDriver {
public static void main(String[] args) throws Exception {
/* Create a Configuration and an object to represent the Job. */
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "wordcount");
/** Tell Hadoop where to locate the code that must be shipped if this
job is to be run across a cluster.
**/
job.setJarByClass(WordCountMap.class);
/** Set the datatypes of the keys and values output by the map and
reduce phases. These must agree with the types used by the Mapper and
Reducer. Mismatches in datatype will not be caught until runtime.
*/
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
/* Set the mapper and reducer to use. Pass the Map and Reduce ‘.class’
name only. */
job.setMapperClass(WordCountMap.class);
job.setReducerClass(SumReduce.class);
/** Specify the input and output locations to use for this MapReduce
job. These two arguments are passed on the command line at run time:
argument 0 is the HDFS input path and argument 1 is the HDFS output
path.
**/
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
/** Submit the job and wait for it to finish. The argument specifies
whether to print progress information to output. (true means to do
so.)
**/
job.waitForCompletion(true);
}
}
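Before walking through the Mapper and Reducer, the flow the driver wires together (map, shuffle/sort, reduce) can be sketched in plain Java with no Hadoop dependency. `WordCountSimulation` and its method names are illustrative, not part of the Hadoop API; a sorted `TreeMap` stands in for the framework's sort-and-group step:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCountSimulation {

    private static final Pattern WORD_PATTERN = Pattern.compile("\\w+");

    /**
     * Collapses map (tokenize, emit (word, 1)), shuffle/sort (group by
     * word, here via a sorted TreeMap) and reduce (sum the 1s) into a
     * single in-memory pass over the input.
     */
    public static Map<String, Long> wordCount(String document) {
        Map<String, Long> counts = new TreeMap<>();
        Matcher matcher = WORD_PATTERN.matcher(document);
        while (matcher.find()) {
            counts.merge(matcher.group(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount("to be or not to be"));
    }
}
```

In the real job, of course, the map and reduce steps run on different machines and the grouping happens in the intermediate shuffle; this sketch only shows the data flow.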
2. Map Class
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
/** Mapper for word count.
* The base class Mapper is parameterized by
<in key type, in value type, out key type, out value type>. Thus,
this mapper takes (LongWritable key, Text value) pairs and outputs
(Text key, LongWritable value) pairs. The input keys are the byte
offsets of the lines within the file (their identifiers), which are
ignored, and the values are the content of the lines. The output
keys are words found within the input, and the output values are the
number of times each word appeared.
*/
public class WordCountMap extends Mapper<LongWritable, Text, Text,
LongWritable> {
/** Regex pattern to find words (alphanumeric + _). */
private final static Pattern WORD_PATTERN = Pattern.compile("\\w+");
/** Constant 1 as a LongWritable value. */
private final static LongWritable ONE = new LongWritable(1L);
/** Text object to store a word to write to output. */
private Text word = new Text();
/** Actual map function. Takes one line of the input and emits a
key-value pair for each word found in it.
@param key Byte offset of the line within the file (ignored).
@param value Text of the current line.
@param context Context object for accessing output, configuration
information, etc.
*/
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
Matcher matcher = WORD_PATTERN.matcher(value.toString());
while (matcher.find()) {
word.set(matcher.group());
context.write(word, ONE);
}
}
}
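The tokenize-and-emit behaviour of map() can be checked in isolation with plain Java. This is a sketch: `MapStepSketch` is a hypothetical helper, and each emitted pair is represented as a tab-separated string rather than a (Text, LongWritable) pair:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MapStepSketch {

    private static final Pattern WORD_PATTERN = Pattern.compile("\\w+");

    /** Emits one "word\t1" pair per match, mirroring context.write(word, ONE). */
    public static List<String> map(String value) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = WORD_PATTERN.matcher(value);
        while (matcher.find()) {
            pairs.add(matcher.group() + "\t1");
        }
        return pairs;
    }
}
```

Note that punctuation is dropped by the \w+ pattern, so "hello, world!" emits exactly two pairs.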
3. Reduce Class
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
/** Reducer for word count.
* Like the Mapper base class, the base class Reducer is parameterized
by <in key type, in value type, out key type, out value type>.
For each Text key, which represents a word, this reducer gets a list
of LongWritable values (from the MapReduce intermediate state, i.e.
the shuffle, sort, partition and combine phases), computes the sum of
those values, and emits the key-value pair (word, sum).
*/
public class SumReduce extends Reducer<Text, LongWritable, Text,
LongWritable> {
/** Actual reduce function. Note that in the current (mapreduce) API
the values arrive as an Iterable, not an Iterator; with the wrong
signature this method would not override reduce and the default
identity reducer would run instead.
@param key Word.
@param values Iterable over the values for this key.
@param context Context object for accessing output, configuration
information, etc.
*/
@Override
public void reduce(Text key, Iterable<LongWritable> values,
Context context) throws IOException,
InterruptedException {
long sum = 0L;
for (LongWritable value : values) {
sum += value.get();
}
context.write(key, new LongWritable(sum));
}
}
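The summing step itself is easy to verify outside Hadoop. This plain-Java sketch (the `ReduceStepSketch` name is hypothetical) mirrors the reducer's loop over the values grouped under one key:

```java
import java.util.List;

public class ReduceStepSketch {

    /** Sums the values grouped under one key, as the reducer's loop does. */
    public static long reduce(List<Long> values) {
        long sum = 0L;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }
}
```

For the word "be" in "to be or not to be", the shuffle delivers the list [1, 1], and the reduce step emits (be, 2).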