Let's revisit WordCount, but this time use some of these predefined map and reduce implementations:
Create a WordCountPredefined.java file containing the following code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountPredefined
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count1");
        job.setJarByClass(WordCountPredefined.class);
        // Use the predefined mapper and reducer classes instead of custom ones
        job.setMapperClass(TokenCounterMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
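One way to build and run the job might look like the following sketch; the JAR name wcp.jar and the input and output paths are placeholders, and the exact compilation classpath depends on your Hadoop installation:

javac -classpath $(hadoop classpath) WordCountPredefined.java
jar cvf wcp.jar WordCountPredefined*.class
hadoop jar wcp.jar WordCountPredefined input output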
Remember to delete the output directory before re-running the job; use hadoop fs -rmr output, for example.

Given the ubiquity of WordCount as an example in the MapReduce world, it's perhaps not entirely surprising that there are predefined Mapper and Reducer implementations that together realize the entire WordCount solution. The TokenCounterMapper class simply breaks each input line into a series of (token, 1) pairs, and the IntSumReducer class provides the final count by summing the values for each key.
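To make that behavior concrete, the following is a simplified sketch of what these two classes do internally; it mirrors the real implementations in org.apache.hadoop.mapreduce.lib.map and org.apache.hadoop.mapreduce.lib.reduce, but the class names here are illustrative, not the library source itself:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of TokenCounterMapper: emits (token, 1) for each
// whitespace-delimited token in the input line
class TokenCounterMapperSketch
        extends Mapper<Object, Text, Text, IntWritable>
{
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException
    {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens())
        {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Sketch of IntSumReducer: sums the values for each key; since every
// value emitted by the mapper is 1, the sum is the token's count
class IntSumReducerSketch
        extends Reducer<Text, IntWritable, Text, IntWritable>
{
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable value : values)
        {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}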
There are two important things to appreciate here: