256 | Big Data Simplified
7. We can run the mapper and reducer on local files (for example, python-input.txt). To run
them on the Hadoop Distributed File System (HDFS), we need the Hadoop Streaming jar
library. So, before running the scripts on the Hadoop engine, test them locally to ensure
that they work as expected.
• Run the mapper.
cat python-input.txt | python mapper.py
• Run the reducer.
cat python-input.txt | python mapper.py | sort -k1,1 | python reducer.py
Local testing is now complete: the mapper and reducer behave as expected, so we should not
face issues when running them on Hadoop.
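The contents of mapper.py and reducer.py are not reproduced in this step. A minimal word-count pair, written here as plain Python functions so that the `cat | mapper | sort | reducer` pipeline above can be simulated in a single process, might look like the following (the word-count logic and function names are assumptions for illustration, not the book's actual scripts):

```python
from itertools import groupby

def mapper(lines):
    # Hypothetical mapper.py logic: emit one "word\t1" pair per word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(pairs):
    # Hypothetical reducer.py logic: sum the counts for each word.
    # Assumes the pairs arrive sorted by key, as after `sort -k1,1`.
    for word, group in groupby(pairs, key=lambda p: p.split("\t")[0]):
        total = sum(int(p.split("\t")[1]) for p in group)
        yield f"{word}\t{total}"

# Simulate: cat python-input.txt | python mapper.py | sort -k1,1 | python reducer.py
sample = ["big data big hadoop", "data big"]
counts = list(reducer(sorted(mapper(sample))))
# counts == ["big\t3", "data\t2", "hadoop\t1"]
```

The intermediate `sort` step stands in for Hadoop's shuffle phase, which groups all values for a key before they reach the reducer.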
9.4.4 Running the MapReduce Python Code on Hadoop
1. Before we run the MapReduce job on Hadoop, copy the local data file (python-input.txt) to
the ‘/data’ directory in HDFS.
hadoop fs -put /<your-local-path>/python-input.txt /data
hadoop fs -cat /data/python-input.txt
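With the input file in HDFS, the job itself is launched through the Hadoop Streaming jar mentioned earlier. A sketch of a typical invocation is shown below; the jar path varies by Hadoop distribution, and the output directory name here is an assumption.

```shell
hadoop jar /<your-hadoop-path>/hadoop-streaming.jar \
    -input /data/python-input.txt \
    -output /data/python-output \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -file mapper.py \
    -file reducer.py
```

The -file options ship the local scripts to every node in the cluster, and the -output directory must not already exist, or the job will fail.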