Introducing MapReduce | 95
7. Can the number of reducers be set to zero?
a. Yes
b. No
c. Not applicable
d. None of the above
8. Where is the Map Output (intermediate
key-value data) stored?
a. HDFS
b. Local File System
c. Name Node
d. Data Node
9. When does the Reduce phase start in a
MapReduce job?
a. Before any Map job starts
b. When the first Map job is completed
c. When all the child Map jobs are completed
d. None of the above
Short-answer Type Questions (5 Marks Questions)
1. Name the most common input formats
defined in Hadoop. Which one is default?
2. Rearrange the main configuration parameters
that the user needs to specify to run a
MapReduce job.
a. Job’s input locations in the distributed
file system.
b. Input format
c. Class containing the map function.
d. Output format
e. Job’s output location in the distributed
file system.
f. Class containing the reduce function.
g. Application JAR file containing the
mapper, reducer and driver classes for
execution and deployment.
3. What is InputSplit in Hadoop? Please
explain.
4. Assume that Hadoop spawned 100 tasks
for a job and one of the tasks failed. What
will the Hadoop MapReduce framework do?
5. What is the difference between an InputSplit
and an HDFS block? Please explain.
6. Explain the difference between Job.submit()
and waitForCompletion().
7. What will happen if we run a MapReduce
job with an output directory that already
exists? Please explain the root cause here.
8. How is an input file made ready from
HDFS by the MapReduce framework? Please
explain.
9. How are the keys grouped before reaching
the Reduce phase? Explain in detail.
10. What will be the problem if the Reducer
function does not receive the values (coming
from the Map phase) in a list? Why is this
needed? Please explain.
11. What are the main configuration parameters
specified in MapReduce?
Long-answer Type Questions (10 Marks Questions)
1. What is shuffling and sorting in
MapReduce? Please explain in detail.
2. Explain the internal flow of a MapReduce
job with a diagram.
3. What is Speculative Execution in MapReduce?
What is the main reason behind it and how
does the MapReduce framework handle it?
4. How can you troubleshoot a MapReduce
job after getting an exception? Please
explain in detail.
5. Explain in detail how YARN schedules a
MapReduce job in the job queue.
6. How can we troubleshoot a MapReduce
job? What action will you take if a
MapReduce job is taking too much time to
complete? How can you find out the root
cause?
7. Explain the different types of events inside
the Intermediate Event (between the Map and
Reduce phases).
8. What are the parameters of mappers and
reducers? Please explain the meaning of
each parameter of Mapper <LongWritable,
Text, Text, IntWritable> and Reducer
<Text, IntWritable, Text, IntWritable>.
9. Explain the differences between a combiner
and a reducer. When is it suggested to
use a combiner in a MapReduce job?
10. What is the main difference between
Mapper and Reducer? What will happen if
the number of Reducers is set to 0 (zero)?
Why are the Compute Nodes and the Storage
Nodes the same? Please explain in detail.