Working with Big Data in R | 227
b. Select the top 5 students with marks
above 78.
c. Select the girls whose names contain
‘ta’, for example, Ankita, Ashmita,
Tamanna, etc.
d. Select the name and a column containing
the total marks.
2. Explain, with a relevant example, how the
parallel package addresses the issue of
poor performance in R.
3. Explain in detail how R can be integrated
into the Hadoop environment.
4. Explain the use of the different packages
in the RHadoop collection.
5. Discuss the main limitations of R as a
programming language when the volume of
data becomes large.
6. What are the main R packages that help
in remediating the limitations that R faces
with large data sets? Discuss how any two
of them help in addressing the issues.
7. You are a data scientist in a credit card
company. Every day you get the credit card
data consisting of the fields Time,
V1 to V28, Amount and Class. A Class
value of 0 signifies that the transaction is
normal, while 1 signifies that it is fraudulent.
You need to write a small R program to compute
the total value of fraudulent transactions.
During the festive season, the number of
transactions has grown exponentially. Due
to the large data size, you are not able to
process the data on your laptop, which has
4 GB of RAM. What do you think the potential
problem might be? What strategy can
you take in this situation so that you can
continue working on the same laptop without
any upgrade, while still using R?
8. Differentiate between:
a. Histogram vs. box plot
b. read.table vs. read.table.ffdf functions
9. Write short notes on the following.
a. Statistical techniques for data set
exploration
b. Scatter plot
10. Write a simple program in R that counts all
words containing ‘an’, to be executed on a
text file that resides in Hadoop.