226 | Big Data Simplified
5. While loading data sets in R, a data frame
object is created in the _______.
a. Hard disk
b. Cloud
c. R workspace
d. None of the above
6. Creating the data frame object in the R
workspace results in
a. Poor performance
b. Out-of-memory problem
c. Enhanced efficiency
d. All the above
7. The ff and ffbase packages in R help in
addressing the _______ problem.
a. Out-of-memory
b. Boosted performance
c. Both (a) and (b)
d. None of the above
8. rhadoop package has _______ packages
under it.
a. Three
b. Five
c. Four
d. Nine
9. Which of the following is not a package
under rhadoop package?
a. rhbase
b. rmr2
c. bigglm
d. plyrmr
10. Which of the following functions helps in
the fastest read of data sets?
a. read.csv
b. read.table.ffdf
c. read.table
d. fread
Short-answer Type Questions (5 Marks Questions)
1. What is CRAN? How can we get and set
working directory in R?
2. Compare the array and matrix data
structures in R.
3. Explain, with a relevant example, the
speciality of the list data structure.
4. What is a package in R? How can you
install and start using a package?
5. Explain the different ways of loading a data
set to start processing with it.
6. Why is read.table.ffdf better to use than
read.table?
7. What are the main advantages of using
data.table package?
8. Explain the purpose of the following
packages.
a. Hmisc
b. ggplot2
9. Mention the use of the following functions
along with the package name they belong
to.
a. detectCores()
b. as.data.frame.ffdf()
c. subset.ffdf()
10. Explain how the dplyr package helps in
achieving advanced data manipulation.
Long-answer Type Questions (10 Marks Questions)
1. A student data set has the attributes Name,
Roll No, Gender, Marks_English,
Marks_Maths and Marks_Science. Write suitable R
commands to achieve the following.
a. Select only the name and roll number of
students whose English marks are
missing (have NA value).
Working with Big Data inR | 227
b. Select the top 5 students having marks
more than 78.
c. Select the girls having ‘ta’ in their
name, for example, Ankita, Ashmita,
Tamanna, etc.
d. Select the name and a column having
total marks.
2. Explain, with a relevant example, how the
parallel package addresses the issue of
poor performance of R.
3. Explain in detail how R can be integrated
into the Hadoop environment.
4. Explain the use of the different packages
under rhadoop package.
5. Discuss the main limitations of R as a
programming language as the volume of data
becomes large.
6. What are the main R packages that help
in remedying the limitations that R faces
with large data sets? Discuss how any two
of them help in addressing the issues.
7. You are a data scientist in a credit card
company. Every day you get the credit card
data consisting of fields such as Time,
fields V1 - V28, Amount and Class. A Class
value of 0 signifies that the transaction is
normal, while 1 signifies that it is fraudulent.
You need to write a small R program to give
the total value of fraudulent transactions.
During the festive season, the number of
transactions has grown exponentially. Due
to the large data size, you are not able to
process the data on your laptop, which has
4 GB of RAM. What do you think the potential
problem might be? What strategy can
you take in this situation so that you can
continue working on the same laptop without
any upgrade, while still using R?
8. Differentiate between:
a. Histogram vs. box plot
b. read.table vs. read.table.ffdf functions
9. Write short notes on the following.
a. Statistical techniques of data set
exploration
b. Scatter plot
10. Write a simple program in R to count all
words having ‘an’ in them, to be executed on a
text file that resides in Hadoop.