Paired RDD
aggregate data
See Data aggregation
creation
consonants
elements
keys()
map()
values
join data
creation
full outer
inner
left outer
nested list
right outer
key/value-pair architecture
page rank
See Page-rank algorithm
Procedural language/PostgreSQL (PL/pgSQL)
PySpark
k-nearest neighbors (KNN) algorithm
page-rank algorithm optimization
script execution
in local mode
Standalone and Mesos cluster managers
PySpark, input/output (I/O) operations
reading CSV file
paired RDD
parseCSV() function
reading data
HDFS
sequential file
reading directory
textFile() function
wholeTextFiles() function
reading JSON file
reading table data, HBase
reading text file
count() function
len() function
textFile() function
wholeTextFiles() function
saving RDD data to HDFS
writing data to sequential file
writing RDD
CSV file
JSON file
text file
PySparkSQL
breadth-first search algorithm
DataFrame
changing data type of column
compound logical expression
creation
data aggregation
data joining
exploratory data analysis
filament data nested list creation
filter() and count() functions
schema creation
schema definition
schema printing
SQL and HiveQL queries, execution of
summary statistics
RDD of row objects, creation
GraphFrames object creation
page-rank algorithm
reading table data, Apache Hive
PySpark streaming
integration, Apache Kafka
reading data, console