Index
A
Anonymous function
Apache Hadoop
Apache HBase
Apache Hive
Apache Kafka
Apache License
Apache Mahout
Apache Mesos
Apache Pig
Apache Spark
Apache Storm
Apache Tez
Atomicity, Consistency, Isolation, and Durability (ACID)
avg() function
B
bfs() function
Big data
characteristics
variety
velocity
veracity
volume
Breadth-first search algorithm
C
CentOS operating system
Cluster managers
count() function
Count of records
createCSV() function
createDataFrame() function
createJSON() function
createOrReplaceTempView() function
createStream() function
CSV file
reading
paired RDD
parseCSV() function
writing RDD to
D
Data aggregation
filament data
mean
paired RDD
RDD
DataFrame
changing data type of column
compound logical expression
creation
data aggregation
data joining
full outer join
inner join
left outer join
reading student data table, PostgreSQL database
reading subject data, JSON file
right outer join
exploratory data analysis
filament data nested list creation
filter() and count() functions
RDD of row objects, creation
schema creation
schema definition
schema printing
SQL and HiveQL queries, execution of
summary statistics
DataFrame abstraction
Data joining
full outer join
inner join
left outer join
reading student data table, PostgreSQL database
reading subject data, JSON file
right outer join
DataNodes
Dataset interface
Data structure, labeled point
Dense vector creation
describe() function
Distributed systems
E
E-commerce companies
Extract, transform, and load (ETL)
F
filter() function
Full outer join
G
Google file system (GFS)
GraphFrames library
GraphFrames object creation
groupBy() function
H
Hadoop distributed file system (HDFS)
reading data from
saving RDD data to
Hadoop installation
.bashrc file
CentOS User
downloading
environment file
installation directory
Java
jps command
NameNode format
passwordless login
problem
properties files
solution
starting script
HBase
Hive installation
Hive property
HiveQL and SQL queries, execution of
HiveQL commands
Hive query language (HQL)
I
Inner join
I/O operations
See PySpark, input/output (I/O) operations
IPython
integration
Notebook
pip
PySpark
J
Java database connectivity (JDBC)
JavaScript object notation (JSON)
reading file
reading subject data from
writing RDD to file
jsonParse() function
K
K-nearest neighbors (KNN) algorithm, PySpark
L
Labeled point
Lasso regression
Left outer join
len() function
Linear regression
Local matrix creation
M
Machine learning
map() function
Map-reduce model
Matrices
local matrix creation
row matrix creation
MLlib
Mutable collection
N, O
NameNode
nc command
Netcat
newAPIHadoopRDD() function
NoSQL databases
NumPy
array()
dtype
mean
mean temperature
medians
min() and max()
ndarray
pip
shape
standard deviation
temperature readings
variance
vstack()
P, Q
Page-rank algorithm
damping factor
function
loop
nested lists
optimization
paired RDDs
web-page system
Paired RDD
aggregate data
See Data aggregation
creation
consonants
elements
keys()
map()
values
join data
creation
full outer
inner
left outer
nested list
right outer
key/value-pair architecture
page rank
See Page-rank algorithm
playDataLineLength RDD
PostgreSQL database
predict() function
printSchema() function
Procedural language/PostgreSQL (PL/pgSQL)
PySpark
k-nearest neighbors (KNN) algorithm
page-rank algorithm optimization
script execution
in local mode
Standalone and Mesos cluster managers
PySpark, input/output (I/O) operations
reading CSV file
paired RDD
parseCSV() function
reading data
HDFS
sequential file
reading directory
textFile() function
wholeTextFiles() function
reading JSON file
reading table data, HBase
reading text file
count() function
len() function
textFile() function
wholeTextFiles() function
saving RDD data to HDFS
writing data to sequential file
writing RDD
CSV file
JSON file
text file
PySpark MLlib
dense vector creation
labeled point creation
local matrix creation
row matrix creation
sparse vector creation
PySparkSQL
breadth-first search algorithm
DataFrame
changing data type of column
compound logical expression
creation
data aggregation
data joining
exploratory data analysis
filament data nested list creation
filter() and count() functions
schema creation
schema definition
schema printing
SQL and HiveQL queries, execution of
summary statistics
RDD of row objects, creation
GraphFrames object creation
page-rank algorithm
reading table data, Apache Hive
PySpark shell
problem
Python programmers
solution
PySpark streaming
integration, Apache Kafka
reading data, console
Python
conditionals
data and data type
dictionary
for and while loops
functions
lambda function
list
NumPy
See NumPy
set
string
tuple
typecasting
R
randomSplit() function
registerTempTable() function
Regression
lasso
linear
ridge
Relational database management system (RDBMS)
Resilient distributed dataset (RDD)
action
creation
first()
getNumPartitions()
list
parallelize()
take()
data manipulation
collect()
filter()
list
map()
sortBy()
take()
Mesos cluster manager
run set operations
SparkContext
Standalone Cluster Manager
summary statistics
temperature data
transformation
Ridge regression
Right outer join
round() function
Row matrix creation
S
save() method
saveAsTextFile() function
select command
sequenceFile() function
sequenceFile() method
Sequential file
reading data from
writing data to
show() function
Shuffling
socketTextStream() function
Software libraries
Spark
Spark architecture
driver
executors
Spark installation
allPySpark location
.bashrc file
downloading
environment file
problem
PySpark
solution
.tgz file
spark.read.csv() function
spark.sql() function
Sparse vector creation
split() function
SQL and HiveQL queries, execution of
Stochastic gradient descent (SGD)
stringToNumberSum() function
strip() function
StructField()
StructType() function
Structured query language (SQL)
summary() function
Supervised machine-learning algorithm
T
Table joining
take() function
textFile() function
train() method
type() function
U
Unix
User-defined functions (UDFs)
V
Vectors
dense vector
sparse vector
W, X, Y, Z
wholeTextFiles() function