Backmatter

Index

Anonymous function

Apache Hadoop

Apache HBase

Apache Hive

Apache Kafka

Apache License

Apache Mahout

Apache Mesos

Apache Pig

Apache Spark

Apache Storm

Apache Tez

Atomicity, Consistency, Isolation, and Durability (ACID)

avg() function

bfs() function

Big data

characteristics

variety

velocity

veracity

volume

Breadth-first search algorithm

CentOS operating system

Cluster managers

count() function

Count of records

createCSV() function

createDataFrame() function

createJSON() function

createOrReplaceTempView() function

createStream() function

CSV file

reading

paired RDD

parseCSV() function

writing RDD to

Data aggregation

filament data

mean

paired RDD

RDD

DataFrame

changing data type of column

compound logical expression

creation

data aggregation

data joining

full outer join

inner join

left outer join

reading student data table, PostgreSQL database

reading subject data, JSON file

right outer join

exploratory data analysis

filament data nested list creation

filter() and count() functions

RDD of row objects, creation

schema creation

schema definition

schema printing

SQL and HiveQL queries, execution of

summary statistics

DataFrame abstraction

Data joining

full outer join

inner join

left outer join

reading student data table, PostgreSQL database

reading subject data, JSON file

right outer join

DataNodes

Dataset interface

Data structure, labeled point

Dense vector creation

describe() function

Distributed systems

E-commerce companies

Extract, transform, and load (ETL)

filter() function

Full outer join

Google file system (GFS)

GraphFrames library

GraphFrames object creation

groupBy() function

Hadoop distributed file system (HDFS)

reading data from

saving RDD data to

Hadoop installation

.bashrc file

CentOS User

downloading

environment file

installation directory

Java

jps command

NameNode format

passwordless login

problem

properties files

solution

starting script

HBase

Hive installation

Hive property

HiveQL and SQL queries, execution of

HiveQL commands

Hive query language (HQL)

Inner join

I/O operations

See PySpark, input/output (I/O) operations

IPython

integration

Notebook

pip

PySpark

Java database connectivity (JDBC)

JavaScript object notation (JSON)

reading file

reading subject data from

writing RDD to file

jsonParse() function

K-nearest neighbors (KNN) algorithm, PySpark

Labeled point

Lasso regression

Left outer join

len() function

Linear regression

Local matrix creation

Machine learning

map() function

Map-reduce model

Matrices

local matrix creation

row matrix creation

MLlib

Mutable collection

N, O

NameNode

nc command

Netcat

newAPIHadoopRDD() function

NoSQL databases

NumPy

array()

dtype

mean

mean temperature

medians

min() and max()

ndarray

pip

shape

standard deviation

temperature readings

variance

vstack()

P, Q

Page-rank algorithm

damping factor

function

loop

nested lists

optimization

paired RDDs

web-page system

Paired RDD

aggregate data

See Data aggregation

creation

consonants

elements

keys()

map()

values

join data

creation

full outer

inner

left outer

nested list

right outer

key/value-pair architecture

page rank

See Page-rank algorithm

playDataLineLength RDD

PostgreSQL database

predict() function

printSchema() function

Procedural language/PostgreSQL (PL/pgSQL)

PySpark

k-nearest neighbors (KNN) algorithm

page-rank algorithm optimization

script execution

in local mode

Standalone and Mesos cluster managers

PySpark, input/output (I/O) operations

reading CSV file

paired RDD

parseCSV() function

reading data

HDFS

sequential file

reading directory

textFile() function

wholeTextFiles() function

reading JSON file

reading table data, HBase

reading text file

count() function

len() function

textFile() function

wholeTextFiles() function

saving RDD data to HDFS

writing data to sequential file

writing RDD

CSV file

JSON file

text file

PySpark MLlib

dense vector creation

labeled point creation

local matrix creation

row matrix creation

sparse vector creation

PySparkSQL

breadth-first search algorithm

DataFrame

changing data type of column

compound logical expression

creation

data aggregation

data joining

exploratory data analysis

filament data nested list creation

filter() and count() functions

schema creation

schema definition

schema printing

SQL and HiveQL queries, execution of

summary statistics

RDD of row objects, creation

GraphFrames object creation

page-rank algorithm

reading table data, Apache Hive

PySpark shell

problem

Python programmers

solution

PySpark streaming

integration, Apache Kafka

reading data, console

Python

conditionals

data and data type

dictionary

for and while loops

functions

lambda function

list

NumPy

See NumPy

set

string

tuple

typecasting

randomSplit() function

registerTempTable() function

Regression

lasso

linear

ridge

Relational database management system (RDBMS)

Resilient distributed dataset (RDD)

action

creation

first()

getNumPartitions()

list

parallelize()

take()

data manipulation

collect()

filter()

list

map()

sortBy()

take()

Mesos cluster manager

run set operations

SparkContext

Standalone Cluster Manager

summary statistics

temperature data

transformation

Ridge regression

Right outer join

round() function

Row matrix creation

save() method

saveAsTextFile() function

select command

sequenceFile() function

sequenceFile() method

Sequential file

reading data from

writing data to

show() function

Shuffling

socketTextStream() function

Software libraries

Spark

Spark architecture

driver

executors

Spark installation

allPySpark location

.bashrc File

downloading

environment file

problem

PySpark

solution

.tgz file

spark.read.csv() function

spark.sql() function

Sparse vector creation

split() function

SQL and HiveQL queries, execution of

Stochastic gradient descent (SGD)

stringToNumberSum() function

strip() function

StructField()

StructType() function

Structured query language (SQL)

summary() function

Supervised machine-learning algorithm

Table joining

take() function

textFile() function

train() method

type() function

Unix

User-defined functions (UDFs)

Vectors

dense vector

sparse vector

W, X, Y, Z

wholeTextFiles() function
