Index
A
B
C
- chill library
- classes
- clustering
- continuous values
- CSV
- CSV files
- csvread function / How it works...
D
- data
- DataFrame
- creating, from CSV / Creating a DataFrame from CSV, How to do it..., There's more…
- URL / Creating a DataFrame from CSV, Manipulating DataFrames, Creating a DataFrame from Scala case classes
- manipulating / Manipulating DataFrames, How to do it...
- schema, printing / Printing the schema of the DataFrame
- data, sampling / Sampling the data in the DataFrame
- columns, selecting / Selecting DataFrame columns
- data by condition, filtering / Filtering data by condition
- data, sorting in frame / Sorting data in the frame
- columns, renaming / Renaming columns
- treating, as relational table / Treating the DataFrame as a relational table
- two DataFrames, joining / Joining two DataFrames
- inner join / Inner join
- right outer join / Right outer join
- left outer join / Left outer join
- saving, as file / Saving the DataFrame as a file
- creating, from Scala case classes / Creating a DataFrame from Scala case classes, How to do it..., How it works...
- JSON, loading / Loading JSON into DataFrames, How to do it…
- JSON file, reading with SQLContext.jsonFile / Reading a JSON file using SQLContext.jsonFile
- text file, converting to JSON RDD / Reading a text file and converting it to JSON RDD
- text file, reading / Reading a text file and converting it to JSON RDD
- schema, explicitly specifying / Explicitly specifying your schema, There's more…
- data, preparing / Preparing data in Dataframes, How to do it...
- Directed Acyclic Graph (DAG) / Submitting jobs to the Spark cluster (local)
- Dow Jones Index Data Set
- Driver program / There's more…
- DStreams
E
- EC2
- Elasticsearch
- ElasticSearch
- ETL tool
F
G
- gradient descent
- Graphviz
- GraphX
H
I
J
K
L
- legends property / Adding a legend to the plot
- Lempel-Ziv-Oberhumer (LZO) / Enable compression for the Parquet file
- linear regression
- used, for predicting continuous values / Predicting continuous values using linear regression, How to do it...
- data, importing / Importing the data
- each instance, converting into LabeledPoint / Converting each instance into a LabeledPoint
- training, preparing / Preparing the training and test data
- test data, preparing / Preparing the training and test data
- features, scaling / Scaling the features
- model, training / Training the model
- test data, predicting against / Predicting against test data
- model, evaluating / Evaluating the model
- parameters, regularizing / Regularizing the parameters
- mini batching / Mini batching
- LogisticRegression
M
- matrices
- working with / Working with matrices, How to do it...
- creating / Creating matrices
- creating, from values / Creating a matrix from values
- zero matrix, creating / Creating a zero matrix
- creating, out of function / Creating a matrix out of a function
- identity matrix, creating / Creating an identity matrix
- creating, from random numbers / Creating a matrix from random numbers
- Scala collection, creating / Creating from a Scala collection
- appending / Appending and conversion
- concatenating / Concatenating matrices – vertically
- concatenating, vertcat function / Concatenating matrices – vertically
- concatenating, horzcat function / Concatenating matrices – horizontally
- data manipulation operations / Data manipulation operations
- basic statistics, computing / Computing basic statistics
- mean and variance / Mean and variance
- standard deviation / Standard deviation
- working / How it works...
- with randomly distributed values / Vectors and matrices with randomly distributed values, How it works...
- matrix
- column vectors, obtaining / Getting column vectors out of the matrix
- row vectors, obtaining / Getting row vectors out of the matrix
- inside values, obtaining / Getting values inside the matrix
- inverse, obtaining / Getting the inverse and transpose of a matrix
- transpose, obtaining / Getting the inverse and transpose of a matrix
- largest value, finding / Finding the largest value in a matrix
- sum, finding / Finding the sum, square root and log of all the values in the matrix
- square root, finding / Finding the sum, square root and log of all the values in the matrix
- log of all values, finding / Finding the sum, square root and log of all the values in the matrix
- sqrt function / Finding the sum, square root and log of all the values in the matrix
- log function / Finding the sum, square root and log of all the values in the matrix
- eigenvectors, calculating / Calculating the eigenvectors and eigenvalues of a matrix
- eigenvalues, calculating / Calculating the eigenvectors and eigenvalues of a matrix
- with uniformly random values, creating / Creating a matrix with uniformly random values
- with normally distributed random values, creating / Creating a matrix with normally distributed random values
- with random values with Poisson distribution, creating / Creating a matrix with random values that has a Poisson distribution
- matrix arithmetic
- matrix of Int
- Mesos
- micro-batching
N
O
P
- PairRDD
- Parquet
- Parquet-MR project
- Parquet files
- Parquet tools
- PCA
- used, for feature reduction / Feature reduction using principal component analysis, How to do it...
- about / Feature reduction using principal component analysis
- dimensionality reduction, of data for supervised learning / Dimensionality reduction of data for supervised learning
- training data, mean-normalizing / Mean-normalizing the training data
- principal components, extracting / Extracting the principal components, Extracting the principal components
- labeled data, preparing / Preparing the labeled data
- test data, preparing / Preparing the test data
- metrics, classifying / Classify and evaluate the metrics
- metrics, evaluating / Classify and evaluate the metrics, Evaluating the metrics
- data, dimensionality reduction / Dimensionality reduction of data for unsupervised learning
- number of components / Arriving at the number of components
- pem key
- Pipeline API, used for solving binary classification
- data, importing as test / Importing and splitting data as test and training sets
- data, importing as training sets / Importing and splitting data as test and training sets
- data, splitting as training sets / Importing and splitting data as test and training sets
- data, splitting as test / Importing and splitting data as test and training sets
- participants, constructing / Construct the participants of the Pipeline
- pipeline, preparing / Preparing a pipeline and training a model
- model, training / Preparing a pipeline and training a model
- test data, predicting against / Predicting against test data
- model, evaluating without cross-validation / Evaluating a model without cross-validation
- parameters for cross-validation, constructing / Constructing parameters for cross-validation
- cross-validator, constructing / Constructing cross-validator and fit the best model
- model, evaluating with cross-validation / Evaluating the model with cross-validation
- prerequisite, for running ElasticSearch instance on machine
- Principal Component Analysis (PCA) / Gradient descent
- Privacy Enhanced Mail (PEM) / How to do it...
- Product
- pseudo-clustered mode
R
S
- save method / Save it as a Parquet file
- sbt-avro plugin
- sbt-dependency-graph plugin
- SBT assembly plugin
- sbteclipse plugin
- Scala bindings
- Scala Build Tool (SBT) / Getting Breeze – the linear algebra library
- Scala case classes
- scatter plots, creating with Bokeh-Scala
- Sense plugin
- Snappy
- Snappy compression / Enable compression for the Parquet file
- Scala Build Tool (SBT) / Getting Apache Spark
- Spark
- spark.driver.extraClassPath property
- Spark 1.4
- Spark application
- Spark cluster
- Spark job
- Spark job, installing on YARN
- Spark master and slave
- Spark Standalone cluster
- running, on EC2 / Running the Spark Standalone cluster on EC2
- AccessKey, creating / Creating the AccessKey and pem file
- pem file, creating / Creating the AccessKey and pem file
- environment variables, setting / Setting the environment variables
- launch script, running / Running the launch script
- installation, verifying / Verifying installation
- changes, making to code / Making changes to the code
- data, transferring / Transferring the data and job files
- job files, transferring / Transferring the data and job files
- dataset, loading into HDFS / Loading the dataset into HDFS
- job, running / Running the job
- destroying / Destroying the cluster
- Spark Streaming
- Stochastic Gradient Descent (SGD) / Gradient descent
- StreamingLogisticRegression, used for classifying Twitter stream
- Student dataset
- supervised learning / Supervised and unsupervised learning
- Support Vector Machine (SVM)
T
- time series MultiPlot, creating with Bokeh-Scala
- about / Creating a time series MultiPlot with Bokeh-Scala, How to do it...
- data, preparing / Preparing our data
- Plot, creating / Creating a plot
- line joining to all data points, creating / Creating a line that joins all the data points
- x and y axes' data ranges for plot, setting / Setting the x and y axes' data range for the plot
- axes, drawing / Drawing the axes and the grids
- grids, drawing / Drawing the axes and the grids
- tools, adding / Adding tools
- legend, adding to plot / Adding a legend to the plot
- multiple plots, creating in document / Multiple plots in the document
- URL / Multiple plots in the document
- toDF() function / How to do it...
- twitter-chill project
- twitter4j library
- Twitter app
- Twitter data
- Twitter stream
U
V
- vector concatenation
- vectors
- working with / Working with vectors, Getting ready
- creating / Creating vectors
- constructing, from values / Constructing a vector from values
- zero vector, creating / Creating a zero vector
- creating, out of function / Creating a vector out of a function
- vector of linearly spaced values, creating / Creating a vector of linearly spaced values
- vector with values, creating in specific range / Creating a vector with values in a specific range
- entire vector with single value, creating / Creating an entire vector with a single value
- sub-vector, slicing from bigger vector / Slicing a sub-vector from a bigger vector
- Breeze vector, creating from Scala vector / Creating a Breeze Vector from a Scala Vector
- arithmetic / Vector arithmetic
- scalar operations / Scalar operations
- dot product of two vectors, creating / Calculating the dot product of two vectors
- creating, by adding two vectors / Creating a new vector by adding two vectors together
- appending / Appending vectors and converting a vector of one type to another
- converting from one type to another / Appending vectors and converting a vector of one type to another
- concatenating / Concatenating two vectors
- standard deviation / Standard deviation
- largest value, finding / Find the largest value in a vector
- sum, finding / Finding the sum, square root and log of all the values in the vector
- log, finding / Finding the sum, square root and log of all the values in the vector
- square root, finding / Finding the sum, square root and log of all the values in the vector
- sqrt function / Finding the sum, square root and log of all the values in the vector
- log function / Finding the sum, square root and log of all the values in the vector
- with randomly distributed values / Vectors and matrices with randomly distributed values, How it works...
- with uniformly distributed random values, creating / Creating vectors with uniformly distributed random values
- with normally distributed random values, creating / Creating vectors with normally distributed random values
- with random values with Poisson distribution, creating / Creating vectors with random values that have a Poisson distribution
W
Y
Z