Index

A

ANN
- about / K-Means in practice, ANN – Artificial Neural Networks
- theory / Theory
- Spark server, sparkling / Building the Spark server
- using / ANN in practice
account management, Databricks
- about / Account management
Amazon AWS
- URL / Amazon EC2
- pricing, URL / Amazon EC2
Amazon EC2
- about / Amazon EC2
- URL / Amazon EC2
Amazon Elastic Compute Cloud (EC2) / Installing Databricks
Apache Giraph / Overview
Apache Kafka
- URL / Kafka
- about / Kafka
- JAR library file, URL / Kafka
Apache Mesos / Apache Mesos
Apache Spark
- overview / Overview
- URL / Overview, Overview, Further reading
- Spark Machine Learning / Spark Machine Learning
- stream processing / Spark Streaming
- SQL module / Spark SQL
- graph processing / Spark graph processing
- extended ecosystem / Extended ecosystem
- future / The future of Spark
- cluster design / Cluster design
- cluster management / Cluster management
- performance, examining / Performance
- SQL context / The SQL context
- used, for accessing HBase / Accessing HBase with Spark
- used, for accessing Cassandra / Accessing Cassandra with Spark
- Titan, accessing with / Accessing Titan with Spark
Apache Spark streaming
- overview / Overview
- URL / Overview
- errors / Errors and recovery
- recovery / Errors and recovery
- HDFS-based checkpoint, setting up / Checkpointing
- data sources / Streaming sources
Apache YARN / Apache YARN
architecture, H2O / Architecture
Artificial Neural Net (ANN) / Sourcing the data
AWS
- URL / Installing Databricks
AWS billing / AWS billing

B

BaseConfiguration method / Alternative Groovy configuration
Bruce Penn
- URL / The Hadoop file system

C

Cassandra
- Titan, accessing with / Titan with Cassandra
- installing / Installing Cassandra
- accessing, with Apache Spark / Accessing Cassandra with Spark
classifications, with Naïve Bayes
- about / Classification with Naïve Bayes, Naïve Bayes in practice
- theory / Theory
closeness centrality algorithm
- about / The closeness centrality algorithm
Cloudera
- URL / Local Hive Metastore server
cluster design, Apache Spark / Cluster design
clustering, with K-Means
- about / Clustering with K-Means
- theory / Theory
cluster management
- about / Cluster management
- local mode / Local
- standalone mode / Standalone
- Apache YARN / Apache YARN
- Apache Mesos / Apache Mesos
- Amazon EC2 / Amazon EC2
cluster management, Databricks
- about / Cluster management
connected components algorithm
- about / The connected components algorithm

D

dashboards / Overview
data
- importing / Importing and saving data
- saving / Importing and saving data
- text files, processing / Processing the Text files
- JSON files, processing / Processing the JSON files
- Parquet files, processing / Processing the Parquet files
- sourcing / Sourcing the data
- quality / Data Quality
- moving / Moving data
- table data, importing / The table data
- folder, importing / Folder import
- library, importing / Library import
databases / Overview
Databricks
- URL / The future of Spark, Amazon EC2, Cloud, Overview, Further reading
- overview / Overview
- installing / Installing Databricks
- AWS billing / AWS billing
- menu / Databricks menus
- account management / Account management
- cluster management / Cluster management
- Notebooks / Notebooks and folders
- folder / Notebooks and folders
- jobs / Jobs and libraries
- libraries / Jobs and libraries
- references / Further reading
Databricks file system (DBFS) / The table data
Databricks tables
- about / Databricks tables
- creating, via data import / Data import
- external tables / External tables
DataFrames
- about / DataFrames
data sources, Apache Spark streaming
- Kafka / Overview
- Flume / Overview, Flume
- HDFS / Overview
- about / Streaming sources
- TCP stream / TCP stream
- file streams / File streams
- Apache Kafka / Kafka
DataStax Spark Cassandra connector
- URL / The Spark Cassandra connector
data visualization
- about / Data visualization
- dashboards / Dashboards
- RDD-based report / An RDD-based report
- stream-based report / A stream-based report
DBFS
- accessing / Databricks file system
dbutils.fs class
- about / External tables
dbutils package
- about / The DbUtils package
- DBFS / The DbUtils package
- fsutils group / Dbutils fsutils
- cache functionality / The DbUtils cache
- mount functionality / The DbUtils mount
deep learning
- about / Deep learning
- URL / Deep learning
- Scala-based H2O Sparkling Water example / Example code – income
- MNIST / The example code – MNIST
development environments, Databricks
- about / Development environments
discrete stream (DStream) / Overview
Docker
- URL / Installing Docker
- installing / Installing Docker

E

end of file markers (EOF) / Using Cassandra
environment, H2O
- processing / The processing environment
environment configuration, MLlib
- architecture / Architecture
- development environment / The development environment
- Spark, installing / Installing Spark
Extract, Transform, Load (ETL)
- about / Architecture

F

False Positive Rate (FPR) / H2O Flow
Flume
- about / Flume
- URL / Flume
folder / Notebooks and folders

G

graph, creating
- counting example / Example 1 – counting
- filtering example / Example 2 – filtering
- PageRank algorithm / Example 3 – PageRank
- triangle counting / Example 4 – triangle counting
- connected components / Example 5 – connected components
GraphInputFormat class / Using HBase
graph processing, Apache Spark / Spark graph processing
GraphX
- overview / Overview
- coding / GraphX coding
GraphX coding
- about / GraphX coding
- environment / Environment
- graph, creating / Creating a graph
Gremlin language / TinkerPop

H

H2O
- overview / Overview
- environment, processing / The processing environment
- system versions, URL / The processing environment
- installing / Installing H2O
- Sparkling Water download option, URL / Installing H2O
- build environment / The build environment
- architecture / Architecture
- URL / Architecture
- performance tuning / Performance tuning
H2O Flow
- about / H2O Flow
- URL / H2O Flow
Hadoop / The development environment
Hadoop file system / The Hadoop file system
Hadoop Gremlin
- URL / TinkerPop's Hadoop Gremlin
HBase
- Titan, accessing with / Titan with HBase
- accessing, with Apache Spark / Accessing HBase with Spark
head function / Dbutils fsutils
Hernan Amiune
- URL / Theory
Hive
- using / Using Hive
- local Metastore server / Local Hive Metastore server
- Hive-based Metastore server / A Hive-based Metastore server
Hive-based Metastore server
- using / A Hive-based Metastore server

J

JavaScript Object Notation (JSON) files
- processing / Processing the JSON files
jobs
- about / Jobs and libraries

K

K-Means
- clustering / Clustering with K-Means
- using / K-Means in practice

L

LabeledPoint
- URL / Naïve Bayes in practice
libraries
- about / Jobs and libraries
local Hive Metastore server
- using / Local Hive Metastore server

M

markdown
- URL / Notebooks and folders
Mazerunner, for Neo4j
- about / Mazerunner for Neo4j
- Docker, installing / Installing Docker
- Neo4j browser / The Neo4j browser
- algorithms / The Mazerunner algorithms
Mazerunner algorithms
- about / The Mazerunner algorithms
- PageRank algorithm / The PageRank algorithm
- closeness centrality algorithm / The closeness centrality algorithm
- triangle count algorithm / The triangle count algorithm
- connected components algorithm / The connected components algorithm
- strongly connected components algorithm / The strongly connected components algorithm
MLlib
- environment configuration / The environment configuration
MNIST
- URL / Sourcing the data
- about / The example code – MNIST

N

Naïve Bayes
- classification / Classification with Naïve Bayes
- using / Naïve Bayes in practice
- URL / Naïve Bayes in practice
Neo4j browser
- about / The Neo4j browser
- URL / The Neo4j browser
Notebook / Notebooks and folders

O

OOM (Out of Memory) messages / Memory
Oryx system
- URL / Cloud

P

P (Spam|Buy) / Theory
PageRank algorithm
- about / The PageRank algorithm
Parquet files
- about / Importing and saving data
- processing / Processing the Parquet files
performance
- examining / Performance
- cluster structure / The cluster structure
- Hadoop file system / The Hadoop file system
- data locality / Data locality
- OOM (Out of Memory) messages, avoiding / Memory
- code, tuning / Coding
PostgreSQL connector library
- URL, for download / A Hive-based Metastore server
PredictionIO
- URL / Cloud

R

remove function (rm) / Dbutils fsutils
REST interface
- about / REST interface
- configuration / Configuration
- cluster management / Cluster management
- execution context / The execution context
- command execution / Command execution
- libraries / Libraries

S

SeldonIO
- URL / Cloud
Sister property / Overview
Sparkling Water component, H2O
- URL / The processing environment, Architecture
Spark Machine Learning / Spark Machine Learning
SparkOnHBase module
- URL / Spark on HBase
Spark SQL / Spark SQL
SQL
- using / Using SQL
SQL context
- about / The SQL context
streaming, Apache Spark / Spark Streaming
stream processing / Spark Streaming
strongly connected components algorithm
- about / The strongly connected components algorithm

T

tertiary education / Data visualization
textFile method / Processing the Text files
text files
- processing / Processing the Text files
TinkerPop
- about / TinkerPop
- URL / TinkerPop
Titan
- about / Titan
- URL / Titan, Installing Titan
- installing / Installing Titan
- accessing, with HBase / Titan with HBase
- accessing, with Cassandra / Titan with Cassandra
- accessing, with Apache Spark / Accessing Titan with Spark
Titan, accessing with Apache Spark
- about / Accessing Titan with Spark
- Gremlin shell / Gremlin and Groovy
- Groovy commands, executing / Gremlin and Groovy
- TinkerPop Hadoop Gremlin package / TinkerPop's Hadoop Gremlin
- alternative Groovy configuration / Alternative Groovy configuration
- Cassandra, using / Using Cassandra
- HBase, using / Using HBase
- file system, using / Using the filesystem
Titan, accessing with Cassandra
- about / Titan with Cassandra
- Cassandra, installing / Installing Cassandra
- Gremlin Cassandra script / The Gremlin Cassandra script
- Spark Cassandra connector / The Spark Cassandra connector
Titan, accessing with HBase
- about / Titan with HBase
- HBase cluster, using / The HBase cluster
- Gremlin HBase script / The Gremlin HBase script
- SparkOnHBase module, using / Spark on HBase
TitanFactory.open method / Using Cassandra
triangle count algorithm
- about / The triangle count algorithm
True Positive Rate (TPR) / H2O Flow
Twitter
- URL / A stream-based report

U

user-defined functions (UDFs)
- about / User-defined functions

V

Velox system
- URL / Cloud
Vendor API / TinkerPop
