Index
A
- A/B testing
- actions engine
- Activity Monitor
- Alternating Least Squares (ALS)
- Analysis of Variance (ANOVA)
- AngularJS
- annotation
- ANother Tool for Language Recognition (ANTLR)
- architecture, Spark
  - about / Understanding Spark architecture
  - task scheduling / Task scheduling
  - Spark components / Spark components
  - MQTT / MQTT, ZeroMQ, Flume, and Kafka
  - ZeroMQ / MQTT, ZeroMQ, Flume, and Kafka
  - Flume / MQTT, ZeroMQ, Flume, and Kafka
  - Kafka / MQTT, ZeroMQ, Flume, and Kafka
  - HDFS / HDFS, Cassandra, S3, and Tachyon
  - Cassandra / HDFS, Cassandra, S3, and Tachyon
  - S3 / HDFS, Cassandra, S3, and Tachyon
  - Tachyon / HDFS, Cassandra, S3, and Tachyon
  - Mesos / Mesos, YARN, and Standalone
  - YARN / Mesos, YARN, and Standalone
  - Standalone / Mesos, YARN, and Standalone
- Aster Data
- Avro
- AvroParquet
- Azkaban
B
- Balancer
- basic sampling
- Box-Cox transformation
- Breeze
- build.sbt file
C
- Cassandra
- categorical field
- central limit theorem (CLT)
- chunking
- classification metrics
- Client
- clique
- Cloudera
- command and control (C2)
- complex types
- connected components
- consistent sampling
- continuous space
- correlation engine
- correlations
D
- data-driven system
- data analysis life cycle / Linear models
- data analytics
- DataFrame
- data frames
- DataFrameStatFunctions
- data ingest
- data rearranging
- data transformation layer
- decision tree
- decision tree, parameters
- descriptive statistics
- Directed Acyclic Graph (DAG)
- Dirichlet distribution
- distributed algorithms
- Drools
- Dropwizard
- Druid
E
- e-mails
- edge list
- edges
- Elastic Net
- Emacs / SBT
- ensemble learning methods
- Expectation Maximization (EM) algorithm
- exploration-exploitation trade-off
- extract, transform, and load (ETL)
F
- FACTORIE toolkit
- feature construction
- Flex
- Flume
- functional approach
G
- Ganglia
- Gaussian mixture
- generalization error
- graph
- graph algorithms
- Graph for Scala
- GraphFrames
- Graphite
- GraphX
H
- Hadoop Distributed File System (HDFS)
- Hadoop HDFS
- HDFS
- heteroscedasticity
- Hive
- Homebrew package
I
- Ignite File System (IGFS)
- Impala
- influence diagrams
- interactivity
- Iris dataset
J
- Java Management Extensions (JMX)
- Java Mission Control (JMC)
- Java Specification Request (JSR) / Linear models
- joda-time library
- JSON format
- JSON package
- JSON support
- JSR 110
- JSR 223
- Jython
K
- k-means clustering
- Kafka
- Kamon
- Kelly Criterion
- Kryo
- Kudu
- Kullback-Leibler (KL) distance
L
- LabeledPoint
- labeled point
- Latent Dirichlet Allocation (LDA)
- Least Absolute Shrinkage and Selection Operator (LASSO)
- LIBSVM format
- Lift
- lift-json library
- Lift framework
- Limited-Memory BFGS (L-BFGS)
- linear regression
- Linear Support Vector Machine (SVM)
- logistic regression
- loss functions
M
- Machine Learning (ML)
- machine learning engine
- map optimization
- Markov Chain Decision Process
- Mesos
- metrics
- micro-batch processing
- mirrors
- MLlib algorithms
- ML libraries
- model monitoring
- monitoring
- Monthly Active Users (MAU)
- MQTT
- multiclass problems
- Multilayer Perceptron Classifier (MLPC)
- Multivariate Analysis of Variance (MANOVA)
- multivariate regression
- MurmurHash function
N
- .NET MyMediaLite library
- NameNode
- NameNode UI
- nested data
- NodeJS
- Node Manager
- nodes
- numeric field
O
- object-oriented approach
- Online Transaction Processing (OLTP)
- Oozie
- optimization
- outputs, linear models
- overfitting
P
- PageRank algorithm
- parameters, SparkR glm implementation
- Pareto chart
- Parquet
- parquet file
- pattern matching
- Pearson correlation coefficient
- perceptron
- Play
- Play framework
- Poisson distribution
- Porter Stemmer
- POS (part-of-speech) tagging
- Power Iteration Clustering (PIC)
- Principal Component Analysis (PCA)
- probabilistic structures
- problem dimensionality
- process monitoring
- Project Gutenberg
- projections
- Protobuf
- pseudo-regret
- PySpark / PySpark
- Python
- Python, calling from Java/Scala
R
- R
- read-evaluate-print loop (REPL)
- Receiver Operating Characteristic (ROC)
- regression
- regression trees
- regularization
- Remote Procedure Call (RPC)
- Resilient Distributed Dataset (RDD)
- Resource Manager
- risk handling
- ROC
- RSclient/Rserve
- RStudio
- Rsync
- Run-Length Encoding (RLE)
S
- S3
- SBT
- sbteclipse project
- Scala
- Scala, integrating with Python
- Scala, integrating with R
- Scala API
- scalastyle plugin
- Scala Swing
- Scalate template
- Scalatra
- Secondary NameNode
- segmentation
- sequential trials
- serialization
- serialization formats
- sessionization
- Singular Value Decomposition (SVD)
- Slick
- Spark
- Spark, applications
- SPARK-3703
- Spark Master
- Spark Notebook
- Spark Notebooks
- SparkR
- Spark RDDs
- Spark SQL
- Standalone
- Stochastic Gradient Descent (SGD)
- Stochastic Gradient Descent (SGD) algorithm
- stratified sampling
- streaming k-means
- StreamSets
- strongly connected components
- supervised learning
- SVD++
- SVMWithSGD
- Syslog
- system monitoring
T
- Tachyon
- task scheduling
- Term Frequency Inverse Document Frequency (TF-IDF)
- text analysis pipeline
- Thrift
- tokenization
- traits
- triangle counting algorithm
- triangle inequality
- Turkey paradox
U
- UI component
- unknown unknowns
- unstructured data
- unsupervised learning
V
- Vapnik-Chervonenkis (VC) dimension
- Vector
- vertices
- vi / SBT
W
X
Y
Z