Index

a

acast 81
ACC (accuracy) 170, 172–173, 179
ACF (autocorrelation function) 178
ACID (Atomicity, Consistency, Isolation, Durability) 86
active state 7
ADD 99–101, 107, 112–113, 117
additive model 178
advanced analytics 1
Agglomerative Clustering 175
aggregate functions 97, 104–105
airquality dataset 21–22, 27–29, 102, 153
algebra 4
algorithms 22, 166, 168, 169, 174–176
aligned 9
Allaire, J. J. 183
allocation 181
alpha value 163
alphabetical order 89
ALTER TABLE 94, 99–101, 107, 112–113
alternative hypothesis 163
analysis 1–5, 7–8, 14, 19–20, 22, 29–30, 51–58, 60, 75, 151, 160, 167, 177, 179–182
AND 90–93
angle 168, 181
annual 2
anscombe 129, 130
Apache Hive 85
application 1, 7, 179
AR (autoregressive model) 179
area under the curve (AUC) 165, 174
ARIMA 178–179
ARIMAX 178
arithmetic 36, 159, 179
arrays 3, 23, 119, 121–123, 125
AS 98–99
ascending order 89, 105
as.Date function 39
assigning values 12, 33, 42, 46
assignment operators 47
association 180
asymmetry 162
Atkins, A. 183
Atomicity, Consistency, Isolation, Durability (ACID) 86
AUC (area under the curve) 165, 174
autocorrelation function (ACF) 178
autoregressive model 178–179
availability 85, 86
avg() 97
axes 130

b

Balanced Iterative Reducing and Clustering using Hierarchies (Birch) 176
bar chart 130
Bar‐Line Plot 130, 133, 134, 143
Bar Plot 130–132, 143
Barr, Anthony James 1
BASE (Basically Available, Soft state, Eventual consistency) 86
Base SAS 2, 3, 86
Basically Available, Soft state, Eventual consistency (BASE) 86
Bayes theorem 166, 169
Bayesian network 179
Bell Laboratories 2
bell curve 161
Bernoulli 161
BETWEEN 103–104
Big Data xiii
binary classifier system 169, 171–173
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) 176
Book table 110
Box Plot 130, 135, 144
brackets 22
breadth 19
Brown, Dan 107, 112, 113, 116
Brownlee, J. 183
browser 2, 154, 156
Bubble Plot 130, 136, 144
bundle 3
business analytics xiii, 1

c

Calculator 35
CALL 119
Call Symput 125–126
CAP theorem 85
carburettors 79
cards 9, 15, 45
categorical variables 33, 48, 57–58, 75, 159, 160
CDF (cumulative distribution function) 165
central limit theorem 162
central tendency 51, 160, 161
centroid 175, 176
CF Nodes (Characteristic Feature nodes) 176
CFT (Characteristic Feature Tree) 176
Chambers, John 2
Chang, W. 183
character variables 9, 15, 17, 22–23, 25, 33, 46
Characteristic Feature nodes (CF Nodes) 176
Characteristic Feature Tree (CFT) 176
Cheng, J. 183
Chi Square 161, 163
Cij 171
circle chart 130
CLASS 51, 53, 104
click 1–2
clustering 174–176, 180
Collins 107, 112
comma elimination 34
Comma Separated Value (CSV) Files 10, 11
completeness 174
complex matrix 180
Composite Method 179
comprehensive 3, 54
Comprehensive R Archive Network (CRAN) 3, 174
compress 45
concatenation 44
confusion matrix 171–173
consistency 85–86
constant 178
continuous variable 159–160
core 176
Corpus 180
Cosine similarity 181
count() 97
CRAN (Comprehensive R Archive Network) 3, 174
create table 98
cross tabulations 75–82
cross validation 167
CrossTable 80
Croston 179
CSV (Comma Separated Value) Files 10, 11
Cth column 22
cumulatives 75–76, 165
cumulative distribution function (CDF) 165
curse 180
curse of dimensionality 180
cyclicality 177
cylinders 79, 140

d

Dahl, Roald 107, 112, 113, 116
Dan, Brown 107, 112, 113, 116
dashes 39
data 1–5, 7–35, 37–39, 42–46, 48, 51–58, 60, 62–70, 75–79, 81–82, 85–86, 88–91, 94–98, 100, 102, 104, 106–108, 110, 112–127, 129–131, 133–143, 145, 147, 151, 153–157, 159–169, 171, 173–181, 183–184
database 3, 10, 85–86, 88, 167
data cleaning 29–31
data, importing of 7–13
data input 13–15
data inspection 19–22
data, missing values 22–29
Data_null_ 125
data output 151–156
data, printing of 16–17
Data science 7, 19, 31, 129, 151, 159, 166
Data Scientists 159–181
Data Step 4
data visualization 129–148
DataFrames 85
datalines 9, 15
dataset 8–10, 14–17, 23, 29, 51–52, 58, 66, 78, 85–86, 94–95, 99, 102, 114–115, 122–123, 125–126, 129–130, 151, 161, 167, 170, 175
datasetname 151
Data.Table package 60, 62
Date data 37–43, 45, 47–48, 124
datepart 38–39
datetime data 37
DBMS 10, 22, 87, 95, 97, 151
DBSCAN 176
dcast 81, 83
ddmmyy 38
debugging 122
Deceptio 113, 116
decimal points 52
Decision Trees (DTs) 170, 171, 174
decomposition 177, 180
decreasing 178
deep learning xiii
DELETE 22, 25, 28, 95–97, 99
delimited 9
delimiter 47
demand 179
dendrogram 175
density 176
dependent variable 166
descending order 89
describe function 60–61, 160–162
descriptive statistics 52, 160–161, 166
DescTools 47
difftime 38, 41–42
digits 39, 175
dim 20–21, 23
dimensionality reduction 180
dimensions 20, 130
discrete variable 159–161
disk 130
dispersion 51, 160–161
dmy 41–42
documents 42, 151, 153–156, 180–181
Document Term Matrix (DTM) 180
documentation 31, 53
dollar elimination 34
double quotes 12, 42, 46
dplyr package 63–64, 67–69
DROP 23, 69
dsd 9, 15
DT 38, 62
dta() function 12
DTM (Document Term Matrix) 180
DTs (Decision Trees) 170, 171, 174
duplicate 57, 176
dynamic documents 156

e

Econometrics and Time Series Analysis (ETS) 2–3
Eigendecomposition 180
Elastic Net regression 168
elbow 177
emails 42
encoded 22
end xiii, 20, 23, 119–120, 122–123, 133, 141
error 22, 29, 31, 54, 85–86, 122, 160, 163–164, 168, 171–172, 178–179
ETS (Econometrics and Time Series Analysis) 2, 4, 178–179
Euclidean distances 170, 180
Excel 10–12, 126, 151
exogenous variable 179
exploratory data analysis 19, 52, 160
Exponential Smoothing 178, 179
extraction 47
Extreme Gradient Boosting (XGBoost) 171

f

factors 20, 48, 64, 66, 75, 79, 166, 172
factorization 180
false null hypothesis 164
Fastclus 174
F‐beta score 172
FCMP 119
FNR (False Negative Rate) 172
for loop 23
forecasting 177–179, 183
FORMAT 37
Fortran code 3
FPR (False Positive Rate) 172–173
fread() 11, 62
FREQ procedure 75–77
frequency distributions 75–82, 160–161
functions 3–7, 9, 11–13, 16–17, 23, 30–31, 37, 39–41, 48, 51, 60, 62–63, 67–69, 80, 86, 94, 96–97, 104, 112, 117–119, 121, 123, 125–127, 161, 165–168, 174, 176, 178, 180
FUNCTIONNAME 119
fwrite 153–154

g

Garbage In Garbage OUT (GIGO) 7
Gaussian mixture models 176
Gentleman, R. 2, 184
getnames 10, 22, 87, 95, 97
GIGO (Garbage In Garbage OUT) 7
glm 169
GNU project 2
Goodnight, James 1
Gramfort, A. 183
GRAPH (Graphics and presentation) 2
group_by() 70
group by analysis 57–60, 104–106
gtables package 80
guarantee 85–86

h

handling dates 37–42
handling numeric data 33–36
handling strings data 42–48
HAVING 105–106
Hclust (hierarchical clustering) 175
HeatMap 130, 137, 145
Hidden Markov models (HMM) 179
hierarchical clustering (Hclust) 175
histogram 130, 138, 146
HMisc package 60
HMM (Hidden Markov models) 179
Hornik, K. 184
homogeneity 174
html 2–3, 126–127, 151–152, 154–157
Hypothesis Testing 160, 163–164

i

IDE RStudio 3
‘if’ statement 23, 28–30
Ignore 22
Ihaka, Ross 2
IML (interactive matrix language) 2, 4
importing data 7–17
independence 165, 169
inferential statistics 160
INFORMAT 34, 37
inner join 110–112, 116, 181
INPUT statement 8, 14, 37, 43–44
input data 7–17
INR 122
INSERT INTO statement 94–96, 106–108, 112–113
install.packages 3
intck option 38–39
integers 33, 40, 106–107, 112–113
integral types 33
Internet 42, 181
INTO 94, 108
IS NULL or IS NOT NULL 102

j

Jack 106–107, 112–116
JOIN 110–112
json files 10
jsonlite package 10

k

keys 108
K‐Fold Cross Fitting 167
KMeans 174, 175
knit document 154–156
kNN (K Nearest Neighbors) 170
Kolmogorov‐Smirnov non‐parametric 164
Kullback–Leibler divergence 180
kurtosis 51, 54, 162

l

languages xiii, 1–5, 9, 37, 85, 122, 129, 183
lapply 60
LAR (Least Angle Regression) 168
LARS 168
Lasso regression 168
LDA (Latent Dirichlet Allocation) 181
LeaveOneOut (LOO) 167
left join 110–112, 115–116
libname statement 114–116
library 5, 7, 10–12, 41, 47, 52, 62, 64, 80–81, 88, 115, 153–156
LIKE 103
lines 4, 8–9, 14–15, 83, 129–130, 133–134, 139, 143, 146
Line Chart 130, 139, 146
Line Graph 130
Line Plot 146
linear regression 167
llist 104
lm 167
log 122, 124, 126, 169
logarithm 169
logistic regression 4, 169, 174
logit 169
Long Short‐Term Memory (LSTM) units 179
LOO (LeaveOneOut) 167
loops 119–121
loss function 168
LSTM (Long Short‐Term Memory) 179
lubridate package 41–42
Luraschi 183

m

machines xiii, 2, 19, 22, 169–170, 183
Macros 119, 121, 123, 125
MAPD (mean absolute percentage deviation) 179
MAPE (mean absolute percentage error) 179
Markdown 156
max 24, 27, 29, 51–52, 54, 58–60, 80, 97–98, 133–134
maxdec 52
maximum 30, 52, 97–98, 160
McPherson, J. 183
mdy 41
mean 21–22, 24–31, 46, 51–54, 56–57, 59–62, 70–71, 81–82, 104, 124, 129, 133–134, 151, 160–161, 163, 168, 172, 174, 176, 178–179
mean absolute percentage deviation (MAPD) 179
mean absolute percentage error (MAPE) 179
MeanShift 176
median 21–22, 27–29, 51–52, 54, 59–61, 130, 148, 161, 164–165, 168
mice package 22
min 21, 27, 29, 51–52, 55, 59–60, 97–99, 133–134
Miner 166
MiniBatchKMeans 176
minimize 167, 171, 180
minimum 30, 52, 97–99
missing values 13, 20, 22–29, 97, 102, 106
missover 9, 15
ML (Machine Learning) xiii
mode 161
MODIFY 99
Mosaic Plot 130, 140, 147
mtscaled 145
mu 161

n

NA 13, 21–25, 27–29, 31, 46–47, 81, 100, 102
Naive Bayes 169
negative 33, 37, 64, 162, 164, 171–173, 181
network 3, 85, 171, 179
Neural Nets 171, 179
Ngram 181
nlevels 75–77
nocol 76–77
nocum 76–77
nodes 85, 176
nominal variable 159–160, 166
nonintegral types 33
nopercent 76–77
normal distribution 161–162, 165
norow 76–77, 83
NoSQL 86
NOT 90–93
NULL 14, 102, 106
null hypothesis 163–164
numbers 9–10, 12, 15, 33, 35, 37, 39, 41, 43, 45–47, 52, 67, 70, 75–79, 85, 87–91, 94, 97, 105, 108, 117, 122, 125, 129, 159–161, 165–166, 170–174, 176–177
numeric data 8, 14, 33–36, 40, 44–45, 47–48, 78
numerical summary 51–71

o

object xiii, 3, 5, 9, 12–13, 20, 40, 46, 62, 94, 151, 153, 174–175
odds 169
ODS 126, 131, 134–142, 151
Ohri, Ajay 1, 7, 13, 19, 33, 43, 51, 75, 85, 119, 129, 151, 159, 183
operations research (OR) 2
operators 3, 28–30, 47, 64, 90, 104
OR 2, 90
Order By 89–93, 99, 105–107
ordinal variable 159–160
outliers 169
outobs 87–88, 102–103
output data 3–4, 14, 20, 30, 55–56, 63, 67, 80, 88–89, 96, 102, 119–120, 123, 126, 130, 151–157, 166
overfitting 167
ozone 4, 19–29, 102

p

PACF (partial autocorrelation function) 178
package 3–8, 10–12, 16–17, 22, 30–31, 41–42, 48, 60, 62–63, 80–82, 85–86, 88, 115, 117, 153–154, 156, 166, 179
PACKAGENAME 3
pandas 85
pandasql 85
parameters 12–13, 17, 62, 119, 126, 160, 167, 171, 176, 178
parametric 164, 171
Parmigiani, G. G. 184
partial autocorrelation function (PACF) 178
partition tolerance 85
patterns 103, 130, 153–155, 179
PCA (principal component analysis) 175, 180–182
PDF 153–155, 165
peaking 162
Pedregosa, F. 183
permanent dataset 10, 114
pictures 153
pie chart 130, 141, 147
pipeline 7, 19
plain text format 156
plots 4, 55–56, 129–136, 139–144, 146–149, 152–154, 173, 177–178
Poisson distribution 161
polynomial regression 168
positive 33, 37, 162, 164, 171–174, 180–181
POSIXct function 40–41
Powerpoint 151
precision 160, 172–173
predictive analytics 1, 166
predictors 166–169
PRIMARY KEY 108
principal component analysis (PCA) 175, 180–182
PRINT 46, 57
print formatted string 46
printing data 16–17, 20
probability 55–56, 161–163, 165–166, 172
Proc xiii, 4–5, 9–10, 16–20, 22–26, 29–30, 33–35, 37–39, 42–45, 48, 51–58, 60, 67–69, 75–77, 79, 86–91, 94–109, 111–115, 117–120, 122–124, 126, 131, 133–142, 151, 156–157, 167, 169, 174, 184
procedures xiii, 1–2, 4, 7, 9, 16–17, 24, 51–54, 57, 74–77, 119, 124, 163, 171, 180
processes 7, 31, 129, 163, 167, 179–180
products 1–3, 181
programs 1, 4, 122, 125, 151
projects 2–3, 19, 31, 174
PUT 35, 124
p‐value 163
PySpark 85
Python 85, 183

q

Quantile regression 168
quantitative variables 159
Quartiles 161
quicklt 62
quotation marks 42
quotes 12–13, 25, 46

r

R 1–5, 7–17, 19–31, 33–48, 51–71, 78–82, 85–117, 119–121, 143–148, 151, 153–156, 166–169, 174, 179
random subset 58
random forest 170
ranges 51, 54, 64, 67, 133, 148, 159, 161
ranuni function 58, 70
rate 172–174
raw data file 8–10, 14, 17
RConnect 156
RDBMS (Relational Database Management Systems) 3, 10, 85–86
READING DATE 37–39
readr package 5, 10–11, 153, 155
readxl package 11
recall 172–173
rectangle 130
Recurrent Neural Networks (RNN) 179
regression 4, 129, 166–169, 171, 174, 184
regularization 168
regularizer 168
Relational Database Management Systems (RDBMS) 3, 10, 85–86
replace 22–23, 26, 28–29, 31, 45, 97, 102, 151
Ridge regression 168
rmarkdown 156–157
RNN (Recurrent Neural Networks) 179
ROC (receiver operating characteristic) 173–174, 181
RODBC package 3, 10
row 10, 12, 14, 19–20, 22–23, 25, 51, 56, 62, 67, 70–71, 76, 87–94, 96, 100, 102–103, 105, 108, 144
RPubs 154, 156
RStudio 3, 153–154, 156
Rth 22

s

sample_frac 70
sample_n() 70–71
SAS 1–10, 12–14, 16–20, 22–26, 28–34, 36–38, 40, 42, 44, 46, 48, 51–54, 56–58, 60, 62, 67–70, 75–76, 78, 80, 85–86, 88–94, 96–122, 124–127, 129–132, 134–142, 144, 146, 148–149, 151–152, 154, 156–157, 159–160, 162, 164, 166–170, 172, 174, 176, 178–180
SAS date value 37
SASStudio 141
Scatter Plot 130
select 2, 57–58, 63–65, 86–106, 108, 110–111, 140, 168
sentiment analysis 181
SES (simple exponential smoothing) 179
sessionInfo 63
SET 100
Silhouette method 177
singular‐value decomposition (SVD) 179–181
skewness 51, 54, 162
SMA 178
Spark xiii, 85
SPC 172
special characters 46
specificity 172–174
Split‐Apply‐Combine 51
spread 81
SPSS 10, 12
SQL 3–4, 57–58, 85–91, 93–113, 115
SQL Aliases 98
SQLContext 85
sqldf package 3, 86, 88–90, 92–94, 96–106, 109
squarebrackets 22, 161, 163, 167–168, 174
SSA (singular spectrum analysis) 179, 181
standard deviation 51–52, 160–161, 170, 176
STAT (statistical analysis) 2, 134
STATA 10, 12
stationarity 178
statistics 1, 52–54, 57, 70, 81–82, 159–163, 165, 167, 169, 171, 173, 175, 177, 179, 181
statistical methods 1–2, 4, 54, 129–130, 160, 162–164, 167, 178–181, 183–184
Std 30, 52, 54
stdize 26, 29
str 5, 20–21, 28, 30–31, 43, 46–47, 64–66, 78–79, 100
strDates 39
strings 8, 13–14, 22, 33, 35–37, 39, 41–43, 45–48, 103, 107, 125
stringr 30
strptime 40–41
subclusters 176
subset 4, 58, 62, 80, 176
substr 125
substring 33, 43
substrn 43
summarise() 70
support vector machines (SVM) 169–170
SVD (singular‐value decomposition) 180
SVM (support vector machines) 169–170
symbolgen 122–124, 126
symput 125–127
syntax 4–5, 12, 22, 48, 56–57, 85–86, 88, 117, 119
Sys.Date 40, 124
SYSDAY 124

t

table package 4–5, 10–11, 28, 51, 56, 60, 62, 75–83, 85, 94, 98–101, 104, 106–108, 110, 112–113, 130, 140, 147, 153–154, 156
t‐Distributed Stochastic Neighbor Embedding (t‐SNE) 180
TDM (term document matrix) 180
TF‐IDF 180
theorem 85, 162, 166, 169
theory 162, 165
Tikhonov regularization 168
time series analysis 177–179
TNR (true negative rate) 172
topic modeling 181
tp 172
TPR (true positive rate) 172–173
translate function 45
transmute() 69
Trimn function 43
trimws 47
triplet 130
t‐SNE (t‐Distributed Stochastic Neighbor Embedding) 180
tz option 40–41

u

UNION ALL 108
UNIQUE 108
univariate 54–55, 60, 179
university 1–2, 183
upcase 125
UPDATE 8, 94, 100–101
USD 122
Ushey, K. 183
utils package 11

v

VAR 5, 19–20, 23–26, 30, 34, 51, 53–56, 81–82, 87, 91, 94–96, 99, 102, 104, 122, 124, 133, 179
variability 54, 160
variables 5, 9, 15, 17, 19–20, 22–26, 29–37, 39, 42, 44, 48, 51–54, 57–58, 64–66, 69, 75–79, 81–83, 88, 94, 99–100, 104, 106, 121–127, 130, 148–149, 159–162, 165–169, 171, 177–180
variance 4, 51, 54, 129, 161, 164, 174, 178
VARIMAX 179
Varoquaux, G. 183
vector 13–14, 36, 42, 46, 48, 169–170, 179, 181
visualization xiii, 19, 129–131, 133, 135, 137, 139, 141, 143, 145, 147
VM (Virtual Machine) 2
VMware 2

w

web scraping 181
Weisstein, E. W. 183, 184
WHERE 58, 89–90, 96, 100, 103–104
whitespace 45, 47
Wickham, H. 183, 184
Williams, G. J. 184
Word cloud 180

x

XGBoost (Extreme Gradient Boosting) 171, 174
Xie, Y. 183
XLS 10–12, 126, 151
xlsx 11
xtabs 79, 81

y

ymd 41

z

zero 33, 168, 179, 181
