a
- acast 81
- ACC (accuracy) 170, 172–173, 179
- ACF (autocorrelation function) 178
- ACID (Atomicity, Consistency, Isolation, Durability) 86
- active state
- ADD 99–101, 107, 112–113, 117
- additive model 178
- advanced analytics
- Agglomerative Clustering 175
- aggregate functions 97, 104–105
- airquality dataset 21–22, 27–29, 102, 153
- algebra
- algorithms 22, 166, 168, 169, 174–176
- aligned
- Allaire, J. J. 183
- allocation 181
- alpha value 163
- alphabetical order 89
- ALTER TABLE 94, 99–101, 107, 112–113
- alternative hypothesis 163
- analysis –5, –8, 14, 19–20, 22, 29–30, 51–58, 60, 75, 151, 160, 167, 177, 179–182
- AND 90–93
- angle 168, 181
- annual
- anscombe 129, 130
- Apache Hive 85
- application 179
- AR (autoregressive model) 179
- area under the curve (AUC) 165, 174
- ARIMA 178–179
- ARIMAX 178
- arithmetic 36, 159, 179
- arrays 23, 119, 121–123, 125
- AS 98–99
- ascending order 89, 105
- as.Date function 39
- assigning values 12, 33, 42, 46
- assignment operators 47
- association 180
- asymmetry 162
- Atkins, A. 183
- Atomicity, Consistency, Isolation, Durability (ACID) 86
- AUC (area under the curve) 165, 174
- autocorrelation function (ACF) 178
- autoregressive model 178–179
- availability 85, 86
- avg() 97
- axes 130
b
- Balanced Iterative Reducing and Clustering using Hierarchies (Birch) 176
- bar chart 130
- Bar‐Line Plot 130, 133, 134, 143
- Bar Plot 130–132, 143
- Barr, Anthony James
- BASE (Basically Available, Soft state, Eventual consistency) 86
- Base SAS 86
- Basically Available, Soft state, Eventual consistency (BASE) 86
- Bayes theorem 166, 169
- Bayesian network 179
- Bell Laboratories
- bell curve 161
- Bernoulli 161
- BETWEEN 103–104
- Big Data xiii
- binary classifier system 169, 171–173
- BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) 176
- Book table 110
- Box Plot 130, 135, 144
- brackets 22
- breadth 19
- Brown, Dan 107, 112, 113, 116
- Brownlee, J. 183
- browser 154, 156
- Bubble Plot 130, 136, 144
- bundle
- business analytics xiii
c
- Calculator 35
- CALL 119
- Call Symput 125–126
- CAP theorem 85
- carburettors 79
- cards 15, 45
- categorical variables 33, 48, 57–58, 75, 159, 160
- CDF (cumulative distribution function) 165
- central limit theorem 162
- central tendency 51, 160, 161
- centroid 175, 176
- CF Nodes (Characteristic Feature nodes) 176
- CFT (Characteristic Feature Tree) 176
- Chambers, John
- Chang, W. 183
- character variables 15, 17, 22–23, 25, 33, 46
- Characteristic Feature nodes (CF Nodes) 176
- Characteristic Feature Tree (CFT) 176
- Cheng, J. 183
- Chi Square 161, 163
- Cij 171
- circle chart 130
- CLASS 51, 53, 104
- click –2
- clustering 174–176, 180
- Collins 107, 112
- comma elimination 34
- Comma Separated Value (CSV) Files 10, 11
- completeness 174
- complex matrix 180
- Composite Method 179
- comprehensive 54
- Comprehensive R Archive Network (CRAN) 174
- compress 45
- concatenation 44
- confusion matrix 171–173
- consistency 85–86
- constant 178
- continuous variable 159–160
- core 176
- Corpus 180
- Cosine similarity 181
- count() 97
- CRAN (Comprehensive R Archive Network) 174
- create table 98
- cross tabulations 75–82
- cross validation 167
- CrossTable 80
- Croston 179
- CSV (Comma Separated Value) Files 10, 11
- Cth column 22
- cumulatives 75–76, 165
- cumulative distribution function (CDF) 165
- curse 180
- curse of dimensionality 180
- cyclicality 177
- cylinders 79, 140
d
- Dahl, Roald 107, 112, 113, 116
- Dan, Brown 107, 112, 113, 116
- dashes 39
- data –5, –35, 37–39, 42–46, 48, 51–58, 60, 62–70, 75–79, 81–82, 85–86, 88–91, 94–98, 100, 102, 104, 106–108, 110, 112–127, 129–131, 133–143, 145, 147, 151, 153–157, 159–169, 171, 173–181, 183–184
- database 10, 85–86, 88, 167
- data cleaning 29–31
- data, importing of –13
- data input 13–15
- data inspection 19–22
- data, missing values 22–29
- Data_null_ 125
- data output 151–156
- data, printing of 16–17
- Data science 19, 31, 129, 151, 159, 166
- Data Scientists 159–181
- Data Step
- data visualization 129–148
- DataFrames 85
- datalines 15
- dataset –10, 14–17, 23, 29, 51–52, 58, 66, 78, 85–86, 94–95, 99, 102, 114–115, 122–123, 125–126, 129–130, 151, 161, 167, 170, 175
- datasetname 151
- Data.Table package 60, 62
- Date data 37–43, 45, 47–48, 124
- datepart 38–39
- datetime data 37
- DBMS 10, 22, 87, 95, 97, 151
- DBSCAN 176
- dcast 81, 83
- ddmmyy 38
- debugging 122
- Deceptio 113, 116
- decimal points 52
- Decision Trees (DTs) 170, 171, 174
- decomposition 177, 180
- decreasing 178
- deep learning xiii
- DELETE 22, 25, 28, 95–97, 99
- delimited
- delimiter 47
- demand 179
- dendrogram 175
- density 176
- dependent variable 166
- descending order 89
- describe function 60–61, 160–162
- descriptive statistics 52, 160–161, 166
- DescTools 47
- difftime 38, 41–42
- digits 39, 175
- dim 20–21, 23
- dimensionality reduction 180
- dimensions 20, 130
- discrete variable 159–161
- disk 130
- dispersion 51, 160–161
- dmy 41–42
- documents 42, 151, 153–156, 180–181
- Document Term Matrix (DTM) 180
- documentation 31, 53
- dollar elimination 34
- double quotes 12, 42, 46
- dplyr package 63–64, 67–69
- DROP 23, 69
- dsd 15
- DT 38, 62
- dta() function 12
- DTM (Document Term Matrix) 180
- DTs (Decision Trees) 170, 171, 174
- duplicate 57, 176
- dynamic documents 156
e
- Econometrics and Time Series Analysis (ETS) –3
- Eigendecomposition 180
- Elastic Net regression 168
- elbow 177
- emails 42
- encoded 22
- end xiii, 20, 23, 119–120, 122–123, 133, 141
- error 22, 29, 31, 54, 85–86, 122, 160, 163–164, 168, 171–172, 178–179
- ETS (Econometrics and Time Series Analysis) 178–179
- Euclidean distances 170, 180
- Excel 10–12, 126, 151
- exogenous variable 179
- exploratory data analysis 19, 52, 160
- Exponential Smoothing 178, 179
- extraction 47
- Extreme Gradient Boosting (XGBoost) 171
f
- factors 20, 48, 64, 66, 75, 79, 166, 172
- factorization 180
- false null hypothesis 164
- Fastclus 174
- F‐beta score 172
- FCMP 119
- FNR (False Negative Rate) 172
- for loop 23
- forecasting 177–179, 183
- FORMAT 37
- Fortran code
- FPR (False Positive Rate) 172–173
- fread() 11, 62
- FREQ procedure 75–77
- frequency distributions 75–82, 160–161
- functions –7, , 11–13, 16–17, 23, 30–31, 37, 39–41, 48, 51, 60, 62–63, 67–69, 80, 86, 94, 96–97, 104, 112, 117–119, 121, 123, 125–127, 161, 165–168, 174, 176, 178, 180
- FUNCTIONNAME 119
- fwrite 153–154
g
- Garbage In, Garbage Out (GIGO)
- Gaussian mixture models 176
- Gentleman, R. 184
- getnames 10, 22, 87, 95, 97
- GIGO (Garbage In, Garbage Out)
- glm 169
- GNU project
- Goodnight, James
- Gramfort, A. 183
- GRAPH (Graphics and presentation)
- group_by() 70
- group by analysis 57–60, 104–106
- gtables package 80
- guarantee 85–86
h
- handling dates 37–42
- handling numeric data 33–36
- handling strings data 42–48
- HAVING 105–106
- Hclust (hierarchical clustering) 175
- HeatMap 130, 137, 145
- Hidden Markov models (HMM) 179
- hierarchical clustering (Hclust) 175
- histogram 130, 138, 146
- HMisc package 60
- HMM (Hidden Markov models) 179
- Hornik, K. 184
- homogeneity 174
- html –3, 126–127, 151–152, 154–157
- Hypothesis Testing 160, 163–164
i
- IDE RStudio
- ‘if’ statement 23, 28–30
- Ignore 22
- Ihaka, Ross
- IML (interactive matrix language)
- importing data 77–1
- independence 165, 169
- inferential statistics 160
- INFORMAT 34, 37
- inner join 110–112, 116, 181
- INPUT statement 14, 37, 43–44
- input data 77–1
- INR 122
- INSERT INTO statement 94–96, 106–108, 112–113
- install.packages
- intck option 38–39
- integers 33, 40, 106–107, 112–113
- integral types 33
- Internet 42, 181
- INTO 94, 108
- IS NULL or IS NOT NULL 102
j
- Jack 106–107, 112–116
- JOIN 110–112
- json files 10
- jsonlite package 10
k
- keys 108
- K‐Fold Cross Fitting 167
- KMeans 174, 175
- knit document 154–156
- kNN (K Nearest Neighbors) 170
- Kolmogorov‐Smirnov non‐parametric 164
- Kullback–Leibler divergence 180
- kurtosis 51, 54, 162
l
- languages xiii, –5, , 37, 85, 122, 129, 183
- lapply 60
- LAR (Least Angle Regression) 168
- LARS 168
- Lasso regression 168
- LDA (Latent Dirichlet Allocation) 181
- LeaveOneOut (LOO) 167
- left join 110–112, 115–116
- libname statement 114–116
- library 10–12, 41, 47, 52, 62, 64, 80–81, 88, 115, 153–156
- LIKE 103
- lines , –9, 14–15, 83, 129–130, 133–134, 139, 143, 146
- Line Chart 130, 139, 146
- Line Graph 130
- Line Plot 146
- linear regression 167
- llist 104
- lm 167
- log 122, 124, 126, 169
- logarithm 169
- logistic regression 169, 174
- logit 169
- Long Short‐Term Memory (LSTM) units 179
- LOO (LeaveOneOut) 167
- loops 119–121
- loss function 168
- LSTM (Long Short‐Term Memory) 179
- lubridate package 41–42
- Luraschi 183
m
- machines xiii, 19, 22, 169–170, 183
- Macros 119, 121, 123, 125
- MAPD (mean absolute percentage deviation) 179
- MAPE (mean absolute percentage error) 179
- Markdown 156
- max 24, 27, 29, 51–52, 54, 58–60, 80, 97–98, 133–134
- maxdec 52
- maximum 30, 52, 97–98, 160
- McPherson, J. 183
- mdy 41
- mean 21–22, 24–31, 46, 51–54, 56–57, 59–62, 70–71, 81–82, 104, 124, 129, 133–134, 151, 160–161, 163, 168, 172, 174, 176, 178–179
- mean absolute percentage deviation (MAPD) 179
- mean absolute percentage error (MAPE) 179
- MeanShift 176
- median 21–22, 27–29, 51–52, 54, 59–61, 130, 148, 161, 164–165, 168
- mice package 22
- min 21, 27, 29, 51–52, 55, 59–60, 97–99, 133–134
- Miner 166
- MiniBatchKMeans 176
- minimize 167, 171, 180
- minimum 30, 52, 97–99
- missing values 13, 20, 22–29, 97, 102, 106
- missover 15
- ML (Machine Learning) xiii
- mode 161
- MODIFY 99
- Mosaic Plot 130, 140, 147
- mtscaled 145
- mu 161
n
- NA 13, 21–25, 27–29, 31, 46–47, 81, 100, 102
- Naive Bayes 169
- negative 33, 37, 64, 162, 164, 171–173, 181
- network 85, 171, 179
- Neural Nets 171, 179
- Ngram 181
- nlevels 75–77
- nocol 76–77
- nocum 76–77
- nodes 85, 176
- nominal variable 159–160, 166
- nonintegral types 33
- nopercent 76–77
- normal distribution 161–162, 165
- norow 76–77, 83
- NoSQL 86
- NOT 90–93
- NULL 14, 102, 106
- null hypothesis 163–164
- numbers –10, 12, 15, 33, 35, 37, 39, 41, 43, 45–47, 52, 67, 70, 75–79, 85, 87–91, 94, 97, 105, 108, 117, 122, 125, 129, 159–161, 165–166, 170–174, 176–177
- numeric data 14, 33–36, 40, 44–45, 47–48, 78
- numerical summary 51–71
o
- object xiii, 12–13, 20, 40, 46, 62, 94, 151, 153, 174–175
- odds 169
- ODS 126, 131, 134–142, 151
- Ohri, Ajay 13, 19, 33, 43, 51, 75, 85, 119, 129, 151, 159, 183
- operations research (OR)
- operators 28–30, 47, 64, 90, 104
- OR 90
- Order By 89–93, 99, 105–107
- ordinal variable 159–160
- outliers 169
- outobs 87–88, 102–103
- output data –4, 14, 20, 30, 55–56, 63, 67, 80, 88–89, 96, 102, 119–120, 123, 126, 130, 151–157, 166
- overfitting 167
- ozone 19–29, 102
p
- PACF (partial autocorrelation function) 178
- package –8, 10–12, 16–17, 22, 30–31, 41–42, 48, 60, 62–63, 80–82, 85–86, 88, 115, 117, 153–154, 156, 166, 179
- PACKAGENAME
- pandas 85
- pandasql 85
- parameters 12–13, 17, 62, 119, 126, 160, 167, 171, 176, 178
- parametric 164, 171
- Parmigiani, G. G. 184
- partial autocorrelation function (PACF) 178
- partition tolerance 85
- patterns 103, 130, 153–155, 179
- PCA (principal component analysis) 175, 180–182
- PDF 153–155, 165
- peaking 162
- Pedregosa, F. 183
- permanent dataset 10, 114
- pictures 153
- pie chart 130, 141, 147
- pipeline 19
- plain text format 156
- plots 55–56, 129–136, 139–144, 146–149, 152–154, 173, 177–178
- Poisson distribution 161
- polynomial regression 168
- positive 33, 37, 162, 164, 171–174, 180–181
- POSIXct function 40–41
- Powerpoint 151
- precision 160, 172–173
- predictive analytics 166
- predictors 166–169
- PRIMARY KEY 108
- principal component analysis (PCA) 175, 180–182
- PRINT 46, 57
- print formatted string 46
- printing data 16–17, 20
- probability 55–56, 161–163, 165–166, 172
- Proc, xiii –5, –10, 16–20, 22–26, 29–30, 33–35, 37–39, 42–45, 48, 51–58, 60, 67–69, 75–77, 79, 86–91, 94–109, 111–115, 117–120, 122–124, 126, 131, 133–142, 151, 156–157, 167, 169, 174, 184
- procedures, xiii –2, , , , 16–17, 24, 51–54, 57, 74–77, 119, 124, 163, 171, 180
- processes 31, 129, 163, 167, 179–180
- products –3, 181
- programs 122, 125, 151
- projects –3, 19, 31, 174
- PUT 35, 124
- p‐value 163
- PySpark 85
- Python 85, 183
q
- Quantile regression 168
- quantitative variables 159
- Quartiles 161
- quicklt 62
- quotation marks 42
- quotes 12–13, 25, 46
r
- R –5, 77–1, 19–31, 33–48, 51–71, 78–82, 85–117, 119–121, 143–148, 151, 153–156, 166–169, 174, 179
- random subset 58
- random forest 170
- ranges 51, 54, 64, 67, 133, 148, 159, 161
- ranuni function 58, 70
- rate 172–174
- raw data file –10, 14, 17
- RConnect 156
- RDBMS (Relational Database Management Systems) 10, 85–86
- READING DATE 37–39
- readr package 10–11, 153, 155
- readxl package 11
- recall 172–173
- rectangle 130
- Recurrent Neural Networks (RNN) 179
- regression 129, 166–169, 171, 174, 184
- regularization 168
- regularizer 168
- Relational Database Management Systems (RDBMS) 10, 85–86
- replace 22–23, 26, 28–29, 31, 45, 97, 102, 151
- Ridge regression 168
- rmarkdown 156–157
- RNN (Recurrent Neural Networks) 179
- ROC (receiver operating characteristic) 173–174, 181
- RODBC package 10
- row 10, 12, 14, 19–20, 22–23, 25, 51, 56, 62, 67, 70–71, 76, 87–94, 96, 100, 102–103, 105, 108, 144
- RPubs 154, 156
- RStudio 153–154, 156
- Rth 22
s
- sample_frac 70
- sample_n() 70–71
- SAS 1–10–0, 12–14, 16–20, 22–26, 28–34, 36–38, 40, 42, 44, 46, 48, 51–54, 56–58, 60, 62, 67–70, 75–76, 78, 80, 85–86, 88–94, 96–122, 124–127, 129–132, 134–142, 144, 146, 148–149, 151–152, 154, 156–157, 159–160, 162, 164, 166–170, 172, 174, 176, 178–180
- SAS date value 37
- SASStudio 141
- Scatter Plot 130
- select 57–58, 63–65, 86–106, 108, 110–111, 140, 168
- sentiment analysis 181
- SES (simple exponential smoothing) 179
- sessionInfo 63
- SET 100
- Silhouette method 177
- singular‐value decomposition (SVD) 179–181
- skewness 51, 54, 162
- SMA 178
- Spark xiii, 85
- SPC 172
- special characters 46
- specificity 172–174
- Split‐Apply‐Combine 51
- spread 81
- SPSS 10, 12
- SQL –4, 57–58, 85–91, 93–113, 115
- SQL Aliases 98
- SQLContext 85
- sqldf package 86, 88–90, 92–94, 96–106, 109
- squarebrackets 22, 161, 163, 167–168, 174
- SSA (singular spectrum analysis) 179, 181
- standard deviation 51–52, 160–161, 170, 176
- STAT (statistical analysis) 134
- STATA 10, 12
- stationarity 178
- statistics 52–54, 57, 70, 81–82, 159–163, 165, 167, 169, 171, 173, 175, 177, 179, 181
- statistical methods –2, , 54, 129–130, 160, 162–164, 167, 178–181, 183–184
- Std 30, 52, 54
- stdize 26, 29
- str 20–21, 28, 30–31, 43, 46–47, 64–66, 78–79, 100
- strDates 39
- strings 13–14, 22, 33, 35–37, 39, 41–43, 45–48, 103, 107, 125
- stringr 30
- strptime 40–41
- subclusters 176
- subset 58, 62, 80, 176
- substr 125
- substring 33, 43
- substrn 43
- summarise() 70
- support vector machines (SVM) 169–170
- SVD (singular‐value decomposition) 180
- SVM (support vector machines) 169–170
- symbolgen 122–124, 126
- symput 125–127
- syntax –5, 12, 22, 48, 56–57, 85–86, 88, 117, 119
- Sys.Date 40, 124
- SYSDAY 124
t
- table package –5, 10–11, 28, 51, 56, 60, 62, 75–83, 85, 94, 98–101, 104, 106–108, 110, 112–113, 130, 140, 147, 153–154, 156
- t‐Distributed Stochastic Neighbor Embedding (t‐SNE) 180
- TDM (term document matrix) 180
- TF‐IDF 180
- theorem 85, 162, 166, 169
- theory 162, 165
- Tikhonov regularization 168
- time series analysis 177–179
- TNR (true negative rate) 172
- topic modeling 181
- tp 172
- TPR (true positive rate) 172–173
- translate function 45
- transmute() 69
- Trimn function 43
- trimws 47
- triplet 130
- t‐SNE (t‐Distributed Stochastic Neighbor Embedding) 180
- tz option 40–41
v
- VAR 19–20, 23–26, 30, 34, 51, 53–56, 81–82, 87, 91, 94–96, 99, 102, 104, 122, 124, 133, 179
- variability 54, 160
- variables 15, 17, 19–20, 22–26, 29–37, 39, 42, 44, 48, 51–54, 57–58, 64–66, 69, 75–79, 81–83, 88, 94, 99–100, 104, 106, 121–127, 130, 148–149, 159–162, 165–169, 171, 177–180
- variance 51, 54, 129, 161, 164, 174, 178
- VARIMAX 179
- Varoquaux, G. 183
- vector 13–14, 36, 42, 46, 48, 169–170, 179, 181
- visualization xiii, 19, 129–131, 133, 135, 137, 139, 141, 143, 145, 147
- VM (Virtual Machine)
- VMware
w
- web scraping 181
- Weisstein, E. W. 183, 184
- WHERE 58, 89–90, 96, 100, 103–104
- whitespace 45, 47
- Wickham, H. 183, 184
- Williams, G. J. 184
- Word cloud 180