abbreviations and acronyms for variables 21, 40
accuracy 39
alpha level (or significance level) 74–5, 86
alternative hypothesis (H1) 71–2, 74, 75, 76, 83, 86
concept of and key components 6
key risks 11
analytics biases
data communication and usage 128, 132–4, 136
key concepts 126
analytics software presentations xxii, 189–96
analytics software tools 8, 23, 60, 116, 233, 236
analytics trends 229, 233, 234–5
ANOVA (analysis of variance) 68–9, 84
application-related questions 142, 144, 145–6
artificial intelligence (AI), transparent 235
audience(s)
attitude about topic 174
building common ground with 171, 176, 183, 194
call to action 174, 176, 180, 183, 184, 185
involvement 156, 162, 166, 167
triggering emotions in 171, 174, 184
autonomy-supporting language 206, 207, 208
average dispersion see variance
see also mean; median; mode
bad data-based news xxii, 199–214
acceptance stage 203
communication strategies 206–8
resistance as challenge of 206, 207
common types
goal failure 201
insufficient competitiveness 201, 203
negative trends 201
communication strategies 204–6
confusion as challenge of 204, 205
dos and don’ts of communicating 212, 213
as game changer 203
motivation (action stage) 204
communication strategies 209, 211
frustration as challenge of 209, 211
biases see analytics biases
big data xvi, 9, 109, 115, 235
big knowledge 9
bimodal distribution 28
binary variables 19, 20, 29, 33, 55
biserial correlation coefficient 55
black box allergy 8
categorical variables 18–19, 20, 28, 29, 33, 65–8
causation, correlation vs 55, 83, 133
causation bias 133
cause-and-effect relationships 76, 80
central location see central tendency
central tendency, measures of 27–33, 34
chart shock 8
charts
Decluttering 156, 158, 166, 167
Emphasising 156, 158–9, 166, 167
Storifying 156, 160, 161, 166, 167
Involving the audience 156, 162, 166, 167
Giving meaning 156, 163–4, 167
No distortion rule 156, 164, 166, 167
see also bar charts; dendrograms; graphs; histograms; line charts; maps; pie charts; scatter plots
cloud analytics 234
cluster analysis
hierarchical 109, 113–14, 115, 117, 120
key concepts 106
risks 120
cognitive computing 234
Cohen’s d 76
collaborative analytics 234
comparison
chart format to enable 159
competitiveness, insufficient 201, 203
complex relationships xx–xxi, 91–102
conceptual models 100
confounding variables 81–2, 83, 86, 131
continuous variables 18, 19–20, 28, 33, 54, 55, 64
interval 19, 20, 29, 30, 33, 54, 64
ratio 19, 20, 29, 30, 33, 54, 64
sizes of relationships between 76
convenience sampling 79
conversations xvi–xvii
correlation
direction and strength of see correlation coefficient
negative linear 50, 52, 53, 85
no correlation 50
positive linear 49–50, 52, 53, 84
correlation coefficient 52–5, 60, 83
biserial 55
Cramér’s V 55
Kendall’s tau 54
point-biserial 55
Spearman’s rho 54
counter-arguing 208
critical mindset 229, 231, 233
curse of knowledge bias 132–3, 134, 193, 194
curvilinear relationships 50–1
making data meaningful in 163
storified 160, 161, 178–80, 182
data analysis 142
asking questions related to 142, 145
rigour and relevance of 144
Data Central 233
data communication and usage biases 128, 132–4, 136
data disagreements xxii, 217–26
data distortions xxi
avoiding, and chart design 156, 164, 166, 167
see also analytics biases
data fatigue 8
data fluency xvi, xviii–xx, xxii
elements for sustained
critical attitude 229, 231, 233
monitoring trends 229, 233, 234–5
understanding and skills 229, 230
five-step action plan 236
data gathering biases 128, 129–31
data quality paranoia 8
data scepticism 8
Data Science Central 231
data storytelling xxi–xxii, 8, 9, 160, 171–86, 230
audience
attitude about topic 174
building common ground with 171, 176, 183
call to action and rewards 174, 176, 180, 183, 184, 185
triggering emotions in 171, 174, 184
situation–complication–resolution format 176, 180, 183–4, 185
specific–explore–generalise format 176, 177
use of voice in 171, 177–8, 184–5
see also storifying charts
data visualisation 153–68, 172, 230
and data disagreement 220–1, 222
frequency distributions 23, 25
see also charts
decluttering 156, 158, 166, 167
dependent variables 21, 22, 40, 81
chart format to emphasise 159
direction and strength of relationship see Pearson correlation coefficient
dispersion see spread
distance matrix 114
distance metrics see Euclidean distance
distortions see data distortions
distributed analytics 233, 234
distribution see frequency distributions; normal distributions
Drucker, Peter 147
Dunning–Kruger effect 133
emphasising, and chart design 156, 158–9, 166, 167
error of the model (residual) 57
eta squared 76
Euclidean distance 110, 111–13
Excel 236
experimental hypothesis see alternative hypothesis
experimental research designs 80, 83
exponential relationships 51
frequency distributions 15, 23–33
concept of and key components 24
measures of central tendency 27–33, 34
pointiness (kurtosis) 25–7, 40
presentation format 23, 24, 25
full mediation 99
gain-framed perspective 206
Gartner.com 233
Gaussian distributions see normal distributions
generalisations 48, 69–76, 82, 83
goal failure 201
graphic representation see data visualisation
graphs, frequency distributions 23, 24, 25
group membership, and outcomes, comparing differences between 64–8, 76
grouping see cluster analysis
histograms 23
hybrid intelligence 235
hypothesis 71
alternative (experimental) (H1) 71–2, 74, 75, 76, 83, 86
null (H0) 71–2, 73, 74–5, 76, 83, 86
important effects, and significant effects, difference between 76–82
incompetence compensation competence 8
independent variables 21–2, 40, 81
Infoworld.com 233
interaction effect 93
interactive data 162
the intercept 57, 58, 59–60, 83, 85
Intersection over Union see Jaccard coefficient or index
interval variables 19, 20, 29, 30, 33, 54, 64
involvement, audience 156, 162, 166, 167
Jaccard coefficient or index 110–11
jargon, avoidance of 63–4, 84–6, 176, 204
Kendall’s tau 54
labelling charts 158, 163, 166
line charts 164
linear relationships
concept of and key components 47
direction and strength of 51–5
see also correlation; correlation coefficient
loss-framed perspective 206
maps 159
market segmentation 107
meaning, giving, and chart design 156, 163–4, 166, 167
median 27, 28–9, 30, 31–2, 33, 40
mediation xxi, 91, 93, 94, 98–9, 100, 101
full 99
partial 99
mediator variable 98, 99, 100, 101, 102
meetups 233
Microsoft Excel 236
Microsoft Power BI 8, 116, 236
mobile analytics 235
moderation xxi, 91, 93–8, 100, 101
moderator variable 93, 95, 96, 97, 100, 101, 102
multidimensional scaling plot 115
multimodal distribution 28
negative linear correlation 50, 52, 53, 85
negative trends 201
nominal variables 19, 20, 33, 55
non-probability sampling 79
normal distributions
deviations from see kurtosis; skewed distributions
normality bias 132
null hypothesis (H0) 71–2, 73, 74–5, 76, 83, 86
odds ratio 76
ordinal variables 19, 20, 29, 33, 54
outcome variables 21–2, 40, 48, 57, 58–9, 76, 82
see also predictor variables, and outcomes
outliers 16, 30, 31–2, 34, 40, 163
overfitting 132
partial mediation 99
Pearson correlation coefficient 52–4, 76
point-biserial correlation coefficient 55
positive linear correlation 49–50, 52, 53, 84
precision 39
predictions 48, 55–69, 82, 83, 85
predictor variables 21–2, 40, 48, 76, 82
mediation analysis 91, 93, 94, 98–9, 100, 101
moderation analysis 91, 93–8, 100, 101
principal component analysis (PCA) 118
probability sampling 79
Python (programming software) xvi, 236
qualitative data 17
quality of data see data quality
quantitative data 17
quasi-experimental research 80–1
questions about data (Q&A sessions) xxi, 141–50
analysis-related questions 142, 145
application-related questions 142, 144, 145–6
data source and quality questions 142, 144–5
risks 149
tone 146
R (programming software) xvi, 236
random assignment/randomisation 80, 81, 86
ranking, chart format to enable 159
ratio variables 19, 20, 29, 30, 33, 54, 64
recency effect 211
regression analysis 56–69, 84, 85
regression line (line of best fit) 56–8, 59, 85
regression slope (or gradient) 57–9, 85
representative sample 78–9, 83, 86
residual (error of the model) 57
resistance to bad news 206, 207
resources, for sustained data fluency 229, 231, 233
risk-based segmentation 107
S-shaped relationships 51
sampling methods 79
scatter plots (or scatter diagrams) 48–50, 51, 159
segmentation see cluster analysis
selection bias 130
self-service analytics (or business intelligence) 233, 235
significance level (or alpha level) 74–5, 86
significant effects, and important effects, difference between 76–82
silhouette scores 117
measures 120
Jaccard coefficient or index 110–11
similarity matrix 114
Simon, Herbert 105
skewed distributions 25–7, 30–1, 40
slope see regression slope
Smedley, Ralph C. xviii
snowball sampling 79
software see analytics software
Spearman’s rho 54
Spinoza, Baruch xviii
SPSS 236
stat phobia 7
avoiding jargon when talking about 63–4, 84–6
storifying charts 156, 160, 161, 166, 167
storytelling see data storytelling
straight-line relationships see linear relationships
sum of squared errors (SS) 37, 38
survivor bias 130
systematic vs unsystematic variation 71–4
tables, frequency distributions 23, 25
tails of a distribution 26
transparent AI 235
trends
chart format to show 159
negative 201
variables
abbreviations and acronyms 21, 40
and measures of central tendency 28, 29, 30, 33
mediator 98, 99, 100, 101, 102
moderator 93, 95, 96, 97, 100, 101, 102
see also categorical variables; continuous variables; outcome variables; predictor variables
variables, relationship between 21–2
concept of and key components 47
non-linear
exponential 51
S-shaped 51
U-shaped (or curvilinear) 50–1
see also complex relationships; correlation; correlation coefficient
variance
ANOVA (analysis of variance) 68–9, 84
variation, systematic vs unsystematic 71–4
visualisation see data visualisation
voice, and data storytelling 171, 177–8, 184–5
voluntary response sampling 79