Page references followed by f indicate an illustrated figure.
- 68-95-99.7 Rule, 118
- 80-20 rule, 129
- A
- Aaron, Hank (reported home run tally), 13, 23–24
- Activity levels, number/figure relationships, 169
- Activity metric, usage, 170
- Adjusted data. See Vehicle year
- Adjusted histogram, example, 29f
- Adrain, Robert, 119
- Aesthetics
- data visualization variable, 209
- design, impact, 214
- elements, addition, 223f
- Aggregations, 98
- levels. See Data.
- levels, summing,
- problems, 75–83, 242
- Aircraft/wildlife
- collisions, data aggregation levels, 77f
- strikes, count, 76f
- strikes, monthly count, 81f
- yearly bar segments, addition, 82f
- yearly segments, addition (data exclusion), 82f
- American Football player, jersey numbers (histogram), 115f
- Analytical aberrations, –6, 148, 242
- Analytics, usage, 149
- Anatomy of Reality (Salk), 154
- ANOVA, 121, 133
- Arithmetic average, 110
- Attention, data visualization variable, 209
- Average speed
- metric, problem, 170–172
- Player Impact Estimate, contrast, 171f
- Awareness, data visualization, 185–200
- B
- Baltimore City, Department of Transportation tow records, 50f, 59
- Bananas, ripeness
- assessment, results, 34f
- perspective, change, 36f
- photo, 37f
- ratings, respondents (changes), 35f
- stages, 33f
- Bar chart
- Orlando reported crimes, 194f
- assault crimes, comparison, 196f
- y-axis, truncation, 177–178
- Bayesian information criterion, 136
- Behavior, data visualization variable, 209
- Biases
- cognitive biases, 142
- example, 143
- involvement, 38
- Blends/joins, problems, 67–73, 241
- Bol, Manute, 125
- Boston Marathon dashboard (attributes), color (usage), 215f
- Brazil, life expectancy (linear extrapolation), 161f
- Bubble charts, choice, 174
- Bumps chart, usage, 224
- C
- Cairo, Alberto, 187
- California, infectious diseases
- data set entries, example, 89f
- geographic roles, 92f
- reported amounts, 89f, 90f
- tuberculosis infections, choropleth map, 91f
- California road trip dashboard
- confusion (avoidance), sequential color encoding (usage), 219f
- sequential color palettes, 217f
- Categorical variable, 219–220
- Centers for Disease Control and Prevention (CDC), Ebola cumulative deaths, 21
- Chartography in Ten Lessons (Warne), 188
- Charts
- challenges, 175–202, 243
- confusion, display, 189–191, 190f
- insight
- conveyance, problem, 191–194
- precision, problem, 195
- message, problem, 195–200
- misleading chart, display, 186–189, 187f, 188f
- opportunities, omissions, 222–227
- types (reported crimes), 193f
- China, life expectancy (linear extrapolation), 161f, 162f
- Chi-squared test, 121, 133
- Choropleth map, tuberculosis infections, 91f
- Chunky data, 28
- Clarity, design (impact), 214
- Clinical trials, data (usage), 132
- Clinton, Bill, 166
- Cluster function, usage, 56
- Clustering algorithm
- misses, 59f
- recommendations imperfections, 57f
- usage, 58
- Cognitive biases, 142
- Color
- attribution usage, 215f
- confusion, 214–221
- encodings
- limitations, 219–221
- usage, problems, 218–219
- field, discrepancies (calculated field), 64
- frequency, 63
- palette, simplification, 194f
- pitfalls, 216–219
- saturation, usage problems, 216–218
- scheme, conflict (elimination), 218
- sequential palettes, problems, 217f
- single-color encoding, usage, 221f
- single color palette, 221f
- usage, problems, 216
- value, 64
- Company sales dashboard, store data set (usage), 220f
- Complexity, impact, 233–236
- Confusion, impact, 233–236
- Cook, James, 78
- Corum, Jonathan, 136
- Country names
- lists, comparison (Venn diagram), 72f
- number, data sets, 69f
- D
- Dashboards, style (change), 233–234
- Data
- adjusted data. See Vehicle year.
- aggregation, 109
- analysis, 44, 109, 149
- calculation, process, –5, 74–75, 242
- chunky data, 28
- cleanliness, recognition, 60–67
- collection program, 132
- comparison, process, , 107, 242
- content, knowledge, 154
- dashboard, default/natural mappings (contrast), 230f
- data-driven prognostication, 157
- dirty data pitfall, 48–67
- discovery process, 156
- dogmatism, 202–207, 243
- dressing, processing, –7, 212, 243
- entry typos,
- fields, usage, , 59, 95, 105
- human-keyed data, rounding (example), 25f
- importance, 240
- labels, addition, 197f
- mathematical processes, application, 75
- perspective, –4, 11–12, 241
- processing, methods, , 47, 241
- sets (country names), 69f
- solutions, selection (approaches), 204f
- storytelling mode, 200
- usage, 168, 238
- values (editing), clustering algorithm misses, 59f
- voice, 243–246
- Data pitfalls,
- avoidance, –8
- types, , –7
- Data reality
- confusion, avoidance (process), 38–39
- gap, 12–24, 80, 241
- Data sets
- dirtiness, 48
- preparation, 60
- separation, 98
- usage, 220f
- Data visualization, 178–184
- approach, factors, 208, 209
- awareness, 185–200
- elements, 178–179, 180
- experience, pleasure, 231–232
- factors, importance (determination), 209f
- payoff function, 208
- process, , 173, 243
- re-creation, 181f
- analysis, extension, 183f
- scenario, precision (requirement), 210
- usage, 178, 200–202
- variables, 209
- optimization, trade-offs, 210–211
- Date
- fields, inconsistency/incompatibility,
- format problems, 48
- Decision problem, tractability, 208
- Decision science, 208
- Default mapping, natural mapping (contrast), 230f
- Descriptive statistics, 109–131
- Design
- color encoding, 219–221
- dangers, –7, 212, 243
- impact, 214
- importance, , 231–233
- system design, problems, 230–231
- Design of Everyday Things, The (Norman), 227
- Diapers, change (timestamp), 27f
- Dichotomy
- intuition/analysis false dichotomy, 149–157
- optimize/satisfice false dichotomy, 207–211, 243
- Dirty data
- mismatching category levels/data entry typos,
- pitfall, 48–67, 241
- Discoverability, 228
- Distribution
- example, 112f
- Gaussian distribution, 117
- lognormal distribution, 121–123
- matching game, answer key, 114f
- multimodal distribution, 125–127
- normal distribution, 117–121, 124–125
- representative value, 127
- right-skewed distribution, 121
- trimodal distribution, NFL player weight, 125–127
- uniform distribution, 114–117
- DMAIC, 120
- Dublin bicycle stands (dashboard), pie chart (usage), 232f
- E
- Earthquakes
- actual/recorded comparison, 17f
- Archive Search (US Geological Survey), 16
- increase, 16–18
- magnitude comparison, line plot, 16f
- Ebola
- cases, WHO classification table, 23f
- cumulative counts, decrease, 21–24
- cumulative deaths, 21
- deaths (West Africa), 22f
- Emotion, design (importance), 231–233
- Encodings
- color, usage (problems), 218–219
- types, errors, 235f
- English text, letters (relative frequencies), 177f
- Epistemic errors, –4, , 11, 241
- Epistemology, , 11–12
- Error bars, inclusion, 141f
- Errors. See Encodings
- Estimated urban population/urban population/total population, percent, 99f
- Exploratory Data Analysis (EDA), 200
- Extrapolations, 157–163
- linear extrapolation, life expectancies, 161f
- North Korea/South Korea, life expectancy, 158f, 160f
- F
- Facial expressions, 151f
- Falsifiability, 43–44
- Figures, relationships, 169
- Fish, 136
- city/outlet mislabeling, 138f
- city sampling plan, 139f
- inference, 137–139
- mislabeling, 137f
- error bars, inclusion, 141f
- misleading bars, 137f
- Fitness network site, data visualizations (re-creation), 181f
- analysis, extension, 183f
- Fremont Bridge, bicycle usage, 18–21, 18f, 44–45, 398
- counter measurements, 46f
- counting, 18
- counts, time series, 19f
- Full name (character number), normal distribution (outlier inclusion), 124–125
- G
- Galileo, astronomical measurement errors, 119
- Gauss, Carl Friedrich, 119
- Gaussian distribution, 117
- Geographic roles, infectious diseases (California), 92f
- Geometric regularity, 26f
- Geospatial coordinate formats, incompatibility, 49
- God pitfall, 43–44, 241
- Google Analytics/Wikipedia population list overlap (Venn diagram), 72f
- Google Analytics/World Bank overlap (Venn diagram), 70f
- Granularity, level (increase), 77
- Graphic Continuum (Schwabish), 175
- Graphs, mistakes/gaffes, , 173, 187
- Gridlines, addition, 197f
- H
- Height (NFL players), normal distribution, 117–121
- Heuristics, 142
- Histogram
- adjusted histogram, example, 29f
- American Football player, jersey numbers, 115f
- letter, matching, 113f
- National Basketball Association (NBA) players, weights, 28f
- National Football League (NFL) Combine, weights, 31f
- National Football League (NFL) players, number/weights, 31f
- North American football players, weights, 30f
- raw vehicle year data, visualization, 52f
- Hockey player, scatterplot (versions), 152f
- How Charts Lie (Cairo), 187
- How Life Imitates Chess (Kasparov), 155
- How to Lie with Statistics (Huff), 187
- Huff, Darrell, 187
- Human data, 24–32, 241
- Human error, elimination (impossibility), 230
- Human-keyed data
- fingerprint, 27
- rounding, example, 25f
- I
- If/then calculations, 105
- Impact, data visualization variable, 209
- Incentives, involvement, 38
- Infectious diseases
- data set entries, example, 89f
- geographic roles, 92f
- reported amounts, 89f, 90f
- tuberculosis infections, choropleth map, 91f
- Infographics, usage, 213
- Information
- availability, 208
- obtaining, time/resources, 208
- Innovation, components, 207
- Interactive dashboard, example, 144f
- International System of Units (SI), conversion, 103
- Interpolations, 163–165, 242
- Intuition
- analysis false dichotomy, 149–157
- appraisal, 149–153
- definition, 150
- importance, reasons, 153–157
- J
- James, LeBron, 169–172
- Jersey number
- American Football player, histogram, 115f
- bin size, 116f
- K
- Kahneman, Daniel, 142, 144, 146
- Kasparov, Garry, 155, 203
- Kidney cancer, interactive dashboard, 144f
- Kramnik, Vladimir, 203
- L
- Lewis, Jay, 26
- Life expectancy
- Brazil, linear extrapolation, 161f
- change, timeline, 164f
- China, linear extrapolation, 161f, 162f
- increase, slopegraph, 164f
- North Korea/South Korea, life expectancy (extrapolation), 158f, 160f
- Line chart, reported crimes (Orlando), 190f
- LinkedIn skills (ranking), bumps chart (usage), 224f
- Logic of Scientific Discovery, The (Popper), 40
- Logic, process, 153
- Lognormal distribution, NFL player age, 121–123
- M
- Management science, 208
- Mappings
- advice, 229
- default mapping, natural mapping (contrast), 230f
- Mars Climate Orbiter
- disintegration, 74–75, 102–103
- rendering, 102f
- Martinez, Ramon, 13
- Mathematical miscues, –5, 74, 222, 242
- McCarthy, Cormac, 203
- McLean v. Arkansas Board of Education, 43
- Mean
- difference, computation (pitfalls), 135
- maximum value, standard deviations distance (calculation), 121f
- Measures/metrics, 168–172, 242
- activity metric, 170
- error, 38
- output metric, 170
- results, objective measure (number/figure relationships), 169
- units, inconsistency/incompatibility,
- Memorability, data visualization variable, 209
- Meteorites
- data, 13–16
- falls, timeline, 15f
- strikes, 14f
- Mistakes (error type), 231, 237
- Mixon, Michael, 78
- Moivre, Abraham De, 119
- Multimodal distribution, 125–127
- Munzner, Tamara, 176, 203, 234
- Muresan, Gheorghe, 125
- N
- National Football League (NFL) Combine, 30
- National Football League (NFL) players
- age, 122f
- lognormal distribution, 121–123
- age/weight/salary/height/jersey number, distribution, 112f
- cumulative height, 130f
- cumulative salary, 130f
- full name (character number), normal distribution (outlier inclusion), 124–125
- height
- cumulative height, 130f
- distribution, 118f
- normal distribution, 117–121
- jersey number, uniform distribution, 114–117
- name length, character count, 124f
- number/weights, histogram, 31f
- salary cap hit, 127–131, 128f
- variable type, histogram letter (matching), 113f
- distribution matching game, answer key, 114f
- weight, 125f
- position grouping, 126f
- trimodal distribution, 125–127
- Natural mapping, 228
- absence, 229f
- default mapping, contrast, 230f
- New Zealand island pair, circumnavigation, 79f
- Non-null vehicle colors, Pulaski yard tows (treemap), 65f
- Normal distribution
- NFL player height, 117–121
- outlier, inclusion, 124–125
- standard normal distribution, 119f
- Norman, Don, 227, 231
- North America
- countries, urban population (percent), 96f
- football players, weights (histogram), 30f
- North Korea/South Korea, life expectancy (extrapolation), 158f, 160f
- Null hypothesis statistical test, 134
- Null values, 49
- Numbers, relationships, 169
- Numerical literacy, growth, 234
- O
- Obama, Barack, 166
- Open exploration, data visualization (usage), 200–202
- OpenRefine, usage, 56f
- Operational research, 208
- Opinions, number/figure relationships, 169
- Opportunities, omissions, 222–227
- Optimize/satisfice false dichotomy, 207–211, 243
- Orlando, narcotics crimes
- reported cases, 187f
- assault reported cases, comparison methods, 196f
- weekly reported cases, 188f
- Orlando, reported crimes
- bar chart, 194f
- categories
- line chart, 190f
- simplified color palette, 194f
- chart types, 193f
- data labels, addition, 197f
- gridlines, addition, 197f
- monthly reported crime cases, change (timeline), 198f
- monthly reported crime, category breakdown, 199f
- monthly reported crime, pie chart/treemap, 198f
- packed bubble, 194f
- pie chart, 194f
- reported thefts, statistical signals (control chart examination), 201f
- treemap, 194f
- P
- Packed bubble, Orlando reported crimes, 194f
- assault crimes, comparison, 196f
- Pageviews
- comparison, data cleaning, 71f
- data, inclusion, 70
- map, creation, 67
- world map, 68f
- Pareto rule, 129
- Passwords, display methods, 205f
- Payoff function, 208
- Percentages, usage,
- Percents
- problems, 93–101, 242
- regional percent (computation), arithmetic average (usage), 97f
- urban population. See Urban population.
- Performance, data visualization element, 179, 181
- Pie chart, 176
- Pie chart, Orlando reported crimes, 194f
- assault crimes, reported cases (comparison), 196f
- monthly reported crime, 198f
- Player Impact Estimate (PIE), 170
- average speed, contrast, 171f
- Pleasure, design (importance), 231–233
- Poe, Edgar Allan (works)
- chart (modified version), aesthetic elements (addition), 223f
- completion, timeline, 84f
- dashboard (modified version), aesthetic appeal (enhancement), 226f
- missing years
- Wikipedia tables, 83
- years, timeline plot, 86f
- Popper, Karl, 40
- Population subset, data usage (examples), 132
- Power BI, 67
- Practical Charting Techniques (Spear), 245
- Practical significance/statistical significance, notion (confusion), 135
- Preattentive attributes, 152–153
- Process flow diagram, usage, 38
- Proportions, usage,
- P-values, 108, 134
- computation, pitfalls, 135
- Q
- Quality control, data (usage), 132
- Quantitative variable, 219–220
- Quetelet, Adolphe, 119
- R
- Rates/ratios, calculation,
- Ratings, inconsistency, 32–39, 241
- Reality, data
- confusion, avoidance (process), 38–39
- gap, 12–24
- Regional percent
- computation, arithmetic average (usage), 97f
- differences, slopegraph, 101f
- Results, objective measure (number/figure relationships), 169
- Right-skewed distribution, 121
- Road, The (McCarthy), 203
- Robbins, Naomi, 176
- Rosling, Hans, 156
- Rounding, example, 25f
- S
- Salk, Jonas, 154
- Sample size, insensitivity, 142–147, 242
- kidney cancer, interactive dashboard, 144f
- pitfalls, 145–146
- Sampling, problems, 136–142, 242
- Sankey diagram, 35
- Scatterplot, North American countries (urban population percent), 100f
- Schwabish, Jon, 175
- Search algorithms, change,
- Sequential color
- Shoplifting
- color, 190–191
- timeline, audience attention (focus), 192f
- Simon, Herbert A., 208, 211
- Single-color encoding, usage, 221f
- Sins of commission, 195
- Sins of omission, 197
- Six Sigma movement, 119–120
- Skills (ranking), bumps chart (usage), 224f
- Slips (error type), 231
- Slipups, . See Statistics
- Slopegraph
- life expectancy, increase, 164f
- regional percent differences, 101f
- Social media account, follower numbers (distribution), 128–129
- Social media poll, 32
- Spear, Mary Eleanor, 245–246
- Spelling differences, data set, 58f
- Stacked column, Orlando reported/assault crimes reported cases (comparison), 196f
- Standard deviation (SD), 118
- distance, calculation, 121f
- Standard normal distribution, 119f
- Statistical signals, control chart examination, 201f
- Statistical significance
- concept, misunderstanding, 135
- practical significance, notion (confusion), 135
- Statistics
- descriptive statistics, 109–131
- slipups, , 107, 242
- Stovetop design, natural mapping (absence), 229f
- Student's t-test, 133
- Survival function, 123
- System design, problems, 230–231
- T
- Tableau, calculated field, 64
- Tableau Desktop product, 190
- Tableau Prep, data cleaning (treemap), 66f
- Tableau Public platform/role, 212, 235
- Tapestry Conference, 136
- Tasks, data visualization, 178–184
- Tasman, Abel, 81
- Technical traps,
- Technical trespasses, 47, 241
- Test (running), data collection, 136
- Text values, misspelling, 48
- Thinking, Fast and Slow (Kahneman), 142
- Time-distance view, 184
- Timestamp, example, 27f
- Total population
- urban population/estimated urban population, percent, 99f
- urban population, percent (representation), 99f
- Totals
- problems, 88–93, 242
- Trespassing Totals, 90
- Total urban population, 99
- Towed vehicles (Pulaski tow yard), records (treemap), 63f
- Transitions, 48
- Treemap
- non-null vehicle colors, Pulaski yard tows, 65f
- Orlando, reported crimes, 194f
- assault/narcotics crimes, reported cases (comparison), 196f
- monthly reported crime, 198f
- Tableau Prep, data cleaning, 66f
- towed vehicles (Pulaski tow yard), records (treemap), 63f
- vehicle colors, towing record basis, 61f
- Trespassing Totals, 90
- Trimodal distribution, NFL player weight, 125–127
- t-test, 121
- Turing, Alan,
- U
- Understandability, 228
- Unemployment, OMB/administration forecasts, 166, 167f, 168f
- Unit of measure (UoM) field, basis, 105
- Units
- mismatching, 48–49
- unmatching, 102–106, 242
- usage,
- Upper control limit (UCL), 202
- Urban population, percent, 94f, 95f
- country, quotient representation, 97f
- North American countries, 96f
- null values, exclusion, 96f
- regional percent
- computation, arithmetic average (usage), 97f
- differences, slopegraph, 101f
- total population/estimated urban population, inclusion, 99f
- total population, inclusion, 99f
- Usability, issues, 227–236, 243
- User, data visualization element, 178, 180
- V
- Validity, principles, 153
- Values
- Variables
- color saturation, usage problems, 216–218
- color, usage (problem), 216
- Vehicle
- colors, data cleaning, 66f
- colors, towing record basis (treemap), 61f
- towed vehicles (Pulaski tow yard), records (treemap), 63f
- Vehicle makes
- clustering algorithms, recommendations imperfections, 57f
- frequencies, analysis, 60f
- names (clustering), OpenRefine (usage), 56f
- towing frequency, word cloud (usage), 55f
- Vehicle year
- adjusted data
- histogram, visualization, 52f
- outliers, 53f
- correction, 52–53
- raw data, histogram (visualization), 52f
- Venn diagram
- country name lists, comparison, 72f
- Google Analytics/Wikipedia population list overlap, 72f
- Google Analytics/World Bank overlap, 70f
- Visualization Analysis & Design (Munzner), 176, 203, 234
- Visual Vocabulary (Financial Times), 175
- Vlamingh, Willem de, 41
- VLOOKUP function (Excel), 67, 70
- Volkswagen, spelling differences (data set), 57, 58f
- W
- Wainer, Howard, 142, 144, 146
- Warne, Frank Julian, 188
- Waterfall chart, Orlando narcotics/assault crimes reported cases (comparison), 196f
- Weights, histogram, 30f
- Welch, Jack, 119
- West Africa, Ebola deaths, 22f
- What, knowledge, 154
- When, knowledge, 155–156
- Where, knowledge, 154–155
- Who, knowledge, 156–157
- Why, knowledge, 153–154
- Wikipedia population list/Google Analytics overlap, Venn diagram, 72f
- Wildlife/aircraft
- collisions, visualization (data aggregation levels), 77f
- strikes, count, 76f
- strikes, monthly count, 81f
- yearly bar segments, addition, 82f
- yearly segments, addition, 82f
- Word cloud, 55f, 176, 196f
- World Bank/Google Analytics overlap, Venn diagram, 70f
- World Bank, life expectancy, 163
- World Health Organization (WHO), Ebola cases (classification table), 23f
- World, urban population (percent), 94f, 95f
- country, quotient representation, 97f
- North America, 96f
- null values, exclusion, 96f