Index

Page references followed by f indicate an illustrated figure.

  • 68-95-99.7 Rule, 118
  • 80-20 rule, 129
  • A
  • Aaron, Hank (reported home run tally), 13, 23–24
  • Activity levels, number/figure relationships, 169
  • Activity metric, usage, 170
  • Adjusted data. See Vehicle year
  • Adjusted histogram, example, 29f
  • Adrain, Robert, 119
  • Aesthetics
    • data visualization variable, 209
    • design, impact, 214
    • elements, addition, 223f
  • Aggregations, 98
    • levels. See Data.
    • levels, summing, 4
    • problems, 75–83, 242
  • Aircraft/wildlife
    • collisions, data aggregation levels, 77f
    • strikes, count, 76f
    • strikes, monthly count, 81f
      • yearly bar segments, addition, 82f
      • yearly segments, addition (data exclusion), 82f
  • American Football player, jersey numbers (histogram), 115f
  • Analytical aberrations, 5–6, 148, 242
  • Analytics, usage, 149
  • Anatomy of Reality (Salk), 154
  • ANOVA, 121, 133
  • Arithmetic average, 110
  • Attention, data visualization variable, 209
  • Average speed
    • metric, problem, 170–172
    • Player Impact Estimate, contrast, 171f
  • Awareness, data visualization, 185–200
  • B
  • Baltimore City, Department of Transportation tow records, 50f, 59
  • Bananas, ripeness
    • assessment, results, 34f
    • perspective, change, 36f
    • photo, 37f
    • ratings, respondents (changes), 35f
    • stages, 33f
  • Bar chart
    • Orlando reported crimes, 194f
      • assault crimes, comparison, 196f
    • y-axis, truncation, 177–178
  • Baseline
  • Bayesian information criterion, 136
  • Behavior, data visualization variable, 209
  • Biases
    • cognitive biases, 142
    • example, 143
    • involvement, 38
  • Blends/joins, problems, 67–73, 241
  • Bol, Manute, 125
  • Boston Marathon dashboard (attributes), color (usage), 215f
  • Brazil, life expectancy (linear extrapolation), 161f
  • Bubble charts, choice, 174
  • Bumps chart, usage, 224
  • C
  • Cairo, Alberto, 187
  • California, infectious diseases
    • data set entries, example, 89f
    • geographic roles, 92f
    • reported amounts, 89f, 90f
    • tuberculosis infections, choropleth map, 91f
  • California road trip dashboard
    • confusion (avoidance), sequential color encoding (usage), 219f
    • sequential color palettes, 217f
  • Categorical variable, 219–220
  • Centers for Disease Control and Prevention (CDC), Ebola cumulative deaths, 21
  • Chartography in Ten Lessons (Warne), 188
  • Charts
    • challenges, 175–202, 243
    • confusion, display, 189–191, 190f
    • insight
      • conveyance, problem, 191–194
      • precision, problem, 195
    • message, problem, 195–200
    • misleading chart, display, 186–189, 187f, 188f
    • opportunities, omissions, 222–227
    • types (reported crimes), 193f
  • China, life expectancy (linear extrapolation), 161f, 162f
  • Chi-squared test, 121, 133
  • Choropleth map, tuberculosis infections, 91f
  • Chunky data, 28
  • Clarity, design (impact), 214
  • Clinical trials, data (usage), 132
  • Clinton, Bill, 166
  • Cluster function, usage, 56
  • Clustering algorithm
    • misses, 59f
    • recommendations imperfections, 57f
    • usage, 58
  • Cognitive biases, 142
  • Color
    • attribution usage, 215f
    • confusion, 214–221
    • encodings
      • limitations, 219–221
      • usage, problems, 218–219
    • field, discrepancies (calculated field), 64
    • frequency, 63
    • palette, simplification, 194f
    • pitfalls, 216–219
    • saturation, usage problems, 216–218
    • scheme, conflict (elimination), 218
    • sequential palettes, problems, 217f
    • single-color encoding, usage, 221f
    • single color palette, 221f
    • usage, problems, 216
    • value, 64
  • Company sales dashboard, store data set (usage), 220f
  • Complexity, impact, 233–236
  • Confusion, impact, 233–236
  • Cook, James, 78
  • Corum, Jonathan, 136
  • Country names
    • lists, comparison (Venn diagram), 72f
    • number, data sets, 69f
  • D
  • Dashboards, style (change), 233–234
  • Data
    • adjusted data. See Vehicle year.
    • aggregation, 109
    • analysis, 44, 109, 149
    • calculation, process, 4–5, 74–75, 242
    • chunky data, 28
    • cleanliness, recognition, 60–67
    • collection program, 132
    • comparison, process, 5, 107, 242
    • content, knowledge, 154
    • dashboard, default/natural mappings (contrast), 230f
    • data-driven prognostication, 157
    • dirty data pitfall, 48–67
    • discovery process, 156
    • dogmatism, 202–207, 243
    • dressing, processing, 6–7, 212, 243
    • entry typos, 4
    • fields, usage, 4, 59, 95, 105
    • human-keyed data, rounding (example), 25f
    • importance, 240
    • labels, addition, 197f
    • mathematical processes, application, 75
    • perspective, 3–4, 11–12, 241
    • processing, methods, 4, 47, 241
    • sets (country names), 69f
    • solutions, selection (approaches), 204f
    • storytelling mode, 200
    • usage, 168, 238
    • values (editing), clustering algorithm misses, 59f
    • voice, 243–246
  • Data pitfalls, 2
    • avoidance, 7–8
      • checklist, 241–243
    • types, 1, 3–7
  • Data reality
    • confusion, avoidance (process), 38–39
    • gap, 12–24, 80, 241
      • assessment, 38
  • Data sets
    • dirtiness, 48
    • preparation, 60
    • separation, 98
    • usage, 220f
  • Data visualization, 178–184
    • approach, factors, 208, 209
    • awareness, 185–200
    • elements, 178–179, 180
    • experience, pleasure, 231–232
    • factors, importance (determination), 209f
    • payoff function, 208
    • process, 6, 173, 243
    • re-creation, 181f
      • analysis, extension, 183f
    • scenario, precision (requirement), 210
    • usage, 178, 200–202
    • variables, 209
      • optimization, trade-offs, 210–211
  • Date
    • fields, inconsistency/incompatibility, 4
    • format problems, 48
  • Decision problem, tractability, 208
  • Decision science, 208
  • Default mapping, natural mapping (contrast), 230f
  • Descriptive statistics, 109–131
  • Design
    • color encoding, 219–221
    • dangers, 6–7, 212, 243
    • impact, 214
    • importance, 7, 231–233
    • system design, problems, 230–231
  • Design of Everyday Things, The (Norman), 227
  • Diapers, change (timestamp), 27f
  • Dichotomy
    • intuition/analysis false dichotomy, 149–157
    • optimize/satisfice false dichotomy, 207–211, 243
  • Dirty data
    • mismatching category levels/data entry typos, 4
    • pitfall, 48–67, 241
  • Discoverability, 228
  • Distribution
    • example, 112f
    • Gaussian distribution, 117
    • lognormal distribution, 121–123
    • matching game, answer key, 114f
    • multimodal distribution, 125–127
    • normal distribution, 117–121, 124–125
    • representative value, 127
    • right-skewed distribution, 121
    • trimodal distribution, NFL player weight, 125–127
    • uniform distribution, 114–117
  • DMAIC, 120
  • Dublin bicycle stands (dashboard), pie chart (usage), 232f
  • E
  • Earthquakes
    • actual/recorded comparison, 17f
    • Archive Search (US Geological Survey), 16
    • increase, 16–18
    • magnitude comparison, line plot, 16f
  • Ebola
    • cases, WHO classification table, 23f
    • cumulative counts, decrease, 21–24
    • cumulative deaths, 21
    • deaths (West Africa), 22f
  • Emotion, design (importance), 231–233
  • Encodings
    • color, usage (problems), 218–219
    • types, errors, 235f
  • English text, letters (relative frequencies), 177f
  • Epistemic errors, 3–4, 8, 11, 241
  • Epistemology, 3, 11–12
  • Error bars, inclusion, 141f
  • Errors. See Encodings
  • Estimated urban population/urban population/total population, percent, 99f
  • Exploratory Data Analysis (EDA), 200
  • Extrapolations, 157–163
    • linear extrapolation, life expectancies, 161f
    • North Korea/South Korea, life expectancy, 158f, 160f
  • F
  • Facial expressions, 151f
  • Falsifiability, 43–44
  • Figures, relationships, 169
  • Fish, 136
    • city/outlet mislabeling, 138f
    • city sampling plan, 139f
    • inference, 137–139
    • mislabeling, 137f
      • error bars, inclusion, 141f
    • misleading bars, 137f
  • Fitness network site, data visualizations (re-creation), 181f
    • analysis, extension, 183f
  • Fremont Bridge, bicycle usage, 18–21, 18f, 44–45, 398
    • counter measurements, 46f
    • counting, 18
    • counts, time series, 19f
  • Full name (character number), normal distribution (outlier inclusion), 124–125
  • G
  • Galileo, astronomical measurement errors, 119
  • Gauss, Carl Friedrich, 119
  • Gaussian distribution, 117
  • Geographic roles, infectious diseases (California), 92f
  • Geometric regularity, 26f
  • Geospatial coordinate formats, incompatibility, 49
  • God pitfall, 43–44, 241
    • avoidance, 44–46
  • Google Analytics
  • Google Analytics/Wikipedia population list overlap (Venn diagram), 72f
  • Google Analytics/World Bank overlap (Venn diagram), 70f
  • Granularity, level (increase), 77
  • Graphic Continuum (Schwabish), 175
  • Graphs, mistakes/gaffes, 6, 173, 187
  • Gridlines, addition, 197f
  • H
  • Height (NFL players), normal distribution, 117–121
  • Heuristics, 142
  • Histogram
    • adjusted histogram, example, 29f
    • American Football player, jersey numbers, 115f
    • letter, matching, 113f
    • National Basketball Association (NBA) players, weights, 28f
    • National Football League (NFL) Combine, weights, 31f
    • National Football League (NFL) players, number/weights, 31f
    • North American football players, weights, 30f
    • raw vehicle year data, visualization, 52f
  • Hockey player, scatterplot (versions), 152f
  • How Charts Lie (Cairo), 187
  • How Life Imitates Chess (Kasparov), 155
  • How to Lie with Statistics (Huff), 187
  • Huff, Darrell, 187
  • Human data, 24–32, 241
  • Human error, elimination (impossibility), 230
  • Human-keyed data
    • fingerprint, 27
    • rounding, example, 25f
  • I
  • If/then calculations, 105
  • Impact, data visualization variable, 209
  • Incentives, involvement, 38
  • Infectious diseases
    • data set entries, example, 89f
    • geographic roles, 92f
    • reported amounts, 89f, 90f
    • tuberculosis infections, choropleth map, 91f
  • Infographics, usage, 213
  • Information
    • availability, 208
    • obtaining, time/resources, 208
  • Innovation, components, 207
  • Interactive dashboard, example, 144f
  • International System of Units (SI), conversion, 103
  • Interpolations, 163–165, 242
  • Intuition
    • analysis false dichotomy, 149–157
    • appraisal, 149–153
    • definition, 150
    • importance, reasons, 153–157
  • J
  • James, LeBron, 169–172
  • Jersey number
    • American Football player, histogram, 115f
    • bin size, 116f
  • K
  • Kahneman, Daniel, 142, 144, 146
  • Kasparov, Garry, 155, 203
  • Kidney cancer, interactive dashboard, 144f
  • Kramnik, Vladimir, 203
  • L
  • Lewis, Jay, 26
  • Life expectancy
    • Brazil, linear extrapolation, 161f
    • change, timeline, 164f
    • China, linear extrapolation, 161f, 162f
    • increase, slopegraph, 164f
    • North Korea/South Korea, life expectancy (extrapolation), 158f, 160f
  • Line chart, reported crimes (Orlando), 190f
  • LinkedIn skills (ranking), bumps chart (usage), 224f
  • Logic of Scientific Discovery, The (Popper), 40
  • Logic, process, 153
  • Lognormal distribution, NFL player age, 121–123
  • M
  • Management science, 208
  • Mappings
    • advice, 229
    • default mapping, natural mapping (contrast), 230f
  • Mars Climate Orbiter
    • disintegration, 74–75, 102–103
    • rendering, 102f
  • Martinez, Ramon, 13
  • Mathematical miscues, 4–5, 74, 222, 242
  • McCarthy, Cormac, 203
  • McLean v. Arkansas Board of Education, 43
  • Mean
    • difference, computation (pitfalls), 135
    • maximum value, standard deviations distance (calculation), 121f
  • Measures/metrics, 168–172, 242
    • activity metric, 170
    • error, 38
    • output metric, 170
    • results, objective measure (number/figure relationships), 169
    • units, inconsistency/incompatibility, 4
  • Memorability, data visualization variable, 209
  • Meteorites
    • data, 13–16
    • falls, timeline, 15f
    • strikes, 14f
  • Mistakes (error type), 231, 237
  • Mixon, Michael, 78
  • Moivre, Abraham De, 119
  • Multimodal distribution, 125–127
  • Munzner, Tamara, 176, 203, 234
  • Muresan, Gheorghe, 125
  • N
  • National Football League (NFL) Combine, 30
    • weights, histogram, 31f
  • National Football League (NFL) players
    • age, 122f
      • lognormal distribution, 121–123
    • age/weight/salary/height/jersey number, distribution, 112f
    • cumulative height, 130f
    • cumulative salary, 130f
    • full name (character number), normal distribution (outlier inclusion), 124–125
    • height
      • cumulative height, 130f
      • distribution, 118f
      • normal distribution, 117–121
    • jersey number, uniform distribution, 114–117
    • name length, character count, 124f
    • number/weights, histogram, 31f
    • salary cap hit, 127–131, 128f
    • variable type, histogram letter (matching), 113f
      • distribution matching game, answer key, 114f
    • weight, 125f
      • position grouping, 126f
      • trimodal distribution, 125–127
  • Natural mapping, 228
    • absence, 229f
    • default mapping, contrast, 230f
  • New Zealand island pair, circumnavigation, 79f
  • Non-null vehicle colors, Pulaski yard tows (treemap), 65f
  • Normal distribution
    • NFL player height, 117–121
    • outlier, inclusion, 124–125
    • standard normal distribution, 119f
  • Norman, Don, 227, 231
  • North America
    • countries, urban population (percent), 96f
    • football players, weights (histogram), 30f
  • North Korea/South Korea, life expectancy (extrapolation), 158f, 160f
  • Null hypothesis statistical test, 134
  • Null values, 49
  • Numbers, relationships, 169
  • Numerical literacy, growth, 234
  • O
  • Obama, Barack, 166
  • Open exploration, data visualization (usage), 200–202
  • OpenRefine, usage, 56f
  • Operational research, 208
  • Opinions, number/figure relationships, 169
  • Opportunities, omissions, 222–227
  • Optimize/satisfice false dichotomy, 207–211, 243
  • Orlando, narcotics crimes
    • reported cases, 187f
      • assault reported cases, comparison methods, 196f
    • weekly reported cases, 188f
  • Orlando, reported crimes
    • bar chart, 194f
    • categories
      • line chart, 190f
      • simplified color palette, 194f
    • chart types, 193f
    • data labels, addition, 197f
    • gridlines, addition, 197f
    • monthly reported crime cases, change (timeline), 198f
    • monthly reported crime, category breakdown, 199f
    • monthly reported crime, pie chart/treemap, 198f
    • packed bubble, 194f
    • pie chart, 194f
    • reported thefts, statistical signals (control chart examination), 201f
    • treemap, 194f
  • P
  • Packed bubble, Orlando reported crimes, 194f
    • assault crimes, comparison, 196f
  • Pageviews
    • comparison, data cleaning, 71f
    • data, inclusion, 70
    • map, creation, 67
    • world map, 68f
  • Pareto rule, 129
  • Passwords, display methods, 205f
  • Payoff function, 208
  • Percentages, usage, 4
  • Percents
    • problems, 93–101, 242
    • regional percent (computation), arithmetic average (usage), 97f
    • urban population. See Urban population.
  • Performance, data visualization element, 179, 181
  • Pie chart, 176
  • Pie chart, Orlando reported crimes, 194f
    • assault crimes, reported cases (comparison), 196f
    • monthly reported crime, 198f
  • Player Impact Estimate (PIE), 170
    • average speed, contrast, 171f
  • Pleasure, design (importance), 231–233
  • Poe, Edgar Allan (works)
    • chart (modified version), aesthetic elements (addition), 223f
    • completion, timeline, 84f
    • dashboard (modified version), aesthetic appeal (enhancement), 226f
    • missing years
    • Wikipedia tables, 83
    • years, timeline plot, 86f
  • Popper, Karl, 40
  • Population subset, data usage (examples), 132
  • Power BI, 67
  • Practical Charting Techniques (Spear), 245
  • Practical significance/statistical significance, notion (confusion), 135
  • Preattentive attributes, 152–153
  • Process flow diagram, usage, 38
  • Proportions, usage, 4
  • P-values, 108, 134
    • computation, pitfalls, 135
  • Q
  • Quality control, data (usage), 132
  • Quantitative variable, 219–220
  • Quetelet, Adolphe, 119
  • R
  • Rates/ratios, calculation, 4
  • Ratings, inconsistency, 32–39, 241
  • Reality, data
    • confusion, avoidance (process), 38–39
    • gap, 12–24
  • Regional percent
    • computation, arithmetic average (usage), 97f
    • differences, slopegraph, 101f
  • Results, objective measure (number/figure relationships), 169
  • Right-skewed distribution, 121
  • Road, The (McCarthy), 203
  • Robbins, Naomi, 176
  • Rosling, Hans, 156
  • Rounding, example, 25f
  • S
  • Salk, Jonas, 154
  • Sample size, insensitivity, 142–147, 242
    • kidney cancer, interactive dashboard, 144f
    • pitfalls, 145–146
  • Sampling, problems, 136–142, 242
  • Sankey diagram, 35
  • Scatterplot, North American countries (urban population percent), 100f
  • Schwabish, Jon, 175
  • Search algorithms, change, 6
  • Sequential color
  • Shoplifting
    • color, 190–191
    • timeline, audience attention (focus), 192f
  • Simon, Herbert A., 208, 211
  • Single-color encoding, usage, 221f
  • Sins of commission, 195
  • Sins of omission, 197
  • Six Sigma movement, 119–120
  • Skills (ranking), bumps chart (usage), 224f
  • Slips (error type), 231
  • Slipups, 2. See Statistics
  • Slopegraph
    • life expectancy, increase, 164f
    • regional percent differences, 101f
  • Social media account, follower numbers (distribution), 128–129
  • Social media poll, 32
  • Spear, Mary Eleanor, 245–246
  • Spelling differences, data set, 58f
  • Stacked column, Orlando reported/assault crimes reported cases (comparison), 196f
  • Standard deviation (SD), 118
    • distance, calculation, 121f
  • Standard normal distribution, 119f
  • Statistical signals, control chart examination, 201f
  • Statistical significance
    • concept, misunderstanding, 135
    • practical significance, notion (confusion), 135
  • Statistics
    • descriptive statistics, 109–131
    • slipups, 5, 107, 242
  • Stovetop design, natural mapping (absence), 229f
  • Student's t-test, 133
  • Survival function, 123
  • System design, problems, 230–231
  • T
  • Tableau, calculated field, 64
  • Tableau Desktop product, 190
  • Tableau Prep, data cleaning (treemap), 66f
  • Tableau Public platform/role, 212, 235
  • Tapestry Conference, 136
  • Tasks, data visualization, 178–184
  • Tasman, Abel, 81
  • Technical traps, 4
  • Technical trespasses, 47, 241
  • Test (running), data collection, 136
  • Text values, misspelling, 48
  • Thinking, Fast and Slow (Kahneman), 142
  • Time-distance view, 184
  • Timestamp, example, 27f
  • Total population
    • urban population/estimated urban population, percent, 99f
    • urban population, percent (representation), 99f
  • Totals
    • problems, 88–93, 242
    • Trespassing Totals, 90
  • Total urban population, 99
  • Towed vehicles (Pulaski tow yard), records (treemap), 63f
  • Transitions, 48
  • Treemap
    • non-null vehicle colors, Pulaski yard tows, 65f
    • Orlando, reported crimes, 194f
      • assault/narcotics crimes, reported cases (comparison), 196f
      • monthly reported crime, 198f
    • Tableau Prep, data cleaning, 66f
    • towed vehicles (Pulaski tow yard), records (treemap), 63f
    • vehicle colors, towing record basis, 61f
  • Trespassing Totals, 90
  • Trimodal distribution, NFL player weight, 125–127
  • t-test, 121
  • Turing, Alan, 9
  • U
  • Understandability, 228
  • Unemployment, OMB/administration forecasts, 166, 167f, 168f
  • Unit of measure (UoM) field, basis, 105
  • Units
    • mismatching, 48–49
    • unmatching, 102–106, 242
    • usage, 4
  • Upper control limit (UCL), 202
  • Urban population, percent, 94f, 95f
    • country, quotient representation, 97f
    • North American countries, 96f
    • null values, exclusion, 96f
    • regional percent
      • computation, arithmetic average (usage), 97f
      • differences, slopegraph, 101f
    • total population/estimated urban population, inclusion, 99f
    • total population, inclusion, 99f
  • Usability, issues, 227–236, 243
  • User, data visualization element, 178, 180
  • V
  • Validity, principles, 153
  • Values
  • Variables
    • color saturation, usage problems, 216–218
    • color, usage (problem), 216
  • Vehicle
    • colors, data cleaning, 66f
    • colors, towing record basis (treemap), 61f
    • towed vehicles (Pulaski tow yard), records (treemap), 63f
  • Vehicle makes
    • clustering algorithms, recommendations imperfections, 57f
    • frequencies, analysis, 60f
    • names (clustering), OpenRefine (usage), 56f
    • towing frequency, word cloud (usage), 55f
  • Vehicle year
    • adjusted data
      • histogram, visualization, 52f
      • outliers, 53f
    • correction, 52–53
    • raw data, histogram (visualization), 52f
  • Venn diagram
    • country name lists, comparison, 72f
    • Google Analytics/Wikipedia population list overlap, 72f
    • Google Analytics/World Bank overlap, 70f
  • Visualization Analysis & Design (Munzner), 176, 203, 234
  • Visual Vocabulary (Financial Times), 175
  • Vlamingh, Willem de, 41
  • VLOOKUP function (Excel), 67, 70
  • Volkswagen, spelling differences (data set), 57, 58f
  • W
  • Wainer, Howard, 142, 144, 146
  • Warne, Frank Julian, 188
  • Waterfall chart, Orlando narcotics/assault crimes reported cases (comparison), 196f
  • Weights, histogram, 30f
  • Welch, Jack, 119
  • West Africa, Ebola deaths, 22f
  • What, knowledge, 154
  • When, knowledge, 155–156
  • Where, knowledge, 154–155
  • Who, knowledge, 156–157
  • Why, knowledge, 153–154
  • Wikipedia population list/Google Analytics overlap, Venn diagram, 72f
  • Wildlife/aircraft
    • collisions, visualization (data aggregation levels), 77f
    • strikes, count, 76f
    • strikes, monthly count, 81f
      • yearly bar segments, addition, 82f
      • yearly segments, addition, 82f
  • Word cloud, 55f, 176, 196f
    • selection, 174
    • usage, 204–205
  • World Bank/Google Analytics overlap, Venn diagram, 70f
  • World Bank, life expectancy, 163
  • World Health Organization (WHO), Ebola cases (classification table), 23f
  • World, urban population (percent), 94f, 95f
    • country, quotient representation, 97f
    • North America, 96f
    • null values, exclusion, 96f
  • Wozniak, Nate, 121
  • Y
  • Y-axis, inversion, 225
  • Z
  • Zeehaen's Bight, 79
  • Z score, 121
  • Zwerling, Harris L., 142, 144, 146