Index

Symbols

3D in data visualizations, 220-221

56th Grammy Awards, hypothesis validation example (data analysis), 104, 112-113

2008 presidential debates, 122-124

2011 Academy Awards, 14

2012 presidential debates, 14

2012 presidential election, 86

2013 Nobel Peace Prize, 5-7

2014 Grammy Awards and Twitter, xxix

2014 IBM Insight conference, 172

A

Academy Awards (2011), 14

Activity Scorecard KPI in PSD, 177-180

Adams, Ansel, 103

Adams, Douglas, 31

ad hoc analysis, 87

defining, 141-142

example of, 144-150

external social media (domain of analysis), 90

integrity of data, 150-155

internal social media (domain of analysis), 95

Adventure of the Six Napoleons, The, xx

AdWeek, JetBlue and customer positive/negative experiences, 38

affinity analysis (SMA), 165-167

affinity matrixes, 244

Africa, growth of social media, xxviii

age of author and data analysis, 34, 41-42

All Things Analytics website, 170-172

Al Qaeda, 5

Altimeter Group, 169

Always On Engagement Center (IBM), 125

analysis, depth of (taxonomy of data analysis), 84-85

analysis, domain of (taxonomy of data analysis), 84, 169

external social media, 88

ad hoc analysis, 90

deep analysis, 90-93

SSM, 89-90

internal social media, 88, 94

ad hoc analysis, 95

deep analysis, 95-97

SSM, 94-96

analysis, duration of (taxonomy of data analysis), 90-91, 96

Analytics (Enterprise Graphs), 174

Analytics Services (Enterprise Graphs), 174

analyzing consumer reactions, 204-209

analyzing data, xxx

ad hoc analysis, 87

defining, 141-142

example of, 144-150

external social media (domain of analysis), 90

integrity of data, 150-155

internal social media (domain of analysis), 95

audience comments, filtering

age of author, 34, 41-42

bias, 31-32

eminence/popularity, 35, 42-44

gender, 34, 41-42

geography, 33, 39-41

IBM example, 35-37

language, 33, 39-41

objective feedback, 31

profession/expertise, 34

public versus employee comments, 31-32

roles (job), 35

satisfaction, 37-39

specific audiences, 35

case study, 227-228

conclusions, 247

data analysis (first pass), 235-241

data analysis (second pass), 243-244

data identification, 228, 231-235

interpreting information, 244-247

chaff, separating wheat from, 18

data collection

calculating web page visits, 20-21

“casting a net”, 19-23

data interpretation, xxv

data modeling, xxv

data validity, 20-23

data visualization, xxv

deep analysis, 87, 157

affinity analysis, 165-167

classifying leads, 160-161

Evolving Topics algorithm, 163-164

external social media (domain of analysis), 90-93

identifying leads, 158-159

internal social media (domain of analysis), 95-97

qualifying leads, 160-161

relationship matrixes, 92

suggested action phase, 161-163

support via analytics software, 163-167

defining, xxv, 83

descriptive analytics, 54

defining, 53

predictive analytics versus, 48-49

sentiment and, 55-57

Simple Social Metrics, 53

eliminating data, 21-23

keyword filtering, 28-29

regular expressions, 24-27

hypotheses, validating, 103

Cannes Lions 2013 example, 104, 110-112

Grammy Awards example, 104, 112-113

youth unemployment example, 104-110

IBMAmplify case study, 227-228

conclusions, 247

data analysis (first pass), 235-241

data analysis (second pass), 243-244

data identification, 228, 231-235

interpreting information, 244-247

iterative methods and, 117-119

marketing and, xxvi

near real-time analysis, 86

near real-time analytics, 121-123

predictive analytics

defining, 49

descriptive analytics versus, 48-49

sentiment and, 51-53

trend forecasting, 51-53

real-time analytics, 121

2008 presidential debates, 122-124

as early warning system, 139

conference data, 138-139

IBM Always On Engagement Center, 125

near real-time analytics versus, 123

stream computing, 128-136

value of, 122, 125, 138-139

real-time views, xxv

relationship matrixes, xxv

stream computing, 126

components of streams, 128-130

directed graphs, 130-133

filters, 127

IBM InfoSphere Streams, 128

real-time data analytics, 128

REST and, 132

SPL, 129-130, 134

SSM and, 131-136

Streams Studio IDE, 129

target audience, determining

age of author, 34, 41-42

bias, 31-32

eminence/popularity, 35, 42-44

gender, 34, 41-42

geography, 33, 39-41

IBM example, 35-37

language, 33, 39-41

objective feedback, 31

profession/expertise, 34

public versus employee comments, 31-32

roles (job), 35

satisfaction, 37-39

specific audiences, 35

taxonomy of analysis, 83

depth of analysis, 84-85

domain of analysis, 84, 88-90, 94-96, 169

duration of analysis, 90-91, 96

machine capacity, 84-86, 90-91, 94-98

velocity of data, 84, 99-101

themes, discovering, 103, 113-117

timing and, 57-58

topics, discovering, 103, 113-117

trends, discovering, 103, 113-117

Twitter, xxv

value pyramid, 18

analyzing sentiment, 202

defining, xxx

microblogs, 203

analyzing social media content, process of

clear communication, 195-198

consumer reaction study, 204-209

data

duplicating, 198-200

filtering, 192, 198

finding the right data, 193-194

gathering, 191-194

refining, 192, 195-200

data model, developing, 192

questions, posing, 190

tools

configuring, 192

customizing/modifying, 201-203

selecting, 204

troubleshooting, 193-209

animal testing, 11

API (Application Programming Interfaces) and Enterprise Graphs, 186

Apple iPad, Twitter data collection/filtering example, 22-29

architects and data model development, 192

Armstrong, Lance, 35, 144, 151-155, 193, 222

Asher, Jay, 48

Asia-Pacific, growth of social media, xxviii

attributes of data, 7

language, 9

ownership of data, 14

region, 9

structure, 8, 64

time, 14

type of content, 10

blogs, 12

discussion forums, 12

instructions, 11

microblogs, 12

news, 11

press releases, 11

wikis, 12

venue, 13

audiences

comments, filtering

age of author, 34, 41-42

bias, 31-32

eminence/popularity, 35, 42-44

gender, 34, 41-42

geography, 33, 39-41

IBM example, 35-37

language, 33, 39-41

objective feedback, 31

profession/expertise, 34

public versus employee comments, 31-32

roles (job), 35

satisfaction, 37-39

specific audiences, 35

finding, xxvi

target audiences, determining

age of author, 34, 41-42

bias, 31-32

eminence/popularity, 35, 42-44

gender, 34, 41-42

geography, 33, 39-41

IBM example, 35-37

language, 33, 39-41

objective feedback, 31

profession/expertise, 34

public versus employee comments, 31-32

roles (job), 35

satisfaction, 37-39

specific audiences, 35

audio as a type of content (data attribute), 10

Australia, growth of social media, xxviii

B

Babbage, Charles, 1, 22

baby boom (Post-World War II), 34

Baidu Tieba, sifting through big data, 71

bar charts, 214-215

BBC, 5

bias

data analysis and, 31-32

data identification, 6-7

big data, 72

defining, xxx, 65-66

finding, 69

looking for, 69

as natural resource, xxviii

paradox of choice, the, 70

sifting through

nonscoped/scoped datasets, 71

paradox of choice, 70, 74

signal-to-noise ratio, 71

social media as, 67-68

entertainment, 68

sharing, 69

social aspect, 68

BladeCenter (IBM), big data analysis example, 72

blogs. See also microblogs

data identification and type of content, 12

ESN, 171

identifying data in, 80

microblogging, xxvi

Yousafzai, Malala, 5

Bluemix, 204

Boardreader data aggregator, 58, 105, 192

Borse, Santosh, 170

Bowers, Jeffery, 122

Boy and His Atom, A, 110-112

Brazil, growth of social media, xxviii

Brown, Gordon, 123

Bryant, Randal, 66

Burns, Robert, 117

BusinessWeek, xxi

C

Calgary Floods project (SMA), 164-167

Cannes Lions 2013, hypothesis validation example (data analysis), 104, 110-112

CapGemini, 247

case study (data analysis), 227-228

conclusions, 247

data analysis

first pass, 235-241

second pass, 243-244

data identification, 228, 231-235

interpreting information, 244-247

“casting a net” (data collection), 19-23

chaff, separating wheat from, 17

charts

bar charts, 214-215

line charts, 216-218

pie charts, 213-214

scaling issues, 215

China

growth of social media, xxviii

IBM and Chinese factories, 193-194

RenRen, 78

social media outlets, 74

choice, the paradox of, 70

Citibank, 196-197

classifying

data and stream computing, 135-136

leads (deep analysis), 160-161

clear communication in social analytics process, 195-198

Clegg, Nick, 123

clouds (word), xxv, xxx

CNN, 13, 124

Coase, Ronald, 193

Coca-Cola, 9

collecting data

Apple and Twitter example, 22-29

“casting a net”, 19-23

“data validity”, 20-23

eliminating data, 21-23

keyword filtering, 28-29

regular expressions, 24-27

noisy data, filtering, 24-29

regular expressions, 23

egrep, 25-27

filtering noisy data, 24-27

right data, finding, 193-194

Twitter and Apple example, 22-29

web page visits, computing, 20-21

wildcards, 23

color in data visualizations, 221

Comcast

customer satisfaction, xxi, xxii

NHL playoffs, xxii

Twitter and, xxi, xxii

comments, filtering

age of author, 34, 41-42

bias, 31-32

eminence/popularity, 35, 42-44

gender, 34, 41-42

geography, 33, 39-41

IBM example, 35-37

language, 33, 39-41

objective feedback, 31

profession/expertise, 34

public versus employee comments, 31-32

roles (job), 35

satisfaction, 37-39

specific audiences, 35

communication, transparency of (ESN), 171

communities (online), 12

computer architects and data model development, 192

conferences and real-time data analytics, 138-139

connotations (positive/negative), words with, 202

consumer reaction analysis, 204-209

Consumer Reports, social media as sharing, 69

Content Analytics (IBM), 204

content, type of (data attribute), 10

blogs, 12

discussion forums, 12

instructions, 11

microblogs, 12

news, 11

press releases, 11

wikis, 12

context, structuring data via, 63

conversations

social media as, xx, xxi, xxii, xxiii

starting, xxvi

Cowper, William, 61

Crow, Sheryl, 155

Crux website, The, 199-200

customer satisfaction

Comcast, xxi, xxii

Twitter, xxi, xxii

customizing/modifying tools in the social analytics process, 201-203

D

data

attributes of, 7

language, 9

ownership of data, 14

region, 9

structure, 8

time, 14

type of content, 10-12

venue, 13

clear communication, 195, 198

data duplication, 198-200

deduplicating, 200

defining, 2

duplicating, 198-200

filtering, 3, 192, 198

gathering, 191-194

information, defining, 3

integrity of, 150, 155

interpreting, xxv, xxvi, xxvii

knowledge, defining, 3-4

modeling

defining, xxv

model development, 192

motion, data at, 14

noisy data

defining, 3

filtering, 24-29

private data, 15

proprietary data, 15

public data, 15

refining, 192

relevancy of, 5-7

rest, data at, 14

states of, 14

uniqueness of, 200

unprocessed data, 2

validating, 20, 23

value pyramid, 3, 18

velocity of (taxonomy of data analysis), 84

data at rest, 100-101

data in motion, 99

wisdom, defining, 3

data analysis, xxx

ad hoc analysis, 87

defining, 141-142

example of, 144-150

external social media (domain of analysis), 90

integrity of data, 150-155

internal social media (domain of analysis), 95

audience comments, filtering

age of author, 34, 41-42

bias, 31-32

eminence/popularity, 35, 42-44

gender, 34, 41-42

geography, 33, 39-41

IBM example, 35-37

language, 33, 39-41

objective feedback, 31

profession/expertise, 34

public versus employee comments, 31-32

roles (job), 35

satisfaction, 37-39

specific audiences, 35

case study, 227-228

conclusions, 247

data analysis (first pass), 235-241

data analysis (second pass), 243-244

data identification, 228, 231-235

interpreting information, 244-247

chaff, separating wheat from, 18

data collection

calculating web page visits, 20-21

“casting a net”, 19-23

data interpretation, xxv

data modeling, xxv

data validity, 20-23

data visualization, xxv

deep analysis, 87, 157

affinity analysis, 165-167

classifying leads, 160-161

Evolving Topics algorithm, 163-164

external social media (domain of analysis), 90-93

identifying leads, 158-159

internal social media (domain of analysis), 95-97

qualifying leads, 160-161

relationship matrixes, 92

suggested action phase, 161-163

support via analytics software, 163-167

defining, xxv, 83

descriptive analytics, 54

defining, 53

predictive analytics versus, 48-49

sentiment and, 55-57

Simple Social Metrics, 53

eliminating data, 21-23

keyword filtering, 28-29

regular expressions, 24-27

hypotheses, validating, 103

Cannes Lions 2013 example, 104, 110-112

Grammy Awards example, 104, 112-113

youth unemployment example, 104-110

IBMAmplify case study, 227-228

conclusions, 247

data analysis (first pass), 235-241

data analysis (second pass), 243-244

data identification, 228, 231-235

interpreting information, 244-247

iterative methods and, 117-119

marketing and, xxvi

near real-time analytics, 86, 121-123

predictive analytics

defining, 49

descriptive analytics versus, 48-49

sentiment and, 51-53

trend forecasting, 51-53

real-time analytics, 121

2008 presidential debates, 122-124

conference data, 138-139

as early warning system, 139

IBM Always On Engagement Center, 125

near real-time analytics versus, 123

stream computing, 128-136

value of, 122, 125, 138-139

real-time views, xxv

relationship matrixes, xxv

stream computing, 126

components of streams, 128-130

directed graphs, 130-133

filters, 127

IBM InfoSphere Streams, 128

real-time data analytics, 128

REST and, 132

SPL, 129-130, 134

SSM and, 131-136

Streams Studio IDE, 129

target audience, determining

age of author, 34, 41-42

bias, 31-32

eminence/popularity, 35, 42-44

gender, 34, 41-42

geography, 33, 39-41

IBM example, 35-37

language, 33, 39-41

objective feedback, 31

profession/expertise, 34

public versus employee comments, 31-32

roles (job), 35

satisfaction, 37-39

specific audiences, 35

taxonomy of analysis, 83

depth of analysis, 84-85

domain of analysis, 84, 88-90, 94-96, 169

duration of analysis, 90-91, 96

machine capacity, 84-86, 90-91, 94-98

velocity of data, 84, 99-101

themes, discovering, 103, 113-117

timing and, 57-58

topics, discovering, 103, 113-117

trends, discovering, 103, 113-117

Twitter, xxv

value pyramid, 18

data collection

Apple and Twitter example, 22-29

“casting a net”, 19-23

“data validity”, 20-23

eliminating data, 21-23

keyword filtering, 28-29

regular expressions, 24-27

noisy data, filtering, 24-29

regular expressions, 23

egrep, 25-27

filtering noisy data, 24-27

Twitter and Apple example, 22-29

web page visits, computing, 20-21

wildcards, 23

data identification

attributes of data, 7

language, 9

ownership of data, 14

region, 9

structure, 8

time, 14

type of content, 10-12

venue, 13

bias in, 6-7

case study, 228, 231-235

defining, xxiv, 1, 4

filtered data, defining, 3

goal of, 3-4

hypothesis validation and, 105-108

information, defining, 3

knowledge, defining, 3-4

noisy data, defining, 3

relevancy of, 5-7

social media outlets, 74

blogs, 80

Facebook, 77

information sharing sites, 78-79

microblogs, 79-80

professional networking sites, 75-76

RenRen, 78

social sites, 77-78

wikis, 80

unprocessed data, defining, 2

value pyramid, 3

wisdom, defining, 3

Data Services (Enterprise Graphs), 174

datasets (nonscoped/scoped), 71

Data Sources (Enterprise Graphs), 174

data streams (SPL), 129

data visualization, xxv, 211-212

3D, 220-221

bar charts, 214-215

color, 221

effectiveness of, 213

information overload, 219

line charts, 216-218

pie charts, 213-214

scaling issues, 215

scatter plots, 218

troubleshooting, 219-221

unstructured data, 222-225

word clouds, 224-225

Dave, Hardik, 170

Davidzenka, Mila, 105

Davis, Colin, 122

debates (presidential)

2008, 122-124

2012, 14

deconstructing knowledge creation (ESN), 172

deduplication of data, 200

deep analysis, 87, 157

affinity analysis, 165-167

Evolving Topics algorithm, 163-164

external social media (domain of analysis), 90-93

internal social media (domain of analysis), 95-97

leads

classifying, 160-161

identifying, 158-159

qualifying, 160-161

relationship matrixes, 92

suggested action phase, 161-163

support via analytics software, 163-167

demographics

Facebook, 77

LinkedIn, 76

RenRen, 78

Twitter, 80

YouTube, 78-79

depth of analysis (taxonomy of data analysis), 84-85

descriptive analytics, 54

defining, 53

predictive analytics versus, 48-49

sentiment and, 55-57

Simple Social Metrics, 53

detectives, social media analysts as, xxiv

directed graphs and stream computing, 130-133

discovery/innovation in ESN, 172

discussion forums

data identification and type of content, 12

ESN, 172

domain of analysis (taxonomy of data analysis), 84, 169

external social media, 88

ad hoc analysis, 90

deep analysis, 90-93

SSM, 89-90

internal social media, 88, 94

ad hoc analysis, 95

deep analysis, 95-97

SSM, 94-96

Doyle, Arthur Conan, xx

duplicated data in social analytics process, 198-200

duration of analysis (taxonomy of data analysis), 90-91, 96

E

early warning system, real-time data analytics as, 139

Econsultancy, 83

Edwards Air Force Base, 189

egrep (Extended Global Regular Expressions Print), 25-27

Eliason, Frank, xxi, xxii

eliminating data based on validity, 21-25

eminence/popularity and data analysis, 35, 42-44

Eminence Scorecard KPI in PSD, 177, 181-182

employees

ESN employee-to-employee interactions, 172-173

job roles and data analysis, 35

performance and Enterprise Graphs, 186

privacy, 170

public versus employee comments, 31-32

Enterprise Graphs

Analytics, 174

Analytics Services, 174

API, 186

components of, 174-175

Data Services, 174

Data Sources, 174

employee performance, 186

ESN and, 174-175

future of, 185-186

Graph Store, 174

PSD, 175

Activity Scorecard KPI, 177-180

assessing business benefits, 183-185

benefits of, 176

Eminence Scorecard KPI, 177, 181-182

Network Scorecard KPI, 177, 183

Reaction Scorecard KPI, 177, 180-181

sales outcomes, 186

Enterprise (Star Trek), 4

entertainment, social media as, 68

ESN (Enterprise Social Networks), 88, 169

blogs in, 171

discovery/innovation, 172

discussion forums, 172

employee-to-employee interactions, 172-173

Enterprise Graphs, components of, 174-175

IBM and, 170

knowledge, 172

PSD, 175

Activity Scorecard KPI, 177-180

assessing business benefits, 183-185

benefits of, 176

Eminence Scorecard KPI, 177, 181-182

future of Enterprise Graphs, 185-186

Network Scorecard KPI, 177, 183

Reaction Scorecard KPI, 177, 180-181

transparency of communication, 171

ESPN, 155

Europe, social media outlets, 75

evolving topics, 163-164, 206-209

expertise/profession and data analysis, 34

expressions (regular), 23

egrep, 25-27

filtering noisy data, 24-27

external social media (domain of analysis)

data at rest

deep analysis, 91-93

SSM, 90

data in motion, 88

ad hoc analysis, 90

deep analysis, 90

SSM, 89

F

Facebook, 13, 76

big data, sifting through, 71

consumer reaction analysis study, 205

data identification and type of content, 10

demographics, 77

fan pages, 77

groups, 78

identifying data in, 77

online communities, 12

public data, 15

sentiment analysis, 203

social aspect of social media, 68

social media as sharing, 69

timelines, 77

fan pages (Facebook), 77

feedback (objective), 31

feedback loops, 118

filtering

comments

age of author, 34, 41-42

bias, 31-32

eminence/popularity, 35, 42-44

gender, 34, 41-42

geography, 33, 39-41

IBM example, 35-37

language, 33, 39-41

objective feedback, 31

profession/expertise, 34

public versus employee comments, 31-32

roles (job), 35

satisfaction, 37-39

specific audiences, 35

data, 192

choosing filter words, 198

defining, 3

noisy data, 24-29

filters (stream computing), 127

finding

an audience, xxvi

big data, 69

the right data (social analytics process), 193-194

Forbes Magazine, 11, 65, 215

forecasting trends, 51-53

forums (discussion)

data identification and type of content, 12

ESN, 172

Foundation for Biomedical Research, 12

Friedlein, Ashley, 83

Fuechsel, George, 1

G

“garbage in, garbage out”, 1

gender and data analysis, 34, 41-42

Generation X, 34

geography, audience comments and data analysis, 33, 39-41

Gessner, Mila, 158

Goethe, Johann Wolfgang von, 157

Grammy Awards

hypothesis validation example (data analysis), 104, 112-113

Twitter and 2014 Grammy Awards, xxix

graphs

directed graphs and stream computing, 130-133

Enterprise Graphs

Analytics, 174

Analytics Services, 174

API, 186

components of, 174-175

Data Services, 174

Data Sources, 174

employee performance, 186

future of, 185-186

Graph Store, 174

PSD, 175-185

sales outcomes, 186

groups

Facebook groups, 78

groups (top word), xxv

H

Harvard Business Review, 212

Hawthorne, Nathaniel, 47

Holmes, Sherlock, xx

Holmes, Sr., Oliver Wendell, 195

House of Cards, xxx

Huffington Post, 13

Hurricane Sandy consumer reaction analysis study, 204-209

Hyde Park, London, 81

hypotheses, validating (data analysis), 103

Cannes Lions 2013 example, 104, 110-112

Grammy Awards example, 104, 112-113

youth unemployment example, 104

data identification/analysis, 105-108

results, 109-110

I

IBM, xxviii, 33

Always On Engagement Center, 125

Chinese factories and, 193-194

comment filtering example, 35-37

Content Analytics, 204

eminence/popularity and data analysis, 44

ESN and, 170

IBM Academy of Technology, 53

IBM BladeCenter, big data analysis example, 72

IBM Commerce, 227

IBM Connections, 94

IBM DeveloperWorks, 58

IBM InfoSphere Streams, 128

IBM Singapore, 51

IBM Watson, 162, 204

ICA, 207

Insight 2014 conference, 90, 172

Project Breadcrumb, 170

PSD, 170, 175

Activity Scorecard KPI, 177-180

assessing business benefits, 183-185

benefits of, 176

Eminence Scorecard KPI, 177, 181-182

future of Enterprise Graphs, 185-186

Network Scorecard KPI, 177, 183

Reaction Scorecard KPI, 177, 180-181

SMA, 204

data analytics case study, 235-240

deep analysis and, 158

evolving topics, 206-209

Social Listening, 158

SPL

data streams, 129

jobs, 129

operators, 129

PE, 129

ports, 129

tuples, 129-130, 134

Twitter and IBM-specific handles, 233

Watson, 162, 204

Watson Content Analytics, 207

IBMAmplify data analytics case study, 227-228

conclusions, 247

data analysis

first pass, 235-241

second pass, 243-244

data identification, 228, 231-235

interpreting information, 244-247

ICA (IBM Content Analytics), 207

IDC, ESN, 88

IDE (Integrated Development Environment) and stream computing, 129

identifying data

attributes of data, 7

language, 9

ownership of data, 14

region, 9

structure, 8

time, 14

type of content, 10-12

venue, 13

bias in, 6-7

case study, 228, 231-235

defining, xxiv, 1, 4

filtered data, 3

goal of, 3-4

hypothesis validation and, 105-108

information, defining, 3

knowledge, defining, 3-4

noisy data, defining, 3

relevancy of data, 5-7

social media outlets, 74

blogs, 80

Facebook, 77

information sharing sites, 78-79

microblogs, 79-80

professional networking sites, 75-76

RenRen, 78

social sites, 77-78

unprocessed data, 2

value pyramid, 3

wikis, 80

wisdom, defining, 3

identifying leads (deep analysis), 158-159

immediacy in social media, 47

India

growth of social media, xxviii

social media outlets, 75

information

defining, 3

data visualizations and information overload, 219

information sharing sites, identifying data in, 78-79

innovation/discovery in ESN, 172

Insight 2014 conference (IBM), 172

Instagram as “in the moment” media type, 47

instructions, data identification and type of content, 11

integrity of data, 150-155

internal social media (domain of analysis), 88

data at rest

deep analysis, 96-97

SSM, 96

data in motion, 94

ad hoc analysis, 95

deep analysis, 95

SSM, 94

Internet Statistics and Market Research Company eMarketer, xxviii

interpreting data, xxv, xxvi, xxvii, 244-247

“in the moment” media types, 47

investigation, social media as, xxiv

iPad, Twitter data collection/filtering example, 22-29

IT architects and data model development, 193

iterative methods and data analysis, 117-119

J

Japan, growth of social media, xxviii

JavaScript, JSON and stream computing, 133-136

J.D. Power, North America Airline Satisfaction Study, 39

JetBlue, positive/negative experiences and data analysis, 38

jobs

data analysis and job roles, 35

SPL, 129

.jpg files, wildcards, 23

JSON (JavaScript Object Notation) and stream computing, 133-136

K

Katz, Randy, 66

keywords

data identification and hypothesis validation (data analysis), 105-108

noisy data, filtering, 28-29

Kintz, Jarod, 14

Kipling, Rudyard, xxiv, xxv

knowledge

defining, 3-4

ESN

deconstructing the creation of, 172

redistribution of, 172

Kohirkar, Avinash, 43

KPI (Key Performance Indicators) in PSD

Activity Scorecard KPI, 177-180

Eminence Scorecard KPI, 177, 181-182

Network Scorecard KPI, 177, 183

Reaction Scorecard KPI, 177, 180-181

Kremer-Davidson, Shiri, 170

L

language

data analysis and, 33, 39-41

data attribute, 9

NLP, defining, xxx

Lazowska, Edward, 66

leads (deep analysis)

classifying, 160-161

identifying, 158-159

qualifying, 160-161

line charts, 216-218

LinkedIn, xxii, 13

data identification and type of content, 10

demographics, 76

identifying data in, 76

online communities, 12

sentiment analysis, 55, 76, 203

user profiles, 76

Linux and egrep, 25-27

location, audience comments and data analysis, 33, 39-41

London, England, 81

loops (feedback), 118

Lotus Notes Mail, 172

Lynd, Robert Staughton, 17

M

machine capacity (taxonomy of data analysis), 84-86, 90-91, 94-98

Maraboli, Steve, 121

marketing and data analysis, xxvi

matrixes (affinity), 244

Memon, Amina, 122

Merriam-Webster, xxvii, 4

Mexico, growth of social media, xxviii

microblogs, xxvi, 12. See also blogs

consumer reaction analysis study, 205

data identification, 12, 79-80

sentiment analysis, 203

Microsoft, defining big data, 66

Middle East, growth of social media, xxviii

modeling data, defining, xxv

modifying/customizing tools in the social analytics process, 201-203

motion, data at (states of data), 14

Murphy, Capt. Edward A., 189

Murphy’s Law, 189

N

NASA, defining big data, 65

natural resource, big data as, xxviii

near real-time data analysis, 86, 121-123

Neeleman, David, 38

negative/positive bias and data analysis, 31-32

negative/positive connotations, words that can have, 202

negative/positive experiences, 37-39

Netflix, xxx

nets, casting (data collection), 19-23

network architects and data model development, 192

Network Scorecard KPI in PSD, 177, 183

networking sites (professional), identifying data in, 75-76

news, data identification and type of content, 11

New York Times, 5

NHL playoffs, Comcast customer satisfaction, xxii

NIST (National Institute of Standards and Technology), defining big data, 66

NLP (Natural Language Processing), defining, xxx

Nobel Prize, 5-7, 193

noisy data

defining, 3

filtering

keywords, 28-29

regular expressions, 24-27

nonscoped datasets, sifting through big data, 71

O

Obama, President Barack, 14, 86

objective feedback, 31

observations in structured data, 64

Occupy Wall Street movement, 33

Olympics (Summer) data visualization scaling example, 215

online communities, 12

operators (SPL), 129

Oracle Corporation, defining big data, 66

overloaded information in data visualizations, 219

ownership of data (data attribute), 14

P

Pakistan, 5

Pandya, Aroop, 170

Paradox of Choice: Why More Is Less, The, 70

PE (Processing Elements), SPL, 129

Pepsi, 9

performance (employee) and Enterprise Graphs, 186

PETA (People for the Ethical Treatment of Animals), 12

Pew Research Center, social media traffic, 41

photos/pictures as a type of content (data attribute), 10

phrases (top word), xxv

Picasso, Pablo, 211

pictures/photos as a type of content (data attribute), 10

pie charts, 213-214

Pinterest, data identification and type of content, 10

Plurad, Jason, 170

popularity/eminence and data analysis, 35, 42-44

ports (SPL), 129

positive/negative bias and data analysis, 31-32

positive/negative connotations, words that can have, 202

positive/negative experiences and data analysis, 37-39

Post-World War II baby boom, 34

predictive analytics

defining, 49

descriptive analytics versus, 48-49

sentiment and, 51-53

trend forecasting, 51-53

presidential debates

2008, 122-124

2012, 14

presidential election (2012), 86

Press, Gil, 65

press releases, data identification and type of content, 11

privacy and employees, 170

private data, 15

professional networking sites, identifying data in, 75-76

profession/expertise and data analysis, 34

Project Breadcrumb (IBM), 170

proprietary data, 15

PSD (Personal Social Dashboard), 170, 175

benefits of, 176

business benefits, assessing, 183-185

Enterprise Graphs, the future of, 185-186

KPI

Activity Scorecard, 177-180

Eminence Scorecard, 177, 181-182

Network Scorecard, 177, 183

Reaction Scorecard, 177, 180-181

public data, 15

public versus employee comments, 31-32

pyramid of data value, 3, 18

Q

qualifying leads (deep analysis), 160-161

quantitative forecasting, 51-53

questions, posing (social analytics process), 190

R

raw data, structuring, 61-62

Reaction Scorecard KPI in PSD, 177, 180-181

real-time data analytics, 121

2008 presidential debates, 122-124

conference data, 138-139

as early warning system, 139

IBM Always On Engagement Center, 125

near real-time analytics versus, 123

real-time views, xxv

stream computing, 128

directed graphs, 130-133

SPL, 129-130

SSM and, 131-136

value of, 122, 125, 138-139

redistribution of knowledge (ESN), 172

refining data (social analytics process), 192

clear communication, 195-198

data duplication, 198-200

region (data attribute), 9

regular expressions, 23

egrep, 25-27

noisy data, filtering, 24-27

Reilly, Rick, 155

Reisner, Rebecca, xxi

relationship matrixes (deep analysis), xxv, 92

relevancy of data and the data identification process, 5-7

RenRen, 76-78

representing data. See data modeling

rest, data at (states of data), 14

REST (Representational State Transfer), SSM and stream computing, 132

Robbins, Naomi, 215

Robinson, David, 170

roles (job) and data analysis, 35

Rometty, Ginni, xxviii

Romney, Mitt, 14, 86

Royal Holloway University of London, 122

Russia, growth of social media, xxviii

S

sales outcomes and Enterprise Graphs, 186

Salmon of Doubt, The, 31

Sandy (Hurricane) consumer reaction analysis study, 204-209

SapphireNow, big data analysis example, 74

satisfaction

customer satisfaction

Comcast, xxi, xxii

Twitter, xxi, xxii

data analysis and, 37-39

scaling issues with data visualization, 215

scatter plots, 218

Schwartz, Barry, 70

Science Magazine, 11

scoped datasets, sifting through big data, 71

Scott, Chief Engineer Montgomery (Star Trek), 4

selecting tools in the social analytics process, 204

sentiment analysis, 202

defining, xxx

descriptive analytics and, 55-57

LinkedIn and, 76

microblogs, 203

predictive analytics and, 51-53

seven attributes of data, 7

language, 9

ownership of data, 14

region, 9

structure, 8

time, 14

type of content, 10

blogs, 12

discussion forums, 12

instructions, 11

microblogs, 12

news, 11

press releases, 11

wikis, 12

venue, 13

sharing, social media as a way of, 69

Shirk, Adam Hull, 189

sifting through big data

nonscoped/scoped datasets, 71

paradox of choice, 70, 74

signal-to-noise ratio, 71

signal-to-noise ratio, sifting through big data, 71

Simple Social Metrics, 53

SMA (Social Media Analytics), 192, 204

affinity analysis, 165-167

Calgary Floods project, 164-167

data analytics case study, 235-240

deep analysis and, 158

evolving topics, 163-164, 206-209

SnapChat, data identification and type of content, 10

social analytics

process of, 190-192

choosing filter words, 198

clear communication, 195-198

configuring tools, 192

consumer reaction study, 204-209

customizing/modifying tools, 201-203

data duplication, 198-200

developing a data model, 192

filtering data, 192, 198

finding the right data, 193-194

gathering data, 191-194

posing questions, 190

refining data, 192, 195-200

selecting tools, 204

troubleshooting

collecting data, 193-194

consumer reaction analysis, 204-209

customizing/modifying tools, 201-203

filtering data, 198

refining data, 195-200

selecting tools, 204

Social Listening (IBM), 158

social media

as a way of sharing, 69

big data as, 67

entertainment, 68

sharing, 69

social aspect, 68

China, 74

as conversation, xx, xxi, xxii, xxiii

defining, xx, xxvii, 12

as entertainment, 68

Europe, 75

external social media (domain of analysis), 88

ad hoc analysis, 90

deep analysis, 90-93

SSM, 89-90

growth of, xxviii

identifying data in, 74

blogs, 80

information sharing sites, 78-79

microblogs, 79-80

professional networking sites, 75-76

social sites, 77-78

wikis, 80

India, 75

internal social media (domain of analysis), 88

ad hoc analysis, 95

deep analysis, 95-97

SSM, 94-96

as investigation, xxiv

social aspect of, 68

social sites, identifying data in, 77-78

Solis, Brian, 169

South Africa, 9

South Korea, growth of social media, xxviii

Speakers’ Corner (Hyde Park, London), 81

specific audiences and data analysis, 35

SPL (Streams Processing Language)

applications, 129-130

data streams, 129

jobs, 129

operators, 129

PE, 129

ports, 129

tuples, 129-130, 134

Sprout Social, positive/negative experiences and data analysis, 38

SSM (Simple Social Metrics), 85, 124

data at rest

external social media, 90

internal social media, 96

data in motion

external social media, 89

internal social media, 94

stream computing and, 131-132

classifying data, 135-136

JSON, 133-136

word clouds, 136

Stapp, Dr. John Paul, 189

Star Trek, 4

Stikeleather, Jim, 212

stream computing, 126

directed graphs, 130-133

filters, 127

IBM InfoSphere Streams, 128

real-time data analytics, 128

REST and, 132

SPL

applications, 129-130

data streams, 129

jobs, 129

operators, 129

PE, 129

ports, 129

tuples, 129-130, 134

SSM and, 131-132

classifying data, 135-136

JSON, 133-136

word clouds, 136

stream components, 128-130

Streams Studio IDE, 129

structured data, 8

attributes in, 64

context’s role in, 63

defining, 63-64

observations in, 64

raw data example, 61-62

unstructured data versus, 63-64

suggested action phase (deep analysis), 161-163

Summer Olympics data visualization scaling example, 215

Super Bowl and Twitter, 8

system architects and data model development, 192

T

Taliban, 5

target audience, determining

age of author, 34, 41-42

bias, 31-32

eminence/popularity, 35, 42-44

gender, 34, 41-42

geography, 33, 39-41

IBM example, 35-37

language, 33, 39-41

objective feedback, 31

profession/expertise, 34

public versus employee comments, 31-32

roles (job), 35

satisfaction, 37-39

specific audiences, 35

taxonomy of data analysis, 83

depth of analysis, 84-85

domain of analysis, 84, 169

external social media, 88-90

internal social media, 88, 94-96

duration of analysis, 90-91, 96

machine capacity, 84-86, 90-91, 94-98

velocity of data, 84

data at rest, 100-101

data in motion, 99

TED Talks, 170

tennis, 35

Te’o, Manti, 154

text as a type of content (data attribute), 10

themes, discovering (data analysis), 103, 113-117

Thirteen Reasons Why, 48

“Three Elements of Successful Data Visualizations, The”, 212

time (data attribute), 14

Time Magazine, 13

timelines (Facebook), 77

timing

data analytics and, 57-58

“in the moment” media types, 47

Tolkien, J. R. R., 141

topics

discovering (data analysis), 103, 113-117

evolving, 163-164, 206-209

top word groups/phrases, xxv

transparency of communication (ESN), 171

trends

discovering (data analysis), 103, 113-117

forecasting, 51-53

topics in social media, 47

troubleshooting

data visualizations

3D, 220-221

color, 221

information overload, 219

social analytics process

collecting data, 193-194

consumer reaction analysis, 204-209

customizing/modifying tools, 201-203

filtering data, 198

refining data, 195-200

selecting tools, 204

Tumblr, sifting through big data, 71

tuples (SPL), 129-130, 134

Twitter, xxvi, 13

animal testing debate, 12

Apple example and data collection, 22-29

as “in the moment” media type, 47

Citibank and, 197

Comcast and, xxi, xxii

consumer reaction analysis study, 205

customer satisfaction, xxi, xxii

data analysis, xxv

demographics, 80

eminence/popularity and data analysis, 44

Grammy Awards (2014), xxix

IBM-specific handles, 233

identifying data in, 80

positive/negative experiences and data analysis, 38

public data, 15

sentiment analysis, 203

social media as sharing, 69

SSM and, 85

Super Bowl, 8

type of content (data attribute), 10

blogs, 12

discussion forums, 12

instructions, 11

microblogs, 12

news, 11

press releases, 11

wikis, 12

Tzu, Sun, xxvi

U

unemployment (youth), hypothesis validation example (data analysis), 104

data identification/analysis, 105-108

results, 109-110

unfiltered (noisy) data

defining, 3

processing

keywords, 28-29

regular expressions, 24-27

uniqueness of data, 200

United Kingdom, 9

United Nations, 7

United States

growth of social media, xxviii

presidential debates

2008, 122-124

2012, 14

presidential election (2012), 86

unprocessed data, defining, 2

unstructured data, 8

data visualizations, 222-225

defining, 64

raw data example, 61-62

structured data versus, 63-64

user profiles (LinkedIn), 76

US Open (Tennis), 35

V

validating

data, 20-23

a hypothesis (data analysis), 103

Cannes Lions 2013 example, 104, 110-112

Grammy Awards example, 104, 112-113

youth unemployment example, 104-110

valuing data

big data, defining, 66

value pyramid, 3, 18

variety and defining big data, 66

velocity

big data, defining, 66

of data (taxonomy of data analysis), 84

data at rest, 100-101

data in motion, 99

external social media (domain of analysis), 88

ad hoc analysis, 90

deep analysis, 90

SSM, 89

internal social media (domain of analysis)

ad hoc analysis, 95

SSM, 94

venue (data attribute), 13

veracity and defining big data, 66, 69

video as a type of content (data attribute), 10

viewing data in real time (data analysis), xxv

Vine as “in the moment” media type, 47

visualizing data, xxv, 211-212

3D, 220-221

bar charts, 214-215

color, 221

effectiveness of, 213

information overload, 219

line charts, 216-218

pie charts, 213-214

scaling issues, 215

scatter plots, 218

troubleshooting, 219-221

unstructured data, 222-225

word clouds, 224-225

volume and defining big data, 66

W

Wallace, Marie, 170-172

Watson Content Analytics (IBM), 207

Watson, Dr., xx

Watson (IBM), 162, 204

web page visits, computing, 20-21

Western Governors University, 12

wheat, separating from chaff, 17

Whiting, Anita, 68

Why Greatness Cannot Be Planned, 211

wikis

data identification and type of content, 12

identifying data in, 80

wildcards, 23

Williams, David, 68

Winfrey, Oprah, 35, 144, 193, 222

wisdom, defining, 3

Wonder Bread, 232

Wong, Kyle, 11

Wong, Shara LY, 51

word clouds, xxv, xxx, 224-225

defining, 153

stream computing and, 136

word groups/phrases, xxv

World War II baby boom, 34

“worm” graph (2008 presidential debates), 122

Y

Yousafzai, Malala, 5-7

youth unemployment, hypothesis validation example (data analysis), 104

data identification/analysis, 105-108

results, 109-110

YouTube

data identification and type of content, 10

demographics, 78-79

identifying data in, 78-79

JetBlue and customer positive/negative experiences, 39
