1. Preparing and gathering data and knowledge
Chapter 1. Philosophies of data science
1.1. Data science and this book
1.3. Developer vs. data scientist
1.4. Do I need to be a software developer?
1.5. Do I need to know statistics?
1.6. Priorities: knowledge first, technology second, opinions third
Chapter 2. Setting goals by asking good questions
2.1. Listening to the customer
2.1.1. Resolving wishes and pragmatism
2.1.2. The customer is probably not a data scientist
2.1.3. Asking specific questions to uncover fact, not opinions
2.2. Ask good questions—of the data
2.2.1. Good questions are concrete in their assumptions
2.2.2. Good answers: measurable success without too much cost
2.3. Answering the question using data
2.3.1. Is the data relevant and sufficient?
2.3.2. Has someone done this before?
Chapter 3. Data all around us: the virtual wilderness
3.1. Data as the object of study
3.1.1. The users of computers and the internet became data generators
3.2. Where data might live, and how to interact with it
3.3.1. First step: Google search
3.3.2. Copyright and licensing
Chapter 4. Data wrangling: from capture to domestication
4.1. Case study: best all-time performances in track and field
4.2.1. Some types of messy data
4.2.2. Pretend you’re an algorithm
4.2.3. Keep imagining: what are the possible obstacles and uncertainties?
Chapter 5. Data assessment: poking and prodding
5.1. Example: the Enron email data set
5.2.2. Common descriptive statistics
5.3. Check assumptions about the data
5.3.1. Assumptions about the contents of the data
5.4. Looking for something specific
5.4.2. Characterize the examples: what makes them different?
5.5. Rough statistical analysis
2. Building a product with software and statistics
6.2. Reconsidering expectations and goals
Chapter 7. Statistics and modeling: concepts and foundations
7.1. How I think about statistics
7.2. Statistics: the field as it relates to data science
7.4. Statistical modeling and inference
7.4.1. Defining a statistical model
7.4.3. Quantifying uncertainty: randomness, variance, and error terms
7.5. Miscellaneous statistical methods
Chapter 8. Software: statistics in action
8.1. Spreadsheets and GUI-based applications
8.3. Choosing statistical software tools
8.3.1. Does the tool have an implementation of the methods?
8.3.5. Well documented is good
8.3.7. Interoperability is good
8.4. Translating statistics into software
Chapter 9. Supplementary software: bigger, faster, more efficient
9.2. High-performance computing
9.3.1. Types of cloud services
9.3.2. Benefits of cloud services
9.4.1. Types of big data technologies
9.4.2. Benefits of big data technologies
Chapter 10. Plan execution: putting it all together
10.1. Tips for executing the plan
10.1.1. If you’re a statistician
10.1.2. If you’re a software engineer
10.2. Modifying the plan in progress
10.2.1. Sometimes the goals change
10.3. Results: knowing when they’re good enough
10.3.1. Statistical significance
10.3.3. Reevaluating your original accuracy and significance goals
10.4. Case study: protocols for measurement of gene activity
10.4.3. What I needed to learn
3. Finishing off the product and wrapping up
Chapter 11. Delivering a product
11.1. Understanding your customer
11.2.3. Interactive graphical application
11.3.1. Make important, conclusive results prominent
11.3.2. Don’t include results that are virtually inconclusive
11.3.3. Include obvious disclaimers for less significant results
Chapter 12. After product delivery: problems and revisions
12.1. Problems with the product and its use
12.2.1. Feedback means someone is using your product
12.2.2. Feedback is not disapproval
Chapter 13. Wrapping up: putting the project away
13.1. Putting the project away neatly
13.2. Learning from the project
Exercises: Examples and Answers