Index
A
- A/B tests, Example: Running an A/B Test
- accuracy, Correctness
- activation functions, Other Activation Functions
- AllenNLP, For Further Exploration
- Altair library, For Further Exploration
- Anaconda Python distribution, Getting Python
- args, args and kwargs
- argument unpacking, zip and Argument Unpacking
- arithmetic operations, Vectors
- arrays, Lists
- artificial neural networks, Neural Networks
- assert statements, Automated Testing and assert
- automated testing, Automated Testing and assert
- average (mean), Central Tendencies
B
- backpropagation, Backpropagation
- bagging, Random Forests
- bar charts, Bar Charts-Line Charts
- batch gradient descent, Minibatch and Stochastic Gradient Descent
- Bayesian inference, Bayesian Inference
- Bayes’s theorem, Bayes’s Theorem
- Beautiful Soup library, HTML and the Parsing Thereof
- bell-shaped curve, The Normal Distribution
- BernoulliNB model, For Further Exploration
- Beta distributions, Bayesian Inference
- betweenness centrality, Betweenness Centrality-Betweenness Centrality
- bias input, Feed-Forward Neural Networks
- bias-variance tradeoff, The Bias-Variance Tradeoff
- biased data, Biased Data
- bigram models, n-Gram Language Models
- binary judgments, Correctness
- Binomial distributions, Bayesian Inference
- binomial random variables, The Central Limit Theorem
- Bokeh library, For Further Exploration, Visualization
- Booleans, Truthiness
- bootstrap aggregating, Random Forests
- bootstrapping, Digression: The Bootstrap
- bottom-up hierarchical clustering, Bottom-Up Hierarchical Clustering-Bottom-Up Hierarchical Clustering
- breadth-first search, Betweenness Centrality
- business models, Modeling
- buzzword clouds, Word Clouds
C
- causation, Correlation and Causation
- central limit theorem, The Central Limit Theorem
- central tendencies, Central Tendencies
- centrality
- character-level RNNs, Example: Using a Character-Level RNN
- charts
- classes, Object-Oriented Programming
- classification trees, What Is a Decision Tree?
- cleaning data, Cleaning and Munging
- closeness centrality, Betweenness Centrality
- clustering
- code examples, obtaining and using, Using Code Examples, Data Science
- coefficient of determination, The Model, Goodness of Fit
- comma-separated files, Delimited Files
- conda package manager, Virtual Environments
- conditional probability, Conditional Probability
- confidence intervals, Confidence Intervals
- confounding variables, Simpson’s Paradox
- confusion matrix, Correctness
- continuity corrections, p-Values
- continuous bag-of-words (CBOW), Word Vectors
- continuous distributions, Continuous Distributions
- control flow, Control Flow
- convolutional layers, Example: MNIST
- correctness, Correctness
- correlation, Correlation-Correlation and Causation
- correlation matrix, Many Dimensions
- cosine similarity, Word Vectors
- Counter instances, Counters
- Coursera, For Further Exploration, For Further Exploration
- covariance, Correlation
- CREATE TABLE statement, CREATE TABLE and INSERT
- cross-entropy loss function, Softmaxes and Cross-Entropy
- csv module (Python), Delimited Files
- cumulative distribution function (CDF), Continuous Distributions
- curse of dimensionality, The Curse of Dimensionality
D
- D3-style visualizations, For Further Exploration
- D3.js library, For Further Exploration, Visualization
- data
- collecting
- describing single sets of
- dispersion, Dispersion
- histograms, Describing a Single Set of Data
- largest and smallest values, Describing a Single Set of Data
- mean (average), Central Tendencies
- median, Central Tendencies
- mode, Central Tendencies
- number of data points, Describing a Single Set of Data
- quantile, Central Tendencies
- specific positions of values, Describing a Single Set of Data
- standard deviation, Dispersion
- variance, Dispersion
- working with
- cleaning and munging, Cleaning and Munging
- dataclasses, Dataclasses
- dimensionality reduction, Dimensionality Reduction
- exploring your data, Exploring Your Data-Many Dimensions
- generating progress bars, An Aside: tqdm
- manipulating data, Manipulating Data
- rescaling, Rescaling
- resources for learning about, For Further Exploration
- tools for, For Further Exploration
- using namedtuple class, Using NamedTuples
- data ethics
- biased data, Biased Data
- censorship, Recommendations
- definition of term, No, Really, What Is Data Ethics?
- examples of data misuse, What Is Data Ethics?
- government restrictions, Collaboration
- issues resulting from bad products, Building Bad Data Products
- model selection, Interpretability
- offensive predictions, Building Bad Data Products
- privacy, Data Protection
- resources for learning about, For Further Exploration
- tradeoffs between accuracy and fairness, Trading Off Accuracy and Fairness
- wide-reaching effects of data science, Should I Care About Data Ethics?
- data mining, What Is Machine Learning?
- data science
- data visualization
- Data.gov, Find Data
- databases and SQL
- CREATE TABLE and INSERT, CREATE TABLE and INSERT
- DELETE, DELETE
- GROUP BY, GROUP BY
- indexes, Indexes
- JOIN, JOIN
- NoSQL databases, NoSQL
- ORDER BY, ORDER BY
- query optimization, Query Optimization
- resources for learning about, For Further Exploration
- SELECT, SELECT
- subqueries, Subqueries
- tools, For Further Exploration
- UPDATE, UPDATE
- dataclasses, Dataclasses
- Dataset Search, Find Data
- de-meaning data, Dimensionality Reduction
- decision boundary, Support Vector Machines
- decision nodes, Creating a Decision Tree
- decision trees
- benefits and drawbacks of, What Is a Decision Tree?
- creating, Creating a Decision Tree
- decision paths in, What Is a Decision Tree?
- entropy and, Entropy
- entropy of partitions, The Entropy of a Partition
- gradient boosted decision trees, For Further Exploration
- implementing, Putting It All Together
- random forests technique, Random Forests
- resources for learning about, For Further Exploration
- tools for, For Further Exploration
- types of, What Is a Decision Tree?
- deep learning
- definition of term, Deep Learning
- dropout, Dropout
- Fizz Buzz example, Example: FizzBuzz Revisited
- Layers abstraction, The Layer Abstraction
- linear layer, The Linear Layer
- loss and optimization, Loss and Optimization
- MNIST example, Example: MNIST-Example: MNIST
- neural networks as sequences of layers, Neural Networks as a Sequence of Layers
- other activation functions, Other Activation Functions
- resources for learning about, For Further Exploration
- saving and loading models, Saving and Loading Models
- softmaxes and cross-entropy, Softmaxes and Cross-Entropy
- tensors, The Tensor
- tools for, For Further Exploration, Deep Learning
- XOR example, Example: XOR Revisited
- defaultdict, defaultdict
- degree centrality, Finding Key Connectors
- DELETE statement, DELETE
- delimited files, Delimited Files
- dependence, Dependence and Independence
- dictionaries, Dictionaries
- dimensionality reduction, Dimensionality Reduction
- directed edges, Network Analysis
- directed graphs, Directed Graphs and PageRank
- discrete distributions, Continuous Distributions
- dispersion, Dispersion
- distributional similarity, Biased Data
- domain expertise, Feature Extraction and Selection
- dot product, Vectors
- Dropout layer, Dropout
- dunder methods, Object-Oriented Programming
- dynamically typed languages, Type Annotations
E
- edges, Network Analysis
- eigenvector centrality
- elements
- creating sets of, Sets
- finding in collections, Sets
- embedding layer, Word Vectors
- ensemble learning, Random Forests
- entropy, Entropy
- enumerate function, Iterables and Generators
- equivalence classes, Using Our Model
- ethics, No, Really, What Is Data Ethics?
- exceptions, Exceptions
F
- f-strings, Strings
- F1 scores, Correctness
- false negatives/false positives, Example: Flipping a Coin, Correctness
- feature extraction and selection, Feature Extraction and Selection
- features, Feed-Forward Neural Networks
- feed-forward neural networks, Feed-Forward Neural Networks
- files
- first-class functions, Functions
- Fizz Buzz example, Example: Fizz Buzz, Example: FizzBuzz Revisited
- floating-point numbers, A More Sophisticated Spam Filter
- functional programming, Functional Programming
- functions, Functions
I
- ID3 algorithm, Creating a Decision Tree
- identity matrix, Matrices
- if statements, Control Flow
- if-then-else statements, Control Flow
- indentation, tabs versus spaces, Whitespace Formatting
- independence, Dependence and Independence
- inference (see hypothesis and inference)
- INSERT statement, CREATE TABLE and INSERT
- interactive visualizations, Visualization
- IPython shell, Virtual Environments, For Further Exploration, IPython
- Iris dataset example, Example: The Iris Dataset
- item-based collaborative filtering, Item-Based Collaborative Filtering
- iterables, Iterables and Generators
L
- language models, n-Gram Language Models
- Latent Dirichlet Allocation (LDA), Topic Modeling
- Layers abstraction
- layers of neurons, Feed-Forward Neural Networks
- least squares solution, The Model, Further Assumptions of the Least Squares Model
- LIBSVM, For Further Investigation
- line charts, matplotlib, Line Charts
- linear algebra
- linear independence, Further Assumptions of the Least Squares Model
- linear layer, The Linear Layer
- linear_model module, For Further Exploration
- lists
- appending items to, Lists
- checking list membership, Lists
- concatenating, Lists
- getting nth element of, Lists
- slicing, Lists
- sorting, Sorting
- transforming, List Comprehensions
- unpacking, Lists
- using as vectors, Vectors
- versus arrays, Lists
- logistic regression
- loss functions, Using Gradient Descent to Fit Models, Loss and Optimization, Softmaxes and Cross-Entropy
- LSTM (long short-term memory), Recurrent Neural Networks
M
- machine learning
- magnitude, computing, Vectors
- manipulating data, Manipulating Data
- MapReduce
- mathematics
- matplotlib library, matplotlib, For Further Exploration
- matrices, Matrices-Matrices
- matrix decomposition functions, For Further Exploration
- matrix factorization, Matrix Factorization-Matrix Factorization
- matrix multiplication, Matrix Multiplication, Example: Matrix Multiplication
- maximum likelihood estimation, Maximum Likelihood Estimation
- mean (average), Central Tendencies
- mean squared error, Using Gradient Descent to Fit Models
- median, Central Tendencies
- meetups example (clustering), Example: Meetups
- member functions, Object-Oriented Programming
- methods
- minibatch gradient descent, Minibatch and Stochastic Gradient Descent
- MNIST dataset example, Example: MNIST-Example: MNIST
- mode, Central Tendencies
- modeling, Modeling, Topic Modeling-Topic Modeling
- models of language, n-Gram Language Models
- modules, Modules
- momentum, Loss and Optimization
- MongoDB, For Further Exploration
- most_common method, Counters
- MovieLens 100k dataset, Matrix Factorization
- multi-dimensional datasets, Many Dimensions
- multiline strings, Strings
- multiple regression
- assumptions of least square model, Further Assumptions of the Least Squares Model
- bootstrapping new datasets, Digression: The Bootstrap
- goodness of fit, Goodness of Fit
- model fitting, Fitting the Model
- model for, The Model
- model interpretation, Interpreting the Model
- regularization, Regularization
- resources for learning about, For Further Exploration
- standard errors of regression coefficients, Standard Errors of Regression Coefficients
- tools for, For Further Exploration
- munging data, Cleaning and Munging
- MySQL, For Further Exploration
N
- Naive Bayes
- namedtuple class, Using NamedTuples
- natural language processing (NLP)
- character-level RNN example, Example: Using a Character-Level RNN
- definition of term, Natural Language Processing
- Gibbs sampling, An Aside: Gibbs Sampling
- grammars, Grammars
- n-gram language models, n-Gram Language Models
- recurrent neural networks (RNNs), Recurrent Neural Networks
- resources for learning about, For Further Exploration
- tools for, For Further Exploration
- topic modeling, Topic Modeling-Topic Modeling
- word clouds, Word Clouds
- word vectors, Word Vectors-Word Vectors
- nearest neighbors classification, k-Nearest Neighbors
- Netflix Prize, For Further Exploration
- network analysis
- NetworkX, For Further Exploration
- neural networks
- NLTK, For Further Exploration
- nodes, Network Analysis
- None value, Truthiness
- nonrepresentative data, Biased Data
- normal distribution, The Normal Distribution
- NoSQL databases, NoSQL
- null hypothesis, Statistical Hypothesis Testing
- null values, Truthiness
- NumPy library, Vectors, For Further Exploration, NumPy
P
- p-hacking, p-Hacking
- p-values, p-Values
- PageRank, Directed Graphs and PageRank
- pandas, For Further Exploration, Delimited Files, For Further Exploration, For Further Exploration, pandas
- parameterized models, What Is Machine Learning?
- partial derivatives, Estimating the Gradient
- perceptrons, Perceptrons
- pip package manager, Virtual Environments
- popularity-based recommender systems, Recommending What’s Popular
- Porter Stemmer, Using Our Model
- posterior distributions, Bayesian Inference
- PostgreSQL, For Further Exploration
- precision, Correctness
- predictive models
- decision trees, Decision Trees-For Further Exploration
- definition of modeling, Modeling
- guarding against potentially offensive predictions, Building Bad Data Products
- k-nearest neighbors, k-Nearest Neighbors-For Further Exploration
- logistic regression, Logistic Regression-Support Vector Machines
- machine learning and, What Is Machine Learning?
- multiple regression, Multiple Regression-For Further Exploration
- neural networks, Neural Networks-For Further Exploration
- paid accounts example, Paid Accounts
- salaries and experience example, Salaries and Experience-Salaries and Experience
- simple linear regression, Simple Linear Regression-For Further Exploration
- tradeoffs between accuracy and fairness, Trading Off Accuracy and Fairness
- types of models, What Is Machine Learning?
- principal component analysis (PCA), Dimensionality Reduction
- prior distributions, Bayesian Inference
- private methods, Object-Oriented Programming
- probability
- Bayes’s theorem, Bayes’s Theorem
- central limit theorem, The Central Limit Theorem
- conditional probability, Conditional Probability
- continuous distributions, Continuous Distributions
- definition of term, Probability
- dependence and independence, Dependence and Independence
- normal distribution, The Normal Distribution
- random variables, Random Variables
- resources for learning, For Further Exploration
- tools for, For Further Exploration
- probability density function (PDF), Continuous Distributions
- progress bars, generating, An Aside: tqdm
- pseudocounts, A More Sophisticated Spam Filter
- Python
- args, args and kwargs
- argument unpacking, zip and Argument Unpacking
- automated testing and assert statements, Automated Testing and assert
- benefits of for data science, From Scratch
- control flow, Control Flow
- Counter instances, Counters
- csv module, Delimited Files
- defaultdict, defaultdict
- dictionaries, Dictionaries
- downloading and installing, Getting Python
- exceptions, Exceptions
- functional programming, Functional Programming
- functions, Functions
- iterables and generators, Iterables and Generators
- json module, JSON and XML
- kwargs, args and kwargs
- list comprehensions, List Comprehensions
- lists, Lists
- modules, Modules
- object-oriented programming, Object-Oriented Programming
- randomness, Randomness
- regular expressions, Regular Expressions
- sets, Sets
- sorting, Sorting
- statsmodels module, For Further Exploration
- strings, Strings
- truthiness, Truthiness
- tuples, Tuples
- tutorials and documentation, For Further Exploration
- type annotations, Type Annotations-How to Write Type Annotations
- versions, Getting Python
- virtual environments, Virtual Environments
- whitespace formatting, Whitespace Formatting
- Zen of Python, The Zen of Python
- zip function, zip and Argument Unpacking
- PyTorch, For Further Exploration, Deep Learning
R
- R
- R-squared, The Model, Goodness of Fit
- random forests technique, Random Forests
- random variables, Random Variables
- randomness, Randomness
- raw strings, Strings
- recall, Correctness
- recommender systems
- recurrent neural networks (RNNs), Recurrent Neural Networks
- regression coefficients, Standard Errors of Regression Coefficients
- regression trees, What Is a Decision Tree?
- regular expressions, Regular Expressions
- regularization, Regularization
- reinforcement models, What Is Machine Learning?
- relational databases, Databases and SQL
- requests library, HTML and the Parsing Thereof
- rescaling data, Rescaling
- robots.txt files, Example: Keeping Tabs on Congress
S
- scalar multiplication, Vectors
- scale, Rescaling
- scatterplot matrix, Many Dimensions
- scatterplots, Scatterplots-For Further Exploration
- scikit-learn, For Further Exploration, For Further Exploration, For Further Exploration, For Further Exploration, For Further Investigation, For Further Exploration, For Further Exploration, scikit-learn
- SciPy, For Further Exploration, For Further Exploration
- scipy.stats, For Further Exploration
- Scrapy, For Further Exploration
- seaborn, For Further Exploration, Visualization
- SELECT statement, SELECT
- semisupervised models, What Is Machine Learning?
- serialization, JSON and XML
- sets, Sets
- sigmoid function, Feed-Forward Neural Networks, Other Activation Functions
- significance, Example: Flipping a Coin
- simple linear regression
- Simpson’s paradox, Simpson’s Paradox
- skip-gram model, Word Vectors
- slicing lists, Lists
- softmax function, Softmaxes and Cross-Entropy
- sorting, Sorting
- spaCy, For Further Exploration
- spam filter example, Feature Extraction and Selection, Naive Bayes-Using Our Model
- SpamAssassin public corpus, Using Our Model
- SQLite, For Further Exploration
- standard deviation, Dispersion
- standard errors, Standard Errors of Regression Coefficients
- standard normal distribution, The Normal Distribution
- statically typed languages, Type Annotations
- statistical models of language, n-Gram Language Models
- statistics
- statsmodels, For Further Exploration
- status updates, analyzing, Example: Analyzing Status Updates
- stemmer functions, Using Our Model
- stochastic gradient descent, Minibatch and Stochastic Gradient Descent
- stride, Lists
- strings, Strings, JSON and XML
- Structured Query Language (SQL), Databases and SQL (see also databases and SQL)
- Student’s t-distribution, Standard Errors of Regression Coefficients
- Sum layer, Recurrent Neural Networks
- sum of squares, computing, Vectors
- supervised models, What Is Machine Learning?
- support vector machines, Support Vector Machines
- Surprise, For Further Exploration
- sys.stdin, stdin and stdout
- sys.stdout, stdin and stdout
T
- tab-separated files, Delimited Files
- tanh function, Other Activation Functions
- TensorFlow, Deep Learning
- tensors, The Tensor
- ternary operators, Control Flow
- test sets, Overfitting and Underfitting
- text files, The Basics of Text Files, JSON and XML
- topic modeling, Topic Modeling-Topic Modeling
- tqdm library, An Aside: tqdm
- training sets, Overfitting and Underfitting
- trigrams, n-Gram Language Models
- true positives/true negatives, Correctness
- truthiness, Truthiness
- tuples, Tuples, Using NamedTuples
- Twitter APIs, Example: Using the Twitter APIs-Using Twython
- two-dimensional datasets, Two Dimensions
- Twython library, Example: Using the Twitter APIs-Using Twython
- type 1/type 2 errors, Example: Flipping a Coin, Correctness
- type annotations, Type Annotations-How to Write Type Annotations
U
- unauthenticated APIs, Using an Unauthenticated API
- underfitting and overfitting, Overfitting and Underfitting
- underflow, A More Sophisticated Spam Filter
- undirected edges, Network Analysis
- uniform distributions, Continuous Distributions
- unit tests, Testing Our Model
- unpacking lists, Lists
- unsupervised learning, Clustering
- unsupervised models, What Is Machine Learning?
- UPDATE statement, UPDATE
- user-based collaborative filtering, User-Based Collaborative Filtering