Aggregation (or aggregate)
Alignment
Anaconda
AnacondaCon conference, 364
Anscombe’s quartet
apply
*args, function parameter, 408
Arrays
Arrow, 58
for dates and times, 280
assert, checking data assembly with, 166
Assignment
astype method
Attributes
Average cluster algorithm, in hierarchical clustering, 353–354
Calculations
Carpentries, 364
CAS (computer algebra systems), 359
category
Centroid cluster algorithm, in hierarchical clustering, 353–354
Characters
Clustering
Code
Columns
Columns, with multiple variables
Columns, with values not variables
Command line
Comma-separated values. See CSV (comma-separated values)
Complete cluster algorithm, in hierarchical clustering, 352
Comprehensions
Computer algebra systems (CAS), 359
Concatenation (concat)
concurrent.futures, 360
conda
Confidence interval, in linear regression example, 285
Containers
Conversion, of data types
Counting
Covariates
Cox proportional hazards model
C printf style formatting, 429
cProfile, profiling code, 360
Cross-validation
CSV (comma-separated values)
Cumulative sum (cumsum), 199
cython, performance-related library, 360
Dash, 362
Dashboards, 362
Dask library, 360
Data assembly
DataFrame
diagnostics (See Model diagnostics)
generalized linear (See GLM (generalized linear models))
linear (See Linear models)
Data normalization
Data sets
Data structures
Data types (dtype)
datetime
Day, extracting date components from datetime object, 254–257
Daylight saving time, 278
Density plots
Diagnostics. See Model diagnostics
Dictionaries (dict)
dropna parameter
Dropping (drop)
dtype. See Data types (dtype)
Ganssle, Paul, 280
Gapminder data set, 4
Generalized linear models (GLM). See also Linear regression models
Generators
get
Git for Windows, 377
GitHub, 365
GLM (generalized linear models). See Generalized linear models
Groups
Guido, Sarah, 241
Hendryx-Parker, Calvin, 387
hexbin plot
Hierarchical clustering
Histograms
Ibis, 361
id, unique identifiers, 220
IDEs (integrated development environments), Python, 382
iloc
Importing (import). See also Exporting/importing data
Indemics (Interactive Epidemic Simulation) data set, 208
Indices
Installation
Integers (int/int64)
Integrated development environments (IDEs), 382
Interactive Epidemic Simulation (Indemics) data set, 196
IPython (ipython)
Iteration. See Loops (for loop)
Lander, Jared, 241
Leap years/leap seconds, 278
Libraries. See also by individual types
Linear regression models. See also GLM (generalized linear models)
Linux
Lists (list)
lmplot
Loading data
loc
Logistic regression
Loops (for loop)
Mac
Machine Learning Operations (MLOps), 362
Many-to-one merges, 163
Markham, Kevin, 422
matplotlib library
Mean (mean)
Meetups, 363
melt function
Merges (merge)
Methods
Miniconda, 374
Mirjalili, Vahid, 241
Missing data (NaN values)
MLOps (Machine Learning Operations), 362
Model diagnostics
Models
generalized linear (See GLM (generalized linear models))
linear (See Linear models)
Month, extracting date components from datetime object, 254–257
Müller, Andreas, 241
Multiple regression
Multivariate statistics
NaN. See Missing data (NaN values)
Na value, missing data with built-in, 218
ndarray
Negative numbers, slicing values from end of container, 230–231
Normal distribution
numba library
Numbers (numeric)
numpy library
nunique method, grouped frequency counts, 27
Packages
Pairwise relationships (pairplot)
pandera, 361
Panel, 362
Parameters
Patterns. See also Regular expressions (regex)
pd
PEP8 (Python Enhancement Proposal 8), 393
Performance
Pivot/unpivot
Plots/plotting (plot)
PLOT_TYPE functions, 111
Point representation, Anscombe’s data set, 67
Poisson regression
Polars, 360
Pryke, Benjamin, 422
PyCon conference, 364
PyData, 364
pyenv, 374
pyjanitor, 361
Python
Anaconda distribution, 385
assert, 166
command line and text editor, 381
comparing Pandas types with, 7
conferences, 364
enhanced features in Pandas, 3
IDEs (integrated development environments), 382
jupyter command, 382
as object-oriented language, 417
scientific computing stack, 350
working with objects, 5
as zero-indexed language, 399
Python Enhancement Proposal 8 (PEP8), 393
Ranges (range)
Raschka, Sebastian, 241
R ecosystem, 362
Regex. See Regular expressions (regex)
Regression
keeping labels in sklearn models, 293
Regular expressions (regex)
Regularization
reindex
method, reindexing as source of missing values, 209–210
Ridge
Rows
Scalars, 40
Scatterplots
Scientific computing stack, 350
SciPy conference, 364
scipy library
Scripts
seaborn
Searches. See Find
Semicolon (;), types of delimiters, 55
Serialization, serialize and save data in binary format, 53
Series
Shape
Shiny for Python, 362
Simple linear regression
Single cluster algorithm, in hierarchical clustering, 352–353
Siuba, 360
size attribute, Series, 35
sklearn library
keeping labels in sklearn models, 293
Slicing
snakeviz, profiling code, 360
sns.distplot, creating histograms, 81
Special characters, regular expressions, 240
Split–apply–combine, 175
split method
Spyder IDE, 382
SQL
Square brackets ([])
Statistical graphics
seaborn library, 78
Statistics
statsmodels library
Storage
str accessor, 123
Streamlit, 362
Strings (string)
Subplot syntax, 68
Subsets/subsetting
sum
Summarization. See Aggregation (or aggregate)
SymPy, 359
Tables
tail, returning last row, 13
T attribute, Series, 35
Terminal application, Mac, 377
Text. See also Characters; Strings (string)
Tidy data
tidyverse, 360
Time. See datetime
timedelta object
timeit function, timing execution of statements or expressions, 360, 427–428
to_csv method, 55
to_excel method, 56
to_feather method, 57
Transform (transform)
True, 434
Tuples (tuple), 396
type function, working with Python objects, 5
Values (value)
columns containing values not variables (See Columns, with values not variables)
creating DataFrame values, 34
dropping, 52
missing (See Missing data (NaN values))
Series attributes, 35
VanderPlas, Jake, 359
Variables
adding covariates to linear models, 324
bi-variable statistics (See Bivariate statistics)
calculations involving multiple, 191
columns containing multiple (See Columns, with multiple variables)
columns containing values not variables (See Columns, with values not variables)
multiple variable statistics (See Multivariate statistics)
single variable statistics (See Univariate statistics)
statsmodels library used with categorical variables, 289–291
Vectors (vectorize)
Violin plots
Visualization
Voilà, 362