anonymous functions, 80
Anscombe, F., 135
arguments, 30
arrays
changing values in, 91
copies, changing values in, 95
one-dimensional, 87
two-dimensional, 88
element-by-element operations, 91–92
one-dimensional, 87
sequences and, 91
setting type automatically, 97
setting type explicitly, 97–98
two-dimensional, 88
indexing and slicing, 90
views, 94
changing values in, 94
assignment statements, 17
attributes, 22
binomial distribution, 105–107
Boolean operators, 14, 58–59, 125
break statement, 64
break statements, 19
built-in types, 14
character classes, 209
datetime.date, 207
classifier classes, 166
collocations() method, 165
columns
creating, 128
updating, 129
comparison operators, 57–58, 93–94
compiling regular expressions, 211–212
compound statements, 55
structure, 56
concordance() method, 165
constructors
dict(), 38
list(), 29
tuple(), 29
context managers, 205
continue statements, 19
continuous distributions, 108
exponential distribution, 110
copies, changing values in, 95
corpus readers, 160
tokenizers, 161
corpuses, downloading, 166–167
creating
one-dimensional, 87
two-dimensional, 88
columns, 128
DataFrames, 114
from a file, 116
datetime object, 206
dictionaries, 38
DataFrames, 113
columns
creating, 128
updating, 129
creating, 114
from a file, 116
describe method, 118
exclude argument, 120
head method, 117
interacting with, 117
interactive display, 133
masking and filtering, 125–126
methods, 128
optimized access
by index, 124
sorting, 204
tail method, 118
datetime object, 207
creating, 206
setting the time zone, 207
translating strings to, 207
del() function, 40
delete statements, 18
describe method, 118
exclude argument, 120
dict() constructor, 38
checking for keys, 43
creating, 38
creating DataFrames from, 114–115
hash() method, 45
key/value pairs
adding, 39
updating, 39
dictionary comprehensions, 181
key_item, 42
difference() method, 51
discrete distributions, 105
binomial distribution, 105–107
disjoint sets, 48
dispersion_plot() method, 165–166
dot notation, 22
downloading, corpuses, 166–167
elif statement, 62
else statement, 61
equality operators, 56–57, 125
estimators, 156
exponential distribution, 110
expressions, 16
extend method, 31
figures, 136
fileids() method, 160
files
creating DataFrames from, 116
opening, 205
filter() function, 179
replacing with a list comprehension, 180
filtering, DataFrames, 125–126
find iterator, 211
findall() method, 165
flattening nested lists, 167
FreqDist class
built-in plot method, 164
methods, 164
frequency distributions, 161–162
frozensets, 53
f-strings, 34
functional programming, 173, 174–175
changing mutable data, 176–177
dictionary comprehensions, 181
filter() function, 179
generator(s), 182
lambda functions, 179
list comprehensions, 179
conditionals and, 181
multiple variables, 181
replacing map() and filter() with, 180
operator module, 179
inheriting, 174
state, 174
anonymous, 80
control statement, 68
datetime.now(), 206
del(), 40
lambda, 179
len, 27
max, 28
min, 28
nested, 77
as a parameter, 78
positional wildcard, 74
positional-only, 73
re.compile(), 211
re.findall(), 210
re.finditer(), 211
re.search(), 208
return statements, 75
reversed, 41
future statements, 20
generator(s), 182
global statements, 20
code cells, 9
Code Snippets, 11
existing collections and, 11
notebooks, managing, 10
named, 210
hash() method, 45
head method, 117
high-level programming languages, 15
index method, 28
indexing, 26
DataFrames and, 124
inheriting scope, 174
installing, NumPy, 86
instances, 188
interacting with DataFrame data, 117
intersections, 51
ints, 14
numerator attribute, 22
issuperset() method, 50
items() method, 40
JSON files, opening and reading, 205
Keras, 153
key_item view, 42
keys() method, 40
key/value pairs, 37
adding, 39
updating, 39
labels, DataFrames access and, 123–124
len function, 27
libraries. See also NumPy; SciPy
SciPy, 103
third-party, 85
visualization, matplotlib, 135–136
list comprehensions, 179
conditionals and, 181
multiple variables, 181
replacing map() and filter() with, 180
list() constructor, 29
lists, 29
adding and removing items, 30–31
creating DataFrames from, 115–116
flattening, 167
nested, 31
loops
break statement, 64
low-level programming languages, 15
machine learning, 153. See also Scikit-learn
overfitting, 155
splitting test and training data, 155–156
supervised versus unsupervised learning, 154
magic functions, 12
manipulating DataFrames, 127–128, 129
replacing with a list comprehension, 180
Markdown, 6
math operator methods, 195–196
colors, 139
creating multiple axes, 143–144
line styles, 138
object-oriented style, 143
plotting multiple sets of data, 141–143
max function, 28
collocations(), 165
concordance(), 165
count, 28
DataFrames, 128
describe, 118
exclude argument, 120
difference(), 51
disjoint(), 48
extend, 31
fileids(), 160
findall(), 165
hash(), 45
head, 117
index, 28
intersection(), 51
issuperset(), 50
items(), 40
keys(), 40
min(), 125
pop, 30
private, 190
public, 190
representation, 192
reverse, 32
similar(), 165
sort, 32
special, 191
subset(), 49
symmetric_difference(), 51
tail, 118
union(), 50
values(), 40
min function, 28
min() method, 125
MinMaxScaler transformer, 154–155
multiple statements, 16
named groups, 210
substitution and, 211
natural language processing, 159
Natural Language Processing with Python, 169
nested functions, 77
nested lists, 31
nested wrapping functions, 78–79
NLTK (Natural Language Toolkit), 159
classifier classes, 166
defining features, 168
flattening nested lists, 167
labeling data, 167
corpus readers, 160
tokenizers, 161
fileids() method, 160
FreqDist class
built-in plot method, 164
methods, 164
frequency distributions, 161–162
Text class, 165
collocations() method, 165
concordance() method, 165
dispersion_plot() method, 165–166
findall() method, 165
similar() method, 165
NoneType, 15
nonlocal statements, 20
managing, 10
numerics, 14
installing and importing, 86
object-oriented programming, 187
instances, 188
representation, 192
special, 191
private methods, 190
datetime, creating, 206
evaluation, 59
one-dimensional arrays, 87
or operator, 59
equality/inequality, 56–57, 125
in, 40
or, 59
walrus, 60
overfitting, 155
packages, zoneinfo, 207
Pandas DataFrames. See DataFrames
parameters
functions as, 78
positional wildcard, 74
positional-only, 73
parser, 14
pass statements, 18
pop method, 30
private methods, 190
procedural programming, 174
programming languages, high-level versus low-level, 15
proper subsets, 49
public methods, 190
Punkt tokenizer, 161
PyTorch, 154
quotation marks, strings and, 33
raw strings, 33
re.compile() function, 211
re.findall() function, 210
re.finditer() function, 211
named groups, 210
substitution, 211
using named groups, 211
removing, items from dictionaries, 39–40
representation methods, 192
re.search() function, 208
reverse method, 32
rich comparison methods, 192–195
running statements, 4
Scikit-learn, 154
estimators, 156
MinMaxScaler transformer, 154–155
splitting test and training data, 155–156
training a model, 156
training and testing, 156
tutorials, 157
SciPy, 103
continuous distributions, 108
exponential distribution, 110
discrete distributions, 105
binomial distribution, 105–107
scipy.special submodule, 105
scipy.stats submodule, 105
inheriting, 174
plot types, 148
arrays and, 91
frozensets and, 53
indexing, 26
intersections, 51
lists, 29
adding and removing items, 30–31
nested, 31
sorting, 32
slicing, 27
testing membership, 26
tuples, 29
difference between, 51
disjoint, 48
proper subsets, 49
subsets and, 49
supersets and, 50
symmetric difference, 51
union, 50
shared operations, 25
similar() method, 165
slicing, 27
DataFrames, 122
sort method, 32
special characters, 33
assignment, 17
delete, 18
elif, 62
else, 61
expression, 16
future, 20
global, 20
multiple, 16
nonlocal, 20
pass, 18
running, 4
yield, 18
f-, 34
quotation marks and, 33
raw, 33
special characters, 33
translating to datetime object, 207
submodules
scipy.special, 105
scipy.stats, 105
subset() method, 49
substitution, 211
supersets, 50
symmetric_difference() method, 51
syntax
tail method, 118
TensorFlow, 153
Text class, 165
collocations() method, 165
concordance() method, 165
dispersion_plot() method, 165–166
findall() method, 165
similar() method, 165
third-party libraries, 85
time series data, 206
time zone, setting for datetime object, 207
tokenizers, 161
tuple() constructor, 29
tuples, 29
two-dimensional arrays, 88
indexing and slicing, 90
types, 14–15. See also sequences
union() method, 50
updating
columns, 129
values() method, 40
views, 94
changing values in, 94
visualization libraries, 151
colors, 139
creating multiple axes, 143–144
line styles, 138
object-oriented style, 143
plotting multiple sets of data, 141–143
plot types, 148
walrus operator, 60
yield statements, 18
zoneinfo package, 207