Symbols
%matplotlib inline function 320
~ character 126–127
A
advanced search patterns 402–405
agg method 231, 238
aggregate operations 229–232
aggregate, meaning of term 201
Anaconda
Anaconda distribution 329–330
environment
creating new 336–341
installing xlrd and openpyxl libraries in 301
installing
in macOS 330–331
in Windows 333–335
Anaconda Navigator 341–343
Anaconda Prompt
commands 335–336
launching 335
AND condition 124–125
append method 371, 374, 377
apply method 72–78, 232, 237–238, 296–297
arange method 385–386
arguments 357–361
arithmetic operations 43–45
arrays 291
astype method 116–120, 140, 143
attributes 364–365
DataFrame and Series sharing 84–87
of GroupBy object 225–229
on ndarray object 386
Series object 32–34
B
B metacharacter 401
b metacharacter 401–402
bar graphs 325–327
between method 129, 142, 144
blocks 361, 373
Boolean methods 127–130, 155–157, 351–352
broadcasting Series object 45–47
buckets 221
C
capitalize method 164, 367
cd (change directory) command 333, 336
character replacement 153–155
charts
line 320–325
pie 327–328
chop threshold 316–317
class method 167
classes 24, 363–364, 392
closed intervals 70
clusters 221
coding challenges
DataFrame object 109–112
dates and times 282–287
filtering DataFrame 139–143
GroupBy object 235–237
importing and exporting 306–309
MultiIndex DataFrames 194–197
reshaping and pivoting 214–218
Series methods 75–77
Series object 50–52
text data 162–163
colormaps method 325
columns
converting to datetimes 268–269
extracting values from specific 103–106
extracting with MultiIndex DataFrames 179–182
filtering by one or more criteria 15–17
grouping by multiple 233–234
maximum width 316
renaming 106–108
selecting from DataFrame object 96–99
selecting multiple columns 97–99
selecting single column 96–97
sorting by column index 95
commands
Anaconda Prompt 335–336
Terminal 332–333
comments 348
concat function 243–244, 247, 256, 259
concatenating 239–259
coding challenge 254–258
data sets 243–245
introducing data sets 241–242
missing values in concatenated DataFrames 245–247
conditions
between method 128–130
dealing with null values 132–134
isin method 127–128
isnull and notnull methods 130–132
multiple conditions 124–127
AND condition 124–125
inversion with ~ 126–127
methods for Booleans 127
OR condition 125–126
single condition 120–124
constructors 24
contains method 155–156, 164
context blocks 317
count method 36, 86, 368, 372
counting values
in Series 14–15
with value_counts method 66–72
cross-sections, MultiIndex DataFrames 188–189
CSV files 299–301
cumsum (cumulative sum) method 38
custom functions 361–362
custom operations 232–233
D
d metacharacter 402
d+ metacharacter 405
data
fake data 391–396
getting started with Faker 392–394
installing Faker 391
populating DataFrame with fake values 395–396
grouping 18–20
in 21st century 4
text data 147–164
Boolean methods 155–157
coding challenge 162–163
letter casing and whitespace 148–151
regular expressions 163–164
splitting strings 157–161
string slicing 151–153
string slicing and character replacement 153–155
wide vs. narrow 199–200
data analysis 7
data sets
concatenating 243–245
creating GroupBy object from 222–225
importing 9–11
importing with read_csv function 55–60
melting 209–213
optimizing for memory use 114–116
data types 348–352
Booleans 351–352
converting with astype method 116–120
None object 352
numbers 348–349
strings 349–351
DataFrame 79–112
coding challenge 109–112
columns from, selecting 96–99
selecting multiple columns 97–99
selecting single column 96–97
creating
from dictionary 80–81
from NumPy ndarray 81–83
pivot table from DataFrame 200–207
extracting values from Series 106
filtering 113–144
coding challenge 139–143
by condition 127–134
converting data types with astype method 116–120
duplicates 134–139
by multiple conditions 124–127
optimizing data set for memory use 114–116
by single condition 120–124
JSON (JavaScript Object Notation) files
exporting 298
loading into 292–298
manipulating 11–14
missing values in concatenated 245–247
populating with fake values 395–396
renaming columns or rows 106–108
resetting index 108–109
rows from, selecting 99–106
extracting rows by index label 99–101
extracting rows by index position 101–103
extracting values from specific columns 103–106
setting new index 95–96
similarities between Series and 83–90
importing DataFrame with read_csv function 83–84
shared and exclusive attributes 84–87
shared methods 87–90
sorting 90–93
by index 94–95
by multiple columns 92–93
by single column 90–92
date offsets 275–277
date() function 364
dates and times 260–288
adding and subtracting durations of time 273–275
coding challenge 282–287
converting column or index values to datetimes 268–269
date offsets 275–277
storing multiple timestamps in DatetimeIndex 266–268
timedelta object 277–282
Timestamp object 261–266
pandas and datetimes 264–266
Python and datetimes 261–264
using DatetimeProperties object 269–273
DatetimeIndex 266–268
DatetimeProperties object 269–273
day_name method 271
day_of_week function 76–77
default argument 25, 359
delimiters 375
dependencies 329
describe_option function 311–312, 318
dictionaries 30, 80–81, 377–380
difference method 382
dimensions 383–385
dir (directory) command 335
drop method 163
drop_duplicates method 143, 257
dropna method 62, 77, 132, 142, 144, 152, 406
dt method 271
dt.day_name method 271, 284
duplicated method 144
duplicates 134–139
drop_duplicates method 136–139
duplicated method 134–136
E
elements 368
empty lists 369
empty strings 349
endswith method 164, 367
environments 329
equality operators 354–356
Excel workbooks 301–306
exporting 305–306
importing 302–305
installing xlrd and openpyxl libraries in Anaconda environment 301
exceptions 354
explode method 214, 219
exploding list of values 213–214
exporting 289–309
coding challenge 306–309
CSV files 299–301
Excel workbooks 305–306
JavaScript Object Notation (JSON) files 290–298
exporting DataFrame to 298
loading into DataFrame 292–298
extend method 371
F
Faker 391–396
getting started with 392–394
installing 391
populating DataFrame with fake values 395–396
fillna method 118, 132, 144
filtering DataFrame 113–144
by condition 127–134
between method 128–130
dealing with null values 132–134
isin method 127–128
isnull and notnull methods 130–132
by multiple conditions 124–127
AND condition 124–125
inversion with ~ 126–127
methods for Booleans 127
OR condition 125–126
by single condition 120–124
coding challenge 139–143
column by one or more criteria 15–17
converting data types with astype method 116–120
duplicates 134–139
drop_duplicates method 136–139
duplicated method 134–136
optimizing data set for memory use 114–116
find method 368
findall function 398–400
first method 226, 238
first-class object 72
flags 332, 337
flattening 294
float function 349, 358
floating-point number 349
floor division 353
foreign keys 240, 259
forward-fill strategy 39
from_tuples method 167, 169
functions 10, 357–362
arguments and return values 357–361
custom functions 361–362
invoking on every series value with apply method 72–75
G
get method 159–160, 378
get_group method 222, 228–229, 234, 236
get_largest_row function 232–233
get_level_values method 173–174
get_option function 312, 315
graphical spreadsheet applications 6–8
graphs, bar 325–327
groupby method 220–221, 224, 234–235
GroupBy object 220–238
aggregate operations 229–232
applying custom operation to all groups 232–233
attributes and methods of 225–229
coding challenge 235–237
creating
from data set 222–225
from scratch 221–222
grouping by multiple columns 233–234
grouping data 18–20
groups 221
H
half-open intervals 70
head method 35–36, 87, 91, 174, 228, 238, 287
heterogeneous data 85
heterogeneous lists 369
hierarchical data 166
homogeneous data 22, 383
homogeneous lists 369
I
iloc 186–188
immutable objects 262, 365
importing 289–309
coding challenge 306–309
CSV files 299–301
data set with read_csv function 55–60
data sets 9–11
Excel workbooks 301–306
JavaScript Object Notation (JSON) files 290–298
in keyword 367
index 350
converting values to datetimes 268–269
customizing Series object 26–29
extracting rows by position 101–103
labels
extracting rows by 99–101
merging on 253–254
sorting DataFrame object by 94–95
sorting with sort_index method 62–64
stacking and unstacking levels 207–209
index method 368
index position 12
inequality operators 354–356
info method 115, 117, 119, 172, 283–284
inner joins 249–250
inplace parameter 65–66
insert method 371, 377
installing 329–346
Anaconda distribution 329–330
Anaconda environment
creating new 336–341
xlrd and openpyxl libraries in 301
Anaconda Navigator 341–343
Faker 391
Jupyter Notebook 343–346
macOS setup process 330–333
Anaconda 330–331
common Terminal commands 332–333
launching Terminal 331–332
matplotlib 320
Windows setup process 333–336
Anaconda 333–335
common Anaconda Prompt commands 335–336
launching Anaconda Prompt 335
instances 24, 363
instantiation 363
int function 358
intersection method 381
inversion with 126–127
invoking functions 357
isin method 127–128, 142
isnull methods 130–132
isspace method 366
iterations
dictionary 380
list 373–374
J
join method 376
joining 239–259
coding challenge 254–258
inner joins 249–250
introducing data sets 241–242
left joins 247–248
outer joins 251–253
joins 247
JSON (JavaScript Object Notation) files 290–298
exporting DataFrame to 298
loading into DataFrame 292–298
json_normalize function 294–297, 307, 309
Jupyter Notebook 343–346
K
keyword arguments 360
L
labels 22
last method 226, 238
last_name method 392
left joins 247–248
len function 72, 235, 357–358, 369, 374
letter casing 148–151
libraries 3, 329
line charts 320–325
linters 291
list function 31
lists 368–376
converting string to 375–376
list comprehension 374–375
list iteration 373–374
literals 363
loc 182–186
local variables 362
long data set 199
lower method 150, 155–156, 164, 366–367
lstrip (left strip) method 149, 367
M
macOS 330–333
installing Anaconda in 330–331
Terminal
commands 332–333
launching 331–332
mathematical operations 36–47, 352–354
arithmetic operations 43–45
broadcasting 45–47
statistical operations 36–43
matplotlib 320
max method 69, 72, 88, 232, 236, 281
maximum column width 316
mean method 222–223, 230, 236, 271, 281, 287
melt method 211–212, 217–219
melting 211
merge method 248–249, 251–253, 256, 258–259
merging 239–259
coding challenge 254–258
introducing data sets 241–242
on index labels 253–254
metacharacters 399–401
method chaining 69, 367
methods 34, 364–365
DataFrame and Series sharing 87–90
for Booleans 127
of GroupBy object 225–229
microseconds 283
min method 69, 72, 88–89, 231, 237, 281
missing values
creating Series object with 29–30
in concatenated DataFrames 245–247
modules 261, 362–363
modulo operator 353
MultiIndex DataFrames 165–197
coding challenge 194–197
cross-sections 188–189
manipulating 189–194
resetting index 189–193
setting index 193–194
MultiIndex object 166–170
selecting with 179–188
extracting one or more columns 179–182
extracting one or more rows with iloc 186–188
extracting one or more rows with loc 182–186
sorting 175–179
MultiIndex object 166–170
munging 147
mutable object 365
N
nan object 390
narrow data 199–200
ndarray object 385–390
attributes on 386
generating numeric range with arange method 385–386
randint function 388–389
randn function 389–390
reshape method 386–388
nlargest method 64–65, 89, 110, 232
None object 352
normalizing 294
not a number 29, 390
notnull method 130–132, 144
nsmallest method 64–65, 89, 91
nth method 226–227, 238
null values 132–134
numbers 348–349
NumPy 383–390
dimensions 383–385
nan object 390
ndarray object 385–390
attributes on 386
creating DataFrame object from 81–83
generating numeric range with arange method 385–386
randint function 388–389
randn function 389–390
reshape method 386–388
nunique method 67, 88, 118, 140, 174, 194
O
objects 363–364
OOP (object-oriented programming) 347
open intervals 70
openpyxl libraries 301
operands 352
operators 352–356
equality and inequality 354–356
mathematical 352–354
option_context function 317–318
OR condition 125–126
order 22
outer joins 251–253
overwriting 65–66
P
packages 3, 320, 329
pandas 3–21
competitors vs. 8–9
configuring 310–318
chop threshold 316–317
maximum column width 316
option context 317–318
options 311–315
precision 315–316
data in 21st century 4
datetimes and 264–266
graphical spreadsheet applications vs. 6–8
regular expressions and 405–408
tour of 9–20
counting values in Series 14–15
filtering column by one or more criteria 15–17
grouping data 18–20
importing data set 9–11
manipulating DataFrame 11–14
parameters 25, 359, 361
pct_change (percent change) method 39
pd.DataFrame() function 80
pd.Series() function 25
pie charts 327–328
pivot_table method 201–205, 212, 215–216, 218
pivoting 198–219
coding challenge 214–218
creating pivot table from DataFrame 200–207
additional options for pivot tables 205–207
pivot_table method 201–205
exploding list of values 213–214
melting data set 209–213
stacking and unstacking index levels 207–209
wide vs. narrow data 199–200
plot method 321–323, 325–328
pop method 372, 377, 379
precision 315–316
primary keys 240, 253, 259
print function 358–360, 378, 393, 398–400
product method 37
pwd (print working directory) command 332–333
PyPi (Python Package Index) 347
Python 347–382
attributes and methods 364–365
classes and objects 363–364
creating Series object from 30–32
data types 348–352
Booleans 351–352
None object 352
numbers 348–349
strings 349–351
datetimes and 261–264
dictionaries 377–380
functions 357–362
arguments and return values 357–361
custom functions 361–362
lists 368–376
converting string to 375–376
list comprehension 374–375
list iteration 373–374
modules 362–363
operators 352–356
equality and inequality 354–356
mathematical 352–354
passing Series object to functions 48–49
re module 398–399
sets 380–382
string methods 365–368
tuples 376–377
variables 356–357
Pythonic way 375
Q
quotient 353
R
randint function 31, 81, 388–389, 396
randn function 389–390
range function 34
RDBMS (relational database management systems) 239
read_csv function 10, 54, 77, 79, 96, 109–110, 114, 132, 140, 157, 170–171, 189, 201, 242, 255, 268, 293, 299, 302, 309
importing data sets with 55–60
importing DataFrame with 83–84
read_excel function 302–303, 309
read_json function 293, 309
RegEx (regular expressions) 163–164, 312, 397–408
advanced search patterns 402–405
metacharacters 399–401
pandas and 405–408
Python’s re module 398–399
relational databases 8
remove method 372
rename method 107, 112
renaming columns or rows 106–108
reorder_levels method 190
replace method 152, 155, 164, 366
reset_index method 108, 111–112, 190–192, 197
reset_option function 315, 318
resetting index 108–109, 189–193
reshape method 386–388
reshaping 198–219
coding challenge 214–218
creating pivot table from DataFrame 200–207
additional options for pivot tables 205–207
pivot_table method 201–205
exploding list of values 213–214
melting data set 209–213
stacking and unstacking index levels 207–209
wide vs. narrow data 199–200
return values 357–361
round function 68, 73
rows 171
extracting
with iloc 186–188
with loc 182–186
renaming 106–108
retrieving first and last 34–36
selecting from DataFrame object 99–106
extracting rows by index label 99–101
extracting rows by index position 101–103
extracting values from specific columns 103–106
sorting by index 94
rstrip (right strip) method 366
S
S metacharacter 401
s metacharacter 400, 402, 407
sample method 88
save method 306, 309
search function 398
search patterns, advanced 402–405
select_dtypes method 98
Series object 22–78
attributes 32–34
classes and instances 24
coding challenge 50–52, 75–77
counting values in 14–15, 66–72
creating
from Python objects 30–32
with missing values 29–30
customizing index 26–29
extracting values from 106
importing data set with read_csv function 55–60
invoking function with apply method 72–75
mathematical operations 36–47
arithmetic operations 43–45
broadcasting 45–47
statistical operations 36–43
overwriting with inplace parameter 65–66
passing to Python’s built-in functions 48–49
populating with values 24–26
retrieving first and last rows 34–36
similarities between DataFrame object and 83–90
importing DataFrame with read_csv function 83–84
shared and exclusive attributes 84–87
shared methods 87–90
sorting 60–65
by index with sort_index method 62–64
by values with sort_values method 60–62
retrieving with nsmallest and nlargest methods 64–65
set function 381
set_index method 95, 108, 110, 177, 193, 197
set_option function 312, 314, 318
setdefault method 296–297
sets 380–382
setting index 95–96, 193–194
size method 224, 233, 236
slice method 153–155
slicing 350
sort_index method 62–64, 78, 94–95, 175–176, 197, 267
sort_values method 60–62, 65–66, 78, 90–93, 110–112, 175, 281–282, 287
sorting 60–65
by index with sort_index method 62–64
by values with sort_values method 60–62
DataFrame object 90–93
by index 94–95
by multiple columns 92–93
by single column 90–92
MultiIndex DataFrames 175–179
nsmallest and nlargest methods 64–65
split method 158–159, 161–162, 164, 375–376
splitting strings 157–161
SQL (Structured Query Language) 8
squeeze method 406
stack method 208, 217, 219
stacking index levels 207–209
standard library 261, 362
startswith method 164, 367
statistical operations 36–43
str function 155, 358
str.capitalize method 151
str.endswith method 157
str.extract method 405–406
str.get method 159, 407
str.len method 158
str.replace method 155
str.slice method 154
str.split method 158, 160, 213, 407
str.startswith method 156
str.strip method 150
str.title method 151
str.upper method 151
strftime method 75, 77
strings 349–351
methods 365–368
slicing 151–155
strip method 149, 162, 164, 367, 375
subtracting duration of time 273–275
sum method 37, 86–87, 229–230
SUMIF function 6
swapcase method 366
symmetric_difference method 382
T
tail method 35–36, 228, 238
tall data set 199
Terminal
commands 332–333
launching 331–332
text data 147–164
Boolean methods 155–157
coding challenge 162–163
letter casing and whitespace 148–151
regular expressions 163–164
splitting strings 157–161
string slicing 151–153
string slicing and character replacement 153–155
time-series graph 321
timedelta object 277–282
times and dates 260–288
adding and subtracting durations of time 273–275
coding challenge 282–287
converting column or index values to datetimes 268–269
date offsets 275–277
storing multiple timestamps in DatetimeIndex 266–268
timedelta object 277–282
Timestamp object 261–266
pandas and datetimes 264–266
Python and datetimes 261–264
using DatetimeProperties object 269–273
title method 151, 164, 367
to_csv method 299–300
to_datetime function 269, 279, 283–284, 288
to_excel method 305–306, 308–309
to_json method 298
to_timedelta function 278, 285
transpose method 81
tuple function 30, 376
tuples 376–377
type function 32, 294, 363
U
union method 382
unique method 152
unstack method 208–209, 219
unstacking index levels 207–209
upper method 155, 164, 365, 371
V
value_counts method 66–72, 78, 85, 110, 284, 286, 325, 327
values
counting with value_counts method 66–72
creating Series object with missing 29–30
exploding list of 213–214
populating Series object with 24–26
sorting with sort_values method 60–62
values method 380
variables 11, 199, 356–357, 365
VBA (Visual Basic for Applications) 7
visualization 319–328
bar graphs 325–327
installing matplotlib 320
line charts 320–325
pie charts 327–328
VLOOKUP function 6
W
W metacharacter 400
w metacharacter 400
weekday method 364–365
whitespace 148–151
wide data 199–200
Windows 333–336
Anaconda Prompt
commands 335–336
launching 335
installing Anaconda in 333–335
word boundaries 401
wrangling 147
X
xlrd libraries 301
xs method 188–189, 196