index

Symbols

%matplotlib inline function 320

~ character 126–127

A

advanced search patterns 402–405

agg method 231, 238

aggregate operations 229–232

aggregate, meaning of term 201

Anaconda

Anaconda distribution 329–330

environment

creating new 336–341

installing xlrd and openpyxl libraries in 301

installing

in macOS 330–331

in Windows 333–335

Anaconda Navigator 341–343

Anaconda Prompt

commands 335–336

launching 335

AND condition 124–125

append method 371, 374, 377

apply method 72–78, 232, 237–238, 296–297

arange method 385–386

arguments 357–361

arithmetic operations 43–45

arrays 291

astype method 116–120, 140, 143

attributes 364–365

DataFrame and Series sharing 84–87

of GroupBy object 225–229

on ndarray object 386

Series object 32–34

B

B metacharacter 401

b metacharacter 401–402

bar graphs 325–327

between method 129, 142, 144

blocks 361, 373

Boolean methods 127–130, 155–157, 351–352

broadcasting Series object 45–47

buckets 221

C

capitalize method 164, 367

cd (change directory) command 333, 336

character replacement 153–155

charts

line 320–325

pie 327–328

chop threshold 316–317

class method 167

classes 24, 363–364, 392

closed intervals 70

clusters 221

coding challenges

DataFrame object 109–112

dates and times 282–287

filtering DataFrame 139–143

GroupBy object 235–237

importing and exporting 306–309

MultiIndex DataFrames 194–197

reshaping and pivoting 214–218

Series methods 75–77

Series object 50–52

text data 162–163

colormaps method 325

columns

converting to datetimes 268–269

extracting values from specific 103–106

extracting with MultiIndex DataFrames 179–182

filtering by one or more criteria 15–17

grouping by multiple 233–234

maximum width 316

renaming 106–108

selecting from DataFrame object 96–99

selecting multiple columns 97–99

selecting single column 96–97

sorting by column index 95

commands

Anaconda Prompt 335–336

Terminal 332–333

comments 348

concat function 243–244, 247, 256, 259

concatenating 239–259

coding challenge 254–258

data sets 243–245

introducing data sets 241–242

missing values in concatenated DataFrames 245–247

conditions

between method 128–130

dealing with null values 132–134

isin method 127–128

isnull and notnull methods 130–132

multiple conditions 124–127

AND condition 124–125

inversion with ~ 126–127

methods for Booleans 127

OR condition 125–126

single condition 120–124

constructors 24

contains method 155–156, 164

context blocks 317

count method 36, 86, 368, 372

counting values

in Series 14–15

with value_counts method 66–72

cross-sections, MultiIndex DataFrames 188–189

CSV files 299–301

cumsum (cumulative sum) method 38

custom functions 361–362

custom operations 232–233

D

d metacharacter 402

d+ metacharacter 405

data

fake data 391–396

getting started with Faker 392–394

installing Faker 391

populating DataFrame with fake values 395–396

grouping 18–20

in 21st century 4

text data 147–164

Boolean methods 155–157

coding challenge 162–163

letter casing and whitespace 148–151

regular expressions 163–164

splitting strings 157–161

string slicing 151–153

string slicing and character replacement 153–155

wide vs. narrow 199–200

data analysis 7

data sets

concatenating 243–245

creating GroupBy object from 222–225

importing 9–11

importing with read_csv function 55–60

melting 209–213

optimizing for memory use 114–116

data types 348–352

Booleans 351–352

converting with astype method 116–120

None object 352

numbers 348–349

strings 349–351

DataFrame 79–112

coding challenge 109–112

columns from, selecting 96–99

selecting multiple columns 97–99

selecting single column 96–97

creating

from dictionary 80–81

from NumPy ndarray 81–83

pivot table from DataFrame 200–207

extracting values from Series 106

filtering 113–144

coding challenge 139–143

by condition 127–134

converting data types with astype method 116–120

duplicates 134–139

by multiple conditions 124–127

optimizing data set for memory use 114–116

by single condition 120–124

JSON (JavaScript Object Notation) files

exporting 298

loading into 292–298

manipulating 11–14

missing values in concatenated 245–247

populating with fake values 395–396

renaming columns or rows 106–108

resetting index 108–109

rows from, selecting 99–106

extracting rows by index label 99–101

extracting rows by index position 101–103

extracting values from specific columns 103–106

setting new index 95–96

similarities between Series and 83–90

importing DataFrame with read_csv function 83–84

shared and exclusive attributes 84–87

shared methods 87–90

sorting 90–93

by index 94–95

by multiple columns 92–93

by single column 90–92

date offsets 275–277

date() function 364

dates and times 260–288

adding and subtracting durations of time 273–275

coding challenge 282–287

converting column or index values to datetimes 268–269

date offsets 275–277

storing multiple timestamps in DatetimeIndex 266–268

timedelta object 277–282

Timestamp object 261–266

pandas and datetimes 264–266

Python and datetimes 261–264

using DatetimeProperties object 269–273

DatetimeIndex 266–268

DatetimeProperties object 269–273

day_name method 271

day_of_week function 76–77

default argument 25, 359

delimiters 375

dependencies 329

describe_option function 311–312, 318

dictionaries 30, 80–81, 377–380

difference method 382

dimensions 383–385

dir (directory) command 335

drop method 163

drop_duplicates method 143, 257

dropna method 62, 77, 132, 142, 144, 152, 406

dt method 271

dt.day_name method 271, 284

duplicated method 144

duplicates 134–139

drop_duplicates method 136–139

duplicated method 134–136

E

elements 368

empty lists 369

empty strings 349

endswith method 164, 367

environments 329

equality operators 354–356

Excel workbooks 301–306

exporting 305–306

importing 302–305

installing xlrd and openpyxl libraries in Anaconda environment 301

exceptions 354

explode method 214, 219

exploding list of values 213–214

exporting 289–309

coding challenge 306–309

CSV files 299–301

Excel workbooks 305–306

JavaScript Object Notation (JSON) files 290–298

exporting DataFrame to 298

loading into DataFrame 292–298

extend method 371

F

Faker 391–396

getting started with 392–394

installing 391

populating DataFrame with fake values 395–396

fillna method 118, 132, 144

filtering DataFrame 113–144

by condition 127–134

between method 128–130

dealing with null values 132–134

isin method 127–128

isnull and notnull methods 130–132

by multiple conditions 124–127

AND condition 124–125

inversion with ~ 126–127

methods for Booleans 127

OR condition 125–126

by single condition 120–124

coding challenge 139–143

column by one or more criteria 15–17

converting data types with astype method 116–120

duplicates 134–139

drop_duplicates method 136–139

duplicated method 134–136

optimizing data set for memory use 114–116

find method 368

findall function 398–400

first method 226, 238

first-class object 72

flags 332, 337

flattening 294

float function 349, 358

floating-point number 349

floor division 353

foreign keys 240, 259

forward-fill strategy 39

from_tuples method 167, 169

functions 10, 357–362

arguments and return values 357–361

custom functions 361–362

invoking on every series value with apply method 72–75

G

get method 159–160, 378

get_group method 222, 228–229, 234, 236

get_largest_row function 232–233

get_level_values method 173–174

get_option function 312, 315

graphical spreadsheet applications 6–8

graphs, bar 325–327

groupby method 220–221, 224, 234–235

GroupBy object 220–238

aggregate operations 229–232

applying custom operation to all groups 232–233

attributes and methods of 225–229

coding challenge 235–237

creating

from data set 222–225

from scratch 221–222

grouping by multiple columns 233–234

grouping data 18–20

groups 221

H

half-open intervals 70

head method 35–36, 87, 91, 174, 228, 238, 287

heterogeneous data 85

heterogeneous lists 369

hierarchical data 166

homogeneous data 22, 383

homogeneous lists 369

I

iloc 186–188

immutable objects 262, 365

importing 289–309

coding challenge 306–309

CSV files 299–301

data set with read_csv function 55–60

data sets 9–11

Excel workbooks 301–306

JavaScript Object Notation (JSON) files 290–298

in keyword 367

index 350

converting values to datetimes 268–269

customizing Series object 26–29

extracting rows by position 101–103

labels

extracting rows by 99–101

merging on 253–254

sorting DataFrame object by 94–95

sorting with sort_index method 62–64

stacking and unstacking levels 207–209

index method 368

index position 12

inequality operators 354–356

info method 115, 117, 119, 172, 283–284

inner joins 249–250

inplace parameter 65–66

insert method 371, 377

installing 329–346

Anaconda distribution 329–330

Anaconda environment

creating new 336–341

xlrd and openpyxl libraries in 301

Anaconda Navigator 341–343

Faker 391

Jupyter Notebook 343–346

macOS setup process 330–333

Anaconda 330–331

common Terminal commands 332–333

launching Terminal 331–332

matplotlib 320

Windows setup process 333–336

Anaconda 333–335

common Anaconda Prompt commands 335–336

launching Anaconda Prompt 335

instances 24, 363

instantiation 363

int function 358

intersection method 381

inversion with 126–127

invoking functions 357

isin method 127–128, 142

isnull methods 130–132

isspace method 366

iterations

dictionary 380

list 373–374

J

join method 376

joining 239–259

coding challenge 254–258

inner joins 249–250

introducing data sets 241–242

left joins 247–248

outer joins 251–253

joins 247

JSON (JavaScript Object Notation) files 290–298

exporting DataFrame to 298

loading into DataFrame 292–298

json_normalize function 294–297, 307, 309

Jupyter Notebook 343–346

K

keyword arguments 360

L

labels 22

last method 226, 238

last_name method 392

left joins 247–248

len function 72, 235, 357–358, 369, 374

letter casing 148–151

libraries 3, 329

line charts 320–325

linters 291

list function 31

lists 368–376

converting string to 375–376

list comprehension 374–375

list iteration 373–374

literals 363

loc 182–186

local variables 362

long data set 199

lower method 150, 155–156, 164, 366–367

lstrip (left strip) method 149, 367

M

macOS 330–333

installing Anaconda in 330–331

Terminal

commands 332–333

launching 331–332

mathematical operations 36–47, 352–354

arithmetic operations 43–45

broadcasting 45–47

statistical operations 36–43

matplotlib 320

max method 69, 72, 88, 232, 236, 281

maximum column width 316

mean method 222–223, 230, 236, 271, 281, 287

melt method 211–212, 217–219

melting 211

merge method 248–249, 251–253, 256, 258–259

merging 239–259

coding challenge 254–258

introducing data sets 241–242

on index labels 253–254

metacharacters 399–401

method chaining 69, 367

methods 34, 364–365

DataFrame and Series sharing 87–90

for Booleans 127

of GroupBy object 225–229

microseconds 283

min method 69, 72, 88–89, 231, 237, 281

missing values

creating Series object with 29–30

in concatenated DataFrames 245–247

modules 261, 362–363

modulo operator 353

MultiIndex DataFrames 165–197

coding challenge 194–197

cross-sections 188–189

manipulating 189–194

resetting index 189–193

setting index 193–194

MultiIndex object 166–170

selecting with 179–188

extracting one or more columns 179–182

extracting one or more rows with iloc 186–188

extracting one or more rows with loc 182–186

sorting 175–179

MultiIndex object 166–170

munging 147

mutable object 365

N

nan object 390

narrow data 199–200

ndarray object 385–390

attributes on 386

generating numeric range with arange method 385–386

randint function 388–389

randn function 389–390

reshape method 386–388

nlargest method 64–65, 89, 110, 232

None object 352

normalizing 294

not a number 29, 390

notnull method 130–132, 144

nsmallest method 64–65, 89, 91

nth method 226–227, 238

null values 132–134

numbers 348–349

NumPy 383–390

dimensions 383–385

nan object 390

ndarray object 385–390

attributes on 386

creating DataFrame object from 81–83

generating numeric range with arange method 385–386

randint function 388–389

randn function 389–390

reshape method 386–388

nunique method 67, 88, 118, 140, 174, 194

O

objects 363–364

OOP (object-oriented programming) 347

open intervals 70

openpyxl libraries 301

operands 352

operators 352–356

equality and inequality 354–356

mathematical 352–354

option_context function 317–318

OR condition 125–126

order 22

outer joins 251–253

overwriting 65–66

P

packages 3, 320, 329

pandas 3–21

competitors vs. 8–9

configuring 310–318

chop threshold 316–317

maximum column width 316

option context 317–318

options 311–315

precision 315–316

data in 21st century 4

datetimes and 264–266

graphical spreadsheet applications vs. 6–8

regular expressions and 405–408

tour of 9–20

counting values in Series 14–15

filtering column by one or more criteria 15–17

grouping data 18–20

importing data set 9–11

manipulating DataFrame 11–14

parameters 25, 359, 361

pct_change (percent change) method 39

pd.DataFrame() function 80

pd.Series() function 25

pie charts 327–328

pivot_table method 201–205, 212, 215–216, 218

pivoting 198–219

coding challenge 214–218

creating pivot table from DataFrame 200–207

additional options for pivot tables 205–207

pivot_table method 201–205

exploding list of values 213–214

melting data set 209–213

stacking and unstacking index levels 207–209

wide vs. narrow data 199–200

plot method 321–323, 325–328

pop method 372, 377, 379

precision 315–316

primary keys 240, 253, 259

print function 358–360, 378, 393, 398–400

product method 37

pwd (print working directory) command 332–333

PyPi (Python Package Index) 347

Python 347–382

attributes and methods 364–365

classes and objects 363–364

creating Series object from 30–32

data types 348–352

Booleans 351–352

None object 352

numbers 348–349

strings 349–351

datetimes and 261–264

dictionaries 377–380

functions 357–362

arguments and return values 357–361

custom functions 361–362

lists 368–376

converting string to 375–376

list comprehension 374–375

list iteration 373–374

modules 362–363

operators 352–356

equality and inequality 354–356

mathematical 352–354

passing Series object to functions 48–49

re module 398–399

sets 380–382

string methods 365–368

tuples 376–377

variables 356–357

Pythonic way 375

Q

quotient 353

R

randint function 31, 81, 388–389, 396

randn function 389–390

range function 34

RDBMS (relational database management systems) 239

read_csv function 10, 54, 77, 79, 96, 109–110, 114, 132, 140, 157, 170–171, 189, 201, 242, 255, 268, 293, 299, 302, 309

importing data sets with 55–60

importing DataFrame with 83–84

read_excel function 302–303, 309

read_json function 293, 309

RegEx (regular expressions) 163–164, 312, 397–408

advanced search patterns 402–405

metacharacters 399–401

pandas and 405–408

Python’s re module 398–399

relational databases 8

remove method 372

rename method 107, 112

renaming columns or rows 106–108

reorder_levels method 190

replace method 152, 155, 164, 366

reset_index method 108, 111–112, 190–192, 197

reset_option function 315, 318

resetting index 108–109, 189–193

reshape method 386–388

reshaping 198–219

coding challenge 214–218

creating pivot table from DataFrame 200–207

additional options for pivot tables 205–207

pivot_table method 201–205

exploding list of values 213–214

melting data set 209–213

stacking and unstacking index levels 207–209

wide vs. narrow data 199–200

return values 357–361

round function 68, 73

rows 171

extracting

with iloc 186–188

with loc 182–186

renaming 106–108

retrieving first and last 34–36

selecting from DataFrame object 99–106

extracting rows by index label 99–101

extracting rows by index position 101–103

extracting values from specific columns 103–106

sorting by index 94

rstrip (right strip) method 366

S

S metacharacter 401

s metacharacter 400, 402, 407

sample method 88

save method 306, 309

search function 398

search patterns, advanced 402–405

select_dtypes method 98

Series object 22–78

attributes 32–34

classes and instances 24

coding challenge 50–52, 75–77

counting values in 14–15, 66–72

creating

from Python objects 30–32

with missing values 29–30

customizing index 26–29

extracting values from 106

importing data set with read_csv function 55–60

invoking function with apply method 72–75

mathematical operations 36–47

arithmetic operations 43–45

broadcasting 45–47

statistical operations 36–43

overwriting with inplace parameter 65–66

passing to Python’s built-in functions 48–49

populating with values 24–26

retrieving first and last rows 34–36

similarities between DataFrame object and 83–90

importing DataFrame with read_csv function 83–84

shared and exclusive attributes 84–87

shared methods 87–90

sorting 60–65

by index with sort_index method 62–64

by values with sort_values method 60–62

retrieving with nsmallest and nlargest methods 64–65

set function 381

set_index method 95, 108, 110, 177, 193, 197

set_option function 312, 314, 318

setdefault method 296–297

sets 380–382

setting index 95–96, 193–194

size method 224, 233, 236

slice method 153–155

slicing 350

sort_index method 62–64, 78, 94–95, 175–176, 197, 267

sort_values method 60–62, 65–66, 78, 90–93, 110–112, 175, 281–282, 287

sorting 60–65

by index with sort_index method 62–64

by values with sort_values method 60–62

DataFrame object 90–93

by index 94–95

by multiple columns 92–93

by single column 90–92

MultiIndex DataFrames 175–179

nsmallest and nlargest methods 64–65

split method 158–159, 161–162, 164, 375–376

splitting strings 157–161

SQL (Structured Query Language) 8

squeeze method 406

stack method 208, 217, 219

stacking index levels 207–209

standard library 261, 362

startswith method 164, 367

statistical operations 36–43

str function 155, 358

str.capitalize method 151

str.endswith method 157

str.extract method 405–406

str.get method 159, 407

str.len method 158

str.replace method 155

str.slice method 154

str.split method 158, 160, 213, 407

str.startswith method 156

str.strip method 150

str.title method 151

str.upper method 151

strftime method 75, 77

strings 349–351

methods 365–368

slicing 151–155

strip method 149, 162, 164, 367, 375

subtracting duration of time 273–275

sum method 37, 86–87, 229–230

SUMIF function 6

swapcase method 366

symmetric_difference method 382

T

tail method 35–36, 228, 238

tall data set 199

Terminal

commands 332–333

launching 331–332

text data 147–164

Boolean methods 155–157

coding challenge 162–163

letter casing and whitespace 148–151

regular expressions 163–164

splitting strings 157–161

string slicing 151–153

string slicing and character replacement 153–155

time-series graph 321

timedelta object 277–282

times and dates 260–288

adding and subtracting durations of time 273–275

coding challenge 282–287

converting column or index values to datetimes 268–269

date offsets 275–277

storing multiple timestamps in DatetimeIndex 266–268

timedelta object 277–282

Timestamp object 261–266

pandas and datetimes 264–266

Python and datetimes 261–264

using DatetimeProperties object 269–273

title method 151, 164, 367

to_csv method 299–300

to_datetime function 269, 279, 283–284, 288

to_excel method 305–306, 308–309

to_json method 298

to_timedelta function 278, 285

transpose method 81

tuple function 30, 376

tuples 376–377

type function 32, 294, 363

U

union method 382

unique method 152

unstack method 208–209, 219

unstacking index levels 207–209

upper method 155, 164, 365, 371

V

value_counts method 66–72, 78, 85, 110, 284, 286, 325, 327

values

counting with value_counts method 66–72

creating Series object with missing 29–30

exploding list of 213–214

populating Series object with 24–26

sorting with sort_values method 60–62

values method 380

variables 11, 199, 356–357, 365

VBA (Visual Basic for Applications) 7

visualization 319–328

bar graphs 325–327

installing matplotlib 320

line charts 320–325

pie charts 327–328

VLOOKUP function 6

W

W metacharacter 400

w metacharacter 400

weekday method 364–365

whitespace 148–151

wide data 199–200

Windows 333–336

Anaconda Prompt

commands 335–336

launching 335

installing Anaconda in 333–335

word boundaries 401

wrangling 147

X

xlrd libraries 301

xs method 188–189, 196

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.225.255.134