How to do it...

  1. Read in the three baseball datasets and set playerID as the index:
>>> import pandas as pd
>>> baseball_14 = pd.read_csv('data/baseball14.csv',
...                           index_col='playerID')
>>> baseball_15 = pd.read_csv('data/baseball15.csv',
...                           index_col='playerID')
>>> baseball_16 = pd.read_csv('data/baseball16.csv',
...                           index_col='playerID')
>>> baseball_14.head()
  2. Use the index method difference to discover which index labels are in baseball_14 and not in baseball_15, and vice versa:
>>> baseball_14.index.difference(baseball_15.index)
Index(['corpoca01', 'dominma01', 'fowlede01', 'grossro01',
       'guzmaje01', 'hoeslj01', 'krausma01', 'preslal01',
       'singljo02'], dtype='object', name='playerID')

>>> baseball_15.index.difference(baseball_14.index)
Index(['congeha01', 'correca01', 'gattiev01', 'gomezca01',
       'lowrije01', 'rasmuco01', 'tuckepr01', 'valbulu01'],
      dtype='object', name='playerID')
  3. There are quite a few players unique to each index. Let's find out how many hits each player has in total over the three-year period. The H column contains the number of hits:
>>> hits_14 = baseball_14['H']
>>> hits_15 = baseball_15['H']
>>> hits_16 = baseball_16['H']
>>> hits_14.head()
playerID
altuvjo01    225
cartech02    115
castrja01    103
corpoca01     40
dominma01    121
Name: H, dtype: int64
  4. Let's first add two of the Series together using the plus operator:
>>> (hits_14 + hits_15).head()
playerID
altuvjo01    425.0
cartech02    193.0
castrja01    174.0
congeha01      NaN
corpoca01      NaN
Name: H, dtype: float64
  5. Even though congeha01 recorded hits in 2015 and corpoca01 recorded hits in 2014, their totals are missing, because each player appears in only one of the two Series. Let's use the add method and its parameter, fill_value, to avoid missing values:
>>> hits_14.add(hits_15, fill_value=0).head()
playerID
altuvjo01    425.0
cartech02    193.0
castrja01    174.0
congeha01     46.0
corpoca01     40.0
Name: H, dtype: float64
  6. We add hits from 2016 by chaining the add method once more:
>>> hits_total = (hits_14.add(hits_15, fill_value=0)
...                      .add(hits_16, fill_value=0))
>>> hits_total.head()
playerID
altuvjo01    641.0
bregmal01     53.0
cartech02    193.0
castrja01    243.0
congeha01     46.0
Name: H, dtype: float64
  7. Check for missing values in the result:
>>> hits_total.hasnans
False
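
The steps above can be tried without the CSV files. The following is a minimal sketch, assuming two small made-up Series in place of the real H columns; the player labels p1 to p4 and their hit counts are hypothetical and do not come from the baseball datasets. It contrasts the plus operator with the add method and its fill_value parameter:

```python
import pandas as pd

# Hypothetical toy data standing in for the H column of two seasons;
# the labels and hit counts are invented for illustration only.
hits_a = pd.Series({'p1': 10, 'p2': 20, 'p3': 30}, name='H')
hits_b = pd.Series({'p2': 5, 'p3': 15, 'p4': 25}, name='H')

# Labels that appear in one index but not in the other
only_a = hits_a.index.difference(hits_b.index)
only_b = hits_b.index.difference(hits_a.index)

# The plus operator aligns on the union of the labels and produces NaN
# wherever a label is missing from either operand
plus_total = hits_a + hits_b

# add with fill_value=0 treats a label missing from one side as 0
add_total = hits_a.add(hits_b, fill_value=0)

print(only_a, only_b)
print(plus_total)
print(add_total)
print(plus_total.hasnans, add_total.hasnans)
```

In this sketch, p1 and p4 each appear in only one Series, so they come out as NaN with the plus operator but keep their single-season counts with add. Both results are float64, as in the recipe output.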