How to do it...

  1. Read in the flights dataset, and find the total number of flights between each origin and destination airport:
>>> flights = pd.read_csv('data/flights.csv')
>>> flights_ct = flights.groupby(['ORG_AIR', 'DEST_AIR']).size()
>>> flights_ct.head()
ORG_AIR DEST_AIR ATL ABE 31 ABQ 16 ABY 19 ACY 6 AEX 40 dtype: int64
  1. Select the total number of flights between Houston (IAH) and Atlanta (ATL) in both directions:
>>> flights_ct.loc[[('ATL', 'IAH'), ('IAH', 'ATL')]]
ORG_AIR DEST_AIR ATL IAH 121 IAH ATL 148 dtype: int64
  1. We could simply sum these two numbers together to find the total flights between the cities but there is a more efficient and automated solution that can work for all flights. Let's independently sort the origin and destination cities for each row in alphabetical order:
>>> flights_sort = flights[['ORG_AIR', 'DEST_AIR']] 
.apply(sorted, axis=1)
>>> flights_sort.head()
  1. Now that each row has been independently sorted, the column names are not correct. Let's rename them to something more generic and then again find the total number of flights between all cities:
>>> rename_dict = {'ORG_AIR':'AIR1', 'DEST_AIR':'AIR2'}
>>> flights_sort = flights_sort.rename(columns=rename_dict)
>>> flights_ct2 = flights_sort.groupby(['AIR1', 'AIR2']).size()
>>> flights_ct2.head()
AIR1 AIR2
ABE ATL 31 ORD 24 ABI DFW 74 ABQ ATL 16 DEN 46 dtype: int64
  1. Let's select all the flights between Atlanta and Houston and verify that it matches the sum of the values in step 2:
>>> flights_ct2.loc[('ATL', 'IAH')]
269
  1. If we try and select flights with Houston followed by Atlanta, we get an error:
>>> flights_ct2.loc[('IAH', 'ATL')]
IndexingError: Too many indexers
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.149.250.11