Hierarchical indexing

Hierarchical indexing is a feature of pandas that allows specifying two or more index levels on an axis. The specification of multiple levels in an index allows for efficient selection of subsets of data. A pandas index that has multiple levels of hierarchy is referred to as a MultiIndex.

We can demonstrate creating a MultiIndex using the sp500 data. Suppose we want to organize this data by both the Sector and Symbol. We can accomplish this with the following code:

In [107]:
   # first, push symbol into a column
   reindexed = sp500.reset_index()
   # and now index sp500 by sector and symbol
   multi_fi = reindexed.set_index(['Sector', 'Symbol'])
   multi_fi

Out[107]:
                                   Price  BookValue
   Sector                 Symbol                   
   Industrials            MMM     141.14     26.668
   Health Care            ABT      39.60     15.573
                          ABBV     53.95      2.954
   Information Technology ACN      79.79      8.326
   Financials             ACE     102.91     86.897
   ...                               ...        ...
   Information Technology YHOO     35.02     12.768
   Consumer Discretionary YUM      74.77      5.147
   Health Care            ZMH     101.84     37.181
   Financials             ZION     28.43     30.191
   Health Care            ZTS      30.53      2.150

   [500 rows x 2 columns]

We can now examine the .index property and check whether it is a MultiIndex object:

In [108]:
   # the index is a MultiIndex
   type(multi_fi.index)

Out[108]:
   pandas.core.index.MultiIndex

Then, examine the index itself:

In [109]:
   # examine the index
   print (multi_fi.index)

Out[109]:
   Sector                       Symbol
   Industrials                  MMM   
   Health Care                  ABT   
                                ABBV  
   Information Technology       ACN   
                   ...                
   Information Technology       YHOO  
   Consumer Discretionary       YUM   
   Health Care                  ZMH   
   Financials                   ZION  
   Health Care                  ZTS

A MultiIndex contains two or more levels:

In [110]:
   # this has two levels
   len(multi_fi.index.levels)

Out[110]:
   2

Also, each level is a distinct Index object:

In [111]:
   # each index level is an index
   multi_fi.index.levels[0]

Out[111]:
   Index([u'Consumer Discretionary', u'Consumer Discretionary ', u'Consumer Staples', u'Consumer Staples ', u'Energy', u'Financials', u'Health Care', u'Industrials', u'Industries', u'Information Technology', u'Materials', u'Telecommunications Services', u'Utilities'], dtype='object')

In [112]:
   # each index level is an index
   multi_fi.index.levels[1]

Out[112]:
   Index([u'A', u'AA', u'AAPL', u'ABBV', u'ABC', u'ABT', u'ACE', u'ACN', u'ACT', u'ADBE', u'ADI', u'ADM', u'ADP', u'ADS', u'ADSK', u'ADT', u'AEE', u'AEP', u'AES', u'AET', u'AFL', u'AGN', u'AIG', u'AIV', u'AIZ', u'AKAM', u'ALL', u'ALLE', u'ALTR', u'ALXN', u'AMAT', u'AME', u'AMGN', u'AMP', u'AMT', u'AMZN', u'AN', u'AON', u'APA', u'APC', u'APD', u'APH', u'ARG', u'ATI', u'AVB', u'AVP', u'AVY', u'AXP', u'AZO', u'BA', u'BAC', u'BAX', u'BBBY', u'BBT', u'BBY', u'BCR', u'BDX', u'BEAM', u'BEN', u'BF-B', u'BHI', u'BIIB', u'BK', u'BLK', u'BLL', u'BMS', u'BMY', u'BRCM', u'BRK-B', u'BSX', u'BTU', u'BWA', u'BXP', u'C', u'CA', u'CAG', u'CAH', u'CAM', u'CAT', u'CB', u'CBG', u'CBS', u'CCE', u'CCI', u'CCL', u'CELG', u'CERN', u'CF', u'CFN', u'CHK', u'CHRW', u'CI', u'CINF', u'CL', u'CLF', u'CLX', u'CMA', u'CMCSA', u'CME', u'CMG', ...], dtype='object')

Values of the index, at a specific level for every row, can be retrieved by the .get_level_values() method:

In [113]:
   # values of the index level 0
   multi_fi.index.get_level_values(0)

Out[113]:
   Index([u'Industrials', u'Health Care', u'Health Care', u'Information Technology', u'Financials', u'Health Care', u'Information Technology', u'Utilities', u'Health Care', u'Financials', u'Health Care', u'Utilities', u'Materials', u'Materials', u'Information Technology', u'Materials', u'Health Care', u'Materials', u'Industrials', u'Health Care', u'Information Technology', u'Financials', u'Information Technology', u'Consumer Staples', u'Consumer Discretionary', u'Utilities', u'Utilities', u'Financials', u'Financials', u'Financials', u'Financials', u'Health Care', u'Information Technology', u'Health Care', u'Industrials', u'Energy', u'Information Technology', u'Financials', u'Energy', u'Financials', u'Information Technology', u'Information Technology', u'Consumer Staples', u'Financials', u'Telecommunications Services', u'Information Technology', u'Information Technology', u'Consumer Discretionary', u'Consumer Discretionary', u'Financials', u'Industrials', u'Consumer Staples', u'Energy', u'Materials', u'Financials', u'Health Care', u'Health Care', u'Financials', u'Consumer Discretionary', u'Health Care', u'Consumer Discretionary', u'Materials', u'Financials', u'Consumer Discretionary', u'Health Care', u'Financials', u'Consumer Discretionary', u'Industrials', u'Consumer Discretionary', u'Financials', u'Health Care', u'Health Care', u'Information Technology', u'Consumer Staples', u'Information Technology', u'Consumer Discretionary', u'Energy', u'Energy', u'Consumer Staples', u'Financials', u'Health Care', u'Health Care', u'Consumer Discretionary', u'Consumer Discretionary', u'Industrials', u'Financials', u'Consumer Discretionary', u'Health Care', u'Utilities', u'Telecommunications Services', u'Health Care', u'Materials', u'Industrials', u'Energy', u'Energy', u'Consumer Discretionary', u'Financials', u'Health Care', u'Financials', u'Industrials', ...], dtype='object')

Access of elements via a hierarchical index is performed using the .xs() method. This method works similar to the .ix attribute but provides parameters to specify multidimensionality of the index.

The following code selects all items with a level 0 index value of Industrials:

In [114]:
   # get all stocks that are Industrials
   # note the result drops level 0 of the index
   multi_fi.xs('Industrials')

Out[114]:
            Price  BookValue
   Symbol                   
   MMM     141.14     26.668
   ALLE     52.46      0.000
   APH      95.71     18.315
   AVY      48.20     15.616
   BA      132.41     19.870
   ...        ...        ...
   UNP     196.26     46.957
   UPS     102.73      6.790
   UTX     115.54     35.252
   WM       43.37     12.330
   XYL      38.42     12.127

   [64 rows x 2 columns]

To select the rows with a specific value of the index at level 1, use the level parameter. The following code selects rows where the Symbol component of the index is ALLE.

In [115]:
   # select rows where level 1 (Symbol) is ALLE
   # note that the Sector level is dropped from the result
   multi_fi.xs('ALLE', level=1)

Out[115]:
                Price  BookValue
   Sector                       
   Industrials  52.46          0

To prevent levels from being dropped, you can use the drop_levels=False option:

In [116]:
   # Industrials, without dropping the level
   multi_fi.xs('Industrials', drop_level=False)

Out[116]:
                        Price  BookValue
   Sector      Symbol                   
   Industrials MMM     141.14     26.668
               ALLE     52.46      0.000
               APH      95.71     18.315
               AVY      48.20     15.616
               BA      132.41     19.870
   ...                    ...        ...
               UNP     196.26     46.957
               UPS     102.73      6.790
               UTX     115.54     35.252
               WM       43.37     12.330
               XYL      38.42     12.127

   [64 rows x 2 columns]

To select from a hierarchy of indexes you can chain .xs() calls with different levels together. The following code selects the row with Industrials at level 0 and UPS at level 1:

In [117]:
   # drill through the levels
   multi_fi.xs('Industrials').xs('UPS')

Out[117]:
   Price        102.73
   BookValue      6.79
   Name: UPS, dtype: float64

An alternate syntax is to pass the values of each level of the hierarchical index as a tuple:

In [118]:
   # drill through using tuples
   multi_fi.xs(('Industrials', 'UPS'))

Out[118]:
   Price        102.73
   BookValue      6.79
   Name: (Industrials, UPS), dtype: float64

Note that .xs() can only be used for getting, not setting, values.

Note

One of the things I'd like to point out about indexing in pandas, is that a pandas index is its own set of data, not references to data in the Series or DataFrame. This is different from how indexes are used in SQL databases, where the index is built upon the actual data in the table. The values in a pandas index can be completely different from the data in the row that it references, and it can be changed as needed to support much more interactive analysis than can be done with SQL.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.136.186