Resetting and reindexing

A DataFrame can have its index reset by using the .reset_index() method. A common use is to move the contents of a DataFrame object's index into one or more columns. The following code moves the symbols in the index of sp500 into a column and replaces the index with a default integer index. The result is a new DataFrame, not an in-place update:

In [102]:
   # reset the index, moving it into a column
   reset_sp500 = sp500.reset_index()
   reset_sp500

Out[102]:
       Symbol                  Sector   Price  BookValue
   0      MMM             Industrials  141.14     26.668
   1      ABT             Health Care   39.60     15.573
   2     ABBV             Health Care   53.95      2.954
   3      ACN  Information Technology   79.79      8.326
   4      ACE              Financials  102.91     86.897
   ..     ...                     ...     ...        ...
   495   YHOO  Information Technology   35.02     12.768
   496    YUM  Consumer Discretionary   74.77      5.147
   497    ZMH             Health Care  101.84     37.181
   498   ZION              Financials   28.43     30.191
   499    ZTS             Health Care   30.53      2.150

   [500 rows x 4 columns]
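
The same behavior can be checked on a small frame. The following is a minimal sketch that assumes a hypothetical three-row stand-in for sp500 (the name df and its contents are illustrative only). It also shows the drop=True option of .reset_index(), which discards the old index labels rather than moving them into a column:

```python
import pandas as pd

# Hypothetical three-row stand-in for sp500, indexed by Symbol
df = pd.DataFrame(
    {"Sector": ["Industrials", "Health Care", "Health Care"],
     "Price": [141.14, 39.60, 53.95],
     "BookValue": [26.668, 15.573, 2.954]},
    index=pd.Index(["MMM", "ABT", "ABBV"], name="Symbol"))

moved = df.reset_index()             # Symbol becomes an ordinary column
dropped = df.reset_index(drop=True)  # Symbol labels are discarded instead

print(list(moved.columns))    # ['Symbol', 'Sector', 'Price', 'BookValue']
print(list(dropped.columns))  # ['Sector', 'Price', 'BookValue']
print(df.index.name)          # Symbol (the original frame is unchanged)
```

Both calls return new objects, so the result must be assigned to a name if it is to be kept.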

One or more columns can also be moved into the index. A common scenario is exhibited by the reset_sp500 variable we just created: the data may have been read from a file with the symbols in a column, when we would really like them in the index. To do this, we can use the .set_index() method. The following code moves Symbol into the index of a new DataFrame:

In [103]:
   # move the Symbol column into the index
   reset_sp500.set_index('Symbol')

Out[103]:
                           Sector   Price  BookValue
   Symbol                                           
   MMM                Industrials  141.14     26.668
   ABT                Health Care   39.60     15.573
   ABBV               Health Care   53.95      2.954
   ACN     Information Technology   79.79      8.326
   ACE                 Financials  102.91     86.897
   ...                        ...     ...        ...
   YHOO    Information Technology   35.02     12.768
   YUM     Consumer Discretionary   74.77      5.147
   ZMH                Health Care  101.84     37.181
   ZION                Financials   28.43     30.191
   ZTS                Health Care   30.53      2.150

   [500 rows x 3 columns]
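
To make the file-reading scenario concrete, here is a small sketch that assumes a hypothetical two-row CSV held in a string (csv_text and its values are made up for illustration). It moves the Symbol column into the index with .set_index() and compares the result with asking pd.read_csv() for the same index up front through its index_col parameter:

```python
import io
import pandas as pd

# Hypothetical CSV content with the symbols stored in a column
csv_text = "Symbol,Price\nMMM,141.14\nABT,39.60\n"

flat = pd.read_csv(io.StringIO(csv_text))  # default integer index
by_symbol = flat.set_index("Symbol")       # new frame; flat is unchanged

print(flat.columns.tolist())     # ['Symbol', 'Price']
print(by_symbol.index.tolist())  # ['MMM', 'ABT']

# Alternative: let read_csv build the index while loading
direct = pd.read_csv(io.StringIO(csv_text), index_col="Symbol")
print(by_symbol.equals(direct))
```

Because .set_index() returns a new DataFrame, flat keeps its Symbol column unless the result is assigned back to it.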

An index can be explicitly set using the .reindex() method. Given a list of labels representing the new index, this method creates a new DataFrame with the specified labels and then aligns the data from the original object into it. Rows whose labels are absent from the new index are dropped, and new labels with no matching data are filled with NaN. The following code demonstrates this by taking a subset of sp500 and assigning a new index that contains a subset of the existing labels and an additional label, FOO:

In [104]:
   # get first four rows
   subset = sp500[:4].copy()
   subset

Out[104]:
                           Sector   Price  BookValue
   Symbol                                           
   MMM                Industrials  141.14     26.668
   ABT                Health Care   39.60     15.573
   ABBV               Health Care   53.95      2.954
   ACN     Information Technology   79.79      8.326

In [105]:
   # reindex to have MMM, ABBV, and FOO index labels
   reindexed = subset.reindex(index=['MMM', 'ABBV', 'FOO'])
   # note that ABT and ACN are dropped and FOO has NaN values
   reindexed

Out[105]:
                Sector   Price  BookValue
   Symbol                                
   MMM     Industrials  141.14     26.668
   ABBV    Health Care   53.95      2.954
   FOO             NaN     NaN        NaN
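
When NaN is not the desired placeholder, .reindex() accepts a fill_value argument. The following sketch assumes a hypothetical single-column frame named prices, so that one numeric fill value makes sense for every column:

```python
import pandas as pd

# Hypothetical numeric-only frame using the first four symbols
prices = pd.DataFrame({"Price": [141.14, 39.60, 53.95, 79.79]},
                      index=["MMM", "ABT", "ABBV", "ACN"])

labels = ["MMM", "ABBV", "FOO"]
with_nan = prices.reindex(index=labels)                   # FOO gets NaN
with_zero = prices.reindex(index=labels, fill_value=0.0)  # FOO gets 0.0

print(with_nan["Price"].isna().tolist())  # [False, False, True]
print(with_zero.loc["FOO", "Price"])      # 0.0
```

The fill value applies only to labels that had no match in the original index. Values that were already missing in the original data are left as they were.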

Reindexing can also be applied to the columns. The following code reindexes the columns of subset:

In [106]:
   # reindex columns
   subset.reindex(columns=['Price', 
                           'Book Value', 
                           'NewCol'])

Out[106]:
            Price  Book Value  NewCol
   Symbol                            
   MMM     141.14         NaN     NaN
   ABT      39.60         NaN     NaN
   ABBV     53.95         NaN     NaN
   ACN      79.79         NaN     NaN

pandas produces this result by creating a new DataFrame with the specified columns and then aligning the data for those columns from subset into the new object. Because subset has no NewCol column, its values are filled with NaN. The same happens for Book Value: that label contains a space, so it does not match the existing BookValue column.
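
Column labels are matched exactly during reindexing, which is why the output above contains only NaN under Book Value. The sketch below assumes a hypothetical two-row frame with the same column names as subset and contrasts the exact label BookValue with the near miss 'Book Value':

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as subset
frame = pd.DataFrame(
    {"Price": [141.14, 39.60], "BookValue": [26.668, 15.573]},
    index=pd.Index(["MMM", "ABT"], name="Symbol"))

# Exact label: existing data is carried over
matched = frame.reindex(columns=["Price", "BookValue", "NewCol"])
# Label with a space: treated as a brand-new column
mismatched = frame.reindex(columns=["Price", "Book Value", "NewCol"])

print(matched["BookValue"].tolist())          # [26.668, 15.573]
print(mismatched["Book Value"].isna().all())  # True
```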

Finally, a DataFrame can also be reindexed on rows and columns at the same time, but that will be left as an exercise for you.
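
For readers who want to check their answer, one possible approach is to pass both index= and columns= in a single .reindex() call. This is only a sketch on a hypothetical two-row frame, not the only way to do it:

```python
import pandas as pd

# Hypothetical two-row frame standing in for subset
frame = pd.DataFrame(
    {"Price": [141.14, 39.60], "BookValue": [26.668, 15.573]},
    index=pd.Index(["MMM", "ABT"], name="Symbol"))

# Reindex rows and columns in one call
both = frame.reindex(index=["MMM", "FOO"], columns=["Price", "NewCol"])

print(both.shape)                # (2, 2)
print(both.loc["MMM", "Price"])  # 141.14
```

Only the cell at row MMM and column Price has matching data in the original. Every other cell in the result is NaN.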
