A DataFrame can have its index reset by using the .reset_index() method. A common use of this is to move the contents of a DataFrame object's index into one or more columns. The following code moves the symbols in the index of sp500 into a column and replaces the index with a default integer index. The result is a new DataFrame, not an in-place update. The code is as follows:
In [102]:
   # reset the index, moving it into a column
   reset_sp500 = sp500.reset_index()
   reset_sp500

Out[102]:
        Symbol                  Sector   Price  BookValue
   0       MMM             Industrials  141.14     26.668
   1       ABT             Health Care   39.60     15.573
   2      ABBV             Health Care   53.95      2.954
   3       ACN  Information Technology   79.79      8.326
   4       ACE              Financials  102.91     86.897
   ..      ...                     ...     ...        ...
   495    YHOO  Information Technology   35.02     12.768
   496     YUM  Consumer Discretionary   74.77      5.147
   497     ZMH             Health Care  101.84     37.181
   498    ZION              Financials   28.43     30.191
   499     ZTS             Health Care   30.53      2.150

   [500 rows x 4 columns]
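To make the "new DataFrame, not an in-place update" point concrete, here is a minimal sketch. It assumes a small hypothetical frame named df that stands in for sp500, built from three of the rows shown above; the names df and reset_df are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for sp500: three rows copied from the output above
df = pd.DataFrame(
    {
        "Sector": ["Industrials", "Health Care", "Health Care"],
        "Price": [141.14, 39.60, 53.95],
        "BookValue": [26.668, 15.573, 2.954],
    },
    index=pd.Index(["MMM", "ABT", "ABBV"], name="Symbol"),
)

# reset_index() returns a new object; df itself is not modified
reset_df = df.reset_index()

# The new frame has Symbol as an ordinary column and a default integer index
print(list(reset_df.columns))
print(list(reset_df.index))

# The original still has Symbol as its index, not as a column
print(df.index.name)
print(list(df.columns))
```

Because the original is left alone, the result must be assigned to a variable (as reset_sp500 is above) if it is to be used later.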
One or more columns can also be moved into the index. A common scenario is exhibited by the reset_sp500 variable we just created: the data may have been read in from a file with the symbols in a column, when we would really like them in the index. To do this, we can use the .set_index() method. The following code moves Symbol into the index of a new DataFrame:
In [103]:
   # move the Symbol column into the index
   reset_sp500.set_index('Symbol')

Out[103]:
                           Sector   Price  BookValue
   Symbol
   MMM                Industrials  141.14     26.668
   ABT                Health Care   39.60     15.573
   ABBV               Health Care   53.95      2.954
   ACN     Information Technology   79.79      8.326
   ACE                 Financials  102.91     86.897
   ...                        ...     ...        ...
   YHOO    Information Technology   35.02     12.768
   YUM     Consumer Discretionary   74.77      5.147
   ZMH                Health Care  101.84     37.181
   ZION                Financials   28.43     30.191
   ZTS                Health Care   30.53      2.150

   [500 rows x 3 columns]
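The file-loading scenario described above can be sketched as follows. This is an illustration under assumptions: the CSV text is a hypothetical three-row sample built from values shown earlier, and it is read from an in-memory string rather than a real file.

```python
import io

import pandas as pd

# Hypothetical file contents, with the symbols stored in an ordinary column
csv_text = """Symbol,Sector,Price,BookValue
MMM,Industrials,141.14,26.668
ABT,Health Care,39.60,15.573
ABBV,Health Care,53.95,2.954
"""

# Reading the data gives a default integer index and a Symbol column
raw = pd.read_csv(io.StringIO(csv_text))

# set_index() returns a new DataFrame indexed by Symbol
by_symbol = raw.set_index("Symbol")

# Rows can now be looked up by symbol
print(by_symbol.loc["ABT", "Price"])

# raw is unchanged and still has Symbol as a column
print("Symbol" in raw.columns)

# reset_index() reverses the operation
round_trip = by_symbol.reset_index()
print(round_trip.equals(raw))
```

Like .reset_index(), .set_index() leaves the object it is called on unchanged, so the two methods can be used to move labels back and forth between a column and the index.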
An index can be explicitly set using the .reindex() method. This method, given a list of values representing the new index, will create a new DataFrame using the specified values, and then align the data from the target into the new object. Labels in the original index that are not in the list are dropped, and labels in the list that are not in the original index produce rows filled with NaN. The following code demonstrates this by using a subset of sp500 and assigning a new index that contains a subset of the original index labels and an additional label, FOO:
In [104]:
   # get first four rows
   subset = sp500[:4].copy()
   subset

Out[104]:
                           Sector   Price  BookValue
   Symbol
   MMM                Industrials  141.14     26.668
   ABT                Health Care   39.60     15.573
   ABBV               Health Care   53.95      2.954
   ACN     Information Technology   79.79      8.326

In [105]:
   # reindex to have MMM, ABBV, and FOO index labels
   reindexed = subset.reindex(index=['MMM', 'ABBV', 'FOO'])
   # note that ABT and ACN are dropped and FOO has NaN values
   reindexed

Out[105]:
                Sector   Price  BookValue
   Symbol
   MMM     Industrials  141.14     26.668
   ABBV    Health Care   53.95      2.954
   FOO             NaN     NaN        NaN
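When reindexing a larger frame, it can help to check which labels were dropped and which were newly introduced. The following sketch does this with Index.difference(); the subset frame is rebuilt here by hand from the four rows shown above so that the example is self-contained, and the names dropped and added are illustrative.

```python
import pandas as pd

# Hand-built copy of the four-row subset shown above
subset = pd.DataFrame(
    {
        "Sector": ["Industrials", "Health Care", "Health Care",
                   "Information Technology"],
        "Price": [141.14, 39.60, 53.95, 79.79],
        "BookValue": [26.668, 15.573, 2.954, 8.326],
    },
    index=pd.Index(["MMM", "ABT", "ABBV", "ACN"], name="Symbol"),
)

reindexed = subset.reindex(index=["MMM", "ABBV", "FOO"])

# Labels present in the original index but absent from the new one
dropped = subset.index.difference(reindexed.index)
print(list(dropped))

# Labels present in the new index but absent from the original
added = reindexed.index.difference(subset.index)
print(list(added))

# Every value in a newly introduced row is NaN
print(reindexed.loc["FOO"].isna().all())

# subset itself still has all four rows
print(len(subset))
```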
Reindexing can also be done upon the columns. The following reindexes the columns of subset:
In [106]:
   # reindex columns
   subset.reindex(columns=['Price', 'Book Value', 'NewCol'])

Out[106]:
            Price  Book Value  NewCol
   Symbol
   MMM     141.14         NaN     NaN
   ABT      39.60         NaN     NaN
   ABBV     53.95         NaN     NaN
   ACN      79.79         NaN     NaN
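pandas produces this result by creating a new DataFrame with the specified columns and then aligning the data for those columns from subset into the new object. Because subset did not have a NewCol column, its values are filled with NaN. The same applies to Book Value: the existing column is named BookValue, without a space, so the label does not match and that column is also filled with NaN.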
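Column labels are matched exactly during reindexing, so a label that differs only by a space is treated as a new, empty column. The sketch below contrasts the label Book Value with the existing label BookValue; the subset frame is rebuilt by hand from the values shown earlier, and the names with_space and exact are illustrative.

```python
import pandas as pd

# Hand-built copy of the four-row subset shown earlier
subset = pd.DataFrame(
    {
        "Sector": ["Industrials", "Health Care", "Health Care",
                   "Information Technology"],
        "Price": [141.14, 39.60, 53.95, 79.79],
        "BookValue": [26.668, 15.573, 2.954, 8.326],
    },
    index=pd.Index(["MMM", "ABT", "ABBV", "ACN"], name="Symbol"),
)

# 'Book Value' (with a space) matches no existing column
with_space = subset.reindex(columns=["Price", "Book Value", "NewCol"])
print(with_space["Book Value"].isna().all())

# 'BookValue' matches the existing column, so its data is carried over
exact = subset.reindex(columns=["Price", "BookValue", "NewCol"])
print(exact["BookValue"].tolist())

# NewCol exists in neither case, so it is all NaN
print(exact["NewCol"].isna().all())
```

A column of unexpected NaN values after a reindex is often a sign of a mistyped label rather than missing data.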
Finally, a DataFrame can also be reindexed on rows and columns at the same time, but that will be left as an exercise for you.