Selecting rows and values of a DataFrame using the index

Elements of an array or Series are selected using the [] operator. DataFrame overloads [] to select columns instead of rows, except for a specific case of slicing. Therefore, most operations of selection of one or more rows in a DataFrame, require alternate methods to using [].

Understanding this is important in pandas, as it is a common mistake is try and select rows using [] due to familiarity with other languages or data structures. When doing so, errors are often received, and can often be difficult to diagnose without realizing [] is working along a completely different axis than with a Series object.

Row selection using the index on a DataFrame then breaks down to the following general categories of operations:

  • Slicing using the [] operator
  • Label or location based lookup using .loc, .iloc, and .ix
  • Scalar lookup by label or location using .at and .iat

We will briefly examine each of these techniques and attributes. Remember, all of these are working against the content of the index of the DataFrame. There is no involvement with data in the columns in the selection of the rows. We will cover that in the next section on Boolean selection.

Slicing using the [] operator

Slicing a DataFrame across its index is syntactically identical to performing the same on a Series. Because of this, we will not go into the details of the various permutations of slices in this section, and only give representative examples applied to a DataFrame.

Slicing works along both positions and labels. The following code demonstrates several examples of slicing by position:

In [34]:
   # first five rows
   sp500[:5]

Out[34]:
                           Sector   Price  Book Value
   Symbol                                            
   MMM                Industrials  141.14      26.668
   ABT                Health Care   39.60      15.573
   ABBV               Health Care   53.95       2.954
   ACN     Information Technology   79.79       8.326
   ACE                 Financials  102.91      86.897

The following code returns rows starting with the ABT label through the ACN label:

In [35]:
   # ABT through ACN labels
   sp500['ABT':'ACN']

Out[35]:
                           Sector  Price  Book Value
   Symbol                                           
   ABT                Health Care  39.60      15.573
   ABBV               Health Care  53.95       2.954
   ACN     Information Technology  79.79       8.326

In general, although slicing of a DataFrame has its uses, high performance systems tend to shy away from it and use other methods. Additionally, the slice notation for rows on a DataFrame using integers can be confusing, as it looks like accessing columns by position, and hence can lead to subtle bugs.

Selecting rows by index label and location: .loc[] and .iloc[]

Rows can be retrieved via an index label value using .loc[]. This is shown in the following code:

In [36]:
   # get row with label MMM
   # returned as a Series
   sp500.loc['MMM']

Out[36]:
   Sector        Industrials
   Price              141.14
   Book Value         26.668
   Name: MMM, dtype: object

In [37]:
   # rows with label MMM and MSFT
   # this is a DataFrame result
   sp500.loc[['MMM', 'MSFT']]

Out[37]:
                           Sector   Price  Book Value
   Symbol                                            
   MMM                Industrials  141.14      26.668
   MSFT    Information Technology   40.12      10.584

Rows can also be retrieved by location using .iloc[]:

In [38]:
   # get rows in locations 0 and 2
   sp500.iloc[[0, 2]]

Out[38]:
                Sector   Price  Book Value
   Symbol                                 
   MMM     Industrials  141.14      26.668
   ABBV    Health Care   53.95       2.954

It is possible to look up the location in the index of a specific label value, which can then be used to retrieve the row(s):

In [39]:
   # get the location of MMM and A in the index
   i1 = sp500.index.get_loc('MMM')
   i2 = sp500.index.get_loc('A')
   "{0} {1}".format(i1, i2)

Out[39]:
   '0 10'

In [40]:
   # and get the rows
   sp500.iloc[[i1, i2]]

Out[40]:
                Sector   Price  Book Value
   Symbol                                 
   MMM     Industrials  141.14      26.668
   A       Health Care   56.18      16.928

Selecting rows by index label and/or location: .ix[]

Like a Series, a DataFrame also contains an .ix property that can be used to lookup rows, either by index label or location, essentially combining .loc and .iloc in one. The following looks up rows by index label by passing a list of nonintegers:

In [41]:
   # by label
   sp500.ix[['MSFT', 'ZTS']]

Out[41]:
                           Sector  Price  Book Value
   Symbol                                           
   MSFT    Information Technology  40.12      10.584
   ZTS                Health Care  30.53       2.150

Location-based lookup can be performed by passing a list of integers:

In [42]:
   # by location
   sp500.ix[[10, 200, 450]]

Out[42]:
                     Sector  Price  Book Value
   Symbol                                     
   A            Health Care  56.18      16.928
   GIS     Consumer Staples  53.81      10.236
   TRV           Financials  92.86      73.056

In general, use of .ix is not preferred due to potential confusion, and use of .loc and .iloc is recommended and gives higher performance.

Scalar lookup by label or location using .at[] and .iat[]

Scalar values can be looked up by label using .at, by passing both the row label and then the column name/value:

In [43]:
   # by label in both the index and column
   sp500.at['MMM', 'Price']

Out[43]:
   141.14

Scalar values can also be looked up by location using .iat by passing both the row location and then the column location. This is the preferred method of accessing single values and gives the highest performance.

In [44]:
   # by location.  Row 0, column 1
   sp500.iat[0, 1]

Out[44]:
   141.14
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.183.138