Slicing a Series

In Chapter 3, NumPy for pandas, we covered techniques for NumPy array slicing. pandas Series objects also support slicing and override the slicing operators to perform their magic on Series data. Just like with NumPy arrays, you can pass a slice object to the [] operator of a Series to get the specified values. Slices also work with the .loc[] and .iloc[] indexers, as well as with .ix in older versions of pandas (.ix was deprecated in pandas 0.20 and removed in pandas 1.0).
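As a small sketch of the point above, the colon syntax is shorthand for a Python slice object, so an explicit slice(start, stop, step) can be passed to an indexer instead. The Series matches the one defined in the next cell; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

# Colon syntax and an explicit slice object select the same positions
by_syntax = s.iloc[0:6:2]
by_object = s.iloc[slice(0, 6, 2)]
print(by_syntax.equals(by_object))  # True

# .loc[] slices by label instead, and includes the end label
print(s.loc[12:14].tolist())  # [102, 103, 104]
```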

To demonstrate slicing, we will use the following Series:

In [83]:
   import numpy as np
   import pandas as pd
   # a Series to use for slicing
   # using index labels not starting at 0 to demonstrate
   # position-based slicing
   s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))
   s

Out[83]:
   10    100
   11    101
   12    102
   13    103
   14    104
   15    105
   16    106
   17    107
   18    108
   19    109
   dtype: int64

The slice syntax is identical to that of NumPy arrays. The following example selects rows from the Series by position, starting from and including 0, up to but not including 6, and stepping by 2 (every other item):

In [84]:
   # items at position 0, 2, 4
   s[0:6:2]

Out[84]:
   10    100
   12    102
   14    104
   dtype: int64
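To illustrate that the syntax really is the NumPy one, the following sketch applies the same start:stop:step to the underlying array and, positionally via .iloc[], to the Series. The names values, arr_part, and ser_part are hypothetical.

```python
import numpy as np
import pandas as pd

values = np.arange(100, 110)
s = pd.Series(values, index=np.arange(10, 20))

# The same start:stop:step on the raw array and, by position, on the Series
arr_part = values[0:6:2]
ser_part = s.iloc[0:6:2]

print(arr_part.tolist())  # [100, 102, 104]
print(ser_part.tolist())  # [100, 102, 104]
# Unlike the array, the Series slice keeps its index labels
print(ser_part.index.tolist())  # [10, 12, 14]
```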

This produces the same result as passing the list of positions 0, 2, and 4 to .iloc[]:

In [85]:
   # equivalent to
   s.iloc[[0, 2, 4]]

Out[85]:
   10    100
   12    102
   14    104
   dtype: int64
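One way to see where the list [0, 2, 4] comes from: Python's range() follows the same start/stop/step rules as a slice, so it can list the positions a slice will pick. This is a sketch; positions, by_list, and by_slice are illustrative names.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

# range() uses the same start/stop/step rules as a slice
positions = list(range(0, 6, 2))
print(positions)  # [0, 2, 4]

by_list = s.iloc[positions]
by_slice = s.iloc[0:6:2]
print(by_list.equals(by_slice))  # True
```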

A useful feature of slicing is that each component of the slice (start, stop, and step) is optional. The following example omits the start value and selects all items at positions 0 through 4. This is also a convenient shorthand for the .head() method of the Series:

In [86]:
   # first five by slicing, same as .head(5)
   s[:5]

Out[86]:
   10    100
   11    101
   12    102
   13    103
   14    104
   dtype: int64
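A quick check of the .head() equivalence, written as a sketch with .iloc[] to make the positional intent explicit (first_five is a hypothetical name):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

first_five = s.iloc[:5]
print(first_five.equals(s.head(5)))  # True
# .head() with no argument also returns five rows by default
print(first_five.equals(s.head()))  # True
```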

Flipping this around, you can select all the elements from a particular position to the end of the Series:

In [87]:
   # position 4 (the fifth item) to the end
   s[4:]

Out[87]:
   14    104
   15    105
   16    106
   17    107
   18    108
   19    109
   dtype: int64
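Because positions are 0-based, s[4:] starts at the fifth item and returns len(s) - 4 rows. The sketch below checks this against .tail(); from_four is an illustrative name.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

from_four = s.iloc[4:]
print(len(from_four))  # 6, i.e. len(s) - 4
print(from_four.equals(s.tail(len(s) - 4)))  # True
print(from_four.iloc[0])  # 104: position 4 is the fifth item
```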

A step can be used in both scenarios, as can be seen here:

In [88]:
   # every other item in the first five positions
   s[:5:2]

Out[88]:
   10    100
   12    102
   14    104
   dtype: int64

In [89]:
   # every other item starting at position 4
   s[4::2]

Out[89]:
   14    104
   16    106
   18    108
   dtype: int64
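To see which positions a stepped slice picks before applying it, the same slice can be applied to the list of all positions. This is a sketch; all_positions is a hypothetical name.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))
all_positions = list(range(len(s)))  # [0, 1, ..., 9]

# Positions selected by s[:5:2] and s[4::2]
print(all_positions[:5:2])  # [0, 2, 4]
print(all_positions[4::2])  # [4, 6, 8]
print(s.iloc[4::2].tolist())  # [104, 106, 108]
```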

An interesting usage of slicing is to specify a negative step. The following code returns the reverse of the Series:

In [90]:
   # reverse the Series
   s[::-1]

Out[90]:
   19    109
   18    108
   17    107
   16    106
   15    105
   14    104
   13    103
   12    102
   11    101
   10    100
   dtype: int64
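Reversing by position keeps each label paired with its value. Since this particular index happens to be increasing, the reversed Series matches a descending sort of the labels, as this sketch checks (reversed_s is an illustrative name):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

reversed_s = s.iloc[::-1]
print(reversed_s.index[:3].tolist())  # [19, 18, 17]
# Only true here because the index is increasing
print(reversed_s.equals(s.sort_index(ascending=False)))  # True
```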

Alternatively, if we want every other element in reverse, starting with position 4, we can execute the following code:

In [91]:
   # every other starting at position 4, in reverse
   s[4::-2]

Out[91]:
   14    104
   12    102
   10    100
   dtype: int64
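With a negative step, omitting the stop means "run to the beginning". Supplying an explicit stop of 0 behaves differently, because the stop is still exclusive. A sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

print(list(range(len(s)))[4::-2])  # [4, 2, 0]
print(s.iloc[4::-2].tolist())  # [104, 102, 100]
# An explicit stop of 0 is exclusive, so position 0 is dropped
print(s.iloc[4:0:-2].tolist())  # [104, 102]
```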

Negative values for the start and end of a slice have a special meaning. If the Series has n elements, a negative start or end is counted back from the end, so the slice selects positions n + start up to but not including n + end. This sounds a little confusing, but is easy to understand with the following example:

In [92]:
   # :-2 means positions 0 up to, but not including, 10 - 2 = 8
   s[:-2]

Out[92]:
   10    100
   11    101
   12    102
   13    103
   14    104
   15    105
   16    106
   17    107
   dtype: int64
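The n + end arithmetic can be checked directly: for this 10-element Series, a stop of -2 resolves to position 8. A sketch with illustrative names:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

n = len(s)  # 10
stop = -2
print(n + stop)  # 8
print(s.iloc[:stop].equals(s.iloc[:n + stop]))  # True
```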

What we have discovered is a shorthand for selecting all of the items except for the last k, in this case k being 2 (-2 as passed to the slice). We can also pick the last k items of a Series by using -k as the start and omitting the end component of the slice. This is equivalent to using .tail(k), but requires a little less typing (and this is a good thing):

In [93]:
   # last three items of the series
   s[-3:]

Out[93]:
   17    107
   18    108
   19    109
   dtype: int64
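A quick sketch confirming the .tail() equivalence (last_three is an illustrative name):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

last_three = s.iloc[-3:]
print(last_three.equals(s.tail(3)))  # True
print(last_three.index.tolist())  # [17, 18, 19]
```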

These can be combined, as in the following example, which returns the last four rows of the Series except for the final one:

In [94]:
   # equivalent to s.tail(4).head(3)
   s[-4:-1]

Out[94]:
   16    106
   17    107
   18    108
   dtype: int64
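Translating the negative bounds into positive positions with the n + start and n + end rule gives positions 6 up to but not including 9. This sketch checks both forms against the chained .tail()/.head() call (combined is an illustrative name):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))
n = len(s)

combined = s.iloc[-4:-1]
print(combined.equals(s.tail(4).head(3)))  # True
# n - 4 = 6 up to but not including n - 1 = 9
print(combined.equals(s.iloc[n - 4:n - 1]))  # True
```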

An important thing to keep in mind when using slicing is that, unless Copy-on-Write is enabled (it is the default starting with pandas 3.0), the result of a slice is a view into the original Series. Modifying values through the slice then modifies the original Series as well. Consider the following example, which selects the first two elements of the Series and stores them in a new variable:

In [95]:
   s_copy = s.copy()       # preserve s
   first_two = s_copy[:2]  # slice with the first two rows
   first_two

Out[95]:
   10    100
   11    101
   dtype: int64

Without Copy-on-Write, assigning a value to an element of the slice also changes the value in the original Series:

In [96]:
   # change the item with label 11 to 1000
   first_two[11] = 1000
   # and see it in the source
   s_copy

Out[96]:
   10     100
   11    1000
   12     102
   13     103
   14     104
   15     105
   16     106
   17     107
   18     108
   19     109
   dtype: int64

Note

Keep this in mind: this behavior is powerful, but if you expect slicing to produce a copy of the data, you will likely end up tracking down some hard-to-find bugs. Call .copy() on the slice when you need independent data.
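A minimal sketch of the defensive pattern: calling .copy() on the slice makes it independent of the source, whether or not Copy-on-Write is enabled, so edits to it never reach the original Series. The name first_two is illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

# .copy() gives the slice its own data
first_two = s.iloc[:2].copy()
first_two.iloc[1] = 1000

print(first_two.tolist())  # [100, 1000]
print(s.iloc[1])  # 101: the original is unchanged
```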

Slicing can be performed on Series objects with a noninteger index. The following Series will be used to demonstrate this:

In [97]:
   # used to demonstrate the next two slices
   s = pd.Series(np.arange(0, 5),
                 index=['a', 'b', 'c', 'd', 'e'])
   s

Out[97]:
   a    0
   b    1
   c    2
   d    3
   e    4
   dtype: int64

Slicing with integer values will extract items based on position:

In [98]:
   # slices by position, as the index contains strings
   s[1:3]

Out[98]:
   b    1
   c    2
   dtype: int64
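When the intent is positional, .iloc[] states it explicitly and does not depend on the type of the index. A sketch using the same Series (by_position is an illustrative name):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 5), index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[1:3]
print(by_position.index.tolist())  # ['b', 'c']
print(by_position.tolist())  # [1, 2]
```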

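With a noninteger index, it is also possible to slice with values of the same type as the index. Note that, unlike positional slicing, slicing by label includes the end label, so 'd' is part of the result: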

In [99]:
   # this slices by the strings in the index
   s['b':'d']

Out[99]:
   b    1
   c    2
   d    3
   dtype: int64
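The inclusive end is the key difference between label and position slicing. This sketch converts the labels to positions with Index.get_loc() (which returns an integer for a unique index) and adds 1 to the stop to reproduce the label slice with .iloc[]:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 5), index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['b':'d']
start = s.index.get_loc('b')     # 1
stop = s.index.get_loc('d') + 1  # 4; +1 because the label end is inclusive
by_position = s.iloc[start:stop]

print(by_label.equals(by_position))  # True
print(by_label.tolist())  # [1, 2, 3]
```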