In Chapter 3, NumPy for pandas, we covered techniques for NumPy array slicing. pandas Series objects also support slicing and override the slicing operators to perform their magic on Series data. Just like with NumPy arrays, you can pass a slice object to the [] operator of the Series to get the specified values. Slices also work with the .loc[] and .iloc[] indexers. (Older versions of pandas also provided .ix, which was deprecated in pandas 0.20 and removed in pandas 1.0.)
To demonstrate slicing, we will use the following Series:
In [83]:
# a Series to use for slicing
# using index labels not starting at 0 to demonstrate
# position based slicing
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))
s

Out[83]:
10    100
11    101
12    102
13    103
14    104
15    105
16    106
17    107
18    108
19    109
dtype: int64
The slice syntax is identical to that of NumPy arrays. The following example selects rows from the Series by position, starting from and including 0, up to but not including 6, and stepping by 2 (every other item):
In [84]:
# items at position 0, 2, 4
s[0:6:2]

Out[84]:
10    100
12    102
14    104
dtype: int64
This is functionally equivalent to the following code:
In [85]:
# equivalent to
s.iloc[[0, 2, 4]]

Out[85]:
10    100
12    102
14    104
dtype: int64
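Because this index holds integers, plain [] slicing can look ambiguous. The following minimal sketch rebuilds the Series from In [83] and checks that the integer slice passed to [] selects the same rows as the explicit positional forms built with .iloc:

```python
import numpy as np
import pandas as pd

# same Series as In [83]: labels 10..19, values 100..109
s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

by_brackets = s[0:6:2]            # integer slice with [] is positional
by_iloc_slice = s.iloc[0:6:2]     # explicit positional slice
by_iloc_list = s.iloc[[0, 2, 4]]  # explicit list of positions

print(by_brackets.equals(by_iloc_slice))  # True
print(by_brackets.equals(by_iloc_list))   # True
print(by_brackets.index.tolist())         # [10, 12, 14]
```

Spelling out .iloc makes the positional intent explicit, which avoids any confusion with the integer labels 10 through 19.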
A useful feature of slicing is that parts of the slice are optional. The following example omits the start value and selects all items in positions 0 through 4. This is also a convenient shorthand for the .head() method of the Series:
In [86]:
# first five by slicing, same as .head(5)
s[:5]

Out[86]:
10    100
11    101
12    102
13    103
14    104
dtype: int64
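A quick sketch, again rebuilding the Series from In [83], confirms the equivalence with .head(5) and shows that a stop value past the end of the Series is simply clipped rather than raising an error:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

first_five = s[:5]
print(first_five.equals(s.head(5)))   # True
print(first_five.equals(s.iloc[:5]))  # True

# a stop past the end is clipped, just as head() returns at most len(s) rows
print(len(s[:20]), len(s.head(20)))   # 10 10
```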
Flipping this around, you can select all the elements from a particular position to the end of the Series:
In [87]:
# position 4 (the fifth item) to the end
s[4:]

Out[87]:
14    104
15    105
16    106
17    107
18    108
19    109
dtype: int64
A step can be used in both scenarios, as can be seen here:
In [88]:
# every other item in the first five positions
s[:5:2]

Out[88]:
10    100
12    102
14    104
dtype: int64

In [89]:
# every other item starting at position 4
s[4::2]

Out[89]:
14    104
16    106
18    108
dtype: int64
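How the omitted bounds are filled in is decided by Python's slice semantics. The built-in slice.indices(n) method reports the resolved (start, stop, step) for a sequence of length n, which this sketch uses alongside the same Series:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))
n = len(s)  # 10

# omitted bounds resolve to the beginning (0) or the end (n) of the Series
print(slice(None, 5, 2).indices(n))  # (0, 5, 2)
print(slice(4, None, 2).indices(n))  # (4, 10, 2)

first_five_alt = s.iloc[:5:2]  # positions 0, 2, 4
from_four_alt = s.iloc[4::2]   # positions 4, 6, 8
print(first_five_alt.index.tolist())  # [10, 12, 14]
print(from_four_alt.index.tolist())   # [14, 16, 18]
```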
An interesting usage of slicing is to specify a negative step. The following code returns the reverse of the Series:
In [90]:
# reverse the Series
s[::-1]

Out[90]:
19    109
18    108
17    107
16    106
15    105
14    104
13    103
12    102
11    101
10    100
dtype: int64
Alternatively, we can execute the following code if we want every other element, starting with position 4, in reverse:
In [91]:
# every other starting at position 4, in reverse
s[4::-2]

Out[91]:
14    104
12    102
10    100
dtype: int64
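With a negative step, Python resolves the omitted bounds differently: an omitted start becomes the last position, and an omitted stop means "past the first position" (reported as -1 by slice.indices). This sketch shows the resolved bounds for In [90] and In [91]:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))
n = len(s)  # 10

print(slice(None, None, -1).indices(n))  # (9, -1, -1)
print(slice(4, None, -2).indices(n))     # (4, -1, -2)

# the resolved bounds produce positions 4, 2, 0
positions = list(range(*slice(4, None, -2).indices(n)))
print(positions)                 # [4, 2, 0]
print(s.iloc[4::-2].tolist())    # [104, 102, 100]
```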
Negative values for the start and end of a slice have special meaning: they are counted from the end of the Series. If the Series has n elements, then negative values for the start and end of the slice represent positions n + start up to, but not including, n + end. This sounds a little confusing, but can be understood simply with the following example:
In [92]:
# :-2, which means positions 0 up to but not including (10-2) = 8
s[:-2]

Out[92]:
10    100
11    101
12    102
13    103
14    104
15    105
16    106
17    107
dtype: int64
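The n + start and n + end rule can be checked directly with slice.indices(n), which converts negative bounds into ordinary positions. A short sketch with n = 10:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))
n = len(s)  # 10

print(slice(None, -2).indices(n))  # (0, 8, 1)   -> stop is 10 + (-2)
print(slice(-4, -1).indices(n))    # (6, 9, 1)   -> 10 + (-4), 10 + (-1)

all_but_last_two = s.iloc[:-2]
print(len(all_but_last_two))                 # 8
print(all_but_last_two.index.tolist()[-1])   # 17
```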
What we have discovered is a shorthand for selecting all of the items except for the last n, in this case n being 2 (-2 as passed to the slice). We can also pick the last n items in a Series by using -n as the start and omitting the end component of the slice. This is equivalent to using .tail(n), but takes a little less typing (and this is a good thing):
In [93]:
# last three items of the series
s[-3:]

Out[93]:
17    107
18    108
19    109
dtype: int64
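Both shorthands have method counterparts. This sketch checks s[-3:] against .tail(3), and s[:-2] against .head(-2); a negative n passed to .head() returns all items except the last |n|:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

last_three = s[-3:]
print(last_three.equals(s.tail(3)))  # True

# head() with a negative n drops the last |n| items, like s[:-2]
all_but_two = s[:-2]
print(all_but_two.equals(s.head(-2)))  # True
```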
These can be combined, as in the following example, which returns all but the last row of the last four rows of the Series:
In [94]:
# equivalent to s.tail(4).head(3)
s[-4:-1]

Out[94]:
16    106
17    107
18    108
dtype: int64
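The equivalence claimed in the comment of In [94] can be verified with a short sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

combined = s[-4:-1]
print(combined.equals(s.tail(4).head(3)))  # True
print(combined.equals(s.iloc[-4:-1]))      # True
print(combined.index.tolist())             # [16, 17, 18]
```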
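An important thing to keep in mind when using slicing is that, without Copy-on-Write (the default behavior before pandas 3.0), the result of the slice is a view into the original Series. Modifying values through the result of the slice will then modify the original Series. With Copy-on-Write, which pandas 3.0 enables by default, the slice behaves like a copy and the original is left unchanged. Consider the following example, which selects the first two elements in the Series and stores them in a new variable: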
In [95]:
copy = s.copy()  # preserve s
sliced = copy[:2]  # slice with first two rows
sliced

Out[95]:
10    100
11    101
dtype: int64

Now, assigning a value to an element of the slice changes the value in the original Series (when Copy-on-Write is not enabled):

In [96]:
# change item with label 11 to 1000
sliced[11] = 1000
# and see it in the source
copy

Out[96]:
10     100
11    1000
12     102
13     103
14     104
15     105
16     106
17     107
18     108
19     109
dtype: int64
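Because the view behavior depends on the pandas version and on whether Copy-on-Write is enabled, a portable way to get a slice that can be modified independently is to call .copy() on it explicitly. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

# .copy() makes the slice independent of s in every pandas version,
# with or without Copy-on-Write
first_two = s.iloc[:2].copy()
first_two.loc[11] = 1000  # label-based assignment on the copy only

print(first_two.tolist())  # [100, 1000]
print(s.loc[11])           # 101 (s is unchanged)
```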
Slicing can also be performed on Series objects with a noninteger index. The following Series will be used to demonstrate this:
In [97]:
# used to demonstrate the next two slices
s = pd.Series(np.arange(0, 5), index=['a', 'b', 'c', 'd', 'e'])
s

Out[97]:
a    0
b    1
c    2
d    3
e    4
dtype: int64
Slicing with integer values will extract items based on position:
In [98]:
# slices by position since the index holds strings
s[1:3]

Out[98]:
b    1
c    2
dtype: int64
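As with the integer-labeled Series, the positional result of In [98] matches an explicit .iloc slice, as this sketch shows:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 5), index=['a', 'b', 'c', 'd', 'e'])

by_pos = s[1:3]
print(by_pos.equals(s.iloc[1:3]))  # True
print(by_pos.index.tolist())       # ['b', 'c']
```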
With a noninteger index, it is also possible to slice with values of the same type as the index. Note that, unlike positional slicing, slicing by label includes the end label in the result:
In [99]:
# this slices by the strings in the index
s['b':'d']

Out[99]:
b    1
c    2
d    3
dtype: int64
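The difference in end-point handling is easy to miss, so this sketch puts a label slice and a positional slice that start at the same item side by side, using .loc and .iloc to make each intent explicit:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 5), index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['b':'d']  # label slice: end label 'd' is included
by_pos = s.iloc[1:3]       # positional slice: end position 3 is excluded

print(by_label.index.tolist())       # ['b', 'c', 'd']
print(by_pos.index.tolist())         # ['b', 'c']
print(s['b':'d'].equals(by_label))   # True
```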