Values in a Series object can be retrieved with the [] operator by passing either a single index label or a list of index labels. The following code retrieves the value associated with the index label 'a' of the s3 series defined earlier:
```
In [28]: # single item lookup
         s3['a']
Out[28]: 1
```
Accessing this Series with an integer value performs a zero-based positional lookup of the value:
```
In [29]: # lookup by position since the index is not an integer
         s3[1]
Out[29]: 2
```
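This happens because pandas sees that the key is an integer while the index is not integer-based, so it looks the value up by position rather than by index label. Recent pandas releases (2.1 and later) emit a FutureWarning for this positional fallback, because [] will eventually always treat integer keys as labels.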
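Because the behavior of [] with an integer key depends on the index type, it is safer to state the intent explicitly. The following minimal sketch assumes pandas 1.0 or later and rebuilds s3 from the contents shown later in this section. It selects the same value once by label with .loc[] and once by position with .iloc[]; both accessors are covered below.

```python
import pandas as pd

# s3 as shown later in this section: labels 'a', 'b', 'c' holding 1, 2, 3
s3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Explicit label lookup: unambiguous regardless of the index type
by_label = s3.loc['b']

# Explicit position lookup: always zero-based, never interpreted as a label
by_position = s3.iloc[1]

print(by_label, by_position)
```

Neither line depends on pandas guessing whether 1 means a label or a position, so neither triggers the deprecation warning.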
To retrieve multiple items, pass a list of index labels to the [] operator. Instead of a single value, the result is a new Series containing both the selected index labels and their values, with the data copied from the original Series:
```
In [30]: # multiple items
         s3[['a', 'c']]
Out[30]:
a    1
c    3
dtype: int64
```
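To check the claim that selecting with a list of labels copies the data, the following sketch (again rebuilding s3 from its contents shown in this section) uses NumPy's np.shares_memory to confirm that the new Series does not share its value buffer with s3.

```python
import numpy as np
import pandas as pd

s3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Selecting with a list of labels builds a new Series
subset = s3[['a', 'c']]

# The subset has its own index and its own copy of the values
print(subset.index.tolist())
print(np.shares_memory(subset.to_numpy(), s3.to_numpy()))
```

Because the values are copied, later changes to subset do not affect s3.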
To look more closely at how integers are used for lookup by either label or position, consider the following Series, whose index labels are integers that do not start at 0:
```
In [31]: # series with an integer index, but not starting with 0
         s5 = pd.Series([1, 2, 3], index=[10, 11, 12])
         s5
Out[31]:
10    1
11    2
12    3
dtype: int64
```
The following code looks up the value at index label 11. A label-based lookup is performed because both the index and the value passed to the [] operator are integers:
```
In [32]: # by value as value passed and index are both integer
         s5[11]
Out[32]: 2
```
If this were performed as a zero-based position lookup, an exception would be raised, because the Series contains only three items.
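Both halves of that statement can be checked directly. This sketch reuses s5 from In [31] and forces a positional lookup with the .iloc[] accessor (covered below) to show the exception that position 11 would cause.

```python
import pandas as pd

s5 = pd.Series([1, 2, 3], index=[10, 11, 12])

# With an integer index, [] treats 11 as a label
label_value = s5[11]

# Forcing position 11 fails: the Series has only 3 elements
try:
    s5.iloc[11]
    position_error = None
except IndexError as exc:
    position_error = exc

print(label_value, type(position_error).__name__)
```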
To avoid confusion between label-based and position-based lookup, a lookup by index label can be enforced with the .loc[] accessor:
```
In [33]: # force lookup by index label
         s5.loc[12]
Out[33]: 3
```
A lookup by position can be enforced with the .iloc[] accessor:
```
In [34]: # forced lookup by location / position
         s5.iloc[1]
Out[34]: 2
```
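The difference between the two accessors is easiest to see when the integer labels run in the opposite order to the positions. The sketch below uses a hypothetical Series, s_rev, that does not appear elsewhere in this section; the same key 0 selects different elements depending on the accessor.

```python
import pandas as pd

# Hypothetical series whose integer labels run opposite to the positions
s_rev = pd.Series(['x', 'y', 'z'], index=[2, 1, 0])

label_zero = s_rev.loc[0]      # value stored under label 0
position_zero = s_rev.iloc[0]  # first element by position
plain_zero = s_rev[0]          # integer index, so [] uses the label

print(label_zero, position_zero, plain_zero)
```

Because the index is integer-based, plain [] behaves like .loc[] here, which is exactly the ambiguity the explicit accessors remove.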
Both .loc[] and .iloc[] also accept lists, as shown in the following examples:
```
In [35]: # multiple items by label (loc)
         s5.loc[[12, 10]]
Out[35]:
12    3
10    1
dtype: int64

In [36]: # multiple items by location / position (iloc)
         s5.iloc[[0, 2]]
Out[36]:
10    1
12    3
dtype: int64
```
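If any position in a list passed to .iloc[] is out of bounds, an exception is raised. The following output, from a pandas version before 1.0, shows that .loc[] behaved differently: a label that did not exist produced NaN as the value for that label. Since pandas 1.0, .loc[] raises a KeyError when any label in the list is missing, and .reindex() is the way to get NaN for missing labels: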
```
In [37]: # -1 and 15 will be NaN
         s5.loc[[12, -1, 15]]
Out[37]:
12     3
-1   NaN
15   NaN
dtype: float64
```
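On current pandas (1.0 or later, which this sketch assumes), the three behaviors look like this: .loc[] with a missing label raises KeyError, .reindex() fills missing labels with NaN as In [37] once did, and .iloc[] with an out-of-bounds position raises IndexError.

```python
import pandas as pd

s5 = pd.Series([1, 2, 3], index=[10, 11, 12])

# pandas >= 1.0: .loc[] with any missing label in a list raises KeyError
try:
    s5.loc[[12, -1, 15]]
    loc_error = None
except KeyError as exc:
    loc_error = exc

# .reindex() keeps the old result: missing labels become NaN (float64)
filled = s5.reindex([12, -1, 15])
print(filled)

# .iloc[] with an out-of-bounds position in a list raises IndexError
try:
    s5.iloc[[0, 5]]
    iloc_error = None
except IndexError as exc:
    iloc_error = exc
```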
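A Series also has an .ix property that can be used to look up items either by label or by zero-based array position. Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the .ix examples that follow run only on those older versions; current code uses .loc[] and .iloc[] instead. To demonstrate .ix, let's revisit the s3 series: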
```
In [38]: # reminder of the contents of s3
         s3
Out[38]:
a    1
b    2
c    3
dtype: int64
```
The following example looks up by index label:
```
In [39]: # label based lookup
         s3.ix[['a', 'c']]
Out[39]:
a    1
c    3
dtype: int64
```
The following example looks up by position:
```
In [40]: # position based lookup
         s3.ix[[1, 2]]
Out[40]:
b    2
c    3
dtype: int64
```
This becomes confusing if the index is integer-based and you pass a list of integers to .ix. Because the keys and the index are of the same type, the lookup is by index label rather than by position:
```
In [41]: # this looks up by label and not position
         # note that 1,2 have NaN as those labels do not exist
         # in the index
         s5.ix[[1, 2, 10, 11]]
Out[41]:
1     NaN
2     NaN
10      1
11      2
dtype: float64
```
Here .ix switched to a lookup by label value, and since there are no elements with the labels 1 and 2, NaN was returned for them.
A fundamental difference between a NumPy ndarray and a pandas Series is the ability of a Series to automatically align data from another Series based on label values before performing an operation.
We will examine alignment using the following two Series objects:
```
In [42]: s6 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
         s6
Out[42]:
a    1
b    2
c    3
d    4
dtype: int64

In [43]: s7 = pd.Series([4, 3, 2, 1], index=['d', 'c', 'b', 'a'])
         s7
Out[43]:
d    4
c    3
b    2
a    1
dtype: int64
```
The following code adds the values in the two series:
```
In [44]: # add them
         s6 + s7
Out[44]:
a    2
b    4
c    6
d    8
dtype: int64
```
Adding two Series objects differs from adding arrays: pandas first aligns the data on index label values instead of simply applying the operation to elements in the same position. This is especially powerful when combining data by label, because the data does not first have to be put in the same order manually.
The result of s6 + s7 is very different from what adding two plain NumPy arrays would produce. A NumPy ndarray adds the items in identical positions of each array, which gives different values:
```
In [45]: # see how different from adding numpy arrays
         a1 = np.array([1, 2, 3, 4])
         a2 = np.array([4, 3, 2, 1])
         a1 + a2
Out[45]: array([5, 5, 5, 5])
```
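When position-wise arithmetic is what you actually want on two Series, one option is to drop the labels first. This sketch uses Series.to_numpy() (available in pandas 0.24 and later) to contrast label-aligned addition with the NumPy-style positional result from In [45].

```python
import pandas as pd

s6 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s7 = pd.Series([4, 3, 2, 1], index=['d', 'c', 'b', 'a'])

# Label-aligned addition (pandas behavior)
aligned = s6 + s7

# Position-wise addition (NumPy behavior): strip the labels first
positional = s6.to_numpy() + s7.to_numpy()

print(aligned.tolist(), positional.tolist())
```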