Looking up values in Series

Values in a Series object can be retrieved using the [] operator and passing either a single index label or a list of index labels. The following code retrieves the value associated with the index label 'a' of the s3 series defined earlier:

In [28]:
   # single item lookup
   s3['a']

Out[28]:
   1

Accessing this Series using an integer value will perform a zero-based position lookup of the value:

In [29]:
   # lookup by position since the index is not an integer
   s3[1]

Out[29]:
   2

This is because pandas determines that the specified value is an integer and that the index is not an integer-based index. Given this, pandas decides to perform a lookup by position and not by index label.

To retrieve multiple items, you can pass a list of index labels via the [] operator. Instead of a single value, the result will be a new Series with both index labels and values, and data copied from the original Series.

In [30]:
   # multiple items
   s3[['a', 'c']]

Out[30]:
   a    1
   c    3
   dtype: int64
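
The copy behavior described above can be checked directly. The following is a minimal sketch, assuming s3 holds the values printed later in this section (1, 2, 3 labeled 'a', 'b', 'c'); the name subset is introduced here purely for illustration.

```python
import pandas as pd

# s3 is rebuilt from its printed contents (an assumption; it was defined earlier)
s3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# selecting with a list of labels returns a new Series
subset = s3[['a', 'c']]

# changing the result does not write through to the original
subset.loc['a'] = 100

print(subset['a'])  # value held by the new Series
print(s3['a'])      # value still held by s3
```

Because the data is copied, the selection can be modified freely without side effects on s3.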

To elaborate on how integers are used for lookup by either label or position, we can examine operations on the following Series, whose index labels are integers that do not start at 0.

In [31]:
   # series with an integer index, but not starting with 0
   s5 = pd.Series([1, 2, 3], index=[10, 11, 12])
   s5

Out[31]:
   10    1
   11    2
   12    3
   dtype: int64

The following code looks up the value at index label 11. Label-based lookup is performed because both the index and the value passed to the [] operator are integers:

In [32]:
   # by label, as the value passed and the index are both integers
   s5[11]

Out[32]:
   2

If this were performed as a zero-based position lookup, an exception would be thrown, as the Series contains only three items.
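
Which rule applies depends on the type of the index, and that can be inspected before relying on the [] operator. This is a small sketch, assuming s3 has the contents shown later in this section; is_integer_dtype comes from pandas.api.types, and the variable names are illustrative only.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# s3 is rebuilt from its printed contents (an assumption; it was defined earlier)
s3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s5 = pd.Series([1, 2, 3], index=[10, 11, 12])

# string labels versus integer labels
s3_has_int_index = is_integer_dtype(s3.index.dtype)
s5_has_int_index = is_integer_dtype(s5.index.dtype)
print(s3_has_int_index, s5_has_int_index)

# with an integer index, an integer passed to [] is treated as a label
value = s5[11]
print(value)
```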

To avoid confusion between label-based and position-based lookup, label-based lookup can be enforced using the .loc[] accessor:

In [33]:
   # force lookup by index label
   s5.loc[12]

Out[33]:
   3

Lookup by position can be enforced using the .iloc[] accessor:

In [34]:
   # forced lookup by location / position
   s5.iloc[1]

Out[34]:
   2
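
The difference between the two accessors is easiest to see when a value is valid for one and not for the other. This is a minimal sketch under that assumption; the variable names are illustrative only.

```python
import pandas as pd

s5 = pd.Series([1, 2, 3], index=[10, 11, 12])

by_label = s5.loc[11]      # the item whose label is 11
by_position = s5.iloc[1]   # the second item, counting from zero

# 11 is a valid label but not a valid position in a three-item Series
try:
    s5.iloc[11]
    position_error = None
except IndexError as exc:
    position_error = exc

# 1 is a valid position but not a label in the index
try:
    s5.loc[1]
    label_error = None
except KeyError as exc:
    label_error = exc

print(by_label, by_position)
print(type(position_error).__name__, type(label_error).__name__)
```

Being explicit in this way means a mistaken lookup fails with an exception instead of silently returning a different item.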

These two options also function using lists, as shown in the following example:

In [35]:
   # multiple items by label (loc)
   s5.loc[[12, 10]]

Out[35]:
   12    3
   10    1
   dtype: int64

In [36]:
   # multiple items by location / position (iloc)
   s5.iloc[[0, 2]]

Out[36]:
   10    1
   12    3
   dtype: int64

If a position passed to .iloc[] in a list is out of bounds, an exception will be thrown. This differs from .loc[], which, in the pandas version used here, returns NaN as the value for any label that does not exist. Because NaN is a floating-point value, the dtype of the result becomes float64:

In [37]:
   # -1 and 15 will be NaN
   s5.loc[[12, -1, 15]]

Out[37]:
    12     3
   -1    NaN
    15   NaN
   dtype: float64
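
In newer pandas releases, passing .loc[] a list that contains missing labels raises a KeyError instead of returning NaN. The following is a minimal sketch, assuming such a release, of how the same NaN-filled result can be obtained with .reindex(); the variable names are illustrative only.

```python
import pandas as pd

s5 = pd.Series([1, 2, 3], index=[10, 11, 12])

# reindex conforms the Series to the requested labels and fills
# labels that are absent from the index with NaN
filled = s5.reindex([12, -1, 15])
print(filled)

# a list passed to .iloc[] with an out-of-bounds position raises IndexError
try:
    s5.iloc[[0, 5]]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)
```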

Note

When writing the highest-performance code for accessing items in a Series, it is recommended that you use the .iloc[] accessor, which looks up by integer position.

A Series also has a property .ix that can be used to look up items either by label or by zero-based array position. To demonstrate this, let's revisit the s3 series:

In [38]:
   # reminder of the contents of s3
   s3

Out[38]:
   a    1
   b    2
   c    3
   dtype: int64

The following example looks up by index label:

In [39]:
   # label based lookup
   s3.ix[['a', 'c']]

Out[39]:
   a    1
   c    3
   dtype: int64

The following example looks up by position:

In [40]:
   # position based lookup
   s3.ix[[1, 2]]

Out[40]:
   b    2
   c    3
   dtype: int64

This can become complicated if the index labels are integers and you pass a list of integers to .ix. Since they are of the same type, the lookup will be by index label instead of by position:

In [41]:
   # this looks up by label and not position
   # note that 1,2 have NaN as those labels do not exist
   # in the index
   s5.ix[[1, 2, 10, 11]]

Out[41]:
   1    NaN
   2    NaN
   10     1
   11     2
   dtype: float64

This has reverted to label value lookup, and since there were no elements for labels 1 and 2, NaN was returned.

Note

Many practitioners frown upon the use of .ix because of this ambiguity. It is recommended to use .loc[] or .iloc[] instead, which also perform better than .ix.
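
The .ix property was later deprecated and removed from pandas, so the .ix examples above do not run on current releases. The following is a sketch of the same requests written with the explicit accessors, assuming s3 has the contents printed above; the variable names are illustrative only.

```python
import pandas as pd

# both Series are rebuilt from their printed contents
s3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s5 = pd.Series([1, 2, 3], index=[10, 11, 12])

by_label = s3.loc[['a', 'c']]     # same request as s3.ix[['a', 'c']]
by_position = s3.iloc[[1, 2]]     # same request as s3.ix[[1, 2]]

# with an integer index, the intent is stated by the accessor itself
first_two = s5.iloc[[0, 1]]       # positions 0 and 1
labels_10_11 = s5.loc[[10, 11]]   # labels 10 and 11

print(by_label)
print(by_position)
print(first_two.equals(labels_10_11))
```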

Alignment via index labels

A fundamental difference between a NumPy ndarray and a pandas Series is the ability of a Series to automatically align data from another Series based on label values before performing an operation.

We will examine alignment using the following two Series objects:

In [42]:
   s6 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
   s6

Out[42]:
   a    1
   b    2
   c    3
   d    4
   dtype: int64

In [43]:
   s7 = pd.Series([4, 3, 2, 1], index=['d', 'c', 'b', 'a'])
   s7

Out[43]:
   d    4
   c    3
   b    2
   a    1
   dtype: int64

The following code adds the values in the two series:

In [44]:
   # add them
   s6 + s7

Out[44]:
   a    2
   b    4
   c    6
   d    8
   dtype: int64

Adding two Series objects differs from adding two arrays: pandas first aligns the data based on index label values instead of simply applying the operation to elements in the same position. This is particularly powerful when combining data with pandas Series, because the data is matched by label and does not have to be ordered manually first.

Note

Also worth noting is the order of the items in the index resulting from the addition. The two Series in the addition had the same labels but were ordered differently. The index in the result is arranged in ascending order.
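
Both points can be verified with a few lines. This is a minimal sketch using the s6 and s7 defined above; the names total, matches_by_label, and result_labels are introduced here for illustration.

```python
import pandas as pd

s6 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s7 = pd.Series([4, 3, 2, 1], index=['d', 'c', 'b', 'a'])

total = s6 + s7

# each result value is the sum of the two values that share its label
matches_by_label = all(
    total[label] == s6[label] + s7[label] for label in total.index
)

# the labels of the result, in the order pandas arranged them
result_labels = total.index.tolist()

print(matches_by_label)
print(result_labels)
```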

This result is very different from what adding two pure NumPy arrays would produce. A NumPy ndarray adds the items in identical positions of each array, resulting in different values:

In [45]:
   # see how different from adding numpy arrays
   a1 = np.array([1, 2, 3, 4])
   a2 = np.array([4, 3, 2, 1])
   a1 + a2

Out[45]:
   array([5, 5, 5, 5])
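
The same contrast can be drawn from the Series themselves by dropping their labels. This is a small sketch, assuming a pandas release that provides .to_numpy(); the names aligned and positional are illustrative only.

```python
import pandas as pd

s6 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s7 = pd.Series([4, 3, 2, 1], index=['d', 'c', 'b', 'a'])

# Series addition aligns on labels before adding
aligned = (s6 + s7).to_numpy()

# the underlying arrays carry no labels, so addition is purely positional
positional = s6.to_numpy() + s7.to_numpy()

print(aligned)
print(positional)
```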