A Series can be created and initialized by passing a scalar value, a NumPy ndarray, a Python list, or a Python dict as the data parameter of the Series constructor. Because data is the constructor's first parameter, it does not need to be named when its value is passed as the first argument.
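As a quick check of the point above, the sketch below passes the same list positionally and through the data keyword, and then builds one Series from each of the four kinds of input. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# The same data passed positionally and through the 'data' keyword
positional = pd.Series([10, 20, 30])
keyword = pd.Series(data=[10, 20, 30])
print(positional.equals(keyword))  # True

# One Series from each kind of input: scalar, ndarray, list, dict
from_scalar = pd.Series(5)
from_ndarray = pd.Series(np.array([1.5, 2.5]))
from_list = pd.Series(['x', 'y'])
from_dict = pd.Series({'a': 1, 'b': 2})

# Print the length and index labels of each Series
for s in (from_scalar, from_ndarray, from_list, from_dict):
    print(len(s), list(s.index))
```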
The index parameter of the constructor assigns a user-defined index to the Series that functions similarly to a database index. This index provides a means to look up elements in the Series by index label rather than by the elements' position in the array.
If you do not specify an index when creating a Series, the Series object constructs one automatically, using integer values that start at zero and increase by one for each item in the Series.
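To make the label-versus-position distinction concrete, here is a small sketch. It uses the .loc (label-based) and .iloc (position-based) accessors, which are standard pandas but are not otherwise covered in this section; the labels 'x', 'y', 'z' are made up for the example.

```python
import pandas as pd

# No index given: pandas generates the labels 0, 1, 2 automatically
auto = pd.Series([10, 20, 30])
print(list(auto.index))  # [0, 1, 2]

# A user-defined index: look up by label with .loc, by position with .iloc
labeled = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
print(labeled.loc['y'])  # 20 (the item labeled 'y')
print(labeled.iloc[1])   # 20 (the item at position 1)
```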
The simplest means of creating a Series is from a scalar value. A Series built from a scalar has important uses in various mathematical operations, such as applying a uniform value across all the elements of another Series or DataFrame, provided its index lines up with the labels of the object it is combined with, since pandas aligns operations by label. The following code creates a one-item Series from the scalar value 2:
In [3]:
# create one item Series
s1 = pd.Series(2)
s1
Out[3]:
0    2
dtype: int64
Note the output when the Series s1 is printed. Two integers are displayed: 0 is the index label of the single item in the Series, and 2 is that item's value. The data type of the Series object is also shown as int64. The index label is what we use to retrieve the associated value from the Series:
In [4]:
# get value with label 0
s1[0]
Out[4]:
2
This looks like normal array access of the item at position zero, but pandas actually looks up the label 0 in the index of the Series and then returns the matching value.
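The difference only becomes visible when integer labels do not match positions. The sketch below builds such a Series with labels in reverse order; the .loc and .iloc accessors it uses for comparison are standard pandas, though this section does not introduce them.

```python
import pandas as pd

# Integer labels deliberately in the reverse order of their positions
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s[0])       # 30: [] on an integer index looks up the label 0
print(s.loc[0])   # 30: explicit label-based lookup
print(s.iloc[0])  # 10: explicit position-based lookup
```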
The following example creates a Series from a Python list:
In [5]:
# create a series of multiple items from a list
s2 = pd.Series([1, 2, 3, 4, 5])
s2
Out[5]:
0    1
1    2
2    3
3    4
4    5
dtype: int64
Since an index was not specified at the time of creation, pandas created an index for us with sequential zero-based integer values.
The array of values in the Series can be retrieved using the .values property, as shown here:
In [6]:
# get the values in the Series
s2.values
Out[6]:
array([1, 2, 3, 4, 5])
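Newer versions of pandas (0.24 and later) also offer the .to_numpy() method, which the pandas documentation now recommends over .values. The sketch below, assuming such a version, shows that both return the same NumPy array for a simple integer Series.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

vals = s.values     # the property used above
arr = s.to_numpy()  # method available in pandas 0.24 and later

print(type(arr))                   # <class 'numpy.ndarray'>
print(np.array_equal(vals, arr))   # True
```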
The index of the Series can be retrieved with the .index property:
In [7]:
# get the index of the Series
s2.index
Out[7]:
Int64Index([0, 1, 2, 3, 4], dtype='int64')
This tells us that the type of index created by pandas is Int64Index; it also shows the labels in the index and their data type.
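The Int64Index shown above reflects the pandas version this output was captured with. On pandas 0.18 and later, the automatically generated index is a RangeIndex, which stores only its start, stop, and step rather than every label. A minimal check, assuming a recent pandas:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# On pandas 0.18 and later the default index is a RangeIndex
print(type(s.index).__name__)  # RangeIndex
print(list(s.index))           # [0, 1, 2, 3, 4]
```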
pandas creates different index types based on the type of data passed in the index parameter. These index types are optimized to perform indexing operations for their specific data type. To specify the index when creating a Series, use the index parameter of the constructor. The following example creates a Series and assigns a string to each label of the index:
In [8]:
# explicitly create an index
# index is alpha, not integer
s3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s3
Out[8]:
a    1
b    2
c    3
dtype: int64

In [9]:
s3.index
Out[9]:
Index([u'a', u'b', u'c'], dtype='object')
The items in the index are now of type object. The following example retrieves the value of the item in the Series with index label 'c':
In [10]:
# lookup by label value, not integer position
s3['c']
Out[10]:
3
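The earlier point that pandas picks an index type based on the labels it is given can be checked directly. This sketch uses hypothetical labels (strings, a three-day date range starting on an arbitrary date, and floats); the exact class names printed depend on the pandas version, so no output is shown for the loop.

```python
import pandas as pd

# Strings as labels: a general-purpose Index
by_str = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
# Dates as labels: a DatetimeIndex
by_date = pd.Series([1, 2, 3], index=pd.date_range('2024-01-01', periods=3))
# Floats as labels: an index with float64 dtype
by_float = pd.Series([1, 2, 3], index=[0.5, 1.5, 2.5])

# Print the index class and dtype that pandas chose for each Series
for s in (by_str, by_date, by_float):
    print(type(s.index).__name__, s.index.dtype)
```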
A Series created from a single scalar value is useful because it lets you apply an operation with a single value across all elements of a Series. When you create a Series object from a scalar and specify an index with multiple labels, pandas copies the scalar value to every index label. The following code demonstrates this by creating a Series from a scalar value and an already existing index:
In [11]:
# create Series from an existing index
# scalar value will be copied at each index label
s4 = pd.Series(2, index=s2.index)
s4
Out[11]:
0    2
1    2
2    2
3    2
4    2
dtype: int64
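To see why sharing the index matters, the sketch below multiplies s2 by s4 and by a bare scalar, then by a one-item Series whose index is just [0]. Because pandas aligns the two operands by label, only label 0 finds a partner in the last case and every other label becomes NaN.

```python
import pandas as pd

s2 = pd.Series([1, 2, 3, 4, 5])
s4 = pd.Series(2, index=s2.index)  # scalar 2 copied to every label

print((s2 * s4).tolist())  # [2, 4, 6, 8, 10]
print((s2 * 2).tolist())   # [2, 4, 6, 8, 10], same result with a bare scalar

# A one-item Series without a matching index aligns by label instead
print((s2 * pd.Series(2)).tolist())  # [2.0, nan, nan, nan, nan]
```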
It is common practice to initialize Series objects using NumPy ndarrays, including those produced by NumPy's various array-creation functions. The following code creates a Series from five normally distributed random values:
In [12]:
# generate a Series from 5 normal random numbers
np.random.seed(123456)
pd.Series(np.random.randn(5))
Out[12]:
0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
dtype: float64
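NumPy 1.17 and later also provide np.random.default_rng(), which returns a seeded random Generator and is the interface the NumPy documentation now recommends. This is a swapped-in alternative to the np.random.seed() call above, and it produces different numbers for the same seed, so only reproducibility is shown here.

```python
import numpy as np
import pandas as pd

# A seeded Generator (NumPy 1.17+) used to fill a Series
rng = np.random.default_rng(123456)
s = pd.Series(rng.standard_normal(5))

# Re-seeding a fresh Generator reproduces exactly the same values
s_again = pd.Series(np.random.default_rng(123456).standard_normal(5))
print(s.equals(s_again))  # True
print(s.dtype)            # float64
```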
NumPy also provides several convenient functions to create arrays (and hence Series objects). The np.linspace() function creates a specified number of evenly spaced values between a start and a stop value, including the stop value by default:
In [13]:
# 0 through 9
pd.Series(np.linspace(0, 9, 10))
Out[13]:
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: float64
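The third argument of np.linspace() is the number of values, not a step size, and its endpoint parameter controls whether the stop value is included. A short sketch, with the range 0 to 10 chosen only so the results are exact floats:

```python
import numpy as np
import pandas as pd

# 5 evenly spaced values from 0 to 10; the stop value is included
with_stop = pd.Series(np.linspace(0, 10, 5))
print(with_stop.tolist())     # [0.0, 2.5, 5.0, 7.5, 10.0]

# endpoint=False excludes the stop value, which changes the spacing
without_stop = pd.Series(np.linspace(0, 10, 5, endpoint=False))
print(without_stop.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0]
```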
Similarly, the np.arange() function creates an array of evenly spaced values within a half-open interval: the start value is included, but the stop value is not:
In [14]:
# 0 through 8
pd.Series(np.arange(0, 9))
Out[14]:
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
dtype: int64
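Unlike np.linspace(), the optional third argument of np.arange() is a step size. The sketch below uses a step of 2 and also shows the single-argument form, which counts from 0 up to, but not including, the given value.

```python
import numpy as np
import pandas as pd

# Start is included, stop is excluded: 10 does not appear
evens = pd.Series(np.arange(0, 10, 2))
print(evens.tolist())  # [0, 2, 4, 6, 8]

# With a single argument, arange counts from 0 up to (not including) it
first_four = pd.Series(np.arange(4))
print(first_four.tolist())  # [0, 1, 2, 3]
```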
Finally, a Series can be initialized directly from a Python dictionary. The keys of the dictionary are used as the index labels for the Series:
In [15]:
# create Series from dict
s6 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4})
s6
Out[15]:
a    1
b    2
c    3
d    4
dtype: int64
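When a dict is combined with the index parameter, pandas uses the index to select and order the dictionary's keys, and any label without a matching key receives NaN (which also turns the integer values into floats). In this sketch the label 'z' is deliberately absent from the dict.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# index selects and orders the keys; 'z' has no matching key, so it gets NaN
s = pd.Series(data, index=['d', 'b', 'z'])
print(s)
# d    4.0
# b    2.0
# z    NaN
# dtype: float64
```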