Creating Series

A Series can be created and initialized by passing a scalar value, a NumPy ndarray, a Python list, or a Python dict as the data parameter of the Series constructor. Because data is the first positional parameter, its name does not need to be specified when the value is passed as the first argument.
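The following minimal sketch passes each of the four kinds of data to the constructor. It assumes pandas and NumPy are imported under their usual aliases, pd and np, and the variable names are illustrative only. It also passes the same list through the named data parameter to show that the result is identical:

```python
import numpy as np
import pandas as pd

# One Series per kind of data accepted by the constructor
from_scalar = pd.Series(7)                           # scalar -> one item
from_ndarray = pd.Series(np.array([1.5, 2.5, 3.5]))  # NumPy ndarray
from_list = pd.Series([10, 20, 30])                  # Python list
from_dict = pd.Series({'x': 1, 'y': 2})              # dict keys become labels

# Naming the data parameter explicitly gives an equal Series
from_keyword = pd.Series(data=[10, 20, 30])

print(len(from_scalar), len(from_ndarray), len(from_list), len(from_dict))
print(from_list.equals(from_keyword))
```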

The index parameter of the constructor assigns a user-defined index to the Series that functions similarly to a database index. This index makes it possible to look up elements in the Series by index label rather than by their position in the array.
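The following sketch makes the database analogy concrete. The fruit names and prices are made-up illustrative data. It uses the .loc accessor, which in current pandas versions looks values up strictly by label:

```python
import pandas as pd

# Labels play the role of keys in a database index
prices = pd.Series([1.25, 0.5, 3.0], index=['apple', 'banana', 'cherry'])

# Look up one value by its label
print(prices.loc['banana'])

# Look up several labels at once; the result follows the requested order
print(prices.loc[['cherry', 'apple']])
```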

If you do not specify an index at the creation of a Series, the Series object will construct an index automatically using integer values starting from zero and increasing by one for each item in the Series.
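Here is a quick sketch of this default behavior, using arbitrary string values. The generated labels run from 0 to len(s) - 1, so at this point they coincide with the positions:

```python
import pandas as pd

s = pd.Series(['x', 'y', 'z'])  # no index argument supplied

# The automatically created labels ...
labels = list(s.index)
print(labels)

# ... are the same as the positions 0 .. len(s) - 1
print(labels == list(range(len(s))))
```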

The simplest means of creating a Series is from a scalar value. A Series with a single value has important uses in various mathematical operations, such as applying a single value across all the elements of another Series or DataFrame. The following code creates a one-item Series from the scalar value 2:

In [3]:
   # create one item Series
   s1 = pd.Series(2)
   s1

Out[3]:
   0    2
   dtype: int64

Note the output when the Series s1 is printed: two integers are displayed. The 0 is the index label of the single item in the Series, and 2 is that item's value. The data type of the Series, int64, is also shown. The index label is what we use to retrieve the associated value from the Series:

In [4]:
   # get value with label 0
   s1[0]

Out[4]:
   2

This looks like a normal array access of the item at position zero, but pandas actually looks up the label 0 in the index of the Series and returns the matching value.
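The difference only becomes visible when the integer labels do not match the positions. The following sketch uses an illustrative reversed index and compares [] with the .loc (label) and .iloc (position) accessors available in current pandas:

```python
import pandas as pd

# Integer labels deliberately listed in reverse order
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s[0])       # [] on an integer index looks up label 0 -> 30
print(s.loc[0])   # explicit lookup by label -> 30
print(s.iloc[0])  # explicit lookup by position -> 10
```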

The following example creates a Series from a Python list:

In [5]:
   # create a series of multiple items from a list
   s2 = pd.Series([1, 2, 3, 4, 5])
   s2

Out[5]:
   0    1
   1    2
   2    3
   3    4
   4    5
   dtype: int64

Since an index was not specified at the time of creation, pandas created an index for us with sequential zero-based integer values.

The array of values in the Series can be retrieved using the .values property, as shown here:

In [6]:
   # get the values in the Series
   s2.values

Out[6]:
   array([1, 2, 3, 4, 5])
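In pandas 0.24 and later, the documentation recommends Series.to_numpy() over .values when you specifically need a NumPy array. The following minimal sketch uses the same values as s2:

```python
import pandas as pd

s2 = pd.Series([1, 2, 3, 4, 5])

# to_numpy() returns the values as a NumPy ndarray
arr = s2.to_numpy()
print(type(arr).__name__)  # ndarray
print(arr.tolist())        # [1, 2, 3, 4, 5]
```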

Likewise, the index of the Series can be retrieved with the .index property:

In [7]:
   # get the index of the Series
   s2.index

Out[7]:
   Int64Index([0, 1, 2, 3, 4], dtype='int64')

This tells us that pandas created an index of type Int64Index, and it also shows the labels in the index and their data type.
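The Int64Index shown above is what the pandas version used here produced. From pandas 0.18 onward, a Series created without an index gets a RangeIndex instead. A RangeIndex stores only its start, stop, and step rather than every label. The following sketch checks what your installed version produces, assuming the public start, stop, and step attributes found in current releases:

```python
import pandas as pd

s2 = pd.Series([1, 2, 3, 4, 5])

# Name of the default index type in the installed pandas version
print(type(s2.index).__name__)

# start, stop, and step describe the labels 0, 1, 2, 3, 4
print(s2.index.start, s2.index.stop, s2.index.step)
```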

pandas creates different index types depending on the type of data passed in the index parameter. Each index type is optimized for indexing operations on its specific data type. To specify the index when creating a Series, use the index parameter of the constructor. The following example creates a Series and assigns a string label to each item:

In [8]:
   # explicitly create an index
   # index is alpha, not integer
   s3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
   s3

Out[8]:
   a    1
   b    2
   c    3
   dtype: int64

In [9]:
   s3.index

Out[9]:
   Index([u'a', u'b', u'c'], dtype='object')

The labels in this index are now of type object. The following example retrieves the value of the item in the Series with index label 'c':

In [10]:
   # lookup by label value, not integer position
   s3['c']

Out[10]:
   3
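As a further illustration of pandas choosing the index type from the data, datetime labels produce a DatetimeIndex. The dates and values below are illustrative:

```python
import pandas as pd

# pd.to_datetime converts the strings into timestamps
dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])
by_date = pd.Series([10, 20, 30], index=dates)

print(type(by_date.index).__name__)             # DatetimeIndex
print(by_date.loc[pd.Timestamp('2024-01-02')])  # 20
```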

A Series created from a single scalar value is useful because it lets you apply a single value across all elements of a Series. When a Series is created from a scalar together with an index that has multiple labels, pandas copies the scalar value to every index label. The following code demonstrates this by creating a Series from a scalar value and an index taken from an existing Series:

In [11]:
   # create Series from an existing index
   # scalar value will be copied at each index label
   s4 = pd.Series(2, index=s2.index)
   s4

Out[11]:
   0    2
   1    2
   2    2
   3    2
   4    2
   dtype: int64
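Applying the scalar across another Series works because arithmetic between two Series is aligned by index label. The following sketch rebuilds s2 and s4 from above. The contrast with a plain scalar and with a one-item Series is added for illustration:

```python
import pandas as pd

s2 = pd.Series([1, 2, 3, 4, 5])
s4 = pd.Series(2, index=s2.index)  # scalar 2 copied to every label of s2

# Same labels, so the values are added element by element
print((s2 + s4).tolist())  # [3, 4, 5, 6, 7]

# A plain scalar is broadcast to every element without any alignment
print((s2 + 2).tolist())   # [3, 4, 5, 6, 7]

# A one-item Series only has label 0; the other labels have no partner
# and come out as NaN
partial = s2 + pd.Series(2)
print(partial)
```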

It is common practice to initialize Series objects from NumPy ndarrays, including arrays produced by NumPy's array-creation functions. The following code creates a Series from five normally distributed random values:

In [12]:
   # generate a Series from 5 normal random numbers
   np.random.seed(123456)
   pd.Series(np.random.randn(5))

Out[12]:
   0    0.469112
   1   -0.282863
   2   -1.509059
   3   -1.135632
   4    1.212112
   dtype: float64
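The seed is what makes this output reproducible: resetting the same seed replays the same draws. The following sketch checks that. It also shows the Generator API that NumPy 1.17+ offers for new code through np.random.default_rng. Its numbers come from a different stream than the legacy np.random.randn calls, so the values will not match the ones above:

```python
import numpy as np
import pandas as pd

# Same seed, same draws with the legacy global generator
np.random.seed(123456)
first = pd.Series(np.random.randn(5))
np.random.seed(123456)
second = pd.Series(np.random.randn(5))
print(first.equals(second))  # True

# The newer Generator API: an explicitly seeded generator object
rng = np.random.default_rng(123456)
third = pd.Series(rng.standard_normal(5))
print(third.dtype, len(third))
```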

NumPy also provides several convenient functions to create arrays (and hence Series objects). The np.linspace(start, stop, num) function creates num evenly spaced values from start to stop, including stop by default:

In [13]:
   # 0 through 9
   pd.Series(np.linspace(0, 9, 10))

Out[13]:
   0    0
   1    1
   2    2
   3    3
   4    4
   5    5
   6    6
   7    7
   8    8
   9    9
   dtype: float64
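Here is a short sketch of the num and endpoint parameters of np.linspace. The interval 0 to 1 is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# 5 evenly spaced values from 0 to 1; stop is included by default
quarters = pd.Series(np.linspace(0, 1, 5))
print(quarters.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# endpoint=False leaves stop out of the result
print(np.linspace(0, 1, 4, endpoint=False).tolist())  # [0.0, 0.25, 0.5, 0.75]
```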

Similarly, the np.arange(start, stop) function creates an array of evenly spaced values from start up to, but not including, stop:

In [14]:
   # 0 through 8
   pd.Series(np.arange(0, 9))

Out[14]:
   0    0
   1    1
   2    2
   3    3
   4    4
   5    5
   6    6
   7    7
   8    8
   dtype: int64
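np.arange also accepts a step, and with a single argument it counts from zero. The following sketch uses illustrative bounds:

```python
import numpy as np
import pandas as pd

# A third argument sets the step; stop (10) is still excluded
evens = pd.Series(np.arange(0, 10, 2))
print(evens.tolist())  # [0, 2, 4, 6, 8]

# With one argument, arange counts from 0 up to (not including) it
print(np.arange(5).tolist())  # [0, 1, 2, 3, 4]
```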

Finally, a Series can be directly initialized from a Python dictionary. The keys of the dictionary are used as the index labels for the Series:

In [15]:
   # create Series from dict
   s6 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4})
   s6

Out[15]:
   a    1
   b    2
   c    3
   d    4
   dtype: int64
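A dictionary can also be combined with the index parameter. pandas then keeps only the entries whose keys appear in the index, orders them as the index does, and fills any label without a matching key with NaN. The following sketch uses the dictionary from above. The label 'z' is a hypothetical label that is not a key:

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# Select and reorder keys through index; 'z' has no matching key
s7 = pd.Series(data, index=['d', 'b', 'z'])
print(s7)  # NaN for 'z' makes the dtype float64
```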