Due to its roots in finance, pandas excels at manipulating time-series data. Its time-series capabilities have been continuously refined across its versions. These capabilities are part of the core of pandas and do not require additional libraries, unlike R, which requires the zoo package to provide this functionality.
The core of the time-series functionality in pandas revolves around specialized indexes that represent measurements of data at one or more timestamps. In pandas, these indexes are `DatetimeIndex` objects. They are incredibly powerful, and because they are core to pandas, data can be aligned automatically on dates and times. This makes sequences of time-stamped data as easy to work with as data under any other type of index.
We will now examine how to create time-series data and `DatetimeIndex` objects, both using explicit timestamp objects and using specific durations of time (referred to in pandas as frequencies).
Sequences of `Timestamp` objects are represented by pandas as `DatetimeIndex`, which is a type of pandas index that is optimized for indexing by date and time.
There are several ways to create `DatetimeIndex` objects in pandas. The following creates a `DatetimeIndex` by passing a list of `datetime` objects as the index of a `Series`:
```python
from datetime import datetime

import numpy as np
import pandas as pd

# create a very simple time-series with two index labels
# and random values
dates = [datetime(2014, 8, 1), datetime(2014, 8, 2)]
ts = pd.Series(np.random.randn(2), dates)
ts
```

```
2014-08-01    1.566024
2014-08-02    0.938517
dtype: float64
```
`Series` has taken the `datetime` objects and constructed a `DatetimeIndex` from the date values, where each value of the `DatetimeIndex` is a `Timestamp` object. This is one of the cases where pandas directly constructs `Timestamp` objects on your behalf.
The following verifies the type of the index and the types of the labels in the index:
```python
# what is the type of the index?
type(ts.index)
```

```
pandas.tseries.index.DatetimeIndex
```

```python
# and we can see it is a collection of timestamps
type(ts.index[0])
```

```
pandas.tslib.Timestamp
```
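The module paths in these outputs come from an older pandas release. Current versions expose the same classes under different modules. A minimal, more portable sketch checks the types against the public names `pd.DatetimeIndex` and `pd.Timestamp` with `isinstance()`:

```python
from datetime import datetime

import numpy as np
import pandas as pd

# rebuild the two-item series from the example above
dates = [datetime(2014, 8, 1), datetime(2014, 8, 2)]
ts = pd.Series(np.random.randn(2), dates)

# public names are stable across pandas versions, unlike module paths
index_is_datetime = isinstance(ts.index, pd.DatetimeIndex)
label_is_timestamp = isinstance(ts.index[0], pd.Timestamp)
print(index_is_datetime, label_is_timestamp)  # True True
```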
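It is not required that you pass `datetime` objects in the list. Strings that represent dates can be used as labels instead, and the output looks the same as in the previous example. Note, however, that current versions of pandas keep such strings in an object-dtype `Index` rather than converting them to a `DatetimeIndex`. When you need true timestamps, pass them through `pd.to_datetime()`, described next: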
```python
# create from just a list of dates as strings!
np.random.seed(123456)
dates = ['2014-08-01', '2014-08-02']
ts = pd.Series(np.random.randn(2), dates)
ts
```

```
2014-08-01    0.469112
2014-08-02   -0.282863
dtype: float64
```
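In current versions of pandas, string labels passed this way stay in a plain `Index`. The sketch below compares that with labels converted explicitly by `pd.to_datetime()`, which yields a true `DatetimeIndex`:

```python
import numpy as np
import pandas as pd

np.random.seed(123456)
dates = ['2014-08-01', '2014-08-02']

# passing strings straight through leaves them as a plain (non-datetime) Index
ts_strings = pd.Series(np.random.randn(2), dates)

# converting the labels first yields a DatetimeIndex of Timestamps
ts_dates = pd.Series(np.random.randn(2), pd.to_datetime(dates))

print(type(ts_strings.index).__name__)  # Index
print(type(ts_dates.index).__name__)    # DatetimeIndex
```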
pandas provides the utility function `pd.to_datetime()`. It takes a sequence of similar- or mixed-type objects and attempts to convert each one into a `Timestamp`. It then returns the collection of these timestamps as a `DatetimeIndex`. Missing values such as `None` become `NaT` ("not a time") at the corresponding position in the index. Starting with pandas 2.0, the format is inferred from the first element, so a list that mixes formats, like the one below, requires `format='mixed'`:
```python
# convert a sequence of objects to a DatetimeIndex
dti = pd.to_datetime(['Aug 1, 2014', '2014-08-02', '2014.8.3', None])
for value in dti:
    print(value)
```

```
2014-08-01 00:00:00
2014-08-02 00:00:00
2014-08-03 00:00:00
NaT
```
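Because missing entries become `NaT`, they can be detected with the index's `isna()` method. The short sketch below uses consistently formatted strings, so it also parses under the stricter format inference of pandas 2.0 and later:

```python
import pandas as pd

# None becomes NaT in the resulting DatetimeIndex
dti = pd.to_datetime(['2014-08-01', '2014-08-02', None])

# isna() flags the NaT positions
missing = dti.isna()
print(missing.tolist())  # [False, False, True]
```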
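Be careful. In the version of pandas used here, `pd.to_datetime()` by default falls back to returning a NumPy array of objects instead of a `DatetimeIndex` if it cannot parse a value to a `Timestamp`. Starting with pandas 0.17, the default is `errors='raise'`, which raises an exception instead: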
```python
# this is a list of objects, not timestamps...
pd.to_datetime(['Aug 1, 2014', 'foo'])
```

```
array(['Aug 1, 2014', 'foo'], dtype=object)
```
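Since the default became `errors='raise'`, the same call fails rather than quietly returning objects. The minimal sketch below guards against that. It catches `ValueError`, which the parsing errors raised here subclass:

```python
import pandas as pd

values = ['Aug 1, 2014', 'foo']

try:
    # with the default errors='raise', an unparseable value raises
    result = pd.to_datetime(values)
except ValueError as exc:
    result = None
    print('could not parse:', exc)
```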
To force the conversion to dates, use the `coerce=True` parameter, which pandas 0.17 deprecated in favor of `errors='coerce'`. Values that cannot be converted are assigned `NaT` in the resulting index:
```python
# force the conversion, NaT for items that don't work
pd.to_datetime(['Aug 1, 2014', 'foo'], coerce=True)
```

```
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-08-01, NaT]
Length: 2, Freq: None, Timezone: None
```
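With pandas 0.17 and later, the same result comes from `errors='coerce'`. The short sketch below also counts how many values failed to parse:

```python
import pandas as pd

# unparseable values are coerced to NaT instead of raising
dti = pd.to_datetime(['Aug 1, 2014', 'foo'], errors='coerce')

n_failed = int(dti.isna().sum())
print(dti[0], n_failed)  # 2014-08-01 00:00:00 1
```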
A range of timestamps at a specific frequency can be easily created using the `pd.date_range()` function. The following creates a `Series` object with a `DatetimeIndex` of 10 consecutive days:
```python
# create a range of dates starting at a specific date
# and for a specific number of days, creating a Series
np.random.seed(123456)
periods = pd.date_range('8/1/2014', periods=10)
date_series = pd.Series(np.random.randn(10), index=periods)
date_series
```

```
2014-08-01    0.469112
2014-08-02   -0.282863
2014-08-03   -1.509059
2014-08-04   -1.135632
2014-08-05    1.212112
2014-08-06   -0.173215
2014-08-07    0.119209
2014-08-08   -1.044236
2014-08-09   -0.861849
2014-08-10   -2.104569
Freq: D, dtype: float64
```
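`pd.date_range()` can also be bounded by a start and an end instead of a start and a number of periods. The sketch below shows that both forms describe the same ten days:

```python
import pandas as pd

# start plus a number of periods
by_periods = pd.date_range('8/1/2014', periods=10)

# start plus an inclusive end date
by_end = pd.date_range('2014-08-01', '2014-08-10')

print(by_periods.equals(by_end))    # True
print(len(by_end), by_end.freqstr)  # 10 D
```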
Like any pandas index, a `DatetimeIndex` can be used for various index operations, such as data alignment, selection, and slicing. The following demonstrates slicing using index locations:
```python
# slice by location
subset = date_series[3:7]
subset
```

```
2014-08-04   -1.135632
2014-08-05    1.212112
2014-08-06   -0.173215
2014-08-07    0.119209
Freq: D, dtype: float64
```
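`.iloc` makes positional slicing explicit instead of relying on how `[]` interprets integer keys. The sketch below rebuilds `date_series` and takes the same subset:

```python
import numpy as np
import pandas as pd

np.random.seed(123456)
date_series = pd.Series(np.random.randn(10),
                        index=pd.date_range('8/1/2014', periods=10))

# positions 3 through 6; the stop position 7 is excluded
subset = date_series.iloc[3:7]
print(subset.index[0].date(), subset.index[-1].date())  # 2014-08-04 2014-08-07
```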
To demonstrate alignment, we will use the following `Series`, created with the index of the subset we just created:
```python
# a Series to demonstrate alignment
s2 = pd.Series([10, 100, 1000, 10000], subset.index)
s2
```

```
2014-08-04       10
2014-08-05      100
2014-08-06     1000
2014-08-07    10000
Freq: D, dtype: int64
```
When we add `s2` and `date_series`, alignment is performed. The result is `NaN` where items do not align and the sum of the two values where they do:
```python
# demonstrate alignment by date on a subset of items
date_series + s2
```

```
2014-08-01             NaN
2014-08-02             NaN
2014-08-03             NaN
2014-08-04        8.864368
2014-08-05      101.212112
2014-08-06      999.826785
2014-08-07    10000.119209
2014-08-08             NaN
2014-08-09             NaN
2014-08-10             NaN
Freq: D, dtype: float64
```
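If you would rather treat a missing partner as zero than get `NaN`, the `Series.add()` method accepts a `fill_value`. The sketch below uses a small, hypothetical pair of series; the values are illustrative and not taken from the example above:

```python
import pandas as pd

idx = pd.date_range('2014-08-01', periods=4)
a = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)
b = pd.Series([10.0, 100.0], index=idx[1:3])

# plain + aligns on dates and yields NaN where a date is missing from b
plain = a + b

# add() with fill_value substitutes 0 for the missing side before adding
filled = a.add(b, fill_value=0)

print(plain.tolist())   # [nan, 12.0, 103.0, nan]
print(filled.tolist())  # [1.0, 12.0, 103.0, 4.0]
```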
Items in a `Series` with a `DatetimeIndex` can be retrieved using a string representing a date instead of having to specify a `datetime` object:
```python
# lookup item by a string representing a date
date_series['2014-08-05']
```

```
1.2121120250208506
```
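The string is converted to a timestamp for the lookup, so it matches the same element as an explicit `pd.Timestamp` or `datetime`. `.loc` makes the label-based lookup explicit. The sketch below uses a hypothetical three-day series:

```python
from datetime import datetime

import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=pd.date_range('2014-08-04', periods=3))

by_string = s.loc['2014-08-05']
by_timestamp = s.loc[pd.Timestamp('2014-08-05')]
by_datetime = s.loc[datetime(2014, 8, 5)]

print(by_string, by_timestamp, by_datetime)  # 2.5 2.5 2.5
```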
A `DatetimeIndex` can also be indexed and sliced using a string that represents a date or using `datetime` objects:
```python
# slice between two dates specified by string representing dates
date_series['2014-08-05':'2014-08-07']
```

```
2014-08-05    1.212112
2014-08-06   -0.173215
2014-08-07    0.119209
Freq: D, dtype: float64
```
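Slicing with `datetime` objects works the same way. As with string labels, both endpoints are included. The sketch below uses a rebuilt copy of `date_series`:

```python
from datetime import datetime

import numpy as np
import pandas as pd

np.random.seed(123456)
date_series = pd.Series(np.random.randn(10),
                        index=pd.date_range('8/1/2014', periods=10))

# label-based slices include both the start and the end label
by_datetime = date_series[datetime(2014, 8, 5):datetime(2014, 8, 7)]
by_string = date_series['2014-08-05':'2014-08-07']

print(len(by_datetime), by_datetime.equals(by_string))  # 3 True
```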
Another convenient feature of pandas is that a `DatetimeIndex` can be sliced using partial date specifications. As an example, the following code creates a `Series` object with dates spanning two years and then selects only those items of the year 2013:
```python
# a two year range of daily data in a Series
# only select those in 2013
s3 = pd.Series(0, pd.date_range('2013-01-01', '2014-12-31'))
s3['2013']
```

```
2013-01-01    0
2013-01-02    0
2013-01-03    0
...
2013-12-29    0
2013-12-30    0
2013-12-31    0
Freq: D, Length: 365
```
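`.loc` performs the same partial-string selection explicitly by label. The sketch below checks the size of each yearly selection. Both 2013 and 2014 are non-leap years:

```python
import pandas as pd

s3 = pd.Series(0, pd.date_range('2013-01-01', '2014-12-31'))

# a year-only string selects every label within that year
in_2013 = s3.loc['2013']
in_2014 = s3.loc['2014']

print(len(in_2013), len(in_2014))  # 365 365
```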
We can also select only the items in a specific year and month. This is demonstrated by the following, which selects the items in May 2014:
```python
# 31 items for May 2014
s3['2014-05']
```

```
2014-05-01    0
2014-05-02    0
2014-05-03    0
...
2014-05-29    0
2014-05-30    0
2014-05-31    0
Freq: D, Length: 31
```
We can also slice between two specified months. The following returns the items in August and September 2014:
```python
# items between two months
s3['2014-08':'2014-09']
```

```
2014-08-01    0
2014-08-02    0
2014-08-03    0
...
2014-09-28    0
2014-09-29    0
2014-09-30    0
Freq: D, Length: 61
```
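Month-level partial strings adjust to each month's length, and a range of months includes the whole of the final month. The sketch below checks a few lengths on the same two-year series:

```python
import pandas as pd

s3 = pd.Series(0, pd.date_range('2013-01-01', '2014-12-31'))

feb = s3.loc['2014-02']                # February 2014 has 28 days
aug_sep = s3.loc['2014-08':'2014-09']  # runs through September 30

print(len(feb), len(aug_sep))          # 28 61
print(aug_sep.index[-1].date())        # 2014-09-30
```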
Time-series data in pandas can be created on intervals other than daily frequency. Different frequencies can be generated with `pd.date_range()` by using the `freq` parameter. This parameter defaults to `'D'`, which represents daily frequency.
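The sketch below quickly checks the default. When `freq` is omitted, the resulting index reports a daily frequency:

```python
import pandas as pd

# no freq argument, so pandas uses calendar days
rng = pd.date_range('2014-08-01', periods=3)

print(rng.freqstr)      # D
print(rng[1] - rng[0])  # 1 days 00:00:00
```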
To demonstrate alternative frequencies, the following creates a `DatetimeIndex` with 1-minute intervals between the two specified dates by specifying `freq='T'`:
```python
# generate a Series at one minute intervals
np.random.seed(123456)
bymin = pd.Series(np.random.randn(24*60*90),
                  pd.date_range('2014-08-01',
                                '2014-10-29 23:59',
                                freq='T'))
bymin
```

```
2014-08-01 00:00:00    0.469112
2014-08-01 00:01:00   -0.282863
2014-08-01 00:02:00   -1.509059
...
2014-10-29 23:57:00    1.850604
2014-10-29 23:58:00   -1.589660
2014-10-29 23:59:00    0.266429
Freq: T, Length: 129600
```
This time series allows us to slice at a finer resolution, down to the minute and smaller intervals if using finer frequencies. To demonstrate minute-level slicing, the following slices the values at 9 consecutive minutes:
```python
# slice down to the minute
bymin['2014-08-01 00:02':'2014-08-01 00:10']
```

```
2014-08-01 00:02:00   -1.509059
2014-08-01 00:03:00   -1.135632
2014-08-01 00:04:00    1.212112
2014-08-01 00:05:00   -0.173215
2014-08-01 00:06:00    0.119209
2014-08-01 00:07:00   -1.044236
2014-08-01 00:08:00   -0.861849
2014-08-01 00:09:00   -2.104569
2014-08-01 00:10:00   -0.494929
Freq: T, dtype: float64
```
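Here is a smaller sketch of the same minute-level slice on one day of data. It uses `'min'`, the minute alias that current pandas prefers over `'T'`. The series values are simply positions, so the endpoints are easy to check:

```python
import numpy as np
import pandas as pd

# one day of minute-level data; 'min' is the minute alias
minutes = pd.date_range('2014-08-01', periods=24 * 60, freq='min')
bymin = pd.Series(np.arange(len(minutes)), index=minutes)

# both endpoints are included, so 00:02 through 00:10 gives 9 rows
window = bymin['2014-08-01 00:02':'2014-08-01 00:10']

print(len(window))                      # 9
print(window.iloc[0], window.iloc[-1])  # 2 10
```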
The following table lists the frequency aliases available in the version of pandas used here. pandas 2.2 deprecated several of them in favor of new spellings; for example, `'M'` became `'ME'`, `'T'` became `'min'`, and `'S'` became `'s'`.

Alias | Description |
---|---|
`B` | Business day frequency |
`C` | Custom business day frequency |
`D` | Calendar day frequency (the default) |
`W` | Weekly frequency |
`M` | Month end frequency |
`BM` | Business month end frequency |
`CBM` | Custom business month end frequency |
`MS` | Month start frequency |
`BMS` | Business month start frequency |
`CBMS` | Custom business month start frequency |
`Q` | Quarter end frequency |
`BQ` | Business quarter end frequency |
`QS` | Quarter start frequency |
`BQS` | Business quarter start frequency |
`A` | Year end frequency |
`BA` | Business year end frequency |
`AS` | Year start frequency |
`BAS` | Business year start frequency |
`H` | Hourly frequency |
`T` | Minute-by-minute frequency |
`S` | Second-by-second frequency |
`L` | Milliseconds |
`U` | Microseconds |
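To compare several aliases side by side, the sketch below generates three timestamps for each one, starting on Friday, 2014-08-01. It uses only aliases whose spelling is the same in older and current pandas:

```python
import pandas as pd

# aliases spelled the same before and after the pandas 2.2 renames
aliases = ['D', 'B', 'W', 'MS', 'QS']

ranges = {alias: pd.date_range('2014-08-01', periods=3, freq=alias)
          for alias in aliases}

for alias, rng in ranges.items():
    # anchored aliases (W, QS) roll forward to their first valid date
    print(alias, [str(ts.date()) for ts in rng])
```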
As an example, if you want to generate a time series that uses only business days, use the `'B'` frequency:
```python
# generate a series based upon business days
days = pd.date_range('2014-08-29', '2014-09-05', freq='B')
for d in days:
    print(d)
```

```
2014-08-29 00:00:00
2014-09-01 00:00:00
2014-09-02 00:00:00
2014-09-03 00:00:00
2014-09-04 00:00:00
2014-09-05 00:00:00
```
In this range, August 30 and 31 were skipped because they fell on a weekend. A calendar-day frequency would have included them.
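The `'B'` frequency only skips weekends, so Monday, September 1, 2014 (U.S. Labor Day) is still included. The custom business day frequency listed in the table can also exclude holidays. In the sketch below, the holiday list is supplied by hand as an assumption, not taken from a holiday calendar:

```python
import pandas as pd

start, end = '2014-08-29', '2014-09-05'

calendar_days = pd.date_range(start, end, freq='D')
business_days = pd.date_range(start, end, freq='B')

# hand-written holiday list (an assumption for this sketch)
cbd = pd.offsets.CustomBusinessDay(holidays=['2014-09-01'])
custom_days = pd.date_range(start, end, freq=cbd)

print(len(calendar_days), len(business_days), len(custom_days))  # 8 6 5
```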
A range can be created starting at a particular date and time, with a specific frequency, for a specific number of periods by using the `periods` parameter. To demonstrate, the following creates a 10-item `DatetimeIndex` starting at `2014-08-01 12:10:01` at 1-second intervals:
```python
# periods will use the frequency as the increment
pd.date_range('2014-08-01 12:10:01', freq='S', periods=10)
```

```
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-08-01 12:10:01, ..., 2014-08-01 12:10:10]
Length: 10, Freq: S, Timezone: None
```
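`periods` can also be combined with `end` instead of a start date, in which case pandas counts backward from the end. The sketch below rebuilds the same ten seconds. It uses the `pd.offsets.Second()` offset object in place of the `'S'` string alias:

```python
import pandas as pd

one_second = pd.offsets.Second()

forward = pd.date_range('2014-08-01 12:10:01', freq=one_second, periods=10)
backward = pd.date_range(end='2014-08-01 12:10:10', freq=one_second, periods=10)

print(backward[0])               # 2014-08-01 12:10:01
print(forward.equals(backward))  # True
```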