Frequencies in pandas are represented using date offsets. We have touched on this concept at the beginning of the chapter when discussing Timedelta
objects. pandas extends the capabilities of these using the concept of DateOffset
objects, which represent knowledge of how to integrate time offsets and frequencies relative to DatetimeIndex
objects.
We will examine how pandas uses date offsetting in two different ways. The first calculates offsets of a specific duration from a given date, and the other calculates offsets based on what are referred to as anchored offsets.
DatetimeIndex
objects are created at various frequencies by passing in frequency strings, such as 'M'
, 'W'
, and 'BM'
to the freq
parameter of pd.date_range()
. Under the hood, these frequency strings are translated into an instance of the pandas DateOffset
object, which is associated with the frequency of the index.
DateOffset
represents a regular frequency increment. Specific date offset logic, such as "month", "business day", or "hour", is represented in pandas with various subclasses of DateOffset
. A DateOffset
provides pandas with the intelligence to be able to determine how to calculate a specific interval of time from a reference date and time.
The use of DatetimeIndex
and DateOffset
objects provides the user of pandas great flexibility in calculating a new date/time from another using an offset other than one that represents a fixed period of time. A practical example would be to calculate the next day of business. This is not simply determined by adding one day to datetime
. If a date represents a Friday, the next business day in the US financial market is not Saturday but Monday. In some cases, one business day from a Friday may actually be Tuesday if Monday is a holiday. pandas gives us all the tools required to handle these scenarios.
We can start to examine the use of date offsets by generating a date range using 'B'
as the frequency, which will return a sequence of dates between the specified dates—but only dates that are considered business days during that interval:
In [35]: # get all business days between and inclusive of these two dates dti = pd.date_range('2014-08-29', '2014-09-05', freq='B') dti.values Out[35]: array(['2014-08-28T18:00:00.000000000-0600', '2014-08-31T18:00:00.000000000-0600', '2014-09-01T18:00:00.000000000-0600', '2014-09-02T18:00:00.000000000-0600', '2014-09-03T18:00:00.000000000-0600', '2014-09-04T18:00:00.000000000-0600'], dtype='datetime64[ns]')
This time series has omitted 2014-08-30
and 2014-08-30
, as they are Saturday and Sunday and not considered a business day.
DatetimeIndex
has a .freq
property that represents the frequency of the timestamps in the index:
In [36]: # check the frequency is BusinessDay dti.freq Out[36]: <BusinessDay>
Notice that pandas has created an instance of the BusinessDay
class to represent the DateOffset
unit of this index. As mentioned earlier, pandas represents different date offsets with a subclass of the DateOffset
class. The following are the various built-in date offset classes that are provided by pandas:
Class |
Description |
---|---|
|
Generic offset defaults to one calendar day |
|
Business day |
|
Custom business day |
|
One week, optionally anchored on a day of the week |
|
The x-th day of the y-th week of each month |
|
The x-th day of the last week of each month |
|
Calendar month end |
|
Calendar month start |
|
Business month end |
|
Business month start |
|
Custom business month end |
|
Custom business month start |
| |
|
Quarter start |
| |
|
Business quarter start |
|
Retail (52-53 week) quarter |
|
Calendar year end |
|
Calendar year start |
|
Business quarter end |
|
Business quarter start |
|
Retail (52-53 week) year |
|
One hour |
|
One minute |
|
One second |
|
One millisecond |
|
One microsecond |
pandas takes this strategy of using DateOffset
and its specializations to codify logic to calculate the next datetime
from another datetime
. This makes using these objects very flexible as well as powerful. DateOffset
objects can be used in various scenarios, including the following:
datetime
object to obtain a shifted daterollforward
and rollback
methods to move a date forward or backward to the next or previous "offset date"
DateOffset
objects can be created by passing them a datetime
object that represents a fixed duration of time or using a number of keyword arguments. Keyword arguments fall into two general categories. The first category is keywords that represent absolute dates: year, month, day, hour, minute, second, and microsecond. The second category represents relative durations and can be negative values: years, months, weeks, day, hours, minutes, seconds, and microseconds.
The following creates a 1-day offset and adds it to datetime
:
In [37]: # calculate a one day offset from 2014-8-29 d = datetime(2014, 8, 29) do = pd.DateOffset(days = 1) d + do Out[37]: Timestamp('2014-08-30 00:00:00')
The following calculates the next business day from a given date:
In [38]: # import the data offset types from pandas.tseries.offsets import * # calculate one business day from 2014-8-31 d + BusinessDay() Out[38]: Timestamp('2014-09-01 00:00:00')
Multiple units of a specific DateOffset
can be used using multiplication:
In [39]: # determine 2 business days from 2014-8-29 d + 2 * BusinessDay() Out[39]: Timestamp('2014-09-02 00:00:00')
The following demonstrates using a BMonthEnd
object to calculate the last business day of a month from a given date (in this case, 2014-09-02
):
In [40]: # what is the next business month end # from a specific date? d + BMonthEnd() Out[40]: Timestamp('2014-09-30 00:00:00')
The following uses the BMonthEnd
objects' .rollforward()
method to calculate the next month end:
In [41]: # calculate the next month end by # rolling forward from a specific date BMonthEnd().rollforward(datetime(2014, 9, 15)) Out[41]: Timestamp('2014-09-30 00:00:00')
Several of the offset classes can be parameterized to provide finer control of the offset behavior. As an example, the following calculates the date of the Tuesday (weekday = 1
) in the week prior to 2014-08-31
:
In [42]: # calculate the date of the Tuesday previous # to a specified date d - Week(weekday = 1) Out[42]: Timestamp('2014-08-26 00:00:00')
pandas also supports the concept of anchored offsets, which are frequencies that can be specified using a specific suffix. One version of anchored offsets allows you to specify weekly intervals that fall on a specific day of the week. These aliases are of this form:
Alias |
Description |
---|---|
|
Weekly on Sunday (same as |
|
Weekly on Monday |
|
Weekly on Tuesday |
|
Weekly on Wednesday |
|
Weekly on Thursday |
|
Weekly on Friday |
|
Weekly on Saturday |
As an example, the following generates an index that consists of the dates of all Wednesdays between the two specified dates:
In [43]: # calculate all Wednesdays between 2014-06-01 # and 2014-08-31 wednesdays = pd.date_range('2014-06-01', '2014-08-31', freq="W-WED") wednesdays.values Out[43]: array(['2014-06-03T18:00:00.000000000-0600', '2014-06-10T18:00:00.000000000-0600', '2014-06-17T18:00:00.000000000-0600', '2014-06-24T18:00:00.000000000-0600', '2014-07-01T18:00:00.000000000-0600', '2014-07-08T18:00:00.000000000-0600', '2014-07-15T18:00:00.000000000-0600', '2014-07-22T18:00:00.000000000-0600', '2014-07-29T18:00:00.000000000-0600', '2014-08-05T18:00:00.000000000-0600', '2014-08-12T18:00:00.000000000-0600', '2014-08-19T18:00:00.000000000-0600', '2014-08-26T18:00:00.000000000-0600'], dtype='datetime64[ns]')
Anchored offsets can also be created using the annual and quarterly frequencies. These frequency anchors are of the general form [B][A|Q][S]-[MON]
, where B
(business days) and S
(start of period instead end) are optional, A
is for annual or Q
for quarterly, and MON
is the three-digit abbreviation for the month (JAN
, FEB
, …).
To demonstrate, the following generates the business dates for quarter end in the year 2014 with the year anchored at the end of June:
In [44]: # what are all of the business quarterly end # dates in 2014? qends = pd.date_range('2014-01-01', '2014-12-31', freq='BQS-JUN') qends.values Out[44]: array(['2014-03-02T17:00:00.000000000-0700', '2014-06-01T18:00:00.000000000-0600', '2014-08-31T18:00:00.000000000-0600', '2014-11-30T17:00:00.000000000-0700'], dtype='datetime64[ns]')
Many useful mathematical operations on time-series data require that events within a specific time interval be analyzed. A simple example would be to determine how many financial transactions occurred in a specific period.
This can be performed using Timestamp
and DateOffset
, where the bounds are calculated and then items filtered based on these bounds. However, this becomes cumbersome when you need to deal with events that must be grouped into multiple periods of time as you start to need to manage sets of the Timestamp
and DateOffset
objects.
To facilitate these types of data organization and calculations, pandas makes intervals of time a formal construct using the Period
class. pandas also formalizes series of Period
objects using PeriodIndex
, which provides capabilities of aligning data items based on the indexes' associated period objects.
We will look at the Period
object and PeriodIndex
in detail in the following subsections.
pandas formalizes the concept of an interval of time using a Period
object. Period
allows you to specify durations based on frequencies such as daily, weekly, monthly, annually, quarterly, and so on, and it will provide a specific start and end Timestamp
representing the specific bounded interval of time.
Period
is created using a timestamp and a frequency (seen in the table earlier), where the timestamp represents the anchor used as a point of reference and the frequency is the duration of time.
To demonstrate, the following creates a period representing one month anchored in August 2014:
In [45]: # create a period representing a month of time # starting in August 2014 aug2014 = pd.Period('2014-08', freq='M') aug2014 Out[45]: Period('2014-08', 'M')
Period
has start_time
and end_time
properties that inform us about the derived start and end times of Period
:
In [46]: # examine the start and end times of this period aug2014.start_time, aug2014.end_time Out[46]: (Timestamp('2014-08-01 00:00:00'), Timestamp('2014-08-31 23:59:59.999999999'))
As we specified a period that is August 2014, pandas determines the anchor (start_time
) and then calculates end_time
based on the specified frequency. In this case, it calculates one month from start_time
and returns the last unit of the prior time.
Mathematical operations are overloaded on Period
to calculate another Period
based on the given value. The following creates a new Period
object based on aug2014
, which is shifted by 1
unit of its represented frequency (which is one month):
In [47]: # calculate the period that is one frequency # unit of the aug2014 period further along in time # This happens to be September 2014 sep2014 = aug2014 + 1 sep2014 Out[47]: Period('2014-09', 'M')
The concept of the shift is very important and powerful. The addition of 1
to this Period
object informs it to shift in time one positive unit of whatever frequency is represented by the object. In this case, it shifts the period one month forward to September 2014.
If we examine the start and end times represented in the sep2014
object, we see that pandas has gone through the effort of determining the correct dates representing the entirety of September 2014:
In [48]: sep2014.start_time, sep2014.end_time Out[48]: (Timestamp('2014-09-01 00:00:00'), Timestamp('2014-09-30 23:59:59.999999999'))
Note that Period
had the intelligence to know that September is 30 days and not 31. This is part of the incredible intelligence behind the Period
object that saves us a lot of coding. It is not simply adding 30 days (in this example) but one unit frequency of the period. This helps solve many difficult date management problems.
A series of Period
objects can be combined into a special form of pandas index known as PeriodIndex
. A PeriodIndex
index is useful for being able to associate data to specific intervals of time and being able to slice and perform analysis on the events in each interval represented in PeriodIndex
.
The following code creates a PeriodIndex
consisting of 1-month intervals for the year 2013:
In [49]: # create a period index representing # all monthly boundaries in 2013 mp2013 = pd.period_range('1/1/2013', '12/31/2013', freq='M') mp2013 Out[49]: <class 'pandas.tseries.period.PeriodIndex'> [2013-01, ..., 2013-12] Length: 12, Freq: M
PeriodIndex
differs from DatetimeIndex
in that the index labels are the Period
objects. The following prints the start and end times for all the Period
objects in the index:
In [50]: # loop through all period objects in the index # printing start and end time for each for p in mp2013: print ("{0} {1}".format(p.start_time, p.end_time)) Out[50]: 2013-01-01 00:00:00 2013-01-31 23:59:59.999999999 2013-02-01 00:00:00 2013-02-28 23:59:59.999999999 2013-03-01 00:00:00 2013-03-31 23:59:59.999999999 2013-04-01 00:00:00 2013-04-30 23:59:59.999999999 2013-05-01 00:00:00 2013-05-31 23:59:59.999999999 2013-06-01 00:00:00 2013-06-30 23:59:59.999999999 2013-07-01 00:00:00 2013-07-31 23:59:59.999999999 2013-08-01 00:00:00 2013-08-31 23:59:59.999999999 2013-09-01 00:00:00 2013-09-30 23:59:59.999999999 2013-10-01 00:00:00 2013-10-31 23:59:59.999999999 2013-11-01 00:00:00 2013-11-30 23:59:59.999999999 2013-12-01 00:00:00 2013-12-31 23:59:59.999999999
pandas has conveniently determined the start and end of each month, taking into account the actual number of days in each specific month.
Using PeriodIndex
, we can construct a Series
object using it as the index and associate a value to each Period
in the index:
In [51]: # create a Series with a PeriodIndex np.random.seed(123456) ps = pd.Series(np.random.randn(12), mp2013) ps Out[51]: 2013-01 0.469112 2013-02 -0.282863 2013-03 -1.509059 ... 2013-10 -2.104569 2013-11 -0.494929 2013-12 1.071804 Freq: M, Length: 12
We now have a time series where the value at a specific index label represents a measurement that spans a period of time, such as the average value of a security in a given month instead of at a specific time. This becomes very useful when we perform the resampling of the time series to another frequency.
Like DatetimeIndex
, PeriodIndex
can be used to index values using Period
, a string representing a period or partial period specification. To demonstrate, we will create another series similar to the previous one but spanning two years, 2013 and 2014:
In [52]: # create a Series with a PeriodIndex and which # represents all calendar month periods in 2013 and 2014 np.random.seed(123456) ps = pd.Series(np.random.randn(24), pd.period_range('1/1/2013', '12/31/2014', freq='M')) ps Out[52]: 2013-01 0.469112 2013-02 -0.282863 2013-03 -1.509059 ... 2014-10 0.113648 2014-11 -1.478427 2014-12 0.524988 Freq: M, Length: 24
Individual values can be selected using the specific index label using either a Period
object or a string representing a period. The following demonstrates how to use a string representation:
In [53]: # get value for period represented by 2014-06 ps['2014-06'] Out[53]: 0.567020349793672
Partial specifications can also be used, such as the following, which retrieves all values just for periods in 2014
:
In [54]: # get values for all periods in 2014 ps['2014'] Out[54]: 2014-01 0.721555 2014-02 -0.706771 2014-03 -1.039575 ... 2014-10 0.113648 2014-11 -1.478427 2014-12 0.524988 Freq: M, Length: 12
Like any index in pandas, PeriodIndex
can also be sliced. The following retrieves all values for periods between (and inclusive of) March and June 2014:
In [55]: # all values between (and including) March and June 2014 ps['2014-03':'2014-06'] Out[55]: 2014-03 -1.039575 2014-04 0.271860 2014-05 -0.424972 2014-06 0.567020 Freq: M, dtype: float64
3.129.72.176