Calculating new dates using offsets

Frequencies in pandas are represented using date offsets. We have touched on this concept at the beginning of the chapter when discussing Timedelta objects. pandas extends the capabilities of these using the concept of DateOffset objects, which represent knowledge of how to integrate time offsets and frequencies relative to DatetimeIndex objects.

We will examine how pandas uses date offsetting in two different ways. The first calculates offsets of a specific duration from a given date, and the other calculates offsets based on what are referred to as anchored offsets.

Date offsets

DatetimeIndex objects are created at various frequencies by passing in frequency strings, such as 'M', 'W', and 'BM' to the freq parameter of pd.date_range(). Under the hood, these frequency strings are translated into an instance of the pandas DateOffset object, which is associated with the frequency of the index.

DateOffset represents a regular frequency increment. Specific date offset logic, such as "month", "business day", or "hour", is represented in pandas with various subclasses of DateOffset. A DateOffset provides pandas with the intelligence to be able to determine how to calculate a specific interval of time from a reference date and time.

The use of DatetimeIndex and DateOffset objects provides the user of pandas great flexibility in calculating a new date/time from another using an offset other than one that represents a fixed period of time. A practical example would be to calculate the next day of business. This is not simply determined by adding one day to datetime. If a date represents a Friday, the next business day in the US financial market is not Saturday but Monday. In some cases, one business day from a Friday may actually be Tuesday if Monday is a holiday. pandas gives us all the tools required to handle these scenarios.

We can start to examine the use of date offsets by generating a date range using 'B' as the frequency, which will return a sequence of dates between the specified dates—but only dates that are considered business days during that interval:

In [35]:
   # get all business days between and inclusive of these two dates
   dti = pd.date_range('2014-08-29', '2014-09-05', freq='B')
   dti.values

Out[35]:
   array(['2014-08-28T18:00:00.000000000-0600',
          '2014-08-31T18:00:00.000000000-0600',
          '2014-09-01T18:00:00.000000000-0600',
          '2014-09-02T18:00:00.000000000-0600',
          '2014-09-03T18:00:00.000000000-0600',
          '2014-09-04T18:00:00.000000000-0600'], dtype='datetime64[ns]')

This time series has omitted 2014-08-30 and 2014-08-30, as they are Saturday and Sunday and not considered a business day.

DatetimeIndex has a .freq property that represents the frequency of the timestamps in the index:

In [36]:
   # check the frequency is BusinessDay
   dti.freq

Out[36]:
   <BusinessDay>

Notice that pandas has created an instance of the BusinessDay class to represent the DateOffset unit of this index. As mentioned earlier, pandas represents different date offsets with a subclass of the DateOffset class. The following are the various built-in date offset classes that are provided by pandas:

Class

Description

DateOffset

Generic offset defaults to one calendar day

BDay

Business day

CDay

Custom business day

Week

One week, optionally anchored on a day of the week

WeekOfMonth

The x-th day of the y-th week of each month

LastWeekOfMonth

The x-th day of the last week of each month

MonthEnd

Calendar month end

MonthBegin

Calendar month start

BMonthEnd

Business month end

BMonthBegin

Business month start

CBMonthEnd

Custom business month end

CBMonthBegin

Custom business month start

QuarterEnd

Quarter end

QuarterBegin

Quarter start

BQuarterEnd

Business quarter end

BQuarterBegin

Business quarter start

FYS253Quarter

Retail (52-53 week) quarter

YearEnd

Calendar year end

YearBegin

Calendar year start

BYearEnd

Business quarter end

BYearBegin

Business quarter start

FYS253

Retail (52-53 week) year

Hour

One hour

Minute

One minute

Second

One second

Milli

One millisecond

Micro

One microsecond

pandas takes this strategy of using DateOffset and its specializations to codify logic to calculate the next datetime from another datetime. This makes using these objects very flexible as well as powerful. DateOffset objects can be used in various scenarios, including the following:

  • They can be added/subtracted to/from a datetime object to obtain a shifted date
  • They can be multiplied by an integer (positive or negative) so that the increment will be applied multiple times
  • They have rollforward and rollback methods to move a date forward or backward to the next or previous "offset date"

DateOffset objects can be created by passing them a datetime object that represents a fixed duration of time or using a number of keyword arguments. Keyword arguments fall into two general categories. The first category is keywords that represent absolute dates: year, month, day, hour, minute, second, and microsecond. The second category represents relative durations and can be negative values: years, months, weeks, day, hours, minutes, seconds, and microseconds.

The following creates a 1-day offset and adds it to datetime:

In [37]:
   # calculate a one day offset from 2014-8-29
   d = datetime(2014, 8, 29)
   do = pd.DateOffset(days = 1) 
   d + do

Out[37]:
   Timestamp('2014-08-30 00:00:00')

The following calculates the next business day from a given date:

In [38]:
   # import the data offset types
   from pandas.tseries.offsets import *
   # calculate one business day from 2014-8-31
   d + BusinessDay()

Out[38]:
   Timestamp('2014-09-01 00:00:00')

Multiple units of a specific DateOffset can be used using multiplication:

In [39]:
   # determine 2 business days from 2014-8-29
   d + 2 * BusinessDay()

Out[39]:
   Timestamp('2014-09-02 00:00:00')

The following demonstrates using a BMonthEnd object to calculate the last business day of a month from a given date (in this case, 2014-09-02):

In [40]:
   # what is the next business month end
   # from a specific date?
   d + BMonthEnd()

Out[40]:
   Timestamp('2014-09-30 00:00:00')

The following uses the BMonthEnd objects' .rollforward() method to calculate the next month end:

In [41]:
   # calculate the next month end by
   # rolling forward from a specific date
   BMonthEnd().rollforward(datetime(2014, 9, 15))

Out[41]:
   Timestamp('2014-09-30 00:00:00')

Several of the offset classes can be parameterized to provide finer control of the offset behavior. As an example, the following calculates the date of the Tuesday (weekday = 1) in the week prior to 2014-08-31:

In [42]:
   # calculate the date of the Tuesday previous
   # to a specified date 
   d - Week(weekday = 1)

Out[42]:
   Timestamp('2014-08-26 00:00:00')

Anchored offsets

pandas also supports the concept of anchored offsets, which are frequencies that can be specified using a specific suffix. One version of anchored offsets allows you to specify weekly intervals that fall on a specific day of the week. These aliases are of this form:

Alias

Description

W-SUN

Weekly on Sunday (same as 'W')

W-MON

Weekly on Monday

W-TUE

Weekly on Tuesday

W-WED

Weekly on Wednesday

W-THU

Weekly on Thursday

W-FRI

Weekly on Friday

W-SAT

Weekly on Saturday

As an example, the following generates an index that consists of the dates of all Wednesdays between the two specified dates:

In [43]:
   # calculate all Wednesdays between 2014-06-01
   # and 2014-08-31
   wednesdays = pd.date_range('2014-06-01', 
                              '2014-08-31', freq="W-WED")
   wednesdays.values

Out[43]:
   array(['2014-06-03T18:00:00.000000000-0600',
          '2014-06-10T18:00:00.000000000-0600',
          '2014-06-17T18:00:00.000000000-0600',
          '2014-06-24T18:00:00.000000000-0600',
          '2014-07-01T18:00:00.000000000-0600',
          '2014-07-08T18:00:00.000000000-0600',
          '2014-07-15T18:00:00.000000000-0600',
          '2014-07-22T18:00:00.000000000-0600',
          '2014-07-29T18:00:00.000000000-0600',
          '2014-08-05T18:00:00.000000000-0600',
          '2014-08-12T18:00:00.000000000-0600',
          '2014-08-19T18:00:00.000000000-0600',
          '2014-08-26T18:00:00.000000000-0600'], dtype='datetime64[ns]')

Anchored offsets can also be created using the annual and quarterly frequencies. These frequency anchors are of the general form [B][A|Q][S]-[MON], where B (business days) and S (start of period instead end) are optional, A is for annual or Q for quarterly, and MON is the three-digit abbreviation for the month (JAN, FEB, …).

To demonstrate, the following generates the business dates for quarter end in the year 2014 with the year anchored at the end of June:

In [44]:
   # what are all of the business quarterly end
   # dates in 2014?
   qends = pd.date_range('2014-01-01', '2014-12-31', 
                         freq='BQS-JUN')
   qends.values

Out[44]:
   array(['2014-03-02T17:00:00.000000000-0700',
          '2014-06-01T18:00:00.000000000-0600',
          '2014-08-31T18:00:00.000000000-0600',
          '2014-11-30T17:00:00.000000000-0700'], 
   dtype='datetime64[ns]')

Representing durations of time using Period objects

Many useful mathematical operations on time-series data require that events within a specific time interval be analyzed. A simple example would be to determine how many financial transactions occurred in a specific period.

This can be performed using Timestamp and DateOffset, where the bounds are calculated and then items filtered based on these bounds. However, this becomes cumbersome when you need to deal with events that must be grouped into multiple periods of time as you start to need to manage sets of the Timestamp and DateOffset objects.

To facilitate these types of data organization and calculations, pandas makes intervals of time a formal construct using the Period class. pandas also formalizes series of Period objects using PeriodIndex, which provides capabilities of aligning data items based on the indexes' associated period objects.

We will look at the Period object and PeriodIndex in detail in the following subsections.

The Period object

pandas formalizes the concept of an interval of time using a Period object. Period allows you to specify durations based on frequencies such as daily, weekly, monthly, annually, quarterly, and so on, and it will provide a specific start and end Timestamp representing the specific bounded interval of time.

Period is created using a timestamp and a frequency (seen in the table earlier), where the timestamp represents the anchor used as a point of reference and the frequency is the duration of time.

To demonstrate, the following creates a period representing one month anchored in August 2014:

In [45]:
   # create a period representing a month of time
   # starting in August 2014
   aug2014 = pd.Period('2014-08', freq='M')
   aug2014

Out[45]:
   Period('2014-08', 'M')

Period has start_time and end_time properties that inform us about the derived start and end times of Period:

In [46]:
   # examine the start and end times of this period
   aug2014.start_time, aug2014.end_time

Out[46]:
   (Timestamp('2014-08-01 00:00:00'), Timestamp('2014-08-31 23:59:59.999999999'))

As we specified a period that is August 2014, pandas determines the anchor (start_time) and then calculates end_time based on the specified frequency. In this case, it calculates one month from start_time and returns the last unit of the prior time.

Mathematical operations are overloaded on Period to calculate another Period based on the given value. The following creates a new Period object based on aug2014, which is shifted by 1 unit of its represented frequency (which is one month):

In [47]:
   # calculate the period that is one frequency
   # unit of the aug2014 period further along in time
   # This happens to be September 2014
   sep2014 = aug2014 + 1
   sep2014

Out[47]:
   Period('2014-09', 'M')

The concept of the shift is very important and powerful. The addition of 1 to this Period object informs it to shift in time one positive unit of whatever frequency is represented by the object. In this case, it shifts the period one month forward to September 2014.

If we examine the start and end times represented in the sep2014 object, we see that pandas has gone through the effort of determining the correct dates representing the entirety of September 2014:

In [48]:
   sep2014.start_time, sep2014.end_time

Out[48]:
   (Timestamp('2014-09-01 00:00:00'), Timestamp('2014-09-30 23:59:59.999999999'))

Note that Period had the intelligence to know that September is 30 days and not 31. This is part of the incredible intelligence behind the Period object that saves us a lot of coding. It is not simply adding 30 days (in this example) but one unit frequency of the period. This helps solve many difficult date management problems.

PeriodIndex

A series of Period objects can be combined into a special form of pandas index known as PeriodIndex. A PeriodIndex index is useful for being able to associate data to specific intervals of time and being able to slice and perform analysis on the events in each interval represented in PeriodIndex.

The following code creates a PeriodIndex consisting of 1-month intervals for the year 2013:

In [49]:
   # create a period index representing 
   # all monthly boundaries in 2013
   mp2013 = pd.period_range('1/1/2013', '12/31/2013', freq='M')
   mp2013

Out[49]:
   <class 'pandas.tseries.period.PeriodIndex'>
   [2013-01, ..., 2013-12]
   Length: 12, Freq: M

PeriodIndex differs from DatetimeIndex in that the index labels are the Period objects. The following prints the start and end times for all the Period objects in the index:

In [50]:
   # loop through all period objects in the index
   # printing start and end time for each
   for p in mp2013:
      print ("{0} {1}".format(p.start_time, p.end_time))

Out[50]:
   2013-01-01 00:00:00 2013-01-31 23:59:59.999999999
   2013-02-01 00:00:00 2013-02-28 23:59:59.999999999
   2013-03-01 00:00:00 2013-03-31 23:59:59.999999999
   2013-04-01 00:00:00 2013-04-30 23:59:59.999999999
   2013-05-01 00:00:00 2013-05-31 23:59:59.999999999
   2013-06-01 00:00:00 2013-06-30 23:59:59.999999999
   2013-07-01 00:00:00 2013-07-31 23:59:59.999999999
   2013-08-01 00:00:00 2013-08-31 23:59:59.999999999
   2013-09-01 00:00:00 2013-09-30 23:59:59.999999999
   2013-10-01 00:00:00 2013-10-31 23:59:59.999999999
   2013-11-01 00:00:00 2013-11-30 23:59:59.999999999
   2013-12-01 00:00:00 2013-12-31 23:59:59.999999999

pandas has conveniently determined the start and end of each month, taking into account the actual number of days in each specific month.

Using PeriodIndex, we can construct a Series object using it as the index and associate a value to each Period in the index:

In [51]:
   # create a Series with a PeriodIndex
   np.random.seed(123456)
   ps = pd.Series(np.random.randn(12), mp2013)
   ps

Out[51]:
   2013-01    0.469112
   2013-02   -0.282863
   2013-03   -1.509059
   ...
   2013-10   -2.104569
   2013-11   -0.494929
   2013-12    1.071804
   Freq: M, Length: 12

We now have a time series where the value at a specific index label represents a measurement that spans a period of time, such as the average value of a security in a given month instead of at a specific time. This becomes very useful when we perform the resampling of the time series to another frequency.

Like DatetimeIndex, PeriodIndex can be used to index values using Period, a string representing a period or partial period specification. To demonstrate, we will create another series similar to the previous one but spanning two years, 2013 and 2014:

In [52]:
   # create a Series with a PeriodIndex and which
   # represents all calendar month periods in 2013 and 2014
   np.random.seed(123456)
   ps = pd.Series(np.random.randn(24), 
                  pd.period_range('1/1/2013', 
                                  '12/31/2014', freq='M'))
   ps

Out[52]:
   2013-01    0.469112
   2013-02   -0.282863
   2013-03   -1.509059
   ...
   2014-10    0.113648
   2014-11   -1.478427
   2014-12    0.524988
   Freq: M, Length: 24

Individual values can be selected using the specific index label using either a Period object or a string representing a period. The following demonstrates how to use a string representation:

In [53]:
   # get value for period represented by 2014-06
   ps['2014-06']

Out[53]:
   0.567020349793672

Partial specifications can also be used, such as the following, which retrieves all values just for periods in 2014:

In [54]:
   # get values for all periods in 2014
   ps['2014']

Out[54]:
   2014-01    0.721555
   2014-02   -0.706771
   2014-03   -1.039575
   ...
   2014-10    0.113648
   2014-11   -1.478427
   2014-12    0.524988
   Freq: M, Length: 12

Like any index in pandas, PeriodIndex can also be sliced. The following retrieves all values for periods between (and inclusive of) March and June 2014:

In [55]:
   # all values between (and including) March and June 2014
   ps['2014-03':'2014-06']

Out[55]:
   2014-03   -1.039575
   2014-04    0.271860
   2014-05   -0.424972
   2014-06    0.567020
   Freq: M, dtype: float64
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.72.176