Dealing with dates

Dates are complicated. Just think of the Y2K bug, the pending Year 2038 problem, and time zones. It's a mess. We encounter dates naturally when dealing with the time-series data. pandas can create date ranges, resample time-series data, and perform date arithmetic operations.

Create a range of dates starting from January 1, 1900 with 42 days as follows:

print "Date range", pd.date_range('1/1/1900', periods=42, freq='D')

January has less than 42 days, so the end date falls in February as you can check for yourself:

Date range <class 'pandas.tseries.index.DatetimeIndex'>
[1900-01-01, ..., 1900-02-11]
Length: 42, Freq: D, Timezone: None

The following table from the pandas official documentation (refer to http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) describes frequencies used in pandas:

Short code

Description

B

Business day frequency

C

Custom business day frequency (experimental)

D

Calendar day frequency

W

Weekly frequency

M

Month end frequency

BM

Business month end frequency

MS

Month start frequency

BMS

Business month start frequency

Q

Quarter end frequency

BQ

Business quarter end frequency

QS

Quarter start frequency

BQS

Business quarter start frequency

A

Year end frequency

BA

Business year end frequency

AS

Year start frequency

BAS

Business year start frequency

H

Hourly frequency

T

Minutely frequency

S

Secondly frequency

L

Milliseconds

U

Microseconds

Date ranges have their limits in pandas. Timestamps in pandas (based on the NumPy datetime64 data type) are represented by a 64-bit integer with nanosecond resolution (a billionth of a second). This limits legal timestamps to dates in the range approximately between the year 1677 and 2262 (not all dates in these years are valid). The exact midpoint of this range is at January 1, 1970. For example, January 1, 1677 cannot be defined with a pandas timestamp, while September 30, 1677 can, as demonstrated in the following code snippet:

try:
   print "Date range", pd.date_range('1/1/1677', periods=4, freq='D')
except:
   etype, value, _ = sys.exc_info()
   print "Error encountered", etype, value

The code snippet prints the following error message:

Date range Error encountered <class 'pandas.tslib.OutOfBoundsDatetime'> Out of bounds nanosecond timestamp: 1677-01-01 00:00:00

Given all the previous information, calculate the allowed date range with pandas DateOffset as follows:

offset = DateOffset(seconds=2 ** 63/10 ** 9)
mid = pd.to_datetime('1/1/1970')
print "Start valid range", mid - offset
print "End valid range", mid + offset'

We get the following range values:

Start valid range 1677-09-21 00:12:44
End valid range 2262-04-11 23:47:16

We can convert a list of strings to dates with pandas. Of course, not all strings can be converted. If pandas is unable to convert a string, an error is often reported. Sometimes, ambiguities can arise due to differences in the way dates are defined in different locales. Use a format string in this case, as follows:

print "With format", pd.to_datetime(['19021112', '19031230'], format='%Y%m%d')

The strings should be converted without an error occurring:

With format [datetime.datetime(1902, 11, 12, 0, 0)
 datetime.datetime(1903, 12, 30, 0, 0)]

If we try to convert a string, which is clearly not a date, by default the string is not converted:

print "Illegal date", pd.to_datetime(['1902-11-12', 'not a date'])

The second string in the list should not be converted:

Illegal date ['1902-11-12' 'not a date']

To force conversion, set the coerce parameter to True:

print "Illegal date coerced", pd.to_datetime(['1902-11-12', 'not a date'], coerce=True)

Obviously, the second string still cannot be converted to a date, so the only valid value we can give it is NaT (not a time):

Illegal date coerced <class 'pandas.tseries.index.DatetimeIndex'>
[1902-11-12, NaT]
Length: 2, Freq: None, Timezone: None

The code for this example is in date_handling.py of this book's code bundle:

import pandas as pd
import sys

print "Date range", pd.date_range('1/1/1900', periods=42, freq='D')

try:
   print "Date range", pd.date_range('1/1/1677', periods=4, freq='D')
except:
   etype, value, _ = sys.exc_info()
   print "Error encountered", etype, value

print pd.to_datetime(['1900/1/1', '1901.12.11'])

print "With format", pd.to_datetime(['19021112', '19031230'], format='%Y%m%d')

print "Illegal date", pd.to_datetime(['1902-11-12', 'not a date'])
print "Illegal date coerced", pd.to_datetime(['1902-11-12', 'not a date'], coerce=True)
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.184.102