Dates are complicated. Just think of the Y2K bug, the pending Year 2038 problem, and time zones. It's a mess. We encounter dates naturally when dealing with the time-series data. pandas can create date ranges, resample time-series data, and perform date arithmetic operations.
Create a range of dates starting from January 1, 1900 with 42 days as follows:
print "Date range", pd.date_range('1/1/1900', periods=42, freq='D')
January has less than 42 days, so the end date falls in February as you can check for yourself:
Date range <class 'pandas.tseries.index.DatetimeIndex'> [1900-01-01, ..., 1900-02-11] Length: 42, Freq: D, Timezone: None
The following table from the pandas official documentation (refer to http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) describes frequencies used in pandas:
Short code |
Description |
---|---|
B |
Business day frequency |
C |
Custom business day frequency (experimental) |
D |
Calendar day frequency |
W |
Weekly frequency |
M |
Month end frequency |
BM |
Business month end frequency |
MS |
Month start frequency |
BMS |
Business month start frequency |
Q |
Quarter end frequency |
BQ |
Business quarter end frequency |
QS |
Quarter start frequency |
BQS |
Business quarter start frequency |
A |
Year end frequency |
BA |
Business year end frequency |
AS |
Year start frequency |
BAS |
Business year start frequency |
H |
Hourly frequency |
T |
Minutely frequency |
S |
Secondly frequency |
L |
Milliseconds |
U |
Microseconds |
Date ranges have their limits in pandas. Timestamps in pandas (based on the NumPy datetime64
data type) are represented by a 64-bit integer with nanosecond resolution (a billionth of a second). This limits legal timestamps to dates in the range approximately between the year 1677 and 2262 (not all dates in these years are valid). The exact midpoint of this range is at January 1, 1970. For example, January 1, 1677 cannot be defined with a pandas timestamp, while September 30, 1677 can, as demonstrated in the following code snippet:
try: print "Date range", pd.date_range('1/1/1677', periods=4, freq='D') except: etype, value, _ = sys.exc_info() print "Error encountered", etype, value
The code snippet prints the following error message:
Date range Error encountered <class 'pandas.tslib.OutOfBoundsDatetime'> Out of bounds nanosecond timestamp: 1677-01-01 00:00:00
Given all the previous information, calculate the allowed date range with pandas DateOffset
as follows:
offset = DateOffset(seconds=2 ** 63/10 ** 9) mid = pd.to_datetime('1/1/1970') print "Start valid range", mid - offset print "End valid range", mid + offset'
We get the following range values:
Start valid range 1677-09-21 00:12:44 End valid range 2262-04-11 23:47:16
We can convert a list of strings to dates with pandas. Of course, not all strings can be converted. If pandas is unable to convert a string, an error is often reported. Sometimes, ambiguities can arise due to differences in the way dates are defined in different locales. Use a format string in this case, as follows:
print "With format", pd.to_datetime(['19021112', '19031230'], format='%Y%m%d')
The strings should be converted without an error occurring:
With format [datetime.datetime(1902, 11, 12, 0, 0) datetime.datetime(1903, 12, 30, 0, 0)]
If we try to convert a string, which is clearly not a date, by default the string is not converted:
print "Illegal date", pd.to_datetime(['1902-11-12', 'not a date'])
The second string in the list should not be converted:
Illegal date ['1902-11-12' 'not a date']
To force conversion, set the coerce
parameter to True
:
print "Illegal date coerced", pd.to_datetime(['1902-11-12', 'not a date'], coerce=True)
Obviously, the second string still cannot be converted to a date, so the only valid value we can give it is NaT
(not a time):
Illegal date coerced <class 'pandas.tseries.index.DatetimeIndex'> [1902-11-12, NaT] Length: 2, Freq: None, Timezone: None
The code for this example is in date_handling.py
of this book's code bundle:
import pandas as pd import sys print "Date range", pd.date_range('1/1/1900', periods=42, freq='D') try: print "Date range", pd.date_range('1/1/1677', periods=4, freq='D') except: etype, value, _ = sys.exc_info() print "Error encountered", etype, value print pd.to_datetime(['1900/1/1', '1901.12.11']) print "With format", pd.to_datetime(['19021112', '19031230'], format='%Y%m%d') print "Illegal date", pd.to_datetime(['1902-11-12', 'not a date']) print "Illegal date coerced", pd.to_datetime(['1902-11-12', 'not a date'], coerce=True)
3.137.184.102