Time zone management can be one of the most complicated issues to deal with when working with time-series data. Data is often collected in different systems across the globe using local time, and at some point, it will require coordination with data collected in other time zones.
Fortunately, pandas provides rich support for working with timestamps in different time zones. Under the covers, pandas utilizes the pytz and dateutil libraries to manage the time zone operations. The dateutil support is new as of pandas 0.14.1 and is currently only supported for fixed offset and tzfile zones. The default library used by pandas is pytz, with support for dateutil provided for compatibility with other applications.
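As a minimal sketch of selecting the dateutil backend, the following assumes a pandas installation in which a zone name prefixed with dateutil/ is resolved through dateutil, and that the US/Eastern zone data is available on the machine. The variable names are illustrative only:

```python
import pandas as pd

# The same zone requested through the default backend and through dateutil.
# The 'dateutil/' prefix asks pandas to build a dateutil time zone object.
ts_default = pd.Timestamp('2014-08-01 12:00', tz='US/Eastern')
ts_dateutil = pd.Timestamp('2014-08-01 12:00', tz='dateutil/US/Eastern')

# Both values describe the same instant, so they compare equal even
# though the underlying time zone objects come from different libraries.
same_instant = ts_default == ts_dateutil
print(type(ts_default.tz).__name__, type(ts_dateutil.tz).__name__, same_instant)
```

Which concrete class the default backend produces depends on the installed pandas version, so the sketch prints the type names rather than assuming them.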
pandas objects that are time zone-aware expose a .tz property. By default, however, pandas objects are not time zone-aware: for purposes of efficiency they carry no timezone object, and .tz is None. The following gets the current time and demonstrates that there is no time zone information by default:
In [58]:
   # get the current local time and demonstrate there is no
   # timezone info by default
   now = pd.Timestamp('now')
   now, now.tz is None

Out[58]:
   (Timestamp('2015-03-06 11:07:51.687326'), True)
This demonstrates that pd.Timestamp("now") returns a naive timestamp: a wall-clock time with no time zone attached. This is a reasonable default, but be aware of it. In general, I find that if you are collecting time-based data that will be stored for later access, or combining data from multiple sources, it is best to always localize to UTC.
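One way to follow the UTC recommendation is to request a time zone-aware value at the moment of collection. This is only a sketch; naive_now and utc_now are illustrative names:

```python
import pandas as pd

# A naive timestamp: wall-clock time with no time zone attached.
naive_now = pd.Timestamp.now()

# A time zone-aware timestamp for the current instant, expressed in UTC.
utc_now = pd.Timestamp.now(tz='UTC')

print(naive_now.tz is None)   # True
print(utc_now.utcoffset())    # 0:00:00
```

Storing the aware value removes any later guesswork about which zone a reading was taken in.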
Likewise, a DatetimeIndex and its Timestamp objects will not have associated time zone information by default:
In [59]:
   # default DatetimeIndex and its Timestamps do not have
   # time zone information
   rng = pd.date_range('3/6/2012 00:00', periods=15, freq='D')
   rng.tz is None, rng[0].tz is None

Out[59]:
   (True, True)
A list of common time zone names can be retrieved as shown in the following example. If you do a lot with time zone data, these will become very familiar:
In [60]:
   # import common timezones from pytz
   from pytz import common_timezones
   # report the first 5
   common_timezones[:5]

Out[60]:
   ['Africa/Abidjan',
    'Africa/Accra',
    'Africa/Addis_Ababa',
    'Africa/Algiers',
    'Africa/Asmara']
A naive Timestamp can be marked as UTC by passing 'UTC' to its .tz_localize() method. Note that localizing attaches the time zone without changing the wall-clock value:
In [61]:
   # get now, and now localized to UTC
   now = pd.Timestamp("now")
   local_now = now.tz_localize('UTC')
   now, local_now

Out[61]:
   (Timestamp('2015-03-06 11:07:51.750893'),
    Timestamp('2015-03-06 11:07:51.750893+0000', tz='UTC'))
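The difference between localizing and converting is easy to miss, so here is a small sketch using a fixed timestamp rather than the current time. The variable names are illustrative:

```python
import pandas as pd

naive = pd.Timestamp('2015-03-06 11:07:51')

# tz_localize labels the naive value as UTC; the wall-clock time is unchanged.
as_utc = naive.tz_localize('UTC')

# tz_convert keeps the instant and re-expresses it in another zone.
in_mountain = as_utc.tz_convert('US/Mountain')

print(as_utc)        # 2015-03-06 11:07:51+00:00
print(in_mountain)   # 2015-03-06 04:07:51-07:00
print(as_utc == in_mountain)   # True
```

Localizing answers "which zone was this clock reading taken in?", while converting answers "what does that same moment read on a clock elsewhere?".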
Any Timestamp can be localized to a specific time zone, either by passing the time zone name to .tz_localize() or, as in the following example, through the tz parameter of the Timestamp constructor:
In [62]:
   # localize a timestamp to US/Mountain time zone
   tstamp = pd.Timestamp('2014-08-01 12:00:00', tz='US/Mountain')
   tstamp

Out[62]:
   Timestamp('2014-08-01 12:00:00-0600', tz='US/Mountain')
A DatetimeIndex can be created with a specific time zone using the tz parameter of the pd.date_range() function:
In [63]:
   # create a DatetimeIndex using a time zone
   rng = pd.date_range('3/6/2012 00:00:00',
                       periods=10, freq='D', tz='US/Mountain')
   rng.tz, rng[0].tz

Out[63]:
   (<DstTzInfo 'US/Mountain' LMT-1 day, 17:00:00 STD>,
    <DstTzInfo 'US/Mountain' MST-1 day, 17:00:00 STD>)
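An existing naive DatetimeIndex can also be given a time zone after the fact. The sketch below, with illustrative variable names, compares that route with the tz parameter of pd.date_range():

```python
import pandas as pd

# Build the index with the time zone supplied up front.
aware_rng = pd.date_range('2012-03-06', periods=10, freq='D',
                          tz='US/Mountain')

# Build a naive index first, then attach the same time zone.
naive_rng = pd.date_range('2012-03-06', periods=10, freq='D')
localized_rng = naive_rng.tz_localize('US/Mountain')

# Both routes should yield the same local-midnight timestamps.
print(localized_rng.equals(aware_rng))
print(naive_rng.tz is None, localized_rng.tz is not None)
```

The second route is handy when the data arrives without zone information and the zone is known only from context.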
It is also possible to construct time zone objects explicitly. This model can give you more control over which time zone is used in .tz_localize(), which also accepts these objects. The following creates two different pytz timezone objects and localizes a Timestamp to each using the .localize() method of the pytz objects:
In [64]:
   # show use of time zone objects
   # need to reference pytz
   import pytz
   # create an object for two different time zones
   mountain_tz = pytz.timezone("US/Mountain")
   eastern_tz = pytz.timezone("US/Eastern")
   # apply each to 'now'
   mountain_tz.localize(now), eastern_tz.localize(now)

Out[64]:
   (Timestamp('2015-03-06 11:07:51.750893-0700', tz='US/Mountain'),
    Timestamp('2015-03-06 11:07:51.750893-0500', tz='US/Eastern'))
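A time zone object can also be handed to .tz_localize() in place of a name. This sketch assumes pytz is installed, as in the example above, and uses a fixed timestamp so the result is reproducible:

```python
import pandas as pd
import pytz

mountain_tz = pytz.timezone('US/Mountain')
ts = pd.Timestamp('2015-03-06 11:07:51')

# Localize once by zone name and once by passing the pytz object.
by_name = ts.tz_localize('US/Mountain')
by_object = ts.tz_localize(mountain_tz)

print(by_object)             # 2015-03-06 11:07:51-07:00
print(by_name == by_object)  # True
```

Passing the object is useful when the zone has already been looked up or validated elsewhere in the program.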
Operations on multiple time-series objects will be aligned by the Timestamp values in their index, taking the time zone information into account. To demonstrate, we will use the following, which creates two Series objects using two DatetimeIndex objects, each with the same start, periods, and frequency but using different time zones:
In [65]:
   # create two Series, same start, same periods, same frequencies,
   # each with a different time zone
   s_mountain = Series(np.arange(0, 5),
                       index=pd.date_range('2014-08-01',
                                           periods=5, freq="H",
                                           tz='US/Mountain'))
   s_eastern = Series(np.arange(0, 5),
                      index=pd.date_range('2014-08-01',
                                          periods=5, freq="H",
                                          tz='US/Eastern'))
   s_mountain

Out[65]:
   2014-08-01 00:00:00-06:00    0
   2014-08-01 01:00:00-06:00    1
   2014-08-01 02:00:00-06:00    2
   2014-08-01 03:00:00-06:00    3
   2014-08-01 04:00:00-06:00    4
   Freq: H, dtype: int64

In [66]:
   s_eastern

Out[66]:
   2014-08-01 00:00:00-04:00    0
   2014-08-01 01:00:00-04:00    1
   2014-08-01 02:00:00-04:00    2
   2014-08-01 03:00:00-04:00    3
   2014-08-01 04:00:00-04:00    4
   Freq: H, dtype: int64
The following demonstrates the alignment of these two Series objects by time zone by adding the two together:
In [67]:
   # add the two Series
   # This only results in three items being aligned
   s_eastern + s_mountain

Out[67]:
   2014-08-01 04:00:00+00:00   NaN
   2014-08-01 05:00:00+00:00   NaN
   2014-08-01 06:00:00+00:00     2
   2014-08-01 07:00:00+00:00     4
   2014-08-01 08:00:00+00:00     6
   2014-08-01 09:00:00+00:00   NaN
   2014-08-01 10:00:00+00:00   NaN
   Freq: H, dtype: float64
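To see why only three items align, it can help to express both indexes in UTC and intersect them. The following is a sketch with illustrative names; it passes the hourly frequency as a Timedelta so that it does not depend on which spelling of the hourly alias the installed pandas accepts:

```python
import numpy as np
import pandas as pd

hourly = pd.Timedelta(hours=1)
idx_mountain = pd.date_range('2014-08-01', periods=5, freq=hourly,
                             tz='US/Mountain')
idx_eastern = pd.date_range('2014-08-01', periods=5, freq=hourly,
                            tz='US/Eastern')
s_mountain = pd.Series(np.arange(0, 5), index=idx_mountain)
s_eastern = pd.Series(np.arange(0, 5), index=idx_eastern)

# Instants present in both indexes once both are expressed in UTC.
overlap = idx_mountain.tz_convert('UTC').intersection(
    idx_eastern.tz_convert('UTC'))

total = s_eastern + s_mountain
print(len(overlap))            # 3
print(total.dropna().tolist()) # [2.0, 4.0, 6.0]
```

Midnight in US/Mountain is 06:00 UTC and midnight in US/Eastern is 04:00 UTC, so the two five-hour windows share only 06:00, 07:00, and 08:00 UTC.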
Once a time zone is assigned to an object, that object can be converted to another time zone using the .tz_convert() method:
In [68]:
   # convert s_eastern from US/Eastern to US/Pacific
   s_pacific = s_eastern.tz_convert("US/Pacific")
   s_pacific

Out[68]:
   2014-07-31 21:00:00-07:00    0
   2014-07-31 22:00:00-07:00    1
   2014-07-31 23:00:00-07:00    2
   2014-08-01 00:00:00-07:00    3
   2014-08-01 01:00:00-07:00    4
   Freq: H, dtype: int64
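Now if we add s_pacific to s_mountain, the alignment produces the same result as before, because converting between time zones changes how each instant is displayed but not the instant itself: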
In [69]:
   # this will be the same result as s_eastern + s_mountain
   # as the time zones still get aligned to be the same
   s_mountain + s_pacific

Out[69]:
   2014-08-01 04:00:00+00:00   NaN
   2014-08-01 05:00:00+00:00   NaN
   2014-08-01 06:00:00+00:00     2
   2014-08-01 07:00:00+00:00     4
   2014-08-01 08:00:00+00:00     6
   2014-08-01 09:00:00+00:00   NaN
   2014-08-01 10:00:00+00:00   NaN
   Freq: H, dtype: float64
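The equivalence can also be checked programmatically. This sketch rebuilds the Series with illustrative names and compares the two sums directly:

```python
import numpy as np
import pandas as pd

hourly = pd.Timedelta(hours=1)
s_mountain = pd.Series(np.arange(0, 5),
                       index=pd.date_range('2014-08-01', periods=5,
                                           freq=hourly, tz='US/Mountain'))
s_eastern = pd.Series(np.arange(0, 5),
                      index=pd.date_range('2014-08-01', periods=5,
                                          freq=hourly, tz='US/Eastern'))
s_pacific = s_eastern.tz_convert('US/Pacific')

# Converting changes the displayed wall-clock time, not the instants.
same_instants = s_pacific.index.tz_convert('UTC').equals(
    s_eastern.index.tz_convert('UTC'))

# So both additions align on the same UTC instants.
same_sum = (s_mountain + s_pacific).equals(s_eastern + s_mountain)
print(same_instants, same_sum)
```

Both checks should report True, since s_pacific and s_eastern hold the same values at the same moments in time.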