Normalizing timestamps using time zones

Time zone management can be one of the most complicated issues to deal with when working with time-series data. Data is often collected in different systems across the globe using local time, and at some point it will need to be coordinated with data collected in other time zones.

Fortunately, pandas provides rich support for working with timestamps in different time zones. Under the covers, pandas utilizes the pytz and dateutil libraries to manage the time zone operations. The dateutil support is new as of pandas 0.14.1 and currently only supported for fixed offset and tzfile zones. The default library used by pandas is pytz, with support for dateutil provided for compatibility with other applications.

Time zone-aware pandas objects expose their time zone through the .tz property. By default, however, pandas objects are time zone-naive: for efficiency, they do not carry a time zone object. The following gets the current time and shows that it has no time zone information by default:

In [58]:
   import pandas as pd
   # get the current local time and demonstrate there is no
   # timezone info by default
   now = pd.Timestamp('now')
   now, now.tz is None

Out[58]:
   (Timestamp('2015-03-06 11:07:51.687326'), True)

Note

This demonstrates that pandas returns Timestamp("now") as the current local wall-clock time, but without any time zone data attached. This is a reasonable default, but be aware of it. In general, I find that if you are collecting time-based data that will be stored for later access, or combining data from multiple sources, it is best to always normalize it to UTC.

Likewise, DatetimeIndex and its Timestamp objects will not have associated time zone information by default:

In [59]:
   # default DatetimeIndex and its Timestamps do not have
   # time zone information
   rng = pd.date_range('3/6/2012 00:00', periods=15, freq='D')
   rng.tz is None, rng[0].tz is None

Out[59]:
   (True, True)
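Putting the advice from the note above into practice, here is a minimal sketch of the localize-then-convert pattern. It assumes the hypothetical readings below were recorded as naive wall-clock times on a system running on US/Eastern time. The values and index are illustrative, not from the original example.

```python
import pandas as pd

# Hypothetical sensor readings stored as naive local wall-clock times;
# we assume the recording system ran on US/Eastern time
local_index = pd.DatetimeIndex(["2015-03-06 09:00", "2015-03-06 10:00"])
readings = pd.Series([1.5, 2.0], index=local_index)

# Step 1: attach the zone the times were actually recorded in
localized = readings.tz_localize("US/Eastern")

# Step 2: convert to UTC for storage; the instants do not change,
# only their representation moves to UTC (09:00 EST is 14:00 UTC)
stored = localized.tz_convert("UTC")
print(stored)
```

Localizing to the true source zone first matters: calling tz_localize("UTC") directly would label 09:00 as UTC and silently shift every reading by five hours.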

A list of common time zone names can be retrieved as shown in the following example. If you do a lot with time zone data, these will become very familiar:

In [60]:
   # import common timezones from pytz
   from pytz import common_timezones
   # report the first 5
   common_timezones[:5]

Out[60]:
   ['Africa/Abidjan',
    'Africa/Accra',
    'Africa/Addis_Ababa',
    'Africa/Algiers',
    'Africa/Asmara']
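Before passing a name to pandas, it can help to check that pytz recognizes it. The sketch below uses pytz.all_timezones, of which common_timezones is a subset, and catches pytz.UnknownTimeZoneError. The helper is_valid_zone is hypothetical and not part of pytz.

```python
import pytz

# every zone name pytz knows about; common_timezones is a curated subset
print(len(pytz.all_timezones), len(pytz.common_timezones))

def is_valid_zone(name):
    """Hypothetical helper: True if pytz can build a zone object for name."""
    try:
        pytz.timezone(name)
        return True
    except pytz.UnknownTimeZoneError:
        return False

print(is_valid_zone("US/Mountain"))   # True
print(is_valid_zone("Mars/Olympus"))  # False
```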

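A naive Timestamp can be localized to UTC by calling its .tz_localize() method and passing 'UTC'. Localizing attaches the time zone but does not shift the clock time, so the localized value shows the same wall-clock time with a +0000 offset: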

In [61]:
   # get now, and now localized to UTC
   now = pd.Timestamp("now")
   local_now = now.tz_localize('UTC')
   now, local_now

Out[61]:
   (Timestamp('2015-03-06 11:07:51.750893'),
    Timestamp('2015-03-06 11:07:51.750893+0000', tz='UTC'))
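Because .tz_localize() only labels a time, it is worth contrasting it with .tz_convert(), which moves an already-localized time to another zone. This is a small sketch that assumes, for illustration, that the naive value was actually recorded on Mountain time. It also shows Timestamp.now(tz=...), which returns the true current UTC time regardless of the machine's local zone.

```python
import pandas as pd

naive = pd.Timestamp("2015-03-06 11:07:51")

# tz_localize attaches a zone; the wall-clock time stays 11:07:51
as_utc = naive.tz_localize("UTC")

# if the naive value was really Mountain time (MST, UTC-7 on this date),
# localize to that zone first, then tz_convert shifts it to the same instant in UTC
mountain = naive.tz_localize("US/Mountain")
converted = mountain.tz_convert("UTC")

print(as_utc)     # 2015-03-06 11:07:51+00:00
print(converted)  # 2015-03-06 18:07:51+00:00

# the actual current time in UTC
print(pd.Timestamp.now(tz="UTC"))
```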

Any Timestamp can be localized to a specific time zone by passing the time zone name to .tz_localize():

In [62]:
   # localize a timestamp to US/Mountain time zone
   tstamp = pd.Timestamp('2014-08-01 12:00:00').tz_localize('US/Mountain')
   tstamp

Out[62]:
   Timestamp('2014-08-01 12:00:00-0600', tz='US/Mountain')
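Since localizing attaches a zone to a wall-clock reading, the same reading localized to two zones names two different instants. A quick check using the August date above: both values are converted to UTC before subtracting so the comparison is unambiguous.

```python
import pandas as pd

wall_time = pd.Timestamp("2014-08-01 12:00:00")

mountain = wall_time.tz_localize("US/Mountain")  # MDT, UTC-6 in August
eastern = wall_time.tz_localize("US/Eastern")    # EDT, UTC-4 in August

# noon in the east occurs two hours before noon in the mountains
gap = mountain.tz_convert("UTC") - eastern.tz_convert("UTC")
print(gap)  # 0 days 02:00:00
```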

DatetimeIndex can be created with a specific time zone using the tz parameter of the pd.date_range() function:

In [63]:
   # create a DatetimeIndex using a time zone
   rng = pd.date_range('3/6/2012 00:00:00',
                       periods=10, freq='D', tz='US/Mountain')
   rng.tz, rng[0].tz

Out[63]:
   (<DstTzInfo 'US/Mountain' LMT-1 day, 17:00:00 STD>,
    <DstTzInfo 'US/Mountain' MST-1 day, 17:00:00 STD>)
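An existing naive DatetimeIndex can also be localized after the fact with its .tz_localize() method. This sketch compares the two approaches on a short three-day range, chosen so that it does not cross a daylight saving change.

```python
import pandas as pd

# build a naive index first, then attach the zone afterwards
naive_rng = pd.date_range("3/6/2012 00:00:00", periods=3, freq="D")
localized_rng = naive_rng.tz_localize("US/Mountain")

# the same index created directly with the tz parameter
direct_rng = pd.date_range("3/6/2012 00:00:00", periods=3, freq="D",
                           tz="US/Mountain")

print(localized_rng.equals(direct_rng))  # True
```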

It is also possible to construct time zone objects explicitly with pytz.timezone(). Working with these objects directly gives you more control over how a time zone is applied. The following creates two different time zone objects and uses each one's .localize() method to localize the same Timestamp:

In [64]:
   # show use of time zone objects
   # need to reference pytz
   import pytz
   # create an object for two different time zones
   mountain_tz = pytz.timezone("US/Mountain")
   eastern_tz = pytz.timezone("US/Eastern")
   # apply each to 'now'
   mountain_tz.localize(now), eastern_tz.localize(now)

Out[64]:
   (Timestamp('2015-03-06 11:07:51.750893-0700', tz='US/Mountain'),
    Timestamp('2015-03-06 11:07:51.750893-0500', tz='US/Eastern'))
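One pytz-specific caution: its zone objects should be attached with .localize(), as above, rather than passed as tzinfo= to a datetime (directly or through .replace()). In the latter case the zone object's default entry is used, which for these zones is Local Mean Time (LMT), the same entry that appears in the repr of rng.tz in Out[63]. A short sketch with plain datetime objects:

```python
from datetime import datetime, timedelta
import pytz

eastern_tz = pytz.timezone("US/Eastern")
naive = datetime(2015, 3, 6, 11, 7, 51)

# Correct: localize() looks up the offset in effect on that date (EST, UTC-5)
good = eastern_tz.localize(naive)

# Pitfall: tzinfo= uses the zone object's default LMT entry
# (roughly UTC-4:56 for New York), not EST
bad = naive.replace(tzinfo=eastern_tz)

print(good.utcoffset() == timedelta(hours=-5))  # True
print(bad.utcoffset() == timedelta(hours=-5))   # False
```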

Operations on multiple time-series objects align on the Timestamps in their indexes, taking time zone information into account. To demonstrate, the following creates two Series objects, each indexed by a DatetimeIndex with the same start, number of periods, and frequency, but in a different time zone:

In [65]:
   import numpy as np
   # create two Series, same start, same periods, same frequencies,
   # each with a different time zone
   s_mountain = pd.Series(np.arange(0, 5),
                          index=pd.date_range('2014-08-01',
                                              periods=5, freq="H",
                                              tz='US/Mountain'))
   s_eastern = pd.Series(np.arange(0, 5),
                         index=pd.date_range('2014-08-01',
                                             periods=5, freq="H",
                                             tz='US/Eastern'))
   s_mountain

Out[65]:
   2014-08-01 00:00:00-06:00    0
   2014-08-01 01:00:00-06:00    1
   2014-08-01 02:00:00-06:00    2
   2014-08-01 03:00:00-06:00    3
   2014-08-01 04:00:00-06:00    4
   Freq: H, dtype: int64

In [66]:
   s_eastern

Out[66]:
   2014-08-01 00:00:00-04:00    0
   2014-08-01 01:00:00-04:00    1
   2014-08-01 02:00:00-04:00    2
   2014-08-01 03:00:00-04:00    3
   2014-08-01 04:00:00-04:00    4
   Freq: H, dtype: int64

Adding the two Series together shows how they are aligned across time zones. Note that the result is indexed in UTC:

In [67]:
   # add the two Series
   # This only results in three items being aligned
   s_eastern + s_mountain

Out[67]:
   2014-08-01 04:00:00+00:00   NaN
   2014-08-01 05:00:00+00:00   NaN
   2014-08-01 06:00:00+00:00     2
   2014-08-01 07:00:00+00:00     4
   2014-08-01 08:00:00+00:00     6
   2014-08-01 09:00:00+00:00   NaN
   2014-08-01 10:00:00+00:00   NaN
   Freq: H, dtype: float64
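The alignment can be checked by hand. This sketch rebuilds the two Series using freq="60min", an equivalent spelling of the hourly frequency. It converts both indexes to UTC and intersects them, which confirms that only three instants are shared and that those are the only non-NaN sums.

```python
import numpy as np
import pandas as pd

idx_kwargs = dict(start="2014-08-01", periods=5, freq="60min")
s_mountain = pd.Series(np.arange(0, 5),
                       index=pd.date_range(tz="US/Mountain", **idx_kwargs))
s_eastern = pd.Series(np.arange(0, 5),
                      index=pd.date_range(tz="US/Eastern", **idx_kwargs))

# instants common to both Series once expressed in UTC
shared = s_eastern.tz_convert("UTC").index.intersection(
    s_mountain.tz_convert("UTC").index)
print(len(shared))  # 3

# the sum has a value only where the instants overlap
total = s_eastern + s_mountain
print(total.dropna().tolist())  # [2.0, 4.0, 6.0]
```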

Once a time zone is assigned to an object, that object can be converted to another time zone using the .tz_convert() method:

In [68]:
   # convert s_eastern from US/Eastern to US/Pacific
   s_pacific = s_eastern.tz_convert("US/Pacific")
   s_pacific

Out[68]:
   2014-07-31 21:00:00-07:00    0
   2014-07-31 22:00:00-07:00    1
   2014-07-31 23:00:00-07:00    2
   2014-08-01 00:00:00-07:00    3
   2014-08-01 01:00:00-07:00    4
   Freq: H, dtype: int64
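Note that .tz_convert() requires data that already has a time zone. On naive data it raises a TypeError, so naive data must be localized first. This is a minimal sketch with an illustrative naive Series. It also shows that tz_convert(None) converts to UTC and then drops the zone.

```python
import pandas as pd

idx = pd.date_range("2014-08-01", periods=3, freq="60min")
s_naive = pd.Series([0, 1, 2], index=idx)

# tz_convert on tz-naive data fails
try:
    s_naive.tz_convert("US/Pacific")
except TypeError as exc:
    print("TypeError:", exc)

# localize to the source zone first, then convert
s_pacific = s_naive.tz_localize("US/Eastern").tz_convert("US/Pacific")
print(s_pacific.index[0])  # 2014-07-31 21:00:00-07:00

# converting to None yields naive UTC wall-clock times
print(s_pacific.tz_convert(None).index[0])  # 2014-08-01 04:00:00
```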

Adding s_pacific to s_mountain produces the same result, because the alignment is still performed on the same underlying instants in time:

In [69]:
   # this will be the same result as s_eastern + s_mountain
   # as the time zones still get aligned to be the same
   s_mountain + s_pacific

Out[69]:
   2014-08-01 04:00:00+00:00   NaN
   2014-08-01 05:00:00+00:00   NaN
   2014-08-01 06:00:00+00:00     2
   2014-08-01 07:00:00+00:00     4
   2014-08-01 08:00:00+00:00     6
   2014-08-01 09:00:00+00:00   NaN
   2014-08-01 10:00:00+00:00   NaN
   Freq: H, dtype: float64