© Thomas Mailund 2019
Thomas Mailund, R Data Science Quick Reference, https://doi.org/10.1007/978-1-4842-4894-2_10

10. Working with Dates: lubridate

Thomas Mailund, Aarhus, Denmark

The lubridate package is essential for working with dates and fits well with the Tidyverse. In versions of the tidyverse package before 2.0.0, however, it is not loaded when you import tidyverse, so you need to load it explicitly. Loading it explicitly is harmless in later versions.
library(lubridate)

Time Points

You can create dates and dates with time-of-day information using variations of the ymd() function. The letters y, m, and d stand for year, month, and day, respectively. With ymd() you should write your data in a format that puts the year first, the month second, and the day last. The function is very flexible in what it can parse as a date.
ymd("1975 Feb 15")
## [1] "1975-02-15"
ymd("19750215")
## [1] "1975-02-15"
ymd("1975/2/15")
## [1] "1975-02-15"
ymd("1975-02-15")
## [1] "1975-02-15"
You can permute the y, m, and d letters if the order of year, month, and day is different. Each permutation gives you a parser that will interpret its input in the specified order.
dmy("150275")
## [1] "1975-02-15"
mdy("February 15th 1975")
## [1] "1975-02-15"
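The parsers are vectorized, and a string that cannot be read in the requested order becomes NA. The following is a minimal sketch, using numeric months so that the result does not depend on the locale; the variable names mixed and bad are only illustrative.

```r
library(lubridate)

# Differently formatted strings in one vector, all in year-month-day order
mixed <- ymd(c("1975-02-15", "1975/2/16", "19750217"))
mixed

# An unparsable string yields NA; quiet = TRUE suppresses the warning
bad <- ymd("not a date", quiet = TRUE)
is.na(bad)
```

Without quiet = TRUE, the failed parse also triggers a warning, which is usually what you want when cleaning data.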
If you want to add a time of the day to your date, you can add an hour, an hour and a minute, or an hour, a minute, and a second by using the _h(), _hm(), and _hms() variants of the ymd() functions.
dmy_h("15/2/1975 2pm")
## [1] "1975-02-15 14:00:00 UTC"
dmy_hm("15/2/1975 14:30")
## [1] "1975-02-15 14:30:00 UTC"
dmy_hms("15/2/1975 14:30:10")
## [1] "1975-02-15 14:30:10 UTC"
If you have a time object
x <- dmy_hms("15/2/1975 14:30:10")
then you can extract its components through dedicated functions:
c(day(x), month(x), year(x))
## [1]   15    2 1975
c(hour(x), minute(x), second(x))
## [1] 14 30 10
c(week(x), # The week in the year
  wday(x), # The day in the week
  yday(x)) # The day in the year
## [1] 7 7 46
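The value of wday() depends on which day you consider the first of the week. By default, lubridate counts from Sunday, which is why the Saturday above is day 7. A small sketch of the week_start argument, with sat as an illustrative variable name:

```r
library(lubridate)

sat <- ymd("1975-02-15")   # a Saturday

wday(sat)                  # 7: weeks start on Sunday by default
wday(sat, week_start = 1)  # 6: counting from Monday instead
```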
These functions have corresponding assignment functions that you can use to modify the components of the time point.
minute(x) <- 15
wday(x) <- 42
x
## [1] "1975-03-22 14:15:10 UTC"
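Out-of-range values are not rejected. As the output above suggests, they roll over into the following weeks. The sketch below spells out the arithmetic behind wday(x) <- 42, using the illustrative names t0 and t1 so that x is left alone.

```r
library(lubridate)

t0 <- dmy_hms("15/2/1975 14:30:10")  # a Saturday, so wday(t0) is 7
t1 <- t0
wday(t1) <- 42                       # 42 - 7 = 35 days forward

# Setting the weekday to 42 is the same as adding 35 days
t1 == t0 + days(35)
```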

Time Zones

When you add a time of day, you also need a time zone. After all, we do not know what moment a given hour refers to until we know which time zone it is in. If I tell you that I am going to call you at two o’clock, you can’t assume that it is two o’clock in your time zone. Unless you tell the functions otherwise, they assume that the time zone is UTC. You can specify another time zone via the tz argument.
x <- dmy_hm(
  "15/2/1975 14:00",
  tz = "Europe/Copenhagen"
)
x
## [1] "1975-02-15 14:00:00 CET"
You can take a time point in one time zone and move it to another in two different ways. First, you can set the time zone and pretend that the day and time of day were already in this time zone. That is, you change only the time zone attribute of the object and leave the displayed date and time of day untouched. You can do this using the function force_tz().
force_tz(
  dmy_hm("15/2/1975 14:00",
          tz = "Europe/Copenhagen"),
  tz = "Europe/London"
)
## [1] "1975-02-15 14:00:00 GMT"
A much more likely situation is that you want to know what a time point in one time zone corresponds to in another time zone. For example, if I promise to call you at two o’clock in Denmark, and you are in the UK, you can translate the time from my time zone to yours using with_tz().
with_tz(x, tz = "Europe/London")
## [1] "1975-02-15 13:00:00 GMT"
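One way to see the difference between the two functions is to compare each result with the original time point. This is a minimal sketch; the names call_time, moved, and forced are illustrative.

```r
library(lubridate)

call_time <- dmy_hm("15/2/1975 14:00", tz = "Europe/Copenhagen")

moved  <- with_tz(call_time, tz = "Europe/London")   # same instant, new clock
forced <- force_tz(call_time, tz = "Europe/London")  # same clock, new instant

# with_tz() only changes how the instant is displayed
moved == call_time

# force_tz() yields a different instant, one hour later in this case
difftime(forced, call_time, units = "hours")
```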

Time Intervals

If you have two time points, you also have a time interval: the time between the two points. You can create an interval object from two time points using the interval() function.
start <- dmy("02 11 1949")
end <- dmy("15 02 1975")
interval(start, end)
## [1] 1949-11-02 UTC--1975-02-15 UTC
The infix operator %--% does the same thing.
start %--% end
## [1] 1949-11-02 UTC--1975-02-15 UTC
You can get the start and end points of an interval using int_start() and int_end().
int <- interval(start, end)
int
## [1] 1949-11-02 UTC--1975-02-15 UTC
int_start(int)
## [1] "1949-11-02 UTC"
int_end(int)
## [1] "1975-02-15 UTC"
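A common use of an interval is to ask how many calendar units it spans. The sketch below uses two lubridate features that this section does not otherwise cover, integer division of an interval by a period and the time_length() function; the variable names are illustrative.

```r
library(lubridate)

first_day <- ymd("1949-11-02")
last_day  <- ymd("1975-02-15")
span      <- first_day %--% last_day

# Number of whole years that fit in the interval
whole_years <- span %/% years(1)
whole_years

# Length of the interval in (fractional) years
frac_years <- time_length(span, "year")
frac_years
```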
The start point does not have to be before the endpoint. You can define an interval that starts after it ends.
end %--% start
## [1] 1975-02-15 UTC--1949-11-02 UTC
int_start(start %--% end)
## [1] "1949-11-02 UTC"
int_start(end %--% start)
## [1] "1975-02-15 UTC"
You can flip an interval using int_flip().
int_flip(end %--% start)
## [1] 1949-11-02 UTC--1975-02-15 UTC
The function int_standardize() will flip the interval if the start point comes after the endpoint but otherwise will leave the interval as it is.
int_standardize(start %--% end)
## [1] 1949-11-02 UTC--1975-02-15 UTC
int_standardize(end %--% start)
## [1] 1949-11-02 UTC--1975-02-15 UTC
The int_length() function gives you the length of an interval in seconds.
x <- now()
int <- interval(x, x + minutes(1))
int
## [1] 2019-06-18 09:28:32 CEST--2019-06-18 09:29:32 CEST
int_length(int)
## [1] 60
int <- interval(x, x + minutes(20))
int
## [1] 2019-06-18 09:28:32 CEST--2019-06-18 09:48:32 CEST
int_length(int) / 60
## [1] 20
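Because now() changes on every run, a fixed time point makes the result reproducible. The sketch below also shows that, as far as I can tell, an interval that runs backward has a negative length; t0 and span are illustrative names.

```r
library(lubridate)

t0   <- ymd_hms("2019-06-18 09:28:32", tz = "UTC")
span <- interval(t0, t0 + minutes(20))

int_length(span)            # 1200 seconds
int_length(int_flip(span))  # -1200: the flipped interval runs backward
```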
You can check if a point is in an interval using the %within% operator. Here, int is still the 20-minute interval from 2019 created above, so both tests return FALSE.
ymd("1867 05 02") %within% int
## [1] FALSE
ymd("1959 04 23") %within% int
## [1] FALSE
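For contrast, here is a small sketch with an interval that does contain one of the two dates. The name lifetime is illustrative and the interval is rebuilt from scratch so that int is not modified.

```r
library(lubridate)

lifetime <- ymd("1949-11-02") %--% ymd("1975-02-15")

ymd("1867-05-02") %within% lifetime  # FALSE: before the interval starts
ymd("1959-04-23") %within% lifetime  # TRUE: between the two endpoints
```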
You can update the start and end point in an interval by assigning to int_start() or int_end():
int_start(int) <- dmy("19 Aug 1950")
int
## [1] 1950-08-19 01:00:00 CET--2019-06-18 09:48:32 CEST
int_end(int) <- dmy("19 Sep 1950")
int
## [1] 1950-08-19 01:00:00 CET--1950-09-19 01:00:00 CET
You can move the entire interval by a fixed amount with int_shift(). For example, you can move the interval one month forward using
int_shift(int, months(1))
## [1] 1950-09-19 01:00:00 CET--1950-10-19 01:00:00 CET
Given two intervals, the int_overlaps() function checks if they overlap.
int1 <- interval(dmy("19 oct 1950"), dmy("25 nov 1951"))
int2 <- interval(dmy("19 oct 1948"), dmy("25 aug 1951"))
int3 <- interval(dmy("19 oct 1981"), dmy("25 aug 2051"))
c(int, int1)
## [1] 1950-08-19 01:00:00 CET--1950-09-19 01:00:00 CET
## [2] 1950-10-19 01:00:00 CET--1951-11-25 01:00:00 CET
# int1 starts after int ends, so they do not overlap
int_overlaps(int, int1)
## [1] FALSE
# int2 starts before int but they overlap
int_overlaps(int, int2)
## [1] TRUE
# no overlap
int_overlaps(int, int3)
## [1] FALSE
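The three checks above can also be done in one call, since intervals can be stored in a vector. The sketch below rebuilds the intervals in UTC under the illustrative names base_int and others, and uses lubridate's intersect() method for intervals, which this section does not otherwise cover, to extract the shared part.

```r
library(lubridate)

base_int <- interval(ymd("1950-08-19"), ymd("1950-09-19"))
others <- interval(
  ymd(c("1950-10-19", "1948-10-19", "1981-10-19")),
  ymd(c("1951-11-25", "1951-08-25", "2051-08-25"))
)

# One test per element of others
overlaps <- int_overlaps(base_int, others)
overlaps

# The part of the time line that two overlapping intervals share
shared <- intersect(base_int, others[2])
shared
```

Since the second interval in others contains base_int entirely, the shared part here should be base_int itself.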

The function int_aligns() checks whether two intervals share a start point or an end point. According to the lubridate documentation, the direction of each interval is ignored, and the test is whether the earliest moments or the latest moments of the two intervals coincide.

The interval int does not share endpoints with any of int1, int2, or int3.
c(
  int_aligns(int, int1),
  int_aligns(int, int2),
  int_aligns(int, int3)
)
## [1] FALSE FALSE FALSE
Now create intervals that share endpoints with int and test int_aligns() again:
int4 <- interval(int_start(int), int_end(int) + years(3))
int5 <- int_shift(int4, -years(3))
int6 <- int_shift(int5, -years(3))
c(
  int_aligns(int, int4), # share start
  int_aligns(int, int5), # share end
  int_aligns(int, int6) # neither overlaps nor shares endpoints
)
## [1] TRUE TRUE FALSE