© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
T. Mailund, R 4 Data Science Quick Reference, https://doi.org/10.1007/978-1-4842-8780-4_11

11. Working with Dates: lubridate

Thomas Mailund, Aarhus, Denmark

The lubridate package is essential for working with dates and fits well with the Tidyverse. In versions of the tidyverse package before 2.0.0, however, it is not loaded when you import tidyverse, so you need to load it explicitly. Loading it explicitly does no harm in later versions either.
library(lubridate)

Time Points

You can create dates, and dates with time-of-day information, using variations of the ymd() function. The letters y, m, and d stand for year, month, and day, respectively. With ymd(), you should write your data in a format that puts the year first, the month second, and the day last. The function is very flexible in what it can parse as a date.
ymd("1975 Feb 15")
## [1] "1975-02-15"
ymd("19750215")
## [1] "1975-02-15"
ymd("1975/2/15")
## [1] "1975-02-15"
ymd("1975-02-15")
## [1] "1975-02-15"
You can permute the y, m, and d letters if the order of year, month, and day is different. Each permutation gives you a parser that will interpret its input in the specified order.
dmy("150275")
## [1] "1975-02-15"
mdy("February 15th 1975")
## [1] "1975-02-15"
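The choice of parser matters most when a string is ambiguous. As a small illustration (the input string and the variable names are made up for this example), the same text gives two different dates depending on the order you ask for:

```r
library(lubridate)

# The same string, read as day-month-year and as month-day-year
as_dmy <- dmy("01/02/1975")
as_mdy <- mdy("01/02/1975")

as_dmy
## [1] "1975-02-01"
as_mdy
## [1] "1975-01-02"
```

Neither result is wrong as far as the parser is concerned, so you have to know which convention your data uses.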
If you want to add a time of day to your date, you can add an hour, an hour and a minute, or an hour, a minute, and a second by using the _h(), _hm(), and _hms() variants of the ymd() functions.
dmy_h("15/2/1975 2pm")
## [1] "1975-02-15 14:00:00 UTC"
dmy_hm("15/2/1975 14:30")
## [1] "1975-02-15 14:30:00 UTC"
dmy_hms("15/2/1975 14:30:10")
## [1] "1975-02-15 14:30:10 UTC"
If you have a time object
x <- dmy_hms("15/2/1975 14:30:10")
then you can extract its components through dedicated functions:
c(day(x), month(x), year(x))
## [1]   15    2 1975
c(hour(x), minute(x), second(x))
## [1] 14 30 10
c(week(x), # The week in the year
  wday(x), # The day in the week
  yday(x)) # The day in the year
## [1] 7 7 46
These functions have corresponding assignment functions that you can use to modify the components of the time point.
minute(x) <- 15
wday(x) <- 42
x
## [1] "1975-03-22 14:15:10 UTC"
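The new date may look surprising, since 42 is not a valid day of the week. Judging from the output above, a value outside the usual range rolls over: the original day of the week was 7, so assigning 42 moves the time point 35 days forward. A minimal sketch of that reading, where t0, t1, and t2 are names introduced only for this example:

```r
library(lubridate)

t0 <- dmy_hms("15/2/1975 14:30:10")

# Assign an out-of-range weekday to a copy of the time point
t1 <- t0
wday(t1) <- 42

# The same time point, computed by adding the 35 days explicitly
t2 <- t0 + days(42 - wday(t0))

t1 == t2
## [1] TRUE
```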

Time Zones

When you add a time of day, a time zone is also necessary. After all, we do not know what time a given hour is before we know which time zone we are in. If I tell you that I am going to call you at two o’clock, you can’t assume that it is two o’clock in your time zone. Unless you tell the functions otherwise, they will assume UTC is the time zone. You can specify another time zone via the tz argument.
dmy_hm(
  "15/2/1975 14:00",
  tz = "Europe/Copenhagen"
)
## [1] "1975-02-15 14:00:00 CET"
You can take a time point in one time zone and move it to another in two different ways. The first is to set the time zone and pretend that the day and time of day were already in this time zone. That is, you change the time zone attribute of the object and do not touch the time information. You can do this using the function force_tz().
force_tz(
  # This is a date/time in CET
  dmy_hm("15/2/1975 14:00", tz = "Europe/Copenhagen"),
  # It will be moved to GMT
  tz = "Europe/London"
)
## [1] "1975-02-15 14:00:00 GMT"

Even though CET and GMT are different time zones, the force_tz() function keeps the hour at 14:00 and simply updates the time zone.

A much more likely situation is that you want to know what a time point in one time zone corresponds to in another time zone. For example, if I promise to call you at two o’clock in Denmark, and you are in the UK, you can translate the time from my time zone to yours using with_tz().
with_tz(
  # This is a date/time in CET
  dmy_hm("15/2/1975 14:00", tz = "Europe/Copenhagen"),
  # This moves it to the same time but in GMT
  tz = "Europe/London"
)
## [1] "1975-02-15 13:00:00 GMT"

Copenhagen is one hour ahead of London, so when we move the time 14:00 from Copenhagen to GMT, we get the hour 13:00.
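One way to see the difference between the two functions is to compare the results with the original time point. The following is a minimal sketch, with cph, moved, and forced as names chosen for this example: with_tz() gives the same instant shown on a different clock, while force_tz() gives a different instant.

```r
library(lubridate)

cph <- dmy_hm("15/2/1975 14:00", tz = "Europe/Copenhagen")

moved  <- with_tz(cph, tz = "Europe/London")   # same instant, new clock
forced <- force_tz(cph, tz = "Europe/London")  # same clock, new instant

moved == cph
## [1] TRUE
forced == cph
## [1] FALSE

# The forced time point is one hour later than the original
as.numeric(difftime(forced, cph, units = "hours"))
## [1] 1
```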

Time Intervals

If you have two time points, you also have a time interval: the time between the two points. You can create an interval object from two time points using the interval() function.
start <- dmy("02 11 1949")
end <- dmy("15 02 1975")
interval(start, end)
## [1] 1949-11-02 UTC--1975-02-15 UTC
The infix operator %--% does the same thing.
start %--% end
## [1] 1949-11-02 UTC--1975-02-15 UTC
You can get the start and end points of an interval using int_start() and int_end().
int <- interval(start, end)
int
## [1] 1949-11-02 UTC--1975-02-15 UTC
int_start(int)
## [1] "1949-11-02 UTC"
int_end(int)
## [1] "1975-02-15 UTC"
The start point does not have to be before the end point. You can define an interval that starts after it ends.
end %--% start
## [1] 1975-02-15 UTC--1949-11-02 UTC
int_start(start %--% end)
## [1] "1949-11-02 UTC"
int_start(end %--% start)
## [1] "1975-02-15 UTC"
You can flip an interval using int_flip().
int_flip(end %--% start)
## [1] 1949-11-02 UTC--1975-02-15 UTC
The function int_standardize() will flip the interval if the start point comes after the end point but otherwise will leave the interval as it is.
int_standardize(start %--% end)
## [1] 1949-11-02 UTC--1975-02-15 UTC
int_standardize(end %--% start)
## [1] 1949-11-02 UTC--1975-02-15 UTC
The int_length() function gives you the length of an interval in seconds.
x <- now()
int <- interval(x, x + minutes(1)) # from now and one minute forward
int_length(int) # the length is one minute, so 60 seconds
## [1] 60
int <- interval(x, x + minutes(20)) # now and 20 minutes forward
int_length(int) / 60 # Dividing by 60 to get the length in minutes
## [1] 20
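The same division works for longer intervals, and the time_length() function in lubridate can do the unit conversion for you. A small sketch using the two dates from earlier in this section; span, n_days, and n_years are names introduced only for this example:

```r
library(lubridate)

span <- interval(dmy("02 11 1949"), dmy("15 02 1975"))

# Seconds divided by the number of seconds in a day
n_days <- int_length(span) / (24 * 60 * 60)
n_days
## [1] 9236

# time_length() converts to a named unit directly
time_length(span, unit = "day")
## [1] 9236

# A little over 25 years
n_years <- time_length(span, unit = "year")
```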
You can check if a point is in an interval using the %within% operator.
ymd("1867 05 02") %within% int
## [1] FALSE
ymd("1959 04 23") %within% int
## [1] FALSE
x %within% int # start point is inside the interval
## [1] TRUE
(x + minutes(20)) %within% int # end point is inside the interval
## [1] TRUE
You can update the start and end points in an interval by assigning to int_start() or int_end():
int_start(int) <- dmy("19 Aug 1950")
int
## [1] 1950-08-19 01:00:00 CET--2022-08-23 11:50:29 CEST
int_end(int) <- dmy("19 Sep 1950")
int
## [1] 1950-08-19 01:00:00 CET--1950-09-19 01:00:00 CET
You can move the entire interval by a fixed amount with int_shift(). For example, you can move the interval one month forward using
int_shift(int, months(1))
## [1] 1950-09-19 01:00:00 CET--1950-10-19 01:00:00 CET
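Because months differ in length, shifting by a period such as months(1) can change the length of the interval in seconds. The following is a minimal sketch, assuming int_shift() moves both end points by the given amount, which is what the output above suggests. The names aug_sep and shifted are only for this example, and the dates are in UTC.

```r
library(lubridate)

aug_sep <- interval(dmy("19 Aug 1950"), dmy("19 Sep 1950"))
shifted <- int_shift(aug_sep, months(1))

# Length in days before and after the shift
int_length(aug_sep) / (24 * 60 * 60)
## [1] 31
int_length(shifted) / (24 * 60 * 60)
## [1] 30
```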
Given two intervals, the int_overlaps() function checks if they overlap.
int1 <- interval(dmy("19 Aug 1950"), dmy("19 Sep 1950"))
int2 <- interval(dmy("19 oct 1950"), dmy("25 nov 1951"))
int3 <- interval(dmy("19 oct 1948"), dmy("25 aug 1951"))
int4 <- interval(dmy("19 oct 1981"), dmy("25 aug 2051"))
# int1 ends before int2 starts
int_overlaps(int1, int2)
## [1] FALSE
# int3 starts before int1 but they overlap
int_overlaps(int1, int3)
## [1] TRUE
# no overlap, int4 is far in the future compared to int1
int_overlaps(int1, int4)
## [1] FALSE

The function int_aligns() checks if two intervals share an end point. That is, the two intervals must either start at the same time point or end at the same time point.

The four intervals we have created earlier do not have shared interval end points.
c(
  int_aligns(int1, int2),
  int_aligns(int1, int3),
  int_aligns(int1, int4)
)
## [1] FALSE FALSE FALSE
We can create intervals that do share end points and test int_aligns():
int5 <- interval(int_start(int1), int_end(int1) + years(3))
int6 <- int_shift(int5, -years(3))
int7 <- int_shift(int6, -years(3))
c(
  int_aligns(int1, int5), # share start
  int_aligns(int1, int6), # share end
  int_aligns(int1, int7) # ends before int1 starts; no shared end points
)
## [1] TRUE TRUE FALSE