Time series or trend charts are the most common form of line graphs. R offers many ways to plot such data, but the dates must first be converted into a form that R recognizes as dates. In this recipe, we will look at some ways of formatting time series data using base R and some additional packages.
In addition to the basic R functions, we will also be using the zoo package in this recipe. So, first we need to install it:
install.packages("zoo")
Let's use the dailysales.csv example dataset and format its date column:
sales <- read.csv("dailysales.csv")
d1 <- as.Date(sales$date, "%d/%m/%y")
d2 <- strptime(sales$date, "%d/%m/%y")
data.class(d1)
[1] "Date"
data.class(d2)
[1] "POSIXt"
We have seen two different functions that convert a character vector into dates. If we do not convert the date column, R will not automatically recognize its values as dates. Instead, the column will be treated as a character vector or a factor.
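As a small illustration of this point, the following sketch reads a tiny inline CSV and checks how R stores the date column before and after conversion. The column names date and units follow the recipe, but the values are invented stand-ins for dailysales.csv, whose contents are not shown here.

```r
# Hypothetical stand-in for dailysales.csv (values invented for illustration)
csv_text <- "date,units\n01/01/10,5064\n02/01/10,6115\n03/01/10,5305"

# stringsAsFactors is set explicitly because its default differs across R versions
sales_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(sales_demo$date)   # "character": R has not recognized the dates

# After conversion the column is a proper Date vector
sales_demo$date <- as.Date(sales_demo$date, "%d/%m/%y")
class(sales_demo$date)   # "Date"
```

With stringsAsFactors = TRUE, the unconverted column would be a factor instead of a character vector.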
The as.Date() function is called here with two arguments: the character vector to be converted to dates and the format in which those dates are written. It returns an object of the Date class, represented as the number of days since 1970-01-01, with negative values for earlier dates. The values in the date column are in day/month/year order (you can verify this by typing in sales$date at the R prompt). So, we specify the format argument as "%d/%m/%y". Note that %y matches a two-digit year; if the years are written with four digits, use %Y instead. The order is also important. If we instead use "%m/%d/%y", then our days will be read as months and vice versa. The quotes around the value are also necessary.
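The behaviour described above can be checked with a short sketch. The date strings below are made up for illustration and are not taken from dailysales.csv.

```r
# Hypothetical date strings in day/month/two-digit-year order
x <- c("01/02/10", "15/03/10")

d <- as.Date(x, format = "%d/%m/%y")
format(d)                              # "2010-02-01" "2010-03-15"

# A Date is stored as the number of days since 1970-01-01
as.numeric(as.Date("1970-01-10"))      # 9
as.numeric(as.Date("1969-12-31"))      # -1

# Swapping %d and %m silently changes the meaning, or fails outright
swapped <- as.Date("01/02/10", format = "%m/%d/%y")  # 2010-01-02, not 2010-02-01
invalid <- as.Date("15/03/10", format = "%m/%d/%y")  # NA: there is no month 15

# %y reads a two-digit year; %Y is needed for four-digit years
full_year <- as.Date("01/02/2010", format = "%d/%m/%Y")  # 2010-02-01
```

Because a wrong format can produce valid but incorrect dates instead of an error, it is worth inspecting a few converted values against the raw strings.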
The strptime() function is another way to convert character vectors into dates. However, strptime() returns a different kind of object, of the POSIXlt class, which is a named list of vectors that represent the different components of a date and time, such as year, month, day, hour, minutes, and seconds.
POSIXlt is one of the two basic date/time classes in R. The other, POSIXct, represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. POSIXct is more convenient for including in data frames, and POSIXlt is closer to human-readable forms. POSIXt is a virtual class from which both of these classes inherit. That's why, when we ran the data.class() function on d2 earlier, we got POSIXt as the result. Recent versions of R list the concrete class first, so data.class(d2) may return "POSIXlt" instead; inherits(d2, "POSIXt") is TRUE in either case.
The strptime() function also takes a character vector to be converted and the format as arguments.
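To make the difference between the two classes concrete, here is a minimal sketch using a made-up timestamp. The tz = "UTC" argument is an addition for this example so that the result does not depend on the local time zone.

```r
# Hypothetical timestamp, parsed in UTC for reproducibility
lt <- strptime("15/03/10 14:30:00", "%d/%m/%y %H:%M:%S", tz = "UTC")

# POSIXlt: a list of date and time components
lt$mday          # 15
lt$mon           # 2 (months are counted from 0, so 2 means March)
lt$year + 1900   # 2010 (years are stored relative to 1900)
lt$hour          # 14

# POSIXct: seconds since 1970-01-01 00:00:00 UTC
ct <- as.POSIXct(lt)
as.numeric(ct)   # 1268663400

# Both classes inherit from the virtual class POSIXt
inherits(lt, "POSIXt")   # TRUE
inherits(ct, "POSIXt")   # TRUE
```

The zero-based month and the 1900 offset are common sources of off-by-one mistakes when reading POSIXlt components directly; format() avoids both.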
The zoo package is handy for dealing with time series data. The zoo() function takes an x argument that can be a numeric vector, matrix, or factor. It also takes an order.by argument that has to be an index vector with unique entries by which the observations in x are ordered:
library(zoo)
d3 <- zoo(sales$units, as.Date(sales$date, "%d/%m/%y"))
data.class(d3)
[1] "zoo"
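The role of order.by can be seen in a small sketch that does not depend on dailysales.csv. It assumes the zoo package is installed, and the dates and unit counts are invented for illustration.

```r
library(zoo)   # assumes the zoo package is installed

# Hypothetical observations, deliberately out of chronological order
dates <- as.Date(c("03/01/10", "01/01/10", "02/01/10"), "%d/%m/%y")
units <- c(5305, 5064, 6115)

z <- zoo(units, order.by = dates)

index(z)      # the dates, sorted: 2010-01-01, 2010-01-02, 2010-01-03
coredata(z)   # the values, reordered to match: 5064 6115 5305
```

Here index() and coredata() are the zoo accessors for the ordering index and the underlying data, so the object keeps each value paired with its date after sorting.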
See the help on DateTimeClasses to find out more details about the ways dates can be represented in R.