Formatting time series data for plotting

Time series or trend charts are the most common form of line graphs. R offers many ways to plot such data, but the data must first be converted into a form that R recognizes as dates or times. In this recipe, we will look at some ways of formatting time series data using base R and some additional packages.

Getting ready

In addition to the basic R functions, we will also be using the zoo package in this recipe. So, first we need to install it:

install.packages("zoo")

How to do it...

Let's use the dailysales.csv example dataset and format its date column:

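# Read the daily sales data; the date column is not yet recognized as dates
sales<-read.csv("dailysales.csv")

# Convert the date column to an object of the Date class
d1<-as.Date(sales$date,"%d/%m/%y")

# Convert the same column with strptime() for comparison
d2<-strptime(sales$date,"%d/%m/%y")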

data.class(d1)
[1] "Date"

data.class(d2)
[1] "POSIXt"

How it works...

We have used two different functions to convert a character vector into dates. Without this conversion, R does not recognize the values in the date column as dates. Instead, the column is treated as a character vector or a factor.
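As a small illustration of this point, the following sketch reads a two-row table from an inline string instead of dailysales.csv. The values and the sales_demo name are made up for the example, and whether the column arrives as a character vector or a factor depends on the R version and the stringsAsFactors setting.

```r
# Hypothetical two-row stand-in for dailysales.csv (values are invented)
csv_text <- "date,units\n10/01/2010,5478\n02/02/2010,6000"
sales_demo <- read.csv(text = csv_text)

# Before conversion: "character" (R 4.0 and later) or "factor" (older defaults)
class(sales_demo$date)

# Sorting the raw values compares text, so 02/02/2010 comes before 10/01/2010
text_order <- sort(as.character(sales_demo$date))

# After conversion the values are real dates and sort chronologically
converted <- as.Date(sales_demo$date, format = "%d/%m/%Y")
date_order <- sort(converted)
```

Comparing text_order with date_order shows why plotting functions need the converted column: only the Date values carry the true chronological order.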

The as.Date() function is called here with two arguments: the character vector to be converted and a format string that describes how the dates are written in that vector. It returns an object of the Date class, represented as the number of days since 1970-01-01, with negative values for earlier dates. The values in the date column are in the DD/MM/YYYY format (you can verify this by typing in sales$date at the R prompt). So, we specify the format argument as "%d/%m/%y". Note that this order is important. If we instead use "%m/%d/%y", then our days will be read as months and vice versa. Also note that the lowercase %y code matches a two-digit year, whereas the uppercase %Y code matches a four-digit year; if the years in your file are written with four digits, use "%d/%m/%Y" so that the full year is read. The quotes around the value are also necessary.
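The effect of the format string can be checked on a single value. This is a minimal sketch using a made-up date string rather than the sales data; it assumes the year is written with four digits, which is why it uses %Y for the main conversion.

```r
# One made-up date: 3 February 2010, written day/month/four-digit year
x <- "03/02/2010"

# Day first, then month, then a four-digit year
dmy <- as.Date(x, format = "%d/%m/%Y")

# Swapping %d and %m silently reads the same text as 2 March 2010
mdy <- as.Date(x, format = "%m/%d/%Y")

# %y consumes only two year digits ("20") and ignores the trailing "10"
two_digit <- as.Date(x, format = "%d/%m/%y")

# A Date is stored as the number of days since 1970-01-01
days <- as.numeric(dmy)

# Dates before 1970-01-01 are stored as negative numbers
before_epoch <- as.numeric(as.Date("31/12/1969", format = "%d/%m/%Y"))
```

Printing dmy, mdy, and two_digit side by side makes it easy to see that none of the three calls raises an error, so a wrong format string has to be caught by inspecting the result.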

The strptime() function is another way to convert character vectors into dates. However, strptime() returns a different kind of object, of the POSIXlt class, which is a named list of vectors that represent the different components of a date and time, such as year, month, day, hour, minutes, and seconds.
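To make the list-of-components idea concrete, here is a short sketch on a single made-up timestamp. The time zone is fixed to UTC only to keep the example reproducible; that choice is an assumption of the sketch, not part of the recipe.

```r
# One made-up timestamp with both a date and a time part
lt <- strptime("25/12/2010 14:30:15",
               format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# The components are stored as named list elements
day    <- lt$mday          # day of the month
month  <- lt$mon + 1       # months are counted from 0, so add 1
year   <- lt$year + 1900   # years are counted from 1900
hour   <- lt$hour
minute <- lt$min
second <- lt$sec
```

The offsets on mon and year are easy to forget; they are documented in the DateTimeClasses help page.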

POSIXlt is one of the two basic classes of date/time in R. The other class, POSIXct, represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. POSIXct is more convenient for including in data frames, and POSIXlt is closer to human-readable forms. Both classes inherit from the virtual class POSIXt. That's why, when we ran the data.class() function on d2 earlier, we got POSIXt as the result. (Recent versions of R list the more specific class first, so the same call may report POSIXlt instead.)
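The relationship between the two classes can be sketched as follows. The timestamp is made up and chosen one minute after the start of 1970 so that the underlying number of seconds is easy to check by hand; the demo data frame and its column names are hypothetical.

```r
# One minute past the start of 1970, in UTC
lt <- strptime("01/01/1970 00:01:00",
               format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Convert the list-based POSIXlt value to the numeric POSIXct form
ct <- as.POSIXct(lt)

# POSIXct is a count of seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(ct)

# Both classes inherit from the virtual class POSIXt
both_posixt <- inherits(lt, "POSIXt") && inherits(ct, "POSIXt")

# POSIXct fits naturally into a data frame column
demo <- data.frame(when = ct, units = 5)
```

Calling class(lt) and class(ct) at the prompt shows the full class vectors, which is a more reliable check than data.class() when the R version is unknown.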

The strptime() function also takes a character vector to be converted and the format as arguments.

There's more...

The zoo package is handy for dealing with time series data. The zoo() function takes an x argument that can be a numeric vector, matrix, or factor. It also takes an order.by argument that has to be an index vector with unique entries by which the observations in x are ordered:

library(zoo)

d3<-zoo(sales$units,as.Date(sales$date,"%d/%m/%y"))

data.class(d3)
[1] "zoo"
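The role of the order.by argument is easier to see on a tiny series whose dates are deliberately out of order. This is only a sketch with invented values; it assumes the zoo package is installed and uses the index() and coredata() accessors from that package.

```r
library(zoo)

# Three made-up observations supplied in non-chronological order
dates <- as.Date(c("03/01/2010", "01/01/2010", "02/01/2010"),
                 format = "%d/%m/%Y")
units <- c(30, 10, 20)

# zoo() sorts the observations by the order.by index
z <- zoo(units, order.by = dates)

# index() returns the dates, coredata() the observations, both now sorted
z_dates <- index(z)
z_values <- coredata(z)
```

Because the object carries its own ordered index, calling plot(z) draws the series against the dates without a separate x argument.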

See the help on DateTimeClasses to find out more details about the ways dates can be represented in R.
