Predicting intense earthquakes

Having reviewed several time series models, we are now ready for some practical examples. Our first data set is a time series of earthquakes in Greece with a magnitude exceeding 4.0 on the Richter scale, covering the period from 2000 to 2008. This data set was recorded by the Observatory of Athens and is hosted on the website of the University of Athens, Faculty of Geology, Department of Geophysics & Geothermics. The data are available online at http://www.geophysics.geol.uoa.gr/catalog/catgr_20002008.epi.

We will import these data directly using the RCurl package. Its getURL() function retrieves the contents of a particular address on the Internet as a character string. The base R function textConnection() then lets read.table() read that string as if it were a file. Once we have the data, we provide meaningful names for the columns using information from the website:

> library("RCurl")
> seismic_raw <- read.table(textConnection(getURL(
   "http://www.geophysics.geol.uoa.gr/catalog/catgr_20002008.epi")),
   sep = "", header = FALSE)
> names(seismic_raw) <- c("date", "mo", "day", "hr", "mn", "sec", 
 "lat", "long", "depth", "mw")
> head(seismic_raw, n = 3)
  date mo day hr mn  sec     lat  long depth  mw
1 2000  1   1  1 19 28.3 41.950N 20.63     5 4.8
2 2000  1   1  4  2 28.4 35.540N 22.76    22 3.7
3 2000  1   2 10 44 10.9 35.850N 27.61     3 3.7

The first column, which we named date, holds the year of the observation, and the second column holds the month. The next four columns give the day, hour, minute, and second at which the earthquake was observed. The following two columns are the latitude and longitude of the epicenter. The last two columns contain the depth and the magnitude of the earthquake.
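As a small offline illustration of this layout, the sketch below parses the three sample rows shown above from an inline string instead of the website. Note that the latitude values carry a trailing "N", so read.table() reads that column as text. The lat_deg column is a hypothetical helper introduced here for illustration and is not used in the rest of the analysis.

```r
# Three sample rows copied from the head() output shown earlier
sample_rows <- "2000 1 1 1 19 28.3 41.950N 20.63 5 4.8
2000 1 1 4 2 28.4 35.540N 22.76 22 3.7
2000 1 2 10 44 10.9 35.850N 27.61 3 3.7"

seismic_sample <- read.table(text = sample_rows, sep = "", header = FALSE,
                             stringsAsFactors = FALSE)
names(seismic_sample) <- c("date", "mo", "day", "hr", "mn", "sec",
                           "lat", "long", "depth", "mw")

# lat is character because of the trailing "N"; strip it to get a number.
# lat_deg is a hypothetical helper column, not part of the original data.
seismic_sample$lat_deg <- as.numeric(sub("N$", "", seismic_sample$lat))
str(seismic_sample[, c("lat", "lat_deg")])
```

Only the year and month columns are needed for the monthly counts that follow, so this conversion is optional.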

Our goal is to aggregate these data in order to get monthly counts of the significant earthquakes observed in this time period. To achieve this, we can use the count() function of the R package plyr, which returns the number of rows for each year and month combination in a column named freq, and the standard ts() function to create a new time series. We will specify the starting and ending year and month of our time series, and set the frequency parameter to 12 to indicate monthly readings. Finally, we will plot our data:

> library("plyr")
> seismic <- count(seismic_raw, c("date", "mo"))
> seismic_ts <- ts(seismic$freq, start = c(2000, 1), 
                   end = c(2008, 1), frequency = 12)
> plot(seismic_ts)
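One point to keep in mind with this kind of aggregation is that a count of rows only produces entries for year and month combinations that actually occur in the data. If a month had no recorded events, it would simply be missing, and every later value in the series would shift by one position. Whether this happens in the catalog used here has not been checked. The following is a minimal base R sketch, on hypothetical toy data, of one way to build monthly counts that keeps empty months as zeros:

```r
# Hypothetical toy events: two in January 2000 and two in March 2000
toy_events <- data.frame(date = c(2000, 2000, 2000, 2000),
                         mo   = c(1, 1, 3, 3))

first_year <- 2000
n_months <- 12

# Map each event to a running month number (1 = January of first_year)
month_index <- (toy_events$date - first_year) * 12 + toy_events$mo

# tabulate() counts events per month and fills empty months with 0
monthly_counts <- tabulate(month_index, nbins = n_months)

toy_ts <- ts(monthly_counts, start = c(first_year, 1), frequency = 12)
print(toy_ts)
```

Here February appears with a count of zero instead of being dropped, so each value stays aligned with its calendar month.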

The following plot shows our time series:

[Figure: plot of the seismic_ts time series]

The series fluctuates around 30 with occasional peaks. Although the two largest peaks occur toward the end of the period, there does not appear to be any overall upward trend.

We would like to analyze these data using an ARIMA model, but we are not sure what values we should use for the order. A simple way to choose them ourselves is to obtain the AIC for all the different models we would like to train and pick the model that has the smallest AIC. Concretely, we will begin by creating ranges of possible values for the order parameters p, d, and q. Next, we shall use the expand.grid() function, which creates a data frame with all the possible combinations of these parameters:

> d <- 0 : 2
> p <- 0 : 6
> q <- 0 : 6
> seismic_models <- expand.grid(d = d, p = p, q = q)
> head(seismic_models, n = 4)
  d p q
1 0 0 0
2 1 0 0
3 2 0 0
4 0 1 0 
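As a quick sanity check on the size and ordering of this grid, the sketch below rebuilds it under a different, hypothetical name (order_grid) and inspects it. The first argument passed to expand.grid() varies fastest, which is why d cycles through its values before p changes.

```r
d <- 0:2
p <- 0:6
q <- 0:6

# order_grid is a hypothetical name used only for this check
order_grid <- expand.grid(d = d, p = p, q = q)

# One row per combination: 3 values of d times 7 of p times 7 of q
nrow(order_grid)

# Look up the row for a specific combination by position
order_grid[26, ]
```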

Next, we define a function that fits an ARIMA model using a particular combination of order parameters and returns the AIC produced:

getTSModelAIC <- function(ts_data, p, d, q) {
  ts_model <- arima(ts_data, order = c(p, d, q))
  return(ts_model$aic)
}

Tip

When we talked about ARIMA processes in this chapter, we described them in terms of a zero mean. In R, many functions, such as the arima() function, handle time series whose mean is nonzero automatically. By default, arima() estimates a mean term, reported as the intercept, when the series is not differenced.

For certain combinations of order parameters, our function will produce an error if it fails to converge. When this happens, we'll want to report a value of infinity for the AIC value so that this model is not chosen when we try to pick the best model. The following function acts as a wrapper around our previous function:

getTSModelAICSafe <- function(ts_data, p, d, q) {
  # Return Inf instead of stopping when the model cannot be fitted
  tryCatch(
    getTSModelAIC(ts_data, p, d, q),
    error = function(e) Inf
  )
}
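To see the error-handling pattern in isolation, here is a minimal sketch that does not fit any model. The functions fit_or_fail() and safe_fit() are hypothetical stand-ins: the first one either returns a score or raises an error, and the second one turns that error into Inf.

```r
# Hypothetical stand-in for a model fit that may fail
fit_or_fail <- function(should_fail) {
  if (should_fail) stop("model failed to converge")
  100  # pretend this is an AIC value
}

# Wrapper that converts any error into Inf
safe_fit <- function(should_fail) {
  tryCatch(fit_or_fail(should_fail), error = function(e) Inf)
}

scores <- c(safe_fit(FALSE), safe_fit(TRUE))
scores

# The failed fit can never be selected as the minimum
which.min(scores)
```

Returning Inf rather than NA keeps a later min() or which.min() call simple, since no special handling of missing values is needed.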

All that remains is to apply this function to every parameter combination in our seismic_models data frame, save the results, and pick the combination that gives us the lowest AIC:

> seismic_models$aic <- mapply(function(x, y, z) 
  getTSModelAICSafe(seismic_ts, x, y, z), seismic_models$p, 
  seismic_models$d, seismic_models$q)
> subset(seismic_models, aic == min(aic))
   d p q     aic
26 1 1 1 832.171
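The whole search can be tried without downloading the earthquake data. The sketch below runs the same idea on a simulated AR(1) series with a smaller grid. The series, the seed, the reduced parameter ranges, and the names sim_series, safe_aic, and small_grid are all assumptions made for illustration, so the selected order will not match the earthquake result.

```r
set.seed(1)

# Simulated stand-in for the real series (assumption: AR(1), coefficient 0.5)
sim_series <- arima.sim(model = list(ar = 0.5), n = 200)

# Fit one ARIMA(p, d, q) model and return its AIC, or Inf if fitting fails
safe_aic <- function(ts_data, p, d, q) {
  tryCatch(arima(ts_data, order = c(p, d, q))$aic,
           error = function(e) Inf)
}

# Smaller grid than in the text, to keep the run short
small_grid <- expand.grid(d = 0:1, p = 0:2, q = 0:2)
small_grid$aic <- mapply(function(p, d, q) safe_aic(sim_series, p, d, q),
                         small_grid$p, small_grid$d, small_grid$q)

# Show the three candidates with the lowest AIC
small_grid[order(small_grid$aic), ][1:3, ]
```

Sorting by AIC instead of taking only the minimum makes it easier to see how close the runner-up models are.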

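The results indicate that the most appropriate model for our earthquakes time series is the ARIMA(1, 1, 1) model. We can train this model again with these parameters. We load the forecast package first, because its summary() method for ARIMA models produces the output shown here:

> library("forecast")
> seismic_model <- arima(seismic_ts, order = c(1, 1, 1))
> summary(seismic_model)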
Series: seismic_ts 
ARIMA(1,1,1)

Coefficients:
         ar1      ma1
      0.2949  -1.0000
s.e.  0.0986   0.0536

sigma^2 estimated as 306.9:  log likelihood=-413.09
AIC=832.17   AICc=832.43   BIC=839.86

Training set error measures:
                     ME     RMSE      MAE       MPE     MAPE     MASE        ACF1
Training set -0.2385232 17.42922 11.12018 -14.47481 29.84171 0.8174096 -0.02179457
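The information criteria in this output can be reproduced by hand from the reported log likelihood, which is a useful check on what they measure. The sketch below uses the standard definitions and rests on two assumptions: that three parameters are counted (ar1, ma1, and the innovation variance), and that 96 observations are used, that is, the 97 monthly values from January 2000 to January 2008 minus one lost to differencing.

```r
log_lik  <- -413.09  # log likelihood reported in the summary
n_params <- 3        # assumption: ar1, ma1, and the innovation variance
n_obs    <- 96       # assumption: 97 monthly values minus 1 for differencing

# Standard definitions of the three criteria
aic  <- -2 * log_lik + 2 * n_params
aicc <- aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
bic  <- -2 * log_lik + n_params * log(n_obs)

round(c(AIC = aic, AICc = aicc, BIC = bic), 2)
```

Under these assumptions the values agree with the reported AIC, AICc, and BIC to within the rounding of the log likelihood.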

The forecast package has a very useful forecasting function, forecast(), that can be applied to time series models. It gives us a convenient way to forecast values into the future, and its result can be visualized directly with the plot() function.

Tip

The forecast package also has an auto.arima() function that can be used to pick the order values of the ARIMA process. It is similar to, but slightly more sophisticated than, the simple approach we present here.

Let's forecast the next ten points in the future:

> plot(forecast(seismic_model, 10))
[Figure: forecast of the next ten values of seismic_ts, with bands around the forecasts]

The plot shows that our model predicts a small rise in the number of intense earthquakes over the next few time periods, after which the forecast settles at a steady average value. Note that the bands shown are prediction intervals around these forecasts (80% and 95% by default in the forecast package). The spiky nature of the time series is reflected in the width of these bands. If we examine the autocorrelation function of our time series, we see that there is only one lag coefficient that is statistically significant:

[Figure: autocorrelation function (ACF) of the seismic_ts time series]
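As a rough sketch of how significance is usually judged in an ACF plot, the bounds drawn by R are approximately plus or minus 1.96 divided by the square root of the number of observations, which is the range expected for about 95% of the coefficients of a white noise series. The example below applies this rule to a simulated AR(1) series, since the earthquake data are not loaded here. The series, the seed, and all variable names are assumptions for illustration.

```r
set.seed(7)
n_obs <- 200

# Simulated stand-in series (assumption: AR(1) with coefficient 0.5)
sim_ar1 <- arima.sim(model = list(ar = 0.5), n = n_obs)

# Sample autocorrelations for lags 1 to 20 (the first element is lag 0)
acf_values <- acf(sim_ar1, lag.max = 20, plot = FALSE)$acf[-1]

# Approximate 95% bound under the white noise assumption
bound <- 1.96 / sqrt(n_obs)

# Lags whose autocorrelation falls outside the bounds
significant_lags <- which(abs(acf_values) > bound)
significant_lags
```

With 20 lags tested at the 95% level, one lag can be expected to fall outside the bounds by chance alone, so isolated significant lags should be read with some caution.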

The profile of the ACF indicates that a simple AR(1) process might be an appropriate fit, and indeed this is what the auto.arima() function of the forecast package selects. Repeating our forecast with an AR(1) model yields a result very similar to what we saw earlier. In fact, as the AIC values indicate, this model is only slightly better than a white noise model (AR(0)), which suggests that our time series is very close to being white noise. This is not very surprising for our particular data set, which involves earthquakes produced at different epicenters around Greece. As a final note, if we want to inspect the coefficients of our fitted model, we can do so using the coefficients() function.
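The following sketch puts these last steps together using only base R: it fits an AR(1) model, inspects its coefficients, and computes forecasts with approximate 95% bands. It uses a simulated series with a mean of 30 as a stand-in for the earthquake counts, and predict() in place of the forecast package. The series, the seed, and all variable names are assumptions for illustration. The bands use a normal approximation, so they are close to, but not necessarily identical to, what forecast() would draw.

```r
set.seed(42)

# Simulated stand-in series (assumption: AR(1), coefficient 0.3, mean 30)
sim_series <- arima.sim(model = list(ar = 0.3), n = 120) + 30

# Fit an AR(1) model, that is, ARIMA(1, 0, 0)
ar1_fit <- arima(sim_series, order = c(1, 0, 0))

# Inspect the fitted coefficients: the AR term and the mean (intercept)
coefficients(ar1_fit)

# Forecast ten steps ahead; predict() returns forecasts and standard errors
ar1_pred <- predict(ar1_fit, n.ahead = 10)

# Approximate 95% bands under a normal approximation
lower <- ar1_pred$pred - 1.96 * ar1_pred$se
upper <- ar1_pred$pred + 1.96 * ar1_pred$se
cbind(forecast = ar1_pred$pred, lower = lower, upper = upper)
```

For an AR(1) model the forecasts move toward the estimated mean as the horizon grows, while the standard errors increase and then level off, which matches the flat forecast and wide bands described for the earthquake series.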
