Chapter 1: Time Series Data

1.1 Time Series Questions

1.2 Types of Time Series: Theoretical Considerations

1.3 Types of Time Series: Practical Considerations

1.4 Time Series Procedures in SAS

1.5 References for Data Used in this Book

1.1 Time Series Questions

An observed time series is a set of values that are recorded for specific points in time. This book includes many practical series that illustrate the rich variety of areas for which time series analysis is relevant. The following time series are used as examples:

Ice coverage in Arctic areas - Daily observations

The Swiss business indicator - Monthly observations

Unemployment in the UK - Monthly observations

Danish fertility - Yearly observations

Number of overnight stays at Danish hotels by US citizens - Monthly observations

Volume of US E-Commerce - Quarterly observations

Section 1.5 contains specific references to the origins of these series. The series are available on the author’s web page.

Three other time series are briefly cited in the chapters about handling time series in SAS. These series are not originally in a form suitable for analysis by the time series procedures in SAS:

Number of copies made on a photocopy machine - Observed irregularly

Movements of the left arm of a baby - Observed 60 times per second

Speed of automobiles on a highway - Observed at irregular points in time

These series are not analyzed in this book, and they are not included on the author’s web page.

The relation to specific points in time raises special considerations that are irrelevant for other types of data sets. Time series often show a high degree of dependence between observations that are close in time, but this dependence weakens as the time span between observations grows. This contrasts with many other statistical analyses, where all observations are often assumed to be completely independent. Because of this dependence, knowledge of the time series during the observation period gives some idea of what will happen to the series after the last available observation. This is the basic principle underlying forecasting: the past provides information about the future. In fact, many practical time series analyses are performed in order to produce forecasts. This is the basic theme of Part 3.
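
To make the idea of dependence that weakens with the distance in time concrete, the following minimal Python sketch (not SAS, and not a method taken from this book) simulates a first-order autoregressive series and computes its sample autocorrelation at a few lags. The AR(1) model, the coefficient 0.8, the random seed, and the helper names `simulate_ar1` and `sample_acf` are illustrative assumptions.

```python
import random

def simulate_ar1(n, phi, seed=1):
    """Simulate X_t = phi * X_{t-1} + e_t with standard normal noise e_t (assumed model)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag (equals 1.0 at lag 0)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom

series = simulate_ar1(5000, phi=0.8)
for lag in (1, 2, 5, 10):
    print(lag, round(sample_acf(series, lag), 3))
```

For this assumed process the theoretical autocorrelation at lag k is 0.8 raised to the power k, so the printed values should shrink roughly geometrically: neighbouring observations are strongly related, distant ones only weakly.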

Time series observations can include seasonal patterns due to weather conditions (for instance, a series of monthly sales of ice cream). Similarly, sales volumes over a week and hourly registrations of electricity consumption during a day exhibit seasonal variation. Often this seasonal variation is only a nuisance because the analyst is interested in the underlying trend. A typical example is a time series for unemployment, which is, of course, weather dependent. However, because the usual seasonal variation tells nothing about the state of the national economy, a seasonally adjusted time series is needed in order to comment on or react to the real unemployment situation. This is the basic theme of Part 4.

In statistical analyses of time series data, the purpose is to gain insight into the underlying mechanism that generated the data. Time series theory provides many tools that are somewhat difficult for non-statisticians to apply because they require rather advanced mathematical skills. But less effort suffices if users want only estimates of the trend and seasonality. Coupled with a rough idea of the amount of variation, such estimates can provide a basic understanding of the data series, which is often enough to plan future activities. It is easy to decompose a series into a sum of a few component series, each of which describes one fundamental property of the observed series, such as a trend, a seasonal component, or relationships to other series. This is the basic theme of Part 5.
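
As a minimal sketch of such a decomposition, assuming an additive form, the observed series can be written as the sum of a trend $T_t$, a seasonal component $S_t$ that repeats with period $s$ (for example, $s = 12$ for monthly data), a term $\beta Z_t$ relating the series to another series $Z_t$, and an irregular remainder $\varepsilon_t$. These component symbols are introduced here for illustration only; they are not necessarily the notation used by the procedures in Part 5.

```latex
X_t = T_t + S_t + \beta Z_t + \varepsilon_t, \qquad S_{t+s} = S_t
```

Under this additive form, the seasonally adjusted series is $X_t - S_t$. A multiplicative form, in which the components are multiplied rather than added, is an equally common alternative.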

1.2 Types of Time Series: Theoretical Considerations

In mathematics, a time series is usually denoted $X_t$, where X is the value of, for example, the outdoor temperature, and the subscript t in some way denotes the time. For the mathematical theory, the exact definition of time is of no importance: the index t typically takes values 1, 2, 3, and so on, or perhaps all real numbers, which gives no indication of what the time index really means.

Many time series, such as the outdoor temperature, are defined for all points in time; mathematicians call them time series in continuous time. For such series, the time index could theoretically range over all real numbers, all positive real numbers, or an interval of real numbers.

Other time series, such as total retail sales, are published as monthly totals, and it is hard to imagine that these sales could realistically be defined in continuous time. Mathematicians call such series time series in discrete time. A discrete time series is called equidistant if it is observed at points in time separated by equal distances (for example, total sales every month). For such series, it is natural to let t = 1 denote the first observation and let the index take all integer values up to t = T for the last observation. Forecasts are then defined as the expected values at times t = T + 1, t = T + 2, and so on. For the mathematical analysis, the time window from t = 1 to t = T is often extended to all positive integers or even to all integers, including negative ones.

In practical analyses, the notion of an infinite past is meaningless, and even the infinite future is hard to relate to, but in mathematical theory such concepts are of great interest. Mathematical theory provides results, such as convergence and consistency theorems, that ensure the validity of the applied methods. These results are important because they underlie and justify all the practical methods in this book. However, because this book focuses on the practical aspects of using SAS for analyzing time series, I generally avoid such purely theoretical concepts.

1.3 Types of Time Series: Practical Considerations

All the algorithms behind the SAS procedures used in this book rely on the assumption that the series is discrete and equidistant. In practice, this means that a time series of, say, 12 years of monthly data is considered as observations $X_1, \ldots, X_{144}$, and forecasts are then the expected values of $X_{145}$, $X_{146}$, and so on. You have to keep in mind that the first observation is for, say, January 1995, and the last observation is for December 2006, so the first forecast is for January 2007, the second for February 2007, and so on. Every time you use these data for forecasts, plots, and so on, you have to keep track of the translation from observation number to the corresponding point in time. For practical applications, a better strategy is to store this correspondence in the data set by defining t as a proper point in time. SAS offers a rich variety of datetime formats which, in combination with functions and procedures for time series handling, provide the basis for labeling the time index in a way suitable for immediate presentation. This is demonstrated in Chapter 2.
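
The bookkeeping that this paragraph warns about can be sketched in a few lines of Python. This is only an illustration of the arithmetic (SAS itself handles the correspondence with date values and formats, as Chapter 2 shows); the function names `obs_to_month` and `label` and the default start of January 1995 are assumptions chosen to match the example above.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def obs_to_month(t, start_year=1995, start_month=1):
    """Map observation number t (1-based) of a monthly series to (year, month)."""
    offset = (start_month - 1) + (t - 1)  # months elapsed since January of start_year
    return start_year + offset // 12, offset % 12 + 1

def label(t):
    """Readable label such as 'Jan 1995' for observation number t."""
    year, month = obs_to_month(t)
    return f"{MONTHS[month - 1]} {year}"

T = 12 * 12          # 12 years of monthly data: X_1, ..., X_144
print(label(1))      # Jan 1995
print(label(T))      # Dec 2006
print(label(T + 1))  # first forecast: Jan 2007
```

Storing a real date with each observation, as recommended above, removes the need for this kind of hand-written translation.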

Many observed series are not originally generated as equidistant discrete time series but must be converted in various ways before the SAS procedures can be applied. Part 2 of this book presents some of the facilities SAS offers for transforming time series data into SAS data sets that are convenient for further analysis. Chapter 3 is devoted to the aggregation of time series, including an example that converts sales on different days into a series of monthly total sales by accumulation. Chapter 4 similarly describes how to interpolate time series for which some observations are missing. This situation could arise for temperature measurements if the measuring equipment is out of order at some of the planned observation times. By combining aggregation and interpolation, an irregularly sampled continuous time series can be converted into a discrete, equidistant time series. The data example in Chapter 13 illustrates all of this by applying several levels of aggregation and interpolation.
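
The two operations can be illustrated in plain Python. This is a minimal sketch, not a description of how PROC TIMESERIES or PROC EXPAND work internally: it assumes sales records given as (date, amount) pairs and uses simple linear interpolation between the nearest known neighbours, which is only one of several possible interpolation methods. The data and the function names `monthly_totals` and `interpolate_linear` are hypothetical.

```python
from collections import defaultdict
from datetime import date

def monthly_totals(sales):
    """Aggregate (date, amount) records into accumulated totals per (year, month)."""
    totals = defaultdict(float)
    for day, amount in sales:
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))

def interpolate_linear(values):
    """Fill None entries by linear interpolation between the nearest known neighbours.

    Leading or trailing gaps stay None because there is nothing to interpolate between.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

# Hypothetical sales recorded on irregular days
sales = [
    (date(2006, 1, 3), 120.0),
    (date(2006, 1, 17), 80.0),
    (date(2006, 2, 9), 150.0),
    (date(2006, 3, 1), 60.0),
    (date(2006, 3, 28), 90.0),
]
print(monthly_totals(sales))
# {(2006, 1): 200.0, (2006, 2): 150.0, (2006, 3): 150.0}

# Hypothetical temperature readings with two missing values
temps = [4.0, None, None, 7.0, 8.0]
print(interpolate_linear(temps))
# [4.0, 5.0, 6.0, 7.0, 8.0]
```

Accumulation by summing suits flow variables such as sales; for a level variable such as temperature, an average or an end-of-period value would usually be the more natural aggregate.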

1.4 Time Series Procedures in SAS

SAS/ETS® software is dedicated to econometric and time series (ETS) analysis. SAS/ETS includes procedures such as PROC TIMESERIES and PROC EXPAND for the practical handling of time series data such as aggregation and interpolation. These two procedures are the subject of Part 2, which also includes an overview of how SAS treats datetime variables and time series data.

SAS/ETS also contains procedures for the statistical analysis of econometric models and for time series analysis. Even though many of these procedures are specially designed for econometric analyses, the underlying statistical methods are highly relevant to many other scientific areas, such as geosciences and medicine.

In this book, the main topics are procedures from SAS/ETS for simple time series analysis. The procedures covered are all simple to use and do not require much programming. The analyses are not intended to end up with a fully specified statistical model for the data series. The idea is to show that it is easy to obtain useful results, like forecasts and trend judgments, because many procedures in SAS/ETS are designed for this purpose. It turns out that this can be done without lengthy statistical modeling. Relatively simple algorithms and ideas can help you achieve results that are fully comparable with those from more involved and costly model building.

The following procedures in SAS/ETS are featured in this book:

PROC ESM (an up-to-date procedure for forecasting; see Part 3)

PROC X12 (for seasonal adjustments; see Part 4)

PROC UCM (for unobserved component models; see Part 5)

PROC AUTOREG, PROC ARIMA, and PROC VARMAX, which are designed for model-based econometric analyses, are briefly mentioned in Chapter 7. This is done mainly in order to establish the connection between the practical techniques focused on in this book and more careful statistical methods, but you could read the overview given in Chapter 7 as an introduction to ordinary model-based time series analysis.

It is, of course, impossible to cover all the facilities offered by these procedures in this book. For more information, see the SAS Help, which is either shipped as part of the SAS installation or available on the SAS support web site. In case of doubt, consult the syntax in the SAS Help for exact answers.

You could use other time series procedures in SAS/ETS for nearly the same analyses, but from different viewpoints and with different focuses. Moreover, many procedures overlap to a certain degree, so the choice of the “correct” procedure is often of minor importance.

Chapter 7 gives a very short review of the Box-Jenkins class of time series models as an introduction. The main purpose of that review is to clarify how closely the automatic methods presented in this book are related to the more complicated, detailed econometric time series models. It serves as an argument for the viewpoint that, in many respects, the automatic models in the procedures covered by this book make the use of the more complicated procedures superfluous. In Chapter 7, PROC VARMAX is applied to derive a forecast from ARIMA models that parallels the forecasts from more intuitive forecasting algorithms. PROC VARMAX, which includes some facilities for model selection that make Box-Jenkins modeling easy, is a fairly new procedure designed for much more advanced analyses of multivariate time series. The discussion of it here is in no way a comprehensive description. Other procedures for time series analysis are PROC ARIMA and PROC AUTOREG, which are thoroughly discussed by Brocklebank and Dickey (2003).

1.5 References for Data Used in this Book

This section presents brief references to the series used in the various examples in this book. All series were downloaded at some point, and later revisions of the series are not incorporated in the examples. The focus is on applications and not on specific conclusions about the series and their impact. The series are analyzed without any political or economic viewpoints to keep the presentation neutral and purely technical.

By nature, time series examples soon become obsolete. Even forecasting experiments, in which more recent observations are compared with forecasts, begin to seem like historical exercises after a while. With this in mind, the forecasts in this book are in no way intended as statements about the future realizations of the time series.

The series are available on the author’s web page (http://support.sas.com/publishing/authors/milhoj.html). In all code in this book, the series are used as members of the library SASTS (for SAS Time Series).

Two Danish series are used in this book:

Danish fertility - Yearly observations

Number of overnight stays at Danish hotels by US citizens - Monthly observations

Both of these series are published by the Danish Statistical Office, Danmarks Statistik, as part of the institution’s database system named Statistikbanken. The web page is located at http://dst.dk/, and the English version of the database home page is http://www.statbank.dk/statbank5a/default.asp?w=1920.

The following time series are also used in this book:

Ice coverage in Arctic areas - Daily observations

This series is published by NASA. This particular series is available at http://polynya.gsfc.nasa.gov/datasets/Np_29yrs_78-07.area.txt. The last column is the total sea area covered with ice.

The Swiss business indicator - Monthly observations

This series is published by the OECD, along with similar series for many other countries. See http://stats.oecd.org/. You can download the specific series from http://stats.oecd.org/Index.aspx?DatasetCode=MEI_CLI#

Unemployment in the UK - Monthly observations

This series is published by the Organisation for Economic Co-operation and Development (OECD), along with similar series for many other countries. See http://stats.oecd.org/#, where many labor market series, including unemployment series, are published. This particular series is from the database Registered Unemployed and Job Vacancies (MEI).

Volume of US E-Commerce - Quarterly observations

The series is published by the United States Bureau of the Census, whose web site is http://www.census.gov/. The specific series is found at http://www.census.gov/retail/index.html#ecommerce. This web page also gives total retail sales, which are used as the independent variable in a regression-style model in Chapter 12.
