Chapter 6

About the Data

6.1 SIAR

The Agroclimatic Information System for Irrigation (SIAR) (MARM 2011) is a freely downloadable database that has been in operation since 1999 and covers most of the irrigated area of Spain. The network belongs to the Ministry of Agriculture, Food and Environment of Spain and serves as a tool to predict and study meteorological variables for agriculture. SIAR is composed of twelve regional centers and a national center, which centralize and quality-check the measurements from the stations of the network. Figure 6.1 displays the stations over an altitude map. Some stations of the complete network have been omitted, either because their coordinates were difficult to obtain or because their data series were incomplete or spurious.¹

Figure 6.1

Meteorological stations of the SIAR network. The color key indicates the altitude (meters).

6.1.1 Daily Data of Different Meteorological Variables

As an example of multiple time series with different scales, we will use eight years (from January 2004 to December 2011) of daily data for several meteorological variables measured at the SIAR station located at Aranjuez (Madrid, Spain), available on the SIAR webpage.² The aranjuez.gz file, available in the data folder of the book repository, contains these variables: average, maximum, and minimum ambient temperature; average and maximum humidity; average and maximum wind speed; rainfall; solar radiation on the horizontal plane; and evapotranspiration.

The read.zoo function from the zoo package accepts the path to this file and reads the data to construct a zoo object. Several arguments (header, skip, etc.) are passed directly to read.table and are documented on the help page of that function. The index.column argument is the number of the column that contains the time index, and format defines the date format of this index.

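library(zoo)

## Dates are in column 3 as day/month/year; fields are separated by
## semicolons and numbers use a decimal comma
aranjuez <- read.zoo("data/aranjuez.gz",
                 index.column = 3, format = "%d/%m/%Y",
                 fileEncoding = 'UTF-16LE',
                 header = TRUE, fill = TRUE,
                 sep = ';', dec = ",", as.is = TRUE)
## Drop the first four remaining columns, which are not used here
aranjuez <- aranjuez[, -c(1:4)]

names(aranjuez) <- c('TempAvg', 'TempMax', 'TempMin',
                 'HumidAvg', 'HumidMax',
                 'WindAvg', 'WindMax',
                 'Radiation', 'Rain', 'ET')
summary(aranjuez)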
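The roles of index.column, format, sep, and dec can be checked on a few inline lines before reading a whole file. The following is a minimal sketch, assuming a made-up three-row sample: the values and the two-variable layout are invented for illustration and are not taken from aranjuez.gz. It uses the text argument of read.zoo instead of a file path.

```r
library(zoo)

## Hypothetical sample that mimics the file conventions:
## semicolon-separated fields, decimal commas, day/month/year dates.
sampleLines <- c("Date;TempAvg;Rain",
                 "01/01/2004;4,5;0,0",
                 "02/01/2004;5,1;1,2",
                 "03/01/2004;3,8;0,4")

toyZoo <- read.zoo(text = sampleLines,
                   index.column = 1, format = "%d/%m/%Y",
                   header = TRUE, sep = ";", dec = ",")

class(index(toyZoo))   # the time index is a Date vector
toyZoo
```

If format does not match the text of the index column, the dates cannot be parsed, so this kind of small test is a cheap way to validate the arguments.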

The summary of aranjuez shows that some of these time series contain physically implausible values (for example, a relative humidity above 100%). These erroneous outliers can safely be replaced with NA:

aranjuezClean <- within(as.data.frame(aranjuez),{
  TempMin[TempMin>40] <- NA
  HumidMax[HumidMax>100] <- NA
  WindAvg[WindAvg>10] <- NA
  WindMax[WindMax>10] <- NA
})

aranjuez <- zoo(aranjuezClean, index(aranjuez))
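The within call evaluates each assignment inside the data frame, so a condition such as HumidMax > 100 selects only the offending elements of that column. A minimal sketch on an invented data frame (the values are hypothetical, chosen so that each column has exactly one out-of-range entry):

```r
## Hypothetical values; not taken from the Aranjuez series.
toyDF <- data.frame(TempMin  = c(2.1, 55.0, 3.4),
                    HumidMax = c(98, 100, 140))

toyClean <- within(toyDF, {
  TempMin[TempMin > 40]    <- NA   # implausible minimum temperature
  HumidMax[HumidMax > 100] <- NA   # relative humidity above 100%
})

toyClean
colSums(is.na(toyClean))   # number of masked values per column
```

Masking with NA instead of deleting rows keeps the time index regular, which is why the cleaned data frame can be paired again with the original index.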

6.1.2 Solar Radiation Measurements from Different Locations

As an example of multiple time series with the same scale, we will use daily solar radiation measurements from different locations.

Daily solar radiation incident on the horizontal plane is registered by meteorological stations and estimated from satellite images. This meteorological variable is important for a wide variety of scientific disciplines and engineering applications. Its variations and trends, which depend on the location (mainly latitude, but also longitude and altitude) and on time (day of the year), have been analyzed and modeled in a large body of papers and reports. In this section we will focus on the time evolution of solar radiation. The spatial distribution and the spatiotemporal behavior will be the subject of later sections.

The stations of the SIAR network include first-class pyranometers according to the World Meteorological Organization (WMO), whose absolute accuracy is within ±5% and typically better than ±3%. Solar irradiance is recorded every 15 minutes and then aggregated by a datalogger within the station to generate the daily irradiation, which is later sent to the regional and national centers.
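The step from 15-minute irradiance (power, in W/m²) to daily irradiation (energy, in Wh/m²) is an integration over time. The exact procedure used by the SIAR dataloggers is not described here; the following is only a sketch under the assumption of a simple rectangular integration, applied to a synthetic sinusoidal profile with a hypothetical peak of 800 W/m².

```r
## Synthetic daily profile (hypothetical): 96 samples, one every 15 minutes.
hours <- seq(0, 23.75, by = 0.25)
## Half-sine between 06:00 and 18:00, zero during the night.
G <- pmax(0, 800 * sin(pi * (hours - 6) / 12))   # W/m²

## Rectangular integration: each sample represents 0.25 hours.
dailyWh  <- sum(G * 0.25)       # Wh/m²
dailyKWh <- dailyWh / 1000      # kWh/m²
dailyKWh
```

For this profile the sum can be compared with the analytic integral of the half-sine, 800 × 24/π Wh/m², which gives an idea of the discretization error of a 15-minute sampling.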

The file navarra.RData contains daily solar radiation data for 2011 from the meteorological stations of Navarra, Spain. The names in the dataset are abbreviations of the station names.

6.2 Unemployment in the United States

As an example of time series that can be displayed both individually and in aggregate, we will use unemployment data from the United States. The information on unemployed persons by industry and class of worker is available in Table A-14, published by the Bureau of Labor Statistics.³

The dataset arranges the information with a row for each category (Series.ID) and a column for each monthly value. In addition, there are columns with the annual summaries (annualCols). We rearrange this data.frame, dropping the Series.ID and the annual columns, and transpose the data.

unemployUSA <- read.csv('data/unemployUSA.csv')
nms <- unemployUSA$Series.ID
## Columns of annual summaries
annualCols <- 14 + 13*(0:12)
## Transpose. Remove annual summaries
unemployUSA <- as.data.frame(t(unemployUSA[,-c(1, annualCols)]))
## The first 6 characters of each Series.ID can be suppressed
## (substring keeps the text from the 7th character onward)
names(unemployUSA) <- substring(nms, 7)
head(unemployUSA)
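The index arithmetic behind annualCols is easier to follow on a reduced table. The sketch below assumes an invented layout with only two years, so the annual columns sit at positions 14 and 27. The values are random, and the Series.ID strings and column labels are hypothetical placeholders, not the real BLS identifiers.

```r
## Hypothetical wide table: 1 ID column + 2 years x (12 months + 1 annual).
set.seed(1)
months <- paste(month.abb, rep(2000:2001, each = 12), sep = ".")
labels <- c(months[1:12], "Annual.2000", months[13:24], "Annual.2001")
toyWide <- data.frame(Series.ID = c("LNU03000001", "LNU03000002"),
                      matrix(round(runif(2 * 26, 100, 900)), nrow = 2,
                             dimnames = list(NULL, labels)))

annualCols <- 14 + 13 * (0:1)     # one annual column every 13 columns
names(toyWide)[annualCols]        # labels of the annual summaries

## Drop the ID and annual columns, then transpose: one row per month.
toyLong <- as.data.frame(t(toyWide[, -c(1, annualCols)]))
names(toyLong) <- substring(toyWide$Series.ID, 7)  # drops the first 6 characters
dim(toyLong)
head(toyLong, 3)
```

After the transpose the month labels become the row names, which is what makes the later conversion to a time index possible.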

With the transpose, the column names of the original dataset are now the row names of the data.frame. The as.yearmon function of the zoo package converts the character vector of names into a yearmon vector, a class for representing monthly data. With Sys.setlocale("LC_TIME", 'C') we ensure that month abbreviations (%b) are correctly interpreted in a non-English locale. This vector is the time index of a new zoo object.

library(zoo)

Sys.setlocale("LC_TIME", 'C')
idx <- as.yearmon(row.names(unemployUSA), format='%b.%Y')
unemployUSA <- zoo(unemployUSA, idx)
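The conversion can be tried on a handful of labels. This is a minimal sketch that assumes row names of the form "Jan.2000", matching the %b.%Y format used above; the three labels are a hypothetical subset chosen for illustration.

```r
library(zoo)

## Hypothetical subset of the row names produced by the transpose.
rn <- c("Jan.2000", "Feb.2000", "Dec.2011")

Sys.setlocale("LC_TIME", "C")     # make %b match English abbreviations
toyIdx <- as.yearmon(rn, format = "%b.%Y")

toyIdx
## Internally a yearmon is the year plus a fraction of 1/12 per month.
as.numeric(toyIdx)
```

Because the internal representation is numeric, a yearmon index sorts chronologically and works directly as the time index of a zoo object.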

Finally, those rows with NA values are removed.

isNA <- apply(is.na(unemployUSA), 1, any)
unemployUSA <- unemployUSA[!isNA,]

6.3 Gross National Income and CO2 Emissions

The data catalog of the World Bank Open Data initiative includes the World Development Indicators (WDI).⁴ Among them, we will analyze the evolution of the relationship between Gross National Income (GNI) and CO2 emissions for a set of countries. The WDI package is able to search and download these data series.

library(WDI)

CO2data <- WDI(indicator=c('EN.ATM.CO2E.PC', 'EN.ATM.CO2E.PP.GD',
           'NY.GNP.MKTP.PP.CD', 'NY.GNP.PCAP.PP.CD'),
        start=2000, end=2011,
        country=c('BR', 'CN', 'DE', 'ES',
           'FI', 'FR', 'GR', 'IN', 'NO', 'US'))

names(CO2data) <- c('iso2c', 'Country.Name', 'Year',
                'CO2.capita', 'CO2.PPP',
                'GNI.PPP', 'GNI.capita')

Only two minor modifications are needed: removing the rows with missing values and converting the Country.Name column into a factor. The first modification avoids problems when displaying the time series, and the factor conversion will be useful for grouping.

isNA <- apply(is.na(CO2data), 1, any)
CO2data <- CO2data[!isNA, ]

CO2data$Country.Name <- factor(CO2data$Country.Name)
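Both modifications can be reproduced on a small table. The following is a minimal sketch with an invented data frame (the country rows and CO2 values are hypothetical, not downloaded from the World Bank). It also checks that the row mask agrees with the complete.cases function of base R.

```r
## Hypothetical values for illustration only.
toyCO2 <- data.frame(Country.Name = c("Spain", "Spain", "Norway", "Norway"),
                     Year = c(2000, 2001, 2000, 2001),
                     CO2.capita = c(7.1, NA, 8.6, 8.9),
                     stringsAsFactors = FALSE)

## TRUE for rows with at least one missing value.
isNA <- apply(is.na(toyCO2), 1, any)
## Same mask as the complement of complete.cases.
identical(unname(isNA), !complete.cases(toyCO2))

toyCO2 <- toyCO2[!isNA, ]
toyCO2$Country.Name <- factor(toyCO2$Country.Name)
levels(toyCO2$Country.Name)
table(toyCO2$Country.Name)      # rows left per country
```

Converting to a factor after the filtering step means that the levels reflect only the countries that still have data.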

¹ The name and location data of these stations are available at the GitHub repository of the paper (Antonanzas-Torres, Cañizares, and O. Perpiñán 2013).

² http://eportal.magrama.gob.es/websiar

³ http://www.bls.gov/webapps/legacy/cpsatab14.htm

⁴ http://databank.worldbank.org/Data/Views/VariableSelection/SelectVariables.aspx
