Appendix B

Working with Data Frames and Arrays

B.1 Resampling and Data Partitioning

B.1.1 Using the boot function

The bootstrap is implemented in the boot function of the boot package [34], which provides functions and data sets for the book [63]. In the ordinary bootstrap, the samples are selected with replacement. The basic syntax for the ordinary bootstrap is

  boot(data, statistic, R)

where data is the observed sample and R is the number of bootstrap replicates. The default is sim = "ordinary", the ordinary bootstrap (sampling with replacement).

The second argument (statistic) is a function, or the name of a function, which calculates the statistic to be replicated. Suppose we call this function f. The boot function generates the random indices $i = (i_1, \dots, i_n)$ for each bootstrap replicate, and passes to the function f a copy of the data and the index vector i. The function f then computes the statistic $\hat{\theta}^{(b)}$ corresponding to the resampled observations. Example B.1 discusses how to extract the samples for the calculations inside f.

Example B.1 (Extracting a bootstrap sample using an index vector)

We have seen that the sample function can be used to sample from a vector with replacement. Equivalently, if x is a vector of length n, we can sample with replacement from the vector of indices 1:n, and use the resulting index vector to extract the elements of x. Notice that the two methods below generate the same samples.

  > set.seed(123)

  > sample(letters[1:10], size = 10, replace = TRUE)

  [1] "c" "h" "e" "i" "j" "a" "f" "i" "f" "e"

  > set.seed(123)

  > i <- sample(1:10, size = 10, replace = TRUE)

  > letters[i]

  [1] "c" "h" "e" "i" "j" "a" "f" "i" "f" "e"

Similarly, the [] operator can be used to extract bootstrap samples from data frames and matrices using x[i,].

  > x

       [,1] [,2] [,3] [,4]

  [1,]   16   14   17   12

  [2,]   14   13   16   14

  [3,]   13   13   14   11

  [4,]   19   11   15   11

  [5,]   14   10    8   11

  > i

  [1] 1 3 3 2 1

  > x[i,]

       [,1] [,2] [,3] [,4]

  [1,]   16   14   17   12

  [2,]   13   13   14   11

  [3,]   13   13   14   11

  [4,]   14   13   16   14

  [5,]   16   14   17   12

The boot function will pass a copy of the observed sample x and the bth index vector i; the user’s function f (statistic) should compute the test statistic on x[i,] or x[i]. For example, if x is a bivariate sample, and the statistic to replicate is correlation, then the function f can be written as follows.

  f <- function(x, i) {

    cor(x[i, 1], x[i, 2])

  }

For a resampling experiment, it is helpful to code the calculations for the statistic in a function like f above, whether or not the boot function will be used to run the bootstrap.
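As a minimal sketch of how such a function plugs into boot, the following runs an ordinary bootstrap of the correlation. The simulated bivariate sample, the seed, and the choice R = 200 are assumptions made only for illustration.

```r
library(boot)  # recommended package distributed with R

set.seed(1)
# hypothetical bivariate sample: 30 rows, 2 correlated columns
z <- rnorm(30)
x <- cbind(z + rnorm(30), z + rnorm(30))

# statistic: correlation computed on the resampled rows x[i, ]
f <- function(x, i) {
  cor(x[i, 1], x[i, 2])
}

b <- boot(data = x, statistic = f, R = 200)
b$t0      # statistic computed on the observed sample
sd(b$t)   # bootstrap estimate of the standard error
```

The replicates are stored in b$t, so further summaries (bias, quantiles) can be computed from that component.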

B.1.2 Sampling without replacement

The boot function can also be applied in situations where the resampling should be without replacement. For example, in permutation tests, the method of resampling should be sim = "permutation".

If boot is not used, then it is necessary to generate for each replicate a permutation of the sample observations. To obtain a permutation of the sample observations in a data frame or matrix x, use x[i,], where i is a permutation of the indices of the sample elements. A permutation of the integers 1:n is generated by sample(1:n).
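Without boot, the permutation replicates can be generated in a loop. The sketch below assumes a hypothetical two-sample setting; the group sizes, the seed, and the difference-in-means statistic are illustrative choices, not taken from the text.

```r
set.seed(1)
# hypothetical pooled sample: first 10 values are group 1, last 10 are group 2
y <- c(rnorm(10), rnorm(10, mean = 1))
n <- length(y)

# statistic: difference in group means for the ordering given by index i
stat <- function(y, i) {
  yp <- y[i]
  mean(yp[11:20]) - mean(yp[1:10])
}

t0 <- stat(y, 1:n)                            # observed statistic
reps <- replicate(999, stat(y, sample(1:n)))  # sample(1:n) is a random permutation
p <- mean(c(t0, reps) >= t0)                  # one-sided permutation p-value
p
```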

In situations like the jackknife and cross-validation, it is more convenient to specify what should not be extracted. To specify which elements to exclude, use the [] operator with a negative argument. For example, to extract all but row i of a matrix A, use A[-i,]. In general, i can be a vector and A[-i,] extracts a submatrix from A that excludes the rows indexed by i.
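For instance, negative indexing gives the leave-one-out samples of the jackknife directly. The following is a small sketch with a hypothetical sample and the sample mean as the statistic.

```r
# hypothetical sample
x <- c(16, 14, 13, 19, 14, 10, 8, 11)
n <- length(x)

# leave-one-out replicates: x[-i] drops the i-th observation
theta.jack <- sapply(1:n, function(i) mean(x[-i]))

# jackknife estimate of the standard error of the mean
se.jack <- sqrt((n - 1) / n * sum((theta.jack - mean(theta.jack))^2))
se.jack
```

For the sample mean, this jackknife estimate agrees with the usual formula sd(x)/sqrt(n), which gives a convenient check on the indexing.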

Example B.2 (Extracting rows from a matrix)

  > A <- matrix(1:25, 5, 5)

  > A[-(2:3),]

   [,1] [,2] [,3] [,4] [,5]

  [1,] 1  6 11  16  21

  [2,] 4  9 14  19  24

  [3,] 5 10 15  20  25

  > A[-(2:3), 4]

 [1] 16 19 20

In the last line, notice that the result has been converted to a vector. To extract the 3 × 1 matrix use as.matrix(A[-(2:3), 4]).
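An alternative to as.matrix, shown here as a brief sketch, is the drop argument of the [] operator, which keeps the matrix dimensions at the time of extraction.

```r
A <- matrix(1:25, 5, 5)

v <- A[-(2:3), 4]                # dimensions are dropped: a vector
m <- A[-(2:3), 4, drop = FALSE]  # dimensions are kept: a 3 x 1 matrix

dim(v)
dim(m)
```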

A random sample of size k or n − k can be selected without replacement from a sample x of size n by

  i <- sample(1:n, size = k)

  x1 <- x[i,]

  x2 <- x[-i,]

Then {x1, x2} form a partition of the original sample x.
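A runnable version of this partition might look as follows; the simulated matrix and the value k = 7 are assumptions for illustration.

```r
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)  # hypothetical sample with n = 10 rows
n <- nrow(x)
k <- 7

i <- sample(1:n, size = k)  # k distinct row indices
x1 <- x[i, ]                # the k selected rows
x2 <- x[-i, ]               # the remaining n - k rows

nrow(x1)
nrow(x2)
```

Note that if k = 1 or n - k = 1, the corresponding result is dropped to a vector unless drop = FALSE is specified.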

Some exact tests require that all permutations of a sample be generated. The permutations function in package e1071 [72] generates a matrix containing all n! permutations of an index set 1:n. Each row of the returned matrix is a permutation of 1:n.

To generate random two-way contingency tables with given marginals see the function r2dtable.
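As a short sketch of r2dtable, the call below generates two random 2 × 2 tables; the row and column totals are hypothetical and must have the same grand total.

```r
set.seed(1)
r <- c(3, 7)    # hypothetical row totals
cc <- c(4, 6)   # hypothetical column totals (same grand total as r)

tabs <- r2dtable(2, r, cc)  # a list of 2 random tables
tabs[[1]]
rowSums(tabs[[1]])          # reproduces the given row totals
```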

B.2 Subsetting and Reshaping Data

When working with real data, it is often the case that the format or layout of the data does not match what is required by the methods one would like to apply, there are missing values, or other issues. R provides several utilities for reshaping a dataset. The following simple examples illustrate some of the operations that are possible, such as merging, subsetting or reshaping data. These operations can be very complicated and difficult in practice. Refer to the documentation for each of the individual topics for more detailed explanations and examples.

The examples that follow are provided for convenient reference on a few special topics only, and readers should refer to one of the references for a good introduction to data analysis using R, such as Dalgaard [62] or Verzani [280].

B.2.1 Subsetting Data

Subsets of data frames can be extracted using the operators $, [[]], and array indexing [], as shown above. The subset function provides another approach to subsetting data. The subset function expects the name of the data set, the condition satisfied (subset) by the desired subset, and/or a list of variables (select).

Example B.3 (Subsetting data frames)

Means and summary statistics computed for the iris data in Examples 1.1 and 1.4 can also be computed as follows. The first subset uses the condition that the species is versicolor and selects the variable petal length. The second subset selects sepal length and width without restricting species.

  # versicolor petal length

  y <- subset(iris, Species == "versicolor",

   select = Petal.Length)

  summary(y)

  Petal.Length

  Min.  :3.00

  1st Qu.:4.00

  Median :4.35

  Mean  :4.26

  3rd Qu.:4.60

  Max.  :5.10

  # sepal length and width, all species

  y <- subset(iris, select = c(Sepal.Length, Sepal.Width))

  colMeans(y)

  Sepal.Length Sepal.Width

  5.843333 3.057333

B.2.2 Stacking/Unstacking Data

A data frame or list can be stacked or unstacked using the stack (unstack) function.

Example B.4 (Unstacking data)

The InsectSprays data frame contains two variables, count (an integer) and spray (a factor). The format is stacked. The first few observations are shown below.

  > attach(InsectSprays)

  > InsectSprays

  count spray

  1 10 A

  2  7 A

  3 20 A

  4 14 A

  5 14 A

  6 12 A

  . . .

The data can be unstacked by the default formula unstack(InsectSprays), or by explicitly specifying the formula as shown below.

  > unstack(InsectSprays, count ~ spray)

  A B C D E F

  1 10 11 0 3 3 11

  2  7 17 1 5 5 9

  3 20 21 7 12 3 15

  4 14 11 2 6 5 22

  5 14 16 3 4 3 15

  6 12 14 1 3 6 16

  7 10 17 2 5 1 13

  8 23 17 1 5 1 10

  9 17 19 3 5 3 26

  10 20 21 0 5 2 26

  11 14 7 1 2 6 24

  12 13 13 4 4 4 13

If the result is stored in an object u, then the unstacking could be reversed by stack(u). In the result of stack(u), the counts would then be labeled “values” and the spray (indices) will be labeled “ind”.
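The round trip can be sketched as follows. Renaming the columns at the end is an optional step added here to recover the original variable names.

```r
u <- unstack(InsectSprays, count ~ spray)  # one column per spray
s <- stack(u)                              # back to the stacked layout
names(s)                                   # the default labels

names(s) <- c("count", "spray")            # restore the original names
head(s)
```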

R note B.1 The formula count ~ spray represents the linear model where the response is count and the single predictor is the factor spray. An intercept term is included by default. The default model formula associated with a data frame is supplied by formula. For example, the default formula associated with the iris data is the following one, which might not be what is expected.

  > formula(iris)

  Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species

B.2.3 Merging Data Frames

Two data frames can be merged by common variable (column) names or common row names, using the merge function.

Example B.5 (Merge by ID)

In this example, we have created two sets of scores, data1 and data2. The common variable is the ID number in the first column. This example is typical of repeated measurement data. We wish to merge the two scores into a single data frame, by ID. The ID is the first variable in data1 and the first variable in data2, so by=c(1,1) specifies that the merge will match by ID. In the first version below, only the observations with common ID numbers, labeled “V1” will be retained in the new data set. This corresponds to a listwise deletion of any subjects with missing values.

  data1

  [,1] [,2]

  [1,] 1 9

  [2,] 2  12

  [3,] 3 9

  [4,] 4  13

  [5,] 5  13

  data2

   [,1] [,2]

  [1,] 3 6

  [2,] 4  10

  [3,] 5  13

  [4,] 6  10

  [5,] 7  10

Now merge the data sets. By default, only the complete cases are included in the result. In the second version below, all observations are retained in the new data set. Missing scores are assigned the missing value NA.

The syntax is

  merge(x, y) #default

  merge(x, y, by = intersect(names(x), names(y)),

   by.x = by, by.y = by, all = FALSE, ...)

where ... indicates more arguments (see the help topic).

  # keep only the common ID’s

  merge(data1, data2, by=c(1,1))

  V1 V2.x V2.y

  1  3 9 6

  2  4  13  10

  3  5  13  13

  #keep all observations

  merge(data1, data2, by=c(1,1), all=TRUE)

  V1 V2.x V2.y

  1  1 9  NA

  2  2  12  NA

  3  3 9 6

  4  4  13  10

  5  5  13  13

  6  6  NA  10

  7  7  NA  10
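When the columns are named, the merge can be specified by name rather than by position. The sketch below stores the scores of this example in data frames with the assumed column names ID and Score, and also shows all.x = TRUE, which keeps every row of the first data frame only.

```r
# the scores from this example, stored with (assumed) column names
data1 <- data.frame(ID = 1:5, Score = c(9, 12, 9, 13, 13))
data2 <- data.frame(ID = 3:7, Score = c(6, 10, 13, 10, 10))

inner <- merge(data1, data2, by = "ID")                # common IDs only
full  <- merge(data1, data2, by = "ID", all = TRUE)    # all IDs
left  <- merge(data1, data2, by = "ID", all.x = TRUE)  # every ID in data1

names(full)
nrow(inner); nrow(full); nrow(left)
```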

B.2.4 Reshaping Data

Suppose we need to reshape Example B.5 data into a “long” format, introducing a time variable. The reshape function is provided to convert between the “wide” and “long” formats. The syntax is

  reshape(data, varying, v.names, timevar, idvar, ids,

   times, drop, direction, new.row.names,

   sep, split)

and all of the parameters except data and direction have default values. To keep all observations, first merge the data with all=TRUE, as in Example B.5. The repeated measurements or time-varying measurements are specified by varying. The direction is “wide” or “long.”

Example B.6 (Reshape)

Convert Example B.5 data from “wide” to “long” format.

  #keep all observations

  a <- merge(data1, data2, by=c(1,1), all=TRUE)

  reshape(a, idvar="ID", varying=c(2,3),

  direction="long", v.names="Scores")

  V1 time Scores ID

  1.1 1 1  9  1

  2.1 2 1 12  2

  3.1 3 1  9  3

  4.1 4 1 13  4

  5.1 5 1 13  5

  6.1 6 1 NA  6

  7.1 7 1 NA  7

  1.2 1 2 NA  1

  2.2 2 2 NA  2

  3.2 3 2  6  3

  4.2 4 2 10  4

  5.2 5 2 13  5

  6.2 6 2 10  6

  7.2 7 2 10  7
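With named columns the arguments of reshape are easier to read. The following sketch rebuilds the merged scores with the assumed column names ID, Score1, and Score2, and converts them to long format.

```r
# merged scores with (assumed) descriptive column names
wide <- data.frame(ID = 1:7,
                   Score1 = c(9, 12, 9, 13, 13, NA, NA),
                   Score2 = c(NA, NA, 6, 10, 13, 10, 10))

long <- reshape(wide, direction = "long",
                idvar = "ID",                     # subject identifier
                varying = c("Score1", "Score2"),  # the repeated measurements
                v.names = "Score",                # name of the stacked variable
                timevar = "time", times = 1:2)    # measurement occasion
long
```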

B.3 Data Entry and Data Analysis

B.3.1 Manual Data Entry

A spreadsheet-like interface to create a data frame is provided in the edit function.

  mydata <- edit(data.frame())

This command opens a spreadsheet-like editor for data entry. When the editor is closed, a data frame mydata is created. Then mydata can be edited by edit(mydata). It is probably easier to enter a large data set in a spreadsheet and read it into a data frame via read.table, described below.

B.3.2 Recoding Missing Values

The first step in recoding missing values is to find the missing values. The function is.na tests for missing values, returning logical values. The which function returns the indices of a logical vector that are TRUE. Applying which to the result of is.na gives a vector containing the indices of the missing values. Then if i contains the indices of the missing data of a vector x, recoding NA to 0, for example, is as simple as x[i] <- 0.

Example B.7 (Recode)

With the repeated measures data in Example B.6, recode the missing scores to 0. The function is.na tests for missing values. Extract the row indices of the missing scores using the which function. Below, which returns the indices 6,7,8,9, indicating that scores with those subscripts are missing.

  #store the previous result into b

  b <- reshape(a, idvar="ID", varying=c(2,3),

    direction="long", v.names="Scores")

  i <- which(is.na(b$Scores)) #these are missing

Now the indices stored in i are 6, 7, 8, 9, and we replace the corresponding NA’s with 0.

  b$Scores[i] <- 0 #replace NA with 0

  b

   V1 time Scores ID

  1.1  1 1  9  1

  2.1  2 1 12  2

  3.1  3 1  9  3

  4.1  4 1 13  4

  5.1  5 1 13  5

  6.1  6 1  0  6

  7.1  7 1  0  7

  1.2  1 2  0  1

  2.2  2 2  0  2

  3.2  3 2  6  3

  4.2  4 2 10  4

  5.2  5 2 13  5

  6.2  6 2 10  6

  7.2  7 2 10  7

The which function can also be used to extract array indices, by setting arr.ind=TRUE. From the result of the second version of the merge operation in Example B.5, we can extract the array indices of the missing values as follows.

  m <- merge(data1, data2, by=c(1,1), all=TRUE)

  i <- which(is.na(m), arr.ind=TRUE) #these are missing

  > i

  row col

  [1,] 6  2

  [2,] 7  2

  [3,] 1  3

  [4,] 2  3

B.3.3 Reading and Converting Dates

A time series for financial data usually has a calendar date corresponding to each observation. In this section we discuss some basic methods for importing files with dates, converting dates to useful formats, and extracting the day, month, and year. Date arithmetic and formatting is a complicated subject, however, and depends in part on the locale. Refer to the R manual [217] for thorough documentation.

Our first example illustrates how to convert a string format date from “mm/dd/yyyy” format into “yyyymmdd”. See the help topics for as.Date, format.Date, and strptime for more details and other examples.

Example B.8 (Date formats)

Convert the string representation of a date into a date object, and display the result in several formats. The default format is “yyyy-mm-dd”. The date is printed in four different formats below.

  d <- "3/27/1995"

  thedate <- as.Date(d, "%m/%d/%Y")

  print(thedate)

  [1] "1995-03-27"

  print(format(thedate, "%Y%m%d"))

  [1] "19950327"

  print(format(thedate, "%B %d, %Y"))

  [1] "March 27, 1995"

  print(format(thedate, "%y-%b-%d"))

  [1] "95-Mar-27"

To extract year, month, day, or other components from the date or time, we can use the POSIXlt date-time class (?DateTimeClasses).

Example B.9 (Date-time class)

Continuing with the previous example, use the POSIXlt date-time class to extract the year, month, and day from the date 1995-03-27. The commands and results are below. Notice that the months Jan., ..., Dec. are numbered 0, 1, ...,11, and year is years since 1900.

  > pdate <- as.POSIXlt(thedate)

  > print(pdate$year)

 [1] 95

  > print(pdate$mon)

  [1] 2

  > print(pdate$mday)

 [1] 27

Type ?DateTimeClasses to see the documentation on the date-time objects POSIXlt and POSIXct.
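Because of these offsets, the components usually need a small adjustment before use. The sketch below converts them to the calendar year and month, and shows the numeric format codes as an alternative route to the same values.

```r
thedate <- as.Date("3/27/1995", "%m/%d/%Y")
pdate <- as.POSIXlt(thedate)

year  <- pdate$year + 1900  # year is stored as years since 1900
month <- pdate$mon + 1      # months are stored as 0, ..., 11
day   <- pdate$mday

# the same components via numeric format codes
as.integer(format(thedate, "%Y"))
as.integer(format(thedate, "%m"))
as.integer(format(thedate, "%d"))
```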

B.3.4 Importing/exporting .csv files

Data is often supplied in comma-separated-values (.csv) format, which is a text file that separates data with special text characters called delimiters. Files in .csv format can be opened in most spreadsheet applications. Spreadsheet data should be saved in .csv format before importing into R. In a .csv file, the dates are likely to be given as strings, delimited by double quotation marks.

Example B.10 (Importing/exporting .csv files)

This example illustrates how to export the contents of a data frame to a .csv file, and how to import the data from a .csv file into an R data frame.

  #create a data frame

  dates <- c("3/27/1995", "4/3/1995",

   "4/10/1995", "4/18/1995")

  prices <- c(11.1, 7.9, 1.9, 7.3)

  d <- data.frame(dates=dates, prices=prices)

  #create the .csv file

  filename <- "/Rfiles/temp.csv"

  write.table(d, file = filename, sep = ",",

   row.names = FALSE)

The new file “temp.csv” can be opened in most spreadsheets. When displayed in a text editor (not a spreadsheet), the file “temp.csv” contains the following lines (without the leading spaces).

  "dates","prices"

  "3/27/1995",11.1

  "4/3/1995",7.9

  "4/10/1995",1.9

  "4/18/1995",7.3

Most .csv format files can be read using read.table. In addition there are functions read.csv and read.csv2 designed for .csv files.

  #read the .csv file

  read.table(file = filename, sep = ",", header = TRUE)

  read.csv(file = filename) #same thing

  dates prices

  1 3/27/1995 11.1

  2 4/3/1995 7.9

  3 4/10/1995 1.9

  4 4/18/1995 7.3

See Example B.8 for converting the character representation of the dates to date objects.
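Putting the two steps together, a round trip through a .csv file followed by the date conversion might be sketched as follows. The temporary file created by tempfile is an assumption that keeps the sketch self-contained; any writable path would do.

```r
dates <- c("3/27/1995", "4/3/1995", "4/10/1995", "4/18/1995")
prices <- c(11.1, 7.9, 1.9, 7.3)
d <- data.frame(dates = dates, prices = prices)

filename <- tempfile(fileext = ".csv")  # assumption: a temporary file
write.csv(d, file = filename, row.names = FALSE)

d2 <- read.csv(filename)
# as.character guards against the column having been read as a factor
d2$dates <- as.Date(as.character(d2$dates), "%m/%d/%Y")
diff(d2$dates)    # days between consecutive observations

unlink(filename)  # remove the temporary file
```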

B.3.5 Examples of data entry and analysis

Although it is not the subject of this text, users new to R generally need to know how to analyze typical textbook examples with small data sets. For Monte Carlo studies, one also may need to extract certain results from a fitted model. We conclude this section with a few simple examples of this type.

Stacked data entry

Example B.11 (One-way ANOVA)

Weight measurements are collected for two treatment groups of subjects and a control group. This is a completely randomized design, and we want to obtain the one-way Analysis of Variance (ANOVA). The layout of the data is the one-way layout, and for ANOVA we will need stacked data. The factor has three levels. Here we create a vector for the response variable (weight) and a vector for the group variable, encoding it as a factor. See Example B.13 for another approach to stacking the data for the one-way layout.

  # One-way ANOVA example

  # Completely randomized design

  ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)

  trt1 <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)

  trt2 <- c(5.19,3.33,3.20,3.13,6.46,5.36,6.95,4.19,3.16,4.95)

  group <- factor(rep(1:3, each=10))  #factor

  weight <- c(ctl, trt1, trt2)  #response

  a <- lm(weight ~ group)

Note that encoding the group variable as a factor is important. If group is not a factor, but simply a vector of integers, then lm will fit a regression model. The output for anova is the ANOVA table. More detailed output is available with the summary method.
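The effect of the encoding can be seen in the degrees of freedom. In the sketch below (the names g.num and g.fac are introduced only for this comparison), the factor uses 2 degrees of freedom for three group means, while the integer vector is treated as a single numeric predictor.

```r
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt1 <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
trt2 <- c(5.19,3.33,3.20,3.13,6.46,5.36,6.95,4.19,3.16,4.95)
weight <- c(ctl, trt1, trt2)

g.num <- rep(1:3, each = 10)  # integer codes
g.fac <- factor(g.num)        # factor with three levels

anova(lm(weight ~ g.fac))$Df  # one-way ANOVA: group and error df
anova(lm(weight ~ g.num))$Df  # simple linear regression: slope and error df
```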

  > anova(a)      #brief summary

  Analysis of Variance Table

  Response: weight

            Df  Sum Sq Mean Sq F value Pr(>F)

  group      2  1.1200  0.5600  0.5656 0.5746

  Residuals 27 26.7344  0.9902

  > summary(a)     #more detailed summary

  Call:

  lm(formula = weight ~ group)

  Residuals:

       Min      1Q  Median      3Q     Max

   -1.4620 -0.5245  0.0685  0.5005  2.3580

  Coefficients:

              Estimate Std. Error t value Pr(>|t|)

  (Intercept)   5.0320     0.3147  15.991 2.71e-15 ***

  group2       -0.3710     0.4450  -0.834    0.412

  group3       -0.4400     0.4450  -0.989    0.332

  ---

  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Residual standard error: 0.9951 on 27 degrees of freedom

  Multiple R-Squared: 0.04021, Adjusted R-squared: -0.03089

  F-statistic: 0.5656 on 2 and 27 DF, p-value: 0.5746

Extracting statistics and estimates from fitted models

In Monte Carlo studies, we often want to extract the p-values, F statistics, or R-squared values from the analysis, rather than print a summary of it. The following example shows how to extract various results from an anova object or the summary.

Example B.12 (Extract p-values and statistics from ANOVA)

To extract p-values, F statistics and other information from the anova object or result of summary, we need the names of these values. Then the information can be extracted by name or by position using square brackets. (This example continues from the analysis in Example B.11.)

  A <- anova(a)

  names(A)

  [1] "Df" "Sum Sq" "Mean Sq" "F value" "Pr(>F)"

Then, suppose we need the F statistic. It is a vector of length 2, corresponding to the two rows in the ANOVA table. The F statistic in each row corresponds to the factor in the same row.

  > A$"F value"

  [1] 0.5655666    NA

  > A$"F value"[1]

  [1] 0.5655666

Similarly, we can use names to find the names of the values in the object returned by the summary method.

  B <- summary(a)

  names(B)

  [1] "call" "terms" "residuals" "coefficients" "aliased"

  [6] "sigma" "df" "r.squared" "adj.r.squared" "fstatistic"

  [11] "cov.unscaled"

Now suppose that we want to extract the residual standard error (the square root of the MSE), the R-squared, and the degrees of freedom for error from this model.

  > B$sigma

  [1] 0.9950695

  > B$r.squared

  [1] 0.0402093

  > B$df[2]

  [1] 27
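In a Monte Carlo study these extractions are typically wrapped in a function and repeated. The following is a minimal sketch, assuming a null model with no group effect, standard normal errors, and 200 replicates; the function name one.pvalue is hypothetical.

```r
set.seed(1)
group <- factor(rep(1:3, each = 10))

# p-value of the F test for one data set simulated under the null hypothesis
one.pvalue <- function() {
  w <- rnorm(30)  # assumed null model: no group effect
  anova(lm(w ~ group))$"Pr(>F)"[1]
}

pvals <- replicate(200, one.pvalue())
mean(pvals < 0.05)  # empirical Type I error rate at the 5% level
```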

Create data frame in stacked layout

The next example shows an alternate method for entering data in the one-way layout. In this case, we create a data frame and use the stack function.

Example B.13 (Stacked data entry)

The small data set in this example is given in Case Study 12.3.1 of Larsen and Marx [170]. The factor (type of antibiotic) has five levels. The response variable measures the binding of the drug to serum proteins. The layout of the data frame must be stacked for the ANOVA.

  P <- c(29.6, 24.3, 28.5, 32)

  T <- c(27.3, 32.6, 30.8, 34.8)

  S <- c(5.8, 6.2, 11, 8.3)

  E <- c(21.6, 17.4, 18.3, 19)

  C <- c(29.2, 32.8, 25, 24.2)

  #glue the columns together in a data frame

  x <- data.frame(P, T, S, E, C)

  #now stack the data for ANOVA

  y <- stack(x)

  names(y) <- c("Binding", "Antibiotic")

The first few rows of the stacked data in y are

  Binding Antibiotic

  1 29.6   P

  2 24.3   P

  3 28.5   P

  4 32.0   P

  5 27.3   T

  6 32.6   T

  . . .

and this data is in the one-way layout for ANOVA. Now y is a data frame, so there is a default formula associated with it.

  > #check the default formula

  > print(formula(y)) #default formula is right one

  Binding ~ Antibiotic

As the default formula is the same model that we want to fit, lm can be applied without specifying the formula.

  > lm(y)

  Call:

  lm(formula = y)

  Coefficients:

  (Intercept) AntibioticE AntibioticP AntibioticS AntibioticT

       27.800      -8.725       0.800     -19.975       3.575

  > anova(lm(y))

  Analysis of Variance Table

  Response: Binding

             Df  Sum Sq Mean Sq F value   Pr(>F)

  Antibiotic  4 1480.82  370.21  40.885 6.74e-08 ***

  Residuals  15  135.82    9.05

  ---

  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Statistics, p-values, and estimates can be extracted from the fitted model in the same way as shown in Example B.12.

Example B.14 (Two-way ANOVA)

The leafshape (DAAG) [185] data is already in stacked format, with two factors: location and leaf architecture (arch).

  > data(leafshape, package = "DAAG")

  > anova(lm(petiole ~ location * arch, data = leafshape))

  Analysis of Variance Table

  Response: petiole

                 Df Sum Sq Mean Sq F value    Pr(>F)

  location        5  209.9    42.0  1.8107    0.1108

  arch            1 1098.5  1098.5 47.3786 3.983e-11 ***

  location:arch   5  232.6    46.5  2.0066    0.0779 .

  Residuals     274 6352.8    23.2

  ---

  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Use the formula petiole~location+arch to fit the model without the interaction term.
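The additive and interaction models can also be compared directly with anova. The sketch below swaps in the warpbreaks data from the datasets package (response breaks, factors wool and tension) as a stand-in, so that it runs without the DAAG package; it is not the leafshape analysis.

```r
# stand-in data: warpbreaks ships with R
fit.add <- lm(breaks ~ wool + tension, data = warpbreaks)  # no interaction
fit.int <- lm(breaks ~ wool * tension, data = warpbreaks)  # with interaction

cmp <- anova(fit.add, fit.int)  # F test for the interaction term
cmp
```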

Example B.15 (Multiple comparisons)

In Example B.13, one can follow up with a multiple comparison procedure to decide which means are significantly different. One such method is Tukey’s procedure. The critical value of the studentized range statistic at α = 0.05 can be obtained by

  qtukey(p = .95, nmeans = 5, df = 15)

  [1] 4.366985

For TukeyHSD use aov to fit the model rather than lm.

  #alternately: Tukey Honest Significant Difference

 a <- aov(formula(y), data = y)

 TukeyHSD(a, conf.level=.95)

  Tukey multiple comparisons of means

  95% family-wise confidence level

 Fit: aov(formula = formula(y), data = y)

  $Antibiotic

   diff   lwr  upr  p adj

  E-C  -8.725 -15.295401  -2.154599  0.0071611

  P-C 0.800 -5.770401 7.370401  0.9952758

  S-C -19.975 -26.545401 -13.404599  0.0000010

  T-C 3.575 -2.995401  10.145401  0.4737713

  P-E 9.525  2.954599  16.095401  0.0034588

  S-E -11.250 -17.820401  -4.679599  0.0007429

  T-E  12.300  5.729599  18.870401  0.0003007

  S-P -20.775 -27.345401 -14.204599  0.0000006

  T-P 2.775 -3.795401 9.345401  0.6928357

  T-S  23.550 16.979599  30.120401  0.0000001
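The critical value from qtukey connects to these intervals through the half-width q·sqrt(MSE/n), where n = 4 is the number of observations per antibiotic. The sketch below recomputes that half-width from the data of Example B.13; the names MSE, q, and hsd are introduced here only for illustration.

```r
x <- data.frame(P = c(29.6, 24.3, 28.5, 32),
                T = c(27.3, 32.6, 30.8, 34.8),
                S = c(5.8, 6.2, 11, 8.3),
                E = c(21.6, 17.4, 18.3, 19),
                C = c(29.2, 32.8, 25, 24.2))
y <- stack(x)
names(y) <- c("Binding", "Antibiotic")

a <- aov(Binding ~ Antibiotic, data = y)
MSE <- sum(residuals(a)^2) / df.residual(a)  # mean squared error
q <- qtukey(p = .95, nmeans = 5, df = 15)    # studentized range critical value
hsd <- q * sqrt(MSE / 4)                     # half-width of each interval
hsd

tk <- TukeyHSD(a, conf.level = .95)$Antibiotic
(tk[, "upr"] - tk[, "lwr"]) / 2              # should agree with hsd
```

Two means whose difference exceeds hsd in absolute value are declared significantly different at the 5% family-wise level.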

Example B.16 (Regression)

Other examples of formulas, for regression rather than ANOVA, are the following (see, e.g., Example 7.17).

  library(DAAG)

  attach(ironslag)

  # simple linear regression model

  lm(magnetic ~ chemical)

  # quadratic regression model

  lm(magnetic ~ chemical + I(chemical^2))

  # exponential regression model

  lm(log(magnetic) ~ chemical)

  # log-log model

  lm(log(magnetic) ~ log(chemical))

  # cubic polynomial model

  lm(magnetic ~ poly(chemical, degree = 3))

  detach(ironslag)

  detach(package:DAAG)

In the quadratic model, the “as is” operator I() indicates that the exponentiation operator is an arithmetic operator, and should not be interpreted as a formula operator. Note that poly evaluates an orthogonal polynomial. With ironslag attached, the columns returned by poly are uncorrelated, whereas chemical and chemical^2 are highly correlated.

  > cor(poly(chemical, 2)) #uncorrelated

                1             2

  1  1.000000e+00 -4.956837e-18

  2 -4.956837e-18  1.000000e+00

  > cor(chemical, chemical^2) #correlated

  [1] 0.9919215
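The role of I() can be checked by counting coefficients. The sketch below uses the cars data from the datasets package as a stand-in for ironslag, so that it runs without DAAG; the fitted models are illustrative only.

```r
# stand-in data: cars (stopping distance dist versus speed)
fit1 <- lm(dist ~ speed + speed^2, data = cars)     # ^ acts as a formula operator
fit2 <- lm(dist ~ speed + I(speed^2), data = cars)  # quadratic term is kept
fit3 <- lm(dist ~ poly(speed, 2), data = cars)      # orthogonal polynomial basis

length(coef(fit1))  # intercept and slope only: the squared term was dropped
length(coef(fit2))  # intercept, linear, and quadratic terms
all.equal(fitted(fit2), fitted(fit3))  # same fit, different basis
```

The raw and orthogonal quadratic fits give the same fitted values; only the coefficients and their correlations differ.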
