CHAPTER 11: Descriptive Functions and Manipulating Objects


For arrays, matrices, vectors, lists, and expressions, a number of functions describe attributes of an object, and a number of others manipulate objects to create new objects. This chapter covers the descriptive functions dim(), nrow(), NROW(), ncol(), NCOL(), length(), and nchar(), and the following functions that manipulate objects: cbind() and rbind(); the apply functions; sweep(), scale(), and aggregate(); and the table functions table(), tabulate(), and ftable().

Descriptive Functions

The descriptive functions describe qualities of objects. This section discusses some descriptive functions that are useful when writing functions or creating objects. The functions are dim(), nrow(), ncol(), NROW(), NCOL(), length(), and nchar().

The Function dim()

For objects for which dimensions make sense—such as matrices, data.frames, tables, or arrays—the function dim() returns the number of levels in each of the dimensions of the object. For objects of other classes, dim() returns NULL. An example follows:

> a = 1:2
> b = 1:3
>
> dim(a)
NULL
 
> a %o% b %o% a
, , 1
 
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
 
, , 2
 
     [,1] [,2] [,3]
[1,]    2    4    6
[2,]    4    8   12
 
>
> dim(a %o% b %o% a)
[1] 2 3 2

The dimensions of the object can be changed if the product of the original dimensions equals the product of the dimensions of the result. An example follows:

> a.ar = a %o% b
  
> a.ar
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
 
> dim(a.ar)
[1] 2 3
  
> dim(a.ar)= c(3,2)
  
> a.ar
     [,1] [,2]
[1,]    1    4
[2,]    2    3
[3,]    2    6

You can find more information about dim() by entering ?dim at the R prompt.
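
As a brief illustration of the rule for changing dimensions, the following sketch assigns dimensions to a vector and then shows that a mismatched product of dimensions is rejected. The object names x and bad are illustrative only and do not come from the text.

```r
# A vector has no dimensions until they are assigned
x <- 1:12
is.null(dim(x))          # TRUE

# 3 * 4 equals length(x), so the assignment succeeds
dim(x) <- c(3, 4)
dim(x)                   # 3 4

# 2 * 3 * 2 also equals 12, so the matrix can become a 3-d array
dim(x) <- c(2, 3, 2)
dim(x)                   # 2 3 2

# 5 * 2 does not equal 12, so R signals an error and x is unchanged
bad <- tryCatch({ dim(x) <- c(5, 2); "no error" },
                error = function(e) "error")
bad                      # "error"
```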

The Functions nrow(), ncol(), NROW(), and NCOL()

For matrices, data.frames, and arrays, nrow() and ncol() give the number of levels in the first and second dimensions of the matrix, data frame, or array, respectively. For objects without dimensions, such as vectors, the functions return NULL. An example follows:

> a.ar = a%o%b
  
> a.ar
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
 
> nrow(a.ar)
[1] 2
 
> ncol(a.ar)
[1] 3
 
> nrow(1:20)
NULL

Sometimes vectors must be treated as matrices or arrays. The functions NROW() and NCOL() treat vectors as one-column matrices, but otherwise are the same as nrow() and ncol(). An example follows:

> NROW(1:20)
[1] 20
>
> NCOL(1:20)
[1] 1

You can find more information about nrow(), ncol(), NROW(), and NCOL() by entering ?nrow at the R prompt.
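
One place where NROW() and NCOL() help is inside a function that should accept either a vector or a matrixlike object. The sketch below assumes a hypothetical helper named shape(); it is not part of R or of the text.

```r
# Hypothetical helper: describe the shape of a vector or matrixlike object
shape <- function(obj) {
  c(rows = NROW(obj), cols = NCOL(obj))
}

shape(1:20)                           # a vector is treated as a 20 x 1 matrix
shape(matrix(0, 2, 3))                # 2 rows, 3 columns
shape(data.frame(u = 1:4, v = 4:1))   # 4 rows, 2 columns

# nrow() itself still returns NULL for a plain vector
is.null(nrow(1:20))                   # TRUE
```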

The Function length()

The next descriptive function is length(). The argument to length() can be an object of any mode. For atomic objects, length() returns the number of elements in the object. For list objects, length() returns the number of top level elements. For functions, length() returns one. For calls, length() returns the number of elements in the call, that is, the function name plus the arguments to the function. For names, length() returns one. For expressions, length() returns the number of elements in the expression. Some examples follow:

> mat=matrix(1:4,2,2)
> mat
     [,1] [,2]
[1,]    1    3
[2,]    2    4
 
> length(mat)
[1] 4
  
> a.list=list(mat, c("abc","cde"))
> a.list
[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
 
[[2]]
[1] "abc" "cde"
 
> length(a.list)
[1] 2
 
> a.fun = function(mu, se=1, alpha=.05){
  z_value = qnorm(1-alpha/2, mu, se)
  print(z_value)
}
 
> length(a.fun)
[1] 1
 
> a.call=call("lm", y~x)
> a.call
lm(y ~ x)
 
> length(a.call)
[1] 2
  
> a.name
`1`
 
> length(a.name)
[1] 1
 
> a.exp = expression(a.call, sin(1:5/180 * pi))
> a.exp
expression(a.call, sin(1:5/180 * pi))
 
> length(a.exp)
[1] 2

The length of an atomic or list object can be assigned using length(). For other mode objects, an attempted length() assignment returns an error. If n is the length of an atomic object, then setting the length to a value larger than n generates NAs for the extra elements. Setting the length shorter than n removes the extra elements. In either case, a vector is returned unless the length is not changed, in which case the original object is returned. An example follows:

> mat
     [,1] [,2]
[1,]    1    3
[2,]    2    4
  
> mat.2 = mat
  
> length(mat.2)=6
> mat.2
[1]  1  2  3  4 NA NA
  
> mat.2 = mat
 
> length(mat.2)=3
 
> mat.2
[1] 1 2 3
  
> mat.2 = mat
  
> length(mat.2)=4
  
> mat.2
     [,1] [,2]
[1,]    1    3
[2,]    2    4

For objects of mode list, lengthening the list adds NULL elements at the top level while shortening the list removes elements at the top level. An example follows:

> a.list
[[1]]
     cl1 cl2
[1,]   1   3
[2,]   2   4
 
[[2]]
[1] "abc" "cde"
 
> length(a.list)=4
  
> a.list
[[1]]
     cl1 cl2
[1,]   1   3
[2,]   2   4
 
[[2]]
[1] "abc" "cde"
 
[[3]]
NULL
 
[[4]]
NULL
 
> length(a.list)=3
  
> a.list
[[1]]
     cl1 cl2
[1,]   1   3
[2,]   2   4
 
[[2]]
[1] "abc" "cde"
 
[[3]]
NULL

You can find more information about length() by entering ?length at the R prompt.
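
The following short sketch repeats the length assignment rules on fresh objects. The names v, lst, and f are illustrative and are not taken from the examples in the text.

```r
v <- c(10, 20, 30)
length(v) <- 5        # lengthening an atomic vector pads with NA
v                     # 10 20 30 NA NA

lst <- list(a = 1, b = "two")
length(lst) <- 3      # lengthening a list pads with NULL
length(lst)           # 3
is.null(lst[[3]])     # TRUE

f <- function(x) x + 1
length(f)             # 1: functions have length one
```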

The Function nchar()

The function nchar() counts characters in objects that can be coerced to mode character. The function takes three arguments: x, type, and allowNA.

The argument x is the object. The function coerces the object to character, and the characters to be counted are the characters of the coerced object. For example:

> as.character(a.list)
[1] "1:4"                 "c(\"abc\", \"cde\")" "NULL"
> nchar(a.list)
[1]  3 15  4

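The quotes that enclose each string are not counted. Quotes that are part of a string, such as those inside the second element, are counted.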

The argument type is a character argument and can take on the values “bytes”, “chars”, or “width”. If “bytes” is chosen, the number of bytes in each string is counted. If “chars” is chosen, the number of characters as a reader would see them is counted. If “width” is chosen, the number of columns that the function cat() would use to print each string is counted. The default value is “chars”. For plain ASCII text, the three give the same result.

The argument allowNA is a logical argument. If set equal to TRUE, strings that are not valid are set equal to NA. If set equal to FALSE, strings that are not valid give an error and cause the function to stop. The default value is FALSE.

You can find more information about nchar() by entering ?nchar at the R prompt.
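
The difference between the values of type shows up only with text outside plain ASCII. The sketch below uses a string containing one accented character; the name s is illustrative, and enc2utf8() is used so that the string is stored as UTF-8 regardless of the locale.

```r
# "\u00e9" is the single character e-acute, stored here as UTF-8
s <- enc2utf8("caf\u00e9")

nchar(s, type = "chars")   # 4 characters
nchar(s, type = "bytes")   # 5 bytes: e-acute takes two bytes in UTF-8

# Non-character input is coerced to character before counting
nchar(12345)               # 5
nchar(c("a", "bb", ""))    # 1 2 0
```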

Manipulating Objects

There are a number of functions that manipulate R objects and make programming easier. This subsection covers some of the functions, including cbind(), rbind(), apply(), lapply(), sapply(), vapply(), tapply(), mapply(), sweep(), scale(), aggregate(), table(), tabulate(), and ftable().

The Functions cbind() and rbind()

The functions cbind() and rbind() are self-explanatory for vectors, matrices, data frames, and some other classes of objects such as time series. The function cbind() binds columns. The function rbind() binds rows.

For lists that are not matrixlike, the functions return the type and number of elements in each of the highest level elements of the list arguments, creating a matrix of the types with integers. Lists can be bound with non-list objects. The result will be a list, but the non-list arguments will not be converted like the list part of the result.

In the call to the function, the objects to be bound are separated by commas. For cbind(), vectors are treated as columns. For rbind(), vectors are treated as rows.

Vectors being bound do not have to be of the same length; shorter vectors are recycled to the length of the longest, with a warning if the lengths are not multiples of each other. When a vector is bound with a matrix, the vector is recycled or truncated to fit the matrix. Matrices bound together must agree in the number of columns for rbind() and in the number of rows for cbind(); otherwise, the functions give an error.

The resulting object takes on the type of the highest level object entered, where the hierarchy, from lowest to highest, is raw, logical, integer, double, complex, character, and list.

There is one argument to cbind() and rbind() other than the objects to be bound—the argument deparse.level, which is used to create labels for objects that are not matrixlike. The argument is an integer argument and can take on the values of 0, 1, or 2, although any value that can be coerced to an integer works. Values that do not give 1 or 2 when coerced to an integer give the same result as 0. The default value is 1.

For data frames, if a data frame is included in the objects to be bound and a list that is not a data frame is not included, then the result is a data frame. In that case, in versions of R before 4.0.0, character columns are changed to factors unless stringsAsFactors=FALSE is specified; from R 4.0.0 on, character columns are left as character by default.

For time series, cbind() gives a multivariate time series, whereas for rbind(), the time series reverts to a matrix. An example of binding a list with a vector and of binding vectors follows:

> ab.list = list(one=1:5,two=3:7)
> ab.list
$one
[1] 1 2 3 4 5
 
$two
[1] 3 4 5 6 7
 
> cbind(ab.list,1:4)
     ab.list
[1,] Integer,5 1
[2,] Integer,5 2
[3,] Integer,5 3
[4,] Integer,5 4
 
> rbind(1:3,3:5,5:7)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    3    4    5
[3,]    5    6    7
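
The recycling, type hierarchy, and deparse.level rules described above can be checked with a few short calls. This is only a sketch; the names m, u, and v are illustrative.

```r
# Shorter vectors are recycled to the length of the longest
rbind(1:4, 1:2)              # second row is 1 2 1 2

# Mixing types promotes the result to the highest type present
m <- cbind(1:3, c("a", "b", "c"))
typeof(m)                    # "character"

# deparse.level = 1 (the default) labels columns built from named objects
u <- 1:2
v <- 3:4
colnames(cbind(u, v))                      # "u" "v"
colnames(cbind(u, v, deparse.level = 0))   # NULL: no labels
```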

The Apply Functions

There are several functions in R for applying a function over a subset of an object, six of which are covered here. The six functions are apply(), lapply(), sapply(), vapply(), tapply(), and mapply(). The functions to be applied can be user-defined, which can be quite useful.

The Function apply()

The function apply() takes three arguments—X, MARGIN, and FUN—as well as any arguments to the function FUN. The first argument, X, is an array (including matrices). The second argument gives the margin(s) over which the function is to operate, and FUN is the function to be applied.

For matrices, entering 1 for MARGIN applies the function to each row (that is, across the columns). Entering 2 applies the function to each column (that is, down the rows).

The function to be applied is entered without parentheses. Any arguments to the function are entered next, separated by commas. The result is an array, matrix, or vector. An example follows:

> mat=matrix(1:4,2,2, dimnames=list(c("r1","r2"),c("c1","c2")))
> mat
   c1 c2
r1  1  3
r2  2  4
 
> apply(mat,1,sum)
r1 r2
 4  6
 
> apply(mat,1,pnorm,3,1)
           r1        r2
c1 0.02275013 0.1586553
c2 0.50000000 0.8413447

In the example, the arguments to pnorm() are the rows in mat for the q values, 3 for the value of mean, and 1 for the value of sd. Note that the matrix is transposed in the result.

You can find more information about apply() by entering ?apply at the R prompt.
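
Since the function passed to apply() can be user-defined, a small sketch may help. The function spread() and the matrix m here are hypothetical and serve only to show MARGIN equal to 1 and 2 side by side.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# Hypothetical user-defined function: the range width of a vector
spread <- function(v) max(v) - min(v)

apply(m, 1, spread)   # one value per row:    r1 = 4, r2 = 4
apply(m, 2, spread)   # one value per column: c1 = 1, c2 = 1, c3 = 1

# Extra arguments for FUN follow FUN in the call
apply(m, 2, quantile, probs = 0.5)   # column medians 1.5 3.5 5.5
```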

The lapply(), sapply(), and vapply() Functions

The lapply(), sapply(), and vapply() functions work with vectors, including lists, and expressions. If X is not a list, then X is coerced to a list. The elements must be of the correct mode for the function being applied.

The function lapply() is the simplest with just two arguments plus any arguments to the function to be applied. The function sapply() takes four arguments plus any extra arguments for the function to be applied. The function vapply() also takes four arguments plus any extra for the function to be applied.

The Function lapply()

The function lapply() takes the arguments X and FUN, plus any extra arguments for FUN. The function FUN is applied to every element of the vector or every top level element of the list. The result is a list. An example follows:

> b.list=list(1:7,3:4)
> b.list
[[1]]
[1] 1 2 3 4 5 6 7
 
[[2]]
[1] 3 4
 
> lapply(b.list,sum)
[[1]]
[1] 28
 
[[2]]
[1] 7

You can enter arithmetic operators by enclosing the operators within quotes. For example:

> lapply(1:2,"^",2)
[[1]]
[1] 1
 
[[2]]
[1] 4
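
Extra arguments work the same way for a user-defined function. In the sketch below, trim_mean() and scores are hypothetical names introduced only for illustration.

```r
# Hypothetical user-defined function with an extra argument:
# drop the `drop` smallest and `drop` largest values, then average
trim_mean <- function(v, drop = 0) {
  v <- sort(v)
  if (drop > 0) v <- v[-c(seq_len(drop), length(v) - seq_len(drop) + 1)]
  mean(v)
}

scores <- list(first = c(1, 2, 3, 100), second = c(4, 5, 6))

lapply(scores, trim_mean)             # plain means: 26.5 and 5
lapply(scores, trim_mean, drop = 1)   # trimmed means: 2.5 and 5
```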

The Function sapply()

The function sapply() also operates on vectors, including lists, and expressions. The function takes the arguments X and FUN, then any arguments to FUN followed by the arguments simplify and USE.NAMES.

The argument simplify can be logical or the character string “array”. The argument simplify tells sapply() to simplify the list to a vector or matrix if TRUE, and to an array if set equal to “array”. No simplification is done if set equal to FALSE. For FALSE, a list is returned. The value TRUE is the default.

The argument USE.NAMES is a logical argument. For an object of mode character, the argument USE.NAMES tells sapply() to use the elements of the object as names for the result. The default value is TRUE. An example follows:

> ab.list
$one
[1] 1 2 3 4 5
 
$two
[1] 3 4 5 6 7
 
> sapply(ab.list, sum)
one two
 15  25
  
> a.char
[1] "a7"  "a8"  "a9"  "a10"
  
> sapply(a.char,paste,"b", sep="")
    a7     a8     a9    a10
 "a7b"  "a8b"  "a9b" "a10b"
  
> sapply(a.char,paste,"b", sep="", USE.NAMES=F)
[1] "a7b"  "a8b"  "a9b"  "a10b"
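
The effect of simplify depends on the length of the value that FUN returns for each element. A minimal sketch, using an illustrative list x:

```r
x <- list(a = 1:3, b = 4:6)

sapply(x, sum)                      # each result has length one: a named vector
sapply(x, range)                    # each result has length two: a 2 x 2 matrix
sapply(x, range, simplify = FALSE)  # no simplification: a list, as with lapply()
```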

The Function vapply()

The function vapply() takes the arguments X, FUN, FUN.VALUE, any arguments to FUN, and USE.NAMES, in that order.

The argument FUN.VALUE is a structure for the output from the function. The structure is the structure of the result of applying FUN to a single element of X. Dummy values of the correct mode are used in the structure. The number and mode of the dummy elements must be correct. Any extra arguments for FUN are placed after FUN.VALUE. The default value of USE.NAMES is TRUE. An example follows:

> set.seed(382765)
> e
[1] 1 2
  
> vapply(e,rnorm,matrix(.1,2,2), n=4, sd=1)
, , 1
 
         [,1]      [,2]
[1,] 1.701435 1.1422971
[2,] 2.068151 0.9604146
 
, , 2
 
          [,1]     [,2]
[1,] 0.3541925 1.186276
[2,] 2.6841000 1.745577

In the example, e is a vector of means entered into the function rnorm(), and the other arguments to rnorm() are n=4 and sd=1.

The function vapply() returns an array, matrix, or vector of objects of the kind given by the argument FUN.VALUE.

You can find more information about lapply(), sapply(), and vapply() by entering ?lapply at the R prompt.
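
A common use of vapply() is to state the expected shape of each result so that a mismatch is caught as an error. The following is a sketch under that reading; the names x and res are illustrative.

```r
x <- list(a = 1:3, b = 4:6)

# FUN.VALUE = numeric(1): each call must return one numeric value
vapply(x, mean, numeric(1))          # a = 2, b = 5

# FUN.VALUE = numeric(2): each call must return two numeric values
vapply(x, range, numeric(2))         # a 2 x 2 matrix

# A mismatch between FUN.VALUE and the actual result is an error
res <- tryCatch(vapply(x, range, numeric(1)),
                error = function(e) "error")
res                                  # "error"
```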

The Function tapply()

The function tapply() applies functions to cross tabulated data. The arguments are X, INDEX, FUN, any extra arguments to FUN, and simplify. The default value for FUN is NULL, and the default value of simplify is TRUE.

The argument X must be an atomic object and is coerced to a vector. The argument can be a contingency table created by table(). The length of X is then the product of the dimensions of the contingency table.

The argument INDEX must be a vector that can be coerced to a factor or a list of vectors that can be coerced to factors. The length of X and the length(s) of the factor vectors must all be the same.

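The values of X are grouped by factor combination, where the factor combinations are given by crossing the factor values, and FUN is applied to the group of values in each cell. A function that returns a single value for each cell, such as sum or mean, gives a simple cross table. If FUN returns more than one value for a cell (for example, an elementwise operator applied where a factor combination is repeated), the result cannot be simplified and is returned as an array of mode list. There is no need to enter values for factor combinations without observations; those cells are NA in the result.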

Using tapply() without a function gives, for each element of X, the index of the cell into which the element falls, while using a function gives the factor cross table, with the function applied to the contents of the cells. An example follows:

> list(c("a","b","b","c"), c(5,5,6,5))
[[1]]
[1] "a" "b" "b" "c"
 
[[2]]
[1] 5 5 6 5
 
> cbind(c("a","b","b","c"),c(5,5,6,5))
     [,1] [,2]
[1,] "a"  "5"
[2,] "b"  "5"
[3,] "b"  "6"
[4,] "c"  "5"
 
> tapply(1:4, list(c("a","b","b","c"), c(5,5,6,5)))
[1] 1 2 5 3
  
> tapply(1:4, list(c("a","b","b","c"), c(5,5,6,5)), "^",3)
   5  6
a  1 NA
b  8 27
c 64 NA

You can find more information about tapply() by entering ?tapply at the R prompt.
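
A sketch with repeated factor combinations and a summary function may make the grouping clearer. The vectors weight, group, and batch are made-up data used only for illustration.

```r
weight <- c(10, 12, 9, 14, 11, 13)
group  <- c("a", "b", "a", "b", "a", "b")
batch  <- c(1, 1, 2, 2, 2, 2)

tapply(weight, group, mean)               # one factor: a = 10, b = 13
tapply(weight, list(group, batch), sum)   # two crossed factors, 2 x 2 table
tapply(weight, list(group, batch))        # no FUN: cell index 1 2 3 4 3 4
```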

The Function mapply()

The function mapply() takes an object that is a vector or a list as an argument and applies a function to each element of the vector or list. If an object that is not a vector or list is entered, mapply() attempts to coerce the object to a vector or list. The elements of the resulting object must be legal for the function to be applied. The result of mapply() is a vector, matrix, or list.

The arguments to mapply() are FUN, . . ., MoreArgs, SIMPLIFY, and USE.NAMES. The argument FUN is the function to be applied. The argument . . . refers to the vectors or lists over which FUN is applied in parallel: the first call uses the first element of each, the second call uses the second element of each, and so on, with shorter arguments recycled. An argument may be a collection of lists and/or vectors collected using c(). The argument MoreArgs is a single list of any additional arguments to FUN that are the same for every call and by default equals NULL.

The argument SIMPLIFY tells mapply() to attempt to simplify the result to a vector or matrix. The default value is TRUE. The argument USE.NAMES tells mapply() to use the names of the elements or, if the vector is of mode character, the characters themselves, as names for the output. By default, the value is TRUE. An example follows:

> set.seed(382765)
 
> a.mat = matrix(1,4,4)
> b.mat = matrix(runif(9),3,3)
> c.vec = 1:2
 
> mapply(det, list(a.mat, b.mat))
[1]  0.0000000 -0.3349038
  
> mapply(mean, c( list(a.mat,b.mat), c.vec))
[1] 1.0000000 0.6208733 1.0000000 2.0000000
  
> mapply(mean, c( list(a.mat,b.mat), list(c.vec)))
[1] 1.0000000 0.6208733 1.5000000

Here det finds the determinants of the elements and mean finds the means of the elements.

Another example, in which extra arguments to FUN are supplied as additional lists, follows:

> set.seed(382765)
>
> mapply(cor, c(list(a.mat,b.mat), list(c.vec)), list(y=1:4,y=1:3,y=3:4), list(use="everything"), list(method="pearson"))
[[1]]
     [,1]
[1,]   NA
[2,]   NA
[3,]   NA
[4,]   NA
 
[[2]]
           [,1]
[1,]  0.1872769
[2,]  0.8836377
[3,] -0.4585219
 
[[3]]
[1] 1
 
Warning message:
In (function (x, y = NULL, use = "everything", method = c("pearson",  :
  the standard deviation is zero

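Here the function is the correlation function, and the arguments y, use, and method are each supplied as a list in the . . . position. The one-element lists for use and method are recycled across the three calls. Because these lists are not passed through MoreArgs, they are matched to the arguments of cor() by position.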

You can find more information about mapply() by entering ?mapply at the R prompt.
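
For comparison, the sketch below passes a fixed argument through MoreArgs as a single named list. The function power_sum() is hypothetical and is defined only for this illustration.

```r
# Hypothetical function of two vectorized arguments and one fixed argument
power_sum <- function(x, y, p) (x + y)^p

# x and y are walked through in parallel; p is fixed through MoreArgs
mapply(power_sum, 1:3, 4:6, MoreArgs = list(p = 2))   # 25 49 81

# Names on the first vectorized argument are carried to the result;
# the results differ in length, so a list is returned
mapply(rep, c(a = 1, b = 2), 2:1)

# SIMPLIFY = FALSE always returns a list
mapply(power_sum, 1:3, 4:6, MoreArgs = list(p = 2), SIMPLIFY = FALSE)
```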

The sweep() and scale() Functions

The sweep() function operates on arrays (including matrices and vectors that have been converted to matrices), and the scale() function operates on numeric matrixlike objects. The sweep() function sweeps out a margin(s) of an array (say, the columns of a matrix) with values (say, the column means) using a function (say, the subtraction operator). The scale() function by default centers and normalizes the columns of matrices by subtracting the mean and dividing by the standard deviation for each column.

The Function sweep()

The function sweep() takes the arguments x, MARGIN, STATS, FUN, check.margin, and . . .. The argument x is the array. The array can be of any atomic mode.

The argument MARGIN gives the margins over which the sweep is to take place. For a matrix, MARGIN equals 1, 2, or 1:2 (or c(1,2)). If MARGIN equals 1:2, the entire matrix is swept, rather than the sweeping being done by column or row. For an array of more than two dimensions, MARGIN can be any subset of the margins, including all of the margins.

The argument STATS gives the value(s) to sweep with. For example, to use column means the function apply() can be applied; that is apply(mat, 2, mean) would work as a value for STATS, where mat is the matrix being swept. The value(s) for STATS cycle.

The argument FUN is the function to use. By default, FUN equals “-”, the subtraction operator, but FUN can be any function legal for the values of the array. For example, paste can be used with arrays of mode character.

The argument check.margin checks to see if the dimensions or length of STATS agrees with the dimensions given by MARGIN. If not, just a warning is given. The function does not stop, but cycles the values in STATS. The default value is TRUE.

The argument . . . gives any extra arguments to the function FUN. An example follows:

> d.mat = matrix(1:8,2,4)
> d.mat
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
 
> a = sweep(d.mat, 2, apply(d.mat, 2, mean))
> a
     [,1] [,2] [,3] [,4]
[1,] -0.5 -0.5 -0.5 -0.5
[2,]  0.5  0.5  0.5  0.5
  
> sweep(a, 2, apply(d.mat, 2, sd), "/")
           [,1]       [,2]       [,3]       [,4]
[1,] -0.7071068 -0.7071068 -0.7071068 -0.7071068
[2,]  0.7071068  0.7071068  0.7071068  0.7071068

Since MARGIN is set equal to 2, the function mean() takes the mean of each column and the function sd() takes the standard deviation of each column. In the second statement, the mean of each column is subtracted from the elements in the column. The subtraction function is the default, so it does not need to be entered. In the third statement, the centered elements in the columns are divided by the standard deviations for the columns.

Note that the function returns a matrix. You can find more information about sweep() by entering ?sweep at the R prompt.
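
Sweeping is not limited to columns or to subtraction. As a sketch, the calls below sweep the rows of an illustrative matrix m with the division operator and then sweep the whole matrix with MARGIN equal to 1:2.

```r
m <- matrix(c(1, 3, 2, 6), 2, 2)

# Divide each row by its row total to get row proportions
row_prop <- sweep(m, 1, rowSums(m), "/")
row_prop
rowSums(row_prop)      # each row now sums to one

# MARGIN = 1:2 sweeps with a STATS array of the same shape as m
sweep(m, 1:2, m, "-")  # every element minus itself: all zeros
```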

The Function scale()

The function scale() is used to scale columns of a matrix—that is, to center the column to a specified center and to scale the column to a specified standard deviation. The function scale() takes three arguments: x, center, and scale. The argument x is a matrix or matrixlike numeric object (for example a data frame or time series).

The argument center can be either logical or a numeric vector of length equal to the number of columns in x. If set to TRUE, the column mean is subtracted from each element in a column. If set to a vector of numbers, then each number is subtracted from the elements in the number’s corresponding column. If set equal to FALSE, nothing is subtracted. The default value is TRUE.

The argument scale can also be logical or a vector of numbers. If scale is set equal to TRUE, each element (after any centering) is divided by the root mean square of the elements in the column, sqrt(sum(x^2)/(n-1)), computed after centering, where NAs are ignored. When the column has been centered on its mean, this is the standard deviation with division by n-1. If set equal to a vector of numbers, each (centered) element of a column is divided by the corresponding number in the vector. Dividing by zero will give an NaN but will not stop the execution. If scale is set equal to FALSE, no division is done. The default value is TRUE. An example follows:

> d.mat = matrix(1:8,2,4)
> d.mat
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
  
> scale(d.mat)
           [,1]       [,2]       [,3]       [,4]
[1,] -0.7071068 -0.7071068 -0.7071068 -0.7071068
[2,]  0.7071068  0.7071068  0.7071068  0.7071068
attr(,"scaled:center")
[1] 1.5 3.5 5.5 7.5
attr(,"scaled:scale")
[1] 0.7071068 0.7071068 0.7071068 0.7071068
  
> e.mat = matrix(c(1:8,NA,2),2,5)
> e.mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7   NA
[2,]    2    4    6    8    2
 
> scale(e.mat, center=rep(3,5), scale=rep(4,5))
      [,1] [,2] [,3] [,4]  [,5]
[1,] -0.50 0.00 0.50 1.00    NA
[2,] -0.25 0.25 0.75 1.25 -0.25
attr(,"scaled:center")
[1] 3 3 3 3 3
attr(,"scaled:scale")
[1] 4 4 4 4 4

Note that scale() returns the scaled matrix, with the values used to center the elements and the values used to scale the elements attached as the attributes scaled:center and scaled:scale.

For more information, enter ?scale at the R prompt.
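
With the default arguments, scale() gives the same numbers as centering and dividing with two calls to sweep(). The following sketch checks this on an illustrative matrix m; the names centered, manual, and scaled are not from the text.

```r
m <- matrix(c(1, 2, 3, 10, 20, 60), 3, 2)

# Manual version: subtract column means, then divide by column SDs
centered <- sweep(m, 2, colMeans(m))
manual   <- sweep(centered, 2, apply(m, 2, sd), "/")

scaled <- scale(m)

# The attributes added by scale() are ignored in this comparison
all.equal(manual, scaled, check.attributes = FALSE)   # TRUE

# The centering and scaling values are kept as attributes
attr(scaled, "scaled:center")   # 2 30
attr(scaled, "scaled:scale")
```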

The Functions aggregate(), table(), tabulate(), and ftable()

Like the apply functions, the function aggregate() finds statistics for data groups. The functions table(), tabulate(), and ftable() create contingency tables out of data.

The Function aggregate()

The function aggregate() applies a function to the elements of an object based on the values of another object. The object to be operated on is either a time series, a data frame, or an object that can be coerced to a data frame. The values of the other object must be a list with elements that can be interpretable as factors and, at the second level, must be of length equal to the rows of the data frame or time series. The function treats data frames and time series differently.

Data Frames

For data frames, the arguments are x, by, FUN, . . ., and simplify. The argument x is a data frame. The argument by is an object of mode list consisting of elements that can be interpreted as factors. The elements of by are used to group the rows of x.

The argument FUN is the function to be applied and . . . are any extra arguments for that function. The argument simplify tells aggregate() whether to try to simplify the result to a vector or matrix. The default value is TRUE. The result of aggregate() for a data frame is a data frame. An example follows:

>  x2=rep(1:2,3)
>  x1=rep(1:2,3)
>  y1=1:6
>  y2=7:12
  
>  a.df=data.frame(y1,y2,x1,x2)
 
>  a.df
  y1 y2 x1 x2
1  1  7  1  1
2  2  8  2  2
3  3  9  1  1
4  4 10  2  2
5  5 11  1  1
6  6 12  2  2
  
> aggregate(a.df[,1:2], list(x1,x2), mean)
  Group.1 Group.2 y1 y2
1       1       1  3  9
2       2       2  4 10

For data frames, a formula may be used to classify x rather than using the argument by. For the formula option, the arguments are formula, data, FUN, . . ., subset, and na.action. The argument formula takes the form y~x, where y is numeric and can have more than one column and x is a formula such as x1 or x1+x2, where both x1 and x2 can be interpreted as factors.

The argument data gives the name of the data frame and must be included. The argument FUN is the function to be applied and . . . contains any extra arguments for FUN. The argument subset gives the rows of the data frame on which to operate. The argument na.action gives the choice for how to handle missing values and is a function. The default value is na.omit, which tells aggregate() to omit rows with missing values. An example follows:

>  a.df
  y1 y2 x1 x2
1  1  7  1  1
2  2  8  2  2
3  3  9  1  1
4  4 10  2  2
5  5 11  1  1
6  6 12  2  2
 
> aggregate(cbind(y1,y2)~x1+x2, data=a.df, sum, subset=1:3)
  x1 x2 y1 y2
1  1  1  4 16
2  2  2  2  8

Note that the by variable must be a list while the right side of a formula cannot be a list.
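
The two interfaces can be compared directly. In the sketch below, the data frame d and its columns are made up for illustration; naming the elements of the by list gives the grouping columns the same names as in the formula version.

```r
d <- data.frame(y  = c(1, 2, 3, 4, 5, 6),
                g1 = c("a", "a", "b", "b", "a", "b"),
                g2 = c(1, 2, 1, 2, 1, 2))

# The by interface: a named list gives readable grouping column names
by_res <- aggregate(d["y"], by = list(g1 = d$g1, g2 = d$g2), FUN = sum)

# The formula interface: grouping variables are looked up in data
f_res <- aggregate(y ~ g1 + g2, data = d, FUN = sum)

by_res
f_res
```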

Time Series

Time series have both a frequency and a period. In R, the frequency is the inverse of the period and vice versa. For example, a year can be the period of interest. Then the months have a frequency of 12 while having sub-periods of 1/12.

For time series, the arguments are x, nfrequency, FUN, ndeltat, ts.eps, and . . .. The argument x must be a time series. The argument nfrequency is the number of sub-periods for each period after FUN has operated on the time series. The value must divide evenly into the original time series frequency. The argument equals 1 by default. (The original time series frequency divided by nfrequency gives the number of elements that are grouped together—on which FUN operates.)

The argument FUN is the function to be applied and . . . gives any extra arguments to FUN. The argument . . . is at the end of the argument list. The function FUN must be legal for the values of the time series and is by default sum.

The argument ndeltat tells aggregate() the length of the sub-periods for the output and equals 1 by default. The product of the frequency of the original time series and ndeltat must be an integer.

Either nfrequency or ndeltat can be set. The other is set to the inverse of the one set.

The argument ts.eps gives the tolerance for accepting that nfrequency divides evenly into the frequency of the time series. By default, ts.eps equals getOption("ts.eps"), the value of which can be found by entering options("ts.eps") at the R prompt. The value is numeric and can be set manually. An example follows:

> x1=c(1,2,1,2,1,2)
> x2=c(1,2,3,1,2,3)
 
> a.ts=ts(cbind(x1,x2), start=1, frequency=3)
> a.ts
Time Series:
Start = c(1, 1)
End = c(2, 3)
Frequency = 3
         x1 x2
1.000000  1  1
1.333333  2  2
1.666667  1  3
2.000000  2  1
2.333333  1  2
2.666667  2  3
  
> aggregate(a.ts, FUN=sum)
Time Series:
Start = 1
End = 2
Frequency = 1
  x1 x2
1  4  6
2  5  6

Note that in the example, nfrequency and ndeltat both equal one.

You can find more information about aggregate() by entering ?aggregate at the R prompt.
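
To see nfrequency take a value other than one, consider a sketch with monthly data. The series monthly is made up; with a frequency of 12 and nfrequency equal to 4, elements are grouped three at a time.

```r
# Two years of monthly data: frequency 12
monthly <- ts(1:24, start = c(2000, 1), frequency = 12)

# nfrequency = 4: groups of 12 / 4 = 3 months are summed into quarters
quarterly <- aggregate(monthly, nfrequency = 4, FUN = sum)
quarterly              # 6 15 24 33 42 51 60 69
frequency(quarterly)   # 4

# Default nfrequency = 1: groups of 12 months, here averaged by year
yearly <- aggregate(monthly, FUN = mean)
yearly                 # 6.5 18.5
```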

The Functions table(), as.table(), and is.table()

There are three functions associated with creating tables using table(). The function table() creates a contingency table from atomic data or some lists. The data must be able to be interpreted as factors. The result has class table. The function as.table() attempts to coerce an object to class table. The function is.table() tests if an object is of class table.

The arguments to table() are . . ., exclude, useNA, dnn, and deparse.level.

The argument . . . refers to the object(s) that are to be cross-classified. The objects are separated by commas and, for atomic objects, must have same length. For list objects, the second level elements must all have the same length and be atomic. Atomic and list objects cannot be combined in a call to table().

The argument exclude gives values to be excluded from the contingency table. By default, exclude equals if(useNA=="no") c(NA, NaN), which tells table() not to set a level for missing values or illegal values, such as zero divided by zero, if the argument useNA equals “no”.

The argument useNA is a character argument and can take on the value “no”, “ifany”, or “always”. For “no”, no level is set for missing values. For “ifany”, a level is set if missing values are present. For “always”, a level for missing values is always set. The default value is “no”.

The argument dnn is a list argument and gives dimension names for the contingency table. The default value is list.names(. . .). The function list.names() is defined in table() and gives the names of the dimensions being tabulated.

The argument deparse.level is an integer argument that can take on the values of 0, 1, or 2. The argument controls list.names() if dnn is not given. For 0, no names are given. For 1, a name is given only when the argument is a symbol, that is, the name of an object. For 2, the argument is deparsed, so an expression such as 1:4 supplies its own text as the name. The default value is 1. An example follows:

> table(c(1,2,1,2),1:4, useNA="always", deparse.level=0)
       
       1 2 3 4 <NA>
  1    1 0 1 0    0
  2    0 1 0 1    0
  <NA> 0 0 0 0    0
  
> table(c(1,2,1,NA),1:4,c(5,6,6,5), useNA="no", deparse.level=1)
, ,  = 5
 
    1 2 3 4
  1 1 0 0 0
  2 0 0 0 0
 
, ,  = 6
 
    1 2 3 4
  1 0 0 1 0
  2 0 1 0 0
 
> table(c(1,2,1,NA),1:4,c(5,6,6,5), useNA="ifany", deparse.level=2)
, , c(5, 6, 6, 5) = 5
 
              1:4
c(1, 2, 1, NA) 1 2 3 4
          1    1 0 0 0
          2    0 0 0 0
          <NA> 0 0 0 1
 
, , c(5, 6, 6, 5) = 6
 
              1:4
c(1, 2, 1, NA) 1 2 3 4
          1    0 0 1 0
          2    0 1 0 0
          <NA> 0 0 0 0

Note that the first and last arrays have four non-zero elements, but the second array only has three since the NA is excluded.

The function as.table() takes the arguments x and . . .. The argument x is the object to be coerced to class table. The argument must be of mode numeric. The argument . . . provides any arguments for lower-level functions.

The function is.table() takes the argument x and returns TRUE if x is of class table and FALSE if not.

You can find more information about table(), as.table(), and is.table() by entering ?table at the R prompt.
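
The arguments useNA and dnn can also be tried on named vectors. The following sketch uses two made-up character vectors, colour and size, each with one missing value.

```r
colour <- c("red", "blue", NA, "red")
size   <- c("S", "S", "L", NA)

t_no  <- table(colour, size)                    # pairs with an NA are dropped
t_any <- table(colour, size, useNA = "ifany")   # NA gets its own level

sum(t_no)    # 2: only complete pairs are counted
sum(t_any)   # 4: every pair is counted

# dnn supplies the dimension names directly
names(dimnames(table(colour, size, dnn = c("Colour", "Size"))))

is.table(t_no)   # TRUE
```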

The Function tabulate()

The function tabulate() coerces numeric or factor objects to vectors and tabulates the result. The arguments are bin and nbins. The argument bin is the object to be binned. If the object is not an integer or factor object, then the elements are converted to integers by dropping the fractional part. The resulting integers must be positive. If an illegal element is present, the element is ignored.

The argument nbins gives the largest integer to be binned and by default equals max(1, bin, na.rm=T)—that is, the largest value in bin, assuming the largest value in bin is larger than one.

If nbins is smaller than the largest value in bin, then only those values less than or equal to nbins are binned. All of the integers between one and nbins are binned even if there are zero elements in a given bin. The function creates a vector without labels. The bins always start with one. An example follows:

> tabulate(c(-3.5,.9,1,4,5.6,5.4,4,1,3))
[1] 2 0 1 2 2
  
> tabulate(c(-3.5,.9,1,4,5.6,5.4,4,1,3), nbins=3)
[1] 2 0 1

In the example, there are two ones, zero twos, one three, two fours, and two fives in the reduced object.

The function tabulate() is good when all of the bins, including those with zero elements, are needed. You can find more information about tabulate() by entering ?tabulate at the R prompt.
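
The contrast with table() is easiest to see side by side. A minimal sketch, using an illustrative vector x:

```r
x <- c(2, 3, 3, 5)

tabulate(x)               # 0 1 2 0 1: bins 1 to 5, empty bins included
tabulate(x, nbins = 7)    # 0 1 2 0 1 0 0: bins 6 and 7 are empty
table(x)                  # table() lists only the values that occur
```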

The Function ftable()

The function ftable() creates a matrix out of a contingency table—that is, a matrix that is a flat table. The arguments are . . ., exclude, row.vars, and col.vars. The argument . . . can be objects that can be coerced to a vector and that can be interpreted as factors. The argument can be a list whose elements can be interpreted as factors, or the argument can be of class table or ftable.

The argument exclude gives the values to be excluded when building the flat table. By default, exclude equals c(NA, NaN).

The arguments row.vars and col.vars give the dimensions to put in the rows and columns. The values can go from one to the number of dimensions in the table—in other words, a table with three dimensions can have row.vars and col.vars equal to 1:2 and 3; or 2:1 and 3; or 1 and 3; or c(3,1) and 2; and so forth. An example follows:

> a.list = list(1:2,3:4,5:6)
> ftable(a.list)
        x.3 5 6
x.1 x.2
1   3       1 0
    4       0 0
2   3       0 0
    4       0 1
  
> a1 = 1:2
> a2 = 3:4
> a3 = 5:6
> ftable(a1, a2, a3, row.vars=3, col.vars=2:1)
   a2 3   4
   a1 1 2 1 2
a3
5     1 0 0 0
6     0 0 0 1
 
> a.table = table(1:2,3:4,5:6)
> ftable(a.table, row.vars=2, col.vars=3)
   5 6
       
3  1 0
4  0 1

You can find more information about ftable() by entering ?ftable at the R prompt.
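
A flat table is usually easier to read when the variables have names. The sketch below builds a three-way table from a small made-up data frame d and flattens it two ways; row.vars is given once by position and once by name.

```r
d <- data.frame(sex   = c("f", "m", "f", "m", "f"),
                group = c("a", "a", "b", "b", "a"),
                pass  = c("yes", "no", "yes", "yes", "no"))

tab <- table(d)                 # a three-dimensional contingency table

# Rows: sex crossed with group; columns: pass
ft <- ftable(tab, row.vars = 1:2, col.vars = 3)
ft
dim(ft)                         # 4 2

# Rows: pass; columns: the remaining variables crossed
ftable(tab, row.vars = "pass")
```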
