© Margot Tollefson 2019
Margot Tollefson, R Quick Syntax Reference, https://doi.org/10.1007/978-1-4842-4405-0_16

16. The Packages base, stats, and graphics

Margot Tollefson, Stratford, IA, USA

In this chapter, we take a quick look at the packages base, stats, and graphics, three of the packages loaded by default in R. The package base contains things such as the trigonometric functions and other mathematical functions, many of the as. and is. functions, the arithmetic operators, the flow control statements, some apply functions, and many other basic functions in R.

The package stats contains many basic statistical functions, such as functions to find the median, the standard deviation, and the variance. It also includes the functions associated with common probability distributions as well as many more statistical functions. The package graphics contains the basic plotting functions and their ancillary functions.

The other packages loaded by default are datasets, which contains data sets; utils, which contains utility functions; grDevices, which contains information used in plotting—such as fonts and colors; and methods, which contains functions and information for working with S4 (formal) methods and classes.

For a list of the functions in a package with clickable links to the function help pages, enter help(package="package.name") or library(help=package.name) at the R prompt, where package.name is the name of the package.

The source of the information in this chapter is the CRAN help pages.

The base Package

The base package contains many functions basic to R. The list of links to the help pages for base is 30 pages long. This section covers the reserved words, the built-in constants, the trigonometric and hyperbolic functions, the functions related to the beta and gamma functions, some other mathematical functions, functions for complex numbers, functions for matrices, arrays, and data frames, and a few other functions.

Reserved Words

The reserved words in R are if, else, repeat, while, for, function, next, break, in, TRUE, FALSE, Inf, NULL, NA, NaN, NA_integer_, NA_real_, NA_complex_, NA_character_, ..., ..1, ..2, and so forth. See Table 16-1.
Table 16-1

The Reserved Words in R

if

else

repeat

while

for

in

next

break

function

TRUE

FALSE

Inf

NULL

NA

NaN

NA_integer_

NA_real_

NA_complex_

NA_character_

 

...

..1

..2

and so forth

..n

For more information, enter ?Reserved at the R prompt or use the Help tab in RStudio.
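
The following is a small sketch, not taken from the help pages, of what being reserved means in practice. It assumes only base R. It shows that a reserved word cannot be used as an object name, that T and F (which are not in the list above) are ordinary variables, and that the typed missing values differ in storage type.

```r
# Reserved words cannot be used as object names: this text fails to parse.
parsed <- try(parse(text = "while <- 1"), silent = TRUE)
is_error <- inherits(parsed, "try-error")

# T and F are ordinary variables, not reserved words, so they can be masked.
T <- 0
masked <- isTRUE(T)   # T no longer means TRUE here
rm(T)                 # remove the mask; T refers to base's TRUE again

# The typed missing values differ only in storage type.
types <- c(typeof(NA), typeof(NA_integer_), typeof(NA_real_),
           typeof(NA_character_), typeof(NA_complex_))
```

Because T and F can be masked in this way, writing TRUE and FALSE in full is the safer habit.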

Built-In Constants

The built-in constants in R are LETTERS, the 26 uppercase letters of the English alphabet; letters, the 26 lowercase letters of the English alphabet; month.abb, the three-letter abbreviations of the English names of the months; month.name, the English names of the months; and pi, the mathematical constant π. See Table 16-2 for a listing of the constants.
Table 16-2

The Built-In Constants in R

Constants

Description

LETTERS

the 26 capital letters

letters

the 26 lowercase letters

month.abb

the 12 names of the months abbreviated to three letters

month.name

the 12 names of the months

pi

π; 1/2 the circumference of a unit circle

You can find more information about the constants by entering ?Constants at the R prompt or by using the Help tab in RStudio.
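
As a brief illustration, the constants behave like any other vector and can be indexed. The variable names in this sketch are arbitrary.

```r
# The built-in constants are ordinary vectors and can be indexed.
first_five <- LETTERS[1:5]                    # "A" "B" "C" "D" "E"
last_letter <- letters[26]                    # "z"
quarter_starts <- month.abb[c(1, 4, 7, 10)]   # "Jan" "Apr" "Jul" "Oct"
december <- month.name[12]                    # "December"
circle_area <- pi * 2^2                       # area of a circle of radius 2
```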

Trigonometric and Hyperbolic Functions

The trigonometric and hyperbolic functions available in R are the cosine, cos(); the sine, sin(); the tangent, tan(); the versions of these three that multiply the argument by π before evaluating, cospi(), sinpi(), and tanpi() (for example, cospi(x) equals cos(pi*x)); the inverse cosine, acos(); the inverse sine, asin(); two versions of the inverse tangent, atan() and atan2(); the hyperbolic cosine, cosh(); the hyperbolic sine, sinh(); the hyperbolic tangent, tanh(); the inverse hyperbolic cosine, acosh(); the inverse hyperbolic sine, asinh(); and the inverse hyperbolic tangent, atanh().

Angles are entered into the functions in radians (radians = pi/180 × degrees), except for cospi(), sinpi(), and tanpi(), for which the argument is in multiples of π, that is, in half-turns (the angle in degrees divided by 180, so that one equals 180°). For the inverse functions, the angles are returned in radians (degrees = 180/pi × radians). The arguments must be of an atomic mode and logical, numeric, or complex, except for cospi(), sinpi(), and tanpi(), which cannot be complex. Logical values are coerced to numeric.

For the inverse cosine and sine, the values must be between –1 and 1, inclusive. For other values, the result is NaN. For the inverse tangent, atan() takes one argument, and the result falls between –π/2 and π/2.

The function atan2() takes two arguments. The function returns the inverse tangent of the ratio of the two arguments, with the first argument being the numerator and the second the denominator. The function takes any number (real or complex) for the numerator and any number (real or complex) as the denominator. The arguments can be of different lengths and will cycle.

The function atan2() returns results between –π and π. The quadrant of the angle depends on the signs of the numerator and the denominator, that is: (+,+) first quadrant; (+,–) second quadrant; (–,–) third quadrant; and (–,+) fourth quadrant. (By definition, the tangent of x, for any number x, is the sine of x divided by the cosine of x.) A zero in the denominator returns π/2 or –π/2, depending on the sign of the numerator.

The hyperbolic functions can also take on any number (real or complex). For the inverse of the hyperbolic functions, the argument for acosh() must be between 1 and ∞, inclusive, and the argument for atanh() must be between –1 and 1, inclusive.

Arguments can be vectors, matrices, data frames, or arrays. For arguments with more than one element, the operation is carried out element-wise. For atan2(), which takes two arguments, the arguments cycle. The functions return an object of the same dimensions as the argument(s) to the function.
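
The points above about units, quadrants, and out-of-range arguments can be sketched as follows. The vectors used here are made up for illustration.

```r
# Convert degrees to radians before calling cos(), sin(), or tan().
deg <- c(0, 60, 90, 180)
rad <- deg * pi / 180
cos_rad <- cos(rad)

# cospi(x) computes cos(pi * x), so the argument is in half-turns
# (one equals 180 degrees).
cos_half_turns <- cospi(deg / 180)

# atan() loses the quadrant; atan2(y, x) keeps it.
y <- c(1, 1, -1, -1)
x <- c(1, -1, -1, 1)
angles_deg <- atan2(y, x) * 180 / pi   # 45, 135, -135, -45

# A zero denominator gives pi/2 or -pi/2 by the sign of the numerator.
vertical <- atan2(c(1, -1), 0)

# A real argument outside [-1, 1] gives NaN for acos(), with a warning.
out_of_range <- suppressWarnings(acos(2))
```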

See Table 16-3 for a listing of the functions, with restrictions.
Table 16-3

The Trigonometric and Hyperbolic Functions

Function

R Function

Restrictions

cosine

cos(x)

logical, numeric, or complex; logical coerced to numeric

sine

sin(x)

see cosine

tangent

tan(x)

see cosine

cosine with pi

cospi(x)

logical or numeric; logical coerced to numeric

sine with pi

sinpi(x)

see cosine with pi

tangent with pi

tanpi(x)

see cosine with pi

inverse cosine

acos(x)

-1 ≤ x ≤ 1

inverse sine

asin(x)

see inverse cosine

inverse tangent

atan(x)

see cosine

inverse tangent, two arguments

atan2(y,x)

see cosine; inverse of tangent of y divided by x; maintains quadrant information

hyperbolic cosine

cosh(x)

see cosine

hyperbolic sine

sinh(x)

see cosine

hyperbolic tangent

tanh(x)

see cosine

inverse hyperbolic cosine

acosh(x)

1 ≤ x ≤ ∞

inverse hyperbolic sine

asinh(x)

see cosine

inverse hyperbolic tangent

atanh(x)

-1 ≤ x ≤ 1

You can find more information about the trigonometric functions by entering ?Trig at the R prompt; for the hyperbolic functions, by entering ?cosh at the R prompt or using the Help tab in RStudio.

Beta- and Gamma-Related Functions

The functions related to the beta and gamma functions are beta(), lbeta(), gamma(), lgamma(), psigamma(), digamma(), trigamma(), choose(), lchoose(), factorial(), and lfactorial(). In R, these functions are the Special functions. The arguments to these functions must be of the atomic mode and logical (which are coerced to numeric) or numeric. The functions return a result in the same form as the argument (the same dimensions). Arguments cycle.

The beta() and lbeta() functions take the arguments a and b, both of which must be non-negative, and return the value of the beta function or the natural logarithm of the value of the beta function, respectively. Negative numbers return NaN, with a warning.

The gamma(), lgamma(), psigamma(), digamma(), and trigamma() functions take the argument x, and for psigamma(), the argument deriv. The argument x can be any number, except for zero or the negative integers, for which NaNs are returned, with a warning (lgamma() returns Inf instead; see Table 16-4). The functions gamma() and lgamma() return the value of the gamma function and the natural logarithm of the absolute value of the gamma function, respectively. The function digamma() returns the value of the first derivative of the natural logarithm of the gamma function, while trigamma() returns the second derivative. The function psigamma() returns the derivative of the digamma function to the order given by deriv, so deriv equal to zero gives digamma() and deriv equal to one gives trigamma(). The argument deriv must be an integer greater than or equal to zero. Otherwise, NaNs are returned, with a warning. By default, deriv equals zero.

The functions choose() and lchoose() return binomial coefficients and the natural logarithms of the absolute values of binomial coefficients, respectively. Both functions take the arguments n, which can be any real number, and k, which can be any real number and is rounded to an integer. Negative rounded numbers for k return 0. The function choose() is the familiar “n choose k” for n a positive integer and k a non-negative integer less than or equal to n.

The functions factorial() and lfactorial() return the factorial value and the natural logarithm of the absolute value of the factorial value, respectively. The functions take one argument, x. The value of x can be any real number (numeric or logical coerced to numeric). The factorial value is defined as
factorial(x) = gamma(x+1)

for any value of x and equals x! (that is, (x)(x-1)(x-2)...(2)(1)) for positive integer values of x. For x equal to zero, factorial(x) equals one. Negative integers return NaNs, with a warning.
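
The relationships among these functions can be checked directly. The following sketch uses arbitrary input values and wraps the overflow case in suppressWarnings(), since some versions of R may warn there.

```r
# factorial(x) is defined through the gamma function.
x <- c(0, 1, 5, 2.5)
fact <- factorial(x)
via_gamma <- gamma(x + 1)

# beta(a, b) equals gamma(a) * gamma(b) / gamma(a + b).
b_direct <- beta(2, 3)
b_gamma <- gamma(2) * gamma(3) / gamma(5)   # 1 * 2 / 24

# The log versions stay finite where the plain versions overflow.
big <- suppressWarnings(factorial(200))     # Inf: too large for a double
log_big <- lfactorial(200)

# choose() returns 0 for negative k.
ch <- choose(5, c(2, -1))                   # 10, 0

# psigamma() with deriv = 0 and deriv = 1 matches digamma() and trigamma().
same_di <- all.equal(psigamma(3, deriv = 0), digamma(3))
same_tri <- all.equal(psigamma(3, deriv = 1), trigamma(3))
```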

See Table 16-4 for a listing of the functions. You can find more information about the functions by entering ?Special at the R prompt or by using the Help tab in RStudio.
Table 16-4

The Beta, Gamma, and Related Functions

Function

Function in R

Arguments

beta

beta(a, b)

a, b; both numeric and ≥ 0

natural log beta

lbeta(a, b)

see beta

gamma

gamma(x)

x, any real number; zero and negative integers return NaN

natural log of absolute value of gamma

lgamma(x)

x, any real number; zero and negative integers return Inf

nth derivative of the digamma function, that is, the (n+1)th derivative of the natural log of the gamma function, where deriv equals n

psigamma(x, deriv=0)

x, any real number;

deriv, an integer ≥ 0; returns NaN’s where not defined

first derivative of natural log of gamma function

digamma(x)

x, any real number; returns NaN’s where not defined

second derivative of natural log of gamma function

trigamma(x)

see digamma

binomial coefficients

choose(n, k)

n, any real number

k, any real number; rounds to nearest integer, negative integers return 0

natural log absolute value binomial coefficients

lchoose(n, k)

see binomial coefficients

factorial

factorial(x)

x, any real number; factorial(x) equals gamma(x+1); negative integers return NaN

natural log absolute value factorial

lfactorial(x)

x, any real number; lfactorial(x) equals lgamma(x+1); negative integers return Inf

Miscellaneous Mathematical Functions

Some other mathematical functions include the following:
  • abs() for the absolute values of the elements of an object

  • sqrt() for the square roots of the elements of an object

  • ceiling() for rounding the elements of an object up to an integer

  • floor() for rounding the elements of an object down to an integer

  • trunc() for truncating the elements of an object toward zero, that is, dropping the part after the decimal point

  • cummax() for the cumulative maximum over an atomic object

  • cummin() for the cumulative minimum over an atomic object

  • cumprod() for the cumulative product over an atomic object

  • cumsum() for the cumulative sum over an atomic object

  • exp() for e to the powers of the elements of an object

  • log(), log10(), and log2() for the logarithms of the elements of an object for a specified base (defaults to the natural logarithm), base 10, and base 2, respectively

  • max() for the maximum of the elements in an object, can be character

  • min() for the minimum of the elements in an object, can be character

  • pmax() for multiple vectors or matrices (will cycle); returns the element-wise (parallel) maximum across the objects

  • pmin() for multiple vectors or matrices (will cycle); returns the element-wise (parallel) minimum across the objects

  • sum() for the sum of the elements of an object

  • prod() for the product of the elements of an object

  • mean() for the mean of the elements of an object

  • range() for the range (the minimum and the maximum) of the elements of an object

  • rank() for the ranks of the elements of an object

  • sign() for the signs of the elements of an object—returns 1 for positive numbers, –1 for negative numbers, and 0 for zeroes

  • order() for indices giving the order of the elements of an object; with more than one object, the order of the first object, using the second object for ties, and so forth; used to reorder vectors, matrices, data frames, and arrays; x[order(x)] equals sort(x)

  • sort() for sorting the elements of objects

  • zapsmall() for setting very small numbers to zero

Atomic vectors, matrices, arrays, and data frames of the legal modes can be used for these functions. The results of these functions are various types of objects, depending on the function.
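
A few of these functions are easy to confuse, so here is a short sketch of how some of them differ. The data are arbitrary.

```r
x <- c(2.7, -1.5, 0, 3.2)

# ceiling(), floor(), and trunc() differ for negative values.
up <- ceiling(x)       # 3, -1, 0, 4
down <- floor(x)       # 2, -2, 0, 3
toward0 <- trunc(x)    # 2, -1, 0, 3

# Cumulative functions return a vector as long as the input.
running_sum <- cumsum(1:5)                 # 1, 3, 6, 10, 15
running_max <- cummax(c(1, 3, 2, 5, 4))    # 1, 3, 3, 5, 5

# max() gives one value; pmax() compares element by element and cycles.
overall <- max(c(1, 8, 3), c(4, 2, 6))     # 8
parallel <- pmax(c(1, 8, 3), c(4, 2, 6))   # 4, 8, 6
floor_at_zero <- pmax(x, 0)                # 2.7, 0, 0, 3.2

# order() gives the permutation that sorts; x[order(x)] equals sort(x).
idx <- order(x)                            # 2, 3, 1, 4
sorted <- x[idx]

# sign() and zapsmall().
signs <- sign(x)                           # 1, -1, 0, 1
zapped <- zapsmall(c(1, 1e-20))            # 1, 0
```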

See Table 16-5 for a listing of the functions with restrictions.
Table 16-5

Some Other Mathematical Functions

Function in R

Restrictions

abs(x)

logical, numeric, or complex objects; logical coerced to numeric; returns object of same dimensions

sqrt(x)

see abs(); negative real numbers return NaN

ceiling(x)

logical or numeric object; logical coerced to numeric; returns object of same dimensions

floor(x)

see ceiling()

trunc(x, ...)

x, logical or numeric object; logical coerced to numeric; returns object of same dimensions

..., any arguments to be passed on to lower level functions called by trunc()

cummax(x)

raw, logical, numeric, or character object; will be coerced to numeric; character objects that are not a number in quotes return NAs; returns vector

cummin(x)

see cummax()

cumsum(x)

see cummax()

cumprod(x)

see cummax()

exp(x)

logical, numeric, or complex object; logical coerced to numeric; returns object of same dimensions

log(x, base=exp(1))

x, logical, numeric, or complex object; logical coerced to numeric; x ≥ 0; 0’s return –Inf; negative real numbers return NaN; returns object of same dimensions

base, the base for the logarithm; numeric or complex (logical is legal but returns Inf for T and 0 for F); base ≥ 0

log2(x)

logical, numeric, or complex; logical coerced to numeric; x ≥ 0; 0’s return –Inf; negative real numbers return NaN; returns object of same dimensions

log10(x)

see log2()

max(..., na.rm=FALSE)

. . .,  logical, numeric, complex, and character objects separated by commas; do not need to be of the same length; can mix modes; returns a single value

na.rm, logical; if an NA is present and na.rm is set to FALSE returns NA, if TRUE ignores the NA

min(..., na.rm=FALSE)

see max()

pmax(..., na.rm=FALSE)

. . .,  logical, numeric, and character objects separated by commas; do not need to be of the same length—cycle; can mix modes; returns a vector or matrix

na.rm, logical; if an NA is present and na.rm is set to FALSE returns NA, if TRUE ignores the NA

pmin(..., na.rm=FALSE)

see pmax()

sum(..., na.rm=FALSE)

. . ., logical, numeric, and complex objects separated by commas; can mix modes; returns a single value

na.rm, logical; if an NA is present and na.rm is set to FALSE returns NA, if TRUE ignores the NA; NaN similar but are treated differently for complex numbers

prod(..., na.rm=FALSE)

see sum()

mean(x, trim=0, na.rm=FALSE, ...)

x, logical, numeric, or complex object; returns a single value; for complex trim must equal zero

trim, 0 ≤ trim ≤ .5; is proportion of elements to trim before taking the mean

na.rm, logical; if an NA is present and na.rm is FALSE returns NA, if TRUE ignores NA; NaN the same

. . . any arguments to be passed to lower level functions called by mean()

range(..., na.rm=FALSE)

. . ., logical, numeric, and character objects separated by commas; can mix modes; returns two values

na.rm, logical; if an NA is present and na.rm is set to FALSE returns NA, if TRUE ignores the NA; NaN the same

rank(x, na.last=TRUE, ties.method=c("average", "first", "last", "random", "max", "min"))

x, logical, numeric, complex, or character object

na.last, logical or character; if TRUE, NAs and NaNs are ranked last, if FALSE they are first, if NA they are discarded, if “keep” they keep their place in the order; NaNs return NAs; returns a vector

ties.method, character; method for setting a value for ties; the default is “average”

sign(x)

logical or numeric object; returns object of same dimensions

order(..., na.last=TRUE, decreasing=FALSE)

..., logical, numeric, complex or character vectors of the same length—can use just one vector—can mix modes; returns a permutation of indices of length equal to the length of the vector(s)

na.last, logical; for TRUE NAs are placed last, for FALSE NAs first, for NA NAs are removed

decreasing, logical; must be TRUE or FALSE; if TRUE order is decreasing, if FALSE increasing

sort(x, decreasing=FALSE, na.last=NA, ...)

x, logical, numeric, complex, or character object; sorts real and imaginary parts of complex separately; returns a vector

decreasing, logical; if TRUE sorts in decreasing order, if FALSE increasing; must be TRUE or FALSE

na.last, logical; if TRUE, NAs are put last, if FALSE, they are put first, if NA they are discarded; NaNs are put last

. . ., any arguments to be passed on to lower level functions called by sort()

zapsmall(x, digits=getOption("digits"))

x, logical, numeric, or complex object; returns object of same dimensions

digits, numeric; will round to an integer

You can find more information about any of these functions by going to the help page of the function (?function.name, where function.name is the name of the function) or by using the Help tab in RStudio.

Complex Numbers

The following functions are for complex numbers:
  • Re(), the real part of a complex number

  • Im(), the imaginary part of a complex number

  • Arg(), the angle from the x axis in radians of the line between the origin and the complex number

  • Mod(), the modulus of a complex number; equals the length of the line between the origin and the complex number

  • Conj(), the complex conjugate of a complex number

The functions take logical, numeric, and complex objects for arguments. Logical arguments are coerced to numeric. The result has the same dimensions as the argument.
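
As a quick illustration with an arbitrary value, the parts of a complex number can be taken apart and put back together:

```r
z <- complex(real = 3, imaginary = 4)   # 3+4i

re <- Re(z)       # 3
im <- Im(z)       # 4
mod <- Mod(z)     # 5, the distance from the origin
arg <- Arg(z)     # angle in radians, equal to atan2(4, 3)
conj <- Conj(z)   # 3-4i

# A complex number can be rebuilt from its modulus and argument.
rebuilt <- complex(modulus = mod, argument = arg)
```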

You can find more information about the complex functions by entering ?Re at the R prompt or by using the Help tab in RStudio.

Matrices, Arrays, and Data Frames

There are a number of functions for matrices, arrays, and data frames in base that we have not yet covered.

Some of the functions include the following:
  • aperm(), which permutes an array

  • rowsum(), which sums over rows of a matrix or data frame in groups set by the group variable

  • colMeans(), which returns the means of the columns of a data frame or matrix or, for an array, the means taken over the dimensions from the first to the specified dimension

  • colSums(), which returns the sums of the columns of a data frame or matrix or, for an array, the sums taken over the dimensions from the first to the specified dimension

  • rowMeans(), which returns the means of the rows of a data frame or matrix or, for an array, the means taken over the dimensions from the specified dimension plus one to the last dimension

  • rowSums(), which returns the sums of the rows of a data frame or matrix or, for an array, the sums taken over the dimensions from the specified dimension plus one to the last dimension

  • col(), which returns a matrix of the same dimensions as the argument and which contains the column indices in the columns or a matrix of factors with each column one factor

  • row(), which returns a matrix of the same dimensions as the argument and which contains the row indices in the rows or a matrix of factors with each row one factor

  • det(), which returns the determinant of a matrix

  • determinant(), which returns the modulus or the logarithm of the modulus of the determinant and the sign of the determinant

  • eigen(), which returns the eigenvalues and eigenvectors of a matrix

  • kappa(), which calculates or estimates the condition number of a matrix

  • kronecker(), which returns the matrix or array that is the Kronecker product of two objects, where the product is taken with a specified function (multiplication by default). The two objects can be vectors, matrices, and/or arrays. The dimensions of the result are the products of the dimensions of the two objects.

  • norm(), which returns the norm of a matrix calculated by the one, infinity, Frobenius, maximum modulus, or spectral (or 2) method
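
The following sketch exercises several of the functions in the list above on small made-up objects. The values in the comments follow from the arithmetic shown.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3, filled by column: rows (1, 3, 5) and (2, 4, 6)

col_means <- colMeans(m)     # 1.5, 3.5, 5.5
row_sums <- rowSums(m)       # 9, 12
row_idx <- row(m)            # matrix of row indices
col_idx <- col(m)            # matrix of column indices

# rowsum() adds up rows within groups.
grouped <- rowsum(c(1, 2, 3, 4), group = c("a", "b", "a", "b"))

# aperm() with the default perm reverses the dimensions.
a <- array(1:24, dim = c(2, 3, 4))
a_perm_dim <- dim(aperm(a))  # 4, 3, 2

s <- matrix(c(2, 1, 1, 3), nrow = 2)
d <- det(s)                              # 2 * 3 - 1 * 1 = 5
dl <- determinant(s, logarithm = TRUE)   # modulus (on the log scale) and sign
e <- eigen(s, symmetric = TRUE)

# Kronecker product: the dimensions multiply.
k_dim <- dim(kronecker(diag(2), s))      # 4, 4

frob <- norm(s, type = "F")              # sqrt(4 + 1 + 1 + 9)
```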

Some functions used in model fitting are the following:
  • backsolve(), which solves a matrix equation where the matrix on the left of the equation is upper triangular

  • forwardsolve(), which solves a matrix equation where the matrix on the left of the equation is lower triangular

  • chol(), the Choleski decomposition of a square, symmetric, positive definite matrix

  • chol2inv(), the inverse of a positive definite matrix using the Choleski decomposition of the matrix

  • qr(), the QR decomposition of a matrix

  • svd(), a singular value decomposition of a matrix
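
As a sketch of how these functions fit together, the small system below is solved through the Choleski factor and again through the QR decomposition. The matrix A and vector b are made up for illustration; qr.solve() is the base function that solves a system by way of qr().

```r
# A small symmetric positive definite system A %*% x = b.
A <- matrix(c(4, 2, 2, 3), nrow = 2)
b <- c(2, 1)

# chol() returns the upper triangular factor R with t(R) %*% R equal to A.
R <- chol(A)

# Solve t(R) %*% y = b (lower triangular), then R %*% x = y (upper triangular).
y <- forwardsolve(t(R), b)
x_chol <- backsolve(R, y)

# chol2inv() recovers the inverse of A from R.
A_inv <- chol2inv(R)

# The same system through the QR decomposition.
x_qr <- qr.solve(A, b)

# The singular value decomposition reproduces A.
sv <- svd(A)
A_from_svd <- sv$u %*% diag(sv$d) %*% t(sv$v)
```

Working through the equations by hand gives x = (0.5, 0), which both routes should reproduce up to rounding.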

See Table 16-6 for a listing of the functions with arguments.
Table 16-6

Some Functions for Matrices, Arrays, and Data Frames

Function in R

Restrictions

aperm(a, perm=NULL, resize=TRUE, ...)

a, matrix or array

perm, NULL, integer or character vector; gives order of the dimensions by index or character string; if not NULL must be of length equal to the dimensions of a and a permutation of the dimensions of a; NULL returns the dimensions reversed

resize, logical; must be TRUE or FALSE

..., any arguments to be passed to lower level functions

rowsum(x, group, reorder=TRUE, na.rm=FALSE, ...)

x, any numeric matrix

group, a vector or factor of length equal to the number of rows in x—used for grouping

reorder, logical; must be TRUE or FALSE

na.rm, logical; must be TRUE or FALSE

..., any arguments to be passed to or from lower level functions

colMeans(x, na.rm=FALSE, dims=1)

x, logical, numeric or complex matrix, data frame, or array

na.rm, logical; must be TRUE or FALSE

dims, numeric; 1 ≤ dims ≤ n-1, where n is the number of dimensions

colSums(x, na.rm=FALSE, dims=1)

see colMeans()

rowMeans(x, na.rm=FALSE, dims=1)

see colMeans()

rowSums(x, na.rm=FALSE, dims=1)

see colMeans()

col(x, as.factor=FALSE)

x, any matrix

as.factor, logical; must be TRUE or FALSE

row(x, as.factor=FALSE)

see col()

det(x, ...)

x, a logical or numeric square matrix; logical coerced to numeric

..., ignored

determinant(x, logarithm=TRUE, ...)

x, a logical or numeric square matrix; logical coerced to numeric

logarithm, logical; must be TRUE or FALSE

..., ignored

eigen(x, symmetric, only.values=FALSE, EISPACK=FALSE)

x, a logical, numeric, or complex square matrix; logical coerced to numeric

symmetric, logical; if TRUE matrix is assumed symmetric, if FALSE not

only.values, logical; if TRUE only eigenvalues are returned, if FALSE both eigenvalues and eigenvectors are returned

EISPACK, logical; defunct and ignored

kappa(z, exact=FALSE, norm=NULL, method=c("qr", "direct"), ...)

z, logical or numeric square matrix; logical coerced to numeric

exact, logical; must be TRUE or FALSE

norm, character; must be NULL, “O”, or “I”—for norm one and norm infinite

method, character; must be “qr” or “direct”; default is “qr”

..., any arguments to lower level functions

kronecker(X, Y,  FUN="*", make.names=FALSE, ...)

X, Y, vectors, matrices, and arrays; do not have to be of the same mode; must be legal for the function FUN

FUN, a function; can be a character string

make.names, logical; must be TRUE or FALSE; does not work with all functions

...,  any arguments for the function FUN

norm(x, type=c("O","I","F","M","2"))

x, logical, numeric, or complex matrix; logical and complex are coerced to numeric

type, character; default value is “O”

backsolve(r, x, k=ncol(r), upper.tri=TRUE, transpose=FALSE)

r, upper triangular matrix of mode logical, numeric, or complex—logical and complex values are coerced to numeric

x, vector or matrix of mode logical, numeric, or complex—logical and complex values are coerced to numeric

k, numeric—rounds down to an integer; 1 ≤ k ≤ ncol(r); is the number of columns in ‘r’ to use

upper.tri, logical; for TRUE the upper triangle is used, for FALSE, the lower is used

transpose, logical; for TRUE r is transposed in the formula

forwardsolve(l, x, k=ncol(l), upper.tri=FALSE, transpose=FALSE)

l, lower triangular matrix of mode logical, numeric, or complex—logical and complex values are coerced to numeric

x, a vector or matrix of mode logical, numeric, or complex—logical and complex values are coerced to numeric

k, numeric—rounds down to an integer; 1 ≤ k ≤ ncol(l); the number of columns in ‘l’ to use

upper.tri, logical; for TRUE the upper triangle is used, for FALSE, the lower is used

transpose, logical; for TRUE l is transposed in the formula

chol(x, pivot=FALSE, LINPACK=FALSE, tol=-1, ...)

x, raw, logical, or numeric matrix—where raw and logical matrices are coerced to numeric; must be square and positive definite

pivot, logical; for TRUE pivot, FALSE do not pivot

LINPACK, (deprecated) logical; for TRUE use LINPACK, FALSE do not use LINPACK

tol, numeric; tolerance when pivot=TRUE and LINPACK=FALSE

..., any arguments to be passed to lower level functions

chol2inv(x, size=NCOL(x), LINPACK=FALSE)

x, matrix for which the first size columns are a Choleski decomposition

size, numeric, logical, or complex—logical and complex coerced to numeric; 1 ≤ size ≤ ncol(x)

LINPACK, logical; defunct—no longer used

qr(x, tol=1e-7, LAPACK=FALSE, ...)

x, logical, numeric, or complex matrix; logical matrices are coerced to numeric

tol, numeric; tolerance for singularity

LAPACK, logical; if FALSE qr() uses LINPACK

..., any arguments to be passed to lower level functions

svd(x, nu=min(n,p), nv=min(n,p), LINPACK=FALSE)

x, logical, numeric, or complex matrix; logical matrices are coerced to numeric

nu, integer; 0 ≤ nu ≤ n; n = nrow(x)

nv, integer; 0 ≤ nv ≤ p; p = ncol(x)

LINPACK, logical; defunct and ignored

You can find more information by going to the individual help pages (?function.name, where function.name is the name of the function) or by using the Help tab in RStudio.

A Few Other Functions and Some Comments

A few other functions that are often useful are R.home(), R.Version(), all.equal(), identical(), dir(), getwd(), setwd(), unique(), hexmode(), jitter(), append(), duplicated() (and anyDuplicated()), attr() (and attributes()), pretty(), margin.table(), prop.table(), cut(), rev(), readline(), system(), try(), warnings(), stop(), and gc(). For the functions, we will just describe what they do. You can find more information about the functions by entering ?function.name at the R prompt, where function.name is the name of the function, or by using the Help tab in RStudio.

Following are the function descriptions:
  • R.home() gives the full path to the directory containing the R program.

  • R.Version() gives the R version and other information about the version.

  • all.equal() tests if two objects are nearly equal.

  • identical() tests if two objects are exactly equal.

  • dir() returns the contents of a directory on the hard drive.

  • getwd() returns the working directory on the hard drive.

  • setwd() sets the working directory on the hard drive.

  • unique() returns an object with any duplicated elements in the original object removed. The function works on vectors, including vectors of mode list; for matrices and data frames, duplicated rows are removed.

  • hexmode() returns the hexadecimal value of a number.

  • jitter() adds a little jitter (noise) to the elements of numeric objects. The arguments to jitter() control how much jitter is added.

  • append() is used to append vectors. An argument to append() gives where along the vector the appending is done.

  • duplicated() and anyDuplicated() look for duplicates. For vectors, including lists, duplicated() returns a vector of the same length containing FALSE for elements that are not duplicated and for the first instance of elements that are duplicated. The function returns TRUE for the rest of the duplicates. For matrices and data frames, rows are compared. The function anyDuplicated() returns the index of the first duplicated element (or of the first duplicated row for matrices and data frames) and returns 0 if there are no duplicates.

  • attr() and attributes() return an attribute or a list of the attributes of an object. To use an attribute, the function attr() returns a value that can be accessed. To see a list of the attributes of an object, use attributes().

  • pretty() takes any object that can be coerced to numeric and returns a vector of evenly spaced values close to a given length and similar to the values in the original object.

  • margin.table() takes a logical, numeric, or complex object and returns margin sums for a margin in a table.

  • prop.table() takes a logical, numeric, or complex object and returns the object divided by the sum of the elements in the object. Logical objects are coerced to numeric and the real and imaginary parts of complex objects are treated separately.

  • cut() divides the range of a numeric vector into intervals and returns a factor whose levels label the interval into which each element falls. The object to be cut must be numeric. The argument breaks gives either the break points or the number of intervals; the level labels can be assigned, but by default cut() constructs them from the break points.

  • rev() reverses the order of the elements of an object and returns a vector. The object can be atomic or of any mode where reversing the order makes sense, like the modes list, expression, and call.

  • readline() reads a line from the console—for interactive use of an R function.

  • system() runs a system command from inside R—the command is entered in quotes.

  • try() attempts to execute an expression or function and returns either the result of the execution or an error message. Errors do not stop the program.

  • warnings() returns the warning messages if a program has run with warnings.

  • stop() tells R to stop the execution of a function. If stop() has a character string for an argument, the character string prints when stop() executes. The function is very useful for the process of debugging a function as well as for checking if conditions are met for objects entered into a function.

  • gc() runs garbage collection, which frees memory that is no longer in use in the session.
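
A few of the functions described above are sketched here on made-up data. The helper check_positive() is hypothetical and exists only to show stop() and try() working together.

```r
# all.equal() allows a small numerical tolerance; identical() does not.
near <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
exact <- identical(0.1 + 0.2, 0.3)          # FALSE

v <- c("a", "b", "a", "c", "b")
uniq <- unique(v)               # "a" "b" "c"
dup <- duplicated(v)            # FALSE FALSE TRUE FALSE TRUE
first_dup <- anyDuplicated(v)   # 3, the index of the first duplicate

# append() inserts after a given position; rev() reverses.
ins <- append(1:3, 99, after = 1)   # 1 99 2 3
backwards <- rev(1:4)               # 4 3 2 1

# cut() returns a factor; these break points and labels are illustrative.
ages <- c(5, 17, 30, 64)
groups <- cut(ages, breaks = c(0, 18, 65), labels = c("minor", "adult"))

# prop.table() divides by the total.
props <- prop.table(c(1, 1, 2))     # 0.25 0.25 0.50

# stop() raises an error; try() lets execution continue after it.
check_positive <- function(x) {     # hypothetical helper for illustration
  if (x <= 0) stop("x must be positive")
  sqrt(x)
}
res <- try(check_positive(-1), silent = TRUE)
failed <- inherits(res, "try-error")
```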

There are many other functions in base, many of which have to do with the running of R. The as. and is. functions are prevalent. In the list of help pages, there are 110 links for as. functions and 46 links for is. functions. If you are interested in what is in the listings, go to the page of the links and look at what is there. The Bessel functions and bitwise logical functions are also part of base.

The stats Package

The stats package contains items such as basic descriptive statistics, probability distributions, tests, functions to fit models, clustering functions, some plotting functions, and other functions used for outputting results. The list of links to the help pages for stats is 18 pages long (help(package=stats)). In this chapter, we cover the basic descriptive statistics, the tests, clustering and other functions for multivariate data, and modeling functions, but in little detail. The probability distributions can be found in Chapter 9.

Basic Descriptive Statistics

Some of the basic statistical functions in package stats include the following:
  • weighted.mean(), which finds the weighted mean of an object

  • sd(), which finds the standard deviation of an object

  • var(), which finds the variance of a vector or the covariance matrix of a matrix or data frame

  • cov(), which finds the covariance matrix of a matrix or data frame—more flexible than var()

  • cov.wt(), which finds the weighted covariance or correlation matrix of a matrix or data frame

  • cor(), which finds the correlation between vectors or within matrices and data frames

  • median(), which finds the median of the elements of an object

  • mad(), which finds the median absolute deviation of the elements of an object

  • IQR(), which finds the interquartile range of the elements of an object

  • quantile(), which finds specific quantiles of the elements in an object

  • fivenum(), which finds Tukey’s five-number summary for the elements in an object

  • ave(), which uses a function to operate on different rows of an object based on factor values

  • cancor(), which finds the canonical correlation between two matrices

  • dist(), which finds the distances between the rows of a matrix, based on the type of distance chosen and, for the Minkowski distance, the power used

  • mahalanobis(), which finds the squared Mahalanobis distance of each row of a matrix from a given center

  • ecdf(), which finds the empirical cumulative distribution function of the elements in an object—a quantile method exists for the function

  • r2dtable(), which creates a random two-way table based on marginal values—using Patefield’s algorithm

  • simulate(), which simulates observations from a model that has been fitted

  • TukeyHSD(), which finds confidence intervals for the differences between the means of the levels of a factor; the intervals take into account that more than one hypothesis is being tested; for analysis of variance models

  • xtabs(), which creates a contingency table based on a formula

  • smooth(), which creates a smoother version of a noisy set of data using Tukey’s running median smoothers—usually used for time series

See Table 16-7 for a listing of the functions, with arguments.
Table 16-7

Basic Statistical Functions in Package stats

Function in R

Description

weighted.mean(x, w, ..., na.rm=FALSE)

Finds the weighted mean of x, where x is coerced to a vector.

sd(x, na.rm=FALSE)

Finds the standard deviation of x, where x is coerced to a vector; computed as the square root of the variance, which uses the divisor (n-1).

var(x, y=NULL, na.rm=FALSE, use)

Finds the variance of x if x is a vector or the covariance of x and y or the covariance matrix of x if x is a matrix or data frame; divides by (n-1)

cov(x, y=NULL, use="everything", method=c("pearson", "kendall", "spearman"))

Finds the covariance between x and y if y is given or the covariance matrix of x if x is a matrix or data frame; more options are available than with var( )

cov.wt(x, wt=rep(1/nrow(x), nrow(x)), cor=FALSE, center=TRUE, method=c("unbiased", "ML"))

Finds the weighted covariance matrix or weighted correlation matrix of x, where x is a matrix or data frame

cor(x, y=NULL, use="everything", method=c("pearson", "kendall", "spearman"))

Finds the correlation between x and y if y is supplied or within x if just x is supplied, where x is a vector, matrix, or data frame

median(x, na.rm=FALSE)

Finds the median of the elements of x

mad(x, center=median(x), constant=1.4826, na.rm=FALSE, low=FALSE, high=FALSE)

Finds the median absolute deviation of x

IQR(x, na.rm=FALSE, type=7)

Finds the interquartile range of x

quantile(x, probs=seq(0,1,.25), na.rm=FALSE, names=TRUE, type=7, ...)

Finds the quantiles of x for the values of probs

fivenum(x, na.rm=TRUE)

Finds Tukey’s five-number summary for x

ave(x, ..., FUN=mean)

The function in FUN operates on groups of the elements of x, where the grouping variables are in the argument ...

cancor(x, y,  xcenter=TRUE, ycenter=TRUE)

Finds canonical correlation between the matrices x and y

dist(x, method="euclidean", diag=FALSE, upper=FALSE, p=2)

Finds distance between rows of a matrix, where the type of distance is specified by method

mahalanobis(x, center, cov, inverted=FALSE)

Finds the squared Mahalanobis distance of each row of x from center, with respect to the covariance matrix cov

ecdf(x)

Finds the empirical cumulative distribution function of x

r2dtable(n, r, c)

Creates a random table based on marginal totals for the rows and columns

simulate(object, nsim=1, seed=NULL, ...)

Simulates observations from the model given in object; object is a fitted model

TukeyHSD(x, which, ordered=FALSE, conf.level=0.95, ...)

Tukey’s honest significant differences for analysis of variance models

xtabs(formula=~., data=parent.frame(), subset, sparse=FALSE, na.action, exclude=c(NA,NaN), drop.unused.levels=FALSE)

Creates a contingency table based on the formula, where the variables on the right side of the formula are used to group the object on the left

smooth(x, kind=c("3RS3R", "3RSS", "3RSR", "3R", "3", "S"), twiceit=FALSE, endrule="Tukey", do.ends=FALSE)

Smooths a vector or time series using Tukey’s running median smoothers

You can find more information about the functions by entering ?function.name at the R prompt, where function.name is the name of the function, or by using the Help tab in RStudio.
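As a rough illustration of several functions from Table 16-7, the sketch below uses small made-up vectors; the object names x, w, g, m, and d2 are assumptions introduced only for this example.

```r
# Made-up data, for illustration only.
x <- c(2, 4, 4, 5, 7, 9, 12)
w <- c(1, 1, 2, 2, 1, 1, 1)

weighted.mean(x, w)               # same as sum(x * w) / sum(w)
sd(x)                             # square root of var(x)
median(x)
IQR(x)
quantile(x, probs = c(0.1, 0.9))
fivenum(x)                        # minimum, lower hinge, median, upper hinge, maximum

# ave() replaces each element with the mean of its group.
g <- factor(c("a", "a", "b", "b", "b", "c", "c"))
ave(x, g)

# mahalanobis() returns the squared distance of each row from center.
m <- cbind(x, y = c(1, 3, 2, 6, 5, 9, 8))
d2 <- mahalanobis(m, center = colMeans(m), cov = cov(m))
d2
```

Because mahalanobis() returns squared distances, take sqrt(d2) if distances on the original scale are wanted.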

Some Functions That Do Tests

There are a number of functions in stats that do hypothesis tests. Some of the functions include the following:
  • ansari.test() for the Ansari-Bradley test for testing for a difference between the scale parameters of two samples

  • bartlett.test() for the homogeneity of variances

  • binom.test() for exact tests using the binomial distribution

  • Box.test() for the Box-Pierce and Ljung-Box tests, which are used in time series to test for independence

  • chisq.test() for testing count data using Pearson’s test

  • cor.test() for correlations in paired samples

  • fisher.test() for contingency tables using Fisher’s exact test

  • fligner.test() for the Fligner-Killeen test for homogeneity of variances

  • friedman.test() for the Friedman rank sum test

  • kruskal.test() for the Kruskal-Wallis rank sum test

  • ks.test() for the Kolmogorov-Smirnov tests on one or two samples

  • mantelhaen.test() for the Cochran-Mantel-Haenszel chi squared test for count data

  • mauchly.test() for the test of sphericity developed by Mauchly

  • mcnemar.test() for the chi squared test for count data developed by McNemar

  • mood.test() for the two sample tests of scale developed by Mood

  • oneway.test() for testing for equal means if the layout is one way

  • pairwise.prop.test() for comparing proportions pairwise

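  • pairwise.t.test() for pairwise t tests between group levels, with correction for multiple testing

  • pairwise.wilcox.test() for pairwise Wilcoxon rank sum tests between group levels, with correction for multiple testing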

  • poisson.test() for an exact test using the Poisson distribution

  • power.anova.test() to find powers for a balanced one-way analysis of variance

  • power.prop.test() to find the powers for comparing two proportions

  • power.t.test() for the powers in one and two sample t tests

  • PP.test() for the Phillips-Perron test to test for unit roots in time series data

  • prop.test() for testing proportions

  • prop.trend.test() for testing trend in proportions

  • quade.test() for the Quade test

  • shapiro.test() for the Shapiro-Wilk test for normality

  • t.test() for doing a t test

  • var.test() for an F test to compare two variances

  • wilcox.test() for Wilcoxon rank sum and signed rank tests

The tests are listed with arguments in Table 16-8.
Table 16-8

Some Tests in stats

Test

ansari.test(x, y, alternative=c("two.sided", "less", "greater"), exact=NULL, conf.int=FALSE, conf.level=0.95, ...)

bartlett.test(x, g, ...)

binom.test(x, n, p=0.5, alternative=c("two.sided", "less", "greater"), conf.level=0.95)

Box.test(x, lag=1, type=c("Box-Pierce", "Ljung-Box"), fitdf=0)

chisq.test(x, y=NULL, correct=TRUE, p=rep(1/length(x), length(x)), rescale.p=FALSE, simulate.p.value=FALSE, B=2000)

cor.test(x, y, alternative=c("two.sided", "less", "greater"), method=c("pearson", "kendall", "spearman"), exact=NULL, conf.level=0.95, continuity=FALSE, ...)

fisher.test(x, y=NULL, workspace=200000, hybrid=FALSE, control=list(), or=1, alternative="two.sided", conf.int=TRUE, conf.level=0.95, simulate.p.value=FALSE, B=2000)

fligner.test(x, g, ...)

friedman.test(y, groups, blocks, ...)

kruskal.test(x, g, ...)

ks.test(x, y, ..., alternative=c("two.sided", "less", "greater"), exact=NULL)

mantelhaen.test(x, y=NULL, z=NULL, alternative=c("two.sided", "less", "greater"), correct=TRUE, exact=FALSE, conf.level=0.95)

mauchly.test(object, ...)

mcnemar.test(x, y=NULL, correct=TRUE)

mood.test(x, y, alternative=c("two.sided", "less", "greater"), ...)

oneway.test(formula, data, subset, na.action, var.equal=FALSE)

pairwise.prop.test(x, n, p.adjust.method=p.adjust.methods, ...)

pairwise.t.test(x, g, p.adjust.method=p.adjust.methods, pool.sd=!paired, paired=FALSE, alternative=c("two.sided", "less", "greater"), ...)

pairwise.wilcox.test(x, g, p.adjust.method=p.adjust.methods, paired=FALSE, ...)

poisson.test(x, T=1, r=1, alternative=c("two.sided", "less", "greater"), conf.level=0.95)

power.anova.test(groups=NULL, n=NULL, between.var=NULL, within.var=NULL, sig.level=0.05, power=NULL)

power.prop.test(n=NULL, p1=NULL, p2=NULL, sig.level=0.05, power=NULL, alternative=c("two.sided", "one.sided"), strict=FALSE)

power.t.test(n=NULL, delta=NULL, sd=1, sig.level=0.05, power=NULL, type=c("two.sample", "one.sample", "paired"), alternative=c("two.sided", "one.sided"), strict=FALSE)

PP.test(x, lshort=TRUE)

prop.test(x, n, p=NULL, alternative=c("two.sided", "less", "greater"), conf.level=0.95, correct=TRUE)

prop.trend.test(x, n, score=seq_along(x))

quade.test(y, ...)

shapiro.test(x)

t.test(x, y=NULL, alternative=c("two.sided", "less", "greater"), mu=0, paired=FALSE, var.equal=FALSE, conf.level=0.95, ...)

var.test(x, y, ratio=1, alternative=c("two.sided", "less", "greater"), conf.level=0.95, ...)

wilcox.test(x, y=NULL, alternative=c("two.sided", "less", "greater"), mu=0, paired=FALSE, exact=NULL, correct=TRUE, conf.int=FALSE, conf.level=0.95, ...)

For more information about any of the tests, enter ?function.name at the R prompt, where function.name is the name of the function, or use the Help tab in RStudio.
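The test functions return objects of class htest, whose components can be extracted by name. The sketch below runs three of the tests from Table 16-8 on simulated and made-up data; the seed, sample sizes, and counts are arbitrary assumptions.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
a <- rnorm(20, mean = 0)
b <- rnorm(20, mean = 1)

# Two-sample t test; with var.equal=FALSE (the default) this is Welch's test.
res <- t.test(a, b)
res$p.value
res$conf.int

# Exact binomial test: 7 successes in 20 trials against p = 0.5.
bt <- binom.test(7, 20, p = 0.5)
bt$estimate

# Pearson chi-squared test on a 2 x 2 table of made-up counts.
tab <- matrix(c(12, 5, 7, 9), nrow = 2)
ct <- chisq.test(tab)
ct$statistic
```

Printing res, bt, or ct at the prompt gives the usual formatted test summary.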

Some Modeling Functions in stats

There are a number of functions in stats that do modeling, including the following:
  • acf() to estimate autocorrelation and autocovariance in time series

  • acf2AR() to exactly fit an autoregressive model to an autocorrelation function

  • add1() and drop1() to find the single terms that can be added to or dropped from a model, fit the resulting models, and tabulate the results of the fitting

  • AIC() and BIC() to find Akaike’s ‘An Information Criterion’ or the Schwarz Bayesian criterion for a fitted model

  • aov() to fit an analysis of variance model

  • approx() and approxfun() to do linear interpolation

  • ar() to fit a time series autoregressive model

  • arima() to fit an autoregressive integrated moving average to time series data

  • arima.sim() to do simulations from an ARIMA model

  • ccf() to estimate cross correlation and cross covariance for two time series

  • complete.cases() to find complete cases for a sequence of vectors, matrices, or data frames

  • contrasts() to set or get contrasts for a factor object

  • cpgram() to plot a cumulative periodogram for time series data

  • decompose() to decompose seasonal patterns using moving average

  • density() for kernel density estimation

  • ecdf() for the empirical cumulative distribution function

  • fft() for the fast discrete Fourier transform of a vector or array; often used with time series data

  • filter() for linear filtering of time series

  • glm() to fit a generalized linear model

  • isoreg() for isotonic (monotone) regression

  • KalmanForecast(), KalmanLike(), KalmanRun(), KalmanSmooth(), and makeARIMA() for Kalman filtering

  • ksmooth() to smooth using a kernel smoother

  • line() to fit a line robustly—based on Tukey’s Exploratory Data Analysis

  • lm() to fit a linear model

  • loess() to fit a local polynomial model

  • loglin() to fit a loglinear model

  • lsfit() to fit a linear model by least squares, where the explanatory variables are given as a vector or matrix

  • manova() to fit multivariate analysis of variance models

  • medpolish() for a median polish of a matrix

  • mvfft() for fast discrete Fourier transforms of the columns of a matrix

  • nlm() to minimize a nonlinear function using a Newton-type algorithm

  • nls() to fit a nonlinear least squares model

  • optim(), optimHess(), optimise(), and optimize() to optimize a function

  • pacf() to estimate partial autocovariances and autocorrelations for a time series

  • poly() and polym() to create orthogonal polynomials of the desired degree

  • ppr() to fit a projection pursuit regression model

  • profile() to profile models—generic function

  • smooth.spline() to fit a smoothing spline

  • spectrum(), spec.pgram(), and spec.ar() to estimate the spectral density of a time series

  • step() to use the AIC to choose a model using a stepwise algorithm

  • stl() to use the loess method to seasonally decompose a time series

  • StructTS() to fit a structural time series model

  • supsmu() for Friedman’s super smoother

  • update() for updating a model

There are many functions in stats that support the modeling functions, which we do not cover. You can find more information at the help pages for the individual functions: enter ?function.name at the R prompt, where function.name is the name of the function, or use the Help tab in RStudio.
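To give a feel for how a few of the modeling functions fit together, the sketch below fits linear models to the mtcars data set from the datasets package. The choice of data set and of the variables mpg, wt, hp, qsec, and am is an assumption made only for illustration.

```r
# Fit a simple linear model, then add a term with update().
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- update(fit1, . ~ . + hp)
coef(fit2)

# Compare the two fits by AIC; returns a data frame with one row per model.
aic_tab <- AIC(fit1, fit2)
aic_tab

# Tabulate the effect of adding each candidate single term to fit1.
add1(fit1, scope = ~ wt + hp + qsec)

# A generalized linear model: logistic regression of am on wt.
gfit <- glm(am ~ wt, data = mtcars, family = binomial)
coef(gfit)
```

The formula . ~ . + hp in update() means "keep the response and the existing terms, and add hp".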

Clustering Algorithms and Other Multivariate Techniques

Some of the functions used in multivariate analysis for clustering and working with multivariate data are the following:
  • cmdscale() for classical multidimensional scaling

  • cophenetic() for cophenetic distances in hierarchical clustering

  • cut.dendrogram(), the cut() method for dendrograms, for cutting a dendrogram at a given height into an upper part and a list of lower branches

  • cutree() for cutting a tree into groups

  • dendrapply() to apply a function to all nodes of a dendrogram

  • as.dendrogram() to give an appropriate object the class dendrogram

  • factanal() for factor analysis

  • hclust() for hierarchical clustering

  • identify.hclust() to identify clusters

  • kmeans() for k means clustering

  • labels.dendrogram() gives the labels of the leaves of a dendrogram

  • loadings() for extracting and printing the loadings from a factor analysis or a principal components analysis

  • merge.dendrogram() merges two dendrograms

  • order.dendrogram() gives the ordering of the leaves of a dendrogram

  • prcomp() does principal components analysis

  • princomp() also does principal component analysis

  • promax() used for rotation of axes in factor analysis

  • reorder.dendrogram() for reordering a dendrogram maintaining the initial constraints

  • rev.dendrogram() reverses the order of the nodes in a dendrogram

  • str.dendrogram() displays the internal structure of a dendrogram

  • varimax() used for rotation of axes in factor analysis

For more information about any of the functions, enter ?function.name at the R prompt, where function.name is the name of the function, or use the Help tab in RStudio.
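The sketch below strings together some of the clustering and multivariate functions listed above, using the USArrests data set from the datasets package. The data set, the number of clusters (4), and the seed are arbitrary assumptions for illustration.

```r
# Standardize the columns, then compute Euclidean distances between rows.
z <- scale(USArrests)
d <- dist(z)

# Hierarchical clustering, cut into four groups.
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 4)
table(groups)

# Convert to a dendrogram and look at the leaf labels and leaf ordering.
dend <- as.dendrogram(hc)
head(labels(dend))
head(order.dendrogram(dend))

# Principal components analysis on the standardized variables.
pc <- prcomp(USArrests, scale. = TRUE)
summary(pc)

# k means clustering; the seed only makes the random starts reproducible.
set.seed(1)
km <- kmeans(z, centers = 4, nstart = 10)
table(km$cluster)
```

Here labels() returns the leaf labels in plotting order, while order.dendrogram() returns the corresponding row indices into the original data.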

The package stats also contains several probability distributions (see Chapter 9); eight as. functions; six is. functions; a number of plotting functions, such as heatmap() and 19 plot. functions, which are specific to many of the classes associated with modeling functions; functions used in kernel estimation; ancillary functions for models, such as the seven model. functions; seven na. functions to handle missing data; 13 predict. functions for model output; 27 print. functions for printing output; and nine summary. functions for summarizing output.

The graphics Package

The package graphics contains the function plot()—for which the many plot. methods are written. The ancillary functions for plot() are in graphics. There are also several plotting functions for specific types of plots—like histograms and bar charts. The list of links to the help pages for graphics is three pages long (help(package=graphics)). In this section, we cover the specific types of plots and a few other functions related to plotting.

Following are the functions in graphics that do specific types of plots:
  • assocplot() for a Cohen-Friendly association plot; used for contingency tables; will work with any matrix that is logical or numeric

  • barplot() for a bar plot; takes vector or matrix objects, which are of mode logical or numeric, for the heights of the bars

  • boxplot() for box plots; logical or numeric vectors, matrices, arrays, data frames, and some lists can be used as input to the function

  • bxp() for box plots of summaries

  • cdplot() for a conditional density plot

  • coplot() for scatter plots using a conditioning variable

  • dotchart() for a Cleveland’s dot plot; numeric vectors and matrices can be used for the plot

  • fourfoldplot() for a four fold plot of 2 x 2 x k contingency tables

  • hist() for histograms; gives histograms for numeric vectors, matrices, and arrays

  • mosaicplot() for mosaic plots; takes numeric or logical arguments that are vectors, matrices, data frames, or arrays; is meant for contingency tables

  • pairs() for scatter plots of paired variables; takes numeric vectors, matrices, and data frames as input; creates a matrix of plots

  • persp() for a perspective plot; does three-dimensional plotting

  • pie() for pie charts; use numeric vectors, matrices, and arrays as input

  • smoothScatter() for a smoothed version of scatter plots—which are colored; is copyrighted by M. P. Wand

  • spineplot() for spine plots; use a logical, numeric, or complex matrix as input to the plot; logical and complex matrices are coerced to numeric; was developed for two-way contingency tables

  • stars() for star or segment plots; use a numeric matrix or data frame for the input to the plot

  • stem() for a stem and leaf plot; use a numeric vector, matrix, or array as the input to the plot

  • stripchart() for a one dimensional scatter plot

  • sunflowerplot() for a sunflower plot, which is a scatter plot in which points with duplicates have sunflower leaves for the duplicated points; use a logical, numeric, or complex vector, matrix, or data frame for the input to the plot

There are also some functions in graphics that control the screen for plotting functions. The function split.screen() and its ancillary functions close.screen(), erase.screen(), and screen() are used to split the plotting screen into regions and to plot to the regions. The functions frame() and plot.new() open a new frame for plotting.

The function par() is like options(), except for plotting, and contains the default options for plots. The options can be changed at any time. If no graphics device is open, calling par() opens one. To see the list of options and their current values, call par() with no arguments.

The function plot() is the basic plotting function; it has a number of ancillary functions and has methods for quite a few classes. We do not cover plot() in this book.

You can find more information about the functions in graphics by entering ?function.name at the R prompt, where function.name is the name of the function, or by using the Help tab in RStudio.
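As a small sketch of par() together with two of the plot functions, the code below draws a histogram and box plots side by side for the mtcars data set. The call pdf(NULL), from the grDevices package, is used here only so that the example runs without opening a window or writing a file; in an interactive session it and the dev.off() call would normally be left out. The data set and labels are assumptions for illustration.

```r
pdf(NULL)                          # null PDF device; no file is written

# par() returns the previous values of the options it changes.
old <- par(mfrow = c(1, 2))        # one row, two columns of plots

# hist() draws the plot and invisibly returns the histogram object.
h <- hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "mpg")

par(old)                           # restore the previous options
invisible(dev.off())               # close the device

h$breaks
h$counts
```

Saving the value returned by par() and passing it back afterward is a common way to avoid leaving changed plotting options behind.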
