In this chapter, we take a quick look at the packages base, stats, and graphics, three of the packages loaded by default in R. The package base contains things such as the trigonometric functions and other mathematical functions, many of the as. and is. functions, the arithmetic operators, the flow control statements, some apply functions, and many other basic functions in R.
The package stats contains many basic statistical functions, such as functions to find the median, the standard deviation, and the variance. It also includes the functions associated with common probability distributions as well as many more statistical functions. The package graphics contains the basic plotting functions and their ancillary functions.
The other packages loaded by default are datasets, which contains data sets; utils, which contains utility functions; grDevices, which contains information used in plotting—such as fonts and colors; and methods, which contains functions and information for working with S4 (formal) methods and classes.
For a list of the functions in a package with clickable links to the function help pages, enter help(package="package.name") or library(help=package.name) at the R prompt, where package.name is the name of the package.
The source of the information in this chapter is the CRAN help pages.
The base Package
The base package contains many functions basic to R. The list of links to the help pages for base is 30 pages long. This section covers the reserved words, the built-in constants, the trigonometric and hyperbolic functions, the functions related to the beta and gamma functions, some other mathematical functions, and functions for complex numbers, matrix functions, and a few other functions. It also discusses some other functions for the package base.
Reserved Words
The Reserved Words in R
if | else | repeat | while | for |
---|---|---|---|---|
in | next | break | function | TRUE |
FALSE | Inf | NULL | NA | NaN |
NA_integer_ | NA_real_ | NA_complex_ | NA_character_ | |
‘...’ | ‘..1’ | ‘..2’ | and so on | ‘..n’ |
For more information, enter ?Reserved at the R prompt or use the Help tab in RStudio.
Built-In Constants
The Built-In Constants in R
Constants | Description |
---|---|
LETTERS | the 26 capital letters |
letters | the 26 lowercase letters |
month.abb | the 12 names of the months abbreviated to three letters |
month.name | the 12 names of the months |
pi | π; 1/2 the circumference of a unit circle |
You can find more information about the constants by entering ?Constants at the R prompt or by using the Help tab in R.
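As a brief illustration, the constants behave like ordinary vectors and numbers and can be indexed or used in arithmetic. The variable names below are chosen only for this sketch.

```r
# The built-in constants are ordinary objects in base.
first_three   <- LETTERS[1:3]        # "A" "B" "C"
last_letter   <- letters[26]         # "z"
spring        <- month.abb[3:5]      # "Mar" "Apr" "May"
n_months      <- length(month.name)  # 12
circumference <- 2 * pi              # circumference of a unit circle

print(first_three)
print(spring)
print(circumference)
```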
Trigonometric and Hyperbolic Functions
The trigonometric and hyperbolic functions available in R are the cosine—cos(), the cosine for which pi has been accounted—cospi(), the sine—sin(), the sine for which pi has been accounted—sinpi(), the tangent—tan(), the tangent for which pi has been accounted—tanpi(), the inverse cosine—acos(), the inverse sine—asin(), two versions of the inverse tangent—atan() and atan2(), the hyperbolic cosine—cosh(), the hyperbolic sine—sinh(), the hyperbolic tangent—tanh(), the inverse hyperbolic cosine—acosh(), the inverse hyperbolic sine—asinh(), and the inverse hyperbolic tangent—atanh().
Angles are entered into the functions in radians (radians = pi/180 x degrees), except for cospi(), sinpi(), and tanpi(), for which angles are entered as multiples of π radians (that is, an argument of one equals 180°, so cospi(x) equals cos(pi*x)). For the inverse functions, the angles are returned in radians (degrees = 180/pi x radians). The arguments must be of an atomic mode and logical, numeric, or complex, except for cospi(), sinpi(), and tanpi(), whose arguments cannot be complex. Logical values are coerced to numeric.
For the inverse cosine and sine, the values must be between –1 and 1, inclusive. For other values, the result is NaN. For the inverse tangent, atan() takes one argument, and the result falls between –π/2 and π/2.
The function atan2() takes two arguments. The function returns the inverse tangent of the ratio of the two arguments, with the first argument being the numerator and the second the denominator. The function takes any number (real or complex) for the numerator and any number (real or complex) as the denominator. The arguments can be of different lengths and will cycle.
The function atan2() returns results between -π and π. The quadrant of the angle depends on signs of the numerator and the denominator, that is: (+,+) first quadrant; (+,–) second quadrant; (–,–) third quadrant; and (–,+) fourth quadrant. (By definition, the tangent of x, for any number x, is the sine of x divided by the cosine of x.) Zero in the denominator returns π/2 or –π/2 depending on the sign of the numerator.
The hyperbolic functions can also take on any number (real or complex). For the inverse of the hyperbolic functions, the argument for acosh() must be between 1 and ∞, inclusive, and the argument for atanh() must be between –1 and 1, inclusive.
Arguments can be vectors, matrices, data frames, or arrays. For arguments with more than one element, the operation is carried out element-wise. For atan2(), which takes two arguments, the arguments cycle. The functions return an object of the same dimensions as the argument(s) to the function.
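The points above about units and quadrants can be checked with a short sketch. The variable names are illustrative, and the warning from acos(2) is suppressed only to keep the output clean.

```r
deg <- c(0, 90, 180)
rad <- deg * pi / 180            # convert degrees to radians

cos_rad <- cos(rad)              # at 90 degrees the result is tiny but not exactly 0
cos_pi  <- cospi(deg / 180)      # argument is in multiples of pi; gives exactly 1, 0, -1

# atan2(y, x) uses the signs of both arguments to pick the quadrant.
y <- c(1,  1, -1, -1)
x <- c(1, -1, -1,  1)
quadrant_deg <- atan2(y, x) * 180 / pi   # 45, 135, -135, -45
one_arg_deg  <- atan(y / x) * 180 / pi   # 45, -45, 45, -45: quadrant is lost

# Values outside [-1, 1] are not valid for the real inverse cosine.
out_of_range <- suppressWarnings(acos(2))   # NaN

print(cos_pi)
print(quadrant_deg)
print(one_arg_deg)
```

Comparing quadrant_deg with one_arg_deg shows why atan2() is usually preferred when the signs of the numerator and denominator both carry information.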
The Trigonometric and Hyperbolic Functions
Function | R Function | Restrictions |
---|---|---|
cosine | cos(x) | logical, numeric, or complex; logical coerced to numeric |
sine | sin(x) | see cosine |
tangent | tan(x) | see cosine |
cosine with pi | cospi(x) | logical or numeric; logical coerced to numeric |
sine with pi | sinpi(x) | see cosine with pi |
tangent with pi | tanpi(x) | see cosine with pi |
inverse cosine | acos(x) | -1 ≤ x ≤ 1 |
inverse sine | asin(x) | see inverse cosine |
inverse tangent | atan(x) | see cosine |
“” | atan2(y,x) | see cosine; inverse of tangent of y divided by x; maintains quadrant information |
hyperbolic cosine | cosh(x) | see cosine |
hyperbolic sine | sinh(x) | see cosine |
hyperbolic tangent | tanh(x) | see cosine |
inverse hyperbolic cosine | acosh(x) | 1 ≤ x ≤ ∞ |
inverse hyperbolic sine | asinh(x) | see cosine |
inverse hyperbolic tangent | atanh(x) | -1 ≤ x ≤ 1 |
You can find more information about the trigonometric functions by entering ?Trig at the R prompt and about the hyperbolic functions by entering ?cosh at the R prompt, or by using the Help tab in RStudio.
Beta- and Gamma-Related Functions
The functions related to the beta and gamma functions are beta(), lbeta(), gamma(), lgamma(), psigamma(), digamma(), trigamma(), choose(), lchoose(), factorial(), and lfactorial(). In R, these functions are the Special functions. The arguments to these functions must be of the atomic mode and logical (which are coerced to numeric) or numeric. The function returns a result in the same form as the argument (the same dimensions). Arguments cycle.
The beta() and lbeta() functions take the arguments a and b, both of which must be non-negative, and return the value of the beta function or the natural logarithm of the value of the beta function, respectively. Negative numbers return NaN, with a warning.
The gamma(), lgamma(), psigamma(), digamma(), and trigamma() functions take the argument x, and for psigamma(), the argument deriv. The argument x can be any number, except for zero or the negative integers, for which NaNs are returned, with a warning (lgamma() returns Inf instead). The functions gamma() and lgamma() return the value of the gamma function and the natural logarithm of the absolute value of the gamma function, respectively. The function digamma() returns the value of the first derivative of the natural logarithm of the gamma function while trigamma() returns the second derivative. The function psigamma() returns the derivative of the digamma function to the order given by deriv, so psigamma(x, 0) equals digamma(x) and psigamma(x, 1) equals trigamma(x). The argument deriv must be an integer greater than or equal to zero. Otherwise, NaNs are returned, with a warning. By default, deriv equals zero.
The functions choose() and lchoose() return binomial coefficients and the natural logarithms of the absolute values of binomial coefficients, respectively. Both functions take the arguments n, which can be any real number, and k, which can be any real number and is rounded to an integer. Negative rounded numbers for k return 0. The function choose() is the familiar “n choose k” for n a positive integer and k a non-negative integer less than or equal to n.
The function factorial(x) equals gamma(x+1) for any value of x and equals x! (that is, (x)(x-1)(x-2)...(2)(1)) for positive integer values of x. For x equal to zero, factorial(x) equals one. Negative integers return NaNs, with a warning. The function lfactorial(x) equals lgamma(x+1).
The Beta, Gamma, and Related Functions
Function | Function in R | Arguments |
---|---|---|
beta | beta(a, b) | a, b; both numeric and ≥ 0 |
natural log beta | lbeta(a, b) | see beta |
gamma | gamma(x) | x, any real number; zero and negative integers return NaN |
natural log of absolute value of gamma | lgamma(x) | x, any real number; zero and negative integers return Inf |
nth derivative of the digamma function, where deriv equals n (digamma is the first derivative of natural log of gamma function) | psigamma(x, deriv=0) | x, any real number; deriv, an integer ≥ 0; returns NaNs where not defined |
first derivative of natural log of gamma function | digamma(x) | x, any real number; returns NaNs where not defined |
second derivative of natural log of gamma function | trigamma(x) | see digamma |
binomial coefficients | choose(n, k) | n, any real number; k, any real number, rounded to the nearest integer; negative integers for k return 0 |
natural log absolute value binomial coefficients | lchoose(n, k) | see binomial coefficients |
factorial | factorial(x) | x, any real number; factorial(x) equals gamma(x+1); negative integers return NaN |
natural log absolute value factorial | lfactorial(x) | x, any real number; lfactorial(x) equals lgamma(x+1); negative integers return Inf |
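The relationships among these functions can be verified numerically. The following is a minimal sketch with illustrative variable names; the comparison with pi^2/6 relies on the known value of trigamma(1).

```r
# gamma() and factorial(): factorial(x) equals gamma(x + 1)
g5 <- gamma(5)          # 24
f4 <- factorial(4)      # 24

# beta(a, b) equals gamma(a) * gamma(b) / gamma(a + b)
b23  <- beta(2, 3)      # 1/12
lb23 <- lbeta(2, 3)     # log(1/12)

# Binomial coefficients and their logarithms
c52  <- choose(5, 2)    # 10
lc52 <- lchoose(5, 2)   # log(10)

# psigamma() with deriv = 0 is digamma(); with deriv = 1 it is trigamma()
d1  <- digamma(1)
ps0 <- psigamma(1, deriv = 0)
ps1 <- psigamma(1, deriv = 1)
t1  <- trigamma(1)      # pi^2 / 6

# The log versions stay finite where the plain versions overflow
big_log <- lgamma(200)                      # finite
big_raw <- suppressWarnings(gamma(200))     # Inf

print(c(g5, f4, c52))
print(c(d1, ps0, ps1, t1))
```

Working with lgamma(), lbeta(), lchoose(), and lfactorial() is a common way to avoid overflow when the arguments are large.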
Miscellaneous Mathematical Functions
abs() for the absolute values of the elements of an object
sqrt() for the square roots of the elements of an object
ceiling() for rounding the elements of an object up to an integer
floor() for rounding the elements of an object down to an integer
trunc() for truncating the elements of an object to the decimal point
cummax() for the cumulative maximum over an atomic object
cummin() for the cumulative minimum over an atomic object
cumprod() for the cumulative product over an atomic object
cumsum() for the cumulative sum over an atomic object
exp() for e to the powers of the elements of an object
log(), log10(), and log2() for the logarithms of the elements of an object for a specified base (defaults to the natural logarithm), base 10, and base 2, respectively
max() for the maximum of the elements in an object, can be character
min() for the minimum of the elements in an object, can be character
pmax() for multiple vectors or matrices (shorter objects cycle); returns the element-by-element (parallel) maximum across the objects
pmin() for multiple vectors or matrices (shorter objects cycle); returns the element-by-element (parallel) minimum across the objects
sum() for the sum of the elements of an object
prod() for the product of the elements of an object
mean() for the mean of the elements of an object
range() for the range of the elements of an object
rank() for the ranks of the elements of an object
sign() for the signs of the elements of an object—returns 1 for positive numbers, –1 for negative numbers, and 0 for zeroes
order() for indices giving the order of the elements of an object; with more than one object, the order of the first object, using the second object for ties, and so forth; used to reorder vectors, matrices, data frames, and arrays; x[order(x)] equals sort(x)
sort() for sorting the elements of objects
zapsmall() for setting very small numbers to zero
Atomic vectors, matrices, arrays, and data frames of the legal modes can be used for these functions. The results of these functions are various types of objects, depending on the function.
Some Other Mathematical Functions
Function in R | Restrictions |
---|---|
abs(x) | logical, numeric, or complex objects; logical coerced to numeric; returns object of same dimensions |
sqrt(x) | see abs(); negative real numbers return NaN |
ceiling(x) | logical or numeric object; logical coerced to numeric; returns object of same dimensions |
floor(x) | see ceiling() |
trunc(x, ...) | x, logical or numeric object; logical coerced to numeric; returns object of same dimensions. ..., any arguments to be passed on to lower level functions called by trunc() |
cummax(x) | raw, logical, numeric, or character object; will be coerced to numeric; character objects that are not a number in quotes return NAs; returns vector |
cummin(x) | see cummax() |
cumsum(x) | see cummax() |
cumprod(x) | see cummax() |
exp(x) | logical, numeric, or complex object; logical coerced to numeric; returns object of same dimensions |
log(x, base=exp(1)) | x, logical, numeric, or complex object; logical coerced to numeric; x ≥ 0; 0’s return –Inf; negative real numbers return NaN; returns object of same dimensions. base, the base for the logarithm; numeric or complex (logical is legal but returns Inf for T and 0 for F); base ≥ 0 |
log2(x) | logical, numeric, or complex; logical coerced to numeric; x ≥ 0; 0’s return –Inf; negative real numbers return NaN; returns object of same dimensions |
log10(x) | see log2() |
max(..., na.rm=FALSE) | ..., logical, numeric, and character objects separated by commas; do not need to be of the same length; can mix modes; returns a single value. na.rm, logical; if an NA is present and na.rm is set to FALSE returns NA, if TRUE ignores the NA |
min(..., na.rm=FALSE) | see max() |
pmax(..., na.rm=FALSE) | ..., logical, numeric, and character objects separated by commas; do not need to be of the same length (shorter objects cycle); can mix modes; returns a vector or matrix. na.rm, logical; if an NA is present and na.rm is set to FALSE returns NA, if TRUE ignores the NA |
pmin(..., na.rm=FALSE) | see pmax() |
sum(..., na.rm=FALSE) | ..., logical, numeric, and complex objects separated by commas; can mix modes; returns a single value. na.rm, logical; if an NA is present and na.rm is set to FALSE returns NA, if TRUE ignores the NA; NaN similar but are treated differently for complex numbers |
prod(..., na.rm=FALSE) | see sum() |
mean(x, trim=0, na.rm=FALSE, ...) | x, logical, numeric, or complex object; returns a single value; for complex trim must equal zero. trim, 0 ≤ trim ≤ .5; is proportion of elements to trim from each end before taking the mean. na.rm, logical; if an NA is present and na.rm is FALSE returns NA, if TRUE ignores NA; NaN the same. ..., any arguments to be passed to lower level functions called by mean() |
range(..., na.rm=FALSE) | ..., logical, numeric, and character objects separated by commas; can mix modes; returns two values. na.rm, logical; if an NA is present and na.rm is set to FALSE returns NA, if TRUE ignores the NA; NaN the same |
rank(x, na.last=TRUE, ties.method=c( "average", "first", "random", "max", "min")) | x, logical, numeric, complex, or character object; returns a vector. na.last, logical or character; if TRUE, NAs and NaNs are ranked last, if FALSE they are first, if NA they are discarded, if “keep” they keep their place in the order; NaNs return NAs. ties.method, character; method for setting a value for ties; the default is “average” |
sign(x) | logical or numeric object; returns object of same dimensions |
order(..., na.last=TRUE, decreasing=FALSE) | ..., logical, numeric, complex or character vectors of the same length (can use just one vector; can mix modes); returns a permutation of indices of length equal to the length of the vector(s). na.last, logical; for TRUE NAs are placed last, for FALSE NAs first, for NA NAs are removed. decreasing, logical; must be TRUE or FALSE; if TRUE order is decreasing, if FALSE increasing |
sort(x, decreasing=FALSE, na.last=NA, ...) | x, logical, numeric, complex, or character object; sorts real and imaginary parts of complex separately; returns a vector. decreasing, logical; if TRUE sorts in decreasing order, if FALSE increasing; must be TRUE or FALSE. na.last, logical; if TRUE, NAs are put last, if FALSE, they are put first, if NA they are discarded; NaNs are put last. ..., any arguments to be passed on to lower level functions called by sort() |
zapsmall(x, digits=getOption("digits")) | x, logical, numeric, or complex object; returns object of same dimensions. digits, numeric; will round to an integer |
You can find more information about any of these functions by going to the help page of the function (?function.name, where function.name is the name of the function) or by using the Help tab in RStudio.
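A few of the functions in the table are easy to confuse, so a small sketch may help. The vector x and the variable names are chosen only for illustration.

```r
x <- c(3, 1, 4, 1, 5)

running_sum <- cumsum(x)    # 3 4 8 9 14
running_max <- cummax(x)    # 3 3 4 4 5

# max() collapses everything to one value; pmax() works element by element.
overall_max  <- max(x, 2)   # 5
parallel_max <- pmax(x, 2)  # 3 2 4 2 5 (the 2 is recycled)

# order() gives indices, sort() gives values, rank() gives positions.
idx    <- order(x)          # 2 4 1 3 5
sorted <- sort(x)           # 1 1 3 4 5
ranks  <- rank(x)           # 3.0 1.5 4.0 1.5 5.0 (ties averaged by default)

# Rounding functions differ for negative numbers.
v <- c(-1.7, 1.7)
truncated <- trunc(v)       # -1 1
floored   <- floor(v)       # -2 1
ceiled    <- ceiling(v)     # -1 2

tiny_removed <- zapsmall(c(1, 1e-20))            # 1 0
mean_no_na   <- mean(c(1, 2, NA), na.rm = TRUE)  # 1.5

print(parallel_max)
print(idx)
print(ranks)
```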
Complex Numbers
Re(), the real part of a complex number
Im(), the imaginary part of a complex number
Arg(), the angle from the x axis in radians of the line between the origin and the complex number
Mod(), the modulus of a complex number; equals the length of the line between the origin and the complex number
Conj(), the complex conjugate of a complex number
The functions take logical, numeric, and complex objects for arguments. Logical arguments are coerced to numeric. The result has the same dimensions as the argument.
You can find more information about the complex functions by entering ?Re at the R prompt or by using the Help tab in RStudio.
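The five functions can be tried on a single complex number. This is only a sketch; the number 3+4i is chosen because its modulus is a whole number.

```r
z <- complex(real = 3, imaginary = 4)   # 3+4i

re_part <- Re(z)      # 3
im_part <- Im(z)      # 4
modulus <- Mod(z)     # 5, the distance from the origin
angle   <- Arg(z)     # angle in radians, equal to atan2(4, 3)
z_conj  <- Conj(z)    # 3-4i

# A number times its conjugate is the squared modulus (with zero imaginary part).
squared <- z * z_conj # 25+0i

print(c(re_part, im_part, modulus))
print(angle)
print(z_conj)
```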
Matrices, Arrays, and Data Frames
There are a number of functions for matrices, arrays, and data frames in base that we have not yet covered.
aperm(), which permutes an array
rowsum(), which sums over rows of a matrix or data frame in groups set by the group variable
colMeans(), which returns the means of the columns of a data frame or matrix or the means for given dimensions for an array—going from the first dimension to the specified dimension
colSums(), which returns the sums of the columns of a data frame or matrix or the sums for an array—going from the first dimension to the specified dimension
rowMeans(), which returns the means of the rows of a data frame or matrix or the means over dimensions of an array, going from the specified dimension plus one to the last dimension
rowSums(), which returns the sums of the rows of a data frame or matrix or the sums over dimensions of an array, going from the specified dimension plus one to the last dimension
col(), which returns a matrix of the same dimensions as the argument and which contains the column indices in the columns or a matrix of factors with each column one factor
row(), which returns a matrix of the same dimensions as the argument and which contains the row indices in the rows or a matrix of factors with each row one factor
det(), which returns the determinant of a matrix
determinant(), which returns the modulus or the logarithm of the modulus of the determinant and the sign of the determinant
eigen(), which returns the eigenvalues and eigenvectors of a matrix
kappa(), which calculates the condition of a square matrix
kronecker(), which returns the matrix or array that is the Kronecker product of two objects, where the product is taken using a specified function. The two objects can be vectors, matrices, and/or arrays. The dimensions of the result are the products of the dimensions of the two objects.
norm(), which returns the norm of a matrix calculated by the one, infinity, Frobenius, maximum modulus, or spectral (or 2) method
backsolve(), which solves a matrix equation where the matrix on the left of the equation is upper triangular
forwardsolve(), which solves a matrix equation where the matrix on the left of the equation is lower triangular
chol(), the Choleski decomposition of a square positive definite matrix
chol2inv(), the inverse of a positive definite matrix using the Choleski decomposition of the matrix
qr(), the QR decomposition of a matrix
svd(), a singular value decomposition of a matrix.
Some Functions for Matrices, Arrays, and Data Frames
Function in R | Restrictions |
---|---|
aperm(a, perm=NULL, resize=TRUE, ...) | a, matrix or array. perm, NULL, integer or character vector; gives order of the dimensions by index or character string; if not NULL must be of length equal to the number of dimensions of a and a permutation of the dimensions of a; NULL returns the dimensions reversed. resize, logical; must be TRUE or FALSE. ..., any arguments to be passed to lower level functions |
rowsum(x, group, reorder=TRUE, na.rm=FALSE, ...) | x, any numeric matrix. group, a vector or factor of length equal to the number of rows in x, used for grouping. reorder, logical; must be TRUE or FALSE. na.rm, logical; must be TRUE or FALSE. ..., any arguments to be passed to or from lower level functions |
colMeans(x, na.rm=FALSE, dims=1) | x, logical, numeric or complex matrix, data frame, or array. na.rm, logical; must be TRUE or FALSE. dims, numeric; 1 ≤ dims ≤ n-1, where n is the number of dimensions |
colSums(x, na.rm=FALSE, dims=1) | see colMeans() |
rowMeans(x, na.rm=FALSE, dims=1) | see colMeans() |
rowSums(x, na.rm=FALSE, dims=1) | see colMeans() |
col(x, as.factor=FALSE) | x, any matrix. as.factor, logical; must be TRUE or FALSE |
row(x, as.factor=FALSE) | see col() |
det(x, ...) | x, a logical or numeric square matrix; logical coerced to numeric. ..., ignored |
determinant(x, logarithm=TRUE, ...) | x, a logical or numeric square matrix; logical coerced to numeric. logarithm, logical; must be TRUE or FALSE. ..., ignored |
eigen(x, symmetric, only.values=FALSE, EISPACK=FALSE) | x, a logical, numeric, or complex square matrix; logical coerced to numeric. symmetric, logical; if TRUE matrix is assumed symmetric, if FALSE not. only.values, logical; if TRUE only eigenvalues are returned, if FALSE both eigenvalues and eigenvectors are returned. EISPACK, logical; defunct and ignored |
kappa(z, exact=FALSE, norm=NULL, method= c("qr", "direct"), ...) | z, logical or numeric square matrix; logical coerced to numeric. exact, logical; must be TRUE or FALSE. norm, character; must be NULL, “O”, or “I” (for the one norm and the infinity norm). method, character; must be “qr” or “direct”; default is “qr”. ..., any arguments to lower level functions |
kronecker(X, Y, FUN="*", make.names=FALSE, ...) | X, Y, vectors, matrices, and arrays; do not have to be of the same mode; must be legal for the function FUN. FUN, a function; can be a character string. make.names, logical; must be TRUE or FALSE; does not work with all functions. ..., any arguments for the function FUN |
norm(x, type= c("O","I","F","M","2")) | x, logical, numeric, or complex matrix; logical and complex are coerced to numeric. type, character; default value is “O” |
backsolve(r, x, k=ncol(r), upper.tri=TRUE, transpose=FALSE) | r, upper triangular matrix of mode logical, numeric, or complex (logical and complex values are coerced to numeric). x, vector or matrix of mode logical, numeric, or complex (logical and complex values are coerced to numeric). k, numeric, rounds down to an integer; 1 ≤ k ≤ ncol(r); is the number of columns in ‘r’ to use. upper.tri, logical; for TRUE the upper triangle is used, for FALSE, the lower is used. transpose, logical; for TRUE r is transposed in the formula |
forwardsolve(l, x, k=ncol(l), upper.tri=FALSE, transpose=FALSE) | l, lower triangular matrix of mode logical, numeric, or complex (logical and complex values are coerced to numeric). x, a vector or matrix of mode logical, numeric, or complex (logical and complex values are coerced to numeric). k, numeric, rounds down to an integer; 1 ≤ k ≤ ncol(l); the number of columns in ‘l’ to use. upper.tri, logical; for TRUE the upper triangle is used, for FALSE, the lower is used. transpose, logical; for TRUE l is transposed in the formula |
chol(x, pivot=FALSE, LINPACK=FALSE, tol=-1, ...) | x, raw, logical, or numeric matrix, where raw and logical matrices are coerced to numeric; must be square and positive definite. pivot, logical; for TRUE pivot, FALSE do not pivot. LINPACK, (deprecated) logical; for TRUE use LINPACK, FALSE do not use LINPACK. tol, numeric; tolerance when pivot=TRUE and LINPACK=FALSE. ..., any arguments to be passed to lower level functions |
chol2inv(x, size=NCOL(x), LINPACK=FALSE) | x, matrix for which the first size columns are a Choleski decomposition. size, numeric, logical, or complex (logical and complex coerced to numeric); 1 ≤ size ≤ ncol(x). LINPACK, logical; defunct and no longer used |
qr(x, tol=1e-7, LAPACK=FALSE, ...) | x, logical, numeric, or complex matrix; logical matrices are coerced to numeric. tol, numeric; tolerance for singularity. LAPACK, logical; if FALSE qr() uses LINPACK. ..., any arguments to be passed to lower level functions |
svd(x, nu=min(n,p), nv=min(n,p), LINPACK=FALSE) | x, logical, numeric, or complex matrix; logical matrices are coerced to numeric. nu, integer; 0 ≤ nu ≤ n; n = nrow(x). nv, integer; 0 ≤ nv ≤ p; p = ncol(x). LINPACK, logical; defunct and ignored |
You can find more information by going to the individual help pages (?function.name, where function.name is the name of the function) or by using the Help tab in RStudio.
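To show how several of these functions fit together, here is a minimal sketch that summarizes a small matrix and then solves a linear system through the Choleski decomposition. The matrices m and A and the vector b are made up for the example.

```r
# Column and row summaries of a 2 x 3 matrix (filled column by column).
m <- matrix(1:6, nrow = 2)      # rows are (1, 3, 5) and (2, 4, 6)
col_totals <- colSums(m)        # 3 7 11
row_means  <- rowMeans(m)       # 3 4

# aperm() with perm = NULL reverses the dimensions of an array.
arr <- array(1:24, dim = c(2, 3, 4))
perm_dims <- dim(aperm(arr))    # 4 3 2

# A symmetric positive definite matrix.
A <- matrix(c(4, 2, 2, 3), nrow = 2)
det_A <- det(A)                 # 8

# Choleski factor: upper triangular R with t(R) %*% R equal to A.
R <- chol(A)
A_inv <- chol2inv(R)            # inverse of A computed from R

# Solve A x = b in two triangular steps: t(R) y = b, then R x = y.
b <- c(2, 1)
y <- forwardsolve(t(R), b)
x <- backsolve(R, y)

# Kronecker product of a 2 x 2 identity and a 1 x 2 matrix of ones: 2 x 4.
K <- kronecker(diag(2), matrix(1, nrow = 1, ncol = 2))

print(col_totals)
print(x)
print(K)
```

The two-step triangular solve is one common way to use chol() with backsolve() and forwardsolve(); solve(A, b) would give the same answer directly.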
A Few Other Functions and Some Comments
A few other functions that are often useful are R.home(), R.Version(), all.equal(), identical(), dir(), getwd(), setwd(), unique(), hexmode(), jitter(), append(), duplicated() (and anyDuplicated()), attr() (and attributes()), pretty(), margin.table(), prop.table(), cut(), rev(), readline(), system(), try(), warnings(), stop(), and gc(). For these functions, we will just describe what they do. You can find more information about a function by entering ?function.name at the R prompt, where function.name is the name of the function, or by using the Help tab in RStudio.
R.home() gives the full path to the directory containing the R program.
R.Version() gives the R version and other information about the version.
all.equal() tests if two objects are nearly equal.
identical() tests if two objects are exactly equal.
dir() returns the contents of a directory on the hard drive.
getwd() returns the working directory on the hard drive.
setwd() sets the working directory on the hard drive.
unique() returns a vector with any duplicated elements in the original vector removed. The function works on vectors, including vectors of mode list; for matrices and data frames, duplicated rows are removed.
hexmode() returns the hexadecimal value of a number.
jitter() adds a little jitter (noise) to the elements of numeric objects. The arguments to jitter() control how much jitter is added.
append() is used to append vectors. An argument to append() gives where along the vector the appending is done.
duplicated() and anyDuplicated() look for duplicates. For vectors, including lists, duplicated() returns a vector of the same length containing FALSE for elements that are not duplicated and for the first instance of elements that are duplicated. The function returns TRUE for the rest of the duplicates. For matrices and data frames, rows are compared. The function anyDuplicated() returns the index of the first duplicated element (or the first duplicated row for matrices and data frames), or zero if there are no duplicates.
attr() and attributes() return an attribute or a list of the attributes of an object. The function attr() returns the value of a single named attribute, which can then be used like any other value. To see a list of all of the attributes of an object, use attributes().
pretty() takes any object that can be coerced to numeric and returns a vector of evenly spaced values close to a given length and similar to the values in the original object.
margin.table() takes a logical, numeric, or complex object and returns margin sums for a margin in a table.
prop.table() takes a logical, numeric, or complex object and returns the object divided by the sum of the elements in the object. Logical objects are coerced to numeric and the real and imaginary parts of complex objects are treated separately.
cut() divides a numeric vector into intervals and returns a factor whose levels name the interval into which each original element falls. The object to be cut can be any object that can be coerced to a vector, but must be numeric. The break points and level names can be assigned; if only the number of intervals is given, cut() chooses the break points, and by default it creates the level names from the break points.
rev() reverses the order of the elements of an object and returns a vector. The object can be atomic or of any mode where reversing the order makes sense, like the modes list, expression, and call.
readline() reads a line from the console—for interactive use of an R function.
system() runs a system command from inside R—the command is entered in quotes.
try() attempts to execute an expression or function and returns either the result of the execution or an error message. Errors do not stop the program.
warnings() returns the warning messages if a program has run with warnings.
stop() tells R to stop the execution of a function. If stop() has a character string for an argument, the character string prints when stop() executes. The function is very useful for the process of debugging a function as well as for checking if conditions are met for objects entered into a function.
gc() runs garbage collection, which frees memory that is no longer in use in the session.
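Several of the functions described above can be seen together in a short sketch. The data and the helper safe_sqrt() are hypothetical and exist only for this illustration.

```r
# all.equal() tolerates tiny floating-point differences; identical() does not.
a <- 0.1 + 0.2
nearly_equal  <- isTRUE(all.equal(a, 0.3))   # TRUE
exactly_equal <- identical(a, 0.3)           # FALSE

# Finding and removing duplicates.
v <- c("b", "a", "b", "c", "a")
uniq      <- unique(v)          # "b" "a" "c"
dup_flags <- duplicated(v)      # FALSE FALSE TRUE FALSE TRUE
first_dup <- anyDuplicated(v)   # 3, the index of the first duplicate

# Reversing and appending.
reversed <- rev(1:3)                      # 3 2 1
inserted <- append(1:3, 99, after = 1)    # 1 99 2 3

# cut() returns a factor; intervals here are (0, 18] and (18, 65].
ages   <- c(5, 17, 30, 64)
groups <- cut(ages, breaks = c(0, 18, 65), labels = c("minor", "adult"))

# stop() raises an error; try() lets the script continue past it.
safe_sqrt <- function(x) {      # hypothetical helper for this sketch
  if (!is.numeric(x)) stop("x must be numeric")
  sqrt(x)
}
good <- try(safe_sqrt(16), silent = TRUE)       # 4
bad  <- try(safe_sqrt("four"), silent = TRUE)   # object of class "try-error"

print(dup_flags)
print(groups)
print(inherits(bad, "try-error"))
```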
There are many other functions in base, many of which have to do with the running of R. The as. and is. functions are prevalent. In the list of help pages, there are 110 links for as. functions and 46 links for is. functions. If you are interested in what is in the listings, go to the page of the links and look at what is there. The Bessel functions and bitwise logical functions are also part of base.
The stats Package
The stats package contains items such as basic descriptive statistics, probability distributions, tests, functions to fit models, clustering functions, some plotting functions, and other functions used for outputting results. The list of links to the help pages for stats is 18 pages long (help(package=stats)). In this chapter, we cover the basic descriptive statistics, the tests, clustering and other functions for multivariate data, and modeling functions, but in little detail. The probability distributions can be found in Chapter 9.
Basic Descriptive Statistics
weighted.mean(), which finds the weighted mean of an object
sd(), which finds the standard deviation of an object
var(), which finds the variance of a vector or the covariance matrix of a matrix or data frame
cov(), which finds the covariance matrix of a matrix or data frame—more flexible than var()
cov.wt(), which finds the weighted covariance or correlation matrix of a matrix or data frame
cor(), which finds the correlation between vectors or within matrices and data frames
median(), which finds the median of the elements of an object
mad(), which finds the median absolute deviation of the elements of an object
IQR(), which finds the interquartile range of the elements of an object
quantile(), which finds specific quantiles of the elements in an object
fivenum(), which finds Tukey’s five-number summary for the elements in an object
ave(), which uses a function to operate on different rows of an object based on factor values
cancor(), which finds the canonical correlation between two matrices
dist(), which finds a type of average difference between the rows of a matrix, based on the type of distance and the power used to find the average
mahalanobis(), which finds the squared Mahalanobis distance of each row of a matrix from a given center, for a given covariance matrix
ecdf(), which finds the empirical cumulative distribution function of the elements in an object—a quantile method exists for the function
r2dtable(), which creates a random two-way table based on marginal values—using Patefield’s algorithm
simulate(), which simulates observations from a model that has been fitted
TukeyHSD(), which finds confidence intervals for the differences between the means of the levels of a factor, taking into account that more than one hypothesis is being tested; used for analysis of variance models
xtabs(), which creates a contingency table based on a formula
smooth(), which creates a smoother version of a noisy set of data using Tukey’s running median smoothers—usually used for time series
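A few of the descriptive functions in the list can be compared on one small vector. This is only a sketch; the data are made up, and the values in the comments follow from the default arguments (for example, type 7 quantiles).

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

x_var    <- var(x)       # 32/7: sum of squared deviations divided by n - 1
x_sd     <- sd(x)        # square root of the variance
x_median <- median(x)    # 4.5

# quantile() and fivenum() use different rules, so the quartiles can differ.
x_quart <- quantile(x, probs = c(0.25, 0.75))   # 4 and 5.5
x_iqr   <- IQR(x)                               # 1.5
x_five  <- fivenum(x)                           # 2 4 4.5 6 9

# Median absolute deviation, scaled by the default constant 1.4826.
x_mad <- mad(x)          # 1.4826 * 0.5

# Weighted mean: (1*3 + 2*1 + 3*1) / 5
w_mean <- weighted.mean(c(1, 2, 3), w = c(3, 1, 1))   # 1.6

# A perfect linear relationship has correlation 1.
r <- cor(x, 2 * x + 1)

# ecdf() returns a function; Fn(4) is the proportion of values <= 4.
Fn <- ecdf(x)
p_le_4 <- Fn(4)          # 0.5

print(x_five)
print(x_quart)
print(c(x_sd, x_mad, w_mean))
```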
Basic Statistical Functions in Package stats
Function in R | Description |
---|---|
weighted.mean(x, w, ..., na.rm=FALSE) | Finds the weighted mean of x, where x is coerced to a vector. |
sd(x, na.rm=FALSE) | Finds the standard deviation of x, where x is coerced to a vector; the underlying variance divides by (n-1). |
var(x, y=NULL, na.rm=FALSE, use) | Finds the variance of x if x is a vector or the covariance of x and y or the covariance matrix of x if x is a matrix or data frame; divides by (n-1) |
cov(x, y=NULL, use="everything", method=c("pearson", "kendall", "spearman")) | Finds the covariance between x and y if y is given or the covariance matrix of x if x is a matrix or data frame; more options are available than with var( ) |
cov.wt(x, wt=rep(1/nrow(x), nrow(x)), cor=FALSE, center=TRUE, method=c("unbiased", "ML")) | Finds the weighted covariance matrix or weighted correlation matrix of x, where x is a matrix or data frame |
cor(x, y=NULL, use="everything", method=c("pearson", "kendall", "spearman")) | Finds the correlation between x and y if y is supplied or within x if just x is supplied, where x is a vector, matrix, or data frame |
median(x, na.rm=FALSE) | Finds the median of the elements of x |
mad(x, center=median(x), constant=1.4826, na.rm=FALSE, low=FALSE, high=FALSE) | Finds the median absolute deviation of x |
IQR(x, na.rm=FALSE, type=7) | Finds the interquartile range of x |
quantile(x, probs=seq(0,1,.25), na.rm=FALSE, names=TRUE, type=7, ...) | Finds the quantiles of x for the values of probs |
fivenum(x, na.rm=TRUE) | Finds Tukey’s five-number summary for x |
ave(x, ..., FUN=mean) | The function in FUN operates on groups of the elements of x, where the grouping variables are in the argument ... |
cancor(x, y, xcenter=TRUE, ycenter=TRUE) | Finds canonical correlation between the matrices x and y |
dist(x, method="euclidean", diag=FALSE, upper=FALSE, p=2) | Finds distance between rows of a matrix, where the type of distance is specified by method |
mahalanobis(x, center, cov, inverted=FALSE) | Finds the squared Mahalanobis distance of each row of x from center, with respect to the covariance matrix cov |
ecdf(x) | Finds the empirical cumulative distribution function of x |
r2dtable(n, r, c) | Creates a random table based on marginal totals for the rows and columns |
simulate(object, nsim=1, seed=NULL, ...) | Simulates observations from the fitted model given in object |
TukeyHSD(x, which, ordered=FALSE, conf.level=0.95, ...) | Tukey’s honest significant differences for analysis of variance models |
xtabs(formula=~., data=parent.frame(), subset, sparse=FALSE, na.action, exclude=c(NA,NaN), drop.unused.levels=FALSE) | Creates a contingency table based on the formula, where the variables on the right side of the formula are used to group the object on the left |
smooth(x, kind=c("3RS3R", "3RSS", "3RSR", "3R", "3", "S"), twiceit=FALSE, endrule="Tukey", do.ends=FALSE) | Smooths a vector or time series using Tukey’s running median smoothers |
You can find more information about the functions by entering ?function.name at the R prompt, where function.name is the name of the function, or by using the Help tab in RStudio.
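As a brief illustration of some of the functions in the table, the following sketch applies them to a small made-up vector. The data and the variable names are assumptions chosen for the example; only the function calls come from stats.

```r
# A small numeric vector with one missing value (illustrative data)
x <- c(2, 4, 4, 5, 7, 9, NA)

m   <- median(x, na.rm = TRUE)     # 4.5, the middle of the six observed values
s   <- sd(x, na.rm = TRUE)         # standard deviation with the n - 1 denominator
v   <- var(x, na.rm = TRUE)        # sd() is the square root of var()
q   <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- IQR(x, na.rm = TRUE)        # difference between the two quartiles in q
f   <- fivenum(x)                  # 2 4 4.5 7 9; missing values are dropped

# Weighted mean: the weights do not have to sum to 1
wm <- weighted.mean(c(1, 2, 3), w = c(3, 1, 1))   # 1.6

# ave() replaces each element with the mean of its group
g  <- factor(c("a", "a", "b", "b"))
gm <- ave(c(1, 3, 10, 20), g)                     # 2 2 15 15
```

Note that most of these functions return NA when x contains missing values unless na.rm=TRUE is set.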
Some Functions That Do Tests
ansari.test() for the Ansari-Bradley test for testing for a difference between the scale parameters of two samples
bartlett.test() for the homogeneity of variances
binom.test() for exact tests using the binomial distribution
Box.test() for the Box-Pierce and Ljung-Box tests—used in time series to test for independence
chisq.test() for testing count data using Pearson’s test
cor.test() for correlations in paired samples
fisher.test() for contingency tables using Fisher’s exact test
fligner.test() for the Fligner-Killeen test for homogeneity of variances
friedman.test() for the Friedman rank sum test
kruskal.test() for the Kruskal-Wallis rank sum test
ks.test() for the Kolmogorov-Smirnov tests on one or two samples
mantelhaen.test() for the Cochran-Mantel-Haenszel chi squared test for count data
mauchly.test() for the test of sphericity developed by Mauchly
mcnemar.test() for the chi squared test for count data developed by McNemar
mood.test() for the two sample tests of scale developed by Mood
oneway.test() for testing for equal means if the layout is one way
pairwise.prop.test() for comparing proportions pairwise
pairwise.t.test() for pairwise comparisons using t tests
pairwise.wilcox.test() for pairwise comparisons using Wilcoxon rank sum tests
poisson.test() for an exact test using the Poisson distribution
power.anova.test() to find powers for a balanced one-way analysis of variance
power.prop.test() to find the powers for comparing two proportions
power.t.test() for the powers in one and two sample t tests
PP.test() for the Phillips-Perron test to test for unit roots in time series data
prop.test() for testing proportions
prop.trend.test() for testing trend in proportions
quade.test() for the Quade test
shapiro.test() for the Shapiro-Wilk test for normality
t.test() for doing a t test
var.test() for an F test to compare two variances
wilcox.test() for Wilcoxon rank sum and signed rank tests
Some Tests in stats
Test |
ansari.test(x, y, alternative=c("two.sided", "less", "greater"), exact=NULL, conf.int=FALSE, conf.level=0.95, ...) |
bartlett.test(x, g, ...) |
binom.test(x, n, p=0.5, alternative=c("two.sided", "less", "greater"), conf.level=0.95) |
Box.test(x, lag=1, type=c("Box-Pierce", "Ljung-Box"), fitdf=0) |
chisq.test(x, y=NULL, correct=TRUE, p=rep(1/length(x), length(x)), rescale.p=FALSE, simulate.p.value=FALSE, B=2000) |
cor.test(x, y, alternative=c("two.sided", "less", "greater"), method=c("pearson", "kendall", "spearman"), exact=NULL, conf.level=0.95, continuity=FALSE, ...) |
fisher.test(x, y=NULL, workspace=200000, hybrid=FALSE, control=list(), or=1, alternative="two.sided", conf.int=TRUE, conf.level=0.95, simulate.p.value=FALSE, B=2000) |
fligner.test(x, g, ...) |
friedman.test(y, groups, blocks, ...) |
kruskal.test(x, g, ...) |
ks.test(x, y, ..., alternative=c("two.sided", "less", "greater"), exact=NULL) |
mantelhaen.test(x, y=NULL, z=NULL, alternative=c("two.sided", "less", "greater"), correct=TRUE, exact=FALSE, conf.level=0.95) |
mauchly.test(object, ...) |
mcnemar.test(x, y=NULL, correct=TRUE) |
mood.test(x, y, alternative=c("two.sided", "less", "greater"), ...) |
oneway.test(formula, data, subset, na.action, var.equal=FALSE) |
pairwise.prop.test(x, n, p.adjust.method=p.adjust.methods, ...) |
pairwise.t.test(x, g, p.adjust.method=p.adjust.methods, pool.sd=!paired, paired=FALSE, alternative=c("two.sided", "less", "greater"), ...) |
pairwise.wilcox.test(x, g, p.adjust.method=p.adjust.methods, paired=FALSE, ...) |
poisson.test(x, T=1, r=1, alternative=c("two.sided", "less", "greater"), conf.level=0.95) |
power.anova.test(groups=NULL, n=NULL, between.var=NULL, within.var=NULL, sig.level=0.05, power=NULL) |
power.prop.test(n=NULL, p1=NULL, p2=NULL, sig.level=0.05, power=NULL, alternative=c("two.sided", "one.sided"), strict=FALSE) |
power.t.test(n=NULL, delta=NULL, sd=1, sig.level=0.05, power=NULL, type=c("two.sample", "one.sample", "paired"), alternative=c("two.sided", "one.sided"), strict=FALSE) |
PP.test(x, lshort=TRUE) |
prop.test(x, n, p=NULL, alternative=c("two.sided", "less", "greater"), conf.level=0.95, correct=TRUE) |
prop.trend.test(x, n, score=seq_along(x)) |
quade.test(y, ...) |
shapiro.test(x) |
t.test(x, y=NULL, alternative=c("two.sided", "less", "greater"), mu=0, paired=FALSE, var.equal=FALSE, conf.level=0.95, ...) |
var.test(x, y, ratio=1, alternative=c("two.sided", "less", "greater"), conf.level=0.95, ...) |
wilcox.test(x, y=NULL, alternative=c("two.sided", "less", "greater"), mu=0, paired=FALSE, exact=NULL, correct=TRUE, conf.int=FALSE, conf.level=0.95, ...) |
For more information about any of the tests, enter ?function.name at the R prompt, where function.name is the name of the function, or use the Help tab in RStudio.
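The following sketch shows how two of the tests might be called and how the results can be inspected. The two samples are made-up values and the variable names are assumptions for the example.

```r
# Illustrative data: two small samples (the values are made up)
a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
b <- c(6.2, 6.8, 6.1, 7.0, 6.5, 6.6)

# Welch two-sample t test (var.equal = FALSE is the default)
tt <- t.test(a, b, alternative = "two.sided")
tt$p.value       # p-value of the test
tt$conf.int      # 95% confidence interval for the difference in means

# Exact binomial test: 7 successes in 20 trials against p = 0.5
bt <- binom.test(7, 20, p = 0.5)
bt$estimate      # observed proportion of successes, 7/20

# Both results are lists of class "htest"
class(tt)
names(bt)
```

Because the tests return objects of class "htest", the components of the result, such as statistic, p.value, and conf.int, can be extracted in the same way for most of the tests in the table.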
Some Modeling Functions in stats
acf() to estimate autocorrelation and autocovariance in time series
acf2AR() to exactly fit an autoregressive model to an autocorrelation function
add1() and drop1() to find the single terms that can be added to or dropped from a model, fit the models, and tabulate the results of the fitting
AIC() and BIC() to find Akaike’s ‘An Information Criterion’ or the Schwarz Bayesian criterion for a fitted model
aov() to fit an analysis of variance model
approx() and approxfun() to do linear interpolation
ar() to fit a time series autoregressive model
arima() to fit an autoregressive integrated moving average to time series data
arima.sim() to do simulations from an ARIMA model
ccf() to estimate cross correlation and cross covariance for two time series
complete.cases() to find complete cases for a sequence of vectors, matrices, or data frames
contrasts() to set or get contrasts for a factor object
cpgram() to plot a cumulative periodogram for time series data
decompose() to decompose a time series into seasonal, trend, and irregular components using moving averages
density() for kernel density estimation
ecdf() for the empirical cumulative distribution function
fft() for fast discrete Fourier transforms for time series data
filter() for linear filtering of time series
glm() to fit a generalized linear model
isoreg() for isotonic or monotone regression
KalmanForecast(), KalmanLike(), KalmanRun(), KalmanSmooth(), and makeARIMA() for Kalman filtering
ksmooth() to smooth using a kernel smoother
line() to fit a line robustly—based on Tukey’s Exploratory Data Analysis
lm() to fit a linear model
loess() to fit a local polynomial model
loglin() to fit a loglinear model
lsfit() to fit a least squares linear model
manova() to fit multivariate analysis of variance models
medpolish() for a median polish of a matrix
mvfft() for fast discrete Fourier transforms for matrices
nlm() to minimize a nonlinear function using a Newton-type algorithm
nls() to fit a nonlinear least squares model
optim(), optimHess(), optimise(), and optimize() to optimize a function
pacf() to estimate partial autocovariances and autocorrelations for a time series
poly() and polym() to create orthogonal polynomials of the desired degree
ppr() to fit a projection pursuit regression model
profile() to profile models—generic function
smooth.spline() to fit a smoothing spline model
spectrum() to estimate the spectral density for time series data
step() to use the AIC to choose a model using a stepwise algorithm
stl() to use the loess method to seasonally decompose a time series
StructTS() to fit a structural time series model
supsmu() for Friedman’s super smoother
update() for updating a model
There are many functions in stats that support the modeling functions, which we do not cover. You can find more information at the help pages for the individual functions: enter ?function.name at the R prompt, where function.name is the name of the function, or use the Help tab in RStudio.
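As a minimal sketch of how several of the modeling functions work together, the following example fits a linear model to the cars data set from the package datasets, refits it with an added term, and compares the two fits. The choice of data set and the names fit1 and fit2 are assumptions made for the illustration.

```r
# Fit a simple linear model to the built-in cars data set
fit1 <- lm(dist ~ speed, data = cars)
coef(fit1)                         # intercept and slope

# update() refits the model with a changed formula;
# here a quadratic term in speed is added
fit2 <- update(fit1, . ~ . + I(speed^2))

# Compare the two fits; a smaller AIC or BIC indicates the preferred model
aic_table <- AIC(fit1, fit2)
bic_table <- BIC(fit1, fit2)
aic_table

# add1() tabulates the effect of adding each single term in the scope
add1(fit1, scope = ~ speed + I(speed^2))
```

The same pattern of fitting, updating, and comparing applies to other model-fitting functions in the list, such as glm() and nls().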
Clustering Algorithms and Other Multivariate Techniques
cmdscale() for classical multidimensional scaling
cophenetic() for cophenetic distances in hierarchical clustering
cut.dendrogram() for cutting a dendrogram, a general tree structure, at a given height
cutree() for cutting a tree into groups
dendrapply() to apply a function to all nodes of a dendrogram
as.dendrogram() to give an appropriate object the class dendrogram
factanal() for factor analysis
hclust() for hierarchical clustering
identify.hclust() to identify clusters
kmeans() for k means clustering
labels.dendrogram() for the labels of the leaves of a dendrogram
loadings() for printing loadings from a factor analysis
merge.dendrogram() for merging two dendrograms
order.dendrogram() for the ordering of the leaves of a dendrogram
prcomp() for principal components analysis
princomp() also for principal components analysis
promax() for rotation of axes in factor analysis
reorder.dendrogram() for reordering a dendrogram maintaining the initial constraints
rev.dendrogram() for reversing the order of the nodes in a dendrogram
str.dendrogram() for displaying the internal structure of a dendrogram
varimax() for rotation of axes in factor analysis
For more information about any of the functions, enter ?function.name at the R prompt, where function.name is the name of the function, or use the Help tab in RStudio.
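The following sketch applies a few of the clustering and multivariate functions to six made-up points that form two well-separated groups. The data and the variable names are assumptions for the example.

```r
# Two well-separated groups of points in the plane (made-up data)
m <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))

d  <- dist(m)                   # Euclidean distances between the rows
hc <- hclust(d)                 # hierarchical clustering, complete linkage by default
groups <- cutree(hc, k = 2)     # cut the tree into two groups
groups                          # 1 1 1 2 2 2

# The same tree as an object of class dendrogram
dend <- as.dendrogram(hc)
leaf_order <- order.dendrogram(dend)   # ordering of the leaves

# k-means with explicit starting centers, so that the result is reproducible
km <- kmeans(m, centers = m[c(1, 4), ])
km$cluster                      # 1 1 1 2 2 2

# Principal components of the same data
pc <- prcomp(m)
pc$sdev                         # standard deviations of the components
```

Supplying starting centers to kmeans() is a choice made here for reproducibility; when centers is given as a number, the starting centers are chosen at random.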
The package stats also contains several probability distributions (see Chapter 9); eight as. functions; six is. functions; a number of plotting functions, like heatmap() and 19 plot. functions, which are specific to many of the classes associated with modeling functions; functions used in kernel estimation; ancillary functions for models, like the seven model. functions; seven na. functions to handle missing data; 13 predict. functions for model output; 27 print. functions for printing output; and nine summary. functions for summarizing output.
The graphics Package
The package graphics contains the function plot()—for which the many plot. methods are written. The ancillary functions for plot() are in graphics. There are also several plotting functions for specific types of plots—like histograms and bar charts. The list of links to the help pages for graphics is three pages long (help(package=graphics)). In this section, we cover the specific types of plots and a few other functions related to plotting.
assocplot() for a Cohen-Friendly association plot; used for contingency tables; will work with any matrix that is logical or numeric
barplot() for a bar plot; takes vector or matrix objects, which are of mode logical or numeric, for the heights of the bars
boxplot() for box plots; logical or numeric vectors, matrices, arrays, data frames, and some lists can be used as input to the function
bxp() for box plots of summaries
cdplot() for a conditional density plot
coplot() for scatter plots using a conditioning variable
dotchart() for a Cleveland’s dot plot; numeric vectors and matrices can be used for the plot
fourfoldplot() for a fourfold plot of 2 x 2 x k contingency tables
hist() for histograms; gives histograms for numeric vectors, matrices, and arrays
mosaicplot() for mosaic plots; takes numeric or logical arguments that are vectors, matrices, data frames, or arrays; is meant for contingency tables
pairs() for scatter plots of paired variables; takes numeric vectors, matrices, and data frames as input; creates a matrix of plots
persp() for a perspective plot; does three-dimensional plotting
pie() for pie charts; use numeric vectors, matrices, and arrays as input
smoothScatter() for a smoothed version of scatter plots—which are colored; is copyrighted by M. P. Wand
spineplot() for spine plots; use a logical, numeric, or complex matrix as input to the plot; logical and complex matrices are coerced to numeric; was developed for two-way contingency tables
stars() for star or segment plots; use a numeric matrix or data frame for the input to the plot
stem() for a stem and leaf plot; use a numeric vector, matrix, or array as the input to the plot
stripchart() for a one dimensional scatter plot
sunflowerplot() for a sunflower plot, which is a scatter plot in which points with duplicates have sunflower leaves for the duplicated points; use a logical, numeric, or complex vector, matrix, or data frame for the input to the plot
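Several of the plotting functions also return a value that describes what was drawn. The sketch below calls three of them on made-up data; it draws to a null PDF device, which is an assumption made so that the example runs without opening a window or writing a file.

```r
# Draw to a null PDF device so that no window or file is created
pdf(NULL)

x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12)   # made-up data with one large value

# hist() draws the histogram and returns the bins it used
h <- hist(x, breaks = c(0, 2, 4, 6, 8, 10, 12))
h$counts                                 # 3 5 1 0 0 1

# boxplot() draws the box plot and returns its summary statistics
b <- boxplot(x)
b$out                                    # 12, the point plotted as an outlier

# barplot() returns the midpoints of the bars on the x axis
bp <- barplot(c(a = 3, b = 5, c = 2))

dev.off()
```

In an interactive session, the calls to pdf(NULL) and dev.off() can be left out and the plots appear on the default graphics device.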
There are also some functions in graphics that control the screen for plotting functions. The function split.screen() and its ancillary functions close.screen(), erase.screen(), and screen() are used to split the plotting screen into regions and to plot to the regions. The functions frame() and plot.new() open a new frame for plotting.
The function par() is like options(), except for plotting, and contains the default options for plots. The options can be changed at any time. If no graphics device is open, calling par() opens one. To see the list of options, call par() with no arguments.
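A common pattern with par() is to save the old settings when changing options and to restore them afterward. The following is a minimal sketch of that pattern; the null PDF device and the variable names are assumptions made for the example.

```r
pdf(NULL)                       # null device, so nothing is displayed or written

# Setting options returns the previous values, which can be saved
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
now <- par("mfrow")             # 1 2: one row and two columns of plots

plot(cars)                      # drawn in the left panel
hist(cars$speed)                # drawn in the right panel

par(old)                        # restore the previous settings
restored <- par("mfrow")        # 1 1

dev.off()
```

Restoring the saved settings keeps later plots on the same device from inheriting the changed layout and margins.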
The function plot() is the basic plotting function. It has a number of ancillary functions, and methods for plot() are defined for quite a few classes. We do not cover plot() in this book.
You can find more information about the functions in graphics by entering ?function.name at the R prompt, where function.name is the name of the function, or by using the Help tab in RStudio.