Appendix: Algebra and Calculus Basics

A.1 Exponentials and Logarithms

Exponentials are written as e^x or exp(x), where e = 2.718…. By definition exp(−∞) = 0, exp(0) = 1, exp(1) = e, and exp(∞) = ∞. In R, e^x is exp(x); if you want the value of e, use exp(1). Logarithms are the solutions to exponential or power equations like y = e^x or y = 10^x. Natural logs, ln or log_e, are logarithms base e; common logs, log_10, are logarithms base 10. When you see just “log” it's usually in a context where the difference doesn't matter (although in R log10 is log10 and log_e is log).
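For concreteness, here are a few checks in R (base functions only; the values shown as comments are approximate):

    exp(1)            ## 2.718282, the value of e
    exp(0)            ## 1
    log(exp(3))       ## 3: log() is the natural log, which undoes exp()
    log10(1000)       ## 3: common (base-10) log
    log(8, base = 2)  ## 3: logs in an arbitrary base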

 

1. log(1) = 0. If x > 1, then log(x) > 0, and vice versa. log(0) = −∞; logarithms are undefined for x < 0.

2. Logarithms convert products to sums: log(ab) = log(a) + log(b).

3. Logarithms convert powers to multiplication: log(a^n) = n log(a).

4. You can't do anything with log(a + b).

5. Converting bases: log_x(a) = log_y(a)/log_y(x). In particular, log_10(a) = log_e(a)/log_e(10) ≈ log_e(a)/2.3 and log_e(a) = log_10(a)/log_10(e) ≈ log_10(a)/0.434. This means that converting between log bases just means multiplying or dividing by a constant (checked numerically in the R sketch after this list). Here's the proof:

$$
\begin{aligned}
y^{\log_y(a)} &= a \\
 &= x^{\log_x(a)} \\
 &= \left(y^{\log_y(x)}\right)^{\log_x(a)} \\
 &= y^{\log_y(x)\,\log_x(a)}
\end{aligned}
$$

(compare the first and last lines: the bases are equal, so the exponents must be equal, giving log_y(a) = log_y(x) · log_x(a), which rearranges to the conversion rule above).

6. The derivative of the logarithm, d(log x)/dx, equals 1/x. This is always positive for x > 0 (which are the only values for which the logarithm is defined anyway).

7. The fact that d(log x)/dx > 0 means the function is monotonic (always either increasing or decreasing), which means that if x > y, then log(x) > log(y), and if x < y, then log(x) < log(y). This in turn means that if you find the maximum likelihood parameter, you've also found the maximum log-likelihood parameter (and the minimum negative log-likelihood parameter); the R sketch below checks this for a binomial likelihood.
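A minimal R illustration of rules 5 and 7, using an arbitrary number (37) and an arbitrary binomial likelihood (7 successes out of 10 trials):

    a <- 37
    log10(a)                 ## 1.568202
    log(a) / log(10)         ## the same value: base conversion is division by a constant

    p <- seq(0.01, 0.99, by = 0.01)        ## candidate parameter values
    lik <- dbinom(7, size = 10, prob = p)  ## binomial likelihood of each value
    p[which.max(lik)]        ## 0.7
    p[which.max(log(lik))]   ## 0.7: likelihood and log-likelihood peak at the same place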

A.2 Differential Calculus

1. Notation: differentiation of a function f(x) with respect to x can be written, depending on the context, as df/dx, f′(x), or ḟ.

2. Definition of the derivative:

$$\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$

In words, the derivative is the slope of the line tangent to a curve at a point, or the instantaneous slope of a curve. The second derivative, d²f/dx², is the rate of change of the slope, or the curvature.

3. The derivative of a constant (which is a flat line if you think about it as a curve) is zero (slope = 0).

4. The derivative of a linear equation, y = ax, is the slope of the line, a. (The derivative of y = ax + b is also a.)

5. Derivatives of polynomials: d(x^n)/dx = n x^(n−1).

6. Derivatives of sums: d(f + g)/dx = df/dx + dg/dx.

7. Derivatives of products: d(fg)/dx = f (dg/dx) + g (df/dx).

8. Derivatives of constant multiples: d(cf)/dx = c (df/dx), if c is a constant.

9. Derivative of the exponential: d(exp(ax))/dx = a exp(ax), if a is a constant. (If not, use the chain rule.)

10. Derivative of logarithms: d(log(x))/dx = 1/x.

11. Chain rule: d(f(g(x)))/dx = (df/dg) · (dg/dx) (thinking about this as “multiplying fractions” is a good mnemonic, but don't take it too literally!); several of these differentiation rules are checked symbolically in the R sketch after this list. Example:

$$\frac{d\,\log(x^2)}{dx} = \frac{d\,\log(x^2)}{d(x^2)} \cdot \frac{d(x^2)}{dx} = \frac{1}{x^2} \cdot 2x = \frac{2}{x}$$

Another example: people sometimes express the proportional change in x, (dx/dt)/x, as d(log(x))/dt. Can you see why?

12. Critical points (maxima, minima, and saddle points) of a curve f have df /dx = 0. The sign of the second derivative determines the type of a critical point (positive = minimum, negative = maximum, zero = saddle).
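Several of these rules can be checked in R: D() does simple symbolic differentiation, and the limit definition (rule 2) can be approximated numerically. The particular functions and the step size h below are arbitrary choices:

    D(expression(x^3), "x")         ## 3 * x^2         (rule 5)
    D(expression(exp(a * x)), "x")  ## exp(a * x) * a  (rule 9)
    D(expression(log(x)), "x")      ## 1/x             (rule 10)
    D(expression(log(x^2)), "x")    ## 2 * x/x^2       (rule 11, the chain rule)

    ## numerical approximation to the limit definition (rule 2)
    f <- function(x) x^3
    h <- 1e-6
    (f(2 + h) - f(2)) / h           ## approximately 12 = 3 * 2^2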

A.3 Partial Differentiation

1. Partial differentiation acts just like regular differentiation except that you hold all but one variable constant, and you use a curly d (∂) instead of a regular d. So, for example, ∂(xy)/∂x = y. Geometrically, this is taking the slope of a surface in one particular direction. (Second partial derivatives are curvatures in a particular direction.)

2. You can do partial differentiation multiple times with respect to different variables; order doesn't matter, so ∂²f/(∂x ∂y) = ∂²f/(∂y ∂x) (checked for a particular case in the R sketch below).
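The same D() function can be applied repeatedly to check these rules for a particular (arbitrary) function, x²y:

    D(expression(x * y), "x")             ## y
    D(D(expression(x^2 * y), "x"), "y")   ## 2 * x
    D(D(expression(x^2 * y), "y"), "x")   ## 2 * x: the order doesn't matter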

A.4 Integral Calculus

For the material in this book, I'm not asking you to remember very much about integration, but it would be useful to remember that

 

1. The (definite) integral of f(x) from a to b, ∫_a^b f(x) dx, represents the area under the curve between a and b. The integral is the limit of the sum Σ_i f(x_i) Δx as Δx → 0 (a numerical example follows this list).

2. You can take a constant out of an integral (or put one in): ∫ a f(x) dx = a ∫ f(x) dx.

3. Integrals are additive: ∫ (f(x) + g(x)) dx = ∫ f(x) dx + ∫ g(x) dx.
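For numerical work, R's integrate() function approximates definite integrals; for example, the area under x² between 0 and 1 (which should be 1/3):

    integrate(function(x) x^2, lower = 0, upper = 1)  ## 0.3333333 (with a reported error bound)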

A.5 Factorials and the Gamma Function

A factorial, written with an exclamation point !, means k! = k × (k − 1) × … × 1. For example, 2! = 2, 3! = 6, and 6! = 720. In R a factorial is factorial(); you can't use the shorthand ! notation, especially since != means “not equal to” in R. Factorials come up in probability calculations frequently, e.g., as the number of permutations with k elements. The gamma function, usually written as Γ (gamma in R), is a generalization of factorials. For integers, Γ(x) = (x − 1)!. Factorials are defined for integers only, but for positive, noninteger x, Γ(x) is still defined and it is still true that Γ(x + 1) = x · Γ(x).

Factorials and gamma functions get very large, and you often have to compute ratios of factorials or gamma functions (e.g., the binomial coefficient, N!/(k!(N − k)!)). Numerically, it is more efficient and accurate to compute the logarithms of the factorials first, add and subtract them, and then exponentiate the result: exp(log N! − log k! − log (N − k)!). R provides the log-factorial (lfactorial) and log-gamma (lgamma) functions for this purpose. (Actually, R also provides choose and lchoose for the binomial coefficient and the log-binomial coefficient, but the log-gamma is more generally useful.)
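For example (an arbitrary case, N = 50 and k = 10):

    choose(50, 10)                                         ## 10272278170
    exp(lfactorial(50) - lfactorial(10) - lfactorial(40))  ## the same value, via logs
    gamma(5)                                               ## 24 = 4!
    lgamma(101)                                            ## log(100!), computed without overflow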

The main reason that the gamma function (as opposed to factorials) comes up in ecology is that it is part of the normalizing constant (see Chapter 4) for the Gamma distribution, which is usually written as Gamma (not Γ): Gamma(x, a, s) = x^(a−1) e^(−x/s)/(s^a Γ(a)), where a is the shape parameter and s is the scale parameter.
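A quick check in R that dgamma() matches this formula, with arbitrary values of x, the shape a, and the scale s:

    x <- 2; a <- 2; s <- 1
    dgamma(x, shape = a, scale = s)             ## 0.2706706
    1/(s^a * gamma(a)) * x^(a - 1) * exp(-x/s)  ## the same value, from the formula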

A.6 Probability

Most of the probability rules you need are discussed in Chapter 4.

 

1. Probability distributions always add or integrate to 1 over all possible values.

2. Probabilities of independent events are multiplied: p(A and B) = p(A)p(B).

3. The binomial coefficient, “N choose k,” is the number of different ways of choosing k objects out of a set of N, without regard to order (a numerical check follows this list):

$$\binom{N}{k} = \frac{N!}{k!\,(N-k)!}$$
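Brief numerical checks of rules 1 and 3, using an arbitrary binomial example (N = 10, k = 3, p = 0.3):

    sum(dbinom(0:10, size = 10, prob = 0.3))     ## 1: the probabilities sum to 1
    choose(10, 3)                                ## 120 ways to choose 3 objects out of 10
    factorial(10)/(factorial(3) * factorial(7))  ## 120, from the formula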

A.7 The Delta Method

The formula for the delta method of approximating variances is

$$\mathrm{Var}(f(x,y)) \approx \left(\frac{\partial f}{\partial x}\right)^{2} \mathrm{Var}(x) + \left(\frac{\partial f}{\partial y}\right)^{2} \mathrm{Var}(y) + 2\,\frac{\partial f}{\partial x}\,\frac{\partial f}{\partial y}\,\mathrm{Cov}(x,y)$$

(the partial derivatives are evaluated at the mean values of x and y).

Lyons (1991) describes the delta method very clearly; Oehlert (1992) provides a short technical description of the formal assumptions necessary for the delta method to apply.

This formula is exact in some simple cases:

• Multiplying by a constant: Var(ax) = a²Var(x).

• Sum or difference of independent variables: Var(x ± y) = Var(x) + Var(y).

• Product or ratio of independent variables:

$$\mathrm{Var}(x \cdot y) = \bar{x}^{2}\,\mathrm{Var}(y) + \bar{y}^{2}\,\mathrm{Var}(x) + \mathrm{Var}(x)\,\mathrm{Var}(y)$$

A similar form holds for the coefficient of variation (CV): (CV(x · y))² = (CV(x))² + (CV(y))².

• The formula is exact for linear functions of normal or multivariate normal variables.

The formula can be extended to more than two variables. The deltavar function in the emdbook package will calculate delta-method-based variances for functions with any number of parameters.
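As a sketch of what the formula does (not a substitute for deltavar), here is a hand-rolled delta-method variance for the ratio x/y of two independent variables, checked against a brute-force simulation; all the numerical values are arbitrary:

    ## means and variances of two independent variables
    mx <- 10; my <- 5
    vx <- 1;  vy <- 0.5

    ## delta method for f(x, y) = x/y: df/dx = 1/y and df/dy = -x/y^2,
    ## evaluated at the means; the covariance term is zero by independence
    (1/my)^2 * vx + (mx/my^2)^2 * vy   ## 0.12

    ## brute-force check by simulation
    set.seed(1)
    x <- rnorm(1e5, mean = mx, sd = sqrt(vx))
    y <- rnorm(1e5, mean = my, sd = sqrt(vy))
    var(x/y)                           ## close to the delta-method value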

A.8 Linear Algebra Basics

This section is more of a “cheater's guide” than a real introduction to linear algebra: Lynch and Walsh (1997) and Caswell (2000) both give useful bare-bones linear algebra reviews. All you need to know for this book is how to understand the general meaning of a matrix equation.

In mathematics a matrix is a rectangular table of numbers, while a vector is a list of numbers (written, depending on context, as either a 1 × n row vector or an n × 1 column vector). Matrices are usually uppercase, often denoted by boldface (V). Vectors are usually lowercase, either bold (x) or topped with an arrow. The transpose of a matrix or vector, which exchanges the rows and columns of a matrix or switches between row and column vectors, is written as V^T or V′.

Matrices and vectors can be added to or subtracted from any matrix or vector with the same number of rows and columns. If the number of rows of A is equal to the number of columns of B, then A can be multiplied on the left by B (i.e., BA is well defined). Matrix multiplication is noncommutative in general: AB ≠ BA, although diagonal matrices (matrices with nonzero entries only on the diagonal) do commute.

Matrices can be multiplied on the right by column vectors (Ax), on the left by row vectors (x^T A), or anywhere by scalars (i.e., plain numbers: cA = Ac). The inverse of a matrix, A^(−1), is the matrix such that AA^(−1) equals the identity matrix (1 or I), a matrix with ones on the diagonal and zeros everywhere else. Multiplying by the inverse of a matrix is like dividing by the matrix.

The inner product of two (column) vectors x and y with each other is x^T y. The inner product of a vector with itself, x^T x, is the sum of squares of its elements. The quadratic form of a matrix A and a vector x is x^T A x. The quadratic form that appears in the multivariate normal distribution, (x − μ)^T V^(−1) (x − μ), where x is the data vector, μ is the vector of means, and V is the variance-covariance matrix, is roughly analogous to (x − μ)²/σ² in the univariate normal distribution. We could write the univariate form as (x − μ)(σ²)^(−1)(x − μ) to make the two expressions look more similar.

The determinant of a matrix, |A| or det(A), is complicated in general, but for a diagonal matrix it is equal to the product of the diagonal entries. The trace, tr(A), is simply the sum of the diagonal entries (of any square matrix).

The best way to figure out a matrix equation is to think about the equivalent scalar equation, or see how the equation would simplify if all the matrices were diagonal; see p. 321 for an example.
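For reference, the corresponding R operations are %*% (matrix multiplication), t() (transpose), solve() (inverse), det() (determinant), and diag() (which builds identity or diagonal matrices and also extracts a diagonal, so sum(diag(A)) gives the trace). A small example with an arbitrary 2 × 2 matrix:

    A <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)  ## a symmetric 2 x 2 matrix
    x <- c(1, 2)                              ## a vector
    A %*% x                    ## matrix-vector product Ax
    t(x) %*% A %*% x           ## quadratic form x^T A x (here, 8)
    solve(A)                   ## the inverse of A
    round(A %*% solve(A), 10)  ## the identity matrix
    det(A)                     ## determinant: 1.75
    sum(diag(A))               ## trace: 3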
