Chapter 2

Standard Statistical Inference

Christophe Dutang

Université de Strasbourg and Université du Maine, Strasbourg and Le Mans, France

2.1 Probability Distributions in Actuarial Science

Let X be our quantity of interest. Actuarial models rely on particular assumptions on the probability distribution of X. When X represents a claim amount or the life length of an individual, one expects X to have a distribution on $\mathbb{R}_+$, whereas when X represents a claim number, we deal with a distribution on $\mathbb{N}$. But characterizing the support of the random variable X is a necessary, not a sufficient, step to characterize our quantity of interest.

In the discrete case, probability distributions are generally characterized by the mass probability function $p_X$, that is, the "elementary" probabilities $p_X(x) = \mathbb{P}(X = x)$ for $x \in \mathbb{N}$. In the continuous case, we define the probability distribution by its density $f_X(x)$, the infinitesimal version of $p_X$ such that $f_X(x)\,dx = \mathbb{P}(X \in [x, x+dx])$. A third case is when the random variable has both continuous and discrete parts, for which there is no proper density. In such a case, we define the distribution with the cumulative distribution function $F_X(x) = \mathbb{P}(X \le x)$. We recall that for a discrete distribution on $\mathbb{N}$, $F_X(x) = \sum_{n=0}^{\lfloor x \rfloor} p_X(n)$, while for a continuous distribution, $F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy$.

The purpose of this section is to present the most common distributions used in actuarial science, whether continuous, discrete or mixed-type. As always in this book, a special emphasis is put on how this topic is implemented and can be extended in R.

2.1.1 Continuous Distributions

There are many ways to classify and to distinguish distributions. We present here the Pearson system and the exponential family, the latter being used, for instance, in generalized linear models (GLM, see Chapter 14). Pearson (1895) considers the family of continuous distributions whose density function $f_X$ satisfies the following ordinary differential equation:

$$\frac{1}{f_X(x)} \frac{d f_X(x)}{dx} = -\frac{a + x}{c_0 + c_1 x + c_2 x^2},$$

where $a, c_0, c_1, c_2$ are constants. Let $p(x) = c_0 + c_1 x + c_2 x^2$. The solution is defined up to a constant $K$, which is derived from the constraint $\int f_X(x)\,dx = 1$. Type 0 is obtained when $c_1 = c_2 = 0$: we get $f_X(x) = K e^{-(2ax + x^2)/(2c_0)}$, which is the normal distribution. Type 1 is the case where the polynomial $p$ has two distinct real roots $a_1$ and $a_2$ such that $a_1 < 0 < a_2$: we get $f_X(x) = K(x - a_1)^{m_1}(a_2 - x)^{m_2}$. We recognize the beta distribution. Type 2 corresponds to the case where $m_1 = m_2 = m$.

Type 3 is obtained when $c_2 = 0$, leading to a first-order polynomial $c_0 + c_1 x$. In this case, we get the gamma distribution with $f_X(x) = K(c_0 + c_1 x)^m e^{-x/c_1}$. Type 4 corresponds to the case where the polynomial $p(x) = c_0 + c_1 x + c_2 x^2$ has no real roots, in which case $p(x) = C_0 + c_2(x + C_1)^2$. We get $f_X(x) = K\left(C_0 + c_2(x + C_1)^2\right)^{-1/(2c_2)} e^{-k \arctan\left((x + C_1)/\sqrt{C_0/c_2}\right)}$, which is closely linked to the generalized inverse Gaussian distribution of Barndorff-Nielsen.

We get type 5 when $p$ is a perfect square, that is, $p(x) = c_2(x + C_1)^2$. The associated density is $f_X(x) = K(x + C_1)^{-1/c_2} e^{-k/(x + C_1)}$. Two special cases are obtained when $k = 0$: $c_2 > 0$ for type 8 and $c_2 < 0$ for type 9.

Type 6 is obtained when $p$ has two real roots $a_1, a_2$ of the same sign, for which we get $f_X(x) = K(x - a_1)^{m_1}(x - a_2)^{m_2}$, a generalized beta distribution. Finally, type 7 is obtained when $a = c_1 = 0$, leading to $f_X(x) = K(c_0 + c_2 x^2)^{-1/(2c_2)}$.

Those distributions are implemented in the package PearsonDS. In Figure 2.1, we plot the densities for the first seven types in order to compare the different possible shapes.

Figure 2.1

Pearson's distribution system.

> library(PearsonDS)
> x <- seq(-1, 6, by=1e-3)
> y0 <- dpearson0(x, 2, 1/2)
> y1 <- dpearsonI(x, 1.5, 2, 0, 2)
> y2 <- dpearsonII(x, 2, 0, 1)
> y3 <- dpearsonIII(x, 3, 0, 1/2)
> y4 <- dpearsonIV(x, 2.5, 1/3, 1, 2/3)
> y5 <- dpearsonV(x, 2.5, -1, 1)
> y6 <- dpearsonVI(x, 1/2, 2/3, 2, 1)
> y7 <- dpearsonVII(x, 3, 4, 1/2)
> plot(x, y0, type="l", ylim=range(y0, y1, y2, y3, y4, y5, y7),
+ ylab="f(x)", main="The Pearson distribution system")
> lines(x[y1 != 0], y1[y1 != 0], lty=2)
> lines(x[y2 != 0], y2[y2 != 0], lty=3)
> lines(x[y3 != 0], y3[y3 != 0], lty=4)
> lines(x, y4, col="grey")
> lines(x, y5, col="grey", lty=2)
> lines(x[y6 != 0], y6[y6 != 0], col="grey", lty=3)
> lines(x[y7 != 0], y7[y7 != 0], col="grey", lty=4)
> legend("topright", leg=paste("Pearson", 0:7), lty=1:4,
+ col=c(rep("black", 4), rep("grey", 4)))

Another important class of distributions is the exponential family, which Andersen (1970) traces back to the work of Pitman, Darmois and Koopman in the mid-1930s. This family contains distributions whose density function can be written as

$$f_X(x) = \exp\left(\sum_{j=1}^{d} a_j(x)\,\alpha_j(\theta) + b(x) + \beta(\theta)\right),$$

where $\theta \in \mathbb{R}^d$ is the d-dimensional parameter vector, and $a_j$, $\alpha_j$, $b$ and $\beta$ are known functions (see Bickel & Doksum (2001) for more details). The exponential family includes many familiar distributions. We recover the exponential distribution $f_X(x) = \lambda e^{-\lambda x}$ with $d = 1$, $a(x) = x$, $\alpha(\lambda) = -\lambda$, $b(x) = 0$ and $\beta(\lambda) = \log(\lambda)$; or the normal distribution $f_X(x) = e^{-(x-m)^2/(2\sigma^2)}/\sqrt{2\pi\sigma^2}$ with $d = 2$, $a_1(x) = x^2$, $\alpha_1(m, \sigma^2) = -1/(2\sigma^2)$, $a_2(x) = x$, $\alpha_2(m, \sigma^2) = m/\sigma^2$, $b(x) = 0$ and $\beta(m, \sigma^2) = -m^2/(2\sigma^2) - \log\sqrt{2\pi\sigma^2}$. In the exponential family, the gamma and the inverse Gaussian distributions are also examples of particular interest in actuarial science.

In R, each probability distribution is implemented by a set of four functions and a particular root name foo: dfoo computes the density function $f_X(x)$ or the mass probability function $p_X(x)$, pfoo the cumulative distribution function $F_X(x)$, qfoo the quantile function $F_X^{-1}$ and rfoo a random number generator. For instance, the gamma distribution with density $f_X(x) = \lambda^\alpha x^{\alpha-1} e^{-\lambda x}/\Gamma(\alpha)$ is implemented in dgamma, pgamma, qgamma and rgamma; see the example below.

> dgamma(1:2, shape=2, rate=3/2)
[1] 0.5020429 0.2240418
> pgamma(1:2, shape=2, rate=3/2)
[1] 0.4421746 0.8008517
> qgamma(1/2, shape=2, rate=3/2)
[1] 1.118898
> set.seed(1)
> rgamma(5, shape=2, rate=3/2)
[1] 0.553910 2.380504 2.308780 1.367208 2.590273

In Table 2.1, the continuous distributions implemented in base R are listed. This set of distributions is rather limited and, in practice, other distributions, such as the Pareto, are particularly relevant in actuarial science. Such distributions are generally implemented in a dedicated package. The full list of non-base distributions is given in the corresponding task view http://cran.r-project.org/web/views/Distributions.html. Among the numerous packages, two focus on distributions relevant to actuarial science: actuar and ActuDistns. Note that actuar provides the raw moments $\mathbb{E}(X^k)$, the limited expected values $\mathbb{E}(\min(X, l)^k)$ and the moment generating function $\mathbb{E}(e^{tX})$ for many distributions in three dedicated functions mfoo, levfoo and mgffoo.
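For instance, for the gamma distribution used later in this chapter, these three quantities can be obtained as follows (a quick illustration; the printed values are easily checked by hand, since $\mathbb{E}(X) = \alpha/\lambda$ and $\mathbb{E}(e^{tX}) = (\lambda/(\lambda - t))^\alpha$).

> library(actuar)
> mgamma(1:2, shape=2, rate=3/2)    # E(X) and E(X^2)
[1] 1.333333 2.666667
> levgamma(1, shape=2, rate=3/2)    # E(min(X, 1))
[1] 0.8126963
> mgfgamma(1/2, shape=2, rate=3/2)  # E(exp(X/2)) = (1.5/(1.5-0.5))^2
[1] 2.25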

Table 2.1

Continuous distributions in R.

Probability Distribution   Root      Probability Distribution   Root
Beta                       beta      Logistic                   logis
Cauchy                     cauchy    Lognormal                  lnorm
Chi-2                      chisq     Normal                     norm
Exponential                exp       Student t                  t
Fisher F                   f         Uniform                    unif
Gamma                      gamma     Weibull                    weibull

When, on a particular problem, all classical distributions have been exhausted, it is sometimes appropriate to create new probability distributions. Typical transformations of a random variable X are listed below:

  • (i) Translation X + c (e.g. the shifted lognormal distribution),
  • (ii) Scaling λX,
  • (iii) Power X^α (e.g. the generalized beta type 1 distribution),
  • (iv) Inverse 1/X (e.g. the inverse gamma distribution),
  • (v) The logarithm log(X) (e.g. the loglogistic distribution),
  • (vi) The exponential exp(X) and
  • (vii) The odds ratio X/(1 − X) (e.g. the beta type 2 distribution).

With the small code below, we can visualize all those transformations (except the last one) on gamma-distributed variables, using

$$f_Y(y) = \left|\frac{d}{dy}\left(g^{-1}(y)\right)\right| \cdot f_X\left(g^{-1}(y)\right), \quad \text{where } Y = g(X),$$

g being a monotonic transformation.

> f <- function(x) dgamma(x,2)
> f1 <- function(x) f(x-1)
> f2 <- function(x) f(x/2)/2
> f3 <- function(x) 2*x*f(x^2)
> f4 <- function(x) f(1/x)/x^2
> f5 <- function(x) f(exp(x))*exp(x)
> f6 <- function(x) f(log(x))/x
> x=seq(0,10,by=.025)
> plot(x,f(x), ylim=c(0, 1.3), xlim=c(0, 10), main="Theoretical densities",
+ lwd=2, type="l", xlab="x", ylab="")
> lines(x,f1(x), lty=2, lwd=2)
> lines(x,f2(x), lty=3, lwd=2)
> lines(x,f3(x), lty=4, lwd=2)
> lines(x,f4(x), lty=1, col="grey", lwd=2)
> lines(x,f5(x), lty=2, col="grey", lwd=2)
> lines(x,f6(x), lty=3, col="grey", lwd=2)
> legend("topright", lty=1:4, col=c(rep("black", 4), rep("grey", 3)),
+ leg=c("X","X+1","2X", "sqrt(X)", "1/X", "log(X)", "exp(X)"))

We can also run simulations and visualize kernel-based densities:

> set.seed(123)
> x <- rgamma(100, 2)
> x1 <- x+1
> x2 <- 2*x
> x3 <- sqrt(x)
> x4 <- 1/x
> x5 <- log(x)
> x6 <- exp(x)
> plot(density(x), ylim=c(0, 1), xlim=c(0, 10), main="Empirical densities",
+ lwd=2, xlab="x", ylab="f_X(x)")
> lines(density(x1), lty=2, lwd=2)
> lines(density(x2), lty=3, lwd=2)
> lines(density(x3), lty=4, lwd=2)
> lines(density(x4), lty=1, col="grey", lwd=2)
> lines(density(x5), lty=2, col="grey", lwd=2)
> lines(density(x6), lty=3, col="grey", lwd=2)

In Figure 2.2, we plot the empirical densities (as estimated by the density() function, using a kernel approach). Note that the exponential transformation has a heavy-tailed distribution, and only the right tail is shown on the graph. With these transformations in mind, we can now list the set of distributions generally used in actuarial science.

Figure 2.2

Transformation of random variables (from a gamma distribution).

The most important distribution with finite support is the uniform distribution, with density $f_X(x) = \mathbf{1}_{[0,1]}(x)$. The uniform distribution is the building block of non-uniform random generation, as the random variable $F_X^{-1}(U)$, with U a uniform variable, has distribution $F_X$.
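As a quick sketch of this inversion method (on the exponential distribution, whose quantile function $-\log(1-p)/\lambda$ is recalled in Section 2.2.3), we can compare simulated quantiles with the theoretical ones:

> set.seed(1)
> u <- runif(10^4)
> x <- -log(1-u)/2    # quantile function of an exponential E(2) applied to uniforms
> rbind(simulated=quantile(x, probs=c(.5, .9)),
+ theoretical=qexp(c(.5, .9), rate=2))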

Another popular distribution is the beta distribution, defined as

$$f_X(x) = \frac{x^{a-1}(1-x)^{b-1}}{\beta(a,b)} \mathbf{1}_{[0,1]}(x) \quad \text{and} \quad F_X(x) = \frac{\beta(a,b,x)}{\beta(a,b)},$$

where $\beta(\cdot,\cdot)$ is the beta function and $\beta(\cdot,\cdot,\cdot)$ is the lower incomplete beta function; see Olver et al. (2010). When $a = b = 1$, we get back to the uniform distribution, that is, $f_X$ is constant. When $a, b < 1$, the density $f_X$ is U-shaped, whereas for $a, b > 1$, the density is unimodal. A monotone density is obtained when $a - 1$ and $b - 1$ have opposite signs. Both of these distributions are implemented in R; see ?dunif and ?dbeta. By appropriate scaling and shifting, that is, $c + (d - c)X$, a distribution on any interval $[c, d]$ can be obtained. Finally, another important distribution Tr(a, b, c) is the triangular distribution, given by

$$f_X(x) = \frac{2(x-a)}{(b-a)(c-a)} \mathbf{1}_{[a,c]}(x) + \frac{2(b-x)}{(b-a)(b-c)} \mathbf{1}_{]c,b]}(x),$$

which, as its name suggests, has a triangular-shaped density. When the mode is central, $c = (a+b)/2$, the triangular distribution on $[a,b]$ is the distribution of the sum of two uniform variates on the interval $[a/2, b/2]$. The triangular distribution is available in the triangle package.

As presented in Klugman et al. (2009), the two main families of (unbounded) positive continuous distributions are the gamma-transformed family and the beta-transformed family. Let X follow a gamma distribution $\mathcal{G}(\alpha, 1)$. The gamma-transformed family is the set of distributions of $Y = X^{1/\tau}/\lambda$ for $\tau > 0$, which has the following density and distribution functions:

$$f_Y(y) = \frac{\tau\,\lambda^{\tau\alpha}}{\Gamma(\alpha)}\, y^{\alpha\tau - 1} e^{-(\lambda y)^\tau} \quad \text{and} \quad F_Y(y) = \Gamma\left(\alpha, (\lambda y)^\tau\right)/\Gamma(\alpha),$$

where $\Gamma(\cdot,\cdot)$ denotes the lower incomplete gamma function; see, for example, Olver et al. (2010). When $\tau < 0$, we get the inverse gamma-transformed family. Let $\tau^* = -\tau$. The density and the distribution function of $Y = 1/(\lambda X^{1/\tau^*})$ are given by

$$f_Y(y) = \frac{\tau^*\, e^{-(\lambda y)^{-\tau^*}}}{\lambda^{\tau^*\alpha}\, y^{\alpha\tau^* + 1}\, \Gamma(\alpha)} \quad \text{and} \quad F_Y(y) = 1 - \Gamma\left(\alpha, (\lambda y)^{-\tau^*}\right)/\Gamma(\alpha).$$

In Figure 2.3, we list the different special cases of the transformed gamma distribution and their relationships.

Figure 2.3

Transformed gamma family.

Figure 2.4

Transformed beta family.

The beta-transformed family is based on the beta distribution of the second kind (or type II), that is, the distribution of $X/(1-X)$ when X follows a beta distribution of type I, presented earlier in this subsection. The beta distribution of type II has density $f_X(x) = \frac{x^{a-1}}{\beta(a,b)(1+x)^{a+b}}$. Renaming $a = \alpha$ and $b = \tau$, the transformed beta is the distribution of $Y = \theta X^{1/\gamma}$ and has the following density and distribution functions:

$$f_Y(y) = \frac{\gamma\,(y/\theta)^{\gamma\tau}}{y\,\beta(\alpha,\tau)\,\left(1 + (y/\theta)^\gamma\right)^{\alpha+\tau}} \quad \text{and} \quad F_Y(y) = \frac{\beta\left(\alpha, \tau, \frac{x}{1+x}\right)}{\beta(\alpha,\tau)},$$

where $x = (y/\theta)^\gamma$ and $\beta(\cdot,\cdot,\cdot)$ denotes the lower incomplete beta function. These two families are available in actuar.
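A quick sanity check of these implementations is possible (a sketch, assuming actuar's argument order shape1 = α, shape2 = τ for the transformed gamma, and shape1 = α, shape2 = γ, shape3 = τ for the transformed beta): setting τ = 1 should recover the plain gamma and Burr distributions.

> library(actuar)
> x <- c(0.5, 1, 2)
> all.equal(dtrgamma(x, shape1=3, shape2=1, rate=2),
+ dgamma(x, shape=3, rate=2))
[1] TRUE
> all.equal(dtrbeta(x, shape1=2, shape2=3, shape3=1, scale=1.5),
+ dburr(x, shape1=2, shape2=3, scale=1.5))
[1] TRUE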

2.1.2 Discrete Distributions

The Sundt (a, b, 0) family of distributions is the set of distributions satisfying

$$\frac{\mathbb{P}(X = k+1)}{\mathbb{P}(X = k)} = a + \frac{b}{k+1},$$

for $k \in \mathbb{N}$ and constants a, b. This recurrence equation can be seen as a simplified discrete analog of the Pearson system (see Johnson et al. (2005)). We get back to the binomial distribution B(n, p) with $a = -p/(1-p)$ and $b = (n+1)p/(1-p)$, the Poisson distribution $\mathcal{P}(\lambda)$ with $a = 0$ and $b = \lambda$, and the negative binomial distribution NB(m, p) with $a = 1-p$ and $b = (1-p)(m-1)$. A generalization of the (a, b, 0) family is obtained by truncating the values smaller than n. Thus, the (a, b, n) family satisfies

$$p_X(k) = p_X(k-1)\left(a + \frac{b}{k}\right)\mathbf{1}(k \ge n).$$
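As a quick check of the recursion (a sketch for the Poisson case, where a = 0 and b = λ, so the ratio of successive probabilities should be λ/(k+1)):

> k <- 0:5
> dpois(k+1, lambda=3)/dpois(k, lambda=3)
[1] 3.00 1.50 1.00 0.75 0.60 0.50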

Furthermore, the exponential family also models discrete distributions by considering a mass probability function $p_X$ that satisfies

$$p_X(k) = \exp\left(\sum_{j=1}^{d} a_j(k)\,\alpha_j(\theta) + b(k) + \beta(\theta)\right).$$

It includes many familiar distributions: the Bernoulli distribution with $d = 1$, $a(x) = x$, $\alpha(p) = \log(p/(1-p))$, $b(x) = 0$ and $\beta(p) = \log(1-p)$; and the Poisson distribution with $d = 1$, $a(x) = x$, $\alpha(\lambda) = \log(\lambda)$, $b(x) = -\log(x!)$ and $\beta(\lambda) = -\lambda$. See Chapter 14 for a discussion of the negative binomial distribution and the exponential family.

As for continuous distributions, discrete distributions are implemented in four functions: dfoo computes the mass probability function $p_X$, pfoo the cumulative distribution function $F_X$, qfoo the quantile function $F_X^{-1}$ and rfoo the random number generator. For instance, the Poisson distribution is implemented in dpois, ppois, qpois and rpois. Here is a standard call:

> dpois(0:2, lambda=3)
[1] 0.04978707 0.14936121 0.22404181
> ppois(1:2, lambda=3)
[1] 0.1991483 0.4231901
> qpois(1/2, lambda=3)
[1] 3
> rpois(5, lambda=3)
[1] 2 2 3 5 2

Typical transformations of an integer-valued random variable X are the following: (i) translation X + m for a non-null integer m (e.g. the shifted Poisson distribution), (ii) scaling mX, (iii) zero-inflation (1 − B)X, where B follows a Bernoulli distribution $\mathcal{B}(q)$, and (iv) zero-modification (1 − B)(X + 1), where B follows a Bernoulli distribution. The resulting mass probability functions for the transformed variable Y are

(i) $\mathbb{P}(Y = k) = \mathbb{P}(X = k - m)$ for $k \ge m$,

(ii) $\mathbb{P}(Y = k) = \mathbb{P}(X = k/m)$ for $k = 0, m, 2m, 3m, \ldots$,

(iii) $\mathbb{P}(Y = 0) = q + (1-q)\mathbb{P}(X = 0)$ and $\mathbb{P}(Y = k) = (1-q)\mathbb{P}(X = k)$ for $k \ge 1$,

(iv) $\mathbb{P}(Y = 0) = q$ and $\mathbb{P}(Y = k) = (1-q)\mathbb{P}(X = k-1)$ for $k \ge 1$.

The zero-modification and the zero-inflation are useful to add a parameter to standard discrete distributions, for example, the Poisson distribution. A particular case of the zero-modification is the zero-truncation, obtained when the variable B equals 0 almost surely. Those transformations will be considered in Chapter 14, in the context of modeling claim frequency in motor insurance.

Such transformations are implemented in dedicated packages (see the task view), but they can also easily be implemented manually:

> dpoisZM <- function(x, prob, lambda)
+ prob*(x == 0) + (1-prob)*(x > 0)*dpois(x-1, lambda)
> ppoisZM <- function(q, prob, lambda)
+ prob*(q >= 0) + (1-prob)*(q > 0)*ppois(q-1, lambda)
> qpoisZM <- function(p, prob, lambda)
+ ifelse(p <= prob, 0, 1+qpois((p-prob)/(1-prob), lambda))
> rpoisZM <- function(n, prob, lambda)
+ (1-rbinom(n, 1, prob))*(rpois(n, lambda)+1)
> x <- rpoisZM(100, 1/2, 3)
> plot(ecdf(x), main="Zero-modified Poisson(prob=1/2, lam=3)")
> lines(z <- sort(c(0:12, 0:12-1e-6)),
+ ppoisZM(z, 1/2, 3), col="grey", lty=4, lwd=2)
> legend("bottomright", lty=c(1,4), lwd=1:2,
+ col=c("black","grey"), leg=c("empir.","theo."))

In Figure 2.5, we plot the empirical cumulative distribution function of a zero-modified Poisson distribution.

Figure 2.5

Cumulative distribution function of a zero-modified Poisson sample.

The main discrete distributions are the binomial B(n, p), the Poisson $\mathcal{P}(\lambda)$ and the negative binomial NB(m, p) distributions, for which we recall the mass probability functions. For the binomial distribution,

$$\mathbb{P}(X = k) = \binom{n}{k} p^k (1-p)^{n-k},$$

for $p \in [0,1]$ (the Bernoulli distribution is obtained with n = 1). For the Poisson distribution,

$$\mathbb{P}(X = k) = \frac{\lambda^k}{k!} e^{-\lambda},$$

for $\lambda > 0$; and for the negative binomial distribution,

$$\mathbb{P}(X = k) = \binom{m+k-1}{k} p^m (1-p)^k,$$

for $p \in [0,1]$. The discrete analog of the Pareto distribution is the Zipf distribution, whose mass probability function is given by

$$\mathbb{P}(X = k) = \frac{k^{-\eta}}{\zeta(\eta)},$$

where $\zeta(\cdot)$ is the Riemann zeta function; see Olver et al. (2010).

2.1.3 Mixed-Type Distributions

Mixed-type distributions are distributions of random variables that are neither continuous nor discrete, that is, $0 < \sum_{x \in D_X} \left(F_X(x) - F_X(x^-)\right) < 1$, where $D_X$ is the set of discontinuities of $F_X$; the lower bound corresponds to continuous distributions and the upper bound to discrete distributions. Thus, the distribution function has both jumps and continuous parts. A first example of a mixed-type distribution is the zero-modified gamma distribution, which has distribution function

$$F_X(x) = p\,\mathbf{1}_{x \ge 0} + (1-p)\,\frac{\Gamma(\alpha, \lambda x)}{\Gamma(\alpha)},$$

where $\Gamma(\cdot,\cdot)$ denotes the incomplete gamma function. X has the improper density function $f_X(x) = (1-p)\lambda^\alpha x^{\alpha-1} e^{-\lambda x}/\Gamma(\alpha)$; a sketch of an R implementation is given below. In a similar way, zero-modified Pareto or zero-modified lognormal distributions can be defined.
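Following the naming convention of the zero-modified Poisson functions above, a minimal sketch could be (dgammaZM and pgammaZM are ad hoc names, not functions of an existing package; prob denotes the mass p at zero):

> dgammaZM <- function(x, prob, shape, rate)
+ prob*(x == 0) + (1-prob)*dgamma(x, shape, rate)
> pgammaZM <- function(q, prob, shape, rate)
+ prob*(q >= 0) + (1-prob)*pgamma(q, shape, rate)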

An application of mixed-type distributions to destruction rate models is now presented. Destruction rate models focus on the distribution of X = L/d, where L is the loss amount and d the maximum possible loss (as defined in the insurance terms). By definition, X is bounded to the interval [0, 1] and may have a mass at 1, when the insured object is entirely destroyed. In the application that follows, we consider the one-modified beta and the MBBEFD distributions. The one-modified beta is the distribution of $X = B + (1-B)Y$, where Y follows a beta distribution B(a, b) and B follows a Bernoulli distribution $\mathcal{B}(q)$. The distribution function is given by
$$F_X(x) = (1-q)\,\frac{\beta(a,b,x)}{\beta(a,b)} + q\,\mathbf{1}_{x \ge 1},$$

for which the improper density is $f_X(x) = (1-q)\,x^{a-1}(1-x)^{b-1}/\beta(a,b)$. In R, we define it as

> dbetaOM <- function(x, prob, a, b)
+ dbeta(x, a, b)*(1-prob)*(x != 1) + prob*(x == 1)
> pbetaOM <- function(q, prob, a, b)
+ pbeta(q, a, b)*(1-prob) + prob*(q >= 1)

The Maxwell-Boltzmann Bose-Einstein Fermi-Dirac (MBBEFD) distribution was introduced and popularized by Bernegger (1997) in the context of reinsurance treaties. The distribution function is given by

$$F_X(x) = a\left(\frac{a+1}{a + b^x} - 1\right)\mathbf{1}_{[0,1[}(x) + \mathbf{1}_{[1,+\infty[}(x),$$

where $(a, b) \in\; ]-1, 0[\;\times\;]1, +\infty[$ or $(a, b) \in\; ]0, +\infty[\;\times\;]0, 1[$. Note that there is a probability mass at 1, since $\mathbb{P}(X = 1) = (a+1)b/(a+b) = q$. The improper density function is

$$f_X(x) = \frac{-a(a+1)\,b^x \log(b)}{(a + b^x)^2}\,\mathbf{1}_{[0,1[}(x).$$

At the time this book is written, there is no package implementing the MBBEFD distribution, but this can be remedied by the following lines:

> dMBBEFD <- function(x, a, b)
+ -a*(a+1)*b^x*log(b)/(a + b^x)^2 + (a+1)*b/(a+b)*(x == 1)
> pMBBEFD <- function(x, a, b)
+ a*((a+1)/(a+b^x)-1)*(x<1) + 1*(x>=1)

Those two distributions will be used in the subsequent section on destruction rate data.

Mixing distributions consists of randomly drawing a distribution among a finite set of distributions. Consider a set of distribution functions $F_1, \ldots, F_p$ and a set of weights $\omega_1, \ldots, \omega_p \in [0,1]$ summing to 1. The choice $\Theta$ of a distribution is such that $\mathbb{P}(\Theta = i) = \omega_i$ for $i = 1, \ldots, p$. The random generation process given by (i) draw $\Theta$ according to the $\omega_i$'s and (ii) draw according to $F_\Theta$ knowing $\Theta$ yields the mixture of $(F_1, \ldots, F_p)$ according to $\omega_1, \ldots, \omega_p$. It is characterized by the following distribution function

$$F_X(x) = \sum_{i=1}^{p} \omega_i F_i(x)$$

for all x. If the distributions $F_i$ are differentiable, then the density function of the mixture variable X is simply $f_X(x) = \sum_{i=1}^{p} \omega_i F_i'(x)$.

A first simple example is the mixture of two normal distributions $N(m_1, s_1^2)$ and $N(m_2, s_2^2)$, with the following density:

$$f_X(x) = p\,\frac{e^{-(x-m_1)^2/(2s_1^2)}}{\sqrt{2\pi s_1^2}} + (1-p)\,\frac{e^{-(x-m_2)^2/(2s_2^2)}}{\sqrt{2\pi s_2^2}},$$

with a proportion $p \in [0,1]$ and $x \in \mathbb{R}$. This distribution is implemented in the packages mixtools and nor1mix. A second example, of more interest in actuarial science, is the mixture of a light-tailed and a heavy-tailed claim distribution, say the mixture of a gamma distribution $\mathcal{G}(\nu, \lambda)$ and a Pareto distribution $\mathcal{P}(\alpha, \theta)$. The density is given by

$$f_X(x) = p\,\frac{\lambda^\nu x^{\nu-1} e^{-\lambda x}}{\Gamma(\nu)} + (1-p)\,\frac{\alpha}{\theta}\left(\frac{\theta}{\theta+x}\right)^{\alpha+1},$$

with a proportion $p \in [0,1]$ and $x \in \mathbb{R}_+$. In R, we implement it as

> library(actuar)
> dmixgampar <- function(x, prob, nu, lambda, alpha, theta)
+ prob*dgamma(x, nu, lambda) + (1-prob)*dpareto(x, alpha, theta)
> pmixgampar <- function(q, prob, nu, lambda, alpha, theta)
+ prob*pgamma(q, nu, lambda) + (1-prob)*ppareto(q, alpha, theta)

where dpareto is implemented in the actuar package.

Another important family obtained using mixtures is the family of so-called phase-type distributions, obtained as mixtures of exponential distributions. Given a vector p of probabilities of length k and a $k \times k$ matrix M, X is said to be phase-type distributed, with parameters p and M, if

$$F_X(x) = \mathbb{P}(X \le x) = 1 - p\,\exp(Mx)\,\mathbf{1}_k,$$

where exp denotes here the matrix exponential; see Moler & Van Loan (1978) for a survey. The phase-type distribution can be seen as the distribution of the time to absorption (in state 0) of a Markov jump process on the set {0, 1, ..., k} with initial probability (0, p) and intensity matrix

$$\Lambda = \begin{pmatrix} 0 & \mathbf{0}^\top \\ \mathbf{m}_0 & M \end{pmatrix},$$

where the vector $\mathbf{m}_0 = -M\,\mathbf{1}_k$. Observe that X has density

$$f_X(x) = p\,\exp(Mx)\,\mathbf{m}_0.$$

In the package actuar, phase-type distributions are available under the root name phtype, prob being the vector p and rates being the matrix M. One particular case is the Erlang(k, λ) distribution, with density

$$f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)},$$

obtained when p = (1, 0, ..., 0) (of length k) and M is the k × k matrix with −λ on the diagonal, λ just above the diagonal, and 0 elsewhere (see O'Cinneide (1990) and Bladt (2005) for more details). To generate an Erlang distribution with k = 3 and λ = 2, one can use

> M <- matrix(0, 3, 3)
> diag(M) <- -2
> diag(M[1:(nrow(M)-1), 2:ncol(M)]) <- 2
> M
     [,1] [,2] [,3]
[1,]   -2    2    0
[2,]    0   -2    2
[3,]    0    0   -2
> set.seed(123)
> rphtype(5, prob=c(1,0,0), rates=M)
[1] 0.3311529 2.3017693 0.5631011 2.7375481 2.0612129
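As a quick check of the Erlang claim (a sketch using actuar's dphtype function), the phase-type density with these parameters can be compared with the Erlang(3, 2), that is, gamma $\mathcal{G}(3, 2)$, density:

> dgamma(1:3, shape=3, rate=2)
[1] 0.54134113 0.29305022 0.08923508
> all.equal(dphtype(1:3, prob=c(1,0,0), rates=M),
+ dgamma(1:3, shape=3, rate=2))
[1] TRUE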

Other distributions, such as mixtures of generalized Erlang distributions, are also phase-type. However, it can be shown that a phase-type distribution has the rational Laplace transform $\hat{f}(s) = p\,(sI_k - M)^{-1}\,\mathbf{m}_0$, so its tail decays exponentially. Therefore, the phase-type family does not include heavy-tailed distributions.

2.1.4 S3 versus S4 Types for Distribution

In the previous chapter, the distinction between S3 and S4 objects was introduced. Some packages allow one to use S4 objects to deal with distributions. For instance, using

> library(distr)
> library(distrEx)

we define an object, which is a distribution; then various functions can be used to get the density, the quantile function or a random number generator based on that distribution. Consider, for instance, the N(5, 2²) distribution:

> X <- Norm(mean=5,sd=2)
> X
Distribution Object of Class: Norm
mean: 5
sd: 2

Figure 2.6

Using plot of a distr object for a discrete distribution.

If we want to compute quantiles associated with that distribution, we use the quantile function q(): q(X) is then the function $u \mapsto F_X^{-1}(u)$,

> q(X)
function (p, lower.tail = TRUE, log.p = FALSE)
{
qnorm(p, mean =5, sd = 2, lower.tail = lower.tail, log.p = log.p)
}
<environment: 0x10d796c98>

And if we want to evaluate that function, for instance to get the value of $F_X^{-1}(0.25)$, we use

> q(X)(0.25)
[1] 3.65102

(which is the same as the standard qnorm(0.25,mean=5,sd=2)). Various functions can also be used to derive simple quantities associated to that distribution, such as moments

> mean(X)
[1] 5

We can also create discrete distributions, such as

> N <- DiscreteDistribution(supp=c(1,2,4,9) , prob=c(.2,.4,.35,.05))

where the support supp and the associated probabilities prob are specified. One can then use r() to generate random numbers, d() to compute the density function, p() to compute the cumulative distribution function, and q() to compute the quantile function. We can also visualize that distribution using

> plot(N)

An interesting feature of this S4 class is that simple arithmetic on distributions can be performed. Consider two distributions X1 and X2:

> X1 <- Norm(mean=5,sd=2)
> X2 <- Norm(mean=2,sd=1)

then the operator + can be used to define the distribution of the sum of two independent Gaussian random variables,

> S <- X1+X2
> plot(S)

If we look at the title of Figure 2.7, we can see that S is recognized as a Gaussian distribution (the sum of two independent Gaussian distributions being also a Gaussian distribution).
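A quick numerical check is possible (a sketch): since the sum is N(7, 2² + 1²), the distribution function evaluated at the mean should be 1/2,

> p(S)(7)
[1] 0.5
> pnorm(7, mean=7, sd=sqrt(5))
[1] 0.5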

Figure 2.7

Using plot of a distr object for a sum of two variables.

Other operators can be used, such as -, * or /, even ^,

> U <- Unif(Min=0,Max=1)
> N <- DiscreteDistribution(supp=c(1,2,4,9), prob=c(.2,.4,.35,.05))
> Z <- U^N
> plot(Z)

Such a distribution is absolutely continuous, and is recognized as an AbscontDistribution object. A more complex object is obtained if N can take the value 0. Then the distribution is no longer absolutely continuous:

> N <- DiscreteDistribution(supp=c(0,1,2,4) , prob=c(.2,.4,.35,.05))
> Z <- U^N
> Z
An object of class "UnivarLebDecDistribution"
--- a Lebesgue decomposed distribution:
  Its discrete part (with weight 0.200000) is a
Distribution Object of Class: Dirac
location: 1
This part is accessible with 'discretePart(<obj>)'.
  Its absolutely continuous part (with weight 0.800000) is a
Distribution Object of Class: AbscontDistribution
This part is accessible with 'acPart(<obj>)'.

Using plot(Z), we can see that this distribution has two components: an (absolutely) continuous one, and a Dirac mass at 1.

Observe finally that compound distributions can also be generated easily. The standard compound Poisson distribution is obtained using

> CP <- CompoundDistribution(Pois(), Gammad())
> CP
An object of class "CompoundDistribution"
  The frequency distribution is:
Distribution Object of Class: Pois
lambda: 1
  The summands distribution is/are:
Distribution Object of Class: Gammad
shape: 1
scale: 1

2.2 Parametric Inference

Parametric inference deals with the estimation of an unknown parameter of a chosen distribution. The experimenter assumes that $(x_1, \ldots, x_n)$ are realizations of a random sample $(X_1, \ldots, X_n)$, the $X_i$'s being independent and identically distributed according to a generic random variable X (this is the blanket assumption). The random variable X has a distribution function $F(\cdot; \theta)$ for $\theta \in \Theta \subset \mathbb{R}^d$; for example, $F(x; \theta) = (1 - e^{-\theta x})\mathbf{1}_{\mathbb{R}_+}(x)$ with $\theta \in \mathbb{R}_+$ when considering an exponential distribution. In the subsequent sections, classical estimation methods are presented, providing criteria to establish an estimator $\hat{\theta}$ of $\theta$. Once a model is fitted, the experimenter can derive the quantities of interest (mean, variance, quantiles, survival probabilities, ...) from the fitted distribution $F(x; \hat{\theta})$. In most applications, X has either a continuous or a discrete distribution; therefore, we work with either the density function $f_X(\cdot; \theta)$ or the mass probability function $p_X(\cdot; \theta)$. For a general introduction to statistical inference, we refer to Casella & Berger (2002).

2.2.1 Maximum Likelihood Estimation

As its name suggests, maximum likelihood estimation consists of maximizing the likelihood with respect to θ, which is defined as

$$\mathcal{L}(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i; \theta) \quad \text{or} \quad \prod_{i=1}^{n} p_X(x_i; \theta),$$

depending on the type of the random variable X. For many reasons, it is more convenient to maximize the log-likelihood $\log \mathcal{L}$ with respect to $\theta$. A school example is the exponential distribution $\mathcal{E}(\theta)$, for which $\log \mathcal{L}(\theta) = n\log(\theta) - \theta\sum_{i=1}^{n} x_i$. The maximizer is $1/\bar{x}_n$, leading to the estimator $\hat{\theta} = 1/\bar{X}_n$. In practice, closed-form formulas for the maximizers may not exist, so we resort to numerical optimization. The fitdistrplus package provides routines to compute the maximum likelihood estimator for most standard distributions.
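As a quick numerical check of the school example (a sketch on simulated data; the fitted rate should agree with $1/\bar{x}_n$ up to the optimization tolerance):

> library(fitdistrplus)
> set.seed(123)
> y <- rexp(1000, rate=3)
> 1/mean(y)
> fitdist(y, "exp")$estimate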

We consider the claim dataset itamtplcost, which contains large losses (in excess of 500,000 euros) of an Italian Motor-TPL company since 1997. For pedagogical purposes (even though the distribution is not really appropriate), we choose to fit a gamma distribution $\mathcal{G}(\alpha, \lambda)$, defined as $f_X(x) = \lambda^\alpha x^{\alpha-1} e^{-\lambda x}/\Gamma(\alpha)$, with parameter $\theta = (\alpha, \lambda) \in \mathbb{R}_+^2$.

> data(itamtplcost)
> library(fitdistrplus)
> x <- itamtplcost$UltimateCost / 10^6
> summary(x)
 Min. 1st Qu.  Median Mean 3rd Qu. Max.
0.002161 0.627700 0.844000 1.015000 1.224000 6.639000
> fgamMLE <- fitdist(x, "gamma", method="mle")
> fgamMLE
Fitting of the distribution ' gamma ' by maximum likelihood Parameters:
 estimate  Std. Error
shape 2.398655  0.1489696
rate 2.362486  0.1631542
> summary(fgamMLE)
Fitting of the distribution ' gamma ' by maximum likelihood
Parameters :
  estimate Std. Error
shape 2.398655 0.1489696
rate 2.362486 0.1631542
Loglikelihood:  -385.1474  AIC:  774.2947  BIC:  782.5441
Correlation matrix:
          shape      rate
shape 1.0000000 0.8992915
rate 0.8992915 1.0000000

Without a scaling of the cost from euros to millions of euros, the call to fitdist raises an error; thus we divide the ultimate cost by $10^6$. In this example, $\hat{\theta}$ is estimated as (2.398655, 2.362486). Note that the fitdist function returns an S3 object of class fitdist, for which print, summary and plot methods are defined. In addition to the estimated standard errors of $\hat{\theta}$, the summary method gives an estimate of the asymptotic correlation matrix, as well as the (optimal) log-likelihood. This is based on the asymptotic normality of the maximum likelihood estimator (under the hypotheses of the Cramer-Rao model; see Casella & Berger (2002)).

2.2.2 Moment Matching Estimation

The moment matching estimation is also commonly used to fit parametric distributions. This consists of finding the value of the parameter θ that matches the first theoretical raw moments of the parametric distribution to the corresponding empirical raw moments as

$$\mathbb{E}(X^k | \theta) = \frac{1}{n}\sum_{i=1}^{n} x_i^k,$$

for k = 1, ..., d, with d the number of parameters to estimate and $x_i$ the n observations of the variable X. For moments of order greater than or equal to 2, it may be preferable to match centered moments, defined as

$$\mathbb{E}(X | \theta) = \bar{x}_n, \qquad \mathbb{E}\left(\left(X - \mathbb{E}(X)\right)^k \,\middle|\, \theta\right) = m_k, \quad \text{for } k = 2, \ldots, d,$$

where $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}_n)^k$ denotes the empirical centered moment. For instance, consider the gamma distribution $\mathcal{G}(\alpha, \lambda)$. The moment matching estimation solves

$$\begin{cases} \alpha/\lambda = \bar{x}_n \\ \alpha/\lambda^2 = m_2 \end{cases} \iff \begin{cases} \alpha = (\bar{x}_n)^2/m_2 \\ \lambda = \bar{x}_n/m_2 \end{cases}$$

In general, there is no closed-form formula for this estimator and a numerical method must be used. Still considering the gamma fit of the MTPL dataset, we use the fitdistrplus package.

> fgamMME <- fitdist(x, "gamma", method="mme")
> cbind(MLE=fgamMLE$estimate, MME=fgamMME$estimate)
 
   MLE  MME
shape 2.398655 2.229563
rate 2.362486 2.195851
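The closed-form formulas above can be checked directly (a quick sketch; note that m2 is the empirical centered moment with a 1/n factor, slightly different from var(), which uses 1/(n−1)):

> m2 <- mean((x - mean(x))^2)
> c(shape=mean(x)^2/m2, rate=mean(x)/m2)
   shape     rate 
2.229563 2.195851 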

2.2.3 Quantile Matching Estimation

Fitting a parametric distribution may also be done by matching the theoretical quantiles of the parametric distribution (for some specified probabilities) against the empirical quantiles (see Tse (2009), among others). The equation below is very similar to the previous equations for matching moments:

$$F^{-1}(p_k; \theta) = Q_{n, p_k},$$

for k = 1, ..., d, with $Q_{n,p_k}$ the empirical quantiles for the specified probabilities $p_k$. Empirical quantiles are computed on the observations $x_1, \ldots, x_n$ using the quantile function of the stats package. When $n p_k$ is an integer, the empirical quantile is uniquely defined as $Q_{n,p_k} = x^*_{n p_k}$, where $x^*_1, \ldots, x^*_n$ is the sorted sample. Otherwise, the empirical quantile is a convex combination of $x^*_{\lfloor n p_k \rfloor}$ and $x^*_{\lceil n p_k \rceil}$; see ?quantile and Hyndman & Fan (1996). The theoretical quantile $F^{-1}(\cdot; \theta)$ may have a closed-form formula. For example, when considering the exponential distribution $\mathcal{E}(\lambda)$, the quantile function is

$$F^{-1}(p; \lambda) = -\frac{\log(1-p)}{\lambda}.$$

Solving the d equations $F^{-1}(p_k; \theta) = Q_{n,p_k}$ is achieved by numerical optimization in the fitdist function.

Continuing the MTPL example, we fit a gamma distribution against the probabilities p 1 =1/3 and  p 2 =2/3.

> fgamQME <- fitdist(x, "gamma", method="qme", probs=c(1/3, 2/3))
> cbind(MLE=fgamMLE$estimate, MME=fgamMME$estimate,
+  QME=fgamQME$estimate)
 
   MLE MME  QME
shape 2.398655 2.229563 4.64246
rate 2.362486 2.195851 4.95115

Note that, compared to the method of moments and maximum likelihood estimation, the estimated parameter $\hat{\theta}$ differs significantly when using quantile matching estimation, despite considering probabilities in the heart of the distribution.

2.2.4 Maximum Goodness-of-Fit Estimation

A last method of estimation, called maximum goodness-of-fit estimation (or minimum distance estimation), is presented here; see D'Agostino & Stephens (1986) or Dutang et al. (2008) for more details. In this section, we focus on the Cramer-von Mises distance and refer to Delignette-Muller & Dutang (2013) for other distances (i.e. Kolmogorov-Smirnov and Anderson-Darling). The Cramer-von Mises distance looks at the squared difference between the candidate distribution $F(x; \theta)$ and the empirical distribution function $F_n$, the latter being the proportion of observations below x: $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{x_i \le x}$. The Cramer-von Mises distance is defined as

$$D(\theta) = \int_{-\infty}^{+\infty} \left(F_n(x) - F(x; \theta)\right)^2 dx,$$

and is estimated in practice by

$$\hat{D}(\theta) = \frac{1}{12n} + \sum_{i=1}^{n}\left(F(x_i^*; \theta) - \frac{2i-1}{2n}\right)^2,$$ where $x_1^*, \ldots, x_n^*$ is the sorted sample.

The maximum goodness-of-fit estimation consists of finding the value of $\theta$ minimizing $\hat{D}(\theta)$. The name comes from the fact that the Cramer-von Mises distance measures the goodness-of-fit of $F(\cdot; \theta)$ against $F_n$. There is no closed-form formula for $\text{argmin}\{\hat{D}(\theta)\}$, and a numerical optimization is used in the fitdist function; a sketch of the criterion itself is given below.
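The criterion $\hat{D}(\theta)$ is straightforward to code (a minimal sketch for the gamma case; cvmgamma is an ad hoc name):

> cvmgamma <- function(shape, rate, obs)
+ {
+ Fth <- pgamma(sort(obs), shape=shape, rate=rate)
+ n <- length(obs)
+ 1/(12*n) + sum((Fth - (2*(1:n)-1)/(2*n))^2)
+ }

After the fit below, cvmgamma evaluated at the MGE parameters should not exceed its value at the MLE parameters, since the MGE estimate minimizes this very distance.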

Finally, we fit a gamma distribution by maximum goodness-of-fit estimation:

> fgamMGE <- fitdist(x, "gamma", method="mge", gof="CvM")
> cbind(MLE=fgamMLE$estimate, MME=fgamMME$estimate,
+ QME=fgamQME$estimate, MGE=fgamMGE$estimate)
 
   MLE  MME   QME   MGE
shape 2.398655  2.229563  4.64246  3.720546
rate  2.362486  2.195851  4.95115  3.875971

As for quantile matching estimation, the value of $\hat{\theta}$ differs widely. A practitioner's approach could be to take the average by component, irrespective of the methods tested. This leads to the question of how to choose between fitted parameters and between fitted distributions.

2.3 Measures of Adequacy

This section focuses on measures of adequacy, either graphical or numerical.

2.3.1 Histogram and Empirical Densities

A typical plot to assess the adequacy of a distribution is the histogram. We recall that, for plotting a histogram, observed data are divided into k classes $]a_{j-1}, a_j]$ for j = 1, ..., k (generally, k is proportional to log(n)); the number of observations in each class is computed, that is, the frequencies $f_j$; finally, rectangles are drawn such that the basis is the class $]a_{j-1}, a_j]$ and the height is the absolute ($f_j$) or the relative ($f_j/n$) frequency. Thus, the histogram is an estimator of the density, as the area of a rectangle is proportional to $\mathbb{P}(X \in\, ]a_{j-1}, a_j])$. This graph is generically provided by the plot function of a fitdist object, but that does not allow multiple fitted distributions. So, in the example of a gamma fit to the MTPL dataset, we use the denscomp function:

> txt <- c("MLE","MME","QME(1/3, 2/3)", "MGE-CvM")
> denscomp(list(fgamMLE, fgamMME, fgamQME, fgamMGE), legendtext=txt,
+ fitcol="black", main="Histogram and fitted gamma densities")

Alternatively, we can estimate the density function directly with the popular kernel density estimator. This is implemented in the density function, as shown below:

> hist(x, prob=TRUE, ylim=c(0, 1))
> lines(density(x), lty=5)

In order to better assess the fitted gamma densities, the two above graphs are plotted separately. We observe that the MLE and MME fits best approximate the density for $x \in [0.5, 1.5]$, while the QME and MGE fits better capture the density for $x \in [1.5, 4]$. However, it is clear that the gamma distribution cannot appropriately fit the whole distribution, mainly due to its light-tailedness.

2.3.2 Distribution Function Plot

Another typical graph plots the fitted distribution function $F(\cdot; \hat{\theta})$ together with the empirical cumulative distribution function $F_n$. As already mentioned, the computation of $F_n$ is simpler than that of the empirical density: $F_n(x) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{x_i \le x}$. A new claim dataset is considered to illustrate this type of plot: we use the popular Danish dataset of McNeil (1997). The dataset is stored in danishuni for the univariate version and contains fire loss amounts collected at Copenhagen Reinsurance between 1980 and 1990. We consider four distributions: a gamma distribution $\mathcal{G}(\alpha, \lambda)$, a Pareto distribution $\mathcal{P}(\alpha, \theta)$, a Pareto-gamma mixture $\mathcal{PG}(p, \nu, \lambda, \alpha, \theta)$ defined in Section 2.1.3 and a Burr distribution $\mathcal{B}u(\gamma, \tau, \theta)$.

> data(danishuni)
> x <- danishuni$Loss
> fgam <- fitdist(x, "gamma", lower=0)
> fpar <- fitdist(x, "pareto", start=list(shape=2, scale=2), lower=0)
> fmixgampar <- fitdist(x, "mixgampar", start=
+ list(prob=1/2, nu=1, lambda=1, alpha=2, theta=2), lower=0)
> cbind(SINGLE= c(NA, fgam$estimate, fpar$estimate),
+ MIXTURE=fmixgampar$estimate)
          SINGLE    MIXTURE
              NA  0.6849568
shape  1.2976150 10.8706430
rate   0.3833335  6.5436349
shape  5.3689277  5.4182746
scale 13.8413207 30.0700544

When fitted alone, the parameters of the gamma distribution are estimated as $(\hat{\alpha}, \hat{\lambda}) = (1.2976, 0.3833)$ and the parameters of the Pareto distribution as $(\hat{\alpha}, \hat{\theta}) = (5.3689, 13.8413)$. When used in the mixture, we get

$$(\hat{p}, \hat{\nu}, \hat{\lambda}, \hat{\alpha}, \hat{\theta}) = (0.6849, 10.8706, 6.5436, 5.4182, 30.0700).$$

As only the Pareto shape parameter $\hat{\alpha}$ keeps a similar amplitude across fits, only heavy-tailed distributions (like the Pareto) are appropriate for this dataset. Finally, we fit a Burr distribution:

> fburr <- fitdist(x, "burr", start=list(shape1=2, shape2=2,
+ scale=2), lower=c(0.1,1/2, 0))
> fburr$estimate
 shape1 shape2 scale
0.10000 14.44286 1.08527

Comparing the fitted densities is then carried out using the cdfcomp function:

> cdfcomp(list(fgam, fpar, fmixgampar, fburr), xlogscale=TRUE,
+ datapch=".", datacol="grey", fitcol="black", fitlty=2:5,
+ legendtext=c("gamma","Pareto","Par-gam","Burr"),
+ main="Fitted CDFs on danish")

We also plot the tail of the distribution function in Figure 2.10. When using maximum likelihood estimation, the best fit is provided by the Burr distribution, yet the first shape parameter hits the lower bound of 0.1.

Figure 2.8

Using plot of a distr object for U^N, N being a discrete distribution.

Figure 2.9

Comparison of fits on a MTPL dataset.

Figure 2.10

Comparison of fits on the Danish fire dataset.

2.3.3 QQ-Plot, PP-Plot

In the two previous graphs, we plotted the empirical density (respectively the empirical distribution function) together with the fitted density (respectively the fitted distribution function). The PP-plot (respectively the QQ-plot) consists of plotting (directly) the empirical distribution function $F_n$ against the fitted distribution function $F(\cdot; \hat{\theta})$ (respectively the empirical quantile function $Q_n$ against the fitted quantile function $F^{-1}(\cdot; \hat{\theta})$). Those quantities are computed at the observations, which leads to further simplifications: $F_n(x_i) = \text{rank}(x_i)/n$ and $Q_{n,i/n} = x_i^*$. This is illustrated on the Danish fire dataset danishuni and the four distributions considered, using the ppcomp and qqcomp functions:

> qmixgampar <- function(p, prob, nu, lambda, alpha, theta)
+ {
+ L2 <- function(q, p)
+ (p - pmixgampar(q, prob, nu, lambda, alpha, theta))^2
+ sapply(p, function(p) optimize(L2, c(0, 10^3), p=p)$minimum)
+ }
> ppcomp(list(fgam, fpar, fmixgampar, fburr), xlogscale=TRUE,
+ ylogscale=TRUE, fitcol="black", main="PP-plot on danish",
+ legendtext=c("gamma","Pareto","Par-gam","Burr"), fitpch=1:4)
> qqcomp(list(fgam, fpar, fmixgampar, fburr), xlogscale=TRUE,
+ ylogscale=TRUE, fitcol="black", main="QQ-plot on danish",
+ legendtext=c("gamma","Pareto","Par-gam","Burr"), fitpch=1:4)

As there is no closed-form formula for the quantile function of the mixture distribution (i.e. the inverse of $x \mapsto p\,\Gamma(\nu, \lambda x)/\Gamma(\nu) + (1-p)\left(1 - \theta^\alpha/(\theta + x)^\alpha\right)$), a numerical optimization is carried out using golden section search (implemented in optimize). In Figure 2.11, quantiles and probabilities are plotted as points, while the straight line corresponds to the identity function. The closer the points are to the line, the better the fit of the distribution. The PP-plot reveals that only the Burr distribution sufficiently fits the data, whereas the QQ-plot shows that both the Pareto-gamma mixture and the Burr distributions approximate the data well.

Figure 2.11

Comparison of fits on the Danish fire dataset.

The plot method of a fitdist object provides the four above graphs (for the fitted distribution) in the following order: a histogram with the fitted density, an ecdf plot with the fitted distribution function, a theoretical-quantile versus empirical-quantile plot (QQ-plot) and a theoretical-probability versus empirical-probability plot (PP-plot).

2.3.4 Goodness-of-Fit Statistics and Tests

We turn our attention to goodness-of-fit statistics to complement the four previous goodness-of-fit graphs. For continuous distributions, the three statistics presented in Section 2.2.4 can be computed, that is, the Cramer-von Mises, Kolmogorov-Smirnov and Anderson-Darling statistics. For discrete distributions, the most common statistic is the chi-square statistic, defined as

$$\Delta^2 = \sum_{i=0}^{m} \frac{(n_i - n\hat{p}_i)^2}{n\hat{p}_i},$$

where $n_i$ is the empirical frequency count for the level i, n is the total number of observations, $\hat{p}_i = \mathbb{P}(X = i; \hat{\theta})$ is the fitted probability (i.e. $n\hat{p}_i$ is the theoretical frequency count), and m is the number of cells. In practice, the number of cells is either fixed by the experimenter or chosen so that empirical frequencies are greater than five, $\hat{p}_i$ replacing the unknown $p_i = \mathbb{P}(X = i; \theta)$. The chi-square statistic is linked to Pearson's goodness-of-fit hypothesis test, for which, under the null hypothesis, $\Delta^2$ converges in law to a chi-square distribution $\chi^2(m - d - 1)$ (where d is the number of parameters). For all distributions, we also consider the information criteria (AIC and BIC), based on the penalized opposite of the log-likelihood. All of this is provided in the gofstat function (of the fitdistrplus package). A numerical illustration is proposed on a TPL claim number dataset, for which a Poisson, a negative binomial and a zero-modified Poisson distribution are fitted using maximum likelihood techniques.

> data(tplclaimnumber)
> x <- tplclaimnumber$claim.number
> fpois <- fitdist(x, "pois")
> fnbinom <- fitdist(x, "nbinom")
> fpoisZM <- fitdist(x, "poisZM",  start=list(
+  prob=sum(x == 0)/length(x), lambda=mean(x)),
+  lower=c(0,0), upper=c(1, Inf))
> gofstat(list(fpois, fnbinom, fpoisZM), chisqbreaks=c(0:4, 9),
+ discrete=TRUE, fitnames=c("Poisson","NegBinomial","ZM-Poisson"))
Chi-squared statistic: Inf 11765679 Inf
Degree of freedom of the Chi-squared distribution:  5 4 4
Chi-squared p-value:  0 0 0
the p-value may be wrong with some theoretical counts < 5
Chi-squared table:
  obscounts  theo Poisson  theo NegBinomial  theo ZM-Poisson
<=0  653047  6.520559e+05  6.530606e+05  6.530411e+05
<= 1  23592  2.545374e+04  2.353633e+04  2.351466e+04
<= 2  1299  4.968076e+02  1.326372e+03  1.413873e+03
<= 3  62  6.464481e+00  8.372804e+01  4.250619e+01
<= 4  5  6.308707e-02  5.568862e+00  8.519276e-01
<= 9  5  4.957574e-04  4.104209e-01  1.296158e-02
> 9   3  0.000000e+00  7.649401e-07  0.000000e+00
Goodness-of-fit criteria
       Poisson NegBinomial  ZM-Poisson
Aikake's Information Criterion 226880.4 225375.1 225585.7
Bayesian Information Criterion 226891.8 225398.0 225608.5

From the chi-square statistic and the chi-square table $(n_i, n\hat{p}_i)_i$, the negative binomial distribution is clearly the best distribution. This is also confirmed by the AIC and BIC criteria.

2.3.5 Skewness-Kurtosis Graph

When selecting a distribution, depending on the type of application, the experimenter may give particular attention to the tail, to some quantiles or to the body of the distribution, from which a natural way of choosing the "best" distribution emerges. In actuarial science, great care is given to the tail of the distribution, but also to the first moments. The code below provides the values of quantiles (plotted before) as well as the first two raw moments.

> p <- c(.9, .95, .975, .99)
> rbind(
+ empirical= quantile(danishuni$Loss, prob=p),
+ gamma= quantile(fgam, prob=p)$quantiles,
+ Pareto= quantile(fpar, prob=p)$quantiles,
+ Pareto_gamma= quantile(fmixgampar, prob=p)$quantiles,
+ Burr= quantile(fburr, prob=p)$quantiles)
   p=0.9   p=0.95   p=0.975    p=0.99
empirical 5.541526  9.972647   16.26821   26.04253
gamma   7.308954  9.261227   11.18907   13.71207
Pareto  7.412375  10.341301  13.67386   18.79426
Pareto_gamma 7.093200  12.164896  17.92871   26.77259
Burr  5.344677  8.636739   13.95655   26.32118
> compmom <- function(order)
+ c(empirical= sum(danishuni$Loss^order)/length(x),
+ gamma=mgamma(order, fgam[[1]][1], fgam[[1]][2]),
+ Pareto=mpareto(order, fpar[[1]][1], fpar[[1]][2]),
+ Pareto_gamma= as.numeric(fmixgampar[[1]][1]*
+ mgamma(order, fmixgampar[[1]][2], fmixgampar[[1]][3]) +
+ (1-fmixgampar[[1]][1])*
+ mpareto(order, fmixgampar[[1]][4], fmixgampar$estimate[5])),
+ Burr=mburr(order, fburr[[1]][1], fburr[[1]][2], fburr[[1]][3]))
> rbind(Mean=compmom(1), Mom2nd= compmom(2))
  empirical   gamma   Pareto  Pareto_gamma Burr
Mean 0.01081909  3.385081 3.168128 3.28202  2.98562
Mom2nd 0.26784042  20.289412 26.032657 39.78745   Inf

For higher moments, it is typical to look at the skewness and the kurtosis coefficients defined as

$$sk(X) = \frac{\mathbb{E}\left[(X - \mathbb{E}(X))^3\right]}{\text{var}(X)^{3/2}}, \qquad kr(X) = \frac{\mathbb{E}\left[(X - \mathbb{E}(X))^4\right]}{\text{var}(X)^2},$$

for a random variable X. For heavy-tailed distributions, such coefficients may not exist, yet their empirical counterparts always exist. The descdist function provides the so-called Cullen and Frey graph, from Cullen & Frey (1999), which plots the empirical estimates of sk(X) and kr(X), as well as the possible values for some classic distributions (including the gamma family for continuous distributions and the Poisson distribution for discrete distributions). This is illustrated on the danishuni and tplclaimnumber datasets in Figure 2.12. The fit analysis can also be completed by looking at the uncertainty of the parameter estimates with a bootstrap analysis; this is possible with the bootdist function of fitdistrplus.
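The two graphs of Figure 2.12 are obtained with calls of the following form (a sketch; the boot argument adds bootstrapped skewness-kurtosis points to the graph):

> descdist(danishuni$Loss, boot=100)
> descdist(tplclaimnumber$claim.number, discrete=TRUE, boot=100)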

Figure 2.12

Cullen and Frey graph for danish and tplclaimnumber, on the left and on the right, respectively.

2.4 Linear Regression: Introducing Covariates in Statistical Inference

In the first part of this chapter, we mentioned the normal distribution. While it is a popular distribution in financial models (see Chapter 11), it is also frequently used in actuarial science because of its connection with linear regression.

2.4.1 Using Covariates in the Statistical Framework

So far, we have assumed that observations were i.i.d., for example, with distribution $N(\theta, \sigma^2)$. But in most applications, this can yield a very restrictive model. For instance, consider the dataset Davis and let X denote the height of a person (in centimeters):

> X <- Davis$height

We can fit a Gaussian distribution to those heights,

> (param <- fitdistr(X,"normal")$estimate)
 mean   sd
170.56500  8.90987

If we plot the distribution (see Figure 2.13), we can see that using a mixture of two Gaussian distributions is much better than using only a single model,

Figure 2.13

Distribution of the height, using a normal distribution, and mixtures of normal distributions: One with a non-observable latent factor, one where mixture is related to the sex.

$$X \sim p \cdot N(\mu_1, \sigma_1^2) + [1-p] \cdot N(\mu_2, \sigma_2^2)$$

(where the + sign should be understood in the context of mixtures, as described in Section 2.1.3). This model can be estimated using maximum likelihood techniques. Let us define the log-density as

> logdf <- function(x,parameter){
+ p <- parameter[1]
+ m1 <- parameter[2]
+ s1 <- parameter[4]
+ m2 <- parameter[3]
+ s2 <- parameter[5]
+ return(log(p*dnorm(x,m1,s1)+(1-p)*dnorm(x,m2,s2)))
+ }

and, in order to take into account the constraints, namely $p \in (0,1)$ and $\sigma_1, \sigma_2 \in (0, \infty)$, that can be written

$$\underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}}_{A}\,\theta + \underbrace{\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}}_{b} \ge 0.$$

Function constrOptim will seek the minimum of a function, so we will consider here the opposite of the log-likelihood:

> logL <- function(parameter) -sum(logdf(X,parameter))
> Amat <- matrix(c(1,-1,0,0,0,0,
+ 0,0,0,0,1,0,0,0,0,0,0,0,0,1), 4, 5)
> bvec <- c(0,-1,0,0)
> constrOptim(c(.5,160,180,10,10), logL, NULL, ui = Amat, ci = bvec)$par
[1]   0.5996263 165.2690084 178.4991624   5.9447675   6.3564746

Because we use a (finite) normal mixture here, it is also possible to use the EM algorithm, from the mixtools package,

> library(mixtools)
> mix <- normalmixEM(X)
number of iterations= 391
> (param12 <- c(mix$lambda[1],mix$mu,mix$sigma))
[1]   0.5995197 165.2676186 178.4951348   5.9448806   6.3579494

The two methods yield rather similar outputs.

If previously we assumed that the mixing variable $\Theta$ was a latent, unobservable random variable, here it would make sense to assume that a good proxy of this variable is the sex of the individuals. In that case,

$$X \sim p_M \cdot N(\mu_M, \sigma_M^2) + p_F \cdot N(\mu_F, \sigma_F^2).$$

Here, $p_M$ and $p_F$ are known and are, respectively, the proportions of males and females in the population.

> sex <- Davis$sex
> (pM <- mean(sex=="M"))
[1] 0.44
> (paramF <- fitdistr(X[sex=="F"],"normal")$estimate)
  mean    sd
164.714286  5.633808
> (paramM <- fitdistr(X[sex=="M"],"normal")$estimate)
  mean    sd
178.011364  6.404001

If we compare the three models, including a kernel-based estimator, we obtain the graph of Figure 2.13.

> f1 <- function(x) dnorm(x,param[1],param[2])
> f2 <- function(x) param12[1]*dnorm(x,param12[2],param12[4]) +
+ (1-param12[1])*dnorm(x,param12[3],param12[5])
> f3 <- function(x) pM*dnorm(x,paramM[1],paramM[2]) + (1-pM)*dnorm(x,paramF[1],paramF[2])
> boxplot(X~sex,horizontal=TRUE,names=c("Female","Male"))
> x <- seq(min(X),max(X),by=.1)
> plot(x,f1(x),lty=2,type="l")
> lines(x,f2(x),col="grey",lwd=2)
> lines(x,f3(x),col="black",lwd=2)
> lines(density(X))

Actually, this factor-based mixture is a particular case of what is known as the linear model.

2.4.2 Linear Regression Model

To use standard notation in regression modeling, let Y denote the variable of interest, and assume that some additional variables can be used, denoted $X = (X_1, \ldots, X_k)$. This means that, for each observation $Y_i$, we also observe $X_i = (X_{1,i}, \ldots, X_{k,i})$. As discussed in Chapter 14, the $X_k$'s can be either numeric (also called continuous) or categorical (also called factor) variables.

In the Davis dataset, the variable of interest is the height of a person, denoted height (our variable Y), and two additional variables can be used: sex (variable $X_1$) and weight (variable $X_2$).

> Y <- Davis$height
> X1 <- Davis$sex
> X2 <- Davis$weight
> df <- data.frame(Y,X1,X2)

Instead of assuming that

$$Y \sim N(\theta, \sigma^2),$$

we will assume, in a regression model, that

$$Y \,|\, X = x \sim N(\theta(x), \sigma^2),$$

where $\theta(\cdot)$ is now a function of the explanatory variables. Consider here the case where we observe two covariates, the sex and the weight (in kilograms) of the individuals. If we restrict ourselves to linear models, then

$$\theta(x_1, x_2) = \beta_0 + \beta_{1,M}\mathbf{1}(x_1 = M) + \beta_{1,F}\mathbf{1}(x_1 = F) + \beta_2 x_2.$$

From properties of the Gaussian distribution, it is also possible to write this model as

$$Y = \beta_0 + \beta_{1,M}\mathbf{1}(x_1 = M) + \beta_{1,F}\mathbf{1}(x_1 = F) + \beta_2 x_2 + \varepsilon,$$

where $\varepsilon$ is an error term (usually called the residual), centered and normally distributed,

$$\varepsilon \sim N(0, \sigma^2).$$

The unknown parameters (that should be estimated) are now

$$\beta_0,\ \beta_{1,M},\ \beta_{1,F},\ \beta_2 \text{ and } \sigma^2.$$

2.4.3 Inference in a Linear Model

Recall, first of all, that the previous model is not identifiable: we cannot have the intercept and the two factors (M and F) at the same time. The standard procedure is to keep the intercept and to remove one of the two factors; the dropped factor becomes the reference level.

The maximum likelihood estimator is obtained by maximizing

$$\mathcal{L}\left((\beta, \sigma^2); y, x\right) = \prod_{i=1}^{n} \varphi\left(y_i;\, \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i},\, \sigma^2\right),$$

where here φ( y;μ, σ 2 ) is the density of the N( μ, σ 2 ) distribution.

Here, the maximum likelihood estimators of the set of parameters can be written explicitly. When writing the problem using the logarithm of the likelihood, we can observe that the optimal value of $\beta = (\beta_0, \beta_1, \beta_2)$ should satisfy

$$\hat{\beta} = \text{argmin}\left\{\sum_{i=1}^{n}\Big(\underbrace{y_i - \left[\beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i}\right]}_{\text{residuals } \varepsilon_i}\Big)^2\right\}.$$

Thus, using maximum likelihood techniques in a Gaussian linear model is the same as minimizing the sum of squares of residuals (known as Ordinary Least Squares estimation).

Fitting a linear model is done using the function lm(). Using the symbolic notation introduced in Chapter 1, we write here

> lin.mod <- lm(Y~X1+X2,data=df)

As mentioned in Chapter 1, lin.mod is a S3 object, and many functions can be used to extract information from that object. To visualize the standard output of a linear regression, use

> summary(lin.mod)
Call:
lm(formula = Y ~ X1 + X2, data = df)
Residuals:
 Min 1Q Median  3Q  Max
-85.204 -4.183  0.446 5.224 19.009
Coefficients:
   Estimate Std. Error t value Pr(>|t|)
(Intercept) 175.26607 3.30681 53.002 < 2e-16 ***
X1M  17.86160 1.66941 10.699 < 2e-16 ***
X2  -0.19917 0.05503 -3.619 0.000376 ***
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1  1
Residual standard error: 9.424 on 197 degrees of freedom
Multiple R-squared: 0.3903,  Adjusted R-squared: 0.3841
F-statistic: 63.05 on 2 and 197 DF, p-value: < 2.2e-16

It is also possible to get predictions, using predict. Keep in mind that we should have the same input format as in the lm call: The regression was run on a dataframe, so predict should also be called on a dataframe (with the same variable names),

> new.obs <- data.frame(X1=c("M","M","F"),X2=c(100,70,65))
> predict(lin.mod,newdata=new.obs)
  1   2   3 
173.2110 179.1860 162.3202

which returns $x'\hat{\beta}$ for any new observation x.
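For instance, the first prediction can be recovered by hand from the estimated coefficients (a quick check, the design vector being (1, 1, 100) for a male weighing 100 kilograms):

> sum(coef(lin.mod) * c(1, 1, 100))
[1] 173.211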

Linear models with R have been intensively described, so we refer to Venables & Ripley (2002), Fox & Weisberg (2011) or Kleiber & Zeileis (2008) for more details. Extensions of this model will be given in the next chapters (the logistic regression of binary responses in Chapter 4, and GLMs in Chapter 14, among others) as well as a Bayesian interpretation of this model (in Chapter 3).

2.5 Aggregate Loss Distribution

This section deals with the aggregate loss amount distribution, which is the distribution of the compound sum

$$S = \sum_{i=1}^{N} X_i, \quad \text{with } S = 0 \text{ if } N = 0,$$

where N is the claim number and the $(X_i)$'s are the claim severities (which are assumed to be strictly positive). Firstly, the computation of the distribution function $F_S$ of S is studied; then an application to a TPL motor dataset is carried out; finally, a continuous-time version of this problem is analyzed within the ruin theory framework.

2.5.1 Computation of the Aggregate Loss Distribution

A classical assumption on the aggregate amount S is that N is independent of the claim amounts $(X_i)_i$. Another common assumption is that the $(X_i)_i$ are i.i.d., distributed as a generic variable X. Therefore, the distribution function simplifies to
$$F_S(s) = \sum_{n=0}^{+\infty} \mathbb{P}(N = n)\,\mathbb{P}(X_1 + \cdots + X_n \le s) = \sum_{n=0}^{+\infty} \mathbb{P}(N = n)\,F_X^{*n}(s),$$

where $F_X^{*n}$ is the n-fold convolution product of $F_X$. For a few distributions of X, the distribution of the sum $X_1+\dots+X_n$ is easy to derive. For instance, when X follows a gamma distribution G(α, λ), the sum $X_1+\dots+X_n$ follows a gamma distribution G(nα, λ). This can be implemented as

> pgamsum <- function(x, dfreq, argfreq, shape, rate, Nmax=10)
+ {
+ tol <- 1e-10; maxit <- 10
+ nbclaim <- 0:Nmax
+ dnbclaim <- do.call(dfreq, c(list(x=nbclaim), argfreq))
+ psumfornbclaim <- sapply(nbclaim, function(n)
+  pgamma(x, shape=shape*n, rate=rate))
+ psumtot <- psumfornbclaim %*% dnbclaim
+ dnbclaimtot <- dnbclaim
+ iter <- 0
+ while(abs(sum(dnbclaimtot)-1) > tol && iter < maxit)
+ {
+  nbclaim <- nbclaim+Nmax+1
+  dnbclaim <- do.call(dfreq, c(list(x=nbclaim), argfreq))
+  psumfornbclaim <- sapply(nbclaim, function(n)
+   pgamma(x, shape=shape*n, rate=rate))
+  psumtot <- psumtot + psumfornbclaim %*% dnbclaim
+  dnbclaimtot <- c(dnbclaimtot, dnbclaim)
+  iter <- iter+1
+ }
+ as.numeric(psumtot)
+ }

In general, however, the sum $X_1+\dots+X_n$ does not belong to the same distribution family as X, and alternative computations are needed. The Panjer recursion provides a recursive method to compute the mass probability function of S when X has a discrete distribution and N belongs to the (a, b, n) family; see Panjer (1981). The recursion formula for the mass probability function $p_S$ is

$$p_S(s)=\frac{\left[p_N(1)-(a+b)\,p_N(0)\right]p_X(s)+\sum_{y=1}^{s\wedge m}\left(a+b\,y/s\right)p_X(y)\,p_S(s-y)}{1-a\,p_X(0)},$$

where $s\in\mathbb{N}$, X has a discrete distribution on $\{0,1,\dots,m\}$ with mass probability function $p_X$, and N belongs to the (a, b, 0) family; the recursion starts at $p_S(0)=G_N(p_X(0))$, with $G_N$ the probability generating function of N. The recursion is stopped when the sum of the elementary probabilities $\mathbb{P}(S=0),\mathbb{P}(S=1),\dots$ is arbitrarily close to 1. In practice, the distribution of the claim amount is not discrete, but it can be discretized. The upper discretization is the forward difference

$\tilde f(x)=F_X(x+h)-F_X(x)$; the lower discretization is the backward difference $\tilde f(x)=F_X(x)-F_X(x-h)$; and the unbiased discretization is $\tilde f(x)=\big(2\,\mathbb{E}(X\wedge x)-\mathbb{E}(X\wedge(x-h))-\mathbb{E}(X\wedge(x+h))\big)/h$, where h is the step of discretization; see Figure 2.14 and Dutang et al. (2008) for further details.
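To make the recursion concrete, here is a minimal hand-rolled sketch for a Poisson claim number (for which a = 0 and b = λ) and a severity already discretized on {0, 1, ..., m}; it is for illustration only, as the aggregateDist() function used below implements the recursion in full generality:

> panjer.pois <- function(lambda, fx, smax){
+ # fx[k+1] = p_X(k): severity mass function on 0,1,...,m
+ fs <- numeric(smax+1)
+ fs[1] <- exp(-lambda*(1-fx[1]))  # p_S(0) = G_N(p_X(0)) for Poisson
+ for(s in 1:smax){
+  y <- 1:min(s, length(fx)-1)
+  fs[s+1] <- lambda/s * sum(y * fx[y+1] * fs[s-y+1])}
+ fs}
> fs <- panjer.pois(lambda=2, fx=c(0.2, 0.5, 0.3), smax=5)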

Figure 2.14

Comparison of three discretization methods (a paralogistic distribution).

Approximations based on the normal distribution are also available: (i) the normal approximation is given by

$$F_S(x)\approx\Phi\left(\frac{x-\mathbb{E}(S)}{\sigma(S)}\right),\ \text{and}$$

(ii) the normal-power approximation is given by

$$F_S(x)\approx\Phi\left(-\frac{3}{sk(S)}+\sqrt{\frac{9}{sk(S)^2}+1+\frac{6}{sk(S)}\cdot\frac{x-\mathbb{E}(S)}{\sigma(S)}}\right),$$

where σ( S ) is the standard deviation of S and sk(S) is the skewness coefficient of S. The skewness coefficient can be written as

$$sk(S)=\frac{sk(N)\,var(N)^{3/2}\,\mathbb{E}(X)^3+3\,var(N)\,\mathbb{E}(X)\,var(X)+\mathbb{E}(N)\,sk(X)\,var(X)^{3/2}}{var(S)^{3/2}}.$$

An approximation based on the gamma distribution is also possible; see Gendron & Crépeau (1989). These approximations are reasonably accurate in the center of the distribution, but not in the tails. A last alternative to exact computation is simulation: it consists of simulating n claim numbers $N_1,\dots,N_n$ and, for each $N_i$, the claim severities $(X_{i,j})_j$, in order to get n realizations $S_1,\dots,S_n$. All these alternative methods are available in the aggregateDist function of the actuar package. We consider two examples: the gamma case (N follows a Poisson distribution P(10) and X follows a gamma distribution G(3, 2)) and the Pareto case (N follows a Poisson distribution P(10) and X follows a Pareto distribution P(3.1, 4.2)). The following code computes the gamma case:

> parsev <- c(3, 2); parfreq <- 10
> meansev <- mgamma(1, parsev[1], parsev[2])
> varsev <- mgamma(2, parsev[1], parsev[2]) - meansev^2
> skewsev <- (mgamma(3, parsev[1], parsev[2]) -
+  3*meansev*varsev - meansev^3)/varsev^(3/2)
> meanfreq <- varfreq <- parfreq[1]; skewfreq <- 1/sqrt(parfreq[1])
> meanagg <- meanfreq * meansev
> varagg <- varfreq * (varsev + meansev^2)
> skewagg <- (skewfreq*varfreq^(3/2)*meansev^3 + 3*varfreq*meansev*
+  varsev + meanfreq*skewsev*varsev^(3/2))/varagg^(3/2)
> Fs.s <- aggregateDist("simulation", model.freq = expression(y =
+ rpois(parfreq)), model.sev = expression(y =
+  rgamma(parsev[1], parsev[2])), nb.simul = 1000)
> Fs.n <- aggregateDist("normal", moments = c(meanagg, varagg))
> Fs.np <- aggregateDist("npower", moments = c(meanagg, varagg, skewagg))
> Fs.exact <- function(x) pgamsum(x, dpois, list(lambda=parfreq),
+  parsev[1], parsev[2], Nmax=100)
> x <- seq(25, 40, length=101)
> plot(x, Fs.exact(x), type="l",
+  main="Agg. Claim Amount Distribution", ylab="F_S(x)")
> lines(x, Fs.s(x), lty=2)
> lines(x, Fs.n(x), lty=3)
> lines(x, Fs.np(x), lty=4)
> legend("bottomright", leg=c("exact", "simulation",
+ "normal approx.", "NP approx."), col = "black",
+ lty = 1:4, text.col = "black")

Similarly, we have the Pareto case. We show here only the recursive computation calls.

> parsev <- c(3.1, 2*2.1) ; parfreq <- 10
> xmax <- qpareto(1-1e-9, parsev[1], parsev[2])
> fx2 <- discretize(ppareto(x, parsev[1], parsev[2]), from = 0,
+ to = xmax, step = 0.5, method = "unbiased",
+ lev = levpareto(x, parsev[1], parsev[2]))
> Fs2 <- aggregateDist("recursive", model.freq = "poisson",
+ model.sev = fx2, lambda = parfreq, x.scale = 0.5, maxit=2000)
> fx.u2 <- discretize(ppareto(x, parsev[1], parsev[2]), from = 0,
+ to = xmax, step = 0.5, method = "upper")
> Fs.u2 <- aggregateDist("recursive", model.freq = "poisson",
+ model.sev = fx.u2, lambda = parfreq, x.scale = 0.5, maxit=2000)
> fx.l2 <- discretize(ppareto(x, parsev[1], parsev[2]), from = 0,
+ to = xmax, step = 0.5, method = "lower")
> Fs.l2 <- aggregateDist("recursive", model.freq = "poisson",
+ model.sev = fx.l2, lambda = parfreq, x.scale = 0.5, maxit=2000)

The two graphs are displayed in Figure 2.15. Although E(X) is identical in both cases, high-level quantiles of the aggregate claim distribution differ significantly. For the gamma case, the normal-power approximation suitably fits the exact distribution function, while for the Pareto case, the normal-power approximation overestimates it, as the skewness sk(S) is very high and not representative of the shape of the distribution. As their name suggests, the upper and the lower recursive computations bound the true distribution function from above and below. The number of simulations is voluntarily chosen low (1,000), but can be set much larger. If convergence requires a large number of simulations, parallelization, GPU computation and quasi-Monte Carlo sampling methods can be used to speed up the process; see Chapter 1, http://cran.r-project.org/web/views/HighPerformanceComputing.html and http://cran.r-project.org/web/views/Distributions.html for more details.

Figure 2.15

Aggregate claim distribution (gamma case/Pareto case).

2.5.2 Poisson Process

The Poisson process is probably the most important stochastic process in general insurance. It is used to describe the number of claims that occur in a time interval. In its basic form, the homogeneous Poisson process is a counting process $(N_t)$ with independent and stationary increments. At time t, the number of claims $N_t$ that occurred from time 0 until time t has a Poisson distribution, and the waiting time until the next claim has an exponential distribution. Let $T_i$ denote the ith arrival time, so that $\{N_t=0\}$ can equivalently be represented by $\{T_1>t\}$, and more generally

$$\{N_t=n\}=\{T_n\le t\ \text{and}\ T_{n+1}>t\}\quad\text{for all } n\ge1.$$

When the interarrival times $W_i=T_i-T_{i-1}$ are i.i.d. random variables, $(N_t)$ is called a renewal process, and it is fully characterized by the distribution of the interarrival times, $F(x)=\mathbb{P}(W\le x)$. The Poisson process is obtained when F is the distribution of an exponential random variable. The code to generate a renewal process up to time Tmax, or more precisely its arrival time sequence, is

> rate <- 1
> rFexp <- function() rexp(1,rate)
> rRenewal <- function(Tmax=1,rF=rFexp){
+ t <- 0
+ vect.W <- NULL
+ while(t<Tmax){
+  W <- rF()
+  t <- t+W
+  if(t<Tmax) vect.W <- c(vect.W,W)}
+ return(list(T=cumsum(vect.W),W=vect.W,N=length(vect.W)))}
> set.seed(1)
> rRenewal(Tmax=2)
$T
[1] 0.7551818 1.9368246

$W
[1] 0.7551818 1.1816428

$N
[1] 2

An interesting alternative, in the case of the Poisson process with intensity λ, is to use a uniform property of the process: for all $n\ge1$, given $\{N_t=n\}$, the joint distribution of the n arrival times $T_1,\dots,T_n$ is the same as the joint distribution of $U_{1:n},\dots,U_{n:n}$, the order statistics of n i.i.d. random variables uniformly distributed on [0, t]. Thus, a natural algorithm to generate the Poisson process is the following:

> rPoissonProc <- function(Tmax=1,lambda=rate){
+ N <- rpois(n=1,lambda*Tmax)
+ vect.T <- NULL
+ if(N>0) vect.T <- sort(runif(N))*Tmax
+ return(list(T=vect.T,W=diff(c(0,vect.T)),N=N))}
 
> set.seed(1)
> rPoissonProc(Tmax=5)
$T
[1] 1.008410 1.860619 2.864267 4.541039
 
$W
[1] 1.0084097 0.8522098 1.0036473 1.6767721
 
$N
[1] 4

A homogeneous Poisson process with intensity $\lambda\ge0$ satisfies

$$\mathbb{P}(N_{t+h}-N_t=k)=\frac{1}{k!}e^{-\lambda h}\left[\lambda h\right]^k.$$

It is possible to consider a non-homogeneous Poisson process with intensity ( λ t ). Then

$$\mathbb{P}(N_{t+h}-N_t=k)=\frac{1}{k!}e^{-\int_t^{t+h}\lambda_s\,ds}\left[\int_t^{t+h}\lambda_s\,ds\right]^k.$$

See Rolski et al. (1999) for more details.

To generate such a Poisson process, several algorithms can be considered. As an illustration, consider a cyclical Poisson process with intensity $\lambda_t=100(1+\sin(\pi t))$,

> lambda <- function(t) 100*(sin(t*pi)+1)

so that the cumulated intensity Λ is

> Lambda <- function(t) integrate(f=lambda,lower=0,upper=t)$value

Given that the last claim occurred at time t, let F t be the conditional distribution function of the waiting time before the next claim. Then

$$F_t(x)=1-\exp\left[\Lambda(t)-\Lambda(t+x)\right]=1-\exp\left[-\int_t^{t+x}\lambda_s\,ds\right].$$

From a computational viewpoint, we just have to invert this function for a uniform draw (numerically, using uniroot()),

> Tmax <- 3*pi
> set.seed(1)
> t <- 0; X <- numeric(0)
> while(t <= Tmax){
+ Ft <- function(x) 1-exp(-Lambda(t+x)+Lambda(t))
+ x <- uniroot(function(x) Ft(x)-runif(1),interval=c(0,Tmax))$root
+ t <- t+x
+ X <- c(X,t)}
> X <- X[-which.max(X)]

To visualize the cycle of occurrences, let us consider the following histogram:

> hist(X,breaks=seq(0,3*pi,by=pi/32),col="grey",
+ border="white",xlab="",main="")
> lines(seq(0,3*pi,by=.02),lambda(seq(0,
+ 3*pi,by=.02))*pi/32,lwd=2)

See Figure 2.16. Pasupathy (2010) also suggested using a rejection (thinning) technique as an alternative. What we need is an upper bound for the intensity process. A natural upper bound is a constant one, obtained using $\max\{\lambda_s,\ s\ge0\}$,

Figure 2.16

Histogram associated with a non-homogeneous Poisson process, with cyclical intensity λ.

> lambda.up <- 200

The code to generate a Poisson process is

> set.seed(1)
> t <- 0; X <- t
> while(X[length(X)]<=Tmax){
+  u <- runif(1)
+  t <- t-log(u)/lambda.up
+ if(runif(1)<=lambda(t)/lambda.up) X <- c(X,t)}
> X <- X[-c(1,which.max(X))]

The two algorithms can be visualized in Figure 2.16.

Consider a Poisson process with intensity λ. If we keep each point independently with probability p (a Bernoulli B(p) thinning), then the new point process is also a Poisson process, with intensity pλ. A standard application arises with a deductible d: if the occurrence of claims for some reinsurer is driven by a Poisson process with intensity λ, and if individual losses have distribution F, then the process of claims above the deductible is a Poisson process with intensity $[1-F(d)]\cdot\lambda$. This property will be extremely important for compound Poisson processes (introduced in the next section).
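A quick numerical check of this thinning property, assuming (purely for illustration) exponential losses and hypothetical values lam and d, compares the empirical intensity of the retained claims with $[1-F(d)]\cdot\lambda$:

> set.seed(1)
> lam <- 5; d <- 2; Tmax <- 1000
> N <- rpois(1, lam*Tmax)    # claim count of the original process
> X <- rexp(N, rate=1)       # losses, with F the Exp(1) distribution
> sum(X > d)/Tmax            # empirical intensity of claims above d
> lam*(1-pexp(d, rate=1))    # theoretical intensity (1-F(d))*lam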

2.5.3 From Poisson Processes to Lévy Processes

A natural extension to the Poisson process is the compound Poisson process, extremely useful to model a surplus process of an insurance company. Given a Poisson process ( N t ) and a collection of i.i.d. random variables X 1 , X 2 ,..., define

$$S_t=\sum_{i=1}^{N_t}X_i.$$

To generate such a process on the time interval [0, Tmax], we need to generate the claim arrival times and the claim sizes, given a function randX that generates i.i.d. variables $X_i$, such as

> randX <- function(n) rexp(n,1)

The code can be the following:

> rCompPoissonProc <- function(Tmax=1,lambda=rate,rand){
+ N <- rpois(n=1,lambda*Tmax)
+ X <- rand(N)
+ vect.T <- NULL
+ if(N>0) vect.T <- sort(runif(N))*Tmax
+ return(list(T=vect.T,W=diff(c(0,vect.T)),X=X,N=N))}
> set.seed(1)
> rCompPoissonProc(Tmax=5,rand=randX)
$T
[1] 0.3089314 0.8827838 1.0298729 3.4351142
 
$W
[1] 0.3089314 0.5738524 0.1470891 2.4052414
 
$X
[1] 1.1816428 0.1457067 0.1397953 0.4360686
 
$N
[1] 4

Based on such a simulation, it is possible to define a function $t\mapsto S_t$:

> set.seed(1)
> compois <- rCompPoissonProc(Tmax=5,rand=randX)
> St <- function(t){sum(compois$X[compois$T<=t])}

and we can visualize this trajectory (left part of Figure 2.17) using

Figure 2.17

Sample path of a compound Poisson process on the left, and a Brownian motion on the right.

> time <- seq(0,5,length=501)
> plot(time,Vectorize(St)(time),type="s")
> abline(v=compois$T,lty=2,col="grey")

To generalize to Lévy processes, we simply add a diffusion part, corresponding to a Brownian motion,

$$L_t=\mu t+B_t+\sum_{i=1}^{N_t}X_i.$$

The Brownian motion $(B_t)$ satisfies, for all n,

$$B_t=\sum_{i=1}^{[nt/T]}dB_i,\quad\text{for } t=\frac{T}{n},\frac{2T}{n},\dots,\frac{(n-1)T}{n},T,$$

where the increments $dB_i$ are i.i.d. centered Gaussian random variables with variance T/n. Thus, to generate a trajectory of the Brownian motion on [0, Tmax], we discretize: given n, we use the formula above,

> n <- 1000
> Tmax <- 5   # horizon used for the compound Poisson path above
> h <- Tmax/n
> set.seed(1)
> B <- c(0,cumsum(rnorm(n,sd=sqrt(h))))

The analogous function St would be, here,

> Bt <- function(t){B[trunc(n*t/Tmax)+1]}

and we can visualize this trajectory (right part of Figure 2.17) using

> time <- seq(0,5,length=501)
> plot(time,Vectorize(Bt)(time),type="s")

(where level curves related to quantiles of Gaussian random variables were added).

Based on these two functions, it is then possible to generate a trajectory for the Lévy process,

> mu <- 1   # drift: here the claim rate times the mean claim size
> L <- function(t) -mu*t+St(t)+Bt(t)

but one should keep in mind that while we can generate the first continuous-time process (compound Poisson) exactly, we can only generate an approximation of the second one (Brownian motion): first, we have to specify the grid (choosing n), and then, on that grid, we generate a path.

2.5.4 Ruin Models

Ruin theory deals with the study of stochastic processes linked to the wealth of an insurer; see Asmussen & Albrecher (2010) or Dickson (2010) for a recent survey. A reserve risk process $(U_t)_{t\ge0}$ is considered. The initial model of Cramér-Lundberg assumes that the surplus $(U_t)_{t\ge0}$ of an insurance company at time t is represented by

$$U_t=u+ct-\sum_{i=1}^{N_t}X_i,$$

where u is the initial surplus, c is the premium rate, $(X_i)_{i\ge1}$ are the i.i.d. successive claim amounts, and $(N_t)_{t\ge0}$ is the claim arrival process, assumed to be a Poisson process of intensity λ (see Rolski et al. (1999) for more details on the Poisson process). Andersen (1957) generalized this model by proposing a renewal process for the claim arrival process $(N_t)_{t\ge0}$ (the claim waiting times are denoted by $(T_i)_{i\ge1}$). When claim severities and claim waiting times follow a phase-type distribution, closed-form formulas exist for the ruin probability,

$$\psi(u)=\mathbb{P}(\exists t>0: U_t<0\,|\,U_0=u);$$

see Asmussen & Rolski (1991). We reproduce below examples from that article. First, consider exponential claims and exponential interarrival times:

> psi <- ruin(claims = "e", par.claims = list(rate = 1/0.6),
+ wait = "e", par.wait = list(rate = 1/0.6616858))
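For this exponential/exponential case, the Cramér-Lundberg model admits the closed-form ruin probability $\psi(u)=\rho e^{-(1-\rho)u/\mu}$ with $\rho=\lambda\mu/c$, which can serve as a sanity check; the sketch below assumes a premium rate c = 1, as in the ruin() call above:

> mu <- 0.6; lam <- 1/0.6616858
> rho <- lam*mu                      # loading factor, with c = 1
> psi.exact <- function(u) rho*exp(-(1-rho)*u/mu)
> c(psi(5), psi.exact(5))            # the two values should agree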

Next, consider phase-type claims with exponential interarrival times:

> p <- c(0.5614, 0.4386)
> r <- matrix(c(-8.64, 0.101, 1.997, -1.095), 2, 2)
> lambda <- 1/(1.1 * mphtype(1, p, r))
> psi2 <- ruin(claims = "p", par.claims = list(prob = p, rates = r),
+ wait = "e", par.wait = list(rate = lambda))

Finally, consider phase-type claims with a mixture of two exponentials for the interarrival times:

> a <- (0.4/5 +0.6) * lambda
> psi3 <- ruin(claims = "p", par.claims = list(prob = p, rates = r),
+ wait = "e", par.wait = list(rate = c(5 * a, a), weights =
+ c(0.4, 0.6)), maxit = 225)
> plot(psi, from =0, to = 50)
> plot(psi2, add=TRUE, lty=2)
> plot(psi3, add=TRUE, lty=3)
> legend("topright", leg=c("Exp - Exp", "PH - Exp",
+ "PH - MixExp"), lty=1:3, col="black")

Figure 2.18

Ruin probability as a function of the initial surplus of the company, u.

2.6 Copulas and Multivariate Distributions

This final section deals with distributions of multivariate random vectors $X=(X_1,\dots,X_d)$. Given the growing literature on copulas (defined as multivariate distribution functions of random vectors with uniform marginals) over the past decade (see Frees & Valdez (1998), Embrechts et al. (2001), Frees & Wang (2006), among others), we focus on copulas in this section.

2.6.1 Definition of Copulas

Let $F_X$ be the distribution function of X with marginals $F_{X_j}$, that is,

$$F_X(x)=\mathbb{P}(X_1\le x_1,\dots,X_d\le x_d).$$

As $F_{X_j}^{-1}(U)$ has the same distribution as $X_j$ for U a uniform variate, it is easily checked that
$$\mathbb{P}\big(X_1\le F_{X_1}^{-1}(u_1),\dots,X_d\le F_{X_d}^{-1}(u_d)\big)=\mathbb{P}\big(F_{X_1}(X_1)\le u_1,\dots,F_{X_d}(X_d)\le u_d\big).$$
A copula function C is a multivariate distribution function such that
$$C(u_1,\dots,u_d)=\mathbb{P}\big(F_{X_1}(X_1)\le u_1,\dots,F_{X_d}(X_d)\le u_d\big),\quad u\in[0,1]^d.$$
The function C is bounded by the so-called Fréchet bounds,

$$\left(\sum_{i=1}^d u_i-(d-1)\right)_+\le C(u)\le\min(u_1,\dots,u_d),$$

generally denoted by W(u) and M(u); see Nelsen (2006) for an introduction. By Sklar's theorem (from Sklar (1959)), for any random vector X with marginals $F_{X_j}$, there exists a copula function C such that

$$\mathbb{P}(X\le x)=\mathbb{P}(X_1\le x_1,\dots,X_d\le x_d)=C\big(F_{X_1}(x_1),\dots,F_{X_d}(x_d)\big),$$

for all $x\in\mathbb{R}^d$. Note that the copula C is unique on the range of the marginal distribution functions, and not otherwise; in particular, it is unique when the marginals are continuous. Let us note that in the independent case, the copula function is simply $C(u)=u_1\cdots u_d=\Pi(u)$. As described below, classical multivariate distributions such as the multivariate Gaussian distribution and the multivariate Pareto distribution can be represented using a copula function. Note further that there exists a copula function $C^*$ such that

$$\mathbb{P}(X>x)=\mathbb{P}(X_1>x_1,\dots,X_d>x_d)=C^*\big(1-F_{X_1}(x_1),\dots,1-F_{X_d}(x_d)\big),$$

for all $x\in\mathbb{R}^d$. This copula $C^*$ is called the survival copula (or dual) of C. If U has distribution function C, then $1-U$ has distribution function $C^*$.
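In the bivariate case, the survival copula has the explicit form $C^*(u_1,u_2)=u_1+u_2-1+C(1-u_1,1-u_2)$; as a minimal check (with hypothetical helper names), the survival copula of the independence copula is the independence copula itself:

> Cindep <- function(u1, u2) u1*u2
> Cstar <- function(u1, u2) u1 + u2 - 1 + Cindep(1-u1, 1-u2)
> Cstar(0.3, 0.7)   # equals 0.3*0.7
[1] 0.21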

2.6.2 Archimedean Copulas

A wide class of copulas is given by the family of Archimedean copulas. An Archimedean copula is characterized by a generator function $\phi:[0,1]\to[0,\infty]$ such that

$$C(u)=\phi^{-1}\left(\sum_{i=1}^d\phi(u_i)\right),$$

where $u\in[0,1]^d$ and φ is infinitely differentiable, completely monotone and invertible (weaker conditions can be required for specific dimensions d). We refer to Theorem 2.1 of Marshall & Olkin (1988) for the construction of Archimedean copulas. In this family, the three most classical copulas are the Gumbel copula with $\phi(t)=(-\log(t))^\alpha$, the Frank copula with $\phi(t)=-\log\big((e^{-\alpha t}-1)/(e^{-\alpha}-1)\big)$ and the Clayton copula with $\phi(t)=t^{-\alpha}-1$, for a parameter α.

We get the following copula functions:

  • Gumbel: $C_{Gu}(u)=\exp\left(-\left((-\log(u_1))^\alpha+\dots+(-\log(u_d))^\alpha\right)^{1/\alpha}\right)$, for $\alpha\ge1$.
  • Frank: $C_F(u)=-\log\left(1+\frac{(e^{-\alpha u_1}-1)\cdots(e^{-\alpha u_d}-1)}{(e^{-\alpha}-1)^{d-1}}\right)/\alpha$, for $\alpha\ne0$.
  • Clayton: $C_C(u)=\left(1-d+u_1^{-\alpha}+\dots+u_d^{-\alpha}\right)^{-1/\alpha}$, for $\alpha>0$ (or $\alpha\in[-1,+\infty)\setminus\{0\}$ in dimension 2); a direct implementation from the generator is sketched below.
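As a minimal illustration of the Archimedean construction (with hypothetical helper names), the Clayton copula can be evaluated directly from its generator:

> phi.clayton <- function(t, alpha) t^(-alpha) - 1         # Clayton generator
> phiinv.clayton <- function(s, alpha) (1 + s)^(-1/alpha)  # its inverse
> Cclayton <- function(u, alpha)
+ phiinv.clayton(sum(phi.clayton(u, alpha)), alpha)
> Cclayton(c(0.3, 0.5), alpha=2)   # equals (0.3^-2 + 0.5^-2 - 1)^(-1/2)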

The survival Clayton copula, defined by $\mathbb{P}(U>u)=C_C(1-u)$, is linked to the multivariate Pareto distribution. According to Arnold (1983), the multivariate Pareto distribution is characterized by the following survival function:

$$\mathbb{P}(X>x)=\left(1+\sum_{i=1}^d x_i^{1/\gamma_i}\right)^{-\alpha}.$$

The marginal distribution of $X_i$ is also Pareto, as $\mathbb{P}(X_i>x_i)=\big(1+x_i^{1/\gamma_i}\big)^{-\alpha}=1-F_{X_i}(x_i)$. It is easy to check that $\mathbb{P}\big(X_1>F_{X_1}^{-1}(u_1),\dots,X_d>F_{X_d}^{-1}(u_d)\big)=C_C(1-u)$.

2.6.3 Elliptical Copulas

Before introducing elliptical copulas, we define elliptical distributions. A random variable X has an elliptical distribution if its characteristic function $\varphi_X$ satisfies $\varphi_X(t)=e^{it\mu}\psi(t^2\sigma^2)$, for some parameters μ, σ and some function ψ. More generally, a random vector X follows an elliptical distribution if its characteristic function verifies

$$\varphi_X(t)=e^{it^\top\mu}\psi(t^\top\Sigma t),\quad t\in\mathbb{R}^d,$$

for some vector μ, some positive definite matrix Σ, and some function ψ. For such a distribution, the density function is given by

$$f_X(x)=\frac{c_d}{\sqrt{\det(\Sigma)}}\,\phi\big((x-\mu)^\top\Sigma^{-1}(x-\mu)/2\big),\quad x\in\mathbb{R}^d,$$

for some function $\phi:\mathbb{R}_+\to\mathbb{R}_+$ such that $\int_0^\infty x^{d/2-1}\phi(x)\,dx<\infty$ and some normalizing constant $c_d$. We get the multivariate normal distribution when $\phi(t)=e^{-t}$, with mean vector μ and covariance matrix Σ, and the multivariate Student distribution with m degrees of freedom when $\phi(t)=(1+t/m)^{-(d+m)/2}$. See Fang et al. (1990) or Genton (2004) for more details on elliptical distributions.

An elliptical copula is defined as

$$C(u)=H\big(H_1^{-1}(u_1),\dots,H_d^{-1}(u_d)\big),$$

where H is a multivariate distribution with marginals $H_i$ belonging to the elliptical family. In particular, for a symmetric positive definite matrix Σ, the Gaussian and Student copulas are defined as

  • Gaussian

    $$C_{Ga}(u)=\int_{-\infty}^{z_1}\cdots\int_{-\infty}^{z_d}\tilde c_d\,e^{-x^\top\Sigma^{-1}x/2}\,dx_1\dots dx_d,$$

    where $z_i=\Phi^{-1}(u_i)$ and $\Phi^{-1}$ is the quantile function of the standard normal distribution.

  • Student

    $$C_{St}(u)=\int_{-\infty}^{z_1}\cdots\int_{-\infty}^{z_d}\tilde c_d\left(1+\frac{x^\top\Sigma^{-1}x}{2m}\right)^{-\frac{d+m}{2}}dx_1\dots dx_d,$$

    where $z_i=F_{St}^{-1}(u_i)$ is the quantile of a Student distribution with m > 0 degrees of freedom.

2.6.4 Properties and Extreme Copulas

Copulas presented in the previous subsections have a density function $c:[0,1]^d\to\mathbb{R}_+$ because the copula function is differentiable with respect to all variables on the unit hypercube. The dependence induced by a particular copula can be quantified through the theory of concordance measures introduced by Scarsini (1984). The two main measures of concordance are Kendall's tau and Spearman's rho. Kendall's tau for a bivariate vector (X, Y) is defined as

$$\tau(X,Y)=\mathbb{P}\big((X-\tilde X)(Y-\tilde Y)>0\big)-\mathbb{P}\big((X-\tilde X)(Y-\tilde Y)<0\big),$$

where $(\tilde X,\tilde Y)$ is an independent replicate of (X, Y). Similarly, Spearman's rho for (X, Y) is defined as

$$\rho(X,Y)=3\,\mathbb{P}\big((X-\tilde X)(Y-\bar Y)>0\big)-3\,\mathbb{P}\big((X-\tilde X)(Y-\bar Y)<0\big),$$

where $(\tilde X,\tilde Y)$ and $(\bar X,\bar Y)$ are independent replicates of (X, Y). As these two measures satisfy the criteria of concordance measures, $\tau(X,Y)\in[-1,1]$; $\tau(X,Y)=1$ means that the copula of (X, Y) is the upper Fréchet bound, and $\tau(X,Y)=0$ means that the copula of (X, Y) is the independence copula (the same holds for $\rho(X,Y)$). In the bivariate case, closed-form formulas are available for the copulas previously presented; see, for example, Nelsen (2006) and Joe (1997).

A desirable feature of copulas lies in the fact that they can model dependence between two or more variables with or without tail dependency. This is characterized by the tail dependence coefficients. The upper tail coefficient of (X, Y) is defined as

$$\lambda_U(X,Y)=\lim_{t\to1^-}\mathbb{P}\big(Y>F_Y^{-1}(t)\,\big|\,X>F_X^{-1}(t)\big),$$

while the lower tail coefficient $\lambda_L(X,Y)$ is obtained by considering

$$\lambda_L(X,Y)=\lim_{t\to0^+}\mathbb{P}\big(Y\le F_Y^{-1}(t)\,\big|\,X\le F_X^{-1}(t)\big).$$

When X, Y have a continuous distribution with a dependency given by a copula C X,Y , those coefficients can be rewritten as

$$\lambda_U(X,Y)=\lim_{t\to1^-}\frac{1-2t+C_{X,Y}(t,t)}{1-t}\quad\text{and}\quad\lambda_L(X,Y)=\lim_{t\to0^+}\frac{C_{X,Y}(t,t)}{t}.$$

For the copulas presented here, $\lambda_U=0$, except for the Gumbel copula ($\lambda_U=2-2^{1/\alpha}$) and the Student copula ($\lambda_U=2F_{St}\big(-\sqrt{(m+1)(1-\rho)/(1+\rho)}\big)$, with $m+1$ degrees of freedom), whereas $\lambda_L=0$ except for the Clayton copula ($\lambda_L=2^{-1/\alpha}$) and the Student copula ($\lambda_L=\lambda_U$). In other words, copulas with $\lambda_U=0$ cannot model dependence in the right-hand tail.
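These limits are easy to check numerically; for instance, for the Gumbel copula with an arbitrary α (here α = 1.5, a value chosen only for illustration), evaluating the upper tail ratio near t = 1 recovers $2-2^{1/\alpha}$:

> Cgumbel <- function(u1, u2, alpha)
+ exp(-((-log(u1))^alpha + (-log(u2))^alpha)^(1/alpha))
> alpha <- 1.5; tt <- 1 - 1e-6
> (1 - 2*tt + Cgumbel(tt, tt, alpha))/(1 - tt)   # close to lambda_U
> 2 - 2^(1/alpha)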

Another desirable property of copulas can be the max-stability. A copula function C is max-stable if

$$C(u_1,\dots,u_d)=\left(C\big(u_1^{1/k},\dots,u_d^{1/k}\big)\right)^k$$

for all $k>0$. This property is linked to extreme value theory because the right-hand side is the copula of the componentwise maxima of a random vector sample $(X_1,\dots,X_k)$, where the $X_i$'s have copula C. Copulas verifying this property are called extreme value copulas: the Gumbel and the Hüsler-Reiss copulas belong to this family. The Hüsler-Reiss copula is defined as follows in the bivariate case,

$$C_{HR}(u_1,u_2)=\exp\big(\log(u_1)\,\Phi(d_+)+\log(u_2)\,\Phi(d_-)\big),$$

where $d_\pm=1/\alpha\pm\frac{\alpha}{2}\log\big(\log(u_1)/\log(u_2)\big)$ and Φ is the distribution function of the standard normal distribution.

2.6.5 Copula Fitting Methods

There are four main methods to calibrate copulas, which differ in how the marginals are treated in the fitting process. Consider a sample of random vectors $(X_1,\dots,X_n)$ and corresponding observations $x_1,\dots,x_n$, where the ith marginal has a density $f_i(\cdot;\theta_i)$ and a distribution function $F_i(\cdot;\theta_i)$. A (full) maximum likelihood estimation is the first option, which consists of maximizing the likelihood

$$\mathcal{L}(\alpha,\theta_1,\dots,\theta_d;x_1,\dots,x_n)=\prod_{i=1}^n c\big(F_1(x_{1,i};\theta_1),\dots,F_d(x_{d,i};\theta_d);\alpha\big)\prod_{j=1}^d f_j(x_{j,i};\theta_j),$$

α being the parameter of the copula C and θ i being the parameter for the ith marginal distribution. The optimization is carried out over the whole parameter space.

The second estimation method is the method of moments, which consists, as in the univariate case, of matching theoretical and empirical moments. Marginal parameters $\theta_i$ are set by equating the empirical moments of the sample $(X_{i,1},\dots,X_{i,n})$, while the copula parameter α is determined by matching Kendall's tau or Spearman's rho.

The third estimation, called inference for margins, is a two-step procedure. First, marginal distributions are fitted by maximum likelihood, and then a pseudo sample is defined as

$$\hat u_i=(\hat u_{1,i},\dots,\hat u_{d,i})=\big(F_1(x_{1,i};\hat\theta_1),\dots,F_d(x_{d,i};\hat\theta_d)\big)$$

for i = 1,..., n. Then the copula is fitted on $\hat u_1,\dots,\hat u_n$ by maximizing the likelihood

$$\mathcal{L}(\alpha;\hat u_1,\dots,\hat u_n)=\prod_{i=1}^n c(\hat u_{1,i},\dots,\hat u_{d,i};\alpha).$$

The inference for margins method takes advantage of the two steps to reduce the dimension of the likelihood maximization from $(\alpha,\theta_1,\dots,\theta_d)$ to α. Finally, the canonical maximum likelihood method is similar to inference for margins, but replaces the parametric estimates by non-parametric estimates in the pseudo data; that is, $\hat u_i=(F_n(x_{1,i}),\dots,F_n(x_{d,i}))$, which further simplifies to $\hat u_i=(\mathrm{rank}(x_{1,i})/n,\dots,\mathrm{rank}(x_{d,i})/n)$. In the following section, we only consider the inference for margins method.

2.6.6 Application and Copula Selection

Numerical illustrations of copulas and their estimation are carried out on the loss-ALAE dataset used in Frees & Valdez (1998) and Klugman & Parsa (1999). The dataset consists of 1,500 general liability claims (expressed in USD) where each claim is a two-component vector: an indemnity payment (loss) and an allocated loss adjustment expense (ALAE).

> data(lossalae)
> par(mfrow=c(1,2))
> plot(lossalae, log="xy", main="Scatterplot of loss-ALAE")
> plot(apply(lossalae, 2, rank)/NROW(lossalae),
+ main="rank transform of loss-ALAE")

In Figure 2.19, we plot the scatterplot of the data $(x_i,y_i)$ and the empirical distribution functions evaluated at $(x_i,y_i)$, that is,

$$\big(F_{n,X}(x_i),F_{n,Y}(y_i)\big)=\big(\mathrm{rank}(x_i)/n,\ \mathrm{rank}(y_i)/n\big).$$

Figure 2.19

Loss-ALAE dataset.

On this dataset, we choose to fit the following bivariate copulas:

  • (i) the Gaussian copula $C_{Ga}(\cdot,\cdot;\rho)$,
  • (ii) the Student copula $C_{St}(\cdot,\cdot;\rho,m)$,
  • (iii) the Gumbel copula $C_{Gu}(\cdot,\cdot;\alpha)$,
  • (iv) the Frank copula $C_F(\cdot,\cdot;\alpha)$,
  • (v) the Hüsler-Reiss copula $C_{HR}(\cdot,\cdot;\alpha)$.

We use the implementation of the fCopulae package (part of the Rmetrics project, see https://www.rmetrics.org/; see also Chapters 11 and 13). For convenience, we define the following functions:

> dnormcop <- function(U, param)
+ as.numeric(dellipticalCopula(U, rho=param[1], type="norm"))
> dtcop <- function(U, param)
+ as.numeric(dellipticalCopula(U, rho=param[1], type="t",
+ param=param[2]))
> dgumcop <- function(U, param)
+ as.numeric(devCopula(U, type="gumbel", param=param[1]))
> dHRcop <- function(U, param)
+ as.numeric(devCopula(U, type="husler.reiss", param=param[1]))
> dfrankcop <- function(U, param)
+ as.numeric(darchmCopula(U, type="5", alpha=param[1]))

In addition to finding an appropriate copula, a choice of distribution must be made for the marginals. A Pareto chart of both marginals shows that they follow heavy-tailed distributions.

> paretochart <- function(x, ...)
+ plot(-log((1:length(x))/(length(x)+1)), log(sort(x)), ...)
> paretochart(lossalae$Loss)
> paretochart(lossalae$ALAE)

Therefore, we choose a Pareto type II distribution and a lognormal distribution as candidate distributions for the marginals. As there is no package fitting this whole range of copulas, we implement the inference for margins method in the following function:

> fit.cop.IFM.2 <- function(obs, copula, marg, arg.margin=list(),
+ method.margin="mle", arg.cop=list(), initpar, ...)
+ {
+ Obs1 <- obs[,1]
+ Obs2 <- obs[,2]
+ if(marg %in% c("exp","gamma","lnorm","pareto","burr")){
+  Obs1 <- Obs1[Obs1 > 0]
+  Obs2 <- Obs2[Obs2 > 0]}
+ marg1 <- do.call(fitdist, c(list(data= Obs1, distr=marg,
+  method=method.margin), arg.margin))
+ marg2 <- do.call(fitdist, c(list(data= Obs2, distr=marg,
+  method=method.margin), arg.margin))
+ comput.cdf <- function(fit, obs) {
+  para <- c(as.list(fit$estimate), as.list(fit$fix.arg))
+  distname <- fit$distname
+  pdistname <- paste("p", distname, sep = "")
+  do.call(pdistname, c(list(q = obs), as.list(para)))}
+ pseudomarg1 <- comput.cdf(marg1, Obs1)
+ pseudomarg2 <- comput.cdf(marg2, Obs2)
+ U <- cbind(pseudomarg1, pseudomarg2)
+ copLogL <- function(x) {
+  if(all(arg.cop$lower <= x) && all(arg.cop$upper >= x))
+   res <- -sum(remove.naninf(log(copula(U, param=x))))
+  else res <- Inf
+  return(res)}
+ resopt <- optim(par=initpar, fn=copLogL, method="L-BFGS-B",
+  lower=arg.cop$lower, upper=arg.cop$upper, ...)
+ list(marg1=marg1, marg2=marg2, copula=
+  list(name=arg.cop$id, alpha=resopt$par))}
> remove.naninf <- function(x)
+ x[!is.nan(x) & is.finite(x)]

The copulas are now fitted using the function fit.cop.IFM.2 by defining the corresponding arg.cop argument. Note that the marginal distributions are fitted using fitdist.

> library(fCopulae)
> argnorm <- list(length=1, lower=0, upper=1, id="Gaussian")
> argt <- list(length=2, lower=c(0,0), upper=c(1,1000),
+ id="Student")
> arggum <- list(length=1, lower=1, upper=100, id="Gumbel")
> argHR <- list(length=1, lower=0, upper=1000, id="Husler-Reiss")
> argfrank <- list(length=1, lower=-1000, upper=1000, id="Frank")
> fgausspareto <- fit.cop.IFM.2(lossalae, copula= dnormcop,
+ marg="pareto", arg.margin=list(start=list(shape=10, scale=100),
+ lower=c(1, 1/2)), arg.cop= argnorm, initpar=1/2)
> ftpareto <- fit.cop.IFM.2(lossalae, copula= dtcop,
+ marg="pareto", arg.margin=list(start=list(shape=10, scale=100),
+ lower=c(1, 1/2)), arg.cop= argt, initpar=c(1/2, 4))
> fgumbelpareto <- fit.cop.IFM.2(lossalae, copula= dgumcop,
+ marg="pareto", arg.margin=list(start=list(shape=10, scale=100),
+ lower=c(1, 1/2)), arg.cop= arggum, initpar=10)
> fHRpareto <- fit.cop.IFM.2(lossalae, copula= dHRcop,
+ marg="pareto", arg.margin=list(start=list(shape=10, scale=100),
+ lower=c(1, 1/2)), arg.cop= argHR, initpar=10)
> ffrankpareto <- fit.cop.IFM.2(lossalae, copula= dfrankcop,
+ marg="pareto", arg.margin=list(start=list(shape=10, scale=100),
+ lower=c(1, 1/2)), arg.cop= argfrank, initpar=10)
> recap <- function(x){
+ res <- c(alpha=x$copula$alpha, x$marg1$estimate, x$marg2$estimate)
+ if(length(res) < 6)
+  res <- c(res[1], NA, res[2:5])
+ res <- as.matrix(res)
+ colnames(res) <- x$copula$name
+ res}
> round(cbind(recap(fgausspareto), recap(ftpareto),
+ recap(fHRpareto), recap(fgumbelpareto),
+ recap(ffrankpareto)), 4)
         Gaussian    Student Husler-Reiss     Gumbel      Frank
alpha      0.4783     0.4816       1.1133     1.4444     3.1140
               NA     9.6475           NA         NA         NA
shape      1.2377     1.2377       1.2377     1.2377     1.2377
scale  16228.2572 16228.2572   16228.2572 16228.2572 16228.2572
shape      2.2230     2.2230       2.2230     2.2230     2.2230
scale  15133.3463 15133.3463   15133.3463 15133.3463 15133.3463

The level of dependency seems low, as the value of the first parameter is either close to 1 (for the non-elliptical copulas) or close to 0 (for the elliptical copulas). Unsurprisingly, the fitted parameters of the marginal distributions are identical across copulas. In order to assess the quality of the fit, we look at the tail coefficients $\lambda_U(X,Y)$ and $\lambda_L(X,Y)$, which are computable given a copula. They can be estimated from the data by using a non-parametric estimate of C(t, t), namely the empirical bivariate distribution function $C_n(t,t)=\frac{1}{n}\sum_{i=1}^n 1_{(\hat u_i\le t,\ \hat v_i\le t)}$ evaluated on the rank-transformed pseudo-observations $(\hat u_i,\hat v_i)$. This is done by the following functions:

> Lemp <- function(u, obs)
+ sapply(1:length(u), function(i)
+ 1/NROW(obs)*sum(obs[,1] <= u[i] & obs[,2] <= u[i]))/u
> Uemp <- function(u, obs)
+ (1-2*u+sapply(1:length(u), function(i)
+ 1/NROW(obs)*sum(obs[,1] <= u[i] & obs[,2] <= u[i])))/(1-u)
> Lcop <- function(u, pcop, param=param)
+ pcop(cbind(u, u), param=param)/u
> Ucop <- function(u, pcop, param=param)
+ (1-2*u+pcop(cbind(u, u), param=param))/(1-u)
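The distribution-function counterparts pnormcop, ptcop, pgumcop, pHRcop and pfrankcop used below are not reproduced in the text; assuming the probability functions of fCopulae (pellipticalCopula, pevCopula, parchmCopula), they can be defined exactly like their density analogues:

> pnormcop <- function(U, param)
+ as.numeric(pellipticalCopula(U, rho=param[1], type="norm"))
> ptcop <- function(U, param)
+ as.numeric(pellipticalCopula(U, rho=param[1], type="t",
+ param=param[2]))
> pgumcop <- function(U, param)
+ as.numeric(pevCopula(U, type="gumbel", param=param[1]))
> pHRcop <- function(U, param)
+ as.numeric(pevCopula(U, type="husler.reiss", param=param[1]))
> pfrankcop <- function(U, param)
+ as.numeric(parchmCopula(U, type="5", alpha=param[1]))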

The plot of the two tail coefficients is now possible.

> u <- seq(0, 0.4, length=101)
> rklossalae <- apply(lossalae, 2, rank)/NROW(lossalae)
> plot(u, Lemp(u, rklossalae), type="l", main="Lower coefficient",
+ ylim=c(0,.6), xlab="u", ylab="L(u)")
> lines(u, Lcop(u, pgumcop, fgumbelpareto$copula$alpha), lty=2)
> lines(u, Lcop(u, pHRcop, fHRpareto$copula$alpha), lty=3)
> lines(u, Lcop(u, ptcop, ftpareto$copula$alpha), lty=4, col="grey25")
> lines(u, Lcop(u, pnormcop, fgausspareto$copula$alpha), lty=5, col="grey25")
> lines(u, Lcop(u, pfrankcop, ffrankpareto$copula$alpha), lty=6, col="grey25")
> legend("bottomright", lty=1:6, col=c(rep("black", 3), rep("grey25", 3)),
+ leg=c("emp.", "Gumbel","Husler-Reiss","Student","Gaussian","Frank"))

As shown in Figure 2.20, the tail coefficients are best approximated by the extreme value copulas: Gumbel and Hüsler-Reiss. When considering Kendall's tau or Spearman's rho, these two copulas are also reasonably good. Kendall's tau is computed here:

Figure 2.20

Tail coefficients.
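The concordance helpers used below (taufrankcop, taugumcop, and so on) are likewise not reproduced in the text. For the copulas admitting a closed-form Kendall's tau, minimal versions could look as follows; the Hüsler-Reiss version and the Spearman's rho analogues, which require numerical integration, are omitted from this sketch:

> taugumcop <- function(alpha) 1 - 1/alpha        # Gumbel
> taunormcop <- function(rho) 2*asin(rho)/pi      # Gaussian
> tautcop <- function(param) 2*asin(param[1])/pi  # Student, param=(rho,df)
> taufrankcop <- function(alpha){                 # Frank, via the Debye function
+ D1 <- integrate(function(t) t/expm1(t), 0, alpha)$value/alpha
+ 1 + 4*(D1 - 1)/alpha}

With these helpers in place, the comparison with the empirical value can be run: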

> cbind(emp=cor(lossalae, method="kendall")[1,2],
+ Frank=taufrankcop(ffrankpareto$copula$alpha),
+ Gumbel= taugumcop(fgumbelpareto$copula$alpha),
+ HR= tauHRcop(fHRpareto$copula$alpha),
+ Gauss = taunormcop(fgausspareto$copula$alpha),
+ Student = tautcop(ftpareto$copula$alpha))
  emp Frank Gumbel   HR Gauss  Student
Tau 0.3154175 0.3171107 0.3076705 0.3008525 0.3174771 0.3198691

while Spearman’s rho is

> cbind(emp=cor(lossalae, method="spearman")[1,2],
+ Frank=rhofrankcop(ffrankpareto$copula$alpha),
+ Gumbel= rhogumcop(fgumbelpareto$copula$alpha),
+ HR= rhoHRcop(fHRpareto$copula$alpha),
+ Gauss = rhonormcop(fgausspareto$copula$alpha),
+ Student = rhotcop(ftpareto$copula$alpha))
  emp Frank Gumbel  HR  Gauss   Student
Rho 0.451872 0.4622724 0.4424486 0.4360027 0.4500000  0.4500000

Therefore, we continue the analysis only with the Gumbel and Hüsler-Reiss copulas.

Given a copula, quantities of interest can be estimated by a Monte Carlo method. Focusing on the distribution of the total expense, that is, the sum of the loss and the ALAE, we simulate this sum with the following template function:

> simul.cop.2 <- function(n, rcopula, fit){
+ U <- rcopula(n, fit$copula$alpha)
+ qmarg1 <- paste("q", fit$marg1$distname, sep="")
+ qmarg2 <- paste("q", fit$marg2$distname, sep="")
+ cbind(
+ X1=do.call(qmarg1, c(list(p=U[,1]), fit$marg1$estimate)),
+ X2=do.call(qmarg2, c(list(p=U[,2]), fit$marg2$estimate))
+)}

where rcopula() is the copula random generator defined as

> rgumcop <- function(n, param)
+ revCopula(n, type="gumbel", param=param[1])
> rHRcop <- function(n, param)
+ revCopula(n, type="husler.reiss", param=param[1])
> rindep <- function(n,param)
+ cbind(runif(n), runif(n))

Choosing a sample size of n = 1e4, we plot the empirical distribution functions of the sum for the Gumbel, the Husler-Reiss and the independent copulas.

> sumlossalae <- rowSums(lossalae)
> n <- 10^4
> sumgumpareto <- rowSums(simul.cop.2(n, rgumcop, fgumbelpareto))
> sumHRpareto <- rowSums(simul.cop.2(n, rHRcop, fHRpareto))
> sumindeppareto <- rowSums(simul.cop.2(n, rindep, fgumbelpareto))
> plot(ecdf(sumlossalae), log="x", main="", xlim=range(sumlossalae))
> z <- 10^seq(1, log10(max(sumlossalae)), length=201)
> lines(z, ecdf(sumgumpareto)(z), lty=2)
> lines(z, ecdf(sumHRpareto)(z), lty=3)
> lines(z, ecdf(sumindeppareto)(z), lty=4)
> legend("bottomright", lty=1:4, col="black",
+ leg=c("emp.", "Gumbel", "Husler-Reiss", "indep."))

In Figure 2.21, we observe that both extreme value copulas are particularly well adapted to the lossalae dataset, irrespective of the choice of the marginal distribution. Yet, we notice that the lognormal distribution better suits the tail of the empirical distribution. However, as extreme claims above 1 million dollars are (by definition) less frequently observed than large claims (between 100 thousand and 1 million), the experimenter must take a prudential view of the right-hand tail without overfitting the data. The independence copula seems to be a reasonable approximation of the tail of the distribution (yet not of the rest of the distribution).

Figure 2.21

Tails of the distribution of the total claim (loss + ALAE).

2.7 Exercises

  For these exercises, consider the following vectors:
    > set.seed(123)
    > X1 <- trunc(rlnorm(n=753,5))
    > X2 <- rgamma(n=267,1.2,.25)
  2.1. Benford's law is defined on {1, 2,..., 8, 9} as follows:
    $$\mathbb{P}(N=n)=\frac{\log\left(1+\frac{1}{n}\right)}{\log(10)}.$$
    Write functions dBenford, pBenford and rBenford that return the density, the (cumulative) distribution function and random values generated according to this distribution.
  2.2. Given a vector of integers, x, write a function that counts the frequencies of the first digits of the elements of x. Write a function, based on Pearson's statistic, that tests whether Benford's distribution is relevant or not (using the functions of Exercise 2.1). Try this function on X1.
    Hint: Use function chisq.test to run the test.
  2.3. Using fitdistr from library MASS, find the maximum likelihood estimators when fitting a lognormal distribution to X2.
  2.4. According to the Kolmogorov-Smirnov test, can we model X2 using a LN(1,1) distribution?
  2.5. Assume that $N\sim\mathcal{P}(10)$. Compute $\mathbb{P}(N\in[8,12])$. Find (numerically) $\max\{\mathbb{P}(N\in[a,a+4]),\ a\in\mathbb{N}\}$.
  2.6. Use function barplot to visualize the probability mass function associated with Sibuya's distribution, defined on $\mathbb{N}$ by
    $$\mathbb{P}(N\le n)=1-(-1)^n\binom{r-1}{n},\ \text{for some } r\in(0,1).$$
  2.7. Without using a loop, write a function rmix(n,p,rf1,rf2,...) that generates n i.i.d. random variables whose distribution is the mixture of f1, f2, ..., where p is the vector of mixing probabilities.
  2.8. Compute the 95% quantile of a compound sum, when N has a negative binomial distribution, with mean 2 and variance 3, and when the $X_i$ have a gamma distribution, with mean 100 and variance 150, using the Panjer algorithm.
  2.9. Compute the 95% quantile of the same compound sum using Monte Carlo simulations.
  2.10. Fit a gamma and a lognormal distribution to data(danishuni) from library CASdatasets. Which model gives the highest 95% quantile? Compare it with the empirical one.
  2.11. Fit a gamma and a lognormal distribution to lossalae$ALAE from library CASdatasets. Which model gives the highest 95% quantile? Compare it with the empirical one.
  2.12. Using a gamma distribution on lossalae$ALAE, what would be the expected value, given that lossalae$ALAE is larger than 10,000?
  2.13. From the results mentioned in Section 2.1.3 and the functions associated with phtype, write functions dErlang(x,k,lambda), pErlang(x,k,lambda) and rErlang(n,k,lambda).
  2.14. Fit a copula on the following sample:
    > n <- 1500
    > set.seed(123)
    > theta <- rgamma(n,.5)
    > U <- cbind(rank(rexp(1500,rate=theta))/(n+1),
    + rank(rexp(1500,rate=theta))/(n+1))
  2.15. Given a > 0, fit a truncated distribution to the following sample:
    > n <- 1500
    > a <- 1
    > set.seed(123)
    > X <- rep(NA,n)
    > for(i in 1:n){
    + x <- a; v <- 2
    + while(v*x > a){ v <- runif(1); x <- sqrt(a^2-2*log(runif(1)))}
    + X[i] <- x}
  2.16. Fit a distribution to the following sample:
    > n <- 1500
    > set.seed(123)
    > f <- function(v) -v[1]*log(v[2]*v[3])
    > X <- apply(matrix(runif(3*n),n,3),1,f)
  2.17. Fit a distribution to the following sample:
    > n <- 1500
    > set.seed(123)
    > f <- function(v) log((v[1]*v[2])/(v[3]*v[4]))
    > X <- apply(matrix(runif(4*n),n,4),1,f)

    Hint: It is the density of the sum of two independent and identically distributed random variables.

  2.18. Fit a distribution to the following sample:
    > n <- 1500
    > set.seed(123)
    > a <- .4; b <- .2
    > U <- runif(n)^(1/a)
    > V <- runif(n)^(1/b)
    > X <- (U/(U+V))[U+V <= 1]
  2.19. Fit a distribution to the following sample:
    > n <- 1500
    > set.seed(123)
    > a <- 17; b <- 12
    > B <- rbeta(n,a,b)
    > X <- b*B/(a*(1-B))
  2.20. Fit a distribution to the following sample:
    > n <- 1500
    > set.seed(123)
    > B <- rbeta(n,a/2,a/2)
    > X <- sqrt(a)*(B-.5)/sqrt(B*(1-B))
  2.21. Consider the following counting process, with sequence of arrival times T:
    > Tmax <- 20
    > set.seed(123)
    > T <- 0
    > while(max(T)<=Tmax) T<-c(T,sqrt(max(T)^2+2*rexp(1)))
    Assuming that this sequence is generated from a non-homogeneous Poisson process, suggest values for an affine intensity $\lambda_t$.