Chapter 15

Longitudinal Data and Experience Rating

Katrien Antonio

Universiteit van Amsterdam
Amsterdam, Netherlands

Peng Shi

University of Wisconsin-Madison
Madison, Wisconsin, USA

Frank van Berkum

Universiteit van Amsterdam
Amsterdam, Netherlands

15.1 Motivation

15.1.1 A Priori Rating for Cross-Sectional Data

In Chapter 14 on "General Insurance Pricing," Boucher & Charpentier discuss regression techniques suitable for pricing with cross-sectional data. A cross-sectional dataset observes each subject in the sample (for instance, a policy(holder)) only once. Each subject is therefore described by a single response, say $Y_i$ for subject $i$, and a vector with covariate information, $x_i$. Assuming independence between subjects, (Generalized) Linear Models [(G)LMs] are directly available for statistical modeling and explain the response as a function of the risk factors, within an appropriate distributional framework. For pricing in general insurance, the actuary builds such models for a (cross-sectional) dataset with claim counts on the one hand and claim severities on the other hand as response variables, obtained by following policyholders during a single time period. The result is a tariff based on risk classification through regression modeling. When the explanatory variables used as rating factors express a priori correctly measurable information about the policyholder (or, for instance, the vehicle or the insured building), the system is called an a priori classification scheme. The examples in Chapter 14 illustrate this idea and use, for instance, the age of the driver and the age of the car to explain the number of claims registered by a policyholder.

15.1.2 Experience Rating for Panel Data

However, despite the presence of an a priori rating system, some important risk factors remain unmeasurable or unobservable. For example, in automobile insurance, the insurer is unable to detect the driver's aggressiveness behind the wheel or the quickness of his reflexes to avoid a possible accident (see Denuit et al. (2007) for further motivation). This motivates the presence of inhomogeneous tariff cells within an a priori rating system. An a posteriori or experience rating system is necessary to allow for the reevaluation of the premium (established a priori) based on the history of claims as reported by the insured. One can argue that an important predictor for the future number of claims reported by an insured will be the number of claims reported in the past. Predictive modeling for experience rating will confront analysts with data structures going beyond the cross-sectional design dealt with in (G)LMs. Longitudinal (or panel) data arise when the claim history of policyholders (or, in general, a group of "subjects") is registered repeatedly over time. Thus, with longitudinal data, the variables will have double subscripts, indicating the subject and observation period, respectively. Specifically, let $Y_{it}$ denote the response for the $i$-th subject in the $t$-th time period, and let $x_{it}$ denote the associated vector of explanatory variables. Assuming that there are $n$ subjects and following the $i$-th subject over $t = 1, \ldots, T_i$ time periods, we observe

$$\begin{aligned}
\text{1st subject: } & \{(y_{11}, x_{11}), (y_{12}, x_{12}), \ldots, (y_{1T_1}, x_{1T_1})\} \\
\text{2nd subject: } & \{(y_{21}, x_{21}), (y_{22}, x_{22}), \ldots, (y_{2T_2}, x_{2T_2})\} \\
& \vdots \\
n\text{-th subject: } & \{(y_{n1}, x_{n1}), (y_{n2}, x_{n2}), \ldots, (y_{nT_n}, x_{nT_n})\}
\end{aligned}$$

Longitudinal data have several potential advantages. First, longitudinal data are a hybrid of cross-sectional and time series data. On the one hand, they allow for the examination of the effects of covariates on the response, as in usual regression. On the other hand, similar to time series analysis, they also permit the identification of dynamic relations over time. Because they share subject-specific characteristics, observations on the same subject over time are correlated and require an adjusted toolkit for statistical modeling. In this chapter we study regression models incorporating these dynamics, among others, by extending a priori rating with so-called random effects. These random effects structure correlation between observations registered on the same subject, and also take heterogeneity among subjects, due to unobserved characteristics, into account.

15.1.3 From Panel to Multilevel Data

The panel data setting has two layers (or levels) of data: the time level on the one hand and the subject level on the other hand. However, insurers may have several other layers of data at their disposal. For example, Antonio et al. (2010) discuss experience rating for a dataset on fleet covers, registered for multiple insurance companies. Fleet policies are umbrella-type policies issued to customers whose insurance covers more than a single vehicle. The hierarchical or multilevel structure of the data is as follows: vehicles (v) observed over time (t), nested within fleets (f), with policies issued by insurance companies (c). Multilevel models allow for incorporating the hierarchical structure of the data by specifying random effects at the various levels in the data. Once again, these random effects represent unobservable characteristics at each level. Moreover, random effects allow a posteriori updating of an a priori tariff, by taking into account the past performance of, in the case of intercompany fleet contracts, the vehicle, fleet, and company.

15.1.4 Structure of the Chapter

Section 15.2 of this chapter considers linear models for longitudinal data. We discuss three approaches to capture unobserved heterogeneity in the longitudinal data context. Section 15.2.2 introduces the basic fixed effects model and describes the model specification and diagnostics. Section 15.2.3 extends these models to incorporate serial correlation in the error terms. Section 15.2.4 presents models with random effects and generalizes the framework to linear mixed models. Section 15.2.5 covers prediction for the linear mixed-effects model and points out its connection to actuarial credibility theory. The computational aspects are illustrated using a dataset introduced in Section 15.2.1. In Section 15.3 we leave the framework of linear models and switch to a distributional framework that is probably more appealing to actuaries, namely generalized linear models and their random effects extensions. Actuarial credibility systems are examples of a posteriori rating systems accounting for the history of claims as it emerges for an individual risk. Commercial versions of these experience rating schemes are more widely known in practice as Bonus-Malus scales. A case study (using R) with such rating schemes is the topic of Section 15.3.2. The theory on longitudinal data models is based on Diggle et al. (2002), Frees (2004), Hsiao (2003), Wooldridge (2010), and the references therein. Section 15.3 of this chapter is based on Antonio & Valdez (2012), Antonio & Zhang (2014), and Denuit et al. (2007), but focuses now on implementation with R. We refer to these papers and the references therein for more technical background. This chapter only covers examples with panel data. We refer to Antonio et al. (2010) for examples with multilevel data structures.

15.2 Linear Models for Longitudinal Data

15.2.1 Data

For linear longitudinal data models, we demonstrate the theory and computational aspects using a dataset of automobile bodily injury liability claims that was described and employed in Frees & Wang (2005). The dataset contains claims over the 6 years 1993 to 1998 for a random sample of twenty-nine towns in the state of Massachusetts. All variables in monetary values are rescaled using the consumer price index to mitigate the effect of time trends. We are interested in the behavior of the average claims per unit of exposure, that is, the pure premium, for each town and each year. Two explanatory variables are available for the regression analysis: the per-capita income (PCI) and the population per square mile (PPSM) of each town. The variables and their descriptions are summarized in Table 15.1.

Table 15.1

Description of variables in the auto claim dataset.

Variable    Description
TOWNCODE    The index of Massachusetts towns
YEAR        The calendar year of the observation
AC          Average claims per unit of exposure
PCI         Per-capita income of the town
PPSM        Population per square mile of the town

> # File name is AutoClaimData.txt
> AutoClaim = read.table(choose.files(), sep = "", quote = "",header=TRUE)
> names(AutoClaim)
[1] "TOWNCODE" "YEAR" "AC"        "PCI"    "PPSM"
> AutoClaim[1:12,]  # Check longitudinal structure
  TOWNCODE YEAR   AC  PCI PPSM
1   10 1993 160.8522 18134.04  1475.5515
2   10 1994 158.3382 18495.88  1461.8110
3   10 1995 156.8098 18778.29  1488.9911
4   10 1996 168.9899 18740.46  1502.9322
5   10 1997 171.8229 18809.62  1534.4251
6   10 1998 153.7644 19034.59  1557.6937
7   11 1993 149.3873 15597.56 855.4350
8   11 1994 137.5546 15908.79 877.2725
9   11 1995 169.9164 16151.69 872.8024
10  11 1996 169.0598 16119.15 898.7802
11  11 1997 161.3425 16178.64 929.2647
12  11 1998 138.0516 16372.14 940.9162

We use the data for the first 5 years, namely 1993-1997, to develop the model and keep the observations of the final year for validation purposes. To explore relations among the variables, the techniques used for ordinary regression, such as histograms and correlation statistics, are readily applicable to longitudinal data. In addition, we introduce several more specialized techniques. The first is the multiple time series plot exhibited in Figure 15.1, where the average claims in multiple years for each town are joined by straight lines. The plot shows the development of claims over time and helps visualize town-specific effects.

Figure 15.1

Multiple time series plot of average claims.

> # Use years 1993-1997 as training data and reserve year 1998 for validation
> AutoClaimIn <- subset(AutoClaim, YEAR < 1998)
> # Multiple time series plot
> plot(AC ~ YEAR, data = AutoClaimIn, ylab="Average Claim", xlab="Year")
> for (i in AutoClaimIn$TOWNCODE) {
+ lines(AC ~ YEAR, data = subset(AutoClaimIn, TOWNCODE == i))}

One can also use scatterplots to help detect the relation between the response and the explanatory variables. Figure 15.2 displays the scatterplots for the variables PCI and PPSM, suggesting a negative relation between AC and PCI and a positive relation between AC and PPSM. Note that we use both PCI and PPSM on the log scale, and logarithmic values will be used in the following analysis. In addition, we also serially connect the observations to identify potential patterns in each covariate. In this case, we observe that PCI varies over time whereas PPSM is relatively stable.

Figure 15.2

Scatterplot between average claims and explanatory variables.

> # Scatter plot to explore relations
> AutoClaimIn$lnPCI <- log(AutoClaimIn$PCI)
> AutoClaimIn$lnPPSM <- log(AutoClaimIn$PPSM)
> plot(AC ~ lnPCI, data = AutoClaimIn, ylab="Average Claim", xlab="PCI")
> for (i in AutoClaimIn$TOWNCODE) {
+ lines(AC ~ lnPCI, data = subset(AutoClaimIn, TOWNCODE == i))}
> plot(AC ~ lnPPSM, data = AutoClaimIn, ylab="Average Claim", xlab="PPSM")
> for (i in AutoClaimIn$TOWNCODE) {
+ lines(AC ~ lnPPSM, data = subset(AutoClaimIn, TOWNCODE == i))}

As a preliminary analysis, we consider a pooled cross-sectional regression model Pool.fit assuming all observations are independent, that is,

$$y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}. \qquad (15.1)$$

Here, $\alpha$ is the homogeneous intercept for all towns and $\beta$ is the vector of regression coefficients. The variables PCI, PPSM, and YEAR are included as covariates. As expected, we observe a significant negative effect of PCI and a positive effect of PPSM. We also observe an increasing trend in claims after adjusting for inflation. The functions lm and anova are used to fit and analyze the ordinary least squares regression:

> AutoClaimIn$YEAR <- AutoClaimIn$YEAR-1992
> Pool.fit <- lm(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn)
> summary(Pool.fit)
Call:
lm(formula = AC ~ lnPCI + lnPPSM + YEAR, data = AutoClaimIn)
Residuals:
	Min	 1Q Median 3Q Max
 -49.944 -16.154 -1.759 14.300 104.468
Coefficients:
	 Estimate  Std. Error t value  Pr(>|t|)
(Intercept) 899.569  120.150 7.487 6.98e-12 ***
lnPCI   -92.604   11.855  -7.812 1.17e-12 ***
lnPPSM   22.305  2.933 7.606 3.64e-12 ***
YEAR   3.923  1.519 2.583  0.01082 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.75 on 141 degrees of freedom
Multiple R-squared: 0.4908, Adjusted R-squared: 0.48
F-statistic: 45.3  on 3  and 141  DF, p-value:  < 2.2e-16
> anova(Pool.fit)
Analysis of Variance Table
Response: AC
   Df  Sum Sq  Mean Sq F value  Pr(>F)
lnPCI  1 46355 46355 69.9028  5.402e-14 ***
lnPPSM  1 39344 39344 59.3302  2.141e-12 ***
YEAR   1 4423  4423  6.6704 0.01082 *
Residuals 141 93502  663
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

15.2.2 Fixed Effects Models

Repeated observations allow one to study the heterogeneity, be it either of subject or of time. We begin with the basic fixed effects model by introducing subject-specific intercepts in the model

$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}. \qquad (15.2)$$

Hereby, $\alpha_i$ is a town-specific intercept ($i = 1, \ldots, n$); $x_{it} = (x_{it,1}, \ldots, x_{it,K})'$ is the vector of covariates; and $\beta$ is the vector of regression coefficients to be estimated. There are alternative methods to treat the heterogeneous intercepts. In this section, we assume the $\{\alpha_i\}$ are fixed parameters to be estimated along with $\beta$. Here, $\beta$ is known as the population parameter capturing the common effects of the explanatory variables. The $\{\alpha_i\}$, called nuisance parameters, vary by subject (here, town) and account for the subject heterogeneity. In the following, we will use the notation $T = \max\{T_1, \ldots, T_n\}$ and $N = \sum_{i=1}^{n} T_i$.

The basic fixed effects model assumes that there is no within-subject serial correlation, that is, the $\varepsilon_{it}$ are i.i.d. random variables with mean zero and variance $\sigma^2$. Thus, by the Gauss-Markov theorem, the OLS estimates are the best linear unbiased estimates, with

$$\hat{\beta} = \left(\sum_{i=1}^{n}\sum_{t=1}^{T_i} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)'\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T_i} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i)\right) \qquad (15.3)$$

and

$$a_i = \bar{y}_i - \bar{x}_i'\hat{\beta}. \qquad (15.4)$$

Here, $\bar{y}_i$ and $\bar{x}_i$ are the averages of $\{y_{it}\}$ and $\{x_{it}\}$ over time, respectively. The above is also known as the within estimator because it uses the time variation within each cross-section. In addition, the variance of $\hat{\beta}$ is shown to be

$$\widehat{\mathrm{var}}\,\hat{\beta} = s^2\left(\sum_{i=1}^{n}\sum_{t=1}^{T_i} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)'\right)^{-1}, \qquad (15.5)$$

where $s^2$ is the unbiased estimate of $\sigma^2$ based on the residuals. In deriving the large sample properties, one lets $n \to \infty$ while $T$ remains fixed. Under regularity conditions, one can show that $\hat{\beta}$ is consistent and asymptotically normally distributed. However, the $\{a_i\}$ are not consistent, and they are not even approximately normal if the responses are not normally distributed.
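To connect the formulas to the data, the within estimator in Equations (15.3)-(15.4) can also be computed by hand through group-wise demeaning. The following is a minimal sketch (the object names Xv, Xc, beta.within, and alpha.i are ours, not part of the chapter's scripts); its slope estimates should agree with the lnPCI, lnPPSM, and YEAR coefficients of the FE.fit model below.

> # Within estimator via demeaning, a sketch reproducing (15.3)-(15.4)
> Xv <- as.matrix(AutoClaimIn[, c("lnPCI","lnPPSM","YEAR")])
> y  <- AutoClaimIn$AC
> g  <- AutoClaimIn$TOWNCODE
> Xc <- Xv - apply(Xv, 2, function(x) ave(x, g))    # x_it - xbar_i
> yc <- y - ave(y, g)                                # y_it - ybar_i
> beta.within <- solve(t(Xc) %*% Xc, t(Xc) %*% yc)   # Equation (15.3)
> xbar <- apply(Xv, 2, function(x) tapply(x, g, mean))
> alpha.i <- tapply(y, g, mean) - xbar %*% beta.within  # Equation (15.4)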

We fit this basic fixed effects model FE.fit using lm by treating TOWNCODE as a categorical variable. The t- and F-statistics are constructed in the same way as in classical regression models. Note that the above model can easily be modified to account for time-specific heterogeneity by replacing $a_i$ with $\lambda_t$. Similarly, using categorical variables for the time dimension, least squares estimation is readily applied.

> # Basic fixed-effects model
> FE.fit <- lm(AC ~ factor(TOWNCODE)+lnPCI+lnPPSM+YEAR-1, data=AutoClaimIn)
> summary(FE.fit)
Call:
lm(formula = AC ~ factor(TOWNCODE) + lnPCI + lnPPSM + YEAR - 1, data = AutoClaimIn)
Residuals:
Min  1Q Median 3Q Max
-55.645 -8.900 0.177 8.995 50.141
Coefficients:
		Estimate Std. Error t value Pr(>|t|)
factor(TOWNCODE)10 1660.321 1846.793  0.899 0.371
factor(TOWNCODE)11 1558.851 1794.617  0.869 0.387
factor(TOWNCODE)12 1554.375 1884.831  0.825 0.411
factor(TOWNCODE)13 1360.128 1731.874  0.785 0.434
factor(TOWNCODE)14 1443.895 1780.094  0.811 0.419
factor(TOWNCODE)15 1681.983 1841.401  0.913 0.363
(et cetera)
lnPCI	   -22.631 159.268 -0.142 0.887
lnPPSM    -176.831 107.240 -1.649 0.102
YEAR		  5.947 2.738  2.172 0.032 *
--- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 18.88 on 113 degrees of freedom
Multiple R-squared: 0.9863, Adjusted R-squared: 0.9824
F-statistic: 254.4 on 32 and 113 DF, p-value: < 2.2e-16
> anova(FE.fit)
Analysis of Variance Table
Response: AC
		 Df Sum Sq Mean Sq F value Pr(>F)
factor(TOWNCODE) 29 2897069 99899 280.3677 < 2e-16 ***
lnPCI		 1 2231 2231  6.2621 0.01377 *
lnPPSM		 1	 34  34  0.0967 0.75638
YEAR		 1 1681 1681  4.7168 0.03196 *
Residuals	113  40263  356
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We further discuss three specific tests for model specification and diagnostics. The first is the pooling test, where one wishes to test whether the subject-specific effect is significant. The null hypothesis is

$$H_0: a_1 = a_2 = \cdots = a_n = a.$$

This can be done using the partial F- (Chow) test (see Chow (1960)) by calculating

$$F\text{-ratio} = \frac{(\text{Error SS})_{\text{Pooled}} - \text{Error SS}}{(n-1)\, s^2}.$$

Here, Error SS and $s^2$ are from the heterogeneous model (i.e., FE.fit) and $(\text{Error SS})_{\text{Pooled}}$ is from the homogeneous model (i.e., Pool.fit). It can be shown that the F-ratio follows an F-distribution with degrees of freedom $df_1 = n - 1$ and $df_2 = N - (n + K)$. In this example, the F-statistic equals $(93{,}502 - 40{,}263)/(29-1)/18.88^2 = 5.33$, so we reject the null hypothesis.
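The F-ratio can be reproduced directly from the two fitted objects; a small sketch (object names ours), which should return approximately the value 5.34 reported by the anova() comparison below:

> # Partial F (Chow) test computed by hand
> ErrorSS.pool <- sum(resid(Pool.fit)^2)   # 93502
> ErrorSS.fe   <- sum(resid(FE.fit)^2)     # 40263
> s2 <- summary(FE.fit)$sigma^2            # 18.88^2
> Fratio <- (ErrorSS.pool - ErrorSS.fe)/(29 - 1)/s2
> pf(Fratio, df1 = 28, df2 = 113, lower.tail = FALSE)  # p-value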

> anova(Pool.fit,FE.fit)
Analysis of Variance Table
Model 1: AC ~  lnPCI + lnPPSM + YEAR
Model 2: AC ~ factor(TOWNCODE) + lnPCI + lnPPSM + YEAR - 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 141 93502
2 113 40263 28 53238 5.3362  7.214e-11 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

15.2.3 Models with Serial Correlation

An alternative approach to capture heterogeneity is to use serial correlation. The intuition is that if there are some unobserved time-constant variables affecting the response, they will introduce correlation among the repeated observations. To motivate this approach, we examine the serial correlation of the residuals from Pool.fit. The results show strong temporal correlation among AC after removing the effects of the explanatory variables. This suggests that the i.i.d. assumption used in the homogeneous model is not appropriate.

> # Correlation among residuals
> AutoClaimIn$rPool <- resid(Pool.fit)
> rvec <- cbind(subset(AutoClaimIn,YEAR==1)$rPool,subset(AutoClaimIn,YEAR==2)$rPool,
+ subset(AutoClaimIn,YEAR==3)$rPool,subset(AutoClaimIn,YEAR==4)$rPool,
+ subset(AutoClaimIn,YEAR==5)$rPool)
> cor(rvec)
	 [,1]	[,2]  [,3]  [,4]   [,5]
[1,]  1.0000000 0.5862895 0.5187797 0.4207831  0.5424555
[2,]  0.5862895 1.0000000 0.3911814 0.2164202  0.2555096
[3,]  0.5187797 0.3911814 1.0000000 0.3955654  0.7890728
[4,]  0.4207831 0.2164202 0.3955654 1.0000000  0.4778912
[5,]  0.5424555 0.2555096 0.7890728 0.4778912  1.0000000

To relax the i.i.d. assumption, we first consider a homogeneous model with serial correlation. For subject i, the matrix presentation of the model is

$$y_i = X_i\beta + \varepsilon_i, \qquad (15.6)$$

where

$$y_i = \begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT_i} \end{pmatrix}, \quad X_i = \begin{pmatrix} x_{i1,1} & x_{i1,2} & \cdots & x_{i1,K} \\ x_{i2,1} & x_{i2,2} & \cdots & x_{i2,K} \\ \vdots & & & \vdots \\ x_{iT_i,1} & x_{iT_i,2} & \cdots & x_{iT_i,K} \end{pmatrix} = \begin{pmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT_i}' \end{pmatrix}, \quad \varepsilon_i = \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT_i} \end{pmatrix}. \qquad (15.7)$$

Now we assume that the $\varepsilon_{it}$ are correlated, with $\mathrm{var}(\varepsilon_i) = R_i$. Let $R = R(\tau)$ denote the $T \times T$ temporal covariance matrix for a vector of $T$ observations, where the unknown parameters in this covariance matrix are denoted by $\tau$. Note there are at most $T(T+1)/2$ unknown elements in $R$. Commonly used special cases of $R$ are (using $T = 5$):

$$\text{Independent: } R = \begin{pmatrix} \sigma^2 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 \end{pmatrix}, \quad \text{Compound symmetry: } R = \sigma^2\begin{pmatrix} 1 & \rho & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho & \rho \\ \rho & \rho & 1 & \rho & \rho \\ \rho & \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & \rho & 1 \end{pmatrix},$$

$$\text{AR(1): } R = \sigma^2\begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 & \rho^4 \\ \rho & 1 & \rho & \rho^2 & \rho^3 \\ \rho^2 & \rho & 1 & \rho & \rho^2 \\ \rho^3 & \rho^2 & \rho & 1 & \rho \\ \rho^4 & \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}, \quad \text{Toeplitz: } R = \begin{pmatrix} \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 & \sigma_4 \\ \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\ \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\ \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\ \sigma_4 & \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 \end{pmatrix},$$

$$\text{Banded Toeplitz: } R = \begin{pmatrix} \sigma^2 & \sigma_1 & \sigma_2 & 0 & 0 \\ \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 & 0 \\ \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\ 0 & \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\ 0 & 0 & \sigma_2 & \sigma_1 & \sigma^2 \end{pmatrix}, \quad \text{Unstructured: } R = \begin{pmatrix} \sigma^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} & \sigma_{15} \\ \sigma_{12} & \sigma^2 & \sigma_{23} & \sigma_{24} & \sigma_{25} \\ \sigma_{13} & \sigma_{23} & \sigma^2 & \sigma_{34} & \sigma_{35} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma^2 & \sigma_{45} \\ \sigma_{15} & \sigma_{25} & \sigma_{35} & \sigma_{45} & \sigma^2 \end{pmatrix}.$$

For the $i$-th subject, the covariance matrix is $\mathrm{var}(\varepsilon_i) = R_i(\tau)$, a $T_i \times T_i$ matrix. Here, $R_i(\tau)$ is positive definite and depends on $i$ only through its dimension; it can be determined by removing certain rows and columns of the matrix $R(\tau)$. This set of notations allows us to easily handle missing data and incomplete observations.

The model can be estimated using either moment-based or likelihood-based methods. With known Ri, the generalized least squares (GLS) estimates are obtained by minimizing

$$\sum_{i=1}^{n} (y_i - X_i\beta)' R_i^{-1} (y_i - X_i\beta),$$

and we have

$$\hat{\beta} = \left(\sum_{i=1}^{n} X_i' R_i^{-1} X_i\right)^{-1} \sum_{i=1}^{n} X_i' R_i^{-1} y_i.$$
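As an illustration of the GLS formula, the following sketch computes $\hat{\beta}$ for a known compound-symmetry working covariance. The values of rho and sig2 are simply plugged in from the SCex.fit estimates reported later in Table 15.2, so this is a feasible-GLS approximation, not part of the chapter's estimation routine; the object names are ours.

> # GLS estimator with a (given) compound-symmetry covariance, a sketch
> rho <- 0.47; sig2 <- 26.24^2
> Ri <- sig2 * ((1 - rho) * diag(5) + rho * matrix(1, 5, 5))
> XtX <- matrix(0, 4, 4); Xty <- matrix(0, 4, 1)
> for (i in unique(AutoClaimIn$TOWNCODE)) {
+   d  <- subset(AutoClaimIn, TOWNCODE == i)
+   Xi <- cbind(1, d$lnPCI, d$lnPPSM, d$YEAR)
+   XtX <- XtX + t(Xi) %*% solve(Ri) %*% Xi
+   Xty <- Xty + t(Xi) %*% solve(Ri) %*% d$AC
+ }
> beta.gls <- solve(XtX, Xty)   # close to the SCex.fit coefficients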

We can estimate such a model using the R package nlme. Two types of likelihood-based methods are provided to estimate the regression parameter $\beta$ and the variance components $\tau$: full maximum likelihood (ML) estimation and restricted maximum likelihood (REML) estimation. Based on the assumption of multivariate normality of the response $y_i$, the full log-likelihood function ($l = \log(L)$) of the model is

$$l_{ML}(\beta, \tau) = -\frac{1}{2}\left(\sum_{i=1}^{n} \log\det R_i(\tau) + \sum_{i=1}^{n} (y_i - X_i\beta)' R_i(\tau)^{-1} (y_i - X_i\beta)\right). \qquad (15.8)$$

The MLE follows by maximizing the above likelihood function over $\beta$ and $\tau$ simultaneously. It is also easy to show that, for a fixed covariance parameter $\tau$, the MLE of $\beta$ is the same as the generalized least squares estimator. It is known that the MLE of $\tau$ is biased downward. To mitigate the bias, restricted maximum likelihood maximizes the following log-likelihood function:

$$l_{REML}(\beta, \tau) \equiv l_{ML}(\beta, \tau) - \frac{1}{2}\log\det\left(\sum_{i=1}^{n} X_i' R_i(\tau)^{-1} X_i\right). \qquad (15.9)$$

The REML estimation will be discussed in more detail in the section on random-effects models.

In our application, we fit the linear model with three types of serial correlation: compound symmetry, AR(1), and unstructured. See Table 15.2 for the results. We denote the resulting models by SCex.fit, SCar.fit, and SCun.fit, respectively. The models are fit using the function gls() in the nlme package. The argument correlation is used to specify the matrix $R(\tau)$, and the argument method is used to specify the estimation method; the default estimation approach is REML. The estimation results are displayed in Table 15.2. The estimates of the regression coefficients are similar across models and are consistent with the pooled cross-sectional regression model. The estimates of the variance components suggest significant within-subject temporal correlation. Note that when an unstructured covariance is specified, the model is not identifiable in its most general form due to the nonuniqueness of $R(\tau)$. Thus, additional constraints are necessary for identification purposes. The gls() function estimates the model under the parameterization $R = \sigma^2\Sigma$, where $\sigma^2$ is a scale parameter and $\Sigma$ is the correlation matrix.

For inference, the estimation error of population parameter β is based on

$$\widehat{\mathrm{Var}}\,\hat{\beta} = \left(\sum_{i=1}^{n} X_i' R_i(\hat{\tau})^{-1} X_i\right)^{-1}.$$

The estimation error of $\hat{\tau}$ can be assessed in different ways. The approach implemented in gls() uses the inverse of the observed Fisher information. The confidence intervals for the scale parameter $\sigma$ and the correlation parameter $\rho$ are obtained from the approximate normal distribution of the ML or REML estimators of a transformation of the parameters. Specifically, the 95% confidence interval of $\sigma$ is

$$\left[\exp(\hat{\sigma}^* - 1.96\, s_{\hat{\sigma}^*}),\ \exp(\hat{\sigma}^* + 1.96\, s_{\hat{\sigma}^*})\right],$$

where $\hat{\sigma}^* \equiv \ln\hat{\sigma}$ and $s_{\hat{\sigma}^*}$ is the associated standard error derived from the Fisher information. Similarly, the 95% confidence interval of $\rho$ is

$$\left[\frac{\exp(\hat{\rho}^* - 1.96\, s_{\hat{\rho}^*}) - 1}{\exp(\hat{\rho}^* - 1.96\, s_{\hat{\rho}^*}) + 1},\ \frac{\exp(\hat{\rho}^* + 1.96\, s_{\hat{\rho}^*}) - 1}{\exp(\hat{\rho}^* + 1.96\, s_{\hat{\rho}^*}) + 1}\right],$$

where $\hat{\rho}^* \equiv \ln\frac{1 + \hat{\rho}}{1 - \hat{\rho}}$ and $s_{\hat{\rho}^*}$ is the corresponding standard error. In the package nlme, the function intervals() returns the 95% confidence intervals of the components of $\tau$, and the function getVarCov() returns the estimate of $R(\hat{\tau})$.
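The back-transformation can be checked numerically: mapping the endpoints of an interval reported by intervals() back to the transformed scale should give an interval that is symmetric around the transformed estimate, with half-width 1.96 standard errors. A sketch (the helper z is ours), using the AR(1) correlation interval reported later for the RE.fit model in Section 15.2.4:

> # Transform of a correlation parameter and its reported 95% interval
> z <- function(r) log((1 + r)/(1 - r))
> z(c(-0.2431935, -0.01444735, 0.215821))
> # approx. -0.496, -0.029, 0.439: both half-widths 0.467 = 1.96 * s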

> library(nlme)
> # Compound symmetry
> SCex.fit <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corCompSymm(form=~1|TOWNCODE))
> summary(SCex.fit)
> intervals(SCex.fit,which = "var-cov")
> getVarCov(SCex.fit)
Marginal variance covariance matrix
	[,1] [,2]  [,3]  [,4]  [,5]
[1,] 688.50  326.07 326.07 326.07 326.07
[2,] 326.07  688.50 326.07 326.07 326.07
[3,] 326.07  326.07 688.50 326.07 326.07
[4,] 326.07  326.07 326.07 688.50 326.07
[5,] 326.07  326.07 326.07 326.07 688.50
Standard Deviations: 26.239 26.239 26.239 26.239 26.239
> # AR(1)
> SCar.fit <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corAR1(form=~1|TOWNCODE))
> summary(SCar.fit)
> intervals(SCar.fit,which = "var-cov")
> getVarCov(SCar.fit)
Marginal variance covariance matrix
	 [,1] [,2]  [,3] [,4] [,5]
[1,] 673.210 292.350 126.96 55.132 23.942
[2,] 292.350 673.210 292.35 126.960 55.132
[3,] 126.960 292.350 673.21 292.350 126.960
[4,]  55.132 126.960 292.35 673.210 292.350
[5,]  23.942  55.132 126.96 292.350 673.210
Standard Deviations: 25.946 25.946 25.946 25.946 25.946
> # Unstructured
> SCun.fit <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corSymm(form=~1|TOWNCODE))
> summary(SCun.fit)
> intervals(SCun.fit,which = "var-cov")
> getVarCov(SCun.fit)
Marginal variance covariance matrix
[,1]  [,2] [,3] [,4] [,5]
[1,] 696.15  485.50 324.79 315.06 374.16
[2,] 485.50  696.15 227.51 179.88 190.11
[3,] 324.79  227.51 696.15 284.12 522.68
[4,] 315.06  179.88 284.12 696.15 351.96
[5,] 374.16  190.11 522.68 351.96 696.15
Standard Deviations: 26.385 26.385 26.385 26.385 26.385

The usual t- or F-test statistics follow as in the i.i.d. case. Caution is needed for tests based on the likelihood function. For example, the likelihood ratio test relies on the value of the full log-likelihood function rather than the restricted log-likelihood. One can use method="ML" in the gls() function to implement maximum likelihood estimation. We perform the test using anova for the models with serial correlation against the pooled cross-sectional regression. The results support the evidence of positive serial correlation.

Table 15.2

Estimation for models with serial correlation.

                       SCex.fit               SCar.fit               SCun.fit
Parameter          Est.       S.E.        Est.       S.E.        Est.       S.E.
(Intercept)      887.89     206.81      891.45     168.25      878.68     200.85
lnPCI            -91.20      20.41      -91.33      16.61      -90.81      19.81
lnPPSM            21.96       5.08       21.76       4.11       23.70       4.95
YEAR               3.91       1.14        3.55       1.66        1.82       1.03

                   Est.     95% CI        Est.     95% CI        Est.     95% CI
CS                 0.47  (0.29,0.64)
AR(1)                                     0.43  (0.26,0.58)
UN: corr(1,2)                                                    0.70  (0.46,0.84)
    corr(1,3)                                                    0.47  (0.15,0.70)
    corr(1,4)                                                    0.45  (0.06,0.72)
    corr(1,5)                                                    0.54  (0.19,0.76)
    corr(2,3)                                                    0.33  (-0.00,0.59)
    corr(2,4)                                                    0.26  (-0.16,0.60)
    corr(2,5)                                                    0.27  (-0.13,0.60)
    corr(3,4)                                                    0.41  (0.11,0.64)
    corr(3,5)                                                    0.75  (0.57,0.86)
    corr(4,5)                                                    0.51  (0.20,0.72)
Scale             26.24  (22.22,30.98)   25.95  (22.62,29.76)   26.38  (22.27,31.26)

log-REML        -645.96                -654.25                -635.93
log-ML          -655.61                -663.67                -645.38
AIC             1323.21                1339.34                1320.75
BIC             1341.07                1357.21                1365.40

> # Likelihood ratio test
> SCex.fit.ml <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corCompSymm(form=~1|TOWNCODE), method="ML")
> SCar.fit.ml <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corAR1(form=~1|TOWNCODE), method="ML")
> SCun.fit.ml <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corSymm(form=~1|TOWNCODE), method="ML")
> anova(SCex.fit.ml, Pool.fit)
    Model df   AIC   BIC   logLik   Test   L.Ratio p-value
SCex.fit.ml  1 6  1323.212  1341.073 -655.6062
Pool.fit   2 5  1359.497  1374.381 -674.7487 1  vs  2 38.28505 <.0001
> anova(SCar.fit.ml, Pool.fit)
    Model df   AIC   BIC   logLik   Test   L.Ratio p-value
SCar.fit.ml  1 6 1339.344 1357.205 -663.6721
Pool.fit   2 5 1359.497 1374.381 -674.7487 1  vs  2 22.15326 <.0001
> anova(SCun.fit.ml, Pool.fit)
    Model df   AIC   BIC   logLik   Test   L.Ratio p-value
SCun.fit.ml   1 15 1320.753 1365.404 -645.3763
Pool.fit   2 5 1359.497 1374.381 -674.7487 1  vs  2 58.74476 <.0001

Finally, we extend the above model to allow for richer heterogeneity. We consider a more general model where not only subject-specific intercepts but also subject-specific slopes are incorporated in the linear model:

yi=Ziαi+Xiβ+εi,(15.10)

with explanatory matrix

$$Z_i = \begin{pmatrix} z_{i1,1} & z_{i1,2} & \cdots & z_{i1,q} \\ z_{i2,1} & z_{i2,2} & \cdots & z_{i2,q} \\ \vdots & & & \vdots \\ z_{iT_i,1} & z_{iT_i,2} & \cdots & z_{iT_i,q} \end{pmatrix} = \begin{pmatrix} z_{i1}' \\ z_{i2}' \\ \vdots \\ z_{iT_i}' \end{pmatrix}$$

and subject-specific parameters $\alpha_i = (\alpha_{i1}, \ldots, \alpha_{iq})'$. Temporal correlation is allowed through the assumption $\mathrm{Var}(\varepsilon_i) = R_i(\tau)$. This is known as the fixed-effects linear longitudinal data model. The GLS estimators of the parameters are

$$\hat{\beta} = \left(\sum_{i=1}^{n} X_i' R_i^{-1/2}\Omega_i R_i^{-1/2} X_i\right)^{-1} \sum_{i=1}^{n} X_i' R_i^{-1/2}\Omega_i R_i^{-1/2} y_i$$

and

$$\hat{\alpha}_i = (Z_i' R_i^{-1} Z_i)^{-1} Z_i' R_i^{-1} (y_i - X_i\hat{\beta}),$$

with

$$\Omega_i = I_i - R_i^{-1/2} Z_i (Z_i' R_i^{-1} Z_i)^{-1} Z_i' R_i^{-1/2}.$$

The above model can also easily be implemented using gls() by modifying the R code. For example, in the special case $z_{it} = 1$, the model reduces to the subject-specific intercept model with serial correlation; one simply adds factor(TOWNCODE) to the model formula in SCar.fit.

15.2.4 Models with Random Effects

Consider the linear longitudinal data model

$$y_{it} = z_{it}'\alpha_i + x_{it}'\beta + \varepsilon_{it}. \qquad (15.11)$$

Instead of treating $\alpha_i$ as fixed parameters, another approach to studying heterogeneity is to view $\alpha_i$ as random variables. This model, containing the fixed-effects parameter $\beta$ and random effects $\alpha_i$, is known as the Linear Mixed-Effects Model (LMM). In its most general form, we assume that $E(\alpha_i) = 0$ and $\mathrm{Var}(\alpha_i) = D$, a $q \times q$ positive definite matrix. Furthermore, the subject effects and the error term are assumed to be uncorrelated, that is, $\mathrm{Cov}(\alpha_i, \varepsilon_i) = 0$. Under these assumptions, the variance of each subject can be expressed as

$$\mathrm{var}(y_i) = Z_i D Z_i' + R_i = V_i(\tau),$$

where vector τ determines the covariance matrix.

For inference purposes, the GLS estimator of population parameter β is

$$\hat{\beta}_{GLS} = \left(\sum_{i=1}^{n} X_i' V_i^{-1} X_i\right)^{-1} \sum_{i=1}^{n} X_i' V_i^{-1} y_i$$

and its variance is

$$\mathrm{var}\,\hat{\beta}_{GLS} = \left(\sum_{i=1}^{n} X_i' V_i^{-1} X_i\right)^{-1}.$$

Similar to the fixed-effects model, it is easy to show that the MLE under multivariate normality is the same as the GLS estimator of $\beta$. For feasible estimates, we discuss likelihood-based methods for the estimation of the variance components. Using $\hat{\beta}_{GLS}$, the concentrated log-likelihood function is

$$l_{ML}(\hat{\beta}_{GLS}(\tau), \tau) = -\frac{1}{2}\left(\sum_{i=1}^{n} \log\det V_i(\tau) + \sum_{i=1}^{n} (y_i - X_i\hat{\beta}_{GLS}(\tau))' V_i(\tau)^{-1} (y_i - X_i\hat{\beta}_{GLS}(\tau))\right).$$

Viewing $\hat{\beta}_{GLS}$ as a function of $\tau$, one maximizes the log-likelihood with respect to $\tau$. This can be done using either the Newton-Raphson or the Fisher scoring method. As in OLS regression, the MLEs of the variance components are biased downward. To mitigate the bias, one can employ restricted maximum likelihood by modifying the concentrated log-likelihood function:

$$l_{REML}(\hat{\beta}_{GLS}(\tau), \tau) \equiv l_{ML}(\hat{\beta}_{GLS}(\tau), \tau) - \frac{1}{2}\log\det\left(\sum_{i=1}^{n} X_i' V_i(\tau)^{-1} X_i\right). \qquad (15.12)$$

Now we examine the so-called error components model (or random intercept model), a special case that is important in actuarial science, where $z_{it} = 1$ and $\mathrm{var}(\varepsilon_i) = \sigma^2 I_i$. See Sections 15.3 and 15.3.2 for more examples of this specification. The model becomes

$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}.$$

The model has the same presentation as the basic fixed-effects model and assumes no serial correlation within each subject. The difference is that the subject-specific intercept $\alpha_i$ is assumed to be random with zero mean and variance $\sigma_\alpha^2$. The error components model corresponds to a random sampling scheme where the subjects form a random subset of a population. One can show that the variance of subject $i$ is

$$\mathrm{var}\,y_i = \sigma_\alpha^2 J_i + \sigma^2 I_i = V_i = (\sigma_\alpha^2 + \sigma^2)\begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix},$$

where $J_i$ is a $T_i \times T_i$ matrix with all elements equal to one, $I_i$ is the $T_i$-dimensional identity matrix, and $\rho = \sigma_\alpha^2/(\sigma^2 + \sigma_\alpha^2)$. Thus, the error components model is equivalent to the model with exchangeable (compound symmetry) serial correlation.
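To make the equivalence concrete, one can rebuild $V_i$ from the REML estimates of the error components model reported below ($\hat{\sigma}_\alpha = 18.057$, $\hat{\sigma} = 19.038$) and compare with the compound-symmetry matrix printed by getVarCov(SCex.fit) above; a minimal sketch:

> # Reconstruct V_i = sigma_alpha^2 J + sigma^2 I from the EC.fit estimates
> sa2 <- 18.05746^2; s2 <- 19.03756^2
> Vi <- sa2 * matrix(1, 5, 5) + s2 * diag(5)
> Vi[1:2, 1:2]    # diagonal 688.50, off-diagonal 326.07, as in getVarCov(SCex.fit)
> sa2/(sa2 + s2)  # rho = 0.47, the CS estimate in Table 15.2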

We implement the error components model EC.fit using the function lme() in the nlme package. The argument random is used to specify the random effects in the mixed-effects model. Comparing with Table 15.2, we notice that the estimates of $\beta$ are the same as in the model with exchangeable serial correlation. The default uses REML to estimate the model parameters. The confidence intervals of the variance components are calculated in a similar way as for the models with serial correlation (see Section 15.2.3) and can be called with the function intervals().

> library(nlme)
> # Error-components model
> EC.fit <- lme(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn, random=~1|TOWNCODE)
> summary(EC.fit)
Linear mixed-effects model fit by REML
 Data: AutoClaimIn
AIC BIC logLik
1303.913 1321.606 -645.9566
Random effects:
Formula: ~1 | TOWNCODE
        (Intercept) Residual
StdDev:    18.05746 19.03756
Fixed effects: AC ~ lnPCI + lnPPSM + YEAR
    Value Std.Error  DF  t-value p-value
(Intercept)  887.8878 206.81071 113 4.293239  0e+00
lnPCI  -91.1979 20.41210 113 -4.467833  0e+00
lnPPSM  21.9614  5.07913 113 4.323844  0e+00
YEAR   3.9119  1.14457 113 3.417801  9e-04 
Correlation:
  (Intr) lnPCI lnPPSM
lnPCI -0.988
lnPPSM  -0.249  0.096
YEAR  0.197 -0.205 -0.082
Standardized Within-Group Residuals:
   Min  Q1   Med  Q3  Max
-2.53017784 -0.61089180 0.01099886 0.50082006  2.91907172
Number of Observations: 145
Number of Groups: 29 
> intervals(EC.fit, which="var-cov")
Approximate 95% confidence intervals
Random Effects:
Level: TOWNCODE
    lower est.  upper
sd((Intercept)) 12.93758 18.05746 25.20347
Within-group standard error:
   lower     est.    upper
16.72928 19.03756 21.66434

A relevant question to ask is whether the subject-specific effects are significant, or whether the intercepts take a common value. Because $\alpha_i$ is random, we wish to test the null hypothesis $H_0: \sigma_\alpha^2 = 0$. We consider the following procedure:

  • Run the pooled cross-sectional model $y_{it} = x_{it}'\beta + \varepsilon_{it}$ and calculate the residuals $e_{it}$.
  • For each subject, compute an estimator of $\sigma_\alpha^2$:

    $$s_i = \frac{1}{T_i(T_i - 1)}\left(T_i^2\,\bar{e}_i^2 - \sum_{t=1}^{T_i} e_{it}^2\right)$$

  • Compute the test statistic and compare it with a quantile of a $\chi^2(1)$ distribution:

    $$TS = \frac{1}{2n}\left(\frac{\sum_{i=1}^{n} s_i\,\sqrt{T_i(T_i - 1)}}{N^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T_i} e_{it}^2}\right)^2$$

In our example, the test statistic equals 56.85, and thus we reject the null hypothesis of a common intercept.

> # Pooling test
> tcode = unique(AutoClaimIn$TOWNCODE)
> n = length(tcode)
> N = nrow(AutoClaimIn)
> T <- rep(NA,n)
> s <- rep(NA,n)
> for (i in 1:n){
+ T[i] <- nrow(subset(AutoClaimIn,TOWNCODE==tcode[i]))
+ s[i] <- (sum(subset(AutoClaimIn,TOWNCODE==tcode[i])$rPool)^2 -
+          sum(subset(AutoClaimIn,TOWNCODE==tcode[i])$rPool^2))/T[i]/(T[i]-1)
+}
> TS <- (sum(s*sqrt(T*(T-1)))*N/sum(AutoClaimIn$rPool^2))^2/2/n
> TS
[1] 56.85278

To incorporate serial correlation in the mixed-effects model, one can use the argument correlation in the lme() function. For example, in the model RE.fit, we use update() to include AR(1) temporal correlation in the error components model. Here we see that, with a subject-specific intercept, the serial correlation (-0.014) is not significant. The function getVarCov() can be used to output the variance-covariance matrix: the argument type="conditional" provides the estimate of $R_i$ and the argument type="marginal" provides the estimate of $V_i$. We further perform a likelihood ratio test for the serial correlation using anova. Consistently, the large p-value does not show support for serial correlation in the error components model. Note: we use method="ML" to get the true log-likelihood value for this test.

> # Error components with AR(1)
> RE.fit <- update(EC.fit, correlation=corAR1(form=~1|TOWNCODE))
> summary(RE.fit)
 Linear mixed-effects model fit by REML
	Data: AutoClaimIn
		AIC  BIC logLik
	  1305.897 1326.538 -645.9484
 Random effects:
 Formula: ~1 | TOWNCODE
	 (Intercept) Residual
 StdDev: 18.10974 18.9826
 Correlation Structure: AR(1)
 Formula: ~1 | TOWNCODE
Parameter estimate(s):
	 Phi
 -0.01444735
 Fixed effects: AC ~ lnPCI + lnPPSM + YEAR
		Value Std.Error  DF  t-value p-value
(Intercept) 887.8789 206.74423 113 4.294577 0e+00
lnPCI	 -91.2038 20.40536 113 -4.469601 0e+00
lnPPSM	  21.9669  5.07795 113 4.325938 0e+00
YEAR	  3.9237  1.13499 113 3.457055 8e-04 
Correlation:
	 (Intr) lnPCI lnPPSM
lnPCI  -0.988
lnPPSM -0.249 0.096
YEAR 0.198 -0.207 -0.082
Standardized Within-Group Residuals:
	Min  Q1   Med    Q3  Max
-2.55033919 -0.60887177 0.02008323 0.49759528  2.91281638
Number of Observations: 145
Number of Groups: 29
> intervals(RE.fit, which="var-cov")
Approximate 95% confidence intervals
 Random Effects:
  Level: TOWNCODE
    lower est. upper
sd((Intercept)) 12.96079 18.10974 25.30422
Correlation structure:
  lower   est. upper
Phi -0.2431935 -0.01444735 0.215821
attr(,"label")
[1] "Correlation structure:"
Within-group standard error:
  lower est.   upper
16.55969 18.98260 21.76003
> # Get variance components
> getVarCov(RE.fit)
Random effects variance covariance matrix
	  (Intercept)
(Intercept) 327.96
  Standard Deviations: 18.11
> getVarCov(RE.fit, type="conditional")
TOWNCODE 10
Conditional variance covariance matrix
	  1		 2	  3  	  4   5
1 3.6034e+02 -5.2059000  0.075212 -0.0010866  1.5699e-05
2  -5.2059e+00  360.3400000 -5.205900  0.0752120 -1.0866e-03
3 7.5212e-02 -5.2059000 360.340000 -5.2059000  7.5212e-02
4  -1.0866e-03 0.0752120 -5.205900 360.3400000 -5.2059e+00
5 1.5699e-05 -0.0010866  0.075212 -5.2059000  3.6034e+02
Standard Deviations: 18.983 18.983 18.983 18.983 18.983
> getVarCov(RE.fit, type="marginal")
TOWNCODE 10
 Marginal variance covariance matrix
  1  2  3  4  5
1  688.30 322.76 328.04 327.96 327.96
2  322.76 688.30 322.76 328.04 327.96
3  328.04 322.76 688.30 322.76 328.04
4  327.96 328.04 322.76 688.30 322.76
5  327.96 327.96 328.04 322.76 688.30
Standard Deviations: 26.236 26.236 26.236 26.236 26.236
> # Likelihood ratio test
> EC.fit.ml <- lme(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ random=~1|TOWNCODE, method="ML")
> RE.fit.ml <- update(EC.fit, correlation=corAR1(form=~1|TOWNCODE), method="ML")
> anova(EC.fit.ml, RE.fit.ml)
	Model df AIC  BIC logLik  Test L.Ratio  p-value
EC.fit.ml  1 6 1323.212 1341.073 -655.6062
RE.fit.ml  2 7 1325.171 1346.009 -655.5857 1 vs 2 0.04087198 0.8398

We conclude this section with the Hausman test. We have discussed the linear fixed- effects panel data model and the linear mixed-effects model. Both allow for subject specific heterogeneity but with different assumptions. An interesting question is how to choose from the two classes, that is, whether to treat αi as fixed or random. A possible solution is to refer to the Hausman test (see Hausman (1978)) with test statistic given by

$$TS = (\hat{\beta}_{FE} - \hat{\beta}_{GLS})'\left(\widehat{\mathrm{var}}\,\hat{\beta}_{FE} - \widehat{\mathrm{var}}\,\hat{\beta}_{GLS}\right)^{-1}(\hat{\beta}_{FE} - \hat{\beta}_{GLS}),$$

where $\hat{\beta}_{FE}$ and $\hat{\beta}_{GLS}$ denote the fixed-effects estimator and the random-effects estimator, respectively. We compare the test statistic with a quantile of a $\chi^2(q)$ distribution; a large value supports the fixed-effects estimator. As an example, we compare the basic fixed-effects model with the error components model. The observed value of the test statistic is 3.97, supporting the error components formulation.

> # Hausman test
> Var.FE <- vcov(FE.fit)[-(1:n),-(1:n)]
> Var.EC <- vcov(EC.fit)[-1,-1]
> beta.FE <- coef(FE.fit)[-(1:n)]
> beta.EC <- fixef(EC.fit)[-1]
> ChiSq <- t(beta.FE-beta.EC)%*%solve(Var.FE-Var.EC)%*%(beta.FE-beta.EC)
> ChiSq
  [,1]
[1,] 3.970489

15.2.5 Prediction

This section reviews prediction for longitudinal data mixed-effects models (as discussed in Section 15.2.4). In previous sections, we discussed the estimation and inference of the fixed parameters $\beta$ in the model. It is also of interest to summarize the subject-specific effects described by the random variable $\alpha_i$. For example, in credibility theory, one is interested in the prediction of the expected claims for a policyholder given his risk class. In doing so, we develop the best linear unbiased predictor (BLUP) of a random variable. Predictors are said to be linear if they are formed from a linear combination of the responses, and BLUPs are constructed by minimizing the mean squared error.

In a linear mixed-effects model, where $E(y_i) = X_i\beta$ and $\mathrm{var}(y_i) = Z_i D Z_i' + R_i = V_i$, we wish to predict a random variable $\eta$ with $E(\eta) = c'\beta$ and $\mathrm{Var}(\eta) = \sigma_\eta^2$. Let $\hat{\beta}_{GLS}$ be the generalized least squares estimator of $\beta$; then the BLUP of $\eta$ is

$$\eta_{BLUP} = c'\hat{\beta}_{GLS} + \sum_{i=1}^{n}\mathrm{Cov}(\eta, y_i)' V_i^{-1}(y_i - X_i\hat{\beta}_{GLS})$$

and the mean squared error is

$$\begin{aligned}
\mathrm{var}(\eta_{BLUP} - \eta) = {} & \left(c' - \sum_{i=1}^{n}\mathrm{Cov}(\eta, y_i)' V_i^{-1} X_i\right)\left(\sum_{i=1}^{n} X_i' V_i^{-1} X_i\right)^{-1}\left(c' - \sum_{i=1}^{n}\mathrm{Cov}(\eta, y_i)' V_i^{-1} X_i\right)' \\
& - \sum_{i=1}^{n}\mathrm{Cov}(\eta, y_i)' V_i^{-1}\mathrm{Cov}(\eta, y_i) + \sigma_\eta^2.
\end{aligned}$$

For example, consider the special case $\eta = w_1'\alpha_i + w_2'\beta$, a linear combination of population parameters and subject-specific effects. Using the above relation, we can show that

$$\hat{\eta}_{BLUP} = w_1' D Z_i' V_i^{-1}(y_i - X_i\hat{\beta}_{GLS}) + w_2'\hat{\beta}_{GLS}.$$

Taking $w_2 = 0$, we further have the BLUP of $\alpha_i$:

$$\hat{\alpha}_{i,BLUP} = D Z_i' V_i^{-1}(y_i - X_i\hat{\beta}_{GLS}).$$

Another special case that is useful for diagnostics is the residual, $\eta = \varepsilon_{it}$. In this case, we have $c = 0$ and the BLUP is straightforwardly shown to be

$$\hat{e}_{it,BLUP} = y_{it} - \left(z_{it}'\hat{\alpha}_{i,BLUP} + x_{it}'\hat{\beta}_{GLS}\right).$$
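For the error components model, these formulas can be evaluated by hand. The sketch below (object names ours) computes the BLUP of the intercept for the first town, which reduces to the familiar credibility shrinkage of the town's average residual; it should match the first element of ranef(EC.fit) shown further below (about -0.205).

> # BLUP of alpha_i computed by hand for TOWNCODE 10 (error components model)
> sa2 <- 18.05746^2; s2 <- 19.03756^2     # REML estimates from EC.fit
> d  <- subset(AutoClaimIn, TOWNCODE == 10)
> Xi <- cbind(1, d$lnPCI, d$lnPPSM, d$YEAR)
> Vi <- sa2 * matrix(1, 5, 5) + s2 * diag(5)
> res <- d$AC - Xi %*% fixef(EC.fit)
> sa2 * t(rep(1, 5)) %*% solve(Vi) %*% res   # D Z' V^{-1} (y - X beta)
> (5 * sa2/(s2 + 5 * sa2)) * mean(res)       # equivalent shrinkage form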

Some special cases of BLUPs are available in the package nlme. For the error-components model EC.fit, the function ranef() can be used to obtain the BLUPs of the random intercepts $\hat{\alpha}_{i,BLUP}$, and the function residuals() can be used to obtain the BLUPs of the residuals $\hat{e}_{it,BLUP}$ and their standardized version.

> # BLUP
> alpha.BLUP <- ranef(EC.fit)
> beta.GLS <- fixef(EC.fit)
> resid.BLUP <- residuals(EC.fit, type="response")
> rstandard.BLUP <- residuals(EC.fit, type="normalized")
> alpha.BLUP
(Intercept)
10 -0.2049993
11 -6.9197373
12 17.7349235
13 20.9538588
14 -0.1942180
15 -5.6464625
et cetera

To conclude this section, we compare the performance of the alternative models using the automobile insurance data. Our interest is to predict the expected claims of each policyholder in the next year, so the quantity of interest is $\eta = E(y_{i,T_i+1}|\alpha_i)$. The corresponding BLUP is $\hat{\eta}_{BLUP} = z_{i,T_i+1}'\hat{\alpha}_{i,BLUP} + x_{i,T_i+1}'\hat{\beta}_{GLS}$.

Recall that we developed the various longitudinal data models using the data of years 1993-1997, and we use the data of year 1998 to validate the predictions. Table 15.3 presents the performance of the various longitudinal data models based on both in-sample and out-of-sample data. For the in-sample data, we report the information-based model selection criteria AIC and BIC. For the out-of-sample data, we report the sum of squared prediction errors (SSPE) and the sum of absolute prediction errors (SAPE). The results show that the models that account for subject-specific effects perform better, regardless of the way in which heterogeneity is accommodated.

> # Use data of year 1998 for validation
> AutoClaimOut <- subset(AutoClaim, YEAR == 1998)
> # Define new variables
> AutoClaimOut$lnPCI <- log(AutoClaimOut$PCI)
> AutoClaimOut$lnPPSM <- log(AutoClaimOut$PPSM)
> AutoClaimOut$YEAR <- AutoClaimOut$YEAR-1992
> # Compare models Pool.fit, SCar.fit, FE.fit, EC.fit, RE.fit and FEar.fit
> # Fixed-effects model with AR(1)
> FEar.fit <- gls(AC ~ factor(TOWNCODE)+lnPCI+lnPPSM+YEAR-1,
+ data=AutoClaimIn, correlation=corAR1(form=~1|TOWNCODE))
> FEar.fit.ml <- gls(AC ~ factor(TOWNCODE)+lnPCI+lnPPSM+YEAR-1,
+ data=AutoClaimIn, correlation=corAR1(form=~1|TOWNCODE), method="ML")
> # Prediction
> Xmat <- cbind(rep(1,nrow(AutoClaimOut)),AutoClaimOut$lnPCI,
+ AutoClaimOut$lnPPSM,AutoClaimOut$YEAR)
> beta.Pool <- coef(Pool.fit)
> pred.Pool <- Xmat%*%beta.Pool
> SSPE.Pool <- sum((pred.Pool - AutoClaimOut$AC)^2)
> SAPE.Pool <- sum(abs(pred.Pool - AutoClaimOut$AC))
> beta.SCar <- coef(SCar.fit)
> pred.SCar <- Xmat%*%beta.SCar
> SSPE.SCar <- sum((pred.SCar - AutoClaimOut$AC)^2)
> SAPE.SCar <- sum(abs(pred.SCar - AutoClaimOut$AC))
> beta.FE <- coef(FE.fit)[-(1:29)]
> pred.FE <- coef(FE.fit)[1:29] + Xmat[,-1]%*%beta.FE
> SSPE.FE <- sum((pred.FE - AutoClaimOut$AC)^2)
> SAPE.FE <- sum(abs(pred.FE - AutoClaimOut$AC))
> beta.FEar <- coef(FEar.fit)[-(1:29)]
> pred.FEar <- coef(FEar.fit)[1:29] + Xmat[,-1]%*%beta.FEar
> SSPE.FEar <- sum((pred.FEar - AutoClaimOut$AC)^2)
> SAPE.FEar <- sum(abs(pred.FEar - AutoClaimOut$AC))
> alpha.EC <- ranef(EC.fit)
> beta.EC <- fixef(EC.fit)
> pred.EC <- alpha.EC + Xmat%*%beta.EC
> SSPE.EC <- sum((pred.EC - AutoClaimOut$AC)^2)
> SAPE.EC <- sum(abs(pred.EC - AutoClaimOut$AC))
> alpha.RE <- ranef(RE.fit)
> beta.RE <- fixef(RE.fit)
> pred.RE <- alpha.RE + Xmat%*%beta.RE
> SSPE.RE <- sum((pred.RE - AutoClaimOut$AC)^2)
> SAPE.RE <- sum(abs(pred.RE - AutoClaimOut$AC))

Table 15.3

Comparison of alternative models.

                                      In-Sample            Out-of-Sample
                                     AIC       BIC        SSPE       SAPE
Pooled cross-sectional model       1359.50   1374.38   22201.78     681.25
Pooled cross-sectional with AR(1)  1339.34   1357.21   21242.64     658.98
Fixed-effects model                1293.33   1391.56   21506.07     660.59
Fixed-effects with AR(1)           1286.03   1387.24   21573.79     662.04
Error-components model             1323.21   1341.07   19515.86     619.44
Error-components with AR(1)        1325.17   1346.01   19572.94     620.64

15.3 Generalized Linear Models for Longitudinal Data

As in the previous section, we have a dataset at our disposal consisting of $n$ subjects, where for each subject $i$ ($1 \le i \le n$), $T_i$ observations are available. Relevant examples in experience rating are (among others) a dataset with $n$ policyholders followed over time, for which claim counts and severities are registered during each time period under consideration. As explained in Section 15.1 and demonstrated in Section 15.2 for linear models, we extend the GLMs discussed in Chapter 14 by including subject- (or policyholder-) specific random effects. The random effects structure the correlation between observations registered on the same subject, and also take heterogeneity among subjects, due to unobserved characteristics, into account. Therefore, our approach is in line with the random effects approach discussed in Section 15.2.4. Other methods exist for the analysis of longitudinal data in the framework of generalized linear models (the so-called marginal and conditional models; see Verbeke & Molenberghs (2000) and Antonio & Zhang (2014) for a discussion), but those will not be covered here.

15.3.1 Specifying Generalized Linear Models with Random Effects

Given the vector $\alpha_i$ with the random effects for subject $i$, the repeated measurements $Y_{i1}, \ldots, Y_{iT_i}$ are assumed to be independent with a density from the exponential family,

$$f(y_{it}|\alpha_i, \beta, \phi) = \exp\left(\frac{y_{it}\theta_{it} - \psi(\theta_{it})}{\phi} + c(y_{it}, \phi)\right), \quad t = 1, \ldots, T_i. \qquad (15.13)$$

Some explicit examples follow in the illustrations discussed below. Similar to expressions obtained in Chapter 14, the following (conditional) relations hold:

$$\mu_{it} = E[Y_{it}|\alpha_i] = \psi'(\theta_{it}) \quad\text{and}\quad \mathrm{var}[Y_{it}|\alpha_i] = \phi\,\psi''(\theta_{it}) = \phi\, V(\mu_{it}), \qquad (15.14)$$

where $g(\mu_{it}) = z_{it}'\alpha_i + x_{it}'\beta$. As before, $g(\cdot)$ is called the link function and $V(\cdot)$ the variance function. $\beta$ ($p \times 1$) denotes the fixed-effects parameter vector (governing a priori rating) and $\alpha_i$ ($q \times 1$) the random-effects vector. $x_{it}$ ($p \times 1$) and $z_{it}$ ($q \times 1$) contain subject $i$'s covariate information for the fixed and random effects, respectively. The specification of the GLMM is completed by assuming that the random effects $\alpha_i$ ($i = 1, \ldots, n$) are mutually independent and identically distributed with density function $f(\alpha_i|\nu)$. Herewith, $\nu$ denotes the unknown parameters in the density. In general statistics, the random effects often have a (multivariate) normal distribution with zero mean and a covariance matrix determined by $\nu$. Observations on the same subject are dependent because they share the same random effects $\alpha_i$.

The likelihood function for the unknown parameters $\beta$, $\nu$, and $\phi$ then becomes

$$L(\beta, \nu, \phi; y) = \prod_{i=1}^{n} f(y_i|\beta, \nu, \phi) = \prod_{i=1}^{n}\int\prod_{t=1}^{T_i} f(y_{it}|\alpha_i, \beta, \phi)\, f(\alpha_i|\nu)\, d\alpha_i, \qquad (15.15)$$

where $y = (y_1', \ldots, y_n')'$ and the integral is with respect to the $q$-dimensional vector $\alpha_i$. For instance, with normally distributed data and random effects (our setting in Section 15.2), the integral can be worked out analytically, and explicit expressions follow for the maximum likelihood estimator of $\beta$ and the Best Linear Unbiased Predictor (BLUP) of $\alpha_i$. For more general GLMMs, however, approximations to the likelihood or numerical integration techniques are required to maximize Equation (15.15) with respect to the unknown parameters. Such techniques are discussed (and demonstrated) in Antonio & Zhang (2014) (and references therein).
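To give a feel for Equation (15.15), the sketch below evaluates the marginal likelihood contribution of a single subject in a Poisson GLMM with a normal random intercept by one-dimensional numerical integration; the claim history y and the values of eta and sigma are hypothetical, and the function name lik.i is ours.

> # Likelihood contribution of one subject in a Poisson GLMM, obtained by
> # numerical integration over the random intercept (hypothetical inputs)
> lik.i <- function(y, eta, sigma) {
+   integrand <- function(a)
+     sapply(a, function(ai) prod(dpois(y, lambda = exp(eta + ai)))) *
+       dnorm(a, mean = 0, sd = sigma)
+   integrate(integrand, lower = -Inf, upper = Inf)$value
+ }
> lik.i(y = c(0, 1, 0, 0), eta = log(0.15), sigma = 0.5)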

To illustrate the concepts described above, we now consider a Poisson GLMM with normally distributed random intercept, that is, a Poisson error components model. This GLMM allows explicit calculation of the marginal mean and covariance matrix. In this way, one can clearly see how the inclusion of the random effect leads to overdispersion and within-subject covariance.

Example 15.1 (A Poisson GLMM) Let $N_{it}$ denote the claim frequency registered in year $t$ for policyholder $i$. Assume that, conditional on $\alpha_i$, $N_{it}$ follows a Poisson distribution with mean $E[N_{it}|\alpha_i] = \exp(x_{it}'\beta + \alpha_i)$ and that $\alpha_i \sim N(0, \sigma_b^2)$.

Straightforward calculations lead to

$$\mathrm{Var}(N_{it}) = \mathrm{Var}(E(N_{it}|\alpha_i)) + E(\mathrm{Var}(N_{it}|\alpha_i)) = E(N_{it})\left(\exp(x_{it}'\beta)\left[\exp(3\sigma_b^2/2) - \exp(\sigma_b^2/2)\right] + 1\right), \qquad (15.16)$$

and

$$\mathrm{Cov}(N_{it_1}, N_{it_2}) = \mathrm{Cov}(E(N_{it_1}|\alpha_i), E(N_{it_2}|\alpha_i)) + E(\mathrm{Cov}(N_{it_1}, N_{it_2}|\alpha_i)) = \exp(x_{it_1}'\beta)\exp(x_{it_2}'\beta)\left(\exp(2\sigma_b^2) - \exp(\sigma_b^2)\right). \qquad (15.17)$$

Hereby, we used the expressions for the mean and variance of a log-normal distribution. In the expression for the covariance, we used the fact that, given the random effect $\alpha_i$, $N_{it_1}$ and $N_{it_2}$ are independent. We see that the expression within the outer parentheses in Equation (15.16) is always bigger than 1. Thus, although $N_{it}|\alpha_i$ follows a regular Poisson distribution, the marginal distribution of $N_{it}$ is overdispersed. According to Equation (15.17), due to the random intercept, observations on the same subject are no longer independent.
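The overdispersion formula (15.16) is easy to verify by simulation; a minimal sketch with hypothetical values for $x_{it}'\beta$ and $\sigma_b^2$:

> # Simulation check of the marginal moments of the Poisson GLMM
> set.seed(1)
> eta <- log(0.2); sigma.b <- 0.5                   # hypothetical values
> alpha <- rnorm(1e6, 0, sigma.b)
> N <- rpois(1e6, exp(eta + alpha))
> c(mean(N), exp(eta + sigma.b^2/2))                # marginal mean
> c(var(N),
+   exp(eta + sigma.b^2/2) *
+     (exp(eta) * (exp(3*sigma.b^2/2) - exp(sigma.b^2/2)) + 1))  # (15.16)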

Example 15.2 (A Poisson GLMM, continued) Let $N_{it}$ again denote the claim frequency for policyholder $i$ in year $t$. Assume that, conditional on $\alpha_i$, $N_{it}$ follows a Poisson distribution with mean $E[N_{it}|\alpha_i] = \exp(x_{it}'\beta + \alpha_i)$ and that $\alpha_i \sim N(-\frac{\sigma_b^2}{2}, \sigma_b^2)$. This re-parameterization is commonly used in ratemaking. Indeed, we now get

$$E[N_{it}] = E[E[N_{it}|\alpha_i]] = \exp\left(x_{it}'\beta - \frac{\sigma_b^2}{2} + \frac{\sigma_b^2}{2}\right) = \exp(x_{it}'\beta), \qquad (15.18)$$

and

$$E[N_{it}|\alpha_i] = \exp(x_{it}'\beta + \alpha_i). \qquad (15.19)$$

This specification shows that the a priori premium, given by $\exp(x_{it}'\beta)$, is correct on average. The a posteriori correction to this premium is determined by $\exp(\alpha_i)$. Besides the log-normal distribution from the above examples, other mixing distributions can be used. In the Poisson-Gamma framework, for instance, the conjugacy of these distributions allows for explicit calculation of the predictive premium.

Example 15.3 (A Poisson-Gamma rating model) Assume


$$N_{it} \sim \mathrm{Poi}(b_i\lambda_{it}), \text{ where } \lambda_{it} = \exp(x_{it}'\beta) \text{ and } b_i \sim \Gamma(a, a).$$

It follows that E[bi]=1 and the resulting joint, unconditional distribution then becomes

$$\Pr(N_{i1} = n_{i1}, \ldots, N_{iT_i} = n_{iT_i}) = \left(\prod_{t=1}^{T_i}\frac{\lambda_{it}^{n_{it}}}{n_{it}!}\right)\frac{\Gamma\left(\sum_{t=1}^{T_i} n_{it} + a\right)}{\Gamma(a)}\left(\frac{a}{\sum_{t=1}^{T_i}\lambda_{it} + a}\right)^a \times \left(\sum_{t=1}^{T_i}\lambda_{it} + a\right)^{-\sum_{t=1}^{T_i} n_{it}}, \qquad (15.20)$$

with $E[N_{it}] = E[E[N_{it}|b_i]] = \lambda_{it}$ and $\mathrm{Var}[N_{it}] = E[\mathrm{Var}[N_{it}|b_i]] + \mathrm{Var}[E[N_{it}|b_i]] = \lambda_{it} + \frac{1}{a}\lambda_{it}^2$. For the specification in Equation (15.20), the posterior distribution of the random intercept $b_i$ is again a Gamma distribution, with

$$f(b_i|N_{i1} = n_{i1}, \ldots, N_{iT_i} = n_{iT_i}) \sim \Gamma\left(\sum_{t=1}^{T_i} n_{it} + a,\ \sum_{t=1}^{T_i}\lambda_{it} + a\right). \qquad (15.21)$$

The (conditional) mean and variance of this posterior distribution are given, respectively, by

$$E[b_i|N_{it} = n_{it},\, t = 1, \ldots, T_i] = \frac{a + \sum_{t=1}^{T_i} n_{it}}{a + \sum_{t=1}^{T_i}\lambda_{it}} \quad\text{and} \qquad (15.22)$$

$$\mathrm{Var}[b_i|N_{it} = n_{it},\, t = 1, \ldots, T_i] = \frac{a + \sum_{t=1}^{T_i} n_{it}}{\left(a + \sum_{t=1}^{T_i}\lambda_{it}\right)^2}. \qquad (15.23)$$

This leads to the following a posteriori premium:

$$E[N_{i,T_i+1}|N_{it} = n_{it},\, t = 1, \ldots, T_i] = \lambda_{i,T_i+1}\, E[b_i|N_{it} = n_{it},\, t = 1, \ldots, T_i] = \lambda_{i,T_i+1}\left\{\frac{a + \sum_{t=1}^{T_i} n_{it}}{a + \sum_{t=1}^{T_i}\lambda_{it}}\right\}. \qquad (15.24)$$

The above credibility premium is optimal under a quadratic loss function: as is known from mathematical statistics, the conditional expectation minimizes the mean squared error criterion.
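A small numerical illustration of the credibility premium (15.24), with hypothetical values for $a$, the a priori frequencies, and the observed claim counts:

> # A posteriori premium in the Poisson-Gamma model (hypothetical inputs)
> a <- 1.5                       # Gamma shape/rate parameter
> lambda  <- rep(0.12, 5)        # a priori expected frequencies, years 1-5
> nclaims <- c(0, 1, 0, 0, 0)    # observed claim counts
> lambda6 <- 0.12                # a priori frequency for year 6
> lambda6 * (a + sum(nclaims))/(a + sum(lambda))   # Equation (15.24)
[1] 0.1428571

A policyholder with one claim in five years thus pays about 19% more than the a priori premium of 0.12; a claim-free record would instead yield a discount.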

Experience rating based on multilevel (panel or higher-order) models poses a challenge to the insurer when it comes to communicating the predictive results of these models to policyholders. It is not readily transparent to an ordinary policyholder how the surcharges (maluses) for reported claims and the discounts (bonuses) for claim-free periods are evaluated. In order to establish an experience rating system in which insureds can easily understand the effect of reported claims or periods without claims, Bonus-Malus scales have been developed. We develop a case study (using R) of such scales in Section 15.3.2.

15.3.2 Case Study: Experience Rating with Bonus-Malus Scales in R

We now demonstrate how the statistical models from Section 15.3 allow us to develop a specific type of experience rating system, namely a Bonus-Malus (BM) scale. This type of experience rating is very common in motor (or vehicle) insurance. See Lemaire (1984) and Denuit et al. (2007) for detailed discussions. In a BM scale, an a priori tariff is adjusted based on the claim history of a policyholder. A "good" history will create a bonus, and therefore a premium reduction. A "bad" performance causes a malus and penalizes the policyholder with a premium increase. We closely follow Denuit et al. (2007) in this section and extend the discussion in Antonio & Valdez (2012) with an implementation in R of a simple BM scale.

Experience rating with a BM scale is appealing from a commercial and communication point of view. An insurer can easily explain to a customer how his claims reported in year $t$ will change the premium applicable in year $t+1$ for automobile insurance. To discuss the probabilistic, statistical, as well as computational aspects of Bonus-Malus scales, a credibility model similar to the one in Example 15.3 is assumed. Let $N_{it}$ denote the number of claims registered for policyholder $i$ in year $t$. Our credibility model is structured as follows:

  • Policy(holder) $i$ of the portfolio ($i = 1, \ldots, n$) is represented by a sequence $(\Theta_i, N_i)$, where $N_i = (N_{i1}, N_{i2}, \ldots)$ and $\Theta_i$ represents unexplained heterogeneity and has mean 1;
  • Given $\Theta_i = \theta$, the random variables $N_{it}$ ($t = 1, 2, \ldots$) are independent and $\mathcal{P}(\lambda_{it}\theta)$ (Poisson) distributed; and
  • The sequences $(\Theta_i, N_i)$ ($i = 1, \ldots, n$) are assumed to be independent.

15.3.2.1 Bonus—Malus Scales

A BM scale consists of a certain number of levels, say $s + 1$, numbered from 0 to $s$, with 0 being the best level. Let $\ell_0$ be the entrance level of a new driver. According to the number of claims reported during the insured period, drivers move up and down the scale. A claim-free year results in a bonus point, which implies that the driver goes one level down. Claims are penalized by malus points, meaning that for each claim filed, the driver goes up a certain number of levels, denoted by pen (for penalty). We introduce a set of random variables that allows us to describe the technicalities of a BM scale. $L_k$ represents the level occupied by the driver in the time interval $(k, k+1)$. Thus, $L_k$ takes a value in $\{0, \ldots, s\}$, and $\{L_1, L_2, \ldots\}$ is the driver's trajectory over time. With $N_k$ the number of claims reported by the insured in the period $(k-1, k)$, the future level $L_k$ of an insured is obtained from the present level $L_{k-1}$ and the number of claims $N_k$ reported during the present year. We recognize the so-called Markov property: the future depends on the present but not on the past. The relativity $r_\ell$ associated with each level $\ell$ in the scale determines the premium discount or penalty awarded to the driver. A policyholder who has at present

Table 15.4

Transitions in the (-1/top scale) BM system.

Starting    Level Occupied if ...
Level       0 Claims Reported    >= 1 Claim is Reported
0                  0                      5
1                  0                      5
2                  1                      5
3                  2                      5
4                  3                      5
5                  4                      5

an a priori premium $\lambda_{it}$ (determined using the techniques from Chapter 14) and who is in level $\ell$ has to pay $r_\ell \times \lambda_{it}$. With $r_\ell < 1$ the driver receives a discount, based on a favorable record of past claims; with $r_\ell > 1$ the driver is penalized for his past performance. The relativities, together with the transition rules in the scale, are the commercial alternative for the credibility-type corrections to an a priori tariff, as discussed above. We demonstrate in this section the calculation of these relativities for a given portfolio and BM scale.

Example 15.4 (-1/Top Scale) We consider a very simple example of a BM scale to illustrate the concepts: the (-1/top scale). See Denuit et al. (2007) for more realistic examples. This scale has six levels, numbered 0, 1, ..., 5. The starting class is level 5. Each claim-free year is rewarded with one bonus class. When an accident is reported, the policyholder is transferred to level 5. Table 15.4 represents these transitions.

15.3.2.2 Transition Rules, Transition Probabilities and Stationary Distribution

To enable the calculation of the relativity corresponding with each level $\ell$, some probabilistic concepts associated with BM scales must be introduced. The transition rules corresponding with a certain BM scale are indicator variables $t_{ij}(k)$ such that

$$t_{ij}(k) = \begin{cases} 1 & \text{if the policy transfers from level } i \text{ to level } j \text{ when } k \text{ claims are reported,} \\ 0 & \text{otherwise.} \end{cases} \qquad (15.25)$$

We define the transition matrix T(k), with k the number of claims reported by the driver,

$$T(k) = \begin{pmatrix} t_{00}(k) & t_{01}(k) & \cdots & t_{0s}(k) \\ t_{10}(k) & t_{11}(k) & \cdots & t_{1s}(k) \\ \vdots & & & \vdots \\ t_{s0}(k) & t_{s1}(k) & \cdots & t_{ss}(k) \end{pmatrix}. \qquad (15.26)$$

Thus, $T(k)$ is a 0-1 matrix and each row has exactly one 1.

Assuming $N_1, N_2, \ldots$ are independent and $\mathcal{P}(\theta)$ distributed, the trajectory this driver follows through the scale is represented as $\{L_1(\theta), L_2(\theta), \ldots\}$. The transition probability of this driver to go from level $\ell_1$ to level $\ell_2$ in a single step is

$$p_{\ell_1\ell_2}(\theta) = P[L_{k+1}(\theta) = \ell_2 | L_k(\theta) = \ell_1] = \sum_{n=0}^{+\infty} P[L_{k+1}(\theta) = \ell_2 | N_{k+1} = n, L_k(\theta) = \ell_1]\, P[N_{k+1} = n] = \sum_{n=0}^{+\infty}\frac{\theta^n}{n!}\exp(-\theta)\, t_{\ell_1\ell_2}(n), \qquad (15.28)$$

where we used the independence of $N_{k+1}$ and $L_k(\theta)$. In matrix form, the one-step transition matrix $P(\theta)$ is given by

$$P(\theta) = \begin{pmatrix} p_{00}(\theta) & p_{01}(\theta) & \cdots & p_{0s}(\theta) \\ p_{10}(\theta) & p_{11}(\theta) & \cdots & p_{1s}(\theta) \\ \vdots & & & \vdots \\ p_{s0}(\theta) & p_{s1}(\theta) & \cdots & p_{ss}(\theta) \end{pmatrix}. \qquad (15.29)$$

The probability of being transferred from level $i$ to level $j$ in $n$ steps is the $n$-step transition probability $p_{ij}^{(n)}(\theta)$,

$$p_{ij}^{(n)}(\theta) = P[L_{k+n}(\theta) = j | L_k(\theta) = i] = \sum_{i_1=0}^{s}\sum_{i_2=0}^{s}\cdots\sum_{i_{n-1}=0}^{s} p_{ii_1}(\theta)\, p_{i_1i_2}(\theta)\cdots p_{i_{n-1}j}(\theta), \qquad (15.30)$$

which composes the $n$-step transition matrix $P^{(n)}(\theta)$:

$$P^{(n)}(\theta) = \begin{pmatrix} p_{00}^{(n)}(\theta) & p_{01}^{(n)}(\theta) & \cdots & p_{0s}^{(n)}(\theta) \\ p_{10}^{(n)}(\theta) & p_{11}^{(n)}(\theta) & \cdots & p_{1s}^{(n)}(\theta) \\ \vdots & & & \vdots \\ p_{s0}^{(n)}(\theta) & p_{s1}^{(n)}(\theta) & \cdots & p_{ss}^{(n)}(\theta) \end{pmatrix}. \qquad (15.31)$$

The following relation holds between the one-step and $n$-step transition matrices: $P^{(n)}(\theta) = \{P(\theta)\}^n$, the $n$-th matrix power of $P(\theta)$.

Ultimately, the BM system will stabilize and the proportion of policyholders occupying each level of the scale will remain unchanged. These proportions are captured in the stationary distribution $\pi(\theta) = (\pi_0(\theta), \ldots, \pi_s(\theta))'$, which is defined as

$$\pi_{\ell_2}(\theta) = \lim_{n \to +\infty} p^{(n)}_{\ell_1 \ell_2}(\theta), \tag{15.32}$$

independent of the starting level $\ell_1$.

Correspondingly, $P^{(n)}(\theta)$ converges to $\Pi(\theta)$, defined as

$$\lim_{n \to +\infty} P^{(n)}(\theta) = \Pi(\theta) = \begin{pmatrix} \pi'(\theta) \\ \pi'(\theta) \\ \vdots \\ \pi'(\theta) \end{pmatrix}. \tag{15.33}$$

For the BM scale introduced in Example 15.3, the transition matrices and the one-step transition probability matrix are given by

$$T(0) = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad T(k) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \text{ for } k \geq 1, \tag{15.34}$$

$$P(\theta) = \begin{pmatrix} \exp(-\theta) & 0 & 0 & 0 & 0 & 1-\exp(-\theta) \\ \exp(-\theta) & 0 & 0 & 0 & 0 & 1-\exp(-\theta) \\ 0 & \exp(-\theta) & 0 & 0 & 0 & 1-\exp(-\theta) \\ 0 & 0 & \exp(-\theta) & 0 & 0 & 1-\exp(-\theta) \\ 0 & 0 & 0 & \exp(-\theta) & 0 & 1-\exp(-\theta) \\ 0 & 0 & 0 & 0 & \exp(-\theta) & 1-\exp(-\theta) \end{pmatrix}. \tag{15.35}$$

In R, we specify this one-step transition matrix P as follows:

Pmatrix = function(th) {
  # one-step transition matrix P(theta) of the (-1/Top Scale), cf. Equation (15.35)
  P = matrix(nrow=6, ncol=6, data=0)
  # a claim-free year (probability exp(-th)) moves the driver one level down
  P[1,1] = P[2,1] = P[3,2] = P[4,3] = P[5,4] = P[6,5] = exp(-th)
  # one or more claims (probability 1 - exp(-th)) transfer the driver to level 5
  P[,6] = 1 - exp(-th)
  return(P)
}
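As a sketch of how Equation (15.28) assembles $P(\theta)$ from the transition rules, the same matrix can be built from $T(0)$ and $T(k)$, $k \geq 1$, of Equation (15.34), weighted by Poisson probabilities; the names T0, T1 and P.check are ours, not the chapter's code:

T0 = matrix(0, nrow=6, ncol=6)
T0[cbind(1:6, c(1, 1:5))] = 1   # claim-free year: one level down
T1 = matrix(0, nrow=6, ncol=6)
T1[,6] = 1                      # one or more claims: straight to level 5
# P(theta) = P[N = 0] T(0) + P[N >= 1] T(1), since T(k) = T(1) for all k >= 1
P.check = function(th) dpois(0, th) * T0 + (1 - dpois(0, th)) * T1
max(abs(P.check(0.1) - Pmatrix(0.1)))  # 0 (up to floating point): both agree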

Using a result from Rolski et al. (1999) (also see Denuit et al. (2007)), the stationary distribution $\pi(\theta)$ can be obtained as $\pi'(\theta) = e'(I - P(\theta) + E)^{-1}$, with $e$ the $(s+1) \times 1$ vector with all entries 1 and $E$ the $(s+1) \times (s+1)$ matrix with all entries 1. For the (−1/Top Scale), this results in

$$\pi'(\theta) = e' \begin{pmatrix} 2-\exp(-\theta) & 1 & 1 & 1 & 1 & \exp(-\theta) \\ 1-\exp(-\theta) & 2 & 1 & 1 & 1 & \exp(-\theta) \\ 1 & 1-\exp(-\theta) & 2 & 1 & 1 & \exp(-\theta) \\ 1 & 1 & 1-\exp(-\theta) & 2 & 1 & \exp(-\theta) \\ 1 & 1 & 1 & 1-\exp(-\theta) & 2 & \exp(-\theta) \\ 1 & 1 & 1 & 1 & 1-\exp(-\theta) & 1+\exp(-\theta) \end{pmatrix}^{-1}. \tag{15.36}$$

We specify the stationary distribution of the (−1/Top Scale) in R:

lim.distr = function(P) {
  # stationary distribution pi'(theta) = e'(I - P(theta) + E)^(-1), cf. Equation (15.36)
  et = matrix(nrow=1, ncol=dim(P)[2], data=1)        # row vector e'
  E = matrix(nrow=dim(P)[1], ncol=dim(P)[2], data=1) # all-ones matrix E
  mat = diag(dim(P)[1]) - P + E
  inverse.mat = solve(mat)
  p = et %*% inverse.mat
  return(p)
}

For instance, with $\theta = 0.1$ (as in the example of Denuit et al. (2007), page 180, Example 4.9), the stationary distribution becomes

$$\pi'(0.1) = (0.6065307, \; 0.06378939, \; 0.07049817, \; 0.07791253, \; 0.08610666, \; 0.09516258). \tag{15.37}$$

In R, we use the following instructions:

> P = Pmatrix(0.1)
> P
          [,1]      [,2]      [,3]      [,4]      [,5]       [,6]
[1,] 0.9048374 0.0000000 0.0000000 0.0000000 0.0000000 0.09516258
[2,] 0.9048374 0.0000000 0.0000000 0.0000000 0.0000000 0.09516258
[3,] 0.0000000 0.9048374 0.0000000 0.0000000 0.0000000 0.09516258
[4,] 0.0000000 0.0000000 0.9048374 0.0000000 0.0000000 0.09516258
[5,] 0.0000000 0.0000000 0.0000000 0.9048374 0.0000000 0.09516258
[6,] 0.0000000 0.0000000 0.0000000 0.0000000 0.9048374 0.09516258
> pi = lim.distr(P)
> pi
          [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
[1,] 0.6065307 0.06378939 0.07049817 0.07791253 0.08610666 0.09516258
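As a quick numerical check of Equations (15.32) and (15.33), the rows of the matrix powers $\{P(\theta)\}^n$ should approach $\pi'(\theta)$, and $\pi'(\theta)$ should be invariant under $P(\theta)$; a minimal sketch:

Pn = P
for (k in 1:200) Pn = Pn %*% P  # approximates P^(n) for large n
round(Pn[1,], 7)                # (approximately) equal to pi, for every row
max(abs(pi %*% P - pi))         # invariance pi' P(theta) = pi', up to rounding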

15.3.2.3 Relativities

The calculation of the relativities in a BM scale reveals some similarities with explicit credibility-type calculations. Following Norberg (1976), with the number of levels and transition rules being fixed, the optimal relativity $r_\ell$ corresponding with level $\ell$ is determined by maximizing the asymptotic predictive accuracy. This implies that one tries to minimize

$$E[(\Theta - r_L)^2], \tag{15.38}$$

the expected squared difference between the relativity $r_L$ and the "true" relative premium $\Theta$, under the assumptions of our credibility model. Simplifying the notation in this model, the a priori premium of a random policyholder is denoted with $\Lambda$ and the residual effect of unknown risk characteristics with $\Theta$. The policyholder then has (unknown) annual expected claim frequency $\Lambda\Theta$, where $\Lambda$ and $\Theta$ are assumed to be independent. The weights of the different risk classes follow from the a priori system with $P[\Lambda = \lambda_k] = w_k$.

Calculation of the $r_\ell$'s goes as follows. We minimize

$$E[(\Theta - r_L)^2] = \sum_{\ell=0}^{s} E[(\Theta - r_\ell)^2 \,|\, L = \ell] \, P[L = \ell] \tag{15.39}$$

$$= \sum_{\ell=0}^{s} \int_0^{+\infty} (\theta - r_\ell)^2 \, P[L = \ell \,|\, \Theta = \theta] \, dF_\Theta(\theta)$$

$$= \sum_k w_k \int_0^{+\infty} \sum_{\ell=0}^{s} (\theta - r_\ell)^2 \, \pi_\ell(\lambda_k \theta) \, dF_\Theta(\theta), \tag{15.40}$$

where $P[\Lambda = \lambda_k] = w_k$. In the last step of the derivation, we condition on $\Lambda$. It is straightforward to obtain the optimal relativities by solving

$$\frac{\partial}{\partial r_j} E[(\Theta - r_L)^2] = 0 \quad \text{with } j = 0, \ldots, s. \tag{15.41}$$

Alternatively, from mathematical statistics it is well known that for a quadratic loss function (see Equation (15.38)) the optimum is $r_\ell = E[\Theta \,|\, L = \ell]$. This is calculated as follows:

$$r_\ell = E[\Theta \,|\, L = \ell]$$

$$= E\big[ E[\Theta \,|\, L = \ell, \Lambda] \,\big|\, L = \ell \big]$$

$$= \sum_k E[\Theta \,|\, L = \ell, \Lambda = \lambda_k] \, P[\Lambda = \lambda_k \,|\, L = \ell]$$

$$= \sum_k \int_0^{+\infty} \theta \, \frac{P[L = \ell \,|\, \Theta = \theta, \Lambda = \lambda_k] \, w_k}{P[L = \ell, \Lambda = \lambda_k]} \, dF_\Theta(\theta) \; \frac{P[\Lambda = \lambda_k, L = \ell]}{P[L = \ell]}, \tag{15.42}$$

where the relation $f_{\Theta | L = \ell, \Lambda = \lambda_k}(\theta \,|\, \ell, \lambda_k) = P[L = \ell \,|\, \Theta = \theta, \Lambda = \lambda_k] \times w_k \times f_\Theta(\theta) / P[\Lambda = \lambda_k, L = \ell]$ is used. The optimal relativities are given by

$$r_\ell = \frac{\sum_k w_k \int_0^{+\infty} \theta \, \pi_\ell(\lambda_k \theta) \, dF_\Theta(\theta)}{\sum_k w_k \int_0^{+\infty} \pi_\ell(\lambda_k \theta) \, dF_\Theta(\theta)}. \tag{15.43}$$

When no a priori rating system is used, all the $\hat{\lambda}_k$'s are equal (say, estimated by $\hat{\lambda}$) and the relativities reduce to

$$r_\ell = \frac{\int_0^{+\infty} \theta \, \pi_\ell(\hat{\lambda} \theta) \, dF_\Theta(\theta)}{\int_0^{+\infty} \pi_\ell(\hat{\lambda} \theta) \, dF_\Theta(\theta)}. \tag{15.44}$$

Calculation of these relativities in R goes as follows. We replicate Example 4.11 from Denuit et al. (2007), where no a priori rating is used. This example uses a $\Gamma(a, a)$ distribution for the policyholder-specific random effect $\Theta_i$, with $\hat{a} = 0.8888$ and $\hat{\lambda} = 0.1474$. These estimates are obtained by calibrating a Negative Binomial distribution on the data from Portfolio A in Denuit et al. (2007) (see Section 1.6, pages 44-45, in the book). The data in Portfolio A are the claim counts registered on 14,505 policies during calendar year 1997.

### Without a priori ratemaking
a.hat = 0.8888
lambda.hat = 0.1474
# numerator integrand of Equation (15.44): theta * pi_s(lambda*theta) * f(theta),
# with f the Gamma(a, a) density of the random effect Theta
int1 = function(theta, s, a = a.hat, lambda = lambda.hat) {
	f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
	p = lim.distr(Pmatrix(lambda*theta))
	return(theta * p[1, s+1] * f.dist)}
P1 = matrix(nrow=1, ncol=6, data=0)
for (i in 0:5) P1[1,i+1] = integrate(Vectorize(int1), lower=0, upper=Inf, s=i)$value
# denominator integrand of Equation (15.44): pi_s(lambda*theta) * f(theta)
int2 = function(theta, s, a = a.hat, lambda = lambda.hat) {
	f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
	p = lim.distr(Pmatrix(lambda*theta))
	return(p[1, s+1] * f.dist)}
P2 = matrix(nrow=1, ncol=6, data=0)
for (i in 0:5) P2[1,i+1] = integrate(Vectorize(int2), lower=0, upper=Inf, s=i)$value
R = P1 / P2
> R # relativities without a priori rating
          [,1]    [,2]     [,3]     [,4]     [,5]     [,6]
[1,] 0.5466848 1.21958 1.348203 1.507254 1.709032 1.973534
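As a sanity check on these numbers: since $r_\ell = E[\Theta \,|\, L = \ell]$ and $E[\Theta] = 1$ for the $\Gamma(a, a)$ random effect, the optimal relativities are financially balanced, that is, they average out to one over the stationary distribution. The entries of P2 computed above are exactly the stationary level probabilities $P[L = \ell]$, so:

sum(P2)      # close to 1: the probabilities P[L = l] sum to one
sum(R * P2)  # close to 1: E[r_L] = E[Theta] = 1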

To demonstrate the calculation of relativities when accounting for a priori rating, we use the Portfolio A data from Denuit et al. (2007) again, with the $\hat{\lambda}_k$'s and $w_k$'s printed in Table 2.7 (page 91) of the book. Here $\hat{\lambda}_k$ is the a priori annually expected claim frequency for risk class $k$, as determined by a set of a priori observed risk factors. The selection of risk factors and estimated annual claim frequencies are obtained by fitting a Negative Binomial regression model to the Portfolio A data. Negative Binomial regression for a single year of data on observed claim counts, say $k_i$ with $i = 1, \ldots, n$, is based on the following likelihood

$$L(\beta, a) = \prod_{i=1}^{n} \frac{\lambda_i^{k_i}}{k_i!} \left( \frac{a}{a + \lambda_i} \right)^a (a + \lambda_i)^{-k_i} \frac{\Gamma(a + k_i)}{\Gamma(a)}, \tag{15.45}$$

where $\lambda_i = d_i \exp(x_i' \beta)$ (with $d_i$ the exposure registered for policyholder $i$). Negative Binomial regression is available in R from the glm.nb() function in the MASS package.
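A minimal sketch of such a fit follows; the variable names nclaims, ageph, fuel, expo and the data frame portfolioA are illustrative only, not the coding of the actual Portfolio A data:

library(MASS)
# Negative Binomial regression with log link; the exposure enters as an offset,
# cf. Equation (15.45); glm.nb() reports the estimated shape parameter as theta
fit = glm.nb(nclaims ~ ageph + fuel + offset(log(expo)), data = portfolioA)
summary(fit)  # estimated regression coefficients beta
fit$theta     # estimated shape parameter a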

lambda = c(0.1176,0.1408,0.1897,0.2272,0.1457,0.1746,0.2351,0.2816,
				0.1761,0.2109,0.2840,0.3402,0.2182,0.2614,0.3520,0.0928,
				0.1112,0.1498,0.1794,0.1151,0.1378,0.1856,0.2223)
weights = c(0.1049,0.1396,0.0398,0.0705,0.0076,0.0122,0.0013,0.0014,
				0.0293,0.0299,0.0152,0.0242,0.0007,0.0009,0.0002,0.1338,
				0.1973,0.0294,0.0661,0.0372,0.0517,0.0025,0.0044)
a = 1.065
n = length(weights)
int3 = function(theta, lambda, a, l) {
	# numerator integrand of Equation (15.43) for a risk class with a priori frequency lambda
	p = lim.distr(Pmatrix(lambda*theta))
	f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
	return(theta * p[1, l+1] * f.dist)}
int4 = function(theta, lambda, a, l) {
	# denominator integrand of Equation (15.43)
	p = lim.distr(Pmatrix(lambda*theta))
	f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
	return(p[1, l+1] * f.dist)}
teller1 = noemer = array(dim=6, data=0) # numerator ('teller') and denominator ('noemer')
for (i in 0:5) {
	b = c = array(dim=n, data=0)
	for (j in 1:n) {
		b[j] = integrate(Vectorize(int3), lower=0, upper=Inf, lambda=lambda[j], a=a, l=i)$value
		c[j] = integrate(Vectorize(int4), lower=0, upper=Inf, lambda=lambda[j], a=a, l=i)$value}
	# weight the risk classes with w_k, cf. Equation (15.43)
	teller1[i+1] = b %*% weights
	noemer[i+1] = c %*% weights
	}
R = teller1 / noemer
> R # relativities with a priori rating
[1] 0.6118907 1.2088841 1.3124752 1.4388207 1.5985014 1.8123074

Summarizing, we obtain the relativities displayed in Table 15.5 (without and with a priori rating) for the (−1/Top Scale) and the Portfolio A data from Denuit et al. (2007). The a posteriori corrections are less severe when a priori rating is taken into account.

Table 15.5

Numerical characteristics for the (−1/top scale) and Portfolio A data from Denuit et al. (2007), without and with a priori rating taken into account.

         rℓ = E[Θ | L = ℓ]     rℓ = E[Θ | L = ℓ]
Level    without a priori      with a priori
5        197.3%                181.2%
4        170.9%                159.9%
3        150.7%                143.9%
2        134.8%                131.3%
1        122.0%                120.9%
0        54.7%                 61.2%
