Chapter 15

Longitudinal Data and Experience Rating

Katrien Antonio

Universiteit van Amsterdam
Amsterdam, Netherlands

Peng Shi

University of Wisconsin-Madison
Madison, Wisconsin, USA

Frank van Berkum

Universiteit van Amsterdam
Amsterdam, Netherlands

15.1 Motivation

15.1.1 A Priori Rating for Cross-Sectional Data

In Chapter 14 on "General Insurance Pricing," Boucher & Charpentier discuss regression techniques suitable for pricing with cross-sectional data. A cross-sectional dataset observes each subject in the sample (for instance, a policy(holder)) only once. Each subject is therefore described by a single response, say $Y_i$ for subject $i$, and a vector with covariate information, $x_i$. Assuming independence between subjects, (Generalized) Linear Models [(G)LMs] are directly available for statistical modeling and explain the response as a function of the risk factors, within an appropriate distributional framework. For pricing in general insurance, the actuary builds such models for a (cross-sectional) dataset with claim counts on the one hand and claim severities on the other hand as response variables, obtained by following policyholders during a single time period. The result is a tariff based on risk classification through regression modeling. When the explanatory variables used as rating factors express a priori correctly measurable information about the policyholder (or, for instance, the vehicle or the insured building), the system is called an a priori classification scheme. The examples in Chapter 14 illustrate this idea and use, for instance, the age of the driver and the age of the car to explain the number of claims registered by a policyholder.

15.1.2 Experience Rating for Panel Data

However, despite the presence of an a priori rating system, some important risk factors remain unmeasurable or unobservable. For example, in automobile insurance, the insurer is unable to detect the driver's aggressiveness behind the wheel or the quickness of his reflexes to avoid a possible accident (see Denuit et al. (2007) for further motivation). This motivates the presence of inhomogeneous tariff cells within an a priori rating system. An a posteriori or experience rating system is necessary to allow for the reevaluation of the premium (established a priori) based on the history of claims as reported by the insured. One can argue that an important predictor for the future number of claims reported by an insured will be the number of claims reported in the past. Predictive modeling for experience rating will confront analysts with data structures going beyond the cross-sectional design dealt with in (G)LMs. Longitudinal (or panel) data arise when the claim history of policyholders (or, in general, a group of "subjects") is registered repeatedly over time. Thus, with longitudinal data, the variables will have double subscripts, indicating the subject and observation period, respectively. Specifically, let $Y_{it}$ denote the response for the $i$-th subject in the $t$-th time period, and let $x_{it}$ denote the associated vector of explanatory variables. Assuming that there are $n$ subjects and following the $i$-th subject over $t = 1, \ldots, T_i$ time periods, we observe

$$\begin{aligned}
\text{1st subject: } & \{(y_{11}, x_{11}), (y_{12}, x_{12}), \ldots, (y_{1T_1}, x_{1T_1})\} \\
\text{2nd subject: } & \{(y_{21}, x_{21}), (y_{22}, x_{22}), \ldots, (y_{2T_2}, x_{2T_2})\} \\
& \vdots \\
n\text{-th subject: } & \{(y_{n1}, x_{n1}), (y_{n2}, x_{n2}), \ldots, (y_{nT_n}, x_{nT_n})\}
\end{aligned}$$

Longitudinal data have several potential advantages. First, longitudinal data are a hybrid of cross-sectional and time series data. On the one hand, they allow for the examination of the effects of covariates on the response, as in usual regression. On the other hand, similar to time series analysis, they also permit the identification of dynamic relations over time. Because they share subject-specific characteristics, observations on the same subject over time are correlated and require an adjusted toolkit for statistical modeling. In this chapter we study regression models incorporating these dynamics, among others, by extending a priori rating with so-called random effects. These random effects structure correlation between observations registered on the same subject, and also take heterogeneity among subjects, due to unobserved characteristics, into account.

15.1.3 From Panel to Multilevel Data

The panel data setting has two layers (or levels) of data: the time level on the one hand and the subject level on the other hand. However, insurers may have several other layers of data at their disposal. For example, Antonio et al. (2010) discuss experience rating for a dataset on fleet covers, registered for multiple insurance companies. Fleet policies are umbrella-type policies issued to customers whose insurance covers more than a single vehicle. The hierarchical or multilevel structure of the data is as follows: vehicles (v) observed over time (t), nested within fleets (f), with policies issued by insurance companies (c). Multilevel models allow for incorporating the hierarchical structure of the data by specifying random effects at the various levels in the data. Once again, these random effects represent unobservable characteristics at each level. Moreover, random effects allow a posteriori updating of an a priori tariff, by taking into account the past performance of, in the case of intercompany fleet contracts, the vehicle, fleet, and company.

15.1.4 Structure of the Chapter

Section 15.2 of this chapter considers linear models for longitudinal data. We discuss three approaches to capture unobserved heterogeneity in the longitudinal data context. Section 15.2.2 introduces the basic fixed effects model and describes the model specification and diagnostics. Section 15.2.3 extends these models to incorporate serial correlation in the error terms. Section 15.2.4 presents models with random effects and generalizes the framework to linear mixed models. Section 15.2.5 covers prediction for the linear mixed-effects model and points out its connection to actuarial credibility theory. The computational aspects are illustrated using a dataset introduced in Section 15.2.1. In Section 15.3 we leave the framework of linear models and switch to a distributional framework that is probably more appealing to actuaries, namely generalized linear models and their random effects extensions. Actuarial credibility systems are examples of a posteriori rating systems accounting for the history of claims as it emerges for an individual risk. Commercial versions of these experience rating schemes are more widely known in practice as Bonus-Malus scales. A case study (using R) with such rating schemes is the topic of Section 15.3.2. The theory on longitudinal data models is based on Diggle et al. (2002), Frees (2004), Hsiao (2003), Wooldridge (2010), and the references therein. Section 15.3 of this chapter is based on Antonio & Valdez (2012), Antonio & Zhang (2014), and Denuit et al. (2007), but focuses now on implementation with R. We refer to these papers and the references therein for more technical background. This chapter only covers examples with panel data. We refer to Antonio et al. (2010) for examples with multilevel data structures.

15.2 Linear Models for Longitudinal Data

15.2.1 Data

For linear longitudinal data models, we demonstrate the theory and computational aspects using a dataset of automobile bodily injury liability claims that was described and employed in Frees & Wang (2005). The dataset contains claims over the 6 years 1993 to 1998 for a random sample of twenty-nine towns in the state of Massachusetts. All variables in monetary values are rescaled using the consumer price index to mitigate the effect of time trends. We are interested in the behavior of the average claims per unit of exposure, that is, the pure premium, for each town and each year. Two explanatory variables are available for the regression analysis: the per-capita income (PCI) and the population per square mile (PPSM) of each town. The variables and their descriptions are summarized in Table 15.1.

Table 15.1

Description of variables in the auto claim dataset.

Variable    Description
TOWNCODE    The index of Massachusetts towns
YEAR        The calendar year of the observation
AC          Average claims per unit of exposure
PCI         Per-capita income of the town
PPSM        Population per square mile of the town

> # File name is AutoClaimData.txt
> AutoClaim = read.table(choose.files(), sep = "", quote = "",header=TRUE)
> names(AutoClaim)
[1] "TOWNCODE" "YEAR" "AC"        "PCI"    "PPSM"
> AutoClaim[1:12,]  # Check longitudinal structure
  TOWNCODE YEAR   AC  PCI PPSM
1   10 1993 160.8522 18134.04  1475.5515
2   10 1994 158.3382 18495.88  1461.8110
3   10 1995 156.8098 18778.29  1488.9911
4   10 1996 168.9899 18740.46  1502.9322
5   10 1997 171.8229 18809.62  1534.4251
6   10 1998 153.7644 19034.59  1557.6937
7   11 1993 149.3873 15597.56 855.4350
8   11 1994 137.5546 15908.79 877.2725
9   11 1995 169.9164 16151.69 872.8024
10  11 1996 169.0598 16119.15 898.7802
11  11 1997 161.3425 16178.64 929.2647
12  11 1998 138.0516 16372.14 940.9162

We use the data for the first 5 years, namely 1993-1997, to develop the model and keep the observations of the final year for validation purposes. To explore relations among the variables, the techniques used for ordinary regression, such as histograms and correlation statistics, are readily applicable to longitudinal data. In addition, we introduce several more specialized techniques. The first is the multiple time series plot exhibited in Figure 15.1, where the average claims in multiple years for each town are joined by straight lines. The plot shows the development of claims over time and helps visualize town-specific effects.

Figure 15.1

Multiple time series plot of average claims.

> # Use years 1993-1997 as training data and reserve year 1998 for validation
> AutoClaimIn <- subset(AutoClaim, YEAR < 1998)
> # Multiple time series plot
> plot(AC ~ YEAR, data = AutoClaimIn, ylab="Average Claim", xlab="Year")
> for (i in AutoClaimIn$TOWNCODE) {
+ lines(AC ~ YEAR, data = subset(AutoClaimIn, TOWNCODE == i))}

One can also use scatterplots to help detect the relation between the response and the explanatory variables. Figure 15.2 displays the scatterplots for the variables PCI and PPSM, suggesting a negative relation between AC and PCI and a positive relation between AC and PPSM. Note that we use both PCI and PPSM on the log scale, and logarithmic values will be used in the following analysis. In addition, we also serially connect the observations to identify potential patterns in each covariate. In this case, we observe that PCI varies over time whereas PPSM is relatively stable.

Figure 15.2

Scatterplot between average claims and explanatory variables.

> # Scatter plot to explore relations
> AutoClaimIn$lnPCI <- log(AutoClaimIn$PCI)
> AutoClaimIn$lnPPSM <- log(AutoClaimIn$PPSM)
> plot(AC ~ lnPCI, data = AutoClaimIn, ylab="Average Claim", xlab="PCI")
> for (i in AutoClaimIn$TOWNCODE) {
+ lines(AC ~ lnPCI, data = subset(AutoClaimIn, TOWNCODE == i))}
> plot(AC ~ lnPPSM, data = AutoClaimIn, ylab="Average Claim", xlab="PPSM")
> for (i in AutoClaimIn$TOWNCODE) {
+ lines(AC ~ lnPPSM, data = subset(AutoClaimIn, TOWNCODE == i))}

As a preliminary analysis, we consider a pooled cross-sectional regression model Pool.fit assuming all observations are independent, that is,

$$y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}. \qquad (15.1)$$

Here, $\alpha$ is the homogeneous intercept for all towns and $\beta$ is the vector of regression coefficients. The variables PCI, PPSM, and YEAR are included as covariates. As expected, we observe a significant negative effect of PCI and a positive effect of PPSM. We also observe an increasing trend in claims after adjusting for inflation. The functions lm and anova are used to fit and analyze the ordinary least squares regression:

> AutoClaimIn$YEAR <- AutoClaimIn$YEAR-1992
> Pool.fit <- lm(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn)
> summary(Pool.fit)
Call:
lm(formula = AC ~ lnPCI + lnPPSM + YEAR, data = AutoClaimIn)
Residuals:
	Min	 1Q Median 3Q Max
 -49.944 -16.154 -1.759 14.300 104.468
Coefficients:
	 Estimate  Std. Error t value  Pr(>|t|)
(Intercept) 899.569  120.150 7.487 6.98e-12 ***
lnPCI   -92.604   11.855  -7.812 1.17e-12 ***
lnPPSM   22.305  2.933 7.606 3.64e-12 ***
YEAR   3.923  1.519 2.583  0.01082 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.75 on 141 degrees of freedom
Multiple R-squared: 0.4908, Adjusted R-squared: 0.48
F-statistic: 45.3  on 3  and 141  DF, p-value:  < 2.2e-16
> anova(Pool.fit)
Analysis of Variance Table
Response: AC
   Df  Sum Sq  Mean Sq F value  Pr(>F)
lnPCI  1 46355 46355 69.9028  5.402e-14 ***
lnPPSM  1 39344 39344 59.3302  2.141e-12 ***
YEAR   1 4423  4423  6.6704 0.01082 *
Residuals 141 93502  663
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

15.2.2 Fixed Effects Models

Repeated observations allow one to study the heterogeneity, be it either of subject or of time. We begin with the basic fixed effects model by introducing subject-specific intercepts in the model

$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}. \qquad (15.2)$$

Hereby, $\alpha_i$ is a town-specific intercept ($i = 1, \ldots, n$); $x_{it} = (x_{it,1}, \ldots, x_{it,K})'$ is the vector of covariates; and $\beta$ is the vector of regression coefficients to be estimated. There are alternative methods to treat the heterogeneous intercepts. In this section, we assume the $\{\alpha_i\}$ are fixed parameters to be estimated along with $\beta$. Here, $\beta$ is known as the population parameter capturing the common effects of the explanatory variables. The $\{\alpha_i\}$, called nuisance parameters, vary by subject (here, town) and account for the subject heterogeneity. In the following, we will use the notation $T = \max\{T_1, \ldots, T_n\}$ and $N = \sum_{i=1}^{n} T_i$.

The basic fixed effects model assumes that there is no within-subject serial correlation, that is, the $\varepsilon_{it}$ are i.i.d. random variables with mean zero and variance $\sigma^2$. Thus, by the Gauss-Markov theorem, the OLS estimates are the best linear unbiased estimates, with

$$\hat{\beta} = \left(\sum_{i=1}^{n}\sum_{t=1}^{T_i} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)'\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T_i} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i)\right) \qquad (15.3)$$

and

$$a_i = \bar{y}_i - \bar{x}_i'\hat{\beta}. \qquad (15.4)$$

Here, $\bar{y}_i$ and $\bar{x}_i$ are the averages of $\{y_{it}\}$ and $\{x_{it}\}$ over time, respectively. The above is also known as the within estimator because it uses the time variation within each cross-section. In addition, the variance of $\hat{\beta}$ is shown to be

$$\widehat{\mathrm{var}}\,\hat{\beta} = s^2\left(\sum_{i=1}^{n}\sum_{t=1}^{T_i} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)'\right)^{-1}, \qquad (15.5)$$

where $s^2$ is the unbiased estimate of $\sigma^2$ based on the residuals. In deriving the large sample properties, one lets $n \to \infty$ while $T$ remains fixed. Under regularity conditions, one can show that $\hat{\beta}$ is consistent and asymptotically normally distributed. However, the $\{a_i\}$ are not consistent, and they are not even approximately normal if the responses are not normally distributed.
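To connect the formulas to the data, the within estimator in Equations (15.3)-(15.4) can also be computed by hand through group-wise demeaning. The following is a minimal sketch (the object names Xv, Xc, beta.within, and alpha.i are ours, not part of the chapter's scripts); its slope estimates should agree with the lnPCI, lnPPSM, and YEAR coefficients of the FE.fit model below.

> # Within estimator via demeaning, a sketch reproducing (15.3)-(15.4)
> Xv <- as.matrix(AutoClaimIn[, c("lnPCI","lnPPSM","YEAR")])
> y  <- AutoClaimIn$AC
> g  <- AutoClaimIn$TOWNCODE
> Xc <- Xv - apply(Xv, 2, function(x) ave(x, g))    # x_it - xbar_i
> yc <- y - ave(y, g)                                # y_it - ybar_i
> beta.within <- solve(t(Xc) %*% Xc, t(Xc) %*% yc)   # Equation (15.3)
> xbar <- apply(Xv, 2, function(x) tapply(x, g, mean))
> alpha.i <- tapply(y, g, mean) - xbar %*% beta.within  # Equation (15.4)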

We fit this basic fixed effects model FE.fit using lm by treating TOWNCODE as a categorical variable. The t- and F-statistics are constructed in the same way as in classical regression models. Note that the above model can easily be modified to account for time-specific heterogeneity by replacing $a_i$ with $\lambda_t$. Similarly, using categorical variables for the time dimension, least squares estimation is readily applied.

> # Basic fixed-effects model
> FE.fit <- lm(AC ~ factor(TOWNCODE)+lnPCI+lnPPSM+YEAR-1, data=AutoClaimIn)
> summary(FE.fit)
Call:
lm(formula = AC ~ factor(TOWNCODE) + lnPCI + lnPPSM + YEAR - 1, data = AutoClaimIn)
Residuals:
Min  1Q Median 3Q Max
-55.645 -8.900 0.177 8.995 50.141
Coefficients:
		Estimate Std. Error t value Pr(>|t|)
factor(TOWNCODE)10 1660.321 1846.793  0.899 0.371
factor(TOWNCODE)11 1558.851 1794.617  0.869 0.387
factor(TOWNCODE)12 1554.375 1884.831  0.825 0.411
factor(TOWNCODE)13 1360.128 1731.874  0.785 0.434
factor(TOWNCODE)14 1443.895 1780.094  0.811 0.419
factor(TOWNCODE)15 1681.983 1841.401  0.913 0.363
(et cetera)
lnPCI	   -22.631 159.268 -0.142 0.887
lnPPSM    -176.831 107.240 -1.649 0.102
YEAR		  5.947 2.738  2.172 0.032 *
--- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 18.88 on 113 degrees of freedom
Multiple R-squared: 0.9863, Adjusted R-squared: 0.9824
F-statistic: 254.4 on 32 and 113 DF, p-value: < 2.2e-16
> anova(FE.fit)
Analysis of Variance Table
Response: AC
		 Df Sum Sq Mean Sq F value Pr(>F)
factor(TOWNCODE) 29 2897069 99899 280.3677 < 2e-16 ***
lnPCI		 1 2231 2231  6.2621 0.01377 *
lnPPSM		 1	 34  34  0.0967 0.75638
YEAR		 1 1681 1681  4.7168 0.03196 *
Residuals	113  40263  356
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We further discuss three specific tests for model specification and diagnostics. The first is the pooling test, where one wishes to test whether the subject-specific effect is significant. The null hypothesis is

$$H_0: a_1 = a_2 = \cdots = a_n = a.$$

This can be done using the partial F- (Chow) test (see Chow (1960)) by calculating

$$F\text{-ratio} = \frac{(\text{Error SS})_{\text{Pooled}} - \text{Error SS}}{(n-1)\, s^2}.$$

Here, Error SS and $s^2$ are from the heterogeneous model (i.e., FE.fit) and $(\text{Error SS})_{\text{Pooled}}$ is from the homogeneous model (i.e., Pool.fit). It can be shown that the F-ratio follows an F-distribution with degrees of freedom $df_1 = n - 1$ and $df_2 = N - (n + K)$. In this example, the F-statistic equals $(93{,}502 - 40{,}263)/(29-1)/18.88^2 = 5.33$, so we reject the null hypothesis.
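The F-ratio can be reproduced directly from the two fitted objects; a small sketch (object names ours), which should return approximately the value 5.34 reported by the anova() comparison below:

> # Partial F (Chow) test computed by hand
> ErrorSS.pool <- sum(resid(Pool.fit)^2)   # 93502
> ErrorSS.fe   <- sum(resid(FE.fit)^2)     # 40263
> s2 <- summary(FE.fit)$sigma^2            # 18.88^2
> Fratio <- (ErrorSS.pool - ErrorSS.fe)/(29 - 1)/s2
> pf(Fratio, df1 = 28, df2 = 113, lower.tail = FALSE)  # p-value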

> anova(Pool.fit,FE.fit)
Analysis of Variance Table
Model 1: AC ~  lnPCI + lnPPSM + YEAR
Model 2: AC ~ factor(TOWNCODE) + lnPCI + lnPPSM + YEAR - 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 141 93502
2 113 40263 28 53238 5.3362  7.214e-11 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

15.2.3 Models with Serial Correlation

An alternative approach to capture heterogeneity is to use serial correlation. The intuition is that if there are some unobserved time-constant variables affecting the response, they will introduce correlation among the repeated observations. To motivate this approach, we examine the serial correlation of the residuals from Pool.fit. The results show strong temporal correlation among AC after removing the effects of the explanatory variables. This suggests that the i.i.d. assumption used in the homogeneous model is not appropriate.

> # Correlation among residuals
> AutoClaimIn$rPool <- resid(Pool.fit)
> rvec <- cbind(subset(AutoClaimIn,YEAR==1)$rPool,subset(AutoClaimIn,YEAR==2)$rPool,
+ subset(AutoClaimIn,YEAR==3)$rPool,subset(AutoClaimIn,YEAR==4)$rPool,
+ subset(AutoClaimIn,YEAR==5)$rPool)
> cor(rvec)
	 [,1]	[,2]  [,3]  [,4]   [,5]
[1,]  1.0000000 0.5862895 0.5187797 0.4207831  0.5424555
[2,]  0.5862895 1.0000000 0.3911814 0.2164202  0.2555096
[3,]  0.5187797 0.3911814 1.0000000 0.3955654  0.7890728
[4,]  0.4207831 0.2164202 0.3955654 1.0000000  0.4778912
[5,]  0.5424555 0.2555096 0.7890728 0.4778912  1.0000000

To relax the i.i.d. assumption, we first consider a homogeneous model with serial correlation. For subject i, the matrix presentation of the model is

$$y_i = X_i\beta + \varepsilon_i, \qquad (15.6)$$

where

$$y_i = \begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT_i} \end{pmatrix}, \quad X_i = \begin{pmatrix} x_{i1,1} & x_{i1,2} & \cdots & x_{i1,K} \\ x_{i2,1} & x_{i2,2} & \cdots & x_{i2,K} \\ \vdots & & & \vdots \\ x_{iT_i,1} & x_{iT_i,2} & \cdots & x_{iT_i,K} \end{pmatrix} = \begin{pmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT_i}' \end{pmatrix}, \quad \varepsilon_i = \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT_i} \end{pmatrix}. \qquad (15.7)$$

Now we assume that the $\varepsilon_{it}$ are correlated, with $\mathrm{var}(\varepsilon_i) = R_i$. Let $R = R(\tau)$ denote the $T \times T$ temporal covariance matrix for a vector of $T$ observations, where the unknown parameters in this covariance matrix are denoted by $\tau$. Note there are at most $T(T+1)/2$ unknown elements in $R$. Commonly used special cases of $R$ are (using $T = 5$):

$$\text{Independent: } R = \begin{pmatrix} \sigma^2 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 \end{pmatrix}, \quad \text{Compound symmetry: } R = \sigma^2\begin{pmatrix} 1 & \rho & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho & \rho \\ \rho & \rho & 1 & \rho & \rho \\ \rho & \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & \rho & 1 \end{pmatrix},$$

$$\text{AR(1): } R = \sigma^2\begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 & \rho^4 \\ \rho & 1 & \rho & \rho^2 & \rho^3 \\ \rho^2 & \rho & 1 & \rho & \rho^2 \\ \rho^3 & \rho^2 & \rho & 1 & \rho \\ \rho^4 & \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}, \quad \text{Toeplitz: } R = \begin{pmatrix} \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 & \sigma_4 \\ \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\ \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\ \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\ \sigma_4 & \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 \end{pmatrix},$$

$$\text{Banded Toeplitz: } R = \begin{pmatrix} \sigma^2 & \sigma_1 & \sigma_2 & 0 & 0 \\ \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 & 0 \\ \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\ 0 & \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\ 0 & 0 & \sigma_2 & \sigma_1 & \sigma^2 \end{pmatrix}, \quad \text{Unstructured: } R = \begin{pmatrix} \sigma^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} & \sigma_{15} \\ \sigma_{12} & \sigma^2 & \sigma_{23} & \sigma_{24} & \sigma_{25} \\ \sigma_{13} & \sigma_{23} & \sigma^2 & \sigma_{34} & \sigma_{35} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma^2 & \sigma_{45} \\ \sigma_{15} & \sigma_{25} & \sigma_{35} & \sigma_{45} & \sigma^2 \end{pmatrix}.$$

For the $i$-th subject, the covariance matrix is $\mathrm{var}(\varepsilon_i) = R_i(\tau)$, a $T_i \times T_i$ matrix. Here, $R_i(\tau)$ is positive definite and depends on $i$ only through its dimension; it can be determined by removing certain rows and columns of the matrix $R(\tau)$. This set of notations allows us to easily handle missing data and incomplete observations.

The model can be estimated using either moment-based or likelihood-based methods. With known Ri, the generalized least squares (GLS) estimates are obtained by minimizing

$$\sum_{i=1}^{n} (y_i - X_i\beta)' R_i^{-1} (y_i - X_i\beta),$$

and we have

$$\hat{\beta} = \left(\sum_{i=1}^{n} X_i' R_i^{-1} X_i\right)^{-1} \sum_{i=1}^{n} X_i' R_i^{-1} y_i.$$
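As an illustration of the GLS formula, the following sketch computes $\hat{\beta}$ for a known compound-symmetry working covariance. The values of rho and sig2 are simply plugged in from the SCex.fit estimates reported later in Table 15.2, so this is a feasible-GLS approximation, not part of the chapter's estimation routine; the object names are ours.

> # GLS estimator with a (given) compound-symmetry covariance, a sketch
> rho <- 0.47; sig2 <- 26.24^2
> Ri <- sig2 * ((1 - rho) * diag(5) + rho * matrix(1, 5, 5))
> XtX <- matrix(0, 4, 4); Xty <- matrix(0, 4, 1)
> for (i in unique(AutoClaimIn$TOWNCODE)) {
+   d  <- subset(AutoClaimIn, TOWNCODE == i)
+   Xi <- cbind(1, d$lnPCI, d$lnPPSM, d$YEAR)
+   XtX <- XtX + t(Xi) %*% solve(Ri) %*% Xi
+   Xty <- Xty + t(Xi) %*% solve(Ri) %*% d$AC
+ }
> beta.gls <- solve(XtX, Xty)   # close to the SCex.fit coefficients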

We can estimate such a model using the R package nlme. Two types of likelihood-based methods are provided to estimate the regression parameter $\beta$ and the variance components $\tau$: full maximum likelihood (ML) estimation and restricted maximum likelihood (REML) estimation. Based on the assumption of multivariate normality of the response $y_i$, the full log-likelihood function ($l = \log(L)$) of the model is

$$l_{ML}(\beta, \tau) = -\frac{1}{2}\left(\sum_{i=1}^{n} \log\det R_i(\tau) + \sum_{i=1}^{n} (y_i - X_i\beta)' R_i(\tau)^{-1} (y_i - X_i\beta)\right). \qquad (15.8)$$

The MLE follows by maximizing the above likelihood function over $\beta$ and $\tau$ simultaneously. It is also easy to show that, for a fixed covariance parameter $\tau$, the MLE of $\beta$ is the same as the generalized least squares estimator. It is known that the MLE of $\tau$ is biased downward. To mitigate the bias, restricted maximum likelihood maximizes the following log-likelihood function:

$$l_{REML}(\beta, \tau) \equiv l_{ML}(\beta, \tau) - \frac{1}{2}\log\det\left(\sum_{i=1}^{n} X_i' R_i(\tau)^{-1} X_i\right). \qquad (15.9)$$

The REML estimation will be discussed in more detail in the section on random-effects models.

In our application, we fit the linear model with three types of serial correlation: compound symmetry, AR(1), and unstructured. See Table 15.2 for the results. We denote the resulting models by SCex.fit, SCar.fit, and SCun.fit, respectively. The models are fit using the function gls() in the nlme package. The argument correlation is used to specify the matrix $R(\tau)$, and the argument method is used to specify the estimation method; the default estimation approach is REML. The estimation results are displayed in Table 15.2. The estimates of the regression coefficients are similar across models and are consistent with the pooled cross-sectional regression model. The estimates of the variance components suggest significant within-subject temporal correlation. Note that when an unstructured covariance is specified, the model is not identifiable in its most general form due to the nonuniqueness of $R(\tau)$. Thus, additional constraints are necessary for identification purposes. The gls() function estimates the model under the parameterization $R = \sigma^2\Sigma$, where $\sigma^2$ is a scale parameter and $\Sigma$ is the correlation matrix.

For inference, the estimation error of population parameter β is based on

$$\widehat{\mathrm{Var}}\,\hat{\beta} = \left(\sum_{i=1}^{n} X_i' R_i(\hat{\tau})^{-1} X_i\right)^{-1}.$$

The estimation error of $\hat{\tau}$ can be assessed in different ways. The approach implemented in gls() uses the inverse of the observed Fisher information. The confidence intervals for the scale parameter $\sigma$ and the correlation parameter $\rho$ are obtained from the approximate normal distribution of the ML or REML estimators of a transformation of the parameters. Specifically, the 95% confidence interval of $\sigma$ is

$$\left[\exp(\hat{\sigma}^* - 1.96\, s_{\hat{\sigma}^*}),\ \exp(\hat{\sigma}^* + 1.96\, s_{\hat{\sigma}^*})\right],$$

where $\hat{\sigma}^* \equiv \ln\hat{\sigma}$ and $s_{\hat{\sigma}^*}$ is the associated standard error derived from the Fisher information. Similarly, the 95% confidence interval of $\rho$ is

$$\left[\frac{\exp(\hat{\rho}^* - 1.96\, s_{\hat{\rho}^*}) - 1}{\exp(\hat{\rho}^* - 1.96\, s_{\hat{\rho}^*}) + 1},\ \frac{\exp(\hat{\rho}^* + 1.96\, s_{\hat{\rho}^*}) - 1}{\exp(\hat{\rho}^* + 1.96\, s_{\hat{\rho}^*}) + 1}\right],$$

where $\hat{\rho}^* \equiv \ln\frac{1 + \hat{\rho}}{1 - \hat{\rho}}$ and $s_{\hat{\rho}^*}$ is the corresponding standard error. In the package nlme, the function intervals() returns the 95% confidence intervals of the components of $\tau$, and the function getVarCov() returns the estimate of $R(\hat{\tau})$.
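The back-transformation can be checked numerically: mapping the endpoints of an interval reported by intervals() back to the transformed scale should give an interval that is symmetric around the transformed estimate, with half-width 1.96 standard errors. A sketch (the helper z is ours), using the AR(1) correlation interval reported later for the RE.fit model in Section 15.2.4:

> # Transform of a correlation parameter and its reported 95% interval
> z <- function(r) log((1 + r)/(1 - r))
> z(c(-0.2431935, -0.01444735, 0.215821))
> # approx. -0.496, -0.029, 0.439: both half-widths 0.467 = 1.96 * s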

> library(nlme)
> # Compound symmetry
> SCex.fit <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corCompSymm(form=~1|TOWNCODE))
> summary(SCex.fit)
> intervals(SCex.fit,which = "var-cov")
> getVarCov(SCex.fit)
Marginal variance covariance matrix
	[,1] [,2]  [,3]  [,4]  [,5]
[1,] 688.50  326.07 326.07 326.07 326.07
[2,] 326.07  688.50 326.07 326.07 326.07
[3,] 326.07  326.07 688.50 326.07 326.07
[4,] 326.07  326.07 326.07 688.50 326.07
[5,] 326.07  326.07 326.07 326.07 688.50
Standard Deviations: 26.239 26.239 26.239 26.239 26.239
> # AR(1)
> SCar.fit <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corAR1(form=~1|TOWNCODE))
> summary(SCar.fit)
> intervals(SCar.fit,which = "var-cov")
> getVarCov(SCar.fit)
Marginal variance covariance matrix
	 [,1] [,2]  [,3] [,4] [,5]
[1,] 673.210 292.350 126.96 55.132 23.942
[2,] 292.350 673.210 292.35 126.960 55.132
[3,] 126.960 292.350 673.21 292.350 126.960
[4,]  55.132 126.960 292.35 673.210 292.350
[5,]  23.942  55.132 126.96 292.350 673.210
Standard Deviations: 25.946 25.946 25.946 25.946 25.946
> # Unstructured
> SCun.fit <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corSymm(form=~1|TOWNCODE))
> summary(SCun.fit)
> intervals(SCun.fit,which = "var-cov")
> getVarCov(SCun.fit)
Marginal variance covariance matrix
[,1]  [,2] [,3] [,4] [,5]
[1,] 696.15  485.50 324.79 315.06 374.16
[2,] 485.50  696.15 227.51 179.88 190.11
[3,] 324.79  227.51 696.15 284.12 522.68
[4,] 315.06  179.88 284.12 696.15 351.96
[5,] 374.16  190.11 522.68 351.96 696.15
Standard Deviations: 26.385 26.385 26.385 26.385 26.385

The usual t- or F-test statistics follow as in the i.i.d. case. Caution is needed for tests based on the likelihood function. For example, the likelihood ratio test relies on the value of the full log-likelihood function rather than the restricted log-likelihood. One can use method="ML" in the gls() function to implement maximum likelihood estimation. We perform the test using anova for the models with serial correlation against the pooled cross-sectional regression. The results support the evidence of positive serial correlation.

Table 15.2

Estimation for models with serial correlation.

                       SCex.fit               SCar.fit               SCun.fit
Parameter          Est.       S.E.        Est.       S.E.        Est.       S.E.
(Intercept)      887.89     206.81      891.45     168.25      878.68     200.85
lnPCI            -91.20      20.41      -91.33      16.61      -90.81      19.81
lnPPSM            21.96       5.08       21.76       4.11       23.70       4.95
YEAR               3.91       1.14        3.55       1.66        1.82       1.03

                   Est.     95% CI        Est.     95% CI        Est.     95% CI
CS                 0.47  (0.29,0.64)
AR(1)                                     0.43  (0.26,0.58)
UN: corr(1,2)                                                    0.70  (0.46,0.84)
    corr(1,3)                                                    0.47  (0.15,0.70)
    corr(1,4)                                                    0.45  (0.06,0.72)
    corr(1,5)                                                    0.54  (0.19,0.76)
    corr(2,3)                                                    0.33  (-0.00,0.59)
    corr(2,4)                                                    0.26  (-0.16,0.60)
    corr(2,5)                                                    0.27  (-0.13,0.60)
    corr(3,4)                                                    0.41  (0.11,0.64)
    corr(3,5)                                                    0.75  (0.57,0.86)
    corr(4,5)                                                    0.51  (0.20,0.72)
Scale             26.24  (22.22,30.98)   25.95  (22.62,29.76)   26.38  (22.27,31.26)

log-REML        -645.96                -654.25                -635.93
log-ML          -655.61                -663.67                -645.38
AIC             1323.21                1339.34                1320.75
BIC             1341.07                1357.21                1365.40

> # Likelihood ratio test
> SCex.fit.ml <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corCompSymm(form=~1|TOWNCODE), method="ML")
> SCar.fit.ml <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corAR1(form=~1|TOWNCODE), method="ML")
> SCun.fit.ml <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corSymm(form=~1|TOWNCODE), method="ML")
> anova(SCex.fit.ml, Pool.fit)
    Model df   AIC   BIC   logLik   Test   L.Ratio p-value
SCex.fit.ml  1 6  1323.212  1341.073 -655.6062
Pool.fit   2 5  1359.497  1374.381 -674.7487 1  vs  2 38.28505 <.0001
> anova(SCar.fit.ml, Pool.fit)
    Model df   AIC   BIC   logLik   Test   L.Ratio p-value
SCar.fit.ml  1 6 1339.344 1357.205 -663.6721
Pool.fit   2 5 1359.497 1374.381 -674.7487 1  vs  2 22.15326 <.0001
> anova(SCun.fit.ml, Pool.fit)
    Model df   AIC   BIC   logLik   Test   L.Ratio p-value
SCun.fit.ml   1 15 1320.753 1365.404 -645.3763
Pool.fit   2 5 1359.497 1374.381 -674.7487 1  vs  2 58.74476 <.0001

Finally, we extend the above model to allow for richer heterogeneity. We consider a more general model where not only subject-specific intercepts but also subject-specific slopes are incorporated in the linear model:

yi=Ziαi+Xiβ+εi,(15.10)

with explanatory matrix

$$Z_i = \begin{pmatrix} z_{i1,1} & z_{i1,2} & \cdots & z_{i1,q} \\ z_{i2,1} & z_{i2,2} & \cdots & z_{i2,q} \\ \vdots & & & \vdots \\ z_{iT_i,1} & z_{iT_i,2} & \cdots & z_{iT_i,q} \end{pmatrix} = \begin{pmatrix} z_{i1}' \\ z_{i2}' \\ \vdots \\ z_{iT_i}' \end{pmatrix}$$

and subject-specific parameters $\alpha_i = (\alpha_{i1}, \ldots, \alpha_{iq})'$. Temporal correlation is allowed through the assumption $\mathrm{Var}(\varepsilon_i) = R_i(\tau)$. This is known as the fixed-effects linear longitudinal data model. The GLS estimators of the parameters are

$$\hat{\beta} = \left(\sum_{i=1}^{n} X_i' R_i^{-1/2}\Omega_i R_i^{-1/2} X_i\right)^{-1} \sum_{i=1}^{n} X_i' R_i^{-1/2}\Omega_i R_i^{-1/2} y_i$$

and

$$\hat{\alpha}_i = (Z_i' R_i^{-1} Z_i)^{-1} Z_i' R_i^{-1} (y_i - X_i\hat{\beta}),$$

with

$$\Omega_i = I_i - R_i^{-1/2} Z_i (Z_i' R_i^{-1} Z_i)^{-1} Z_i' R_i^{-1/2}.$$

The above model can also easily be implemented using gls() by modifying the R code. For example, in the special case $z_{it} = 1$, the model reduces to the subject-specific intercept model with serial correlation; one simply adds factor(TOWNCODE) to the model formula in SCar.fit.

15.2.4 Models with Random Effects

Consider the linear longitudinal data model

$$y_{it} = z_{it}'\alpha_i + x_{it}'\beta + \varepsilon_{it}. \qquad (15.11)$$

Instead of treating $\alpha_i$ as fixed parameters, another approach to studying heterogeneity is to view $\alpha_i$ as random variables. This model, containing the fixed-effects parameter $\beta$ and random effects $\alpha_i$, is known as the Linear Mixed-Effects Model (LMM). In its most general form, we assume that $E(\alpha_i) = 0$ and $\mathrm{Var}(\alpha_i) = D$, a $q \times q$ positive definite matrix. Furthermore, the subject effects and the error term are assumed to be uncorrelated, that is, $\mathrm{Cov}(\alpha_i, \varepsilon_i) = 0$. Under these assumptions, the variance of each subject can be expressed as

$$\mathrm{var}(y_i) = Z_i D Z_i' + R_i = V_i(\tau),$$

where vector τ determines the covariance matrix.

For inference purposes, the GLS estimator of population parameter β is

$$\hat{\beta}_{GLS} = \left(\sum_{i=1}^{n} X_i' V_i^{-1} X_i\right)^{-1} \sum_{i=1}^{n} X_i' V_i^{-1} y_i$$

and its variance is

$$\mathrm{var}\,\hat{\beta}_{GLS} = \left(\sum_{i=1}^{n} X_i' V_i^{-1} X_i\right)^{-1}.$$

Similar to the fixed-effects model, it is easy to show that the MLE under multivariate normality is the same as the GLS estimator of $\beta$. For feasible estimates, we discuss likelihood-based methods for the estimation of the variance components. Using $\hat{\beta}_{GLS}$, the concentrated log-likelihood function is

$$l_{ML}(\hat{\beta}_{GLS}(\tau), \tau) = -\frac{1}{2}\left(\sum_{i=1}^{n} \log\det V_i(\tau) + \sum_{i=1}^{n} (y_i - X_i\hat{\beta}_{GLS}(\tau))' V_i(\tau)^{-1} (y_i - X_i\hat{\beta}_{GLS}(\tau))\right).$$

Viewing $\hat{\beta}_{GLS}$ as a function of $\tau$, one maximizes the log-likelihood with respect to $\tau$. This can be done using either the Newton-Raphson or the Fisher scoring method. As in OLS regression, the MLEs of the variance components are biased downward. To mitigate the bias, one can employ restricted maximum likelihood by modifying the concentrated log-likelihood function:

$$l_{REML}(\hat{\beta}_{GLS}(\tau), \tau) \equiv l_{ML}(\hat{\beta}_{GLS}(\tau), \tau) - \frac{1}{2}\log\det\left(\sum_{i=1}^{n} X_i' V_i(\tau)^{-1} X_i\right). \qquad (15.12)$$

Now we examine the so-called error components model (or random intercept model), a special case that is important in actuarial science, where $z_{it} = 1$ and $\mathrm{var}(\varepsilon_i) = \sigma^2 I_i$. See Sections 15.3 and 15.3.2 for more examples of this specification. The model becomes

$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}.$$

The model has the same presentation as the basic fixed-effects model and assumes no serial correlation within each subject. The difference is that the subject-specific intercept $\alpha_i$ is assumed to be random with zero mean and variance $\sigma_\alpha^2$. The error components model corresponds to a random sampling scheme where the subjects form a random subset of a population. One can show that the variance of subject $i$ is

$$\mathrm{var}\,y_i = \sigma_\alpha^2 J_i + \sigma^2 I_i = V_i = (\sigma_\alpha^2 + \sigma^2)\begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix},$$

where $J_i$ is a $T_i \times T_i$ matrix with all elements equal to one, $I_i$ is the $T_i$-dimensional identity matrix, and $\rho = \sigma_\alpha^2/(\sigma^2 + \sigma_\alpha^2)$. Thus, the error components model is equivalent to the model with exchangeable (compound symmetry) serial correlation.
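To make the equivalence concrete, one can rebuild $V_i$ from the REML estimates of the error components model reported below ($\hat{\sigma}_\alpha = 18.057$, $\hat{\sigma} = 19.038$) and compare with the compound-symmetry matrix printed by getVarCov(SCex.fit) above; a minimal sketch:

> # Reconstruct V_i = sigma_alpha^2 J + sigma^2 I from the EC.fit estimates
> sa2 <- 18.05746^2; s2 <- 19.03756^2
> Vi <- sa2 * matrix(1, 5, 5) + s2 * diag(5)
> Vi[1:2, 1:2]    # diagonal 688.50, off-diagonal 326.07, as in getVarCov(SCex.fit)
> sa2/(sa2 + s2)  # rho = 0.47, the CS estimate in Table 15.2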

We implement the error components model EC.fit using the function lme() in the nlme package. The argument random is used to specify the random effects in the mixed-effects model. Comparing with Table 15.2, we notice that the estimates of $\beta$ are the same as in the model with exchangeable serial correlation. The default uses REML to estimate the model parameters. The confidence intervals of the variance components are calculated in a similar way as for the models with serial correlation (see Section 15.2.3) and can be called with the function intervals().

> library(nlme)
> # Error-components model
> EC.fit <- lme(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn, random=~1|TOWNCODE)
> summary(EC.fit)
Linear mixed-effects model fit by REML
 Data: AutoClaimIn
AIC BIC logLik
1303.913 1321.606 -645.9566
Random effects:
Formula: ~1 | TOWNCODE
        (Intercept) Residual
StdDev:    18.05746 19.03756
Fixed effects: AC ~ lnPCI + lnPPSM + YEAR
    Value Std.Error  DF  t-value p-value
(Intercept)  887.8878 206.81071 113 4.293239  0e+00
lnPCI  -91.1979 20.41210 113 -4.467833  0e+00
lnPPSM  21.9614  5.07913 113 4.323844  0e+00
YEAR   3.9119  1.14457 113 3.417801  9e-04 
Correlation:
  (Intr) lnPCI lnPPSM
lnPCI -0.988
lnPPSM  -0.249  0.096
YEAR  0.197 -0.205 -0.082
Standardized Within-Group Residuals:
   Min  Q1   Med  Q3  Max
-2.53017784 -0.61089180 0.01099886 0.50082006  2.91907172
Number of Observations: 145
Number of Groups: 29 
> intervals(EC.fit, which="var-cov")
Approximate 95% confidence intervals
Random Effects:
Level: TOWNCODE
    lower est.  upper
sd((Intercept)) 12.93758 18.05746 25.20347
Within-group standard error:
   lower     est.    upper
16.72928 19.03756 21.66434

A relevant question to ask is whether the subject-specific effects are significant, or whether the intercepts take a common value. Because $\alpha_i$ is random, we wish to test the null hypothesis $H_0: \sigma_\alpha^2 = 0$. We consider the following procedure:

  • Run the pooled cross-sectional model $y_{it} = x_{it}'\beta + \varepsilon_{it}$ and calculate the residuals $e_{it}$.
  • For each subject, compute an estimator of $\sigma_\alpha^2$:

    $$s_i = \frac{1}{T_i(T_i - 1)}\left(T_i^2\,\bar{e}_i^2 - \sum_{t=1}^{T_i} e_{it}^2\right)$$

  • Compute the test statistic and compare it with a quantile of a $\chi^2(1)$ distribution:

    $$TS = \frac{1}{2n}\left(\frac{\sum_{i=1}^{n} s_i\,\sqrt{T_i(T_i - 1)}}{N^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T_i} e_{it}^2}\right)^2$$

In our example, the test statistic equals 56.85, and thus we reject the null hypothesis of a common intercept.

> # Pooling test
> tcode = unique(AutoClaimIn$TOWNCODE)
> n = length(tcode)
> N = nrow(AutoClaimIn)
> T <- rep(NA,n)
> s <- rep(NA,n)
> for (i in 1:n){
+ T[i] <- nrow(subset(AutoClaimIn,TOWNCODE==tcode[i]))
+ s[i] <- (sum(subset(AutoClaimIn,TOWNCODE==tcode[i])$rPool)^2 -
+          sum(subset(AutoClaimIn,TOWNCODE==tcode[i])$rPool^2))/T[i]/(T[i]-1)
+}
> TS <- (sum(s*sqrt(T*(T-1)))*N/sum(AutoClaimIn$rPool^2))^2/2/n
> TS
[1] 56.85278

To incorporate serial correlation in the mixed-effects model, one can use the argument correlation in the lme() function. For example, in the model RE.fit, we use update() to include AR(1) temporal correlation in the error components model. Here we see that, with a subject-specific intercept, the serial correlation (-0.014) is not significant. The function getVarCov() can be used to output the variance-covariance matrix: the argument type="conditional" provides the estimate of $R_i$ and the argument type="marginal" provides the estimate of $V_i$. We further perform a likelihood ratio test for the serial correlation using anova. Consistently, the large p-value does not show support for serial correlation in the error components model. Note: we use method="ML" to get the true log-likelihood value for this test.

> # Error components with AR(1)
> RE.fit <- update(EC.fit, correlation=corAR1(form=~1|TOWNCODE))
> summary(RE.fit)
 Linear mixed-effects model fit by REML
	Data: AutoClaimIn
		AIC  BIC logLik
	  1305.897 1326.538 -645.9484
 Random effects:
 Formula: ~1 | TOWNCODE
	 (Intercept) Residual
 StdDev: 18.10974 18.9826
 Correlation Structure: AR(1)
 Formula: ~1 | TOWNCODE
Parameter estimate(s):
	 Phi
 -0.01444735
 Fixed effects: AC ~ lnPCI + lnPPSM + YEAR
		Value Std.Error  DF  t-value p-value
(Intercept) 887.8789 206.74423 113 4.294577 0e+00
lnPCI	 -91.2038 20.40536 113 -4.469601 0e+00
lnPPSM	  21.9669  5.07795 113 4.325938 0e+00
YEAR	  3.9237  1.13499 113 3.457055 8e-04 
Correlation:
	 (Intr) lnPCI lnPPSM
lnPCI  -0.988
lnPPSM -0.249 0.096
YEAR 0.198 -0.207 -0.082
Standardized Within-Group Residuals:
	Min  Q1   Med    Q3  Max
-2.55033919 -0.60887177 0.02008323 0.49759528  2.91281638
Number of Observations: 145
Number of Groups: 29
> intervals(RE.fit, which="var-cov")
Approximate 95% confidence intervals
 Random Effects:
  Level: TOWNCODE
    lower est. upper
sd((Intercept)) 12.96079 18.10974 25.30422
Correlation structure:
  lower   est. upper
Phi -0.2431935 -0.01444735 0.215821
attr(,"label")
[1] "Correlation structure:"
Within-group standard error:
  lower est.   upper
16.55969 18.98260 21.76003
> # Get variance components
> getVarCov(RE.fit)
Random effects variance covariance matrix
	  (Intercept)
(Intercept) 327.96
  Standard Deviations: 18.11
> getVarCov(RE.fit, type="conditional")
TOWNCODE 10
Conditional variance covariance matrix
	  1		 2	  3  	  4   5
1 3.6034e+02 -5.2059000  0.075212 -0.0010866  1.5699e-05
2  -5.2059e+00  360.3400000 -5.205900  0.0752120 -1.0866e-03
3 7.5212e-02 -5.2059000 360.340000 -5.2059000  7.5212e-02
4  -1.0866e-03 0.0752120 -5.205900 360.3400000 -5.2059e+00
5 1.5699e-05 -0.0010866  0.075212 -5.2059000  3.6034e+02
Standard Deviations: 18.983 18.983 18.983 18.983 18.983
> getVarCov(RE.fit, type="marginal")
TOWNCODE 10
 Marginal variance covariance matrix
  1  2  3  4  5
1  688.30 322.76 328.04 327.96 327.96
2  322.76 688.30 322.76 328.04 327.96
3  328.04 322.76 688.30 322.76 328.04
4  327.96 328.04 322.76 688.30 322.76
5  327.96 327.96 328.04 322.76 688.30
Standard Deviations: 26.236 26.236 26.236 26.236 26.236
> # Likelihood ratio test
> EC.fit.ml <- lme(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ random=~1|TOWNCODE, method="ML")
> RE.fit.ml <- update(EC.fit, correlation=corAR1(form=~1|TOWNCODE), method="ML")
> anova(EC.fit.ml, RE.fit.ml)
	Model df AIC  BIC logLik  Test L.Ratio  p-value
EC.fit.ml  1 6 1323.212 1341.073 -655.6062
RE.fit.ml  2 7 1325.171 1346.009 -655.5857 1 vs 2 0.04087198 0.8398

We conclude this section with the Hausman test. We have discussed the linear fixed- effects panel data model and the linear mixed-effects model. Both allow for subject specific heterogeneity but with different assumptions. An interesting question is how to choose from the two classes, that is, whether to treat αi as fixed or random. A possible solution is to refer to the Hausman test (see Hausman (1978)) with test statistic given by

$$TS = (\hat{\beta}_{FE} - \hat{\beta}_{GLS})'\left(\widehat{\mathrm{var}}\,\hat{\beta}_{FE} - \widehat{\mathrm{var}}\,\hat{\beta}_{GLS}\right)^{-1}(\hat{\beta}_{FE} - \hat{\beta}_{GLS}),$$

where $\hat{\beta}_{FE}$ and $\hat{\beta}_{GLS}$ denote the fixed-effects estimator and the random-effects estimator, respectively. We compare the test statistic with a quantile of a $\chi^2(q)$ distribution; a large value supports the fixed-effects estimator. As an example, we compare the basic fixed-effects model with the error components model. The observed value of the test statistic is 3.97, supporting the error components formulation.

> # Hausman test
> Var.FE <- vcov(FE.fit)[-(1:n),-(1:n)]
> Var.EC <- vcov(EC.fit)[-1,-1]
> beta.FE <- coef(FE.fit)[-(1:n)]
> beta.EC <- fixef(EC.fit)[-1]
> ChiSq <- t(beta.FE-beta.EC)%*%solve(Var.FE-Var.EC)%*%(beta.FE-beta.EC)
> ChiSq
  [,1]
[1,] 3.970489

15.2.5 Prediction

This section reviews prediction for longitudinal data mixed-effects models (as discussed in Section 15.2.4). In previous sections, we discussed the estimation and inference of the fixed parameters $\beta$ in the model. It is also of interest to summarize the subject-specific effects described by the random variable $\alpha_i$. For example, in credibility theory, one is interested in the prediction of the expected claims for a policyholder given his risk class. In doing so, we develop the best linear unbiased predictor (BLUP) of a random variable. Predictors are said to be linear if they are formed from a linear combination of the responses, and BLUPs are constructed by minimizing the mean squared error.

In a linear mixed-effects model, where $E(y_i) = X_i\beta$ and $\mathrm{var}(y_i) = Z_i D Z_i' + R_i = V_i$, we wish to predict a random variable $\eta$ with $E(\eta) = c'\beta$ and $\mathrm{Var}(\eta) = \sigma_\eta^2$. Let $\hat{\beta}_{GLS}$ be the generalized least squares estimator of $\beta$; then the BLUP of $\eta$ is

$$\eta_{BLUP} = c'\hat{\beta}_{GLS} + \sum_{i=1}^{n}\mathrm{Cov}(\eta, y_i)' V_i^{-1}(y_i - X_i\hat{\beta}_{GLS})$$

and the mean squared error is

$$\begin{aligned}
\mathrm{var}(\eta_{BLUP} - \eta) = {} & \left(c' - \sum_{i=1}^{n}\mathrm{Cov}(\eta, y_i)' V_i^{-1} X_i\right)\left(\sum_{i=1}^{n} X_i' V_i^{-1} X_i\right)^{-1}\left(c' - \sum_{i=1}^{n}\mathrm{Cov}(\eta, y_i)' V_i^{-1} X_i\right)' \\
& - \sum_{i=1}^{n}\mathrm{Cov}(\eta, y_i)' V_i^{-1}\mathrm{Cov}(\eta, y_i) + \sigma_\eta^2.
\end{aligned}$$

For example, consider the special case $\eta = w_1'\alpha_i + w_2'\beta$, a linear combination of population parameters and subject-specific effects. Using the above relation, we can show that

$$\hat{\eta}_{BLUP} = w_1' D Z_i' V_i^{-1}(y_i - X_i\hat{\beta}_{GLS}) + w_2'\hat{\beta}_{GLS}.$$

Taking $w_2 = 0$, we further have the BLUP of $\alpha_i$:

$$\hat{\alpha}_{i,BLUP} = D Z_i' V_i^{-1}(y_i - X_i\hat{\beta}_{GLS}).$$

Another special case that is useful for diagnostics is the residual, $\eta = \varepsilon_{it}$. In this case, we have $c = 0$ and the BLUP is straightforwardly shown to be

$$\hat{e}_{it,BLUP} = y_{it} - \left(z_{it}'\hat{\alpha}_{i,BLUP} + x_{it}'\hat{\beta}_{GLS}\right).$$
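For the error components model, these formulas can be evaluated by hand. The sketch below (object names ours) computes the BLUP of the intercept for the first town, which reduces to the familiar credibility shrinkage of the town's average residual; it should match the first element of ranef(EC.fit) shown further below (about -0.205).

> # BLUP of alpha_i computed by hand for TOWNCODE 10 (error components model)
> sa2 <- 18.05746^2; s2 <- 19.03756^2     # REML estimates from EC.fit
> d  <- subset(AutoClaimIn, TOWNCODE == 10)
> Xi <- cbind(1, d$lnPCI, d$lnPPSM, d$YEAR)
> Vi <- sa2 * matrix(1, 5, 5) + s2 * diag(5)
> res <- d$AC - Xi %*% fixef(EC.fit)
> sa2 * t(rep(1, 5)) %*% solve(Vi) %*% res   # D Z' V^{-1} (y - X beta)
> (5 * sa2/(s2 + 5 * sa2)) * mean(res)       # equivalent shrinkage form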

Some special cases of BLUPs are available in the package nlme. For the error-components model EC.fit, the function ranef() can be used to obtain the BLUPs of the random intercepts $\hat{\alpha}_{i,BLUP}$, and the function residuals() can be used to obtain the BLUPs of the residuals $\hat{e}_{it,BLUP}$ and their standardized version.

> # BLUP
> alpha.BLUP <- ranef(EC.fit)
> beta.GLS <- fixef(EC.fit)
> resid.BLUP <- residuals(EC.fit, type="response")
> rstandard.BLUP <- residuals(EC.fit, type="normalized")
> alpha.BLUP
(Intercept)
10 -0.2049993
11 -6.9197373
12 17.7349235
13 20.9538588
14 -0.1942180
15 -5.6464625
et cetera

To conclude this section, we compare the performance of the alternative models using the automobile insurance data. Our interest is to predict the expected claims of each policyholder in the next year, so the quantity of interest is $\eta = E(y_{i,T_i+1}|\alpha_i)$. The corresponding BLUP is $\hat{\eta}_{BLUP} = z_{i,T_i+1}'\hat{\alpha}_{i,BLUP} + x_{i,T_i+1}'\hat{\beta}_{GLS}$.

Recall that we developed the various longitudinal data models using the data of years 1993-1997, and we use the data of year 1998 to validate the predictions. Table 15.3 presents the performance of the various longitudinal data models based on both in-sample and out-of-sample data. For the in-sample data, we report the information-based model selection criteria AIC and BIC. For the out-of-sample data, we report the sum of squared prediction errors (SSPE) and the sum of absolute prediction errors (SAPE). The results show that the models that account for subject-specific effects perform better, regardless of the way in which heterogeneity is accommodated.

> # Use data of year 1998 for validation
> AutoClaimOut <- subset(AutoClaim, YEAR == 1998)
> # Define new variables
> AutoClaimOut$lnPCI <- log(AutoClaimOut$PCI)
> AutoClaimOut$lnPPSM <- log(AutoClaimOut$PPSM)
> AutoClaimOut$YEAR <- AutoClaimOut$YEAR-1992
> # Compare models Pool.fit, SCar.fit, FE.fit, EC.fit, RE.fit and FEar.fit
> # Fixed-effects model with AR(1)
> FEar.fit <- gls(AC ~ factor(TOWNCODE)+lnPCI+lnPPSM+YEAR-1,
+ data=AutoClaimIn, correlation=corAR1(form=~1|TOWNCODE))
> FEar.fit.ml <- gls(AC ~ factor(TOWNCODE)+lnPCI+lnPPSM+YEAR-1,
+ data=AutoClaimIn, correlation=corAR1(form=~1|TOWNCODE), method="ML")
> # Prediction
> Xmat <- cbind(rep(1,nrow(AutoClaimOut)),AutoClaimOut$lnPCI,
+ AutoClaimOut$lnPPSM,AutoClaimOut$YEAR)
> beta.Pool <- coef(Pool.fit)
> pred.Pool <- Xmat%*%beta.Pool
> SSPE.Pool <- sum((pred.Pool - AutoClaimOut$AC)^2)
> SAPE.Pool <- sum(abs(pred.Pool - AutoClaimOut$AC))
> beta.SCar <- coef(SCar.fit)
> pred.SCar <- Xmat%*%beta.SCar
> SSPE.SCar <- sum((pred.SCar - AutoClaimOut$AC)^2)
> SAPE.SCar <- sum(abs(pred.SCar - AutoClaimOut$AC))
> beta.FE <- coef(FE.fit)[-(1:29)]
> pred.FE <- coef(FE.fit)[1:29] + Xmat[,-1]%*%beta.FE
> SSPE.FE <- sum((pred.FE - AutoClaimOut$AC)^2)
> SAPE.FE <- sum(abs(pred.FE - AutoClaimOut$AC))
> beta.FEar <- coef(FEar.fit)[-(1:29)]
> pred.FEar <- coef(FEar.fit)[1:29] + Xmat[,-1]%*%beta.FEar
> SSPE.FEar <- sum((pred.FEar - AutoClaimOut$AC)^2)
> SAPE.FEar <- sum(abs(pred.FEar - AutoClaimOut$AC))
> alpha.EC <- ranef(EC.fit)
> beta.EC <- fixef(EC.fit)
> pred.EC <- alpha.EC + Xmat%*%beta.EC
> SSPE.EC <- sum((pred.EC - AutoClaimOut$AC)^2)
> SAPE.EC <- sum(abs(pred.EC - AutoClaimOut$AC))
> alpha.RE <- ranef(RE.fit)
> beta.RE <- fixef(RE.fit)
> pred.RE <- alpha.RE + Xmat%*%beta.RE
> SSPE.RE <- sum((pred.RE - AutoClaimOut$AC)^2)
> SAPE.RE <- sum(abs(pred.RE - AutoClaimOut$AC))

Table 15.3

Comparison of alternative models.

                                      In-Sample            Out-of-Sample
                                     AIC       BIC        SSPE       SAPE
Pooled cross-sectional model       1359.50   1374.38   22201.78     681.25
Pooled cross-sectional with AR(1)  1339.34   1357.21   21242.64     658.98
Fixed-effects model                1293.33   1391.56   21506.07     660.59
Fixed-effects with AR(1)           1286.03   1387.24   21573.79     662.04
Error-components model             1323.21   1341.07   19515.86     619.44
Error-components with AR(1)        1325.17   1346.01   19572.94     620.64

15.3 Generalized Linear Models for Longitudinal Data

As in the previous section, we have a dataset at our disposal consisting of $n$ subjects, where for each subject $i$ ($1 \le i \le n$), $T_i$ observations are available. Relevant examples in experience rating are (among others) a dataset with $n$ policyholders followed over time, for which claim counts and severities are registered during each time period under consideration. As explained in Section 15.1 and demonstrated in Section 15.2 for linear models, we extend the GLMs discussed in Chapter 14 by including subject- (or policyholder-) specific random effects. The random effects structure the correlation between observations registered on the same subject, and also take heterogeneity among subjects, due to unobserved characteristics, into account. Therefore, our approach is in line with the random effects approach discussed in Section 15.2.4. Other methods exist for the analysis of longitudinal data in the framework of generalized linear models (the so-called marginal and conditional models; see Verbeke & Molenberghs (2000) and Antonio & Zhang (2014) for a discussion), but those will not be covered here.

15.3.1 Specifying Generalized Linear Models with Random Effects

Given the vector $\alpha_i$ with the random effects for subject $i$, the repeated measurements $Y_{i1}, \ldots, Y_{iT_i}$ are assumed to be independent with a density from the exponential family,

$$f(y_{it}|\alpha_i, \beta, \phi) = \exp\left(\frac{y_{it}\theta_{it} - \psi(\theta_{it})}{\phi} + c(y_{it}, \phi)\right), \quad t = 1, \ldots, T_i. \qquad (15.13)$$

Some explicit examples follow in the illustrations discussed below. Similar to expressions obtained in Chapter 14, the following (conditional) relations hold:

$$\mu_{it} = E[Y_{it}|\alpha_i] = \psi'(\theta_{it}) \quad\text{and}\quad \mathrm{var}[Y_{it}|\alpha_i] = \phi\,\psi''(\theta_{it}) = \phi\, V(\mu_{it}), \qquad (15.14)$$

where $g(\mu_{it}) = z_{it}'\alpha_i + x_{it}'\beta$. As before, $g(\cdot)$ is called the link function and $V(\cdot)$ the variance function. $\beta$ ($p \times 1$) denotes the fixed-effects parameter vector (governing a priori rating) and $\alpha_i$ ($q \times 1$) the random-effects vector. $x_{it}$ ($p \times 1$) and $z_{it}$ ($q \times 1$) contain subject $i$'s covariate information for the fixed and random effects, respectively. The specification of the GLMM is completed by assuming that the random effects $\alpha_i$ ($i = 1, \ldots, n$) are mutually independent and identically distributed with density function $f(\alpha_i|\nu)$. Herewith, $\nu$ denotes the unknown parameters in the density. In general statistics, the random effects often have a (multivariate) normal distribution with zero mean and a covariance matrix determined by $\nu$. Observations on the same subject are dependent because they share the same random effects $\alpha_i$.

The likelihood function for the unknown parameters $\beta$, $\nu$, and $\phi$ then becomes

$$L(\beta, \nu, \phi; y) = \prod_{i=1}^{n} f(y_i|\beta, \nu, \phi) = \prod_{i=1}^{n}\int\prod_{t=1}^{T_i} f(y_{it}|\alpha_i, \beta, \phi)\, f(\alpha_i|\nu)\, d\alpha_i, \qquad (15.15)$$

where $y = (y_1', \ldots, y_n')'$ and the integral is with respect to the $q$-dimensional vector $\alpha_i$. For instance, with normally distributed data and random effects (our setting in Section 15.2), the integral can be worked out analytically, and explicit expressions follow for the maximum likelihood estimator of $\beta$ and the Best Linear Unbiased Predictor (BLUP) of $\alpha_i$. For more general GLMMs, however, approximations to the likelihood or numerical integration techniques are required to maximize Equation (15.15) with respect to the unknown parameters. Such techniques are discussed (and demonstrated) in Antonio & Zhang (2014) (and references therein).
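To give a feel for Equation (15.15), the sketch below evaluates the marginal likelihood contribution of a single subject in a Poisson GLMM with a normal random intercept by one-dimensional numerical integration; the claim history y and the values of eta and sigma are hypothetical, and the function name lik.i is ours.

> # Likelihood contribution of one subject in a Poisson GLMM, obtained by
> # numerical integration over the random intercept (hypothetical inputs)
> lik.i <- function(y, eta, sigma) {
+   integrand <- function(a)
+     sapply(a, function(ai) prod(dpois(y, lambda = exp(eta + ai)))) *
+       dnorm(a, mean = 0, sd = sigma)
+   integrate(integrand, lower = -Inf, upper = Inf)$value
+ }
> lik.i(y = c(0, 1, 0, 0), eta = log(0.15), sigma = 0.5)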

To illustrate the concepts described above, we now consider a Poisson GLMM with normally distributed random intercept, that is, a Poisson error components model. This GLMM allows explicit calculation of the marginal mean and covariance matrix. In this way, one can clearly see how the inclusion of the random effect leads to overdispersion and within-subject covariance.

Example 15.1 (A Poisson GLMM) Let $N_{it}$ denote the claim frequency registered in year $t$ for policyholder $i$. Assume that, conditional on $\alpha_i$, $N_{it}$ follows a Poisson distribution with mean $E[N_{it}|\alpha_i] = \exp(x_{it}'\beta + \alpha_i)$ and that $\alpha_i \sim N(0, \sigma_b^2)$.

Straightforward calculations lead to

$$\mathrm{Var}(N_{it}) = \mathrm{Var}(E(N_{it}|\alpha_i)) + E(\mathrm{Var}(N_{it}|\alpha_i)) = E(N_{it})\left(\exp(x_{it}'\beta)\left[\exp(3\sigma_b^2/2) - \exp(\sigma_b^2/2)\right] + 1\right), \qquad (15.16)$$

and

$$\mathrm{Cov}(N_{it_1}, N_{it_2}) = \mathrm{Cov}(E(N_{it_1}|\alpha_i), E(N_{it_2}|\alpha_i)) + E(\mathrm{Cov}(N_{it_1}, N_{it_2}|\alpha_i)) = \exp(x_{it_1}'\beta)\exp(x_{it_2}'\beta)\left(\exp(2\sigma_b^2) - \exp(\sigma_b^2)\right). \qquad (15.17)$$

Hereby, we used the expressions for the mean and variance of a log-normal distribution. In the expression for the covariance, we used the fact that, given the random effect $\alpha_i$, $N_{it_1}$ and $N_{it_2}$ are independent. We see that the expression within the outer parentheses in Equation (15.16) is always bigger than 1. Thus, although $N_{it}|\alpha_i$ follows a regular Poisson distribution, the marginal distribution of $N_{it}$ is overdispersed. According to Equation (15.17), due to the random intercept, observations on the same subject are no longer independent.
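The overdispersion formula (15.16) is easy to verify by simulation; a minimal sketch with hypothetical values for $x_{it}'\beta$ and $\sigma_b^2$:

> # Simulation check of the marginal moments of the Poisson GLMM
> set.seed(1)
> eta <- log(0.2); sigma.b <- 0.5                   # hypothetical values
> alpha <- rnorm(1e6, 0, sigma.b)
> N <- rpois(1e6, exp(eta + alpha))
> c(mean(N), exp(eta + sigma.b^2/2))                # marginal mean
> c(var(N),
+   exp(eta + sigma.b^2/2) *
+     (exp(eta) * (exp(3*sigma.b^2/2) - exp(sigma.b^2/2)) + 1))  # (15.16)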

Example 15.2 (A Poisson GLMM, continued) Let $N_{it}$ again denote the claim frequency for policyholder $i$ in year $t$. Assume that, conditional on $\alpha_i$, $N_{it}$ follows a Poisson distribution with mean $E[N_{it}|\alpha_i] = \exp(x_{it}'\beta + \alpha_i)$ and that $\alpha_i \sim N(-\frac{\sigma_b^2}{2}, \sigma_b^2)$. This re-parameterization is commonly used in ratemaking. Indeed, we now get

$$E[N_{it}] = E[E[N_{it}|\alpha_i]] = \exp\left(x_{it}'\beta - \frac{\sigma_b^2}{2} + \frac{\sigma_b^2}{2}\right) = \exp(x_{it}'\beta), \qquad (15.18)$$

and

$$E[N_{it}|\alpha_i] = \exp(x_{it}'\beta + \alpha_i). \qquad (15.19)$$

This specification shows that the a priori premium, given by $\exp(x_{it}'\beta)$, is correct on average. The a posteriori correction to this premium is determined by $\exp(\alpha_i)$. Besides the log-normal distribution from the above examples, other mixing distributions can be used. In the Poisson-Gamma framework, for instance, the conjugacy of these distributions allows for explicit calculation of the predictive premium.

Example 15.3 (A Poisson-Gamma rating model) Assume


$$N_{it} \sim \mathrm{Poi}(b_i\lambda_{it}), \text{ where } \lambda_{it} = \exp(x_{it}'\beta) \text{ and } b_i \sim \Gamma(a, a).$$

It follows that E[bi]=1 and the resulting joint, unconditional distribution then becomes

$$\Pr(N_{i1} = n_{i1}, \ldots, N_{iT_i} = n_{iT_i}) = \left(\prod_{t=1}^{T_i}\frac{\lambda_{it}^{n_{it}}}{n_{it}!}\right)\frac{\Gamma\left(\sum_{t=1}^{T_i} n_{it} + a\right)}{\Gamma(a)}\left(\frac{a}{\sum_{t=1}^{T_i}\lambda_{it} + a}\right)^a \times \left(\sum_{t=1}^{T_i}\lambda_{it} + a\right)^{-\sum_{t=1}^{T_i} n_{it}}, \qquad (15.20)$$

with $E[N_{it}] = E[E[N_{it}|b_i]] = \lambda_{it}$ and $\mathrm{Var}[N_{it}] = E[\mathrm{Var}[N_{it}|b_i]] + \mathrm{Var}[E[N_{it}|b_i]] = \lambda_{it} + \frac{1}{a}\lambda_{it}^2$. For the specification in Equation (15.20), the posterior distribution of the random intercept $b_i$ is again a Gamma distribution, with

$$f(b_i|N_{i1} = n_{i1}, \ldots, N_{iT_i} = n_{iT_i}) \sim \Gamma\left(\sum_{t=1}^{T_i} n_{it} + a,\ \sum_{t=1}^{T_i}\lambda_{it} + a\right). \qquad (15.21)$$

The (conditional) mean and variance of this posterior distribution are given, respectively, by

$$E[b_i|N_{it} = n_{it},\, t = 1, \ldots, T_i] = \frac{a + \sum_{t=1}^{T_i} n_{it}}{a + \sum_{t=1}^{T_i}\lambda_{it}} \quad\text{and} \qquad (15.22)$$

$$\mathrm{Var}[b_i|N_{it} = n_{it},\, t = 1, \ldots, T_i] = \frac{a + \sum_{t=1}^{T_i} n_{it}}{\left(a + \sum_{t=1}^{T_i}\lambda_{it}\right)^2}. \qquad (15.23)$$

This leads to the following a posteriori premium:

$$E[N_{i,T_i+1}|N_{it} = n_{it},\, t = 1, \ldots, T_i] = \lambda_{i,T_i+1}\, E[b_i|N_{it} = n_{it},\, t = 1, \ldots, T_i] = \lambda_{i,T_i+1}\left\{\frac{a + \sum_{t=1}^{T_i} n_{it}}{a + \sum_{t=1}^{T_i}\lambda_{it}}\right\}. \qquad (15.24)$$

The above credibility premium is optimal under a quadratic loss function: as is known from mathematical statistics, the conditional expectation minimizes the mean squared error criterion.
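A small numerical illustration of the credibility premium (15.24), with hypothetical values for $a$, the a priori frequencies, and the observed claim counts:

> # A posteriori premium in the Poisson-Gamma model (hypothetical inputs)
> a <- 1.5                       # Gamma shape/rate parameter
> lambda  <- rep(0.12, 5)        # a priori expected frequencies, years 1-5
> nclaims <- c(0, 1, 0, 0, 0)    # observed claim counts
> lambda6 <- 0.12                # a priori frequency for year 6
> lambda6 * (a + sum(nclaims))/(a + sum(lambda))   # Equation (15.24)
[1] 0.1428571

A policyholder with one claim in five years thus pays about 19% more than the a priori premium of 0.12; a claim-free record would instead yield a discount.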

Experience rating based on multilevel (panel or higher-order) models poses a challenge to the insurer when it comes to communicating the predictive results of these models to policyholders. It is not readily transparent to an ordinary policyholder how the surcharges (maluses) for reported claims and the discounts (bonuses) for claim-free periods are evaluated. In order to establish an experience rating system in which insureds can easily understand the effect of reported claims or periods without claims, Bonus-Malus scales have been developed. We develop a case study (using R) of such scales in Section 15.3.2.

15.3.2 Case Study: Experience Rating with Bonus-Malus Scales in R

We now demonstrate how the statistical models from Section 15.3 allow us to develop a specific type of experience rating system, namely a Bonus-Malus (BM) scale. This type of experience rating is very common in motor (or vehicle) insurance. See Lemaire (1984) and Denuit et al. (2007) for detailed discussions. In a BM scale, an a priori tariff is adjusted based on the claim history of a policyholder. A "good" history will create a bonus, and therefore a premium reduction. A "bad" performance causes a malus and penalizes the policyholder with a premium increase. We closely follow Denuit et al. (2007) in this section and extend the discussion in Antonio & Valdez (2012) with an implementation in R of a simple BM scale.

Experience rating with a BM scale is appealing from a commercial and communication point of view. An insurer can easily explain to a customer how his claims reported in year $t$ will change the premium applicable in year $t+1$ for automobile insurance. To discuss the probabilistic, statistical, as well as computational aspects of Bonus-Malus scales, a credibility model similar to the one in Example 15.3 is assumed. Let $N_{it}$ denote the number of claims registered for policyholder $i$ in year $t$. Our credibility model is structured as follows:

  • Policy(holder) $i$ of the portfolio ($i = 1, \ldots, n$) is represented by a sequence $(\Theta_i, N_i)$, where $N_i = (N_{i1}, N_{i2}, \ldots)$ and $\Theta_i$ represents unexplained heterogeneity and has mean 1;
  • Given $\Theta_i = \theta$, the random variables $N_{it}$ ($t = 1, 2, \ldots$) are independent and $\mathcal{P}(\lambda_{it}\theta)$ (Poisson) distributed; and
  • The sequences $(\Theta_i, N_i)$ ($i = 1, \ldots, n$) are assumed to be independent.

15.3.2.1 Bonus—Malus Scales

A BM scale consists of a certain number of levels, say $s + 1$, numbered from 0 to $s$, with 0 being the best level. Let $\ell_0$ be the entrance level of a new driver. According to the number of claims reported during the insured period, drivers move up and down the scale. A claim-free year results in a bonus point, which implies that the driver goes one level down. Claims are penalized by malus points, meaning that for each claim filed, the driver goes up a certain number of levels, denoted by pen (for penalty). We introduce a set of random variables that allows us to describe the technicalities of a BM scale. $L_k$ represents the level occupied by the driver in the time interval $(k, k+1)$. Thus, $L_k$ takes a value in $\{0, \ldots, s\}$, and $\{L_1, L_2, \ldots\}$ is the driver's trajectory over time. With $N_k$ the number of claims reported by the insured in the period $(k-1, k)$, the future level $L_k$ of an insured is obtained from the present level $L_{k-1}$ and the number of claims $N_k$ reported during the present year. We recognize the so-called Markov property: the future depends on the present but not on the past. The relativity $r_\ell$ associated with each level $\ell$ in the scale determines the premium discount or penalty awarded to the driver. A policyholder who has at present

Table 15.4

Transitions in the (-1/top scale) BM system.

Starting    Level Occupied if ...
Level       0 Claims Reported    >= 1 Claim is Reported
0                  0                      5
1                  0                      5
2                  1                      5
3                  2                      5
4                  3                      5
5                  4                      5

an a priori premium $\lambda_{it}$ (determined using the techniques from Chapter 14) and who is in level $\ell$ has to pay $r_\ell \times \lambda_{it}$. With $r_\ell < 1$ the driver receives a discount, based on a favorable record of past claims; with $r_\ell > 1$ the driver is penalized for his past performance. The relativities, together with the transition rules in the scale, are the commercial alternative for the credibility-type corrections to an a priori tariff, as discussed above. We demonstrate in this section the calculation of these relativities for a given portfolio and BM scale.

Example 15.4 (-1/Top Scale) We consider a very simple example of a BM scale to illustrate the concepts: the (-1/top scale). See Denuit et al. (2007) for more realistic examples. This scale has six levels, numbered 0, 1, ..., 5. The starting class is level 5. Each claim-free year is rewarded with one bonus class. When an accident is reported, the policyholder is transferred to level 5. Table 15.4 represents these transitions.

15.3.2.2 Transition Rules, Transition Probabilities and Stationary Distribution

To enable the calculation of the relativity corresponding with each level $\ell$, some probabilistic concepts associated with BM scales must be introduced. The transition rules corresponding with a certain BM scale are indicator variables $t_{ij}(k)$ such that

$$t_{ij}(k) = \begin{cases} 1 & \text{if the policy transfers from level } i \text{ to level } j \text{ when } k \text{ claims are reported,} \\ 0 & \text{otherwise.} \end{cases} \qquad (15.25)$$

We define the transition matrix T(k), with k the number of claims reported by the driver,

$$T(k) = \begin{pmatrix} t_{00}(k) & t_{01}(k) & \cdots & t_{0s}(k) \\ t_{10}(k) & t_{11}(k) & \cdots & t_{1s}(k) \\ \vdots & & & \vdots \\ t_{s0}(k) & t_{s1}(k) & \cdots & t_{ss}(k) \end{pmatrix}. \qquad (15.26)$$

Thus, $T(k)$ is a 0-1 matrix and each row has exactly one 1.

Assuming $N_1, N_2, \ldots$ are independent and $\mathcal{P}(\theta)$ distributed, the trajectory this driver follows through the scale is represented as $\{L_1(\theta), L_2(\theta), \ldots\}$. The transition probability of this driver to go from level $\ell_1$ to level $\ell_2$ in a single step is

$$p_{\ell_1\ell_2}(\theta) = P[L_{k+1}(\theta) = \ell_2 | L_k(\theta) = \ell_1] = \sum_{n=0}^{+\infty} P[L_{k+1}(\theta) = \ell_2 | N_{k+1} = n, L_k(\theta) = \ell_1]\, P[N_{k+1} = n] = \sum_{n=0}^{+\infty}\frac{\theta^n}{n!}\exp(-\theta)\, t_{\ell_1\ell_2}(n), \qquad (15.28)$$

where we used the independence of $N_{k+1}$ and $L_k(\theta)$. In matrix form, the one-step transition matrix $P(\theta)$ is given by

$$P(\theta) = \begin{pmatrix} p_{00}(\theta) & p_{01}(\theta) & \cdots & p_{0s}(\theta) \\ p_{10}(\theta) & p_{11}(\theta) & \cdots & p_{1s}(\theta) \\ \vdots & & & \vdots \\ p_{s0}(\theta) & p_{s1}(\theta) & \cdots & p_{ss}(\theta) \end{pmatrix}. \qquad (15.29)$$

The probability of being transferred from level $i$ to level $j$ in $n$ steps is the $n$-step transition probability $p_{ij}^{(n)}(\theta)$,

$$p_{ij}^{(n)}(\theta) = P[L_{k+n}(\theta) = j | L_k(\theta) = i] = \sum_{i_1=0}^{s}\sum_{i_2=0}^{s}\cdots\sum_{i_{n-1}=0}^{s} p_{ii_1}(\theta)\, p_{i_1i_2}(\theta)\cdots p_{i_{n-1}j}(\theta), \qquad (15.30)$$

which composes the $n$-step transition matrix $P^{(n)}(\theta)$:

$$P^{(n)}(\theta) = \begin{pmatrix} p_{00}^{(n)}(\theta) & p_{01}^{(n)}(\theta) & \cdots & p_{0s}^{(n)}(\theta) \\ p_{10}^{(n)}(\theta) & p_{11}^{(n)}(\theta) & \cdots & p_{1s}^{(n)}(\theta) \\ \vdots & & & \vdots \\ p_{s0}^{(n)}(\theta) & p_{s1}^{(n)}(\theta) & \cdots & p_{ss}^{(n)}(\theta) \end{pmatrix}. \qquad (15.31)$$

The following relation holds between the one-step and $n$-step transition matrices: $P^{(n)}(\theta) = \{P(\theta)\}^n$, the $n$-th matrix power of $P(\theta)$.

Ultimately, the BM system will stabilize and the proportion of policyholders occupying each level of the scale will remain unchanged. These proportions are captured in the stationary distribution $\pi(\theta) = (\pi_0(\theta), \ldots, \pi_s(\theta))'$, which is defined as

$$\pi_{\ell_2}(\theta) = \lim_{n \to +\infty} p^{(n)}_{\ell_1 \ell_2}(\theta), \tag{15.32}$$

independent of the starting level $\ell_1$.

Correspondingly, $P^{(n)}(\theta)$ converges to $\Pi(\theta)$, defined as

$$\lim_{n \to +\infty} P^{(n)}(\theta) = \Pi(\theta) = \begin{pmatrix} \pi'(\theta) \\ \pi'(\theta) \\ \vdots \\ \pi'(\theta) \end{pmatrix}. \tag{15.33}$$

For the BM scale introduced in Example 15.3, the transition matrices and the one-step transition probability matrix are given by

$$T(0) = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad T(k) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \text{ for } k \geq 1, \tag{15.34}$$

$$P(\theta) = \begin{pmatrix} \exp(-\theta) & 0 & 0 & 0 & 0 & 1-\exp(-\theta) \\ \exp(-\theta) & 0 & 0 & 0 & 0 & 1-\exp(-\theta) \\ 0 & \exp(-\theta) & 0 & 0 & 0 & 1-\exp(-\theta) \\ 0 & 0 & \exp(-\theta) & 0 & 0 & 1-\exp(-\theta) \\ 0 & 0 & 0 & \exp(-\theta) & 0 & 1-\exp(-\theta) \\ 0 & 0 & 0 & 0 & \exp(-\theta) & 1-\exp(-\theta) \end{pmatrix}. \tag{15.35}$$

In R, we specify this one-step transition matrix P as follows:

Pmatrix = function(th) {
  # one-step transition matrix P(theta) of the (-1/Top Scale), cf. Equation (15.35)
  P = matrix(nrow=6, ncol=6, data=0)
  # a claim-free year (probability exp(-th)) moves the driver one level down
  P[1,1] = P[2,1] = P[3,2] = P[4,3] = P[5,4] = P[6,5] = exp(-th)
  # one or more claims (probability 1 - exp(-th)) transfer the driver to level 5
  P[,6] = 1 - exp(-th)
  return(P)
}
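As a sketch of how Equation (15.28) assembles $P(\theta)$ from the transition rules, the same matrix can be built from $T(0)$ and $T(k)$, $k \geq 1$, of Equation (15.34), weighted by Poisson probabilities; the names T0, T1 and P.check are ours, not the chapter's code:

T0 = matrix(0, nrow=6, ncol=6)
T0[cbind(1:6, c(1, 1:5))] = 1   # claim-free year: one level down
T1 = matrix(0, nrow=6, ncol=6)
T1[,6] = 1                      # one or more claims: straight to level 5
# P(theta) = P[N = 0] T(0) + P[N >= 1] T(1), since T(k) = T(1) for all k >= 1
P.check = function(th) dpois(0, th) * T0 + (1 - dpois(0, th)) * T1
max(abs(P.check(0.1) - Pmatrix(0.1)))  # 0 (up to floating point): both agree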

Using a result from Rolski et al. (1999) (also see Denuit et al. (2007)), the stationary distribution $\pi(\theta)$ can be obtained as $\pi'(\theta) = e'(I - P(\theta) + E)^{-1}$, with $e$ the $(s+1) \times 1$ vector with all entries 1 and $E$ the $(s+1) \times (s+1)$ matrix with all entries 1. For the (−1/Top Scale), this results in

$$\pi'(\theta) = e' \begin{pmatrix} 2-\exp(-\theta) & 1 & 1 & 1 & 1 & \exp(-\theta) \\ 1-\exp(-\theta) & 2 & 1 & 1 & 1 & \exp(-\theta) \\ 1 & 1-\exp(-\theta) & 2 & 1 & 1 & \exp(-\theta) \\ 1 & 1 & 1-\exp(-\theta) & 2 & 1 & \exp(-\theta) \\ 1 & 1 & 1 & 1-\exp(-\theta) & 2 & \exp(-\theta) \\ 1 & 1 & 1 & 1 & 1-\exp(-\theta) & 1+\exp(-\theta) \end{pmatrix}^{-1}. \tag{15.36}$$

We specify the stationary distribution of the (−1/Top Scale) in R:

lim.distr = function(P) {
  # stationary distribution pi'(theta) = e'(I - P(theta) + E)^(-1), cf. Equation (15.36)
  et = matrix(nrow=1, ncol=dim(P)[2], data=1)        # row vector e'
  E = matrix(nrow=dim(P)[1], ncol=dim(P)[2], data=1) # all-ones matrix E
  mat = diag(dim(P)[1]) - P + E
  inverse.mat = solve(mat)
  p = et %*% inverse.mat
  return(p)
}

For instance, with $\theta = 0.1$ (as in the example of Denuit et al. (2007), page 180, Example 4.9), the stationary distribution becomes

$$\pi'(0.1) = (0.6065307, \; 0.06378939, \; 0.07049817, \; 0.07791253, \; 0.08610666, \; 0.09516258). \tag{15.37}$$

In R, we use the following instructions:

> P = Pmatrix(0.1)
> P
          [,1]      [,2]      [,3]      [,4]      [,5]       [,6]
[1,] 0.9048374 0.0000000 0.0000000 0.0000000 0.0000000 0.09516258
[2,] 0.9048374 0.0000000 0.0000000 0.0000000 0.0000000 0.09516258
[3,] 0.0000000 0.9048374 0.0000000 0.0000000 0.0000000 0.09516258
[4,] 0.0000000 0.0000000 0.9048374 0.0000000 0.0000000 0.09516258
[5,] 0.0000000 0.0000000 0.0000000 0.9048374 0.0000000 0.09516258
[6,] 0.0000000 0.0000000 0.0000000 0.0000000 0.9048374 0.09516258
> pi = lim.distr(P)
> pi
          [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
[1,] 0.6065307 0.06378939 0.07049817 0.07791253 0.08610666 0.09516258
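As a quick numerical check of Equations (15.32) and (15.33), the rows of the matrix powers $\{P(\theta)\}^n$ should approach $\pi'(\theta)$, and $\pi'(\theta)$ should be invariant under $P(\theta)$; a minimal sketch:

Pn = P
for (k in 1:200) Pn = Pn %*% P  # approximates P^(n) for large n
round(Pn[1,], 7)                # (approximately) equal to pi, for every row
max(abs(pi %*% P - pi))         # invariance pi' P(theta) = pi', up to rounding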

15.3.2.3 Relativities

The calculation of the relativities in a BM scale reveals some similarities with explicit credibility-type calculations. Following Norberg (1976), with the number of levels and transition rules being fixed, the optimal relativity $r_\ell$ corresponding with level $\ell$ is determined by maximizing the asymptotic predictive accuracy. This implies that one tries to minimize

$$E[(\Theta - r_L)^2], \tag{15.38}$$

the expected squared difference between the relativity $r_L$ and the "true" relative premium $\Theta$, under the assumptions of our credibility model. Simplifying the notation in this model, the a priori premium of a random policyholder is denoted with $\Lambda$ and the residual effect of unknown risk characteristics with $\Theta$. The policyholder then has (unknown) annual expected claim frequency $\Lambda\Theta$, where $\Lambda$ and $\Theta$ are assumed to be independent. The weights of the different risk classes follow from the a priori system with $P[\Lambda = \lambda_k] = w_k$.

Calculation of the $r_\ell$'s goes as follows. We minimize

$$E[(\Theta - r_L)^2] = \sum_{\ell=0}^{s} E[(\Theta - r_\ell)^2 \,|\, L = \ell] \, P[L = \ell] \tag{15.39}$$

$$= \sum_{\ell=0}^{s} \int_0^{+\infty} (\theta - r_\ell)^2 \, P[L = \ell \,|\, \Theta = \theta] \, dF_\Theta(\theta)$$

$$= \sum_k w_k \int_0^{+\infty} \sum_{\ell=0}^{s} (\theta - r_\ell)^2 \, \pi_\ell(\lambda_k \theta) \, dF_\Theta(\theta), \tag{15.40}$$

where $P[\Lambda = \lambda_k] = w_k$. In the last step of the derivation, we condition on $\Lambda$. It is straightforward to obtain the optimal relativities by solving

$$\frac{\partial}{\partial r_j} E[(\Theta - r_L)^2] = 0 \quad \text{with } j = 0, \ldots, s. \tag{15.41}$$

Alternatively, from mathematical statistics it is well known that for a quadratic loss function (see Equation (15.38)) the optimum is $r_\ell = E[\Theta \,|\, L = \ell]$. This is calculated as follows:

$$r_\ell = E[\Theta \,|\, L = \ell]$$

$$= E\big[ E[\Theta \,|\, L = \ell, \Lambda] \,\big|\, L = \ell \big]$$

$$= \sum_k E[\Theta \,|\, L = \ell, \Lambda = \lambda_k] \, P[\Lambda = \lambda_k \,|\, L = \ell]$$

$$= \sum_k \int_0^{+\infty} \theta \, \frac{P[L = \ell \,|\, \Theta = \theta, \Lambda = \lambda_k] \, w_k}{P[L = \ell, \Lambda = \lambda_k]} \, dF_\Theta(\theta) \; \frac{P[\Lambda = \lambda_k, L = \ell]}{P[L = \ell]}, \tag{15.42}$$

where the relation $f_{\Theta | L = \ell, \Lambda = \lambda_k}(\theta \,|\, \ell, \lambda_k) = P[L = \ell \,|\, \Theta = \theta, \Lambda = \lambda_k] \times w_k \times f_\Theta(\theta) / P[\Lambda = \lambda_k, L = \ell]$ is used. The optimal relativities are given by

$$r_\ell = \frac{\sum_k w_k \int_0^{+\infty} \theta \, \pi_\ell(\lambda_k \theta) \, dF_\Theta(\theta)}{\sum_k w_k \int_0^{+\infty} \pi_\ell(\lambda_k \theta) \, dF_\Theta(\theta)}. \tag{15.43}$$

When no a priori rating system is used, all the $\hat{\lambda}_k$'s are equal (say, estimated by $\hat{\lambda}$) and the relativities reduce to

$$r_\ell = \frac{\int_0^{+\infty} \theta \, \pi_\ell(\hat{\lambda} \theta) \, dF_\Theta(\theta)}{\int_0^{+\infty} \pi_\ell(\hat{\lambda} \theta) \, dF_\Theta(\theta)}. \tag{15.44}$$

Calculation of these relativities in R goes as follows. We replicate Example 4.11 from Denuit et al. (2007), where no a priori rating is used. This example uses a $\Gamma(a, a)$ distribution for the policyholder-specific random effect $\Theta_i$, with $\hat{a} = 0.8888$ and $\hat{\lambda} = 0.1474$. These estimates are obtained by calibrating a Negative Binomial distribution on the data from Portfolio A in Denuit et al. (2007) (see Section 1.6, pages 44-45, in the book). The data in Portfolio A are the claim counts registered on 14,505 policies during calendar year 1997.

### Without a priori ratemaking
a.hat = 0.8888
lambda.hat = 0.1474
# numerator integrand of Equation (15.44): theta * pi_s(lambda*theta) * f(theta),
# with f the Gamma(a, a) density of the random effect Theta
int1 = function(theta, s, a = a.hat, lambda = lambda.hat) {
	f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
	p = lim.distr(Pmatrix(lambda*theta))
	return(theta * p[1, s+1] * f.dist)}
P1 = matrix(nrow=1, ncol=6, data=0)
for (i in 0:5) P1[1,i+1] = integrate(Vectorize(int1), lower=0, upper=Inf, s=i)$value
# denominator integrand of Equation (15.44): pi_s(lambda*theta) * f(theta)
int2 = function(theta, s, a = a.hat, lambda = lambda.hat) {
	f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
	p = lim.distr(Pmatrix(lambda*theta))
	return(p[1, s+1] * f.dist)}
P2 = matrix(nrow=1, ncol=6, data=0)
for (i in 0:5) P2[1,i+1] = integrate(Vectorize(int2), lower=0, upper=Inf, s=i)$value
R = P1 / P2
> R # relativities without a priori rating
          [,1]    [,2]     [,3]     [,4]     [,5]     [,6]
[1,] 0.5466848 1.21958 1.348203 1.507254 1.709032 1.973534
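As a sanity check on these numbers: since $r_\ell = E[\Theta \,|\, L = \ell]$ and $E[\Theta] = 1$ for the $\Gamma(a, a)$ random effect, the optimal relativities are financially balanced, that is, they average out to one over the stationary distribution. The entries of P2 computed above are exactly the stationary level probabilities $P[L = \ell]$, so:

sum(P2)      # close to 1: the probabilities P[L = l] sum to one
sum(R * P2)  # close to 1: E[r_L] = E[Theta] = 1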

To demonstrate the calculation of relativities when accounting for a priori rating, we use the Portfolio A data from Denuit et al. (2007) again, with the $\hat{\lambda}_k$'s and $w_k$'s printed in Table 2.7 (page 91) of the book. Here $\hat{\lambda}_k$ is the a priori annually expected claim frequency for risk class $k$, as determined by a set of a priori observed risk factors. The selection of risk factors and estimated annual claim frequencies are obtained by fitting a Negative Binomial regression model to the Portfolio A data. Negative Binomial regression for a single year of data on observed claim counts, say $k_i$ with $i = 1, \ldots, n$, is based on the following likelihood

$$L(\beta, a) = \prod_{i=1}^{n} \frac{\lambda_i^{k_i}}{k_i!} \left( \frac{a}{a + \lambda_i} \right)^a (a + \lambda_i)^{-k_i} \frac{\Gamma(a + k_i)}{\Gamma(a)}, \tag{15.45}$$

where $\lambda_i = d_i \exp(x_i' \beta)$ (with $d_i$ the exposure registered for policyholder $i$). Negative Binomial regression is available in R from the glm.nb() function in the MASS package.
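A minimal sketch of such a fit follows; the variable names nclaims, ageph, fuel, expo and the data frame portfolioA are illustrative only, not the coding of the actual Portfolio A data:

library(MASS)
# Negative Binomial regression with log link; the exposure enters as an offset,
# cf. Equation (15.45); glm.nb() reports the estimated shape parameter as theta
fit = glm.nb(nclaims ~ ageph + fuel + offset(log(expo)), data = portfolioA)
summary(fit)  # estimated regression coefficients beta
fit$theta     # estimated shape parameter a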

lambda = c(0.1176,0.1408,0.1897,0.2272,0.1457,0.1746,0.2351,0.2816,
				0.1761,0.2109,0.2840,0.3402,0.2182,0.2614,0.3520,0.0928,
				0.1112,0.1498,0.1794,0.1151,0.1378,0.1856,0.2223)
weights = c(0.1049,0.1396,0.0398,0.0705,0.0076,0.0122,0.0013,0.0014,
				0.0293,0.0299,0.0152,0.0242,0.0007,0.0009,0.0002,0.1338,
				0.1973,0.0294,0.0661,0.0372,0.0517,0.0025,0.0044)
a = 1.065
n = length(weights)
int3 = function(theta, lambda, a, l) {
	# numerator integrand of Equation (15.43) for a risk class with a priori frequency lambda
	p = lim.distr(Pmatrix(lambda*theta))
	f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
	return(theta * p[1, l+1] * f.dist)}
int4 = function(theta, lambda, a, l) {
	# denominator integrand of Equation (15.43)
	p = lim.distr(Pmatrix(lambda*theta))
	f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
	return(p[1, l+1] * f.dist)}
teller1 = noemer = array(dim=6, data=0) # numerator ('teller') and denominator ('noemer')
for (i in 0:5) {
	b = c = array(dim=n, data=0)
	for (j in 1:n) {
		b[j] = integrate(Vectorize(int3), lower=0, upper=Inf, lambda=lambda[j], a=a, l=i)$value
		c[j] = integrate(Vectorize(int4), lower=0, upper=Inf, lambda=lambda[j], a=a, l=i)$value}
	# weight the risk classes with w_k, cf. Equation (15.43)
	teller1[i+1] = b %*% weights
	noemer[i+1] = c %*% weights
	}
R = teller1 / noemer
> R # relativities with a priori rating
[1] 0.6118907 1.2088841 1.3124752 1.4388207 1.5985014 1.8123074

Summarizing, we obtain the relativities displayed in Table 15.5 (without and with a priori rating) for the (−1/Top Scale) and the Portfolio A data from Denuit et al. (2007). The a posteriori corrections are less severe when a priori rating is taken into account.

Table 15.5

Numerical characteristics for the (−1/top scale) and Portfolio A data from Denuit et al. (2007), without and with a priori rating taken into account.

         rℓ = E[Θ | L = ℓ]     rℓ = E[Θ | L = ℓ]
Level    without a priori      with a priori
5        197.3%                181.2%
4        170.9%                159.9%
3        150.7%                143.9%
2        134.8%                131.3%
1        122.0%                120.9%
0        54.7%                 61.2%
