Chapter 15
Katrien Antonio
Universiteit van Amsterdam
Amsterdam, Netherlands
Peng Shi
University of Wisconsin Madison
Madison, Wisconsin, USA
Frank van Berkum
Universiteit van Amsterdam
Amsterdam, Netherlands
In Chapter 14 on “General Insurance Pricing,” Boucher & Charpentier discuss regression techniques suitable for pricing with cross-sectional data. A cross-sectional dataset observes each subject in the sample (for instance, a policy(holder)) only once. Each subject is therefore described by a single response, say Yi for subject i, and a vector with covariate information, xi. Assuming independence between subjects, Generalized Linear Models [(G)LMs] are directly available for statistical modeling and explain the response as a function of the risk factors, within an appropriate distributional framework. For pricing in general insurance, the actuary builds such models for a (cross-sectional) dataset with claim counts on the one hand and claim severities on the other hand as response variables, obtained by following policyholders during a single time period. The result is a tariff based on risk classification through regression modeling. When the explanatory variables used as rating factors express a priori correctly measurable information about the policyholder (or, for instance, the vehicle or the insured building), the system is called an a priori classification scheme. The examples in Chapter 14 illustrate this idea and use, for instance, the age of the driver and the age of the car to explain the number of claims registered by a policyholder.
However, despite the presence of an a priori rating system, some important risk factors remain unmeasurable or unobservable. For example, in automobile insurance, the insurer is unable to detect the driver's aggressiveness behind the wheel or the quickness of his reflexes to avoid a possible accident (see Denuit et al. (2007) for further motivation). This motivates the presence of inhomogeneous tariff cells within an a priori rating system. An a posteriori or experience rating system is necessary to allow for the reevaluation of the premium (established a priori) based on the history of claims as reported by the insured. One can argue that an important predictor for the future number of claims reported by an insured will be the number of claims reported in the past. Predictive modeling for experience rating will confront analysts with data structures going beyond the cross-sectional design dealt with in (G)LMs. Longitudinal (or panel) data arise when the claim history of policyholders (or, in general, a group of “subjects”) is registered repeatedly over time. Thus, with longitudinal data, the variables will have double subscripts, indicating the subject and observation period, respectively. Specifically, let Yit denote the response for the i-th subject in the t-th time period, and let xit denote the associated vector of explanatory variables. Assuming that there are n subjects and following the i-th subject over t=1,...,Ti time periods, we observe
1st subject:  {(y_{11}, x_{11}), (y_{12}, x_{12}), ..., (y_{1T_1}, x_{1T_1})}
2nd subject:  {(y_{21}, x_{21}), (y_{22}, x_{22}), ..., (y_{2T_2}, x_{2T_2})}
   ...
n-th subject: {(y_{n1}, x_{n1}), (y_{n2}, x_{n2}), ..., (y_{nT_n}, x_{nT_n})}
Longitudinal data have several potential advantages. First, longitudinal data are a hybrid of cross-sectional and time series data. On the one hand, they allow for the examination of the effects of covariates on the response, as in usual regression. On the other hand, similar to time series analysis, they also permit the identification of dynamic relations over time. Because they share subject-specific characteristics, observations on the same subject over time are correlated and require an adjusted toolkit for statistical modeling. In this chapter we study regression models incorporating these dynamics, among others, by extending a priori rating with so-called random effects. These random effects structure correlation between observations registered on the same subject, and also take heterogeneity among subjects, due to unobserved characteristics, into account.
The panel data setting has two layers (or levels) of data: the subject level on the one hand and the time level (observations within subjects) on the other hand. However, insurers may have several other layers of data at their disposal. For example, Antonio et al. (2010) discuss experience rating for a dataset on fleet covers, registered for multiple insurance companies. Fleet policies are umbrella-type policies issued to customers whose insurance covers more than a single vehicle. The hierarchical or multilevel structure of the data is as follows: vehicles (v) observed over time (t), nested within fleets (f), with policies issued by insurance companies (c). Multilevel models allow for incorporating the hierarchical structure of the data by specifying random effects at the various levels in the data. Once again, these random effects represent unobservable characteristics at each level. Moreover, random effects allow a posteriori updating of an a priori tariff, by taking into account the past performance of—in the case of intercompany fleet contracts—the vehicle, fleet, and company.
Section 15.2 of this chapter considers linear models for longitudinal data. We discuss three approaches to capture unobserved heterogeneity in the longitudinal data context. Section 15.2.2 introduces the basic fixed effects model and describes the model specification and diagnostics. Section 15.2.3 extends these models to incorporate serial correlation in the error terms. Section 15.2.4 presents models with random effects and generalizes the framework to linear mixed models. Section 15.2.5 covers prediction for the linear mixed-effects model and points out its connection to actuarial credibility theory. The computational aspects are illustrated using a dataset introduced in Section 15.2.1. In Section 15.3 we leave the framework of linear models and switch to a distributional framework that is probably more appealing to actuaries, namely generalized linear models and their random effects extensions. Actuarial credibility systems are examples of a posteriori rating systems accounting for the history of claims as it emerges for an individual risk. Commercial versions of these experience rating schemes are more widely known in practice as Bonus-Malus scales. A case study (using R) with such rating schemes is the topic of Section 15.3.2. The theory on longitudinal data models is based on Diggle et al. (2002), Frees (2004), Hsiao (2003), Wooldridge (2010), and the references therein. Section 15.3 of this chapter is based on Antonio & Valdez (2012), Antonio & Zhang (2014), and Denuit et al. (2007), but focuses now on implementation with R. We refer to these papers and the references therein for more technical background. This chapter only covers examples with panel data. We refer to Antonio et al. (2010) for examples with multilevel data structures.
For linear longitudinal data models, we demonstrate the theory and computational aspects using a dataset of automobile bodily injury liability claims that was described and employed in Frees & Wang (2005). The dataset contains claims over the 6 years from 1993 to 1998 for a random sample of twenty-nine towns in the state of Massachusetts. All variables in monetary values are rescaled using the consumer price index to mitigate the effect of time trends. We are interested in the behavior of average claims per unit of exposure, that is, the pure premium, for each town and each year. Two explanatory variables are available for the regression analysis: the per-capita income (PCI) and the population per square mile (PPSM) of each town. The variables and their descriptions are summarized in Table 15.1.
Table 15.1 Description of variables in the auto claim dataset.

Variable  | Description
--------- | ------------------------------------
TOWNCODE  | The index of Massachusetts towns
YEAR      | The calendar year of the observation
AC        | Average claims per unit of exposure
PCI       | Per-capita income of the town
PPSM      | Population per square mile of the town
> # File name is AutoClaimData.txt
> AutoClaim = read.table(choose.files(), sep = "", quote = "",header=TRUE)
> names(AutoClaim)
[1] "TOWNCODE" "YEAR" "AC" "PCI" "PPSM"
> AutoClaim[1:12,] # Check longitudinal structure
TOWNCODE YEAR AC PCI PPSM
1 10 1993 160.8522 18134.04 1475.5515
2 10 1994 158.3382 18495.88 1461.8110
3 10 1995 156.8098 18778.29 1488.9911
4 10 1996 168.9899 18740.46 1502.9322
5 10 1997 171.8229 18809.62 1534.4251
6 10 1998 153.7644 19034.59 1557.6937
7 11 1993 149.3873 15597.56 855.4350
8 11 1994 137.5546 15908.79 877.2725
9 11 1995 169.9164 16151.69 872.8024
10 11 1996 169.0598 16119.15 898.7802
11 11 1997 161.3425 16178.64 929.2647
12 11 1998 138.0516 16372.14 940.9162
We use the data in the first 5 years, namely 1993-1997, to develop the model and keep the observations in the final year for validation purposes. To explore relations among variables, the techniques used for ordinary regression, such as histograms and correlation statistics, apply readily to longitudinal data. In addition, we introduce several more specialized techniques. The first is the multiple time series plot exhibited in Figure 15.1, where the average claims in successive years for each town are joined by straight lines. The plot shows the development of claims over time and helps visualize town-specific effects.
> # Use years 1993-1997 as training data and reserve year 1998 for validation
> AutoClaimIn <- subset(AutoClaim, YEAR < 1998)
> # Multiple time series plot
> plot(AC ~ YEAR, data = AutoClaimIn, ylab="Average Claim", xlab="Year")
> for (i in AutoClaimIn$TOWNCODE) {
+ lines(AC ~ YEAR, data = subset(AutoClaimIn, TOWNCODE == i))}
One can also use scatterplots to help detect the relation between the response and explanatory variables. Figure 15.2 displays the scatterplots for variables PCI and PPSM, suggesting a negative relation between AC and PCI and a positive relation between AC and PPSM. Note that we use both PCI and PPSM on the log scale, and logarithmic values will be used in the following analysis. In addition, we also serially connect the observations to identify potential patterns in each covariate. In this case, we observe that PCI varies over time while PPSM is relatively stable.
> # Scatter plot to explore relations
> AutoClaimIn$lnPCI <- log(AutoClaimIn$PCI)
> AutoClaimIn$lnPPSM <- log(AutoClaimIn$PPSM)
> plot(AC ~ lnPCI, data = AutoClaimIn, ylab="Average Claim", xlab="PCI")
> for (i in AutoClaimIn$TOWNCODE) {
+ lines(AC ~ lnPCI, data = subset(AutoClaimIn, TOWNCODE == i))}
> plot(AC ~ lnPPSM, data = AutoClaimIn, ylab="Average Claim", xlab="PPSM")
> for (i in AutoClaimIn$TOWNCODE) {
+ lines(AC ~ lnPPSM, data = subset(AutoClaimIn, TOWNCODE == i))}
As a preliminary analysis, we consider a pooled cross-sectional regression model Pool.fit assuming all observations are independent, that is,
y_it = α + x'_it β + ε_it.    (15.1)
Here, α is the homogeneous intercept for all towns and β is the vector of regression coefficients. Variables PCI, PPSM, and YEAR are included as covariates. As expected, we observe a significant negative effect of PCI and a positive effect of PPSM. We also observe an increasing trend in claims after adjusting for inflation. Functions such as lm and anova are used to fit and analyze the ordinary least squares regression:
> AutoClaimIn$YEAR <- AutoClaimIn$YEAR-1992
> Pool.fit <- lm(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn)
> summary(Pool.fit)
Call:
lm(formula = AC ~ lnPCI + lnPPSM + YEAR, data = AutoClaimIn)
Residuals:
Min 1Q Median 3Q Max
-49.944 -16.154 -1.759 14.300 104.468
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 899.569 120.150 7.487 6.98e-12 ***
lnPCI -92.604 11.855 -7.812 1.17e-12 ***
lnPPSM 22.305 2.933 7.606 3.64e-12 ***
YEAR 3.923 1.519 2.583 0.01082 *
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.75 on 141 degrees of freedom
Multiple R-squared: 0.4908, Adjusted R-squared: 0.48
F-statistic: 45.3 on 3 and 141 DF, p-value: < 2.2e-16
> anova(Pool.fit)
Analysis of Variance Table
Response: AC
Df Sum Sq Mean Sq F value Pr(>F)
lnPCI 1 46355 46355 69.9028 5.402e-14 ***
lnPPSM 1 39344 39344 59.3302 2.141e-12 ***
YEAR 1 4423 4423 6.6704 0.01082 *
Residuals 141 93502 663
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Repeated observations allow one to study heterogeneity, be it of subject or of time. We begin with the basic fixed effects model by introducing subject-specific intercepts in the model

y_it = α_i + x'_it β + ε_it.    (15.2)

Hereby, α_i is a town-specific intercept (i = 1, ..., n); x_it = (x_{it,1}, ..., x_{it,K})' is the vector of covariates; and β is the vector of regression coefficients to be estimated. There are alternative methods to treat the heterogeneous intercepts. In this section, we assume the {α_i} are fixed parameters to be estimated along with β. Here, β is known as the population parameter capturing the common effects of the explanatory variables. The {α_i}, called nuisance parameters, vary by subject (here, town) and account for the subject heterogeneity. In the following, we use the notation T = max{T_1, ..., T_n} and N = Σ_{i=1}^n T_i.
The basic fixed effects model assumes that there is no within-subject serial correlation; that is, the ε_it are i.i.d. random variables with mean zero and variance σ². Thus, by the Gauss-Markov theorem, the OLS estimates are the best linear unbiased estimates, with

β̂ = ( Σ_{i=1}^n Σ_{t=1}^{T_i} (x_it - x̄_i)(x_it - x̄_i)' )^{-1} Σ_{i=1}^n Σ_{t=1}^{T_i} (x_it - x̄_i)(y_it - ȳ_i)    (15.3)

and

α̂_i = ȳ_i - x̄'_i β̂.    (15.4)

Here, ȳ_i and x̄_i are the averages of y_it and x_it over time, respectively. The estimator β̂ is also known as the within estimator because it uses the time variation within each cross section. In addition, the variance of β̂ is

Var(β̂) = s² ( Σ_{i=1}^n Σ_{t=1}^{T_i} (x_it - x̄_i)(x_it - x̄_i)' )^{-1},

where s² is the unbiased estimate of σ² based on the residuals. In deriving the large-sample properties, one lets n → ∞ with T remaining fixed. Under regularity conditions, one can show that β̂ is consistent and asymptotically normally distributed. However, the {α̂_i} are not consistent, and are not even approximately normal if the responses are not normally distributed.
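As a numerical check on the within estimator in (15.3), one can demean each variable by town and run OLS on the transformed data; the slope estimates then coincide with those from the dummy-variable regression below. This is a sketch, assuming the AutoClaimIn data frame (with lnPCI, lnPPSM, and the recoded YEAR) constructed earlier; the helper demean and model name Within.fit are ours.

```r
# Sketch: compute the within estimator by demeaning each variable by town.
demean <- function(v, g) v - ave(v, g)     # subtract the town-specific mean
W <- with(AutoClaimIn, data.frame(
  AC     = demean(AC, TOWNCODE),
  lnPCI  = demean(lnPCI, TOWNCODE),
  lnPPSM = demean(lnPPSM, TOWNCODE),
  YEAR   = demean(YEAR, TOWNCODE)))
Within.fit <- lm(AC ~ lnPCI + lnPPSM + YEAR - 1, data = W)
coef(Within.fit)   # slopes match the lnPCI, lnPPSM, YEAR estimates of the fixed effects fit
```

Only the slope estimates agree; the reported standard errors differ because lm() does not account for the n estimated town means.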
We fit this basic fixed effects model FE.fit using lm by treating TOWNCODE as a categorical variable. The t- and F-statistics are constructed in the same way as in classical regression models. Note that the above model could easily be modified to account for time-specific heterogeneity by replacing the subject-specific intercepts α_i with time-specific intercepts λ_t. Similarly, using categorical variables for the time dimension, least squares estimation is readily applied.
> # Basic fixed-effects model
> FE.fit <- lm(AC ~ factor(TOWNCODE)+lnPCI+lnPPSM+YEAR-1, data=AutoClaimIn)
> summary(FE.fit)
Call:
lm(formula = AC ~ factor(TOWNCODE) + lnPCI + lnPPSM + YEAR - 1, data = AutoClaimIn)
Residuals:
Min 1Q Median 3Q Max
-55.645 -8.900 0.177 8.995 50.141
Coefficients:
Estimate Std. Error t value Pr(>|t|)
factor(TOWNCODE)10 1660.321 1846.793 0.899 0.371
factor(TOWNCODE)11 1558.851 1794.617 0.869 0.387
factor(TOWNCODE)12 1554.375 1884.831 0.825 0.411
factor(TOWNCODE)13 1360.128 1731.874 0.785 0.434
factor(TOWNCODE)14 1443.895 1780.094 0.811 0.419
factor(TOWNCODE)15 1681.983 1841.401 0.913 0.363
(et cetera)
lnPCI -22.631 159.268 -0.142 0.887
lnPPSM -176.831 107.240 -1.649 0.102
YEAR 5.947 2.738 2.172 0.032 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 18.88 on 113 degrees of freedom
Multiple R-squared: 0.9863, Adjusted R-squared: 0.9824
F-statistic: 254.4 on 32 and 113 DF, p-value: < 2.2e-16
> anova(FE.fit)
Analysis of Variance Table
Response: AC
Df Sum Sq Mean Sq F value Pr(>F)
factor(TOWNCODE) 29 2897069 99899 280.3677 < 2e-16 ***
lnPCI 1 2231 2231 6.2621 0.01377 *
lnPPSM 1 34 34 0.0967 0.75638
YEAR 1 1681 1681 4.7168 0.03196 *
Residuals 113 40263 356
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We further discuss three specific tests for model specification and diagnostics. The first is the pooling test, where one wishes to test whether the subject-specific effects are significant. The null hypothesis is

H_0: α_1 = α_2 = ... = α_n.

This can be done using the partial F- (Chow) test (see Chow (1960)) by calculating

F-ratio = [ (ErrorSS)_Pooled - ErrorSS ] / (n - 1) / s²,

with s² = ErrorSS / (N - (n + K)). Here, ErrorSS and s² are from the heterogeneous model (i.e., FE.fit) and (ErrorSS)_Pooled is from the homogeneous model (i.e., Pool.fit). It can be shown that the F-ratio follows an F-distribution with degrees of freedom df1 = n - 1 and df2 = N - (n + K). In this example, the F-statistic equals (93,502 - 40,263)/(29 - 1)/18.88² = 5.33, so we reject the null hypothesis.
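The F-ratio can also be verified by hand from the two error sums of squares reported in the regression outputs above; a small sketch:

```r
# Hand computation of the partial F (Chow) statistic from the fits above.
ErrorSS.pooled <- 93502          # residual SS of Pool.fit
ErrorSS.fe     <- 40263          # residual SS of FE.fit
n <- 29; K <- 3; N <- 145        # towns, slope coefficients, observations
s2 <- ErrorSS.fe / (N - (n + K))                     # = 18.88^2
F.ratio <- (ErrorSS.pooled - ErrorSS.fe) / (n - 1) / s2
F.ratio                                              # compare with anova(Pool.fit, FE.fit)
1 - pf(F.ratio, df1 = n - 1, df2 = N - (n + K))      # p-value
```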
> anova(Pool.fit,FE.fit)
Analysis of Variance Table
Model 1: AC ~ lnPCI + lnPPSM + YEAR
Model 2: AC ~ factor(TOWNCODE) + lnPCI + lnPPSM + YEAR - 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 141 93502
2 113 40263 28 53238 5.3362 7.214e-11 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
An alternative approach to capture heterogeneity is to use serial correlation. The intuition is that if there are some unobserved time constant variables affecting the response, they will introduce correlation among repeated observations. To motivate this approach, we examine the serial correlation of residuals from Pool.fit. The results show strong temporal correlation among AC after removing the effects of explanatory variables. This suggests that the i.i.d. assumption used in the homogeneous model is not appropriate.
> # Correlation among residuals
> AutoClaimIn$rPool <- resid(Pool.fit)
> rvec <- cbind(subset(AutoClaimIn,YEAR==1)$rPool, subset(AutoClaimIn,YEAR==2)$rPool,
+   subset(AutoClaimIn,YEAR==3)$rPool, subset(AutoClaimIn,YEAR==4)$rPool,
+   subset(AutoClaimIn,YEAR==5)$rPool)
> cor(rvec)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0000000 0.5862895 0.5187797 0.4207831 0.5424555
[2,] 0.5862895 1.0000000 0.3911814 0.2164202 0.2555096
[3,] 0.5187797 0.3911814 1.0000000 0.3955654 0.7890728
[4,] 0.4207831 0.2164202 0.3955654 1.0000000 0.4778912
[5,] 0.5424555 0.2555096 0.7890728 0.4778912 1.0000000
To relax the i.i.d. assumption, we first consider a homogeneous model with serial correlation. For subject i, the matrix representation of the model is

y_i = X_i β + ε_i,

where y_i = (y_{i1}, ..., y_{iT_i})', X_i = (x_{i1}, ..., x_{iT_i})', and ε_i = (ε_{i1}, ..., ε_{iT_i})'. Now we assume that the elements of ε_i are correlated with one another. Let R = R(τ) denote the temporal covariance matrix for a vector of T observations; the unknown parameters in this covariance matrix are denoted by τ. Note there are at most T(T + 1)/2 unknown elements in R. Commonly used special cases of R are (using T = 5) the compound symmetry (exchangeable) structure, with R_st = σ²ρ for s ≠ t and R_ss = σ²; the AR(1) structure, with R_st = σ² ρ^{|s-t|}; and the unstructured form, with a separate entry R_st = σ_st for each pair (s, t).

For the i-th subject, the covariance matrix Var(ε_i) = R_i is a T_i × T_i matrix. Here, R is positive definite and R_i depends on i only through its dimension; thus R_i can be determined by removing certain rows and columns of the matrix R. This set of notations allows us to handle missing data and incomplete observations easily.
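The candidate structures for R can be written out explicitly; a small sketch using T = 5 and an illustrative value for ρ (not an estimate from the data):

```r
# Illustration of the correlation structures (T = 5); rho is illustrative.
T <- 5
rho <- 0.5
R.cs <- matrix(rho, T, T); diag(R.cs) <- 1   # compound symmetry (exchangeable)
R.ar <- rho^abs(outer(1:T, 1:T, "-"))        # AR(1): correlation rho^|s-t|
# Unstructured: all T*(T-1)/2 correlations are free parameters.
```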
The model can be estimated using either moment-based or likelihood-based methods. With known R_i, the generalized least squares (GLS) estimates are obtained by minimizing

SS(β) = Σ_{i=1}^n (y_i - X_i β)' R_i^{-1} (y_i - X_i β),

and we have

b_GLS = ( Σ_{i=1}^n X'_i R_i^{-1} X_i )^{-1} Σ_{i=1}^n X'_i R_i^{-1} y_i.

We can estimate such a model using the R package nlme. Two types of likelihood-based methods are provided to estimate the regression parameter β and the variance components τ: the full maximum likelihood (ML) estimation and the restricted maximum likelihood (REML) estimation. Based on the assumption of multivariate normality of the response y_i, the full log-likelihood function of the model is

L(β, τ) = -(1/2) Σ_{i=1}^n [ T_i ln(2π) + ln det(R_i) + (y_i - X_i β)' R_i^{-1} (y_i - X_i β) ].

The MLE follows by maximizing the above likelihood function over β and τ simultaneously. It is also easy to show that, for fixed covariance parameters τ, the MLE of β is the same as the generalized least squares estimator. It is known that the MLE of τ is biased downward. To mitigate the bias, the restricted maximum likelihood maximizes the following log-likelihood function:

L_REML(τ) = L(b_GLS, τ) - (1/2) ln det( Σ_{i=1}^n X'_i R_i^{-1} X_i ).
The REML estimation will be discussed in more detail in the section on random-effects models.
In our application, we fit the linear model with three types of serial correlation: compound symmetry, AR(1), and unstructured. See Table 15.2 for the results. We denote the resulting models by SCex.fit, SCar.fit, and SCun.fit, respectively. The models are fit using the function gls() in the nlme package. The argument correlation is used to specify the structure of R(τ), and the argument method is used to specify the estimation method; the default estimation approach is REML. The estimation results are displayed in Table 15.2. The estimates of the regression coefficients are similar across specifications and are consistent with the pooled cross-sectional regression model. The estimates of the variance components suggest significant within-subject temporal correlation. Note that when the unstructured covariance is specified, the model is not identifiable in its most general form, due to the nonuniqueness of the decomposition of R into a scale and a correlation part. Thus, additional constraints are necessary for identification purposes. The gls() function estimates the model under the parameterization R = σ²C, where σ is a scale parameter and C is the correlation matrix.
For inference, the estimation error of the population parameter β is based on

Var(b_GLS) = ( Σ_{i=1}^n X'_i R_i^{-1} X_i )^{-1}.

The estimation error of τ̂ can be assessed in different ways. The approach implemented in gls() is to use the inverse of the observed Fisher information. The confidence intervals for the scale parameter σ and the correlation parameters ρ are obtained from the approximate normal distribution of the ML or REML estimators of a transformation of the parameters. Specifically, the 95% confidence interval of σ is

( exp(ln σ̂ - 1.96 se(ln σ̂)), exp(ln σ̂ + 1.96 se(ln σ̂)) ),

where se(ln σ̂) is the associated standard error derived from the Fisher information. Similarly, the 95% confidence interval of ρ is

( tanh(z(ρ̂) - 1.96 se(z(ρ̂))), tanh(z(ρ̂) + 1.96 se(z(ρ̂))) ), with z(ρ) = (1/2) ln((1 + ρ)/(1 - ρ)),

where se(z(ρ̂)) is the corresponding standard error. In the package nlme, the function intervals() can be used to obtain the 95% confidence intervals of the variance components, and the function getVarCov() can be used to obtain the estimate of R_i.
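The back-transformations can be sketched as follows; the point estimates and standard errors below are illustrative placeholders, not values taken from the fitted models:

```r
# Sketch of the confidence-interval back-transformations (illustrative numbers).
sigma.hat <- 26.24; se.lnsigma <- 0.08            # placeholder standard error
exp(log(sigma.hat) + c(-1, 1) * qnorm(0.975) * se.lnsigma)   # 95% CI for sigma
rho.hat <- 0.47; se.z <- 0.20                     # placeholder standard error
z <- 0.5 * log((1 + rho.hat) / (1 - rho.hat))     # Fisher z-transform of rho
tanh(z + c(-1, 1) * qnorm(0.975) * se.z)          # 95% CI for rho
```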
> library(nlme)
> # Compound symmetry
> SCex.fit <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corCompSymm(form=~1|TOWNCODE))
> summary(SCex.fit)
> intervals(SCex.fit,which = "var-cov")
> getVarCov(SCex.fit)
Marginal variance covariance matrix
[,1] [,2] [,3] [,4] [,5]
[1,] 688.50 326.07 326.07 326.07 326.07
[2,] 326.07 688.50 326.07 326.07 326.07
[3,] 326.07 326.07 688.50 326.07 326.07
[4,] 326.07 326.07 326.07 688.50 326.07
[5,] 326.07 326.07 326.07 326.07 688.50
Standard Deviations: 26.239 26.239 26.239 26.239 26.239
> # AR(1)
> SCar.fit <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corAR1(form=~1|TOWNCODE))
> summary(SCar.fit)
> intervals(SCar.fit,which = "var-cov")
> getVarCov(SCar.fit)
Marginal variance covariance matrix
[,1] [,2] [,3] [,4] [,5]
[1,] 673.210 292.350 126.96 55.132 23.942
[2,] 292.350 673.210 292.35 126.960 55.132
[3,] 126.960 292.350 673.21 292.350 126.960
[4,] 55.132 126.960 292.35 673.210 292.350
[5,] 23.942 55.132 126.96 292.350 673.210
Standard Deviations: 25.946 25.946 25.946 25.946 25.946
> # Unstructured
> SCun.fit <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corSymm(form=~1|TOWNCODE))
> summary(SCun.fit)
> intervals(SCun.fit,which = "var-cov")
> getVarCov(SCun.fit)
Marginal variance covariance matrix
[,1] [,2] [,3] [,4] [,5]
[1,] 696.15 485.50 324.79 315.06 374.16
[2,] 485.50 696.15 227.51 179.88 190.11
[3,] 324.79 227.51 696.15 284.12 522.68
[4,] 315.06 179.88 284.12 696.15 351.96
[5,] 374.16 190.11 522.68 351.96 696.15
Standard Deviations: 26.385 26.385 26.385 26.385 26.385
The usual t- or F-test statistics follow as in the i.i.d. case. Caution is needed for tests based on the likelihood function. For example, the likelihood ratio test relies on the value of the log-likelihood function rather than the restricted log-likelihood. One can use method="ML" in the gls() function to implement maximum likelihood estimation. We perform the test using anova for the models with serial correlation against the pooled cross-sectional regression. The results support the evidence of positive serial correlation.
Table 15.2 Estimation for models with serial correlation.

Parameter    | SCex.fit Est. (S.E.)  | SCar.fit Est. (S.E.)  | SCun.fit Est. (S.E.)
(Intercept)  | 887.89 (206.81)       | 891.45 (168.25)       | 878.68 (200.85)
lnPCI        | -91.20 (20.41)        | -91.33 (16.61)        | -90.81 (19.81)
lnPPSM       | 21.96 (5.08)          | 21.76 (4.11)          | 23.70 (4.95)
YEAR         | 3.91 (1.14)           | 3.55 (1.66)           | 1.82 (1.03)

             | SCex.fit Est. (95% CI) | SCar.fit Est. (95% CI) | SCun.fit Est. (95% CI)
CS           | 0.47 (0.29, 0.64)      |                        |
AR(1)        |                        | 0.43 (0.26, 0.58)      |
UN corr(1,2) |                        |                        | 0.70 (0.46, 0.84)
UN corr(1,3) |                        |                        | 0.47 (0.15, 0.70)
UN corr(1,4) |                        |                        | 0.45 (0.06, 0.72)
UN corr(1,5) |                        |                        | 0.54 (0.19, 0.76)
UN corr(2,3) |                        |                        | 0.33 (-0.00, 0.59)
UN corr(2,4) |                        |                        | 0.26 (-0.16, 0.60)
UN corr(2,5) |                        |                        | 0.27 (-0.13, 0.60)
UN corr(3,4) |                        |                        | 0.41 (0.11, 0.64)
UN corr(3,5) |                        |                        | 0.75 (0.57, 0.86)
UN corr(4,5) |                        |                        | 0.51 (0.20, 0.72)
Scale        | 26.24 (22.22, 30.98)   | 25.95 (22.62, 29.76)   | 26.38 (22.27, 31.26)
log-REML     | -645.96                | -654.25                | -635.93
log-ML       | -655.61                | -663.67                | -645.38
AIC          | 1323.21                | 1339.34                | 1320.75
BIC          | 1341.07                | 1357.21                | 1365.40
> # Likelihood ratio test
> SCex.fit.ml <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corCompSymm(form=~1|TOWNCODE), method="ML")
> SCar.fit.ml <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corAR1(form=~1|TOWNCODE), method="ML")
> SCun.fit.ml <- gls(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ correlation=corSymm(form=~1|TOWNCODE), method="ML")
> anova(SCex.fit.ml, Pool.fit)
Model df AIC BIC logLik Test L.Ratio p-value
SCex.fit.ml 1 6 1323.212 1341.073 -655.6062
Pool.fit 2 5 1359.497 1374.381 -674.7487 1 vs 2 38.28505 <.0001
> anova(SCar.fit.ml, Pool.fit)
Model df AIC BIC logLik Test L.Ratio p-value
SCar.fit.ml 1 6 1339.344 1357.205 -663.6721
Pool.fit 2 5 1359.497 1374.381 -674.7487 1 vs 2 22.15326 <.0001
> anova(SCun.fit.ml, Pool.fit)
Model df AIC BIC logLik Test L.Ratio p-value
SCun.fit.ml 1 15 1320.753 1365.404 -645.3763
Pool.fit 2 5 1359.497 1374.381 -674.7487 1 vs 2 58.74476 <.0001
Finally, we extend the above model to allow for more heterogeneity. We consider a more general model where not only subject-specific intercepts, but also subject-specific slopes are incorporated in the linear model as

y_i = Z_i α_i + X_i β + ε_i,

with explanatory matrix Z_i = (z_{i1}, ..., z_{iT_i})' and subject-specific parameters α_i = (α_{i1}, ..., α_{iq})'. The temporal correlation is allowed through the assumption Var(ε_i) = R_i. This is known as the fixed-effects linear longitudinal data model. The GLS estimators of the parameters can be shown to be

b = ( Σ_{i=1}^n X'_i R_i^{-1/2} Q_{Z,i} R_i^{-1/2} X_i )^{-1} Σ_{i=1}^n X'_i R_i^{-1/2} Q_{Z,i} R_i^{-1/2} y_i

and

a_i = ( Z'_i R_i^{-1} Z_i )^{-1} Z'_i R_i^{-1} (y_i - X_i b),

with

Q_{Z,i} = I_i - R_i^{-1/2} Z_i ( Z'_i R_i^{-1} Z_i )^{-1} Z'_i R_i^{-1/2}.

The above model can also be easily implemented using gls() by modifying the R code. For example, in the special case where z_it = 1, the model reduces to the subject-specific intercept model with serial correlation. One could simply add factor(TOWNCODE) to the formula used for SCar.fit.
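For instance, the subject-specific intercept model with AR(1) errors mentioned above might be fit as follows; this is a sketch, and the model name FESCar.fit is ours:

```r
# Sketch: fixed town intercepts combined with AR(1) serial correlation,
# obtained by adding factor(TOWNCODE) to the gls() call used for SCar.fit.
library(nlme)
FESCar.fit <- gls(AC ~ factor(TOWNCODE) + lnPCI + lnPPSM + YEAR - 1,
                  data = AutoClaimIn,
                  correlation = corAR1(form = ~1 | TOWNCODE))
summary(FESCar.fit)
```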
Consider the linear longitudinal data model

y_i = Z_i α_i + X_i β + ε_i.

Instead of treating the α_i as fixed parameters, another approach to study heterogeneity is to view the α_i as random variables. This model, containing the fixed-effects parameter β and random effects α_i, is known as the Linear Mixed-Effects Model (LMM). In its most general form, we assume that E(α_i) = 0 and Var(α_i) = D, a positive definite matrix. Furthermore, the subject effects and error terms are assumed to be uncorrelated, that is, Cov(α_i, ε'_i) = 0. Under these assumptions, the variance of each subject can be expressed as

V_i = Z_i D Z'_i + R_i,

where the vector τ of variance components determines the covariance matrix V_i.

For inference purposes, the GLS estimator of the population parameter β is

b_GLS = ( Σ_{i=1}^n X'_i V_i^{-1} X_i )^{-1} Σ_{i=1}^n X'_i V_i^{-1} y_i,

and its variance is

Var(b_GLS) = ( Σ_{i=1}^n X'_i V_i^{-1} X_i )^{-1}.

Similar to the fixed-effects model, it is easy to show that the MLE of β under multivariate normality is the same as the GLS estimator. For feasible estimates, we discuss likelihood-based methods for the estimation of the variance components. Substituting the GLS estimator b_GLS(τ), the concentrated log-likelihood function is

L(τ) = -(1/2) Σ_{i=1}^n [ T_i ln(2π) + ln det(V_i) + (y_i - X_i b_GLS)' V_i^{-1} (y_i - X_i b_GLS) ].

Viewing this as a function of τ, one can maximize the log-likelihood with respect to τ. This can be done using either Newton-Raphson or the Fisher scoring method. As in OLS regression, the MLEs of the variance components are biased downward. To mitigate the bias, one could employ restricted maximum likelihood by modifying the concentrated log-likelihood function to

L_REML(τ) = L(τ) - (1/2) ln det( Σ_{i=1}^n X'_i V_i^{-1} X_i ).
Now we examine the so-called error components model (or random intercept model), a special case that is important in actuarial science, where z_it = 1 and α_i is a scalar. See Sections 15.3 and 15.3.2 for more examples of this specification. The model becomes

y_it = α_i + x'_it β + ε_it.

The model has the same presentation as the basic fixed-effects model and assumes no serial correlation within each subject. The difference is that the subject-specific intercept α_i is assumed to be random with zero mean and variance σ_α². The error components model corresponds to the random sampling scheme where the subjects form a random subset from a population. One can show that the variance of subject i is

V_i = σ_α² J_{T_i} + σ² I_{T_i},

where J_{T_i} is a T_i × T_i matrix with all elements equal to one, I_{T_i} is a T_i-dimensional identity matrix, and σ² = Var(ε_it). Thus, the error components model is equivalent to the model with exchangeable (compound symmetry) serial correlation.
We implement the error components model EC.fit using function lme() in the nlme package. The argument random is used to specify the random effects in the mixed-effects model. Comparing with Table 15.2, we notice that estimates of β are the same as the model with the exchangeable serial correlation. The default uses the REML to estimate model parameters. The confidence intervals of variance components are calculated in a similar way as for models with serial correlation (see Section 15.2.3) and can be called by function intervals().
> library(nlme)
> # Error-components model
> EC.fit <- lme(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn, random=~1|TOWNCODE)
> summary(EC.fit)
Linear mixed-effects model fit by REML
Data: AutoClaimIn
AIC BIC logLik
1303.913 1321.606 -645.9566
Random effects:
Formula: ~1 | TOWNCODE
        (Intercept) Residual
StdDev:    18.05746 19.03756
Fixed effects: AC ~ lnPCI + lnPPSM + YEAR
Value Std.Error DF t-value p-value
(Intercept) 887.8878 206.81071 113 4.293239 0e+00
lnPCI -91.1979 20.41210 113 -4.467833 0e+00
lnPPSM 21.9614 5.07913 113 4.323844 0e+00
YEAR 3.9119 1.14457 113 3.417801 9e-04
Correlation:
(Intr) lnPCI lnPPSM
lnPCI -0.988
lnPPSM -0.249 0.096
YEAR 0.197 -0.205 -0.082
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.53017784 -0.61089180 0.01099886 0.50082006 2.91907172
Number of Observations: 145
Number of Groups: 29
> intervals(EC.fit, which="var-cov")
Approximate 95% confidence intervals
Random Effects:
Level: TOWNCODE
lower est. upper
sd((Intercept)) 12.93758 18.05746 25.20347
Within-group standard error:
lower est. upper
16.72928 19.03756 21.66434
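As a numerical check, plugging the REML estimates just reported into the error components structure V_i = σ_α² J + σ² I reproduces the exchangeable covariance matrix obtained earlier for SCex.fit:

```r
# Check: marginal covariance implied by the error components fit,
# V_i = sigma.alpha^2 * J + sigma^2 * I, with the REML estimates of EC.fit.
sigma.alpha <- 18.05746          # sd of the random intercept
sigma       <- 19.03756          # within-group (residual) sd
Ti <- 5
V <- sigma.alpha^2 * matrix(1, Ti, Ti) + sigma^2 * diag(Ti)
round(V[1:2, 1:2], 2)            # total variance 688.50, common covariance 326.07
```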
A relevant question to ask is whether the subject-specific effects are significant, or whether the intercepts take a common value. Because α_i is random, we wish to test the null hypothesis H_0: σ_α² = 0. Following Frees (2004), we consider the following procedure:
1. Run the pooled cross-sectional model Pool.fit and obtain the residuals e_it.
2. For each subject, compute
   s_i = [ (Σ_t e_it)² - Σ_t e²_it ] / ( T_i (T_i - 1) ).
3. Compute the test statistic
   TS = (1 / (2n)) [ N Σ_i sqrt(T_i(T_i - 1)) s_i / Σ_i Σ_t e²_it ]².
Under the null hypothesis, TS has an approximate chi-square distribution with one degree of freedom. In our example, the test statistic equals 56.85, and thus we reject the null hypothesis of a constant intercept.
> # Pooling test
> tcode <- unique(AutoClaimIn$TOWNCODE)
> n <- length(tcode)
> N <- nrow(AutoClaimIn)
> T <- rep(NA,n)
> s <- rep(NA,n)
> for (i in 1:n){
+   T[i] <- nrow(subset(AutoClaimIn,TOWNCODE==tcode[i]))
+   s[i] <- (sum(subset(AutoClaimIn,TOWNCODE==tcode[i])$rPool)^2 -
+     sum(subset(AutoClaimIn,TOWNCODE==tcode[i])$rPool^2))/T[i]/(T[i]-1)
+ }
> TS <- (sum(s*sqrt(T*(T-1)))*N/sum(AutoClaimIn$rPool^2))^2/2/n
> TS
[1] 56.85278
To incorporate serial correlation in the mixed-effects model, one could use the argument correlation in the lme() function. For example, in the model RE.fit, we use update() to include AR(1) temporal correlation in the error components model. Here we see that, with a subject-specific intercept, the serial correlation (-0.014) is not significant. The function getVarCov() can be used to output the variance-covariance matrix: the argument type="conditional" provides the estimate of R_i, and the argument type="marginal" provides the estimate of V_i. We further perform a likelihood ratio test for the serial correlation using anova. Consistently, the large p-value does not show support for serial correlation in the error components model. Note that we use method="ML" to get the true log-likelihood value for this test.
> # Error components model with AR(1)
> RE.fit <- update(EC.fit, correlation=corAR1(form=~1|TOWNCODE))
> summary(RE.fit)
Linear mixed-effects model fit by REML
Data: AutoClaimIn
AIC BIC logLik
1305.897 1326.538 -645.9484
Random effects:
Formula: ~1 | TOWNCODE
(Intercept) Residual
StdDev: 18.10974 18.9826
Correlation Structure: AR(1)
Formula: ~1 | TOWNCODE
Parameter estimate(s):
Phi
-0.01444735
Fixed effects: AC ~ lnPCI + lnPPSM + YEAR
Value Std.Error DF t-value p-value
(Intercept) 887.8789 206.74423 113 4.294577 0e+00
lnPCI -91.2038 20.40536 113 -4.469601 0e+00
lnPPSM 21.9669 5.07795 113 4.325938 0e+00
YEAR 3.9237 1.13499 113 3.457055 8e-04
Correlation:
(Intr) lnPCI lnPPSM
lnPCI -0.988
lnPPSM -0.249 0.096
YEAR 0.198 -0.207 -0.082
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.55033919 -0.60887177 0.02008323 0.49759528 2.91281638
Number of Observations: 145 Number of Groups: 29
> intervals(RE.fit, which="var-cov")
Approximate 95% confidence intervals
Random Effects:
Level: TOWNCODE
lower est. upper
sd((Intercept)) 12.96079 18.10974 25.30422
Correlation structure:
lower est. upper
Phi -0.2431935 -0.01444735 0.215821
attr(,"label")
[1] "Correlation structure:"
Within-group standard error:
lower est. upper
16.55969 18.98260 21.76003
> # Get variance components
> getVarCov(RE.fit)
Random effects variance covariance matrix
(Intercept)
(Intercept) 327.96
Standard Deviations: 18.11
> getVarCov(RE.fit, type="conditional")
TOWNCODE 10
Conditional variance covariance matrix
1 2 3 4 5
1 3.6034e+02 -5.2059000 0.075212 -0.0010866 1.5699e-05
2 -5.2059e+00 360.3400000 -5.205900 0.0752120 -1.0866e-03
3 7.5212e-02 -5.2059000 360.340000 -5.2059000 7.5212e-02
4 -1.0866e-03 0.0752120 -5.205900 360.3400000 -5.2059e+00
5 1.5699e-05 -0.0010866 0.075212 -5.2059000 3.6034e+02
Standard Deviations: 18.983 18.983 18.983 18.983 18.983
> getVarCov(RE.fit, type="marginal")
TOWNCODE 10
Marginal variance covariance matrix
1 2 3 4 5
1 688.30 322.76 328.04 327.96 327.96
2 322.76 688.30 322.76 328.04 327.96
3 328.04 322.76 688.30 322.76 328.04
4 327.96 328.04 322.76 688.30 322.76
5 327.96 327.96 328.04 322.76 688.30
Standard Deviations: 26.236 26.236 26.236 26.236 26.236
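As a check on the printed output, the marginal covariance matrix satisfies V_i = Z_i D Z_i' + R_i: the marginal diagonal (688.30) equals the random-intercept variance (327.96) plus the conditional residual variance (360.34). A one-line verification with the printed estimates:

```r
# Verify V_i = Z D Z' + R_i on the diagonal, using the printed estimates
d.hat  <- 327.96  # random-intercept variance from getVarCov(RE.fit)
r.diag <- 360.34  # conditional (residual) variance, type="conditional"
v.diag <- 688.30  # marginal variance, type="marginal"
d.hat + r.diag    # matches the marginal diagonal
```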
> # Likelihood ratio test
> EC.fit.ml <- lme(AC ~ lnPCI+lnPPSM+YEAR, data=AutoClaimIn,
+ random=~1|TOWNCODE, method="ML")
> RE.fit.ml <- update(EC.fit, correlation=corAR1(form=~1|TOWNCODE), method="ML")
> anova(EC.fit.ml, RE.fit.ml)
Model df AIC BIC logLik Test L.Ratio p-value
EC.fit.ml 1 6 1323.212 1341.073 -655.6062
RE.fit.ml 2 7 1325.171 1346.009 -655.5857 1 vs 2 0.04087198 0.8398
We conclude this section with the Hausman test. We have discussed the linear fixed-effects panel data model and the linear mixed-effects model. Both allow for subject-specific heterogeneity, but with different assumptions. An interesting question is how to choose between the two classes, that is, whether to treat α_i as fixed or random. A possible solution is the Hausman test (see Hausman (1978)), with test statistic given by
χ² = (β̂_FE − β̂_GLS)' [Var(β̂_FE) − Var(β̂_GLS)]⁻¹ (β̂_FE − β̂_GLS),
where β̂_FE and β̂_GLS denote the fixed-effects estimator and the random-effects (generalized least squares) estimator, respectively. We compare the test statistic with a quantile of a χ²(q) distribution, with q the dimension of β. A large value supports the fixed-effects estimator. As an example, we compare the basic fixed-effects model with the error-components model. The test statistic's observed value is 3.97, which is below the 95% quantile of a χ²(3) distribution (7.81), supporting the error-components formulation.
> # Hausman test
> Var.FE <- vcov(FE.fit)[-(1:n),-(1:n)]
> Var.EC <- vcov(EC.fit)[-1,-1]
> beta.FE <- coef(FE.fit)[-(1:n)]
> beta.EC <- fixef(EC.fit)[-1]
> ChiSq <- t(beta.FE-beta.EC)%*%solve(Var.FE-Var.EC)%*%(beta.FE-beta.EC)
> ChiSq
[,1]
[1,] 3.970489
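The corresponding p-value can be computed directly against the χ²(3) reference distribution (the statistic 3.97 and q = 3 come from the example above):

```r
# p-value of the Hausman statistic, chi-squared reference with
# q = 3 degrees of freedom (number of slope coefficients compared)
ChiSq <- 3.970489
q <- 3
p.value <- 1 - pchisq(ChiSq, df = q)
p.value  # well above 0.05: no evidence against the random-effects specification
```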
This section reviews prediction in mixed-effects models for longitudinal data (as discussed in Section 15.2.4). In the previous sections, we discussed estimation and inference for the fixed parameters β in the model. It is also of interest to summarize the subject-specific effects, described by the random variable α_i. For example, in credibility theory one is interested in the prediction of expected claims for a policyholder given his risk class. In doing so, we develop the best linear unbiased predictor (BLUP) of a random variable. Predictors are said to be linear if they are formed as a linear combination of the responses, and BLUPs are constructed by minimizing the mean squared error.
In a linear mixed-effects model y_i = Z_i α_i + X_i β + ε_i, we wish to predict a random variable w with E[w] = c'β and known Cov(w, y_i). Let b_GLS be the generalized least squares estimator of β; then the BLUP of w is
w_BLUP = c' b_GLS + Σ_{i=1}^n Cov(w, y_i) V_i⁻¹ (y_i − X_i b_GLS),
and its mean squared error follows from Var(w − w_BLUP). For example, consider the special case w = c_1' α_i + c_2' β, a linear combination of population parameters and subject-specific effects. Using the above relation, we can show that
w_BLUP = c_2' b_GLS + c_1' a_{i,BLUP}, with a_{i,BLUP} = D Z_i' V_i⁻¹ (y_i − X_i b_GLS),
where D = Var(α_i). Taking c_2 = 0, we further have the BLUP of α_i: a_{i,BLUP}. Another special case that is useful for diagnostics is the residual ε_it. In this case we have c = 0, and its BLUP is straightforwardly shown to be
e_{i,BLUP} = y_i − Z_i a_{i,BLUP} − X_i b_GLS.
Some special cases of BLUPs are available in the nlme package. For the error-components model EC.fit, the function ranef() can be used to get the BLUP of the random intercept a_{i,BLUP}, and the function residuals() can be used to get the BLUP of the residuals e_{it,BLUP} and its standardized version.
> # BLUP
> alpha.BLUP <- ranef(EC.fit)
> beta.GLS <- fixef(EC.fit)
> resid.BLUP <- residuals(EC.fit, type="response")
> rstandard.BLUP <- residuals(EC.fit, type="normalized")
> alpha.BLUP
(Intercept)
10 -0.2049993
11 -6.9197373
12 17.7349235
13 20.9538588
14 -0.1942180
15 -5.6464625
et cetera
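In the error-components model, the BLUP of the random intercept reduces to a familiar credibility (shrinkage) form, a_{i,BLUP} = ζ_i (ȳ_i − x̄_i' b_GLS) with ζ_i = T_i σ_α² / (T_i σ_α² + σ²). A minimal sketch with hypothetical variance components and a hypothetical mean residual (not the AutoClaimIn estimates) illustrates the shrinkage:

```r
# Shrinkage form of the random-intercept BLUP in an error-components
# model (sketch; all inputs are hypothetical illustration values)
sigma2.alpha <- 18.04^2  # between-subject variance (hypothetical)
sigma2.eps   <- 19.04^2  # within-subject variance (hypothetical)
T.i          <- 5        # number of repeated observations on subject i
rbar.i       <- -10      # mean GLS residual ybar_i - xbar_i' b_GLS (hypothetical)

zeta.i <- T.i * sigma2.alpha / (T.i * sigma2.alpha + sigma2.eps)
a.blup <- zeta.i * rbar.i
zeta.i  # credibility factor, between 0 and 1
a.blup  # shrunken prediction, closer to 0 than the raw mean residual
```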
To conclude this section, we compare the performance of the alternative models using the automobile insurance data. Our interest is in predicting the expected claims of each policyholder in the next year, so the quantity of interest is E(y_{i,T_i+1} | α_i) = α_i + x'_{i,T_i+1} β. The corresponding BLUP is
ŷ_{i,T_i+1} = a_{i,BLUP} + x'_{i,T_i+1} b_GLS.
Recall that we developed the various longitudinal data models using data of the years 1993-1997, and that we use the data of year 1998 to validate the predictions. Table 15.3 presents the performance of the various longitudinal data models based on both in-sample and out-of-sample data. For the in-sample data, we report the information-based model selection criteria AIC and BIC. For the out-of-sample data, we report the sum of squared prediction errors (SSPE) and the sum of absolute prediction errors (SAPE). The results show that models that account for subject-specific effects perform better, regardless of the way heterogeneity is accommodated.
> # Use data of year 1998 for validation
> AutoClaimOut <- subset(AutoClaim, YEAR == 1998)
> # Define new variables
> AutoClaimOut$lnPCI <- log(AutoClaimOut$PCI)
> AutoClaimOut$lnPPSM <- log(AutoClaimOut$PPSM)
> AutoClaimOut$YEAR <- AutoClaimOut$YEAR-1992
> # Compare models Pool.fit, SCar.fit, FE.fit, EC.fit, RE.fit and FEar.fit
> # Fixed-effects model with AR(1)
> FEar.fit <- gls(AC ~ factor(TOWNCODE)+lnPCI+lnPPSM+YEAR-1,
+ data=AutoClaimIn, correlation=corAR1(form=~1|TOWNCODE))
> FEar.fit.ml <- gls(AC ~ factor(TOWNCODE)+lnPCI+lnPPSM+YEAR-1,
+ data=AutoClaimIn, correlation=corAR1(form=~1|TOWNCODE), method="ML")
> # Prediction
> Xmat <- cbind(rep(1,nrow(AutoClaimOut)),AutoClaimOut$lnPCI,
+ AutoClaimOut$lnPPSM,AutoClaimOut$YEAR)
> beta.Pool <- coef(Pool.fit)
> pred.Pool <- Xmat%*%beta.Pool
> MSPE.Pool <- sum((pred.Pool - AutoClaimOut$AC)^2)
> MAPE.Pool <- sum(abs(pred.Pool - AutoClaimOut$AC))
> beta.SCar <- coef(SCar.fit)
> pred.SCar <- Xmat%*%beta.SCar
> MSPE.SCar <- sum((pred.SCar - AutoClaimOut$AC)^2)
> MAPE.SCar <- sum(abs(pred.SCar - AutoClaimOut$AC))
> beta.FE <- coef(FE.fit)[-(1:29)]
> pred.FE <- coef(FE.fit)[1:29] + Xmat[,-1]%*%beta.FE
> MSPE.FE <- sum((pred.FE - AutoClaimOut$AC)^2)
> MAPE.FE <- sum(abs(pred.FE - AutoClaimOut$AC))
> beta.FEar <- coef(FEar.fit)[-(1:29)]
> pred.FEar <- coef(FEar.fit)[1:29] + Xmat[,-1]%*%beta.FEar
> MSPE.FEar <- sum((pred.FEar - AutoClaimOut$AC)^2)
> MAPE.FEar <- sum(abs(pred.FEar - AutoClaimOut$AC))
> alpha.EC <- ranef(EC.fit)
> beta.EC <- fixef(EC.fit)
> pred.EC <- alpha.EC[,1] + Xmat%*%beta.EC
> MSPE.EC <- sum((pred.EC - AutoClaimOut$AC)^2)
> MAPE.EC <- sum(abs(pred.EC - AutoClaimOut$AC))
> alpha.RE <- ranef(RE.fit)
> beta.RE <- fixef(RE.fit)
> pred.RE <- alpha.RE[,1] + Xmat%*%beta.RE
> MSPE.RE <- sum((pred.RE - AutoClaimOut$AC)^2)
> MAPE.RE <- sum(abs(pred.RE - AutoClaimOut$AC))
Table 15.3: Comparison of alternative models.

                                      In-Sample            Out-of-Sample
Model                                 AIC       BIC        SSPE       SAPE
Pooled cross-sectional model          1359.50   1374.38    22201.78   681.25
Pooled cross-sectional with AR(1)     1339.34   1357.21    21242.64   658.98
Fixed-effects model                   1293.33   1391.56    21506.07   660.59
Fixed-effects with AR(1)              1286.03   1387.24    21573.79   662.04
Error-components model                1323.21   1341.07    19515.86   619.44
Error-components with AR(1)           1325.17   1346.01    19572.94   620.64
As in the previous section, we have a dataset at our disposal consisting of n subjects, where for each subject i, T_i observations are available. Relevant examples in experience rating are (among others) a dataset with n policyholders followed over time, for which claim counts and severities are registered during each time period under consideration. As explained in Section 15.1 and demonstrated in Section 15.2 for linear models, we extend the GLMs discussed in Chapter 14 by including subject- (or, policyholder-) specific random effects. The random effects structure the correlation between observations registered on the same subject, and also take into account the heterogeneity among subjects due to unobserved characteristics. Therefore, our approach is in line with the random-effects approach discussed in Section 15.2.4. Other methods exist for the analysis of longitudinal data in the framework of generalized linear models (the so-called marginal and conditional models; see Verbeke & Molenberghs (2000) and Antonio & Zhang (2014) for a discussion), but those will not be covered here.
Given the vector b_i with the random effects for subject i, the repeated measurements Y_i1, ..., Y_iT_i are assumed to be independent, with a density from the exponential family,
f(y_it | b_i, β, φ) = exp( (y_it θ_it − ψ(θ_it))/φ + c(y_it, φ) ), t = 1, ..., T_i.
Some explicit examples follow in the illustrations discussed below. Similar to the expressions obtained in Chapter 14, the following (conditional) relations hold:
μ_it = E[Y_it | b_i], with g(μ_it) = x'_it β + z'_it b_i and Var[Y_it | b_i] = φ V(μ_it),
where g(.) is called the link function and V(.) the variance function. β (p × 1) denotes the fixed-effects parameter vector (governing a priori rating) and b_i (q × 1) the random-effects vector; x_it (p × 1) and z_it (q × 1) contain subject i's covariate information for the fixed and random effects, respectively. The specification of the GLMM is completed by assuming that the random effects b_i (i = 1, ..., n) are mutually independent and identically distributed with a density function f(b_i | ν). Herewith, ν denotes the unknown parameters in the density. In general statistics, the random effects often have a (multivariate) normal distribution with zero mean and covariance matrix determined by ν. Observations on the same subject are dependent because they share the same random effects b_i.
The likelihood function for the unknown parameters β, ν, and φ then becomes
L(β, ν, φ; y) = ∏_{i=1}^n ∫ ∏_{t=1}^{T_i} f(y_it | b_i, β, φ) f(b_i | ν) db_i, (15.15)
where the integral is with respect to the q-dimensional vector b_i. For instance, with normally distributed data and random effects (our setting in Section 15.2), the integral can be worked out analytically and explicit expressions follow for the maximum likelihood estimator of β and the Best Linear Unbiased Predictor ('BLUP') for b_i. For more general GLMMs, however, approximations to the likelihood or numerical integration techniques are required to maximize Equation (15.15) with respect to the unknown parameters. Such techniques are discussed (and demonstrated) in Antonio & Zhang (2014) (and references therein).
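To make the integral in Equation (15.15) concrete, the following sketch evaluates the marginal likelihood contribution of a single subject in a Poisson GLMM with a normal random intercept by numerical integration; all data and parameter values are hypothetical:

```r
# Marginal likelihood contribution of one subject in a Poisson GLMM
# with random intercept b ~ N(0, sigma^2), by numerical integration.
# Hypothetical data and parameters, for illustration only.
y     <- c(0, 1, 2)           # claim counts in T = 3 periods
eta   <- c(-1.0, -0.8, -0.9)  # linear predictors x_it' beta (hypothetical)
sigma <- 0.5                  # st. dev. of the random intercept

integrand <- function(b) {
  # product over time of Poisson densities, times the normal density of b
  sapply(b, function(bi) {
    prod(dpois(y, lambda = exp(eta + bi))) * dnorm(bi, 0, sigma)
  })
}
L.i <- integrate(integrand, lower = -Inf, upper = Inf)$value
L.i  # marginal probability of observing (0, 1, 2) for this subject
```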
To illustrate the concepts described above, we now consider a Poisson GLMM with normally distributed random intercept, that is, a Poisson error components model. This GLMM allows explicit calculation of the marginal mean and covariance matrix. In this way, one can clearly see how the inclusion of the random effect leads to overdispersion and within-subject covariance.
Example 15.1 (A Poisson GLMM) Let N_it denote the claim frequency registered in year t for policyholder i. Assume that, conditional on b_i, N_it follows a Poisson distribution with mean
E[N_it | b_i] = exp(x'_it β + b_i), with b_i ~ N(0, σ²).
Straightforward calculations lead to
E[N_it] = exp(x'_it β + σ²/2),
Var[N_it] = E[N_it] (1 + E[N_it](exp(σ²) − 1)), (15.16)
and
Cov[N_it, N_is] = E[N_it] E[N_is] (exp(σ²) − 1), t ≠ s. (15.17)
Hereby, we used the expressions for the mean and variance of a log-normal distribution. In the expression for the covariance, we used the fact that, given the random effect b_i, N_it and N_is are independent. We see that the expression inside the parentheses in Equation (15.16) is always bigger than 1. Thus, although N_it | b_i follows a regular Poisson distribution, the marginal distribution of N_it is overdispersed. According to Equation (15.17), due to the random intercept, observations on the same subject are no longer independent.
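The overdispersion induced by the random intercept can be checked numerically. The sketch below (with hypothetical parameter values) compares simulated moments of N_it with the log-normal-mixture formulas:

```r
# Overdispersion in a Poisson GLMM with normal random intercept (sketch;
# hypothetical linear predictor and variance component).
set.seed(1)
eta    <- 0     # x_it' beta (hypothetical)
sigma2 <- 0.25  # variance of b_i (hypothetical)

m <- exp(eta + sigma2 / 2)            # theoretical E[N_it]
v <- m * (1 + m * (exp(sigma2) - 1))  # theoretical Var[N_it]; factor in () > 1

b <- rnorm(1e5, 0, sqrt(sigma2))
N <- rpois(1e5, exp(eta + b))
c(theoretical.mean = m, simulated.mean = mean(N))
c(theoretical.var = v, simulated.var = var(N))
```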
Example 15.2 (A Poisson GLMM — continued) Let N_it again denote the claim frequency for policyholder i in year t. Assume that, conditional on Θ_i, N_it follows a Poisson distribution with mean λ_it Θ_i, where λ_it = exp(x'_it β), and that E[Θ_i] = 1. This re-parameterization is commonly used in ratemaking. Indeed, we now get
E[N_it | Θ_i] = λ_it Θ_i
and
E[N_it] = E[E[N_it | Θ_i]] = λ_it.
This specification shows that the a priori premium, given by λ_it, is correct on average. The a posteriori correction to this premium is determined by Θ_i. Besides the log-normal distribution from the above examples, other mixing distributions can be used. In the Poisson-Gamma framework, for instance, the conjugacy of these distributions allows for explicit calculation of the predictive premium.
Example 15.3 (A Poisson-Gamma rating model) Assume that, conditional on Θ_i, N_it follows a Poisson distribution with mean λ_it Θ_i, and that Θ_i follows a Gamma(a, a) distribution, so that E[Θ_i] = 1.
It follows that E[N_it] = λ_it and, with Θ_i ~ Gamma(a, a), the resulting joint, unconditional distribution then becomes
P(N_i1 = n_i1, ..., N_iT_i = n_iT_i) = (∏_t λ_it^{n_it} / n_it!) (Γ(a + n_i·) / Γ(a)) a^a / (a + λ_i·)^{a + n_i·}, (15.20)
with n_i· = Σ_t n_it and λ_i· = Σ_t λ_it. For the specification in Equation (15.20), the posterior distribution of the random intercept Θ_i, given the observed claim counts, is again a Gamma distribution:
Θ_i | n_i1, ..., n_iT_i ~ Gamma(a + n_i·, a + λ_i·).
The (conditional) mean and variance of this posterior distribution are given, respectively, by
E[Θ_i | n_i1, ..., n_iT_i] = (a + n_i·)/(a + λ_i·) and Var[Θ_i | n_i1, ..., n_iT_i] = (a + n_i·)/(a + λ_i·)².
This leads to the following a posteriori premium for year T_i + 1:
λ_{i,T_i+1} (a + n_i·)/(a + λ_i·).
The above credibility premium is optimal when a quadratic loss function is used. Indeed, as is known in mathematical statistics, the conditional expectation minimizes a mean squared error criterion.
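The Poisson-Gamma a posteriori premium can be computed directly. The sketch below uses hypothetical values for a, the past claim counts, and the a priori frequencies:

```r
# A posteriori (credibility) premium in the Poisson-Gamma model (sketch;
# all numerical inputs are hypothetical illustration values).
a        <- 0.8888            # Gamma shape parameter (hypothetical)
n.obs    <- c(0, 1, 0, 0, 1)  # claims reported in years 1..5
lam      <- rep(0.1474, 5)    # a priori expected frequencies, years 1..5
lam.next <- 0.1474            # a priori frequency for year 6

post.mean <- (a + sum(n.obs)) / (a + sum(lam))  # E[Theta | claim history]
premium   <- lam.next * post.mean               # a posteriori premium
c(post.mean = post.mean, premium = premium)     # post.mean > 1: a malus applies
```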
Experience rating based on multilevel (panel or higher-order) models poses a challenge to the insurer when it comes to communicating the predictive results of these models to policyholders. Customers may find such models difficult to understand: it is not readily transparent to an ordinary policyholder how the surcharges (maluses) for reported claims and the discounts (bonuses) for claim-free periods are evaluated. In order to establish an experience rating system in which insureds can easily understand the effect of reported claims or claim-free periods, Bonus-Malus scales have been developed. We develop a case study (using R) of such scales in Section 15.3.2.
We now demonstrate how the statistical models from Section 15.3 allow us to develop a specific type of experience rating system, namely a Bonus-Malus (BM) scale. This type of experience rating is very common in motor (or vehicle) insurance. See Lemaire (1984) and Denuit et al. (2007) for detailed discussions. In a BM scale, an a priori tariff is adjusted based on the claim history of a policyholder. A 'good' history creates a bonus, and therefore a premium reduction; a 'bad' performance causes a malus, penalizing the policyholder with a premium increase. We closely follow Denuit et al. (2007) in this section, and extend the discussion in Antonio & Valdez (2012) with an implementation in R of a simple BM scale.
Experience rating with a BM scale is appealing from a commercial and communication point of view. An insurer can easily explain to a customer how the claims reported in year t will change the premium applicable to year t + 1 for automobile insurance. To discuss the probabilistic, statistical, as well as computational aspects of Bonus-Malus scales, a credibility model similar to the one in Example 15.3 is assumed. Let N_it denote the number of claims registered for policyholder i in year t. Our credibility model is structured as follows: conditional on Θ_i, the N_it are independent and Poisson distributed with mean λ_it Θ_i, and Θ_i follows a Gamma(a, a) distribution.
A BM scale consists of a certain number of levels, say s + 1, numbered 0, ..., s, with 0 being the best level. Let ℓ₀ be the entrance level of a new driver. According to the number of claims reported during the insured period, drivers move up and down the scale. A claim-free year results in a bonus point, which implies that the driver goes one level down. Claims are penalized by malus points, meaning that for each claim filed the driver goes up a certain number of levels, denoted with pen (for penalty). We introduce a set of random variables that allows us to describe the technicalities of a BM scale. L_k represents the level occupied by the driver in the time interval (k, k + 1). Thus, L_k takes a value in {0, 1, ..., s}, and {L_k, k ≥ 0} is the driver's trajectory over time. With N_k the number of claims reported by the insured in the period (k − 1, k), the future level L_k of an insured is obtained from the present level L_{k−1} and the number of claims N_k reported during the present year. We recognize the so-called Markov property: the future depends on the present but not on the past. The relativity r_ℓ associated with each level ℓ in the scale determines the premium discount/penalty awarded to the driver. A policyholder who has at present
Table 15.4: Transitions in the (-1/top scale) BM system.

Starting   Level occupied if 0      Level occupied if ≥ 1
level      claims are reported      claim is reported
0          0                        5
1          0                        5
2          1                        5
3          2                        5
4          3                        5
5          4                        5
a priori premium (determined using the techniques from Chapter 14) and who is in level ℓ has to pay r_ℓ times this a priori premium. With r_ℓ < 1 the driver receives a discount, based on a favorable record of past claims. When r_ℓ > 1, the driver is penalized for his past performance. The relativities, together with the transition rules in the scale, are the commercial alternative for the credibility-type corrections to an a priori tariff, as discussed above. We want to demonstrate in this section the calculation of these relativities for a given portfolio and BM scale.
Example 15.3 (-1/Top Scale) We consider a very simple example of a BM scale to illustrate the concepts: the (-1/top scale). See Denuit et al. (2007) for more realistic examples. This scale has six levels, numbered 0, 1, ..., 5. The starting level is 5. Each claim-free year is rewarded by one bonus class. When an accident is reported, the policyholder is transferred to level 5. Table 15.4 represents these transitions.
To enable the calculation of the relativity corresponding with each level ℓ, some probabilistic concepts associated with BM scales must be introduced. The transition rules corresponding with a certain BM scale are described by indicator variables
t_ij(k) = 1 if the policy transfers from level i to level j when k claims are reported, and t_ij(k) = 0 otherwise. (15.25)
We define the transition matrix T(k), with k the number of claims reported by the driver, as
T(k) = [t_ij(k)], i, j = 0, ..., s.
Thus, this matrix is a 0-1 matrix and each row has exactly one 1.
Assuming the annual claim numbers N_k are independent and Poisson(θ) distributed, the trajectory this driver follows through the scale, {L_k, k ≥ 0}, is a Markov chain. The transition probability of this driver going from level ℓ₁ to level ℓ₂ in a single step is
p_{ℓ₁ℓ₂}(θ) = Σ_{k=0}^∞ (exp(−θ) θ^k / k!) t_{ℓ₁ℓ₂}(k),
where we used the independence of the claim numbers and L_k. In matrix form, the one-step transition matrix P(θ) is given by
P(θ) = Σ_{k=0}^∞ (exp(−θ) θ^k / k!) T(k).
The probability of being transferred from level ℓ₁ to level ℓ₂ in n steps is expressed by the n-step transition probability p^(n)_{ℓ₁ℓ₂}(θ), which composes the n-step transition matrix P^(n)(θ). The following relation holds between the 1- and n-step transition matrices:
P^(n)(θ) = (P(θ))^n.
Ultimately, the BM system will stabilize and the proportion of policyholders occupying each level of the scale will remain unchanged. These proportions are captured in the stationary distribution π(θ) = (π_0(θ), ..., π_s(θ))', defined as
π_{ℓ₂}(θ) = lim_{n→∞} p^(n)_{ℓ₁ℓ₂}(θ),
which does not depend on the starting level ℓ₁. Correspondingly, P^(n)(θ) converges to Π(θ), defined as the matrix whose rows all equal π(θ)'.
For the BM scale introduced in Example 15.3, the transition matrices T(k) and the one-step transition probabilities follow directly from the rules in Table 15.4.
In R, we specify this one-step transition matrix P as follows:
Pmatrix =
function(th) {
P = matrix(nrow=6,ncol=6,data=0)
P[1,1]=P[2,1]=P[3,2]=P[4,3]=P[5,4]=P[6,5]= exp(-th)
P[,6] = 1-exp(-th)
return(P)}
Using a result from Rolski et al. (1999) (also see Denuit et al. (2007)), the stationary distribution can be obtained as π'(θ) = e'(I − P(θ) + E)⁻¹, with e a vector of ones, I the identity matrix, and E the matrix with all entries equal to 1.
We specify the stationary distribution of the (— 1/Top Scale) in R:
lim.distr = function(matrix) {
et = matrix(nrow=1, ncol=dim(matrix)[2], data=1)
E = matrix(nrow=dim(matrix)[1], ncol=dim(matrix)[2], data=1)
mat = diag(dim(matrix)[1]) - matrix + E
inverse.mat = solve(mat)
p = et %*% inverse.mat
return(p)}
For instance, with θ = 0.1 (as in the example of Denuit et al. (2007), page 180, Example 4.9), the stationary distribution becomes
In R, we use the following instructions:
> P = Pmatrix(0.1)
> P
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.9048374 0.0000000 0.0000000 0.0000000 0.0000000 0.09516258
[2,] 0.9048374 0.0000000 0.0000000 0.0000000 0.0000000 0.09516258
[3,] 0.0000000 0.9048374 0.0000000 0.0000000 0.0000000 0.09516258
[4,] 0.0000000 0.0000000 0.9048374 0.0000000 0.0000000 0.09516258
[5,] 0.0000000 0.0000000 0.0000000 0.9048374 0.0000000 0.09516258
[6,] 0.0000000 0.0000000 0.0000000 0.0000000 0.9048374 0.09516258
> pi = lim.distr(P)
> pi
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.6065307 0.06378939 0.07049817 0.07791253 0.08610666 0.09516258
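We can also verify numerically that the rows of P^(n)(θ) = (P(θ))^n converge to the stationary distribution. The sketch below is self-contained (it repeats the Pmatrix and lim.distr definitions from above):

```r
# Convergence of the n-step transition matrix of the (-1/top scale)
# to its stationary distribution, theta = 0.1.
Pmatrix <- function(th) {
  P <- matrix(nrow = 6, ncol = 6, data = 0)
  P[1,1] <- P[2,1] <- P[3,2] <- P[4,3] <- P[5,4] <- P[6,5] <- exp(-th)
  P[,6] <- 1 - exp(-th)
  P
}
lim.distr <- function(P) {
  et <- matrix(1, nrow = 1, ncol = ncol(P))
  E  <- matrix(1, nrow = nrow(P), ncol = ncol(P))
  et %*% solve(diag(nrow(P)) - P + E)
}
P       <- Pmatrix(0.1)
pi.stat <- lim.distr(P)
Pn <- diag(6)
for (k in 1:200) Pn <- Pn %*% P   # P^200
max(abs(Pn - matrix(as.numeric(pi.stat), 6, 6, byrow = TRUE)))  # close to 0
```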
The calculation of the relativities in a BM scale reveals some similarities with explicit credibility-type calculations. Following Norberg (1976), with the number of levels and the transition rules being fixed, the optimal relativity r_ℓ corresponding with level ℓ is determined by maximizing the asymptotic predictive accuracy. This implies that one tries to minimize
E[(Θ − r_L)²],
the expected squared difference between the relativity r_L and the 'true' relative premium Θ, under the assumptions of our credibility model. Simplifying the notation in this model, the a priori premium of a random policyholder is denoted with Λ and the residual effect of unknown risk characteristics with Θ. The policyholder then has (unknown) annual expected claim frequency ΛΘ, where Λ and Θ are assumed to be independent. The weights of the different risk classes follow from the a priori system, with w_k = P(Λ = λ_k).
Calculation of E[(Θ − r_L)²] goes as follows:
E[(Θ − r_L)²] = Σ_k w_k ∫ Σ_{ℓ=0}^s (θ − r_ℓ)² π_ℓ(λ_k θ) dF(θ),
where π_ℓ(λ_kθ) is the stationary probability that a driver with annual expected claim frequency λ_kθ occupies level ℓ. In the last step of the derivation, conditioning is on Λ. It is straightforward to obtain the optimal relativities by solving ∂E[(Θ − r_L)²]/∂r_ℓ = 0 for each level ℓ. Alternatively, from mathematical statistics it is well known that for a quadratic loss function (see Equation (15.39)) the optimal choice is r_ℓ = E[Θ | L = ℓ]. This is calculated as follows:
r_ℓ = E[Θ | L = ℓ] = (Σ_k w_k ∫ θ π_ℓ(λ_k θ) dF(θ)) / (Σ_k w_k ∫ π_ℓ(λ_k θ) dF(θ)),
where the relation P(L = ℓ | Λ = λ_k, Θ = θ) = π_ℓ(λ_kθ) is used. When no a priori rating system is used, all the λ_k's are equal (estimated by λ̂) and the relativities reduce to
r_ℓ = (∫ θ π_ℓ(λ̂ θ) dF(θ)) / (∫ π_ℓ(λ̂ θ) dF(θ)).
Calculation of these relativities in R goes as follows. We replicate Example 4.11 from Denuit et al. (2007), where no a priori rating is used. This example uses a Γ(a, a) distribution for the policyholder-specific random effect (as in Example 15.3), with â = 0.8888 and λ̂ = 0.1474. Those estimates are obtained by calibrating a Negative Binomial distribution on the data from Portfolio A in Denuit et al. (2007) (see Section 1.6, pages 44-45, in the book). The data in Portfolio A are the claim counts registered on 14,505 policies during calendar year 1997.
### Without a priori ratemaking
a.hat = 0.8888
lambda.hat = 0.1474
int1 =
function(theta, s, a, lambda) {
a = a.hat
lambda = lambda.hat
f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
p = lim.distr(Pmatrix((lambda*theta)))
return(theta*p[1,s+1]*f.dist)}
P1 = matrix(nrow=1, ncol=6, data=0)
for (i in 0:5) P1[1,i+1] = integrate(Vectorize(int1),lower=0,upper=Inf,s=i)$value
int2 =
function(theta, s, a, lambda) {
a = a.hat
lambda = lambda.hat
f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
p = lim.distr(Pmatrix((lambda*theta)))
return(p[1,s+1]*f.dist)}
P2 = matrix(nrow=1, ncol=6, data=0)
for (i in 0:5) P2[1,i+1] = integrate(Vectorize(int2),lower=0,upper=Inf,s=i)$value
R = P1 / P2
> R # relativities without a priori rating
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.5466848 1.21958 1.348203 1.507254 1.709032 1.973534
To demonstrate the calculation of relativities when accounting for a priori rating, we again use the Portfolio A data from Denuit et al. (2007), with the λ_k's and weights w_k printed in Table 2.7 (page 91) of the book. Here λ_k is the a priori annually expected claim frequency for risk class k, as determined by a set of a priori observed risk factors. The selection of risk factors and estimated annual claim frequencies are obtained by fitting a Negative Binomial regression model to the Portfolio A data. Negative Binomial regression for a single year of data on observed claim counts, say n_i with i = 1, ..., N, is based on the following likelihood:
L(β, a) = ∏_{i=1}^N (Γ(a + n_i) / (Γ(a) n_i!)) (a / (a + λ_i))^a (λ_i / (a + λ_i))^{n_i},
where λ_i = d_i exp(x'_i β) (with d_i the exposure registered for policyholder i). Negative Binomial regression is available in R through the glm.nb() function in the MASS package.
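As an illustration of glm.nb() with an exposure offset, the sketch below fits a Negative Binomial regression on simulated data; all parameter values are hypothetical and unrelated to the Portfolio A estimates:

```r
# Negative Binomial regression with log-exposure offset (sketch on
# simulated data; parameter values are hypothetical).
library(MASS)
set.seed(42)
N <- 5000
x <- rbinom(N, 1, 0.5)           # a single binary rating factor
d <- runif(N, 0.5, 1)            # exposures
lambda <- d * exp(-2 + 0.5 * x)  # true expected frequencies
n.claims <- rnbinom(N, size = 1.5, mu = lambda)  # overdispersed counts

fit <- glm.nb(n.claims ~ x + offset(log(d)))
coef(fit)  # estimates should lie close to the true values (-2, 0.5)
```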
lambda = c(0.1176,0.1408,0.1897,0.2272,0.1457,0.1746,0.2351,0.2816,
0.1761,0.2109,0.2840,0.3402,0.2182,0.2614,0.3520,0.0928,
0.1112,0.1498,0.1794,0.1151,0.1378,0.1856,0.2223)
weights = c(0.1049,0.1396,0.0398,0.0705,0.0076,0.0122,0.0013,0.0014,
0.0293,0.0299,0.0152,0.0242,0.0007,0.0009,0.0002,0.1338,
0.1973,0.0294,0.0661,0.0372,0.0517,0.0025,0.0044)
a = 1.065
n = length(weights)
int3 =
function(theta, lambda, a, l) {
p = lim.distr(Pmatrix(lambda*theta))
f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
return(theta*p[1,l+1]*f.dist)}
int4 =
function(theta, lambda, a, l) {
p = lim.distr(Pmatrix(lambda*theta))
f.dist = gamma(a)^(-1) * a^a * theta^(a-1) * exp(-a*theta)
return(p[1,l+1]*f.dist)}
teller1 = noemer = array(dim=6, data=0)
for (i in 0:5) {
b = c = array(dim=n, data=0)
for (j in 1:n) {
b[j] = integrate(Vectorize(int3),lower=0, upper=Inf,lambda=lambda[j],a=a,l=i)$value
c[j] = integrate(Vectorize(int4),lower=0, upper=Inf,lambda=lambda[j],a=a,l=i)$value}
teller1[i+1] = b %*% weights
noemer[i+1] = c %*% weights
}
R = teller1/noemer
> R # relativities with a priori rating
[1] 0.6118907 1.2088841 1.3124752 1.4388207 1.5985014 1.8123074
Summarizing, we obtain the relativities displayed in Table 15.5 (with and without a priori rating) for the (-1/top scale) and the Portfolio A data from Denuit et al. (2007). The a posteriori corrections are less severe when a priori rating is taken into account.
Table 15.5: Numerical characteristics for the (-1/top scale) and Portfolio A data from Denuit et al. (2007), without and with a priori rating taken into account.

Level ℓ   r_ℓ = E[Θ | L = ℓ],      r_ℓ = E[Θ | L = ℓ],
          without a priori         with a priori
5         197.3%                   181.2%
4         170.9%                   159.9%
3         150.7%                   143.9%
2         134.8%                   131.3%
1         122.0%                   120.9%
0         54.7%                    61.2%