In this and Chapter 18, we consider a model-based approach to the solution of the credibility problem. This approach, referred to as greatest accuracy credibility theory, is the outgrowth of a classic 1967 paper by Bühlmann [19]. Many of the ideas are also found in Whitney [131] and Bailey [9].
We return to the basic problem. For a particular policyholder, we have observed $n$ exposure units of past claims $\mathbf{x} = (x_1, \ldots, x_n)^T$. We have a manual rate $\mu$ (we no longer use $M$ for the manual rate) applicable to this policyholder, but the past experience indicates that it may not be appropriate ($\bar{x} = (x_1 + \cdots + x_n)/n$, as well as $E(\bar{X})$, could be quite different from $\mu$). This difference raises the question of whether next year's net premium (per exposure unit) should be based on $\mu$, on $\bar{x}$, or on a combination of the two.
The insurer needs to consider the following question: Is the policyholder really different from what has been assumed in the calculation of $\mu$, or is it just random chance that is responsible for the difference between $\mu$ and $\bar{x}$?
While it is difficult to definitively answer that question, it is clear that no underwriting system is perfect. The manual rate has presumably been obtained by (a) evaluation of the underwriting characteristics of the policyholder and (b) assignment of the rate on the basis of inclusion of the policyholder in a rating class. Such a class should include risks with similar underwriting characteristics. In other words, the rating class is viewed as homogeneous with respect to the underwriting characteristics used. Surely, not all risks in the class are truly homogeneous, however. No matter how detailed the underwriting procedure, there still remains some heterogeneity with respect to risk characteristics within the rating class (good and bad risks, relatively speaking).
Thus, it is possible that the given policyholder may be different from what has been assumed. If this is the case, how should an appropriate rate for the policyholder be determined?
To proceed, let us assume that the risk level of each policyholder in the rating class may be characterized by a risk parameter $\theta$ (possibly vector valued), but that the value of $\theta$ varies by policyholder. This assumption allows us to quantify the differences between policyholders with respect to the risk characteristics. Because all observable underwriting characteristics have already been used, $\theta$ may be viewed as representative of the residual, unobserved factors that affect the risk level. Consequently, we shall assume the existence of $\theta$, but we shall further assume that it is not observable and that we can never know its true value.
Because $\theta$ varies by policyholder, there is a probability distribution with pf $\pi(\theta)$ of these values across the rating class. Thus, if $\theta$ is a scalar parameter, the cumulative distribution function $\Pi(\theta)$ may be interpreted as the proportion of policyholders in the rating class with risk parameter less than or equal to $\theta$. (In statistical terms, $\Theta$ is a random variable with distribution function $\Pi(\theta) = \Pr(\Theta \le \theta)$.) Stated another way, $\Pi(\theta)$ represents the probability that a policyholder picked at random from the rating class has a risk parameter less than or equal to $\theta$ (to accommodate the possibility of new insureds, we slightly generalize the “rating class” interpretation to include the population of all potential risks, whether insured or not).
While the value of $\theta$ associated with an individual policyholder is not (and cannot be) known, we assume (for this chapter) that $\Pi(\theta)$ is known. That is, the structure of the risk characteristics within the population is known. This assumption can be relaxed, and we shall decide later how to estimate the relevant characteristics of $\Pi(\theta)$.
Because risk levels vary within the population, it is clear that the experience of the policyholder varies in a systematic way with $\theta$. Imagine that the experience of a policyholder picked (at random) from the population arises from a two-stage process. First, the risk parameter $\theta$ is selected from the distribution $\Pi(\theta)$. Then the claims or losses $X$ arise from the conditional distribution of $X$ given $\Theta = \theta$, $f_{X|\Theta}(x|\theta)$. Thus the experience varies with $\theta$ via the distribution given the risk parameter $\theta$. The distribution of claims thus differs from policyholder to policyholder to reflect the differences in the risk parameters.
The formulation of the problem just presented involves the use of conditional distributions, given the risk parameter of the insured. Subsequent analyses of mathematical models of this nature will be seen to require a good working knowledge of conditional distributions and conditional expectation. A discussion of these topics is now presented.
Much of the material is review and, hence, may be skimmed if you have a good background in probability. Nevertheless, it may contain some material you have not seen before, so this section should not be ignored entirely.
Suppose that $X$ and $Y$ are two random variables with joint probability function (pf) or probability density function (pdf) $f_{X,Y}(x,y)$ and marginal pfs $f_X(x)$ and $f_Y(y)$, respectively. The conditional pf of $X$ given that $Y = y$ is
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}.$$
If $X$ and $Y$ are discrete random variables, then $f_{X|Y}(x|y)$ is the conditional probability of the event $X = x$ under the hypothesis that $Y = y$. If $X$ and $Y$ are continuous, then $f_{X|Y}(x|y)$ may be interpreted as a definition. When $X$ and $Y$ are independent random variables,
$$f_{X,Y}(x,y) = f_X(x)\,f_Y(y),$$
and, in this case,
$$f_{X|Y}(x|y) = f_X(x).$$
We observe that the conditional and marginal distributions of $X$ are identical.
Note that
$$f_{X,Y}(x,y) = f_{X|Y}(x|y)\,f_Y(y),$$
demonstrating that joint distributions may be constructed from products of conditional and marginal distributions. Because the marginal distribution of $X$ may be obtained by integrating (or summing) $y$ out of the joint distribution, that is,
$$f_X(x) = \int f_{X,Y}(x,y)\,dy,$$
we find that
$$f_X(x) = \int f_{X|Y}(x|y)\,f_Y(y)\,dy. \tag{17.1}$$
Formula (17.1) has an interesting interpretation as a mixed distribution (see Section 5.2.4). Assume that the conditional distribution $f_{X|Y}(x|y)$ is one of the usual parametric distributions, where $y$ is the realization of a random parameter $Y$ with distribution $f_Y(y)$. Section 6.3 shows that if, given $\Theta = \theta$, $X$ has a Poisson distribution with mean $\theta$ and $\Theta$ has a gamma distribution, then the marginal distribution of $X$ will be negative binomial. Also, Example 5.5 shows that if $X|\Theta = \theta$ has a normal distribution with mean $\theta$ and variance $v$ and $\Theta$ has a normal distribution with mean $\mu$ and variance $a$, then the marginal distribution of $X$ is normal with mean $\mu$ and variance $a + v$.
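As a quick numerical illustration of the first of these mixture results (a sketch, not taken from the text; the gamma shape and scale values below are arbitrary), the following Python snippet simulates the two-stage process and compares the empirical frequencies with the negative binomial pf:

```python
# Sketch: verify that a gamma mixture of Poissons is negative binomial.
# The gamma shape/scale values are arbitrary illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, beta = 3.0, 2.0                          # gamma shape and scale, E[Theta] = 6
theta = rng.gamma(alpha, beta, size=200_000)    # stage 1: draw the risk parameters
x = rng.poisson(theta)                          # stage 2: draw claims given theta

# Negative binomial with r = alpha and p = 1/(1 + beta) has the same marginal pf.
nb = stats.nbinom(alpha, 1.0 / (1.0 + beta))
for k in range(5):
    print(k, round((x == k).mean(), 4), round(nb.pmf(k), 4))
```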
Note that the roles of $X$ and $Y$ can be interchanged, yielding
$$f_{X|Y}(x|y)\,f_Y(y) = f_{Y|X}(y|x)\,f_X(x),$$
because both sides of this equation equal the joint distribution of $X$ and $Y$. Division by $f_X(x)$ yields Bayes' theorem, namely
$$f_{Y|X}(y|x) = \frac{f_{X|Y}(x|y)\,f_Y(y)}{f_X(x)}.$$
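For a concrete discrete check of Bayes' theorem (a sketch with a made-up joint pf, not an example from the text), one can compute the conditional pf directly and compare it with the right-hand side above:

```python
# Sketch: Bayes' theorem on an arbitrary 2x2 joint pf (values are made up).
import numpy as np

joint = np.array([[0.1, 0.3],   # f_{X,Y}(x, y): rows indexed by x, columns by y
                  [0.2, 0.4]])
f_x = joint.sum(axis=1)         # marginal pf of X
f_y = joint.sum(axis=0)         # marginal pf of Y

f_y_given_x = joint / f_x[:, None]                  # direct: f_{Y|X} = f_{X,Y} / f_X
f_x_given_y = joint / f_y[None, :]                  # f_{X|Y} = f_{X,Y} / f_Y
bayes = f_x_given_y * f_y[None, :] / f_x[:, None]   # Bayes' theorem

print(np.allclose(f_y_given_x, bayes))              # True
```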
We now turn our attention to conditional expectation. Consider the conditional pf of $X$ given that $Y = y$, $f_{X|Y}(x|y)$. Clearly, this is a valid probability distribution, and its mean is denoted by
$$E(X|Y = y) = \int x\,f_{X|Y}(x|y)\,dx, \tag{17.2}$$
with the integral replaced by a sum in the discrete case. Clearly, (17.2) is a function of $y$, and it is often of interest to view this conditional expectation as a random variable obtained by replacing $y$ by $Y$ in the right-hand side of (17.2). Thus we can write $E(X|Y)$ instead of the left-hand side of (17.2), and so $E(X|Y)$ is itself a random variable because it is a function of the random variable $Y$. The expectation of $E(X|Y)$ is given by
$$E[E(X|Y)] = E(X). \tag{17.3}$$
Equation (17.3) can be proved using (17.2) as follows:
$$E[E(X|Y)] = \int E(X|Y = y)\,f_Y(y)\,dy = \int\!\!\int x\,f_{X|Y}(x|y)\,f_Y(y)\,dx\,dy = \int x\,f_X(x)\,dx = E(X),$$
with a similar derivation in the discrete case.
It is often convenient to replace $X$ by an arbitrary function $h(X, Y)$ in (17.2), yielding the more general definition
$$E[h(X,Y)|Y = y] = \int h(x,y)\,f_{X|Y}(x|y)\,dx.$$
Similarly, $E[h(X,Y)|Y]$ is the conditional expectation viewed as a random variable that is a function of $Y$. Then, (17.3) generalizes to
$$E\{E[h(X,Y)|Y]\} = E[h(X,Y)]. \tag{17.4}$$
To see (17.4), note that
$$E\{E[h(X,Y)|Y]\} = \int\left[\int h(x,y)\,f_{X|Y}(x|y)\,dx\right] f_Y(y)\,dy = \int\!\!\int h(x,y)\,f_{X,Y}(x,y)\,dx\,dy = E[h(X,Y)].$$
If we choose $h(X,Y) = [X - E(X|Y)]^2$, then its expected value, based on the conditional distribution of $X$ given $Y$, is the variance of this conditional distribution,
$$\operatorname{Var}(X|Y) = E\{[X - E(X|Y)]^2 \mid Y\}. \tag{17.5}$$
Clearly, (17.5) is a function of the random variable $Y$.
It is instructive now to analyze the variance of $X$, where $X$ and $Y$ are two random variables. To begin, note that (17.5) may be written as
$$\operatorname{Var}(X|Y) = E(X^2|Y) - [E(X|Y)]^2.$$
Thus,
$$E[\operatorname{Var}(X|Y)] = E[E(X^2|Y)] - E\{[E(X|Y)]^2\} = E(X^2) - E\{[E(X|Y)]^2\}.$$
Also, because $E[E(X|Y)] = E(X)$, we may use the random variable $E(X|Y)$ in place of $X$ in the usual variance formula to obtain
$$\operatorname{Var}[E(X|Y)] = E\{[E(X|Y)]^2\} - \{E[E(X|Y)]\}^2 = E\{[E(X|Y)]^2\} - [E(X)]^2.$$
Thus,
$$E[\operatorname{Var}(X|Y)] + \operatorname{Var}[E(X|Y)] = E(X^2) - [E(X)]^2 = \operatorname{Var}(X).$$
Finally, we have established the important formula
$$\operatorname{Var}(X) = E[\operatorname{Var}(X|Y)] + \operatorname{Var}[E(X|Y)]. \tag{17.6}$$
Formula (17.6) states that the variance of $X$ is composed of the sum of two parts: the mean of the conditional variance plus the variance of the conditional mean.
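The identities (17.3) and (17.6) are easy to check by simulation. The sketch below uses an assumed normal–normal model (the parameter values are arbitrary and not from the text) and compares the two sides of each identity:

```python
# Sketch: numerical check of E[E(X|Y)] = E(X) and
# Var(X) = E[Var(X|Y)] + Var[E(X|Y)] for an assumed normal-normal model.
import numpy as np

rng = np.random.default_rng(1)
mu, a, v = 5.0, 2.0, 3.0                  # arbitrary illustrative values
y = rng.normal(mu, np.sqrt(a), 500_000)   # Y ~ N(mu, a)
x = rng.normal(y, np.sqrt(v))             # X | Y = y ~ N(y, v)

# Here E(X|Y) = Y and Var(X|Y) = v, so E[Var(X|Y)] = v and Var[E(X|Y)] = a.
print(x.mean(), y.mean())                 # both approximate mu, illustrating (17.3)
print(x.var(), v + y.var())               # both approximate v + a, illustrating (17.6)
```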
Continue to assume that the distribution of the risk characteristics in the population may be represented by $\pi(\theta)$, and that the experience of a particular policyholder with risk parameter $\theta$ arises from the conditional distribution $f_{X|\Theta}(x|\theta)$ of claims or losses, given $\Theta = \theta$.
We now return to the problem introduced in Section 16.2. That is, for a particular policyholder, we have observed $\mathbf{X} = \mathbf{x}$, where $\mathbf{X} = (X_1, \ldots, X_n)^T$ and $\mathbf{x} = (x_1, \ldots, x_n)^T$, and are interested in setting a rate to cover $X_{n+1}$. We assume that the risk parameter associated with the policyholder is $\theta$ (which is unknown). Furthermore, the experience of the policyholder corresponding to different exposure periods is assumed to be independent. In statistical terms, conditional on $\Theta = \theta$, the claims or losses $X_1, X_2, \ldots, X_n, X_{n+1}$ are independent (although not necessarily identically distributed).
Let $X_j$ have conditional pf
$$f_{X_j|\Theta}(x_j|\theta), \qquad j = 1, 2, \ldots, n+1.$$
Note that, if the $X_j$ are identically distributed (conditional on $\Theta = \theta$), then $f_{X_j|\Theta}(x_j|\theta)$ does not depend on $j$. Ideally, we are interested in the conditional distribution of $X_{n+1}$, given $\Theta = \theta$, in order to predict the claims experience of the same policyholder (whose value of $\theta$ has been assumed not to have changed). If we knew $\theta$, we could use $f_{X_{n+1}|\Theta}(x_{n+1}|\theta)$. Unfortunately, we do not know $\theta$, but we do know $\mathbf{x}$ for the same policyholder. The obvious next step is to condition on $\mathbf{x}$ rather than $\theta$. Consequently, we calculate the conditional distribution of $X_{n+1}$ given $\mathbf{X} = \mathbf{x}$, termed the predictive distribution, as defined in Chapter 13.
The predictive distribution of $X_{n+1}$ given $\mathbf{X} = \mathbf{x}$ is the relevant distribution for risk analysis, management, and decision making. It combines the uncertainty about the claims losses with that of the parameters associated with the risk process.
Here, we repeat the development in Chapter 13, noting that, if $\Theta$ has a discrete distribution, the integrals are replaced by sums. Because the $X_j$ are independent conditional on $\Theta = \theta$, we have
$$f_{\mathbf{X}|\Theta}(\mathbf{x}|\theta) = \prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta). \tag{17.7}$$
The joint distribution of $\mathbf{X}$ is thus the marginal distribution obtained by integrating $\theta$ out, that is,
$$f_{\mathbf{X}}(\mathbf{x}) = \int \prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\,\pi(\theta)\,d\theta.$$
Similarly, the joint distribution of $(X_1, \ldots, X_n, X_{n+1})$ is the right-hand side of (17.7) with $n$ replaced by $n+1$ in the product. Finally, the conditional density of $X_{n+1}$ given $\mathbf{X} = \mathbf{x}$ is the joint density of $(\mathbf{x}, x_{n+1})$ divided by that of $\mathbf{x}$, namely
$$f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x}) = \frac{\int \prod_{j=1}^{n+1} f_{X_j|\Theta}(x_j|\theta)\,\pi(\theta)\,d\theta}{f_{\mathbf{X}}(\mathbf{x})}. \tag{17.8}$$
There is a hidden mathematical structure underlying (17.8) that may be exploited. The posterior density of $\Theta$ given $\mathbf{X} = \mathbf{x}$ is
$$\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) = \frac{\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\,\pi(\theta)}{f_{\mathbf{X}}(\mathbf{x})}. \tag{17.9}$$
In other words, $\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\,\pi(\theta) = \pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,f_{\mathbf{X}}(\mathbf{x})$, and substitution in the numerator of (17.8) yields
$$f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x}) = \int f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta. \tag{17.10}$$
Equation (17.10) provides the additional insight that the conditional distribution of $X_{n+1}$ given $\mathbf{X} = \mathbf{x}$ may be viewed as a mixture distribution, with the mixing distribution being the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$.
The posterior distribution combines and summarizes the information about $\theta$ contained in the prior distribution and the likelihood, and consequently (17.10) reflects this information. As noted in Theorem 13.18, the posterior distribution has a convenient form when the likelihood is derived from the linear exponential family and $\pi(\theta)$ is the natural conjugate prior. When both are in place, there is an easy method to evaluate the conditional distribution of $X_{n+1}$ given $\mathbf{X} = \mathbf{x}$.
Note that the posterior distribution is of the same type (gamma) as the prior distribution. The concept of a conjugate prior distribution is introduced in Section 13.3. This result also implies that $f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x})$ is a mixture distribution with a simple mixing distribution, facilitating evaluation of the density of $X_{n+1}|\mathbf{x}$. Further examples of this idea are found in the exercises at the end of this section.
To return to the original problem, we have observed $\mathbf{X} = \mathbf{x}$ for a particular policyholder and we wish to predict $X_{n+1}$ (or its mean). An obvious choice would be the hypothetical mean (or individual premium)
$$\mu_{n+1}(\theta) = E(X_{n+1}|\Theta = \theta) \tag{17.11}$$
if we knew $\theta$. Note that replacement of $\theta$ by $\Theta$ in (17.11) yields, on taking the expectation,
$$E[\mu_{n+1}(\Theta)] = E[E(X_{n+1}|\Theta)] = E(X_{n+1}),$$
so that the pure, or collective, premium $\mu_{n+1} = E(X_{n+1})$ is the mean of the hypothetical means. This is the premium we would use if we knew nothing about the individual. It does not depend on the individual's risk parameter $\theta$, nor does it use $\mathbf{x}$, the data collected from the individual. Because $\theta$ is unknown, the best we can do is try to use the data, which suggest the use of the Bayesian premium (the mean of the predictive distribution):
$$E(X_{n+1}|\mathbf{x}) = \int x_{n+1}\,f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x})\,dx_{n+1}. \tag{17.12}$$
A computationally more convenient form is
$$E(X_{n+1}|\mathbf{x}) = \int \mu_{n+1}(\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta. \tag{17.13}$$
In other words, the Bayesian premium is the expected value of the hypothetical means, with expectation taken over the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$. Recall that in the discrete case, the integrals are replaced by sums. To prove (17.13), we see from (17.10) that
$$E(X_{n+1}|\mathbf{x}) = \int x_{n+1}\left[\int f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta\right]dx_{n+1} = \int\left[\int x_{n+1}\,f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,dx_{n+1}\right]\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta = \int \mu_{n+1}(\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta.$$
As expected, the revised value based on two observations is between the prior value (0.475) based on no data and the value based only on the data (0.5).
Example 17.12 is one where the random variables do not have identical distributions.
In each of Examples 17.11 and 17.12, the Bayesian estimate was a weighted average of the sample mean $\bar{x}$ and the pure premium $\mu_{n+1} = E(X_{n+1})$. This result is appealing from a credibility standpoint. Furthermore, the credibility factor $Z$ in each case is an increasing function of the number of exposure units. The greater the amount of past data observed, the closer $Z$ is to 1, consistent with our intuition.
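To make the weighted-average structure concrete, here is a small sketch under an assumed Poisson–gamma model (this model and all numbers are illustrative assumptions, not taken from Examples 17.11 or 17.12). If $X_j|\Theta = \theta$ is Poisson($\theta$) and $\Theta$ has a gamma prior with shape $\alpha$ and scale $\beta$, the posterior is gamma with shape $\alpha + \sum x_j$ and scale $\beta/(1 + n\beta)$, and the Bayesian premium reduces to $Z\bar{x} + (1 - Z)\alpha\beta$ with $Z = n/(n + 1/\beta)$:

```python
# Sketch: Bayesian premium for an assumed Poisson likelihood with a
# gamma(shape=alpha, scale=beta) prior; the data values are hypothetical.
import numpy as np

alpha, beta = 3.0, 0.5           # prior: E[Theta] = alpha * beta = 1.5
x = np.array([2, 0, 1, 4])       # observed claim counts (made-up data)
n = len(x)

# Posterior is gamma(alpha + sum(x), scale = beta / (1 + n * beta)); its mean
# is the Bayesian premium E(X_{n+1} | x), by (17.13) with mu_{n+1}(theta) = theta.
bayes_premium = (alpha + x.sum()) * beta / (1 + n * beta)

# Credibility (weighted-average) form of the same quantity.
Z = n / (n + 1 / beta)
weighted = Z * x.mean() + (1 - Z) * alpha * beta

print(bayes_premium, weighted)   # identical values
```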
In Section 17.3, a systematic approach is suggested for the treatment of the past data of a particular policyholder. Ideally, rather than the pure premium $E(X_{n+1})$, we would like to charge the individual premium (or hypothetical mean) $\mu_{n+1}(\theta) = E(X_{n+1}|\Theta = \theta)$, where $\theta$ is the (hypothetical) parameter associated with the policyholder. Because $\theta$ is unknown, the hypothetical mean is impossible to determine, but we could instead condition on $\mathbf{x}$, the past data from the policyholder. This leads to the Bayesian premium $E(X_{n+1}|\mathbf{x})$.
The major challenge with this approach is that it may be difficult to evaluate the Bayesian premium. Of course, in simple examples such as those in Section 17.3, the Bayesian premium is not difficult to evaluate numerically. But these examples can hardly be expected to capture the essential features of a realistic insurance scenario. More realistic models may well introduce analytic difficulties with respect to evaluation of $E(X_{n+1}|\mathbf{x})$, whether we use (17.12) or (17.13). Often, numerical integration may be required. There are exceptions, such as Examples 17.11 and 17.12.
We now present an alternative suggested by Bühlmann [19] in 1967. Recall the basic problem. We wish to use the conditional distribution $f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x})$ or the hypothetical mean $\mu_{n+1}(\theta)$ for estimation of next year's claims. Because we have observed $\mathbf{x}$, one suggestion is to approximate $\mu_{n+1}(\theta)$ by a linear function of the past data. (After all, the formula $Z\bar{x} + (1 - Z)\mu$ is of this form.) Thus, let us restrict ourselves to estimators of the form $\alpha_0 + \sum_{j=1}^{n}\alpha_j X_j$, where $\alpha_0, \alpha_1, \ldots, \alpha_n$ need to be chosen. To this end, we choose the $\alpha$s to minimize expected squared-error loss, that is,
$$Q = E\left\{\left[\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^{n}\alpha_j X_j\right]^2\right\}, \tag{17.14}$$
and the expectation is over the joint distribution of $\Theta$ and $X_1, \ldots, X_n$. That is, the squared error is averaged over all possible values of $\Theta$ and all possible observations. To minimize $Q$, we take derivatives. Thus,
$$\frac{\partial Q}{\partial \alpha_0} = E\left\{2\left[\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^{n}\alpha_j X_j\right](-1)\right\}.$$
We shall denote by $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ the values of $\alpha_0, \alpha_1, \ldots, \alpha_n$ that minimize (17.14). Then, equating $\partial Q/\partial\alpha_0$ to 0 yields
$$E[\mu_{n+1}(\Theta)] = \tilde{\alpha}_0 + \sum_{j=1}^{n}\tilde{\alpha}_j E(X_j).$$
But $E[\mu_{n+1}(\Theta)] = E[E(X_{n+1}|\Theta)] = E(X_{n+1})$, and so $\partial Q/\partial\alpha_0 = 0$ implies that
$$E(X_{n+1}) = \tilde{\alpha}_0 + \sum_{j=1}^{n}\tilde{\alpha}_j E(X_j). \tag{17.15}$$
Equation (17.15) may be termed the unbiasedness equation because it requires that the estimate $\tilde{\alpha}_0 + \sum_{j=1}^{n}\tilde{\alpha}_j X_j$ be unbiased for $E(X_{n+1})$. However, the credibility estimate may be biased as an estimator of $\mu_{n+1}(\theta) = E(X_{n+1}|\Theta = \theta)$, the quantity we are trying to estimate. This bias will average out over the distribution of $\Theta$. By accepting this bias, we are able to reduce the overall MSE. For $i = 1, 2, \ldots, n$, we have
$$\frac{\partial Q}{\partial \alpha_i} = E\left\{2\left[\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^{n}\alpha_j X_j\right](-X_i)\right\},$$
and setting this expression equal to zero yields
$$E[X_i\,\mu_{n+1}(\Theta)] = \tilde{\alpha}_0\,E(X_i) + \sum_{j=1}^{n}\tilde{\alpha}_j\,E(X_i X_j).$$
The left-hand side of this equation may be reexpressed as
$$E[X_i\,\mu_{n+1}(\Theta)] = E\{E[X_i\,\mu_{n+1}(\Theta)|\Theta]\} = E[\mu_{n+1}(\Theta)\,E(X_i|\Theta)] = E[E(X_i|\Theta)\,E(X_{n+1}|\Theta)] = E[E(X_i X_{n+1}|\Theta)] = E(X_i X_{n+1}),$$
where the second from last step follows from the independence of $X_i$ and $X_{n+1}$ conditional on $\Theta$. Thus $\partial Q/\partial\alpha_i = 0$ implies
$$E(X_i X_{n+1}) = \tilde{\alpha}_0\,E(X_i) + \sum_{j=1}^{n}\tilde{\alpha}_j\,E(X_i X_j). \tag{17.16}$$
Next, multiply (17.15) by $E(X_i)$ and subtract the result from (17.16) to obtain
$$\operatorname{Cov}(X_i, X_{n+1}) = \sum_{j=1}^{n}\tilde{\alpha}_j\operatorname{Cov}(X_i, X_j), \qquad i = 1, 2, \ldots, n. \tag{17.17}$$
Equation (17.15) and the $n$ equations (17.17) together are called the normal equations. These equations may be solved for $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ to yield the credibility premium
$$\tilde{\alpha}_0 + \sum_{j=1}^{n}\tilde{\alpha}_j X_j. \tag{17.18}$$
While it is straightforward to express the solution to the normal equations in matrix notation (if the covariance matrix of the $X_j$s is nonsingular), we shall be content with solutions for some special cases.
Note that exactly one of the terms on the right-hand side of (17.17) is a variance term, that is, $\operatorname{Cov}(X_i, X_i) = \operatorname{Var}(X_i)$. The other terms are true covariance terms.
As an added bonus, the values $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ also minimize
$$E\left\{\left[E(X_{n+1}|\mathbf{X}) - \alpha_0 - \sum_{j=1}^{n}\alpha_j X_j\right]^2\right\} \tag{17.19}$$
and
$$E\left\{\left[X_{n+1} - \alpha_0 - \sum_{j=1}^{n}\alpha_j X_j\right]^2\right\}. \tag{17.20}$$
To see this, differentiate (17.19) or (17.20) with respect to $\alpha_0, \alpha_1, \ldots, \alpha_n$ and observe that the solutions still satisfy the normal equations (17.15) and (17.17). Thus the credibility premium (17.18) is the best linear estimator of each of the hypothetical mean $E(X_{n+1}|\Theta)$, the Bayesian premium $E(X_{n+1}|\mathbf{X})$, and $X_{n+1}$.
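The normal equations are simply a linear system in $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$, so they can be solved numerically once the first and second moments are specified. The sketch below (not from the text) assumes a moment structure that happens to coincide with the Bühlmann structure derived in the next subsection, with arbitrary illustrative values of $\mu$, $v$, $a$, and $n$:

```python
# Sketch: solve the normal equations (17.15) and (17.17) numerically for an
# assumed covariance structure (illustrative parameter values only).
import numpy as np

mu, v, a, n = 10.0, 8.0, 2.0, 5            # arbitrary structural parameters

cov = a * np.ones((n, n)) + v * np.eye(n)  # Cov(X_i, X_j): v + a on diagonal, a off
c = a * np.ones(n)                         # Cov(X_i, X_{n+1}) = a

alpha = np.linalg.solve(cov, c)            # (17.17): cov @ alpha = c
alpha0 = mu - mu * alpha.sum()             # (17.15): unbiasedness equation

Z = n / (n + v / a)
print(alpha)                  # each entry equals Z / n
print(alpha0, (1 - Z) * mu)   # alpha0 equals (1 - Z) * mu
```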
We now turn to some models that specify the conditional means and variances of $X_j|\Theta$ and, hence, the means $E(X_j)$, variances $\operatorname{Var}(X_j)$, and covariances $\operatorname{Cov}(X_i, X_j)$.
The simplest credibility model, the Bühlmann model, specifies that, for each policyholder (conditional on $\Theta$), past losses $X_1, \ldots, X_n$ have the same mean and variance and are i.i.d. conditional on $\Theta$.
Thus, define
$$\mu(\theta) = E(X_j|\Theta = \theta)$$
and
$$v(\theta) = \operatorname{Var}(X_j|\Theta = \theta).$$
As discussed previously, $\mu(\theta)$ is referred to as the hypothetical mean, whereas $v(\theta)$ is called the process variance. Define
$$\mu = E[\mu(\Theta)], \tag{17.21}$$
$$v = E[v(\Theta)], \tag{17.22}$$
and
$$a = \operatorname{Var}[\mu(\Theta)]. \tag{17.23}$$
The quantity $\mu$ in (17.21) is the expected value of the hypothetical means, $v$ in (17.22) is the expected value of the process variance, and $a$ in (17.23) is the variance of the hypothetical means. Note that $\mu$ is the estimate to use if we have no information about $\theta$ (and thus no information about $\mu(\theta)$). It will also be referred to as the collective premium.
The mean, variance, and covariance of the $X_j$s may now be obtained. First,
$$E(X_j) = E[E(X_j|\Theta)] = E[\mu(\Theta)] = \mu. \tag{17.24}$$
Second,
$$\operatorname{Var}(X_j) = E[\operatorname{Var}(X_j|\Theta)] + \operatorname{Var}[E(X_j|\Theta)] = E[v(\Theta)] + \operatorname{Var}[\mu(\Theta)] = v + a. \tag{17.25}$$
Finally, for $i \ne j$,
$$\operatorname{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j) = E[E(X_i X_j|\Theta)] - \mu^2 = E[E(X_i|\Theta)\,E(X_j|\Theta)] - \{E[\mu(\Theta)]\}^2 = E\{[\mu(\Theta)]^2\} - \{E[\mu(\Theta)]\}^2 = \operatorname{Var}[\mu(\Theta)] = a, \tag{17.26}$$
where the third equality uses the conditional independence of $X_i$ and $X_j$ given $\Theta$.
This result is exactly of the form of Example 17.13, with common mean $\mu$, common variance $v + a$, and common covariance $a$. Thus the credibility premium is
$$Z\bar{X} + (1 - Z)\mu, \tag{17.27}$$
where
$$Z = \frac{n}{n + k} \tag{17.28}$$
and
$$k = \frac{v}{a}. \tag{17.29}$$
The credibility factor Z in (17.28) with k given by (17.29) is referred to as the Bühlmann credibility factor. Note that (17.27) is of the form (16.7), and (17.28) is exactly (16.8). Now, however, we know how to obtain k, namely, from (17.29).
Formula (17.27) has many appealing features. First, the credibility premium (17.27) is a weighted average of the sample mean $\bar{X}$ and the collective premium $\mu$, a formula we find desirable. Furthermore, $Z$ approaches 1 as $n$ increases, giving more credit to $\bar{X}$ rather than $\mu$ as more past data accumulate, a feature that agrees with intuition. Also, if the population is fairly homogeneous with respect to the risk parameter $\theta$, then (relatively speaking) the hypothetical means $\mu(\theta)$ do not vary greatly with $\theta$ (i.e. they are close in value) and hence have small variability. Thus, $a$ is small relative to $v$; that is, $k$ is large and $Z$ is closer to zero. This observation agrees with intuition because, for a homogeneous population, the overall mean $\mu$ is of more value in helping to predict next year's claims for a particular policyholder. Conversely, for a heterogeneous population, the hypothetical means are more variable; that is, $a$ is large and $k$ is small, and so $Z$ is closer to 1. Again this observation makes sense because, in a heterogeneous population, the experience of other policyholders is of less value in predicting the future experience of a particular policyholder than is the past experience of that policyholder.
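A minimal computational sketch of (17.27)–(17.29); the structural parameters and past losses below are invented for illustration, not taken from the text's examples:

```python
# Sketch: Buhlmann credibility premium from assumed structural parameters.
import numpy as np

def buhlmann_premium(x, mu, v, a):
    """Return (Z, premium) for past losses x, collective premium mu,
    expected process variance v, and variance of hypothetical means a."""
    n = len(x)
    k = v / a                                  # (17.29)
    Z = n / (n + k)                            # (17.28)
    return Z, Z * np.mean(x) + (1 - Z) * mu    # (17.27)

x = np.array([1200.0, 900.0, 1500.0])          # hypothetical past losses
Z, premium = buhlmann_premium(x, mu=1000.0, v=50_000.0, a=10_000.0)
print(Z, premium)   # Z = 3 / (3 + 5) = 0.375, premium = 0.375*1200 + 0.625*1000
```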
We now present some examples.
An alternative analysis for this problem could have started with a single observation of . From the assumptions of the problem, S has a mean of and a variance of . While it is true that S has a gamma distribution, that information is not needed because the Bühlmann approximation requires only moments. Following the preceding calculations,
The key is to note that in calculating Z the sample size is now 1, reflecting the single observation of S. Because , the Bühlmann estimate is
which is n times the previous answer. That is because we are now estimating the next value of S rather than the next value of X. However, the credibility factor itself (i.e. Z) is the same whether we are predicting or the next value of S.
The Bühlmann model is the simplest of the credibility models because it effectively requires that the past claims experience of a policyholder comprise i.i.d. components with respect to each past year. An important practical difficulty with this assumption is that it does not allow for variations in exposure or size.
For example, what if the first year's claims experience of a policyholder reflected only a portion of a year due to an unusual policyholder anniversary? What if a benefit change occurred part way through a policy year? For group insurance, what if the size of the group changed over time?
To handle these variations, we consider the following generalization of the Bühlmann model. Assume that $X_1, \ldots, X_n$ are independent, conditional on $\Theta$, with common mean (as before)
$$\mu(\theta) = E(X_j|\Theta = \theta)$$
but with conditional variances
$$\operatorname{Var}(X_j|\Theta = \theta) = \frac{v(\theta)}{m_j},$$
where $m_j$ is a known constant measuring exposure. Note that $m_j$ need only be proportional to the size of the risk. This model would be appropriate if each $X_j$ were the average of $m_j$ independent (conditional on $\Theta$) random variables, each with mean $\mu(\theta)$ and variance $v(\theta)$. In the preceding situations, $m_j$ could be the number of months the policy was in force in past year $j$, or the number of individuals in the group in past year $j$, or the amount of premium income for the policy in past year $j$.
As in the Bühlmann model, let
$$\mu = E[\mu(\Theta)], \qquad v = E[v(\Theta)],$$
and
$$a = \operatorname{Var}[\mu(\Theta)].$$
Then, for the unconditional moments, from (17.24) $E(X_j) = \mu$, and from (17.26) $\operatorname{Cov}(X_i, X_j) = a$ for $i \ne j$, but
$$\operatorname{Var}(X_j) = E[\operatorname{Var}(X_j|\Theta)] + \operatorname{Var}[E(X_j|\Theta)] = \frac{v}{m_j} + a.$$
To obtain the credibility premium (17.18), we solve the normal equations (17.15) and (17.17) to obtain $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$. For notational convenience, define
$$m = m_1 + m_2 + \cdots + m_n$$
to be the total exposure. Then, using (17.24), the unbiasedness equation (17.15) becomes
$$\mu = \tilde{\alpha}_0 + \mu\sum_{j=1}^{n}\tilde{\alpha}_j,$$
which implies
$$\sum_{j=1}^{n}\tilde{\alpha}_j = 1 - \frac{\tilde{\alpha}_0}{\mu}. \tag{17.30}$$
For $i = 1, 2, \ldots, n$, (17.17) becomes
$$a = \sum_{j=1}^{n}\tilde{\alpha}_j\operatorname{Cov}(X_i, X_j) = a\sum_{j=1}^{n}\tilde{\alpha}_j + \frac{v}{m_i}\,\tilde{\alpha}_i,$$
which may be rewritten as
$$\tilde{\alpha}_i = \frac{a\,m_i}{v}\left(1 - \sum_{j=1}^{n}\tilde{\alpha}_j\right). \tag{17.31}$$
Then, using (17.30) and (17.31),
$$\tilde{\alpha}_i = \frac{a\,m_i}{v}\,\frac{\tilde{\alpha}_0}{\mu},$$
and so, summing over $i$ and using (17.30) again,
$$1 - \frac{\tilde{\alpha}_0}{\mu} = \sum_{i=1}^{n}\tilde{\alpha}_i = \frac{a\,m}{v}\,\frac{\tilde{\alpha}_0}{\mu}.$$
As a result,
$$\tilde{\alpha}_0 = \frac{v/a}{m + v/a}\,\mu = \frac{k}{m + k}\,\mu \qquad\text{and}\qquad \tilde{\alpha}_j = \frac{a\,m_j}{v}\,\frac{\tilde{\alpha}_0}{\mu} = \frac{m_j}{m + k}, \quad j = 1, 2, \ldots, n,$$
where $k = v/a$ as in (17.29).
The credibility premium (17.18) becomes
$$\tilde{\alpha}_0 + \sum_{j=1}^{n}\tilde{\alpha}_j X_j = Z\bar{X} + (1 - Z)\mu, \tag{17.32}$$
where, with $k$ from (17.29),
$$Z = \frac{m}{m + k}$$
and
$$\bar{X} = \sum_{j=1}^{n}\frac{m_j}{m}X_j. \tag{17.33}$$
Clearly, the credibility premium (17.32) is still of the form (16.7). In this case, $m$ is the total exposure associated with the policyholder, and the Bühlmann–Straub credibility factor $Z$ depends on $m$. Furthermore, $\bar{X}$ is a weighted average of the $X_j$, with weights proportional to $m_j$. Following the group interpretation, $X_j$ is the average loss of the $m_j$ group members in year $j$, and so $m_j X_j$ is the total loss of the group in year $j$. Then, $\bar{X}$ is the overall average loss per group member over the $n$ years. The credibility premium to be charged to the group in year $n+1$ would thus be $m_{n+1}[Z\bar{X} + (1 - Z)\mu]$ for $m_{n+1}$ members in the next year.
Had we known that (17.33) would be the correct weighting of the $X_j$ to receive the credibility weight $Z$, the rest would have been easy. For the single observation $\bar{X}$, the process variance is
$$\operatorname{Var}(\bar{X}|\Theta) = \sum_{j=1}^{n}\frac{m_j^2}{m^2}\,\frac{v(\Theta)}{m_j} = \frac{v(\Theta)}{m},$$
and so the expected process variance is $v/m$. The variance of the hypothetical means is still $a$, and therefore the ratio corresponding to (17.29) is $v/(ma)$. There is only one observation of $\bar{X}$, and so the credibility factor is
$$Z = \frac{1}{1 + v/(ma)} = \frac{m}{m + v/a},$$
as before. Equation (17.33) should not have been surprising because the weights are simply inversely proportional to the (conditional) variance of each $X_j$.
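A hedged sketch of the Bühlmann–Straub computation (17.32)–(17.33); the exposures, per-unit losses, and structural parameters below are invented for illustration:

```python
# Sketch: Buhlmann-Straub credibility premium with exposures m_j
# (all numerical values are hypothetical).
import numpy as np

def buhlmann_straub_premium(x, m_exposure, mu, v, a):
    """x[j] is the average loss per exposure unit in year j, m_exposure[j] the
    exposure in year j; mu, v, a are the structural parameters."""
    m = m_exposure.sum()                   # total exposure
    xbar = (m_exposure * x).sum() / m      # (17.33): exposure-weighted average
    k = v / a                              # (17.29)
    Z = m / (m + k)                        # Buhlmann-Straub credibility factor
    return Z, Z * xbar + (1 - Z) * mu      # (17.32)

x = np.array([2.0, 1.5, 2.5])              # average loss per member, years 1-3
m_exposure = np.array([40.0, 50.0, 60.0])  # group sizes in years 1-3
Z, rate = buhlmann_straub_premium(x, m_exposure, mu=1.8, v=30.0, a=0.5)
print(Z, rate)
# The premium for the whole group next year would be m_4 * rate for m_4 members.
```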
The assumptions underlying the Bühlmann–Straub model may be too restrictive to represent reality. In a 1967 paper, Hewitt [54] observed that large risks do not behave the same as an independent aggregation of small risks and, in fact, are more variable than would be indicated by independence. A model that reflects this observation is created in the following example.
Another generalization is provided by letting the variance of depend on the exposure, which may be reasonable if we believe that the extent to which a given risk's propensity to produce claims that differ from the mean is related to its size. For example, larger risks may be underwritten more carefully. In this case, extreme variations from the mean are less likely because we ensure that the risk not only meets the underwriting requirements but also appears to be exactly what it claims to be.
In Examples 17.15–17.17, we found that the credibility premium and the Bayesian premium are equal. From (17.19), we may view the credibility premium as the best linear approximation to the Bayesian premium in the sense of squared-error loss. In these examples, the approximation is exact because the two premiums are equal. The term exact credibility is used to describe the situation in which the credibility premium equals the Bayesian premium.
At first glance, it appears to be unnecessary to discuss the existence and finiteness of the credibility premium in this context, because exact credibility as defined is clearly not possible otherwise. However, in what follows, there are some technical issues to be considered, and their treatment is clearer if it is tacitly remembered that the credibility premium must be well defined, which requires that $\mu < \infty$, $v < \infty$, and $a < \infty$, as is obvious from the normal equations (17.15) and (17.17). Exact credibility typically occurs in Bühlmann (and Bühlmann–Straub) situations involving linear exponential family members and their conjugate priors. It is clear that the existence of the credibility premium requires that the structural parameters $\mu$, $v$, and $a$ be finite.
Consider $\mu(\theta)$ in this situation. Recall from (5.8) that, for the linear exponential family, the mean is
$$\mu(\theta) = E(X_j|\Theta = \theta) = \frac{q'(\theta)}{r'(\theta)\,q(\theta)}, \tag{17.35}$$
and the conjugate prior pdf is, from Theorem 13.18, given by
$$\pi(\theta) = \frac{[q(\theta)]^{-k}\,e^{\mu k\,r(\theta)}\,r'(\theta)}{c(\mu, k)}, \qquad \theta_0 < \theta < \theta_1, \tag{17.36}$$
where the interval of support $(\theta_0, \theta_1)$ is explicitly identified. Also, for now, $\mu$ and $k$ should be viewed as known parameters associated with the prior pdf $\pi(\theta)$. To determine $E[\mu(\Theta)]$, note that from (17.36) it follows that
$$\ln[c(\mu, k)\,\pi(\theta)] = -k\ln q(\theta) + \mu k\,r(\theta) + \ln r'(\theta),$$
and differentiating with respect to $\theta$ yields
$$\frac{\pi'(\theta)}{\pi(\theta)} = -k\,\frac{q'(\theta)}{q(\theta)} + \mu k\,r'(\theta) + \frac{r''(\theta)}{r'(\theta)}.$$
Multiplication by $\pi(\theta)/[k\,r'(\theta)]$ results in, using (17.35) and rearranging,
$$\mu(\theta)\,\pi(\theta) = \mu\,\pi(\theta) - \frac{1}{k}\,\frac{d}{d\theta}\left[\frac{\pi(\theta)}{r'(\theta)}\right]. \tag{17.37}$$
Next, integrate both sides of (17.37) with respect to $\theta$ over the interval $(\theta_0, \theta_1)$, to obtain
$$\int_{\theta_0}^{\theta_1}\mu(\theta)\,\pi(\theta)\,d\theta = \mu\int_{\theta_0}^{\theta_1}\pi(\theta)\,d\theta - \frac{1}{k}\left[\frac{\pi(\theta)}{r'(\theta)}\right]_{\theta_0}^{\theta_1}.$$
Therefore, it follows that
$$E[\mu(\Theta)] = \mu - \frac{1}{k}\left[\frac{\pi(\theta_1)}{r'(\theta_1)} - \frac{\pi(\theta_0)}{r'(\theta_0)}\right], \tag{17.38}$$
where the endpoint terms are interpreted as limits where necessary.
Note that, if
$$\frac{\pi(\theta_1)}{r'(\theta_1)} = \frac{\pi(\theta_0)}{r'(\theta_0)}, \tag{17.39}$$
then
$$E[\mu(\Theta)] = \mu, \tag{17.40}$$
demonstrating that the choice of the symbol $\mu$ in (17.36) is not coincidental. If (17.40) holds, as is often the case, it is normally because both sides of (17.39) are equal to zero. Regardless, it is possible to have $E[\mu(\Theta)] = \mu$ but $a = \operatorname{Var}[\mu(\Theta)] = \infty$. Also, $E[\mu(\Theta)] \ne \mu$ may result if either $\pi(\theta_0)/r'(\theta_0)$ or $\pi(\theta_1)/r'(\theta_1)$ fails to be finite.
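As a hedged check of (17.35) in a familiar special case (the parametrization below is an assumption for illustration, writing the Poisson pf in the form $p(x)e^{r(\theta)x}/q(\theta)$ with $r(\theta) = \ln\theta$ and $q(\theta) = e^{\theta}$): the formula returns the Poisson mean $\theta$, and the companion formula $v(\theta) = \mu'(\theta)/r'(\theta)$ from (5.9) returns the variance $\theta$:

```python
# Sketch: symbolic check of the linear exponential family mean formula
# mu(theta) = q'(theta) / [r'(theta) q(theta)] in the Poisson case, where
# r(theta) = ln(theta) and q(theta) = exp(theta) are assumed.
import sympy as sp

theta = sp.symbols('theta', positive=True)
r = sp.log(theta)
q = sp.exp(theta)

mu = sp.simplify(sp.diff(q, theta) / (sp.diff(r, theta) * q))
var = sp.simplify(sp.diff(mu, theta) / sp.diff(r, theta))   # (5.9): v(theta)
print(mu, var)   # both print theta, the Poisson mean and variance
```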
Next, consider the posterior distribution in the Bühlmann situation with
$$f_{X_j|\Theta}(x_j|\theta) = \frac{p(x_j)\,e^{r(\theta)x_j}}{q(\theta)}$$
and $\pi(\theta)$ given by (17.36). From Theorem 13.18, the posterior pdf is
$$\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) = \frac{[q(\theta)]^{-k_*}\,e^{\mu_* k_*\,r(\theta)}\,r'(\theta)}{c(\mu_*, k_*)}, \qquad \theta_0 < \theta < \theta_1, \tag{17.41}$$
with
$$k_* = k + n \tag{17.42}$$
and
$$\mu_* = \frac{\mu k + \sum_{j=1}^{n}x_j}{k + n}. \tag{17.43}$$
Because (17.41) is of the same form as (17.36), the Bayesian premium (17.13) is, by the same argument that led to (17.38),
$$E(X_{n+1}|\mathbf{x}) = \mu_* - \frac{1}{k_*}\left[\frac{\pi_{\Theta|\mathbf{X}}(\theta_1|\mathbf{x})}{r'(\theta_1)} - \frac{\pi_{\Theta|\mathbf{X}}(\theta_0|\mathbf{x})}{r'(\theta_0)}\right], \tag{17.44}$$
with $\mu_*$ given by (17.43).
with given by (17.43). Because is a linear function of the , the same is true of the Bayesian premium if
that is, (17.45) implies that (17.44) becomes
Clearly, for (17.45) to hold for all vectors x, both sides should be equal to zero. Also, note that (17.46) is of the form (16.7).
To summarize, posterior linearity of the Bayesian premium results (i.e. (17.46) holds) if (17.45) is true (usually with both sides equal to zero). It is instructive to note that posterior linearity of the Bayesian premium may occur even if the credibility premium is not well defined (e.g. if $v$ or $a$ fails to be finite). However, as long as the credibility premium is well defined (all three of $\mu$, $v$, and $a$ are finite), the posterior linearity of the Bayesian premium implies equality with the credibility premium, that is, exact credibility. To see this equivalence, note that, if the Bayesian premium is a linear function of $x_1, x_2, \ldots, x_n$, that is,
$$E(X_{n+1}|\mathbf{x}) = \alpha_0 + \sum_{j=1}^{n}\alpha_j x_j,$$
then it is clear that in (17.19) the quantity being minimized attains its minimum value of zero with $\tilde{\alpha}_j = \alpha_j$ for $j = 0, 1, \ldots, n$. Thus the credibility premium is $\alpha_0 + \sum_{j=1}^{n}\alpha_j x_j = E(X_{n+1}|\mathbf{x})$, and credibility is exact.
The following example clarifies these concepts.
There is one last technical point worth noting. It was mentioned previously that the choice of the symbol $\mu$ as a parameter associated with the prior pdf $\pi(\theta)$ is not a coincidence because it is often the case that $E[\mu(\Theta)] = \mu$. A similar comment applies to the parameter $k$. Because $v(\theta) = \mu'(\theta)/r'(\theta)$ from (5.9), it follows from (17.37) and the product rule for differentiation that
$$[\mu(\theta)]^2\,\pi(\theta) = \mu\,\mu(\theta)\,\pi(\theta) - \frac{1}{k}\left\{\frac{d}{d\theta}\left[\frac{\mu(\theta)\,\pi(\theta)}{r'(\theta)}\right] - v(\theta)\,\pi(\theta)\right\}.$$
Integrating with respect to $\theta$ over $(\theta_0, \theta_1)$ yields
$$E\{[\mu(\Theta)]^2\} = \mu\,E[\mu(\Theta)] - \frac{1}{k}\left\{\left[\frac{\mu(\theta)\,\pi(\theta)}{r'(\theta)}\right]_{\theta_0}^{\theta_1} - E[v(\Theta)]\right\},$$
and solving for $k$ yields
$$k = \frac{E[v(\Theta)] - \left[\mu(\theta)\,\pi(\theta)/r'(\theta)\right]_{\theta_0}^{\theta_1}}{E\{[\mu(\Theta)]^2\} - \mu\,E[\mu(\Theta)]}. \tag{17.51}$$
If, in addition, (17.39) holds, then (17.40) holds, and (17.51) simplifies to
$$k = \frac{v - \left[\mu(\theta)\,\pi(\theta)/r'(\theta)\right]_{\theta_0}^{\theta_1}}{a},$$
in turn simplifying to the well-known result $k = v/a$ if
$$\frac{\mu(\theta_1)\,\pi(\theta_1)}{r'(\theta_1)} = \frac{\mu(\theta_0)\,\pi(\theta_0)}{r'(\theta_0)},$$
which typically holds with both sides equal to zero.
In this section, one of the two major criticisms of limited fluctuation credibility has been addressed. Through the use of the variance of the hypothetical means, we now have a means of relating the mean of the group of interest, $\mu(\theta)$, to the manual, or collective, premium, $\mu$. The development is also mathematically sound in that the results follow directly from a specific model and objective. We have also seen that the additional restriction to a linear solution is not as costly as it might appear, in that we still often obtain the exact Bayesian solution. A great deal of subsequent effort has been expended to generalize the model. With a sound basis for obtaining a credibility premium, we have but one remaining obstacle: how to numerically estimate the quantities $a$ and $v$ in the Bühlmann formulation, or how to specify the prior distribution $\pi(\theta)$ in the Bayesian formulation. Those matters are addressed in Chapter 18.
A historical review of credibility theory, including a description of the limited fluctuation and greatest accuracy approaches, is provided by Norberg [94]. Since the classic paper of Bühlmann [19], a vast actuarial literature on credibility theory has developed. Other elementary introductions are given by Herzog [52] and Waters [130]; more advanced treatments are Goovaerts and Hoogstad [46] and Sundt [118]. An important generalization of the Bühlmann–Straub model is the Hachemeister [48] regression model, which is not discussed here; see also Klugman [71]. The material on exact credibility is motivated by Jewell [62]; see also Ericson [36]. A special issue of Insurance: Abstracts and Reviews (Sundt [117]) contains an extensive list of papers on credibility.
Suppose also that Z is binomially distributed with parameters and p independently of X. Then, is binomially distributed with parameters and p. Demonstrate that has the hypergeometric distribution.
Table 17.2 The data for Exercise 17.4.
| x | y = 0 | y = 1 | y = 2 |
| 0 | 0.20 | 0 | 0.10 |
| 1 | 0 | 0.15 | 0.25 |
| 2 | 0.05 | 0.15 | 0.10 |
Show the following:
Hence,
Table 17.3 The data for Exercise 17.9.
Urn | 0s | 1s | 2s |
1 | 0.40 | 0.35 | 0.25 |
2 | 0.25 | 0.10 | 0.65 |
3 | 0.50 | 0.15 | 0.35 |
Table 17.4 The data for Exercise 17.10.
| Type | Number of claims: Mean | Number of claims: Variance | Severity: Mean | Severity: Variance |
| A | 0.2 | 0.2 | 200 | 4,000 |
| B | 0.7 | 0.3 | 100 | 1,500 |
Three observations are made on a particular policyholder and we observe total claims of 200. Determine the Bühlmann credibility factor and the Bühlmann premium for this policyholder.
Let
and
where
a negative binomial pf with a known quantity.
where and are the acceptable parameter values for to be a valid pdf.
If m is a parameter, this is called the exponential dispersion family. In Exercise 5.25, it is shown that the mean of this random variable is . For this exercise, assume that m is known.
Determine the Bayesian premium.
Let , , , , and .
and
Show that and that , where and .
where has pdf .
and has the inverse Gaussian pdf from Appendix A (with replaced by ),
Define .
for .
and, hence, use Exercise 17.17(b) to describe how to calculate the Bayesian premium.
and
Determine the posterior distribution of and the predictive distribution of . Then determine the Bayesian estimate of . Finally, show that the Bayesian and Bühlmann estimates are equal.
Table 17.5 The data for Exercise 17.30.
| Outcome, T | Probability | Bühlmann estimate | Bayesian estimate |
1 | 1/3 | 2.72 | 2.6 |
8 | 1/3 | 7.71 | 7.8 |
12 | 1/3 | 10.57 | – |
Determine the variance of the hypothetical means .
Determine the value of the Bühlmann credibility factor Z after three observations of X.
where the expectation is taken over .