The material presented here has traditionally been presented under the heading of “survival models,” with the accompanying notion that the techniques are useful only when studying lifetime distributions. Standard texts on the subject, such as Klein and Moeschberger [70] and Lawless [77], contain examples that are exclusively oriented in that direction. However, the same problems that occur when modeling lifetimes occur when modeling payment amounts. The examples we present are of both types. However, the latter sections focus on special considerations when constructing decrement models. Only a handful of references are presented, as most of the results are well developed in the survival models literature. If you want more detail and proofs, consult a text dedicated to the subject, such as the ones just mentioned.
In Chapter 4, models were divided into two types – data-dependent and parametric. The definitions are repeated here.
This chapter will focus on data-dependent distributions as models, making few, if any, assumptions about the underlying distribution. To fix the most important concepts, we begin by assuming that we have a sample of n observations that are an independent and identically distributed sample from the same (unspecified) continuous distribution. This is referred to as a complete data situation. In that context, we have the following definition.
When observations are collected from a probability distribution, the ideal situation is to have the (essentially) exact1 value of each observation. This case is referred to as complete, individual data and applies to Data Set B, introduced in Chapter 10 and reproduced here as Table 14.1. There are two reasons why exact data may not be available. One is grouping, in which all that is recorded is the range of values in which the observation belongs. Grouping applies to Data Set C and to Data Set A for those with five or more accidents. These data sets were introduced in Chapter 10 and are reproduced here as Tables 14.2 and 14.3, respectively.
Table 14.1 Data Set B.
 27 | 82 | 115 | 126 | 155 | 161 | 243 | 294 | 340 | 384 |
457 | 680 | 855 | 877 | 974 | 1,193 | 1,340 | 1,884 | 2,558 | 15,743 |
Table 14.2 Data Set C.
Payment range | Number of payments |
0–7,500 | 99 |
7,500–17,500 | 42 |
17,500–32,500 | 29 |
32,500–67,500 | 28 |
67,500–125,000 | 17 |
125,000–300,000 |  9 |
Over 300,000 |  3 |
Table 14.3 Data Set A.
Number of accidents | Number of drivers |
0 | 81,714 |
1 | 11,306 |
2 |  1,618 |
3 |    250 |
4 |     40 |
5 or more |      7 |
A second reason that exact values may not be available is the presence of censoring or truncation. When data are censored from below, observations below a given value are known to be below that value but the exact value is unknown. When data are censored from above, observations above a given value are known to be above that value but the exact value is unknown. Note that censoring effectively creates grouped data. When the data are grouped in the first place, censoring has no effect. For example, the data in Data Set C may have been censored from above at 300,000, but we cannot know for sure from the data set and that knowledge has no effect on how we treat the data. In contrast, were Data Set B to be censored at 1,000, we would have 15 individual observations and then five grouped observations in the interval from 1,000 to infinity.
In insurance settings, censoring from above is fairly common. For example, if a policy pays no more than 100,000 for an accident, any time the loss is above 100,000 the actual amount will be unknown, but we will know that it happened. Note that Data Set A has been censored from above at 5. This is more common language than saying that Data Set A has some individual data and some grouped data. When studying mortality or other decrements, the study period may end with some individuals still alive. They are censored from above in that we know the death will occur sometime after their age when the study ends.
When data are truncated from below, observations below a given value are not recorded. Truncation from above implies that observations above a given value are not recorded. In insurance settings, truncation from below is fairly common. If an automobile physical damage policy has a per-claim deductible of 250, any losses below 250 will not come to the attention of the insurance company and so will not appear in any data sets. Data sets may have truncation forced on them. For example, if Data Set B were to be truncated from below at 250, the first seven observations would disappear and the remaining 13 would be unchanged. In decrement studies it is unusual to observe individuals from birth. If someone is first observed at, say, age 20, that person is from a population where anyone who died before age 20 would not have been observed and thus is truncated from below.
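To make the distinction concrete, here is a small Python sketch (mine, not the text's) that applies censoring from above at 1,000 and truncation from below at 250 to Data Set B:

```python
# Illustrative sketch: censoring at 1,000 and truncation at 250 applied to
# Data Set B (values from Table 14.1).
data_b = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]

# Censoring from above at 1,000: values at or above 1,000 are known only
# to be at least 1,000.
censored = [min(x, 1000) for x in data_b]
n_at_limit = sum(x >= 1000 for x in data_b)
print(n_at_limit, "observations known only to be at least 1,000")  # 5

# Truncation from below at 250: losses below 250 never enter the data set.
truncated = [x for x in data_b if x > 250]
print(len(truncated), "observations remain after truncation")  # 13
```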
As noted in Definition 14.3, the empirical distribution assigns probability 1/n to each data point. That definition works well when the value of each data point is recorded. An alternative definition follows.
In the following example, not all values are distinct.
To assess the quality of the estimate, we examine statistical properties, in particular, the mean and variance. Working with the empirical estimate of the distribution function is straightforward. To see that with complete data the empirical estimator of the distribution function is unbiased and consistent, recall that the empirical estimate of $F(x)$ is $F_n(x) = Y/n$, where $Y$ is the number of observations in the sample that are less than or equal to $x$. Then $Y$ must have a binomial distribution with parameters $n$ and $F(x)$, and
$$\mathrm{E}[F_n(x)] = \mathrm{E}\left(\frac{Y}{n}\right) = \frac{nF(x)}{n} = F(x),$$
demonstrating that the estimator is unbiased. The variance is
$$\mathrm{Var}[F_n(x)] = \mathrm{Var}\left(\frac{Y}{n}\right) = \frac{F(x)[1 - F(x)]}{n},$$
which has a limit of zero, thus verifying consistency.

To make use of the result, the best we can do for the variance is estimate it. It is unlikely that we will know the value of $F(x)$, because that is the quantity we are trying to estimate. The estimated variance is given by
$$\widehat{\mathrm{Var}}[F_n(x)] = \frac{F_n(x)[1 - F_n(x)]}{n}.$$

The same results hold for empirically estimated probabilities. Let $p = \Pr(a < X \le b)$. The empirical estimate of $p$ is $p_n = F_n(b) - F_n(a)$. Arguments similar to those used for $F_n(x)$ verify that $p_n$ is unbiased and consistent, with $\mathrm{Var}(p_n) = p(1 - p)/n$.
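As a quick illustration of these results (a sketch of mine, with illustrative function names), the following Python code computes $F_n(x)$ and its estimated variance for Data Set B:

```python
# Empirical distribution function and estimated variance for Data Set B.
def empirical_cdf(sample, x):
    n = len(sample)
    return sum(v <= x for v in sample) / n

def estimated_variance(sample, x):
    # F_n(x)[1 - F_n(x)]/n with F(x) replaced by its estimate
    n = len(sample)
    fn = empirical_cdf(sample, x)
    return fn * (1 - fn) / n

sample = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]
print(empirical_cdf(sample, 500))       # 11/20 = 0.55
print(estimated_variance(sample, 500))  # 0.55 * 0.45 / 20 = 0.012375
```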
For grouped data as in Data Set C, construction of the empirical distribution as defined previously is not possible. However, it is possible to approximate the empirical distribution. The strategy is to obtain values of the empirical distribution function wherever possible and then connect those values in some reasonable way. For grouped data, the distribution function is usually approximated by connecting the points with straight lines. For notation, let the group boundaries be $c_0 < c_1 < \cdots < c_k$, where often $c_0 = 0$ and $c_k = \infty$. The number of observations falling between $c_{j-1}$ and $c_j$ is denoted $n_j$, with $\sum_{j=1}^{k} n_j = n$. For such data, we are able to determine the empirical distribution at each group boundary. That is, $F_n(c_j) = \frac{1}{n}\sum_{i=1}^{j} n_i$. Note that no rule is proposed for observations that fall on a group boundary. There is no correct approach, but whatever approach is used should be applied consistently. Note that in Data Set C it is not possible to tell how the assignments were made. If we had that knowledge, it would not affect any subsequent calculations.2
This function is differentiable at all values except group boundaries, so the density function can be obtained everywhere else. To completely specify the density function, it is arbitrarily made right continuous. The resulting density, called the histogram, is
$$f_n(x) = \frac{n_j}{n(c_j - c_{j-1})}, \quad c_{j-1} \le x < c_j.$$

Many computer programs that produce histograms actually create a bar chart with bar heights proportional to $n_j/n$. A bar chart is acceptable if the groups have equal width, but if not, then the preceding formula is needed. The advantage of this approach is that the histogram is indeed a density function and, among other things, areas under the histogram can be used to obtain empirical probabilities.
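The following Python sketch (illustrative names, not from the text) computes both approximations for Data Set C; note that neither is defined for the open top group:

```python
# Ogive (piecewise linear F_n) and histogram for grouped data (Data Set C).
boundaries = [0, 7500, 17500, 32500, 67500, 125000, 300000]  # c_0, ..., c_6
counts = [99, 42, 29, 28, 17, 9]   # n_j for the bounded groups
n = sum(counts) + 3                # 227 payments in total, 3 above 300,000

def ogive(x):
    """F_n(x) by linear interpolation between group boundaries."""
    cum = 0
    for j, nj in enumerate(counts):
        c_lo, c_hi = boundaries[j], boundaries[j + 1]
        if x <= c_hi:
            return (cum + nj * (x - c_lo) / (c_hi - c_lo)) / n
        cum += nj
    raise ValueError("ogive undefined beyond 300,000 (open top group)")

def histogram(x):
    """f_n(x) = n_j / [n (c_j - c_{j-1})] on the interval containing x."""
    for j, nj in enumerate(counts):
        if boundaries[j] <= x < boundaries[j + 1]:
            return nj / (n * (boundaries[j + 1] - boundaries[j]))
    raise ValueError("histogram undefined beyond 300,000 (open top group)")

print(ogive(17500))      # (99 + 42)/227, the empirical value at a boundary
print(histogram(10000))  # 42 / (227 * 10,000)
```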
Table 14.4 The data for Exercise 14.1.
Payment range | Number of payments |
0–25 |  6 |
25–50 | 24 |
50–75 | 30 |
75–100 | 31 |
100–150 | 57 |
150–250 | 80 |
250–500 | 85 |
500–1,000 | 54 |
1,000–2,000 | 15 |
2,000–4,000 | 10 |
Over 4,000 |  0 |
Table 14.5 The data for Exercise 14.3.
      | Refinances |              | Original |              |
Years | Number issued | Survived (%) | Number issued | Survived (%) |
1.5 | 42,300 | 99.97 | 12,813 | 99.88 |
2.5 | 9,756 | 99.82 | 18,787 | 99.43 |
3.5 | 1,550 | 99.03 | 22,513 | 98.81 |
4.5 | 1,256 | 98.41 | 21,420 | 98.26 |
5.5 | 1,619 | 97.78 | 26,790 | 97.45 |
Table 14.6 The data for Exercise 14.4.
Loss | Number of observations |
0–2 | 25 |
2–10 | 10 |
10–100 | 10 |
100–1,000 |  5 |
The federal government is considering funding a program that would provide 100% payment for all damages for any hurricane causing damage in excess of 5,000,000. You have been asked to make some preliminary estimates.
Table 14.7 Trended hurricane losses.
Year | Loss ($10^3$) | Year | Loss ($10^3$) | Year | Loss ($10^3$) |
1964 |  6,766 | 1964 |  40,596 | 1975 |  192,013 |
1968 |  7,123 | 1949 |  41,409 | 1972 |  198,446 |
1971 | 10,562 | 1959 |  47,905 | 1964 |  227,338 |
1956 | 14,474 | 1950 |  49,397 | 1960 |  329,511 |
1961 | 15,351 | 1954 |  52,600 | 1961 |  361,200 |
1966 | 16,983 | 1973 |  59,917 | 1969 |  421,680 |
1955 | 18,383 | 1980 |  63,123 | 1954 |  513,586 |
1958 | 19,030 | 1964 |  77,809 | 1954 |  545,778 |
1974 | 25,304 | 1955 | 102,942 | 1970 |  750,389 |
1959 | 29,112 | 1967 | 103,217 | 1979 |  863,881 |
1971 | 30,146 | 1957 | 123,680 | 1965 | 1,638,000 |
1976 | 33,727 | 1979 | 140,136 |
In this section, we generalize the empirical approach of the previous section to situations in which the data are not complete. In particular, we assume that individual observations may be right censored. We have the following definition.
In insurance claims data, the presence of a policy limit may give rise to right censored observations. When the amount of the loss equals or exceeds the limit u, benefits beyond that value are not paid, and so the exact value is typically not recorded. However, it is known that a loss of at least u has occurred.
When carrying out a study of the mortality of humans, if a person is alive when the study ends, right censoring has occurred. The person's age at death is not known, but it is known that it is at least as large as the age when the study ended. Right censoring also affects those who exit the study prior to its end due to surrender or lapse. Note that this discussion could have been about other decrements, such as disability, policy surrender, or retirement.
For this section and the next two, we assume that the underlying random variable has a continuous distribution. While data from discrete random variables can also be right censored (Data Set A is an example), the use of empirical estimators is rare and thus the development of analogous formulas is unlikely to be worth the effort.
We now make specific assumptions regarding how the data are collected and recorded. It is assumed that we have a random sample for which some (but not all) of the data are right censored. For the uncensored (i.e. completely known) observations, we will denote their $k$ unique values by $y_1 < y_2 < \cdots < y_k$. We let $s_j$ denote the number of times that $y_j$ appears in the sample. We also set $y_0$ as the minimum possible value for an observation and assume that $y_0 < y_1$. Often, $y_0 = 0$. Similarly, set $y_{k+1}$ as the largest observation in the data, censored or uncensored. Hence, $y_{k+1} \ge y_k$. Our goal is to create an empirical (data-dependent) distribution that places probability at the values $y_1, y_2, \ldots, y_k$.

We often possess the specific value at which an observation was censored. However, for both the derivation of the estimator and its implementation, it is only necessary to know between which $y$-values it occurred. Thus, the only input needed is $b_j$, the number of right censored observations in the interval $[y_j, y_{j+1})$ for $j = 1, 2, \ldots, k$. We make the assumption that if an observation is censored at $y_j$, then the observation is treated as censored immediately after $y_j$ (i.e. in the lifetime situation, immediately after the death). It is possible to have censored observations at values between $y_0$ and $y_1$. However, because we are placing probability only at the uncensored values, these observations provide no information about those probabilities and so can be dropped. When referring to the sample size, $n$ will denote the number of observations after these have been dropped. Observations censored at $y_k$ or above cannot be ignored. Let $b_k$ be the number of observations right censored at $y_k$ or later. Note that if $b_k = 0$, then $y_{k+1} = y_k$.

The final important quantity is $r_j$, referred to as the number “at risk” at $y_j$. When thinking in terms of a mortality study, the risk set comprises the individuals who are under observation at that age. Included are all who die at that age or later and all who are censored at that age or later. Formally, we have the following definition.

This formula reflects the fact that the number at risk at $y_j$ is that at $y_{j-1}$ less the exact observations at $y_{j-1}$ and the censored observations in $[y_{j-1}, y_j)$; that is, $r_j = r_{j-1} - s_{j-1} - b_{j-1}$. Note that $r_j = \sum_{i=j}^{k}(s_i + b_i)$ and hence $r_1 = n$.
The following numerical example illustrates these ideas.
It should be noted that if there is no censoring, so that $b_i = 0$ for all $i$, then the data are complete and the techniques of Section 14.1 may be used. As such, the approach of this section may be viewed as a generalization.
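The bookkeeping for $y_j$, $s_j$, $b_j$, and $r_j$ can be automated. The following Python sketch is my own illustration of the definitions above, using a tiny hypothetical sample:

```python
# Summarize right censored data into y_j, s_j, b_j, and r_j.
# Each observation is (value, censored_flag); the names are illustrative.
from collections import Counter

def risk_sets(observations):
    uncensored = sorted(v for v, c in observations if not c)
    ys = sorted(set(uncensored))            # y_1 < ... < y_k
    s = Counter(uncensored)                 # s_j
    b = {y: 0 for y in ys}                  # b_j for [y_j, y_{j+1})
    for v, c in observations:
        if c:
            eligible = [y for y in ys if y <= v]
            if eligible:                    # b_k collects censoring at y_k or later
                b[max(eligible)] += 1
            # values censored below y_1 carry no information and are dropped
    n = sum(s.values()) + sum(b.values())
    r, at_risk = {}, n                      # r_1 = n
    for y in ys:                            # r_j = r_{j-1} - s_{j-1} - b_{j-1}
        r[y] = at_risk
        at_risk -= s[y] + b[y]
    return ys, s, b, r

obs = [(1.0, False), (1.0, False), (2.0, True), (3.0, False), (4.0, True)]
print(risk_sets(obs))  # r is 5 at y = 1 and 2 at y = 3
```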
We shall now present a heuristic derivation of a well-known generalization of the empirical distribution function. This estimator is referred to as either the Kaplan–Meier or the product limit estimator.
To proceed, we first present some basic facts regarding the distribution of a discrete random variable $Y$, say, with support on the points $y_1 < y_2 < \cdots < y_k$. Let $p_j = \Pr(Y = y_j)$, and then the survival function is (where $\{i : y_i > y\}$ means to take the sum or product over all values of $i$ where $y_i > y$)
$$S(y) = \Pr(Y > y) = \sum_{\{i :\, y_i > y\}} p_i.$$
Setting $S_j = S(y_j)$ for $j = 0, 1, \ldots, k$, we have
$$S_j = p_{j+1} + p_{j+2} + \cdots + p_k, \quad j = 0, 1, \ldots, k - 1,$$
and $S_k = 0$. We also have $S_0 = 1$ from the definition of $y_0$. Thus,
$$p_j = S_{j-1} - S_j, \quad j = 1, 2, \ldots, k.$$
Now define the discrete hazard rate $h_j = \Pr(Y = y_j \mid Y \ge y_j) = p_j/S_{j-1}$, implying that $S_j/S_{j-1} = 1 - h_j$. Hence,
$$S_j = \prod_{i=1}^{j}\frac{S_i}{S_{i-1}} = \prod_{i=1}^{j}(1 - h_i).$$
Also, $p_1 = h_1$, and for $j = 2, 3, \ldots, k$,
$$p_j = h_j\prod_{i=1}^{j-1}(1 - h_i).$$
The heuristic derivation proceeds by viewing $h_1, h_2, \ldots, h_k$ as unknown parameters, and estimating them by a nonparametric “maximum likelihood” based argument.3 For a more detailed discussion, see Lawless [77]. For the present data, the $s_j$ uncensored observations at each $y_j$ contribute $(p_j)^{s_j}$ to the likelihood, where $p_1 = h_1$ and
$$p_j = h_j\prod_{i=1}^{j-1}(1 - h_i), \quad j = 2, 3, \ldots, k.$$

Each of the $b_j$ censored observations in $[y_j, y_{j+1})$ contributes
$$S_j = \prod_{i=1}^{j}(1 - h_i)$$
to the likelihood (recall that $S(y) = S_j$ for $y_j \le y < y_{j+1}$), and the $b_k$ censored observations at $y_k$ or above each contribute $S_k = \prod_{i=1}^{k}(1 - h_i)$.

The likelihood is formed by taking products over all contributions (assuming independence of all data points), namely
$$L = \prod_{j=1}^{k}(p_j)^{s_j}(S_j)^{b_j},$$
which, in terms of the $h_i$, becomes
$$L = \left\{\prod_{j=1}^{k} h_j^{s_j}\left[\prod_{i=1}^{j-1}(1 - h_i)\right]^{s_j}\right\}\left\{\prod_{j=1}^{k}\left[\prod_{i=1}^{j}(1 - h_i)\right]^{b_j}\right\} = \prod_{j=1}^{k} h_j^{s_j}\prod_{j=1}^{k}(1 - h_j)^{\sum_{i=j+1}^{k}s_i}\prod_{j=1}^{k}(1 - h_j)^{\sum_{i=j}^{k}b_i},$$
where the last line follows by interchanging the order of multiplication in each of the two double products. Thus,
$$L = \prod_{j=1}^{k} h_j^{s_j}(1 - h_j)^{\sum_{i=j+1}^{k}s_i + \sum_{i=j}^{k}b_i}.$$
Observe that $r_j = \sum_{i=j}^{k}(s_i + b_i)$ and thus $\sum_{i=j+1}^{k}s_i + \sum_{i=j}^{k}b_i = r_j - s_j$. Hence,
$$L = \prod_{j=1}^{k} h_j^{s_j}(1 - h_j)^{r_j - s_j}.$$
This likelihood has the appearance of a product of binomial likelihoods. That is, this is the same likelihood as if $s_1, s_2, \ldots, s_k$ were realizations of $k$ independent binomial observations with parameters $r_j$ and $h_j$. The “maximum likelihood estimate” of $h_j$ is obtained by taking logarithms, namely
$$\ln L = \sum_{j=1}^{k}\left[s_j\ln h_j + (r_j - s_j)\ln(1 - h_j)\right],$$
implying that
$$\frac{\partial}{\partial h_j}\ln L = \frac{s_j}{h_j} - \frac{r_j - s_j}{1 - h_j}.$$
Equating this latter expression to zero yields $\hat{h}_j = s_j/r_j$.

For $y_j \le y < y_{j+1}$ with $j = 1, 2, \ldots, k - 1$, the Kaplan–Meier [66] estimate of $S(y)$ is obtained by replacing $h_i$ by $\hat{h}_i$ wherever it appears. Noting that $S(y) = S_j$ for $y_j \le y < y_{j+1}$, it follows that
$$S_n(y) = \prod_{i=1}^{j}\left(1 - \frac{s_i}{r_i}\right), \quad y_j \le y < y_{j+1}.$$
This may be written more succinctly as $S_n(y) = \prod_{\{i:\, y_i \le y\}}(1 - s_i/r_i)$ for $y_0 \le y < y_k$. When $y_0 \le y < y_1$, you should interpret the empty product as $S_n(y) = 1$.
We now discuss estimation of $S(y)$ for $y \ge y_k$. First, note that if $b_k = 0$ (no censored observations at $y_k$ or beyond), then $S_n(y_k) = 0$, and $S_n(y) = 0$ for $y \ge y_k$ is clearly the (only) obvious choice. However, if $b_k > 0$, as in the previous example, there are no empirical data to estimate $S(y)$ for $y > y_k$, and tail estimates for $y \ge y_k$ (often called tail corrections) are needed. There are three popular extrapolations:

1. Efron's tail correction, $S_n(y) = 0$ for $y \ge y_k$.
2. Klein and Moeschberger's method, $S_n(y) = S_n(y_k)$ for $y_k \le y < w$ and $S_n(y) = 0$ for $y \ge w$, where $w$ is a plausible upper limit for the random variable.
3. The exponential tail correction of Brown, Hollander, and Korwar, $S_n(y) = e^{-cy}$ for $y \ge y_k$, where $c = -\ln S_n(y_k)/y_k$, so that $S_n(y) = [S_n(y_k)]^{y/y_k}$.
Note that if there is no censoring ($b_i = 0$ for all $i$), then $r_{i+1} = r_i - s_i$, and for $y_j \le y < y_{j+1}$,
$$S_n(y) = \prod_{i=1}^{j}\left(1 - \frac{s_i}{r_i}\right) = \prod_{i=1}^{j}\frac{r_{i+1}}{r_i} = \frac{r_{j+1}}{r_1} = \frac{r_{j+1}}{n}.$$

In this case, $r_{j+1}$ is the number of observations exceeding $y$, and $S_n(y) = r_{j+1}/n$ is the proportion of the sample exceeding $y$. Thus, with no censoring, the Kaplan–Meier estimate reduces to the empirical estimate of the previous section.
An alternative to the Kaplan–Meier estimator, called the Nelson–Åalen estimator [1], [93], is sometimes used. To motivate the estimator, note that if $S(y)$ is the survival function of a continuous distribution with hazard (failure) rate $h(y)$, then $H(y) = -\ln S(y) = \int_0^y h(t)\,dt$ is called the cumulative hazard rate function. The discrete analog is, in the present context, given by $H(y) = \sum_{\{i:\, y_i \le y\}} h_i$, which can intuitively be estimated by replacing $h_i$ with its estimate $\hat{h}_i = s_i/r_i$. The Nelson–Åalen estimator of $H(y)$ is thus defined for $y < y_k$ to be
$$\hat{H}(y) = \sum_{\{i:\, y_i \le y\}}\frac{s_i}{r_i}.$$

That is, $\hat{H}(y) = 0$ for $y_0 \le y < y_1$, and the Nelson–Åalen estimator of the survival function is $\hat{S}(y) = e^{-\hat{H}(y)}$. The notation under the summation sign indicates that values of $i$ should be included only if $y_i \le y$. For $y \ge y_k$, the situation is similar to that involving the Kaplan–Meier estimate in the sense that a tail correction of the type discussed earlier needs to be employed. Note that, unlike the Kaplan–Meier estimate, $\hat{S}(y_k) = e^{-\hat{H}(y_k)} > 0$, so that a tail correction is always needed.
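A minimal Python sketch of both estimators, assuming the summarized quantities $y_j$, $s_j$, and $r_j$ have already been computed (the data below are hypothetical):

```python
# Kaplan–Meier and Nelson–Åalen estimators from summarized data.
import math

def kaplan_meier(ys, s, r, y):
    """S_n(y) = product over {i: y_i <= y} of (1 - s_i/r_i)."""
    prod = 1.0
    for yi in ys:
        if yi <= y:
            prod *= 1 - s[yi] / r[yi]
    return prod

def nelson_aalen(ys, s, r, y):
    """H-hat(y) = sum over {i: y_i <= y} of s_i/r_i; S-hat = exp(-H-hat)."""
    h = sum(s[yi] / r[yi] for yi in ys if yi <= y)
    return h, math.exp(-h)

ys = [1.0, 3.0]
s = {1.0: 2, 3.0: 1}
r = {1.0: 5, 3.0: 2}
print(kaplan_meier(ys, s, r, 3.0))  # (1 - 2/5)(1 - 1/2) = 0.3
print(nelson_aalen(ys, s, r, 3.0))  # H = 2/5 + 1/2 = 0.9, S = exp(-0.9)
```

As the exercise referenced later in this section shows, the Kaplan–Meier estimate never exceeds the Nelson–Åalen survival estimate, since $1 - h \le e^{-h}$.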
To assess the quality of the two estimators, we will now consider estimation of the variance. Recall that for $y_j \le y < y_{j+1}$, the Kaplan–Meier estimator may be expressed as
$$S_n(y) = \prod_{i=1}^{j}(1 - \hat{h}_i),$$
which is a function of the $\hat{h}_i$. Thus, to estimate the variance of $S_n(y)$, we first need the covariance matrix of $(\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_k)$. We estimate this from the “likelihood,” using standard likelihood methods. Recall that
$$L = \prod_{j=1}^{k} h_j^{s_j}(1 - h_j)^{r_j - s_j},$$
and thus $l = \ln L$ satisfies
$$l = \sum_{j=1}^{k}\left[s_j\ln h_j + (r_j - s_j)\ln(1 - h_j)\right].$$
Thus,
$$\frac{\partial l}{\partial h_j} = \frac{s_j}{h_j} - \frac{r_j - s_j}{1 - h_j}$$
and
$$\frac{\partial^2 l}{\partial h_j^2} = -\frac{s_j}{h_j^2} - \frac{r_j - s_j}{(1 - h_j)^2},$$
which, with $h_j$ replaced by $\hat{h}_j = s_j/r_j$, becomes
$$-\frac{r_j^2}{s_j} - \frac{r_j^2}{r_j - s_j} = -\frac{r_j^3}{s_j(r_j - s_j)}.$$
For $i \ne j$,
$$\frac{\partial^2 l}{\partial h_i\,\partial h_j} = 0.$$
The observed information, evaluated at the maximum likelihood estimate, is thus a diagonal matrix, which when inverted yields the estimates
$$\widehat{\mathrm{Var}}(\hat{h}_j) = \frac{s_j(r_j - s_j)}{r_j^3} = \frac{\hat{h}_j(1 - \hat{h}_j)}{r_j}$$
and, for $i \ne j$,
$$\widehat{\mathrm{Cov}}(\hat{h}_i, \hat{h}_j) = 0.$$
These results also follow directly from the binomial form of the likelihood.
Returning to the problem at hand, the delta method4 gives the approximate variance of $g(\hat{\theta})$ as
$$\mathrm{Var}[g(\hat{\theta})] \approx [g'(\theta)]^2\,\mathrm{Var}(\hat{\theta}),$$
for an estimator $\hat{\theta}$ of $\theta$.

To proceed, note that for $y_j \le y < y_{j+1}$,
$$\ln S_n(y) = \sum_{i=1}^{j}\ln(1 - \hat{h}_i),$$
and since the $\hat{h}_i$ are assumed to be approximately uncorrelated,
$$\widehat{\mathrm{Var}}[\ln S_n(y)] = \sum_{i=1}^{j}\widehat{\mathrm{Var}}[\ln(1 - \hat{h}_i)].$$
The choice $g(x) = \ln(1 - x)$, with $g'(x) = -1/(1 - x)$, yields
$$\widehat{\mathrm{Var}}[\ln(1 - \hat{h}_i)] \approx \frac{\widehat{\mathrm{Var}}(\hat{h}_i)}{(1 - \hat{h}_i)^2} = \frac{s_i}{r_i(r_i - s_i)},$$
implying that
$$\widehat{\mathrm{Var}}[\ln S_n(y)] \approx \sum_{i=1}^{j}\frac{s_i}{r_i(r_i - s_i)}.$$
Because $S_n(y) = \exp[\ln S_n(y)]$, the delta method with $g(x) = e^x$ yields
$$\widehat{\mathrm{Var}}[S_n(y)] \approx [S_n(y)]^2\,\widehat{\mathrm{Var}}[\ln S_n(y)].$$
This yields the final version of the estimate,
$$\widehat{\mathrm{Var}}[S_n(y)] \approx [S_n(y)]^2\sum_{i=1}^{j}\frac{s_i}{r_i(r_i - s_i)}, \quad y_j \le y < y_{j+1}. \tag{14.2}$$
Equation (14.2) holds for $y_0 \le y < y_k$ in all cases. However, if $b_k > 0$ (that is, there are censored observations after the last uncensored observation), then $r_k > s_k$ and it holds for $y_k \le y < y_{k+1}$ as well. Hence the formula always holds for $y_0 \le y < y_k$.
Formula (14.2) is known as Greenwood's approximation to the variance of , and is known to often understate the true variance.
If there is no censoring and $y_j \le y < y_{j+1}$, then Greenwood's approximation yields
$$\widehat{\mathrm{Var}}[S_n(y)] = \left(\frac{r_{j+1}}{n}\right)^2\sum_{i=1}^{j}\frac{s_i}{r_i(r_i - s_i)},$$
which may be expressed (using $s_i = r_i - r_{i+1}$ due to no censoring) as
$$\widehat{\mathrm{Var}}[S_n(y)] = \left(\frac{r_{j+1}}{n}\right)^2\sum_{i=1}^{j}\left(\frac{1}{r_{i+1}} - \frac{1}{r_i}\right).$$
Because $r_1 = n$, this sum telescopes to give
$$\widehat{\mathrm{Var}}[S_n(y)] = \left(\frac{r_{j+1}}{n}\right)^2\left(\frac{1}{r_{j+1}} - \frac{1}{n}\right) = \frac{r_{j+1}(n - r_{j+1})}{n^3} = \frac{S_n(y)[1 - S_n(y)]}{n},$$
which is the same estimate as that obtained in Section 14.1, where it was derived directly from the binomial distribution without use of the delta method.
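A short Python sketch of Greenwood's approximation (14.2), reusing the hypothetical summary from the earlier sketch (function names are illustrative):

```python
# Greenwood's approximation to the variance of the Kaplan–Meier estimate.
def greenwood(ys, s, r, y):
    sn, total = 1.0, 0.0
    for yi in ys:
        if yi <= y:
            sn *= 1 - s[yi] / r[yi]                      # S_n(y)
            total += s[yi] / (r[yi] * (r[yi] - s[yi]))   # Greenwood sum
    return sn ** 2 * total

ys = [1.0, 3.0]
s = {1.0: 2, 3.0: 1}
r = {1.0: 5, 3.0: 2}
print(greenwood(ys, s, r, 1.0))  # 0.6^2 * 2/(5*3) = 0.048
```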
We remark that in the case with $s_k = r_k$ (i.e. $S_n(y_k) = 0$), Greenwood's approximation cannot be used to estimate the variance of $S_n(y)$ for $y \ge y_k$, because the final term involves division by $r_k - s_k = 0$. In this case, an ad hoc replacement for that zero denominator is often used.
Turning now to the Nelson–Åalen estimator, we note that for $y_j \le y < y_{j+1}$,
$$\hat{H}(y) = \sum_{i=1}^{j}\hat{h}_i,$$
and the same reasoning used for the Kaplan–Meier estimator implies that $\widehat{\mathrm{Var}}[\hat{H}(y)] = \sum_{i=1}^{j}\widehat{\mathrm{Var}}(\hat{h}_i)$, yielding the estimate
$$\widehat{\mathrm{Var}}[\hat{H}(y)] = \sum_{i=1}^{j}\frac{s_i(r_i - s_i)}{r_i^3},$$
which is referred to as Klein's estimate. A commonly used alternative estimate due to Åalen is obtained by replacing $r_i - s_i$ with $r_i$ in the numerator, giving $\sum_{i=1}^{j} s_i/r_i^2$.

We are typically more interested in $S(y)$ than $H(y)$. Because $\hat{S}(y) = e^{-\hat{H}(y)}$, the delta method with $g(x) = e^{-x}$ yields Klein's survival function estimate; that is, the estimated variance is
$$\widehat{\mathrm{Var}}[\hat{S}(y)] \approx [\hat{S}(y)]^2\,\widehat{\mathrm{Var}}[\hat{H}(y)] = [\hat{S}(y)]^2\sum_{i=1}^{j}\frac{s_i(r_i - s_i)}{r_i^3}.$$

Variance estimates for $y \ge y_k$ depend on the tail correction used. Efron's method gives an estimate of 0, which is not of interest in the present context. For the exponential tail correction in the Kaplan–Meier case, we have $S_n(y) = [S_n(y_k)]^{y/y_k}$ for $y \ge y_k$, and the delta method with $g(x) = x^{y/y_k}$ yields
$$\widehat{\mathrm{Var}}[S_n(y)] \approx \left(\frac{y}{y_k}\right)^2[S_n(y_k)]^{2(y/y_k - 1)}\,\widehat{\mathrm{Var}}[S_n(y_k)].$$
Likelihood methods typically result in approximate asymptotic normality of the estimates, and this is true for the Kaplan–Meier and Nelson–Åalen estimates as well. An approximate 95% confidence interval for $S(y)$ is thus given by
$$S_n(y) \pm 1.96\sqrt{\widehat{\mathrm{Var}}[S_n(y)]},$$
with the corresponding Nelson–Åalen interval using $\hat{S}(y)$ and its estimated variance. For the data of Example 14.9, both of the resulting intervals include values greater than 1. Clearly, both confidence intervals for $S(y)$ are unsatisfactory.
An alternative approach can be constructed as follows, using the Kaplan–Meier estimate as an example.
Let $Y = \ln[-\ln S_n(y)]$. Using the delta method, the variance of $Y$ can be approximated as follows. The function of interest is $g(x) = \ln(-\ln x)$. Its derivative is
$$g'(x) = \frac{1}{x\ln x}.$$
According to the delta method,
$$\widehat{\mathrm{Var}}(Y) \approx [g'(S_n(y))]^2\,\widehat{\mathrm{Var}}[S_n(y)] = \frac{\widehat{\mathrm{Var}}[S_n(y)]}{[S_n(y)\ln S_n(y)]^2}.$$
Then, an approximate 95% confidence interval for $\ln[-\ln S(y)]$ is
$$\ln[-\ln S_n(y)] \pm 1.96\frac{\sqrt{\widehat{\mathrm{Var}}[S_n(y)]}}{S_n(y)\ln S_n(y)}.$$
Because $S(y) = \exp(-e^{\ln[-\ln S(y)]})$, evaluating each endpoint of this formula provides a confidence interval for $S(y)$. For the upper limit, we have (where $\hat{v} = \widehat{\mathrm{Var}}[S_n(y)]$)
$$\exp\!\left[-e^{\ln[-\ln S_n(y)] + 1.96\sqrt{\hat{v}}/[S_n(y)\ln S_n(y)]}\right] = [S_n(y)]^U, \quad U = \exp\!\left[\frac{1.96\sqrt{\hat{v}}}{S_n(y)\ln S_n(y)}\right].$$

Similarly, the lower limit is $[S_n(y)]^{1/U}$. This interval will always be inside the range 0–1 and is referred to as a log-transformed confidence interval.

For the Nelson–Åalen estimator, a similar log-transformed confidence interval for $H(y)$ has endpoints $\hat{H}(y)U$ and $\hat{H}(y)/U$, where
$$U = \exp\!\left[\frac{1.96\sqrt{\widehat{\mathrm{Var}}[\hat{H}(y)]}}{\hat{H}(y)}\right].$$
Exponentiation of the negative of these endpoints yields a corresponding interval for $S(y)$.
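The following Python sketch (a hedged illustration, with assumed input values) computes the log-transformed interval for the Kaplan–Meier case:

```python
# Log-transformed 95% confidence interval for S(y):
# (S_n(y)^(1/U), S_n(y)^U) with U = exp(1.96*sqrt(v)/(S_n(y)*ln S_n(y))).
import math

def log_transformed_ci(sn, var_sn, z=1.96):
    u = math.exp(z * math.sqrt(var_sn) / (sn * math.log(sn)))
    return sn ** (1 / u), sn ** u

# For example, with S_n(y) = 0.3 and an estimated variance of 0.0575:
lo, hi = log_transformed_ci(0.3, 0.0575)
print(lo, hi)  # both endpoints stay strictly inside (0, 1)
```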
Table 14.12 The data for Exercise 14.11.
Time | Number of deaths | Number at risk |
 5 | 2 | 15 |
 7 | 1 | 12 |
10 | 1 | 10 |
12 | 2 |  6 |
Table 14.13 The data for Exercise 14.18.
Age | Number of deaths |
 2 | 1 |
 3 | 1 |
 5 | 1 |
 7 | 2 |
10 | 1 |
12 | 2 |
13 | 1 |
14 | 1 |
Table 14.14 The data for Exercise 14.24.
i | $y_i$ | $s_i$ | $b_i$ | $r_i$ |
1 |  4 | 3 | — | 40 |
2 |  6 | — | 3 | 31 |
3 |  9 | 6 | 4 | 23 |
4 | 13 | 4 | — | — |
5 | 15 | 2 | 4 | 6 |
Table 14.15 The data for Exercise 14.25.
i | $y_i$ | $s_i$ | $b_i$ | $r_i$ |
1 |  3 | 3 | 6 | 50 |
2 |  5 | 7 | 4 | 41 |
3 |  7 | 5 | 2 | 30 |
4 | 11 | 5 | 3 | 23 |
5 | 16 | 6 | 4 | 15 |
6 | 20 | 2 | 3 |  5 |
where $g$ is differentiable.

Prove that $\hat{S}(y) = \prod_{\{i:\, y_i \le y\}}(1 - \hat{h}_i^*)$ equals the Kaplan–Meier estimator $S_n(y)$ if $\hat{h}_i^* = s_i/r_i$, and equals the Nelson–Åalen estimator $e^{-\hat{H}(y)}$ if $\hat{h}_i^* = 1 - e^{-s_i/r_i}$, and thus that $S_n(y) \le e^{-\hat{H}(y)}$ in particular.

Hint: Prove by induction on $m$ the identity $\prod_{i=1}^{m}(1 - x_i) \le \exp\left(-\sum_{i=1}^{m}x_i\right)$ for $0 \le x_i \le 1$ and $m = 1, 2, \ldots$.
In the previous section, we focused on estimation of the survival function $S(y)$ or, equivalently, the cumulative distribution function $F(y) = 1 - S(y)$, of a random variable $Y$. In many actuarial applications, other quantities such as raw moments are of interest. Of central importance in this context is the mean, particularly for premium calculation in a loss modeling context.

For estimation of the mean $\mu = \mathrm{E}(Y)$ with complete data $Y_1, Y_2, \ldots, Y_n$, an obvious (unbiased) estimator is the sample mean $\bar{Y} = (Y_1 + \cdots + Y_n)/n$, but for incomplete data such as that of the previous section involving right censoring, other methods are needed. We continue to assume that we have the setting described in the previous section, and we will capitalize on the results obtained there. To do so, we recall that, for random variables that take on only nonnegative values, the mean satisfies
$$\mu = \int_0^{\infty} S(y)\,dy,$$
and empirical estimation of $\mu$ may be done by replacing $S(y)$ with an estimator such as the Kaplan–Meier estimator $S_n(y)$ or the Nelson–Åalen estimator $e^{-\hat{H}(y)}$. To unify the approach, we will assume that $S(y)$ is estimated by the estimator given in Exercise 14.26 of Section 14.3, namely
$$\hat{S}(y) = \prod_{\{i:\, y_i \le y\}}(1 - \hat{h}_i^*), \quad y < y_k,$$
where $\hat{h}_i^* = s_i/r_i$ for the Kaplan–Meier estimator and $\hat{h}_i^* = 1 - e^{-s_i/r_i}$ for the Nelson–Åalen estimator, with a tail correction applied for $y \ge y_k$. The mean is obtained by replacing $S(y)$ with $\hat{S}(y)$ in the integrand. This yields the estimator
$$\hat{\mu} = \int_0^{\infty}\hat{S}(y)\,dy.$$
It is convenient to write
$$\hat{\mu} = \hat{\mu}_1 + \hat{\mu}_2,$$
where
$$\hat{\mu}_1 = \int_0^{y_k}\hat{S}(y)\,dy$$
and
$$\hat{\mu}_2 = \int_{y_k}^{\infty}\hat{S}(y)\,dy.$$

Anticipating what follows, we wish to evaluate $\hat{\mu}_1$. For $y_j \le y < y_{j+1}$, we have that $\hat{S}(y) = \hat{S}(y_j)$ for $j = 0, 1, \ldots, k - 1$. Thus
$$\hat{\mu}_1 = \sum_{j=0}^{k-1}\hat{S}(y_j)(y_{j+1} - y_j).$$

To evaluate $\hat{S}(y_j)$ for $j = 0, 1, \ldots, k$, recall that $\hat{S}(y) = 1$ for $y_0 \le y < y_1$ and that, for $j = 1, 2, \ldots, k$, $\hat{S}(y_j) = \prod_{i=1}^{j}(1 - \hat{h}_i^*)$. Thus,
$$\hat{S}(y_0) = 1, \qquad \hat{S}(y_j) = \prod_{i=1}^{j}(1 - \hat{h}_i^*), \quad j = 1, 2, \ldots, k.$$

For evaluation of $\hat{S}(y_j)$, note that
$$\hat{S}(y_j) = \hat{S}(y_{j-1})(1 - \hat{h}_j^*), \quad j = 1, 2, \ldots, k,$$
a recursive formula, beginning with $\hat{S}(y_0) = 1$.

For the estimates themselves, $\hat{h}_j^* = s_j/r_j$ in the Kaplan–Meier case and $\hat{h}_j^* = 1 - e^{-s_j/r_j}$ in the Nelson–Åalen case, and the preceding formulas continue to hold with these substitutions.
The estimate of the mean clearly depends on $\hat{\mu}_2$, which in turn depends on the tail correction, that is, on $\hat{S}(y)$ for $y \ge y_k$. If $\hat{S}(y) = 0$ for $y \ge y_k$ (as, for example, under Efron's tail correction), then $\hat{\mu}_2 = 0$. Under Klein and Moeschberger's method, with $\hat{S}(y) = \hat{S}(y_k)$ for $y_k \le y < w$, and $\hat{S}(y) = 0$ for $y \ge w$, where $w$ is a plausible upper limit,
$$\hat{\mu}_2 = \hat{S}(y_k)(w - y_k).$$

For the exponential tail correction of Brown, Hollander, and Korwar, $\hat{S}(y) = e^{-cy}$ for $y \ge y_k$ with $c = -\ln\hat{S}(y_k)/y_k$. Thus
$$\hat{\mu}_2 = \int_{y_k}^{\infty} e^{-cy}\,dy = \frac{e^{-cy_k}}{c} = \frac{\hat{S}(y_k)}{c} = -\frac{y_k\,\hat{S}(y_k)}{\ln\hat{S}(y_k)}.$$
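The following Python sketch (mine; `km_mean` and its arguments are illustrative) assembles $\hat{\mu}$ for the Kaplan–Meier case under each of the three tail corrections, with $w$ denoting the Klein–Moeschberger upper limit:

```python
# Mean estimate: mu-hat = sum_j S-hat(y_j)(y_{j+1} - y_j) plus a tail
# contribution mu_2 that depends on the chosen correction.
import math

def km_mean(ys, s, r, tail="efron", w=None):
    """ys sorted, y_0 = 0; returns mu-hat for the Kaplan–Meier case."""
    points = [0.0] + ys
    surv, mu1 = 1.0, 0.0
    for j, y in enumerate(ys):
        mu1 += surv * (y - points[j])     # S-hat(y_{j-1}) * (y_j - y_{j-1})
        surv *= 1 - s[y] / r[y]           # S-hat at y_j
    yk, sk = ys[-1], surv
    if tail == "efron" or sk == 0.0:
        mu2 = 0.0
    elif tail == "klein-moeschberger":    # S-hat constant on [y_k, w)
        mu2 = sk * (w - yk)
    elif tail == "exponential":           # Brown–Hollander–Korwar
        mu2 = -yk * sk / math.log(sk)
    else:
        raise ValueError("unknown tail correction")
    return mu1 + mu2

ys = [1.0, 3.0]
s = {1.0: 2, 3.0: 1}
r = {1.0: 5, 3.0: 2}
print(km_mean(ys, s, r, "efron"))        # 1 + 0.6*2 = 2.2
print(km_mean(ys, s, r, "exponential"))  # 2.2 - 3*0.3/ln(0.3)
```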
The following example illustrates the calculation of $\hat{\mu}$, where all empirical quantities are obtained by substitution of the estimates.
To estimate the variance of $\hat{\mu}$, we note that $\hat{\mu}$ is a function of $\hat{h}_1^*, \hat{h}_2^*, \ldots, \hat{h}_k^*$, whose estimated covariance matrix $\hat{\Sigma}$ follows from that of the $\hat{h}_j$ obtained in the previous section. In particular, $\hat{\Sigma}$ is a diagonal matrix (i.e. all off-diagonal elements are 0). Thus, by the multivariate delta method, with $A$ the vector whose $j$th entry is $\partial\hat{\mu}/\partial\hat{h}_j^*$, the estimated variance of $\hat{\mu}$ is
$$\widehat{\mathrm{Var}}(\hat{\mu}) = A^{T}\hat{\Sigma}A = \sum_{j=1}^{k}\left(\frac{\partial\hat{\mu}}{\partial\hat{h}_j^*}\right)^2\widehat{\mathrm{Var}}(\hat{h}_j^*),$$
and it remains to identify $\partial\hat{\mu}/\partial\hat{h}_j^*$ for $j = 1, 2, \ldots, k$.

To begin, first note that $\hat{\mu}$ depends on $\hat{h}_j^*$ through $\hat{S}(y_m)$ for $m \ge j$, but also on the tail correction employed. As such, we will express the formulas in terms of $\hat{S}(y_m)$ for the moment. We first consider $\partial\hat{S}(y_m)/\partial\hat{h}_j^*$ for $j \le m$. Then,
$$\frac{\partial\hat{S}(y_m)}{\partial\hat{h}_j^*} = \frac{\partial}{\partial\hat{h}_j^*}\prod_{i=1}^{m}(1 - \hat{h}_i^*) = -\frac{\hat{S}(y_m)}{1 - \hat{h}_j^*}.$$
In the expression $\hat{\mu}_1 = \sum_{m=0}^{k-1}\hat{S}(y_m)(y_{m+1} - y_m)$, the parameter $\hat{h}_j^*$ does not appear in the first terms of the summation, that is, in those with $m < j$. Thus,
$$\frac{\partial\hat{\mu}_1}{\partial\hat{h}_j^*} = -\frac{1}{1 - \hat{h}_j^*}\sum_{m=j}^{k-1}(y_{m+1} - y_m)\hat{S}(y_m),$$
and in terms of
$$\hat{\mu}_j^* = \int_{y_j}^{y_k}\hat{S}(y)\,dy = \sum_{m=j}^{k-1}(y_{m+1} - y_m)\hat{S}(y_m),$$
this may be expressed as
$$\frac{\partial\hat{\mu}_1}{\partial\hat{h}_j^*} = -\frac{\hat{\mu}_j^*}{1 - \hat{h}_j^*}.$$
It is also useful to note that $\hat{\mu}_2$ involves the $\hat{h}_i^*$ only through $\hat{S}(y_k)$, and thus
$$\frac{\partial\hat{\mu}_2}{\partial\hat{h}_j^*} = \frac{d\hat{\mu}_2}{d\hat{S}(y_k)}\,\frac{\partial\hat{S}(y_k)}{\partial\hat{h}_j^*} = -\frac{\hat{S}(y_k)}{1 - \hat{h}_j^*}\,\frac{d\hat{\mu}_2}{d\hat{S}(y_k)}.$$
The general variance formula thus may be written as
$$\widehat{\mathrm{Var}}(\hat{\mu}) = \sum_{j=1}^{k}\frac{\widehat{\mathrm{Var}}(\hat{h}_j^*)}{(1 - \hat{h}_j^*)^2}\left[\hat{\mu}_j^* + \hat{S}(y_k)\frac{d\hat{\mu}_2}{d\hat{S}(y_k)}\right]^2.$$
But, in the Kaplan–Meier case,
$$\widehat{\mathrm{Var}}(\hat{h}_j^*) = \widehat{\mathrm{Var}}(\hat{h}_j) = \frac{s_j(r_j - s_j)}{r_j^3},$$
and thus,
$$\frac{\widehat{\mathrm{Var}}(\hat{h}_j^*)}{(1 - \hat{h}_j^*)^2} = \frac{s_j}{r_j(r_j - s_j)},$$
in turn implying that
$$\widehat{\mathrm{Var}}(\hat{\mu}) = \sum_{j=1}^{k}\frac{s_j}{r_j(r_j - s_j)}\left[\hat{\mu}_j^* + \hat{S}(y_k)\frac{d\hat{\mu}_2}{d\hat{S}(y_k)}\right]^2.$$
The variance is estimated by replacing any remaining unknown quantities with their estimates in the above formula, where we understand $\hat{\mu}_j^*$ to be computed from the estimated survival function and $d\hat{\mu}_2/d\hat{S}(y_k)$ from the chosen tail correction.
If $\hat{\mu}_2 = 0$, then
$$\widehat{\mathrm{Var}}(\hat{\mu}) = \sum_{j=1}^{k}\frac{\widehat{\mathrm{Var}}(\hat{h}_j^*)}{(1 - \hat{h}_j^*)^2}\left(\hat{\mu}_j^*\right)^2,$$
a formula that further simplifies, under the Kaplan–Meier assumption (recalling that $\hat{h}_j^* = s_j/r_j$), to
$$\widehat{\mathrm{Var}}(\hat{\mu}) = \sum_{j=1}^{k}\frac{s_j}{r_j(r_j - s_j)}\left(\hat{\mu}_j^*\right)^2.$$

We note that $\hat{\mu}_2 = 0$ if no tail correction is necessary, because $b_k = 0$ (in which case $\hat{S}(y_k) = 0$ as well and the upper limit of the summation is $k - 1$), or under Efron's approximation.
For Klein and Moeschberger's method, $\hat{\mu}_2 = \hat{S}(y_k)(w - y_k)$, so that
$$\frac{d\hat{\mu}_2}{d\hat{S}(y_k)} = w - y_k,$$
implying that
$$\hat{\mu}_j^* + \hat{S}(y_k)\frac{d\hat{\mu}_2}{d\hat{S}(y_k)} = \hat{\mu}_j^* + \hat{S}(y_k)(w - y_k),$$
resulting in the same variance formula as under Efron's method [but $\hat{\mu}_j^*$ is increased by $\hat{S}(y_k)(w - y_k)$ for this latter approximation].
Turning now to the exponential tail correction, recall that $\hat{\mu}_2 = -y_k\hat{S}(y_k)/\ln\hat{S}(y_k)$ and $c = -\ln\hat{S}(y_k)/y_k$. Thus
$$\frac{d\hat{\mu}_2}{d\hat{S}(y_k)} = -\frac{y_k}{\ln\hat{S}(y_k)} + \frac{y_k}{[\ln\hat{S}(y_k)]^2},$$
so that
$$\hat{S}(y_k)\frac{d\hat{\mu}_2}{d\hat{S}(y_k)} = \hat{\mu}_2 + \frac{y_k\hat{S}(y_k)}{[\ln\hat{S}(y_k)]^2}.$$
Therefore, under the exponential tail correction, the general variance estimate becomes
$$\widehat{\mathrm{Var}}(\hat{\mu}) = \sum_{j=1}^{k}\frac{s_j}{r_j(r_j - s_j)}\left[\hat{\mu}_j^* + \hat{\mu}_2 + \frac{y_k\hat{S}(y_k)}{[\ln\hat{S}(y_k)]^2}\right]^2.$$
In the Nelson–Åalen case with Efron's tail correction ($\hat{\mu}_2 = 0$), the tail term may obviously be omitted.
For higher moments, a similar approach may be used. We have, for the $t$th moment,
$$\mathrm{E}(Y^t) = \int_0^{\infty} t y^{t-1} S(y)\,dy,$$
which may be estimated (using $y_0 = 0$ without loss of generality) by
$$\hat{\mu}'_t = \int_0^{\infty} t y^{t-1}\hat{S}(y)\,dy = \sum_{j=0}^{k-1}\hat{S}(y_j)\left(y_{j+1}^t - y_j^t\right) + \int_{y_k}^{\infty} t y^{t-1}\hat{S}(y)\,dy.$$

Again, the final integral on the right-hand side depends on the tail correction, and is 0 if $b_k = 0$ (Kaplan–Meier case) or under Efron's tail correction. It is useful to note that under the exponential tail correction, $\hat{S}(y) = e^{-cy}$ for $y \ge y_k$ with $c = -\ln\hat{S}(y_k)/y_k$, and if $t$ is a positive integer,
$$\int_{y_k}^{\infty} t y^{t-1} e^{-cy}\,dy = \frac{t!}{c^t}\,e^{-cy_k}\sum_{m=0}^{t-1}\frac{(cy_k)^m}{m!},$$
using the tail function representation of the gamma distribution. That is, under the exponential tail correction,
$$\hat{\mu}'_t = \sum_{j=0}^{k-1}\hat{S}(y_j)\left(y_{j+1}^t - y_j^t\right) + \frac{t!}{c^t}\,\hat{S}(y_k)\sum_{m=0}^{t-1}\frac{(cy_k)^m}{m!}.$$

In particular, for the second moment ($t = 2$),
$$\int_{y_k}^{\infty} 2y e^{-cy}\,dy = \frac{2}{c^2}\,e^{-cy_k}(1 + cy_k) = \frac{2\hat{S}(y_k)(1 + cy_k)}{c^2}.$$

Variance estimation for $\hat{\mu}'_t$ may be done in a similar manner as for the mean, if desired.
The results of Section 14.3 apply in situations in which the data are (right) censored. In this section, we discuss the situation in which the data may also be (left) truncated. We have the following definitions.
In insurance survival data and claim data, the most common occurrences are left truncation and right censoring. Left truncation occurs when an ordinary deductible of d is applied. When a policyholder has a loss below d, he or she realizes no benefits will be paid and so does not inform the insurer. When the loss is above d, the amount of the loss is assumed to be reported.5 A policy limit leads to an example of right censoring. When the amount of the loss equals or exceeds u, benefits beyond that value are not paid, and so the exact value is not recorded. However, it is known that a loss of at least u has occurred.
For decrement studies, such as of human mortality, it is impractical to follow people from birth to death. It is more common to follow a group of people of varying ages for a few years during the study period. When a person joins a study, he or she is alive at that time. This person's age at death must be at least as great as the age at entry to the study and thus has been left truncated. If the person is alive when the study ends, right censoring has occurred. The person's age at death is not known, but it is known that it is at least as large as the age when the study ended. Right censoring also affects those who exit the study prior to its end due to surrender. Note that this discussion could have been about other decrements, such as disability, policy surrender, or retirement.
Because left truncation and right censoring are the most common occurrences in actuarial work, they are the only cases that are covered in this section. To save words, truncated always means truncated from below and censored always means censored from above.
When trying to construct an empirical distribution from truncated or censored data, the first task is to create notation to represent the data. For individual (as opposed to grouped) data, the following facts are needed. The first is the truncation point for that observation. Let that value be $d_j$ for the $j$th observation. If there was no truncation, $d_j = 0$.6 Next, record the observation itself. The notation used depends on whether or not that observation was censored. If it was not censored, let its value be $x_j$. If it was censored, let its value be $u_j$. When this subject is presented more formally, a distinction is made between the case where the censoring point is known in advance and where it is not. For example, a liability insurance policy with a policy limit usually has a censoring point that is known prior to the receipt of any claims. By comparison, in a mortality study of insured lives, those that surrender their policy do so at an age that was not known when the policy was sold. In this chapter, no distinction is made between the two cases.
To construct the estimate, the raw data must be summarized in a useful manner. The most interesting values are the uncensored observations. As in Section 14.3, let $y_1 < y_2 < \cdots < y_k$ be the $k$ unique values of the $x_j$ that appear in the sample, where $k$ must be less than or equal to the number of uncensored observations. We also continue to let $s_j$ be the number of times the uncensored observation $y_j$ appears in the sample. Again, an important quantity is $r_j$, the number “at risk” at $y_j$. In a decrement study, $r_j$ represents the number under observation and subject to the decrement at that time. To be under observation at $y_j$, an individual must (1) either be censored or have an observation that is on or after $y_j$ and (2) not have a truncation value that is on or after $y_j$. That is,
$$r_j = (\text{number of } x_i \ge y_j) + (\text{number of } u_i \ge y_j) - (\text{number of } d_i \ge y_j).$$

Alternatively, because the total number of $d_i$ is equal to the total number of $x_i$ and $u_i$ combined, we also have
$$r_j = (\text{number of } d_i < y_j) - (\text{number of } x_i < y_j) - (\text{number of } u_i < y_j).$$

This latter version is a bit easier to conceptualize because it includes all who have entered the study prior to the given age less those who have already left. The key point is that the number at risk is the number of people observed alive at age $y_j$. If the data are loss amounts, the risk set is the number of policies with observed loss amounts (either the actual amount or the maximum amount due to a policy limit) greater than or equal to $y_j$, less those with deductibles greater than or equal to $y_j$. These relationships lead to a recursive version of the formula,
$$r_j = r_{j-1} + (\text{number of } d_i \text{ between } y_{j-1} \text{ and } y_j) - (\text{number of } x_i \text{ equal to } y_{j-1}) - (\text{number of } u_i \text{ between } y_{j-1} \text{ and } y_j),$$
where “between” is interpreted to mean greater than or equal to $y_{j-1}$ and less than $y_j$, and $r_0$ is set equal to zero.
A consequence of the above definitions is that if a censoring or truncation time equals that of a death, the death is assumed to have happened first. That is, the censored observation is considered to be at risk while the truncated observation is not.
The definition of $r_j$ presented here is consistent with that in Section 14.3. That is, if $d_j = 0$ for all observations, the formulas presented here reduce to those presented earlier. The following example illustrates the calculation of the number at risk when there is truncation.
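As a rough illustration (with hypothetical data and a record layout of my own choosing), the number at risk can be computed directly from the definition:

```python
# Risk set with left truncated, right censored data.
# Each record is (d, x, u): truncation point, uncensored value or None,
# censored value or None.
def risk_set(records, y):
    """r at y: observations (x or u) >= y, less truncation points >= y."""
    at_risk = 0
    for d, x, u in records:
        value = x if x is not None else u
        if value >= y and d < y:   # condition (2): d not on or after y
            at_risk += 1
    return at_risk

records = [
    (0.0, 0.8, None),   # no truncation, uncensored at 0.8
    (0.0, None, 2.9),   # censored at 2.9
    (0.3, 2.9, None),   # entered at 0.3, uncensored at 2.9
    (1.0, 4.0, None),   # entered at 1.0, uncensored at 4.0
]
print(risk_set(records, 0.8))  # 3: the last entrant has d = 1.0 >= 0.8
print(risk_set(records, 2.9))  # 3
```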
The approach to developing an empirical estimator of the survival function is to use the formulas developed in Section 14.3, but with this more general definition of . A theoretical treatment that incorporates left truncation is considerably more complex (for details, see Lawless [77]).
The formula for the Kaplan–Meier estimate is the same as presented earlier, namely
$$S_n(y) = \prod_{\{i:\, y_i \le y\}}\left(1 - \frac{s_i}{r_i}\right), \quad y < y_k.$$
The same tail corrections developed in Section 14.3 can be used for $y \ge y_k$ in cases where $S_n(y_k) > 0$.
In this example, a tail correction is not needed because an estimate of survival beyond the five-year term is of no value when analyzing these policyholders.
The same analogy holds for the Nelson–Åalen estimator, where the formula for the cumulative hazard function remains
$$\hat{H}(y) = \sum_{\{i:\, y_i \le y\}}\frac{s_i}{r_i}.$$

As before, $\hat{S}(y) = e^{-\hat{H}(y)}$ for $y < y_k$, and for $y \ge y_k$ the same tail corrections can be used.
In this section, the results were not formally developed, as was done for the case with only right censored data. However, all the results, including formulas for moment estimates and estimates of the variance of the estimators, hold when left truncation is added. However, it is important to note that when the data are truncated, the resulting distribution function is the distribution function of observations given that they are above the smallest truncation point (i.e. the smallest d value). Empirically, there is no information about observations below that value, and thus there can be no information for that range. Finally, if it turns out that there was no censoring or truncation, use of the formulas in this section will lead to the same results as when using the empirical formulas in Section 14.1.
Table 14.18 The data for Exercise 14.37.
 1 | 100 | 15 |
 8 |  65 | 20 |
17 |  40 | 13 |
25 |  31 | 31 |
One problem with empirical distributions is that they are always discrete. If it is known that the true distribution is continuous, the empirical distribution may be viewed as a poor approximation. In this section, a method of obtaining a smooth, empirical-like distribution, called a kernel density distribution, is introduced. We have the following definition.
Note that the empirical distribution is a special type of kernel smoothed distribution in which the random variable assigns probability 1 to the data point. With regard to kernel smoothing, there are several distributions that could be used, three of which are introduced here.
While not necessary, it is customary that the continuous variable have a mean equal to the value of the point it replaces, ensuring that, overall, the kernel estimate has the same mean as the empirical estimate. One way to think about such a model is that it produces the final observed value in two steps. The first step is to draw a value at random from the empirical distribution. The second step is to draw a value at random from a continuous distribution whose mean is equal to the value drawn at the first step. The selected continuous distribution is called the kernel.
For notation, let $p(y_j)$ be the probability assigned to the value $y_j$ by the empirical distribution. Let $K_y(x)$ be a distribution function for a continuous distribution such that its mean is $y$. Let $k_y(x)$ be the corresponding density function.

The function $k_y(x)$ is called the kernel, and the kernel smoothed estimates of the density and distribution functions are
$$\hat{f}(x) = \sum_{j} p(y_j)\,k_{y_j}(x) \quad \text{and} \quad \hat{F}(x) = \sum_{j} p(y_j)\,K_{y_j}(x).$$
Three kernels are now introduced: uniform, triangular, and gamma.
In each case, there is a parameter that relates to the spread of the kernel. In the first two cases, it is the value of $b$, which is called the bandwidth. In the gamma case, the value of the shape parameter $\alpha$ controls the spread, with a larger value indicating a smaller spread. There are other kernels that cover the range from zero to infinity.
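The following Python sketch assumes the standard forms of these kernels (uniform density $1/(2b)$ on $[y - b, y + b]$, triangular density on the same interval peaking at $y$, and a gamma density with shape $\alpha$ and scale $y/\alpha$ so that the mean is $y$); it is an illustration, not the text's definitions:

```python
# Kernel density estimates with uniform, triangular, and gamma kernels,
# each centered (in mean) at the data point it replaces.
import math

def uniform_kernel(y, b):
    return lambda x: 1 / (2 * b) if y - b <= x <= y + b else 0.0

def triangular_kernel(y, b):
    # peak 1/b at x = y, falling linearly to 0 at y - b and y + b
    return lambda x: max(0.0, (b - abs(x - y)) / b ** 2)

def gamma_kernel(y, alpha):
    theta = y / alpha                      # scale chosen so the mean is y
    return lambda x: (x ** (alpha - 1) * math.exp(-x / theta)
                      / (math.gamma(alpha) * theta ** alpha)) if x > 0 else 0.0

def kernel_density(sample, kernel, x):
    """f-hat(x) = sum_j p(y_j) k_{y_j}(x), with p(y_j) = 1/n here."""
    n = len(sample)
    return sum(kernel(y)(x) for y in sample) / n

sample = [27, 82, 115, 126, 155]
print(kernel_density(sample, lambda y: uniform_kernel(y, 50), 100))
print(kernel_density(sample, lambda y: gamma_kernel(y, 4.0), 100))
```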
Table 14.19 The data for Exercise 14.40.
$y_j$ | $s_j$ | $r_j$ |
 10 | 1 | 20 |
 34 | 1 | 19 |
 47 | 1 | 18 |
 75 | 1 | 17 |
156 | 1 | 16 |
171 | 1 | 15 |
The discussion in this section is motivated by the circumstances that accompany the determination of a model for the time to death (or other decrement) for use in pricing, reserving, or funding insurance programs. The particular circumstances are as follows:
These circumstances typically apply when an insurance company (or a group of insurance companies) conducts a mortality study based on the historical experience of a very large portfolio of life insurance policies. (For the remainder of this section, we shall refer only to mortality. The results apply equally to the study of other decrements such as disablement or surrender.)
The typical mortality table is essentially a distribution function or a survival function with values presented only at integral ages. While there are parametric models that do well over parts of the age range (such as the Makeham model at ages over about 30), there are too many changes in the pattern from age 0 to ages well over 100 to allow for a simple functional description of the survival function.
The typical mortality study is conducted over a short period of time, such as three to five years. For example, all persons who are covered by an insurance company's policies at some time from January 1, 2014 through December 31, 2016 might be included. Some of these persons may have purchased their policies prior to 2014 and were still covered when the study period started. During the study period some persons will die, some will cancel (surrender) their policy, some will have their policy expire due to policy provisions (such as with term insurance policies that expire during the study period), and some will still be insured when the study ends. It is assumed that if a policy is cancelled or expires, the eventual age at death will not be known to the insurance company. Some persons will purchase their life insurance policy during the study period and be covered for some of the remaining part of the study period. These policies will be subject to the same decrements (death, surrender, expiration) as other policies. With regard to the age at death, almost every policy in the study will be left truncated.8 If the policy was issued prior to 2014, the truncation point will be the age on January 1, 2014. For those who buy insurance during the study period, the truncation point is the age at which the contract begins. For any person who exits the study due to a cause other than death, their observation is right censored at the age of exit, because all that is known about them is that death will be at some unknown later age.
When no simple parametric distribution is appropriate and when large amounts of data are available, it is reasonable to use a nonparametric model because the large amount of data will ensure that key features of the survival function will be captured. Because there are both left truncation (due to the age at entry into the study) and right censoring (due to termination of the study at a fixed time), when there are large amounts of data, constructing the Kaplan–Meier estimate may require a very large amount of sorting and counting. Over the years, a variety of methods have been introduced and entire texts have been written about the problem of constructing mortality tables from this kind of data (e.g. [12, 81]). While the context for the examples presented here is the construction of mortality tables, the methods can apply any time the circumstances described previously apply.
We begin by examining the two ways in which data are usually collected. Estimators will be presented for both situations. The formulas will be presented in this section and their derivation and properties will be provided in Section 14.8. In all cases, a set of values (ages), $c_0 < c_1 < \cdots < c_K$, has been established in advance and the goal is to estimate the survival function at these values and no others (with some sort of interpolation to be used to provide intermediate values as needed). All of the methods are designed to estimate the conditional one-period probability of death, $q_j$, where $j$ may refer to the interval $(c_j, c_{j+1}]$ and not to a particular age. From those values, the survival function can be evaluated as follows:
$$\hat{S}(c_j) = \prod_{i=0}^{j-1}(1 - \hat{q}_i).$$
In this setting, data are recorded for each person observed. This approach is sometimes referred to as a seriatim method, because the data points are analyzed as a series of individual observations. The estimator takes the form $\hat{q}_j = d_j/e_j$, where $d_j$ is the number of observed deaths in the interval and $e_j$ is a measure of exposure, representing the number of individuals who had a chance to be an observed death in that interval. Should a death occur at one of the boundary values between successive intervals, the death is counted in the preceding interval. When there are no entrants after age $c_j$ into the interval and no exitants except for death during the interval (referred to as complete data), $e_j$ represents the number of persons alive at age $c_j$ and the number of deaths has a binomial distribution. With incomplete data, it is necessary to determine a suitable convenient approximation, preferably one that requires only a single pass through the data set. To illustrate this challenge, consider the following example.
The next step is to tally information for each age interval, building up totals for $d_j$ and $e_j$. Counting deaths is straightforward. For exposures, there are two approaches that are commonly used.
Exact exposure method
Following this method, we set the exposure equal to the exact total time under observation within the age interval. When a death occurs, that person's exposure ends at the exact age of death. It will be shown in Section 14.8 that $\hat{\lambda}_j = d_j/e_j$ is the maximum likelihood estimator of the hazard rate, under the assumption that the hazard rate is constant over the interval $(c_j, c_{j+1}]$. Further properties of this estimator will also be discussed in that section. The estimated hazard rate can then be converted into a conditional probability of death using the formula $\hat{q}_j = 1 - e^{-\hat{\lambda}_j(c_{j+1} - c_j)}$.
Actuarial exposure method
Under this method, the exposure period for deaths extends to the end of the age interval, rather than the exact age at death. This has the advantage of reproducing the empirical estimator for complete data, but has been shown to be an inconsistent estimator in other cases. In this case, the estimate of the conditional probability of death is obtained as $\hat{q}_j = d_j/e_j$.
When the conditional probability of death is small, with a large number of observations, the choice of method is unlikely to materially affect the results.
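A small Python sketch (hypothetical records; the layout and names are mine) showing both exposure calculations for a single age interval:

```python
# Records for one age interval (c_j, c_j + 1], ages measured from c_j:
# (entry, end, died). All values lie in [0, 1].
import math

records = [(0.0, 1.0, False), (0.0, 0.4, True), (0.2, 1.0, False),
           (0.0, 0.7, False), (0.1, 0.9, True)]

deaths = sum(died for _, _, died in records)

# Exact exposure: observation ends at the exact age of death.
exact_e = sum(end - entry for entry, end, _ in records)
q_exact = 1 - math.exp(-deaths / exact_e)      # via the constant hazard rate

# Actuarial exposure: a death remains exposed to the end of the interval.
actuarial_e = sum((1.0 if died else end) - entry
                  for entry, end, died in records)
q_actuarial = deaths / actuarial_e

print(exact_e, q_exact)          # 3.7 and 1 - exp(-2/3.7)
print(actuarial_e, q_actuarial)  # 4.4 and 2/4.4
```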
While the examples have been in a life insurance context, the methodology applies to any situation with left truncation and right censoring. However, there is a situation that is specific to life insurance studies. Consider a one-year term insurance policy. Suppose that an applicant was born on February 15, 1981 and applies for this insurance on October 15, 2016. Premiums are charged by whole-number ages. Some companies will use the age at the last birthday (35 in this case) and some will use the age at the nearest birthday (36 in this case). One company will base the premium on $q_{35}$ and one on $q_{36}$, when both should be using $q_{35.67}$, the value for the applicant's true age. Suppose that a company uses age last birthday. When estimating $q_{35}$, it is not interested in the probability that a person exactly age 35 dies in the next year (the usual interpretation) but, rather, the probability that a random person who is assigned age 35 at issue (who can be anywhere between 35 and 36 years old) dies in the next year. One solution is to obtain a table based on exact ages, assume that the average applicant is 35.5, and use an interpolated value when determining premiums. A second solution is to perform the mortality study using the ages assigned by the company rather than the policyholder's true age. In the example, the applicant is considered to be exactly age 35 on October 15, 2016 and is thus assigned a new birthday of October 15, 1981. When this is done, the study is said to use insuring ages and the resulting values can be used directly for insurance calculations.
Note that with insuring ages, those who enter observation after the study begins are first observed on their newly assigned birthday. Thus there are no approximation issues with regard to those numbers.
The mortality studies described so far in this section are often called calendar-based or date-to-date studies because the period of study runs from one calendar date to another calendar date. It is also common for mortality studies of insured persons to use a different setup.
Instead of having the observations run from one calendar date to another calendar date, observation for a particular policyholder begins on the first policy anniversary following the fixed start date of the study and ends on the last anniversary prior to the study's fixed end date. Such studies are often called anniversary-to-anniversary studies. We can illustrate this through a previous example.
Consider Example 14.18, with the study now running from anniversaries in 2014 to anniversaries in 2016. The first policy comes under observation on 8-2014 at insuring age 33-0 and exits the study on 8-2016 at insuring age 35-0. Policyholder 2 begins observation on 7-2014 at insuring age 33-0. Policyholder 5 surrendered after the 2016 anniversary, so observation ends on 3-2016 at age 34-0. All other ages remain the same. In this setting, all subjects begin observations at an integral age and all who are active policyholders at the end of the study do so at an integral age. Only the ages of death and surrender may be other than integers (and note that with the actuarial exposure method, in calculating the exposure, deaths are placed at the next integral age). There is a price to be paid for this convenience. In a three-year study such as the one in the example, no single policyholder can be observed for more than two years. In the date-to-date version, some policies will contribute three years of exposure.
All of the examples have used one-year time periods. If the length of an interval is not equal to 1, an adjustment is necessary. Exposures should be the fraction of the period under observation and not the length of time.
Instead of recording the exact age at which an event happens, all that is recorded is the age interval in which it took place and the nature of the event. As with the individual method, for a portfolio of insurance policies, only running totals need to be recorded, and the end result is just four to six9 numbers for each age interval:
The analysis of this situation is relatively simple. For the interval from age $c_j$ to age $c_{j+1}$, let $P_j$ be the number of lives under observation at age $c_j$. This number includes those carried over from the prior interval as well as those entering at age $c_j$. Let $n_j$, $d_j$, and $w_j$ be the number entering, dying, and exiting during the interval. Note that, in general, $P_{j+1} \ne P_j + n_j - d_j - w_j$, as the right-hand side must be adjusted by those who exit or enter at exact age $c_{j+1}$. Estimating the mortality probability depends on the method selected and an assumption about when the events that occur during the age interval take place.

One approach is to assume a uniform distribution of the events during the interval. For the exact exposure method, the $P_j$ who start the interval have the potential to contribute a full unit of exposure and the entrants during the year add another half-year each (on average). Similarly, those who die or exit subtract one-half year on average. Thus the net exposure is $e_j = P_j + 0.5n_j - 0.5d_j - 0.5w_j$. For the actuarial exposure method, those who die do not reduce the exposure, and it becomes $e_j = P_j + 0.5n_j - 0.5w_j$.

Another approach is to adapt the Kaplan–Meier estimator to this situation. Suppose that the deaths all occur at midyear and all other decrements occur uniformly through the year. Then the risk set at midyear is $P_j + 0.5n_j - 0.5w_j$ and the estimator is the same as the actuarial estimator.
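A sketch of both interval-based calculations in Python (the symbols follow the definitions above; the numbers are hypothetical):

```python
# Interval-based exposures under the uniform-events assumption:
# P lives start the interval, n enter, d die, and w exit during it.
import math

def interval_q(P, n, d, w, method):
    if method == "exact":
        e = P + 0.5 * n - 0.5 * d - 0.5 * w   # deaths end exposure mid-interval
        return 1 - math.exp(-d / e)           # hazard converted to probability
    e = P + 0.5 * n - 0.5 * w                 # deaths exposed to interval end
    return d / e

print(interval_q(P=1000, n=60, d=8, w=40, method="exact"))      # ~0.00792
print(interval_q(P=1000, n=60, d=8, w=40, method="actuarial"))  # 8/1010
```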
The goal of all the estimation procedures in this book is to deduce the probability distribution for the random variable in the absence of truncation and censoring. For loss data, that would be the probabilities if there were no deductible or limit, that is, ground-up losses. For lifetime data, it would be the probability distribution of the age at death if we could follow the person from birth to death. These are often referred to as single-decrement probabilities and are typically denoted $q_x'^{(j)}$ in life insurance mathematics. In the life insurance context, the censoring rates are often as important as the mortality rates. For example, in the context of Data Set D, both time to death and time to withdrawal may be of interest. In the former case, withdrawals cause observations to be censored. In the latter case, censoring is caused by death. A superscript identifies the decrement of interest. For example, suppose that the decrements were death ($d$) and withdrawal ($w$). Then $q_x'^{(w)}$ is the actuarial notation for the probability that a person alive and insured at age $x$ withdraws prior to age $x + 1$ in an environment where withdrawal is the only decrement, that is, that death is not possible. When the causes of censoring are other important decrements, an often-used assumption is that all the decrements are stochastically independent. That is, that they do not influence each other. For example, a person who withdraws at a particular age has the same probability of dying in the following month as a person who does not.
Table 14.23 The single-decrement withdrawal probabilities for Example 14.24.
j | |||||||
0 |  0 | 30 | 3 | 3 | 1 |  0 | |
1 | 29 |  0 | 1 | 2 | 0 |  0 | |
2 | 28 |  0 | 3 | 3 | 2 |  0 | |
3 | 26 |  0 | 3 | 3 | 3 |  0 | |
4 | 23 |  0 | 0 | 4 | 2 | 17 |
Table 14.26 The data for Exercise 14.44.
d | u | x | d | u | x |
45 | 46.0 | 45 | 45.8 | ||
45 | 46.0 | 46 | 47.0 | ||
45 | 45.3 | 46 | 47.0 | ||
45 | 46.7 | 46 | 46.3 | ||
45 | 45.4 | 46 | 46.2 | ||
45 | 47.0 | 46 | 46.4 | ||
45 | 45.4 | 46 | 46.9 |
Table 14.27 The data for Exercise 14.45.
Deductible | Payment^a | Deductible | Payment |
250 | 2,221 | 500 | 3,660 |
250 | 2,500 | 500 | 215 |
250 | 207 | 500 | 1,302 |
250 | 3,735 | 500 | 10,000 |
250 | 5,000 | 1,000 | 1,643 |
250 | 517 | 1,000 | 3,395 |
250 | 5,743 | 1,000 | 3,981 |
500 | 2,500 | 1,000 | 3,836 |
500 | 525 | 1,000 | 5,000 |
500 | 4,393 | 1,000 | 1,850 |
500 | 5,000 | 1,000 | 6,722 |
^a Numbers in italics indicate that the amount paid was at the policy limit.
In Section 14.7, methods were introduced for estimating mortality probabilities with large data sets. One of the methods was a seriatim method using exact exposure. In this section, that estimator will be shown to be maximum likelihood under a particular assumption. To do this, we need to develop some notation. Suppose that we are interested in estimating the probability that an individual alive at age $a$ dies prior to age $b$, where $a < b$. This probability is denoted by $q$. Let $X$ be the random variable with survival function $S(x)$, the probability of surviving from birth to age $x$. Now let $Y$ be the random variable $X$ conditioned on $X > a$. Its survival function is $S_Y(y) = \Pr(X > y \mid X > a) = S(y)/S(a)$.

We now introduce a critical assumption about the shape of the survival function within the interval under consideration. Assume that $S_Y(y) = e^{-\lambda(y - a)}$ for $a \le y \le b$. This means that the survival function decreases exponentially within the interval. Equivalently, the hazard rate (called the force of mortality in life insurance mathematics) is assumed to be constant at some value $\lambda$ within the interval. Beyond $b$, a different hazard rate can be used. Our objective is to estimate the conditional probability $q = 1 - e^{-\lambda(b - a)}$. Thus we can perform the estimation using only data from and a functional form for this interval. Values of the survival function beyond $b$ will not be needed.
Now consider data collected on $n$ individuals, all of whom were observed during the age interval $(a, b)$. For individual $j$, let $a_j$ be the age at which the person was first observed within the interval and let $b_j$ be the age at which the person was last observed within the interval (thus $a \le a_j < b_j \le b$). Let $\delta_j = 0$ if the individual was alive when last observed and $\delta_j = 1$ if the individual was last observed due to death. For this analysis, we assume that each individual's censoring age (everyone who does not die in the interval will be censored, either by reaching age $b$ or through some event that removes them from observation) is known in advance. Thus the only random quantities are the $\delta_j$ and, for individuals with $\delta_j = 1$, the age at death. The likelihood function is
$$L = \prod_{j=1}^{n}\lambda^{\delta_j}e^{-\lambda(b_j - a_j)} = \lambda^{d}e^{-\lambda e},$$
where $d = \sum_{j=1}^{n}\delta_j$ is the number of observed deaths and $e = \sum_{j=1}^{n}(b_j - a_j)$ is the total time the individuals were observed in the interval (which was called exact exposure in Section 14.7). Taking logarithms, differentiating, and solving produces
$$\ln L = d\ln\lambda - \lambda e, \qquad \frac{d}{d\lambda}\ln L = \frac{d}{\lambda} - e = 0, \qquad \hat{\lambda} = \frac{d}{e}.$$

Finally, the maximum likelihood estimate of the probability of death is $\hat{q} = 1 - e^{-\hat{\lambda}(b - a)} = 1 - e^{-(b - a)d/e}$.
Studies often involve random censoring, where individuals may exit for reasons other than death at times that were not known in advance. If all decrements (e.g. death, disability, and retirement) are stochastically independent (that is, the timing of one event does not influence any of the others), then the maximum likelihood estimator turns out to be identical to the one derived in this section. Although we do not derive the result, note that it follows from the fact that the likelihood function can be decomposed into separate factors for each decrement.
The variance of this estimator can be approximated using the observed information approach. The second derivative of the loglikelihood function is
$$\frac{d^2}{d\lambda^2}\ln L = -\frac{d}{\lambda^2}.$$
Substitution of the estimator produces
$$-\frac{d}{\hat{\lambda}^2} = -\frac{e^2}{d},$$
and so $\widehat{\mathrm{Var}}(\hat{\lambda}) = d/e^2$. Using the delta method with $\hat{q} = 1 - e^{-\hat{\lambda}(b - a)}$,
$$\widehat{\mathrm{Var}}(\hat{q}) \approx \left[(b - a)e^{-\hat{\lambda}(b - a)}\right]^2\frac{d}{e^2}.$$

Recall from Section 14.7 that there is an alternative called actuarial exposure, with $\hat{q} = (b - a)d/e$ and with $e$ calculated in a different manner. When analyzing results from this approach, it is common to assume that $d$ is the result of a binomial experiment with sample size $e/(b - a)$. Then,
$$\widehat{\mathrm{Var}}(\hat{q}) = \frac{\hat{q}(1 - \hat{q})}{e/(b - a)}.$$

If the terms $e^{-2\hat{\lambda}(b - a)}$ and $1 - \hat{q}$ are dropped (and they are often close to 1), the two variance formulas are identical (noting that the values of $e$ will be slightly different).
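A short Python sketch (illustrative values; function names are mine) comparing the two variance approximations:

```python
# Compare the variance approximations for q-hat over an interval (a, b]
# of the given width, from d deaths and exposure e.
import math

def exact_exposure_variance(d, e, width=1.0):
    lam = d / e                            # MLE of the constant hazard
    q = 1 - math.exp(-lam * width)
    var = (width * math.exp(-lam * width)) ** 2 * d / e ** 2  # delta method
    return q, var

def actuarial_variance(d, e, width=1.0):
    q = width * d / e
    return q, q * (1 - q) / (e / width)    # binomial "sample size" e/width

print(exact_exposure_variance(8, 1006.0))
print(actuarial_variance(8, 1010.0))       # nearly identical results
```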
The discussion to this point has concerned estimating the probability of a decrement in the absence of other decrements. An unstated assumption was that the environment in which the observations are made is one where once any decrement occurs, the individual is no longer observed.
A common, and more complex, situation is one in which after a decrement occurs, the individual remains under observation, with the possibility of further decrements. A simple example is a disability income policy. A healthy individual can die, become disabled, or surrender their policy. Those who become disabled continue to be observed, with possible decrements being recovery or death. Scenarios such as this are referred to as multistate models. Such models are discussed in detail in Dickson et al. [28]. In this section, we cover estimation of the transition intensities associated with such models. The results presented are based on Waters [129].
For notation, let the possible states be labeled $0, 1, \ldots, m$, and let $\mu_x^{ij}$ be the force of transition to state $j$ for an individual who is currently between ages $x$ and $x + 1$ and is in state $i$. This notation is based on an assumption that the force of transition is constant over an integral age. This is similar to the earlier assumption that the force of decrement is constant over a given age.

While not shown here, maximum likelihood estimates turn out to be based on exact exposure for the time spent in each state. For those between ages $x$ and $x + 1$ (which can be generalized for periods other than one year), let $T^i$ be the total time policyholders are observed in state $i$ and $n^{ij}$ be the number of observed transitions from state $i$ to state $j$. Then, $\hat{\mu}_x^{ij} = n^{ij}/T^i$, and similarly for each other pair of states.
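A minimal Python sketch of this estimation (the data layout and names are my own invention):

```python
# Transition intensity estimates: mu-hat^{ij} = n^{ij} / T^i, using exact
# exposure of the time spent in each state.
from collections import defaultdict

def estimate_intensities(paths):
    """paths: lists of (state, time_entered); the last entry ends observation."""
    time_in = defaultdict(float)   # T^i
    moves = defaultdict(int)       # n^{ij}
    for path in paths:
        for (s0, t0), (s1, t1) in zip(path, path[1:]):
            time_in[s0] += t1 - t0
            if s1 != s0:
                moves[(s0, s1)] += 1
    return {pair: count / time_in[pair[0]] for pair, count in moves.items()}

# healthy(0) -> disabled(1) -> dead(2), plus a censored healthy life
paths = [[(0, 0.0), (1, 0.6), (2, 0.9)],
         [(0, 0.0), (0, 1.0)]]    # observed in state 0 for a full year
print(estimate_intensities(paths))  # {(0, 1): 1/1.6, (1, 2): 1/0.3}
```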
The construction of interval-based methods is more difficult because it is unclear when to place the transitions. Those who make one transition in the year may be reasonably placed at mid-age. However, those who make two transitions would more reasonably be placed at the one-third and two-thirds points. This would require careful data-keeping and the counting of many different cases.