Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

14
Construction of Empirical Models

14.1 The Empirical Distribution

The material presented here has traditionally been presented under the heading of “survival models,” with the accompanying notion that the techniques are useful only when studying lifetime distributions. Standard texts on the subject, such as Klein and Moeschberger [70] and Lawless [77], contain examples that are exclusively oriented in that direction. However, the same problems that occur when modeling lifetime occur when modeling payment amounts. The examples we present are of both types. However, the latter sections focus on special considerations when constructing decrement models. Only a handful of references are presented, most of the results being well developed in the survival models literature. If you want more detail and proofs, consult a text dedicated to the subject, such as the ones just mentioned.

In Chapter 4, models were divided into two types – data-dependent and parametric. The definitions are repeated here.

This chapter will focus on data-dependent distributions as models, making few, if any, assumptions about the underlying distribution. To fix the most important concepts, we begin by assuming that we have a sample of n observations that are an independent and identically distributed sample from the same (unspecified) continuous distribution. This is referred to as a complete data situation. In that context, we have the following definition.

When observations are collected from a probability distribution, the ideal situation is to have the (essentially) exact¹ value of each observation. This case is referred to as complete, individual data and applies to Data Set B, introduced in Chapter 10 and reproduced here as Table 14.1. There are two reasons why exact data may not be available. One is grouping, in which all that is recorded is the range of values in which the observation belongs. Grouping applies to Data Set C and to Data Set A for those with five or more accidents. There data sets were introduced in Chapter 10 and are reproduced here as Tables 14.2 and 14.3, respectively.

Table 14.1 Data Set B.

27	82	115	126	155	161	243	294	340	384
457	680	855	877	974	1,193	1,340	1,884	2,558	15,743

Table 14.2 Data Set C.

Payment range	Number of payments
0–7,500	99
7,500–17,500	42
17,500–32,500	29
32,500–67,500	28
67,500–125,000	17
125,000–300,000	9
Over 300,000	3

Table 14.3 Data Set A.

Number of accidents	Number of drivers
0	81,714
1	11,306
2	1,618
3	250
4	40
5 or more	7

A second reason that exact values may not be available is the presence of censoring or truncation. When data are censored from below, observations below a given value are known to be below that value but the exact value is unknown. When data are censored from above, observations above a given value are known to be above that value but the exact value is unknown. Note that censoring effectively creates grouped data. When the data are grouped in the first place, censoring has no effect. For example, the data in Data Set C may have been censored from above at 300,000, but we cannot know for sure from the data set and that knowledge has no effect on how we treat the data. In contrast, were Data Set B to be censored at 1,000, we would have 15 individual observations and then five grouped observations in the interval from 1,000 to infinity.

In insurance settings, censoring from above is fairly common. For example, if a policy pays no more than 100,000 for an accident, any time the loss is above 100,000 the actual amount will be unknown, but we will know that it happened. Note that Data Set A has been censored from above at 5. This is more common language than saying that Data Set A has some individual data and some grouped data. When studying mortality or other decrements, the study period may end with some individuals still alive. They are censored from above in that we know the death will occur sometime after their age when the study ends.

When data are truncated from below, observations below a given value are not recorded. Truncation from above implies that observations above a given value are not recorded. In insurance settings, truncation from below is fairly common. If an automobile physical damage policy has a per-claim deductible of 250, any losses below 250 will not come to the attention of the insurance company and so will not appear in any data sets. Data sets may have truncation forced on them. For example, if Data Set B were to be truncated from below at 250, the first seven observations would disappear and the remaining 13 would be unchanged. In decrement studies it is unusual to observe individuals from birth. If someone is first observed at, say, age 20, that person is from a population where anyone who died before age 20 would not have been observed and thus is truncated from below.

As noted in Definition 14.3, the empirical distribution assigns probability 1/n to each data point. That definition works well when the value of each data point is recorded. An alternative definition follows.

In the following example, not all values are distinct.

To assess the quality of the estimate, we examine statistical properties, in particular, the mean and variance. Working with the empirical estimate of the distribution function is straightforward. To see that with complete data the empirical estimator of the survival function is unbiased and consistent, recall that the empirical estimate of is , where Y is the number of observations in the sample that are less than or equal to x. Then Y must have a binomial distribution with parameters n and and

demonstrating that the estimator is unbiased. The variance is

which has a limit of zero, thus verifying consistency.

To make use of the result, the best we can do for the variance is estimate it. It is unlikely that we will know the value of , because that is the quantity we are trying to estimate. The estimated variance is given by

(14.1)

The same results hold for empirically estimated probabilities. Let . The empirical estimate of p is . Arguments similar to those used for verify that is unbiased and consistent, with .

14.2 Empirical Distributions for Grouped Data

For grouped data as in Data Set C, construction of the empirical distribution as defined previously is not possible. However, it is possible to approximate the empirical distribution. The strategy is to obtain values of the empirical distribution function wherever possible and then connect those values in some reasonable way. For grouped data, the distribution function is usually approximated by connecting the points with straight lines. For notation, let the group boundaries be , where often and . The number of observations falling between and is denoted , with . For such data, we are able to determine the empirical distribution at each group boundary. That is, . Note that no rule is proposed for observations that fall on a group boundary. There is no correct approach, but whatever approach is used, consistency in assignment of observations to groups should be used. Note that in Data Set C it is not possible to tell how the assignments were made. If we had that knowledge, it would not affect any subsequent calculations.²

This function is differentiable at all values except group boundaries. Therefore the density function can be obtained. To completely specify the density function, it is arbitrarily made right continuous.

Many computer programs that produce histograms actually create a bar chart with bar heights proportional to . A bar chart is acceptable if the groups have equal width, but if not, then the preceding formula is needed. The advantage of this approach is that the histogram is indeed a density function and, among other things, areas under the histogram can be used to obtain empirical probabilities.

14.2.1 Exercises

14.1 Construct the ogive and histogram for the data in Table 14.4.

Table 14.4 The data for Exercise 14.1.

Payment range	Number of payments
0–25	6
25–50	24
50–75	30
75–100	31
100–150	57
150–250	80
250–500	85
500–1,000	54
1,000–2,000	15
2,000–4,000	10
Over 4,000	0

14.2 (*) The following 20 windstorm losses (in millions of dollars) were recorded in one year:
1. Construct an ogive based on using class boundaries at 0.5, 2.5, 8.5, 15.5, and 29.5.
2. Construct a histogram using the same boundaries as in part (a).

14.3 The data in Table 14.5 are from Herzog and Laverty [53]. A certain class of 15-year mortgages was followed from issue until December 31, 1993. The issues were split into those that were refinances of existing mortgages and those that were original issues. Each entry in the table provides the number of issues and the percentage of them that were still in effect after the indicated number of years. Draw as much of the two ogives (on the same graph) as is possible from the data. Does it appear from the ogives that the lifetime variable (the time to mortgage termination) has a different distribution for refinanced versus original issues?

Table 14.5 The data for Exercise 14.3.

	Refinances		Original
Years	Number issued	Survived	Number issued	Survived
1.5	42,300	99.97	12,813	99.88
2.5	9,756	99.82	18,787	99.43
3.5	1,550	99.03	22,513	98.81
4.5	1,256	98.41	21,420	98.26
5.5	1,619	97.78	26,790	97.45

14.4 (*) The data in Table 14.6 were collected (the units are millions of dollars). Construct the histogram.

Table 14.6 The data for Exercise 14.4.

Loss Number of observations

0–2 25

2–10 10

10–100 10

100–1,000 5
14.5 (*) Forty losses have been observed. Sixteen losses are between 1 and (in millions), and the sum of the 16 losses is 20. Ten losses are between and 2, with a total of 15. Ten more are between 2 and 4, with a total of 35. The remaining four losses are greater than 4. Using the empirical model based on these observations, determine .
14.6 (*) A sample of size 2,000 contains 1,700 observations that are no greater than 6,000, 30 that are greater than 6,000 but no greater than 7,000, and 270 that are greater than 7,000. The total amount of the 30 observations that are between 6,000 and 7,000 is 200,000. The value of for the empirical distribution associated with these observations is 1,810. Determine for the empirical distribution.
14.7 (*) A random sample of unknown size produced 36 observations between 0 and 50; x between 50 and 150; y between 150 and 250; 84 between 250 and 500; 80 between 500 and 1,000; and none above 1,000. Two values of the ogive constructed from these observations are and . Determine the value of x.

Loss	Number of observations
0–2	25
2–10	10
10–100	10
100–1,000	5

14.8 The data in Table 14.7 are from Hogg and Klugman [55, p. 128]. They represent the total damage done by 35 hurricanes between the years 1949 and 1980. The losses have been adjusted for inflation (using the Residential Construction Index) to be in 1981 dollars. The entries represent all hurricanes for which the trended loss was in excess of 5,000,000.

The federal government is considering funding a program that would provide 100% payment for all damages for any hurricane causing damage in excess of 5,000,000. You have been asked to make some preliminary estimates.

Estimate the mean, standard deviation, coefficient of variation, and skewness for the population of hurricane losses.
Estimate the first and second limited moments at 500,000,000.

Table 14.7 Trended hurricane losses.

Year	Loss (10³)	Year	Loss (10³)	Year	Loss (10³)
1964	6,766	1964	40,596	1975	192,013
1968	7,123	1949	41,409	1972	198,446
1971	10,562	1959	47,905	1964	227,338
1956	14,474	1950	49,397	1960	329,511
1961	15,351	1954	52,600	1961	361,200
1966	16,983	1973	59,917	1969	421,680
1955	18,383	1980	63,123	1954	513,586
1958	19,030	1964	77,809	1954	545,778
1974	25,304	1955	102,942	1970	750,389
1959	29,112	1967	103,217	1979	863,881
1971	30,146	1957	123,680	1965	1,638,000
1976	33,727	1979	140,136

14.9 (*) There have been 30 claims recorded in a random sampling of claims. There were 2 claims for 2,000, 6 for 4,000, 12 for 6,000, and 10 for 8,000. Determine the empirical skewness coefficient.

14.3 Empirical Estimation with Right Censored Data

In this section, we generalize the empirical approach of the previous section to situations in which the data are not complete. In particular, we assume that individual observations may be right censored. We have the following definition.

In insurance claims data, the presence of a policy limit may give rise to right censored observations. When the amount of the loss equals or exceeds the limit u, benefits beyond that value are not paid, and so the exact value is typically not recorded. However, it is known that a loss of at least u has occurred.

When carrying out a study of the mortality of humans, if a person is alive when the study ends, right censoring has occurred. The person's age at death is not known, but it is known that it is at least as large as the age when the study ended. Right censoring also affects those who exit the study prior to its end due to surrender or lapse. Note that this discussion could have been about other decrements, such as disability, policy surrender, or retirement.

For this section and the next two, we assume that the underlying random variable has a continuous distribution. While data from discrete random variables can also be right censored (Data Set A is an example), the use of empirical estimators is rare and thus the development of analogous formulas is unlikely to be worth the effort.

We now make specific assumptions regarding how the data are collected and recorded. It is assumed that we have a random sample for which some (but not all) of the data are right censored. For the uncensored (i.e. completely known) observations, we will denote their k unique values by . We let denote the number of times that appears in the sample. We also set as the minimum possible value for an observation and assume that . Often, . Similarly, set as the largest observation in the data, censored or uncensored. Hence, . Our goal is to create an empirical (data-dependent) distribution that places probability at the values .

We often possess the specific value at which an observation was censored. However, for both the derivation of the estimator and its implementation, it is only necessary to know between which y-values it occurred. Thus, the only input needed is , the number of right censored observations in the interval for . We make the assumption that if an observation is censored at , then the observation is censored at (i.e. in the lifetime situation, immediately after the death). It is possible to have censored observations at values between and . However, because we are placing probability only at the uncensored values, these observations provide no information about those probabilities and so can be dropped. When referring to the sample size, n will denote the number of observations after these have been dropped. Observations censored at or above cannot be ignored. Let be the number of observations right censored at or later. Note that if , then .

The final important quantity is , referred to as the number “at risk” at . When thinking in terms of a mortality study, the risk set comprises the individuals who are under observation at that age. Included are all who die at that age or later and all who are censored at that age or later. Formally, we have the following definition.

This formula reflects the fact that the number at risk at is that at less the exact observations at and the censored observations in . Note that and hence .

The following numerical example illustrates these ideas.

Example 14.5

Determine the numbers at risk for the data in Table 14.8. Observations marked with an asterisk (*) were censored at that value.

Table 14.8 The raw data for Example 14.5.

1	2	3*	4	4	4*	4*	5	7*	8
8	8	9	9	9	9	10*	12	12	15*

The first step is to summarize the data using previously defined notation. The summarized values and risk set calculations appear in Table 14.9. Addition of the values in the s and b columns confirms that there are observations. Thus, . Had only this table been available, the fact that would have implied that there is one censored value larger than 12, the largest observed value. The value indicates that this observation was censored at 15.

Table 14.9 The data for Example 14.5.

i
0	0	—	—	—
1	1	1	0	20
2	2	1	1
3	4	2	2
4	5	1	1
5	8	3	0
6	9	4	1
7	12	2	1
max	15	—	—

As noted earlier, calculation of the risk set does not require the exact values of the censored observations, only the interval in which they fall. For example, had the censored observation of 7 been at 6, the resulting numbers at risk would not change (because would still equal one).

It should be noted that if there is no censoring, so that for all i, then the data are complete and the techniques of Section 14.1 may be used. As such, the approach of this section may be viewed as a generalization.

We shall now present a heuristic derivation of a well-known generalization of the empirical distribution function. This estimator is referred to as either the Kaplan–Meier or the product limit estimator.

To proceed, we first present some basic facts regarding the distribution of a discrete random variable Y, say, with support on the points . Let , and then the survival function is (where means to take the sum or product over all values of i where )

Setting for , we have

and . We also have from the definition of .

Thus,

implying that . Hence,

Also, , and for ,

The heuristic derivation proceeds by viewing for as unknown parameters, and estimating them by a nonparametric “maximum likelihood” bsed argument.³ For a more detailed discussion, see Lawless [77]. For the present data, the uncensored observations at each contribute to the likelihood where and

Each of the censored observations contributes

to the likelihood (recall that for ), and the censored observations at or above each contribute .

The likelihood is formed by taking products over all contributions (assuming independence of all data points), namely

which, in terms of the , becomes

where the last line follows by interchanging the order of multiplication in each of the two double products. Thus,

Observe that and . Hence,

This likelihood has the appearance of a product of binomial likelihoods. That is, this is the same likelihood as if were realizations of k independent binomial observations with parameters and . The “maximum likelihood estimate” of is obtained by taking logarithms, namely

implying that

Equating this latter expression to zero yields .

For , the Kaplan–Meier [66] estimate of is obtained by replacing by wherever it appears. Noting that for , it follows that

This may be written more succinctly as for . When , you should interpret as .

Example 14.6

Construct the Kaplan–Meier estimate of using the data in Example 14.5. Indicate how the answer would change if and .

The calculations appear in Table 14.10. The estimated survival function is

With the change in values, we have and for .

Table 14.10 The Kaplan–Meier estimates for Example 14.6.

i
1	1	1	20
2	2	1	19
3	4	2	17
4	5	1	13
5	8	3	11
6	9	4	8
7	12	2	3

We now discuss estimation for . First, note that if (no censored observations at ), then and for is clearly the (only) obvious choice. However, if , as in the previous example, there are no empirical data to estimate for , and tail estimates for (often called tail corrections) are needed. There are three popular extrapolations:

Efron's tail correction [31] assumes that for .
Klein and Moeschberger [70, p. 118] assume that for and for , where is a plausible upper limit for the underlying random variable. For example, in a study of human mortality, the limit might be 120 years.
Brown, Hollander, and Korwar's exponential tail correction [18] assumes that and that for . With , , and thus

Note that if there is no censoring ( for all i), then , and for

In this case, is the number of observations exceeding y and . Thus, with no censoring, the Kaplan–Meier estimate reduces to the empirical estimate of the previous section.

An alternative to the Kaplan–Meier estimator, called the Nelson–Åalen estimator [1], [93], is sometimes used. To motivate the estimator, note that if is the survival function of a continuous distribution with failure rate , then is called the cumulative hazard rate function. The discrete analog is, in the present context, given by , which can intuitively be estimated by replacing by its estimate . The Nelson–Åalen estimator of is thus defined for to be

That is, for , and the Nelson–Åalen estimator of the survival function is . The notation under the summation sign indicates that values of should be included only if . For , the situation is similar to that involving the Kaplan–Meier estimate in the sense that a tail correction of the type discussed earlier needs to be employed. Note that, unlike the Kaplan–Meier estimate, , so that a tail correction is always needed.

Example 14.8

Determine the Nelson–Åalen estimates for the data in Example 14.6. Continue to assume .

The estimates of the cumulative hazard function are given in Table 14.11. The estimated survival function is

With regard to tail correction, Efron's method has , . Klein and Moeschberger's method has , , and , . The exponential tail correction has , .

Table 14.11 The Nelson–Åalen estimates for Example 14.8.

i
1	1	1	20
2	2	1	19
3	4	2	17
4	5	1	13
5	8	3	11
6	9	4	8
7	12	2	3

To assess the quality of the two estimators, we will now consider estimation of the variance. Recall that for , the Kaplan–Meier estimator may be expressed as

which is a function of the . Thus, to estimate the variance of , we first need the covariance matrix of . We estimate this from the “likelihood,” using standard likelihood methods. Recall that

and thus satisfies

Thus,

and

which, with replaced by , becomes

For ,

The observed information, evaluated at the maximum likelihood estimate, is thus a diagonal matrix, which when inverted yields the estimates

and

These results also follow directly from the binomial form of the likelihood.

Returning to the problem at hand, the delta method⁴ gives the approximate variance of as , for an estimator .

To proceed, note that

and since the are assumed to be approximately uncorrelated,

The choice yields

implying that

Because , the delta method with yields

This yields the final version of the estimate,

(14.2)

Equation (14.2) holds for in all cases. However, if (that is, there are no censored observations after the last uncensored observation), then it holds for . Hence the formula always holds for .

Formula (14.2) is known as Greenwood's approximation to the variance of , and is known to often understate the true variance.

If there is no censoring, and we take , then Greenwood's approximation yields

which may be expressed (using due to no censoring) as

Because , this sum telescopes to give

which is the same estimate as obtained in Section 14.1, but derived without use of the delta method.

We remark that in the case with (i.e. ), Greenwood's approximation cannot be used to estimate the variance of . In this case, is often replaced by in the denominator.

Turning now to the Nelson–Åalen estimator, we note that

and the same reasoning used for Kaplan–Meier implies that , yielding the estimate

(14.3)

which is referred to as Klein's estimate. A commonly used alternative estimate due to Åalen is obtained by replacing with in the numerator.

We are typically more interested in than . Because , the delta method with yields Klein's survival function estimate

that is, the estimated variance is

Variance estimates for depend on the tail correction used. Efron's method gives an estimate of 0, which is not of interest in the present context. For the exponential tail correction in the Kaplan–Meier case, we have for , , and the delta method with yields

Likelihood methods typically result in approximate asymptotic normality of the estimates, and this is true for Kaplan–Meier and Nelson–Åalen estimates as well. Using the results of Example 14.9, an approximate 95% confidence interval for is given by

For , the Nelson–Åalen estimate gives a confidence interval of

whereas that based on the Kaplan–Meier estimate is

Clearly, both confidence intervals for are unsatisfactory, both including values greater than 1.

An alternative approach can be constructed as follows, using the Kaplan–Meier estimate as an example.

Let . Using the delta method, the variance of Y can be approximated as follows. The function of interest is . Its derivative is

According to the delta method,

Then, an approximate 95% confidence interval for is

Because , evaluating each endpoint of this formula provides a confidence interval for . For the upper limit, we have (where )

Similarly, the lower limit is . This interval will always be inside the range 0–1 and is referred to as a log-transformed confidence interval.

For the Nelson–Åalen estimator, a similar log-transformed confidence interval for has endpoints , where . Exponentiation of the negative of these endpoints yields a corresponding interval for .

14.3.1 Exercises

14.10 (*) You are given the following times of first claim for five randomly selected automobile insurance policies: 1, 2, 3, 4, and 5. You are later told that one of the five times given is actually the time of policy lapse, but you are not told which one. The smallest product-limit estimate of , the probability that the first claim occurs after time 4, would result if which of the given times arose from the lapsed policy?
14.11 (*) For a mortality study with right censored data, you are given the information in Table 14.12. Calculate the estimate of the survival function at time 12 using the Nelson–Åalen estimate.

Table 14.12 The data for Exercise 14.11.

Time Number of deaths Number at risk

5 2 15

7 1 12

10 1 10

12 2 6
14.12 (*) Let n be the number of lives observed from birth. None were censored and no two lives died at the same age. At the time of the ninth death, the Nelson–Åalen estimate of the cumulative hazard rate is 0.511, and at the time of the tenth death it is 0.588. Estimate the value of the survival function at the time of the third death.
14.13 (*) All members of a study joined at birth; however, some may exit the study by means other than death. At the time of the third death, there was one death (i.e. ); at the time of the fourth death, there were two deaths; and at the time of the fifth death, there was one death. The following product-limit estimates were obtained: , , and . Determine the number of censored observations between times and . Assume that no observations were censored at the death times.
14.14 (*) A mortality study has right censored data and no left truncated data. Uncensored observations occurred at ages 3, 5, 6, and 10. The risk sets at these ages were 50, 49, k, and 21, respectively, while the number of deaths observed at these ages were 1, 3, 5, and 7, respectively. The Nelson–Åalen estimate of the survival function at time 10 is 0.575. Determine k.
14.15 (*) Consider the observations 2,500, 2,500, 2,500, 3,617, 3,662, 4,517, 5,000, 5,000, 6,010, 6,932, 7,500, and 7,500. No truncation is possible. First, determine the Nelson–Åalen estimate of the cumulative hazard rate function at 7,000, assuming that all the observations are uncensored. Second, determine the same estimate, assuming that the observations at 2,500, 5,000, and 7,500 were right censored.
14.16 (*) No observations in a data set are truncated. Some are right censored. You are given , , and the Kaplan–Meier estimates , , and . Also, between the observations and there are six right censored observations and no observations were right censored at the same value as an uncensored observation. Determine .
14.17 For Data Set A, determine the empirical estimate of the probability of having two or more accidents and estimate its variance.
14.18 (*) Ten individuals were observed from birth. All were observed until death. Table 14.13 gives the death ages. Let denote the estimated conditional variance of if calculated without any distribution assumption. Let denote the conditional variance of if calculated knowing that the survival function is . Determine .

Table 14.13 The data for Exercise 14.18.

Age Number of deaths

2 1

3 1

5 1

7 2

10 1

12 2

13 1

14 1
14.19 (*) Observations can be censored, but there is no truncation. Let and be consecutive death ages. A 95% linear confidence interval for using the Klein estimator is , while a similar interval for is . Determine .
14.20 (*) A mortality study is conducted on 50 lives, all observed from age 0. At age 15 there were two deaths; at age 17 there were three censored observations; at age 25 there were four deaths; at age 30 there were c censored observations; at age 32 there were eight deaths; and at age 40 there were two deaths. Let S be the product-limit estimate of and let V be the Greenwood approximation of this estimator's variance. You are given . Determine the value of c.
14.21 (*) Fifteen cancer patients were observed from the time of diagnosis until the earlier of death or 36 months from diagnosis. Deaths occurred as follows: at 15 months there were two deaths; at 20 months there were three deaths; at 24 months there were two deaths; at 30 months there were d deaths; at 34 months there were two deaths; and at 36 months there was one death. The Nelson–Åalen estimate of is 1.5641. Determine Klein's estimate of the variance of this estimator.
14.22 (*) Ten payments were recorded as follows: 4, 4, 5, 5, 5, 8, 10, 10, 12, and 15, with the italicized values representing payments at a policy limit. There were no deductibles. Determine the product-limit estimate of and Greenwood's approximation of its variance.
14.23 (*) All observations begin on day 0. Eight observations were 4, 8, 8, 12, 12, 12, 22, and 36, with the italicized values representing right censored observations. Determine the Nelson–Åalen estimate of and then determine a 90% linear confidence interval for the true value using Klein's variance estimate.
14.24 You are given the data in Table 14.14, based on 40 observations. Dashes indicate missing observations that must be deduced.
1. Compute the Kaplan–Meier estimate for .
2. Compute the Nelson–Åalen estimate for .
3. Compute using the method of Brown, Hollander, and Korwar.
4. Compute Greenwood's approximation, .
5. Compute a 95% linear confidence interval for using the Kaplan–Meier estimate.
6. Compute a 95% log-transformed confidence interval for using the Kaplan–Meier estimate.
Table 14.14 The data for Exercise 14.24.

i

1 4 3 — 40

2 6 — 3 31

3 9 6 4 23

4 13 4 — —

5 15 2 4 6
14.25 You are given the data in Table 14.15, based on 50 observations.
1. Compute the Kaplan–Meier estimate for .
2. Compute the Nelson–Åalen estimate for .
3. Compute using Efron's tail correction, and also using the exponential tail correction of Brown, Hollander, and Korwar.
4. Compute Klein's survival function estimate of the variance of .
5. Compute a 95% log-transformed confidence interval for based on the Nelson–Åalen estimate.
6. Using the exponential tail correction method of Brown, Hollander, and Korwar, estimate the variance of .
Table 14.15 The data for Exercise 14.25.

i

1 3 3 6 50

2 5 7 4 41

3 7 5 2 30

4 11 5 3 23

5 16 6 4 15

6 20 2 3 5
14.26 Consider the estimator

where is differentiable.
1. Show that becomes the Kaplan–Meier estimator when , and becomes the Nelson–Åalen estimator when .
2. Derive the variance estimate
3. Consider
  
  Prove that if , and if , and thus that in particular.
  
  Hint: Prove by induction on m the identity for and .

Time	Number of deaths	Number at risk
5	2	15
7	1	12
10	1	10
12	2	6

Age	Number of deaths
2	1
3	1
5	1
7	2
10	1
12	2
13	1
14	1

i
1	4	3	—	40
2	6	—	3	31
3	9	6	4	23
4	13	4	—	—
5	15	2	4	6

i
1	3	3	6	50
2	5	7	4	41
3	7	5	2	30
4	11	5	3	23
5	16	6	4	15
6	20	2	3	5

14.4 Empirical Estimation of Moments

In the previous section, we focused on estimation of the survival function or, equivalently, the cumulative distribution function , of a random variable Y. In many actuarial applications, other quantities such as raw moments are of interest. Of central importance in this context is the mean, particularly for premium calculation in a loss modelling context.

For estimation of the mean with complete data , an obvious (unbiased) estimation is , but for incomplete data such as that of the previous section involving right censoring, other methods are needed. We continue to assume that we have the setting described in the previous section, and we will capitalize on the results obtained for there. To do so, we recall that, for random variables that take on only nonnegative values, the mean satisfies

and empirical estimation of may be done by replacing with an estimator such as the Kaplan–Meier estimator or the Nelson–Åalen estimator . To unify the approach, we will assume that is estimated for by the estimator given in Exercise 14.26 of Section 14.3, namely

where for the Kaplan–Meier estimator and for the Nelson–Åalen estimator. The mean is obtained by replacing with in the integrand. This yields the estimator

It is convenient to write

where

and

Anticipating what follows, we wish to evaluate for . For , we have that for . Thus

To evaluate for , recall that for and for , . Thus,

For evaluation of , note that

and also that for ,

a recursive formula, beginning with .

For the estimates themselves, , and the above formulas continue to hold when is replaced by , by , and by .

The estimate of the mean clearly depends on , which in turn depends on the tail correction, that is, on for . If for (as, for example, under Efron's tail correction), then . Under Klein and Moeschberger's method, with for , and for where ,

For the exponential tail correction of Brown, Hollander, and Korwar, for with . Thus

The following example illustrates the calculation of , where all empirical quantities are obtained by substitution of estimates.

To estimate the variance of , we note that is a function of , for which we have an estimate of the variance matrix from the previous section. In particular, is a diagonal matrix (i.e. all off-diagonal elements are 0). Thus, by the multivariate delta method, with the matrix A with jth entry , the estimated variance of is

and it remains to identify for .

To begin, first note that depends on , which in turn depends on but also on the tail correction employed. As such, we will express the formulas in terms of for for the moment. We first consider for . Then,

In the above expression, does not appear in the first terms of the summation, that is, for . Thus,

and in terms of , this may be expressed as

It is also useful to note that does not involve and thus . The general variance formula thus may be written as

But

and thus,

in turn implying that

The variance is estimated by replacing parameters with their estimates in the above formula. This yields

where we understand to mean with replaced by and by .

If , then

a formula that further simplifies, under the Kaplan–Meier assumption (recalling that ), to

We note that if no tail correction is necessary, because (in which case as well and the upper limit of the summation is ), or under Efron's approximation.

For Klein and Moeschberger's method,

implying that

resulting in the same variance formula as under Efron's method [but is increased by for this latter approximation].

Turning now to the exponential tail correction with , recall that and . Thus

Therefore, under the exponential tail correction, the general variance estimate becomes

In the Nelson–Åalen case with , the term may obviously be omitted.

For higher moments, a similar approach may be used. We have, for the th moment,

which may be estimated (using without loss of generality) by

Again, the final integral on the right-hand side depends on the tail correction, and is 0 if or under Efron's tail correction. It is useful to note that under the exponential tail correction, for with , and if ,

using the tail function representation of the gamma distribution. That is, under the exponential tail correction,

In particular, for the second moment ,

Variance estimation for may be done in a similar manner as for the mean, if desired.

14.4.1 Exercises

14.27 For the data of Exercise 14.24 and using the Kaplan–Meier estimate:
1. Compute the mean survival time estimate assuming Efron's tail correction.
2. Compute the mean survival time estimate using the exponential tail correction of Brown, Hollander, and Korwar.
3. Estimate the variance of the estimate in (a).
14.28 For the data of Exercise 14.25, using the Nelson–Åalen estimate and the exponential tail correction of Brown, Hollander, and Korwar:
1. Estimate the mean .
2. Estimate the variance of in (b).
14.29 For the data in Example 14.5 and subsequent examples, using the Nelson–Åalen estimate with the exponential tail correction of Brown, Hollander, and Korwar, estimate the variance of Y.

14.5 Empirical Estimation with Left Truncated Data

The results of Section 14.3 apply in situations in which the data are (right) censored. In this section, we discuss the situation in which the data may also be (left) truncated. We have the following definitions.

In insurance survival data and claim data, the most common occurrences are left truncation and right censoring. Left truncation occurs when an ordinary deductible of d is applied. When a policyholder has a loss below d, he or she realizes no benefits will be paid and so does not inform the insurer. When the loss is above d, the amount of the loss is assumed to be reported.⁵ A policy limit leads to an example of right censoring. When the amount of the loss equals or exceeds u, benefits beyond that value are not paid, and so the exact value is not recorded. However, it is known that a loss of at least u has occurred.

For decrement studies, such as of human mortality, it is impractical to follow people from birth to death. It is more common to follow a group of people of varying ages for a few years during the study period. When a person joins a study, he or she is alive at that time. This person's age at death must be at least as great as the age at entry to the study and thus has been left truncated. If the person is alive when the study ends, right censoring has occurred. The person's age at death is not known, but it is known that it is at least as large as the age when the study ended. Right censoring also affects those who exit the study prior to its end due to surrender. Note that this discussion could have been about other decrements, such as disability, policy surrender, or retirement.

Because left truncation and right censoring are the most common occurrences in actuarial work, they are the only cases that are covered in this section. To save words, truncated always means truncated from below and censored always means censored from above.

When trying to construct an empirical distribution from truncated or censored data, the first task is to create notation to represent the data. For individual (as opposed to grouped) data, the following facts are needed. The first is the truncation point for that observation. Let that value be for the jth observation. If there was no truncation, .⁶ Next, record the observation itself. The notation used depends on whether or not that observation was censored. If it was not censored, let its value be . If it was censored, let its value be . When this subject is presented more formally, a distinction is made between the case where the censoring point is known in advance and where it is not. For example, a liability insurance policy with a policy limit usually has a censoring point that is known prior to the receipt of any claims. By comparison, in a mortality study of insured lives, those that surrender their policy do so at an age that was not known when the policy was sold. In this chapter, no distinction is made between the two cases.

To construct the estimate, the raw data must be summarized in a useful manner. The most interesting values are the uncensored observations. As in Section 14.3, let be the k unique values of the that appear in the sample, where k must be less than or equal to the number of uncensored observations. We also continue to let be the number of times the uncensored observation appears in the sample. Again, an important quantity is , the number “at risk” at . In a decrement study, represents the number under observation and subject to the decrement at that time. To be under observation at , an individual must (1) either be censored or have an observation that is on or after and (2) not have a truncation value that is on or after . That is,

Alternatively, because the total number of is equal to the total number of and , we also have

(14.4)

This latter version is a bit easier to conceptualize because it includes all who have entered the study prior to the given age less those who have already left. The key point is that the number at risk is the number of people observed alive at age . If the data are loss amounts, the risk set is the number of policies with observed loss amounts (either the actual amount or the maximum amount due to a policy limit) greater than or equal to less those with deductibles greater than or equal to . These relationships lead to a recursive version of the formula,

(14.5)

where between is interpreted to mean greater than or equal to and less than , and is set equal to zero.

A consequence of the above definitions is that if a censoring or truncation time equals that of a death, the death is assumed to have happened first. That is, the censored observation is considered to be at risk while the truncated observation is not.

The definition of presented here is consistent with that in Section 14.3. That is, if for all observations, the formulas presented here reduce match those presented earlier. The following example illustrates calculating the number at risk when there is truncation.

Example 14.14

Using Data Set D, introduced in Chapter 10 and reproduced here as Table 14.16, calculate the values using both (14.4) and (14.5). To provide some context and explain the entries in the table, think of this as a study of mortality by duration for a five-year term insurance policy. The study period is a fixed time period. Thus, policies 31–40 were already insured when first observed. There are two ways in which censoring occurs. For some (1, 2, 3, 5, etc.), the study either ended while they were still alive or they terminated their policies prior to the end of the five-year term. For others (19–32, 36, etc.), the five-year term ended with them still alive. Note that the eight observed deaths are at six distinct times and that . Because the highest censoring time is 5.0, .

The calculations appear in Table 14.17.

Table 14.16 The values for Example 14.14.

i				i
1	0	—	0.1	16	0	4.8	—
2	0	—	0.5	17	0	—	4.8
3	0	—	0.8	18	0	—	4.8
4	0	0.8	—	19–30	0	—	5.0
5	0	—	1.8	31	0.3	—	5.0
6	0	—	1.8	32	0.7	—	5.0
7	0	—	2.1	33	1.0	4.1	—
8	0	—	2.5	34	1.8	3.1	—
9	0	—	2.8	35	2.1	—	3.9
10	0	2.9	—	36	2.9	—	5.0
11	0	2.9	—	37	2.9	—	4.8
12	0	—	3.9	38	3.2	4.0	—
13	0	4.0	—	39	3.4	—	5.0
14	0	—	4.0	40	3.9	—	5.0
15	0	—	4.1

Table 14.17 The risk set calculations for Example 14.14.

j
1	0.8	1	or
2	2.9	2	or
3	3.1	1	or
4	4.0	2	or
5	4.1	1	or
6	4.8	1	or

The approach to developing an empirical estimator of the survival function is to use the formulas developed in Section 14.3, but with this more general definition of . A theoretical treatment that incorporates left truncation is considerably more complex (for details, see Lawless [77]).

The formula for the Kaplan–Meier estimate is the same as presented earlier, namely

The same tail corrections developed in Section 14.3 can be used for in cases where .

In this example, a tail correction is not needed because an estimate of survival beyond the five-year term is of no value when analyzing these policyholders.

The same analogy holds for the Nelson–Åalen estimator, where the formula for the cumulative hazard function remains

As before, for and for the same tail corrections can be used.

In this section, the results were not formally developed, as was done for the case with only right censored data. However, all the results, including formulas for moment estimates and estimates of the variance of the estimators, hold when left truncation is added. However, it is important to note that when the data are truncated, the resulting distribution function is the distribution function of observations given that they are above the smallest truncation point (i.e. the smallest d value). Empirically, there is no information about observations below that value, and thus there can be no information for that range. Finally, if it turns out that there was no censoring or truncation, use of the formulas in this section will lead to the same results as when using the empirical formulas in Section 14.1.

14.5.1 Exercises

14.30 Repeat Example 14.14, treating “surrender” as “death.” The easiest way to do this is to reverse the x and u labels. In this case, death produces censoring because those who die are lost to observation and thus their surrender time is never observed. Treat those who lasted the entire five years as surrenders at that time.
14.31 Determine the Kaplan–Meier estimate for the time to surrender for Data Set D. Treat those who lasted the entire five years as surrenders at that time.
14.32 Determine the Nelson–Åalen estimate of and for Data Set D, where the variable is time to surrender.
14.33 Determine the Kaplan–Meier and Nelson–Åalen estimates of the distribution function of the amount of a workers compensation loss. First use the raw data from Data Set B. Then repeat the exercise, modifying the data by left truncation at 100 and right censoring at 1,000.
14.34 (*) Three hundred mice were observed at birth. An additional 20 mice were first observed at age 2 (days) and 30 more were first observed at age 4. There were 6 deaths at age 1, 10 at age 3, 10 at age 4, a at age 5, b at age 9, and 6 at age 12. In addition, 45 mice were lost to observation at age 7, 35 at age 10, and 15 at age 13. The following product-limit estimates were obtained: and . Determine the values of a and b.
14.35 Construct 95% confidence intervals for by both the linear and log-transformed formulas using all 40 observations in Data Set D, with surrender being the variable of interest.
14.36 (*) For the interval from zero to one year, the exposure (r) is 15 and the number of deaths (s) is 3. For the interval from one to two years, the exposure is 80 and the number of deaths is 24. For two to three years, the values are 25 and 5; for three to four years, they are 60 and 6; and for four to five years, they are 10 and 3. Determine Greenwood's approximation to the variance of .
14.37 (*) You are given the values in Table 14.18. Determine the standard deviation of the Nelson–Åalen estimator of the cumulative hazard function at time 20.

Table 14.18 The data for Exercise 14.37.

1 100 15

8 65 20

17 40 13

25 31 31


1	100	15
8	65	20
17	40	13
25	31	31

14.6 Kernel Density Models

One problem with empirical distributions is that they are always discrete. If it is known that the true distribution is continuous, the empirical distribution may be viewed as a poor approximation. In this section, a method of obtaining a smooth, empirical-like distribution, called a kernel density distribution, is introduced. We have the following definition.

Note that the empirical distribution is a special type of kernel smoothed distribution in which the random variable assigns probability 1 to the data point. With regard to kernel smoothing, there are several distributions that could be used, three of which are introduced here.

While not necessary, it is customary that the continuous variable have a mean equal to the value of the point it replaces, ensuring that, overall, the kernel estimate has the same mean as the empirical estimate. One way to think about such a model is that it produces the final observed value in two steps. The first step is to draw a value at random from the empirical distribution. The second step is to draw a value at random from a continuous distribution whose mean is equal to the value drawn at the first step. The selected continuous distribution is called the kernel.

For notation, let be the probability assigned to the value by the empirical distribution. Let be a distribution function for a continuous distribution such that its mean is y. Let be the corresponding density function.

The function is called the kernel. Three kernels are now introduced: uniform, triangular, and gamma.

In each case, there is a parameter that relates to the spread of the kernel. In the first two cases, it is the value of , which is called the bandwidth. In the gamma case, the value of controls the spread, with a larger value indicating a smaller spread. There are other kernels that cover the range from zero to infinity.

Example 14.17

Determine the kernel density estimate for Example 14.2 using each of the three kernels.

The empirical distribution places probability at 1.0, at 1.3, at 1.5, at 2.1, and at 2.8. For a uniform kernel with a bandwidth of 0.1 we do not get much separation. The data point at 1.0 is replaced by a horizontal density function running from 0.9 to 1.1 with a height of . In comparison, with a bandwidth of 1.0, that same data point is replaced by a horizontal density function running from 0.0 to 2.0 with a height of . Figures 14.3 and 14.4 provide plots of the density functions.

Figure 14.3 The uniform kernel density with bandwidth 0.1.

Figure 14.4 The uniform kernel density with bandwidth 1.0.

It should be clear that the larger bandwidth provides more smoothing. In the limit, as the bandwidth approaches zero, the kernel density estimate matches the empirical estimate. Note that, if the bandwidth is too large, probability will be assigned to negative values, which may be an undesirable result. Methods exist for dealing with that issue, but they are not presented here.

For the triangular kernel, each point is replaced by a triangle. Graphs for the same two bandwidths as used previously appear in Figures 14.5 and 14.6.

Figure 14.5 The triangular kernel density with bandwidth 0.1.

Figure 14.6 The triangular kernel density with bandwidth 1.0.

Once again, the larger bandwidth provides more smoothing. The gamma kernel simply provides a mixture of gamma distributions, where each data point provides the mean and the empirical probabilities provide the weights. The density function is

and is graphed in Figures 14.7 and 14.8 for two values.⁷ For this kernel, decreasing the value of increases the amount of smoothing. Further discussion of the gamma kernel can be found in Carriere [23], where the author recommends .

Figure 14.7 The gamma kernel density with .

Figure 14.8 The gamma kernel density with .

14.6.1 Exercises

14.38 Provide the formula for the Pareto kernel.
14.39 Construct a kernel density estimate for the time to surrender for Data Set D. Be aware of the fact that this is a mixed distribution (probability is continuous from 0 to 5 but is discrete at 5).
14.40 (*) You are given the data in Table 14.19 on time to death. Using the uniform kernel with a bandwidth of 60, determine .

Table 14.19 The data for Exercise 14.40.

10 1 20

34 1 19

47 1 18

75 1 17

156 1 16

171 1 15
14.41 (*) You are given the following ages at time of death for 10 individuals: 25, 30, 35, 35, 37, 39, 45, 47, 49, and 55. Using a uniform kernel with a bandwidth of , determine the kernel density estimate of the probability of survival to age 40.
14.42 (*) Given the five observations 82, 126, 161, 294, and 384, determine each of the following estimates of :
1. The empirical estimate.
2. The kernel density estimate based on a uniform kernel with bandwidth .
3. The kernel density estimate based on a triangular kernel with bandwidth .


10	1	20
34	1	19
47	1	18
75	1	17
156	1	16
171	1	15

14.7 Approximations for Large Data Sets

14.7.1 Introduction

The discussion in this section is motivated by the circumstances that accompany the determination of a model for the time to death (or other decrement) for use in pricing, reserving, or funding insurance programs. The particular circumstances are as follows:

Values of the survival function are required only at discrete values, normally integral ages measured in years.
A large volume of data has been collected over a fixed time period, with most observations truncated, censored, or both.
No parametric distribution is available that provides an adequate model given the volume of available data.

These circumstances typically apply when an insurance company (or a group of insurance companies) conducts a mortality study based on the historical experience of a very large portfolio of life insurance policies. (For the remainder of this section, we shall refer only to mortality. The results apply equally to the study of other decrements such as disablement or surrender.)

The typical mortality table is essentially a distribution function or a survival function with values presented only at integral ages. While there are parametric models that do well over parts of the age range (such as the Makeham model at ages over about 30), there are too many changes in the pattern from age 0 to ages well over 100 to allow for a simple functional description of the survival function.

The typical mortality study is conducted over a short period of time, such as three to five years. For example, all persons who are covered by an insurance company's policies at some time from January 1, 2014 through December 31, 2016 might be included. Some of these persons may have purchased their policies prior to 2014 and were still covered when the study period started. During the study period some persons will die, some will cancel (surrender) their policy, some will have their policy expire due to policy provisions (such as with term insurance policies that expire during the study period), and some will still be insured when the study ends. It is assumed that if a policy is cancelled or expires, the eventual age at death will not be known to the insurance company. Some persons will purchase their life insurance policy during the study period and be covered for some of the remaining part of the study period. These policies will be subject to the same decrements (death, surrender, expiration) as other policies. With regard to the age at death, almost every policy in the study will be left truncated.⁸ If the policy was issued prior to 2014, the truncation point will be the age on January 1, 2014. For those who buy insurance during the study period, the truncation point is the age at which the contract begins. For any person who exits the study due to a cause other than death, their observation is right censored at the age of exit, because all that is known about them is that death will be at some unknown later age.

When no simple parametric distribution is appropriate and when large amounts of data are available, it is reasonable to use a nonparametric model because the large amount of data will ensure that key features of the survival function will be captured. Because there are both left truncation (due to the age at entry into the study) and right censoring (due to termination of the study at a fixed time), when there are large amounts of data, constructing the Kaplan–Meier estimate may require a very large amount of sorting and counting. Over the years, a variety of methods have been introduced and entire texts have been written about the problem of constructing mortality tables from this kind of data (e.g. [12, 81]). While the context for the examples presented here is the construction of mortality tables, the methods can apply any time the circumstances described previously apply.

We begin by examining the two ways in which data are usually collected. Estimators will be presented for both situations. The formulas will be presented in this section and their derivation and properties will be provided in Section 14.8. In all cases, a set of values (ages), has been established in advance and the goal is to estimate the survival function at these values and no others (with some sort of interpolation to be used to provide intermediate values as needed). All of the methods are designed to estimate the conditional one-period probability of death, , where j may refer to the interval and not to a particular age. From those values, can be evaluated as follows:

14.7.2 Using Individual Data Points

In this setting, data are recorded for each person observed. This approach is sometimes referred to as a seriatim method, because the data points are analyzed as a series of individual observations. The estimator takes the form , where is the number of observed deaths in the interval and is a measure of exposure, representing the number of individuals who had a chance to be an observed death in that interval. Should a death occur at one of the boundary values between successive intervals, the death is counted in the preceding interval. When there are no entrants after age into the interval and no exitants except for death during the interval (referred to as complete data), represents the number of persons alive at age and the number of deaths has a binomial distribution. With incomplete data, it is necessary to determine a suitable convenient approximation, preferably one that requires only a single pass through the data set. To illustrate this challenge, consider the following example.

Example 14.18

A mortality study is based on observations during the period January 1, 2014 through December 31, 2016. Five policies were observed, with the information in the following list recorded. For simplicity, a date of 3-2000 is interpreted as March 1, 2000 and all events are treated as occurring on the first day of the month of occurrence. Furthermore, all months are treated as being one-twelfth of a year in length. Summarize the information in a manner that is sufficient for estimating mortality probabilities.

Born 4-1981, purchased insurance policy on 8-2013, was an active policyholder on 1-2017.
Born 6-1981, purchased insurance policy on 7-2013, died 9-2015.
Born 8-1981, purchased insurance policy on 2-2015, surrendered policy on 2-2016.
Born 5-1981, purchased insurance policy on 6-2014, died 3-2015.
Born 7-1981, purchased insurance policy on 3-2014, surrendered policy on 5-2016.

The key information is the age of the individual when first observed, the age when last observed, and the reason why observation ended. For the five policies, using “x” for death and “s” for any other reason, the values are (where an age of 33-6 means 33 years and 6 months) (32-9,35-9,s), (32-7,34-3,x), (33-6,34-6,s), (33-1,33-10,x), and (32-8,34-10,s). Note that policies 1 and 2 were purchased prior to the start of the study, so they are first observed when the study begins. No distinction needs to be made between those whose observation ends by surrender as opposed to the ending of the study.

The next step is to tally information for each age interval, building up totals for and . Counting deaths is straightforward. For exposures, there are two approaches that are commonly used.

Exact exposure method

Following this method, we set the exposure equal to the exact total time under observation within the age interval. When a death occurs, that person's exposure ends at the exact age of death. It will be shown in Section 14.8 that is the maximum likelihood estimator of the hazard rate, under the assumption that the hazard rate is constant over the interval . Further properties of this estimator will also be discussed in that section. The estimated hazard rate can then be converted into a conditional probability of death using the formula .

Actuarial exposure method

Under this method, the exposure period for deaths extends to the end of the age interval, rather than the exact age at death. This has the advantage of reproducing the empirical estimator for complete data, but has been shown to be an inconsistent estimator in other cases. In this case, the estimate of the conditional probability of death is obtained as .

When the conditional probability of death is small, with a large number of observations, the choice of method is unlikely to materially affect the results.

Example 14.19

Estimate all possible mortality rates at integral ages for Example 14.18 using both methods.

First observe that data are available for ages 32, 33, 34, and 35. The deaths are in the intervals 33–34 and 34–35. With a seriatim approach, each policy is analyzed and its contribution to the exposure for each interval added to the running total. For each age interval, the contributions for each interval are totaled, using the exact exposure method of recording time under observation. The numbers represent months.

32–33: .
33–34: .
34–35: .
35–36: .

As a check, the total contributions for the five policies are 36, 20, 12, 9, and 26, respectively, which matches the times on observation. The only two nonzero mortality probabilities are estimated as and .

Under the actuarial exposure method, the exposure for the interval 33–34 for the fourth person increases from 9 to 11 months for a total of 53 (note that it is not a full year of exposure as observation did not begin until age 33-1). For the interval 34–35, the exposure for the second person increases from 3 to 12 months for a total of 40. The mortality probability estimates are and .

14.7.2.1 Insuring Ages

While the examples have been in a life insurance context, the methodology applies to any situation with left truncation and right censoring. However, there is a situation that is specific to life insurance studies. Consider a one-year term insurance policy. Suppose that an applicant was born on February 15, 1981 and applies for this insurance on October 15, 2016. Premiums are charged by whole-number ages. Some companies will use the age at the last birthday (35 in this case) and some will use the age at the nearest birthday (36 in this case). One company will base the premium on and one on when both should be using , the applicant's true age. Suppose that a company uses age last birthday. When estimating , it is not interested in the probability that a person exactly age 35 dies in the next year (the usual interpretation) but, rather, the probability that a random person who is assigned age 35 at issue (who can be anywhere between 35 and 36 years old) dies in the next year. One solution is to obtain a table based on exact ages, assume that the average applicant is 35.5, and use an interpolated value when determining premiums. A second solution is to perform the mortality study using the ages assigned by the company rather than the policyholder's true age. In the example, the applicant is considered to be exactly age 35 on October 15, 2016 and is thus assigned a new birthday of October 15, 1981. When this is done, the study is said to use insuring ages and the resulting values can be used directly for insurance calculations.

Example 14.21

Suppose that the company in Example 14.18 assigned insuring ages by age last birthday. Use the actuarial exposure method to estimate all possible mortality values.

Born 4-1981, purchased insurance policy on 8-2013, was an active policyholder on 1-2017. New birthday is 8-1981, enters at 32-5, exits at 35-5.
Born 6-1981, purchased insurance policy on 7-2013, died 9-2015. New birthday is 7-1981, enters at 32-6, dies at 34-2.
Born 8-1981, purchased insurance policy on 2-2015, surrendered policy on 2-2016. New birthday is 2-1982, enters at 33-0, exits at 34-0.
Born 5-1981, purchased insurance policy on 6-2014, died 3-2015. New birthday is 6-1981, enters at 33-0, dies at 33-9.
Born 7-1981, purchased insurance policy on 3-2014, surrendered policy on 5-2016. New birthday is 3-1982, enters at 32-0, exits at 34-2.

The exposures are now:

32–33: .
33–34: .
34–35: .
35–36: .

As expected, the exposures are assigned to younger age intervals, reflecting the fact that each applicant is assigned an age that is less than their true age. The estimates are and .

Note that with insuring ages, those who enter observation after the study begins are first observed on their newly assigned birthday. Thus there are no approximation issues with regard to those numbers.

14.7.2.2 Anniversary-Based Mortality Studies

The mortality studies described so far in this section are often called calendar-based or date-to-date studies because the period of study runs from one calendar date to another calendar date. It is also common for mortality studies of insured persons to use a different setup.

Instead of having the observations run from one calendar date to another calendar date, observation for a particular policyholder begins on the first policy anniversary following the fixed start date of the study and ends on the last anniversary prior to the study's fixed end date. Such studies are often called anniversary-to-anniversary studies. We can illustrate this through a previous example.

Consider Example 14.18, with the study now running from anniversaries in 2014 to anniversaries in 2016. The first policy comes under observation on 8-2014 at insuring age 33-0 and exits the study on 8-2016 at insuring age 35-0. Policyholder 2 begins observation on 7-2014 at insuring age 33-0. Policyholder 5 surrendered after the 2016 anniversary, so observation ends on 3-2016 at age 34-0. All other ages remain the same. In this setting, all subjects begin observations at an integral age and all who are active policyholders at the end of the study do so at an integral age. Only the ages of death and surrender may be other than integers (and note that with the actuarial exposure method, in calculating the exposure, deaths are placed at the next integral age). There is a price to be paid for this convenience. In a three-year study such as the one in the example, no single policyholder can be observed for more than two years. In the date-to-date version, some policies will contribute three years of exposure.

All of the examples have used one-year time periods. If the length of an interval is not equal to 1, an adjustment is necessary. Exposures should be the fraction of the period under observation and not the length of time.

14.7.3 Interval-Based Methods

Instead of recording the exact age at which an event happens, all that is recorded is the age interval in which it took place and the nature of the event. As with the individual method, for a portfolio of insurance policies, only running totals need to be recorded, and the end result is just four to six⁹ numbers for each age interval:

The number of persons at the beginning of the interval carried over from the previous interval.
The number of additional persons entering at the beginning of the interval.
The number of persons entering during the interval.
The number of persons exiting by death during the interval.
The number of persons exiting during the interval for reasons other than death.
The number of persons exiting at the end of the interval by other than death.

Example 14.22

Consider two versions of Example 14.18. In the first one, use exact ages and a date-to-date study. Then use age last birthday insuring ages and an anniversary-to-anniversary study. For each, construct a table of the required values.

For the exact age study, the entry age, exit age, and cause of exit values for the five lives were (32-9,35-9,s), (32-7,34-3,x), (33-6,34 -6,s), (33-1,33-10,x), and (32-8,34-10,s). There are no lives at age 32. Between ages 32 and 33, three lives enter and none exit. Between ages 33 and 34, two lives enter and one exits by death. Between ages 34 and 35, no lives enter, one dies, and two exit by other causes. Between ages 35 and 36, no lives enter and one life exits by a cause other than death. No lives enter or exit at age boundaries. The full set of values is in Table 14.20.

Table 14.20 The exact age values for Example 14.22.

Age	Number at age	Number entering	Number dying	Number exiting	Number at next age
32	0	3	0	0	3
33	3	2	1	0	4
34	4	0	1	2	1
35	1	0	0	1	0

For the insuring age study, the values are (33-0,35-0,s), (33-0,34-2,x), (33-0,34-0,s), (33-0,33-9,x), and (32-0,34-0,s). In this case, there are several entries and exits at integral ages. The values are shown in Table 14.21. It is important to note that in the first table those who enter and exit do so during the age interval, while in the second table those who enter do so at the beginning of the interval and those who exit by reason other than death do so at the end of the age interval.

Table 14.21 The insuring age values for Example 14.22.

Age	Number at age	Number entering	Number dying	Number exiting	Number at next age
32	0	1	0	0	1
33	1	4	1	2	2
34	2	0	1	1	0

The analysis of this situation is relatively simple. For the interval from age to age , let be the number of lives under observation at age . This number includes those carried over from the prior interval as well as those entering at age . Let , , and be the number entering, dying, and exiting during the interval. Note that, in general, , as the right-hand side must be adjusted by those who exit or enter at exact age . Estimating the mortality probability depends on the method selected and an assumption about when the events that occur during the age interval take place.

One approach is to assume a uniform distribution of the events during the interval. For the exact exposure method, the who start the interval have the potential to contribute a full unit of exposure and the entrants during the year add another half-year each (on average). Similarly, those who die or exit subtract one-half year on average. Thus the net exposure is . For the actuarial exposure method, those who die do not reduce the exposure, and it becomes .

Another approach is to adapt the Kaplan–Meier estimator to this situation. Suppose that the deaths all occur at midyear and all other decrements occur uniformly through the year. Then the risk set at midyear is and the estimator is the same as the actuarial estimator.

The goal of all the estimation procedures in this book is to deduce the probability distribution for the random variable in the absence of truncation and censoring. For loss data, that would be the probabilities if there were no deductible or limit, that is, ground-up losses. For lifetime data, it would be the probability distribution of the age at death if we could follow the person from birth to death. These are often referred to as single-decrement probabilities and are typically denoted in life insurance mathematics. In the life insurance context, the censoring rates are often as important as the mortality rates. For example, in the context of Data Set D, both time to death and time to withdrawal may be of interest. In the former case, withdrawals cause observations to be censored. In the latter case, censoring is caused by death. A superscript identifies the decrement of interest. For example, suppose that the decrements were death (d) and withdrawal (w). Then is the actuarial notation for the probability that a person alive and insured at age withdraws prior to age in an environment where withdrawal is the only decrement, that is, that death is not possible. When the causes of censoring are other important decrements, an often-used assumption is that all the decrements are stochastically independent. That is, that they do not influence each other. For example, a person who withdraws at a particular age has the same probability of dying in the following month as a person who does not.

Example 14.24

Estimate single-decrement probabilities using Data Set D and the actuarial method. Make reasonable assumptions.

First consider the decrement death. In the notation of this section, the relevant quantities are shown in Table 14.22. The notation refers to entrants who do so at the beginning of the interval, while refers to those entering during the interval. These categories are based not on the time of the event but, rather, on its nature. The 30 policies that were issued during the study are at time zero by definition. A policy that was at duration 1.0 when the study began could have entered at any duration. Such policies are included in for the preceding interval. For those who exit other than by death, the notation is for those exiting during the interval and for those doing so at the end. Again, those who do so by chance are assigned to the previous interval. Deaths are also assigned to the previous interval.

Table 14.22 The single-decrement mortality probabilities for Example 14.24.

j
0	0	30	3	1	3	0
1	29	0	1	0	2	0
2	28	0	3	2	3	0
3	26	0	3	3	3	0
4	23	0	0	2	4	17

For withdrawals, the values of are given in Table 14.23. To keep the notation consistent, now refers to withdrawals and refers to other decrements.

Table 14.23 The single-decrement withdrawal probabilities for Example 14.24.

j
0	0	30	3	3	1	0
1	29	0	1	2	0	0
2	28	0	3	3	2	0
3	26	0	3	3	3	0
4	23	0	0	4	2	17

Example 14.25

Loss data for policies with deductibles of 0, 250, and 500 and policy limits of 5,000, 7,500, and 10,000 were collected. The data appear in Table 14.24. Use the actuarial method to estimate the distribution function for losses.

Table 14.24 The data for Example 14.25.

	Deductible
Range	0	250	500	Total
0–100	15			15
100–250	16			16
250–500	34	96		130
500–1,000	73	175	251	499
1,000–2,500	131	339	478	948
2,500–5,000	83	213	311	607
5,000–7,500	12	48	88	148
7,500–10,000	1	4	11	16
At 5,000	7	17	18	42
At 7,500	5	10	15	30
At 10,000	2	1	4	7
Total	379	903	1,176	2,458

The calculations are in Table 14.25. Because the deductibles and limits are at the endpoints of intervals, the only reasonable assumption is the first one presented.

Table 14.25 The calculations for Example 14.25.


0	0	379	15	0	0.0000
100	364	0	16	0	0.0396
250	348	903	130	0	0.0818
500	1,121	1,176	499	0	0.1772
1,000	1,798	0	948	0	0.3560
2,500	850	0	607	42	0.6955
5,000	201	0	148	30	0.9130
7,500	23	0	16	7	0.9770
10,000	0				0.9930

14.7.4 Exercises

14.43 Verify the calculations in Table 14.23.
14.44 For an anniversary-to-anniversary study, the values in Table 14.26 were obtained. Estimate and using the exact Kaplan–Meier estimate, exact exposure, and actuarial exposure.

Table 14.26 The data for Exercise 14.44.

d u x d u x

45 46.0 45 45.8

45 46.0 46 47.0

45 45.3 46 47.0

45 46.7 46 46.3

45 45.4 46 46.2

45 47.0 46 46.4

45 45.4 46 46.9

d	u	x	d	u	x
45	46.0		45	45.8
45	46.0		46	47.0
45		45.3	46	47.0
45		46.7	46	46.3
45		45.4	46		46.2
45	47.0		46		46.4
45	45.4		46	46.9

14.45 Twenty-two insurance payments are recorded in Table 14.27. Use the fewest reasonable number of intervals and an interval-based method with actuarial exposure to estimate the probability that a policy with a deductible of 500 will have a payment in excess of 5,000.

Table 14.27 The data for Exercise 14.45.

Deductible	Payment^a	Deductible	Payment
250	2,221	500	3,660
250	2,500	500	215
250	207	500	1,302
250	3,735	500	10,000
250	5,000	1,000	1,643
250	517	1,000	3,395
250	5,743	1,000	3,981
500	2,500	1,000	3,836
500	525	1,000	5,000
500	4,393	1,000	1,850
500	5,000	1,000	6,722
aNumbers in italics indicate that the amount paid was at the policy limit.

14.46 (*) Nineteen losses were observed. Six had a deductible of 250, six had a deductible of 500, and seven had a deductible of 1,000. Three losses were paid at a policy limit, those values being 1,000, 2,750, and 5,500. For the 16 losses not paid at the limit, one was in the interval (250,500), two in (500,1,000), four in (1,000,2,750), seven in (2 ,750,5,500), one in (5,500,6,000), and one in (6,000,10,000). Estimate the probability that a policy with a deductible of 500 will have a claim payment in excess of 5,500.

14.8 Maximum Likelihood Estimation of Decrement Probabilities

In Section 14.7, methods were introduced for estimating mortality probabilities with large data sets. One of the methods was a seriatim method using exact exposure. In this section, that estimator will be shown to be maximum likelihood under a particular assumption. To do this, we need to develop some notation. Suppose that we are interested in estimating the probability that an individual alive at age a dies prior to age b, where . This is denoted . Let X be the random variable with survival function , the probability of surviving from birth to age x. Now let Y be the random variable X conditioned on . Its survival function is .

We now introduce a critical assumption about the shape of the survival function within the interval under consideration. Assume that for . This means that the survival function decreases exponentially within the interval. Equivalently, the hazard rate (called the force of mortality in life insurance mathematics) is assumed to be constant within the interval. Beyond b, a different hazard rate can be used. Our objective is to estimate the conditional probability q. Thus we can perform the estimation using only data from and a functional form for this interval. Values of the survival function beyond b will not be needed.

Now consider data collected on n individuals, all of whom were observed during the age interval . For individual j, let be the age at which the person was first observed within the interval and let be the age at which the person was last observed within the interval (thus ). Let if the individual was alive when last observed and if the individual was last observed due to death. For this analysis, we assume that each individual's censoring age (everyone who does not die in the interval will be censored, either by reaching age b or through some event that removes them from observation) is known in advance. Thus the only random quantities are , and for individuals with , the age at death. The likelihood function is

where is the number of observed deaths and is the total time the individuals were observed in the interval (which was called exact exposure in Section 14.7). Taking logarithms, differentiating, and solving produces

Finally, the maximum likelihood estimate of the probability of death is .

Studies often involve random censoring, where individuals may exit for reasons other than death at times that were not known in advance. If all decrements (e.g. death, disability, and retirement) are stochastically independent (that is, the timing of one event does not influence any of the others), then the maximum likelihood estimator turns out to be identical to the one derived in this section. Although we do not derive the result, note that it follows from the fact that the likelihood function can be decomposed into separate factors for each decrement.

The variance of this estimator can be approximated using the observed information approach. The second derivative of the loglikelihood function is

Substitution of the estimator produces

and so . Using the delta method,

Recall from Section 14.7 that there is an alternative called actuarial exposure, with with e calculated in a different manner. When analyzing results from this approach, it is common to assume that d is the result of a binomial experiment with sample size . Then,

If the terms are dropped (and they are often close to 1), the two variance formulas are identical (noting that the values of e will be slightly different).

Example 14.26

A pension plan assigns every employee an integral age on their date of hire (thus retirement age is always reached on an anniversary of being hired and not on a birthday). Because of the nature of the employee contracts, employees can only quit their job on annual anniversaries of being hired or six months after an anniversary. They can die at any age. Using assigned ages, a mortality study observed employees from age 35 to age 36. There were 10,000 who began at age 35. Of them, 100 died between ages 35 and 35.5 at an average age of 35.27, 400 quit at age 35.5, 110 died between ages 35.5 and 36 at an average age of 35.78, and the remaining 9,390 were alive and employed at age 36. Using both exact and actuarial exposure, estimate the single-decrement value of and the standard deviation of the estimator.

The exact exposure is . Then, . The estimated standard deviation is . Recall that actuarial exposure assumes deaths occur at the end of the interval. The exposure is now . Then, . The estimated standard deviation is .

14.8.1 Exercise

14.47 In Exercise 14.44, mortality estimates for q₄₅ and q₄₆ were obtained by Kaplan–Meier, exact exposure, and actuarial exposure. Approximate the variances of these estimates (using Greenwood's formula in the Kaplan–Meier case).

14.9 Estimation of Transition Intensities

The discussion to this point has concerned estimating the probability of a decrement in the absence of other decrements. An unstated assumption was that the environment in which the observations are made is one where once any decrement occurs, the individual is no longer observed.

A common, and more complex, situation is one in which after a decrement occurs, the individual remains under observation, with the possibility of further decrements. A simple example is a disability income policy. A healthy individual can die, become disabled, or surrender their policy. Those who become disabled continue to be observed, with possible decrements being recovery or death. Scenarios such as this are referred to as multistate models. Such models are discussed in detail in Dickson et al. [28]. In this section, we cover estimation of the transition intensities associated with such models. The results presented are based on Waters [129].

For notation, let the possible states be and let be the force of transition to state j for an individual who is currently between ages x and and is in state i. This notation is based on an assumption that the force of transition is constant over an integral age. This is similar to the earlier assumption that the force of decrement is constant over a given age.

While not shown here, maximum likelihood estimates turn out to be based on exact exposure for the time spent in each state. For those between ages x and (which can be generalized for periods of other than one year), let be the total time policyholders are observed in state i and be the number of observed transitions from state i to state j. Then, . Similarly, .

The construction of interval-based methods is more difficult because it is unclear when to place the transitions. Those who make one transition in the year may be reasonably placed at mid-age. However, those who make two transitions would more reasonably be placed at the one-third and two-thirds points. This would require careful data-keeping and the counting of many different cases.

Notes

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 14: Construction of Empirical Models

Create new playlist

Sign In

Sign Up

14.1 The Empirical Distribution

14.2 Empirical Distributions for Grouped Data

14.2.1 Exercises

14.3 Empirical Estimation with Right Censored Data

14.3.1 Exercises

14.4 Empirical Estimation of Moments

14.4.1 Exercises

14.5 Empirical Estimation with Left Truncated Data

14.5.1 Exercises

14.6 Kernel Density Models

14.6.1 Exercises

14.7 Approximations for Large Data Sets

14.7.1 Introduction

14.7.2 Using Individual Data Points

14.7.2.1 Insuring Ages

14.7.2.2 Anniversary-Based Mortality Studies

14.7.3 Interval-Based Methods

14.7.4 Exercises

14.8 Maximum Likelihood Estimation of Decrement Probabilities

14.8.1 Exercise

14.9 Estimation of Transition Intensities

Notes

Table of Contents for
Chapter 14: Construction of Empirical Models