15
Model Selection

15.1 Introduction

When using data to build a model, the process must end with the announcement of a “winner.” While qualifications, limitations, caveats, and other attempts to escape full responsibility are appropriate, and often necessary, a commitment to a solution is often required. In this chapter, we look at a variety of ways to evaluate a model and compare competing models. But we must also remember that whatever model we select, it is only an approximation of reality. This observation is reflected in the following modeler's motto:

All models are wrong, but some models are useful.¹

Thus, our goal is to determine a model that is good enough to use to answer the question. The challenge here is that the definition of “good enough” will depend on the particular application. Another important modeling point is that a solid understanding of the question will guide you to the answer. The following quote from John Tukey [122 pp. 13–14] sums up this point:

Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.

In this chapter, a specific modeling strategy is considered. Our preference is to have a single approach that can be used for any probabilistic modeling situation. A consequence is that for any particular modeling situation, there may be a better (more reliable or more accurate) approach. For example, while maximum likelihood is a good estimation method for most settings, it may not be the best² for certain distributions. A literature search will turn up methods that have been optimized for specific distributions, but they are not mentioned here. Similarly, many of the hypothesis tests used here give approximate results. For specific cases, better approximations, or maybe even exact results, are available. They are also bypassed. The goal here is to outline a method that will give reasonable answers most of the time and be adaptable to a variety of situations.

This chapter assumes that you have a basic understanding of mathematical statistics as reviewed in Chapters 10 and 11. The remaining sections cover a variety of evaluation and selection tools. Each tool has its own strengths and weaknesses, and it is possible for different tools to lead to different models, making modeling as much art as science. At times, in real-world applications, the model's purpose may lead the analyst to favor one tool over another.

15.2 Representations of the Data and Model

All the approaches to be presented compare the proposed model to the data or to another model. The proposed model is represented by either its density or distribution function, or perhaps some functional of these quantities such as the limited expected value function or the mean excess loss function. The data can be represented by the empirical distribution function or a histogram. The graphs are easy to construct when there is individual, complete data. When there is grouping or observations have been truncated or censored, difficulties arise. Here, the only cases covered are those where the data are all truncated at the same value (which could be zero) and are all censored at the same value (which could be infinity). Extensions to the case of multiple truncation or censoring points are detailed in Klugman and Rioux [75].³ It should be noted that the need for such representations applies only to continuous models. For discrete data, issues of censoring, truncation, and grouping rarely apply. The data can easily be represented by the relative or cumulative frequencies at each possible observation.

With regard to representing the data, the empirical distribution function is used for individual data and the histogram will be used for grouped data.

To compare the model to truncated data, we begin by noting that the empirical distribution begins at the truncation point and represents conditional values (i.e. they are the distribution and density function given that the observation exceeds the truncation point). To make a comparison to the empirical values, the model must also be truncated. Let the truncation point in the data set be t. The modified functions are

$$F^*(x) = \begin{cases} 0, & x < t, \\ \dfrac{F(x) - F(t)}{1 - F(t)}, & x \ge t, \end{cases}$$

and

$$f^*(x) = \begin{cases} 0, & x < t, \\ \dfrac{f(x)}{1 - F(t)}, & x \ge t. \end{cases}$$

In this chapter, when a distribution function or density function is indicated, a subscript equal to the sample size indicates that it is the empirical model (from Kaplan–Meier, Nelson–Aalen, the ogive, etc.), while no adornment or the use of an asterisk (*) indicates the estimated parametric model. There is no notation for the true, underlying distribution because it is unknown and unknowable.
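The truncation adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the helper names `truncate_cdf` and `truncate_pdf` are hypothetical, and the exponential model with mean 1,000 is an assumed example chosen only to exercise the functions.

```python
import math
from typing import Callable


def truncate_cdf(cdf: Callable[[float], float], t: float) -> Callable[[float], float]:
    """Return F*(x): 0 below t, [F(x) - F(t)] / [1 - F(t)] at or above t."""
    f_t = cdf(t)

    def cdf_star(x: float) -> float:
        return 0.0 if x < t else (cdf(x) - f_t) / (1.0 - f_t)

    return cdf_star


def truncate_pdf(pdf: Callable[[float], float],
                 cdf: Callable[[float], float],
                 t: float) -> Callable[[float], float]:
    """Return f*(x): 0 below t, f(x) / [1 - F(t)] at or above t."""
    f_t = cdf(t)

    def pdf_star(x: float) -> float:
        return 0.0 if x < t else pdf(x) / (1.0 - f_t)

    return pdf_star


# Assumed example model: exponential with mean theta = 1,000 (hypothetical value).
theta = 1000.0
exp_cdf = lambda x: 1.0 - math.exp(-x / theta)
exp_pdf = lambda x: math.exp(-x / theta) / theta

cdf_star = truncate_cdf(exp_cdf, 50.0)
pdf_star = truncate_pdf(exp_pdf, exp_cdf, 50.0)
```

For the exponential model the truncated distribution function reduces to 1 - exp(-(x - t)/theta), which gives a quick sanity check on the helpers.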

15.3 Graphical Comparison of the Density and Distribution Functions

The most direct way to see how well the model and data match is to plot the respective density and distribution functions.

Example 15.1

Consider Data Sets B and C as given in Tables 15.1 and 15.2. For this example and all that follow, in Data Set B replace the value at 15,743 with 3,476 (to allow the graphs to fit comfortably on a page). Truncate Data Set B at 50 and Data Set C at 7,500. Estimate the parameter of an exponential model for each data set. Plot the appropriate functions and comment on the quality of the fit of the model. Repeat this for Data Set B censored at 1,000 (without any truncation).

Table 15.1 Data Set B with the highest value changed.

27	82	115	126	155	161	243	294	340	384
457	680	855	877	974	1,193	1,340	1,884	2,558	3,476

Table 15.2 Data Set C.

Payment range	Number of payments
0–7,500	99
7,500–17,500	42
17,500–32,500	29
32,500–67,500	28
67,500–125,000	17
125,000–300,000	9
Over 300,000	3

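For Data Set B, there are 19 observations (the first observation is removed due to truncation). A typical contribution to the likelihood function is $f(82)/[1 - F(50)] = \theta^{-1}e^{-82/\theta}/e^{-50/\theta}$. The maximum likelihood estimate of the exponential parameter is $\hat{\theta} = 802.32$. The empirical distribution function starts at 50 and jumps 1/19 at each data point. The distribution function, using a truncation point of 50, is

$$F^*(x) = \frac{1 - e^{-x/802.32} - (1 - e^{-50/802.32})}{1 - (1 - e^{-50/802.32})} = 1 - e^{-(x-50)/802.32}, \quad x \ge 50.$$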

Figure 15.1 presents a plot of these two functions.

Figure 15.1 The model versus data cdf plot for Data Set B truncated at 50.

The fit is not as good as we might like, because the model understates the distribution function at smaller values of x and overstates the distribution function at larger values of x. This result is not good because it means that tail probabilities are understated.
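The Data Set B fit can be reproduced with a short sketch. It relies on a standard result that is assumed here rather than derived in the text: for an exponential model with every observation left truncated at t, the likelihood is maximized at the average of the excesses x - t. The function names are hypothetical.

```python
import math

# Data Set B from Table 15.1 (largest value already changed to 3,476).
DATA_SET_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]


def fit_truncated_exponential(data, t):
    """Exponential MLE when all data are left truncated at t.

    Observations at or below t are never seen, so they are dropped;
    the estimate is the mean excess over t of the remaining values.
    """
    kept = [x for x in data if x > t]
    theta_hat = sum(x - t for x in kept) / len(kept)
    return theta_hat, kept


def model_cdf(x, theta, t):
    """Truncated exponential distribution function F*(x)."""
    return 0.0 if x < t else 1.0 - math.exp(-(x - t) / theta)


def empirical_cdf(x, sample):
    """Usual empirical distribution function: share of the sample <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)


theta_hat, kept = fit_truncated_exponential(DATA_SET_B, 50)
print(f"theta_hat = {theta_hat:.2f} from {len(kept)} observations")
print(f"F*(82) = {model_cdf(82, theta_hat, 50):.4f}, F_n(82) = {empirical_cdf(82, kept):.4f}")
```

Evaluating `model_cdf` and `empirical_cdf` on a grid of x values gives the two curves compared in the cdf plot.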

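For Data Set C, the likelihood function uses the truncated values. For example, the contribution to the likelihood function for the first interval is

$$\left[\frac{F(17{,}500) - F(7{,}500)}{1 - F(7{,}500)}\right]^{42}.$$

The maximum likelihood estimate is $\hat{\theta} = 44{,}253$. The height of the first histogram bar is

$$\frac{42}{128(17{,}500 - 7{,}500)} = 0.0000328,$$

and the last bar is for the interval from 125,000 to 300,000 (a bar cannot be constructed for the interval from 300,000 to infinity). The density function must be truncated at 7,500 and becomes

$$f^*(x) = \frac{f(x)}{1 - F(7{,}500)} = \frac{44{,}253^{-1}e^{-x/44{,}253}}{e^{-7{,}500/44{,}253}} = \frac{e^{-(x-7{,}500)/44{,}253}}{44{,}253}, \quad x > 7{,}500.$$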

The plot of the density function versus the histogram is given in Figure 15.2.

Figure 15.2 The model versus data density plot for Data Set C truncated at 7,500.

The exponential model understates the early probabilities. It is hard to tell from the plot how they compare above 125,000.
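For grouped data there is no closed-form estimate, so the likelihood has to be maximized numerically. The sketch below is one possible approach and is not the method used by the authors: it applies a golden-section search (a swapped-in, generic one-dimensional optimizer) to the grouped, truncated exponential loglikelihood for Data Set C. The search interval of 1,000 to 1,000,000 is an assumption. The same block computes the histogram bar heights.

```python
import math

T = 7_500  # common truncation point for Data Set C
BOUNDS = [7_500, 17_500, 32_500, 67_500, 125_000, 300_000, math.inf]
COUNTS = [42, 29, 28, 17, 9, 3]  # payments per interval above 7,500 (Table 15.2)


def interval_prob(lo, hi, theta):
    """Pr(lo < X <= hi | X > T) for an exponential model with mean theta."""
    return math.exp(-(lo - T) / theta) - math.exp(-(hi - T) / theta)


def loglik(theta):
    """Grouped-data loglikelihood: sum of count * ln(interval probability)."""
    return sum(n * math.log(interval_prob(lo, hi, theta))
               for lo, hi, n in zip(BOUNDS[:-1], BOUNDS[1:], COUNTS))


def golden_section_max(f, lo, hi, tol=0.01):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


theta_hat = golden_section_max(loglik, 1_000.0, 1_000_000.0)

# Histogram heights: count / (sample size * interval width), finite intervals only.
n_total = sum(COUNTS)
heights = [n / (n_total * (hi - lo))
           for lo, hi, n in zip(BOUNDS[:-1], BOUNDS[1:], COUNTS)
           if not math.isinf(hi)]

print(f"theta_hat = {theta_hat:,.0f}")
print(f"first bar height = {heights[0]:.7f}")
```

Golden-section search is adequate here because this loglikelihood is concave in 1/theta and therefore has a single peak in theta; a general-purpose optimizer would serve equally well.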

For Data Set B modified with a limit, the maximum likelihood estimate is $\hat{\theta} = 718.00$. When constructing the plot, the empirical distribution function must stop at 1,000. The plot appears in Figure 15.3.

Figure 15.3 The model versus data cdf plot for Data Set B censored at 1,000.

Once again, the exponential model does not fit well.
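The censored fit also has a closed form for the exponential model. The sketch below uses the standard result (assumed here, not derived in the text) that the estimate is the total of all recorded amounts, with censored values counted at the limit, divided by the number of uncensored observations. Treating values strictly below the limit as uncensored is an assumption; no observation in Data Set B equals 1,000, so the choice does not matter here.

```python
# Data Set B from Table 15.1 (largest value already changed to 3,476).
DATA_SET_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]


def fit_censored_exponential(data, u):
    """Exponential MLE when every value above u is recorded as u."""
    recorded_total = sum(min(x, u) for x in data)     # censored values count as u
    n_uncensored = sum(1 for x in data if x < u)      # values observed exactly
    return recorded_total / n_uncensored


theta_hat = fit_censored_exponential(DATA_SET_B, 1000)
print(f"theta_hat = {theta_hat:.2f}")
```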

When the model's distribution function is close to the empirical distribution function, it is difficult to make small distinctions. Among the many ways to amplify those distinctions, two are presented here. The first is to simply plot the difference of the two functions. That is, if $F_n(x)$ is the empirical distribution function and $F^*(x)$ is the model distribution function, plot $D(x) = F_n(x) - F^*(x)$. There is no corresponding plot for grouped data.

Figure 15.5 The model versus data $D(x)$ plot for Data Set B censored at 1,000.

Another way to highlight any differences is the p–p plot, which is also called a probability plot. The plot is created by ordering the observations as $x_1 \le \cdots \le x_n$. A point is then plotted corresponding to each value. The coordinates to plot are $(F_n(x_j), F^*(x_j))$. If the model fits well, the plotted points will be near the 45° line running from (0,0) to (1,1). However, for this to be the case, a different definition of the empirical distribution function is needed. It can be shown that the expected value of $F(X_j)$ is $j/(n+1)$ and, therefore, the empirical distribution should be that value and not the usual $j/n$. If two observations have the same value, either plot both points (they would have the same “y” value but different “x” values) or plot a single value by averaging the two “x” values.

Example 15.3

Create a p–p plot for the continuing example.

For Data Set B truncated at 50, $n = 19$ and one of the observed values is $x = 82$. The empirical value is $F_n(82) = 1/20 = 0.05$. The other coordinate is

$$F^*(82) = 1 - e^{-(82-50)/802.32} = 0.0391.$$

One of the plotted points will be (0.05,0.0391). The complete graph appears in Figure 15.6.

Figure 15.6 The p–p plot for Data Set B truncated at 50.

From the lower left part of the plot, it is clear that the exponential model places less probability on small values than the data call for. A similar plot can be constructed for Data Set B censored at 1,000 and it appears in Figure 15.7.

Figure 15.7 The p–p plot for Data Set B censored at 1,000.

This plot ends at about 0.75 because that is the highest probability observed prior to the censoring point at 1,000. There are no empirical values at higher probabilities. Again, the exponential model tends to underestimate the empirical values.
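The p–p coordinates for Data Set B truncated at 50 can be generated as follows. This is a sketch under two assumptions: the observations are distinct (so the tie handling described earlier is not needed), and the exponential parameter is the mean excess over the truncation point. The function name `pp_points` is hypothetical.

```python
import math

# Data Set B from Table 15.1 (largest value already changed to 3,476).
DATA_SET_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]


def pp_points(sample, model_cdf):
    """Return (j/(n+1), F*(x_j)) for the ordered sample x_1 <= ... <= x_n."""
    xs = sorted(sample)
    n = len(xs)
    return [(j / (n + 1), model_cdf(x)) for j, x in enumerate(xs, start=1)]


t = 50
kept = [x for x in DATA_SET_B if x > t]
theta_hat = sum(x - t for x in kept) / len(kept)   # truncated exponential MLE

points = pp_points(kept, lambda x: 1.0 - math.exp(-(x - t) / theta_hat))
for empirical, model in points[:3]:
    print(f"({empirical:.4f}, {model:.4f})")
```

Plotting `points` against the line from (0,0) to (1,1) gives the p–p plot; points below the line mark regions where the model assigns less probability than the data.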

15.3.1 Exercises

15.1 Repeat Example 15.1 using a Weibull model in place of the exponential model.
15.2 Repeat Example 15.2 for a Weibull model.
15.3 Repeat Example 15.3 for a Weibull model.

15.4 Hypothesis Tests

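A picture may be worth many words, but sometimes it is best to replace the impressions conveyed by pictures with mathematical demonstrations.⁴ One such demonstration is a test of the following hypotheses:

$H_0$: The data came from a population with the stated model.
$H_1$: The data did not come from such a population.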

The test statistic is usually a measure of how close the model distribution function is to the empirical distribution function. When the null hypothesis completely specifies the model (e.g. an exponential distribution with mean 100), critical values are well known. However, it is more often the case that the null hypothesis states the name of the model but not its parameters. When the parameters are estimated from the data, the test statistic tends to be smaller than it would have been had the parameter values been prespecified. This relationship occurs because the estimation method itself tries to choose parameters that produce a distribution that is close to the data. When parameters are estimated from data, the tests become approximate. Because rejection of the null hypothesis occurs for large values of the test statistic, the approximation tends to increase the probability of a Type II error (declaring that the model is acceptable when it is not) while lowering the probability of a Type I error (rejecting an acceptable model).⁵ For actuarial modeling, this tendency is likely to be an acceptable trade-off.

One method of avoiding the approximation is to randomly divide the sample into two sets. One is termed the training set. This set is used to estimate the parameters. The other set is called the test or validation set. This set is used to evaluate the quality of the model fit. This is more realistic because the model is being validated against new data. This approach is easier to do when there is a lot of data so that both sets are large enough to give useful results. These methods will not be discussed further in this text: for more details, see James et al. [61].

15.4.1 The Kolmogorov–Smirnov Test

Let t be the left truncation point ($t = 0$ if there is no truncation) and let u be the right censoring point ($u = \infty$ if there is no censoring). Then, the test statistic is

$$D = \max_{t \le x \le u} \left| F_n(x) - F^*(x) \right|.$$

This test as presented here should only be used on individual data to ensure that the step function $F_n(x)$ is well defined.⁶ Also, the model distribution function $F^*(x)$ is assumed to be continuous over the relevant range.

Example 15.4

Calculate D for Example 15.1.

Table 15.3 provides the needed values. Because the empirical distribution function jumps at each data point, the model distribution function must be compared both before and after the jump. The values just before the jump are denoted $F_n(x-)$ in the table. The maximum is $D = 0.1340$.

Table 15.3 The calculation of D for Example 15.4.

x	$F^*(x)$	$F_n(x-)$	$F_n(x)$	Maximum difference
82	0.0391	0.0000	0.0526	0.0391
115	0.0778	0.0526	0.1053	0.0275
126	0.0904	0.1053	0.1579	0.0675
155	0.1227	0.1579	0.2105	0.0878
161	0.1292	0.2105	0.2632	0.1340
243	0.2138	0.2632	0.3158	0.1020
294	0.2622	0.3158	0.3684	0.1062
340	0.3033	0.3684	0.4211	0.1178
384	0.3405	0.4211	0.4737	0.1332
457	0.3979	0.4737	0.5263	0.1284
680	0.5440	0.5263	0.5789	0.0349
855	0.6333	0.5789	0.6316	0.0544
877	0.6433	0.6316	0.6842	0.0409
974	0.6839	0.6842	0.7368	0.0529
1,193	0.7594	0.7368	0.7895	0.0301
1,340	0.7997	0.7895	0.8421	0.0424
1,884	0.8983	0.8421	0.8947	0.0562
2,558	0.9561	0.8947	0.9474	0.0614
3,476	0.9860	0.9474	1.0000	0.0386

For Data Set B censored at 1,000, 15 of the 20 observations are uncensored. Table 15.4 illustrates the needed calculations. The maximum is $D = 0.0991$.

Table 15.4 The calculation of D with censoring for Example 15.4.

x	$F^*(x)$	$F_n(x-)$	$F_n(x)$	Maximum difference
27	0.0369	0.00	0.05	0.0369
82	0.1079	0.05	0.10	0.0579
115	0.1480	0.10	0.15	0.0480
126	0.1610	0.15	0.20	0.0390
155	0.1942	0.20	0.25	0.0558
161	0.2009	0.25	0.30	0.0991
243	0.2871	0.30	0.35	0.0629
294	0.3360	0.35	0.40	0.0640
340	0.3772	0.40	0.45	0.0728
384	0.4142	0.45	0.50	0.0858
457	0.4709	0.50	0.55	0.0791
680	0.6121	0.55	0.60	0.0621
855	0.6960	0.60	0.65	0.0960
877	0.7052	0.65	0.70	0.0552
974	0.7425	0.70	0.75	0.0425
1,000	0.7516	0.75	0.75	0.0016
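The calculations in Tables 15.3 and 15.4 can be reproduced with a short routine. This sketch assumes distinct data values (true for Data Set B) and uses the closed-form exponential estimates for the truncated and censored cases. The function name and argument layout are hypothetical.

```python
import math

# Data Set B from Table 15.1 (largest value already changed to 3,476).
DATA_SET_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]


def ks_statistic(uncensored, n, model_cdf, u=math.inf):
    """D = max |F_n(x) - F*(x)| for individual data with distinct values.

    uncensored: observed values below the censoring point u
    n:          number of observations, including censored ones
    """
    xs = sorted(uncensored)
    d = 0.0
    for j, x in enumerate(xs, start=1):
        f_model = model_cdf(x)
        # Compare just before the jump, (j-1)/n, and just after it, j/n.
        d = max(d, abs(f_model - (j - 1) / n), abs(f_model - j / n))
    if not math.isinf(u):
        # At the censoring point the empirical cdf has not moved past len(xs)/n.
        d = max(d, abs(model_cdf(u) - len(xs) / n))
    return d


# Truncated at 50: 19 observations, theta is the mean excess over 50.
kept = [x for x in DATA_SET_B if x > 50]
theta_trunc = sum(x - 50 for x in kept) / len(kept)
d_trunc = ks_statistic(kept, len(kept),
                       lambda x: 1.0 - math.exp(-(x - 50) / theta_trunc))

# Censored at 1,000: 15 uncensored values out of 20.
uncens = [x for x in DATA_SET_B if x < 1000]
theta_cens = sum(min(x, 1000) for x in DATA_SET_B) / len(uncens)
d_cens = ks_statistic(uncens, len(DATA_SET_B),
                      lambda x: 1.0 - math.exp(-x / theta_cens), u=1000)

print(f"D (truncated at 50)   = {d_trunc:.4f}")
print(f"D (censored at 1,000) = {d_cens:.4f}")
```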

All that remains is to determine the critical value. Commonly used critical values for this test are $1.22/\sqrt{n}$ for $\alpha = 0.10$, $1.36/\sqrt{n}$ for $\alpha = 0.05$, and $1.63/\sqrt{n}$ for $\alpha = 0.01$. When $u < \infty$, the critical value should be smaller because there is less opportunity for the difference to become large. Modifications for this phenomenon exist in the literature (see, e.g., Stephens [116], which also includes tables of critical values for specific null distribution models), and one such modification is given in Klugman and Rioux [75] but is not introduced here.

For both this test and the Anderson–Darling test that follows, the critical values are correct only when the null hypothesis completely specifies the model. When the data set is used to estimate parameters for the null hypothesized distribution (as in the example), the correct critical value is smaller. For both tests, the change depends on the particular distribution that is hypothesized and maybe even on the particular true values of the parameters. An indication of how simulation can be used for this situation is presented in Section 19.4.5.

15.4.2 The Anderson–Darling Test

This test is similar to the Kolmogorov–Smirnov test but uses a different measure of the difference between the two distribution functions. The test statistic is

$$A^2 = n\int_t^u \frac{[F_n(x) - F^*(x)]^2}{F^*(x)[1 - F^*(x)]}\, f^*(x)\,dx.$$

That is, it is a weighted average of the squared differences between the empirical and model distribution functions. Note that when x is close to t or to u, the weights might be very large due to the small value of one of the factors in the denominator. This test statistic tends to place more emphasis on good fit in the tails than in the middle of the distribution. Calculating with this formula appears to be challenging. However, for individual data (so this is another test that does not work for grouped data), the integral simplifies to

$$A^2 = -nF^*(u) + n\sum_{j=0}^{k}[1 - F_n(y_j)]^2\{\ln[1 - F^*(y_j)] - \ln[1 - F^*(y_{j+1})]\} + n\sum_{j=1}^{k}F_n(y_j)^2[\ln F^*(y_{j+1}) - \ln F^*(y_j)],$$

where the unique noncensored data points are $t = y_0 < y_1 < \cdots < y_k < y_{k+1} = u$. Note that when $u = \infty$, the last term of the first sum is zero (evaluating the formula as written will ask for ln(0)). The critical values are 1.933, 2.492, and 3.857 for 10%, 5%, and 1% significance levels, respectively. As with the Kolmogorov–Smirnov test, the critical value should be smaller when $u < \infty$.

Example 15.6

Perform the Anderson–Darling test for the continuing example.

For Data Set B truncated at 50, there are 19 data points. The calculation is shown in Table 15.5, where “summand” refers to the sum of the corresponding terms from the two sums. The total is 1.0226 and the test statistic is $A^2 = -19(1) + 19(1.0226) \approx 0.43$. Because the test statistic is less than the critical value of 2.492, the exponential model is viewed as plausible.

Table 15.5 The Anderson–Darling test for Example 15.6.

j	$y_j$	$F^*(y_j)$	$F_n(y_j)$	Summand
0	50	0.0000	0.0000	0.0399
1	82	0.0391	0.0526	0.0388
2	115	0.0778	0.1053	0.0126
3	126	0.0904	0.1579	0.0332
4	155	0.1227	0.2105	0.0070
5	161	0.1292	0.2632	0.0904
6	243	0.2138	0.3158	0.0501
7	294	0.2622	0.3684	0.0426
8	340	0.3033	0.4211	0.0389
9	384	0.3405	0.4737	0.0601
10	457	0.3979	0.5263	0.1490
11	680	0.5440	0.5789	0.0897
12	855	0.6333	0.6316	0.0099
13	877	0.6433	0.6842	0.0407
14	974	0.6839	0.7368	0.0758
15	1,193	0.7594	0.7895	0.0403
16	1,340	0.7997	0.8421	0.0994
17	1,884	0.8983	0.8947	0.0592
18	2,558	0.9561	0.9474	0.0308
19	3,476	0.9860	1.0000	0.0141
20	∞	1.0000	1.0000

For Data Set B censored at 1,000, the results are shown in Table 15.6. The total is 0.7602 and the test statistic is $A^2 = -20(0.7516) + 20(0.7602) \approx 0.17$. Because the test statistic does not exceed the critical value of 2.492, the exponential model is viewed as plausible.

Table 15.6 The Anderson–Darling calculation for Example 15.6 with censored data.

j	$y_j$	$F^*(y_j)$	$F_n(y_j)$	Summand
0	0	0.0000	0.00	0.0376
1	27	0.0369	0.05	0.0718
2	82	0.1079	0.10	0.0404
3	115	0.1480	0.15	0.0130
4	126	0.1610	0.20	0.0334
5	155	0.1942	0.25	0.0068
6	161	0.2009	0.30	0.0881
7	243	0.2871	0.35	0.0493
8	294	0.3360	0.40	0.0416
9	340	0.3772	0.45	0.0375
10	384	0.4142	0.50	0.0575
11	457	0.4709	0.55	0.1423
12	680	0.6121	0.60	0.0852
13	855	0.6960	0.65	0.0093
14	877	0.7052	0.70	0.0374
15	974	0.7425	0.75	0.0092
16	1,000	0.7516	0.75
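The summands in Tables 15.5 and 15.6 follow directly from the two-sum form of the Anderson–Darling statistic: a sum over j = 0, ..., k weighted by the squared empirical survival probability, and a sum over j = 1, ..., k weighted by the squared empirical distribution function. The sketch below implements that form under the assumption that the uncensored values are distinct, so that the empirical distribution function at the j-th ordered point is j/n. The function name is hypothetical.

```python
import math

# Data Set B from Table 15.1 (largest value already changed to 3,476).
DATA_SET_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]


def anderson_darling(uncensored, n, model_cdf, t=0.0, u=math.inf):
    """Anderson-Darling statistic for individual data with distinct values."""
    ys = [t] + sorted(uncensored) + [u]        # y_0 = t, ..., y_{k+1} = u
    k = len(ys) - 2

    def F(y):
        # Model cdf, with F*(infinity) taken as exactly 1.
        return 1.0 if math.isinf(y) else model_cdf(y)

    total = 0.0
    for j in range(k + 1):
        fn = j / n                             # F_n(y_j)
        weight = (1.0 - fn) ** 2
        if weight > 0.0:                       # skip the zero term that would need ln(0)
            total += weight * (math.log(1.0 - F(ys[j])) - math.log(1.0 - F(ys[j + 1])))
        if j >= 1:
            total += fn ** 2 * (math.log(F(ys[j + 1])) - math.log(F(ys[j])))
    return -n * F(u) + n * total


# Truncated at 50.
kept = [x for x in DATA_SET_B if x > 50]
theta_trunc = sum(x - 50 for x in kept) / len(kept)
a2_trunc = anderson_darling(kept, len(kept),
                            lambda x: 1.0 - math.exp(-(x - 50) / theta_trunc), t=50)

# Censored at 1,000.
uncens = [x for x in DATA_SET_B if x < 1000]
theta_cens = sum(min(x, 1000) for x in DATA_SET_B) / len(uncens)
a2_cens = anderson_darling(uncens, len(DATA_SET_B),
                           lambda x: 1.0 - math.exp(-x / theta_cens), u=1000)

print(f"A^2 (truncated at 50)   = {a2_trunc:.3f}")
print(f"A^2 (censored at 1,000) = {a2_cens:.3f}")
```

Both values fall well below the 5% critical value of 2.492 quoted in the text.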

15.4.3 The Chi-Square Goodness-of-Fit Test

Unlike the Kolmogorov–Smirnov and Anderson–Darling tests, this test allows for some discretion. It begins with the selection of $k - 1$ arbitrary values, $t = c_0 < c_1 < \cdots < c_k = \infty$. Let $\hat{p}_j = F^*(c_j) - F^*(c_{j-1})$ be the probability a truncated observation falls in the interval from $c_{j-1}$ to $c_j$. Similarly, let $p_{nj} = F_n(c_j) - F_n(c_{j-1})$ be the same probability according to the empirical distribution. The test statistic is then

$$\chi^2 = \sum_{j=1}^{k} \frac{n(\hat{p}_j - p_{nj})^2}{\hat{p}_j},$$

where n is the sample size. Another way to write the formula is to let $E_j = n\hat{p}_j$ be the number of expected observations in the interval (assuming that the hypothesized model is true) and let $O_j = np_{nj}$ be the number of observations in the interval. Then,

$$\chi^2 = \sum_{j=1}^{k} \frac{(E_j - O_j)^2}{E_j}.$$

The critical value for this test comes from the chi-square distribution with degrees of freedom equal to the number of terms in the sum (k) minus 1 minus the number of estimated parameters. There are a variety of rules that have been proposed for deciding when the test is reasonably accurate. They center around the values of $E_j = n\hat{p}_j$. The most conservative states that each must be at least 5. Some authors claim that values as low as 1 are acceptable. All agree that the test works best when the values are about equal from term to term. If the data are grouped, there is little choice but to use the groups as given, although adjacent groups could be combined to increase $E_j$. For individual data, the data can be grouped for the purpose of performing this test.⁷

Example 15.7

Perform the chi-square goodness-of-fit test for the exponential distribution for the continuing example.

All three data sets can be evaluated with this test. For Data Set B truncated at 50, establish boundaries at 50, 150, 250, 500, 1,000, 2,000, and infinity. The calculations appear in Table 15.7. The total is $\chi^2 = 1.4034$. With four degrees of freedom (six rows minus one minus one estimated parameter), the critical value for a test at a 5% significance level is 9.4877 (this value can be obtained with the Excel® function CHISQ.INV(0.95,4)) and the p-value is 0.8436 (from 1-CHISQ.DIST(1.4034,4,TRUE)). The exponential model is a good fit.

Table 15.7 Data Set B truncated at 50 for Example 15.7.

Range	$\hat{p}$	Expected	Observed	$\chi^2$
50–150	0.1172	2.227	3	0.2687
150–250	0.1035	1.966	3	0.5444
250–500	0.2087	3.964	4	0.0003
500–1,000	0.2647	5.029	4	0.2105
1,000–2,000	0.2180	4.143	3	0.3152
2,000–	0.0880	1.672	2	0.0644
Total	1	19	19	1.4034

For Data Set B censored at 1,000, the first interval is 0–150 and the last interval is 1,000–∞. Unlike the previous two tests, the censored observations can be used. The calculations are shown in Table 15.8. The total is $\chi^2 = 0.5951$. With three degrees of freedom (five rows minus one minus one estimated parameter), the critical value for a test at a 5% significance level is 7.8147 and the p-value is 0.8976. The exponential model is a good fit.

Table 15.8 Data Set B censored at 1,000 for Example 15.7.

Range	$\hat{p}$	Expected	Observed	$\chi^2$
0–150	0.1885	3.771	4	0.0139
150–250	0.1055	2.110	3	0.3754
250–500	0.2076	4.152	4	0.0055
500–1,000	0.2500	5.000	4	0.2000
1,000–	0.2484	4.968	5	0.0002
Total	1	20	20	0.5951

For Data Set C, the groups are already in place. The results are given in Table 15.9. The test statistic is $\chi^2 = 61.913$. There are four degrees of freedom, for a critical value of 9.488. The p-value is about $10^{-12}$. There is clear evidence that the exponential model is not appropriate. A more accurate test would combine the last two groups (because the expected count in the last group is less than 1). The group from 125,000 to infinity has an expected count of 8.997 and an observed count of 12 for a contribution of 1.002. The test statistic is now 16.552, and with three degrees of freedom the p-value is 0.00087. The test continues to reject the exponential model.

Table 15.9 Data Set C for Example 15.7.

Range	$\hat{p}$	Expected	Observed	$\chi^2$
7,500–17,500	0.2023	25.889	42	10.026
17,500–32,500	0.2293	29.356	29	0.004
32,500–67,500	0.3107	39.765	28	3.481
67,500–125,000	0.1874	23.993	17	2.038
125,000–300,000	0.0689	8.824	9	0.003
300,000–	0.0013	0.172	3	46.360
Total	1	128	128	61.913
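The Table 15.7 calculation for Data Set B truncated at 50 can be checked with the sketch below. It uses only the Python standard library: the chi-square tail probability is built from `math.erfc` and `math.gamma` through the standard recurrence that adds two degrees of freedom at a time, which stands in for the Excel functions mentioned in the example. The helper names are hypothetical.

```python
import math

# Data Set B from Table 15.1 (largest value already changed to 3,476).
DATA_SET_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]


def chi2_sf(x, df):
    """Pr(chi-square with integer df > x), adding two degrees of freedom at a time."""
    if df % 2 == 1:
        nu, q = 1, math.erfc(math.sqrt(x / 2.0))
    else:
        nu, q = 2, math.exp(-x / 2.0)
    while nu < df:
        q += (x / 2.0) ** (nu / 2.0) * math.exp(-x / 2.0) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


def chi_square_gof(observed, probs, n_estimated):
    """Return (statistic, degrees of freedom, p-value) for the goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((e - o) ** 2 / e for e, o in zip(expected, observed))
    df = len(observed) - 1 - n_estimated
    return stat, df, chi2_sf(stat, df)


t = 50
kept = [x for x in DATA_SET_B if x > t]
theta_hat = sum(x - t for x in kept) / len(kept)          # truncated exponential MLE
cdf = lambda x: 1.0 - math.exp(-(x - t) / theta_hat)      # F*(x); equals 1 at infinity

bounds = [50, 150, 250, 500, 1000, 2000, math.inf]
intervals = list(zip(bounds[:-1], bounds[1:]))
observed = [sum(1 for x in kept if lo < x <= hi) for lo, hi in intervals]
probs = [cdf(hi) - cdf(lo) for lo, hi in intervals]

stat, df, p_value = chi_square_gof(observed, probs, n_estimated=1)
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```

The same `chi_square_gof` function applies to the censored and grouped cases once the interval probabilities and observed counts are supplied.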

Sometimes, the test can be modified to fit different situations. The following example illustrates this for aggregate frequency data.

Example 15.8

Conduct an approximate goodness-of-fit test for the Poisson model determined in Example 12.9. The data are repeated in Table 15.10.

Table 15.10 Automobile claims by year for Example 15.8.

Year	Exposure	Claims
1986	2,145	207
1987	2,452	227
1988	3,112	341
1989	3,458	335
1990	3,698	362
1991	3,872	359

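For each year, we are assuming that the number of claims is the result of the sum of a number (given by the exposure) of independent and identical random variables. In that case, the central limit theorem indicates that a normal approximation may be appropriate. The expected count ($E_k$) is the exposure times the estimated expected value for one exposure unit, while the variance ($V_k$) is the exposure times the estimated variance for one exposure unit. The test statistic is then

$$\chi^2 = \sum_{k=1}^{n} \frac{(n_k - E_k)^2}{V_k}$$

and has an approximate chi-square distribution with degrees of freedom equal to the number of data points less the number of estimated parameters. Here $n_k$ is the number of claims and $e_k$ the exposure in year k. For the Poisson model, the expected count is $E_k = \hat{\lambda}e_k$ and the variance is $V_k = \hat{\lambda}e_k$ also, where $\hat{\lambda} = 1{,}831/18{,}737 = 0.09772$ is total claims divided by total exposure. The test statistic is

$$\frac{(207 - 209.61)^2}{209.61} + \frac{(227 - 239.61)^2}{239.61} + \frac{(341 - 304.11)^2}{304.11} + \frac{(335 - 337.92)^2}{337.92} + \frac{(362 - 361.37)^2}{361.37} + \frac{(359 - 378.38)^2}{378.38} = 6.19.$$

With five degrees of freedom, the 5% critical value is 11.07 and the Poisson hypothesis is accepted.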
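The approximate test for the automobile claims data takes only a few lines. The sketch assumes that the Poisson estimate from Example 12.9 is the usual one for data with exposures, namely total claims divided by total exposure; that example is not reproduced here, so this is an assumption.

```python
# Table 15.10: automobile claims by year.
YEARS = [1986, 1987, 1988, 1989, 1990, 1991]
EXPOSURE = [2145, 2452, 3112, 3458, 3698, 3872]
CLAIMS = [207, 227, 341, 335, 362, 359]

# Assumed Poisson MLE with exposures: total claims / total exposure.
lam_hat = sum(CLAIMS) / sum(EXPOSURE)

# For the Poisson model the expected count and the variance are both lam_hat * exposure.
expected = [lam_hat * e for e in EXPOSURE]
variance = expected

stat = sum((n - e) ** 2 / v for n, e, v in zip(CLAIMS, expected, variance))
df = len(CLAIMS) - 1            # six data points less one estimated parameter
CRITICAL_5PCT = 11.07           # chi-square, five degrees of freedom (from the text)

print(f"lambda_hat = {lam_hat:.5f}")
for year, e, n in zip(YEARS, expected, CLAIMS):
    print(f"{year}: expected {e:7.2f}, observed {n}")
print(f"chi-square = {stat:.2f} on {df} df; reject at 5%: {stat > CRITICAL_5PCT}")
```

Replacing `variance` with the variance implied by another frequency model adapts the same calculation to that model.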

There is one important point to note about these tests. Suppose that the sample size were to double but that the sampled values were not much different (imagine each number showing up twice instead of once). For the Kolmogorov–Smirnov test, the test statistic would be unchanged, but the critical value would be smaller. For the Anderson–Darling and chi-square tests, the test statistic would double while the critical value would be unchanged. As a result, for larger sample sizes, it is more likely that the null hypothesis (and, thus, the proposed model) would be rejected. This outcome should not be surprising. We know that the null hypothesis is false (it is extremely unlikely that a simple distribution using a few parameters can explain the complex behavior that produced the observations), and with a large enough sample size we will have convincing evidence of that truth. When using these tests, we must remember that although all our models are wrong, some may be useful.

15.4.4 The Likelihood Ratio Test

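An alternative question to “Could the population have distribution A?” is “Is distribution B a more appropriate representation of the population than distribution A?” More formally:

$H_0$: The data came from a population with distribution A.
$H_1$: The data came from a population with distribution B.

To perform a formal hypothesis test, distribution A must be a special case of distribution B, for example, exponential versus gamma. An easy way to complete this test is given as follows. Let $L_0$ be the likelihood function maximized over only the parameter values allowed by the null hypothesis, and let $L_1$ be the likelihood function maximized over all parameter values allowed by the alternative hypothesis. The test statistic is $T = 2\ln(L_1/L_0) = 2(\ln L_1 - \ln L_0)$. The null hypothesis is rejected if $T > c$, where c is the critical value from a chi-square distribution with degrees of freedom equal to the number of free parameters in distribution B less the number of free parameters in distribution A.

This test makes some sense. When the alternative hypothesis is true, forcing the parameter to be selected from the null hypothesis should produce a likelihood value that is significantly smaller.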

Example 15.9

You want to test the hypothesis that the population that produced Data Set B (using the original largest observation) has a mean that is other than 1,200. Assume that the population has a gamma distribution and conduct the likelihood ratio test at a 5% significance level. Also, determine the p-value.

The hypotheses are

$H_0$: gamma with $\mu = 1{,}200$,
$H_1$: gamma with $\mu \ne 1{,}200$.

From earlier work, the maximum likelihood estimates are $\hat{\alpha} = 0.55616$ and $\hat{\theta} = 2{,}561.1$. The loglikelihood at the maximum is $\ln L_1 = -162.293$. Next, the likelihood must be maximized, but only over those values $\alpha$ and $\theta$ for which $\alpha\theta = 1{,}200$. This restriction means that $\alpha$ can be free to range over all positive numbers, but that $\theta = 1{,}200/\alpha$. Thus, under the null hypothesis, there is only one free parameter. The likelihood function is maximized at $\hat{\alpha} = 0.54955$ and $\hat{\theta} = 2{,}183.6$. The loglikelihood at this maximum is $\ln L_0 = -162.466$. The test statistic is $T = 2(-162.293 + 162.466) = 0.346$. For a chi-square distribution with one degree of freedom, the critical value is 3.8415. Because $0.346 < 3.8415$, the null hypothesis is not rejected. The probability that a chi-square random variable with one degree of freedom exceeds 0.346 is 0.556, a p-value that indicates little support for the alternative hypothesis.
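The final step of the example, turning the test statistic into a decision and a p-value, can be sketched with the standard library alone. The function names are hypothetical. The loglikelihood pair passed to `lrt_statistic` in the last line is an arbitrary illustration, not a value from the example; only the statistic 0.346, the critical value 3.8415, and one degree of freedom are taken from the text.

```python
import math


def chi2_sf(x, df):
    """Pr(chi-square with integer df > x), adding two degrees of freedom at a time."""
    if df % 2 == 1:
        nu, q = 1, math.erfc(math.sqrt(x / 2.0))
    else:
        nu, q = 2, math.exp(-x / 2.0)
    while nu < df:
        q += (x / 2.0) ** (nu / 2.0) * math.exp(-x / 2.0) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


def lrt_statistic(loglik_alt, loglik_null):
    """Likelihood ratio statistic: twice the gain in maximized loglikelihood."""
    return 2.0 * (loglik_alt - loglik_null)


T_EXAMPLE = 0.346        # test statistic reported in Example 15.9
DF = 1                   # two free gamma parameters less one under the null
CRITICAL_5PCT = 3.8415   # chi-square, one degree of freedom (from the text)

p_value = chi2_sf(T_EXAMPLE, DF)
reject = T_EXAMPLE > CRITICAL_5PCT
print(f"T = {T_EXAMPLE}, p-value = {p_value:.3f}, reject H0: {reject}")

# Arbitrary illustration of the statistic itself (hypothetical loglikelihoods).
print(lrt_statistic(-10.0, -10.5))
```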

Table 15.11 Six useful models for Example 15.10.

Model	Number of parameters	Negative loglikelihood	$\chi^2$	p-value
Negative binomial	2	5,348.04	8.77	0.0125
ZM logarithmic	2	5,343.79	4.92	0.1779
Poisson–inverse Gaussian	2	5,343.51	4.54	0.2091
ZM negative binomial	3	5,343.62	4.65	0.0979
Geometric–negative binomial	3	5,342.70	1.96	0.3754
Poisson–ETNB	3	5,342.51	2.75	0.2525

It is tempting to use this test when the alternative distribution simply has more parameters than the null distribution. In such cases, the test may not be appropriate. For example, it is possible for a two-parameter lognormal model to have a higher loglikelihood value than a three-parameter Burr model, resulting in a negative test statistic, indicating that a chi-square distribution is not appropriate. When the null distribution is a limiting (rather than special) case of the alternative distribution, the test may still be used, but the test statistic's distribution is now a mixture of chi-square distributions (see Self and Liang [112]). Regardless, it is still reasonable to use the “test” to make decisions in these cases, provided that it is clearly understood that a formal hypothesis test was not conducted.

15.4.5 Exercises

15.4 Use the Kolmogorov–Smirnov test to determine if a Weibull model is appropriate for the data used in Example 15.5.
15.5 (*) Five observations are made from a random variable. They are 1, 2, 3, 5, and 13. Determine the value of the Kolmogorov–Smirnov test statistic for the null hypothesis that , .
15.6 (*) You are given the following five observations from a random sample: 0.1, 0.2, 0.5, 1.0, and 1.3. Calculate the Kolmogorov–Smirnov test statistic for the null hypothesis that the population density function is , .
15.7 Perform the Anderson–Darling test of the Weibull distribution for Example 15.6.
15.8 Repeat Example 15.7 for the Weibull model.
15.9 (*) One hundred and fifty policyholders were observed from the time they arranged a viatical settlement until their death. No observations were censored. There were 21 deaths in the first year, 27 deaths in the second year, 39 deaths in the third year, and 63 deaths in the fourth year. The survival model

is being considered. At a 5% significance level, conduct the chi-square goodness-of-fit test.
15.10 (*) Each day, for 365 days, the number of claims is recorded. The results were 50 days with no claims, 122 days with one claim, 101 days with two claims, 92 days with three claims, and no days with four or more claims. For a Poisson model, determine the maximum likelihood estimate of $\lambda$ and then perform the chi-square goodness-of-fit test at a 2.5% significance level.
15.11 (*) During a one-year period, the number of accidents per day was distributed as given in Table 15.12. Test the hypothesis that the data are from a Poisson distribution with mean 0.6, using the maximum number of groups such that each group has at least five expected observations. Use a significance level of 5%.

Table 15.12 The data for Exercise 15.11.

Number of accidents	Days
0	209
1	111
2	33
3	7
4	3
5	2

15.12 (*) One thousand values were simulated from a uniform (0,1) distribution. The results were grouped into 20 ranges of equal width. The observed counts in each range were squared and added, resulting in a sum of 51,850. Determine the p-value for the chi-square goodness-of-fit test.
15.13 (*) Twenty claim amounts were sampled from a Pareto distribution with and unknown. The maximum likelihood estimate of is 7.0. Also, and . The likelihood ratio test is used to test the null hypothesis that . Determine the p-value for this test.
15.14 Redo Example 15.8 assuming that each exposure unit has a geometric distribution. Conduct the approximate chi-square goodness-of-fit test. Is the geometric preferable to the Poisson model?
15.15 Using Data Set B (with the original largest value), determine if a gamma model is more appropriate than an exponential model. Recall that an exponential model is a gamma model with $\alpha = 1$. Useful values were obtained in Example 11.2.
15.16 Use Data Set C to choose a model for the population that produced those numbers. Choose from the exponential, gamma, and transformed gamma models. Information for the first two distributions was obtained in Example 11.3 and Exercise 11.17, respectively.
15.17 Conduct the chi-square goodness-of-fit test for each of the models obtained in Exercise 12.3.
15.18 Conduct the chi-square goodness-of-fit test for each of the models obtained in Exercise 12.5.
15.19 Conduct the chi-square goodness-of-fit test for each of the models obtained in Exercise 12.6.
15.20 For the data in Table 15.20, determine the method of moments estimates of the parameters of the Poisson–Poisson distribution where the secondary distribution is the ordinary (not zero-truncated) Poisson distribution. Perform the chi-square goodness-of-fit test using this model.
15.21 You are given the data in Table 15.13, which represent results from 23,589 automobile insurance policies. The third column, headed “Fitted model,” represents the expected number of losses for a fitted (by maximum likelihood) negative binomial distribution.
1. Perform the chi-square goodness-of-fit test at a significance level of 5%.
2. Determine the maximum likelihood estimates of the negative binomial parameters r and $\beta$. This can be done from the given numbers without actually maximizing the likelihood function.

Table 15.13 The data for Exercise 15.21.

Number of losses, k	Number of policies, $n_k$	Fitted model
0	20,592	20,596.76
1	2,651	2,631.03
2	297	318.37
3	41	37.81
4	7	4.45
5	0	0.52
6	1	0.06
	0	0.00

15.5 Selecting a Model

15.5.1 Introduction

Almost all of the tools are now in place for choosing a model. Before outlining a recommended approach, two important concepts must be introduced. The first is parsimony. The principle of parsimony states that unless there is considerable evidence to do otherwise, a simpler model is preferred. The reason for this preference is that a complex model may do a great job of matching the data, but that is no guarantee that the model will match the population from which the observations were sampled. For example, given any set of 10 (x, y) pairs with unique x values, there will always be a polynomial of degree 9 or less that goes through all 10 points. But if these points were a random sample, it is highly unlikely that the population values would all lie on that polynomial. However, there may be a straight line that comes close to the sampled points as well as the other points in the population. This observation matches the spirit of most hypothesis tests. That is, do not reject the null hypothesis (and thus claim a more complex description of the population holds) unless there is strong evidence to do so.

The second concept does not have a name. It states that, if you try enough models, one will look good, even if it is not. Suppose that I have 900 models at my disposal. For most data sets, it is likely that one of them will fit extremely well, but it may not help us learn about the population.

Thus, in selecting models, there are two things to keep in mind:

Use a simple model if at all possible.
Restrict the universe of potential models.

The methods outlined in the remainder of this section help with the first point; the second one requires some experience. Certain models make more sense in certain situations, but only experience can enhance the modeler's senses so that only a short list of quality candidates is considered.

The section is split into two types of selection criteria. The first set is based on the modeler's judgment, while the second set is more formal in the sense that most of the time all analysts will reach the same conclusions because the decisions are made based on numerical measurements rather than charts or graphs.

15.5.2 Judgment-Based Approaches

Using judgment to select models involves one or more of the three concepts outlined herein. In all cases, the analyst's experience is critical.

First, the decision can be based on the various graphs (or tables based on the graphs) presented in this chapter, allowing the analyst to focus on aspects of the model that are important for the proposed application.⁹ For example, it may be more important to fit the tail well or it may be more important to match the mode or modes. Even if a score-based approach is used, it may be appropriate to present a convincing picture to support the chosen model.

Second, the decision can be influenced by the success of particular models in similar situations or the value of a particular model for its intended use. For example, the 1941 CSO mortality table follows a Makeham distribution for much of its range of ages. In a time of limited computing power, such a distribution allowed for easier calculation of joint life values. As long as the fit of this model was reasonable, this advantage outweighed the use of a different, but better fitting, model. Similarly, if the Pareto distribution has been used to model a particular line of liability insurance both by the analyst's company and by others, it may require more than the usual amount of evidence to change to an alternative distribution.

Third, the situation may completely determine the distribution. For example, suppose that a dental insurance contract provides for at most two checkups per year and suppose that individuals make two independent choices each year as to whether to have a checkup. If each time the probability is q, then the distribution must be binomial with $m = 2$.

Finally, it should be noted that the more algorithmic approaches outlined in this section do not always agree. In that case, judgment is most definitely required, if only to decide which algorithmic approach to use.

15.5.3 Score-Based Approaches

Some analysts might prefer an automated process for selecting a model. An easy way to do that would be to assign a score to each model and let the model with the best value win. The following scores are worth considering:

The Kolmogorov–Smirnov test statistic: Choose the model with the smallest value.
The Anderson–Darling test statistic: Choose the model with the smallest value.
The chi-square goodness-of-fit test statistic: Choose the model with the smallest value.
The chi-square goodness-of-fit test: Choose the model with the highest p-value.
The likelihood (or loglikelihood) function at its maximum: Choose the model with the largest value.

All but the chi-square p-value have a deficiency with respect to parsimony. First, consider the likelihood function. When comparing, say, an exponential to a Weibull model, the Weibull model must have a likelihood value that is at least as large as that of the exponential model. They would only be equal in the rare case that the maximum likelihood estimate of the Weibull parameter was equal to 1. Thus, the Weibull model would always win over the exponential model, a clear violation of the principle of parsimony. For the three test statistics, there is no assurance that the same relationship will hold, but it seems likely that, if a more complex model is selected, the fit measure will be better. The only reason the chi-square test p-value is immune from this problem is that with more complex models, the test has fewer degrees of freedom. It is then possible that the more complex model will have a smaller p-value. There is no comparable adjustment for the first two test statistics listed.

With regard to the likelihood value, there are two ways to proceed. One is to perform the likelihood ratio test and the other is to impose a penalty for employing additional parameters. The likelihood ratio test is technically only available when one model is a special case of another (e.g. Pareto versus generalized Pareto). The concept can be turned into an algorithm by using the test at a 5% significance level. Begin with the best one-parameter model (the one with the highest loglikelihood value). Add a second parameter only if the two-parameter model with the highest loglikelihood value shows an increase of at least 1.92 (so that twice the difference exceeds the critical value of 3.84). Then move to three-parameter models. If the comparison is to a two-parameter model, a 1.92 increase is again needed. If the early comparison led to keeping the one-parameter model, an increase of 3.00 is needed (because the test has two degrees of freedom). To add three parameters requires a 3.91 increase; four parameters, a 4.74 increase; and so on. In the spirit of this chapter, this algorithm can be used even when one model is not a special case of the other model. However, it would not be appropriate to claim that a likelihood ratio test was being conducted.
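The thresholds quoted above (1.92, 3.00, 3.91, 4.74) are half of the 95th percentiles of the chi-square distribution with 1, 2, 3, and 4 degrees of freedom. The following is a minimal sketch of the stepwise rule using only the standard library; the function names are illustrative, and the chi-square quantile is obtained by bisection on a series for the distribution function rather than by any particular library routine. The two usage lines take the loglikelihoods for Data Set B (truncated at 50) and Data Set C from Table 15.14.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_quantile(p, df):
    """Invert the CDF by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def required_increase(extra_params, level=0.05):
    """Loglikelihood increase needed to justify extra_params additional parameters."""
    return chi2_quantile(1.0 - level, extra_params) / 2.0

def select_by_lrt_rule(best_loglik):
    """best_loglik maps a parameter count to the highest loglikelihood among models of that size."""
    sizes = sorted(best_loglik)
    chosen = sizes[0]
    for k in sizes[1:]:
        if best_loglik[k] - best_loglik[chosen] >= required_increase(k - chosen):
            chosen = k
    return chosen

thresholds = [required_increase(k) for k in (1, 2, 3, 4)]
print([round(t, 2) for t in thresholds])

# Exponential (1 parameter) versus Weibull (2 parameters), values from Table 15.14.
choice_b_truncated = select_by_lrt_rule({1: -146.063, 2: -145.683})
choice_c = select_by_lrt_rule({1: -214.924, 2: -202.077})
print(choice_b_truncated, choice_c)
```

Because the rule compares each larger model with the model currently retained, the threshold automatically becomes the two-degree-of-freedom value when a one-parameter model is compared directly with a three-parameter model.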

Aside from the issue of special cases, the likelihood ratio test has the same problem as any hypothesis test. Were the sample size to double, the loglikelihoods would also double, making it more likely that a model with a higher number of parameters would be selected, tending to defeat the parsimony principle. Conversely, it could be argued that, if we possess a lot of data, we have the right to consider and fit more complex models. A method that effects a compromise between these positions is the Schwarz Bayesian Criterion (SBC) [110], which is also called the Bayesian Information Criterion (BIC). This method recommends that, when ranking models, a deduction of $(r/2)\ln n$ should be made from the loglikelihood value, where r is the number of estimated parameters and n is the sample size. Thus, adding a parameter requires an increase of $0.5\ln n$ in the loglikelihood. For larger sample sizes, a greater increase is needed, but it is not proportional to the sample size itself.

An alternative penalty is the Akaike Information Criterion (AIC) [4]. This method deducts the number of parameters from the loglikelihood.¹⁰ Section 3 of Brockett [17] promotes the AIC, while in a discussion of that paper Carlin provides support for the SBC. The difference in the two methods is that the SBC adjusts for the sample size while the AIC does not. To summarize, the scores are as follows:

$$\text{SBC/BIC} = \ln L - \frac{r}{2}\ln n, \qquad \text{AIC} = \ln L - r. \tag{15.1}$$

Example 15.11

For the continuing example in this chapter, choose between the exponential and Weibull models for the data.

Graphs were constructed in the various examples and exercises. Table 15.14 summarizes the numerical measures. For the truncated version of Data Set B, the SBC is calculated for a sample size of 19, while for the version censored at 1,000, there are 20 observations. For both versions of Data Set B, while the Weibull offers some improvement, it is not convincing. In particular, none of the likelihood ratio test, SBC, or AIC indicate value in the second parameter. For Data Set C, it is clear that the Weibull model is superior and provides an excellent fit.

Table 15.14 The results for Example 15.11.

	B truncated at 50		B censored at 1,000
Criterion	Exponential	Weibull	Exponential	Weibull
K–S	0.1340	0.0887	0.0991	0.0991
A–D^a	0.4292	0.1631	0.1713	0.1712
Chi-square	1.4034	0.3615	0.5951	0.5947
p-value	0.8436	0.9481	0.8976	0.7428
Loglikelihood	−146.063	−145.683	−113.647	−113.647
SBC	−147.535	−148.628	−115.145	−116.643
AIC	−147.063	−147.683	−114.647	−115.647
	C
Criterion	Exponential	Weibull
Chi-square	61.913	0.3698
p-value		0.9464
Loglikelihood	−214.924	−202.077
SBC	−217.350	−206.929
AIC	−215.924	−204.077
^aK–S and A–D refer to the Kolmogorov–Smirnov and Anderson–Darling test statistics, respectively.
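As a check on the two penalties, the following sketch recomputes the SBC and AIC rows of Table 15.14 for Data Set B from the loglikelihoods and the sample sizes stated in the example (19 for the truncated version, 20 for the censored version). The function names are illustrative. Small differences in the third decimal place can arise because the tabled loglikelihoods are themselves rounded.

```python
import math

def sbc(loglik, r, n):
    """Schwarz Bayesian Criterion: loglikelihood less (r/2) ln n."""
    return loglik - (r / 2.0) * math.log(n)

def aic(loglik, r):
    """Akaike Information Criterion in the form used here: loglikelihood less r."""
    return loglik - r

# (data set, model, loglikelihood, number of parameters, sample size) from Table 15.14
fits = [
    ("B truncated at 50", "exponential", -146.063, 1, 19),
    ("B truncated at 50", "Weibull", -145.683, 2, 19),
    ("B censored at 1,000", "exponential", -113.647, 1, 20),
    ("B censored at 1,000", "Weibull", -113.647, 2, 20),
]

scores = {
    (data, model): (sbc(ll, r, n), aic(ll, r))
    for data, model, ll, r, n in fits
}

for (data, model), (s, a) in scores.items():
    print(f"{data:20s} {model:12s} SBC = {s:9.3f}  AIC = {a:9.3f}")
```

With both criteria, larger (less negative) values are better, so in each version of Data Set B the exponential model scores higher than the Weibull model.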

Example 15.12

In Example 7.8, an ad hoc method was used to demonstrate that the Poisson–ETNB distribution provided a good fit. Use the methods of this chapter to determine a good model.

The data set is very large and, as a result, requires a very close correspondence of the model to the data. The results are given in Table 15.15.

Table 15.15 The results for Example 15.12.

From Table 15.15, it is seen that the negative binomial distribution does not fit well, while the fit of the Poisson–inverse Gaussian is marginal at best (its p-value is 2.88%). The Poisson–inverse Gaussian is a special case of the Poisson–ETNB. Hence, a likelihood ratio test can be formally applied to determine if the additional parameter r is justified. Because the loglikelihood increases by 5, which is more than 1.92, the three-parameter model is a significantly better fit. The chi-square test shows that the Poisson–ETNB provides an adequate fit. In contrast, the SBC, but not the AIC, favors the Poisson–inverse Gaussian distribution. This illustrates that with large sample sizes, using the SBC makes it harder to add a parameter. Given the improved fit in the tail for the three-parameter model, it seems to be the best choice.
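The disagreement between the two criteria in this example comes entirely from the size of the SBC penalty when n is large. The sketch below ranks the three fitted models using the observed frequencies and loglikelihoods reported in Table 15.15; the parameter counts (two for the negative binomial and Poisson–inverse Gaussian, three for the Poisson–ETNB) follow the text, and the variable names are illustrative.

```python
import math

# Observed frequencies for 0, 1, ..., 5 and 6+ claims (Table 15.15).
observed = [565_664, 68_714, 5_177, 365, 24, 6, 0]
n = sum(observed)

# Model name -> (loglikelihood, number of parameters).
models = {
    "negative binomial": (-251_117, 2),
    "Poisson-inverse Gaussian": (-251_114, 2),
    "Poisson-ETNB": (-251_109, 3),
}

per_parameter_penalty = 0.5 * math.log(n)  # SBC cost of each parameter
sbc_scores = {name: ll - r * per_parameter_penalty for name, (ll, r) in models.items()}
aic_scores = {name: ll - r for name, (ll, r) in models.items()}

best_by_sbc = max(sbc_scores, key=sbc_scores.get)
best_by_aic = max(aic_scores, key=aic_scores.get)

print("sample size:", n)
print("SBC penalty per parameter:", round(per_parameter_penalty, 2))
print("best by SBC:", best_by_sbc)
print("best by AIC:", best_by_aic)
```

Each extra parameter costs well over six units of loglikelihood under the SBC here, compared with one unit under the AIC, so an improvement of 5 is enough for the AIC but not for the SBC.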

Example 15.13

This example is taken from Douglas [29, p. 254]. An insurance company's records for one year show the number of accidents per day that resulted in a claim to the insurance company for a particular insurance coverage. The results are shown in Table 15.16. Determine if a Poisson model is appropriate.

Table 15.16 The data for Example 15.13.

Number of claims/day	Observed number of days
0	47
1	97
2	109
3	62
4	25
5	16
6	4
7	3
8	2
9+	0

A Poisson model is fitted to these data. The method of moments and the maximum likelihood method both lead to the estimate of the mean:

$$\hat{\lambda} = \frac{742}{365} = 2.0329.$$

The results of a chi-square goodness-of-fit test are shown in Table 15.17. Any time such a table is made, the expected count for the last group is

$$365\,p_{7+} = 365\,(1 - p_0 - \cdots - p_6).$$

Table 15.17 The chi-square goodness-of-fit test for Example 15.13.

Claims/day	Observed	Expected	Chi-square
0	47	47.8	0.01
1	97	97.2	0.00
2	109	98.8	1.06
3	62	66.9	0.36
4	25	34.0	2.39
5	16	13.8	0.34
6	4	4.7	0.10
7+	5	1.8	5.66
Totals	365	365	9.93

The last three groups are combined to ensure an expected count of at least one for each row. The test statistic is 9.93 with six degrees of freedom. The critical value at a 5% significance level is 12.59 and the p-value is 0.1277. By this test, the Poisson distribution is an acceptable model; however, it should be noted that the fit is poorest at the large values, and with the model understating the observed values, this may be a risky choice.
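The calculation behind Table 15.17 can be sketched as follows, using the counts from Table 15.16 and the same grouping (0 through 6, then 7+). The closed-form survival function used for the p-value is valid only for an even number of degrees of freedom, which happens to be the case here; the helper name is illustrative. Results can differ slightly from the table in the last digit because the table rounds the expected counts.

```python
import math

# Observed number of days by number of claims (Table 15.16).
observed = {0: 47, 1: 97, 2: 109, 3: 62, 4: 25, 5: 16, 6: 4, 7: 3, 8: 2}
n_days = sum(observed.values())
lam = sum(k * days for k, days in observed.items()) / n_days  # sample mean = MLE

# Group the data as 0, 1, ..., 6 and 7+.
obs_grouped = [observed[k] for k in range(7)]
obs_grouped.append(sum(days for k, days in observed.items() if k >= 7))

probs = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(7)]
probs.append(1.0 - sum(probs))  # the last group takes the remaining probability
expected = [n_days * p for p in probs]

chi_square = sum((o - e) ** 2 / e for o, e in zip(obs_grouped, expected))
df = len(obs_grouped) - 1 - 1  # groups, less one, less one estimated parameter

def chi2_sf_even_df(x, df):
    """Chi-square survival function; this closed form holds for even df only."""
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

p_value = chi2_sf_even_df(chi_square, df)

print("lambda-hat:", round(lam, 4))
print("expected counts:", [round(e, 1) for e in expected])
print("chi-square:", round(chi_square, 2), "df:", df, "p-value:", round(p_value, 4))
```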

Table 15.18 The test results for Example 15.14.

	Poisson	Geometric	ZM Poisson	ZM geometric
Chi-square	543.0	643.4	64.8	0.58
Degrees of freedom	2	4	2	2
p-value
Loglikelihood	−171,373	−171,479	−171,160	−171,133
SBC	−171,379.5	−171,485.5	−171,173	−171,146
AIC	−171,374	−171,480	−171,162	−171,135

Table 15.19 The fit of the Simon data for Example 15.15.

		Fitted distributions
0	99	54.0	95.9	98.7
1	65	92.2	75.8	70.6
2	57	78.8	50.4	50.2
3	35	44.9	31.3	32.6
4	20	19.2	18.8	20.0
5	10	6.5	11.0	11.7
6	4	1.9	6.4	6.6
7	0	0.5	3.7	3.6
8	3	0.1	2.1	2.0
9	4	0.0	1.2	1.0
10	0	0.0	0.7	0.5
11	1	0.0	0.4	0.3
12+	0	0.0	0.5	0.3
Parameters

Chi-square		72.64	4.06	2.84
Degrees of freedom		4	5	5
p-Value			54.05%	72.39%
Loglikelihood		−577.0	−528.8	−528.5
SBC		−579.8	−534.5	−534.2
AIC		−578.0	−530.8	−530.5

Example 15.16

Consider the data in Table 15.20 on automobile liability policies in Switzerland, taken from Bühlmann [20]. Determine an appropriate model.

Three models are considered in Table 15.20. The Poisson distribution is a very bad fit. Its tail is far too light compared with the actual experience. The negative binomial distribution appears to be much better, but cannot be accepted because the p-value of the chi-square statistic is very small. The large sample size requires a better fit. The Poisson–inverse Gaussian distribution provides an almost perfect fit (the p-value is large). Note that the Poisson–inverse Gaussian has two parameters, like the negative binomial. The SBC and AIC also favor this choice. This example shows that the Poisson–inverse Gaussian can have a much heavier right-hand tail than the negative binomial.

Example 15.17

Comprehensive medical claims were studied by Bevan [15] in 1963. Male (955 payments) and female (1,291 payments) claims were studied separately. The data appear in Table 15.21, where there was a deductible of 25. Can a common model be used?

Table 15.21 The comprehensive medical losses for Example 15.17.

Loss	Male	Female
25–50	184	199
50–100	270	310
100–200	160	262
200–300	88	163
300–400	63	103
400–500	47	69
500–1,000	61	124
1,000–2,000	35	40
2,000–3,000	18	12
3,000–4,000	13	4
4,000–5,000	2	1
5,000–6,667	5	2
6,667–7,500	3	1
7,500–10,000	6	1

When using the combined data set, the lognormal distribution is the best two-parameter model. Its negative loglikelihood (NLL) is 4,580.20. This value is 19.09 better than the one-parameter inverse exponential model and 0.13 worse than the three-parameter Burr model. Because none of these models is a special case of the other, the likelihood ratio test (LRT) cannot be used, but it is clear that, using the 1.92 difference as a standard, the lognormal is preferred. The SBC requires an improvement of $0.5\ln(2{,}246) = 3.86$, while the AIC requires 1.00, and again the lognormal is preferred. The parameters are and . When separate lognormal models are fitted to males ( and ) and females ( and ), the respective NLLs are 1,977.25 and 2,583.82 for a total of 4,561.07. This result is an improvement of 19.13 over a common lognormal model, which is significant by the LRT (3.00 needed), the SBC (7.72 needed), and the AIC (2.00 needed). Sometimes it is useful to be able to use the same nonscale parameter in both models. When a common value of $\sigma$ is used, the NLL is 4,579.77, which is significantly worse than using separate models.
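The comparison of a common lognormal model with separate male and female models can be laid out as a short calculation. This sketch uses only the NLL values and sample sizes quoted in the example; the LRT threshold uses the fact that half of the 95th percentile of a chi-square distribution with two degrees of freedom equals $-\ln 0.05$. Variable names are illustrative.

```python
import math

n = 955 + 1291                      # male plus female payments
nll_common = 4580.20                # one lognormal for the combined data
nll_separate = 1977.25 + 2583.82    # separate lognormals for males and females
extra_params = 2                    # separate models use four parameters instead of two

improvement = nll_common - nll_separate

needed = {
    "LRT at 5%": -math.log(0.05),               # half the chi-square(2) critical value
    "SBC": (extra_params / 2.0) * math.log(n),
    "AIC": float(extra_params),
}

decisions = {name: improvement > threshold for name, threshold in needed.items()}

print("improvement in loglikelihood:", round(improvement, 2))
for name, threshold in needed.items():
    print(f"{name}: need {threshold:.2f}, separate models preferred: {decisions[name]}")
```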

Example 15.18

In 1958, Longley-Cook [82] examined employment patterns of casualty actuaries. One of his tables listed the number of members of the Casualty Actuarial Society employed by casualty companies in 1949 (55 actuaries) and 1957 (78 actuaries). Using the data in Table 15.22, determine a model for the number of actuaries per company that employs at least one actuary and find out whether the distribution has changed over the eight-year period.

Table 15.22 The number of actuaries per company for Example 15.18.

Number of actuaries	Number of companies – 1949	Number of companies – 1957
1	17	23
2	7	7
3–4	3	3
5–9	2	3
10+	0	1

Because a value of zero is impossible, only zero-truncated distributions should be considered. In all three cases (1949 data only, 1957 data only, and combined data), the ZT logarithmic and ZT (extended) negative binomial distributions have acceptable goodness-of-fit test values. The improvements in NLL are 0.52, 0.02, and 0.94. The LRT can be applied (except that the ZT logarithmic distribution is a limiting case of the ZT negative binomial distribution with $r \to 0$), and the improvement is not significant in any of the cases. The same conclusions apply if the SBC or AIC are used. The parameter estimates (where $\beta$ is the only parameter) are 2.0227, 2.8114, and 2.4479, respectively. The NLL for the combined data set is 74.35, while the total for the two separate models is 74.15. The improvement is only 0.20, which is not significant (there is one degree of freedom). Even though the estimated mean has increased from $2.0227/\ln(3.0227) = 1.8286$ to $2.8114/\ln(3.8114) = 2.1012$, there is not enough data to make a convincing case that the true mean has increased.

15.5.4 Exercises

15.22 (*) One thousand policies were sampled and the number of accidents for each recorded. The results are shown in Table 15.23. Without doing any formal tests, determine which of the following five models is most appropriate: binomial, Poisson, negative binomial, normal, or gamma.

Table 15.23 The data for Exercise 15.22.

Number of accidents	Number of policies
0	100
1	267
2	311
3	208
4	87
5	23
6	4
Total	1,000
15.23 For Example 15.1, determine if a transformed gamma model is more appropriate than either the exponential model or the Weibull model for each of the three data sets.
15.24 (*) From the data in Exercise 15.11, the maximum likelihood estimates are for the Poisson distribution and and for the negative binomial distribution. Conduct the likelihood ratio test for choosing between these two models.
15.25 (*) From a sample of size 100, five models are fitted with the results given in Table 15.24. Use the SBC and then the AIC to select the best model.

Table 15.24 The results for Exercise 15.25.

Model	Number of parameters	Negative loglikelihood
Generalized Pareto	3	219.1
Burr	3	219.2
Pareto	2	221.2
Lognormal	2	221.4
Inverse exponential	1	224.3
15.26 Refer to Exercise 11.27. Use the likelihood ratio test (at a 5% significance level), the SBC, and the AIC to decide if Sylvia's claim is true.
15.27 (*) Five models were fitted to a sample of 260 observations. The following are the number of parameters in the model followed by the loglikelihood value: 1, −414, 2, −412, 3, −411, 4, −409, 6, −409. According to the SBC, which model (identified by the number of parameters) should be selected? Does the decision change if the AIC is used?
15.28 Using results from Exercises 12.3 and 15.17, use the chi-square goodness-of-fit test, the likelihood ratio test, the SBC, and the AIC to determine the best model from the members of the $(a, b, 0)$ class.
15.29 Using results from Exercises 12.5 and 15.18, use the chi-square goodness-of-fit test, the likelihood ratio test, the SBC, and the AIC to determine the best model from the members of the $(a, b, 0)$ class.
15.30 Using results from Exercises 12.6 and 15.19, use the chi-square goodness-of-fit test, the likelihood ratio test, the SBC, and the AIC to determine the best model from the members of the $(a, b, 0)$ class.
15.31 Table 15.25 gives the number of medical claims per reported automobile accident.

Table 15.25 The data for Exercise 15.31.

Number of medical claims	Number of accidents
0	529
1	146
2	169
3	137
4	99
5	87
6	41
7	25
8+	0
1. Construct a plot similar to Figure 6.1. Does it appear that a member of the $(a, b, 0)$ class will provide a good model? If so, which one?
2. Determine the maximum likelihood estimates of the parameters for each member of the $(a, b, 0)$ class.
3. Based on the chi-square goodness-of-fit test, the likelihood ratio test, the SBC, and the AIC, which member of the $(a, b, 0)$ class provides the best fit? Is this model acceptable?
15.32 For the four data sets introduced in Exercises 12.3, 12.5, 12.6, and 15.31, you have determined the best model from among members of the $(a, b, 0)$ class. For each data set, determine the maximum likelihood estimates of the zero-modified Poisson, geometric, logarithmic, and negative binomial distributions. Use the chi-square goodness-of-fit test and likelihood ratio tests to determine the best of the eight models considered and state whether the selected model is acceptable.
15.33 A frequency model that has not been mentioned to this point is the zeta distribution. It is a zero-truncated distribution with $p_k^T = k^{-(\rho+1)}/\zeta(\rho+1)$, $k = 1, 2, \ldots$, $\rho > 0$. The denominator is the zeta function, which must be evaluated numerically as $\zeta(\rho+1) = \sum_{k=1}^{\infty} k^{-(\rho+1)}$. The zero-modified zeta distribution can be formed in the usual way. More information can be found in Luong and Doray [84].
1. Determine the maximum likelihood estimates of the parameters of the zero-modified zeta distribution for the data in Example 12.7.
2. Is the zero-modified zeta distribution acceptable?
15.34 In Exercise 15.32, the best model from among the members of the $(a, b, 0)$ and $(a, b, 1)$ classes was selected for the data sets in Exercises 12.3, 12.5, 12.6, and 15.31. Fit the Poisson–Poisson, Polya–Aeppli, Poisson–inverse Gaussian, and Poisson–ETNB distributions to these data and determine if any of these distributions should replace the one selected in Exercise 15.32. Is the current best model acceptable?
15.35 The five data sets presented in this problem are all taken from Lemaire [79]. For each data set, compute the first three moments and then use the ideas in Section 7.2 to make a guess at an appropriate model from among the compound Poisson collection [(Poisson, geometric, negative binomial, Poisson–binomial (with and ), Polya–Aeppli, Neyman Type A, Poisson–inverse Gaussian, and Poisson–ETNB)]. From the selected model (if any) and members of the $(a, b, 0)$ and $(a, b, 1)$ classes, determine the best model.
1. The data in Table 15.26 represent counts from third-party automobile liability coverage in Belgium.
  
  Table 15.26 The data for Exercise 15.35(a).

  Number of claims	Number of policies
  0	96,978
  1	9,240
  2	704
  3	43
  4	9
  5+	0
2. The data in Table 15.27 represent the number of deaths due to horse kicks in the Prussian army between 1875 and 1894. The counts are the number of deaths in a corps (there were 10 of them) in a given year, and thus there are 200 observations. This data set is often cited as the inspiration for the Poisson distribution. For using any of our models, what additional assumption about the data must be made?
  
  Table 15.27 The data for Exercise 15.35(b).

  Number of deaths	Number of corps
  0	109
  1	65
  2	22
  3	3
  4	1
  5+	0
3. The data in Table 15.28 represent the number of major international wars per year from 1500 through 1931.
  
  Table 15.28 The data for Exercise 15.35(c).

  Number of wars	Number of years
  0	223
  1	142
  2	48
  3	15
  4	4
  5+	0
4. The data in Table 15.29 represent the number of runs scored in each half-inning of World Series baseball games played from 1947 through 1960.
  
  Table 15.29 The data for Exercise 15.35(d).

  Number of runs	Number of half innings
  0	1,023
  1	222
  2	87
  3	32
  4	18
  5	11
  6	6
  7+	3
5. The data in Table 15.30 represent the number of goals per game per team in the 1966–1967 season of the National Hockey League.
  
  Table 15.30 The data for Exercise 15.35(e).

  Number of goals	Number of games
  0	29
  1	71
  2	82
  3	89
  4	65
  5	45
  6	24
  7	7
  8	4
  9	1
  10+	3
15.36 Verify that the estimates presented in Example 7.14 are the maximum likelihood estimates. (Because only two decimals are presented, it is probably sufficient to observe that the likelihood function takes on smaller values at each of the nearby points.) The negative binomial distribution was fitted to these data in Example 12.5. Which of these two models is preferable?

Table 15.15 The results for Example 15.12.

		Fitted distributions
Number of claims	Observed frequency	Negative binomial	Poisson–inverse Gaussian	Poisson–ETNB
0	565,664	565,708.1	565,712.4	565,661.2
1	68,714	68,570.0	68,575.6	68,721.2
2	5,177	5,317.2	5,295.9	5,171.7
3	365	334.9	344.0	362.9
4	24	18.7	20.8	29.6
5	6	1.0	1.2	3.0
6+	0	0.0	0.1	0.4
Parameters


Chi-square		12.13	7.09	0.29
Degrees of freedom		2	2	1
p-value			2.88%	58.9%
−Loglikelihood		251,117	251,114	251,109
SBC		−251,130	−251,127	−251,129
AIC		−251,119	−251,116	−251,112

Table 15.20 The data and fitted distributions for Example 15.16.

		Fitted distributions
Number of accidents	Observed frequency	Poisson	Negative binomial	P.–i.G.^a
0	103,704	102,629.6	103,723.6	103,710.0
1	14,075	15,922.0	13,989.9	14,054.7
2	1,766	1,235.1	1,857.1	1,784.9
3	255	63.9	245.2	254.5
4	45	2.5	32.3	40.4
5	6	0.1	4.2	6.9
6	2	0.0	0.6	1.3
7+	0	0.0	0.1	0.3
Parameters

Chi-square		1,332.3	12.12	0.78
Degrees of freedom		2	2	3
p-Values				85.5%
Loglikelihood		−55,108.5	−54,615.3	−54,609.8
SBC		−55,114.3	−54,627.0	−54,621.5
AIC		−55,109.5	−54,617.3	−54,611.8
^aP.–i.G. stands for Poisson–inverse Gaussian.