Chi-Squared Tests for Specific Distributions
Let $X_1,\ldots,X_n$ be independent and identically distributed random variables. Consider the problem of testing the composite hypothesis $H_0$ that the distribution of $X_i$ belongs to the parametric family of Poisson distributions
$$P(X = x) = \frac{\theta^{x}}{x!}\,e^{-\theta}, \quad x = 0, 1, \ldots, \quad \theta > 0. \qquad (9.1)$$
Let be the observed frequencies meaning the number of realized values of that fall into a specific class or interval , where the fixed integer grouping classes are such that . As before, let be a column vector of standardized grouped frequencies with its components as
If the nuisance parameter is estimated effectively from grouped data by , then the standard Pearson’s sum will follow in the limit the chi-square distribution with degrees of freedom. If, on the other hand, the parameter is estimated from raw (ungrouped) data, for example, by the maximum likelihood estimate (MLE), then the standard Pearson test must be modified. Let be a (s being the dimensionality of the parameter space) matrix with its elements as
For the null hypothesis in (9.1), the matrix possesses its elements as
Let be the MLE of based on the raw data. Then, the well-known Nikulin–Rao–Robson (NRR) modified chi-squared test statistic, distributed in the limit as , can be expressed as
(9.2)
where and are the estimators of the Fisher information matrix (scalar) and of the column vector , respectively.
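As an illustration of how the ingredients of (9.2) fit together for the Poisson null hypothesis, the following Python sketch computes Pearson's sum and an NRR-type statistic from a raw sample. It is a minimal sketch under stated assumptions, not the authors' implementation. The statistic is taken in the standard scalar-parameter form $Y^2 = X^2 + (\sum_j b_j V_j)^2 / (J - \sum_j b_j^2)$, where $V_j$ are the standardized frequencies, $b_j = p_j^{-1/2}\,\partial p_j/\partial\theta$, and $J = 1/\theta$ is the Poisson Fisher information per observation. The symbol $\theta$, the function names, and the cell convention (cell $j$ holds the integers in $(c_{j-1}, c_j]$, with the last cell open to the right) are assumptions introduced here.

```python
import math

def poisson_pmf(x, theta):
    """Poisson probability P(X = x); zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(x * math.log(theta) - theta - math.lgamma(x + 1))

def nrr_poisson(sample, cuts):
    """Return (X2, Y2) for cells {0..c1}, {c1+1..c2}, ..., {c_last+1, ...}.

    X2 is Pearson's sum with the raw-data MLE plugged in; Y2 adds the
    NRR-type correction term for one estimated parameter.
    """
    n = len(sample)
    theta = sum(sample) / n              # MLE from the raw (ungrouped) data
    lower = [-1] + list(cuts)            # cell j holds lower < x <= upper
    upper = list(cuts) + [math.inf]
    x2 = 0.0    # sum of V_j^2
    bv = 0.0    # sum of b_j * V_j
    jg = 0.0    # sum of b_j^2 (information retained by the grouped data)
    for lo, hi in zip(lower, upper):
        if hi == math.inf:
            p = 1.0 - sum(poisson_pmf(x, theta) for x in range(lo + 1))
            dp = poisson_pmf(lo, theta)
        else:
            p = sum(poisson_pmf(x, theta) for x in range(lo + 1, hi + 1))
            # d/dtheta of the cell probability telescopes to pmf(lo) - pmf(hi)
            dp = poisson_pmf(lo, theta) - poisson_pmf(hi, theta)
        count = sum(1 for x in sample if lo < x <= hi)
        v = (count - n * p) / math.sqrt(n * p)
        b = dp / math.sqrt(p)
        x2 += v * v
        bv += b * v
        jg += b * b
    fisher = 1.0 / theta                 # Fisher information per observation
    y2 = x2 + bv ** 2 / (fisher - jg)
    return x2, y2
```

The sample must contain at least one nonzero count so that the MLE is positive, and cells with very small expected frequencies should be merged before the statistic is computed.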
Next, let us consider the binomial null hypothesis. Let the probability mass function be specified as
(9.3)
where and . In this case, the Fisher information matrix (scalar) is and the elements of the matrix are
Let be the MLE of . Then, the modified chi-squared test statistic, distributed in the limit as , can be expressed as
(9.4)
where and .
Now, let the probability distribution of the null hypothesis follow Feller’s (1948, pp. 105–115) discrete distribution with cumulative distribution function
(9.5)
where is the complement gamma function.
If the parameter and , then the distribution function in (9.5) can be approximated as
Using the results of Bol’shev (1963) a more accurate approximation of (9.5) can be obtained as
(9.6)
where and . Note that (9.6) looks like the binomial distribution function with the exception that the parameter can be any real positive number. Consider the probability mass function of (9.6) given by
(9.7)
where the parameter . In this case, there are three ways to construct a modified chi-squared test of a composite null hypothesis about the distribution in (9.6). The first is to use the MLEs of and together with the NRR statistic . Since the MLEs of and cannot be derived easily, the modified test based on MMEs (see Eq. (4.9)) or Singh’s test (see Eq. (3.25)) can be used instead.
For the model in (9.7) and , the DN statistic will follow in the limit and . If , then, as before (see Eq. (4.12)), we will have
In this case, , , and
To specify the above tests, we of course will need explicit expressions of all the matrices involved.
The elements of the Fisher information matrix for the model in (9.7) are
where is the largest integer contained in and is the psi-function. It is known that series expansions for the psi-function converge very slowly. But, for integer values of x, a recurrence can be used, from which it follows that
This result permits us to calculate all expressions containing with a very high accuracy.
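The integer-argument recurrence can be sketched in a few lines of Python. This is an illustrative sketch that relies only on the standard facts $\psi(1) = -\gamma$ and $\psi(x+1) = \psi(x) + 1/x$; the function name `psi_integer` and the constant name are introduced here and do not come from the text.

```python
EULER_GAMMA = 0.5772156649015329  # Euler's constant

def psi_integer(n):
    """Psi (digamma) function at a positive integer n.

    Uses psi(1) = -gamma and psi(k + 1) = psi(k) + 1/k, which gives
    psi(n) = -gamma + sum_{k=1}^{n-1} 1/k as a finite sum.
    """
    if n < 1 or n != int(n):
        raise ValueError("n must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, int(n)))
```

Because the sum is finite, the result is exact up to floating-point rounding, which is why no slowly converging series is needed at integer arguments.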
The elements of the matrix are
The elements of the matrix are
The elements of the matrix are
The components and of the vector for Singh’s test for the model in (9.7) are
and
It has to be noted that the test in (3.25) is computationally much more complicated than the statistic for large samples.
For the model in (9.7), and . Denoting the first two sample moments and and then equating them to population moments, the MMEs of and are obtained as
(9.8)
From (9.8), we see that negative values of and are possible, but the proportion of such estimates is almost negligible for samples of size . It seems that the test can be used for analyzing Rutherford’s data, but the question of the $\sqrt{n}$-consistency of the MMEs in (9.8) is still open.
To examine the rate of convergence of the estimators and for sample sizes , we simulated 3000 estimates of and assuming that and , values that correspond to Rutherford’s data. The power curve fit of , the average value of the estimates over 3000 runs, in Figure 9.1 shows that and . The power curve fit of in Figure 9.2 gives and .
To check the distribution of the statistic under the null “Feller’s” distribution ( and ), we simulated 1000 values of . The histogram of these values is well described by the distribution (see Figure 9.3). The average value also does not contradict the assumption that the statistic follows in the limit the chi-squared distribution with one degree of freedom. Another important property of a test statistic is that its limiting distribution does not depend on the unknown parameters. To check this property for the test , we simulated 1000 values of assuming that (half the value under the null hypothesis ) and (twice the value under the null hypothesis ). The results (Figure 9.4) show that the simulated values do not contradict this independence, because the histogram is again well described by the distribution.
Figure 9.1 Simulated average value of (circles) and the power function fit (solid line) as function of the sample size n.
Figure 9.2 Simulated average value of (circles) and the power function fit (solid line) as function of the sample size n.
Figure 9.3 The histogram of the 1000 simulated values of for the null hypothesis ( and ) and the distribution (solid line).
Figure 9.4 The histogram of the 1000 simulated values of for and the distribution (solid line).
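A power curve fit of the kind shown in Figures 9.1 and 9.2 can be sketched as follows. The text does not state how the fits were obtained, so the approach here (ordinary least squares on the log-log scale for a curve $y = a\,n^{b}$) is an assumption, as are the function name and the synthetic input values, which are generated from an exact power law purely for illustration and are not the simulated data of this chapter.

```python
import math

def fit_power_curve(ns, ys):
    """Least-squares fit of y = a * n**b on the log-log scale.

    ns are sample sizes and ys are positive quantities (for example the
    absolute deviation of an averaged estimate from the true value).
    Returns the pair (a, b).
    """
    lx = [math.log(n) for n in ns]
    ly = [math.log(y) for y in ys]
    m = len(lx)
    mx = sum(lx) / m
    my = sum(ly) / m
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx                  # slope: exponent of the power law
    a = math.exp(my - b * mx)      # intercept: multiplicative constant
    return a, b

# Synthetic illustration: points lying exactly on y = 2.5 * n**(-0.5)
sizes = [50, 100, 200, 400, 800]
values = [2.5 * n ** -0.5 for n in sizes]
a_hat, b_hat = fit_power_curve(sizes, values)
```

An exponent close to $-1/2$ in such a fit would be consistent with a root-$n$ rate of convergence, although a fitted curve is only indirect evidence and not a proof.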
The above results evidently allow us to use the HRM statistic for Rutherford’s data analysis.
For r equiprobable cells of the model in (4.25), the borders of equiprobable intervals are defined as:
Then, the elements of the matrix are as follows:
where and is the psi-function. For the required calculation of , we used the series expansion
where is Euler’s constant.
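For non-integer arguments, a series of this type can be evaluated as in the sketch below. The expansion used in the text is not reproduced here, so the code assumes the standard series $\psi(x) = -\gamma + \sum_{k=0}^{\infty}\left[\frac{1}{k+1} - \frac{1}{k+x}\right]$ for $x > 0$. Since this series converges slowly, the sketch adds an Euler–Maclaurin tail correction (the integral of the summand plus half of the first omitted term); that correction, the function name, and the default number of terms are assumptions introduced here.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def psi_series(x, terms=1000):
    """Psi (digamma) function for x > 0 from a truncated series.

    Sums the first `terms` terms of sum_k [1/(k+1) - 1/(k+x)] and then
    approximates the remaining tail by its integral, log((N+x)/(N+1)),
    plus half of the first omitted term.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    partial = sum(1.0 / (k + 1) - 1.0 / (k + x) for k in range(terms))
    first_omitted = 1.0 / (terms + 1) - 1.0 / (terms + x)
    tail = math.log((terms + x) / (terms + 1)) + 0.5 * first_omitted
    return -EULER_GAMMA + partial + tail
```

Without the tail correction the truncation error decays only like $1/N$ in the number of terms $N$, which illustrates why plain series expansions of the psi-function are impractical when high accuracy is required.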
Similarly, the elements of the matrices and are as follows:
where the population moments are
and is the incomplete gamma function. For the required calculation of , we used the following series expansion (Prudnikov et al., 1981, p. 705):
Finally, the elements of the matrix are as follows:
Elements of the Fisher information matrix are as follows:
Consider the two-parameter exponential distribution with cumulative distribution function
(9.9)
where the unknown parameter . It is easily verified that the matrix for the model in (9.9) is
(9.10)
Based on the set of n i.i.d. random variables , the MLE of the parameter equals , where
(9.11)
Consider r disjoint equiprobable intervals
For these intervals, the elements of the matrix (see Eq. (3.4)) are
Using the matrix in (9.10) and the above elements of the matrix with replaced by the MLE in (9.11), the NRR test (see Eq. (3.8)) can be used. When using Microsoft Excel, carrying out the calculations in double precision is recommended.
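The first steps of this procedure (estimating the parameters, forming equiprobable intervals, and counting frequencies) can be sketched in Python. The sketch assumes the common parametrization $F(x) = 1 - \exp\{-(x-\mu)/\theta\}$ for $x \ge \mu$, for which the MLEs are the sample minimum and the sample mean minus the sample minimum. The parametrization, the symbols $\mu$ and $\theta$, and all function names are assumptions introduced here, and the code stops at Pearson's sum rather than the full NRR statistic, whose matrices are given by the formulas in the text.

```python
import math

def exp2_mle(sample):
    """MLEs (mu, theta) for F(x) = 1 - exp(-(x - mu)/theta), x >= mu."""
    mu = min(sample)
    theta = sum(sample) / len(sample) - mu
    return mu, theta

def equiprobable_borders(mu, theta, r):
    """Borders x_0 < ... < x_r with probability 1/r in each interval."""
    return [mu - theta * math.log(1.0 - j / r) for j in range(r)] + [math.inf]

def cell_counts(sample, borders):
    """Observed frequencies for the intervals [x_{j-1}, x_j)."""
    counts = [0] * (len(borders) - 1)
    for x in sample:
        for j in range(len(counts)):
            if borders[j] <= x < borders[j + 1]:
                counts[j] += 1
                break
    return counts

def pearson_sum(counts):
    """Pearson's sum for equiprobable cells (expected count n/r each)."""
    n, r = sum(counts), len(counts)
    expected = n / r
    return sum((c - expected) ** 2 / expected for c in counts)
```

Because the borders are computed from the estimates, they are random and change from sample to sample, which is why Pearson's sum alone does not follow the usual chi-squared limit here and a modified statistic is needed.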
Let , and , be borders of r equiprobable random grouping intervals. Then, the probabilities of falling into each interval are .
The elements of the matrix , for , are as follows:
Next, the elements of the matrix are as follows:
where is Euler’s dilogarithm function that can be computed by the series expansion
and by the expansion
for (Prudnikov et al., 1986, p. 763).
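A numerical evaluation of the dilogarithm can be sketched as follows. The sketch assumes the convention $\mathrm{Li}_2(x) = \sum_{k \ge 1} x^k/k^2$ and, for arguments close to 1 where that series converges slowly, Euler's reflection formula $\mathrm{Li}_2(x) = \pi^2/6 - \ln x \,\ln(1-x) - \mathrm{Li}_2(1-x)$. These are standard identities and are not necessarily the expansions quoted from Prudnikov et al.; other sources define the dilogarithm with a different sign or argument convention, so the convention should be checked before the values are inserted into the matrix elements. The function names and the restricted domain are assumptions introduced here.

```python
import math

def dilog_series(x, tol=1e-16, max_terms=100000):
    """Power series sum_{k>=1} x**k / k**2, intended for |x| < 1."""
    total, power = 0.0, 1.0
    for k in range(1, max_terms + 1):
        power *= x
        term = power / (k * k)
        total += term
        if abs(term) < tol:
            break
    return total

def dilog(x):
    """Dilogarithm Li2(x) for -0.5 <= x <= 1.

    Uses the power series directly for x <= 0.5 and the reflection
    formula for 0.5 < x < 1, so the series argument never exceeds 0.5
    in absolute value.
    """
    if not -0.5 <= x <= 1.0:
        raise ValueError("this sketch covers -0.5 <= x <= 1 only")
    if x == 1.0:
        return math.pi ** 2 / 6.0
    if x > 0.5:
        return (math.pi ** 2 / 6.0
                - math.log(x) * math.log(1.0 - x)
                - dilog_series(1.0 - x))
    return dilog_series(x)
```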
Finally, we have the matrices and as
System requirements for the software described in Sections 9.6–9.10 are Windows XP or Windows 7 and MS Office 2003, 2007, or 2010.
1. Open file Testing Normality.xls;
2. Enter your sample data in column “I” starting from cell 1;
3. Click the button “Compute” and enter the sample size and the desired number of equiprobable intervals . The recommended number of intervals for the NRR test in (3.8), under close alternatives (such as the logistic), is . The recommended number of intervals for the test in (3.24) is (see Section 4.4.1). Note that the power of can be higher than that of the NRR test;
4. Click OK;
5. Numerical values of and are in cells F2 and G2, respectively. Cells F3 and G3 contain the corresponding percentage points at level 0.05. The P-values of and are in cells F4 and G4, respectively.
1. Open file Testing Exp GrNik.xls;
2. Enter your sample data in column “I” starting from cell 1;
3. Click the button “Compute” and enter the sample size and the desired number of equiprobable intervals. The recommended number of equiprobable intervals is ;
4. Click OK;
5. The numerical value of (see Eq. (3.44)) is in cell F2. The percentage point at level 0.05 and the P-value are in cells F3 and F4, respectively.
1. Open file Testing NRR 2-param EXP.xls;
2. Enter your sample data in column “I” starting from cell 1;
3. Click the button “Compute” and enter the sample size and the desired number of equiprobable intervals. The recommended number of equiprobable intervals is ;
4. Click OK;
5. The numerical value of is in cell F2. The percentage point at level 0.05 and the P-value are in cells F3 and F4, respectively.
1. Open file Testing Logistic.xls;
2. Enter your sample data in column “I” starting from cell 1;
3. Click the button “Compute” and enter the sample size and the desired number of equiprobable intervals . The recommended number of equiprobable intervals, for close alternatives (such as the normal), is ;
4. Click OK;
5. Numerical values of in (4.9) and in (4.13) are in cells E2 and F2, respectively. Cells E3 and F3 contain the corresponding percentage points at level 0.05. The P-values of and are in cells E4 and F4, respectively.
1. Open file Testing Weibull3.xls;
2. Enter your sample data in column “I” starting from cell 1;
3. Click the button “Compute” and enter the sample size and the desired number of equiprobable intervals . The recommended number of equiprobable intervals for the Exponentiated Weibull and Power Generalized Weibull alternatives is ;
4. Click OK;
5. Numerical values of in (4.9) and in (4.14) (see also Section 9.2) are in cells F2 and G2, respectively. Note that the power of is usually higher than that of . Cells F3 and G3 contain the corresponding percentage points at level 0.05. The P-values of and are in cells F4 and G4, respectively.
1. Open file Test for PGW (Left-tailed).xls;
2. Enter your sample data in column “I” starting from cell 1;
3. Click the button “Run” and enter the sample size and the desired number of equiprobable intervals . The recommended number of equiprobable intervals for the Exponentiated Weibull, Generalized Weibull, and Three-Parameter Weibull alternatives is ;
4. Click OK. Note that the power of in (3.50) is usually higher than that of in (3.48);
5. Numerical values of and (see Eqs. (3.48), (3.50) and Section 9.3) are in cells F2 and G2, respectively. Cells F3 and G3 contain the corresponding percentage points at level 0.05. The P-values of and are in cells F4 and G4, respectively.
1. Open file Testing Circular Normality.xls;
2. Enter your sample data in columns “I” and “J” starting from cell 1;
3. Click the button “Compute” and enter the sample size and the desired number of equiprobable intervals. The recommended number of intervals for the two-dimensional logistic alternative is , while the recommended number of intervals for the two-dimensional normal alternative is 3;
4. Click OK;
5. Numerical values of and (see Section 3.5.3) are in cells F2 and G2, respectively. Cells F3 and G3 contain the corresponding percentage points at level 0.05. The P-values of and are in cells F4 and G4, respectively.
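The percentage points and P-values reported in the worksheets can be cross-checked independently. The following Python sketch computes the upper-tail probability of a chi-squared distribution with an integer number of degrees of freedom and inverts it by bisection. It does not reproduce the internals of the workbooks, which are not described here; the function names are introduced for illustration, and the appropriate number of degrees of freedom for each statistic must be taken from the section in which that statistic is defined.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1.

    Starts from the closed forms for df = 1 (erfc) or df = 2 (exp) and
    applies Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    where a = df/2 and z = x/2.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

def chi2_upper_point(alpha, df):
    """Percentage point x with P(chi2_df > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, `chi2_sf(value, df)` applied to the statistic in cell F2 should agree with the P-value in cell F4 when `df` matches the limiting distribution of that statistic.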
1. Bol’shev LN. Asymptotical Pearson’s transformations. Theory of Probability and its Applications. 1963;8:129–155.
2. Feller W. On probability problems in the theory of counters. In: Courant Anniversary Volume. New York: Interscience Publishers; 1948:105–115.
3. Prudnikov AP, Brychkov YA, Marichev OI. Integrals and Series: Elementary Functions. Moscow: Nauka; 1981.
4. Prudnikov AP, Brychkov YA, Marichev OI. Integrals and Series: Additional Chapters. Moscow: Nauka; 1986.