Checking Data for Suitability of Normal Model

We've already noted that there are times when we'll want to ask how well a normal distribution can serve as a model of a real-world population or process. Because there are often advantages to using a normal model, the practical question often boils down to asking if our actual data deviate too grossly from a normal distribution. If not, a normal model can be quite useful.

In this chapter we'll introduce a basic approach to this question, based on the comparison of observed and theoretical quantiles (that is, the types of comparisons we've just made in the prior section). In Chapter 9 we'll return to this topic and refine the approach.

Normal Quantile Plots

In the previous section, we computed three different quantiles of the normal variable X ~ N(119.044, 18.841). We also have a large data table containing well over 6,000 observations of a variable that shares the same mean and standard deviation as X, and we know the median (50th percentile) of each. Comparing our computed quantiles to the observed quantiles gives the following:

Table 6.1. Comparison of Observed and Theoretical Quantiles
Percentile	Observed Value of BPXSY1	Computed Value of X
25	106	106.34
50	116	119.04
75	128	131.75
90	142	143.19

The observed and computed values are similar, though not identical. If BPXSY1 were normally distributed, the values in the last two columns of Table 6.1 would match perfectly.
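The computed column of Table 6.1 can be reproduced outside JMP. The following is a minimal sketch in Python, assuming that N(119.044, 18.841) denotes a mean of 119.044 and a standard deviation of 18.841, as the text describes. The observed values are copied from the table; the variable names are illustrative only.

```python
from statistics import NormalDist

# Model from the text: X ~ N(mean = 119.044, sd = 18.841)
model = NormalDist(mu=119.044, sigma=18.841)

# Observed BPXSY1 quantiles, copied from Table 6.1
observed = {25: 106, 50: 116, 75: 128, 90: 142}

# Theoretical quantile of X for each percentile, rounded as in the table
computed = {pct: round(model.inv_cdf(pct / 100), 2) for pct in observed}

for pct, obs in observed.items():
    gap = computed[pct] - obs
    print(f"{pct}th percentile: observed {obs}, "
          f"computed {computed[pct]:.2f}, gap {gap:+.2f}")
```

The gap column makes the pairwise comparison explicit: the largest discrepancy is at the median, where the observed value is about 3 units below the computed one.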

We could continue to calculate theoretical quantiles for other percentiles and continue to compare the values one pair at a time. Fortunately, there is a more direct and simple way to carry out the comparison—a Normal Quantile Plot (sometimes known as a Normal Probability Plot, or NPP).

In JMP, a normal quantile plot is a scatterplot with values of the observed data on the vertical axis and theoretical percentiles on the horizontal axis.^[2] If the normal model were to match the observed data perfectly, the points in the graph would fall along a 45° diagonal line. For this reason the plot includes a red diagonal reference line. To the extent that the points deviate from the line, we see imperfections in the fit. Let's look at two examples.

^[2] These are the default axis settings.
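To make the construction concrete, here is a rough sketch of how the coordinates of such a plot might be computed by hand. It is not JMP's implementation. The plotting position (i - 0.5)/n is one common convention and is an assumption here; JMP may use a different formula. The function name and the sample values are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_points(values):
    """Return (theoretical z-score, observed value) pairs, smallest first."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # assumed plotting position for the i-th smallest value
        points.append((std_normal.inv_cdf(p), x))
    return points

# Hypothetical readings, not taken from the NHANES table
sample = [112, 98, 135, 120, 104, 127, 118, 141, 109, 123]

for z, x in normal_quantile_points(sample):
    print(f"{z:6.2f}  {x}")
```

Plotting the second column against the first gives the scatterplot described above; the straighter the pattern, the better the normal model fits.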

Return to the NHANES Distribution report window.
Hold down the CTRL key and click on the red triangle next to BPXSY1; select Normal Quantile Plot.

Figure 6.8 shows the plots for both blood pressure columns. Neither shows a perfectly straight diagonal pattern, but the plot of diastolic pressure on the right runs more closely along the diagonal for most of the distribution. The normal model fits poorly in the tails of the distribution but reasonably well elsewhere.

Recall that we have more than 6,600 observations here. The shadowgram, histogram, and box plots show that the diastolic distribution is much more symmetric than the systolic.

Figure 6.8. Normal Quantile Plots for Blood Pressure Data

As an example of a quite good fit for a normal model, let's look at some other columns from a subset of the NHANES data table, selecting just the two-year-old girls in the sample.

Select Rows → Row Selection → Select Where. We need to specify two conditions to select the rows corresponding to two-year-old girls. Within the dialog box, first set the condition that RIAGENDR equals 2 (the code for females), and then click the Add Condition button in the lower portion of the dialog box.

Figure 6.9. Setting Select Rows Criteria

Now choose RIDAGEYR equals 2 to select the two-year-old children, click Add Condition, and then OK.

If you now scroll down the rows of the data table, you'll find a relatively small number of rows selected. In fact, there are just 177 two-year-old girls in this sample of almost 10,000 people.
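The logic of the two-condition selection can be sketched in a few lines of Python. This is only an illustration: the rows below are invented stand-ins for the NHANES table, the helper name select_where is hypothetical, and the sketch assumes that a row is selected only when both conditions hold.

```python
# Hypothetical rows standing in for the NHANES data table; values are invented.
rows = [
    {"RIAGENDR": 2, "RIDAGEYR": 2, "BMXRECUM": 88.1},
    {"RIAGENDR": 1, "RIDAGEYR": 2, "BMXRECUM": 90.3},
    {"RIAGENDR": 2, "RIDAGEYR": 34, "BMXRECUM": None},
    {"RIAGENDR": 2, "RIDAGEYR": 2, "BMXRECUM": 85.6},
]

def select_where(table, **conditions):
    """Keep the rows that satisfy every column == value condition."""
    return [
        row for row in table
        if all(row.get(col) == val for col, val in conditions.items())
    ]

# RIAGENDR == 2 (female) AND RIDAGEYR == 2 (two years old)
two_year_old_girls = select_where(rows, RIAGENDR=2, RIDAGEYR=2)
print(len(two_year_old_girls))
```

Applied to the real table, the same pair of conditions would yield the 177 rows mentioned above.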

In the Rows panel of the data table window, point to the word Selected, right-click, and choose Data View. This opens a new data table containing just the selected rows.

Figure 6.10. Choosing the Data View of Selected Rows

Select Analyze → Distribution. Cast BMXRECUM as Y. This column is the recumbent (reclining) length of these two-year-old girls.

Create a normal quantile plot for the data; it will look like Figure 6.11. Here the points are very close to the diagonal, suggesting that a normal model would be very suitable in this case.

Figure 6.11. Normal Quantile Plot for Recumbent Length

As a final example in this section, perform all of the steps necessary to create a normal quantile plot for INDFMPIR. The correct result is shown in Figure 6.12.

This column is equal to the ratio of family income to the federally established definition of poverty. A ratio of less than 1 means that the family lives in poverty. By definition, if the ratio is more than 5.0, NHANES records the value as equal to 5; there are 11 such families in this subsample.
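The recording rule for this column amounts to capping each ratio at 5. A small sketch, using made-up ratios and a hypothetical helper name, shows the effect: every value above the cap collapses to the same recorded number, so the top of the distribution piles up at 5.

```python
def top_code(ratio, cap=5.0):
    """Record the ratio as-is, unless it exceeds the cap."""
    return min(ratio, cap)

# Hypothetical income-to-poverty ratios, invented for illustration
raw_ratios = [0.4, 1.2, 2.7, 5.0, 6.3, 8.9]

recorded = [top_code(r) for r in raw_ratios]
in_poverty = [r < 1 for r in recorded]

print(recorded)
print(in_poverty)
```

Ties at the cap appear in a quantile plot as a flat run of points at the upper end, which is one reason such a column can depart from the diagonal.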

Figure 6.12. Quantile Plot for Family Income Poverty Ratio

In contrast to the prior graph, we should conclude that a normal distribution forms a poor model for this variable. One could develop a normal model, but the results calculated from that model would rarely come close to matching the reality represented by the observed data from the families of two-year-old girls.
