The average is, after all, a single numerical value and may fail to describe the data fully. Thus, after measuring the central tendency, the next step is to find some measure of the variability of the data. Variability is the spread or scatter of the separate scores around their central tendency.
We have discussed the commonly used measures of central tendency, which provide useful descriptive information about a set of data. However, the information so obtained is neither exhaustive nor comprehensive: the mean does not tell us whether the observations are close to each other or far apart. The median is a positional average and has nothing to do with the variability of the observations in the series. The mode is the most frequently occurring value, independent of the other values of the set.
This leads us to conclude that a central value or an average alone cannot describe the distribution adequately. Moreover, two or more sets may have the same mean and yet be quite different. To make this point clear, let us consider the scores of two groups of students on the same test:
Here, both the groups have the same mean score of 60. From first inspection, we might say that the two sets of scores are equal in nature. In the first group, the range of scores is from 24 to 90, while in the second, the range is from 50 to 70. This difference in range shows that the students in the second group are more homogeneous in scoring than those in the first. So both the groups differ widely in the variability of scores.
Thus, while studying a distribution it is equally important to know whether the observations are clustered around or scattered away from the point of central tendency. Measures of dispersion help us in studying the extent to which observations are scattered about the average or the central value.
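The measures of dispersion discussed here are the range, the quartile deviation, the mean deviation, and the variance and standard deviation.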
The range of a set of observations is defined as the difference between the largest and the smallest value. The range measures the spread of the observations among themselves and does not give an idea about the spread of the observations around some central value. The range is defined by
$$R = X_L - X_S$$
where XL is the largest of the observed values and XS is the smallest of the observed values.
Calculate the range of the following set of scores:
Solution:
Here, the largest observed score XL = 41
The smallest observed score XS = 19
Therefore, the range (R) = XL − XS
= 41 − 19
R = 22
1. What are the different measures of dispersion? Clarify.
2. Calculate the range for the following set of scores:
The quartile deviation is defined as half the difference between Q3 and Q1 in a frequency distribution. It is computed by the formula:
$$Q = \frac{Q_3 - Q_1}{2}$$
where Q is the quartile deviation, Q3 is the third quartile and Q1 is the first quartile.
For computing Q, we must first get the values of Q1 and Q3, the first and third quartiles, respectively.
In the previous chapter, we discussed the median as a measure of central tendency. We know that the median is the value of the score which divides the distribution into two equal halves. Similarly, quartiles are the points which divide a distribution into quarters. Q1 is the first quartile, Q2 is the second quartile or median, and Q3 is the third quartile.
The quartiles can be computed by the following formulae:
$$Q_1 = L + \frac{\frac{N}{4} - Cf}{f} \times i, \qquad Q_3 = L + \frac{\frac{3N}{4} - Cf}{f} \times i$$
where L is the true lower limit of the class in which the quartile lies, N is the total number of cases, Cf is the ‘less than’ cumulative frequency of the class preceding the quartile class, f is the frequency of the quartile class and i is the size of the class interval.
To find Q1 and Q3, we must first find the quartile class.
The quartile class for Q1 is the class in which N/4th case falls. Similarly for Q3, the quartile class is the class in which 3N/4th case falls. The computation of Q1 and Q3 may be explained through the following example:
Calculate the first and third quartiles for the following frequency distribution:
Solution:
Scores Class Interval (CI) | Frequency (f) | ‘Less than’ Cumulative Frequency (Cf) |
---|---|---|
14–15 | 3 | 60 |
12–13 | 8 | 57 |
10–11 | 15 | 49 |
8–9 | 20 | 34 |
6–7 | 10 | 14 |
4–5 | 4 | 4 |
 | N = 60 | |
Calculation of Q1: Since N/4 = 60/4 = 15, and the 15th case falls in the class interval 8–9, the quartile class for Q1 is 8–9, with its class interval 2 and true limits ‘7.5–9.5’.
Therefore, L = 7.5, Cf = 14, f = 20, i = 2.
Substituting these values in the formula
$$Q_1 = 7.5 + \frac{15 - 14}{20} \times 2 = 7.5 + 0.1 = 7.6$$
Calculation of Q3: Since 3N/4 = (3 × 60)/4 = 45, and the 45th case falls in the class interval 10–11, the quartile class for Q3 is 10–11, with its class interval 2 and true limits ‘9.5–11.5’.
Therefore, L = 9.5, Cf = 34, f = 15, i = 2.
Substituting these values in the formula
$$Q_3 = 9.5 + \frac{45 - 34}{15} \times 2 = 9.5 + 1.467 = 10.967$$
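Find the quartile deviation for the frequency distribution given in Example 2.2.
Solution: Quartile deviation is computed by the formula:
$$Q = \frac{Q_3 - Q_1}{2}$$
As calculated in Example 2.2, Q1 = 7.6 and Q3 = 10.967. Therefore,
$$Q = \frac{10.967 - 7.6}{2} = \frac{3.367}{2} \approx 1.68$$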
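The interpolation used for Q1 and Q3 can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own procedure: the function name `grouped_quantile` is hypothetical, the classes are assumed to hold whole-number scores (so the true lower limit is the score limit minus 0.5), and the quartile deviation is taken as half the distance between Q3 and Q1.

```python
# Grouped-data quartiles for the six-class distribution of the example.
# Each tuple is (lower score limit, upper score limit, frequency), lowest class first.
classes = [(4, 5, 4), (6, 7, 10), (8, 9, 20), (10, 11, 15), (12, 13, 8), (14, 15, 3)]


def grouped_quantile(classes, fraction):
    """Interpolated score below which `fraction` of the cases fall."""
    n = sum(f for _, _, f in classes)
    target = fraction * n              # N/4 for Q1, 3N/4 for Q3
    cf = 0                             # 'less than' cumulative frequency so far
    for lower, upper, f in classes:
        if f > 0 and cf + f >= target:
            true_lower = lower - 0.5   # true lower limit L (assumes integer scores)
            width = upper - lower + 1  # size of the class interval i
            return true_lower + (target - cf) / f * width
        cf += f
    raise ValueError("fraction must lie between 0 and 1")


q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
q = (q3 - q1) / 2                      # quartile deviation
print(round(q1, 2), round(q3, 2), round(q, 2))
```

The loop walks up the classes until the cumulative frequency reaches the target case, which is the same search for the "quartile class" described in the text.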
3. What is the value of Q when Q1 = 34 and Q3 = 76?
4. The following are the data given in the form of frequency distribution:
Scores | f |
---|---|
90–99 | 2 |
80–89 | 12 |
70–79 | 22 |
60–69 | 20 |
50–59 | 14 |
40–49 | 4 |
30–39 | 1 |
 | N = 75 |
The average of the absolute deviations of every variate value from the mean is called the mean deviation (MD) or the average deviation (AD). On averaging deviations to find the mean deviation, no account is taken of signs, and all deviations whether plus or minus are treated as positive. Thus, if some of the deviations are +4, −9, +2, +5, − 1 and −3, we simply add 4, 9, 2, 5, 1 and 3, getting 24.
The formula for computing mean deviation from ungrouped data is
$$MD = \frac{\sum |X - \bar{X}|}{N}$$
where N is the number of observations and $\sum |X - \bar{X}|$ signifies the sum of absolute deviations (irrespective of positive or negative sign) taken from the mean of the series.
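As a rough sketch of the calculation just described, the following Python snippet averages absolute deviations from the mean. The function name `mean_deviation` is hypothetical, and the scores used are simply the eight scores tabulated later in this chapter for the variance examples, borrowed here only for illustration.

```python
def mean_deviation(scores):
    """Average of the absolute deviations of the scores from their mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n


# The six signed deviations quoted in the text: signs are ignored when adding.
deviations = [4, -9, 2, 5, -1, -3]
total_abs = sum(abs(d) for d in deviations)

# Illustrative scores (assumption: reused from the later variance example).
scores = [23, 21, 18, 17, 16, 15, 14, 12]
md = mean_deviation(scores)
print(total_abs, md)
```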
The computation of mean deviation will be clearer by taking an example.
Find the mean deviation for the following set of observations:
Solution: In order to find mean deviation, we first calculate the mean for the given set of observations.
Here,
The formula for computing the mean deviation is
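In the case of grouped data, the formula for calculating the mean deviation is as follows:
$$MD = \frac{\sum f\,|X - \bar{X}|}{N}$$
where X is the mid-point of a class interval, f is its frequency and N is the total number of observations.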
Computation of mean deviation will be clear from the following example:
Find the mean deviation for the following frequency distribution:
Solution: As usual, the mean for the given distribution is first calculated to get the mean deviation.
Here,
5. Find the mean deviation for the following frequency distribution:
Scores | No. of Students |
---|---|
60–62 | 5 |
63–65 | 18 |
66–68 | 42 |
69–71 | 27 |
72–74 | 8 |
The most widely used measures for showing the variability of a set of scores are variance and standard deviation (SD). The variance is defined as the average of the squares of deviations of the observations from the arithmetic mean. The standard deviation is defined as the positive square root of the arithmetic mean of the squares of deviations of the observations from the arithmetic mean. It is also called the ‘root mean square deviation from the mean’ and is generally denoted by the small Greek letter σ (sigma).
There are two ways of computing variance and standard deviation (SD) for ungrouped data.
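Direct Method. The formulae for finding the variance and SD for a set of scores are as follows:
$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}, \qquad \sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
where X is the individual score, $\bar{X}$ is the arithmetic mean and N is the total number of observations.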
The steps involved in the computation procedure may be listed as follows.
The following example will show how these steps are followed.
Find the variance and SD for the following ungrouped data:
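The ‘raw score’ formulae for variance and SD are
$$\sigma^2 = \frac{\sum X^2}{N} - \bar{X}^2, \qquad \sigma = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2}$$
where ΣX² is the sum of squares of raw scores, $\bar{X}^2$ is the square of the arithmetic mean of the given data and N is the total number of observations.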
The following steps are involved in the computation procedure of variance and SD:
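Calculate the arithmetic mean of the given data.
Square each raw score and add the squares to get ΣX².
Substitute the values of ΣX², N and the mean in the formula to obtain variance.
Take the positive square root of the variance to obtain SD.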
These steps will be clear from the following example
Find the variance and SD for the data given in Example 2.6 through the ‘raw score’ formulae.
Solution:
S. No. | X | X² |
---|---|---|
1 | 23 | 529 |
2 | 21 | 441 |
3 | 18 | 324 |
4 | 17 | 289 |
5 | 16 | 256 |
6 | 15 | 225 |
7 | 14 | 196 |
8 | 12 | 144 |
N = 8 | Σ X = 136 | Σ X² = 2404 |
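Substituting the values in the formula
$$\sigma^2 = \frac{2404}{8} - \left(\frac{136}{8}\right)^2 = 300.5 - 289 = 11.5, \qquad \sigma = \sqrt{11.5} \approx 3.39$$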
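The agreement between the direct and ‘raw score’ methods can be checked with a short Python sketch using the eight scores from the table above. The variable names are illustrative only.

```python
import math

scores = [23, 21, 18, 17, 16, 15, 14, 12]
n = len(scores)
mean = sum(scores) / n

# Direct method: average squared deviation from the arithmetic mean.
var_direct = sum((x - mean) ** 2 for x in scores) / n

# Raw-score method: mean of the squares minus the square of the mean.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
var_raw = sum_x2 / n - (sum_x / n) ** 2

sd = math.sqrt(var_raw)
print(sum_x, sum_x2, var_direct, var_raw, round(sd, 2))
```

Both routes divide by N rather than N − 1, so they correspond to the population variance; Python's standard `statistics.pvariance` and `statistics.pstdev` compute the same quantities.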
Short-Cut Method. In most cases, the arithmetic mean of the given data happens to be a fractional value, and the process of taking deviations and squaring them then becomes tedious and time-consuming in the computation of variance and SD. To facilitate computation in such situations, the deviations may be taken from an assumed mean. The short-cut formula for calculating SD then becomes
$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$$
where d is the deviation of the variate from an assumed mean, say a, i.e. d = (X − a), d² is the square of the deviation, Σd is the sum of the deviations, Σd² is the sum of the squared deviations and N is the total number of variates.
The following are the steps of this method for calculating variance and SD:
Take some assumed mean, say a (as in the case of calculating arithmetic mean).
The computation procedure is clarified in the following example.
Find the variance and SD by short-cut method for the data given in Example 2.6.
Solution:
Let us take the assumed mean = 18.
(We can take other values also as assumed mean.)
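A minimal Python sketch of the short-cut computation follows, using the eight scores of Example 2.6 and the assumed mean 18. The function name `shortcut_variance` is hypothetical; the second call, with an assumed mean of 15, is added only to illustrate that the choice of assumed mean does not change the result.

```python
import math


def shortcut_variance(scores, a):
    """Variance from deviations d = X - a taken about an assumed mean a."""
    n = len(scores)
    d = [x - a for x in scores]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    variance = sum_d2 / n - (sum_d / n) ** 2
    return sum_d, sum_d2, variance


scores = [23, 21, 18, 17, 16, 15, 14, 12]
sum_d, sum_d2, variance = shortcut_variance(scores, 18)
sd = math.sqrt(variance)
print(sum_d, sum_d2, variance, round(sd, 2))

# A different assumed mean gives the same variance.
_, _, variance_alt = shortcut_variance(scores, 15)
print(variance_alt)
```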
Examples 2.7 and 2.8 show that the variance (11.5) and SD (3.39) are the same by both the raw score and the short-cut methods.
For calculating variance and standard deviation (SD) for grouped data, there are two methods analogous to ungrouped data:
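Direct Method. This method uses the actual arithmetic mean while considering deviations of the given observations. The formulae for calculating variance and SD are as follows:
$$\sigma^2 = \frac{\sum f (X - \bar{X})^2}{N}, \qquad \sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}}$$
where X is the mid-point of a class interval, f is its frequency, $\bar{X}$ is the arithmetic mean and N is the total number of observations.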
The method can be described by the following steps:
The computation procedure is clarified in the following example.
Find the variance and SD for the following frequency distribution.
Solution:
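The method illustrated above is quite tedious. Let us consider the ‘raw score’ formulae for grouped data to compute variance and SD, as already discussed for ungrouped data. The ‘raw score’ formulae are
$$\sigma^2 = \frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2, \qquad \sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2}$$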
The steps involved in the computation procedure may be listed as
The following example will clear these steps.
Find the variance and SD for the frequency distribution given in Example 2.9 through ‘raw score’ formulae.
Variance (203.69) and SD (14.3) are the same as in the previous example.
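Short-Cut Method. In the short-cut method, our main aim is to reduce the computations. To simplify the procedure, we take deviations from some assumed mean instead of the actual mean. The calculations can be further simplified by dividing these deviations by the class interval. The formulae for calculating variance and SD by this method are
$$\sigma^2 = i^2\left[\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2\right], \qquad \sigma = i\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$$
where d is the deviation from the assumed mean in units of the class interval, i.e. d = (X − a)/i, X is the mid-point of a class, N is the total number of observations and i is the class interval.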
Computation by this method can be described by the following steps:
Use the above formula to calculate variance and SD. The following example will illustrate how these steps are followed.
Use the short-cut method to compute variance and SD for the frequency distribution given in Example 2.9.
Solution: First, in the middle of the distribution choose an estimated mean class. We choose 50–54 class. However, other classes can also be chosen. The mid-point of this class is 52, which will be the assumed mean a.
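Here, Σfd² = 334, Σfd = 18, N = 40 and i = 5.
Substituting these values in the formula
$$\sigma^2 = 25\left[\frac{334}{40} - \left(\frac{18}{40}\right)^2\right] = 25\,(8.35 - 0.2025) = 203.69, \qquad \sigma = \sqrt{203.69} \approx 14.3$$
This illustrates that the short-cut method reduces the calculations.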
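The step-deviation arithmetic can be sketched in Python as below. The frequency table of Example 2.9 is not reproduced in this text, so the sketch assumes it is the 40-case distribution tabulated in the percentile example later in the chapter; that assumption is made only because this table reproduces the stated sums Σfd = 18 and Σfd² = 334 with the assumed mean 52 and class interval 5.

```python
import math

# Assumed distribution: (lower score limit, upper score limit, frequency).
classes = [(25, 29, 2), (30, 34, 1), (35, 39, 5), (40, 44, 3), (45, 49, 5), (50, 54, 3),
           (55, 59, 5), (60, 64, 6), (65, 69, 3), (70, 74, 4), (75, 79, 2), (80, 84, 1)]

a = 52   # assumed mean: mid-point of the 50-54 class
i = 5    # size of the class interval

n = sum(f for _, _, f in classes)
sum_fd = 0
sum_fd2 = 0
for lower, upper, f in classes:
    midpoint = (lower + upper) / 2
    d = (midpoint - a) / i          # deviation in class-interval units
    sum_fd += f * d
    sum_fd2 += f * d * d

variance = i ** 2 * (sum_fd2 / n - (sum_fd / n) ** 2)
sd = math.sqrt(variance)
mean = a + i * sum_fd / n           # the actual mean recovered from the same sums
print(n, sum_fd, sum_fd2, round(variance, 2), round(sd, 2), mean)
```

Because every mid-point differs from the assumed mean by a whole number of class intervals, the deviations d are small integers, which is where the saving in hand computation comes from.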
6. The variance of a distribution is 9. What will be the standard deviation?
7. For the following list of test scores:
8. The following are the data given in the form of frequency distribution:
Scores | f |
---|---|
60–69 | 1 |
50–59 | 4 |
40–49 | 10 |
30–39 | 15 |
20–29 | 8 |
10–19 | 2 |
If two frequency distributions have means $\bar{X}_1$ and $\bar{X}_2$ and standard deviations σ1 and σ2 respectively, then the combined variance, denoted by $\sigma_{12}^2$, and the combined SD, denoted by σ12, of the two distributions are obtained by using
$$\sigma_{12}^2 = \frac{N_1(\sigma_1^2 + D_1^2) + N_2(\sigma_2^2 + D_2^2)}{N_1 + N_2}, \qquad \sigma_{12} = \sqrt{\sigma_{12}^2}$$
where N1 is the total number of observations in the first frequency distribution, N2 is the total number of observations in the second frequency distribution, σ1 is the SD of the first frequency distribution, σ2 is the SD of the second frequency distribution, D1 = ($\bar{X}_{12} - \bar{X}_1$) is the difference between the combined mean and the mean of the first frequency distribution and D2 = ($\bar{X}_{12} - \bar{X}_2$) is the difference between the combined mean and the mean of the second frequency distribution.
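The combination of two groups can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `combine_groups` is hypothetical, the combined mean is taken as the size-weighted mean of the two group means, and each group's variance is increased by the squared distance of its mean from the combined mean before averaging. The numbers are the two samples quoted in the exercise later in this section (N = 150, mean 120, SD 20; N = 75, mean 126, SD 22).

```python
import math


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean, variance and SD of two groups from their summaries."""
    n = n1 + n2
    mean12 = (n1 * mean1 + n2 * mean2) / n      # size-weighted combined mean
    d1 = mean12 - mean1                         # D1
    d2 = mean12 - mean2                         # D2
    var12 = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n
    return mean12, var12, math.sqrt(var12)


mean12, var12, sd12 = combine_groups(150, 120, 20, 75, 126, 22)
print(mean12, var12, round(sd12, 2))
```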
The formula can be extended to any number of distributions. An example will illustrate the use of the formula.
We are given the means and SDs on an achievement test for two classes differing in size. Find the variance and SD of the combined group data given as follows:
Solution: First, we find the combined mean by using the following formula
$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$
Now, by substituting the values in the following formula:
we get
9. In Sample A (N = 150), $\bar{X}$ = 120 and σ = 20; in Sample B (N = 75), $\bar{X}$ = 126 and σ = 22. What are the mean and SD of A and B when combined into one distribution of 225 cases?
10. What is standard deviation? Why do we calculate SD?
The data are first tabulated in a frequency distribution. The next step is to analyse the data. It can be done by finding measures of central tendency and variability. But for quick and easy understanding, the data can be represented graphically. The following are the methods of graphical presentation of the data:
The histogram is the most popular and widely used method of presenting a frequency distribution graphically.
Steps:
Represent the following data by means of a histogram
Class Interval | Frequency (f) |
---|---|
90–94 | 1 |
85–89 | 4 |
80–84 | 2 |
75–79 | 8 |
70–74 | 9 |
65–69 | 14 |
60–64 | 6 |
55–59 | 6 |
50–54 | 4 |
45–49 | 3 |
40–44 | 3 |
 | N = 60 |
A frequency polygon is a many-sided figure representing the graph of a frequency distribution. In a frequency polygon, a mid-point of the class interval represents the entire interval. The frequency of the interval is drawn against the mid-point of the interval. The assumption is that all the scores are centred at the mid-point of the interval.
The following are the steps of construction:
Sometimes we find irregularities in the frequency distribution, or the data come from a small sample. The frequency polygon of such distributions is jagged. To remove the irregularities and get a clearer perception of the data, the frequency polygon may be smoothed as shown in the figure. To smooth the polygon, running averages of frequencies are taken as new, adjusted or smoothed frequencies. To find the smoothed frequency of an interval, we add the f of the given interval and the fs of the two adjacent intervals and divide the sum by 3. For example, the smoothed frequency of interval 45–49 is (3 + 3 + 4)/3 = 3.33.
The process is illustrated as
Class Interval | Frequency (f) | Smoothed (f) |
---|---|---|
90–94 | 1 | (0 + 1 + 4)/3 = 1.67 |
85–89 | 4 | (1 + 4 + 2)/3 = 2.33 |
80–84 | 2 | (4 + 2 + 8)/3 = 4.67 |
75–79 | 8 | (2 + 8 + 9)/3 = 6.33 |
70–74 | 9 | (8 + 9 + 14)/3 = 10.33 |
65–69 | 14 | (9 + 14 + 6)/3 = 9.67 |
60–64 | 6 | (14 + 6 + 6)/3 = 8.67 |
55–59 | 6 | (6 + 6 + 4)/3 = 5.33 |
50–54 | 4 | (6 + 4 + 3)/3 = 4.33 |
45–49 | 3 | (4 + 3 + 3)/3 = 3.33 |
40–44 | 3 | (3 + 3 + 0)/3 = 2 |
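The running average used for smoothing can be sketched in Python as follows. The function name `smooth` is hypothetical; as in the table, a frequency of 0 is assumed for the (non-existent) intervals just beyond each end of the distribution.

```python
# Frequencies of the histogram example, lowest interval (40-44) first.
freqs = [3, 3, 4, 6, 6, 14, 9, 8, 2, 4, 1]


def smooth(freqs):
    """Three-point running average, treating the intervals beyond both ends as 0."""
    padded = [0] + list(freqs) + [0]
    return [(padded[k - 1] + padded[k] + padded[k + 1]) / 3
            for k in range(1, len(padded) - 1)]


smoothed = smooth(freqs)
print([round(v, 2) for v in smoothed])
```

The list is ordered from the lowest interval upward, so it reads bottom-to-top relative to the table.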
The cumulative frequency of an interval is found by adding its frequency to the sum of the frequencies of all the intervals below it. We start from the bottom, that is, from the lowest interval, when cumulating the frequencies. We then plot the cumulative frequencies, instead of the respective frequencies, against the intervals. It is to be noted that in a cumulative frequency curve, each cumulative frequency is plotted against the upper limit of its interval. The process of finding cumulative frequencies and drawing the curve is illustrated below. The cumulative frequency curve starts at the lowest interval touching the X-axis, rises gradually and becomes almost parallel to the X-axis after reaching the highest point.
Class Interval | Frequency (f) | Cumulative Frequency (Cf) |
---|---|---|
90–94 | 1 | 59 + 1 = 60 |
85–89 | 4 | 55 + 4 = 59 |
80–84 | 2 | 53 + 2 = 55 |
75–79 | 8 | 45 + 8 = 53 |
70–74 | 9 | 36 + 9 = 45 |
65–69 | 14 | 22 + 14 = 36 |
60–64 | 6 | 3 + 3 + 4 + 6 + 6 = 22 |
55–59 | 6 | 3 + 3 + 4 + 6 = 16 |
50–54 | 4 | 3 + 3 + 4 = 10 |
45–49 | 3 | 3 + 3 = 6 |
40–44 | 3 | 3 |
 | N = 60 | |
Figure 2.4 Cumulative frequency curve
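The cumulation described above amounts to a running total taken from the lowest interval upward. A minimal Python sketch follows; pairing each total with the true upper limit (the upper score limit plus 0.5) is an assumption about how the points of the curve are placed, since the text says only "upper limit".

```python
from itertools import accumulate

# (lower score limit, upper score limit, frequency), lowest interval first.
classes = [(40, 44, 3), (45, 49, 3), (50, 54, 4), (55, 59, 6), (60, 64, 6), (65, 69, 14),
           (70, 74, 9), (75, 79, 8), (80, 84, 2), (85, 89, 4), (90, 94, 1)]

freqs = [f for _, _, f in classes]
cum_freqs = list(accumulate(freqs))   # running totals from the bottom interval upward

# Assumption: each point is plotted at the true upper limit of its interval.
points = [(upper + 0.5, cf) for (_, upper, _), cf in zip(classes, cum_freqs)]

for x, cf in points:
    print(x, cf)
```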
All the measures of dispersion discussed so far have units. If two series differ in their units of measurement, their variability cannot be compared by any of the measures discussed so far. Also, the size of a measure of dispersion depends upon the size of the values. Hence, in situations where the two series have different units of measurement, or their means differ considerably in size, the coefficient of variation should be used as a measure of dispersion.
It is sometimes called the coefficient of relative variability. It is a unitless measure of dispersion and also takes into account the size of the means of the two series. It is the best measure to compare the variability of two series or sets of observations. A series with a smaller coefficient of variation is considered more consistent.
The coefficient of variation of a series of variate values is the ratio of the standard deviation to the mean, multiplied by 100.
If σ is the standard deviation and $\bar{X}$ is the mean of the set of values, the coefficient of variation is
$$CV = \frac{\sigma}{\bar{X}} \times 100$$
An example will illustrate the use of the formula.
From the data given in Example 2.9, calculate the coefficient of variation.
Solution: We have already calculated $\bar{X}$ = 54.25 and SD = 14.3 from the data given in the example. Therefore, by using the formula, we get
$$CV = \frac{14.3}{54.25} \times 100 \approx 26.36$$
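For completeness, the same ratio in Python, as a small sketch with a hypothetical function name and the mean and SD quoted in the solution:

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


cv = coefficient_of_variation(14.3, 54.25)
print(round(cv, 2))
```

Because the result is a pure percentage, the same function can be applied to two series measured in different units and the outputs compared directly.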
A percentile is a measure used to indicate the relative position of a single item or individual within the group to which that item or individual belongs. In other words, it is used to tell the relative position of a given score among other scores. A percentile refers to a point in a distribution of scores or values below which a given percentage of the cases occur.
The percentile is named for the percentage of cases below it. Thus, 67 per cent of the observations are below the sixty-seventh percentile, which is written as P67. The middle of a distribution, the point below which 50 per cent of the cases lie, is the fiftieth percentile P50; this is the median, which has been discussed in detail earlier. Similarly, P25 and P75 are the first quartile Q1 and the third quartile Q3, respectively, which have been discussed previously in this section.
When we wish to compute a percentile, we determine the score below which a given per cent of cases will fall. First, the class in which the pth percentile lies is identified. This is the class in which the (pN/100)th case falls. The formula for computing the percentile is as follows:
$$P_p = L + \frac{\frac{pN}{100} - Cf}{f} \times i$$
where L is the true lower limit of the pth percentile class, N is the total number of cases, Cf is the ‘less than’ cumulative frequency of the class preceding the percentile class, f is the frequency of the percentile class and i is the class interval.
The computation procedure is clarified in the following example.
Find the 45th percentile, P45, for the following frequency distribution:
Solution:
Class Interval | Frequency (f) | ‘Less than’ Cumulative Frequency (Cf) |
---|---|---|
80–84 | 1 | 40 |
75–79 | 2 | 39 |
70–74 | 4 | 37 |
65–69 | 3 | 33 |
60–64 | 6 | 30 |
55–59 | 5 | 24 |
50–54 | 3 | 19 |
45–49 | 5 | 16 |
40–44 | 3 | 11 |
35–39 | 5 | 8 |
30–34 | 1 | 3 |
25–29 | 2 | 2 |
 | N = 40 | |
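Since 45N/100 = (45 × 40)/100 = 18, and the 18th case lies in the 50–54 class, the P45 class is 50–54.
Therefore, the actual lower limit of the class is L = 49.5,
Cf = 16,
f = 3,
i = 5.
Substituting the values in the formula
$$P_{45} = 49.5 + \frac{18 - 16}{3} \times 5 = 49.5 + 3.33 = 52.83$$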
In this frequency distribution, 52.8 is the point below which 45 per cent of cases will fall.
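The percentile computation can be sketched in Python with the frequency table of this example. The function name `grouped_percentile` is hypothetical, and the true lower limit of each class is assumed to be its lower score limit minus 0.5.

```python
# (lower score limit, upper score limit, frequency), lowest class first.
classes = [(25, 29, 2), (30, 34, 1), (35, 39, 5), (40, 44, 3), (45, 49, 5), (50, 54, 3),
           (55, 59, 5), (60, 64, 6), (65, 69, 3), (70, 74, 4), (75, 79, 2), (80, 84, 1)]


def grouped_percentile(classes, p):
    """Interpolated score below which p per cent of the cases fall."""
    n = sum(f for _, _, f in classes)
    target = p * n / 100               # the (pN/100)th case
    cf = 0                             # 'less than' cumulative frequency so far
    for lower, upper, f in classes:
        if f > 0 and cf + f >= target:
            true_lower = lower - 0.5   # true lower limit L
            width = upper - lower + 1  # class interval i
            return true_lower + (target - cf) / f * width
        cf += f
    raise ValueError("p must lie between 0 and 100")


p45 = grouped_percentile(classes, 45)
print(round(p45, 2))
```

Calling the same function with p = 25, 50 or 75 gives Q1, the median and Q3 of the distribution, which reflects the relationship between percentiles and quartiles noted above.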
The percentile rank of a given score in a distribution is the per cent of the total scores which fall below the given score. A percentile rank then indicates the position of a score in a distribution in percentile terms. If for example, a student had a score which was higher than 70 per cent of the scores in the distribution, but not higher than 71 per cent, his percentile rank would be 70.
To compute the percentile rank for a score from grouped data, we need to determine the number of cases below the score in order to determine what per cent of the total cases fall below that score. The formula for computing percentile rank is
$$PR = \frac{100}{N}\left[Cf + \frac{X - L}{i} \times f\right]$$
where X is the given score and the other symbols have the same meaning as in the case of percentiles.
Let us explain this in the following example.
Find the percentile rank for the score of 63 in the frequency distribution given in Example 2.15.
Solution: Let us consider the values that will be substituted into the formula:
X = 63 (the raw score given in the problem),
L = 59.5 (since the score 63 falls in the class 60–64, for which the real lower limit is 59.5),
i = 5,
f = 6,
Cf = 24,
N = 40.
The computation is as follows:
$$PR = \frac{100}{40}\left[24 + \frac{63 - 59.5}{5} \times 6\right] = 2.5\,(24 + 4.2) = 70.5$$
This answer indicates that 70.5 per cent of the cases fall below the score 63. Usually, rounding to the nearest whole number would make the percentile rank 71 in this problem.
Note: This formula can also be used for finding percentile ranks from simple frequency distributions by assigning i = 1.
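A minimal Python sketch of the percentile-rank computation follows, using the same frequency table as the worked example. The function name `percentile_rank` is hypothetical, and true class limits are again assumed to lie 0.5 below the lower score limit.

```python
# (lower score limit, upper score limit, frequency), lowest class first.
classes = [(25, 29, 2), (30, 34, 1), (35, 39, 5), (40, 44, 3), (45, 49, 5), (50, 54, 3),
           (55, 59, 5), (60, 64, 6), (65, 69, 3), (70, 74, 4), (75, 79, 2), (80, 84, 1)]


def percentile_rank(classes, score):
    """Per cent of cases falling below `score`, interpolating within its class."""
    n = sum(f for _, _, f in classes)
    cf = 0                                 # cases in the classes below the score's class
    for lower, upper, f in classes:
        true_lower = lower - 0.5           # real lower limit L
        width = upper - lower + 1          # class interval i
        if true_lower <= score < true_lower + width:
            within = (score - true_lower) / width * f
            return 100 / n * (cf + within)
        cf += f
    raise ValueError("score lies outside the distribution")


pr = percentile_rank(classes, 63)
print(round(pr, 1))
```

This is the inverse of the percentile calculation: a percentile starts from a percentage and returns a score, whereas a percentile rank starts from a score and returns a percentage.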