7.2 Populations and Samples

A research team may want to investigate how a certain diet affects the blood pressure of healthy people. It would, of course, be impractical to try to convince every healthy person in the world to participate in the study. Instead, the researchers work with a smaller group of people. This group should be representative of all healthy people, so they must be careful to choose its members randomly. For example, they should not choose only people who work in a single profession, practice the same sport, or are of the same age, since people in such groups can be expected to respond to the diet more similarly than healthy people in general do. After testing the selected group, the researchers use statistical techniques to infer how healthy people as a whole respond to the diet.

In this example, the whole group of healthy people is called the population and the selected group is called a sample. A population always consists of the complete set of possible observations; in this case, it includes all healthy people who have ever lived as well as those who are not yet born. This is because we are interested in how any possible person would respond to the diet, not only those alive today. Populations are usually so large that it is practically impossible to work with them in their entirety, so for practical reasons we work with subsets of them, called samples. To be useful in our investigations, these must be random samples: the sampling must be done so that every possible observation has the same probability of being included. A non-random sample cannot be assumed to be representative of the population.
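As a minimal sketch of the equal-probability idea, the Python code below assumes a small, fully listed population (ten hypothetical people labelled `person_0` to `person_9`). Such a complete list is rarely available for the kind of population described above, so the sketch only illustrates the property itself. It draws samples of three with the standard library's `random.sample`, which picks distinct members without replacement, and repeats the draw many times to check that each person is included at roughly the same rate, n/N = 3/10.

```python
import random
from collections import Counter


def simple_random_sample(population, n, rng=random):
    """Draw n distinct members so that every member has the same chance of inclusion."""
    return rng.sample(population, n)


# Hypothetical, fully listed population of ten people.
population = [f"person_{i}" for i in range(10)]

# Repeat the draw many times and count how often each person is included.
rng = random.Random(1)
trials = 20_000
counts = Counter()
for _ in range(trials):
    counts.update(simple_random_sample(population, 3, rng))

# Every inclusion rate should be close to n / N = 3 / 10 = 0.3.
for person in population:
    print(person, round(counts[person] / trials, 3))
```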

Random sampling is a central assumption both when designing and when analyzing many experiments, but it is by no means self-evident that the assumption holds. If we want to measure the flight speed of a certain species of bird, for example, it may prove quite difficult to obtain a random sample. Flying ability varies between the individuals of a flock, depending on age, the state of the plumage and so on. If younger birds were more active and flew more often, simply studying whichever birds happened to be flying while we were watching would give us a sample biased toward young birds. We should choose our sample so that every bird in the flock has the same probability of being studied, and this is no easy task.
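The bird example can be illustrated with a small simulation. Every number in it is invented for illustration: a hypothetical flock of 300 young and 700 older birds, with young birds assumed to be both faster (about 12 m/s against 10 m/s) and far more often airborne (probability 0.6 against 0.1). The sketch compares the flock's true mean speed with the mean of a "whoever is flying" sample and with an equal-probability sample of the same size.

```python
import random
import statistics


def make_flock(rng, n_young=300, n_old=700):
    """Build a hypothetical flock as (speed in m/s, chance of being airborne) pairs."""
    flock = []
    for _ in range(n_young):
        flock.append((rng.gauss(12.0, 1.0), 0.6))  # assumed: faster, flies often
    for _ in range(n_old):
        flock.append((rng.gauss(10.0, 1.0), 0.1))  # assumed: slower, flies rarely
    return flock


rng = random.Random(42)
flock = make_flock(rng)
true_mean = statistics.mean(speed for speed, _ in flock)

# Convenience sample: every bird that happens to be in the air while we watch.
convenience = [speed for speed, p_airborne in flock if rng.random() < p_airborne]

# Random sample of the same size: every bird is equally likely to be chosen.
random_pick = [speed for speed, _ in rng.sample(flock, len(convenience))]

print(f"flock mean:          {true_mean:.2f} m/s")
print(f"whoever is flying:   {statistics.mean(convenience):.2f} m/s")
print(f"equal-probability:   {statistics.mean(random_pick):.2f} m/s")
```

Under these assumptions roughly 70% of the airborne birds are young, so the "whoever is flying" mean overestimates the flock mean, while the equal-probability sample lands close to it.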

Example 7.1: Imagine that you want to determine the average living space per household in a nation. Investigating each and every household would be impractical, so you want to collect a representative, random sample. A colleague proposes that you use the coordinate system on a map, generate coordinates with a random number generator and then select, for each coordinate, the household closest to it. It sounds like a good idea, but after thinking about it you realize that this will not produce a random sample at all. Some households are in densely populated areas, while others are in rural areas. For a city apartment to be chosen, the random coordinate must fall very close to its address, whereas a countryside farm is chosen whenever the coordinate falls anywhere within a much larger surrounding area. In effect, each household is chosen with a probability equal to the share of the map that lies closer to it than to any other household. Since it is reasonable to assume that country houses tend to be larger than city apartments, and country houses have a greater probability of being chosen, such a sample will be biased towards larger living spaces.
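The bias in Example 7.1 can be reproduced with a rough simulation. The country, the city, the household counts and the living spaces are all hypothetical: 900 apartments of 70 m² packed into a small corner of a unit-square map, and 100 farms of 150 m² spread over the whole map. By construction, 90% of households are apartments and the true mean living space is 78 m². The sketch then applies the colleague's method: random map coordinates, with the nearest household taken for each.

```python
import math
import random
import statistics

rng = random.Random(7)

# Hypothetical country: the unit square. A small city covers [0, 0.1] x [0, 0.1].
# Assumed living spaces: 70 m^2 per city apartment, 150 m^2 per farm.
households = []  # list of ((x, y), living_space_m2)
for _ in range(900):  # densely packed city apartments
    households.append(((rng.uniform(0, 0.1), rng.uniform(0, 0.1)), 70.0))
for _ in range(100):  # farms spread over the whole country
    households.append(((rng.uniform(0, 1), rng.uniform(0, 1)), 150.0))


def nearest_household(point):
    """Return the household whose location is closest to a map coordinate."""
    return min(households, key=lambda h: math.dist(point, h[0]))


# The colleague's method: random map coordinates, nearest household to each.
n = 300
map_sample = [nearest_household((rng.random(), rng.random()))[1] for _ in range(n)]

true_mean = statistics.mean(space for _, space in households)
apartment_share_population = sum(space == 70.0 for _, space in households) / len(households)
apartment_share_sample = map_sample.count(70.0) / n

print(f"true mean living space: {true_mean:.1f} m^2")
print(f"map-based sample mean:  {statistics.mean(map_sample):.1f} m^2")
print(f"apartments: {apartment_share_population:.0%} of households, "
      f"{apartment_share_sample:.0%} of the sample")
```

Because only a small part of the map lies closer to an apartment than to a farm, apartments end up as a small minority of the map-based sample even though they make up 90% of the households, and the sample mean comes out far above 78 m².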


Exercise 7.1: Propose a method for drawing a truly random sample from the households in a nation.
