7.9 Summary

  • Statistics is a way of coping with the variation that exists everywhere and of stating, with confidence, how uncertain we are about our conclusions.
  • Methods for summarizing essential features of data are called descriptive statistics. They include measures of central tendency and variation, as well as graphical tools for depicting data.
  • Populations consist of every possible observation and are generally impossible to obtain. In practice, we always use samples of limited size to represent the population. The population and sample standard deviations are denoted σ and s, respectively. The population and sample means are denoted μ and x̄, respectively.
  • Random sampling means that every possible observation of a certain condition occurs with equal probability. Since random sampling is a central assumption in statistical analysis, it is of great importance that we collect data in a way that produces random samples. This is often more difficult than it seems.
  • A probability density function f(x) describes the probability of finding a random variable in an infinitesimally thin interval between the values x and x + dx. This probability is equal to f(x)dx.
  • A cumulative distribution function F(x) describes the probability that a random variable is less than or equal to the value x. F(x) never decreases as x increases and takes values between zero and one.
  • The standard normal distribution has mean zero and standard deviation one. Since any normally distributed variable can be transformed into one that follows the standard normal distribution, it is the only distribution we need to analyze normally distributed data. Most tables of normal probabilities are based on the standard normal distribution.
  • When sampling from an unknown population that has mean μ and standard deviation σ we expect, for sufficiently large sample sizes, n, that the sample means will tend to follow a normal distribution with mean μ and standard deviation σ/√n. This is the central limit effect and it explains why the normal distribution is of such general value in statistics.
  • In normal probability plots, the values of a sample are plotted against a theoretical normal distribution. A normally distributed sample will thereby plot approximately on a straight line.
  • Due to sampling error, there is some uncertainty involved in estimating a population parameter from a sample of limited size. It is often useful instead to estimate an interval that has a specified probability of containing the parameter. Such intervals are called confidence intervals.
  • The t-distribution is a family of normal-shaped distributions that are used for estimating the mean of a population from a small sample. The standard deviation of the t-distribution decreases when the sample size increases.
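
Three of the points above can be checked numerically. The following is a minimal sketch in Python, using only the standard library, and not a definitive implementation. All numbers in it are assumptions chosen for illustration: the population parameters, the exponential population used for the central limit effect, the ten sample values, and the t-value 2.262, which is the two-sided 95 % value for 9 degrees of freedom as found in a standard t-table.

```python
import math
import random
import statistics
from statistics import NormalDist

# 1. Standardization: z = (x - mu) / sigma maps a normal variable onto the
#    standard normal distribution, so both routes give the same probability.
mu, sigma = 10.0, 2.0            # hypothetical population parameters
x = 12.0
z = (x - mu) / sigma
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

# 2. Central limit effect: means of samples drawn from a clearly non-normal
#    population (exponential with rate 1, which has mean 1 and sd 1).
rng = random.Random(0)           # fixed seed so the run is repeatable
pop_mu, pop_sigma = 1.0, 1.0
n = 30                           # size of each sample
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
         for _ in range(5000)]
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_sd = pop_sigma / math.sqrt(n)   # sigma / sqrt(n)

# 3. Confidence interval for the mean of a small sample: x_bar +/- t * s / sqrt(n).
sample = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.5, 9.7, 10.2]  # hypothetical data
t_crit = 2.262                   # assumed table value: 95 %, 9 degrees of freedom
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)     # sample standard deviation (n - 1 in the denominator)
half_width = t_crit * s / math.sqrt(len(sample))
ci = (x_bar - half_width, x_bar + half_width)

print(f"P(X <= {x}): direct {p_direct:.4f}, via z = {z:.1f}: {p_standard:.4f}")
print(f"sample means: mean {mean_of_means:.3f}, sd {sd_of_means:.3f}, "
      f"sigma/sqrt(n) {expected_sd:.3f}")
print(f"95 % confidence interval for the mean: {ci[0]:.2f} to {ci[1]:.2f}")
```

With a different seed the simulated values change slightly, but the standard deviation of the sample means should stay close to σ/√n. Increasing n narrows both the spread of the sample means and the confidence interval.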

