8.13 Summary

  • A statistical hypothesis test answers a question that contains a statistical parameter and a word expressing a difference. “Is the sample mean different from the target value?” is an example of such a question.
  • The null hypothesis assumes that any apparent difference is due to sampling error – natural, random variation in the data. It is coupled to a reference distribution that describes all possible outcomes of the experiment if it is completely governed by chance.
  • To ensure that this reference distribution is valid, the order of the experimental treatments must be randomized. Otherwise, drift in the conditions could produce apparent effects that would be wrongly attributed to the experimental treatments.
  • The alternative hypothesis is the opposite of the null hypothesis and generally represents an effect that we want the experiment to reveal. If the significance test discredits the null hypothesis, the alternative hypothesis is supported and we say that the result is statistically significant.
  • The significance level is the risk of making a Type I error, that is, the probability of rejecting the null hypothesis when it is true. A Type II error is failing to reject the null hypothesis when it is false. The power is the probability of avoiding that error, that is, of correctly rejecting a false null hypothesis.
  • The one-sample t-test is used for comparing a sample mean to a target value. The two-sample t-test is used for comparing the means of two independent samples to each other. If the samples are dependent, the paired t-test is used, which is a special case of the one-sample t-test.
  • Three or more sample means are compared using ANOVA. It is based on breaking the variation in the data down into several components and comparing them to each other. If the variation in the data is due to two variables (or factors) the two-way ANOVA is used.
  • The correlation coefficient is a measure of the strength and direction of the linear association between two variables. The existence of a strong correlation between two variables may indicate a direct dependence between them, but it is important to keep in mind that correlation is not the same thing as causation.
  • A regression model can be used to describe the relationship between numerical variables. It is a mathematical function fitted to the data. The coefficient of determination, R², is a measure of how well the model describes the variation in the data. Residual plots are used to assess the quality of the model. The residuals should be normally distributed and have constant variance.
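
Two of the points above can be illustrated with a short sketch: the paired t-test as a one-sample t-test on the pairwise differences, and R² for a straight-line fit. The code below is only an illustration using Python's standard library. The function names and the data values are made up for this example and are not taken from the chapter. It computes the t statistic and the degrees of freedom but no p-value, since that would need the t distribution from a statistics package or a table.

```python
import math
import statistics


def one_sample_t(sample, target):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - target) / (sd / math.sqrt(n))
    return t, n - 1


def paired_t(before, after):
    """Paired t-test: a one-sample t-test on the pairwise differences against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)


def r_squared(x, y):
    """Fit a least-squares straight line; return R^2 and the residuals."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)   # variation the model leaves unexplained
    ss_tot = sum((yi - my) ** 2 for yi in y)  # total variation around the mean
    return 1.0 - ss_res / ss_tot, residuals


# Hypothetical measurements, invented for illustration only.
before = [10.0, 12.0, 9.0, 11.0]
after = [11.0, 14.0, 10.0, 13.0]
t_stat, dof = paired_t(before, after)
print("paired t statistic:", t_stat, "degrees of freedom:", dof)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
r2, residuals = r_squared(x, y)
print("R squared:", r2)
print("residuals:", residuals)
```

The t statistic is compared with the t distribution for the stated degrees of freedom to decide whether the result is significant at the chosen significance level. The residuals returned by r_squared are the values one would inspect in a residual plot.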
