Visualizing simple regressions with fitted line plots

The fitted line plot is a simple regression tool in Minitab that produces a scatterplot with a least squares regression line fitted to the data. Beyond the regression fits available through the scatterplot tools, it also reports the analysis of variance and the R-squared statistics.
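The least squares line is the straight line that minimizes the sum of squared vertical distances between the data points and the line. Its slope is Sxy / Sxx and it passes through the point of means. The Python sketch below computes the line in closed form. The function name `least_squares_line` and the sample values are hypothetical and made up for illustration. They are not Minitab output or Oxford data.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line y = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centred cross-product of x and y, and centred sum of squares of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("the predictor has no variation")
    slope = s_xy / s_xx
    # The least squares line always passes through (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative values, NOT the Oxford station data
sun_hours = [50.0, 80.0, 120.0, 160.0, 200.0]
max_temp = [7.0, 9.5, 13.0, 17.5, 21.0]
b0, b1 = least_squares_line(sun_hours, max_temp)
print(f"max_temp = {b0:.3f} + {b1:.4f} * sun_hours")
```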

The fitted line plot handles only a single predictor. Models with multiple predictors are better fitted with Regression… or General Regression….

We will use the data from the Oxford weather station to investigate the relationship between the mean maximum temperature and hours of sunlight per month.

Getting ready

We will use the data from the Oxford weather station in this example. This data is from the Met Office website and is found at the following location:

http://www.metoffice.gov.uk/climate/uk/stationdata/

Select the Oxford station. The data is also provided in the Oxford data.txt file, which preserves the format from the website. The Oxford weather (Cleaned).MTW Minitab worksheet contains the data already correctly imported into Minitab.
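You may want to inspect the saved text file outside Minitab before importing it. The Python sketch below shows one way to parse it. Several details are assumptions about the Met Office text layout that you should check against your own download:

* the column order after year and month (tmax, tmin, af, rain, sun);
* `---` as the missing-value marker;
* trailing `*` or `#` flags on some values.

The function names and the sample lines are hypothetical.

```python
def _to_value(token):
    """Convert one field to a float; '---' (assumed missing-data marker) becomes None."""
    # Assumption: flagged values (e.g. estimated) carry a trailing '*' or '#'
    token = token.rstrip("*#")
    if token in ("---", ""):
        return None
    return float(token)


def parse_station_text(text):
    """Parse monthly station rows into dicts, skipping header, units and note lines."""
    # Assumed column order after year and month: tmax, tmin, af, rain, sun
    names = ("tmax", "tmin", "af", "rain", "sun")
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with two integers (year, month); anything else is skipped
        if len(tokens) < 7 or not (tokens[0].isdigit() and tokens[1].isdigit()):
            continue
        row = {"year": int(tokens[0]), "month": int(tokens[1])}
        for name, token in zip(names, tokens[2:7]):
            row[name] = _to_value(token)
        rows.append(row)  # any trailing words (notes) are ignored
    return rows


# Made-up lines in the assumed layout, NOT real Oxford records
sample = """Hypothetical station
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   1900   1    6.0     1.0      10    50.0     ---
   1950   7   21.0*   12.0       0    40.0#  200.0
"""
rows = parse_station_text(sample)
for row in rows:
    print(row)
```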

How to do it…

The following steps plot the relationship between the hours of sunlight and the mean maximum temperature, fit a least squares regression line, and generate the analysis of variance statistics.

  1. Follow the given link to the Met Office weather station site.
  2. Choose the Oxford station.
  3. In your web browser, save the file as a text file.
  4. In Minitab, go to File and select Open Worksheet….
  5. Change Files of type: to Text (*.txt).
  6. Select the file that we have just saved or the provided Oxford Data.txt file.
  7. Go to the Stat menu, then go to Regression and select Fitted Line Plot….
  8. In the Response(Y): section, enter the column for the mean maximum temperature.
  9. In the Predictor(X): section, enter the column for Sun(hours).
  10. Select Options… and check the boxes for confidence and prediction intervals.
  11. Click on OK.
  12. Select Graphs… and choose Four in one for the residual plots.
  13. Click on OK in each dialog box.

How it works…

The fitted line plot appears as a graph page displaying the scatterplot and the least squares regression line. The session window also shows an Analysis of Variance table, along with the regression equation and the R-squared and adjusted R-squared values.

We should be careful when interpreting R-squared values. They do not measure the quality of the model; they report the proportion of the variation in the data that the model explains. In this study, 67 percent of the variation in the mean maximum temperature is accounted for by the hours of sunlight. Rather than chasing R-squared values of 80 or 90 percent, we should consider what the value implies for the problem at hand.
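The session window statistics all follow from three sums of squares, related as follows:

* SS total = SS regression + SS error
* R-sq = SS regression / SS total
* R-sq(adj) = 1 - MS error / (SS total / (n - 1))

The Python sketch below computes these quantities for a simple regression. The function name and dictionary keys are hypothetical. They were chosen to echo Minitab's labels (S, R-sq, R-sq(adj)), not to reproduce its output layout, and the data are made up.

```python
import math


def regression_anova(x, y):
    """Simple linear regression ANOVA quantities, as summarised by a fitted line plot."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    ss_total = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    ss_regression = ss_total - ss_error                          # explained by the line
    df_regression, df_error, df_total = 1, n - 2, n - 1
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error

    return {
        "intercept": intercept,
        "slope": slope,
        "SS_regression": ss_regression,
        "SS_error": ss_error,
        "SS_total": ss_total,
        "F": ms_regression / ms_error,
        "S": math.sqrt(ms_error),  # standard deviation of the residuals
        # Share of variation explained, not a verdict on model quality
        "R_sq": ss_regression / ss_total,
        # Adjusts for the degrees of freedom used by the model
        "R_sq_adj": 1 - ms_error / (ss_total / df_total),
    }


# Made-up illustrative values, NOT the Oxford station data
table = regression_anova([1, 2, 3, 4], [2, 4, 5, 8])
print(f"R-sq = {100 * table['R_sq']:.1f}%, R-sq(adj) = {100 * table['R_sq_adj']:.1f}%")
```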

Accounting for over two-thirds of the variation with a single metric, the hours of sunlight, is a strong result.

We should also be careful not to confuse correlation with causation. Are the hours of sunlight really the cause? Hours of sunlight are themselves affected by the time of year and by weather conditions.

Selecting the options for confidence and prediction intervals places 95 percent confidence interval (CI) and 95 percent prediction interval (PI) lines around the fitted line.
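The two bands answer different questions:

* The CI describes the uncertainty in the mean response at a given predictor value.
* The PI also includes the scatter of a single new observation, so it is always wider.

The Python sketch below computes both intervals at one point x0 using the standard simple-regression formulas. The function name is hypothetical and the data are made up. The t critical value is passed in rather than computed, which keeps the sketch dependency-free.

```python
import math


def intervals_at(x, y, x0, t_crit):
    """Return (fit, ci, pi) at predictor value x0 for a simple linear regression.

    t_crit is the two-sided t critical value with n - 2 degrees of freedom,
    e.g. the 0.975 quantile for 95 percent intervals.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    # S: residual standard deviation with n - 2 degrees of freedom
    s = math.sqrt(sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

    fit = intercept + slope * x0
    h0 = 1 / n + (x0 - x_bar) ** 2 / s_xx  # grows as x0 moves away from x_bar
    # CI: uncertainty in the mean response at x0
    ci_half = t_crit * s * math.sqrt(h0)
    # PI: adds the variance of one new observation, so it is always wider
    pi_half = t_crit * s * math.sqrt(1 + h0)
    return fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


# Made-up values; 4.303 is the 0.975 t quantile for 2 degrees of freedom (n = 4)
fit, ci, pi = intervals_at([1, 2, 3, 4], [2, 4, 5, 8], x0=2.5, t_crit=4.303)
print(fit, ci, pi)
```

If SciPy is available, `t_crit` can be obtained with `scipy.stats.t.ppf(0.975, n - 2)`. Because `h0` grows with the distance from the mean of the predictor, both bands widen towards the ends of the fitted line.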

Residual plots are an important diagnostic for revealing problems with the fit of the model. The Four in one option generates a page displaying the Normal Probability Plot, Histogram, Versus Fits, and Versus Order plots of the residuals. These are used to check the assumptions of the regression model:

* the residuals are normally distributed;
* the residuals have constant variance (homoscedasticity);
* there are no patterns over time.
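To make the four panels concrete, the Python sketch below computes the data each one displays. The function name and the made-up data are hypothetical. The normal probability plot uses plotting positions (i - 0.5) / n, which is one common choice and is assumed here. Minitab's exact plotting positions may differ.

```python
from statistics import NormalDist


def residual_diagnostics(x, y):
    """Compute the data behind four-in-one style residual plots for a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar

    fits = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fits)]
    # Normal probability plot: sorted residuals against standard normal quantiles,
    # using the assumed plotting positions (i - 0.5) / n
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return {
        "normal_plot": list(zip(quantiles, sorted(residuals))),  # roughly straight if normal
        "histogram_values": residuals,                          # look for a symmetric bell shape
        "versus_fits": list(zip(fits, residuals)),              # look for a constant-width band
        "versus_order": list(enumerate(residuals, start=1)),    # look for trends over time
    }


# Made-up illustrative values, NOT the Oxford station data
diag = residual_diagnostics([1, 2, 3, 4], [2, 4, 5, 8])
for obs, resid in diag["versus_order"]:
    print(obs, round(resid, 3))
```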

Here, the residuals versus order plot shows a gap at the start of the results. This is because hours of sunlight were only recorded after 1929.
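A regression can only use months in which both the temperature and the sunshine value are recorded. The Python sketch below models that filtering while keeping each observation's original row number. When residuals are plotted against those row numbers, the excluded early rows show up as a gap at the start. This is an assumed model of the behaviour, not a description of Minitab's internals. The function name and the rows are hypothetical and made up.

```python
def complete_observations(rows, response="tmax", predictor="sun"):
    """Keep rows where both variables are present, remembering their original row numbers."""
    kept = [
        (row_number, row)
        for row_number, row in enumerate(rows, start=1)
        if row.get(response) is not None and row.get(predictor) is not None
    ]
    # Earliest year that can contribute to the fit (None if nothing is usable)
    first_year = min((row["year"] for _, row in kept), default=None)
    return kept, len(rows) - len(kept), first_year


# Made-up rows: sunshine missing (None) for the earliest records
rows = [
    {"year": 1925, "tmax": 8.0, "sun": None},
    {"year": 1928, "tmax": 9.0, "sun": None},
    {"year": 1930, "tmax": 10.0, "sun": 60.0},
    {"year": 1931, "tmax": 12.0, "sun": 90.0},
]
kept, dropped, first_year = complete_observations(rows)
print("row numbers used:", [n for n, _ in kept])
print("rows dropped:", dropped, "| first year with both values:", first_year)
```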

There's more…

Quadratic and cubic models can easily be selected from the main dialog box. When either of these models is selected, the sequential sums of squares are also generated in the session window. The sequential output first shows the sum of squares accounted for by the linear term. It then shows the additional sum of squares accounted for once the quadratic term is added to the model. If the cubic term is included, the output also shows the additional sum of squares accounted for by the cubic term over the quadratic model.
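A sequential sum of squares is the reduction in the error sum of squares when a term is added after all lower-order terms. The Python sketch below computes these reductions with `numpy.polyfit`. The function name is hypothetical, the sketch does not reproduce Minitab's table layout, and the data are made up.

```python
import numpy as np


def sequential_ss(x, y, max_degree=3):
    """Sequential sums of squares for polynomial terms added one degree at a time."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the intercept-only model, whose error SS equals the total SS
    sse_previous = float(np.sum((y - y.mean()) ** 2))
    seq = {}
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, degree)
        sse = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        # Extra variation explained by this degree's term, given the lower-order terms
        seq[degree] = sse_previous - sse
        sse_previous = sse
    return seq


# Made-up values that follow y = x**2 exactly, so the cubic term should add (almost) nothing
seq = sequential_ss([0, 1, 2, 3, 4], [0, 1, 4, 9, 16])
for degree, ss in seq.items():
    print(degree, round(ss, 6))
```

Because each entry is measured against the previous model, the entries add up to the total sum of squares minus the error sum of squares of the highest-degree model.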

See also

  • The Using the Assistant tool to run a regression recipe