Application

Now that you have completed all of the activities in this chapter, use the concepts and techniques that you've learned to respond to these questions.

  1. Scenario: Return to the NHANES SRS data table.

    1. Exclude and hide respondents under age 18 and all males, leaving only adult females. Perform a regression analysis for BMI and waist circumference for adult women, and report your findings and conclusions.

    2. Is waist measurement a better predictor (in other words, a better fit) of BMI for men or for women?

    3. Perform one additional regression analysis, this time looking only at respondents under the age of 17. Summarize your findings.

  2. Scenario: High blood pressure continues to be a leading health problem in the United States. In this problem, continue to use the NHANES SRS data table. For this analysis, we'll focus on just the following variables:

    • RIAGENDR: respondent's gender

    • RIDAGEYR: respondent's age in years

    • BMXWT: respondent's weight in kilograms

    • BPXPLS: respondent's resting pulse rate

    • BPXSY1: respondent's systolic blood pressure ("top" number in BP reading)

    • BPXD1: respondent's diastolic blood pressure ("bottom" number in BP reading)

    1. Investigate a possible linear relationship of systolic blood pressure versus age. What, specifically, tends to happen to blood pressure as people age? Would you say there is a strong linear relationship?

    2. Perform a regression analysis of systolic and diastolic blood pressure. Explain fully what you have found.

    3. Create a scatterplot of systolic blood pressure and pulse rate. One might suspect that higher pulse rate is associated with higher blood pressure. Does the analysis bear out this suspicion?

  3. Scenario: We'll continue to examine the World Development Indicators data in BirthRate 2005. We'll broaden our analysis to work with other variables in that file:

    • MortUnder5: deaths, children under 5 years per 1,000 live births

    • MortInfant: deaths, infants per 1,000 live births

    1. Create a scatterplot for MortUnder5 and MortInfant. Report the equation of the fitted line and the Rsquare value, and explain what you have found.

  4. Scenario: How do the prices of used cars vary according to the mileage of the cars? Our data table Used Cars contains observational data about the listed prices of three popular compact car models in three different metropolitan areas in the U.S. All of the cars are two years old.

    1. Create a scatterplot of price versus mileage. Report the equation of the fitted line and the Rsquare value, and explain what you have found.

  5. Scenario: Stock market analysts are always on the lookout for profitable opportunities and for signs of weakness in publicly traded stocks. Market analysts make extensive use of regression models in their work, and one of the simplest ones is known as the random (or drunkard's) walk model. Simply put, the model hypothesizes that over a relatively short period of time the price of a particular share of stock is a random deviation from its price on the prior day. If Y_t represents the price at time t, then Y_t = Y_{t-1} + ε. In this problem, you'll fit a random walk model to daily closing prices for McDonald's Corporation for the first six months of 2009 and decide how well the random walk model fits. The data table is called MCD.

    1. Create a scatterplot with the daily closing price on the vertical axis and the prior day's closing price on the horizontal. Comment on what you see in this graph.

    2. Fit a line to the scatterplot, and test the credibility of the random walk model. Report on your findings.
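
For readers who want to see the mechanics of problem 5 outside JMP, the following is a minimal sketch in Python, not the book's procedure. It assumes a synthetic price series generated from the random walk equation itself (these are not the MCD prices), and the helper name fit_line is hypothetical. It regresses each day's price on the prior day's price. Under a pure random walk, one would expect a fitted slope near 1 and an intercept that is small relative to the price level.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x.

    Returns (intercept, slope, r_squared). Hypothetical helper for illustration.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# ASSUMPTION: synthetic stand-in for a closing-price column, built directly
# from Y_t = Y_{t-1} + epsilon. Starting price, step size, and length are
# arbitrary choices, not values from the MCD data table.
random.seed(2009)
prices = [55.0]
for _ in range(123):
    prices.append(prices[-1] + random.gauss(0.0, 0.8))

lagged = prices[:-1]   # Y_{t-1}: prior day's close (horizontal axis)
current = prices[1:]   # Y_t: today's close (vertical axis)

intercept, slope, r_squared = fit_line(lagged, current)
print(f"Y_t = {intercept:.3f} + {slope:.3f} * Y_(t-1), Rsquare = {r_squared:.3f}")
```

With real data, the lagged column takes the place of the simulated one; pairing each price with its predecessor drops the first observation, so the regression uses one fewer row than the table contains.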

  6. Scenario: Franz Joseph Haydn was a successful and well-established composer when the young Mozart burst upon the cultural scene. Haydn wrote more than twice as many piano sonatas as Mozart. Use the data table Haydn to perform a parallel analysis to the one we did for Mozart.

    1. Report fully on your findings from a regression analysis of Parta versus Partb.

    2. How does the fit of this model compare to the fit using the data from Mozart?

  7. Scenario: Throughout the animal kingdom, animals require sleep, and there is extensive variation in the number of hours per day that different animals sleep. The data table called Sleeping Animals contains information for more than 60 mammalian species, including the average number of hours per day of total sleep. This will be the response column in this problem.

    1. Estimate a linear regression model using gestation as the factor. Gestation is the mean number of days that females of these species carry their young before giving birth. Report on your results and comment on the extent to which gestational period is a good predictor of sleep hours.

    2. Now perform a similar analysis using brain weight as the factor. Report fully on your results, and comment on the potential usefulness of this model.

  8. Scenario: For many years, it has been understood that tobacco use leads to health problems related to the heart and lungs. The Tobacco Use data table contains recent data about the prevalence of tobacco use and of certain diseases around the world.

    1. Using cancer mortality (CancerMort) as the response variable and the prevalence of tobacco use in both sexes (TobaccoUse) as the factor, run a regression analysis to decide whether total tobacco use in a country is a predictor of the number of deaths from cancer annually in that country.

    2. Using cardiovascular mortality (CVMort) as the response variable and the prevalence of tobacco use in both sexes (TobaccoUse) as the factor, run a regression analysis to decide whether total tobacco use in a country is a predictor of the number of deaths from cardiovascular disease annually in that country.

    3. Review your findings in the two earlier parts. In this example, we're using aggregated data from entire nations rather than data about individual patients. Can you think of any ways in which this fact could explain the somewhat surprising results?

  9. Scenario: In Chapter 2, our first illustration of experimental data involved a study of the compressive strength of concrete. In this scenario, we look at a set of observations all taken at 28 days (4 weeks) after the concrete was initially formulated. The response variable is the Compressive Strength column, and we'll examine the relationship between that variable and two candidate factor variables.

    1. Use Cement as the factor and run a regression. Report on your findings in detail. Explain what this slope tells you about the impact of adding more cement to a concrete mixture.

    2. Use Water as the factor and run a regression. Report on your findings in detail. Explain what this slope tells you about the impact of adding more water to a concrete mixture.

  10. Scenario: Prof. Frank Anscombe of Yale University created an artificial data set to illustrate the hazards of applying linear regression analysis without looking at a scatterplot (Anscombe 1973). His work has been very influential, and JMP includes his illustration among the sample data tables packaged with the program. You'll find Anscombe both in this book's data tables and in the JMP sample data tables. Open it now.

    1. In the upper-left panel of the data table, you'll see a red triangle next to the words The Quartet. Click on the triangle, and select Run Script. This produces four regression analyses corresponding to four pairs of response and predictor variables. Examine the results closely, and write a brief response comparing the regressions. What do you conclude about this quartet of models?

    2. Now return to the results, and click the red triangle next to Bivariate Fit of Y1 By X1; select Show Points and re-interpret this regression in the context of the revised scatterplot.

    3. Now reveal the points in the other three graphs. Is the linear model equally appropriate in all four cases?
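
As an optional cross-check on the JMP output in problem 10, the sketch below fits a least-squares line to each of the four pairs and prints the estimates without drawing any plot. It is an illustration rather than part of the exercise. The values are transcribed from Anscombe (1973) and should be verified against the JMP table, and the helper name fit_line is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x.

    Returns (intercept, slope, r_squared). Hypothetical helper for illustration.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# Values transcribed from Anscombe (1973); check them against the JMP table.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by the first three pairs
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "Y1 by X1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Y2 by X2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Y3 by X3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Y4 by X4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: fit_line(x, y) for name, (x, y) in quartet.items()}
for name, (intercept, slope, r_squared) in results.items():
    print(f"{name}: intercept={intercept:.3f}, slope={slope:.3f}, Rsquare={r_squared:.3f}")
```

The numerical summary is only half of the comparison; the scatterplots revealed in parts 2 and 3 supply the rest.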
