Appendix: Solutions to the Odd-Numbered Problems

3-1: Select Tasks and Utilities ▶ Utilities ▶ Import Data.

Click Select data and find the file ClinicData.xlsx. Click Change and name the work data set Clinic. Run the task. When finished, select Tasks and Utilities ▶ Data ▶ List Data.

On the Data tab, select the data set WORK.Clinic. In the List Variables box, click the plus sign and select all the variables. Run the task.
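
If you prefer code to the point-and-click tasks, the import and listing steps can be approximated with PROC IMPORT and PROC PRINT. This is a minimal sketch, not the code the tasks generate; the folder path is an assumption borrowed from the INFILE statements in the Chapter 4 solutions.

```sas
*Sketch for 3-1: the path below is an assumption, adjust it to your folder;
proc import datafile='/folders/myfolders/problems/ClinicData.xlsx'
   dbms=xlsx
   out=work.Clinic
   replace;
run;

title "Listing of the Clinic Data Set";
proc print data=work.Clinic;
run;
```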

3-3: Select Tasks and Utilities ▶ Utilities ▶ Import Data.

Click Select data and find the file Diabetes.csv. Click Change to change the name of the output data set to Diabetes.

Next, select Tasks and Utilities ▶ Data ▶ List Data.

On the Data tab, select the data set WORK.Diabetes. In the List Variables box, click the plus sign and select all the variables. On the Options tab, use the menu labeled Rows to list and select First n rows. Enter n = 5. Run the task.

4-1:

*4-1;
data Diabetes;
   length Insulin $ 1 Diet_Drinks $ 9;
   infile '/folders/myfolders/problems/Diabetes_No_Varnames.csv' dsd;
   input Subj Insulin $ Diet_Drinks $ Glucose;
   *Note: Dollar signs in the INPUT statement are not
    needed because the LENGTH statement already identified
    the variables Insulin and Diet_Drinks as character. Fine
    to leave them in the INPUT statement;
run;

title "First 5 Observations from Diabetes";
proc print data=Diabetes(obs=5);
run;

4-3:

*4-3;
data Blood_Pressure;
   infile '/folders/myfolders/problems/Blood_Pressure.txt' dlm='09'x
          dsd pad;
   length Drug $ 7 Gender $ 1;
   input Drug Subj Gender SBP DBP;
run;

Select Tasks and Utilities ▶ Data ▶ List Data.

On the Data tab, select the data set WORK.Blood_Pressure. In the List Variables box, click the plus sign and select all the variables. On the Options tab, use the menu labeled Rows to list and select First n rows. Enter n = 10. Run the task.
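
The same listing can also be produced in code, following the pattern used in solution 4-1. This is only a sketch; the title text is an illustrative choice.

```sas
*Sketch for 4-3: list the first 10 observations;
title "First 10 Observations from Blood_Pressure";
proc print data=Blood_Pressure(obs=10);
run;
```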

5-1: Select Tasks and Utilities ▶ Statistics ▶ Summary.

Select SASHELP.BMT, click the Options tab, check Number of missing values and median, and uncheck minimum and maximum. Click Plots and select Histograms and Box plot.
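
A rough code equivalent of these task settings is sketched below. The problem's analysis variable is not restated here, so the use of T is an assumption, and the code the task actually generates will differ in its details.

```sas
*Sketch for 5-1: T is assumed to be the analysis variable;
proc means data=sashelp.bmt n nmiss mean median std;
   var T;
run;

*Histogram and box plot of the same variable;
proc sgplot data=sashelp.bmt;
   histogram T;
run;

proc sgplot data=sashelp.bmt;
   hbox T;
run;
```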

5-3: Use the Import utility to import the workbook. Select Tasks and Utilities ▶ Statistics ▶ Distribution analysis. On the Options tab, check histogram, goodness of fit tests, and a Q-Q plot. The K-S test gives p = .111 for SBP and p = .041 for DBP.
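
For comparison, a PROC UNIVARIATE sketch that requests the same kinds of output follows. The data set name BP is an assumption (use whatever name you gave the imported workbook), and this is not the code the task itself generates.

```sas
*Sketch for 5-3: the data set name BP is an assumption;
proc univariate data=BP normal;
   var SBP DBP;
   histogram SBP DBP / normal;
   qqplot SBP DBP / normal(mu=est sigma=est);
run;
```

The NORMAL option on the PROC statement requests the tests for normality, including Kolmogorov-Smirnov.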

5-5: Select Tasks and Utilities ▶ Statistics ▶ One-way frequencies. On the Options tab, expand Plots and check Suppress plots. Also on the Options tab, check Omit cumulative frequencies.

6-1: Select Tasks and Utilities ▶ Utilities ▶ Import Data. Select Diabetes.xls and name your output data set Diabetes.

Select Tasks and Utilities ▶ Statistics ▶ t tests. Select a one-sample test and select Glucose as the analysis variable. On the Options tab, check Tests for normality and nonparametric tests. All tests fail to reject the null hypothesis, and the tests for normality also fail to reject the null hypothesis that the glucose values are normally distributed.

6-3: Select Tasks and Utilities ▶ Statistics ▶ t tests. Next, on the Data tab, select the SASHELP.Heart data set. Check tests for normality and default plots on the Options tab as well as the alternate hypothesis that the mean is not equal to 150. The surprisingly low p-value results from the very large sample size (over 5,000), which gives the test very high power to detect even small differences.

7-1: Select Tasks and Utilities ▶ Statistics ▶ t tests. On the Data tab, select the Diabetes data set. Also on the Data tab, click Filter and enter the line:

       Diet_Drinks ne 'Sometimes'

When you run the two-sample t test with Glucose as the Analysis variable and Diet_Drinks as the Groups variable, you will see a highly significant result (t = 4.94, p < .0001), with a higher mean Glucose in the 'Often' group than in the 'Rarely' group. This data set is fictitious, but there may be some evidence that diet drinks can raise blood glucose levels.
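
The filter and the two-sample test can also be written directly in code. This is a minimal sketch that assumes the Diabetes data set is in the WORK library; the task generates more elaborate code.

```sas
*Sketch for 7-1: the WHERE statement plays the role of the task filter;
proc ttest data=Diabetes;
   where Diet_Drinks ne 'Sometimes';
   class Diet_Drinks;
   var Glucose;
run;
```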

7-3: Data Step approach.

data Coffee;
   input Subj Before After;
datalines;
1       75       83
2       65       86
3       69       68
4       65       54
5       68       60
6       77       64
7       76       67
8       74       70
9       85       53
10      63       87
11      71       90
12      62       59
;

Then select Tasks and Utilities ▶ Statistics ▶ t tests. Select Paired t test. Enter After as the Group 1 variable and Before as the Group 2 variable. p=.0428 and you do not reject the null hypothesis that the data values are normally distributed.
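
As an alternative to the task, the paired test can be requested in code with the PAIRED statement of PROC TTEST. This is a sketch only; it analyzes the difference After minus Before, matching the Group 1 and Group 2 order above.

```sas
*Sketch for 7-3: paired t test on the Coffee data set;
proc ttest data=Coffee;
   paired After*Before;
run;
```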

8-1: Use the Import utility to create a SAS data set from the Blood_Pressure.xls workbook. Next, start the One-Way ANOVA task: Select Tasks and Utilities ▶ Statistics ▶ One-Way ANOVA.

On the Data tab, select WORK.BP as the data set, SBP as the dependent variable, and Drug as the categorical variable. You can accept all the default values on the Options tab. The resulting p-value for the ANOVA is .0197. The Tukey multiple comparison test shows that Placebo and Drug B are significantly different (p = .0077). Even though Levene's test for homogeneity of variance is significant, the variances are close enough to run the test. Remember that this assumption can be violated a bit, especially when the number of subjects per group is equal (as in this case) or nearly equal.
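
A comparable analysis can be sketched in PROC GLM. This is an approximation of what the task does, not its generated code.

```sas
*Sketch for 8-1: one-way ANOVA with Levene test and Tukey comparisons;
proc glm data=BP;
   class Drug;
   model SBP = Drug;
   means Drug / hovtest=levene tukey;
run;
quit;
```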

8-3: Import the Diabetes.xls workbook in the usual way. Next start the One-way ANOVA task: Select Tasks and Utilities ▶ Statistics ▶ One-Way ANOVA. Select WORK.Diabetes on the Data tab and accept all the defaults on the Options tab. The overall p-value is <.0001. The Tukey multiple comparison test shows a significant difference between Often and Rarely (p < .0001) and a nearly significant difference between Rarely and Sometimes (p = .0585).

8-5: Start out in the usual way: Select Tasks and Utilities ▶ Statistics ▶ One-Way ANOVA.

Select SASHELP.BMT on the Data tab and select T as the analysis variable and Group as the categorical variable. The overall p-value is .0012. There are significant differences between All and Low Risk (p = .0081) and between Low Risk and High Risk (p = .0031).

9-1: Create the SAS data set BP using the Import utility. Next, run the N-way ANOVA task: Select Tasks and Utilities ▶ Statistics ▶ N-Way ANOVA.

Select WORK.BP on the Data tab, DBP as the analysis variable, and the two variables Drug and Gender as factors. Next click on the Model tab, select the two variables Drug and Gender, and then click Full factorial model. On the Options tab, choose to look at main effects; and, under plots, include Diagnostic plots. The interaction term is not significant, and only Drug is significant at the .05 level. Tukey comparisons show that there is a difference between Drug A and Placebo (p = .0137).
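
The full factorial model can also be sketched in PROC GLM, where the bar operator expands to both main effects plus their interaction. This is a hedged approximation of the task, not its generated code.

```sas
*Sketch for 9-1: two-way factorial ANOVA with Tukey-adjusted comparisons;
proc glm data=BP plots=diagnostics;
   class Drug Gender;
   model DBP = Drug | Gender;
   lsmeans Drug Gender / adjust=tukey pdiff;
run;
quit;
```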

9-3: Use the Import utility to create the Diabetes data set. Next, select Tasks and Utilities ▶ Statistics ▶ N-Way ANOVA.

On the Data tab, select WORK.Diabetes. Select Glucose as the dependent variable with Diet_Drinks and Insulin as factors. Click Model and Edit. Select Diet_Drinks and Insulin and then Full Factorial model. On the Options tab, in the list under Select statistics, choose All effects. You will see a significant interaction term in the model. To determine pairwise differences, you need to look at the p-values for each combination of Diet_Drinks and Insulin.

9-5: Run the SAS program shown in the problem (or enter the data in an Excel workbook and import it). Next, select Tasks and Utilities ▶ Statistics ▶ N-Way ANOVA.

On the Data tab, select WORK.CHF. Choose LVEF as the dependent variable with Group and Weight as the two factors. On the Model tab, select the two variables, and click Full factorial model. On the Options tab, select a Tukey multiple comparison test. The overall p-value for the model is .0258. The p-values for Group, Weight, and the interaction are .0094, .3708, and .2033, respectively. The Tukey multiple comparison test shows Calcium to be different from Lasix and Calcium to be different from Placebo.

10-1: Create the data set BP using the Import task on the Utilities menu. Next, select Tasks and Utilities ▶ Statistics ▶ Correlation analysis.

On the Options tab, in the Display statistics menu, select Selected statistics. Then click Spearman's Rank Order Correlations. Under Plots, select Individual plots.

10-3: Select Tasks and Utilities ▶ Data ▶ Select Random Sample.

On the Data tab, select SASHELP.Heart. On the Options tab, name the output data set Sample_Heart. On the Options tab, choose the sampling method Without replacement. Set the number in the sample to 500, and click the box to specify the random seed. Enter the number 13579 in the box provided. Next run the Correlation Analysis task on the Statistics menu, using the WORK.Sample_Heart data set that you just created. Select the three variables Height, Weight, and Cholesterol. On the Options tab, be sure to check p-values on the selected statistics menu. Request a matrix of scatter plots.

Comparing the correlations and p-values, notice that the correlations are similar to the ones you obtained with the full data set but the p-values are much larger.
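
In code, the sampling and correlation steps can be sketched with PROC SURVEYSELECT and PROC CORR. This is a minimal sketch rather than the task-generated code, and the random sample it draws is not guaranteed to match the one the task draws.

```sas
*Sketch for 10-3: simple random sample without replacement;
proc surveyselect data=sashelp.heart
   out=Sample_Heart
   method=srs
   sampsize=500
   seed=13579;
run;

*Pearson correlations with p-values and a scatter plot matrix;
proc corr data=Sample_Heart pearson plots=matrix;
   var Height Weight Cholesterol;
run;
```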

10-5: Run the original program:

data Outlier;
   input X Y @@;
datalines;
0 2 5 6 6 2 3 3 1 3 4 4 8 1 6 4 2 5 4 2 6 5
;

and the program with the added data point:

data Outlier;
   input X Y @@;
datalines;
0 2 5 6 6 2 3 3 1 3 4 4 8 1 6 4 2 5 4 2 6 5 15 15
;

The Pearson correlation coefficient without the extra data point is -.03586 (p = .9166). With the added data point, the Pearson correlation is .72767 (p = .0073), and the Spearman correlation is .19464 (p = .5444). The lesson is to always produce a scatter plot of the data and look for outliers.
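
One way to obtain both coefficients and the scatter plot in code is sketched below. Run it once after each version of the DATA step; the choice of PROC SGPLOT for the plot is an assumption, and any scatter plot will do.

```sas
*Sketch for 10-5: Pearson and Spearman correlations for X and Y;
proc corr data=Outlier pearson spearman;
   var X Y;
run;

*Scatter plot to reveal the outlying point;
proc sgplot data=Outlier;
   scatter x=X y=Y;
run;
```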

11-1: Select Tasks and Utilities ▶ Statistics ▶ Linear Regression.

On the Data tab, select SASHELP.Heart, Weight as the dependent variable, and Height as a Continuous variable. The predicted weight for a person 65 inches tall is 153.86 pounds.

11-3: Select SASHELP.Heart on the Data tab, Weight as the dependent variable, Sex as the classification variable, and Cholesterol, Systolic, and Diastolic as continuous variables. Edit the model in the usual way. Run the model. Rerun the model with Stepwise selection selected on the Selection method tab. Only two variables, Sex and Diastolic, enter using the stepwise technique.

12-1: Select Tasks and Utilities ▶ Statistics ▶ Binary Logistic regression. Select SASHELP.Heart on the Data tab and click Filter.

Enter the clause:

BP_Status ne 'Optimal'

Next, select Status as the Response variable and 'Dead' as the event of interest. Select Chol_Status and BP_Status as Classification variables. Under parameterization of effects, choose Reference coding.
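
The same model can be sketched in PROC LOGISTIC, with a WHERE statement standing in for the task filter. This is an approximation, not the code the task generates.

```sas
*Sketch for 12-1: logistic regression with reference coding;
proc logistic data=sashelp.heart;
   where BP_Status ne 'Optimal';
   class Chol_Status BP_Status / param=ref;
   model Status(event='Dead') = Chol_Status BP_Status;
run;
```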

12-3: Use the import utility to create the data set Risk (don't forget to click the Change button to change the default name IMPORT to Risk). Next select Tasks and Utilities ▶ Statistics ▶ Binary logistic regression.

On the Data tab, select the WORK.Risk data set and the two variables Chol_High and Age_Group as Classification variables. Click on parameterization and select reference coding. On the Model tab, select the two variables Chol_High and Age_Group. Then click Add.

13-1: Create the Risk data set using the Import Utility. Next, select Tasks and Utilities ▶ Statistics ▶ One-Way Frequencies.

Select the data set WORK.Risk on the Data tab and highlight the variables Age_Group, Chol_High, Gender, and Heart_Attack. On the Options tab, click Plots and check the box to suppress plots. Also on the Options tab, uncheck the option to compute cumulative frequencies and percentages.

13-3: This solution uses a slightly different technique from the one discussed in this chapter. You may choose to make a new, temporary SAS data set with assigned formats or use the solution presented here, which adds a FORMAT statement to PROC FREQ. When you include a FORMAT statement in a PROC, the association between the variables and formats exists only for that procedure. When you include a FORMAT statement in a DATA step, the association remains for any procedure using that data set. Here is what the edited task code should look like:

proc format;
   value yesno 1 = '1:Yes'
               0 = '2:No';
run;

proc freq data=WORK.RISK order=formatted;
   format Chol_High Heart_Attack yesno.;
   tables (Chol_High)*(Heart_Attack) / chisq relrisk nopercent nocum plots=none;
run;

Remember to add the option ORDER=formatted in PROC FREQ.

13-5: Create an Excel workbook that looks like this:

Outcome   Risk_Factor   Count
Bad       1-Yes         5
Good      1-Yes         3
Bad       2-No          2
Good      2-No          15

Convert this workbook to a SAS data set using the Import utility. Next, go to the Statistics task Table Analysis. Choose the newly created SAS data set and choose Outcome as the row variable and Risk_Factor as the column variable. Under Additional tasks, select Count as the frequency variable. On the Options tab, check chi-square. The Fisher's exact test (2-tailed) value is .0169, and the continuity corrected chi-square p-value is .0309.

14-1: For a power of 80%, n per group = 17. For a power of 90%, n per group is 23.

14-3: Submit the following SAS program:

proc power;
   onewayanova
      groupmeans = 50 | 60 | 70
      stddev = 10
      power = .80 .90
      npergroup = .;
   plot x = power min = .70 max = .90;
run;

For a power of 80%, the n per group = 6. For a power of 90%, the n per group = 8.

14-5: For a power of 80%, the n per group = 294. For a power of 90%, the n per group = 392.
