# MATH 411, Winter 2013, Take-Home Practice Exam (Instructions at the end)

**1.** The following data were collected for 32 light water nuclear power plants.

**Cost**: Cost to build the plant, in units of $100,000, adjusted to a 1976 base.

**Date**: Date of construction permit in years since 1900. (These are not necessarily integers; 70.75, for example, means 70 and three-quarter years since 1900.)

**MWatts**: Plant capacity in megawatts.

The goal was to model cost in terms of date and megawatts. Here is the scatterplot matrix of the data.

**Three regression models were examined and are summarized as follows. Note that Cm and Cd denote slopes or partial slopes for MWatts and Date, respectively.**

**MODEL 1: Cost = Intercept + Cm\*MWatts**

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 111.7408 | 122.3755 | 0.913 | 0.36847 |
| MWatts | 0.4238 | 0.1446 | 2.931 | 0.00641 |

Residual standard error: 152.5 on 30 degrees of freedom

Multiple R-squared: 0.2226, Adjusted R-squared: 0.1966

F-statistic: 8.588 on 1 and 30 DF, p-value: 0.006414

**MODEL 2: Cost = Intercept + Cd\*Date**

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -6553.57 | 1661.96 | -3.943 | 0.000446 |
| Date | 102.29 | 24.23 | 4.221 | 0.000207 |

Residual standard error: 137 on 30 degrees of freedom

Multiple R-squared: 0.3727, Adjusted R-squared: 0.3517

F-statistic: 17.82 on 1 and 30 DF, p-value: 0.0002071

**MODEL 3: Cost = Intercept + Cm\*MWatts + Cd\*Date**

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -6790.8792 | 1377.6683 | -4.929 | 3.09e-05 |
| MWatts | 0.4132 | 0.1076 | 3.840 | 0.000616 |
| Date | 100.7764 | 20.0696 | 5.021 | 2.39e-05 |

Residual standard error: 113.4 on 29 degrees of freedom

Multiple R-squared: 0.5841, Adjusted R-squared: 0.5555

F-statistic: 20.37 on 2 and 29 DF, p-value: 2.984e-06
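When reading summaries like these, it can help to see how the columns relate to one another. The sketch below is only an illustration: it copies the reported numbers and checks three standard identities for least-squares fits (t value = estimate / standard error, adjusted R-squared from R-squared, and the overall F-statistic from R-squared). The function and variable names are made up for this example, and small discrepancies are expected because the printed values are rounded.

```python
# Values copied from the three regression summaries for the power-plant data.
N_PLANTS = 32  # number of plants stated in the problem

# Each coefficient is (estimate, std_error, reported_t_value).
MODELS = {
    "Model 1": {
        "predictors": 1, "r2": 0.2226, "adj_r2": 0.1966, "f": 8.588,
        "coef": {"(Intercept)": (111.7408, 122.3755, 0.913),
                 "MWatts": (0.4238, 0.1446, 2.931)},
    },
    "Model 2": {
        "predictors": 1, "r2": 0.3727, "adj_r2": 0.3517, "f": 17.82,
        "coef": {"(Intercept)": (-6553.57, 1661.96, -3.943),
                 "Date": (102.29, 24.23, 4.221)},
    },
    "Model 3": {
        "predictors": 2, "r2": 0.5841, "adj_r2": 0.5555, "f": 20.37,
        "coef": {"(Intercept)": (-6790.8792, 1377.6683, -4.929),
                 "MWatts": (0.4132, 0.1076, 3.840),
                 "Date": (100.7764, 20.0696, 5.021)},
    },
}


def t_value(estimate, std_error):
    """t statistic for testing that a coefficient equals zero."""
    return estimate / std_error


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """Overall F-statistic on p and n - p - 1 degrees of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


for name, model in MODELS.items():
    p = model["predictors"]
    print(name)
    print("  adjusted R-squared:", round(adjusted_r2(model["r2"], N_PLANTS, p), 4),
          "reported:", model["adj_r2"])
    print("  overall F:", round(overall_f(model["r2"], N_PLANTS, p), 2),
          "reported:", model["f"])
    for term, (est, se, t_reported) in model["coef"].items():
        print("  t for", term, ":", round(t_value(est, se), 3),
              "reported:", t_reported)
```

The recomputed values agree with the printed ones to within rounding, which is a quick way to confirm that a table was transcribed correctly before interpreting it.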

**a. Based on these summaries, which of the three models is the best one? Give two quantitative reasons for your answer.**

**b. What do you conclude from the model you chose in part a., and why?**

**c. For Model 3, explain carefully the meanings of the partial slopes for MWatts and Date, giving interpretations of them in terms of cost, megawatts, and date.**

**Partial regression plots and residual plots for Model 3 follow.**

**d. Based on these plots and any other information provided, explain why the assumptions required to justify Model 3 appear to be satisfied or not.**

**Extra Credit**

**Correlation coefficients between the three measured variables are:**

**correlation(Cost, MWatts) = 0.47**

**correlation(Cost, Date) = 0.61**

**correlation(MWatts, Date) = 0.02**

**Use these results and any other information provided to explain why the partial slopes for MWatts and Date in Model 3 are nearly equal to the slopes for MWatts and Date in Models 1 and 2, respectively.**
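One way to explore this question numerically is through the textbook formula for standardized partial slopes in a two-predictor regression, which depends only on the three pairwise correlations. The sketch below is an illustration under that assumption, not a prescribed solution method; the function name is hypothetical. Because a partial slope and the matching simple slope are rescaled by the same ratio of standard deviations, the ratio of the standardized partial slope to the simple correlation can be compared with the ratio of the reported Model 3 slope to the reported Model 1 or Model 2 slope.

```python
# Correlations quoted in the extra-credit question.
R_COST_MWATTS = 0.47
R_COST_DATE = 0.61
R_MWATTS_DATE = 0.02


def standardized_partial_slopes(r_y1, r_y2, r_12):
    """Standardized slopes for y regressed on two predictors x1 and x2."""
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


beta_mwatts, beta_date = standardized_partial_slopes(
    R_COST_MWATTS, R_COST_DATE, R_MWATTS_DATE
)

# Ratio of partial slope to simple slope implied by the correlations.
implied_ratio_mwatts = beta_mwatts / R_COST_MWATTS
implied_ratio_date = beta_date / R_COST_DATE

# The same ratios computed from the reported regression tables.
reported_ratio_mwatts = 0.4132 / 0.4238   # Model 3 slope / Model 1 slope
reported_ratio_date = 100.7764 / 102.29   # Model 3 slope / Model 2 slope

print("MWatts: implied", round(implied_ratio_mwatts, 3),
      "reported", round(reported_ratio_mwatts, 3))
print("Date:   implied", round(implied_ratio_date, 3),
      "reported", round(reported_ratio_date, 3))
```

Setting `r_12` to exactly 0 in the formula makes each standardized partial slope equal to the corresponding simple correlation, which is the limiting case the question points toward.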

**2.** Three diets for hamsters (labeled "I", "II", "III") were tested for differences in weight gain (measured as grams of increase) after a specified period of time. Six inbred laboratory lines (labeled "A", "B", "C", "D", "E", "F") were used to represent the responses of different genotypes to the various diets. The lines were treated as blocks, and all three diets were assigned randomly within each block. The data consisted of a total of 18 observations. Both one-way ANOVA and two-way ANOVA models were fit to the data. Shown below are an interaction plot and the ANOVA tables for the two models.

**MODEL 1: gain = Intercept + Cl\*line + Cd\*diet**

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| line | 5 | 71.17 | 14.23 | 7.491 | 0.00365 |
| diet | 2 | 36.33 | 18.17 | 9.561 | 0.00477 |
| Residuals | 10 | 19.00 | 1.90 | | |

**MODEL 2: gain = Intercept + Cd\*diet**

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| diet | 2 | 36.33 | 18.167 | 3.022 | 0.0789 |
| Residuals | 15 | 90.17 | 6.011 | | |
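The two ANOVA tables are arithmetically linked, and a short calculation makes the link visible. The sketch below is illustrative only and uses made-up variable names: it copies the sums of squares from the tables, shows that the one-way residual sum of squares is the blocked residual plus the line (block) sum of squares, and recomputes both F ratios for diet.

```python
# Sums of squares and degrees of freedom copied from the Model 1 table.
ss_line, df_line = 71.17, 5
ss_diet, df_diet = 36.33, 2
ss_resid_blocked, df_resid_blocked = 19.00, 10

# Dropping the block term moves its sum of squares and df into the residual.
ss_resid_oneway = ss_line + ss_resid_blocked
df_resid_oneway = df_line + df_resid_blocked


def f_ratio(ss_effect, df_effect, ss_resid, df_resid):
    """F statistic: effect mean square divided by residual mean square."""
    return (ss_effect / df_effect) / (ss_resid / df_resid)


f_diet_blocked = f_ratio(ss_diet, df_diet, ss_resid_blocked, df_resid_blocked)
f_diet_oneway = f_ratio(ss_diet, df_diet, ss_resid_oneway, df_resid_oneway)

print("One-way residual SS:", round(ss_resid_oneway, 2), "on", df_resid_oneway, "df")
print("Residual mean square, blocked:", round(ss_resid_blocked / df_resid_blocked, 3))
print("Residual mean square, one-way:", round(ss_resid_oneway / df_resid_oneway, 3))
print("F for diet, blocked:", round(f_diet_blocked, 3))
print("F for diet, one-way:", round(f_diet_oneway, 3))
```

The diet sum of squares is identical in both tables; only the denominator of the F ratio changes.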

**a. What are your conclusions and the reasons for those conclusions about the different diets, based on:**

**i. Model 1?**

**ii. Model 2?**

**b. Which model and conclusion is a better one? Justify your answer by discussing the effects of blocking for these data. Refer in particular to the residual sums of squares and their effects on the p-values in the two models.**

**c. i. What does the interaction plot suggest about whether or not line and diet interact, and why is this so?**

**ii. What is the implication of your answer to part i. for the validity of the randomized complete block design model?**

**d. What kind of design model corresponds to the one-way ANOVA? Explain this design.**

**e. i. State the null hypothesis about the factor diet that is tested in the Model 1 ANOVA table.**

**ii. If you reject this null hypothesis, what is the next step in the analysis of the effect of diet on gain?**

**3.** Calories and sodium content (mg) were measured for samples of two types of hot dogs. The types were **Non-Poultry** and **Poultry**. This was done to assess the effect of **Type** on mean **Sodium** content. Summaries of three models follow: an ANOVA for **Sodium** predicted by **Type**, an additive ANCOVA for **Sodium** predicted by **Type** and **Calories**, and an ANCOVA for **Sodium** predicted by **Type** and **Calories**, including interaction.

**MODEL 1: Sodium = Intercept + Ct\*Type**

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 409.14 | 15.43 | 26.517 | <2e-16 |
| TypePoultry | 49.86 | 27.50 | 1.813 | 0.0756 |

Residual standard error: 93.85 on 52 degrees of freedom

Multiple R-squared: 0.05947, Adjusted R-squared: 0.04139

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Type | 1 | 28963 | 28963.2 | 3.2882 | 0.07555 |
| Residuals | 52 | 458024 | 8808.2 | | |

**MODEL 2: Sodium = Intercept + Ct\*Type + Cc\*Calories**

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -109.1762 | 52.5665 | -2.077 | 0.0429 |
| TypePoultry | 177.8399 | 20.5802 | 8.641 | 1.47e-11 |
| Calories | 3.2866 | 0.3284 | 10.010 | 1.25e-13 |

Residual standard error: 55.04 on 51 degrees of freedom

Multiple R-squared: 0.6827, Adjusted R-squared: 0.6703

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Type | 1 | 28963 | 28963 | 9.5605 | 0.00322 |
| Calories | 1 | 303522 | 303522 | 100.1906 | 1.249e-13 |
| Residuals | 51 | 154502 | 3029 | | |

**MODEL 3: Sodium = Intercept + Ct\*Type + Cc\*Calories + Ctc\*Type\*Calories**

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -160.5796 | 61.2121 | -2.623 | 0.01151 |
| TypePoultry | 324.2098 | 94.9871 | 3.413 | 0.00128 |
| Calories | 3.6126 | 0.3840 | 9.408 | 1.2e-12 |
| TypePoultry:Calories | -1.1256 | 0.7136 | -1.577 | 0.12102 |

Residual standard error: 54.25 on 50 degrees of freedom

Multiple R-squared: 0.6978, Adjusted R-squared: 0.6796

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Type | 1 | 28963 | 28963 | 9.8395 | 0.002861 |
| Calories | 1 | 303522 | 303522 | 103.1138 | 9.573e-14 |
| Type:Calories | 1 | 7324 | 7324 | 2.4880 | 0.121024 |
| Residuals | 50 | 147178 | 2944 | | |

For each of the three models, plots of fits to the data and residual plots (below each fit) are shown.
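The three hot dog summaries share one total sum of squares, so R-squared, the residual standard error, and the F test for the interaction term can all be recovered from the residual rows alone. The following sketch is a hedged illustration with hypothetical names; it copies the residual sums of squares and degrees of freedom from the tables and recomputes those quantities. Small differences from the printed values come from rounding.

```python
import math

# (residual sum of squares, residual degrees of freedom) from each ANOVA table.
RESIDUALS = {
    "Model 1": (458024, 52),
    "Model 2": (154502, 51),
    "Model 3": (147178, 50),
}
SS_TYPE = 28963  # Type sum of squares, identical in all three tables

# Total sum of squares: Type plus residual in the Type-only model.
ss_total = SS_TYPE + RESIDUALS["Model 1"][0]


def r_squared(model):
    """Share of total variation explained by the model."""
    rss, _ = RESIDUALS[model]
    return 1 - rss / ss_total


def residual_std_error(model):
    """Square root of the residual mean square."""
    rss, df = RESIDUALS[model]
    return math.sqrt(rss / df)


def partial_f(smaller, larger):
    """F statistic comparing two nested models via their residual rows."""
    rss_small, df_small = RESIDUALS[smaller]
    rss_large, df_large = RESIDUALS[larger]
    numerator = (rss_small - rss_large) / (df_small - df_large)
    return numerator / (rss_large / df_large)


for name in RESIDUALS:
    print(name, "R-squared:", round(r_squared(name), 4),
          "residual SE:", round(residual_std_error(name), 2))

f_interaction = partial_f("Model 2", "Model 3")
print("F for adding Type:Calories:", round(f_interaction, 3))
print("Square of the interaction t value:", round((-1.577) ** 2, 3))
```

For a single added term, the partial F statistic equals the square of that term's t value, which is why the interaction row and the `TypePoultry:Calories` coefficient row report the same p-value.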

**a. Compare Models 1 and 2. Discuss p-values, R-squared, and Residual standard error. Explain why the Residuals sum of squares for Model 2 is much smaller than for Model 1.**

**b. What do you conclude from Model 2 and why?**

**c. Compare Models 2 and 3. Which model is a better one, and why? Include discussion of the p-values, R-squared, Residual standard error, the number of terms in the model equation, and the effect of the interaction term in Model 3.**

**d. What is the Model 2 estimate for the mean difference in mg of Sodium between Poultry and Non-Poultry hot dogs? Show how to obtain your answer from the coefficients in the regression table for Model 2, keeping in mind that the variable TypePoultry is a dummy or indicator variable. Start by writing out the equation for the Model 2 fit for the mean amount of Sodium.**
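As a starting point for part d, the Model 2 fit can be written as a small function in which `TypePoultry` enters as a 0/1 indicator. This is a minimal sketch, assuming the usual R coding where Non-Poultry is the reference level (0) and Poultry is 1; the function name and the calorie values in the loop are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the Model 2 regression table.
INTERCEPT = -109.1762
COEF_TYPE_POULTRY = 177.8399
COEF_CALORIES = 3.2866


def fitted_sodium(is_poultry, calories):
    """Model 2 fitted mean sodium (mg) for a hot dog type and calorie count."""
    type_poultry = 1 if is_poultry else 0  # indicator: 1 = Poultry, 0 = Non-Poultry
    return INTERCEPT + COEF_TYPE_POULTRY * type_poultry + COEF_CALORIES * calories


# Hypothetical calorie values, used only to show that the gap does not depend on them.
for calories in (100, 150, 200):
    difference = fitted_sodium(True, calories) - fitted_sodium(False, calories)
    print(calories, "calories: Poultry minus Non-Poultry =", round(difference, 4))
```

Because Model 2 is additive, the two fitted lines are parallel, so the difference between types at any fixed calorie value is the same number.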

**Instructions**

**1.** There are three problems, each worth about one-third of a total of 100 for the exam. The extra credit question is worth 10 points, so the highest score possible is 110.

**2.** Answers should be concise and display knowledge of any statistical tests used, as well as adequate justification of conclusions. Clarity of presentation is important, and points will be deducted if the answers can’t be understood, or contain material not relevant to the specific problem.

**3.** Do not use a text font smaller than 10 points. Limit the text part of the answers to all three questions to no more than three pages, although less text should be sufficient to get full credit.

**4.** The exam is open book, open notes, and open reference. Any books or web sites are fair sources of information. You may NOT use live experts, especially professors, graduate students, or consulting services, as resources.

**5.** You may discuss your work, ideas, and analyses with others in the course. However, you may not copy others’ work, or provide your work to others.

**You must hand in your own best answer in your own words, and you must be able to explain it yourself.**

**If essentially identical exams with essentially identical text, graphs, etc. are handed in, there will be a significant penalty for copying. There will also be a penalty if answers are copied from any other source.**
