
MATH 411, Winter 2013, Take-Home Practice Exam (instructions at the end)

1. The following data were collected for 32 light water nuclear power plants.

   Cost:    Cost to build the plant, in units of $100,000, adjusted to a 1976 base.
   Date:    Date of construction permit in years since 1900. (These are not necessarily integers; 70.75, for example, means 70 and three-quarter years since 1900.)
   MWatts:  Plant capacity in megawatts.

   The goal was to model cost in terms of date and megawatts. Here is the scatterplot matrix of the data.

   [Figure: scatterplot matrix of Cost, Date, and MWatts]

   Three regression models were examined, and summarized as follows. Note that Cm and Cd denote slopes or partial slopes for MWatts or Date, respectively.

   MODEL 1: Cost = Intercept + Cm*MWatts

                Estimate Std. Error t value Pr(>|t|)
   (Intercept)  111.7408   122.3755   0.913  0.36847
   MWatts         0.4238     0.1446   2.931  0.00641

   Residual standard error: 152.5 on 30 degrees of freedom
   Multiple R-squared:  0.2226,    Adjusted R-squared:  0.1966
   F-statistic: 8.588 on 1 and 30 DF,  p-value: 0.006414

   MODEL 2: Cost = Intercept + Cd*Date

                Estimate Std. Error t value Pr(>|t|)
   (Intercept) -6553.57    1661.96  -3.943 0.000446
   Date          102.29      24.23   4.221 0.000207

   Residual standard error: 137 on 30 degrees of freedom
   Multiple R-squared:  0.3727,    Adjusted R-squared:  0.3517
   F-statistic: 17.82 on 1 and 30 DF,  p-value: 0.0002071

   MODEL 3: Cost = Intercept + Cm*MWatts + Cd*Date

                 Estimate Std. Error t value Pr(>|t|)
   (Intercept) -6790.8792  1377.6683  -4.929 3.09e-05
   MWatts          0.4132     0.1076   3.840 0.000616
   Date          100.7764    20.0696   5.021 2.39e-05

   Residual standard error: 113.4 on 29 degrees of freedom
   Multiple R-squared:  0.5841,    Adjusted R-squared:  0.5555
   F-statistic: 20.37 on 2 and 29 DF,  p-value: 2.984e-06

   a. Based on these summaries, which of the three models is the best one? Give two quantitative reasons for your answer.

   b. What do you conclude from the model you chose in part a., and why?

   c. For Model 3, explain carefully the meanings of the partial slopes for MWatts and Date, giving interpretations of them in terms of cost, megawatts, and date.

   Partial regression plots and residual plots for Model 3 follow.

   [Figure: partial regression plots and residual plots for Model 3]

   d. Based on these plots and any other information provided, explain why the assumptions required to justify Model 3 appear to be satisfied or not.

   Extra Credit

   Correlation coefficients between the three measured variables are:

   correlation(Cost, MWatts) = 0.47
   correlation(Cost, Date)   = 0.61
   correlation(MWatts, Date) = 0.02

   Use these results and any other information provided to explain why the partial slopes for MWatts and Date in Model 3 are nearly equal to the slopes for MWatts and Date in Models 1 and 2, respectively.

2. Three diets for hamsters (labeled "I", "II", "III") were tested for differences in weight gain (measured as grams of increase) after a specified period of time. Six inbred laboratory lines (labeled "A", "B", "C", "D", "E", "F") were used to represent the responses of different genotypes to the various diets. The lines were treated as blocks and all three diets were assigned randomly within each block. The data consisted of a total of 18 observations.
   Both one-way ANOVA and two-way ANOVA models were fit to the data. Shown below are an interaction plot, and the ANOVA tables for 2 different models of the data.

   [Figure: interaction plot of gain by diet and line]

   MODEL 1: gain = Intercept + Cl*line + Cd*diet

               Df Sum Sq Mean Sq F value   Pr(>F)
   line         5  71.17   14.23   7.491  0.00365
   diet         2  36.33   18.17   9.561  0.00477
   Residuals   10  19.00    1.90

   MODEL 2: gain = Intercept + Cd*diet

               Df Sum Sq Mean Sq F value  Pr(>F)
   diet         2  36.33  18.167   3.022  0.0789
   Residuals   15  90.17   6.011

   a. What are your conclusions and the reasons for those conclusions about the different diets, based on:
      i. Model 1?
      ii. Model 2?

   b. Which model and conclusion is a better one? Justify your answer by discussing the effects of blocking for these data. Refer in particular to the residual sums of squares and their effects on the p-values in the two models.

   c. i. What does the interaction plot suggest about whether or not line and diet interact, and why is this so?
      ii. What is the implication of your answer to part i. for the validity of the randomized complete block design model?

   d. What kind of design model corresponds to the one-way ANOVA? Explain this design.

   e. i. State the null hypothesis about the factor diet that is tested in the Model 1 ANOVA table.
      ii. If you reject this null hypothesis, what is the next step in the analysis of the effect of diet on gain?

3. Calories and sodium content (mg) were measured for samples of two types of hot dogs. The types were Non-Poultry and Poultry. This was done to assess the effect of Type on mean Sodium content.
   Summaries of three models follow: an ANOVA for Sodium predicted by Type, an additive ANCOVA for Sodium predicted by Type and Calories, and an ANCOVA for Sodium predicted by Type and Calories, including interaction.

   MODEL 1: Sodium = Intercept + Ct*Type

               Estimate Std. Error t value Pr(>|t|)
   (Intercept)   409.14      15.43  26.517   <2e-16
   TypePoultry    49.86      27.50   1.813   0.0756

   Residual standard error: 93.85 on 52 degrees of freedom
   Multiple R-squared:  0.05947,   Adjusted R-squared:  0.04139

             Df Sum Sq Mean Sq F value  Pr(>F)
   Type       1  28963 28963.2  3.2882 0.07555 .
   Residuals 52 458024  8808.2

   MODEL 2: Sodium = Intercept + Ct*Type + Cc*Calories

                Estimate Std. Error t value Pr(>|t|)
   (Intercept) -109.1762    52.5665  -2.077   0.0429
   TypePoultry  177.8399    20.5802   8.641 1.47e-11
   Calories       3.2866     0.3284  10.010 1.25e-13

   Residual standard error: 55.04 on 51 degrees of freedom
   Multiple R-squared:  0.6827,    Adjusted R-squared:  0.6703

             Df Sum Sq Mean Sq  F value    Pr(>F)
   Type       1  28963   28963   9.5605   0.00322
   Calories   1 303522  303522 100.1906 1.249e-13
   Residuals 51 154502    3029

   MODEL 3: Sodium = Intercept + Ct*Type + Cc*Calories + Ctc*Type*Calories

                         Estimate Std. Error t value Pr(>|t|)
   (Intercept)          -160.5796    61.2121  -2.623  0.01151
   TypePoultry           324.2098    94.9871   3.413  0.00128
   Calories                3.6126     0.3840   9.408  1.2e-12
   TypePoultry:Calories   -1.1256     0.7136  -1.577  0.12102

   Residual standard error: 54.25 on 50 degrees of freedom
   Multiple R-squared:  0.6978,    Adjusted R-squared:  0.6796

                 Df Sum Sq Mean Sq  F value    Pr(>F)
   Type           1  28963   28963   9.8395  0.002861
   Calories       1 303522  303522 103.1138 9.573e-14
   Type:Calories  1   7324    7324   2.4880  0.121024
   Residuals     50 147178    2944

   For each of the three models, plots of fits to the data, and residual plots (below each fit) are shown.

   [Figure: fitted-model plots for Models 1, 2, and 3, each with its residual plot below]

   a. Compare Models 1 and 2. Discuss p-values, R-squared, and Residual standard error. Explain why the Residuals sum of squares for Model 2 is much smaller than for Model 1.

   b. What do you conclude from Model 2 and why?

   c. Compare Models 2 and 3. Which model is a better one, and why? Include discussion of the p-values, R-squared, Residual standard error, the number of terms in the model equation, and the effect of the interaction term in Model 3.

   d. What is the Model 2 estimate for the mean difference in mg of Sodium between Poultry and Non-Poultry hot dogs? Show how to obtain your answer from the coefficients in the regression table for Model 2, keeping in mind that the variable TypePoultry is a dummy or indicator variable. Start by writing out the equation for the Model 2 fit for the mean amount of Sodium.

Instructions

1. There are three problems, each worth about one-third of a total of 100 for the exam. The extra credit question is worth 10 points, so the highest score possible is 110.

2. Answers should be concise and display knowledge of any statistical tests used, as well as adequate justification of conclusions.
   Clarity of presentation is important, and points will be deducted if the answers can’t be understood, or contain material not relevant to the specific problem.

3. Do not use a text font smaller than 10 points. Limit the text part of the answers to all three questions to no more than three pages, although less text should be sufficient to get full credit.

4. The exam is open book, open notes, and open reference. Any books or web sites are fair sources of information. You may NOT use live experts, especially professors, graduate students, or consulting services, as resources.

5. You may discuss your work, ideas, and analyses with others in the course. However, you may not copy others’ work, or provide your work to others. You must hand in your own best answer in your own words, and you must be able to explain it yourself. If essentially identical exams with essentially identical text, graphs, etc. are handed in, there will be a significant penalty for copying. There will also be a penalty if answers are copied from any other source.
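
When working from printed ANOVA tables such as those in Problem 2, it can help to confirm that the entries are internally consistent before interpreting them. The following is a minimal Python sketch, not part of the original exam. It assumes only the standard definitions (mean square = sum of squares / degrees of freedom, F = term mean square / residual mean square). The dictionary layout and the helper names `f_value` and `total_ss` are hypothetical, chosen for illustration; the numbers are copied from the Problem 2 tables.

```python
# Sums of squares and degrees of freedom copied from the Problem 2 tables.
# The dictionary layout is an illustrative assumption, not R output.
model1 = {
    "line": {"df": 5, "ss": 71.17},
    "diet": {"df": 2, "ss": 36.33},
    "resid": {"df": 10, "ss": 19.00},
}
model2 = {
    "diet": {"df": 2, "ss": 36.33},
    "resid": {"df": 15, "ss": 90.17},
}


def f_value(table, term):
    """F statistic for a term: its mean square over the residual mean square."""
    ms_term = table[term]["ss"] / table[term]["df"]
    ms_resid = table["resid"]["ss"] / table["resid"]["df"]
    return ms_term / ms_resid


def total_ss(table):
    """Total sum of squares: all term and residual sums of squares added up."""
    return sum(row["ss"] for row in table.values())


if __name__ == "__main__":
    # Recompute the F values; small differences from the printed tables
    # are expected because the printed sums of squares are rounded.
    for name, table in (("MODEL 1", model1), ("MODEL 2", model2)):
        for term in table:
            if term != "resid":
                print(name, term, round(f_value(table, term), 3))
    # Both models are fit to the same 18 observations, so the totals
    # should agree up to rounding.
    print("total SS:", round(total_ss(model1), 2), round(total_ss(model2), 2))
```

The recomputed values should match the printed F values to within rounding error. The sketch only checks arithmetic; it does not compute p-values or draw any conclusion about the diets.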