# Singleton Computer Assignment 2 complete solutions correct answers key

singleton Computer Assignment 2 complete solutions correct answers key---- Random Numbers, Sampling and Central Limit Theorem (refers to Ch 7 & 8)Introduction -- The Random Number GeneratorRandom number generation by computer allows computer simulation of random experiments. The Excelrandom number generator allows the user to generate random variables which behave according to aspecified distribution. Each random value generated by Excel is another result of a chance experiment. Forexample, the dialog box on the left specifies that the population mean is 0 and the population standarddeviation is 1 and the distribution shape is normal (a z distribution). It asks the program to do 3 columns of10 random values each. This block of output is to begin at cell A1. The result on the right could beconsidered a random sample of 30 values or for that matter 3 samples of 10 values each.When you make a histogram of the random values you see a picture which approximates the theoreticalshape of the distribution, but with only 30 values there are not enough to make something that looks like anormal distribution.0510

**F r****eq u e****n c****y****Bin****Histogram**With 1000 data values the distribution looks more normal:20102030405060708090-3-2 .6-2 .2-1 .8-1 .4 -1-0 .6-0 .20 .20 .6 11 .41 .82 .22 .6 3**F r****eq u e****n c****y****Bin****Histogram**But it takes an enormous number of values to get a really smooth plot (this is 32768):You will note that there is an empty box in the dialog form, the**. If this box is left empty,Excel starts with a standard sequence of values, and every time you generate some more values it continueson extending the list out with different values. However, if you put a particular integer number in this box,the Random Number generator always starts with a particular random number corresponding to the seed, andproduces the same particular sequence every time you call it. Since computers are (when working properly)deterministic, some people insist on saying "***Random Seed***pseudo**-random number", but this is a philosophical matter.3Introduction --- The Central Limit TheoremChapter 7 covers the Central Limit Theorem (CLT) thoroughly. This CLT provides the framework thatmakes inferential statistics possible in the case of a single population sample of random variable x used toestimate the population mean (in Chapter 8 and related work in Chapter 9). The connection between theknown and measured sample and the unknown population is provided by this theorem.Another version of CLT is mentioned in Chapter 10 (page 352) which likewise makes the estimation andhypothesis testing in Chapter 10 possible.Also, there is a relationship (not specifically called CLT) involving the sampling distribution of thevariance which makes possible estimation of a confidence interval for population variance and hypothesistesting involving the population variance. I discuss this in the Introduction to Chapter 8 in the part"Estimating population variance".In Computer Assignments 1, 3, and 4 for this course I ask you to do very practical and appliedcomputations to get practical answers. In this No. 2 assignment I just ask you to run simulations showinghow the CLT and the variance relationship work to improve your background knowledge. I will try to giveexplicit instructions to make it easier. The closest thing to this assignment in our course materials is in theWileyPLUS Chapter 7 material. It is "Interactive Applet: Understanding Sampling Distributions and theCentral Limit Theorem". If I assigned this there would be no output into a file from which I could give you agrade, but you are welcome to try it.Because the CLT formulafor the standard deviation of the sample mean has a square root of n in the denominator, when sample size nis small the sample mean values are spread widely and when n is big the spread is small.4The theoretical distribution in the graph according to the CLT for n=4 is hugely wider than the one for n=80.For n=4 the theoretical standard deviation of the graph is= 1 / = 0.5or for n=80= 1 / = 0.111803Since you are taking only 1000 (250 for older Excel) instead of an infinite number of samples, you should beable to compute values which are close but not exactly equal to these.The formula for the theoretical distribution of the sample variance is much more complicated than in theabove case for the sample mean, so we use the chi-squared table in the book for practical work. However thetheoretical sample variance distribution shows similar behavior: (These plots are from real formulacomputations.)For a sample size of 4 the distribution is extremely skewed and wide, whereas for 80 it is relatively narrowand very slightly skewed, almost like a normal distribution surrounding the population value.Practical implications of this are: It would be almost meaningless to try to estimate a confidence intervalfor variance (and consequently also standard deviation) with n=4, whereas with n=80, reasonable resultswould be obtained. On the other hand, in some cases estimating a population mean from a sample with n=4might be adequate. Your instructor (before having more contact with statistics) remembers that his industriallaboratory group was warned by the company statistician when we were only using n=3 for a product testwhich had to be repeated many times. We really had no idea of an estimate for the variance. The troublewas that each reading could require a half-hour or more since the technician had to tediously manipulate atiny piece of fiber just right for the test to turn out at all. We later found that since the technician also had to5move crosshairs in the microscope to measure the fiber's thickness, she was unconsciously biasing the resultstoward what we wanted (a stronger fiber).Your AssignmentI am expecting you to be familiar with concepts discussed in "Using Excel formulas with moresophistication".Your assignment is to use a population with a normal distribution to generate (by Excel simulation) blocksof 1000 different random samples. After generating 1000 samples of size n=4 and 1000 samples of sizen=80, you need to make histograms to show that the sample means and sample variances are distributed like(have a similar shape to) the above theoretical plots. They will not look exactly like the theoretical plotssince we are only using 1000 samples, not an infinte number. Also, you need to show that the mean ofsample means and the standard deviation of sample means agree with the CLT theory. The theory saysThe template for a sheet:The final result will have 2 worksheets like this, one for n=4 and one for n=80, plus a set of histograms onseparate sheets.1. Download the template file (note I ask you to type in the formulas into it yourself).62. For n=4="Number of random numbers", generate the random values with the Random Number Generator,Normal distribution, mean=0 and standard deviation=1, starting in cell B11. If you are using Excel 2010 or2013 you can specify :"Number of variables"=1000. If you are using Excel 2007 or before, it cannot handlethis many and you must use 250. 1000 gives nicer histograms.As shown the block of random values starts at cell B11 (and extends down to B14 for n=4 ).3. Type the functions in for cells B7 through B9. (B7=mean of B11 thru B14, B8=sample standard deviationof B11 thru B14, and B9=square of B8).4. Select and copy cells B7 through B9, then do a multiple paste of these out to the end of the block ofrandom data. If you are using 1000 samples the block ends way out at column ALM. If 250, it ends out atcolumn IQ.5. Now you have 1000 (or 250) trials of sample means, sample standard deviations, and sample variances.In cell B4 compute the average of the 1000 (or 250) sample means and in B5, the standard deviation of the1000 (or 250) sample means (use STDEV).6. The population mean is 0 by specification (cell B1). According to the CLT the mean of the sample means(cell B4) should be this value, but since we are estimating it with 1000 (or 250) trials it should be not exactbut close.The theoretical mean of the sample standard deviations should also be close to the CLT theoretical formulavalue. Put this in cell B2 (I have computed it for you above). This should compare closely with cell B5, yoursimulated point estimate based on 1000 (or 250) trials.7. Make a histogram of the 1000 (or 250) sample means on a new sheet. When you did ComputerAssignment 1 you used data in a column; it works the same way for data in a row. Do not bother using"bins".8. Also likewise make a histogram of the 1000 (or 250) sample variances on a new sheet.9. Copy and paste the 1st 7 or so columns of your n=4 data sheet to a new blank sheet for n=80. You couldstart completely over on a new blank sheet, but copying will save some work.10. Repeat step 2 for n=80. The block of data will now extend down to row 90.11. Repeat step 3 only now you will just have to change B14 to B90 in the average and sample standarddeviation functions.12. Repeat step 4.13. Repeat step 6. You have to change B2 so that it has the correct value for n=80.14. Repeat step 7.15. Repeat step 8.7As you make new sheets in your Excel file, please type in labels on their tabs so I can follow what you did.The theoretical CLT values in cells B1 and B2 (for taking an infinite number of samples) must closely agreewith the values you got in cells B4 and B5. The histograms which you get from steps 7, 8, 14, and 15 mustresemble the corresponding distribution plots above. The numbers on the vertical axis will be different sinceyou are drawing frequency plots, and the theories are in terms of probability densities. But the shapes shouldbe approximately the same, and the numbers on the horizontal axis should correspond. Needless to say yourhistograms will be smoother for 1000 trials than if you had to use Excel 2003 with 250 trials.Note that students may get various different results depending on how they happened to use the randomnumber generator. I grade by the general appearance of the results, not the exact number values.Singleton Computer Assignment 2 complete solutions correct answers key---- Random Numbers, Sampling and Central Limit Theorem (refers to Ch 7 & 8)Introduction -- The Random Number GeneratorRandom number generation by computer allows computer simulation of random experiments. The Excelrandom number generator allows the user to generate random variables which behave according to aspecified distribution. Each random value generated by Excel is another result of a chance experiment. Forexample, the dialog box on the left specifies that the population mean is 0 and the population standarddeviation is 1 and the distribution shape is normal (a z distribution). It asks the program to do 3 columns of10 random values each. This block of output is to begin at cell A1. The result on the right could beconsidered a random sample of 30 values or for that matter 3 samples of 10 values each.When you make a histogram of the random values you see a picture which approximates the theoreticalshape of the distribution, but with only 30 values there are not enough to make something that looks like anormal distribution.0510**F r****eq u e****n c****y****Bin****Histogram**With 1000 data values the distribution looks more normal:20102030405060708090-3-2 .6-2 .2-1 .8-1 .4 -1-0 .6-0 .20 .20 .6 11 .41 .82 .22 .6 3**F r****eq u e****n c****y****Bin****Histogram**But it takes an enormous number of values to get a really smooth plot (this is 32768):You will note that there is an empty box in the dialog form, the**. If this box is left empty,Excel starts with a standard sequence of values, and every time you generate some more values it continueson extending the list out with different values. However, if you put a particular integer number in this box,the Random Number generator always starts with a particular random number corresponding to the seed, andproduces the same particular sequence every time you call it. Since computers are (when working properly)deterministic, some people insist on saying "***Random Seed***pseudo**-random number", but this is a philosophical matter.3Introduction --- The Central Limit TheoremChapter 7 covers the Central Limit Theorem (CLT) thoroughly. This CLT provides the framework thatmakes inferential statistics possible in the case of a single population sample of random variable x used toestimate the population mean (in Chapter 8 and related work in Chapter 9). The connection between theknown and measured sample and the unknown population is provided by this theorem.Another version of CLT is mentioned in Chapter 10 (page 352) which likewise makes the estimation andhypothesis testing in Chapter 10 possible.Also, there is a relationship (not specifically called CLT) involving the sampling distribution of thevariance which makes possible estimation of a confidence interval for population variance and hypothesistesting involving the population variance. I discuss this in the Introduction to Chapter 8 in the part"Estimating population variance".In Computer Assignments 1, 3, and 4 for this course I ask you to do very practical and appliedcomputations to get practical answers. In this No. 2 assignment I just ask you to run simulations showinghow the CLT and the variance relationship work to improve your background knowledge. I will try to giveexplicit instructions to make it easier. The closest thing to this assignment in our course materials is in theWileyPLUS Chapter 7 material. It is "Interactive Applet: Understanding Sampling Distributions and theCentral Limit Theorem". If I assigned this there would be no output into a file from which I could give you agrade, but you are welcome to try it.Because the CLT formulafor the standard deviation of the sample mean has a square root of n in the denominator, when sample size nis small the sample mean values are spread widely and when n is big the spread is small.4The theoretical distribution in the graph according to the CLT for n=4 is hugely wider than the one for n=80.For n=4 the theoretical standard deviation of the graph is= 1 / = 0.5or for n=80= 1 / = 0.111803Since you are taking only 1000 (250 for older Excel) instead of an infinite number of samples, you should beable to compute values which are close but not exactly equal to these.The formula for the theoretical distribution of the sample variance is much more complicated than in theabove case for the sample mean, so we use the chi-squared table in the book for practical work. However thetheoretical sample variance distribution shows similar behavior: (These plots are from real formulacomputations.)For a sample size of 4 the distribution is extremely skewed and wide, whereas for 80 it is relatively narrowand very slightly skewed, almost like a normal distribution surrounding the population value.Practical implications of this are: It would be almost meaningless to try to estimate a confidence intervalfor variance (and consequently also standard deviation) with n=4, whereas with n=80, reasonable resultswould be obtained. On the other hand, in some cases estimating a population mean from a sample with n=4might be adequate. Your instructor (before having more contact with statistics) remembers that his industriallaboratory group was warned by the company statistician when we were only using n=3 for a product testwhich had to be repeated many times. We really had no idea of an estimate for the variance. The troublewas that each reading could require a half-hour or more since the technician had to tediously manipulate atiny piece of fiber just right for the test to turn out at all. We later found that since the technician also had to5move crosshairs in the microscope to measure the fiber's thickness, she was unconsciously biasing the resultstoward what we wanted (a stronger fiber).Your AssignmentI am expecting you to be familiar with concepts discussed in "Using Excel formulas with moresophistication".Your assignment is to use a population with a normal distribution to generate (by Excel simulation) blocksof 1000 different random samples. After generating 1000 samples of size n=4 and 1000 samples of sizen=80, you need to make histograms to show that the sample means and sample variances are distributed like(have a similar shape to) the above theoretical plots. They will not look exactly like the theoretical plotssince we are only using 1000 samples, not an infinte number. Also, you need to show that the mean ofsample means and the standard deviation of sample means agree with the CLT theory. The theory saysThe template for a sheet:The final result will have 2 worksheets like this, one for n=4 and one for n=80, plus a set of histograms onseparate sheets.1. Download the template file (note I ask you to type in the formulas into it yourself).62. For n=4="Number of random numbers", generate the random values with the Random Number Generator,Normal distribution, mean=0 and standard deviation=1, starting in cell B11. If you are using Excel 2010 or2013 you can specify :"Number of variables"=1000. If you are using Excel 2007 or before, it cannot handlethis many and you must use 250. 1000 gives nicer histograms.As shown the block of random values starts at cell B11 (and extends down to B14 for n=4 ).3. Type the functions in for cells B7 through B9. (B7=mean of B11 thru B14, B8=sample standard deviationof B11 thru B14, and B9=square of B8).4. Select and copy cells B7 through B9, then do a multiple paste of these out to the end of the block ofrandom data. If you are using 1000 samples the block ends way out at column ALM. If 250, it ends out atcolumn IQ.5. Now you have 1000 (or 250) trials of sample means, sample standard deviations, and sample variances.In cell B4 compute the average of the 1000 (or 250) sample means and in B5, the standard deviation of the1000 (or 250) sample means (use STDEV).6. The population mean is 0 by specification (cell B1). According to the CLT the mean of the sample means(cell B4) should be this value, but since we are estimating it with 1000 (or 250) trials it should be not exactbut close.The theoretical mean of the sample standard deviations should also be close to the CLT theoretical formulavalue. Put this in cell B2 (I have computed it for you above). This should compare closely with cell B5, yoursimulated point estimate based on 1000 (or 250) trials.7. Make a histogram of the 1000 (or 250) sample means on a new sheet. When you did ComputerAssignment 1 you used data in a column; it works the same way for data in a row. Do not bother using"bins".8. Also likewise make a histogram of the 1000 (or 250) sample variances on a new sheet.9. Copy and paste the 1st 7 or so columns of your n=4 data sheet to a new blank sheet for n=80. You couldstart completely over on a new blank sheet, but copying will save some work.10. Repeat step 2 for n=80. The block of data will now extend down to row 90.11. Repeat step 3 only now you will just have to change B14 to B90 in the average and sample standarddeviation functions.12. Repeat step 4.13. Repeat step 6. You have to change B2 so that it has the correct value for n=80.14. Repeat step 7.15. Repeat step 8.7As you make new sheets in your Excel file, please type in labels on their tabs so I can follow what you did.The theoretical CLT values in cells B1 and B2 (for taking an infinite number of samples) must closely agreewith the values you got in cells B4 and B5. The histograms which you get from steps 7, 8, 14, and 15 mustresemble the corresponding distribution plots above. The numbers on the vertical axis will be different sinceyou are drawing frequency plots, and the theories are in terms of probability densities. But the shapes shouldbe approximately the same, and the numbers on the horizontal axis should correspond. Needless to say yourhistograms will be smoother for 1000 trials than if you had to use Excel 2003 with 250 trials.Note that students may get various different results depending on how they happened to use the randomnumber generator. I grade by the general appearance of the results, not the exact number values.
You'll get 1 file (1.3MB)