Sampling Distributions

On February 2nd, we left Shmoop headquarters in the morning to collect some very special data. This was the day we would finally settle the question: how much wood can a woodchuck chuck?

We randomly selected woodchucks to use in the study, gave them a pile of wood to chuck, and let them go at it. We recorded their individual results, and used them to calculate a sample mean. It was surprisingly high.

We, of course, were thrilled. People always asked how much a woodchuck would chuck if a woodchuck could chuck wood. It turns out that no one had asked them to even try. We went to sleep that night dreaming a satisfied dream about our data.

When we woke up, it was February 2nd again, and our data was gone. It was one of those February 2nds, was it? We weren't going to let something like a closed time loop stop us. At least we remembered the sample mean we had already calculated.

So, we set out to repeat our data collection for the first time. We made a new random selection of our woodchucks. They chucked that wood, we recorded the results, found our new sample mean, and woke up on February 2nd again.

This has been going on for a while now. It's starting to get a bit monotonous, but at least we've been able to remember the sample means from every trial. If one sample mean is a good estimate of a population parameter, just think of how good a few hundred sample means are.

That's when we got the idea to put all of our results (every sample mean we had found) on a single graph. This made a sampling distribution: the distribution of the values a sample estimate, here the sample mean, can take across repeated random samples of the same size from the same population. The next day, we woke up on February 3rd. The time loop was broken.
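For anyone not stuck in a time loop, here is a minimal Python sketch of the same idea: sample, find the mean, repeat, then graph all the means together. Everything in it is an assumption made for illustration. The population of 5,000 woodchucks, the made-up chucking numbers, the sample size of 25, and the 300 repeats are hypothetical, and the "graph" is just a crude text histogram.

```python
import random
import statistics

def sample_mean(population, sample_size, rng):
    """Draw one simple random sample and return its mean."""
    return statistics.mean(rng.sample(population, sample_size))

def sampling_distribution(population, sample_size, n_repeats, seed=2):
    """Collect the sample mean from each of n_repeats separate random samples."""
    rng = random.Random(seed)
    return [sample_mean(population, sample_size, rng) for _ in range(n_repeats)]

# Hypothetical population: wood chucked by each of 5,000 woodchucks (made-up numbers).
pop_rng = random.Random(0)
population = [pop_rng.gauss(700, 100) for _ in range(5000)]

# One sample mean per "February 2nd".
means = sampling_distribution(population, sample_size=25, n_repeats=300)

# Crude text histogram: one row per 20-unit bin of sample means.
bins = {}
for m in means:
    low = int(m // 20) * 20
    bins[low] = bins.get(low, 0) + 1
for low in sorted(bins):
    print(f"{low:4d}-{low + 19:4d} | {'#' * bins[low]}")
```

Each `#` is one sample mean. The pile of them, taken together, is a simulated stand-in for the sampling distribution of the sample mean.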

So Much Sampling, So Little Time

If we were to go out and collect multiple samples in an identical way like this, we would expect to get slightly different results each time. We'd sample different individuals at random, so our measurements would vary somewhat. We would be sampling the same population every time, though, so we would expect our estimates to cluster around the true population parameter.
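A quick way to see this for yourself is to simulate it. The sketch below assumes a hypothetical population with made-up numbers, draws five separate random samples of 25 from it, and prints how far each sample mean lands from the population mean. In real life we wouldn't know the population mean; we can peek here only because we invented the population.

```python
import random
import statistics

rng = random.Random(202)  # fixed seed so the sketch is repeatable

# Hypothetical population of woodchuck results (made-up numbers).
population = [rng.gauss(700, 100) for _ in range(5000)]
true_mean = statistics.mean(population)

# Five separate random samples of 25 woodchucks each, all collected the same way.
sample_means = [statistics.mean(rng.sample(population, 25)) for _ in range(5)]

print(f"population mean: {true_mean:.1f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.1f}, off by {m - true_mean:+.1f}")
```

The five means won't match each other, but none of them should stray very far from the population mean.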

In real life, almost no one ever collects samples like this. That's time and money that could be spent on sampling some new population, or on fast cars and boatloads of cake. But a sampling distribution would still be nice to have. It would let us ask, "How typical was our result?"

Yep. That sure would be nice to have. Oh well. We shouldn't sit around wishing for something that we will obviously never see in the next section.