Populations and Samples - At A Glance

It's time we sat down and had The Talk, about where data comes from. When someone is very interested in a topic, the stork comes along and gives them a little bundle of data for them to analyze as their very own. The End.

Not buying it, huh? Well, you're probably old enough to learn the truth now. It's a story of the Populations and the Samples. No birds or bees involved.

Stay On Target

If we want to collect some data, the first order of business is to decide what exactly we want to study. Going out and measuring things at random is a good way to get lost, beaten up, or arrested. What is our target, the thing that we want to learn about?

A population is all of the individuals that we could collect data from. This is a real Humpty Dumpty definition: the words mean exactly what we want them to mean, no more, no less. When we say "all" of the individuals, we do mean every single one, even if they would be impossible to find or measure. A population is more of an idea than something we actually work with.

And when we say "individuals," we don't necessarily mean people. If we were studying hummingbird calls, then all the hummingbirds would be our population. Or, if we were only interested in the length of the bird calls, then it would be the bird calls themselves that would be our population. We can think about the individuals of a population as being "items of interest." By the way, you should come see our band, Items of Interest, this Saturday.

Some examples of populations are:

  • All of the students that read Shmoop
  • All of the weightlifters in Pocatello, Idaho
  • Every family with at least two kids
  • All of the news articles written about Shmoop

Obviously, it can be hard, or even impossible, to study every individual in a population. That's why we won't even try. Instead, we'll take a sample, a subset of the total population, and study that instead. This is actually the whole point of statistics: using a sample to draw conclusions about the population as a whole. And you thought it was all about mathematicians trying to trick people into paying attention to them.

When we sample a population, we're trying to learn about some parameter of the population as a whole. For instance, we might be curious about the average GPA of the students who read Shmoop. We could ask 1000 Shmoopers, a sample of the whole, about their GPA. The average GPA of our sample can be used as an estimate of the parameter, which here is the average GPA of every Shmoop reader. We think that the estimate and the GPA would both be pretty good.
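Here's a minimal sketch of the GPA idea in Python. Everything about the population is made up for illustration: the 50,000 readers, the bell-shaped spread of GPAs, and the names `population`, `parameter`, `sample`, and `estimate` are all our own assumptions. In real life we would never have the full `population` list, which is the whole reason for sampling.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same numbers every run

# Hypothetical population: 50,000 invented GPAs, clipped to the 0.0-4.0 range.
population = [min(4.0, max(0.0, random.gauss(3.0, 0.5))) for _ in range(50_000)]

# The parameter: the mean GPA of the whole population (normally unknown).
parameter = statistics.mean(population)

# The sample: 1000 readers picked at random, nobody asked twice.
sample = random.sample(population, 1000)

# The estimate: the mean GPA of just the sample.
estimate = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.3f}")
print(f"sample mean (estimate):      {estimate:.3f}")
```

The two printed numbers should land close together, even though the sample is only 2% of the population.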

We can use all kinds of measurements as parameters and estimates. We can find the sample mean and use it to estimate the population mean, like in our GPA example. Yes, we can talk about multiple means at the same time. This gets confusing, so we have different symbols for the sample mean vs. the population mean: x̄ (x-bar) vs. μ. Oh hey, we've seen these two before.

Other measurements we can use as parameters and estimates are proportions, the median, and the standard deviation. In each case, we use the values found from a sample to create an estimate for the population as a whole.
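The same trick works for those other measurements. Below is a rough Python sketch that compares each parameter with its estimate. The population of 20,000 GPAs and the "above 3.5" cutoff are invented for the example. One assumption worth flagging: we use `statistics.pstdev` for the full population and `statistics.stdev` for the sample, since the sample version divides by n - 1.

```python
import random
import statistics

random.seed(7)  # fixed seed for repeatable output

# Hypothetical population of 20,000 GPAs (invented numbers).
population = [random.gauss(3.0, 0.5) for _ in range(20_000)]
sample = random.sample(population, 500)

# Parameters: measured on the whole population.
parameters = {
    "median": statistics.median(population),
    "standard deviation": statistics.pstdev(population),
    "proportion above 3.5": sum(g > 3.5 for g in population) / len(population),
}

# Estimates: the same measurements, taken on the sample only.
estimates = {
    "median": statistics.median(sample),
    "standard deviation": statistics.stdev(sample),
    "proportion above 3.5": sum(g > 3.5 for g in sample) / len(sample),
}

for name in parameters:
    print(f"{name}: parameter={parameters[name]:.3f}, estimate={estimates[name]:.3f}")
```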

Random Sampling

Not every sample is going to be a good sample. If we only ask the chess club members for their GPA, we're going to get a biased picture if we try to use that as an estimate. Nothing personal, chess people, but you're not very representative of the class as a whole.

The way to get an unbiased estimate is to create the sample using random sampling. We're not talking monkey ninja pirate zombie types of random, though. Two things have to be true for a sample to count as random:

  • Every individual in the population has an equal chance of being selected.
  • The selection of each individual is independent from all the others.

It's random like rolling dice, where every face has an equal chance of being rolled, and every roll of the die is independent of the rolls before and after it. As long as our sample is large enough, the results will tend to be representative of the population as a whole.
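To see the difference between a biased sample and a random one, here's a small Python sketch. The class of 1000 students, the 40 chess club members, and all of the GPA ranges are invented for illustration. We use `random.choices`, which picks with replacement, so every pick has the same odds and doesn't depend on the picks before it, just like the dice.

```python
import random
import statistics

random.seed(1)  # fixed seed for repeatable output

# Hypothetical class: 40 chess club members and 960 other students.
# These GPA ranges are made up purely for the example.
chess_club = [random.uniform(3.4, 4.0) for _ in range(40)]
everyone_else = [random.uniform(2.0, 4.0) for _ in range(960)]
whole_class = chess_club + everyone_else

true_mean = statistics.mean(whole_class)

# Biased sample: only the chess club gets asked.
biased_estimate = statistics.mean(chess_club)

# Random sample: each pick is equally likely to be any student,
# and each pick is independent of the others.
random_sample = random.choices(whole_class, k=100)
random_estimate = statistics.mean(random_sample)

print(f"true class mean:      {true_mean:.2f}")
print(f"chess club estimate:  {biased_estimate:.2f}")
print(f"random sample estimate: {random_estimate:.2f}")
```

If asking the same student twice feels weird, `random.sample` picks without replacement instead. With a big population the two approaches give nearly the same results.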

Actually getting a random sample can be tough, though. If you're sampling wild flowers (maybe because you have a hot date and forgot to get a gift, you dog), it would be tempting to pull over to the side of the road and grab a whole clump of flowers. However, all the flowers away from the road are less likely to be picked, and flowers growing together are more likely to be picked. If your date wanted a random sample of flowers, and why wouldn't they, they're going to be disappointed.

Summary

Who loves a recap? We do.

  • A population is everything that we want to study. If we're interested in how many cereal boxes have prizes inside of them, then all the cereal boxes are our population.
  • A parameter is some measurement of the population. The proportion of all cereal boxes that have prizes would be a parameter. We almost never know the true value of the parameter, which is why we're trying to estimate it.
  • We estimate the parameter value by creating a random sample. We don't just grab every box of Lucky Charms within reach; we select boxes at random, so our results won't be biased. We can use the proportion calculated from the sample to estimate the parameter value.
  • Non-random sampling can only be used to draw conclusions about groups similar to the sample. We can't use it to make estimates for the total population.
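The cereal box recap can be sketched in Python too. The 100,000 boxes and the 30% prize rate are assumptions we invented so there is something to sample from, and `estimate_proportion` is a hypothetical helper, not a library function. The loop tries a few sample sizes to show the estimate usually settling down as the sample grows.

```python
import random

random.seed(2024)  # fixed seed for repeatable output

# Hypothetical population: 100,000 boxes; True means the box has a prize.
# The 30% prize rate is invented; in real life this is what we don't know.
boxes = [random.random() < 0.30 for _ in range(100_000)]

# The parameter: the proportion of ALL boxes with a prize.
parameter = sum(boxes) / len(boxes)

def estimate_proportion(population, n):
    """Estimate the prize proportion from a random sample of n boxes."""
    sample = random.sample(population, n)
    return sum(sample) / n

print(f"parameter: {parameter:.3f}")
for n in (10, 100, 1000, 10_000):
    print(f"sample of {n:>6} boxes: estimate = {estimate_proportion(boxes, n):.3f}")
```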

Example 1

Scientists measured the weight of 100 randomly selected buffalo, because there was nothing good on TV that evening. Their average weight was 1200 pounds. The buffalo, not the scientists. Identify the population, the sample, the parameter, and the estimate of this study.


Example 2

A researcher wants to know if people prefer to butter their toast on the top or on the bottom. She buys ad space on several popular websites asking people to take her survey. She gets over 1000 responses, and concludes that people prefer to butter the bottom of their toast. What is the problem with how this study was carried out?


Example 3

Just how destructive is the Kool-Aid Man? We went out and measured the size of the holes he had made in 20 walls. What kind of study is this: a sample survey, an experiment, or an observational study?


Exercise 1

A scientist takes a big bucket of water from a lake and counts how many species of bacteria, bugs, and other creepy crawlies he finds in the bucket. Identify the population, the sample, the parameter, and the estimate in this situation.


Exercise 2

A school takes a poll to find out what students want to eat at lunch. 70 students are randomly chosen to answer the poll questions. What are the population, the sample, the parameter, and the estimate of this study?


Exercise 3

When is it okay to NOT use random sampling?


Exercise 4

You want to know what proportion of people enjoys getting a root canal. What type of study should be done?


Exercise 5

What does it mean for an experiment to be double blind?