High School: Statistics and Probability

Making Inferences and Justifying Conclusions HSS-IC.A.1

1. Understand statistics as a process for making inferences about population parameters based on a random sample from that population.

Students should understand that statistics uses a small amount of data (a sample) drawn from a much larger pool of potential information (the population) to understand, and sometimes predict, outcomes for the entire group. Basically, the information from a part can tell us about the whole.

Now, statistics isn't necessarily about being right. (No, that doesn't mean everyone gets an automatic A+.) Our predictions and generalizations won't be correct every time, but they still have real value. Statistics isn't perfect, but it's a lot more accurate than throwing darts at a bull's-eye from a hundred feet away with a blindfold on (hopefully while wearing some sort of protective armor).
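To see why statistics isn't always right, it helps to watch several random samples from the same population give slightly different answers. The Python sketch below assumes a made-up population of 1,000 scores; the data and the helper name `sample_means` are hypothetical, chosen only for illustration.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw several simple random samples and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(num_samples)]

# Hypothetical population: 1,000 made-up scores from 0 to 100.
rng = random.Random(42)
population = [rng.randint(0, 100) for _ in range(1000)]

true_mean = statistics.mean(population)
estimates = sample_means(population, sample_size=30, num_samples=5)

print(f"population mean: {true_mean:.2f}")
for i, est in enumerate(estimates, start=1):
    print(f"sample {i} mean: {est:.2f}  (off by {abs(est - true_mean):.2f})")
```

Each sample mean tends to land near the population mean but rarely exactly on it: not perfect, but far better than throwing darts.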

But before students can make inferences about population parameters based on random samples, they need to know what these things are.

A population is a big word for "a group of things we're studying." These things could be bees, snakes, students, or a weird mutant hybrid of all three (in which case you should probably call the closest biologist or National Security). A population can be big or small, as long as it gets us some stats (stat!).

Students should know that populations are described by parameters, which are really just "the numbers we want to know about our whole group," such as an average or a percentage.
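A parameter is computed from the whole population, while the matching number computed from a sample is called a statistic. This Python sketch uses a hypothetical, randomly generated set of 800 student heights so the two numbers can be compared side by side; in a real study, the parameter is usually unknown, which is exactly why we sample.

```python
import random
import statistics

# Hypothetical population: heights (in cm) of every student in a made-up school.
rng = random.Random(7)
population_heights = [rng.gauss(165, 10) for _ in range(800)]

# Parameter: a number describing the WHOLE population (usually unknown in practice).
parameter_mean = statistics.mean(population_heights)

# Statistic: the same kind of number, computed from a random sample only.
sample = rng.sample(population_heights, 40)
statistic_mean = statistics.mean(sample)

print(f"parameter (population mean): {parameter_mean:.1f} cm")
print(f"statistic (sample mean):     {statistic_mean:.1f} cm")
```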

Students should understand the importance of proper random sampling, since it helps keep bias out of the sample. Because the whole point of statistics is to say something about an entire group, we need to choose a sample that represents the whole group as closely as possible.
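One common way to take a simple random sample is to number every member of the population and let a random number generator pick the numbers. The sketch below assumes a hypothetical roster of 500 students; `simple_random_sample` is an illustrative helper, not a standard library function (it wraps Python's `random.Random.sample`).

```python
import random

def simple_random_sample(members, k, seed=None):
    """Number every member, then pick k of those numbers at random without
    replacement, so every group of k members has the same chance of being chosen."""
    numbered = dict(enumerate(members, start=1))     # 1 -> first member, 2 -> second, ...
    rng = random.Random(seed)
    chosen_numbers = rng.sample(sorted(numbered), k)  # k distinct random numbers
    return [numbered[n] for n in chosen_numbers]

# Hypothetical roster of 500 students.
roster = [f"student_{i:03d}" for i in range(1, 501)]
picked = simple_random_sample(roster, 25, seed=1)
print(picked[:5])
```

Because the computer does the choosing, nobody's preferences (or the shape of the candy in their hand) can sneak into the sample.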

Students should also be aware that random samples can, and often do, show patterns. For instance, it's okay to have a random sample of 10 red flowers from a batch of 100 multicolored ones. A random sample is random because of how it's collected, not because of what is collected.
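How unusual is an all-red sample? Assuming, hypothetically, that 50 of the 100 flowers are red, the chance that a simple random sample of 10 contains only red flowers can be computed with combinations. The numbers are made up; the point is that an unlikely-looking sample can still come from a perfectly random process.

```python
from fractions import Fraction
from math import comb

def prob_all_one_color(total, same_color, sample_size):
    """Chance that a simple random sample (drawn without replacement) contains
    only flowers of one color, given how many of that color are in the batch."""
    return Fraction(comb(same_color, sample_size), comb(total, sample_size))

# Hypothetical batch: 100 flowers, 50 of them red.
p = prob_all_one_color(total=100, same_color=50, sample_size=10)
print(f"P(all 10 red) = {float(p):.6f}")
```

The result is small (well under 1 in 1,000), but it isn't zero, so an all-red sample is rare yet entirely possible.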

The most important things that students must understand are what a population is, what a random sample is, how to take random samples, and how to properly make inferences about a population from sample data.

Did you know that it's statistically proven that practice makes perfect? Well, maybe not. But the best way to help students get comfortable with identifying these main ideas in statistics is through practice. Definitions are good and fine, but students have to practice to really improve their understanding, whether it's on a big screen in the classroom, on individual computers, or simply as a homework assignment.

Drills

1. A marketing company located in Hollywood, California, is really curious to see which group of U.S. high school students (ninth, tenth, eleventh, or twelfth graders) spends the most time watching television every day. For some reason, they decide to take data from your high school only. After randomly selecting 25 students, they find that ninth graders watch the most television per day. What is the sample of the population?

25 high school students from your school

The population in question consists of all the high school students in the United States, but it's unrealistic to sample them all. The sample is the smaller group from which the analysts collect data. In this case, they only selected 25 students from your school to represent the entire population of U.S. high school students. Because every student in the sample comes from a single school, it probably won't represent U.S. high school students very well, but that was the sample they chose.

2. A pollster wants to find out whether American citizens would support a candidate for national office who wants to lower the legal drinking age from 21 to 18. The pollster plans to do this by sending 10,000 text messages across the entire United States to randomly selected, active, U.S.-based phones with text messaging capabilities. Assume every text that is sent receives a reply. Why is this random sample, despite being truly randomly chosen, unlikely to be a good representative sample of American voters' opinions in an election?

There is no method of verifying whether or not the repliers are U.S. citizens who are registered to vote

The population in question consists of United States voters, but not everyone with a cell phone is a registered voter. Replies from minors, non-citizens, and adults who aren't registered to vote will skew the projected support for the candidate. Which age group replies the most is a separate issue: even if senior citizens or middle-aged adults responded more than any other age group (does grandma even know how to text?), they could still be part of the population in question.
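To see how non-voters can skew the result, here is a small calculation with made-up numbers: suppose 7,000 of the 10,000 replies come from registered voters, 40% of whom support the candidate, and 3,000 come from non-voters, 70% of whom do. These figures are purely hypothetical, and `support_rate` is an illustrative helper.

```python
def support_rate(groups):
    """Overall share saying 'yes' across groups given as (replies, share_yes) pairs."""
    total = sum(n for n, _ in groups)
    yes = sum(n * share for n, share in groups)
    return yes / total

# Hypothetical reply breakdown (made-up numbers for illustration only).
voters = (7000, 0.40)      # registered voters: the population we care about
non_voters = (3000, 0.70)  # minors, non-citizens, unregistered adults

print(f"support among all repliers: {support_rate([voters, non_voters]):.2f}")
print(f"support among voters only:  {support_rate([voters]):.2f}")
```

With these assumed numbers, the repliers as a whole show 49% support even though only 40% of the actual voters do.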

3. If you want to conduct a practical study that finds the average height of a Great Dane, which of the following would be a good method of collecting a sample from the population?

Measuring the height of a couple hundred Great Danes across the U.S.

It is unrealistic to measure all Great Danes (not to mention the massive neck strain you'd get). Asking breeders for their opinions is too theoretical if you want to conduct a practical study on height. Answer (D) is just too silly and inaccurate. Since you want a decent-sized random sample, (C) is your best bet.
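Why a couple hundred dogs rather than a handful? For a simple random sample from a large population, a sample mean typically lands within about σ/√n of the population mean, where σ is the population's standard deviation and n is the sample size. The sketch below assumes a hypothetical standard deviation of 5 cm for Great Dane heights (a made-up value) just to show the trend.

```python
import math

def standard_error(sd, n):
    """Typical gap between a sample mean and the population mean for a simple
    random sample of size n from a large population with standard deviation sd."""
    return sd / math.sqrt(n)

# Hypothetical spread in Great Dane heights: 5 cm (made-up value).
for n in (10, 50, 200, 800):
    print(n, round(standard_error(5, n), 2))
```

Quadrupling the sample size cuts the typical error in half, which is why a couple hundred dogs beats a couple dozen.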

4. The mayor of Crimeville wants to increase taxes to invest in public safety. A polling company decides to randomly select 2,500 registered voters in the city and ask them whether or not they would approve of the tax increase. What is the population?

The registered voters in Crimeville

Your population of concern is not every U.S. citizen (many of whom can't vote in Crimeville or won't be affected by the tax increase). It isn't just people over the age of fifty, either, since thirty-year-olds can vote in Crimeville, too. And it isn't the police officers in Crimeville, because the poll goes to registered voters, who aren't necessarily police officers. A random sample must be drawn from the population you want to describe, and in this case, that's the registered voters in Crimeville.
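With 2,500 randomly selected voters, the pollster can roughly quantify how far the sample's approval rate might be from the rate among all registered voters in Crimeville. The sketch below uses the common normal-approximation formula for a 95% margin of error and a hypothetical result of 1,300 approvals; the poll outcome is made up.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll result: 1,300 of the 2,500 sampled voters approve.
n = 2500
p_hat = 1300 / n
moe = margin_of_error(p_hat, n)
print(f"approval: {p_hat:.1%} +/- {moe:.1%}")
```

With a sample of 2,500, the margin of error comes out to about 2 percentage points, so a 52% approval rate in the sample suggests roughly 50% to 54% among all registered voters.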

5. A study conducted on how fast men can run the 100-meter dash found that the athletes in the study were able to do so in an average of ten seconds. Which of the following is a correct inference?

On average, male athletes will run a ten-second 100-meter dash

The study only measured male athletes, so that's the only group we can draw conclusions about. We can't say anything about men in general, so (A) is incorrect. While (B) might seem right, it doesn't distinguish between male and female athletes. This study isn't about women, so both (B) and (C) are incorrect. That means (D) is right. (Can you run a 100-meter dash in less than ten seconds? Make sure to check with your doctor before you try.)

6. A fair six-sided die is randomly tossed to get a sample of 1, 1, 1, 1, 1, and 1. Is this a random sample?

Yes, because the fair die was randomly tossed

While (A) might seem right at first, a fair die can be tossed unfairly (for example, by dropping it from a small height so it lands the same way every time). Answer (B) is incorrect because a randomly chosen sample can produce patterns. It's true that rolling six ones in a row is unlikely, but that isn't what defines a random sample. The correct answer is (C) because the sample was randomly chosen (and in this case, the "choice" was made by randomly tossing the fair die).
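Six ones in a row is unlikely, but so is every other specific sequence of six tosses. A short Python check with exact fractions (the helper `prob_of_sequence` is illustrative) shows that, for a fair die, 1, 1, 1, 1, 1, 1 has exactly the same probability as any other particular sequence.

```python
from fractions import Fraction

def prob_of_sequence(seq, sides=6):
    """Exact probability of seeing this exact sequence of tosses on a fair die."""
    return Fraction(1, sides) ** len(seq)

six_ones = [1, 1, 1, 1, 1, 1]
mixed = [3, 1, 4, 1, 5, 2]

print(prob_of_sequence(six_ones))  # 1/46656
print(prob_of_sequence(mixed))     # 1/46656
```

The all-ones result only looks special to us; to the die, it is one outcome among 46,656 equally likely ones.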

7. To figure out how in the world Las Vegas, Nevada, has survived in the middle of the desert, and to plan for its future, the mayor must determine how much water Las Vegas uses per square foot per year. Which of the following would be the most accurate way to determine this information?

Requiring every building to report its square footage and yearly water use

Collecting data from every building is much more reasonable than collecting data for every individual. Also, the water consumption of the vast number of businesses in Las Vegas wouldn't be reflected in people's individual reported water consumption. As for (C), an educated guess is still just a guess. Hopefully it was obvious that (D) had little to do with water consumption in general.
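Because every building reports, the city-wide figure is just total water divided by total square footage, with no sampling needed. The three buildings and their numbers below are hypothetical, the units (square feet, gallons per year) are assumptions for illustration, and `water_per_square_foot` is an illustrative helper.

```python
# Hypothetical building reports: (square feet, gallons of water used per year).
reports = [
    (2_000, 60_000),
    (15_000, 900_000),
    (1_200, 30_000),
]

def water_per_square_foot(reports):
    """City-wide water use per square foot: total gallons divided by total square feet."""
    total_sq_ft = sum(sq_ft for sq_ft, _ in reports)
    total_gallons = sum(gallons for _, gallons in reports)
    return total_gallons / total_sq_ft

print(f"{water_per_square_foot(reports):.1f} gallons per square foot per year")
```

Dividing totals, rather than averaging each building's own ratio, lets a large building count in proportion to its size.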

8. Your school's kitchen manager needs to find out how many packets of Choco-Chips will be sold per month in the cafeteria. Since these are a new chip brand, there is no prior information about Choco-Chips sales. Which of the following is the best way to find out?

Buy a small amount for the first month, observe initial sales, and then make a judgment from there based on how many were bought and how to proceed

While your first instinct might be to say (A), it's risky to rely on what the survey group predicts. We need hard data to draw our own conclusions. Answer (C) may seem like another viable answer, but Choco-Chips are a different flavor and brand, so sales of other chips are a poor guide to how they'll sell. As educated as the principal may be, we need real data to draw from. That leaves (B) as our only answer.

9. A park ranger has been tasked with figuring out how many baby coyotes are in the typical litter. To do so, the ranger finds twenty breeding female coyotes and averages the size of their litters. In this study, which of the following is the population parameter?

The number of newborn coyotes per litter

Think of parameters as the ultimate number we want to find, the defining characteristic of our population relative to our study. Since we're looking for the average size of a litter, (D) is our population parameter. Finding (C) won't help us find the average litter unless we have (A), and finding (B) is essentially the same as finding (A).

10. Why is picking different candies out of a bag without looking a less effective way to get a random sample than assigning numbers to each piece of candy and letting someone else pick those numbers randomly instead?