# Common Core Standards: Math

### Statistics and Probability 8.SP.A.1

1. Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.

Many esteemed statisticians have discovered that there appears to be a strange link between Justin Bieber's singing and the population of capuchin monkeys in Argentina. Whenever Bieber hits a high note, there is a spurt in the number of baby capuchins. The higher the frequency of the note, the more baby monkeys are born.

We could list all the different frequencies of Bieber's voice and the number of baby monkeys that were born at those high notes. But that's just a list of numbers, and most people will probably give up on making sense of it long before they can be convinced.

A better way to see whether there's a relationship between this bivariate data (that simply means two different sets of numbers) would be to make a scatter plot.

Along the bottom, we'll list the frequencies of Bieber's high notes while in concert. On the side, we'll list the number of baby monkeys that were born at those frequencies. Then, we'll put a dot on the graph corresponding to the note and number of monkey births. Turns out, our graph looks like this:

What exactly could we deduce from this data? Well, there certainly seems to be a relationship between high notes and the birth of baby capuchins; the higher the notes, the more babies.
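To see how those dots end up where they do, here is a minimal sketch that draws a scatter plot out of plain text. The numbers are invented purely for illustration (nobody has actually measured Bieber's high notes against monkey births), and the `text_scatter` helper is a hypothetical name, not part of any library.

```python
# Hypothetical (frequency in Hz, baby monkeys born) pairs, made up for this example.
data = [(300, 1), (350, 2), (400, 2), (450, 4), (500, 5), (550, 5), (600, 7)]

def text_scatter(points, width=30, height=8):
    """Return a rough scatter plot as a list of strings, one per row.

    Assumes the points do not all share the same x value or the same y value.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column (horizontal axis) and a row (vertical axis).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # The first row is printed at the top, so flip it to put large y values up high.
        grid[height - 1 - row][col] = "*"
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]

rows = text_scatter(data)
for line in rows:
    print(line)
```

With this made-up data the stars climb from the lower left to the upper right, which is the pattern the paragraph above describes.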

Students should understand scatter plots as ways to communicate relationships between two variables. The more closely the points follow a straight line, the stronger the linear correlation. Students should also be able to identify and define outliers and clusters and give possible reasons for their existence. For instance, holding out a shaky high note might result in a few clusters around that particular frequency, not to mention a few fans clustering toward the exit.

Students should also be able to interpret scatter plots as having linear or nonlinear associations, and discern whether these associations are positive or negative. They can think of positive and negative as describing the "slope" of the data. If there's a positive association, both variables increase together. If there's a negative association, one increases while the other decreases. We don't mean "positive" or "negative" for the capuchin monkey population. Let PETA take that one on.
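One way to make the "slope" idea concrete is to fit a straight line through the dots and check which way it tilts. The sketch below uses the standard least-squares slope formula. The data sets and the helper names (`least_squares_slope`, `direction`) are assumptions made up for illustration.

```python
def least_squares_slope(xs, ys):
    """Slope of the best-fit straight line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

def direction(xs, ys):
    """Describe the association by the sign of the best-fit slope."""
    slope = least_squares_slope(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "no linear trend"

# Hypothetical data, invented for this example.
frequencies = [300, 350, 400, 450, 500, 550, 600]
monkey_births = [1, 2, 2, 4, 5, 5, 7]          # rises as the notes get higher
fans_remaining = [90, 85, 83, 70, 66, 60, 41]  # falls as the notes get higher

print(direction(frequencies, monkey_births))   # both increase together
print(direction(frequencies, fans_remaining))  # one goes up, the other goes down
```

The sign of the slope only tells us the direction of the association. It says nothing about whether the outcome is good or bad for anyone involved.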

#### Drills

1. Which type of scatter plot would suggest a positive association?

When one variable increases, the other increases

A positive association is like having a line with a positive slope. As the x variable increases (the one on the horizontal axis), the y variable increases (the one on the vertical axis). So (A) is the right answer because both variables are increasing. While a positive association suggests some sort of clear relationship (better than a murky one, right?), it doesn't have to mean a positive outcome, or even a linear association. That's why (D) doesn't quite make the cut.

2. What does an outlier indicate?

None of the above

An outlier is an odd piece of information that doesn't follow the trend. Like being a hipster, before being a hipster became the trend. Having an outlier means there must be some sort of trend that this particular point doesn't follow, so (A) can't be right. Of course, the outlier doesn't tell us anything about the correlation between the variables because, as an outlier, it doesn't follow it. So (B) is wrong. We may want to go with (C), but even though the outlier itself doesn't tell us much about the correlation, we can still determine the correlation from the rest of the data. So (D) is the only answer that makes sense.

3. What is meant by data that "clusters"?

Data that all lands in one particular part of the graph

Your guess is as good as ours. Not because we don't know, but because it's pretty clear from the word "clusters." Just like cereal clusters that group together in a bunch, data that clusters is a collection of data points all in roughly the same region of the scatter plot. The word "cluster" doesn't suggest a linear relationship, nor does it indicate data that's spread out. Although it might sound a bit vague, there still exists some sort of pattern in the word "cluster," so (B) isn't right, either.

4. Which would be an example of a positive association?

The more hours spent reading, the higher the verbal SAT scores of students

A positive association is like a line with a positive slope or direct variation: when one variable increases, so does the other one. It doesn't necessarily have to have a positive effect. If we look at each choice, we see that (A), (B), and (D) are negative associations because one variable increases (gas prices, Calorie intake, and hours online) while the other decreases (miles driven, temperature, hours of sleep). The only one that isn't is (C), in which both variables (hours reading and SAT score) increase.

5. Which would be an example of a negative association?

The more hours Aunt Tina spends exercising, the lower her weight

Negative associations don't have to be negative in context. All it means is that when one variable increases, the other decreases. In this case, the only negative association is (A), the one between Aunt Tina's exercise regimen and her weight. As her hours of exercise increase, her weight decreases. While it's a healthy, positive outcome, the association itself is negative because one variable goes up while the other goes down. The rest are positive because both variables in (B), the number of texts and phone bill payment, and both variables in (C), the hours spent playing piano and GPA, increase.

6. What is the purpose of a scatter plot?

To better observe the correlation between the variables

Scatter plots show us what's happening in a nice, neat little graph. As much as they might want to, they can't prove or verify anything by themselves. Not only that, but it takes way more than a correlation to prove causation, so (A) is definitely wrong. It's also true that scatter plots don't always show linear associations. Sometimes there can be clear-cut relationships between two variables that aren't linear. Since (A) and (C), and therefore (D), are all untrue, (B) is the only answer left over.

7. Which of the following is least likely to have a linear relationship?

Age and zip code

Aside from some retirement communities (and the whole of Miami), your age is probably not at all related to where you live. Each of the others would probably exemplify a linear relationship much better than (B) because the two variables involved are linked by more than just denture cream. Gross.

8. Which of the following is untrue about scatter plots?

If two variables are correlated, their scatter plot will show a linear relationship

While (A) and (B) are true, (C) isn't always the case. The key here is that there's more than one type of correlation because not all correlation between variables has to be linear. For instance, mobility and age aren't linearly correlated but they are still related in some way and would still show up on a scatter plot as something other than a line.
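To see why "related" and "linear" aren't the same thing, here is a small sketch, assuming made-up data rather than real age and mobility figures. It computes the Pearson correlation coefficient (a standard measure of linear association) for a perfectly straight pattern and for a perfectly curved one. The helper name `pearson_r` is our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: measures linear association only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical x values, symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
straight = [2 * x + 1 for x in xs]  # points on a straight line
curved = [x * x for x in xs]        # points on a U-shaped curve

print(pearson_r(xs, straight))  # as strong as linear association gets
print(pearson_r(xs, curved))    # no linear association, yet y depends entirely on x
```

In the curved case every y value is completely determined by x, and the scatter plot would show a clear U shape, but a straight line fits it no better than a flat one. That's exactly the kind of relationship a scatter plot can reveal even when it isn't a line.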

9. Why is it important to label the axes for a scatter plot?

To prevent misinterpretation of the data

Without labeled axes, it's easy to confuse the variables or see a relationship that might not actually be there. To help make sure the scatter plot is crystal clear, it's best to always label the axes. It won't make your scatter plot any easier to draw, but it'll certainly prevent misinterpretation of your data. (Both you and your math teacher will be happy about that.)

10. How could data in a scatter plot possibly be wrong?