
# Common Core Standards: Math

### Statistics and Probability 8.SP.A.2

2. Know that straight lines are widely used to model relationships between two quantitative variables. For scatter plots that suggest a linear association, informally fit a straight line, and informally assess the model fit by judging the closeness of the data points to the line.

You know the old saying, "A picture is worth a thousand words"? When it comes to math, a picture is probably worth a million. Sometimes (especially when it comes to statistics), one picture or graph can give you a crystal-clear look at what's going on, while a list of numbers just leaves you looking for the nearest exit.

One awesome example of this is a scatter plot. This shows the relationship between two different quantities. It gives you an instant idea of what the data is doing: whether it's grouped in some sort of a pattern or scattered all over the place (hence the name!).
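To see why the picture helps, here is a minimal text-only sketch of a scatter plot in Python. The function name `ascii_scatter`, the default grid size, and the `*` marker are illustrative choices, not part of any real plotting library; in practice you'd usually make a scatter plot with graphing software or a calculator.

```python
def ascii_scatter(points, width=20, height=10):
    """Draw (x, y) points on a text grid, marking each point with '*'."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against dividing by zero when every x (or every y) is the same.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale x to a column index, from 0 (left) to width - 1 (right).
        col = round((x - x_min) / x_span * (width - 1))
        # Scale y to a row index; row 0 is the top, so larger y values sit higher.
        row = round((y_max - y) / y_span * (height - 1))
        grid[row][col] = "*"
    rows = ["|" + "".join(cells) for cells in grid]
    return "\n".join(rows) + "\n+" + "-" * width


# Invented data with a rising trend.
print(ascii_scatter([(1, 2), (2, 3), (3, 5), (4, 4), (5, 7), (6, 8)]))
```

Even this rough picture makes the upward drift easier to spot than the list of six number pairs does.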

Usually, the data does follow a pattern, and fairly often, this pattern takes the shape of a line. Mathematicians, being who they are, immediately want to know which line the data is closest to, because that line can give us a good approximation of where other data points might fall.

The easiest way to find this line is to take a straightedge (a ruler, the edge of a folded piece of paper, or our favorite, a piece of uncooked spaghetti) and lay it across the plot so that it passes as close as possible to as many of the points as possible. This line is called the line of best fit. (Yeah, every once in a while, mathematicians actually use a name that tells you what the thing is. This is one of those times.)
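The spaghetti method is deliberately informal, which is exactly what the standard asks for. For comparison, here is a minimal sketch of least-squares regression, one common formal way statisticians pick a line of best fit. It goes beyond what 8.SP.A.2 requires, and the helper name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of (x - x_mean)(y - y_mean) divided by sum of (x - x_mean)^2.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the line would be vertical")
    slope = sxy / sxx
    # The least-squares line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


print(least_squares_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```

The made-up points (1, 3), (2, 5), (3, 7), (4, 9) sit exactly on y = 2x + 1, so the sketch recovers a slope of 2 and an intercept of 1. With real, scattered data, the line will miss some points, just like a well-placed piece of spaghetti.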

Students should understand that in many cases, a line of best fit can be drawn on a scatter plot and gives a good way to understand the relationship between the variables. They should also be able to informally judge how well the line fits by looking at how closely the data points cluster around it.
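One way to turn "judging the closeness" into a number is to average the vertical distances from each point to the line. This is a sketch under our own assumptions: the helper name `mean_vertical_gap` and the choice of vertical (rather than perpendicular) distance are illustrative, and the data is invented.

```python
def mean_vertical_gap(xs, ys, slope, intercept):
    """Average vertical distance from each point to the line y = slope * x + intercept."""
    gaps = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return sum(gaps) / len(gaps)


# Invented data and two hand-drawn candidate lines.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
print(mean_vertical_gap(xs, ys, 2, 0))  # 0.25, for y = 2x
print(mean_vertical_gap(xs, ys, 1, 1))  # 1.25, for y = x + 1
```

The smaller average gap says the line y = 2x hugs these points more closely than y = x + 1 does, which matches what you'd see by eye.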

Sometimes, the "line" of best fit isn't a line at all, but a curve of some sort. It's possible for the best fit to be a parabola or an exponential curve. Just because there isn't a line of best fit doesn't mean the variables aren't somehow related. And if the variables aren't related at all, then no graph (line or curve) will be a "best fit." On the other hand, students should know that plenty of relationships are linear (or close enough to linear) that a line of best fit does the job.

#### Drills

1. Does all bivariate data suggest a linear relationship?

No, because some variables are not correlated linearly or at all

"Bivariate data" just means that we have two variables. While linear relationships are common (and easier to deal with) among bivariate data, they aren't always the case. We could still draw scatter plots of nonlinear or uncorrelated data, so (A) and (B) are wrong, and (D) is wrong, too. While we could try fitting a line to a nonlinear relationship, it probably wouldn't be that useful. The right answer is (C) because some variables, like the total US deficit and the squirrel population, have nothing to do with one another. (Unless squirrels have been rolling in our dough this whole time…)

2. Is there any rule about the slope of the line of best fit?

There is no rule for the slope

You already know the deal with straight lines. A line's slope can be positive, negative, zero, a whole number, a fraction—and even undefined. If we're trying to find the best relationship between these variables, why limit ourselves with rules for the slope? Nah, we'll just take a hint from algebra and let the slope be whatever it wants.

3. How does the line of best fit help statisticians?

All of the above

The line of best fit acts as a generalization—the most mathematically sound estimation of how the data points behave and how the two variables relate to each other. That means (C) is right, but we shouldn't stop there. We can use this tool to figure out how the variables are connected and predict how future data will look. Since all of these would be useful to statisticians, the right answer is (D).
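Predicting with a line of best fit just means reading a value off the line. A minimal sketch, assuming a hypothetical line y = 2x + 1 that was fitted earlier; the function name `predict` is illustrative.

```python
def predict(slope, intercept, x):
    """Read the estimated y value off the line y = slope * x + intercept."""
    return slope * x + intercept


# Hypothetical line of best fit: y = 2x + 1.
slope, intercept = 2, 1
for new_x in [5, 6]:
    # These are estimates; they're most trustworthy near the x values the line was fitted on.
    print(new_x, predict(slope, intercept, new_x))
```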

4. What happens if some of the points are not on the line of best fit?

The line is not meant to go through every point, as long as it stays close to most of them

It's incredibly rare for actual data to form an exact line. Even the simplest experiment has so many sources of error and variability that a perfectly straight line is almost impossible. In statistics, we're more concerned with trends than with hard-and-fast rules. As long as the line of best fit stays close to most of the points, we should be fine. Points off the line don't mean the data was collected or plotted incorrectly or that the line should be altered. It simply means the line of best fit isn't the line of perfect fit. But hey, nobody's perfect, right?
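To see that missing points is perfectly normal, this sketch computes each point's signed vertical gap from a line (its residual) and counts how many points fall above, below, and exactly on it. The data and the line y = 2x + 1 are made up for illustration, and the function names are hypothetical.

```python
def residuals(xs, ys, slope, intercept):
    """Signed vertical gap per point: positive above the line, negative below, 0 on it."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def count_sides(res):
    """Return (points above the line, points below it, points exactly on it)."""
    above = sum(1 for r in res if r > 0)
    below = sum(1 for r in res if r < 0)
    return above, below, len(res) - above - below


# Invented data and a hypothetical line y = 2x + 1.
res = residuals([1, 2, 3, 4, 5], [4, 4, 8, 8, 11], 2, 1)
print(res)               # [1, -1, 1, -1, 0]
print(count_sides(res))  # (2, 2, 1)
```

Only one of the five points lands exactly on the line, but the misses are small and split evenly above and below, which is the kind of picture an informal line of best fit usually aims for.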

5. Which of the following statements most accurately describes this scatter plot?

A line of best fit can be drawn because the variables appear to be negatively associated

Looking at the general trend of the data, we can see that as one variable increases (say, the one on the horizontal axis), the other (the one on the vertical axis) decreases. That's a negative linear association, so we can draw a line of best fit. Sure, there might be an outlier down in the corner, but it's an oddball and not reflective of the majority of the data. The association is linear, so (C) and (D) are wrong, and it's negative, so (A) is wrong, too.
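If you want a number to back up what your eyes see, the Pearson correlation coefficient r is one standard measure of linear association. It's negative when one variable tends to fall as the other rises, and it gets closer to -1 or 1 as the points hug a line more tightly. This goes beyond what 8.SP.A.2 asks, and the helper name `correlation` and the data are illustrative.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r, which always lies between -1 and 1."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    # r is undefined if either variable never changes.
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return sxy / math.sqrt(sxx * syy)


# Invented data: as x goes up, y goes down.
print(round(correlation([1, 2, 3, 4], [7, 6, 4, 1]), 2))  # -0.98
```

A value this close to -1 signals a strong negative linear association, the same conclusion you'd reach by eyeballing the scatter plot.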

6. How useful would a line of best fit be for this scatter plot?

Very useful, since it'll be close enough to the data points to give reasonable approximations

There's no way we could draw a single straight line and hope to hit every data point on this scatter plot, so (A) missed the boat on this one. While (B) and (C) are both contenders, it should also be clear that (D) isn't right. The data points might be a bit spread out, but there's still a clear trend going on, definitely enough to apply a line of best fit. If you think (C) is right, consider that one outlier can result from a million different sources. A line of best fit looks for the general trend of the data, and it would serve us pretty well in this case.

7. How useful would a line of best fit be for this scatter plot?

Possibly useful, but the data points don't match up that well

The trend here isn't as strong as it could be, but there does appear to be some slight clustering. There's no way (A) is true, and (B) is probably a slight stretch. We could apply a line of best fit that would have a positive linear association, but the data is spread out pretty significantly. Since there does appear to be some linear correlation, we can eliminate (D), which means our best bet is (C).

8. How useful would a line of best fit be for this scatter plot?

Not useful at all, since the data is not linearly correlated enough to apply a line of best fit

The data here looks like a giant U, which is definitely not a linear relationship. There's no way we can fit a single straight line through all the points. A parabola would be useful, but we're talking about lines here, so (A) and (B) are out. While there is definitely a pattern to the data points, it's not something to which we can apply a line of best fit. (Well, we could, but it wouldn't do us much good.) While we could argue that (C) is right because the data points don't match up that well, (D) is the better answer because a line would be virtually useless to describe the pattern here and the pattern is definitely not linear.
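A quick calculation shows why a line struggles with U-shaped data. This sketch fits a least-squares line (a formal method used here only for illustration) to invented points on y = x² and prints the leftover gaps. The function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Invented U-shaped data: y = x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
slope, intercept = least_squares_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept)  # 0.0 4.0
print(residuals)         # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The "best" line turns out to be flat (y = 4), and the gaps are positive at both ends and negative in the middle. That curved pattern in the leftovers is a telltale sign the relationship isn't linear.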

9. Why is a scatter plot sometimes better to use than a list of the points?

The scatter plot gives a good sense of data trends and overall behavior

While you might think that (D) is right—because scatter plots are just visual interpretations of a list of points—the effects are drastically different. Scatter plots are excellent tools for looking at the general trends in data and how it changes. Imagine trying to envision how a particular building looks: would you prefer a list of all the different rooms or would you want a blueprint of the floor plan? The data should be the same regardless of how it's expressed, but (C) is right because of how the data is seen and understood.

10. If the data points all line up on the line of best fit, what does this say about the data?