# High School: Statistics and Probability

### Interpreting Categorical and Quantitative Data HSS-ID.C.8

8. Compute (using technology) and interpret the correlation coefficient of a linear fit.

We all know what happens when you assume. Yeah, it makes a...fool...out of you and me.

Students should know that we need to check our assumptions, especially in math. So, for example, if we fit a linear model to a set of data, we can check whether that assumption was at least somewhat appropriate. We don't want to make a fool of the data, and we certainly don't want the data to make a fool out of us.

The correlation coefficient is a number that measures the strength of association between two variables. In particular, the Pearson product-moment correlation coefficient is a measure of the linear association between two variables. It was named after Karl Pearson, who's the reason your students are studying statistics since he is considered the "father" of the field. Tell your students to pelt him with spitballs.

We're sure Pearson won't mind if your students just call his coefficient the "correlation coefficient." They should, however, remember that it has the symbol r and that it ranges from -1 to 1. A coefficient equal to 1.0 indicates a perfect positive linear correlation: every data point falls on a line with positive slope. More generally, a positive r means that as the independent variable (x) increases, the dependent variable (y) tends to increase too.

A correlation coefficient equal to -1.0 indicates a perfect negative linear correlation, and any negative r means that as the independent variable (x) increases, the dependent variable (y) tends to decrease. Positive is positive, negative is negative. Hopefully not earth-shattering for your students.

If the coefficient equals 0, we have made an incorrect assumption. The data has made a fool of us and there is no linear correlation. However, just because the linear correlation coefficient equals 0 doesn't mean the variables are unrelated. They could still follow another kind of pattern, such as a curve.
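To see how r can equal 0 even when the variables are clearly related, here is a minimal Python sketch. The helper name `pearson_r` and the data are ours, made up for illustration: y is completely determined by x, just not along a line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect parabola, y = x^2, symmetric around x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(round(r, 3))  # the left and right halves cancel, so r comes out as 0
```

A scatter plot of these points would show an obvious U shape, which is one reason to look at the plot and not just the number.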

In addition to being positive or negative, the correlation coefficient can be weak or strong. Strong correlations can bench press 400, while weak ones can barely lift a dumbbell. The closer r is to -1 or 1, the stronger the correlation. An arbitrary cutoff for a strong correlation is an r less than -0.8 or greater than 0.8. If r is between -0.5 and 0.5, we consider that a weak correlation and send the data back to the gym.
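The cutoffs above can be sketched as a small Python function. The name `describe_r` is ours, and so is the "moderate" label for values that fall between the weak and strong cutoffs, since the text above doesn't name that middle zone.

```python
def describe_r(r):
    """Describe a correlation coefficient using the rule-of-thumb cutoffs."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size > 0.8:
        strength = "strong"
    elif size < 0.5:
        strength = "weak"
    else:
        # Assumption: the middle zone gets its own label here.
        strength = "moderate"
    return f"{strength} {direction}"

for r in (0.9, -0.4, -0.85):
    print(r, "->", describe_r(r))
```

Remember that these cutoffs are arbitrary. Other textbooks draw the lines in different places.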

Students should know that the correlation coefficient can be calculated with the following formula:
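A common way to write it, assuming $n$ data pairs $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$ (this notation is ours), is:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$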

This equation is pretty complicated and, quite honestly, a bit of a pain to use. If you want to make your students calculate it by hand, make sure they do so carefully. Luckily, there are lots of ways to use technology to calculate this value. Students can use their TI calculators or Excel. Hip, hip, hooray for technology.
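If a calculator or Excel isn't handy, a few lines of Python can do the same job. The sketch below uses a sums-only version of the Pearson formula that is convenient for hand or spreadsheet work, $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}$. The data set is made up for illustration.

```python
import math

# Hypothetical (x, y) pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# The five sums a by-hand table would have as column totals.
sum_x = sum(xs)                               # 15
sum_y = sum(ys)                               # 20
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 66
sum_x2 = sum(x * x for x in xs)               # 55
sum_y2 = sum(y * y for y in ys)               # 86

numerator = n * sum_xy - sum_x * sum_y        # 30
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)                                             # sqrt(50 * 30)

r = numerator / denominator
print(round(r, 4))  # 0.7746
```

Students who compute r by hand can check their column totals against the intermediate sums before worrying about the final division.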

#### Drills

1. What does the correlation coefficient tell us?

Measure of the linear association between two variables

The correlation coefficient is a measure of the linear association between two variables. It measures that, and only that.

2. The correlation coefficient between two variables is 0.9. How would you describe this value?

Strong and positive

Just like normal numbers that everyone's used to, the correlation coefficient is positive because it's greater than 0. That leaves (A) or (C). So how can we tell whether this coefficient has been at the gym or not? If it's less than -0.8 or greater than 0.8, r is considered strong. Since 0.9 is greater than 0.8, we have a strong positive correlation coefficient here.

3. The correlation coefficient between two variables is -0.4. How would you describe this value?

Weak and negative

This correlation coefficient is quite obviously negative. Hopefully you picked up on that, at least. Since it's between -0.5 and 0.5, it's considered weak. Drop and give me twenty, coefficient.

4. We assume that SAT score is linearly associated with GPA and determine the correlation coefficient to be 0.8. What does this value suggest?

SAT score increases as GPA increases

The correlation coefficient is positive, which means that as the independent variable increases, the dependent variable tends to increase as well. At 0.8, the coefficient sits right at our cutoff for a strong correlation, so it supports the assumption that there is a linear relationship. A strong, positive correlation means that when one variable (either GPA or SAT score) increases, the other tends to increase too. The only answer choice that expresses this conclusion is (D).

5. Which of the following values for r suggests a strong negative correlation?

-0.85

For the coefficient to be strong (we mean six-pack abs strong), it has to be less than -0.8 or greater than 0.8, which narrows our options to 0.95 or -0.85. For the correlation to be negative, meaning that the dependent variable decreases as the independent variable increases, the value needs to be negative. That means (C) is the right answer.

6. What is the symbol for the Pearson product-moment correlation coefficient?

r

No worries about remembering the entire name for the correlation coefficient, but it is important to know that the symbol r refers to the correlation coefficient and describes the linear association between two variables.

7. Which of the following is true about the data represented below?

The data depicts a negative weak correlation.

The value of the correlation coefficient tells us how closely the data follow a line. Since the y values seem to decrease as x increases (and r = -0.3), we know that there is some kind of negative correlation. According to the plotted data and the fact that r isn't too far from 0, the correlation isn't that strong, so we can eliminate (C). Since we know r is negative, the only answer that works is (B).

8. The following data presents the SAT score and GPA for 8 students. If we assume that SAT score is dependent on GPA, which variable is x and which is y?

GPA is x, SAT score is y

If we assume that the SAT score is dependent on GPA, then SAT score is the dependent variable or the y variable, and GPA is the independent variable or x. Of course, (A) and (D) don't make sense at all, since one variable can't be both x and y.

9. The following figure displays a graph showing GPA and SAT score. Based on the scatter plot, which of the following is the best assumption about the correlation between the variables?

Positive linear correlation

Hopefully we can tell just by looking that (B) is wrong. The two variables are related in some way. Based on the scatter plot, the SAT score increases with the GPA, so there is a positive linear correlation. Exponential and negative linear correlations wouldn't make sense: as GPA rises, the SAT score would have to either skyrocket (to well above the maximum) or decrease. That's just illogical.

10. What is the correlation coefficient for the assumption that SAT score is dependent on GPA?