# High School: Statistics and Probability

### Interpreting Categorical and Quantitative Data HSS-ID.B.6

6. Represent data on two quantitative variables on a scatter plot, and describe how the variables are related.

Even if they don't seem like it sometimes, people are complex and endlessly interesting. Every person is the combination of so many different traits—hair color, eye color, weight, height, nationality, sex, gender, age—that it's practically impossible for any two to be identical.

Students should already know that these different traits are called variables because... well, they vary. Some variables have nothing to do with one another (your age doesn't affect your eye color, does it?), but others, like height and weight, are related in some way. An excellent way to plot out the relationship between two variables is to use a scatter plot.

Students should know to first assign variables to axes. Usually, the dependent variable is on the vertical (y) axis and the independent variable is on the horizontal (x) axis. So if we want to figure out if a person's weight depends on their height, the height will go on the horizontal axis and weight will go on the vertical axis. It should look something like this.

Then, one by one, we can plot points from the data table onto the graph, just like on the x-y coordinate plane. Here, the x coordinate is the height and the y coordinate is the weight. After we've plotted all our points, we should have our completed scatter plot.
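As a minimal sketch of that step, the Python below turns a small data table into (x, y) points ready for plotting. The six rows and the field names `height_in` and `weight_lb` are invented for illustration; they are not the data from this lesson.

```python
# Hypothetical data table: one row per person (values invented for illustration).
table = [
    {"height_in": 60, "weight_lb": 115},
    {"height_in": 62, "weight_lb": 120},
    {"height_in": 64, "weight_lb": 130},
    {"height_in": 66, "weight_lb": 140},
    {"height_in": 68, "weight_lb": 150},
    {"height_in": 70, "weight_lb": 160},
]

# Independent variable (height) becomes x, dependent variable (weight) becomes y.
points = [(row["height_in"], row["weight_lb"]) for row in table]

for x, y in points:
    print(f"plot a point at x = {x}, y = {y}")
```

Each tuple is one dot on the scatter plot: move right to the height, then up to the weight.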

Now what? We've plotted the points, but we still haven't answered how a person's height affects their weight. Well, we can see that as a person's height increases, their weight tends to increase, too. While we can't prove any concrete relationship just yet, we can say that the two variables are correlated. Weight appears to depend on height, and while we aren't exactly sure how, we can fit a function to the data to describe the relationship.
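One common way to put a number on "correlated" is the Pearson correlation coefficient, which this lesson does not introduce, so treat the sketch below as optional background. The height and weight values are invented, and `pearson_r` is a hypothetical helper name. Values near 1 mean the points climb together in a nearly straight line.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented heights (inches) and weights (pounds), not data from the lesson.
heights = [60, 62, 64, 66, 68, 70]
weights = [115, 120, 130, 140, 150, 160]

r = pearson_r(heights, weights)
print(round(r, 3))
```

A strong correlation still doesn't prove that one variable causes the other; it only says the two move together.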

#### Drills

1. The air and water temperatures at 3 PM at a popular summer swimming hole were recorded for a week. It is thought that the water temperature depends on the air temperature. Which is the correct scatter plot showing the dependence of water temperature on air temperature?

The key here is to look at the x and y axes. Since the water temperature depends on the air temperature, the water temperature (TW) is the dependent variable and the air temperature (TA) is the independent variable. In other words, TW should go on the vertical axis and TA should go on the horizontal axis. Only (A) plots the points and assigns the axes correctly.

2. The air and water temperatures at 3 PM at a popular summer swimming hole were recorded for a week. It is thought that the water temperature depends on the air temperature. If you wanted to fit a function to this data, what form will the function likely be?

Linear

Reverse what? Yeah, let's just get rid of (C) as a potential answer. Quadratic? Exponential? Can you imagine if water temperature increased exponentially with respect to air temp? Ouch! That would make for some scorching pools. The scatter plot shows a clear linear relationship. The water temperature appears to increase at a roughly constant rate as the air temperature rises.

3. Which of the following is an example of a scatter plot that follows an exponential relationship?

Exponential means the dependent variable grows faster and faster as the independent variable increases. There doesn't seem to be any correlation in (D), and (A) is very clearly linear. While (C) increases at first, it goes back down, which is parabolic. That means (B) is the only exponential relationship.

4. Which of the following is an example of a scatter plot that follows a quadratic relationship?

A quadratic function is one that is curved like a parabola. A quadratic function is good at describing the number of people at a party with respect to time. Only a few people show up at the beginning, but eventually the party reaches its climax. Then, after a certain amount of time, people get tired and start to leave until there are only a few people left again. The only graph that takes that shape here is (C).
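One way to tell the three shapes in these drills apart is to look at how much y changes from one point to the next. The sketch below uses made-up formulas (not the drill graphs) and a hypothetical helper, `first_differences`, to show the pattern for each shape.

```python
def first_differences(values):
    """Return the change between each pair of consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


xs = [0, 1, 2, 3, 4, 5]  # evenly spaced x values (made up)

linear = [3 * x + 2 for x in xs]             # straight line
exponential = [2 ** x for x in xs]           # doubles at every step
quadratic = [-(x ** 2) + 5 * x for x in xs]  # rises, then falls, like the party

linear_steps = first_differences(linear)            # same step every time
exponential_steps = first_differences(exponential)  # steps keep growing
quadratic_steps = first_differences(quadratic)      # steps shrink, then go negative

print(linear_steps)       # [3, 3, 3, 3, 3]
print(exponential_steps)  # [1, 2, 4, 8, 16]
print(quadratic_steps)    # [4, 2, 0, -2, -4]
```

Real scatter plots are noisier than these clean formulas, so the steps will only roughly follow these patterns.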

5. It is thought that the score on a particular math test is dependent on the number of hours spent studying and that the equation used to describe the score is: y = 50 + 3.5x. What does x in the equation represent?

Number of hours student studied

Usually, x is the independent variable, the one that does not depend on the other. In the problem statement, it was suggested that the score on a math test depends on the number of hours studied. The score is the dependent variable and the number of hours spent studying is the independent variable. The problem-solving skills and quick reflexes required for Tetris shouldn't be dismissed, but we doubt they'll help you on your next math test.

6. It is thought that the score on a particular math test is dependent on the number of hours spent studying and that the equation used to describe the score is: y = 50 + 3.5x. If a student studied for 10 hours, what is his predicted score?

85

Ten hours?! That is a long time to study for a test (although it might feel like less with Shmoop). But it should pay off, since the student's predicted score is an 85. We know this because the score is given by y = 50 + 3.5(10) = 50 + 35 = 85.
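The same substitution can be written as a tiny Python function. The name `predict_score` is hypothetical; the numbers 50 and 3.5 come straight from the drill's equation.

```python
def predict_score(hours_studied):
    """Predicted test score from the drill's model y = 50 + 3.5x."""
    return 50 + 3.5 * hours_studied


print(predict_score(10))  # 85.0
```

Plugging in 0 hours gives 50, which is the score the model predicts for a student who doesn't study at all.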

7. It is thought that the score on a particular math test is dependent on the number of hours spent studying and that the equation used to describe the score is: y = 50 + 3.5x. A student who studied for 6 hours earned a score of 82. What is the residual for this score?

11

The residual is always the actual (observed) value minus the predicted value. The predicted value in this case is y = 50 + 3.5(6) = 71, and 82 – 71 = 11. A positive residual means the student did better than predicted. That's always a nice surprise for teachers (and even more so for the student).
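The arithmetic can be checked with a short Python sketch. It assumes the common textbook convention of observed minus predicted, under which beating the prediction gives a positive residual; subtracting in the opposite order gives the same size with the opposite sign. The function names are hypothetical.

```python
def predict_score(hours_studied):
    """Predicted test score from the drill's model y = 50 + 3.5x."""
    return 50 + 3.5 * hours_studied


def residual(actual, predicted):
    """Observed minus predicted; subtracting the other way flips the sign."""
    return actual - predicted


predicted = predict_score(6)   # 71.0
gap = residual(82, predicted)  # 11.0

print(predicted, gap)
```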

8. When finding the best fit linear equation, which of the following are we trying to minimize?

The sum of the squares of the residuals

To determine the best fit linear function, we apply the method of least squares (not least circles or triangles!). Residuals can be positive or negative, so if we simply added them up, the positives and negatives would cancel out, and minimizing the plain sum would just push the line far above every point until the residuals were super negative. That's not what we want. We can't minimize the dependent variable values, since those come from the data, and while the range of the residuals might seem like a good choice intuitively, it only looks at the two most extreme residuals and ignores all the rest. Squaring each residual makes every miss count as a positive amount, so minimizing the sum of the squares keeps the line as close as possible to all of the points at once. That's why (B) is the right answer.
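A small sketch makes the cancellation problem concrete. It reuses the test-score equation y = 50 + 3.5x from the earlier drills with three invented students; the data and the helper name are assumptions for illustration only.

```python
def predict_score(hours_studied):
    """Predicted test score from the model y = 50 + 3.5x."""
    return 50 + 3.5 * hours_studied


# Invented (hours studied, actual score) pairs, not data from the drills.
students = [(2, 62), (6, 66), (10, 85)]

# Residual = actual score minus predicted score for each student.
residuals = [score - predict_score(hours) for hours, score in students]

sum_of_residuals = sum(residuals)
sum_of_squares = sum(r ** 2 for r in residuals)

print(residuals)         # [5.0, -5.0, 0.0]
print(sum_of_residuals)  # 0.0
print(sum_of_squares)    # 50.0
```

The plain sum is zero even though two predictions miss by 5 points each. The sum of squares is 50, so it still registers both misses.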

9. The air and water temperatures at 3 PM at a popular summer swimming hole were recorded for a week. It is thought that the water temperature depends on the air temperature. The data table below also presents the values for xy and x². Determine the slope of the best-fit line for this data.

0.46

As long as we know the formula, we can calculate the slope, but that isn't actually necessary. Knowing that TW increases as TA increases means that our slope has to be positive, not negative. Just knowing that eliminates (C) and (D). Next, we should consider whether 30 or 0.46 makes more sense. For every 1°F the air temperature increases, the water temperature increases by either 0.46°F or 30°F. Which seems more reasonable? We're hoping you think 0.46°F as well. To verify, we can plug our values from the table into the formula.
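The formula itself is not reproduced in this drill. One standard form of the least-squares slope, which uses exactly the xy and x-squared columns from the table, is sketched below. The symbols m (slope) and n (number of data points) and the summation notation are assumptions about notation, not something stated in the drill.

$$
m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^{2} - \left(\sum x\right)^{2}}
$$

Here x is the air temperature, y is the water temperature, and each sum runs over all the days in the table.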

10. The air and water temperatures at 3 PM at a popular summer swimming hole were recorded for a week. It is thought that the water temperature depends on the air temperature. The data table below also presents the values for xy and x². Determine the best-fit linear equation for this data.

y = 0.46x + 30
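As a rough sketch of how a slope and intercept come out of the column totals, the Python below applies the standard least-squares formulas to a made-up set of temperatures. The numbers are invented for illustration and are not the drill's data table, so the result differs from y = 0.46x + 30. The function name `best_fit_line` is also hypothetical.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # total of the "xy" column
    sum_x2 = sum(x * x for x in xs)              # total of the "x squared" column
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Invented air (x) and water (y) temperatures in degrees F, NOT the drill's data.
air = [70, 75, 80, 85, 90]
water = [62, 65, 67, 69, 72]

slope, intercept = best_fit_line(air, water)
print(f"y = {slope:.2f}x + {intercept:.1f}")  # y = 0.48x + 28.6
```

The slope is found first, and the intercept then follows from the fact that the best-fit line passes through the point made of the average x and the average y.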