Study Guide

# Probability and Statistics - Bivariate Data

## Bivariate Data

Bivariate data is data where two values are recorded for each observation (as opposed to univariate data). We could look at a bunch of cars in a parking lot, write down both their manufacturers and colors, and come up with data like this:

Toyota, red
Honda, blue
Honda, black
Ferrari, red
Ford, grey
Ford, white
Honda, blue
Honda, black

Most likely, we're writing down this information because several people parked like idiots and we want to report them. But we may also simply be solving an algebra problem. As in this case.

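We can organize this data by sticking it into a table. The table can go either this way:

| Manufacturer | Color |
| --- | --- |
| Toyota | red |
| Honda | blue |
| Honda | black |
| Ferrari | red |
| Ford | grey |
| Ford | white |
| Honda | blue |
| Honda | black |

...or that way:

| Manufacturer | Toyota | Honda | Honda | Ferrari | Ford | Ford | Honda | Honda |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Color | red | blue | black | red | grey | white | blue | black |

Just depends on whether you're in more of a horizontal or vertical mood. If you're in a horizontal mood, you probably still haven't gotten out of bed.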

In the car example, both variables were qualitative and categorical. We could have one variable be qualitative and the other be quantitative. We could also have both variables be quantitative. Our head is swimming from having so many options.
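For anyone who likes to see data the way a computer does, here is a minimal sketch of the car data stored as pairs, one pair per observation. The variable names (`cars`, `manufacturers`, `colors`, `pair_counts`) are made up for illustration, and tallying the pairs is our own addition: since both variables are categorical, counting is about the only arithmetic that makes sense.

```python
from collections import Counter

# Each observation is a (manufacturer, color) pair, copied from the list above.
cars = [
    ("Toyota", "red"),
    ("Honda", "blue"),
    ("Honda", "black"),
    ("Ferrari", "red"),
    ("Ford", "grey"),
    ("Ford", "white"),
    ("Honda", "blue"),
    ("Honda", "black"),
]

# Split the pairs into one sequence per variable.
manufacturers, colors = zip(*cars)

# Both variables are categorical, so we tally how often each pair shows up.
pair_counts = Counter(cars)

for (manufacturer, color), count in pair_counts.items():
    print(f"{manufacturer:<10}{color:<8}{count}")
```

Each car stays a single pair no matter how the table is laid out, which is the whole point of "bivariate": two values, one observation.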

### Sample Problem

If we record the heights and weights of a bunch of people, we get bivariate data where both variables are quantitative:

No comment about the last guy.

### Scatter Plots

A common way of displaying bivariate data when both variables are quantitative is using a scatter plot. A scatter plot has a title, axes with labels, and (exactly like it sounds) little dots scattered around, one dot for each data point. It's like graphing constellations. Now let's be sure Orion's pants don't fall down.

A scatter plot is not to be confused with Scattergories or scat singing. However, both of those sound like much more of a party.

### Sample Problem

Make a scatter plot for the data contained in the following table:

We'll put the first variable on the x-axis, the second variable on the y-axis, and set up the scatter plot like this:

We've filled in a scale on each axis that makes sense given the data in the table. We don't need to have 4 feet or 300 pounds on the graph, since none of our slightly edited data values are close to those numbers.

The funny jagged lines in the lower left corner show that we're skipping numbers. If we zoom in on the "height" axis, the jagged line shows that we're skipping the numbers from 0 to 5 on the scale, because we don't need them:

Similarly, the jagged line on the "weight" axis shows that we're skipping the numbers up to 130, because we don't need them. If you're wondering why we don't include them anyway, try putting together a graph with all of these numbers and get back to us when you're done.

Done yet?

Now?

We rest our case.
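If you'd rather let a computer worry about the scale, here is a rough sketch of a text-only scatter plot. It starts each axis at the smallest data value (the code equivalent of those jagged lines) and drops one `*` per data point. The function name `text_scatter` and the height and weight numbers are invented for illustration; they are not the values from the table above.

```python
def text_scatter(points, width=20, height=10):
    """Return a text grid with one '*' per data point (a rough scatter plot)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column/row index, starting at the smallest
        # data value rather than at 0. "or 1" guards against a zero range.
        col = round((x - x_min) / ((x_max - x_min) or 1) * (width - 1))
        row = round((y - y_min) / ((y_max - y_min) or 1) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


# Hypothetical (height in feet, weight in pounds) pairs.
people = [(5.0, 130), (5.5, 150), (5.8, 165), (6.0, 180), (6.2, 200)]
plot = text_scatter(people)
print(plot)
```

The lightest, shortest person lands in the bottom-left corner and the heaviest, tallest in the top-right, with no wasted space for heights or weights nobody has.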

Now for each line in the table, we plot a point whose x-value is the height and whose y-value is the weight.

Sometimes the dots in a scatter plot group together in a thick band or swath that looks like it's trying to be a line. Ugh, what posers. If this "line" slopes up, we say there's a positive correlation between the two variables. As one variable goes up, so does the other. Lemmings.

If the line slopes down, we say there's a negative correlation between the two variables. As one variable goes up, the other goes down:

The word "correlation" can be roughly chopped up into two parts: "co" for "with" (like "co-operate," which means to work "with" another person), and "relation" for..."relation." We could also chop it up into "corr" and "elation," which doesn't mean much, but which we imagine might have something to do with being extremely happy about getting to the center of an apple.

Two variables have a correlation (either positive or negative) if they have some sort of relation with each other. This relation can be as vague as "if variable 1 goes up, then so does variable 2." They don't necessarily need to live together or carpool or anything.
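This guide only asks you to eyeball whether the dots slope up or down, but there is a standard way to put a number on it: Pearson's correlation coefficient, which is positive for an upward trend and negative for a downward one. That coefficient is something we're bringing in from outside this section, and the function name and data below are made up, so treat this as a sketch rather than required material.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together, and how much each moves on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y generally rises with x, then the same y-values reversed.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]
falling = list(reversed(rising))

print(correlation(xs, rising))   # positive: the band slopes up
print(correlation(xs, falling))  # negative: the band slopes down
```

Only the sign matters for the positive-versus-negative question; how close the number is to 1 or -1 says how tightly the dots hug a line.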

What does a scatter plot look like for two variables that don't have a correlation? There are two possibilities. One is to have the dots scattered all over the place, not trying to form any sort of line. These dots are going to get a stern talking-to. The other possibility is that the dots might try to make a flat line:

Nice try, dots.

The values of the two variables don't have anything to do with each other here: y has roughly the same value all the time, and the value of x doesn't appear to have anything to do with the value of y. While they clearly have different interests and priorities, they still find a way to make it work, and we think that's pretty special. By the way, their anniversary is coming up. Might be nice if you got them something.

Depending on how we draw our scatter plot, we could also end up with dots trying to make a vertical line:

There's no correlation here either. Since x has roughly the same value all the time, the value of x doesn't have anything to do with the value of y. The only difference is that this time we're observing their ups and downs, rather than their back-and-forth.
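One way to convince yourself that flat and vertical bands really mean "no correlation" is to add up how the two variables move together. The sketch below multiplies each point's distance from the average x by its distance from the average y and sums the results. The function name `cross_deviation_sum` and all three data sets are hypothetical, and we've used exactly constant values (rather than "roughly" constant ones) to keep the arithmetic clean.

```python
def cross_deviation_sum(points):
    """Sum of (x - mean_x) * (y - mean_y) over all points.

    Positive when x and y tend to rise together, negative when one rises
    as the other falls, and zero when either variable never moves.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in points)


flat = [(1, 60), (2, 60), (3, 60), (4, 60), (5, 60)]      # y never changes
vertical = [(3, 10), (3, 20), (3, 30), (3, 40), (3, 50)]  # x never changes
rising = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]    # both change together

for name, data in [("flat", flat), ("vertical", vertical), ("rising", rising)]:
    print(name, cross_deviation_sum(data))
```

When one variable sits still, every term in the sum is multiplied by zero, so there is nothing for the other variable to be related to.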

We can also think about correlation from a verbal standpoint. Let's look at a real-world situation and then see how it might appear in dot-form.

### Sample Problem

The longer Dinah studies, the better she does on her math exams.

This statement is talking about two variables: the length of time Dinah studies, and her score on a math exam. The statement says that as the length of time Dinah studies goes up, so does her score. Fancy that. If we were to follow Dinah's study habits for a semester, and make a scatter plot with a point for each math exam, the scatter plot would probably look something like this:

Strangely enough, this scatter plot is basically what Dinah's early math exams looked like. She didn't know the answers, so she drew a bunch of dots all over the test sheet. You've come a long way, Dinah.

The variables have a positive correlation: as one goes up, so does the other. We don't need a graph to visualize such a simple correlation, but it may help you to imagine a scatter plot in your head. (Note: Do not actually draw a scatter plot in your head. You should not be putting pencils in your ears.)

### Another Sample Problem

No matter how long Dinah studies, it always takes her a full hour to finish a math exam.

This statement is talking about two variables: the length of time Dinah studies, and the length of time she takes to finish a math exam. Since it takes Dinah an hour to finish an exam no matter how long she studies, there is no correlation between the two variables.

If we followed Dinah's study habits and made a scatter plot with a point for each math exam, it would look like this:

### Linear Regression

As we mentioned earlier, sometimes the dots in a scatter plot cluster like they're trying to make a nice shape. Sometimes the dots try to look like a straight line:

Sometimes the dots try to look like a curve:

Sometimes the dots try to look like an incredibly strange curve:

When the dots are trying to be a straight line, the line they're trying to be is called the line of best fit. In actual statistics classes you get to learn a tedious-but-not-really-hard procedure called linear regression, which allows you to find the line of best fit. If you get stuck, shoot Goldilocks an email; we hear she's had some experience with this sort of thing.

Right now, though, we'll do things the cheap way. Actually, you have your choice of cheap ways. You can either put all the data points into your calculator and let it do the work, or you can draw a picture and guess. We told you it would be cheap.
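If you're curious what the calculator is up to, here is a minimal sketch of the usual least-squares formulas for a line of best fit. We're assuming your calculator's linear regression feature works this way, which is typical but not something this guide spells out. The function name `least_squares_line` and the data points are made up for illustration.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x-values are equal, so no sloped line fits")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical dots that are "trying to be" the line y = x.
points = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8), (5, 5.0)]
slope, intercept = least_squares_line(points)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

For these dots the slope comes out close to 1 and the intercept close to 0, which is about what a decent eyeball guess would give you too.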

By "draw a picture and guess," we mean exactly that. First we draw the scatter plot. Then we pick two points (not necessarily among the scatter plot dots) that look like they're pretty close to the line of best fit. We find the equation of the line between our two points, and say that's close enough. We're not trying to arrive at any precise solution here...we're just trying to get a general idea of what these dots are up to.

### Sample Problem

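Approximate the line of best fit for the following data.

The dots look like they're trying to be a line that slopes up and to the right, and goes through the points (1, 1) and (5, 5). The slope between these points is (5 - 1)/(5 - 1) = 1, and a line with slope 1 through (1, 1) has a y-intercept of 0, so the equation of the line between these points is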

y = x,

so that's our guess at the line of best fit.
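The "find the equation of the line between our two points" step can be sketched in a few lines of code. The function name `line_through` is hypothetical; the two points are the ones picked in this problem.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    slope = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - slope * x1     # solve y1 = slope * x1 + intercept
    return slope, intercept


slope, intercept = line_through((1, 1), (5, 5))
print(f"y = {slope}x + {intercept}")
```

A slope of 1 and an intercept of 0 is just y = x, matching the guess above. Pick two different points and you'll get a slightly different line, which is fine while we're only approximating.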

When drawing these pictures, of course, it's helpful to use graph paper and a ruler, and to have super-tidy labels. We know there's neat handwriting in you somewhere. But don't stress too much, because until you learn actual linear regression, you're only approximating the line of best fit. You only need to find a reasonable answer, not necessarily the one right answer. Enjoy it while it lasts...you won't often be asked to "guess" in algebra.

