# Probability and Statistics

### Topics

A common way of displaying bivariate data when both variables are quantitative is using a **scatter plot**. A scatter plot has a title, axes with labels, and (exactly like it sounds) little dots scattered around, one dot for each data point. It's like graphing constellations. Now let's be sure Orion's pants don't fall down.

A scatter plot is not to be confused with Scattergories or scat singing. However, both of those sound like much more of a party.

### Sample Problem

Make a scatter plot for the data contained in the following table:

We'll put the first variable on the *x*-axis, the second variable on the *y*-axis, and set up the scatter plot like this:

We've filled in a scale on each axis that makes sense given the data in the table. We don't need to have 4 feet or 300 pounds on the graph, since none of our slightly edited data values are close to those numbers.

The funny jagged lines in the lower left corner show that we're skipping numbers. If we zoom in on the "height" axis, the jagged line shows that we're skipping the numbers from 0 to 5 on the scale, because we don't need them:

Similarly, the jagged line on the "weight" axis shows that we're skipping the numbers up to 130, because we don't need them. If you're wondering why we don't include them anyway, try putting together a graph with all of these numbers and get back to us when you're done.

Done yet?

How about now?

Now?

We rest our case.

Now for each line in the table, we plot a point whose *x*-value is the height and whose *y*-value is the weight.
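To make the plotting step concrete, here is a minimal sketch in plain Python that turns rows of a table into points on a character grid. The `rows` values are invented for illustration (they are not the data from the table above), and `axis_range` and `text_scatter` are hypothetical helper names, not part of any library.

```python
# Hypothetical (height in feet, weight in pounds) rows.
# These are made-up values, NOT the data from the lesson's table.
rows = [
    (5.0, 135),
    (5.5, 150),
    (5.5, 160),
    (6.0, 175),
    (6.5, 190),
]


def axis_range(values, padding):
    """Return (low, high) axis limits that hug the data instead of starting at 0."""
    return min(values) - padding, max(values) + padding


def text_scatter(points, columns=21, lines=9):
    """Draw one '*' per (x, y) point on a character grid, with y increasing upward."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_low, y_low = min(xs), min(ys)
    # Fall back to a span of 1 so identical values never divide by zero.
    x_span = (max(xs) - x_low) or 1
    y_span = (max(ys) - y_low) or 1
    grid = [[" "] * columns for _ in range(lines)]
    for x, y in points:
        col = round((x - x_low) / x_span * (columns - 1))
        row = round((y - y_low) / y_span * (lines - 1))
        grid[lines - 1 - row][col] = "*"  # flip rows so larger y sits higher up
    return "\n".join("".join(line).rstrip() for line in grid)


# Weight axis limits: no need to start at 0 or run up to 300.
print(axis_range([weight for _, weight in rows], 5))
print(text_scatter(rows))
```

Each row of the table becomes exactly one star, with the first variable setting the column and the second setting the height. The axis limits come from the data itself, which is the same idea as the jagged "skipped numbers" marks on a hand-drawn axis.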

Sometimes the dots in a scatter plot group together in a thick band or swath that looks like it's trying to be a line. Ugh, what posers. If this "line" slopes up, we say there's a **positive correlation** between the two variables. As one variable goes up, so does the other. Lemmings.

If the line slopes down, we say there's a **negative correlation** between the two variables. As one variable goes up, the other goes down:

The word "correlation" can be roughly chopped up into two parts: "co" for "with" (like "co-operate," which means to work "with" another person), and "relation" for..."relation." We could also chop it up into "corr" and "elation," which doesn't mean much, but which we imagine might have something to do with being extremely happy about getting to the center of an apple.

Two variables have a correlation (either positive or negative) if they have some sort of relation with each other. This relation can be as vague as "if variable 1 goes up, then so does variable 2." They don't necessarily need to be living together, carpooling, or anything.

What does a scatter plot look like for two variables that don't have a correlation? There are two possibilities. One is to have the dots scattered all over the place, not trying to form any sort of line. These dots are going to get a stern talking-to.

The other possibility is that the dots might try to make a flat line:

Nice try, dots.

The values of the two variables don't have anything to do with each other here. Variable 2 has roughly the same value all the time, and the value of variable 1 doesn't appear to have anything to do with the value of variable 2. While they clearly have different interests and priorities, they still find a way to make it work, and we think that's pretty special. By the way, their anniversary is coming up. Might be nice if you got them something.

Depending on how we draw our scatter plot, we could also end up with dots trying to make a vertical line:

There's no correlation here either. Since variable 1 has roughly the same value all the time, its value doesn't have anything to do with the value of variable 2. The only difference is that this time we're observing their ups and downs, rather than their back-and-forth.
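The cases above (sloping up, sloping down, scattered, flat, vertical) can be told apart numerically as well as by eye. The sketch below uses the Pearson correlation coefficient, a standard measure this lesson doesn't introduce. The function name `correlation_direction`, the 0.5 cutoff, and all sample points are assumptions made up for this example.

```python
from math import sqrt


def correlation_direction(points, threshold=0.5):
    """Label a list of (x, y) points as 'positive', 'negative', or 'none'.

    Uses the Pearson correlation coefficient r. The 0.5 cutoff is an
    arbitrary assumption for this sketch, not a standard rule.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    # A perfectly flat or perfectly vertical line has no spread in one
    # variable, so r is undefined; treat that case as no correlation.
    if sxx == 0 or syy == 0:
        return "none"
    r = sxy / sqrt(sxx * syy)
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "none"


# Invented sample points, one list per kind of picture described above.
rising = [(1, 2), (2, 3), (3, 5), (4, 6), (5, 8)]
falling = [(1, 8), (2, 6), (3, 5), (4, 3), (5, 2)]
scattered = [(1, 3), (2, 1), (3, 4), (4, 1), (5, 3)]
flat = [(1, 4), (2, 4), (3, 4), (4, 4), (5, 4)]
vertical = [(3, 1), (3, 2), (3, 3), (3, 4), (3, 5)]

for name, pts in [("rising", rising), ("falling", falling),
                  ("scattered", scattered), ("flat", flat),
                  ("vertical", vertical)]:
    print(name, "->", correlation_direction(pts))
```

This handles the idealized flat and vertical lines by checking for zero spread. Real data that is only "roughly" flat would need a judgment call, which is part of why looking at the scatter plot is still worthwhile.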

We can also think about correlation from a verbal standpoint. Let's look at a real-world situation and then see how it might appear in dot-form.

### Sample Problem

The longer Dinah studies, the better she does on her math exams.

This statement is talking about two variables: the length of time Dinah studies, and her score on a math exam. The statement says that as the length of time Dinah studies goes up, so does her score. Fancy that. If we were to follow Dinah's study habits for a semester, and make a scatter plot with a point for each math exam, the scatter plot would probably look something like this:

Strangely enough, this scatter plot is basically what Dinah's early math exams looked like. She didn't know the answers, so she drew a bunch of dots all over the test sheet. You've come a long way, Dinah.

The variables have a positive correlation: as one goes up, so does the other. We don't *need* a graph to visualize such a simple correlation, but it may help you to imagine a scatter plot in your head. (Note: Do not actually draw a scatter plot in your head. You should not be putting pencils in your ears.)

### Another Sample Problem

No matter how long Dinah studies, it always takes her a full hour to finish a math exam.

This statement is talking about two variables: the length of time Dinah studies, and the length of time she takes to finish a math exam. Since it takes Dinah an hour to finish an exam no matter how long she studies, there is no correlation between the two variables.
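Both statements about Dinah can also be checked directly against the phrase "as one goes up, so does the other." The sketch below compares every pair of exams and reports how the second variable moves when study time increases. The `exams` records and the helper `moves_with` are hypothetical, since the text gives no actual numbers for Dinah.

```python
# Hypothetical records for Dinah: (hours studied, exam score, minutes to finish).
# These numbers are invented for illustration only.
exams = [
    (1, 62, 60),
    (2, 70, 60),
    (3, 75, 60),
    (4, 83, 60),
    (5, 91, 60),
]


def moves_with(pairs):
    """Report how the second variable moves when the first one goes up."""
    ups = 0
    downs = 0
    for i, (x1, y1) in enumerate(pairs):
        for x2, y2 in pairs[i + 1:]:
            if x1 == x2 or y1 == y2:
                continue  # a tie says nothing about direction
            if (x2 - x1) * (y2 - y1) > 0:
                ups += 1    # both variables moved the same way
            else:
                downs += 1  # they moved in opposite ways
    if ups > 0 and downs == 0:
        return "goes up"
    if downs > 0 and ups == 0:
        return "goes down"
    return "no consistent change"


study_vs_score = [(hours, score) for hours, score, _ in exams]
study_vs_minutes = [(hours, minutes) for hours, _, minutes in exams]

print("score:", moves_with(study_vs_score))
print("minutes to finish:", moves_with(study_vs_minutes))
```

With these made-up numbers, the score rises with every extra hour of study, while the time to finish never changes, matching the positive correlation in the first problem and the lack of correlation in the second.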

If we followed Dinah's study habits and made a scatter plot with a point for each math exam, it would look like this: