# At a Glance - Scatter Plots & Correlation

Scatter plots are an awesome way to display two-variable data (that is, data with only two variables) and make predictions based on the data. Unlike histograms and box-and-whisker plots, which group or summarize the data, scatter plots show each individual data value.

Here's a scatter plot of the amount of money Mateo earned each week working at his father's store.

The weeks are plotted on the x-axis and the amount of money he earned for that week is plotted on the y-axis. In general, the independent variable (the one that isn't influenced by the other variable) is plotted on the x-axis and the dependent variable (the one that is affected by the independent variable) is plotted on the y-axis.

Using this plot we can see that in week 2 Mateo earned about \$125, and in week 18 he earned about \$165. More important is the trend of the data. For example, with this data set it is clear that Mateo is earning more each week. Maybe his father is giving him more hours per week or more responsibilities.

## Correlation

With scatter plots we often talk about how the variables relate to each other. This is called correlation. There are three types of correlation: positive, negative, and none (no correlation).

• Positive Correlation: as one variable increases, so does the other. Height and shoe size are an example; as a person's height increases, their shoe size tends to increase too.
• Negative Correlation: as one variable increases, the other decreases. Time spent studying and time spent on video games are negatively correlated; as your time studying increases, your time spent on video games decreases.
• No Correlation: there is no apparent relationship between the variables. Video game scores and shoe size appear to have no correlation; as one increases, the other doesn't change in any predictable way.

Mateo's scatter plot has a pretty strong positive correlation; as the weeks increase his paycheck does too.
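If you'd rather put a number on "pretty strong" than eyeball it, one common measure is the Pearson correlation coefficient, usually called r. It runs from -1 (perfect negative correlation) to 1 (perfect positive correlation), with values near 0 meaning little or no linear relationship. Here's a minimal sketch, assuming made-up data: the numbers below are not Mateo's actual paychecks, and the 0.3 cutoff in `classify` is an arbitrary choice for illustration, not an official rule.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together around their means
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)


def classify(r, cutoff=0.3):
    """Label r as positive, negative, or none (cutoff is an illustrative assumption)."""
    if r > cutoff:
        return "positive"
    if r < -cutoff:
        return "negative"
    return "none"


# Hypothetical data, invented for this example
weeks = [1, 2, 3, 4, 5, 6]
pay = [120, 125, 123, 131, 134, 140]
hours_studying = [1, 2, 3, 4, 5]
hours_gaming = [5, 4, 4, 2, 1]

print(classify(pearson_r(weeks, pay)))
print(classify(pearson_r(hours_studying, hours_gaming)))
```

The first call should come out positive (pay climbs with the weeks) and the second negative (gaming time drops as study time climbs), matching the definitions in the list above.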

## Line of Best Fit

We use a "line of best fit" to make predictions based on past data. There are many complicated statistical formulas we could use to find this line, but for now we will just estimate it by drawing a line through the points on the graph that looks like it fits the trend of the data. When drawing the line, you want to make sure that it follows most of the data, with roughly as many points above it as below it. If there is a point that is much higher or lower than the rest (an outlier), the line doesn't need to pass through it.

Using this line, we can predict how much money Mateo will earn in his 20th week of work (assuming he continues this pattern).

Based on this line, Mateo will earn approximately \$157 in week 20.
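For the curious, here's a peek at one of those "complicated statistical formulas": the least-squares line, which picks the slope and intercept that make the squared vertical distances from the points to the line as small as possible. This is a minimal sketch and not the eyeballing method used on this page. The data is made up for illustration (it is not Mateo's actual paychecks), and the names `least_squares_line` and `predict` are our own.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read a y-value off the line at a given x."""
    return slope * x + intercept


# Hypothetical weekly earnings, invented for this example
weeks = [1, 2, 3, 4, 5, 6]
pay = [120, 125, 123, 131, 134, 140]

slope, intercept = least_squares_line(weeks, pay)
week_20_estimate = predict(slope, intercept, 20)
print(f"Estimated pay in week 20: about {week_20_estimate:.0f} dollars")
```

Just like with a hand-drawn line, a prediction this far past the last data point only holds up if the pattern continues.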

#### Example 1

This is a scatter plot showing the amount of sleep needed per day by age. As you can see, as you grow older, you need less sleep (but still probably more than you currently are getting...).

#### Example 2

These two scatter plots show the average income for adults based on the number of years of education completed (2006 data). 16 years of education means graduating from college. 21 years means landing a Ph.D.

#### Classify each pair of variables as positively, negatively, or not correlated.

Shoe size and the number of pairs of shoes one owns.

#### Use this scatter plot to answer questions 3 – 5

Frankie and Lucy are planning on selling a new iPhone app. This is a scatter plot estimation of how many apps they can sell at different prices. A line of best fit is drawn in for you.

#### Exercise 3

What kind of correlation is shown in this graph (positive, negative, or no correlation)?

#### Exercise 4

If Frankie and Lucy price the application at \$2.50, how many can they expect to sell, and how much money would they get from the sales?

#### Exercise 5

If they price the application at \$3.00, how many can they expect to sell, and how much money would they get from the sales?
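Exercises 4 and 5 both come down to the same two steps: read the expected number of sales off the line of best fit at the given price, then multiply that number by the price to get the money earned. The sketch below shows those steps with a made-up line. The slope and intercept here are assumptions chosen for illustration and are not read from Frankie and Lucy's graph, so use the values you find on the plot for your actual answers.

```python
def expected_sales(price, slope, intercept):
    """Read the expected number of sales off a line of best fit at a given price."""
    return slope * price + intercept


def revenue(price, slope, intercept):
    """Money earned = price per app times the number of apps sold."""
    return price * expected_sales(price, slope, intercept)


# Hypothetical line of best fit (NOT taken from the exercise graph)
SLOPE = -200.0      # change in sales for each extra dollar of price
INTERCEPT = 1000.0  # sales the line would predict at a price of zero

for price in (2.50, 3.00):
    sold = expected_sales(price, SLOPE, INTERCEPT)
    earned = revenue(price, SLOPE, INTERCEPT)
    print(f"price {price:.2f}: about {sold:.0f} sold, about {earned:.2f} earned")
```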