Illusory or False Correlations

Categories: Metrics

An illusory (or false) correlation is the tendency people have to connect things that aren’t actually connected. Sometimes a person genuinely believes the connection is real, but other times it’s just a manipulation tool. Politicians do it all the time.

For instance, a new mayor is elected. When the mayor takes office in January, crime in the city falls 10%. The mayor immediately claims it’s due to his election. In reality, it’s because the previous mayor received funding to double the police force and criminals stopped committing crimes, or moved to other cities.

The new mayor is using a false correlation to manipulate the public into believing he’s already awesome, even though he just started and it was really the previous mayor’s actions that led to the crime reduction.

Related or Semi-related Video

Finance: What is r-squared?

00:00

Finance à la Shmoop. What is r squared? R squared: it's

00:09

a measure of the percentage of change in one variable

00:12

due exclusively to changes in another variable. Accidentally back into

00:17

a car in the lot and need someone to blame

00:20

it on? Late for work and need a scapegoat the

00:22

boss will believe? Well, r squared is here to find

00:25

you a patsy to take the blame for, well, pretty

00:28

much anything you can dream up. Well, r squared won't

00:30

take the fall for you. It will find the fall

00:33

guy best suited for that job. So let's say we

00:35

looked at a set of bivariate, or two-variable,

00:38

data that compares the top speed of remote control cars

00:42

to the size of the tires on the car, which

00:45

ends up having an r squared value of point seven

00:47

nine. That means seventy-nine percent of the changes in

00:50

the top speed of the car were due to changes

00:52

in the size of the tires. More simply, the changes

00:56

in tire size are the primary cause, then, in changes in

00:59

the top speed, and you'd say in normal English the

01:02

r squared between larger tires and faster speed was high,

01:06

like point eight. Other factors affect top speed, like wind

01:09

resistance, battery power and so on. But changes in tire

01:12

size are the primary cause for changes in top speed.

01:14

We need to be careful not to say that tire

01:16

size is the primary cause of top speed. See that?

01:19

Was a little error there they just filled in.

01:21

R squared doesn't tell us that. It just tells us

01:24

that changes in the tire size are the primary cause

01:26

for changes in the top speed, meaning they're just related.

01:30

We don't know that one causes the other, and the

01:32

difference is subtle, but, well, hugely important here. So let's

01:35

walk through how we might really work with r squared

01:38

from beginning to end in a problem. Well, we've got

01:40

two variables, like the daily price of a gallon of gasoline

01:43

and the average number of gallons purchased per customer on

01:46

that same day. It's not unreasonable to think that changes

01:49

in the price of gas are a factor in how

01:51

much gas people buy, but how much of a factor?

01:54

Like, a gallon of gasoline costs, you know, eighteen dollars,

01:57

right? There'd be fewer people buying, instead of about

01:59

three bucks. So r squared will tell us how much

02:02

of the changes in how much people pump into their

02:04

tank is due to changes in the price of gas,

02:07

and how much of those changes in the amount of

02:09

gas purchased is due to other factors, like time of

02:12

season, or the length of trip they're taking, or the

02:15

amount of money they happen to have on hand, or

02:17

how loud the kids are screaming "Are we there yet?

02:20

Are we there yet?" in the back seat. Yeah, okay.

02:22

Well, because we knew it would come in handy, we

02:24

collected data from our local Gas 'n' Sip on seven

02:27

different days. Calculations like the coefficient of determination, r squared,

02:31

and/or the correlation coefficient, r, should only be attempted

02:34

on data that has a linear-ish shape. It's always

02:38

a good idea to whip up a scatter plot of

02:39

the data, just to make sure it's not obviously curved

02:42

or has some other weird nonlinear pattern that we

02:44

can't then generalize from. Well, the pattern here is linear

02:47

enough and doesn't show an obvious curve or other pattern.

02:50

So we're good to go, and we can calculate r

02:52

squared by hand, but almost nobody does. Even for

02:54

very small data sets, graphing calculators, spreadsheets and websites

02:57

dedicated to finding r, and only r, and r squared

03:00

will all do a dandy job of getting us the

03:02

values we want. So let's do that. Well, after popping

03:05

this data into our jailbroken, diamond-plated, solar-powered, voice-

03:09

activated TI-84 Plus Titanium Edition,

03:13

we get an r squared value of point one three

03:15

five eight, right there. What does that mean for us

03:17

and for our gas problem? Well, since r squared is

03:20

the percentage of change in the y variable that is

03:22

due strictly to changes in the x variable, it means

03:24

that only thirteen point five eight percent of changes

03:27

in the average amount of gas purchased are due to

03:29

changes in the price of gas. It also means that

03:31

eighty-six point four two percent of the changes in

03:33

the amount of gas purchased are due to other factors,

03:36

like changes in how much money people have on hand.

03:38

It could also have to do with, well, changes in

03:41

how far they're planning on driving that day. Could mean

03:43

many other things, but it doesn't. For now, we're just

03:46

focused on the numbers. And again, we need to be

03:48

very careful not to claim that the r squared number tells

03:51

us how much of a percentage one variable causes the other to

03:54

do a thing. That's a no-no. R squared is

03:56

always and only the percentage of changes in one

04:00

variable due to changes in the other. Also, we can

04:02

take r squared out back behind the woodshed and

04:05

square-root the mess out of it to get the

04:07

correlation coefficient, r. Got it? Okay. Some important safety tips:

04:11

We should only find r squared for data that have

04:13

a linear-ish pattern. We can find r squared by hand,

04:17

but that's a sign of insanity, so use tech to

04:19

do the grunt work for the actual calculations. Well, r

04:22

squared's the percentage of change in one variable that is

04:24

due strictly to changes in another variable. If we square

04:27

root r squared, well, we get the correlation coefficient, r.

04:30

Positive number there. Never, never, ever suggest that ninety-nine

04:33

percent of the changes in a police person's weight is

04:35

due to changes in doughnut consumption, either. Just an extra

04:39

free warning there from your friends at Shmoop.
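
For anyone who wants to try the video's calculation without a graphing calculator, here is a minimal Python sketch. The video never shows its seven data points, so the `prices` and `gallons` lists below are made-up placeholders, and `pearson_r` and `r_squared` are illustrative names rather than anything from Shmoop. The sketch computes the correlation coefficient r from its standard definition, squares it to get r squared, and splits the result into the share associated with price and the share left to everything else.

```python
import math


def pearson_r(xs, ys):
    """Return the sample correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and squares of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Return r squared, the coefficient of determination for a straight-line fit."""
    return pearson_r(xs, ys) ** 2


# Hypothetical data: NOT the numbers from the video, just seven invented days.
prices = [2.89, 2.95, 3.05, 3.10, 3.19, 3.25, 3.40]   # dollars per gallon
gallons = [11.2, 9.8, 12.1, 10.4, 9.5, 11.0, 9.9]     # average gallons per customer

r = pearson_r(prices, gallons)
r2 = r_squared(prices, gallons)

print(f"r = {r:.4f}")
print(f"r squared = {r2:.4f}")
print(f"share of variation associated with price: {r2:.2%}")
print(f"share left to other factors: {1 - r2:.2%}")
```

Because squaring throws away the sign, taking the square root of r squared only recovers the size of r; whether r is positive or negative depends on which way the line slopes. And as the video warns, a high value says the two variables move together, not that one causes the other.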

Find other enlightening terms in Shmoop Finance Genius Bar(f)