Durbin Watson Statistic

  

Categories: Metrics, Education

The Durbin Watson Statistic lets us know when good statistical regression analysis has gone bad...like a donut that looks good on the outside, but is actually super stale. Nobody likes a stale donut.

The Durbin Watson Statistic tests a time-series regression for autocorrelation, which we don’t want. Other tests might say “hey, you, your regression is looking good!” while the Durbin Watson test might say “uhmmm, actually, you should take another look...something’s not right, even if the other tests checked out.” The Durbin Watson Statistic always lands somewhere between 0 and 4. A value near 2 means there’s no autocorrelation, a value well below 2 points to positive autocorrelation, and a value well above 2 points to negative autocorrelation.
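To see where that 0-to-4 number comes from, here's a minimal sketch in Python. It assumes the standard Durbin-Watson formula (the sum of squared changes between back-to-back residuals, divided by the sum of squared residuals), which this entry doesn't spell out. The function name `durbin_watson` and both residual lists are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for regression residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Squared change from each residual to the next one in time.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Overall size of the residuals.
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero, so the statistic is undefined")
    return numerator / denominator


# Hypothetical residuals that linger on one side of zero (positive autocorrelation).
lingering = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
flipping = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(durbin_watson(lingering))  # well below 2
print(durbin_watson(flipping))   # well above 2
```

Residuals that hang around on the same side of zero drag the number toward 0, while residuals that flip sign every period push it toward 4.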

So what is autocorrelation, and why is it bad? Regressions are functions that try to use a bunch of data to predict something. Basically, a regression is a statistical method for finding correlations (it can’t prove causation, though...for that we’ve gotta have experiments) by fitting a line to data. Finding the best line for the data is the goal. How far each data point sits from the line is its error, and we want to minimize those errors to get the best fit line.
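Here's a rough sketch of what "finding the best line" can look like. It assumes ordinary least squares with a single predictor, which is one common way to do it (the entry doesn't name a fitting method, so treat that as an assumption). The helper names `fit_line` and `line_errors` and the tiny data set are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def line_errors(xs, ys, slope, intercept):
    """Vertical distance of each point from the fitted line (the errors)."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up data, purely for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

slope, intercept = fit_line(xs, ys)
errors = line_errors(xs, ys, slope, intercept)
print(slope, intercept)
print(errors)
```

The errors from a least-squares line with an intercept always add up to (roughly) zero. What the Durbin Watson Statistic cares about is whether they also show a pattern over time.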

When there’s autocorrelation, that means the errors of your regression are correlated with each other over time, either negatively or positively. If your regression “fits” the data well and your errors are still correlated, that means something’s wrong. For instance, it could mean that you missed a really important variable that has some explanatory power, which shouldn’t be buried in your error but should be a part of your regression line (omitted variable bias).
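A small sketch of the omitted-variable case, under loudly stated assumptions: the data are invented, `y` secretly depends on both `x` and a slow-moving time trend, and we regress on `x` alone. The Durbin-Watson formula used here is the standard one (squared changes in the errors over squared errors), and every name in the block is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def durbin_watson(errors):
    """Sum of squared error changes divided by sum of squared errors."""
    changes = sum((errors[t] - errors[t - 1]) ** 2 for t in range(1, len(errors)))
    return changes / sum(e ** 2 for e in errors)


# Invented data: 20 time periods, x cycles through 0, 2, 4, 1, 3.
periods = list(range(20))
x = [(7 * t) % 5 for t in periods]
# The true relationship includes a trend (0.5 * t) that we "forget" to model.
y = [1 + 2 * x_t + 0.5 * t for t, x_t in zip(periods, x)]

slope, intercept = fit_line(x, y)  # regression on x only
errors = [y_t - (intercept + slope * x_t) for x_t, y_t in zip(x, y)]

dw = durbin_watson(errors)
print(dw)  # far below 2: the forgotten trend is hiding in the errors
```

Because the forgotten trend drifts slowly, neighboring errors look alike, and the statistic drops toward 0.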

You can also get autocorrelation when your regression is functionally misspecified, which means your regression doesn’t actually fit the shape of the data well (say, a straight line forced onto a curve). Your errors then come in streaks, piling up on one side of the regression line and then the other, showing that you missed something in the relationship...which is kinda the point of doing a regression.
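To make the misspecification case concrete, here's a hedged sketch: it forces a straight line onto data that actually follow a curve (a made-up `y = x squared` series) and checks the errors. The least-squares fit and the standard Durbin-Watson formula are assumptions this entry doesn't spell out, and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def durbin_watson(errors):
    """Sum of squared error changes divided by sum of squared errors."""
    changes = sum((errors[t] - errors[t - 1]) ** 2 for t in range(1, len(errors)))
    return changes / sum(e ** 2 for e in errors)


# Made-up curved data: the true relationship is quadratic, not linear.
xs = list(range(10))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)  # wrong functional form on purpose
errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# The signs come in streaks: above the line, then below, then above again.
print(["+" if e > 0 else "-" for e in errors])
dw = durbin_watson(errors)
print(dw)  # well below 2
```

The streaky signs are the giveaway: the line is too low at both ends and too high in the middle, so neighboring errors resemble each other.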

A third way you can get autocorrelation is measurement error in the independent variable, which will cause your independent variable and your error term to both reflect that measurement error, and you’ll find your errors correlating over time with that measurement error.

Related or Semi-related Video

Finance: What are correlation coefficien...

00:00

Finance a la Shmoop: what are correlation coefficients? Kind of sounds

00:08

like a new card game from the makers of Cards

00:10

Against Humanity, or an exotic disease that spreads like wildfire

00:15

on a cruise ship. You know, been there. But a

00:17

correlation coefficient is actually a measure of how strongly connected,

00:20

or correlated, two different variables are. It's also a measure

00:24

of how close the points on a scatter plot are

00:26

to the best-fit line, this thing running through them.

00:30

A correlation coefficient is kind of like a ranch hand

00:32

who's in charge of herding data. Okay, so let's take

00:35

a closer look at the data points in our corral,

00:38

taken from Wild Pete's Pizza restaurant. Yeah, they're a set of

00:41

bivariate, or two-variable, data. In this case,

00:45

the data points on the x-axis are the number

00:47

of minutes a table has to wait for their food

00:49

since ordering, and the data points on the y-axis

00:52

are the percentage of the total bill left as a

00:54

tip. Interesting correlation here. Pete, the owner and namesake of Wild

00:57

Pete's Pizza, believes there's a relationship between how long a

01:00

table waits for the food and how much they tip.

01:02

Generally, the first step in finding a correlation coefficient is

01:05

to determine if the data points are in a roughly

01:07

linear pattern. So we need to whip up a

01:09

quick scatter plot, like this thing. If the data points

01:12

don't have an obvious linear pattern, we really shouldn't even bother

01:15

to calculate the correlation coefficient, because it's not meaningful. Once

01:18

there appears to be a linear, or roughly linear, pattern

01:21

to the data, it's time to get calculating, partner.

01:24

Okay. The formula for the correlation coefficient, which is denoted

01:27

by the variable r here, is a bit unwieldy, and

01:30

typically the correlation coefficient is calculated using an actual calculator of

01:33

some kind. But still, it's nice to know where these

01:35

numbers come from, so we'll do it by hand and

01:37

double-check our work. So the process goes like this.

01:39

First, we find the mean and standard deviation of the

01:42

x data and the y data, treating each set

01:44

of data as its own list, separate from each other.

01:46

We'll use a calculator just to shortcut this part of

01:49

the process. And now we need to take each data

01:51

point in the x list, subtract the mean from it,

01:53

and divide that result by the standard deviation. So twelve

01:57

minus fifteen point one six six seven, which is negative

01:59

three point one six seven, divided by five point six

02:01

blah blah blah, which is negative about a half. Then

02:04

twenty minus fifteen point one six seven, which is four

02:07

point eight three three, divided by five point six

02:09

blah blah blah, which is point eight six and change,

02:11

and so on. But we need to lather, rinse, repeat

02:13

that same process of subtracting the mean of the y

02:16

data from each y value and then dividing by the standard

02:18

deviation of the y values. Right. Well, that'll be sixteen

02:21

minus fourteen, which in California is two, divided by three

02:24

point two eight blah blah blah, which is point six

02:26

and change. Then we have thirteen minus fourteen, which is

02:28

negative one, divided by three point two eight six, which

02:31

is, well, negative point three-ish. So now we need

02:33

to multiply each matched x and y value from our

02:36

previous calculations. That'll be negative point five six and change

02:39

times point six blah blah blah, which is negative

02:42

point three four four one. Then we have point eight

02:44

six three times negative point three oh four, which is

02:47

negative point two six two. Then negative point seven

02:50

four four times one point two one seven two, which

02:53

is, well, what is that? Negative point nine, and so

02:56

on. Now we sum the values we just got, which

02:58

is all this stuff. We add 'em all up and it

03:00

comes out to negative four point four five five four.

03:04

Okay, one last step here, cowpokes. We just need to

03:06

divide by one less than the number of data points. We

03:09

have six data points, so we divide negative four

03:11

point four five five four, yeah, by five. Divide that,

03:14

and that means our correlation coefficient, or r value, is

03:18

negative point eight nine one one. Interesting. Excellent. Well, now

03:22

we have a real correlation coefficient. So what does it

03:25

mean? Well, for starters, we can interpret what it actually

03:28

means here. See, the correlation coefficient, or r

03:31

value, is a measure of how strong the relationship is

03:34

between the two variables, assuming that linear-ish pattern exists.

03:37

It does not, however, mean that the one variable causes

03:40

the other. It just means there's some kind of relationship

03:43

between them. To actually put a value on how strong

03:45

the correlation is, we need to examine the continuum of

03:48

correlation. Positive correlations represent situations where the scatter plot appears

03:52

to climb from left to right. Negative correlations represent situations

03:56

where the scatter plot appears to fall from left to

03:58

right, like our tips versus time data. Well, strong correlations

04:02

are values between point seven and one for positive correlations

04:06

and between negative point seven and negative one for negative correlations.

04:09

That's just rough numbers, they're about point seven. And if

04:11

it's a one-to-one relationship, it means that if

04:14

you let go of the apple, it will fall every

04:16

time, we're assuming they're on Earth. Scatter plot points will

04:20

be pretty darn close to the best-fit line through

04:22

the points there. Medium correlations are in the point four

04:25

to point seven range, and they've got the negative ones,

04:28

and so on. Scatter plot points will be a wee

04:30

distance from the best-fit line. They're not quite

04:33

as tightly packed around that line. And then weak correlations,

04:36

that just looks like a cloud. It's like values from

04:38

zero to point four and zero to negative point four, and

04:41

they're just kind of like, maybe there's a line through

04:43

there, but maybe not. Well, in our case, our

04:45

r value is negative point eight nine one one. Well,

04:49

it's very, very negatively correlated between the two: time of

04:52

ordering the food and when it shows up, and the

04:55

tip paid, at least the tip percentage of the meal.

04:58

Which means that as it takes longer and longer for

05:01

food to arrive after ordering, in general the tip percentage

05:04

goes down. Also, because this pattern is a strong correlation,

05:08

this pattern is likely to be predictable in terms of

05:10

a certain wait time leading to a certain percentage. A

05:12

while back, we mentioned that r values aren't often whipped

05:15

up by hand. Instead, we use graphing calculators, spreadsheets, websites,

05:18

any of them, you know, to whip up a mess

05:20

of r values in no time. Pop the data into

05:22

lists one and two in a TI graphing

05:25

calculator. Go to the calc menu in the stat function

05:27

and run a LinReg, linear regression. You know, we

05:30

see an r value of negative point eight

05:32

nine one, which is very close to our by-hand

05:35

value of point eight nine one one, yeah, negative,

05:38

and is only different due to rounding. So yeah,

05:40

when you need to rustle up an r value, y'all

05:42

should probably grab something techie, unless you want to go

05:44

through the headache of finding that r value by hand.

05:47

Remember that the r value just suggests a relationship between

05:49

the variables, rather than saying one causes the other. Correlation does

05:53

not equal causation. Remember that. Tattoo that somewhere, but not

05:57

on your own body. Also remember that the stronger correlations

06:00

are closer to negative one and one, and farther from

06:02

zero in the middle. And finally, when y'all go

06:05

to a restaurant and it takes a spell to get your order,

06:07

don't take it out on the server by stiffing them

06:09

on the tip. There's a strong positive correlation between stiffing

06:13

servers on tips and, you know, getting your food spat

06:16

in next time. And while just being a massive
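
The by-hand recipe from the video (standardize each x and y, multiply the matched pairs, add them up, divide by one less than the number of points) can be sketched in Python like so. This is a rough illustration, not the video's own worksheet: the function names are hypothetical, the wait-time and tip numbers are invented (the transcript never lists the full Wild Pete's data set), and it assumes the sample standard deviation, which is what dividing by n - 1 calls for. The strong, medium, and weak cutoffs are the rough point seven and point four marks mentioned in the video.

```python
from statistics import mean, stdev


def correlation_coefficient(xs, ys):
    """r value via the z-score recipe: standardize, multiply pairs, sum, divide by n - 1."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)  # sample standard deviations (assumption)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)


def describe_strength(r):
    """Rough labels using the video's approximate cutoffs."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.4:
        return "medium"
    return "weak"


# Invented data, NOT the numbers from the video.
wait_minutes = [5, 10, 15, 20, 25]
tip_percent = [20, 18, 17, 14, 12]

r = correlation_coefficient(wait_minutes, tip_percent)
print(r, describe_strength(r))  # negative and strong for this made-up data
```

Python's `statistics` module handles the mean and standard deviation here, playing the same role as the calculator shortcut in the video.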
