Durbin Watson Statistic

  

Categories: Metrics, Education

The Durbin Watson Statistic lets us know when good statistical regression analysis has gone bad...like a donut that looks good on the outside, but is actually super stale. Nobody likes a stale donut.

The Durbin Watson Statistic tests a time-series regression for autocorrelation, which we don’t want. Other tests might say “hey, you, your regression is looking good!” while the Durbin Watson Statistic test might say “uhmmm, actually, you should take another look...something’s not right, even if the other tests checked out.” The statistic always lands somewhere between 0 and 4. A value of about 2 means there’s no autocorrelation. Values below 2 (down toward 0) point to positive autocorrelation, and values above 2 (up toward 4) point to negative autocorrelation.
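For the curious, the statistic is the sum of squared changes between consecutive residuals, divided by the sum of squared residuals. Here’s a minimal Python sketch of that calculation. The function name and the two residual lists are made up for illustration; they’re not from any real regression.

```python
def durbin_watson(residuals):
    """Return the Durbin Watson statistic for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared change from each residual to the next one.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Denominator: plain sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Hypothetical residuals, invented for this example only.
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # errors stay on one side for a while
flipping = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # errors switch sign every period

print(durbin_watson(trending))  # below 2: positive autocorrelation
print(durbin_watson(flipping))  # above 2: negative autocorrelation
```

Errors that hang around on one side of the line drag the statistic toward 0, while errors that flip back and forth push it toward 4.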

So what is autocorrelation, and why is it bad? Regressions are functions that try to use a bunch of data to predict something. Basically, a regression is a statistical method for finding correlations by fitting data to a line (it can’t prove causation, though...for that we’ve gotta have experiments). Finding the best line for the data is the goal. How far each data point sits from the line is its error, and we want to keep those errors as small as possible to get the best-fit line.

When there’s autocorrelation, that means the errors of your regression are correlated with each other over time, either negatively or positively. If your regression “fits” the data well but your errors are correlated, that means something’s wrong. For instance, it could mean that you missed a really important variable that has some explanatory power, one that shouldn’t be buried in your error but should be a part of your regression line (omitted variable bias).

You can also get autocorrelation when your regression is functionally misspecified, which means you picked the wrong shape for the relationship (say, a straight line for data that actually curves). Your errors might balance out on both sides of the regression line overall, but they show up in long runs on one side and then the other, showing that you missed something in the relationship...which is kinda the point of doing a regression.
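To see what a misspecified regression does to the statistic, here’s a rough Python sketch that fits a straight line to data that actually follows a curve, then runs the standard Durbin Watson formula (squared changes in consecutive residuals over squared residuals) on what’s left over. The data, the function names, and the choice of a simple squared relationship are all assumptions made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def durbin_watson(residuals):
    """Durbin Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical data: time periods 0..9 and a variable that grows with the square of time.
periods = list(range(10))
values = [t ** 2 for t in periods]

# Wrong functional form on purpose: a straight line through curved data.
intercept, slope = fit_line(periods, values)
residuals = [y - (intercept + slope * t) for t, y in zip(periods, values)]

print(residuals)                 # positive, then a run of negatives, then positive again
print(durbin_watson(residuals))  # far below 2
```

The residuals sum to zero, so the line looks balanced, but they come in runs, and the statistic lands far below 2.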

A third way you can get autocorrelation is measurement error in the independent variable. That measurement error ends up in both your independent variable and your error term, so you’ll find your errors correlating over time along with it.

Related or Semi-related Video

Finance: What is Inverse Correlation?

00:00

Finance à la Shmoop. What is inverse correlation? All right, it's the relationship between two variables where we can expect an increase in one variable to be paired with a decrease in another variable.

00:13

All right, in plain English. Correlation: when it rains, you get wet. Inverse correlation: when it rains, you get dry. Correlation: you have a big brain, so you're smart. Inverse correlation: like, we're thinking dinosaurs. Maybe they had big brains, but if the bigger their brain, the dumber they got, well, that would be an inverse correlation, right? Correlation: you drive a fast, flashy car, so you probably have a small garage. All right. Inverse correlation: you drive a fast, flashy car, and everything else about you is enormous. Yeah. So in a word, inverse is just opposite.

00:49

Check out inverse correlations in this table showing two data sets. Note how the returns on investment in gold increase while the returns on investment in Hats for Cats decrease over the same timeframe. And instead of Hats for Cats, you could have used the stock market, because people typically retreat into gold when they're nervous about equities. So this is really not a bad inverse correlation, right?

01:10

Okay, so let's take a look at a scatter plot of the same two data sets. See how the data points get lower and lower as we go farther to the right? Yeah, that's because as the X values, or the returns on the gold investment, increase, or go farther to the right, well, then the Y values, or returns on the Hats for Cats equities investment, decrease, or go lower on the Y axis.

01:31

Well, sometimes we want to put a number on how strong or weak the inverse correlation is between any pair of variables. So basically we're trying to determine if the inverse correlation is one that follows a very steady amount of decrease in one variable for a fixed amount of increase in the other, or if the amount of decrease in one variable fluctuates for a fixed increase in the other. Or, more simply, how close the points on the scatter plot are to an imaginary line, like this thing, that best represents them. Right, that's an r-squared correlation there; we'll get to it. So the measure of how strong the correlation is between these two variables is called, yes, the correlation coefficient, or r value.

02:08

Well, a strong inverse correlation would have the data points all cozied up to the best-fitting line, coincidentally called the line of best fit. Kind of like all of your new, you know, friends after you win the ninety-million-dollar Powerball lottery. Well, a weak inverse correlation would have the data spread out away from the line of best fit. So there's really no clustering here. It's just a whole bunch of dots on a graph that don't really tell you much of anything, you know, like the location of kids at a middle school dance compared to the location of the dance floor.

02:44

So how do we find the r value to determine how strong our correlation is, inverse or otherwise? Well, typically people use some sort of technological gadget, such as a graphing calculator, a spreadsheet, or an app. Let's take that investment data from before, comparing gold returns to returns on equity in Hats for Cats, and walk through how you'd find the r value using a spreadsheet. So open your favorite, like Excel or Google Sheets or OpenOffice Calc, or whatever you use.

03:11

We're using Excel for this demo, but they all work in basically the same way. Put the data for the gold returns, without the percent signs, in the first column. Put the data from Hats for Cats, also without the percent signs, in the second column, like that.

03:20

Go to any blank cell, like the top cell in the third column, C1 there. Then click on the Formulas tab, choose the More Functions button, and then choose Statistical. See the CORREL option there? Select it. Once we choose CORREL, we'll get a pop-up asking us to define the two sets of data.

03:37

Highlight only the first column of data. It should load that set of cells in the top row of the pop-up. All right, now left-click in the second row of the pop-up, highlight only the second column of data, and, well, it should slide those cells right into place. Got it?

03:50

Okay, so click OK and, boom, instant r value. Well, it should show up in the cell you picked to enter the correlation command: C1, if you followed our directions to a T. Looks like our correlation has a strength of negative 0.8401. Okay, but, like, what does that mean? Is that strong or weak? Positive? Negative?

04:10

Well, for inverse correlations, there's a range of values between zero and negative one that we consider strong, medium, and weak inverse correlations.

04:17

Strong inverse correlations generally run from about negative 0.7 to negative 1. These will be scatter plots where the points are quite close to the best-fit line, like they're extremely counter, or inversely, correlated. Like, if you found that every guy over forty who drove a red convertible Porsche and wore a gold chain necklace had a really big garage, well, then it would be negatively correlated to our expectations.

04:40

Okay, medium inverse correlations generally run from about negative 0.4 to negative 0.7. These will be scatter plots with points grouped less closely around the line of best fit. Think about it like, well, maybe half to two-thirds of all the guys have a small garage versus a big garage. Yeah, something like that.

04:58

Weak inverse correlations generally run from, well, zero to negative 0.4. These will be scatter plots with almost no real tight grouping. Maybe there's some trend if you really study it hard and think about Rorschach, but there's really no correlation between the size of your garage and whether, when you're driving a convertible red Porsche, you wear a gold chain necklace.

05:15

All right, one thing we have to be careful about with inverse correlation is the implied value judgments that mistakenly get applied to the two variables that are inversely correlated. Like, in our correlation calculation on returns, we saw the gold investment rise while the equity in Hats for Cats investment had decreasing returns, and basically we were saying that people retreated, putting their cash into gold when they were nervous about the equity markets. Well, that doesn't mean gold will always be the one to increase while Hats for Cats decreases. Beyond the obvious changes in the market that might make gold suddenly tank, an inverse correlation means that gold could be the investment with decreasing returns while Hats for Cats shows increasing returns. Right, the correlation thing is just showing that they're inversely correlated: when one goes up, the other goes down. It could be that, well, when one goes down, the other one goes up.

06:02

Got it? The biggest takeaway is knowing that inverse correlation means that as one variable increases, in general, the other variable decreases, particularly when you have high r-squared correlations there. Also, we can calculate how strong the correlation is by finding the r value, which we typically do using some technological doodad. Yeah, thank you, Google Sheets and Excel and all that stuff. Inverse correlations' r values run from zero to negative one, with strong being close to negative one and weak being close to zero.

06:30

And we're hoping there's an inverse correlation between the number of matches we get on Tinder and the number of dates we get that end badly. But, well, so far the data is not backing us up. Change our picture? What? Oh...
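If you'd rather skip the spreadsheet, the same r value can be worked out in a few lines of Python. This is only a sketch: the return figures below are invented stand-ins (the video's actual table isn't reproduced here), the function names are made up, and the strong, medium, and weak cutoffs are the rough ranges quoted in the video, not an official standard.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient (r value) for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a list with no variation has no r value")
    return sxy / sqrt(sxx * syy)


def inverse_strength(r):
    """Label an r value using the rough ranges from the video (an assumption)."""
    if r >= 0:
        return "not inverse"
    if r <= -0.7:
        return "strong"
    if r <= -0.4:
        return "medium"
    return "weak"


# Hypothetical returns in percent, without the percent signs.
gold_returns = [1.0, 2.0, 3.0, 4.0, 5.0]
hats_for_cats_returns = [9.0, 8.0, 6.0, 5.0, 2.0]

r = pearson_r(gold_returns, hats_for_cats_returns)
print(round(r, 4), inverse_strength(r))
```

With these made-up numbers, gold rises every period while Hats for Cats falls, so the r value comes out close to negative one.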
