Regression Line

Categories: Metrics, Trading

You know that one friend in your group who never has a beef with anyone, who always remembers little details about everyone’s lives, who always has a kind word, and even volunteers to help you move?

A regression line is kinda like that for a set of linear-ish data, always trying to stay as close to every data point as possible, no matter how far away some data points try to get. In fact, it’s the one “line of best fit” that minimizes the sum of the squared vertical distances between the data points and the line.

It’s important to remember that we only find regression lines for data that is probably linear. We say “probably” because there’s no way to be sure that a data set is linear, only a bunch of circumstantial evidence that it might be.

Polly, the CEO of the world-renowned plant distributor “Polly’s Pretty Plants,” sells vegetable plants she has grown in her vast network of greenhouses. Okay, you got us. She works out of her mom’s basement, and she’s 13. Still, Polly is a maven of experimentation and statistical data gathering. So much so that she’s gathered data on the different amounts of her special fertilizer/water mixture given to several plants grown from the same packet of tomato seeds, all planted in soil from the same spot in her mom’s backyard.

Polly would like to be able to predict the amount of growth a seed will experience based on how much fertilizer she puts in the water. The whole point of the regression line is to allow her to do this...at least to a certain degree of accuracy. Regression lines give predictions, not guarantees.

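Regression lines are the best-fit lines for a set of data with a linear pattern. In this case, the phrase “best fit” means the line makes the vertical distances between the points and the line as small as possible overall; for the standard least-squares regression line, that means the smallest possible sum of squared vertical distances. We can find the slope and intercept of the regression line using the formulas m = r × s_y / s_x and b = ȳ − m × x̄ (r is the correlation coefficient, s_x and s_y are the standard deviations of the x and y data, and x̄ and ȳ are their means), or we can just use tech to do everything for us.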

And we can use the regression equation to help us predict either x or y values, with the expectation that the real result will probably be close to the value predicted by the regression equation.
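Here's a minimal sketch of both kinds of prediction in Python, using the rounded equation from Polly's example in the video, y = 1.2514x + 8.6354 (x in cubic centimeters of fertilizer, y in centimeters of growth). The names SLOPE, INTERCEPT, predict_y, and predict_x are hypothetical; predict_x simply solves that same equation for x.

```python
# Rounded regression equation from the video's worked example:
# y = 1.2514x + 8.6354, x = fertilizer (cubic centimeters), y = growth (centimeters).
SLOPE = 1.2514
INTERCEPT = 8.6354


def predict_y(x):
    """Predicted growth in cm for x cubic centimeters of fertilizer."""
    return SLOPE * x + INTERCEPT


def predict_x(y):
    """Fertilizer amount the same line pairs with growth y (solves y = m*x + b for x)."""
    return (y - INTERCEPT) / SLOPE


print(f"5.5 cc of fertilizer -> about {predict_y(5.5):.2f} cm")  # about 15.52 cm
print(f"15.5 cm of growth    -> about {predict_x(15.5):.2f} cc")
```

Either way, the output is a prediction, not a guarantee: the real plant will probably land near that value, not exactly on it.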

Now we just need some regression to help us figure out if we were Albert Einstein or Fred Astaire in a previous life.

Related or Semi-related Video

Finance: What is a regression line?

00:00

Finance à la Shmoop: What is a regression line? You know

00:07

that one friend in your group who never has a

00:09

beef with anyone, who always remembers little details about everyone's

00:12

lives, who always has a kind word, and even volunteers

00:15

to help you move? All right. A regression line is

00:18

kind of like that for a set of linear-ish

00:21

data, always trying to stay as close to every data

00:24

point as possible, no matter how far away some data

00:26

points try to get. In fact, it's the one line

00:29

of best fit that minimizes the distances between the data

00:34

points and the line. That's the regression line. Before we

00:37

get too carried away with the ins and outs of

00:39

finding these elusive regression lines, it's important to remember that

00:42

we only find regression lines for data that is probably

00:46

linear. We say "probably" because there's no way to be

00:49

sure that a data set must be linear, only a

00:51

bunch of circumstantial evidence that it might be linear. There

00:54

are several bits that all work together to help us

00:57

feel okay with saying data are linear-ish, but probably

01:00

the most important and the only one we really need

01:02

to worry about here in finance land, is a linear

01:05

pattern in the scatter plot. Okay, example. Here we go.

01:08

Polly, the CEO of the world-renowned plant distributor Polly's

01:12

Pretty Plants, sells vegetable plants she has grown in her

01:15

vast network of greenhouses. Okay, you got us. She works

01:19

out of her mom's basement, and she's thirteen. Still, Polly

01:22

is a maven for experimentation and statistical data gathering. So

01:26

much so that she has gathered the following data on

01:29

the different amounts of her special fertilizer/water mixture given

01:33

to several plants from the same packet of tomato plant

01:36

seeds all planted in soil from the same spot in

01:38

her mom's backyard. Well, Polly would like to be able

01:41

to predict the amount of growth a seed will experience

01:44

based on how much fertilizer she puts in the water. The

01:46

whole point of the regression line is to allow her

01:48

to do this at least to a certain degree of

01:51

accuracy. Right, we're getting there, but first she

01:53

needs to know if the data points are linear. Okay,

01:55

so Polly quickly whips up a scatter plot (that's what

01:58

thirteen-year-old girls do, isn't it?) and sees what

02:01

appears to be a roughly linear pattern, right? Well,

02:04

now we could draw lines on that scatter plot until

02:06

the cows come home, but only one is the line

02:08

of best fit, or the line that gets the smallest

02:11

bunch of distances from each data point possible. All right,

02:14

well, what would this regression line look like? Okay, well,

02:17

as an eyeball-only approach, a line of best

02:20

fit does not have to hit any of the data

02:22

points, but it should follow the slant, or slope, of

02:24

the data and try to split the points so that

02:27

the distances straight up from the line to the

02:30

points are balanced out by distances straight down from the

02:34

line to the points. Let's start with a line, any

02:36

line, and it goes up from left to right. All

02:38

right, this first line does nothing right. We usually try

02:41

to fix the slope first, and here we need to

02:43

make the slope less steep. In other words, decrease the

02:46

slope, little by little, until we have a pretty

02:49

good match to the slope of the data. That last

02:51

line looks like it's pretty close to the slant of

02:53

the data. And again, this is just an eyeball approach,

02:55

so yeah, it won't be perfect. We've got the slope

02:58

part pretty locked in, but we don't have the "split

03:00

the data points with equal distances" thing going for us

03:03

yet. We need to move the whole line straight up

03:05

until it kind of splits the data points, a little

03:07

by little, until we can get a good split. There

03:09

don't have to be equal numbers of points above and

03:12

below. It's more about equal distances. So we have three

03:15

points, all about the same medium-ish distance above, with

03:19

a very close point a medium close point and a

03:21

kind of far point below. Yeah, down there. So the

03:24

total distance above is about equal to the total distance

03:27

below. Now, we could find the equation of the eyeball

03:30

line, but it definitely isn't the perfect best-fit line.

03:34

It's just close. So how do we find the equation

03:37

by hand? Well, the formula for the slope of the

03:40

regression line is m equals the correlation coefficient r times

03:44

the standard deviation of the y data, s sub y,

03:48

divided by the standard deviation of the x data, s

03:51

sub x, right? We typically don't find the correlation coefficient

03:54

or the standard deviations by hand, especially since they're super

03:57

duper easy to get via technology like a graphing calculator,

04:00

spreadsheet, or website. But, well, you can check out some

04:02

of our other videos if you really want to. So

04:05

if we pop Polly's data from before into a TI

04:08

graphing calculator, we can run a LinReg, or linear

04:11

regression, to get the r value, then run 1-Var

04:13

Stats on both the x and y data to get

04:16

standard deviations, and we get the correlation coefficient r to

04:19

be 0.8654, the standard deviation

04:21

of the x data, s sub x, to be 1.8708

04:23

right there, and the standard

04:25

deviation of the y data, s sub y, to be 2.7053.

04:28

Easy. Okay, that in turn

04:30

gives a slope for our regression line of 0.8654

04:33

times 2.7053

04:35

divided by 1.8708, which is approximately

04:37

1.2514. So we did all

04:39

the math there for you. But we're still missing the

04:41

y-intercept. So how do we find that for Polly?

04:43

Well, the formula for the y-intercept of the regression

04:46

line, b, is found by taking the mean of the

04:48

y data, y bar, minus the product of the slope,

04:51

m, which we just calculated, and the mean of the

04:54

x data, x bar. Again, we usually get the two

04:56

means, x bar and y bar, using tech. This is

05:00

the twenty-first century, after all, people. Using the same

05:03

1-Var Stats from before on each data set, using

05:06

our tricked-out, diamond-plate-covered, voice-activated TI

05:09

graphing calculator. Well, that gives us x bar equal to 4.5

05:14

and y bar equal to 14.2667,

05:16

to go along with our slope of

05:18

1.2514. Well, our y-intercept

05:21

then is 14.2667 minus

05:23

1.2514 times 4.5, which

05:25

is approximately 8.6354. That's a lot

05:27

of numbers we've thrown at you. Yeah, but

05:27

what's our regression equation, then? Well, we jam the slope

05:30

and y-intercept into slope-intercept form, that y equals

05:33

mx plus b thing, and we get y equals

05:36

1.2514x plus, and that's b,

05:38

8.6-whatever. Well, plotted on the original scatter

05:41

plot, this is how that line looks right there. Finding

05:44

the slope and y-intercept using the formulas? Fine,

05:47

but again, twenty-first century, people. Come on, ask your parents.

05:49

No one, except maybe old man Hostetler, who still thinks

05:53

calculators are a tool of the devil, does any of

05:55

this by hand. And would any of you be really

05:58

mad if we told you that we already had the

06:00

equation on a screen way back at the beginning of

06:03

the by-hand calculations? Yeah, we knew you'd be

06:06

cool about it. Seems that when we found the r

06:08

value, we also had the slope and y-intercept of the

06:11

regression line staring us right in the face just above

06:14

the r there. See that value in the a equals

06:17

row? That's the slope of the regression line. See the

06:21

b equals row? Yeah, that's the y-intercept. The fact

06:24

that the values are a wee bit different than ours

06:27

like four decimal places in, that's just a rounding error.

06:30

So Polly's got a regression line. What can she do

06:32

with it? Well, she could use it to predict the likely

06:35

growth of another seed of that same variety, based

06:37

on how much fertilizer she adds. Let's say Polly

06:40

uses 5.5 cubic centimeters of fertilizer. How

06:42

tall might her plant be? Well, we just plug 5.5

06:45

in for x in the regression equation, giving

06:48

us y equals 1.2514 times

06:50

5.5 plus 8.6354,

06:52

which is 15.5-and-change centimeters. She could

06:55

expect the plant to be about 15.5-plus

06:58

centimeters tall after the same time period passes. It probably

07:01

won't be exactly that value, but it should at least

07:04

be in the neighborhood. Regression lines give predictions, not guarantees.

07:07

So, to recap: regression lines are the best-fit lines

07:10

to a set of data with a linear pattern. In this

07:13

case, the phrase "best fit" means the line reduces

07:15

the vertical distance between the points and the best fit

07:18

line to as small as possible. We can find the

07:21

slope and intercept of the regression line using the formulas,

07:24

or we could just use tech to do everything for

07:26

us. And we can use the regression equation to help

07:28

us predict either x or y values, with the expectation

07:31

that the real result will probably be close to the

07:33

value predicted by the regression equation. Now we just need

07:36

some regression to help us figure out if we were Albert Einstein or Fred Flintstone in a previous life.
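The by-hand arithmetic from the video can be replayed in a few lines of Python. This sketch uses only the summary statistics quoted in the video (r, s_x, s_y, x̄, ȳ) and rounds the slope to four decimal places before computing the intercept, which is what the video's numbers imply.

```python
# Summary statistics quoted in the video for Polly's fertilizer/growth data.
r = 0.8654       # correlation coefficient
s_x = 1.8708     # standard deviation of the x (fertilizer) data
s_y = 2.7053     # standard deviation of the y (growth) data
x_bar = 4.5      # mean of the x data
y_bar = 14.2667  # mean of the y data

# Slope: m = r * s_y / s_x, rounded to four decimals as in the video.
m = round(r * s_y / s_x, 4)

# Intercept: b = y_bar - m * x_bar.
b = y_bar - m * x_bar

print(f"y = {m:.4f}x + {b:.4f}")  # y = 1.2514x + 8.6354

# Prediction for 5.5 cubic centimeters of fertilizer.
prediction = m * 5.5 + b
print(f"predicted growth: {prediction:.2f} cm")  # predicted growth: 15.52 cm
```

Keeping the unrounded slope instead nudges the intercept to about 8.6353, the same kind of fourth-decimal rounding difference the video points out on the calculator screen.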

Find other enlightening terms in Shmoop Finance Genius Bar.