ShmoopTube

Where Monty Python meets your 10th grade teacher.

Finance: What is the standard normal distribution?

Description:

What is the standard normal distribution? The standard normal distribution is the normal distribution of z-scores: it has a mean of 0 and a standard deviation of 1, so each value states how many standard deviations a data point lies from the mean. In any normal distribution, about 68% of the data falls within +/- 1 standard deviation of the mean and about 95% within +/- 2 standard deviations.

Language:

English Language

Transcript

00:00
Finance a la Shmoop What is the standard normal distribution
00:08
The standard normal distribution is the distribution of the Z scores
00:11
of the data points from a normal distribution Okay but
00:15
why do we need to create a new normal distribution
00:19
like the new normal Isn't that a thing Wasn't the
00:21
normal distribution we already had good enough Before we explain
00:24
why the standard normal distribution is such a huge improvement
00:27
on the plain old normal distribution we need a
00:30
quick recap of the original A normal distribution or normal
00:34
curve is a continuous bell shaped distribution that follows the
00:37
empirical rule which says that sixty eight percent of the
00:40
data is between negative one and one standard deviations on
00:43
either side of the mean ninety five percent of the
00:45
data is between negative two and two standard deviations on
00:48
either side of the mean and ninety nine point seven
00:51
percent of the data is between negative three and three
00:53
standard deviations on either side of the mean Well the
00:56
regular normal curve has its peak located at the mean
00:59
x bar and is marked off in units of the
01:01
standard deviation s right there That's what it looks like
01:04
Adding the standard deviation over and over to the right
01:06
and subtracting the standard deviation over and over to the
01:09
left But what makes it normal The fact that sixty
01:12
eight percent of all the data is within one standard
01:14
deviation on each side of the mean that makes it
01:17
normal It's that sixty eight percent truism that makes it
01:20
a normal distribution Then ninety five percent of the data
01:23
is between two standard deviations on either side of the
01:25
mean That's another test for normalcy And ninety nine point
01:28
seven percent of the data is within three standard
01:30
deviations on either side Another test That's a third test
01:33
You pass all three you're normal Well tons of things
01:35
in nature and from manufacturing and lots of other scenarios
01:38
are normally distributed like heights of adult males or weights
01:42
of Snickers bars or the diameter of drink cup lids
01:46
or eleventy million other things Okay fun size Snickers have
01:50
a mean weight of twenty point oh five grams and
01:52
a standard deviation of point seven two grams and the
01:55
weights are normally distributed Well that gives us this distribution
01:58
of fun size Snickers weights The height of the
02:00
graph at any point is the likelihood of us getting
02:02
a candy bar of that specific weight The higher the curve
02:04
at a point the greater the chance we get that
02:06
exact weight This means that the fun size Snickers weight
02:09
we'll get the most often is that twenty point Oh
02:12
five grams size that is smack dab in the middle
02:14
right there Weights larger and smaller than that will be
02:17
less common in our Halloween candy haul Weights like seventeen
02:21
point eight nine grams or twenty two point two one
02:24
grams will be extremely rare because they're so far from the
02:27
middle and are at a part of the curve where
02:29
we have a very small likelihood of getting those weights
02:32
So why should we even mess with the normal distribution
02:34
we already have by calculating Z scores to create a
02:37
standard normal distribution And well what the heck is a
02:39
Z score Anyway We'll answer the first question in just
02:42
a sec but a Z score is a value we calculate
02:45
that tells us exactly how far a specific data point
02:48
is from the mean measured in units of standard deviation
02:51
Z scores are a way to get an idea for
02:53
how large or small a data point is compared to all
02:56
the other data points in the distribution It's like getting
02:59
a measure of how fast a Formula One racecar is
03:02
compared not to regular beaters on the road but to
03:05
other Formula One race cars The Formula One car is obviously
03:08
faster than the Shmoop mobile here But is it faster
03:12
than other Formula One cars That's what really matters A
03:15
Z score will tell us effectively where that one Formula
03:18
One car ranks compared to all the other ones we
03:20
can speed test If it's got a large positive Z
03:23
score it's faster than many if not most of the
03:26
cars If it has a Z score close to zero well
03:28
then it's right in the middle of the pack speed wise
03:30
If it's got a large negative Z score well it's
03:32
the turtle to the other cars' hares Why would we
03:35
plot the Z scores instead of the scores themselves Well
03:38
because the process of standardizing or calculating and plotting
03:41
the Z scores of the data points makes any work
03:44
we need to do with the distribution about ten thousand
03:46
times easier When we calculate and plot the Z scores we
03:50
create a distribution that doesn't care anything about the context
03:53
of the problem or about the individual means or standard
03:56
deviations or whatever Effectively we create one single distribution that
04:01
works equally well for heights of people or weights of
04:04
candy bars or diameters of drink lids or lengths of
04:08
ring tailed lemur tails If we don't standardize by working
04:12
with Z scores we must create a normal curve that
04:14
has different numbers for each different scenario And we have
04:17
to do new calculations for each scenario for each different
04:21
set of values So let's explore the important features of
04:24
the standard normal distribution and how it differs from all
04:27
the other regular normal distributions The standard normal curve and
04:31
the regular normal curve look identical in shape They just
04:36
differ in how the X axis this thing right here
04:38
is divided Let's walk through an example where we compare
04:41
how the normal distribution of the actual data and the
04:43
standard normal distribution for the Z scores of the data
04:46
are created at the same time Okay What are we
04:48
gonna pick here Well let's pick narwhal tusks They're very
04:52
close to normal in their distribution with a mean length
04:55
of two point seven five meters and standard deviation of
04:57
point two three meters The regular normal distribution of narwhal
05:01
tusk lengths or narwhal distribution as I think of it will
05:05
have the peak located above the mean of two point
05:07
seven five meters We'll need the Z score of a
05:09
data point representing the length of two point seven five
05:12
to start labeling the standard normal distribution the same way
05:15
Well Z scores are found by subtracting the mean from
05:18
a data point and dividing that value by the standard
05:20
deviation of the data To find a Z score we
05:23
subtract the mean two point seven five from our data
05:25
point also two point seven five to get zero And
05:28
then we divide that by the standard deviation of point
05:30
two three Well we get a Z score for that
05:32
middle value of zero Here's the same normal curve of
05:35
the tusk lengths paired with the standard normal curve of
05:38
the Z scores Now for the tick marks on the
05:40
straight up tusk length distribution right there we add the
05:43
standard deviation of point two three three times to the
05:46
mean of two point seven five to get the tick
05:49
marks to the right of the mean Well we just get
05:51
what's that two point nine eight and then three point
05:53
two one as we're adding point two three to it And
05:55
then another point two three gets us three point four four
05:57
There we go and we repeat that procedure on the
06:00
left but subtract it three times So we get two point
06:02
five two two point two nine And then what is
06:05
that two point Oh six on the left Well to
06:07
get these same values on our standard normal curve we
06:10
need to find some more Z scores The first score
06:13
to the right of the mean is at a value of
06:14
two point nine eight meters Its Z score will be
06:16
found by taking two point nine eight and subtracting the
06:19
mean of two point seven five to get that point
06:20
two three and then dividing that by the standard deviation
06:23
of point two three Well we get one See that's
06:25
kind of a little mini proof there The second tick
06:28
mark to the right will be for data points at
06:30
three point two one meters Well when we subtract the
06:32
mean we get point four six which we divide by
06:35
point two three and get Z equals two and the
06:37
third tick mark there works out similarly It gets a Z
06:40
equals three See there it is Things will work out
06:42
similarly but negatively on the other side on the left
06:44
when we do the same thing for tick marks Negative
06:47
one negative two and then there we go negative three
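The tick mark labeling walked through above can be sketched in a few lines of Python. This is only an illustration: the names MEAN_M, SD_M and z_score are made up here, and the mean of 2.75 meters and standard deviation of 0.23 meters are the narwhal tusk figures quoted in the transcript.

```python
# Assumed inputs, taken from the transcript's narwhal tusk example.
MEAN_M = 2.75  # mean tusk length in meters
SD_M = 0.23    # standard deviation in meters


def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd


# Tick marks on the regular normal curve: the mean plus or minus 1, 2, 3 SDs.
ticks = [round(MEAN_M + k * SD_M, 2) for k in range(-3, 4)]

# The same positions relabeled for the standard normal curve.
z_ticks = [round(z_score(x, MEAN_M, SD_M), 2) for x in ticks]

for x, z in zip(ticks, z_ticks):
    print(f"{x:.2f} m -> z = {z:+.0f}")
```

Every tusk length on the first curve lands on a whole number Z score from negative three to three on the second, which is the relabeling the two paired curves show.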
06:50
Well let's look at the two curves together One is
06:52
specific to the data of narwhal tusk lengths while the
06:55
other is standardized to represent the perfect normal curve usable
06:59
for all normal data regardless of context or the values
07:02
of the means or standard deviations So after standardizing does
07:07
the standard normal curve follow the empirical rule Yeah it's
07:11
a normal curve After all it's even in the name
07:14
standard normal curve See they kind of tipped me off
07:17
to those things There's still sixty eight percent of data
07:19
points between Negative one and one on the standard normal
07:21
curve There's still ninety five percent of the data between
07:23
negative two and two on the standard normal curve And
07:26
there's still ninety nine point seven percent of the data
07:27
between negative three and three on the standard normal curve
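The three empirical rule percentages can be checked numerically rather than taken on faith. The sketch below is one way to do it, not something shown in the video: it relies on the standard identity that the share of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)), and the function name share_within is made up for this example.

```python
import math


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.2%}")
```

The results round to the sixty eight, ninety five and ninety nine point seven percent figures quoted above, and they do not depend on any particular mean or standard deviation.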
07:30
So getting back to the ten thousand times easier thing
07:33
Well it comes in when we try to answer questions
07:36
like how many of the gummy coated pretzel logs weigh
07:40
between twelve and fifteen grams So here's the set up
07:43
Gummy coated pretzel log weights are normally distributed with a
07:47
mean of thirteen point two grams and a standard deviation
07:50
of point seven eight grams We want to know what
07:52
percentage of pretzel logs that come out of the gummy
07:55
bear coating machine weigh between twelve and fifteen grams which
07:58
the company considers their ideal weight range so it's likely that
08:01
customers wouldn't complain and send them back for being too
08:04
little or too big If we don't standardize things by
08:06
finding the Z scores of our boundary values of twelve
08:09
and fifteen grams we'll need some kind of technology to
08:11
interpret our mean standard deviation and boundary values in terms
08:15
of the normal curve specific to this situation If we
08:17
change anything about the problem like the boundary values or
08:21
mean or standard deviation well then we'll have to re
08:24
input all the new data and start completely over And
08:27
that would suck On the other hand since we know
08:29
that the data are already normally distributed well we can simply
08:33
standardize the two boundary values by calculating their Z scores
08:36
and use the majesty of the Z table this thing
08:39
to answer our questions which is a table telling us
08:42
what percentage of data lies to the left or right
08:45
of any Z score across the whole standard normal distribution
08:49
Many lives were lost and billions of dollars were spent
08:52
to build this thing so you know you gotta respect
08:54
it Not to put too fine a point on it
08:56
but if we don't standardize to Z scores we need to
08:58
use a unique normal curve and unique calculations every single
09:02
time we work with those situations But if we do
09:05
standardize to Z scores we just need to check the
09:07
one table for every situation It's like choosing to go
09:10
to a different store every time we need a different
09:13
product or going to one store that has all of
09:15
them in one place like you'd rather go to Safeway
09:18
than just the broccoli store and then the egg store
09:21
and then the milk store right So let's calculate our
09:23
two Z scores for our boundary values and then check
09:26
the Z Table to get our percentage of pretzel logs
09:28
in the sweet spot that twelve to fifteen range thing
09:31
Well we'll take the first data point twelve and subtract the
09:33
mean weight of thirteen point two giving us negative one
09:36
point two grams and then divide that by the standard
09:38
deviation of point seven eight which gives us a Z
09:40
score there of negative one point five three eight Then
09:42
we'll take the second data point fifteen subtract that mean
09:45
of thirteen point two to get one point eight then
09:47
divide that value by our standard deviation of point seven
09:50
eight to get a Z score of two point three
09:51
zero eight Well there are two different kinds of Z tables
09:54
One shows the area to the left of a specific
09:57
Z score The other shows the area to the right
10:00
They both give the same info so we'll just use
10:03
a left Z table A series of Z scores accurate
10:07
to the tenths place runs down the left hand side
10:09
and the hundredths place for each of those Z scores
10:11
runs across the top Well the percentage of data to
10:14
the left of a specific Z score can be found
10:16
at the intersection of a row and a column We'll need to
10:18
round both our Z scores to the hundredths place negative
10:21
one point five four and then two point three one
10:24
respectively in order to locate a percentage of data to
10:27
the left of each one Well we'll go down to
10:29
the negative one point five row then across to the
10:32
column here headed by the negative zero point zero four
10:35
where negative one point five Avenue intersects with negative zero
10:38
point zero four street and we find a percentage of
10:41
data to the left of Z equals negative one point
10:44
five four of zero point zero six one seven eight
10:48
this thing Well we'll then head way down to the
10:51
two point three boulevard then across to the point zero
10:53
one road They cross at point nine eight nine five
10:57
six So now what What do we do with these
10:59
two percentages Well glad you asked We know the
11:01
percentage of data to the left of our fifteen gram
11:03
upper boundary which is at a Z score of two
11:06
point three one We also know the area to the
11:08
left of our twelve gram lower boundary at a Z
11:10
score of negative one point five four Now it's time to
11:13
merge those two areas Check the area to the left
11:16
of the Z score of two point three one on
11:18
the standard normal curve This is the percentage of data
11:20
to the left of that value Now check the area
11:23
to the left of the Z score of negative one
11:25
point five four on the same standard normal curve Well
11:28
this is the percentage of data to the left of
11:30
that value If we cut away the area to the
11:32
left of Z equals negative one point five four we're
11:35
left with the area here between Z equals negative one
11:38
point five four and Z equals two point three one
11:40
This is the percentage of data between these two values
11:44
And you're looking at this really heavily to be sure
11:46
that you got enough in that general sweet spot range
11:49
so they don't get a whole lot of returns from angry
11:50
customers Well we just need to subtract the point Oh
11:53
six one seven eight from the point nine eight nine
11:55
five six to get the percentage of data between those
11:57
two values which is yes about ninety three percent so
12:01
What does that mean Well that means ninety three percent
12:03
of the gummy coated pretzel logs produced will be between
12:06
twelve and fifteen grams in weight And that's either good
12:08
news or not Well a couple of important safety tips
12:12
though Before you all head out to the store for
12:14
some more gummy coated pretzel logs We should only
12:16
try to standardize i.e. do things with Z scores if
12:19
the data are normal in shape to begin with If
12:22
they're not the data machinations here will be useless
12:24
to you Make sure you're paying attention to what kind
12:26
of Z table you have Again some show areas to
12:29
the left while others give areas to the right of
12:32
specific Z scores Every time you've got a set of
12:35
normally distributed data you should standardize the situation by finding
12:39
Z scores And well you'll save yourself a ton of
12:42
work in the long run Well at least tons of stats
12:44
work if we can't help you Sorry I do
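As a recap of the gummy coated pretzel log example above, here is a minimal Python sketch of the same calculation. It is an assumption-laden illustration, not the video's method: Python's statistics.NormalDist stands in for the printed Z table, the constant names are made up, and the mean of 13.2 grams, standard deviation of 0.78 grams and the 12 to 15 gram range are the figures quoted in the transcript.

```python
from statistics import NormalDist

# Assumed inputs, taken from the transcript's pretzel log example.
MEAN_G, SD_G = 13.2, 0.78
LOW_G, HIGH_G = 12.0, 15.0

# Standardize the two boundary values.
z_low = (LOW_G - MEAN_G) / SD_G    # about -1.538
z_high = (HIGH_G - MEAN_G) / SD_G  # about 2.308

# NormalDist() with no arguments is the standard normal (mean 0, SD 1);
# its cdf plays the role of a left Z table.
std_normal = NormalDist()

# Table style: round each Z score to the hundredths place before looking it up.
left_of_high = std_normal.cdf(round(z_high, 2))
left_of_low = std_normal.cdf(round(z_low, 2))
table_share = left_of_high - left_of_low

# Same question answered without rounding, straight from the original curve.
pretzel_dist = NormalDist(MEAN_G, SD_G)
exact_share = pretzel_dist.cdf(HIGH_G) - pretzel_dist.cdf(LOW_G)

print(f"table style: {table_share:.1%}")
print(f"unrounded:   {exact_share:.1%}")
```

Both routes land at roughly ninety three percent. Rounding the Z scores to the hundredths place, as a printed table forces, changes the answer only slightly.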