Standard Normal Distribution

  

The standard Normal distribution is the distribution of the z-scores of the data points from a Normal distribution. It always has a mean of 0 and a standard deviation of 1.

Okay, but why do we need to create a new Normal distribution? Wasn’t the normal distribution we already had…good enough?

Before we explain why the standard Normal distribution is such a huge improvement on the plain ol’ Normal distribution, we need a quick recap of the original.

A Normal distribution, or Normal curve, is a continuous bell-shaped distribution that follows the Empirical Rule, which says that about 68% of the data is within 1 standard deviation of the mean...about 95% of the data is within 2 standard deviations of the mean...and about 99.7% of the data is within 3 standard deviations of the mean.
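If you'd like to see where those percentages come from, here is a minimal Python sketch that checks the Empirical Rule. It assumes Python 3.8 or later, which ships `NormalDist` in the standard `statistics` module. The helper name `share_within` is ours, made up for this illustration.

```python
from statistics import NormalDist

# A Normal curve with mean 0 and standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of Normally distributed data within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

The exact figures come out a hair off the round numbers (roughly 68.3%, 95.4%, and 99.7%), which is why the rule is best read as "about" 68-95-99.7.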

The regular Normal curve has its peak located at the mean (x bar) and is marked off in units of the standard deviation (s), adding the standard deviation over and over to the right, and subtracting the standard deviation over and over to the left.

What makes it “Normal”? The fact that about 68% of all the data is within one standard deviation of the mean. Then about 95% of the data is within two standard deviations of the mean. And about 99.7% of the data is within three standard deviations of the mean.

Tons of things in nature and from manufacturing and from lots of other scenarios are Normally distributed. Like...heights of adult males, or weights of Snickers bars, or the diameter of drink cup lids, or eleventy-million other things.

Fun-Size Snickers have a mean weight of 20.05 grams with a standard deviation of 0.72 grams, and the weights are Normally distributed. That gives us this distribution of Fun-Size Snickers weights. The height of the graph at any point shows how likely we are to get a candy bar close to that weight. The higher the curve at a point, the greater the chance we get a weight right around it. This means that the Fun-Size Snickers weights we’ll get the most often are the ones near that 20.05 grams size that is smack dab in the middle.

Weights larger and smaller than that will be less common in our Halloween candy haul. Weights like 17.89 grams or 22.21 grams will be extremely rare, because they’re a full three standard deviations from the middle, at a part of the curve where we have a very small likelihood of getting those weights.
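To put numbers on "extremely rare," here is a rough sketch that marks off the Snickers curve one standard deviation at a time and then estimates how many bars land beyond the outermost marks. The mean and standard deviation come from the text. The names `tick_marks`, `MEAN_G`, and `SD_G` are just labels we picked for the example.

```python
from statistics import NormalDist

MEAN_G = 20.05  # mean weight in grams (from the text)
SD_G = 0.72     # standard deviation in grams (from the text)

snickers = NormalDist(mu=MEAN_G, sigma=SD_G)

def tick_marks(mean, sd, max_k=3):
    """Values from mean - max_k*sd up to mean + max_k*sd, one per standard deviation."""
    return [round(mean + k * sd, 2) for k in range(-max_k, max_k + 1)]

marks = tick_marks(MEAN_G, SD_G)
print(marks)

# Share of bars lighter than the lowest mark or heavier than the highest mark.
outside = snickers.cdf(marks[0]) + (1 - snickers.cdf(marks[-1]))
print(f"share outside the outer marks: {outside:.2%}")
```

The outer marks are the 17.89 and 22.21 gram weights mentioned above, and only a fraction of a percent of bars should fall beyond them.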

So why should we even mess with the Normal distribution we already have by calculating z-scores to create a standard Normal distribution, and what the heck is a z-score anyway?

We’ll answer the first question in just a sec, but a z-score is a value we calculate that tells us exactly how far a specific data point is from the mean, measured in units of the standard deviation. Z-scores are a way to get an idea of how large or small a data point is compared to all the other data points in the distribution. It’s like getting a measure of how fast a Formula 1 race car is compared to other Formula 1 cars. The Formula 1 car is obviously faster than the Shmoopmobile, a 1987 Pacer, but is it faster than other Formula 1 cars? That’s what really matters.
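In formula form, the z-score is the data point minus the mean, divided by the standard deviation: z = (x - x bar) / s. Here is a small sketch of that calculation using the Fun-Size Snickers numbers from earlier. The function name `z_score` is our own choice, not something from a library.

```python
def z_score(x, mean, sd):
    """How many standard deviations x sits above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

MEAN_G, SD_G = 20.05, 0.72  # Fun-Size Snickers values from the text

for weight in (17.89, 20.05, 20.77, 22.21):
    print(f"{weight} g -> z = {z_score(weight, MEAN_G, SD_G):.2f}")
```

A bar right at the mean gets a z-score of 0, a bar one standard deviation heavier gets about 1, and the rare 17.89 and 22.21 gram bars land at about -3 and 3.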

A z-score will tell us effectively where that one Formula 1 car ranks compared to all the other ones we can speed test. If it’s got a large positive z-score, it’s faster than many, if not most, of the cars. If it has a z-score close to zero, then it’s in the middle of the pack speed-wise. If it’s got a large negative z-score (one far below zero), it’s the turtle to the other cars’ hares.

Why would we plot the z-scores instead of the scores themselves? Well, because the process of standardizing, or calculating and plotting the z-scores of the data points, makes any work we need to do with the distribution about ten thousand times easier.

When we calculate and plot the z-scores, we create a distribution that doesn’t care anything about the context of the problem, or about the individual means, or standard deviations, or whatever. Effectively, we create one, single distribution that works equally well for heights of people or weights of candy bars or diameters of drink lids or lengths of ring-tailed lemurs' tails.

If we don’t standardize by working with z-scores, we must create a Normal curve that has different numbers for each different scenario. And we have to do new calculations for each scenario for each different set of values.
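Here is a sketch of that "one curve for everything" idea. It answers the same kind of question for two unrelated scenarios using nothing but z-scores and a single standard Normal curve. The Snickers numbers come from the text. The height numbers (mean 70 inches, standard deviation 3 inches) are invented purely for illustration, and `prob_below` is a hypothetical helper name.

```python
from statistics import NormalDist

# The one curve we reuse for every scenario: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

def prob_below(x, mean, sd):
    """P(value < x), found by converting x to a z-score first."""
    z = (x - mean) / sd
    return standard.cdf(z)

# Both values sit 1.5 standard deviations above their own mean.
p_candy = prob_below(21.13, mean=20.05, sd=0.72)  # Snickers, from the text
p_height = prob_below(74.5, mean=70.0, sd=3.0)    # heights, made-up numbers

# Same question answered the long way, with a scenario-specific curve.
direct = NormalDist(mu=20.05, sigma=0.72).cdf(21.13)

print(round(p_candy, 4), round(p_height, 4), round(direct, 4))
```

All three values agree, which is the point: once the data is standardized, the context drops out and one curve does the work.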

Anytime...every time...you’ve got a set of Normally distributed data, you should standardize the situation by finding z-scores. You’ll save yourself tons of work.

Well...at least tons of stats work. We can’t help you with the lawn-mowing.
