Probability and Statistics
Box and Whisker Plots
A box and whisker plot for a list of numbers is a picture built on a number line that uses five numbers: the lowest and highest values in the list, and the quartiles Q1, Q2, Q3. Try not to shave for two days before attempting one of these plots, since we'll need your whiskers to be nice and pronounced.
The picture will usually look something like this: a box with a line down the center, and two "whiskers." Oh. That's where it comes from.
There are five vertical lines in a box and whisker plot. These vertical lines correspond to the five numbers we mentioned above:
Having a scale is important so we can put the vertical lines in the right places. It's also important for helping us determine whether this "fruit and nut diet" is working.
The best way to get used to these pictures is to go ahead and build some. It's doodle time.
Draw a box and whisker plot for the following data:
10, 11, 12, 15, 15, 16
This list is already in order, so we don't need to reorder it. The lowest value is 10, the highest value is 16, and the quartiles are
Q1 = 11
Q2 = 13.5
Q3 = 15.
We'll start the box-and-whisker plot with a scale. The scale needs to go at least from 10 to 16 (since those are our lowest and highest data values), and we'll mark off increments of 0.5 since one of our numbers (Q2) is 13.5. It's all about the halves and the half-nots.
Next we find the lowest and highest values and the quartiles on the number line, and draw vertical lines above each:
We connect the lines above the quartiles with a box:
and finally draw the whiskers to the outermost lines:
Box and whisker plots give a visual idea of "where the data is." Half the numbers fall within the box, and half the numbers fall outside the box.
Of the numbers within the box, half are to the left side of the dividing line, and half are to the right side. By dividing everyone up evenly like this into separate cells, there will be less chance of a revolt or a riot. You're just trying to keep some order around here.
Take a look at this box and whisker plot:
Half of the data is within the box, between the numbers 1 and 1.5. Half the data is outside the box, between 0 and 1 or between 4 and 6.
Looking within the box, one quarter of the total data falls between the numbers 1 and 1.5, and one quarter between 1.5 and 4. We have the same number of of values squished into the space between 1 and 1.5 as we do in the space between 1.5 and 4. The latter values simply have a bit more room in which to stretch out and relax.
The length of the whiskers gives us a sense of how far away the lowest and highest values are from the rest of the data. In this picture, the lowest and highest values are very close to the rest of the data:
In this picture, the lowest and highest values are far from the rest of the data:
When the whiskers are super far out there, something doesn't seem quite right. A variation of a box plot adjusts for the presence of outliers and extreme values. In the real world, these would be equivalent to "rebels," "hippies," or "Harvard graduates." Outliers and extreme values are numbers that are far away from most of the other numbers. For example, if the scores on a test were
13, 72, 73, 85, 86, 87, 89
then the number 13 is really far away from the other scores. If only one student got a remarkably bad score on the test, maybe we shouldn't consider that score when deciding the letter grades for the other students. In fact, maybe we shouldn't acknowledge that that student exists at all. Do you hear a sound, class? Is that someone talking? Hm...guess somebody let a fly in...
To find the outliers and extreme values for the sake of the box and whisker plot, we first need to find the interquartile range (IQR). "Interquartile range" may sound like some fabricated piece of Star Trek vernacular ("Captain...we need a reading on our interquartile range!"), but it is actually a real thing. On the box and whisker picture, the interquartile range is the width of the box; in symbols the interquartile range is
IQR = Q3 – Q1.
In this box and whisker plot, the interquartile range is 3 – 1 = 2.
Meanwhile, if the quartiles are
Q1 = 4
Q2 = 9
Q3 = 10
then the interquartile range is
IQR = Q3 – Q1
= 10 – 4
For a box and whisker plot, we start at the edges of the box, go 1.5 times the width of the box in either direction, and draw the inner fences. These are the electrified ones that will keep out any unwanted, snooping values.
Numbers that are within the inner fences are considered "reasonable.'' The whiskers go to the farthest numbers that are within the fences.
Then we go out a distance of 1.5(IQR) again from the inner fences, and draw the outer fences. These are more for show and to make the neighbors jealous.
Numbers between the inner and outer fences are outliers, and numbers outside the outer fences are extreme values. We use an asterisk to mark outliers, and an empty circle to mark extreme values. Don't fill in your circle, or we won't know what you're trying to tell us.
We draw asterisks for the outliers (values between the inner and outer fences), and empty circles for the extreme values (values outside the outer fences), and we're done:
We are still trying to convince the mathematical community to rename outliers "astewhiskers," but it hasn't caught on yet. We'll keep you in the loop on that one.
Sometimes the poor box doesn't get all its whiskers. Even boxes aren't immune from alopecia. Let's draw a box and whisker plot for the numbers
13, 72, 73, 85, 86, 87, 89.
First we find the quartiles:
Q1 = 72
Q2 = 85
Q3 = 87.
We can find
IQR = 87 – 72 = 15
1.5(IQR) = 22.5.
Now we can draw the box and fences:
We can draw a whisker to 89. Since the smallest value within the inner fences is 72, there's nothing to draw a whisker to on the left-hand side of the box. We draw a circle for the extreme value 13. Later, if you're bored, you can turn that circle into a pie chart. Beats sitting around doing nothing. Barely.
Dr. Math gives us another example here. You can trust him. He's a doctor.
We draw the fences at distances of 1.5(IQR) from the box because Tukey told us to, and it's worked pretty well in practice.