Who doesn't like pictures? Nobody! There's only one little thing we need to cover before we get to the good stuff, though:
One simple way to look at single-variable data is to write it down in an ordered list. This helps us see the highest and lowest values, count how many pieces of data there are, and get an idea of what the values are. However, since lists can become long and unwieldy, especially when we start writing down a list of all the places we've never been to outside of our home town, it can be helpful to use other methods of looking at data.
A stem is something from which stuff grows. We can think of a digit like a stem: if we write down a single digit, each choice for the digit we write next "grows" us a new number. If it doesn't seem to be growing, you can try watering it, but don't go crazy. We don't want to drown the poor things.
For example, we could use the digit 3 as a stem from which to "grow" either the number 31 or the number 35:
A stem and leaf plot is a fancy-shmancy sort of table that captures this idea of "growing" different numbers out of the same stem.
Let's forge ahead and build a couple. Since we're building numbers instead of actual plants, we're not playing God so much as we're, um, playing Euclid.
In the simplest case, our data consists of two-digit numbers.
Build a stem-and-leaf plot for the list
32, 34, 35, 36, 37, 45, 46, 47, 48, 55, 56, 57.
The stems of the numbers in the list are the digits that occur in the tens place: 3, 4, and 5.
32, 34, 35, 36, 37, 45, 46, 47, 48, 55, 56, 57
We start our stem-and-leaf plot by putting these numbers into a table:
Now, for each stem we write down its leaves; that is, the digits that showed up in the ones place with that stem. For the stem 3, the leaves are 2, 4, 5, 6, and 7:
32, 34, 35, 36, 37, 45, 46, 47, 48, 55, 56, 57.
We put these into the table by the appropriate stem:
Similarly, we find the leaves for the stem 4:
32, 34, 35, 36, 37, 45, 46, 47, 48, 55, 56, 57
And enter them in the table:
Finally, we find the leaves for the stem 5:
32, 34, 35, 36, 37, 45, 46, 47, 48, 55, 56, 57
And enter those in the table:
The only thing left is to write down a note at the bottom (formally called a key) that explains how we use the stem and the leaf to "grow" a number. This note is not the appropriate place for you to write something like "Mark likes Mandy" or "I love your shoes." Let's keep it professional, people.
key: 3|2 = 32
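If you'd rather let a computer do the gardening, here is a minimal Python sketch of the steps above. The function name stem_and_leaf is our own invention, and the sketch assumes every number is a two-digit whole number, so the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit whole numbers by tens digit (stem) and ones digit (leaf)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves in order
        stem, leaf = divmod(v, 10)    # 32 -> stem 3, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

data = [32, 34, 35, 36, 37, 45, 46, 47, 48, 55, 56, 57]
plot = stem_and_leaf(data)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
print("key: 3|2 = 32")
```

Each printed row is one stem followed by its leaves, which is the same table we built by hand.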
Our data could also consist of numbers with decimal points. Oh joy, right?
Build a stem-and-leaf plot for the following list:
0.4, 1.2, 3.4, 3.6, 3.7, 0.5, 2.3
Very first thing, let's put the list in order:
0.4, 0.5, 1.2, 2.3, 3.4, 3.6, 3.7
Now we find the stems:
0.4, 0.5, 1.2, 2.3, 3.4, 3.6, 3.7
We use the stems to start our stem-and-leaf plot:
Now we find the leaves for each stem. First we find the leaves for 0:
0.4, 0.5, 1.2, 2.3, 3.4, 3.6, 3.7
And we put them in the plot:
We do the same for the other stems, filling in the rest of our table-like thing:
Finally, we make a key that explains how the numbers grow from a stem and a leaf:
key: 0|4 = 0.4
Be careful: Remember to make a key for your stem and leaf plot. Otherwise, we won't know if 1|5 means 1.5 or 15. This distinction makes a huge difference, especially if we're cooking. We suspect our dough does not require 15 cups of flour.
One other fine point: in the table, it's nice to write the leaves in numerical order. If we start by ordering the list of numbers, we don't need to think too hard about it. Then our mind can be free to wander and think about other things, like whether or not we left the air conditioning running. Shoot!
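The decimal example can be sketched the same way in Python. This is only an illustration under one assumption: every number has exactly one digit after the decimal point, so the whole-number part is the stem and the tenths digit is the leaf. The name stem_and_leaf_tenths is made up for this sketch.

```python
def stem_and_leaf_tenths(values):
    """Stem = whole-number part, leaf = tenths digit (one decimal place assumed)."""
    plot = {}
    for v in sorted(values):                      # order the list first
        tenths = round(v * 10)                    # 3.4 -> 34, avoids float fuzz
        stem, leaf = divmod(tenths, 10)           # 34 -> stem 3, leaf 4
        plot.setdefault(stem, []).append(leaf)
    return plot

data = [0.4, 1.2, 3.4, 3.6, 3.7, 0.5, 2.3]
plot = stem_and_leaf_tenths(data)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("key: 0|4 = 0.4")   # without the key, 0|4 could just as well mean 4
```

Because the list is sorted before anything else happens, the stems and the leaves both come out in numerical order without any extra effort.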
There's a fun, interactive stem-and-leaf tutorial here. If you ever get bored of it, head back on over here and finish reading this unit. In the meantime, we'll miss you.
Remember the difference between discrete and continuous data: discrete data has clear separation between the different possible values, while continuous data doesn't. We use bar graphs for displaying discrete data, and histograms for displaying continuous data. A bar graph has nothing to do with pubs, and a histogram is not someone you hire to show up at your friend's front door and sing to them about the French Revolution.
We'll do bar graphs first.
The colors of students' backpacks are recorded as follows:
red, green, red, blue, black, blue, blue, blue, red, blue, blue, black
Draw a bar graph for this data.
The values observed were red, green, blue, and black. If we had bins marked "red," "green," "blue," and "black," we could sort the backpacks into the appropriate bins. After doing this, how many backpacks would be in each bin?
Looks like most of these kids are singin' the blues.
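Sorting things into bins and counting them is exactly what Python's collections.Counter does, so here is a small sketch of the tally for the backpack list. The variable names are ours.

```python
from collections import Counter

backpacks = ["red", "green", "red", "blue", "black", "blue",
             "blue", "blue", "red", "blue", "blue", "black"]

# One "bin" per color; the value is how many backpacks landed in it
counts = Counter(backpacks)

for color, n in counts.items():
    print(f"{color}: {n}")

# The fullest bin
print(counts.most_common(1))
```

The counts are the bar heights we'll need for the graph.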
A bar graph is a visual way of displaying what the data looks like after it's been sorted. We start with axes. The graph kind, not the "I want to be a lumberjack" kind. The horizontal axis has the names of the bins (in this case, the colors), and the vertical axis is labelled "Quantity":
Above each bin (color) we put a bar whose height is the number of things in that bin (backpacks of that color):
Since we're super fancy, we'll also color the bars different colors.
The advantage of doing this is that you won't wear your black crayon down to its nub and be left with a box full of unused colors. Time to let "chartreuse" see the light of day. If you're not using crayons, please disregard.
The bar graph shows what things would look like if we stacked up the backpacks in their respective bins. You'll need to imagine the zippers and pockets though. We won't be drawing those in the graph.
Check this out for a silly—yet oddly entertaining—way to comprehend bar graphs.
A fruit basket contains four kiwis, eight apples, six bananas, and one dragonfruit.
Draw a bar graph for the kinds of fruit in the basket. If the dragonfruit gets angry, just feed it one of the kiwis. They love New Zealanders.
If we sort the fruit into bins, we'll need bins for kiwis, apples, bananas, and dragonfruit, so the bar graph axes will look like this:
Sorting the fruit into the appropriate bins gives us this bar graph:
A bar graph makes it easy to see at a glance which bin has the most objects and which bin has the fewest objects. By the way, don't bother going to your local Target and trying to get more bins. We've cleaned them out.
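For the crayon-free among us, here is a rough Python sketch that prints the fruit basket as a text bar graph. It's turned on its side (bins run down the page, bars run to the right) only because that's easier to print. The helper name text_bar_graph is hypothetical.

```python
fruit = {"kiwis": 4, "apples": 8, "bananas": 6, "dragonfruit": 1}

def text_bar_graph(counts):
    """Return one line per bin: the bin's name, then a bar of '#' marks."""
    width = max(len(name) for name in counts)   # pad names so the bars line up
    return [f"{name.ljust(width)} | {'#' * n} {n}" for name, n in counts.items()]

for line in text_bar_graph(fruit):
    print(line)

# Which bin has the most objects, and which has the fewest?
fullest = max(fruit, key=fruit.get)
emptiest = min(fruit, key=fruit.get)
print("most:", fullest, "fewest:", emptiest)
```

The length of each bar is the number of pieces of fruit in that bin, just like the height of each bar in the drawn graph.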
A survey was conducted of 79 kids at a local swimming pool to find their favorite hot-weather refreshment. Use the bar graph below to answer the questions.
Now for histograms. Histograms are used for displaying data where the separation between "bins" is not so clear, and we need to make decisions about what bins we'd like to use. We hate making decisions though, which is why we're going to keep our Magic 8-Ball within reach, just in case.
A bunch of fish were caught in a lake. Don't worry, we'll throw them back in after we're done with this example. The lengths of the fish, in inches, were
6.2, 6.2, 6.55, 7, 7.4, 8.5, 8.6, 9, 9.1, 9.2, 9.25, 9.3, 10.4, 10.5, 10.6.
Make some histograms to display this data.
Let's make a histogram where the first bin contains all fish that are at least 6 but less than 7 inches, the next contains all fish that are at least 7 but less than 8 inches, and so on. That way, we won't have any long fish in with any short fish, and we don't need to worry about any of them getting inch envy. How many fish are in each bin?
To draw the histogram, we draw something that looks very much like a bar graph. The axes are labelled similarly, and the height of each bar corresponds to the number of fish in the bin. The bars are touching because there is no clear separation between different bins (there's not much difference between a fish 7 inches long and a fish 6.999 inches long, although one of them does get to go on a few more amusement park rides).
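Counting the fish in each 1-inch bin can be sketched in a few lines of Python. With bins that start on whole inches, rounding each length down tells us which bin it belongs to. This sketch assumes a fish sitting exactly on a boundary goes in the bin to its right.

```python
import math
from collections import Counter

lengths = [6.2, 6.2, 6.55, 7, 7.4, 8.5, 8.6, 9,
           9.1, 9.2, 9.25, 9.3, 10.4, 10.5, 10.6]

# Round each length down to find the whole inch where its bin starts
bins = Counter(math.floor(x) for x in lengths)

for start in range(6, 11):
    n = bins[start]
    print(f"{start} to {start + 1} inches: {'#' * n} ({n})")
```

The number in parentheses is the height of the bar over that bin.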
The bins we picked for the histogram were arbitrary. We could just as well draw a histogram where the bins held fish that were 6 to 6.5 inches, 6.5 to 7 inches, and so on. We just don't like to use half-inches if we can help it, because we don't want to encourage decimal points. They already insert themselves far too often.
We use the word interval to refer to the "size" of the bins in a histogram. In the first histogram we drew for the fish lengths, the bins were 1 inch (6 to 7 inches, 7 to 8 inches, etc.). In the second histogram the bins were 1/2 inch.
As the interval becomes smaller, the histogram gives a closer, more precise picture of what the data looks like. In the histogram with interval 1 inch, we could see that there were 5 fish between 9 and 10 inches. In the histogram with interval 1/2 inch, we see that those 5 fish are actually between 9 and 9.5 inches. The smaller the interval we use for the histogram, the more we know about the data from the picture. If we were to use a large enough interval, we could throw all the fish into a single bin and let someone else figure it out. Wouldn't make our graph particularly interesting or tell us very much about the fish, though.
There's nothing particularly special about intervals of 1 or 1/2. We could use other intervals just as well.
Remember: when putting data into the bins, if a number is on the boundary between two bins, we put it into the bin on the right. If the bins are 6 to 7, and 7 to 8, then the number 7 goes in the bin from 7 to 8. It can laugh all it wants at its 6.999 friend from there.
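To see the interval and the boundary rule working together, here is a hedged Python sketch. The function bin_counts and its parameters are invented for illustration, and it assumes every value falls somewhere inside the bins we ask for.

```python
import math

def bin_counts(values, start, interval, n_bins):
    """Count how many values land in each histogram bin.

    Bin i runs from start + i*interval up to, but not including,
    start + (i + 1)*interval, so a boundary value lands in the bin on its right.
    Assumes every value lies between start and start + n_bins*interval.
    """
    counts = [0] * n_bins
    for x in values:
        counts[math.floor((x - start) / interval)] += 1
    return counts

lengths = [6.2, 6.2, 6.55, 7, 7.4, 8.5, 8.6, 9,
           9.1, 9.2, 9.25, 9.3, 10.4, 10.5, 10.6]

one_inch = bin_counts(lengths, start=6, interval=1, n_bins=5)
half_inch = bin_counts(lengths, start=6, interval=0.5, n_bins=10)
print(one_inch)    # bins 6-7, 7-8, 8-9, 9-10, 10-11
print(half_inch)   # bins 6-6.5, 6.5-7, ..., 10.5-11
```

With the half-inch interval, all 5 fish from the 9-to-10 bin show up in the 9-to-9.5 bin, and the fish that is exactly 7 inches long sits in the bin that starts at 7. One caution: intervals like 0.1 aren't stored exactly as floating-point numbers, so boundary values could be misplaced. The intervals 1 and 0.5 used here don't have that problem.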
Now that we've built both bar graphs and histograms, let's revisit the differences between them. Some differences are more important than others. The fact that bar graphs drink Coke while histograms prefer Pepsi, for example, is irrelevant.
Pie charts, also known as circle graphs, are ways of displaying the proportions, or percentages, of data that fall into different categories. It makes sense that these graphs are useful for displaying categorical data.
To make a pie chart, we start with a circle and cut it into slices like a pizza. Yeah, it would probably make more sense if we went with a "pie" analogy, but where's the fun in being predictable?
If half the data (50%) falls into one category, its corresponding slice will be half (50%) of the pizza. If one-eighth of the data falls into one category, its corresponding slice will be one-eighth of the pizza, and so on. Our advice is not to invite too much data to your house when you're throwing a party, so that you can keep most of the pizza for yourself.
Thinking about backpacks helps us get to our happy place, so we'll do so once again. Suppose students had the following colors of backpacks:
red, blue, red, red, red, blue, green, green
There are 8 backpacks total. Half the backpacks are red, one quarter are blue, and one quarter are green. We can represent this by the following circle graph:
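Working out the slice sizes is just counting and dividing, as this small Python sketch shows. The angle column is our own addition: it uses the fact that a full circle is 360 degrees, in case you're slicing the pizza with a protractor.

```python
from collections import Counter
from fractions import Fraction

backpacks = ["red", "blue", "red", "red", "red", "blue", "green", "green"]
counts = Counter(backpacks)
total = len(backpacks)

# Each category's share of the data is its share of the circle
shares = {color: Fraction(n, total) for color, n in counts.items()}

for color, share in shares.items():
    percent = float(share) * 100
    degrees = float(share * 360)
    print(f"{color}: {share} of the pie = {percent:g}% = {degrees:g} degrees")
```

The shares always add up to one whole pizza, which is a handy check that no backpack was left out.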
In this graph, we colored each individual slice of the graph with the backpack color it represented. We could instead have a key that explains what each color means. It may not be necessary in this example, but sometimes you'll have super-duper thin slices, so it's a good idea to get some practice. If, on the other hand, one of your special skills is the ability to replicate classic works of art on the head of a pin, you may have no need for a key.
If you're making pie charts and you don't have different-colored writing utensils handy, instead of making the pizza slices different colors you can just label each slice with the thing it represents. The important thing is to be able to tell the slices apart. If that means drawing a different Black Eyed Pea inside each one, so be it. Whatever works for you. Just make sure Fergie has the biggest slice. You know she'll get all "diva" about it.
A box and whisker plot for a list of numbers is a picture built on a number line that uses five numbers: the lowest and highest values in the list, and the quartiles Q1, Q2, and Q3. Try not to shave for two days before attempting one of these plots, since we'll need your whiskers to be nice and pronounced.
The picture will usually look something like this: a box with a line down the center, and two "whiskers." Oh. That's where it comes from.
There are five labeled points in a box and whisker plot. These dudes correspond to the five numbers we mentioned above (remember, Q2 is the same thing as the median):
Having a scale is important so we can put our labels in the right places. It's also important for helping us determine whether this "fruit and nut diet" is working.
The best way to get used to these pictures is to go ahead and build some. It's doodle time.
Draw a box and whisker plot for the following data:
10, 11, 12, 15, 15, 16
This list is already in order, so we don't need to reorder it. The lowest value is 10, the highest value is 16, and the quartiles are
Q1 = 11
Q2 = 13.5
Q3 = 15.
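If you'd like to double-check those quartiles, here is a minimal Python sketch. It assumes the "median of each half" convention: Q2 is the median of the whole list, Q1 is the median of the lower half, and Q3 is the median of the upper half, with the middle number left out of both halves when the list has an odd length. That convention reproduces the numbers in this unit, but other software may use a different rule and give slightly different answers. The function name quartiles is ours.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    Assumes the list has at least two numbers.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]     # everything below the middle
    upper = data[-half:]    # everything above the middle
    return median(lower), median(data), median(upper)

data = [10, 11, 12, 15, 15, 16]
q1, q2, q3 = quartiles(data)
print("lowest:", min(data), "Q1:", q1, "Q2:", q2, "Q3:", q3, "highest:", max(data))
```

Those five numbers are everything we need to draw the plot.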
We'll start the box-and-whisker plot with a scale. The scale needs to go at least from 10 to 16 (since those are our lowest and highest data values), and we'll leave enough room for increments of 0.5 since one of our numbers (Q2) is 13.5. It's all about the halves and the half-nots.
Next we find the lowest and highest values on the number line, and draw dots above each. We also draw vertical lines for the three quartiles:
We connect the lines above the quartiles with a box:
And finally draw the whiskers to the outermost points:
Box and whisker plots give a visual idea of "where the data is." Half the numbers always fall within the box, and half the numbers fall outside the box.
Of the numbers within the box, half are to the left side of the dividing line, and half are to the right side. By dividing everyone up evenly like this into separate cells, there will be less chance of a revolt or a riot. You're just trying to keep some order around here.
Take a look at this box and whisker plot:
Half of the data is within the box, between the numbers 1 and 4. Half the data is outside the box, between 0 and 1 or between 4 and 6.
Looking within the box, one quarter of the total data falls between the numbers 1 and 1.5, and one quarter between 1.5 and 4. We have the same number of values squished into the space between 1 and 1.5 as we do in the space between 1.5 and 4. The latter values simply have a bit more room in which to stretch out and relax.
The length of the whiskers gives us a sense of how far away the lowest and highest values are from the rest of the data. In this picture, the lowest and highest values are very close to the rest of the data:
In this picture, the lowest and highest values are far from the rest of the data:
When the whiskers are super far out there, something doesn't seem quite right. A variation of a box plot adjusts for the presence of outliers and extreme values. In the real world, these would be equivalent to "rebels," "hippies," or "Harvard graduates." Outliers and extreme values are numbers that are far away from most of the other numbers. For example, if the scores on a test were
13, 72, 73, 85, 86, 87, 89
then the number 13 is really far away from the other scores. If only one student got a remarkably bad score on the test, maybe we shouldn't consider that score when deciding the letter grades for the other students. In fact, maybe we shouldn't acknowledge that that student exists at all. Do you hear a sound, class? Is that someone talking? Hm...guess somebody let a fly in...
To find the outliers and extreme values for the sake of the box and whisker plot, we first need to find the interquartile range (IQR). "Interquartile range" may sound like some fabricated piece of Star Trek vernacular ("Captain...we need a reading on our interquartile range!"), but it is actually a real thing. On the box and whisker picture, the interquartile range is the width of the box. In symbols, the interquartile range is
IQR = Q3 – Q1.
In this box and whisker plot, the interquartile range is 3 – 1 = 2.
Meanwhile, if the quartiles are
Q1 = 4
Q2 = 9
Q3 = 10
then the interquartile range is
IQR = Q3 – Q1
= 10 – 4
= 6.
For a box and whisker plot, we start at the edges of the box, go 1.5 times the width of the box in either direction, and draw the inner fences. These are the electrified ones that will keep out any unwanted, snooping values.
Numbers that are within the inner fences are considered "reasonable." The whiskers go to the farthest numbers that are within the fences.
Then we go out a distance of 1.5(IQR) again from the inner fences, and draw the outer fences. These are more for show and to make the neighbors jealous.
Numbers between the inner and outer fences are outliers, and numbers outside the outer fences are extreme values. Sometimes these guys are marked on a box and whisker plot, but sometimes they're not. Depends on who you ask.
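The fence-building rules can be sketched in Python like so. The function names fences and classify are hypothetical, and we've assumed that a number sitting exactly on a fence counts as being inside it, which the rules above don't spell out. The example reuses the quartiles Q1 = 4 and Q3 = 10 from earlier.

```python
def fences(q1, q3):
    """Return the inner and outer fences as (low, high) pairs."""
    step = 1.5 * (q3 - q1)                        # 1.5 times the IQR
    return {
        "inner": (q1 - step, q3 + step),          # one step out from the box
        "outer": (q1 - 2 * step, q3 + 2 * step),  # one more step out
    }

def classify(x, q1, q3):
    """Label a number as reasonable, an outlier, or an extreme value."""
    f = fences(q1, q3)
    inner_low, inner_high = f["inner"]
    outer_low, outer_high = f["outer"]
    if inner_low <= x <= inner_high:    # assumption: on the fence counts as inside
        return "reasonable"
    if outer_low <= x <= outer_high:
        return "outlier"
    return "extreme value"

print(fences(4, 10))
for x in (12, 20, 30):
    print(x, "is", classify(x, 4, 10))
```

With an IQR of 6, each step is 9 units, so the inner fences land at -5 and 19 and the outer fences at -14 and 28.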
In this plot, the inner fences are the red lines:
We're still trying to convince the mathematical community to rename outliers "ultrawhiskers," but it hasn't caught on yet. We'll keep you in the loop on that one.
Sometimes the poor box doesn't get all its whiskers. Even boxes aren't immune from alopecia. Let's draw a box and whisker plot for the numbers
13, 72, 73, 85, 86, 87, 89.
First we find the quartiles:
Q1 = 72
Q2 = 85
Q3 = 87
Then we nab the interquartile range:
IQR = 87 – 72 = 15
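Multiply that by 1.5 to get the distance from the box to the inner fences:
1.5(IQR) = 22.5
That puts the inner fences at 72 – 22.5 = 49.5 and 87 + 22.5 = 109.5, and the outer fences at 49.5 – 22.5 = 27 and 109.5 + 22.5 = 132.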
Now we can draw the box:
We can draw a whisker to 89. Since the smallest value within the inner fences is 72, which is already the left edge of the box, there's nothing to draw a whisker to on the left-hand side of the box. We draw a circle for the extreme value 13. Later, if you're bored, you can turn that circle into a pie chart. Beats sitting around doing nothing. Barely.
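Here is the same example as a rough Python sketch, mostly to show where the whiskers stop. The quartiles are typed in from the work above, the variable names are ours, and we again assume a number exactly on a fence counts as inside it.

```python
scores = [13, 72, 73, 85, 86, 87, 89]
q1, q3 = 72, 87                        # quartiles found above

step = 1.5 * (q3 - q1)                 # 1.5(IQR)
inner_low, inner_high = q1 - step, q3 + step
outer_low, outer_high = inner_low - step, inner_high + step

# Whiskers reach to the farthest numbers that are still inside the inner fences
inside = [x for x in scores if inner_low <= x <= inner_high]
left_whisker, right_whisker = min(inside), max(inside)

outliers = [x for x in scores
            if x not in inside and outer_low <= x <= outer_high]
extremes = [x for x in scores if x < outer_low or x > outer_high]

print("inner fences:", inner_low, inner_high)
print("outer fences:", outer_low, outer_high)
print("whiskers reach:", left_whisker, right_whisker)
print("outliers:", outliers, "extreme values:", extremes)
```

The left "whisker" ends at 72, the same spot as the left edge of the box, which is why that side of the box stays clean-shaven.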
Dr. Math gives us another example here. You can trust him. He's a doctor.
We say the fences are at distances of 1.5(IQR) from the box because Tukey told us to, and it's worked pretty well in practice.