# Probability and Statistics

### Topics

Remember the difference between discrete and continuous data: discrete data has clear separation between the different possible values, while continuous data doesn't. We use bar graphs for displaying discrete data, and histograms for displaying continuous data. A bar graph has nothing to do with pubs, and a histogram is not someone you hire to show up at your friend's front door and sing to them about the French Revolution.

We'll do bar graphs first.

### Sample Problem

The colors of students' backpacks are recorded as follows:

red, green, red, blue, black, blue, blue, blue, red, blue, blue, black.

Draw a bar graph for this data.

The values observed were red, green, blue, and black. If we had bins marked "red," "green," "blue," and "black," we could sort the backpacks into the appropriate bins. After doing this, how many backpacks would be in each bin?
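If you'd rather let a computer do the sorting, here is a minimal Python sketch of the same idea. The variable names (`backpacks`, `bin_counts`) are our own, made up for illustration; the data is the list from the problem.

```python
from collections import Counter

# Backpack colors exactly as recorded in the problem.
backpacks = [
    "red", "green", "red", "blue", "black", "blue",
    "blue", "blue", "red", "blue", "blue", "black",
]

# Counter drops each backpack into the bin named after its color
# and keeps track of how many land in each bin.
bin_counts = Counter(backpacks)

for color in ["red", "green", "blue", "black"]:
    print(f"{color}: {bin_counts[color]}")
```

This prints 3 for red, 1 for green, 6 for blue, and 2 for black, which accounts for all 12 backpacks.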

Looks like most of these kids are singin' the blues.

A bar graph is a visual way of displaying what the data looks like after it's been sorted. We start with axes. The graph kind, not the "I want to be a lumberjack" kind. The horizontal axis has the names of the bins (in this case, the colors), and the vertical axis is labelled "Quantity":

Above each bin (color) we put a bar whose **height** is the **number** of things in that bin (backpacks of that color):

Since we're super fancy, we've also colored the bars different colors.

The advantage of doing this is that you won't wear your black crayon down to its nub and be left with a box full of unused colors. Time to let "chartreuse" see the light of day. If you're not using crayons, please disregard.

The bar graph shows what things would look like if we stacked up the backpacks in their respective bins. You'll need to imagine the zippers and pockets though. We won't be drawing those in the graph.


### Sample Problem

A fruit basket contains four kiwis, eight apples, six bananas, and one dragonfruit.

Draw a bar graph for the kinds of fruit in the basket. If the dragonfruit gets angry, just feed it one of the kiwis. They love New Zealanders.

If we sort the fruit into bins, we'll need bins for kiwis, apples, bananas, and dragonfruit, so the bar graph axes will look like this:

Sorting the fruit into the appropriate bins gives us this bar graph:
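For a rough picture without any crayons, the sketch below draws the bars as rows of `#` characters. It's only an illustration: the function name `text_bar_graph` is hypothetical, and the bars run sideways (length instead of height), but each bar still has one mark per piece of fruit in its bin.

```python
# Fruit counts taken straight from the problem statement.
fruit_counts = {"kiwis": 4, "apples": 8, "bananas": 6, "dragonfruit": 1}


def text_bar_graph(counts):
    """Return one line per bin, with one '#' per item in that bin."""
    # Pad the bin names so all the bars start in the same column.
    label_width = max(len(name) for name in counts)
    return [
        f"{name.ljust(label_width)} | {'#' * quantity} ({quantity})"
        for name, quantity in counts.items()
    ]


for line in text_bar_graph(fruit_counts):
    print(line)
```

The apples bar comes out longest (8 marks) and the dragonfruit bar shortest (1 mark), which is exactly the at-a-glance comparison a bar graph is for.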

Here's a slightly different sort of bar graph, where we compare a small number of observations. We're still making bins and putting things in the bins. Don't bother going to your local Target and trying to get more bins. We've cleaned them out.

A bar graph makes it easy to see at a glance which bin has the most objects and which bin has the fewest objects. By the way, you'd better have some large bins on hand if you're planning to put entire companies into them. We're talking industrial size.

### Sample Problem

A survey was conducted of 79 kids at a local swimming pool to find their favorite hot-weather refreshment. Use the bar graph below to answer the questions.

- How many kids said ice cream was their favorite hot-weather refreshment?

- Which refreshment was the favorite of exactly 20 kids?

- Which was more popular, popsicles or ice water?

- How many kids said hot coffee was their favorite hot-weather refreshment?

Answers:

- 40, since that's the height of the bar in the "ice cream" bin.

- Soda, since there are 20 kids in the "soda" bin.

- Popsicles, since the bar above "popsicles" is slightly taller than the bar above "ice water."

- Only one, but we felt like he was probably messing with us, so we threw out that result.

Now for histograms. Histograms are used for displaying data where the separation between "bins" is not so clear, and we need to make decisions about what bins we'd like to use. We hate making decisions though, which is why we're going to keep our Magic 8-Ball within reach, just in case.

### Sample Problem

A bunch of fish were caught in a lake. Don't worry, we'll throw them back in after we're done with this example. The lengths of the fish, in inches, were

6.2, 6.2, 6.55, 7, 7.4, 8.5, 8.6, 9, 9.1, 9.2, 9.25, 9.3, 10.4, 10.5, 10.6.

Make some histograms to display this data.

Let's make a histogram where the first bin contains all fish that are at least 6 but less than 7 inches, the next contains all fish that are at least 7 but less than 8 inches, and so on. That way, we won't have any long fish in with any short fish, and we don't need to worry about any of them getting inch envy. How many fish are in each bin?
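One way to do the counting, sketched in Python: since each bin starts at a whole number of inches, rounding a length down tells us which bin the fish belongs in. The names `fish_lengths` and `bin_starts` are ours, chosen for illustration.

```python
import math
from collections import Counter

# Fish lengths in inches, as listed in the problem.
fish_lengths = [6.2, 6.2, 6.55, 7, 7.4, 8.5, 8.6, 9,
                9.1, 9.2, 9.25, 9.3, 10.4, 10.5, 10.6]

# math.floor(6.55) is 6, so each fish is labelled with the whole inch
# at the bottom of its bin. A fish of exactly 7 inches gets the label 7.
bin_starts = Counter(math.floor(length) for length in fish_lengths)

for start in range(6, 11):
    print(f"{start} to {start + 1} inches: {bin_starts[start]} fish")
```

The counts come out to 3, 2, 2, 5, and 3 fish for the bins starting at 6, 7, 8, 9, and 10 inches, for 15 fish in all.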

To draw the histogram, we draw something that looks very much like a bar graph. The axes are labelled similarly, and the height of each bar corresponds to the number of fish in the bin. The bars are touching because there is no clear separation between different bins (there's not much difference between a fish 7 inches long and a fish 6.999 inches long, although one of them *does* get to go on a few more amusement park rides).

The bins we picked for the histogram were arbitrary. We could just as well draw a histogram where the bins held fish that were 6 to 6.5 inches, 6.5 to 7 inches, and so on. We just don't like to use half-inches if we can help it, because we don't want to encourage decimal points. They already insert themselves far too often.

We use the word **interval** to refer to the "size" of the bins in a histogram. In the first histogram we drew for the fish lengths, the interval was 1 inch (6 to 7 inches, 7 to 8 inches, etc.). In the second histogram the interval was ½ inch.

As the interval becomes smaller, the histogram gives a closer, more precise picture of what the data looks like. In the histogram with interval 1 inch, we could see that there were 5 fish between 9 and 10 inches. In the histogram with interval ½ inch, we see that those 5 fish are actually between 9 and 9.5 inches. The smaller the interval we use for the histogram, the more we know about the data from the picture. If we were to use a large enough interval, we could throw all the fish into a single bin and let someone else figure it out. Wouldn't make our graph particularly interesting or tell us very much about the fish though.
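To compare the two intervals side by side, here is a small sketch. The helper `histogram_counts` and its parameters are hypothetical names of our own, and it assumes every length falls between `start` and `stop`.

```python
import math

# Fish lengths in inches, as listed in the problem.
fish_lengths = [6.2, 6.2, 6.55, 7, 7.4, 8.5, 8.6, 9,
                9.1, 9.2, 9.25, 9.3, 10.4, 10.5, 10.6]


def histogram_counts(lengths, start, interval, stop):
    """Count the lengths in each bin of the given interval from start to stop."""
    num_bins = round((stop - start) / interval)
    counts = [0] * num_bins
    for length in lengths:
        # Bin i runs from start + i * interval up to, but not including,
        # the next boundary.
        index = math.floor((length - start) / interval)
        counts[index] += 1
    return counts


one_inch = histogram_counts(fish_lengths, 6, 1, 11)
half_inch = histogram_counts(fish_lengths, 6, 0.5, 11)

print(one_inch)   # [3, 2, 2, 5, 3]
print(half_inch)  # [2, 1, 2, 0, 0, 2, 5, 0, 1, 2]
```

In the second list, the bin from 9 to 9.5 inches holds all 5 fish and the bin from 9.5 to 10 inches holds none, which is the extra detail the smaller interval buys us.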

There's nothing particularly special about intervals of 1 inch or ½ inch. We could use other intervals just as well.

**Remember:** when putting data into the bins, if a number is on the boundary between two bins, we put it into the bin on the right (the higher one). If the bins are 6 to 7, and 7 to 8, then the number 7 goes in the bin from 7 to 8. It can laugh all it wants at its 6.999 friend from there.
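The boundary rule can be written down as a tiny sketch. The function `bin_for` and its default values are assumptions of ours, set up to match the 1-inch fish bins starting at 6 inches.

```python
import math


def bin_for(value, start=6, interval=1):
    """Return the (low, high) edges of the bin holding value.

    Rounding down means a value sitting exactly on a boundary
    lands in the higher bin, the one to its right.
    """
    index = math.floor((value - start) / interval)
    low = start + index * interval
    return (low, low + interval)


print(bin_for(7))      # (7, 8)
print(bin_for(6.999))  # (6, 7)
```

Rounding down is what does the work here: 7 sits exactly on a boundary, so it starts a new bin, while 6.999 falls just short and stays in the 6 to 7 bin.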

Now that we've built both bar graphs and histograms, let's revisit the differences between them. Some differences are more important than others. The fact that bar graphs drink Coke while histograms prefer Pepsi, for example, is irrelevant.

Bar graphs

- are drawn to display discrete data.

- have "bins" that can be figured out easily by looking at the data (colors, or genres, or dollars).

- usually have white space between the bars, but the bars can be touching (not super important).

- can be used to compare a small number of values (allowance per person, or income per company).

Histograms

- are drawn to display continuous data.

- can have whatever "bins" you want, depending on how detailed you want the histogram to be.

- have their bars touching; might be a good idea to use bar sanitizer to prevent the spread of germs.