# High School: Statistics and Probability

### Interpreting Categorical and Quantitative Data HSS-ID.A.1

1. Represent data with plots on the real number line (dot plots, histograms, and box plots).

Statistics is all about data. Collecting, then analyzing, then making guesses based on previously collected data, then comparing to see if the predictions were accurate. Then collecting more data and analyzing it some more. Evidently, a statistician's work is never done.

Fortunately (or unfortunately) for your students, statisticians have come up with many different ways to represent this data. That way, they don't have to look at and try to make sense of an endless and ever-growing table of numbers.

Students should be comfortable with representing data on the real number line in the forms of dot plots, histograms, and box plots. Quite obviously, that means they should know the difference between them.

A dot plot is a diagram that represents a data set using dots over the number line. A histogram is a diagram that shows a data set as a series of rectangles, each showing how often data occur within a given interval. A box plot, also called a box and whisker plot, is a diagram that shows a data set as a distribution along the number line, divided into four parts with equal numbers of data points using the median (the middle data value) and the lower and upper quartiles (the medians of the lower and upper halves of the data, respectively).

But why yammer on about these different plots when we can show you exactly what we mean? The following table shows how fast Michael Phelps, one of the world's greatest Olympic swimmers, can swim the 200-meter freestyle event (times in seconds, rounded to the nearest second).

 103 105 103 103 103 105 106 108 106 106 108 107

Students should know that to create a dot plot of Michael Phelps's 200-meter freestyle times, they should focus on the portion of the number line that covers the data points. Looking at the data given above, we need to include numbers from 100 to 110.

Now, all we need to do is place a dot on the appropriate number for each data point with that number. For instance, since only one of his times was 107 seconds, we place only one dot on the number line at 107. Since 108 seconds occurs twice in our data table, we place two points, one on top of the other, on the number line at 108. Eventually, our dot plot should look something like this.
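For students who like to check their work on a computer, the counting process can be sketched in a few lines of Python. This is only a rough illustration: the function name `dot_plot_lines` and the sideways layout (one line per number, with a `*` standing in for each dot) are our own choices, not part of the standard.

```python
from collections import Counter

# Michael Phelps's 200-meter freestyle times from the table above (seconds).
times = [103, 105, 103, 103, 103, 105, 106, 108, 106, 106, 108, 107]


def dot_plot_lines(values, low, high):
    """Return one text line per whole number from low to high, with one '*' per data point."""
    counts = Counter(values)  # maps each value to the number of times it occurs
    lines = []
    for number in range(low, high + 1):
        # Numbers that never occur get no dots; rstrip drops the trailing space.
        lines.append(f"{number} {'*' * counts[number]}".rstrip())
    return lines


for line in dot_plot_lines(times, 100, 110):
    print(line)
```

The line for 107 ends with a single `*` and the line for 108 ends with `**`, matching the one dot and two stacked dots described above.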

To create a histogram of the Michael Phelps data, students should create a chart with the time on the x-axis (horizontal axis) and count (or frequency) on the y-axis (vertical axis). Each rectangle is drawn as wide as its interval, with a height equal to the count for that interval. For example, drawing the rectangle for 103 seconds yields the following:

Now we can complete the histogram for the rest of the data.

One important feature of a histogram is that the rectangles don't care much for personal space. They're touching because they represent intervals rather than specific numbers. After all, time is continuous, right? For this reason, histograms are particularly useful for large ranges of data.
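Because the bars stand for intervals, the same data can be grouped in more than one way. The sketch below counts how many times fall into each interval; the helper name `histogram_counts`, its parameters, and the 2-second grouping in the second call are assumptions made for illustration only.

```python
# Michael Phelps's 200-meter freestyle times from the table above (seconds).
times = [103, 105, 103, 103, 103, 105, 106, 108, 106, 106, 108, 107]


def histogram_counts(values, start, width, bins):
    """Count the values in each interval [start + k*width, start + (k+1)*width)."""
    counts = [0] * bins
    for value in values:
        k = int((value - start) // width)  # index of the interval holding this value
        if 0 <= k < bins:  # values outside the covered range are ignored
            counts[k] += 1
    return counts


# One bar per second, from 103 through 108.
print(histogram_counts(times, start=103, width=1, bins=6))  # [4, 0, 2, 3, 1, 2]

# Wider, 2-second intervals starting at 102 (an illustrative choice).
print(histogram_counts(times, start=102, width=2, bins=4))  # [4, 2, 4, 2]
```

Either way, the bar heights add up to 12, the number of data points.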

One last way students can visually represent Michael Phelps's 200-meter freestyle times is using a box plot. This type of plot divides the data into four equal parts using quartiles (values that divide the data set into groups with equal numbers of data points). In the case of the given data, 12 data points are provided, so each quartile will contain 3 data points. To find the quartiles, it's best to first sort the given data from smallest to largest. In the case of the data we have been working with, this yields:

 103 103 103 103 105 105 106 106 106 107 108 108

Now, students should find the values for each of the three quartiles. When there is an even number of data points, the median is calculated as the average of the 2 middlemost numbers. For the above data, those are the 6th and 7th values (105 and 106), which yields 105.5.

To determine the lower quartile, we need to find the value that has 9 values above and 3 values below. In this case, the value will be 103. Similarly, the upper quartile is 106.5. Again, these values are determined by taking the average of the 3rd and 4th values (for the lower quartile) and the 9th and 10th values (for the upper quartile).
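The quartile arithmetic can be double-checked with a short Python sketch. It uses the "median of each half" approach described above; the function name `quartiles` is our own, and leaving the middle value out of both halves when the count is odd is one common convention that we are assuming here, since the example in the text only covers an even count.

```python
from statistics import median

# Michael Phelps's 200-meter freestyle times from the table above (seconds).
times = [103, 105, 103, 103, 103, 105, 106, 108, 106, 106, 108, 107]


def quartiles(values):
    """Return (lower quartile, median, upper quartile). Assumes at least 2 values."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]               # smallest half of the data
    upper_half = data[len(data) - half:]   # largest half of the data
    # With an odd count, the middle value is left out of both halves (assumed convention).
    return median(lower_half), median(data), median(upper_half)


print(quartiles(times))  # (103.0, 105.5, 106.5)
```

Other tools may use slightly different quartile rules, so a calculator or spreadsheet can report values that differ a little from these.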

To begin to draw the box and whisker diagram, draw the number line that covers the range of the values and draw a vertical line at the location of each quartile as shown:

If we connect these lines, we have our box.

The whisker part of the "box and whisker plot" comes in just after puberty. We're only joking. We can add in two more pieces of information: the minimum and maximum value. Draw one data point at the minimum value and another at the maximum value, and create a whisker from the nearest side of the box out to each data point.

Now we have a box (and whisker) plot. Where else can inanimate objects have whiskers, except in statistics?
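As a rough sketch, the finished plot can even be drawn with plain text. The function `draw_box_plot`, its parameters, and the characters it uses (`|` for the quartile lines, `=` for the box, `-` for the whiskers) are all our own illustrative choices, and the sketch assumes every value lands on a multiple of `step` along the number line.

```python
def draw_box_plot(minimum, q1, med, q3, maximum, start, end, step=0.5):
    """Return a one-line text box plot over the number line from start to end."""

    def column(value):
        # Position of a value along the line, counted in steps from the start.
        return round((value - start) / step)

    row = [" "] * (column(end) + 1)
    for i in range(column(minimum), column(q1)):
        row[i] = "-"  # left whisker, from the minimum up to the box
    for i in range(column(q3) + 1, column(maximum) + 1):
        row[i] = "-"  # right whisker, from the box out to the maximum
    for i in range(column(q1), column(q3) + 1):
        row[i] = "="  # the box itself
    for value in (q1, med, q3):
        row[column(value)] = "|"  # vertical lines at the three quartiles
    return "".join(row)


# Minimum, quartiles, and maximum for the Michael Phelps data above.
print(draw_box_plot(103, 103, 105.5, 106.5, 108, start=100, end=110))
```

For this data the minimum and the lower quartile are both 103, so the left whisker has no length and only the right whisker shows up.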

#### Drills

1. The following data represent the heights, in inches, of the players on the Golden State Warriors basketball team. What is the median height of the Warriors?

79

The median height of the Warriors is 79 inches (which is 6'7"!). Now that's a basketball team. To find this value, sort the heights from smallest to largest: 69, 75, 75, 75, 78, 78, 79, 79, 81, 81, 82, 83, 84, 84, 84. There are 15 data points, so the median is the value that has 7 values less than it and 7 greater than it. The eighth value is 79, which is the median.

2. The heights, in inches, of the pitchers for the 2012 San Francisco Giants are listed below. What is the median height of this pitching roster?

73.5

The median height of the 2012 Giants pitching staff is just over 6 feet at 73.5 inches. Not quite as tall as the basketball team. Sorted from smallest to largest, the data is: 70, 71, 71, 72, 72, 73, 74, 74, 75, 76, 76, 77. There are 12 data points, so the median is the value that has 6 values less than it and 6 values greater than it. Because there are an even number of data points, the sixth and seventh value from the sorted list (73 and 74) need to be averaged to determine the median. This leads to 73.5 inches.
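The two median drills above follow the same recipe, with one extra step when the count is even. Here is a minimal Python sketch of that recipe using the sorted lists from the explanations; the function name `median_of` is our own.

```python
def median_of(values):
    """Return the middle value, or the average of the two middle values."""
    data = sorted(values)
    n = len(data)
    middle = n // 2
    if n % 2 == 1:
        return data[middle]  # odd count: a single middle value
    return (data[middle - 1] + data[middle]) / 2  # even count: average the middle pair


warriors = [69, 75, 75, 75, 78, 78, 79, 79, 81, 81, 82, 83, 84, 84, 84]
giants_pitchers = [70, 71, 71, 72, 72, 73, 74, 74, 75, 76, 76, 77]

print(median_of(warriors))         # 79
print(median_of(giants_pitchers))  # 73.5
```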

3. The following represents test scores from a recent geometry quiz. What is the lower quartile for the data?

79.5

The lower quartile is the score that divides the class: 25% did worse than that score and 75% did better. Since there are 12 student scores listed, we need to find the value that has 9 values greater than it and 3 values less than it. To get the lower quartile, we average the third and fourth values in the sorted list. So we take the average of 77 and 82 which yields 79.5.

4. The following represents test scores from a recent geometry quiz. What is the upper quartile for the data?

90.5

The upper quartile represents the all-stars for this quiz, with the top 25% of the scores. There will only be 3 students in the upper quartile, and if you're on Shmoop, you're most likely one of them! To get the upper quartile, we average the ninth and tenth values in the sorted list. So we take the average of 90 and 91 which yields 90.5.

5. The following lists the salaries, in millions of dollars, for the ten highest-paid CEOs in the United States.

Which of the following is the box plot for this data set?

With Shmoop by your side, maybe you'll be a top earning CEO someday. Until then, you'll just have to study the data of their salaries. For this data set, the median salary is \$66, the lower quartile is \$62.5 and the upper quartile is \$79.5 so the correct answer is (A). In (B), the median is incorrectly set at \$75 (don't be so greedy, CEOs). In (C), the maximum value is incorrectly displayed as \$82. In (D), the lower quartile is set at the minimum value of \$56 instead of the correct value.

6. Twenty of your classmates were asked to keep track of the number of hours of TV they watched for a week. After the week was up, the following data was collected.

 10 7 8 11 7 12 7 14 8 13 7 8 6 11 12 10 9 11 11 12

Which of the following is the proper histogram for this set of data?

If we have a total of 20 data points, we should have a total of 20 when the bars of the histogram are added up. The bars in (A) are too low, and the bars in (B) are too high. The only accurate representation of the data is in (D), which distributes the data points accordingly. All the others are incorrect.
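One way to check a candidate histogram is to tally the data first. The sketch below counts how often each number of hours appears; it assumes one bar per whole hour, which may not match the intervals used in the answer choices, and the function name `bar_heights` is our own.

```python
from collections import Counter

# Hours of TV watched in a week by the 20 classmates.
hours = [10, 7, 8, 11, 7, 12, 7, 14, 8, 13, 7, 8, 6, 11, 12, 10, 9, 11, 11, 12]


def bar_heights(values):
    """Map each whole number from the smallest to the largest value to its count."""
    counts = Counter(values)
    return {h: counts[h] for h in range(min(values), max(values) + 1)}


heights = bar_heights(hours)
print(heights)
print(sum(heights.values()))  # 20
```

If the bars of a histogram don't add up to 20, it can't be the right one.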

7. Given the following box plot, what are the median, lower, and upper quartiles?

14, 12, and 16

Based on the box plot, the median is at 14, the lower quartile is 12 and the upper quartile is 16. In (A), the lower quartile is incorrect, in (B), the median is incorrect and in (D), the lower quartile is incorrectly listed as the minimum value.

8. Which of the following is the dot plot for the data: 8, 7, 6, 10, 5, 6, 6, 6, 8, 8?

There is one 5, four 6's, one 7, three 8's, and one 10 in the data, and (A) is the only dot plot that matches this distribution. Among other errors, the other dot plots all have a point at 9 even though there is no 9 in the data set.

9. Given the following dot plot, what is the median of the data?

7

How do we even go about this question? Aren't medians for box plots, not dot plots? Remember, dot plots are just a representation of a data set. If we can extract the data, we can find the median. First, let's count the number of data points, or the number of dots on this plot. In this case, there are 10 dots representing the 10 data points. So, we need to find the value that splits the data into two equal parts. Each part will have 5 data points. If we start counting from the first dot, the fifth and sixth occur at 7, so the median is 7.

10. If a data set has 20 values, how many values are in the lower quartile?

5