High School: Statistics and Probability
Interpreting Categorical and Quantitative Data HSS-ID.A.2
2. Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets.
One of the main reasons for collecting data is so it can be compared to other data. Sounds like a dream come true, doesn't it?
Comparing data allows us to make big statements. After all, how can we be sure that using Shmoop increases test scores if we don't have two sets of data, pre-Shmoop and post-Shmoop, to compare?
Rather than comparing entire data sets, however, we can summarize the data and compare these summaries. That way, rather than comparing long and seemingly never-ending lists, we can compare two very basic factors that tell us a lot about the data: the center and spread of the data.
The center of the data is exactly what it sounds like: a representation of the middle of the data, or a typical value. It gives us a good first guess as to where on the number line the data will fall. Students should know the two measures of center: mean and median. The mean, or average, is the sum of all the data points divided by the number of data points, while the median is the middle value of the ordered data, splitting it into two halves that contain the same number of data points.
Students should know that the center of data can give us a good sense of the data set overall. For instance, we'll know that the heights of buildings are more closely represented by an average of 100 feet than by an average of 100,000 feet. Still, the center of data doesn't tell us the whole story. Let's say we have the following two sets of data:
Set 1: 4, 5, 6, 4, 6, 5
Set 2: 1, 9, 2, 8, 0, 10
Both of these data sets have an average of 5, but the first set only has values between 4 and 6, while the second has values between 0 and 10, a much wider range. This "wideness" or "breadth" of the data is called the spread of the data, and that's the second aspect students should consider when summarizing data.
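To make the comparison concrete, here is a minimal Python sketch that summarizes the two sets above. The helper name summarize is just an illustrative choice, and the range (largest value minus smallest) stands in as a quick first look at spread.

```python
from statistics import mean, median

set_1 = [4, 5, 6, 4, 6, 5]
set_2 = [1, 9, 2, 8, 0, 10]

def summarize(data):
    """Return (mean, median, range) for a list of numbers."""
    return mean(data), median(data), max(data) - min(data)

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    center_mean, center_median, value_range = summarize(data)
    print(f"{name}: mean={center_mean}, median={center_median}, range={value_range}")
```

Both sets come out with a mean of 5 and a median of 5, but the range is 2 for Set 1 and 10 for Set 2, so the center alone can't tell them apart.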
Students should know how to use the interquartile range and standard deviation to describe the spread of data. The interquartile range (IQR) is the width of the interval that spans the middle fifty percent of the data. To determine the IQR, the lower quartile (Q1, the median of the lower half of the data) and the upper quartile (Q3, the median of the upper half) need to be found first. Once that is done, IQR = Q3 – Q1.
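The IQR calculation can be sketched in Python as follows. This assumes the "median of each half" convention for quartiles that is common in classrooms. Other conventions exist, so calculators and software may report slightly different quartiles for the same data. The function names quartiles and iqr are illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the number of data points is odd, the middle value is left
    out of both halves.
    """
    if len(data) < 2:
        raise ValueError("need at least two data points")
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                   # smallest `half` values
    upper = ordered[len(ordered) - half:]    # largest `half` values
    return median(lower), median(upper)

def iqr(data):
    """Interquartile range: IQR = Q3 - Q1."""
    q1, q3 = quartiles(data)
    return q3 - q1

print(iqr([4, 5, 6, 4, 6, 5]))   # Set 1
print(iqr([1, 9, 2, 8, 0, 10]))  # Set 2
```

For Set 1 the quartiles are 4 and 6, giving an IQR of 2. For Set 2 they are 1 and 9, giving an IQR of 8.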
The standard deviation, denoted by σ, measures how far the data typically fall from the mean. Imagine you and a clone start at the mean and walk away from it in opposite directions. Once you have each traveled a distance of one standard deviation, about 68% of the data would be between you and your clone (in a normal distribution, anyway).
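The 68% figure can be checked with Python's standard library. This is only a sketch of the idea: it uses a standard normal distribution (mean 0, standard deviation 1) and measures the share of values that lie within one standard deviation of the mean on either side.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution between -1 and +1 standard deviations.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(f"{within_one_sd:.4f}")
```

The result is roughly 0.68, which is where the 68% comes from.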
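Students should know that the mean has the formula
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
and the standard deviation has the formula
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
where $x_1, \dots, x_N$ are the data points and $N$ is the number of data points.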
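As a worked illustration, the sketch below computes the standard deviation step by step. It assumes the population form, which divides the sum of squared differences from the mean by the number of data points. The function name population_sd is illustrative, and the standard library's pstdev is printed alongside it as a cross-check.

```python
from math import sqrt
from statistics import pstdev

def population_sd(data):
    """Square root of the average squared distance from the mean."""
    n = len(data)
    mu = sum(data) / n                                # the mean
    squared_diffs = [(x - mu) ** 2 for x in data]     # (x_i - mu)^2
    return sqrt(sum(squared_diffs) / n)

set_1 = [4, 5, 6, 4, 6, 5]
set_2 = [1, 9, 2, 8, 0, 10]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    print(f"{name}: by hand={population_sd(data):.2f}, pstdev={pstdev(data):.2f}")
```

Set 1 comes out at about 0.82 and Set 2 at about 4.08, so the standard deviation picks up the difference in spread that the shared mean of 5 hides.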
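Often, the mean and standard deviation are used together and the median and interquartile range are used together. Students should know that the mean and standard deviation are most frequently used when the distribution of data follows a bell curve (normal distribution), shown below. When the distribution is skewed or contains outliers, the median and interquartile range are usually the better choice, because a few extreme values affect them far less.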
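One way to see why the shape of the distribution matters is to change a single value in a small data set. The numbers below are hypothetical, made up purely for illustration: the two lists are identical except that the largest value in the second has been replaced with an outlier.

```python
from statistics import mean, median

# Hypothetical data: identical except for the last value.
symmetric = [30, 32, 34, 36, 38, 40, 42]
skewed = [30, 32, 34, 36, 38, 40, 250]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(f"{name}: mean={mean(data):.1f}, median={median(data)}")
```

The median is 36 for both lists, while the mean jumps from 36 to about 65.7 once the outlier is included. For the skewed list, the median is the more typical value.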
Students should understand that the larger the value of the IQR or standard deviation, the larger the spread of the data. If students are struggling with why this is so, show them mathematically using the formulas: a larger IQR means the quartiles are further apart, and a larger standard deviation means the data points sit further from the mean. Now, rather than comparing tables of dozens or even hundreds of numbers, we just need to compare two numbers for each data set: a center and a spread.
Here's a resource teachers can use to help explain the normal distribution curve.