# High School: Statistics and Probability

### Making Inferences and Justifying Conclusions HSS-IC.A.2

2. Decide if a specified model is consistent with results from a given data-generating process, e.g., using simulation. For example, a model says a spinning coin falls heads up with probability 0.5. Would a result of 5 tails in a row cause you to question this model?

Students should understand that sometimes, weird things happen in statistics. Not spooky weird, but an almost unnatural weird. Like flipping a coin ten times and ending up with ten heads. Odd.

Long ago, statisticians would seek the help of mental health professionals to figure out if they were crazy when these strange events kept happening. The mental health industry has been devastated ever since statisticians came up with standardized mathematical ways that help explain these phenomena. (The correlation between statisticians and the mental health industry has yet to be statistically confirmed.)

Students shouldn't freak out when things don't go exactly according to plan. There are ways to test whether results fit nicely into a statistical model or not. In statistics, the options for numerical tests are as numerous and appealing as germs on a hotel room comforter. By the way, good luck sleeping on your next vacation.
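Simulation is one of those ways, and it's the one the standard itself mentions. Here's a minimal Python sketch of the coin question: it assumes the model is right (tails with probability 0.5) and estimates how often 5 flips in a row all land tails. The function name, the seed, and the number of trials are our own choices for illustration.

```python
import random

def estimate_streak_probability(flips, trials, seed=0):
    """Estimate the chance that `flips` fair-coin flips all land tails."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    streaks = 0
    for _ in range(trials):
        # rng.random() < 0.5 plays the role of "tails" under the model.
        if all(rng.random() < 0.5 for _ in range(flips)):
            streaks += 1
    return streaks / trials

exact = 0.5 ** 5  # 1/32 under the model
estimate = estimate_streak_probability(flips=5, trials=100_000)
print(f"exact: {exact:.5f}, simulated: {estimate:.5f}")
```

Five tails in a row should show up roughly 3% of the time if the model is right. That's unusual, but not impossible. Ten heads in ten flips is rarer still, at 1 in 1024.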

Students should be familiar with Goodness of Fit tests (which aren't the same tests used in JC Penney changing rooms). Students should also know that these tests help measure whether or not a statistical model fits certain observations.

The Chi-Squared Goodness of Fit Test (called the Chi-Squared test, for short) assumes that any discrepancy within our data is caused by chance rather than by a faulty model. We can use the Chi-Squared test provided we have a large enough population, an appropriate random sample, and all that other good stuff that comes with proper statistical studies.

Students should know how to calculate the value of χ², where

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

In the formula, O is our observed frequency value and E is our expected frequency value. The sum runs over every category.
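As a rough sketch, the calculation takes only a few lines of Python. The function name `chi_squared` is ours, and the example reuses the ten-heads-in-ten-flips coin from earlier, where a fair coin would predict 5 heads and 5 tails.

```python
def chi_squared(observed, expected):
    """Sum (O - E)^2 / E over matching categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Ten flips, all heads: observed [heads, tails] vs. the 5/5 split a fair coin predicts.
print(chi_squared([10, 0], [5, 5]))  # prints 10.0
```

Each category adds its own (O – E)²/E term, so one badly predicted category can drive the whole total.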

Students also need to find the degrees of freedom, which equals the number of categories in our sample minus 1. So if we have 4 different kinds of fruit, that means we have 3 degrees of freedom. Simple enough.

As a side note, statisticians love tables. Not round tables or square tables or three-legged tables. We mean tables of values. Never-ending columns and rows of numbers upon numbers. Whatever floats their boat, right?

Students don't have to like these tables, but they should know how to use them. By that we mean comparing our χ² value to the number on the table that corresponds to our degrees of freedom and a significance level of p = 0.05. That number is called the critical value. If χ² is larger than the critical value, then our data doesn't quite match the model. If χ² is less than the critical value, the model works well enough.
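The table lookup can be sketched in Python too. This version assumes a tiny hard-coded table holding only the first five p = 0.05 critical values from a standard chi-squared table. The function names and the two sample χ² values at the bottom are made up for illustration.

```python
# Critical values for p = 0.05, copied from a standard chi-squared table.
# Only degrees of freedom 1 through 5 are included in this sketch.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def degrees_of_freedom(num_categories):
    """Number of categories minus one."""
    return num_categories - 1

def model_fits(chi_squared_value, num_categories):
    """True when chi-squared is below the p = 0.05 critical value."""
    df = degrees_of_freedom(num_categories)
    return chi_squared_value < CRITICAL_VALUES_05[df]

print(degrees_of_freedom(4))  # 4 kinds of fruit -> 3
print(model_fits(2.5, 4))     # hypothetical value below 7.815 -> True
print(model_fits(10.0, 2))    # hypothetical value above 3.841 -> False
```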

#### Drills

1. Different types of cell phone brands were sold at the local store over a one-month period. The data below represents how many phones were sold and how many were projected to be sold. Did the model predict the expected values accurately enough? Assume p = 0.05.

| Category | Observed | Expected |
| --- | --- | --- |
| iPhone | 198 | 213 |
| Motorola | 32 | 28 |
| Nokia | 15 | 10 |

Yes, the χ2 value was too low

To start off, the Chi-Squared test assumes there is no significant difference between observed and predicted frequencies in cell phone sales. But we need to calculate our χ² value to find out. If we do so, we end up with χ² = 4.13. We have 3 – 1 = 2 degrees of freedom and p = 0.05, so the critical value on the table is 5.991. Our χ² is lower than that, so we've narrowed our answer choices down to (B) or (D). The thing to remember is that a χ² value below the critical value means the model fits the data well enough.
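For anyone who wants to see where 4.13 comes from, here is the arithmetic written out, adding up (O – E)²/E for the three phone brands:

$$\chi^2 = \frac{(198-213)^2}{213} + \frac{(32-28)^2}{28} + \frac{(15-10)^2}{10} = \frac{225}{213} + \frac{16}{28} + \frac{25}{10} \approx 1.06 + 0.57 + 2.50 = 4.13$$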

2. A city crime commission has modeled the expected crime rates for certain crimes. The observed frequencies are detailed as well. What is the χ2 value?

| Category | Observed | Expected |
| --- | --- | --- |
| Robbery | 1283 | 1459 |
| Assault | 456 | 345 |
| Murder | 839 | 967 |
| Theft | 5683 | 5671 |

73.9

All we need to do is apply our formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

If we do that for each category, we end up with a total of 73.91254, which is closest to (C). That value doesn't mean much without another value to compare it to, but that's what the question asked for.
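To check the total, here is a short Python sketch that prints each category's share of it. The numbers come straight from the crime table, and the variable names are ours.

```python
# Observed and expected counts copied from the crime table in drill 2.
observed = {"Robbery": 1283, "Assault": 456, "Murder": 839, "Theft": 5683}
expected = {"Robbery": 1459, "Assault": 345, "Murder": 967, "Theft": 5671}

# Each category contributes (O - E)^2 / E to the total.
contributions = {
    category: (observed[category] - expected[category]) ** 2 / expected[category]
    for category in observed
}
chi_squared = sum(contributions.values())

for category, value in contributions.items():
    print(f"{category:8s} {value:8.3f}")
print(f"{'Total':8s} {chi_squared:8.3f}")
```

Assault contributes the most to the total, and Theft contributes almost nothing, since its observed and expected counts are only 12 apart out of more than 5,000.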

3. When the χ2 value is negative, what does this imply?

The calculation was performed incorrectly

It's impossible to have a negative χ² value. After all, it's squared for a reason. Usually, students forget to square their (O – E) values, which might lead to negatives, but it's not a big deal. Just redo the calculation.

4. An airport traffic model shows that 25% of airplanes arrive late, 25% of airplanes arrive early, and 50% of airplanes arrive on time. Out of 20 airplanes, 10 arrive late, 5 arrive early, and 5 arrive on time. Is this model valid? (Assume p = 0.05.)

No, the model is invalid

Out of 20, our expected values for late, early, and on-time planes are 5, 5, and 10, respectively. This means our χ² value comes out to 7.5. With 2 degrees of freedom and p = 0.05, our critical value is 5.991. Since our χ² value is greater than the critical value, the model is invalid and (A) is the right answer.
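Written out, the expected counts are 0.25 × 20 = 5 late, 0.25 × 20 = 5 early, and 0.50 × 20 = 10 on time, so the calculation looks like this:

$$\chi^2 = \frac{(10-5)^2}{5} + \frac{(5-5)^2}{5} + \frac{(5-10)^2}{10} = 5 + 0 + 2.5 = 7.5$$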

5. Your basketball team has decided to figure out shooting percentages of all of its players by creating a model. The study sorts the players into age categories: 14, 15, 16, 17, and 18. How many degrees of freedom are there?

4

Degrees of freedom are the number of categories minus one. Our categories are the 5 different ages. Since 5 – 1 = 4, our answer is (C).

6. Several quantitative financial analysts working for a giant bank have created a financial model for Apple's stock price. They predict the following prices over the next 12 months. The observed values are shown as well. Calculate the χ2 value and find out whether or not their model is valid. (Assume p = 0.05.)

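| Month | Observed Price | Expected Price |
| --- | --- | --- |
| Jan | 334 | 333 |
| Feb | 333 | 333 |
| March | 332 | 333 |
| April | 331 | 333 |
| May | 325 | 335 |
| June | 320 | 334 |
| July | 315 | 320 |
| Aug | 317 | 321 |
| Sept | 319 | 315 |
| Oct | 322 | 315 |
| Nov | 320 | 316 |
| Dec | 319 | 317 |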

The model is valid; there is no observable statistically significant difference

We have 12 – 1 = 11 degrees of freedom. After calculating the χ2 value, we get a number close to 1.3. With 11 degrees of freedom and a significance level of 0.05, our χ2 value is well below the critical value. This means the model is fairly accurate.
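With twelve rows, this one is easier to check by machine. The sketch below uses the prices from the table, listed from January through December. The critical value of 19.675 isn't given in the drill; it's the standard table entry for 11 degrees of freedom at p = 0.05.

```python
# Monthly prices from the drill 6 table, January through December.
observed = [334, 333, 332, 331, 325, 320, 315, 317, 319, 322, 320, 319]
expected = [333, 333, 333, 333, 335, 334, 320, 321, 315, 315, 316, 317]

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical_value = 19.675  # standard table entry for df = 11, p = 0.05

print(f"chi-squared = {chi_squared:.2f}, df = {df}")
print("model fits" if chi_squared < critical_value else "model rejected")
```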

7. Your teacher claims that if you were to come up to the front of the class and select one of ten numbers, 1 through 10, randomly, his model would predict the numbers you will choose with statistically accurate significance. In fact, he's so confident he's willing to lower the p value to 0.01. How good is his model of predicting your behavior?

| Observed Value | Expected Value |
| --- | --- |
| 1 | 4 |
| 3 | 1 |
| 5 | 7 |
| 2 | 2 |
| 7 | 6 |
| 1 | 4 |
| 10 | 1 |
| 4 | 10 |
| 8 | 4 |
| 4 | 3 |

His model doesn't work, since the values are too different to be due to chance alone

If we calculate our χ² value from the table, we end up with about 98.17. Our critical value is 21.66 (since we have 10 – 1 = 9 degrees of freedom and p = 0.01). Since our χ² value is greater than our critical value, your teacher's model doesn't work. He could learn a thing or two from you.

8. Identify the mistake in this χ2 value calculation.

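| O | E | (O – E)² | χ² |
| --- | --- | --- | --- |
| 35 | 35 | 0 | 0 |
| 235 | 235 | 0 | 0 |
| 7563 | 7561 | -2 | -0.00026 |
| 13423 | 13400 | -23 | -0.00172 |
| 2532 | 2500 | -32 | -0.0128 |
| 435 | 450 | 15 | 0.033333 |
| 54645 | 54000 | -645 | -0.01194 |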

The third column was not actually squared, despite what's written

If you did the calculations or if you simply noticed the negative signs, you'd realize that the third column was simply not squared, leading to the wrong result. All other answers are completely untrue. While answer (B) is partially right, the χ2 value can often be very small, which is not a cause for concern if all the calculations are performed correctly.
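For comparison, here is a sketch of what the last two columns look like when the squaring actually happens. The O and E values are copied from the table, and the variable names are ours.

```python
# O and E columns copied from the drill 8 table.
observed = [35, 235, 7563, 13423, 2532, 435, 54645]
expected = [35, 235, 7561, 13400, 2500, 450, 54000]

# Third column done properly: square the difference, so nothing is negative.
squared_diffs = [(o - e) ** 2 for o, e in zip(observed, expected)]
# Fourth column: divide each squared difference by its expected value.
contributions = [d / e for d, e in zip(squared_diffs, expected)]

print(squared_diffs)  # [0, 0, 4, 529, 1024, 225, 416025]
print(round(sum(contributions), 2))
```

Once every difference is squared, no row can come out negative, which is the quickest sanity check on a table like this.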

9. After several hours of isolation, 16 people are asked to estimate the time. Scientists designed a model that predicts how far off in minutes their guesses would be. The observed differences and the expected differences were assembled into the following table. Was the model accurate? (Assume p = 0.05.)

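| Observed | Expected |
| --- | --- |
| 1 | 5 |
| 2 | 1 |
| 5 | 6 |
| 2 | 2 |
| 8 | 8 |
| 1 | 3 |
| 4 | 1 |
| 10 | 7 |
| 16 | 2 |
| 4 | 4 |
| 32 | 6 |
| 6 | 1 |
| 2 | 7 |
| 4 | 1 |
| 7 | 5 |
| 2 | 2 |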

No

We have 16 – 1 = 15 degrees of freedom. A quick calculation shows that the χ² value is humongous (over 265) and far greater than the critical value on the table. Clearly, this model needs some more work.
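A short Python sketch shows where that huge number comes from. It assumes the table is read as 16 (observed, expected) pairs, one per person. The critical value of 24.996 is the standard table entry for 15 degrees of freedom at p = 0.05, not something stated in the drill.

```python
# (observed, expected) pairs from the drill 9 table, one per person.
rows = [(1, 5), (2, 1), (5, 6), (2, 2), (8, 8), (1, 3), (4, 1), (10, 7),
        (16, 2), (4, 4), (32, 6), (6, 1), (2, 7), (4, 1), (7, 5), (2, 2)]

contributions = [(o - e) ** 2 / e for o, e in rows]
chi_squared = sum(contributions)
critical_value = 24.996  # standard table entry for df = 15, p = 0.05

# The two rows that add the most to the total.
largest = sorted(zip(contributions, rows), reverse=True)[:2]

print(f"chi-squared = {chi_squared:.2f}")
print("model fits" if chi_squared < critical_value else "model rejected")
for value, (o, e) in largest:
    print(f"observed {o}, expected {e}: contributes {value:.2f}")
```

Two guesses (32 observed against 6 expected, and 16 against 2) account for most of the total, so the model misses badly on a couple of people rather than slightly on everyone.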

10. If our χ2 value comes out to be greater than the critical value, which of the following can we assume?