# High School: Statistics and Probability

### Making Inferences and Justifying Conclusions HSS-IC.B.5

5. Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.

Students should understand that a randomized experiment can be used to compare two treatments. A good way to compare the two treatments statistically is through a t-test. (If your students are confused, just do Mr. T impressions: "I pity the fool who doesn't use a t-test for these types of problems!")

The t-test spits out a number that, when used along with the table provided further below, can help us determine whether there is a significant difference between two treatment options. The most commonly used α value is 0.05, but you can change that at will in your classroom.

Our t value is given by the following, where $\bar{x}_1$ and $\bar{x}_2$ are the averages of the two treatment samples.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

In the above equation, s1 and s2 are the standard deviations of each treatment and n1 and n2 are the sample sizes of each treatment type. It looks scary and overly complicated, but as long as your students plug the right numbers in the right spots, they should be fine.
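For students who like to check their plugging-in, here is a minimal Python sketch of that calculation. It assumes the t value is the difference of the two sample averages divided by the square root of s1²/n1 + s2²/n2, which is the form that reproduces the numbers in the drills. The function name `t_statistic` and the example numbers are made up for illustration.

```python
import math

def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t value from each treatment's average, standard deviation, and size."""
    # Each sample contributes its variance (sd squared) divided by its sample size.
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Made-up summary numbers, purely for illustration.
t = t_statistic(mean1=10.0, sd1=2.0, n1=8, mean2=8.0, sd2=3.0, n2=8)
print(round(t, 3))
```

Swapping the two treatments only flips the sign of t, so the size of the number is what gets looked up in the table.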

We can find the intersection of the p value and the degrees of freedom on the table. The degrees of freedom for the two trials as a whole equal the largest whole number that is less than or equal to $\frac{n_1 + n_2}{2} - 1$. Once we know that, we can find the appropriate value on the gargantuan table. (Your students should get used to looking up values in tables. They'll be doing a lot of that.)
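The degrees-of-freedom rule can be sketched in a couple of lines of Python. This assumes the rule used in the drills: average the two sample sizes, subtract 1, and round down. It is this lesson's shortcut (statistical software typically uses a more involved formula), and the function name `degrees_of_freedom` is hypothetical.

```python
import math

def degrees_of_freedom(n1, n2):
    """Lesson shortcut: largest whole number <= (n1 + n2) / 2 - 1."""
    return math.floor((n1 + n2) / 2 - 1)

print(degrees_of_freedom(7, 7))    # 6
print(degrees_of_freedom(10, 9))   # 8 (8.5 rounded down)
```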

Before we go comparing values to tables, though, we should know whether we're performing a one- or two-tailed test. (If it helps them, tell your students to think of Sonic and Tails.)

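If we expect a difference in one particular direction (say, Treatment 1 beats Treatment 2), we only need to use the one-tail (Sonic) test. If we started out with a hypothesis that assumed no difference between the two treatments, and a difference in either direction would matter, we use a two-tail (Tails) test.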

Students should know, though, that using a one-tail test is sometimes very unethical, especially in medicine, because it tests for only one outcome (say, greater effectiveness) rather than both possible outcomes (greater and less effectiveness).

Then, we go back to our numbers. Yes, there are a lot of them, so let's see what we've got. We have our t value, our α value (the significance level that was given to us), and our degrees of freedom. What do we do with all of them?

Let's say we have two samples, normally distributed, with a significance level of 0.05 (our α) and our test statistic (the t value) comes out to be 2.34 with 4 degrees of freedom. We would look at the table to find the degrees of freedom and find where our test statistic value falls in with respect to the p values. In this case, our t falls between 2.132 and 2.776.

So for a one-tail test, our p value is greater than 0.025 but less than 0.05. For a two-tail test, all we do is multiply our p value by 2, which means that in our case, it's greater than 0.05 and less than 0.1. Then, we compare this value to α.
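The table lookup above can be mimicked with a short Python sketch. It assumes one row of a standard t table (the one-tail critical values for 4 degrees of freedom) and that t is positive, meaning the difference runs in the direction being tested. The names `ROW_DF4`, `bracket_p`, and `compare_to_alpha` are made up for illustration.

```python
# One row of a standard t table for 4 degrees of freedom:
# (one-tail p, critical t), ordered from smallest t to largest.
ROW_DF4 = [(0.10, 1.533), (0.05, 2.132), (0.025, 2.776), (0.01, 3.747), (0.005, 4.604)]

def bracket_p(t, row, tails=1):
    """Return (low, high) bounds on p by finding where t falls in the row."""
    low, high = 0.0, 0.5  # a one-tail p for a positive t is never above 0.5
    for p, critical_t in row:
        if t >= critical_t:
            high = p      # t passed this column, so p is below this column's p
        else:
            low = p       # t fell short of this column, so p is above its p
            break
    return low * tails, high * tails  # a two-tail test doubles both bounds

def compare_to_alpha(low, high, alpha):
    """Say how the bracketed p compares with alpha, if the bracket settles it."""
    if high <= alpha:
        return "p < alpha"
    if low >= alpha:
        return "p > alpha"
    return "cannot tell from the table"

one_tail = bracket_p(2.34, ROW_DF4)
two_tail = bracket_p(2.34, ROW_DF4, tails=2)
print(one_tail, compare_to_alpha(*one_tail, alpha=0.05))
print(two_tail, compare_to_alpha(*two_tail, alpha=0.05))
```

The bracket is the most a printed table can give us. An exact p value needs software or a much longer table.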

If p < α, we reject whatever hypothesis we had. If p > α, we accept it (or at least don't reject it just yet). Do you remember those hypotheses?

Think of it this way: Tails's two tails are identical, so there's no difference between them. That means for a two-tail test, the hypothesis we test is that the treatments are the same. For a one-tail test, the hypothesis is lopsided instead: one treatment is less than or equal to (or greater than or equal to) the other.

Continuing with our example from before, our p > α for a two-tail test, which means we accept the hypothesis that the two treatments are the same. For the one-tail test, p < α, so we reject the one-sided hypothesis and conclude that there is a difference in the direction we tested. The two tests can disagree like this, which is exactly why we have to pick one before looking at the numbers.

Students must be able to tell which test to run, know how to run it, and find out if there is a significant difference or not.

#### Drills

1. The correct way to phrase a hypothesis for a study where the averages of two sample sets are assumed to be the same is?

There is no significant difference between the mean of Sample 1 and the mean of Sample 2

Answers (A), (B), and (D) all assume that there is a significant difference between the two samples even though we want to assume the opposite. The only option is (C), which says that the difference between the two is negligible.

2. You have decided to measure the average height of all the boys and all the girls in your class. Based on what you know about the average height of men compared to women, what will be your hypothesis before a one-tail test?

The average height of boys in my class is greater than or equal to the average height of girls in my class

Whenever phrasing a one-tail null hypothesis, we must use the phrase "equal to." It's less than or equal to or greater than or equal to, but it cannot be just greater than or just less than. Answer (A) is the null hypothesis for a two-tail test and (C) is contrary to known average height statistics.

3. A recent claim has been made that people who own an iPad spend more time on it than people who own other tablets spend on theirs. After all calculations are performed, the study reported a t statistic of 2.8 with 24 degrees of freedom, using a two-tail test and a significance level of 0.01. Is there truly a significant difference between the two data sets?

Yes, because p < α

Looking at the table, we note that p is somewhere in between 0.01 and 0.005 (we have to multiply by 2 for a two-tail test, remember?). This means our p < α, meaning that we reject the hypothesis that there isn't a difference. There is a significant difference. The only answer that matches these findings is (B).

4. Does soil filled with nitrogen increase the weight of worms compared to soil without nitrogen? It's no cure for cancer, but it's still a study. The weights of the seven worms you found in each type of soil are shown below.

| With Nitrogen | Without Nitrogen |
| --- | --- |
| 67 | 54 |
| 56 | 23 |
| 45 | 67 |
| 34 | 3 |
| 23 | 34 |
| 12 | 23 |
| 60 | 3 |

Assuming a significance level (α) of 0.05 and a two-tail test, find out where the p value lies by using the table provided.

0.40 > p >0.30

First, we can calculate the sample means for with nitrogen and without nitrogen as 42.43 and 29.57, respectively, with standard deviations of 20.32 and 24.21. If we use the formula $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$, we get a t value of about 1.08. With 6 degrees of freedom, our p value lies between 0.3 and 0.4 (did you remember to double for the two-tail test?), which is much larger than α. That means we accept the hypothesis that there's no difference.
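To double-check the arithmetic, here is a short Python sketch that works straight from the raw worm weights. It assumes the flattened data alternates between the two columns (first value with nitrogen, second without, and so on), which is the reading that reproduces the means quoted above, and that the standard deviations are sample standard deviations (dividing by n − 1).

```python
import math
import statistics

# Worm weights, read row by row from the data table (assumed column order).
with_nitrogen = [67, 56, 45, 34, 23, 12, 60]
without_nitrogen = [54, 23, 67, 3, 34, 23, 3]

mean1 = statistics.mean(with_nitrogen)
mean2 = statistics.mean(without_nitrogen)
s1 = statistics.stdev(with_nitrogen)      # sample standard deviation
s2 = statistics.stdev(without_nitrogen)
n1, n2 = len(with_nitrogen), len(without_nitrogen)

t = (mean1 - mean2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
df = math.floor((n1 + n2) / 2 - 1)        # the lesson's degrees-of-freedom rule

print(round(mean1, 2), round(mean2, 2))   # 42.43 29.57
print(round(s1, 2), round(s2, 2))         # 20.32 24.21
print(round(t, 2), df)                    # 1.08 6
```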

5. The following data shows, in minutes, how long it takes a new drug to dissolve inside your stomach. A market competitor's data is shown as well. Is there a significant difference between the two drug manufacturers? Conduct a one-tail test with α = 0.01.

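| New Drug | Competitor |
| --- | --- |
| 5 | 1 |
| 2 | 5 |
| 6 | 2 |
| 2 | 6 |
| 1 | 2 |
| 6 | 4 |
| 2 | 1 |
| 7 | 6 |
| 7 | 2 |
| 5 | 6 |
| 6 | 2 |
| 3 | 6 |
| 6 | 2 |
| 2 | 6 |
| 8 | 4 |
| 1 | 5 |
| 4 | 2 |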

No, the p value is above the significance value

The data should be averaged and the standard deviation found. After that, we can compute the t value to be about 0.879. Since the total degrees of freedom equal 16, we can see that the p value lies between 0.15 and 0.2, which is larger than α. In that case, we cannot conclude that the two drug manufacturers are significantly different.

6. An auto company wants to test the pressure (in psi) at which Tire 1 fails compared to Tire 2 from a pool of randomly selected older tires. Their goal is to see whether Tire 1 is better or not compared to Tire 2 in age-related resistance changes. Assuming an α value of 0.05 and a two-tail test, is there a significant difference or not?

| Tire 1 | Tire 2 |
| --- | --- |
| 16.8 | 17.8 |
| 72.4 | 48.2 |
| 53.1 | 10.3 |
| 58.1 | 48.1 |
| 98.3 | 84.1 |
| 48.1 | 9.2 |
| 85.8 | 16.5 |
| 32.1 | 45.2 |
| 85.1 | 88.2 |
| 75.6 | |

No, there is not a significant difference

The only difference here is that our n values aren't the same (10 for Tire 1 and 9 for Tire 2), but that shouldn't affect the calculation of the degrees of freedom. We still calculate them based on the formula $\frac{n_1 + n_2}{2} - 1$, which gives us 8.5 (or 8, since it's the highest whole number possible). Our t value should turn out to be 1.677, which lies between the (doubled) p values of 0.1 and 0.2. Since the p value is larger than α, we can't reject the hypothesis that the two are the same, and we can't say that there is a significant difference.
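For anyone who wants to see the numbers plugged in, here is a worked version. It assumes the data alternates between the two columns, giving ten Tire 1 values and nine Tire 2 values (the reading that reproduces the quoted t value), and it uses sample variances (dividing by n − 1). All intermediate values are rounded.

$$\bar{x}_1 \approx 62.54,\quad s_1^2 \approx 659.18,\quad n_1 = 10 \qquad \bar{x}_2 \approx 40.84,\quad s_2^2 \approx 912.17,\quad n_2 = 9$$

$$t \approx \frac{62.54 - 40.84}{\sqrt{\dfrac{659.18}{10} + \dfrac{912.17}{9}}} \approx \frac{21.70}{\sqrt{65.92 + 101.35}} \approx \frac{21.70}{12.93} \approx 1.68$$

$$\text{degrees of freedom} = \left\lfloor \frac{10 + 9}{2} - 1 \right\rfloor = \lfloor 8.5 \rfloor = 8$$

On the row for 8 degrees of freedom, 1.68 sits between the one-tail columns for 0.10 and 0.05, so the doubled (two-tail) p value is between 0.1 and 0.2, which is larger than α = 0.05.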

7. When one sample size is one trial larger than another, the degrees of freedom are equal to which of the following?

The smaller of the n values subtracted by 1

While the real answer is "the largest whole number that is less than or equal to the average of the two n values subtracted by 1," that's not an option. But if we think about it, if we average two numbers that have a difference of 1 (say, 10 and 11), we'll end up at their halfway point (10.5). The largest whole number that's less than or equal to that will be the smaller of the two n values, and we need to subtract that by 1. So our answer is (A).
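The same argument works for any pair of sample sizes that differ by 1. Writing the smaller one as n (a symbol introduced here just for this sketch):

$$\left\lfloor \frac{n + (n + 1)}{2} - 1 \right\rfloor = \left\lfloor n - \tfrac{1}{2} \right\rfloor = n - 1$$

With n = 10 and n + 1 = 11, that's $\lfloor 9.5 \rfloor = 9$, which is the smaller n value minus 1.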

8. A study compares how much faster someone runs after drinking a shot of espresso compared to someone who drinks only water. If the data seems fairly different, what would be the best course of action?

Run a statistical test to find out because it's best not to assume anything

Even if something seems to be significantly different, the best way to analyze the differences and be sure of their validity is to perform a statistical test. (A) and (D) are assumptions and (B) is just silly.

9. When is the significance level α stricter?

The lower it is, the stricter it is

The lower the α value is (say, 0.01 compared to 0.05), the harder it will be to find a significant difference based on how our statistical tests turn out.
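One way to see this is to look at how the cutoff on the t table moves as α shrinks. The Python sketch below assumes the one-tail row of a standard t table for 24 degrees of freedom. The test statistic of 2.3 is hypothetical, as are the names `CRITICAL_T_DF24` and `is_significant`.

```python
# One-tail critical t values for 24 degrees of freedom, from a standard t table.
CRITICAL_T_DF24 = {0.05: 1.711, 0.025: 2.064, 0.01: 2.492, 0.005: 2.797}

def is_significant(t, alpha, critical=CRITICAL_T_DF24):
    """True when t reaches the table's cutoff for the chosen one-tail alpha."""
    return t >= critical[alpha]

t = 2.3  # hypothetical test statistic
for alpha in sorted(CRITICAL_T_DF24, reverse=True):
    # Print alpha, the cutoff t has to reach, and whether t = 2.3 makes it.
    print(alpha, CRITICAL_T_DF24[alpha], is_significant(t, alpha))
```

The same t value clears the bar at 0.05 and 0.025 but not at 0.01 or 0.005, because every drop in α raises the cutoff.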

10. A study has claimed that if you ask nicely, McDonald's employees give you an average of 25 grams more of French fries than if you're rude. (You also avoid having your food spat on.) The data shows that the test statistic is equal to 3.2 with 24 degrees of freedom. Assuming an α value of 0.05, is there truly a significant difference when performing a one-tail test?