# Basic Statistics & Probability


There's a high probability (see what we did there?) that at some point in your probability/statistics unit, your teacher will ask you to do a **data-gathering statistics project**. Don't worry, we can help!

These kinds of projects always follow more or less the same pattern. Here are the steps you need to know:

- Ask a question
- Form a hypothesis
- Collect your data
- Analyze your data
- Display your data
- Make a conclusion

## What Makes a Good Question?

To start, you need to have a good "theme question" to ask. Pick something that interests you. You may need to ask follow-up questions to fully develop your project. You also need to know what kind of data to collect: for example, should it be numerical data, or can the answers be given in words (categorical data)? Do you need both?

Examples of good questions for a statistical study:

a) Do the political views of parents influence the political views of students at Shmoople Hills High School?

b) Do 8th graders in my school need less homework each night?

Why these are good questions:

a) They are interesting.

b) They’re specific about a targeted audience.

Examples of not-so-good questions for a statistical study:

a) Is swimming more popular than ice hockey?

b) Are girls or boys taller?

Why these are not-so-good: If you don’t put a boundary around the audience you’re measuring, things get a lot more complicated – are you trying to make a statement about all the girls and boys in the world? Phew, that could get tiring. Also, in general, simple popularity contests don’t reveal interesting correlations for statistical studies.

After you pick your topic, you need to design the specific questions you will ask. Good questions are unbiased, meaning that they don't try to influence the person being asked. Let's say that you want to try to cut down on the amount of homework your teachers assign and ask the following questions:

a) On average, how much time do you spend each night on homework?

b) Many kids are concerned about not having enough free time. Do you agree?

Which question is biased, *a* or *b*? If you answered *b*, you are probably awake and paying attention. Question *b* is *leading* the interviewee to agree with you. The best questions are concise, specific, direct, and neutral (non-leading).

## Setting Up an Example Study

Shmoop is curious about how middle school students in San Francisco use two fictional social networking websites called FaceSpace and MyBook.

To make this more interesting than a simple popularity poll, we will look to see if there are differences in the way boys and girls respond to the questions in our survey.

In our fictional survey, we asked 50 middle school boys and 50 middle school girls these questions:

- Do you use FaceSpace, MyBook, or both?
- How much time per day do you spend on these sites?
- Are you "friends" with your parents?
- To the best of your knowledge, do your parents monitor your usage?

You can also create an easy-to-use questionnaire that can be filled out by the interviewer or the interviewee. Our fictional questionnaire was a spreadsheet that looked like the table below (for girls). We had an equivalent one for boys.

| Girls | MyBook (y/n) | FaceSpace (y/n) | Time (hours/day) | Parent Friend (y/n) | Parent Monitor (y/n) |
| --- | --- | --- | --- | --- | --- |
| 1 | | | | | |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
| 6 | | | | | |
| 7 | | | | | |
| 8 | | | | | |

Using this, we could quickly write in the answers of each person we asked.
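If you'd rather keep your answers on a computer than on paper, here is one way a single row of the questionnaire might be recorded in Python. This is only a sketch: the `Response` class, its field names, and the `yn` helper are names we made up for illustration, and the example row is invented, not real survey data.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One row of the questionnaire (field names are our own invention)."""
    uses_mybook: bool
    uses_facespace: bool
    hours_per_day: float
    parent_friend: bool
    parent_monitor: bool


def yn(answer: str) -> bool:
    """Convert a written 'y' or 'n' into True or False."""
    cleaned = answer.strip().lower()
    if cleaned not in ("y", "n"):
        raise ValueError(f"expected 'y' or 'n', got {answer!r}")
    return cleaned == "y"


# A made-up row, purely for illustration (not real survey data).
example = Response(
    uses_mybook=yn("y"),
    uses_facespace=yn("n"),
    hours_per_day=1.5,
    parent_friend=yn("y"),
    parent_monitor=yn("n"),
)
print(example)
```

Rejecting anything other than "y" or "n" is a design choice: it catches sloppy entries (like a blank or a "maybe") before they sneak into your totals.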

## Formulate a Hypothesis

Now that you know what you're going to study, you need to predict what your results will show. This is called formulating a hypothesis. Take a guess at what the generalizations will be and *why* they may turn out that way.

For our study, we expect we'll find that girls will use MyBook and FaceSpace more and spend more time on these sites than boys. In addition, we think that parents will be more likely to monitor the usage of their daughters than that of their sons.

## Collect a Sample

It's almost time to gather your data, but read this section carefully before you proceed. In order to conduct an accurate study, *whom* you interview is of the utmost importance. Most likely, you will not be able to poll the entire population that you are interested in studying; you will need to poll a *sample* of that population. For our study, we couldn't ask every middle school student in San Francisco about his or her social networking patterns, so we selected a sample of 50 boys and 50 girls.

The sample you poll *must* be randomly chosen for your results to have **significance**. If you are interested in the favorite movie of students in your middle school and ask all of your friends about their favorite movies, this would not be representative of the whole school, since the people in your group of friends probably share similar interests and hobbies. Even if you stand outside of your second-period science class and ask the first 30 kids you see, this would not be random; chances are high that you would poll a large number of students in your own grade, and your friends may be more likely to stop and talk with you.

Here are a few ways you can get a truly **random sample** of your school:

- Place every student's name in a box and randomly draw names.
- Place a questionnaire in every fifth locker.
- Randomly pick 10 homeroom teachers and ask them to pass out questionnaires to their students.
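The first two ideas above can be sketched in a few lines of Python. Treat this as a rough illustration rather than the one right way to do it: the roster of 300 placeholder names, the function names, and the sample sizes are all assumptions of ours, and picking a random starting locker is our own addition to the "every fifth locker" idea.

```python
import random

# Hypothetical roster: 300 placeholder names standing in for a real school list.
roster = [f"Student {i}" for i in range(1, 301)]


def draw_from_box(names, sample_size, seed=None):
    """Simple random sample: like drawing names from a box, without replacement."""
    rng = random.Random(seed)
    return rng.sample(names, sample_size)


def every_kth(items, k, seed=None):
    """Systematic sample: pick a random starting point, then take every k-th item."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return items[start::k]


box_sample = draw_from_box(roster, 30, seed=1)
locker_sample = every_kth(roster, 5, seed=1)
print(len(box_sample), len(locker_sample))
```

Passing a `seed` makes the draw repeatable, which is handy if your teacher asks you to show exactly how you picked your sample. Leave it out to get a fresh draw each time.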

There are many different ways to collect samples, and we don't just mean grabbing the free ones at the supermarket.

## Analyze Data

Now you've designed a study, created a questionnaire, and polled your random sample. It's finally time to look at the numbers and do some math. Your first step is simply to add up the numbers from your survey. Then, you can calculate the percentages for each category and display them in table form.
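Here is a small sketch of that adding-up step in Python, in case you'd like a computer to do the counting. The answers below are made up for illustration (five imaginary people, not our survey), and `percent_yes` is a helper name we invented.

```python
from statistics import mean


def percent_yes(answers):
    """Percentage of 'y' answers in a list of 'y'/'n' strings."""
    yes_count = sum(1 for a in answers if a == "y")
    return 100 * yes_count / len(answers)


# Made-up answers from five people, for illustration only.
mybook_answers = ["y", "y", "n", "y", "n"]
hours_per_day = [2.0, 1.5, 0.0, 3.0, 0.5]

print(f"MyBook: {percent_yes(mybook_answers):.0f}%")      # MyBook: 60%
print(f"Mean time: {mean(hours_per_day):.2f} hr/day")     # Mean time: 1.40 hr/day
```

Three of the five made-up answers are "y", so the percentage is 3 ÷ 5 = 60%. The mean time is the total (7.0 hours) divided by the number of people (5), or 1.40 hours per day.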

Here is what we might have found with our social networking study if we actually had surveyed 50 middle school girls and 50 middle school boys in San Francisco:

| Social Networking Data (% that answered yes to question) | Girls | Boys |
| --- | --- | --- |
| MyBook | 86% | 66% |
| FaceSpace | 30% | 36% |
| Both | 24% | 22% |
| Neither | 8% | 20% |
| "Friend" with Parent | 66% | 50% |
| Parent Monitors | 54% | 30% |
| Mean time spent on these sites | 2.20 hr/day | 1.01 hr/day |

These are just the basic percentages, but they still tell us quite a bit. Based on our fictional survey, girls spend more than twice as much time on social networking sites as boys (2.20 versus 1.01 hours per day), and parents are more likely to monitor their daughters (54%) than their sons (30%).
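It's worth double-checking that the numbers in a table like the one above hang together. The sketch below assumes that the MyBook and FaceSpace rows each include the students who use both sites (our reading of the table, not something the survey states outright). Under that assumption, adding the two rows and subtracting "Both" gives the percentage who use at least one site, and that number plus "Neither" should come to 100%. The dictionary layout and function name are our own.

```python
# Percentages copied from the table above.
table = {
    "girls": {"mybook": 86, "facespace": 30, "both": 24, "neither": 8},
    "boys": {"mybook": 66, "facespace": 36, "both": 22, "neither": 20},
}
GROUP_SIZE = 50  # students surveyed in each group


def at_least_one(row):
    """Percent using MyBook or FaceSpace; 'both' is subtracted so nobody is counted twice."""
    return row["mybook"] + row["facespace"] - row["both"]


for group, row in table.items():
    used_any = at_least_one(row)
    # Turn a percentage back into a head count out of 50.
    mybook_count = row["mybook"] * GROUP_SIZE // 100
    print(f"{group}: {used_any}% use at least one site, "
          f"{row['neither']}% use neither, "
          f"{mybook_count} of {GROUP_SIZE} use MyBook")
```

For girls this gives 92% plus 8%, and for boys 80% plus 20%, so both columns add up to 100% as they should. If your own totals don't, go back and recount before you draw any conclusions.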