Advanced Statistics—Semester A

Let Shmoop be your guide between the Scylla of data and the Charybdis of statistics.

  • Course Length: 18 weeks
  • Course Type: AP
  • Category:
    • College Prep
    • Math
    • High School

Schools and Districts: We offer customized programs that won't break the bank. Get a quote.

This course has been granted a-g certification, which means it has met the rigorous iNACOL Standards for Quality Online Courses and will now be honored as part of the requirements for admission into the University of California system.


Statistics gets a bad rap in some circles. People think that stats is all about shysters throwing out a confusing bafflegab of numbers to trick people into agreeing with them. And then they cry out, "Oh no, I've been bafflegabbed!" as their life savings disappear in a puff of white smoke.

As you might expect, we here at Shmoop have a different take on things. Statistics is the science of analyzing data and using them to make inferences about a population. That sounds intimidating, but here's the thing to remember: statistics is just another type of mathematics. And we know all about math. There are rules to math; if you use the right equation at the right time, then you'll get the right answer. And if someone tells you something outrageous, like they've found a negative whole number, then you've caught them bafflegab-handed.

Statistics works the same way. Honest.

In Semester A of this year-long AP® Statistics course, we'll be covering the foundation for all of statistics: data. What they look like, how to analyze them, how to graph them all pretty like, and even how to go out into the wild and gather some data of our own. Without good, solid data, we'd be up Bafflegab Creek without a clue.

Here's a sneak peek at what we'll cover in this semester:

  • We'll chat about all the different types of data out there, along with the best ways to display them. You don't graph categorical data with a histogram for the same reasons you don't eat Thanksgiving dinner with a spatula.
  • People won't stay awake to hear your conclusions if you insist on reading off all 500 data points you collected. We'll discuss how we can summarize our data using just a handful of numbers.
  • Sometimes our data are unruly, so we can use a transformation to get them to settle down.
  • Data don't appear out of thin air. Somebody has to go collect them. Once you've identified the population and parameters you're interested in, it's time to conduct a survey, experiment, or observational study to nab a sample.
  • Since statistics is about making inferences, we'll need some way to talk about how likely we think different events are. So, we'll wrap up the semester by boning up on probability. Plus, we'll run some simulations, which are a super-handy tool in our stats toolkit.

Of course, it would be hard to learn statistics' rules without a caravan of readings, guided questions, problem sets, and activities, so we've got all that covered, too.

Just so's you know: this is a two-semester course. This is Semester A, and you can find Semester B right here.

Technology Requirements

A graphing calculator is used throughout the course (and on the AP Stats exam). We give plenty of instruction on using the TI-84 specifically, but any graphing calculator from this list will do the trick.

Required Skills

A strong working knowledge of algebra, up through Algebra 2.


Unit Breakdown

Unit 1 - Visualizing and Describing Data

We'll start our story of statistics off at the beginning, with our data. Whether our data are qualitative or quantitative, we've got them covered. One of the best ways to work with data is visually, so we'll put in plenty of practice time creating all kinds of tables and graphs: frequency tables, pie charts, bar graphs, histograms, and even exotic graphs like time plots, frequency polygons, and ogives.

Unit 2 - Numerical Measures for Quantitative Data

We're pretty busy, so we want to know all the important information about our data in a hurry. That's where measures like the mean, median, mode, interquartile range, variance, and standard deviation come into play. Some of them (the mean, median, and mode) tell us where the center of our data is, others (the interquartile range, variance, and standard deviation) show how spread out the values are, and together they give us a pretty good idea of what's going on in a dataset. Adding in box-and-whisker plots and z-scores is just the icing on this data cake.
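If you'd like to see these measures computed outside a calculator, here's a minimal sketch using Python's standard-library `statistics` module. The quiz scores are made up, and the helper names `quartiles`, `summarize`, and `z_score` are our own hypothetical ones. Quartiles use the median-of-halves method usually taught in AP Stats; software that interpolates may report slightly different Q1 and Q3.

```python
from statistics import mean, median, multimode, stdev, variance

def quartiles(data):
    """Return (Q1, Q3) by the median-of-halves method.

    When n is odd, the overall median is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values below the median
    upper = xs[(n + 1) // 2 :]    # values above the median
    return median(lower), median(upper)

def summarize(data):
    """Measures of center and spread, using sample (n - 1) formulas for spread."""
    q1, q3 = quartiles(data)
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": multimode(data),      # a list, since a dataset can have several modes
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        "variance": variance(data),   # sample variance s^2
        "std_dev": stdev(data),       # sample standard deviation s
    }

def z_score(x, data):
    """How many standard deviations x sits above (+) or below (-) the mean."""
    return (x - mean(data)) / stdev(data)

scores = [85, 72, 93, 66, 88, 81, 85, 90, 78]   # hypothetical quiz scores
stats = summarize(scores)
for name, value in stats.items():
    print(name, value)
print("z-score of 93:", round(z_score(93, scores), 2))
```

The sample versions (`variance`, `stdev`) match the s reported by most calculators; `pvariance` and `pstdev` exist for when the data are an entire population.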

Unit 3 - Comparing Distributions

If you've ever been part of a heated rivalry with one of your classmates, then this unit is for you. We're going to look at how to compare the distributions of two datasets. Who's better at improving students' test scores: Shmoop or our hated rival, Miss Linda Poomhs? Using our various graphing methods, we'll finally settle the score. And then we'll summarize our results using plain English. That way, Linda will know exactly how much better we are than she is.
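Here's a rough sketch of putting numbers behind a comparison like this. The two lists of score improvements are made up, and `five_number_summary` and `compare` are hypothetical helper names. It compares center (median) and spread (IQR) only; shape and outliers still call for a graph such as parallel box-and-whisker plots.

```python
from statistics import median

def five_number_summary(data):
    """(min, Q1, median, Q3, max), with quartiles from the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[: n // 2])
    q3 = median(xs[(n + 1) // 2 :])
    return xs[0], q1, median(xs), q3, xs[-1]

def compare(name_a, a, name_b, b):
    """Describe the center and spread of two distributions in one sentence."""
    _, q1a, med_a, q3a, _ = five_number_summary(a)
    _, q1b, med_b, q3b, _ = five_number_summary(b)
    if med_a == med_b:
        center = f"{name_a} and {name_b} have the same median ({med_a})"
    else:
        leader = name_a if med_a > med_b else name_b
        center = f"{leader} has the higher median ({max(med_a, med_b)} vs. {min(med_a, med_b)})"
    spread = f"IQR {q3a - q1a} for {name_a} vs. {q3b - q1b} for {name_b}"
    return f"{center}; {spread}."

shmoop = [8, 12, 6, 9, 4, 10, 7]   # hypothetical score improvements
linda = [5, 1, 11, 3, 7, 2, 6]     # hypothetical score improvements
print(five_number_summary(shmoop))
print(five_number_summary(linda))
print(compare("Shmoop", shmoop, "Linda", linda))
```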

Unit 4 - Bivariate Data: Scatterplots and Correlation

Double the data, double the fun. In this unit, we'll start working with bivariate data. We'll slap our two variables into scatterplots and see what their relationship status is. Instead of checking their Facebook pages, we'll use the correlation coefficient. It's also possible to create a least-squares regression line, which is a fancy/scary way of saying "a line that fits the data really well." (Specifically, it's the line that makes the sum of the squared residuals as small as possible.) We'll use it to make predictions about the data's behavior (while keeping an eye out for trouble from the residuals and any outliers).
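To make "a line that fits the data really well" concrete, here's a minimal sketch of the correlation coefficient r and the least-squares line ŷ = a + bx, using the formula-sheet form b = r·(s_y / s_x) and a = ȳ - b·x̄. The five data points are hypothetical, and the helper names are our own.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sample_sd(v):
    m = mean(v)
    return sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

def correlation(x, y):
    """Pearson r: sum of products of z-scores, divided by n - 1."""
    mx, my, sx, sy = mean(x), mean(y), sample_sd(x), sample_sd(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (len(x) - 1)

def least_squares_line(x, y):
    """Intercept a and slope b with b = r * sy / sx and a = ybar - b * xbar."""
    b = correlation(x, y) * sample_sd(y) / sample_sd(x)
    a = mean(y) - b * mean(x)
    return a, b

hours = [1, 2, 3, 4, 5]   # hypothetical explanatory variable
score = [2, 4, 5, 4, 5]   # hypothetical response variable
a, b = least_squares_line(hours, score)
residuals = [yi - (a + b * xi) for xi, yi in zip(hours, score)]   # observed - predicted

print(f"r = {correlation(hours, score):.3f}")
print(f"y-hat = {a:.2f} + {b:.2f}x")
print("residuals:", [round(e, 2) for e in residuals])
```

The residuals of a least-squares line always add up to zero, so the interesting part of a residual plot is its pattern, not its total.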

Unit 5 - Function Models and Transformations

Not all bivariate data fit on a nice, easy-to-use line. Two other common relationships between two variables are exponential and power relationships. Since we went through all that trouble last unit to learn about lines of best fit, we'll just use some clever math tricks to transform our non-linear data into something a bit less bent.
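One standard version of those clever math tricks is taking logarithms. If y = a·b^x, then ln y = ln a + x·ln b is linear in x; if y = a·x^p, then ln y = ln a + p·ln x is linear in ln x. The sketch below (hypothetical helper names, with data made up to follow y = 3·2^x and y = 2x^3 exactly) fits a line to the transformed data and then undoes the logs. It assumes every value being logged is positive.

```python
from math import exp, log

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def fit_exponential(x, y):
    """y = a * b**x: regress ln(y) on x, then exponentiate both coefficients."""
    ln_a, ln_b = fit_line(x, [log(yi) for yi in y])
    return exp(ln_a), exp(ln_b)

def fit_power(x, y):
    """y = a * x**p: regress ln(y) on ln(x); the slope is the power p."""
    ln_a, p = fit_line([log(xi) for xi in x], [log(yi) for yi in y])
    return exp(ln_a), p

print(fit_exponential([0, 1, 2, 3, 4], [3, 6, 12, 24, 48]))   # close to (3, 2)
print(fit_power([1, 2, 3, 4], [2, 16, 54, 128]))               # close to (2, 3)
```

Real data won't line up perfectly after the transformation, so it's worth checking a residual plot of the transformed data before trusting the model.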

Unit 6 - Planning and Conducting a Study

It's time to introduce you to your new best friend: randomness. Not because it's "so random lol" and that cracks you up. No, it's because randomly choosing individuals for our sample gives us our best shot at a representative sample, which is the linchpin for doing good statistics. There's more to it than grabbing a blindfold and some darts and going to town, so we'll go in-depth on the different kinds of samples we can take for surveys, experiments, and observational studies.
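Here's a sketch of two common random sampling designs, a simple random sample and a stratified random sample, using `sample` from Python's `random` module. The roster of 100 students split into four grade levels is hypothetical, and the fixed seed is only there so the example gives the same output every run.

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible group of n individuals has the same chance of being chosen."""
    return rng.sample(population, n)

def stratified_random_sample(strata, n_per_stratum, rng):
    """Take a separate simple random sample from each stratum, then combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

rng = random.Random(2024)                              # seeded for reproducibility
students = [f"student_{i}" for i in range(1, 101)]     # hypothetical roster of 100
grades = {g: students[k * 25:(k + 1) * 25] for k, g in enumerate([9, 10, 11, 12])}

print(simple_random_sample(students, 5, rng))
print(stratified_random_sample(grades, 2, rng))
```

Stratifying guarantees that every grade level shows up in the sample, which a simple random sample of 5 can't promise.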

Unit 7 - Probability and Simulations

What's the probability that we'll round out this semester by talking about probability and simulations? Well, unless the Unit Title Manager (totally a real Shmoop position, apply today) is completely awful at their job, it's probably 100%. We'll use our understanding of sample spaces, dependent and independent events, and simple and compound probabilities to simulate a bunch of events.
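As a taste of the simulations to come, here's a minimal sketch that estimates the probability of rolling at least one six in four rolls of a fair die, then checks it against the exact answer. Because the rolls are independent, the complement rule gives P(at least one six) = 1 - (5/6)^4 ≈ 0.518. The dice scenario and the function name are our own illustration.

```python
import random

def estimate_at_least_one_six(rolls, trials, rng):
    """Fraction of simulated trials in which at least one of `rolls` rolls is a 6."""
    hits = 0
    for _ in range(trials):
        if any(rng.randint(1, 6) == 6 for _ in range(rolls)):
            hits += 1
    return hits / trials

rng = random.Random(7)                         # seeded so reruns give the same estimate
estimate = estimate_at_least_one_six(4, 100_000, rng)
exact = 1 - (5 / 6) ** 4                       # complement rule with independent rolls
print(f"simulated: {estimate:.3f}  exact: {exact:.3f}")
```

With 100,000 trials, the simulated proportion should land within about a percentage point of the exact value; fewer trials means a noisier estimate.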


Recommended prerequisites:

  • Algebra II—Semester A
  • Algebra II—Semester B

  • Sample Lesson - Introduction

    Lesson 1.02: Displays for Categorical Data with Only One Variable

    Two PAC-MEN, dressed as Darth Vader and Luke Skywalker, cross lightsabers.
    There's got to be a better way to choose how to show these data.
    (Source)

    The new thing in communications is "infographics." It sounds exotic and complicated, until we realize it's just a mashup of "information" and "graphic." This isn't a new idea, either: how many times have we sorted our sugary breakfast cereal or colored candies into bars by color?

    Confession time. We've done that since age 2.

    Whether we knew it or not, we were constructing a pictograph (unless we got in trouble for playing with our food or ate the data before our graph was finished). We've known ever since we could eat Froot Loops that it's interesting to look at a picture of our data.

    Pictographs are one choice for categorical data, but in the business and scientific worlds, we're much more likely to see categorical data in the form of pie charts or bar charts. They're more informative, and hey, pie is delicious. These types of graphs can be found in financial reports, news stories, textbooks, and just about everywhere someone might need to report some data.

    In this lesson, we'll cover how to choose between pie charts and bar charts, and check out some rules on how to construct them. We'll also learn about Pareto charts, which are a special kind of bar chart.


    Sample Lesson - Reading

    Reading 1.1.02: Recipe For Data Pie

    Choosing the best kind of chart for a set of categorical data is like choosing a topping for apple pie. Some people like cheese; others like ice cream. It would be pretty strange to find both of those toppings on a piece of pie; in the same way, there's usually a best choice between pie charts and bar graphs for summarizing categorical data. It all depends on how we collected the data.

    Pie Charts

    Pie charts are best for data where the categories don't overlap, so each observation lands in exactly one slice and the slices add up to the whole. To keep a pie chart from looking too boring, we should probably have at least three different categories. Otherwise you get something like this one, which shows the percentage of babies born in the United States by gender.

    The only surprise here is that more boys are born in the U.S. than girls. At least according to the U.S. Census Bureau. We could have just as easily written a sentence with this information and saved some of our scented markers in the process. The lesson is that we don't always need a graph when we have only two categories.

    Here's an example of a pie chart, also known as a circle graph. The most effective pie charts use relative frequencies in percentage form to label the slices.
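The arithmetic behind those slice labels is short: each category's relative frequency is its count divided by the total, and its slice gets that same fraction of 360°. A minimal sketch, assuming non-overlapping categories and made-up pie-topping survey counts (the function name `pie_slices` is hypothetical):

```python
def pie_slices(counts):
    """Map each category to (percent of total, slice angle in degrees).

    Assumes every observation falls in exactly one category.
    """
    total = sum(counts.values())
    return {cat: (100 * c / total, 360 * c / total) for cat, c in counts.items()}

toppings = {"Ice cream": 45, "Whipped cream": 30, "Cheese": 15, "Nothing": 10}   # hypothetical survey
for topping, (pct, deg) in pie_slices(toppings).items():
    print(f"{topping}: {pct:.0f}% of the pie, a {deg:.0f}-degree slice")
```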

    A variation on the pie chart is a circle graph with the middle cut out. It looks more like a donut than a pie, but it gives the same information (except for who ate the middle out of the pie). Now, please pass the fork.

    Bar Graphs

    Bar graphs are a little more flexible than pie charts. Any data that can be shown in a pie chart can also be shown in a bar graph, but bar graphs also work when each subject in a survey can give more than one response.

    The first three sample problems on this page show some bar graphs using data that also could have been shown in a pie chart. The vertical axis in each of these examples shows "Quantity" or "Number," which is the same thing as frequency. We could also use relative frequency for a bar graph, which is a good idea if we have a large number of observations. If we use relative frequency, we would label the vertical axis as "Percentage."

    When each subject in a survey can give more than one response, the category percentages can add up to more than 100%, so a pie chart won't work; we'd end up with a pie that's bigger than the whole pie. In this case, a bar graph is our only choice. This happens often in surveys, especially with "check all that apply" questions.

    Example: Survey Question on Household Electronics

    Suppose fifty families are selected to complete a survey on what kind of electronics they own. Each family checks a box for every item it owns. The survey results are shown in the table below.

    Type of Electronics | Number Reporting "Yes"
    HD Television       | 49
    Desktop Computer    | 30
    Gaming Console      | 26
    Wireless Router     | 18
    Smart Phone         | 50
    Tablet Computer     | 14
    Digital Camera      | 12
    Programmable Stove  | 50
    3-D Printer         | 1

    The total number of "Yes" responses is 250, far greater than the 50 families surveyed, so we clearly can't use these data in a pie chart. On the other hand, we can easily use this information to make a bar graph.
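That "too many responses" check is easy to automate. Here's a minimal sketch using the counts from the survey table; the function name `fits_in_pie` is hypothetical, and it assumes that in a pie-friendly dataset every subject lands in exactly one category (no blanks, no double counting).

```python
def fits_in_pie(counts, n_subjects):
    """A pie chart needs each subject counted in exactly one category,
    so the category counts must add up to the number of subjects."""
    return sum(counts.values()) == n_subjects

electronics = {
    "HD Television": 49, "Desktop Computer": 30, "Gaming Console": 26,
    "Wireless Router": 18, "Smart Phone": 50, "Tablet Computer": 14,
    "Digital Camera": 12, "Programmable Stove": 50, "3-D Printer": 1,
}
total = sum(electronics.values())
print(total)                          # 250 "Yes" responses from only 50 families
print(fits_in_pie(electronics, 50))   # False, so a bar graph it is
```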

    We can immediately see that the electronics most people own include smart phones, HD televisions, and programmable stoves.

    If we wanted, we could change this bar chart to use relative frequency instead of frequency. Most charts in articles and books use the percentage version of relative frequency. Our modified table is shown below, along with the first couple of calculations.

    Type of Electronics | Number Reporting "Yes" | Percentage Reporting "Yes"
    HD Television       | 49                     | 49/50 = 0.98, or 98%
    Desktop Computer    | 30                     | 30/50 = 0.60, or 60%
    Gaming Console      | 26                     | 26/50 = 0.52, or 52%
    Wireless Router     | 18                     | 0.36, or 36%
    Smart Phone         | 50                     | 1.00, or 100%
    Tablet Computer     | 14                     | 0.28, or 28%
    Digital Camera      | 12                     | 0.24, or 24%
    Programmable Stove  | 50                     | 1.00, or 100%
    3-D Printer         | 1                      | 0.02, or 2%

    To make our bar chart even better, we could put the products in descending order on the horizontal axis.

    This bar graph is much more effective than the first one. People relate better to percentages than counts, and the arrangement of the items in descending order makes it even more obvious which items are hot and which ones are not.
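Here's a sketch of the two steps behind that improved chart, using the counts from the survey table: convert each count to a percentage of the 50 families, then sort from greatest to least (which is what makes it a Pareto chart). The function name `pareto_order` is hypothetical. Tied categories, like smart phones and programmable stoves at 100%, keep their original table order because Python's `sorted` is stable even with `reverse=True`.

```python
def pareto_order(counts, n_subjects):
    """Return (category, percent) pairs sorted from greatest to least percent."""
    percents = {cat: 100 * c / n_subjects for cat, c in counts.items()}
    return sorted(percents.items(), key=lambda item: item[1], reverse=True)

electronics = {
    "HD Television": 49, "Desktop Computer": 30, "Gaming Console": 26,
    "Wireless Router": 18, "Smart Phone": 50, "Tablet Computer": 14,
    "Digital Camera": 12, "Programmable Stove": 50, "3-D Printer": 1,
}
for category, pct in pareto_order(electronics, 50):
    print(f"{category:<20}{pct:5.0f}%")
```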

    Now, if only we could hook up the 3-D printer to the programmable stove, we could make a pizza without ever going into the kitchen.

    Recap

    • We can use pie charts and bar graphs to display categorical data with one variable.
    • We can't use pie charts if subjects can give more than one response, and a pie chart should have at least three categories. To stuff our charts with as much information as possible, we can label the slices with the percentage of the data they represent.
    • We can use bar graphs for any kind of categorical data, including cases where subjects can give more than one response. The categories in a bar graph may be rearranged to make the graph more informative. Either counts or percentages may be used to label the vertical axis.
    • If we order the bars in a bar graph from greatest to least, we create a Pareto chart.

    Throughout this course, we include videos that recap concepts or present practice problems on the material from a lesson. Feel free to skip over them as long as they're part of the Recap section. If they're wedged in the middle of the Reading, though, you'll want to watch them there and then.

    For our first Recap videos, here are a couple of example problems about reading pie and bar graphs:


    Sample Lesson - Activity

    Activity 1.02b: Creating Pictures of the Creative Arts

    Art comes in many forms. Even graphs are an art form. Just like a painting or a song, a graph is a way to get the artist's vision across to their vast, adoring audience. In this activity we become the artist, and our medium is data and graphs. Just so we're clear, there will be no cutting off of anyone's ears in the name of art or love here.

    In 2008, the National Endowment for the Arts collected data on participation in different creative activities. The results of the survey can be found by clicking here, scrolling to page 762, and reading Table 1237: Personal Participation in Various Arts or Creative Activities: 2008.

    They looked at participation across different categories. One of those categories was age group. We'll be exploring the question of whether people are more likely to participate in the creative arts as they get older, and you'll be creating some pie charts and bar graphs to help answer this question. You'll be making your bar graphs by hand, for that old-fashioned, homemade look. You can use technology for your pie chart, though; that's okay. Just this once. Just don't tell anyone.

    The bar graphs, though? Totally by hand.

    Step 1: First we'll explore the age distribution of the U.S. population. The population (in millions) for different age groups appears in the second column. Use these data to create a pie chart.

    Step 2: Now we'll explore participation in two different activities. Choose any two; for example, you might be most interested in Classical Music and Photography. Notice that the data for each of the activities are shown as percentages, not as counts. Use the data to make bar graphs of participation in each of your chosen activities by age group.

    Step 3: Free choice time! Using any data in this table, make a bar graph. Maybe you'll look at participation in creative arts by ethnic group or gender, or maybe you'll choose a specific age or gender group and look at participation in several of the various activities. No rules here. Well, except maybe all the rules about making accurate charts. Follow those rules. Again, this one should be a bar graph.

    Step 4: Now it's time to make some observations. For each of the graphs, write one or two comments about what the graphs are telling us. Does participation in the arts increase with age? Are certain activities more popular than others? Make sure that any conclusions you make are actually supported by the graphs.

    Step 5: Before submitting any work, be sure to check for accuracy and clarity. Anyone should be able to pick up these graphs and know exactly what you're trying to communicate. Even the random guy who is always in line ahead of us at the smoothie place should be able to understand what these graphs are trying to say. This means label, label, and label them some more. When you're done, upload your work using the button below.


    Sample Lesson - Activity

    1. Which of the following sets of data would be appropriate for a pie chart?

      I. The numbers of all U.S. adults who traveled overseas last year, given in ten-year age groups.
      II. The percentage of unemployed adults in each of the Mid-Atlantic States.
      III. Results from a survey of 1000 adults asking which professional sports they watch on TV.

    2. Whale-watching enthusiasts can spend days on a boat and never see a single whale. Oddly, the not-seeing-whales thing has little impact on their enthusiasm. The data table below shows information for three local charters. A successful trip is one where at least one whale was spotted.

      Charter Company | Number of Trips | Number of Successful Trips
      Whale of a Trip | 175             | 105
      Thar-she-blows  | 125             | 80
      Humpback Heaven | 185             | 115

      We would like to compare success rates for each charter. (A success rate is the percentage of trips where a whale was spotted.)

      Which one of the following statements about these data is false?

    3. The pie chart below shows U.S. enrollment in 2003, by type of school.

      Which of the following statements about this pie chart are true?

    4. There's nothing worse than a lost suitcase. All that stuff on its way to Jamaica while we're standing there in Paris. This chart shows the numbers of bags that were lost in 2006 for a whole bunch of airlines.

      Which one of the following statements is false?

    5. The table below compares the number of fatal and non-fatal shark attacks for the years 2004-2007.

      Year | Fatal Attacks | Non-Fatal Attacks | Total Attacks
      2004 | 7             | 58                | 65
      2005 | 4             | 57                | 61
      2006 | 4             | 59                | 63
      2007 | 1             | 70                | 71
      (Source)

      After vowing never to swim again without our Shark Repellent Bat-Spray, we decide to make a graph showing the number of fatal attacks for each of the four years. (Those are the only ones we're worried about, after all.) Here is our graph.

      Which of the following statements is false?