
# Probability and Statistics - Types of Data

## Types of Data

The Merriam-Webster dictionary says that data is "factual information used as a basis for reasoning, discussion, or calculation." If we observe something and write down what we observe, we're gathering data. It's definitely easier than hunting data.

### Qualitative v. Quantitative Data

There are two general types of data. Quantitative data is information about quantities; that is, information that can be measured and written down with numbers. Some examples of quantitative data are your height, your shoe size, and the length of your fingernails. Speaking of which, it might be time to call Guinness. You've got to be close to breaking the record.

Qualitative data is information about qualities, that is, information that can't actually be measured. Some examples of qualitative data are the softness of your skin, the grace with which you run, and the color of your eyes. However, try telling Photoshop you can't measure color with numbers.

Here's a quick look at the difference between qualitative and quantitative data.

• The age of your car. (Quantitative.)

• The number of hairs on your knuckle. (Quantitative.)

• The softness of a cat. (Qualitative.)

• The color of the sky. (Qualitative.)

• The number of pennies in your pocket. (Quantitative.)

Remember, if we're measuring a quantity, we're making a statement about quantitative data. If we're describing qualities, we're making a statement about qualitative data. Keep your L's and N's together and it shouldn't be too tough to keep straight.

### Categorical Data

One other type of data you'll need to know is categorical data. This is data that can be organized into mutually exclusive categories. If we look at a bunch of bananas and they're all either green, brown, yellow, or blue, then we could use the categories "green," "brown," "yellow," and "blue" to record our data. We'd stay away from the blue ones if we were you.
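Recording categorical data mostly comes down to tallying how many observations land in each category. Here's a minimal sketch, assuming the four banana colors above are the complete, fixed list of categories; the `tally` helper and the sample bunch are hypothetical, made up for illustration.

```python
from collections import Counter

# The fixed, mutually exclusive categories from the banana example.
CATEGORIES = ("green", "brown", "yellow", "blue")

def tally(observations):
    """Count how many observations fall into each category (hypothetical helper)."""
    # Every observation must belong to exactly one of the known categories.
    unknown = [obs for obs in observations if obs not in CATEGORIES]
    if unknown:
        raise ValueError(f"no category for: {unknown}")
    counts = Counter(observations)
    # Counter returns 0 for categories that never appeared.
    return {category: counts[category] for category in CATEGORIES}

bunch = ["yellow", "green", "yellow", "brown"]  # hypothetical observations
print(tally(bunch))  # {'green': 1, 'brown': 1, 'yellow': 2, 'blue': 0}
```

Passing a value such as "orange-red" raises a `ValueError`, since it fits none of the predefined categories.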

### A Few Examples

Suppose a statement tells us how many students in a class have brown, blonde, or red hair. That statement refers to categorical data. The categories are the different hair colors that have been observed: brown, blonde, and red. Incidentally, most of the blonde students got this question wrong.

The car is orange-red.

This statement sounds like it's referring to categorical data, but it isn't. It refers to data that is qualitative, but not categorical. There isn't enough information to determine what the categories would be. If we went with the standard colors of the rainbow, to what category would the color "orange-red" belong? We don't remember there being an "orange-red" Wiggle.

Categorical data is usually qualitative. However, quantitative data can also be put into categories. More on this later.

### Sample Problem

There's a family in which the dad is 5'11'', the mom is 5'7'', and the kid is 4'8''. The mom is a little unnerved by how quickly her daughter is gaining on her, but it's irrelevant to this problem, so don't let it bother you.

Anyway, these are measurements, and are therefore quantitative data. However, we could also say that both the dad and the mom are between 5 and 6 feet tall, and the kid is between 4 and 5 feet tall. If we say this, we've taken our quantitative data and put it into categories. The categories are "5 to 6 feet tall," "4 to 5 feet tall," and so on. If the dog wants to play, we'll need to add a "1 to 2 feet tall" category.
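Here's a small sketch of turning the family's heights into categories. It assumes each category covers "at least n feet but under n + 1 feet," so someone exactly 5 feet tall lands in "5 to 6 feet tall" and the categories stay mutually exclusive; `height_category` is a hypothetical helper, not anything standard.

```python
def height_category(total_inches: int) -> str:
    """Label a height by whole feet, using half-open ranges [n, n + 1) feet."""
    feet = total_inches // 12  # whole feet, dropping the leftover inches
    return f"{feet} to {feet + 1} feet tall"

# Heights from the sample problem, converted to inches (12 inches per foot).
family = {"dad": 5 * 12 + 11, "mom": 5 * 12 + 7, "kid": 4 * 12 + 8}
for person, inches in family.items():
    print(person, height_category(inches))
# dad 5 to 6 feet tall
# mom 5 to 6 feet tall
# kid 4 to 5 feet tall
```

A dog measuring, say, 18 inches would get its own "1 to 2 feet tall" category without any extra code.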

To summarize what we have so far, data is either quantitative (about quantities or numbers) or qualitative (about non-measurable qualities). Sometimes data can be turned into categorical data by putting it into categories. Or by waving a wand over it and saying "categoriarmus!"

Most of our statistics will be done on quantitative data, since this is math, after all. We can also do some things with categorical data. It's hard to analyze data that's qualitative and not categorical, since we need to have numbers somewhere. Yes, we're slaves to numbers. We're seeing someone about it.

### Discrete v. Continuous Data

Whenever we collect data, there's a collection of possible values from which we record our observations. If we're flipping a coin, the possible values we can observe are H (heads) or T (tails). Or, occasionally, the very rare E (edge). If we're measuring someone's height in centimeters, the possible values are any positive number of centimeters, including fractions of a centimeter. There are two different ways to classify data based on the possible values we can observe.

Data is discrete if there's clear separation between the different possible values. Either there will be a finite number of possible values, or we're counting something.

### Sample Problems

If we flip a coin and record the result, there are only two possible values (ignoring that pesky "edge" thing): H and T. There's no possible value between H and T, so our observations are discrete.

Recording the numbers of coins in different piggy banks would also give us discrete data, since there's a separation of one whole coin between any two numbers we might get. Even a half-dollar is still a whole coin.

Sets of data that record counts of actual, physical things are discrete. We can't have half a person when we're counting a town's population, unless we're in a horror movie.
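One way to see the "clear separation" in counts: between two whole-number counts, the only possible values are the whole numbers in between, and between consecutive counts there are none at all. A minimal sketch, with a hypothetical helper name and made-up piggy-bank counts:

```python
def counts_strictly_between(low: int, high: int) -> list[int]:
    """Return every whole-number count strictly between low and high."""
    # range(low + 1, high) skips both endpoints and steps by one whole coin.
    return list(range(low + 1, high))

print(counts_strictly_between(12, 13))  # [] : no count between 12 and 13 coins
print(counts_strictly_between(12, 15))  # [13, 14]
```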

However, data is continuous if there's no clear separation between possible values. Like if two values are still kinda-sorta seeing each other, but haven't really discussed if they're an "item."

### Sample Problem

If we measure someone's height in centimeters we could get 160 cm, or 160.01 cm, or 160.001 cm (assuming we had a very accurate method of measurement). For any two possible values (say, 160 cm and 161 cm), there's another possible value between them (160.5 cm). Those infuriating numbers can always be broken down into smaller and smaller numbers. It's part of the reason we love them so much. Can't count with them, can't count without them. That means our observations are continuous.

Sets of data involving measurements that can have fractions or decimals are generally continuous.
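The "there's always another value in between" idea from the height example can be checked directly: the midpoint of two different values always sits strictly between them. This sketch uses Python's `fractions.Fraction` so the arithmetic is exact; `value_between` is a hypothetical helper name.

```python
from fractions import Fraction

def value_between(a: Fraction, b: Fraction) -> Fraction:
    """Return the midpoint of a and b, which lies strictly between them when a != b."""
    return (a + b) / 2

low, high = Fraction(160), Fraction(161)  # heights in centimeters
for _ in range(3):
    mid = value_between(low, high)
    assert low < mid < high
    print(float(mid))  # 160.5, then 160.25, then 160.125
    high = mid  # zoom in: the gap shrinks, but there's always a value inside it
```

Exact fractions are used because ordinary floating-point numbers have finite precision, so repeated halving with floats would eventually run out of representable values in between, which is a limitation of the computer rather than of heights.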

### Univariate v. Bivariate Data

Before we start analyzing, we need to make one more distinction between different types of data. Then our data can take a seat on the couch and we'll start getting to the root of its daddy issues.

Single-variable or univariate data refers to data where we're only observing one aspect of something at a time. With single-variable data, we can put all our observations into a list of numbers.

### Sample Problem

We take a group of people, measure their heights, and get this list of heights:

5'2'', 5'4'', 6'1'', 5'9'', 5'3''.

This is univariate data, since we're only observing one aspect (the height) of each person.
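Since univariate data is just one list, it's easy to store and work with. Here's a sketch that writes the heights above as (feet, inches) pairs and converts them into a single list of inches; the `to_inches` helper is hypothetical.

```python
# Heights from the sample problem as (feet, inches).
heights_ft_in = [(5, 2), (5, 4), (6, 1), (5, 9), (5, 3)]

def to_inches(feet: int, inches: int) -> int:
    """Convert a height in feet and inches to total inches."""
    return 12 * feet + inches

# One number per person: that's what makes it univariate.
heights_in = [to_inches(f, i) for f, i in heights_ft_in]
print(heights_in)  # [62, 64, 73, 69, 63]
```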

With two-variable, or bivariate data, we observe two aspects. We can put our observations into a table. The columns-and-rows kind, not the upending-and-throwing-across-the-room-in-a-rage kind.

### Sample Problem

We take a group of people and measure both the height and the weight of each person. This is bivariate data, since we have observations about two aspects (the height and weight) of each person.
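Bivariate data fits naturally into a table with one row per person and one column per aspect. The sketch below builds such a table; the heights and weights are hypothetical numbers invented purely for illustration, and `make_table` is a hypothetical helper.

```python
# Hypothetical measurements, made up for illustration only.
heights_in = [62, 64, 73]
weights_lb = [120, 135, 180]

def make_table(heights, weights):
    """Pair each person's height with their weight: one row per person."""
    if len(heights) != len(weights):
        raise ValueError("every person needs both a height and a weight")
    return list(zip(heights, weights))

table = make_table(heights_in, weights_lb)
for height, weight in table:
    print(f"{height:>6} in  {weight:>6} lb")
```

Keeping each height and weight together in the same row matters: it's what lets us later ask whether the two aspects are related.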