Chapter 2: How We Use Statistics

In this chapter, we will cover the following topics:

What are some of the basic terms we use in behavioral science statistics?
What are data and where do they come from?
What are reliability and validity and how do they influence data?
What are the types of claims we make about data?

2.1 What are data?

Before we can learn the story of numbers, there are a few terms we must define. I know definitions can seem boring and as a reader, you’re already tempted to skim this chapter. Fight the temptation! Understanding these definitions will make it a lot easier to understand how we use numbers to answer questions.

Statistics is a story about data, which is a catch-all term for things we are analyzing. Data can be anything from a random list of numbers to the text of a Shakespeare play. For the purposes of this class, data are a series of individual values called data points, often grouped in variables and observations.

A variable is something that we can measure with a series of data points. It is simply a list of values, where each value is a separate data point. Examples of variables are things such as the names of students in your class, the addresses of your aunts and uncles, the time of day people were born, the height of basketball players, and so forth. Notice that variables take many different forms of data, such as numbers, text, or times.

A variable in and of itself may not be that useful. For instance, a variable called age may have the values 33, 21, 48, 12, 69. And a variable called name could have the values “Fiona”, “Serena”, “Octavius”, “Kasper”, “Marshall”, and “Elena”. These two variables together can become more powerful if we connect them.

A dataset, as defined here, is a set of data grouped into variables and observations, where an observation is a set of related data points connected across variables. When we present and analyze data, we will always use columns to represent variables and rows to represent related observations. Often, each row will represent an individual person, with each column being specific information about that person. For instance, variables might be name, address, sex, age, and birthday. Each row would represent a person. Or, in an experiment, each row may represent a different participant, with their scores on each variable of interest, such as a different column for each type of personality test they took. Here’s an example, with each row indicating a different person and each column indicating a different variable:

Name	Age	Favorite type of food	Spatial Navigation Test Score
Ravi	37	Italian	41
Mandala	24	Peruvian	56
Reese	18	Mexican	39
David	57	Chinese	24

Setting up your data correctly is very important. I would say 50% of the stats questions I get from students doing research projects are about how to set up data. If you set up your data in this format, with rows representing related observations and columns representing variables, you’ll be set. I’ll talk more about this in Chapter 4, but it’s good to start thinking of data this way.
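As a rough sketch of this layout, the example table above could be stored in Python as a list of rows, where each row is one observation and each key is one variable. The key names (name, age, favorite_food, navigation_score) are labels chosen for this illustration, not a required format.

```python
# Each dictionary is one row (an observation about one person).
# Each key is one column (a variable). The key names are illustrative choices.
dataset = [
    {"name": "Ravi", "age": 37, "favorite_food": "Italian", "navigation_score": 41},
    {"name": "Mandala", "age": 24, "favorite_food": "Peruvian", "navigation_score": 56},
    {"name": "Reese", "age": 18, "favorite_food": "Mexican", "navigation_score": 39},
    {"name": "David", "age": 57, "favorite_food": "Chinese", "navigation_score": 24},
]

# Pulling out one variable means reading one column across every row.
ages = [row["age"] for row in dataset]

# Pulling out one observation means reading one row across every column.
first_person = dataset[0]

print(ages)
print(first_person["name"], first_person["favorite_food"])
```

Keeping every value for one person in the same row is what keeps the variables connected to one another.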

2.2 Getting data

When we measure variables, we are interested in measuring a population, which is the group we want to draw conclusions about. We decide which group that is based on what we are interested in studying. The population of interest can be as big or as small as we want it to be. In psychology, we may be interested in all humans, or a subset, like all children between the ages of 2 and 4, or all younger siblings. Or our groups may be more limited. A pollster might only be interested in registered voters in a specific county.

Usually, we can’t measure all the members of a population, so we have to use samples. A sample is any subset of a population. A person conducting a survey may only ask some of the shoppers in a store about their attitudes in order to assess what shoppers overall think about the store. Pollsters only ask a subset of voters to find out which candidates are likely to be elected. The hope with samples is that we can use samples to understand the population.

2.3 What can we use data for?

In my experience, statistics can be used to do three things with data: Describe, Infer, and Predict. Statistics has several branches, including a separate one for each of these purposes.

The first purpose of statistics is to describe and summarize data, which is generally called descriptive statistics, or the branch of statistics dedicated to using techniques to effectively describe data. This includes summary statistics such as means and standard deviations, which can give us an idea about large amounts of data. This also involves graphs, which can transform many numbers into beautiful and informative figures.
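As a minimal sketch of summary statistics, the snippet below computes a mean and a standard deviation for the four ages in the example table from section 2.1, using Python's built-in statistics module. The variable names are illustrative.

```python
import statistics

# Ages from the example table in section 2.1.
ages = [37, 24, 18, 57]

mean_age = statistics.mean(ages)  # the average value
sd_age = statistics.stdev(ages)   # sample standard deviation (spread around the mean)

print("Mean age:", mean_age)
print("Standard deviation:", round(sd_age, 2))
```

With only four values, the summary is hardly needed, but the same two lines would work for thousands of data points.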

Inferential statistics is a branch of statistics concerned with using samples to understand and make conclusions about populations. Samples are subsets of populations and every sample may be different. I may want to find out what colors are popular. If I ask 20 people at random what their favorite color is, I would get one set of results. Then if I ask a different set of people their favorite color, I would likely get different results. Inferential statistics helps us to make conclusions about populations with samples.
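The favorite color example can be sketched in a few lines of Python. The population below and its proportions are invented purely for illustration. The point is only that two random samples of 20 from the same population will typically give somewhat different counts.

```python
import random
from collections import Counter

# Hypothetical population of 1,000 people and their favorite colors.
# The proportions are invented purely for illustration.
population = ["blue"] * 400 + ["green"] * 300 + ["red"] * 200 + ["purple"] * 100

rng = random.Random(1)  # fixed seed so the example is repeatable

# Ask two different groups of 20 people chosen at random.
first_sample = rng.sample(population, 20)
second_sample = rng.sample(population, 20)

first_counts = Counter(first_sample)
second_counts = Counter(second_sample)

print("First sample: ", dict(first_counts))
print("Second sample:", dict(second_counts))
```

Neither sample is wrong. Each is one imperfect view of the same population, which is the problem inferential statistics is designed to handle.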

Models are mathematical functions which allow us to use data to predict. They can be simple lines or complex models which can use millions of variables to predict the weather. Models are never perfect but can be very useful if used correctly.
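Here is a minimal sketch of the simplest kind of model, a straight line. It uses ordinary least squares, which is one common way to fit a line and is my choice for this illustration. The data values and the names x, y, and predict are hypothetical.

```python
# Hypothetical data: x could be hours of practice, y a test score.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept for the line y = intercept + slope * x.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x


def predict(new_x):
    """Use the fitted line to predict y for a new value of x."""
    return intercept + slope * new_x


print("Predicted y when x = 6:", round(predict(6), 2))
```

The line does not pass through every data point, which is a small-scale version of the idea that models are never perfect.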

2.4 Where do data come from?

I am a psychologist. I became a research psychologist because psychology is hard. I believe psychologists make some of the best statisticians because psychology is so difficult.

Psychology is hard because what we study is really hard to observe. Psychology can be defined as “the study of the brain, mind, and behavior”. But how does one observe these things and then turn those observations into data? Behavior is fairly easy to observe, but psychologists are interested in all those unseen processes in the mind as well. How do you measure a thought? An emotion? A memory? No one in their right mind (pun definitely intended) would deny that there is a mind and that mental concepts like memory, attention, thoughts, emotions, and so forth are real. But how do we measure them?

Traditionally, science used to be about measuring things we could observe with our basic senses. For instance, physicists could observe ideas like motion and energy. They could agree on basic definitions of these concepts, like agreeing that temperature could be measured by how much mercury expanded in a tube. This ability to objectively observe phenomena is one of the most important elements of science and is one reason many people used to consider psychology not to be a science. To address these concerns, behaviorists wanted to apply this approach to psychology and only study stimuli and behaviors which could be objectively observed and ignore mental processes which cannot be observed.

Fortunately, science has moved well beyond this basic definition. Now we accept we can study phenomena that we can’t directly observe if we can operationalize them. Operationalization is where we get our data from. When we measure something, we have a conceptual idea of what we want to measure, which is called a construct. A construct is a set of concepts that is usually subjective and generally has a theoretical meaning. A construct is often something that we know exists and can often easily understand but is often hard to define. For instance, you can try defining time. You can take a minute, or if you’re a philosopher, take a life-time (another awful pun!). We all know what time is, but it’s very hard to explain the concept.

Color is another example. It’s really hard to define different colors precisely, and if you’ve ever had an argument about whether something is sea green or robin’s-egg blue, this becomes obvious. But we generally agree on what certain colors are. Taking a shade of color and calling it “blue” is operationalizing.

In statistics, we like to be precise about operationalizing. An operational definition is a precise set of steps that takes a concept and turns it into a specific data point. Operational definitions have to be clearly defined and able to be reproduced in many different circumstances. Here are some examples of operationalizing color:

Take the wavelength of the light reflected by a color and say that certain colors are defined by certain wavelengths as measured by a specific machine.
Ask a famous artist to label colors based on their expertise
Ask twenty people to choose what word best describes a particular shade of color and take the most frequent response
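The third definition in the list above is precise enough to write as code. The sketch below assumes twenty hypothetical responses. The color words and their counts are invented for illustration.

```python
from collections import Counter

# Hypothetical responses from twenty people shown the same shade.
responses = ["blue"] * 9 + ["teal"] * 6 + ["green"] * 4 + ["turquoise"] * 1

# The operational definition: the color's label is the most frequent response.
counts = Counter(responses)
label, votes = counts.most_common(1)[0]

print(f"Operationalized label: {label} ({votes} of {len(responses)} responses)")
```

Anyone who follows the same steps with the same responses gets the same label, which is what makes the definition reproducible. It says nothing about what to do in the case of a tie, which is one of the gaps a careful operational definition would need to close.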

Each of those definitions has problems. There is no correct way to operationalize a concept but there are many wrong ways, and some ways are clearly better than other ways. Here is another example. I want to operationalize how fast schoolchildren can run. I could do the following:

Ask children to tell me whether they consider themselves “fast”, “average”, or “not fast”
Have children run a 100 meter dash and time them with a stopwatch
Have children run a 1 mile race and write down what place they finished in, such as first place, second place, etc.

Each of these definitions gives us a clear data point I can associate with each child. The steps are reproducible, such that any person could generate similar data using the same definition. However, each of these definitions measures a different concept. How fast a person can run is a broad concept. It could reflect speed in a short distance or speed in a long distance. It could reflect how fast people run in a race versus how fast they run alone.

As mentioned above, some operational definitions are better than others. If a person wants to make data fit a specific story, they can change the definition to make that story look better. For instance, if a new police commissioner wants to show that they have reduced crime, one thing they can do is redefine crime itself. Many crime surveys ignore crimes that are committed by prisoners in prison. Under that definition, one way to reduce crime is to place more people in prison. This can reduce the crime rate because it moves people to prison, where their crimes won’t be counted. This shows the importance of using the same definition and sticking with it, even if it has some problems.
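The arithmetic behind this example can be sketched with made-up numbers. Every figure below is hypothetical and chosen only to show how the two definitions respond when the same crimes move from outside prison to inside it.

```python
# All numbers are hypothetical and chosen only to show the arithmetic.
crimes_outside_prison = 900
crimes_inside_prison = 100

# Definition A counts every crime; definition B ignores crimes in prison.
count_all = crimes_outside_prison + crimes_inside_prison
count_excluding_prison = crimes_outside_prison

# Suppose 50 crimes that would have happened outside now happen inside prison.
shifted = 50
count_all_after = (crimes_outside_prison - shifted) + (crimes_inside_prison + shifted)
count_excluding_after = crimes_outside_prison - shifted

print("Counting all crimes:    ", count_all, "->", count_all_after)
print("Excluding prison crimes:", count_excluding_prison, "->", count_excluding_after)
```

Under the first definition the total does not change. Under the second it drops, even though the same number of crimes occurred.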

2.5 How accurate are our data?

When we use the word “accuracy”, we’re measuring how much our numbers match the thing they are supposed to measure. In science, accuracy consists of two principles: validity and reliability. Reliability is the degree to which our measurements are stable or consistent. For instance, if we measure the same thing repeatedly, will we get the same results? Validity is the degree to which our data reflect the concept we are trying to assess. There are many types of validity, which are much better discussed in a research methods textbook. But I’ll give a brief overview here.

One data example comes from my fitness tracker, which is a band I wear on my arm that estimates how many steps I take, how many floors I climb, and my heart rate. My fitness tracker also tells me how many miles I’ve walked today. A reliable fitness tracker will give the same distance and step estimates each time I walk the same distance. For instance, if I walk ten laps around a track, the number of steps it says I took each lap should be similar, since each lap is the same distance. However, if it measures .2 miles one lap and then .9 miles the next lap, the reliability is probably very low.

High reliability doesn’t necessarily mean high accuracy. Imagine every time I run a mile, the fitness tracker says I only ran .8 miles. The readings are reliable, because they are close to one another, but they are reliably underestimating my distance. Or likewise, my tracker might double-count each step. Each time I run a mile, it suggests I ran 2 miles. It is reliable but it is reliably overestimating my distance.

Validity is a much more philosophical concept than reliability, and one that is often outside of the scope of statistics. At its simplest, validity represents whether the data I collect are a good measure of the concept. In the steps example above, it reflects how well my fitness tracker actually counts steps or measures distance. If the fitness tracker is close to the actual number of steps, it is valid.

Validity also represents how well variables are operationalized. Variables that are simple to operationalize, like steps or distance, are easy to examine, but most variables in psychology are harder to operationalize. Many examples of how people misuse statistics come from using less valid operational definitions.

In the example above, redefining crime by not counting prison crime may cause crime rates to decline as more people are incarcerated. This may make it appear that crime rates are lower and that higher rates of incarceration reduce crime. If you are trying to count the number of times that a crime happens overall, this might not be a very valid definition. However, if you’re trying to use crime as an index of how likely a person is to be a victim of a crime outside of prison, prison crime is much less relevant. In that case, it may be a good idea to redefine crime to not include prison crime.

The final thing to note is that a variable can be reliable, valid, both, or neither. A reliable but not valid variable will give the same answer each time, but will consistently overestimate or underestimate the true value. For instance, if I have a blood pressure machine that always reads my blood pressure as 10 points higher than it really is, it is reliable but not valid. A variable can also be valid but not reliable. If, for instance, my fitness tracker sometimes overestimates and sometimes underestimates my steps, but on average is pretty close to the actual number of steps I take, it would be valid but not reliable.
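One rough way to put numbers on these two ideas is to treat reliability as how little the readings scatter and to treat this simple kind of validity as how close the readings are, on average, to the true value. That is a simplification, since validity is a broader concept than average error. The readings, the true value, and the helper names bias and spread below are all hypothetical.

```python
import statistics

true_value = 120  # hypothetical true blood pressure

# Machine A always reads 10 points high: consistent, but off target.
machine_a = [130, 130, 130, 130, 130]

# Machine B scatters widely, but its readings average out to the true value.
machine_b = [105, 135, 120, 110, 130]


def bias(readings, truth):
    """Average reading minus the true value (0 means on target on average)."""
    return statistics.mean(readings) - truth


def spread(readings):
    """Population standard deviation of the readings (0 means perfectly consistent)."""
    return statistics.pstdev(readings)


for name, readings in [("A", machine_a), ("B", machine_b)]:
    print(name, "bias:", bias(readings, true_value), "spread:", round(spread(readings), 2))
```

Machine A has no spread but a bias of 10, so it is reliable but not valid in this sense. Machine B has no bias on average but a large spread, so it is valid but not reliable.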

2.6 Claims about data

We want to figure out what kind of stories data are telling. Many times we do this by having specific questions and using data to answer those questions. In psychology, we often do this by designing experiments with the purpose of answering specific types of questions, which I will call claims.

2.6.1 Frequency claims

The first type of claim is the frequency claim. These are claims about how often or how frequently something happens. If I am interested in how frequently something occurs in a group of people or if something is more frequent in one group of people versus another group of people, this would be a frequency claim.

A frequency claim may be a poll examining what percent of people will vote for a specific candidate. It might be an analysis about what percent of the population has a specific attitude or belief. It could be a measure of whether a mental disorder occurs more frequently in a certain sample than in the population at large.

It could be a comparison between two groups, like if I set up an experiment comparing children who saw a violent TV show versus children who did not. My hypothesis is that a greater percentage of children who saw the violent TV show would engage in violent play than the percentage of children who did not see the violent TV show.
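A frequency comparison like this one comes down to computing a percentage for each group. The counts below are hypothetical and invented only to illustrate the calculation.

```python
# Hypothetical counts, invented only to illustrate the comparison.
groups = {
    "saw violent show": {"violent_play": 18, "total": 40},
    "did not see show": {"violent_play": 10, "total": 40},
}

# A frequency claim compares how often something happens in each group.
percentages = {
    name: 100 * counts["violent_play"] / counts["total"]
    for name, counts in groups.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct:.0f}% engaged in violent play")
```

A gap between two sample percentages is only the starting point. Whether it says anything about children in general is a question for inferential statistics.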

Frequency claims often compare categorical variables, which are variables that only indicate what category a person is in and nothing else. A categorical variable can be a word, such as “female” or “student”. It indicates what group a person is in. Some categorical variables are dichotomous, meaning there are only two options for groups. Often these are cases where the options are yes or no, like if I indicated whether a person is a smoker or is not a smoker.

Categorical variables can also be numbers, but in this case the numbers are only indicating what group a person is in. They do not have any other meaning. Zip codes are an example. For example, the zip code 33472 is not 11000 more of anything than the zip code 22472. In this case, the numbers only tell us what category a person lives in. Whether a zip code is higher or lower means nothing; it could just as easily be expressed as a word. This matters because doing math on categorical numbers is meaningless. For instance, the total of the zip codes in a group of data would be meaningless.
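The difference between counting categories and doing math on them can be shown in a short sketch. The list of zip codes below is hypothetical.

```python
from collections import Counter

# Hypothetical zip codes for five people, stored as text rather than numbers
# so that nobody is tempted to do arithmetic on them.
zip_codes = ["33472", "22472", "33472", "33472", "22472"]

# Counting how many people fall in each category is meaningful.
counts = Counter(zip_codes)
print(counts)

# Adding the codes together produces a number, but it describes nothing real.
meaningless_total = sum(int(z) for z in zip_codes)
print(meaningless_total)
```

Python will happily compute the total, which is exactly the danger: software does not know which numbers are categories.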

In some cases, we dummy code categories using numbers. For instance, I may code a variable indicating smoking status as 0 for non-smoker or 1 for smoker. Even though I’m using numbers, the numbers only indicate category, and the fact that one category is coded 0 and the other 1 doesn’t mean the latter category is more or better or higher in any way. This is done for convenience and often happens in data where each of the categories has a very long name.
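Here is a minimal sketch of dummy coding smoking status. The six participants and the names status and codes are hypothetical.

```python
# Hypothetical smoking status for six participants.
status = ["smoker", "non-smoker", "non-smoker", "smoker", "non-smoker", "non-smoker"]

# Dummy code: 0 for non-smoker, 1 for smoker. The numbers are only labels.
codes = {"non-smoker": 0, "smoker": 1}
smoker_coded = [codes[s] for s in status]

print(smoker_coded)

# With a 0/1 code, the average of the column is the proportion coded 1.
proportion_smokers = sum(smoker_coded) / len(smoker_coded)
print(proportion_smokers)
```

One convenience of a 0/1 code is that the average of the column equals the proportion of people in the category coded 1. Swapping which category gets the 1 would change that number but not the information in the data.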

2.6.2 Association claims

Association claims are claims where a person is examining whether two or more variables are related to one another. This includes ideas such as correlation or even examining whether one variable predicts another variable.

In association claims, sometimes we just want to see whether two variables are related. For instance, we could examine if there’s an association between how extraverted a person is and how many friends they have. Or I could examine whether there’s an association between a person’s relationship satisfaction and how much time they spend with their partner. Or I could see whether the number of words in a list is associated with how many words a person remembers.

In some association claims, I want to examine whether two variables are related without having any idea whether one variable is causing the other variable. However, in other cases, I may have a clear idea that one variable is causing another variable. For instance, I may have the hypothesis that the amount of time a couple spends together is related to their relationship satisfaction. In that case, I would call the amount of time a predictor variable and the relationship satisfaction the outcome variable. Sometimes people call these independent and dependent variables respectively, though independent variables usually refer to variables where the experimenter manipulates something.

Association claims are typically tested with correlations and regressions. The key thing to know about association claims is this: even though we may have an idea which variable causes the other variable, we can never know for sure from the association alone. This is why I go around shouting “correlation does not imply causation”. I may think that the amount of time a couple spends together is causing their relationship satisfaction, but the opposite may be true. Relationship satisfaction may be the thing which causes a couple to spend time together or not.
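The sketch below computes a Pearson correlation coefficient by hand for the couples example. The data for six couples are hypothetical, and pearson_r is a helper written for this illustration. Computing the correlation in both directions gives the same value, which is one way to see that the number itself carries no information about which variable causes which.

```python
import math

# Hypothetical data for six couples.
hours_together = [5, 10, 12, 20, 25, 30]  # hours per week
satisfaction = [3, 4, 6, 6, 8, 9]         # rating on a 1-10 scale


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


r_xy = pearson_r(hours_together, satisfaction)
r_yx = pearson_r(satisfaction, hours_together)

# The two values match: the correlation does not say which variable comes first.
print(round(r_xy, 3), round(r_yx, 3))
```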

Correlation may not imply causation but often it’s as good as we can get in science. This is because it’s often unethical or impossible to test things using real experiments, where we carefully manipulate a variable and see how it affects another variable. One example is the well-known link between smoking and cancer. This is only a correlation because we can’t do a study where we assign one group to smoke for many years and assign another group to not smoke for many years and examine which group gets cancer more. That experiment would be impossible.

Because this was only a correlation, cigarette companies stated for years that science did not prove that smoking caused cancer. They were correct when they said this and many smokers used these statements to justify their continued smoking. However, when comparing the biological evidence that there are many carcinogens in cigarette smoke with the strong correlation between the amount and time a person smokes and their likelihood of cancer, we can make a very strong case that smoking causes cancer.

2.6.3 Group mean claims

The third type of claim is what I call a group mean claim, or a claim about the mean (or average) of a sample. These claims are questions examining whether the mean of something in a sample is different from a population or whether the mean of something is higher in one group than another group.

I might examine whether there is a difference in the mean GPA of high schoolers who play sports versus those who do not. Or I might want to look at whether the mean score on an anxiety measure is lower in a group of people who meditate regularly than in the population at large. Another example might be looking at whether there are differences in mean reaction time among a group that drank no coffee, a group that drank one cup, and a group that drank two cups.

In all those cases, I have a measure which can be turned into a number for each person in my study. Then I look at whether the mean of that number (the average) is different in one group versus another group or whether the mean of a sample is the same or different from the population.
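For the coffee example, the first step of a group mean claim might look like the sketch below. The reaction times are hypothetical and the group labels are illustrative.

```python
import statistics

# Hypothetical reaction times in milliseconds for three groups.
reaction_times = {
    "no coffee": [310, 295, 320, 305],
    "one cup": [290, 285, 300, 295],
    "two cups": [280, 270, 285, 275],
}

# A group mean claim compares the average of a numeric variable across groups.
group_means = {group: statistics.mean(times) for group, times in reaction_times.items()}

for group, mean_time in group_means.items():
    print(f"{group}: mean reaction time {mean_time} ms")
```

Notice that each person contributes a number rather than a category, which is what allows the values to be averaged in the first place.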

A mean claim is different from a frequency claim because a mean claim involves a variable that is not categorical but can be a number which is averaged. Mean claims involve means which are usually not percents or proportions or frequencies.

2.7 Summary

To summarize, there are a lot of things we can use statistics for. In this chapter, you should have learned about the following terms:

Principles of data

Data
Observations
Population
Sample

Types of statistics

Descriptive Statistics
Inferential Statistics
Models

Where do data come from

Construct
Operationalize
Reliability
Validity

Types of claims

Frequency claims
Categorical variable
Dichotomous variable
Dummy code
Association claims
Predictor variable
Outcome variable
Group mean claims