If you’ve ever lived near a research hospital, you may have come across an advertisement that reads something like this:

Seeking individuals between the ages of 25 and 75 to participate in a 10-week clinical trial. Participants must currently be experiencing symptoms of insomnia with trouble sleeping more than 6.5 hours a night for at least a 3-month period. Contact XXX at the Medical Center for further information.

An ad like this is recruiting a sample of participants for a study. Statistical inference is the process of collecting a sample (or subset) from the population you are trying to study and then using the data from that sample to draw conclusions about the population.

Medical teams are among the many researchers and analysts who use statistical inference in their work. In a clinical trial, clinicians and researchers collect data and run experiments on a sample group of patients to study the efficacy and safety of new treatments and drugs.

In other settings, researchers and analysts use sampling and statistical inference to improve products, study ecosystems, gauge political sentiment, and much more!

In this article, we’ll cover the difference between populations and samples and explain why sampling and statistical inference are integral to statistics.

What Is a Population?

In statistics, a population is the entire group of subjects that your research question is about.

Let’s look at some examples:

If you’re interested in how many voters will vote for the Republican nominee in the next presidential election, your population consists of all U.S. voters.

If you’re interested in how many voters will vote for the Democratic candidate in the next Illinois governor’s race, your population of interest is all Illinois voters.

If a medical team wants to know whether a certain drug improves outcomes for people who have insomnia, the population of interest consists of all people who have insomnia.

What Is a Sample?

A sample is a subset of a population. Statisticians use sample data to make inferences about a population when data for the entire population is unavailable. This process is called inferential statistics or statistical inference.

Here are some examples of samples:

If you’re interested in how many voters will vote Republican in the next presidential election, you might poll a group of people from across the U.S. and ask them who they plan to vote for.

A medical research team could run a clinical trial on a sample of 500 patients to study a new treatment's efficacy and possible side effects.

A wildlife biologist studying the migration behavior of American bison would have difficulty tracking every bison, but she could track and study the behavior of a sample group of bison.

Sampling Methods

For statistical inference to work, samples need to be representative of the population from which they are drawn. If your sample is not representative of the larger population, your inferences will be biased or, worse, completely misleading.

As an example, imagine trying to gauge the popularity of a new plan in your city to build bike lanes. If you only sample residents who own bikes, your sample would not represent the views of the overall population. Support for the bike lanes is likely to be very high within your sample, leading you to overestimate the plan's popularity among the broader population of city residents.
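To make the bike-lane example concrete, here is a small Python simulation. Every number in it is a made-up assumption for illustration: a hypothetical city of 10,000 residents, 20% of whom own bikes, with 90% support for the plan among bike owners and 40% among everyone else. It compares a sample of bike owners only with a simple random sample of all residents.

```python
import random

def build_city(n_residents=10_000, share_bike_owners=0.2,
               support_if_owner=0.9, support_if_not=0.4, seed=1):
    """Return a hypothetical city as a list of (owns_bike, supports_plan) pairs.

    All rates are illustrative assumptions, not real survey data.
    """
    rng = random.Random(seed)
    residents = []
    for _ in range(n_residents):
        owns_bike = rng.random() < share_bike_owners
        p_support = support_if_owner if owns_bike else support_if_not
        residents.append((owns_bike, rng.random() < p_support))
    return residents

def support_rate(group):
    """Fraction of residents in `group` who support the bike-lane plan."""
    return sum(supports for _, supports in group) / len(group)

city = build_city()
rng = random.Random(2)

# Biased sample: only bike owners are asked.
owners = [resident for resident in city if resident[0]]
biased_sample = rng.sample(owners, 500)

# Simple random sample: every resident is equally likely to be asked.
random_sample = rng.sample(city, 500)

print(f"True support (whole city): {support_rate(city):.2f}")
print(f"Bike-owners-only sample:   {support_rate(biased_sample):.2f}")
print(f"Simple random sample:      {support_rate(random_sample):.2f}")
```

With these assumed rates, true support works out to about 50% (0.2 * 0.9 + 0.8 * 0.4), so the bike-owners-only sample lands near 90% while the random sample stays close to the true value.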

To ensure a good sample, statisticians rely on various sampling methods. The gold standard among these techniques is the simple random sample, in which selection is randomized and every member of the population has an equal chance of being selected.
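Python's standard library can draw a simple random sample directly: `random.sample` selects items without replacement, so no member is picked twice. The sketch below wraps it in a hypothetical helper, `simple_random_sample`, and repeats the draw many times on a toy population of 20 labeled members to check that each member is selected about n / N of the time.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng=random):
    """Draw n distinct members at random, without replacement (hypothetical helper)."""
    return rng.sample(population, n)

population = list(range(1, 21))  # 20 hypothetical members, labeled 1..20
rng = random.Random(0)

# Repeat the draw many times and count how often each member is selected.
trials = 20_000
counts = Counter()
for _ in range(trials):
    counts.update(simple_random_sample(population, 5, rng))

# Each member should appear in roughly n / N = 5 / 20 = 25% of the samples.
for member in population[:3]:
    print(member, round(counts[member] / trials, 3))
```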

When you can’t collect a random sample, you can rely on methods that favor convenience and feasibility over purely random selection. Be aware, however, that these methods can introduce more bias and sampling error.

The size of a sample is another important aspect of sampling. To get a representative sample, your sample size needs to be large enough to represent the diversity of views or outcomes in your population.
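One way to see why size matters is to simulate it. The sketch below assumes a hypothetical population of 100,000 normally distributed values (mean 50, standard deviation 10), draws 500 random samples at each of several sizes, and measures how much the sample mean varies from sample to sample. That spread, a rough gauge of sampling error, shrinks roughly in proportion to 1 / √n. A larger sample reduces random sampling error, but it does not fix a biased selection method such as surveying only bike owners.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 100,000 values from a normal distribution
# with mean 50 and standard deviation 10 (illustrative assumptions).
population = [rng.gauss(50, 10) for _ in range(100_000)]

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the sample mean across many random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

for n in (10, 100, 1_000):
    print(n, round(spread_of_sample_means(n), 2))
```

Under these assumptions the printed spreads should come out near 3.2, 1.0, and 0.3 (10 divided by √10, √100, and √1000).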

Population Parameters and Sample Statistics

You can use the terms parameter and statistic to distinguish between measurements taken from a population and measurements taken from a sample.

A parameter is a measurement, such as a mean or a standard deviation, describing some aspect of a population.

A statistic is a measurement, such as a mean or standard deviation, describing a sample.
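The distinction is easy to see in code. This sketch uses Python's `statistics` module on a tiny hypothetical population of eight values and a hypothetical four-member sample. The names `mu`, `sigma`, `x_bar`, and `s` follow common textbook notation rather than anything defined in this article. Note that `statistics.pstdev` divides by N, as suited to a full population, while `statistics.stdev` divides by n - 1, the usual choice when estimating variability from a sample.

```python
import statistics

# A tiny hypothetical population, small enough to measure in full.
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Parameters: computed from every member of the population.
mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

# Statistics: computed from a sample of that population.
sample = [4, 5, 7, 9]                  # a hypothetical sample of four members
x_bar = statistics.mean(sample)        # sample mean
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)

print(mu, sigma)          # 5 2.0
print(x_bar, round(s, 3)) # 6.25 2.217
```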

Say you want to know the mean (or average) household income in India. That is a population parameter: the average household income across all households in India.

Suppose you are unable to collect data for every Indian household. In that case, you cannot calculate the population mean directly. Instead, you can gather data for a sample of households and use the sample mean to estimate the true population parameter.
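The same idea can be sketched with simulated data. The incomes below are not real figures for India: they are 200,000 made-up values drawn from a right-skewed log-normal distribution, chosen purely as an assumption for illustration. Because the population is simulated, we can compute its mean and check how close the mean of a random sample of 2,000 households gets; in a real survey, only the sample mean would be available.

```python
import random
import statistics

rng = random.Random(7)

# Simulated stand-in for "all households": 200,000 made-up incomes from a
# log-normal distribution. These are NOT real figures for India.
households = [rng.lognormvariate(12, 0.8) for _ in range(200_000)]

# In practice this parameter is unknown; we can compute it here only
# because the population is simulated.
population_mean = statistics.fmean(households)

# What a survey team could actually do: measure a random sample of households.
sample = rng.sample(households, 2_000)
sample_mean = statistics.fmean(sample)

relative_error = abs(sample_mean - population_mean) / population_mean
print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
print(f"relative error:  {relative_error:.1%}")
```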

Population vs Sample

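In summary, here are the main differences between a population and a sample in statistics:

A population is the entire group of subjects your research question is about. A sample is a subset of that population.

Measurements that describe a population are called parameters. Measurements that describe a sample are called statistics.

Data for an entire population is often unavailable, so you collect data from a representative sample and use it to make inferences about the population.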
