Statistics

What Is a Residual in Stats?

03.02.2022 • 8 min read

Sarah Thomas

Subject Matter Expert

This article gives a quick definition of a residual, explains how to read one, and shows how to use residuals with statistical models.

What is a Residual?

A residual (sometimes loosely called an error) is the difference between the actual value in your data and the value your model predicts for it. We often denote a residual with the lowercase letter $e$.

Calculating residuals is easy. You can find residuals using the following equation.

$e = y - ŷ$

$e$ is the residual for a given observation of a variable
$y$ is the actual or observed value of $y$
$ŷ$ is the predicted value of $y$
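
As a minimal sketch, the equation above translates directly into a one-line Python function. The function name `residual` is an illustrative choice, not something defined in this article.

```python
def residual(actual, predicted):
    """Return the residual e = y - y_hat for a single observation."""
    return actual - predicted


# An observed value of 61 with a predicted value of 64 lies below the
# prediction, so the residual is negative.
print(residual(61, 64))  # -3
```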

Example and Practice Finding Residuals

The following table contains a data set with the weights and heights of ten individuals. The scatter plot after the table shows the same data, with weight on the horizontal axis (the x-axis) and height on the vertical axis (the y-axis).

Weight (x)	Height (y)
127 lbs	64 in
138 lbs	61 in
145 lbs	68 in
155 lbs	65 in
159 lbs	70 in
163 lbs	71 in
175 lbs	67 in
180 lbs	76 in
190 lbs	75 in
220 lbs	78 in

Graph 1: Scatter plot of the weight and height data

In statistics, you will often use a model to approximate the relationship between two variables. This model could take the form of a line (a linear model) or a curve (a non-linear model).

Linear Model

In this example, we’ll use a linear model to approximate the relationship between the weights and heights in our data set (the linear model is represented by the green line in the figure below).

Using this model, we can now estimate a person’s height given their weight. For example, if we know a person weighs 132 lbs, our model estimates 63 inches or 5 ft 3 inches for the person’s height.

Notice that the model does not perfectly line up with the data. If it did, every point on the scatter plot would fall directly on the line, and the predictions of the model would match the data perfectly. Instead, we see a discrepancy between the data and the heights predicted by the model.

For example, take a look at the actual and predicted heights associated with a weight of 138 lbs. Our model predicts that a person weighing 138 lbs will be 64 inches tall, but the data shows a person weighing 138 lbs who is only 61 inches tall.

For any given weight, the difference between the actual height we observe in the data ($y$) and the predicted height given by the model ($ŷ$) is what we call the residual. For the observed data point (138, 61), the residual is $e = y - ŷ = 61 - 64 = -3$.

Note that when the actual value from your data lies below the linear model ($y < ŷ$), you will get a negative residual. When the actual value lies above the linear model ($y > ŷ$), you will get a positive residual. This is because you always subtract the predicted value from the actual value to find the residual.

Graph: A linear model resulting in a negative residual

Now that you know how to find residuals, see if you can find the residuals for the remaining data points. Use the actual and predicted values of $y$ in the second and third columns of the table below to fill out the last column, labeled “Residuals (e).”

Weight (x)	Actual Height (y)	Predicted Height (ŷ)	Residuals (e)
127 lbs	64 in	62 in
138 lbs	61 in	64 in	-3
145 lbs	68 in	65.5 in
155 lbs	65 in	67.2 in
159 lbs	70 in	68.1 in
163 lbs	71 in	68.85 in
175 lbs	67 in	72.12 in
180 lbs	76 in	71.2 in
190 lbs	75 in	74 in
220 lbs	78 in	80 in

Answer key: (From top to bottom) 2, -3, 2.5, -2.2, 1.9, 2.15, -5.12, 4.8, 1, -2
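
If you want to check your work programmatically, the short sketch below applies $e = y - ŷ$ to every row of the table. The list names are illustrative, and the rounding to two decimal places is only there to hide floating-point noise.

```python
# Actual and predicted heights in inches, copied from the table above.
actual = [64, 61, 68, 65, 70, 71, 67, 76, 75, 78]
predicted = [62, 64, 65.5, 67.2, 68.1, 68.85, 72.12, 71.2, 74, 80]

# Residual for each row: actual minus predicted, rounded to 2 decimal places.
residuals = [round(y - y_hat, 2) for y, y_hat in zip(actual, predicted)]

for y, y_hat, e in zip(actual, predicted, residuals):
    print(f"actual={y:>3}  predicted={y_hat:>6}  residual={e:>6}")
```

The values in `residuals` should match the answer key, row for row.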

Interpreting Residuals

One of the biggest challenges of building a statistical model is deciding which model to use. How do you know whether to use a linear or a non-linear model? If you decide to use a linear model, how do you know what the slope and intercept of the line should be? What is the line of best fit?

Residuals are very useful for determining which models are best suited to a particular data set. Using a residual plot, we can judge whether a linear or a non-linear model is preferable. We can also use the sum of the squared residuals to find the model that fits the data most closely. We’ll cover both of these topics next!

What is a Residual Plot?

A residual plot is a scatter plot with the residuals of a variable plotted on the y-axis and the values of the x-variable plotted on the x-axis.

Continuing from our example above, let’s create a residual plot for our data on heights and weights. Earlier, we found that the residuals for our data were: 2, -3, 2.5, -2.2, 1.9, 2.15, -5.12, 4.8, 1, -2. Our residual plot has these residual values on the y-axis and the observed weights on the x-axis.

A linear model is appropriate when there is a linear relationship (either a positive or a negative correlation) between your variables. A non-linear model is appropriate when the relationship curves, for example when the association between the variables switches between positive and negative. Sometimes, a scatter plot of your data will clearly show a linear or non-linear trend, but sometimes the pattern can be more ambiguous. If this is the case, you can use a residual plot to determine which model to use.

When a residual plot shows the residual values plotted randomly above and below the x-axis, then a linear model is a good fit for the data. This was the case in the residual plot for heights and weights, so we were right to use a linear model instead of a non-linear model.

When the plotted residuals follow a u-shaped pattern or an inverted u-shaped pattern, then a non-linear model is better suited for your data.

Graph - When plotted residuals follow u-shaped pattern or inverted u-shaped pattern, then use non-linear model
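
To see the difference without drawing a plot, the sketch below prints the sign of each residual from left to right along the x-axis. It compares the residuals from the heights and weights example with the residuals of a straight line fitted to hypothetical curved data ($y = x^2$). The curved data set, the helper names, and the use of a sign string as a stand-in for a residual plot are all assumptions made for illustration; this is a rough visual aid, not a formal test.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sign_pattern(residuals):
    """Return '+' for each positive residual and '-' otherwise, in order."""
    return "".join("+" if e > 0 else "-" for e in residuals)


# Residuals from the heights and weights example, ordered by weight.
observed_signs = sign_pattern([2, -3, 2.5, -2.2, 1.9, 2.15, -5.12, 4.8, 1, -2])

# Hypothetical curved data: fit a straight line to y = x^2 and inspect residuals.
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
a, b = fit_line(xs, ys)
curved_signs = sign_pattern([y - (a + b * x) for x, y in zip(xs, ys)])

print(observed_signs)  # +-+-++-++-  (signs are mixed, no clear pattern)
print(curved_signs)    # +----+      (positive at both ends: a U shape)
```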

Concept of Linear Association and Linear Regression

So far, we have talked a lot about statistical models. These models are called regressions. A linear regression model, or regression line, is the same thing as the linear models we have been discussing so far. It is a model of the association between two variables: a dependent variable $y$ (also called an outcome or response variable) and an independent variable $x$ (also called an explanatory variable).

Because a linear regression is a line, we can express linear regressions using the equation for a line.

As you may recall from a geometry class, the equation for a line is y=mx+b, where:

$y$ represents the value of the variable plotted on the y-axis
$x$ represents the value of the variable plotted on the x-axis
$m$ represents the slope of the line
$b$ represents the y-intercept of the line

We use the same equation but with slightly different notation when dealing with linear regression. You’ll often see a linear regression expressed in one of the following ways.

Linear Regression Equation

$ŷ = a + bx$ or $ŷ_{i} = β_{0} + β_{1}x_{i}$

$ŷ$ or $ŷ_{i}$ is the predicted value of $y$
$a$ or $β_{0}$ is the vertical intercept of the linear regression
$b$ or $β_{1}$ is the slope of the linear regression (also referred to as the regression coefficient)
$x$ or $x_{i}$ is the value of the x-variable associated with a particular value of $y$
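
Combining this with the residual equation $e = y - ŷ$ from earlier, the residual for observation $i$ can be written in terms of the intercept and slope. This only restates the two equations already given; the subscript $i$ on $e$ and $y$ is added here to match $ŷ_{i}$ and $x_{i}$.

```latex
e_{i} = y_{i} - \hat{y}_{i} = y_{i} - (\beta_{0} + \beta_{1} x_{i})
```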

Residuals and Linear Regressions

Once we’ve determined that a linear regression, as opposed to a non-linear one, should be used, we can continue to use residuals to determine which line is the “best fit” for the data.

Ordinary Least Squares

A commonly used method for this is called the Ordinary Least Squares (OLS) method. In OLS, you choose the linear regression that minimizes the sum of the squared residuals. By doing this, you are essentially minimizing the discrepancies between the data and your model.

We square the residuals because some residuals are positive while others are negative. If we don’t square the residuals, the negative residuals will cancel out the positive residuals in our calculations. The reason why we don’t take the absolute value of the residuals instead of taking the squares is that we want to give more weight to very large residuals and make sure any large residuals are minimized. (A large residual squared will become an even larger number compared to a small residual that is squared.)
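
The sketch below illustrates both points using the residuals from the heights and weights example. The variable names are illustrative. It shows how much the raw residuals cancel when summed, and how the largest residual takes up a bigger share of the squared total than of the absolute total.

```python
# Residuals from the heights and weights example.
residuals = [2, -3, 2.5, -2.2, 1.9, 2.15, -5.12, 4.8, 1, -2]

raw_sum = sum(residuals)                  # positives and negatives cancel
abs_sum = sum(abs(e) for e in residuals)  # absolute values: no cancellation
sq_sum = sum(e ** 2 for e in residuals)   # squares: no cancellation either

# Share of each total contributed by the residual with the largest magnitude.
largest = max(residuals, key=abs)
share_abs = abs(largest) / abs_sum
share_sq = largest ** 2 / sq_sum

print(f"raw sum = {raw_sum:.2f}, absolute sum = {abs_sum:.2f}, squared sum = {sq_sum:.2f}")
print(f"largest residual {largest}: {share_abs:.0%} of absolute sum, {share_sq:.0%} of squared sum")
```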

Ordinary Least Squares (OLS) regression is a method for finding the linear regression that minimizes the sum of the squared residuals.

$\text{OLS Regression Method}: \min \sum_{i} (e_{i})^2$
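
For a straight line, this minimization has a well-known closed-form solution: the slope is the sum of the cross-deviations of $x$ and $y$ divided by the sum of the squared deviations of $x$, and the intercept is chosen so the line passes through the point of means. The sketch below applies those textbook formulas to the weight and height data from this article. The function name `ols_fit` is illustrative, and the fitted line is not claimed to be the line drawn in the article's figures.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the line that minimizes the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: forces the line through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


weights = [127, 138, 145, 155, 159, 163, 175, 180, 190, 220]  # x, in lbs
heights = [64, 61, 68, 65, 70, 71, 67, 76, 75, 78]            # y, in inches

intercept, slope = ols_fit(weights, heights)
ols_residuals = [y - (intercept + slope * x) for x, y in zip(weights, heights)]

print(f"y_hat = {intercept:.2f} + {slope:.4f} * x")
print(f"sum of squared residuals = {sum(e ** 2 for e in ols_residuals):.2f}")
```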

Let’s have another look at the scatter plot and linear regression we used for our weights and heights example.

Remember, the basic idea behind OLS is that we want to minimize the sum of the squared residuals. On our graph, the length of each vertical red line between a data point and the regression line represents a residual.

If we were to use the OLS method, we would square the length of each of the red lines and add all of the squared lengths together. According to the OLS method, the regression we should choose is the one with the smallest sum of squared distances!

On our graph, the vertical distances of the lines represent the residuals
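
As a rough sketch of this selection rule, the code below computes the sum of squared residuals for two candidate sets of predictions and keeps the one with the smaller total. The first candidate uses the predicted heights from the table earlier in this article. The second is a hypothetical line, $ŷ = 50 + 0.12x$, whose coefficients are assumptions chosen purely for illustration. Comparing two candidates only demonstrates the criterion; OLS itself finds the minimum over all possible lines.

```python
def sum_squared_residuals(actual, predicted):
    """Sum of (y - y_hat)^2 over all observations."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))


weights = [127, 138, 145, 155, 159, 163, 175, 180, 190, 220]
heights = [64, 61, 68, 65, 70, 71, 67, 76, 75, 78]

candidates = {
    # Predicted heights copied from the table earlier in the article.
    "table predictions": [62, 64, 65.5, 67.2, 68.1, 68.85, 72.12, 71.2, 74, 80],
    # Hypothetical alternative line (coefficients are illustrative assumptions).
    "hypothetical line": [50 + 0.12 * x for x in weights],
}

ssr = {name: sum_squared_residuals(heights, preds) for name, preds in candidates.items()}
best = min(ssr, key=ssr.get)  # candidate with the smallest sum of squared residuals

for name, value in ssr.items():
    print(f"{name}: {value:.2f}")
print(f"smaller sum of squared residuals: {best}")
```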

Summary and Applications of Residuals

To recap, a residual tells us how far a model's prediction is from an observed data point. It is the difference between the actual value of a variable, $y$, and its predicted value, $ŷ$.

In regression analysis, residuals can be used to determine whether a linear or a non-linear regression should be used to model the data. This determination can be made using a scatter plot of residuals called a residual plot. Residuals can also be used to determine which regression is the best fit for your data.

A common method for finding a model of “best fit” is the OLS method. In OLS, you choose the regression that minimizes the sum of the squared residuals.
