Understanding the Pearson Correlation Coefficient

04.11.2023 • 6 min read

Sarah Thomas

Subject Matter Expert

In this article, learn what the Pearson correlation coefficient is and how it measures the strength of an association. Also read about when to use it, how it is calculated, and FAQs.

In This Article

  1. What Is a Correlation?

  2. What Is the Pearson Correlation Coefficient?

  3. How To Determine the Strength of Association

  4. Correlations and Scatter Plots

  5. When To Use the Pearson Correlation Coefficient

  6. How To Find the Pearson Correlation Coefficient

  7. Frequently Asked Questions

Pearson's correlation coefficient is a statistical measure that helps us determine the relationship between two variables. It is widely used in fields including finance, the social sciences, and the natural sciences.

In this article, we’ll discuss the concept of Pearson's correlation coefficient, how to calculate it, and how to interpret it.

What Is a Correlation?

In statistics, correlation is a measure of the relationship between two variables. Correlations can be positive, negative, or zero.

  • If two variables in your dataset, X and Y, have a positive correlation, it means they tend to move together in the same direction. As one variable increases, so does the other.

  • A negative correlation tells you the two variables tend to move in opposite directions. There is a negative relationship between the two variables. If X increases, Y tends to decrease, and vice versa.

  • A zero correlation means there is no observable association between variables X and Y.

Always remember that correlation is distinct from causation. When two variables are correlated, you cannot conclude that one of the variables causes the other to change. To establish causation, you need additional evidence and information.

What Is the Pearson Correlation Coefficient?

The Pearson correlation coefficient is one of the most common methods for measuring correlation. You might also hear it called Pearson’s r, a bivariate correlation, the Pearson product-moment correlation coefficient (PPMCC), or simply the correlation coefficient.

A Pearson correlation coefficient measures a linear correlation's direction and magnitude. A linear association—as opposed to a non-linear one—is a correlation approximated by a straight line, where the change in one variable is approximately proportional to the observed change in the second variable.

A linear correlation is either positive or negative across the whole range of the data, whereas in a non-linear relationship the direction of the association can change with the values of X and Y.
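
To make the linear versus non-linear distinction concrete, here is a minimal Python sketch. The helper name pearson_r and the sample data are assumptions made for illustration. Y is completely determined by X (Y = X²), yet Pearson's r comes out at zero because the relationship is not linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for paired observations (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# A U-shaped relationship: Y falls and then rises as X increases.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # the positive and negative halves cancel out
```

The downward-sloping left half and the upward-sloping right half offset each other, which is why a scatter plot is worth checking before relying on r alone.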

Graph showing a linear correlation and a non-linear correlation

How To Determine the Strength of Association

Pearson’s r ranges from -1 to 1, where -1 represents a perfect negative correlation, and 1 represents a perfect positive correlation. The closer the absolute value of r is to 1, the stronger the correlation, and the closer the absolute value is to 0, the weaker the correlation.

A strong correlation means a stronger association between the two variables. If X and Y are strongly correlated, knowing the value of X gives you more information about Y—and vice versa—compared to when the variables are weakly correlated.

1. Perfect Negative Correlation (r=-1)

A perfect negative correlation is an association between two variables where an increase in one is always associated with a perfectly proportional decrease in the other. In other words, the two variables have a perfectly proportional inverse relationship. The correlation coefficient for a perfectly negative correlation is -1.

2. Negative Correlation (-1≤ r <0)

A negative correlation is any inverse correlation where an increase in the value of X is associated with a decrease in the value of Y. For a negative correlation, Pearson’s r is less than 0 and greater than or equal to -1.

3. Zero Correlation (r=0)

A zero correlation indicates there is no observable linear relationship between your two variables. Knowing the value of X will not help you predict the value of Y with a straight line, although a non-linear relationship may still exist.

4. Positive Correlation (0<r≤1)

A positive correlation is any correlation where an increase in the value of X is associated with an increase in the value of Y, and a decrease in the value of X is associated with a decrease in the value of Y. For a positive correlation, Pearson’s r will be greater than 0 and less than or equal to 1.

5. Perfect Positive Correlation (r=1)

A perfect positive correlation is an association between two variables where an increase in one is always associated with a perfectly proportional increase in the other. The correlation coefficient for a perfectly positive correlation is 1.

Line graph showing perfect negative correlation, negative correlation, no correlation, positive correlation, and perfect positive correlation

Correlations and Scatter Plots

Scatter plots are a useful way of visualizing correlations. A scatter plot is a graph that maps the values of one variable—measured along the x-axis—to the values of the second variable—measured along the y-axis.

If there is a linear correlation between your two variables, you can draw an upward or downward-sloping straight trend line through your data to approximate the association.

By looking at a scatter plot, you should be able to determine both the direction and magnitude of a linear correlation.

1. Is the correlation positive or negative?

Looking at a scatter plot, you can tell whether a correlation is positive or negative by the slope of the trend line.

  • A negative downward-sloping line indicates a negative correlation.

  • A positive upward-sloping line indicates a positive correlation.

  • If no linear trend line can be drawn through the data, there is no linear correlation.

2. Is the correlation strong or weak?

You can approximate the strength of a correlation by looking at how close the data points are to the trend line.

  • The more closely clustered the data points are around the trend line, the stronger the correlation.

  • The further away the data points are from the trend line, the weaker the correlation.
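
The link between scatter around the trend line and the size of r can be checked numerically. The sketch below is only an illustration: the helper pearson_r, the trend line Y = 2X, and the alternating offsets are all made-up assumptions. Both datasets follow the same trend, but one hugs the line and the other is spread far from it.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for paired observations (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

def scattered_line(xs, spread):
    """Points on the line y = 2x, pushed alternately above and below it."""
    return [2 * x + (spread if i % 2 == 0 else -spread)
            for i, x in enumerate(xs)]

xs = list(range(10))
r_tight = pearson_r(xs, scattered_line(xs, spread=1))  # close to the line
r_loose = pearson_r(xs, scattered_line(xs, spread=8))  # far from the line

print(round(r_tight, 2), round(r_loose, 2))
```

Both values are positive because the trend slopes upward, but the tightly clustered data gives the larger r.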

Graph showing negative correlation, no correlation, and positive correlation

When To Use the Pearson Correlation Coefficient

You can use the Pearson coefficient under the following circumstances:

  • The variables you are comparing are both quantitative variables. If you are working with ordinal variables, you can use the Spearman rank correlation or Kendall’s tau, and if your variables are nominal, you can use a measure of association called Cramér’s V.

  • Each variable is normally distributed.

  • You have no outliers in your data. Pearson’s r is very sensitive to outliers, so even a single outlier can produce misleading results.

  • The relationship between the two variables appears to be linear rather than non-linear.
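
The sensitivity to outliers mentioned in the list above is easy to demonstrate. In this sketch, the helper pearson_r and every data value are invented for illustration. The first eight points show no linear association at all, and adding one extreme point is enough to make the correlation look strong.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for paired observations (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Eight made-up points with no linear trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 3, 6, 4, 5, 3, 6, 4]
r_clean = pearson_r(xs, ys)

# The same data plus a single extreme observation at (30, 40).
r_with_outlier = pearson_r(xs + [30], ys + [40])

print(round(r_clean, 2), round(r_with_outlier, 2))
```

Plotting the data first, or comparing r with and without suspicious points, is a simple way to spot this kind of distortion.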

How To Find the Pearson Correlation Coefficient

How To Calculate Pearson’s r by Hand

Pearson’s correlation coefficient is equal to the covariance of your two variables divided by the product of their standard deviations.

$$r = \dfrac{cov_{xy}}{s_x s_y} = \dfrac{\sum{(x_i-\bar{x})(y_i-\bar{y})}}{\sqrt{\sum{(x_i-\bar{x})^2}\sum{(y_i-\bar{y})^2}}}$$

Where:

  • r is the Pearson correlation coefficient

  • x and y are the two variables

  • cov(x,y) is the covariance of x and y

  • sx is the standard deviation of x

  • sy is the standard deviation of y

  • xi is each individual observation of x

  • yi is each individual observation of y

  • x̄ is the mean of x

  • ȳ is the mean of y
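
As a rough illustration of the by-hand calculation, the Python sketch below computes r in two ways: from the sums of deviations, and as the sample covariance divided by the product of the sample standard deviations. The function names and the five data points are assumptions chosen to keep the arithmetic easy to follow.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of deviations around each mean."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need paired data with at least two observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Fails with ZeroDivisionError if either variable is constant.
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

def pearson_r_from_covariance(xs, ys):
    """The same quantity as cov_xy / (s_x * s_y), using n - 1 divisors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (s_x * s_y)

# Made-up data: the means are 3 and 4, so the deviations are whole numbers.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_cov = pearson_r_from_covariance(xs, ys)
print(round(r, 4), round(r_cov, 4))
```

The two results agree because the n - 1 divisors in the covariance and in the standard deviations cancel.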

How To Calculate Pearson’s r in Excel

You can also calculate Pearson’s r in Excel using the =PEARSON() function.

Suppose your data for variable x is stored in cells A1 through A20 and your values for variable y are stored in cells B1 through B20. Both ranges must contain the same number of values. You can calculate Pearson’s r with the following command:

=PEARSON(A1:A20, B1:B20)

Frequently Asked Questions

What does a Pearson Correlation of 0.2 mean?

A Pearson correlation coefficient of 0.2 indicates a weak positive correlation. More generally, any correlation whose absolute value is between 0 and 0.4 is usually considered weak.

What does a Pearson Correlation of 0.5 mean?

A Pearson correlation coefficient of 0.5 indicates a moderate positive correlation. More generally, a correlation coefficient between 0.4 and 0.7 is usually considered a moderate correlation.

Is 0.7 a strong correlation coefficient?

A Pearson correlation coefficient of 0.7 and above is typically considered a strong positive correlation.
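
The rules of thumb in the three answers above can be collected into a small helper. This is only a sketch. The function name describe_r is hypothetical, and treating exactly 0.4 as moderate and exactly 0.7 as strong is an assumption, since the cutoffs are conventions rather than fixed rules.

```python
def describe_r(r):
    """Label Pearson's r using the 0.4 and 0.7 rules of thumb."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie between -1 and 1")
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    if size < 0.4:
        strength = "weak"
    elif size < 0.7:      # assumption: 0.4 itself counts as moderate
        strength = "moderate"
    else:                 # assumption: 0.7 itself counts as strong
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for value in (0.2, 0.5, 0.7, -0.85):
    print(value, "->", describe_r(value))
```

Different fields draw these lines in different places, so the labels are best read as rough guides.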

Why is correlation important?

Correlations are important descriptive statistics because they establish an association between variables. Even when this association is not causal, a correlation tells you that you can garner some information about one of the variables based on the other.

For example, suppose you know that test scores are correlated with the number of hours a test taker studies for the test. If you know a test taker has a high score, you can predict that their study hours are also high, and the stronger the correlation, the more confident you can be in the prediction.

Does the Pearson correlation coefficient tell you how steep the slope of the trend line is?

The Pearson correlation coefficient, r, does not represent the slope of the linear trend line. Instead, the value of the correlation coefficient tells you about the direction of the correlation and the strength of the relationship between your two variables.

For example, if you find that r equals 0.7, it does not mean that the slope of your trend line is 0.7. A correlation coefficient of 0.7, instead, tells you that there is a strong positive correlation between your two variables. This tells you that the slope of your trend line will be positive, but it does not tell you how flat or steep the trend line will be.
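
A short sketch can show that r and the slope are separate quantities. The helper names and the data are assumptions for illustration, and the slope is computed with the usual least-squares formula. Both datasets lie exactly on a straight line, so both have r equal to 1, yet one line is a hundred times steeper than the other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for paired observations (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

def trend_slope(xs, ys):
    """Least-squares slope of the trend line of y on x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
ys_flat = [0.1 * x for x in xs]   # gentle upward line
ys_steep = [10 * x for x in xs]   # steep upward line

r_flat, r_steep = pearson_r(xs, ys_flat), pearson_r(xs, ys_steep)
slope_flat, slope_steep = trend_slope(xs, ys_flat), trend_slope(xs, ys_steep)
print(round(r_flat, 6), round(slope_flat, 6))
print(round(r_steep, 6), round(slope_steep, 6))
```

Rescaling Y changes the slope but leaves r untouched, which is why r carries only the sign of the slope and not its size.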

Can you estimate a population correlation coefficient using a sample correlation?

Yes. More often than not, you will be using statistical inference to infer a population correlation (a parameter) from sample data. In this case, you need to make sure you have independent observations.

If you are working with a small sample size, you’ll need to assume that your two variables follow a bivariate normal distribution in the population. You will also need to use hypothesis testing to determine the statistical significance of your estimate.
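
One common way to run such a test, offered here as a sketch and not as the only option, is the textbook t statistic for the null hypothesis that the population correlation is zero: t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The function name and the example values (r = 0.5 from 27 observations) are assumptions for illustration.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero.

    Assumes independent observations and, for small samples,
    bivariate normality. Compare the result against a t distribution
    with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least three observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# Hypothetical sample: r = 0.5 measured on n = 27 paired observations.
t = correlation_t_statistic(0.5, 27)
degrees_of_freedom = 27 - 2
print(round(t, 3), degrees_of_freedom)
```

Turning t into a p-value requires a t table or a statistics library, which this sketch leaves out.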
