What Is R Squared? (Definition and How to Calculate It)

By Indeed Editorial Team

Published June 2, 2022

The Indeed Editorial Team comprises a diverse and talented team of writers, researchers and subject matter experts equipped with Indeed's data and insights to deliver useful tips to help guide your career journey.

Professionals in various industries use data analysis to make reliable business decisions that may affect company performance. One type of data analysis that many organizations use is regression analysis, which involves evaluating the impact of one variable on another. Understanding the relevant concept of coefficient of determination, also known as R squared, may be helpful if you're considering a career in data science, the social sciences, or finance. In this article, we define what r squared is, discuss its importance, share its formula, and list the steps on how to calculate it manually using an example.

What is r squared?

R squared (R2) or coefficient of determination is a statistical measure of the goodness-of-fit in linear regression models. While its value is always between zero and one, a common way of expressing it is in terms of percentage. This involves converting the decimal number into a figure from 0% to 100%. Linear regression is a kind of predictive analysis that describes the relationship between two types of variables, which are independent and dependent. The general goals of using regression analysis are to understand and then predict the impact of independent variables on a dependent factor.

Goodness-of-fit refers to the proximity of actual data or observed values to a regression line, which contains fitted values resulting from predictions on the relationship between variables. The closer the actual data points are to the line, the higher the value of the coefficient of determination. And the closer this value is to one, the stronger the relationship is between the independent and dependent variables. This information can be helpful to individuals whose jobs involve working with data and making predictions, such as data scientists and investors.

Related: Guide to Standard Deviation in Finance and How to Calculate

The importance of the coefficient of determination

The coefficient of determination indicates the percentage that the independent variable can explain the variance in the dependent variable. For example, it can show the relationship between inflation and interest rates, because a significant inflation may trigger higher interest rates. Many organizations perform regression analysis to:

  • Understand a phenomenon: A company may use this statistical concept to determine the factors that affect its performance. For example, it may help an organization understand why the sales for a particular item decreased in a specific month.

  • Predict future events: Retailers often use the coefficient of determination to predict their sales in quantifiable terms.

  • Make business decisions: The coefficient of determination can help organizations determine effective strategies that can improve their performance. For instance, if by using regression analysis, they see that employing a marketing strategy can explain the increase in sales numbers, they may choose to utilize it instead of another method.

In the finance industry, investors use the coefficient of determination when comparing a fund to a specific benchmark index. The result of their analysis often guides their decisions and helps them diversify their investment portfolio. It's important to understand that the coefficient isn't a measure of a fund's performance but a way to assess the degree to which it matches the movements of a benchmark, such as the S&P/TSX Composite Index. Many investors attempt to choose instruments with high and low coefficient of determination values when compared with a benchmark to increase their chances of generating profit.

Related: 12 Jobs for Statistics Majors (With Salaries and Duties)

The formula for coefficient of determination

There are several formulae that business professionals use to calculate the coefficient of determination. Here are two formulae to obtain this statistical value:

Straightforward formula

Using the equation that is equal to the square of the formula for correlation coefficient or R may be the most straightforward. R indicates the relationship between an independent and dependent variable. With its formula as the base, the equation for the coefficient of determination is:

R2 = [(NsumXY – (sumX) (sumY)) / (sqrt (NsumX2 - (sumX)2)) (sqrt (NsumY2 - (sumY)2))]2

Where:

  • N is the number of observations in the dataset.

  • X and Y are the variables.

  • Sqrt is square root. While it's possible to calculate the square root of a number by hand, it may be easier to use a calculator to perform this operation.

  • Sum is short for summation, which is adding a sequence of numbers. For example, sumX is the total of all the X values and sumXY is the total of the products resulting from multiplying the values of X and Y.

Sum of squares formula

Discussions involving the coefficient of determination often mention the concept of sum of squares. This is a technique in regression analysis to determine dispersion among data points. Computing the coefficient of determination by applying this requires more steps and knowledge of statistical concepts. Here's an equation for the coefficient of determination using sum of squares:

R2 = SSR / SST

Where:

  • SSR is the regression sum of squares or explained variation. This describes the extent to which the regression model or line represents the data.

  • SST is the total sum of squares. It indicates observed data variation and is the result of squaring the differences between each of the data points and the dependent variable's mean.

Related: How to Learn Data Science (A Complete Guide)

How to calculate the coefficient of determination using an example

You can use the coefficient of determination on investment situations, business cases, or personal life events. For example, suppose that five students obtained individual scores of 90, 70, 65, 72, and 75 on a mathematical aptitude test which they took before the start of the school year. At the end of the year, these same students received grades of 96, 72, 76, 78, and 80 in their math subject. You can calculate the coefficient of determination here by following these steps:

1. Determine the formula to use

There are several formulae you can choose from to solve for the value of the coefficient of determination. Equations that involve computing for the sum of squares typically require a working knowledge of statistics, specifically concepts such as mean and standard deviation. When performing a manual computation of a coefficient of determination, it's beneficial to choose the formula that's easiest to follow. To answer the sample problem, you can use the equation for correlation coefficient (R) first and then square the result to get the coefficient.

2. Identify the values of X and Y

It's important to determine the figures to use in a word problem to ensure that you get the right answer. In the sample problem, the values of X, which correspond to the independent or predictor variable, are 90, 70, 65, 72, and 75. The values of Y, which is the dependent or response variable, are 96, 72, 76, 78, and 80.

3. Calculate metrics

This involves computing the value of each of the metrics necessary for you to use in the formula. Doing this can help you solve the sample problem more easily than calculating each variable as you encounter it in the equation. Considering the formula, here are the important metrics and their corresponding values regarding the sample problem:

  • sumX = 372

  • sumY = 402

  • sumXY = 30,236

  • sumX2 = 28,034

  • sumY2 = 32,660

Solving for the value of sumX and sumY requires adding the values of X and Y. Getting the answer for sumXY involves multiplying each value of X and Y and adding the products. You can calculate sumX2 and sumY2 by first multiplying each X or Y value by itself, and then adding the results.

4. Write the formula

To understand what values are essential, it's beneficial to write down the complete formula. This can also show you if you're omitting a value or calculation. Here is the formula you can apply to the example:

R2 = [(5 x 30,236) - (372) (402) / (sqrt ((5 x 28,0342) – (372)2)) (sqrt ((5 x 32,6602) – (402)2))]2

5. Solve the first part of the equation

The first part of the equation is NsumXY – (sumX) (sumY). Considering the sample problem, the value of N is 5 because that is the number of observations. When you substitute the other metrics with the values in the problem, the equation becomes (5 x 30,236) – (372 x 402). The answer to the first part is 1,636.

6. Calculate the second part of the equation

The second part, (sqrt (NsumX2 - (sumX)2 )) (sqrt (NsumY2 - (sumY)2)), involves more complex operations compared to the first part of the equation. You can simplify it into (sqrt 1,786) x (sqrt 1,696). While it's possible to compute manually the square root of a number, it may be more efficient to use a calculator. The answer to the second part of the equation is about 1,740. 418. e

7. Compute and interpret the coefficient of determination

It's now possible to solve for the value of the coefficient. When you substitute the values you obtained with the equation, the result is:

R2 = (1,636 / 1,740.418)2

To compute this, divide the answer to the first part by the answer to the second part of the equation. Then raise the quotient, which is 0.94, to the power of two. This means multiplying the number by itself once. The definitive answer and the coefficient of determination is 0.884. When interpreting this, it's beneficial to first convert the decimal number to percentage. You can say that their scores in the mathematical aptitude test can explain 88.4% of the differences in grades among the five students.

Please note that none of the companies, institutions, or organizations mentioned in this article are affiliated with Indeed.

Explore more articles