With many things we try to do in Excel, there are usually multiple paths to the same outcome. Some paths are better than others depending on the situation. The same holds true for linear regression in Excel. There are three ways you can perform this analysis (without VBA). They are:
Each of these methods has an appropriate time and place. Let’s take a look at each one individually.
[Note: Want to learn even more about advanced Excel techniques? Watch my free training just for engineers. In the three-part video series I'll show you how to easily solve engineering challenges in Excel. Click here to get started.]
Linear Regression with Excel Charts
When you need to get a quick and dirty linear equation fit to a set of data, the best way is to simply create an XY-chart (or “Scatter Chart”) and throw in a quick trendline. Add the equation to the trendline and you have everything you need. You can go from raw data to having the slope and intercept of a best-fit line in 6 clicks (in Excel 2016).
Let’s say we have the data set below, and we want to quickly determine the slope and y-intercept of a best-fit line through it.
We’d follow these 6 steps (in Excel 2016):
- Select x- and y- data
- Open Insert Tab
- Select Scatter Chart
- Right-Click Data Series
- Select Add Trendline
- Check Display Equation on Chart
Now we know that the data set shown above has a slope of 165.4 and a y-intercept of -79.85.
Linear Regression in Excel with the LINEST function
The method above is a quick way to fit a curve to a series of data, but it has a significant downfall. The equation displayed on the chart cannot be used anywhere else. It’s essentially “dumb” text.
If you want to use that equation anywhere in your spreadsheet, you have to manually enter it. However, if you change the data set used to obtain the equation, that equation you manually entered will not update, leaving your spreadsheet with an erroneous equation.
What we need for these situations is a function that can perform the same kind of regression analysis done by the charting utility and output the coefficients to cells where we can use them in an equation.
The LINEST function does this perfectly. Given two sets of data, x and y, it will return the slope (m) and intercept (b) values that complete the equation
y = mx + b
The syntax of the function is as follows:
LINEST(known_y’s, [known_x’s], [const], [stats])
Known_y’s is the y-data you are attempting to fit
Known_x’s is the x-data you are attempting to fit
Const is a logical value specifying whether the intercept is forced to zero (FALSE) or not (TRUE)
Stats is a logical value that specifies whether regression statistics are returned
LINEST is an array function, so we need to enter it as an array formula, providing two cells to which it can return the values of m and b.
Let’s take a look at how LINEST could be used to determine the equation of a best-fit line for the data above.
Since LINEST will return two values, I start by selecting two adjacent cells on the worksheet.
Next, I enter the formula in the formula bar, rather than in the cell.
Finally, because it’s an array formula, I press CTRL+SHIFT+ENTER to calculate the cells.
The results are…
…exactly the same as those provided by the trendline method.
This was obviously more work than using a trendline, but the real advantage here is that the slope and y-intercept values have been output to a cell. That means we can use them dynamically in a calculation somewhere else in the spreadsheet.
Regression Using the Excel Solver
This last method is more complex than both of the previous methods. Fortunately, it will probably be unnecessary to ever use this method for basic single-variable linear regression. However, I’ve included it here because it provides some understanding into the way that the previous linear regression methods work. It will also introduce you to the possibilities for more complicated curve fitting using Excel.
- Enter “guess-values” for the slope and intercept of the equation
- Calculate new y-values based on those guess values
- Calculate the error between the calculated y-values and the y-data
- Use the Solver to find values of the slope and intercept that minimize the total error
Let’s start again with the x- and y- data we had before.
Next, enter some guess values for m and b into some cells on the worksheet.
Now create a new column of calculated y-values based on the m and b guess values and the known x-data.
Next, create an error column, calculating the difference between the y-data and calculated y-values.
Finally, create a new formula, calculating the sum of squares of the error column.
We will use the Solver to minimize this value – the sum of the squared errors. The reason why we use “sum of squares” instead of just “sum” is because we do not want an error of -100 in one cell to cancel out an error of 100 in another cell. We want each value in the error column to be driven to its minimum absolute value.
Now, let’s open up the Solver. If you have never used the Solver Add-In before, you must first enable it. Follow the steps here to enable the Solver.
After the Add-In has been loaded, you can open the Solver from the Data tab:
With the Solver open, the setup for this is pretty straightforward.
- We want to minimize the objective, cell H3, or the sum of the squared errors.
- To do so, we will change variable cells E3 and F3, the slope and y-intercept of our linear equation.
- As a last step, uncheck the option to “Make Unconstrained Variables Non-Negative”.
When properly set up, the solver dialog should look like this:
When we click “Solve”, the Solver does its thing and finds that the values m = 165.36 and b = -79.85 define the best-fit line through the data. Exactly what was predicted by the chart trendline and LINEST.
Of course, this is totally expected. After all, we have just done “manually” what the Trendline tool and LINEST do automatically.
In the case of a linear single-variable regression like we have here, the Solver is probably complete overkill. However, this is just the start. We can use this same concept to do more complex multi-variable or non-linear regression analysis. Using the Solver, you can fit whatever kind of equation you can dream up to any set of data.
There you have it – three ways to perform linear regression in Excel:
- Chart trendlines
- LINEST function
What kinds of data do you need to fit an equation to? Which method do you prefer? Let me know in the comments below.