6. Regression

We start our machine learning applications with regression for a few simple reasons:

Regression is a fundamental method for estimating the relationship between one variable (“y”) and one or more other (“X”) variables.
The coefficients it produces can also be used to generate predictions.
Note: The focus in this section is on the RELATIONSHIP paradigm.
Many issues that confront researchers have well-understood solutions when regression is the model being used.
Regression coefficients are easy to interpret.
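To make the relationship-versus-prediction distinction concrete, here is a minimal sketch using only numpy. The simulated data, the "true" intercept of 1 and slope of 2, and all variable names are assumptions chosen for illustration; they are not from the chapter.

```python
import numpy as np

# Simulated data (an assumption for illustration): y = 1 + 2*x + noise
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones so the model includes an intercept
X = np.column_stack([np.ones(n), x])

# Ordinary least squares: choose beta to minimize the sum of squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# RELATIONSHIP use: the slope estimates how y changes, on average,
# when x is one unit higher
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")

# PREDICTION use: plug a new x value into the fitted equation
x_new = 1.5
y_pred = intercept + slope * x_new
print(f"predicted y at x = {x_new}: {y_pred:.3f}")
```

The same fitted coefficients serve both purposes: read on their own they describe the estimated relationship, and plugged into the equation they produce a prediction.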

Overall objectives

After this chapter,

You can fit a regression with statsmodels or sklearn
You can view your model's results visually or numerically with either method
You can measure the goodness of fit of a regression
You can interpret the mechanical meaning of the coefficients for
- continuous variables
- categorical, a.k.a. qualitative, variables with two or more values (a.k.a. “dummy”, “binary”, and “categorical” variables)
- interaction terms between two X variables
- variables in models with other controls included (including categorical variables)
You understand what a t-stat / p-value does and does not tell you
You are aware of common regression analysis pitfalls and disasters
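As a preview of the first few objectives, the sketch below fits the same regression with statsmodels and with sklearn and reads off the coefficients and R². The data are simulated, and the column names (`x`, `group`, `y`) and the true coefficient values are assumptions made for illustration, not part of the chapter.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Simulated data (an assumption): one continuous and one two-level categorical X
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "group": rng.choice(["a", "b"], size=n),
})
df["y"] = (1.0 + 2.0 * df["x"] + 0.5 * (df["group"] == "b")
           + rng.normal(scale=0.5, size=n))

# statsmodels: the formula interface adds the intercept and
# dummy-encodes C(group) automatically
sm_fit = smf.ols("y ~ x + C(group)", data=df).fit()
print(sm_fit.params)     # coefficients
print(sm_fit.tvalues)    # t-stats
print(sm_fit.pvalues)    # p-values
print(sm_fit.rsquared)   # goodness of fit (R-squared)

# sklearn: build the dummy variable by hand; the intercept is fit by default
X = pd.get_dummies(df[["x", "group"]], drop_first=True, dtype=float)
sk_fit = LinearRegression().fit(X, df["y"])
print(sk_fit.intercept_, sk_fit.coef_)
print(sk_fit.score(X, df["y"]))  # R-squared on the same data
```

Both libraries solve the same least-squares problem here, so the coefficients and R² should agree up to floating-point error. statsmodels reports t-stats and p-values directly, while sklearn's `LinearRegression` does not.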

LeDataSciFi-2023