6. Regression
We start our machine learning applications with regression for a few simple reasons:
Regression is a fundamental method for estimating the relationship between one variable (“y”) and many other (“X”) variables.
The estimated coefficients can also be used to generate predictions.
Note: The focus in this section is on the RELATIONSHIP paradigm.
Many issues that confront researchers have well-understood solutions when regression is the model being used.
Regression coefficients are easy to interpret.
Overall objectives
After this chapter,
You can fit a regression with statsmodels or sklearn
You can view the results of your model visually or numerically with either method
You can measure the goodness of fit on a regression
You can interpret the mechanical meaning of the coefficients for
continuous variables
categorical (a.k.a. qualitative) variables with two or more values (also called “dummy” or “binary” variables)
interaction terms between two X variables
variables in models with other controls included (including categorical variables)
You understand what a t-stat / p-value does and does not tell you
You are aware of common regression analysis pitfalls and disasters
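As a preview of the first few objectives, here is a minimal sketch of fitting the same one-variable regression with both statsmodels and sklearn. The data are synthetic and exist only for illustration: the variable names `x` and `y`, the true coefficients (1 and 2), and the noise level are all assumptions, not anything from the chapter's datasets.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 1 + 2*x + noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(scale=0.5, size=200)

# statsmodels: the formula interface adds the intercept automatically
sm_fit = smf.ols("y ~ x", data=df).fit()
print(sm_fit.params)     # coefficients, indexed by "Intercept" and "x"
print(sm_fit.rsquared)   # goodness of fit (R-squared)
print(sm_fit.tvalues)    # t-stats for each coefficient
print(sm_fit.pvalues)    # p-values for each coefficient

# sklearn: the estimator interface takes X (2-D) and y separately
sk_fit = LinearRegression().fit(df[["x"]], df["y"])
print(sk_fit.intercept_, sk_fit.coef_)
print(sk_fit.score(df[["x"]], df["y"]))  # also R-squared

# The same coefficients can be used to generate predictions
new_x = pd.DataFrame({"x": [0.0, 1.0]})
sm_pred = sm_fit.predict(new_x)
sk_pred = sk_fit.predict(new_x)
```

Both libraries solve the same least-squares problem here, so the coefficients and R-squared should agree up to floating-point error. The practical difference is in what each reports: statsmodels exposes t-stats and p-values directly on the fitted result, while sklearn's `LinearRegression` does not.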