6. Regression

We start our machine learning applications with regression for a few simple reasons:

  • Regression is a fundamental method for estimating the relationship between one outcome variable (“y”) and one or more other variables (“X”).

  • The estimated coefficients can also be used to generate predictions.

  • Note: The focus in this section is on the RELATIONSHIP paradigm.

  • Many issues that confront researchers have well-understood solutions when regression is the model being used.

  • Regression coefficients are easy to interpret.

Overall objectives

After this chapter,

  1. You can fit a regression with statsmodels or sklearn

  2. You can view the results of your model visually or numerically with either method

  3. You can measure the goodness of fit of a regression

  4. You can interpret the mechanical meaning of the coefficients for

    • continuous variables

    • categorical (a.k.a. qualitative) variables with two or more values (a.k.a. “dummy”, “binary”, and “categorical” variables)

    • interaction terms between two X variables

    • variables in models with other controls included (including categorical variables)

  5. You understand what a t-stat / p-value does and does not tell you

  6. You are aware of common regression analysis pitfalls and disasters
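
As a preview of objectives 1 through 3, here is a minimal sketch that fits the same one-variable regression with both libraries and reads off the coefficients, R-squared, t-stats, and p-values. The data are synthetic and the names (`df`, `x`, `y`, `sm_fit`, `sk_fit`) are made up for illustration. They are assumptions, not anything from this chapter's datasets.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 1 + 2*x + noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(scale=0.5, size=200)

# statsmodels: the formula interface adds the intercept automatically
sm_fit = smf.ols("y ~ x", data=df).fit()
sm_params = sm_fit.params      # coefficients, indexed by "Intercept" and "x"
sm_r2 = sm_fit.rsquared        # goodness of fit
sm_tstats = sm_fit.tvalues     # t-stat for each coefficient
sm_pvalues = sm_fit.pvalues    # p-value for each coefficient

# sklearn: X must be 2-dimensional, so select the column with double brackets
sk_fit = LinearRegression().fit(df[["x"]], df["y"])
sk_intercept = sk_fit.intercept_
sk_slope = sk_fit.coef_[0]
sk_r2 = sk_fit.score(df[["x"]], df["y"])  # R-squared on the same data

# The coefficients can also generate predictions for new X values
new_X = pd.DataFrame({"x": [0.0, 1.0]})
sm_pred = sm_fit.predict(new_X)
sk_pred = sk_fit.predict(new_X)

print(sm_params)
print("R-squared:", sm_r2, sk_r2)
```

Both libraries solve the same least-squares problem here, so the coefficients and R-squared should agree up to floating-point error. In this sketch, statsmodels exposes the t-stats and p-values directly on the fitted result, while sklearn's `LinearRegression` reports only the coefficients and the score.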