8. Getting our hands dirty with ML
The objective of this chapter is to put the principles from the last chapter into practice. We will use sklearn, the go-to package for machine learning in Python. Go ahead and bookmark its user guide right now. You'll be visiting it a lot.
At the end of this portion of class (between these pages and lectures), you should be able to:
Build a pipeline that:
Preprocesses realistic data (i.e. data with multiple variable types), handling each variable type appropriately
Tunes the model's hyperparameters to improve its performance
Evaluates the tuned model's performance on a test sample
Use that pipeline within the best practice workflow to optimize several models and pick your preferred one
Discuss key issues relating to:
The value of preprocessing
The value of feature engineering
How your hold-out split and folding method can cause unrealistic performance estimates
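To make the first goal concrete, here is a minimal sketch of what such a pipeline might look like in sklearn. Everything specific in it is an assumption for illustration: the synthetic data, the column names (income, age, region), the choice of Ridge as the model, and the small grid of alpha values. It is not the exact pipeline we will build in this chapter, just the general shape: split off a test sample, preprocess each variable type separately, tune with cross-validation on the training data only, then score once on the test sample.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data (hypothetical): two numeric features, one categorical feature
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "income": rng.normal(50, 10, n),
    "age": rng.integers(18, 80, n).astype(float),
    "region": rng.choice(["north", "south", "west"], n),
})
y = (0.5 * X["income"] + 0.1 * X["age"]
     + 3 * (X["region"] == "north") + rng.normal(0, 1, n))
# Knock out a few income values so the imputer has something to do
X.loc[rng.choice(n, 10, replace=False), "income"] = np.nan

# Hold out a test sample before doing anything else
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each variable type gets its own preprocessing steps
num_cols = ["income", "age"]
cat_cols = ["region"]
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

# Preprocessing and model live in one pipeline, so each CV fold
# refits the preprocessing on that fold's training portion only
pipe = Pipeline([("preprocess", preprocess), ("model", Ridge())])

# Tune the hyperparameter with 5-fold CV on the training data
search = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)
search.fit(X_train, y_train)

# Finally, evaluate the tuned pipeline once on the test sample
test_r2 = search.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Because the imputer, scaler, and encoder sit inside the pipeline, the grid search never lets information from a validation fold (or the test sample) leak into preprocessing, which is one way a careless split or folding method can make performance look better than it really is.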