# 5.4.2. An intro to sklearn + Fitting One Model

This page shows how sklearn fits ONE model for ONE set of hyperparameters on a generic X and y. The goal is to show you the flow of how we work through estimation. Do NOT copy this code wholesale for assignments: it deliberately leaves out several best practices while we build up your familiarity with developing an ML model. The steps here, though, are present in everything we do.

## 5.4.2.1. Five steps to fit a model

Step 1: Import class of model from sklearn

from sklearn.linear_model import Ridge


Step 2: Load data into y and X, and split off test data

# this cell is copied from the L17 lecture file
# EXCEPT: I put the interest rate in its own "y" variable
#         and remove the y variable from the fannie_mae data

import pandas as pd
import numpy as np

url        = 'https://github.com/LeDataSciFi/ledatascifi-2021/blob/main/data/Fannie_Mae_Plus_Data.gzip?raw=true'
# load the data before using it; dropping rows with missing values is an
# assumption here (Ridge cannot be fit on NaNs)
fannie_mae = pd.read_csv(url, compression='gzip').dropna()
y          = fannie_mae.Original_Interest_Rate
fannie_mae = (fannie_mae
              .assign(l_credscore = np.log(fannie_mae['Borrower_Credit_Score_at_Origination']),
                      l_LTV = np.log(fannie_mae['Original_LTV_(OLTV)']),
                     )
              .iloc[:,-11:] # limit to these vars for the sake of this example
             )

from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0) # this helps us control the randomness so we can reproduce results exactly
X_train, X_test, y_train, y_test = train_test_split(fannie_mae, y, random_state=rng)
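To see what the split produces without downloading the loan data, here is a small sketch on made-up data. The names `X_demo`, `y_demo`, and the column labels are placeholders for illustration, not part of the lecture data. It relies on the default `test_size` of 0.25 in `train_test_split`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# made-up data: 100 rows, 3 features (placeholder names, not the Fannie Mae data)
rng_demo = np.random.RandomState(0)
X_demo = pd.DataFrame(rng_demo.normal(size=(100, 3)), columns=['a', 'b', 'c'])
y_demo = pd.Series(rng_demo.normal(size=100), name='y')

Xd_train, Xd_test, yd_train, yd_test = train_test_split(X_demo, y_demo, random_state=0)

# with the default test_size of 0.25, 75 rows go to training and 25 to testing
print(Xd_train.shape, Xd_test.shape)

# X and y are shuffled together, so their row labels still line up in each piece
print(Xd_train.index.equals(yd_train.index))
```

The test rows are set aside here and not touched again until Step 5.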


Step 3: Choose initial model hyperparameters by instantiating this class with desired values

# create ("instantiate") the class; here I set the hyperparameter alpha=1
ridge = Ridge(alpha=1.0)
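A quick sketch of what instantiating does and does not do. The object names `ridge_default` and `ridge_strong` are made up for this illustration. Creating the object only records the hyperparameters; nothing is estimated until `fit()` is called.

```python
from sklearn.linear_model import Ridge

ridge_default = Ridge()            # alpha defaults to 1.0
ridge_strong  = Ridge(alpha=10.0)  # larger alpha = stronger penalty on coefficient size

# get_params() reports the hyperparameters the object was created with
print(ridge_default.get_params()['alpha'])
print(ridge_strong.get_params()['alpha'])

# no coefficients exist yet, because no data has been seen
print(hasattr(ridge_strong, 'coef_'))
```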


Step 4: fit() the model on training data

ridge.fit(X_train,y_train)

Ridge()
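The `Ridge()` printed above is the fitted estimator itself: `fit()` returns the same object it was called on, now holding the estimated parameters. Here is a minimal sketch on made-up data, where the "true" relationship is an assumption chosen for the example, showing where those estimates live.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng_demo = np.random.RandomState(0)
X_demo = rng_demo.normal(size=(200, 2))
# made-up relationship: y = 3 + 2*x0 - 1*x1 + small noise
y_demo = 3 + 2 * X_demo[:, 0] - 1 * X_demo[:, 1] + rng_demo.normal(scale=0.1, size=200)

ridge_demo = Ridge(alpha=1.0)
fitted = ridge_demo.fit(X_demo, y_demo)

print(fitted is ridge_demo)   # fit() hands back the same object
print(ridge_demo.coef_)       # one estimated coefficient per column of X
print(ridge_demo.intercept_)  # estimated intercept
```

Attributes that end in an underscore, like `coef_` and `intercept_`, are the ones sklearn creates during `fit()`.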


Step 5: Apply the model to new data. Either:

• <modelname>.predict(X_test) will predict what $y$ should be using $X_{test}$, and is used in supervised learning tasks

• <modelname>.transform(X_test) will change $X_{test}$ using the model, and is common with preprocessing and unsupervised learning

ridge.predict(X_test)

array([5.95256433, 4.20060942, 3.9205946 , ..., 4.06401663, 5.30024985,
7.32600213])
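To make the predict/transform distinction concrete, here is a hedged sketch on made-up data. `StandardScaler` is used as an example of a preprocessing step with `transform()`; it is not part of the example above, and all variable names are placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng_demo = np.random.RandomState(0)
X_tr = rng_demo.normal(loc=5, scale=2, size=(80, 3))  # made-up training features
X_te = rng_demo.normal(loc=5, scale=2, size=(20, 3))  # made-up "new" features
y_tr = X_tr @ np.array([1.0, 0.5, -2.0]) + rng_demo.normal(scale=0.1, size=80)

# predict(): a supervised model returns one predicted y per row of X
y_hat = Ridge(alpha=1.0).fit(X_tr, y_tr).predict(X_te)
print(y_hat.shape)        # one value per test row

# transform(): a preprocessing step returns a changed version of X
scaler = StandardScaler().fit(X_tr)    # learn means and std devs from training data
X_te_scaled = scaler.transform(X_te)   # apply them to the new data
print(X_te_scaled.shape)  # same rows and columns as X_te
```

In both cases the object is fit on the training data first and only then applied to the new data.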


The text here is adapted from the Python Data Science Handbook (PDSH).