# 5.5.3. Estimating the loadings on 3 factor model¶

On the last page, we estimated the CAPM model and got an annual estimate of beta for each firm. However, while CAPM is a useful entry point for learning about expected return models, it describes real-world returns poorly in many settings. Because of that, hedge funds typically use 3 to 5 factors in their asset pricing models.

So this page makes two changes to the CAPM estimation:

1. Here, I use CRSP data on stock returns. As noted on the last page, this is the industry standard for return data.

2. This page will implement a “Fama-French (FF) 3 factor” model. This model helped Eugene Fama win a Nobel Prize, and it includes the market premium, plus a size factor and a value factor.

Similarly to before, let’s estimate, for each firm, for each year, the alpha, beta, and size and value loadings. So we want a dataset that looks like this:

| Firm | Year | alpha | beta | SMB | HML |
|------|------|-------|------|-----|-----|
| MSFT | 1990 | 0.042351 | 0.329337 | -0.627803 | -3.094143 |
| MSFT | 1991 | 0.154813 | 1.350377 | -0.166336 | -0.901889 |
| MSFT | 1992 | 0.121780 | 1.610188 | 0.241002 | -1.281456 |
| MSFT | 1993 | 0.019707 | 1.372881 | -0.204308 | -1.273144 |

Again, the pseudo code is simple:

1. Load your asset returns.

2. Merge in the market returns and any factors you want to include in your model.

3. Make sure your returns are scaled like your factors (e.g., below, I convert returns to percentages to match the FF convention).

4. Make sure your asset returns and market returns are both excess returns.

5. For each firm and time period you're interested in, estimate the model via `df.groupby([asset_id, year]).apply(reg_in_groupby, formula='<model>')`.
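Steps 3 and 4 are easy to get wrong, so here is a tiny, made-up illustration (the numbers are hypothetical; the column names mirror the CRSP/FF columns used below):

```python
import pandas as pd

# Hypothetical mini dataset: raw returns in decimals, risk-free rate (RF)
# already in percent, as in the daily Fama-French files
demo = pd.DataFrame({'ret': [0.010, -0.005], 'RF': [0.002, 0.002]})

demo['ret'] = demo['ret'] * 100               # step 3: decimals -> percent
demo['ret_excess'] = demo['ret'] - demo['RF'] # step 4: subtract the risk-free rate
# demo['ret_excess'] is now [0.998, -0.502]
```

If you skip step 3, your loadings will be off by a factor of 100; if you skip step 4, your alpha absorbs the risk-free rate.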

## 5.5.3.1. Set up the data¶

```python
import pandas as pd
import numpy as np
import pandas_datareader as pdr
```

Load your stock returns. Here, I’ll use a subset of CRSP, but you can use anything.

The returns don’t even have to be firms!

They can be any asset. (Portfolios, mutual funds, crypto, …)

```python
crsp = pd.read_stata('https://github.com/LeDataSciFi/ledatascifi-2022/blob/main/data/3firm_ret_1990_2020.dta?raw=true')
crsp['ret'] = crsp['ret']*100 # convert to percentage to match FF's convention on scaling (daily % rets)
```

Then grab the market returns. Here, we will use one of the Fama-French datasets.

```python
ff = pdr.get_data_famafrench('F-F_Research_Data_5_Factors_2x3_daily',
                             start=1980, end=2010)[0] # the [0] is because the imported object is a dictionary, and key=0 is the dataframe
ff = ff.reset_index().rename(columns={"Mkt-RF":"mkt_excess", "Date":"date"})
```

Merge the market returns into the stock returns.

```python
crsp_ready = pd.merge(left=ff, right=crsp, on='date', how="inner",
                      indicator=True, validate="one_to_many")
```
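If you haven't used `indicator` and `validate` before, a small made-up example (hypothetical frames, not the real data) shows what they buy you: `validate='one_to_many'` raises an error if a date repeats on the factor side, and `indicator=True` adds a `_merge` column you can inspect:

```python
import pandas as pd

# Hypothetical frames: one row per date on the left (factors),
# many rows per date on the right (one per stock)
ff_demo = pd.DataFrame({'date': ['2020-01-02'], 'mkt_excess': [0.5]})
crsp_demo = pd.DataFrame({'date': ['2020-01-02', '2020-01-02'],
                          'permno': [10107, 65875],
                          'ret': [1.2, -0.3]})

merged = pd.merge(left=ff_demo, right=crsp_demo, on='date', how='inner',
                  indicator=True,          # adds a _merge column ('both' here)
                  validate='one_to_many')  # errors if 'date' repeats in ff_demo
```

These checks cost nothing and catch silent merge bugs (duplicated dates, unmatched rows) before they corrupt your regressions.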

## 5.5.3.2. Estimate the model¶

So the data’s basically ready. Again, the goal is to estimate, for each firm, for each year, the alpha, beta, and size and value loadings.

You caught that, right? I have a dataframe, and for each firm, and for each year, I want to <do stuff> (run regressions).

Pandas + “for each” = groupby!

So we will basically run `crsp_ready.groupby([firm,year]).runregression()`. Except there is no "runregression" function that applies to pandas groupby objects. Small workaround: `crsp_ready.groupby([firm,year]).apply(<our own reg fcn>)`.

We just need to write a reg function that works on groupby objects.

```python
import statsmodels.api as sm

def reg_in_groupby(df, formula="ret_excess ~ mkt_excess + SMB + HML"):
    '''
    Want to run regressions after groupby?

    This will do it!

    Note: This defaults to a FF3 model assuming specific variable names. If you
    want to run any other regression, just specify your model.

    Usage:
    df.groupby(<whatever>).apply(reg_in_groupby)
    df.groupby(<whatever>).apply(reg_in_groupby, formula=<whatever>)
    '''
    return pd.Series(sm.formula.ols(formula, data=df).fit().params)
```
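To convince yourself that the groupby + apply pattern really recovers per-group coefficients, here is a self-contained sanity check on made-up data. It sidesteps `statsmodels` by solving the same OLS problem with `numpy`; all names here (`ols_in_groupby`, `firm`, `x`, `y`) are hypothetical:

```python
import numpy as np
import pandas as pd

def ols_in_groupby(df, yvar='y', xvars=('x',)):
    # Build a design matrix with an intercept column, then solve OLS
    X = np.column_stack([np.ones(len(df))] + [df[v].to_numpy() for v in xvars])
    coefs, *_ = np.linalg.lstsq(X, df[yvar].to_numpy(), rcond=None)
    return pd.Series(coefs, index=['Intercept'] + list(xvars))

# Synthetic data: two "firms", each with y = 1 + 2*x exactly
rng = np.random.default_rng(0)
demo = pd.DataFrame({'firm': ['A']*10 + ['B']*10,
                     'x': rng.normal(size=20)})
demo['y'] = 1 + 2*demo['x']

out = demo.groupby('firm').apply(ols_in_groupby).reset_index()
# Each firm's fitted Intercept is ~1 and slope on x is ~2
```

Because the function returns a `pd.Series`, pandas stacks the per-group results into a tidy dataframe with one row per group, exactly the shape `reg_in_groupby` produces below.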

Let’s use that function!

```python
(
    crsp_ready # grab the data

    # Two things before the regressions:
    # 1. need a year variable (to group on)
    # 2. the market returns in FF are excess returns, so
    #    our stock returns need to be excess as well
    .assign(year=lambda x: x['date'].dt.year,
            ret_excess=lambda x: x['ret'] - x['RF'])

    # ok, run the regs, so easy!
    .groupby(['permno','year']).apply(reg_in_groupby)

    # optional: clean up the output with better var names
    .rename(columns={'Intercept':'alpha','mkt_excess':'beta'})
    .reset_index()
)
```
|    | permno | year | alpha | beta | SMB | HML |
|----|--------|------|-------|------|-----|-----|
| 0 | 10107.0 | 1990 | 0.053564 | 0.374896 | -0.575158 | -3.034802 |
| 1 | 10107.0 | 1991 | 0.152044 | 1.353209 | -0.154796 | -0.935610 |
| 2 | 10107.0 | 1992 | 0.121237 | 1.611697 | 0.261069 | -1.284102 |
| 3 | 10107.0 | 1993 | 0.020430 | 1.358716 | -0.220972 | -1.283983 |
| 4 | 10107.0 | 1994 | 0.167363 | 0.982035 | -0.618733 | -1.235349 |
| ... | ... | ... | ... | ... | ... | ... |
| 75 | 65875.0 | 2005 | -0.118006 | 0.855879 | -0.550304 | -0.041979 |
| 76 | 65875.0 | 2006 | 0.076719 | 1.027060 | -0.378648 | -0.231876 |
| 77 | 65875.0 | 2007 | 0.053609 | 0.861027 | -0.367128 | 0.053669 |
| 78 | 65875.0 | 2008 | 0.087575 | 0.840263 | -0.312572 | 0.024437 |
| 79 | 65875.0 | 2009 | -0.049988 | 0.752539 | -0.263469 | -0.195175 |

80 rows × 6 columns

How cool is that! Holy smokes!