9.5. Estimating the loadings on 3 factor model¶
On the last page, we estimated the CAPM model and got an annual estimate of beta for each firm. However, while CAPM is a useful entry point to start learning about expected return models, CAPM describes returns poorly in the real world in many settings. Because of that, hedge funds typically use 3 to 5 factors in their asset pricing models.
So this page makes two changes to the CAPM estimation:
Here, I use CRSP data on stock returns. As noted on the last page, this is the industry standard for return data.
This page will implement a “Fama-French (FF) 3 factor” model. This model helped Eugene Fama win a Nobel Prize, and it includes the market premium, plus a size factor and a value factor.
Similarly to before, let’s estimate, for each firm, for each year, the alpha, beta, and size and value loadings. So we want a dataset that looks like this:
Firm |
Year |
alpha |
beta |
SMB_loading |
HML_loading |
---|---|---|---|---|---|
MSFT |
1990 |
0.042351 |
0.329337 |
-0.627803 |
-3.094143 |
MSFT |
1991 |
0.154813 |
1.350377 |
-0.166336 |
-0.901889 |
MSFT |
1992 |
0.121780 |
1.610188 |
0.241002 |
-1.281456 |
MSFT |
1993 |
0.019707 |
1.372881 |
-0.204308 |
-1.273144 |
Again, the pseudocode is simple:
Set up the data (asset returns with factors merged in)
Load your asset returns.
Merge in the market returns and any factors you want to include in your model.
Make sure your returns are scaled like your factors (e.g., below, I convert to percentages to match the FF convention)
Make sure your asset returns and market returns are both excess returns
For each firm and time period you’re interested in, estimate the model via
df.groupby([asset_id,year]).apply(reg_in_groupby, formula='<model>')
9.5.1. Set up the data¶
import pandas as pd
import numpy as np
import pandas_datareader as pdr
Load your stock returns. Here, I’ll use a subset of CRSP, but you can use anything.
The returns don’t even have to be firms!
They can be any asset. (Portfolios, mutual funds, crypto, …)
crsp = pd.read_stata('https://github.com/LeDataSciFi/data/blob/main/Stock%20Returns%20(CRSP)/3firm_ret_1990_2020.dta?raw=true')
crsp['ret'] = crsp['ret']*100 # convert to precentage to match FF's convention on scaling (daily % rets)
Then grab the market returns. Here, we will use one of the Fama-French datasets.
ff = pdr.get_data_famafrench('F-F_Research_Data_5_Factors_2x3_daily',start=1980,end=2010)[0] # the [0] is because the imported obect is a dictionary, and key=0 is the dataframe
ff = ff.reset_index().rename(columns={"Mkt-RF":"mkt_excess", "Date":"date"})
Merge the market returns into the stock returns.
crsp_ready = pd.merge(left=ff, right=crsp, on='date', how="inner",
indicator=True, validate="one_to_many")
9.5.2. Estimate the model¶
So the data’s basically ready. Again, the goal is to estimate, for each firm, for each year, the alpha, beta, and size and value loadings.
You caught that right? I have a dataframe, and for each firm, and for each year, I want to <do stuff> (run regressions).
Pandas + “for each” = groupby!
So we will basically run crsp_ready.groupby([firm,year]).runregression()
. Except there is no “runregression” function that applies to pandas groupby objects. Small workaround: crsp.groupby([firm,year]).apply(<our own reg fcn>)
.
We just need to write a reg function that works on groupby objects.
import statsmodels.api as sm
def reg_in_groupby(df,formula="ret_excess ~ mkt_excess + SMB + HML"):
'''
Want to run regressions after groupby?
This will do it!
Note: This defaults to a FF3 model assuming specific variable names. If you
want to run any other regression, just specify your model.
Usage:
df.groupby(<whatever>).apply(reg_in_groupby)
df.groupby(<whatever>).apply(reg_in_groupby,formula=<whatever>)
'''
return pd.Series(sm.formula.ols(formula,data = df).fit().params)
Let’s use that function!
(
crsp_ready # grab the data
# Two things before the regressions:
# 1. need a year variable (to group on)
# 2. the market returns in FF are excess returns, so
# our stock returns need to be excess as well
.assign(year = crsp_ready.date.dt.year,
ret_excess = crsp_ready.ret - crsp_ready.RF)
# ok, run the regs, so easy!
.groupby(['permno','year']).apply(reg_in_groupby)
# optional: clean up the output with better var names
.rename(columns={'Intercept':'alpha','mkt_excess':'beta',
'SMB':'SMB_loading','HML':'HML_loading'})
.reset_index()
)
permno | year | alpha | beta | SMB_loading | HML_loading | |
---|---|---|---|---|---|---|
0 | 10107.0 | 1990 | 0.053564 | 0.374896 | -0.575158 | -3.034802 |
1 | 10107.0 | 1991 | 0.152044 | 1.353209 | -0.154796 | -0.935610 |
2 | 10107.0 | 1992 | 0.121237 | 1.611697 | 0.261069 | -1.284102 |
3 | 10107.0 | 1993 | 0.020430 | 1.358716 | -0.220972 | -1.283983 |
4 | 10107.0 | 1994 | 0.167363 | 0.982035 | -0.618733 | -1.235349 |
... | ... | ... | ... | ... | ... | ... |
75 | 65875.0 | 2005 | -0.118006 | 0.855879 | -0.550304 | -0.041979 |
76 | 65875.0 | 2006 | 0.076719 | 1.027060 | -0.378648 | -0.231876 |
77 | 65875.0 | 2007 | 0.053609 | 0.861027 | -0.367128 | 0.053669 |
78 | 65875.0 | 2008 | 0.087575 | 0.840263 | -0.312572 | 0.024437 |
79 | 65875.0 | 2009 | -0.049988 | 0.752539 | -0.263469 | -0.195175 |
80 rows × 6 columns
How cool is that! Holy smokes!