A canonical asset pricing job

Let’s estimate, for each firm, for each year, the alpha, beta, and size and value loadings.

So we want a dataset that looks like this:

Firm

Year

alpha

beta

GM

2000

0.01

1.04

GM

2001

-0.005

0.98

…but it will do this for every firm, every year!

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pandas_datareader as pdr
import seaborn as sns
# import statsmodels.api as sm

Load your stock returns. Here, I’ll use this dataset, but you can use anything.

The returns don’t even have to be firms.

They can be any asset. (Portfolios, mutual funds, crypto, …)

crsp = pd.read_stata('https://github.com/LeDataSciFi/ledatascifi-2021/blob/main/data/3firm_ret_1990_2020.dta?raw=true')
crsp['ret'] = crsp['ret']*100 # convert to precentage to match FF's convention on scaling (daily % rets)

Then grab the market returns. Here, we will use one of the Fama-French datasets.

ff = pdr.get_data_famafrench('F-F_Research_Data_5_Factors_2x3_daily',start=1980,end=2010)[0] # the [0] is because the imported obect is a dictionary, and key=0 is the dataframe
ff = ff.reset_index().rename(columns={"Mkt-RF":"mkt_excess", "Date":"date"})

Merge the market returns into the stock returns.

crsp_ready = pd.merge(left=ff, right=crsp, on='date', how="inner",
                      indicator=True, validate="one_to_many")

So the data’s basically ready. Again, the goal is to estimate, for each firm, for each year, the alpha, beta, and size and value loadings.

You caught that right? I have a dataframe, and for each firm, and for each year, I want to <do stuff> (run regressions).

Pandas + “for each” = groupby!

So we will basically run crsp.groupby([firm,year]).runregression(). Except there is no “runregression” function that applies to pandas groupby objects. Small workaround: crsp.groupby([firm,year]).apply(<our own reg fcn>).

We just need to write a reg function that works on groupby objects.

import statsmodels.api as sm

def reg_in_groupby(df,formula="ret_excess ~ mkt_excess + SMB + HML"):
    '''
    Want to run regressions after groupby?
    
    This will do it! 
    
    Note: This defaults to a FF3 model assuming specific variable names. If you
    want to run any other regression, just specify your model.
    
    Usage: 
        df.groupby(<whatever>).apply(reg_in_groupby)
        df.groupby(<whatever>).apply(reg_in_groupby,formula=<whatever>)
    '''
    return pd.Series(sm.formula.ols(formula,data = df).fit().params)

Let’s apply that to our returns!

(
    crsp_ready # grab the data
    
    # Two things before the regressions:
    # 1. need a year variable (to group on)
    # 2. the market returns in FF are excess returns, so 
    #    our stock returns need to be excess as well
    .assign(year = crsp_ready.date.dt.year,
           ret_excess = crsp_ready.ret - crsp_ready.RF)
    
    # ok, run the regs, so easy!
    .groupby(['permno','year']).apply(reg_in_groupby)
    
    # and clean up - with better var names
    .rename(columns={'Intercept':'alpha','mkt_excess':'beta'})
    .reset_index()
)
permno year alpha beta SMB HML
0 10107.0 1990 0.042351 0.329337 -0.627803 -3.094143
1 10107.0 1991 0.154813 1.350377 -0.166336 -0.901889
2 10107.0 1992 0.121780 1.610188 0.241002 -1.281456
3 10107.0 1993 0.019707 1.372881 -0.204308 -1.273144
4 10107.0 1994 0.166985 0.983203 -0.622969 -1.226662
... ... ... ... ... ... ...
75 65875.0 2005 -0.117886 0.853634 -0.547212 -0.047118
76 65875.0 2006 0.076401 1.029162 -0.381499 -0.219379
77 65875.0 2007 0.053763 0.859263 -0.364127 0.047620
78 65875.0 2008 0.088386 0.839751 -0.314287 0.022244
79 65875.0 2009 -0.049534 0.751370 -0.271907 -0.193505

80 rows × 6 columns

How cool is that!

Summary

This is all you need to do:

  1. Set up the data like you would have to no matter what:

    1. Load your stock prices.

    2. Merge in the market returns and any factors you want to include in your model.

    3. Make sure your returns are scaled like your factors (e.g., above, I converted to percentages to match the FF convention)

    4. Make sure your asset returns and market returns are both excess returns (or both are not excess returns)

    5. Create any variables you want to group on (e.g. above, I created a year variable)

  2. df.groupby(<whatever>).apply(reg_in_groupby)

Holy smokes!