# A canonical asset pricing job¶

Let’s estimate, for each firm, for each year, the alpha, beta, and size and value loadings.

So we want a dataset that looks like this:

Firm

Year

alpha

beta

GM

2000

0.01

1.04

GM

2001

-0.005

0.98

…but it will do this for every firm, every year!

```import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# import statsmodels.api as sm
```

Load your stock returns. Here, I’ll use this dataset, but you can use anything.

The returns don’t even have to be firms.

They can be any asset. (Portfolios, mutual funds, crypto, …)

```crsp = pd.read_stata('https://github.com/LeDataSciFi/ledatascifi-2021/blob/main/data/3firm_ret_1990_2020.dta?raw=true')
crsp['ret'] = crsp['ret']*100 # convert to precentage to match FF's convention on scaling (daily % rets)
```

Then grab the market returns. Here, we will use one of the Fama-French datasets.

```ff = pdr.get_data_famafrench('F-F_Research_Data_5_Factors_2x3_daily',start=1980,end=2010) # the  is because the imported obect is a dictionary, and key=0 is the dataframe
ff = ff.reset_index().rename(columns={"Mkt-RF":"mkt_excess", "Date":"date"})
```

Merge the market returns into the stock returns.

```crsp_ready = pd.merge(left=ff, right=crsp, on='date', how="inner",
indicator=True, validate="one_to_many")
```

So the data’s basically ready. Again, the goal is to estimate, for each firm, for each year, the alpha, beta, and size and value loadings.

You caught that right? I have a dataframe, and for each firm, and for each year, I want to <do stuff> (run regressions).

Pandas + “for each” = groupby!

So we will basically run `crsp.groupby([firm,year]).runregression()`. Except there is no “runregression” function that applies to pandas groupby objects. Small workaround: `crsp.groupby([firm,year]).apply(<our own reg fcn>)`.

We just need to write a reg function that works on groupby objects.

```import statsmodels.api as sm

def reg_in_groupby(df,formula="ret_excess ~ mkt_excess + SMB + HML"):
'''
Want to run regressions after groupby?

This will do it!

Note: This defaults to a FF3 model assuming specific variable names. If you
want to run any other regression, just specify your model.

Usage:
df.groupby(<whatever>).apply(reg_in_groupby)
df.groupby(<whatever>).apply(reg_in_groupby,formula=<whatever>)
'''
return pd.Series(sm.formula.ols(formula,data = df).fit().params)
```

Let’s apply that to our returns!

```(

# Two things before the regressions:
# 1. need a year variable (to group on)
# 2. the market returns in FF are excess returns, so
#    our stock returns need to be excess as well

# ok, run the regs, so easy!
.groupby(['permno','year']).apply(reg_in_groupby)

# and clean up - with better var names
.rename(columns={'Intercept':'alpha','mkt_excess':'beta'})
.reset_index()
)
```
permno year alpha beta SMB HML
0 10107.0 1990 0.042351 0.329337 -0.627803 -3.094143
1 10107.0 1991 0.154813 1.350377 -0.166336 -0.901889
2 10107.0 1992 0.121780 1.610188 0.241002 -1.281456
3 10107.0 1993 0.019707 1.372881 -0.204308 -1.273144
4 10107.0 1994 0.166985 0.983203 -0.622969 -1.226662
... ... ... ... ... ... ...
75 65875.0 2005 -0.117886 0.853634 -0.547212 -0.047118
76 65875.0 2006 0.076401 1.029162 -0.381499 -0.219379
77 65875.0 2007 0.053763 0.859263 -0.364127 0.047620
78 65875.0 2008 0.088386 0.839751 -0.314287 0.022244
79 65875.0 2009 -0.049534 0.751370 -0.271907 -0.193505

80 rows × 6 columns

How cool is that!

## Summary¶

This is all you need to do:

1. Set up the data like you would have to no matter what:

2. `df.groupby(<whatever>).apply(reg_in_groupby)`