9.4. Estimating beta via CAPM, using Yahoo Finance¶
Let’s estimate, for each firm, for each year, the alpha and beta of a stock, using CAPM as our asset pricing model.
So we want a dataset that looks like this:
Firm |
Year |
alpha |
beta |
---|---|---|---|
GM |
2000 |
0.01 |
1.04 |
GM |
2001 |
-0.005 |
0.98 |
…but it will do this for every firm, every year!
The psuedocode is relatively simple:
Load the data on stock returns.
Load the market return premium and the risk-free rate (the “factors” in CAPM).
Note: Make sure your returns are scaled like your factors (e.g., below, I convert stock returns to percentages to match the convention in the FF dataset)
Make sure your stock returns and market returns are both excess returns
Merge the stock and factors returns
For each firm and time period you’re interested in, estimate the CAPM model to get the stock’s alpha and beta for that period.
Note
This page assumes you are familiar with CAPM, the capital asset pricing model.
This page uses Yahoo Finance for stock returns. Yahoo Finance is more of a “quick and dirty” way to get return data. CRSP is the industry standard data provider for stock returns, and you can access via WRDS.
This file can estimate CAPM for any firms that Yahoo has ticker data for.
As written, it will estimate beta separately for each calendar year in the date range you give it. But you can adjust that.
9.4.1. Getting started¶
Note
The code below might need to be modified to work as of Feb 2023. The fix is here.
# uncomment and run this to install packages
# after you successfully do that, add the comments back
#!pip install pandas_datareader
#!pip install yfinance
import pandas as pd
import numpy as np
import pandas_datareader as pdr # you might need to install this (see above)
from datetime import datetime
import yfinance as yf
9.4.2. Load asset return data¶
Load your stock returns. This file uses yahoo finance. The returns don’t even have to be firms! They can be any asset. (Portfolios, mutual funds, crypto, …)
# choose your firms and years
stocks = ['SBUX','AAPL','MSFT']
start = datetime(2016, 1, 1)
end = datetime(2016, 12, 31)
Tip
The code in the next block is explained more thoroughly in handouts/factor_loading_simple.ipynb
in the textbook repo because that file prints the status of the data throughout. Looking at this might help.
# download stock prices
# here, from yahoo: not my fav source, but quick.
# we need to do some data manipulation to get the data ready
stock_prices = yf.download(stocks, start , end)
stock_prices.index = stock_prices.index.tz_localize(None) # change yf date format to match pdr
stock_prices = stock_prices.filter(like='Adj Close') # reduce to just columns with this in the name
stock_prices.columns = stock_prices.columns.get_level_values(1) # tickers as col names, works no matter order of tics
# refmt from wide to long
stock_prices = stock_prices.stack().swaplevel().sort_index().reset_index()
stock_prices.columns = ['Firm','Date','Adj Close']
# add return var = pct_change() function compares to prior row
# EXCEPT: don't compare for first row of one firm with last row of prior firm!
# MAKE SURE YOU CREATE THE VARIABLES WITHIN EACH FIRM - use groupby
stock_prices['ret'] = stock_prices.groupby('Firm')['Adj Close'].pct_change()
stock_prices['ret'] = stock_prices['ret'] *100 # convert to p.p. to match FF's convention on scaling (daily % rets)
stock_prices.head(15)
Firm | Date | Adj Close | ret | |
---|---|---|---|---|
0 | AAPL | 2016-01-04 | 24.220573 | NaN |
1 | AAPL | 2016-01-05 | 23.613630 | -2.505899 |
2 | AAPL | 2016-01-06 | 23.151514 | -1.956989 |
3 | AAPL | 2016-01-07 | 22.174414 | -4.220460 |
4 | AAPL | 2016-01-08 | 22.291668 | 0.528782 |
5 | AAPL | 2016-01-11 | 22.652626 | 1.619251 |
6 | AAPL | 2016-01-12 | 22.981384 | 1.451303 |
7 | AAPL | 2016-01-13 | 22.390528 | -2.571022 |
8 | AAPL | 2016-01-14 | 22.880220 | 2.187053 |
9 | AAPL | 2016-01-15 | 22.330753 | -2.401494 |
10 | AAPL | 2016-01-19 | 22.222696 | -0.483893 |
11 | AAPL | 2016-01-20 | 22.252586 | 0.134502 |
12 | AAPL | 2016-01-21 | 22.139933 | -0.506250 |
13 | AAPL | 2016-01-22 | 23.317051 | 5.316720 |
14 | AAPL | 2016-01-25 | 22.861835 | -1.952286 |
9.4.3. Get the factor returns¶
Above, we got the asset returns, \(r_i\) (called “ret” in the dataframe).
To estimate \(\alpha\) and \(\beta\) in \(r_i-r_f = \alpha + \beta (r_m-r_f)\), we need \((r_m-r_f)\) and \(r_f\). Let’s download those now.
Note: \((r_m-r_f)\) is the excess return on the market, which is one “factor”. Hedge funds typically use 5 factors in tests.
# We need (r_mkt - rf), and rf
# the Fama French data library is a benchmark asset pricing dataset
ff = pdr.get_data_famafrench('F-F_Research_Data_5_Factors_2x3_daily',start=start,end=end)[0] # the [0] is because the imported obect is a dictionary, and key=0 is the dataframe
ff = ff.reset_index().rename(columns={"Mkt-RF":"mkt_excess"})
ff
Date | mkt_excess | SMB | HML | RMW | CMA | RF | |
---|---|---|---|---|---|---|---|
0 | 2016-01-04 | -1.59 | -0.76 | 0.52 | 0.35 | 0.40 | 0.000 |
1 | 2016-01-05 | 0.12 | -0.24 | 0.01 | 0.07 | 0.31 | 0.000 |
2 | 2016-01-06 | -1.35 | -0.22 | 0.00 | 0.16 | 0.05 | 0.000 |
3 | 2016-01-07 | -2.44 | -0.28 | 0.09 | 0.52 | 0.36 | 0.000 |
4 | 2016-01-08 | -1.11 | -0.51 | -0.04 | 0.25 | 0.06 | 0.000 |
... | ... | ... | ... | ... | ... | ... | ... |
247 | 2016-12-23 | 0.19 | 0.52 | -0.50 | -0.40 | -0.15 | 0.001 |
248 | 2016-12-27 | 0.27 | 0.23 | 0.14 | 0.14 | 0.04 | 0.001 |
249 | 2016-12-28 | -0.87 | -0.29 | 0.09 | 0.19 | -0.15 | 0.001 |
250 | 2016-12-29 | -0.04 | 0.10 | -0.33 | 0.27 | 0.02 | 0.001 |
251 | 2016-12-30 | -0.52 | -0.06 | 0.20 | -0.09 | 0.03 | 0.001 |
252 rows × 7 columns
9.4.4. Merge the asset and factor returns¶
assets_and_factors = pd.merge(
left=stock_prices,
right=ff,
on="Date",
how="inner",
indicator=True,
validate="many_to_one",
)
assets_and_factors
Firm | Date | Adj Close | ret | mkt_excess | SMB | HML | RMW | CMA | RF | _merge | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | AAPL | 2016-01-04 | 24.220573 | NaN | -1.59 | -0.76 | 0.52 | 0.35 | 0.40 | 0.000 | both |
1 | MSFT | 2016-01-04 | 49.591057 | NaN | -1.59 | -0.76 | 0.52 | 0.35 | 0.40 | 0.000 | both |
2 | SBUX | 2016-01-04 | 52.108959 | NaN | -1.59 | -0.76 | 0.52 | 0.35 | 0.40 | 0.000 | both |
3 | AAPL | 2016-01-05 | 23.613630 | -2.505899 | 0.12 | -0.24 | 0.01 | 0.07 | 0.31 | 0.000 | both |
4 | MSFT | 2016-01-05 | 49.817291 | 0.456200 | 0.12 | -0.24 | 0.01 | 0.07 | 0.31 | 0.000 | both |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
751 | MSFT | 2016-12-29 | 58.483051 | -0.142872 | -0.04 | 0.10 | -0.33 | 0.27 | 0.02 | 0.001 | both |
752 | SBUX | 2016-12-29 | 51.127605 | -0.053199 | -0.04 | 0.10 | -0.33 | 0.27 | 0.02 | 0.001 | both |
753 | AAPL | 2016-12-30 | 27.220133 | -0.779584 | -0.52 | -0.06 | 0.20 | -0.09 | 0.03 | 0.001 | both |
754 | MSFT | 2016-12-30 | 57.776413 | -1.208279 | -0.52 | -0.06 | 0.20 | -0.09 | 0.03 | 0.001 | both |
755 | SBUX | 2016-12-30 | 50.401356 | -1.420465 | -0.52 | -0.06 | 0.20 | -0.09 | 0.03 | 0.001 | both |
756 rows × 11 columns
9.4.5. Estimate CAPM¶
So the data’s basically ready. (We need to do two quick things below.)
Again, the goal is to estimate, for each firm, for each year, alpha and beta, from the CAPM formula.
Well, as we’ve said, if you are doing a “for each” on a dataframe, that means you want to use groupby!
So, I have a dataframe, and for each firm, and for each year, I want to <do stuff> (run regressions).
That almost directly translates to the code we need: assets_and_factors.groupby([firm,year]).runregression()
. Except there is no “runregression” function that applies to pandas groupby objects. But we can write such a function and then apply()
it. Meaning, our plan is to basically use this code: assets_and_factors.groupby([firm,year]).apply(<our own reg fcn>)
.
We just need to write a reg function that works on groupby objects.
import statsmodels.api as sm
def reg_in_groupby(df, formula="ret_excess ~ mkt_excess + SMB + HML"):
"""
Want to run regressions after groupby? E.g., repeat the regression
for each firm-year?
This will do it!
Note: This defaults to a FF3 model assuming specific variable names. If you
want to run any other regression, just specify your model.
Usage:
df.groupby(<whatever>).apply(reg_in_groupby)
df.groupby(<whatever>).apply(reg_in_groupby,formula=<whatever>)
"""
return pd.Series(sm.formula.ols(formula, data=df).fit().params)
(
assets_and_factors # grab the data
# Two things before the regressions:
# 1. need a year variable (to group on)
# 2. the market returns in FF are excess returns, so
# our stock returns need to be excess as well
.assign(year = assets_and_factors.Date.dt.year,
ret_excess = assets_and_factors.ret - assets_and_factors.RF)
# ok, run the regs, so easy!
.groupby(['Firm','year']).apply(reg_in_groupby,formula='ret_excess ~ mkt_excess')
# and clean up - with better var names
.rename(columns={'Intercept':'alpha','mkt_excess':'beta'})
.reset_index()
)
Firm | year | alpha | beta | |
---|---|---|---|---|
0 | AAPL | 2016 | -0.001081 | 0.965622 |
1 | MSFT | 2016 | 0.000690 | 1.166076 |
2 | SBUX | 2016 | -0.056862 | 0.848162 |