9.6. Cross-Sectional / “Assaying Anomalies” Style Analysis¶
This code replicates the “Table 1” analysis found in many cross-sectional asset pricing papers: a univariate sort showing the alphas of portfolios (formed by sorting stocks along one dimension, hence “univariate”) against six different benchmarks.
The basic idea of this kind of table is:
1. We measure something about stocks. Maybe it’s their size, or their recent returns. Let’s call this variable X.
2. Then we sort stocks on X every month and divide them into 5 or 10 buckets. The stocks in each bucket constitute a portfolio, so we get the returns for that bucket for the next month. We also compute the “Long-Short” portfolio, which is always the return of the highest bucket minus the return of the lowest bucket. The long-short portfolio is what you would get if you shorted the stocks in the lowest bucket to fund a long position in the highest bucket, which makes it a “zero cost” portfolio.
3. We repeat this across time so we can see how each portfolio/bucket does and show the average return in each bucket.
4. To be more sophisticated, we do step 3 a few ways: we look at the portfolio returns compared to the market, and we also compute the portfolio alphas against some benchmark factor models.
If we see significant results in the High Minus Low column, practitioners will say that X is an “anomaly”, in that the asset pricing model can’t explain its returns.
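The sorting steps above can be sketched on made-up data. This is a minimal illustration, not the Assaying Anomalies code: the panel, the signal `X`, and the returns are all randomly generated, the portfolios are equal-weighted, and `ret` is assumed to already be the return earned in the month after `X` was observed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical panel: 200 stocks x 24 months, with a random signal X
# and the (already aligned) next-month return in percent.
months = pd.period_range("2020-01", periods=24, freq="M")
panel = pd.DataFrame({
    "month": months.repeat(200),
    "stock": np.tile(np.arange(200), 24),
    "X": rng.normal(size=24 * 200),
    "ret": rng.normal(loc=1.0, scale=5.0, size=24 * 200),
})

# Step 2: within each month, sort stocks on X into 5 buckets (1 = lowest X).
panel["bucket"] = (
    panel.groupby("month")["X"]
    .transform(lambda x: pd.qcut(x, 5, labels=False))
    .astype(int)
    + 1
)

# Equal-weighted return of each bucket in each month, one column per bucket.
ports = panel.groupby(["month", "bucket"])["ret"].mean().unstack("bucket")
ports.columns = [f"Port{b}" for b in ports.columns]

# Long-short portfolio: highest bucket minus lowest bucket.
ports["HmL"] = ports["Port5"] - ports["Port1"]

# Step 3: average return of each bucket over time.
print(ports.mean())
```

Because the signal here is pure noise, the average `HmL` return should be close to zero; with a real signal you would replace the random `X` and `ret` columns with your own data.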
The plan for this code file comes from the “Assaying Anomalies” project, which is a protocol to evaluate whether a given factor X is useful in predicting returns.
Here is the repo containing that project’s code (mostly in MATLAB as of Spring 2025, but will be in Python soon).
The paper describing that project, which you should read (it’s breezy) and cite, is: Novy-Marx, Robert and Velikov, Mihail, Assaying Anomalies (February 13, 2024). Available at SSRN: https://ssrn.com/abstract=4338007 or http://dx.doi.org/10.2139/ssrn.4338007
9.6.1. What “Table 1” looks like¶
In the code below, we try to replicate the structure of the table below, from Novy-Marx and Velikov.
# !pip install pandas-datareader # run this once, then comment
import pandas_datareader.famafrench as ff
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
# datasets = ff.get_available_datasets()
# datasets
9.6.2. Step 1: Get the factor portfolio returns.¶
# I think these are in the opensourceAP dataset, and you should grab them from that source instead, IMO
df_factors = ff.FamaFrenchReader('F-F_Research_Data_5_Factors_2x3', start='1900-01-01').read()[0]
# add momentum to this
mom = ff.FamaFrenchReader('F-F_Momentum_Factor', start='1900-01-01').read()[0] # add momentum
mom.columns = ['Mom'] # rename
df_factors = pd.merge(df_factors, mom, left_index=True, right_index=True)
df_factors # FYI: contains Mkt-RF and RF, but no Mkt
Date | Mkt-RF | SMB | HML | RMW | CMA | RF | Mom |
---|---|---|---|---|---|---|---|
1963-07 | -0.39 | -0.41 | -0.97 | 0.68 | -1.18 | 0.27 | 0.90 |
1963-08 | 5.07 | -0.80 | 1.80 | 0.36 | -0.35 | 0.25 | 1.01 |
1963-09 | -1.57 | -0.52 | 0.13 | -0.71 | 0.29 | 0.27 | 0.19 |
1963-10 | 2.53 | -1.39 | -0.10 | 2.80 | -2.01 | 0.29 | 3.12 |
1963-11 | -0.85 | -0.88 | 1.75 | -0.51 | 2.24 | 0.27 | -0.74 |
... | ... | ... | ... | ... | ... | ... | ... |
2024-08 | 1.61 | -3.65 | -1.13 | 0.85 | 0.86 | 0.48 | 4.79 |
2024-09 | 1.74 | -1.02 | -2.59 | 0.04 | -0.26 | 0.40 | -0.60 |
2024-10 | -0.97 | -0.88 | 0.89 | -1.38 | 1.03 | 0.39 | 2.87 |
2024-11 | 6.51 | 4.78 | -0.05 | -2.62 | -2.17 | 0.40 | 0.90 |
2024-12 | -3.17 | -3.87 | -2.95 | 1.82 | -1.10 | 0.37 | 0.05 |
738 rows × 7 columns
9.6.3. Step 2: Get your signal, then construct portfolio returns.¶
Note: I’m just picking easily available portfolios. Your work is here.
Upgrade: Make everything from here to the table a function. (So you can do decile or tercile splits, or change how splits are done, in one line of code.)
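One way the suggested upgrade might look is sketched below. The function name `alpha_table` and its arguments are hypothetical, and the demo data are random numbers rather than real factor or portfolio returns. It assumes the portfolio columns are ordered from the lowest bucket to the highest and that the factor data include an `RF` column, as in the factor table above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def alpha_table(port_rets, factors, factor_models):
    """Hypothetical helper: raw portfolio returns in, alpha/t-stat table out.

    port_rets     : raw returns, one column per bucket, ordered low to high.
    factors       : factor returns on the same index; must include 'RF'.
    factor_models : dict mapping model name -> right-hand side of a formula.
    """
    n = port_rets.shape[1]
    ports = port_rets.copy()
    ports.columns = [f"Port{i + 1}" for i in range(n)]
    hml = ports[f"Port{n}"] - ports["Port1"]      # long-short: RF cancels out
    rf = factors["RF"].reindex(ports.index)
    ports = ports.sub(rf, axis=0)                 # excess return of each bucket
    ports["HmL"] = hml
    reg_df = factors.join(ports, how="inner")

    index = pd.MultiIndex.from_product(
        [list(factor_models), ["alpha", "t-stat"]], names=["Model", "Metric"]
    )
    table = pd.DataFrame(index=index, columns=ports.columns, dtype=float)
    for port in ports.columns:
        for name, rhs in factor_models.items():
            fit = smf.ols(f"{port} ~ {rhs}", data=reg_df).fit()
            table.loc[(name, "alpha"), port] = fit.params["Intercept"]
            table.loc[(name, "t-stat"), port] = fit.tvalues["Intercept"]
    return table


# Demo on made-up data: three buckets (terciles) and two models.
rng = np.random.default_rng(1)
dates = pd.period_range("2000-01", periods=120, freq="M")
factors = pd.DataFrame(
    rng.normal(0.5, 4.0, size=(120, 2)), index=dates, columns=["Mkt-RF", "SMB"]
)
factors["RF"] = 0.2
raw = pd.DataFrame(rng.normal(1.0, 5.0, size=(120, 3)), index=dates)
models = {"r^e": "1", "CAPM": 'Q("Mkt-RF")'}

table = alpha_table(raw, factors, models)
print(table.round(3))
```

With a wrapper like this, switching from quintiles to deciles or terciles only changes how many columns you pass in; the naming, the long-short column, and the regressions follow automatically.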
# now, here we'd develop some "signal" and then create portfolio rets based on it
# I'm skipping... you figure that out
# I'll pretend I did that by grabbing 5 industry portfolio returns
df_portfolios = ff.FamaFrenchReader('5_Industry_Portfolios', start='1900-01-01').read()[0]
df_portfolios.columns = [f'Port{i+1}' for i in range(len(df_portfolios.columns))] # this is my anticipated portfolio number name scheme
df_portfolios.eval("HmL = Port5-Port1", inplace=True)
# Make each portfolio (except for HmL) excess returns
for col in ['Port1', 'Port2', 'Port3', 'Port4', 'Port5']:
    df_portfolios[col] = df_portfolios[col] - df_factors['RF']
portfolios = df_portfolios.columns.tolist()
df_portfolios
9.6.4. Step 3: Run the regressions.¶
reg_df = pd.merge(df_factors, df_portfolios, left_index=True, right_index=True)
reg_df
Date | Mkt-RF | SMB | HML | RMW | CMA | RF | Mom | Port1 | Port2 | Port3 | Port4 | Port5 | HmL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1975-01 | 13.66 | 12.91 | 8.28 | -0.78 | -0.90 | 0.58 | -13.82 | 21.19 | 11.94 | 12.90 | -1.32 | 17.31 | -3.88 |
1975-02 | 5.56 | -0.65 | -4.45 | 1.16 | -2.11 | 0.43 | -0.61 | 4.23 | 4.63 | 9.34 | 15.61 | 2.44 | -1.79 |
1975-03 | 2.66 | 4.00 | 2.38 | 1.26 | -1.33 | 0.41 | -2.04 | 7.75 | 1.40 | -0.17 | 0.88 | 3.74 | -4.01 |
1975-04 | 4.23 | -0.71 | -1.14 | 1.41 | -1.34 | 0.44 | 1.38 | 2.75 | 6.82 | 2.53 | 0.21 | 2.83 | 0.08 |
1975-05 | 5.19 | 2.89 | -4.10 | -0.98 | -0.60 | 0.44 | -0.58 | 3.60 | 5.95 | 4.56 | 6.08 | 5.28 | 1.68 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2024-08 | 1.61 | -3.65 | -1.13 | 0.85 | 0.86 | 0.48 | 4.79 | 0.57 | 0.70 | 0.86 | 5.96 | 2.55 | 1.98 |
2024-09 | 1.74 | -1.02 | -2.59 | 0.04 | -0.26 | 0.40 | -0.60 | 4.01 | 1.54 | 2.69 | -2.21 | 0.26 | -3.75 |
2024-10 | -0.97 | -0.88 | 0.89 | -1.38 | 1.03 | 0.39 | 2.87 | -2.08 | -2.38 | -0.48 | -3.50 | 0.72 | 2.80 |
2024-11 | 6.51 | 4.78 | -0.05 | -2.62 | -2.17 | 0.40 | 0.90 | 10.52 | 6.01 | 4.76 | -0.98 | 9.83 | -0.69 |
2024-12 | -3.17 | -3.87 | -2.95 | 1.82 | -1.10 | 0.37 | 0.05 | -0.82 | -8.38 | 0.56 | -5.79 | -7.46 | -6.64 |
600 rows × 13 columns
# define factor model formulas (these are the right-hand sides of the regression formulas)
# this is how formulas are specified for the statsmodels formula API
factor_models = {
    'r^e': '1',
    'CAPM': 'Q("Mkt-RF")',
    'FF3': 'Q("Mkt-RF") + SMB + HML',
    'FF4': 'Q("Mkt-RF") + SMB + HML + Mom',
    'FF5': 'Q("Mkt-RF") + SMB + HML + RMW + CMA',
    'FF6': 'Q("Mkt-RF") + SMB + HML + RMW + CMA + Mom'
}
# pre built output table
index = pd.MultiIndex.from_product([factor_models.keys(), ['alpha', 't-stat']], names=['Model', 'Metric'])
results = pd.DataFrame(index=index, columns=portfolios, dtype=float)
results
Model | Metric | Port1 | Port2 | Port3 | Port4 | Port5 | HmL |
---|---|---|---|---|---|---|---|
r^e | alpha | NaN | NaN | NaN | NaN | NaN | NaN |
r^e | t-stat | NaN | NaN | NaN | NaN | NaN | NaN |
CAPM | alpha | NaN | NaN | NaN | NaN | NaN | NaN |
CAPM | t-stat | NaN | NaN | NaN | NaN | NaN | NaN |
FF3 | alpha | NaN | NaN | NaN | NaN | NaN | NaN |
FF3 | t-stat | NaN | NaN | NaN | NaN | NaN | NaN |
FF4 | alpha | NaN | NaN | NaN | NaN | NaN | NaN |
FF4 | t-stat | NaN | NaN | NaN | NaN | NaN | NaN |
FF5 | alpha | NaN | NaN | NaN | NaN | NaN | NaN |
FF5 | t-stat | NaN | NaN | NaN | NaN | NaN | NaN |
FF6 | alpha | NaN | NaN | NaN | NaN | NaN | NaN |
FF6 | t-stat | NaN | NaN | NaN | NaN | NaN | NaN |
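A note on the `Q("Mkt-RF")` wrapper in the formulas above: inside a formula, `-` is an operator (it removes a term), so a column whose name contains a hyphen has to be quoted with `Q(...)` to be read as a single variable. The sketch below illustrates this on made-up data; the DataFrame `demo` and the coefficients 0.3 and 1.2 are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 300

# Made-up data: one factor column with a hyphen in its name, and a
# portfolio whose true alpha is 0.3 and true market beta is 1.2.
demo = pd.DataFrame({"Mkt-RF": rng.normal(0.5, 4.0, n)})
demo["Port1"] = 0.3 + 1.2 * demo["Mkt-RF"] + rng.normal(0, 1.0, n)

# Q("...") tells the formula parser to treat "Mkt-RF" as one column name.
fit = smf.ols('Port1 ~ Q("Mkt-RF")', data=demo).fit()

alpha = fit.params["Intercept"]
slope = fit.params.drop("Intercept").iloc[0]  # the only non-intercept coefficient
print(fit.params)
```

An alternative is to rename the column (for example to `MKT`) before running regressions, which avoids the quoting altogether.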
# Run regressions for each portfolio and model
full_reg_outout = {} # to save everything, in case we want to access other stuff (like beta loadings)
for portfolio in portfolios:
    for model_name, formula in factor_models.items():
        reg = smf.ols(formula=f'{portfolio} ~ {formula}', data=reg_df).fit()
        # extract the intercept coef and t-stat
        alpha = reg.params['Intercept']
        t_stat = reg.tvalues['Intercept']
        results.at[(model_name, 'alpha'), portfolio] = alpha
        results.at[(model_name, 't-stat'), portfolio] = t_stat
        full_reg_outout[(model_name, portfolio)] = reg
results
Model | Metric | Port1 | Port2 | Port3 | Port4 | Port5 | HmL |
---|---|---|---|---|---|---|---|
r^e | alpha | 0.661057 | 0.557900 | 0.646938 | 0.681369 | 0.597724 | -0.063333 |
r^e | t-stat | 3.900987 | 3.433497 | 3.230902 | 3.859817 | 3.052126 | -0.647961 |
CAPM | alpha | 0.114768 | 0.038745 | 0.010464 | 0.203102 | -0.043658 | -0.158426 |
CAPM | t-stat | 1.630695 | 0.550711 | 0.118044 | 1.773375 | -0.584753 | -1.670608 |
FF3 | alpha | 0.091171 | -0.052544 | 0.137807 | 0.302106 | -0.186326 | -0.277498 |
FF3 | t-stat | 1.290888 | -0.815539 | 1.721098 | 2.730282 | -3.023698 | -3.132967 |
FF4 | alpha | 0.124516 | -0.093283 | 0.208553 | 0.235236 | -0.152898 | -0.277413 |
FF4 | t-stat | 1.733414 | -1.427806 | 2.585315 | 2.095277 | -2.442501 | -3.067951 |
FF5 | alpha | -0.074883 | -0.184758 | 0.318594 | 0.133618 | -0.180963 | -0.106080 |
FF5 | t-stat | -1.142738 | -2.948850 | 4.167975 | 1.209001 | -2.945735 | -1.222041 |
FF6 | alpha | -0.032642 | -0.206952 | 0.362788 | 0.093130 | -0.155887 | -0.123244 |
FF6 | t-stat | -0.496089 | -3.265914 | 4.717103 | 0.833356 | -2.511528 | -1.400810 |
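Since every fitted regression is saved in a dictionary keyed by (model, portfolio), other quantities such as the beta loadings can be pulled out afterwards. The sketch below shows the idea on made-up data: the factor names `MKT` and `SMB`, the model label `"FF2"`, and the dictionary `fits` are hypothetical stand-ins, not the objects built above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 240

# Made-up factors and two portfolios with known loadings.
df = pd.DataFrame({"MKT": rng.normal(0.6, 4.5, n), "SMB": rng.normal(0.2, 3.0, n)})
df["Port1"] = 0.1 + 0.9 * df["MKT"] + 0.5 * df["SMB"] + rng.normal(0, 1.0, n)
df["Port2"] = 0.3 + 1.1 * df["MKT"] - 0.2 * df["SMB"] + rng.normal(0, 1.0, n)

# Save each fitted model under a (model name, portfolio) key.
fits = {}
for port in ["Port1", "Port2"]:
    fits[("FF2", port)] = smf.ols(f"{port} ~ MKT + SMB", data=df).fit()

# Collect the non-intercept coefficients: rows are factors, columns are portfolios.
betas = pd.DataFrame(
    {port: fit.params.drop("Intercept") for (_, port), fit in fits.items()}
)
print(betas.round(2))
```

The same pattern works for other attributes of a fitted statsmodels result, such as `tvalues` or `rsquared`.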