9.7. Open Asset Pricing
An impressive project that provides portfolio returns for hundreds of asset pricing anomalies (at the monthly or daily level), along with stock-level signals. It lets any student get close to the frontier of asset pricing faster than ever, because it makes the essential data simple to obtain. The site also has plenty of background info and a companion paper.
Using this dataset, you can replicate many classic asset pricing papers.
You can also hunt for ways to use signals to create portfolios that make money.
You should check out:
The website for the project: https://www.openassetpricing.com, which includes
info on the Data page
a list of signals and info about them
Code they used to make the dataset
Sample code showing some ways to use the data
A partial list of academic studies using the dataset. Looking at this will give you more ideas about what’s possible
The Python package that downloads the data, which makes the dataset easy to use. Some examples of using this package:
START HERE, a must: quick tour
Combining it with stock returns from CRSP (where pros get stock returns, not yfinance)
Using it with Fama-French factors and testing portfolio anomalies
Below is a quick tour that shows how to:
Get the list of signals (Signal Doc) we can download
Download the data (in one line!)
Merge with Fama-French Factors
Plot the cumulative returns to one anomaly, along with info about when the anomaly was discovered.
Exercises:
Plot the cumulative excess returns to one anomaly, along with info about when the anomaly was discovered.
Plot multiple anomalies as cumulative returns.
Upgrade: Plot log returns.
Upgrade: Integrate the publication date for each anomaly. There are clever ways to integrate this without doing multiple vertical lines!
Plot multiple anomalies as cumulative excess returns.
Upgrade: Integrate the publication date for each anomaly. There are clever ways to integrate this without doing multiple vertical lines!
Plot the moving average returns to one anomaly.
Upgrade: Do as excess returns.
Upgrade: Integrate the publication date for it. The idea is to see if the return for this anomaly is lower after publication… did the market incorporate this signal?
Plot the moving average returns to multiple anomalies.
Upgrade: Do as excess returns.
Upgrade: Integrate the publication date for each anomaly. There are clever ways to integrate this without doing multiple vertical lines!
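The moving-average and log-return upgrades above can be tried on made-up data before touching the real dataset. The sketch below is only one possible approach: the signal names `A` and `B`, the random returns, and the 12-month window are assumptions chosen for illustration. Only the `date` / `signalname` / `ret` column layout (with `ret` in percent) follows the code later in this section.

```python
import numpy as np
import pandas as pd

# Made-up monthly percent returns in a long layout (date, signalname, ret).
# Signal names "A" and "B" are hypothetical placeholders.
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=60, freq="MS")
toy = pd.DataFrame({
    "date": np.tile(dates, 2),
    "signalname": np.repeat(["A", "B"], len(dates)),
    "ret": rng.normal(0.5, 3.0, size=2 * len(dates)),
})
toy = toy.sort_values(["signalname", "date"])

# 12-month moving average of returns, computed separately for each signal
toy["ret_ma12"] = toy.groupby("signalname")["ret"].transform(
    lambda s: s.rolling(12).mean()
)

# Cumulative log return: running sum of log(1 + r), with r converted from percent
toy["cum_logret"] = np.log1p(toy["ret"] / 100).groupby(toy["signalname"]).cumsum()

# Cumulative gross return; this equals exp(cumulative log return)
toy["cret"] = (1 + toy["ret"] / 100).groupby(toy["signalname"]).cumprod()
```

The first 11 months of each signal have no moving average because the window is not yet full. Plotting `cum_logret` instead of `cret` keeps early and late periods visually comparable, since equal percentage moves take up equal vertical distance.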
# !pip install -U openassetpricing
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import zipfile
from datetime import datetime
import openassetpricing as openap
# -------------------- Parameters --------------------
signallist = ['IndIPO','BM']
# -------------------- Initialize OpenAP and Download Signal Doc --------------------
openap_obj = openap.OpenAP()
signaldoc = openap_obj.dl_signal_doc('pandas')
print("Available signals:")
print(signaldoc[['Acronym', 'Authors', 'LongDescription']].head(20))
# -------------------- Download Portfolio Data --------------------
# Download OSAP portfolio returns for the signals in signallist
port_osap = openap_obj.dl_port('op', 'pandas', signallist)
port_osap['date'] = pd.to_datetime(port_osap['date'])
# Reduce to LS portfolios (openap provides multiple portfolio versions for each anomaly on each date)
port_osap.query('port== "LS"', inplace=True)
# -------------------- Add Mkt-Rf returns --------------------
import pandas_datareader
pandas_datareader.famafrench.get_available_datasets()
# Developed_3_Factors
# download F-F_Research_Data_Factors
ff = pandas_datareader.get_data_famafrench('F-F_Research_Data_Factors',start=1950,end=2024)[0]
ff = ff.reset_index().rename(columns={'Date':'yearm'})
# convert ff['yearm'] from period[M] to a YYYYMM number, to merge with port_osap
ff['yearm'] = ff['yearm'].dt.year * 100 + ff['yearm'].dt.month
# Create a mapping from dates in port_osap to a 'yearm' format (e.g. 202103 for March 2021)
date_mapping = port_osap[['date']].drop_duplicates().copy()
date_mapping['yearm'] = date_mapping['date'].dt.year * 100 + date_mapping['date'].dt.month
date_mapping, ff
# Merge into the FF data the port_osap dates for each month (last trading date of that month/year)
ff = pd.merge(ff, date_mapping, on='yearm', how='inner')
# For our purposes, we only need the date and the market excess return (Mkt-RF)
ff_processed = ff[['date', 'Mkt-RF']].copy()
ff_processed.rename(columns={'Mkt-RF': 'ret'}, inplace=True)
ff_processed['signalname'] = 'mkt_rf'
ff_processed
# -------------------- Combine Data -------------------- #
# NOTE: this drops all variables except date, ret, and signalname
port_osap = pd.concat([port_osap[['date', 'ret', 'signalname']],
                       ff_processed[['date', 'ret', 'signalname']]], ignore_index=True)
display(port_osap.groupby('signalname').describe()['ret'])
Available signals:
Acronym | Authors | LongDescription
---|---|---
AbnormalAccruals | Xie | Abnormal Accruals
Accruals | Sloan | Accruals
AccrualsBM | Bartov and Kim | Book-to-market and accruals
Activism1 | Cremers and Nair | Takeover vulnerability
AM | Fama and French | Total assets to market
AnalystRevision | Hawkins, Chamberlin, Daniel | EPS forecast revision
AnnouncementReturn | Chan, Jegadeesh and Lakonishok | Earnings announcement return
AssetGrowth | Cooper, Gulen and Schill | Asset growth
BetaLiquidityPS | Pastor and Stambaugh | Pastor-Stambaugh liquidity beta
BetaTailRisk | Kelly and Jiang | Tail risk beta
betaVIX | Ang et al. | Systematic volatility
BM | Stattman | Book to market, original (Stattman 1980)
BMdec | Fama and French | Book to market using December ME
BookLeverage | Fama and French | Book leverage (annual)
BPEBM | Penman, Richardson and Tuna | Leverage component of BM
Cash | Palazzo | Cash to assets
CashProd | Chandrashekar and Rao | Cash Productivity
CBOperProf | Ball et al. | Cash-based operating profitability
CF | Lakonishok, Shleifer, Vishny | Cash flow to market
cfp | Desai, Rajgopal, Venkatachalam | Operating Cash flows to price
Data is downloaded: 8s
signalname | count | mean | min | 25% | 50% | 75% | max | std
---|---|---|---|---|---|---|---|---
BM | 870.0 | 0.674195 | -16.609171 | -1.605303 | 0.480837 | 2.524949 | 33.273039 | 4.122496
IndIPO | 584.0 | 0.442920 | -18.936894 | -1.079318 | 0.351437 | 2.080434 | 15.404382 | 3.382486
mkt_rf | 870.0 | 0.642598 | -23.240000 | -1.940000 | 1.025000 | 3.422500 | 16.100000 | 4.357549
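With the market series stacked into the same long table, one way to get the excess returns the exercises ask for is to pivot to one column per series and subtract the market column. The sketch below assumes "excess return" means the anomaly return minus `mkt_rf`, which is an interpretation and not something the dataset prescribes. The numbers are invented; only the column layout matches the combined frame above.

```python
import pandas as pd

# Invented long-format data shaped like the combined frame above
# (columns: date, signalname, ret). The return values are made up.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"] * 3),
    "signalname": ["BM"] * 3 + ["IndIPO"] * 3 + ["mkt_rf"] * 3,
    "ret": [1.0, -2.0, 0.5, 0.2, 0.4, -0.6, 0.5, -8.0, -13.0],
})

# One column per series, one row per date
wide = toy.pivot(index="date", columns="signalname", values="ret")

# Subtract the market column from every anomaly column, row by row
# (assumption: "excess" is measured relative to mkt_rf)
excess = wide.drop(columns="mkt_rf").sub(wide["mkt_rf"], axis=0)

# Back to long format so it can be plotted like the raw returns
excess_long = excess.reset_index().melt(
    id_vars="date", var_name="signalname", value_name="excess_ret"
)
```

Working in wide format keeps the date alignment explicit: any month missing from either series shows up as `NaN` instead of being silently mismatched.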
def plot_anomaly(signal, years_presamp=15, plot_end=2023):
    '''
    Plot the cumulative long-short return of a given signal, with vertical lines
    indicating the publication date and sample end date of the signal's paper.
    Parameters:
        signal (str): The signal to plot.
        years_presamp (int): The number of years to go back from the sample end date.
            If None, use the sample start date for the publication.
        plot_end (int): The year to end the plot. If None, use the most recent year in the data.
    '''
    # -------------------- Get Signal Documentation for the signal --------------------
    doctarget = signaldoc[signaldoc['Acronym'] == signal].iloc[0]
    # Publication date is the December 31 of the paper's year
    doctarget_pubdate = pd.to_datetime(str(doctarget['Year']) + '-12-31')
    # Publication's Sample end date is the December 31 of the SampleEndYear
    doctarget_sampend = pd.to_datetime(str(doctarget['SampleEndYear']) + '-12-31')
    # Build a paper name for annotation purposes
    papername = f"{doctarget['Authors']} {doctarget['Year']} ({signal})"
    if years_presamp is None:
        start_date = pd.to_datetime(str(doctarget['SampleStartYear']) + '-12-31')
    else:
        start_date = doctarget_sampend - pd.DateOffset(years=years_presamp)
    if plot_end is None:
        # Fall back to the most recent date in the data, as the docstring promises
        end_date = port_osap['date'].max()
    else:
        end_date = pd.to_datetime(str(plot_end) + '-12-31')
    # -------------------- Filter Data --------------------
    plotme = port_osap[(port_osap['date'] >= start_date) & (port_osap['date'] <= end_date)].copy()
    plotme = plotme[plotme['signalname'] == signal]
    # Returns are in percent, so convert before compounding
    plotme['cret'] = (1 + plotme['ret']/100).cumprod()
    # Determine a y-axis location for annotating the vertical lines (75% of the way from 1 to the max)
    yloc = 1 + (plotme['cret'].max() - 1) * 0.75
    # -------------------- Plot Using Seaborn --------------------
    plt.figure(figsize=(6, 4))
    # Label the line so the legend call below has something to show
    sns.lineplot(data=plotme, x='date', y='cret', linewidth=1.2, label=signal)
    plt.xlabel('')
    plt.ylabel('Cumulative Long-Short Return')
    # Skip the paper annotations for the market series (named 'mkt_rf' above)
    if signal != 'mkt_rf':
        # Add a red vertical line at the publication date and annotate it
        plt.axvline(doctarget_pubdate, color='red')
        plt.text(doctarget_pubdate, yloc, "\n" + papername + " Published",
                 color="red", rotation=90, verticalalignment='center')
        # Add a blue vertical line at the sample end date and annotate it
        plt.axvline(doctarget_sampend, color='blue')
        plt.text(doctarget_sampend, yloc, "\n" + papername + " Sample Ends",
                 color="blue", rotation=90, verticalalignment='center')
    plt.legend(title='')
    plt.tight_layout()
    plt.show()
plot_anomaly('IndIPO',plot_end=2000)
plot_anomaly('BM', plot_end=2000)
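For the exercises that ask for several anomalies on one chart, one "clever way" to show publication dates without a vertical line per anomaly is to plot in event time, with years since publication on the x-axis, so a single line at zero marks publication for every series. The sketch below is a suggestion and not the only approach: the signals `A` and `B`, their returns, and their publication years are all invented. With the real data, the returns would come from `port_osap` and the years from the `Year` column of `signaldoc`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented monthly percent returns for two hypothetical signals,
# plus invented publication years (stand-ins for signaldoc['Year'])
rng = np.random.default_rng(1)
dates = pd.date_range("1980-01-01", periods=360, freq="MS")
toy = pd.DataFrame({
    "date": np.tile(dates, 2),
    "signalname": np.repeat(["A", "B"], len(dates)),
    "ret": rng.normal(0.4, 3.0, size=2 * len(dates)),
})
pub_year = pd.Series({"A": 1992, "B": 2001})

# Event time: years elapsed since December 31 of each signal's publication year
pub_date = pd.to_datetime(toy["signalname"].map(pub_year).astype(str) + "-12-31")
toy["years_since_pub"] = (toy["date"] - pub_date).dt.days / 365.25

# Cumulative return per signal, rescaled to equal 1 in the last pre-publication month
toy = toy.sort_values(["signalname", "date"])
toy["cret"] = (1 + toy["ret"] / 100).groupby(toy["signalname"]).cumprod()
base = toy[toy["years_since_pub"] <= 0].groupby("signalname")["cret"].last()
toy["cret_rel"] = toy["cret"] / toy["signalname"].map(base)

fig, ax = plt.subplots(figsize=(6, 4))
for name, grp in toy.groupby("signalname"):
    ax.plot(grp["years_since_pub"], grp["cret_rel"], label=name)
ax.axvline(0, color="red")  # one line marks publication for every signal
ax.set_xlabel("Years since publication")
ax.set_ylabel("Cumulative return (1 at publication)")
ax.legend()
```

Call `plt.show()` to display the figure outside a notebook. Rescaling each series to 1 at publication makes the post-publication slopes directly comparable, which is the point of the "did the market incorporate this signal?" question.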