9.7. Open Asset Pricing

Open Asset Pricing is an incredible project that provides portfolio returns for hundreds of asset pricing anomalies (at the monthly or daily level), along with stock-level signals. Because it makes the essential data simple to get, any student can get close to the frontier of asset pricing faster than ever! The site also has lots of background info and a companion paper.

Using this dataset, you can replicate many classic asset pricing papers.

You can also hunt for ways to use signals to create portfolios that make money.

You should check out:

  1. The website for the project: https://www.openassetpricing.com, which includes

    • info on the Data page

    • a list of signals and info about them

    • Code they used to make the dataset

    • Sample code showing some ways to use the data

    • A partial list of academic studies using the dataset. Looking at this will give you more ideas about what’s possible.

  2. The Python package (openassetpricing), which downloads the data straight into Python and makes it easy to use. Some examples of using this package:

    1. START HERE, a must: quick tour

    2. Combining it with stock returns from CRSP (where pros get stock returns, not yfinance)

    3. Using it with Fama-French factors and testing portfolio anomalies

    4. ⭐ Machine Learning example ⭐

    5. More demos, in R though

Below is a quick tour showing how to:

  • Get the list of signals (Signal Doc) we can download

  • Download the data (in one line!)

  • Merge with Fama-French Factors

  • Plot the cumulative returns to one anomaly, along with info about when the anomaly was discovered.

Exercises:

  1. Plot the cumulative excess returns to one anomaly, along with info about when the anomaly was discovered.

  2. Plot multiple anomalies as cumulative returns.

    • Upgrade: Plot log returns.

    • Upgrade: Integrate the publication date for each anomaly. There are clever ways to integrate this without doing multiple vertical lines!

  3. Plot multiple anomalies as cumulative excess returns.

    • Upgrade: Integrate the publication date for each anomaly. There are clever ways to integrate this without doing multiple vertical lines!

  4. Plot the moving average returns to one anomaly.

    • Upgrade: Do as excess returns.

    • Upgrade: Integrate the publication date for it. The idea is to see if the return for this anomaly is lower after publication… did the market incorporate this signal?

  5. Plot the moving average returns to multiple anomalies.

    • Upgrade: Do as excess returns.

    • Upgrade: Integrate the publication date for each anomaly. There are clever ways to integrate this without doing multiple vertical lines!
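
As a starting point for the exercises, the sketch below shows the two transformations they rely on: compounding percent returns into a cumulative return, and smoothing returns with a rolling mean. It runs on made-up numbers. The helper names (`add_cumulative_return`, `add_rolling_mean`) and the window length are illustrative choices, not part of the openassetpricing package. It assumes a long-format table with `date`, `ret` (in percent), and `signalname` columns, like the one built in the code below.

```python
import numpy as np
import pandas as pd

def add_cumulative_return(df, log=False):
    """Add 'cret': growth of $1 within each signal ('ret' is in percent)."""
    out = df.sort_values(['signalname', 'date']).copy()
    gross = 1 + out['ret'] / 100
    out['cret'] = gross.groupby(out['signalname']).cumprod()
    if log:
        out['cret'] = np.log(out['cret'])
    return out

def add_rolling_mean(df, window=12):
    """Add 'ret_ma': rolling mean of 'ret' within each signal."""
    out = df.sort_values(['signalname', 'date']).copy()
    out['ret_ma'] = (out.groupby('signalname')['ret']
                        .transform(lambda s: s.rolling(window).mean()))
    return out

# Made-up monthly returns (percent) for two hypothetical signals, A and B
dates = pd.to_datetime(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30'])
toy = pd.DataFrame({
    'date': list(dates) * 2,
    'ret': [10.0, -10.0, 5.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    'signalname': ['A'] * 4 + ['B'] * 4,
})
toy = add_cumulative_return(toy)
toy = add_rolling_mean(toy, window=2)   # short window only because the toy data is short
print(toy)
```

Note that a +10% month followed by a -10% month leaves signal A at 0.99, not 1.00, which is why returns are compounded rather than summed.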

# !pip install -U openassetpricing
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import zipfile
from datetime import datetime
import openassetpricing as openap

# -------------------- Parameters --------------------
signallist = ['IndIPO','BM']

# -------------------- Initialize OpenAP and Download Signal Doc --------------------
openap_obj = openap.OpenAP()  
signaldoc = openap_obj.dl_signal_doc('pandas')

print("Available signals:")
print(signaldoc[['Acronym', 'Authors', 'LongDescription']].head(20))

# -------------------- Download Portfolio Data --------------------
# Download OSAP portfolio returns for the signals in signallist
port_osap = openap_obj.dl_port('op', 'pandas', signallist)
port_osap['date'] = pd.to_datetime(port_osap['date'])

# Reduce to LS portfolios (openap provides multiple portfolio versions for each anomaly on each date)
port_osap.query('port== "LS"', inplace=True)

# -------------------- Add Mkt-Rf returns --------------------

import pandas_datareader 
pandas_datareader.famafrench.get_available_datasets()
# Developed_3_Factors

# download F-F_Research_Data_Factors
ff = pandas_datareader.get_data_famafrench('F-F_Research_Data_Factors',start=1950,end=2024)[0]
ff = ff.reset_index().rename(columns={'Date':'yearm'})

# convert ff['yearm'] from period[M] to a YYYYMM integer, to merge with port_osap
ff['yearm'] = ff['yearm'].dt.year * 100 + ff['yearm'].dt.month

# Create a mapping from dates in port_osap to a 'yearm' format (e.g. 202103 for March 2021)
date_mapping = port_osap[['date']].drop_duplicates().copy()
date_mapping['yearm'] = date_mapping['date'].dt.year * 100 + date_mapping['date'].dt.month
date_mapping, ff

# Merge the port_osap dates (last trading date of each month) into the FF data
ff = pd.merge(ff, date_mapping, on='yearm', how='inner')

# For our purposes, we only need the date and the market excess return (Mkt-RF)
ff_processed = ff[['date', 'Mkt-RF']].copy()
ff_processed.rename(columns={'Mkt-RF': 'ret'}, inplace=True)
ff_processed['signalname'] = 'mkt_rf'
ff_processed

# -------------------- Combine Data -------------------- #
# NOTE: this drops all variables except date, ret, and signalname 

port_osap = pd.concat([port_osap[['date', 'ret', 'signalname']],
              ff_processed[['date', 'ret', 'signalname']]], ignore_index=True)

display(port_osap.groupby('signalname').describe()['ret'])
Available signals:
               Acronym                         Authors  \
0     AbnormalAccruals                             Xie   
1             Accruals                           Sloan   
2           AccrualsBM                  Bartov and Kim   
3            Activism1                Cremers and Nair   
4                   AM                 Fama and French   
5      AnalystRevision     Hawkins, Chamberlin, Daniel   
6   AnnouncementReturn  Chan, Jegadeesh and Lakonishok   
7          AssetGrowth        Cooper, Gulen and Schill   
8      BetaLiquidityPS            Pastor and Stambaugh   
9         BetaTailRisk                 Kelly and Jiang   
10             betaVIX                      Ang et al.   
11                  BM                        Stattman   
12               BMdec                 Fama and French   
13        BookLeverage                 Fama and French   
14               BPEBM     Penman, Richardson and Tuna   
15                Cash                         Palazzo   
16            CashProd           Chandrashekar and Rao   
17          CBOperProf                     Ball et al.   
18                  CF    Lakonishok, Shleifer, Vishny   
19                 cfp  Desai, Rajgopal, Venkatachalam   

                             LongDescription  
0                          Abnormal Accruals  
1                                   Accruals  
2                Book-to-market and accruals  
3                     Takeover vulnerability  
4                     Total assets to market  
5                      EPS forecast revision  
6               Earnings announcement return  
7                               Asset growth  
8            Pastor-Stambaugh liquidity beta  
9                             Tail risk beta  
10                     Systematic volatility  
11  Book to market, original (Stattman 1980)  
12          Book to market using December ME  
13                    Book leverage (annual)  
14                  Leverage component of BM  
15                            Cash to assets  
16                         Cash Productivity  
17        Cash-based operating profitability  
18                       Cash flow to market  
19             Operating Cash flows to price  

Data is downloaded: 8s
            count      mean         min        25%       50%       75%        max       std
signalname
BM          870.0  0.674195  -16.609171  -1.605303  0.480837  2.524949  33.273039  4.122496
IndIPO      584.0  0.442920  -18.936894  -1.079318  0.351437  2.080434  15.404382  3.382486
mkt_rf      870.0  0.642598  -23.240000  -1.940000  1.025000  3.422500  16.100000  4.357549
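
The date alignment used in the code above can be checked in isolation. The sketch below uses made-up numbers: a factor table indexed by monthly periods (the period[M] form noted in the code comments) gets an integer YYYYMM key and is merged onto month-end trading dates. The names `factors` and `trading_dates` are illustrative only.

```python
import pandas as pd

# Toy factor table indexed by monthly periods (values are made up)
factors = pd.DataFrame(
    {'Mkt-RF': [1.5, -0.7, 2.1]},
    index=pd.period_range('2021-01', periods=3, freq='M', name='Date'),
)
factors = factors.reset_index().rename(columns={'Date': 'yearm'})
# Period -> integer key, e.g. 2021-03 becomes 202103
factors['yearm'] = factors['yearm'].dt.year * 100 + factors['yearm'].dt.month

# Toy trading dates: the last trading day need not be the calendar month end
trading_dates = pd.DataFrame({'date': pd.to_datetime(['2021-01-29', '2021-02-26'])})
trading_dates['yearm'] = (trading_dates['date'].dt.year * 100
                          + trading_dates['date'].dt.month)

# validate= raises an error if either side has duplicate keys
merged = pd.merge(factors, trading_dates, on='yearm', how='inner',
                  validate='one_to_one')
print(merged[['date', 'yearm', 'Mkt-RF']])
```

An inner merge silently drops months that appear in only one table (March 2021 here), so comparing row counts before and after the merge is a cheap sanity check.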
def plot_anomaly(signal, years_presamp=15, plot_end=2023):
    '''
    Plot the cumulative long-short return of a given signal, with vertical lines
    indicating the publication date and sample end date of the signal's paper.
    
    Parameters:
    signal (str): The signal to plot.
    years_presamp (int): The number of years to go back from the sample end date. 
                         If None, use the sample start date for the publication.
    plot_end (int): The year to end the plot. If None, use the most recent year in the data.
    '''

    # -------------------- Get Signal Documentation for the signal --------------------
    doctarget = signaldoc[signaldoc['Acronym'] == signal].iloc[0]
    # Publication date is the December 31 of the paper's year
    doctarget_pubdate = pd.to_datetime(str(doctarget['Year']) + '-12-31')
    # Publication's Sample end date is the December 31 of the SampleEndYear
    doctarget_sampend = pd.to_datetime(str(doctarget['SampleEndYear']) + '-12-31')
    # Build a paper name for annotation purposes
    papername = f"{doctarget['Authors']} {doctarget['Year']} ({signal})"
    
    if years_presamp is None:
        start_date = pd.to_datetime(str(doctarget['SampleStartYear']) + '-12-31')
    else:
        start_date = doctarget_sampend - pd.DateOffset(years=years_presamp)
        
    end_date = pd.to_datetime(str(plot_end) + '-12-31')
        
        
    # -------------------- Filter Data --------------------
    plotme = port_osap[(port_osap['date'] >= start_date) & (port_osap['date'] <= end_date)].copy()
    plotme = plotme[plotme['signalname'] == signal]
    plotme['cret'] = (1 + plotme['ret']/100).cumprod()
    
    # Determine a y-axis location for annotating the vertical lines (75% of the way from 1 to the max)
    yloc = 1 + (plotme['cret'].max() - 1) * 0.75
        
    # -------------------- Plot Using Seaborn --------------------
    plt.figure(figsize=(6, 4))
    sns.lineplot(data=plotme, x='date', y='cret', linewidth=1.2)
    plt.xlabel('')
    plt.ylabel('Cumulative Long-Short Return')

    if signal not in ['Mkt-Rf']:
        # Add a red vertical line at the publication date and annotate it
        plt.axvline(doctarget_pubdate, color='red')
        plt.text(doctarget_pubdate, yloc, "\n" + papername + " Published",
                color="red", rotation=90, verticalalignment='center')

        # Add a blue vertical line at the sample end date and annotate it
        plt.axvline(doctarget_sampend, color='blue')
        plt.text(doctarget_sampend, yloc, "\n" + papername + " Sample Ends",
                color="blue", rotation=90, verticalalignment='center')

    # No legend: the plot has a single unlabeled line, so plt.legend() would only emit a warning
    plt.tight_layout()
    plt.show()
    
plot_anomaly('IndIPO',plot_end=2000)
plot_anomaly('BM', plot_end=2000)
[Figure: cumulative long-short return for IndIPO, with the publication and sample-end dates marked (05e_OpenAP_anomaly_plot_4_1.png)]
[Figure: cumulative long-short return for BM, with the publication and sample-end dates marked (05e_OpenAP_anomaly_plot_4_3.png)]
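
To check whether returns fall after publication, as the exercises suggest, one option is to label each month by where it falls relative to the paper's sample end and publication year, then compare mean returns by label. The sketch below does this on made-up data. `label_period` is a hypothetical helper, and its year arguments stand in for the `SampleEndYear` and `Year` columns of the signal documentation.

```python
import pandas as pd

def label_period(dates, samp_end_year, pub_year):
    """Label each date as 'in-sample', 'post-sample', or 'post-publication'."""
    years = pd.Series(dates).dt.year
    labels = pd.Series('in-sample', index=years.index)
    labels[years > samp_end_year] = 'post-sample'
    labels[years > pub_year] = 'post-publication'   # overrides 'post-sample'
    return labels

# Made-up returns (percent) for one hypothetical anomaly
toy = pd.DataFrame({
    'date': pd.to_datetime(['1990-06-30', '1995-06-30', '1998-06-30', '2005-06-30']),
    'ret': [1.2, 0.8, 0.5, 0.1],
})
toy['period'] = label_period(toy['date'], samp_end_year=1996, pub_year=2000).values

# Mean return in each period
period_means = toy.groupby('period')['ret'].mean()
print(period_means)
```

A column like `period` could also be passed as `hue` to `sns.lineplot` to color-code the segments of a line, which is one possible way to show publication timing without drawing vertical lines.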