{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Rolling returns\n", "\n", "```{admonition} Takeaway\n", ":class: note\n", "\n", "When you want rolling returns over some calendar unit (hours, days, weeks, months, year), in realistic data with many firms and returns missing for parts of the calendar (e.g. no weekend trading, no after-hours trading):\n", "\n", "1. `df['newVarName'] = df.set_index('date').groupby('asset')['ret'].rolling(window=,1).apply(lambda x: np.prod(1+x)-1).values`\n", "1. Do not use `window=#` unless you are sure your data is perfectly formed. `window=5` fails in real data because stock markets close for the weekend.\n", "1. The `` could be '7D', '24H', '1M' or many other things - [see this link](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).\n", "1. The docs for the [`rolling` function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html#pandas.DataFrame.rolling) provide extra functionality. \n", "```\n", "\n", "Suppose we have an asset-timeperiod dataset, in long/tidy form. For example, this is a firm-day trading dataset:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FirmDateret
012022-01-110.0100
112022-01-120.0100
212022-01-130.0100
312022-01-14-0.0294
412022-01-180.0100
512022-01-19-0.0100
612022-01-300.0300
722022-01-110.0200
822022-01-12-0.0200
922022-01-130.0300
1022022-01-14-0.0300
1122022-01-180.0400
1222022-01-19-0.0400
1322022-01-300.0500
\n", "
" ], "text/plain": [ " Firm Date ret\n", "0 1 2022-01-11 0.0100\n", "1 1 2022-01-12 0.0100\n", "2 1 2022-01-13 0.0100\n", "3 1 2022-01-14 -0.0294\n", "4 1 2022-01-18 0.0100\n", "5 1 2022-01-19 -0.0100\n", "6 1 2022-01-30 0.0300\n", "7 2 2022-01-11 0.0200\n", "8 2 2022-01-12 -0.0200\n", "9 2 2022-01-13 0.0300\n", "10 2 2022-01-14 -0.0300\n", "11 2 2022-01-18 0.0400\n", "12 2 2022-01-19 -0.0400\n", "13 2 2022-01-30 0.0500" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "df = pd.DataFrame({'Firm': [1,1,1,1,1,1,1,2,2,2,2,2,2,2],\n", " 'Date' : ['1/11/22','1/12/22','1/13/22','1/14/22','1/18/22','1/19/22','1/30/22',\n", " '1/11/22','1/12/22','1/13/22','1/14/22','1/18/22','1/19/22','1/30/22'],\n", " 'ret' : [.01,.01,.01,-.0294,.01,-0.01,.03,.02,-.02,.03,-.03,.04,-.04,.05] })\n", "df['Date'] = pd.to_datetime(df.Date,format=\"%m/%d/%y\")\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The problem\n", "\n", "Let's create a variable containing the cumulative return over the last week. Let's stipulate that if we have less than five days of returns for a firm at any point, we just use what we have.\n", "\n", "```{exercise}\n", "Before continuing, eyeball the dataset above and guess what the rolling week-long returns should be for each observation.\n", "```\n", "\n", "## Working the problem\n", "\n", "This one-liner is very close to correct:\n", "\n", "```python\n", "df.groupby('Firm')['ret'].rolling(5,1).apply(lambda x: np.prod(1+x)-1)\n", "```\n", "\n", "It's not correct, but let's start here because it's easier to understand. Let me separate that code up and add comments for each part:\n", "\n", "```python \n", "(df.groupby('Firm')['ret'] # for each firm, grab the ret var\n", " .rolling(window=5,min_periods=1) # in each row, use the prior 1-5 obs (including the current one)\n", " .apply(lambda x: np.prod(1+x)-1) # R[0,T] = prod{ R(0), ..., R(T) } - 1\n", ")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To add that variable to our dataset, `df['varname'] = ` doesn't quite work. A small workaround does it: add `.values` to the end of the line. (_It returns just the column of numbers and ignores the weird index the code above makes._)\n", "\n", "So let's look at the result:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FirmDateretret1week_wrong
012022-01-110.01000.010000
112022-01-120.01000.020100
212022-01-130.01000.030301
312022-01-14-0.02940.000010
412022-01-180.01000.010010
512022-01-19-0.0100-0.009990
612022-01-300.03000.009614
\n", "
" ], "text/plain": [ " Firm Date ret ret1week_wrong\n", "0 1 2022-01-11 0.0100 0.010000\n", "1 1 2022-01-12 0.0100 0.020100\n", "2 1 2022-01-13 0.0100 0.030301\n", "3 1 2022-01-14 -0.0294 0.000010\n", "4 1 2022-01-18 0.0100 0.010010\n", "5 1 2022-01-19 -0.0100 -0.009990\n", "6 1 2022-01-30 0.0300 0.009614" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['ret1week_wrong'] = df.groupby('Firm')['ret'].rolling(5,1).apply(lambda x: np.prod(1+x)-1).values\n", "df[:7] # just look at the top of the dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{warning}\n", "The result above is wrong!\n", "- 1/17 was a holiday, so the return for 1/18 should start with 1/12 (and only use 4 rows).\n", "- 1/30 should only include itself since we are missing dates. \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's fix that. The [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html#pandas.DataFrame.rolling) mentions that the window length can use a time period!\n", "\n", "So instead of `window=5`, we can use `window='7D'`!\n", "\n", "Except there is a final complication: After `df.groupby('Firm')['ret']`, the date variable is gone. And `rolling()` needs that, So we need to keep the date variable around somehow. \n", "\n", "The last fix: Add `.set_index('Date')` before groupby, so that the date variable will stick around as the index. \n", "\n", "## The solution\n", "\n", "Put all that together:\n", "1. Put the date in the index\n", "1. \"For each asset\"\n", "1. Grab the returns for the prior week (using the dates to allow for holidays)\n", "1. Cumulate the returns\n", "1. Add that variable to the dataset\n", "\n", "We can write this in one line:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df['ret1week'] = df.set_index('Date').groupby('Firm')['ret'].rolling('7D',1).apply(lambda x: np.prod(1+x)-1).values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The answer\n", "\n", "```{note}\n", "Look at the last three dates for each firm, and you'll see how incorrect the `window=#` approach is.\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
retret1week_wrongret1week
FirmDate
12022-01-110.01000.0100000.010000
2022-01-120.01000.0201000.020100
2022-01-130.01000.0303010.030301
2022-01-14-0.02940.0000100.000010
2022-01-180.01000.0100100.000010
2022-01-19-0.0100-0.009990-0.019792
2022-01-300.03000.0096140.030000
22022-01-110.02000.0200000.020000
2022-01-12-0.0200-0.000400-0.000400
2022-01-130.03000.0295880.029588
2022-01-14-0.0300-0.001300-0.001300
2022-01-180.04000.0386480.018283
2022-01-19-0.0400-0.022449-0.002499
2022-01-300.05000.0473770.050000
\n", "
" ], "text/plain": [ " ret ret1week_wrong ret1week\n", "Firm Date \n", "1 2022-01-11 0.0100 0.010000 0.010000\n", " 2022-01-12 0.0100 0.020100 0.020100\n", " 2022-01-13 0.0100 0.030301 0.030301\n", " 2022-01-14 -0.0294 0.000010 0.000010\n", " 2022-01-18 0.0100 0.010010 0.000010\n", " 2022-01-19 -0.0100 -0.009990 -0.019792\n", " 2022-01-30 0.0300 0.009614 0.030000\n", "2 2022-01-11 0.0200 0.020000 0.020000\n", " 2022-01-12 -0.0200 -0.000400 -0.000400\n", " 2022-01-13 0.0300 0.029588 0.029588\n", " 2022-01-14 -0.0300 -0.001300 -0.001300\n", " 2022-01-18 0.0400 0.038648 0.018283\n", " 2022-01-19 -0.0400 -0.022449 -0.002499\n", " 2022-01-30 0.0500 0.047377 0.050000" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index(['Firm','Date'])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 4 }