2.8. Writing good code

This is a short list of tips. Also look at these tips and these tips for when you’re stuck.

2.8.1. Naming variables and functions

I try to put verbs in the names of functions (get_returns()) and name dataframes after their unit-levels (monthly_df). Avoid ambiguous abbreviations.
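Here is a small sketch of these naming habits. The function name carries a verb and the dataframe name carries its unit level. The prices, the column names, and this particular get_returns body are made-up assumptions for illustration, not code from elsewhere in these notes:

```python
import pandas as pd

# Hard to follow: what is `d`, what does `f` compute, and what is one row?
#   d = pd.DataFrame({"p": [...]}); d["r"] = f(d["p"])


def get_returns(prices: pd.Series) -> pd.Series:
    """Return simple period-over-period returns: p_t / p_(t-1) - 1."""
    return prices / prices.shift(1) - 1


# Hypothetical data; the name says each row is one month
monthly_df = pd.DataFrame(
    {"price": [100.0, 102.0, 99.96]},
    index=["2020-01", "2020-02", "2020-03"],
)
monthly_df["simple_return"] = get_returns(monthly_df["price"])
print(monthly_df)
```

The first return is missing (NaN) because there is no prior month to compare against.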

2.8.2. Code style + Auto-Formatting!

There are many ways to achieve the same thing in Python. Python tends to be a very readable language compared to others, but the style you write it in can still make Python code easy or hard to read!

Tip

  1. Obey the naming suggestions above.

  2. Use an auto-formatter! This will rewrite your code automatically to have good style!

There are a few popular auto-formatters (black, yapf, and autopep8). In my JupyterLab setup, I explain how I set up Black, the “uncompromising Python code formatter”, which is very opinionated (“any color you’d like, as long as it is black”).

Look at what Black does to this code:

This function is too long to even read:

import os

def very_important_function(template: str, *variables, file: os.PathLike, engine: str, header: bool = True, debug: bool = False):
    """Applies `variables` to the `template` and writes to `file`."""
    with open(file, 'w') as f:
        ...

I hit CTRL+SHIFT+F and this was the result:

import os


def very_important_function(
    template: str,
    *variables,
    file: os.PathLike,
    engine: str,
    header: bool = True,
    debug: bool = False
):
    """Applies `variables` to the `template` and writes to `file`."""
    with open(file, "w") as f:
        ...

2.8.3. Use a linter + coding assistance

A linter is a tool that detects possible errors and stylistic issues in your code.

See my JupyterLab setup for instructions to install jupyterlab-lsp. This extension provides code navigation, hover suggestions, linting, autocompletion, and rename assistance.
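To make "detect possible errors" concrete, here is a toy sketch of one check a linter performs: flagging imports that are never used. It uses Python's built-in ast module and is only an illustration, not how jupyterlab-lsp or any real linter is implemented; real linters such as pyflakes handle many more cases (names listed in __all__, star imports, and so on). The function name find_unused_imports is hypothetical.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Toy check: names bound by import statements that never appear as a Name node."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the name "os"
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                # note: a star import would show up as "*" (a real linter handles this)
                imported.add(alias.asname or alias.name)
    # every bare name the code reads or assigns (pdr.x counts as using "pdr")
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


example = """
import pandas_datareader as pdr
from datetime import datetime
start = datetime(2004, 1, 1)
"""
print(find_unused_imports(example))  # ['pdr']
```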

2.8.4. DRY code

Don’t Repeat Yourself. See the Functions page for tips on how to use functions to reduce duplication.
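A minimal sketch of the idea, with hypothetical data: instead of copy-pasting the same cleaning steps once per ticker, write them once in a function and loop over the tickers. The function name clean_average and the prices are made up for illustration.

```python
# Repetitive version (the same two steps pasted per ticker):
#   msft = [p for p in msft_raw if p is not None]; msft_avg = sum(msft) / len(msft)
#   aapl = [p for p in aapl_raw if p is not None]; aapl_avg = sum(aapl) / len(aapl)


def clean_average(prices):
    """Drop missing values (None) and return the average of what remains."""
    valid = [p for p in prices if p is not None]
    return sum(valid) / len(valid)


raw_prices = {  # hypothetical data
    "MSFT": [25.0, None, 27.0],
    "AAPL": [10.0, 11.0, None],
}
averages = {ticker: clean_average(prices) for ticker, prices in raw_prices.items()}
print(averages)
```

A bug fix or change to the cleaning rule now happens in one place, and adding a ticker means adding one dictionary entry.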

2.8.5. Premature optimization

In this class, you likely won’t get to the point where you try to optimize your code for speed. Our problem sets aren’t quite massive enough to need that. Some student projects might tempt students to optimize for speed.

Don’t! Total time = your coding time (initial time, debug time, revising time) + computer time. Computer time is cheap. Yours is limited.

First: Write clean, easy to use code that works.

Only once your code is virtually complete should you even contemplate speed. And even then, you’re probably optimizing too soon. (You may not have realized yet that you need to completely reformulate your approach to the problem.)

2.8.6. Magic numbers are bad

A magic number is a literal number or parameter value embedded in your code. Numbers that act as in-line inputs and parameters can easily lead to errors and make code very hard to modify.
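A tiny offline illustration before the stock-price example: a bare 12 inside a formula versus a named parameter. The function names, the constant MONTHS_PER_YEAR, and the returns are hypothetical, and the annualization shown is the simple arithmetic one (mean return times periods per year).

```python
# Magic number: a reader must guess what 12 means, and if the data switch to
# daily frequency, every hidden "12" in the code has to be found and changed.
def annualized_mean_bad(period_returns):
    return sum(period_returns) / len(period_returns) * 12


# Named parameter: the meaning is explicit and there is one place to change it.
MONTHS_PER_YEAR = 12


def annualized_mean(period_returns, periods_per_year=MONTHS_PER_YEAR):
    """Simple (arithmetic) annualization of the mean per-period return."""
    mean_return = sum(period_returns) / len(period_returns)
    return mean_return * periods_per_year


monthly_returns = [0.01, 0.02, -0.005]  # hypothetical data
print(annualized_mean(monthly_returns))
```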

Here is an example that downloads stock price data.

If I want to change the stocks included, I’ll need to change the list of stocks twice.

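import pandas_datareader as pdr
from datetime import datetime
import yfinance as yf

# load stock prices
start = datetime(2004, 1, 1)
end = datetime(2007, 12, 31)

# auto_adjust=False keeps the 'Adj Close' column (recent yfinance versions adjust prices by default)
stock_prices = yf.download(["MSFT", "AAPL"], start=start, end=end, auto_adjust=False)
stock_prices.index = stock_prices.index.tz_localize(None)  # drop yf's timezone so dates match pdr
stock_prices = stock_prices.filter(like="Adj Close")  # keep only columns with this in the name
# the tickers are typed a second time here, and this also assumes yf
# returns the columns in this order (it may not)
stock_prices.columns = ["MSFT", "AAPL"]


# do more stuff...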

Now I only need to change the variable stocks to alter the entire code.

import pandas_datareader as pdr
from datetime import datetime
import yfinance as yf

# load stock prices
start = datetime(2004, 1, 1)
end = datetime(2007, 12, 31)
stocks = ["MSFT", "AAPL"]

# auto_adjust=False keeps the 'Adj Close' column (recent yfinance versions adjust prices by default)
stock_prices = yf.download(stocks, start=start, end=end, auto_adjust=False)
stock_prices.index = stock_prices.index.tz_localize(None)  # drop yf's timezone so dates match pdr
stock_prices = stock_prices.filter(like="Adj Close")  # keep only columns with this in the name
stock_prices.columns = stock_prices.columns.get_level_values(1)  # tickers as col names, works no matter order of tics

# do more stuff...
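To see why reading the tickers off the columns is safer than retyping the list, here is an offline sketch. It builds a made-up dataframe that only mimics the layout assumed above for yf.download output (a two-level column index of price field and ticker, with tickers in alphabetical order); the prices are fake and no download happens.

```python
import pandas as pd

# Hypothetical stand-in for yf.download(...) output: (Price, Ticker) columns,
# tickers in alphabetical order (an assumption about yfinance's layout)
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close"], ["AAPL", "MSFT"]], names=["Price", "Ticker"]
)
fake_prices = pd.DataFrame(
    [[1.0, 2.0, 1.1, 2.1], [1.5, 2.5, 1.6, 2.6]],
    index=pd.to_datetime(["2004-01-02", "2004-01-05"]),
    columns=columns,
)

stocks = ["MSFT", "AAPL"]  # not the order the columns come back in

adj_close = fake_prices.filter(like="Adj Close")

# Hard-coding the names silently puts AAPL's prices under the "MSFT" label
mislabeled = adj_close.copy()
mislabeled.columns = stocks

# Reading the tickers from the column index keeps each label with its data
adj_close.columns = adj_close.columns.get_level_values(1)
print(adj_close)
```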