2.4. When to write functions

Writing your own functions is important for improving the clarity of your code because it

  • separates different strands of logic (Rule 5.B)

  • allows you to reuse code (Rule 5.A)

  • prevents copy/paste errors (Rule 5.A)

The “Code and Data” reference has a short but very nice illustration justifying and explaining those rules. Do yourself a favor and read it!

I’ll simply add that while a “wet jumpshot” is good in basketball, “wet code” is bad. You’ve Wasted Everyone’s Time by Writing Everything Twice, which is bad even though We Enjoy Typing.

Nah, you want your code to be DRY.

It’s very simple: Don’t Repeat Yourself.
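
As a minimal sketch of what DRY looks like in practice, suppose the same percent-change formula is needed for several stocks. The function name `pct_change` and the toy prices below are hypothetical, made up purely for illustration.

```python
# WET: the same formula typed twice; a fix in one copy is easily missed in the other.
# aapl_growth = (aapl_end - aapl_start) / aapl_start
# msft_growth = (msft_end - msft_start) / msft_start


def pct_change(start, end):
    """Return the fractional change from start to end."""
    return (end - start) / start


# DRY: one definition, reused for every input.
# The (start, end) pairs are made-up toy values, not real prices.
toy_prices = {"AAPL": (100.0, 110.0), "MSFT": (50.0, 45.0)}
growth = {
    ticker: pct_change(start, end)
    for ticker, (start, end) in toy_prices.items()
}
print(growth)
```

If the formula ever needs to change (say, to log growth), there is exactly one place to change it.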

A few more points:

  • Don’t abstract for the sake of it! Writing functions can be a time-consuming waste of time that eats up time. (Lots of redundancy in that sentence was a waste of time to the reader, but at least it was short. Similarly, bad code is less problematic when it is short.)

  • Rule 5.E: A magic number is a literal number embedded in your code. “Magic number” is a pejorative. Having numbers that act as in-line inputs and parameters can lead to errors and make modifying code very tough. For example, in a simple program, I wrote 2006 and ['AAPL','MSFT','VZ'] several times throughout the code. This is bad practice!

  • Rule 5.D: If you are going to reuse your code, write a “unit test”, which is a script that checks the behavior of the function you’ve written to make sure it works as intended. It should run your function with a few different inputs, possibly including inputs with deliberate errors, and test the function’s output against answers you know in advance are correct. Using “toy” or “small” datasets is essential to developing functions and code more broadly.

    • Corollary: Don’t unit test for the sake of it. You don’t need to check what round_a_number() does when you use a string as input.

  • On “classes”: I don’t personally use classes much, if at all. They weren’t necessary or extremely beneficial for my early projects (or so I, uninformed!, thought), so I didn’t use them or learn them. Then inertia took over… But defining your own classes can be extremely useful! If you want to use them when warranted during this semester, absolutely go for it. I recommend reading the Whirlwind of Python section as a starter.
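
The magic-number and unit-test rules above can be sketched together. This is only an illustration under stated assumptions: the text does not say what 2006 meant in the original program, so treating it as a start year is a guess, and `select_sample`, `test_select_sample`, and the toy rows are hypothetical names and data.

```python
# Named constants defined once at the top, instead of magic numbers
# scattered through the code. (Assumption: 2006 is a sample start year.)
START_YEAR = 2006
TICKERS = ["AAPL", "MSFT", "VZ"]


def select_sample(rows, start_year=START_YEAR, tickers=TICKERS):
    """Keep rows (dicts with 'ticker' and 'year') for the chosen tickers
    from start_year onward."""
    return [
        row for row in rows
        if row["ticker"] in tickers and row["year"] >= start_year
    ]


def test_select_sample():
    # Toy dataset, small enough that the right answer is known in advance.
    toy_rows = [
        {"ticker": "AAPL", "year": 2005},  # too early: drop
        {"ticker": "AAPL", "year": 2006},  # keep
        {"ticker": "GM", "year": 2007},    # ticker not in the list: drop
        {"ticker": "VZ", "year": 2010},    # keep
    ]
    expected = [
        {"ticker": "AAPL", "year": 2006},
        {"ticker": "VZ", "year": 2010},
    ]
    assert select_sample(toy_rows) == expected
    # Edge case: empty input should give an empty list, not an error.
    assert select_sample([]) == []


test_select_sample()
```

Changing the sample now means editing `START_YEAR` or `TICKERS` in one place, and rerunning the test confirms the function still behaves as intended.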

2.4.1. Documentation

This is maybe ok, but probably bad:

# Elasticity = Percent Change in Quantity / Percent Change in Price
# Elasticity = 0.4 / 0.2 = 2
# See Shapiro (2005), The Economics of Potato Chips,
# Harvard University Mimeo, Table 2A.
elasticity = 2

The problem with comments is that when you change the code, nothing forces you to update the comments, and the code will still run. E.g., if you change elasticity=2 above, you might easily forget to change the comment associated with it, or to explain why you changed it.

So using comments makes it possible that your code becomes internally inconsistent! The next example prevents that, yet still documents the code just as well because it is self-documenting:

# See Shapiro (2005), The Economics of Potato Chips,
# Harvard University Mimeo, Table 2A.
percent_change_in_quantity = -0.4
percent_change_in_price = 0.2
elasticity = percent_change_in_quantity/percent_change_in_price

Related points:

  • Use the naming of variables and the structure of the code to help guide a reader through your operations

  • x and y are usually (but not always!) bad variable names because they are uninformative

  • The aim for self-documentation underlies the logic behind the “Good Data” section, the “Directory” rules, how we name files, …

  • Documentation is sometimes necessary and unavoidable. (Linking to “The Economics of Potato Chips” in the example above would be excessive; the name suffices.)

  • Documentation can clarify that, yes, I did this on purpose
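
To make the naming and "on purpose" points concrete, here is a small sketch. The function `average_return`, the variable names, the toy values, and the stated reason for dropping the first month are all hypothetical, chosen only to illustrate the idea.

```python
def average_return(returns):
    """Return the arithmetic mean of a list of returns."""
    return sum(returns) / len(returns)


# Uninformative names: a reader cannot tell what x and y hold.
# y = average_return(x)

# Descriptive names let the code explain itself.
monthly_returns = [0.02, -0.01, 0.03]  # made-up toy values
average_monthly_return = average_return(monthly_returns)

# A comment that records intent: the first month is dropped on purpose
# (hypothetical reason: it is a partial month in this toy example).
average_excluding_first_month = average_return(monthly_returns[1:])
```

The names carry most of the documentation; the one remaining comment explains a choice that the code alone could not.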