2.2. THE GOLDEN RULES

Category

Rule

0. PLAN BEFORE YOU CODE

A. “Pseudocode” means writing out the broad steps in plain language. I often (almost always, for complicated tasks) do this on paper, then translate it into an outline in the code’s comments.

Maybe planning sounds boring, even like a waste of time. I get it; I also want to shoot first like Han did… but coders like Han often end up paying for it later.

B. Break the problem into chunks (smaller problems). This dovetails nicely with rule 5.B below.
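A minimal sketch of both ideas, using a hypothetical task (finding the stock with the highest average daily return) and made-up function names: the plain-language steps are written first as comments, and each step then becomes its own small function.

```python
# Pseudocode, written first as comments:
#   1. compute each stock's average daily return
#   2. pick the stock with the largest average
#   3. report the winner

def average_returns(returns_by_ticker):
    """Step 1: map each ticker to the mean of its daily returns."""
    return {ticker: sum(r) / len(r) for ticker, r in returns_by_ticker.items()}

def best_ticker(averages):
    """Step 2: return the ticker with the largest average return."""
    return max(averages, key=averages.get)

def report(ticker, averages):
    """Step 3: build a one-line summary of the result."""
    return f"{ticker} has the highest average daily return: {averages[ticker]:.4f}"

def main(returns_by_ticker):
    # The outline above, now as code: one line per planned step
    averages = average_returns(returns_by_ticker)
    winner = best_ticker(averages)
    return report(winner, averages)

# Hypothetical data: ticker -> list of daily returns
print(main({"AAA": [0.01, 0.02, -0.005], "BBB": [0.004, 0.001, 0.002]}))
```

Because each chunk is a separate function, each one can be checked on its own before the pieces are wired together.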

1. Automation

A. Automate everything that can be automated; don’t do point-and-click analysis!

B. Write a single script that executes all code from beginning to end
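One way to get a single entry point, sketched under assumptions: the project is split into step scripts (the names in `STEPS` are hypothetical), and a runner executes them in order with the same Python interpreter, stopping at the first error.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical step scripts, in the order they must run
STEPS = ["download_data.py", "clean_data.py", "analyze.py", "make_figures.py"]

def run_all(code_dir, steps=STEPS):
    """Run each step script in code_dir, in order, from beginning to end."""
    code_dir = Path(code_dir)
    for step in steps:
        print(f"Running {step} ...")
        # check=True raises CalledProcessError if a step fails,
        # so later steps never run on stale or broken inputs
        subprocess.run([sys.executable, step], check=True, cwd=code_dir)
    print("All steps finished.")
```

Putting a call such as `run_all(Path(__file__).resolve().parent)` at the bottom of the runner file means one command reruns the whole project.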

2. Version control

A. Store code and data under version control.

B. Before checking the directory back in, clear out all outputs and temp files, then rerun all of the code in the directory from scratch! (Check: Did it work right?)
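The “clear everything, then rerun” check can itself be automated. A minimal sketch, assuming outputs and temp files live in their own folders (the folder names are hypothetical). It deletes everything inside the folders it is given, so it should never be pointed at the input folder.

```python
import shutil
from pathlib import Path

def clean_outputs(*folders):
    """Delete and recreate each folder so the next run starts from scratch."""
    for folder in map(Path, folders):
        if folder.exists():
            shutil.rmtree(folder)  # removes the folder and everything inside it
        folder.mkdir(parents=True)  # recreate it, empty, for the scripts to fill
```

After something like `clean_outputs("outputs", "temp")`, rerunning the full pipeline confirms that every output can be regenerated from the inputs and the code alone.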

3. Directories/folders

A. Separate directories/folders by function

B. Put input files into an input folder and outputs into a different one

A + B = your folders and files will be largely self-documenting

C. Make directories portable: the code should run on any computer, and should still run if you move the folder somewhere else on your own computer

D. Use RELATIVE FILE PATHS, not absolute file paths
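A minimal sketch of portable paths with Python’s `pathlib`, assuming a hypothetical layout with `inputs/`, `temp/`, and `outputs/` folders under one project root. Every path is built from that root, so moving or copying the project folder does not break anything.

```python
from pathlib import Path

def project_paths(root="."):
    """Build every folder path the project needs from one root folder."""
    root = Path(root)
    return {
        "inputs": root / "inputs",    # raw data: read, never modified
        "temp": root / "temp",        # intermediate files: safe to delete
        "outputs": root / "outputs",  # tables and figures: regenerated each run
    }

paths = project_paths()
prices_file = paths["inputs"] / "prices.csv"  # hypothetical file name

# Relative: resolves against wherever the project happens to live
print(prices_file)
# Absolute (avoid): only works on one machine, in one location, e.g.
# Path("C:/Users/me/Desktop/project/inputs/prices.csv")
```

Inside a script, `project_paths(Path(__file__).resolve().parent)` anchors the paths to the script’s own folder rather than to whichever directory the script was launched from, which keeps the project portable.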

4. Keys / Units

A. Store cleaned data in tables with unique, non-missing “keys”

B. Keep data normalized as far into your code pipeline as you can
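A minimal sketch with plain Python lists of dicts and hypothetical firm data. Firm-level facts are stored once in a table keyed by `firm_id`, firm-year facts in a table keyed by (`firm_id`, `year`), and the two are combined only when an analysis needs both. In pandas, `DataFrame.duplicated` and `isna` give the same key checks, and `merge(..., validate="many_to_one")` guards the final merge.

```python
def check_keys(rows, key_cols):
    """Raise ValueError unless every row has a complete, unique key."""
    seen = set()
    for row in rows:
        key = tuple(row.get(col) for col in key_cols)
        if any(part is None for part in key):
            raise ValueError(f"missing key value in row {row}")
        if key in seen:
            raise ValueError(f"duplicate key {key}")
        seen.add(key)

# Normalized: firm-level facts live in one table keyed by firm_id ...
firms = [
    {"firm_id": 1, "industry": "Tech"},
    {"firm_id": 2, "industry": "Retail"},
]
# ... and firm-year facts in another keyed by (firm_id, year),
# so each firm's industry is stored once, not once per year
firm_years = [
    {"firm_id": 1, "year": 2020, "sales": 100},
    {"firm_id": 1, "year": 2021, "sales": 120},
    {"firm_id": 2, "year": 2020, "sales": 80},
]

check_keys(firms, ["firm_id"])
check_keys(firm_years, ["firm_id", "year"])

# Merge only at the end, when an analysis actually needs both tables
industry = {f["firm_id"]: f["industry"] for f in firms}
merged = [{**fy, "industry": industry[fy["firm_id"]]} for fy in firm_years]
```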

5. Abstraction: functions/classes

A. Abstract to eliminate redundancy

B. Abstract to improve clarity

C. Otherwise, don’t abstract

D. Unit test your functions!

E. Don’t use magic numbers; define each one once as a variable and refer to that variable as needed
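A small sketch tying rules A, B, D, and E together, using annualized volatility as a hypothetical example (the 252 trading days per year is a common convention, not something this section prescribes). One function replaces a formula that would otherwise be copy-pasted per stock, the constant gets a name, and a unit test checks the function against a hand-computed case.

```python
import math
import statistics

# E. No magic numbers: the convention is defined once, with a name
TRADING_DAYS_PER_YEAR = 252

# A./B. One function instead of the same formula repeated for every stock
def annualized_volatility(daily_returns):
    """Annualize the sample standard deviation of daily returns."""
    return statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS_PER_YEAR)

# D. A unit test: pytest collects functions named test_* from test files,
# and it also works as a plain function call
def test_annualized_volatility():
    # Constant returns have zero volatility
    assert annualized_volatility([0.01, 0.01, 0.01]) == 0.0
    # Hand-computed case: the sample stdev of [0.01, -0.01] is sqrt(0.0002)
    expected = math.sqrt(0.0002) * math.sqrt(252)
    assert math.isclose(annualized_volatility([0.01, -0.01]), expected)

test_annualized_volatility()

# Hypothetical tickers and daily returns
returns = {"AAA": [0.01, -0.02, 0.015], "BBB": [0.002, 0.001, -0.003]}
vols = {ticker: annualized_volatility(r) for ticker, r in returns.items()}
print(vols)
```

In a real project the test function would usually sit in a separate `tests/` file so that `pytest` can find and run it automatically.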

6. Documentation

A. Is good… to a point

B. Don’t write documentation you will not maintain

C. Code is better off when it is self-documenting
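A small illustration with a hypothetical after-tax profit calculation (the tax rate is made up). Both functions compute the same number, but the second needs no comment, so there is no separate documentation that can drift out of date.

```python
# Needs a comment to explain it: what are x, y, and 0.6?
def f(x, y):
    return (x - y) * 0.6

# Self-documenting: the names say what the comment would have said
TAX_RATE = 0.4  # hypothetical rate, for illustration only

def after_tax_profit(revenue, costs):
    return (revenue - costs) * (1 - TAX_RATE)
```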

7. Look at your data/objects

As discussed here
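In pandas, the usual first looks are `df.head()`, `df.info()`, and `df.describe()`. A rough stdlib-only stand-in for a table stored as a list of dicts, with hypothetical data, might look like this: it reports the size, the missing values per column, and the first few rows.

```python
def peek(rows, n=3):
    """Print a quick summary of a table stored as a list of dicts."""
    columns = sorted({col for row in rows for col in row})
    # A value counts as missing if the key is absent or the value is None
    missing = {col: sum(row.get(col) is None for row in rows) for col in columns}
    print(f"{len(rows)} rows, {len(columns)} columns: {columns}")
    print(f"missing values per column: {missing}")
    for row in rows[:n]:
        print(row)
    return missing

data = [
    {"firm_id": 1, "year": 2020, "sales": 100},
    {"firm_id": 1, "year": 2021, "sales": None},
    {"firm_id": 2, "year": 2020},
]
missing = peek(data)
```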