Following these GOLDEN RULES will save you hours and hours of time.


The rules below really are golden. Consider copying this page into your class notes repo, somewhere you can easily refer back to it.




A. “Pseudo code” is writing out the broad steps in plain language. I often (almost always for complicated tasks) do this on paper, then translate it to code as an outline (in the code’s comments).

Maybe planning sounds boring and like a waste of time. I get it; I also want to shoot first like Han did… but coders like Han often end up looking like this guy

B. Break the problem into chunks/smaller problems. Large chunks should go in different script files. Within an individual file that is doing one thing, try to break that thing down further. See rules 3.A and 5.B below too.
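Here is a minimal sketch of rule A in practice: the pseudo code goes in as comments first, and the code is then filled in under each step. The task (ranking students by average score), the function name `rank_students`, and the toy data format are all made up for illustration.

```python
# Outline first (pseudo code), written as comments:
# 1. Group the scores by student
# 2. Average each student's scores
# 3. Return the students sorted from highest to lowest average

def rank_students(scores):
    """scores: list of (student, score) pairs (hypothetical toy format)."""
    # 1. Group the scores by student
    by_student = {}
    for student, score in scores:
        by_student.setdefault(student, []).append(score)

    # 2. Average each student's scores
    averages = {s: sum(v) / len(v) for s, v in by_student.items()}

    # 3. Return the students sorted from highest to lowest average
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Each commented step is also a natural candidate for its own function or file if it grows, which is rule B.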

1. Automation

A. Automate everything that can be automated; don’t do point-and-click analysis!

B. If the project involves running multiple code files in order, write a single script that executes all code for the project from beginning to end, like this one.
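A run-everything script can be very short. The sketch below is one way to do it, assuming the project's steps are ordinary Python scripts; the file names in `PIPELINE` and the function name `run_all` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical script names, listed in the order they must run.
PIPELINE = ["01_download_data.py", "02_clean_data.py", "03_analysis.py"]

def run_all(scripts, code_dir="."):
    """Run each script in order; stop at the first one that fails."""
    for name in scripts:
        path = Path(code_dir) / name
        print(f"Running {path} ...")
        # check=True raises CalledProcessError if the script exits with an error,
        # so later steps never run on top of a broken earlier step.
        subprocess.run([sys.executable, str(path)], check=True)
```

Calling `run_all(PIPELINE)` from the project folder then reproduces the whole project with one command.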

2. Version control

A. Store code and data under version control.

B. Use the GitHub workflow!

C. Before checking the directory back in, clear all outputs, delete temp files, and then run the whole directory to make sure the outputs reproduce! (Check: Did it work right?)

If the project uses notebook files, always check that the first executed code block has “[1]” next to it and that all the subsequent code blocks are numbered consecutively.
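The notebook check can also be done in code rather than by eye. This is a rough sketch, assuming a standard `.ipynb` file (which is JSON with a `cells` list, where code cells carry an `execution_count`); the function names are made up, and empty code cells are skipped because they are never numbered.

```python
import json

def ran_top_to_bottom(notebook):
    """True if the non-empty code cells are numbered 1, 2, 3, ... in order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook["cells"]
        # "source" may be a string or a list of strings; join handles both.
        if cell["cell_type"] == "code" and "".join(cell["source"]).strip()
    ]
    return counts == list(range(1, len(counts) + 1))

def check_notebook_file(path):
    """Load a notebook from disk and run the check on it."""
    with open(path, encoding="utf-8") as f:
        return ran_top_to_bottom(json.load(f))
```

A `False` result means the notebook was run out of order or only partly run, so restart the kernel and run all cells before checking in.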

3. Organizing folders

A. Separate folders/directories and files by function

B. Put input files into an input folder and outputs into a different folder

A + B = your folders and files will be largely self-documenting

C. Make directories portable: they should run on any computer, and keep running if you move them to another place on your computer. See the next rule.

D. Use RELATIVE FILE PATHS, not absolute file paths ⭐ ⭐
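To make rules B through D concrete, here is a small sketch using Python's `pathlib`. The folder names `inputs` and `outputs` and both file names are placeholders, not names this page requires.

```python
from pathlib import Path

# Fragile: an absolute path like this (hypothetical) only exists on one machine.
# raw_data_path = Path("C:/Users/me/Desktop/project/inputs/raw_data.csv")

# Portable: paths are relative to the project folder the code is run from.
INPUT_DIR = Path("inputs")
OUTPUT_DIR = Path("outputs")

raw_data_path = INPUT_DIR / "raw_data.csv"   # hypothetical input file
figure_path = OUTPUT_DIR / "figure1.png"     # hypothetical output file
```

Relative paths are resolved against the current working directory, so this works on any computer as long as the code is run from the project folder.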

4. Data

A. Store cleaned data in tables with unique, non-missing “keys”

B. Keep data normalized as far into your code pipeline as you can

C. Data cleaning and exploratory data analysis golden rules here.
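Rule A is easy to check in code. Below is a minimal sketch in plain Python (so it runs anywhere), assuming the table is a list of dictionaries; the function `check_keys` and the firm-year toy table are invented for illustration.

```python
from collections import Counter

def check_keys(rows, key_cols):
    """Raise ValueError if any key is missing or appears more than once."""
    keys = [tuple(row.get(col) for col in key_cols) for row in rows]
    missing = [k for k in keys if any(v is None or v == "" for v in k)]
    if missing:
        raise ValueError(f"{len(missing)} row(s) have a missing key value")
    duplicates = [k for k, n in Counter(keys).items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates}")

# Hypothetical firm-year table: exactly one row per (firm, year).
firm_years = [
    {"firm": "A", "year": 2020, "sales": 10.0},
    {"firm": "A", "year": 2021, "sales": 12.5},
    {"firm": "B", "year": 2020, "sales": 7.0},
]
check_keys(firm_years, ["firm", "year"])  # passes silently: keys are unique and present
```

Running a check like this right after cleaning, and again after every merge, catches duplicated or dropped rows early.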

5. Functions

A. Write functions to eliminate redundancy

B. Write functions to improve clarity

C. Otherwise, don’t write functions

D. Test your functions! Use small examples where you know the right answers, and try variations to see if the function breaks in some cases.
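As an illustration of rule D, here is a small made-up function, `pct_change`, tested on cases where the right answer is known by hand, plus one variation chosen to try to break it.

```python
def pct_change(old, new):
    """Percent change from old to new, e.g. 50 -> 75 is 50.0."""
    if old == 0:
        raise ValueError("percent change is undefined when old is 0")
    return (new - old) / abs(old) * 100

# Small examples where the right answer is known in advance.
assert pct_change(50, 75) == 50.0
assert pct_change(100, 50) == -50.0
assert pct_change(-10, -5) == 50.0  # negative base: an increase should still be positive

# A variation that could break the function: a zero base.
try:
    pct_change(0, 5)
except ValueError:
    pass  # expected: the function refuses instead of dividing by zero
else:
    raise AssertionError("expected ValueError for old == 0")
```

The negative-base case is the kind of variation worth trying: dividing by `old` instead of `abs(old)` would flip the sign there.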

6. Documentation and comments

A. Is good… to a point

B. Don’t write documentation you will not maintain

C. Code is better off when it is self-documenting

7. Writing code

A. Don’t use magic numbers; define each value once as a variable and refer to it as needed

B. Write DRY code: Don’t Repeat Yourself! See rule 5.A.

C. Premature optimization (for speed) is the root of all evil.

D. Use self-documenting variable and function names.
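Rules A and D often go together. The sketch below shows a before (in comments) and an after; the tax rate, shipping fee, and all names are hypothetical values picked only for the example.

```python
# Before: magic numbers and names that explain nothing.
# def f(x):
#     return x * 1.07 + 4.99

# After: each number is defined once, with a name that says what it is.
SALES_TAX_RATE = 0.07  # hypothetical value
SHIPPING_FEE = 4.99    # hypothetical value

def total_cost(item_price):
    """Price paid for one item, including sales tax and shipping."""
    return item_price * (1 + SALES_TAX_RATE) + SHIPPING_FEE
```

If the tax rate changes, there is exactly one line to edit, and a reader can follow `total_cost` without any comments.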

8. Look at your data/objects

As discussed here and many other places.

ABCD = Always Be Checking your Data!
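One quick way to look at a table is to print its size and, for each column, the number of missing values and the range. This is only a sketch in plain Python, assuming the table is a list of dictionaries and each column holds values of one comparable type; the function name `summarize` and the toy rows are made up.

```python
def summarize(rows):
    """Print the row count and, per column, the missing count and min/max."""
    columns = sorted({col for row in rows for col in row})
    print(f"{len(rows)} rows")
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary[col] = {
            "missing": len(values) - len(present),
            "min": min(present, default=None),
            "max": max(present, default=None),
        }
        print(col, summary[col])
    return summary

# Hypothetical toy table with one missing value.
toy_rows = [
    {"firm": "A", "sales": 10.0},
    {"firm": "B", "sales": None},
    {"firm": "C", "sales": 7.0},
]
toy_summary = summarize(toy_rows)
```

A surprising row count, an unexpected number of missing values, or an impossible minimum or maximum is usually the first sign that an earlier step went wrong.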