1.8. Debugging - Making code work

1.8.1. Bugs

Perhaps the most famous early case of a computer bug was reported by Grace Hopper, who was absolutely a computer science legend and pioneer, and certainly also a bad mama jama. You should definitely go down the wiki rabbit hole on Grace’s life sometime.

Computers are extremely powerful but incredibly stupid. We want to both

  1. Fix bugs when they happen

  2. Prevent them from happening!

This section is mostly about the former. Good coding habits and defensive coding will help prevent them, and I cover those in the next lecture.

So, to fix bugs, you need to

  1. Realize that you have a bug

  2. Figure out where it is

  3. Make it repeatable (and you’ll understand the bug)

  4. Fix it (duh) and test it (the existence of the bug should disabuse you of your coding invincibility!)
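The four steps above can be sketched on a toy example. The function names below (average_buggy, average_fixed, passes_check) are hypothetical and exist only for illustration. The key idea is step 3: pin the bug down with a tiny input whose correct answer you can compute by hand. That way you can reproduce the failure on demand and rerun the exact same check after your fix.

```python
# Step 1: realize you have a bug. This hypothetical function should return
# the average of a list of numbers, but its output "looks off".
def average_buggy(values):
    return sum(values) / (len(values) - 1)   # Step 2: the bug is in the denominator

def average_fixed(values):
    return sum(values) / len(values)

# Step 3: make it repeatable with a tiny input whose answer you know by hand.
def passes_check(avg_func):
    return avg_func([2, 4, 6]) == 4   # (2 + 4 + 6) / 3 = 4

print(passes_check(average_buggy))  # False: 12 / 2 = 6, and it fails the same way every time
print(passes_check(average_fixed))  # True: Step 4, the fix passes the same test
```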

Advice that could save (or cost) you thousands: These steps are general and work for things besides code, like plumbing and electrical work on your parents’ house.

1.8.1.1. Read the error codes!

Tip

The tutorialsteacher website has a nice page listing the most common Python error types. If you get an error and aren’t sure what it means, use that page as a starting point.
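One quick way to get familiar with the most common error types is to trigger them on purpose. This is a minimal sketch. error_name is a hypothetical helper, not part of any library. It runs a one-line snippet with Python’s built-in exec and reports the name of the exception the snippet raises.

```python
def error_name(code):
    """Run a snippet of code and return the name of the error it raises (or None)."""
    try:
        exec(code, {})
    except Exception as err:
        return type(err).__name__
    return None

examples = [
    "x = (1 + 2",                 # SyntaxError: code Python can't even parse
    "x = undefined_variable + 1", # NameError: a name that was never defined
    "x = '2' + 2",                # TypeError: mixing incompatible types
    "x = [1, 2, 3][10]",          # IndexError: list position out of range
    "x = {'a': 1}['b']",          # KeyError: dictionary key doesn't exist
    "x = int('abc')",             # ValueError: right type, bad value
    "x = 1 / 0",                  # ZeroDivisionError
    "x = [1, 2].shape",           # AttributeError: object has no such attribute
    "import not_a_real_module",   # ModuleNotFoundError: not installed or misspelled
]

for code in examples:
    print(f"{code:28} -> {error_name(code)}")
```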

Really, error codes tend to be informative! You can google them for more info, but even without Google, they often point directly at the issue and location.
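When you do hit an error, read two things first:

- the last line of the traceback, which gives the error type and message;
- the innermost frame, which shows where the error happened.

This sketch pulls those pieces out programmatically with the standard library’s traceback.extract_tb. The names get_price and summarize_error, and the prices dictionary, are hypothetical stand-ins for your own code and data.

```python
import traceback

def get_price(prices, ticker):
    return prices[ticker]   # the line that will fail if the ticker is missing

def summarize_error(func, *args):
    """Call func(*args); if it raises, return the parts of the traceback worth reading first."""
    try:
        func(*args)
    except Exception as err:
        last_frame = traceback.extract_tb(err.__traceback__)[-1]  # innermost frame: where it broke
        return {
            "type": type(err).__name__,   # what kind of error
            "message": str(err),          # Python's description of the problem
            "function": last_frame.name,  # which function it happened in
            "lineno": last_frame.lineno,  # which line of that file
        }
    return None

# Hypothetical prices; "GOOG" is deliberately missing
info = summarize_error(get_price, {"AAPL": 100.0, "MSFT": 200.0}, "GOOG")
print(info)
```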

I created a short walkthrough of the types of errors here.

1.8.1.3. Flipping switches / divide and conquer / or: find the bug

After slaving over your computer and a piece of paper (you smartly planned out your code before diving in headfirst), you’ve found a clever solution to your problem. Your code is beautiful and elegant, like this:

2+2 # imagine this is a bunch of code
2+2 # imagine this is a bunch of code
2+2 # imagine this is a bunch of code
Error # somewhere in the code is an error. But in real programming you don’t know the error is here!
2+2 # imagine this is a bunch of code
2+2 # imagine this is a bunch of code
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
C:\Users\DONSLA~1\AppData\Local\Temp/ipykernel_9464/3884189124.py in <module>
      2 2+2 # imagine this is a bunch of code
      3 2+2 # imagine this is a bunch of code
----> 4 Error # somewhere in the code is an error. But in real programming you don’t know the error is here!
      5 2+2 # imagine this is a bunch of code
      6 2+2 # imagine this is a bunch of code

NameError: name 'Error' is not defined

But Python had other ideas, I guess…

Despite appearances (his computer is on fire, after all), that guy works in IT. He spends all day taking calls from people with computer problems, typically menial ones, and it drives him and his coworker crazy. One of the true lessons of the show (a profound piece of wisdom, really, and my first method for solving virtually any technical issue) comes from that coworker: “Have you tried turning it off and on again?”

I don’t mean turning the computer off and on again. (Well, sometimes.) … But you can turn parts of your code off by commenting them out (putting a # at the start of the line):

2+2 # imagine this is a bunch of code
2+2 # imagine this is a bunch of code
# 2+2 # imagine this is a bunch of code
# Error # somewhere in the code is an error. But in real programming you don’t know the error is here!
# 2+2 # imagine this is a bunch of code
# 2+2 # imagine this is a bunch of code
4

At least we know the issue isn’t in the first two lines. We can proceed and look elsewhere.
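Commenting out a couple of lines at a time works, but with a long script it pays to be systematic. Turn off the bottom half, check whether the error survives, and keep halving. The sketch below automates that binary search over a list of hypothetical “steps”, which are small functions standing in for chunks of your code.

The sketch assumes two things:

- each prefix of steps can be rerun from scratch;
- the bug shows up as a raised exception.

Silent wrong answers need a different kind of check.

```python
def runs_ok(steps, n):
    """Run the first n steps from scratch; True if none of them raises an error."""
    try:
        for step in steps[:n]:
            step()
    except Exception:
        return False
    return True

def find_failing_step(steps):
    """Return the index of the first step that raises (binary search), or None."""
    if runs_ok(steps, len(steps)):
        return None
    lo, hi = 0, len(steps)   # invariant: first lo steps run fine, first hi steps fail
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if runs_ok(steps, mid):
            lo = mid         # the bug is after the first mid steps
        else:
            hi = mid         # the bug is within the first mid steps
    return hi - 1

steps = [
    lambda: 2 + 2,
    lambda: 2 + 2,
    lambda: 2 + 2,
    lambda: int("not a number"),  # the hidden bug (raises ValueError)
    lambda: 2 + 2,
    lambda: 2 + 2,
]

print(find_failing_step(steps))  # 3 (zero-based, i.e. the 4th step)
```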

Luckily, Python’s error messages tend to be informative enough: above, the traceback tells us the issue is on line 4. But in more sophisticated settings, the lines above aren’t 2+2 but chunks of code, and the error isn’t simply due to syntax or namespace issues. There the on/off method can be useful. Why? Because many “errors” exist even when the code executes without complaint: it runs, but produces the wrong answer.
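Here is a hedged illustration of an “error” that raises nothing. Both functions below are hypothetical. The first simply adds up period returns. That runs fine and produces a plausible-looking number, but simple returns compound rather than add. Comparing against a case you can work out by hand (up 10%, then down 10%) exposes the problem.

```python
import math

def cumulative_return_buggy(returns):
    # Runs without any error, but simple returns don't add up over time
    return sum(returns)

def cumulative_return(returns):
    # Compound the returns: (1 + r1)(1 + r2)... - 1
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth - 1

rets = [0.10, -0.10]  # up 10%, then down 10%
print(cumulative_return_buggy(rets))              # 0.0: looks plausible, raises nothing
print(cumulative_return(rets))                    # about -0.01: you actually lost 1%
print(math.isclose(cumulative_return(rets), -0.01))  # True
```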

1.8.2. Seriously… print your data and objects OFTEN!

Suppose you have a large dataset you want to explore. What can you do to look at it?

Here are some options:

  1. Print parts of it in Jupyter, and look at the actual data.

  2. Print lots of summary statistics, and make plots.

  3. Look at the dataset using variableInspector.

    • Warning: If your dataset is large enough, using variableInspector can significantly slow down your computer and force you to do a hard reset.

    • Workaround: Copy a smaller slice of the data to a new, smaller variable, and then look at that slice using variableInspector.

  4. Write it to a CSV file and open that in Excel.

  5. Use the Spyder program that came with Anaconda. Spyder has a UI that is more like MATLAB or Stata, so you can view and scroll through objects in memory. This can be very, very useful for developing code. In fact, most of my own research code is written in Spyder.
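Here is a rough sketch of options 1, 2, and 4, plus the variableInspector workaround, assuming your data lives in a pandas DataFrame. The toy DataFrame below is made up for illustration. Plots are left out to keep the example dependency-light, since pandas’ plotting methods need matplotlib.

```python
import pandas as pd

# Toy firm-year returns, made up for illustration (one return is missing on purpose)
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "C", "C"],
    "year": [2020, 2021, 2020, 2021, 2020, 2021],
    "ret": [0.05, -0.02, 0.10, None, 0.03, 0.07],
})

# Option 1: print parts of the actual data
print(df.head(3))                    # first three rows
print(df.sample(2, random_state=0))  # two random rows (seeded so it's repeatable)

# Option 2: summary statistics
df.info()                            # column types and non-null counts: "ret" has 5 of 6
print(df.describe())                 # count, mean, std, min, quartiles, max of numeric columns

# variableInspector workaround: copy a small slice into its own variable first
df_small = df.iloc[:4].copy()

# Option 4: write the slice to a CSV file you can open in Excel
df_small.to_csv("df_small.csv", index=False)
```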

If you have a dataset in Jupyter you want to open in Spyder, you can either

  • save the object from Jupyter (via the pickle module) and open it in Spyder,

  • or convert the .ipynb file to a plain .py file (this strips out the Markdown) so that Spyder can run it: File menu > Export Notebook as > Export Notebook to Executable Script.

Then you’d simply execute the code in Spyder up to the point you were at, and continue.
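Here is a minimal sketch of the pickle route, using only the standard library. The results dictionary and the file name results.pkl are hypothetical. pandas objects also have their own to_pickle method and a pd.read_pickle function that do the same job.

```python
import pickle

# Hypothetical object you built in Jupyter
results = {"firms": ["A", "B"], "betas": [1.1, 0.8]}

# In Jupyter: save the object to disk
with open("results.pkl", "wb") as f:
    pickle.dump(results, f)

# In Spyder (or any later Python session): load it back
with open("results.pkl", "rb") as f:
    results_loaded = pickle.load(f)

print(results_loaded == results)  # True
```

Only unpickle files you created or trust, because loading a pickle can execute arbitrary code. For the second route, the command-line equivalent of the export menu is `jupyter nbconvert --to script your_notebook.ipynb`.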

Much of this class will require submitting .ipynb files, and I recommend using Jupyter exclusively at the beginning. However, if you personally prefer Spyder for bigger projects later on, feel free to use it until the project is ready for write-up. At that point, copy the code into an .ipynb file and add Markdown elements to build the report.

1.8.2.1. No, seriously, look at your data a lot!

This isn’t even a “debugging” point per se.

You know a 6 is a 6. But we will be handling increasingly large datasets. If you don’t look inside the object, it’s easy to make rather large changes without knowing exactly what you have done (or not done). Are you sure that object is exactly what you think it is, containing exactly what you think it does? Thus, the print function and other ways of “glancing into” datasets are crucial. Even when you’ve all become Pythonic pros, you should err toward examining your objects “too much”.
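One common way to make a “rather large change” without noticing is to merge on a key that is unexpectedly duplicated. This is a hedged pandas sketch with toy data made up for illustration. Printing the shapes before and after the merge reveals that rows were silently added. Passing the validate argument to merge can turn that surprise into a loud MergeError.

```python
import pandas as pd

# Toy data, made up for illustration
returns = pd.DataFrame({"firm": ["A", "A", "B"], "ret": [0.05, -0.02, 0.10]})
info = pd.DataFrame({"firm": ["A", "B", "B"],          # oops: firm B is listed twice
                     "industry": ["Tech", "Retail", "Retail"]})

merged = returns.merge(info, on="firm", how="left")

# Glance at the shapes: the merge quietly turned 3 rows into 4
print(returns.shape, merged.shape)   # (3, 2) (4, 3)
print(merged)

# Ask pandas to enforce the assumption that each firm appears once in `info`
try:
    returns.merge(info, on="firm", how="left", validate="many_to_one")
except pd.errors.MergeError as err:
    print("Merge check failed:", err)
```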

1.8.3. Are you still stuck?

It’ll happen! We will try to build some ambitious code this semester! (Imagine trying to replicate the CAPM beta estimation on 500 firms in Excel!) Coding complicated analysis is iterative and debugging can be as tough as having your IT firm audited.

So if you’ve tried the above, what can you do?

  • Writing smart code will save us from getting into intractable problems. More on that next class.

  • Again, see the resources tab of our website! It’s got some good pointers, along with a 15-minute rule: once you’ve spent 15 minutes attempting to troubleshoot a problem, you must ask for help!

  • Finally, clearing your head and taking a mental break might help you spot the problem.

1.8.4. Clear output and rerun from the start!

(Yes, I’m copying this from the prior page. It’s important!)

Warning

I can NOT emphasize this enough: The point of code is to make things reproducible. So code must run from beginning to end and produce the same thing every time.

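The nature of developing code is that you’ll run some lines of code, then write more code, then go back and change something above (and run that part again), and then go back down and keep writing and running new code. When you’re done, there’s a good chance your code is broken! Some cells may depend on variables that earlier cells no longer create (or now create differently), and only the kernel’s memory is hiding that fact.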
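To see why an out-of-order history hides bugs, here is a hypothetical sketch that mimics “Restart Kernel and Run All Cells”. It runs a list of cell sources, in order, in a brand-new namespace using Python’s exec. The cells are invented for illustration. The third one uses n_obs, which (we imagine) was defined in a cell you later deleted. Your live kernel still remembers n_obs, so the cell works there, but a fresh run fails.

```python
def run_from_scratch(cells):
    """Run cell sources top to bottom in a fresh namespace, like a restarted kernel."""
    namespace = {}
    for i, source in enumerate(cells, start=1):
        try:
            exec(source, namespace)
        except Exception as err:
            return f"Cell [{i}] failed: {type(err).__name__}: {err}"
    return "All cells ran in order"

# Hypothetical notebook: n_obs was defined in a cell that has since been deleted
broken_cells = [
    "prices = [100, 110, 121]",
    "rets = [b / a - 1 for a, b in zip(prices, prices[1:])]",
    "avg_ret = sum(rets) / n_obs",
]

# Same notebook after restoring the missing definition
fixed_cells = broken_cells[:2] + ["n_obs = len(rets)", "avg_ret = sum(rets) / n_obs"]

print(run_from_scratch(broken_cells))  # Cell [3] failed: NameError: name 'n_obs' is not defined
print(run_from_scratch(fixed_cells))   # All cells ran in order
```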

A golden rule

  1. Always look to see if the first executed code block is “[1]” and that all the subsequent code blocks are numbered consecutively. Click on this link to see an example.

  2. If the notebook you’re looking at doesn’t meet those two conditions, click “Run” > “Restart Kernel and Run All Cells”.

  3. This applies to your own code! Restart and run from scratch regularly.
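Checking the “[1], [2], [3], …” rule by eye is easy for a short notebook. For longer ones, here is a sketch that reads the notebook file directly. It assumes the standard .ipynb JSON layout: a top-level cells list whose code cells carry an execution_count, which is null for cells that never ran. has_clean_run and make_fake_notebook are hypothetical helpers, not part of Jupyter.

```python
import json
import os
import tempfile

def has_clean_run(notebook_path):
    """True if code cells' execution counts read 1, 2, 3, ... with no gaps, repeats, or blanks."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    counts = [cell.get("execution_count")
              for cell in nb["cells"] if cell["cell_type"] == "code"]
    return counts == list(range(1, len(counts) + 1))

def make_fake_notebook(counts):
    """Hypothetical helper: a bare-bones notebook whose code cells have these execution counts."""
    cells = [{"cell_type": "code", "execution_count": c, "metadata": {},
              "outputs": [], "source": ["2+2"]} for c in counts]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, counts in [("clean", [1, 2, 3]), ("messy", [4, 2, 7])]:
        path = os.path.join(tmp, name + ".ipynb")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(make_fake_notebook(counts), f)
        results[name] = has_clean_run(path)

print(results)  # {'clean': True, 'messy': False}
```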

1.8.5. Did you follow these tips and fix your code?

Congrats!