3.3.2. Making a Plot

To start plotting, add these to your import statements at the top of your file:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # sometimes we want to tweak plots

3.3.2.1. Plotting process

| # | Step | Note |
|---|------|------|
| 0 | Ask a question about the data | Ex: What is the distribution of unemployment in each state? |
| 1 | Q > What the plot should look like. Draw it! | Draw it on paper! |
| 2 | Plot appearance > which plot function/options to use | Find a pd or sns plot example that looks like that. |
| 3 | The function dictates how data should be formatted before you call the plot | Key: Wide or tall? |
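
As a rough sketch of how the four steps might play out for the example question, the snippet below uses made-up unemployment numbers; the column names `state` and `unemployment` and the choice of `sns.boxplot` are assumptions for illustration, not the only reasonable answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Step 0, the question: what is the distribution of unemployment in each state?
# Step 1, the sketch: one box per state, unemployment on the vertical axis.
# Step 2, the function: sns.boxplot draws that shape.
# Step 3, the data format: boxplot wants tall data, one row per observation.

rng = np.random.default_rng(42)
states = ["CA", "NY", "PA", "TX"]
df = pd.DataFrame({
    "state": np.repeat(states, 24),  # 24 made-up monthly observations per state
    "unemployment": rng.normal(5.0, 1.0, size=24 * len(states)),
})

ax = sns.boxplot(data=df, x="state", y="unemployment")
```

If your sketch from step 1 looked different (say, overlapping histograms), you would pick a different function in step 2, and step 3 might change with it.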

3.3.2.2. General tips

  1. Start with simple graphs, and then build in and layer on “complications” and features.

    • Easy formatting things you should always do: title, xlabel, ylabel

    • Perfecting graphs can involve a lot of tweaks and code, so only do this when you know you have the content of the graph correct and that it will be in your deliverable (assignment, report, etc)

  2. Really compare your code with the syntax in the documentation.

    • Understanding what each parameter does and needs is essential.

    • Triple check for typos, unclosed parentheses, and the like

  3. Three places to find similar charts (and the functions that make them)

3.3.2.3. Syntax tips

With seaborn, I usually use syntax that looks something like this for graphing. (Delete the “<” and “>” and replace the inside with what you need.) You’ll see many examples in this chapter that deviate from this, usually because you don’t need to explicitly declare “data”, or because “x” is assumed to be all variables in the dataset.

sns.<function>(data = <dataframe> [optional data functions],
               x = '<varname>', y = '<varname>',  
               [optional arguments for specific plots]   )

Tips for the “Optional data functions”:

  1. Sometimes I add .query() after the dataframe name to filter outliers

  2. Sometimes I add .sample() afterwards to plot a more manageable amount of data.

Example:

sns.boxplot(data=ccm.query('td_a < 1 & td_a > 0'),
            x='td_a')
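
Since `ccm` is not defined on this page, here is a self-contained sketch of the same pattern. The DataFrame `df` and its values are made up as a stand-in; only the `.query()` and `.sample()` chaining is the point.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for ccm: td_a is filled with made-up values
rng = np.random.default_rng(0)
df = pd.DataFrame({"td_a": rng.normal(loc=0.3, scale=0.4, size=5_000)})

# Tip 1: filter outliers, keeping only rows with 0 < td_a < 1
subset = df.query("td_a < 1 & td_a > 0")

# Tip 2: plot a random subset of at most 1,000 rows to keep the figure manageable
subset = subset.sample(n=min(1_000, len(subset)), random_state=0)

ax = sns.boxplot(data=subset, x="td_a")
```

Setting `random_state` makes the sample reproducible, which helps when you rerun a notebook and want the same figure.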

3.3.2.4. Tips on plotting workflow

Generally, to plot in Python:

  1. Put your data into a DataFrame

  2. Reshape the data to long format if you want to use a sns function

  3. Use pd or sns plotting functions.

    • Q: Which? A: Whichever is easiest! The pandas plotting functions are simple and good for early-stage exploration and some simple graphics (bar, “barh”, scatter, and density), but seaborn has many more built-in options, has simpler syntax, and is easier to use, IMO.

    • Start with basic plots, then layer in features

    • Get the “gist” of the figure right

  4. If you need to customize the figure, you’ll end up using matplotlib commands after the main plot function. Matplotlib is a full-powered (but confusing as heck) graphing package. In fact, both pandas and seaborn are just using matplotlib, but they hide the gory details for us. Thanks, seaborn!

    • This page discusses customizing and improving figures

    • Only customize when necessary for hyper control. Focus on CONTENT over hyper-control of formatting.

    • Some “format” tweaks (adding a title, changing the axis titles) and choices about plotting are quick and cheap but have high value. Do these right before you finish your project/assignment and post it officially. Otherwise, focus on content.
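
The workflow above can be sketched roughly as follows. The data and labels are made up for illustration; the idea is only that the seaborn call returns a matplotlib Axes object, and the cheap formatting tweaks are applied to it afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data, purely for illustration
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 58, 65, 71, 80, 86],
})

# First, get the content right with the main plot function
ax = sns.scatterplot(data=df, x="hours_studied", y="exam_score")

# Then apply cheap, high-value formatting with matplotlib commands
ax.set_title("Exam score vs. hours studied")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
plt.tight_layout()
```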

3.3.2.5. “I swear the syntax is correct!”

Warning

After syntax errors, most graphing pain comes from insufficient data wrangling. Most plotting functions have assumptions about how the data is shaped. Data might be unwieldy but we can control it:

How do we wrangle our data to make plot functions happy?

  • Keep your data in “tidy form” (aka tall data, aka long data). Seaborn expects data shaped like this. Long data is generally better for data analysis and visualization, even aside from seaborn’s assumptions.

  • The exception: Pandas. If you want to plot using a pandas plot function, you might have to reshape (temporarily) your data to the wider “output shape” that corresponds to the graph type you’re generating.