3.3.2. Making a Plot

To start plotting, add these to your import statements at the top of your file:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # sometimes we want to tweak plots

3.3.2.1. General tips

  1. Start with simple graphs, and then build in and layer on “complications” and features.

  2. Really compare your code with the syntax in the documentation. Understanding what each parameter does and needs is essential.

  3. Triple-check for typos, unclosed parentheses, and the like.

  4. What chart should I use? (with sns examples), and more help on how to make it

  5. The seaborn tutorial page is excellent: https://seaborn.pydata.org/tutorial.html

3.3.2.2. Syntax tips

With seaborn, I usually use syntax that looks something like this for graphing. (Delete the “<” and “>” and replace the inside with what you need.) You’ll see many examples in this chapter that deviate from this. Usually that is because you don’t need to explicitly declare “data”, or because “x” is assumed to be all the variables in the dataset.

sns.<function>(data = <dataframe> [optional data functions],
               x = '<varname>', y = '<varname>',  
               [optional arguments for specific plots]   )
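Here is a minimal filled-in instance of that template. It is a sketch only: the `houses` DataFrame and its `sqft`/`price` columns are made-up toy data, and `scatterplot` is just one choice for `<function>`. Note that the variable names go in as strings.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data; the column names are made up for illustration
houses = pd.DataFrame({
    'sqft':  [850, 1200, 1500, 1800, 2400],
    'price': [150, 210, 240, 300, 410],   # hypothetical prices in $1,000s
})

# <function> = scatterplot, <dataframe> = houses, x/y = column names as strings
ax = sns.scatterplot(data=houses, x='sqft', y='price')
```

Seaborn returns the matplotlib Axes it drew on. Keeping it in `ax` is handy if you later want to tweak the plot.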

Tips for the “Optional data functions”:

  1. Sometimes I add .query() after the dataframe name to filter out outliers.

  2. Sometimes I add .sample() afterwards to plot a more manageable amount of data.

Example:

sns.boxplot(data=ccm.query('td_a < 1 & td_a > 0'),
            x='td_a')
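The two data functions can also be chained: filter first, then sample from what remains. Because `ccm` isn't defined here, this sketch builds a hypothetical stand-in DataFrame with a `ratio` column. It assumes 500 rows is a "manageable" amount.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for a real dataset: 10,000 random values of a ratio
rng = np.random.default_rng(0)
df = pd.DataFrame({'ratio': rng.normal(loc=0.5, scale=0.4, size=10_000)})

# .query() keeps only rows with 0 < ratio < 1; .sample() then draws 500 of them.
# random_state makes the sample reproducible from run to run.
subset = df.query('ratio < 1 & ratio > 0').sample(n=500, random_state=1)

ax = sns.boxplot(data=subset, x='ratio')
```

Filtering before sampling ensures that all 500 plotted points fall inside the range you care about.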

3.3.2.3. Tips on plotting workflow

Generally, to plot in Python:

  1. Put your data into a DataFrame

  2. Reshape the data into long format if you want to use a seaborn (sns) function.

  3. Use pd or sns plotting functions.

    • Q: Which? A: Whichever is easiest! Pandas’ plotting functions are simple and good for early-stage work and some simple graphics (bar, “barh”, scatter, and density). Seaborn, however, has many more built-in options, simpler syntax, and is easier to use, IMO.

    • Start with basic plots, then layer in features

    • Get the “gist” of the figure right

  4. If you need to customize the figure, you’ll end up using matplotlib commands after the main plot function. Matplotlib is a full-powered (but confusing as heck) graphing package. In fact, both pandas and seaborn are just using matplotlib, but they hide the gory details for us. Thanks, seaborn!

    • This page discusses customizing and improving figures

    • Only customize when you truly need fine-grained control. Focus on CONTENT over hyper-control of formatting.

    • Some “format” tweaks (adding a title, changing the axis titles) and plotting choices can be quick/cheap and high value. Do these right before you finish your project/assignment and are about to post it officially. Otherwise, focus on content.
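Here is a rough sketch of steps 1, 3, and 4 end to end, using hypothetical toy numbers. Step 2 is skipped because each row is already one observation. It shows a quick pandas plot, the same “gist” in seaborn, and then a few cheap matplotlib tweaks applied to the Axes that seaborn returns.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Step 1: put the data into a DataFrame (hypothetical toy numbers)
df = pd.DataFrame({
    'year':  [2019, 2020, 2021, 2022],
    'sales': [10, 12, 9, 15],
})

# Step 3a: quick first look with pandas' built-in plotting (makes its own figure)
df.plot(x='year', y='sales', kind='bar')

# Step 3b: the same gist with seaborn, on a fresh figure
plt.figure()
ax = sns.barplot(data=df, x='year', y='sales')

# Step 4: quick, high-value matplotlib tweaks on the seaborn Axes
ax.set_title('Sales by year')
ax.set_xlabel('Year')
ax.set_ylabel('Sales (hypothetical units)')
```

In a notebook, the figures display automatically. In a script, you would typically call `plt.show()` at the end.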

3.3.2.4. “I swear the syntax is correct!”

Warning

After syntax errors, most graphing pain comes from insufficient data wrangling. Most plotting functions have assumptions about how the data is shaped. Data might be unwieldy but we can control it:

How do we wrangle our data to make plot functions happy?

  • Keep your data in “tidy form” (aka tall data, aka long data). Seaborn expects data shaped like this. Long data is generally better for data analysis and visualization, even aside from seaborn’s assumptions.

  • The exception: pandas. If you want to plot using a pandas plot function, you might have to (temporarily) reshape your data to the wider “output shape” that corresponds to the graph type you’re generating.
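Here is a small sketch of moving between the two shapes. It starts from a hypothetical wide table of exam scores. `DataFrame.melt` produces the long form that seaborn wants. `DataFrame.pivot` then produces the wide “output shape” that pandas’ `.plot(kind='bar')` draws as one group of bars per row.

```python
import pandas as pd
import seaborn as sns

# Hypothetical wide data: one row per student, one column per exam
wide = pd.DataFrame({
    'student': ['A', 'B', 'C'],
    'midterm': [78, 85, 92],
    'final':   [82, 88, 95],
})

# Tidy/long: one row per (student, exam) observation, the shape seaborn expects
tidy = wide.melt(id_vars='student', var_name='exam', value_name='score')

# Seaborn maps columns of the long data to plot roles (x, y, hue)
ax = sns.barplot(data=tidy, x='student', y='score', hue='exam')

# For pandas plotting, pivot (temporarily) to the wide output shape:
# index = x-axis categories, columns = one bar series each
output_shape = tidy.pivot(index='student', columns='exam', values='score')
output_shape.plot(kind='bar')
```

Keeping `tidy` as your main copy and pivoting only at plot time follows the advice above: long by default, and wide only when a pandas plot function requires it.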