3.3.2. Making a Plot
To start plotting, add these to your import statements at the top of your file:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt # sometimes we want to tweak plots
3.3.2.1. General tips
Start with simple graphs, and then build in and layer on “complications” and features.
Really compare your code with the syntax in the documentation. Understanding what each parameter does and needs is essential.
Triple-check for typos, unclosed parentheses, and the like.
What chart should I use (with sns examples), and more help on how to make it
The seaborn tutorial page is excellent: https://seaborn.pydata.org/tutorial.html
3.3.2.2. Syntax tips
With seaborn, I usually use syntax for graphing that looks something like the template that follows. (Delete the “<” and “>” and replace the inside with what you need.) You’ll see many examples in this chapter that deviate from this, usually because you don’t need to explicitly declare “data”, or because “x” is just assumed to be all variables in the dataset.
sns.<function>(data = <dataframe> [optional data functions],
x = '<varname>', y = '<varname>',
[optional arguments for specific plots] )
Tips for the “Optional data functions”:
Sometimes I add .query() after the dataframe name to filter outliers.
Sometimes I add .sample() afterwards to plot a more manageable amount of data.
Example:
sns.boxplot(data=ccm.query('td_a < 1 & td_a > 0'),
x='td_a')
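Here is a minimal, self-contained sketch of the same pattern with both tips combined. The ccm dataframe isn't defined on this page, so the sketch builds a made-up stand-in (ccm_demo, with a random td_a column); the data and the sample size of 500 are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # optional: lets this run as a plain script; drop it in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for ccm: a single made-up column named td_a
rng = np.random.default_rng(0)
ccm_demo = pd.DataFrame({"td_a": rng.normal(loc=0.5, scale=0.4, size=5_000)})

# .query() keeps only rows strictly between 0 and 1 (filters the outliers)
filtered = ccm_demo.query("td_a < 1 & td_a > 0")

# .sample() draws a random subset so the plot stays manageable;
# random_state makes the draw repeatable
subset = filtered.sample(n=500, random_state=0)

# Same call shape as the template: data first, then the variable to plot
ax = sns.boxplot(data=subset, x="td_a")
```

Because .query() and .sample() each return a new DataFrame, they can also be chained directly inside the data= argument, as in the example above.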
3.3.2.3. Tips on plotting workflow
Generally, to plot in Python:
Put your data into a DataFrame
Format the data long if you want to use a sns function
Use pd or sns plotting functions. Q: Which? A: Whichever is easiest!
pandas’s plotting functions are simple and good for early-stage work and some simple graphics (bar, “barh”, scatter, and density), but seaborn has many more built-in options, has simpler syntax, and is easier to use, IMO.
Start with basic plots, then layer in features
Get the “gist” of the figure right
If you need to customize the figure, you’ll end up using matplotlib commands after the main plot function. Matplotlib is a full-powered (but confusing as heck) graphing package. In fact, both pandas and seaborn are just using matplotlib, but they hide the gory details for us. Thanks, seaborn!
This page discusses customizing and improving figures
Only customize when you truly need that level of control. Focus on CONTENT over hyper-control of formatting.
Some “format” tweaks (adding a title, changing the axis titles) and choices about plotting are quick and cheap but have high value. Do these right before you finish your project/assignment and are about to post it officially. Otherwise, focus on content.
3.3.2.4. “I swear the syntax is correct!”
Warning
After syntax errors, most graphing pain comes from insufficient data wrangling. Most plotting functions have assumptions about how the data is shaped. Data might be unwieldy but we can control it:
How do we wrangle our data to make plot functions happy?
Keep your data in “tidy form” (aka tall data, aka long data). Seaborn expects data shaped like this. Long data is generally better for data analysis and visualization (even aside from Seaborn’s assumptions).
The exception: pandas. If you want to plot using a pandas plot function, you might have to (temporarily) reshape your data to the wider “output shape” that corresponds to the graph type you’re generating.
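To make the long versus wide distinction concrete, here is a small sketch of moving between the two shapes with pandas. The dataframe and its names (year, firm_a, firm_b, firm, value) are invented for illustration; melt and pivot are one way to do the reshaping, not the only one.

```python
import pandas as pd

# Made-up wide data: one row per year, one column per firm
wide = pd.DataFrame({"year": [2019, 2020, 2021],
                     "firm_a": [1.0, 1.5, 2.0],
                     "firm_b": [2.0, 1.8, 2.2]})

# Wide -> long (tidy): one row per (year, firm) observation
long = wide.melt(id_vars="year", var_name="firm", value_name="value")

# Long -> wide again, temporarily, when a pandas plot function wants it:
# the index becomes the x axis and each column becomes its own series
wide_again = long.pivot(index="year", columns="firm", values="value")
```

With these shapes, the long frame suits a call like sns.lineplot(data=long, x="year", y="value", hue="firm"), while wide_again.plot() draws one line per column.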