3.3.2. Making a Plot
To start plotting, add these to your import statements at the top of your file:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # sometimes we want to tweak plots
3.3.2.1. General tips
Start with simple graphs, then layer on “complications” and features.
Really compare your code with the syntax in the documentation. Understanding what each parameter does and needs is essential.
Triple-check for typos, unclosed parentheses, and the like.
The seaborn tutorial page is excellent: https://seaborn.pydata.org/tutorial.html
3.3.2.2. Syntax tips
For seaborn, I usually use syntax that looks something like this for graphing. (Delete the “<” and “>” and replace the inside with what you need.) You’ll see many examples in this chapter that deviate from this, usually because you don’t need to explicitly declare “data”, or because “x” is omitted and seaborn plots all the variables in the dataset.
sns.<function>(data = <dataframe> [optional data functions], x = '<varname>', y = '<varname>', [optional arguments for specific plots] )
Tips for the “Optional data functions”:
Sometimes I add .query() after the dataframe name to filter outliers.
Sometimes I add .sample() afterwards to plot a more manageable amount of data.
sns.boxplot(data=ccm.query('td_a < 1 & td_a > 0'), x='td_a')
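The chapter’s `ccm` DataFrame isn’t defined here. This sketch therefore builds a hypothetical stand-in, `ccm_demo`, with made-up values in a `td_a` column (a few of them outliers). It then chains `.query()` and `.sample()` before plotting, as in the tips above. The sample size and random seeds are arbitrary choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `ccm`: mostly values in (0, 1), plus three outliers
rng = np.random.default_rng(0)
ccm_demo = pd.DataFrame({
    "td_a": np.concatenate([rng.uniform(0, 1, 1000), [-2.0, 5.0, 12.0]])
})

# .query() drops the outliers; .sample() keeps a random, more manageable subset
subset = ccm_demo.query("td_a < 1 & td_a > 0").sample(n=200, random_state=1)

ax = sns.boxplot(data=subset, x="td_a")
```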
3.3.2.3. Tips on plotting workflow
Generally, to plot in Python:
Put your data into a DataFrame
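Format the data long if you want to use a seaborn function (or wide if you want to use a pandas plot function)
Pick pandas or seaborn to plot. Q: Which? A: Whichever is easiest!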
pandas’ plotting functions are simple and good for early-stage work and some simple graphics (bar, “barh”, scatter, and density), but
seaborn has many more built-in options, has simpler syntax, and is easier to use, IMO.
Start with basic plots, then layer in features
Get the “gist” of the figure right
If you need to customize the figure, you’ll end up using matplotlib commands after the main plot function. Matplotlib is a full-powered (but confusing as heck) graphing package. In fact, both pandas and seaborn are just using matplotlib, but they hide the gory details for us. Thanks, pandas and seaborn!
This page discusses customizing and improving figures
Only customize when you genuinely need fine-grained control. Focus on CONTENT over hyper-control of formatting.
Some “format” tweaks (add a title, change the axis titles) and choices about plotting can be quick/cheap and have high value, and you should do these right before you finish your project/assignment and are about to post it officially. Otherwise, focus on content.
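In practice, the workflow above might look like this. It is a minimal sketch with made-up data (the DataFrame `df` and its `year`/`sales` columns are hypothetical). It makes a quick pandas plot for an early look, then the same figure in seaborn, and finishes with a few cheap, high-value matplotlib tweaks.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# 1. Put the (hypothetical) data into a DataFrame
df = pd.DataFrame({
    "year":  [2018, 2019, 2020, 2021, 2022],
    "sales": [100, 120, 90, 130, 150],
})

# 2a. Quick, early-stage look with pandas' built-in plotting (matplotlib underneath)
ax_pd = df.plot.scatter(x="year", y="sales")

# 2b. The same "gist" with seaborn, drawn on a fresh figure
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="year", y="sales", ax=ax)

# 3. Formatting tweaks last, via matplotlib methods on the Axes seaborn drew on
ax.set_title("Sales by year")
ax.set_xlabel("Year")
ax.set_ylabel("Sales")
```

Both the pandas and seaborn calls hand back a matplotlib `Axes`, which is why the same `set_title`/`set_xlabel` tweaks work after either one.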
3.3.2.4. “I swear the syntax is correct!”
After syntax errors, most graphing pain comes from insufficient data wrangling. Most plotting functions make assumptions about how the data is shaped. Data might be unwieldy, but we can control it:
How do we wrangle our data to make plot functions happy?
Keep your data in “tidy form” (aka tall data, aka long data).
Seaborn expects data shaped like this. Long data is generally better for data analysis and visualization (even aside from Seaborn’s assumptions).
The exception: pandas. If you want to plot using a pandas plot function, you might have to (temporarily) reshape your data to the wider “output shape” that corresponds to the graph type you’re generating.