Better Viz

Making Good Viz

There is an enormous amount of scholarship and debate about what makes for effective graphs, and I can't possibly do the field justice. Below is simply one person's distillation of some tips that are reasonably well agreed upon. I'm aiming to be concise here so that we can practice, but if you want more, visit the links below and the links in the last lecture.

Don'ts

  • pie charts: humans stink at interpreting angles
  • stacked bar charts: tough to decode trends
  • making your reader do math: if $x-y$ is interesting, plot $x-y$ itself rather than $x$ and $y$ separately
  • misleading scales
  • 3D unless absolutely necessary (and it almost surely isn't)
  • distracting chart junk
  • unnecessary colors
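To make the "don't make your reader do math" tip concrete, here is a minimal sketch using matplotlib. The revenue and cost numbers are made up purely for illustration (they are not from the lecture data), and the variable names are my own assumptions. The idea is simply to compute the difference first and plot that one series.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Hypothetical numbers, for illustration only (not from the lecture data)
years = [2015, 2016, 2017, 2018, 2019, 2020]
revenue = [100, 104, 109, 115, 118, 121]
costs = [90, 95, 101, 104, 111, 116]

# Do the subtraction for the reader instead of plotting two lines
profit = [r - c for r, c in zip(revenue, costs)]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(years, profit, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Revenue minus costs")
ax.set_title("Plot the quantity of interest directly")

# Drop the top and right spines to reduce clutter
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
```

If revenue and costs were plotted as two separate lines, both would slope upward and the shrinking gap between them would be easy to miss. Plotting the difference puts the interesting pattern front and center.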

An illustration of some of those Don'ts in practice:

Another example not to replicate:

Do's: slides 49-64

  • Show the data, reduce the clutter, and integrate the text and the graph
    • graphs should aspire to be sufficient to understand without reading the text
  • Control the aspect ratio
  • Think about whether you need to include zero. Sometimes excluding it makes the figure misleading. Sometimes including it (and expanding the y-axis to do so) can hide the variation you're describing.
  • Facilitate comparisons:
    • by placing figure components next to or above (depends!) the things they are compared to
    • by using the same axis (two y-axes are usually bad!)
    • labels > legends! (so readers' eyes don't have to dart back and forth)
    • sort in meaningful orders (e.g. by value, not alphabetically!)
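A few of the tips above (sorting in a meaningful order, labels instead of legends, and a shared axis) can be sketched with plain matplotlib. This is one possible approach, not the only one, and all of the data and names below (regions, stores, sales figures) are hypothetical values invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only
sales_by_region = {"East": 240, "North": 120, "South": 310, "West": 180}
years = [2018, 2019, 2020, 2021]
sales_by_store = {"Store A": [200, 230, 260, 310], "Store B": [220, 215, 240, 250]}

# Sort by value (not alphabetically) so the ranking is visible at a glance
ordered = sorted(sales_by_region.items(), key=lambda item: item[1])
names = [name for name, _ in ordered]
values = [value for _, value in ordered]

# figsize controls the aspect ratio of the whole figure
fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(9, 3))

# Left panel: sorted bars with the value written next to each bar
ax_bar.barh(names, values)  # first item is drawn at the bottom, so the largest ends up on top
for position, value in enumerate(values):
    ax_bar.text(value, position, f" {value}", va="center")
ax_bar.set_xlabel("Sales (hypothetical units)")

# Right panel: both series share one y-axis and are labeled at the line ends, no legend
for label, series in sales_by_store.items():
    (line,) = ax_line.plot(years, series)
    ax_line.text(years[-1], series[-1], f" {label}", color=line.get_color(), va="center")
ax_line.set_xticks(years)
ax_line.set_xlabel("Year")

# Reduce clutter by removing the top and right spines
for ax in (ax_bar, ax_line):
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
fig.tight_layout()
```

Matching each label's color to its line lets the reader connect them without a legend, and putting both stores on the same y-axis makes their levels directly comparable.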

Transforming bad figures to good ones

Practice: Thinking and planning

TSP: Which type of graph (bar, line, or histogram) would you use for:

  1. The volume of apples picked at an orchard based on the type of apple (Granny Smith, Fuji, etcetera).
  2. The number of points for each game in a basketball season for a team.
  3. The count of apartment buildings in Chicago by the number of individual units.

  1. Suppose we create a scatter plot but find that, because of the large number of points, it's hard to interpret. What are two things we can do to fix this issue?
  2. Suppose that we create an n by n FacetGrid. How big can n get?
  3. What two things make faceting appealing?
  4. When is pairplot most useful?

(The answers are available in the raw ipynb file for this class within the lectures repo.)

Practice: Fixer Upper

The usual process is to start with simple figures and then iterate to improve them. Naturally, almost all figures start, well, not great.

Work with the classmate next to you to improve the "first pass" figures we started last class.

Use the guidelines above.

My turn: Oh the possibilities

In the last lecture page, I introduced a larger set of firm accounting variables. I want to show you how far we can push this.

If you want to see the code that makes these, view the raw ipynb file on GitHub. The code uses plotly's subpackage Plotly Express (plotly.express), which is ridiculously easy to use.

One more

This is a replication of a famous Hans Rosling TED talk figure using the well-known gapminder data:

Before next class

  1. Improve all of the plots in the Visualization Practice page.
  2. Flip through the links above and the references in the Making Viz page. Make note of any neat chart types or alterations that improve charts that you would like to implement sometime.
  3. Now that we have most of our toolkit in place, read What I do when I get a new data set as told through tweets.

References

See the last lecture.