Thoughts about visualization and storytelling
Visually attractive graphics also gather their power from content and interpretations beyond the immediate display of some numbers. The best graphics are about the useful and important, about life and death, about the universe. Beautiful graphics do not traffic with the trivial.
- Edward Tufte, The Visual Display of Quantitative Information, Chapter 9: Aesthetics and Technique in Data Graphical Design
Antipatterns in visualization
via Mushon Zer-Aviv, "Disinformation Visualization: How to lie with datavis"
Gallup: Pro-choice vs Pro-life
Riffing off of Mushon Zer-Aviv's critique of the various ways the Gallup abortion poll can be remixed visually to favor a certain political viewpoint.
Gallup's graph 1995 - 2010
The AP skew
(the AP article was originally published in 2009)
Back in 2009 when the AP wanted to use the data to tell a good story, all they needed was to skew the graph’s proportions a bit, change the starting point of the vertical axis and switch to thicker, less nuanced lines that, based on the very same data, emphasize a clear trend in the graph. And a clear shift in opinion in such a huge ideological debate makes for a much more interesting story.
The LiveAction.org skew
Now here's LiveAction's take on it:
Earlier in 2009 when pro-life site LiveAction.org wanted to graph the same data to preach to the choir and say “we’re winning!”… This time, no need to admit the public opinion was ever at the 1990’s low pro-life mark, so starting from 2003-2004 is fine. Focusing on a shorter time frame to tell a story of consistent increase helps too, as does the choice of 18-29 years old age brackets. There’s nothing about the data that isn’t true, but it is very selective.
Zooming out: Gallup 1995 - 2014
The picture becomes much more muddled when you look at how opinion has changed from 2009 to 2014:
Adding nuance of which circumstances
Moving beyond pro-life versus pro-choice, let's look at the responses for "Do you think abortions should be legal under any circumstances, legal only under certain circumstances, or illegal in all circumstances?"
Variations in charting
Let's see what another level of granularity in data can show.
For respondents who said that abortions should be legal in certain circumstances, Gallup then asked them if they meant "in most circumstances" or "only in a few circumstances".
Stock area chart
When charting in Google Spreadsheets, the default chart choice is Area:
The colors come from the default palette rather than from any meaning in the data, and the default assignment is pretty confusing: blue is for Always, but red is for Mostly.
So let's switch Always to blue, make Mostly a lighter blue, and Few and Never shades of red:
Area color, politicized
Now here's where judgment in color choices can skew the perception of the graph. Someone who sympathizes with the pro-choice side might argue that Few respondents should not be counted as pro-life, on the logic that you're either pro-life or you aren't. Such an advocate may then choose to make the Few area more neutral in appearance:
By making Few look much less associated with Never, the chart makes the pro-choice contingent seem much more prevalent.
The area chart is useful for showing parts of a whole. But let's go with Gallup's choice and use a line graph.
Again, the default choices by Google are not helpful:
The problem with area charts (similar to pie charts) is that the eye has a harder time distinguishing differences between amounts of area. With the line chart, however, we see a new insight: the respondents who believe abortion should be highly restricted, i.e. Few circumstances, are by far the largest group. And pro-lifers may claim that as a victory in the battle for opinion.
Line, compressed vertical axis
Changing the minimum of the axis to 5, and the maximum to 50, we see more dramatic changes in opinion, though that just serves to make the polling results seem more jittery:
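To put a rough number on how much a compressed axis amplifies a change, here is a minimal Python sketch. It assumes the chart maps the axis range linearly onto its height. The 0-to-100 comparison axis and the 5-point swing are illustrative assumptions, not Gallup figures and not necessarily Google's default axis range; `visual_fraction` is a hypothetical helper name.

```python
def visual_fraction(change, axis_min, axis_max):
    """Share of the chart's height that a change of `change` points occupies,
    assuming the axis range is mapped linearly onto the chart height."""
    return change / (axis_max - axis_min)

# Hypothetical 5-point swing in opinion, for illustration only.
swing = 5

full_axis = visual_fraction(swing, 0, 100)   # assumed axis running from 0 to 100
compressed = visual_fraction(swing, 5, 50)   # axis running from 5 to 50, as in the text

print(f"0-100 axis: {full_axis:.1%} of chart height")
print(f"5-50 axis:  {compressed:.1%} of chart height")
print(f"exaggeration factor: {compressed / full_axis:.2f}x")
```

Under these assumptions, the same swing takes up a little more than twice as much vertical space on the compressed axis, which is why ordinary poll-to-poll noise starts to look like dramatic movement.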
Area, with less nuance
One more iteration: an area chart with the two in-between opinions combined into a Sometimes category, i.e. the same as Gallup's original graph, except as an area chart:
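The merge described above is simple arithmetic, and a short Python sketch makes the lost detail explicit. The percentages below are hypothetical placeholders for a single poll year, not actual Gallup results, and `collapse` is an illustrative helper name.

```python
# Hypothetical percentages for one poll year, for illustration only.
# These are NOT actual Gallup results.
row = {"Always": 25, "Mostly": 15, "Few": 38, "Never": 20}

def collapse(row):
    """Merge the two in-between answers into a single Sometimes category."""
    return {
        "Always": row["Always"],
        "Sometimes": row["Mostly"] + row["Few"],
        "Never": row["Never"],
    }

combined = collapse(row)
print(combined)
```

Once Mostly and Few are summed, there is no way to recover how the Sometimes group splits between them, which is exactly the nuance the coarser chart gives up.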
An introduction to public affairs reporting and the core skills of using data to find and tell important stories.
- Count something interesting
- Make friends with math
- The joy of text
- How to do a data project
Just because it's data doesn't make it right. But even when all the available data is flawed, we can get closer to the truth with mathematical reasoning and the ability to make comparisons, small and wide.
- Fighting bad data with bad data
- Baltimore's declining rape statistics
- FBI crime reporting
- The Uber effect on drunk driving
- Pivot tables
Learn how to take data in your own hands. There are two kinds of databases: the kind someone else has made, and the kind you have to make yourself.
- The importance of spreadsheets
- Counting murders
- Making calls
- A crowdsourced spreadsheet
Phillip Reese of the Sacramento Bee will discuss how he uses data in his investigative reporting projects.
- Phillip Reese speaks
Mapping can be a dramatic way to connect data to where readers are and to what they recognize.
- Why maps work
- Why maps don't work
- Introduction to Fusion Tables and TileMill
A continuation of learning mapping tools, with a focus on borders and shapes
- Working with KML files
- Intensity maps
- Visual joins and intersections
The first in several sessions on learning SQL for the exploration of large datasets.
- MySQL / SQLite
- Select, group, and aggregate
- Where conditionals
- SFPD reports of larceny, narcotics, and prostitution
- Babies, and what we name them
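As a taste of what select, where, group, and aggregate look like together, here is a minimal sketch using Python's built-in `sqlite3` module. The `incidents` table, its columns, and its rows are made up for illustration; they are not the actual SFPD dataset used in class.

```python
import sqlite3

# In-memory database with a tiny, made-up table of incident reports.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (category TEXT, district TEXT)")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [
        ("LARCENY", "A"),
        ("LARCENY", "B"),
        ("NARCOTICS", "A"),
        ("PROSTITUTION", "B"),
        ("LARCENY", "A"),
        ("VANDALISM", "A"),
    ],
)

# SELECT picks columns, WHERE filters rows, GROUP BY buckets them,
# and COUNT(*) aggregates each bucket.
rows = conn.execute(
    """
    SELECT category, COUNT(*) AS n
    FROM incidents
    WHERE category IN ('LARCENY', 'NARCOTICS', 'PROSTITUTION')
    GROUP BY category
    ORDER BY n DESC, category
    """
).fetchall()

for category, n in rows:
    print(category, n)
```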
The ability to join different datasets is one of the most direct ways to find stories that have been overlooked.
- Inner joins
- One-to-one relationships
- Our politicians and what they tweet
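An inner join keeps only the rows that have a match in both tables. The sketch below uses Python's built-in `sqlite3` module with two made-up tables, `members` and `tweets`; the names, handles, and rows are hypothetical and stand in for whatever datasets the class actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE members (name TEXT, twitter_handle TEXT);
    CREATE TABLE tweets (twitter_handle TEXT, text TEXT);

    -- Hypothetical rows, for illustration only.
    INSERT INTO members VALUES
        ('Member A', 'member_a'),
        ('Member B', 'member_b'),
        ('Member C', 'member_c');
    INSERT INTO tweets VALUES
        ('member_a', 'first'),
        ('member_a', 'second'),
        ('member_b', 'hello');
    """
)

# Match each member to their tweets on the shared handle column,
# then count tweets per member.
joined = conn.execute(
    """
    SELECT members.name, COUNT(*) AS tweet_count
    FROM members
    INNER JOIN tweets ON tweets.twitter_handle = members.twitter_handle
    GROUP BY members.name
    ORDER BY members.name
    """
).fetchall()

for name, tweet_count in joined:
    print(name, tweet_count)
```

Member C has no tweets in this made-up data, so the inner join drops that row from the result entirely.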
Sometimes, what's missing is more important than what's there. We will cover more complex join logic to find what's missing from related datasets.
- Left joins
- NULL values
- Which Congressmembers like Ellen DeGeneres?
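A left join keeps every row from the first table and fills in NULL where the second table has no match, so filtering for NULL surfaces what is missing. The sketch below uses Python's built-in `sqlite3` module; the `members` and `tweets` tables and all their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE members (name TEXT, twitter_handle TEXT);
    CREATE TABLE tweets (twitter_handle TEXT, text TEXT);

    -- Hypothetical rows, for illustration only.
    INSERT INTO members VALUES
        ('Member A', 'member_a'),
        ('Member B', 'member_b'),
        ('Member C', 'member_c');
    INSERT INTO tweets VALUES
        ('member_a', 'first'),
        ('member_b', 'hello');
    """
)

# LEFT JOIN keeps every member; members without a matching tweet get NULL
# in the tweets columns, and IS NULL selects exactly those members.
missing = conn.execute(
    """
    SELECT members.name
    FROM members
    LEFT JOIN tweets ON tweets.twitter_handle = members.twitter_handle
    WHERE tweets.twitter_handle IS NULL
    ORDER BY members.name
    """
).fetchall()

print(missing)
```

Note that the filter uses `IS NULL` rather than `= NULL`: in SQL, a comparison with NULL using `=` never evaluates to true.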
A casual midterm covering the range of data analysis and programming skills acquired so far.
- A midterm on SQL and data
- Data on military surplus distributed to U.S. counties
- U.S. Census QuickFacts
The American democratic process generates loads of interesting data and insights for us to examine, including who is financing political campaigns.
- Polling and pollsters
- Following the campaign finance money
- Competitive U.S. Senate races
With Election Day coming up, we examine the practices of polling as a way to understand various scenarios of statistical bias and error.
- Statistical significance
- Poll reliability
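One common starting point for judging poll reliability is the margin of error. The sketch below uses the standard normal-approximation formula for a proportion and assumes a simple random sample, which real polls only approximate. The sample size of 1,000 and the 50% result are hypothetical inputs, and `margin_of_error` is an illustrative helper name.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from a
    simple random sample of size n; z=1.96 corresponds to ~95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, 50% choosing one answer.
moe = margin_of_error(0.5, 1000)
print(f"+/- {moe * 100:.1f} percentage points")
```

Under these assumptions the margin is roughly three percentage points either way, so a shift smaller than that between two such polls may be nothing more than sampling noise.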
Do your on-the-ground reporting
- No class because of Election Day Coverage
While there are many tools and techniques for building data graphics, there is no magic visualization tool that will make a non-story worth telling.
- Review of the midterm
- The importance of good data in visualizations
- How visualization can augment the Serial podcast
One of the most tedious but important parts of data analysis is just cleaning and organizing the data. Being a good "data janitor" lets you spend more time on the more fun parts of journalism.
- Dirty data
Simon Rogers, data editor at Twitter, talks about his work, how Twitter reflects how communities talk to each other, and the general role of data journalism.
- Ellen, World Cup, and other masses of Twitter data
When the data doesn't directly reveal something obvious, we must consider what its structure and its metadata imply.
- Proxy variables
- Thanks Google for figuring out my commute
- How racist are we, really?
- How web sites measure us
Discussion of final projects before the Thanksgiving break.
Holiday - no class
Holiday - no class
Last-minute help on final projects.
In-class presentations of our final data projects.