## Wednesday, March 2, 2016

### "To Pie or Not To Pie" That is the question! Graph theory

In several recent posts I have tried to convey how the current primary season is funded (on the Democratic side). To do so I have employed several different analytical angles and graphical strategies, all generated in my favorite statistical package, R. These graphs have included histograms, maps, bar plots, box plots, and yes, dare I say it, pie charts.

After writing my most recent post, I was surprised to find that, despite its inflammatory content, the only comments it received criticized my use of pie charts.

One article linked in a comment opened, "The pie chart is easily the worst way to convey information ever developed in the history of data visualization."

The article went on to list some very reasonable arguments for why pie charts are not an effective method of conveying information. It does concede a slight benefit when comparing large differences, because "their only real use is to let people know what a fraction looks like."

But is this true?

The article states that charts are used because:
- Charts are a way to take information and make it more understandable.
- In general, the point of charts are to make it easier to compare different sets of data.
- The more information a chart is able to convey without increasing complexity, the better.

All of these points are great but fail to capture the two primary reasons I use charts:
- Stimulate interest in the reader.
- Provide a visual aid by which readers can understand and take away key information.

So with these graphing objectives in mind, let's look at the following graphs, all produced from the same data.

 Figure 1: Campaign finance pie chart. Post Code
 Figure 2: Campaign finance histogram chart. Post Code
 Figure 3: Campaign finance map. Post Code (not yet provided)
 Figure 4: Campaign finance barplot. Post Code
 Figure 5: Itemized contribution size over time, boxplot. Post Code
 Figure 6: Cumulative contribution over time. Notice that the steep jumps in the Clinton campaign's curve reflect the effect of large donors, while the smoothness of the Sanders campaign's curve reflects the flow of numerous small donors. Post Code
A keen eye will immediately notice that all except the first figure are generated using ggplot2, my favorite R graphing package. ggplot2 goes out of its way not to provide a pie chart rendering tool, as its authors strongly discourage the use of pie charts. There is a bit of a workaround using polar coordinates and bar plots, which I decided not to use.
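
The polar-coordinates workaround amounts to drawing a single stacked bar and bending it around a circle, so that each segment's share of the bar becomes its share of 360 degrees. Below is a minimal sketch of that mapping, written in Python rather than the R used for the figures. The bucket names and dollar totals are hypothetical and are not the campaign data.

```python
from itertools import accumulate

# Hypothetical bucket totals in dollars; NOT the campaign data from the figures.
totals = {"under $200": 500.0, "$200 to $999": 300.0, "$1,000 and up": 200.0}

def stacked_bar_to_wedges(totals):
    """Map the segments of one stacked bar onto pie wedges.

    Returns {label: (start_degrees, end_degrees)}. Each wedge spans the same
    fraction of 360 degrees that its segment spans of the bar's full height.
    """
    grand_total = sum(totals.values())
    # Running sums give the segment boundaries along the stacked bar.
    bounds = [0.0] + list(accumulate(totals.values()))
    wedges = {}
    for label, lo, hi in zip(totals, bounds, bounds[1:]):
        wedges[label] = (360.0 * lo / grand_total, 360.0 * hi / grand_total)
    return wedges

wedges = stacked_bar_to_wedges(totals)
for label, (start, end) in wedges.items():
    print(f"{label}: {start:.0f} to {end:.0f} degrees")
```

Seen this way, a pie chart and a stacked bar carry exactly the same numbers; only the geometry used to display them differs.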

Looking at all six figures, we can see that each presents the same underlying data from a different angle. Figures 1 and 2 are concerned with the size of contributions, while Figure 3 provides a geographic mapping of the number of contributions. Figure 4 reorganizes the information by industry category rather than contribution size, while Figures 5 and 6 are more concerned with how donations change over time.

Now, looking over these figures, I have to ask: which of them even comes close to conveying the same information as effectively as the pie charts in Figure 1?

The histogram, Figure 2, provides almost the same information, yet you have to spend considerable effort studying the figure and then do some mental math, multiplying the size of each donation by its frequency, to come up with values that roughly resemble Figure 1.
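
To make that mental math concrete, here is a minimal sketch (in Python, with invented contribution amounts and assumed bin edges, not the campaign data) that computes both quantities per bin: the share of donations, which is what a histogram bar shows, and the share of dollars, which is what a pie slice of money raised shows.

```python
# Hypothetical itemized contributions in dollars; invented for illustration.
contributions = [25, 25, 50, 50, 50, 100, 2700, 2700]

# Bin edges are an assumption too: each bin is [lower, upper) in dollars.
bins = [(0, 200), (200, 1000), (1000, 2701)]

def summarize(contributions, bins):
    """Return one (count_share, dollar_share) pair per bin.

    count_share is the fraction of donations falling in the bin;
    dollar_share is the fraction of total money that those donations supply.
    """
    n = len(contributions)
    total = sum(contributions)
    shares = []
    for lower, upper in bins:
        in_bin = [c for c in contributions if lower <= c < upper]
        shares.append((len(in_bin) / n, sum(in_bin) / total))
    return shares

for (lower, upper), (count_share, dollar_share) in zip(bins, summarize(contributions, bins)):
    print(f"${lower} to ${upper - 1}: "
          f"{count_share:.0%} of donations, {dollar_share:.0%} of dollars")
```

In this made-up sample, three quarters of the donations fall in the smallest bin yet supply only about 5% of the money, which is the gap the reader of a histogram has to close in their head.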

I could instead generate a density plot to try to convey the same information.
 Figure 7: Density curve of campaign contribution size. Code
Yet this does not capture the information I would like to convey (Figure 7). From this graph you may mistakenly assume that for the Clinton campaign small contributions are more important than large ones. However, this is not the case, as we know from Figure 1. The problem with a density graph like this is that it measures the density of the number of contributions. It does not reflect in any obvious way how important those contributions are.
 Figure 8: Contribution size/importance plot. This is the same plot as the density plot (Figure 7), but rather than counting the number of contributions at each amount it calculates the total value of those contributions. Code
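
The difference between Figures 7 and 8 comes down to the weight given to each contribution. The following is a minimal sketch of a weighted Gaussian kernel estimate, written in Python; the amounts, the bandwidth, and the function name `kernel_curve` are all assumptions for illustration and are not the code or data behind the figures.

```python
import math

def kernel_curve(amounts, weights, grid, bandwidth):
    """Weighted Gaussian kernel estimate evaluated at each grid point.

    With a weight of 1 per contribution this is an ordinary density curve
    (how many donations sit near each amount). With weights equal to the
    dollar amounts, the height instead reflects money raised near each amount.
    """
    norm = sum(weights) * bandwidth * math.sqrt(2 * math.pi)
    curve = []
    for x in grid:
        height = sum(w * math.exp(-0.5 * ((x - a) / bandwidth) ** 2)
                     for a, w in zip(amounts, weights))
        curve.append(height / norm)
    return curve

# Hypothetical contributions in dollars; invented for illustration.
amounts = [25, 25, 50, 50, 50, 100, 2700, 2700]
grid = [50, 2700]  # evaluate near the small and the large donations

by_count = kernel_curve(amounts, [1] * len(amounts), grid, bandwidth=100)
by_value = kernel_curve(amounts, amounts, grid, bandwidth=100)

print("weighted by count:", by_count)
print("weighted by value:", by_value)
```

With these invented numbers the count-weighted curve is higher near $50, while the value-weighted curve is higher near $2,700, which mirrors the way the same data can tell opposite stories depending on the weighting.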

With Figure 8 we get much closer to the information I am attempting to describe in Figure 1. Figure 8 shows us that certain peak amounts stand out for the two campaigns. One is around the $2,700 mark for the Clinton campaign (the maximum allowable without using Super-PACs), while the other is the less than $100 area for the Sanders campaign.

Looking at Figure 8, we can gather basically the same information as from the pie chart. Maybe a little more, as we can see that there are certain peak values ($200, $500, $1,000, $2,000, $2,700) which are more likely donation amounts. Yet I would argue that this information is not really important. It might even be a distraction from the main point of the original post (FALSE: Clinton Funded by "Grassroots").

Not only is the information potentially a distraction, but it requires additional analysis on the part of the reader to figure out what the chart is trying to convey. A pie chart, on the other hand, is an amazingly simple chart that anybody familiar with pies or charts can easily read and understand when comparing large differences in proportions. Thus readers can at a glance get a full and easy-to-remember understanding of the information being transmitted.

Conclusion

Here we have it! One pie chart that efficiently conveys certain types of information, against seven other figures that struggle to convey what the pie chart conveys easily.

My final suggestion therefore is that people start thinking more about what they are attempting to communicate with their charts and less about what the chart gurus are telling us to do.

Building effective graphics is like writing effective prose. Know what you want to say and figure out the easiest and most straightforward way of saying it, period.

1. You're comparing apples and oranges!

You should have compared the pie chart vs. a bar chart using the exact same data. The histogram does not have the same grouping and isn't even doing the same thing. It's comparing counts, whereas the pie chart is comparing sums.

1. Stupid me. I did not think of that.

2. Here's some research on the subject:

Judgments of Change and Proportion in Graphical Perception
J. G. Hollands
University of Toronto, Toronto, Ontario, Canada
Ian Spence
University of Toronto, Toronto, Ontario, Canada

Abstract

Subjects judged change and proportion when viewing graphs in two experiments. Change was judged more quickly and accurately with line and bar graphs than with pie charts or tiered bar graphs, and this difference was larger when the rate of change was smaller. Without a graduated scale, proportion was judged more quickly and accurately with pie charts and divided bar graphs than with line or bar graphs. Perception is direct when it requires simpler or fewer mental operations; we propose that perception of change is direct with line and bar graphs, whereas perception of proportion is direct with pie charts and divided bar graphs. The results are also consistent with the proximity compatibility principle. Suggestions for improving the design of graphical displays are given.

3. It's best to keep everything else equal when comparing different types of charts. For example, a stacked bar chart would have conveyed the same message more clearly for figure 1.

4. A classic and well-cited study is Cleveland and McGill (1984): https://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/cleveland.pdf