Monday, March 7, 2016

For Whom Will the Michigan Mitt Swing?

Tomorrow, March 8th, Michigan, with 130 delegates, gets to vote on which of the Democratic candidates, Hillary Clinton or Bernie Sanders, should be the Democratic presidential nominee. Michigan is an important state because it represents a large number of delegates.

Michigan has also been in the news frequently this election season because of the poisoning of the water in Flint, a result of changes in how the city sources its water. Both Democratic candidates have spent considerable time in the state.

In a recent post I predicted outcomes within states based on the proportion of contributions within those states which have given to either the Sanders campaign or the Clinton campaign. Based on the donor rates within Michigan I predicted a 65% share of the vote would go to Sanders.

As Clinton has adopted a "stay the course", Obama 2.0, campaign strategy, Democrats in Michigan may be more likely to vote for her relative to Democrats in other parts of the country who have not seen the recent growth rates Michigan has experienced.
Figure 1: A map of counties supporting Sanders or Clinton. Donations are mapped to zip code level. Zoom to larger map to see donations indicated as either S for Sanders or C for Clinton. Size of letters correspond to number of donations from that zip code.
From Figure 1 we can see that by just looking at the number of contributions coming in by county we would expect Michigan to strongly support Bernie Sanders. However, population density is not well captured by county maps. We can see though that there is a strong level of support in the areas surrounding Detroit for both candidates.

Figure 2: Number of contributions in Michigan by contribution size for both candidates, Clinton and Sanders. Note that because the majority of funds are too small to itemize, these estimates underestimate the total funds contributed to Sanders by 70-80% while underestimating funds contributed to Clinton by only 10-20%.
We can see that in terms of total number of contributions, Sanders is strongly outraising Clinton in Michigan by a factor of 2 to 1. However, as has been noted previously, large/wealthy donors disproportionately back Hillary Clinton above all other candidates. When it comes to large sponsors giving more than $1000 to Clinton in Michigan, she has hundreds while Sanders has 15 (too few to appear on the figure).

Figure 3: Total itemized funds contributed by contribution size. Note that because the majority of funds are too small to itemize, these estimates underestimate the total funds contributed to Sanders by 70-80% while underestimating funds contributed to Clinton by only 10-20%.
Though these contributions represent a small proportion of the total contributions to either candidate, they do represent a large portion of the total funds contributed in Michigan. From Figure 3, we can see that those few large contributors make up a large portion of the funds donated in the state.
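The caption caveat can be made concrete with a little arithmetic. The sketch below assumes that "underestimates by 70-80%" means the itemized total is only 20-30% of the true total, so the true total is roughly the itemized total divided by one minus the unitemized share. The dollar amounts are hypothetical placeholders, not figures from the data.

```python
def estimated_total(itemized, unitemized_share):
    """Scale an itemized total up to an estimated overall total.

    Assumes itemized = (1 - unitemized_share) * total.
    """
    if not 0 <= unitemized_share < 1:
        raise ValueError("unitemized_share must be in [0, 1)")
    return itemized / (1 - unitemized_share)


# Hypothetical itemized totals in dollars, NOT figures from the post.
itemized = {"Sanders": 100_000, "Clinton": 400_000}
# Unitemized ranges taken from the figure captions.
ranges = {"Sanders": (0.70, 0.80), "Clinton": (0.10, 0.20)}

for name, (low, high) in ranges.items():
    low_total = estimated_total(itemized[name], low)
    high_total = estimated_total(itemized[name], high)
    print(f"{name}: {low_total:,.0f} to {high_total:,.0f}")
```

Under this reading, the same itemized dollar amount implies a far larger true total for Sanders than for Clinton, which is why the itemized figures flatter Clinton's share of funds.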

Who are these large donors?

Figure 4: Industrial backing of donors in Michigan.
Like the country at large, business executives and lawyers are Clinton's largest backers while health care workers, engineers, artists, academics, and the self-employed form a broad coalition of support for Sanders.

Related Articles: 
Clinton's Lack of Public Support Made up by Super-PACs
Analysis: Clinton backed by Big Money: Sanders by Small
Overwhelming Growth In National Support for Bernie Sanders Mapped
Big Business Backs Hillary: Small Bernie
Hillary 1993: Largest Drop in Girl Names EVER; Chelsea Distant Second
As First Lady, Popularity of Babies Named "Hillary" Dropped by an Unprecedented 90%
Hillary Clinton's Biggest 2016 Rival: Herself
The Simple Reason Sanders Is Winning
Cause of Death: Melanin | Evaluating Death-by-Police Data
Obama 2008 received 3x more media coverage than Sanders 2016
The Unreported War On America's Poor
What it means to be a US Veteran Today

Sunday, March 6, 2016

Clinton's Lack of Public Support Made up by Super-PACs

With only $30 million raised in February, far below the $43 million raised by her rival Bernie Sanders, Hillary Clinton is falling desperately short of public backing.

Fortunately, she has friends in high places. These friends are increasing their backing of her through the quasi-legal independent campaigning structures some of which are known as Super-PACs.

These organizations are a mixed batch, many of them working for the collective interest of special interest groups. The National Nurses United For Patient Protection Super PAC, for instance, backs Bernie Sanders, while the "League of Conservative Voters, Inc" has spent $162,115.70 supporting Hillary Clinton.

These PACs are free to support any candidate without fiscal limit, though they are legally required to act as independent agents not in contact with individual campaigns.

It is easy to understand how Super-PACs could be justified legally. If there are organizations that support particular special interests then shouldn't these organizations have the right to back whatever candidate is also supporting those positions?

However, where things get tricky is when candidates construct Super-PACs for the express purpose of skirting election laws. A famous case called Citizens United vs FEC in 2010 effectively reversed years of campaign finance reform law. An interesting note is that Citizens United Super PAC LLC has so far spent $140k supporting Clinton's campaign. (Interestingly, this same organization reports spending $512k opposing her.)

Anyway, the long and short of it is that super-wealthy donors who are prohibited from donating more than the legal limit to campaigns can set up Super-PACs in order to skirt election laws and back particular candidates. I do not know to what extent this is happening in the current campaign. However, it is important to recognize that a Super-PAC backed by a union composed of thousands of members (such as the nurse PAC supporting Sanders) is distinctly different from the typical organizations that people concerned with Super-PACs are talking about.

Related Articles: 
Overwhelming Growth In National Support for Bernie Sanders Mapped
Big Business Backs Hillary: Small Bernie
Hillary 1993: Largest Drop in Girl Names EVER; Chelsea Distant Second
As First Lady, Popularity of Babies Named "Hillary" Dropped by an Unprecedented 90%
Hillary Clinton's Biggest 2016 Rival: Herself
Analysis: Clinton backed by Big Money: Sanders by Small
Legally Rig An Election: A Citizen's Guide to Gerrymandering 
Nevada:Sanders has 6x the Supporters as Clinton
The Simple Reason Sanders Is Winning
Cause of Death: Melanin | Evaluating Death-by-Police Data
Obama 2008 received 3x more media coverage than Sanders 2016
The Unreported War On America's Poor
What it means to be a US Veteran Today

Saturday, March 5, 2016

Prediction: 64% Sanders Wins Majority of Pledged Delegates

There are many ways to predict the future. All of them have a fair degree of uncertainty. Nate Silver at FiveThirtyEight uses a measure of ethnicity and political leanings to predict how well Sanders will do in different states. This seems like a sound method to me, though it is not the only way to make predictions.

For the last month I have been playing with campaign contributions data and have seen a strong and steady increase in support for Bernie Sanders across the nation. I have mapped it county by county and the results are quite dramatic.

Yet, what does grassroots support really mean? Does it translate to votes?

In February, with only four states having voted, it was impossible to say how contributions translated to votes. But Super-Tuesday changed all of that!

With fifteen states having voted, we can now see if financial support maps to voting support.
Figure 1: This figure shows a relationship between percent of support for all times reported as of January 31st coming from that state with percent of vote (delegates when not available) coming from that state at the primary.
From Figure 1, we can see there appears to be a pretty strong relationship between percent of vote actually cast and percent of support coming in for that candidate. Let's try a formal model:
$$Vote = \beta_0 + \beta_1 ContSanders + \beta_2 Primary + \beta_3 Closed$$
Vote is the actual vote in the state. The explanatory variables are the percent contributing to Sanders (ContSanders), Primary, and Closed. Primary and Closed refer to the difference between primary vs caucus voting systems and closed vs open ones. In closed systems only registered Democrats are allowed to vote.
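For readers who want to try a model of this form, here is a minimal sketch of an ordinary least squares fit in plain Python. It is not the code behind the tables in this post (that analysis was done in R). The function names and every data row are made up for illustration, and the normal equations are solved with a small Gauss-Jordan routine to avoid any external library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, beta):
    """Share of the variance in y explained by the fitted model."""
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up rows of (ContSanders, Primary, Closed) and made-up vote shares.
rows = [(0.30, 1, 0), (0.45, 1, 1), (0.60, 0, 0), (0.70, 0, 1),
        (0.50, 1, 0), (0.35, 1, 1), (0.80, 0, 0), (0.55, 0, 1)]
vote = [0.33, 0.41, 0.62, 0.68, 0.49, 0.30, 0.85, 0.52]

beta = ols(rows, vote)
print("coefficients:", [round(b, 3) for b in beta])
print("R-squared:", round(r_squared(rows, vote, beta), 3))
```

Dropping the Primary and Closed columns from each row gives the single-predictor version of the model.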

Looking at Table 1, with 79% of the variance explained (R²), we can see that the percent of support coming in for a candidate from a state as of the end of January is a very good predictor of how the vote will go. Increasing the number of explanatory variables increases the R² to 86%.

Table 1: The regression of Vote on ContSanders is V1 while V2 and V3 allow the inclusion of the explanatory variables Primary and Closed.

sig: p < 0.001% | p < 0.01% | p < 0.01%
* Coefficient significant at 10%, ** at 1%, and *** at 0.1%

When examining the coefficient on ContSanders it is useful to reflect that while this value is statistically very different from zero, the point estimate is reasonably close to 1. This is the target number we would like if we were to directly interpret the proportion of supporters in a state as a good indicator of the proportion of the population in that state supporting Sanders. However, this direct interpretation does not hold, since most of the contributions to the Sanders campaign (72% of them) are not itemized (and thus not included in this analysis) because they are less than the FEC threshold of $200, while a much smaller number (about 12%) of those for the Clinton campaign are not itemized.

A priori I did not have any hypothesis as to how the Primary vs Caucus method was going to play out though I did expect those states with Closed voting to be less likely to vote for Sanders as he is strongly favored among independents.

From these coefficients we can now make predictions about how the rest of the states would vote if all of the states voted today (well, really March 1st, Super Tuesday). The results give a point estimate of 42% of primary locations going to Sanders, with a total expected number of pledged delegates of 1740 to Clinton's 2285. So Clinton is expected to win??

But wait! Not quite so fast!

The election will not be held tomorrow. The momentum has been strongly with Sanders and it should be expected to stay strongly with Sanders.
Figure 2: Percent of contributions going to Sanders relative to that of Clinton in the South. Interestingly, DC is the largest proportional supporter of Hillary. That is because it is the state/district which best exemplifies Hillary's primary backers, the wealthy. States with black outlines have yet to vote.
Figure 3: Percent of contributions going to Sanders relative to that of Clinton in the Northeast. States with black outlines have yet to vote.
Figure 4: Percent of contributions going to Sanders relative to that of Clinton in the Midwest. States with black outlines have yet to vote.
Figure 5: Percent of contributions going to Sanders relative to that of Clinton in the West. States with black outlines have yet to vote.

From Figures 2 through 5, we can see that support for Sanders has been steadily increasing in all areas of the country. The regions most friendly to Clinton are the South and the Northeast while those most friendly to Sanders are the West and the Midwest.

The South happens to be the region least supportive of the Sanders campaign, though it has had more votes than all other states combined. Thus we may be getting a distorted picture of how the primary season may go based on how these first few states have voted.

Suppose we fit a simple line to each state's support over time, then assume that growth in support will continue at a steady pace until that state's primary.
Figure 6: Predicted support at time of primary mapped against support at end of January. ZZ denotes Democrats Abroad.
Predicting support based on the historic rate of donations suggests that almost all states will have greater support for Sanders than they did at the end of January. States which have experienced more growth in support for Sanders or have later primaries tend to end up further to the right on the graph. The diagonal line is what would happen if there were no growth in support for Sanders over time.
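The straight-line extrapolation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for Figure 6. The day counts and support shares are hypothetical, and clamping the projection to the 0-1 range is an added safeguard that the post does not mention.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_support(days, shares, primary_day):
    """Extend the fitted trend to the primary date, clamped to [0, 1]."""
    slope, intercept = fit_line(days, shares)
    return min(1.0, max(0.0, intercept + slope * primary_day))


# Hypothetical state: Sanders's share of contributions observed at three dates,
# measured in days since an arbitrary start, with a primary on day 130.
days = [0, 30, 60, 90]
shares = [0.41, 0.44, 0.50, 0.53]
print(round(projected_support(days, shares, primary_day=130), 3))
```

A state with a later primary gets a longer extrapolation, which is why late-voting states drift furthest from the diagonal.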

Using these new expected support levels at the time of the primaries that have already happened we can fit a new model.

Table 2: This table shows the results of using expected proportion of Sanders supporters as predictors for election results rather than actual support as of the end of January.
sig: p < 0.001% | p < 0.01% | p < 0.01%

From Table 2, we can see that using predicted Sanders support rather than the support last observed at the end of January gives us a slightly higher R². However, those of us familiar with estimation will immediately realize that we have introduced a new level of uncertainty into the data. This is because we are using an estimated value to estimate yet another value.

Ignoring estimation uncertainty, using the best fit model I predict Sanders will get 57% of the pledged delegates. However, point estimates in statistics are almost never true. In order to estimate the error in the process I simulate randomly sampling from the distribution of possible coefficients for the Intercept, ContSanderP, Primary, and Closed coefficients and predict delegate distribution. 72% of the time Sanders is expected to get the majority of pledged delegates.

Yet, I have ignored the error in estimating the support for Sanders. Rather than doing something more complicated, I increase the standard error on all coefficients by a factor of 150% and simulate the delegate distribution again. Under this scenario I predict Sanders will take the majority of the pledged delegates 64% of the time. (Note that the more you increase the standard error, the more watered down the predictions become, until all you have is a 50-50 chance of Sanders winning.)
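The simulation idea can be sketched as follows. Several things here are assumptions rather than the actual procedure: coefficients are drawn from independent normal distributions, "a factor of 150%" is read as multiplying each standard error by 1.5, delegates are split in proportion to the predicted vote, and every number in the example is hypothetical.

```python
import random


def simulate_majority(coefs, std_errs, states, already_won, total_delegates,
                      se_scale=1.0, draws=10_000, seed=0):
    """Share of draws in which the candidate wins a majority of pledged delegates.

    coefs / std_errs: intercept followed by one entry per predictor.
    states: list of (predictors, delegates) for states yet to vote.
    Coefficients are drawn from independent normals (an assumption).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        beta = [rng.gauss(b, se * se_scale) for b, se in zip(coefs, std_errs)]
        delegates = already_won
        for predictors, n_delegates in states:
            share = beta[0] + sum(b * x for b, x in zip(beta[1:], predictors))
            share = min(1.0, max(0.0, share))  # keep the vote share valid
            delegates += share * n_delegates   # proportional allocation
        if delegates > total_delegates / 2:
            wins += 1
    return wins / draws


# Hypothetical coefficients (Intercept, ContSanderP, Primary, Closed) and standard errors.
coefs = [0.10, 0.90, -0.05, -0.04]
std_errs = [0.05, 0.10, 0.03, 0.03]
# Hypothetical remaining states: ((ContSanderP, Primary, Closed), pledged delegates).
remaining = [((0.55, 1, 0), 130), ((0.60, 0, 1), 90), ((0.45, 1, 1), 250)]

for scale in (1.0, 1.5):
    p = simulate_majority(coefs, std_errs, remaining, already_won=280,
                          total_delegates=1000, se_scale=scale)
    print(f"SE x {scale}: majority in {p:.0%} of draws")
```

Raising `se_scale` spreads the draws out, so the win probability drifts toward 50%, which matches the parenthetical note above.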

It will come as no surprise to anybody that I am an avid Bernie Sanders supporter. The level of corruption and deceit that seem endemic to the Clinton campaign combined with the consistent upright behavior and spot-on messages of the Sanders campaign makes my endorsement of Sanders very easy. I might have considered supporting one of the Republican candidates, however Trump seems to be cleaning house.

It might come as some surprise that a self-described economist would openly support Sanders. However, the exaggerated claims that "economists" are opposed to Sanders do not add up when you look at the actual fiscal support Sanders has received from economists. As of the end of January, Sanders has logged 155 contributions from economists compared with Clinton's 189. That is to say, 45% of contributions made to either campaign from economists have gone to the Sanders campaign.

So it was frustrating for me to see that Sanders seemed to be already getting behind in the pledged delegates for these first primary states. However, a few nights ago I built the models and crunched the numbers and was much relieved to find that Sanders was predicted not only to do well, but to win the popular vote and the majority of pledged delegates!

I know there is much uncertainty in any kind of prediction, especially in an election season as surprising as this one. Thus, I caution against reading too much into this prediction or really any predictions that are coming out. Frankly, this model using contribution data seems to fit the data remarkably well and the results are encouraging. But even if the model predicted Clinton would take the pledged delegates and the popular vote, I would also strongly caution against reading too much into such predictions.

Only 15 states have voted representing only 25% of the pledged delegates. All the while those who learn about Sanders seem to like him more and more while for Clinton the phenomenon seems to be going the other way.

As for my code, I am happy to release it; however, it is not in good condition right now. I will have to revise it for public posting. That might take a few days, but I wanted to get this out right now.

End Note:
I could imagine someone saying, "What do pledged delegates matter anyways since so many delegates are determined by unpledged or super-delegates?"

I can tell you this for certain: if Clinton does not win by taking the majority of pledged delegates but rather through internal party politics, then it is highly unlikely the majority of Sanders supporters are going to support her in the general election. Already, many of us are put off by the games the DNC has been playing, first by restricting the debate schedule so as to minimize air time of Democratic challengers to Clinton, then by trying to cut Sanders off from access to the voter database.

This behavior, coupled with ongoing ethical and potential legal violations by the Clinton campaign, has given Sanders supporters a very strong dislike for underhanded tricks. Using party insiders to win the nomination against the will of the electorate would be seen as intolerable.

Just for fun I have included the following list of predicted outcomes and actual outcomes as well as the predicted number of pledged delegates if all pledged delegates were distributed proportionately to the vote. Notice that the predicted outcomes for any given state can vary quite significantly. However, it is only across states that we hope to come up with a cumulative expected outcome that may be reasonable.

State | Primary Date | Prediction | TRUE | Pledged Sanders | Pledged Clinton | N
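The "Pledged Sanders" and "Pledged Clinton" columns assume delegates are handed out in proportion to the two-way vote. A minimal sketch of that simplification is below. The function name is hypothetical, and the actual party allocation rules are more involved than a simple proportional split. The example reuses Michigan's 130 delegates and the 65% share predicted in the Michigan post.

```python
def split_delegates(sanders_share, delegates):
    """Split a state's pledged delegates in proportion to the two-way vote share."""
    sanders = round(sanders_share * delegates)
    return sanders, delegates - sanders


sanders, clinton = split_delegates(0.65, 130)
print(f"Sanders {sanders}, Clinton {clinton}")
```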

Wednesday, March 2, 2016

"To Pie or Not To Pie" That is the question! Graph theory

In several recent posts I have attempted to convey the nature of how the current primary season is funded (on the Democratic side). In order to assist in conveying this information I have employed several different analytical angles and graphical strategies, all generated in my favorite statistical package, R. These graphs have included histograms, maps, bar-plots, box-plots, and yes, dare I say it, pie charts.

I wrote my most recent post and I was surprised to find that despite its inflammatory content, the only comments I received on it were criticizing my use of pie charts.

One article linked from the comments opened, "The pie chart is easily the worst way to convey information ever developed in the history of data visualization."

The article went on to list some very reasonable points as to why pie charts are not an effective method of conveying information. It does mention that there is a slight benefit when comparing large differences because "their only real use is to let people know what a fraction looks like."

But is this true?

The article states that charts are used because:
- Charts are a way to take information and make it more understandable.
- In general, the point of charts are to make it easier to compare different sets of data.
- The more information a chart is able to convey without increasing complexity, the better.

All of these points are great but fail to capture the two primary reasons I use charts:
- Stimulate interest in the reader.
- Provide a visual aid by which readers can understand and take away key information.

So with these graphing objectives in mind, let's look at the following graphs, all produced from the same data.

Figure 1: Campaign finance pie chart. Post Code
Figure 2: Campaign finance histogram chart. Post Code
Figure 3: Campaign finance map. Post Code (not yet provided)
Figure 4: Campaign finance barplot. Post Code
Figure 5: Itemized contribution size over time, boxplot. Post Code
Figure 6: Cumulative contribution over time. Notice that the steep jumps in the Clinton campaign reflect the effect of large donors while the smoothness in the Sanders campaign reflects the flow of numerous small donors. Post Code
A keen eye will immediately notice that all except the first figure are generated using ggplot2, my favorite R graphing package. ggplot2 goes out of its way not to provide a pie chart rendering tool, as its developers strongly discourage its use. There is a bit of a workaround using polar coordinates and bar plots, which I decided not to use.

From looking at all six figures we can see that each of them is clearly trying to communicate the same information in a different way. Figures 1 and 2 are concerned about size of contributions, while Figure 3 provides geographic mapping of the number of contributions. Figure 4 reorganizes the information by industry category rather than contribution size while Figures 5 and 6 are more concerned with how donations change over time.

Now, looking over these figures, I have to ask: which of them even comes close to conveying the same information as effectively as the pie charts in Figure 1?

The histogram, Figure 2, provides almost the same information, yet you have to spend a considerable amount of effort looking at the figure and then do some mental math, multiplying size of donation by frequency of donation, in order to come up with values that almost resemble Figure 1.
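The mental math in question is just count times size, summed within each bucket. Here is a small sketch with a made-up list of contributions. The bucket edges of $200 and $1000 mirror the thresholds used elsewhere on this blog, and the function name is hypothetical.

```python
def funds_share_by_bucket(contributions, edges):
    """Return (count shares, dollar shares) per bucket.

    Bucket i covers edges[i] <= amount < edges[i + 1]; the last bucket is
    open-ended. Assumes edges[0] == 0 and non-negative amounts.
    """
    counts = [0] * len(edges)
    totals = [0.0] * len(edges)
    for amount in contributions:
        # Index of the last edge that is <= amount.
        idx = max(i for i, e in enumerate(edges) if amount >= e)
        counts[idx] += 1
        totals[idx] += amount
    n = sum(counts)
    grand_total = sum(totals)
    return [c / n for c in counts], [t / grand_total for t in totals]


# Made-up campaign: many small gifts, a handful of maximum-size gifts.
contributions = [25] * 80 + [250] * 15 + [2700] * 5
count_share, dollar_share = funds_share_by_bucket(contributions, [0, 200, 1000])
for label, c, d in zip(["under $200", "$200 to $999", "$1000 or more"],
                       count_share, dollar_share):
    print(f"{label}: {c:.0%} of contributions, {d:.0%} of funds")
```

A histogram shows the first column of numbers; the pie chart shows the second.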

I could instead generate a density map to attempt to convey the same information.
Figure 7: Density curve of campaign contribution size. Code
Yet this does not capture the information I would like to convey (Figure 7). From this graph you may mistakenly assume that for the Clinton campaign small contributions are more important than large ones. However, this is not the case as we know from Figure 1. The problem with a density graph like this is that it is measuring the density which is the number of contributions. This does not reflect in any obvious way how important those contributions are.
Figure 8: Contribution size/importance plot. This is the same plot as a density plot (Figure 7) but rather than counting the number of contributions at each amount it calculates the total value of those contributions. Code

We get much closer to the information I am attempting to describe in Figure 1 with Figure 8. Figure 8 shows us that there are certain peak quantities most frequently donated with the two different campaigns. One quantity is around the $2700 mark for the Clinton campaign (the maximum allowable without using Super-PACs) while the other is the less than $100 area for the Sanders campaign.

Looking at Figure 8 we can gather basically the same information as that of the pie-chart. Maybe a little more as we can see that there are certain peak values (200,500,1000,2000,2700) which are more likely donor values. Yet, I would argue that this information is not really important. It might even be a distraction to the main point of the original post (FALSE: Clinton Funded by "Grassroots").

Not only is the information potentially a distraction, but it requires additional analysis on the part of the reader to figure out what information the chart is trying to convey. A pie-chart, on the other hand, is an amazingly simple chart that anybody who has familiarity with pies or charts can easily read and understand when comparing large differences in proportions. Thus readers can at a glance get a full and easy-to-remember understanding of the information that is being transmitted.


Here we have it! One pie-chart that efficiently conveys certain types of information against seven other figures which struggle to convey the same information as what the pie-chart easily conveys.

My final suggestion therefore is that people start thinking more about what they are attempting to communicate with their charts and less about what the chart gurus are telling us to do.

Building effective graphics is like writing effective prose. Know what you want to say and figure out the easiest and most straightforward way of saying it, period.

Tuesday, March 1, 2016

FALSE: Clinton Funded by "Grassroots"

The blatant distortions of reality put forth by the Clinton campaign are so offensive as to be laughable at times. In the victory speech of Hillary Clinton in South Carolina she spent a significant portion of it talking about how her campaign is financed by "grassroots".

Well, looking at the breakdown of funding for her campaign, only about 12% of her funds are from individuals contributing less than $200 while the vast majority of her funding (77%) is from individuals contributing $1000 or more.

If you are going to tell me that a movement 77% funded by people giving $1000 or more is "grassroots", I am going to have to ask, "what grass are you smoking?" The only way you can call such a top-heavy movement "grassroots" is if you are growing grass in the Koch brothers' back yard!

Obviously some small portion of the Clinton campaign is funded by small donors. However, for the campaign to misrepresent itself as "grassroots" powered by "small-donors" is frankly a complete falsehood especially when compared with a true grass roots funded campaign.

From the second graph we can see what a true "grassroots" campaign looks like. This is the Sanders campaign which has received only 10% of its total funds from individuals giving $1000 or more and 72% of its funds from people giving less than $200.

You might think that the Clinton campaign only looks bad when compared with publicly backed campaigns such as that of Bernie Sanders. However, this is not the case either. As I have noted previously, the Clinton campaign has far more big sponsors than all other candidates currently campaigning combined. And this is not counting the millions of dollars paid into the Clinton Super-PAC.

But don't take my word for it. Run the analysis yourself.

Related Articles: 
Clinton Many More Rich Supporters Than All Other Candidates Combined
Big Business Backs Hillary: Small Bernie
Analysis: Clinton backed by Big Money: Sanders by Small
Overwhelming Growth In National Support for Bernie Sanders Mapped
Hillary 1993: Largest Drop in Girl Names EVER; Chelsea Distant Second
As First Lady, Popularity of Babies Named "Hillary" Dropped by an Unprecedented 90%
Hillary Clinton's Biggest 2016 Rival: Herself
Legally Rig An Election: A Citizen's Guide to Gerrymandering 
Nevada:Sanders has 6x the Supporters as Clinton
The Simple Reason Sanders Is Winning
Cause of Death: Melanin | Evaluating Death-by-Police Data
Obama 2008 received 3x more media coverage than Sanders 2016
The Unreported War On America's Poor
What it means to be a US Veteran Today