Saturday, February 13, 2016

The Simple Reason Sanders Is Winning

Sanders has way more backers across the United States (with the possible exception of the South).

Hillary Clinton might be doing well in the polls. However, polling faces a shocking problem: only 8-9% of those asked to participate actually respond, and most polls are conducted over landlines, so the populations being polled no longer represent the voting population.

The result of these two factors is what we saw in the Iowa and New Hampshire primaries: even the best estimates were off by a significant margin. Given the growing inability to execute effective polls, we must look to other sources of data. Some have looked at search results, Twitter posts, Facebook posts, and so on. These sources of information seem useful, though they are difficult to interpret.

A much better indicator of campaign health, I would suggest, is a candidate's ability to inspire a wide and diverse base of supporters. From my last post, Analysis: Clinton backed by Big Money: Sanders by Small, it is clear from the data filed with the Federal Election Commission that Sanders has a massive number of small contributors relative to Clinton's comparatively small number of large ones.

The implications of this information are initially unclear. Obviously, any candidate would want more supporters. However, the distribution of supporters is important. Perhaps all of Sanders' supporters are in the Northeast around Vermont and New Hampshire, and his message is not being picked up by the rest of the country.

Before jumping into the maps, a caveat: because of the immense number of small contributions to the Sanders campaign, we are missing contributor information for 74% of his contributions, compared with only 14% for the Clinton campaign.

In the following maps I count how many contributors have given to each campaign in each county (the administrative unit below a state, much like a municipality or district in other countries) of the contiguous United States (apologies to Alaska, Hawaii, and the territories). A county is scored from 0 to 1: a 0 means everyone who contributed to a campaign contributed to the Clinton campaign, and a 1 means everyone contributed to the Sanders campaign. Values in between indicate the proportion of total contributions that went to the Sanders campaign. Counties without recorded contributions are left out.
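The county score described above is just a share of contributors; a minimal sketch in plain Python, using hypothetical toy records in place of the actual FEC data:

```python
from collections import defaultdict

# Hypothetical minimal records: (county, candidate) per itemized contributor.
contributions = [
    ("Cook, IL", "Sanders"), ("Cook, IL", "Clinton"),
    ("Cook, IL", "Sanders"), ("Travis, TX", "Clinton"),
]

counts = defaultdict(lambda: {"Sanders": 0, "Clinton": 0})
for county, candidate in contributions:
    counts[county][candidate] += 1

# Score: 0 = all Clinton, 1 = all Sanders; counties with no recorded
# contributions are simply absent from `counts` and thus from the map.
ratios = {
    county: c["Sanders"] / (c["Sanders"] + c["Clinton"])
    for county, c in counts.items()
}
print(ratios)
```

With the toy records above, Cook scores 2/3 (majority Sanders) and Travis scores 0 (all Clinton).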

In April, very few people knew who Bernie Sanders was. Hillary Clinton, however, was well known and had people across the country contributing to her. From the map we can see that in 82% of counties with contributors, the majority contributed to Hillary.

As early as May 2015, we can see that Sanders is rapidly closing the contributor gap, knocking the share of counties in which Clinton leads from 82% down to 62%.
In June, more of the same: Sanders gains a significant foothold in California and New England.
We can see that even as Sanders closes in on Hillary's lead, more US counties start participating in the process.
By as early as August, the gap between Clinton and Sanders in the average contributor share across counties is down to only 6 points.
And into September, Sanders has taken the lead in contributing counties across the country.
Through September, Sanders does not gain ground, but Clinton does not lose any either.
But going into November, whatever gains Clinton had made are lost as an increasingly large portion of counties start contributing.
By the end of 2015, even with 74% of Bernie's contributions unrecorded compared with only 14% of Hillary's, Bernie has people committed enough to contribute to his campaign from across the US that 67% of counties favor him, compared to 31% that favor Hillary.

So what? Does this really matter?

Given that for every one Bernie supporter showing up in the data, there are approximately three contributors who do not, this is a huge margin of supporters willing to give up their personal resources to support the Sanders presidential bid.

Yet even these numbers only run through the end of December. January was the biggest month on record so far, with Bernie Sanders out-raising Hillary for the first time. Once those contributions are reported to the FEC, I am certain a much bigger chunk of the map will be blue.

Despite Bernie mobilizing contributors from all around the country, you may still be unconvinced of his significant, and otherwise difficult to observe, edge over Hillary.

If so, scroll through these maps one more time and get a feel for the growing number and diversity of supporters rallying across the country. The momentum is with Bernie Sanders. Unless something dramatic and unexpected happens, Sanders is going to continue to dominate the primaries.

Related Posts:
Analysis: Clinton backed by Big Money: Sanders by Small
As First Lady, Popularity of Babies Named "Hillary" Dropped by an Unprecedented 90%
Hillary Clinton's Biggest 2016 Rival: Herself
Cause of Death: Melanin | Evaluating Death-by-Police Data
Obama 2008 received 3x more media coverage than Sanders 2016
The Unreported War On America's Poor
What it means to be a US Veteran Today

Tuesday, February 9, 2016

Analysis: Clinton backed by Big Money: Sanders by Small

This article examines FEC data in depth and confirms what most people already suspect: Hillary Clinton's presidential bid is financed largely by a relatively small number of big donors, while Bernie Sanders' bid is funded by numerous small donors.

For our analysis, we look at four hundred thousand itemized contributions reported to the FEC at the end of 2015. Contributions are itemized only once an individual has donated $200 or more, so we lack individualized information for the many people who give less than that. For Hillary Clinton, about 16% of contributions are not itemized; for Bernie Sanders, 74% are not.
Figure 1: The histogram captures the distribution of contributions. Note the x axis is scaled by log10, so the same distance exists between 1 and 10 as between 10 and 100, or between 100 and 1,000.
From Figure 1, we can see that throughout all of 2015 Clinton had vastly more large contributors than Sanders, with over 20,000 campaign contributors giving the maximum contribution of $2,700*. Clinton also has a larger number of mid-sized donors, with another 20,000 giving between $500 and $2,700.

Conversely, for smaller donations, Sanders has many more contributors than Clinton: nearly 35,000 contributions at $100 compared with Clinton's 23,000. At $50, Sanders does more than twice as well, with over 40,000 contributions compared with Clinton's 20,000. The difference is even more stark at $10, where Sanders received nearly 40,000 donations compared with Hillary's 12,000.

*There are some ways to avoid the legal contribution limits, as discussed in this NPR article.
Figure 2: A series of box plots comparing contributions over time for 2015. The horizontal dotted line is at the individual maximum of $2,700. There is also a maximum of $5,000 that can be contributed to a PAC. Many of the contributions that exceed $2,700 end up being partially refunded to the contributor because they exceed the legal limit. Outliers are indicated with 'x's. Note the y axis is on the log10 scale. Looking at November, we can see two significant outliers from the "Hillary Victory Fund" at 1.6 and 1.8 million respectively. These are reported as unitemized, which seems rather unique.
From Figure 2 we can see some pretty shocking facts about the nature of Clinton's contributions early in her bid. In April, May, and almost into June, the upper quartile (top 25%) of her contributions was at or above the legal maximum. This is vastly different from Sanders, who had a handful of contributions at or near the legal maximum but nothing close to Clinton's numbers. Overall, the difference between the two in April and May could not be more stark, with Sanders' upper quartile at or below Clinton's median for nearly all of the months observed.

Overall, we can see significant movement in the size of donations over time. For both Clinton and Sanders, there is a bit of a race to the bottom. This is driven partly by the nature of the reporting laws: contributions are not itemized until an individual has given at least $200, but after that, all of their contributions are reported. Thus many of the smaller contributions appear in the data as repeat contributors keep donating.

Figure 3: The distribution of contributions by candidate. This figure is the same as Figure 2 except the y axis is not scaled by log10 and the upper limit is set at the legal maximum of $2,700. Outliers above 1.5x the interquartile range have also been removed.
From Figure 3, we can see that the difference in the nature of contributions by candidate is vast, with almost all of the contributions to the Sanders campaign below $500. For the first two months, over half of the itemized contributions to the Clinton campaign were $500 or more. Over time, the average size of contributions decreased for both, though much faster for the Sanders campaign.

From Figures 2 and 3, we might be concerned that Sanders' campaign is not capable of raising sufficient funds to compete with the Clinton campaign. However, this forgets that Sanders has many, many more contributors than Clinton. To estimate the number of contributions that are given but not itemized, I take the unitemized dollar total reported each quarter and assume those contributions average $30 (probably a high estimate).
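The estimate is just the reported unitemized dollars divided by the assumed $30 average gift; a sketch using the totals from the FEC reports discussed here:

```python
# Rough estimate: unitemized dollars / assumed average gift of $30.
AVG_GIFT = 30  # assumed average small contribution, probably a high estimate

# FEC-reported unitemized totals by quarterly report.
unitemized = {
    "Clinton": {"July": 8_098_571, "October": 5_193_811, "December": 5_707_408},
    "Sanders": {"July": 10_465_912, "October": 20_187_064, "December": 23_421_034},
}

est_counts = {
    cand: {period: dollars // AVG_GIFT for period, dollars in reports.items()}
    for cand, reports in unitemized.items()
}
print(est_counts["Sanders"]["December"])  # 780701 estimated small contributions
```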

Table 1: Total not-itemized contributions by quarter. # of Contributions assumes each contribution averages $30.

Clinton                      $ Not-Itemized    # of Contributions
First Report (July)          $8,098,571           269,952
Fall Report (October)        $5,193,811           173,127
Year End Report (December)   $5,707,408           190,247

Sanders                      $ Not-Itemized    # of Contributions
First Report (July)          $10,465,912          348,864
Fall Report (October)        $20,187,064          672,902
Year End Report (December)   $23,421,034          780,701

From Table 1, we can see that while Clinton initially reported nearly as many small contributions as Sanders, those contributions have since fallen off, while Sanders' small contributions have significantly increased, outpacing Clinton's by four to one.

Figure 4: Total number of contributions over time and the difference between the two.
Smoothing the number of small contributors across the months of campaigning, we end up with Figure 4, which quickly shows how vast the difference is between the number of contributors to the Sanders and Clinton campaigns.

Initially, Clinton enters the race a little earlier, banking a quarter million contributions. However, once Sanders enters, he quickly gains support, with his total number of contributions matching Clinton's by June 5th and continuing to grow. By September 20th, Sanders has already collected twice the number of contributions that Clinton has.

So how does this map to total contributions collected over time? We already know that Clinton has a large number of big donors on her side.
Figure 5: Total quantity of dollars contributed over time.
From Figure 5, we can see that Clinton got an early and big hand up from large money: as of July, she had raised just over $30 million more than Sanders. However, despite her continuing large contributions, this gap has barely grown since; as of the end of the year, it stands at only a little more than $35 million in individual contributions.

Overall, this is an AMAZING fact. Somehow, despite having the majority of wealthy democratic donors in her corner, Clinton has failed to out-raise Sanders since July!

Not only that, but Hillary is in a difficult position: many of her largest donors have already maxed out their ability to legally contribute to her campaign, while very few of Sanders' contributors have come close to maxing out theirs. Of course, there are always the dubiously legal contributions to candidate Super PACs made legal by the infamous "Citizens United" Supreme Court ruling.

However, as Sanders has campaigned against Super PACs and Hillary is attempting to win over his supporters, it will certainly be interesting to see how fundraising changes moving forward, as she risks being hamstrung by her narrow but affluent base.

Source Code on GitHub

Related Posts:
As First Lady, Popularity of Babies Named "Hillary" Dropped by an Unprecedented 90%
Hillary Clinton's Biggest 2016 Rival: Herself
Cause of Death: Melanin | Evaluating Death-by-Police Data
Obama 2008 received 3x more media coverage than Sanders 2016
The Unreported War On America's Poor
What it means to be a US Veteran Today

Monday, February 1, 2016

As First Lady, Popularity of Babies Named "Hillary" Dropped by an Unprecedented 90%

In this article I examine the dramatic drop in the popularity of naming babies "Hillary" beginning at the start of President Bill Clinton's term. In order to understand the context of that drop, I look at the popularity of the first names of other First Ladies starting with Thelma (Pat) Nixon back in 1969.

The Rise and Fall of the Name Jennifer

Hillary Clinton's Biggest 2016 Rival: Herself
Obama 2008 received 3x more media coverage than Sanders 2016

In the article "Poor, Poor Hillary", the author notes that the popularity of naming babies Hillary fell dramatically during her tenure as First Lady. I found this an interesting idea and decided to expand upon it to see whether a fall in popularity is typical of First Ladies. Looking back at the last eight, I found that all of them were associated with a drop in the popularity of their first name.

Table 1: Drop in popularity of each first name, as the ratio of frequency at the end of the term divided by frequency the year before the beginning of the term.


From Table 1, we can see that the name "Hillary" dropped the most in popularity over her term as First Lady, by a whopping 90%, followed by Laura (Bush) at 62% and, distantly, Michelle (Obama) at 48%. Nancy retained the most popularity, falling only 16% while First Lady.

Figure 1: Change in popularity of baby names superimposed over terms. The popularity has been scaled so that 1 is the peak popularity of the name over the years as First Lady (or the one year prior). Any values above 1 have been scaled to 3% of their original size.

When we look at Figure 1, we can see that naming popularity seems to be heavily affected by First Ladies. Most names experienced a steady downward trend in popularity. The name "Rosalynn" is an exception: it peaked in popularity during the Carter administration, fell, and by the end had risen in popularity once again.

The name "Hillary" is very unusual in this pattern: unlike most names, it was growing rapidly in popularity prior to the Clinton administration. Early in the Clinton administration, however, its popularity dropped rapidly, falling to pre-1980s levels. Except for a small rally during the 2007/2008 primary campaign against Obama, it has not recovered.

This massive drop in popularity is exceptional not only for its magnitude but because it completely reversed the naming trend up to that point. This is in stark contrast to the other names (Betty, Nancy, Laura, and Michelle), which had generally been losing popularity since the 1970s. It is somewhat difficult to see how much popularity they lost, since I scaled everything above 1 down to 3% of its original size.
Figure 2: This is the same information as Figure 1 except values above 1 are on the same scale as values below 1.

From Figure 2 we can see that many of the downward trends in name popularity predated First Lady status. Looking at this graph, it is easy to suspect that names such as Michelle and Laura fell in popularity not because of their bearers' unpopularity as First Ladies but because of an ongoing general trend for those names.

This analysis is very easy to do. All of the names are taken from the Social Security baby names list released each year. The code (without comments, apologies) can be found on GitHub.
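The drop ratio in Table 1 is a single division; a sketch assuming the SSA counts have already been loaded into a dict (the numbers below are made up for illustration, not real SSA figures):

```python
# Hypothetical {name: {year: births}} loaded from the SSA baby names files.
births = {
    "Hillary": {1992: 1000, 2000: 100},
}

def drop_ratio(name, year_before_term, end_of_term):
    """Frequency at end of term divided by frequency the year before the term."""
    return births[name][end_of_term] / births[name][year_before_term]

# A 90% drop corresponds to a ratio of 0.10.
print(drop_ratio("Hillary", 1992, 2000))  # 0.1
```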

Sunday, January 31, 2016

Hillary Clinton's Biggest 2016 Rival: Herself

In a recent post I noted that despite Bernie Sanders doing better in many important indicators, Obama 2008 received 3x more media coverage than Sanders 2016.

Reasonably, a reader of my blog noted that not all coverage is equal: a presidential hopeful might be happier having no coverage than negative coverage. So I decided to do some textual analysis of the headlines, comparing Sanders and Clinton in 2016 and Obama and Clinton in 2008.

I looked at 4,200 headlines mentioning Obama in 2007/08, Sanders in 2015/16, or Clinton in 2007/08 or 2015/16, scraped from major news sources: Google News, Yahoo News, Fox, the New York Times, the Huffington Post, and NPR (from January 1st, 2007 to January 2008, and January 1st, 2015 to January 2016).

First I constructed word clouds for the Clinton and Sanders race.
Figure 1: Hillary Clinton's 2015/2016 headline word cloud, excluding "hillary" and "clinton" as terms when constructing the cloud.
Figure 2: Bernie Sanders' headline word cloud, excluding "bernie" when constructing the cloud.
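The frequencies behind clouds like these come from simple token counts over the headlines; a sketch with toy headlines (the stopword and exclusion lists below are illustrative, not the ones actually used):

```python
import re
from collections import Counter

# Toy headlines; the real ones were scraped from news front pages.
headlines = [
    "Clinton email server questions persist",
    "Sanders draws huge crowd in Iowa",
    "Sanders closes gap with Clinton in new poll",
]

STOP = {"in", "with", "new", "the", "a"}   # minimal illustrative stopword list
EXCLUDE = {"bernie", "sanders"}            # the candidate's own name

# Count lowercase word tokens, dropping stopwords and the candidate's name.
words = Counter(
    w for h in headlines
    for w in re.findall(r"[a-z]+", h.lower())
    if w not in STOP and w not in EXCLUDE
)
print(words.most_common(3))
```

In this toy corpus, "clinton" dominates Sanders' cloud, just as in Figure 2.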
Comparing Figures 1 and 2, some pretty significant differences appear. First off, the most frequent term in Figure 2 is "Clinton", followed by a lot of general campaign terms. "Black" refers to the black vote, reflecting concern that Bernie can't win it, perhaps combined with some high-profile black political activists endorsing him.

Figure 1, though, is a world of difference. Almost every major word is a scandal: email and emails, Benghazi, private server, and foundation. These reference the email scandal, in which Clinton set up a potentially illegal private server to house her official emails while Secretary of State; Benghazi, the affair in which diplomats died as a result of terrorist action, which many have blamed on Hillary Clinton; and the alleged unethical misuse of Clinton Foundation funds as a slush fund for the Clinton family's luxurious tastes. Interestingly, "Bruni", as in Frank Bruni, a New York Times columnist who has taken some heat for his critical reporting on Hillary Clinton, also appears in the cloud.

But is this really so bad? How do these word clouds compare with those of 2007/2008?

Figure 3: The word cloud from 2007/2008 for Hillary Clinton excluding "hillary" and "clinton".
Figure 4: The word cloud from 2007/2008 for Barack Obama excluding "obama".
From Figures 2, 3, and 4 we can see a significant and substantive difference from Figure 1. In those figures, the most newsworthy thing to report is the rivalry for the nomination; all other issues are dwarfed. In Figure 1, scandals and criticisms of Hillary Clinton abound. Looking at these word clouds, I suspect the Clinton camp would be happy to trade its current coverage for the coverage it had in the 2008 campaign.

But are these word-frequency graphs really a reasonable assessment of the media? What of the overall tone of these many articles?
Figure 5: Sentiment analysis of the news coverage of Clinton 2008 and 2016, Obama 2008, and Sanders 2016. Scales have been standardized so that a positive rating indicates a higher likelihood of an emotion being displayed and a negative rating a lower likelihood.
From Figure 5 we can see that headlines mentioning Sanders score the highest on the emotions anticipation, joy, surprise, trust, and positivity, and the lowest on anger, fear, sadness, and negativity. Clinton 2016/2008 scores the highest on anger, disgust, fear, sadness, and negativity, and the lowest on anticipation, joy, trust, and positivity.

Compared with 2008, Clinton 2016 articles appear to have less anger, anticipation, joy, trust, and fear, while having more disgust, sadness, surprise, and negativity, as well as slightly more positivity. Overall, the prospects, as gauged from the emotions engendered by the media, appear pretty bleak for Hillary Clinton.

It is interesting to note that articles about Sanders score very similarly in direction to those about Obama, except that Sanders seems to be outperforming Obama, with higher anticipation, joy, trust, and positivity, and lower anger, fear, sadness, and negativity. On only one indicator does Obama do better than Sanders: disgust. The largest emotional difference between Obama 2008 and Sanders 2016 is that Obama articles scored the lowest on surprise while Sanders' have scored the highest.

Overall, we must conclude that, at least in the emotional tone of articles if not in the quantity of coverage, Sanders is doing significantly better than Hillary, and even better than Obama was at this point in the 2008 presidential race.

Thursday, January 28, 2016

Obama 2008 received 3x more media coverage than Sanders 2016

Many supporters of presidential hopeful Bernie Sanders have claimed there is a media blackout in which Sanders has, for whatever reason, been blocked from communicating his campaign message. Combined with a dramatically cut debate schedule (from 18 debates in 2008 with Obama to 4 in 2016 with Sanders), scheduled on the days of the week least likely to draw a wide audience, this is seen as a significant attempt to rig the primary and ensure Clinton gets the nomination.

Despite strongly supported petitions, with nearly 120 thousand and 30 thousand signatories respectively demanding more debates, Debbie Wasserman Schultz, chair of the Democratic National Committee (DNC) and former campaign co-chair for Hillary Clinton's 2008 campaign, has repeatedly refused to consider more debates.

Then there was the complex fiasco earlier in the year dubbed "DataGate", in which the DNC temporarily shut the Sanders campaign out of critical voter information two days before the third debate, based on information presented by Schultz and refuted by the vendor. Access to the data was quickly restored after a petition demanding action gathered 285 thousand signatures in less than 48 hours.

With these two scandals in mind, Sanders supporters have become increasingly paranoid about what they view as the "establishment" acting to protect its candidate, Hillary. In this light, they have been very frustrated by the lack of media coverage of Sanders, claiming that he and his views are almost entirely unrepresented in the news media.

I have been wary of jumping on this bandwagon. It seems natural that the Democratic front-runner would get more coverage than a less-known rival. Clinton naturally attracts media attention, as she seems to have a new scandal every day, while Sanders seems to be a boy scout who, apart from being jailed for protesting segregation in the 60s, not enriching himself with private speaking fees and book deals, adamantly defending the rights of the downtrodden, and standing up to the most powerful people in the world, really has little "newsworthy" about him.

Setting aside the difficult question of what the media considers "newsworthy", I would like to ask the question, "Is Sanders getting more or less media coverage than Obama got in 2007/2008?"

In order to answer this question, I look back at the front pages of online newspapers from 2015 and 2007. Starting on January 1st and going up till yesterday, I scraped the headlines of Google News, Yahoo News, Huffington Post, Fox News, NPR, and the New York Times.

Table 1: This table shows how frequently the names "Sanders", "Obama", or "Clinton" (or "Bernie", "Barack", or "Hillary") come up in each of the news sources for which headlines were recorded, in the current race compared with the 2008 race. The columns Sanders/Clinton and Obama/Clinton show the relative frequency, with numbers less than 1 indicating fewer headlines featuring the challenger than featuring Clinton.

Year  Web       N      Sanders  Obama  Clinton  Sanders/Clinton  Obama/Clinton
2008  NYT       25902        1    100      138             0.01           0.72
2008  Fox       39132       10    167      357             0.03           0.47
2008  Google     8452        0    103      131             0.00           0.79
2008  HuffPost   1281        0     40       60             0.00           0.67
2008  NPR       20878        0     90       94             0.00           0.96
2008  Yahoo     27308        3    266      334             0.01           0.80
2016  NYT       36703      142    592      531             0.27           1.11
2016  Fox       32971       78   1284      898             0.09           1.43
2016  Google    21036       67    378      253             0.26           1.49
2016  HuffPost  45131      236    925      549             0.43           1.68
2016  NPR        9216       52    259      106             0.49           2.44
2016  Yahoo     19844       44    346      206             0.21           1.68
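The ratio columns can be recomputed directly from the raw counts; for example, using the NPR and Fox 2016 rows:

```python
# Headline counts from the 2016 rows of Table 1.
counts_2016 = {
    "NPR": {"Sanders": 52, "Clinton": 106},
    "Fox": {"Sanders": 78, "Clinton": 898},
}

# Sanders/Clinton ratio per outlet, rounded as in the table.
ratios = {
    site: round(c["Sanders"] / c["Clinton"], 2)
    for site, c in counts_2016.items()
}
print(ratios)  # {'NPR': 0.49, 'Fox': 0.09}
```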

From Table 1, we can see that NPR has the most balanced coverage of Obama in 2008 and Sanders in 2016, while Fox is the least balanced, with almost no coverage of Sanders. It is worth noting that the coverage of Sanders is abysmal in general, with no outlet reporting on Sanders even half as much as Clinton. This is a significant deviation from Obama's race against Clinton, in which Fox was the only outlet covering him at less than half of Clinton's rate.

Table 2: This table shows the total number of news reports across all agencies for each candidate in each race. 

Race  N  Sanders  Obama  Clinton  Sanders/Clinton  Obama/Clinton
2016  164901 619 3784 2543 0.24 1.49
2008  122953 14 766 1114 0.01 0.69

From Table 2 we can see that neither Sanders nor Obama received nearly as much coverage from the media as their rival Hillary Clinton. Sanders, however, is at a significant disadvantage compared with Obama at the same point in the previous race: Obama averaged about two articles for every three written about Clinton, while Sanders gets only one article for every four written about Clinton.

By this time in the 2008 primary race, Senator Obama had received 2.8 times as much coverage, relative to rival Hillary Clinton, as Senator Sanders has (.69/.24 ≈ 2.8). This is despite Sanders doing better than Obama on many key metrics (crowds, donations, and polling).

With Sanders taking the lead in New Hampshire and running neck and neck with Clinton in Iowa, we might wonder whether coverage is improving for the Sanders campaign.

Figure 1: The top curve is the frequency of Obama coverage relative to Clinton's; the bottom curve is Senator Sanders' relative to Clinton's. A 1 on the y axis represents coverage equal to Clinton's.
From Figure 1 we can see that despite a remarkable performance in energizing large crowds, polling well, and collecting an immense quantity of donations, media coverage remains dreadful for Sanders: even at the current peak, for every two stories about Clinton there is only one about Sanders.

This is probably due in part to how the DNC and the Clinton camp (it is doubtful there is any difference) appear to have whitewashed the primary, restricting the debate structure and constantly adjusting Clinton's positions so that they appear indistinguishable from Sanders'.
Figure 2: Shows a popular twitter meme which conveys the frustration many have with the media.

In the number of written articles, Bernie has suffered due to an apparent media blackout. He has also suffered from a lack of airtime, as we can see in Figure 2, which shows the minutes of coverage he had been aired as of December 20th.

The criticisms of the DNC rigging the debate process and of the media's bias in which candidates it chooses to follow are significant concerns for any democracy. This all fits well within a "systemic corruption" framework of thinking. However, this framework might not accurately describe what is actually happening within the media and the DNC; additional investigation is required before further conclusions can be drawn.

But even in the presence of uncertainty as to the true nature of the presidential campaign, accusations such as these and others levied against Hillary Clinton and the DNC should be investigated with due diligence, as they represent a threat to our democracy far more pernicious and dangerous than anything Middle Eastern terrorists can muster.

Tuesday, January 19, 2016

Who are Turkopticon's Top Contributors?

In my most recent post, "Turkopticon: Defender of Amazon's Anonymous Workforce", I introduced Turkopticon, a social art project designed to provide basic tools for Amazon's massive Mechanical Turk workforce to share information about employers (requesters).

Turkopticon has been a runaway success, with nearly 285 thousand reviews submitted by over 17 thousand reviewers since its inception in 2009. Collectively these reviews comprise 53 million characters, or about 7.6 million words (assuming an average word of 5 letters plus two spaces). At 100 words every 7 minutes, that represents roughly a year, about 370 days, collectively spent just writing reviews. It is probably safe to consider this an underestimate.
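The back-of-envelope arithmetic behind that estimate, using the assumptions stated above (7 characters per word, 100 words every 7 minutes):

```python
# Back-of-envelope estimate of collective time spent writing reviews.
chars = 53_000_000            # total characters across all reviews
chars_per_word = 7            # ~5 letters plus two spaces
words_per_minute = 100 / 7    # 100 words every 7 minutes

words = chars / chars_per_word          # ~7.6 million words
minutes = words / words_per_minute      # ~530,000 minutes
days = minutes / (60 * 24)
print(round(days))  # 368, i.e. roughly a year of collective writing time
```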

So given this massive investment of individuals in writing these reviews, I find myself wanting to ask, "who is investing this kind of energy producing this public good?"

In general, while there are many contributors, the top 500 represent 54% of the reviews written, the top 100 make up 30%, and the top 15 account for 11% of all reviews.

Figure 1: From this graph we can compute the Gini coefficient for number of submissions at around 82%, indicating that a very few individuals are doing nearly all of the work.
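A Gini coefficient like the one behind Figure 1 can be computed from the per-contributor review counts; a sketch using the standard closed form for sorted data (toy inputs, not the real counts):

```python
def gini(xs):
    """Gini coefficient of a list of non-negative counts."""
    xs = sorted(xs)
    n = len(xs)
    total = sum(xs)
    # Closed form for sorted data: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

# Perfect equality -> 0; extreme concentration -> close to 1.
print(gini([1, 1, 1, 1]))              # 0.0
print(round(gini([0, 0, 0, 100]), 2))  # 0.75
```

Feeding in the actual review counts per contributor yields the ~0.82 figure quoted above.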
Within Turkopticon there is no ranking system for reviewer quality, so it is not obvious who the top contributors are or what their reviewing patterns look like. In this article we examine some general features of the top contributors.

Table 1: The top 15 Turkopticon review contributors. Rank is the reviewer's rank by number of reviews written. Name is the reviewer's name. Nrev is the number of reviews written. DaysTO is the number of days between the reviewer's oldest and most recent reviews. Nchar is the average number of characters per review. FAIR, FAST, PAY, and COMM are the quantitative scales on which Turkopticon asks reviewers to rate requesters: Fair indicates how fairly the requester rejected, or failed to reject, work; Fast, how quickly the requester approved or rejected work; Pay, how the reviewer perceived the payment for the work; and Comm (communication), how well the requester addressed the worker's concerns if the worker attempted to communicate.

Thom Burr
NurseRachet (moderator)

Find the full list as a google document here (First Tab).

From Table 1 we can see that all of the top 15 reviewers have contributed over 1,200 reviews, with bibytes the most prolific at over 5,200. NurseRachet (a forum moderator) has been active on Turkopticon the longest, followed by worry and Rosey. The longest-winded is kimadagem, with an average of 490 characters (approximately 70 words) per review, while CaliBboy has the shortest reviews at only 75 characters, or around 10 words.

In terms of the averages across the four rating scales, there is a fair bit of diversity among the top reviewers: jaso...@h.. has the highest average score at 4.8, while jmbus...@h... has the lowest at around 2.7, followed by ptosis with an average a tiny bit higher than 3.

So now we have a pretty good idea of what the top contributors to Turkopticon generally look like.

But what of the quality of the contributions?

In order to understand what a quality contribution in Turkopticon looks like we must consider the standards that the community has come up with after years of trial and error.
1. The four scales should be treated as distinct categories. That is, a high pay rate should not automatically lead someone to give a high Fairness rating, or vice versa.
2. To this end, what are referred to as 1-Bombs (attempts to artificially drop a requester's score by ranking all scales 1) should be avoided. Similarly, 5-Bombs should also be avoided.
3. Within Turkopticon there is also the ability to flag reviews as problematic. If one of your reviews is flagged, it means someone has a problem with it.
4. In general we would like reviews to be approached with a level head so that reviewers write independent reviews rather than ones based on their current mood.
5. Finally, in general we would like reviewers to review as many categories as they can when writing reviews.

From these 5 guidelines, I will attempt to generate variables that measure each of these targets.
1. For distinctness of scales I will focus on the relationship between Pay and the other three scales across each reviewer's reviews (FairPay, FastPay, and CommPay for the correlations of Fair, Fast, and Comm with Pay, respectively). The reason I focus on Pay is that it often seems to be the scale that concerns MTurk workers the most.

Table 2: For reviewers the average correlation between Pay and other scales.
Top 100
Top 15

From Table 2 we can see that the average reviewer has very strong positive correlations between Pay and the other scales, with FAIR, FAST, and COMM in the .73-.81 range. In contrast, the Top 100 and especially the Top 15 all have much lower correlations. We should not necessarily hope for a zero correlation between these factors, since one might expect a requester who pays too little to also act unfairly, respond slowly to submissions, or have poor communication habits.

2. 1-Bombs and 5-Bombs are easy to observe in the data as reviews of all 1s or all 5s. However, it is worth noting that all 1s or all 5s might actually be a valid review given the circumstances. The variables 1Bomb and 5Bomb measure the likelihood that an individual reviewer's review falls into either category.
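As a sketch of this measurement, the two bomb rates could be computed per reviewer as follows; the tuple layout and sample data are assumptions for illustration:

```python
def bomb_rates(reviews):
    """Fraction of a reviewer's fully completed reviews that are
    1-Bombs (all scales 1) or 5-Bombs (all scales 5).
    Each review is a (fair, fast, pay, comm) tuple; None = scale not rated."""
    complete = [r for r in reviews if all(v is not None for v in r)]
    if not complete:
        return 0.0, 0.0
    one = sum(1 for r in complete if all(v == 1 for v in r)) / len(complete)
    five = sum(1 for r in complete if all(v == 5 for v in r)) / len(complete)
    return one, five

# Hypothetical reviewer: one 1-Bomb among four complete reviews
# (the review with a missing Comm rating is excluded).
revs = [(1, 1, 1, 1), (4, 3, 5, 4), (5, 5, 4, 5), (2, 3, 2, None), (3, 3, 3, 3)]
one_rate, five_rate = bomb_rates(revs)  # → 0.25, 0.0
```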

3. Flags can also be observed directly in the data. Multiple flags can be attached to a single review; the most-flagged review in my data has 17 flags. The variable Flag is the average (expected) number of flags on an individual reviewer's reviews.

Table 3: The prevalence rates of 1-Bombs, 5-Bombs, and Flags.
Top 100
Top 15

From Table 3 we can see that the prevalence rates of 1-Bombs, 5-Bombs, and Flags are much higher among the general reviewers than among the Top 100, and especially the Top 15.

4. In order to measure "level-headedness" I will look at how reviews trend from a rating perspective. That is, is the value of the current review correlated (positively or negatively) with the value of the next review?

Table 4: The auto-regressive one step correlation between review levels. In this case the "ALL" category only includes the 3,700 reviewers who have written more than 10 reviews.

Top 100
Top 15

From Table 4 we can see that inter-review correlation is quite small, especially compared with the correlation between Pay and the other scales within the same review (Table 2). Interestingly, for the average reviewer there is almost no correlation across reviews. This might be because typical reviewers write fewer reviews overall, spacing them more widely in time and making them less likely to be sequentially influenced by personal psychological trends.

5. Finally, completeness can be measured simply by how frequently each individual scale was left blank.

Table 5: The completion rates of individual scales.

Top 100
Top 15

From Table 5 we can see that the completion rates of all scales are more or less equivalent between the general reviewers and the Top 100 and Top 15, except in the case of COMM, which the top reviewers are much less likely to rate.

Constructing A Quality Scale

In order to construct the best scale given our data, we will choose variables and weights that seem typical of the top 15 most prolific reviewers. From Tables 2 and 3 we can see very distinct differences between the average reviewer and the top reviewers. However, for the auto-correlation and completeness rates we see very little difference, except that the top reviewers are much less likely to rate communication. I can't know exactly why this is the case, but I suspect it is a combination of top reviewers avoiding 1-Bombs and 5-Bombs, perhaps combined with top reviewers not typically finding it worth their time to communicate directly with requesters.

So here is my proposed index using standardized coefficients (x/sd(x)):
ReviewerProblemIndex = 3*Flag + 3*1Bomb + (1/2)*5Bomb + 1*FairPay + 1*FastPay + 1*CommPay

Because we have standardized the variables, we can read the scalars in front as directly representing the weight of each variable. Flags I weight the most heavily, as they indicate that someone in the community has a problem with the review. Next highest weighted are 1Bombs, which are widely regarded as a serious problem and frequently discussed on the Turkopticon forum.

5Bombs, FairPay, FastPay, and CommPay are also discussed but not considered as important (Turkopticon Discuss). I have weighted 5Bombs half as heavily as the FairPay, FastPay, and CommPay variables, as it seems cruel to penalize someone for being generous with reviews.
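Putting the pieces together, the index could be computed as below. The standardization divides each raw variable by its standard deviation across reviewers before applying the weights; the three reviewers and their values are made-up data, not figures from the article:

```python
from statistics import pstdev

def reviewer_problem_index(table):
    """RPI = 3*Flag + 3*1Bomb + 0.5*5Bomb + FairPay + FastPay + CommPay,
    with each variable first standardized as x / sd(x) across reviewers.
    `table` maps reviewer name -> dict of the six raw variables."""
    weights = {"flag": 3, "bomb1": 3, "bomb5": 0.5,
               "fair_pay": 1, "fast_pay": 1, "comm_pay": 1}
    sds = {v: pstdev([row[v] for row in table.values()]) for v in weights}
    return {name: sum(w * row[v] / sds[v] for v, w in weights.items())
            for name, row in table.items()}

# Hypothetical raw values for three reviewers (assumed data for illustration).
table = {
    "alice": dict(flag=0.001, bomb1=0.001, bomb5=0.01,
                  fair_pay=0.1, fast_pay=0.1, comm_pay=0.2),
    "bob":   dict(flag=0.010, bomb1=0.100, bomb5=0.05,
                  fair_pay=0.9, fast_pay=0.8, comm_pay=0.9),
    "carol": dict(flag=0.005, bomb1=0.020, bomb5=0.03,
                  fair_pay=0.5, fast_pay=0.4, comm_pay=0.6),
}
rpi = reviewer_problem_index(table)  # lower index = cleaner reviewing record
```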

So let's apply our index and see how our top 15 reviewers score!

Table 6: The top 15 most prolific contributors ranked by the ReviewerProblemIndex (Index, RPI). IRank is the reviewer's rank in terms of the RPI. Name is the reviewer's name. Nrev is the number of reviews written. Rank is the reviewer's rank in terms of number of reviews written. The other variables are described above.

IRank  Index   Name Nrev  Rank  Flag  1Bomb  5Bomb  FairPay  FastPay  CommPay
1 1.9 jessema...@g... 1539 9 0.001 0.001 0.016 0.12 0.09 0.20
2 2.1 kimadagem 3732 2 0.002 0.000 0.014 0.05 -0.01 0.27
3 3.2 worry 2637 3 0.000 0.003 0.006 0.11 0.11 0.53
4 3.5 absin...@y... 1320 10 0.000 0.000 0.007 0.24 0.13 0.55
5 3.5 bigbytes 5236 1 0.001 0.000 0.007 0.20 0.04 0.54
6 4.0 surve...@h... 2488 5 0.001 0.001 0.008 0.32 0.29 0.34
7 6.4 shiver 1721 7 0.001 0.005 0.015 0.50 0.33 0.76
8 6.6 jaso...@h... 2100 6 0.001 0.004 0.070 0.41 0.27 0.83
9 10.9 Thom Burr 1594 8 0.002 0.013 0.030 0.87 0.84 0.92
10 11.0 Rosey 1313 11 0.004 0.009 0.022 0.81 0.81 0.85
11 12.4 NurseRachet (moderator) 1274 14 0.016 0.022 0.078 0.39 0.32 0.46
12 12.7 CaliBboy 1281 12 0.022 0.004 0.005 0.20 0.21 0.47
13 13.1 TdgEsaka 1234 15 0.015 0.016 0.029 0.57 0.40 0.73
14 13.4 ptosis 1278 13 0.009 0.039 0.034 0.80 0.78 0.73
15 17.2 jmbus...@h... 2539 4 0.003 0.170 0.020 0.99 0.98 0.92

From Table 6 we can see that in general the more prolific reviewers also tend to rank higher on the RPI, with a few exceptions. One exception is "jmbus": despite being the fourth most prolific contributor, he/she is ranked at the bottom of the top 15 list. This is likely due to having the highest 1-Bomb rate in the index, with 17% of reviews being 1-Bombs. His/her reviews also seem to be almost entirely driven by Pay, as FairPay, FastPay, and CommPay are all correlated upwards of 90%.

Similarly, "jessema", though only the 9th most prolific reviewer, seems to have the highest quality reviews (slightly ahead of "kimadagem"), with very low Flag, 1Bomb, and 5Bomb rates as well as very low correlations of Fair, Fast, and Comm with Pay. Interestingly, though both "Thom Burr" and "Rosey" have very high correlations between Pay and the other scales, because they have relatively low Flag, 1Bomb, and 5Bomb rates they are ranked near the middle.

Overall, with a few exceptions, I am very impressed that the top contributors score so well on the RPI.

Table 7: The Top 100 most prolific contributors ranked based on the Reviewer Problem Index (RPI).
Rank  Index   Name Nrev  Rrank  Flag  1Bomb  5Bomb  FairPay  FastPay  CommPay
1 -0.13 seri...@g... 488 64 0.000 0.000 0.006 0.00 -0.05 0.00
2 1.67 james...@y... 365 98 0.000 0.000 0.000 0.29 0.00 0.18
3 1.72 donn...@o... 1064 23 0.001 0.000 0.006 0.04 0.04 0.27
4 1.85 jessema...@g... 1539 9 0.001 0.001 0.016 0.12 0.09 0.20
5 1.94 iwashere 689 44 0.003 0.000 0.017 0.00 0.05 0.12
6 2.03 kimadagem 3732 2 0.002 0.000 0.014 0.05 -0.01 0.27
7 2.06 mmhb...@y... 422 79 0.005 0.000 0.009 0.00 0.00 0.00
8 2.21 aristotle...@g... 579 51 0.002 0.000 0.010 0.10 0.11 0.19
9 2.90 Kafei 561 55 0.002 0.000 0.027 0.16 0.13 0.27
10 2.93 turtledove 1188 19 0.001 0.000 0.012 0.32 0.04 0.34
90 15.28 Anthony99 571 53 0.005 0.014 0.391 1.00 1.00 1.00
91 15.83 cwwi...@g... 543 57 0.011 0.070 0.026 0.84 0.85 0.84
92 16.25 rand...@g... 490 63 0.002 0.157 0.051 0.97 0.97 0.99
93 16.76 trudyh...@c... 378 95 0.008 0.140 0.056 0.87 0.84 0.80
94 16.79 jmbus...@h... 2539 4 0.003 0.170 0.020 0.99 0.98 0.92
95 17.30 hs 945 28 0.010 0.115 0.098 0.87 0.86 0.89
96 17.94 ChiefSweetums 691 43 0.010 0.185 0.054 0.68 0.68 0.81
97 21.49 Playa 414 85 0.010 0.239 0.014 0.93 0.90 1.00
98 31.56 Tribune 360 99 0.053 0.011 0.108 0.76 0.61 0.97
99 35.74 taintturk. (moderator) 1176 21 0.027 0.499 0.014 0.89 0.87 0.73
100 40.53 Taskmistress 698 42 0.017 0.755 0.020 0.91 0.91 0.96

Find the full list of Top 100 ranked here (Second Tab).

In Table 7 we can see how reviewers across the whole Top 100 score on the RPI. The Top 10 have great scores, with seri having the top-ranked score: 488 reviews written with no Flags or 1-Bombs and only three 5-Bombs. For seri there is also no correlation of Pay with Fair or Comm, and even a slightly negative correlation with Fast.

The worst 10 reviewers are much more interesting, mostly because taintturk, a Turkopticon moderator, and Tribune, a former moderator, are on the list. Everybody on the worst 10 list suffers from very high correlations between the other scales and Pay. Taintturk also suffers from having 50% of his/her reviews be 1-Bombs (among those reviews in which all of the scales were completed). This is not the worst, as Taskmistress has 75% 1-Bombs, but it was surprising. Looking back at the early reviews, I see that 1-Bombs seem to have been common earlier in Turkopticon's history and were intended to signal an Amazon Terms of Service violation, a feature that has since been implemented.

Similarly, Tribune has one of the highest flag rates in the entire list, with an expected number of flags of 5% on his/her reviews. However, as Tribune was invited to be a moderator despite this spotty history, we can only assume that my rating system has some serious flaws.

Overall, I would therefore take the RPI ranking with a grain of salt. Perhaps some of the longer-time contributors to Turkopticon are suffering from standards that have changed over time. If I have time I will revisit the rating system looking only at reviews from the last year or two.