Thursday, December 1, 2016

Efficiently Saving and Sharing Data in R

I spent a day the other week struggling to make sense of a federal data set shared in an archaic format (an ASCII fixed-format dat file).

For data to be distributed and shared effectively, it should use as little disk space as possible and be quick for potential users to load.

In this post I test four different file formats available to R users. These formats are comma separated values csv (write.csv()), object representation format as an ASCII txt (dput()), a serialized R object (saveRDS()), and a Stata file (write.dta() from the foreign package). For reference, rds files seem to be identical to Rdata files except that they deal with only one object rather than potentially multiple.

In order to get an idea of how and where different formats outperformed each other I simulated a dataset composed of different common data formats. These formats were the following:

Numeric Formats

Index 1 to N - ex. 1,2,3,4,...
Whole Numbers - ex. 30, 81, 73, 5, ...
Big Numbers - ex. 36374.989943, 15280.050850, 5.908210, 79.890601, 2.857904, ...
Continuous Numbers - ex. 1.1681155, 1.6963295, 0.8964436, -0.5227753, ...

Text Formats

String coded factor variables with 4 characters - ex. fdsg, jfkd, jfht, ejft, jfkd ...
String coded factor variables with 16 characters coded as strings
String coded factor variables with 64 characters coded as strings
Factor coded variables with 4 characters - ex. fdsg, jfkd, jfht, ejft, jfkd - coded as 1,2,4,3,2, ...
Factor coded variables with 16 characters
Factor coded variables with 64 characters
String variables with random 4 characters - ex. jdhd, jdjj, ienz, lsdk, ...
String variables with random 16 characters
String variables with random 64 characters

The format a variable is in predicts how much space that variable takes up and therefore how time consuming it is to read or write. Variables that are easy to describe tend to take up little space. An index variable is an extreme example and can take up almost no space, as it can be expressed in an extremely compact format (1:N).

In contrast numbers which are very long or have a great degree of precision tend to have more information and therefore take more resources to access and store. String variables when filled with truly random or unique responses are some of the hardest data to compress as each value may be sampled from the full character spectrum. There is some significant potential for compression when strings are repeated in the variable. These repetitive entries can be either coded as a "factor" variable or a string variable in R.

As part of this exploration, I look at how string data is stored and saved when coded as either a string or as a factor within R.
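The variable types listed above can be simulated along the following lines. This is only a minimal sketch: the helper names (rand_strings, rep_strings, sim_data), the pool of 20 distinct repeated strings, and the distributions used for the numeric columns are all my assumptions, not the code behind the figures.

```r
# Sketch of a simulated data set covering the variable types described above.
# All object names and distribution choices here are hypothetical.
set.seed(1)
n <- 10000

# Draw k random lowercase strings, each `len` characters long
rand_strings <- function(k, len) {
  replicate(k, paste(sample(letters, len, replace = TRUE), collapse = ""))
}

# Repetitive strings: n draws from a small pool of 20 distinct values
rep_strings <- function(len) sample(rand_strings(20, len), n, replace = TRUE)

sim_data <- data.frame(
  index      = 1:n,                                  # 1, 2, 3, ...
  whole      = sample(1:100, n, replace = TRUE),     # whole numbers
  big        = exp(rnorm(n, mean = 5, sd = 3)),      # wide-ranging "big" numbers
  continuous = rnorm(n),                             # continuous numbers
  str4       = rep_strings(4),                       # repeated values kept as strings
  str16      = rep_strings(16),
  str64      = rep_strings(64),
  fac4       = factor(rep_strings(4)),               # repeated values coded as factors
  fac16      = factor(rep_strings(16)),
  fac64      = factor(rep_strings(64)),
  rand4      = rand_strings(n, 4),                   # unique random strings
  rand16     = rand_strings(n, 16),
  rand64     = rand_strings(n, 64),
  stringsAsFactors = FALSE
)

str(sim_data)
```

Keeping the repeated strings both as character columns and as factors is what allows the string-versus-factor comparison later on.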

Raw Files

Let's first look at space taken when saving uncompressed files.

Figure 1: File Size

Figure 1 shows the file size of each of the saved variables when 10,000 observations are generated. The dataframe object is the data.frame composed of all of the variables. From the height of the dataframe bar, we can see that rds is the overall winner. Looking at the other variables, we can see only that csv appears to consistently underperform for most variable types except random strings.
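A size comparison of this kind can be sketched as follows. The data frame and file names are made up for illustration, and the Stata step is skipped if the foreign package is not installed. One thing worth knowing when reading the comparison: saveRDS() compresses with gzip by default (compress = TRUE), so a strictly uncompressed rds file needs compress = FALSE.

```r
# Save one small data frame in the four formats and compare sizes on disk.
# The data frame and paths are hypothetical stand-ins.
set.seed(1)
df <- data.frame(
  index = 1:10000,
  x     = rnorm(10000),
  grp   = sample(c("fdsg", "jfkd", "jfht", "ejft"), 10000, replace = TRUE),
  stringsAsFactors = FALSE
)

out_dir <- tempfile("formats")
dir.create(out_dir)
paths <- c(csv = file.path(out_dir, "df.csv"),
           txt = file.path(out_dir, "df.txt"),
           rds = file.path(out_dir, "df.rds"),
           dta = file.path(out_dir, "df.dta"))

write.csv(df, paths[["csv"]], row.names = FALSE)   # comma separated values
dput(df, file = paths[["txt"]])                    # ASCII object representation
saveRDS(df, paths[["rds"]])                        # serialized R object (gzip by default)
if (requireNamespace("foreign", quietly = TRUE)) {
  foreign::write.dta(df, paths[["dta"]])           # Stata file
}

# Size in bytes of each file; NA for any file that was not written
sizes <- setNames(file.size(paths), names(paths))
print(sizes)
```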

Figure 2: File Size Log scaled

In Figure 2 we can see that rds is consistently outperforming all of the other formats with the one exception of index in which the txt encoding simply reads 1:10000. Apparently even serializing to bytes can't beat that.
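The index case is easy to check directly. A small sketch (idx and txt are just illustrative names):

```r
# dput() writes an integer sequence as the expression that generates it
idx <- 1:10000
txt <- capture.output(dput(idx))
print(txt)         # "1:10000"
print(nchar(txt))  # characters needed to describe all 10,000 values
```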

Interestingly, there does not appear to be an effective size difference between repetitive strings encoded as factors, regardless of the length of the strings (4, 16, or 64). We can see that the inability of csv to compress factor strings dramatically penalizes the efficiency of csv relative to the other formats.

File Compression

But data is rarely shared in uncompressed formats. How does compression change things?

Figure 3: Zipped File Sizes Logged

We can see from Figure 3 that if we zip our data after saving, the other formats do pretty much as well as rds. Comma delimited csv files are a bit of an exception, with factor variables suffering under csv. Yet random strings perform slightly better under csv than under other formats. Interestingly, rds files seem slightly larger than the other two file types. Overall though, it is pretty hard to see any significant difference in file size based on format after zipping.
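The effect of compression can be approximated without an external zip program by gzip-compressing each file's bytes in memory. This is a sketch under that assumption (zip archives normally use the same deflate algorithm, but sizes will differ slightly); compressed_size is a hypothetical helper and the data frame is a stand-in.

```r
# Approximate the zipped size of a file by gzip-compressing its bytes in memory
compressed_size <- function(path) {
  raw_bytes <- readBin(path, what = "raw", n = file.size(path))
  length(memCompress(raw_bytes, type = "gzip"))
}

set.seed(1)
df <- data.frame(
  grp = sample(c("fdsg", "jfkd", "jfht", "ejft"), 10000, replace = TRUE),
  x   = rnorm(10000)
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(df, csv_path, row.names = FALSE)
saveRDS(df, rds_path, compress = FALSE)  # write raw so the compression step is comparable

print(c(csv_raw = file.size(csv_path), csv_gz = compressed_size(csv_path),
        rds_raw = file.size(rds_path), rds_gz = compressed_size(rds_path)))
```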

So, should we stick with whatever format we prefer?

Not so fast. Sure, all of the files are similarly sized after zipping. This is useful for sharing files. But having to keep large file sizes on a hard drive is not ideal even if they can be compressed for distribution. There is finite space on any system and some files can be in the hundreds of MB to hundreds of GB range. Dealing with file formats and multiple file versions which are this large can easily drain the permanent storage capacity of most systems.

But an equally important concern is how long it takes to write and read different file formats.

Reading Speed

In order to test reading speeds, I loaded each of the different full dataframe files fifty times. I also tested how long it would take to unzip then load that file.

Figure 4: Reading and unzipping average speeds

From Figure 4, we can see that read speeds roughly correspond with the size of files. We can see that even a relatively small file (30 MB csv file) can take as long as 7 seconds to open. Working with large files saved in an inefficient format can be very frustrating.

In contrast, saving files in efficient formats can dramatically cut down on the time taken opening those files. Using the most efficient format (rds), files could be 100 times larger than those used in this simulation and still open in less than a minute.
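A read-speed test of this kind can be sketched with system.time(). The helper time_read, the stand-in data frame, and the choice of five repetitions (rather than fifty) are assumptions made to keep the example quick.

```r
# Time repeated reads of the same data stored as csv and as rds
set.seed(1)
df <- data.frame(x = rnorm(1e5), g = sample(letters, 1e5, replace = TRUE))

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(df, csv_path, row.names = FALSE)
saveRDS(df, rds_path)

# Average elapsed seconds over `reps` reads of `path` using `reader`
time_read <- function(reader, path, reps = 5) {
  elapsed <- replicate(reps, system.time(reader(path))[["elapsed"]])
  mean(elapsed)
}

timings <- c(csv = time_read(read.csv, csv_path),
             rds = time_read(readRDS, rds_path))
print(timings)
```

Timings depend heavily on the machine and on disk caching, so the numbers are only meaningful relative to each other on the same system.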

Conclusions

Finding common file formats that any software can access is not easy. As a result many public data sets are provided in archaic formats which are poorly suited for end users.

This results in a wide pool of software suites having the ability to access these datasets. However, with inefficient file formats comes a higher demand on the hardware of end users. I am unlikely to be the only person struggling with opening some of these large "public access" datasets.

Those maintaining these datasets will argue that sticking with the standard, inefficient format is the best of bad options. However, there is no reason they could not post datasets in rds formats in addition to the outdated formats they currently exist in.

And no, we need not argue that choosing one software language's format to save data in would bias access toward that language. Already many federal databases come with code supplements in Stata, SAS, or SPSS. To access these supplements, one is required to have paid access to that software.

Yet, R is free and its database format is public domain. Any user could download R, open an rds or Rdata file, then save that file in a format more suited to their purposes. None of these other proprietary database formats can boast the same.

Tuesday, November 8, 2016

Trump and Clinton Supporters Agree on Relative Morality ... Mostly

When looking at the endless scandals swirling around the heads of the two rival candidates Hillary Clinton and Donald Trump, it can seem like the two candidates are equally tainted. Many people throw up their hands pleading for some other option.

How can we evaluate the alleged actions of these two candidates?

Is there some kind of objective way to do so?

And how does the decision to support a candidate affect the perception of an offense?

In order to attempt to address these questions I recruited 137 submissions on Amazon Mechanical Turk who submitted around 1100 five way "relative offense" rankings of five randomly matched actions from a list of 150. 46 submissions were from individuals supporting Clinton, 37 from individuals supporting Trump, and 54 other or not supplied. Ranking was from 1 "Least Offensive" to 5 "Most Offensive".

Within any item set of five actions only one action could be matched with an individual ranking.

For each action, the share of times that action was classified under each ranking was calculated. Each share was multiplied by the value of its ranking and summed across all five levels to create an index.

The smallest index values correspond to the least offensive or non-offensive actions, while the highest values correspond to the actions respondents considered the most offensive.
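As I read the description above, the index can be sketched as follows. The responses data frame and the offense_index helper are hypothetical; the real data are not reproduced here.

```r
# Hypothetical long-format responses: one row per time an action was ranked
responses <- data.frame(
  action = c("A", "A", "A", "B", "B", "B", "B"),
  rank   = c(1, 2, 1, 5, 4, 5, 3)
)

# Share of times the action received each rank, weighted by the rank and summed
offense_index <- function(ranks) {
  shares <- as.numeric(table(factor(ranks, levels = 1:5))) / length(ranks)
  sum(shares * 1:5)
}

index <- tapply(responses$rank, responses$action, offense_index)
print(index)

# Ordering actions by the index; tied actions share an averaged rank
print(rank(index))
```

Computed this way, the index is simply the mean rank each action received, and rank() with its default averaging of ties produces the .5 values seen when two actions score the same.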

Table 1: This table shows all 150 actions ranked from least offensive on average to most offensive as rated by all respondents. The Trump, Clinton, and Other columns are how each of these respective supporters rank the actions. A .5 means that two actions were ranked the same. The Trump_Clinton column is calculated by taking the rankings of Trump supporters for an action and subtracting the rank for those actions for Clinton supporters. Values in which Trump supporters and Clinton supporters diverge by more than 30 ranks are highlighted.

All	Trump	Clinton	Other	Trump_Clinton	*Scandal	Action
1	5	2	2.5	3		Eating meat.
2	1	2	15	-1		Posting a much younger picture of yourself on a dating website.
3	6.5	5.5	8	1		Not wearing your seat belt when driving.
4	15.5	2	6	13.5		Dating someone of a different race.
5	2.5	15	2.5	-12.5	Donald Trump	Swearing in public.
6	8.5	17	2.5	-8.5		Offering to pay someone 10 cents for every paper they deliver.
7	6.5	12	17.5	-5.5		Stealing candy from a baby.
8	12	25	10	-13		Jumping in line in front of someone else when waiting for customer service.
9	2.5	27	21	-24.5		Not tipping a waiter after average service.
10	4	14	33.5	-10		Hoard the armrest when sitting next to a stranger on a plane or movie theater.
11	60.5	5.5	5	55		Having sex with someone of the same gender.
12	27	8	38	19	Trey Radel	Buying illegal drugs for personal consumption.
13	25.5	11	28.5	14.5		Using toilet paper to vandalize a stranger's house.
14	37	13	25	24		Taking the cab hailed for someone else.
15	29	27	12	2		Leaving chewing gum under a public table.
16	13	57.5	20	-44.5		Exaggerating the size of your penis in order to convince someone to have sex with you.
17	25.5	4	45	21.5		Illegally copying copyrighted music.
18	18	16	32	2		Not voting.
19	22	9	48	13		Illegally streaming copyrighted videos.
20	17	21.5	30.5	-4.5		Publishing with permission a different person's work as your own.
21	30	51.5	12	-21.5		Hunting a wild animal.
22	8.5	20	45	-11.5		Using a bad or ineffective preparedness test to screen potential students.
23	57.5	36.5	12	21		Sticking gum in someone's hair.
24	43	40.5	22	2.5		Having sex with your landlord to pay the rent.
25	20	7	50.5	13		Not flushing a public toilet after pooping.
26	46.5	36.5	14	10	Hillary Clinton	Continuing to stay married to your spouse after that person had multiple affairs.
27.5	32	33.5	19	-1.5		Paying a prostitute for sex.
27.5	10.5	18	27	-7.5		Spending money on something you do not need such as entertainment when you know there are people dying of hunger next door.
29	15.5	66	9	-50.5	Donald Trump	Earning 1000 times more money per hour as your lowest paid full-time employees.
30	46.5	39	7	7.5		Voting illegally twice.
31	10.5	29	40.5	-18.5	IRS	Using public funds to pay for professional development conferences which may not be very productive.
32	21	27	26	-6	Hillary Clinton	Refusing to release the transcripts of paid speeches you gave.
33	28	61	17.5	-33		Earning 1000 times more money per hour as your lowest paid contractor.
34	34	40.5	23	-6.5		Lying about damages in order to keep a tenant's security deposit.
35	63	64	2.5	-1	FBI	Selling guns with GPS trackers in them to a dangerous illegal organization with the intention of using the data to prosecute the organization.
36	19	19	65	0		Urinating on a toilet seat and not cleaning it.
37	74.5	61	24	13.5	Donald Trump	Misrepresenting your success to convince others to invest in you.
38	65	51.5	28.5	13.5	Donald Trump	Reposting images from a white supremacist group.
39	23.5	54	40.5	-30.5		Writing false reviews on a business's website because you are angry with them.
40	49.5	31.5	30.5	18		Sneaking out of a restaurant without paying for a meal.
41	14	21.5	81	-7.5		Copying a fellow student's homework.
42	35	57.5	39	-22.5	Hillary Clinton	A politician lying or falsely representing one quarter (25%) of his/her public statements.
43	32	36.5	58	-4.5		Stealing products from a convenience store. (Shoplifting)
44	55	77.5	42.5	-22.5	Donald Trump	Falsely reporting the magnitude of your donations to charities to make yourself look better.
45.5	52.5	45.5	52	7	Donald Trump	Publicly shaming someone for being fat or ugly.
45.5	81.5	69	16	12.5		Refusing to renounce the endorsement of a white supremacist group.
47	23.5	43	72	-19.5		Speeding through a school zone.
48	52.5	49	45	3.5		Promising special interest groups to vote for them if they donate to your campaign.
49	79	24	53	55		Masturbating in a public bathroom.
50	57.5	61	36	-3.5	Donald Trump	Lying about the number of floors in a building you own in order to charge higher rent.
51	41	71	60	-30		Using spray paint to vandalize a stranger's car.
52	41	36.5	68	4.5		Smoking in an area around non-smokers where "no smoking" signs are posted.
53	74.5	42	48	32.5		Spitting in someone's soup when they are not looking.
54	41	64	55.5	-23		Publishing without permission a different person's work as your own.
55	44	112	50.5	-68		Publicly renouncing homosexuality while secretly having sex with gay prostitutes.
56	74.5	48	48	26.5		Publicizing false claims about a person because you are angry with that person.
57	66.5	47	55.5	19.5	Chris Christie	Using your position to control how public funds are used in order to punish a political rival.
58	32	51.5	63	-19.5		Providing alcohol to minors (a person less than 18 years old).
59	92	44	35	48		A public official hiring a less qualified friend over a well qualified stranger.
60	74.5	56	42.5	18.5		A fully mobile person refusing to give up a seat to a disabled or infirm person.
61	46.5	93	33.5	-46.5		Publicizing someone's address in an attempt to intimidate someone else.
62	39	10	89	29	Donald Trump	Judging a person on their appearance.
63	52.5	68	61	-15.5		Selling illegal drugs.
64	74.5	31.5	75	43	Hillary Clinton	Improperly storing national secrets.
65	52.5	64	77.5	-11.5	Donald Trump	A politician lying or falsely representing three quarters (75%) of his/her public statements.
66	81.5	30	70	51.5	Donald Trump	Enjoying firing people from their occupation.
67	68	86.5	54	-18.5		Stealing large sums of money from a company you work for.
68	46.5	100	57	-53.5	Donald Trump	Earning over 10 million dollars in a year and not paying any federal income taxes.
69	60.5	89	64	-28.5	Donald Trump	Filing bankruptcy in order to avoid paying the people who worked for you.
70	63	51.5	83.5	11.5		Sabotaging a competitor's work.
71	81.5	73	73	8.5		Setting up fake accounts for your customers in order to increase profits.
72	81.5	23	93	58.5		Publicly exposing yourself.
73	100.5	101	37	-0.5	Donald Trump	Enter the occupied changing room of members of the opposite sex without consent.
74	49.5	83	77.5	-33.5	Clinton/Trump	A politician accepting money from special interest groups.
75	37	74	99	-37		A police officer accepting a bribe in exchange for not writing a ticket.
76	106.5	81	67	25.5		Stealing someone's car.
77	95	95.5	66	-0.5		Driving drunk or high.
78	66.5	77.5	80	-11	IRS	Using public funds designated for tax collection to produce a parody video.
79	57.5	92	77.5	-34.5		Using racial epithets.
80	57.5	33.5	94	24		Cheating on a test.
81	90	59	74	31		Not giving food to a starving person in front of you.
82	63	85	62	-22		A politician lying or falsely representing half (50%) of his/her public statements.
83	95	72	86.5	23	Anthony Weiner	A married person sending photographs of his/her genitalia to someone who is not that person's spouse.
84	111.5	67	82	44.5		Giving a blind person inaccurate change because they cannot tell the difference.
85	106.5	84	71	22.5		Urging someone who is attempting to remain sober to take a drink.
86	93	70	83.5	23		Without authorization setting up and using a credit card in another person's name.
87	100.5	45.5	106	55		Masturbating in public.
88	84.5	86.5	77.5	-2		Using public resources entrusted to your care to enrich yourself.
89	108	75	69	33		Mocking a disabled person for a physical handicap.
90	84.5	77.5	85	7		Bribing a public official to enrich oneself.
91	109	105.5	59	3.5	Donald Trump	Setting up a fake educational institution in order to enrich yourself.
92	37	94	111.5	-57		Torturing criminals as punishment for crimes.
93	90	88	95	2	Bill Clinton	Cheating on your wife/husband.
94	69.5	102	91	-32.5		Stalking someone you know in order to intimidate that person.
95	88	55	111.5	33	Doctor Kevorkian	Killing someone who is in pain and going to die within the next six months and wants help dying.
96	98	90.5	86.5	7.5	Tonya Harding	Hiring someone to break the leg of a rival athlete.
97	74.5	90.5	101	-16		Using threats of lawsuits to silence a woman who claims to have been assaulted by you.
98	114	82	98	32	Hillary Clinton	Using a private email server resulting in the risk of compromised national security.
99	87	97	107.5	-10		Writing laws to prevent people who do not support you from voting.
100	74.5	107	105	-32.5		Requiring a starving person to attend your religious gathering before giving food.
101	97	98.5	104	-1.5	Donald Trump	Using your superior physical strength to force a person to kiss you.
102	86	103.5	107.5	-17.5		Stealing someone's identity in order to commit a crime.
103	103	80	113.5	23		Burning down your house to claim the insurance money.
104	111.5	114	90	-2.5		Using public resources entrusted to your care to enrich an ally or friend.
105	100.5	115	97	-14.5		Preventing someone from registering their child in your school because of that person's skin color.
106	116	109	100	7	Adolf Hitler	Encouraging racially motivated violence to promote your political aspirations.
107	104	120	102	-16		Bribing a public official to protect oneself.
108	74.5	118	115	-43.5		Rejecting someone's rental application because of that person's skin color.
109	90	124.5	109	-34.5	Bill Cosby	Sneaking a drug into someone's food or drink in order to force that person to have sex with you.
110	119	128	88	-9		Killing a healthy and well behaved pet because you don't have the time to take care of it.
111	100.5	105.5	110	-5		Stealing someone's needed pain medication for personal use.
112	139.5	123	96	16.5		Killing a healthy and well behaved pet because you find the animal annoying.
113	143	103.5	92	39.5	Edward Snowden	Publicizing national security secrets which might result in lives being lost.
114	114	95.5	121.5	18.5	Donald Trump	Threatening to sue someone in order to keep them from telling the truth.
115	132	124.5	103	7.5		A person of influence encouraging a crowd to physically attack someone or a group of people he/she does not like.
116	69.5	117	127	-47.5	George W Bush	Torturing terrorists with the hope of gathering information about future terrorist plots.
117	121	77.5	143.5	43.5		Ordering the assassination of a dictator.
118	105	120	126	-15		Recording someone having sex without their knowledge.
119	114	122	121.5	-8		Restricting the use of life saving technology in order to increase profits.
120	136	108	116.5	28		Misrepresenting the effectiveness of a life saving drug in order to increase profits.
121	133	98.5	131	34.5		Killing someone who is in pain and going to die within the next six months but does not want to die.
122	117	110	146.5	7		Not reporting an instance of known child abuse when you are legally mandated to report.
123	130.5	136.5	113.5	-6		Physically abusing your spouse.
124	128.5	112	125	16.5		Sending someone's spouse to the front line to die in order to marry the surviving widow.
125	122	135	120	-13		Hiding information about the health risks of a deadly product you sell.
126	124.5	134	123.5	-9.5		Stealing large sums of money from a mentally infirm client.
127	110	144	135	-34		A medical doctor refusing to treat a patient who needs urgent care because they are unable to pay.
128	119	129	123.5	-10		Significantly raising the price of a life saving drug in order to increase profits.
129	139.5	131	116.5	8.5		Viewing child pornography.
130	141.5	138	118	3.5		Killing a healthy and well behaved pet for fun.
131	119	143	128	-24		Someone with a known dangerous sexually transmitted disease having unprotected sex with someone who is unaware of the condition.
132	123	133	131	-10		Using your superior physical strength to hold an unwilling person while you grabbed their genitalia.
133	128.5	116	131	12.5	Brock Turner	Taking and sharing naked pictures of a person who is unconscious.
134	124.5	131	136.5	-6.5	Brock Turner	Have involuntary sex with a person you find unconscious.
135.5	130.5	112	136.5	18.5	Vladimir Putin	Assassinating a political rival.
135.5	127	145	129	-18		Physically forcing your spouse to have sex with you.
137	95	149.5	141.5	-54.5	Harry S Truman	Dropping an atomic bomb on a civilian city controlled by a rival nation
138	148.5	139	119	9.5		Allowing the executing of a convict if you have secret knowledge of the person's innocence.
139.5	138	120	149	18		A 40 year old having sex with a fifteen year old.
139.5	148.5	136.5	148	12		Using your power to pressure an employee to have sex with you.
141	137	140	141.5	-3		Not reporting an instance of known child sexual abuse.
142	148.5	126	138	22.5		Killing innocent people to further your political agenda.
143	144.5	142	139.5	2.5	Adolf Hitler	Encouraging people who follow you to attack and kill people you do not like.
144	144.5	127	139.5	17.5		Killing your spouse to claim the life insurance premium.
145	134.5	141	146.5	-6.5		Killing someone for their money.
146	126	131	145	-5	George Washington	Possessing a personal slave who has no human rights.
147	134.5	147	134	-12.5	Donald Trump	Groping without warning the genitalia of someone else.
148	148.5	146	133	2.5		Making child pornography.
149	141.5	148	143.5	-6.5		Killing innocent people for personal reasons such as curiosity or entertainment.
150	146	149.5	150	-3.5		Killing somebody to protect your reputation.

* Please note that I am making no claim to the veracity of the scandal.
** Upon viewing this list I realize that there are a number of scandals I forgot to include related to Hillary. I am unconvinced that any of them would have been ranked very high on the list. However they should have been included. Sorry team Trump.

General Differences
Looking at the table we can see that it generally agrees with our intuition, with innocuous or generally non-offensive actions ranked at the top and actions that involve doing significant harm to others ranked at the bottom.

Interestingly, the rankings of Trump and Clinton supporters generally agree, with an 84% correlation in ranking values.

From the differences in rankings of actions between Clinton and Trump supporters we start to get an idea of how these two different groups think. The largest difference is 68 ranks, with Trump supporters much more accepting than Clinton supporters of "Publicly renouncing homosexuality while secretly having sex with gay prostitutes". Clinton supporters seem less tolerant of deception in general, ranking exaggerations of penis size and the writing of false reviews on business webpages as much more offensive than Trump supporters do.

Clinton supporters also find it generally more offensive to do harm to others such as dropping the atomic bomb on civilian populations and torturing terrorists or criminals.

The two camps have fiercely different perspectives on money. Clinton supporters are very concerned with the perceived injustices caused by unequal access to resources. These supporters are much more prone to rank as more offensive inequality in earnings as well as politicians accepting money from special interest groups. These supporters also find it much more offensive for a doctor to refuse treatment on the basis of insufficient funds.

There also appears to be a difference in how concerned the two groups are with racial inequality, with Trump supporters much less concerned about rejecting an applicant due to the color of their skin or about the use of racial epithets.

Scandals
Perhaps unsurprisingly, for the largest scandals Trump supporters and Clinton supporters seem to disagree about how offensive the actions of their candidates are. Clinton supporters find the surprise groping of genitalia one of the worst actions someone can take, while Trump supporters place it a bit lower on the list. For Trump supporters, improperly storing national secrets and using a private email server are ranked as much more offensive than for Clinton supporters.

Summary
While Clinton and Trump supporters mostly agree in how they rank objectionable actions, they do disagree in some areas, in ways that seem consistent with how the two groups are popularly represented. Clinton supporters are more concerned with economic, social, and political justice. Trump supporters are more concerned with protecting economic rights as well as individual freedoms such as the right to offend others through word or deed.

Thursday, April 14, 2016

Calculating Average Consumption From One Week of Purchases

A number of large surveys have attempted to quantify consumer consumption from a limited period of time observed. This task can be fairly complex as it is fraught with potentially large difficulties in directly observing who is consuming what. Rather than use this expensive method, some researchers have attempted to substitute more easily observed purchase patterns, inferring that in general households are going to consume what they purchase.

In order to aid in this analysis researchers collect data on both what is purchased and over what period of time it is to be consumed, for instance today (1) or over the next week (7).

Yet purchase patterns can be difficult to work with. Typically household purchases do not map perfectly to household consumption. For one, households can consume stocks from previous weeks. Likewise, households can purchase food to be held in stock for future weeks.

In order to adjust for missing consumption levels we want to adjust consumption to account for both the food items that will not all be consumed during the week of observation
(1)
$$ C_{current.purchases} = C_{purchase} \frac{days.remaining.in.observation.period}{days.expected.to.consume}$$

as well as the food items that were purchased the previous week and consumed this week. We can calculate the probabilities of observing an individual outcome in the following way:
(2)
$$ P_{observing.purchase} = \frac{observation.period}{days.expected.to.consume}$$

Note that if the probability of observing a particular purchase works out to be greater than 1, it need only be set to 1, since in that case it is likely that this particular purchase will appear one or more times in our data.

Now we can combine (1) and (2) by dividing the current purchases by the likelihood of observing those purchases.

(3)
$$ E(C_{current.purchases}) = C_{purchase} \frac{days.remaining.in.observation.period}{days.expected.to.consume}/\frac{observation.period}{days.expected.to.consume}$$
$$=C_{purchase} \frac{days.remaining.in.observation.period}{observation.period}$$

Equation (3) applies if the probability in (2) is less than 1; otherwise we can use equation (1).
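Equations (1) through (3) can be written as a single function. This is a sketch of my reading of them; the function and argument names are hypothetical, and the default observation period of 7 days is assumed from the one-week setting.

```r
# Expected consumption attributable to a purchase, following equations (1)-(3)
expected_consumption <- function(purchase, days_remaining, days_to_consume,
                                 observation_period = 7) {
  p_observe <- observation_period / days_to_consume           # equation (2)
  current   <- purchase * days_remaining / days_to_consume    # equation (1)
  ifelse(p_observe < 1,
         purchase * days_remaining / observation_period,      # equation (3)
         current)
}

# A 30-day good bought on the first day of the week: equation (3) applies
print(expected_consumption(purchase = 4, days_remaining = 7, days_to_consume = 30))

# A 3-day good bought with 2 days left in the week: equation (1) applies
print(expected_consumption(purchase = 6, days_remaining = 2, days_to_consume = 3))
```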

Finally, in order to calculate average consumption we take the daily average of our estimated expected consumption levels, right?

Not even close. This only begins to capture the problem as we have multiple purchases often on different days consumed in different patterns throughout the week.

In order to get us closer to the appropriate level of estimated consumption we need to both infer the missing consumption as well as spread out the observed consumption so that when we look at daily averages good A purchased on day 1 with an expected consumption period of 1 week will also be included with good B purchased on day 7.

In order to explore how to estimate consumption from only observing a limited period of time I have written a simulation testing four methods of estimation. The true consumption level for any individual is 1 unit. If there are multiple goods consumed then that 1 unit of consumption is spread across all goods so that every day only one unit is consumed.

Using only one good we get the following results. M1 is just taking the mean consumption if we divide quantity of goods purchased by number of days expected to consume. M2 is adjusting consumption by the inverse of the likelihood of observing that consumption. M3 is spreading consumption across all of the days of the week observed. M4 is both adjusting by likelihood of observations and spreading consumption across days of the week observed.

Table 1: Sim # is the simulation number, # Items is the number of different food items purchased, and C Spread is the number of days over which consumption of that item is spread. All values are simulated 250 times.

Sim	# Items	C Spread	M1	M2	M3	M4
1	1	1	1.00	1.00	1.00	1.00
2	1	2	1.00	1.00	1.00	1.00
3	1	3	1.00	1.00	1.00	1.00
4	1	5	1.00	1.00	1.00	1.00
5	1	6	1.00	1.00	1.00	1.00
6	1	7	1.00	1.00	1.00	1.00
7	1	8	0.88	1.01	0.88	1.01
8	1	9	0.74	0.95	0.74	0.95
9	1	10	0.70	1.00	0.70	1.00
10	1	15	0.42	0.89	0.42	0.89
11	1	20	0.35	1.01	0.35	1.01

Notice that with only 1 item consumed M1 and M3 are equivalent and M2 and M4 are equivalent. We can see that M2 and M4 provide much better estimates of expected consumption than M1 and M3 when consumption of a good is spread over more than the one-week observation period.
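The single-good pattern in Table 1 can be reproduced in spirit with a much simpler sketch than the full simulation. The setup here is my assumption, not the original simulation code: each household buys `spread` units every `spread` days (so true consumption is 1 unit per day), the timing of the next purchase is uniformly random, and the household is observed for 7 days. Only M1 and M2 are shown since, with one good, M3 and M4 coincide with them.

```r
# Single-good sketch of M1 (no adjustment) and M2 (adjusted by the probability
# of observing the purchase). sim_single and its arguments are hypothetical.
sim_single <- function(spread, n_households = 10000, obs_period = 7) {
  # Days from the start of the week until the next purchase: 0, ..., spread - 1
  offset   <- sample.int(spread, n_households, replace = TRUE) - 1L
  observed <- offset < obs_period             # purchase falls inside the week
  quantity <- spread                          # units bought per purchase
  m1 <- observed * quantity / spread          # quantity / days expected to consume
  p_observe <- min(1, obs_period / spread)    # equation (2), capped at 1
  m2 <- m1 / p_observe
  c(M1 = mean(m1), M2 = mean(m2))
}

set.seed(42)
print(round(sapply(c(1, 5, 7, 10, 20), sim_single), 2))
```

Under these assumptions M1 falls toward 7 / spread once the spread exceeds the observation week, while M2 stays near 1, which matches the direction of the results in Table 1.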

Things get much more difficult when we include other goods in our calculation.

Table 2: Equivalent to Table 1 except now multiple items are being purchased at different periods (identified as # Items). In this table C Spread refers only to the first item. The remaining items are drawn randomly from the possible consumption spreads with much greater weight applied to lower consumption levels.

Sim	# Items	C Spread	M1	M2	M3	M4
12	2	1	0.72	0.72	0.90	0.91
13	2	2	0.71	0.71	0.90	0.91
14	2	3	0.71	0.71	0.91	0.92
15	2	5	0.64	0.65	0.87	0.89
16	2	6	0.66	0.67	0.83	0.84
17	2	7	0.60	0.60	0.80	0.81
18	2	8	0.59	0.62	0.78	0.83
19	2	9	0.59	0.65	0.74	0.83
20	2	10	0.58	0.65	0.70	0.81
21	2	15	0.58	0.71	0.64	0.83
22	2	20	0.54	0.69	0.59	0.83
23	3	1	0.62	0.62	0.86	0.88
24	3	2	0.61	0.62	0.88	0.89
25	3	3	0.58	0.59	0.87	0.88
26	3	5	0.53	0.53	0.83	0.84
27	3	6	0.55	0.55	0.81	0.82
28	3	7	0.50	0.51	0.78	0.79
29	3	8	0.52	0.53	0.78	0.82
30	3	9	0.52	0.54	0.75	0.81
31	3	10	0.50	0.54	0.72	0.80
32	3	15	0.49	0.55	0.68	0.82
33	3	20	0.47	0.53	0.63	0.76
34	4	1	0.58	0.58	0.84	0.85
35	4	2	0.55	0.55	0.85	0.86
36	4	3	0.52	0.52	0.84	0.85
37	4	5	0.48	0.48	0.81	0.82
38	4	6	0.49	0.49	0.80	0.82
39	4	7	0.46	0.46	0.77	0.78
40	4	8	0.46	0.48	0.76	0.79
41	4	9	0.48	0.49	0.76	0.81
42	4	10	0.48	0.50	0.73	0.79
43	4	15	0.45	0.49	0.70	0.79
44	4	20	0.44	0.47	0.68	0.78
45	5	1	0.56	0.57	0.85	0.86
46	5	2	0.52	0.52	0.84	0.85
47	5	3	0.49	0.49	0.83	0.84
48	5	5	0.45	0.45	0.79	0.80
49	5	6	0.45	0.46	0.78	0.79
50	5	7	0.45	0.45	0.78	0.79
51	5	8	0.44	0.45	0.76	0.78
52	5	9	0.44	0.45	0.76	0.79
53	5	10	0.44	0.45	0.74	0.79
54	5	15	0.44	0.46	0.71	0.77
55	5	20	0.42	0.45	0.69	0.76
56	6	1	0.52	0.53	0.83	0.85
57	6	2	0.49	0.49	0.83	0.85
58	6	3	0.47	0.48	0.82	0.83
59	6	5	0.44	0.45	0.79	0.81
60	6	6	0.44	0.45	0.80	0.81
61	6	7	0.43	0.44	0.78	0.79
62	6	8	0.45	0.46	0.76	0.79
63	6	9	0.44	0.45	0.77	0.80
64	6	10	0.42	0.44	0.73	0.78
65	6	15	0.43	0.45	0.74	0.82
66	6	20	0.41	0.43	0.70	0.76

When multiple items are consumed simultaneously, spreading consumption out across all days observed becomes increasingly important. This is because daily consumption needs to be calculated as the sum of goods consumed each day, averaged across the number of days observed. Thus, while M2 does very well in Table 1, in Table 2 M3 and M4 do much better than either M1 or M2, and M4 does slightly better than any other method at approximating total consumption.

Figure 1: Estimator performance given different item consumption spreads. The above values are for the estimator value averaged across between 1 and 6 items consumed with only the first item being at that particular spread value. M is method 1 through 4 described above.

It is worth noting that all of the methods underestimate total consumption though M4 does the best at adjusting for the missing data problem.

There are some things to consider when estimating consumption data in this way. One important thing is that if consumption tends to be for goods consumed over a long period of time then using anything but directly dividing by the period of time expected to be consumed over is going to give some pretty lumpy values.

For instance, imagine someone buys four liters of oil which they expect to consume over the next 30 days. Sure, on average, in order to account for the oil not observed for the many other similar people who bought their oil in previous periods, you may want to divide the oil not by the thirty days (4L/30days) but by the probability adjustment value from equation (3).

Thus you get (4L/7days). Averaging across four similar people who did not happen to purchase oil you approximate the population consumption level. (1L/7days*1/4=1/28). Thus on average for the population estimate, you are pretty close. However for that one guy in your data you now have one person who looks like they are consuming 4/7 of a liter of oil per day.

When screening your data for outliers this oil consumption positively pops out of the page at you. So you figure it is some kind of recording error and replace it using population estimates.

But the problem here is entirely created by the method used to infer consumption levels. If instead you had taken the consumer at his/her reported level and said that average consumption for that individual is 4L/30days or 2/15 liters per day then you would never need to substitute out this particular outlier because it would not exist in the data in the first place.

If you would like to review the R simulation used to generate these results you can find it here.

Thursday, February 18, 2016

Legally Rig An Election: A Citizen's Guide to Gerrymandering

You are running for class president against a pimpled-nosed, blond barbarian. You have given your best speech and your obnoxious opponent has given his best speech. The teacher is about to call on the class to vote! The time of reckoning is upon you.

As she is just announcing a show of hands in support of your opponent, you count in your head: three of your friends clearly in favor of you, three of your opponent's friends, and three undecided classmates in the middle, each with roughly a 50% chance of voting for you.

If the vote is held now, who gets elected will be entirely up to those three undecided/independent voters. But wait! You have an idea, and you shout out: "Wait, wait! I have a fun idea. How about we vote in three groups of three? The winner of the most groups wins the election."

The teacher, unaware of your wily ways, shrugs.

You act quickly to divide the room into three groups of three.
1. All three of your opponent's friends you place in one group.
2. Two of your friends you place in another group with one undecided.
3. The remaining friend you place with the two remaining undecided.

You signal happily to the teacher that you are now ready for the votes to be cast. Remember, previously, you calculated the chance of winning as 50-50. Now, you calculate the chance of winning as 75%.

Really?

Well, let's count the new probable outcomes for each voting group/district.
1. The group with all of your opponent's friends will vote for him.
2. The group with two of your friends will 50% vote 3/3 for you and 50% vote 2/3 for you which means they will go for you.
3. The final group with one of your friends and two undecided is where the action is. You know your friend will vote for you. So there are two remaining random votes. There is a 25% chance they will both vote for you, so you get 3/3 and win. There is also a 50% chance only one of them will vote for you, so you get 2/3 and still win. Thus there is only one outcome remaining in which you lose. That is if both undecided voters vote against you. This only happens 25% of the time (50% x 50%).
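The 50% and 75% figures can be checked by listing every way the three undecided classmates might vote. The following is a minimal Python sketch, assuming each undecided classmate votes for you independently with probability 50%; the function and variable names are illustrative only.

```python
from itertools import product

def win_probabilities():
    """Enumerate the 8 equally likely ways the three undecided classmates can vote."""
    outcomes = list(product([0, 1], repeat=3))  # 1 = votes for you, 0 = against
    single_vote_wins = 0
    grouped_wins = 0
    for u2, u3a, u3b in outcomes:
        # One big vote: your 3 friends plus the undecided votes, 5 of 9 needed.
        if 3 + u2 + u3a + u3b >= 5:
            single_vote_wins += 1
        # Grouped vote: a group is won with at least 2 of its 3 votes.
        group1_won = False                  # three of your opponent's friends
        group2_won = 2 + u2 >= 2            # two friends + one undecided
        group3_won = 1 + u3a + u3b >= 2     # one friend + two undecided
        if group1_won + group2_won + group3_won >= 2:
            grouped_wins += 1
    n = len(outcomes)
    return single_vote_wins / n, grouped_wins / n

before, after = win_probabilities()
print(before, after)  # 0.5 0.75
```

Because the undecided voters are treated as fair coin flips, all eight outcomes are equally likely and simple counting is enough.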

From this voting system you may have noticed something. By grouping classmates in this way, it is possible to win the class election without getting the majority of the votes. To see this, let's imagine that the independent in group 2 and one of the independents in group 3 both vote against you. The total votes against you are 5 while the votes for you are 4. But because you carefully constructed the groups, you win two groups out of three and still win the election.

At this point, the teacher would not be happy but it is what she agreed to so...tough luck!

So how did this happen?

There are a few ways to look at this. One is that before Gerrymandering (regrouping into districts) you had three people who could vote against you. Now, you have stuck one of those voters with two sympathetic voters, causing that vote to no longer count. All that remains is two voters who both need to vote against you in order to counteract the effect of having your friend in Group 3.

This idea of rigging a class election may seem absurd, but it is exactly the kind of thing establishment figures do when they rewrite voting district lines to include or exclude groups depending upon how they affect the likely voting outcome.

Redistricting, though, often involves the votes of thousands of people. The concept is the same.

I have written a small simulation showing how effective Gerrymandering can be on slightly larger scales. Within the simulation I set up 10 voting districts. In each of these districts there are 40 voters. Each district goes to whoever wins the most votes within that district. A total win is calculated by whoever wins the most districts.

Figure 1: A grid display of 400 voters when there is no Gerrymandering. Each square represents a different voter. Lighter colored squares have a higher likelihood of voting for you. Darker colored squares have a higher likelihood of voting against you.

Looking at Figure 1, we see that without Gerrymandering there are naturally some districts that seem more likely to vote for us and some that seem less. The method of gerrymandering I propose in order to try to increase the likelihood of us winning is a simple method of reorganizing the vote. First we sort all of the individuals by likelihood of voting for us. Next we group those who are the least likely to vote for us into "Gerrymandered districts". The rest we distribute randomly. Of course we should not really be calling any district in particular Gerrymandered because really the whole population has been Gerrymandered.
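The sort-and-pack procedure described above can be sketched in a few lines of Python. This is not the simulation linked from this post. It is a minimal sketch under stated assumptions: each voter's chance of backing us is drawn from a Beta distribution whose mean is `lean`, and a tied district or a tied election counts as half a win. The function name, parameters, and tie rule are all hypothetical.

```python
import random

def simulate_win_rate(n_gerrymandered, lean=0.5, n_districts=10,
                      district_size=40, n_runs=200, seed=1):
    """Share of simulated elections won when the voters least likely to
    support us are packed into the first `n_gerrymandered` districts."""
    rng = random.Random(seed)
    n_voters = n_districts * district_size
    n_packed = n_gerrymandered * district_size
    wins = 0.0
    for _ in range(n_runs):
        # Assumed preference model: Beta draws with mean `lean`.
        probs = sorted(rng.betavariate(2 * lean, 2 * (1 - lean))
                       for _ in range(n_voters))
        packed = probs[:n_packed]   # least likely supporters, packed together
        rest = probs[n_packed:]     # everyone else, distributed randomly
        rng.shuffle(rest)
        voters = packed + rest
        districts_won = 0.0
        for d in range(n_districts):
            block = voters[d * district_size:(d + 1) * district_size]
            votes = sum(rng.random() < p for p in block)
            if 2 * votes > district_size:
                districts_won += 1
            elif 2 * votes == district_size:
                districts_won += 0.5    # assumption: a tie is half a district
        if 2 * districts_won > n_districts:
            wins += 1
        elif 2 * districts_won == n_districts:
            wins += 0.5                 # assumption: a tied election is half a win
    return wins / n_runs

for k in range(0, 6):
    print(k, simulate_win_rate(k))
```

Changing `lean` shifts the popular vote toward or away from us, which is how the voter preference scenarios could be varied in a sketch like this one.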

Figure 2: A grid display of 400 voters where the first two districts have been "Gerrymandered". Each square represents a different voter. Lighter colored squares have a higher likelihood of voting for you. Darker colored squares have a higher likelihood of voting against you.

Using this simple method of Gerrymandering for two districts, we can see extreme changes in likely voting outcomes for Districts 1 and 2 (Figure 2). What is less easy to observe but even more important are the subtle changes in Districts 3 through 10. Compared with Figure 1, these districts have now lost the voters previously much more likely to vote against you. This makes each of these 8 districts slightly more likely to vote for you.

The net effect is a certain loss in Districts 1 and 2 but a better than prior outcome in the remaining districts (you actually almost always win in this scenario if the popular vote is split 50-50). In order to calculate expected outcomes, I repeat the simulation 200 times under each voter preference scenario (lean) and gerrymandering scenario and average the number of wins across simulations.

Figure 3: For each number of Gerrymandered districts, outcomes are averaged over 200 simulation runs. The orange line that crosses the 50% mark at 0 indicates the expected outcome when there is no Gerrymandering.

From Figure 3 we begin to see how effective gerrymandering can be on the likelihood of winning. By using our method to gerrymander districts we seem to gain about 4 percentage points for each district we gerrymander up until we have four districts "gerrymandered". At the most favorable outcome, when we have gerrymandered four districts, we have effectively gained a 16 point lead against our rival. This means the popular vote could be 58-42 in favor of our rival and we could still win the outcome about half the time.

Something interesting happens at the fifth district gerrymandered. We have now gerrymandered half of the districts. At this last step we now are penalized for gerrymandering by 5 points. I do not have an easy explanation at this time.

Conclusion
In reality the practice of gerrymandering is much more complicated and subtle than this. There are typically restrictions on how you can group individuals, usually based on geography. Likewise you don't know how individuals are going to vote but you may have a pretty good idea how certain demographics are likely to vote. This complicates the methods. But I am sure there are "social engineering" or "vote engineering" firms out there that are able to exploit some of the strategies outlined here and other strategies in order to maximize the effect of gerrymandering.

That said, the classroom example and the simulations above capture the essence of gerrymandering. Gerrymandering is the practice of grouping voters together in such a way as to prevent those who are voting against you from having a vote. As such, gerrymandering is an enemy to democracy.

It is typically used by establishment candidates to insulate themselves from challengers. This allows establishment figures to feel that their seat is safe even when they accept private or political payoffs for voting consistently in ways which are against the good of their constituency.

(Code on Github)

Tuesday, January 19, 2016

Who are Turkopticon's Top Contributors?

In my most recent post "Turkopticon: Defender of Amazon's Anonymous Workforce" I introduced Turkopticon, the social art project designed to provide basic tools for Amazon's massive Mechanical TURK workforce to share information about employers (requesters).

Turkopticon has been a runaway success, with nearly 285 thousand reviews submitted by over 17 thousand reviewers since its inception in 2009. Collectively these reviews make up 53 million characters, which maps to about 7.6 million words at 5 letters per average word plus two spaces. At 100 words every 7 minutes this represents approximately 371 days collectively spent just writing reviews. It is probably safe to consider this an underestimate.
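The back-of-the-envelope estimate above can be reproduced as follows. This is a small Python sketch using only the rounded figures quoted in the text, so it lands a few days short of the 371 days stated, which was presumably computed from unrounded counts.

```python
# Rounded inputs taken from the paragraph above.
total_chars = 53_000_000      # characters across all reviews
chars_per_word = 5 + 2        # 5 letters per word plus two spaces
minutes_per_100_words = 7     # assumed writing speed from the text

words = total_chars / chars_per_word
minutes = words / 100 * minutes_per_100_words
days = minutes / (60 * 24)

print(round(words / 1e6, 1), "million words")
print(round(days), "days of continuous writing")
```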

So given this massive investment of individuals in writing these reviews, I find myself wanting to ask, "who is investing this kind of energy producing this public good?"

In general, while there are many contributors, the top 500 contributors account for 54% of the reviews written, the top 100 reviewers for 30%, and the top 15 for 11%.

Figure 1: Using this graph we can find the Gini coefficient for number of submissions at around 82% indicating that a very few individuals are doing nearly all of the work.
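For readers unfamiliar with the Gini coefficient, it can be computed directly from the number of reviews per reviewer. The following is a minimal Python sketch using the standard sorted-rank formula; the review counts in the example are made up for illustration and are not the Turkopticon data.

```python
def gini(counts):
    """Gini coefficient of non-negative counts.
    0 means everyone contributes equally; values near 1 mean a few
    individuals contribute almost everything."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, ranks starting at 1.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

# Hypothetical review counts for ten reviewers.
example_counts = [1, 1, 1, 2, 2, 3, 5, 10, 40, 200]
print(round(gini(example_counts), 2))
```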

Within Turkopticon there is no ranking system for reviewer quality, so it is not obvious who the top contributors are and what their reviewing patterns look like. In this article we will examine some general features of the top contributors.

Table 1: A list of the Top 15 Turkopticon review contributors. Rank is the reviewer rank by number of reviews written. Name is the reviewer's name. Nrev is the number of reviews written. DaysTO is the number of days between the oldest review and the most recent review. Nchar is the average number of characters written in each review. FAIR, FAST, PAY, and COMM are the quantitative scales on which Turkopticon asks reviewers to rate requesters. Fair indicates how fair the requester was in rejecting or not rejecting work. Fast indicates how quickly the requester approved or rejected work. Pay indicates how the reviewer perceived the payment for the work. Comm refers to communication: if the worker attempted to communicate with the requester, how well that requester addressed the worker's concerns.

Rank	Name	Nrev	DaysTO	Nchar	FAIR	FAST	PAY	COMM
1	bigbytes	5236	294	219	4.89	4.99	2.74	3.26
2	kimadagem	3732	327	490	4.98	4.95	3.23	2.45
3	worry	2637	649	186	4.97	4.87	3.29	3.84
4	jmbus...@h...	2539	538	110	3.10	3.05	3.08	1.55
5	surve...@h...	2488	177	344	4.85	4.77	4.13	4.27
6	jaso...@h...	2100	260	78	4.98	4.90	4.78	4.73
7	shiver	1721	303	139	4.94	4.89	4.44	3.81
8	Thom Burr	1594	434	288	4.69	4.81	4.54	3.52
9	jessema...@g...	1539	467	157	4.96	4.70	3.64	4.00
10	absin...@y...	1320	309	75	4.97	4.91	3.99	3.78
11	Rosey	1313	634	101	4.80	4.76	4.35	4.18
12	CaliBboy	1281	83	201	4.02	4.07	2.71	3.84
13	ptosis	1278	367	110	3.00	3.04	2.89	3.29
14	NurseRachet (moderator)	1274	669	351	4.76	4.70	3.91	3.72
15	TdgEsaka	1234	523	258	4.75	4.81	3.73	3.00

Find the full list as a google document here (First Tab).

From Table 1 we can see that all of the top 15 reviewers have contributed over 1,200 reviews, with bigbytes being the most prolific reviewer, contributing over 5,200. In terms of time active on Turkopticon, NurseRachet (a forum moderator) has been on the longest, followed by worry and Rosey. In terms of the longest winded, kimadagem has the longest average character count per review at 490 characters, or approximately 70 words per review, while absin...@y... has the shortest reviews at only 75 characters, or around 10 words.

In terms of the averages of the four rating scales, there is a fair bit of diversity between the top reviewers, with jaso...@h... having the highest average score across the four scales at 4.8 and jmbus...@h... having the lowest at around 2.7, followed by ptosis with an average a tiny bit higher than 3.

So now we have a pretty good idea of what in general the top contributors to Turkopticon look like.

But what of the quality of the contributions?

In order to understand what a quality contribution in Turkopticon looks like we must consider the standards that the community has come up with after years of trial and error.
1. The four different scales should be distinct categories. That is, a high pay rate should not cause someone to automatically give a high Fairness rating or vice versa.
2. To this end, what are referred to as 1-Bombs, attempts to artificially drop a requester's score by rating all scales 1, should be avoided. Similarly, 5-Bombs should also be avoided.
3. Within Turkopticon there is also the ability to flag reviews as problematic. If one of your reviews is flagged, it means someone has a problem with it.
4. In general we would like reviews to be approached with a level head so that reviewers write independent reviews rather than ones based on their current mood.
5. Finally, in general we would like reviewers to review as many categories as they can when writing reviews.

Variables
From these 5 guidelines, I will attempt to generate variables that measure each of these targets.
1. For different scales I will focus on the relationship between pay and the other three scales for individual requesters (FairPay, FastPay, and CommPay for the correlations between Fair, Fast, and Comm with pay respectively). The reason I focus on Pay is that it seems to be the scale often times that concerns Mturk workers the most.

Table 2: For reviewers the average correlation between Pay and other scales.

	ALL	Top 100	Top 15
FAIRPAY	0.80	0.56	0.44
FASTPAY	0.73	0.48	0.37
COMMPAY	0.81	0.67	0.61

From Table 2 we can see that the average reviewer has a very strong positive correlation between Pay and the other scales with FAIR, FAST, and COMM in the .73-.81 range. In contrast the Top 100 and especially the Top 15 all have much lower correlations. We should not necessarily hope for a zero correlation between these factors since one might expect a requester who pays too low might also act unfairly, not respond quickly to submissions, or have poor communication habits.
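The per-reviewer correlations summarized in Table 2 could be computed along these lines. This is a minimal Python sketch, assuming each review is a record of the four scale ratings with `None` for a scale left blank; the field names and the sample reviews are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation over pairs where both ratings are present.
    Returns None when it is undefined (too few pairs or no variation)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sqrt(sxx * syy)

# Hypothetical reviews written by a single reviewer.
reviews = [
    {"fair": 5, "fast": 5, "pay": 2, "comm": None},
    {"fair": 4, "fast": 5, "pay": 4, "comm": 3},
    {"fair": 5, "fast": 4, "pay": 5, "comm": 5},
    {"fair": 2, "fast": 3, "pay": 1, "comm": 1},
]
pay = [r["pay"] for r in reviews]
fair_pay = pearson([r["fair"] for r in reviews], pay)
fast_pay = pearson([r["fast"] for r in reviews], pay)
comm_pay = pearson([r["comm"] for r in reviews], pay)
print(fair_pay, fast_pay, comm_pay)
```

Averaging these values across reviewers would then give one column of a table like Table 2.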

2. 1-Bombs and 5-Bombs are easy to observe in the data as reviews with all 1s or all 5s. However, it is worth noting that a review of all 1s or all 5s might actually be valid given the circumstances. The variables 1Bomb and 5Bomb measure the likelihood that an individual's review falls into either of the two categories.

3. Flags are also a variable that can be directly observed. Multiple flags can be attached to a single review. The most-flagged review in my data has 17 flags. The variable FLAG is the average/expected number of flags for an individual reviewer's reviews.

Table 3: The prevalence rates of 1-Bombs, 5-Bombs, and Flags.

	ALL	Top 100	Top 15
1BOMB	0.192	0.038	0.019
5BOMB	0.179	0.049	0.025
FLAGS	0.014	0.005	0.005

From Table 3 we can see that the prevalence rates of 1-Bombs, 5-Bombs, and Flags are much higher among general reviewers than among the Top 100, and especially the Top 15.
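The 1-Bomb and 5-Bomb rates for a single reviewer could be computed as in the following Python sketch. It assumes that only reviews with all four scales completed can count as bombs, and it uses hypothetical field names and sample data.

```python
SCALES = ("fair", "fast", "pay", "comm")

def bomb_rates(reviews):
    """Return (1-Bomb rate, 5-Bomb rate) among reviews with every scale rated.
    Assumption: a review with any blank scale is not counted as a bomb."""
    complete = [r for r in reviews
                if all(r.get(s) is not None for s in SCALES)]
    if not complete:
        return None, None
    ones = sum(all(r[s] == 1 for s in SCALES) for r in complete)
    fives = sum(all(r[s] == 5 for s in SCALES) for r in complete)
    return ones / len(complete), fives / len(complete)

# Hypothetical reviews written by a single reviewer.
sample = [
    {"fair": 1, "fast": 1, "pay": 1, "comm": 1},     # 1-Bomb
    {"fair": 5, "fast": 5, "pay": 5, "comm": 5},     # 5-Bomb
    {"fair": 5, "fast": 4, "pay": 3, "comm": 4},
    {"fair": 5, "fast": 5, "pay": 2, "comm": 5},
    {"fair": 1, "fast": 1, "pay": 1, "comm": None},  # incomplete, ignored
]
one_rate, five_rate = bomb_rates(sample)
print(one_rate, five_rate)  # 0.25 0.25
```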

4. In order to attempt to measure "level-headedness" I will just look at how reviews trend from a rating perspective. That is, is the value of the current review correlated (either positively or negatively) with the value of the next review?

Table 4: The auto-regressive one step correlation between review levels. In this case the "ALL" category only includes the 3,700 reviewers who have written more than 10 reviews.

	ALL	Top 100	Top 15
FAIRar1	0.00	0.10	0.10
FASTar1	0.00	0.09	0.12
PAYar1	0.02	0.10	0.07
COMMar1	-0.07	0.04	0.04

From Table 4 we can see that inter-review correlation is pretty small, especially when compared with the correlation between pay and other scales within the same review (Table 2). Interestingly, for the average reviewer there is almost no correlation across reviews. This might be a result of these reviewers writing fewer reviews in general, thus spacing them more widely and making them less likely to be sequentially influenced by personal psychological trends.

5. Finally, we can easily measure completeness in terms of how frequently the individual scales were completed.
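The one-step correlation reported in Table 4 amounts to correlating each review's rating with the rating in the same reviewer's next review. The following is a minimal Python sketch, assuming the ratings are already in chronological order; the function name and the example series are hypothetical.

```python
from math import sqrt

def lag1_correlation(ratings):
    """Correlation between each rating and the one that follows it.
    Returns None when undefined (too few ratings or no variation)."""
    current, following = ratings[:-1], ratings[1:]
    n = len(current)
    if n < 2:
        return None
    mc = sum(current) / n
    mf = sum(following) / n
    scc = sum((c - mc) ** 2 for c in current)
    sff = sum((f - mf) ** 2 for f in following)
    if scc == 0 or sff == 0:
        return None
    scf = sum((c - mc) * (f - mf) for c, f in zip(current, following))
    return scf / sqrt(scc * sff)

# Hypothetical Pay ratings from one reviewer, oldest first.
pay_ratings = [3, 4, 4, 2, 1, 2, 5, 5, 4, 3]
print(lag1_correlation(pay_ratings))
```

A value near zero suggests that one review's rating says little about the next; a positive value suggests runs of similarly high or low ratings.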

Table 5: The completion rates of individual scales.

	ALL	Top 100	Top 15
FAIRC	0.849	0.665	0.705
FASTC	0.825	0.651	0.695
PAYC	0.901	0.916	0.918
COMMC	0.605	0.147	0.081

From Table 5 we can see that the completion rates of all scales are more or less equivalent between that of the general reviewers and that of the Top 100 and Top 15 except in the case of COMM. In this case we can see that the top reviewers are much less likely to rate communication.

Constructing A Quality Scale

In order to construct the best scale given our data, we will choose those variables and values that seem to be typical of the top 15 most prolific reviewers. From Tables 2 and 3 we can see very distinct differences between the average reviewer and top reviewers. However, for our auto-correlation and completeness rates we see very little difference in general, except that the top reviewers are much less likely to rate communication. I can't know exactly why this is the case, but I suspect it is a combination of top reviewers avoiding 1-Bombs and 5-Bombs and top reviewers finding it not typically worth their time to directly communicate with requesters.

So here is my proposed index using standardized coefficients (x/sd(x)):
ReviewerProblemIndex = 3*Flag + 3*1Bomb + 1/2*5Bomb +
1*FairPay + 1*FastPay + 1*CommPay

Because we have standardized the variables, we can read the scalars in front as directly representing the weight of each variable. Flags I weight the strongest, as they are an indicator that someone in the community has a problem with the review. Next most heavily weighted are 1Bombs, which are widely regarded as a serious problem and frequently discussed on the Turkopticon forum.

5Bombs, FairPay, FastPay, and CommPay are also discussed but not considered as important (Turkopticon Discuss). I have weighted 5Bombs half as heavily as the FairPay, FastPay, and CommPay variables, as it seems cruel to penalize someone for being generous with reviews.
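The index can be sketched in Python as follows. Each variable is divided by its standard deviation across reviewers and then multiplied by the weight from the formula above. The three example rows reuse values from the tables in this post, but since the standard deviations here come from only those three rows, the resulting scores will not match the Index column; the function name is hypothetical.

```python
from statistics import stdev  # sample standard deviation

# Weights taken from the ReviewerProblemIndex formula above.
WEIGHTS = {"Flag": 3, "1Bomb": 3, "5Bomb": 0.5,
           "FairPay": 1, "FastPay": 1, "CommPay": 1}

def reviewer_problem_index(reviewers):
    """Weighted sum of x / sd(x) for each reviewer (higher = more problems)."""
    sds = {k: stdev([r[k] for r in reviewers]) for k in WEIGHTS}
    scores = []
    for r in reviewers:
        # Assumption: a variable with no variation across reviewers is skipped.
        scores.append(sum(w * r[k] / sds[k]
                          for k, w in WEIGHTS.items() if sds[k] > 0))
    return scores

# Three rows for illustration only.
rows = [
    {"Name": "jessema...@g...", "Flag": 0.001, "1Bomb": 0.001, "5Bomb": 0.016,
     "FairPay": 0.12, "FastPay": 0.09, "CommPay": 0.20},
    {"Name": "kimadagem", "Flag": 0.002, "1Bomb": 0.000, "5Bomb": 0.014,
     "FairPay": 0.05, "FastPay": -0.01, "CommPay": 0.27},
    {"Name": "jmbus...@h...", "Flag": 0.003, "1Bomb": 0.170, "5Bomb": 0.020,
     "FairPay": 0.99, "FastPay": 0.98, "CommPay": 0.92},
]
scores = reviewer_problem_index(rows)
for row, score in zip(rows, scores):
    print(row["Name"], round(score, 1))
```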

So let's apply our index and see how our top 15 reviewers score!

Table 6: The top 15 most prolific contributors ranked based on the ReviewerProblemIndex (Index, RPI). IRank is the ranking of reviewers in terms of the RPI. Name is reviewer name. Nrev is the number of reviews written. Rank is the reviewer's rank in terms of number of reviews written. The other variables are described above.

IRank	Index	Name	Nrev	Rank	Flag	1Bomb	5Bomb	FairPay	FastPay	CommPay
1	1.9	jessema...@g...	1539	9	0.001	0.001	0.016	0.12	0.09	0.20
2	2.1	kimadagem	3732	2	0.002	0.000	0.014	0.05	-0.01	0.27
3	3.2	worry	2637	3	0.000	0.003	0.006	0.11	0.11	0.53
4	3.5	absin...@y...	1320	10	0.000	0.000	0.007	0.24	0.13	0.55
5	3.5	bigbytes	5236	1	0.001	0.000	0.007	0.20	0.04	0.54
6	4.0	surve...@h...	2488	5	0.001	0.001	0.008	0.32	0.29	0.34
7	6.4	shiver	1721	7	0.001	0.005	0.015	0.50	0.33	0.76
8	6.6	jaso...@h...	2100	6	0.001	0.004	0.070	0.41	0.27	0.83
9	10.9	Thom Burr	1594	8	0.002	0.013	0.030	0.87	0.84	0.92
10	11.0	Rosey	1313	11	0.004	0.009	0.022	0.81	0.81	0.85
11	12.4	NurseRachet (moderator)	1274	14	0.016	0.022	0.078	0.39	0.32	0.46
12	12.7	CaliBboy	1281	12	0.022	0.004	0.005	0.20	0.21	0.47
13	13.1	TdgEsaka	1234	15	0.015	0.016	0.029	0.57	0.40	0.73
14	13.4	ptosis	1278	13	0.009	0.039	0.034	0.80	0.78	0.73
15	17.2	jmbus...@h...	2539	4	0.003	0.170	0.020	0.99	0.98	0.92

From Table 6 we can see that in general the more prolific reviewers also tend to be higher ranked on the RPI, with a few exceptions. One exception is "jmbus": despite being the fourth most prolific contributor, he/she is ranked at the bottom of the top 15 contributors list. This is likely due to having the highest 1-Bomb rate in the table, with 17% of reviews being 1Bombs. His/her reviews also seem to be almost entirely correlated with Pay, as FairPay, FastPay, and CommPay are all upwards of 90%.

Similarly, "jessema", though only the 9th most prolific reviewer, seems to have the highest quality of reviews (slightly ahead of "kimadagem"), with very low Flag, 1Bomb, and 5Bomb rates as well as very low correlation between the scales Fair, Fast, and Comm and that of Pay. Interestingly, though both "Thom Burr" and "Rosey" have very high correlation rates between Pay and the other scales, because they have relatively low Flag, 1Bomb, and 5Bomb rates they are ranked near the middle.

Overall, with a few exceptions, I am very impressed that the top contributors seem to score so well on the RPI.

Table 7: The Top 100 most prolific contributors ranked based on the Reviewer Problem Index (RPI).

Rank	Index	Name	Nrev	Rrank	Flag	1Bomb	5Bomb	FairPay	FastPay	CommPay
1	-0.13	seri...@g...	488	64	0.000	0.000	0.006	0.00	-0.05	0.00
2	1.67	james...@y...	365	98	0.000	0.000	0.000	0.29	0.00	0.18
3	1.72	donn...@o...	1064	23	0.001	0.000	0.006	0.04	0.04	0.27
4	1.85	jessema...@g...	1539	9	0.001	0.001	0.016	0.12	0.09	0.20
5	1.94	iwashere	689	44	0.003	0.000	0.017	0.00	0.05	0.12
6	2.03	kimadagem	3732	2	0.002	0.000	0.014	0.05	-0.01	0.27
7	2.06	mmhb...@y...	422	79	0.005	0.000	0.009	0.00	0.00	0.00
8	2.21	aristotle...@g...	579	51	0.002	0.000	0.010	0.10	0.11	0.19
9	2.90	Kafei	561	55	0.002	0.000	0.027	0.16	0.13	0.27
10	2.93	turtledove	1188	19	0.001	0.000	0.012	0.32	0.04	0.34
…	…	…	…	…	…	…	…	…	…	…
90	15.28	Anthony99	571	53	0.005	0.014	0.391	1.00	1.00	1.00
91	15.83	cwwi...@g...	543	57	0.011	0.070	0.026	0.84	0.85	0.84
92	16.25	rand...@g...	490	63	0.002	0.157	0.051	0.97	0.97	0.99
93	16.76	trudyh...@c...	378	95	0.008	0.140	0.056	0.87	0.84	0.80
94	16.79	jmbus...@h...	2539	4	0.003	0.170	0.020	0.99	0.98	0.92
95	17.30	hs	945	28	0.010	0.115	0.098	0.87	0.86	0.89
96	17.94	ChiefSweetums	691	43	0.010	0.185	0.054	0.68	0.68	0.81
97	21.49	Playa	414	85	0.010	0.239	0.014	0.93	0.90	1.00
98	31.56	Tribune	360	99	0.053	0.011	0.108	0.76	0.61	0.97
99	35.74	taintturk. (moderator)	1176	21	0.027	0.499	0.014	0.89	0.87	0.73
100	40.53	Taskmistress	698	42	0.017	0.755	0.020	0.91	0.91	0.96

Find the full list of Top 100 ranked here (Second Tab).

In Table 7 we can see how reviewers score on the RPI across all of the Top 100 reviewers. The Top 10 have great scores, with SERI having the top ranked score with 488 reviews written, no Flags or 1Bombs, and only three 5Bombs. For SERI there is also no correlation between Pay and either Fair or Comm, and, amazingly, a negative correlation between Pay and Fast.

The worst 10 reviewers are much more interesting, mostly due to taintturk, a Turkopticon moderator, and Tribune, a former moderator, being on the list. Everybody on the worst 10 list suffers from very high correlations between the other scales and Pay. Taintturk, though, also suffers from having 50% of his/her reviews being 1Bombs (for those reviews in which all of the scales were completed). This is not the worst, as Taskmistress has 75% 1Bombs, but it was surprising. Looking back at the early reviews, I see that 1Bombs seem to have been common earlier in Turkopticon and were intended to reflect an Amazon Terms of Service violation, something that has since been implemented.

Similarly, Tribune has one of the highest flag rates in the entire list, with an expected number of flags per review of about 5%. However, as Tribune was invited to be a moderator despite this spotty history, we can only assume that my rating system has some serious flaws.

Overall, I would therefore take the RPI ranking with a grain of salt. Perhaps some of the longer-time contributors to Turkopticon are suffering from changing standards over time. If I have time I will revisit the rating system, looking only at reviews within the last year or two.

Econometrics By Simulation