In a new paper, recently published in the open access journal PLOS ONE, Benjamin Mako Hill and I build on new research in survey methodology to describe a method for estimating bias in opt-in surveys of contributors to online communities. We use the technique to re-evaluate the most widely cited estimate of the gender gap in Wikipedia.

A series of studies have shown that Wikipedia’s editor-base is overwhelmingly male. This extreme gender imbalance threatens to undermine Wikipedia’s capacity to produce high quality information from a full range of perspectives. For example, many articles on topics of particular interest to women tend to be under-produced or of poor quality.

Given the open and often anonymous nature of online communities, measuring contributor demographics is a challenge. Most demographic data on Wikipedia editors come from “opt-in” surveys where people respond to open, public invitations. Unfortunately, very few people answer these invitations. Results from opt-in surveys are unreliable because respondents are rarely representative of the community as a whole. The most widely cited estimate, from a large 2008 survey by the Wikimedia Foundation (WMF) and UN University in Maastricht (UNU-MERIT), suggested that only 13% of contributors were female. However, the very same survey suggested that less than 40% of Wikipedia’s readers were female. We know, from several reliable sources, that Wikipedia’s readership is evenly split by gender, which is a sign of bias in the WMF/UNU-MERIT survey.

In our paper, we combine data from a nationally representative survey of the US by the Pew Internet and American Life Project with the opt-in data from the 2008 WMF/UNU-MERIT survey to come up with revised estimates of the Wikipedia gender gap. The details of the estimation technique are in the paper, but the core steps are:

  1. We use the Pew dataset to provide baseline information about Wikipedia readers.
  2. We apply a statistical technique called “propensity scoring” to estimate the likelihood that a US adult Wikipedia reader would have volunteered to participate in the WMF/UNU-MERIT survey.
  3. We follow a process originally developed by Valliant and Dever to weight the WMF/UNU-MERIT survey to “correct” for estimated bias.
  4. We extend this weighting technique to Wikipedia editors in the WMF/UNU-MERIT data to produce adjusted estimates of the demographics of their sample.
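To make the reweighting idea concrete, here is a minimal sketch in Python. It is not the code from the paper and it does not fit a propensity model: as a simplifying assumption, it replaces the propensity score with a single demographic cell (gender) and weights each opt-in respondent by the ratio of that cell's share in a representative reference sample to its share among opt-in readers. The function names and all of the counts are hypothetical and exist only for illustration.

```python
from collections import Counter


def cell_weights(reference_cells, optin_cells):
    """Weight for each cell = share in reference sample / share in opt-in sample."""
    ref = Counter(reference_cells)
    opt = Counter(optin_cells)
    n_ref, n_opt = sum(ref.values()), sum(opt.values())
    return {c: (ref[c] / n_ref) / (opt[c] / n_opt) for c in opt}


def weighted_share(respondents, weights, target):
    """Weighted proportion of respondents whose cell equals `target`."""
    total = sum(weights[c] for c in respondents)
    hits = sum(weights[c] for c in respondents if c == target)
    return hits / total


# Hypothetical data: a representative reader sample, opt-in readers,
# and opt-in editors, each recorded only as "F" or "M".
reference_readers = ["F"] * 50 + ["M"] * 50
optin_readers = ["F"] * 30 + ["M"] * 70
optin_editors = ["F"] * 10 + ["M"] * 90

# Weights are learned from readers, then extended to editors.
weights = cell_weights(reference_readers, optin_readers)
raw = optin_editors.count("F") / len(optin_editors)
adjusted = weighted_share(optin_editors, weights, "F")

print(f"raw female share of editors:      {raw:.3f}")
print(f"adjusted female share of editors: {adjusted:.3f}")
```

Because women are under-represented among the made-up opt-in readers, their weight is greater than one, and the adjusted share of female editors comes out higher than the raw share. A real analysis would use many covariates and a fitted model rather than one cell.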

Using this method, we estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%). These findings are consistent with other work showing that opt-in surveys tend to undercount women.
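The percentages above are relative increases, not percentage-point differences. A short check of the arithmetic, using only the four figures quoted in the paragraph (the helper name is just for illustration):

```python
def relative_increase(adjusted, original):
    """Relative change from original to adjusted, in percent."""
    return (adjusted - original) / original * 100


us_adults = relative_increase(22.7, 17.8)    # female share of US adult editors
all_editors = relative_increase(16.1, 12.7)  # female share of all editors

print(f"US adult editors: {us_adults:.1f}% higher")
print(f"All editors:      {all_editors:.1f}% higher")
```

In percentage-point terms, the adjustments are 4.9 points for US adult editors and 3.4 points for all editors.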

Overall, these results reinforce the basic substantive finding that women are vastly under-represented among Wikipedia editors.

Beyond Wikipedia, our paper describes a method that online communities can adopt to estimate contributor demographics from opt-in surveys in a way that is more credible than relying entirely on opt-in data. Advertising-intelligence firms like ComScore and Quantcast provide demographic data on the readership of an enormous proportion of websites. With these sources, almost any community can use our method (and source code) to replicate a similar analysis by: (1) surveying a community’s readers (or a random subset) with the same instrument used to survey contributors; (2) combining results for readers with reliable demographic data about the readership population from a credible source; (3) reweighting survey results using the method we describe.

Although our new estimates will not help us close the gender gap in Wikipedia or address its troubling implications, they give us a better picture of the problem. Additionally, our method offers an improved tool to build a clearer demographic picture of other online communities in general.

Think big! What would it take to make crowdsourcing and crowdwork a more sustainable, fulfilling, and efficient sector of economic and social production? (photo by John McNabb, cc-by-nc-nd)

This weekend, Andrés and I attended the CrowdCamp Workshop at CHI in Austin, Texas. The workshop was structured a lot like a hackathon, with the objective being to work in teams to produce projects, papers, or research.

The group I worked with coalesced around a proposal made by Niki Kittur, who suggested that we envision how crowdsourcing and distributed work contribute to solving grand challenges, such as economic inequality and the ongoing impact of the 2008 financial crisis.

We then spent the better part of the weekend outlining an ambitious set of scenarios and goals for the future of crowdwork.

While many moments of our conversation were energizing, the most compelling aspects derived from the group’s shared desire to imagine crowdwork and distributed online collaboration as potentially something more than the specter of alienated, de-humanized piece-work that it is frequently depicted to be.

To spur our efforts, we used a provocative thought experiment: what would it take for crowdwork to facilitate fulfilling, creative, and sustainable livelihoods for us or our (hypothetical or real) children?

Despite the limits of this framing, I think it opened up a discussion that goes beyond the established positions in debates about the ethics and efficiencies of paid crowdsourcing, distributed work, and voluntary labor online (all of which are, to some extent, encompassed under the concept of crowdwork in this case). It also helped us start imagining how we, as designers and researchers of crowdwork platforms and experiences, would go about constructing an ambitious research agenda on the scale of a massive project like the Hadron Collider.

If everything goes according to plan, this effort will result in at least a paper within the coming few weeks. Assuming that’s the case, our group will be sharing more details about the workshop and our vision of the future of crowdwork soon.

I recently had a pilot version of a crowdsourcing task fail pretty spectacularly, but after discussing the failure with Mako I’ve concluded that my experience helps illustrate some interesting comparisons between labor relations in a distributed online market and more traditional sorts of employment and jobs.

The failure in this case started early: I did a mediocre job designing the task. It’s not really worth going into any details except to say that (out of laziness) I made it really easy for workers to either (a) purposefully respond with spammy results; (b) slack off and not provide responses; (c) try to complete the task but unintentionally do a bad job and therefore provide poor quality results; or (d) try to complete the task and do so successfully. I also did not build in any effective means of telling whether the workers who did not provide accurate results were spamming, shirking, or simply failing.

So why does this experience have anything to do with the nature of employment relations?

First, think about it from the employer’s (or the work “requester’s”) point of view. A major part of creating an effective crowdsourcing job consists in minimizing the likelihood or impact of (a)-(c) either by means of algorithmic estimation and/or clever task design. It’s not necessary that every worker provide you with perfect results or even perfect effort, but ideally you find some way to identify and/or remove work and workers that introduce unpredictable sources of bias into your results. Once you know what kind of results you’ve got, it’s possible to make appropriate corrections in the event that some worker has been feeding you terrible data or maybe just unintentionally sabotaging your task by doing a bad job.

In other words, low quality results can provide employer-requesters with useful information if (and only if) the employer-requester finds a way to identify it and use it to their advantage. This means that a poorly designed task is not just one that doesn’t elicit optimal performance from workers, but also one that doesn’t help an employer-requester differentiate between spammers, slackers, passive saboteurs, and those workers who really are trying and (at least most of the time) completing a given task successfully.
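One common way to get this kind of signal is to seed a task with a few questions whose answers are already known. The sketch below is only an illustration of that idea, not a description of the task discussed in this post: the function, the thresholds, the category labels, and the heuristic that identical answers plus low accuracy suggest spam are all assumptions made up for the example.

```python
def classify_worker(answers, gold, min_response_rate=0.8, min_accuracy=0.7):
    """Roughly sort a worker into one of four groups using known-answer questions.

    answers: dict mapping question id -> the worker's answer (None if skipped)
    gold:    dict mapping question id -> the known correct answer
    """
    attempted = {q: a for q, a in answers.items() if q in gold and a is not None}
    response_rate = len(attempted) / len(gold)
    if not attempted or response_rate < min_response_rate:
        return "slacking"  # skipped too many of the known-answer questions

    accuracy = sum(a == gold[q] for q, a in attempted.items()) / len(attempted)
    if accuracy >= min_accuracy:
        return "good faith"
    if len(set(attempted.values())) == 1:
        return "possible spam"  # same answer everywhere and mostly wrong
    return "struggling"  # varied answers, but mostly wrong


# Hypothetical known-answer questions and four hypothetical workers.
gold = {"q1": "cat", "q2": "dog", "q3": "bird", "q4": "fish", "q5": "cat"}
workers = {
    "A": {"q1": "cat", "q2": "dog", "q3": "bird", "q4": "fish", "q5": "cat"},
    "B": {"q1": "cat", "q2": "dog", "q3": None, "q4": None, "q5": None},
    "C": {"q1": "cat", "q2": "cat", "q3": "cat", "q4": "cat", "q5": "cat"},
    "D": {"q1": "cat", "q2": "cat", "q3": "dog", "q4": "fish", "q5": "bird"},
}
labels = {name: classify_worker(ans, gold) for name, ans in workers.items()}
print(labels)
```

A high proportion of "struggling" labels among workers who otherwise respond fully would point back at the task design rather than at the workers.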

When I design a job I always assume that a relatively high proportion of the workers are trying to complete the task in good faith (sure, there are some spammers and slackers out there, but somehow they don’t seem to make up the majority of the labor pool when there’s a clear, well-designed, reasonably compensated task to be done). As a result, if I get predominantly crap responses back from the workers, I assume that they are (maybe somewhat less directly than I might like) providing me with negative feedback on my task design.

Now from the workers’ point of view, I suspect the situation looks a bit different. They have fewer options for dealing with employer-requesters who are trying to scam them. Most distributed labor markets lack features that would support anything resembling collective bargaining or collective action on the part of workers. Communications by workers to employer-requesters are limited and, consequently, there usually aren’t robust mechanisms for offering or coordinating feedback or complaints.

As a result, the most effective communications tool the workers possess is their work itself. Not surprisingly, some of them seem to use their work to engage in acts of casual slacking and sabotage that resemble online versions of the “weapons of the weak” described by James C. Scott in his book on everyday resistance tactics among rural peasants.

The ease with which crowdsourcing workers can pursue these relatively passive forms of resistance and tacit feedback relates to a broader, more theoretically important point: in most situations, a member of an online crowd should have a much easier time quitting or resisting than workers in (for example) a factory when they decide they’re unhappy with an employment relationship for any reason. Why? First off, crowdsourcing workers usually don’t have personal ties to a company, brand, co-workers, managers, etc. Second of all, the structure of online labor markets makes the cost of leaving any one job extraordinarily low. An office worker who (upon being confronted by, e.g., an unpleasant or unethical task) leaves her position risks giving up not only valuable resources like future wages or benefits, but also physical stability in her life, contact with friends and colleagues, and the respect or professional support of her superiors. In contrast, a worker in an online crowd who decides to leave her job loses almost nothing. While there is some risk associated with actively spamming or slacking (in some crowdsourcing markets, workers with low quality ratings can be banned or prevented from working on certain jobs), it’s still substantially easier to just walk away and find another task to do.

These are just some of the reasons why theoretical predictions from classical wage and employment economics – for example, that a $0.01 decrease in wages will result in some proportion of employees leaving their jobs – don’t hold up in traditional or crowdsourcing labor markets. The interesting point is that the reasons why these classical theories don’t hold up in crowdsourcing systems don’t have much to do with the complications introduced by social relations since social relations (between workers and employers as well as between workers and workers) are severely constrained in most online labor markets.


(Note: The first version of this post was written pretty late at night, so I didn’t include many links to sources. I’ll be trying to add them over the next few days.)

Electronika 302 Recorder - by Daniel Gallegos

Zombie trade agreements: According to some documents acquired by the organization European Digital Rights (EDRi), it appears the G8 has decided to do a Dr. Frankenstein impression and reanimate some of the most thoughtless portions of ACTA’s Internet provisions. This latest instantiation of the ACTA agreement wants control over intellectual property, technology devices, network infrastructure, and YOUR BRAINS.

An awesome experiment on awards (published in PLoS ONE) by Michael Restivo and Arnout van de Rijt – both in the Sociology department at SUNY Stony Brook – shows that receiving an informal award (a barnstar) from a peer may have a positive effect on highly active Wikipedians’ contributions. The paper is only three pages long, but if you want to you can also read the Science Daily coverage of it.

Mako’s extensive account of his workflow tools is finally up on Uses This. The post is remarkable for many reasons. First of all, Mako puts more care and thought into his technology than anybody I know, so it’s great to see the logic behind his setup explained more or less in full. Secondly, I found it extra remarkable because I have been collaborating (and even living!) closely with Mako for a while now and I still learned a ton from reading the post. My favorite detail is unquestionably the bit about his typing eliciting a noise complaint while he was in college. As a rather loud typist myself, I have been subject to snark and snubbery from various quarters over the years, but I’ve never had anybody call the cops on me!

The Soviet Union lives on! But maybe not quite where you’d expect it. My friends and former Oakland neighbors Daniel Gallegos and Zhanara Nauruzbayeva have recently moved themselves and their incredible Artpologist project to New York. Upon arrival, they found themselves surrounded by a post soviet reality that most New Yorkers or Americans simply do not know exists at all, much less in the epicenter of finance capital. Their latest project, My American New York, chronicles this “post soviet America” through photos, stories, Daniel’s beautiful sketches, drawings, and paintings (e.g. the image at the top of this post), all wrapped up in a series of urban travelogues.

Philosophy Quantified: Kieran Healy has done a series of elegant and thoughtful guest posts on Leiter Reports in which he explores data from the 2004 and 2006 Philosophical Gourmet Report (PGR) surveys in an effort to generate some preliminary insights about the relationships between department status and areas of specialization.

In doing some reading about collective action, cooperation, and exchange theory, I encountered (gated link) the figures below:

If you happen to be the kind of person who spends a lot of time around research combining social dilemmas, evolutionary models of cooperation, and econometric production functions, these may seem completely intuitive and you probably do not even need to read the paper to get the gist of Professor Heckathorn’s argument.

Otherwise, the images may feel a bit more like conceptual art. The plot labeled “C” at the bottom right is my runaway favorite. I am also a big fan of the mysterious “arch” shape and the large “X” that appear in the first figure.

N.b., Professor Heckathorn does an admirable job explaining these images in the paper, and my point here is not to provide a Tuftean critique of some rather ornate visualizations. Instead, I wanted to try to communicate the sensation I felt when I encountered these images in the context of an extraordinarily sophisticated and abstract simulation-based analysis of the social dilemmas used to analyze the theoretical conditions under which people may be more likely to cooperate and contribute to public goods.

That’s right, these figures are part of a model modeling models. Given that I am singling out this particular model for attention, they are also, you might say, part of a model model. Given that Professor Heckathorn’s work in this area is highly sophisticated and compelling, you might even say that these figures are part of a model model model (modeling models).

Matt Salganik and Karen Levy (both of the Princeton Sociology Department) recently released a working paper about what they call “Wiki Surveys” that raises several important points regarding the limitations of traditional survey research and the potential of participatory online information aggregation systems to transform the way we think about public opinion research more broadly.

Their core insight stems from the idea that traditional survey research based on probability sampling leaves a ton of potentially valuable information on the table. This graph summarizes that idea in an extraordinarily elegant (I would say brilliant) way:

Figure 1 from Salganik and Levy (2012), which they title: "a schematic rank order plot of contributions to successful information aggregation systems on the Web."

Think of the plot as existing within the space of all possible opinion data on a particular issue (or set of issues). No method exists for collecting all the data from all of the people whose opinions are represented by that space, so the best you – or any researcher – can do is find a way to collect a meaningful subset of that data that will allow you to estimate some characteristics of the space.

The area under the curve thus represents the total amount of information that you could possibly collect with a hypothetical survey instrument distributed to a hypothetical population (or sample) of respondents.

Traditional surveys based on probability sampling techniques restrict their analysis to the subset of data from respondents for whom they can collect complete answers to a pre-defined subset of closed-ended questions (represented here by the small white rectangle in the bottom left corner of the plot). This approach loses at least two kinds of information:

  1. the additional data that some respondents would be happy to provide if researchers asked them additional questions or left questions open-ended (the fat “head” under the upper part of the curve above the white rectangle);
  2. the partial data that some respondents would provide if researchers had a meaningful way of utilizing incomplete responses, which are usually thrown out or, at best, used to make estimates about the characteristics of whether attrition from the study was random or not (this is the long “tail” under the part of the curve to the right of the white rectangle).
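The three regions of the plot can be illustrated with a toy calculation. This is only a sketch under invented assumptions: each number in the list stands for how many items one hypothetical respondent would be willing to contribute, and `k` stands for the fixed number of closed-ended questions on a traditional survey. None of the values come from Salganik and Levy's paper.

```python
def partition_information(contributions, k):
    """Split the area under a rank-order curve into three regions.

    contributions: number of items each respondent would provide
    k:             fixed number of questions on the traditional survey
    Returns (rectangle, head, tail).
    """
    ranked = sorted(contributions, reverse=True)  # rank order, largest first
    rectangle = sum(k for c in ranked if c >= k)   # complete responses, capped at k
    head = sum(c - k for c in ranked if c >= k)    # extra data beyond the k questions
    tail = sum(c for c in ranked if c < k)         # partial responses, usually discarded
    return rectangle, head, tail


# Hypothetical willingness to contribute for twelve respondents.
contributions = [40, 25, 12, 10, 10, 6, 4, 3, 2, 1, 1, 1]
rectangle, head, tail = partition_information(contributions, k=10)
total = rectangle + head + tail

print(f"captured by the traditional survey: {rectangle} of {total} ({rectangle / total:.0%})")
print(f"lost in the head: {head}, lost in the tail: {tail}")
```

With these made-up numbers, the traditional survey keeps fewer than half of the items respondents were willing to provide.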

Salganik and Levy go on to argue that many wiki-like systems and other sorts of “open” online aggregation platforms that do not filter contributions before incorporating them into some larger information pool illustrate ways in which researchers could capture a larger proportion of the data under the curve. They then elaborate some statistical techniques for estimating public opinion from the subset of information under the curve and detail their experiences applying these techniques in collaboration with two organizations (the New York City Mayor’s Office and the Organization for Economic Cooperation and Development, or OECD).

If you’re not familiar with matrix algebra and Bayesian inference, the statistical part of the paper probably won’t make much sense, but I encourage anyone interested in collective intelligence, surveys, public opinion, online information systems, or social science research methods to read the paper anyway.

Overall, I think Salganik and Levy have taken an incredibly creative approach to a very deeply entrenched set of analytical problems that most social scientists studying public opinion would simply prefer to ignore! As a result, I hope their work finds a wide and receptive audience.

Long Tail Sports

February 19, 2012

Neymar-mania vs. Lin-sanity?

Lin-sanity notwithstanding, this is a time of year when I always find myself wanting more as a sports fan in America. The memories of the Super Bowl and BCS Championship game have already started to fade; March Madness remains a long way off; pitchers and catchers have yet to report for Spring Training; and both the NBA and NHL have just passed the midpoint of their respective regular seasons. Add that it’s the middle of Winter (even an historically mild one), and these factors combine to make mid-February a less than thrilling few weeks.

Lately, I’ve partially solved my urge for non-stop sports entertainment by turning to leagues that have much less popularity and almost no visibility in mainstream U.S. media coverage.

First, during a brief trip to Brazil for a conference, I enjoyed watching some early round action in the Paulistão, or the elite soccer league of São Paulo state. With historically dominant teams like Corinthians, Santos, and Palmeiras, São Paulo boasts one of the most competitive state-level championships within Brazil and usually includes several young players who will become international superstars with household names within a few years (e.g. if you haven’t heard of Neymar yet, just be patient, the teenage phenom will likely figure prominently in the Brazilian national team’s efforts when the country hosts the World Cup in 2014).

Then, the week after I returned from Brazil, I spent a few afternoons watching the final games of the Serie del Caribe, an international tournament that wraps up the Winter leagues in the Dominican Republic, Mexico, Puerto Rico, and Venezuela. The games were tight, competitive and included a number of Major League players who seemed either to have chosen to return home as triumphant stars or to hone their skills among Latin America’s most competitive leagues.

Despite the fact that you’ll never see your local ESPN network cover either of these events, both have a ton of history behind them and tremendous fan-bases (ESPN’s Brazilian and regional Latin American affiliates cover both). They are also extraordinarily competitive and played at a very high skill level.

Latin American soccer and baseball are not the only options. There is also a whole range of winter sports that never show up on U.S. television schedules until the Olympics. In other words, the only thing preventing you from watching terrific, exciting sporting events in the middle of the annual mid-Winter lull is the fact that you would probably either need to pay an inordinate sum for satellite coverage or seek out unauthorized streams on websites that serve sketchy advertisements and malware along with the game.

At the risk of making a very Ethan Zuckerman-esque point, the Internet makes it theoretically trivial to solve this problem, but that theoretical triviality only underscores a much bigger problem in the way our attention is distributed and canalized by a combination of cultural habits and incumbent media networks. In other words, maybe you’d be more likely to watch Neymar and Santos take on Palmeiras if either your local television network would broadcast it or if you could easily find a high quality stream broadcasting in English (I also enjoy watching these things online because I get to listen to Portuguese and Spanish language announcers). Indeed, as long as somebody is streaming a broadcast of any of these games anywhere around the world, there’s no practical reason that it isn’t possible to watch that stream anywhere else. But for a whole variety of reasons that I don’t fully understand, that just doesn’t happen yet.

My point is that American sports fans live in a media ecosystem that has not yet figured out what to do with its (long) tail. There has to be a better, less monopolistic solution than satellite and cable providers charging high rates for access to particular sports packages or leagues. This model ensures that only existing fans who are willing to pay to watch teams they already like will ever subscribe to such services, condemning these sports and teams to continued obscurity. Instead, it would be great to see some affordable way for fans to take advantage of existing Internet streams to experiment with new sports, new leagues, and new cultures by tuning into otherwise less popular or less well-known events when their hometown favorites are not in season.