July 21, 2013
In a new paper, recently published in the open access journal PLOS ONE, Benjamin Mako Hill and I build on new research in survey methodology to describe a method for estimating bias in opt-in surveys of contributors to online communities. We use the technique to re-evaluate the most widely cited estimate of the gender gap in Wikipedia.
A series of studies have shown that Wikipedia’s editor-base is overwhelmingly male. This extreme gender imbalance threatens to undermine Wikipedia’s capacity to produce high quality information from a full range of perspectives. For example, many articles on topics of particular interest to women tend to be under-produced or of poor quality.
Given the open and often anonymous nature of online communities, measuring contributor demographics is a challenge. Most demographic data on Wikipedia editors come from “opt-in” surveys where people respond to open, public invitations. Unfortunately, very few people answer these invitations. Results from opt-in surveys are unreliable because respondents are rarely representative of the community as a whole. The most widely cited estimate, from a large 2008 survey by the Wikimedia Foundation (WMF) and UN University in Maastricht (UNU-MERIT), suggested that only 13% of contributors were female. However, the very same survey suggested that fewer than 40% of Wikipedia’s readers were female. We know, from several reliable sources, that Wikipedia’s readership is evenly split by gender, which is a sign of bias in the WMF/UNU-MERIT survey.
In our paper, we combine data from a nationally representative survey of the US by the Pew Internet and American Life Project with the opt-in data from the 2008 WMF/UNU-MERIT survey to come up with revised estimates of the Wikipedia gender gap. The details of the estimation technique are in the paper, but the core steps are:
- We use the Pew dataset to provide baseline information about Wikipedia readers.
- We apply a statistical technique called “propensity scoring” to estimate the likelihood that a US adult Wikipedia reader would have volunteered to participate in the WMF/UNU-MERIT survey.
- We follow a process originally developed by Valliant and Dever to weight the WMF/UNU-MERIT survey to “correct” for estimated bias.
- We extend this weighting technique to Wikipedia editors in the WMF/UNU data to produce adjusted estimates of the demographics of their sample.
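To make the reweighting idea concrete, here is a minimal sketch in Python. It is not the code from the paper, and it does not implement the Valliant and Dever procedure: it swaps the propensity model for a much cruder cell-based ratio (each group's share in a reference sample divided by its share in the opt-in sample), and it uses gender as the only covariate. All of the function names and all of the numbers are made up for illustration.

```python
from collections import Counter


def cell_shares(records, key):
    """Share of records that fall into each demographic cell."""
    counts = Counter(key(r) for r in records)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def inverse_propensity_weights(reference, opt_in, key):
    """Weight each cell by (reference share) / (opt-in share).

    A cell that is under-represented among opt-in respondents gets a
    weight above 1. This is a simplified stand-in for a fitted
    propensity model, not the estimator used in the paper.
    """
    ref = cell_shares(reference, key)
    opt = cell_shares(opt_in, key)
    return {cell: ref[cell] / opt[cell] for cell in opt if cell in ref}


def weighted_share(records, weights, key, predicate):
    """Weighted proportion of records that satisfy the predicate."""
    total = sum(weights[key(r)] for r in records)
    hits = sum(weights[key(r)] for r in records if predicate(r))
    return hits / total


def by_gender(record):
    return record["gender"]


def is_female(record):
    return record["gender"] == "f"


# Hypothetical reference sample of readers: evenly split by gender.
reference_readers = [{"gender": "f"}] * 50 + [{"gender": "m"}] * 50

# Hypothetical opt-in sample: women are under-represented (30 of 100).
opt_in_readers = (
    [{"gender": "f", "editor": True}] * 4
    + [{"gender": "f", "editor": False}] * 26
    + [{"gender": "m", "editor": True}] * 28
    + [{"gender": "m", "editor": False}] * 42
)

weights = inverse_propensity_weights(reference_readers, opt_in_readers, by_gender)
editors = [r for r in opt_in_readers if r["editor"]]

raw_share = sum(1 for r in editors if is_female(r)) / len(editors)
adjusted_share = weighted_share(editors, weights, by_gender, is_female)

print(f"raw female share of editors:      {raw_share:.3f}")
print(f"adjusted female share of editors: {adjusted_share:.3f}")
```

With these invented numbers the raw female share among editors is 12.5% and the reweighted share is 25%, which shows the direction of the correction but says nothing about its real size. A serious analysis would estimate propensities from several covariates at once, as described in the paper.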
Using this method, we estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%). These findings are consistent with other work showing that opt-in surveys tend to undercount women.
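The "27.5% higher" and "26.8% higher" figures are relative increases, not percentage-point differences, and the two are easy to confuse. A quick check of the arithmetic, using only the four estimates quoted above (the helper name is my own):

```python
def relative_increase(adjusted, original):
    """Percent by which the adjusted estimate exceeds the original one."""
    return (adjusted - original) / original * 100


# Estimates quoted above, in percent of editors who are female.
us_adjusted, us_original = 22.7, 17.8
all_adjusted, all_original = 16.1, 12.7

us_relative = relative_increase(us_adjusted, us_original)
all_relative = relative_increase(all_adjusted, all_original)

print(f"US adults:   +{us_adjusted - us_original:.1f} points, {us_relative:.1f}% higher")
print(f"All editors: +{all_adjusted - all_original:.1f} points, {all_relative:.1f}% higher")
```

In percentage points, the adjustments are 4.9 and 3.4 respectively.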
Overall, these results reinforce the basic substantive finding that women are vastly under-represented among Wikipedia editors.
Beyond Wikipedia, our paper describes a method that online communities can adopt to estimate contributor demographics from opt-in surveys in a way that is more credible than relying entirely on opt-in data. Advertising-intelligence firms like ComScore and Quantcast provide demographic data on the readership of an enormous proportion of websites. With these sources, almost any community can use our method (and source code) to replicate a similar analysis by: (1) surveying a community’s readers (or a random subset) with the same instrument used to survey contributors; (2) combining results for readers with reliable demographic data about the readership population from a credible source; (3) reweighting survey results using the method we describe.
Although our new estimates will not help us close the gender gap in Wikipedia or address its troubling implications, they give us a better picture of the problem. Additionally, our method offers an improved tool to build a clearer demographic picture of other online communities in general.
October 23, 2008
My trusty RSS feeds have turned up two interesting recent posts on the subject of the Obama campaign and its implications for the future of governance in a networked society.
First, David Lazer, professor at Harvard’s Kennedy School and Director of the Program on Networked Governance, asks some big questions (emphasis added):
The lights are not going off on this operation. If Obama loses, the network provides him an instant infrastructure to run again. The more intriguing question to me, as a student of politics, is what happens if, as seems likely right now, he wins. There are inter-related political and strategic questions. On the political side, the question is how Obama might use the apparatus to help him govern. Does he directly appeal to his e-mail list to support his policy objectives? There are, on average, about four thousand politically active Obama supporters in each Congressional district–that could be a lot of letters to Members.
And a few lines down:
…On the strategic side, the question is to what extent does the apparatus continue to evolve to allow grassroots involvement, and to what extent does stuff flow up as well as down? In the long run, the only way that there will be some stickiness to the structure is if the people who have been involved can mobilize for local action, can connect to each other, and feel that their voices matter.
Meanwhile, Joshua-Michele Ross at O’Reilly interviews Jascha Franklin-Hodge (founder and CTO of Blue State Digital, or BSD), who offers some partial answers to many of the same questions.
I recommend reading the whole post (and watching the videos, if you’re more of a visual person or whatever), but here’s the bullet-point version of Ross’s claims if you absolutely insist (emphasis removed from the original):
- Online U.S. political communities will morph from a campaign fundraising role to a governing role.
- Rather than one centrally governed behemoth, MyBO is enabling a thousand small campaigns to flourish…This kind of swarm politics has generated enormous amounts of energy (and money) from ordinary citizens.
- Technology (infrastructure and know-how) will become a necessary core competence in all U.S. political campaigns…Campaigns that maintain or are able to tap into a continuity of software, infrastructure and human capital will have serious advantage.
- When lobbyist data, earmark data etc. is available in standard formats it will be a great leap forward for more transparency in government.
Responses 1-3 are in varying stages of already being true. Number 4, on the other hand, has a long way to go (although the folks at the Sunlight Foundation are plugging away on that front).
Whether or not Franklin-Hodge’s vision of digital democracy comes to fruition, the devil will be in the details. An underlying concern voiced by Lazer is how the nodes (citizens and groups) at the edges of U.S. politics might use digital networks to enhance traditional mechanisms of representation (politicians and political parties). I would build off this insight to ask both authors whether they think the architecture of the network and the technologies that run it will also play an important role in determining the fate of networked democracy. If so, how do we design networks to facilitate democratic practice?
As a number of folks have argued, the choice of particular platforms and standards will enable certain forms of civic engagement while foreclosing or devaluing others. Furthermore, just because voters could gain access to the same kinds of technologies doesn’t mean they’ll use them equally effectively or even in the same ways (check out Eszter Hargittai’s research on skillful Internet use if you want some really sobering examples).
All of this is to say that the prospect of a networked polis (like a networked public sphere) presents a number of problems and challenges that few (if any) societies have been able to resolve with earlier communications technologies or institutional formations. In the ancient Greek version of the polis, a narrow class of citizens (land-owning men of means) had the ability and the right to participate. While contemporary democracies have become more populist and inclusive, the reality is that the playing field remains wildly uneven in favor of the wealthy, the well-educated, and the well-connected.
If the future imagined by Franklin-Hodge, Lazer, and others indeed comes to pass, all the fiber optic cable in the world will not make the democratization of effective citizenship any less of an uphill battle.
October 22, 2008
A few recent posts at The Next Right have confirmed that Jon Henke and Patrick Ruffini are the only conservative bloggers I know of seriously considering how to build a netroots movement on the right.
Henke builds off of Ruffini’s assessment of the Obama campaign, elaborating the idea of “long tail empowerment” to describe the distributed organizing structure currently employed by the Democratic candidate. He then juxtaposes this decentralized and market-based approach to campaigning with the top-down “command and control” approach currently being used by the Republicans.
Finally, Henke offers his explanation for these differences:
“I believe a great deal of this is attributable to the state of each Movement.
- Consolidation: The Right is behaving like a company within a declining industry, which focuses on increasing market share, rather than expanding the actual market itself. Declining industries are defensive, seeking tradition and efficiency rather than innovation. The Right – and the Republican Party – is trying to manage the decline by consolidating successes and attacking their opponent to limit the Left’s market share.
- Expansion: The Left is behaving like a company within an expanding industry, making speculative investment to build for market growth, for competitive advantage within the emerging market. The Left is playing offense, innovating. The political pendulum is swinging their way, and they are working to turn that momentum into permanent infrastructural gains.”
The irony here is that Henke’s (and Ruffini’s) analysis mirrors the claims made by Markos Moulitsas over the past five years on Daily Kos as well as in his books Taking On the System and Crashing the Gate. You can almost hear Kos chuckling to himself in the background of this post in which Ruffini spins out a fantasy in which Sarah Palin emerges as a latter day Howard Dean for the conservative movement:
Sarah Palin’s legacy as the VP nominee will matter inordinately in defining the Next Right. If the experience is seen as a constructive one (much like Dean), reminding us that it’s possible to get regular activists excited about being Republicans again, that Barack Obama ain’t the only one who can pack the arenas, and injecting a positive vibe into the GOP at the grassroots level, then I am optimistic about the GOP bouncing back. If instead the lesson of Palin is that we need to pick safe, uninspiring candidates (who will get utterly clobbered by Obama’s $1 billion+ re-election campaign, btw) who don’t offend Christopher Buckley, then I fear we are in for a long winter indeed.
Is that the theme song from the Twilight Zone playing in the background?
In all seriousness, I believe these guys make some excellent points and that their perspective merits sustained consideration by those on the left and the right.
The question I have for Ruffini and Henke is whether a netroots of the right would (or even could) look like the netroots of the left. There’s a great case to be made (and some of us here at The Berkman Center are planning to publish some research in the near future that provides empirical support for this case) that technology usage patterns on the left and right of the blogosphere are significantly different. Combine that kind of evidence with some recent studies in cognitive psychology and some genetics-oriented political science work (pdf) and you can see the outline of an argument for the co-evolution of genes and political institutions.
The full extent or significance of this hypothetical argument is something I’m interested in exploring further. In the meantime, I should underscore that I’m neither advocating nor endorsing such a view just yet. It needs a lot of additional research to back it up and is in danger of sounding very deterministic at this early stage in its development.
Nevertheless, the nascent evidence for the co-evolutionary theory of U.S. politics gives me just enough rhetorical leverage to push back against some of Henke and Ruffini’s claims. It doesn’t take a neuroscientist to predict that the varieties of netroots activism that may evolve on the right are highly unlikely to produce outcomes identical to those on the left. In building the fundraising and organizing capacity of the blogosphere, the Dean campaign, and the Obama campaign, the left has not used a single tool or technique that was not also available to the right. However, individuals and organizations on the left have made conscious decisions to utilize the tools and techniques in particular ways that made sense within their existing organizational and institutional contexts. Those contexts are distinct from the ones on the right. As a result, the tools may or may not translate especially well.
I don’t have any answers here, just more questions. But I’m very curious to hear what Ruffini, Henke, Kos, and others would make of this issue.
October 13, 2008
Lately, I’ve benefited from a lot of long thought-provoking conversations about the Obama campaign with Gene Koo, one of the many wonderful fellows at the Berkman Center.
Most recently Gene passed along this article by Zack Exley about the Obama campaign’s organizing structure that appeared in the Huffington Post a few days ago.
Like Obama’s campaign itself, Exley’s article frequently gave me the sensation that there really is something qualitatively different about this election and about this candidate’s ability to design and mobilize a massive volunteer-driven force.
Exley visits campaign offices throughout Ohio and finds something he had never seen before in years of participating in and writing about politics in this country: dedicated, everyday people delegating responsibility and cooperating to achieve their goals:
“After visiting my fourth or fifth team, it was painfully clear that an enormous amount of power is unlocked by this incredibly simple act of distributing different roles to people who actually feel comfortable taking them on. And I say “painfully” because I couldn’t stop thinking about all the union and electoral campaigns I’ve worked on where we did not do this.” (emphasis added)
With numerous examples, Exley depicts an organization led by people who have discovered the power of distributed collective action. Only four years after the Kerry campaign (indeed, less than 12 months since the Obama folks set up shop in Ohio), the comparisons are painful:
“The Ohio campaign is attempting to build teams in 1,231 campaign-defined ‘neighborhoods;’ each covering eight to ten precincts. They are targeting virtually every inhabited square mile of the state. The campaign claimed to have teams in 65% of neighborhoods when I visited in early September. That’s risen to 85% coverage at press time—and they are shooting for 100%. In contrast, the Kerry campaign effectively wrote off rural counties, and completely abandoned them in the final few weeks of the campaign in a last minute all-in shift to the cities.”
According to Exley, the “secret” to this exponential growth lies in the ability of the team’s leaders to build a cellular organization from the ground up, absorbing anyone with the time, talent, and ability to make a sustained contribution. In addition, the leaders have demonstrated an impressive level of dedication to their goals and values, passing them along to subordinates through long and personal training sessions as well as extended periods of collaboration. When the time is right, leaders pass on responsibilities to veteran team members and then move on to building the next team. The result is a steadily growing pool of experienced leaders who have internalized the organization’s ideals and developed the skills and relationships to achieve their goals.
Exley’s most comprehensive assessment of the significance of the Obama campaign’s transformative distributed structure comes in the second paragraph:
“Win or lose, ‘The New Organizers’ have already transformed thousands of communities—and revolutionized the way organizing itself will be understood and practiced for at least the next generation. Obama must continue to feed and lead the organization they have built—either as president or in opposition. If he doesn’t, then the broader progressive movement needs to figure out how to pick this up, keep it going and spread it to all 50 states.”
The extent to which this analysis echoes the thinking of Markos Moulitsas Zúniga in his most recent book Taking on the System (2008) is unsurprising. The two differ, however, on who deserves the credit: Zúniga credits the progressive blogosphere and other outsiders with spearheading the effort, whereas Exley lays the credit with creative young organizers like Jeremy Bird (Obama’s Ohio General Election Director) and wise elder statesmen like Marshall Ganz (Harvard lecturer and political organizer par excellence). In thinking about this distinction as well as the substantive overlap between Exley’s and Zúniga’s work, I got to wondering about the network topology of political campaigns.
Network topologies are everywhere – if you’ve ever looked at an organizational chart of any kind, you’ve seen one. With the growth of electronic and digital communications networks in the last 100 years or so, the impact of both poorly and well-designed networks has never been more apparent.
For example, the Internet represents the result of the most impressive large-scale network design in recent memory (at least since the telephone). The insight that facilitated the creation of the TCP/IP networking protocol that forms the basis of the Internet was the architectural advantage of a distributed point-to-point network. In contrast with “star” or “hub and spoke” networks, true point-to-point networks scale costlessly and are almost insusceptible to congestion or failure.
How does this relate back to politics? My (largely unsubstantiated) suspicion is that most political campaigns are designed as hub and spoke networks (or, at best, as trees). The implications of this design decision are relevant to the broader question of how the Internet has changed politics as well as the future of political organizing in a pervasively networked environment.
It is not an accident that most political organizations are hierarchical affairs, involving a relatively small number of well-connected and informed folks at the center who serve as common points of access for masses of less integrated nodes. Political elites derive a great deal of their power from their structural position (insofar as it grants them control over resources, jobs, and wealth) and the long life of hierarchical political institutions has made them appear almost natural.
The result, in structural terms, is a lot of inefficiency and vulnerability. In both tree and star networks, the failure of any hubs that connect the top or the center of the organization with its edges can be catastrophic for the survival of the network as a whole. This is part of the reason why, in organizational settings, the individuals or groups that occupy these strategic positions tend to accumulate power out of proportion to their rank (think of middle managers).
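The vulnerability claim is easy to check on toy networks. The sketch below is only an illustration under simplifying assumptions: the three graphs (a six-node star, a seven-node binary tree, and a six-node fully connected mesh as an idealized stand-in for a point-to-point network) and all of the function names are my own inventions. It lists every node whose removal splits the rest of the network into disconnected pieces.

```python
from collections import deque


def star(n):
    """Hub-and-spoke network: node 0 is the hub, nodes 1..n-1 are spokes."""
    graph = {0: set(range(1, n))}
    graph.update({i: {0} for i in range(1, n)})
    return graph


def mesh(n):
    """Idealized point-to-point network: every node links to every other."""
    return {i: set(range(n)) - {i} for i in range(n)}


def connected_without(graph, removed):
    """True if the remaining nodes can all still reach one another."""
    nodes = [v for v in graph if v != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:  # breadth-first search that skips the removed node
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def single_points_of_failure(graph):
    """Nodes whose loss disconnects the rest of the network."""
    return [v for v in graph if not connected_without(graph, v)]


# A small binary tree: node 0 at the top, nodes 1 and 2 as "middle managers".
tree = {0: {1, 2}, 1: {0, 3, 4}, 2: {0, 5, 6}, 3: {1}, 4: {1}, 5: {2}, 6: {2}}

print("star:", single_points_of_failure(star(6)))
print("tree:", single_points_of_failure(tree))
print("mesh:", single_points_of_failure(mesh(6)))
```

In the star, the hub is the only single point of failure. In the tree, the top node and both middle nodes are. The full mesh has none, though real point-to-point networks are rarely this densely connected.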
In the history of modern electoral democracy and political bureaucracy, the costs of these vulnerable and inefficient organizational structures have been deemed sufficiently low to justify the benefits of consolidated leadership and authority. If life on the Internet is any indication, that may be undergoing a subtle, but perceptible change.
All of this brings me back to Obama and to Exley. It’s important to underscore that the Obama campaign has not turned its back on hierarchical structures or centralized networks. The campaign is very much a national affair and the core organizers (such as state directors) continue to operate as potential choke-points capable of undermining the organization’s effective operation.
The key innovations observed by Exley and implemented by Obama’s personnel have to do with the means by which the branches of the tree are expanded, and not with the underlying structure of the tree itself.
At the same time, Exley describes an explicitly “viral” mode of assimilating volunteers, as well as the ability of the organization to scale its local staff exponentially throughout the summer. Both of these characteristics suggest that we’re looking at something other than a prototypical hierarchy-based organizing model.
The Obama campaign has effectively retained its hierarchy, but in the process it has ceded a tremendous amount of autonomy to its middle managers in an effort to build a more dynamic and scalable operation.
This strategy has been inspired by examples of distributed cooperation and political mobilization online, but it falls far short of embracing truly radical alternatives. What will be interesting to observe in coming years is whether the non-hierarchical approaches inspired by the point-to-point design of the Internet gain any traction in political organizations. In theory at least, nothing prevents a more decentralized organization from transmitting the ideas and tactics necessary to a political campaign. The problems arise in directing such distributed efforts towards a common goal in an effective way. For the time being, that is what the Obama campaign appears to have achieved to an unprecedented degree.
September 28, 2008
The current issue of The Prospect magazine contains a “debate” between Tim Harford and Pete Lunn on the significance of behavioral economics to economic theory as a whole.
The two authors succeed in putting on a friendly show of insult-swapping; in the process they somehow manage to endorse divergent perspectives on the latest research in their field.
Lunn contends that behavioral research is nothing less than a revolution in the making, uprooting the rotten foundations of the discipline in favor of more nuanced, empirically accurate models of human action and motivation.
Harford puts forward a contrary view, in which behavioral theories and methods take their place among the tools of mainstream economics, but fall far short of transforming the field in a radical fashion.
FWIW, I find many of Harford’s claims quite compelling. I suspect that many economists can freely admit the limitations of theories grounded in a narrowly-selfish model of motivation; I recall reading something by Milton Friedman himself in which he points out that the accuracy of the assumptions is irrelevant, it is only the accuracy of the predictions that matters. In other words, Friedman agreed that it was obvious that narrowly-selfish “rationality” was a mere shadow of the depth and complexity of the human psyche. He merely claimed that it was the most reliable assumption anyone had yet found to model economic behavior on a large scale.
An interesting problem with Harford’s argument crops up in the following passage, though:
…the orthodox, rational-choice approach continues to work. Take a step back and look at the big picture. According to the laboratory experiments on public goods you describe, there is no such thing as the free-rider problem. If only that were true. It would mean that there was no climate change problem, because people would voluntarily restrict their carbon emissions to preserve the planet for strangers and the children of strangers. It would mean that fish stocks were healthy because fishing crews realised they were dealing with a common resource. London’s congestion charge would be counterproductive, because people do not respond to individual incentives: drivers would have willingly left their cars at home in order to leave the roads congestion-free for others.
No matter how many experiments you allude to, the discomfiting rational self-interested model explains our environmental predicament perfectly. It is also the inspiration for solutions such as a carbon tax or a cap-and-trade scheme.
Harford argues that orthodox models of narrowly-selfish rationality still work, but he neglects to point out that orthodox economists may have something to do with that fact.
As Michel Callon (1998) has theorized and Donald MacKenzie (2006) has demonstrated, if you look under the hood of every carbon-producing, fish-catching, and emissions-releasing industry, you find classically trained economists! This is not a surprise, but it means that economic theorists can no longer turn to the world of already existing industries and firms as empirical “proof” of the accuracy of their predictions. Why not? It’s like betting on the outcome of a baseball game when you’re the starting pitcher. Economists and their theories play such a large role in shaping contemporary industries that it is no longer possible to distinguish clearly their theories from those industries. They have actively gone out and propagated the theory to such a degree that they can be said, in some cases, to have made it come true.
How does this relate to behavioral economics? That remains to be seen. Analytic models and business plans constructed on assumptions of altruism, cooperation, sharing, and social wealth are only beginning to emerge. Nevertheless, I wonder what kinds of industries would emerge out of more empirically accurate models of behavior?