Matt Salganik and Karen Levy (both of the Princeton Sociology Department) recently released a working paper about what they call “Wiki Surveys” that raises several important points regarding the limitations of traditional survey research and the potential of participatory online information aggregation systems to transform the way we think about public opinion research more broadly.

Their core insight stems from the idea that traditional survey research based on probability sampling leaves a ton of potentially valuable information on the table. This graph summarizes that idea in an extraordinarily elegant (I would say brilliant) way:

Figure 1 from Salganik and Levy (2012), which they title: "a schematic rank order plot of contributions to successful information aggregation systems on the Web."

Think of the plot as existing within the space of all possible opinion data on a particular issue (or set of issues). No method exists for collecting all the data from all of the people whose opinions are represented by that space, so the best you – or any researcher – can do is find a way to collect a meaningful subset of that data that will allow you to estimate some characteristics of the space.

The area under the curve thus represents the total amount of information that you could possibly collect with a hypothetical survey instrument distributed to a hypothetical population (or sample) of respondents.

Traditional surveys based on probability sampling techniques restrict their analysis to the subset of data from respondents for whom they can collect complete answers to a pre-defined subset of closed-ended questions (represented here by the small white rectangle in the bottom left corner of the plot). This approach loses at least two kinds of information:

  1. the additional data that some respondents would be happy to provide if researchers asked them additional questions or left questions open-ended (the fat “head” under the upper part of the curve above the white rectangle);
  2. the partial data that some respondents would provide if researchers had a meaningful way of using incomplete responses, which are usually thrown out or, at best, used to assess whether attrition from the study was random or not (this is the long “tail” under the part of the curve to the right of the white rectangle). A toy sketch of both kinds of lost information follows this list.
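
To make the head-and-tail idea concrete, here is a toy simulation of my own (not from the paper): each hypothetical respondent is willing to answer some number of items drawn from an arbitrary heavy-tailed distribution, but the traditional survey only keeps the answers of people who complete all of a fixed twenty-question form. The population size, survey length, and distribution are all made up for illustration.

```python
# Toy illustration of the "head" and "tail" of lost survey information.
# This is my own sketch, not Salganik and Levy's model: contribution
# levels come from an arbitrary heavy-tailed distribution.
import numpy as np

rng = np.random.default_rng(0)

n_people = 10_000          # hypothetical population of potential respondents
survey_length = 20         # questions on the traditional, fixed-form survey

# Number of items each person would willingly answer (heavy-tailed).
willing = rng.pareto(a=1.2, size=n_people).astype(int) + 1
willing = np.sort(willing)[::-1]                # rank-order, largest first

total_info = willing.sum()                      # area under the whole curve

# Traditional survey: only people willing to finish all `survey_length`
# questions count, and only their first `survey_length` answers are kept.
completers = willing >= survey_length
rectangle = completers.sum() * survey_length    # the white rectangle

# Lost "head": extra answers completers would have given beyond the form.
head = (willing[completers] - survey_length).sum()
# Lost "tail": partial answers from people who would never have finished.
tail = willing[~completers].sum()

for label, value in [("captured by survey", rectangle),
                     ("lost head", head),
                     ("lost tail", tail)]:
    print(f"{label}: {value / total_info:.1%} of all potential answers")
```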

Salganik and Levy go on to argue that many wiki-like systems and other sorts of “open” online aggregation platforms that do not filter contributions before incorporating them into some larger information pool illustrate ways in which researchers could capture a larger proportion of the data under the curve. They then elaborate some statistical techniques for estimating public opinion from the subset of information under the curve and detail their experiences applying these techniques in collaboration with two organizations (the New York City Mayor’s Office and the Organisation for Economic Co-operation and Development, or OECD).

If you’re not familiar with matrix algebra and Bayesian inference, the statistical part of the paper probably won’t make much sense, but I encourage anyone interested in collective intelligence, surveys, public opinion, online information systems, or social science research methods to read the paper anyway.
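
For readers who want a rough feel for the estimation problem without the full machinery: a wiki survey collects pairwise votes (“which of these two ideas do you prefer?”), and the statistical model has to turn piles of those votes into a score for each idea. The sketch below is not the paper’s model; it is a much simpler Bradley-Terry fit via the standard minorization-maximization updates, applied to made-up toy data, but it shows the basic shape of the problem.

```python
# A deliberately simple stand-in for the paper's estimator: a Bradley-Terry
# model fit with minorization-maximization updates (Hunter, 2004). This is
# NOT the hierarchical Bayesian model Salganik and Levy describe; it only
# illustrates how pairwise votes can become per-idea scores.
from collections import defaultdict

def bradley_terry(votes, n_items, iters=200):
    """Estimate one score per item from pairwise votes [(winner, loser), ...]."""
    wins = defaultdict(int)    # how many comparisons each item won
    pairs = defaultdict(int)   # how many times each unordered pair was shown
    for winner, loser in votes:
        wins[winner] += 1
        pairs[frozenset((winner, loser))] += 1

    score = [1.0] * n_items
    for _ in range(iters):
        new = []
        for i in range(n_items):
            denom = 0.0
            for pair, n in pairs.items():
                if i in pair:
                    j = next(k for k in pair if k != i)
                    denom += n / (score[i] + score[j])
            new.append(wins[i] / denom if denom else score[i])
        mean = sum(new) / n_items
        score = [s / mean for s in new]   # keep scores on a common scale
    return score

# Made-up toy votes: idea 0 beats 1 twice, 1 beats 2 once, 2 beats 0 once.
print(bradley_terry([(0, 1), (0, 1), (1, 2), (2, 0)], n_items=3))
```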

Overall, I think Salganik and Levy have taken an incredibly creative approach to a very deeply entrenched set of analytical problems that most social scientists studying public opinion would simply prefer to ignore! As a result, I hope their work finds a wide and receptive audience.

When science fails

November 13, 2011

I just read this short piece by Richard Van Noorden in Nature about the rising number of retractions in medical journals over the past five years and it got me thinking about the different ways in which researchers fail to deal with failure (the visualizations that accompany the story are striking).

Photo: Esther Vargas, 2008 (CC BY-NC-SA)

The article identifies two potential causes behind the retraction boom: (1) increased access to data and results via the Internet, which facilitates error discovery; and (2) the creation of oversight organizations charged with identifying scientific fraud (Van Noorden points to the US Office of Research Integrity in the DHHS as an example). It occurred to me while reading this that a third, complementary cause could be the political pressure exerted on universities and funding agencies as a result of growing hostility towards publicly funded research. In the face of such pressure, self-policing would seem more likely.

Apparently, the pattern goes further and deeper than Van Noorden is able to discuss within the confines of such a short piece. This Medill Reports story by Daniel Peake from last year has a graph of retractions that goes all the way back to 1990, showing that the upturn has been quite sudden.

All of these claims about the causes of retractions are empirical and could (and should) be tested to some extent. The bigger question, of course, remains: what to do about the reality of failure in scientific research? As numerous people have already pointed out, in an environment where publication serves as the principal metric of production, the institutions, organizations & individuals that create research – universities, funding agencies, peer-reviewed journals, academics & publishers – have few (if any) reasons to identify and eliminate flawed work. The big money at stake in medical research probably compounds these issues, but that doesn’t mean the social sciences are immune. In fields like Sociology or Communication where the stakes are sufficiently low (how many lives were lost in FDA trials because of the conclusions drawn by that recent AJS article on structural inequality?), the social cost of falsification, plagiarism, and fraud remains insufficient to spur either public outrage or formal oversight. Most flawed social scientific research probably remains undiscovered simply because, in the grand scheme of policy and social welfare, this research does not have a clear impact.

Presumably, stronger norms around transparency can continue to provide enhanced opportunities for error discovery in quantitative work (and I should have underscored earlier that these debates are pretty much exclusively about quantitative work). In addition, however, I wonder if it might be worth coming up with some other early-detection and response mechanisms. Here are some ideas I started playing with after reading the article:

Adopt standardized practices for data collection on research failure and retractions. I understand that many researchers, editors, funders, and universities don’t want the word to get out that they produced/published/supported anything less than the highest quality work, but it really doesn’t seem like too much to ask that *somebody* collect some additional data about this stuff and that such data adhere to a set of standards. For example, it would be great to know if my loose allegations about the social sciences having higher rates of research failure and lower rates of error discovery are actually true. The only way that could happen would be through data collection and comparison across disciplines.
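
To make “adhere to a set of standards” a bit more concrete, here is a minimal sketch of what a standardized retraction record might contain. Every field name below is hypothetical (my invention, not an existing standard), but something this simple would already support the cross-disciplinary comparisons I am speculating about.

```python
# A hypothetical, minimal schema for a standardized retraction record.
# None of these fields come from an existing standard; they only suggest
# the kind of data that would let us compare retraction rates and reasons
# across fields.
from dataclasses import dataclass
from datetime import date

@dataclass
class RetractionRecord:
    doi: str                   # identifier of the retracted publication
    field: str                 # e.g., "medicine", "sociology", "communication"
    journal: str
    published: date
    retracted: date
    reason: str                # e.g., "fabrication", "plagiarism", "honest error"
    discovered_by: str         # e.g., "reader", "replication", "journal audit"
    notes: str = ""

record = RetractionRecord(
    doi="10.0000/example",     # placeholder identifier
    field="sociology",
    journal="Example Journal of Examples",
    published=date(2009, 4, 1),
    retracted=date(2011, 10, 1),
    reason="honest error",
    discovered_by="replication",
)
print(record)
```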

Warning labels based on automated meta-analyses. Imagine if you read the following in the header of a journal article: “Caution! The findings in this study contradict 75% of published articles on similar topics.” In the case of medical studies in particular, a little bit of meta-data applied to each article could facilitate automated meta-analyses and simulations that could generate population statistics and distributions of results. This is probably only feasible for experimental work, where study designs are repeated with greater frequency than in observational data collection.
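
Here is a toy sketch of how such a label could be generated, assuming (hypothetically) that every published result carried machine-readable metadata recording its topic and the direction of its main effect. Both the field names and the data below are made up.

```python
# Toy sketch of an automated "warning label": compare a new finding's
# effect direction against prior studies on the same topic. The metadata
# fields (topic, effect_direction) are hypothetical; no such standard exists.

def warning_label(new_study, prior_studies, threshold=0.5):
    same_topic = [s for s in prior_studies if s["topic"] == new_study["topic"]]
    if not same_topic:
        return "No comparable published studies found."
    contradicted = sum(
        1 for s in same_topic
        if s["effect_direction"] != new_study["effect_direction"]
    )
    rate = contradicted / len(same_topic)
    if rate >= threshold:
        return (f"Caution! The findings in this study contradict "
                f"{rate:.0%} of {len(same_topic)} published articles "
                f"on similar topics.")
    return f"Consistent with {1 - rate:.0%} of prior studies on this topic."

# Made-up example: three of four prior studies found the opposite effect.
prior = [
    {"topic": "drug_x_mortality", "effect_direction": "decrease"},
    {"topic": "drug_x_mortality", "effect_direction": "decrease"},
    {"topic": "drug_x_mortality", "effect_direction": "increase"},
    {"topic": "drug_x_mortality", "effect_direction": "decrease"},
]
new = {"topic": "drug_x_mortality", "effect_direction": "increase"}
print(warning_label(new, prior))
```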

Create The Journal of Error Discovery (JEDi). If publications are the currency of academic exchange, why not create a sort of bounty for error discovery and meta-analyses by dedicating whole journals to them? At the moment, blogs like Retraction Watch are filling this gap, but there’s no reason the authors of the site shouldn’t get more formal recognition and credit for their work. Plus, the first discipline to have a journal that goes by the abbreviation JEDi clearly deserves some serious geek street cred. Existing journals could also treat error discoveries and meta-analyses as a separate category of submission and establish clear guidelines around the standards of evidence and evaluation that apply to such work. Maybe these sorts of practices already happen in the medical sciences, but they haven’t made it into my neighborhood of the social sciences yet.

I’m working on some preliminary research for a study of DailyKos and some of the other political blogs that continue to define the networked public sphere.

In the process, I had to start thinking more seriously about what it means to do an ethnography of this kind of community.  I’ve done some work on this before, so I had a few ideas, but the challenges and scope of Kos are a bit daunting.

As a result, I’m focusing on breaking down an initial assessment of the site into several categories (see below). I then use these to structure my observations and to help define problems that I’ll have to solve later (possibly with more than just a tab-happy browser window, the Internet Archive, and two tired eyeballs).

The main categories are:

  • History and Evolution of the site – including community structure, software, interface, layout, etc.
  • Organizational/Institutional Structure (behind the scenes stuff like money, hosting, contractors, etc.)
  • The Current Community
    • social network topology (i.e. is it just another bow-tie? see the sketch after this list)
    • practices, norms, governance, etc.
    • signs of life off-line?
  • Technical Platform & Software
  • Content (production & consumption)
  • Networks and public-sphere functions (linking, SEO, connections to the media, political parties, etc.)
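
The bow-tie question is probably the most readily operationalized item in that list, so here is a rough sketch of how I might check for that structure (in the Broder et al. sense) once I have a directed who-links-to-whom graph. The graph below is a made-up toy; a real version would be built from crawled diary, comment, and blogroll links, probably with something like networkx.

```python
# Rough sketch of a bow-tie decomposition for a directed link graph:
# CORE = largest strongly connected component, IN = nodes that reach the
# core, OUT = nodes the core reaches. The edge list here is hypothetical.
import networkx as nx

def bow_tie(G):
    core = max(nx.strongly_connected_components(G), key=len)
    seed = next(iter(core))
    out_component = nx.descendants(G, seed) - core   # reachable from the core
    in_component = nx.ancestors(G, seed) - core      # can reach the core
    other = set(G) - core - out_component - in_component
    return {"CORE": core, "IN": in_component,
            "OUT": out_component, "OTHER": other}

# Hypothetical toy graph: a small strongly connected core (a, b, c),
# one node feeding into it, one node it feeds out to, one isolated node.
G = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"),
                ("in1", "a"), ("c", "out1")])
G.add_node("isolated")

for component, nodes in bow_tie(G).items():
    print(component, sorted(nodes))
```

The “OTHER” bucket lumps together tendrils, tubes, and disconnected pieces; a fuller decomposition would separate those out.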

Obviously, these overlap a lot and the list can get much more detailed (indeed it does in my notebook). The important stuff – at least the stuff that a lot of smart Internet research has identified as important – seems to be accounted for…