Wikis and Online Information Aggregation Systems as a Model for Survey Research
March 4, 2012
Matt Salganik and Karen Levy (both of the Princeton Sociology Department) recently released a working paper about what they call “Wiki Surveys” that raises several important points regarding the limitations of traditional survey research and the potential of participatory online information aggregation systems to transform the way we think about public opinion research more broadly.
Their core insight stems from the idea that traditional survey research based on probability sampling leaves a ton of potentially valuable information on the table. This graph summarizes that idea in an extraordinarily elegant (I would say brilliant) way:
Think of the plot as existing within the space of all possible opinion data on a particular issue (or set of issues). No method exists for collecting all the data from all of the people whose opinions are represented by that space, so the best you – or any researcher – can do is find a way to collect a meaningful subset of that data that will allow you to estimate some characteristics of the space.
The area under the curve thus represents the total amount of information that you could possibly collect with a hypothetical survey instrument distributed to a hypothetical population (or sample) of respondents.
Traditional surveys based on probability sampling techniques restrict their analysis to the subset of data from respondents for whom they can collect complete answers to a pre-defined subset of closed-ended questions (represented here by the small white rectangle in the bottom left corner of the plot). This approach loses at least two kinds of information:
- the additional data that some respondents would be happy to provide if researchers asked them additional questions or left questions open-ended (the fat “head” under the upper part of the curve above the white rectangle);
- the partial data that some respondents would provide if researchers had a meaningful way of utilizing incomplete responses, which are usually thrown out or, at best, used to make estimates about the characteristics of whether attrition from the study was random or not (this is the long “tail” under the part of the curve to the right of the white rectangle).
Salganik and Levy go on to argue that many wiki-like systems and other sorts of “open” online aggregation platforms that do not filter contributions before incorporating them into some larger information pool illustrate ways in which researchers could capture a larger proportion of the data under the curve. They then elaborate some statistical techniques for estimating public opinion from the subset of information under the curve and detail their experiences applying theses techniques in collaboration with two organizations (the New York City Mayor’s Office and the Organization for Economic Cooperation and Development, or OECD).
If you’re not familiar with matrix algebra and Bayesian inference, the statistical part of the paper probably won’t make much sense, but I encourage anyone interested in collective intelligence, surveys, public opinion, online information systems, or social science research methods to read the paper anyway.
Overall, I think Salganik and Levy have taken an incredibly creative approach to a very deeply entrenched set of analytical problems that most social scientists studying public opinion would simply prefer to ignore! As a result, I hope their work finds a wide and receptive audience.