Wikimania 2007 Taipei :: a Globe in Accord

This page is part of the Proceedings of Wikimania 2007 (Index of presentations)

Using Natural Language Processing to determine the quality of Wikipedia articles

Authors Brian Mingus (University of Colorado at Boulder), Trevor Pincock (University of Colorado at Boulder), Laura Rassbach (University of Colorado at Boulder)
Track Free Content
License GNU Free Documentation License
About the authors
Brian Mingus is an undergraduate in Psychology at the University of Colorado at Boulder. He has worked in the Computational Cognitive Neuroscience Lab since June 2005 as an open-source software (OSS) developer and motor control researcher. He has been involved in Wikipedia since 2003; his first major project was Qwikly, created with Erik Zachte, a service that provided the entire contents of several projects in a format suitable for PDAs.
In terms of size, Wikipedia is growing at an unprecedented pace and already dwarfs several of the most comprehensive encyclopedias combined. This growth has been encouraged by a policy that gives any contributor, including one who wishes to remain anonymous, almost complete editorial control over all content in the encyclopedia. Such open, anonymous access makes it difficult to determine the quality of any of the more than 1.7 million articles in Wikipedia.

Many of these anonymous volunteers have recently come together to form the "Wikipedia Editorial Team." This team has defined several ranked quality classes into which an article can fall: Featured Article (FA, the highest quality), A, Good Article (GA), B, Start and Stub. Although over 750,000 articles have been classified, only about 146,000 fall into the FA (1,364), A (797), GA (1,967), B (28,421) and Start (113,320) classes.
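The class counts above can be checked with a few lines of arithmetic. The Python sketch below totals the five classes for which counts are given and computes each class's share of that total; the names `RATED_COUNTS` and `summarize` are illustrative only and do not come from the paper.

```python
# Article counts per Editorial Team class, copied from the text above.
# Classes are listed from highest to lowest quality; Stub is left out
# because no count is given for it.
RATED_COUNTS = {
    "FA": 1364,
    "A": 797,
    "GA": 1967,
    "B": 28421,
    "Start": 113320,
}


def summarize(counts):
    """Return the total count and each class's percentage share of it."""
    total = sum(counts.values())
    shares = {cls: 100.0 * n / total for cls, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = summarize(RATED_COUNTS)
    print(f"Rated Start or better: {total}")
    for cls, pct in shares.items():
        print(f"{cls:>5}: {pct:5.2f}%")
```

Run as written, it reports a total of 145,869, with Start-class articles making up roughly 78% of that.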

To gain a higher rating, an article must meet a list of increasingly qualitative criteria. For example, the current Featured Article criteria specify that an article must be "well written," "comprehensive," "factually accurate," "neutral," and "stable." Because quality is inherently subjective, it is commonly held that human ratings are necessary to make accurate judgments along these dimensions. But this has proven difficult: there are simply not enough volunteers to make in-depth judgments on every article along every dimension. Instead, raters look at the overall gestalt of an article and make a single judgment: which class does it belong in?

To complement this approach, we have used a Maximum Entropy classifier, with features derived from the field of Natural Language Processing, to determine the quality of Wikipedia articles. We present our findings using dozens of simple features, such as article length and the number of images and paragraphs, as well as more sophisticated features, such as those derived from the PageRank algorithm (Brin & Page, 1998) and from semantic analysis. Based on our research, we believe the best approach to determining article quality is a combination of human ratings and machine classification. Human ratings serve as excellent training data for a machine learning algorithm, and the same algorithms can in turn reverse-engineer the human ratings, revealing what raters mean when they say, for example, that an article is of Featured quality.
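As a rough illustration of the approach described above, the Python sketch below trains a maximum entropy (multinomial logistic regression) classifier on three simple surface features. Everything here is an assumption made for illustration: the feature extractor `extract_features`, its regular expressions and log scaling, the class `MaxEntClassifier`, its training procedure (batch gradient ascent with an L2 penalty) and the toy articles are not taken from the paper, and the PageRank and semantic features are omitted entirely.

```python
import math
import re


def extract_features(wikitext):
    """Hypothetical feature extractor for raw wikitext.

    Returns [bias, log(1 + words), log(1 + images), log(1 + paragraphs)].
    The log scaling keeps very long articles from dominating the weights.
    """
    words = len(wikitext.split())
    images = len(re.findall(r"\[\[(?:Image|File):", wikitext))
    paragraphs = len([p for p in re.split(r"\n\s*\n", wikitext) if p.strip()])
    return [1.0, math.log1p(words), math.log1p(images), math.log1p(paragraphs)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MaxEntClassifier:
    """Multinomial logistic regression (a maximum entropy classifier),
    trained by batch gradient ascent on the mean conditional
    log-likelihood with an L2 (Gaussian prior) penalty."""

    def __init__(self, labels, n_features, learning_rate=0.05, l2=0.01):
        self.labels = list(labels)
        self.learning_rate = learning_rate
        self.l2 = l2
        # One weight vector per class, initialised to zero (uniform model).
        self.weights = [[0.0] * n_features for _ in self.labels]

    def probabilities(self, x):
        """Return P(class | x) for every class, in self.labels order."""
        scores = [sum(w * xi for w, xi in zip(wk, x)) for wk in self.weights]
        return softmax(scores)

    def fit(self, xs, ys, epochs=2000):
        index = {label: k for k, label in enumerate(self.labels)}
        n = len(xs)
        for _ in range(epochs):
            # Start from the gradient of the L2 penalty.
            grad = [[-self.l2 * w for w in wk] for wk in self.weights]
            for x, y in zip(xs, ys):
                p = self.probabilities(x)
                for k in range(len(self.labels)):
                    # Observed minus expected feature values for class k.
                    error = (1.0 if index[y] == k else 0.0) - p[k]
                    for j, xj in enumerate(x):
                        grad[k][j] += error * xj
            for k, wk in enumerate(self.weights):
                for j in range(len(wk)):
                    wk[j] += self.learning_rate * grad[k][j] / n
        return self

    def predict(self, x):
        p = self.probabilities(x)
        return self.labels[max(range(len(p)), key=lambda k: p[k])]


if __name__ == "__main__":
    # Made-up toy articles; real labels would come from Editorial Team ratings.
    stub = "Foo is a village."
    start = "\n\n".join([" ".join(["word"] * 80)] * 2)
    featured = "\n\n".join([" ".join(["word"] * 300)] * 8) + "\n[[Image:Map.png]]"
    xs = [extract_features(t) for t in (stub, start, featured)]
    ys = ["Stub", "Start", "FA"]
    clf = MaxEntClassifier(["FA", "Start", "Stub"], n_features=len(xs[0]))
    clf.fit(xs, ys)
    for label, x in zip(ys, xs):
        probs = dict(zip(clf.labels, clf.probabilities(x)))
        print(label, {c: round(q, 3) for c, q in probs.items()})
```

In a real setting the training labels would be the Editorial Team's human ratings, and inspecting the learned per-class weights is one possible way to see which features a rating such as FA tends to correlate with.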

  • PDF: RassbachPincockMingus07. Note: This paper is a DRAFT. Do not cite it.
Final edit

Full text