Predictive Model Identifies Wikipedia Arguments that Will Never Get Resolved

A joint study involving researchers from MIT, the University of Michigan and the Wikimedia Foundation has identified why so many Wikipedia disputes go unresolved and developed predictive tools to help improve editorial deliberations. In a paper presented at the recent ACM Conference on Computer-Supported Cooperative Work and Social Computing, Jane Im from the University of Michigan's School of Information, Amy Zhang and David Karger from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Christopher Schilling from Wikimedia presented a model that predicts, within a week of a dispute's initiation and with 75 percent accuracy, whether a Request for Comment (RfC) will go stale.

Content disputes on the vast general-reference source can be handled in a number of ways. At any point, editors who disagree can open an RfC by writing up a proposal or question on the relevant article talk page and inviting comment from the broader community by posting to various noticeboards. This mechanism was the focus of the research.

Any editor can initiate an RfC and any editor — usually, an experienced one — who didn't participate in the discussion and is considered neutral can close the discussion. After 30 days, a bot automatically removes the RfC template, with or without resolution. RfCs can close formally with a summary statement by the closer, informally due to overwhelming agreement by participants, or be left "stale," meaning removed without resolution.

The researchers compiled a database consisting of 7,316 RfCs from English Wikipedia dating from 2011 to 2017. Those included closing statements, author account information and general reply structure. They also conducted interviews with 10 of the website's most frequent "closers" to better understand their motivations and considerations when resolving a dispute.

In an analysis of the dataset, the researchers found that about 58 percent of RfCs were formally closed. Of the remaining 42 percent, more than three-quarters (78 percent) had no participant activity to informally end the RfC; in other words, a full third of all RfCs in the dataset were left stale.
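Those proportions compose straightforwardly; a quick check (not from the paper's own code) confirms the "full third" figure:

```python
# Composing the reported proportions: 58% formally closed, and 78% of the
# remaining 42% showed no informal resolution activity.
formally_closed = 0.58
stale_share_of_remainder = 0.78

stale_overall = (1 - formally_closed) * stale_share_of_remainder
print(f"stale overall: {stale_overall:.1%}")  # roughly a third of all RfCs
```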

Major issues included "poorly articulated initial statements by inexperienced discussion initiators, lack of interest from third-party experienced Wikipedia editors and excessive bickering or contentiousness during the discussion," according to the paper.

"It was surprising to see a full third of the discussions were not closed," said Zhang, a Ph.D. candidate in CSAIL, in a statement. "On Wikipedia, everyone's a volunteer. People are putting in the work, and they have interest ... and editors may be waiting on someone to close so they can get back to editing. We know, looking through the discussions, the job of reading through and resolving a big deliberation is hard, especially with back and forth and contentiousness. [We hope to] help that person do that work."

The "help" provided by the team came in the form of a machine learning model that predicts whether a given RfC will close or go stale. The model was developed through an analysis of more than 60 features drawn from the RfC text, the Wikipedia page and editor account information. Those features included the number of comments, the maximum and average age of participants' accounts and the spread of those ages, the cognitive tone of the RfC and the sum of participants' edit counts, among many other signals.

When trained and tested on the full dataset, the best model achieved 75 percent accuracy, an improvement of 8 percentage points over a baseline that simply predicts every RfC will be resolved.
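The overall approach — engineer per-RfC features, train a binary classifier, compare held-out accuracy to a majority-class baseline — can be sketched roughly as follows. The feature names and synthetic data below are invented for illustration; the actual study used 60-plus features and real RfC records:

```python
# Hypothetical sketch of a stale-vs.-closed RfC classifier, in the spirit of
# the paper's setup. All features and data here are synthetic placeholders.
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

random.seed(0)

def synth_rfc(stale: bool) -> list[float]:
    """One fake feature vector: [n_comments, mean_account_age_days, total_edits]."""
    base = [5.0, 200.0, 1000.0] if stale else [20.0, 800.0, 5000.0]
    return [b * random.uniform(0.5, 1.5) for b in base]

# About one third stale, echoing the dataset's proportions.
y = [1 if i % 3 == 0 else 0 for i in range(300)]   # 1 = stale
X = [synth_rfc(label == 1) for label in y]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))

# Majority-class baseline: always predict "will be resolved" (label 0).
baseline = y_test.count(0) / len(y_test)
print(f"model accuracy: {acc:.2f}  baseline: {baseline:.2f}")
```

On real, noisier data the gap between model and baseline would naturally be much narrower than on this cleanly separable toy set.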

One day, the researchers predicted, the model could be used by RfC initiators to track a discussion as it unfolds. "We think it could be useful for editors to know how to target their interventions," Zhang said. "They could post [the RfC] to more [Wikipedia forums] or invite more people if it looks like it's in danger of not being resolved."

The model could also be used for other community platforms involving large-scale discussions and deliberations, the researchers noted, such as planning forums for community projects, where participants weigh in on various proposals. As Zhang explained, "People are discussing [the proposals] and voting on them, so the tools can help communities better understand the discussions ... and would [also] be useful for the implementers of the proposals."

As an outcome of their project, the researchers have introduced Wikum, a tool that helps users break down a large threaded discussion into manageable chunks to tag, group and summarize. "The work of a closer is pretty tough," Zhang said, "so there's a shortage of people looking to close these discussions, especially difficult, longer, and more consequential ones. This could help reduce the barrier to entry [for editors to become closers] and help them collaborate to close RfCs."

The paper on the research project is openly available through researcher Jane Im's website.

About the Author

Dian Schaffhauser is a senior contributing editor for 1105 Media's education publications THE Journal and Campus Technology. She can be reached at dian@dischaffhauser.com or on Twitter @schaffhauser.
