Assessing the accuracy and quality of Wikipedia entries compared to popular online encyclopaedias/Executive Summary
Previous studies, most notably the one carried out by the journal Nature in 2005, have sought to compare the quality of Wikipedia articles with that of similar articles in other online encyclopaedias. In part as a result of the findings of such studies, Wikipedia has instigated a number of processes for assessing the quality of its entries, inviting readers and editors to rate articles according to criteria such as trustworthiness, neutrality, completeness and readability. Recently, Wikipedia's founder Jimmy Wales highlighted the value of conducting a study that analysed articles across both languages and subjects, so that differences in levels of accuracy and quality across language and subject domains could be identified. The results could inform editor recruitment efforts and the design of expert feedback mechanisms.
The size, scope and complexity of such a large-scale study necessitated gathering preliminary evidence to inform its methodology and design. A small-scale pilot project was therefore undertaken to determine a sound research methodology. This report presents the background, methodology, results and findings of that pilot, which was funded by the Wikimedia Foundation and conducted by Epic, a UK-based e-learning company, in partnership with the University of Oxford.
2. Aims and Objectives
The key aims of this pilot study are as follows:
- To explore the opinion of expert reviewers regarding attributes relating to the accuracy, quality and style of a sample of Wikipedia articles across a range of languages and disciplines.
- To compare the accuracy, quality, style, references and judgment of Wikipedia entries as rated by experts to analogous entries from popular online alternative encyclopaedias in the same language.
- To explore the viability of the methods used in respect of the first two aims for a possible future study on a larger scale.
3. Research Methodology
Three languages were selected for study: English, Spanish and Arabic. Pairs of articles in those languages were selected in the following broad disciplinary areas: (a) Humanities, (b) Social Sciences, (c) Mathematics, Physics and Life Sciences and (d) Medical Sciences. Each pair consisted of an article from Wikipedia, and an article from one of a range of comparator online encyclopaedias: Encyclopaedia Britannica (English), Enciclonet (Spanish), Mawsoah and Arab Encyclopaedia (Arabic).
Twenty-four postgraduate students of the University of Oxford were selected to help review pairs of articles and to identify academic experts in their fields who would be recruited to review the same pairs of articles. Thirty-three academic experts were finally recruited. All possessed doctorates and were employed in academic posts at a highly rated department within a well-established university. All students and academic experts were fluent in the target languages.
A feedback tool was devised for eliciting numerical scores and qualitative comments about the articles, which were reviewed blind by the academics, who were asked to certify that they had not sought out the original articles online during the review process. The feedback tool provided academics with a wide range of quality criteria, drawn from extensive previously published research.
Articles were standardised so as to erase information which helped to identify their origins. In addition, checks were carried out to ensure that a particular article was not the victim of vandalism (although this did not impact on article selection for the present study).
Twenty-two articles were selected in all. Some difficulty was encountered in finding articles of sufficient substance and scope in encyclopaedias paired with Wikipedia in different languages.
4. Data Coding and Analysis
Quantitative and qualitative data were analysed through separate processes. Quantitative data analysis was carried out on the sample overall, in relation to each language separately, and in relation to each disciplinary area separately. Data were coded in five main dimensions: i) accuracy, ii) references, iii) style/readability, iv) overall judgment (including citability), v) overall quality score.
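The three levels of quantitative analysis described above (overall, by language, by discipline) can be illustrated with a minimal sketch. It is not the study's actual analysis procedure: the `Review` record, the `mean_scores` helper, the numeric rating scale and the choice of a simple mean are all assumptions made for illustration, and the ratings shown are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple

# The five coding dimensions named in the text.
DIMENSIONS = ("accuracy", "references", "style_readability",
              "overall_judgment", "overall_quality")


class Review(NamedTuple):
    """One expert rating of one article (hypothetical record layout)."""
    source: str       # "Wikipedia" or "Comparator"
    language: str
    discipline: str
    scores: tuple     # one rating per entry in DIMENSIONS


# Hypothetical placeholder ratings; NOT results from the study.
REVIEWS = [
    Review("Wikipedia", "English", "Humanities", (4, 5, 3, 3, 4)),
    Review("Comparator", "English", "Humanities", (3, 2, 4, 3, 3)),
    Review("Wikipedia", "Arabic", "Medical Sciences", (4, 4, 2, 3, 3)),
    Review("Comparator", "Arabic", "Medical Sciences", (4, 3, 4, 3, 4)),
]


def mean_scores(reviews, group_by=None):
    """Average each dimension per (group, source).

    group_by is None for the whole sample, or "language" / "discipline"
    to split the sample along that attribute.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for review in reviews:
        group = getattr(review, group_by) if group_by else "overall"
        for dim, score in zip(DIMENSIONS, review.scores):
            buckets[(group, review.source)][dim].append(score)
    return {
        key: {dim: mean(values) for dim, values in dims.items()}
        for key, dims in buckets.items()
    }


if __name__ == "__main__":
    for grouping in (None, "language", "discipline"):
        print(grouping or "overall", mean_scores(REVIEWS, grouping))
```

Keeping the encyclopaedia source in the grouping key means Wikipedia and comparator means sit side by side at each level, which is the comparison the pilot was designed to make.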
Qualitative analysis was initially carried out blind, and involved the reduction and display of reviewers' comments so that these could be compared with one another, in relation to specific articles, pairs of articles and across the sample as a whole. The qualitative analysis aimed to capture both the opinions of reviewers about specific aspects of the articles, and their overall judgments about each individually and in comparison with the other in the pair.
All of the results outlined below are based on a small sample studied for the purposes of piloting the study's approach and methods, and these results cannot therefore be generalised to the wider output of the online encyclopaedias referred to.
Quantitative results for the articles reviewed show that the Wikipedia articles in this sample scored higher overall than the comparison articles with respect to accuracy, references, style/readability and overall judgment. The scores for the latter item, which includes citability, indicated that none of the encyclopaedias were rated highly by academics in terms of suitability for citation in academic publications.
Results across languages showed that Wikipedia fared well in this sample against Encyclopaedia Britannica in terms of accuracy, references and overall judgment, but no better on style and overall quality score. The same was true of Enciclonet, but the Arabic encyclopaedias scored significantly higher on style than Wikipedia and equally well on the other criteria.
Results across disciplines showed that Wikipedia scored higher in this sample in terms of provision of references in humanities-based articles, but no differences were apparent in terms of the other criteria, as was also the case with articles in mathematics, physics and life sciences. There was a similar result for articles in social sciences, but with higher scores on style/readability for the other encyclopaedias. In medical science articles, Wikipedia scored significantly higher on accuracy, references and overall judgment, but there were no differences on the other criteria.
Qualitative results for this sample showed similar findings, but also revealed the importance to reviewers of articles possessing a sense of cohesiveness and structure. Although many Wikipedia articles in the sample were commented on favourably, they were criticised in some cases for lacking cohesiveness and for internal inconsistencies and repetition. Reviewers were particularly approving of articles that presented an engaging and coherent introduction to a topic, rather than excessive amounts of information.
The same differences seen in the quantitative analysis were evident in the qualitative analysis with respect to different languages. In terms of different disciplines, small differences in favoured quality criteria were evident, such as an emphasis on conciseness in the science-based article reviews.
In many respects, the methodological approach proved productive and workable on the small scale of the present study. However, even on this small scale there were difficulties in identifying appropriate articles, recruiting a sufficient range of reviewers and anonymising articles, and these would possibly prove hard to surmount if the study were carried out on a far larger scale. It is therefore recommended that the viability of a larger study of this kind in the future should be considered cautiously, and that consideration might be given instead to carrying out a series of more compact studies of this kind over time.
It is also recommended that more research might be carried out on what is reasonable and appropriate to expect of online encyclopaedia content. It was clear from this study that, while many academics spoke in positive terms about a high proportion of articles reviewed from all encyclopaedias, it was not the case that they were inclined to regard these as being citable in academic publications alongside peer-reviewed journals and published books. We recommend that more research is done on how users interpret and make sense of content from online encyclopaedias in general and from Wikipedia in particular.
Overall, the Wikipedia articles in this very small pilot sample fared well in comparison with articles from other encyclopaedias. While no generalisations can be made from this outcome, the findings do help to point future studies towards investigation of the unique qualities of Wikipedia as a source of knowledge that was shown, at least in the small number of instances studied here, to be capable of producing articles that were markedly up to date and well referenced.