Assessing the accuracy and quality of Wikipedia entries compared to popular online encyclopaedias/Section 3


3. Research Methodology

Figure 3.1 below depicts the research methodology employed in the study. In summary, this consisted of 31 experts (academics and doctoral students) reviewing two pairs of articles each in their area of expertise and in their native language. The languages selected for the purpose of this study were English, Spanish and Arabic. The rationale for selecting the same is mentioned in section 3.1 below. The academic areas of expertise selected for the purpose of this study were (a) Humanities (b) Social Sciences (c) Mathematics, Physics and Life Sciences and (d) Medical Sciences. The rationale for selecting these four academic areas to classify both articles and the reviewers' areas of expertise, was that they correspond with the four main academic divisions at the University of Oxford, which is where this study was carried out. Further details on each aspect of the methodology are described in the sections that follow.

Fig. 3.1 Flowchart of research methodology.

3.1 Selection Criteria

3.1.1 Selection of Languages

As of July 2012, there were 285 different language versions of Wikipedia[1]. Three of the most popular world languages were included for the purpose of this study, based firstly on their popularity in terms of numbers of native speakers[2] and secondly in terms of numbers of Wikipedia articles[3], with the intention of choosing those with potential for a wide reach.

The top five world languages in order by numbers of native speakers were found to be Mandarin (Standard Chinese), Spanish, English, Hindi-Urdu and Arabic. These appear in the list of number of articles per language version of Wikipedia ordered as follows: English, Spanish, Chinese, Arabic and then Hindi-Urdu. The Chinese Wikipedia was found to be heavily censored and was therefore excluded as it would possibly confound the research results[4]. The three languages selected at the end of this process were:

  1. English: The de facto language in the UK, Australia, USA, UAE and Malaysia and the unifying language for countries such as Bangladesh, Botswana, India, Hong Kong, Pakistan, Philippines and Tanzania.
  2. Spanish: The official language of Spain, as well as the de facto or de jure language of a large number of countries in Latin America, among them: Mexico, Argentina, Bolivia, Chile, Colombia, Ecuador, Paraguay and Venezuela. In addition, Spanish is the predominant language in Equatorial Guinea, Africa.
  3. Arabic: The official language of a large number of countries across the Middle East and North Africa, among them: Bahrain, Egypt, Kuwait, Oman, Qatar, Saudi Arabia, Algeria and Tunisia. Modern Standard Arabic is based on Classical Arabic and is the literary language used in most current, printed Arabic publications and spoken by the Arabic media.

These languages offer a range of numbers of total articles and average edits per article for Wikipedia, as shown in Table 3.1 below:

Language | Ranking by total number of Wikipedia articles | Total number of articles | Average number of edits per article (2 s.f.)
English | 1st | 4,003,764 | 136
Spanish | 7th | 904,461 | 68
Arabic | 25th | 186,414 | 58

Table 3.1 Characteristics of Wikipedia articles in each of the three study languages.[5]

3.1.2 Selection of Comparison Encyclopaedias in Each of the Languages

The criteria for the selection of the comparison encyclopaedia in each of the three languages were as follows:

Essential Criteria:

  1. The encyclopaedia should be available online.
  2. The encyclopaedia should be a popular choice among the native speakers of that language.
  3. The encyclopaedia should cover a broad range of articles within each specific discipline.
  4. The encyclopaedia should contain articles of reasonable length on each of the topics selected as per the reviewers' academic area of expertise, i.e. at least 1.5 pages in length.

Preferable Criteria:

  1. The encyclopaedia's articles should seem complete when read through by a native speaker of the language.
  2. The encyclopaedia's articles should contain links at the bottom of the articles to enable the user to access further information if required.

The selection of each encyclopaedia was based on the availability, quality and length of its articles. The selection was carried out by the research team, in each case in consultation with a native speaker of the relevant study language (postgraduate students at the University of Oxford). The selection of comparative encyclopaedias for the study was made independently of the opinions of the research team at the Wikimedia Foundation. This was done in order to increase the robustness of the study design by eliminating any potential biases in the selection of the alternative encyclopaedias for comparison.

The following encyclopaedias were selected:

Encyclopaedia Britannica:

For the English language, the alternative encyclopaedia selected was the online home version of Encyclopaedia Britannica. As well as being the oldest English-language encyclopaedia, it was the encyclopaedia originally chosen by Nature to compare with Wikipedia[6]. Britannica was founded in 1768, in Edinburgh, Scotland, and has grown continuously since then, with offices in London, New Delhi, Paris, Seoul, Sydney, Taipei and Tokyo. The ownership of Britannica passed to two Americans in the 1930s and, since then, the company's headquarters has been in Chicago. Britannica was an early leader in digital publishing. In 1981, the first digital version of the Encyclopaedia Britannica was created for the Lexis-Nexis service. This has been described as possibly the first digital encyclopaedia in the world. As personal computers grew in number in the mid-1980s, Britannica went on to produce the first multimedia CD-ROM encyclopaedia in 1989. In 1994, Britannica Online, the first encyclopaedia on the Internet, was introduced[7].


Enciclonet:

Enciclonet was selected as the alternative encyclopaedia in Spanish. Enciclonet is an online project based on the Universal Encyclopaedia and developed by Micronet equipment. It is described as the first online general encyclopaedia in Spanish. It was selected because of its high popularity, its high Alexa traffic rank of 322,628[8] and the comprehensive nature of its articles. The other online Spanish encyclopaedias considered were Enciclopedia Universal en Español (which was not chosen as it could not be accessed in January 2012), Ateneo de Cordoba[9] (which was not chosen as it incorporated Wikipedia articles), Gran Enciclopedia Aragonesa[10] (which was not chosen because it was not found to be as comprehensive as Enciclonet), a little-known encyclopaedia developed by the University of Sevilla[11] and Gran Enciclopedia de Espana[12] (which was not found to be as comprehensive as Enciclonet).

Mawsoah and Arab Encyclopaedia:

Mawsoah[13] was selected to be the alternative encyclopaedia of choice in Arabic for the social sciences and medical sciences. Arab Encyclopaedia[14] was selected as the alternative Arabic encyclopaedia for mathematics, physics and life sciences. Due to extreme difficulty encountered in finding an online Arabic encyclopaedia to meet all four essential criteria, it was decided to select the best encyclopaedia choices for each academic discipline as there appeared to be a substantial segregation of encyclopaedias by discipline.

Mawsoah was selected because it has 150,000 articles and its articles appear to be comprehensive and have good categorisation. Arab Encyclopaedia was chosen because it appeared to have the highest traffic amongst the other alternative online encyclopaedias and has hyperlinks embedded into articles. Unlike Mawsoah, however, Arab Encyclopaedia's articles are authored by a single person. In addition, it is extremely important to highlight that neither Mawsoah nor Arab Encyclopaedia covered all academic disciplines to the same extent, even for basic articles and articles on key concepts.

The other option considered for Arabic encyclopaedias was Dahsha[15], a Saudi Arabian encyclopaedia with high traffic. However, on exploring this option further, Dahsha did not appear to have the same coverage of topics as either Mawsoah or Arab Encyclopaedia.

3.2 Sampling

3.2.1 Sampling of Expert Reviewers

Step 1: Selection of student reviewers

Student reviewers were recruited from the University of Oxford. All were postgraduate students who were either currently studying for, or had recently completed, a masters or doctoral degree. Initially, 116 students were identified as potential reviewers in order to cover the full range of academic disciplines and native languages selected for the study. Of these, 12 were finally invited to participate and a further 12 were identified as back-ups. Each selected student was asked to provide biographical information in terms of educational qualifications, area of expertise and current academic focus (see Appendix I (2)).

Step 2: Identification and recruitment of established academics

Student reviewers identified academic experts known to them in their own areas of academic expertise. Criteria for nomination were as follows:

Essential Criteria:

  1. Each academic expert must have a higher educational qualification, preferably a PhD.
  2. The academic expert must have demonstrated their academic status by having a permanent post at a highly rated department within a well-established University.
  3. The academic expert should have worked closely with the student and have overlapping areas of research interests.
  4. The academic expert should be fluent in the student's native language.

Desirable Criteria:

  1. The academic and the student should share the same native language.
  2. They should have a number of publications in peer-reviewed journals, or be a leading investigator on a large-scale, funded project.

Each student was asked to nominate three academic experts and to provide contact details and a brief biography for each of the three nominees. The list of nominees was reviewed by the research team to ensure they were eligible for participation. In the rare cases where the academic did not have a PhD, students were asked to nominate another academic in their stead. The final list of nominated academic experts totalled 33, of whom 22 accepted the invitation from the project team to participate.

Step 3: Completion of review using online feedback tool

Reviewers were asked to review articles in their native language and relating to their area of academic expertise using an online review tool specially designed for the purpose of the study. Of each pair of articles, one article was a Wikipedia entry on the topic and the other was an article on the same topic from the alternative online encyclopaedia for that language. Reviewers were not aware of the source of the articles and were asked to make no efforts to identify the same. All cues as to the source of the article were eliminated before the students viewed the article. This was carried out during the standardisation and anonymisation process, the details of which are described in section 3.4.2. Reviewers were asked to comment on the quality, accuracy, citability and style of each of the articles as well as on their opinions about the readability of the article and whether the information contained in it was, to the best of their knowledge, up to date. They were also asked to compare both articles within a pair, listing the strengths and limitations of each. Both quantitative and qualitative data were collected and reviewers were asked to confirm that they had made no attempt to identify the source of the articles by completing a declaration at the end of the review. The various dimensions assessed by the online feedback tool developed for the review process are detailed in Section 3.4.1.

3.3 Selection of articles

The selection of reviewers with strong academic credentials was considered to be paramount in this study, and therefore only after they had been recruited was it appropriate to seek articles that matched their areas of expertise sufficiently well.

A list of keywords for possible articles was drawn up based on the information provided by the students about:

  1. Their area of research and academic expertise.
  2. The nominated academic's area of research and academic expertise.
  3. Areas of overlap between the students' and academics' areas of research and expertise.

As it turned out, it was not always possible to select articles that mapped exactly onto the students' and academics' areas of expertise, as articles for these niche areas did not exist in many encyclopaedias or were found to be incomplete or of inadequate length. The research team therefore embarked on a second phase to select articles of substantial length (≥1.5 pages) that appeared most complete and comprehensive. This resulted in a list of possible articles that was much broader and less specialist than initially sought, and which did not map onto the niche aspects of the academics' expertise. Thus the selection of articles was constrained by two important factors: first, the need to find topics appropriate for the academics whom we were able to recruit to the project; second, the need for articles from different online encyclopaedias to be of comparable substance and focus. (Such factors would need to be taken carefully into account when embarking on a future large-scale study, where the demands of finding large numbers of comparable articles are likely to be considerable.)

Nevertheless, the second phase allowed the compilation of the 22 pairs of articles for review, across three languages and four academic disciplines. The topics of the articles selected for review are listed in Table 3.2.

The selection criteria for the articles listed in Table 3.2 were as follows:

  1. The topic must be related to the academic and research interest of all the reviewers of the article.
  2. Availability of an article on the topic in both Wikipedia and the alternative encyclopaedia of choice.
  3. Length of the article on the topic in both Wikipedia and the alternative encyclopaedia of choice must be ≥1.5 pages when pasted into a MS Word document.
  4. No traces of vandalism in the article (the definition of vandalism is given in Section 3.4.2). Note: This criterion turned out to have no impact on the selection of articles for the present study.
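
The four criteria above amount to a simple filter over candidate topics. The following is a minimal sketch in Python and not the procedure used in the study, where selection was carried out by hand. The `Candidate` record, its field names and the figure of 500 words per MS Word page are all hypothetical, introduced only for illustration; the report defines length in pages, not words.

```python
from dataclasses import dataclass
from typing import Optional

WORDS_PER_PAGE = 500  # assumption: the report does not define a page in words
MIN_PAGES = 1.5       # threshold stated in criterion 3


@dataclass
class Candidate:
    """Hypothetical record describing one candidate topic."""
    topic: str
    matches_reviewer_interests: bool   # criterion 1
    wikipedia_words: Optional[int]     # None if Wikipedia has no article
    alternative_words: Optional[int]   # None if the alternative encyclopaedia has no article
    vandalism_found: bool              # criterion 4


def long_enough(words: Optional[int]) -> bool:
    """True if an article exists and meets the minimum length."""
    return words is not None and words / WORDS_PER_PAGE >= MIN_PAGES


def is_eligible(c: Candidate) -> bool:
    """Apply criteria 1 to 4; criterion 2 is covered by the None checks."""
    return (
        c.matches_reviewer_interests
        and long_enough(c.wikipedia_words)
        and long_enough(c.alternative_words)
        and not c.vandalism_found
    )
```

A topic is kept only if every check passes, which mirrors the way a single short or missing article in either encyclopaedia ruled a topic out.
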

Columns: Humanities | Social Sciences | Mathematics, Physics and Life Sciences | Medical Sciences
ENGLISH: Saint Thomas Aquinas/ Thomas Aquinas; Saint Anselm of Canterbury/ Anselm of Canterbury; Elementary/ primary education; Preschool education; Antibiotic resistance
SPANISH: Cambio Climatico; Energia Renovable; Evo Morales; Hugo Chavez; Número racional
ARABIC: Middle East; Mathematical proof; Parkinson's disease

Table 3.2 Final list of articles for review in each of the three study languages.

3.4 The Review Process

3.4.1 Development of a Feedback Questionnaire to Assess Articles

A feedback questionnaire was constructed following a literature review of current tools available to assess the quality and accuracy of written text. The feedback questionnaire was developed by the team.

It consists of 23 items covering four key dimensions of article quality, as follows:

  1. Intrinsic attributes of quality and accuracy
  2. Temporal attributes
  3. Style
  4. Subjective opinions

A variety of more detailed constructs was assessed under each of these dimensions using a Likert-type (i.e. 1-5) rating scale (see Appendix I (3)). These are listed in Table 3.3. Both qualitative and quantitative information was collected for each dimension.

Reviewers commented on each article within a pair using this feedback tool; that is, each reviewer completed four such assessments, one for each of the four articles.

In addition, reviewers completed a comparative questionnaire after reviewing each pair of articles, in which they were asked to comment on the two articles in the pair in comparison with each other (see Appendix I (4)).

Dimension | Construct | Aspect Assessed
Intrinsic attributes of quality and accuracy | Accuracy/ Validity | Presentation of correct information, factual inaccuracies, errors, misleading statements
Intrinsic attributes of quality and accuracy | Breadth of references | The extent to which the information is well researched and cited
Intrinsic attributes of quality and accuracy | Quality of references | The relevance and importance of the references
Intrinsic attributes of quality and accuracy | Completeness | All aspects of the topic addressed, omission of key facts
Intrinsic attributes of quality and accuracy | Conciseness | Length of the article compared to the information contained in the text, presence of repetition
Intrinsic attributes of quality and accuracy | Coherence | Coherence between different sections of the text
Intrinsic attributes of quality and accuracy | Relevance | Extent of relevance of the information to the topic, presence of digressions
Intrinsic attributes of quality and accuracy | Neutrality | Unbiased and objective nature of the information; acknowledgement of controversies and/ or gaps in knowledge
Temporal attributes | Currency | Information is up to date based on the reviewer's knowledge
Style | Writing style | Use of clear and appropriate language; spelling and grammatical accuracy, use of punctuation
Style | Clarity and organisation | Structure of the article, order in which information is presented, readability
Style | Inclusion of photographs, charts and tables | Inclusion of photographs, charts and tables and their contribution to an understanding of the text
Subjective opinions | Enjoyment | The extent to which the reviewer enjoyed reading the article
Subjective opinions | Citability | The extent to which the reviewer would cite the article in (a) non-academic work (b) academic work
Subjective opinions | Strengths | Key strengths of the article
Subjective opinions | Flaws | Key limitations of the article

Table 3.3 Dimensions and constructs of article feedback questionnaire.
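
To illustrate how ratings against the constructs in Table 3.3 might be recorded and checked, a minimal Python sketch follows. It is not the Moodle-based tool used in the study. The dictionary layout and the function name `dimension_means` are assumptions, and averaging ratings within a dimension is shown purely as an example of a summary, not as the analysis the study performed. Only the dimensions, the construct names and the 1-5 scale come from the text. Strengths and Flaws are left out on the assumption that they were free-text items.

```python
from statistics import mean

# Rated constructs grouped by dimension, as listed in Table 3.3.
CONSTRUCTS_BY_DIMENSION = {
    "Intrinsic attributes of quality and accuracy": [
        "Accuracy/ Validity", "Breadth of references", "Quality of references",
        "Completeness", "Conciseness", "Coherence", "Relevance", "Neutrality",
    ],
    "Temporal attributes": ["Currency"],
    "Style": [
        "Writing style", "Clarity and organisation",
        "Inclusion of photographs, charts and tables",
    ],
    "Subjective opinions": ["Enjoyment", "Citability"],
}


def dimension_means(ratings):
    """Average one article's 1-5 ratings within each dimension.

    `ratings` maps construct name to an integer score. A missing construct
    raises KeyError; a score outside 1-5 raises ValueError.
    """
    result = {}
    for dimension, constructs in CONSTRUCTS_BY_DIMENSION.items():
        scores = []
        for construct in constructs:
            score = ratings[construct]
            if score not in range(1, 6):
                raise ValueError(f"{construct}: {score!r} is outside the 1-5 scale")
            scores.append(score)
        result[dimension] = mean(scores)
    return result
```

Validating each score before averaging keeps out-of-range entries from silently distorting a dimension's summary.
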

The key articles in previous literature that informed the design of the tool used in this study were as follows:

  1. Information Quality Discussions in Wikipedia. Stvilia B., Twidale M. B., Gasser L. and Smith C., 2005
  2. Assessing Information Quality of a Community-Based Encyclopaedia. Stvilia B., Twidale M. B., Smith C. and Gasser L., 2005
  3. Crawford, H. (2001). Encyclopedias. In: R. Bopp, L. C. Smith (Eds.), Reference and information services: an introduction (3rd ed.). (pp. 433-459). Englewood, CO: Libraries Unlimited
  4. Gasser, L., Stvilia, B. (2001). A new framework for information quality. Technical report ISRN UIUCLIS--2001/1+AMAS. Champaign, IL: University of Illinois at Urbana Champaign
  5. Critical Appraisal Skills Programme (CASP) Making sense of evidence: 10 questions to help you make sense of qualitative research
  6. Harnessing the Wisdom of Crowds in Wikipedia: Quality Through Coordination. Kittur A., Kraut R. E. Proceedings of the 2008 ACM conference on Computer supported cooperative work
  7. Measuring article quality in Wikipedia: models and evaluation. Hu M., Lim E., Sun A., Lauw H. W. and Vuong B. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

3.4.2 Standardisation and Anonymisation Protocol

A standardisation and anonymisation protocol was drawn up to ensure that all cues as to the source of the articles were removed. This included the removal of particular formatting patterns such as the article tree at the beginning of Wikipedia articles, special in-text references and internal links and the names of the article's authors.

Fig. 3.4 summarises the steps in the standardisation and anonymisation process. All standardisation and anonymisation was conducted by three researchers, native speakers of English, Spanish and Arabic respectively, who were not part of the review panel of the study.

Step 1: Reading of article to identify vandalism

After pasting the article into an MS Word document, standardisers were asked to read through the article to identify any vandalism (this was of particular importance for Wikipedia entries, which are open to editing by any user). Vandalism was defined as any addition, removal or change of content in a deliberate attempt to compromise the integrity of the article[16]. Examples of typical vandalism are adding irrelevant obscenities and crude humour to a page, illegitimately blanking pages and inserting obvious nonsense into a page. No instances of vandalism were detected in any of the articles for the present study, either by standardisers or reviewers.

Figure 3.4 Summary of standardisation and anonymisation protocol.

Step 2: Standardisation Process

Article Text:

All articles selected from Wikipedia and from other popular online alternative encyclopaedias then underwent a process of standardisation to remove visible cues as to the source of the article. This included the conversion of all article text to black Arial font with specified font sizes for the title (16 Bold), Headings (14 Bold) and Sub-headings (10). All text was single spaced and aligned to the left.

Supporting Material:

Any supporting material (e.g. photographs, flow-charts and plots) was pasted at the end of the document section in which it appeared, one item after the other, in order of appearance in the text. Each item was resized to 5cm x 5cm, and captions were pasted beneath the corresponding pictures in Arial font, size 10. In cases where the picture lacked a caption, one was not added.

References and Links:

References at the end of the text were maintained in a standard list format, in black Arial font (size 8). All hyperlinks were removed from reference lists, and any 'notes' at the end of Wikipedia entries were placed under references and formatted accordingly. All '^ abcde' markers were deleted from the references where they occurred.

For articles from the alternative encyclopaedias, a heading entitled 'Additional Information (from links)' [Arial font, black, bold, size 14] was created at the bottom of the text of the primary article in the MS Word document. All articles under the assorted references sections were read through to confirm that they were not already covered in the text of the primary article. Articles whose topics were not included in the primary article were pasted into the 'Additional Information (from links)' section in the order in which they appeared in the primary article, under sub-sections named after the title of the link and formatted as per the instructions given under Article Text above. This procedure was not carried out for Wikipedia articles.

Step 3: Anonymisation Process

All articles then underwent a process of anonymisation to remove visible cues as to the source of the article. This included the following steps:

  1. Wikipedia articles were read by the researchers to identify potential acts of vandalism as mentioned in Step 1.
  2. Conversion to a standardised basic text format as mentioned in Step 2.
  3. Removal of cues:
    1. Certain characteristic cues such as the article tree in a Wikipedia entry, content warnings (such as the 'article has multiple issues' box at the top of a Wikipedia entry), calls for donations, etc. were removed.
    2. Block quotes in Wikipedia entries were formatted from italics to regular text in Arial font (Colour: black, Font Size: 10).
  4. In-text references were maintained but hyperlinks, author names and affiliations were removed. The removal of authors' names was clearly essential in order to avoid making the origin of a particular article obvious to the reviewer, as indeed was the removal of the article tree from Wikipedia articles, because this information gave clear indications of the identity of the encyclopaedias.
  5. All 'See Also', 'Related Articles', 'External links', links to user ratings [Wikipedia], and 'links', 'related articles', 'share', 'like', 'get involved' features [Britannica], were removed. An example of an article, standardised and anonymised according to this process and ready for review, is presented in Appendix I (6).
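
The cue-removal steps above were carried out by hand in MS Word. Purely as an illustration, the Python sketch below shows how the removal of cue sections and hyperlinks could be expressed in code. It assumes, hypothetically, that an article has already been split into (heading, body) pairs. The names `CUE_HEADINGS` and `remove_cues` are invented for this example, and the URL pattern is a simple approximation, not a complete matcher.

```python
import re

# Section headings and features named in step 5, compared case-insensitively.
CUE_HEADINGS = {
    "see also", "related articles", "external links",
    "links", "share", "like", "get involved",
}

# Approximate pattern for bare hyperlinks (an assumption, not exhaustive).
URL_PATTERN = re.compile(r"https?://\S+")


def remove_cues(sections):
    """Drop cue sections and strip hyperlinks from the remaining text.

    `sections` is a list of (heading, body) tuples in reading order.
    """
    kept = []
    for heading, body in sections:
        if heading.strip().lower() in CUE_HEADINGS:
            continue  # the whole section is a cue to the source
        body = URL_PATTERN.sub("", body)
        body = re.sub(r"[ \t]{2,}", " ", body).strip()  # tidy gaps left by removed links
        kept.append((heading, body))
    return kept
```

Author names and affiliations are not handled here, since identifying them reliably requires the kind of human reading that the protocol relied on.
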

3.4.3 Development of the Online Review Tool

The articles and the article feedback questionnaires were uploaded onto an online review tool created using Moodle. Moodle is a Course Management System, also known as a Learning Management System or a Virtual Learning Environment. It is a free open source web application that educators can use to create effective online learning sites.

The objective of the online review tool was to:

  1. provide an online platform for the experts to view, read and rate the pairs of articles accurately and easily and to make the review an enjoyable experience
  2. facilitate easy collection of both quantitative and qualitative data for the purpose of data analysis

A username and password were generated for each reviewer, enabling them to log into their account online and perform the following operations:

  1. Consent to participate in the study.
  2. Read the instructions for the review.
  3. Access, view and read each article within a pair.
  4. Comment on each article individually.
  5. Comment on the two articles within a pair in comparison with each other.
  6. Confirm that he/ she has completed the review himself/ herself and declare that he/ she has not made any attempt to identify the source of the articles.

  1. Wikipedia (2012) Lists of Wikipedias, [Online], Available at: [Accessed 12/07/12].
  2. Wikipedia (2011) List of languages by number of native speakers, [Online], Available at: [Accessed 16/04/11].
  3. Pender M, Lasserre L, Kruesi L, Del Mar C, and Anaradha S. 2008. Putting Wikipedia to the Test: A Case Study. Paper presented at the Special Libraries Association Annual Conference, Seattle, June 16.
  4. Wikipedia (2010) Task force/China, [Online], Available at [Accessed 01/07/11].
  5. Wikipedia (2012) Lists of Wikipedias, [Online], Available at: [Accessed 12/07/12].
  6. Giles, J. (2005) 'Internet encyclopaedias go head to head', Nature, vol.438, 15 December 2005, pp. 900-901.
  7. Taken from