
Wikidata: The Making Of
WWW ’23 Companion, April 30–May 04, 2023, Austin, TX, USA

implemented language-independent identifiers with labels per language. Ideas were gradually converging towards Wikidata.

6 PROJECT PROPOSAL

Based on the long-standing interest in structured data around Wikimedia projects, Danese Cooper, then CTO of the Wikimedia Foundation, convened the Wikimedia Data Summit[1] in February 2011. Tim O’Reilly hosted the summit at the headquarters of O’Reilly in Sebastopol, CA. Invitees included representatives from the Wikimedia Foundation, Freebase (which had been acquired by Google the year prior), DBpedia, and Semantic MediaWiki, as well as R.V. Guha from Google, Mark Greaves from Paul Allen’s Vulcan, and others. Many different ideas were discussed, but a rough consensus emerged among some participants, which prompted Vrandečić to start writing a proposal for what was at first called data.wikimedia.org but would eventually become Wikidata.

The project proposal draft[2] was presented to the community by Vrandečić at Wikimania 2011 in Haifa, Israel. At that event, Qamarniso Ismailova, an administrator of the Uzbek Wikipedia, and Vrandečić met. They married in August 2012. The Q prefix in QIDs, used as identifiers in Wikidata, is the first letter in her name.

Möller made it clear that the Wikimedia Foundation would be, at that point, reluctant to take on a project of this scale. Instead, he identified the German chapter, Wikimedia Deutschland, as a good potential host for the development. Thanks to the ongoing collaboration in RENDER, Pavel Richter, then Executive Director of Wikimedia Deutschland, took the proposal to WMDE’s Board, which decided in June 2011 to accept Wikidata as a new Wikimedia project, provided that sufficient funding would be available.[3] For Richter and Wikimedia Deutschland this was a major step, as the planned development team would significantly enlarge Wikimedia Deutschland and necessitate a sudden transformation of the organization, which Richter managed in the years to come [56].

With the help of Lisa Seitz-Gruwell at the Wikimedia Foundation, they secured €1.3 million in funding for the project: half from the Allen Institute for AI (AI2),[4] and a quarter each from Google and the Gordon and Betty Moore Foundation.[5] During the search for funding, at least one major donor dropped out because the project proposal insisted that the ontology of Wikidata had to be community-controlled, and would be neither pre-defined by professional ontologists nor imported from existing ontologies. Possible funders were also worried that the project did not plan to bulk-upload DBpedia to kick-start the content. Vrandečić was convinced that neither of these requirements would have had a positive effect on the organic growth of the community. Convinced that the project would fail as a result, those funders dropped out.

7 EARLY DEVELOPMENT AND LAUNCH

Figure 4: Initial Wikidata development team from April 2013; from left to right: John Blad, Abraham Taherivand, Tobias Gritschacher, Henning Snater, Jeroen De Dauw, Daniel Kinzler, Markus Krötzsch, Lydia Pintscher, Silke Weber, Denny Vrandečić, Daniel Werner, Katie Filbert, Jens Ohlig

The development of Wikidata began on April 1st, 2012 in Berlin. During the first months of development, the groundwork was laid to allow MediaWiki to store structured data. At the same time, discussions in the Wikimedia communities led people to pin their highest hopes, and biggest concerns, on the Wikidata project and on what it would mean for the future of Wikipedia and the larger Wikimedia Movement. There were fears and fantasies of complete automation of Wikipedia article writing, a forced uniformity and alignment across the different Wikipedia language editions, and the loss of nuance and cultural context in structured data. Fortuitously, Google announced the Knowledge Graph in May 2012, which had a lasting positive impact on the interest in Wikidata.

Wikidata launched on October 29, 2012. This initial launch was, intentionally, very limited. Users could create new identifiers for concepts (QIDs), label them in many languages, and link them to Wikipedia articles and other Wikimedia pages. Statements were not supported yet, and the collected links and labels were not used anywhere. The first community-created item was about Africa.

One surprisingly contentious aspect was the use of numeric QIDs. Numeric QIDs are still being questioned today, with proponents of readable names arguing that a qualified name such as dbp:Tokyo is easier to understand than Q1490. A major influence on the preference for abstract[6] QIDs was the discussions with Metaweb regarding their experience with Freebase. Moreover, most other online databases, authority files, and ontologies also preferred abstract IDs. De-coupling a concept’s name from its ID can increase stability (since IDs do not change if names do), but studies found that Wikipedia article titles often are rather stable identifiers [20]. More importantly, however, the founders of Wikidata did not want to use an Anglo-centric solution, nor suggest the use of many different language-specific identifiers (such as dbp:東京都) for the same Item.
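The separation of identifier and name described above can be illustrated with a minimal sketch. This is not Wikidata's actual data model or API: the `Item` class, its field names, the fallback rule, and the sitelink keys are hypothetical, chosen only to show an abstract QID carrying per-language labels and links to Wikipedia articles. Only the QID Q1490 and the names Tokyo and 東京都 come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Item:
    """Hypothetical, simplified item: an abstract ID plus per-language data."""
    qid: str                                                   # language-independent identifier
    labels: Dict[str, str] = field(default_factory=dict)       # language code -> label
    sitelinks: Dict[str, str] = field(default_factory=dict)    # site key -> article title

    def label(self, lang: str, fallback: str = "en") -> str:
        # Assumed lookup rule: requested language, then a fallback language,
        # then the bare QID if no label exists at all.
        return self.labels.get(lang, self.labels.get(fallback, self.qid))


# One item, one identifier, labels in several languages.
tokyo = Item(
    qid="Q1490",
    labels={"en": "Tokyo", "ja": "東京都"},
    sitelinks={"enwiki": "Tokyo", "jawiki": "東京都"},  # illustrative keys
)

# Renaming a label leaves the identifier, and anything referring to it, untouched.
qid_before = tokyo.qid
tokyo.labels["en"] = "Tokyo Metropolis"
qid_after = tokyo.qid
```

In this sketch no single language is privileged by the identifier itself: the same `qid` serves readers of every language, and a label can change without invalidating references to the item.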

8 EARLY WIKIDATA (2013–2015)

Wikidata grew. Crucial functionality was added, the community grew, and the content was expanded alongside many initial data modeling discussions. This growth was intentionally managed to be slow and steady, in order to build a healthy project, created and supported by a sustainable community.


  1. https://meta.wikimedia.org/wiki/Data_summit_2011
  2. https://meta.wikimedia.org/wiki/Wikidata/Technical_proposal
  3. https://www.wikimedia.de/wp-content/uploads/2019/10/Beschlüsse-des-8.-Vorstandes.pdf
  4. Wikidata actually triggered the creation of AI2: Paul Allen’s Vulcan Inc. could not legally provide charitable donations without a commercial contract, as required by Wikimedia, leading him to pursue the idea of an AI-related nonprofit that had been discussed for some time (Mark Greaves, personal communication).
  5. https://techcrunch.com/2012/03/30/wikipedias-next-big-thing-wikidata-a-machine-readable-user-editable-database-funded-by-google-paul-allen-and-others
  6. Some rare QIDs do have meaning: https://www.wikidata.org/wiki/Wikidata:Humour
