Wikidata: The Making Of
WWW ’23 Companion, April 30–May 04, 2023, Austin, TX, USA

Figure 9: Wikidata won the Open Data Award in 2014; from left to right: Sir Nigel Shadbolt, Lydia Pintscher, Magnus Manske, Sir Tim Berners-Lee

In addition to inspiring people’s imagination, it was necessary to support the editors with specialized, large-scale editing tools so that they could create and maintain the vast knowledge graph. The development team focused on the core of Wikidata, and community members stepped up to the task of building these tools around the core. Here too, Manske was chief among them. He created tools such as Mix’n’match,[1] for matching Wikidata Items to entries in other catalogs; Terminator,[2] for gathering translations for Items in missing languages; The Wikidata Game,[3] for answering a question that will result in an edit on Wikidata[4]; and – maybe most importantly – QuickStatements,[5] which significantly lowered the bar for mass edits by non-technical editors.

During this time, it also became apparent that editors needed more support for defining “rules” around the data without losing the flexibility and openness of the project. In July 2015, the Property constraint system was introduced, which enabled editors to specify in a machine-readable way how each of the thousands of Properties should be used.
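As a rough illustration of what a machine-readable rule makes possible, the following Python sketch checks a toy "single value" rule over a list of statements. It is not the actual Property constraint system: the `CONSTRAINTS` table, the `find_violations` helper, the statement tuples, and the item names are all hypothetical, and the sketch only assumes that a date-of-birth Property (P569) should carry at most one value per Item.

```python
from collections import defaultdict

# Hypothetical, simplified declarations: Property ID -> set of rule names.
# Real Wikidata constraints are stored and evaluated differently.
CONSTRAINTS = {
    "P569": {"single_value"},  # date of birth: assume at most one value per Item
}


def find_violations(statements, constraints=CONSTRAINTS):
    """Return (item, property, values) for every single-value violation.

    `statements` is an iterable of (item, property, value) tuples.
    """
    values = defaultdict(list)
    for item, prop, value in statements:
        values[(item, prop)].append(value)
    return [
        (item, prop, vals)
        for (item, prop), vals in values.items()
        if "single_value" in constraints.get(prop, set()) and len(vals) > 1
    ]
```

Because the rule is declared as data rather than hard-coded, a check of this kind can report violations without blocking the edit, which is one way to keep the flexibility described above.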

In September 2015, the initial Wikidata Query tool by Manske had served its purpose as a feasibility study and demo, and the Wikidata Query Service (WDQS) was launched.[6] WDQS is a Blazegraph-based SPARQL endpoint that gives access to the RDF-ized version [16, 21] of the data in Wikidata in real time, through live updates [37]. Its goal is to enable applications and services on top of Wikidata, as well as to support the editor community, especially in improving data quality. Originally, Vrandečić had not planned for a SPARQL query service, as he did not think that any of the available Open Source solutions would be performant enough to support Wikidata. Fortunately, he was wrong, and today the SPARQL query service has become an integral part of the Wikidata ecosystem.[7] In particular, the query service allows for the creation of beautiful, even interactive visualizations directly from a query, such as maps, galleries, and graphs (see Figure 10). The service supports federation with other query endpoints, and allows for downloading the results in various formats. In the 2020s, however, the query service has started to become a bottleneck, as the growth of Wikidata has outpaced the development of Open Source triplestores [75].
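To make the idea of a SPARQL endpoint concrete, here is a minimal Python sketch of how a client might query WDQS using only the standard library. Several details are assumptions not stated in the text: the endpoint URL `https://query.wikidata.org/sparql`, the `format=json` parameter, the example query (Items that are an instance of, P31, house cat, Q146), and the helper names `build_request` and `run_query`.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public WDQS endpoint; the text above does not give its URL.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: up to five Items that are an instance of (P31) house cat (Q146).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_request(query, endpoint=WDQS_ENDPOINT):
    """Build a GET request for a SPARQL query without sending it."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # A descriptive User-Agent is good practice for shared services.
            "User-Agent": "wdqs-sketch/0.1 (illustrative example)",
        },
    )


def run_query(query):
    """Send the query over the network and return the result bindings."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling `run_query(EXAMPLE_QUERY)` performs a live request and, in the standard SPARQL JSON results format, should return a list of dictionaries keyed by the selected variable names (here `item` and `itemLabel`). Separating `build_request` from `run_query` keeps the URL construction inspectable without network access.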

Figure 10: Example query visualizations from WDQS: locations of movie narratives (above), and a timeline of space discoveries (below)

9 TEENAGE WIKIDATA (2015–2022)

In 2016, Google closed down Freebase and helped with the migration of the data to Wikidata [50]. The Wikidata community picked up the data carefully and slowly, and ensured that the influx of data would not push beyond their capacity to maintain it.

While Wikidata was always imagined to be useful outside of Wikipedia, its development had started out with the focus of providing a backbone for Wikipedia. This very soon expanded to the other Wikimedia projects, as well as to data consumers outside of Wikimedia looking for general-purpose data. But that is not the only expansion that Wikidata went through.

In early 2018, Wikidata was extended to also be able to cover lexicographical data, in order to provide a machine-readable dictionary. In late 2018, Wikimedia Commons was enhanced with the ability to record machine-readable data about its media files, based on Wikidata’s concepts and technology.

The newest wave of expansion is the Wikibase Ecosystem, where groups and organizations outside Wikimedia use the underlying software that powers Wikidata (called Wikibase[8]) to run their own knowledge graphs, which are often highly inter-connected with Wikidata and other Wikibase instances, as well as other resources on the Linked Data Web.

10 OUTLOOK

The first ten years of Wikidata are just its beginning. There is hopefully much more to come, but also much more still to do. Indeed, even the original concept has not been fully realized yet. The initial Wikidata proposal (Section 6) was split into three phases: first sitelinks, second statements, third queries. The third phase, though, has not yet been realized. It was planned to allow the community to define queries, to store and visualize the results in Wikidata, and to include these results in Wikipedia. This would have served as a forcing function to increase the uniformity of Wikidata’s structure.


  1. https://mix-n-match.toolforge.org
  2. https://wikidata-terminator.toolforge.org
  3. https://www.wikidata.org/wiki/Wikidata:The_Game
  4. Example questions include “Is this Wikipedia article about a human?” or “Is this image a good representation of this concept?”
  5. https://quickstatements.toolforge.org
  6. https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/thread/N2HPRCYIWGLM2IDTNCHQLNY574H5ZEQR/
  7. Malyshev et al. have curated a large, freely available dataset of anonymized WDQS queries and analyzed practical usage [37].
  8. https://wikiba.se
