Page:Wikidata as a knowledge graph for the life sciences.pdf/10

From Wikisource
Jump to navigation Jump to search
This page needs to be proofread.

Feature Article

Science Forum Wikidata as a knowledge graph for the life sciences

Figure 5. Drug repurposing using the Wikidata knowledge graph. We analyzed three snapshots of Wikidata using Rephetio, a graph-based algorithm for predicting drug repurposing candidates (Himmelstein et al., 2017). We evaluated the performance of the Rephetio algorithm on three historical versions of the Wikidata knowledge graph, quantified based on the area under the receiver operator characteristic curve (AUC). This analysis demonstrated that the performance of Rephetio in drug repurposing improved over time based only on improvements to the underlying knowledge graph. Details of this analysis can be found at https://github.com/SuLab/WD-rephetio-analysis (archived at Mayers and Su, 2020). The online version of this article includes the following figure supplement(s) for figure 5: Figure supplement 1. Drug repurposing using the Wikidata knowledge graph, evaluated using an external test set.

knowledge, there are two simple ways to contribute to Wikidata. First, owners of data resources can release their data using the CC0 declaration. Because Wikidata is released under CC0, it also means that all data imported in Wikidata must also use CC0-compatible terms (e.g., be in the public domain). For resources that currently use a restrictive data license primarily for the purposes of enforcing attribution or citation, we encourage the transition to CC0 (+BY), a model that "move[s] the attribution from the legal realm into the social or ethical realm by pairing a permissive license with a strong moral entreaty’ (Cohen, 2013). For resources that must retain data license restrictions, consider releasing a subset of data or older versions of data using CC0. Many biomedical resources were created under or transitioned to CC0 (in part or in full) in recent years ,

Waagmeester et al. eLife 2020;9:e52614. DOI: https://doi.org/10.7554/eLife.52614

including the Disease Ontology (Schriml et al., 2019), Pfam (El-Gebali et al., 2019), Bgee (Bastian et al., 2008), WikiPathways et al., 2018), Reactome (Slenter (Fabregat et al., 2018), ECO (Chibucos et al., 2014), and CIViC (Griffith et al., 2017). Second, informaticians can contribute to Wikidata by adding the results of data parsing and integration efforts to Wikidata as, for example, new Wikidata items, statements, or references. Currently, the useful lifespan of data integration code typically does not extend beyond the immediate project-specific use. As a result, that same data integration process is likely performed repetitively and redundantly by other informaticians elsewhere. If every informatician contributed the output of their effort to Wikidata, the resulting knowledge graph would be far more useful than the stand-alone contribution of any single individual, and it would continually improve in both breadth and depth over time. Indeed, the growth of biomedical data in Wikidata is driven not by any centralized or coordinated process, but rather the aggregated effort and priorities of Wikidata contributors themselves. FAIR and open access to the sum total of biomedical knowledge will improve the efficiency of biomedical research. Capturing that information in a centralized knowledge graph is useful for experimental researchers, informatics tool developers and biomedical data scientists. As a continuously-updated and collaboratively-maintained community resource, we believe that Wikidata has made significant strides toward achieving this ambitious goal. Acknowledgements The authors thank the thousands of Wikidata contributors for curating knowledge, both directly related and unrelated to this work, much of which has been organized under the WikiProjects for Molecular Biology, Chemistry and Medicine. The authors also thank the Wikimedia Foundation for financially supporting Wikidata, and many developers and administrators for maintaining Wikidata as a community resource. Andra Waagmeester is at Micelio, Antwerp, Belgium https://orcid.org/0000-0001-9773-4008 Gregory Stupp is in the Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States https://orcid.org/0000-0002-0644-7212 Sebastian Burgstaller-Muehlbacher is in the Center for Integrative Bioinformatics Vienna, Max Perutz

10 of 15