Structural Analysis of Biomedical Ontologies Center (SABOC)
SABOC is currently funded by the National Cancer Institute of the National Institutes of Health under Award Number R01 CA190779: A family-based framework of quality assurance for biomedical ontologies (P.I. Yehoshua Perl).
Current Project at a Glance: A family-based framework of quality assurance for biomedical ontologies
We are currently developing a family-based Quality Assurance (QA) framework for biomedical ontologies. Ontology QA is critical for increasing the use of ontologies in interdisciplinary research and in electronic health records (EHRs). To improve the efficiency and effectiveness of ontology QA, we are developing computational techniques that identify concepts with a high probability of errors. Biomedical ontologies are large, complex knowledge representation systems that enable the integration of knowledge from different fields. The largest and best-known ontology repository is the National Center for Biomedical Ontology (NCBO) BioPortal, which contains more than 580 biomedical ontologies. However, errors have been discovered in BioPortal's ontologies, and QA of these ontologies has mostly relied on use cases and ad hoc techniques. Our computational techniques will automatically identify sets of concepts with a high likelihood of errors, empowering ontology QA.
The general process of creating an abstraction network from an ontology. We have developed different types of abstraction networks that capture different aspects of an ontology's structure. These abstraction networks are applicable to families of structurally similar ontologies. (a) A subhierarchy of concepts (classes) from an ontology. (b) The abstraction network summarizing (a).
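As a rough illustration of the general process in the figure, the sketch below collapses a concept hierarchy into an abstraction network: concepts are partitioned into groups by some grouping function, each group becomes a summary node, and an IS-A link between concepts in different groups becomes a link between their groups. This is a minimal sketch under our own assumptions, not the Ontology Abstraction Framework's implementation; the function name `derive_abstraction_network`, the `group_of` parameter, and the dictionary input format are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Set, Tuple

# Hypothetical input format: concept -> set of its IS-A parents.
Hierarchy = Dict[str, Set[str]]


def derive_abstraction_network(
    parents: Hierarchy,
    group_of: Callable[[str], Hashable],
) -> Tuple[Dict[Hashable, Set[str]], Set[Tuple[Hashable, Hashable]]]:
    """Summarize a hierarchy by partitioning its concepts into groups.

    Each group becomes one summary node. An edge (A, B) records that some
    concept in group A has an IS-A parent in a different group B.
    """
    # Collect every concept, including roots that only appear as parents.
    concepts = set(parents) | {p for ps in parents.values() for p in ps}
    nodes: Dict[Hashable, Set[str]] = defaultdict(set)
    for concept in concepts:
        nodes[group_of(concept)].add(concept)

    edges: Set[Tuple[Hashable, Hashable]] = set()
    for child, ps in parents.items():
        for parent in ps:
            a, b = group_of(child), group_of(parent)
            if a != b:  # IS-A links inside a group are summarized away
                edges.add((a, b))
    return dict(nodes), edges


# Toy example (hypothetical data): D has two parents, B and C, both children of A.
toy_parents = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
toy_groups = {"A": "g1", "B": "g2", "C": "g2", "D": "g3"}
nodes, edges = derive_abstraction_network(toy_parents, toy_groups.__getitem__)
print({g: sorted(cs) for g, cs in sorted(nodes.items())})
# {'g1': ['A'], 'g2': ['B', 'C'], 'g3': ['D']}
print(sorted(edges))  # [('g2', 'g1'), ('g3', 'g2')]
```

The grouping function is the only thing that changes between kinds of abstraction network in this sketch, which is one way to think about reusing a single derivation procedure across a family of ontologies.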
In past research, we designed QA techniques for individual ontologies and showed that sets of complex and uncommonly classified concepts have significantly higher percentages of errors than other concepts. Our QA is grounded in ontology summaries called Abstraction Networks (AbNs), which we have used to identify error-prone concepts. In this project, we are performing QA for entire families of structurally similar ontologies, and we have identified several important families based on their structural properties. Our hypothesis is that if a classification of concepts yields higher-than-usual error rates in several ontologies of a family F, the same will hold for that classification in most ontologies of F. Our primary test beds are cancer-related ontologies with different properties and purposes, e.g., the National Cancer Institute Thesaurus (NCIt). Several non-cancer ontologies are also being analyzed.
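The family-level hypothesis can be made concrete with a small sketch: for each ontology in a family, compare the error rate of reviewed concepts in a candidate classification (for example, concepts flagged as complex) with the error rate of the remaining reviewed concepts, and list the ontologies where the flagged set is worse. The record type `ReviewedConcept`, the functions `error_rates` and `elevated_in`, and the toy data are hypothetical, and a raw comparison of two proportions is a simplification of how such a study would be evaluated.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ReviewedConcept:
    """One concept from a manual QA review sample (hypothetical record)."""
    flagged: bool    # in the candidate classification, e.g. a "complex" concept
    erroneous: bool  # a reviewer confirmed at least one error


def error_rates(sample: List[ReviewedConcept]) -> Tuple[float, float]:
    """Return (error rate of flagged concepts, error rate of the other concepts)."""
    def rate(group: List[ReviewedConcept]) -> float:
        return sum(c.erroneous for c in group) / len(group) if group else 0.0

    flagged = [c for c in sample if c.flagged]
    others = [c for c in sample if not c.flagged]
    return rate(flagged), rate(others)


def elevated_in(family: Dict[str, List[ReviewedConcept]]) -> List[str]:
    """Names of the family's ontologies where flagged concepts err more often."""
    hits = []
    for name, sample in family.items():
        flagged_rate, other_rate = error_rates(sample)
        if flagged_rate > other_rate:  # raw comparison, no significance test
            hits.append(name)
    return sorted(hits)


R = ReviewedConcept
toy_family = {
    "O1": [R(True, True), R(True, False), R(False, False), R(False, False)],
    "O2": [R(True, False), R(False, True)],
}
print(elevated_in(toy_family))  # ['O1']
```

In a real evaluation, the `>` check would more plausibly be replaced by a significance test on the 2x2 table of flagged/unflagged versus erroneous/correct counts, for example Fisher's exact test as provided by `scipy.stats.fisher_exact`.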
The summarization of NCIt concepts according to their defining semantic relationships. (a) A subhierarchy of NCIt concepts. (b) A summary of the concepts in (a) according to their types of semantic relationships (an "area taxonomy"). (c) A summary identifying the subhierarchies of concepts that are modeled with the same types of semantic relationships (a "partial-area taxonomy").
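The two summaries in the figure can be sketched as follows, assuming (as a simplification) that an area is the set of concepts sharing exactly the same set of semantic relationship types, that a root of an area is a concept with no IS-A parent in that area, and that a partial-area is a root together with the descendants reachable from it without leaving the area. Concepts that fall in more than one partial-area are one plausible reading of the "complex" concepts mentioned above. The function names `area_taxonomy`, `partial_areas`, and `overlapping`, the input format, and the toy data are hypothetical; this is not code from OAF.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def area_taxonomy(rel_types: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Areas: one per distinct set of semantic relationship types."""
    areas: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for concept, types in rel_types.items():
        areas[frozenset(types)].add(concept)
    return dict(areas)


def partial_areas(
    parents: Dict[str, Set[str]],
    rel_types: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each area root to its partial-area (root plus in-area descendants)."""
    area_of = {c: frozenset(t) for c, t in rel_types.items()}
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)

    result: Dict[str, Set[str]] = {}
    for concept, area in area_of.items():
        # A root has no IS-A parent inside its own area.
        if any(area_of.get(p) == area for p in parents.get(concept, set())):
            continue
        members: Set[str] = set()
        stack = [concept]
        while stack:  # depth-first walk that never leaves the area
            c = stack.pop()
            if c not in members:
                members.add(c)
                stack.extend(ch for ch in children[c] if area_of.get(ch) == area)
        result[concept] = members
    return result


def overlapping(pareas: Dict[str, Set[str]]) -> Set[str]:
    """Concepts that belong to more than one partial-area."""
    counts: Dict[str, int] = defaultdict(int)
    for members in pareas.values():
        for c in members:
            counts[c] += 1
    return {c for c, n in counts.items() if n > 1}


# Toy example: D has two parents, B and C, that are both roots of the {r1} area.
toy_parents = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
toy_types = {"A": set(), "B": {"r1"}, "C": {"r1"}, "D": {"r1"}}
pareas = partial_areas(toy_parents, toy_types)
print(sorted(pareas))       # ['A', 'B', 'C']
print(overlapping(pareas))  # {'D'}
```

Here the area taxonomy has two nodes (the empty relationship set and {r1}), while the partial-area taxonomy splits the {r1} area into two partial-areas rooted at B and C, with D shared between them.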
- Identify families of ontologies in the NCBO BioPortal, based on the structure of the ontologies
- Design a unified methodology for deriving abstraction networks for families of ontologies
- Build a software tool, the Ontology Abstraction Framework (OAF), to create abstraction networks for the ontologies in each family
- Investigate classifications that can indicate erroneous concepts in a family of ontologies
- Evaluate our QA methodologies and conduct usability studies of OAF
- Develop additional abstraction-network-based techniques and tools to support ontology development
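To illustrate the first aim, the sketch below assigns ontologies to families by a structural signature. The two features used here (whether any concept has lateral relationship types, and whether any concept has multiple IS-A parents) are assumptions chosen for illustration, not the family definitions used in the project; `OntologyStructure`, `signature`, `group_into_families`, and the toy ontologies are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class OntologyStructure:
    """Minimal structural view of one ontology (hypothetical format)."""
    name: str
    parents: Dict[str, Set[str]] = field(default_factory=dict)    # concept -> IS-A parents
    rel_types: Dict[str, Set[str]] = field(default_factory=dict)  # concept -> relationship types


def signature(o: OntologyStructure) -> Tuple[bool, bool]:
    """Two illustrative structural features (assumed, not the project's definitions)."""
    has_relationships = any(o.rel_types.values())
    has_multiple_inheritance = any(len(ps) > 1 for ps in o.parents.values())
    return has_relationships, has_multiple_inheritance


def group_into_families(ontologies: List[OntologyStructure]) -> Dict[Tuple[bool, bool], List[str]]:
    """Put ontologies with identical structural signatures into the same family."""
    families: Dict[Tuple[bool, bool], List[str]] = defaultdict(list)
    for o in ontologies:
        families[signature(o)].append(o.name)
    return dict(families)


onts = [
    OntologyStructure("X", {"B": {"A"}}, {"B": {"r1"}}),
    OntologyStructure("Y", {"C": {"A", "B"}}),
    OntologyStructure("Z", {"B": {"A"}}, {"B": {"r2"}}),
]
print(group_into_families(onts))
# {(True, False): ['X', 'Z'], (False, True): ['Y']}
```

Under this kind of grouping, an abstraction network type that depends on relationship types would only be derivable for families whose signature includes relationships, which is the intuition behind deriving abstraction networks per family rather than per ontology.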
Software
As part of our research, we are developing several software tools to enable the derivation, visualization, and exploration of summaries of ontologies. The software system supporting our abstraction-network-based studies is named the Ontology Abstraction Framework (OAF), and it is available as free and open-source software. To download the Ontology Abstraction Framework, see our Software page.
Selected Publications
Below is a selected list of the publications most relevant to the family-based QA project. For a complete list of publications associated with this project, see our Publications page.
Utilizing a structural meta-ontology for family-based quality assurance of the BioPortal ontologies. Journal of Biomedical Informatics, 61, 63-76.
Halper, M., Perl, Y., Ochs, C., & Zheng, L. (2017). Taxonomy-Based Approaches to Quality Assurance of Ontologies. Journal of Healthcare Engineering. Accepted for publication.
Ochs, C., Geller, J., Perl, Y., & Musen, M. A. (2016). A unified software framework for deriving, visualizing, and exploring abstraction networks for ontologies. Journal of Biomedical Informatics, 62, 90-105.
Halper, M., Gu, H., Perl, Y., & Ochs, C. (2015). Abstraction networks for terminologies: supporting management of “big knowledge”. Artificial Intelligence in Medicine, 64(1), 1-16.
Ochs, C., Perl, Y., Geller, J., Haendel, M., Brush, M., Arabandi, S., & Tu, S. (2015). Summarizing and visualizing structural changes during the evolution of biomedical ontologies using a Diff Abstraction Network. Journal of Biomedical Informatics, 56, 127-144.
Ochs, C., Perl, Y., Geller, J., Arabandi, S., Tudorache, T., & Musen, M. A. (2017). An Empirical Analysis of Ontology Reuse in BioPortal. Journal of Biomedical Informatics, 71, 165-177.