Quality Assurance of Biomedical Ontologies Using Big Data Approaches
Biomedical ontologies have been used in a range of biomedical informatics applications, from bench experiments to patient care at the bedside, as well as for data integration, knowledge discovery, and the management of biomedical big data. Ontologies are often incomplete, under-specified, and non-static for reasons such as the evolving state of knowledge in a domain, the involvement of manual curation work, and the progressive nature of ontological engineering. Thus, Ontology Quality Assurance (OQA) has become an indispensable part of the ontological engineering lifecycle. However, OQA has been challenged by the lack of systematic and scalable methods necessary to keep pace with the evolution and emergence of ontological systems.
We have developed scalable big data approaches using MapReduce to systematically audit biomedical ontologies (e.g., detecting relation reversals and non-lattice fragments). With OQA methods implemented as massively parallel algorithms in the MapReduce framework, speed-ups of several orders of magnitude were achieved. This big data approach makes it feasible not only to perform exhaustive structural analysis of large ontological hierarchies, but also to systematically track structural changes between different versions of ontologies.
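To make the idea of a non-lattice fragment concrete, the sketch below phrases one possible check in a map/reduce style using plain Python. It is an illustration only, not the implementation described above. It assumes the common definition that a pair of concepts is "non-lattice" when the two concepts share more than one maximal common descendant, and it assumes the hierarchy is given as a child-to-parents dictionary. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def ancestors_or_self(parents, node):
    """Return `node` together with all of its ancestors in the hierarchy."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_concept(parents, concept):
    """Map step: emit ((a, b), concept) for every pair of ancestors-or-self.

    Each emitted record says that `concept` is a common descendant of a and b.
    """
    ups = sorted(ancestors_or_self(parents, concept))
    for a, b in combinations(ups, 2):
        yield (a, b), concept


def reduce_pair(parents, pair, common):
    """Reduce step: keep the pair if it has more than one maximal common descendant."""
    common = set(common)
    maximal = {
        c for c in common
        if (ancestors_or_self(parents, c) - {c}).isdisjoint(common)
    }
    return (pair, sorted(maximal)) if len(maximal) > 1 else None


def non_lattice_pairs(parents):
    """Run the map, shuffle, and reduce steps in-process (assumed toy driver)."""
    concepts = set(parents) | {p for ps in parents.values() for p in ps}
    shuffled = defaultdict(list)  # stands in for the framework's shuffle phase
    for concept in concepts:
        for key, value in map_concept(parents, concept):
            shuffled[key].append(value)
    results = (reduce_pair(parents, k, v) for k, v in shuffled.items())
    return dict(r for r in results if r is not None)


# Toy hierarchy (hypothetical): C and D are each children of both A and B.
toy = {"A": ["R"], "B": ["R"], "C": ["A", "B"], "D": ["A", "B"]}
print(non_lattice_pairs(toy))  # {('A', 'B'): ['C', 'D']}
```

Because each map call depends on a single concept and each reduce call on a single pair, both steps can in principle be distributed across workers, which is the property a MapReduce framework exploits.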
Large-scale Data Integration
Cross-institutional data sharing is crucial for developing and implementing large-scale clinical studies. Identification of patients across multiple institutions is required both for rare disease studies and for other studies that need very large and diverse populations. We have developed an adaptable and flexible cross-cohort query framework for integrating and querying patient data from multiple sources. This framework has been successfully deployed for two ongoing national research resource sharing projects: (1) the National Sleep Research Resource (NSRR); and (2) the Center for SUDEP Research (CSR).
Ontology-guided Health Information Extraction
Electronic information in unstructured or semi-structured form in health and healthcare has been steadily generated for decades. An explosive growth has occurred since the recent adoption of electronic health records (EHRs). Textual health information includes clinical notes recorded in hospitals and health-related information on the web. Such health-related textual data contains an extraordinary amount of underutilized biomedical knowledge. In order to take advantage of such knowledge to facilitate secondary use of EHRs for patient cohort discovery and consumer health information retrieval, we have developed effective ontology-guided methods for automatic extraction of structured information from patient discharge summaries and online consumer health information.
Online Consumer Health Information Retrieval
The Internet provides an important source of consumer health information to patients, caregivers, families, and laypersons. The proliferation of online health information from government agencies, non-profit organizations, for-profit companies, and chat and social networking sites presents a myriad of challenges for information access and retrieval. We have addressed such challenges by (1) providing a multi-topic assignment approach to organizing consumer health information using Formal Concept Analysis; (2) introducing a novel Conjunctive Exploratory Navigation Interface (CENI) for supporting effective consumer health information retrieval and navigation; and (3) evaluating the effectiveness of CENI through a search-interface comparative evaluation using crowdsourcing with Amazon Mechanical Turk (AMT).
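As a rough illustration of how Formal Concept Analysis can support multi-topic organization and conjunctive navigation, the sketch below treats a small set of documents and their assigned topics as a formal context. It is not the CENI implementation. The document identifiers, topic labels, and function names are all hypothetical, and the brute-force enumeration is only suitable for toy inputs.

```python
from itertools import combinations


def extent(context, topics):
    """Documents tagged with every topic in `topics` (a conjunctive query)."""
    wanted = frozenset(topics)
    return frozenset(doc for doc, tags in context.items() if wanted <= set(tags))


def intent(context, docs):
    """Topics shared by every document in `docs`."""
    topic_sets = [set(context[doc]) for doc in docs]
    if not topic_sets:
        # By convention, the empty document set shares all topics.
        return frozenset(set().union(*context.values()))
    return frozenset(set.intersection(*topic_sets))


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of topics.

    Exponential in the number of topics, so intended for small examples only.
    """
    all_topics = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(all_topics) + 1):
        for combo in combinations(all_topics, size):
            docs = extent(context, combo)
            concepts.add((docs, intent(context, docs)))
    return concepts


# Hypothetical context: each document may carry several topics.
toy_context = {
    "d1": {"diabetes", "diet"},
    "d2": {"diabetes", "exercise"},
    "d3": {"diet"},
}

# Selecting two topics at once narrows the result to documents that have both.
print(sorted(extent(toy_context, {"diabetes", "diet"})))  # ['d1']
print(len(formal_concepts(toy_context)))  # 6
```

Each formal concept pairs a set of documents with exactly the topics they share, so moving between concepts corresponds to adding or removing topics from a conjunctive query.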