Aggregating the Syntactic and Semantic Similarity of Healthcare Data towards their Transformation to HL7 FHIR through Ontology Matching

Background and Objective: Healthcare systems face multiple challenges in releasing information from data silos, a task that is almost impossible to implement, maintain and upgrade, with difficulties spanning the technical, security and human interaction fields. Currently, the increasing availability of health data demands data-driven approaches, bringing opportunities to automate healthcare-related tasks and providing better disease detection, more accurate prognosis, faster advances in clinical research and better-suited patient management. To share data with as many stakeholders as possible, interoperability is the only sustainable way of letting systems talk to one another and of obtaining the complete picture of a patient. An efficient solution to data exchange incompatibility is therefore of extreme importance. Interoperability can establish a communication framework between systems that cannot otherwise communicate, which can be achieved by transforming healthcare data into ontologies. However, the multidimensionality of the healthcare domain, and the way it is conceptualized, results in the creation of different ontologies with contradicting or overlapping parts. An effective solution to this problem is the development of methods for finding matches among the various components of healthcare ontologies, in order to facilitate semantic interoperability.

Methods: The proposed mechanism aims to achieve healthcare interoperability by transforming healthcare data into the corresponding HL7 FHIR structure. In more detail, it builds ontologies of healthcare data, which are then stored in a triplestore. Afterwards, for each constructed ontology, the syntactic and semantic similarities with the ontologies of the various HL7 FHIR Resources are calculated, based on their Levenshtein distance and their semantic fingerprints, respectively. After these results are aggregated, the matching to the HL7 FHIR Resources takes place, translating the healthcare data into a widely adopted medical standard.

Results: The derived results show that there are cases in which an ontology was matched to one HL7 FHIR Resource on the basis of its syntactic similarity, but to a different HL7 FHIR Resource on the basis of its semantic similarity. Nevertheless, the developed mechanism performed well, since its matching results exactly matched the manual ontology matching results, which are considered a reference of high quality and accuracy. Moreover, to further investigate the quality of the developed mechanism, it was also compared with the Alignment API and with the non-dominated sorting genetic algorithm (NSGA-III), both of which provide ontology alignment. In both cases, the results of all the different implementations were almost identical, indicating the developed mechanism’s high efficiency; the comparison with NSGA-III, however, showed that the developed mechanism needs additional improvements, potentially through the adoption of the NSGA-III technique.

Conclusions: The developed mechanism creates new opportunities in the field of healthcare interoperability. However, according to the mechanism’s evaluation results, it is almost impossible to create syntactic or semantic patterns for understanding the nature of a healthcare dataset. Hence, additional work should be performed in evaluating the developed mechanism and in updating it according to the results of its comparison with similar ontology matching mechanisms and with data of multiple natures.
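
To make the matching step described under Methods more concrete, the following is a minimal Python sketch, not the authors' implementation. It computes a Levenshtein-based syntactic similarity between a dataset term and each HL7 FHIR Resource name, then aggregates it with a semantic similarity score. The abstract does not state how the two scores are aggregated, so the weighted mean used here, the `weight` parameter, the normalization by the longer string, and all function names are assumptions made for illustration. The semantic scores are assumed to have been computed elsewhere (for example, from semantic fingerprints) and are passed in as plain numbers.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def syntactic_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; this normalization is an assumption."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def aggregate(syntactic: float, semantic: float, weight: float = 0.5) -> float:
    """Hypothetical aggregation: a weighted mean of the two similarity scores."""
    return weight * syntactic + (1.0 - weight) * semantic


def best_match(term: str, semantic_scores: Dict[str, float], weight: float = 0.5) -> str:
    """Return the resource name with the highest aggregated similarity to `term`.

    `semantic_scores` maps each candidate resource name to a precomputed
    semantic similarity with `term` (assumed to lie in [0, 1]).
    """
    return max(
        semantic_scores,
        key=lambda resource: aggregate(
            syntactic_similarity(term, resource), semantic_scores[resource], weight
        ),
    )
```

For example, `best_match("patients", {"Patient": 0.9, "Observation": 0.2})` selects `"Patient"`, where the semantic scores are made-up illustrative values. Changing `weight` shifts the decision towards the syntactic or the semantic score, which is the kind of disagreement between the two measures that the Results paragraph reports.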

Contributing Authors