A String Similarity Evaluation for Healthcare Ontologies Alignment to HL7 FHIR Resources
Abstract
Current healthcare services require health data to be transformed into a common form that respects established standards for data exchange, which raises the need for interoperability. Most techniques developed in this field address only specific one-to-one data transformation scenarios. Among these solutions, the translation of healthcare data into ontologies is considered a step towards interoperability. However, during ontology transformations, different terms are produced for the same concept, which can result in clinical misinterpretations. To avoid this, ontology alignment techniques are used to match different ontologies based on specific string and semantic similarity metrics, yet very little systematic analysis has been performed on which string similarity metrics behave better. To address this gap, in this paper we investigate which string similarity metric is the most efficient, building on an existing approach that can transform any healthcare dataset into HL7 FHIR through translation into ontologies and the matching of those ontologies by syntactic and semantic similarity. The approach is evaluated using four string similarity metrics: the Levenshtein distance, Cosine similarity, Jaro-Winkler distance, and Jaccard similarity. The results indicate that the Levenshtein distance provides more reliable results when dealing with healthcare ontologies.
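To make the four compared metrics concrete, the following is a minimal Python sketch of each, written for illustration only and not taken from the evaluated approach. Several choices here are assumptions: Cosine and Jaccard similarity are computed over character bigrams (other tokenisations are equally valid), Jaro-Winkler uses the customary prefix scale of 0.1 with a prefix capped at four characters, and the term `date_of_birth` is a hypothetical source-dataset label compared against the FHIR element name `birthDate`.

```python
from collections import Counter
from math import sqrt


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def bigrams(s):
    """Character bigrams; this tokenisation is an assumption of the sketch."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def jaccard(a, b):
    """Size of the bigram-set intersection divided by the size of the union."""
    x, y = set(bigrams(a)), set(bigrams(b))
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def cosine(a, b):
    """Cosine of the angle between the two bigram-count vectors."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(x[g] * y[g] for g in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


def jaro(a, b):
    """Jaro similarity from matching characters and transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, p=0.1):
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    # Hypothetical source term versus a FHIR element name, lower-cased first.
    s, t = "date_of_birth", "birthdate"
    print("Levenshtein distance:", levenshtein(s, t))
    print("Jaccard similarity:  ", round(jaccard(s, t), 3))
    print("Cosine similarity:   ", round(cosine(s, t), 3))
    print("Jaro-Winkler:        ", round(jaro_winkler(s, t), 3))
```

Note that the Levenshtein distance is an edit count (lower means more similar), whereas the other three functions return similarities in the range 0 to 1, so a comparison across metrics would first need the distance normalised, for example by dividing by the length of the longer string.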