Analyzing data and data sources towards a unified approach for ensuring end-to-end data and data sources quality in healthcare 4.0

Category

Journal Article

Published

30 June 2019

Abstract

Background and Objective: Healthcare 4.0 is being hailed as the current industrial revolution in the healthcare domain. It deals with billions of heterogeneous IoT data sources that are connected over the Internet and aim to provide real-time health-related information to citizens and patients. Obtaining reliable health data therefore requires an automated way to identify the quality levels of these data sources.
Methods: In this manuscript, we demonstrate an innovative mechanism for assessing the quality of various datasets in correlation with the quality of the corresponding data sources. The mechanism follows a five-step approach through which the available data sources are detected, identified and connected to health platforms, where their data is finally gathered. Once the data is obtained, the mechanism cleans it and correlates it with the quality measurements captured from each data source, in order to decide whether each data source is of sufficient quality and, thus, whether its data is kept for further analysis.
Results: The proposed mechanism is evaluated through an experiment using a sample of 18 existing heterogeneous medical data sources. Based on the captured results, we were able to identify a data source of unknown type, recognizing that it was a body weight scale. We were then able to determine that the API method responsible for gathering data from this data source was the getMeasurements() method. By combining the body weight scale's own quality with the quality of its derived data, we could conclude that this data source was of sufficient quality.
Conclusions: By measuring and correlating both the quality of a data source itself and the quality of its derived data, the proposed mechanism provides efficient results and is able to ensure the quality of both data sources and their data end to end.
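
The final decision step described in the abstract, combining a data source's own quality with the quality of its derived data, can be illustrated with a minimal sketch. The abstract does not state how the two measurements are combined, so everything below is an assumption made for illustration: the scores in [0, 1], the weighted average, the 0.7 threshold, and all names (SourceAssessment, combined_quality, keep_sources) are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class SourceAssessment:
    """Hypothetical quality record for one data source."""
    source_id: str
    source_quality: float  # assumed score in [0, 1] for the device itself
    data_quality: float    # assumed score in [0, 1] for its cleaned data


def combined_quality(a: SourceAssessment, source_weight: float = 0.5) -> float:
    """Weighted average of the two scores (an assumed combination rule)."""
    if not 0.0 <= source_weight <= 1.0:
        raise ValueError("source_weight must lie in [0, 1]")
    return source_weight * a.source_quality + (1.0 - source_weight) * a.data_quality


def keep_sources(assessments: list[SourceAssessment],
                 threshold: float = 0.7) -> list[str]:
    """Return the ids of sources whose combined score meets the assumed threshold."""
    return [a.source_id for a in assessments if combined_quality(a) >= threshold]


if __name__ == "__main__":
    # Illustrative values only; these are not results from the article.
    sample = [
        SourceAssessment("weight-scale", source_quality=0.9, data_quality=0.8),
        SourceAssessment("unknown-device", source_quality=0.4, data_quality=0.6),
    ]
    print(keep_sources(sample))
```

With these illustrative inputs, only the first source passes the assumed threshold, so only its data would be kept for further analysis.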