Leveraging the serverless paradigm for realizing machine learning pipelines across the edge-cloud continuum
Abstract
The exceedingly exponential-growing data rate highlighted numerous requirements and several approaches have been released to maximize the added-value of cloud and edge resources. Whereas data scientists utilize algorithmic models in order to transform datasets and extract actionable knowledge, a key challenge is oriented towards abstracting the underline layers: the ones enabling the management of infrastructure resources and the ones responsible to provide frameworks and components as services. In this sense, the serverless approach features as the novel paradigm of new cloud-related technology, enabling the agile implementation of applications and services. The concept of Function as a Service (FaaS) is introduced as a revolutionary model that offers the means to exploit serverless offerings. Developers have the potential to design their applications with the necessary scalability in the form of nanoservices without addressing themselves the way the infrastructure resources should be deployed and managed. By abstracting away the underlying hardware allocations, the data scientist concentrates on the business logic and critical problems of Machine Learning (ML) algorithms. This paper introduces an approach to realize the provision of ML Functions as a Service (i.e., ML-FaaS), by exploiting the Apache OpenWhisk event-driven, distributed serverless platform. The presented approach tackles also composite services that consist of single ones i.e., workflows of ML tasks including processes such as aggregation, cleaning, feature extraction, and analytics; thus, reflecting the complete data path. We also illustrate the operation of the approach mentioned above and assess its performance and effectiveness exploiting a holistic, end-toend anti-fraud detection machine learning pipeline.