Hopsworks, the data-intensive AI platform with a feature store, brings support for scale-out AI with Copernicus data and the H2020 ExtremeEarth project. Hopsworks is integrated with the Polar and FoodSecurity Thematic Exploitation Platforms (TEPs) on the CREODIAS infrastructure. Two use cases, polar and food security, have been developed by making use of the scale-out distributed deep learning support of Hopsworks and the PBs of data made available by CREODIAS and processed by Hopsworks and the TEPs .
This article is based on the paper “The ExtremeEarth software Architecture for Copernicus Earth Observation Data” included in the Proceedings of the 2021 conference on Big Data from Space (BiDS 2021) [1].
In recent years, unprecedented volumes of data are being generated in various domains. Copernicus, a European Union flagship programme, produces more than three petabytes (PB) of Earth Observation (EO) data annually from Sentinel satellites [2]. However, current AI architectures making use of deep learning in remote sensing are struggling to scale in order to fully utilize the abundance of data.
ExtremeEarth is an EU-funded project that aims to develop use-cases that demonstrate how researchers can apply deep learning in order to make use of Copernicus data in the various European Space Agency (ESA) TEPs. A main differentiator of ExtremeEarth to other such projects is the use of Hopsworks, a Data-Intensive AI software platform with a Feature Store and tooling for horizontally scale-out learning. Hopsworks has successfully been extended as part of ExtremeEarth project to bring specialized AI tools to the EO data community with 2 use cases already developed on the platform with more to come in the near future.
Bringing together a number of cutting edge technologies, which deal from storing extremely large volumes of data all the way to performing scalable machine learning and deep learning algorithms in a distributed manner, and having them operate over the same infrastructure poses some unprecedented challenges. These challenges include, in particular, integration of ESA TEPs and Data and Information Access Service (DIAS) platforms with a data platform (Hopsworks) that enables scalable data processing, machine learning and deep learning on Copernicus data; development of very large training datasets for deep learning architectures targeting the classification of Sentinel images.
In this blog post, we describe both the software architecture of the ExtremeEarth project with Hopsworks as its AI platform centerpiece and the integration of Hopsworks with the other services and platforms of ExtremeEarth that make up for a complete AI with EO data experience.
There are several components that comprise the overall architecture with the main ones being the following.
Hopsworks. An open-source data-intensive AI platform with a feature store. Hopsworks can scale to the petabytes of data required by the ExtremeEarth project and provides tooling to build horizontally scalable end-to-end machine learning and deep learning pipelines. Data engineers and data scientists utilize Hopsworks’ client SDKs that facilitate AI data management, machine learning experiments, and productionizing serving of machine learning models.
Thematic Exploitation Platforms (TEPs). These are collaborative, virtual work environments providing access to EO data and the tools, processors, and Information and Communication Technology (ICT) resources required to work with various themes, through one coherent interface. TEPs address coastal, forestry, hydrology, geohazards, polar, urban themes, and food security themes. ExtremeEarth in particular is concerned with the polar and food security TEPs where the use cases also stem from. These use cases include building machine learning models for sea ice classification to improve maritime traffic as well as food crops and irrigation classification.
Data and Information Access Service (DIAS). To facilitate and standardize access to data, the European Commission has funded the deployment of five cloud-based platforms. They provide centralized access to Copernicus data and information, as well as to processing tools. These platforms are known as the DIAS, or Data and Information Access Services [2]. ExtremeEarth software architecture is built on CREODIAS, a Cloud infrastructure platform adapted to the processing of big amounts of EO data, including an EO data storage cluster and a dedicated Infrastructure-as-a-Service (IaaS) Cloud infrastructure for the platform’s users. The EO data repository contains Sentinel-1, 2, 3, and 5-P, Landsat-5, 7, 8, Envisat, and many Copernicus Services data.
Figure 1 provides a high-level overview of the integration of the different components with each other. The components can be classified into four main categories(layers):
To provide a coherent environment for AI with EO data to application users and data scientists, the goal of the architecture presented here is to make most components transparent and simplify developer access by using well-defined APIs while making use of commonly used interfaces such as RESTful API. As a result, a key part of the overall architecture is how these different components can be integrated to provide a coherent whole. The APIs used for the full integration of the ExtremeEarth components via the inter-layer interfaces of the software platform are described below and also are illustrated in Figure 1.
Application users that interact with the TEPs effectively are the users of the AI products generated by the machine learning and deep learning pipelines developed by the data scientists in Hopsworks. Previously we described the integration of the various components. Figure 3 depicts the flow of events within this architecture.
In this blog, we have shown how Hopsworks has been integrated with various services and platforms in order to extract knowledge from AI and build AI applications using Copernicus and Earth Observation data. Already two use cases, sea ice classification (PolarTEP) and crop type mapping and classification (Food Security TEP ), have been developed using the aforementioned architecture by using the PBs of data made available by the Copernicus programme and infrastructure.
If you would like to try it out yourself, make sure to register an account. You can also follow Hopsworks on GitHub and Twitter.
Proceedings of the 2021 conference on Big Data from Space 18-20 May 2021
Data and Information Access Services
Note: This article was previously published by ExtremeEarth.