Data management is the most challenging aspect of building Machine Learning (ML) systems. ML systems can read large volumes of historical data when training models, but inference workloads are more varied, depending on whether the system serves batch or online predictions.
This article presents a novel system that produces multiyear high-resolution irrigation water demand maps for agricultural areas, enabling a new level of detail for irrigation support for farmers and agricultural stakeholders.
Emerging use-cases like smart manufacturing and smart cities pose challenges in terms of latency, which cannot be satisfied by traditional centralized infrastructure.
This paper introduces the Hopsworks platform to the entire Earth Observation (EO) data community and the Copernicus programme. Hopsworks is a scalable data-intensive open-source Artificial Intelligence (AI) platform that was jointly developed by Logical Clocks and KTH Royal Institute of Technology.
The Human Exposome Assessment Platform (HEAP) is a research resource for the integrated and efficient management and analysis of human exposome data.
Bringing together cutting-edge technologies, from storing extremely large volumes of data to developing scalable ML and deep learning algorithms in a distributed manner, and having them all operate over the same infrastructure, poses unprecedented challenges.
Ablation studies provide insights into the relative contribution of different architectural and regularization components to machine learning models' performance. In this paper, we introduce AutoAblation, a new framework for the design and parallel execution of ablation experiments.
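To make the idea concrete, the sketch below shows leave-one-component-out ablation trials executed in parallel; it is a toy illustration and does not use the AutoAblation API, and the component names and stand-in scoring are hypothetical.

```python
# Toy sketch of leave-one-component-out ablation trials run in parallel.
# This is NOT the AutoAblation API; component names and scores are hypothetical.
from concurrent.futures import ProcessPoolExecutor

COMPONENTS = ["dropout", "batch_norm", "weight_decay", "data_augmentation"]

def train_and_evaluate(excluded):
    """Train with one component ablated and return (component, validation score).
    The 'training' here is a stand-in; a real trial would build and fit a model."""
    active = [c for c in COMPONENTS if c != excluded]
    return excluded, 0.70 + 0.05 * len(active)  # pretend each component adds accuracy

if __name__ == "__main__":
    baseline = 0.70 + 0.05 * len(COMPONENTS)    # score with every component enabled
    with ProcessPoolExecutor() as pool:         # one trial per ablated component
        results = dict(pool.map(train_and_evaluate, COMPONENTS))
    for component, score in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"without {component}: {score:.2f} ({score - baseline:+.2f} vs baseline)")
```

The per-component deltas against the full model are exactly the relative-contribution signal an ablation study is after.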
This project employs novel technologies, such as the Earth System Data Cube, the Semantic Cube, the Hopsworks platform for distributed deep learning, and visual analytics tools, integrating them into an open, cloud-interoperable platform.
HopsFS-S3 is a hybrid cloud-native distributed hierarchical file system that is available across availability zones, costs the same as S3, yet delivers 100X the performance of S3 for file move/rename operations and 3.4X the read throughput of S3 (EMRFS) on the DFSIO benchmark.
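The rename gap is easiest to see from what S3 forces a client to do: object stores have no rename primitive, so moving a directory means copying and deleting every object under a prefix, whereas a hierarchical file system performs the same operation as a metadata update. The boto3 sketch below illustrates the object-store side; the bucket and prefix names are hypothetical.

```python
# Why "rename" is slow on an object store: S3 has no rename primitive, so moving a
# prefix means one copy plus one delete per object. Bucket/prefix names are made up.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-data-lake"

def rename_prefix(src_prefix, dst_prefix):
    """Emulate a directory rename on S3: copy + delete for every object."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            src_key = obj["Key"]
            dst_key = dst_prefix + src_key[len(src_prefix):]
            s3.copy_object(Bucket=BUCKET, Key=dst_key,
                           CopySource={"Bucket": BUCKET, "Key": src_key})
            s3.delete_object(Bucket=BUCKET, Key=src_key)

rename_prefix("raw/2021-05-01/", "staging/2021-05-01/")
```

A hierarchical file system such as HopsFS only rewrites metadata for the same move, which is where the order-of-magnitude rename speedup comes from.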
HopsFS-CL is a highly available distributed hierarchical file system with native support for availability zone (AZ) awareness using synchronous replication protocols.
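The sketch below illustrates the synchronous replication idea only (it is not HopsFS-CL's protocol): a write is acknowledged to the client only after a replica in every availability zone has confirmed it, so losing a single AZ cannot lose acknowledged data. The replica map and send_to_replica() helper are hypothetical.

```python
# Illustrative sketch of synchronous, AZ-aware replication (not HopsFS-CL's actual
# protocol): acknowledge a write only once every availability zone has confirmed it.
# The replica addresses and send_to_replica() are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

REPLICAS = {"az-1": "10.0.1.5", "az-2": "10.0.2.5", "az-3": "10.0.3.5"}

def send_to_replica(address, record):
    # Placeholder for an RPC that durably stores the record on one replica.
    return True

def replicated_write(record):
    """Synchronously replicate a record to one replica per availability zone."""
    with ThreadPoolExecutor(max_workers=len(REPLICAS)) as pool:
        acks = list(pool.map(lambda addr: send_to_replica(addr, record),
                             REPLICAS.values()))
    if not all(acks):
        raise IOError("write not confirmed by every availability zone")
    return "ack"  # only now is it safe to acknowledge the client

replicated_write({"inode": 42, "op": "mkdir"})
```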
Maggy is an extension to Spark’s synchronous processing model that allows it to run asynchronous ML trials, enabling end-to-end state-of-the-art ML pipelines to run fully on Spark. Maggy provides programming support for defining, optimizing, and running parallel ML trials.
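As a taste of that programming support, the snippet below follows the style of Maggy's documented hyperparameter-tuning examples; the exact function names and signatures have varied across Maggy releases, so treat them as assumptions and check the Maggy documentation.

```python
# Sketch in the style of Maggy's hyperparameter-tuning examples. Exact names and
# signatures differ between Maggy releases; treat them as assumptions.
from maggy import experiment, Searchspace

# Declare the hyperparameter search space.
sp = Searchspace(kernel=("INTEGER", [2, 8]), dropout=("DOUBLE", [0.01, 0.5]))

def training_function(kernel, dropout, reporter):
    # Build and train a model with the given hyperparameters (omitted here) and
    # return the metric to optimize; a real function would fit an actual model.
    accuracy = 1.0 - abs(dropout - 0.1) - abs(kernel - 4) / 100.0
    return accuracy

# Run the trials asynchronously in parallel on Spark executors.
result = experiment.lagom(train_fn=training_function,
                          searchspace=sp,
                          optimizer="randomsearch",
                          direction="max",
                          num_trials=15,
                          name="maggy_demo")
```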
An implicit provenance model can be used alongside a feature store with versioned data to build reproducible and more easily debugged ML pipelines. We provide development tools and visualization support that help developers navigate and re-run pipelines.
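For example, a pipeline step can pin the exact version of its input features, so a re-run reads the same data as the original run; the sketch below assumes an API along the lines of the hsfs feature-store client, and the feature-group name and version are illustrative.

```python
# Sketch of reading a pinned feature-group version from a feature store, assuming
# an API along the lines of the hsfs client library; names/versions are illustrative.
import hsfs

connection = hsfs.connection()            # connect to the project's feature store
fs = connection.get_feature_store()

# Pinning version=2 means a re-run of this step reads exactly the same features,
# which, together with the captured provenance, makes the pipeline reproducible.
transactions = fs.get_feature_group("card_transactions", version=2)
train_df = transactions.read()
```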
The distribution-oblivious training function allows ML developers to reuse the same training function when running in a single-host Jupyter notebook or when performing scale-out hyperparameter search and distributed training on clusters.
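Concretely, the training logic is written as a plain Python function with no Spark, Horovod, or tuning code inside it, so the same function can be debugged locally and later launched at scale; the sketch below is illustrative, and the launcher call at the end is a hypothetical stand-in for the platform's scale-out APIs.

```python
# Illustrative distribution-oblivious training function: plain Python, with no
# Spark/Horovod/tuning code inside, so it can run unchanged in a notebook or on a
# cluster. The launcher call at the end is a hypothetical stand-in.

def training_function(lr=0.01, batch_size=32):
    """Train a model and return the metric to optimize."""
    # ... build the model, load data, fit ...
    accuracy = 0.9  # stand-in for the real validation metric
    return accuracy

# 1) Debug it directly in a single-host Jupyter notebook:
print(training_function(lr=0.001, batch_size=64))

# 2) Later, hand the *same* function, unchanged, to a cluster launcher for
#    hyperparameter search or distributed training, e.g. (hypothetical call):
# launcher.run(training_function, search_space={"lr": [1e-4, 1e-1]}, num_trials=20)
```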
Implicit provenance allows us to capture full lineage for ML programs by instrumenting only the distributed file system and APIs, with no changes to the ML code.
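The sketch below shows the principle only (not the HopsFS implementation): if the file API itself records every path an application reads or writes, full lineage is captured without touching the ML code.

```python
# Principle of implicit provenance: instrument the file API once and record every
# path an application touches; the ML code itself is never modified.
# This is an illustration, not the HopsFS implementation.
import builtins

LINEAGE = []                  # (path, access type) pairs observed during this run
_real_open = builtins.open

def tracked_open(path, mode="r", *args, **kwargs):
    access = "write" if any(m in mode for m in "wax+") else "read"
    LINEAGE.append((str(path), access))
    return _real_open(path, mode, *args, **kwargs)

builtins.open = tracked_open  # instrument the API once

# Unmodified "ML code":
with open("features.csv", "w") as f:
    f.write("x,y\n1,2\n")
with open("features.csv") as f:
    data = f.read()

print(LINEAGE)  # [('features.csv', 'write'), ('features.csv', 'read')]
```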
New version of the block reporting protocol for HopsFS that uses up to 1/1000th of the resources of HDFS' block reporting protocol. IEEE BigData Congress'19.
Change Data Capture paper for HopsFS (ePipe). CCGRID’19.
Paper describing a demo of the Hopsworks ML pipeline given at SysML 2019.
Describes how HopsFS supports small files in metadata on NVMe disks. Middleware 2018.
IEEE Scale Prize-winning submission, May 2017. Heavy on database optimizations in HopsFS' metadata layer.
First main paper on HopsFS at USENIX FAST 2017.
HopsFS' leader election protocol that uses NDB as a backend. DAIS 2015: 158-172.