Mass Storage for Machine Learning in Seismology

From dCache to dCache InfiniteSpace — HIFIS mass storage from DESY for Helmholtz

dCache is an open-source project that has developed a system for storing and retrieving large amounts of data and providing worldwide access to it. The system was built, and is still being developed, by Deutsches Elektronen-Synchrotron (DESY), the Fermi National Accelerator Laboratory (FNAL) and the Nordic e-Infrastructure Collaboration (NeIC).

This made the system a perfect candidate for DESY to provide mass storage Helmholtz-wide via HIFIS. In fact, it was one of the first services connected to the Helmholtz AAI, as a demonstrator in early 2020. It has since become a regular service and is offered via the Helmholtz Cloud Portal under the name “dCache InfiniteSpace”.

SeisBench — A toolbox for machine learning in seismology

The first users of the then-prototype were seismologists from the REPORT-DL project (Rapid Earthquake Phase Analysis of Ocean-bottom, Regional and Teleseismic events with Deep Learning). The project was funded in 2019 by Helmholtz AI, another Helmholtz Incubator platform, and in this context the team developed SeisBench, a toolbox for machine learning in seismology.

SeisBench is an open-source Python toolbox that aims to standardise access to datasets and models for seismic waveform processing with deep learning. In this way, SeisBench both reduces the overhead for developers of such models and bridges the gap between model developers and seismic practitioners.

A key part of SeisBench is direct access to benchmark datasets and pretrained models. To share this data, the SeisBench team uses dCache InfiniteSpace. The service provides them with a high-performance repository that makes it convenient to share datasets of several hundred gigabytes. Its WebDAV interface additionally allows the team to implement convenience functions, such as listing the available model weights. More detailed information can be found in the project’s documentation.
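Listing a directory over WebDAV is a standard operation, so a function that enumerates model weights can be built on it. The following minimal Python sketch, using only the standard library, illustrates the general idea: a `PROPFIND` request with `Depth: 1` asks the server for the direct contents of a directory (a "collection"), and the XML multi-status reply is parsed into a list of file names. It is not SeisBench's actual implementation; the helper names `build_propfind_request` and `list_files`, the example URL and the `.pt` suffix are hypothetical placeholders.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# WebDAV elements live in the "DAV:" XML namespace (RFC 4918).
DAV_NS = {"d": "DAV:"}

# Minimal PROPFIND body: only ask for the resource type of each entry.
PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<d:propfind xmlns:d="DAV:"><d:prop><d:resourcetype/></d:prop></d:propfind>"""


def build_propfind_request(url: str) -> urllib.request.Request:
    """Build a Depth: 1 PROPFIND request listing the direct children of a collection."""
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": "1", "Content-Type": "application/xml"},
    )


def list_files(multistatus_xml: bytes, suffix: str = "") -> list[str]:
    """Return the sorted names of non-collection entries in a 207 Multi-Status reply."""
    root = ET.fromstring(multistatus_xml)
    names = []
    for response in root.findall("d:response", DAV_NS):
        # Sub-directories are marked with <d:collection/> inside <d:resourcetype>.
        if response.find(".//d:resourcetype/d:collection", DAV_NS) is not None:
            continue
        href = response.findtext("d:href", default="", namespaces=DAV_NS)
        # Keep only the last path segment and undo percent-encoding.
        name = urllib.parse.unquote(href.rstrip("/").rsplit("/", 1)[-1])
        if name.endswith(suffix):
            names.append(name)
    return sorted(names)
```

Against a live server, one would pass `urllib.request.urlopen(build_propfind_request(url)).read()` to `list_files(..., ".pt")`, where `url` is a hypothetical public WebDAV directory. Non-public paths would additionally require authentication, which this sketch omits.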

Schematic diagram of SeisBench. By Jack Woollam, license: GPLv3

In the nine months since its publication, SeisBench has gained an active base of almost 200 users, who access the dCache InfiniteSpace repository around 5000 times per month. The users are located around the world and include researchers at world-leading institutions (e.g. Harvard, Cambridge, Cornell). The majority come from outside the Helmholtz community, which underlines the importance of providing worldwide, easy access to such content.

In addition to the infrastructure, dCache InfiniteSpace offers the SeisBench team detailed statistics on usage patterns. These allow the team to identify which parts of their software the community uses most, e.g. which models attract the greatest interest. The team uses this information to set priorities for the future development of SeisBench.
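To illustrate how such statistics could be turned into a ranking of models, here is a minimal Python sketch. It assumes, hypothetically, that the usage data are available as a list of requested paths or URLs laid out as `/models/<model>/<weights-file>`. This is not dCache's actual statistics or log format, and the function name `downloads_per_model` is illustrative only.

```python
from collections import Counter
from collections.abc import Iterable
from urllib.parse import urlsplit


def downloads_per_model(request_paths: Iterable[str], root: str = "/models/") -> Counter:
    """Count requests per model, assuming (hypothetically) paths of the form <root><model>/<file>."""
    counts: Counter = Counter()
    for entry in request_paths:
        # Accept both full URLs and bare paths by keeping only the path component.
        path = urlsplit(entry).path
        if not path.startswith(root):
            continue  # ignore datasets and anything outside the models tree
        model = path[len(root):].split("/", 1)[0]
        if model:  # skip requests for the models directory itself
            counts[model] += 1
    return counts
```

A `Counter` keeps the aggregation to a single pass, and its `most_common(n)` method then returns the `n` most requested models together with their request counts.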

How to use dCache InfiniteSpace via HIFIS for your projects

The storage service is available to any user group with central Helmholtz stakeholders; it is not intended for individual users. So that the service can be set up optimally and aligned with your needs, leaders of interested user groups are asked to submit a brief application via HIFIS Support, providing the key information described in the documentation.

Get in contact

For dCache InfiniteSpace: HIFIS Support

For SeisBench: Jannes Münchmeyer, Jack Woollam, Andreas Rietbrock


Changelog

  • 2022-08-02 – adapt the date for release of DESY Storage (HDF) in the Helmholtz Cloud Portal
  • 2023-04-05 – suspend the date for release of DESY Storage (HDF) in the Helmholtz Cloud Portal
  • 2023-06-28 – add or update links to the Helmholtz Cloud Portal entry and documentation for the newly onboarded dCache InfiniteSpace service, rephrase some wording