Thomas Ropars

Associate Professor
Univ. Grenoble Alpes

High-throughput stream processing in HPC systems for online data analysis

Context

In-situ data analysis is an attractive solution for processing the data produced by large-scale High Performance Computing (HPC) simulations, as it allows the data to be analyzed online, i.e., as soon as they are produced [1, 2]. Analyzing data without first flushing them to a Parallel File System improves the global performance of the system and makes it possible to analyze larger portions of the data generated by the simulations. As such, it has the potential to increase the scientific knowledge that can be gained from the execution of numerical simulations.

With the rise of Big Data, distributed stream processing engines have become a major field of study. Several projects, such as Spark Streaming, Storm, or Flink, have emerged with the goal of processing large amounts of data in real time. Thanks to their simple programming model and their large ecosystem of users and libraries, such tools can allow field scientists to easily analyze numerical simulation data at scale [3].

Using Big Data stream processing engines for online data analysis in the context of numerical simulations raises several challenges. The two main questions are: (i) How to efficiently implement streaming programs to analyze the data produced by numerical simulations? (ii) How to adapt Big Data stream processing engines to take advantage of the specific hardware of HPC systems and achieve the unprecedented throughput required by numerical simulations?

Mission

The postdoctoral researcher will join a research group that includes faculty members and research engineers, and will work in close collaboration with PhD and master's students on the topic of stream processing for online data analysis. The goal of our team is to study how to achieve high throughput for stream processing by taking into account the particular needs of online analysis for numerical simulations and by taking advantage of the specific hardware technologies (high performance networks, fast non-volatile storage, etc.) available in HPC systems.

This work will be carried out in the context of a national project involving academic research groups as well as some major French IT companies.

Location

The postdoctoral researcher will join the Informatics Laboratory of Univ. Grenoble Alpes, one of the largest Computer Science laboratories in France. Univ. Grenoble Alpes is one of the 50 best universities in the world for computer science according to international rankings.

Job information

Requirements

Contact

References

[1] A. C. Bauer et al. In Situ Methods, Infrastructures, and Applications on High Performance Computing Platforms, Computer Graphics Forum, 2016.

[2] I. Foster et al. Computing just what you need: online data analysis and reduction at extreme scales, European Conference on Parallel Processing, 2017.

[3] M. Asch et al. Big data and extreme-scale computing: Pathways to Convergence-Toward a shaping strategy for a future software and data ecosystem for scientific inquiry, The International Journal of High Performance Computing Applications, 2018.