With the rise of Big Data, distributed stream processing engines have become a major field of study. Several projects, such as Spark Streaming, Storm, and Flink, have emerged with the goal of processing large amounts of data in real time. These systems have been designed to run efficiently on execution platforms, such as Cloud computing platforms, where resources can be highly volatile and commodity hardware is used. In this setting, peak performance is not the most important concern.
Some execution platforms, on the other hand, provide their users with a very stable execution environment and high-end hardware resources. This is the case for some private data centers and for high performance computing systems. In this context, one expects a stream processing engine to take advantage of the available resources to achieve very high throughput and/or low latency. However, the design of existing stream processing engines hardly allows them to make full use of these resources. For instance, experiments show that on platforms equipped with a high performance interconnect (e.g., InfiniBand or Omni-Path), bottlenecks in a system such as Spark Streaming prevent it from taking full advantage of the available network bandwidth.
Achieving high performance stream processing is a major goal in several application domains. The one we consider in our work is in-situ data analysis for large scale High Performance Computing simulations. The goal is to analyze the data online, i.e., as soon as they are produced. Analyzing data without first flushing them to a Parallel File System improves the global performance of the system while making it possible to analyze larger portions of the data generated by the simulations. As such, it has the potential to increase the scientific knowledge that can be gained from the execution of numerical simulations.
The postdoctoral researcher will join a research group that includes faculty members and research engineers, and will work in close collaboration with PhD and master's students on the topic of high-performance stream processing. The goal of our team is to study ways to achieve high throughput for stream processing by taking advantage of: i) high-end hardware technologies (high performance networks, emerging non-volatile memory, etc.) that are, or will soon be, available in data centers, and ii) the low resource volatility experienced in some private data centers and HPC platforms.
To achieve this goal, the postdoctoral researcher will have the opportunity to conduct studies on the design of stream processing engines and/or on their implementation.
This work will be carried out in the context of a national project involving academic research groups as well as some major French IT companies.
The postdoctoral researcher will join the Informatics Laboratory of Univ. Grenoble Alpes, one of the largest Computer Science laboratories in France. Univ. Grenoble Alpes is one of the 50 best universities in the world for computer science according to international rankings.