Towards high performance distributed stream processing

TL;DR

One-year postdoc position at Univ. Grenoble Alpes (LIG laboratory)
Topic: high performance stream processing engines
Goal: Study the design and the implementation of distributed stream processing engines to take advantage of emerging hardware technologies (high performance networks, non-volatile memory, etc.).

Context

With the rise of Big Data, distributed stream processing engines have become a major field of study. Several projects, such as Spark Streaming, Storm, or Flink, have emerged with the goal of processing large amounts of data in real time. These systems have been designed to run efficiently on execution platforms such as Cloud computing platforms, where resources can be highly volatile and commodity hardware is used. In this case, peak performance is not the most important concern.

Some execution platforms, on the other hand, provide their users with a very stable execution environment and high-end hardware resources. This is the case for some private data centers and for high performance computing systems. In this context, one expects a stream processing engine to take advantage of the available resources to achieve very high throughput and/or low latency. However, the design of existing stream processing engines makes it hard to fully exploit these resources. For instance, experiments show that on platforms equipped with a high performance interconnect (e.g., InfiniBand or Omni-Path), bottlenecks in a system such as Spark Streaming prevent it from taking full advantage of the available network bandwidth.

Achieving high performance stream processing is a major goal in several application domains. The one we consider in our work is in-situ data analysis for large scale High Performance Computing simulations. The goal is to analyze the data online, i.e., as soon as they are produced. Analyzing data without first flushing them to a Parallel File System improves the global performance of the system while making it possible to analyze larger portions of the data generated by the simulations. As such, it has the potential to increase the scientific knowledge that can be gained from the execution of numerical simulations.

Mission

The postdoctoral researcher will join a research group including faculty members and research engineers, and work in close collaboration with PhD and master's students on the topic of high-performance stream processing. The goal of our team is to study ways to achieve high throughput for stream processing by taking advantage of: i) high-end hardware technologies (high performance networks, emerging non-volatile memory, etc.) that are, or will soon be, available in data centers, and ii) the low resource volatility experienced in some private data centers and HPC platforms.

To achieve this goal, the postdoctoral researcher will have the opportunity to conduct studies on the design of stream processing engines and/or on their implementation.

This work will be carried out in the context of a national project involving academic research groups as well as some major French IT companies.

Location

The postdoctoral researcher will join the Informatics Laboratory of Univ. Grenoble Alpes, one of the largest Computer Science laboratories in France. Univ. Grenoble Alpes is one of the 50 best universities in the world for computer science according to international rankings.

Job information

Position: Postdoc.
Duration: 1 year (starting date: ASAP).
Location: Grenoble, France.
Laboratory: Informatics Laboratory of Univ. Grenoble Alpes (https://www.liglab.fr/), Erods research team.

Requirements

PhD in Computer Science
Solid background in distributed systems and operating systems. Experience in the fields of data processing and/or high performance computing would be highly appreciated.
Willingness to publish in international conferences and journals
Very good command of spoken and written English

Contact

Thomas Ropars (thomas.ropars@univ-grenoble-alpes.fr)