Powering HPC and AI applications through intelligent management of resources - A glimpse into the LEGaTO runtime system

LEGaTO is working with the latest hardware resources to create energy-efficient computing solutions. However, these resources can only be fully exploited through intelligent behind-the-scenes management via the system’s back end. In this article, WP3 leader Miquel Pericàs (Chalmers University) explains the crucial contribution of the LEGaTO runtime system, often the unsung hero of the computing stack, to the project’s goals of energy efficiency, fault tolerance and security. 

Modern computer platforms are increasingly composed of a range of different processors, with graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) accelerating computing tasks alongside traditional central processing units (CPUs). In addition to continuing to increase performance, these heterogeneous systems help address the energy constraints of the future high-performance computing (HPC) and artificial intelligence (AI) landscape. This matters because the energy demands of supercomputers and data centres, currently comparable to those of small towns, are unsustainable, and because intelligent devices at the edge, for example, will need to pack in significant processing power while consuming little energy.

The drawback to heterogeneous systems, though, is the increased programming complexity which they entail. To respond to this, software stacks for heterogeneous platforms have been thoroughly researched and deployed. However, the focus has traditionally been on high performance, and current software stacks are severely lacking in their ability to leverage heterogeneous platforms to achieve energy-efficient and power-efficient operation. In addition to energy efficiency, reliable and secure operation is also becoming increasingly important as transistors get smaller and data centres larger and more interconnected. 

The main goal of the LEGaTO project is to address these challenges by starting with a made-in-Europe mature software stack, and optimising this stack to support energy-efficient computing on a commercial cutting-edge European-developed CPU–GPU–FPGA heterogeneous hardware substrate and FPGA-based Dataflow Engines (DFE), leading to an order of magnitude increase in energy efficiency. 

A major contributor to achieving this goal is the LEGaTO toolchain back end, also called the runtime system. The back end consists of the technologies that are deployed at run time to support the programmer's task and to intelligently manage the resources of the heterogeneous hardware platform. Through this approach, the LEGaTO runtime system aims to achieve its goals of energy efficiency, reliability and security.

To help meet these goals, the LEGaTO back end is based on a task-based execution model. Tasking, as opposed to threading models, allows the runtime system to exploit higher parallelism and to perform the advanced scheduling necessary to effectively manage heterogeneous platforms.
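To make the idea of tasking more concrete, the sketch below shows a toy task scheduler in Python. It is only an illustration and not the Nanos or XiTAO API: the `Task` class, the `schedule` function, the task names and the per-device cost figures are all hypothetical. Each task declares its dependencies, and once those have finished the scheduler maps it to whichever device has the lowest estimated cost.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Hypothetical cost estimate per device kind (arbitrary units).
    cost: dict
    # Names of tasks that must finish before this one can start.
    deps: list = field(default_factory=list)


def schedule(tasks):
    """Return (task name, device) pairs in a valid execution order.

    A task becomes ready once all of its dependencies have been scheduled;
    each ready task is mapped to the device with the lowest estimated cost.
    """
    by_name = {t.name: t for t in tasks}
    pending = {t.name: len(t.deps) for t in tasks}
    dependents = {t.name: [] for t in tasks}
    for t in tasks:
        for dep in t.deps:
            dependents[dep].append(t.name)

    ready = deque(name for name, count in pending.items() if count == 0)
    plan = []
    while ready:
        name = ready.popleft()
        task = by_name[name]
        device = min(task.cost, key=task.cost.get)
        plan.append((name, device))
        # Finishing this task may release tasks that were waiting on it.
        for child in dependents[name]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(plan) != len(tasks):
        raise ValueError("dependency cycle detected")
    return plan


tasks = [
    Task("load", {"cpu": 1.0, "gpu": 5.0}),
    Task("filter", {"cpu": 4.0, "gpu": 1.5}, deps=["load"]),
    Task("stats", {"cpu": 2.0, "gpu": 3.0}, deps=["load"]),
    Task("report", {"cpu": 0.5, "gpu": 2.0}, deps=["filter", "stats"]),
]
plan = schedule(tasks)
for name, device in plan:
    print(f"{name} -> {device}")
```

Because the dependencies are explicit, the scheduler can see that `filter` and `stats` are independent of each other and is free to place them on different devices. A real runtime would also account for device availability, data movement and measured energy, none of which this sketch models.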

A set of use cases, ranging from high-performance simulations of urban air quality to AI applications like a ‘smart mirror’ for home use, is now being ported to the LEGaTO tasking model. By means of novel scheduling and mapping technologies, the LEGaTO runtime system is enabling their energy-efficient and fault-tolerant execution on the heterogeneous platform. See the full list of applications on the use cases webpage.

The diagram below shows how the back end interacts with the hardware resources and application layers of the LEGaTO stack.

The LEGaTO team in charge of the back-end development recently released an initial report detailing the development status of the LEGaTO runtime components. These include (a) OpenStack tools to manage and configure the target hardware platform, (b) a runtime system which exploits both the Nanos and XiTAO libraries, (c) fault tolerance and security schemes for GPU, FPGA and CPU, and (d) a tool to detect correctness errors in the development of LEGaTO programs. 

The team has been busy running detailed experiments on the software components. These experiments have demonstrated considerable advances towards the project targets of energy efficiency and fault tolerance.

  • Energy efficiency: LEGaTO is exploring two main approaches to achieve higher energy efficiency. The first approach consists in improving the selection of resources, such as CPU cores or GPUs. This research, which is mainly conducted in the context of the XiTAO runtime, has demonstrated up to twice the performance while halving energy consumption.

A second approach consists in aggressively adopting FPGA technology, which has been demonstrated to achieve large gains in efficiency on selected applications. Experiments conducted in the context of the OmpSs@FPGA platform have demonstrated speed-ups of up to 128 times on an N-body benchmark compared with a traditional parallel software implementation running on one ARM A53 CPU.

  • Fault tolerance: The current LEGaTO release also features support for advanced fault tolerance in the form of fast GPU checkpointing and of reliable, energy-efficient FPGA undervolting. Checkpointing is a technique that periodically stores the program state so that an application can recover from it after a crash, much like intermediate save points when editing a document. By means of novel differential multilevel checkpointing implemented in the FTI library, reductions of up to 62% in checkpointing time have been achieved. The latest release, FTI 1.3 Heraklion, includes support for HDF5 and for differential and incremental checkpointing on heterogeneous systems with CPUs and GPUs.

Undervolting is a technique to reduce the operating voltage of a device in order to save large amounts of energy. However, aggressive undervolting can result in large numbers of errors due to increases in the time needed to stabilize signals on the chips. Experimental analysis of FPGA undervolting conducted by the LEGaTO partners has shown that a power reduction of over 90% can be achieved in the FPGA BRAMs without incurring excessive error rates. As an upcoming step, the runtime is being enhanced to automatically exploit these observations for reliable operation at low energy consumption.
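The general idea behind differential checkpointing, as described above, can be sketched in a few lines of Python. This is a simplified illustration and not the FTI API: the `DiffCheckpoint` class, the `block_hashes` helper and the tiny `BLOCK_SIZE` are assumptions made for the example. The state is split into fixed-size blocks, each block is hashed, and only blocks whose hash has changed since the previous checkpoint are written again.

```python
import hashlib

BLOCK_SIZE = 4  # bytes per block; deliberately tiny so the example is easy to follow


def block_hashes(data: bytes):
    """Hash each fixed-size block of the state."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


class DiffCheckpoint:
    """Keeps one stored copy of the state and rewrites only changed blocks."""

    def __init__(self):
        self.stored = bytearray()  # stands in for checkpoint storage
        self.hashes = []           # block hashes from the previous checkpoint

    def save(self, data: bytes) -> int:
        """Checkpoint `data` and return the number of blocks actually written."""
        new_hashes = block_hashes(data)
        if len(self.stored) != len(data):
            # Resize the stored copy if the state grew or shrank.
            self.stored = bytearray(self.stored[:len(data)].ljust(len(data), b"\0"))
        written = 0
        for idx, digest in enumerate(new_hashes):
            if idx >= len(self.hashes) or self.hashes[idx] != digest:
                start = idx * BLOCK_SIZE
                self.stored[start:start + BLOCK_SIZE] = data[start:start + BLOCK_SIZE]
                written += 1
        self.hashes = new_hashes
        return written

    def restore(self) -> bytes:
        """Return the most recently checkpointed state."""
        return bytes(self.stored)


ckpt = DiffCheckpoint()
first = ckpt.save(b"aaaabbbbccccdddd")   # first checkpoint: every block is new
second = ckpt.save(b"aaaaXXXXccccdddd")  # only the second block has changed
print(first, second)
```

In this toy run the second checkpoint rewrites a single block rather than all four, which is where the time saving comes from when only a small part of the state changes between checkpoints. The cost is that every block must still be hashed, so the benefit depends on how much of the data actually changes.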

All components of the LEGaTO toolchain back end are open source and can be freely accessed via the LEGaTO website.

Further reading

An Adaptive Performance-oriented Scheduler for Static and Dynamic Heterogeneity. J Chen, PN Soomro, M Abduljabbar, M Pericàs. arXiv preprint arXiv:1905.00673, 2019

Application-Level Differential Checkpointing for HPC Applications with Dynamic Datasets. K Keller, LB Gomez. 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019

Comprehensive Evaluation of Supply Voltage Underscaling in FPGA on-chip Memories. Behzad Salami, Osman S Unsal, Adrian Cristal Kestelman. 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018