Biomarker discovery

Biomarkers are measurable values that serve as indicators of a biological condition. They are used to identify risk factors, examine diseases, predict diagnoses, determine the state of a disease, or measure the effectiveness of treatment.

Biomarker pilot studies with very small sample sizes are often intended to check whether it is worthwhile to collect more samples in order to construct a classifier from a larger data set.

Biomarker workflow

Computationally intensive methods that check the potential of such small biomarker pilot studies, and that estimate the sample size required to obtain a sufficiently reliable classifier from the larger sample, have been developed and accelerated within LEGaTO.

To construct a classifier, feature selection techniques are required to drastically reduce the number of attributes (biomarker candidates). One way to evaluate the predictive power of a classifier is leave-one-out cross-validation, which has been applied here and is again computationally expensive.
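As a rough illustration of why this is expensive, the following minimal sketch repeats feature selection inside every leave-one-out fold, so the whole selection step runs once per sample. It is not the LEGaTO implementation: the ranking score (absolute difference of class means), the nearest-centroid classifier, the synthetic data, and all function names are assumptions chosen only to keep the example short.

```python
import random
from statistics import mean


def select_features(X, y, k):
    """Rank features by absolute difference of class means (an assumed,
    deliberately simple score) and return the indices of the top k."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(mean(pos) - mean(neg)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Predict the label whose class centroid is closest to x,
    using only the selected features."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [row for row, l in zip(X_train, y_train) if l == label]
        centroid = [mean(r[j] for r in rows) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, k):
    """Leave-one-out cross-validation. Feature selection is redone in every
    fold so the held-out sample never influences which features are chosen."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        features = select_features(X_train, y_train, k)
        if nearest_centroid_predict(X_train, y_train, X[i], features) == y[i]:
            correct += 1
    return correct / len(X)


def make_demo_data(seed=1, n_per_class=10, n_features=50, shift=4.0):
    """Synthetic pilot study: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                row[0] += shift
                row[1] += shift
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    print("LOOCV accuracy:", loocv_accuracy(X, y, k=2))
```

With n samples and p candidates, the selection step alone is repeated n times over all p candidates, which is where most of the cost comes from once p reaches the tens of thousands.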

Some individual biomarker candidates correlate strongly with certain diseases but are not reliable enough on their own to predict the presence of a specific disease. A combination of biomarker candidates can often deliver a diagnosis with higher certainty. This calculation will be accelerated using LEGaTO tools.
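The effect of combining candidates can be seen on a toy example. The sketch below uses a small, hand-made data set (not study data) with two hypothetical markers, `m1` and `m2`. It finds the best single cut-off for each marker alone and for their sum, under the assumption that higher values indicate disease. The function name and the choice of a simple sum as the combination rule are illustrative assumptions.

```python
def best_threshold_accuracy(scores, labels):
    """Try every cut-off t and classify a sample as diseased when score >= t.
    Return the best accuracy any single cut-off achieves."""
    candidates = sorted(set(scores)) + [max(scores) + 1]
    best = 0.0
    for t in candidates:
        correct = sum((s >= t) == bool(l) for s, l in zip(scores, labels))
        best = max(best, correct / len(labels))
    return best


# Toy data: five healthy samples (label 0) followed by four diseased (label 1).
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
m1 = [1, 2, 1, 3, 0, 3, 2, 3, 4]  # hypothetical marker 1
m2 = [1, 1, 2, 0, 3, 3, 3, 2, 1]  # hypothetical marker 2
combined = [a + b for a, b in zip(m1, m2)]  # simple sum as combination rule

if __name__ == "__main__":
    print("m1 alone:", best_threshold_accuracy(m1, labels))
    print("m2 alone:", best_threshold_accuracy(m2, labels))
    print("m1 + m2 :", best_threshold_accuracy(combined, labels))
```

On this toy data each marker alone misclassifies at least two of the nine samples at its best cut-off, while the sum separates the two groups completely. Searching for good combinations among thousands of candidates multiplies the number of evaluations, which is why the calculation benefits from acceleration.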

To address data security concerns related to medical data, Scone has been used to demonstrate a secure way to protect sensitive data.

Healthcare use case in the LEGaTO system

Computationally intensive methods based on Monte Carlo simulations and permutation tests can test the potential of small biomarker pilot studies, but to apply them to large amounts of data the algorithm had to be accelerated. With the Maxeler DFE, 1 million simulations for 10,000 biomarker candidates run in 5.49 seconds instead of 2.5 hours on a standard laptop, a speed-up of roughly 1,600-fold. This acceleration makes it possible to run 5 million simulations for 50,000 biomarker candidates in just 29.2 seconds.
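To give a feel for the kind of workload involved, here is a minimal sketch of a generic Monte Carlo permutation test for a single biomarker candidate. It is not the accelerated LEGaTO algorithm: the test statistic (absolute difference of group means), the function name, and the example values are assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """Estimate how often a random relabelling of the samples produces a
    group difference at least as large as the observed one."""
    rng = random.Random(seed)

    def statistic(lbls):
        group1 = [v for v, l in zip(values, lbls) if l == 1]
        group0 = [v for v, l in zip(values, lbls) if l == 0]
        return abs(mean(group1) - mean(group0))

    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # one Monte Carlo draw: random relabelling
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical measurements: five controls (0) and five cases (1).
    labels = [0] * 5 + [1] * 5
    values = [1.0, 1.2, 0.9, 1.1, 1.3, 3.0, 3.2, 2.9, 3.1, 3.3]
    print("estimated p-value:", permutation_p_value(values, labels))
```

Each candidate needs its own loop of this kind, so the cost grows with the number of simulations times the number of candidates. The iterations are independent of one another, which is the property that makes such workloads a natural fit for a dataflow accelerator.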

Presentation 'Infection Research with Maxeler Dataflow Computing' by Tobias Becker (Maxeler) at the LEGaTO final event



  • OmpSs
  • Scone
  • Microserver Hardware Platform

LEGaTO components are available here:


Scientific publications



Event | Date | Materials
German Society for Clinical Chemistry and Laboratory Medicine Congress (DGKL2019) | 26 September 2019 | Slides: "Cerebrospinal fluid metabolites as biomarkers to distinguish between viral and autoimmune encephalitis"; Poster: "Combination of informative biomarkers in small pilot studies and estimation of sample size for extended studies"



Medium | Date | Article
LEGaTO website | June 2020 | Women and STEM: LEGaTO researcher talks about her passion for statistics
HiPEAC info | January 2020 | Bright sparks - Tackling the energy challenge in computing systems
LEGaTO website | September 2019 | Women and STEM: LEGaTO researcher talks about her experience as a woman in academia
LEGaTO website | February 2018 | New project to plug the software-stack support gap for energy-efficient computing