The European High Performance Computing Joint Undertaking (EuroHPC JU)

Improving the robustness and generalizability of data-driven methods for Spatial Transcriptomics data

Awarded Resources (in node hours): 25,000
System Partition: MeluXina GPU
Allocation Period: January 2025 - January 2026

AI Technology: Vision (image recognition, image generation, text recognition OCR, etc.) | Deep Learning | Other 

Recent advances in spot-based spatial transcriptomics (SpT) technologies (e.g. 10x Genomics Visium) and sub-cellular spatial transcriptomics technologies (e.g. 10x Genomics Xenium or NanoString CosMx) are likely to allow medical researchers and biostatisticians to unlock novel insights from tissue samples and could revolutionize our understanding of cancer biology.

These new spatial sequencing technologies, named “Method of the Year” by Nature Methods (2020), provide unprecedented information about cellular diversity, composition and intercellular communication. Such information is crucial to refine our understanding of the tumor microenvironment (TME) and its interaction with the immune system, which is key to better understanding cancer and developing new drugs. Nevertheless, because these technologies emerged only recently, researchers still lack the hindsight needed to fully exploit this novel source of information, namely SpT data.

Because the highly variable sequencing process is applied to a densely packed spatial grid, spatial sequencing causes spillover effects between neighboring spots and introduces spatial correlations between them. These effects alter the sequencing output and can lead traditional analyses to erroneous interpretations. This observation has sparked an entirely new field of research into methods that aim to better understand and analyze SpT data.
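As a rough illustration of how spillover between neighboring spots can create spatial correlation, the Python sketch below simulates independent counts on a square grid, mixes each spot with its four nearest neighbors, and measures the result with Moran's I (a standard spatial autocorrelation statistic). The mixing model, the `spill` parameter, the Poisson rate, and the function names are assumptions made for illustration only; they are not taken from the project and do not model any specific SpT platform.

```python
import numpy as np


def simulate_spots(n=30, spill=0.0, seed=0):
    """Simulate one gene on an n x n grid of spots (toy model, not a real assay).

    Counts start as independent Poisson draws. A fraction `spill` of each
    spot's signal is then replaced by the mean of its 4 nearest neighbors.
    """
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam=5.0, size=(n, n)).astype(float)
    # Pad with edge values so border spots also have four "neighbors".
    padded = np.pad(counts, 1, mode="edge")
    neighbor_mean = (
        padded[:-2, 1:-1] + padded[2:, 1:-1] + padded[1:-1, :-2] + padded[1:-1, 2:]
    ) / 4.0
    return (1.0 - spill) * counts + spill * neighbor_mean


def morans_i(grid):
    """Moran's I with binary weights between 4-connected (rook) neighbors."""
    z = grid - grid.mean()
    # Each unordered neighbor pair appears twice in the symmetric weight matrix.
    cross = 2.0 * ((z[:-1, :] * z[1:, :]).sum() + (z[:, :-1] * z[:, 1:]).sum())
    n_rows, n_cols = grid.shape
    weight_sum = 2.0 * ((n_rows - 1) * n_cols + n_rows * (n_cols - 1))
    return (grid.size / weight_sum) * cross / (z ** 2).sum()


if __name__ == "__main__":
    for spill in (0.0, 0.25, 0.5):
        value = morans_i(simulate_spots(spill=spill))
        print(f"spill={spill:.2f}  Moran's I={value:.3f}")
```

In this toy setting, Moran's I stays close to zero without spillover and rises as `spill` increases, even though the underlying counts were generated independently. An analysis that treats spots as independent observations would overlook this induced correlation.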

In particular, new methods such as BayesSpace and SpaNorm have been proposed, claiming to remove this unwanted source of variation (a so-called “batch effect”) or to better integrate the spatial dimension of the data into analyses (e.g. C-SIDE), leading to more robust results. While these methods show great promise, they mostly rely on traditional statistical models and do not leverage recent advances in generative artificial intelligence (GenAI).

This project aims to propose novel methods that leverage multi-modality and foundation models to correct SpT data for unwanted sources of variation while preserving the biological signal. The project also aims to adapt downstream analyses, such as differential gene expression (DGE) analysis, to such newly corrected (or normalized) data. Finally, the project aims to exploit Large Language Models (LLMs) to interact with this data modality in natural language and to automate state-of-the-art analyses.