AI Technology: Deep Learning | Machine Learning
Understanding the evolution and structure-to-function relationships of proteins, the building blocks of life, is crucial for advances in fields such as drug discovery, protein engineering, and evolutionary biology. High-quality 3D protein structures are urgently needed: every day without this data means lost opportunities for progress in these fields.
The 3D structure of a protein conveys a wealth of information about its function and is therefore a stepping stone for designing novel, targeted drugs. This project aims to leverage EuroHPC's GPU capabilities to predict the 3D structure of proteins using state-of-the-art, computationally demanding machine learning algorithms. By combining sequence data from comprehensive databases like UniProt with the computational resources of EuroHPC, we will generate detailed 3D structural models for hundreds of thousands of proteins, including multiple families of structurally similar proteins undergoing different evolutionary (i.e., mutational) processes. Access to such a repository is essential to accelerate biomedical research, from understanding disease mechanisms to engineering novel therapeutics and tackling other pressing challenges in healthcare and biotechnology.
In particular, this project aims to enhance the understanding of protein evolution by integrating 3D structural information into existing sequence-based models. This integration will enable more precise predictions of how mutations affect protein stability and function, offering novel insights into evolutionary dynamics at the structural level. By using state-of-the-art AI-driven approaches for protein structure prediction, e.g., the open-source AlphaFold 3, this project aims to create the first comprehensive, open-access 3D protein data bank, a valuable resource for the broader scientific community. The project is only feasible with the interconnected GPU nodes offered by EuroHPC, which allow efficient parallel processing and communication, so that complex structure prediction and evolutionary analyses can be performed at scale.
The outcomes are expected to significantly advance our understanding of protein evolution, potentially leading to new hypotheses, tools, and methods in computational biology, and ultimately accelerating drug discovery research. These efforts will also foster cross-disciplinary collaborations and support the broader ecosystem of computational science and innovation in Europe.
Davide Mottin, Aarhus University, Denmark