The European High Performance Computing Joint Undertaking (EuroHPC JU)

Modern ColBERT

Awarded Resources (in node hours): 32,000
System Partition: MareNostrum 5 ACC
Allocation Period: May 2025 - May 2026

AI Technology: Natural Language Processing

Information Retrieval (IR) is crucial for search engines and knowledge discovery, yet current methods struggle with the trade-off between effectiveness and efficiency. 

Late-interaction models like ColBERT offer a balance, enabling fine-grained token-level interactions without excessive computational costs. However, existing implementations are outdated and not optimized for modern NLP workloads. This project proposes ModernColBERT, a next-generation retrieval model built on ModernBERT, a state-of-the-art encoder.
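
To illustrate the token-level interaction mentioned above, here is a minimal sketch of the MaxSim scoring used by ColBERT-style late-interaction models: each query token embedding is matched to its most similar document token embedding, and the per-token maxima are summed. The function names (`l2_normalize`, `maxsim_score`) and the toy two-dimensional vectors are assumptions chosen for illustration; they are not taken from the ModernColBERT codebase, and a real model would produce the embeddings with an encoder such as ModernBERT.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction relevance score (hypothetical helper).

    For every query token embedding, take the maximum cosine similarity
    over all document token embeddings, then sum those maxima.
    """
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    score = 0.0
    for qv in q:
        # Best-matching document token for this query token.
        score += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return score


# Toy embeddings (illustrative only): two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
doc_b = [[1.0, 0.0]]              # matches only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Because document token embeddings do not depend on the query, they can be computed and indexed ahead of time; only the inexpensive max-and-sum step runs at query time, which is where the balance between effectiveness and efficiency comes from.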

Using the Nomic Embed dataset, we aim to train a scalable, efficient, and high-performing ColBERT model, targeting top-tier performance on the Massive Text Embedding Benchmark (MTEB). ModernColBERT will drive advances in frugal, sustainable IR systems, benefiting both research and real-world applications.