The European High Performance Computing Joint Undertaking (EuroHPC JU)

EuroLingua GPT: One Model for all European Languages

Awarded Resources (in node hours): 2,200,000
System Partition: MareNostrum5 ACC
Allocation Period: 1 May 2024 - 30 April 2025

The joint effort of OpenGPT-X and AI Sweden seeks to create a free, large-scale multilingual European language model, “EuroLingua GPT”, and to embed it in an infrastructure that lets a broad range of commercial and non-commercial players leverage its potential.

With the proposed project, the team takes a major step in this direction. The project makes the following scientific contributions:

  • i) an extensive study of the impact of data quality on multilingual large language models (LLMs),
  • ii) scaling laws for multilingual LLMs with respect to the number of model parameters and the number of languages,
  • iii) training of a 180B-parameter model on the 24 official European languages and multiple programming languages,
  • iv) novel fine-tuning approaches for improving the factual correctness of multilingual LLMs,
  • v) multilingual alignment and instruction tuning of the trained model,
  • vi) multilingual evaluation of the trained LLM.