Awarded Resources (in node hours): 2,200,000
System Partition: MareNostrum5 ACC
Allocation Period: 1 May 2024 - 30 April 2025
The joint effort of OpenGPT-X and AI Sweden seeks to create a free, large-scale multilingual European language model, “EuroLingua GPT”, and to embed it in an infrastructure that allows a broad range of commercial and non-commercial players to leverage its potential.
With the proposed project, the team takes a major step in this direction. The project makes the following scientific contributions:
- i) an extensive study on the impact of data quality on multilingual large language models (LLMs),
- ii) scaling laws for multilingual LLMs with respect to the number of model parameters and the number of languages,
- iii) training of a 180B-parameter model on the 24 official European languages and multiple programming languages,
- iv) novel fine-tuning approaches for improving the factual correctness of multilingual LLMs,
- v) multilingual alignment and instruction tuning of our trained model, and
- vi) multilingual evaluation of our LLM.
Lindholmen Science Park AB - Sweden
Fraunhofer IAIS - Germany