AI Technology: Natural Language Processing, Deep Learning
Digitalization is a longstanding goal of the EU, with varying degrees of progress among member states. Large language models are a promising catalyst for this process, but the current landscape presents significant challenges.
Proprietary models are poorly suited to scenarios where strong privacy and data protection are essential, while open alternatives tend to offer inferior performance or lack support for many languages.
There is also insufficient data to accurately project the resource and energy requirements of large-scale LLM adoption in the move towards a digital society.
This project aims to support the first steps towards accelerating this endeavor in Romania by training NELU, an open LLM with Romanian language support.
NELU is positioned as the baseline for building digitalization-oriented applications for both the public and private sectors and for evaluating their performance and runtime characteristics. It will also showcase the significant impact of novel dataset curation techniques for underrepresented languages such as Romanian.
Alexandru Agache, National University of Science and Technology Politehnica Bucharest - Romania