Extreme-scale models have sparked a paradigm shift in natural language processing. Trained on broad, plentiful data, they exhibit powerful emergent capabilities as they scale (e.g., zero-shot generalization), and their nearly universal effectiveness has led to a cornucopia of applications.
Accordingly, they have seen rapid, ubiquitous adoption, with a uniquely fast time-to-market. GPT-3—OpenAI’s 175-billion-parameter model—was commercially released in beta a month after its reveal to the research community in 2020. By 2021, it was powering 300 applications, generating 4.5 billion words per day. However, two years later, the tremendous costs associated with >10B-parameter models still limit their availability to English, Chinese, and Korean.
Although multilingual approaches are promising, they significantly underperform their monolingual counterparts, and progress is constrained by the state-of-the-art HPC infrastructure required to run representative experiments.
Leveraging EuroHPC resources, we propose to study pathways to efficient multilingual models.
We undertake the largest systematic study to date on the influence of data, modelling choices, multilingual adaptation, and task specialization on multilingual performance, relying on extensive benchmarks and scaling laws. Unlocking models capable of multilingual generalization holds the promise of bringing the extreme-scale revolution to many more languages, and of obtaining unique insights into the nature of generalization.
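Scaling laws of the kind invoked above are typically power laws fit in log-log space: loss decreases predictably as a power of parameter count, so fitting a line to log-transformed data recovers the scaling exponent. A minimal sketch, using synthetic loss data and purely illustrative constants (the exponent and scale below are not results from this study):

```python
import numpy as np

# Illustrative power-law scaling: L(N) = (Nc / N) ** alpha,
# where N is parameter count and alpha, Nc are illustrative constants.
alpha_true = 0.076
Nc = 8.8e13

# Synthetic "measured" losses across model sizes from 10M to 10B parameters.
N = np.logspace(7, 10, 20)
loss = (Nc / N) ** alpha_true

# In log space the law is linear: log L = -alpha * log N + alpha * log Nc,
# so an ordinary least-squares fit recovers the exponent from the slope.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
alpha_est = -slope
print(f"recovered scaling exponent: {alpha_est:.4f}")
```

In practice such fits are run on measured validation losses from a sweep of model sizes, and the extrapolated curve guides how to allocate a fixed compute budget across model size and data.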