The European High Performance Computing Joint Undertaking (EuroHPC JU)

Reproducible Scaling Laws for linear RNN open foundation models with strongly improved generalization and reasoning

Awarded Resources: 50,000 node hours
System Partition: Leonardo BOOSTER
Allocation Period: January 2025 - January 2026

AI Technology: Generative Language Modeling & Deep Learning

This project seeks to establish Linear Recurrent Neural Networks (LRNNs) as scalable and efficient alternatives to transformer-based Large Language Models (LLMs). 

To achieve this, we will train and evaluate promising LRNNs such as DeltaNet and Mamba, together with our recently proposed modifications, and also explore hybrid architectures, guided by both theoretical insights and benchmark results, in order to identify the best candidate architecture.
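To make the linear-RNN mechanism concrete, the sketch below shows a sequential, non-optimized version of the delta-rule state update that underlies DeltaNet-style layers (following Schlag et al. and Yang et al.). The variable names, tensor shapes, and use of PyTorch are illustrative assumptions, not the project's training code, which would rely on chunkwise-parallel kernels rather than a Python loop.

```python
# Minimal sketch of a DeltaNet-style delta-rule recurrence (illustrative only).
import torch

def delta_rule_recurrence(q, k, v, beta):
    """Apply the delta-rule update sequentially over a sequence of length T:
        S_t = S_{t-1} - beta_t * (S_{t-1} k_t - v_t) k_t^T,   o_t = S_t q_t
    q, k: (T, d_k); v: (T, d_v); beta: (T,) with values in (0, 1).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = torch.zeros(d_v, d_k)                 # fast-weight (matrix-valued) state
    outputs = []
    for t in range(T):
        pred = S @ k[t]                       # what the current state recalls for key k_t
        S = S - beta[t] * torch.outer(pred - v[t], k[t])  # delta-rule correction toward v_t
        outputs.append(S @ q[t])              # read out with the query
    return torch.stack(outputs)               # (T, d_v)

# Toy usage with random inputs
T, d = 16, 8
q, k, v = (torch.randn(T, d) for _ in range(3))
beta = torch.sigmoid(torch.randn(T))          # learning-rate-like gate per step
print(delta_rule_recurrence(q, k, v, beta).shape)  # torch.Size([16, 8])
```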

Drawing inspiration from open initiatives for transformer-based LLMs such as DCLM by Apple/LAION, OLMo by AI2, and BLOOM by BigScience, we aim to develop the first LRNN-based LLMs whose weights, dataset, and training pipeline are fully open source, that are competitive with transformer-based models at the same training budget and parameter count, and that are potentially strong at code, math, and reasoning.

The project will train models ranging from 1 to 7 billion parameters on up to 1 trillion tokens.
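For context on the "scaling laws" in the project title: a commonly used parametric form for such fits is the Chinchilla-style loss model of Hoffmann et al., reproduced below purely as an illustration; the project summary does not specify which functional form will be fitted.

```latex
% Chinchilla-style scaling-law fit (illustrative assumption, not the project's stated method)
% L = final pre-training loss, N = parameter count, D = number of training tokens
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% E, A, B, \alpha, \beta are fitted constants; training runs at several (N, D) points
% (e.g. 1B-7B parameters, up to 1T tokens) supply the data for the fit.
```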