The SafeLMM Project

Awarded Resources (in node hours): 1,050,000

System Partition: Leonardo Booster

Allocation Period: 20 April 2024 - 19 April 2025

The Synthetic-data, Fair and Extreme-scaled Large Multimodal Model (SafeLMM) project will redefine the AI landscape by pioneering next-generation multimodal models that emphasise ethical and regulatory compliance.

Using the Leonardo supercomputer, and in collaboration with LAION e.V., Juelich Supercomputing Center, the Horizon Europe project HPLT, Ontocord AI, PIISA.org and Efficient Translation Limited, among others, the SafeLMM models, ranging from 7B to 34B parameters, will harness vast amounts of detoxified synthetic data, together with open and permissively licensed real data spanning images and text in 31 languages, to address compliance with regulations.

Key project contributions:

Innovative Modelling: Implementing multimodal architectures that could enable zero-shot transfer across modalities, languages, and domains.
Safe Content: Crafting content that strictly adheres to ethical and regulatory guidelines, addressing bias, toxicity, and privacy concerns.
Robust Governance: Incorporating data provenance, safety filtering, and attribution systems during training and inference.
Open-Access: Developing and sharing comprehensive documentation, data, models, tools and libraries, promoting transparency and collaboration in AI research.

By leveraging High Performance Computing, SafeLMM offers a fusion of synthetic data, multimodal capabilities and responsible AI practices. The ambition is not just high-performance models, but ones that carry the stamp of scientific, societal, and technological excellence.