The European High Performance Computing Joint Undertaking (EuroHPC JU)

FairCLIP: Training a Fair CLIP Model with Hybrid Real and Synthetic Data

Awarded Resources (in node hours): 32,000
System Partition: MareNostrum5 ACC
Allocation Period: July 2024 - July 2025

AI Technology: Vision (image recognition, image generation, text recognition/OCR, etc.); Natural Language Processing.

 

The FairCLIP project aims to advance large-scale vision and language models by addressing inherent biases in training datasets. 

Using a balanced hybrid dataset that combines real and synthetic data, FairCLIP focuses on training a CLIP (Contrastive Language-Image Pre-training) model that promotes fairness across demographic groups.
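For readers unfamiliar with CLIP, the contrastive objective it is generally trained with can be sketched as follows. This is a minimal, dependency-free illustration and not the project's training code: the function names, the toy two-dimensional embeddings, and the fixed temperature of 0.07 are assumptions made for the example (in practice the temperature is usually a learned parameter and the embeddings come from image and text encoders).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss.

    image_embs[i] and text_embs[i] are assumed to form a matching pair.
    The temperature value is an illustrative assumption.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # logits[i][j]: scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    image_to_text = diagonal_cross_entropy(logits)
    text_to_image = diagonal_cross_entropy(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)


# Toy usage: correctly paired embeddings versus deliberately swapped captions.
aligned_loss = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)
```

Because the loss only rewards matching each image to its own caption within a batch, the composition of the training pairs directly shapes what the model learns, which is why the balance of the dataset matters for fairness.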

The approach uses state-of-the-art diffusion models to generate the synthetic data, with the aim of ensuring diverse representation without compromising model performance. Key milestones include scaling experiments across small, medium, and large datasets, supported by high-performance computing resources and the associated software frameworks.

The project emphasizes interdisciplinary collaboration, bridging AI research with ethics and the social sciences to pioneer equitable AI solutions.