The European High Performance Computing Joint Undertaking (EuroHPC JU)

LLM based CAD Information Extraction

Awarded Resources (in node hours): 25000
System Partition: MeluXina GPU
Allocation Period: January 2025 - January 2028

The core idea of this project is to study the impact of utilising generative AI, specifically multimodal Large Language Models (LLMs), to extract metadata from rendered 2D CAD (Computer-Aided Design) files inside a data exchange solution for the automotive industry. 

CAD files are widely used in manufacturing sectors such as automotive, aerospace, and construction. Traditionally, these CAD files are archived as images, making it difficult to retrieve structured information. 

Current Optical Character Recognition (OCR) systems have limitations in accurately extracting tabular information and annotations from these images, especially for complex layouts or densely annotated drawings.

The expected outcomes of this project therefore include:

  • Extensive Benchmarking of Multimodal (Vision) LLMs for Industrial OCR: A comprehensive evaluation of the performance and cost of various multimodal LLMs for metadata extraction from 2D CAD files, using hyperparameter optimisation across different existing alignment and fine-tuning strategies.
  • Customisation Strategies Using Synthetic Data: Development of effective strategies, based on synthetic data generation, for fine-tuning and customising multimodal LLMs for domain-specific applications in manufacturing and construction.
  • Enhanced Knowledge Extraction: Improved methodologies for extracting structured and unstructured knowledge from visual data, potentially transforming how information is retrieved and used in the manufacturing and construction sectors.
  • Scalable Solutions Using HPC and Local Inference: Development of a scalable, cost-effective AI solution, based on HPC training and local LLM inference, that can be customised for other multimodal knowledge extraction and information retrieval use cases and that meets the industry's need for confidentiality in highly sensitive sectors.