The European High Performance Computing Joint Undertaking (EuroHPC JU)

Investigation of Self Supervised Tokenization for LLM Based Sign Language Translation (TokenSignLLM)

Awarded Resources (in node hours): 35,000
System Partition: LUMI-G
Allocation Period: November 2024 - November 2025

AI Technology: Vision (image recognition, image generation, text recognition/OCR, etc.) | Natural Language Processing

Advancements in pre-trained large language models (LLMs) have significantly enhanced language understanding across multiple modalities. Sign language, which is expressed through hand shapes, upper body gestures, and facial expressions, can benefit from this development. Tokenization is commonly the first step in natural language processing, but the scarcity of annotated data for many sign languages limits the tokenization of sign videos. This project proposes a novel experimental suite for sign language translation that compares unsupervised tokenization strategies based on vector quantization methods used in speech processing.

Our approach comprises two main stages: 

  • tokenized sign language representation 
  • machine translation

For representation, latent encoding schemes with pre-trained image and video embeddings are employed. Using the YouTube ASL dataset, which offers approximately 2,800 hours of American Sign Language video with English translations, we aim to enhance model performance through pre-training. Since the dataset includes English translations, it can be utilized for supervised pre-training of the sign language embedding layer, for both continuous and tokenized discrete representations. In the machine translation component, the learned representations are used to fine-tune the model on the desired dataset and sign language via transfer learning.

Turkish Sign Language (TID) is one of the under-resourced sign languages. This project also aims to develop an efficient sign language translation system for under-resourced sign languages like TID, improving accessibility for the deaf and hard-of-hearing community and positively impacting their daily communication and social integration.
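To make the tokenized representation stage more concrete, the following is a minimal sketch of unsupervised vector quantization applied to per-frame embeddings, in the spirit of the discrete-unit methods used in speech. It is not the project's implementation: the function names (`fit_codebook`, `tokenize`), the use of plain k-means, the codebook size, and the collapsing of repeated tokens are all assumptions made for illustration, and the embeddings are assumed to come from some pre-trained image or video encoder that is not shown here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(vector, codebook[k]))


def fit_codebook(vectors, num_codes, iterations=10, seed=0):
    """Learn a codebook with plain k-means (assumed here; no labels needed)."""
    rng = random.Random(seed)
    # Initialise the codebook from randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(vectors, num_codes)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_codes)]
        for v in vectors:
            clusters[nearest_code(v, codebook)].append(v)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def tokenize(frame_embeddings, codebook, collapse_repeats=True):
    """Map each frame embedding to a discrete token id.

    Collapsing consecutive duplicates is an assumption borrowed from
    speech-unit pipelines; it shortens the sequence passed to the translator.
    """
    tokens = [nearest_code(v, codebook) for v in frame_embeddings]
    if collapse_repeats:
        tokens = [t for i, t in enumerate(tokens)
                  if i == 0 or t != tokens[i - 1]]
    return tokens
```

In use, `fit_codebook` would be run once over embeddings extracted from unlabeled sign videos, and `tokenize` would then turn each new video's frame embeddings into a sequence of integer ids that a language model can consume like text tokens. A learned quantizer trained jointly with the encoder could replace the k-means step; the nearest-codeword lookup would stay the same.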