The European High Performance Computing Joint Undertaking (EuroHPC JU)

JuriLabs

Awarded Resources: 35,000 node hours
System Partition: LUMI-G
Allocation Period: January 2025 – January 2026

AI Technology: Generative Language Modeling | Virtual agents | Decision management: Classification and statistical learning methods

JuriLabs is a virtual legal assistant based on generative AI. The application of LLMs to the legal field is emerging rapidly, with many preliminary works in progress. Previous experiments have shown the importance of text database quality and the specificity of legal corpora by country and language, as well as the need for a validation structure to compare models with one another and to measure the progress of a model on a given corpus.

Large language models applied to law must overcome several challenges: citing their sources, articulating a line of reasoning, preserving the structure of the language (and in particular the ability to reason) during retraining, being able to “forget” part of the accumulated knowledge (or at least attaching a temporal dimension to it, so that law which is no longer applicable is set aside), and drawing on the data of user law firms, which all hold a large corpus of past work that they wish to feed into AI agents. The trade-off between fine-tuning and retrieval-augmented generation (RAG) will be assessed during testing, and the results will be published.
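
For context only, the RAG side of that trade-off is typically prototyped by retrieving the most relevant legal provisions and prepending them to the prompt so the model can cite its sources. The sketch below is a minimal illustration using TF-IDF retrieval; the corpus entries, function names and prompt format are hypothetical placeholders, not part of the JuriLabs system.

```python
# Minimal RAG-style retrieval sketch (illustrative only; hypothetical corpus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder snippets standing in for French legal provisions.
corpus = [
    "Disposition A : texte (placeholder) sur la responsabilité civile ...",
    "Disposition B : texte (placeholder) sur la rupture du contrat de travail ...",
    "Disposition C : texte (placeholder) sur la force obligatoire des contrats ...",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query (TF-IDF cosine)."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(corpus)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(question: str) -> str:
    """Assemble a prompt that asks the model to answer while citing the retrieved sources."""
    context = "\n".join(f"- {passage}" for passage in retrieve(question))
    return f"Sources:\n{context}\n\nQuestion : {question}\nRéponse motivée avec citations :"

print(build_prompt("Quelles sont les conditions de la responsabilité civile ?"))
```

In a fine-tuning setup, by contrast, the same legal knowledge would be baked into the model weights, which is precisely why the temporal validity of the law and the ability to “forget” become central evaluation criteria.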

The JuriLabs project aims to develop a virtual legal assistant based on a set of large language models that addresses these specific challenges. Using a base of augmented French legal content, the project will move from a “raw” model to an “instruction” model trained on a complete corpus of case law and capable of replicating legal reasoning. This mixture-of-experts (MoE) model has been tested with a few legal experts in France and reached a good level of quality, but it now needs to be improved both in the number of parameters of the central model and in the precision of the expert models that detect intentions.
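
The project description does not detail the MoE architecture; as a generic illustration only, a mixture-of-experts layer routes each input through a small gating network (here playing the role of an “intent detector”) that selects and weights a few specialised expert sub-networks. The sketch below is a standard top-k MoE layer in PyTorch, with assumed dimensions, not the JuriLabs model.

```python
# Generic top-k mixture-of-experts layer (illustrative sketch, assumed dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        # Each expert is a small feed-forward sub-network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)  # router ("intent detector")
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model). Route each input to its top_k experts.
        weights, idx = torch.topk(F.softmax(self.gate(x), dim=-1), self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalise over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = SimpleMoE(d_model=64, n_experts=4)
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```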

Based on foundation models such as Llama 2 and Mistral 7B, the researchers have developed a technique to gradually increase the size of these models. The system now needs to be fine-tuned on a new augmented foundation in order to validate the training methodology, the model structure and the evaluation structure. By relying on specialized language models trained on large legal corpora, the tool is expected to deliver significant productivity and quality gains in legal research, drafting and case-law monitoring.
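
As a rough indication of what fine-tuning such a foundation model looks like in practice, the sketch below uses the Hugging Face transformers and peft libraries to attach LoRA adapters to Mistral 7B. The model identifier, hyperparameters and the single placeholder training example are assumptions for illustration; this is not the project's actual training pipeline.

```python
# Generic LoRA fine-tuning sketch (illustrative; dataset and hyperparameters are placeholders).
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the frozen base model with low-rank adapters on the attention projections.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))

# Placeholder instruction-style legal example (stands in for the augmented corpus).
train = Dataset.from_dict({"text": [
    "Question : ... Réponse motivée avec sources : ...",
]}).map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
        remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="jurilabs-lora", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

At the scale of the awarded LUMI-G allocation, the same adapter-based recipe would of course be run on the full augmented corpus across many GPU nodes rather than on a single placeholder example.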