AI Technology: Generative Language Modeling, Machine Learning, Deep Learning, Natural Language Processing
Following the launch of ChatGPT, businesses, public institutions and individuals alike have rapidly adopted chatbots for day-to-day consultations and text writing. However, just as adoption has skyrocketed, so have concerns over the financial, environmental and societal costs of hosting such models and over the data sources used to train them.
Privacy is a further concern: closed-source chatbots are hosted in privately controlled cloud environments, yet their publicly accessible front-ends let any user upload potentially sensitive information to generate answers. This highlights the need to find ways of leveraging the functionality of chatbots while ensuring privacy.
In January 2024, Multiverse Computing published a “phase 1” research paper on CompactifAI, a promising new technique for compressing large language models (LLMs) using tensor networks.
In that paper, the team demonstrated that CompactifAI alone can compress Meta’s open-source LLaMA-2 7B model to only 30% of its original size while recovering over 90% of the original accuracy after a brief distributed retraining.
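The paper describes its own layer-by-layer decomposition scheme; as a minimal sketch of the underlying idea, the snippet below reshapes a weight matrix into a higher-order tensor and factorizes it into a two-site Matrix Product Operator (MPO) whose bond dimension chi caps how many singular values are retained. The function names, the simple two-site index grouping and the chosen chi are illustrative assumptions, not CompactifAI’s actual implementation.

    import numpy as np

    def mpo_compress(W, chi):
        # Split each matrix index in two, regroup as (o1,i1) x (o2,i2),
        # and truncate the SVD at bond dimension chi. Dropping the
        # smallest singular values discards the least-correlated degrees
        # of freedom, which is the essence of tensor-network compression.
        d_out, d_in = W.shape
        o1 = int(np.sqrt(d_out)); o2 = d_out // o1
        i1 = int(np.sqrt(d_in));  i2 = d_in // i1
        assert o1 * o2 == d_out and i1 * i2 == d_in  # toy sketch: sizes must factor
        T = W.reshape(o1, o2, i1, i2).transpose(0, 2, 1, 3)
        T = T.reshape(o1 * i1, o2 * i2)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        k = min(chi, len(S))
        A = U[:, :k] * S[:k]   # left MPO tensor,  shape (o1*i1, k)
        B = Vt[:k, :]          # right MPO tensor, shape (k, o2*i2)
        return A, B

    def mpo_reconstruct(A, B, shape):
        # Contract the MPO back into a dense matrix (inverse reshuffle).
        d_out, d_in = shape
        o1 = int(np.sqrt(d_out)); o2 = d_out // o1
        i1 = int(np.sqrt(d_in));  i2 = d_in // i1
        T = (A @ B).reshape(o1, i1, o2, i2).transpose(0, 2, 1, 3)
        return T.reshape(d_out, d_in)

    # Toy demonstration on a random 256 x 256 "weight matrix".
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256))
    A, B = mpo_compress(W, chi=64)
    W_hat = mpo_reconstruct(A, B, W.shape)
    print(f"parameters kept: {(A.size + B.size) / W.size:.0%}")
    print(f"relative error:  {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")

A random matrix compresses poorly under such a truncation; the premise of the approach is that trained LLM weight matrices carry enough internal correlation structure that aggressive truncation followed by a short retraining recovers most of the original accuracy.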
Compressing open-source chatbots not only speeds innovation by removing the source-code barrier, but also opens the possibility of hosting and running the compressed models on edge servers and smaller devices with limited internet connectivity.
This flexibility allows private and public entities to harness chatbots as customizable AI personal assistants while retaining control over the hosting infrastructure and training data used, and it drastically reduces the energy consumption and financial cost of training and hosting the smaller models.
Multiverse is requesting EuroHPC access to move the project’s research into “phase 2” and benchmark CompactifAI’s performance against the LLaMA-2 7B chatbot model.
The research team aims to continue validating advanced tensor network compression techniques for LLMs by benchmarking the compressed model, in particular on tasks such as commonsense reasoning, reading comprehension, language understanding, mathematics and code generation, while tracking data drift, in order to validate its performance and energy efficiency against the original model.
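The proposal does not specify the evaluation code; as a hedged sketch of what one side of such a comparison could look like with the Hugging Face transformers API, the snippet below scores a baseline checkpoint and a compressed one on the same text using perplexity, a common coarse proxy for language-modeling quality. The path to the compressed checkpoint is hypothetical, and the benchmark tasks named above would in practice be run through full evaluation suites rather than this toy measurement.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(model, tokenizer, text, device):
        # exp of the mean negative log-likelihood under teacher forcing
        enc = tokenizer(text, return_tensors="pt").to(device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        return torch.exp(out.loss).item()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    base_id = "meta-llama/Llama-2-7b-hf"        # public reference model
    comp_id = "path/to/compactifai-llama-2-7b"  # hypothetical compressed checkpoint

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    baseline = AutoModelForCausalLM.from_pretrained(base_id).to(device).eval()
    compressed = AutoModelForCausalLM.from_pretrained(comp_id).to(device).eval()

    sample = "Tensor networks factorize large weight matrices into smaller cores."
    print("baseline   ppl:", perplexity(baseline, tokenizer, sample, device))
    print("compressed ppl:", perplexity(compressed, tokenizer, sample, device))

Energy efficiency would presumably be tracked separately, for example by logging device power draw while both models run identical workloads.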
Roman Orús, Multiverse Computing Research SL - Spain