No retraining required: Sakana’s new AI model changes the way machines learn

Researchers at Sakana AI, an AI research lab focused on nature-inspired algorithms, have developed a self-adaptive language model that can learn new tasks without fine-tuning. Called Transformer² (Transformer-squared), the model uses mathematical tricks to adjust its weights to user requirements during inference.

This is the latest in a series of techniques aimed at improving the skills of large language models (LLMs) at inference time, making them increasingly useful for everyday applications in various domains.

Dynamically adjust weights

Typically, configuring LLMs for new tasks requires a costly fine-tuning process in which the model is exposed to new examples and its parameters are adjusted. A more cost-effective approach is low-rank adaptation (LoRA), in which a small subset of the model’s parameters relevant to the target task is identified and modified during fine-tuning.
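To make the contrast with Transformer-squared concrete, here is a minimal sketch of the LoRA idea, not Sakana’s code; all names and dimensions are illustrative.

```python
# Minimal LoRA sketch (illustrative only): the frozen weight W gets a
# trainable low-rank correction B @ A instead of being updated directly.
import torch

d_out, d_in, rank = 512, 512, 8

W = torch.randn(d_out, d_in)        # frozen pretrained weights
A = torch.randn(rank, d_in) * 0.01  # small trainable factor
B = torch.zeros(d_out, rank)        # trainable factor, initialized to zero

def lora_forward(x):
    # Original projection plus the low-rank update; only A and B are trained.
    return x @ W.T + x @ (B @ A).T

x = torch.randn(4, d_in)
y = lora_forward(x)                 # shape: (4, d_out)
```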

After training and fine-tuning, the model’s parameters remain frozen, and the only way to reuse it for new tasks is through techniques such as few-shot and many-shot learning.

Unlike classical fine-tuning, Transformer-squared uses a two-stage approach to dynamically adjust its parameters during inference. First, it analyzes the incoming request to understand the task and its requirements, and then applies task-specific adjustments to the model’s weights to optimize its performance for that specific request.

“By selectively adjusting critical components of model weights, our framework enables LLMs to dynamically adapt to new tasks in real time,” the researchers wrote in a blog post published on the company’s website.

How Sakana’s Transformer-squared works

Transformer-squared’s core capability is to dynamically adjust critical components of its weights during inference.

To do this, the key components that can be optimized during inference must first be identified. Transformer-squared does this through singular value decomposition (SVD), a linear algebra technique that decomposes a matrix into three other matrices that reveal its internal structure and geometry. SVD is often used to compress data or simplify machine learning models.
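As a rough illustration of what SVD does to a weight matrix (a generic example, not tied to any particular LLM):

```python
# Decompose a stand-in weight matrix into U, singular values S, and Vh;
# multiplying them back together recovers the original matrix.
import torch

W = torch.randn(512, 512)                 # stand-in for an LLM weight matrix
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

W_reconstructed = U @ torch.diag(S) @ Vh
print((W - W_reconstructed).abs().max())  # ~0, up to floating-point error
```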

When applied to an LLM’s weight matrices, SVD yields a set of components that roughly represent the model’s different abilities, such as mathematics, language comprehension, or coding. In their experiments, the researchers found that these components can be tweaked to change the model’s capabilities on specific tasks.

To systematically use these findings, they developed a process called Singular Value Finetuning (SVF). At training time, SVF learns a set of vectors from the SVD components of the model. These vectors, called Z-vectors, are compact representations of individual abilities and can be used as controls to boost or dampen the model’s abilities at specific tasks.
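In spirit, SVF amounts to learning a per-component scaling of the singular values. The sketch below is a simplified interpretation rather than Sakana’s implementation; the variable names are placeholders.

```python
# Simplified SVF-style adaptation: a trainable z-vector rescales the
# singular values of a frozen weight matrix to boost or dampen abilities.
import torch

W = torch.randn(512, 512)                   # frozen pretrained weights
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

z = torch.nn.Parameter(torch.ones_like(S))  # one scale per singular component

def adapted_weight():
    # At z == 1 the original matrix is recovered; training updates only z.
    return U @ torch.diag(S * z) @ Vh
```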

At inference time, Transformer-squared uses a two-pass mechanism to adapt the LLM to unseen tasks. First, the request is examined to determine the skills required to address the problem (the researchers suggest three different techniques for determining the required skills). In the second pass, Transformer-squared applies the Z-vectors corresponding to the query and runs the prompt through the model with the updated weights. This allows the model to provide a tailored response to each request.
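Put as pseudocode, the two-pass flow might look like the sketch below; the helper methods (classify_task, apply_z_vector, reset_weights) and the z_vectors dictionary are hypothetical placeholders used only to show the control flow.

```python
# Hypothetical two-pass inference loop: pass 1 identifies the required skill,
# pass 2 applies the matching z-vector and generates the answer.
def transformer_squared_inference(prompt, model, z_vectors):
    skill = model.classify_task(prompt)     # pass 1: e.g. "math", "coding", "reasoning"

    model.apply_z_vector(z_vectors[skill])  # pass 2: adapt weights for this skill
    answer = model.generate(prompt)
    model.reset_weights()                   # restore the frozen base weights
    return answer
```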

Transformer-squared training and inference (Source: arXiv)

Transformer-squared in action

The researchers applied Transformer-squared to Llama-3 and Mistral LLMs and compared it with LoRA on various tasks, including mathematics, coding, reasoning, and visual question answering. Transformer-squared outperformed LoRA across the benchmarks while using fewer parameters. It is also notable that, unlike Transformer-squared, LoRA models cannot adjust their weights at inference time, making them less flexible.

Another interesting result is that knowledge gained from one model can be transferred to another. For example, Z-vectors obtained from Llama models could be applied to Mistral models. The results did not match those of Z-vectors created from scratch for the target model, and the transfer was possible because the two models have similar architectures. Still, the finding suggests the possibility of learning generalized Z-vectors that can be applied to a wide variety of models.

Transformer-squared (SVF in the table) vs. base models and LoRA (Source: arXiv)

“The path forward lies in creating models that dynamically adapt and collaborate with other systems, combining specialized capabilities to solve complex, cross-domain problems,” the researchers write. “Self-adaptive systems like Transformer² bridge the gap between static AI and living intelligence, paving the way for efficient, personalized and fully integrated AI tools that drive progress across industries and in our daily lives.”

Sakana AI has released the code for training the components of Transformer-squared on GitHub.

Inference-time tricks

As companies explore various LLM applications, there has been a noticeable shift toward the development of inference-time techniques over the past year. Transformer-squared is one of several approaches that allow developers to adapt LLMs to new tasks at inference time without having to retrain or fine-tune them.

Titans, an architecture developed by researchers at Google, approaches the problem from a different angle, giving language models the ability to learn and retain new information at inference time. Other techniques focus on enabling frontier LLMs to take advantage of their increasingly long context windows to learn new tasks without retraining.

Since organizations hold the data and knowledge specific to their applications, advances in inference-time adaptation techniques will make LLMs much more useful to them.


