Large language models (LLMs) can learn complex reasoning tasks without relying on large datasets, according to a new study by researchers at Shanghai Jiao Tong University. Their findings show that with just a small batch of well-curated examples, you can train an LLM for tasks that were assumed to require tens of thousands of training instances.
This efficiency is due to the inherent knowledge that modern LLMs acquire during the pre-training phase. With new training methods, companies may be able to create customized models without needing the resources of large AI labs.
Less is more (LIMO)
In their study, the researchers challenge the assumption that you need large amounts of data to train LLMs for reasoning tasks. They introduce the concept of "less is more" (LIMO). Their work builds on previous research showing that LLMs can be aligned with human preferences using just a few examples.

In their experiments, they showed that with just a few hundred training examples they could create a LIMO dataset for complex mathematical reasoning tasks. An LLM fine-tuned on the dataset was able to generate complex chain-of-thought (CoT) reasoning chains that let it accomplish the tasks at a very high success rate.
For example, a Qwen2.5-32B-Instruct model fine-tuned on 817 training examples chosen based on LIMO achieved 57.1% accuracy on the highly challenging AIME benchmark and 94.8% on MATH, surpassing models trained on a hundred times more examples. It also outperformed reasoning models such as QwQ-32B-Preview (a version of the Qwen model trained for reasoning) and OpenAI o1-preview, both of which were trained with far more data and compute resources.
Moreover, LIMO-trained models generalize to examples that differ drastically from their training data. For example, on the OlympiadBench scientific benchmark, the LIMO model outperformed QwQ-32B-Preview, and on the challenging GPQA benchmark it achieved 66.7% accuracy, close to OpenAI o1-preview's leading score of 73.3%.
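For teams that want to try this recipe, the training step itself is ordinary supervised fine-tuning on the curated examples. Below is a minimal sketch using Hugging Face TRL's SFTTrainer; this is not the authors' released training code, and the data file name, prompt format, and hyperparameters are illustrative assumptions:

```python
# Minimal SFT sketch (illustrative, not the paper's code). Expects a JSONL
# file with {"problem": ..., "solution": ...} records, where each solution
# is a detailed chain-of-thought.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="limo_examples.jsonl", split="train")

def to_text(example):
    # Concatenate the problem and its step-by-step solution into one
    # training string; SFTTrainer trains on the "text" field by default.
    return {"text": f"Problem: {example['problem']}\n\nSolution: {example['solution']}"}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # base model used in the study
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="limo-sft",
        num_train_epochs=3,               # assumption; paper settings may differ
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```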
What does it mean for enterprise AI?
Customizing LLMs is an attractive use case for enterprise applications. Thanks to techniques such as retrieval-augmented generation (RAG) and in-context learning, LLMs can be adapted to use custom data or perform new tasks without the need for fine-tuning.
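In-context learning, in particular, needs nothing more than a few worked examples placed directly in the prompt. A minimal sketch, using the OpenAI Python client purely as a stand-in for any chat-completion API (the model name and task are assumptions for illustration):

```python
# Few-shot in-context learning: steer the model with examples in the
# prompt instead of fine-tuning.
from openai import OpenAI

client = OpenAI()
few_shot_prompt = (
    "Classify each support ticket as 'billing' or 'technical'.\n\n"
    "Ticket: I was charged twice this month. -> billing\n"
    "Ticket: The app crashes when I export a report. -> technical\n"
    "Ticket: My invoice lists a plan I never ordered. ->"
)
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; substitute any chat model
    messages=[{"role": "user", "content": few_shot_prompt}],
)
print(response.choices[0].message.content)  # expected: "billing"
```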
However, reasoning tasks often still require training and fine-tuning LLMs. The widely held belief was that such tasks demand large volumes of training examples with highly detailed reasoning chains and solutions. Creating such datasets is slow and impractical for many applications and companies.
More recently, researchers have shown that pure reinforcement learning approaches can enable models to train themselves for reasoning tasks by generating many solutions and selecting the ones that work best. While this approach requires less manual effort, it still demands expensive compute resources that are beyond the reach of many companies.
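The selection step at the heart of these approaches can be summarized as best-of-n sampling: generate many candidate solutions and keep the ones a verifier accepts. A hedged sketch, where `generate` and `is_correct` are hypothetical stand-ins for a sampling call to the model and an automatic checker (e.g. comparison against a known final answer):

```python
from typing import Callable

def best_of_n(problem: str,
              generate: Callable[[str], str],
              is_correct: Callable[[str, str], bool],
              n: int = 16) -> list[str]:
    """Sample n candidate solutions and keep only those that verify."""
    candidates = [generate(problem) for _ in range(n)]
    return [c for c in candidates if is_correct(problem, c)]

# The surviving solutions can then serve as training targets for the next
# round. This is where the heavy compute cost comes from: every training
# problem requires n full generations.
```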
In contrast, creating a few hundred examples is an endeavor that many companies can tackle, bringing specialized reasoning models within reach of a wider range of organizations.
"This discovery has profound implications for artificial intelligence research: It suggests that even competition-level complex reasoning abilities can be effectively elicited through minimal but curated training samples," the researchers write.
Why LIMO works
In their experiments, the researchers identified two key reasons why LLMs can learn complex reasoning tasks from fewer examples.
First, state-of-the-art foundation models are trained on a very large amount of mathematical content and code during pre-training. This means these LLMs already possess rich reasoning knowledge that can be activated through carefully crafted examples.
Second, new post-training techniques have shown that allowing models to generate extended reasoning chains significantly improves their reasoning ability. In essence, giving the models more time to "think" lets them unpack and apply their pre-trained knowledge more effectively.
"We hypothesize that successful reasoning emerges from the synergy of these two factors: rich pre-trained knowledge and sufficient computational resources at inference time," the researchers write. "These developments collectively suggest a striking possibility: if models possess rich reasoning knowledge and are given adequate computational space, then activating their reasoning capabilities may require only a small number of high-quality training samples, rather than massive fine-tuning datasets."
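In practice, "giving the model room to think" amounts to prompting for an explicit reasoning chain and allowing a generous generation budget. A minimal sketch with Hugging Face transformers; the model name and token budget are illustrative, and loading a 32B model requires substantial GPU memory:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-32B-Instruct")
prompt = ("Solve the problem step by step, showing all reasoning before "
          "stating the final answer.\nIf 3x + 7 = 22, what is x?")
result = generator(prompt, max_new_tokens=2048)  # room for a long reasoning chain
print(result[0]["generated_text"])
```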

According to the researchers' findings, creating useful LIMO datasets comes down to choosing the right problems and solutions. Data curators should prioritize challenging problems that require complex reasoning chains, diverse thought processes, and knowledge integration. The problems should also deviate from the model's training distribution to encourage new reasoning approaches and force it to generalize.
Accordingly, solutions should be clear and well organized, with reasoning steps adapted to the complexity of the problem. High-quality solutions should also provide strategic educational support, gradually building understanding through carefully structured explanations.
"By focusing on a minimal yet meticulously curated set of reasoning chains, we embody the core principle of LIMO: High-quality demonstrations, rather than sheer data volume, are key to unlocking complex reasoning capabilities," the researchers write.
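Translated into code, the curation criteria above might look like a simple filter over candidate examples. This is a hypothetical illustration, not the authors' actual pipeline; the field names and thresholds are assumptions:

```python
# Keep hard problems whose solutions show long, explicitly structured
# reasoning (thresholds are illustrative).
def keep_example(example: dict) -> bool:
    steps = [s for s in example["solution"].split("\n") if s.strip()]
    hard_enough = example.get("difficulty", 0) >= 8      # e.g. competition-level
    detailed = len(steps) >= 10                          # multi-step reasoning chain
    structured = any(s.lower().startswith(("step", "first", "next"))
                     for s in steps)                     # explicit step markers
    return hard_enough and detailed and structured

# Usage: curated = [ex for ex in candidate_pool if keep_example(ex)]
```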
The researchers have released the code and data used to train the LIMO models in their experiments. In the future, they plan to extend the concept to other domains and applications.