Qwen Team, a division of Chinese e-commerce giant Alibaba responsible for its growing family of open-source Qwen large language models (LLMs), has introduced QwQ-32B, a new 32-billion-parameter reasoning model designed to improve performance on complex problem-solving tasks through reinforcement learning (RL).
The model is available as open weights on Hugging Face and on ModelScope under an Apache 2.0 license. This means it can be used for commercial and research purposes, so enterprises can employ it immediately to power their products and applications (even ones they charge customers to use).
Individual users can also access it via Qwen Chat.
QwQ was Alibaba's answer to OpenAI's original reasoning model, o1
QwQ, short for Qwen-with-Questions, was first introduced by Alibaba in November 2024 as an open-source reasoning model aimed at competing with OpenAI's o1-preview.
At launch, the model was designed to improve logical reasoning and planning by reviewing and refining its own answers during inference, a technique that made it particularly effective for math and coding tasks.
The initial version of QwQ featured 32 billion parameters and a 32,000-token context length, with Alibaba highlighting its ability to outperform o1-preview on mathematical benchmarks like AIME and MATH, as well as scientific reasoning tasks like GPQA.
Despite its strengths, QwQ's early iterations struggled with programming benchmarks like LiveCodeBench, where OpenAI's models maintained an edge. Additionally, like many emerging reasoning models, QwQ faced challenges such as language mixing and occasional circular reasoning loops.
However, Alibaba's decision to release the model under an Apache 2.0 license ensured that developers and enterprises could freely adapt and commercialize it, distinguishing it from proprietary alternatives such as OpenAI's o1.
Since QwQ's initial release, the AI landscape has evolved rapidly.
This shift has fueled interest in large reasoning models (LRMs), a new category of AI systems that use inference-time reasoning and self-reflection to improve accuracy. These include OpenAI's o3 series and the massively successful DeepSeek-R1 from rival Chinese lab DeepSeek, an offshoot of Hong Kong quantitative analysis firm High-Flyer Capital Management.
A new report from web traffic analytics and research firm Similarweb found that, since the launch of R1 in January 2025, DeepSeek has rocketed up the charts to become the most visited AI model-providing website behind OpenAI.
QwQ-32B, Alibaba's latest iteration, builds on these advances by integrating RL and structured self-questioning, positioning it as a serious competitor in the growing field of reasoning-focused AI.
Scaling up performance with multi-stage reinforcement learning
Traditional instruction-tuned models often struggle with difficult reasoning tasks, but the Qwen Team's research suggests that RL can significantly improve a model's ability to solve complex problems.
QwQ-32B builds on this idea by implementing a multi-stage RL training approach to enhance mathematical reasoning, coding proficiency, and general problem-solving.
The model has been benchmarked against leading alternatives such as DeepSeek-R1, o1-mini, and DeepSeek-R1-Distill-Qwen-32B, demonstrating competitive results despite having fewer parameters than some of these models.
For example, while DeepSeek-R1 operates with 671 billion parameters (with 37 billion activated), QwQ-32B achieves comparable performance with a much smaller footprint: it typically requires 24 GB of VRAM on a GPU (Nvidia's H100s have 80 GB), compared with more than 1,500 GB of VRAM to run the full DeepSeek-R1 (16 Nvidia A100 GPUs), highlighting the efficiency of Qwen's RL approach.
QwQ-32B follows a causal language model architecture and includes several optimizations:
- 64 transformer layers with RoPE, SwiGLU, RMSNorm, and attention QKV bias;
- Grouped-query attention (GQA) with 40 attention heads for queries and 8 for key-value pairs;
- Extended context length of 131,072 tokens, allowing better handling of long-sequence inputs;
- Multi-stage training, including pretraining, supervised fine-tuning, and RL.
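For teams that want to try the open weights directly, a minimal Python sketch of loading and querying the model through the Hugging Face transformers library might look like the one below. The "Qwen/QwQ-32B" repository ID and sampling values reflect the public release, but treat the details as assumptions and check the official model card before use.

```python
# Minimal sketch: loading QwQ-32B from Hugging Face.
# Assumes the "Qwen/QwQ-32B" repo ID and sufficient GPU memory;
# verify details against the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # spread layers across available GPUs
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```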
The RL process for QwQ-32B was carried out in two phases:
- Math and coding focus: The model was trained using an accuracy verifier for mathematical reasoning and a code execution server for coding tasks. This approach ensured that generated answers were validated for correctness before being reinforced.
- General capability enhancement: In a second phase, the model received reward-based training using general reward models and rule-based verifiers. This phase improved instruction following, alignment with human preferences, and agent reasoning without compromising its math and coding capabilities.
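Alibaba has not published the verifiers themselves, but a toy Python sketch of the outcome-based rewards described above could look like the following. The answer-extraction and test-execution helpers here are illustrative assumptions, not the Qwen Team's actual accuracy verifier or code execution server.

```python
# Toy sketch of outcome-based RL rewards, as described above.
# The extraction and pytest logic are illustrative stand-ins, not
# the Qwen Team's actual verifier or code execution server.
import subprocess

def math_reward(model_output: str, ground_truth: str) -> float:
    """Accuracy verifier: reward 1.0 only when the final boxed
    answer matches the ground truth."""
    answer = model_output.rsplit("\\boxed{", 1)[-1].split("}", 1)[0].strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

def code_reward(generated_code: str, test_file: str) -> float:
    """Code-execution check: reward 1.0 only when the generated
    code passes a predefined pytest suite."""
    with open("candidate.py", "w") as f:
        f.write(generated_code)
    result = subprocess.run(["pytest", test_file, "-q"],
                            capture_output=True, timeout=60)
    return 1.0 if result.returncode == 0 else 0.0

# Only answers that verify as correct receive a positive reward,
# which is then used to update the policy during RL training.
```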
What it means for enterprise decision-makers
For enterprise leaders, including CEOs, CTOs, IT directors, team managers, and AI application developers, QwQ-32B represents a potential shift in how AI can support business decision-making and technical innovation.
With its RL-driven reasoning capabilities, the model can deliver more accurate, structured, and context-aware insights, making it valuable for use cases such as automated data analysis, strategic planning, software development, and intelligent automation.
Companies looking to deploy AI solutions for complex problem-solving, coding assistance, financial modeling, or customer service automation may find QwQ-32B's efficiency an attractive option. Additionally, its open-weight availability allows organizations to fine-tune and customize the model for domain-specific applications without proprietary restrictions, making it a flexible choice for enterprise AI strategies.
The fact that it comes from a Chinese e-commerce giant may raise security and bias concerns for some non-Chinese users, especially when using the Qwen Chat interface. But as with DeepSeek-R1, the model's availability for download, offline use, and fine-tuning or retraining suggests these concerns can be fairly easily overcome. And it is a viable alternative to DeepSeek-R1.
Early reactions from AI power users and influencers
The release of QwQ-32B has already attracted attention from the AI research and development community, with several developers and industry professionals sharing their initial impressions on X (formerly Twitter):
- Hugging Face's Vaibhav Srivastav (@reach_vb) highlighted QwQ-32B's inference speed running on provider Hyperbolic Labs, calling it blazing fast and comparable to top-tier models. He also noted that the model "beats DeepSeek-R1 and OpenAI o1-mini with Apache 2.0 license."
- AI news and rumor publisher Chubby (@kimmonismus) was impressed by the model's performance, emphasizing that QwQ-32B sometimes outperforms DeepSeek-R1 despite being 20 times smaller. "Holy moly! Qwen cooked!" the account wrote.
- Yuchen Jin (@Yuchenj_UW), co-founder and CTO of Hyperbolic Labs, celebrated the release by noting the efficiency gains: "Small models are so powerful! Alibaba Qwen released QwQ-32B, a reasoning model that beats DeepSeek-R1 (671B) and OpenAI o1-mini!"
- Another Hugging Face team member, Erik KaunismƤki (@ErikKaum), emphasized the ease of deployment, noting that the model is available for one-click deployment on Hugging Face endpoints, making it accessible to developers without extensive setup.
Agentic capabilities
QwQ-32B incorporates agentic capabilities, allowing it to dynamically adjust its reasoning process based on environmental feedback.
For optimal performance, the Qwen Team recommends the following inference settings:
- Temperature: 0.6
- TopP: 0.95
- TopK: between 20 and 40
- YaRN scaling: recommended for handling sequences longer than 32,768 tokens
The model supports deployment via vLLM, a high-throughput inference framework. However, current vLLM implementations only support static YaRN scaling, which maintains a fixed scaling factor regardless of input length.
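As a rough illustration, a minimal vLLM sketch that applies the recommended sampling settings might look like the following. The model ID and the YaRN override shown in the comment follow Qwen's published guidance, but both are assumptions to verify against the current vLLM documentation.

```python
# Minimal sketch: serving QwQ-32B with vLLM using the recommended
# sampling settings. Model ID and YaRN values are assumptions taken
# from Qwen's published guidance; verify before use.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/QwQ-32B",
    max_model_len=32768,
    # For inputs beyond 32,768 tokens, Qwen's guidance is to enable
    # static YaRN scaling, e.g. via a rope_scaling override such as:
    #   rope_scaling={"rope_type": "yarn", "factor": 4.0,
    #                 "original_max_position_embeddings": 32768}
)

sampling = SamplingParams(
    temperature=0.6,  # recommended: 0.6
    top_p=0.95,       # recommended: 0.95
    top_k=40,         # recommended: between 20 and 40
    max_tokens=4096,
)
outputs = llm.generate(["How many primes are there below 50?"], sampling)
print(outputs[0].outputs[0].text)
```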
Future developments
The Qwen Team sees QwQ-32B as the first step in scaling RL to enhance reasoning capabilities. Looking ahead, the team plans to:
- Further explore scaling RL to improve model intelligence;
- Integrate agents with RL for long-horizon reasoning;
- Continue developing foundation models optimized for RL;
- Move toward artificial general intelligence (AGI) through more advanced training techniques.
With QwQ-32B, the Qwen Team is positioning RL as a key driver of the next generation of AI models, demonstrating that scaling can produce high-performing and effective reasoning systems.