Microsoft’s new rStar-Math technique lets small models outperform OpenAI’s o1-preview on math problems

Microsoft is doubling down on the potential of small language models (SLMs) with the unveiling of rStar-Math, a new reasoning technique that can be applied to small models to boost their performance on math problems, achieving results similar to, and in some cases better than, those of OpenAI’s o1-preview model.

Still in the research phase, as described in a paper published on the preprint site arXiv.org and credited to eight authors from Microsoft, Peking University, and Tsinghua University in China, the technique has been applied to several different smaller open-source models, including Microsoft’s own Phi-3 mini, Alibaba’s Qwen-1.5B (a 1.5-billion-parameter model), and Qwen-7B (a 7-billion-parameter model). It improved performance on all of them, and even outperformed OpenAI’s most advanced model to date on MATH, a third-party word-problem benchmark of 12,500 questions covering areas such as geometry and algebra across all difficulty levels.

Ultimately, according to a post on Hugging Face, the researchers plan to make their code and data available on GitHub at https://github.com/microsoft/rStar, although one of the paper’s authors, Li Lyna Zhang, wrote in the comments on the Hugging Face post that the team is “still going through the internal review process for the open-source release.” As a result, “the repository remains private for now. Please stay tuned!”

Community members were enthusiastic, calling the innovations “impressive” and praising the combination of Monte Carlo Tree Search (MCTS) and step-by-step reasoning. One commenter highlighted the simplicity and usefulness of using Q-values for step evaluation, while others speculated about future applications in geometric proofs and symbolic reasoning.

This news comes hot on the heels of the open-source release of Microsoft’s Phi-4 model, a smaller 14-billion-parameter AI system now available on Hugging Face under the permissive MIT license.

While the Phi-4 release expanded access to capable small models, rStar-Math demonstrates a specialized approach: using smaller AI systems to achieve state-of-the-art results in mathematical reasoning.

rStar-Math uses several different models and components to help a small target model “self-evolve”

The key to rStar-Math is that it uses Monte Carlo Tree Search (MCTS), a method that mimics human deep thinking by iteratively refining step-by-step solutions to mathematical problems.

The researchers used MCTS because it “breaks down complex mathematical problems into simpler one-step generation tasks, reducing the difficulty of smaller models.”
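To make the idea concrete, here is a minimal, generic MCTS sketch in Python. This is not the paper’s implementation: `propose_steps` stands in for a policy model proposing candidate next steps, `score` stands in for the verification and scoring rStar-Math performs, and the demo problem at the bottom is a toy stand-in for a math word problem. All names are illustrative.

```python
import math
import random

class Node:
    """One node in the search tree: a partial step-by-step solution."""
    def __init__(self, steps, parent=None):
        self.steps = steps        # solution steps taken so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0      # accumulated rewards; Q = value_sum / visits

def ucb(child, parent_visits, c=1.4):
    """Upper confidence bound: balances good steps vs. unexplored ones."""
    if child.visits == 0:
        return float("inf")
    q = child.value_sum / child.visits
    return q + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root, propose_steps, score, n_iters=200, max_depth=8):
    """Generic MCTS loop: select, expand, evaluate, backpropagate."""
    for _ in range(n_iters):
        # 1. Selection: descend via UCB until we reach a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: ask the step generator for candidate next steps.
        if len(node.steps) < max_depth:
            for step in propose_steps(node.steps):
                node.children.append(Node(node.steps + [step], parent=node))
            if node.children:
                node = random.choice(node.children)
        # 3. Evaluation: score the (partial) solution.
        reward = score(node.steps)
        # 4. Backpropagation: push the reward back up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # The most-visited first step is the usual MCTS choice.
    return max(root.children, key=lambda ch: ch.visits)

# Toy demo: find step sequences of 1s, 2s and 3s that sum to 10.
best = mcts(
    Node(steps=[]),
    propose_steps=lambda steps: [1, 2, 3],                 # stand-in for the policy SLM
    score=lambda steps: 1.0 if sum(steps) == 10 else 0.0,  # stand-in for answer checking
)
print(best.steps)
```

The key property the article describes carries over: the search never has to produce a whole solution at once, only one plausible next step at a time, which is an easier task for a small model.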

However, they did not simply apply MCTS as other researchers have done. Instead, in a stroke of genius, they also ask the model they trained to always output its chain-of-thought reasoning steps as both natural-language descriptions and Python code.

They mandated that the model embed its natural-language reasoning as comments within the Python code, and only outputs whose Python code executed successfully were used to train the model.
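As an illustration of what such a code-augmented trace might look like (the problem and solution below are invented for this example, not taken from the paper’s data):

```python
# Hypothetical code-augmented chain-of-thought sample: each reasoning
# step is a natural-language comment paired with executable Python,
# so the whole trace can be checked by running it.

# Problem: a rectangle is 3 cm longer than it is wide, and its
# perimeter is 26 cm. What is its area?

# Step 1: let w be the width; then the length is w + 3.
# Step 2: the perimeter gives 2 * (w + (w + 3)) = 26, so 4w + 6 = 26.
w = (26 - 6) / 4
# Step 3: the length is 3 cm more than the width.
length = w + 3
# Step 4: the area is width times length.
area = w * length
print(area)  # 40.0; only traces whose executed result matches the
             # reference answer would be kept as training data
```

Because every step is runnable, a wrong intermediate step tends to surface as a wrong final output, giving the training pipeline a cheap filter for flawed reasoning.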

The researchers also trained a “policy model” to generate the mathematical reasoning steps and a process preference model (PPM) to select the most promising steps toward solving the problems, then improved both over four rounds of “self-evolution,” with each model refining the other.

For their initial data, the researchers said they used “747,000 math word problems from publicly available sources,” along with their solutions, but rather than relying on those existing solutions, they used the two models described above to generate new step-by-step solutions to them.
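A condensed sketch of what that alternating loop might look like is below. Every name here is an illustrative placeholder rather than the paper’s code; the expensive pieces (MCTS trace generation and model training) are injected as callables.

```python
def self_evolve(problems, generate_traces, train_policy, train_ppm,
                policy_model, ppm, rounds=4):
    """Sketch of the alternating 'self-evolution' loop described above.

    generate_traces: runs MCTS over the problems, guided by the current
        policy model and scored by the PPM, returning candidate solutions.
    train_policy / train_ppm: retrain each model on the verified traces.
    """
    for _ in range(rounds):
        traces = generate_traces(problems, policy_model, ppm)
        # Keep only traces whose embedded Python executed to a correct answer.
        verified = [t for t in traces if t["verified"]]
        # The policy model learns from the better solutions, while the PPM
        # learns step-level preferences (e.g., derived from MCTS Q-values),
        # so in each round, each model improves the other.
        policy_model = train_policy(policy_model, verified)
        ppm = train_ppm(ppm, verified)
    return policy_model, ppm
```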

Record-breaking results

After four rounds of self-evolution, rStar-Math achieved key milestones:

• On the MATH benchmark, the accuracy of the Qwen2.5-Math-7B model jumped from 58.8% to 90.0%, surpassing OpenAI’s o1-preview.

• On the American Invitational Mathematics Examination (AIME), it solved 53.3% of problems, placing it among the top 20% of high school competitors.

These results highlight the power of SLMs to handle complex mathematical reasoning, a domain traditionally dominated by larger systems.

Smaller is better?

In recent years, AI innovation has largely been driven by scaling up language models, with more parameters seen as the path to better performance. But the steep costs associated with these massive models, from compute resources to energy consumption, have raised questions about the sustainability of that approach.

Microsoft is charting an alternative path focused on efficiency. The release of rStar-Math further underscores this commitment by demonstrating how SLMs can match, and in some cases even exceed, the capabilities of their larger counterparts.

The twin releases of Phi-4 and Microsoft’s rStar-Math paper suggest that compact, specialized models can provide powerful alternatives to the industry’s largest systems.

Additionally, by outperforming larger competitors on key benchmarks, these models challenge the notion that bigger is always better. They open cutting-edge capabilities to mid-sized organizations and academic researchers without the financial or environmental burden of massive models.


