Do AI reasoning models require new approaches to prompting?




The era of reasoning AI is in full swing.

After OpenAI once again shook up the AI landscape with the launch of its o1 reasoning model back in September 2024 – a model that takes longer to answer questions but pays off with higher performance, especially on complex, multi-step problems in math and science – the commercial AI space has been flooded with imitators and competitors.

There's DeepSeek's R1, Google's Gemini 2.0 Flash Thinking, and, just today, LlamaV-o1, all of which aim to offer built-in "reasoning" similar to OpenAI's new o1 and upcoming o3 model families. These models rely on chain-of-thought (CoT) prompting – or "self-prompting" – which forces them to reflect on their analysis midstream, double back, check their own work, and ultimately arrive at a better answer than firing off a response from their embeddings as fast as possible, the way other large language models (LLMs) do.

However, the high price of o1 and o1-mini ($15.00 per 1M input tokens vs. $1.25 per 1M input tokens for GPT-4o on OpenAI's API) has caused some to balk at the supposed performance gains. Is it really worth paying 12 times as much as a typical, state-of-the-art LLM?

As it turns out, there are a growing number of converts – but the key to unlocking the true value of reasoning models may lie in prompting them differently.

Over the weekend, Shawn Wang (founder of the AI newsletter Latent Space) featured on his Substack a guest post from Ben Hylak, a former Apple interface designer for visionOS (the software that powers the Vision Pro spatial computing headset). The post went viral because it convincingly explains how Hylak got OpenAI's o1 model to produce incredibly valuable outputs (for him).

In short, instead of writing prompts for the o1 model the way they would for a chat model, users should think in terms of writing "briefs": longer explanations that provide lots of context up front about what the user wants the model to output, who the user is, and what format the model should use for the answer.

As Hylak writes on Substack:

With most models, we've been trained to tell the model how we want it to answer us. Example: "You are an expert software engineer. Think slowly and carefully."

This is the opposite of how I've found success with o1. I don't instruct it on the how – only the what. Then I let o1 take over, planning and resolving its own steps. This is what the autonomous reasoning is for, and it can actually be much faster than manually reviewing and chatting as the "human in the loop."
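This brief-style approach can be sketched as a simple prompt template. The section headings and helper function below are my own illustration of the idea, not code from Hylak's post – the point is stating the goal, context, and desired output format while leaving the "how" entirely to the model:

```python
def build_brief(goal: str, context: str, output_format: str) -> str:
    """Assemble an o1-style 'brief': heavy on up-front context,
    with no step-by-step instructions on *how* to reason."""
    return (
        f"Goal:\n{goal}\n\n"
        f"Context (who I am, what I've tried):\n{context}\n\n"
        f"Output format:\n{output_format}\n"
    )

# Hypothetical example inputs, loosely inspired by Hylak's hiking prompt:
brief = build_brief(
    goal="Recommend day hikes near San Francisco for this weekend.",
    context=(
        "I'm training for a half marathon, prefer ocean views, "
        "and have already done Lands End and the Dipsea Trail."
    ),
    output_format="A ranked list with distance, elevation gain, and trailhead.",
)
print(brief)
```

Notice what the template deliberately omits: no persona ("you are an expert guide"), no "think step by step" – the reasoning model is trusted to plan its own path to the answer.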

Hylak also includes a great annotated screenshot of an example prompt for o1 that produced a useful result for him – a list of hikes.

The blog post was so helpful that Greg Brockman, president and co-founder of OpenAI, re-shared it on his X account with the message: "o1 is a different kind of model. To achieve great performance, it needs to be used in a new way compared to standard chat models."

I tried it myself in my recurring quest to become fluent in Spanish, and here was the result, for the curious. Maybe not as impressive as Hylak's well-constructed prompt and response, but it definitely shows strong potential.

Even with non-reasoning LLMs like Claude 3.5 Sonnet, there may be room for regular users to improve their prompting to get better, less constrained results.

As Louis Arge, former Teton.ai engineer and current creator of the neuromodulation device openFUS, wrote on X: "One trick I've discovered is that LLMs trust their own messages more than my prompts." He provided an example of how he convinced Claude to be "less of a coward" by first "triggering a fight" with the model through its own outputs.

All of this shows that prompt engineering remains a valuable skill as the AI era continues.


