GPT-4.5 for enterprise: Do its accuracy and knowledge justify the cost?


The release of OpenAI's GPT-4.5 was somewhat disappointing, with many pointing to its insane price (about 10 to 20 times more expensive than Claude 3.7 Sonnet and 15 to 30 times more expensive than GPT-4o).

However, given that this is OpenAI's largest and most powerful non-reasoning model, it is worth considering its strengths and the areas where it shines.

Better knowledge and alignment

There is little detail about the model's architecture or training, but we have a rough estimate that it was trained with 10 times more compute. The model was also so large that OpenAI had to spread training across multiple data centers to finish in a reasonable time.

Larger models have a greater capacity for learning world knowledge and the nuances of human language (provided they have access to high-quality training data). This shows up in some of the metrics presented by the OpenAI team. For example, GPT-4.5 has a record-high score on PersonQA, a benchmark that evaluates hallucinations in AI models.

Practical experiments also show that GPT-4.5 is better than other general-purpose models at staying true to the facts and following user instructions.

Users have pointed out that GPT-4.5's responses feel more natural and context-aware than those of previous models. Its ability to follow tone and style guidelines has also improved.

After the release of GPT-4.5, AI scientist and OpenAI co-founder Andrej Karpathy, who had early access to the model, said he “expect[ed] to see an improvement in tasks that are not reasoning heavy, and I would say those are tasks that are more EQ (as opposed to IQ) related and bottlenecked by e.g. world knowledge, creativity, analogy making, general understanding, humor, etc.”

However, evaluating writing quality is also very subjective. In a survey that Karpathy ran on different prompts, most people preferred GPT-4o's responses to those of GPT-4.5. He wrote on X: “Either the high-taste testers are noticing the new and unique structure but the low-taste ones are overwhelming the poll. Or we're just hallucinating things. Or these examples are just not that great. Or it's actually pretty close and this is way too small sample size. Or all of the above.”

Better document processing

In its experiments, Box, which has integrated GPT-4.5 into its Box AI Studio product, wrote that GPT-4.5 is “particularly potent for enterprise use cases, where accuracy and integrity are mission critical … our testing shows that GPT-4.5 is one of the best models available, both in terms of our eval scores and also its ability to handle many of the hardest AI questions that we have come across.”

In its internal evaluations, Box found that GPT-4.5 is more accurate at answering questions about enterprise documents, outperforming the original GPT-4 by about 4 percentage points on its test set.

Source: Box

Box's tests also showed that GPT-4.5 excelled at math questions embedded in business documents, which older GPT models had often struggled with. For example, it was better at answering questions about financial documents that required reasoning over data and performing calculations.

GPT-4.5 also showed improved performance at extracting information from unstructured data. In a test that involved extracting fields from hundreds of legal documents, GPT-4.5 was 19% more accurate than GPT-4o.

Planning, coding, evaluating results

Given its improved world knowledge, GPT-4.5 can also be a suitable model for creating high-level plans for complex tasks. The broken-down steps can then be handed over to smaller but more efficient models to elaborate and execute.

According to Constellation Research, “In initial testing, GPT-4.5 seems to show strong capabilities in agentic planning and execution, including multi-step coding workflows and complex task automation.”
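The plan-then-delegate pattern described above can be sketched roughly as follows. This is a minimal illustration, not a vendor API: a "model" is assumed to be any function from prompt text to response text, and the names plan_and_execute, fake_planner and fake_executor are hypothetical stand-ins for a large planning model and a smaller execution model.

```python
from typing import Callable, List

# Assumption: a "model" is any function from prompt text to response text.
# Real API clients are deliberately left out; plug in your own.
Model = Callable[[str], str]


def plan_and_execute(task: str, planner: Model, executor: Model) -> List[str]:
    """Ask the large model for a plan, then run each step on the small model."""
    plan_text = planner(f"Break this task into short numbered steps:\n{task}")
    # Treat every non-empty line of the planner's response as one step.
    steps = [line.strip() for line in plan_text.splitlines() if line.strip()]
    results = []
    for step in steps:
        results.append(executor(f"Task: {task}\nCarry out this step: {step}"))
    return results


# Hypothetical stub models so the sketch runs without network access.
def fake_planner(prompt: str) -> str:
    return "1. Collect the invoices\n2. Sum the totals\n"


def fake_executor(prompt: str) -> str:
    # Echo the last line of the prompt to show what was delegated.
    return "done: " + prompt.splitlines()[-1]


outputs = plan_and_execute("Reconcile Q1 invoices", fake_planner, fake_executor)
for line in outputs:
    print(line)
```

The expensive model is called once per task, while the cheaper model is called once per step, which is where most of the token volume tends to go.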

GPT-4.5 can also be useful for coding tasks that require internal and contextual knowledge. GitHub now offers limited access to the model in its Copilot coding assistant and notes that GPT-4.5 “performs effectively with creative prompts and provides reliable responses to obscure knowledge queries.”

Given its deeper world knowledge, GPT-4.5 is also suitable for “LLM-as-a-Judge” tasks, in which a strong model evaluates the output of smaller models. For example, a model such as GPT-4o or o3 can generate one or several responses, reason over the solution and pass the final answer to GPT-4.5 for revision and refinement.
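One simple form of the judge setup described above is to have the strong model pick the best of several candidate answers. The sketch below assumes, again, that a "model" is just a function from prompt to text; judge_best and fake_judge are hypothetical names, and the prompt wording is only an illustration.

```python
from typing import Callable, List

# Assumption: a "model" is any function from prompt text to response text.
Model = Callable[[str], str]


def judge_best(question: str, candidates: List[str], judge: Model) -> str:
    """Show numbered candidates to the judge and return the one it picks."""
    listing = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    prompt = (
        f"Question: {question}\n"
        f"Candidate answers:\n{listing}\n"
        "Reply with only the number of the best answer."
    )
    reply = judge(prompt).strip()
    # Fall back to the first candidate if the reply is not a valid index.
    if reply.isdecimal() and int(reply) < len(candidates):
        return candidates[int(reply)]
    return candidates[0]


# Hypothetical stub judge that always prefers candidate 1.
def fake_judge(prompt: str) -> str:
    return "1"


candidates = ["Revenue rose 4%.", "Revenue rose 4 percentage points."]
best = judge_best("How did revenue change?", candidates, fake_judge)
print(best)
```

Because the judge only reads the candidates and emits a short verdict, its output token count stays small even when the candidate answers are long.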

Is it worth the price?

Given the huge costs of GPT-4.5, it is very hard to justify for many use cases. But that doesn't mean it will stay that way. One of the constant trends we have seen in recent years is the plummeting cost of inference, and if this trend applies to GPT-4.5, it is worth experimenting with it and finding ways to put its power to use in enterprise applications.
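To get a feel for what such a price gap means for a concrete workload, a back-of-the-envelope calculation helps. In the sketch below, the per-million-token prices are placeholder assumptions, chosen only so that the input and output ratios are 30x and 15x, in line with the range quoted at the top of this article; they are not taken from any price list, and the Pricing and workload_cost names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    input_per_million: float   # USD per 1M input tokens (assumed figure)
    output_per_million: float  # USD per 1M output tokens (assumed figure)


def workload_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a workload with the given token counts."""
    return (input_tokens * p.input_per_million
            + output_tokens * p.output_per_million) / 1_000_000


# Placeholder prices: 30x apart on input, 15x apart on output.
large = Pricing(input_per_million=75.0, output_per_million=150.0)
small = Pricing(input_per_million=2.5, output_per_million=10.0)

# Example workload: 10M input tokens and 1M output tokens.
large_cost = workload_cost(large, 10_000_000, 1_000_000)
small_cost = workload_cost(small, 10_000_000, 1_000_000)
ratio = large_cost / small_cost

print(f"large: ${large_cost:.2f}, small: ${small_cost:.2f}, ratio: {ratio:.1f}x")
```

The effective multiplier depends on the mix of input and output tokens: input-heavy workloads such as document question answering sit closer to the upper end of the range, while output-heavy ones sit closer to the lower end.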

It is also worth noting that this new model can become the basis for future reasoning models. Per Karpathy: “Keep in mind that GPT4.5 was only trained with pretraining, supervised finetuning and RLHF [reinforcement learning from human feedback], so this is not yet a reasoning model. Therefore, this model release does not push forward model capability in cases where reasoning is critical (math, code, etc.)... Presumably, OpenAI will now be looking to further train with reinforcement learning on top of the GPT-4.5 model to allow it to think and push model capability in these domains.”


