Hugging Face shrinks AI vision models to smartphone-friendly size, lowering computing costs



Hugging Face has achieved a remarkable breakthrough in AI with the introduction of vision-language models that run on devices as small as smartphones while outperforming their predecessors, which required massive data centers.

The company's new SmolVLM-256M model, which requires less than one gigabyte of GPU memory, outperforms its Idefics 80B model from just 17 months ago, a system 300 times larger. This dramatic reduction in size and improvement in performance marks a turning point for practical AI deployment.

"When we released Idefics 80B in August 2023, we were the first company to open-source a video language model," said Andrés Marafioti, a machine learning research engineer at Hugging Face, in an exclusive interview with VentureBeat. "By achieving a 300-fold size reduction, SmolVLM marks a breakthrough in vision-language models."

A performance comparison of Hugging Face's new SmolVLM models shows that the smaller versions (256M and 500M) outperform their 80-billion-parameter predecessor on key visual tasks. (Source: Hugging Face)

Smaller AI models that run on everyday devices

The advance comes at a crucial moment for companies grappling with the astronomical computing costs of deploying AI systems. The new SmolVLM models, available in 256M and 500M parameter sizes, process images and understand visual content at speeds previously unattainable in their size class.

The smallest version processes 16 examples per second while using less than 15GB of RAM at a batch size of 64, making it particularly attractive for companies that need to process large volumes of visual data. "For a mid-sized company processing 1 million images per month, this translates into substantial annual savings in compute costs," Marafioti told VentureBeat. "The reduced memory footprint means businesses can deploy on cheaper cloud instances, lowering infrastructure costs."
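The savings Marafioti describes follow from simple throughput arithmetic. A back-of-envelope sketch in Python, where the 16 images per second comes from the figures above but the hourly GPU price is a hypothetical placeholder, not a number from the article:

```python
# Rough estimate of monthly GPU time needed to process 1 million images
# at the quoted throughput of 16 images/second (SmolVLM-256M, batch size 64).
# The hourly instance price is an illustrative assumption.

IMAGES_PER_MONTH = 1_000_000
THROUGHPUT_PER_SEC = 16        # quoted figure from the article
PRICE_PER_GPU_HOUR = 1.20      # hypothetical cloud price in USD

seconds_needed = IMAGES_PER_MONTH / THROUGHPUT_PER_SEC
gpu_hours = seconds_needed / 3600
monthly_cost = gpu_hours * PRICE_PER_GPU_HOUR

print(f"GPU hours per month: {gpu_hours:.1f}")
print(f"Estimated monthly cost: ${monthly_cost:.2f}")
```

At these assumed rates, a month's workload fits in under a day of single-GPU time; a model requiring ten times the compute per image would scale the bill accordingly.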

The development has already attracted the attention of major technology companies. IBM has partnered with Hugging Face to integrate the 256M model into Docling, its document processing software. "While IBM certainly has access to substantial compute resources, smaller models like these let them process millions of documents efficiently at a fraction of the cost," said Marafioti.

Processing speeds of SmolVLM models across different batch sizes, showing that the smaller 256M and 500M variants significantly outperform the 2.2B version on both A100 and L4 graphics cards. (Source: Hugging Face)

How Hugging Face reduced the model size without impairing performance

The efficiency gains come from technical innovations in both the image-processing and language components. The team switched from a 400-million-parameter vision encoder to a 93-million-parameter version and implemented more aggressive token compression techniques. These changes maintain high performance while significantly reducing computational requirements.
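The article does not spell out the compression mechanism, but a common way to shrink visual token counts in vision-language models is pixel shuffle (space-to-depth), which folds each r×r neighborhood of patch tokens into a single, wider token. A minimal NumPy sketch of the general idea; the function name and shapes here are illustrative, not SmolVLM's actual configuration:

```python
import numpy as np

def pixel_shuffle_compress(tokens: np.ndarray, grid: int, ratio: int) -> np.ndarray:
    """Reduce a (grid*grid, dim) sequence of visual tokens by ratio**2,
    concatenating each ratio x ratio neighborhood along the feature axis."""
    n, dim = tokens.shape
    assert n == grid * grid and grid % ratio == 0
    x = tokens.reshape(grid, grid, dim)
    # Group rows and columns into ratio-sized blocks...
    x = x.reshape(grid // ratio, ratio, grid // ratio, ratio, dim)
    # ...bring the two block indices together, then flatten each block.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape((grid // ratio) ** 2, ratio * ratio * dim)

# 16x16 = 256 visual tokens of width 64 become 16 tokens of width 1024:
tokens = np.random.randn(256, 64)
compressed = pixel_shuffle_compress(tokens, grid=16, ratio=4)
print(compressed.shape)  # (16, 1024)
```

The language model then attends over 16 visual tokens instead of 256, which is where much of the speedup on image-heavy inputs would come from.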

These developments could be transformative for startups and smaller companies. “Startups can now launch demanding computer vision products within weeks instead of months, and at infrastructure costs that were unaffordable a few months ago,” said Marafioti.

The impact extends beyond cost savings to enabling entirely new applications. The models power advanced document search through ColPali, an algorithm that creates searchable databases from document archives. "They achieve performance very close to that of models ten times their size while significantly increasing the speed at which the database is created and searched, making enterprise-wide visual search accessible to businesses of all kinds for the first time," Marafioti explained.
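ColPali-style retrieval builds on ColBERT-style late interaction: each query token is compared against every patch embedding of a page image, and each query token's best match contributes to the page's score. A toy NumPy sketch of that scoring rule, with made-up embeddings purely for illustration:

```python
import numpy as np

def late_interaction_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """ColBERT-style MaxSim: match each query token to its most similar
    page-patch embedding, then sum the per-token maxima."""
    sims = query_emb @ page_emb.T      # (query_tokens, page_patches)
    return float(sims.max(axis=1).sum())

# Toy example: two query tokens scored against one page with three patches.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
page = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(late_interaction_score(query, page))  # 2.0
```

Ranking an archive is then just computing this score for every page and sorting; with a smaller vision model producing the patch embeddings, both indexing and search get proportionally cheaper.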

A breakdown of SmolVLM's 1.7 billion training examples shows that document processing and image captioning make up nearly half of the dataset. (Source: Hugging Face)

Why smaller AI models are the future of AI development

The breakthrough challenges conventional wisdom about the relationship between model size and performance. While many researchers have assumed that larger models are required for advanced vision-language tasks, SmolVLM shows that smaller, more efficient architectures can achieve comparable results. On key benchmarks, the 500M-parameter version reaches 90% of the performance of its 2.2B-parameter sibling.

Rather than suggesting an efficiency plateau, Marafioti sees these results as evidence of untapped potential: "Until today, the standard was to release VLMs starting at 2B parameters; we thought that smaller models were not useful. We are proving that models at one-tenth the size can be extremely useful for businesses."

This development comes amid growing concerns about AI's environmental impact and computing costs. By drastically reducing the resources required for vision-language AI, Hugging Face's innovation could help address both problems while making advanced AI capabilities accessible to a broader range of organizations.

The models are available open source, continuing Hugging Face's tradition of broadening access to AI technology. Combined with the models' efficiency, this accessibility could accelerate the adoption of vision-language AI in industries from healthcare to retail, where processing costs have so far been prohibitive.

In a field where bigger has long meant better, Hugging Face's achievement suggests a new paradigm: the future of AI may lie not in ever-larger models running in distant data centers, but in nimble, efficient systems running directly on our devices. As the industry grapples with questions of scaling and sustainability, these smaller models could represent the biggest breakthrough yet.


