Diffbot’s AI model doesn’t guess – it knows thanks to a trillion-fact knowledge graph

Diffbot, a small Silicon Valley company best known for maintaining one of the world’s largest indexes of web knowledge, today announced the release of a new AI model that promises to address one of the biggest challenges in the field: factual accuracy.

The new model, a fine-tuned version of Meta’s Llama 3.3, is the first open source implementation of a system known as Graph Retrieval-Augmented Generation, or GraphRAG.

Unlike traditional AI models that rely solely on massive amounts of preloaded training data, Diffbot’s LLM draws on real-time information from the company’s Knowledge Graph, a constantly updated database with more than a trillion interconnected facts.

“We have a thesis: that general thinking can eventually be reduced to about a billion parameters,” said Mike Tung, founder and CEO of Diffbot, in an interview with VentureBeat. “You don’t actually want to have the knowledge in the model. You want the model to be good at just using tools so it can query knowledge externally.”

How it works

Diffbot’s Knowledge Graph is a sprawling, automated database that has been crawling the public web since 2016. It categorizes web pages into entities such as people, companies, products and articles, and extracts structured information using a combination of computer vision and natural language processing.

Every four to five days, the Knowledge Graph is updated with millions of new facts to ensure it stays current. Diffbot’s AI model leverages this resource by querying the graph in real time to retrieve information rather than relying on static knowledge encoded in its training data.

For example, when asked about a current news event, the model can search the Internet for the latest updates, extract relevant facts, and cite the original sources. This process is intended to make the system more accurate and transparent than traditional LLMs.
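The retrieve-then-cite loop described above can be illustrated with a toy example. This is only a minimal sketch, not Diffbot’s implementation: the `Fact` record, the `TOY_GRAPH` list, the `retrieve` and `build_prompt` helpers and the example.com URLs are all hypothetical stand-ins, and a real system would query a live graph service and pass the prompt to a language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str  # where the fact was extracted from (its provenance)


# Hypothetical in-memory stand-in for a knowledge graph; URLs are placeholders.
TOY_GRAPH = [
    Fact("Diffbot", "founder", "Mike Tung", "https://example.com/about"),
    Fact("Diffbot", "location", "Silicon Valley", "https://example.com/profile"),
    Fact("Llama 3.3", "developer", "Meta", "https://example.com/llama"),
]


def retrieve(graph, entity):
    """Return every fact whose subject matches the entity, ignoring case."""
    wanted = entity.lower()
    return [fact for fact in graph if fact.subject.lower() == wanted]


def build_prompt(question, facts):
    """Format retrieved facts as numbered, citable context for a language model."""
    lines = [
        f"[{i}] {f.subject} {f.predicate}: {f.obj} (source: {f.source})"
        for i, f in enumerate(facts, start=1)
    ]
    context = "\n".join(lines) if lines else "(no facts found)"
    return (
        "Answer using only the facts below and cite them by number.\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("Who founded Diffbot?", retrieve(TOY_GRAPH, "Diffbot")))
```

Because every fact carries its own source, an answer built from this context can point back to where each claim came from, which is the transparency benefit the article describes.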

“Imagine asking an AI about the weather,” Tung said. “Instead of generating an answer based on outdated training data, our model queries a live weather service and provides an answer based on real-time information.”
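Tung’s weather example amounts to routing a question to an external tool instead of answering from memory. The sketch below shows that pattern under stated assumptions: `fake_weather_service`, the `TOOLS` registry and the `answer` function are hypothetical names, the weather data is canned, and in a real system the model itself would choose the tool and its argument.

```python
from typing import Callable, Dict


def fake_weather_service(city: str) -> str:
    """Hypothetical stand-in for a live weather API; returns canned data."""
    canned = {"san francisco": "14 C, fog"}
    return canned.get(city.lower(), "unknown")


# Registry of external tools the model is allowed to call (assumed for illustration).
TOOLS: Dict[str, Callable[[str], str]] = {"weather": fake_weather_service}


def answer(tool_name: str, argument: str) -> str:
    """Dispatch to an external tool rather than answering from stored knowledge."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return "No tool available; declining to guess."
    return f"{tool_name}({argument!r}) -> {tool(argument)}"


if __name__ == "__main__":
    print(answer("weather", "San Francisco"))
    print(answer("stocks", "ACME"))
```

The design choice worth noting is the fallback: when no tool matches, the sketch declines rather than inventing an answer.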

How Diffbot’s Knowledge Graph outperforms traditional AI at finding facts

In benchmark tests, Diffbot’s approach seems to be paying off. The company claims its model achieves 81% accuracy on FreshQA, a benchmark created by Google for testing real-time factual knowledge, outperforming both ChatGPT and Gemini. It also scored 70.36% on MMLU-Pro, a more difficult version of a standard test of academic knowledge.

Perhaps most importantly, Diffbot is making its model completely open source, allowing companies to run it on their own hardware and customize it to their needs. This addresses growing concerns about data protection and vendor lock-in with major AI providers.

“You can run it locally on your computer,” Tung noted. “There is no way you can run Google Gemini without sending your data to Google and sending it outside of your premises.”

Open source AI could change the way companies handle sensitive data

The release comes at a crucial time in AI development. In recent months there has been increasing criticism of the tendency of large language models to “hallucinate” or generate false information even as companies continue to increase model sizes. Diffbot’s approach suggests an alternative path that focuses on basing AI systems on verifiable facts rather than attempting to encode all human knowledge in neural networks.

“Not everyone is aiming for larger and larger models,” Tung said. “With a non-intuitive approach like ours, you can have a model that offers more possibilities than a large model.”

Industry experts note that Diffbot’s knowledge graph-based approach could be particularly valuable for enterprise applications where accuracy and auditability are critical. The company already provides data services to large companies including Cisco, DuckDuckGo and Snapchat.

The model is now available as an open source release on GitHub and can be tested in a public demo at diffy.chat. For companies looking to deploy it internally, Diffbot says the smaller version, with 8 billion parameters, can run on a single Nvidia A100 GPU, while the full version with 70 billion parameters requires two H100 GPUs.
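A rough back-of-the-envelope check shows why those hardware figures are plausible. The article does not say which numeric precision Diffbot serves the models in, so the sketch assumes 16-bit weights (2 bytes per parameter) and 80 GB of memory per GPU; it counts weights only and ignores the KV cache and activations, so real requirements are somewhat higher.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone, in GB (10^9 bytes).

    Assumes 2 bytes per parameter (16-bit weights) unless told otherwise.
    """
    return float(params_billions * bytes_per_param)


def fits(params_billions: float, gpus: int, gb_per_gpu: int = 80) -> bool:
    """True if the weights fit in the combined memory of the given GPUs."""
    return weight_memory_gb(params_billions) <= gpus * gb_per_gpu


if __name__ == "__main__":
    for size, gpus in [(8, 1), (70, 1), (70, 2)]:
        print(f"{size}B on {gpus} GPU(s): "
              f"{weight_memory_gb(size):.0f} GB of weights, fits={fits(size, gpus)}")
```

Under these assumptions the 8-billion-parameter model needs about 16 GB for weights and the 70-billion-parameter model about 140 GB, which exceeds one 80 GB card but fits across two.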

Looking ahead, Tung believes that the future of AI lies not in ever larger models, but in better ways to organize and access human knowledge: “Facts are becoming obsolete. Many of these facts are moved to explicit places where you can actually change the knowledge and determine the provenance of the data.”

As the AI industry struggles with challenges around factual accuracy and transparency, Diffbot’s release offers a compelling alternative to the prevailing “bigger is better” paradigm. Whether it succeeds in shifting the industry’s direction remains to be seen, but it has certainly shown that size isn’t everything when it comes to AI.


