Hallucinations in AI: How GSK is tackling a critical problem in drug development
Generative AI has become an important part of the infrastructure in many industries, and healthcare is no exception. But as organizations such as GSK push the boundaries of what generative AI can achieve, they face significant challenges, especially when it comes to reliability. Hallucinations, where AI models generate false or fabricated information, are a persistent problem in sophisticated applications such as drug discovery and healthcare. For GSK, addressing these challenges requires leveraging test-time compute scaling to improve its generative AI systems. Here is how the company does it.
The hallucination problem in generative AI for healthcare
Healthcare applications require exceptional levels of accuracy and reliability. Mistakes aren’t just inconvenient; they can have life-changing consequences. This makes hallucinations in large language models (LLMs) a critical issue for companies like GSK, where generative AI is used for tasks such as scientific literature review, genomic analysis and drug discovery.
To mitigate hallucinations, GSK employs advanced inference-time compute strategies, including self-reflection mechanisms, multi-model sampling and iterative output evaluation. According to Kim Branson, SVP of AI and machine learning (ML) at GSK, these techniques help ensure agents are “robust and reliable,” while also allowing scientists to generate actionable insights more quickly.
Leveraging test-time compute scaling
Test-time compute scaling refers to the ability to increase computing resources during the inference phase of AI systems. This enables more complex operations such as iterative output refinement or multi-model aggregation, which are critical to reducing hallucinations and improving model performance.
Branson emphasized the transformative role of scaling in GSK’s AI efforts, noting that “at GSK, we’re all about extending iteration cycles – how we think faster.” By using strategies such as self-reflection and ensemble modeling, GSK can put these additional compute cycles toward results that are both accurate and reliable.
Branson also addressed the broader industry trend, saying: “You see this war happening over how much I can serve, my cost per token, and time per token. That will allow people to adopt these different algorithmic strategies that were not technically feasible before, and that will also drive the way agents are deployed and adopted.”
Strategies for reducing hallucinations
GSK has identified hallucinations as a critical challenge in generative AI for healthcare. The company employs two main strategies that require additional computing resources during inference. Applying more thorough processing steps ensures that each answer is checked for accuracy and consistency before being deployed in clinical or research settings where reliability is of paramount importance.
Self-reflection and iterative output review
A core technique is self-reflection, where LLMs critique or edit their own answers to improve quality. The model “thinks step by step,” analyzing its initial results, pinpointing weaknesses and revising the answers as needed. GSK’s literature search tool illustrates this: it collects data from internal repositories and an LLM’s memory, then re-evaluates its results through self-critique to uncover inconsistencies.
This iterative process leads to clearer and more detailed final answers. Branson underscored the value of self-critique by saying, “If you can only afford to do one thing, do that.” By refining its own logic before providing results, the system can generate insights that meet healthcare’s strict standards.
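The critique-and-revise loop described above can be sketched in a few lines. This is a minimal illustration, not GSK’s implementation: the `generate` callable stands in for any LLM call, and the prompt wording, the `'OK'` stop signal and the `max_rounds` limit are all assumptions made for the example.

```python
from typing import Callable


def self_reflect(
    question: str,
    generate: Callable[[str], str],  # hypothetical wrapper around an LLM call
    max_rounds: int = 2,             # assumed cap on critique/revise cycles
) -> tuple[str, int]:
    """Draft an answer, then alternate critique and revision.

    Returns the final answer and the number of revisions performed.
    """
    answer = generate(f"Question: {question}\nAnswer step by step.")
    revisions = 0
    for _ in range(max_rounds):
        # Ask the model to check its own draft.
        critique = generate(
            "Review the answer below for errors or unsupported claims. "
            "Reply with exactly 'OK' if there are none.\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        if critique.strip().upper() == "OK":
            break  # the model found nothing to fix
        # Feed the critique back in and request a revised answer.
        answer = generate(
            "Revise the answer to address the critique.\n"
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}"
        )
        revisions += 1
    return answer, revisions
```

Each round costs two extra model calls (one critique, one revision), which is why this technique depends on the additional inference compute discussed earlier.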
Multi-model sampling
GSK’s second strategy relies on multiple LLMs, or different configurations of a single model, to cross-check results. In practice, the system may run the same query at different temperature settings to generate different answers, use fine-tuned versions of the same model specialized for particular domains, or rely on entirely separate models trained on different data sets.
Comparing and contrasting these results helps confirm the most consistent or convergent conclusions. “You can get the effect of having different orthogonal paths to get to the same conclusion,” Branson said. Although this approach requires more computing power, it reduces hallucinations and increases confidence in the final answer – a key advantage in high-risk healthcare environments.
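One simple way to picture this cross-checking is a majority vote over several samplers. The sketch below is an assumption-laden illustration rather than GSK’s method: each entry in `samplers` is a hypothetical callable (a different model, fine-tune or temperature setting), answers are compared by exact match after normalization, and the `min_agreement` threshold is an arbitrary choice.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def consensus_answer(
    question: str,
    samplers: Sequence[Callable[[str], str]],  # hypothetical model wrappers
    min_agreement: float = 0.6,                # assumed acceptance threshold
) -> tuple[Optional[str], float]:
    """Query every sampler and keep the answer only if enough of them agree.

    Returns (answer or None, fraction of samplers that gave the top answer).
    """
    if not samplers:
        raise ValueError("at least one sampler is required")
    # Normalize so trivial differences in case or whitespace do not split votes.
    answers = [sample(question).strip().lower() for sample in samplers]
    top_answer, votes = Counter(answers).most_common(1)[0]
    agreement = votes / len(answers)
    if agreement < min_agreement:
        return None, agreement  # no convergent conclusion; abstain
    return top_answer, agreement
```

Exact-match voting only works for short, categorical answers. Free-text outputs would need a semantic comparison step, which is outside the scope of this sketch.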
The inference wars
GSK’s strategies rely on an infrastructure that can handle significantly higher computing workloads. In what Branson calls “inference wars,” AI infrastructure companies such as Cerebras, Groq and SambaNova compete for hardware breakthroughs that improve token throughput, reduce latency, and lower cost per token.
Specialized chips and architectures enable complex inference routines, including multi-model sampling and iterative self-reflection, at scale. For example, Cerebras’ technology processes thousands of tokens per second, enabling the use of advanced techniques in real-world scenarios. “You see that the results of these innovations have a direct impact on how we can effectively use generative models in healthcare,” Branson noted.
As hardware keeps pace with software needs, solutions emerge to maintain accuracy and efficiency.
Challenges remain
Despite these advances, scaling computing resources presents obstacles. Longer inference times can slow down workflows, especially when clinicians or researchers need quick results. Higher computing utilization also drives up costs and requires careful resource management. However, GSK believes these compromises are necessary to achieve greater reliability and greater functionality.
“As more tools are enabled in the agent ecosystem, the system becomes more useful to people, and ultimately computing power increases,” Branson noted. Balancing performance, cost and system capabilities allows GSK to maintain a practical yet forward-looking strategy.
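The trade-off between reliability, cost and latency can be made concrete with a rough estimate. The sketch below is illustrative only: the class, its field names and every number in the usage note are assumptions, not figures from GSK or any hardware vendor. It also assumes samples run in parallel, each reflection round adds one critique call and one revision call, and only generated tokens are counted.

```python
from dataclasses import dataclass


@dataclass
class InferencePlan:
    samples: int                   # parallel samples (multi-model sampling)
    reflection_rounds: int         # critique + revise rounds per sample
    tokens_per_call: int           # generated tokens per model call (assumed)
    usd_per_million_tokens: float  # hypothetical price
    tokens_per_second: float       # hypothetical throughput per stream

    def calls_per_sample(self) -> int:
        # One initial draft, then a critique and a revision per round.
        return 1 + 2 * self.reflection_rounds

    def total_tokens(self) -> int:
        return self.samples * self.calls_per_sample() * self.tokens_per_call

    def cost_usd(self) -> float:
        return self.total_tokens() * self.usd_per_million_tokens / 1_000_000

    def latency_seconds(self) -> float:
        # Samples run in parallel; calls within one sample are sequential.
        return self.calls_per_sample() * self.tokens_per_call / self.tokens_per_second
```

Under these assumptions, moving from `InferencePlan(1, 0, 1000, 2.0, 100.0)` to `InferencePlan(5, 2, 1000, 2.0, 100.0)` multiplies token usage by 25 and sequential latency by 5, which is why lower cost per token and higher throughput matter for these strategies.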
What’s next?
GSK plans to continue refining its AI-driven healthcare solutions, with test-time compute scaling a top priority. The combination of self-reflection, multi-model sampling, and robust infrastructure helps ensure that generative models meet the rigorous requirements of clinical environments.
This approach also serves as a guide for other organizations, showing how to balance accuracy, efficiency and scalability. Maintaining a lead in computational innovations and sophisticated inference techniques not only addresses current challenges, but also lays the foundation for breakthroughs in drug discovery, patient care and beyond.