Track Talk, W6

How to Confidently Test and Scale Large Language Models

Huw Price

12:00–12:45 CEST, Wednesday 17th June

Large Language Models (LLMs) are reshaping enterprise technology, but they also introduce testing challenges that traditional QA wasn’t built to handle.

As probabilistic black boxes, LLMs produce variable outputs, depend on unstable data, and evolve constantly, making regression testing, validation, and compliance far more complex. Meanwhile, regulations such as the EU AI Act and DORA are raising the stakes for auditability and accountability.

This talk explores why conventional methods fall short and presents a systematic framework for testing LLMs at scale. We’ll unpack five essential pillars of effective AI testing: isolating and controlling components for repeatability; applying data-driven, risk-focused strategies; generating realistic synthetic test distributions; using deep learning–based techniques such as CTGAN and TVAE alongside Bayesian modeling; and efficiently testing both current behaviors and predicted variations.
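To give a flavor of the synthetic-distribution pillar, here is a minimal sketch of the idea using the open-source ctgan package: fit a generative model on logged test traffic, then sample a larger distribution of realistic test cases. The column names, data, and workflow below are illustrative assumptions for this listing, not the speaker’s own tooling.

```python
import numpy as np
import pandas as pd
from ctgan import CTGAN  # pip install ctgan

rng = np.random.default_rng(0)
n = 500  # stand-in size; in practice, use a real log of LLM test interactions

# Illustrative stand-in for logged LLM test traffic; every column name here
# is an assumption made for this sketch.
real_data = pd.DataFrame({
    "prompt_tokens": rng.integers(5, 400, size=n),                  # prompt length
    "intent": rng.choice(["billing", "support", "refund"], size=n), # routed intent
    "passed": rng.choice([0, 1], size=n, p=[0.1, 0.9]),             # test outcome
})

# Fit CTGAN on the observed distribution; categorical columns must be declared.
model = CTGAN(epochs=50)
model.fit(real_data, discrete_columns=["intent", "passed"])

# Sample a larger synthetic test distribution that mirrors the real one,
# expanding regression coverage without replaying real user data.
synthetic_tests = model.sample(2000)
print(synthetic_tests.head())
```

The same library also provides a TVAE model with a similar fit/sample interface, swapping adversarial training for a variational autoencoder.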

You’ll learn how to move beyond patchwork approaches and build a repeatable, future-proof framework tailored to the unique challenges of LLMs. From black-box unpredictability to regulatory scrutiny, you’ll leave with practical methodologies and tools to test smarter, scale faster, and reduce risk without slowing LLM development. Whether you develop LLMs in-house or integrate them into enterprise workflows, this session equips you to deliver reliable, auditable, and trustworthy LLM systems.