Track Talk, W6

How to Confidently Test and Scale Large Language Models

Huw Price

12:00–12:45 CEST, Wednesday 17th June

Large Language Models (LLMs) are reshaping enterprise technology, but they also introduce testing challenges that traditional QA wasn’t built to handle.

As probabilistic black boxes, LLMs produce variable outputs, depend on unstable data, and evolve constantly, making regression testing, validation, and compliance far more complex. Meanwhile, regulations such as the EU AI Act and DORA are raising the stakes for auditability and accountability.

This talk explores why conventional methods fall short and presents a systematic framework for testing LLMs at scale. We’ll unpack five essential pillars of effective AI testing: isolating and controlling components for repeatability; applying data-driven, risk-focused strategies; generating realistic synthetic test distributions; using deep learning–based techniques such as CTGAN and TVAE alongside Bayesian modeling; and efficiently testing both current behaviors and predicted variations.
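To give a flavor of the synthetic-distribution pillar, here is a minimal sketch of the idea using the open-source ctgan package: fit a generative model on logged test traffic, then sample a larger distribution of realistic test cases. The column names, data, and workflow below are illustrative assumptions for this listing, not the speaker’s own tooling.

```python
import numpy as np
import pandas as pd
from ctgan import CTGAN  # pip install ctgan

rng = np.random.default_rng(0)
n = 500  # stand-in size; in practice, use a real log of LLM test interactions

# Illustrative stand-in for logged LLM test traffic; every column name here
# is an assumption made for this sketch.
real_data = pd.DataFrame({
    "prompt_tokens": rng.integers(5, 400, size=n),                  # prompt length
    "intent": rng.choice(["billing", "support", "refund"], size=n), # routed intent
    "passed": rng.choice([0, 1], size=n, p=[0.1, 0.9]),             # test outcome
})

# Fit CTGAN on the observed distribution; categorical columns must be declared.
model = CTGAN(epochs=50)
model.fit(real_data, discrete_columns=["intent", "passed"])

# Sample a larger synthetic test distribution that mirrors the real one,
# expanding regression coverage without replaying real user data.
synthetic_tests = model.sample(2000)
print(synthetic_tests.head())
```

The same library also provides a TVAE model with a similar fit/sample interface, swapping adversarial training for a variational autoencoder.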

You’ll learn how to move beyond patchwork approaches and build a repeatable, future-proof framework tailored to the unique challenges of LLMs. From black-box unpredictability to regulatory scrutiny, you’ll leave with practical methodologies and tools to test smarter, scale faster, and reduce risk without slowing LLM development. Whether you develop LLMs in-house or integrate them into enterprise workflows, this session equips you to deliver reliable, auditable, and trustworthy LLM systems.