Automated Structural/White-Box Testing of LLM-Based Agents

Jens Kohl

Otto Kruse

12:00 - 12:45 CEST, Wednesday 17th June

LLM-based agents are rapidly being adopted across diverse domains. Since they interact with users without supervision, they must be tested extensively.

Current testing approaches largely focus on acceptance-level evaluation of the whole agent, that is, testing the agent end-to-end from a user's perspective to ensure it meets the user’s needs.

While these tests are intuitive and also assess non-functional behavior, they require manual evaluation of test outputs, are difficult to automate, and do not support root-cause analysis when a test fails. Additionally, their test environments are often time-consuming and expensive to build.

Taken together, this leads to high development and especially operational costs, as agents need to be continuously updated to raise the bar for quality.

In our talk, we detail methods that enable structural, or glass-box, testing of LLM-based agents. This allows agent components and their interactions to be tested at a deeper technical level in an automated workflow. Additionally, structural testing makes it possible to adapt well-known testing methods from software engineering, such as the test automation pyramid, to the domain of LLM-based agents.
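As a rough illustration of what testing an agent component below the end-to-end level can look like, the following Python sketch checks an internal step (which tool the agent selects, and with which arguments) rather than only the final answer. It is not taken from the talk: `ToyAgent`, `stub_planner`, `get_weather`, and `fallback` are hypothetical names, and the LLM call is replaced by a deterministic stub so that the check can run automatically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyAgent:
    """Hypothetical agent: a planner (normally the LLM) picks a tool, the agent calls it."""
    planner: Callable[[str], Tuple[str, dict]]
    tools: Dict[str, Callable[..., str]]
    trace: List[Tuple[str, dict]] = field(default_factory=list)

    def run(self, user_input: str) -> str:
        tool_name, args = self.planner(user_input)
        self.trace.append((tool_name, args))  # record the internal decision
        return self.tools[tool_name](**args)


def stub_planner(user_input: str) -> Tuple[str, dict]:
    # Deterministic stand-in for the LLM call (an assumption of this sketch).
    if "weather" in user_input.lower():
        return "get_weather", {"city": "Munich"}
    return "fallback", {}


def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would call an external service.
    return f"Sunny in {city}"


def fallback() -> str:
    return "Sorry, I cannot help with that."


def test_weather_request_routes_to_weather_tool() -> None:
    agent = ToyAgent(stub_planner, {"get_weather": get_weather, "fallback": fallback})
    answer = agent.run("What is the weather today?")
    # Structural check: the internal tool call, not just the user-visible output.
    assert agent.trace == [("get_weather", {"city": "Munich"})]
    assert answer == "Sunny in Munich"


if __name__ == "__main__":
    test_weather_request_routes_to_weather_tool()
```

Because the assertion targets the recorded trace, a failure points directly at the routing step, which is the kind of root-cause information an end-to-end test alone does not provide.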

We illustrate the methods and their applications using representative case studies, and show how they increase test quality while reducing costs through higher test reusability and test coverage as well as automated execution and checking of tests.

What you will learn

Methods and workflows for white-box testing of LLM-based agents
How to increase test efficiency for LLM-based agents
Lessons learned from operationalizing LLM-based agents at scale

Session Details

Intermediate
45mins
Includes 15min Q&A
Implementing/integrating AI tools

Session Speakers

Jens Kohl

BMW Group, Germany

Jens Kohl is a technology leader and builder with 13 years of experience at the BMW Group.

He is responsible for shaping the architecture and continuous optimization of the Connected Vehicle cloud backend. Jens has been leading software development and machine learning teams with a focus on embedded, distributed systems and machine learning for more than 10 years.

Session Co-Speaker

Otto Kruse

Mistral AI, Netherlands

Otto Kruse is a Forward Deployed AI Engineer at Mistral AI, where he partners with customers to move beyond AI experimentation and build production-grade AI systems embedded deeply into their products and workflows to drive measurable business impact. Prior to Mistral AI, Otto worked at Amazon Web Services (AWS).
