
Track Talk, W7

Automated Structural/White-Box Testing of LLM-Based Agents

Jens Kohl

Otto Kruse

12:00 - 12:45 CEST, Wednesday 17th June

LLM-based agents are rapidly being adopted across diverse domains. Since they interact with users without supervision, they must be tested extensively.

Current testing approaches largely focus on acceptance-level evaluation of the whole agent, that is, testing the agent end-to-end from the user's perspective to ensure that it meets the user's needs.

While these tests are intuitive and also assess non-functional behavior, they require manual evaluation of test outputs, are difficult to automate, and do not support root cause analysis when a test fails. Additionally, their test environments are often time-consuming and expensive to build.

Taken together, this leads to high development costs and especially high operational costs, as agents need to be continuously updated to raise the quality bar.

In our talk, we detail methods that enable structural (glass-box) testing of LLM-based agents. This allows agent components and their interactions to be tested at a deeper technical level in an automated workflow. Additionally, structural testing makes it possible to adapt well-known testing methods from software engineering, such as the test automation pyramid, to the domain of LLM-based agents.
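As a rough illustration of what testing a single agent component in isolation could look like, the following Python sketch replaces the model with a deterministic stub and checks a tool-selection step automatically. It is not taken from the talk: the names StubLLM, select_tool and run_agent, the prompt layout and the two example tools are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type alias: a tool takes the user query and returns an observation.
Tool = Callable[[str], str]


@dataclass
class StubLLM:
    """Deterministic stand-in for a real model (assumption, not a real API)."""
    replies: List[str]
    prompts: List[str] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)  # record prompts so tests can inspect them
        return self.replies.pop(0)   # return canned replies in order


def select_tool(llm: StubLLM, query: str, tools: Dict[str, Tool]) -> str:
    """Component under test: ask the model for a tool name and validate it."""
    prompt = f"Tools: {', '.join(sorted(tools))}\nQuery: {query}\nTool:"
    choice = llm.complete(prompt).strip().lower()
    if choice not in tools:
        raise ValueError(f"unknown tool: {choice!r}")
    return choice


def run_agent(llm: StubLLM, query: str, tools: Dict[str, Tool]) -> str:
    """Two-step agent: pick a tool, run it, then ask the model for the answer."""
    tool_name = select_tool(llm, query, tools)
    observation = tools[tool_name](query)
    return llm.complete(f"Query: {query}\nObservation: {observation}\nAnswer:")


def test_tool_selection_and_handover() -> None:
    """Structural test: checks the internal steps, not only the final answer."""
    tools: Dict[str, Tool] = {"calculator": lambda q: "4", "search": lambda q: "n/a"}
    llm = StubLLM(replies=[" Calculator\n", "The answer is 4."])
    answer = run_agent(llm, "What is 2+2?", tools)
    assert answer == "The answer is 4."
    assert "Observation: 4" in llm.prompts[1]  # tool output reached the model


if __name__ == "__main__":
    test_tool_selection_and_handover()
```

Because the stub makes every model reply deterministic, a failing assertion points at a specific component (tool selection, tool execution, or the handover between them) rather than at the agent as a whole. Such tests would sit near the bottom of a test pyramid, below the end-to-end acceptance tests.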

We illustrate the methods and their applications with representative case studies and show how they increase test quality while reducing costs, through higher test reusability and test coverage as well as automated test execution and checking.