Programme Launch Offer: Save 20% - Book Now

Track Talk, Th1

Automated Testing of Large Language Models: A Live Demo

Anupam Krishnamurthy

09:00 - 09:45 CEST Thursday 18th June

How do you test a large language model’s shape-shifting outputs using automated tests? In this session, I will demonstrate the inner workings of a Retrieval Augmented Generation (RAG) model, and how you can qualify it using automated tests.

Spoiler alert: testing an LLM’s non-deterministic outputs will involve using another LLM to judge this output and score it. We will look at how these automated tests work by using LLM-as-judge.

Once we run the tests, we will prop up the hood and examine the stack traces of the evaluation framework, so that we can debug an unexpected result. We will then subject the RAG model to an indirect injection attack.

The tools used:

LangChain for creating the RAG model

Ragas for performing the assessment

LangSmith for examining the stack traces

Join me in this live demonstration, where we pit one LLM against another, and try to expose a security flaw in the bargain.