What you will learn
- A structured approach for evaluating LLM-based applications
- An understanding of RAG and the LLM-as-judge technique
- An understanding of how a human-judge baseline can be used to pick a suitable LLM-judge model
- An appreciation for the challenges involved with using LLM-judges
Session Details
- Introductory
- 45 Minutes
- Includes 15 minutes of Q&A
- Testing the reliability, fairness, and safety of AI models
Session Speakers

Anupam Krishnamurthy
Head of AI Testing – TestSolutions, Germany
Anupam is Head of AI Testing at TestSolutions. In the past, he has served in several roles spanning space science, software development, strategy consulting and process automation. As a manager turned software engineer, he continually chips away at the boundaries that separate the two disciplines. Anupam is currently uncovering testing principles that are applicable to AI-augmented software. The non-deterministic nature of AI software, its stochastic behaviour and the need to evaluate subjective outputs present new challenges to the field of software testing. When he is not doing stuff like this, you’ll find him engrossed in a game of online chess, running long distances with a history podcast, or curled up in a corner with a book.