
Track Talk, T17

Metrics That Matter for GenAI Evaluation

Anamika Mukhopadhyay

Deepshikha

15:45 - 16:30 CEST, Tuesday 16th June

As Generative AI becomes part of everyday life, organizations are embedding LLMs and AI agents into mission-critical workflows. Yet most AI teams still rely on traditional metrics like accuracy, precision, and recall, which serve classification and prediction well but fall short for generative tasks. Generative AI does not predict right or wrong; it creates possibilities. How do you evaluate creativity, coherence, safety, fairness, or trustworthiness with these metrics alone?
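
As a quick illustration of the gap, the toy Python sketch below (the reference text and scoring rule are hypothetical) applies a strict right/wrong check to two equally faithful outputs: the paraphrase is marked wrong even though a reader would accept it.

```python
# Illustrative sketch with hypothetical data: why right/wrong "accuracy"-style
# scoring breaks down for generative output. Two renderings of the same facts
# are both acceptable to a human reader, yet exact-match scores one as wrong.

reference = "The model was fine-tuned on 10k support tickets and deployed in March."

candidates = [
    "The model was fine-tuned on 10k support tickets and deployed in March.",     # verbatim copy
    "Fine-tuned on ten thousand support tickets, the model went live in March.",  # faithful paraphrase
]

def exact_match(prediction: str, reference: str) -> int:
    """Classic correctness check: 1 if the strings match exactly, else 0."""
    return int(prediction.strip() == reference.strip())

for cand in candidates:
    print(exact_match(cand, reference), "<-", cand)
# Prints 1 for the verbatim copy and 0 for the paraphrase, even though both
# convey the same facts. Generative tasks need richer, multi-dimensional scoring.
```
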

In this talk, we share our experience evaluating LLMs and AI agents across diverse industries and how we designed KPI frameworks tailored to specific contexts. Key insights include:

* Why conventional metrics alone cannot capture the complexity of generative AI
* New dimensions of measurement, from reasoning and factual consistency to fairness and robustness
* The necessity of use-case-driven evaluation for chatbots, summarizers, and autonomous agents
* Designing KPI frameworks that balance technical performance with business impact (see the sketch after this list)
* Practical lessons from applying these strategies in real-world deployments
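
To make the KPI idea concrete, here is a minimal Python sketch of a use-case-weighted scorecard. The dimensions, weights, and scores are illustrative placeholders under assumed names, not the framework presented in the talk.

```python
from dataclasses import dataclass

# Hypothetical example: each use case weights the same evaluation dimensions
# differently, so identical per-dimension scores can pass for one deployment
# and fail for another. All names and numbers below are illustrative.

DIMENSIONS = ("factual_consistency", "reasoning", "fairness", "robustness", "safety")

USE_CASE_WEIGHTS = {
    "summarizer":       {"factual_consistency": 0.40, "reasoning": 0.10, "fairness": 0.10,
                         "robustness": 0.20, "safety": 0.20},
    "support_chatbot":  {"factual_consistency": 0.25, "reasoning": 0.15, "fairness": 0.20,
                         "robustness": 0.15, "safety": 0.25},
    "autonomous_agent": {"factual_consistency": 0.20, "reasoning": 0.35, "fairness": 0.10,
                         "robustness": 0.25, "safety": 0.10},
}

@dataclass
class EvalResult:
    use_case: str
    scores: dict  # per-dimension scores in [0, 1], however they were measured

    def kpi(self) -> float:
        """Weighted aggregate: the single number reported alongside business KPIs."""
        weights = USE_CASE_WEIGHTS[self.use_case]
        return sum(weights[d] * self.scores.get(d, 0.0) for d in DIMENSIONS)

result = EvalResult(
    use_case="summarizer",
    scores={"factual_consistency": 0.82, "reasoning": 0.70, "fairness": 0.90,
            "robustness": 0.75, "safety": 0.95},
)
print(f"{result.use_case} composite KPI: {result.kpi():.2f}")  # ~0.83
```

A scorecard like this only reports the aggregate; the per-dimension scores themselves would come from task-specific evaluators, which is where the use-case-driven design discussed above comes in.
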

To build GenAI systems that are reliable, trustworthy, and valuable, we must measure more than correctness. This talk will show you how to define what good looks like, not just in numbers but in outcomes that truly matter.