Track Talk Th10

LLMs: Testing the Unknowns that Seem to Know It All

Vimmi Walia

Manisha Mittal

13:45-14:45 Thursday 13th June

As the Emerging QA practice lead at Nagarro, Vimmi is obliged to stay ahead of the curve and keep developing her testing capabilities. She recently tested six applications based on Large Language Models (LLMs) within a couple of months.

A good LLM test strategy requires the tester to open the black box and look inside. We must understand new technical concepts like embeddings and mappings, token limits, and temperature. We need to learn the functions in the product, and sometimes we must go into the code to identify potential issues and vulnerabilities. Learning to code helps too.
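Concepts like temperature and token limits surface directly as request parameters, so a tester can probe them from a few lines of code. The sketch below is a minimal illustration only, assuming the OpenAI Python client; the model name and prompt are placeholders, not part of the applications discussed in the talk.

    # Probe how temperature affects the determinism of an LLM response.
    # Assumes the OpenAI Python SDK (pip install openai) and an API key in the environment.
    from openai import OpenAI

    client = OpenAI()

    def ask(question: str, temperature: float) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": question}],
            temperature=temperature,  # 0.0 = near-deterministic, higher = more varied
            max_tokens=100,           # cap on the length of the response in tokens
        )
        return response.choices[0].message.content

    # At temperature 0, repeated calls should give (almost) identical answers;
    # at temperature 1, the wording, and sometimes the facts, will drift.
    print(ask("What is an embedding?", temperature=0.0))
    print(ask("What is an embedding?", temperature=1.0))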

An excellent test strategy focuses on what can go wrong. Through experience, we’ve noticed problems and risks common to LLM-based applications: hallucinations, deception, security breaches, biases, explainability issues — and test coverage that’s insufficient to reveal them.

In this talk, we’ll share our experiences and a few practices we’ve found helpful, including:

  • Checking for biases around race, gender, and ethnicity.
  • Testing model robustness by varying query syntax, semantics, and response length; adding distractors; and asking malicious questions (see the sketch after this list).
  • Validating responses against reliable sources of information.
  • Security testing for prompt injection, insecure output handling, poisoned training data, permission issues, data leakage, and more.
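To make the robustness idea concrete, here is a rough sketch of the kind of check we mean. The ask_llm callable, the question variants, and the expected keyword are all hypothetical stand-ins for whatever application is actually under test.

    # Robustness probe: the same fact asked several ways, plus a distractor,
    # should still yield a consistent, correct answer.
    from typing import Callable

    VARIANTS = [
        "What is the capital of France?",                                # baseline
        "France's capital city is which one?",                           # varied syntax
        "If I fly to France's seat of government, where do I land?",     # varied semantics
        "My cat is orange, by the way. What is the capital of France?",  # added distractor
    ]

    def check_consistency(ask_llm: Callable[[str], str], expected_keyword: str = "Paris") -> list[str]:
        """Run each variant through the application and flag answers missing the expected keyword."""
        failures = []
        for question in VARIANTS:
            answer = ask_llm(question)
            if expected_keyword.lower() not in answer.lower():
                failures.append(f"FAIL: {question!r} -> {answer[:80]!r}")
        return failures

    # Usage (hypothetical): failures = check_consistency(my_app_client.ask)

The same harness can then carry malicious or prompt-injection questions, with the pass criterion flipped: the application should refuse rather than comply.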

Using a sample LLM-based application, we will show how these models can be tested. We’ll share our experiences, challenges, learnings, practices, and quick bites on how to test LLM-based applications effectively. We’ll help you develop a basic understanding of generative AI and LLMs through play and example scenarios, and show a live demo of search and conversations. You may see a bit of Python code, too. Then we’ll talk about the risks associated with LLMs and how you can address them in your test strategy.