EuroSTAR Conference

Europe's Largest Software Testing Conference.

software testing tools

Gen AI for Software & Quality Engineering – Elevate your possible

May 26, 2026 by Lauren Payne

When we look at the history of software engineering and quality engineering (QE), we often see significant shifts occurring every decade or so. Each has brought with it new tools, processes and methodologies that have improved the way we develop, test, and maintain software applications, and by extension, the experiences we deliver to end users.

Test automation emerged in the 1970s to accommodate growing business demands, with Agile methodologies following in the ’90s to enhance quality and security. From the 2000s, DevOps and low-code strategies enabled quicker deployment and talent attraction.

The software industry was further propelled by social, mobile, analytics, and cloud (SMAC) technologies, completely transforming it as we knew it.

Despite these advancements, challenges persist in delivering quality products at speed, managing cybersecurity risks, containing costs, reducing technical debt, and sourcing skilled talent.

Now, with the emergence of generative AI, we stand on the brink of the next major evolutionary leap in software and quality engineering. This technology merges with traditional engineering principles to significantly expedite development, testing, and maintenance processes—ensuring faster market delivery of innovative products while freeing up time for strategic initiatives.

Introducing the Gen AI Amplifier for Software and Quality Engineering: Quality Reimagined.

Accelerate QE with the Gen AI Amplifier

The Gen AI Amplifier for Software and Quality Engineering is a groundbreaking accelerator designed to optimize the Quality Engineering & Testing (QE&T) stages of the software development life-cycle (SDLC). By applying the powerful capabilities of generative AI to our industry-leading best practices, frameworks, and methodologies, we’ve crafted a uniquely amplified approach. 

This accelerator is tailored to enhance the efficiency of QE&T tasks across the entire software life-cycle, from planning and design to building, testing, and deployment. It incorporates pre-built use cases and AI-driven prompts for every life-cycle stage—ranging from requirements and user stories to architecture modeling, design, code transformation and test case generation, API testing, synthetic test data and test automation. 

The Gen AI Amplifier is built on a robust QE foundation using our proprietary assets, and thoroughly tested and validated to ensure consistent, reliable results. It is integrated with multiple LLMs within a secure architecture framework, which is further strengthened by our gen AI guardrails, knowledge framework, and data platforms. 

The Gen AI Amplifier is helping us do in minutes and hours what software development and QA teams have traditionally needed days or weeks for: 

  • Generating first requirements from conversation documents 
  • Generating TMAP-guided test cases from requirements 
  • Generating comprehensive test data 
  • Generating automated test scripts, and more.

Generative AI adoption is driving impact across the entire test ecosystem

In the 15th World Quality Report, we learned that out of the 1,750+ senior executives surveyed globally, 64% have identified processes or applications that can benefit from AI. The majority are using AI towards building and improving test scope along with improving performance engineering, as well as the test ecosystem overall. They are actively utilizing AI to optimize their testing processes. 79% of the quality leaders agree that AI systems are going to be used to help them optimize their test scope and increase velocity.

64% of organizations have identified processes/applications that can benefit from AI.

79% of quality leaders agree that AI systems are going to be used to help them optimize their test scope and increase velocity.

(Data from World Quality Report, 15th Edition, 2023-24)

The Gen AI Amplifier supports the full life-cycle for agile teams. We’ve integrated multiple Gen AI models in a secure environment, with pre-engineered and tested prompts built with our industry-leading quality engineering methodology, agile framework, and cloud-native architectural best practices.

With the help of our AI experts at Sogeti, the Gen AI Amplifier augments business analysts, product owners, lead architects and quality engineers at every stage – from requirements gathering and infrastructure design to comprehensive software testing.

Quality amplified early on, and at every stage, yields a significantly increased level of productivity, efficiency, and acceleration.

Closing thoughts, for now

Diving into generative AI is really exciting, and I believe in the importance of remaining grounded and focused on pragmatic experimentation. This isn’t just about incremental improvements—it’s about redefining our approach to software engineering and QA. While our ultimate goals haven’t changed, our means to achieve them have evolved. Let’s explore generative AI’s potential responsibly, learning and adapting as we go. Together, we’ll discover how these advanced tools can enhance our industry without losing sight of our core objectives.

Watch this space as we unveil more about the Gen AI Amplifier and its impact on software QA.

Author

Antoine Aymer
CTO for Quality Engineering & Testing, Sogeti

Sogeti are Exhibitors in EuroSTAR 2026. Join us at EuroSTAR Conference in Oslo 15-18 June 2026.

Filed Under: EuroSTAR Expo, Software Testing Tagged With: 2026, EuroSTAR Conference, Expo, software testing tools

Testing AI Agents: A Practical Blueprint for Custom Evaluation Frameworks

May 13, 2026 by Lauren Payne

A leader’s guide to building domain-aware evaluation disciplines that turn experimental AI pilots into production-grade, auditable enterprise systems.

The AI Agent Production Gap

AI agents are moving rapidly into software engineering, testing, DevOps, support, and back-office workflows. Many organisations have running pilots; far fewer trust those agents enough to put them into production. Gartner predicts that more than 40% of agentic AI projects may be cancelled by the end of 2027, citing cost, unclear business value, and inadequate risk controls.

The blocker is rarely the model. It is the absence of an evaluation discipline that can answer a harder question: can this agent complete the right task, in the right context, with the right controls, consistently?

Why AI Agents Need a Different Testing Strategy

Traditional software testing assumes a known input, a fixed expected output, and repeatable behaviour. AI agents do not work that way. Outputs vary. Retrieval can return different chunks. Tool calls may take different paths. The reasoning trace shifts from one run to the next.

Evaluating an agent requires checking whether it understood the user’s intent, retrieved the right context, called the right tool, avoided hallucination, followed enterprise policy, and escalated when its confidence was low. No single generic metric covers all of this.

Existing Evaluation Frameworks Are Necessary, Not Sufficient

A strong ecosystem of evaluation tools already exists. DeepEval, Ragas, Promptfoo, LangSmith, Braintrust, TruLens, Phoenix, and OpenAI Evals each give teams real leverage on prompts, RAG pipelines, model outputs, hallucination, retrieval quality, tool calls, traces, and regression behaviour. They are essential building blocks.

But a customer-support agent, a banking-compliance agent, a test-case generation agent, and a Playwright automation agent may all share an LLM core while having entirely different definitions of “good.” Generic accuracy and faithfulness scores cannot decide whether a generated test suite is release-ready. The evaluation tool provides the engine; the organisation must define the quality model.

The Custom Evaluation Blueprint

A practical custom evaluation framework can be built in seven steps:

  1. Define the agent’s mission: Write down what the agent must do, what it must never do, and how much autonomy is allowed before a human is required. This becomes the evaluation contract.
  2. Build task-level evaluation datasets: Cover normal flows, edge cases, negative scenarios, ambiguous prompts, high-risk domain cases, and historical production issues.
  3. Create domain-specific rubrics: Score domain relevance, business accuracy, retrieval correctness, reasoning, tool correctness, hallucination control, compliance, clarity, and escalation behaviour.
  4. Apply weighted scorecards: A formatting slip is low severity; a wrong business recommendation is critical; a wrong tool call may block release. Weight accordingly.
  5. Combine automated evaluation with human calibration: Automated evaluators give scale; expert reviewers calibrate the rubric over time to account for edge cases no automated scorer anticipated.
  6. Run regression evaluation continuously: Re-score whenever the model, prompt, RAG corpus, tool definition, workflow, or enterprise policy changes.
  7. Convert scores into release gates: Pass · Conditional Pass · Human Review Required · Block, with each gate tied to a clear business risk threshold.
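
Step 6 can be made mechanical by fingerprinting everything that influences agent behaviour and re-running the evaluation whenever the fingerprint changes. The sketch below is one possible approach, not a feature of any tool named above; the configuration keys and version strings are hypothetical placeholders.

```python
import hashlib
import json
from typing import Optional


def config_fingerprint(config: dict) -> str:
    """Stable SHA-256 hash of everything that can change agent behaviour."""
    # sort_keys gives the same hash regardless of dictionary ordering
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_regression_run(current: dict, last_evaluated: Optional[str]) -> bool:
    """True when any tracked component differs from the last evaluated state."""
    return config_fingerprint(current) != last_evaluated


# Hypothetical configuration; every value here is an illustrative placeholder.
baseline = {
    "model": "example-model-v1",
    "prompt_version": "2026-05-01",
    "rag_corpus_version": "corpus-17",
    "tool_definitions": ["lookup_order", "issue_refund"],
    "workflow_version": "3",
    "policy_version": "refund-policy-v4",
}
evaluated = config_fingerprint(baseline)

# A prompt edit alone is enough to trigger a new evaluation run.
changed = {**baseline, "prompt_version": "2026-05-12"}
print(needs_regression_run(baseline, evaluated))  # unchanged configuration
print(needs_regression_run(changed, evaluated))   # prompt changed
```

Storing the fingerprint next to each evaluation report also makes it possible to say exactly which combination of model, prompt, and policy a given score applies to.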


Custom Metrics Based on Business Context

Generic LLM benchmarks measure model capability in isolation. Enterprise AI agents operate in a business context — with specific user personas, data governance requirements, integration constraints, and financial consequences of failure. The metrics must reflect that context. Below is a framework for selecting and weighting evaluation dimensions by deployment domain.

Metric Clusters by Enterprise Domain

Each domain cluster below contains the metrics that carry the most signal for that type of agent. Select the cluster that matches your deployment, then tune weights using your organisation’s risk tolerance and regulatory posture.

Weighted Scorecard: Enterprise AI Agent Release Template

The table below shows how to structure a weighted scorecard across core evaluation dimensions. Adjust weights to match your domain cluster above and your organisation’s risk posture.
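
As an illustration of the structure, a weighted scorecard might be sketched in Python as follows. The dimension names echo the rubric dimensions listed earlier; the weights, scores, and the 0.7 floor for critical dimensions are invented for this example and are not taken from the article's template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    weight: float           # relative importance; illustrative values only
    score: float            # rubric score between 0.0 and 1.0
    critical: bool = False  # a low score here should block release on its own


def weighted_score(dimensions: list) -> float:
    """Weight-normalised average of the dimension scores."""
    total_weight = sum(d.weight for d in dimensions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(d.weight * d.score for d in dimensions) / total_weight


def critical_failures(dimensions: list, floor: float = 0.7) -> list:
    """Names of critical dimensions scoring below the (assumed) floor."""
    return [d.name for d in dimensions if d.critical and d.score < floor]


scorecard = [
    Dimension("business accuracy", weight=4, score=0.9, critical=True),
    Dimension("tool correctness", weight=3, score=1.0, critical=True),
    Dimension("hallucination control", weight=2, score=0.8),
    Dimension("clarity and formatting", weight=1, score=0.5),
]
overall = weighted_score(scorecard)      # (3.6 + 3.0 + 1.6 + 0.5) / 10
blockers = critical_failures(scorecard)  # empty: no critical dimension is below 0.7
```

Keeping critical failures separate from the weighted average prevents a severe error in one dimension from being averaged away by high scores elsewhere.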

Business-Context Metric Matrix

The following matrix maps enterprise agent types to their primary KPI, the hardest-to-catch failure mode, and the metric that most reliably surfaces it.

Release Gates — Translating Scores into Decisions

Every weighted scorecard must terminate in a clear business decision. The four-gate model below maps score ranges to actions and assigns responsibility for each outcome.
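
A minimal sketch of such a mapping is shown here. The four gate names come from the blueprint; the score thresholds (0.90, 0.80, 0.65) are assumptions chosen purely for illustration and should be replaced by thresholds derived from your own risk posture.

```python
from enum import Enum


class Gate(Enum):
    PASS = "Pass"
    CONDITIONAL_PASS = "Conditional Pass"
    HUMAN_REVIEW = "Human Review Required"
    BLOCK = "Block"


def release_gate(score: float, critical_failure: bool = False) -> Gate:
    """Map a weighted score in [0, 1] to one of the four gates.

    Thresholds are hypothetical; a critical failure blocks regardless of score.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if critical_failure:
        return Gate.BLOCK
    if score >= 0.90:
        return Gate.PASS
    if score >= 0.80:
        return Gate.CONDITIONAL_PASS
    if score >= 0.65:
        return Gate.HUMAN_REVIEW
    return Gate.BLOCK


print(release_gate(0.87).value)                         # Conditional Pass
print(release_gate(0.95, critical_failure=True).value)  # Block
```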

Lessons from testron.ai Implementations


What This Means for QA Teams

Agentic AI is reshaping the QA mandate. Test execution is no longer the centre of gravity; evaluation design is. The QA function becomes the quality gatekeeper for enterprise AI agents — owning rubrics, scorecards, regression datasets, and human-in-the-loop calibration.

The skills that compound from here are domain-aware evaluation design, structured human review, and translating business risk into release gates that engineering and the business both trust.

Closing

The future of testing is not just more automation. It is trusted AI-agent evaluation: a clear mission, a custom rubric tuned to business context, a weighted scorecard, calibrated human review, and a release gate that reflects business risk.

Teams that build this discipline now will be the ones who put agents into production with confidence — and who earn the trust of the board, the regulator, and the customer.

References:
Gartner: Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027 — gartner.com
DeepEval — deepeval.com

Author

Babu Manickam CTO, Indsafri

Over 27 years of experience in software testing, test automation, performance engineering, DevOps, and AI-led quality engineering. He has trained more than 50,000 QA professionals and works with enterprises to implement modern testing practices across automation, Generative AI, and agentic quality engineering. An active speaker and community contributor in the software testing ecosystem, with a strong focus on helping QA professionals transition into AI-augmented engineering roles.

Indsafri are Exhibitors in EuroSTAR 2026. Join us at EuroSTAR Conference in Oslo 15-18 June 2026.

Filed Under: EuroSTAR Expo, Software Testing Tagged With: 2026, EuroSTAR Conference, Expo, software testing tools

The Real Problem with AI Testing Tools Isn’t the AI – It’s Trust

May 11, 2026 by Lauren Payne

Every software company is talking about AI. Copilots. Autonomous agents. Self-healing tests. But inside most engineering organisations, a quiet hesitation persists, not because teams doubt AI’s potential but precisely because they take it seriously.

The real question holding teams back isn’t capability. It’s a far more fundamental one: Can we trust AI to participate in release-critical testing workflows?

That question matters because software testing is fundamentally different from most other AI-assisted tasks. A poor AI-generated image is an inconvenience. A poor AI-governed production release can damage revenue, customer trust, compliance, and operational stability.

The next evolution of quality engineering will not be driven by fully autonomous AI in isolation. It will be driven by human-controlled Agentic AI.

The Industry Has a Scaling Problem

Most organisations already have automation. What they struggle with is scaling it.

Traditional automation frameworks demand specialist skills, constant maintenance, brittle selectors, and significant engineering effort just to keep regression suites operational. As applications evolve, teams often spend more time maintaining tests than expanding meaningful coverage.

Meanwhile, release velocity continues to accelerate. Modern engineering teams are expected to deliver continuously across web, mobile, APIs, and increasingly complex customer journeys. The operational cost of maintaining automation has quietly become one of the biggest hidden blockers in software delivery.

This is where Agentic AI changes the conversation entirely.

From Scripts to Intelligent Testing Systems

At Virtuoso, our Agentic AI vision is built around a single, clear outcome:

Rather than treating test automation as a collection of static scripts, Agentic AI introduces intelligent operational workflows that help QA teams move from requirements to execution dramatically faster. The platform is designed to:

But critically, AI does not act unchecked. Every important decision point remains reviewable, traceable, and human-governed. Requirements are approved. Generated journeys are reviewed. Repairs are auditable. Changes are versioned.

The Future Is Not AI Replacing Testers

One of the most persistent misconceptions surrounding AI in testing is that the goal is complete removal of human involvement. In reality, experienced QA engineers are becoming more valuable – not less.

The organisations seeing the greatest success with AI are those combining machine scale with human judgement. That is the core philosophy behind Virtuoso’s approach to Agentic AI.

Operational QA, Not Just Better Automation

The industry has spent years focused on “test automation tools.” The next phase is operational QA systems – systems that can understand context, coordinate workflows, propose actions, and continuously improve automation at a scale that would traditionally require large engineering teams.

But operational credibility only comes from governance. Engineering leaders don’t simply need AI that can generate tests. They need AI they can trust inside release-critical pipelines. That means:

This is precisely why human-controlled Agentic AI will ultimately outperform uncontrolled autonomy in enterprise software delivery.

The Next Era of Quality Engineering

AI will absolutely transform software testing. But the winners in this space will not be organisations chasing fully autonomous systems with no oversight. The winners will be teams that successfully combine:

  • AI-Powered Scale
  • Intelligent Orchestration
  • Human Governance
  • Operational Trust

In software delivery, speed matters. But trust matters more.

Author

Andy Dickin – QA & Quality Engineering Leader Virtuoso QA

Andy specialises in AI-powered automation and modern quality engineering practices that help organisations scale software delivery with confidence.

Virtuoso are Exhibitors in EuroSTAR 2026. Join us at EuroSTAR Conference in Oslo 15-18 June 2026.

Filed Under: EuroSTAR Expo, Software Testing Tagged With: 2026, EuroSTAR Conference, Expo, software testing tools

The AI Velocity Paradox: Why Your Test Data is the New Bottleneck  

April 22, 2026 by Lauren Payne

Generative AI has dramatically accelerated software development. Ideas that once took weeks to turn into working code can now be prototyped in hours. AI tools generate code, suggest tests, analyze logs, and help engineers iterate faster than ever. According to McKinsey’s research on developer productivity, AI-assisted teams can complete coding tasks significantly faster, with some organizations reporting development velocity improvements of 30 to 40% in AI-supported workflows.  

But something interesting is happening in many organizations: even as coding accelerates, releases still stall. Teams wait not for code to be written, but for confidence that it is safe to ship. In many cases, the blocker is not the tests themselves. It is the data behind them. In the AI era, test data has quietly become the new critical path to quality.  

Consider a typical enterprise team using AI coding assistants. Development velocity improves immediately. Features move from idea to code faster than ever. But releases still slip because QA teams must wait days for masked datasets or environment refreshes. The development bottleneck is gone, but the testing bottleneck remains.  

The Bottleneck is Shifting in the SDLC  

AI now supports multiple stages of the software lifecycle, reducing work that once took days to mere hours. Yet validation still takes time. Teams can generate tests quickly, but they often wait on environments, data provisioning, and compliance approvals before those tests can run.  

As development accelerates, constraints move downstream toward validation. Inside QA, test data and test environments are where projects most often slow down.

Why Test Data is Uniquely Hard  

Accessing usable test data is rarely simple. Industry research consistently shows that testers spend 30 to 40% of their time searching for or preparing data rather than executing tests. Manual provisioning of test datasets can take days or weeks. That cadence does not match modern CI/CD pipelines or AI-assisted development.

When teams cannot access realistic data, test coverage suffers. Edge cases go untested, business workflows behave differently than they do in production, and defects slip through into live environments. These inconsistencies make tests unreliable and environments unpredictable. Furthermore, if automation frameworks and AI-driven testing tools rely on incomplete data, the automation itself becomes unreliable.

The AI Twist: More Code, More Testing Pressure

Agentic AI will generate more code changes and more test cases than any previous tooling. This increases the validation volume required in every build.  

If teams run AI-generated tests on unrealistic datasets, the results can be highly misleading. A build may pass in the lab while still hiding defects that appear in production. “Green builds” do not always mean safe releases. Without production-like data, testing becomes a simulation detached from reality.

Why Traditional Test Data Management Breaks Under AI Velocity 

Feature | Traditional Test Data Management | Modern Test Data Management (AI Era)
Provisioning Model | Centralized teams, ticket-based requests | Self-service, on-demand generation
Data Source | Full database copies (heavy, slow) | Multi-source data automatically masked, blended, and pushed in QA/UAT environments
Speed | Days, weeks, or months | Minutes (self-service UI or integrated into CI/CD pipelines)
Architecture Fit | Monolithic legacy systems | Spans legacy, SaaS, and micro-services
Compliance / PII | Manual masking that differs by database, inconsistent enforcement | Automated, centralized PII masking and synthetic data, built-in by design

What Good Test Data Looks Like Today  

To support modern testing, data must meet three strict criteria:  

  • Realistic: Datasets must reflect production behaviors, edge cases, and real business workflows.  
  • Compliant: Personally identifiable information (PII) must be protected. Using synthetic test data (realistic but fictitious datasets generated to mirror production patterns) and advanced masking techniques helps preserve useful data characteristics without exposing real user details.  
  • Consistent: Enterprise applications span multiple systems. Test data must reflect accurate relationships across services, platforms, and applications.  
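
The compliant and consistent criteria can pull against each other: masking each table independently breaks the relationships between them. One common generic technique, sketched below, is deterministic pseudonymisation with a keyed hash, so the same real identifier always maps to the same masked value across systems. This is an illustration only and does not describe how any particular platform works; the key, table layouts, and field names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for the demo; a real key would live in a secrets manager.
SECRET_KEY = b"demo-only-key"


def pseudonymize(value: str, prefix: str) -> str:
    """Deterministically replace a real identifier with a masked token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


# Two "systems" that share a customer identifier.
customers = [{"customer_id": "C-1001", "email": "ana@example.com"}]
orders = [
    {"order_id": "O-1", "customer_id": "C-1001"},
    {"order_id": "O-2", "customer_id": "C-1001"},
]

masked_customers = [
    {
        "customer_id": pseudonymize(c["customer_id"], "cust"),
        "email": pseudonymize(c["email"], "user") + "@example.invalid",
    }
    for c in customers
]
masked_orders = [
    {**o, "customer_id": pseudonymize(o["customer_id"], "cust")} for o in orders
]

# Orders still join to the masked customer, but no real values remain.
joined = [
    o for o in masked_orders
    if o["customer_id"] == masked_customers[0]["customer_id"]
]
print(len(joined))
```

Because the mapping depends only on the key and the input, separately masked extracts from different systems stay joinable without ever sharing the real identifiers.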

Platforms like K2view approach this by organizing enterprise data around business entities such as customers, policies, or orders using patented Micro-Database technology. This allows teams to provision complete, consistent datasets that reflect real user journeys across multiple systems while maintaining referential integrity. Instead of stitching together multiple tools, teams combine data masking, cloning, and synthetic generation within one platform.  

Don’t Let Test Data Become the New Bottleneck  

The bottleneck has moved, but it hasn’t disappeared. Teams that recognize test data as infrastructure — not an afterthought — will be the ones who actually realize the promise of AI-accelerated development. The rest will keep explaining why fast code still means slow releases. 

Author

Amitai Richman, Director of Product Marketing  K2view Test Data Management

Amitai Richman is Director of Product Marketing at K2view, where he leads GTM strategy for enterprise AI and test data management solutions. He specializes in translating complex data and agentic AI capabilities into clear business value for enterprise buyers, with a particular focus on TDM, synthetic data generation, and AI-ready data infrastructure. Amitai has presented on AI in software testing at industry forums across Europe and the US, and is a recognized voice in the product marketing and QA communities. 

Filed Under: EuroSTAR Expo, Software Testing Tagged With: 2026, EuroSTAR Conference, software testing tools

Currys achieves 4X faster release cycle with BrowserStack AI-Powered Test Management

April 17, 2026 by Lauren Payne

Introduction

Currys, a leading UK omnichannel retailer, is committed to delivering seamless digital experiences. With a vast online and offline presence, ensuring high-quality software releases is critical. However, fragmented testing, manual test management processes, and limited visibility into testing efforts slowed development cycles. To overcome these challenges, Currys adopted BrowserStack Test Management, resulting in faster releases, improved test coverage, and streamlined quality assurance.

The Challenge

Fragmented testing and limited visibility

When Gregg Ward, Principal Quality Assurance Manager, joined Currys, he found a fragmented testing process. Teams operated in silos, using multiple different tools for testing and Excel spreadsheets stored on individual computers to manage test cases. The lack of a centralized system led to inefficiencies, making collaboration difficult and slowing down issue resolution.

Without real-time visibility, the team spent more time chasing information than solving problems. Gathering data took three times longer than fixing issues, and the deep technical knowledge required made onboarding and collaboration challenging. Testing remained a black box for stakeholders, and test creation was time-consuming; both slowed release cycles and innovation.

“We needed a single source of truth that would provide real-time visibility to everyone involved, from developers to stakeholders. We wanted to standardize transparency and real-time results, and BrowserStack ticked all those boxes,” Gregg says.

The Solution

BrowserStack AI-Powered Test Management

Currys moved to BrowserStack Test Management for its seamless integration, intuitive interface, and AI capabilities. The platform provided a centralized hub for test creation, execution, and tracking, enabling both technical and non-technical users to access critical information effortlessly.

Migrating from legacy systems was a major shift. With 40 years of legacy platforms and over 200,000 test cases in massive Excel files, they were concerned about losing critical data. But BrowserStack supported them every step of the way, providing hands-on assistance. The result? Not a single test case was lost—everything was intact. With BrowserStack’s support, Gregg’s team onboarded 120 users and imported 60,000 test cases from an existing Test Management tool, including a project with 22,000 test cases migrated in under five hours.

“We haven’t lost a single test case. Everything is there. BrowserStack Test Management lets you set up field mappings to match however your legacy test files were structured, and it’s incredibly flexible,” Gregg says. This ensured that Currys’ QA teams could continue their work without disruption, maintaining the integrity of their testing data while benefiting from a more streamlined and efficient test management system.

Having migrated their existing test cases, Currys leveraged BrowserStack’s AI capabilities to accelerate test creation. “We recently introduced BrowserStack AI to our testing teams across all areas of Currys, and the impact was immediate. The increase in coverage was fourfold, but the real advantage was the speed at which we could create test cases directly from our UI, Jira, or Confluence,” Gregg explains. AI-driven automation reduced manual effort and encouraged broader, more innovative test coverage.

The platform’s dashboards and reporting further improved issue detection. “Dashboards in BrowserStack are incredibly useful. They provide an easy-to-follow information path that we can share with stakeholders, making it simple to highlight key insights and zoom into the details when needed.”

With Jira integration enabling two-way visibility, developers and stakeholders could access real-time test session data without switching tools. “With the Jira integration, people can see what test sessions are happening, what’s failed, and the last time a test was run—all from a single ticket,” says Gregg.

The Impact

Faster releases, greater confidence, and improved collaboration

With BrowserStack Test Management, Currys transformed its QA efficiency and dramatically increased release cadence. Previously, releases followed a rigid, waterfall-style process, deploying just once per sprint. Now, with AI-driven test management, the team releases up to four times per sprint, with some teams shipping updates every few days.

“Our average deployment cycle has increased fourfold with BrowserStack. Where we were releasing once in a two-week sprint, we’re now deploying four times per sprint, with some teams releasing updates as frequently as every few days,” Gregg states.

Beyond speed, collaboration and transparency have improved significantly. Teams now have real-time access to testing data, breaking down barriers between QA and development. “BrowserStack has changed how people interact with the quality team. It’s no longer just about testers pushing back on breaking changes. Everyone sees the same information at the same time, which promotes real-time discussions and cross-team collaboration,” Gregg emphasizes.

By consolidating all testing tools into BrowserStack, Currys eliminated inefficiencies caused by fragmented workflows. Previously, test management was scattered across multiple platforms, making it difficult to access real-time insights. Now, with a single source of truth, Currys has streamlined QA, accelerated release cycles, and improved software quality. “It’s a game-changer. Everyone knows where to look for information, everyone understands the quality of what we’re producing, and that’s all thanks to BrowserStack Test Management,” Gregg concludes.

If you would like to know more, BrowserStack provides AI-enabled products and agents across the testing lifecycle. Reach out to us here.

Author

Ankit Jain Senior Director – Product Management, BrowserStack

Ankit is a Senior Director of Product Management, spearheading the fastest growing Test Management and Scanner product lines at BrowserStack. With over 20 years of experience, as a Product Leader, Founder, Investor, and Developer, he has led global teams and driven strategic initiatives, specializing in 0-to-1 development, growth, monetization, and scaling across startups, hyper-growth companies, and public enterprises.

BrowserStack are Platinum Sponsors in EuroSTAR 2026. Join us at EuroSTAR Conference in Oslo 15-18 June 2026.

Filed Under: EuroSTAR Expo, Platinum Tagged With: 2026, EuroSTAR Conference, Expo, software testing tools

Testing AI Agents: What QE Teams Need to Unlearn Before They Can Get This Right

April 13, 2026 by Lauren Payne

Run the same AI agent with the same input ten times. You will get ten different results. Sometimes subtly different. Sometimes wildly.

That single fact breaks almost everything traditional QA was built on.

LangChain’s 2026 State of Agent Engineering report surveyed 1,300+ professionals. The findings are stark: 57% of organizations now have AI agents in production. Quality is the number one barrier to deployment, cited by 32% of teams. And only 52% have any evaluation system in place.

AI Agents Are in Production, but Evaluation Is Still Maturing

Do the math. Roughly half the organizations shipping agents to production have no structured way to know if those agents work reliably. For enterprises with 10,000+ employees, the top concern is not cost or speed. It is hallucinations and output consistency.

Gartner’s 2025 Hype Cycle placed AI agents at the Peak of Inflated Expectations, noting that multi-agent workflows and model non-determinism may trigger cascading failures.

That confidence gap is where QE teams should be rushing in.

Why the Input–Output Contract No Longer Holds

Traditional QA lives on a simple promise: given input X, expect output Y. AI agents break that promise by design. A customer service agent might resolve the same complaint through five different valid approaches. A coding agent might fix a bug with three different architectures. The output varies. The path varies. Both can be correct.

You cannot write an assertion that says “the response must equal this exact string.” You cannot build a regression suite expecting identical behavior across runs. And you cannot rely on pass-fail verdicts when the definition of “correct” depends on context, tone, and user intent. This is not a tooling problem. It is a thinking problem. And it demands that QE teams unlearn some deeply held assumptions about what testing looks like.
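
One way to replace a single pass-fail verdict is to run the same prompt many times and measure how often a behavioural check holds. The sketch below uses a stand-in agent that simply rotates through three phrasings to mimic run-to-run variation; the agent, prompt, and check are all hypothetical.

```python
from itertools import cycle
from typing import Callable


def pass_rate(agent: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], runs: int = 9) -> float:
    """Fraction of repeated runs whose response satisfies the check."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check(agent(prompt)))
    return passed / runs


# Stand-in for a real agent: same intent, different wording on each call.
_phrasings = cycle([
    "Your refund has been approved and will arrive in 5 days.",
    "I've approved the refund; expect it within five business days.",
    "Refund approved. You'll see it on your statement soon.",
])


def fake_refund_agent(prompt: str) -> str:
    return next(_phrasings)


def exact_match(response: str) -> bool:
    """Traditional assertion: response must equal one exact string."""
    return response == "Your refund has been approved and will arrive in 5 days."


def confirms_refund(response: str) -> bool:
    """Behavioural check: the response confirms an approved refund."""
    text = response.lower()
    return "refund" in text and "approved" in text


exact_rate = pass_rate(fake_refund_agent, "Where is my refund?", exact_match)
behavioural_rate = pass_rate(fake_refund_agent, "Where is my refund?", confirms_refund)
print(exact_rate, behavioural_rate)
```

With this stand-in, the exact-string assertion holds on only a third of the runs even though every response is acceptable, while the behavioural check holds on all of them. In practice the check could be a rubric scorer or a model-based grader, and the release criterion becomes a minimum pass rate rather than a single green result.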

Define Behavioural Boundaries, Not Exact Outputs

The most effective teams testing AI agents have made a counterintuitive shift: they stopped checking exact outputs and started defining behavioural bounds.

Anthropic’s engineering team addressed this in their guidance. They recommend evaluating the quality of the final output rather than the exact steps taken to reach it. Agents often arrive at effective solutions through alternative paths. If evaluation frameworks reject those paths, the test suite becomes brittle instead of robust.

Practically, this means asking different questions. Did the agent call the correct tools? Did it stay within policy guardrails? Did it reach a valid end state? Did it handle edge cases without hallucinating?
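Those questions can be turned into checks on a recorded agent run. The following is a minimal sketch, not a reference implementation: the `AgentRun` and `BehaviouralBounds` structures, the tool names, and the end states are all hypothetical and stand in for whatever your agent framework actually logs.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical record of one agent run; field names are illustrative."""
    tools_called: list[str]
    final_state: str
    response: str


@dataclass
class BehaviouralBounds:
    """Bounds the run must respect, instead of one exact expected output."""
    required_tools: set[str]
    forbidden_tools: set[str]
    valid_end_states: set[str]
    banned_phrases: set[str] = field(default_factory=set)


def check_run(run: AgentRun, bounds: BehaviouralBounds) -> list[str]:
    """Return a list of violations; an empty list means the run stayed in bounds."""
    violations = []
    called = set(run.tools_called)
    missing = bounds.required_tools - called
    if missing:
        violations.append(f"missing required tools: {sorted(missing)}")
    used_forbidden = bounds.forbidden_tools & called
    if used_forbidden:
        violations.append(f"called forbidden tools: {sorted(used_forbidden)}")
    if run.final_state not in bounds.valid_end_states:
        violations.append(f"invalid end state: {run.final_state}")
    lowered = run.response.lower()
    for phrase in sorted(bounds.banned_phrases):
        if phrase.lower() in lowered:
            violations.append(f"banned phrase in response: {phrase}")
    return violations


bounds = BehaviouralBounds(
    required_tools={"lookup_order"},
    forbidden_tools={"delete_account"},
    valid_end_states={"refund_issued", "replacement_sent"},
    banned_phrases={"guaranteed"},
)

# Two different paths to two different, equally valid outcomes.
run_a = AgentRun(["lookup_order", "issue_refund"], "refund_issued",
                 "Your refund is on its way.")
run_b = AgentRun(["lookup_order", "check_stock", "ship_replacement"],
                 "replacement_sent", "A replacement has shipped.")
# A run that leaves the bounds in several ways.
run_c = AgentRun(["delete_account"], "escalated", "This is guaranteed to work.")
```

Both `run_a` and `run_b` pass even though their tool sequences and responses differ, which is the point: the check describes acceptable behaviour rather than a single expected string.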

Simulate Users, Not Just Inputs

Structured simulation frameworks help reduce production agent failures. The approach is simple: test agents against diverse user personas, communication styles, and edge cases before deployment.

A customer service agent that handles polite requests perfectly might collapse with ambiguous or frustrated users. A voice assistant tested only with clear enunciation will fail in noisy real-world environments. Testing AI agents means testing the full range of human unpredictability.
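As a rough illustration of the idea, a scenario matrix can be built by crossing personas, channels, and intents, then running the agent against each combination. This is only a sketch: the persona labels, the `stub_agent` function, and the acceptance check are made up for the example, and a real simulation would generate full conversations rather than single labels.

```python
from itertools import product

# Hypothetical dimensions of user behaviour to cover before deployment.
PERSONAS = ["polite", "frustrated", "ambiguous"]
CHANNELS = ["chat", "voice_noisy"]
INTENTS = ["refund_request", "delivery_status"]


def build_scenarios(personas, channels, intents):
    """Cross every persona with every channel and intent."""
    return [
        {"persona": p, "channel": c, "intent": i}
        for p, c, i in product(personas, channels, intents)
    ]


def run_matrix(agent, scenarios, is_acceptable):
    """Run the agent on each scenario and collect the ones it fails."""
    failures = []
    for scenario in scenarios:
        reply = agent(scenario)
        if not is_acceptable(scenario, reply):
            failures.append(scenario)
    return failures


def stub_agent(scenario):
    """Stand-in agent that gives no answer to frustrated users."""
    return "" if scenario["persona"] == "frustrated" else "Here is what I found."


scenarios = build_scenarios(PERSONAS, CHANNELS, INTENTS)
failures = run_matrix(stub_agent, scenarios, lambda s, reply: bool(reply))
```

Here the stub agent looks fine on polite and ambiguous users and fails every frustrated one, a gap that a test set built only from polite requests would never surface.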

This is exactly the problem TestMu AI’s Agent-to-Agent Testing platform was built to solve. It uses specialized AI agents to simulate diverse personas, generate thousands of test scenarios, and validate how your agent handles conversation, reasoning, and context across real-world conditions.

The concept of using agents to test agents sounds recursive, but it is the only approach that scales to match the complexity of these systems.

Quality Is a Continuous Signal

Many teams approaching agent testing are moving beyond the idea of quality as a one-time, pre-release checkpoint. Instead, they treat it as an ongoing signal.

Production logs can inform new test cases. Real user interactions can expand scenario libraries. Evaluation can run continuously as agents evolve, helping teams adapt as behaviour changes over time.
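One way this loop might look in code is sketched below. The log format (one JSON object per line with `user_input`, `flagged`, and `reason` keys) is an assumption made for the example, not a format any particular platform emits.

```python
import json


def logs_to_scenarios(log_lines, existing_inputs):
    """Turn flagged production log entries into new test scenarios.

    Inputs already present in the scenario library are skipped, so the
    library grows only with cases it does not cover yet.
    """
    seen = set(existing_inputs)
    scenarios = []
    for line in log_lines:
        entry = json.loads(line)
        if not entry.get("flagged"):
            continue
        user_input = entry["user_input"]
        if user_input in seen:
            continue
        seen.add(user_input)
        scenarios.append({
            "input": user_input,
            "source": "production",
            "reason": entry.get("reason", "unspecified"),
        })
    return scenarios


# Hypothetical log lines; the last one repeats the first.
logs = [
    '{"user_input": "where is my order??", "flagged": true, "reason": "low_confidence"}',
    '{"user_input": "hi", "flagged": false}',
    '{"user_input": "cancel everything", "flagged": true}',
    '{"user_input": "where is my order??", "flagged": true, "reason": "low_confidence"}',
]
new_scenarios = logs_to_scenarios(logs, existing_inputs=["cancel everything"])
```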

LangChain’s data confirms this shift: 89% of teams have implemented observability for their agents. But observability without structured evaluation is just logging.

The winning practice combines automated monitoring to flag anomalies with human reviewers making judgment calls on ambiguous cases. Platforms like KaneAI support this continuous model. When test authoring, execution, reporting, and test management live in one unified system, the feedback loop from a production anomaly back to the relevant test scenario becomes fast and actionable, tight enough to drive real quality improvements.
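The split between automated verdicts and human judgment can be expressed as a simple triage rule. The sketch below assumes each interaction already has an evaluation score between 0 and 1; the thresholds are hypothetical and would need tuning per product, and nothing here reflects how any specific platform implements it.

```python
def triage(score, pass_at=0.8, fail_below=0.4):
    """Route an evaluation score to an automated verdict or a human reviewer."""
    if score >= pass_at:
        return "pass"
    if score < fail_below:
        return "fail"
    return "human_review"


def split_for_review(scored_interactions):
    """Group interaction ids by triage outcome."""
    buckets = {"pass": [], "fail": [], "human_review": []}
    for interaction_id, score in scored_interactions:
        buckets[triage(score)].append(interaction_id)
    return buckets


buckets = split_for_review([("a1", 0.93), ("a2", 0.55), ("a3", 0.12), ("a4", 0.80)])
```

Only the ambiguous middle band lands with a reviewer, which keeps human attention on the cases where a judgment call is actually needed.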

The Discipline Is Being Rewritten

Quality engineering is expanding. As AI systems introduce probabilistic behavior, tool orchestration, and adaptive workflows, the craft naturally grows more complex. Engineers who understand both testing fundamentals and AI system mechanics are well positioned to navigate that shift.

For teams already practicing strong QE, the shift is less about starting from scratch and more about refining the lens.

Author

Mudit Singh Co-Founder at TestMu AI

With over a decade of experience building and scaling software products, he has helped shape quality engineering and AI-driven testing strategies that empower engineering teams to ship reliable software faster. His work spans product strategy, AI-native quality engineering, and community-led innovation, bridging the gap between human expertise and autonomous systems.

TestMu AI are Gold Sponsors at EuroSTAR 2026. Join us at EuroSTAR Conference in Oslo 15-18 June 2026.

Filed Under: EuroSTAR Conference, EuroSTAR Expo, Gold Tagged With: 2026, EuroSTAR Conference, Expo, software testing tools

How the Testing Discipline Adapts to Conform with Agentic SDLC

April 3, 2026 by Lauren Payne

Disclaimer: this article is 100% human effort, no LLMs were leveraged while writing it.

Since the end of 2022, people have been using LLMs, first for fun, and then for work-related activities. 2023 was the year of doubts, with the majority of people calling this period the “AI-hype Era”. Many were afraid to even try out any of these LLMs. Then in 2024 we started hearing about successful results of AI adoption programs and saw AI-native solution adoption providing valuable assistance, primarily in coding tasks.

In 2025, the scenery changed again. We started hearing more about the pitfalls of GenAI transformation projects, the emerging risks and challenges, and how one could potentially bridge these gaps and avoid hurting their business. Most people were still cautious, but they were also curious.


Moving on from simply using chatbots, the natural next step was leveraging a code assistant inside an IDE. This is a great way to boost your output in test automation, but without the right context, i.e. proper test data and agentic knowledge of your enterprise systems, the code produced could turn out to be generic and not tailored to what you need.

The first answer to that problem was RAG (Retrieval Augmented Generation), and then more recently the MCP (Model Context Protocol). The former enables you to leverage additional data – custom embeddings and datasets – and effectively expand what your LLM can access. The latter provides communication between LLMs and Agentic systems with external systems such as project or test management tools.
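To make the RAG idea concrete, here is a deliberately simplified sketch of the retrieve-then-augment step. Real systems typically use learned embeddings and a vector store; this example substitutes plain word-count cosine similarity so it runs with the standard library only. The documents, the function names, and the prompt layout are all invented for illustration.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved context to the task before it goes to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nTask: {query}"


# Hypothetical snippets of project knowledge.
docs = [
    "Login requires a valid email and password.",
    "Refunds are processed within five days.",
    "Password reset links expire after one hour.",
]
prompt = build_prompt("write tests for password login", docs, k=2)
```

The model never sees the refund rule for this task, only the two snippets that relate to the request, which is how retrieval keeps generated test code tied to your own system rather than generic.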

Although the tooling is important, the human aspect needs to be considered too, and in fact, is the biggest factor in successful AI transformation programs and adoption.

Now in 2026 we have an even clearer picture of implementing AI-native business and operational solutions across almost every industry. The first thing to highlight is that the number one discipline where we see success in AI adoption is:

Testing!

This may come as a surprise to you, my fellow Testers, as Testing often seems to be an afterthought. Dev and DevOps continue to prioritize coding and delivering fast value. So how come Testers are the forerunners now?

The recipe is simple: Testers have a critical, methodical, and investigative mindset. They provide unfiltered feedback and really care about the products they are working on. Testers also have a deep understanding of software and are comfortable with new, possibly unfamiliar technologies, both as users and as technical professionals.

Let’s break down how that aligns with GenAI:

  • Prompt Engineering became a globally recognized role in 2023, and it became a mandatory skill to pick up for teams working in an Agentic SDLC
  • For prompts, you need to be descriptive and have a thorough understanding of what you need the LLM to accomplish or provide
  • Testers already have the analytical mindset due to requirements analysis
  • Testers already understand what the end-users and the business are looking for
  • Testers already work closely together with developers
  • Test automation code needs to be integrated in CI/CD pipelines, and quality gates need to be defined at different stages of the delivery

EPAM realized that Testers are the Swiss Army Knives of the Software world. Testers make the perfect Prompt Engineers, as they possess all the required prerequisites to pick up the necessary new skillsets fast to excel in this AI world. And then, they can be the perfect catalysts and support system for pursuing broader AI adoption across an organization.

Example Agentic SDLC phases, AI assistants and benefits of using AI

Agentic SDLC is all about bringing AI-assistance to every stage of development, QA and operations, be it requirements analysis, user story creation, developer’s review of user stories, test case definition, code change impact analysis, test orchestration, or vibe coding of product code and test automation code.

For each of those tasks, a pipeline of AI agents can provide task level productivity gains. The more use cases you identify to augment with AI, the more overall team productivity gains can be realized. For that, you need to investigate applicable disciplines holistically and ensure that each team member is engaging with the implemented AI solution while developing mastery (Note: A number of AI orchestration and collaboration platforms, like EPAM’s EliteA, have built-in tools to help managers track adoption and skill growth). That’s when adoption can accelerate, and your teams can together ensure an impactful ROI (return on investment) on GenAI adoption programs.

That’s when QA people come into the picture again: We like to set up QA metrics to see trends and be able to course correct when the ship is navigating in the wrong direction. AI solutions are software solutions as well. Usage of these needs to be carefully observed, and course corrected at times. Testers know how to do that and can help teams and organizations avoid waste through proactive, predictive, and preemptive monitoring.

Example agentic eco-system leveraging MCP and ELITEA’s system connectors

To provide better insight, let us give you numbers from one of our clients, an insurance payment platform provider. We measured up to 90% task-level productivity gains on performance test results analysis and on requirements analysis. Test case generation and orchestration provided 75%, while user story and user guide creation provided 67% gains. Agent-assisted vibe coding enabled developers and test automation engineers to spend around 40-45% less time on coding.

These numbers may look high, but don’t forget that these were task-level gains. The team-level gains were between 27.8% and 31.8%, as not all the tasks of business analysts, developers, and testers were AI-assisted. As highlighted above, the more use cases are augmented with AI and the more disciplines adopt those solutions, the higher the overall productivity gains.
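To see why large task-level gains shrink at team level, it helps to treat the team-level figure as a weighted average. The sketch below assumes a "gain" means the fraction of time saved on a task, and the time shares are entirely hypothetical; they are not the client's actual data and are chosen only to show the mechanics.

```python
def team_level_gain(tasks):
    """Weighted average of task-level gains.

    tasks: list of (share_of_team_time, task_level_gain) pairs.
    Work that is not AI-assisted enters with a gain of 0.
    """
    total_share = sum(share for share, _ in tasks)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("time shares must sum to 1")
    return sum(share * gain for share, gain in tasks)


# Hypothetical split of a team's time; only the gains echo the article.
tasks = [
    (0.10, 0.90),  # requirements analysis
    (0.10, 0.75),  # test case generation and orchestration
    (0.25, 0.40),  # coding with agent assistance
    (0.55, 0.00),  # everything that is not AI-assisted
]
overall = team_level_gain(tasks)
```

With more than half of the team's time untouched by AI in this made-up split, the overall figure lands far below the headline task-level numbers, and it rises only as the unassisted share shrinks.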

Overall, there is an incredibly positive light and exciting opportunity for our beloved discipline in this new era. But it’s important that You, as a Tester, start adapting to and working with this new style of delivery, or you risk being left behind. If you are unsure where to start or how, then reach out to us; we are always happy to help.

Visit EPAM at booth 15 at the EuroSTAR conference. Come on over and say hello, and let’s seize these new AI opportunities together!

https://www.epam.com/services/engineering/quality-engineering

Author

Péter Földházi Quality Architect, AI & Game QA Consulting, North America

Péter was first involved with QA as a beta tester of DOTA in 2006. Since joining EPAM in 2012, he moved towards test automation and is currently working in the USA as a Quality Architect.

He is leading Game Testing Consulting and GenAI adoptions in the Americas.

Péter has authored two ISTQB syllabi: Test Automation Engineering & Test Automation Strategy. He also invented two test automation methodologies: the Flow Model Pattern and the Tri-Layer Testing Architecture, the latter published as a white paper by the PNSQC. Péter has been one of the review board members of the HUSTEF since 2015.

Péter is a regular keynote and tutorial speaker at conferences such as STARWEST, STAREAST, and SauceCon. He used to be a guest lecturer at three Budapest-based universities: Óbuda, Pázmány and the ELTE. Brewing beer and planting chilis are some of his hobbies.

Editor: Ted Weil – Marketing Manager, TestIO & EPAM Testing Practice

EPAM are Exhibitors at EuroSTAR 2026. Join us at EuroSTAR Conference in Oslo 15-18 June 2026.

Filed Under: EuroSTAR Conference, EuroSTAR Expo Tagged With: 2026, EuroSTAR Conference, Expo, software testing tools

How AI Is Changing Test Case Creation 

April 1, 2026 by Lauren Payne

Why test case creation is under pressure 

In software development, speed is no longer a competitive advantage — it is an expectation. Teams release continuously, requirements evolve rapidly, and documentation quality varies. Yet one constant remains: quality must be reliable. 

Test case creation sits at the heart of this challenge. It translates requirements into structured validation, turning ideas into verifiable outcomes. But under increasing time pressure, this critical step often becomes a bottleneck. Requirements evolve rapidly, documentation quality varies, and the window for careful analysis keeps shrinking. When test cases are rushed, inconsistent, or incomplete, the consequences surface later — in escaped defects, costly rework, and delayed releases. 

This growing tension between speed and quality is exactly where Artificial Intelligence begins to reshape the discipline — not by replacing testers, but by redefining how test cases are created, reviewed, and refined. 

Most organizations still rely on manual test case derivation from requirement documents, user stories, or specifications. That work is important, but it comes with familiar challenges: 

  • Time-intensive effort: Large requirement sets can take days or weeks to translate into structured test cases. 
  • Human variability: Two testers can interpret the same requirement differently, producing uneven quality. 
  • Coverage gaps: Under deadline pressure, edge cases and negative scenarios are often missed. 
  • Automation friction: Manually written cases are frequently not “automation-ready” and require rework to be useful in pipelines. 

This is where AI has begun to reshape the discipline, not by replacing testers, but by changing how the work is distributed. 

What AI changes in test case creation 

AI introduces a new operating model: machine-generated drafts plus human validation. Instead of starting from a blank page, testers start from a structured baseline created by an AI engine that has processed the underlying requirements. 

In practice, the shift is not just “faster writing.” It impacts four core outcomes:

  1. Speed: AI can generate test case drafts in a fraction of the time needed for manual extraction. That can reduce lead time from requirements to executable testing, which is especially helpful in early phases or short sprint cycles.
  2. Precision: When the AI is trained and designed for requirements understanding, it can standardize structure, language, and formatting across test cases, reducing ambiguity and improving consistency.
  3. Higher coverage: AI can systematically scan the full set of available requirements and create broader scenario sets, including negative paths, boundary conditions, and dependencies that are commonly overlooked when time is tight.
  4. Ready for automation: If test cases are generated in a structured format with clear preconditions, steps, expected results, and stable identifiers, they become significantly easier to map into automation frameworks and CI/CD pipelines.
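The structured format described in the list above could look something like the following. This is an illustrative sketch only: the field names and the identifier scheme are assumptions, and it does not represent the output format of any specific tool.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StructuredCase:
    """A test case with a stable id and a traceable requirement."""
    case_id: str
    requirement_id: str
    title: str
    preconditions: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    expected_results: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Structural issues that would block mapping the case to automation."""
        issues = []
        if not self.steps:
            issues.append("no steps")
        if len(self.steps) != len(self.expected_results):
            issues.append("each step needs an expected result")
        return issues

    def to_json(self) -> str:
        """Serialize for import into a test management tool or pipeline."""
        return json.dumps(asdict(self), indent=2)


case = StructuredCase(
    case_id="TC-LOGIN-001",
    requirement_id="REQ-12",
    title="Reject login with an expired password",
    preconditions=["User account exists", "Password expired yesterday"],
    steps=["Open the login page", "Submit valid email and expired password"],
    expected_results=["Login form is shown", "Password reset prompt is shown"],
)
```

Because every step pairs with an expected result and the case carries a stable identifier, an automation engineer can map it to a script and trace it back to its requirement without reinterpreting free text.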

The key is how this is implemented. AI creates value when it produces output that is immediately usable by testers and automation engineers, not when it generates generic text that still requires heavy rework. 

Introducing msg.TestcaseGen.ai: faster, more complete, automation-ready 

msg.TestcaseGen.ai was built to modernize test case creation with AI, without sacrificing professional QA standards. The tool automatically generates structured test cases from requirement documentation and supports review and refinement by subject matter testers, enabling organizations to combine AI efficiency with human expertise. 

From a test management perspective, the benefits align directly with what many teams need right now: 

  • Faster test case generation: Reduce manual effort and free experts for analysis, risk assessment, and exploratory work. 
  • More precise, consistent structure: Improve readability and reduce interpretation gaps across teams and projects.
  • Higher test case coverage: Systematically derive cases from the full requirements set, supporting more robust functional validation.  
  • Automation readiness: Produce standardized test cases that can be transitioned more efficiently into automated test suites. 

In short, msg.TestcaseGen.ai helps organizations move from “test cases as a documentation burden” to “test cases as an acceleration asset.” 

Where it fits best: functional testing that scales 

AI-based test case generation is particularly effective in functional testing, where traceability to requirements and structured step design matter most. Typical use cases include: 

  • Structured bug testing: Creating reliable, repeatable cases that uncover functional defects. 
  • Regression testing: Ensuring existing features still work after change, supported by consistent, maintainable test sets. 
  • Localization readiness: Supporting coverage across language and region variants by deriving scenarios systematically from specs. 

This matters because functional scope expands quickly, especially in large programs, and manual test case work rarely scales at the same pace. 

Human testers still lead, AI changes what they spend time on 

AI does not remove the need for skilled QA professionals. It changes where expertise delivers the greatest value. 

Instead of spending most of the time on drafting and formatting, testers can focus more on: 

  • validating intent and risk, not just steps 
  • improving test design quality and coverage strategy 
  • identifying missing requirements and inconsistencies 
  • designing automation architecture and stability
  • ensuring test suites remain relevant over time 

AI becomes a productivity layer, while testers remain the quality authority. 

A practical path forward 

If you are evaluating AI for test case creation, the most pragmatic approach is: 

  1. Start with a real requirement set (not a “demo” example). 
  2. Generate a baseline suite using AI. 
  3. Conduct expert review and refinement.
  4. Measure impact on lead times, coverage and automation usability. 
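For the measurement step, requirement coverage is one of the simpler numbers to track. A minimal sketch, assuming each generated test case is linked to exactly one requirement id (the ids below are invented):

```python
def requirement_coverage(requirement_ids, case_to_requirement):
    """Share of requirements that have at least one linked test case.

    Returns the coverage ratio and the sorted list of uncovered requirements.
    """
    required = set(requirement_ids)
    covered = required & set(case_to_requirement.values())
    uncovered = sorted(required - covered)
    ratio = len(covered) / len(required) if required else 0.0
    return ratio, uncovered


requirements = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]
links = {"TC-1": "REQ-1", "TC-2": "REQ-1", "TC-3": "REQ-3"}
ratio, uncovered = requirement_coverage(requirements, links)
```

Comparing this ratio before and after expert review gives a concrete view of what the AI baseline covered and where human refinement filled gaps.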

That is exactly the kind of practical, real-world impact msg.TestcaseGen.ai is designed to deliver, helping teams test faster, more precisely, with higher coverage, and ready for automation. 

This human-plus-AI model reduces lead times, improves consistency, and increases coverage—without compromising professional QA standards. 

msg will be present as an exhibitor at EuroSTAR 2026 in Oslo (June 15–18). If AI-driven test case generation is on your roadmap, msg.TestcaseGen.ai is worth a closer look. https://testcasegen.com/ 

Author

Tuan Truong – Head of Test Architect Product Development

Stephan Ingerberg, Head of Sales, msg Test & Quality Management 

Stephan Ingerberg is a seasoned professional with over a decade of experience in the realm of software quality and digital assurance. He has been a dedicated disciple of quality and testing since 2004.

He currently serves as a pivotal figure in the Test & Quality Management division of msg, where he is responsible for sales, customer relations and commercial aspects within central Europe. His unwavering dedication to excellence and adept navigation of software quality make him indispensable in the pursuit of digital perfection.

https://www.linkedin.com/in/stephan-ingerberg-digital-transformation

msg Test & Quality Management is an Exhibitor at EuroSTAR 2026, join us in Oslo

Filed Under: EuroSTAR Conference, EuroSTAR Expo Tagged With: 2026, EuroSTAR Conference, software testing tools
