Run the same AI agent with the same input ten times. You will get ten different results.
Sometimes subtly different. Sometimes wildly.
That single fact breaks almost everything traditional QA was built on.
LangChain’s 2026 State of Agent Engineering report surveyed 1,300+ professionals. The
findings are stark: 57% of organizations now have AI agents in production. Quality is the number one barrier to deployment, cited by 32% of teams. And only 52% have any evaluation system in place.
AI Agents Are in Production, but Evaluation Is Still Maturing
Do the math. Roughly half the organizations shipping agents to production have no structured way to know if those agents work reliably. For enterprises with 10,000+ employees, the top concern is not cost or speed. It is hallucinations and output consistency.
Gartner’s 2025 Hype Cycle placed AI agents at the Peak of Inflated Expectations, noting that multi-agent workflows and model non-determinism may trigger cascading failures.
That confidence gap is where QE teams should be rushing in.
Why the Input–Output Contract No Longer Holds
Traditional QA lives on a simple promise: given input X, expect output Y. AI agents break that promise by design. A customer service agent might resolve the same complaint through five different valid approaches. A coding agent might fix a bug with three different architectures. The output varies. The path varies. Both can be correct.
You cannot write an assertion that says “the response must equal this exact string.” You cannot build a regression suite expecting identical behavior across runs. And you cannot rely on pass-fail verdicts when the definition of “correct” depends on context, tone, and user intent. This is not a tooling problem. It is a thinking problem. And it demands that QE teams unlearn some deeply held assumptions about what testing looks like.
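What that looks like in test code: below is a minimal sketch, using a hypothetical run_agent() helper that returns the agent's final reply as a string, contrasting a traditional exact-match assertion with bounded property checks.

```python
# A minimal sketch of why exact-output assertions break for agents.
# run_agent() is a hypothetical helper that returns the final reply as a string.

def test_refund_request_exact(run_agent):
    reply = run_agent("I was charged twice for order #1234, please fix it.")
    # Traditional contract: input X must yield exactly output Y.
    # Run this ten times and the wording, structure, and resolution path
    # all shift, so the assertion fails even when the agent did its job.
    assert reply == "We have refunded the duplicate charge on order #1234."

def test_refund_request_bounded(run_agent):
    reply = run_agent("I was charged twice for order #1234, please fix it.")
    # A contract that survives non-determinism: assert properties that every
    # valid answer must satisfy, not one canonical string.
    assert "refund" in reply.lower()   # the complaint actually gets resolved
    assert "#1234" in reply            # the right order is referenced
```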
Define Behavioural Boundaries, Not Exact Outputs
The most effective teams testing AI agents have made a counterintuitive shift: they stopped checking exact outputs and started defining behavioural boundaries.
Anthropic’s engineering team addressed this in their guidance. They recommend evaluating the quality of the final output rather than the exact steps taken to reach it. Agents often arrive at effective solutions through alternative paths. If evaluation frameworks reject those paths, the test suite becomes brittle instead of robust.
Practically, this means asking different questions. Did the agent call the correct tools? Did it stay within policy guardrails? Did it reach a valid end state? Did it handle edge cases without hallucinating?
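Those questions translate into checks on the agent's execution trace rather than its wording. Here is a hedged sketch; the trace format, a list of tool-call records plus a final state, is an illustrative assumption rather than any specific framework's schema.

```python
# Behavioural boundary checks over an agent's execution trace.
# The trace structure here is assumed for illustration only.
ALLOWED_TOOLS = {"lookup_order", "issue_refund", "send_email"}
VALID_END_STATES = {"resolved", "escalated_to_human"}

def check_behavioural_boundaries(trace, final_state):
    errors = []
    called = {step["tool"] for step in trace}

    if not called <= ALLOWED_TOOLS:
        errors.append(f"called unapproved tools: {called - ALLOWED_TOOLS}")

    # Policy guardrail: never refund without verifying the order first.
    if "issue_refund" in called and "lookup_order" not in called:
        errors.append("issued a refund without looking up the order")

    if final_state not in VALID_END_STATES:
        errors.append(f"ended in unexpected state: {final_state}")

    return errors  # an empty list means the run stayed inside the boundaries
```

The point is that any path through the allowed tools, ending in a valid state, passes; only runs that cross a boundary fail.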
Simulate Users, Not Just Inputs
Structured simulation frameworks help reduce production agent failures. The approach is simple: test agents against diverse user personas, communication styles, and edge cases before deployment.
A customer service agent that handles polite requests perfectly might collapse with ambiguous or frustrated users. A voice assistant tested only with clear enunciation will fail in noisy real-world environments. Testing AI agents means testing the full range of human unpredictability.
This is exactly the problem TestMu AI’s Agent-to-Agent Testing platform was built to solve. It uses specialized AI agents to simulate diverse personas, generate thousands of test scenarios, and validate how your agent handles conversation, reasoning, and context across real-world conditions.
The concept of using agents to test agents sounds recursive, but it is the only approach that scales to match the complexity of these systems.
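Stripped to its essentials, the agent-to-agent loop looks roughly like the sketch below. It is generic, not any platform's actual API: simulate_user and run_agent are assumed helpers, with run_agent taking the conversation history as context.

```python
# A generic persona-driven simulation loop. simulate_user() plays the user in
# a given persona; run_agent() is the agent under test. Both are assumed helpers.
PERSONAS = [
    {"name": "polite",     "style": "clear, complete sentences, patient"},
    {"name": "frustrated", "style": "short, angry messages, repeated demands"},
    {"name": "ambiguous",  "style": "vague problem description, missing details"},
]

def simulate_conversation(run_agent, simulate_user, persona, turns=5):
    transcript = []
    message = simulate_user(persona, transcript)           # opening message
    for _ in range(turns):
        reply = run_agent(message, history=transcript)
        transcript.append({"user": message, "agent": reply})
        message = simulate_user(persona, transcript)       # next turn, in persona
    return transcript  # the transcript becomes the artifact you evaluate

# Each persona exercises a different failure mode before real users get the chance.
```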

Quality Is a Continuous Signal
Many teams approaching agent testing are moving beyond the idea of quality as a one-time, pre-release checkpoint. Instead, they treat it as an ongoing signal.
Production logs can inform new test cases. Real user interactions can expand scenario libraries. Evaluation can run continuously as agents evolve, helping teams adapt as behaviour changes over time.
LangChain’s data confirms this shift: 89% of teams have implemented observability for their agents. But observability without structured evaluation is just logging.
The winning practice combines automated monitoring to flag anomalies with human reviewers making judgment calls on ambiguous cases. Platforms like KaneAI support this continuous model. When test authoring, execution, reporting, and test management live in one unified system, the feedback loop from a production anomaly back to the relevant test scenario becomes fast and actionable, tight enough to drive real quality improvements.
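One way to wire that feedback loop, sketched with illustrative anomaly heuristics and scenario formats rather than any particular product's interface:

```python
# Production-to-evaluation feedback loop: flag anomalous traces, queue them for
# human review, and promote them into replayable regression scenarios.
def is_anomalous(trace):
    # Cheap automated signals that a run deserves a human look.
    return (
        trace.get("user_sentiment") == "negative"
        or trace.get("tool_errors", 0) > 0
        or trace.get("handoff_to_human", False)
    )

def promote_to_eval_suite(production_traces, scenario_store, review_queue):
    for trace in production_traces:
        if not is_anomalous(trace):
            continue
        review_queue.append(trace)          # a person makes the judgment call
        scenario_store.append({             # a replayable regression scenario
            "input": trace["first_user_message"],
            "expected_bounds": {"end_state": "resolved", "max_turns": 8},
            "source": "production_anomaly",
        })
```

Automation does the flagging; people decide which anomalies are genuine quality signals worth turning into regression scenarios.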
The Discipline Is Being Rewritten
Quality engineering is expanding. As AI systems introduce probabilistic behavior, tool
orchestration, and adaptive workflows, the craft naturally grows more complex. Engineers who understand both testing fundamentals and AI system mechanics are well positioned to navigate that shift.
For teams already practicing strong QE, the shift is less about starting from scratch and more about refining the lens.
Author

Mudit Singh, Co-Founder at TestMu AI
With over a decade of experience building and scaling
software products, he has helped shape quality engineering and AI-driven testing strategies that empower engineering teams to ship reliable software faster. His work spans product strategy, AI-native quality engineering, and community-led innovation, bridging the gap between human expertise and autonomous systems.
TestMu AI are Gold Sponsors at EuroSTAR 2026. Join us at the EuroSTAR Conference in Oslo, 15-18 June 2026.