For the past few years, AI in testing has been dominated by impressive demos. A model generates a test script from a user story. A chatbot repairs a broken locator. A prompt magically produces automation code.
But when enterprise teams attempt to deploy these capabilities into real CI/CD pipelines, they quickly discover a gap between possibility and practicality.
Large Language Models (LLMs) are powerful, but they introduce serious challenges in production testing environments. They are inherently probabilistic; identical prompts can yield different results, undermining regression reliability. They can hallucinate syntactically correct but logically flawed automation steps. As test flows grow longer and more complex, models may lose context. Inference latency and cost can escalate. Model drift can alter behavior over time, requiring continuous revalidation.
Small Language Models (SLMs), by contrast, are fast, stable, and cost-efficient. They perform exceptionally well in structured, domain-specific tasks. However, they lack the deep reasoning power required for complex, multi-step intent interpretation.
So the real question is not “LLM or SLM?”
It is: How do we combine both intelligently to build reliable AI-powered automation?
The Hybrid Architecture: Precision Meets Reasoning
Production-grade AI testing requires architectural discipline.
The most effective approach is a hybrid model strategy:
- Custom SLMs trained on domain-specific automation assets—keywords, UI definitions, reusable components, and workflow logic.
- LLMs reserved for advanced reasoning tasks, such as decomposing high-level test intent into structured automation steps.
This separation of responsibilities delivers determinism and efficiency for routine automation while preserving flexibility for complex reasoning. The result is faster inference, lower operational cost, improved stability, and reduced hallucination.
Rather than relying on a single general-purpose model, a hybrid architecture assigns the right model to the right task.
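The routing idea can be sketched in a few lines of Python. This is an illustrative sketch, not TestArchitect code; the task kinds and routing table are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str      # e.g. "map_keyword", "heal_locator", "decompose_intent"
    payload: str   # the text the model will operate on

# Hypothetical routing table: routine, structured tasks go to the
# domain-trained SLM; open-ended reasoning falls through to the LLM.
SLM_TASKS = {"map_keyword", "heal_locator", "fill_parameters"}

def route(task: Task) -> str:
    """Return which model tier should handle this task."""
    return "slm" if task.kind in SLM_TASKS else "llm"
```

In practice the routing table would be derived from the automation framework's own task taxonomy, so that every deterministic, schema-bound operation stays on the cheap, stable path by default.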

This architectural pattern can be applied to modern automation frameworks in general. At DWS, we have successfully implemented this hybrid model within TestArchitect (TA), demonstrating how domain-trained SLMs and reasoning-driven LLMs can operate together inside a structured automation ecosystem.
TestArchitect supports a wide spectrum of platforms and technologies, including SAP, Salesforce, mobile native, mobile web, mobile hybrid, browsers, and desktop applications. Teams can create fully no-code automated tests in plain English, allowing business analysts, domain experts, and testers of all skill levels to contribute without coding barriers. This approach accelerates in-sprint automation, helping teams keep pace with rapid release cycles while ensuring high-quality testing across diverse applications.
From Single-Model Scripts to Agentic AI
Hybrid models alone, however, are not enough. True production-grade automation is not generated by a single model guessing a script; it is orchestrated.
Agentic AI introduces a multi-agent system where specialized AI components collaborate across the testing lifecycle:
- Interpreting business-level test intent
- Mapping intent to reusable automation actions
- Generating structured UI interaction steps
- Validating and refining codeless automation
- Executing across platforms
- Performing intelligent self-healing
- Classifying root causes of failures
- Predicting impact and suggesting remediation
Instead of relying on a single probabilistic output, the system validates itself through structured collaboration. Each agent has a defined responsibility, reducing instability and improving reliability.
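One way to picture this orchestration is as a pipeline of single-responsibility stages, each validating its predecessor's output. The sketch below is hypothetical Python; the placeholder functions stand in for the LLM and SLM agents described above:

```python
# Hypothetical agent pipeline: each stage has one defined responsibility.
def interpret_intent(intent: str) -> list[str]:
    # Placeholder: an LLM agent would decompose business intent here.
    return [step.strip() for step in intent.split(",") if step.strip()]

def map_to_actions(steps: list[str]) -> list[dict]:
    # Placeholder: an SLM agent would map each step to a reusable action.
    return [{"action": step, "validated": False} for step in steps]

def validate(actions: list[dict]) -> list[dict]:
    # A checker agent approves (or rejects) each generated action
    # before anything reaches execution.
    return [{**action, "validated": True} for action in actions]

def run_pipeline(intent: str) -> list[dict]:
    return validate(map_to_actions(interpret_intent(intent)))
```

The point of the structure is that no single probabilistic output reaches execution unchecked: every stage's result is a typed artifact the next agent can inspect.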
Within CI/CD pipelines, intelligent test selection based on code impact analysis ensures that only relevant test suites are executed, reducing cycle time while preserving coverage.
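In its simplest form, impact-based selection is a lookup from changed modules to the suites that cover them. A minimal sketch, with a purely hypothetical coverage map:

```python
# Hypothetical impact map: which test suites cover which modules.
# A real pipeline would derive this from coverage data or static analysis.
COVERAGE = {
    "checkout": {"suite_payments", "suite_cart"},
    "login": {"suite_auth"},
    "search": {"suite_catalog"},
}

def select_suites(changed_modules: set[str]) -> set[str]:
    """Pick only the suites whose covered modules changed."""
    selected: set[str] = set()
    for module in changed_modules:
        selected |= COVERAGE.get(module, set())
    return selected
```

A commit touching only the login module would then trigger the authentication suite rather than the full regression run.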
This marks a shift from AI as a script generator to AI as an orchestrated testing system.
Within TestArchitect, this agentic hybrid model powers end-to-end intelligent automation, but the architectural principles are transferable to other enterprise automation environments.

Data Engineering and Model Validation
The real differentiator in AI-powered testing is not model size. It is data quality and governance.
For domain-intensive industries such as finance, healthcare, and energy, AI must align precisely with industrial workflows. This requires:
- Structured automation datasets
- Clearly defined evaluation metrics
- Domain-specific fine-tuning
- Human-in-the-loop validation
Domain experts play a critical role in guiding quality standards and ensuring AI-generated outputs meet enterprise expectations.
To improve precision and consistency, Retrieval-Augmented Generation (RAG) is applied using structured, project-specific automation data. Instead of relying solely on model weights, the system dynamically retrieves relevant context (existing action libraries, interface definitions, and project artifacts) before generating tests.
This approach reduces hallucination, preserves framework alignment, and improves maintainability.
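The retrieval step can be illustrated with a toy example. A production system would use embedding search over the project's real action library; the keyword-overlap ranking and library entries below are purely illustrative:

```python
# Hypothetical action library; a real system would use embeddings,
# but keyword overlap is enough to illustrate the retrieval step.
ACTION_LIBRARY = [
    "click button <name>",
    "enter text <value> into field <name>",
    "verify order total equals <amount>",
]

def retrieve_context(intent: str, k: int = 2) -> list[str]:
    """Rank library actions by word overlap with the test intent."""
    words = set(intent.lower().split())
    scored = sorted(
        ACTION_LIBRARY,
        key=lambda action: len(words & set(action.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(intent: str) -> str:
    # Ground the generator in known, reusable actions before it writes steps.
    context = "\n".join(retrieve_context(intent))
    return f"Known actions:\n{context}\n\nGenerate steps for: {intent}"
```

Because the generator only sees actions that already exist in the framework, its output stays within the project's vocabulary instead of inventing steps from scratch.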
In TestArchitect implementations, this structured RAG strategy has enabled reliable AI-assisted test generation aligned with domain rules and enterprise governance standards.
Deployment, Cost, and Sustainability
Enterprise adoption requires more than innovation; it requires sustainability.
Custom SLMs can be hosted in private cloud or on-premises environments, ensuring that sensitive QA artifacts (requirements, test cases, and defect logs) remain within organizational boundaries. Strategic collaboration with cloud-hosted LLMs minimizes unnecessary inference costs.
Because SLMs are lightweight and CPU-friendly, organizations avoid GPU-heavy infrastructure and excessive cloud spending. Millisecond-level responses enable seamless integration into Agile and DevOps workflows.
The hybrid agentic approach ensures that AI-driven automation remains:
- Secure
- Cost-efficient
- Predictable
- Scalable
Without architectural discipline, AI in testing becomes unstable and expensive. With the right design, it becomes a long-term competitive advantage.
Engineering the Future of AI-Driven Testing
AI will not replace automation frameworks. It will augment them.
The organizations that succeed will not be those experimenting with prompts, but those engineering intelligent systems, combining hybrid models, agent orchestration, structured data pipelines, and rigorous validation.
This hybrid, agentic AI architecture can be applied across automation ecosystems. Its successful implementation within TestArchitect demonstrates that production-grade, domain-specific, cost-controlled AI testing is not theoretical—it is achievable today.
The future of testing belongs to teams that treat AI as an engineering discipline.
Get Started with TestArchitect AI Today
Author

Tuan Truong – Head of Test Architect Product Development
He leads the design and evolution of enterprise-scale test automation solutions, helping global organizations modernize quality engineering practices and deliver large-scale systems with confidence. With over 20 years of experience in software testing, automation architecture, and product engineering, Tuan specializes in integrating AI into practical testing workflows. His current focus is on designing production-grade AI systems that combine Custom SLMs, LLMs, and Agentic AI to create scalable, cost-efficient automation solutions. He works closely with enterprise teams to transform emerging AI capabilities into reliable, real-world testing systems.
DWS is participating in EuroSTAR Conference 2026 as a Gold Sponsor.