Why is testing data difficult?
This will be a series of articles to discuss both the challenges in testing (big) data applications, as well as some approaches, lessons learned and techniques to overcome those difficulties.
We have put together a group of authors who have lots of experience in both developing and testing data-centric applications. We hope to achieve a great discussion between not just the authors but with the EuroSTAR community.
This first installment of our series will introduce the concepts of testing data applications and aims to set the scene for the types of applications we will be discussing in this blog series. Let us know which other applications or topics you are interested in; we appreciate any and all input to this series as it builds up to a roundtable discussion at EuroSTAR. The second blog (to be published next week) will then focus on the challenges we testers face when testing data applications.
Why is testing data difficult?
Most of the action occurs behind the scenes; even if output is displayed in a report or as a message to an interfacing system, that is just the tip of the iceberg. There is always much, much more underlying data that those reports and messages rely on. The combinations can be virtually endless, depending on the type of application and business/functional logic, and there is always a dependency on various business rules as well as on different types of data, such as transactional, master and input data, that need to be considered.
The environments we test are complex and not homogenous. There are often multiple programming languages, databases, data sources, data targets, and reporting environments that are all integral parts of the solution.
Data for our tests can come from functional specifications, requirements, use cases, user stories, legacy applications and test models – but are these complete or accurate? What about conflicts in some of these input specifications?
There are lots of different types of data that need to be managed: files, tables, queues, streams, views, and structured and unstructured datasets (hello, promises of Big Data!), among others.
In many applications, data can be located in multiple places, for example, partitioned data, or data that is mapped to different structures across different servers. And don’t even get us started on all the various regulations, laws and their interpretations regarding test data security, privacy and protection – many organizations have realized that their current approach to test data might not be sufficient anymore.
There is no silver bullet for testing data or for test data.
How do you represent test data? How do the testers prepare that data? How do they place it where it belongs? How much time will this take? And how much time should it take?
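To make these questions concrete, here is a minimal sketch of one way to represent and stage test data, using an in-memory SQLite database. The table, columns and business rule are illustrative assumptions, not taken from any real system described in this series.

```python
# Sketch: represent test data explicitly as rows, stage it, then verify it.
# All names (customers, country codes) are hypothetical examples.
import sqlite3

# Keeping test data in plain, reviewable structures makes it easy to see
# which inputs the expected results derive from.
customers = [
    (1, "Alice", "DE"),
    (2, "Bob", "FR"),
    (3, "Carol", "DE"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)

# Check 1: the staged data matches what we intended to load.
count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
assert count == len(customers)

# Check 2: a business-rule style query on the staged data.
de_count = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE country = 'DE'"
).fetchone()[0]
print(de_count)  # prints 2
```

Even a toy example like this hints at the real cost drivers: someone has to design the rows, place them where the system under test expects them, and keep them in sync as structures change.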
What types of applications are we testing?
First, it’s good to classify exactly what types of applications we will address. While a lot of the principles and ideas that we will share can be applied to most computer programs, we’re going to focus on data integration systems that move, transform, and populate data.
Often these can be classified in the following types of applications:
• Data Warehouses and the ETL/ELT applications that populate them.
• Big Data applications – that analyze large volumes of data, often across distributed systems.
• BI applications that process data and present it.
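For systems like these that move, transform, and populate data, one common testing idea is reconciliation between source and target. The sketch below is a hypothetical, simplified example of that technique; the transformation rule (normalizing country codes) and the field names are our own illustrative assumptions.

```python
# Sketch of an ETL-style reconciliation check: after a transform runs,
# compare completeness, totals, and the transformation rule itself.
source_rows = [
    {"id": 1, "amount": 120.0, "country": "de"},
    {"id": 2, "amount": 75.5, "country": "fr"},
    {"id": 3, "amount": 300.0, "country": "de"},
]

def transform(rows):
    """The 'T' in ETL here: normalize country codes to upper case."""
    return [dict(r, country=r["country"].upper()) for r in rows]

target_rows = transform(source_rows)

# 1. Completeness: no rows lost or duplicated in flight.
assert len(target_rows) == len(source_rows)
# 2. Integrity: numeric totals survive the transform unchanged.
assert sum(r["amount"] for r in target_rows) == sum(r["amount"] for r in source_rows)
# 3. The transformation rule was applied everywhere.
assert all(r["country"].isupper() for r in target_rows)
print("reconciliation passed")
```

Real pipelines would run such checks against millions of rows across databases and file systems, but the pattern of comparing counts, aggregates, and rule conformance between source and target stays the same.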
Before we start describing the challenges of testing these types of systems, let’s first look at some of the key characteristics that define these applications.
• Diverse sources of data that are used to feed the systems
• Complex environments where there are often multiple databases, file systems, application servers, and programming tools/languages that are used to process the data
• Large variety of rules used to process the data
• Dynamic requirements when it comes to presenting or working with the data
• Complex dataset types such as files, tables, queues, streams
• Structured, semi-structured, and unstructured data
• High dependency on interfacing systems and error / exception handling
In future posts, we will discuss some of the techniques and methods that we have found to be successful, and in some cases unsuccessful, in providing QA support for our ETL and Big Data applications. Please feel free to join the conversation and share your stories, or let us know which areas in particular you would like us to focus on. We will actively engage in the discussion threads, and hope to use some of the posts as starting points for future discussions.
Accenture & Matthias Rasking:
Accenture is a global management consulting, technology services and outsourcing company with more than 266,000 people in 54 countries. Accenture Test Services has been providing testing services for more than two decades, both on-site and through our Global Delivery Network with more than 16,000 dedicated testing professionals.
Matthias Rasking leads Accenture Application Testing Services in Europe, Middle East and Africa. With more than 14 years of experience in the Testing and Quality Assurance space, Matthias has supported many multi-national clients in various industries in becoming high-performing businesses by establishing a structured and strategic approach to quality assurance. He supports the ASQF (a German organization dedicated to Software Quality) in their Test Data Management working group and furthermore is the Deputy Chair of the Technical work stream regarding Model Development at the TMMi Foundation.
Compact Solutions & Jeffrey Pascoe
Compact Solutions (www.compactbi.com) was founded on a passion for cutting edge technology and the need for better access to corporate information assets. Formed in May 2002 and headquartered in Chicago, Compact Solutions has offices in four countries: United States, United Kingdom, Poland and India. Compact’s goal is to bring every customer Speed, Power and Profit from their information. We provide both development and testing services for data applications, as well as software products for Metadata Integration and Automated Testing through TestDrive.
Jeffrey Pascoe is the Director of Solutions Delivery Europe for Compact Solutions. With 15 years of experience, he has a proven track record in providing traditional consulting services, training and education. In particular, he has worked on a number of automated testing frameworks and software products. His particular fields of interest and passion are large data integration applications, meta-programming, data governance, and automated testing.