Learning to Love Data Warehouse Testing, by Jackie McDougall
Last year, when I was asked to help a data warehouse testing project in trouble, I had mixed feelings. Whilst I was happy to get a chance to work on something a little bit different and off the beaten track for me, I was also slightly anxious. What did I know about data warehouses? I had a mental image of Mulder and Scully, investigating dark and dingy abandoned buildings, which I knew couldn’t be right, so I turned to Google to set me on the right track. However, there didn’t seem to be a consistent view. One search result claimed that ‘Data Warehouse Testing is Different…’, but two lines down, there was another result stating that ‘Testing a Data Warehouse is not very different…’. The truth was out there, as they say, but probably somewhere between the two. So I investigated some more.
After a bit of research into ‘official’ definitions of a data warehouse, I found Ralph Kimball’s: “A data warehouse is a copy of transaction data specifically structured for query and analysis”. Query and analysis are what make it distinct from an operational database. An operational database gives you a picture of the individual transactions and an operational view of the data at a particular point in time (the Now), whereas a data warehouse tends to provide a historical and summarised view – the management information about the business (the Progress and Trends). So, armed with a slightly better understanding of data warehouses, I got into the project itself.
A UK public sector client was tasked with achieving performance targets set by the Government on a particular aspect of their work. The solution being implemented was an Extract, Transform and Load (ETL) of data from multiple data sources to a single data warehouse. This was being provided by a third party who would also provide a set of agreed reports, showing if and when threshold levels had been breached. When I joined the team as Test Manager in January, a number of problems immediately became evident:
• The client was questioning if the solution was the right one (which caused some friction between the client and the supplier)
• The solution delivered was incomplete (but this had not been communicated to the client)
• Data quality issues had not been considered (but were a major concern to the PM)
• There was a lack of buy-in from the business and IT teams (and therefore no sense of urgency to get involved)
• Testing was floundering (there was no clear test strategy, a very laborious test script and defect management method, and the test team had got bogged down in very low level and constant regression testing).
According to the client’s test schedule, the test team should have been almost finished at the end of January – but the reality was very different, and it seemed to me the project was in danger of spiralling out of control.
In an ideal world, the supplier would carry out system testing on the ETL (including counting records in the source files, through each part of staging, and in the target tables, and checking all fields, whether directly extracted, derived or calculated). They would also test the reports, which after all were the ‘business end’ of the solution, and the part that would actually enable someone to query and analyse the data in order to make a decision. That would then allow the team I was working with to carry out ‘pre-UAT’: sampling the data load, data validation and error processing, checking some of the key fields for calculated and derived data, and sanity checking the reports. But as we all know, we don’t live in an ideal world.
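The count-and-check reconciliation described above can be sketched in a few lines. This is a minimal illustration only – the file contents, table name and fields are all invented, and SQLite stands in for whatever the real target database would be. The idea is simply that record counts and a control total on a key numeric field should agree end to end:

```python
import csv
import io
import sqlite3

# Hypothetical source extract: on a real project this would be a file
# from one of the operational systems feeding the warehouse.
source_csv = """id,amount
1,10.00
2,25.50
3,-5.00
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL)")

source_rows = list(csv.DictReader(io.StringIO(source_csv)))

# "Load" step: a trivial stand-in for the real ETL.
for row in source_rows:
    conn.execute("INSERT INTO target VALUES (?, ?)",
                 (int(row["id"]), float(row["amount"])))
conn.commit()

# Reconciliation: record counts must match end to end, and a control
# total on a key numeric field guards against a silently truncated or
# corrupted load.
source_count = len(source_rows)
target_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
source_total = sum(float(r["amount"]) for r in source_rows)
target_total = conn.execute("SELECT SUM(amount) FROM target").fetchone()[0]

assert source_count == target_count, "record count mismatch"
assert abs(source_total - target_total) < 0.01, "control total mismatch"
print(f"reconciled {target_count} records, control total {target_total:.2f}")
```

In practice the same counts would be taken at every staging step, not just source and target, so that any loss of records can be pinned to a single stage.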
I decided the best way to move forward was to provide a look at the bigger picture. To do this, I first needed to tackle the questions of test resources and the test strategy. The extended test team needed to be engaged – from data quality to the IT team and real end users (who would ultimately be responsible for acceptance and sign off). There also needed to be a more formal test strategy, with defined acceptance criteria, test script management, release process, and defect management process.
With these two key foundations in place, I then set about establishing the way forward with the test approach. The project team were going to have to carry out more detailed testing than the pre-UAT sanity check, but I still wanted to avoid the low level testing they had drifted into. I recommended a risk based approach, where they defined the top ten common pathways that the source data would follow. What I was trying to find out was, broadly speaking, did the solution produce the required reports with most of the information required in the majority of cases? The top ten concept caught on, and we investigated:
• Where the highest volumes would be found
• What the data would look like
• What the implications of data being missed in certain areas would be.
From this we got a very clear map of the data, the risk, and therefore the tests that we needed to focus on. We identified the areas of the map that had already been covered by earlier tests, but we also identified the gaps – and now, armed with priority and volume information, asked ‘Do we need to test this?’ ‘Do we have the time and resources?’ ‘Can we afford not to?’
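The ranking behind that map can be sketched as a simple volume-times-impact score. The pathway names and figures below are entirely invented for illustration; on the project, the real numbers came from profiling the source systems and talking to the business:

```python
# Hypothetical pathways through the source data, with invented figures.
# "volume" is how many records follow the path; "impact" is a 1-5
# judgement of how serious a missed or miscounted record would be.
pathways = [
    {"name": "referral -> assessment -> closure", "volume": 12000, "impact": 3},
    {"name": "referral -> assessment -> transfer", "volume": 800, "impact": 5},
    {"name": "direct admission", "volume": 4500, "impact": 4},
]

# A simple risk score: how much data flows down the path, times how
# bad it would be to get that path wrong.
for p in pathways:
    p["risk"] = p["volume"] * p["impact"]

# Test the riskiest pathways first.
for p in sorted(pathways, key=lambda p: p["risk"], reverse=True):
    print(f'{p["name"]}: risk {p["risk"]}')
```

The point is not the arithmetic but the conversation it forces: once the pathways are scored, ‘Do we need to test this?’ becomes a question you can answer with numbers rather than instinct.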
With the earlier improvements in resource engagement and test strategy, and this Common Pathways map, the project team had much better visibility of where it was and what had been achieved. It wasn’t necessarily a pretty picture, but at least it was an informed view. With the data quality and IT teams becoming more involved, the acceptance testing could really begin, and we also decided to get business users involved as soon as the ETL and reports were more stable, to ease the process of acceptance and final sign off.
I left the project shortly after this, successfully handing back the day to day running of the testing to the project team. But what did I learn from my first experience of managing a data warehouse testing project? Well, the main lessons were:
• Define quality entry criteria expected from your supplier
• Insist on design/development documentation
• Get the requirements right and test them
• Consider data quality and when it gets addressed
• Get a really good test strategy
• Get the right (skilled) resources on board (early)
• Think carefully about the reconciliation exercise as part of acceptance.
This was a more detailed list at first, with all the evidence specific to my project, but then I realised that when you take all that away, it was a list that applied to just about every project I’d worked on. Which brings me back to my initial Google search on data warehouse testing, and the result that said ‘it’s not very different from testing on any other project…’. So what I really learned was: choose appropriate tools for your context, and make the best of what you’ve got!