G(r)ood testing 11: Explosive Software – When Risks Do Count
Testers are very aware of risks. Or at least, we think we are. Most testers have applied the principle “No risk, no test”, and many test managers schedule a risk analysis session with their stakeholders before finalizing the test plan. The generally accepted methodologies require our test strategy to be risk based. So we prioritize the functions or business scenarios, and we estimate the probability of an error and the impact should it occur. I regularly see test plans with pretty tables filled with pluses and minuses. These tables aim to guide our attention as testers to the important areas, so that we spend our time well. But how risk-driven are we really? And do we actually talk about risks?
Last week I had a cup of coffee with my neighbor. He is an expert in project management, specializing in project risks. We discussed the relationships between risks and various planning techniques. Our discussion led us to FMEA (Failure Mode and Effects Analysis), which you may well have heard of.
I was reminded of a visit I once paid to Shell. Back then I was still a student, and large companies regularly invited our student organization. During that visit we were given a case assignment, which suddenly came back to mind vividly.
We were asked to do a risk assessment for a chemical plant. The petrochemical industry takes risk very seriously, since the impact of failures is very concrete: products fail, reaction silos explode, and when that happens it can lead to serious health hazards. Compared to processes in IT, the processes within the petrochemical industry are much more tangible. I therefore like to explain FMEA risk analysis using this context.
Imagine a set-up with a large metal silo to which pipes are connected. These pipes inject liquids and gases into the silo or extract them from it. A fire is lit underneath the silo, and on top of it a pressure valve is installed that prevents the internal pressure from rising too high.
In our analysis we went through each of these components and thought about what could happen to them. For example, suppose more hydrogen is added to the mix and the composition in the reaction silo changes. What would happen then? What happens when the supply of hydrogen is reduced? Or when gases are not removed, or too much is removed? Imagine the fire goes out, or burns harder. What would happen when the temperature in the silo drops or rises? Soon we were arguing about causes and consequences. For instance, how can the fire be extinguished? Does someone need to turn a knob? Can it happen when a valve is disrupted? Does it require an interruption in the natural gas supply, or could a broken thermal sensor cause it? To analyze this, you can use an Ishikawa (fishbone) diagram.
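The walk-through above is essentially an FMEA worksheet: per component, a failure mode, its possible causes, and its effect. The sketch below captures that in code and ranks the rows with the Risk Priority Number (severity x occurrence x detection), a common FMEA convention that the case assignment did not necessarily use. All scores are made up for illustration and are not real plant data; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    causes: list[str]
    effect: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (almost certainly detected) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical worksheet for the silo; every score is illustrative only.
WORKSHEET = [
    FailureMode("burner", "flame goes out",
                ["knob turned", "valve disrupted", "gas supply interrupted",
                 "thermal sensor broken"],
                "temperature in silo drops", 5, 4, 3),
    FailureMode("burner", "burns harder", ["valve stuck open"],
                "temperature and pressure rise", 8, 3, 4),
    FailureMode("hydrogen inlet", "supply increases", ["flow controller fault"],
                "composition changes", 6, 3, 5),
    FailureMode("gas outlet", "gases not removed", ["blocked pipe"],
                "pressure rises", 7, 2, 4),
    FailureMode("pressure valve", "stuck closed", ["corrosion"],
                "no pressure relief", 10, 2, 8),
]


def ranked(worksheet: list[FailureMode]) -> list[FailureMode]:
    """Order failure modes from highest to lowest RPN."""
    return sorted(worksheet, key=lambda fm: fm.rpn, reverse=True)


for fm in ranked(WORKSHEET):
    print(f"{fm.rpn:4d}  {fm.component}: {fm.mode} -> {fm.effect}")
```

Unlike a table of pluses and minuses, each row names a concrete cause and a concrete effect, which is what makes it possible to discuss the risk with people outside the test team.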
Then we extended the analysis with second- and third-order failure modes: we investigated the impact of several components failing at the same time. Take, for example, the situation where the pressure in the reaction vessel rises while the pressure relief valve is stuck. As I said, in chemistry the impacts are more tangible, and explosions are a clear risk.
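Higher-order failure modes can be explored systematically by combining single failures and checking the combined effect. This is a minimal sketch under simplifying assumptions: the failure modes, their effects, and the hazard rule ("pressure rises while there is no relief") are hypothetical and only mirror the example in the text.

```python
from itertools import combinations

# Hypothetical single failure modes and their simplified effects on the silo.
EFFECTS = {
    "burner burns harder": {"pressure_up"},
    "gas outlet blocked": {"pressure_up"},
    "hydrogen supply increases": {"composition_change"},
    "burner goes out": {"temperature_down"},
    "pressure valve stuck": {"no_relief"},
}


def is_hazardous(effects: set[str]) -> bool:
    """Assumed rule: rising pressure without relief can end in an explosion."""
    return "pressure_up" in effects and "no_relief" in effects


def hazardous_combinations(effects_by_mode: dict[str, set[str]],
                           order: int) -> list[tuple[str, ...]]:
    """Return every combination of `order` simultaneous failures that is hazardous."""
    hits = []
    for combo in combinations(sorted(effects_by_mode), order):
        combined = set().union(*(effects_by_mode[m] for m in combo))
        if is_hazardous(combined):
            hits.append(combo)
    return hits


for order in (1, 2, 3):
    print(order, hazardous_combinations(EFFECTS, order))
```

With these assumptions no single failure is hazardous on its own; the danger only shows up from the second order onwards, which is exactly why the extended analysis is worth the effort.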
Within the test discipline we rarely use FMEA, and that actually surprises me. Testing is all about risk mitigation, so why settle for a table with pluses and minuses? Such a table defines priorities, but it does not address real risks. As a result, we repeatedly fail to translate our risks into understandable business impact, and we fail to involve our business stakeholders. Why don't we do a real risk analysis? Why do we settle for something less, which undermines the effectiveness and value of our work? I think many testers are too focused on checking the specifications. Or maybe it is because the systems we test have grown too complex, which could explain why technical risks are so far detached from the customer experience. The main reason, however, is that we are afraid that when we do a serious risk assessment, we will identify a lot of risks.
On the one hand, that would demonstrate how difficult it is to build a good system. On the other hand, it would force us to convince the business that we have sufficiently mitigated every risk we identified. And that, I am afraid, is something we are incapable of.