Metrics – Decision Maker or Pointer?
Metric – A standard of measurement.
The New England Patriots are the 3rd most successful American football team in history. They have appeared in 7 Super Bowl matches. Two other teams have appeared in 8 Super Bowl matches.
Seems like a factual statement, right? Not too inflammatory or hard to disagree with. But what if we were to dig a little deeper:-
Q: ‘By successful, what do you mean?’
A: ‘They have made the 3rd most Super Bowl appearances’
Q: ‘How many did they win?’
A: ‘They won 3’
Q: ‘Did any team with fewer appearances, win more than 3?’
A: ‘Yes, three other teams with fewer appearances won more, and two other teams with fewer appearances won the same number as New England’.
Okay, here’s another. There was a famous advert in the UK for cat food. The theme went ‘8 out of 10 cats prefer Whiskas’ (a brand of cat food). Now that is cool; cats that can talk and explain they prefer one food over another. I really want to hear from the other 2 cats why they didn’t prefer Whiskas. Shouldn’t be that hard right, let’s review the evidence as there are only 10 cats who were asked….in total……ever…..right?
I am being deliberately provocative, of course. The above two examples merely show why things should not be taken out of context. When it comes to metrics, context is the single most important aspect, and it is often overlooked or ignored to satisfy the needs of a specific message. Metrics without context are raw figures or statements that run the risk of being misinterpreted.
Metrics – The full picture?
All companies need data, and they make critical decisions based on it. But how many companies add context to their data to give them information and evidence that better inform their decisions? How many decision makers accept that, even with context applied, the evidence is historical and, no matter how accurate, cannot predict the future?
If we take the ‘8 out of 10’ example from above:-
What was the context?
What was the motivation for the survey?
What was meant by ‘prefer’ (i.e. ate all the ‘Whiskas’, ate a percentage more ‘Whiskas’ than other food)?
Is ‘prefer’ not subjective? How can we quantify personal opinion in order to make informed decisions?
Why only 10 cats?
Was there a choice of 2 or 10 cat foods?
In other words, accepting ‘8 out of 10’ at face value leaves all these questions unanswered and doesn’t give us the full picture of the situation.
If we take the Super Bowl example from above, surely the question is: how do you define successful? By appearances or by wins? Is my concept of successful the same as yours or your colleague’s? Essentially, on the face of it the metric in the first line looks fine, but without context and reasoning, it could be misleading.
Metrics – The key?
We need metrics of course. We need to be able to report on status, progress, risk, health, etc., and metrics play a big part in reporting all of that. However, they should not be the KEY fulcrum or the decision maker. They should be the ‘lead-off point’: the basis for digging further, additional analysis, asking questions, the deeper-dive investigation, etc.
As Gil Grissom would say – “The only thing that we can count on is the evidence.”
Once we have metrics and we follow the evidence, we become more informed. For Testers, being informed is the single most critical advantage you can have. Being more informed means you have more information. Having more information allows you to make better decisions and to impart that information to stakeholders so they can make informed decisions of their own.
Can we, or the stakeholders we disseminate information to, make informed decisions based on raw numbers, without the context and the information from the follow-ups above?
Of course, the obvious pothole with metrics is how they can be ‘gamed’ or dressed up. During the 2012 US Presidential Election, numerous sound bites regarding metrics were thrown at voters in an attempt to make the numbers look attractive, in turn make the candidates look better and, consequently, win votes. An example:-
Mitt Romney: “I am not going to raise taxes on middle-income people”
Interviewer: “Is $100,000 middle income?”
Mitt Romney: “middle income is $200,000 to $250,000 and less”
Without the interviewer asking what counts as middle income, the initial statement could sound really attractive and powerful to ‘some’ of those who thought they came under the banner of middle class. The reason I say ‘some’ is because ‘middle class’ in the US is ambiguous, with social scientists suggesting it may constitute anything from 25%-66% of households. The interviewer’s follow-up question provides context and information. The subsequent answer shows that, at face value, the statement is quite vague and has a different outcome depending on whether you are a voter earning $190k or $260k. Imagine the interviewer hadn’t asked that additional question to provide the extra context.
VP Joe Biden: “we’ve created 4.5 million private-sector jobs in the last 29 months”.
Why 29 months? What happened at month 30, or 31, or months 30 through 36, for example? Again, an attractive statement, but are these figures chosen to suit an argument? Is this cherry-picking of data?
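The window-picking trick above is easy to demonstrate. Here is a small sketch with entirely invented monthly figures, showing how the same series produces very different totals depending on where you start counting:

```python
# Toy illustration of window cherry-picking: the same monthly job-change
# series gives very different totals depending on where the window starts.
# All numbers below are invented for illustration.
monthly_change = [-300, -250, -100, 50, 120, 150, 180, 200, 210, 190, 170, 160]

full_year = sum(monthly_change)      # includes the early losses
last_nine = sum(monthly_change[3:])  # starts counting after the losses stop

print(f"Net jobs, all 12 months: {full_year}")   # 780
print(f"Net jobs, last 9 months: {last_nine}")   # 1430
```

Both totals are ‘true’, but only one of them would make the press release; without being told the window, the audience cannot know what was left out.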
Essentially, metrics are important; they are a vital piece of the armory for any Tester or Manager explaining an issue, situation, problem or progress. But there are hidden dangers in raw metrics. Metrics without context are simply figures or statements on a page or dashboard. Metrics, on their own, aren’t that clear cut and can never give us the full story. The ‘middle class’ and ‘job creation’ examples above show how metrics without context or story can be misleading. Let’s try an example closer to our hearts, the testing field:-
A misguided goal amongst the Testing fraternity is ‘we need a 100% pass rate’. The wrongs of that are worthy of an article all of their own, but let’s stick to the example at hand.
Tester: ‘Hey boss, we have a 100% pass rate in our automated nightly run. All our tests passed’ (woohoo!)
The Tester is right and the information radiator ‘proudly’ displays that all tests have passed (100% green) in the regression run of feature ‘x’ for application ‘y’.
The search for the monotonous green bar has been successful. Hurrah, bug-free software, right?
Well, before we all celebrate, let’s ask a few questions to help us analyze the data:-
100% of what? All tests available or all tests executed?
What was the purpose of the execution?
Was it a regression run or merely testing new features (i.e. feature ‘x’)?
Was it 1 test, 100 tests, 1000 tests?
Were the tests a selection of the tests available, were they cherry picked?
Were the tests run on an isolated piece of code (related only to the feature) or in an integrated environment?
Are the tests all relevant and up to date?
Does ‘passed’ mean taking into account ‘known fails’ or ‘quarantined tests’?
Let’s just say it was 100 tests out of a suite of 500 tests on an isolated branch of code. Sure, they passed, but is that the full story? Are we really in a position to say we have tested that feature or application fully? Are we ever?
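The arithmetic behind that question is worth making explicit. A minimal sketch, using the hypothetical 100-of-500 numbers from the example:

```python
# A minimal sketch of why "100% pass" needs context.
# The numbers are the hypothetical ones from the example above.
executed = 100      # tests actually run in the nightly job
passed = 100        # every executed test passed
suite_total = 500   # tests available in the full suite

pass_rate_executed = passed / executed     # 1.0 -> the "100% green" headline
pass_rate_of_suite = passed / suite_total  # 0.2 -> only 20% of the suite even ran

print(f"Pass rate (executed tests): {pass_rate_executed:.0%}")  # 100%
print(f"Passes as share of suite:   {pass_rate_of_suite:.0%}")  # 20%
```

Both figures are computed from the same run; which one the dashboard shows is exactly the context question the list above is probing.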
Suddenly our dashboard of 100% green doesn’t make me feel so warm and fuzzy anymore. But without the context behind the data, would we know? Metrics are as much a danger as they are an aid.
‘Are your testing metrics offering value?’
A quote often attributed to Einstein runs: ‘Not everything that counts can be counted, and not everything that can be counted counts’.
We can pretty much measure anything but that doesn’t mean we should. There is a tendency to value what we measure, not measure what we value. An example of that is counting Test Cases, lines of code, bugs, automated tests, etc.
Those metrics may have value to some people but in essence, do they tell you anything about your product or more importantly the health of your product?
We become so obsessed with metrics whilst ignoring or paying little attention to their validity. Test Cases, for example, are people’s initiatives, their thoughts. Yet there is an obsession with counting Test Cases. As Michael Bolton has often asked, ‘how do you count ideas?’
RANDOM MOMENT :)
In the same respect, People are not Resources. Resources are finite; People are not, and they can generate many ideas, ideas we need to value and utilize. :)
Metrics are numbers and sometimes statements. Metrics without context or story are meaningless. Numbers simply aren’t as descriptive as words or stories. Numbers don’t explain reasoning and analysis.
Numbers can also mean different things to different people and different groups. A project showing green on a dashboard can be music to a Manager’s ears, but within the bowels of that project there may be issues with code coverage, basic user scenarios, medium defects that sap customer experience, tests not run, tests quarantined, tests not written, etc. Those issues might not yet present an overall project problem, but they are problems that may not be visible to a Manager who thinks everything in the garden is rosy.
Some basic rules for metrics. They should:
Lead to an action (questions, stories, improvements)
Not be tied to performance evaluation (bugs per tester, code per developer, etc.)
Quite simply, we cannot and should not aim to measure everything. When you are reporting on a project from a testing perspective, the goal shouldn’t be to bombard the stakeholder with masses of figures and numbers. The goal should be to tell a story as to the status of the project. By all means use numbers where it helps to make the point, but don’t make it the point.
Ensure your metrics have a focus, thereby ensuring your report is relevant. A straightforward example would be to report not how many test cases you have, but how many tests you have run and what they have found to date. As a rule of thumb, when I am looking at numbers, I am usually most interested in the last result of a test and how often that test has been run. This tells me the latest status of the software and the importance of the test (i.e. if a test is run only once in every 4 runs, as opposed to every run, is it a vital test?).
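That rule of thumb is simple to compute. A rough sketch, assuming a hypothetical flat list of (test name, run number, result) records; the test names and data format are invented for illustration:

```python
# Rough sketch of the two figures discussed above: the latest result of
# each test, and how often each test is actually included in a run.
# The record format and sample data are hypothetical.
from collections import defaultdict

results = [
    ("login_smoke", 1, "pass"), ("login_smoke", 2, "pass"),
    ("login_smoke", 3, "fail"), ("login_smoke", 4, "pass"),
    ("report_export", 2, "pass"), ("report_export", 4, "pass"),
]
total_runs = 4  # nightly runs that have happened so far

history = defaultdict(list)
for name, run, outcome in results:
    history[name].append((run, outcome))

for name, runs in sorted(history.items()):
    last_result = max(runs)[1]          # result of the highest run number
    frequency = len(runs) / total_runs  # share of runs that included this test
    print(f"{name}: last={last_result}, included in {frequency:.0%} of runs")
```

Here `report_export` only appears in half the runs, which is exactly the ‘is it a vital test?’ prompt: a follow-up question, not a verdict.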
Ensure your metrics are simple, in turn making your report easy to digest and remember for the stakeholder. Reports that have metrics weighed down by large and ambiguous graphs and bar charts will pretty much be forgotten the minute the stakeholder walks out the door. A simple report based on something such as the below makes it easy to remember.
[Image: sample status report (Chris Clements) showing a high-level status with an action column]
The above is quite simple but easy to digest and remember. Notice the lack of numbers or percentages: I would expect a report like this to lead into discussion and further questions, where numbers can be provided, but provided on the back of the high-level story. If we had led the report with numbers, there is a risk the story never comes after. Note the action column: this should be a key part of any metric, explaining what’s next and what is being done as a result of this status.
I don’t think anyone questions the need for metrics. I do think everyone should be wary of them, though. Metrics, used properly and with adequate context, can be an incredibly powerful source of information. They can help shape discussions and questions and allow for planning and expectation-setting. Metrics used incorrectly, though, merely as a source of random statements or numbers on a sheet, can be very dangerous and destructive to projects, teams and expectations.
It is up to the consumer of the metrics to decide how relevant they are and how best to utilize them. If the user accepts them at face value, then potentially they are asking to be deceived or disappointed at some point in the future. If the user treats them as a ‘kick-off’ to find out more and ask open questions, then the information they receive will allow them to be more informed and therefore make more informed decisions.
The beauty of metrics is in the questioning that leads to information and as they say, beauty is in the eye of the beholder….Right?
Chris works as a Senior Test Manager for a Payments company. He has over 15 years’ experience in the Software Testing field, working through various testing roles at various companies. Chris is also a member of the Irish Software Test Board.
Chris is a passionate advocate of ‘smart testing, not exhaustive testing’ and has a proven track record in test improvement and quality initiatives. He is also a firm supporter and practitioner of Session Based Testing and promotes the concept. Working within an agile environment, he has embraced the concepts of Agile for the Testing discipline and has been integral in redefining traditional testing practices to accelerate Quality within the companies he has worked for.
Previous experience at companies such as Liberty Information Technology, 3Com and Nortel Networks all within the testing sector have helped him establish a sound testing knowledge.
Chris also maintains a blog where he shares his thoughts on various testing topics.