A view from the Chair with Michael Bolton (Volume 7)
I’ve been travelling even more than usual this year. I’ve been to Winnipeg here in Canada; Milwaukee, Orlando, and New York in the United States; London and Brighton in England; Copenhagen in Denmark; Wellington in New Zealand; Gothenburg and Stockholm in Sweden; Oslo and Stavanger in Norway; and Xi’an, Beijing, and Shenzhen in China. One of the rewarding things about visiting so many places and meeting so many people is the fact that I get exposed to so many different languages, cultures, and sub-cultures.
But what is a culture? I once heard culture described as “the way we do things around here”. On another occasion, someone described culture as “all the stuff we do that we don’t realize that we’re doing”. More formally, anthropologists would describe cultures in terms of knowledge, languages, beliefs, arts, and behaviours that a group adopts in order to adapt to its surroundings. Wade Davis says of cultural diversity, “When asked the meaning of being human, all the diverse cultures of the world respond with 10,000 different voices. Distinct cultures represent unique visions of life itself, morally inspired and inherently right. And those different voices become part of the overall repertoire of humanity for coping with challenges confronting us in the future.” (http://news.nationalgeographic.com/news/2002/06/0627_020628_wadedavis.html)
For years, many people in the testing community have advocated international standards and a worldwide common language for testing. I made fun of that idea in this blog post, but my objection to it is serious. If our goal is to make widgets fit together and work together, then standardizing and reducing diversity might be a pretty good idea. If our goal is to find unexpected problems in the design or implementation of a product, we need diversity and adaptability – precisely because bugs are unexpected and they don’t follow standards. Meanwhile, to all of those who would like to see international standards for testing: we’re pretty close to having some – and the results aren’t pretty.
Every year, I speak to hundreds of testers in dozens of countries. Almost everywhere I go, I hear about standard units of measurement called the Test Case, the Requirement, and the Defect. The quality of testing work, the performance of testers, and quality itself are evaluated in terms of ratios of these three units of measurement. To those who tout these forms of measurement, it doesn’t seem to matter that one requirement might be “express the outcome of this calculation to the nearest cent, rounded up from zero” and another might be “ensure that the system is consistent with the Apple Human Interface Guidelines”. This thing called a test case might consist of dozens of steps, lots of variation, and hundreds of implicit observations performed by a human, or it might be a check of a single outcome of a single low-level calculation. This thing called a defect might be a CSS problem that causes text to spill outside the boundaries of a text box on one kind of Android phone, or it might be a design error that destroys a customer’s work and torpedoes the value of the product. In each case, the variability that’s built into each unit can render the measurements meaningless, but that seems not to bother the managers who claim to depend on them. These unsupportable forms of measurement have been used in testing worldwide for a long time. Why?
One answer might be that, as long as you don’t look too carefully, measurements like “Defect Detection Percentage” or “Bugs Fixed vs. Bugs Found” seem plausible. Numbers and formulas have a patina of credibility that can dazzle people, especially the mathematically unwary. It takes some effort to develop a systematic and objective way to evaluate complex cognitive activities. Decisions about the skill of a tester and the worthiness of testing are multi-dimensional, qualitative, and highly subject to context. It’s much easier to count lines in a document, rows in a spreadsheet, or items on a list and turn them into ratios. If a measurement you obtain is consistent with your theories, it must be correct; and if the measurement doesn’t support your theory, it’s easy to dismiss validity problems by shrugging and saying that “no measurement is perfect”. Social-science approaches to testing might yield more nuanced, qualitative, and valid descriptions of testers and their work, but many software development cultures have long been dominated by people who are fascinated by numbers, from programmers on the one hand to accountants on the other.
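The problem with a ratio like Defect Detection Percentage can be sketched in a few lines of code. (The numbers and scenarios below are invented for illustration; DDP is commonly computed as bugs found before release divided by all bugs found before and after release.)

```python
# Toy illustration: why a ratio over non-uniform units can mislead.
# "Defect Detection Percentage" treats every defect as one identical unit.

def ddp(found_in_test: int, found_after_release: int) -> float:
    """Bugs found before release, as a fraction of all bugs found."""
    return found_in_test / (found_in_test + found_after_release)

# Project A: 90 cosmetic CSS glitches caught; 10 more cosmetic ones escaped.
project_a = ddp(90, 10)

# Project B: 90 trivial bugs caught; 10 data-destroying design errors escaped.
project_b = ddp(90, 10)

print(project_a, project_b)  # both 0.9 - the metric cannot tell them apart
```

Both projects earn the same 90% “score”, even though one shipped harmless escapes and the other shipped catastrophes. The ratio is only as meaningful as the uniformity of the units being counted.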
To some, test cases might seem like a reasonable way to organize testing. A test case – expressed in a document as a set of specific steps to perform, with specific data to enter and specific output to observe – is like a little program or a subroutine in a larger program of testing. When all of the subroutines have been run and have returned the expected values, one could declare the testing program to have finished. In my observation, a tester who follows a test case strictly does not find many bugs, nor does the test case help with the really serious work of investigating, exploring, reproducing, and reporting a problem. Nonetheless, many testers find themselves being evaluated by the number of test cases they have performed in a day, rather than by the value of the information that they’ve provided to the development team. Thinking of testing as a program to run, and evaluating testing by counting the number of transactions per unit time, has an aura of plausibility, especially in cultures that revolve around programming.
Another answer for the longevity of outmoded measurements might be “inertia”. Things don’t shift from their path until they’re pushed. Cultures don’t change unless something happens that prompts them to change. Bad measurement has become embedded, stuck, in testing culture, and in my observation much of testing has become stagnant. Cultures stagnate when they don’t get new input from the world around them, or when they choose to ignore it until a crisis punctures the balloon.
Testing has to avoid becoming a stagnant monoculture, and that’s why I chose the theme of this year’s EuroSTAR to be “questioning testing”, which includes questioning where we’ve come from, where we are, and where we might go. Laurent Bossavit will examine how some of the myths about testing got started, and how they’ve survived for so long. Martin Pol will talk about the evolution of testing over his career, describing why testing culture had to change in the early days and why it has to continue changing. James Bach will deliver a tutorial on rapid test management – including how to evaluate the quality of testing effort without resorting to bogus numbers. Harry Collins, coming from a social science background, will give a keynote on the social and cultural issues that testers must consider. Ian Rowland will show us how easy it is to be fooled, and Fiona Charles will show how important it is to question everything around us – that is, our culture. And finally, Keith Klain will tell a story of how he and his team transformed his organization and its approach to testing, and continue to do so. Because it so perfectly fits our goals for this year’s EuroSTAR, I’d like to close by borrowing a refrain from Keith’s talk: Changing the culture is hard… but we’re going to do it anyway.