EuroSTAR Conference

Metrics In Quality Assurance: A Practical Starting Point

May 6, 2024 by Lauren Payne

Have you heard any of the following statements from within your team or anywhere else in your organization?

“The feedback loop is too long.”
“I’m not sure what tests we’re running.”
“I don’t know where our test results are.”
“I don’t understand our test results.”

Statements like these typically mean that you’ve successfully adopted CI/CD ways of working within development, and automation is freeing up your time for further improvements. But how do you address them before they become real issues and people start to lose interest?

Luckily, the answer is within your reach! You need to define relevant metrics and make them visible to the whole organization, specifically your team.

What metrics should I have?

We get this question a lot. Unfortunately, the answer is the infamous “it depends.” It’s better to show something than nothing, so simply start somewhere.

Once your organization is capable of collecting, storing, and presenting data, you typically begin to realize what metrics are needed. “Well, that’s not really helpful,” you might be thinking. That’s why we want to present an interesting article we came across. In it, the authors present the following metrics:

1. User sentiment
2. Defects found in production
3. Test case coverage
4. Defects across sprints
5. Committed vs. delivered stories

When looking at these, we noticed some overlap with DORA metrics.

Deployment frequency

This should correlate with high “(1) User sentiment.” In fact, it’s a precondition before you can even observe it.

Lead time for changes

This tells you how quickly you can go from an idea all the way to production, which is the same as “(5) Committed vs. delivered stories.”

Change fail rate

This tells you what share of your changes led to defects in production. “(3) Test case coverage” further enables you to analyze the root cause of your change fail rate.

“(4) Defects across sprints” is a more fine-grained example of the general fail rate.

Time to restore services

This tells you how quickly you can resolve production incidents, which is the next question after you’ve found out “(2) Defects found in production.”

Given the overlap and the fact that DORA metrics have been proven to work, we consider these as good ones to start with.
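To make the four metrics concrete, here is a minimal sketch of how they could be computed from a list of deployment records. The `Deployment` record, its field names, and the sample data are all assumptions made for illustration; real data would come from your CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments: list, period_days: int) -> dict:
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "change_fail_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# One made-up week with four deployments, one of which failed
week = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15)),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9),
               failed=True, restored_at=datetime(2024, 1, 3, 11)),
    Deployment(datetime(2024, 1, 4, 10), datetime(2024, 1, 4, 12)),
    Deployment(datetime(2024, 1, 5, 8), datetime(2024, 1, 5, 18)),
]
metrics = dora_metrics(week, period_days=7)
```

Medians are used here rather than averages so that a single unusually slow change does not dominate the number; that is a design choice of this sketch, not a requirement of the metrics.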

Where to start?

Now that we’ve defined several reasonable metrics, how can we collect them?

At Eficode, we believe in automation and that the data in reports and dashboards should be as real-time as possible. So, a few years ago, we started a couple of open source projects to support these kinds of initiatives:

In our customer cases, Jenkins CI has been the most used CI/CD solution, and we’ve already had a successful proof-of-concept when doing metrics with an open source time-series database called InfluxDB in combination with another open source tool, Grafana, which is for building dashboards.

Using open source solutions might need a bit of elbow grease, but they are the cheapest option by virtue of being entirely free. This helps you get going faster—remember, you want to start seeing data so you can evolve your metrics further.

Example of setup:

How to proceed once we have data?

After we’ve set up the infrastructure to start gathering data and visualizing it, we typically create a few graphs to answer some of the most asked questions. For example, “What is the pass ratio for the tests running in continuous integration (i.e., change fail rate or defects across the sprint as mentioned earlier)?”
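A pass ratio like this can be derived from the JUnit-style XML reports that many CI tools collect. The sketch below is one possible way to do it, assuming the report carries the usual `tests`, `failures`, `errors`, and `skipped` attributes on each `<testsuite>` element; the sample report is made up, and nested suites are not handled.

```python
import xml.etree.ElementTree as ET


def pass_ratio(junit_xml: str) -> float:
    """Share of executed (non-skipped) tests that passed."""
    root = ET.fromstring(junit_xml)
    # A report may have a single <testsuite> root or a <testsuites> wrapper
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    total = failed = skipped = 0
    for suite in suites:
        total += int(suite.get("tests", 0))
        failed += int(suite.get("failures", 0)) + int(suite.get("errors", 0))
        skipped += int(suite.get("skipped", 0))
    executed = total - skipped
    return (executed - failed) / executed if executed else 0.0


sample_report = """
<testsuites>
  <testsuite name="api" tests="10" failures="1" errors="0" skipped="1"/>
  <testsuite name="ui" tests="5" failures="0" errors="1" skipped="0"/>
</testsuites>
"""
ratio = pass_ratio(sample_report)  # 12 of the 14 executed tests passed
```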

The data comes directly from your CI/CD tool, so it’s as up-to-date as it can get. And if your data is visible to everyone, your team will have a better chance of comprehending the current situation.

The next step is to start thinking with your stakeholders about the product that you and your team are building. Not all data is equally important to everyone. For example, managers want to see the overall pass ratio over the past month, whereas developers want the latest results and to know whether the environment is passing smoke tests.
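The difference between those two views can be sketched as two small queries over the same run history. The `CiRun` record, its fields, and the sample runs are hypothetical; in practice the same split would be expressed as dashboard queries against your data store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CiRun:
    """Hypothetical summary of one CI test run."""
    finished_at: datetime
    passed: int
    failed: int
    smoke_ok: bool


def manager_view(runs: list, now: datetime, days: int = 30) -> dict:
    # Aggregate over a period: one number that shows the trend
    recent = [r for r in runs if now - r.finished_at <= timedelta(days=days)]
    passed = sum(r.passed for r in recent)
    total = passed + sum(r.failed for r in recent)
    return {"runs": len(recent), "pass_ratio": passed / total if total else None}


def developer_view(runs: list) -> dict:
    # Only the most recent run matters here
    latest = max(runs, key=lambda r: r.finished_at)
    return {"finished_at": latest.finished_at, "failed": latest.failed,
            "smoke_ok": latest.smoke_ok}


runs = [
    CiRun(datetime(2024, 3, 1, 9), passed=50, failed=50, smoke_ok=False),  # too old
    CiRun(datetime(2024, 4, 20, 9), passed=90, failed=10, smoke_ok=True),
    CiRun(datetime(2024, 5, 6, 9), passed=95, failed=5, smoke_ok=False),
]
monthly = manager_view(runs, now=datetime(2024, 5, 6, 12))
latest = developer_view(runs)
```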

Luckily, Grafana and other solutions support multiple dashboards. This way, it’s easy to visualize separate metrics for management, team leads, QA teams, etc.

We recommend the practice of providing essential data to each stakeholder while allowing the option to see all of the data when needed.

We’ve often seen that once you start showing current data, more ideas emerge about what should be tackled next. Most often, this leads teams to start making decisions based on facts rather than pulling reasons out of thin air.

Why not increase your knowledge further by learning about building quality in your software?

Author

Joonas Jauhiainen, DevOps Lead

Joonas is a DevOps lead with experience in telecom, banking, insurance, and manufacturing, among other industries. His hobbies include investigation of IT devices, developing games and other SW projects, not to mention underwater rugby!

Eficode is an Exhibitor at EuroSTAR 2024, join us in Stockholm.

What our tests don’t like about our code

May 3, 2024 by Lauren Payne

When you start writing tests for your code, you’ll likely have the feeling – bloody hell, how do I drag this thing into a test? There is code that tests clearly like, and code they don’t. Apart from checking the correctness of our code, tests also give us hints about how to write it. And it’s a good idea to listen.

A test executes your code in the simplest possible setting, independent of the larger system it’s part of. But if the simplest possible setting is how it’s run in our app, and it’s impossible to tease out the individual pieces – that’s a bad sign. If we’re saying – nah, we don’t need tests, all the code is already executed in the app – that’s a sign that we’ve created a large slab that is hard to change and maintain. As Uncle Bob put it:

“another word for testable is decoupled.”

It’s been said plenty of times that good architecture gets us good testability. Let’s come at this idea from another angle: what hints do our tests give about our architecture? We’ve already talked about how tests help prevent creeping code rot – now, we’ll explore this idea in a particular example.

As a side note, we’ll talk mostly about tests that developers write themselves – unit tests, the first line of defense.

Our example is going to be a primitive Python script that checks the user’s IP, determines their region, and tells the current weather in the region (the complete example is available here). We’ll write tests for that code and see how it gets improved in the process. Each major step is in a separate branch.

Step 1: a quick and dirty version

Our first version is bad and untestable.

import json
from datetime import datetime, timedelta
from pathlib import Path

import requests


def local_weather():
    # First, get the IP
    url = "https://api64.ipify.org?format=json"
    response = requests.get(url).json()
    ip_address = response["ip"]

    # Using the IP, determine the city
    url = f"https://ipinfo.io/{ip_address}/json"
    response = requests.get(url).json()
    city = response["city"]

    with open("secrets.json", "r", encoding="utf-8") as file:
        owm_api_key = json.load(file)["openweathermap.org"]

    # Hit up a weather service for weather in that city
    url = (
        "https://api.openweathermap.org/data/2.5/weather?q={0}&"
        "units=metric&lang=ru&appid={1}"
    ).format(city, owm_api_key)
    weather_data = requests.get(url).json()
    temperature = weather_data["main"]["temp"]
    temperature_feels = weather_data["main"]["feels_like"]

    # If past measurements have already been taken, compare them to current results
    has_previous = False
    history = {}
    history_path = Path("history.json")
    if history_path.exists():
        with open(history_path, "r", encoding="utf-8") as file:
            history = json.load(file)
        record = history.get(city)
        if record is not None:
            has_previous = True
            last_date = datetime.fromisoformat(record["when"])
            last_temp = record["temp"]
            last_feels = record["feels"]
            diff = temperature - last_temp
            diff_feels = temperature_feels - last_feels

    # Write down the current result if enough time has passed
    now = datetime.now()
    if not has_previous or (now - last_date) > timedelta(hours=6):
        record = {
            "when": datetime.now().isoformat(),
            "temp": temperature,
            "feels": temperature_feels
        }
        history[city] = record
        with open(history_path, "w", encoding="utf-8") as file:
            json.dump(history, file)

    # Print the result
    msg = (
        f"Temperature in {city}: {temperature:.0f} °C\n"
        f"Feels like {temperature_feels:.0f} °C"
    )
    if has_previous:
        formatted_date = last_date.strftime("%c")
        msg += (
            f"\nLast measurement taken on {formatted_date}\n"
            f"Difference since then: {diff:.0f} (feels {diff_feels:.0f})"
        )
    print(msg)


if __name__ == "__main__":
    local_weather()

[source]

Let’s not get into why this is bad code; instead, let’s ask ourselves: how would we test it? Well, right now, we can only write an E2E test:

import re

import pytest


def test_local_weather(capsys: pytest.CaptureFixture):
    # Runs the whole script, network and disk included
    local_weather()

    assert re.match(
        (
            r"^Temperature in .*: -?\d+ °C\n"
            r"Feels like -?\d+ °C\n"
            r"Last measurement taken on .*\n"
            r"Difference since then: -?\d+ \(feels -?\d+\)$"
        ),
        capsys.readouterr().out
    )

[source]

This executes most of our code once – so far, so good. But testing is not just about achieving good line coverage. Instead of thinking about lines, it’s better to think about behavior – what systems the code manipulates and what [the use cases are].

So here’s what our code does:

– it calls some external services for data;

– it does some read/write operations to store that data and retrieve previous measurements;

– it generates a message based on the data;

– it shows the message to the user.

But right now, we can’t test any of those things separately because they are all stuffed into one function.

In other words, it will be tough to test the different execution paths of our code. For instance, we might want to know what happens if the city provider returns nothing. Even if we’ve dealt with this case in our code (which we haven’t), we’d need to test what happens when the value of city is None. Currently, doing that isn’t easy.

– You could physically travel to a place that the service we use doesn’t recognize – and, while fun, this is not a viable long-term testing strategy.

– You could use a mock. Python’s requests-mock library lets you make it so that requests doesn’t make an actual request but returns whatever you told it to return.

While the second solution is less cumbersome than moving to a different city, it’s still problematic because it messes with global state. For instance, we wouldn’t be able to execute our tests in parallel (since each changes the behavior of the same requests module).
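To show what "messing with global state" means in practice, here is a small sketch. It deliberately swaps in the standard library (`urllib` and `unittest.mock`) for `requests` and `requests-mock` so that it runs without extra dependencies; `get_city_by_ip` below is a stand-in for the function in the article, not its actual code, and it returns None when the field is missing.

```python
import io
import json
import urllib.request
from unittest import mock


def get_city_by_ip(ip_address: str):
    # Stand-in for the article's function, using urllib instead of requests
    url = f"https://ipinfo.io/{ip_address}/json"
    with urllib.request.urlopen(url) as response:
        return json.load(response).get("city")


def fake_urlopen(url, *args, **kwargs):
    # Pretend the service answered, but without a "city" field
    return io.BytesIO(json.dumps({"ip": "1.2.3.4"}).encode("utf-8"))


original_urlopen = urllib.request.urlopen

# While the patch is active, every caller of urllib.request.urlopen
# in this process gets the fake, not only the function under test
with mock.patch("urllib.request.urlopen", fake_urlopen):
    city = get_city_by_ip("1.2.3.4")
    patched_globally = urllib.request.urlopen is fake_urlopen

restored = urllib.request.urlopen is original_urlopen
```

The patch replaces an attribute on a shared module, which is why two tests that patch it differently cannot safely run at the same time in one process.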

If we want to make code more testable, we first need to break it down into separate functions according to area of responsibility (I/O, app logic, etc.).

Step 2: Creating separate functions

Our main job at this stage is to determine areas of responsibility. Does a piece of code implement the application logic, or some form of IO – web, file, or console? Here’s how we break it down:

# IO logic: save history of measurements
class DatetimeJSONEncoder(json.JSONEncoder):
    def default(self, o: Any) -> Any:
        if isinstance(o, datetime):
            return o.isoformat()
        elif is_dataclass(o):
            return asdict(o)
        return super().default(o)


def get_my_ip() -> str:
    # IO: load IP from HTTP service
    url = "https://api64.ipify.org?format=json"
    response = requests.get(url).json()
    return response["ip"]


def get_city_by_ip(ip_address: str) -> str:
    # IO: load city by IP from HTTP service
    url = f"https://ipinfo.io/{ip_address}/json"
    response = requests.get(url).json()
    return response["city"]


def measure_temperature(city: str) -> Measurement:
    # IO: Load API key from file
    with open("secrets.json", "r", encoding="utf-8") as file:
        owm_api_key = json.load(file)["openweathermap.org"]

    # IO: load measurement from weather service
    url = (
        "https://api.openweathermap.org/data/2.5/weather?q={0}&"
        "units=metric&lang=ru&appid={1}"
    ).format(city, owm_api_key)
    weather_data = requests.get(url).json()
    temperature = weather_data["main"]["temp"]
    temperature_feels = weather_data["main"]["feels_like"]
    return Measurement(
        city=city,
        when=datetime.now(),
        temp=temperature,
        feels=temperature_feels
    )


def load_history() -> History:
    # IO: load history from file
    history_path = Path("history.json")
    if history_path.exists():
        with open(history_path, "r", encoding="utf-8") as file:
            history_by_city = json.load(file)
            return {
                city: HistoryCityEntry(
                    when=datetime.fromisoformat(record["when"]),
                    temp=record["temp"],
                    feels=record["feels"]
                ) for city, record in history_by_city.items()
            }
    return {}


def get_temp_diff(history: History, measurement: Measurement) -> TemperatureDiff|None:
    # App logic: calculate temperature difference
    entry = history.get(measurement.city)
    if entry is not None:
        return TemperatureDiff(
            when=entry.when,
            temp=measurement.temp - entry.temp,
            feels=measurement.feels - entry.feels
        )
    return None


def save_measurement(history: History, measurement: Measurement, diff: TemperatureDiff|None):
    # App logic: check if should save the measurement
    if diff is None or (measurement.when - diff.when) > timedelta(hours=6):
        # IO: save new measurement to file
        new_record = HistoryCityEntry(
            when=measurement.when,
            temp=measurement.temp,
            feels=measurement.feels
        )
        history[measurement.city] = new_record
        history_path = Path("history.json")
        with open(history_path, "w", encoding="utf-8") as file:
            json.dump(history, file, cls=DatetimeJSONEncoder)


def print_temperature(measurement: Measurement, diff: TemperatureDiff|None):
    # IO: format and print message to user
    msg = (
        f"Temperature in {measurement.city}: {measurement.temp:.0f} °C\n"
        f"Feels like {measurement.feels:.0f} °C"
    )
    if diff is not None:
        last_measurement_time = diff.when.strftime("%c")
        msg += (
            f"\nLast measurement taken on {last_measurement_time}\n"
            f"Difference since then: {diff.temp:.0f} (feels {diff.feels:.0f})"
        )
    print(msg)


def local_weather():
    # App logic (Use Case)
    ip_address = get_my_ip() # IO
    city = get_city_by_ip(ip_address) # IO
    measurement = measure_temperature(city) # IO
    history = load_history() # IO
    diff = get_temp_diff(history, measurement) # App
    save_measurement(history, measurement, diff) # App, IO
    print_temperature(measurement, diff) # IO

[source]

Notice that we now have a function that represents our use case, the specific scenario in which all the other functions are used: local_weather(). Importantly, this is also part of app logic; it specifies how everything else should work together.

Also note that we’ve introduced dataclasses to make return values of functions less messy: Measurement, HistoryCityEntry, and TemperatureDiff. They can be found in the new [typings module].
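The typings module itself is not reproduced here, but based on how the three dataclasses are constructed in the code above, it could look roughly like the following. The field types (`float`, `datetime`) and the `History` alias are inferred from usage, so treat them as assumptions.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class Measurement:
    city: str
    when: datetime
    temp: float
    feels: float


@dataclass
class HistoryCityEntry:
    when: datetime
    temp: float
    feels: float


@dataclass
class TemperatureDiff:
    when: datetime   # time of the previous measurement
    temp: float      # difference in temperature
    feels: float     # difference in "feels like" temperature


# Assumed alias: history maps a city name to its last saved entry
History = dict[str, HistoryCityEntry]

entry = HistoryCityEntry(when=datetime(2023, 1, 2), temp=8, feels=12)
same_entry = HistoryCityEntry(when=datetime(2023, 1, 2), temp=8, feels=12)
```

Dataclasses compare by field values, so `entry == same_entry` holds; tests can therefore assert on whole objects instead of field by field.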

As a result of the changes, our code has become more cohesive – everything inside one function mostly relates to doing one thing. By the way, the principle we’ve applied here is called the [Single-responsibility principle] (the “S” from [SOLID]).

Of course, there’s still room for improvement – e.g., in measure_temperature, we do both file IO (read a secret from disk) and web IO (send a request to a service).

Let’s recap:

– we wanted to have separate tests for things our code does;

– that got us thinking about the responsibilities for different areas of our code;

– by making each piece have just a single responsibility, we’ve made them testable.

So, let’s write the tests now.

Tests for step 2

@pytest.mark.slow
def test_city_of_known_ip():
    assert get_city_by_ip("69.193.168.152") == "Astoria"


@pytest.mark.fast
def test_get_temp_diff_unknown_city():
    assert get_temp_diff({}, Measurement(
        city="New York",
        when=datetime.now(),
        temp=10,
        feels=10
    )) is None

A couple of things to note here.

Our app logic and console output execute an order of magnitude faster than IO, and since our functions are somewhat specialized now, we can differentiate between fast and slow tests. Here, we do it with custom pytest marks (pytest.mark.fast) defined in the project’s [config file]. This is useful, but more on it later.
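The project registers its marks in a config file. As an alternative sketch, the same marks can be registered from a `conftest.py` through pytest's `pytest_configure` hook; the mark descriptions below are invented for illustration.

```python
# conftest.py (assumed location at the root of the test directory)
def pytest_configure(config):
    # Registering marks keeps pytest from warning about unknown marks
    config.addinivalue_line("markers", "fast: tests that touch no network or disk")
    config.addinivalue_line("markers", "slow: tests that call external services")
```

Either way, once the marks are registered, a typo such as `pytest.mark.fats` can be turned into an error by running pytest with `--strict-markers`.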

Also, take a look at this test:

@pytest.mark.fast
def test_print_temperature_without_diff(capsys: pytest.CaptureFixture):
    print_temperature(
        Measurement(
            city="My City",
            when=datetime(2023, 1, 1),
            temp=21.4,
            feels=24.5,
        ),
        None
    )

    assert re.match(
        (
            r"^Temperature in .*: -?\d+ °C\n"
            r"Feels like -?\d+ °C$"
        ),
        capsys.readouterr().out
    )

Note that before, we’d have to drag the whole application along if we wanted to check print output, and manipulating output was very cumbersome. Now, we can pass the print_temperature function whatever we like.

Problem: fragility

Our tests for high-level functionality call details of implementations directly. For instance, the E2E test we’ve written in step 1 (test_local_weather) relies on the output being sent to the console. If that detail changes, the test breaks.

This isn’t a problem for a test written specifically for that detail (like test_print_temperature_without_diff [here]) – it makes sense we need to change it if the feature has changed.

However, our E2E test wasn’t written to test the print functionality; nor was it written specifically for testing the provider. But if the provider changes, the test breaks.

We might also want to change the implementation of some functions – for instance, break down the measure_temperature() function into two to improve cohesion. A test calling that function would break.

All in all, our tests are fragile. If we want to change our code, we also have to rewrite tests for that code, which means a higher cost of change.

Problem: dependence

This is related to the previous problem. If our tests call the provider directly, then any problem on the provider’s end means our tests will fail.

If the IP service is down for a day, then our tests won’t be able to execute any code inside local_weather() that runs after determining IP – and we won’t be able to do anything about it. And if you’ve got a problem with internet connection, none of the tests will run at all, even though the code might be fine.

Problem: can’t test the use case

On the surface – yes, the tests do call local_weather(), which is our use case. But they don’t test that function specifically; they just execute everything there is in the application. This makes the results of such a test difficult to read: it takes more time to understand where the failure is localized. Test results should be easy to read.

Problem: excessive coverage

One more problem is that with each test run, the web and persistence functions get called twice: by the E2E test from step 1 and by the more targeted tests from step 2.

Excessive coverage isn’t great – for one, the services we’re using count our calls, so we’d better not make them willy-nilly. Also, if the project continues to grow, our test base will get too slow.

All these problems are related, and to solve them, we need to write a test for our coordinating functions that doesn’t invoke the rest of the code. To do that, we’d need test doubles that could substitute for real web services or writing to the disk. And we’d need a way to control what specific calls our functions make.

Step 3: Decoupling dependencies

To achieve those things, we have to write functions that don’t invoke stuff directly but instead call things passed to them from the outside – i.e., we [inject dependencies].

In our case, we’ll pass functions as variables instead of specifying them directly when calling them. An example of how this is done is presented in [step 3]:

def save_measurement(
    save_city: SaveCityFunction, # the IO call is now passed from the outside
    measurement: Measurement,
    diff: TemperatureDiff|None
):
    """
    If enough time has passed since last measurement, save measurement.
    """
    if diff is None or (measurement.when - diff.when) > timedelta(hours=6):
        new_record = HistoryCityEntry(
            when=measurement.when,
            temp=measurement.temp,
            feels=measurement.feels
        )
        save_city(measurement.city, new_record)

Before, in step 2, the save_measurement function contained both app logic (checking if we should perform the save operation) and IO (actually saving). Now, the IO part is injected. Because of this, we now have more cohesion: the function knows nothing about IO, and its sole responsibility is the app logic.

Note that the injected part is an abstraction: we’ve created a separate type for it, SaveCityFunction, which can be implemented in multiple ways. Because of this, the code has less coupling: the function no longer depends directly on a concrete external function.

This abstraction that we’ve injected into the function means we have inverted dependencies: the execution of high-level app logic no longer depends on particular low-level functions from other modules. Instead, both now only refer to abstractions.
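One way such an abstraction can be expressed in Python is a `Callable` type alias. The definition below is an assumption about what SaveCityFunction might look like (the project could equally use a `Protocol`), and the two implementations plus the `record` caller are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class HistoryCityEntry:
    when: datetime
    temp: float
    feels: float


# Assumed definition: any callable taking (city, entry) and returning nothing
SaveCityFunction = Callable[[str, HistoryCityEntry], None]


def make_dict_saver(store: dict) -> SaveCityFunction:
    """Hypothetical implementation 1: keep entries in memory."""
    def save_city(city: str, entry: HistoryCityEntry) -> None:
        store[city] = entry
    return save_city


def make_log_saver(lines: list) -> SaveCityFunction:
    """Hypothetical implementation 2: append a human-readable line."""
    def save_city(city: str, entry: HistoryCityEntry) -> None:
        lines.append(f"{city}: {entry.temp} at {entry.when.isoformat()}")
    return save_city


def record(save_city: SaveCityFunction, city: str, entry: HistoryCityEntry) -> None:
    # High-level code: knows only the abstraction, not which implementation it got
    save_city(city, entry)


entry = HistoryCityEntry(when=datetime(2023, 1, 2), temp=8, feels=12)
store, lines = {}, []
record(make_dict_saver(store), "New York", entry)
record(make_log_saver(lines), "New York", entry)
```

The caller is identical in both cases; only the object passed in changes, which is what makes it possible to swap real IO for a test double.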

This approach has plenty of benefits:

– reusability and changeability – we can change, e.g., the function that provides the IP, and execution will look the same

– resistance to code rot – because the modules are less dependent on each other, changes are more localized, so growing code complexity doesn’t impact the cost of change as much

– and, of course, testability

Importantly, we did it all because we wanted to run our app logic in tests without executing the entire application. In fact, why don’t we write these tests right now?

Tests for step 3

So far, we’ve applied the new approach to save_measurement – so let’s test it. Dependency injection allows us to write a test double that we’re going to use instead of executing actual IO:

@dataclass
class __SaveSpy:
    calls: int = 0
    last_city: str | None = None
    last_entry: HistoryCityEntry | None = None


@pytest.fixture
def save_spy():
    spy = __SaveSpy()
    def __save(city, entry):
        spy.calls += 1
        spy.last_city = city
        spy.last_entry = entry
    yield __save, spy

This double is called a spy; it records any calls made to it, and we can check what it wrote afterward. Now, here’s how we’ve tested save_measurement with that spy:

@pytest.fixture
def measurement():
    yield Measurement(
        city="New York",
        when=datetime(2023, 1, 2, 0, 0, 0),
        temp=8,
        feels=12,
    )

@allure.title("save_measurement should save if no previous measurements exist")
def test_measurement_with_no_diff_saved(save_spy, measurement):
    save, spy = save_spy

    save_measurement(save, measurement, None)

    assert spy.calls == 1
    assert spy.last_city == "New York"
    assert spy.last_entry == HistoryCityEntry(
        when=datetime(2023, 1, 2, 0, 0, 0),
        temp=8,
        feels=12,
    )


@allure.title("save_measurement should not save if a recent measurement exists")
def test_measurement_with_recent_diff_not_saved(save_spy, measurement):
    save, spy = save_spy

    # Less than 6 hours have passed
    save_measurement(save, measurement, TemperatureDiff(
        when=datetime(2023, 1, 1, 20, 0, 0),
        temp=10,
        feels=10,
    ))

    assert not spy.calls


@allure.title("save_measurement should save if enough time has passed since last measurement")
def test_measurement_with_old_diff_saved(save_spy, measurement):
    save, spy = save_spy

    # More than 6 hours have passed
    save_measurement(save, measurement, TemperatureDiff(
        when=datetime(2023, 1, 1, 17, 0, 0),
        temp=-2,
        feels=2,
    ))

    assert spy.calls == 1
    assert spy.last_city == "New York"
    assert spy.last_entry == HistoryCityEntry(
        when=datetime(2023, 1, 2, 0, 0, 0),
        temp=8,
        feels=12,
    )

[source]

Note how much control we’ve got over save_measurement. Before, if we wanted to test how it behaves with or without previous measurements, we’d have to manually delete the file with those measurements – yikes. Now, we can simply use a test double.

There are plenty of other advantages to such tests, but to fully appreciate them, let’s first achieve dependency inversion in our entire code base.

Step 4: A plugin architecture

At this point, our code is completely reborn. Here’s our central module, app_logic:

def get_temp_diff(
    last_measurement: HistoryCityEntry | None,
    new_measurement: Measurement
) -> TemperatureDiff|None:
    if last_measurement is not None:
        return TemperatureDiff(
            when=last_measurement.when,
            temp=new_measurement.temp - last_measurement.temp,
            feels=new_measurement.feels - last_measurement.feels
        )


def save_measurement(
    save_city: SaveCityFunction,
    measurement: Measurement,
    diff: TemperatureDiff|None
):
    if diff is None or (measurement.when - diff.when) > timedelta(hours=6):
        new_record = HistoryCityEntry(
            when=measurement.when,
            temp=measurement.temp,
            feels=measurement.feels
        )
        save_city(measurement.city, new_record) # injected IO


def local_weather(
    get_my_ip: GetIPFunction,
    get_city_by_ip: GetCityFunction,
    measure_temperature: MeasureTemperatureFunction,
    load_last_measurement: LoadCityFunction,
    save_city_measurement: SaveCityFunction,
    show_temperature: ShowTemperatureFunction,
):
    # App logic (Use Case)
    # Low-level dependencies are injected at runtime
    # Initialization logic is in __init__.py now
    # Can be tested with dummies, stubs and spies!

    ip_address = get_my_ip() # injected IO
    city = get_city_by_ip(ip_address) # injected IO
    if city is None:
        raise ValueError("Cannot determine the city")
    measurement = measure_temperature(city) # injected IO
    last_measurement = load_last_measurement(city) # injected IO
    diff = get_temp_diff(last_measurement, measurement) # App
    save_measurement(save_city_measurement, measurement, diff) # App (with injected IO)
    show_temperature(measurement, diff) # injected IO

[source]

Our code is like a Lego now. The functions are assembled when the app is initialized (in __init__.py), and the central module just executes them. As a result, none of the low-level code is referenced in the main module, it’s all hidden away in sub-modules [console_io], [file_io], and [web_io]. This is what dependency inversion looks like: the central module only works with abstractions.

The specific functions are passed from elsewhere – in our case, the __init__.py module:

def local_weather(
    get_my_ip=None,
    get_city_by_ip=None,
    measure_temperature=None,
    load_last_measurement=None,
    save_city_measurement=None,
    show_temperature=None,
):
    # Initialization logic
    default_load_last_measurement, default_save_city_measurement =\
        file_io.initialize_history_io()
    return app_logic.local_weather(
        get_my_ip=get_my_ip or web_io.get_my_ip,
        get_city_by_ip=get_city_by_ip or web_io.get_city_by_ip,
        measure_temperature=measure_temperature or web_io.init_temperature_service(
            file_io.load_secret
        ),
        load_last_measurement=load_last_measurement or default_load_last_measurement,
        save_city_measurement=save_city_measurement or default_save_city_measurement,
        show_temperature=show_temperature or console_io.print_temperature,
    )

As a side note, initialization here is done with functions (file_io.initialize_history_io() and web_io.init_temperature_service()). We could just as easily have done the same with, say, a WeatherClient class and created an object of that class. It’s just that the rest of the code was written in a more functional style, so we decided to keep to functions for consistency.
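To illustrate the function-based initialization style, here is a minimal sketch of what a factory like initialize_history_io could look like: it returns a pair of closures that share one history file. The `history_path` parameter is an assumption added so the example can run against a temporary file; the version called in the article takes no arguments.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime
from pathlib import Path
from typing import Optional


@dataclass
class HistoryCityEntry:
    when: datetime
    temp: float
    feels: float


def initialize_history_io(history_path: Path):
    """Return (load_last_measurement, save_city_measurement) bound to one file."""

    def _read_all() -> dict:
        if history_path.exists():
            return json.loads(history_path.read_text(encoding="utf-8"))
        return {}

    def load_last_measurement(city: str) -> Optional[HistoryCityEntry]:
        record = _read_all().get(city)
        if record is None:
            return None
        return HistoryCityEntry(
            when=datetime.fromisoformat(record["when"]),
            temp=record["temp"],
            feels=record["feels"],
        )

    def save_city_measurement(city: str, entry: HistoryCityEntry) -> None:
        history = _read_all()
        # Store the timestamp as an ISO string so it survives the JSON round trip
        history[city] = {**asdict(entry), "when": entry.when.isoformat()}
        history_path.write_text(json.dumps(history), encoding="utf-8")

    return load_last_measurement, save_city_measurement


with tempfile.TemporaryDirectory() as tmp:
    load, save = initialize_history_io(Path(tmp) / "history.json")
    before = load("New York")
    save("New York", HistoryCityEntry(datetime(2023, 1, 2), 8, 12))
    after = load("New York")
```

A class with `load` and `save` methods would hold the same shared state in `self`; the closure version simply keeps it in the enclosing function's scope.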

To conclude, we’ve repeatedly applied dependency inversion through dependency injection on every level until the highest, where the functions are assembled. With this architecture, we’ve finally fully decoupled all the different areas of responsibility from each other. Now, every function truly does just one thing, and we can write granular tests for them all.

Tests for step 4

[Here’s the final version] of our test base. There are three separate modules here:

– e2e_test – we only need one E2E test because our use case is really simple. We’ve written that test at step 1.

– plugin_test – those are tests for low-level functions; useful to have, but slow and fragile. We’ve written them at step 2.

– unit_test – that’s where all the hot stuff has been happening.

The last module became possible once we introduced dependency injection. There, we’ve used [all kinds of doubles]:

dummies – they are simple placeholders
stubs – they return a hard-coded value
spies – we’ve talked about them earlier

They allow us a very high level of control over high-level app logic functions. Before, they would only be executed with the rest of our application. Now, we can do something like this:

Exercising our use case

@allure.title("local_weather should use the city that is passed to it")
def test_temperature_of_current_city_is_requested():    
    def get_ip_stub(): return "1.2.3.4"  
    def get_city_stub(*_): return "New York"  
    captured_city = None  
  
    def measure_temperature(city):  
        nonlocal captured_city  
        captured_city = city  
        # Execution of local_weather will stop here  
        raise ValueError()  
    
    def dummy(*_): raise NotImplementedError()  
  
    # We don't care about most of local_weather's execution,  
    # so we can pass dummies that will never be called    
    with pytest.raises(ValueError):  
        local_weather(  
            get_ip_stub,  
            get_city_stub,  
            measure_temperature,   
            dummy,  
            dummy,  
            dummy  
        )  
  
    assert captured_city == "New York"

We’re testing our use case (the local_weather function), but we’re interested in a very particular aspect of its execution – we want to ensure it uses the correct city. Most of the function is untouched.

Here’s another example of how much control we have now. As you might remember, our use case should only save a new measurement if more than 6 hours have passed since the last measurement.

How do we test that particular piece of behaviour? Before, we’d have to manually delete existing measurements – very clumsy. Now, we do this:

@allure.title("Use case should save measurement if no previous entries exist")  
def test_new_measurement_is_saved(measurement, history_city_entry):  
    # We don't care about this value:  
    def get_ip_stub(): return "Not used"  
    # Nor this:  
    def get_city_stub(*_): return "Not used"  
    # This is the thing we'll check for:  
    def measure_temperature(*_): return measurement  
    # With this, local_weather will think there is  
    # no last measurement on disk:    
    def last_measurement_stub(*_): return None  
  
    captured_city = None  
    captured_entry = None  
  
    # This spy will see what local_weather tries to  
    # write to disk:    
    def save_measurement_spy(city, entry):  
        nonlocal captured_city  
        nonlocal captured_entry  
        captured_city = city  
        captured_entry = entry  
  
    def show_temperature_stub(*_): pass  
  
    local_weather(  
        get_ip_stub,  
        get_city_stub,  
        measure_temperature,  
        last_measurement_stub,  
        save_measurement_spy,  
        show_temperature_stub,  
    )  
  
    assert captured_city == "New York"  
    assert captured_entry == history_city_entry

We can control the execution flow for local_weather() to make it think there’s nothing on disk without actually reading anything. Of course, it’s also possible to test for opposite behavior – again, without any IO (this is done in test_recent_measurement_is_not_saved()). These and other tests check all the steps of our use case, and with that, we’ve covered all possible execution paths.
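The opposite case can be sketched in the same style. The block below is only an illustration under stated assumptions, not the project's actual code: the body of local_weather, the dict-based entry with a "when" timestamp, and the SAVE_INTERVAL constant are all hypothetical, and a plain assert stands in for the Allure decorator and pytest fixtures.

```python
from datetime import datetime, timedelta

SAVE_INTERVAL = timedelta(hours=6)  # threshold described in the text


def local_weather(get_ip, get_city_by_ip, measure_temperature,
                  last_measurement, save_measurement, show_temperature):
    """Hypothetical stand-in for the use case; the real body may differ."""
    city = get_city_by_ip(get_ip())
    temperature = measure_temperature(city)
    previous = last_measurement(city)
    # Save only if nothing is stored yet or the stored entry is old enough
    if previous is None or datetime.now() - previous["when"] > SAVE_INTERVAL:
        save_measurement(city, {"when": datetime.now(), "temp": temperature})
    show_temperature(temperature)


def test_recent_measurement_is_not_saved():
    saved = []

    def last_measurement_stub(*_):
        # Pretend an entry was written one hour ago (assumed entry format)
        return {"when": datetime.now() - timedelta(hours=1), "temp": 20}

    def save_measurement_spy(city, entry):
        saved.append((city, entry))

    local_weather(
        lambda: "1.2.3.4",          # IP stub
        lambda *_: "New York",      # city stub
        lambda *_: 21,              # temperature stub
        last_measurement_stub,
        save_measurement_spy,
        lambda *_: None,            # output stub
    )

    # The spy must never have been called
    assert saved == []
```

Because the last-measurement stub decides what "is on disk", the same test body covers the 6-hour rule without reading or writing any files.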

A test base with low coupling

The test base we’ve built has immense benefits.

Execution speed

Because our code has low coupling and our tests are granular, we can separate the fast and slow tests. In pytest, if you’ve created the custom “fast” and “slow” marks as we’ve discussed above, you can run the fast ones with a console command:

pytest tests -m "fast"

Alternatively, you could do a [selective run from Allure TestOps] – pytest custom marks are automatically converted into Allure tags, so you can select tests by the “fast” tag.
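For the -m filter to work without warnings, the custom marks have to be registered with pytest. One common way to do that is a pytest_configure hook in conftest.py. The sketch below assumes that file location and uses made-up mark descriptions; the project may well register its marks differently (for example in pytest.ini).

```python
# conftest.py (assumed to sit at the root of the tests directory)

def pytest_configure(config):
    """Register custom marks so pytest does not warn about unknown marks."""
    # The descriptions after the colon are illustrative only
    config.addinivalue_line("markers", "fast: tests that touch no network or disk")
    config.addinivalue_line("markers", "slow: tests that depend on external resources")
```

With the marks registered, pytest tests -m "not slow" is another way to leave out the slow tests.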

The fast tests are mainly from the unit_test module – it has the “fast” mark applied globally to the entire module. How can we be sure that everything there is fast?

Because everything in that module is decoupled, you can unplug your computer from the internet, and it will run just fine. Here’s how quick the unit tests are compared to those that have to deal with external resources:

We can easily run these fast tests every time we make a change, so if there is a bug, we’ll know it’s in the code we’ve written since the recent test run.

Longevity

Another benefit to those quick tests we’ve just made is longevity. Unfortunately, throwing away unit tests is something you’ll inevitably have to do. In our case, thanks to interface segregation and dependency injection, we’re testing a small abstract plug point and not technical implementation details. Such tests are likely to survive longer.

Taking the user’s point of view

Any test, no matter what level, forces you to look at your code from the outside, see how it might be taken out of the context where it was created to be used somewhere else.

With low-level tests, you extract yourself from the local context, but you’re still elbow-deep in code. However, if you’re testing an API, you’re taking the view of a user (even if that user is future you or another programmer). In an ideal world, a test always imitates a user.

A public API that doesn’t depend on implementation details is a contract, a promise to the user – here’s a handle, it won’t change (within reason). If tests are tied to this API (as our unit tests are), writing them makes you view your code through that contract. You get a better idea of how to structure your application. And if the API is clunky and uncomfortable to use, you’ll see that, too.

Looking into the future

The fact that tests force you to consider different usage scenarios also means tests allow you to peek into the future of your code’s evolution.

Let’s compare step 4 (inverted dependencies) with step 2 (where we’ve just hidden stuff into functions). The major improvement was decoupling, with its many benefits, including lower cost of change.

But at first sight, step 2 is simpler, and in Python, simple is better than complex, right? Without tests, the benefits of dependency inversion in our code would only become apparent if we tried to add more stuff to the application.

Why would they become apparent? Because we’d get more use cases and need to add other features. That would both expose us to code rot and make us think. We’d see the cost of change. Well, writing tests forces you to consider different use cases here and now.

This is why the structure we’ve used to benefit our tests turns out to be super convenient when we need to introduce other changes into code. To show that, let’s try to change our city provider and output.

Step 4 (continued): Changing without modifying

New city provider

First, we’ll need to write a new function that will call our new provider:

def get_city_by_ip(ip: str):
    """Get user's city based on their IP"""

    url = f"https://geolocation-db.com/json/{ip}?position=true"
    response = requests.get(url).json()
    return response["city"]

Then, we’ll have to call local_weather() (our use case) with that new function in __main__.py:

 local_weather(get_city_by_ip=weather_geolocationdb.get_city_by_ip)

Literally just one line. As much as possible, we should add new code, not rewrite what already exists. This is such a great design principle that it got its own name, the open/closed principle.

We were led to that principle because we wanted a test to run our app logic independently of the low-level functions. As a result, our city provider has become a technical detail that can be exchanged at will.

Also, remember that these changes don’t affect anything in our test base. We can add a new test for the new provider, but the existing test base runs unchanged because we’re adding, not modifying.
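The one-line swap works because the use case receives its dependencies as parameters with defaults. Here is a minimal sketch of that mechanism. Everything in it is simplified and hypothetical: the provider functions return fixed strings instead of calling a web service, and this local_weather has only three of the real function's parameters.

```python
def get_ip():
    return "1.2.3.4"  # placeholder instead of a real lookup


def get_city_by_ip(ip):
    return "New York"  # stands in for the original provider


def get_city_by_ip_alternative(ip):
    return "Boston"  # stands in for the new provider module


def local_weather(get_ip=get_ip,
                  get_city_by_ip=get_city_by_ip,
                  show_temperature=print):
    """Simplified use case: only the parts relevant to swapping providers."""
    city = get_city_by_ip(get_ip())
    show_temperature(f"Temperature requested for {city}")
    return city


if __name__ == "__main__":
    local_weather()                                           # default provider
    local_weather(get_city_by_ip=get_city_by_ip_alternative)  # swapped provider
```

The body of local_weather is identical in both calls; only the argument changes, which is why the existing tests keep passing.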

New output

Now, let’s change how we show users the weather. At first, we did it with a simple print() call. Then, to make app logic testable, we had to replace that with a function passed as a variable. It might seem like an unnecessary complication we’ve introduced just for the sake of testability. But what if we add a simple UI and display the message there?

It’s a primitive Tkinter UI; you can [take a look at the code here], but the implementation doesn’t matter much right now, it’s just a technical detail. The important thing is: what do we have to change in our app logic? Literally nothing at all. Our app is just run from a different place (the Tkinter module) and with a new function for output:

def local_weather():  
    weather.local_weather(  
        show_temperature=show_temperature  
    )

This is triggered by a Tkinter button.

It’s important to understand that how we launch our application’s main logic is also a technical detail; it doesn’t matter that it’s at the top of the stack. From the point of view of architecture, it’s periphery. So, again, we’re extending, not modifying.

Conclusion

Testability and SOLID

Alright, enough fiddling with the code. What have we learned? Throughout this example, we’ve been on a loop:

we want to write tests
but we can’t because something is messy in the code
so we rewrite it
and get many more benefits than just testability

We wanted to make our code easier to test, and the SOLID principles helped us here. In particular:

Applying the single responsibility principle allowed us to run different behaviors separately, in isolation from the rest of the code.
Dependency inversion allowed us to substitute expensive calls with doubles and get a lightning-fast test base.
The open/closed principle was an outcome of the previous principles, and it meant that we could add new functionality without changing anything in our existing tests.

It’s no big surprise that people writing about SOLID mention testability so much: after all, the SOLID principles [were formulated by the man who was a major figure in Test-Driven Development].

But TDD is a somewhat controversial topic, and we won’t get into it here. We aren’t advocating for writing tests before writing code (though if you do – that’s awesome). As long as you do write tests and listen to testers in your team – your code will be better. Beyond having fewer bugs, it will be better structured.

Devs need to take part in quality assurance; if the QA department just quietly does its own thing, its efficiency is greatly reduced. As a matter of fact, it has been measured that

“having automated tests primarily created and maintained either by QA or an outsourced party is not correlated with IT performance.”

Tests give hints on how well-structured your code is

Let’s clarify causation here: we’re not saying that testability is at the core of good design. It’s the other way round; following engineering best practices has good testability as one of many benefits.

What we’re saying is writing tests gets you thinking about those practices. Tests make you look at your code from the outside. They make you ponder its changeability and reusability – which you then achieve through best engineering practices. Tests give you hints about improving the coupling and cohesion of your code. If you want to know how well-structured your code is, writing tests is a good, well, test.

Authors

Artem Eroshenko CPO & Co-founder of Qameta Software

Maksim Stepanov, Software Developer Qameta Software

Mikhail Lankin, Content Writer Qameta Software

Allure Report is an Exhibitor at EuroSTAR 2024, join us in Stockholm

The Deloitte Digital Tester: Lifting the Curtain on AI-powered Automation

May 2, 2024 by Lauren Payne

In today’s technology field it’s almost impossible to discuss the future without mentioning the big impact Artificial Intelligence (AI) will have, specifically Generative AI. The concept behind generative AI is not new, but it has reached a wider audience in recent years as AI tools such as OpenAI ChatGPT, DALL-E, and Microsoft Copilot became available. Generative AI is able to – you guessed it – generate outputs such as text, images, audio and all of their derivatives, which up until recently were the exclusive domain of the human brain.

Generative AI is driving a wave of technological innovation that can be felt far and wide. This is no different for our own field of expertise: Digital Test Management. We’re currently still witnessing the early days of a transformation which will impact most aspects of how we test and validate the software applications we use every day. Ensuring that an application works as designed and intended has always been paramount: how can we ensure the trust of people and organizations in products and services if we don’t have the evidence to show for it? Generative AI will play a big role in transforming our ways of testing, allowing people to work more efficiently, automating many tasks and greatly improving test coverage. In the end this will make it possible to find and fix more defects earlier, ensuring a much smoother – and in many cases safer – end-user experience.

To this end, we want to introduce the Deloitte Digital Tester, an AI-driven test platform which is capable of supplementing human testers as an automated and independent tester across the entirety of the testing journey!

Introducing the Deloitte Digital Tester

Recent advances have made it possible to pioneer a novel concept: a full-time, AI-driven ‘digital tester’ that is capable of interpreting requirements and user stories to subsequently create test cases, execute them, log defects and finally report on the result. This is not science fiction, but already a reality. The Deloitte Digital Tester solution was built to operate across all phases of testing and fully integrates with existing testing tools and ecosystems. It can autonomously perform the following tasks:

Test Design: generate test cases from requirements and user stories.
Test Planning & Scripting: generate automated test cases or convert existing ones.
Test Data Creation: generate relevant test data to support test execution.
Test Execution: execute test cases and validate the results, all to create defects and support tickets where required.
Reporting: show clearly the test execution progress and outcome analysis by generating consolidated reports.

The Deloitte Digital Tester is not intended to replace human testers, but rather supplement them. This brings some key benefits: human testers will be able to focus on defining and validating business focused test strategies, design and architecture. They will be able to execute more value-adding test cases in parallel to the digital tester, evaluate the overall results and confirm the accuracy. Freeing up human testers also allows for more exploratory-based testing and validation of business interactions in the application. It also keeps testers motivated by automating many repetitive tasks and repeated regression testing. Additionally, the specific skills of the Deloitte Digital Tester allow the solution to do many additional test runs and eliminate blind spots, strongly increasing test coverage while reducing the overall time required to script and run tests.

Benefits of Next-Gen Automation

Our AI solution plays a key role in taking Test Automation to the next level. For several years already the concept of Test Automation has been firmly rooted in the test management sphere: before we had AI, test cases were already being automated to be run as many times as required, and without human input other than creating the automated test cases in the first place and interpreting the results. This allowed for quicker test execution, high test case reusability, improved regression testing and even the creation of large amounts of test data to assist manual testing. With today’s AI capabilities, the Deloitte Digital Tester is taking the next step: it is able to automate the creation and execution of test cases and evaluate the outcomes. This approach brings many benefits; we’ll go through the main ones here:

Continuous Testing becomes a truly integral part of the software development process, rather than testing being handled during specific phases, often post-development. The Deloitte Digital Tester allows us to automate testing activities and ensure quicker execution and more efficient identification of defects. This can be realized through In-sprint automation, where test cases are created while development is still ongoing. These test cases can already expose defects that can subsequently be addressed as early as possible. The sooner a defect is found, the lower its impact and the cheaper the cost to fix it.
AI/ML-based Test Data Management utilizes AI and machine learning to optimize the generation of representative test data while at the same time masking sensitive information such as personal data or confidential information. Integrating the Deloitte Digital Tester with the wider testing ecosystem makes the benefits even greater by increasing efficiency across the full landscape.
Self-Healing allows automated test cases to automatically update themselves in response to changes in the application’s development. Traditionally automated test scripts required a certain level of human intervention to cover changes in the application being tested. The Deloitte Digital Tester is AI-enabled and scriptless, while at the same time employing machine learning algorithms to dynamically adapt to changes in the application, reducing the need for maintenance. This is particularly relevant in Agile CI/CD environments, where rapid iterations are dependent on efficient regression testing.
Increased Test Coverage is realized by AI-driven automated testing. This allows for a scalable and broad-spectrum approach to validate end-to-end (E2E) processes. Additionally, AI algorithms enable us to identify and prioritize test cases based on their potential impact on critical business processes, thereby optimizing test coverage to focus on the areas with the highest risk.
Product Validation at Scale is a complex sounding term that signifies how the Deloitte Digital Tester enables organizations to industrialize testing by automating repetitive testing tasks and streamlining the testing process. By standardizing testing practices and creating reusable test assets, organizations can achieve consistency and efficiency in product validation efforts. Additionally, AI capabilities facilitate the analysis of test results and identification of trends or patterns across multiple product lines, enabling continuous improvement of testing processes and product quality.

An Impact that Matters

The above brings us to a key question: what impact does the Deloitte Digital Tester make once an organization chooses to implement it? Especially for projects with multiple releases, the benefits can be very high when compared to the initial investment. What’s important to note is that the business case behind implementing the Digital Tester will depend on the testing maturity level of the organization. The stronger an organization’s testing capabilities are, the faster the Digital Tester will break even and start to provide ongoing benefits and efficiencies compared to traditional automation or manual testing. However, even at low maturity levels, the Deloitte Digital Tester allows an organization to break even 2-4 months earlier compared to manual testing.

Let’s assume that an organization is running on average 500 test cases per month, and they choose to automate up to 80% of their testing lifecycle efforts [1]. If we are looking at a timeline of 1 year, we can distinguish 2 phases:

Ramp-Up Period (4 months in case of high maturity): The Digital Tester requires an initial period during which the solution is trained, and its automation capabilities are being built. If more traditional testing is already going on during this time, this will require more resourcing to maintain these efforts in parallel.
Benefit Realization Period (8 months & beyond): Once implemented, efficiencies and automation come into play. This drastically reduces any efforts to automate activities and sets the stage for manual testing to be much more focused and exploratory, for example cross-functional end-to-end-testing, business interaction testing and risk-based deep dives.

[1] The complexity distribution of test cases in this example is 30% simple, 30% medium, 20% complex, 20% very complex.

In this particular use case, we observe the following quality outcomes by increasing test coverage and next-gen automation:

Future-Proofing your Testing Capabilities

The advent of Generative AI, exemplified by the Deloitte Digital Tester, marks a big step forward in the evolution of software testing and quality assurance. As our world becomes ever more complex, increased consumer scrutiny, regulations and market trends are challenging organizations in many ways to be faster and more agile. In this context, it’s hard to overestimate the increased importance of testing new software applications in the most efficient and thorough way possible. The Deloitte Digital Tester can play a pivotal role in taking your testing capabilities to the next level. This enables you to ensure the smoothest and safest end-user experience for products and services, crucial to maintaining the trust of all stakeholders. It is often said that building a reputation takes years but losing it can happen in a matter of days or even hours.

Rising to meet these challenges, the Deloitte Digital Tester represents a pivotal shift towards a future where AI-driven automation augments human testing capabilities, upgrading the way we approach software testing. As organizations embrace this technological evolution, they not only future-proof their testing processes but also pave the way for innovation and excellence in the way they deliver their products and services.

Contact Details

Thomas Clijsner, Partner, Deloitte Risk Advisory

Tel: +32 479 65 06 96 Email: tclijsner@deloitte.com

Rohit Neil Pereira, Principal, Deloitte Consulting LLP

Tel: +1 916 803 0079 Email: ropereira@deloitte.com

Ramneet Singh, Director, Deloitte Risk Advisory

Tel: +32 471 61 89 67 Email: ramnesingh@deloitte.com

Dirk Evrard, Manager, Deloitte Risk Advisory

Tel: +32 472 75 92 03 Email: dievrard@deloitte.com

Deloitte is an Exhibitor at EuroSTAR 2024, join us in Stockholm

Lowering Testing Barriers with Computer Vision-Based AI

April 30, 2024 by Lauren Payne

The constant surge in digital transformation forces organizations to test on an increasing number of platforms while still getting results as quickly as possible. Manual testing just can’t keep up with the speed of business, so teams turn to test automation. But traditional software testing tools are only effective to a point. Generally, these tools identify objects on the screen through their internal representation, e.g., coordinates, class name, type, and many others. This method of identifying objects can be very fragile. Even a small change might result in the tool failing to find the object. The drawbacks of these techniques prevent teams from scaling their test automation efforts up to the levels they require.

To that end, the most common test automation challenges include:

Relentless test maintenance: Tests that rely on unique object properties break easily, forcing testers to update them regularly so that they still run on each supported environment.
Test execution time is too long: Even if a test set runs without interruption, it can take a significant amount of time to run all the tests to completion.
Insufficient test coverage: Teams must support an ever-expanding range of platforms, devices, and operating systems, requiring testers to customize the tests for each environment.
Test creation fatigue: It takes time to build and design effective tests, with much effort required to uniquely identify on-screen objects that are part of the test.

What our research at OpenText revealed was that automated object detection with computer vision is key to lowering these barriers.

Computer Vision-Based AI for Automated Object Detection

Recognizing objects without knowledge of their internal representation is a key objective in developing an AI engine. This goal can be accomplished by combining AI-based algorithms that accurately and consistently recognize objects regardless of device or environment.

For example, a test step might require clicking the shopping cart icon on a mobile app. The AI engine should be able to locate the shopping cart icon on the current screen without knowing:

If the screen is on a mobile device.
Whether the device is running Android or iOS.
If the screen is a desktop browser.
Whether it’s Chrome, Firefox, Edge, or another browser.

The “Click the shopping cart” step should work under any circumstance when the AI engine uses computer vision, backed by an artificial neural network and optical character recognition.

Why Computer Vision?

An AI engine understands a screen’s composition and breaks it down into the unique objects that it contains. Additionally, the AI engine knows nothing about the implementation of the object. It treats the object as an image, regardless of the device or platform it comes from. As such, a powerful computer vision tool is needed, and it should be supported by an artificial neural network (ANN), a layered structure of algorithms that classify objects. The ANN is trained on many visual objects, resulting in a model that identifies objects it will likely encounter in applications under test (AUT). Thus, when the AI engine is tasked with locating a specific object, it utilizes the model to identify a match in the AUT.

In terms of architecture, a best practice is to implement the AI engine as a separate module. Rather than restricting it to a specific product, any product can theoretically use the engine.

OCR-Based Identification for Text Objects

AI engines also need to leverage OCR to identify text-based objects. These objects may themselves be part of the test, or they could function as a hint to identify the object’s relative location. This capability is useful if an object appears multiple times on a screen. For example, a login screen might have two text boxes, one for the username and one for the password. OCR helps identify which of the edit boxes is which. OCR can also identify a button by its textual caption.
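The login-screen example can be made concrete with a small sketch. It assumes the vision model and OCR have already produced lists of detected objects and recognized text with screen coordinates, and it pairs them with a simple nearest-distance rule. The data classes, the function name, and the heuristic itself are hypothetical illustrations, not a description of how any particular AI engine works.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Detected:
    kind: str   # object class assigned by the vision model, e.g. "text_box"
    x: float    # center coordinates in screen pixels
    y: float


@dataclass
class OcrText:
    text: str   # string recognized by OCR
    x: float
    y: float


def find_near_label(objects, texts, kind, label):
    """Return the object of the given kind closest to the OCR text `label`."""
    anchors = [t for t in texts if t.text.lower() == label.lower()]
    candidates = [o for o in objects if o.kind == kind]
    if not anchors or not candidates:
        return None
    anchor = anchors[0]
    return min(candidates,
               key=lambda o: hypot(o.x - anchor.x, o.y - anchor.y))


# Two visually identical text boxes, told apart by the captions next to them
objects = [Detected("text_box", 200, 100), Detected("text_box", 200, 160)]
texts = [OcrText("Username", 80, 100), OcrText("Password", 80, 160)]
password_box = find_near_label(objects, texts, "text_box", "Password")
```

Here the second text box is selected because the “Password” caption sits on the same row; a production engine would need a more robust notion of layout than plain distance.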

Lowering and Removing Test Automation Barriers

AI-based test automation reduces the time it takes to build and design tests because objects are identified simply by looking at them. AI algorithms lower skill barriers because they identify most objects and are hidden from the user. Teams can also use the same test without modification on different devices and platforms. They simply procure an appropriate device and run the test on it as-is. And because the algorithm doesn’t rely on an object’s underlying implementation and properties, the test keeps running even if there is a change. If the test’s flow stays the same, the test will continue to run.

The final barrier yet to be removed completely is test execution time. Tests will always take a finite time to run; hence there is a lower limit on the amount of time they take. However, AI-based testing helps teams test earlier and provides robust mechanisms that parallelize and optimize test execution, reducing the wait time for results.

Author

Michael O’Rourke, Product Marketing Manager, DevOps Cloud

Michael O’Rourke is a product marketing technologist in cloud, enterprise software, and DevOps. His diverse background derives from 20 years of experience at HPE, IBM, T-Mobile, Micro Focus, and more. He holds a degree in Management Information Systems and is a certified Product Owner, Scrum Master, PMP, and Pragmatic Marketing practitioner. He is also an international speaker, trainer, and blogger. At OpenText, Michael drives the development and execution of go-to-market strategies for OpenText’s DevOps Cloud.

OpenText is an EXPO Gold Sponsor at EuroSTAR 2024, join us in Stockholm.

The Essentials of Test Data Management in Modern Software Development

April 25, 2024 by Lauren Payne

In today’s fast-paced software development world, Test Data Management (TDM) is more than a technical necessity; it’s a strategic asset. Let’s unpack the essentials of TDM and how it influences the quality, efficiency, and compliance of software testing.

The Core of Test Data Management

At its heart, TDM is about efficiently creating and managing data used for testing software applications. This involves ensuring the data is realistic, comprehensive, and secure, enabling testers to simulate real-world scenarios accurately.

Key Challenges in Test Data Management

Data Complexity: Modern applications demand complex and diverse data sets. TDM solutions must provide ways to generate and manage these data sets efficiently.
Data Privacy and Compliance: With regulations like GDPR, ensuring test data complies with privacy laws is crucial. TDM plays a vital role in anonymizing and protecting sensitive information.
Efficient Test Data Management: Balancing the need for quality data with storage and performance constraints requires efficient management of test data, often across multiple environments.

Approaches to Effective Test Data Management

Data Insight: Understanding the structure and dependencies within your data is vital. Data insight tools aid in creating more effective and relevant test data by providing a deeper understanding of the underlying data.
Data Masking: A critical aspect of TDM, data masking involves obscuring sensitive data within a test dataset. It ensures that the privacy and integrity of personal or confidential data are maintained, while still providing a functional dataset for testing.
Synthetic Data Generation: This involves creating artificial, non-sensitive data that closely mimics real-world data, addressing both complexity and privacy concerns.
Data Subsetting: This approach focuses on creating smaller, more manageable versions of your databases that contain only the data necessary for specific tests. It helps in reducing storage requirements and improving the performance of test environments.
Database Virtualization: Virtualizing databases allows for the creation of multiple, isolated test environments without physically replicating data. It’s essential for managing test data across different scenarios efficiently and reducing storage costs.
Automated Test Data Provisioning: Automation in TDM can significantly reduce the time and effort required to prepare test data, leading to more agile and efficient testing cycles.
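
As an illustration of the data masking approach, here is a minimal sketch of deterministic masking: the same input always maps to the same pseudonym, so records in different tables still join after masking. The function names, the "email" field, the sample rows, and the hard-coded key are assumptions made for the example; real TDM tools handle many more data types, and this sketch makes no claim about regulatory sufficiency.

```python
import hashlib
import hmac


def mask_email(email, secret=b"demo-secret"):
    """Replace an email with a stable pseudonym derived from a keyed hash."""
    digest = hmac.new(secret, email.lower().encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return f"user_{digest[:10]}@example.com"


def mask_rows(rows, secret=b"demo-secret"):
    """Return copies of the rows with the 'email' field masked."""
    return [{**row, "email": mask_email(row["email"], secret)} for row in rows]


# Hypothetical sample data: the same person appears in two tables
customers = [{"id": 1, "email": "alice@corp.test"}]
orders = [{"order": 77, "email": "Alice@corp.test"}]

masked_customers = mask_rows(customers)
masked_orders = mask_rows(orders)
```

Both masked tables now carry the same pseudonymous address for this customer, while the original rows are left untouched.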

The Impact of TDM on Software Development

Implementing robust TDM strategies leads to:

Improved Software Quality: Accurate and comprehensive test data ensures more effective testing, leading to higher-quality software.
Enhanced Compliance: With proper data masking and anonymization, TDM helps in maintaining compliance with data privacy laws.
Increased Efficiency: Automated and streamlined TDM processes contribute to faster testing cycles, reducing time-to-market for software products.

Conclusion

Test Data Management is an indispensable part of modern software development. Its impact on software quality, compliance, and efficiency cannot be overstated. Whether you’re a developer, a QA professional, or a project manager, understanding and implementing effective TDM practices is key to the success of your software projects. Tools like DATPROF play a supportive role in this journey, offering practical solutions to the complex challenges of TDM. Come meet us at EuroSTAR to learn more and see DATPROF in action!

Author

Maarten Urbach

Maarten Urbach has spent over a decade helping customers enhance test data management. His work focuses on modernizing practices in staging and lower-level environments, significantly improving software efficiency and quality. Maarten’s expertise has empowered a range of clients, from large insurance firms to government agencies, driving IT innovation with advanced test data management solutions.

DATPROF is an exhibitor at EuroSTAR 2024, join us in Stockholm.

Top 10 Quality Issues to Solve at EuroSTAR 2024

April 23, 2024 by Lauren Payne

As we approach another EuroSTAR in Stockholm, many of us in IT and testing are reflecting on how we can improve our processes and strategies. It will be halfway through 2024, a time of year when doubts and concerns can creep in about our testing goals and improvements.

As you review your software quality strategy, I’d like you to reconsider our impulse towards ever-increasing test automation. Are we falling into the trap of trying to eat faster to lose weight? By only accelerating our efforts, we fail to confront the real root causes of testing inefficiencies and bugs.

You can’t automate quality into software

Just as diet fads promise thinness through gimmicks, we’ve been sold a fantasy. It promises us that more test automation will solve all our quality problems. But, while judicious automation provides value, many teams over-invest in automation at the cost of broader quality blockers.

When you have a hammer, everything looks like a nail, so teams hammer away endlessly to construct vast automated architectures. Meanwhile, quality lingers at the same mediocre levels.

10 Software Quality Issues to Address at EuroSTAR 2024

A common set of fundamental issues plagues software projects. Teams often cite problems like:

Confidence and Stability – Frequent defects erode trust in releases
Defects into Production – Poor protection of live environments
Insufficient Test Time – Perpetual last-minute “hardenings”
Release Uncertainty – Go/no-go decisions go down to the wire
Failing Requirements – Poorly defined scope leads to endless clarifications
Developer Rework – High levels of unplanned work
Team Misalignment – Lack of transparency across functional groups
Knowledge Silos – Bottlenecks form around key people or tools
Bloated Testing – Massive, unwieldy automation suites requiring heavy maintenance
Technical Debt – Volumes of (re)work build over time, with insufficient knowledge to tackle it

Rather than focus on accelerating test execution speed, we need to confront why these problems arise in the first place. Increasing execution automation acts as a bandage; quality gaps stem from deeper process and strategy issues.

From silver bullets to software quality

At EuroSTAR 2024, let’s resolve to understand these root causes and thoughtfully solve them. For example, what drives unstable requirements? Is our analysis happening too late? What drives last-minute surprises? Are we integrating and testing incrementally? Do our teams have transparency to coordinate their efforts? Are our tools and environments configured efficiently?

Thoughtful process analysis and improvement are less flashy than automation, yet far more impactful. Techniques like value stream mapping can uncover waste and barriers. Then, we can apply lean principles like limiting work in progress, optimizing flow, and amplifying feedback loops.
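As a rough illustration of what a value stream map can reveal, the sketch below computes flow efficiency (active time divided by total lead time) and finds the stage with the longest queue. The stage names and hours are hypothetical numbers made up for this example, and `Stage`, `flow_efficiency` and `biggest_wait` are illustrative helpers, not part of any tool mentioned here.

```python
from typing import NamedTuple, Sequence


class Stage(NamedTuple):
    name: str
    active_hours: float   # time someone is actually working on the item
    waiting_hours: float  # time the item sits in a queue at this stage


def flow_efficiency(stages: Sequence[Stage]) -> float:
    """Share of total lead time spent on active work (0.0 for an empty map)."""
    active = sum(s.active_hours for s in stages)
    total = active + sum(s.waiting_hours for s in stages)
    return active / total if total else 0.0


def biggest_wait(stages: Sequence[Stage]) -> Stage:
    """Stage with the most waiting time, a first candidate for improvement."""
    return max(stages, key=lambda s: s.waiting_hours)


# Hypothetical value stream for a single change (all numbers are assumptions).
stream = [
    Stage("analysis", 4, 20),
    Stage("development", 16, 8),
    Stage("testing", 8, 40),
    Stage("release", 2, 30),
]

if __name__ == "__main__":
    print(f"Flow efficiency: {flow_efficiency(stream):.1%}")
    print(f"Longest queue: {biggest_wait(stream).name}")
```

A low flow efficiency with one dominant queue suggests that limiting work in progress at that stage may help more than running the existing tests faster.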

Rather than mindlessly generate more test cases, we should carefully curate automated checks to maximise value. Shifting left helps prevent defects, while good pipelines and test data strategies better isolate changes to fail fast. Teams skilled in exploratory testing and bug advocacy can further spotlight weaknesses early.

A measured (and measurable) approach to software quality

Let’s ring in EuroSTAR 2024 with renewed discipline against reactive thinking. Measure first, understand next, then optimize sustainably. Partner with stakeholders to align priorities. Anchor automation in business needs, not false promises of all-encompassing test suites. Spend smart to conserve budget for high-impact interventions.

Test excellence comes not from hasty automation, but from thoughtful rigor, transparency, and accountability. Progress may seem slower, but it leads to stable, high-velocity teams. Development, testing, and operations must come together as one delivery team sharing data, tools, and practices.

By taking a measured, evidence-based approach, we can target the disease rather than just treat the symptoms. Just as sustainable diets come from lifestyle changes, let’s commit to curing our quality ills through systems thinking.

This year, at EuroSTAR, let’s fix the fundamentals. Our automation will still be there to serve us, at sustainable velocities and capacities serving downstream needs. Set aside reactionary tactics, and instead bank quality through proactive strategies. Another EuroSTAR brings new perspectives, if we remain open to self-reflection and growth.

Restoring Confidence and Alignment with Curiosity Modeller

I speak to many organizations who experience the recurring quality issues and process misalignments discussed in this blog, each eroding their release confidence.

These challenges all have common roots:

1. Lack of transparency;

2. Incomplete system comprehension;

3. Inadequate feedback loops;

4. Unconnected teams.

Too often software gets built fast then tested slow. Teams lack shared artifacts to capture decisions and expected behaviours, undermining unified understanding.

Curiosity Modeller tackles these systemic issues by making system behaviour explicit early through collaborative models. These living models form the core artifact driving understanding, alignment and test generation.

Curiosity Modeller restores confidence and release quality by:

Visualizing expected functionality clearly across groups – no more hidden assumptions or differing interpretations of requirements.
Auto-generating optimal test cases to validate actual vs intended behaviour – preventing defects via early testing and signalling.
Producing regenerative tests tied directly to the models – no more realigning stale regression suites or maintaining copious test automation artifacts.
Enabling behaviour simulation for rapid prototyping – failing fast to prevent downstream rework.
Integrating with test execution and auto-generating Test Automation – overcoming misalignment, endless maintenance and skills silos.
Supporting API testing to safely exercise business logic – going beyond fragile end user flows.
Generating high-value test data to focus coverage on key scenarios – informed by risk models.

Shift left to deliver quality

Instead of intensifying downstream testing, Curiosity Modeller shines a light early in the lifecycle. Visual flows form the central artifact that aligns groups on system behaviour and helps prevent defects before code gets written. This proactive approach restores trust, accelerates releases, facilitates coordination and uplifts quality engineering. It delivers confidence through deep comprehension.

Find us at EuroSTAR 2024!

The Curiosity team will be in the EuroSTAR Expo hall in Stockholm – drop by to discuss how you can build software confidence early and throughout your delivery pipeline. Before then, why not head to our website to learn more about Curiosity Modeller, try it for yourself, and talk to us about your quality needs?

Author

Rich Jordan

Rich Jordan has spent the past 20 years leading change within the testing industry, primarily within Financial Services. He has led enterprise transformations and quality teams who have won awards in both Testing and DevOps categories. Rich has been an advocate of model-based test automation and test data innovation for over a decade, and joined Curiosity in November 2022.

Curiosity Software is an exhibitor at EuroSTAR 2024. Join us in Stockholm.

Power Up Your Test Automation Practices With AI: Unlock Key Use Cases

April 9, 2024 by Lauren Payne

With the rapid pace of development cycles and the complexity of modern software systems, manual testing alone often can’t meet the demands of quality assurance. This is where test automation comes into play, offering efficiency, accuracy, and scalability.

However, even with automation, challenges can still arise, such as maintaining test scripts, handling dynamic user interfaces, and detecting subtle defects. Enter AI, a game-changer poised to revolutionize test automation.

By infusing AI and ML into test automation, testers can build better automations faster through supercharged productivity, as well as improve accuracy and time-to-value through combining Generative AI and Specialized AI. Plus, testers can unlock new use cases by building AI-powered automations.

So, what are some of the top uses for AI and ML in testing that can supercharge your application testing practices?

Deploy an agent that performs testing fully autonomously

An AI-powered agent can seamlessly tackle the challenge of finding critical problems in your applications, as it can interact with an application constantly. Then, the agent can build a model of your application, discover relevant functionality, and find bugs related to performance, stability, and usability. An agent can also aid in creating a resilient object repository while navigating through a target application, gathering reusable controls for future test case development. The potential of AI doesn’t stop there—the agent can then continuously verify and refresh controls within an object repository, enabling self-healing and maintaining automated tests.

Generate automated low-code and coded tests from step-by-step manual tests

Have manual tests that you want to convert to automated tests? With the power of AI, you can accelerate automation by generating automated low-code and coded tests from manual tests, as well as leverage a flexible automation framework to ensure the resilience of your automated tests. And remember the object repository that your AI-fueled agent helped create? Equipped with this object repository, you can use AI to consider and smartly reuse any kind of object, such as buttons, tables, and fields.

Create purposeful and complex test data

With AI-infused large language models, you can supercharge your data through enhanced synthetic test data generation for manual and automated test cases. Using AI also enables you to create meaningful test data faster, allowing you to handle intricate data dependencies across multiple test data dimensions.

Streamline automated localization testing by leveraging semantic translation

By integrating AI into your test automation practices, you can leverage semantic automation and translation to remove the need for creating separate test cases for each language. The result? Maximized efficiency through seamless automated localization testing. Plus, you can run your automated test cases in different languages, allowing you to expand and scale your testing capabilities globally.

Overall, there’s unlimited potential for AI to supercharge continuous testing across the entire lifecycle—from defining stories, to designing tests, to automating and executing tests, to analyzing results.

UiPath Test Suite for AI-powered test automation

UiPath Test Suite, the resilient testing solution powered by the UiPath Business Automation Platform, offers production-grade, AI-fueled, low-code, no-code, and coding tools so you can automate testing for any technology while still managing testing your way. Later this year, you’ll be able to unlock AI-infused use cases for test automation, such as test generation, coded automations, and test insights, with Autopilot for Test Suite.

Author

Sophie Gustafson, Product Marketing Manager, UiPath Test Suite

Sophie Gustafson has worked at UiPath for two years and is currently a product marketing manager for Test Suite. Sophie has previous experience working in the consulting and tech industries, specializing in content strategy, writing, and marketing.

UiPath is an EXPO Platinum Partner at EuroSTAR 2024. Join us in Stockholm.

Empowering Enterprises with Seamless Test Execution on a Unified Test Execution Environment

April 2, 2024 by Lauren Payne

The digital landscape is evolving every day, and ensuring software quality is extremely important. To ensure that applications meet the standards of functionality, reliability, and performance, businesses rely on extensive testing practices. Nevertheless, the sheer complexity and size of current software systems create many hurdles to running tests successfully and efficiently.

Overseeing test execution gets harder as businesses mature and their software ecosystems get more and more complex. Traditional approaches often result in inefficiencies, delays, and increased expenses because they use diverse tools, fragmented processes, and fragmented teams.

These challenges are easily resolved with a unified test execution infrastructure, providing an integrated structure for managing and carrying out tests over the entire software development lifecycle. Enterprises can broaden test execution with ease and maximize efficiency and quality via a unified infrastructure, which integrates testing tools, standardizes processes, and fosters cooperation.

Unified Test Execution – The Need of the Hour

Businesses frequently use an assortment of testing frameworks and tools to meet distinct technological and testing requirements. However, supporting this fragmented ecosystem can be challenging and can cause problems with compatibility, integration, and overhead.

When teams or projects function independently in siloed test environments, the result can be duplication, inaccurate testing procedures, and a lack of visibility across the operation. Silos can also hinder communication, limit teamwork, and reduce the effectiveness of the testing process as a whole.

Establishing consistency, repeatability, and scalability in test execution requires standardizing testing procedures and centralizing testing infrastructure. Enterprises can gain greater oversight of and insight into their testing efforts, enhance resource utilization, and accelerate workflows by implementing a unified approach to testing.

LambdaTest: Empowering Enterprises with AI-driven Test Execution

The unified test execution environment offered by LambdaTest changes the way businesses plan, organize, and execute their testing activities. LambdaTest’s range of AI-powered capabilities enables enterprises to increase test efficiency, enhance test infrastructure management, and deliver higher-quality software at scale.

Through a range of capabilities, LambdaTest uses artificial intelligence (AI) to improve testing processes. Its Auto Heal feature recognizes and fixes issues with the test environment in real time, minimizing interruptions and keeping test runs moving. Fail-fast capabilities surface test failures promptly, so teams can address issues early in the development cycle and resolve them sooner. In addition, the Test Case Prioritization functionality uses AI algorithms to rank test cases by their impact and likelihood of failure. With this approach, teams can focus on high-risk areas, increase testing coverage within tight schedules, and address important issues quickly, which reduces time-to-market and improves software quality.

Moreover, GPT-powered RCA (Root Cause Analysis) offers deeper insights into the underlying causes of test failures by analyzing test results and historical data. By identifying patterns, trends, and potential correlations, the AI engine enables teams to address root causes effectively and prevent the recurrence of issues. Furthermore, the Test Intelligence module provides actionable insights derived from comprehensive test data and analytics.

By aggregating metrics, performance indicators, and user feedback, LambdaTest empowers teams to make informed, data-driven decisions, optimize testing strategies, and continuously enhance software quality.
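To make the idea of ranking tests by impact and likelihood of failure more concrete, here is a minimal sketch in Python. It is not LambdaTest's algorithm, which this article does not describe. The `CaseInfo` fields, the choice to multiply the two scores, and the greedy selection within a time budget are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CaseInfo:
    name: str
    impact: float              # assumed 0..1 score for business impact
    failure_likelihood: float  # assumed 0..1 estimate, e.g. from failure history
    duration_s: float          # expected run time in seconds


def risk_score(case: CaseInfo) -> float:
    """Simple risk model (an assumption): impact times likelihood of failure."""
    return case.impact * case.failure_likelihood


def prioritize(cases: Sequence[CaseInfo], budget_s: float) -> List[CaseInfo]:
    """Order by descending risk, then greedily pick cases that fit the budget."""
    ordered = sorted(cases, key=lambda c: (-risk_score(c), c.duration_s))
    selected: List[CaseInfo] = []
    remaining = budget_s
    for case in ordered:
        if case.duration_s <= remaining:
            selected.append(case)
            remaining -= case.duration_s
    return selected


# Hypothetical suite; every number here is invented for the example.
suite = [
    CaseInfo("login", impact=1.0, failure_likelihood=0.1, duration_s=30),
    CaseInfo("checkout", impact=0.9, failure_likelihood=0.5, duration_s=60),
    CaseInfo("profile", impact=0.2, failure_likelihood=0.2, duration_s=20),
    CaseInfo("search", impact=0.5, failure_likelihood=0.6, duration_s=90),
]

if __name__ == "__main__":
    for case in prioritize(suite, budget_s=120):
        print(case.name, round(risk_score(case), 2))
```

With a 120-second budget, the riskiest case runs first and a long, medium-risk case is skipped in favour of shorter ones that still fit. A real prioritizer would likely learn the scores from history rather than take them as fixed inputs.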

Conclusion

LambdaTest’s unified test execution environment, enriched with AI features such as Auto Heal, fail-fast, Test Case Prioritization, GPT-powered RCA, and Test Intelligence with test insights, represents a significant advancement in enterprise test automation. By harnessing the power of AI, LambdaTest empowers organizations to streamline test execution, mitigate risks, and deliver superior software products that meet the demands of today’s dynamic market landscape.

Author

Mudit Singh

Mudit Singh is a product and growth expert with 12+ years of experience building great software products. As part of LambdaTest’s founding team, Mudit has been deep-diving into software testing processes, working to bring all testing ecosystems to the cloud. He is currently Head of Marketing and Growth for LambdaTest.

LambdaTest is an EXPO Gold Sponsor at EuroSTAR 2024. Join us in Stockholm.
