Retry harder – why rerunning tests is a bad idea?

It’s a sunny day in the office. You got there earlier than anybody else, ’cause you’re a frikin’ overachiever. You grabbed a big cup of strong coffee and a big glass of water to stay hydrated, sat down in your spot and looked at your good old friend, the CI. But what do you see there?! A failure, a red run, a problem, what a blasphemy… But you know what to do: re-run, re-build, repent all your testing sins, and 30 to 90 minutes later… voilà, a green run (or repeat the above steps until it is green; all steps besides the coffee, otherwise you’ll OD). Does the above sound familiar? Well, if it doesn’t, it’s what your colleagues do when they “fix flaky tests”.

How many times do you need to be hit in the same spot to conclude it hurts?

 

Quite honestly, rerunning tests sometimes reminds me of that Tom and Jerry cartoon where Tom steps on a shovel, a rake, and other garden tools, one after another.

So, to all the rerunning guys: if you walk under an apple tree and an apple falls and hits your head, will you shake the tree so another one falls and you can verify that it hurts indeed?

If a fire alarm goes off, would you stand there to see if it simply goes away, or would you evacuate?

Of course not. In both cases you would just acknowledge that there is a problem and act accordingly. If you try to silence the “alarm” and pretend the problem doesn’t exist anymore, you’re simply burying your head in the sand.

So, where is the problem, really?

The simple math behind rerunning tests.
“We added some stabilizations to tests by adding retry and rerun” is a phrase you will often hear from devs and automation QA.
It is quite simple to understand how rerunning tests makes things appear as if “we stabilized the tests”.
Here’s a simple formula to find the percentage of failing test runs:
(number of failing test runs / total test runs) * 100
The last part just turns the fraction into a percentage.
Example: Let’s say you have a test that fails 10 out of 100 runs. Using the above formula, that’s:
 (10/100) * 100 = 0.1 * 100 = 10% failure rate (in fact, this was pretty obvious)
Now let’s say you add “rerun” for your flaky test in order to “stabilize it”, so every failing run is rerun 3 more times, and let’s assume those reruns pass. The formula would look like this:
 100 total runs + (10 failing runs * 3 reruns each) = 130 runs, so (10/130) * 100 = 7.69% failure rate.
Essentially, what you’re doing is bumping the total number of runs, so what you report is a smaller fraction of a bigger overall count. You’re diluting the failures into a bigger bucket of runs. This is fraud, not a solution!
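To make the dilution visible, here is a minimal Python sketch of the arithmetic above. The function names are made up for illustration, and it assumes the simplest case: each of the 10 failing runs gets 3 reruns, and every rerun passes, which gives the 130 total runs from the example.

```python
def failure_rate(failing_runs: int, total_runs: int) -> float:
    """Percentage of runs that failed."""
    return failing_runs / total_runs * 100


def diluted_failure_rate(failing_runs: int, total_runs: int, reruns_per_failure: int) -> float:
    """Failure rate once reruns are added to the total.

    Assumption: every rerun passes, so the number of failures stays the same
    and only the denominator grows.
    """
    extra_runs = failing_runs * reruns_per_failure
    return failure_rate(failing_runs, total_runs + extra_runs)


print(f"{failure_rate(10, 100):.2f}%")             # 10.00%
print(f"{diluted_failure_rate(10, 100, 3):.2f}%")  # 7.69%
```

Nothing about the test changed between the two lines. Only the denominator did.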

False alarms and courses of action

This is indeed what flaky tests are – false alarms, broken alarms, alarms that don’t operate as expected.

If you have a defective alarm, what would you do? Well, fix it. Have you ever heard of a fireman who “reruns the alarm” 5 more times to prove to you there’s no fire while your ass is burning? No.

So, there’s one single meaningful thing to do – fix the damn test. If you can’t fix it, delete it, you’re better off without it.

To me, there isn’t really a dilemma here. The course of action goes like this:

  • Your test failed because it found a bug: you need to spend time, investigate, log a defect, test the fix, re-run your test to prove the fix works and be done with it.
  • Or your test is indicating a problem with the test itself: you need to spend time, investigate, fix your test, re-run it a couple of times to make sure the fix works and be done with it.

See the part that repeats? Investigate – “every failing test is an invitation for investigation” (Maaret Pyhäjärvi) and that’s what you need to do – invest yourself in order to do something meaningful with that information, not fool yourself and your colleagues.
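
The “re-run a couple of times” step in the list above is the opposite of rerunning until green: you repeat a test you have already fixed and count every failure, instead of stopping at the first pass. A rough Python sketch of that idea follows. The helper `count_failures` and the test `test_fixed` are hypothetical names, not part of any framework.

```python
from typing import Callable


def count_failures(test: Callable[[], None], runs: int) -> int:
    """Run `test` `runs` times and return how many runs raised AssertionError."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
    return failures


def test_fixed() -> None:
    # Hypothetical stand-in for the test you just repaired.
    assert sum([1, 2, 3]) == 6


failures = count_failures(test_fixed, runs=20)
print(f"{failures} failures in 20 runs")  # 0 failures in 20 runs
```

Any number above zero means the investigation isn’t over yet.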

Some lame excuses for rerunning tests

The most common excuses you will hear to rerun tests:

  • I rerun a test to find out if it will fail again.
    Well, it failed once, isn’t that enough? Plus, why do you stop when it passes? Is it no longer a problem? I’d rather spend the time finding the hidden strings attached to that test, the ones that make it fail once and pass the next time for no obvious reason.
  • I rerun to see if it works on other platforms.
    That’s a good point, but you need to actually run it on another platform in order to prove it. Whether it works there or not, you still have an issue.
  • I rerun tests, because all of my colleagues do.
    Oh, c’mon man! Have some dignity!
  • We rerun tests, because we know they are unstable, but we don’t have time to fix them.
    Well, you’re making things worse. The only thing worse than a failing test is a useless test, and a test that doesn’t provide accurate results is useless. You could skip these tests and it wouldn’t make any difference.
  • Etc

Rerunning tests isn’t an honest, effective or moral QA practice

I guess one of the things that drives me mad when it comes to rerunning failing tests is that it isn’t moral to do so. Like, who the hell do you think you’re fooling? Yourself? Your colleagues?

If our job as QA/testing specialists is to provide a better-quality product and to give stakeholders or internal clients better insight into the risks and the status of the product, how the hell does lying to them fit into that?

Is it worth it to sacrifice quality, business success and our valuable time so we can protect some made up, fake reputation we have as automation engineers?

 So, tell me – what was the lamest excuse you last heard for rerunning a failing test? 😊

 

Mr.Slavchev

Senior software engineer in testing. The views I express here are mine, they don't represent any position held by any of my employers. Experience in mobile, automation, usability and exploratory testing. Rebel-driven tester, interested in the scientific part of testing and the thinking involved. Testing troll for life. Retired gamer and a beer lover. Martial arts practitioner.

4 thoughts on “Retry harder – why rerunning tests is a bad idea?”

    1. Great post, the worst way you could spend your time is rerunning tests you know are failing. Several times I found myself retesting cases that had failed once and then passed. Someone said: “failed APIs, unstable staging environment, dirty QA users and so on”. Thanks for sharing.

  1. “how the hell does lying to them fits in the above?”
    Because we already lied, and/or were lied to, that typical UI or API test execution automation (running unattended on a server during the night) is the only way we can use automation specifically, and development more generally, in testing.
    We don’t do semi-automation, like using it to speed up our more exploratory testing.
    Even more, we don’t develop anything aside from automation, no tools for anything else, such as gathering data from different places and maybe converting it, or downloading artifacts from our CI server.

    Unattended test execution is only one tool out of many in our toolbox.
    But the industry is obsessed with it, and most don’t get that a toolbox exists, even though they have the skill to build one: development.

    Your article was a nice read, thanks for that!
