Error, fault and failure – what’s it all about?

Since I recently started my career as a QA, I am still learning lots of new things. I won’t lie: all the definitions from textbooks and ISTQB books still look a bit blurry to me, so every new breakthrough I achieve matters. That was the case with the definitions of “error”, “fault” and “failure”. At the beginning all these terms looked exactly the same to me. I thought, “Why would anyone give a f**k what the definition is? It has to be fixed, doesn’t it?” Well, that’s what we are about to figure out.
The first thing to mention: in the software business it’s all about credibility, and every poor-quality software product gets kicked out at the speed of light. There’s a huge variety of software companies, and if you don’t keep up top standards, you suck. You need to provide the best quality of software. Software without bugs is a utopia, but a good software company doesn’t release a product with major issues, because they might cost a lot.

So let’s dive into the definitions. Error, fault and failure might look similar, but they actually have different impacts.
Let’s say we have a scenario like this:

A bank asks us to write a web-based application that has only one job: to transfer money from a client’s bank account to a cash desk where the client can withdraw it. It only has a login form and one input field for the amount of money the client wants to transfer. They push one button and it’s done. Let’s say we forgot to test whether the input accepts a negative value, and we release. Now, if a client inputs a negative value, he’ll actually gain money. This is a bug, we all know this, and if it goes live you will cry.

Now, I know that’s a dumb example, but it’s simple enough to notice a few things:
What caused the problem is a simple “error”, a human mistake, probably in one or two lines of code.
In this case maybe a simple check like this will do:

[code]if (input < 0) {
    return; // return some error here instead of making the transfer
}[/code]

But the exact fix doesn’t actually matter. What does matter is “How much does it cost the developer to fix it?” Well, as we can see, it’s just a matter of reviewing some code, finding the error and fixing it.
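To make the scenario a bit more concrete, here is a minimal sketch in Python. It is a made-up illustration, not real banking code: the names `transfer_buggy`, `transfer_fixed` and `InvalidAmount` are hypothetical, and the account balance is assumed to be a plain number. It shows how the missing check lets a negative amount add money to the account, and how small the fix is.

```python
class InvalidAmount(ValueError):
    """Hypothetical error raised when the requested amount is negative."""


def transfer_buggy(balance, amount):
    # The error: nobody checks the sign of the amount.
    # Returns (new account balance, cash sent to the cash desk).
    return balance - amount, amount


def transfer_fixed(balance, amount):
    # The one-or-two-line fix: reject negative amounts up front.
    if amount < 0:
        raise InvalidAmount("amount must not be negative")
    return balance - amount, amount


# A negative amount makes the buggy version increase the balance.
print(transfer_buggy(100, -50))  # (150, -50)
print(transfer_fixed(100, 30))   # (70, 30)
```

At this stage the whole cost of the problem is reading a few lines and adding one condition.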

If he misses it, it goes to the QA. If the QA finds it there, we have the second definition, a “fault”: the error has left a defect in the app, and the user is able to input an invalid value. We log a bug, we fix it, and it’s still cool and safe for the client.
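Here is a rough sketch, again in Python, of the kind of check a QA could run to catch this fault before release. Everything in it is an assumption for illustration: `transfer` stands in for the code under test, `find_faults` is a hypothetical helper, and a correct implementation is assumed to reject bad input by raising `ValueError`.

```python
def transfer(balance, amount):
    # Hypothetical code under test; the sign check is missing.
    return balance - amount


def find_faults(transfer_fn):
    """Run a few boundary checks and return descriptions of what went wrong."""
    faults = []
    # A normal transfer must reduce the balance by exactly the amount.
    if transfer_fn(100, 30) != 70:
        faults.append("valid transfer gives the wrong balance")
    # A negative amount must be rejected (assumed: with ValueError).
    try:
        transfer_fn(100, -50)
        faults.append("negative amount was accepted")
    except ValueError:
        pass
    return faults


print(find_faults(transfer))  # ['negative amount was accepted']
```

Each entry in the returned list is something to log as a bug while the problem is still cheap to fix.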

But let’s say the QA missed this and it went live. That would be a disaster. This is a “failure”: the app misbehaves in the client’s hands, we failed to provide a quality piece of software, and our client might have lost millions because of it. What if it were software for a space shuttle, an autopilot for an airplane, or a laser brain surgery tool? Failures like these might even cost human lives. So, having that in mind, we can describe the error – fault – failure dependency as a chain: an error that is missed becomes a fault, and a fault that is missed becomes a failure.


So the conclusion is simple: the closer a problem gets to the client, the more effort, money and time it takes to fix. And most of all, the highest price we pay is our good reputation. Once lost, it might take forever to build again.

So be careful: code wisely, test aggressively and cleverly. Don’t be simply a developer or a QA engineer, be a craftsman.