Friday, August 22, 2014

Let the CI server take care of these Sporadics

In my last post on Sporadics (http://bit.ly/sporadics-1) I came to the conclusion that Sporadics do matter. Sporadics litter your test suites with failures. Any test suite with failures requires attention. Someone needs to have a closer look to figure out what went wrong.

If test suites usually succeed, any test failure signals an issue that was just introduced by the very change under test or, in the case of nightly integration runs, by a limited number of changes. In either case it is worth the work one has to invest.

With Sporadics creating a constant buzz of test failures, test suites fail with a varying number of failures. Every one of these runs requires someone to take a closer look, and chances are that only the “known” issues occurred. If this happens over and over again, less and less attention will be paid and more and more “unknown” issues will go unnoticed. With an increasing number of Sporadics it becomes easier for a severe failure to slip through. The safety net ceases to be of any worth.

Sporadics are not only dangerous but also expensive and frustrating. So someone or something has to deal with them. Who or what could this possibly be?

In agile methodology we want to automate as much as possible, if not everything. So what about automating Sporadic detection?

It seems feasible, doesn't it? We have monotonous, repetitive work which in fact is plain pattern matching: does the failure look like one we already have on our list? Just compare test output of any kind (log files, backtraces and so on) with the known patterns and mark the failure as a known issue, maybe even with a unique identifier attached to it. A rather simple script could do this.
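
To make this concrete, here is a minimal sketch in Python of what such a script might look like. It assumes failures arrive as plain text snippets (a log excerpt, a backtrace) and that known Sporadics are kept as regular expressions with an identifier; all identifiers and patterns are made up for the example.

```python
# Minimal sketch of a Sporadic classifier. The identifiers and patterns are
# purely illustrative, not taken from a real system.
import re
from typing import Optional

KNOWN_SPORADICS = {
    "SPOR-17": r"TimeoutError: connection to .* timed out",
    "SPOR-23": r"AssertionError: expected 200 but got 50[23]",
}

def classify(failure_output: str) -> Optional[str]:
    """Return the id of the first known Sporadic whose pattern matches, else None."""
    for sporadic_id, pattern in KNOWN_SPORADICS.items():
        if re.search(pattern, failure_output):
            return sporadic_id
    return None

# Tag every failure of a test run as a known Sporadic or a new failure.
failures = ["TimeoutError: connection to db-host timed out after 30s"]
for output in failures:
    print(classify(output) or "new failure", "->", output)
```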

And the best place to run such a script would be the CI server, where all the changes for mainline are being built and tested anyway. Just add it to the nightly builds of mainline too. With that we would have a nice automated detection of Sporadics and would always know whether or not a change introduces new bugs we really have to care about right away.

Well, sounds promising. No more of these stupid tasks of scanning test results. Just have a look at the CI server output and you instantly know whether the change looks good or not.

Done. Continuous integration and automation did the job.

Really?

Three questions arise - at least:
  1. How precisely should the pattern matching work?
  2. What if we find a known Sporadic in a test run? What action should be taken?
  3. How would a new test failure be marked as a Sporadic, and how would a Sporadic be removed from the list?
Suppose the production code or test code changes and the known Sporadic slightly moves its location in the code, or the lines written to the log change and no longer match the stored log content of the known Sporadic. How flexible should the pattern matcher be? If it's too tight it might report known Sporadics as new ones, thus prolonging the list of known Sporadics unduly. If it's too loose it might report new failures as known ones, which would increase the risk of severe issues slipping through.
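
Just to illustrate one possible middle ground: normalize the volatile parts of a failure signature (timestamps, line numbers, addresses) before comparing, so the matcher tolerates harmless drift without matching everything. The normalization rules below are my own assumptions about what typically drifts, nothing more.

```python
# Sketch of signature normalization before matching. Which details count as
# "volatile" is an assumption and would differ from project to project.
import re

def normalize(output: str) -> str:
    output = re.sub(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\S*", "<TIMESTAMP>", output)
    output = re.sub(r"line \d+", "line <N>", output)      # shifted source lines
    output = re.sub(r"0x[0-9a-fA-F]+", "<ADDR>", output)  # memory addresses
    return output

def same_failure(known_signature: str, new_output: str) -> bool:
    """Compare two failure signatures after stripping volatile details."""
    return normalize(known_signature) == normalize(new_output)
```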

These few thoughts alone show that a Sporadics detector would itself be quite a sophisticated piece of code, which brings its own risks that need to be taken care of. And don't forget the amount of work required to develop and maintain it.

Suppose a Sporadics detector is in place and works quite well. What happens if it detects known Sporadics in a test run? Should it just report them and mark the test run as a success if there are no other failures? Should it try to rerun the Sporadics to make them a success in the second run? Or in the third run? Or …?

How far would you go to make the tests green? And what if the Sporadic didn't turn green this time? Would it count as a failure then, although you know it's a Sporadic? Probably one would do without rerunning Sporadics and just rely on the detection: if a known Sporadic has been identified, accept it as it is and remove it from the test result.
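
In code, that policy could look roughly like the sketch below; is_known stands in for whatever matcher is in place, and the sample data is invented.

```python
# Sketch of the "no rerun" policy: known Sporadics are reported but filtered
# out of the verdict, and only unknown failures make the run red.

def verdict(failures, is_known):
    known = [f for f in failures if is_known(f)]
    unknown = [f for f in failures if not is_known(f)]
    # The run counts as green when every failure is a known Sporadic.
    return len(unknown) == 0, known, unknown

green, known, unknown = verdict(
    ["TimeoutError: connection to db-host timed out"],
    is_known=lambda output: "timed out" in output,  # stand-in matcher
)
print("PASS" if green else "FAIL", f"({len(known)} known Sporadic(s) filtered out)")
```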

Suppose there is a test failure and it is not a known Sporadic. Should it be treated as a failure someone has to investigate immediately? Or would you try a rerun of all failures to check whether or not they fail again?

In the first scenario some developer or QA person would have to investigate the failure. If it turns out to be a Sporadic, then this person would have to add it to the database of known Sporadics so the Sporadic detector finds it the next time it occurs.

In the second scenario the Sporadic detector would do the job of rerunning and guessing. If the test does not fail again in a number of reruns, it will be considered a Sporadic and added to the database. If it keeps failing, the test run will be a failure.
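
A rough sketch of that rerun-and-guess logic, with run_test standing in for however the CI job invokes a single test and an arbitrarily chosen rerun count:

```python
# Rerun a failed test a few times; if it passes at least once it is treated as
# a (new) Sporadic, otherwise it stays a real failure. The rerun count and the
# run_test callable are placeholders, not part of any real CI server API.
MAX_RERUNS = 3

def classify_by_rerun(test_id, run_test, max_reruns=MAX_RERUNS):
    for _ in range(max_reruns):
        if run_test(test_id):   # True means the rerun passed
            return "sporadic"   # non-deterministic: add it to the database
    return "failure"            # failed in every rerun

# Example with a fake runner that happens to pass on the second attempt.
attempts = iter([False, True])
print(classify_by_rerun("test_login", run_test=lambda _: next(attempts)))
```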

So managing the list of known Sporadics will either introduce some extra work for a developer or QA person, or, in the case of automation, add extra run time to each and every test run, because every test failure will be treated as a possible Sporadic. Thus test runs will be prolonged, turnaround times will increase and the throughput of the CI server will decrease. Which in turn means extra costs.

Suppose one is willing to spend these costs to catch those Sporadics. What is the tradeoff? With this automation of Sporadic detection we take some load off the developers. They no longer have to analyze all these Sporadics manually, which gives them more time to add new features. Surely everyone would like this!

Really?

We were talking about the attitude towards Sporadics. No one takes them seriously. They keep on piling up. Does the automation of Sporadic detection help in changing this behavior? I don't think so. It's the complete opposite. Sporadics drift even further out of focus. Developers no longer need to care about them. They are no longer a pain in the … for them. One does not need to be a fortune teller to expect the number of Sporadics to keep increasing.

To be honest: if the corpus of tests is in bad condition and there are more failing test runs than ones that succeed, such a mechanism (as cheap as possible) might help to classify the test failures, so that one has a list of Sporadics to work on. But then, don't the developers have these lists of their own Sporadics already? They did use them for their manual checks. They are written down in some issue tracker, wiki, Excel sheet, text document or on plain post-it notes on some whiteboard. All these solutions are by far cheaper than the automation and automated central management of Sporadics. Last but not least, these solutions keep the developers in the loop. They feel the pain of Sporadics every day.

Something else would be required to solve the Sporadics issue. Something that really puts pressure on the developers to solve it. But that's another story.



The opinions expressed in this blog are my own views and not those of SAP

Friday, August 15, 2014

Sporadics don't matter

In my daily work I quite frequently stumble over the phenomenon of - as we call it - Sporadics. Sporadics are tests that fail every now and then for a bunch of reasons. Other great blog posts call them intermittent test failures (http://spin.atomicobject.com/2012/04/27/intermittent-test-failures/) or non-deterministic failing tests (http://martinfowler.com/articles/nonDeterminism.html).

The phenomenon bothers me a lot these days. But it's not so much the Sporadics themselves as the attitude towards them. Every time a Sporadic occurs I hear sentences like:

“This is just a Sporadic you can merge into mainline anyway”
“You can’t get rid of Sporadics, you have to deal with them statistically”
“It’s too expensive to fix them”
“It’s infrastructure anyway”
“There are too many of them to tackle them. Just ignore.”

In general, people (aka developers) tend to consider them a minor issue or a matter of fate you can’t cope with anyway. That leads to the impression that Sporadics just exist but do not tell you anything about the quality of your product, unlike “real” test failures. Real test failures signal a broken feature that has to be fixed, or a changed one that makes the tests fail because expectations have changed. One could fix that. One could even write a ticket for it and resolve it in due time. Everything is nice and comparatively easy when it comes to “real” test failures.

However, over the last years I have come to the impression that the “real” test failures, which always get much attention and awareness, are not the ones I should be afraid of, but that the Sporadics are the real ticking bomb. How come?

As said before, for some reason developers tend to avoid analyzing Sporadics. Probably this hasn’t always been the case but comes out of experience: often enough the root cause of the failure turned out to be some problem with the test infrastructure or a bug in the test itself, so no real problem with the application was found. Somehow this notion has settled in. Sporadic failure means infrastructure or test issue, means no bug, means ignore the red test and merge into mainline.

But what happens underneath? Is this all there is? What about the application under test showing a non-deterministic behavior?

These things happen (own experience). And they are what gives me the creeps. Non-deterministic behavior of the application yields strange behavior at the customer's site, which is very hard to analyze and thus very expensive to fix. And who’s to blame? If Sporadics are considered unimportant or even irrelevant, no one will fix them. They keep piling up. And somewhere in this pile these little bombs will hide. No one will stumble over these issues. They just keep creeping into the mainline by and by.

That’s why I consider Sporadics the most important test failures out there. If a Sporadic occurs, it has to be analyzed very fast. If you are lucky it really is an infrastructure or test design issue. But nevertheless, go and fix it. Keep the Sporadics away from your test suites. Otherwise the information the tests could provide you with will decrease and at some point vanish into nothing.

In this series of blog posts I want to explore the phenomenon of Sporadics. I want to find out what they really are and what to do about them. Each Sporadic tells a story about your test environment, your test design and your application design. Each Sporadic failure serves as a signal to take action. Let’s see what one could do about them. I hope you will join me on this journey. 


The opinions expressed in this blog are my own views and not those of SAP