Thursday, August 28, 2014

Where do these Sporadics come from?

In part 2 of this series I concluded that automating the detection of intermittent, random or non-deterministic tests (aka Sporadics) comes with unforeseeable extra costs. It may serve as a way to monitor Sporadics and might yield information to rank them and decide which should be solved first. But if the efforts stop there, then one has basically invested in managing the status quo instead of changing it for the better.

Before proposing a possible solution to the Sporadics problem, I need to elaborate on how a test becomes a Sporadic. Understanding this will provide hints for the solution to be established.

The basic assumption is that all tests were successful when they were first added to their test suites. Otherwise we would not be talking about Sporadics at all …

Suppose there is a test which ran successfully for a fair amount of time. Weeks or even months. Everything was fine until the day it turned red for the very first time. A test failure occurred. Nothing special. What happened back then? Someone might have investigated the issue. After some thorough work, one came to one or more of these conclusions:

  1. The test failed because of a bug in the test; the bug got fixed
  2. The test failed because of a bug in the productive code; the bug got fixed
  3. The test ran successfully again when the complete test run was repeated
  4. The test ran successfully when repeated in isolation, so there supposedly was no issue with the test or the code under test itself
  5. The test failure does not relate to the change made to the productive code at all. Strange, but well (imagine a shrug, when reading)
  6. There was some filer outage and the test wasn't able to read or write a file, or there was some other issue with a part of the infrastructure the test was using
  7. You name it


(I will not go into detail about inherently fragile tests, some of which can be a great source of Sporadics. Nor will I elaborate on possible root causes that make a test intermittent. That is not the point of this post; I will save it for later ones.)

We are not talking about the first two items in this list. These are the good cases where the safety net worked and the required actions were taken.

We are also not talking about the occurrences where a real root cause analysis was done and the problem was found and fixed. These things probably happen more often than not, especially when finding 5 was not accompanied by finding 3 or 4. But they do not happen every time, as the number of Sporadics in any corpus of tests will tell you.

Item number 6 would be an easy catch: an infrastructure issue. The people maintaining the infrastructure fixed the issue, or it was a temporary one. The test itself has been good all along. No one has to do anything about it. Surely it will be green again as long as the infrastructure issue does not reappear.

Items 3 and 4 tend to be soothing enough for the person investigating the issue that no further actions follow. Looks good now. So, just merge into mainline. Must have been some sort of hiccup.

Item number 5 consumes the most investigation time. It looks strange, but a rerun succeeded both standalone and as part of its test suite. Somehow it leaves a bad taste in the mouth. But hey, didn't it succeed all the time? And now it does again. Let's leave it alone and do some important work, what d'you think?

While item 6 is a bad signal in itself, items 3, 4 and 5 are the ones that could break the neck of the company. If we are lucky, they "only" signal issues with the tests themselves. Some hidden dependency that appears in strange situations, just on days of certain signs of the zodiac while the moon is in a particular position … you've got the idea. If we are not that lucky, they were only the tip of the iceberg, and there is some non-deterministic behavior in our productive code which might lead to loss of data or some other hazardous event when in production use at a customer's site. Maybe you only have a hard time analyzing it, working long hours and weekends. Or you might be facing a PR disaster or, even worse, a substantial claim for damages.

Experience shows that test failures like items 3 through 5 get shrugged off the more easily, the more often they occur. These failures are somehow not taken as seriously as they should be. Quite often there is an argument, built on pure statistics or on issue tracker records for the code under test, showing that it is not worth the time one would have to spend to find the root cause, or, if the root cause is known, to fix it. I personally have witnessed such a line of thought more than once. However, tests that failed this way for the first time will fail a second time, a third time and over again. And there you are. Now you have a Sporadic. A known issue. An item on a list or a record in a database. And one by one they creep in.

Why could this happen? In an ideal world a developer would follow the Continuous Integration principle and thus would be eager to get rid of any test failure in mainline builds at any time. Since we are not living in such an ideal world, things are a bit different. There are test failures that won't be fixed for the reasons listed above. It would be too easy to blame developers for not caring.

Developers find themselves confronted with various, concurrent and sometimes even contradicting requirements. Features always come first, for they are what the company gets paid for. Maintenance is important too. Not to forget quality. Or some refactorings. Issues with sporadically failing tests for features delivered long ago (several weeks or months) tend to get lost in all this. They just don't reach the level of urgency they would need to get the necessary attention.

In part 1 I complained about the attitude developers show towards Sporadics. This attitude is influenced by the whole environment developers find themselves in. So just introducing yet another tool will not help; there is also a social aspect to this. Or as +Steve Sether pointed out in his comment on my second post of this series:

"Since the problem is essentially a social problem, I think we should look towards social science for guidance.  It's been experimentally verified that people discount any potential badness that happens in the future.  So, bad thing in the future is much better (in people’s minds) than bad thing in the near present."

So let's explore a possible solution next time around. The teasing will end then ...


The opinions expressed in this blog are my own views and not those of SAP


Friday, August 22, 2014

Let the CI server take care of these Sporadics

In my last post on Sporadics (http://bit.ly/sporadics-1) I came to the conclusion that Sporadics do matter. Sporadics litter your test suites with failures. Any test suite with failures requires attention. Someone needs to have a closer look to figure out what went wrong.

If test suites usually succeed, any test failure signals an issue that was just introduced by the very change under test or, in the case of nightly integration runs, by a limited number of changes. In any case it is worth the work one has to invest.

With Sporadics creating a constant buzz of test failures, there are test suites that fail with a varying number of failures. Each of them, every time, requires someone to have a closer look, and chances are that only the "known" issues occurred. If this happens over and over again, less and less attention will be paid and more and more "unknown" issues will go unnoticed. With an increasing number of Sporadics it becomes easier for a severe failure to slip through. The safety net will cease to be of any worth.

Sporadics are not only dangerous but expensive and frustrating too. So, someone or something has to deal with them. Who or what could this possibly be?

In agile methodology we want to automate as much as possible, if not everything. So what about automating the detection of Sporadics?

It seems feasible, doesn't it? We have monotonous and repetitive work which in fact is just plain pattern matching: does the failure look like one we already have on our list? Just compare test output of any kind (log files, backtraces and so on) with the known patterns and mark the failure as a known issue, maybe even with a unique identifier attached to it. A rather simple script could do this, as the sketch below shows.
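
A minimal sketch of what such a script could look like. Everything here is an assumption for illustration: the pattern catalog, its IDs and the way failure output is fed in would depend entirely on your test framework and CI setup.

```python
import re

# Hypothetical catalog: each known Sporadic gets an ID and a regex that
# matches its characteristic log output or backtrace. IDs and patterns
# are invented for illustration.
KNOWN_SPORADICS = {
    "SPOR-017": re.compile(r"TimeoutError: filer .* not reachable"),
    "SPOR-023": re.compile(r"AssertionError: expected 5 rows, got \d+"),
}

def classify_failure(failure_output):
    """Return the ID of a matching known Sporadic, or None if the failure is new."""
    for sporadic_id, pattern in KNOWN_SPORADICS.items():
        if pattern.search(failure_output):
            return sporadic_id
    return None

# Usage: separate known Sporadics from genuinely new failures.
failures = {
    "test_replication": "TimeoutError: filer nas03 not reachable",
    "test_checkout": "NullPointerException in CartService.total()",
}
for test_name, output in failures.items():
    known = classify_failure(output)
    tag = f"known Sporadic {known}" if known else "NEW failure"
    print(f"{test_name}: {tag}")
```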

And the best place to run such a script would be the CI server, where all the changes for mainline are being built and tested anyway. Just add it to the nightly builds of mainline too. With that we would have a nice automated detection of Sporadics and would always know whether or not a change introduces new bugs we really have to care about instantly.

Well, sounds promising. No more of these stupid tasks of scanning test results. Just have a look at the CI server output and you instantly know whether the change looks good or not.

Done. Continuous integration and automation did the job.

Really?

Three questions arise, at least:
  1. How precise should the pattern matching work? 
  2. What if we find a known Sporadic in a test run? What should be the action to be taken? 
  3. How would a new test failure be marked as Sporadic and how would a Sporadic be removed from the list? 
Suppose the productive code or test code changes and a known Sporadic slightly moves its location in the code, or the lines written to the log change and no longer fit the stored log content of the known Sporadic. How flexible should the pattern matcher be? If it is too tight, it might report known Sporadics as new ones, thus prolonging the list of known Sporadics unduly. If it is too loose, it might report new failures as known ones, which would increase the risk of severe issues slipping through.
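
One conceivable middle ground is to normalize the failure output before matching, stripping out details that legitimately drift over time. The normalization rules below are assumptions for illustration, and the closing comment shows why this is exactly the tightness tradeoff just described.

```python
import re

def normalize(output):
    """Strip volatile details so a slightly moved failure still matches."""
    output = re.sub(r"line \d+", "line N", output)                     # code moved
    output = re.sub(r"\d{4}-\d{2}-\d{2}[ T][\d:.]+", "<timestamp>", output)
    output = re.sub(r"host-\w+", "<host>", output)                     # machine names
    return output

# Two failures that differ only in volatile details now compare equal ...
a = normalize("FAIL foo.py line 42 on host-a7 at 2014-08-22 03:14:15")
b = normalize("FAIL foo.py line 57 on host-c2 at 2014-08-23 02:01:09")
assert a == b
# ... which is precisely the "too loose" risk: a genuinely new failure
# with the same overall shape would be swallowed as a known Sporadic.
```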

Just these few thoughts show that a Sporadics detector would itself be a quite sophisticated piece of code, which brings its own risks that would need to be taken care of. And don't forget the amount of work required to develop and maintain it.

Suppose a Sporadics detector is in place and works quite well. What happens when it detects known Sporadics in a test run? Should it just report them and mark the test run a success if there are no other failures? Should it rerun the Sporadics to make them a success in the second run? Or the third? Or …?

How far would you go to make the tests green? And what if the Sporadic doesn't turn green this time? Would it be a failure then, although you know it's a Sporadic? Probably one would do without rerunning Sporadics and just rely on the detection: if a known Sporadic has been identified, accept it as it is and remove it from the test result.

Suppose there is a test failure and it is not a known Sporadic. Should it be taken as a failure someone has to investigate immediately? Or would you rerun all failures to check whether or not they fail again?

In the first scenario some developer or QA person would have to investigate the failure. If it turns out to be a Sporadic, this person would have to add it to the database of known Sporadics for the Sporadic detector to find the next time it occurs.

In the second scenario the Sporadic detector would do the job of rerunning and guessing. If the test does not fail again within a number of reruns, it will be considered a Sporadic and added to the database. If it keeps failing, the test run will be a failure. A sketch of this rerun logic follows below.
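
A sketch of that rerun-and-guess logic, assuming a run_test hook that executes a single test and returns True on success. The hook, the threshold and the return labels are placeholders, not a real API.

```python
def classify_by_rerun(run_test, test_name, max_reruns=3):
    """Rerun a failed test; if any rerun passes, call it a Sporadic.

    A real detector would also store the failure output in the
    database of known Sporadics at this point.
    """
    for _ in range(max_reruns):
        if run_test(test_name):
            return "sporadic"   # passed on a rerun: add to the known list
    return "failure"            # failed every rerun: treat as a real failure

# Usage with a fake flaky test that fails once, then passes:
outcomes = iter([False, True, True])
print(classify_by_rerun(lambda name: next(outcomes), "test_replication"))
# -> sporadic
```

Note that every new failure now costs up to max_reruns extra test executions, which is where the extra run time discussed next comes from.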

So managing the list of known Sporadics will introduce either some extra work for a developer or QA person or, in the case of automation, extra run time on each and every test run, since every test failure will be treated as a possible Sporadic. Thus test runs will be prolonged, turnaround times will increase and the throughput of the CI server will decrease. Which in turn means extra costs.

Suppose one is willing to pay these costs to catch those Sporadics. What is the tradeoff? With this automation of Sporadic detection we take some load off the developers. They no longer have to analyze all these Sporadics manually. Which gives them more time to add new features. Surely one would like this!

Really?

We were talking about the attitude towards Sporadics. No one takes them seriously. They keep piling up. Does the automation of Sporadic detection help in changing this behavior? I don't think so. It's the complete opposite: Sporadics get even further out of focus. Developers no longer need to care about them. They are no longer a pain in the … for them. One does not need to be a fortune teller to expect the number of Sporadics to increase further.

To be honest: if the corpus of tests is in bad condition and there are more failing test runs than successful ones, such a mechanism (as cheap as possible) might help to classify the test failures and produce a list of Sporadics one could work on. But then, don't the developers already have these lists of their own Sporadics? They did use them for their manual checks. They are written down in some issue tracker, wiki, Excel sheet, text document or on plain post-it notes on some whiteboard. All these solutions are by far cheaper than the automation and automated central management of Sporadics. Last but not least, these solutions keep the developers in the loop. They can feel the pain of Sporadics every day.

Something else would be required to solve the Sporadics issue. Something that really puts pressure on the developers to solve it. But that's another story.



The opinions expressed in this blog are my own views and not those of SAP

Friday, August 15, 2014

Sporadics don't matter

In my daily work I quite frequently stumble over the phenomenon of - as we call it - Sporadics. Sporadics are tests that fail every now and then for a bunch of reasons. Other great blog posts call them intermittent test failures (http://spin.atomicobject.com/2012/04/27/intermittent-test-failures/) or non-deterministic failing tests (http://martinfowler.com/articles/nonDeterminism.html).

The phenomenon bothers me a lot these days. But it's not so much the Sporadics themselves as the attitude towards them. Every time a Sporadic occurs I hear sentences like:

“This is just a Sporadic you can merge into mainline anyway”
“You can’t get rid of Sporadics, you have to deal with them statistically”
“It’s too expensive to fix them”
“It’s infrastructure anyway”
“There are too many of them to tackle them. Just ignore.”

In general, people (aka developers) tend to consider them a minor issue or a matter of fate you can't cope with anyway. That leads to the impression that Sporadics just exist but do not tell you anything about the quality of your product the way the "real" test failures do. Real test failures signal a broken feature that has to be fixed, or a changed one that makes the tests fail because expectations changed. One could fix that. One could even write a ticket for it and resolve it in due time. Everything is nice and comparatively easy when it comes to "real" test failures.

However, over the last years I have come to the impression that the "real" test failures, which always get much attention and awareness, are not the ones I should be afraid of, but that the Sporadics are the real ticking bomb. How come?

As said before, for some reason developers tend to avoid analyzing Sporadics. Probably this hasn't always been the case but comes out of experience: often enough the root cause of the failure turned out to be some problem with the test infrastructure or a bug in the test itself. So no real problem with the application was found. Somehow this notion has settled in. Sporadic failure means infrastructure or test issue, means no bug, means ignore the red test and merge into mainline.

But what happens underneath? Is this all there is? What about the application under test showing non-deterministic behavior?

These things happen (own experience). And they are what gives me the creeps. Non-deterministic behavior of the application yields strange behavior at the customer's site which is very hard to analyze and thus very expensive to fix. And who's to blame? If Sporadics are considered unimportant or even irrelevant, no one will fix them. They keep piling up. And somewhere in this pile these little bombs will hide. No one will stumble over these issues. They just keep creeping into the mainline by and by.

That's why I consider Sporadics the most important test failures out there. If a Sporadic occurs, it has to be analyzed very fast. If you are lucky, it really is an infrastructure or test design issue. But nevertheless, go and fix it. Keep the Sporadics away from your test suites. Otherwise the information the tests can provide you with will decrease and at some point vanish into nothing.

In this series of blog posts I want to explore the phenomenon of Sporadics. I want to find out what they really are and what to do about them. Each Sporadic tells a story about your test environment, your test design and your application design. Each Sporadic failure serves as a signal to take action. Let’s see what one could do about them. I hope you will join me on this journey. 


The opinions expressed in this blog are my own views and not those of SAP