Sunday, September 28, 2014

To Quarantine or Not To Quarantine Contagious Tests

Something is rotten in (the state of) your code base and spreads like a disease. Where there was only one soon enough there will be many. They are contagious as hell - Sporadic failing tests in already secured code. Regressions that is.

In my last post I extended the Green Mainline Policy by a human factor which might help to bring the Green Mainline Policy forward in environments that are ridden with bugs and regressions and which want to find a way out of it. With this human touch comes the freedom of choice. Instead of having an uncompromisable automatism which would just close mainline until a proven fix arrived a human would be able to weigh different aspects against each other and to decide not to close mainline even if a Sporadic popped up. I also talked about the fact that in cases like this at least an initial analysis would be requested which would be the base for the decision. What could this decision look like? What options does the mainline owner have?

Well obviously mainline could be closed until a fix arrives. This would work if the cause of the test failure could be fixed easily. Mainline wouldn't be closed for long.

The second option would be to accept the fact that a fix would take fairly long, too long to close mainline this long time thus hindering other developers or teams to bring new changes to it. Acceptance of the fact alone wouldn't be enough for the test still fails and will do so with every new build pipeline run. Every subsequent run will or may show this error dependent of it being a Sporadic or a constant regression. In either way this test failure will make it hard to assess the run quickly for in case the run fails it is not obvious whether it fails due to the "known" issue or a not yet known one. A closer analysis would be required. In this situation there needs to be a mechanism that ensures the fix to come as quickly as possible while it makes sure the "known" issue does not shadow other failures. The mechanism has to make sure that a failure in a run really is a failure that needs attention payed to it.

The mechanism is known as Quarantine. I'm not the first to talk about a quarantine. Martin Fowler did cover this briefly in [1] and there are CI servers around that support a quarantine out of the box and I'm sure there are plenty of articles on this topic around. So, what's my take on this?

My feelings towards a quarantine are ambivalent. At the one hand I'd welcome it as a means to separate intermittent tests from the ones that would provide valuable feedback. On the other hand I suspect a quarantine would easily turn into a collection of failing tests that just keeps growing without anyone taking any actions on them. Because of that I would prefer a quarantine to be guarded with strict rules in order for it to be of constant value.

The basic idea of Green Mainline Policy is the fact that code in mainline is and stays regression free for it represents the state of the code that has already been shipped to customers or will be shipped with the next release. Keeping it from showing regressions including and especially sporadic ones is the major task. Usually this would be done while mainline is closed. If this is not possible a regression could be quarantined in very special cases.

I'd see these four reasons: The regression is caused by

  • a bug in a test that would need to undergo a major refactoring to fix it which takes some time
  • a bug in the application code itself which would need to undergo a major refactoring to fix it which takes some time
  • a bug in an external component where the fix of which takes some time to arrive if at all
  • a hard to analyze sporadicly failing test which would need to undergo thorough analysis before it could be classified as one of the first three cases

In these cases quarantine would serve as a to-do list of open issues to be worked on with highest priority. Any deterministic regression should fall in one of the first three categories.  

However quarantine will always be the broken window. It holds the failing tests that are not supposed to be there at all. What once have been useful tests that guarded the code base turned into a useless mess with no other feedback than the mere fact that they have become of no value. They could not be trusted anymore. 

Once a quarantine exists there will be the question: Why not add just one more intermittent test? Just one. Really! That is the crucial question of Green Mainline Policy for once you have opened the door for one others might slip in as well. In  my opinion the strict rule set and its strict appliance to each and every regression without any exception will be the only way to keep this under control. Otherwise you would start to hide failing tests in plain sight. 

But it is not sufficient to have a rule set for getting into quarantine. You would also need a clear rule for getting out of it. The first way out of quarantine would be a fix. When the fix arrives the previously failing test will be removed from quarantine. For deterministic regressions it's a matter of priorities. If fixing them is highest priority the according tests will leave quarantine soon enough. It is getting interesting if no fix is available which could have several reasons:

  • the priorities for analyzing and fixing complicated issues are not high enough 
  • fixing really takes long for a major issue has been detected which requires extended efforts to get fixed
  • the analysis of a sporadicly failing test takes very long or seems not feasible at all
  • the external component provider does not come up with the required fixes

The first of these reasons could be mitigated by inventing a timeout or a maximum amount limit for the quarantine. If the timeout would be reached the respecting test would move out of quarantine and if still failing mainline will be closed. The second one appears to be a candidate for the maximum amount limit. If the maximum amount limit would be reached the next test to be moved into quarantine would close mainline for at least one other test has to move out of quarantine which by definition only would succeed if a fix would be available. These two mechanisms would work as a priority manager. They would raise the priority of fixes automatically.

And what about the other two reasons? The Sporadic and the issue with the external component? It is not predictable when or if a fix will be available in due time. How long would you wait for a fix? It does not make any sense to wait forever. Release date will come. Would you ship a software with known regressions or issues with an external component? Probably not. So, they have to be dealt with. If quarantine is not empty at release date then the product would have to be shipped with limitations. Timeout or maximum amount do not help with that. At some point in time late enough to remain optimistic and early enough to mitigate the bugs that will not get fixed you need to decide whether or not to wait for the fixes any longer. Then time has come to apply other strategies. In the end you need to decide between:

  • finding a workaround to avoid the bug (if possible)
  • disable the feature if no workaround exists - one couldn't use it anyway (deliver with limitation, might be solved by later patch)
  • go back to earlier version of 3rd party component (if possible)

It might prove difficult to hit the best point in time for this decision. And here is the bad news about a quarantine: It moves the need to handle regressions into the future. It might be a not-so-far-away future but it still has the consequence that regressions not get handled when they occur. When you're striving for a continuous delivery then this would break your neck for you would have to ship a product with known regressions or you wouldn't ship for a long time if you are not prepared to ship with known regressions. Both of which does not comply to the continuous delivery idea at all.

So, what's my conclusion then? I'm convinced that a quarantine could be a useful tool. But it depends. The tool itself will not fix you any bug. We've talked about this earlier. If there is a culture where regressions and Sporadics are considered evil and highest priority then this tool could help cleaning them up by making them visible. If there is a culture where these things are always overruled by something seemingly more important then a quarantine will just be another issue tracker with loads of issues never to be handled or like the to-do lists on dozens of desks which never will be worked on to the very end. It's just another container of things one should try to do when there is some time left. It's the people again and the culture they live up to that makes the difference.

Me personally I'd rather not have a quarantine. It's too many rules to know, to follow, and to think about. But I understand that sometimes you would need it to reach the level of maturity not to need it anymore.

Read also:

The opinions expressed in this blog are my own views and not those of SAP

"Something is rotten in the state of Denmark."
Hamlet (1.4), Marcellus to Horatio


Post a Comment