Sunday, September 28, 2014

To Quarantine or Not To Quarantine Contagious Tests

Something is rotten in (the state of) your code base, and it spreads like a disease. Where there was only one, soon enough there will be many. They are contagious as hell: sporadically failing tests in already secured code. Regressions, that is.

In my last post I extended the Green Mainline Policy by a human factor, which might help to bring the policy forward in environments that are ridden with bugs and regressions and that want to find a way out. With this human touch comes the freedom of choice. Instead of an uncompromising automatism that would just close mainline until a proven fix arrived, a human would be able to weigh different aspects against each other and decide not to close mainline even if a Sporadic popped up. I also mentioned that in cases like this at least an initial analysis would be requested, which would be the basis for the decision. What could this decision look like? What options does the mainline owner have?

Well, obviously mainline could be closed until a fix arrives. This would work if the cause of the test failure could be fixed easily, because mainline wouldn't be closed for long.

The second option would be to accept the fact that a fix would take fairly long, too long to keep mainline closed all that time and hinder other developers or teams from bringing new changes to it. Accepting the fact alone wouldn't be enough, for the test still fails and will do so in subsequent build pipeline runs: every run if it is a constant regression, some runs if it is a Sporadic. Either way this test failure will make it hard to assess a run quickly, for when the run fails it is not obvious whether it failed due to the "known" issue or a not yet known one. A closer analysis would be required. In this situation there needs to be a mechanism that ensures the fix comes as quickly as possible while making sure the "known" issue does not shadow other failures. The mechanism has to make sure that a failure in a run really is a failure that needs attention paid to it.

The mechanism is known as Quarantine. I'm not the first to talk about a quarantine. Martin Fowler covered this briefly in [1], there are CI servers around that support a quarantine out of the box, and I'm sure there are plenty of articles on this topic. So, what's my take on this?

My feelings towards a quarantine are ambivalent. On the one hand I'd welcome it as a means to separate intermittent tests from the ones that provide valuable feedback. On the other hand I suspect a quarantine would easily turn into a collection of failing tests that just keeps growing without anyone taking any action on them. Because of that I would prefer a quarantine to be guarded by strict rules in order for it to be of lasting value.

The basic idea of the Green Mainline Policy is that code in mainline is and stays regression free, for it represents the state of the code that has already been shipped to customers or will be shipped with the next release. Keeping it free of regressions, including and especially sporadic ones, is the major task. Usually this would be done while mainline is closed. If this is not possible, a regression could be quarantined in very special cases.

I'd see these four reasons: The regression is caused by

  • a bug in a test that would need a major refactoring to fix, which takes some time
  • a bug in the application code itself that would need a major refactoring to fix, which takes some time
  • a bug in an external component whose fix takes some time to arrive, if it arrives at all
  • a hard-to-analyze sporadically failing test that would need thorough analysis before it could be classified as one of the first three cases

In these cases quarantine would serve as a to-do list of open issues to be worked on with highest priority. Any deterministic regression should fall into one of the first three categories.

However, quarantine will always be the broken window. It holds the failing tests that are not supposed to be there at all. What once were useful tests guarding the code base have turned into a useless mess that gives no feedback other than the mere fact that they have lost their value. They can no longer be trusted.

Once a quarantine exists there will be the question: Why not add just one more intermittent test? Just one. Really! That is the crucial question of the Green Mainline Policy, for once you have opened the door for one, others might slip in as well. In my opinion a strict rule set, strictly applied to each and every regression without exception, is the only way to keep this under control. Otherwise you would start to hide failing tests in plain sight.

But it is not sufficient to have a rule set for getting into quarantine. You would also need a clear rule for getting out of it. The first way out of quarantine would be a fix. When the fix arrives the previously failing test will be removed from quarantine. For deterministic regressions it's a matter of priorities. If fixing them has highest priority, the corresponding tests will leave quarantine soon enough. It gets interesting if no fix is available, which could have several reasons:

  • the priorities for analyzing and fixing complicated issues are not high enough
  • fixing really takes long because a major issue has been detected which requires extended effort to fix
  • the analysis of a sporadically failing test takes very long or does not seem feasible at all
  • the external component provider does not come up with the required fixes

The first of these reasons could be mitigated by introducing a timeout or a maximum size limit for the quarantine. If the timeout is reached, the respective test moves out of quarantine, and if it is still failing mainline will be closed. The second one appears to be a candidate for the maximum size limit. If the limit is reached, the next test to be moved into quarantine would close mainline, for at least one other test has to move out of quarantine first, which by definition only succeeds if a fix is available. These two mechanisms would work as a priority manager. They would raise the priority of fixes automatically.
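To make the two rules a bit more tangible, here is a minimal Python sketch of a quarantine guarded by a timeout and a maximum limit. Everything in it is an assumption for illustration: the names Quarantine and MainlineClosed, the default of 5 entries and 14 days, and the idea that the caller closes mainline when MainlineClosed is raised or when a test returned by expire() still fails. It is not taken from any particular CI server.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Set


class MainlineClosed(Exception):
    """Hypothetical signal: a quarantine rule forces mainline to close."""


@dataclass
class Quarantine:
    max_size: int = 5                        # assumed limit, pick your own
    timeout: timedelta = timedelta(days=14)  # assumed timeout, pick your own
    entries: Dict[str, datetime] = field(default_factory=dict)

    def admit(self, test_name: str, now: datetime) -> None:
        """Quarantine a test, or close mainline if the limit is reached."""
        if test_name in self.entries:
            return
        if len(self.entries) >= self.max_size:
            raise MainlineClosed(f"quarantine is full, cannot admit {test_name}")
        self.entries[test_name] = now

    def release(self, test_name: str) -> None:
        """Remove a test from quarantine once its fix has arrived."""
        self.entries.pop(test_name, None)

    def expire(self, now: datetime) -> List[str]:
        """Move timed-out tests out of quarantine and return their names.

        If one of them still fails in the next run, it counts as a normal
        failure again and mainline gets closed.
        """
        expired = sorted(
            name for name, since in self.entries.items()
            if now - since >= self.timeout
        )
        for name in expired:
            del self.entries[name]
        return expired

    def unexpected_failures(self, failed: Set[str]) -> Set[str]:
        """Failures of a run that are not covered by the quarantine."""
        return failed - set(self.entries)
```

A run would then be assessed by looking at unexpected_failures() only, so a "known" issue cannot shadow a new one, while expire() and the limit in admit() keep pushing the quarantined tests back onto the priority list.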

And what about the other two reasons? The Sporadic and the issue with the external component? It is not predictable when or if a fix will be available in due time. How long would you wait for a fix? It does not make any sense to wait forever. The release date will come. Would you ship software with known regressions or issues with an external component? Probably not. So, they have to be dealt with. If quarantine is not empty at the release date, then the product would have to be shipped with limitations. Timeout or maximum size limit do not help with that. At some point in time, late enough to remain optimistic and early enough to mitigate the bugs that will not get fixed, you need to decide whether or not to wait for the fixes any longer. Then the time has come to apply other strategies. In the end you need to decide between:

  • finding a workaround to avoid the bug (if possible)
  • disabling the feature if no workaround exists, as one couldn't use it anyway (deliver with a limitation that might be resolved by a later patch)
  • going back to an earlier version of the 3rd party component (if possible)

It might prove difficult to hit the best point in time for this decision. And here is the bad news about a quarantine: It moves the need to handle regressions into the future. It might be a not-so-far-away future, but it still has the consequence that regressions do not get handled when they occur. When you're striving for continuous delivery this would break your neck, for you would either have to ship a product with known regressions or, if you are not prepared to do that, not ship for a long time. Neither of these complies with the continuous delivery idea at all.

So, what's my conclusion then? I'm convinced that a quarantine could be a useful tool. But it depends. The tool itself will not fix any bug for you. We've talked about this earlier. If there is a culture where regressions and Sporadics are considered evil and highest priority, then this tool could help clean them up by making them visible. If there is a culture where these things are always overruled by something seemingly more important, then a quarantine will just be another issue tracker with loads of issues never to be handled, or like the to-do lists on dozens of desks which will never be worked through to the very end. It's just another container of things one should try to do when there is some time left. It's the people again, and the culture they live up to, that make the difference.

Personally, I'd rather not have a quarantine. It's too many rules to know, to follow, and to think about. But I understand that sometimes you would need it to reach the level of maturity at which you no longer need it.

The opinions expressed in this blog are my own views and not those of SAP

"Something is rotten in the state of Denmark."
Hamlet (1.4), Marcellus to Horatio

Sunday, September 14, 2014

Process vs people – Or: Processes won't fix your bugs

In the last post I proposed a technical solution, a process to raise the urgency of failing tests in mainline production. By closing mainline for changes when the build is broken the “Green Mainline Policy” is able to put so much pressure on developers that

“There are no excuses for not fixing the issue anymore. But does this change the attitude we talked about earlier? Probably not.”

It is obvious that this proposal, so far, could work as a technical emergency procedure only. Nothing has been said about or done to reduce the occurrence of Sporadics in the first place. Thus up to now there is only a mitigation strategy for the second order problem of Sporadics showing up in mainline. This second order problem could be attacked by a technical solution. In software development we are very good at technical solutions. We do not embark on stupid tasks; we would rather write a script to do them for us. And it is even one of the agile principles to automate as much as possible to get rid of recurring secondary tasks that would keep us from coding new features. What we are usually not so good at is the non-technical part of it all. Attitude is non-technical. And changing attitudes is not scriptable. Changing the “Sporadics don't matter” attitude or the habit of ignoring them would be to tackle the root cause of the issue. As usual, attacking the root cause is much more complicated than alleviating symptoms. (A nice blog on this could be found at

What is missing to make the “Green Mainline Policy” a holistic approach that tackles the root cause?

To present you with the typical agile coach reply: It depends.

Suppose a project suffers heavily from Sporadics. Introducing the “Green Mainline Policy” and enforcing it in this strict manner would only add to the multitude of pressures developers face from all directions. The developers have to be part of the equation to make this policy a success. So, it is time to think about the people involved. We want them to understand that Sporadics are a mess and need to be fixed quickly. And we want them to prevent those Sporadics from showing up in the first place. We want them to change their attitude, their usual behavioral pattern, to make this policy a success.

And this is where it starts to get tricky. There is no simple answer to that, because we are talking about people. People are different. People are special. When dealing with people one has to deal with gurus, techies, quality-guys, process-lovers and process-haters, people fond of this and people fond of that (preferably the very opposite). There are people that want to change the world and people that want to make a living. All of them have their reasons. And all of them are valid and respectable. When we are about to enforce new policies or processes we need to understand that people act in (at least) two roles: the Person and the Developer. The change in processes affects the Developer, who has to adapt to new policies or processes. As a person I might not like these policies. As a developer I would have the obligation to stick to them, otherwise my job may be at risk.

Bringing a “Green Mainline Policy” to a project that is riddled with Sporadics means bringing about a transformation that takes time. We want to change the attitudes and habits of experienced professionals, some of whom may have been on the job for a few years, others for a dozen or even more. They are used to what they always did, what they learned back then or in between. Every single person needs to be picked up from where they are in terms of knowledge and in terms of the way they usually work.

There might be a number of people that would welcome the policy, for it supports the way they always wanted to work and probably did in the past. But it is likely that the majority of people will have reservations. My experience as an agile coach shows that most of these reservations are born out of a lack of knowledge of how one could do otherwise. Many times in a training, when telling people about test isolation, the testing pyramid and the like, I heard sentences like these:

“Why didn't anyone tell me this before?”
“Wow! It's awesome!”

What I've seen in my life as an agile coach is a huge amount of end-to-end tests and a lack, if not absence, of any other type of test. I consider this the main source of Sporadics. In my opinion agile techniques like test isolation, knowledge about the testing pyramid, and the SOLID principles would be a foundation for developers to build on. These techniques give them the instruments to stabilize their tests, to improve the code they write, and to get rid of the superfluous end-to-end tests. The improved code will be more testable, which would bring about even more stable and quicker tests.

Teaching would be one aspect. Coaching is the other. My experience shows that a training is a nice place to learn about the techniques and even to practice them, but there will always be the real code that does not support the newly learned techniques just like that. It proved to be a good idea to have experts around that understand the techniques and know the code base, for they are able to help the developers bring the new knowledge to the old code base. Only if developers get support for an individual case in their own code will they gain the insights they really need. This support will help to counter the:

“This might work in area XYZ but in my area things are different. It won't work at all.”

Don't ask me how often I heard that one. But I am not telling this to make jokes about it. Someone who says this needs support to find a way to make the agile techniques work for her. Overcoming such an obstacle is what convinces people the most.

So, I'd like to extend the “Green Mainline Policy”:

  • Invest in the people by teaching them agile techniques
  • Make sure to provide ongoing coaching resources and support in individual cases by experts that know the code base and the “environment” as well as the agile techniques
  • Invest in cleaning up the code, make it more testable
  • Invest in cleaning up the tests, stabilize them

It is obvious that these things will take time. That's why I would propose to go for a more human handling of the “Green Mainline Policy” emergency process. There should be owners of the mainline who function as gatekeepers. They would decide when to close mainline for changes. When mainline breaks they would have a look at the failure and classify it. If it qualifies for instant fixing, then the fix will be requested and mainline will be closed. Sometimes the fix may not come that easily. Then at least a root cause analysis should be requested, and mainline will be reopened once this analysis has shown that there is no easy fix. And so on. It depends on the situation you find yourself in. Sometimes even the strict CI server solution would work fine for you right from the start. In all other cases the reins need to be tightened in accordance with the progress in education and code cleanup. Things need to be improved incrementally. Any small step will make it better than before.

Yes, this is no easy solution. It will take time to implement. But it took quite a while to make the mess. Cleaning up needs its time too. Next time I will have a closer look at the types of Sporadics. Knowing these types will pave the way to proposing solutions for avoiding them.


Saturday, September 6, 2014

Tune up the Speakers – Let Sporadics Scream at you

In my last post on this topic I pointed out that the main reason for Sporadics to creep into your test base would be the lack of urgency to prevent them from doing so. These days developers find themselves confronted with competing requirements, and fixing Sporadics is one of the least pressing ones. Thus making them more pressing would be required to move them up the queue. And we’ve talked about a mind shift that would be required to pay more attention to these so-called Sporadics, to get them analyzed and out of the test base. Just putting more pressure on them from a technical side would not help. In my first post of this series I was complaining about the it’s-just-a-sporadic attitude which contributes to the decrease of urgency. This has to go too.

So, what could be done? The first step would be to make the Sporadics ring in your ear each time they would show up.

Actually there are two solutions, an informal one and a technically supported one, that share a common idea. One works fine for smaller teams or projects, the other may be required for larger teams or projects. Both solutions work with these principles in mind:

“There are no Sporadics. There are failing tests only”

Make people understand this. There are no second class failures that could be somehow ignored or paid less attention to. Every test failure indicates an issue to be solved as pointed out earlier. No statistics needed. No guessing on known issues required. The second principle is one borrowed from “Continuous Delivery” [1]:

“Do not check in on a broken build”

When the build and test pipeline breaks, something is wrong with your product. It does not make sense to move on sandy ground. The reason for the previous failure might be a bug in the application. The basic assumption is that the responsible developers would investigate the failure and provide a fix eventually. Only after this has been done is the next check-in allowed. As the authors of [1] pointed out, breaking this rule would prolong the time until the build gets green again, for analysis on a moving target is much harder to do. And second, people would get used to seeing broken builds, which would yield either ignorance or a big effort to clean up the build every now and then, which nobody really wants to do.

Both these principles could actively be lived by smaller teams or projects without any additional processes or tools when people accept them as the way they want to work. I’ve seen this working in real life. As long as the people involved share these principles and do not cease to enforce them they would be perfectly fine with this. But this may turn out to be hard work and sometimes the discipline erodes. Once the window is broken it will be hard to fix it. This gets even harder in big projects with many teams involved. This is why I came up with this proposal of mine. It is basically about how to make a large organization react properly and instantly to broken builds, not about how a certain kind of Sporadic failure could be fixed to make the test more stable. I will come to that later, though, for I do think all the posts on the net lack one important discussion: Is the intermittent test itself really the test you should have, or would another testing strategy for the feature under test help avoid the issues that usually make tests unstable?

Proposal: Green Mainline Policy

When gentlemen's agreements do not help to keep the “Do not check in on a broken build” principle alive, then it might be time to establish a tool-supported process that enforces the desired behavior. I will call this a “Green Mainline Policy” or “Green Branch Policy”, for it is not only applicable to the mainline. (In the following I will only refer to mainline for the sake of readability.)

The objectives of the proposal are:

  • Reduce or eliminate the occurrence of Sporadics
  • Make fixing a broken build the highest priority task
  • Must work for large projects

So, what’s the proposal then? Basically it’s quite simple. Suppose there is a continuous build and test run after a change has been submitted to mainline. If this run succeeds everything is fine, obviously. If it fails the CI server will

  • close mainline for any further submits except for the person/team responsible to fix the issue
  • open a ticket in the issue tracker
  • send a mail to the team/person responsible for the broken test

When the fix has been provided it will be accepted for submission to mainline and a new build and test run starts. Only if that run succeeds will the restrictions be lifted and everything is back to normal.
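As a rough illustration of this loop, here is a small Python sketch. The class Mainline, its fields, and the plain lists standing in for the issue tracker and the mail system are all made up for this example. A real CI server would call its own tracker and mail integrations instead, and would likely need more states than this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mainline:
    # Who may still submit while mainline is closed (None means open).
    closed_for: Optional[str] = None
    # Stand-ins for the issue tracker and the mail system (assumptions).
    tickets: List[str] = field(default_factory=list)
    mails: List[str] = field(default_factory=list)

    def may_submit(self, author: str) -> bool:
        """Everyone may submit to an open mainline, only the fixer to a closed one."""
        return self.closed_for is None or author == self.closed_for

    def on_pipeline_result(self, passed: bool,
                           failing_test: str = "", owner: str = "") -> None:
        """React to a finished build and test run on mainline."""
        if passed:
            # A green run lifts all restrictions.
            self.closed_for = None
            return
        if self.closed_for is not None:
            # Already closed, e.g. a fix attempt failed: stay closed.
            return
        self.closed_for = owner
        self.tickets.append(f"Mainline broken by {failing_test}")
        self.mails.append(f"To {owner}: please fix {failing_test}")
```

The point of the sketch is the gate in may_submit(): while the build is red, only the responsible person or team gets through, and only a green run opens mainline again.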

Instead of letting the CI server handle the policy there could be QA people or mainline owners who would execute this policy manually. This depends on how business is usually done. The main point is to really close mainline for further changes. No one will be able to bring any changes – except the fix – near mainline anymore until the issue is fixed. With that, the “Do not check in on a broken build” principle does not depend only on people behaving properly but is enforced from outside the team and thus will be much harder to undermine.

Reality check. There is a project I know of which incorporated this “Green Mainline Policy”. Although there were some resentments in the beginning, the policy proved to be of value. Not only do Sporadics get analyzed and (very often) fixed quickly, teams have also adopted this policy for their team codelines as a rule to follow without process enforcement. The perceived quality has improved, too: quality feels better than before, when everyone was bothered by ever-broken mainline builds. Now quality issues become evident and the improvement of quality is visible to everyone: fewer and fewer broken mainline builds.

Invitation to further discussion

However, it was quite easy to write this down and it might be tempting to leave it like this. But this wouldn’t be the right thing to do, for the proposal does raise a lot of questions which need to be answered. I will come to this in a second. First I would like to discuss what this simple approach could do.

It puts a whole lot of urgency on fixing the issue which broke the build. As it is, mainline would only be opened when a fix arrives. This issue, be it a regression or any other (d)effect, will not show up anymore, given the fix has been a thorough one and does not, for example, just increase a timeout that proved to be too short. The lack of urgency for Sporadics would be gone. There couldn’t possibly be more urgency than a stopped mainline production would produce. There are no excuses for not fixing the issue anymore. But does this change the attitude we talked about earlier? Probably not. It’s more like an extrinsic motivation – one might call it pressure – than an intrinsic one to avoid or at least quickly remove Sporadics.

There are other questions popping up:

  • How long would mainline be closed if fixing the issue would take a day, or two, or even longer?
  • How would you verify the fix? Running the whole build and test pipeline again? Just verifying the failing test?
  • What if the failure could not be reproduced?

Much depends on the very nature of the failure. There are so many possible sources of failure in a test that one would need to give a more differentiated answer. It’s plain to see that the approach might not remain this simple. I will try to elaborate on this next time around.

Read also:
Part 1: Sporadics don't matter
Part 2: Let the CI server take care of these Sporadics

[1] Humble, Jez, and David Farley, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation, Addison-Wesley, 6th printing 2012, p.66
