Friday, December 19, 2014

Agile Testing Days 2014: There are no agile testers - and no agile developers either

Just recently I wrote a blog post in response to the Agile Testing Days 2014, which I was glad to have attended. That post was received controversially, to say the least. Many were in favor of what I was saying, and many didn't like it at all, for several reasons. I got involved in interesting discussions with many of my readers, most of which led to the conclusion that there had been some misconception, because some of the arguments I presented in those conversations were not mentioned in the post. I promised to write a follow-up to get to the point I was trying to make with the first post but didn't manage to.

Here it is.

My Background

Since there have been some not-so-polite responses questioning my right to express my thoughts on testers and testing, let me explain where I come from and what my premises are. Currently I'm in QA for a rich client application of about two million lines of code. As QA we define the processes the development teams follow to ensure a high quality of the product, covering everything from continuous integration and automated testing up to E2E and exploratory testing, as well as static code checks and security scans. Our goal is to establish a continuous delivery process for this application.

My background over the years, however, was in development. As a developer I've come a long way from being a typical developer (from a tester's perspective) to becoming a coach for agile methodologies (including agile testing). At certain times I did user acceptance testing and scripted testing. So I've done my fair share of almost everything a tester would usually do. But my upbringing as a developer means I'm not completely familiar with the terminology a tester would use. Please forgive me this lack of knowledge, but rest assured it is not born of ignorance.


The existence of agile testers (i.e. testers as members of agile teams) prevents the urgently needed mind shift development has to undergo, and it prevents agile teams from becoming what they are supposed to be: small agile units that build and ship the highest-priority new features at a nearly constant velocity.


Development departments have a long record of shipping software products late and without the expected quality. There have been many ideas on how this could be improved. The most successful of these has been the introduction of a QA department. Just as in any other industrial production, QA would check the product after it had been assembled and before it was shipped to customers. For cars or machines this worked out pretty well. So why not adopt the approach?

Development was known to deliver bad quality. QA would establish a quality gate the product had to pass. Testers would test the product thoroughly before turning their thumbs up. Fixes for issues would be requested from and delivered by development. Sounds great, but it virtually never worked out. Why? There is always the temptation to squeeze in yet another feature, and one more after that, stretching the time for development beyond any deadline. QA would suffer, being required to accomplish their tasks in a much shorter time frame than originally planned. Issues they found no longer got fixed because the release date approached too fast.

In such environments the guild of testers had to develop methods, techniques, and tools to somehow manage an increasing workload in a shrinking time frame. They had to mitigate the lack of information, the lack of involvement, and the fact that they were forced to test at the wrong end of the cycle. Of course they developed pride in their work and their abilities. Because they acted closer to the customers and received the complaints and issue reports, their understanding of customer needs was far better than that of a typical developer. They took pride in that as well, and rightly so.

Now turn an eye to the developers. Before QA came around, they at least had to do some testing. Now QA would test just enough quality in. Fine! More time to develop the features development was forced to squeeze in. It wouldn't come as a surprise if a developer even felt relieved of the burden of spending valuable time on testing instead of walking further down the list of features he was supposed to finish. Once QA came into existence, developers were disconnected from customers by yet another layer.

On the other side of the process there were architects or system analysts who would lay out the whole product before even the first line of code was written. They would talk to customers to collect requirements. Then they would put down all their design documents and contracts and such, and the developer had to walk down the list, completely disconnected from the source of the requirements and the consumer of his work. Developers got cushioned in between these two layers. What would you expect them to become within this sandwich? No time to really change the architecture if they found a flaw; just keep working around it. What connection to your work would you develop if you had no means to influence major parts of it? A certain archetype was required to survive in such environments. Nowadays we call them developers (at least when we stick to the cliché). This type gathered in development departments: on the one hand cushioned in and pampered, kept away from customers; on the other in splendid isolation, just doing geeky stuff and feeling proud of their ability to work around every problem in the shortest possible time.

To be sure: not every developer fits the cliché. Just as not every tester fits the cliché built up by many articles and books and presented to me at Agile Testing Days this year or in replies to my postings. Things are not black or white, as usual.

Agile Teams

Fortunately there have been developers who did not fit the cliché. People who wanted to do things differently, as did people from other roles in this industry. And over the years ideas like XP, agile, Scrum, and lean were invented. Now we are faced with agile teams. Suddenly things like backlogs, priorities, and short cycles matter. Frequent shipment of high-quality product increments is expected. Customer feedback that leads to major adjustments in the product is welcome. Imagine a developer coming from the safe haven of the good old days, when he was supposed to just code and a whole organization built around him would take care of the rest. What would he make of this?

Imagine a tester with her safe haven, the QA department, far away from developers. Now she is supposed to join a team with the developers. Two parties were put together that felt comfortable with the prejudices they had about each other. What would you expect to happen? Well, the safest thing would be to keep the respective specializations and organize the agile team like the departments they came from. The developers in the team would, well, develop, and the testers would just test. In effect the agile team would turn into a micro version of the old organization. I have seen this happen several times, and it does not seem to be special to me: many of the sessions at Agile Testing Days shared the same experience to some extent.

Even the bible of agile testing, the book by Lisa Crispin and Janet Gregory, elaborates on this. I like this book; there is a lot of knowledge and experience in it. I certainly like the insights they share from their experience over the years. And it has the right title for sure: Agile Testing. I will come back to this a little bit later.

Part I of the book elaborates on testers and what they can bring into an agile team; in what sense they are special and would yield a great benefit for the team. Yes, testers have their unique specialties. Yes, these are badly needed close to development, and even before the first line of code has been written. We definitely need more agile testing in agile teams, and in software development in general.

At some point in this part the "whole team approach" is mentioned and welcomed. No distinction between testers and developers; just a team working on the top-priority tasks, doing whatever has to be done, by anyone available, when a new task comes up. The agile team as a bunch of generalizing specialists. I very much like this idea. In fact that would be the Holy Grail: an agile team that is able to really work according to priorities.

Unfortunately the remainder of this part very much keeps the distinction between testers and developers and tries to explain why a tester would be required to write acceptance tests, to talk to customers to understand what they really need, and similar things. One question pops up in my mind: why would I manifest the old DEV-QA pattern in an agile team? Why would I still keep developers away from activities they urgently need to do to improve their understanding of the product or the customer needs? If they are kept away from these experiences, then they are just as cushioned in and pampered as before, and we are all no better off.

We would still play the DEV & QA game: testing happens after implementation. I have seen this happen: testing gets squeezed in between the end of implementation and the sprint closing, often resulting in improper testing for lack of time, letting bugs slip through, and so on. Where is the difference from the old days? Testing needs to be first priority. To get this done, everyone in the team needs testing skills to some extent.

If you really want developers to change, to develop new capabilities, you need to let them do the things they never did before. A tester would be a great teacher to them if he shared his knowledge with them. A developer needs to feel the pain a bad-quality product causes the customer. A developer needs to write acceptance tests herself to understand what a user story requires her to produce. These first-hand experiences are what trigger insights.
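To make this concrete, here is a minimal sketch of what a developer-written acceptance test might look like, written before the implementation exists. The `ShoppingCart` class and its discount rule are invented for illustration; they are not from any real product.

```python
class ShoppingCart:
    """Toy production code: a cart that grants 10% off orders over 100.
    The class and the discount rule are invented for illustration."""

    def __init__(self):
        self.items = []

    def add(self, price):
        self.items.append(price)

    def total(self):
        subtotal = sum(self.items)
        # Business rule from the (hypothetical) user story:
        # 10% discount when spending more than 100.
        return subtotal * 0.9 if subtotal > 100 else subtotal


# Acceptance tests, written by the developer before implementing the
# story: "As a customer I get 10% off when I spend more than 100."
def test_discount_applies_over_threshold():
    cart = ShoppingCart()
    cart.add(60)
    cart.add(50)  # subtotal 110, discount applies
    assert abs(cart.total() - 99.0) < 1e-9


def test_no_discount_at_or_below_threshold():
    cart = ShoppingCart()
    cart.add(100)
    assert cart.total() == 100


test_discount_applies_over_threshold()
test_no_discount_at_or_below_threshold()
```

Writing the failing tests first forces the developer to pin down what the story actually means (is 100 included in the discount or not?) before any code exists.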

Are developers afraid of this? Yes. Are developers unprepared to do something other than develop? Yes, for sure. We are talking about change. Change leads to a feeling of uncertainty, to not knowing what will be expected of you. Change brings about fear. The same goes for testers. Are they really required to code? Are they really required to do architecture? Bug fixing? Yes, they are. In an agile team. And this brings about fear, uncertainty, and a feeling of loss for them too.

So, what happens most of the time? Developers and testers form a general non-aggression pact, and each sticks with what she can do best, keeping specialism alive. And this specialism yields problems of its own. Specialists are only able to work on a limited range of tasks; those their specialization does not cover, they will not take on. Specialists tend to become bottlenecks: if two concurrent tasks require their special knowledge, one is blocked. Specialism leads to situations where user stories do not get worked on in order of decreasing priority but according to the availability of some specialist. Thus the next increment of the product will not contain what the customer needed most but the features the team, in its combination of specializations, was able to get done. This no longer complies with the ideas of agile. A team like that isn't really an agile team.

Agile Engineer

What would that be? She would be a generalizing specialist, willing to learn new methods, techniques, and skills. A member of the team who is able to take on almost any task belonging to the most important user story on the board. If it is writing some code, she will code. If it is testing, she will test. If it is something else, she will do that as well. She is not afraid of asking when she lacks knowledge or experience. She is willing and open to share her knowledge.

Okay, let's go with agile testers and agile developers. We need to start somewhere. But support them in becoming agile engineers. This will be a tough transition for many of them. They will need help and guidance on the way, and they need the right environment. Just because an organization decides to have agile teams and to go agile doesn't mean there is anything more than a new label stuck on what they already have. An agile engineer cannot yield the effects he could if the organization around him is not transitioning too. Fixed schedules, fixed resources, and fixed scope do not work with agile. If the organization is not going agile, everything else may well remain as it was back then.

Agile engineers would build great agile teams working in great agile organizations. They would DO agile testing, DO agile development, DO agile planning. They would cease to be an agile tester, an agile developer, or an agile anything. This would be the time when there would be

No agile testers - and no agile developers either

Read also:

On Coaching Agile
Part 1: Killer Questions - What do you want to test?
Part 2: Techniques - Make a Bold Statement
Part 3: Techniques - Show and Tell

On Sporadics - How to deal with intermittent tests in continuous delivery

The opinions expressed in this blog are my own views and not those of SAP

Saturday, December 6, 2014

On Coaching Agile: Techniques - Show and tell

When coaching agile methodology, as with any other coaching, you are supposed to make your point in a way that coachees are not only able to repeat what you have said but actually get the idea of it. They are supposed to be able to adapt the principles to their own daily work. This only works if you get their attention and make them listen instead of just hearing. If you don't manage to achieve that, you could just as well talk about little elves in the trees; the outcome would be the same with regard to agile methodology.

So, what stands between you and a coachee's attention very often boils down to pure prejudice, which can be translated into one simple sentence: "This won't work for our code." Very often this simple statement prevented my coachees from listening in the first place. They attend the training because they are supposed to, and they are willing to sit it out, somehow. These people will not listen; they are not open to what you want to say. To get their attention you have to get around their prejudices. How would you do that?

Many people these days want to publish their own book. There are a lot of book-on-demand offerings where anyone who hears the call can publish his or her book, dreaming of becoming the next Philip Roth or J.K. Rowling. Unfortunately most of them are not at this level. However, they enjoy what they are doing. Many attend creative writing classes to learn how a novel is written and what principles should be followed. The result is that there are many novels around that reflect these classes' teachings. They share the ever-same plot structure:

Plot Point 1 - Mid Point (Crisis) - Plot Point 2 - Dark Moment (Even deeper crisis) - Resolution - Wrap-up

Very seldom is there development, or an idea of their own about what a plot could look like. Most of the time there is just repetition of what has been told. They heard, but did they really listen? Was there even something to listen to?

Another shortcoming of creative-writing novels is that they tend to TELL the reader a story. Just like this one here:

"Pat has left. One moment Tom was very angry; the next he was wiping his eyes. He felt sad and empty, changing moods in the blink of an eye."

Everything said. Does this catch you? Probably not. There is no flesh on the bones, no room for my thoughts to flow around a scene, to come to conclusions about it. The thinking was done for me. These sentences are gone long before the end of the novel. The same goes for many coaches. They just stand in front of a room full of coachees and read their slides or have their say. People try to get through it without snoring. If I were a coachee in there, I would still have my prejudice and nothing could be done about it. Once out of the room I would have forgotten whatever might have been said. Why?

Well, I was just told something. No one tried to make me think.

What differentiates a serious writer from a creative-writing writer is the ability to create scenes, to show me places and people, to make me think about them, to give room for thoughts and still stay in control of the flow of the story. The writer makes me think and guides me through the story. Just like this:

"Photographs littering the floor. Pat and Tom at some lovely mansion up in the hills. Pat and Tom diving. Pat and Tom at many places. Pat kissing Tom. Tom hugging Pat. Some of them are torn, showing either Pat or Tom smiling at someone who's no longer there. Tom stood at the window, staring into emptiness with a blank expression, a vase lying shattered between his feet, his cheeks wet with tears. Suddenly he turned around, kicked a photograph into the next corner and screamed at the top of his voice."

This scene shows the moods Tom is in. The reader is able to think, to figure out what's going on. She is involved with the scene because the writer SHOWed it to her. And this is what we want to achieve in coaching as well. Get people involved by showing what we mean. Give them a chance to come to the conclusions themselves, instead of telling them things you already know while they have no idea where any of it came from.

A very nice example I recently got to experience was the session TDD and Refactoring with LEGO at Agile Testing Days 2014 in Potsdam, Germany. Bryan Beecham and Mike Bowler showed the effects of TDD, refactoring, technical debt, and lack of communication between teams working on the same project with simple but very effective hands-on exercises using LEGO bricks. You could physically feel the heavy load of technical debt. You could physically feel the effects of refactoring and TDD. Whether or not you came with prejudices and a strong unwillingness to get involved, the pure fact that you were allowed to play with LEGO made you let down your defenses. And this is what opens up your mind just enough to be receptive to new ideas. In my next classroom training I want to try this instead of reading and explaining my slides.

Another approach to get around the prejudice, and especially around the this-won't-work-for-my-code statement, is to actually show that it works for their code as well. My partner and I spent some time ahead of the training laying hands on the code of the team to be coached. We would have them show us some part of the code they do not unit test and believe is not unit testable at all. Within a week we would create the moving parts required to build and run a first unit test suite. And we would do just enough legacy code magic to get sample unit tests done that prove the approach works. The training would include a session on legacy code, and the examples in it were taken from the team's own code base. This is when we showed them what could be done to their code. And guess what we got each time: Wow! No other part of the training got them more involved and open for discussion than this one. This is usually the moment to go back to refactoring and TDD principles, to go back to test isolation. And every time we did this, the teams picked up from there and grew their unit test base steadily. We just had to plant the seed and SHOW them how it works instead of TELLing them that it would be possible.
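What this "legacy code magic" typically amounts to is introducing a seam: making a hard-wired dependency replaceable so the surrounding logic becomes testable. A minimal sketch, with an invented `ReportGenerator` and database call standing in for real legacy code:

```python
# Before: the logic is welded to a slow external dependency and cannot
# be unit tested without a real database. (Both the class and its
# database call are invented for illustration.)
#
#   class ReportGenerator:
#       def monthly_total(self):
#           rows = query_database("SELECT amount FROM orders")
#           return sum(rows)

class ReportGenerator:
    """After: the dependency is injected through the constructor (a seam),
    so a test can pass in a stub instead of a real database."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows  # any callable returning amounts

    def monthly_total(self):
        return sum(self._fetch_rows())


# The first unit test becomes trivial: a stub stands in for the database.
def test_monthly_total():
    generator = ReportGenerator(fetch_rows=lambda: [10, 20, 30])
    assert generator.monthly_total() == 60


test_monthly_total()
```

A handful of such seams is usually enough to demonstrate, on the team's own code, that "not unit testable" really means "not unit testable yet".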

A checklist to summarize:

- makes people think
- gives room for their own conclusions
- opens up people's minds just enough to make them join the discussion
- "this won't work for us" doesn't work anymore

- for the LEGO exercise: the session is harder to control
- preparation becomes much more expensive
- you need access to the team's code base and some initial support

Read also:

On Sporadics - How to deal with intermittent tests in continuous delivery
Part 1 Part 2 Part 3 Part 4 Part 5 Part 6 Part 7


Monday, November 17, 2014

Agile Testing Days 2014: There are no Agile Testers

Fortunately I had the opportunity to attend the Agile Testing Days 2014 in Potsdam, Germany. It was a great conference. I got so many new ideas out of it. It was very well organized, with passionate and well-chosen speakers. I regret having booked only the first and third day, especially since they built a car in the hotel lobby on day two. And I missed that one.

To give you a little background on myself: I'm not a tester. Never have been. Agile or not. Most of my career I've been a developer, working on a wide range of topics in very small and very big teams. I have done my fair share of UI development, both web and RCP. I have done server development and even system development. So I've seen many things between a slick UI and moving around single bits. What I haven't seen in all that time is an agile tester. I barely ever ran into a tester, old school that is. Nowadays I'm involved with shipping a product of roughly two million lines of code and coaching agile teams. So maybe I'm a little biased here. But now you know. Just in case.

The whole conference was arranged around the topic of agile testers between past and future. Already the opening keynote by Lisa Crispin and Janet Gregory set the tone for what was to come. To me it seems agile testers are experiencing a phase of self-discovery these days. One talk gave insight into the life of a tester in a development team who all of a sudden got support from a developer to get his testing done (Kristoffer Nordström). Another workshop introduced tools that could help a tester develop a proper testing strategy and cover the cube of testing opportunities by identifying what he actually has to test; a major part of this workshop put emphasis on the means of obtaining information about the application (Huib Schoots). The talk by Alan Parkinson explained how pull requests can be used to communicate with developers and how to make sure only tested code gets into the trunk. These talks were very much based on the notion that testers and developers are not really working together, that developers throw their stuff over the wall and testers have to cope with whatever has been implemented. At least that is how I understood what was said.

And then there was the third-day opening keynote by Anthony Marcano with the title "Don't put me in the box". He was floating the idea that agile tester is not a title or a job description but an activity one does for some time. To me this is the key idea for the future of testers, agile or not.

I've been around in this industry for over 25 years now. The longer I'm in it, the more I lack the understanding of what a tester is good for. Or to put it the other way around: I lack the understanding of why there should be people called testers whose obligation is to check the work others (developers) did a little while ago. I'm very well aware that testing as an activity is probably the most important task in software development. As an activity.

In my opinion there is no such thing as an agile tester, if we assume this to be a person. To me agile means fast feedback and early as well as frequent adaptation to whatever needs adapting to. The institution of a tester who consumes whatever a developer has programmed only perpetuates the old-fashioned, waterfall-like setup with late feedback and little adaptation. And it doesn't really matter whether the tester is part of an agile team or not. As long as he is supposed to test after the implementation work has been done, he could just as well be half a world away. Well, communication is a little easier when he sits right next to the coders.

The major shift the agile tester has to go through is the change from being a job title to becoming a role one takes on for some time, up to its extreme in TDD, where it becomes a hat that has to be switched often with other hats. In times of XP and TDD and all that, we would talk about agile testing instead. Agile testing is any activity from TDD to ATDD, or whatever testing happens while implementing.
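The hat switching in TDD can be sketched as the familiar red-green-refactor cycle. The `fizzbuzz` function below is an invented example, not from any real code base:

```python
# TDD hat switching in miniature: first the tester hat (write a failing
# test that names the desired behavior), then the developer hat (write
# just enough code to make it pass), then the refactoring hat.

# Tester hat: specify the behavior before it exists.
def test_fizzbuzz():
    assert fizzbuzz(3) == "Fizz"
    assert fizzbuzz(5) == "Buzz"
    assert fizzbuzz(15) == "FizzBuzz"
    assert fizzbuzz(7) == "7"


# Developer hat: just enough implementation to go green.
def fizzbuzz(n):
    result = ("Fizz" if n % 3 == 0 else "") + ("Buzz" if n % 5 == 0 else "")
    return result or str(n)


test_fizzbuzz()  # green - time to put on the refactoring hat
```

The point is that one person wears both hats within minutes, which is exactly why "tester" works better as an activity than as a job title here.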

And what happens to the people? The real existing agile testers? I understand that this change brings about uncertainty, and a lot of people ask themselves how to deal with this situation, how to move on. I've seen operations guys turning into devops, applying TDD to writing Chef recipes. I've seen operations guys moving from writing scripts to developing a build and test infrastructure in an agile team applying XP methodologies. They brought their special knowledge and they moved on. I'm sure the same is possible for agile testers. They have valuable, unique skills that any team of developers would highly appreciate having around. Agile testers could bring their unique knowledge to the developers, while the developers bring their knowledge to the former testers. Everyone would get something out of it. Testers would start to take part in the implementation, and developers would learn how to test early, frequently, and automated. Finished user stories would be delivered in high quality in the first place. No need for subsequent QA anymore.

Okay, this might be too bold a claim. To be honest, there are testing activities that require special knowledge not available to everyone. Think of usability testing or user acceptance testing, where there is a much greater need to be able to deal with people and guide them through the process. These (and some others) are high-value manual tests. They are not stupid repetition work like scripted manual testing; they are interesting and challenging, work for specialists. Any other testing, especially any testing that could be automated or supported by a tool, should, or shall I say must, be done close to the implementation work to make sure we get early feedback. Only in such setups is it possible to fix bugs as early and cheaply as possible, and to change the architecture while one still has the knowledge about it.

To wrap it up: although there are some testing tasks that require specialists, most testing is work a developer can learn, and has to learn when he is a member of an agile team, assuming that agile teams incorporate some or many techniques of XP. Agile testing is the testing work done while implementing a user story. Agile testers would merge into these agile teams. Their job title would become a role, or better yet an activity. They themselves would move on to become one more person getting user stories done. They would bring their unique strengths and capabilities to the teams. They would teach former developers how to test efficiently, and they would learn how to code.

Once I was a member of a real agile team. Everyone in this team would code and do TDD, as well as any other testing that was required. Everyone would touch the build scripts or build stuff to get automation done. Everyone shared the same idea: we as a team make something happen. We take on any activity needed to get our work done. That was a great time. Probably the best, most inspiring I've ever had. If I were to imagine the future of our industry, this would be it.

Read also:

On Sporadics - How to deal with intermittent tests in continuous delivery
Part 1 Part 2 Part 3 Part 4 Part 5 Part 6 Part 7


Friday, November 14, 2014

On Coaching Agile: Techniques - Make a Bold Statement

In coaching agile methodology, or coaching in general, making people leave their comfort zone, or pushing them out of it, is key to getting your message across and bringing about change. You want to make them think and open up their minds to new concepts and ideas. There are several techniques to achieve this. Last time I talked about the power of a simple question: What do you want to test?

This time I want to focus on another technique I use a lot: Make a bold and simplified statement.

What do I have in mind by that? Well, a bold statement is something that stands in the middle of the road: massive, hard to get around, a bit provocative too. You have to deal with it when it crosses your path. There is no way of slipping past unnoticed.

When coaching, you want the people who attend the training, or whom you coach on the job, to understand what you have to say. I guess it is a common observation that people attending a training tend, at some point or other, to drop out of it for various reasons. So every now and then it is necessary to regain their attention, distracting them from whatever they are doing at that very moment. Making a bold statement, or repeating one, does the job very well. Here are some examples of what I have in mind: J.B. Rainsberger claims Integration Tests Are a Scam, and David Heinemeier Hansson declared TDD is dead. Long live testing.

One I use very often is: "There have to be unit tests". Sure, it is not as bold as the two prominent examples above; it was meant for a different audience. But what they share is that all of them are catchy and promise a message that makes people listen, whether they think they agree or not. Whether a statement is received as bold, or even disruptive, very much depends on the context. In a context where there are only a small number of unit tests around but a huge number of long-running UI click tests, a simple statement like "There have to be unit tests" can be a really bold statement that gets a lot of attention, positive and negative of course. Tell you what, it really did.

Whether the bold statement you are about to use works depends on how well it fits the context you would like to use it in. It is supposed to put emphasis on a hot topic or a pain point the trainees experience in their daily work, so they are able to relate to it. It should be simple and rather short, to be remembered. It should wrap up a message you want the trainees to understand. And of course there has to be a message in the first place, and you should be able to explain it at any length at any time. As coaches we are not supposed to resemble the yellow press - just punchlines and nothing else. A good punchline helps, but there has to be more to it.

The statement "There have to be unit tests" clearly is short, and one can remember it quite well. But it clearly has an offensive touch when presented to a team that works on a code base so highly interwoven that it is not unit testable just like that. In order to bring in unit tests, major refactorings have to happen first. The statement simplifies the message, which is to incorporate a test portfolio resembling the testing pyramid we all know. What it definitely does not say, or even mention, is that integration tests are forbidden for all time. But it has been received like that, and huge discussions followed. To be honest, that was the best that could possibly happen. All of a sudden I had everyone's attention. Most of them disagreed. But finally I was able to make my point.

Suppose you have a message. Sure you have; you wouldn't be coaching if you didn't, would you? So, you've got your message. And now? When coming up with your punchline, be careful to make it bold enough, otherwise it will go unnoticed. You will not get the attention you want; you will not get people out of their comfort zone at all. If it is too bold, on the other hand, it will be noticed but will not take hold in people's minds, because they cannot relate to it. Or they will feel really offended, which would spoil you as a coach.

There is one side effect I should mention here. A bold statement tends to become a synonym for you. It is as if there were a sticky note on your forehead with this statement written on it. You will be cited and you will be misunderstood. People will take the statement and try to turn it against you. You have to withstand it. You have to preach it over and over again, repeating the message all the time.

A checklist to summarize:

  • distracts people
  • gets you people's attention
  • wraps up your message in a memorable little piece
  • annoys and breaks indifference
  • needs the right tuning for the audience you would like to address
  • is easily misunderstood
  • could be offensive and repulsive

Read also:
On Sporadics - How to deal with intermittent tests in continuous delivery

The opinions expressed in this blog are my own views and not those of SAP

Monday, October 27, 2014

On Coaching Agile: Killer Questions - What do you want to test?

When coaching teams or individuals in agile techniques like TDD, Refactoring, Clean Code, CI, you name it, I've found that certain questions yield interesting effects. My absolute favorite is this one:

What do you want to test?

That is: What feature or behavior do you want to test?

One of the key takeaways of trainings on agile techniques is the knowledge about and the application of the testing pyramid. Many teams I trained have a test structure that resembles an upside-down pyramid, a trapezoid standing on its short edge, or even a big letter T. When they attend a training they learn about TDD and unit tests and are eager to write them for their code too. And what seems rather easy in the training, with a well-designed and small sample application, tends to become overwhelmingly difficult when they get back to their own code base, which usually is more complex and more ridden with legacy code issues.

Nevertheless they start out and want to write their first unit test themselves. And here is where the trouble starts. How would you do that? Most of them lack experience with what a good unit test looks like. Which unit should you begin with? And which tests would make sense? If you have never tried to write unit tests for your production code, these questions are tough. Everything is so much more complicated compared to end-to-end testing, where there is no need to understand the inner structure of your code and where design issues do not prevent you from testing.

The following is condensed from several sessions with developers trying to put a piece of their code under test. No single developer behaved exactly this way; I have taken the liberty to exaggerate the situation for educational purposes. But please be aware that every part of this is taken from real situations:

So, when I pair (actually: triple) with my developers before writing one single line of code I ask them: 

What do you want to test?

Most of the developers really have no good answer to this question. They are able to explain the implementation down to the tiniest detail, but they fail to answer this rather simple question. That is why asking it is usually the beginning of a long discussion in which they tell me what actually happens in their code, which things get done and how. This is when I guide the discussion towards the structure of the code and the building blocks it consists of. We try to identify the parts that do not make heavy use of components outside the team's control. To make things a little easier, we begin with the components that depend only on components under the team's control. Most of the time this initial discussion already brings about a better understanding of their own code and architecture, which helps a lot in the upcoming part of the session. Besides, as a coach I am able to understand the code at a relatively high level of abstraction, which helps me not to get lost in too many details. I gain enough knowledge to understand the features and what tests there should be. With that I am able to guide them further down the road.

Once the component to put under test has been identified I would ask the question again: 

What do you want to test?

Many times this is when I get my tour through the component, and what it does, one more time. This is when I try to find a nice unit to start testing with. Sometimes they pick a unit to test themselves and sometimes not. Whatever they pick, I try to start with. It is important that they learn to decide for themselves. It is important that they take over the lead as much as possible. It is not my show. It is not about how I would proceed. It's more about being a facilitator, lending a helping hand. When there is a unit to test, I ask one more time:

What do you want to test?

Now it's about finding the method to test. Often the unit picked has one method that stands for what the whole unit is all about; everything else in the unit is just private helper methods. Now I try to nail them down to a certain behavior of this unit, a certain method to test. Usually I get the same tour through algorithms and underlying layers again. And here it is important to come down to what a unit test is all about: testing a certain behavior, for a defined input, producing an expected output. And this is what they really have to understand here:

It is about this behavior. That's the reason we write unit tests. To secure a certain behavior of the unit.

It is not about testing all paths and all combinations. It is about testing a behavior which is directly derived from the acceptance criteria of a user story. Any code that is not covered by such acceptance criteria is useless, for it implements stuff no one ever asked for. When developers write their tests with the implementation in mind, they might get high code coverage, but what we really need is high feature coverage. By asking the question

What (feature/behavior) do you want to test?

one always stays in the outside-in perspective, which helps to keep the acceptance criteria in mind. With that, writing down the expected results of tests becomes pretty easy, for everything follows directly from the acceptance criteria.
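To make this concrete, here is a minimal sketch of what behavior-driven unit tests derived from an acceptance criterion could look like. All names (DiscountCalculator, the 10%-off rule) are hypothetical and not taken from any real project:

```python
# Hypothetical example: DiscountCalculator and the discount rule are
# invented for illustration, not taken from any real code base.

class DiscountCalculator:
    """Computes the total of an order, applying a volume discount."""

    def discounted_total(self, total):
        if total < 0:
            raise ValueError("total must not be negative")
        if total >= 100:
            return total * 0.9  # acceptance criterion: 10% off from 100 up
        return total

# The tests name the behavior from the acceptance criteria,
# not the implementation details:

def test_orders_below_threshold_pay_full_price():
    assert DiscountCalculator().discounted_total(99) == 99

def test_orders_at_threshold_get_ten_percent_off():
    assert DiscountCalculator().discounted_total(100) == 90.0
```

Each test name reads like an acceptance criterion, so the expected results follow directly from the story instead of from the code.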

So, to get to the point of which tests have to be written, I ask my question very frequently. It proved useful not to let them go on the whole tour again, although they always try; early interruption is required. But be careful. Because this repetition easily becomes very annoying, it is wise to guide people with additional questions that point them to the expected behavior or to the obvious negative tests that are needed. If the procedure does not seem to come to an end, you should propose something by asking: What if we try this and that? In the end we want to write unit tests. We don't want to drive the developers crazy.

Especially when we talk about legacy code, it is more about covering the existing code with negative tests, for these are usually difficult, if not impossible, to write as end-to-end tests. This adds value to the existing test base and yields a fast return on investment by disclosing possible fatal errors that would cause the application to crash. There has never been a session without at least one of these. In most cases these tests are quite easy to write, and the bugs they disclose are often easy to fix. With that, the developers have a successful first unit test session. In later sessions the very same technique can be used to bring about the positive tests as well.
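As a sketch of what such a first negative test could look like (the parser and all names are invented for illustration): a legacy routine that used to crash on bad input gets pinned down, and the crash is turned into a clear error:

```python
# Hypothetical legacy routine: parse_order_id stands in for any legacy
# function. Originally it crashed with an AttributeError on None input;
# the negative tests disclosed that, and the guard clause is the easy fix.

def parse_order_id(raw):
    if raw is None:
        raise ValueError("order id must not be None")
    return int(raw.strip())

def test_none_is_rejected_with_a_clear_error():
    try:
        parse_order_id(None)
        assert False, "expected ValueError"
    except ValueError:
        pass

def test_garbage_is_rejected():
    try:
        parse_order_id("not-a-number")
        assert False, "expected ValueError"
    except ValueError:
        pass
```

The happy path is untouched; the tests only secure how the unit behaves at its edges, which is exactly where the fatal errors tend to hide.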

A checklist to summarize:

  • focuses on feature/behavior instead of algorithms and lines of code
    • opens up your mind for new approaches
    • maybe the existing implementation could be improved?
  • forces you to divide the feature into portions to be tested
    • unveils the structure of the feature at hand from an outside-in point of view
    • discovers possible negative test opportunities instead of focusing on the happy path
  • annoys and breaks indifference
  • like any question, forces the developers to come up with their own answers
    • the solution will be better accepted
  • easily becomes too annoying and causes resistance
    • guidance through hints or additional hint-like questions is required

Read also:

On Sporadics - How to deal with intermittent tests in continuous delivery


Thursday, October 23, 2014

Interlude: A practical experience with quarantine

Sometimes one develops nice ideas or puts down some thoughts, and then reality kicks in and what has been a nice theory gets under pressure from the events of daily life. The same happened to me. Recently I was talking about Sporadics and the attitude of shrugging them off. I was talking about people and the need to put them first and processes second. And I was talking about a quarantine to give people a chance to tackle difficult issues without the need to close mainline. I also talked about my ambivalent feelings towards a quarantine. Just recently these ambivalent feelings proved themselves valid, as a quarantine turned into an ever-growing bowl of bad apples and regressions entered mainline faster than one could count up to five. What happened?

A quarantine has been set up with these rules:
  • any regression would be a possible reason to close mainline
  • if the regression turns out to be 
    • trivial then mainline will be closed until the fix arrives and proves to remove the regression
    • non-trivial and a fix would take some time and it is
      • a reproducible regression in a third party component then it will be quarantined for two weeks and mainline remains open
      • a reproducible regression in the application itself then it will be quarantined for two days and mainline remains open
    • a sporadic regression and
      • it showed up the first time then it will be observed as a known regression - no fix will be requested and mainline remains open
      • it showed up the second time then it will close mainline and an analysis will be requested, if analysis shows the fix would be
        • trivial then mainline remains closed until the fix arrived and proved to remove the regression
        • non-trivial then it will be quarantined and mainline will be opened again
  • a quarantined regression that does not get fixed before its time limit is exceeded will close mainline
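Writing the rule set down as a decision function makes its complexity quite visible. This is only a sketch: the dictionary keys are my own invention, and the analysis step for repeated Sporadics is collapsed into its two possible outcomes:

```python
# A sketch of the rule set above as code; the field names are invented.
# The number of branches hints at why the rules were hard to remember.

def decide(regression):
    """regression: dict with 'sporadic', 'trivial_fix', 'third_party'
    (booleans) and 'occurrences' (int). Returns the action to take."""
    if not regression["sporadic"]:
        if regression["trivial_fix"]:
            return "close mainline until fix proves valid"
        if regression["third_party"]:
            return "quarantine two weeks, mainline stays open"
        return "quarantine two days, mainline stays open"
    # sporadic regressions
    if regression["occurrences"] < 2:
        return "observe as known regression, mainline stays open"
    if regression["trivial_fix"]:
        return "close mainline until fix proves valid"
    return "quarantine, reopen mainline"
```

Even this sketch omits the final rule, that a quarantined regression exceeding its time limit closes mainline.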

It turned out to be a hell of a job to enforce this <sarcasm>simple</sarcasm> set of rules and to convince people that the removal of regressions is their primary task and that disabling the test that unveils them is not the solution, unless the test itself turns out to be buggy. That's why I need to touch the topic of quarantine once again.

The original idea would work if certain assumptions were true:
  • "... quarantine would serve as a to-do list of open issues to be worked on with highest priority"
  • quarantine rules will be strictly applied

If one of them does not hold, the whole concept is broken. I had further expressed my concerns about a quarantine:

"If there is a culture where these things are always overruled by something seemingly more important then a quarantine will just be another issue tracker with loads of issues never to be handled or like the to-do lists on dozens of desks which never will be worked on to the very end. It's just another container of things one should try to do when there is some time left."
And exactly this is what happened. And it made me think. Either the concept of quarantine was a dead end, or the way it had been incorporated was plain wrong. What are my observations?

First, the set of rules is non-trivial. It takes some time to understand which possible cases there are and whether a regression closes mainline, gets quarantined, or is just observed. This rule set is hard to remember, both for the people who enforce it and for the people who need to fix regressions.

Second, if you consider a green mainline the goal to achieve, and if you accept that red pipeline runs require a thorough analysis of what went wrong, then the design of this quarantine has its flaws. It allows regressions to enter mainline in the case of a Sporadic occurring for the first time. Any deterministic regression gets handled by closing mainline or at least putting it under quarantine, while non-deterministic regressions can enter mainline.

Third, there was no tool to put regressions in quarantine and to report on them with every pipeline run in order to make sure newly failing tests show up easily. Instead, all test failures were present and required analysis. Many pipeline runs were analyzed that contained only quarantined test failures, causing a lot of wasted work.

Fourth, the rule set wasn't enforced completely. Regressions that reached their maximum time in quarantine did not cause mainline to be closed. Instead they started littering pipeline runs, and along with the missing quarantine reporting it became even harder to find newly failing tests, that is, the regressions a single change introduced.

Obviously this quarantine has been a failure both in concept and in execution. There needs to be another configuration of it to make it work. It needs to be very simple and straightforward, so that everyone can memorize it and a tool can enforce it. And the latter is the basic shift in my mind. Last time I talked about not having an automatism but adding a human factor:
"Instead of an uncompromising automatism which would just close mainline until a proven fix arrived, a human would be able to weigh different aspects against each other and to decide not to close mainline even if a Sporadic popped up"
And this was my fifth observation. It seems that, at least in an environment where Sporadics are shrugged off and features are rated higher than bug or regression fixes, this human factor invites people to discuss the rules and to ask for exceptions, just to be able to do this and that very quickly. A human is tempted to grant these exceptions, and in special cases does grant them. As experience shows, in many cases not sticking to the rules turns out to be the cause of a lot of trouble. In the case at hand, what was supposed to be a shortcut, a speedup, turned out to be a mess and a slowdown. Additional regressions were introduced.

With that experience in mind, I try to remember what quarantine was all about. Which goals was it expected to achieve?

First of all, the baseline of it all is the idea of continuous delivery, where release candidates get produced all the time. If a certain pipeline run fails, this run does not yield a release candidate. Continuous delivery allows for frequent releases of small increments of the application, thus getting fast feedback on any new feature instead of waiting for yearly or half-yearly releases. The increment to be shipped has to be free of regressions. Otherwise it wouldn't be a release candidate, because the pipeline producing it would have been stopped when the error popped up.

So, if a quarantine gets introduced it must make sure
  • the software to be shipped is free of regressions and
  • the time between two release candidates is as short as possible to allow for small increments

If a regression gets up to two weeks in quarantine, then experience shows that it will stay there for pretty much exactly that time. Considering this, it is not possible to quarantine a regression for two weeks, because this would stretch the time between two release candidates to at least this amount of time. If within those two weeks one more regression of this sort gets quarantined, the time between two release candidates grows even longer. One regression every two weeks would make it impossible to produce a release candidate, although one regression every two weeks would be a pretty good rate. What's more, the increment would be quite large.

If a quarantine of two weeks is not possible, then waiting for fixes of 3rd party components is not possible any more, unless the 3rd party component releases at a high frequency. With that, the application has to mitigate issues in any 3rd party library by not using the buggy feature at all, which may mean working around it, or skipping the own feature if it depends on the very feature of the 3rd party component that turned out to be buggy. If this becomes the default mitigation in the application, a quarantine of two weeks isn't needed anymore. Whether the regression is located in the application or in any 3rd party component, the fix has to be made in the application itself. The team developing the application regains control over the application, which in itself is a nice thing to achieve.

So, allowed time in quarantine needs to be short.

Another flaw in the above quarantine rule set is the loophole that lets regressions enter mainline: Sporadics need to show up twice before they cause a close of mainline and with that open up the possibility to quarantine the regression. This needs to be dealt with. I'd propose to quarantine any regression, no matter whether it is deterministic or not. There would be two days to get it fixed. If the fix does not arrive in due time, the regression leaves quarantine and mainline is closed. No test protocols need to be scanned for known but not yet quarantined issues. The quarantine tool would report quarantined regressions properly, and any new test failure would show up directly. While mainline is closed, no other changes can be pushed to it. This is an optimistic approach, for it is based on the assumption that regressions get fixed with highest priority.

The new quarantine rule set would look like this:
  • any regression will be quarantined for up to two days
  • if the fix arrives within this time and it proved to be valid then the failing test will leave quarantine
  • if the fix does not arrive then the failing test leaves quarantine and causes the mainline to be closed
  • if the fix does arrive and it proved to be valid mainline will be opened again

That's it. Simple and straightforward. No distinction between types of regressions. If required, an upper limit for regressions in quarantine could be added; if it is reached, mainline is closed as well. But then the rules to open mainline get a bit more complex, for there are two conditions to consider. That's why I would prefer not to have the upper limit of quarantined regressions in the rule set. The two-day limit is tight enough to put pressure on the resolution of regressions.
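The simplified rule set is small enough to be sketched in a few lines a CI tool could enforce. The day arithmetic and the names are simplifying assumptions of mine, not a concrete tool:

```python
# Sketch of the simplified rule set. Quarantine entries are (test, day)
# pairs; a valid fix removes the entry, an expired entry closes mainline.

QUARANTINE_DAYS = 2

def mainline_state(quarantined, today):
    """quarantined: list of (test_name, day_entered) pairs.
    Returns ('open' or 'closed', tests whose quarantine time expired)."""
    expired = [name for name, entered in quarantined
               if today - entered > QUARANTINE_DAYS]
    return ("closed" if expired else "open"), expired
```

A valid fix simply removes its entry from the list, which reopens mainline on the next evaluation; there is no special case to remember.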

This rule set is simple enough that a tool could implement it, thus removing the human factor from the decision whether or not to close mainline. The human factor should be used to decide upon the proper limits for regressions to stay in quarantine, for this seems to depend on the current situation an application development finds itself in. So in the end the CI server could take care of the regressions by enforcing a quarantine, although I wasn't fond of this idea earlier. But it would do so based on a simple rule set which does not require a huge and sophisticated tool, and the pipeline runs would still be reported as failures, with no release candidate produced as long as there are quarantined regressions.

I will report whether this approach works in reality.

Read also:

Part 1: Sporadics don't matter
Part 2: Let the CI server take care of these Sporadics
Part 3: Where do these Sporadics come from
Part 4: Tune up the speakers - Let Sporadics scream at you
Part 5: Process vs. People - Or: Processes won't fix you bugs
Part 6: To Quarantine or Not To Quarantine Contagious Tests


Sunday, September 28, 2014

To Quarantine or Not To Quarantine Contagious Tests

Something is rotten in (the state of) your code base and spreads like a disease. Where there was only one, soon enough there will be many. They are contagious as hell - sporadically failing tests in already secured code. Regressions, that is.

In my last post I extended the Green Mainline Policy by a human factor which might help to bring the Green Mainline Policy forward in environments that are ridden with bugs and regressions and want to find a way out. With this human touch comes the freedom of choice. Instead of an uncompromising automatism which would just close mainline until a proven fix arrived, a human would be able to weigh different aspects against each other and to decide not to close mainline even if a Sporadic popped up. I also talked about the fact that in cases like this at least an initial analysis would be requested, which would be the base for the decision. What could this decision look like? What options does the mainline owner have?

Well, obviously mainline could be closed until a fix arrives. This works if the cause of the test failure can be fixed easily; mainline wouldn't be closed for long.

The second option would be to accept the fact that a fix would take fairly long - too long to close mainline for the whole time, hindering other developers or teams from bringing new changes to it. Accepting the fact alone isn't enough, for the test still fails and will do so with every new build pipeline run. Every subsequent run will or may show this error, depending on whether it is a Sporadic or a constant regression. Either way, this test failure makes it hard to assess a run quickly, for in case the run fails it is not obvious whether it fails due to the "known" issue or a not yet known one. A closer analysis would be required. In this situation there needs to be a mechanism that ensures the fix comes as quickly as possible while making sure the "known" issue does not shadow other failures. The mechanism has to make sure that a failure in a run really is a failure that needs attention paid to it.

The mechanism is known as quarantine. I'm not the first to talk about a quarantine. Martin Fowler covered this briefly in [1], there are CI servers around that support a quarantine out of the box, and I'm sure there are plenty of articles on this topic. So, what's my take on this?

My feelings towards a quarantine are ambivalent. On the one hand I'd welcome it as a means to separate intermittent tests from the ones that provide valuable feedback. On the other hand I suspect a quarantine would easily turn into a collection of failing tests that just keeps growing without anyone taking any action on them. Because of that I would prefer a quarantine to be guarded with strict rules in order for it to be of constant value.

The basic idea of the Green Mainline Policy is that code in mainline is and stays regression free, for it represents the state of the code that has already been shipped to customers or will be shipped with the next release. Keeping it from showing regressions, including and especially sporadic ones, is the major task. Usually this is done while mainline is closed. If this is not possible, a regression could be quarantined in very special cases.

I'd see these four reasons: The regression is caused by

  • a bug in a test that would need to undergo a major refactoring to fix it which takes some time
  • a bug in the application code itself which would need to undergo a major refactoring to fix it which takes some time
  • a bug in an external component where the fix takes some time to arrive, if it arrives at all
  • a hard to analyze, sporadically failing test which would need to undergo thorough analysis before it could be classified as one of the first three cases

In these cases quarantine would serve as a to-do list of open issues to be worked on with highest priority. Any deterministic regression should fall into one of the first three categories.

However, quarantine will always be the broken window. It holds the failing tests that are not supposed to be there at all. What once were useful tests that guarded the code base have turned into a useless mess, with no other feedback than the mere fact that they have become of no value. They cannot be trusted anymore.

Once a quarantine exists there will be the question: Why not add just one more intermittent test? Just one. Really! That is the crucial question of the Green Mainline Policy, for once you have opened the door for one, others might slip in as well. In my opinion the strict rule set and its strict application to each and every regression, without any exception, is the only way to keep this under control. Otherwise you would start to hide failing tests in plain sight.

But it is not sufficient to have a rule set for getting into quarantine. You also need a clear rule for getting out of it. The first way out of quarantine is a fix. When the fix arrives, the previously failing test is removed from quarantine. For deterministic regressions it's a matter of priorities: if fixing them has the highest priority, the corresponding tests will leave quarantine soon enough. It gets interesting if no fix is available, which could have several reasons:

  • the priorities for analyzing and fixing complicated issues are not high enough 
  • fixing really takes long, for a major issue has been detected which requires extended effort to fix
  • the analysis of a sporadically failing test takes very long or seems not feasible at all
  • the external component provider does not come up with the required fixes

The first of these reasons could be mitigated by introducing a timeout or a maximum amount limit for the quarantine. If the timeout is reached, the respective test moves out of quarantine and, if still failing, mainline is closed. The second one appears to be a candidate for the maximum amount limit. If the maximum amount limit is reached, the next test to be moved into quarantine closes mainline, for at least one other test has to move out of quarantine first, which by definition only succeeds if a fix is available. These two mechanisms work as a priority manager: they raise the priority of fixes automatically.
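The two mechanisms amount to two small checks a CI tool could run on every pipeline result. This is only a sketch; the concrete limits and names are my own assumptions, not recommendations:

```python
# Sketch of the two priority-raising mechanisms: a per-test timeout and
# a maximum quarantine size. The concrete limits are arbitrary examples.

TIMEOUT_DAYS = 14      # after this, the test leaves quarantine
MAX_QUARANTINED = 5    # beyond this, no further test may enter

def timed_out(day_entered, today):
    # an expired test moves out of quarantine; if it still fails,
    # mainline is closed
    return today - day_entered >= TIMEOUT_DAYS

def may_enter_quarantine(current_size):
    # at the limit, the next regression closes mainline instead of
    # entering quarantine, raising the priority of the pending fixes
    return current_size < MAX_QUARANTINED
```

Both checks never grant exceptions, which is exactly what makes them work as an automatic priority manager.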

And what about the other two reasons, the Sporadic and the issue with the external component? It is not predictable when or whether a fix will be available in due time. How long would you wait for a fix? It does not make any sense to wait forever. Release date will come. Would you ship software with known regressions or issues in an external component? Probably not. So they have to be dealt with. If quarantine is not empty at release date, the product has to be shipped with limitations. Timeout or maximum amount do not help with that. At some point in time, late enough to remain optimistic and early enough to mitigate the bugs that will not get fixed, you need to decide whether or not to wait for the fixes any longer. Then the time has come to apply other strategies. In the end you need to decide between:

  • finding a workaround to avoid the bug (if possible)
  • disabling the feature if no workaround exists - one couldn't use it anyway (deliver with limitation, might be solved by a later patch)
  • going back to an earlier version of the 3rd party component (if possible)

It might prove difficult to hit the best point in time for this decision. And here is the bad news about a quarantine: it moves the need to handle regressions into the future. It might be a not-so-far-away future, but it still has the consequence that regressions do not get handled when they occur. When you're striving for continuous delivery, this will break your neck, for you would have to ship a product with known regressions, or you wouldn't ship for a long time if you are not prepared to ship with known regressions. Neither complies with the continuous delivery idea at all.

So, what's my conclusion then? I'm convinced that a quarantine could be a useful tool. But it depends. The tool itself will not fix any bugs for you. We've talked about this earlier. If there is a culture where regressions and Sporadics are considered evil and of highest priority, then this tool can help clean them up by making them visible. If there is a culture where these things are always overruled by something seemingly more important, then a quarantine will just be another issue tracker with loads of issues never to be handled, or like the to-do lists on dozens of desks which never get worked through to the very end. It's just another container of things one should try to do when there is some time left. It's the people again, and the culture they live up to, that make the difference.

Personally, I'd rather not have a quarantine. It's too many rules to know, to follow, and to think about. But I understand that sometimes you need it to reach the level of maturity at which you don't need it anymore.



"Something is rotten in the state of Denmark."
Hamlet (1.4), Marcellus to Horatio

Sunday, September 14, 2014

Process vs people – Or: Processes won't fix you bugs

In the last post I proposed a technical solution, a process, to raise the urgency of failing tests in mainline. By closing mainline for changes when the build is broken, the "Green Mainline Policy" puts so much pressure on developers that

“There are no excuses for not fixing the issue anymore. But does this change the attitude we talked about earlier? Probably not.“ 

It is obvious that this proposal, so far, works as a technical emergency procedure only. Nothing has been said about, or done to, reduce the occurrence of Sporadics in the first place. Thus up to now there is only a mitigation strategy for the second order problem of Sporadics showing up in mainline. This second order problem can be attacked by a technical solution. In software development we are very good at technical solutions. We do not embark on stupid tasks; we would rather write a script to do them for us. It is even one of the agile principles to automate as much as possible to get rid of recurring secondary tasks that would keep us from anything but coding new features. What we are usually not so good at is the non-technical part of it all. Attitude is non-technical. And changing attitudes is not scriptable. Changing the "Sporadics don't matter" attitude, or the habit of ignoring them, would be to tackle the root cause of the issue. As usual, attacking the root cause is much more complicated than alleviating symptoms. (A nice blog on this could be found at

What is missing to make the “Green Mainline Policy” a holistic approach that tackles the root cause?

To present you with the typical agile coach reply: It depends.

Suppose a project suffers heavily from Sporadics. Introducing the "Green Mainline Policy" and enforcing it in this strict manner would only add to the multitude of pressures developers are faced with from all directions. The developers have to be part of the equation to make this policy a success. So, it is time to think about the people involved. We want them to understand that Sporadics are a mess and need to be fixed quickly. And we want them to prevent those Sporadics from showing up in the first place. We want them to change their attitude, their usual behavioral patterns, to make this policy a success.

And this is where it starts to get tricky. There is no simple answer to that, because we are talking about people. People are different. People are special. When dealing with people one has to deal with gurus, techies, quality guys, process lovers and process haters, people fond of this and people fond of that (preferably the very opposite). There are people who want to change the world and people who want to make a living. All of them have their reasons. And all of them are valid and respectable. When we are about to enforce new policies or processes, we need to understand that people act in (at least) two roles: the Person and the Developer. The change in processes affects the Developer, who has to adapt to new policies or processes. As a person I might not like these policies. As a developer I have the obligation to stick to them; otherwise my job may be at risk.

Bringing a “Green Mainline Policy” to a project that is riddled with Sporadics means bringing about a transformation, and that takes time. We want to change the attitudes and habits of experienced professionals, some of whom may have been on the job for a few years, others for a dozen or even more. They are used to what they have always done, to what they learned back then or in between. Every single person needs to be picked up from where they are, in terms of knowledge and in terms of the way they usually work.

There might be a number of people who would welcome the policy because it supports the way they always wanted to work, and probably did in the past. But it is likely that the majority will have reservations. My experience as an agile coach shows that most of these are born out of a lack of knowledge of how one could do otherwise. Many times in a training, when telling people about test isolation, the testing pyramid and the like, I heard sentences like these:

“Why didn't anyone tell me this before?”
“Wow! It's awesome!”

What I've seen in my life as an agile coach is a huge amount of end-to-end tests and a lack, if not complete absence, of any other type of test. I consider this the main source of Sporadics. In my opinion, agile techniques like test isolation, knowledge of the testing pyramid, and the SOLID principles form a foundation for developers to build on. These techniques give them the instruments to stabilize their tests, to improve the code they write, and to get rid of superfluous end-to-end tests. The improved code will be more testable, which in turn brings about even more stable and quicker tests.
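To illustrate what test isolation buys you, here is a minimal Python sketch. All names (`PriceService`, the rate source) are invented for this example: the point is that a dependency which would otherwise require a remote call is injected and replaced by a deterministic stub, so the test sits low in the testing pyramid and cannot fail for infrastructure reasons.

```python
import unittest
from unittest.mock import Mock

class PriceService:
    """Converts a net price to a gross price using an exchange-rate source."""
    def __init__(self, rate_source):
        self.rate_source = rate_source  # dependency injected, not hard-wired

    def gross_price(self, net, currency):
        rate = self.rate_source.rate_for(currency)  # no network call in tests
        return round(net * rate, 2)

class PriceServiceTest(unittest.TestCase):
    def test_gross_price_uses_current_rate(self):
        # The remote rate service is replaced by a deterministic stub, so the
        # test cannot fail because of network hiccups or rate drift.
        rates = Mock()
        rates.rate_for.return_value = 1.1
        service = PriceService(rates)
        self.assertEqual(service.gross_price(100, "EUR"), 110.0)
```

An end-to-end test of the same behavior would exercise the real rate service and inherit all of its sporadic failure modes; the isolated test checks the same logic without them.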

Teaching is one aspect; coaching is the other. My experience shows that a training is a nice place to learn about the techniques and even to practice them, but there will always be real code that does not support the newly learned techniques just like that. It has proved to be a good idea to have experts around who understand the techniques and who know the code base, because they are able to help the developers bring the new knowledge to the old code base. Only if developers get support in their individual cases, at their own code, will they gain the insights they really need. This support will help to counter the:

“This might work in area XYZ but in my area things are different. It won't work at all.”

Don't ask me how often I heard that one. I am not telling this to make jokes about it. Someone who says this needs support to find a way the agile techniques may work for her. When such an obstacle is overcome, that is what convinces people the most.

So, I'd like to extend the “Green Mainline Policy”:

  • Invest in the people by teaching them agile techniques
  • Make sure to provide ongoing coaching resources and support in individual cases by experts that know the code base and the “environment” as well as the agile techniques
  • Invest in cleaning up the code, make it more testable
  • Invest in cleaning up the tests, stabilize them

It is obvious that these things will take time. That's why I would propose a more human handling of the “Green Mainline Policy” emergency process. There should be owners of the mainline who function as gatekeepers. They decide when to close mainline for changes. When mainline breaks they look at the failure and classify it. If it qualifies for instant fixing, then the fix is requested and mainline is closed. Sometimes the fix may not come that easily; then at least a root cause analysis should be requested, and mainline is reopened once this analysis has shown there is no easy fix. And so on. It depends on the situation you find yourself in. Sometimes even the strict CI server solution will work fine for you right from the start. In all other cases the reins need to be tightened in accordance with the progress in education and code cleanup. Things need to be improved incrementally. Any small step makes it better than before.
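The gatekeeper decision just described can be sketched in a few lines. Everything here is an illustrative assumption, not a real tool: the failure record, its fields, and the four-hour threshold are invented to show the shape of the rule, not to prescribe it.

```python
from enum import Enum

class Verdict(Enum):
    CLOSE_AND_FIX = "close mainline, request an instant fix"
    ROOT_CAUSE_FIRST = "keep mainline open, request a root cause analysis"
    INFRASTRUCTURE = "reopen mainline, notify the infrastructure team"

def classify_failure(failure):
    """Gatekeeper rule of thumb. `failure` is a dict with the hypothetical
    keys 'kind' and 'estimated_fix_hours'."""
    if failure["kind"] == "infrastructure":
        # The test and the code are fine; mainline need not stay closed.
        return Verdict.INFRASTRUCTURE
    if failure["estimated_fix_hours"] <= 4:
        return Verdict.CLOSE_AND_FIX
    # An expensive fix: do not block everyone for days, but insist on analysis.
    return Verdict.ROOT_CAUSE_FIRST
```

For example, `classify_failure({"kind": "test", "estimated_fix_hours": 2})` yields `Verdict.CLOSE_AND_FIX`. The point of the sketch is that the human gatekeeper applies a rule like this, and the rule can be tightened as education and code cleanup progress.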

Yes, this is no easy solution. It will take time to implement. But it took quite a while to make the mess; cleaning up needs its time too. Next time I will have a closer look at the types of Sporadics. Knowing these types will pave the way to proposing solutions for how to avoid them.

The opinions expressed in this blog are my own views and not those of SAP

Saturday, September 6, 2014

Tune up the Speakers – Let Sporadics Scream at you

In my last post on this topic I pointed out that the main reason for Sporadics creeping into your test base is the lack of urgency to prevent them from doing so. These days developers find themselves confronted with competing requirements, and fixing Sporadics is one of the least pressing ones. Thus, making them more pressing is required to move them up the queue. We have also talked about the mind shift that would be required to pay more attention to these so-called Sporadics, to get them analyzed and out of the test base. Just putting more pressure on them from a technical side would not help. In my first post of this series I complained about the it's-just-a-sporadic attitude, which contributes to the decrease of urgency. This has to go too.

So, what can be done? The first step is to make the Sporadics ring in your ear each time they show up.

Actually there are two solutions, an informal one and a tool-supported one, sharing a common thought. One works fine for smaller teams or projects; the other may be required for larger ones. Both solutions work with these principles in mind:

“There are no Sporadics. There are failing tests only”

Make people understand this. There are no second-class failures that can somehow be ignored or paid less attention to. Every test failure indicates an issue to be solved, as pointed out earlier. No statistics needed. No guessing about known issues required. The second principle is borrowed from “Continuous Delivery” [1]:

“Do not check in on a broken build”

When the build and test pipeline breaks, something is wrong with your product. It does not make sense to move on sandy ground. The reason for the failure might be a bug in the application. The basic assumption is that the responsible developers will investigate the failure and eventually provide a fix. Only after this has been done is the next check-in allowed. As the authors of [1] point out, breaking this rule prolongs the time until the build gets green again, for analysis on a moving target is much harder to do. And second, people get used to seeing broken builds, which either yields ignorance or a big clean-up effort every now and then, which nobody really wants to do.
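A pre-submit check is one way to make this principle mechanical. The sketch below is an assumption-laden illustration: a real setup would query the CI server's API, while here the mainline status is read from a hypothetical JSON file, and the one permitted exception is the fix itself.

```python
import json

def mainline_is_green(status_path):
    """Return True only when the recorded mainline status is 'green'.
    The file location and its format are assumptions for this sketch."""
    with open(status_path) as f:
        return json.load(f).get("mainline") == "green"

def submit_allowed(status_path, is_fix=False):
    # "Do not check in on a broken build": the only change allowed onto a
    # red mainline is the fix for the breakage itself.
    return is_fix or mainline_is_green(status_path)
```

Small teams can live this rule informally; the function above is merely the same agreement written down so a tool can refuse the check-in instead of a colleague.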

Both these principles can be actively lived by smaller teams or projects, without any additional processes or tools, when people accept them as the way they want to work. I've seen this working in real life. As long as the people involved share these principles and do not cease to enforce them, they will be perfectly fine. But this may turn out to be hard work, and sometimes the discipline erodes. Once the window gets broken, it is hard to fix it again. This gets even harder in big projects with many teams involved. That is why I come up with this proposal of mine. It is basically about how to make a large organization react properly and instantly to broken builds, not about how a certain kind of sporadic failure can be fixed to make a test more stable. I will come to that later, though. For I do think all the posts on the net lack one important discussion: is the intermittent test itself really the test you should have, or would another testing strategy for the feature under test help avoid the issues that usually make tests unstable?

Proposal: Green Mainline Policy

When gentlemen's agreements do not suffice to keep the “Do not check in on a broken build” principle alive, it might be time to establish a tool-supported process that enforces the desired behavior. I will call this a “Green Mainline Policy” or “Green Branch Policy”, for it is not only applicable to the mainline. (In the following I will refer to mainline only, for the sake of readability.)

The objectives of the proposal are:

  • Reduce or eliminate the occurrence of Sporadics
  • Make fixing a broken build the highest priority task
  • Work for large projects

So, what’s the proposal then? Basically it’s quite simple. Suppose there is a continuous build and test run after each change submitted to mainline. If this run succeeds, everything is fine, obviously. If it fails, the CI server will

  • close mainline for any further submits, except for the person/team responsible for fixing the issue
  • open a ticket in the issue tracker
  • send a mail to the team/person responsible for the broken test

When the fix has been provided, it is accepted for submission to mainline and a new build and test run starts. Only if it succeeds are the restrictions lifted and everything is back to normal.
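The CI server's side of this can be sketched as a small state machine. The class and action names below are invented for illustration; a real CI server would call its issue tracker and mail gateway where this sketch merely records the intended actions.

```python
class GreenMainlinePolicy:
    """Sketch of the CI-server reaction to a broken mainline build."""

    def __init__(self):
        self.closed_for = None  # None means mainline is open for everyone

    def on_build_result(self, passed, responsible_team):
        if passed:
            self.closed_for = None  # lift all restrictions
            return []
        # Close mainline: only the responsible team may submit the fix.
        self.closed_for = responsible_team
        return [
            ("open_ticket", responsible_team),
            ("send_mail", responsible_team),
        ]

    def may_submit(self, team):
        return self.closed_for is None or team == self.closed_for
```

After a red build only the responsible team passes `may_submit`; the first green run afterwards reopens mainline for everyone. The same logic could just as well be executed manually by mainline owners, as discussed next.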

Instead of letting the CI server handle the policy, there could be QA people or mainline owners who execute it manually. This depends on how business is usually done. The main point is to really close mainline for further changes. No one will be able to bring any changes, except the fix, near mainline anymore until the issue gets fixed. With that, the “Do not check in on a broken build” principle does not only depend on people behaving properly but is enforced from outside the team, and thus is much harder to undermine.

Reality check: there is a project I know of which incorporated this “Green Mainline Policy”. Although there were some resentments in the beginning, the policy proved to be of value. Not only do Sporadics get analyzed (and very often fixed) quickly, teams also adopted this policy for their team codelines as a rule to follow without process enforcement. The perceived quality improved. Quality felt better than before, when everyone was bothered by an ever-broken mainline build. Now quality issues become evident, and the improvement of quality is visible to everyone: fewer and fewer broken mainline builds.

Invitation to further discussion

However, it was quite easy to type this down, and it might be tempting to leave it like this. But this wouldn’t be the right thing to do, for the proposal raises a lot of questions which need to be answered. I will come to these in a second. First I would like to discuss what this simple approach can do.

It puts a whole lot of urgency on fixing the issue which broke the build. As it is, mainline will only be opened when a fix arrives. The issue, be it a regression or any other defect, will not show up anymore, given the fix was a thorough one and does not, for example, just increase a timeout that proved to be too short. The lack of urgency for Sporadics would be gone. There couldn’t possibly be more urgency than a stopped mainline produces. There are no excuses for not fixing the issue anymore. But does this change the attitude we talked about earlier? Probably not. It’s more an extrinsic motivation, one might call it pressure, than an intrinsic one to avoid or at least quickly remove Sporadics.

There are other questions popping up:

  • How long would mainline be closed if fixing the issue took a day, or two, or even longer?
  • How would you verify the fix? By running the whole build and test pipeline again? By verifying just the failing test?
  • What if the failure cannot be reproduced?

Much depends on the very nature of the failure. There are so many possible sources of failure in a test that one needs to give a more differentiated answer. It’s plain to see that the approach cannot remain this simplistic. I will try to elaborate on this next time around.

Read also:
Part 1: Sporadics don't matter
Part 2: Let the CI server take care of these Sporadics

[1] Humble, Jez, and David Farley, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation, Addison-Wesley, 6th printing 2012, p.66


Thursday, August 28, 2014

Where do these Sporadics come from?

In part 2 of this series I concluded that automating the detection of intermittent, random or non-deterministic tests (aka Sporadics) comes with unforeseeable extra costs. It may serve as a way of monitoring Sporadics and might yield information to rank them, to decide which should be solved first. But if the efforts stop there, then one has basically invested in managing the status quo instead of changing it for the better.

Before proposing a possible solution to the Sporadics problem I need to elaborate more on how a test becomes a Sporadic. Understanding this will provide hints for the solution to be established.

The basic assumption is that all tests were successful when they were first added to their test suites. Otherwise, Sporadics would not be the topic to talk about …

Suppose there is a test which ran successfully for a fair amount of time. Weeks or even months. Everything was fine until the day it turned red for the very first time. A test failure occurred. Nothing special. What happened back then? Someone might have investigated the issue. After some thorough work, one came to one or more of these conclusions:

  1. The test failed because of a bug in the test, the bug got fixed
  2. The test failed because of a bug in the productive code, the bug got fixed
  3. The test ran successfully again when the complete test run was repeated
  4. The test ran successfully when repeated in isolation, so there supposedly was no issue with the test or the code under test itself
  5. The test failure does not relate to the change made to the productive code at all. Strange, but well (imagine a shrug when reading)
  6. There was some filer outage and the test wasn’t able to read or write a file, or there was some other issue with part of the infrastructure the test was using
  7. You name it

(I will not go into detail about inherently fragile tests, some of which can be a great source of Sporadics. Nor will I elaborate on possible root causes that make a test intermittent. This is not the point of this post. I will save it for later ones.)
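The distinction between conclusions 3 and 4 above can be made mechanically by rerunning the failed test in two ways. The sketch below is a deliberate simplification: the two callables stand for "rerun the whole suite" and "rerun just this test", and the returned labels mirror the list above.

```python
def classify_rerun(rerun_in_suite, rerun_isolated):
    """Classify a first-time failure by rerunning it. The two arguments are
    placeholder callables (assumptions) returning True on a green rerun."""
    if rerun_in_suite():
        return "case 3: passed when the whole run was repeated"
    if rerun_isolated():
        return "case 4: passed in isolation, suspect an inter-test dependency"
    return "deterministic failure: investigate the test or the productive code"
```

Such a rerun only labels the failure; it must not become an excuse to stop there, which is exactly the trap described in the next paragraphs.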

We are not talking about the first two items in this list. These are the good cases where the safety net worked and the required actions were taken.

Nor are we talking about the occurrences where a real root cause analysis was made and the problem was found and fixed. These things happen, probably more often than not, especially if conclusion 5 was not accompanied by 3 or 4. But they do not happen every time, as the number of Sporadics in a corpus of tests will tell you.

Item number 6 is an easy catch: an infrastructure issue. The people maintaining the infrastructure fixed it, or the issue was a temporary one. The test has been good all along; no one has to do anything about it. Surely it will be green again as long as the infrastructure issue does not reappear.

Items 3 and 4 tend to be soothing enough for the person investigating the issue that no further actions follow. Looks good now. So, just merge into mainline. Must have been some sort of hiccup.

Item number 5 consumes the most time in investigation. It looks strange, but a rerun, standalone and as part of its test suite, succeeded. Somehow it leaves a bad taste in the mouth. But hey, didn’t it succeed all the time? And now it does again. Let’s leave it alone and do some important work, what d’you think?

While item 6 is a bad signal in itself, items 3, 4 and 5 are the ones that could break the neck of the company. If we are lucky they “only” signal issues with the tests themselves. Some hidden dependency that appears in strange situations, just on days of certain signs of the zodiac while the moon is in a particular position … you’ve got the idea. If we are not that lucky, they were only the tip of the iceberg, and there is some non-deterministic behavior in our productive code which might lead to loss of data or some other hazardous event when in production use at a customer’s site. Maybe you only have a hard time analyzing it, working long hours and weekends; or you might be facing a PR disaster, or even worse, a substantial claim for damages.

Experience shows that test failures like items 3 through 5 are the more easily shrugged off the more often they occur. These failures are somehow not taken as seriously as they should be. Quite often there is an argument built on pure statistics, or on issue tracker records for the code under test, showing that it is not worth the time one would have to spend to find the root cause, or, if the root cause is known, to fix it. Personally, I have witnessed such a line of thought more than once. However, tests that failed this way a first time will fail a second and a third time, and over again. And there you are. Now you have a Sporadic. A known issue. An item on a list or a record in a database. And one by one they creep in.

Why can this happen? In an ideal world a developer would follow the Continuous Integration principle and thus would be eager to get rid of any test failure in mainline builds at any time. Since we are not living in such an ideal world, things are a bit different. There are test failures that won't be fixed, for the reasons listed above. It would be too easy to blame developers for not caring.

Developers find themselves confronted with various, competing and sometimes even contradictory requirements. Features always come first, for the company gets paid for them. Maintenance is important too. Not to forget quality. Or some refactorings. Issues with sporadically failing tests for features delivered long ago (several weeks or months) tend to get lost in all this. They just don’t reach the level of urgency they would require to get the necessary attention.

In part 1 I complained about the attitude developers show towards Sporadics. This attitude is influenced by the whole environment developers find themselves in. So just introducing yet another tool will not help. There is also the social aspect of this. Or, as +Steve Sether pointed out in his comment on my second post of this series:

"Since the problem is essentially a social problem, I think we should look towards social science for guidance.  It's been experimentally verified that people discount any potential badness that happens in the future.  So, bad thing in the future is much better (in people’s minds) than bad thing in the near present."

So let's explore the possible solution next time around. The teasing will have an end then ...
