Wednesday, July 8, 2009

Technical debt in Scrum projects

Code deteriorates with age. The older the system, the worse the maintenance. Why is this?

One answer may be that it is because of the maintenance itself: the things done over the years to keep the system afloat and the new features added, removed and changed. And people come and go. Some quit the team and new ones join. Sum it all up: deterioration.

Let’s talk about some of the causes and effects of deterioration.

Reinventing the wheel
An example of this is when a system contains several versions of the same function, implemented at different times and often by different people, with only small differences. A similar situation is where a design (architecture) problem is solved in different ways in different places in the system. Either way, reinventing the wheel increases the complexity of the system and makes it harder to maintain. Developers won’t know which version is used where, or even whether some are still used at all (and consequently leave them all in place).

This is a negative spiral; the more complex the system gets the harder it is to understand, and consequently the harder it is to know if “my” problem has been solved already somewhere – which might lead me to unintentionally write my own version of a function that actually already exists. The lesson is to not reinvent the wheel. But it is easier said than done.
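
To make the duplication concrete, here is a minimal sketch in Python. All three function names are made up for illustration: two near-identical helpers that grew up in different corners of a system, and one shared version that could replace both.

```python
# Two hypothetical near-duplicates, written at different times by different people.
def format_name_v1(first, last):
    return first.strip() + " " + last.strip()

def format_name_v2(first, last):
    return "%s %s" % (first.strip().title(), last.strip().title())

# One shared version; the small difference becomes an explicit option
# instead of a second copy of the function.
def format_name(first, last, title_case=False):
    full = f"{first.strip()} {last.strip()}"
    return full.title() if title_case else full
```

Callers of the first helper would switch to format_name(first, last), and callers of the second to format_name(first, last, title_case=True), after which both old copies can be deleted.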

Maintainability is not something you can just add down the road. Building a maintainable codebase starts from the first line of code and it never ends.

Fear of changing what works – Legacy code
Once a system reaches a certain level of deterioration, people will be afraid to change certain parts. It’s often some critical and central component. Because it is central and critical, it has been patched up numerous times, and changed and adjusted and changed again – over a long time. And hey, it works – now. And since it works, now, and since the code is so complicated and confusing, the developers don’t dare touch it. Consequently, any feature that would require work in that critical component gets held back.

The way I see it, one of the root causes of this is a lack of regression testing abilities. With proper means for continuous regression testing that covers the whole system, there is little to fear even when it comes to modifying old “legacy code”. Without it, there are only two options when it comes to modifying legacy code: a complete rewrite, or not touching it at all.
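
One way to get a safety net around legacy code is to record what it does today and compare a cleaned-up version against that recording. The sketch below is only an illustration: the fee functions and their rules are invented stand-ins, not taken from any real system.

```python
def legacy_shipping_fee(weight_kg, express):
    """Hypothetical stand-in for a much-patched legacy function."""
    fee = 5.0
    if weight_kg > 10:
        fee = fee + (weight_kg - 10) * 0.5
    if express == True:
        fee = fee * 2 + 1
    return fee

def refactored_shipping_fee(weight_kg, express):
    """Hypothetical cleaned-up version that must behave identically."""
    fee = 5.0 + max(0, weight_kg - 10) * 0.5
    return fee * 2 + 1 if express else fee

def snapshot(func, cases):
    """Record the output of func for every input tuple in cases."""
    return {args: func(*args) for args in cases}

# Inputs chosen around the branch points of the legacy code.
CASES = [(w, e) for w in (0, 1, 10, 11, 25) for e in (False, True)]

baseline = snapshot(legacy_shipping_fee, CASES)
candidate = snapshot(refactored_shipping_fee, CASES)

# Any input where the two versions disagree is a regression.
differences = {args for args in CASES if baseline[args] != candidate[args]}
```

An empty differences set means the rewrite behaves the same for the recorded inputs. It says nothing about inputs that were never recorded, so the list of cases should grow over time.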

Lack of (efficient) regression testing abilities
The purpose of regression testing is to verify that every part of the system – even the less obvious parts – still works after a change or addition somewhere has been completed. The point of regression testing is to test a lot, over and over and over again. If you are in a situation where regression testing requires immense resources (which it does if you do it completely manually), then you will not be able or willing to do it as often or as extensively as is required. As a consequence your regression testing becomes inefficient, or even non-existent.

One step in the right direction is to start automating your tests, and to run those automated test cases continuously. The challenge, of course, is to:
(A) find a practical means - a tool - for automatic testing of your type of code, and
(B) figure out where to start.

I can’t help you with A (...psssst: JUnit, CPPUnit, CUnit, PHPUnit), but for B I suggest you just start somewhere. Don’t try to do it all at once – you won’t make it! Instead, just pick a simple starting-point, for example the most recent and newly added feature/module/component/part. Forget about the old stuff for now, just add automatic testing for the new things from now on. Something is better than nothing.
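
As a sketch of what "just start somewhere" could look like, here is an automated test for a single newly added function, using Python's built-in unittest module (the same idea applies to the xUnit tools mentioned above). The apply_discount function and its rules are hypothetical.

```python
import unittest

def apply_discount(price, percent):
    """Hypothetical newly added feature: reduce price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)

    def test_rejects_percent_out_of_range(self):
        with self.assertRaises(ValueError):
            apply_discount(80.0, 101)
```

Saved in a file whose name starts with test_, this can be run with "python -m unittest" on every build. The old code stays untested for now, but every new feature arrives with its own checks.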

People joining and leaving
This is, unfortunately, unavoidable in most projects that last longer than just a few months. It happens. Either people get reassigned, or they choose to quit. And, best case, people join the team. Apart from a change in productivity caused by a team member leaving or joining, it obviously has an effect on the code itself too. People leaving will take knowledge with them, and people joining bring in new ideas (and misunderstand parts of what exists, too). One way to minimize the effects of this is to organize into “Feature Teams” that have a lot of close cooperation and joint commitments (like – tada! – Scrum suggests). This way you naturally spread knowledge among several people. It is also a pretty effective way of introducing new team members into the groove of things.

The classic method to minimize the problem of people leaving and joining is to write documents. I however argue that this is not the silver bullet for spreading & retaining knowledge. In fact, I think it's dangerous to rely on documents as the main tool for this. Documentation is an extremely cost-inefficient and overrated way of transferring knowledge, and the cost of keeping it up-to-date is often forgotten. As soon as documentation falls behind it becomes untrustworthy, and untrustworthy documentation causes confusion and misunderstandings. In the end no one will dare rely on it, and the time spent writing and updating it up to that point becomes waste.

In my opinion the guideline should be: don’t over-document – document just enough. Code comments, I think, are a great benchmark of what level of documentation is “enough” for most situations. And remember: one excellent thing about code comments is that they are automatically (well…) kept up to date as the code changes – there’s little or no added cost for keeping them up to date.

Oh, and for the record, I’m not saying “Don’t write documents!”. If you really need to document then of course you should. I’m merely suggesting that you at the very least question the reason for making that effort, and that you don’t forget to take into account the cost of keeping the document up-to-date as the system evolves.

Taking shortcuts - the Dirty that remains long after the Quick has been forgotten
“Well, we’ll do the quick-n-dirty fix now, just to get it done, and then we’ll go back later and clean it up...”. Have you ever said or heard something along those lines?

Short-term gains, such as reaching some immediate deadline, sometimes make it tempting to take shortcuts, and often it’s, sadly, a conscious decision. The problem with shortcuts is that they seldom or never get fixed afterwards, because there’s always that next deadline coming up with a new bunch of stuff to do and a new bunch of shortcuts that “have” to be taken.

Doing things right from the beginning often requires a little more effort up front. And I think that different times call for different ways of acting. Sometimes it might be a correct decision to look only at short-term gains, cut down on the immediate effort, and forget about the long-term consequences and drawbacks. But many teams and managers, I think, tend to be shortsighted by default – even when they don’t have to be and there would in fact be room to do things properly. And in that case it is a matter of attitude (and competence). Do you do things fast and sloppy now and accept paying for it later, or do you let it take a little longer now and reap the benefits later (for example in terms of costs saved on maintenance)?

It’s a challenge to figure out what is a shortcut and what is not. Remember Lean Software Development and the idea of “Extra effort” (and “Extra features”) being Waste. How do you know what is “Just enough effort”? There is no default answer to that. It depends. It’s up to you to figure that out for your system and for your business. But by figuring that out (or deciding on it) you will know what level to strive for; and anything below it is a shortcut and should not be accepted.

Don’t forget that you need to make sure that whatever your level is, it should be gut-felt by every team-member.

Bug fixes
Bug fixes have a tendency to deteriorate code. Enough patches in one place and the code will become more and more messy – at least if you have other problems too that cause code deterioration, such as people coming and leaving, inability to do regression testing, etc.

Bugs found in a production environment are often time-critical to remedy. It can be tempting to take shortcuts to just fix the problem quickly and get a patch out there. But if you do that enough times and never take time to clean up the mess, you are destroying your system. See the section above about taking shortcuts…

Summing things up – dealing with Technical Debt
A nice way to think of this deterioration of code is to think of it as “Technical Debt” - a term coined by Ward Cunningham.

Technical Debt is a long-term loan that we for some reason choose to, or have to, take from ourselves in order to achieve some short-term gain. The Technical Debt increases with every individual loan, and the debt never just goes away by itself. We have to pay “Interest” in the shape of things taking longer to complete because of this deteriorated code, and the only way we can decrease the Interest is by decreasing the loan – by making “repayments”, e.g. by refactoring.

The most common (and worst) approach to dealing with Technical Debt is to ignore it. To pretend it doesn’t exist, and just push development forward without considering whether or not we’re taking loans.
The better approach to dealing with Technical Debt is to recognize it and have an active plan on how to deal with it in various situations.

I suggest you deal with technical debt in two steps: first stop increasing the debt further, and secondly start decreasing it. Only once you know that your debt is not increasing can you start actively working on decreasing it.

To support you in the first step I suggest you insert a row in your Default Definition of Done that says “The Technical Debt has not increased”. It sounds trivial, but the intended effect is to recognize that it is OK for things to take longer to complete if they are done in a way that doesn’t increase the debt, i.e. that it is OK to not take shortcuts. Remember to constantly remind people about this, and really do put your money where your mouth is. Whenever faced with a choice, make sure the decision is in line with the attitude of letting things take longer in order to not increase the technical debt.
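
How a team verifies that row is up to the team, and no single number captures technical debt. Purely as an illustration, the sketch below uses one crude, assumed proxy: the count of TODO, FIXME and HACK markers in the code before and after a story. The marker list and both function names are hypothetical.

```python
# Assumed proxy for debt: leftover "fix me later" markers in the source.
DEBT_MARKERS = ("TODO", "FIXME", "HACK")

def count_debt_markers(source_text):
    """Count occurrences of the debt markers in a piece of source code."""
    return sum(
        line.count(marker)
        for line in source_text.splitlines()
        for marker in DEBT_MARKERS
    )

def debt_not_increased(before_text, after_text):
    """True if the change did not add to the marker count."""
    return count_debt_markers(after_text) <= count_debt_markers(before_text)
```

For example, debt_not_increased("x = 1  # TODO tidy up", "x = 1  # TODO tidy up\ny = 2  # HACK") is False, because the change adds a marker. A check like this catches only the shortcuts people admit to in comments, so it supports the team's judgment rather than replacing it.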

Once you’ve gotten used to that approach (it probably takes you a while and, if nothing else, will probably cause your velocity to drop significantly at first), the next step is to change the Definition of Done to instead say “The Technical Debt has decreased”. This is intended to recognize the fact that it is OK to also do some refactoring of things surrounding the current implementation “while you’re at it”. For example, urge all developers to clean up the methods above and below the one they are currently working on, even though it wouldn’t be necessary to complete the story itself! This way, for every new story completed the existing debt will decrease.
This type of refactoring puts a lot of demands on your regression testing abilities. If you don't already have an automatic testing environment I suggest you start by introducing that first. Otherwise, refactoring working code will (as explained in a section above) be much too scary.

That's it from me for now. As always I'm interested in hearing other people's experiences and opinions in this matter.
