Cleaning up the code base: how to avoid missteps

It’s time to clean up… where do we begin?


Life truly begins after you have put your house in order - Marie Kondō

You’ve finally hit that point in your company where everyone agrees that your tech debt is untenable and development has slowed to a crawl. The situation is so bad that the entire leadership team is aligned on the need to invest in fixing the mess (otherwise known as your code base). It’s great that you finally have carte blanche to fix the issue, but the reality is that you’re at a fork in the road - what you do next will set you on the path to either success or failure.

I’ve typically seen companies take one of two approaches (spoiler alert: neither of them works). The third is the process I advocate for - I’ve applied it and seen great results. Here’s a list of these methods and an explanation of each:

Stop everything and fix method (doesn’t work)
Highlighting warnings without enforcement method (doesn’t work)
Hold the line method (works!)

Why the Stop everything and fix method doesn’t work

This approach, sometimes called a “code red” (or a “code yellow” to make it sound less dramatic), is where you pause delivery of new code and take on the Herculean task of cleaning up all your existing code so that it complies with the newly adopted standards. This method feels like an aggressive investment in fixing tech debt, but it is actually the most dangerous approach: I have consistently seen it backfire.

Here are the three reasons it doesn’t work:

It is extremely expensive: the whole group’s productivity plummets, and it’s typically unrealistic for a business to spend more than 3 to 6 months halting all new development when competitors are nipping at your heels. Plus, expecting to fix a mess that was created over 5 to 10 years in the span of just 3 to 6 months is unrealistic. The more time you spend in this phase, the higher the expectation of nirvana when you say you’re ready for new code development. If you then admit that you still have code quality issues after this investment, your team’s credibility is shattered and code quality gets a bad name. Worse yet, when you fight for code quality in the future, it’ll be perceived as a waste of time.
It’s unfocused: You might have tools that tell you where all the problems lie in your code base. But even when you have this exhaustive list, where do you begin? Which areas are more important? With this approach you might spend time fixing messy areas that are not necessarily the areas you’ll be building new code on. In that case, you’ll have wasted valuable months of “code red” on the wrong priorities.
It leads to a focus on irrelevant metrics: When you’re stopping all development at a ridiculous cost, you have to show metrics that demonstrate progress (and quickly). I’ve seen teams cater to meaningless metrics: raising test coverage by adding meaningless test cases, or arbitrarily splitting big functions in two just to show that there are no huge functions left. Any metric can and will be gamed to get to the end of the cleanup.

Why the highlighting warnings without enforcement method doesn’t work

In order to not halt ongoing progress, you use tools to generate warnings on where the issues in your code base lie (in the build system, in CI, maybe even at the compiler level). The difference from the previous approach is that you don’t treat these warnings as errors. In other words, you highlight where the issues are but you continue to allow existing code to build and deploy and you continue to build new features.

The intent with this approach is that because you have all these warnings, you now have some idea of the magnitude of the problem, and you can address it over time without halting new development.

This is a lovely idea, but it’s wishful thinking. Here’s why:

Persistence of the “broken window syndrome”: When a neighborhood is in a state of disrepair, it sends the signal that it is not monitored or valued, and that leads to more negative action. When your project is in disrepair and nothing enforces fixing problems in the existing code base, it’s tempting to take shortcuts as you add new code. What’s the point in being the first one to write unit tests when nothing is tested? And why attempt to streamline the code logic when everything else is a mess?
Warnings will be in the backlog forever: The tools will give you a list of things to fix, but you still have new features to deliver that continue to add to (and build on) the mess. In reality, you’re back at the beginning, and nothing has really changed except that you now know how much of a mess you have on your hands. It feels even more demoralizing for the team because you can see how big the cesspool is, and you feel like you’re never going to wade out of it.
You reach a state of not caring about warnings: While everyone can see that there are still problems and potential lurking bugs in your code, there are no incentives for removing the existing problems, and few obstacles to introducing new ones. So warnings no longer matter. If you already have 1832 warnings at compile time, who’s going to notice that there are now 1833?

Why the hold the line method works

The “hold the line” method blocks you from adding new code, or modifying existing code, in a way that violates your coding standards or checks, but it allows preexisting errors in untouched parts of your code base to be ignored. In essence, it stops you from creating a bigger mess, and anytime you modify existing code, it forces you to resolve the errors in that portion of your code base.

This approach solves the problems of the previous two approaches:

It stops the mess from getting worse: Your code can only get better than it was. You don’t have to worry about people introducing new problems and new violations.
It automatically prioritizes the hotspots: This approach gets you to fix code that you are actively working on. Whether it’s a new feature or an existing one, if you are working on it, it means it’s a company priority. So you’re automatically fixing tech debt in a manner that is aligned with your company priorities. It’s also more effective to fix code that you’re actively working on because you’re more likely to understand that code in its context, as opposed to fixing it in the context of a “code yellow” massive effort.
It gets you the most return on investment: Because you’re folding this cleanup effort into ongoing development, it happens gradually, and it doesn’t require massive coordination across the whole development team.
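
To make the rule concrete, here is a minimal sketch in Python of the filtering step at the heart of hold the line. It assumes you already have a list of linter findings and a map of which lines the current change touched. The Finding class, the function names, and the sample data are all hypothetical, made up for illustration, and are not the API or output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One linter result. The fields are hypothetical, not a real tool's format."""
    path: str
    line: int
    message: str


def hold_the_line(findings, changed_lines):
    """Return only the findings that sit on lines touched by the current change.

    changed_lines maps a file path to the set of line numbers that were added
    or modified. Findings anywhere else are preexisting and are ignored.
    """
    return [f for f in findings if f.line in changed_lines.get(f.path, set())]


def gate(findings, changed_lines):
    """Return an exit code: 0 lets the change through, 1 blocks it."""
    blocking = hold_the_line(findings, changed_lines)
    for f in blocking:
        print(f"{f.path}:{f.line}: {f.message}")
    return 1 if blocking else 0


# Made-up sample data: three findings, only one of them on a modified line.
findings = [
    Finding("billing.py", 10, "function too long"),      # old code, untouched
    Finding("billing.py", 42, "unused variable 'tmp'"),  # on a modified line
    Finding("legacy.py", 7, "missing docstring"),        # file not in the change
]
changed = {"billing.py": {41, 42, 43}}

exit_code = gate(findings, changed)
print("exit code:", exit_code)
```

In this sample, only the finding on line 42 of billing.py blocks the change. The other two are preexisting, so they stay in the backlog until someone touches that code.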

How I applied the hold the line method

In the last two years I’ve used trunk check (trunk.io) to apply this hold the line method and found it to be very effective. Here’s why it worked well for the cleanup effort I was heading:

It is a meta-tool: It uses your existing tools and it helps you bring in new ones. Just let trunk check run the linters that you specify, and it will keep track of pre-existing problems in untouched code and will error out in case of new errors or existing errors in code that you just modified. No need to spend time introducing new tools if you trust your existing checks.
It’s hermetic: Developers often have the issue that code runs fine locally on their machine, but not on the server. trunk check defines which binaries of which version of which tool to run, and makes sure that each project runs the specific set of checks determined in its configuration, regardless of what’s installed on a user’s machine or a build agent. This means that as a developer, you get consistent results on every machine! You can do a pre-flight check locally with exactly the same tools that will run in CI. The end result is that it saves developers time because you figure out locally exactly why it won’t work on the server - you don’t have to build for half an hour only to find out it doesn’t work on the server!
It’s configured within your project: It’s not a centralized tool that needs a server and an admin. It’s fully configured in the code of your project and any changes to the configuration go in lockstep with your code. If you add a new check, it will only apply from the commit where it was added, giving you fewer headaches when checking out and experimenting on older branches.
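
The idea of a pinned, project-local tool set can be illustrated with a small Python sketch. This is not how trunk check is implemented or configured. It only shows the principle of keeping the expected tool versions in the repository and comparing any environment against them. The tool names, version numbers, and the version_mismatches helper are assumptions made up for this example.

```python
# Hypothetical pinned tool versions, kept in the repository next to the code.
PINNED_TOOLS = {"flake8": "6.0.0", "shellcheck": "0.9.0"}


def version_mismatches(pinned, installed):
    """Return {tool: (wanted, found)} for every tool whose version differs.

    `found` is None when the tool is missing from `installed`.
    """
    return {
        tool: (wanted, installed.get(tool))
        for tool, wanted in pinned.items()
        if installed.get(tool) != wanted
    }


# Made-up environments: a laptop with an older shellcheck, and a build agent.
laptop = {"flake8": "6.0.0", "shellcheck": "0.8.0"}
build_agent = {"flake8": "6.0.0", "shellcheck": "0.9.0"}

print(version_mismatches(PINNED_TOOLS, laptop))
print(version_mismatches(PINNED_TOOLS, build_agent))
```

Because the pinned set lives in the repository, checking out an older commit also brings back the tool versions and checks that applied at that commit.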

I consider our introduction of trunk check a great success - it enabled us to add a large number of checks to our build/test pipeline without first having to make the codebase compliant.

It also saved developers a lot of time by allowing them to check locally for compliance instead of waiting until the build agent reported a failure. In fact, it also saved us a lot of useless build time in the CI pipeline since fewer faulty commits were pushed to it!

And perhaps most importantly, it changed the mindset across the organization - fixing tech debt was no longer something that was pushed off for the rare occasions when we could afford to spend time on it. It didn’t require product management or leadership to prioritize fixing tech debt in a sprint or two. Instead, the cleanup became part of our ongoing working habits.