Tackling Legacy Code

Posted by: Peter Morlion, on 6/29/2021, in Category Patterns & Practices
Abstract: Almost every developer will be confronted with legacy code early in their career. This can be a demoralizing experience that takes away the pleasure of writing software - especially if this situation continues for too long and developers don't see a way out. And yet, in my experience the way out looks very similar in most cases. Let me show you.

Improving Legacy Code – Disclaimer

A word of warning. This article will look at some technical steps you can take to improve your (legacy) code. But there is often more to it than just technical interventions. A software team must get enough time and opportunities to improve the existing code, without adding new features. For the best results, they need support from management and possibly the end-users.

Another thing I’d like to point out is that by "legacy code" I mean code that feels like a mess: unreadable, unmaintainable, riddled with bugs, or any combination of these problems.

Eyes on the Prize

First, let’s refrain from jumping right in and hacking away. Let’s get an understanding of what it is we want to achieve.

We generally want to improve our code in some way: improving readability, getting it under test, improving performance, removing unused code, etc. In many cases, this involves a series of refactorings, not a single pass. You’ll often only see certain improvements after you’ve made others. So, it’s an iterative process.

The Start: Add Tests

We’re going to be messing with code that we find difficult to understand. This also means that bugs are easily introduced and things are easily broken. So before we do anything else, we should put some effort into writing automated tests.

This will create a safety net for later changes. If we’re already unsure about the existing code, making significant changes will scare us even more, even if they’re for the better. With a good test suite, we’ll feel more confident making changes.

If all the tests continue to run successfully, we’re still not 100% sure that we haven’t broken anything. But we have reduced the chances of regression bugs when compared to making changes without automated tests.
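Often, you don’t even know what the "correct" behaviour is. A useful trick then is the characterization test: feed the code a fixed input and assert on whatever it currently returns, so that the current behaviour is pinned down. Here’s a minimal sketch using xUnit; the LegacyDiscountCalculator class and its numbers are hypothetical:

using Xunit;

public class LegacyDiscountCalculatorTests
{
    [Fact]
    public void CalculateDiscount_GoldCustomer_MatchesCurrentBehaviour()
    {
        // We don't know whether 12.5 is "correct"; it's simply what the
        // code returns today. The test pins that behaviour down so a
        // refactoring can't silently change it.
        var calculator = new LegacyDiscountCalculator();

        var discount = calculator.CalculateDiscount("gold", 100m);

        Assert.Equal(12.5m, discount);
    }
}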

Try not to change the code too much for now. We want our tests in place first; then we can start making changes. Our tests are there to verify that our changes don’t alter the existing behaviour. But depending on the code you’re working with, adding tests will be easy, hard or near impossible. If code is tightly coupled to other pieces of code, it will be much harder to run it in isolation in a test.

There are several ways to deal with this.

Use Different Types of Tests

Let me explain the test pyramid first. The test pyramid is traditionally depicted like this:

[Image: test pyramid]

What it means is that you should have a lot of unit tests, a reasonable number of integration tests, and only a few UI or end-to-end tests. The reason is that UI and end-to-end tests tend to break easily and are harder to maintain. They also take longer to execute.

Unit tests, on the other hand, are small, easy to read, easy to change, and usually fast to run. I’ve experienced lengthy discussions on what a unit test is, what it is not, and how it differs from an integration test. But there’s no real answer. That’s why this is a better way to talk about the test pyramid:

[Image: better test pyramid]

The more components you’re testing in a single test, the higher up the pyramid you are. So you should have more tests that test a single component than tests that test 3 or 4 components together. And you should have even fewer that test 10 components, and fewer still that test the whole application.

So that’s the theory. In legacy code, things tend to be a bit different, in my experience. One technique I’ve successfully applied is to actually start with end-to-end tests to get the application under test. As the team and I added more end-to-end tests, we could refactor more, which in turn allowed us to write more unit tests. We worked our way down the pyramid instead of up, so to speak.
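As an illustration, here’s what such an end-to-end test could look like for an ASP.NET Core application, using the Microsoft.AspNetCore.Mvc.Testing package to host the application in memory. The Startup class and the /webhook endpoint are assumptions for the sketch:

using System.Net;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

public class WebhookEndToEndTests : IClassFixture<WebApplicationFactory<Startup>>
{
    private readonly WebApplicationFactory<Startup> _factory;

    public WebhookEndToEndTests(WebApplicationFactory<Startup> factory)
    {
        _factory = factory;
    }

    [Fact]
    public async Task Notify_Post_ReturnsSuccess()
    {
        // Boots the entire application in memory and exercises it over
        // HTTP, the same way a real client would.
        var client = _factory.CreateClient();

        var response = await client.PostAsync("/webhook", null);

        Assert.Equal(HttpStatusCode.OK, response.StatusCode);
    }
}

A test like this knows nothing about the internal structure of the code, which makes it ideal for code you don’t fully understand yet.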

Add Dependency Injection

Another way to go about this is to start by adding Dependency Injection. This does mean changing code before you write your tests, but the change is minimal. Take this piece of code, for example:

public class AlertService
{
    public void AlertForError()
    {
        var client = new AcmeMailClient();
        var address = "admin@example.com";
        var body = "Something went wrong...";
        client.SendMail(address, body);
    }
}

It’s a simple example to illustrate the case: an ASP.NET controller accepts POST requests when certain errors occur, and this service then sends out an email to an admin.

We can’t write a unit test for this, because we’re creating a new instance of the AcmeMailClient. If we run this code in a test, it will create that instance as well, resulting in a real call to the Acme mail service, and we don’t want that in our tests.

Luckily, this is easily fixed in two steps. First, we’ll extract an interface from our AcmeMailClient class:

public interface IMailClient
{
    void SendMail(string address, string body);
}

Our AcmeMailClient will have to implement this interface:

public class AcmeMailClient : IMailClient
{
    public void SendMail(string address, string body)
    {
        // send mail here
    }
}

The second step is to accept an instance of this interface in the constructor of the class we want to test:

public class AlertService
{
    private readonly IMailClient _mailClient;

    public AlertService(IMailClient mailClient)
    {
        _mailClient = mailClient;
    }

    public void AlertForError()
    {
        var address = "admin@example.com";
        var body = "Something went wrong...";
        _mailClient.SendMail(address, body);
    }
}

The calling code is now responsible for creating the new instance. For example, in a simple ASP.NET Controller:

[ApiController]
[Route("[controller]")]
public class WebhookController : ControllerBase
{
    [HttpPost]
    public void Notify()
    {
        var mailClient = new AcmeMailClient();
        var alertService = new AlertService(mailClient);
        alertService.AlertForError();
    }
}

The code we want to test doesn’t care about the specific implementation or where the instance was created or by whom. This allows us to write a test with a fake implementation. We’ll be using Moq (https://github.com/moq) to create our fake instance, but you could use one of the many other mocking libraries for .NET like Rhino.Mocks (https://hibernatingrhinos.com/oss/rhino-mocks), NSubstitute (https://nsubstitute.github.io/) or FakeItEasy (https://fakeiteasy.github.io/). For our tests, I’ve chosen xUnit.net (https://xunit.net/) but again, there are other options like MSTest (https://github.com/microsoft/testfx) or NUnit (https://nunit.org/).

using Moq;
using Xunit;

public class AlertServiceTests
{
    [Fact]
    public void AlertForError_ShouldSendEmail()
    {
        var mockMailClient = new Mock<IMailClient>();
        var alertService = new AlertService(mockMailClient.Object);

        alertService.AlertForError();

        mockMailClient.Verify(x => x.SendMail("admin@example.com", "Something went wrong..."));
    }
}

Of course, you could say that we’ve just moved the problem elsewhere. Yes and no. Yes, because that is exactly all we did. But no, because it allowed us to write tests for this component. If you repeat this throughout your code, you’ll eventually have moved the constructor calls all the way up the call chain, to (more or less) the beginning of your application. This could mean you end up with this ugly (pseudo)code:

IService a = new A(new B(new C(), new D(), new E(new F(new G()))));

But don’t panic. This was already happening before. It’s just very visible now. Now you can add an Inversion of Control container to do all the instantiating for you. Check out NuGet packages like Autofac (https://autofac.org/), Ninject (http://www.ninject.org/) or Microsoft.Extensions.DependencyInjection (https://docs.microsoft.com/en-us/dotnet/core/extensions/dependency-injection).

I won’t go into Dependency Injection too deeply, but it will allow you to pull apart tightly coupled pieces of code and put them under test. In your tests you would pass in fake implementations of the relevant interfaces.

But in your production code, this is what it could look like (here using Ninject’s syntax):

IService a = kernel.Get<IService>();

As long as you configured the Inversion of Control container correctly (i.e. defining which class should be used for which interface), this code will give you a new instance of the “A” class with all dependencies.
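As a sketch, this is what that configuration could look like with Microsoft.Extensions.DependencyInjection in an ASP.NET Core application, registering the classes from our earlier mail example:

using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddControllers();

        // Whenever something asks for an IMailClient, the container
        // hands out an AcmeMailClient.
        services.AddTransient<IMailClient, AcmeMailClient>();
        services.AddTransient<AlertService>();
    }
}

With this in place, the WebhookController from earlier can simply take an AlertService constructor parameter instead of newing everything up itself.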

The important thing to keep in mind is that we’re adding this so that we can get our code under test. Adding Dependency Injection makes it easier to write unit tests.

Talk the Talk

If you have access to developers who have worked with the code in the past, talk to them. They are an invaluable source of knowledge. They might know the ins and outs of the project, the weird pieces of code, and why they are the way they are. They may be able to tell you what to look out for and what the pitfalls of the project are. They’ll tell you things they’ve tried that didn’t work, and things that did. Even if they can’t tell you everything, they should be able to tell you something at least.

Make sure you don’t insult them. They wrote the code to the best of their ability, within the constraints that were present at the time (knowledge, time, budget, etc.). You’re not there to judge. You just want to extract as much knowledge about the code as you can.

Another person (or group of people) you will have to talk to is management: both your direct managers or team leads, and upper management. They have to know you’re going to invest time and effort into this, and that things may break along the way (but often they’re breaking constantly anyway). They have to support you in this, or the whole journey will be very frustrating.

Look Closely

Legacy code is notoriously unstable. Things are often breaking, and teams find themselves spending too much time fixing bugs and keeping the system up and running.

That is why you should add logging, monitoring and alerts to your application.

Logging

When you add logging, be sure that it makes sense. I recommend using a library that supports structured logging. Serilog (https://serilog.net/) introduced structured logging to .NET, but most popular logging frameworks should support it by now.

In many legacy applications, we log simple pieces of text:

_logger.Info($"Received HTTP code {response.StatusCode} after {response.Elapsed} ms");

This would give us the following text in our log:

"Received HTTP code 200 after 104 ms"

If you’re logging to a text file, that’s basically the best you can do. But if you’re logging to a cloud platform for example, you can use structured logging to improve your search capabilities. For this, we’ll use slightly different code:

_logger.Info("Received HTTP code {code} after {time} ms", response.StatusCode, response.Elapsed);

The above piece of code uses NLog (https://nlog-project.org/) and will log the same piece of text, but on platforms that support it, it will also include a JSON document:

{
    "code": 200,
    "time": 104
}

A good logging platform will then allow you to easily search, for example, for all log lines where the code equals 200 and the time is greater than 100. This is a lot trickier with simple lines of text. Structured logging can even serialize entire objects, giving you something like this:

{
    "code": 200,
    "time": 104,
    "request": {
        "url": "https://api.example.com/endpoint",
        "method": "POST",
        "body": {
            "foo": "bar"
        }
    }
}
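The exact syntax depends on your logging framework. With NLog’s message templates, for example, the call producing a document like the one above could look roughly like this (the request object is an assumption):

// The "@" tells the logger to destructure the object into structured
// properties instead of just logging its ToString() representation.
_logger.Info("Received HTTP code {code} after {time} ms for {@request}",
    response.StatusCode, response.Elapsed, request);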

Combined with decent log messages, structured logging allows you to quickly find the relevant log lines when something went wrong, and then understand why it went wrong. The log messages should tell you what the application was doing. The properties you added to the log event will tell you what the relevant data looked like at the time.

If it’s not yet possible to add structured logging, at least make sure you have decent logging that is useful for you. Add the necessary data to your log messages so that you can debug any issues that occurred.

Legacy code is typically prone to obscure bugs, and being able to find and fix them quickly greatly reduces frustration.

Monitoring & Alerts

Any modern system should be monitored to see if it is running well. Our legacy systems are often no longer "modern", so that’s all the more reason to monitor them closely. A modern monitoring solution will allow you to automatically keep an eye on the parameters that interest you:

  • CPU usage
  • RAM usage
  • Response times
  • Error rates
  • Etc.

You should be able to see the current state of the application easily, usually in some dashboard with graphs.
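The monitoring tool needs something to measure. If your application is an ASP.NET Core service, a cheap first step is to expose a health endpoint that the monitoring solution can poll. A minimal sketch, using the framework’s built-in health checks:

using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddControllers();
        // Registers the health check services; custom checks (database
        // reachable, queue depth, etc.) can be added here as well.
        services.AddHealthChecks();
    }

    public void Configure(IApplicationBuilder app)
    {
        app.UseRouting();
        app.UseEndpoints(endpoints =>
        {
            endpoints.MapControllers();
            // Monitoring tools can poll this endpoint; it returns
            // HTTP 200 with "Healthy" while the application is up.
            endpoints.MapHealthChecks("/health");
        });
    }
}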

Once you set up monitoring, you can automate even further. With alerts, you can automatically be notified if things are going wrong. For example, you could get an email when your API is returning too many HTTP 500 status codes. You can then check logs to see what is going wrong.

In some cases, you might be able to fix it before a customer or end-user even notices! A good monitoring solution keeps an eye on your application and lets you know if something is out of the ordinary. This allows you to focus on other important tasks, like adding new features and improving the existing code.

Just like your tests, monitoring and alerting is a safety net. A bug might still have made it through all your automated and manual tests. Monitoring is another way of catching issues as early as possible.

Measure

With all your safety nets in place, and your team now enthusiastically refactoring the legacy code, it’s interesting to track your progress. While not strictly necessary for getting a legacy project into a better state, it is useful to see how you’re doing and where you should be focussing your efforts.

There are many metrics out there that you can use. Here are some useful ones.

Code Coverage

Code coverage indicates how much of your code is covered by tests. People tend to focus on a target number, for example: "we need 80% code coverage." While targets are good, such a number makes little sense in a legacy application that has little to no tests. What I propose instead is to start adding tests and move the target up as you go. For example, once you have 17% code coverage, set the minimum percentage to 15. Once you have 22%, set it to 20. This way, you avoid dropping back below a certain threshold, and you move the threshold up whenever possible. After a while, you reach a point that you’re happy with.
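If you use Coverlet through its coverlet.msbuild package (rather than the collector used later in this article), you can enforce that ratchet on every test run; the build fails when line coverage drops below the threshold:

dotnet test /p:CollectCoverage=true /p:Threshold=15 /p:ThresholdType=line

Once you reach 22%, bump the Threshold value to 20, and so on.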

But apart from the specific number, code coverage is also handy to see which parts of the code you’ve covered already. Before I start refactoring a piece of code, I check to see if it’s covered by tests. If it isn’t, I need to write some tests first. Once I have the functionality covered, only then do I start changing the code.

A good code coverage tool should be able to show you your source code files with an indication of what is covered by tests and what isn’t. For our simple example, I’ve used Coverlet (https://github.com/coverlet-coverage/coverlet) to generate the code coverage results and ReportGenerator (https://danielpalme.github.io/ReportGenerator/) to generate a report.

As always, there are alternatives if you want. Visual Studio Enterprise has code coverage built in. Other options are NCover (https://www.ncover.com/), OpenCover (https://github.com/OpenCover/opencover) or dotCover (https://www.jetbrains.com/dotcover/).

I added Coverlet as a NuGet package to my unit test assembly. Then I installed ReportGenerator as a global .NET tool:

dotnet tool install -g dotnet-reportgenerator-globaltool

With that, I can run my tests from the command-line and collect the code coverage results:

dotnet test --collect:"XPlat Code Coverage"

Then I can generate the report:

reportgenerator -reports:".\MyApp.Tests\TestResults\50054626-428d-4c57-86a7-045e118c2946\coverage.cobertura.xml" -targetdir:".\MyApp.Tests\TestResults\50054626-428d-4c57-86a7-045e118c2946"

The final report looks like this:

[Image: code coverage report]

You can go into separate files and even see the lines that were covered:

[Image: code coverage file view]

Static Code Analysis

Static code analysis will analyse your source code and give you a report of potential quality issues. Many of these tools will even help you fix the issue.

Possible solutions are Microsoft’s .NET analyzers, the successors to FxCop (https://docs.microsoft.com/en-us/visualstudio/code-quality/net-analyzers-faq?view=vs-2019), ReSharper (https://www.jetbrains.com/resharper/), SonarQube (https://www.sonarqube.org/), and NDepend (https://www.ndepend.com/).

Of these, NDepend is the most complete for .NET projects. It provides a vast array of metrics to measure the quality of your code. However, it isn’t free which can be a show-stopper for some teams and organizations.

Team Conventions

Another cause and manifestation of legacy code is a lack of team conventions. If you don’t have them already, you really should get together with the team and agree on some things.

Code Conventions

It’s very jarring to navigate through code that uses different conventions throughout. Things like naming (variables, methods, classes, projects, etc.) and indentation come to mind, but even architectural choices can be agreed upon.

Certain conventions can be enforced by tools like StyleCop for .NET (https://github.com/StyleCop/StyleCop) or ESLint for JavaScript and TypeScript (https://eslint.org/). By automating the enforcement, you can avoid pointless discussions or awkward code reviews where it feels like you’re nit-picking.
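For example, with an analyzer package such as StyleCop.Analyzers installed, you can turn individual rules into build errors via an .editorconfig file. The rule below (SA1300, "element should begin with an upper-case letter") is just an example:

# .editorconfig at the root of the repository
root = true

[*.cs]
indent_style = space
indent_size = 4

# Raising the rule to "error" fails the build instead of
# leaving the discussion to code reviews.
dotnet_diagnostic.SA1300.severity = error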

Conventions can evolve of course. When someone makes a good case, or the team decides a certain convention is no longer something they want to follow, then it’s OK to make the change. If conventions aren’t allowed to change, the code will start to feel like legacy code even more.

Procedures

Tackling legacy code is a team effort. So you’ll have to make sure everyone is on board. This means everyone should be in favour of fixing the issues the code has, but also that the team should agree on how they will collaborate.

How will we react to bugs? How will we deploy our software? What release schedule will we follow? There are many questions that need to be answered.

I recommend automating deployments as much as possible. This includes configuration changes, database changes, etc. This means you’ll have to put all this in source control, which only has benefits.
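As a sketch of what automated database changes could look like: a migration runner such as DbUp (https://dbup.github.io/) applies versioned SQL scripts from source control at deployment time and keeps track of which scripts have already run:

using System;
using System.Reflection;
using DbUp;

public static class DatabaseMigrator
{
    public static void Run(string connectionString)
    {
        // Applies any embedded SQL scripts that haven't run yet, in
        // order, and records them in a journal table in the database.
        var upgrader = DeployChanges.To
            .SqlDatabase(connectionString)
            .WithScriptsEmbeddedInAssembly(Assembly.GetExecutingAssembly())
            .LogToConsole()
            .Build();

        var result = upgrader.PerformUpgrade();
        if (!result.Successful)
        {
            throw new Exception("Database migration failed", result.Error);
        }
    }
}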

Once you’ve automated your deployments, you can start releasing very regularly. I would aim for weekly releases at a minimum. Releasing faster makes it easier to pinpoint a bug: because you haven’t made a myriad of changes, the bug can only be hiding in a few places. If you do quarterly big-bang releases, there have been so many changes that it will be harder to find which change caused the bug. The time between the code change and the release is also longer, making it more difficult to remember your reasoning when you changed the code.

It Will Be Alright

It can certainly feel like a legacy project is unsalvageable and things are only getting worse. But it is possible to change course. It will take time and effort, and what I’ve covered may seem like a lot of work. However, if you do it step by step, using the tips I laid out here, you should be able to improve the situation you’re so frustrated with now. It certainly beats doing nothing about it!

This article was technically reviewed by Damir Arh.

This article has been editorially reviewed by Suprotim Agarwal.

Author
Peter Morlion helps SMBs and scale-ups reduce regression bugs in legacy codebases with too much technical debt. He aims for high-quality code, while remaining pragmatic and collaborating with the team. He firmly believes in automated tests and will coach teams to improve their test coverage and the quality of their tests. He enjoys teaching and sharing knowledge, as can be witnessed in the articles and blog posts he has written and the online courses he has recorded.

