Image source: unsplash.com, by @dallehj

Leveraging Mutation Testing To Onboard Legacy Projects

Boris Cherkasky
6 min read · Feb 13, 2023


You are an accomplished engineer, your skill set is vetted, and you are trusted with a challenging task — doing some important maintenance work on a legacy service you know nothing about.

The original developers of the service are long gone; you have enough business context for what it does, but limited context for its technical implementation.

Just understanding the architecture and code of the project is complex enough, but you also need to figure out how to test the change, assess its risk, and commit to effort estimations based on very little context and knowledge.

Wouldn’t it be amazing to have a way to reduce the unknowns, and the risks that come with them?

In this post I’ll cover a pattern I’ve found useful when onboarding an unfamiliar codebase (doesn’t have to be legacy) that works quite well in boosting confidence before starting the actual coding phase.

Challenges With Onboarding to Legacy Projects

One of the greatest challenges with unfamiliar codebases is the lack of context — the “tribal knowledge” gathered by the developers of the code while working on it, for example:

  • What the complex parts of the code are
  • All those “gotchas”
  • The “why” behind some of the design decisions and patterns
  • Which features were developed under pressure and are lacking test coverage and code quality, and which were developed with better practices

At least for me, when I start onboarding a new codebase, I begin with a short exploration of the production environment to understand the moving parts.

Then I go to read the main code flows (the ones relevant for my change).

Then I skim through the tests to get a general sense of their quality, and while skimming them I run them, hoping for the test suite to turn out green ✅.

Luckily, they passed! But what exactly are they testing? Are they worth anything? Will my changes be caught by the existing test suite?

“all my tests passed”

How awesome would it be to know that the code you’re about to change is covered by thorough tests! That any change you make will break a test, and you’ll have an assertion that shows you the impact of your change? How safe would you feel? How much simpler would time estimations for this project be?

What if I told you you can?

You can get an assessment of how well the code you’re about to touch is covered by tests. Wouldn’t that be awesome?

Mutation Testing In A Nutshell

In a nutshell, mutation testing is the act of applying small changes to a program and asserting that the program’s test suite catches those changes — i.e., for each change, at least one test fails.

Some examples of those “small changes”, called “mutants”:

Changing branch conditions — if the code has if x < 10 the mutant might be if x > 10 or if True

Changing literals — changing the values of literals such as strings / integers / Booleans. So if your code is x = 2; return x * 10; the mutant might be x = 10; return x * 10;
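
For illustration, here’s a minimal, hypothetical sketch (not from any real codebase; ScalaTest is assumed) of an original function, the kind of branch-condition mutant a tool might generate from it, and a test that “kills” that mutant:

```scala
object Pricing {
  // original: bulk orders of 10 items or more get a cheaper unit price
  def unitPrice(quantity: Int): Int =
    if (quantity < 10) 5 else 4

  // a branch-condition mutant a tool might generate: `<` flipped to `>`
  def unitPriceMutant(quantity: Int): Int =
    if (quantity > 10) 5 else 4
}

import org.scalatest.funsuite.AnyFunSuite

class PricingTest extends AnyFunSuite {
  // passes against the original (returns 4) and fails against the mutant
  // (which returns 5) — so this test "kills" the mutant
  test("large orders get the bulk unit price") {
    assert(Pricing.unitPrice(20) == 4)
  }
}
```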

An actual example, in Scala, from the stryker-mutator project:

An example of an actual “mutant”

The teal circle marks an actual mutation that was applied to the code — changing s.trim.nonEmpty on line 13 to s.trim.isEmpty.
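
Since the screenshot doesn’t carry over here, a rough sketch of what that mutation amounts to (the surrounding code is hypothetical, not the actual snippet from the report):

```scala
object Validation {
  // original: a string is valid only if it contains non-whitespace characters
  def isValid(s: String): Boolean =
    s.trim.nonEmpty

  // the mutant: nonEmpty flipped to isEmpty
  def isValidMutant(s: String): Boolean =
    s.trim.isEmpty
}

// any test asserting that a non-blank string is valid, e.g.
// assert(Validation.isValid("boris")), passes on the original and fails on
// the mutant, thereby "killing" it
```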

In my opinion, by running mutation testing on a codebase you gain two things:

  • Understanding which parts of the code are lacking test coverage or test quality
  • A “feeling” of how safe it is to introduce changes to the program

Using Mutation Testing For Blindspot Analysis

The intro for this post happened to me — I was assigned to introduce a change to a system I wasn’t very familiar with.

It had what seemed to be a fair test suite of a few hundred unit tests, but their quality wasn’t known to me, and from an initial skim of their code I was afraid they weren’t good enough — they seemed to do a lot of things, with quite long tests — which I know from experience to be a code smell, and a bug red flag 🚩.

I was looking for a way to assess whether the changes I was about to introduce would have been covered by the test suite. Sounds exactly like what mutation testing promises, no?

By sheer luck, I stumbled upon the stryker-mutator project just when I was about to start my work. After ~10 minutes of configuration I had an HTML report of all the mutations introduced to my codebase that the existing test suite failed to catch.
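
For a Scala/sbt project, the wiring is roughly the following — a sketch from memory, so the plugin coordinates, version, and task name should be double-checked against the Stryker4s documentation:

```scala
// project/plugins.sbt — add the Stryker4s sbt plugin
addSbtPlugin("io.stryker-mutator" % "sbt-stryker4s" % "0.16.1")

// then run mutation testing from the shell:
//   sbt stryker
// Stryker4s mutates the sources, runs the test suite against each mutant,
// and writes an HTML report of killed vs. surviving mutants
```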

Or in simpler words — it showed me which parts of the codebase aren’t covered well enough by tests. It shone a very bright light on where it’ll be difficult, risky, and time-consuming to apply changes.

Caveats

It’s not all perfect. When I executed the tests, many of the mutants that weren’t caught by the test suite touched non-functional requirements that usually aren’t tested — for example:

  • Mutants that changed log messages — we don’t test for those.
  • Mutants that changed monitoring metric values, such as incrementing a request counter by 2 instead of 1 — we don’t have tests for those either (both cases are sketched below).
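
A hypothetical sketch of what such surviving-but-harmless mutants look like (Logger and Counter here are stand-ins, not real APIs):

```scala
trait Logger  { def info(msg: String): Unit }
trait Counter { def increment(by: Int): Unit }

object RequestHandler {
  def handleRequest(logger: Logger, counter: Counter): Unit = {
    logger.info("handling request") // mutant: message replaced with ""
    counter.increment(1)            // mutant: 1 replaced with 2 (or 0)
    // ... actual request handling ...
  }
}

// Both mutants survive — no test asserts on log text or metric deltas —
// but neither points at a real gap in functional test coverage.
```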

So while the initial report had quite a low signal-to-noise ratio, by configuring the tool not to mutate string literals, for example, I was able to get better results that were actually actionable.
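
In Stryker4s this kind of tuning lives in a configuration file at the project root; it looks roughly like this (key names are from memory — verify them against the docs):

```hocon
# stryker4s.conf — skip the noisy mutation types and keep the HTML report
stryker4s {
  mutate = ["**/src/main/scala/**/*.scala"]
  excluded-mutations = ["StringLiteral"]
  reporters = ["html"]
}
```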

In addition — the mutants obviously don’t cover 100% of possible changes, so a “green” execution doesn’t guarantee that a change is safe, while a “red” execution definitely points at a problematic place, where code changes should be introduced carefully.

Isn’t Test Coverage Enough?

You are right to think that coverage can give you similar insights, BUT — coverage only checks whether executing a test “steps” on a code path; it doesn’t check whether changing that path will cause a test to break — thus it might create a false sense of safety. From my experience, indeed, a high coverage rate on its own isn’t an indicator of anything.
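
A minimal, hypothetical illustration of the difference (ScalaTest assumed): the test below executes every line of the function, so coverage reports it as fully covered, yet it asserts nothing — meaning every mutant of that function survives.

```scala
import org.scalatest.funsuite.AnyFunSuite

object Pricing {
  def unitPrice(quantity: Int): Int =
    if (quantity < 10) 5 else 4
}

class CoverageOnlyTest extends AnyFunSuite {
  test("steps on both branches but checks nothing") {
    Pricing.unitPrice(5)  // covers the `quantity < 10` branch
    Pricing.unitPrice(20) // covers the other branch
    // no assertion: line/branch coverage is 100%, yet flipping `<` to `>`
    // or changing the returned literals would break no test — exactly the
    // kind of blind spot mutation testing surfaces
  }
}
```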

Final Thoughts

Getting comfortable in an unfamiliar codebase is always a challenge, let alone in a legacy codebase.

Running a mutation testing suite when onboarding the project can really ease the process!

It can shine a light on the problematic parts of the codebase.
It can serve as a warning sign to where you should be more careful introducing code changes, and where additional tests will be required.

As always, thoughts and comments are welcome on Twitter at @cherkaskyb


Boris Cherkasky

Software engineer, clean coder, scuba diver, and a big fan of a good laugh. @cherkaskyb on Twitter