Code Coverage Is NOT Useless
Mini rant today. There are lots of teams across the software industry that are called some variation of “Software Quality”. That's a lovely term, and it means different things to different people. There are (at least) two kinds of quality at play here: internal software quality (ISQ) and external software quality (ESQ). ESQ is about correctness and suitability for the task at hand. ISQ is about the code itself (how maintainable and understandable it is), not whether or not it works as specified. Not all quality teams are responsible for both kinds of quality.
Furthermore, as much as people want it to mean that the team called “Software Quality” is responsible for ensuring that the entire org is building software with both internal and external quality, that isn't the case. Those teams are not, and cannot be, responsible for what others do. After all, they're not the ones writing the code. What it does mean, and what those teams can and generally do take on, is defining and promoting good practices and, especially, pointing out places in the codebase where the code misses the mark.
There are two very important points in that last sentence. The first is that the quality team's job is to identify where the code misses the mark, NOT where the developers do. Code ownership is important, and people write the code, but it's important to distinguish between problems with code and process, and problems with people. That, however, is a topic for another time.
The other point, and where I'm going with today's post, is the pointing out part. The quality team's job is to point out, with comparable, if not truly objective, values, how much ISQ the code has. There are lots of ways to do that: things like cyclomatic complexity, lint/static analysis warnings, code sanitizer checks, or code coverage percentages. Those measures are very objective. There are X lint errors. Your tests execute Y percent of your codebase and cover Z percent of the branch decisions. And you can track those numbers over time. Are they getting closer to your goal or further away? You can argue the value of all of those metrics, but they're (relatively) easy to calculate, so they're easy to report and track.
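As a rough illustration of the “easy to calculate, easy to track” point, here's a minimal sketch that pulls the overall rates out of a Cobertura-style coverage.xml (the format that coverage.py's coverage xml command and plenty of CI tools emit) and appends them to a history file. The file names and the 80% goal are placeholders, not anything from a real pipeline.

```python
# Minimal sketch: read overall line and branch coverage from a
# Cobertura-style coverage.xml and append them to a CSV so the
# trend can be tracked over time. Paths and the goal are placeholders.
import csv
import datetime
import xml.etree.ElementTree as ET

COVERAGE_XML = "coverage.xml"        # hypothetical report path
HISTORY_CSV = "coverage_history.csv" # hypothetical history file
GOAL = 0.80                          # arbitrary example target

root = ET.parse(COVERAGE_XML).getroot()
line_rate = float(root.get("line-rate", 0.0))
branch_rate = float(root.get("branch-rate", 0.0))

# Append today's numbers so the trend over time is easy to chart.
with open(HISTORY_CSV, "a", newline="") as f:
    csv.writer(f).writerow(
        [datetime.date.today().isoformat(), line_rate, branch_rate]
    )

trend = "at or above" if line_rate >= GOAL else "below"
print(f"Line coverage {line_rate:.1%}, branch coverage {branch_rate:.1%} "
      f"({trend} the {GOAL:.0%} goal)")
```

That's the whole pitch for these metrics: a handful of lines turns the report into a number you can watch move toward (or away from) your goal.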
Which, finally, gets us to today's rant. I ran across this article that says code coverage is a useless metric. I have a real problem with that. I'm more than happy to discuss the value of code coverage metrics with anyone. I know that you can have 100% code coverage and still have bugs. It's easy to hit a fairly high coverage percentage without that number saying anything about correctness. In complex systems with significant amounts of emergent behavior it's even harder to get correctness from low-level unit tests. Just look at that article.
What bothers me most about that article is the click-baity title and the initial premise. It starts from “Because it's possible for a bad (or at least uncaring) actor to get great coverage and not find bugs, coverage metrics are useless.” If you have that approach to management, you're going to get what you measure. To me, code coverage is a signal. A signal you need to balance with all of the other signals. Letting one signal overpower all the others is hiding the truth. And like any useful signal, its absence is just as enlightening as its presence. If you have a test suite that you think fully exercises your API and there are large areas of code without coverage, why do you even have that code? If you really don't need it, remove it. Maybe your domain breakdown is wrong and the code belongs somewhere else? Should it be moved? If you find that there are swaths of code that are untestable because you can't craft inputs that exercise them, do you need a refactor? Is this an opportunity for dependency injection?
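To make that last question concrete, here's a sketch of the shape I mean. The names are made up, but the pattern is common: an error-handling branch that's effectively unreachable in tests when the code talks straight to the network becomes trivially coverable once the dependency is passed in.

```python
# Hypothetical example: the error-handling branch below is hard to cover
# when a function creates its own HTTP client, because a test can't force
# a failure on demand. Injecting the fetcher (any callable that takes a
# URL and returns a status code) lets a test exercise every branch.
from typing import Callable


def fetch_status(fetch: Callable[[str], int], url: str) -> str:
    """Classify a service's health using an injected fetcher."""
    try:
        code = fetch(url)
    except ConnectionError:
        return "unreachable"  # the branch a real network call rarely hits in CI
    return "healthy" if code == 200 else "degraded"


# In a test, the "network" is just a stub, so all three branches get covered.
assert fetch_status(lambda url: 200, "https://example.invalid") == "healthy"
assert fetch_status(lambda url: 503, "https://example.invalid") == "degraded"


def _always_fails(url: str) -> int:
    raise ConnectionError("simulated outage")


assert fetch_status(_always_fails, "https://example.invalid") == "unreachable"
```

The coverage report is what points you at that dead branch in the first place; the fix is a small design change, not a bigger test suite.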
So the next time someone tells you that code coverage is a useless metric, maybe the problem isn’t the metric, it’s how they’re using code coverage. That’s an opportunity for education, and that’s always a good thing.