Exsqueeze Me?
With all due respect to Mike Myers as Wayne Campbell, I saw something on the internet and the only possible response was Exsqueeze Me? The quote started out OK, not great, but OK. Then, right there at the end, it took a sharp left into crazy town.
> Good teams can and will delete tests that have high false positive rates – or that never fail.
Here’s the thing. If you’ve got a test with a high false positive rate, that’s bad. Flaky tests are very bad. Bad in many dimensions. To list just a few of them:
- They waste your time: Every time a test fails, you’re expected to go analyze what failed and why, then figure out how to prevent that specific issue. But if you go analyze the failure and it turns out there really wasn’t a problem, you’ve wasted however long it took you to figure out there wasn’t a problem.
- Failing tests become business as usual: It’s the broken window effect. When no tests are failing, a sudden failing test is noteworthy. When you have a test that randomly fails, an additional failing test isn’t nearly as noticeable. If it wasn’t important enough to fix the only failing test, then each additional failing test is that much less important.
- Alert (or Alarm) Fatigue is real: When the siren sounds, it’s supposed to be unusual and noteworthy. If your alarm is going off all the time, it’s not an alarm, it’s just life. Just like the boy who cried wolf, if the alarm keeps going off and there’s nothing you can, should, or must do, you start to ignore it.
- Flaky tests indicate a lack of understanding: It could be a lack of understanding of the domain, the environment, the test setup, or any combination of the three. If you don’t understand the system and the situation in this specific case, what else aren’t you understanding? What are you missing that’s going to cause you problems later on?
Those are just some of the reasons flaky tests are bad. Deleting them isn’t the worst thing you could do, and it will fix the first three problems above, but it does nothing to fix the fourth. In fact, it just hides the problem. Ignoring a problem rarely makes it go away.
So most of the quote is almost correct. Instead of just removing a flaky test, a much better response is to fix the test so that it’s not flaky. The flakiness could be a bug in the code, a bug in the test, a problem with your test methodology, or a lack of understanding. Whichever it is, once you make the test pass consistently, you’re in much better shape. You don’t waste time. You’re incentivized to keep things clean. Alerts mean something. You understand your situation that much better. Which means you get to sleep that much better at night.
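To make that concrete, here’s a minimal, hypothetical sketch in Python (the `greeting` function and test names are invented for illustration). The first test is flaky because it secretly depends on the wall clock; the fix isn’t to delete it, it’s to make the hidden dependency explicit:

```python
import datetime


def greeting(now=None):
    """Return a greeting that depends on the hour of day."""
    now = now or datetime.datetime.now()
    return "Good morning" if now.hour < 12 else "Good afternoon"


# Flaky: passes or fails depending on when the suite happens to run.
def test_greeting_flaky():
    assert greeting() == "Good morning"


# Fixed: the clock is an explicit input, so the test is deterministic.
def test_greeting_fixed():
    nine_am = datetime.datetime(2024, 1, 1, 9, 0)
    assert greeting(now=nine_am) == "Good morning"
```

Injecting the clock is just one way to do it; the point is that the fix removes the nondeterminism while keeping the protection the test provides.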
It’s the last part of the quote that’s just plain WRONG. Some will say that a test that never fails serves no purpose and is wasting resources. Time. Bandwidth. Cognitive load. For no measurable benefit. Such tests haven’t stopped a single bug from getting through.
Those facts are real. All tests take time and bandwidth, and add some amount of cognitive load for developers. But all of that, across all of your unit tests, should be minimal. If it’s not minimal, then you have other problems (bad tests) you should fix¹.
Just because a test hasn’t caught a bug yet doesn’t mean it won’t ever catch one. Even if no one is changing the code directly, those tests can still help keep you safe. They do things like:
- Protect against changes in dependencies: Dependencies outside of your control can change. Those changes can break your code. If you don’t test, you don’t know.
- Protect against environmental changes: There are lots of things in the environment that can change. Networks come and go. Clock speeds change. Processors get replaced and new ones show up. There can be subtle differences in the environment. If you don’t test, you don’t know.
- Protect against bugs in tooling changes: Similarly, tools change. Runtime environments, compilers, and interpreters can change. Are you relying on undefined behavior? That can change without you knowing it. If you don’t test, you don’t know.
- Provide examples of how to use the code being tested: Tests are great examples. They can be documentation. They can be used as a learning environment. They can be a reference design.
- Acknowledge Hyrum’s Law: Given enough time and users, everything your code does, intended or not, is going to be relied upon by someone. You never want to change behavior on your users without knowing about it. That is not how you want to surprise them. (There’s a sketch of such a test just after this list.)
- Prevent bugs in the code under test: Finally, and certainly not the least important, you never know when a test is going to show you that you’ve broken something. Past performance is a good indicator of future performance, but it is not a guarantee. If you don’t test, you don’t know.
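Here’s a minimal, hypothetical sketch of what that looks like in practice (the `slugify` function and its expected outputs are invented for illustration): a pinning test that may never fail, right up until a refactor, a dependency upgrade, or a runtime change alters behavior that someone out there relies on.

```python
import re


def slugify(title):
    """Convert a title to a URL slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def test_slug_format_is_stable():
    # Callers may have stored these slugs in URLs or databases.
    # If a refactor, a library upgrade, or a runtime change alters
    # the output, this "never fails" test is the first thing that
    # will tell us.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  Exsqueeze Me?  ") == "exsqueeze-me"
```

The test has never failed, and with luck it never will. But the day it does fail is exactly the day you want to know about it.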
And that’s why the only possible response to someone saying you should delete tests that never fail is Exsqueeze Me?
---
1. Those tests might not be flaky, but they can be bad for many of the same reasons. And you should fix them, not delete them. But that’s a slightly different post. ↩︎