by Leon Rosenshein

To Test Or Not To Test

The NA repo has a requirement on code coverage. And not just a requirement, a high one. And that makes sense. The Autonomy code is mission-critical and man-rated, and people's lives depend on it not going wrong. But that requirement applies to more than just the Autonomy code. It applies to everything in the repo: frameworks, web services, and command line tools, among other things. And that makes sense too. Yes, unit tests add some up-front cost, but overall they speed things up by giving you instant feedback and confidence.

I made a change to one of our CLI tools a couple of months ago. Just a small change to let the user use a YubiKey for two-factor auth when getting a usso certificate. Simple. Add a flag, check the flag's value, act on it. So I implemented it, did some simple testing to make sure it did what I wanted, and got ready to put up the PR. Now, our PR template has a checkbox to indicate that tests have been run. The PR build gates on passing all the tests, so we don't block PRs that haven't checked the box, but running them first is still a good idea and doesn't take much time, so I ran them. And they failed. It turns out there was another entry point I hadn't considered. Luckily there were tests, and it was a quick fix to get them passing. That also reminded me to add some tests for my new flag. And those tests found some more edge cases.
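To make the "another entry point" lesson a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the tool name `cert-tool`, the `get` and `renew` subcommands, the `--yubikey` flag, and the `push`/`yubikey` method strings are illustrative assumptions, not the actual CLI. It shows one way to keep the flag logic in a single place and have a test walk every entry point, so a missed one fails fast instead of reaching users.

```python
import argparse
from dataclasses import dataclass

# Hypothetical list of entry points (subcommands). Both the parser and the
# test read from this tuple, so adding an entry point here adds it to the test.
ENTRY_POINTS = ("get", "renew")


@dataclass
class AuthRequest:
    method: str  # hypothetical values: "push" (phone) or "yubikey"


def choose_method(use_yubikey: bool) -> str:
    """Decide the second factor in one place so every entry point agrees."""
    return "yubikey" if use_yubikey else "push"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cert-tool")  # hypothetical name
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ENTRY_POINTS:
        p = sub.add_parser(name)
        # The new flag must be registered on every entry point.
        p.add_argument("--yubikey", action="store_true",
                       help="use a YubiKey instead of a phone push for 2FA")
    return parser


def run(argv: list[str]) -> AuthRequest:
    """Parse the command line and return the auth request it implies."""
    args = build_parser().parse_args(argv)
    return AuthRequest(method=choose_method(args.yubikey))


def test_every_entry_point_honors_flag() -> None:
    # Loop over all entry points, not just the one you were thinking about.
    for command in ENTRY_POINTS:
        assert run([command]).method == "push"
        assert run([command, "--yubikey"]).method == "yubikey"
```

With pytest, `test_every_entry_point_honors_flag` is picked up automatically. The design choice worth noting is the shared `ENTRY_POINTS` tuple: a test that enumerates entry points by hand would have the same blind spot the author hit.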

So yes, the process from thinking I was done to actually submitting the PR took about 2 hours longer than it would have without the tests. But that feature has been out for a while now, and people have used it. There haven't been any issues, and it has not only solved the YubiKey problem, it has also helped folks who have multiple phones associated with their account. If those tests hadn't been there, someone would have had to drop what they were doing to get a fix out, it would have taken more time to get it working, and our users would have had to deal with bugs in the meantime. So those couple of extra hours up front saved a couple of days' time across the org. And that's just for a simple CLI. Think about the tools and services that are core to our workflow. If they're down for an hour, that's thousands of man-hours lost.