Some bugs are bugs of commission. In my experience, that's most of them. You get an equation wrong. You compare against the wrong, but similarly named, field. Or it's something as simple as a flipped sign in a test. The nice thing about bugs like that is that they're usually easy to spot. Ideally your test suite catches them, but even if not, the response is generally wrong, so you notice, and then you fix it.
Other bugs are bugs of omission. The code you've written is 100% correct: it has the right logic in the right place, and the algorithms and calculations it performs are correct. The code usually, almost always, works. Those are the bugs that are hard to find.
They’re the ones that aren’t obvious. They only happen when conditions are just right (or maybe, just wrong). They happen because of assumptions and biases used when the code was being written. And they creep in at various points in the development process.
At the lowest level, it happens when you get caught by your assumptions about how a language works. Back in the old days of C/C++, memory management was critical: not just being sure to free everything you allocated, but avoiding buffer overruns and never reading uninitialized memory. Both of those are less likely (or nominally impossible) in more recently developed languages, but those languages have their own traps. Even with garbage collection to clean up after you, you can forget to allocate something or allocate it twice. In Go, the differences between a slice and an array are subtle. Unless you're careful when passing and returning them, you'll end up with copies of copies, and your changes will be visible in some places and not in others.
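As a sketch of that Go trap (the function names here are mine, invented for illustration): an array is copied on every call, a slice shares its backing array with the caller, and an append can quietly reallocate that backing array so later writes go somewhere the caller never sees:

```go
package main

import "fmt"

// modifyArray receives a copy of the array; the write stays local.
func modifyArray(a [3]int) {
	a[0] = 99
}

// modifySlice receives a slice header that shares the caller's
// backing array, so the write is visible to the caller.
func modifySlice(s []int) {
	s[0] = 99
}

// appendSlice may reallocate the backing array; if it does, the
// subsequent write lands in the new array, invisible to the caller.
func appendSlice(s []int) {
	s = append(s, 4) // capacity exceeded: moves to a new backing array
	s[0] = 99        // the caller never sees this
}

func main() {
	arr := [3]int{1, 2, 3}
	modifyArray(arr)
	fmt.Println(arr) // [1 2 3] — arrays are copied

	sl := []int{1, 2, 3}
	modifySlice(sl)
	fmt.Println(sl) // [99 2 3] — slices share storage

	sl2 := []int{1, 2, 3}
	appendSlice(sl2)
	fmt.Println(sl2) // [1 2 3] — append reallocated, the write went elsewhere
}
```

The third case is the "visible in some places and not in others" bug: nothing is wrong with any individual line, you just omitted thinking about when the backing array moves.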
The next level is dealing with more abstract things. Consider a search method. There's a lot you can omit when using a collection. What if the thing you're looking for isn't there? What if there's more than one of it? What if the collection is empty (or worse, not defined)? How does your code handle each of those? Of course, it's obvious you're doing a search when you call a method named Find(). But sometimes what you're doing is logically a search operation without you recognizing it. Create something. Store it. Update it. Do some other stuff, then update it again. What happens if things change while you're doing that other stuff? What if someone else deleted the thing you created? What do you do then?
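Here's a sketch of what handling those omissions might look like in Go — findUser and its string collection are hypothetical, but the unhappy cases are exactly the ones above: an empty or nil collection, no match, and more than one match:

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("not found")

// findUser returns the single entry matching id, and reports an error
// for each case that's easy to omit: an empty (or nil) collection,
// no match at all, or more than one match.
func findUser(users []string, id string) (string, error) {
	if len(users) == 0 { // len of a nil slice is 0, so this covers both
		return "", ErrNotFound
	}
	var found []string
	for _, u := range users {
		if u == id {
			found = append(found, u)
		}
	}
	switch len(found) {
	case 0:
		return "", ErrNotFound
	case 1:
		return found[0], nil
	default:
		return "", fmt.Errorf("expected one %q, found %d", id, len(found))
	}
}

func main() {
	_, err := findUser(nil, "alice")
	fmt.Println(err) // the nil-collection case, handled instead of panicking
}
```

The happy path is one line of the switch; everything else is the part that gets omitted.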
Or in a distributed system: what if the remote call you make works, but the success response is lost? Did you handle that case? Do you blindly try to create the thing again, or do you search for it first and only then create it? Or do you ignore the response entirely, since "it always worked before"? That's a big sin of omission.
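One common answer is to make the retry a search-then-create. This is a minimal in-memory sketch under stated assumptions — Store, Item, and createOrFetch are all invented for illustration, and a real remote store would bring its own errors and timeouts:

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("not found")

type Item struct {
	Key   string
	Value string
}

// Store stands in for a remote service (hypothetical interface).
type Store interface {
	FindByKey(key string) (Item, error)
	Create(item Item) (Item, error)
}

// memStore is a toy in-memory implementation for the sketch.
type memStore struct{ items map[string]Item }

func (m *memStore) FindByKey(key string) (Item, error) {
	it, ok := m.items[key]
	if !ok {
		return Item{}, ErrNotFound
	}
	return it, nil
}

func (m *memStore) Create(it Item) (Item, error) {
	if _, ok := m.items[it.Key]; ok {
		return Item{}, fmt.Errorf("duplicate key %q", it.Key)
	}
	m.items[it.Key] = it
	return it, nil
}

// createOrFetch makes retrying safe when the first attempt may have
// succeeded but its response was lost: look first, create only if absent.
func createOrFetch(s Store, it Item) (Item, error) {
	existing, err := s.FindByKey(it.Key)
	if err == nil {
		return existing, nil // an earlier attempt already succeeded
	}
	if !errors.Is(err, ErrNotFound) {
		return Item{}, err // a real failure, not just "missing"
	}
	return s.Create(it)
}

func main() {
	s := &memStore{items: map[string]Item{}}
	first, _ := createOrFetch(s, Item{Key: "a", Value: "1"})
	// Simulate a lost success response: the caller retries the same create.
	second, _ := createOrFetch(s, Item{Key: "a", Value: "1"})
	fmt.Println(first == second) // true: the retry found the item, no duplicate
}
```

A blind retry would have hit the duplicate-key error; ignoring the response would have hidden it.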
Then there's the meta level of omission. You've got an API that can do multiple things, and some of those things are permanent. Consider the lowly rm command. Back in the day it was possible to run rm -rf /. The command, being the good command that it was, assumed you knew what you were doing and began deleting everything. After all, computers are good at doing what you tell them, not what you want. In case you're wondering, even with good backups, it's really hard to recover from that. Since deleting the whole filesystem isn't something you would actually want to do (there are better ways to erase everything on a disk), there was no code to prevent you from running a valid, but almost certainly unintended, command. Now, on Linux at least, there's a --no-preserve-root option and a check to keep you from making that mistake. But it's indicative of a whole class of errors: the "Why would anyone do that?" error. If you don't check for it, then at some point, it will happen.
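The fix for that class of error is usually a small, explicit guard on the "valid but surely unintended" input. A sketch in Go — checkDeleteTarget is a hypothetical helper, not how rm itself is implemented:

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// checkDeleteTarget rejects a request that is valid but almost
// certainly unintended — the moral equivalent of rm's root check.
// The force flag plays the role of an explicit --no-preserve-root-style
// override.
func checkDeleteTarget(path string, force bool) error {
	clean := filepath.Clean(path) // so "//" or "/./" can't sneak past
	if clean == "/" && !force {
		return errors.New("refusing to delete /; pass force to override")
	}
	return nil
}

func main() {
	fmt.Println(checkDeleteTarget("/", false))      // refused with an error
	fmt.Println(checkDeleteTarget("/tmp/x", false)) // allowed (nil error)
}
```

The guard costs three lines. Its absence is invisible right up until someone does the thing nobody would ever do.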
The trick is to test for these bugs. But how? Part of it is a mindset: think of ways things could fail, and test for those, not just the happy path. Think about ways your system could be misused, or at least misunderstood, and make sure you've guarded against those things too.
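In Go, that mindset often shows up as a table of cases where most of the rows are failures. parsePort here is a throwaway stand-in I made up; the point is the shape of the table:

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort validates a TCP port string. It exists only so the table
// below has something to exercise.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("not a number: %q", s)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("out of range: %d", n)
	}
	return n, nil
}

func main() {
	cases := []struct {
		in      string
		wantErr bool
	}{
		{"8080", false}, // the happy path...
		{"", true},      // ...and the rows that usually get omitted:
		{"http", true},  // junk input
		{"0", true},     // boundary value
		{"70000", true}, // out of range
	}
	for _, c := range cases {
		_, err := parsePort(c.in)
		fmt.Printf("parsePort(%q): got error=%v, want error=%v\n",
			c.in, err != nil, c.wantErr)
	}
}
```

One happy row, four unhappy ones. If the ratio in your tests is the other way around, that's usually a sign of omission.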
This is also where another set of eyes helps: someone without your biases and assumptions, who will think about things in ways you don't. After all, if you had thought about what would happen when someone tried to delete the entire filesystem, you would have added the flag in the first place.