by Leon Rosenshein

Kinds Of Tests

There are lots of kinds of tests, and which one you need depends on your goals. Using the right test for the situation ensures you're actually testing what you think you are.

Unit tests are good for testing things in isolation. They let you control the inputs, both direct and dependent, and make sure the result is what you expect. The tighter your control, the more reproducible the test and the fewer false positives you get. And it's important to test not just that good inputs produce correct answers. You also need to ensure that all the possible combinations of incorrect inputs produce the correct response, whatever that means in your case. Good, hermetic unit tests are crucial to minimizing regressions.
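As a sketch of what that looks like (the function and its rate-lookup dependency here are made up for illustration), a hermetic unit test injects the dependency so both the direct and the dependent inputs, good and bad, are under the test's control:

```python
# Hypothetical example: a price calculator that depends on an external
# rate lookup. Injecting the lookup keeps the test hermetic.

def total_price(amount, rate_lookup):
    """Return amount converted using the injected rate lookup."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    rate = rate_lookup()
    if rate <= 0:
        raise ValueError("rate must be positive")
    return amount * rate

# Good input, controlled dependent input: the result is exact and reproducible.
assert total_price(10, lambda: 1.5) == 15.0

# Bad direct input: the correct response is an error, not a wrong answer.
try:
    total_price(-1, lambda: 1.5)
    assert False, "expected ValueError"
except ValueError:
    pass

# Bad dependent input: also an error, and the test controls when it happens.
try:
    total_price(10, lambda: 0)
    assert False, "expected ValueError"
except ValueError:
    pass
```

Because the dependent input is injected, there's no network, clock, or shared state involved, so the test gives the same answer every run.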

Integration tests, on the other hand, need a whole different kind of isolation. You want integration tests to have as few controlled inputs as possible, but you want to be sure their results are kept out of your production systems, because there may be something wrong that you haven't found yet. You run them carefully, and you know exactly what to expect as a result, so you can look for small changes. Good integration tests are crucial to finding problems with interactions and emergent behaviors. And if (when) you find some, it's a good idea to figure out how to add them to the unit tests as controlled dependent inputs.

Scale tests are kind of like integration tests, but you're looking for different things. Instead of a small number of hand-crafted input sets whose specific results you care about, scale tests use lots of inputs and look at the impact of demand and concurrency. The actual results aren't checked. Instead, error rates and internal metrics, such as response time, queue sizes, and memory used, are tracked and anomalies are flagged. Scale tests vary not just the number of requests, but also the request rate and the time spent at that rate, to see how a system responds to spikes and to long periods of high demand. Good scale tests need lots of input, but they give you confidence that your production systems will keep running if the changes get deployed.
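The shape of that checking can be sketched like this (the request simulator and the thresholds are invented for illustration): fire lots of requests, record latencies and errors, and flag anomalies in the aggregate metrics rather than checking individual results:

```python
import random

# Hypothetical scale-test scaffolding: many requests, aggregate metrics.

def fake_request(rng):
    """Stand-in for a real request; returns (latency_ms, ok)."""
    latency = rng.gauss(50, 10)
    ok = rng.random() > 0.001  # simulate a ~0.1% error rate
    return max(latency, 1.0), ok

def run_load(n, seed=42):
    rng = random.Random(seed)
    latencies, errors = [], 0
    for _ in range(n):
        latency, ok = fake_request(rng)
        latencies.append(latency)
        errors += 0 if ok else 1
    latencies.sort()
    p95 = latencies[int(0.95 * len(latencies)) - 1]
    return {"p95_ms": p95, "error_rate": errors / n}

# Made-up thresholds: the test passes or fails on the metrics,
# not on any individual response.
metrics = run_load(10_000)
anomalies = []
if metrics["p95_ms"] > 100:
    anomalies.append("p95 latency too high")
if metrics["error_rate"] > 0.01:
    anomalies.append("error rate too high")
assert anomalies == []
```

A real scale test would also sweep the request rate and hold it for extended periods, watching the same metrics for spikes and slow degradation.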

Then there are tests you run in production. Some call those experiments or A/B tests, but they're tests just the same; you're testing how something not under your direct control responds to changes. Things can get really dicey here. First, you need a good way to segment the population so only a subset gets the new experience. You need to be able to define the group tightly and repeatably; if subjects go in and out of the group, the results probably aren't valid. You need to ensure that not too many subjects are in the test. And you need to make sure that the test doesn't have an unwanted impact on the control group. You need these tests, though, because good experiments let you test things safely in the real world with real users.
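One common way to get that tight, repeatable segmentation (sketched here with a hypothetical assignment function) is to hash the subject id together with the experiment name, so the same subject always lands in the same group and assignments are independent across experiments:

```python
import hashlib

# Hypothetical assignment scheme: deterministic hash bucketing.
# A subject never moves in or out of the group between requests.

def in_experiment(subject_id, experiment, percent):
    """Deterministically place `percent`% of subjects in the experiment."""
    digest = hashlib.sha256(f"{experiment}:{subject_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# The same subject always gets the same answer: repeatable segmentation.
assert in_experiment("user-123", "new-checkout", 10) == \
       in_experiment("user-123", "new-checkout", 10)

# Roughly `percent` of subjects land in the group, capping exposure.
enrolled = sum(in_experiment(f"user-{i}", "new-checkout", 10)
               for i in range(10_000))
assert 800 < enrolled < 1200
```

Including the experiment name in the hash matters: without it, the same 10% of users would be in every experiment, and experiments would confound each other.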

And of course you need frameworks to handle all of these different kinds of tests. Sure, everyone could write their own, but that's a huge duplication of effort. Worse, it increases cognitive load, because now you have to worry not only about the tests you're running, but also about how they're run. And the last thing I want people running tests to worry about is whether the test harness is really running the tests that have been written and returning the correct results.