by Leon Rosenshein


Big data is all about statistics, and a lot of what we do comes down to statistics. Precision, Recall, MPBE, Code Coverage, SLAs, expected battery life, Texas Hold'em, and more. We look at a large enough sample, do some math (or let a computer do some math that we may or may not be able to follow), pick a threshold, and declare that meeting it is good enough. No one has found a better way.

Of course, none of those statistics guarantee how an individual instance will go. We can know that 99.9999999% of calls to a web service will take less than 150ms, yet it's still possible for 3 calls in a row to take 200ms.

So what's a poor developer/engineer/data scientist/PM to do? It's the old Boy Scout motto, "Be Prepared". It's defense in depth. It's adding watchdogs and timeouts. It's having fallback positions. And it's being honest with ourselves that statistics and probabilities are not certainties, but that's the way to bet.

Here's a view of how it can impact results and decisions in the medical field.
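One way to "Be Prepared" for the rare slow call is to wrap it in a watchdog timeout with a fallback. Here's a minimal sketch in Python; the service call, the 150ms threshold, and the cached fallback value are all hypothetical, just to illustrate the pattern:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def call_service():
    """Hypothetical remote call with variable latency (simulated here)."""
    time.sleep(random.uniform(0.01, 0.25))
    return "fresh result"

def call_with_timeout(timeout_s=0.150, fallback="cached result"):
    """Run the call under a watchdog; fall back if it blows the budget.

    Even if each call independently exceeds the budget with probability
    1e-9, three slow calls in a row have probability (1e-9)**3, which is
    vanishingly small but not zero -- hence the fallback.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(call_service)
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            return fallback

print(call_with_timeout())
```

The point isn't this particular mechanism; it's that the statistical guarantee covers the population of calls, so each individual call still needs a plan B.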