by Leon Rosenshein

All Models Are Wrong, But Some Are Useful

As I’ve mentioned before, I’m a mechanical/aerospace engineer by training. My first job out of school was running simulations for trade-off analysis on different aircraft configurations. Some were studies of mission effectiveness in different combat environments, and others were aerodynamic simulations to see what effect changes to the flight control software would have. In all of those cases we used code to represent the various real-world systems and their interactions so we could predict what the results would be in the field. And we needed to run lots of trials.

This was 30 years ago, and computers were slower, so we made lots of simplifying assumptions to get our results faster. First, computers are discrete, and the real world is generally analog. We made the time slices small and used all sorts of integration methods and feedback loops to make things seem analog. Second, all that code was an approximation of reality. For the aerodynamics we took 3- and 4-dimensional tables of flight data (angle of attack, speed, altitude, and control surface positions) and interpolated between the data points. That meant we left out a lot, like how all of those things (and others) were changing. For avionics and other systems, we used statistical data: how far away radar could detect objects of a given size, and how effective missiles and countermeasures were. Again, all defined in nice, neat, discrete tables.
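To make the table approach concrete, here’s a minimal Python sketch of looking up a coefficient by interpolating between table entries. It’s an illustration, not the code we actually used: the table is two-dimensional instead of three or four, and the names and numbers (`ALPHA_DEG`, `MACH`, `CL_TABLE`, `lookup_cl`) are made up. The point is that anything happening between the data points is assumed to be a straight line.

```python
from bisect import bisect_right

# Hypothetical, made-up lift coefficient table (NOT real flight data).
# Rows: angle of attack in degrees. Columns: Mach number.
ALPHA_DEG = [0.0, 5.0, 10.0, 15.0]
MACH = [0.3, 0.6, 0.9]
CL_TABLE = [
    [0.10, 0.11, 0.13],  # alpha = 0
    [0.50, 0.53, 0.58],  # alpha = 5
    [0.90, 0.95, 1.02],  # alpha = 10
    [1.15, 1.20, 1.22],  # alpha = 15
]

def _bracket(grid, x):
    """Return (i, t) so that x lies between grid[i] and grid[i + 1],
    with t in [0, 1] as the fractional distance. Values outside the
    grid are clamped to the nearest edge (no extrapolation)."""
    x = min(max(x, grid[0]), grid[-1])
    i = min(bisect_right(grid, x) - 1, len(grid) - 2)
    t = (x - grid[i]) / (grid[i + 1] - grid[i])
    return i, t

def lookup_cl(alpha_deg, mach):
    """Bilinear interpolation between the four surrounding table entries."""
    i, ta = _bracket(ALPHA_DEG, alpha_deg)
    j, tm = _bracket(MACH, mach)
    c00, c01 = CL_TABLE[i][j], CL_TABLE[i][j + 1]
    c10, c11 = CL_TABLE[i + 1][j], CL_TABLE[i + 1][j + 1]
    low = c00 + (c01 - c00) * tm    # along Mach, at the lower alpha row
    high = c10 + (c11 - c10) * tm   # along Mach, at the upper alpha row
    return low + (high - low) * ta  # then along alpha

print(lookup_cl(7.5, 0.6))  # halfway between the 5 and 10 degree rows
```

The clamping in `_bracket` is itself a modeling decision: outside the table, the sketch simply pretends the edge value still holds.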

In other words, we built models of the systems. We knew they weren’t exactly correct, but we felt they were correct enough to draw conclusions from the results. Which leads us right to

All models are wrong, but some are useful

        – George Box

Box was correct when he hinted at it in 1976, and when he made that exact statement in 1978. The models are wrong, but they’re useful for getting results faster, cheaper, and more safely than you could by running thousands of real-world trials.

Box wasn’t the first one to talk about it, though. Back in the 1930s, Alfred Korzybski talked about how a map represents something and can be very useful, but isn’t the same thing as what it represents.

A map is not the territory it represents, but, if correct, it has a similar structure to the territory, which accounts for its usefulness.

        – Alfred Korzybski

Besides representing a single point in time, in the past, maps don’t have all the details. Depending on how closely the map’s intended use matches how you’re using it, those details could be crucial. If you’re out hiking in the bush, even the most detailed road map won’t tell you the elevation. Conversely, a topographic map is great for hiking in the mountains, but not very good if you need to know which freeway to take between two cities. And that doesn’t even consider that the map might just be wrong.

A lot of software development is based on models and mental maps of the various domains. From machine-learned models (Uber’s ETA or pricing models, ChatGPT and the various LLMs, etc.) to expert-based heuristics (financial fraud detection algorithms, alerting on operational metrics, etc.) to something as seemingly simple as the state machine for bug tracking, we use maps and models to help us understand the dependencies and interactions between systems. The better the model (dependencies), the more accurate the predicted results (interactions), and the better we can use them to drive system behavior in the direction we want. Get it wrong, and we end up with the cobra effect, where the fix makes things worse.
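Even the simple bug-tracking state machine is a model. Here’s a minimal Python sketch of one, assuming a hypothetical five-state workflow (`BugState`, `ALLOWED`, and `transition` are made-up names, not any particular tracker’s API). The table of allowed transitions is the map. Notice that this one says a closed bug can never come back.

```python
from enum import Enum

class BugState(Enum):
    NEW = "new"
    TRIAGED = "triaged"
    IN_PROGRESS = "in_progress"
    RESOLVED = "resolved"
    CLOSED = "closed"

# Hypothetical workflow. The allowed transitions ARE the model.
ALLOWED = {
    BugState.NEW: {BugState.TRIAGED, BugState.CLOSED},
    BugState.TRIAGED: {BugState.IN_PROGRESS, BugState.CLOSED},
    BugState.IN_PROGRESS: {BugState.RESOLVED, BugState.TRIAGED},
    # Reopen from RESOLVED if the fix didn't work.
    BugState.RESOLVED: {BugState.CLOSED, BugState.IN_PROGRESS},
    # The model's assumption: once closed, a bug never comes back.
    BugState.CLOSED: set(),
}

def transition(current, target):
    """Move a bug to a new state, rejecting anything the model doesn't allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target

state = transition(BugState.NEW, BugState.TRIAGED)
print(state)
```

If closed bugs do reappear in the real world (say, after a release), this map doesn’t match the territory, and people will route around it, for example by filing duplicates.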

Another way we use maps is data schemas. Schemas are maps of how the data is expected to fit together. We use the schema to store the data. We use the schema to drive how we process the data. We use the schema to define how we accept input and provide output. The closer the map (schema) is to the territory (the actual data and its structure), the more useful the map is. If the schema doesn’t match the structure, you find people working around the system instead of with it, and using and changing things becomes even harder than it would have been.
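Here’s a small Python sketch of a schema that doesn’t match the territory. The records, field names, and functions are all hypothetical. `ContactV1` assumes every contact has exactly one phone number, so the loader has to cram several numbers into one string and every consumer has to undo that. `ContactV2` matches the actual structure of the data, and the processing gets simpler.

```python
from dataclasses import dataclass, field

# The territory (hypothetical data): some contacts have one phone
# number, some have several, and some have none.
RAW = [
    {"name": "Ada", "phones": ["555-0100"]},
    {"name": "Grace", "phones": ["555-0101", "555-0102"]},
    {"name": "Linus", "phones": []},
]

@dataclass
class ContactV1:
    """Hypothetical schema (the map) that assumes exactly one phone number."""
    name: str
    phone: str

def load_v1(record):
    # The workaround: cram however many numbers there are into the one
    # field the schema allows. Every consumer now has to know to split on ", ".
    return ContactV1(record["name"], ", ".join(record["phones"]))

def has_number_v1(contact, number):
    # Processing has to undo the workaround before it can answer the question.
    return number in contact.phone.split(", ")

@dataclass
class ContactV2:
    """Hypothetical schema whose structure matches the data: zero or more numbers."""
    name: str
    phones: list[str] = field(default_factory=list)

def load_v2(record):
    return ContactV2(record["name"], list(record["phones"]))

def has_number_v2(contact, number):
    return number in contact.phones

print(load_v1(RAW[1]))
print(load_v2(RAW[1]))
```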

With all of that said, maps and models are useful and important. They reduce cognitive load. They ease communication. They let us get results without having to simulate every air molecule flowing over a wing in a continuous stream. You just need to remember that the model might be useful, but it’s wrong in one or more ways, and that the structure of the map is helpful, but you can’t really travel by map.