by Leon Rosenshein


I cannot emphasize too much that architecture is as much about programmer discipline as any technical consideration. Without programmer discipline, all systems, no matter how well designed, degrade quickly into gray goo at the hands of people who don’t understand the "why." -- Allen Holub

Back in the stone age (actually the late 80s/early 90s) I was working for a third-tier aerospace company in southern California. One of the things we built was a tool we called ARENA, basically a squadron-level air combat simulation system. We handled flight modeling of aircraft and missiles, air combat AI, and a really fancy (for the time) display and replay system. And, I think, a pretty good domain-driven architecture.

It helped that the domain was pretty simple (vehicles moving in 3D space) and that the entities didn’t really communicate with each other. At least not actively. Each object did its own physical modeling and put its location in a distributed shared memory system. After that each object was on its own to detect and respond to the other things in the world. And for each object we broke things down like the real world: propulsion, aerodynamics, sensors, and either an AI or a set of input devices. Interfaces between systems were defined up front and we all knew what they were and respected them. We had external customers and internal users doing research into different flight modes and doing requirements tradeoffs.

Then we got a new customer. Someone wanted to use our system as a test bench for their mission computer (MC). Seemed reasonable at first. We already had a simulated world for the MC to live in, and we had well-defined models of an individual aircraft, so how hard could it be to add a little more hardware to the loop? Turns out that it’s approximately impossible. At least with the architecture we had. Because our idea of the interfaces inside an aircraft was purely logical, while the MC expected distinct physical components talking to it over a single bus. So we wrote some adapters. That worked for some things, like the engine, because there was one input (throttle) and 3 outputs (fuel burn, thrust, and ingest drag). But it didn’t work for some of the more complex systems, like the radar. It had lots of inputs, including vehicle state, pilot commands, world state, and mission computer commands. And to get all of them we needed our adapter to reach into multiple systems. The timeline was tight so, instead of refactoring, we did the expedient thing and reached around our interfaces and directly coupled things. And it almost worked.
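To make the engine case concrete, here is a minimal sketch of what an adapter like that might look like. It is not the ARENA code. Every name in it (`Engine`, `SimpleEngine`, `EngineBusAdapter`, the dictionary message format) and every coefficient is a made-up assumption for illustration. The point is only that the adapter talks to the engine through its one logical interface and nothing else.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class EngineOutputs:
    """The three outputs of the logical engine interface."""
    fuel_burn: float
    thrust: float
    ingest_drag: float


class Engine(Protocol):
    """Hypothetical logical interface: one input, three outputs."""
    def step(self, throttle: float) -> EngineOutputs: ...


class SimpleEngine:
    """Toy engine model. The coefficients are invented, not real data."""
    def step(self, throttle: float) -> EngineOutputs:
        throttle = min(max(throttle, 0.0), 1.0)  # clamp to [0, 1]
        return EngineOutputs(
            fuel_burn=2.0 * throttle,
            thrust=100.0 * throttle,
            ingest_drag=5.0 * throttle,
        )


class EngineBusAdapter:
    """Translates hypothetical bus messages to and from the Engine interface.

    It only calls Engine.step(). It never reaches into other subsystems.
    """
    def __init__(self, engine: Engine) -> None:
        self._engine = engine

    def handle(self, message: Dict[str, float]) -> Dict[str, float]:
        out = self._engine.step(message["throttle"])
        return {
            "fuel_burn": out.fuel_burn,
            "thrust": out.thrust,
            "ingest_drag": out.ingest_drag,
        }


adapter = EngineBusAdapter(SimpleEngine())
reply = adapter.handle({"throttle": 0.5})
```

An adapter this thin works because the engine's inputs all arrive through one door. A radar adapter written the same way would need vehicle state, pilot commands, world state, and MC commands handed to it through interfaces that didn't exist yet, which is exactly where the temptation to reach around them came from.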

Everything was very close to correct, and often was, but the timing just didn’t work out. Things would drift. Or miss a simulation frame and stop, or worse, go backwards. So we added some double and triple buffers. That got rid of the backwards motion, and the pauses were usually better, but sometimes worse. So we added some extrapolation to keep things moving. Then we added another adjustment. And another. What was supposed to be a 4-week installation turned into a 3-month death march and resulted in a system that worked well enough to pass acceptance tests, but really wasn’t very good at what it was supposed to do. It went from a clean distributed system to a distributed ball of mud.
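For a sense of what "extrapolation to keep things moving" means, here is a minimal dead-reckoning sketch. This is an assumed, simplified stand-in, not what we actually shipped: the function name, the tuple-based state, and the constant-velocity model are all hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def extrapolate(position: Vec3, velocity: Vec3,
                sample_time: float, now: float) -> Vec3:
    """Constant-velocity guess at where an object is 'now'.

    position and velocity are the last published sample, taken at
    sample_time. A negative time step is clamped to zero so a late or
    out-of-order sample can never move the object backwards.
    """
    dt = max(0.0, now - sample_time)
    x, y, z = (p + v * dt for p, v in zip(position, velocity))
    return (x, y, z)


# Last sample is half a second old: fill the gap with a guess.
guess = extrapolate((0.0, 0.0, 1000.0), (100.0, 0.0, -10.0), 1.0, 1.5)
```

Note what this does and doesn't do. It papers over a missed frame with a plausible guess. It does nothing about why the frame was missed, and the guess is wrong the moment the object turns or accelerates. That is why each fix like this led to another one.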

And that happened with the same team that built the initial simulation doing the mods. We knew why we had built ARENA the way we did. The reasons we put the boundaries and interfaces where we did. And why we shouldn’t have violated those boundaries. But we did. Not all of them, but enough of them. And we paid the price. And the system suffered because of it. Because we didn’t have the discipline to do the right thing.

Now imagine what would have happened if we didn’t know why things were the way they were. And there wasn’t any documentation of why. Which of course there wasn’t because hey, we all knew what was going on, so why write it down? Any semblance of structure would have fallen apart that much faster. And we probably would have not just had problems with the interface, but likely broken other things as well. It’s Chesterton’s Fence all over again. That’s where Architectural Decision Records (ADRs) come in. Writing down why you made the decisions you did. The code documents what the decision was, and the ADR documents why.

So two takeaways. First, next time you get into a situation where you have a choice between the “right” thing and the “expedient” thing, remember the long-term cost of that decision. Remember that once you start down that slippery slope of expediency, it just gets easier and easier to make those decisions. Because once the boundaries and abstractions are broken, why bother to keep them almost correct? I’ll tell you why. Because that way lies madness. Otherwise known as the big ball of mud.

Second, write down the why. The next person to work on that area won’t know all the background and the experiments that led to the current architecture/design. Having the why documented lets them avoid making those same mistakes all over again. Even (especially?) if that person is future you.