by Leon Rosenshein

Pictures Or It Didn't Happen

One of the oddities of software is that the vast majority of the work done can't be seen. It's just a series of 0s and 1s stored somewhere. You can spend man-years paying down technical debt or setting yourself up for the future and have nothing visible to show for it.

When I was working on the Global Ortho project we had something we called the MBT spreadsheet: the Master Block Tracking list. It started life as a PM's way of tracking the status of each block (a 1-degree cell) through the system, from planning to acquisition, ingest, initial processing, color balancing, 3D model generation, then delivery and going live to the public. There were only 5 to 10 blocks in progress at any given time, and it was all manual.

Then we scaled up. There were about 1,500 blocks in total to process and we usually had more than 100 in the pipe at any time. It got to be too much for one person to handle. So our production staff did what they usually do. A bunch of people did a bunch of manual work outside the system and just stuffed the results in. And no one knew how it worked or trusted it. But the powers that be liked the idea, and they really liked the overview that let them know the overall state and let them drill down for details when they wanted.

So that gave us our boundaries. A bunch of databases and tables with truth, and a display format we needed to match. We just needed to connect the two. The first thing we did was talk to the production team to figure out what they did. We got lots of words, but even the people who wrote down what they did weren't sure if that was right. So we went to the trusty whiteboard. We followed the data and diagrammed everything. We figured out how things were actually flowing. That was one set of architectural diagrams. But they were only for reference.

Then we took the requirements and diagrammed how we wanted the data to flow, including error cases and loops. And it was really high level. More of a logic diagram. All of the "truth" was in a magic database. All of the rules were in another magic box. The state machine was a box. Using those few boxes and a few others (input, display, processing, etc.), we figured out how to handle all of the known use cases and some others we came up with along the way.
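
Here is a minimal sketch of what the "state machine" box and the "rules" box could look like, written in Python purely for illustration. The stage names come from the pipeline described earlier. The rework loops, the function names, and the block IDs in the usage note are assumptions made up for this example, not the rules the real system used.

```python
from collections import Counter
from enum import Enum


class Stage(Enum):
    """Pipeline stages a block moves through, in order."""
    PLANNING = "planning"
    ACQUISITION = "acquisition"
    INGEST = "ingest"
    INITIAL_PROCESSING = "initial processing"
    COLOR_BALANCING = "color balancing"
    MODEL_GENERATION = "3D model generation"
    DELIVERY = "delivery"
    LIVE = "live"


# Enum members iterate in definition order, so this is the happy path.
PIPELINE = list(Stage)

# The "rules" box. ASSUMPTION: these rework loops are invented for the
# example; they only show where error cases and loops would live.
REWORK = {
    Stage.COLOR_BALANCING: Stage.INITIAL_PROCESSING,
    Stage.MODEL_GENERATION: Stage.COLOR_BALANCING,
}


def next_stage(stage, ok=True):
    """Return the stage a block moves to after finishing `stage`.

    ok=True follows the happy path; ok=False follows a rework loop.
    """
    if not ok:
        if stage not in REWORK:
            raise ValueError(f"no rework rule defined for {stage.value}")
        return REWORK[stage]
    position = PIPELINE.index(stage)
    if position == len(PIPELINE) - 1:
        raise ValueError(f"{stage.value} is the final stage")
    return PIPELINE[position + 1]


def summarize(blocks):
    """Count blocks per stage: the overview, before any drill-down."""
    counts = Counter(blocks.values())
    return {stage: counts.get(stage, 0) for stage in PIPELINE}
```

Calling summarize({"block-a": Stage.INGEST, "block-b": Stage.INGEST, "block-c": Stage.LIVE}) gives a count for every stage, with 2 under ingest and 1 under live. Keeping the rules in a plain table, separate from the code that walks them, mirrors the diagram: the rules box can change without touching the state machine box.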

The next step was to map those magic ideal boxes onto things we had or could build. Database tables and accessors, processing pipelines, state machines, manual tools, etc. That led to some changes in the higher level pictures because there was coupling we didn't have time to separate, so we adjusted along the way. But we still couldn't do anything because this was a distributed system and we needed to figure out how the parts scaled and worked together.

And that led to a whole new round of diagrams that described how things scaled (vertically or horizontally) and how we would manage manual processes inside a mostly automated system. How we would handle errors, inconsistencies, and monitoring. How we would be able to be sure we could trust the results.

Once we had all those diagrams (and some even more detailed ones), we did it. We ended up with the same spreadsheet, but with another 10 tabs, connected to a bunch of tables across the different databases, with lots of VBA code downloading data then massaging it and holding it all together. Then we went back to the powers that be and showed them the same thing they were used to looking at and told them they could trust it.

Understandably, they were a little surprised. We had put about 3 man-months into the project and everything looked the same. They wanted to know what we had been doing all that time and why this was any different. Luckily, we expected that response and were prepared. We had all those architectural diagrams explaining what we did and why. And by starting at the high level and adding detail as needed for the audience, we were able to convince them to trust the new MBT.