Statistics is all about the null hypothesis. You assume it's true and look for evidence that it's false. Consider a
fire alarm. If it's not ringing, you assume there's no fire. If it is ringing, the assumption is that there is a
fire. The state of the alarm is a simple binary: it's either ringing or it's not. And either there is a fire or
there's not. So you have the following truth table.
                  Alarm
        | Ringing | Not Ringing |
        |---------|-------------|
No Fire | Type I  | CORRECT     |
        |---------|-------------|
Fire    | CORRECT | Type II     |
        |---------|-------------|
Simple and clean: two correct states and two error cases. The Type I error is a false positive; the Type II error is a false negative.
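The truth table can be sketched as a tiny function (the names here are illustrative, not from any real alerting library):

```python
def classify(alarm_ringing: bool, fire: bool) -> str:
    """Map an (alarm state, reality) pair to its cell in the truth table."""
    if alarm_ringing and not fire:
        return "Type I (false positive)"   # alarm rang, no fire
    if not alarm_ringing and fire:
        return "Type II (false negative)"  # fire, but no alarm
    return "correct"

print(classify(alarm_ringing=True, fire=False))  # Type I (false positive)
```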
As developers we deal with this kind of problem all the time. One of the more common cases is alerting, or
error detection. If you have a perfect signal for an error case, you can always do the right thing. If your
service is not running, that's an error. Simple. But what if you're not getting the signal that your service is
running? What kind of error is that? Does it mean your service isn't running, or that the signal is blocked? What
do you do in that case?
Well, it depends. Mostly it depends on the various costs of being wrong and the benefits of being right. For
monitoring the datacenter, most of our alerts will fire if we don't get the signal. That risks a Type I error, and
we accept it for a few reasons. First, even if the DC is OK, the fact that we're not getting a signal is itself a
problem, and the cost of DC outages is high. Even if the cause is outside our control (i.e. the fiber-seeking
backhoe strikes again), we still want to know so we can do something about it. Second, the actual cost of a single
alert is pretty low: just a phone call, albeit potentially in the middle of the night.
The problem is that this is a distributed system. There are latencies. There are networks. There are many reasons why
we might not get a datum on time, and if we fired the alert every time that happened we’d quickly succumb to alert fatigue
and start ignoring them. That’s not an error, but it is a real problem.
So to avoid that we build some latency into the system: the signal needs to be bad for some time before we fire. The longer
the wait, the less likely we are to have a Type I error. Unfortunately, the longer the wait, the more likely we are to
have a Type II error, a false negative. And the cost of those is high: hundreds of people and thousands of tasks failing. We
really don't want that, so it's a balance.
Your situation might be different. For mission critical safety decisions you might choose to eliminate Type II errors in
favor of more Type I errors. It’s not comfortable, but a spurious hard braking event is better than no brakes and hitting
something. Other things might be OK to just ignore.
And that completely skips Type III errors (the right answer to the wrong problem) and Type IV errors (the right answer
for the wrong reason), but those are topics for another time.