Recent Posts (page 30 / 65)

by Leon Rosenshein

Cognitive Load

I talk about cognitive load a lot (over 2 dozen times in the last year). Especially about reducing cognitive load. But what is it, and why is it important? And what does it have to do with software development, especially architecture and design?

To give an analogy, let’s imagine the brain is a tool. A general purpose tool that, with some instructions/adjustment, can do lots of different things. You can store those instructions away and get them back when you need to do what they’re for. This tool can only make contact with the environment around it in a few ways, but those ways can be combined for more flexibility. One tool that meets those requirements is a computer. You’ve probably heard of them.

So, the brain is like a computer. That’s a nice analogy to use to help understand cognitive load. Especially the CPU part. Consider the CPU. It’s got a few registers. It can act on those registers in a handful of ways, including getting data into and out of a register. And that’s it. Everything else it does is a combination of those things. Let’s say your CPU has 5 registers. You can do anything you want with the info in them, but if you need to work with more than 5 things you’ll need to keep stopping to put one of those pieces of info down somewhere safe, pick up the new one, and move on. The bigger the difference between the number of pieces of info and the number of registers, the more time is spent just moving things around, not doing anything with the info. And every time you need to move something in or out of a register there’s a chance to get interrupted, or worse, drop something and lose it.

In a related fashion, computers appear to do multiple things at once. But in reality, for a given CPU that’s not really true. It does one thing for a few milliseconds, switches to a new thing for a few more, and cycles through everything over and over again, giving the appearance of doing all of those things at once. We call the time spent switching between different things a context switch, and it can take orders of magnitude longer than actually doing the work because the computer needs to put all of the info in those registers somewhere safe, then bring back the info that was in them the last time it worked on the other thing. It also needs to remember exactly where in the list of steps it was, and pick up where it left off. Again, that context switch is a great opportunity to get something wrong.
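
To make that concrete, here’s a rough Go sketch (a toy, not a rigorous benchmark): two goroutines bounce a token back and forth over unbuffered channels, so the scheduler has to switch on every hop while the “work” itself is trivial. Nearly all of the elapsed time goes to switching, not to doing anything with the token.

package main

import (
  "fmt"
  "time"
)

func main() {
  const hops = 1_000_000
  ping := make(chan struct{})
  pong := make(chan struct{})

  // The "other task": take the token and immediately hand it back.
  go func() {
    for range ping {
      pong <- struct{}{}
    }
  }()

  start := time.Now()
  for i := 0; i < hops; i++ {
    ping <- struct{}{} // hand the token off...
    <-pong             // ...and wait to get it back
  }
  elapsed := time.Since(start)
  close(ping)

  fmt.Printf("%d hand-offs in %v (~%v per round trip)\n", hops, elapsed, elapsed/hops)
}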

Now, your brain isn’t a CPU, but let’s stick with the analogy. There are only a limited number of things you can keep in active memory at once. If the number of things you need to remember is higher than that you have to keep refreshing your memory. That’s cognitive load. The less time you spend refreshing your memory, the more time you can spend on what you’re trying to do.

Similarly, when you’re working on something, in the zone, as it were, you’ve got all the things you need in working memory and most of the instruction cycles in your brain are going towards getting that something done. When it’s time to context switch you need to save all that state, then find and reload the old state. Until that’s done you aren’t making progress on anything. And for our brains that process is very slow and imprecise. Often you can’t get back to where you were; you can only get to a recent save point and then you need to go over the same things again to get back to where you were. That’s more cognitive load. Again, keeping it down helps you make progress against your goals.

So that’s what I mean by cognitive load and why it’s important. How it relates to development is a whole set of different topics for the future.

by Leon Rosenshein

Charting A Path

It's a sea chart, not a road map. Map out the destination (strategic goals) and the hazards, but the route depends on the wind. "Road map" is not a useful metaphor.

    -- Allen Holub

Sometimes you run across a phrase that really resonates. This is one of those cases. I’ve talked about roadmaps before, but it took me a few paragraphs and 6 questions to say what Allen said in 3 sentences.

Know where you want to go and what you need to avoid, but the actual path isn’t known until you can look back and see what it was. That’s pretty profound. Because metaphors are important. They provide context, and context is important. And that’s why a roadmap might not be the best metaphor. A roadmap is prescriptive about both path and time. Because it describes a journey over a well-known, static landscape. And development is often not a known, static landscape.

But that doesn’t mean you shouldn’t plan or pick a direction. What it does mean is that you need to be both proactive and reactive at the same time. Either one alone won’t get you there. And you need to balance them.

You need to be proactive in that you need to keep the goal, the “landscape”, and the hazards in mind. Where possible you want to take advantage of the situation you are in. Going with the wind, as it were. You also need to plan to avoid the hazards, the rocks and shoals along the way.

And you need to be reactive as you go. The situation is not static. The goal moves as you learn more about it. The wind might be stronger or weaker than expected. The cross-wind will be different than planned. Staying on heading X for Y hours won’t put you where you planned, so you need to react to where you are and re-plan.

So don’t skip the planning. If you don’t know where you want to go you’ll never get there, and there’s a good chance all you’ll do is go around in circles. But don’t slavishly follow the plan. Assuming nothing will change along the way will ensure you never get where you want to be just as certainly as not knowing where you’re going.

by Leon Rosenshein

The -ilities

In software engineering there are lots of different kinds of requirements. There are the functional ones. They are the obvious ones that describe what the software is supposed to do. There are less obvious ones that describe what the software should do when something goes wrong. Then there are business requirements, like time to market and operational costs. And finally there’s a whole set of requirements that have nothing to do with how the software should work, or when it should be ready. Instead, they talk about how the software should be designed.

These are the non-functional requirements (NFRs). The things that you need to think about when you design the system, not just the code. The NFRs are a set of nouns that describe the quality attributes of the system. You’ll often hear them called the -ilities since many of them end that way.

It’s usually easier to build a system that meets the functional requirements if you ignore the NFRs. And if you were only going to build one version, and only build it once, that might be the right thing to do. Because most of the -ilities are talking about things in the future. Operational things like reliability, scalability, and adaptability. If you don’t have to run it, grow it, or change it to meet future needs, why bother thinking about that or being able to handle it?

You shouldn’t. On the other hand, if you only have a rough idea of the current requirements, and just a notion of which direction things are going to go in the future, it behooves you not to box yourself into a corner. But there are lots of -ilities, so how do you know which ones are important and which ones aren’t?

Well, it depends. It depends on what you know, what you don’t know, and unfortunately, on what you don’t know that you don’t know. So how do you decide? How do you architect the system so that you choose the right NFRs, and then use them to both add customer value and keep from painting yourself into a corner?

There’s no simple answer, but there are guidelines. Domain Driven Design helps you find the clear boundaries between things so that you can change one thing without needing to change everything. Test Driven Design helps you know that anything you do need to change still works the same as it did before. Working with subject matter experts on a Ubiquitous Language for your system helps ensure that you’re solving the right problems and that everyone is talking about the same thing.

And finally, having enough adaptability in your system lets you adjust to new learnings and requirements as they are discovered. And that means not just adaptability in the system design, but in the overall process, so that you can make the changes you need without having to fight the system.
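
As a minimal sketch of what one of those boundaries can look like in code (the names here are illustrative, not from any real system), hide storage behind a small interface. The domain logic only depends on the interface, so swapping the implementation later, for scalability or anything else, doesn’t ripple through the rest of the code.

package orders

// OrderStore is the boundary. Domain code depends only on this interface,
// not on any particular database or file format.
type OrderStore interface {
  Total(id string) (float64, error)
  Save(id string, total float64) error
}

// ApplyDiscount is domain logic. It neither knows nor cares how orders are
// stored, so the storage can change without touching it.
func ApplyDiscount(store OrderStore, id string, percent float64) error {
  total, err := store.Total(id)
  if err != nil {
    return err
  }
  return store.Save(id, total*(1-percent/100))
}

// memoryStore is one implementation. A SQL- or blob-backed store could
// replace it later without changing ApplyDiscount at all.
type memoryStore struct {
  totals map[string]float64
}

func (m *memoryStore) Total(id string) (float64, error) { return m.totals[id], nil }
func (m *memoryStore) Save(id string, total float64) error {
  m.totals[id] = total
  return nil
}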

by Leon Rosenshein

NoHello

Software development is an odd mix of collaboration and isolation. Especially now that we’re all WFH. And we work for a distributed company. Across at least 4 time zones worth of offices and folks working from even more places. Which means that collaboration takes place mostly on Zoom/gMeet instead of in person. Both of those are pretty high bandwidth and interactive, which is good. But because we’re not all awake at the same time, let alone working at the same time, that kind of real-time, high bandwidth, synchronous connection isn’t always possible.

So we fall back to more asynchronous connections. Like Slack, or email. Now email is clearly asynchronous and non-interactive, so we have no expectation of an immediate response. And email generally shows that. There’s a storytelling pattern to it. And I’ll get to that one of these days.

Slack, on the other hand, feels more like a phone call. I call, you answer. I talk, then you talk. Details are requested and added as needed. At least that’s what usually happens. But sometimes, the person on the other end isn’t really there. Or they’re at the keyboard, but busy doing something else. So you say “Hello” expecting an answer, but nothing happens. So you wait a few minutes, then figure the other person isn’t around, and move on. Some period of time later the person you said hello to notices and/or has a chance to respond, and says “Hello” back. But now you’re busy. This goes on for a while and eventually you ask your question, like “What was the link to that article you were talking about in the meeting?” And you get your answer. After 3 rounds back and forth, 6 workflow interruptions, 20 minutes watching Slack for a response, and 4 hours of wall time. Because Slack isn’t a phone call.

While it sometimes feels like one, it’s really an asynchronous communication channel. It’s just that often the delay is minimal. So when communicating on Slack it’s important to keep that in mind. You’d never send me an email that says “Hello”, then wait for me to respond before continuing with the rest of the email. So why do it in Slack?

Which leads to what I talked about in the Hello vs NoHello debate. The short recap is, at least when communicating with me, don’t say “Hello” and wait for a response. Just ask your question. I’m fine with, prefer actually, a nice greeting, but don’t wait for me to respond. Say hello, or don’t, and ask your question. That gets you the answer faster, wastes less of your time waiting for me to respond, and interrupts me fewer times before I can answer the question.

It’s better for everyone in so many ways. What do you think? Share in the comments.

by Leon Rosenshein

Precision vs. Accuracy

Everyone wants to be accurate, and everyone wants to be precise. Those are both great goals. It’s a wonderful thing when you can be precise and accurate. On the other hand, it becomes a problem when you trade one for the other, or even worse, mistake one for the other.

What’s the difference between precision and accuracy? The way I think about it, precision is a measure of the “size” of a quantum. One hour has 1/60th the precision of a minute, and a year has 1/525600th the precision. Precision is a property of the measurement you’re making and has nothing to do with the thing being measured. If you measure in whole years there was a 1 year time period (525600 minutes) when I was 21 years old. If you were to measure in whole hours there was a 1 hour time period (60 minutes) when my age was 184086 hours. The measurement of 184086 hours old is much more precise than 21 years old. Measure it in minutes or seconds and it’s still more precise.

Accuracy, on the other hand, is a measure of how close the measurement is to truth, however you want to define truth. Going back to the age example, if I were to tell you I was 54 years old I would be 100% accurate. However, if I told you I was 473364 hours old I would be almost 2% off. Both 54 years and 473364 hours represent the same timespan, but the accuracy of the two is different.
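
Here’s a rough Go sketch of that arithmetic (the dates are hypothetical, just to make it runnable): the same age expressed in hours and in whole years, and how far off you are if you quote the whole-year answer back at hour precision.

package main

import (
  "fmt"
  "time"
)

func main() {
  // Hypothetical birth date and "today"; the real ones aren't in the post.
  born := time.Date(1967, time.March, 15, 4, 32, 0, 0, time.UTC)
  today := time.Date(2021, time.July, 1, 12, 0, 0, 0, time.UTC)

  hours := today.Sub(born).Hours()    // high precision measurement
  years := int(hours / (365.25 * 24)) // rounded down to whole years
  yearsAsHours := float64(years) * 365.25 * 24

  fmt.Printf("age: %.0f hours, or %d whole years\n", hours, years)
  fmt.Printf("quoting %d years as %.0f hours is off by %.1f%%\n",
    years, yearsAsHours, (hours-yearsAsHours)/hours*100)
}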

Of course the two are intimately related. Consider the birthday example. My birth certificate is considered truth, and it has a time of birth with a precision of 1 minute. But what’s the accuracy? We don’t really know. How much time passed between being born and looking at the clock? Probably not much, but some. And how precise/accurate was the clock? When was it last set? To what standard? It was almost certainly an analog clock, so the angle of view can change the reading as well. In my case it doesn’t make much difference, but consider the person with a birth time of 2359 local. That’s one minute of precision, but an accuracy slip of 2 minutes has the person born on the wrong day. And if it was December 31st it could be the wrong year as well.

Ballistics is another area where the difference between the two is apparent. Back in my earlier days we talked about Circular Error Probable (CEP) a lot. For a given initial condition, how tight a cluster would the bombs land in? How big would a circle need to be to include 90% (CEP-90) of the attempts? The smaller the circle, the better, and more precise, the system was. But that doesn’t say anything about accuracy. The bombsight could have been anywhere, but the CEP would be the same. Getting the bombsight and the center of the CEP to match is the accuracy. That was my job, and that sticking actuator gave me a lot of grief before I had enough data and didn’t have to worry about the size of the CEP, but that’s another story.

As engineers we know all this. We’ve been taught about it and have dealt with significant figures in math for years. But what does this have to do with development? It’s important when it comes to making estimates. Ideally you can be both accurate and precise, but that’s both hard and rare. In that case I say accuracy is more important. And even more important is to not confuse the two. Just because we estimate in hours or sprints doesn’t mean we know to the nearest hour when something will be done. We need to be careful to not conflate the precision of the answer with its accuracy. It’s an estimate. And it will get more accurate as time goes on and we know more and get closer to the end. But it rarely gets more precise. How to deal with that is a topic for another day.

https://imwrightshardcode.com/2013/07/to-be-precise/

https://en.wikipedia.org/wiki/Accuracy_and_precision

by Leon Rosenshein

Poor Man’s Polymorphism

It’s been said that an if statement is polymorphism at its most basic. It’s also been said that if-else is a code smell. Last year I talked about using data and data structures to eliminate if and switch from your code. Coding to interfaces can make your code cleaner and easier to follow. If reducing if is good then eliminating it is great, right?

Not always. A big part of development is building a mental model of what you want to happen. And there’s a limit to how big those models can be. Sure, any decision logic can be turned into a state machine and used to “hide” the details, but sometimes the simplest state machine is just a set of if statements.

The other day I wrote about cleaning up some complicated decision logic. I did go back and clean it up a little more. The code ended up looking like

// Exists reports whether the file at path exists.
func (fileSystem) Exists(path string) (bool, error) {
  _, err := os.Stat(path)
  if err == nil {
    return true, nil
  }

  if os.IsNotExist(err) {
    return false, nil
  }

  return false, err
}

Which is about as simple as it can get. And it’s obvious. No hidden complexity. No extra layer of abstraction. No deeply nested if turning the code into a right arrow. Sure, we could have written a factory that builds an instance of an interface that does the right thing, but that’s just hiding the if behind one more level of abstraction. And no-one wants that.

So how do you know when to just if it and when to use the interface? It’s a combination of code size and frequency. Replacing 8 lines in one place with a factory/interface construct doesn’t make sense. On the other hand, building your own vtable to map shape types to their different functions (something the compiler can do for you) is just as wrong, and doing it in one giant dispatch function that only uses if is even worse.
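Here’s a minimal sketch of that compiler-built dispatch for the shape example (the concrete types are just illustrative):

package main

import (
  "fmt"
  "math"
)

// Shape is the interface. The compiler builds the table that maps each
// concrete type to its Area implementation; no hand-rolled vtable needed.
type Shape interface {
  Area() float64
}

type Circle struct{ Radius float64 }
type Square struct{ Side float64 }

func (c Circle) Area() float64 { return math.Pi * c.Radius * c.Radius }
func (s Square) Area() float64 { return s.Side * s.Side }

func main() {
  shapes := []Shape{Circle{Radius: 2}, Square{Side: 3}}
  for _, s := range shapes {
    // No if or switch on the concrete type needed here.
    fmt.Printf("%T: %.2f\n", s, s.Area())
  }
}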

So remember, it depends, and let the data guide you.

by Leon Rosenshein

Expediency

Sometimes the wrong solution is the right solution at a particular moment in time. Times when code that might be less than elegant is appropriate. Back in the day of boxed products and in-store shopping, getting the gold master disk to the manufacturer in time to get your production slot was one of them. Miss your slot and you might miss holiday sales. Today it’s outage mitigation. The longer an outage goes on the more your customers suffer.

Which is not to say that you can/should write bad code at those times. Code that is full of bugs and edge cases is just as much a problem when you’re in crisis as it is when you’re not. But a quick hard-coded if <weird, well-defined special case> then return at the right time and place can get your system up and running and keep your customers satisfied. And you almost certainly need to add some work to your backlog to go back and develop the long-term fix. But that doesn’t negate the importance of the quick fix.
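
As a sketch of what that can look like (the names and the special case are invented, not from any real incident), the guard is narrow, well commented, and points at the backlog item for the real fix:

package pricing

import "errors"

type Request struct {
  CustomerID int
  Items      []string
}

type Quote struct {
  Total float64
}

func PriceQuote(req Request) (Quote, error) {
  // MITIGATION: customer 4217 sends empty carts that the pricing engine
  // can't handle yet. Short-circuit that one well-defined case to keep the
  // system up; the long-term fix is tracked in the backlog.
  if req.CustomerID == 4217 && len(req.Items) == 0 {
    return Quote{Total: 0}, nil
  }
  return computeQuote(req)
}

func computeQuote(req Request) (Quote, error) {
  if len(req.Items) == 0 {
    return Quote{}, errors.New("can't price an empty cart")
  }
  // Stand-in for the real pricing logic.
  return Quote{Total: float64(len(req.Items)) * 9.99}, nil
}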

Shipping/Mitigating is a feature. And it’s probably the most important one. Even if you get everything else perfect, if it never ships, have you really done anything? You’ve expended energy, but have you done any work?

So how do you know when to do the “right” design vs the “expedient” design? Which one adds more customer value today? How much will the expedient solution slow you down tomorrow? How long will you need to get back the time spent on the “right” solution? If the quick solution takes 5 minutes and doesn’t change your overall velocity but the elegant solution will take a couple of man-months then you probably shouldn’t bother being elegant.

A classic example of this is the Aurora scheduler for Mesos. Aurora is a stable, highly available, fault tolerant, distributed scheduler for long running services. It has all of the things you’d expect from such a scheduler. It handles healthchecks, rolling restarts/upgrades, cron jobs, log display and more. If something happens to the lead scheduler another one is automatically elected. If the scheduling system falls over, when it comes back it looks at where it thinks it left off and reconciles itself with the actual running system. It also has an internal memory leak. There’s something in the code that the lead scheduler runs that will cause it to eventually lose its mind. I don’t know exactly what it is, and neither do the authors/maintainers. But they know it’s there and they have a solution. Every 24 hours the lead scheduler terminates itself and restarts. At that point the rest of the system chooses a new lead scheduler and everything continues happily. I don’t know how much time they spent looking for the problem, but the solution they came up with took a lot less time. It works in this case because the system is designed to be resilient and they had to get all of those features working anyway. They just found a way to take advantage of that work to handle another edge case. With that fix in place there’s no need to spend more time worrying about it. It’s an isolated, testable (and well tested) bit of code, so it doesn’t impact their overall velocity.

On the other hand, adding more and more band-aids and special cases to your event handler will eventually lead to emergent behavior and your velocity will slow. Back in the Opus days we had a thing called the Flying Spaghetti Monster, or FSM. Its job was to manage the lifespan of your data. Basically you would put a tag on a directory and then the system would delete it when its lifespan was over. It only had 3 modes. TimeSinceCreation, ChildTimeSinceCreation, and Quota. But the logic of how you could nest them and the time periods on the first two could lead to some odd corners, so whenever we had to touch that code the first thing we’d do was add more tests for the new case. Because it just kept getting more complex. After about a year I realized that it would take me less time to redo the whole set of logic than it would to modify the house of cards it turned into. Luckily we had all the tests to make sure the logic didn’t change.

So when do you do the right thing vs. the expedient thing? It depends on which gives more customer value over time, remembering that real value now is much more valuable than potential value later.

t.co/1xpWptiyBI?amp=1

by Leon Rosenshein

Help Me

I’ve talked before about how to ask for help. How you ask is important for lots of reasons. It lets the other person know what problem you’re trying to solve. It lets them know what didn’t work so they don’t suggest that. It shows that you’ve put in some effort and you’re not just looking for someone to do the work for you.

But just as important as how to ask is when to ask. And the answer, like always, is “It depends”. The easy answer is that you should ask as soon as it becomes apparent that it will cost more to figure it out yourself than it would to get help. Simple, right?

Not really. Because you can’t know either side of the equation. You might have an epiphany in the next 10 seconds, or it might take 3 more days of digging. And you can’t know what the cost is. You might be able to figure out the direct cost, approximated by the hourly rate of the person helping you and the time they spend helping you, but what about the indirect cost? They were doing something important, so that got delayed. They were possibly interrupted, so that cost additional time. All sorts of costs to consider.

On the other hand, think of the benefits. A 2 sentence answer at the right time can save you days of exploring and investigating. Getting done days sooner adds customer value itself, and that time lets you add even more value. So how do you balance it?

One way is to look back. If you’ve spent a day making no progress what’s the likelihood that you will have a breakthrough in the next hour? You know yourself best, but that likelihood is probably pretty low. So ask.

Another way is to look forward. If the destination is across territory that is uncharted (to you) and there are people who’ve already explored it, that might be a good time to ask for help. Or at least ask for a map. And if there isn’t one, make it for the next person.

Finally, be honest with yourself. Don’t let pride or fear keep you from asking. In many cases they are flip sides of the same coin. Everyone has things they don’t know. Don’t be afraid to ask for help. That’s one of the ways we learn. It doesn’t mean you failed, aren’t capable, or don’t care. It’s about doing the most valuable thing for the team and the customer.

And there’s one thing you can be sure of. There was a time when the person helping you didn’t know the answer either.

docs.google.com/document/d/1QzFcWc6TAbI1APOMv-9e8VYZtm08DoLCJyEQ2WUDLKE/edit#heading=h.eglxkhp0bwee
docs.google.com/document/d/1jPl0AzkW8G6QThwuThzz7bZRAkxfpFLTUXzuNq7mzaw/edit#heading=h.orc3ioi5c1dy
imwrightshardcode.com/2019/05/when-to-ask-for-help

by Leon Rosenshein

The Scouting Rule

Try and leave this world a little better than you found it.

        – Lord Baden-Powell

The scouts have as one of their tenets to not just clean up the campsite after themselves, but to make it a little cleaner/better. While we might not be scouts now, that idea applies to us too. It’s also a great way to manage technical debt.

Development velocity is important. The speed at which we can add features/capabilities drives how quickly we can add value. The trick is managing both instantaneous and long-term velocity. It’s easy to always make the choice that maximizes instantaneous velocity, but over time you often end up going slower overall.

One way to combat that is to clean things up while you’re working on things. The other day I was tracking down a bug in the infra tool that caused a crash if there was no pre-existing config file. The fix itself was simple. Just ensure the needed directory structure was in place before trying to create the config file. But while I was debugging I ran into this little bit of code

func (fileSystem) Exists(path string) (bool, error) {
  _, err := os.Stat(path)
  if err != nil && !os.IsNotExist(err) {
    return false, err
  }
  return err == nil, nil
}

Now that is correct, but it took me 10 minutes to really be confident that it was. So I changed it to be

// If Stat works the file exists. If Stat errors AND the error is an IsNotExist error then the file doesn't
// exist. For any other error we say the file doesn't exist and return the error
func (fileSystem) Exists(path string) (bool, error) {
  _, err := os.Stat(path)
  if err != nil {
    if os.IsNotExist(err) {
      return false, nil
    }
    return false, err
  }
  return true, nil
}

That’s much easier to read, and the next person who has to go in there will thank me. It’s probably going to be me, and I’ll still be thankful. BTW, there’s an even better way to rewrite it, but that will wait until I have a free moment or find myself in there again.

So, next time you’re in the code and you see something not quite right or that should be refactored, you should just do it, right? Well, … it depends. It depends on how big a refactor it is, whether it’s going to make doing what you went into the code to do easier, and what the next known things you’ll need to do are. If it is directly related and will make you more efficient then you probably should. If it will be helpful next week then maybe. If you think you’ll need it in 6 months then you should probably document the issue and not fix it now.

And as an aside, when you’re doing refactoring changes, do your best to keep them separate from the change they’re enabling. It’s much easier to test/review a pure refactoring PR that isn’t supposed to have any runtime impact and a behavior-changing PR separately.

https://docs.google.com/document/d/1QzFcWc6TAbI1APOMv-9e8VYZtm08DoLCJyEQ2WUDLKE/edit#heading=h.xcbfdi4m6jyn
https://www.stepsize.com/blog/how-to-be-an-effective-boy-girl-scout-engineer

by Leon Rosenshein

A Short Pause

Just wanted to let everyone know that I'm going to take a short pause here. I'm working with the Aurora comms team to figure out the best way to move forward with this and share with an even broader audience.

I wanted to say thank you to everyone. Working on these has helped me clarify my thoughts and think deeper about my opinions and viewpoints. So thank you to all those who commented or reacted in Slack or reached out directly.

I've still got plenty to say, but I'm always looking for new topics as well, so if there's something you have questions about, feel free to add them as a comment or send them to me at leonr@aurora.tech.

Meanwhile, remember to ask "What is it you're really trying to do?", and that the answer to any question that starts with "How should I …" is probably "It depends".