Recent Posts (page 22 / 71)

by Leon Rosenshein

Decisions 2 (if ... elseif ... elseif ... elseif ...)

Speaking of the difference between a choice and a decision, the number of choices you have to pick from can deeply influence the decisions you make about the code you write. That's because the number of choices you have can push the implementation of a decision far from where you actually make it.

You could make it up front, as a set of micro-decisions, but then you'd better know what all the options are and make every choice correctly at that point. Done that way you don't have much flexibility, because by the time the decision starts to impact what happens you're far from where it was made and it's often too late to adjust. A better choice is to make the decision up the stack and push the implementation down the stack, to where more context and detail are known and the impact is clearer.

Another way to look at it is to think about some simple non-branching code. It proceeds from statement to statement doing whatever it’s told. All the choices were made by the compiler a long time ago. At runtime there are no decisions to be made.

Now add a simple if statement. Everything runs along smoothly, but somewhere in the middle there’s a decision and something different happens. If there’s only one of them then it’s a special case and (almost) everyone knows about it and expects it.

Of course, as time goes on more special cases show up, and the simple if turns into an if ... elseif ... elseif .... It still looks clean. But pretty soon new requirements show up. Special case upon special case, with sub-cases and caveats, to the point where you build a truth table, hope you get it right, then implement it. Eventually no one is sure what happens, or why. Look at this code I wrote a year ago. It got so bad I put in a comment explaining what was supposed to happen, because it wasn't obvious to me, let alone someone else, from reading the code.

var filteredEntries []userstate.StateEntry
initialTargetLen := len(targets)
for _, entry := range entries {
    req := entry.Value.(authutils.GetCredentialsRequest)
    // This is getting complicated, so breaking it down. We EXCLUDE a profile if ALL of these conditions are false
    // 1) The user specified a list of profiles and the current profile is one of them
    // 2) The user did not specify a list of profiles AND specified the --all flag OR the --aws flag
    // 3) The user did not specify a list of profiles, the current profile is not an admin profile,
    //       and the user did NOT specify the --admin flag
    // 4) The user did not specify a list of profiles, the current profile is an admin profile,
    //        and the user specified the --admin flag

    if !((initialTargetLen > 0 && targets[entry.Key]) ||
        (initialTargetLen == 0 && (refreshParams.Flags.All || refreshParams.Flags.Aws)) ||
        (initialTargetLen == 0 && roleIsAdmin(req.AWSRole) == refreshParams.Flags.Admin)) {
        continue
    }
    filteredEntries = append(filteredEntries, entry)
    delete(targets, entry.Key)
}

Try adding a new parameter that interacts with that logic. Combinatorial explosion. And it’s possible (likely) that some of the flags are mutually exclusive. It needs to be fixed, and one day I’ll get to it. Probably the next time I need to add something. At that point I’ll figure out how to extract the logic in some flag/parameter abstract way.

The problem is that the decision (what the user wants to do) is made temporally far from when the code needs to act on it. And on a completely different timeline, new requirements get added. This code makes that hard.

Making that kind of change easy is an architectural decision. There are lots of ways to do it. Everything from changing the requirements to objects, abstractions, inheritance, and interfaces. And they should all be considered before a decision is made.

by Leon Rosenshein

Is That a Choice or a Decision

I’ve talked before about evaluating your decision based on what you knew at the time instead of on its outcome. That’s as true now as it ever was. Unfortunately, it’s a retrospective approach to understanding the quality of your decision. It doesn’t help much while you’re trying to make the decision.

As much as it would make my job easier, I don’t have a time machine that will let me evaluate my decisions based on what I find out later. But there are some rules of thumb that can help to ensure that you’re making a good decision, in the moment, regardless of what ends up happening.

The simplest, and most important, is to make sure that you’re actually making an informed decision. Sticking with the status quo, or taking the first option you come up with isn’t a decision. It’s a choice, but it’s not a decision. To make a decision you need to make a choice between options. If there’s only one option it’s rule by fiat or declaration.

That means you need at least two options to make a decision. Do or Do Not (there is no try). Do A or B. At least you’re making a choice now. Arguably a decision. But it’s a very limited one. Those kinds of choices are often polar opposites, and the world is rarely that simple.

Which means you need to avoid the tyranny of or. Sure, some choices are truly binary, but most aren’t. It’s not this or that, but more often, a little bit of this and some more of that. It’s conjunction junction. Hooking up tasks and options and choices into a coherent whole that solves the problem. Shades of grey.

So next time you need to make a decision, remember, if you have:

  • 1 choice you’re making a decision by fiat or declaration
  • 2 choices you’re probably choosing between polar opposites. Avoid the tyranny of OR
  • 3+ choices and you’re making a more informed decision based on knowledge of the situation.

by Leon Rosenshein

Flow

You’ve probably heard of flow. That perfect balance of concentration, energy, attention, and understanding that leads to progress. You’ve probably been in a flow state a few times. It seems magical. You get more done than you expected and when you’re done the result feels right. Everything seems to click and the code is something you’re proud of.

And it’s by no means restricted to development, or even to creative endeavors; it shows up in repetitive, assembly-line work as well. One of my early jobs was at a concrete block factory. I did everything from running the molding machine to loading trucks, and pretty much everything in between.

There was a lot of automation, but we also did a bunch of semi-custom work that added manual steps to the process. Particularly the part of putting the finished blocks into cubes on pallets so they could be stored in the yard or loaded for shipment. Some of the patterns we needed to use so the blocks interlocked and didn’t fall off when the truck went around a corner were complicated.

Even worse, some of them were rough split, so we had to keep pairs together so they fit well. Some days running the cuber was a struggle. Getting things in just the right place as more blocks kept flying down the line was hard. It felt like Lucy in the candy factory. Nothing seemed to go smoothly and eventually we’d either have to get more people at the cuber or hit the pause button and get caught up. Other days, with the exact same kind of blocks and the machine running at the exact same speed, things would just flow.

More relevant to us, though, is flow as a developer. The Ballmer Peak is, I think, an urban legend, but the XKCD page about it is real. And there’s certainly truth to the idea of reducing inhibitions to increase creativity. Pretty sure blood alcohol content isn’t the best way to get that increase, and stress and anxiety make things worse. So what does go into enabling flow?

The most common way of looking at it is to compare your skill level at a task to the challenge of the task. Seems reasonable. With the caveat that your skill level at a task has an inverse, but non-linear, relationship to how much of a challenge it is: for any given task, the better at it you are, the less of a challenge it will be. The current model looks a lot like this:

Complex definition of flow

To me, though, that just looks like a color wheel. It’s visually round, which seems odd, and oddly specific. The sharp boundaries between states just don’t feel right.

What does feel right, to me at least, is a much older version of that image.

Simple explanation of flow

It’s the same author, but 40+ years older. It could probably do with a little more detail on the areas outside of flow, but for visualizing the difference between flow and not flow, I think it captures the real driver, which is a proper balance of skill and challenge. It might be hard to do, but you know you can do it, or it might be easy, and you can do it very easily. Either way, there’s a balance.

That’s why we like flow. Not because the task is easy, but because the mental state is easy. There’s no anxiety, boredom, worry, or apathy to deal with. All of your mental energy can go into accomplishing the task. Which also explains why so much gets done.

But here’s the catch. Flow is productive, but it’s often not growth. At all but the highest challenge/skill levels, you’re doing things that you know how to do. There will be incremental improvement, but to really make a step change to your skill/ability, you need to challenge yourself to go beyond your skill level. And there’s no flow there.

by Leon Rosenshein

Roller Coasters

Roller Coasters can be fun. Big drops. High speeds. Adrenaline rushes. And most of our offices have them relatively close. There’s the Steel Curtain at Kennywood, the Demon at Great America, and the Mind Eraser at Elitch Gardens. There’s even Roller Coaster Trail outside Bozeman.

From the outside it’s big and scary looking. You have some glimpses, but no understanding. You’re in line, waiting to get in and give it a try. There are folks in line with you who are very familiar with it. They’re telling you how wonderful it is. They describe the details and how it makes all the other coasters look tame.

There are also folks in line for the first time. They’re laughing nervously. Not quite sure what they’ve gotten themselves into. You can see the tension. The glances at the big drop or the loop. When a car rattles overhead they startle or duck. But they keep slowly moving forward. Nervously settling into the car when it’s their turn.

You start up the first big hill. The experienced folks are cheering. The novices are enjoying the view, but their heart rate and blood pressure are going up. They nervously check pockets to make sure things are secure.

You reach the top of the hill. Speed picks up. Wind in your hair. You float a little (or maybe a lot) in the seat. Let out a scream or two. The track levels out, the G-force builds. A couple of dips and sharp turns and you realize: you’ve got this. It’s fun. Probably worth doing again. You start back up the other big hill. You’re confident. You’ve done this before.

Then the track drops out from under you. You’re in free fall. Then you’re upside down in a loop, a corkscrew, or both. How did we get here? What just happened? About the time you’ve internalized it you’re back at the station. An experienced rider, quite likely ready to do it again.

But there’s another kind of roller coaster we deal with a lot more often. That’s the roller coaster of understanding. It goes something like this.

Rollercoaster of understanding

New technologies are like that. I understand most of Kubernetes. I’ve been using Golang for a few years now. I’m pretty sure I get go functions and channels, but I’ve felt that way before, so there might be another hill coming.

Learning patterns is the same way. When you first learn them, you see the application for them everywhere, and you overuse them. But with time and experience, you learn not just when and how to use them, but also when and how not to use them.

And it's not just learning that is like a roller coaster. There are so many parts of the development cycle that feel like that. But those are stories for another time.

Regardless, the key is to push through. With humility as you realize there’s always more to learn.

by Leon Rosenshein

Getting Feedback

It’s feedback season again. There’s lots of advice on how and when to give feedback. The internet is full of it. What you don’t see so much of, and what’s about to become very relevant, is how to receive feedback. Like any other communication, regardless of how the message is sent, if it's not received there’s no information transfer.

So what can you do as the receiver of feedback? What is your role in the process? There are some simple things you can do to make getting feedback as effective as possible. And it doesn’t matter if you’re getting the feedback from a manager, peer, subordinate, or someone you only had incidental contact with.

First and most important, listen. Hear what the feedback is. Make sure you understand what is being said. Take notes. Ask clarifying questions if needed. It doesn’t matter if you think the person giving the feedback misunderstood or misheard or is wrong. Your job as the receiver is to get the information being delivered. That means you don’t control the conversation. Not the timing, and not the direction. Don’t put words in their mouths. You can ask clarifying questions, but not leading questions. 

Second, don’t defend yourself or your action(s). This isn’t the time for that. Just take it in. Write it down. It goes back to listening first. Again, clarifying questions can be ok, but asking the giver to look at it from a different angle or asking if some new information changes their opinion deflects from what they’re trying to say.

Third, look for trends. Did you hear the feedback once, in a specific situation, from one person multiple times, or from many people over many instances? Is it something you only hear at work, or is it something you’ve heard in multiple contexts?

Fourth, don’t make instant promises. Unless the feedback is simple, like “I prefer to be called Tim rather than Timothy.” You’ll need to think about what was said, when and how often the situation arises, what you can do, and what you should do. The time to do that is not when you're supposed to be listening to the other person.

And finally, say thanks. The giver put themselves out there. Especially if it’s a subordinate or junior. They did their part and deserve thanks.

Of course, after you get the feedback comes the hard part. Interpreting it, understanding it, and making changes.

by Leon Rosenshein

Efficiency

What is efficiency? According to Merriam-Webster, efficiency is:

Efficiency (noun):

the ability to do something or produce something without wasting materials, time, or energy : the quality or degree of being efficient

Ex: Because of her efficiency, we got all the work done in a few hours.
Ex: The factory was operating at peak efficiency.

But what does that really mean? And how does it change depending on what you’re doing/producing? Do you measure efficiency the same way when you’re building a bridge, running a sprint, running a marathon, creating an original painting, or building software? Of course not. I ran across this distinction by Jessica Kerr.

“Efficiency” is about fewer steps, uniformity, control— only when building the same thing over and over. 

“Efficiency” in building something new is about smaller steps, exploration, quick learning.

There’s something very subtle going on there. When you know exactly what to do and when to do it, being efficient means ensuring that everything happens exactly as planned, and exactly when planned. Anything beyond that minimum is wasted effort. Something to be eliminated when you’re trying to be more efficient.

On the other hand, when you don’t know exactly what to do next, when to do it, or what it will look like when you’re done, you approach it the other way. Take small steps. See if they work. Adjust and make sure you’re going in the right direction. Instead of a detailed roadmap with explicit steps (Taylorism), you have a sea chart. You have a goal, and you have the environment. The goal gives you your strategic direction. The environment helps you make tactical decisions.

We often use the first version because it gives us a sense of control. It’s rework avoidance theory. Define the path, execute the plan, arrive at the goal. No rework required.

Unless there is. If you just follow the plan you reach the last step and look up to see where you are. You might be close, you might be where you started, or you might be even further away. You won’t know until you get there. So you don’t know how much rework there’s going to be. You do know one thing though. The less you know about the environment between you and the goal, the further off you’re going to be when you finish the plan.

So if you want to execute efficiently through the unknown, take many more much smaller steps.

by Leon Rosenshein

SPOFs

A SPOF is a single point of failure. It’s that one little thing that everything else depends on and doesn’t have redundancy. Like building an entire electric grid to make sure you have power available, adding an external generator and battery backup in case the grid fails, and building your house with multiple circuits to each room, then having a single line going from the battery/generator/grid multiplexer to the house. If that single line fails you’re out of power. At least in that case if the line fails you can probably run an extension cord from the working power supply directly to the house.

Now consider the James Webb Space Telescope (JWST). As I write this, the JWST is about 500,000 miles from the Earth and is undergoing a complex, complicated process to get things ready for its real work. One of the biggest things it needs to get done is unfolding its sunshield so the cold side stays cold. The sunshield, when unfolded, is the size of a tennis court and consists of 5 layers of incredibly thin (< 0.05mm) kapton film sheets. What makes it complicated is that the sunshield is packed for travel; it, along with all the rest of the satellite, needs to fit into a 5x15 meter fairing for launch. What makes it complex is that during the unfolding process everything needs to move together, at the right speed and with the right force, to avoid wrinkles and tears. On top of that, it needs to do it in a vacuum and microgravity. So you end up with 344 SPOFs.

Since it’s 500,000 miles away, in conditions that can’t be replicated reliably for any meaningful duration, testing in real conditions is hard. You can do lots of unit tests. You can do some underwater tests to approximate microgravity. But integration tests, not so much. So you plan. And you plan some more. You consider failure modes and build in contingencies. Then you build contingencies for your contingencies. And after all that, and $10,000,000,000, you launch with 344 SPOFs. In that case you have to have way more than 2 9’s of confidence in your SPOFs. You need 4 or 5. That’s impressive.

So next time someone says it’s too hard to get 4 9’s on a highly available system, remind them of the JWST, which got that kind of reliability with 344 SPOFs. Doing it Earthside, where you can touch things, change them, and have redundancy, should be (relatively) easy.

by Leon Rosenshein

Back It Up There

Back in the house now and mostly back to normal. Heat and electricity are on, but we still need to boil our water. Turns out most of our processes and preparations worked, but not all of them. The biggest failure appears to be the emergency alert system. The city/county uses Everbridge to share alerts. That’s fine and dandy, and they’re highly available and distributed. The problem is, they’re OPT IN, so if you don’t register for alerts you don’t get them. And of course, if you don’t know about the system you don’t register. Well, we’re registered now, so we’ve got that going for us.

Which got me thinking about systems testing and assumptions. When something absolutely, positively, has to work right, how do you verify it? Do you believe that if your car’s “Check Engine” light is off everything is fine? That’s a pretty bold statement. Especially if you don’t have a test to tell you if that light even works. Even if everything is working as planned, is there something that can go wrong that the light doesn’t tell you about?

Having a little green light that says everything is OK is a little better. At least then if the bulb burns out you know something happened and you can check. But even that leaves you at the mercy of what the system behind the light checks.

The only way to really know if your system can handle a particular fault is to try it out. Back when I was working on Falcon 4.0 the team down the hall was working on Top Gun, and they were trying out a new source control system, CVS. It seemed great. It understood merges. Multiple people could work concurrently on the same file and deal with it later. And our IT team took good care of the server. RAID array for the disks, redundant power supplies, daily backups, the whole 9 yards. And of course, the server died. It was the Pepsi Syndrome. Someone knocked over the server. Disks crashed. Motherboards broke. Network ports got ripped out.

Short story, it wasn’t coming back. But that’s ok. We’ve got backups. Weekly full backups and daily incrementals. Pick up a new server, restore the backup, and keep developing. Just a couple of days delay. Until we tried to restore the backup. Turns out we had write-only backups. The little green lights came on every week. The scripts returned successfully. The tapes were rotated off site for physical safety as planned. We just couldn’t restore the server from them.

Luckily we were able to piece together a close enough approximation of the current state from everyone’s machines and the project moved forward. We also added a new step to the weekly tape rotation process. Every Monday we’d restore from tape and apply an incremental on top of it to make sure the entire process worked. And of course, after that we never needed the backups again.

So next time you trust the absence of a red light, or the existence of a little green light, make sure you know what you’re really trusting.

by Leon Rosenshein

2021 In Review

And what a year it’s been. Just some personal reflections. I worked from home for 11 1/2 months. 1 business trip, which was the first in ~19 months (and that hasn’t happened in about 12 years). 1 vacation. Became empty nesters as our youngest moved out. Restarted our latke fry. 1 office evacuation. 1 company change. 2 role changes. 3 different managers. And 153 blog posts.

Which was the biggest? That’s easy. Becoming empty nesters. Especially after working from home for over 20 months. It’s amazing how big a house can feel with just 2 adults and one dog. A lot quieter too. But what really made me notice was when I was on that business trip. Both my wife and I have traveled, together and separately since we got married, but for the first time in 30 years my wife spent the 6 nights at home alone. That’s a big change.

After that, it’s probably the change in roles. The move from ATG to Aurora, for as much churn as there was, really wasn’t that big of a change for me at first. My direct team and the larger group around it moved as a unit. We merged in some really sharp folks, but the work didn’t change much, nor did the environment. Picked up a few more responsibilities. Dropped a few as well. But over a few months I went from tech lead on the compute team to tech lead for Tech Foundations. Big change in scope. More indirect. More influence. But still exciting and I’m enjoying the work.

< Interlude for office evacuation >
SmokeFilledRoad
It’s a little smokey around here
< /Interlude for office evacuation>

And with 153 blog posts for the year, still sharing. Since we moved to Confluence I’ve got a little better access to history, so I decided to play around with the Confluence API and see what folks like.

Entry                     Likes
Artificial Stupidity      8
Drive                     6
Tenure                    6
Ignorance As A Service    5
Exploitation              5
Another 4 Questions       5
Hyrum’s Law               5
Experimental Results      5

Alternatively, there are the ones with 30+ viewers (since August, when we started keeping track).

Entry                     Viewers
Cloth Or Die              41
More Wat?                 36
Ignorance As A Service    35
Language Matters          35
PR Comments               33
Continuous Education      32
Scrape To Taste           31
Core Services             30


by Leon Rosenshein

Language Matters

Hyrum Wright, of Hyrum’s Law, posted a question that is, in my view, valid, reasonable, and realistic: where do the responsibilities lie when a change to a core library/service/tool that is demonstrably good for many breaks tests for some? Is it with the person/team that made the fix? The person/team that wrote the test? The person/team that set things up so that nothing can be released with a failing test? And what should the next steps, immediate and long term, be anyway? In an automated, coupled system with gates to prevent bad things from happening, this good thing either brings the system to a halt or gets delayed, which means value isn’t being delivered. The right answer, as is often the case, is “It depends”.

In a small (whatever that means) project it’s a non-question. It’s the same team, so just fix it. If it’s a periodically released, externally maintained thing, then fixing the issue is just part of the periodic upgrade tax. It’s only when you have a large project, with a team of teams, that the issue of responsibility becomes something to solve for. In theory, in the Team of Teams everyone’s responsible. In practice, when everyone’s responsible, it will often appear that no-one is responsible. Dealing with that conundrum is a topic for next year though.

The thing about language, though, is that how you ask a question is as important as the question itself. In the original post Wright asked whose fault the broken tests were. Based on later discussion, and the way I took it, “fault” was shorthand for “responsible for fixing”. But that’s not what the question, as written, asked.

The literal question was about blame. That took the discussion in the team-dynamics direction, into talk of dysfunctional teams and how getting into that situation is even possible. Blame is not productive or conducive to fixing problems and preventing them from happening again in the future. It might have a short-term impact, but all it does is push the problem further underground and make finding and fixing it harder next time.

Our jobs are to deliver as much value as possible as quickly as possible. Playing the blame game makes that harder. That’s why language matters.