Recent Posts (page 19 / 65)

by Leon Rosenshein

The Tyranny of Or

“The test of a first-rate intelligence is the ability to hold two opposed ideas in the mind at the same time, and still retain the ability to function.

One should, for example, be able to see that things are hopeless and yet be determined to make them otherwise.”

― F. Scott Fitzgerald, The Crack-Up

A or B? Pick one. Many problems are answered that way. And maybe they’ve even been posed that way. But is A or B really a binary choice? Sometimes it is. But often it’s not.

Think about code reviews vs pair/ensemble programming. Code reviews are imperative and the only way to ensure quality code. Pair/Ensemble programming is critical and the only way to ensure quality code. Code reviews (or PR reviews) were instituted to solve a number of problems. Knowledge transfer. Bug detection/prevention. Adherence to the style guide. Getting a different perspective. What about pair programming? Shared knowledge. Bug detection/prevention. Shared style. Team cohesion.

Both methods are pretty good at achieving their goals. And those are pretty similar. On the other hand, code reviews can slow things down and knowledge transfer isn’t perfect. Pair (and especially ensemble) programming can miss parallelization of clearly separable work and you lose the benefit of a different perspective. So you have to choose one or the other. Right?

Maybe. You could do both as well. That gets you all the benefits. But it also has all the downsides. Maybe there’s a better approach. A hybrid approach that avoids the tyranny of or.

Defense in depth. Code in small groups. Talk a lot. Share approaches and changes as you develop. Automate as much as you can. Adherence to style guides. Lint for common structural issues. CI and automated tests, both unit and integration, so you know you haven’t had an unexpected impact on downstream customers/consumers. Selective code review from interested/relevant downstream partners and people more familiar with the ecosystem in general and environment, when appropriate. Get the benefits of both, and minimize the downsides.

Which is not to say that binary decisions are bad and that we should never make them. There are true binary choices. Especially when you look at other constraints. But just because something is presented as a binary choice does not mean you have to make one. Take the time to make a good decision in context, because, like all good decisions, it depends.

by Leon Rosenshein

Legibility

Definition of legible

1: capable of being read or deciphered
legible handwriting

2: capable of being discovered or understood
murder sweltered in his heart and was legible upon his face

-- Merriam Webster

The first one you know. UI/UX/Design stuff. Being easy to read. But the impact, positive and negative, of making things legible, especially the second definition, runs way deeper than choice of font size and foreground/background color.

Code can be readable and completely illegible. Green text on a black background with a monospace font that makes it easy to distinguish between 1 (the number one), I (the capital letter `eye`), and l (the lowercase letter `ell`) will make your code readable. But it doesn’t do much to help with discovery or understandability.

At the simplest, legibility in code comes from clean code. Separation of concerns. SOLID. KISS. DRY. All those acronyms. If you do those things reasonably well your code will be reasonably legible. At least at the tactical level.

But having truly legible code goes way beyond that. It’s about applying the same principles you would apply to a module/library to an entire system. It’s about your abstractions and data models and APIs. It’s about making sure that the system is understandable/discoverable at both the large and small scales, and that it’s easy to transition between the levels as needed.

One thing that’s important to keep in mind while making things legible is that your model(s) of the system need to truly match reality, not just how you want reality to be. Take a complex system, make some simplifying assumptions, idealize things, and make it happen. When you do that it often feels correct, because you have control over what you’re doing. It’s predictable, understandable, and subtly wrong. But you won’t know it at first. It will mostly work. Until you hit that edge case.

So you patch around it. Until the next edge case. Rinse and repeat. Pretty soon your simple, elegant, legible system is none of those. So you come up with a new model and try again. And that cycle repeats.

Unless your models acknowledge that things aren’t that simple. Unless they allow for unexpected interactions. And that’s hard. Especially in large systems.

by Leon Rosenshein

Prioritization vs. Categorization

MoSCoW. The method, not the capital of Russia (or any other city) or the mule.


Must: The system must meet these requirements or is considered a failure
Should: The system should meet these requirements, but if it doesn't we can do it later
Could: The system could meet these requirements. No one will object, unless there are must and should requirements that are unmet
Won't: The system won't do this. It will make the system worse and/or any time spent on these things is completely wasted. Don't do them.


Seems pretty straightforward. The differences are clear. Do them in that order. You don't need any more information, so get to work.

Not so fast. There are at least a couple of problems here. First, those are just labels. Labels on buckets of similarly important things. There's no sequencing provided inside a bucket. What happens if there are more items in the must bucket than there are teams to work on them? Even if there's enough time to serialize them, you don't know which one should be done first. So it's really categorization.

If there's only one team, and the requirements are all completely orthogonal, sequencing doesn't matter. Of course, in all the time I've been doing this I've never worked on a project like that. And I don't know anyone who has. It's probably happened somewhere, but it's rare enough to not worry about right now. Which means sequencing is important.

Second, while those are words, not numbers, there's really no difference between Must and Priority 1 (or 0, or -1). It's just the group with the highest importance. And they both suffer from the same kind of inflation. Every group/team/stakeholder thinks their problem/requirement is the most important. Or if not critical overall, critical to them, so they label it must. Because we all know that the shoulds almost never happen and the coulds are there for amusement only.

Which is not to say that categorization is unimportant. It's not. It's critically important. But it's not enough. You have to go beyond the categorization and really prioritize. You need an ordered list of what's the most important, balancing urgency and short and long term gain. You need to keep that list current. And most importantly, you need to follow it. Even (especially?) when a single stakeholder starts arguing loudly for their favorite thing.

by Leon Rosenshein

Problem Solving

A puzzle is a problem we usually cannot solve because we make an incorrect assumption or self-imposed constraint that precludes a solution

    -- Russell Ackoff

Similar to the XY Problem and my favorite question, “What are you really trying to do here?”, when you get stuck on a problem, make sure you understand the space you’re working in.

In development those constraints often come from the existing systems. The data structures and flow that are in use to solve the problem as it was understood last week. They were appropriate then, and we used them to solve that problem.

But this week we know more. And might understand the problem differently. But our first instincts are to treat all of the previous work as constraints on solving today’s problem. That’s a good place to start. After all, it worked so far. And it will likely work again.

Unless our new understanding of the problem has changed the underlying assumptions enough so that the constraints we’ve built for ourselves have become part of the problem. Maybe even the biggest part of the problem. Then you need to take another look at your assumptions and make sure they’re not holding you back.

Consider a workflow system. At first, getting things working and making the work flow is the problem and you can relegate problems and issues to some kind of exception handling. As the system matures and workload increases you continue to make things more robust. Smoother running. The percentage of issues goes down. But the raw number of issues goes up.

Until at some point the sheer number of issues, no matter how rare, becomes an issue itself. You reach a point where you can’t solve the problem by making them even rarer. Your problem space has changed. Your system has changed from a workflow system to an error handling system. The workflows keep happening, but instead of focusing time and effort on making them happen, now you need to focus on handling errors.

Which means the assumption that you can ignore errors is now incorrect and the place you’ve been stashing them for later is now a constraint. When you need to solve the current problem you need to revisit those constraints. You need to remove them from the problem solving at least, and probably from the system as well. And that’s OK. The code works for us, we don’t work for the code. If it needs to change then change it. Solving the problem, adding value, is the goal, not working within the existing constraints.

That doesn’t mean you should throw everything out and start again. That (almost) never works. You need to find the balance. And finding balance starts with knowing which of your assumptions and constraints are real, and which are just there because they’re comfortable.

by Leon Rosenshein

(Work) Spaces

"Multitasking" is probably too crude a category. When I first heard of XP, I thought pair programming was the *second* stupidest idea I'd ever heard. The stupidest was everyone working in the same team room (*not* an "open office"). But…

   -- Brian Marick

That’s something that resonates with me. And also a big part of what I miss about going to the office. I’ve been doing this programming thing for a while now, and I’ve done it in a lot of different environments. Before I was getting paid for it, it was late at night, alone as a teenager in my bedroom, in the back of a high school classroom mostly ignoring the calculus teacher (Sorry Mr. Topper), or in some cold, noisy, basement computer lab with rows and rows of computers.

Once I started getting paid it was still the noisy computer room, but sometimes the seat was in an F-16 simulator (since we only had one monitor and that’s where it lived) or the control room for the simulator (after we got another monitor). In the late 80s and 90s it was single offices with doors. Some folks wanted offices with outside windows, others wanted no windows (even at Microsoft, some folks didn’t like windows). It was easy to isolate yourself and focus on what you were doing. That meant it was also easy to lose track of time, what others were doing, and how what you were doing fit into what everyone else was doing. So it was easy to convince yourself that being busy was productive and you were making lots of progress.

By the mid 2000’s that started to change. Lots of open offices. Or at least multi-person offices. I’ve worked in both. Some team rooms were full rooms, with doors and windows and everything. Others were more ad-hoc, using whiteboards and couches and plants and room dividers to approximate separation. And there were general open-plan offices, with people loosely grouped by team, with the only separation being a slightly wider walkway between rows of desks to give some appearance of grouping.

One goal was to increase collaboration and interaction. Get folks who worked together to sit together and they’d talk more. Share more. Collaborate more. The other, usually unstated, but very real, was to reduce the space per person. At Microsoft the offices were at least 100 sq ft, often 150 - 200 for leads and people who often had small meetings in their office. Sharing offices and bullpens could mean 50 sq ft or less of “personal” space.

It turns out that both the open plan and the individual office style are about the same. Whether physical (walls and doors) or virtual (noise cancelling headsets and social constructs), they both tend to isolate and reduce face to face interactions. There’s a little more talking, but we’re all aware of how common it is to Slack someone on the other side of a shared row, and while that’s effective for the communicators it’s isolated from the rest of the team.

Which gets us back to team rooms and what I miss most about being in an office. The team (or activity) space. The place set up to contain all of the people and information that is shared by a group of people working on something together. The ability to be loosely aware of a conversation and join in when it’s relevant, and when it’s not let it just seep into my unconscious awareness of things until I need to know.

The most productive, effective, and fun team I’ve been on was the team that delivered a shared viewing/editing platform for 2D, 3D, and streetside maps.  And it happened because we were in the same space. Somewhere between individual and mob programming. Discussing designs and implementations in real time. Changing things together. And right around the corner from our customers. So we had lots of chances to watch what they did and what their problems were. Opportunities to bring them into our space and try things together. Making changes in one area and having people working on related areas at least aware, if not involved in the decisions. Rapid iterations. Rapid releases. Rapid feedback.

There were lots of causes for that. The right physical space(s). The right motivations. The right incentives. The right tools and processes. The right people. In the same place at the same time working on the same thing.

I really miss that.

by Leon Rosenshein

Inversion

Goals are important. Knowing what you’re doing and why can help clarify things when you need to make a decision. Turning that around helps. Knowing what you’re not going to do is just as important.

Sometimes questions are like that too. Ask a question one way and it can be hard to answer. Invert the question and it can be a lot easier. Consider the following question:

Which one of the following does not have an integer cube root?

  1. 216
  2. 27
  3. 1331
  4. 700

The naive way is to calculate the cube roots and see. If you have a calculator that’s easy. Without one, not so much. On the other hand, it’s relatively easy to calculate the cube of a number. If you change the question to which numbers are perfect cubes you can quickly come up with this table:

1 -> 1
2 -> 8
3 -> 27
4 -> 64
5 -> 125
6 -> 216
7 -> 343
8 -> 512
9 -> 729
10 -> 1000
11 -> 1331

And see that 216, 27, and 1331 are perfect cubes, so 700 must not be. 
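The inverted question maps directly to code: instead of extracting cube roots (floating point, rounding headaches), build the table of cubes and check membership. A minimal sketch in Python (the function name is my own):

```python
def perfect_cubes_up_to(limit):
    """Build the set of perfect cubes <= limit, like the table above."""
    cubes = set()
    n = 1
    while n ** 3 <= limit:
        cubes.add(n ** 3)
        n += 1
    return cubes

candidates = [216, 27, 1331, 700]
table = perfect_cubes_up_to(max(candidates))
# The candidate missing from the table has no integer cube root.
print([c for c in candidates if c not in table])  # [700]
```

Cubing is cheap integer math; the inversion trades an awkward operation for an easy one, exactly as the table does.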

You can apply the same kind of question inversion to other things as well. Like debugging. When debugging, the first question is usually “Why did that break?”. Often, though, it’s helpful to go through the “How is this supposed to work?” cycle first. Especially if it’s an area new to you.

Maybe performance is your thing. In performance, you normally ask “How can we speed this up?”. But maybe what you really need to do is to keep things from slowing down. That’s a different question, and the answer might be very different.

Really, it’s about perspective. Having the right one at the right time. Because how you look at things will influence how you see them. And how you try to change them.

by Leon Rosenshein

It Happens

"That hardly ever happens is another way of saying 'it happens'."

  -- Douglas Crockford

When someone says “That hardly ever happens,” there are two ways to approach the situation. The first is to take it at face value and put just enough effort into the rare case to ensure the system continues to operate, even if that little part fails. Like adding timeouts and retries to network requests. Something might be delayed or a user might have to click a button again, but things still happen. And the happy path stays happy.
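A timeout-and-retry wrapper of the kind described might look like this sketch (the names, retry counts, and backoff values are illustrative assumptions, not anything from the post):

```python
import time

def with_retries(request, attempts=3, timeout=2.0, backoff=0.1):
    """Call request(timeout=...), retrying with exponential backoff so that
    rare transient failures stay on the happy path."""
    last_error = None
    for attempt in range(attempts):
        try:
            return request(timeout=timeout)
        except ConnectionError as err:  # in real code, catch your client's error type
            last_error = err
            time.sleep(backoff * (2 ** attempt))
    raise last_error  # "hardly ever happens" -- which means it happens

# A flaky call that fails twice, then succeeds.
calls = {"n": 0}
def flaky(timeout):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky, backoff=0.01))  # ok
```

Note that when the retries are exhausted the error still propagates; the wrapper makes failures rarer, not impossible.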

The second is to focus on those “rare” events. Put just enough effort into tracking the happy case so you know things worked and spend the rest of your time dealing with the outliers. Like silent failures when writing data to disk. Consider BatchAPI, which, like its predecessors, handles millions of tasks per week, sometimes per day. Even with 6 9’s of reliability, at those scales you’re going to have multiple tasks failing every day. And those are the ones that people care about. The ones people want to dig into and understand what went wrong.
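The arithmetic behind “multiple tasks failing every day” is easy to check. A quick sketch, assuming a round five million tasks per day (my number; the post only says “millions”):

```python
tasks_per_day = 5_000_000   # assumed scale: "millions of tasks ... sometimes per day"
failure_rate = 1e-6         # 6 9's of reliability leaves one failure in a million
expected_failures = tasks_per_day * failure_rate
print(expected_failures)    # roughly 5 failed tasks, every single day
```

Five nines, one more nine short, would mean dozens per day. At platform scale, rare multiplied by huge is routine.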

In cases like that, in platform code, “hardly ever happens” is the place where you need to focus. As much as scale is important, error handling is more important. Both internal and external. Internal, platform, errors get handled inside the system, and ideally never impact the user (other than potentially a delay in seeing the result). That needs to be rock solid. Redundant. Failsafe. Because, like those disk write errors, something will happen. It’s up to the platform to make sure there’s no impact to users. And at the very least, the platform should clearly take responsibility for the error when it fails.

On the other hand, when a user’s bit execution fails, then what? It’s a successful failure. All of the tooling and framework code did what it should, and correctly, but the user’s code failed. Now you need to help the user. What failed? Why did it fail? How can it be reproduced? Was it a transient problem and a retry will just work?

As a platform you need to think about these things and how to respond. Because when you get down to it, the platform is really just a giant error handling system. And regardless of how rarely it happens, it will happen.

by Leon Rosenshein

Slacking Off

Firefighters, especially professional ones, have buckets of slack time. According to one study firefighters should have less than 25% utilization (time responding to incidents) to avoid burnout. 75% slack time. Built into the fabric of the system. If they don’t have that much slack time they scale up. Cities build more firehouses, buy more engines, and hire more firefighters to get that slack time back.

That’s probably a reasonable goal for your on-call person. First of all, you want your on-call to be fresh if something happens. Second, slack time doesn’t mean idle time. It means there’s not something specific scheduled to be done. There’s always plenty of maintenance work to be done, so the on-call isn’t idle. Things like updating runbooks, alerts, and documentation, automating common on-call tasks, and digging into perennial trouble spots.

But what about the rest of the team? No slack time there. That’s the way to get the most done, right? Wrong. I don’t know about you, but I know the planning I’ve been involved with does a pretty good job of spec’ing out the known knowns, and identifying the known unknowns, but the unknown unknowns and the things we know that just ain’t so always come up.

Even with perfect planning and no unknowns you need some slack. Vacations. Injuries. Illnesses. Outages (other people’s). All of those things, and more, say that if you schedule 100% of the time you’re not going to complete your plan. That seems to be true even if you follow Hofstadter's Law.

One option is to not commit to anything. Just work on the most important thing at any given moment. Things take as long as they take, but you’re never late. That works really well at a small scale, when there’s only one person/group deciding what the most important thing is. And that thing doesn’t change often while you’re working on something. But when the most important thing changes a lot, and the cost of your context switch is high, that leads to lots of churn and peanut buttering your progress. And that ignores anyone potentially waiting for you to be done.

If there are others depending on your work being done by a certain time and you miss then they’re going to miss as well. And their dependencies will miss something. Small delays compound and you end up with long delays. So we do need to make commitments to dates and hit them. Especially when working in a deeply interconnected system (which we do).

Which brings us back to having enough slack time. Or conversely, only committing a certain (less than 100%) amount of your time to work against deadlines. It won’t guarantee you meet those commitments, but if you don’t have enough slack, I can guarantee you won’t meet them.

by Leon Rosenshein

Experience -> Expertise -> Wisdom

ex·pe·ri·ence

practical contact with and observation of facts or events.
"he had already learned his lesson by painful experience"

ex·per·tise

expert skill or knowledge in a particular field.
"technical expertise"

wis·dom

the quality of having experience, knowledge, and good judgment; the quality of being wise.
"listen to his words of wisdom"


That’s the typical progression, right? You have experience(s). You gain expertise. You turn that into wisdom. That’s certainly the progression we all want, but is it typical? I’ve talked about the difference between data and wisdom before, but there’s a similar progression that isn’t about raw numbers and understanding.

The 10,000 hour rule, popularized in Outliers, says that to excel at something you need to spend 10,000 hours doing it. We can argue about the number of hours it takes, but what’s really key to that is not the number of hours of experience, but the amount of experience in those hours.

In the US people work about 50 weeks/year, and the workday is nominally 40 hours long, so each year is ~2000 work hours. By that logic, it would take 5 years to be an expert in whatever it is you do.
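As a quick sanity check of that arithmetic:

```python
weeks_per_year = 50
hours_per_week = 40
hours_per_year = weeks_per_year * hours_per_week  # ~2000 work hours per year
years_to_expert = 10_000 / hours_per_year
print(years_to_expert)  # 5.0 -- five years, if every hour counted as new experience
```

The catch, of course, is in that last comment: the 10K hours only count if they're 10K different hours.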

Now consider a worker on an assembly line. Putting wheels on a car. After 5 years that person has 10K hours and is an expert at putting wheels on a car. And probably putting nuts on bolts in general. But not putting the muffler on, let alone building a car. Or designing a car. Or driving a car. Because that 10K hours is really the same hour 10K times. It’s critical to getting the car correctly built and out the door, but from an experience standpoint, there’s really not much there.

To be an expert on building cars would require not just 10K hours on one task, but experience with all of the tasks required. Welding the frame. Building sub-assemblies. Installing them. Electrical work, etc. To be an expert in automobile building you need not just hours of experience, but lots of different experiences. So after those 10K hours the worker has seen all of the common things that can go wrong, and many of the uncommon ones. They’ll have worked out ways of dealing with them and be able to work through them. That person would be considered an expert in the field of automobile building.

Wisdom though, includes good judgment. Not just experience or expertise. It requires the ability to learn from your experiences, then realize what you’ve learned might not apply in some cases, and then learn something else. Something more general. Wisdom goes beyond the what and how into the why. You have to understand why something is being done, and be able to generalize how a seemingly unrelated action will have an impact on the end result. Learning how to learn, unlearn, relearn, generalize, and extrapolate is a whole different set of muscles than just being an expert. You could say you need 10,000 hours of being an expert and working with new and different experiences in differing contexts to turn expertise into wisdom.

That applies to knowledge work just as much as it does putting wheels on a car or building cars from parts. First you need to learn the tools. Then you need to spend time doing the work. Not just the same 10 or 100 hours over and over again, but new and different hours. Exposing you to new and different situations and constraints. 10,000 different hours. Which can take longer than 5 years. And that just gets you to the expert level. There’s still plenty of room for growth. And wisdom doesn’t magically appear after 10K hours of being an expert. It just starts small and narrowly scoped. As you gain more experience with your expertise the scope broadens. And that never stops. Your scope isn’t limited to your day job or even the art of software engineering in general.

So as you journey along the path of experience -> expertise -> wisdom keep looking for opportunities for new experiences. To expand your expertise and wisdom. To expand your scope.

by Leon Rosenshein

Is That An Error?

“. . . the errors are errors now, but they weren’t errors then.”

    -- Marianne A. Paget, The Unity of Mistakes: A Phenomenological Interpretation of Medical Work.

I’ve talked about tech debt and agility before. It’s the choices you make to get value sooner by pushing work out to the future. And it’s absolutely a good idea. When done in moderation and at the right time. Just like taking on any other debt.

And I’ve also talked about decision making. How the process of making decisions and their quality is related to, but not the same as, the outcome of the decision. We should be making the best decisions we can with the information we have when we need to make the decision.

Communication is important. That means definitions are important. It’s important to label things correctly so that everyone has the same, shared understanding of the situation. Which brings me to my point.

Not everything that isn’t the way it should be now is tech debt. We learn new things all the time. When what we know changes we need to respond to it. That often adds work, but it’s not tech debt.

Consider this scenario. I’ve been at least tangentially involved with computer graphics for a long time now. Simulation, games, 3D mapping. When I started the limiting factor was geometry transforms. Doing all that floating point math to figure out which pixels the corners of a triangle mapped to on the screen took a long time. Especially compared to the simple integer math involved with filling the triangle once you had the corners. So we built our models with as few polygons as possible. Did aggressive culling by knowing that from one octant it was physically impossible to see some of the model, so don’t even try to draw that triangle. And z-buffering was expensive too, so we used the painter’s algorithm and spent a lot of time figuring out how to time drawing the model in the correct order, from relative back to front and just overwriting things.
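The painter’s algorithm is simple to sketch: sort triangles back-to-front and draw them in that order, letting nearer surfaces overwrite farther ones. A toy version (the triangle representation, vertex tuples with z as depth, is my own invention):

```python
def painters_order(triangles):
    """Sort triangles back-to-front by mean vertex depth (larger z = farther),
    so drawing them in order lets near surfaces overwrite far ones."""
    def mean_depth(tri):
        return sum(z for (_x, _y, z) in tri) / len(tri)
    return sorted(triangles, key=mean_depth, reverse=True)

near = [(0, 0, 1.0), (1, 0, 1.0), (0, 1, 1.0)]
far = [(0, 0, 9.0), (1, 0, 9.0), (0, 1, 9.0)]
ordered = painters_order([near, far])
print(ordered[0] is far)  # True -- the far triangle gets drawn first
```

The sort is cheap; the cost that mattered back then was all the per-pixel overdraw, which is exactly what made fill rate the new bottleneck once transforms moved into silicon.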

Then the world changed and we got transform engines in silicon. Chips designed to do that math fast. And suddenly we were fill rate limited. So all of the old models and the rendering engines were suboptimal. We needed to rebuild them. Some people called that tech debt. But it’s not. It’s new work. It’s an opportunity to add value based on new information and capabilities. In fact, not doing the work to take advantage of the new capabilities and releasing the next version without supporting them would be adding tech debt.

And building models and rendering engines that took the transform limit into account was the right thing to do. When the models and engines were built that was the way to get the most performance from the system. Doing anything else would have been a bad decision, because we had information showing us that more polygons and higher framerate added value to the customer. Doing anything else at the time would have been an error.

But needing to do the work to support the new graphics cards was neither tech debt nor caused by a bad decision. We needed to do it because the environment, the context changed. The work needed is the work needed. You still need to do it. But don’t put the wrong label on it just because the terms are handy. Be honest with yourself and each other. Saying someone made the wrong decision because the world changed and now there’s a better choice doesn’t help anyone.