For my friendgineers· Leon's musings on software development

November 12, 2021 by Leon Rosenshein

PR Comments

I’ve talked about what makes a good commit message. They’re important markers left for future developers to understand what changed and why as they look back to understand why things are the way they are. They’re also important in the moment to the people who are reviewing the code. They explain the purpose, expected changes and interactions, and potentially, what won’t change.

That’s important information when you’re reviewing a PR for a bunch of reasons. One of them is that it informs what you comment on. Any issues you see during the review should be noted, but that doesn’t mean they’re PR comments. Let’s say there’s a PR on some library function and you notice during the review that there are some uses of the method that don’t handle the returned error properly. You should definitely make a note, probably a ticket for the right group(s), but those fixes most likely shouldn't be part of the library change PR (unless the change is to how errors are reported). On the other hand, if you notice that some of the new code is a lot like other code in the library, that should be a comment on the PR.

Which leads to another way that the commit message helps inform how you make the comment. Generally speaking, PR comments come in 2 flavors, blocking and non-blocking. Blocking comments are those that the reviewer feels need to be fixed AND re-reviewed before it’s OK to land the change. Pretty straightforward.

It’s the non-blocking ones that can be more interesting. They can range from inquisitive/informational to “this needs to be fixed but I trust the author to get it right”. The challenge then is to make sure the author of the PR (one of the readers of the comment) understands the commenter’s intent. The easiest way to do that is to be explicit. Identify what kind of comment it is and what the expected response (if any) is.

In a larger PR, or where reviewers with different focuses are looking, it might be helpful to add some identifiers. Maybe the comment is security related, or is particularly important for a specific use case. For example, a comment related to calling the method in a CI process vs a production service or a CLI app. Noting the difference will help bring awareness to the use case and possibly uncover additional issues.

Another important thing to keep in mind is that although comments are attached to a specific line of code, they might apply to a block/loop/method/library/etc. That means, particularly for the larger scoped comments, a one line subject can provide some context and let the reader know what’s coming. And if you need to provide context, you’ll want to provide the detail.

Which gives us a nice template to use for PR comments. Something like:

<label> [optional identifiers]: <subject> <details>

In practice, this might look something like

nit: Misspelled word in error message, “thier” should be “their”

Which would be a non-blocking change that is expected to be fixed before land, but doesn’t need re-review. Or maybe

Suggestion: This logic is roughly duplicated in methods XXX, YYY, and ZZZ. A future PR should be considered to refactor the logic into one place

Indicating a future task to decide if there should be a refactoring, and then in that case a new PR should be made to refactor. The existing PR should NOT be delayed/extended to include the refactoring.
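To make the template concrete, here is a minimal sketch of a helper that assembles a comment in that shape. The function name, the blank line before the details, and the example identifiers are assumptions for illustration, not part of any review tool.

```python
def format_review_comment(label, subject, details="", identifiers=()):
    """Build '<label> [optional identifiers]: <subject>' plus optional details.

    Hypothetical helper: the layout choices here are one reading of the template.
    """
    # Identifiers (e.g. "non-blocking", "security") are optional.
    tags = f" [{', '.join(identifiers)}]" if identifiers else ""
    first_line = f"{label}{tags}: {subject}"
    # Put the details, when present, in their own paragraph after the subject.
    return f"{first_line}\n\n{details}" if details else first_line


print(format_review_comment("nit", "Misspelled word in error message"))
print(format_review_comment(
    "suggestion",
    "This logic is roughly duplicated elsewhere",
    details="Consider a follow-up PR to refactor it into one place.",
    identifiers=["non-blocking"],
))
```

The point is less the helper than the habit: the label and identifiers say what kind of comment it is, and the subject tells the reader what is coming before the details arrive.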

Check out Conventional Comments for more examples and suggested labels/identifiers.

November 10, 2021 by Leon Rosenshein

Plans and Planning

planning von moltke eisenhower

We’re in the thick of planning season again, and at times like this it's helpful to remember something else Eisenhower said back in 1950, “Plans are worthless, but planning is essential.” That sounds paradoxical on the surface, but it’s a generalization of a line from von Moltke the Elder’s On Strategy, “no plan of operations extends with any certainty beyond the first contact with the main hostile force.” And given Eisenhower’s background, that’s not a surprising source.

Thinking about it from that context, it’s not that paradoxical. It’s not that plans, in and of themselves, are worthless. It’s that plans make all sorts of assumptions about how the thing you’re making plans about is going to respond to the plan. And like any other assumption, reality is often very different.

So we know that long term plans are going to be problematic. But that doesn’t mean that planning (or even building the plan) is wasted. To build a good plan you need a deep understanding of the situation. You need to understand the drivers, the constraints, and the goals. Not just what they are, but the relative priorities inside and across those groups. Which means that the output of the planning process is not just the plan itself, but also a shared understanding of the problem and solution space.

It’s that shared understanding that really provides the long term value. You start with the plan and the first step in the plan. Then, as reality diverges more and more from the plan, the entire team can work together, with that shared understanding of the solution space, towards a solution to the problem the team set out to solve.

Not necessarily the solution as envisioned in the original plan. Reality has provided direct and imperative feedback on the situation. The team needs to take that into consideration and update the plan, and the end goal, accordingly.

Because plans are worthless, but planning is essential to solving the customer problem you set out to solve.

November 8, 2021 by Leon Rosenshein

Cloth or Die

tools collections code quality it depends

You’ve probably heard of Adam Savage, of Mythbusters and tested.com fame. Lots of fun stuff there. And lots to learn as well. Every once in a while he’ll need to cut something. Depending on what it is, he’ll reach for a different tool. Plasma cutter for steel plate. Pipe cutter for pipe. Scissors for paper, cloth, aluminum and sheet metal.

Unsurprisingly, he doesn’t use the same scissors for all of them. Aluminum and sheet metal use one set, and they’re a lot beefier and have a much longer moment arm. One thing that might be surprising is that he doesn’t use the same scissors for cloth and paper. In fact, there are lots of kinds of scissors. Sure, they’re all hand operated and have moving blades, but they’re pretty different and some have specific uses. And apparently one of the most damaging things you can do with a pair of scissors is cut paper. As he put it in his 20 minute video on scissors, on his cloth scissors he wrote “cloth or die” to make sure no one gets it wrong.

The same holds true in software. Consider the Collection. There are lots of different collection types, but in general a collection is a bag of stuff. With enough extra code you can use almost any collection type to provide almost any functionality. But that doesn’t mean you should. What kind of collection should you use? It Depends. It depends on what you want/need to do with the collection. Need really fast access to a statically indexed/ordered collection? Probably want an array. Or maybe a hash/map. Need to guarantee that there’s only one of each value? Try a set. Are there more reads or writes? Does it need to be sorted somehow? Do you need to maintain the order things were added? Are there memory constraints? All these things go into the decision making process.
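As a rough illustration of matching the collection to the use, here is a small sketch using Python's built-in types. The variable names and data are made up for the example, and other languages will offer different trade-offs.

```python
from collections import deque

# Fast positional access to an ordered collection: a list (array-like).
build_steps = ["checkout", "compile", "test"]
second_step = build_steps[1]

# Lookup by key, one value per key: a dict (hash/map).
# Python dicts also keep the order in which keys were first added.
owner_by_service = {"billing": "team-a", "search": "team-b"}
owner_by_service["billing"] = "team-c"  # replaces the value, no duplicate key

# Each value stored at most once: a set.
seen_users = set()
for user in ["ana", "bo", "ana"]:
    seen_users.add(user)  # the second "ana" changes nothing

# Frequent adds/removes at both ends: a deque.
pending = deque(["job-1", "job-2"])
pending.appendleft("job-0")
first_job = pending.popleft()

print(second_step, owner_by_service, sorted(seen_users), first_job, list(pending))
```

You could force a list to do all four jobs with enough extra code, but each of these types already carries the guarantee you would otherwise have to write and maintain yourself.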

So next time you need to maintain a collection of things make sure you know how you’re going to use it. And make sure you don’t earn the Wrong Scissors (de)merit badge.

November 5, 2021 by Leon Rosenshein

Continuous Education

learning tools

One of the things that I think is important for developers is continuous learning. There’s always something. From new fields to new techniques in an old field to new tools to really understanding the toolsets that are already in use.

Consider MS Word (or Google Docs). It took me a surprisingly long time to really internalize how to handle simple layout in those tools. Using page and section breaks instead of just hitting enter a bunch. Or using “Repeat as header row” for big tables. It takes a little longer at first, but until you really start doing things that way you end up spending a lot of time manually fixing the format of things. And still end up with weird gaps in your docs where you used space to format and then something else changed and screwed it up.

The same is true for most (all?) of the tools we regularly use. Shell aliases. Shell pipelines. Git power tools. Rectangular operations in VSCode/JetBrains/<Editor of choice>. There are the things we do all the time, the things we do occasionally and look up the details for, and things we have workarounds for, because we never figured out a better way.

One of the reasons I like to work through problems with others, beyond the immediate benefit of a fresh viewpoint and different experiences, is learning more about the tools we share, but use in different ways. Next time you’re pair/ensemble working with someone and you see them do something in an interesting way, stop and get more details. It’s a great way to learn something new, and if you’re in a group setting, you probably won’t be the only one.

Another great resource is MIT’s Missing Semester of your CS Education. A set of sessions that talk about using the tools of the trade. Not the algorithms, data structures, compiler design, or architecture patterns. Instead, how to get the most out of your tools. Automation. Validation. Consistency. How to remove friction from your daily life. Because very often the best way to be faster overall is not by increasing top speed, but by removing drag when you’re going slowest.

November 3, 2021 by Leon Rosenshein

Like A Baby

life learning

Slept like a baby

Taking candy from a baby

Taking baby steps

What do those phrases mean to you? The generally accepted meaning is something like

Slept like a baby -> Slept soundly and deeply

Taking candy from a baby -> Easy

Taking baby steps -> small tentative steps

I’ve got 4 kids, and let me tell you, in my experience, reality doesn’t match that. Babies may fall asleep at random times, but they can fight it if there’s something interesting to them, and when they do fall asleep they’ve all got unique hair triggers that will wake them up and keep them from going back to sleep.

Have you ever tried to take something from a baby or toddler that wants what it has? It’s not easy. If they hold it it’s theirs. If they can see it, it’s theirs. If they once saw it and want it now, it’s theirs. When my oldest was a baby she got a hold of her hair and wanted to taste it. But it wasn’t long enough, so she pulled on it. It hurt so she grabbed tighter and pulled harder. You can see where this is going. We had to pry her tiny little fingers out of her hair to get her to calm down. Definitely not easy to take something they want.

Baby steps. When a baby goes from crawling to cruising the furniture to walking they do take small steps. But that’s because their legs are so short. They can’t take bigger steps. But I wouldn’t call them tentative. All walking is falling with style, catching yourself before it’s too late. But babies don’t know that they can catch themselves. They throw it all out there and hope. Eventually they figure out how to catch themselves. That’s not tentative. That’s confident and assertive.

By now you’re asking what this has to do with software development. First, easy things often aren’t. After all we’ve got compilers, so anything is possible, but that doesn’t mean it’s easy or that it has no other impacts. Ownership and boundaries are real obstacles. And just like my daughter and her hair, we’re dealing with systems with inherent feedback loops, and sometimes those loops can make things worse. Putting a load balancer in front of multiple instances of a service makes things more scalable and resilient, until you hit some point and a slight delay in one instance causes the entire system to become unstable and fall over.

Second, doing things in small steps. Doing things in small steps makes sense. Do the smallest thing that you expect to add value. You don’t know if it will since you haven’t done it. But do it completely. Do the whole thing. If refactoring is needed, do it. If data migration is needed, do it. Take the step. Then catch yourself. Reevaluate and do it again.

That’s how software development is like talking about babies. What about slept like a baby you might be asking. That’s just there because it’s wrong, just like the other two, and I think lists should have at least three items, or they’re not lists.

November 1, 2021 by Leon Rosenshein

Realms

architecture domains context

“Bounded Contexts” sounds stuffy and arcane, the sort of term whiteboard warriors would cook up so henceforth I’ll refer to them as “Realms” because Realms have boundaries and Context is King.

-- Dan Bara

As I’ve mentioned, I think context is important. And I like bounded contexts because boundaries are important too. They help in knowing what you need to worry about and what you don’t.

But as Dan points out, the term bounded contexts is pretty abstract. It’s not a term used in the physical world. No one points to a patch of ground and says “That’s a bounded context”. Also, “bounded context” sounds static to me. It makes it seem like the bounds are fixed, unchanging.

I think that does a real disservice to people trying to learn about them. Because there’s a lot of tension on the boundaries. Is something inside or outside? Should it be inside or outside? If it were up to the thing itself, would it want to be inside or outside?

Which is where Realms comes in. Realms have clear borders. Often there's some kind of process/ceremony when you change realms. Sometimes there's even a cost. Meanwhile, the realm is held together by a shared something. A context if you will, that all inhabitants of the realm agree on. They might not all like it, but while inside the realm they use that context. Sometimes a group leaves. Realms get split apart. Sometimes realms join, like the cities of Buda and Pest. Sometimes there’s struggle, and sometimes it’s easy, like Pheasant Island. The key though, is dynamism and tension.

The same holds true with software and bounded contexts. In your typical e-commerce system you’ve got customers, orders, inventory, and more. Which domain is pricing in? Is it part of the inventory? What about sales and specials? Coupons and discount codes? Preferred pricing? Are they all their own domains? Lots of options. All are correct in some ways, but incorrect in others. So you pick one. Then you pay attention to the tension. And when necessary, the boundaries shift. Just like in the physical world.

I’m still going to call them bounded contexts, because in the ubiquitous language of software development the term has shared meaning, but in the back of my head I’ll keep the idea of realms and their tension in mind.

October 21, 2021 by Leon Rosenshein

More Comments

documentation code quality code for the maintainer comments

Code should be self documenting. That sounds good, but what does it mean? It’s about making your code legible. Names of things (variables, methods, classes, packages, libraries, executables, etc.) should mean something. What the code is doing should be obvious from reading. Encapsulation and decomposition help a lot here.

Writing the code itself is a conversation between you and the compiler/interpreter. It has to be very precise in what it does. And since it defines what happens, it is the ultimate source of truth for how things will be handled. But it’s not the source of truth for everything that was in your head during the conversation.

That’s where comments come in. They’re a conversation with the next developer that provides context for the maintainer. Even (especially?) if it’s you. Things like why the code was written this way. The external constraints that had to be met. The choices not taken. Things that work fine now, but will be a problem later when scale changes.

And since they’re a conversation with another person they don’t have the limitations of whatever language you’re writing in. They can be about more than the why. You can talk about approximations. You can talk about generalities. You can talk about how this piece is expected to fit into the bigger picture without breaking your encapsulation. You can have simple artwork like flowcharts or truth tables. You can even have links to entire documents that provide even more context.
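Here is a small sketch of what that kind of comment can look like. The function, the upstream service behavior, and the numbers are all hypothetical; the point is that the comments carry the why, the choice not taken, and the future problem, none of which the code itself can express.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay_seconds=0.5):
    """Call fetch() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    # Why retry at all: the (hypothetical) upstream service tends to drop the
    # first request after a deploy, so a single failure is usually not real.
    #
    # Choice not taken: exponential backoff. With only a few attempts the
    # extra complexity buys little; revisit if the attempt count grows.
    #
    # Fine now, a problem at scale: many callers sleeping for the same fixed
    # delay will retry in lockstep. Add jitter before that matters.
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise last_error
```

Delete those comments and the code still runs exactly the same, but the next maintainer has to rediscover every one of those decisions.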

So don’t let anyone tell you that your code is fully self documenting and there’s no need to add comments. Your code isn’t, and there is a need.

Just don’t add comments like this

// Increment i
i++;

October 20, 2021 by Leon Rosenshein

The Tyranny of Or

context decisions it depends tyranny of or

“The test of a first-rate intelligence is the ability to hold two opposed ideas in the mind at the same time, and still retain the ability to function.

One should, for example, be able to see that things are hopeless and yet be determined to make them otherwise.”

― F. Scott Fitzgerald, The Crack-Up

A or B? Pick one. Many problems are answered that way. And maybe they’ve even been posed that way. But is A or B really a binary choice? Sometimes it is. But often it’s not.

Think about code reviews vs pair/ensemble programming. Code reviews are imperative and the only way to ensure quality code. Pair/Ensemble programming is critical and the only way to ensure quality code. Code reviews (or PR reviews) were instituted to solve a number of problems. Knowledge transfer. Bug detection/prevention. Adherence to the style guide. Getting a different perspective. What about pair programming? Shared knowledge. Bug detection/prevention. Shared style. Team cohesion.

Both methods are pretty good at achieving their goals. And those are pretty similar. On the other hand, code reviews can slow things down and knowledge transfer isn’t perfect. Pair (and especially ensemble) programming can miss parallelization of clearly separable work and you lose the benefit of a different perspective. So you have to choose one or the other. Right?

Maybe. You could do both as well. That gets you all the benefits. But it also has all the downsides. Maybe there’s a better approach. A hybrid approach that avoids the tyranny of or.

Defense in depth. Code in small groups. Talk a lot. Share approaches and changes as you develop. Automate as much as you can. Adherence to style guides. Lint for common structural issues. CI and automated tests, both unit and integration, so you know you haven’t had an unexpected impact on downstream customers/consumers. Selective code review from interested/relevant downstream partners and people more familiar with the ecosystem in general and environment, when appropriate. Get the benefits of both, and minimize the downsides.

Which is not to say that binary decisions are bad and that we should never make them. There are true binary choices. Especially when you look at other constraints. But just because something is presented as a binary choice does not mean you have to make one. Take the time to make a good decision in context, because, like all good decisions, it depends.

October 19, 2021 by Leon Rosenshein

Legibility

architecture code quality

Definition of legible

1: capable of being read or deciphered
legible handwriting

2: capable of being discovered or understood
murder sweltered in his heart and was legible upon his face

-- Merriam Webster

The first one you know. UI/UX/Design stuff. Being easy to read. But the impact, positive and negative, of making things legible, especially the second definition, runs way deeper than choice of font size and foreground/background color.

Code can be readable and completely illegible. Green text on a black background with a monospace font that makes it easy to distinguish between 1 (the number one), I (the capital letter `eye`), and l (the lowercase letter `ell`) will make your code readable. But it doesn’t do much to help with discovery or understandability.

At the simplest, legibility in code comes from clean code. Separation of concerns. SOLID. KISS. DRY. All those acronyms. If you do those things reasonably well your code will be reasonably legible. At least at the tactical level.

But having truly legible code goes way beyond that. It’s about applying the same principles you would apply to a module/library to an entire system. It’s about your abstractions and data models and APIs. It’s about making sure that the system is understandable/discoverable at both the large and small scales, and that it’s easy to transition between the levels as needed.

One thing that’s important to keep in mind while making things legible is that your model(s) of the system need to truly match reality, not just how you want reality to be. Take a complex system, make some simplifying assumptions, idealize things, and make it happen. When you do that it often feels correct, because you have control over what you’re doing. It’s predictable, understandable, and subtly wrong. But you won’t know it at first. It will mostly work. Until you hit that edge case.

So you patch around it. Until the next edge case. Rinse and repeat. Pretty soon your simple, elegant, legible system is none of those. So you come up with a new model and try again. And that cycle repeats.

Unless your models acknowledge that things aren’t that simple. That they allow for unexpected interactions. And that’s hard. Especially in large systems.

October 15, 2021 by Leon Rosenshein

Prioritization vs. Categorization

planning

MoSCoW. The method, not the capital of Russia (or any other city) or the mule.

Must: The system must meet these requirements or is considered a failure
o
Should: The system should meet these requirements, but if it doesn't we can do it later
Could: The system could meet these requirements. No one will object, unless there are must and should requirements that are unmet
o
Won't: The system won't do this. It will make the system worse and/or any time spent on these things is completely wasted. Don't do them.

Seems pretty straightforward. The differences are clear. Do them in that order. You don't need any more information so get to work.

Not so fast. There are at least a couple of problems here. First, those are just labels. Labels on buckets of similarly important things. There's no sequencing provided inside a bucket. What happens if there are more items in the must bucket than there are teams to work on them? Even if there's enough time to serialize them, you don't know which one should be done first. So it's really categorization.

If there's only one team, and the requirements are all completely orthogonal, sequencing doesn't matter. Of course, in all the time I've been doing this I've never worked on a project like that. And I don't know anyone who has. It's probably happened somewhere, but it's rare enough to not worry about right now. Which means sequencing is important.

Second, while those are words, not numbers, there's really no difference between Must and Priority 1 (or 0, or -1). It's just the group with the highest importance. And they both suffer from the same kind of inflation. Every group/team/stakeholder thinks their problem/requirement is the most important. Or if not critical overall, critical to them, so they label it must. Because we all know that the shoulds almost never happen and they are there for amusement only.

Which is not to say that categorization is unimportant. It's not. It's critically important. But it's not enough. You have to go beyond the categorization and really prioritize. You need an ordered list of what's the most important, balancing urgency and short and long term gain. You need to keep that list current. And most importantly, you need to follow it. Even (especially?) when a single stakeholder starts arguing loudly for their favorite thing.
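The difference between the two can be sketched in a few lines. The requirement names and rank numbers below are invented for illustration; the only claim is that buckets leave the order inside a bucket undefined, while a single ranked list does not.

```python
# (name, MoSCoW category, rank in the single agreed ordering; 1 = do first)
# All entries are hypothetical examples.
requirements = [
    ("export to CSV", "should", 3),
    ("login", "must", 1),
    ("dark mode", "could", 4),
    ("audit log", "must", 2),
]

# Categorization: group into buckets. Nothing here says which "must" comes
# first; the order inside each bucket is just the order items were entered.
buckets = {}
for name, category, _rank in requirements:
    buckets.setdefault(category, []).append(name)

# Prioritization: one ordered list that answers "what do we do next?"
work_order = [name for name, _category, _rank in sorted(requirements, key=lambda r: r[2])]

print(buckets)
print(work_order)
```

Producing the rank column is the hard part, since that is where the stakeholder arguments happen, but once it exists the question of what to work on next has exactly one answer.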
