Recent Posts (page 4 / 71)

by Leon Rosenshein

Emergency Procedures

The other day I ran into a quote on the internet about the problem with emergency procedures. I generally agree with it. The quote went like this:

If you wouldn’t use your emergency process to deliver normal changes because it’s too risky, why the hell would you use it in an emergency?

But, as always, It Depends. It’s about the risk/reward ratio. You want the ratio to be low. If the system is down, the risk of breaking the system is low. If it’s down, you can’t crash it.

In general, you want one, and only one, deployment process. You want it to be easy, automated, idempotent, and recoverable. You want it to be well exercised, well documented, well tested, and fail safe. And in almost all cases, you should use it. All of the checks, validations, and audit trails are there for a reason (see Chesterton’s Fence).

The main goal of any deployment process is to make sure the user experience is not degraded. Or at least only temporarily degraded by a very small, broadly agreed upon amount. That means making sure that nothing happens by mistake and without an appropriate amount of validation. There can be tests (unit, integration, or system) that need to pass. There can be configuration validators. There can be business, legal, and communications sign-off. All in service of making sure no one has a bad interaction. Actually deploying the new thing is often just a tiny part of the whole process. There’s a high risk of something going wrong, so you need to be careful to keep the overall risk/reward ratio down.

In an emergency situation though, the constraints are different. If the system is down, you can’t crash it. You can’t reduce the throughput rate. You can’t make the experience worse for users1. The risk of making things worse is low, so the risk/reward ratio is biased lower.

In fact, many of the things you normally do to make sure you don’t have an outage are unneeded. You don’t need to keep in-flight operations going (because there are no in-flight operations). Instead, you can skip the step of your process that drains running instances. You don’t need to do a phased update to maintain your throughput. When nothing is happening, getting anything running is a step forward. Because nothing is running, you don’t need to do a phased roll-out to check for performance deltas or emergent behavior or edge cases. After all, things can’t get much worse. Those are just a few of the things you don’t have to worry about when you’re trying to mitigate an outage.
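To make that concrete, here’s a minimal sketch of the idea in Go, with made-up step names (this isn’t any particular deploy tool): one pipeline, where the steps that only exist to protect live traffic get skipped when there’s no live traffic to protect.

```go
package main

import "fmt"

// Step is an illustrative pipeline step, not a real deploy tool's schema.
type Step struct {
	Name          string
	OnlyIfServing bool // only matters when the system is actually serving traffic
}

var pipeline = []Step{
	{Name: "validate config", OnlyIfServing: false},
	{Name: "run smoke tests", OnlyIfServing: false},
	{Name: "drain in-flight requests", OnlyIfServing: true},
	{Name: "phased roll-out with canary checks", OnlyIfServing: true},
	{Name: "deploy everywhere", OnlyIfServing: false},
}

// deploy runs the same pipeline either way; the emergency path just skips the
// steps whose only purpose is to protect traffic that isn't flowing.
func deploy(systemIsServing bool) {
	for _, s := range pipeline {
		if s.OnlyIfServing && !systemIsServing {
			fmt.Println("skipping:", s.Name)
			continue
		}
		fmt.Println("running:", s.Name)
	}
}

func main() {
	deploy(false) // outage: the checks that protect live traffic buy you nothing
}
```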

Magic wand behind glass labeled 'In case of emergency, break glass'

Or to put it more simply, outage recovery is a different operation than system upgrade. When dealing with an outage, the first step is to mitigate the problem. When doing an upgrade, the most important goal is to have customers/users only see the good changes. There should be no negative changes to any user. Many steps can (and should) be shared between the two processes. But the goals are different, so the process is going to be different.


  1. Ok, there are things you can do to make it worse. Like lose data. Or expose personal data. But generally speaking, if your system is down, you can’t make the user experience worse. ↩︎

by Leon Rosenshein

Careers are Non-Linear

Hiring has been on my mind lately. I’ve been looking for an entry level developer. Someone just starting out in their career. I’ve described the arc of my career before. In fact, I came up with what I think is a pretty novel (and useful) way to describe the arc of a career. It’s also good for helping you visualize where you are at any given point compared to your company’s (or more specifically your manager’s) expectations. Anything that helps you do that gap analysis with your manager and guide the discussion on how you’re going to close those gaps is good for your career.

One important thing to keep in mind, though, is that while we think of careers as always being “up and to the right”1, that’s not really the case. In fact, careers, as experienced by the person living them, are very non-linear. The slope changes. It can even be negative, particularly in one aspect or another, even when the overall arc of the career trends towards more scope of influence.

Every career change, whether it’s role on a team, changing teams, promotion to a new level, or changing companies, changes your context. Everything you learned about where you were is still true, in that context. And much of it is still true in your new context. But not all of it.

That’s why your career is non-linear. What you’ve done got you to where you are. It was the right thing at the right time, in the right place. And while I don’t believe that the Peter Principle is generally true, when you start that new role, you definitely know less about it than you did about the role you just left. It’s not that you get promoted until you can’t do the job. You get promoted until you can’t learn/grow enough to do the next job. And you didn’t get the role change you just made because you can’t do the new job. You got it because they think you can.

Think about it. If you really were ready for that next role (and people knew it and it was available), you would have gotten it. Since you didn’t, you’ve gone from being at the top of your old role, one of the best around, to being OK at your new role. Not bad, but nowhere near ready for promotion. So compared to other folks in the new role, you’re closer to the bottom than you are to the top. And that can feel like a move backward2.

All of that is when you’re staying in the same basic job/role. Moving from IC to Manager has all of those issues, and a whole set of its own. The same applies when transitioning between Product/Project/Program Management and Development, or really any other “discipline” (using the term very loosely here).

The important thing to remember is that all of these steps are advances in your career. Even if they don’t feel like it to you at the time. Even (especially?) if they feel like more work. When you’re challenged and succeed, you grow.

As has been said, “If you rest, you rust”.


  1. Progress being up and to the right is a metaphor. Knowing how we use metaphors, where they come from, and how they can subtly influence things even when you’re not trying, is an important topic, but for another day. Meanwhile, consider Metaphors We Live By and Darmok as tokens of the importance of metaphor in our lives. ↩︎

  2. Back in the day at Microsoft, the Principal band was levels 65-67. A three level span doesn’t seem like much, but in fact it was huge. L64, senior engineer, was considered a terminal level. If you reached level 64 there was no longer any expectation that you would get promoted or eventually be asked to leave. L59-L63 was considered up or out, the only difference was the time. Moving from L64 to L65 (Senior to Principal) was a big deal. It was an inflection point in your career. From L65 on, even if you had no direct reports, you were expected to show results through others. You still had to do your work, but the big expectation was around how you impacted others. That’s fine and makes sense. The problem was that back then everyone in the Principal band was compared to everyone else in the same band. And newly promoted folks at L65 were being compared to folks at L67 who were being considered for promotion. L68 was Vice President. So in your first review cycle as a new L65, you were suddenly compared to someone about to become a Vice President. Unsurprisingly, L65s didn’t come out well in that comparison. It certainly felt like a step backwards. Talk about imposter syndrome. ↩︎

by Leon Rosenshein

People Over Process

As seen on the internet

People over process.

Why?

Because systems can’t fix problems with people, but people can fix problems with systems.

People Over Process is shorthand for the Agile Manifesto’s first value, “Individuals and interactions over processes and tools.” There’s a lot to unpack there. It starts by acknowledging that software development is a socio-technical endeavor. There are people (that’s the socio part). But there are also tools and rules and processes, which makes it technical.

First, and foremost, it’s over, not or. It’s not a Boolean choice. You get to have some of each. If you choose to focus only on the people, making them safe and happy, you can’t organize. You can’t even self-organize. Because without some norms, some process, you can’t communicate. And if you can’t communicate, you can’t coordinate. Not because no one cares, and not because no one wants to listen, but because anarchy is the opposite of coordination. Even the most committed libertarian knows that there needs to be some structure. Or you end up with the tragedy of the commons.

And if you choose process only, the first time something happens that your process doesn’t cover then you get stuck. Unless/until you can come up with a new process. Which takes a while, because there’s a well-developed process for changing the process. You did remember to add that to your set of processes, right?

So don’t ever let yourself be tricked into turning an analog choice into a Boolean one. Trust me, it won’t end well.

Second, it’s not quite accurate. Systems, particularly feedback systems, can fix, or at least minimize, problems with people. Processes, just like those warning signs on ladders, are there for a reason. They’re there because at some point in the past not just one person, but enough people did things in a way they thought was right, but was actually dangerous, and got hurt or killed. Processes are institutional scar tissue. Something bad happened, and the process is there to make sure it never happens again. The process is there for a reason, and that reason is so that the system can heal from a person’s mistake.

The trick is to have the right balance between the two. The agile manifesto says people over process, so at least 51%/49%, and less than 100%/0%, but that’s a pretty big range. Where you land in that range depends on lots of things. The context. The people. The familiarity of the people with the context. And some trial and error, because you’re unlikely to get the balance right the first time.

And there you have it. People over process. Because people can fix issues in your system. And your system needs to have processes to protect it from the people.

by Leon Rosenshein

The Power Of Examples

I’ve subscribed to Kent Beck’s Tidy First substack, and there’s lots of useful info there. He just posted a piece on Why TDD doesn’t Lead to Dumb Code. As usual, it’s a really good entry.

But what really stood out to me in that post was not what he was saying, but how he was saying it. In particular, his use of an example. Beck is trying to answer why TDD doesn’t lead to overly specific code. The task at hand is to use TDD to write a function called factorial. As a software developer, figuring out the factorial of a number is something I’m very familiar with. So the amount of cognitive overhead to understand the problem space was approximately zero. That left all of my bandwidth free to understand the message about TDD and generalization that he was really trying to get across.
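If you haven’t read the post, the shape of the exercise looks something like this (a Go-flavored sketch of the idea, not Beck’s actual code): the first test could pass with a hard-coded return 1, and the second test is what forces the general solution.

```go
// factorial_test.go — implementation and tests kept in one file just to make
// the sketch self-contained.
package factorial

import "testing"

// Factorial is where the second test pushes you: a general loop instead of a
// hard-coded answer.
func Factorial(n int) int {
	result := 1
	for i := 2; i <= n; i++ {
		result *= i
	}
	return result
}

func TestFactorialOfZeroIsOne(t *testing.T) {
	if got := Factorial(0); got != 1 {
		t.Errorf("Factorial(0) = %d, want 1", got)
	}
}

func TestFactorialOfFiveIsOneHundredTwenty(t *testing.T) {
	if got := Factorial(5); got != 120 {
		t.Errorf("Factorial(5) = %d, want 120", got)
	}
}
```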

That’s the beauty of good examples. They help the reader/listener understand the problem and the solution. And good examples don’t burden them with additional things they need to learn before they can get to the information you’re trying to impart.

One good way to do that is by knowing your audience and understanding their context. It’s great if you share the same context, but the key is speaking in their context. As the presenter of information, it’s on you to find the right example for the group you’re speaking to.

That’s why a lot of software stories use car analogies. Cars are ubiquitous. The classic agile incremental build image

works so well because you don’t need to think very hard to understand that if you value easier transportation, you get nothing until the end in the top half, while there’s value added at each step of the bottom half. That’s a great example.

Besides tailoring your example to your audience, Hillel Wayne goes a step further and talks about the difference between instructive and persuasive examples. More importantly, he notes that while an example might be good at one or the other, you still need to use the right example depending on what you’re trying to do. A good instructive example is often not persuasive, and an example that’s very persuasive might not be good at teaching something. Like everything else about software, and engineering in general, It Depends.

All of this is just to say that good examples are hard to find. And they’re also very important. And worth the effort to find.

Because if you do, you’re much more likely to get your point across. Which is your goal in any communication. Hopefully my examples here have helped me get mine across.

by Leon Rosenshein

Biases: The Tyranny Of Or

I’ve mentioned the Tyranny of Or many times. I talked directly about it four years ago, and I stand by what I said. However, because OR is such a loaded word, there’s more to say. And I get to use the Agile Manifesto, and how it’s often mistakenly applied, as an example.

To recap, the Tyranny of Or is any time you are forced into, or have convinced yourself that you are in, a situation where you need to make a choice between two options. For example, you’re standing at the checkout counter and there are Snickers bars and Milky Way bars. They cost the same, and you only have enough cash for one of them. So you force yourself to choose. Snickers OR Milky Way. You must choose one. There is no other option.

The thing is, there are other options. It’s often just a bias that we use to fool ourselves. To avoid having to spend the mental energy to make a more thoughtful, nuanced, decision. Yes, sometimes the choice truly is “or”, but very often it’s not.

Consider the candy choice. There are also Kit Kat, Almond Joy, and M&M’s (both plain and peanut) options. You could buy one of them instead. Or you could buy nothing. You might only be able to buy one thing, but you weren’t actually limited to the original OR. There might even be a smaller version of each, where you could buy both. Maybe you can turn the or into an and. Here’s another option. Put something else back and get both full-size candy bars. You only think you must choose one. It’s a false dichotomy.

Then there are things that we simplify into an OR. Consider the principles of the Agile Manifesto.

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

The people who came up with that list were very careful about how they worded it. They didn’t say Left good. Right bad. They didn’t say instead of. They very clearly said

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

They value the things on the left over the things on the right. They didn’t say working software instead of documentation. They didn’t say responding to change instead of following a plan. You can do both. Make things work, and document the why’s and the why not’s. Respond to change, but have a sea chart with a clear goal. You might not know how you’re going to get there, but you do need to know where you’re going. Even if the goal might change over time.

When you’re faced with a value choice, it’s very easy to simplify it into an or. Doing only A or B is easier to explain, discuss, and reason about than 70% of A and 30% of B. Is that the right percentage? Are you actually hitting that exact percentage? How can you make sure a group agrees on those things? That’s way more cognitive load than “We’re doing A and only A”.

Software engineering is Engineering. The art of the possible. The compromise. Because It Depends. It’s always context dependent and nuanced.

So next time you think you’re forced to make a choice between A and B, check your assumptions and check your biases. It might really be a choice between A and B. But it might not be.

by Leon Rosenshein

Simple or Easy?

Here’s a question for you. What’s the difference between simple and easy? Are they different, are they the same, or is one a superset of the other? If you had to choose one, which would you choose? And why?

First, let’s see what Sir Merriam-Webster’s reliable book has to say.

SIMPLE

readily understood or performed
    simple directions
    the adjustment was simple to make

EASY

requiring or indicating little effort, thought, or reflection
    easy clichés

Similar, but not quite the same. It’s kind of like the difference between complex and complicated. Complicated processes are often made up of many steps, each of which is easy to do. You just need to do them correctly and in the correct order.

Complex things, on the other hand, are often simple. A Foucault Pendulum is simple. It’s just a weight on the end of a string. Describing its motion, on the other hand, is very complex. It’s a function of the weight, the length of the string, and the Earth itself, including the pendulum’s position on the Earth. Once you know about it, it’s simple, but figuring it out from just watching the pendulum swing is very hard.
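For a sense of the gap, here’s the textbook small-angle description (an illustration, not something from the original post), where L is the string length, g is gravity, ω⊕ is the Earth’s rotation rate, and φ is the latitude:

```latex
T \approx 2\pi\sqrt{\frac{L}{g}}
\qquad\qquad
\Omega_{\text{precession}} = \omega_{\oplus}\sin\varphi
```

The period of each swing is set by the string, but the slow rotation of the swing plane is set by where on Earth the pendulum sits: at 45° latitude it takes roughly 34 hours to come full circle, at the pole about one sidereal day, and at the equator it doesn’t rotate at all. A weight on a string, and yet the answer depends on the whole planet.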

It’s often like that in software. Complicated systems, tools, and pipelines are tedious to build, and hard to get right, because they’re so exacting. Bash scripts are very often complicated, but because they’re scripts, each step is easy. Distributed systems are simple. Just do things in parallel on different machines. Do the exact same things you would do on one machine, just do half of them on one machine and half on the other. Simple. But complex, because the failure modes are many, and the differences between them are subtle.

To answer the questions:

Are simple and easy the same thing?

No, they’re not. Simple is easy to understand. Easy takes little effort to do. But beware, not everyone will agree on whether something is simple or easy. Or why.

Is one a superset of the other?

No. They’re related, and something can be both simple and easy. In fact, many things that are simple are easy, and many things that are easy are simple, but being simple does not imply being easy, nor does being easy imply being simple. There’s not a causal relationship between them.

If you had to choose one, which would you choose?

Of course, It Depends.

When I’m starting out on something I’m unfamiliar with, I want to make it easy. There might be lots of steps, and it might be complicated, but until I understand the system, I want it to be easy. Every part should do one thing and one thing only. Interactions between parts should be minimized. If something is not right, I want there to be only one place to look for the problem, and I want to be able to work on that one thing and not have to worry about breaking anything else. Of course, doing things that way often ends up with a brittle system that is easy to use, if you use it just right.

Later, for the same project, when I’m more familiar with the domain, I’m going to want to make it simple to use. The internals will become more complex, with interactions that require deeper understanding to deal with problems. It won’t be as easy to understand the intricacies, but the surface will be simple. Using it will be robust and easy to understand. It won’t ever surprise you, even if you misuse it, and it will be hard to misuse.

But what if you REALLY had to choose just one?

In that case, I’d choose easy. Because easy takes little effort right now. So I can add value. And if I can keep it easy, I’ll always be able to add value. And when you do that, you usually find that keeping it easy eventually makes it simple.

by Leon Rosenshein

Governing the Commons

Adding to my book reviews, consider Governing the Commons by Elinor Ostrom. You might wonder what a book about Turkish fisheries, Swiss grazing pastures, Japanese forests, and Spanish and Philippine water systems has to do with software development. It does seem to be a bit of a stretch.

Some background first. The common part is all about what the Commons actually are. In this case, it’s the same commons talked about in The Tragedy of the Commons. A shared resource. Traditionally, that’s used to denote a shared, finite, natural resource. Like a pasture, a forest, or water. That’s the commons or, as Ostrom says, the Common Pool Resource (CPR).

In software development, particularly large projects with multiple teams, but really, any project with multiple developers, there are many shared finite resources. The most obvious are time, IO bandwidth, and storage (RAM and long term). You can probably list others, but as you can see, software has its own CPRs.

Those tangible things aren’t the only CPRs though. There are also more intangible things. Things that make up Internal Software Quality (ISQ). Like architecture, naming conventions, coding styles, and domains and boundaries. Those things may not be finite, but they are common and public, and they sit on a continuum that is influenced by local actions with regard to the whole.

That’s how software development is like the Commons, and Ostrom’s findings and conclusions can tell us something about how we might do things differently to encourage better results.

Conventional wisdom tells us that when you have a group of actors, each looking out for their best interests, who must share a finite resource, they quickly use it all up, because each actor is working toward their own local maximum. They each want to get as much of the resource as they can, before anyone else can get it. Or, one actor ends up controlling the resource, to the detriment of everyone else. That’s the tragedy. And the most common “solution” to the problem is heavy-handed, external (often governmental) regulation. It sort of works, but leaves everyone looking for a way to game the system and get the most for themselves, even if it makes things overall worse for everyone.

What Ostrom found, after looking at the various successfully managed CPRs, was a list of 8 guiding principles:

Group boundaries are clearly defined

While the shared owners and individual teams may be changing, everyone agrees on who is in the overall group, and which sub-group they’re in. It’s self-organized and dynamic, and often based on unwritten tribal knowledge, but the organization can be clearly seen.

Rules of use are matched to local conditions

Since the resources under consideration are different, the rules around their use are specific to those resources. It’s not as simple as saying “There are 1000 users. You each get 0.1%.”

Most (all?) actors are involved in modifying the rules

If you’re in the community, you’re automatically part of governing the community and the resource. If you’re not in the community, you’re kept out of the resource, until you join. Joining may be complicated, but it is doable.

The community is allowed to set and enforce the rules

The converse of the above. If you’re not in the community, you have little (or no) say. External influences (like a government) are influences only, and only have influence on group members, not the group itself.

The community monitors itself

The community tracks itself and identifies and “grades” infractions. Again, this is often known but unwritten, but for larger communities it may be written and public.

There are sanctions for breaking the rules

If you break the rules, there are consequences. The bigger the offense, the bigger the consequence. Up to and including being excluded from the community.

The community manages conflict resolution

The community handles its own disputes. You can bring up a dispute you’re involved in, and the community can step in where there are unresolved disputes, but either way, it stays internal.

It’s a system, so communities and CPRs can be nested

The only way to scale to large numbers of actors or groups of actors is to have a hierarchy and nest groups and portions of the commons inside larger groups. The challenge here is to maintain the other 7 principles while building and operating the hierarchy.

Sure, when Ostrom wrote this she was talking about natural resources, but software development is a socio-technical endeavor. As such, there are lots of CPRs that need to be managed. Applying these principles can help us maintain those resources at an appropriate level while ensuring that everyone, both individually and collectively, has the right amount of those resources.

Finally, there’s a download of the book available on archive.org. The formatting is pretty bad, but the words seem to be correct.

by Leon Rosenshein

More Error Types

I’ve talked about Type I (False Positive) and Type II (False Negative) errors before. While it would have been so much better if they just called them False Positive and False Negative cases, they only cover part of the problem. A more complete list would include the Type III (the right answer to the wrong problem) and Type IV (the right answer for the wrong reason) errors.

The Type III error ought to be innocuous. After all, you may have wasted some time getting to an answer, but you got to a correct answer. Using that answer is going to be good, right? As usual, It Depends.

To use a car analogy, you’re driving down the road and you hear a thumping noise. Nothing seems obvious, so you keep going. A few miles later, the engine dies, and you safely get the car to the side of the road. After taking a few minutes to calm down, you get out of the car and walk towards the front of it. You notice that the hood isn’t fully latched down. So you open the hood, slam it properly, and confidently get back in the car to continue down the road. Unfortunately, when you go to start the engine, it won’t start. Yes, the hood wasn’t latched, and it may have been causing the noise, but fixing it didn’t help get the car running at all.

Since that didn’t work, you look around some more. You notice that there’s a loose spark plug wire. That’s surely a problem. You fix it, but the car still doesn’t start. You keep finding and fixing things, but you can’t get the engine to start. It turns out that the answer to the question of why the engine died is that you ran out of gas. All those other things, which are related to the car’s operation, don’t help. They are problems, and you came up with solutions to them. They answered the question of why there was a thumping noise. They answered the question of why the car was running roughly. But they didn’t answer the question of why the engine stopped running. You’re still sitting on the side of the road.

So before you go accepting whatever answer you got, make sure it’s the answer to the question you asked. Otherwise, it’s not going to solve your problem. And, might make things worse.

The Type IV error can be even harder to spot, and can do even more damage later. But why? You got the right answer, and it solved your problem. Yes, it did, but it also taught you something that isn’t true. It’s now one of the things you know that just ain’t so. And that’s one of the hardest biases to fight.

Going back to the car analogy, one day your car is pulling slightly to the right. You remember from watching NASCAR races that you can change the handling of your car by changing tire pressure, so you check, and the right front tire was a bit low. You add some more air to the tire. The problem goes away. You’re happy. Over the next few months, it happens a few more times because there’s a small leak in the tire. When the car starts pulling you know it’s time to add some air. When you get new tires there’s no leak, and you don’t have any more pulling. Then one day the car starts pulling to the left. You use your new knowledge and add air to the left front tire. The problem goes away, and you’re happy. It doesn’t come back and you forget about it. Months later you find that the left front tire is completely worn out in the center because it’s been overinflated. The car wasn’t pulling to the left because of a low tire, it was because of that pothole you hit, which slightly bent the tie rod. So now you have two problems. A bald tire and a bent tie rod.

If it hadn’t been for that Type IV error, the right answer for the wrong reason, you wouldn’t need to get at least 2 new tires and a tie rod. If you hadn’t known that inflating the tire would solve your pulling problem, the right answer for the wrong reason, you probably would have dug a bit deeper and solved the real problem.

So before you go and use the solution you already know works, make sure it’s the solution for the current problem, not the solution for another problem with the same symptoms.

by Leon Rosenshein

Strong Opinions, Loosely Held - Part 2

A while ago I talked about Strong Opinions, Loosely Held. I talked about the problem with people with structural power (the HiPPO) having strong opinions and how those opinions often get too much weight.

Picture of a hippo at a desk making a proclamation

The HiPPO Timo Elliott

As Jim Barksdale of Netscape once said,

“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.”

Doing that can easily get you into trouble, because being a HiPPO doesn’t make your opinion right. In that post I only teased the second part, so it’s time to talk about that.

After making sure other opinions get sufficient space to be investigated and thought about, the most important thing you can do with your strong opinions is to hold on loosely, but don’t let go too soon. You have that opinion for a reason. It might be a good one. It’s probably based on some experience you had or data you’ve seen.

But that doesn’t mean that it’s still the right opinion. You need to make sure that it’s still the right opinion. You need to check your biases.

Like confirmation bias, where you only look at data that supports your opinion. You need to look at that data, but you also need to look at contrary data. Especially when it’s brought by someone with a different opinion.

Or recency bias. Something like this just happened, so obviously it’s going to be like that next time, right? Possibly, but not definitely. Maybe what just happened was a black swan event. Or the circumstances were different.

Optimism bias is another one. Sure, it didn’t work last time, but we’ve learned a lot and we can be better/faster/stronger than last time. We can make it work now. Maybe, but maybe not. You need to be honest with yourself about your capabilities.

Ego bias is a big one. Especially for HiPPOs. After all, if they’re the highest paid, they must be the smartest/most valuable/most correct, so their opinion must be the best. That might be true for some set of conditions/areas, but no single person is the smartest in the room at everything. Again, honesty is the antidote to bias.

There are many other biases and reasons why your opinion might be wrong. The trick is to not be so tied to your opinions that you’re not able to listen to reason. To listen to another opinion.

Indecision can cripple a project/team. But so can slavish devotion to an opinion that has been proven to be incorrect. So have a strong opinion. Present it with confidence. But don’t force it on others. Encourage a discussion. And when you learn new facts, or you see that your opinion is not helping you, allow yourself to change your opinion.

Then work from that opinion/decision. Until there’s a good reason to change it.

by Leon Rosenshein

Virtuous tests

I’ve talked about the different kinds of tests. I’ve talked about when to run the different kinds of tests. I’ve said that your tests need to be good tests. All of that is true. What I haven’t done is talk about what makes a good test.

Of course, the answer to the question of what makes a test good is, It Depends. Mostly it depends on what kind of test you’re writing and when you’re running it. System tests are different from integration tests, which are different from unit tests. They do, however, have a few things in common. While your code should be SOLID (or CUPID), your tests should be FIRST. Your tests should follow the Arrange, Act, Assert pattern.

Those are all structural guidelines though. What helps separate great tests from just good tests? Tests are code, so they should have those virtues. They should also have a few more that only apply to tests.

Have a good name

Regardless of what you’re testing, your test should start with a good name. You want to be able to look at a list of failed tests and understand exactly what failed. The name of the test needs to tell you what’s being tested. If your tests are called TestA, TestB, and TestC, you might know what a failure of TestC means 10 minutes after you wrote the test, but when it fails in 3 months due to a bad refactor, you (or whoever is running the test) will have no idea what failed. Even more specifically, a good test name describes what is being tested, and tells you about the test case and the expected result.
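For example (hypothetical names, Go-style), compare what each of these tells you when it shows up in a failure report:

```go
package config_test

import "testing"

// Tells you nothing when it fails.
func TestC(t *testing.T) {
	// ... test body elided ...
}

// Names the unit under test, the case, and the expected result.
func TestParseConfig_MissingFile_ReturnsNotFoundError(t *testing.T) {
	// ... test body elided ...
}
```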

Test one thing at a time

This makes the first thing easier. A test should only validate one thing at a time. Through good use of fakes, mocks, and dummy components, you ensure that the result of the test only reflects what happens with the thing you care about. Everything other than the thing under test should not be able to fail. (Yes, I know there are cases where this is impossible. Especially when there are noisy neighbors, but do the best you can).
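Here’s a minimal Go-flavored sketch of that isolation (Charger, Checkout, and fakeCharger are all made-up names for illustration): the fake can’t flake, time out, or hit the network, so a failure can only mean the code under test is wrong. It also shows the next virtue, validating the actual result rather than just the absence of an error.

```go
// billing_test.go — code under test and its test kept in one file for brevity.
package billing

import (
	"errors"
	"testing"
)

// Checkout depends on an interface, not a real payment gateway.
type Charger interface {
	Charge(cents int) error
}

func Checkout(c Charger, cents int) error {
	if cents <= 0 {
		return errors.New("nothing to charge")
	}
	return c.Charge(cents)
}

// fakeCharger just records what it was asked to do.
type fakeCharger struct{ got int }

func (f *fakeCharger) Charge(cents int) error { f.got = cents; return nil }

func TestCheckout_ChargesTheFullAmount(t *testing.T) {
	fake := &fakeCharger{}
	if err := Checkout(fake, 1299); err != nil {
		t.Fatalf("Checkout returned %v, want nil", err)
	}
	if fake.got != 1299 {
		t.Errorf("charged %d cents, want 1299", fake.got)
	}
}
```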

Control the environment

You need to control the environment. Things like time and date, random number generators, environment variables, and disk and memory status. Also, you can’t rely on timing (unless the purpose of the test is to make sure things take less than a specified amount of time). Depending on the language and test framework, you might need to have a shim around your date/time/random library so you can control things. For single threaded code, you can rely on ordering, but for any multithreaded code, good tests handle the asynchronous nature of threads. And make sure they’re all done/cleaned up.
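One common way to get that control, sketched in Go with hypothetical names (nowFunc, IsStale): the code under test takes a clock instead of calling time.Now() itself, so the test can pin “now” to a known instant.

```go
package report

import (
	"testing"
	"time"
)

// nowFunc is the shim: production code passes time.Now, tests pass a frozen clock.
type nowFunc func() time.Time

// IsStale reports whether something hasn't been updated in over a day.
func IsStale(updated time.Time, now nowFunc) bool {
	return now().Sub(updated) > 24*time.Hour
}

func TestIsStale_JustUnderOneDay_IsFresh(t *testing.T) {
	// Freeze "now" so the test result doesn't depend on when it runs.
	frozen := time.Date(2024, 6, 1, 12, 0, 0, 0, time.UTC)
	now := func() time.Time { return frozen }

	updated := frozen.Add(-23 * time.Hour)
	if IsStale(updated, now) {
		t.Errorf("IsStale(%v) = true, want false", updated)
	}
}
```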

Validate the response

This should probably go without saying, but it needs to be said anyway: the results/side effects should be checked. In almost all cases, just checking that no error was thrown/returned isn’t actually checking anything and isn’t a good test. Whatever you do, check something. assert(TRUE) unless there’s a panic/crash is almost always an invalid test.

Group different variations into a larger test

Tests should be grouped together based on what they’re testing. A good rule of thumb (for unit tests) is to pair a test file with each source file. It keeps things close together when you need to work on both of them (write the test, then write/update the code that makes it pass). But it’s not just about where the tests live, it’s also about how they’re grouped.

For unit tests, for example, you want to test not only the happy path for a method, but also the various unhappy and edge condition paths. You could write a special test for each one, but that’s likely to end up with lots of duplicated code that’s hard to maintain. A better choice would be a table-driven test. Various languages/frameworks have tools to help, so use the one that makes sense in your case, but in general, instead of multiple tests, write a single test that calls another method with a set of parameters and does the actual checking. Then, when you discover a new test case, you just update the table with the parameters and expected result (or error). This also helps with the first point. You can usually concatenate the base function’s name with the test case’s name to define the test you’re running. That makes it very easy to understand what went wrong and where to look for the code/data that’s failing and the code being tested.
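Here’s what that pattern looks like in Go (using the standard library’s strconv.Atoi as the thing under test, just to keep the sketch self-contained): the table holds the cases, the loop does the work, and each case’s name gets appended to the test’s name in the output, so a failure reads like TestAtoi/empty_string.

```go
package parse

import (
	"strconv"
	"testing"
)

func TestAtoi(t *testing.T) {
	cases := []struct {
		name    string
		in      string
		want    int
		wantErr bool
	}{
		{name: "simple number", in: "42", want: 42},
		{name: "negative number", in: "-7", want: -7},
		{name: "empty string", in: "", wantErr: true},
		{name: "not a number", in: "forty-two", wantErr: true},
	}

	for _, tc := range cases {
		// A new test case is just a new row; the checking code is shared.
		t.Run(tc.name, func(t *testing.T) {
			got, err := strconv.Atoi(tc.in)
			if tc.wantErr {
				if err == nil {
					t.Fatalf("Atoi(%q) = %d, want an error", tc.in, got)
				}
				return
			}
			if err != nil {
				t.Fatalf("Atoi(%q) returned error %v", tc.in, err)
			}
			if got != tc.want {
				t.Errorf("Atoi(%q) = %d, want %d", tc.in, got, tc.want)
			}
		})
	}
}
```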

For integration or system tests, this might end up being a set of scenario definitions that are called as a single system test, then each scenario gets its own result. Again, the base name tells you what overall thing you’re testing, and the scenario name tells you the specifics.

Clean up after yourself

Since your tests could run in any order, any number of times, you need to make sure the results of the last run don’t pollute the results of a new run. The most common way this happens is when tests leave things lying around on disk, but extra rows in a database are almost as common. So spin up a new storage system (disk folder, in-memory file system, database, etc.) for each test, or make sure that you clean up the shared one, regardless of how the test ends.

Not just the data, but connections, file descriptors, ports, memory; anything your test grabs on to needs to be put back. Otherwise, sometime, when you least expect it, your tests will start to fail because some resource is no longer available. And that will happen occasionally, at the least favorable time, making it very hard to reproduce and debug.
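In Go, for example, the testing package has t.TempDir and t.Cleanup for exactly this (the file name and contents below are made up): every test gets its own directory that’s removed when the test ends, pass or fail, and registered cleanups run however the test exits.

```go
package store

import (
	"os"
	"path/filepath"
	"testing"
)

func TestSaveWritesTheFile(t *testing.T) {
	// t.TempDir gives this test its own directory and deletes it afterwards,
	// so runs can't pollute each other or a shared location on disk.
	dir := t.TempDir()
	path := filepath.Join(dir, "state.json")

	if err := os.WriteFile(path, []byte(`{"ok":true}`), 0o600); err != nil {
		t.Fatalf("writing %s: %v", path, err)
	}

	f, err := os.Open(path)
	if err != nil {
		t.Fatalf("opening %s: %v", path, err)
	}
	// t.Cleanup registers teardown that runs no matter how the test exits;
	// the same idea works for connections, ports, and other grabbed resources.
	t.Cleanup(func() { f.Close() })

	info, err := f.Stat()
	if err != nil {
		t.Fatalf("stat %s: %v", path, err)
	}
	if info.Size() == 0 {
		t.Error("saved file is empty, want content")
	}
}
```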

So next time you’re writing some tests, whether TAD or TDD, make sure your tests are not only correct, but also virtuous.