
May 28, 2025 by Leon Rosenshein

Time Passes

time testing boundary value

Hot Take. Unit tests should pass, regardless of what day you run them. Time is hard. It has a way of passing when you’re not even thinking about it. When you’re writing simulations (or unit tests, which can be thought of as simulations of some small aspect of your code) one of the most important things to do is control time. As a general rule, unless you’re measuring performance or displaying/logging the current time, you probably shouldn’t be using your language’s equivalent of Time.Now(). In fact, even in those cases, I’ll assert that you shouldn’t be calling it directly. You should at least be using some kind of dependency injection, if not a whole façade¹.
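Here is a minimal sketch of that kind of dependency injection in Python. It assumes time-dependent code can accept a zero-argument function that returns the current time. Production code uses the real clock, and tests pass in a fixed one. `Clock`, `system_clock`, `fixed_clock`, and `is_weekend_today` are hypothetical names for illustration.

```python
from datetime import datetime, timezone
from typing import Callable

# A clock is any zero-argument function that returns "now".
Clock = Callable[[], datetime]


def system_clock() -> datetime:
    """Production clock: the one place that reads the real current time."""
    return datetime.now(timezone.utc)


def fixed_clock(moment: datetime) -> Clock:
    """Test clock: always returns the same instant, whatever day the test runs."""
    return lambda: moment


def is_weekend_today(clock: Clock = system_clock) -> bool:
    """Hypothetical time-dependent logic: asks the injected clock, never datetime.now()."""
    return clock().weekday() >= 5  # Saturday=5, Sunday=6


# In a test, the test chooses the date, not the calendar on the wall.
saturday = datetime(2025, 5, 24, tzinfo=timezone.utc)
assert is_weekend_today(fixed_clock(saturday))
```

A full façade would wrap more than "now", but even this much means a test's answer can't change with the day it runs.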

The other day I was dealing with a failing unit test. A test on a function that hadn’t changed in a while started to fail. I was able to reproduce the failure locally and wanted to find out which change caused it, so I reached for git bisect. But I couldn’t track it down. Between the time I reproduced the error locally and the time I got back to it, the error had magically gone away.

Heisenbugs are a terrible thing. The tests pass, but the bug is just sitting there waiting to bite you. It’s never a fun time when you have to find and fix one. One nice thing about them, to use the term nice loosely, is that they’re usually related to one of a few things: the environment (environment variables, files on disk, the current working directory), load on the machine (disk space, CPU, or network load), multi-threading, or time.

In this case it was time. But not in the code. The code was just calculating the number of working days between two dates. It was, in fact, correct. It used a calendar and the two dates and gave the right answer.

Instead, this was in fact a kind of Schrödinger’s test. Depending on when the test was run, sometimes it passed, and sometimes it failed. The test was checking that the number of working days between now and three days ago was always at least one.

That seems reasonable. Or at least on the surface, that seems reasonable. Since working days are Monday to Friday, with Saturday and Sunday being weekends, there are never more than two non-working days in any three day period, so there’s always at least one working day.

And that’s how the test worked. It looked something like this:

from datetime import date, timedelta

today = date.today()
three_days_ago = today - timedelta(days=3)
result = working_days_between(today, three_days_ago)  # the function under test
assert result >= 1

The problem was that the test forgot that this isn’t always true. Like on a three- or four-day weekend. Like Memorial Day in the US. Run the test on most Tuesdays and it passes: the number of working days between Tuesday and the preceding Saturday is one (the Monday between them). But run it on the Tuesday right after Memorial Day and the number of working days between them is zero, because that Monday is not a working day. The function did the right thing. It normally returned 1, but that day it returned zero. And the test failed².

This is actually a hard function to test correctly. Any day can be a holiday. Official holidays are a little better defined, but add in company holidays, religious holidays, and personal holidays, and the answer is unknowable at test time. There are just too many variables. If you don’t tell the function when the holidays are, you either have to know them when you write the test or look them up when the test runs.

The most robust way to test this is to change the function to take a calendar. Then, in the test, pass in not just the two dates but the calendar that should be used. Work out how many working days there are between the two dates under that calendar, and assert that the return value is exactly that number. Then figure out the edge cases and use boundary value analysis to make sure you test all of them.
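Here is a minimal Python sketch of that approach. It makes two assumptions. First, the calendar is modeled as a set of holiday dates. Second, `working_days_between` counts days in the half-open range [earlier, later). That is one convention that matches the Saturday-to-Tuesday example above. The `holidays` parameter and the one-entry calendar are hypothetical; this is not the code from the post.

```python
from datetime import date, timedelta
from typing import AbstractSet


def working_days_between(a: date, b: date, holidays: AbstractSet[date]) -> int:
    """Count working days in the half-open range [earlier date, later date).

    A working day is Monday through Friday and not in `holidays`.
    The half-open convention is an assumption for this sketch.
    """
    start, end = min(a, b), max(a, b)
    count = 0
    day = start
    while day < end:
        if day.weekday() < 5 and day not in holidays:  # Monday=0 ... Friday=4
            count += 1
        day += timedelta(days=1)
    return count


# Hypothetical calendar containing only Memorial Day 2025 (Monday, May 26).
HOLIDAYS = frozenset({date(2025, 5, 26)})

# An ordinary Tuesday: Sat May 31 up to Tue Jun 3 contains one working day (Mon Jun 2).
assert working_days_between(date(2025, 6, 3), date(2025, 5, 31), HOLIDAYS) == 1
# The Tuesday after Memorial Day: the only weekday in range is a holiday.
assert working_days_between(date(2025, 5, 27), date(2025, 5, 24), HOLIDAYS) == 0
# Same dates, empty calendar: the Monday counts again.
assert working_days_between(date(2025, 5, 27), date(2025, 5, 24), frozenset()) == 1
# Boundary: identical dates give an empty range.
assert working_days_between(date(2025, 5, 27), date(2025, 5, 27), HOLIDAYS) == 0
```

Because the test supplies both the dates and the calendar, every expected value is fixed. A fuller suite would also cover a holiday Friday, ranges that start or end on a holiday, and arguments passed in either order.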

And by the way, don’t forget that your calendar will change over time. So when you ask the question “how many working days are there between two dates?”, you need to think about when the question is being asked and know the calendar that was in use at that time. Just in case you didn’t think this was complicated enough.
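One way to handle a calendar that changes over time is to store each version with the date it took effect. The lookup then picks the latest version in effect on the day the question was asked. This is a sketch under those assumptions. The versions, their dates, and the name `calendar_as_of` are hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import FrozenSet, List, Tuple

# Hypothetical calendar history: (date this version took effect, holidays it defines).
# Entries must be sorted by effective date for the bisect lookup below.
CALENDAR_VERSIONS: List[Tuple[date, FrozenSet[date]]] = [
    (date(2024, 1, 1), frozenset({date(2025, 5, 26)})),
    # Illustrative revision that adds a holiday partway through the year.
    (date(2025, 3, 1), frozenset({date(2025, 5, 26), date(2025, 6, 19)})),
]


def calendar_as_of(asked_on: date,
                   versions: List[Tuple[date, FrozenSet[date]]] = CALENDAR_VERSIONS
                   ) -> FrozenSet[date]:
    """Return the holiday set that was in effect on the day the question was asked."""
    effective_dates = [effective for effective, _ in versions]
    # Index of the first version that takes effect *after* asked_on.
    i = bisect_right(effective_dates, asked_on)
    if i == 0:
        raise ValueError(f"no calendar was in effect on {asked_on}")
    return versions[i - 1][1]
```

The result can then be passed to a function that takes a calendar, as suggested above. The same question asked on two different days can then legitimately get two different answers.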

¹ I’m not saying you should write your own time and date management/handling functions. Just like with security and cryptography, you’d better have a good reason.
² NB: The correct answer here is NOT to change the test to be >= 0.

May 23, 2025 by Leon Rosenshein

Licensed Engineers

engineers software engineering licenses kipling

Closing out this series¹, the third most common reason I’ve seen thrown around for why software engineering isn’t real engineering is:

Real engineers have a license. Software engineers don’t.

In the United States you can become a licensed Professional Engineer (PE). Canada has licenses and the Iron Ring², which acknowledges the responsibilities an Engineer takes on towards society. Other countries have similar systems.

To the best of my knowledge, the only place that has a Software Engineering specialty for Professional Engineers is Texas. While it’s called Software Engineering, it’s really more about computer hardware engineering, and the number of licenses issued is vanishingly small. In the 20+ years that specialty has existed, there has been no uptick in licensed Software Engineers. Nor has there been any demand for them, whether from people in the field, from industry, or from any government.

With that as background, while it’s true that some engineers have those licenses, most people with engineering degrees who work in their chosen field as engineers don’t. And no one says they’re not engineers. Most traditional engineers don’t bother to get a license even when they could, and they are still called engineers. So it’s not reasonable to say that Software Engineers aren’t engineers because they lack a license that effectively doesn’t exist.

All that said though, it is important to note that just because you write code, you’re not necessarily a Software Engineer. There are lots of extremely skilled, well trained, and talented people who can build infrastructure. In fact, you can’t build and maintain today’s society without them. But many (most?) of them aren’t engineers. They’re technicians. They’re operators. They’re builders.

The same is true for software. There are many people who develop software, from Lego Mindstorms robots to Excel macros to websites to astrophysical signal processing. There are no-code solutions like LabVIEW, and now vibe coding. That’s all programming and software development. It’s important. It can be fun. And it can be crucial to advancing the state of the art in whatever field it’s being applied to.

But just as with your home contractor or heavy equipment operator, the fact that you’re building something doesn’t mean you’re doing engineering. Engineering is about why you make the choices you do. It’s about how you understand and balance the competing constraints of the specific context you find yourself in, to provide the most value.

6-box engineering process loop: Ask, Imagine, Plan, Prototype, Test, Share

And that right there is why Software Engineering really is Engineering.

¹ Part 1: Constraints. Part 2: Engineers Estimate.
² Fun fact: Rudyard Kipling, seen by many as the patron poet of the engineer (see The Sons of Martha, McAndrew’s Hymn, and Hymn of Breaking Strain), authored the Obligation recited by wearers of the Iron Ring.

May 21, 2025 by Leon Rosenshein

Engineers Estimate

estimation engineeers software engineering mmmss

The other day I talked about the #1 excuse people use when they say software engineering isn’t engineering: that software has no constraints. If you think software engineers don’t have to deal with constraints, go read that post. Or just go talk to a software engineer.

The second most common excuse I’ve seen is

Real engineers can and do estimate their work. Software engineers can’t (or won’t) accurately estimate.

First, let’s agree that if you’re trying to do something that isn’t even close to something that has been done before, the estimate is going to be wrong. It doesn’t matter if you’re trying to build a Mach 3+ jet, the tallest building in the world, the first steel suspension bridge, or an online service that responds to millions of requests a day in milliseconds.

Second, have you ever been involved in a large infrastructure project, like a highway system, a water system, or a multi-story building? What about a mid-sized project, like building a house or designing a home appliance? If not any of those, what about a small project, like a kitchen or bath remodel? Or even changing a lightbulb? If you’ve ever done any of those, you know how hard it is to come up with an accurate estimate. And if you’ve never done the work yourself but have had it done for you, you’ve seen how those estimates are just that: estimates. The reality is often different. Wildly different. Even for traditional engineers doing things that have been done before.

But it’s true. Traditional engineers are expected to, and do, estimate their work. And the smaller the delta between what is and what will be, the more accurate the estimate. Generally. That makes sense: the better understood the problem and solution domains, the better the estimate. Until you get to edge cases. You can move a support piling a little bit and change nothing else. That’s easy. But suppose you need to eliminate a support piling entirely because of soil conditions. You may suddenly find that you’ve gone from an arch bridge to a suspension bridge. That’s going to blow the schedule. Or the wall you wanted to remove turns out not to be load-bearing, but there’s plumbing in it. There are lots of surprises that can come up when you actually have to do the thing.

And all of that can happen when you have clear and stable requirements. When the requirements are in flux, anything can happen.

The same thing happens with software engineering. The closer the thing you want is to what we already have, the better the estimate. Want to add a button to the UI? Easy to do and estimate. Develop a new database query? No problem. Unless the screen is full, and adding a button means switching from one screen to two. Or redesigning the whole thing. Or finding out that the data is actually spread across three different databases. Discovering this new information means your estimates need to change.

In fact, change is the biggest reason that estimates in software aren’t as accurate as anyone, including software engineers, would like. It’s very common to start with only the vaguest idea of what is wanted, then iterate until it’s found. This may very well be the most efficient way of developing the software that best solves the user’s problems. We’ve seen how waterfall and Big Design Up Front (BDUF) projects end up. They have the same estimation problems, and then they make it worse by building the wrong thing.

There’s another thing that comes up as well. As often as not, what software engineers are trying to do is not build a mechanical system, but build a system that replicates a process. A process with people in it. People who do things a certain way, not all of them the same. With a myriad of edge cases. Going back to how things are done in medical offices, the computer-based system took all of the constraints of the old, paper system and somehow mashed them into the new system. Having to deal with both sets of constraints makes the system much more uncertain. As noted above, the more uncertainty and change, the worse your estimates are.

So there you have it. Estimation is hard in software engineering. Because estimation is hard in general. Even if you’re doing something very close to things that have been done before. You don’t know what you don’t know, and the goals can often change as well. Just like in traditional engineering.

May 19, 2025 by Leon Rosenshein

Constraints

crossover project engineers software engineering constraints bits atoms

Over the years I’ve seen many people say that software engineering isn’t real engineering. They tend to come up with the same reasons, even if they have different examples. In my mind I’ve grouped them into a few major reasons.

1. Real engineers work with things in the physical world. Things made of atoms, and they’re constrained by physics. Software engineers, on the other hand, work on “bits”, and bits aren’t real¹. There are no constraints on bits other than the developer’s imagination.
2. Real engineers can and do estimate their work. Software engineers can’t (or won’t) accurately estimate.
3. Real engineers have a license. Software engineers don’t.

The other day I ran across another article saying that software development isn’t engineering. It used what I think of as major argument #1. I disagreed. Besides pointing them towards the Crossover Project, I mentioned a couple of other things.

First of all, as a person who was formally trained and started their career as an aerospace engineer, I have a decent idea of what goes into that work. I dealt with atoms. Mostly atoms making up aircraft in my case.

Second, it’s true that there are lots of constraints that go into aircraft design. Balancing weight vs. lift, thrust vs. drag, useful payload vs. takeoff weight. Range vs. loiter time vs. acceleration. All of these things have limits based on physics, available technology, and how you choose to balance them against each other. It’s multi-variate calculus. With no right answer, only different choices. In any given situation, the answer to the question of which design is “correct” is It Depends.

Taking them in turn: your typical civil, mechanical, or aerospace engineer works on buildings, infrastructure, vehicles, and other very large, very physical things. But that’s not the only kind of traditional engineer there is. Electrical engineers are primarily interested in electric fields and how they interact to transfer and transport energy. Sure, they deal with wires and physical components to do it, but that’s the medium, not the focus. After all, electric current is not the movement of atoms, but the movement of charge: electrons and, in semiconductors, holes. When you’re concerned with holes, a kind of negative space, that’s pretty far from being concerned with atoms.

With that in mind, software engineering is about managing information flow and storage. No one would say that the people who design hydroelectric power stations, with their dams, spillways, and internal plumbing, aren’t engineers. Information is handled in a very similar way: pipelines, queues, and long-term storage. One is water, the other magnetic fields or electron holes, but it’s basically the same thing.

The other part of the argument is that real engineers are constrained by physics. That’s certainly true. Going back to those planes I worked on, they very much are constrained by physics. There’s only a certain amount of energy in the fuel. You can only convert some portion of that to thrust. For a given shape, the lift/drag ratio is known. You have to balance those things or the airplane doesn’t work. You can’t build a plane out of Unobtanium, no matter how much faster/better/easier it would be.

Similarly, software engineers face constraints. There are the prosaic ones, like clock speed, memory, and disk space. You can’t use more than you have. Then there are others that depend more on the current environment. Network bandwidth is a real limit. Available power is a limit. The speed of light is a limit on communication. The speed of a wavefront in a wire is a limit. Then there are things like the CAP theorem. There are lots of ways to balance these things. With no right answer, only different choices. In any given situation, the answer to the question of which design is “correct” is It Depends.
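To put a number on the speed-of-light limit, here is a back-of-the-envelope sketch. The inputs are approximations chosen for illustration. The great-circle distance from New York to London is assumed to be about 5,570 km. Light in optical fiber is assumed to travel at roughly c/1.47.

```python
# Back-of-the-envelope floor on round-trip time imposed by the speed of light.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47          # approximate, typical of silica optical fiber


def min_round_trip_ms(distance_km: float,
                      refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """Best-case round trip in milliseconds: there and back at light speed in the medium."""
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S / refractive_index
    return 2 * distance_km / speed_km_per_s * 1000


# Assumed great-circle distance, New York to London (approximate).
NYC_TO_LONDON_KM = 5_570

print(f"fiber:  {min_round_trip_ms(NYC_TO_LONDON_KM):.1f} ms")
print(f"vacuum: {min_round_trip_ms(NYC_TO_LONDON_KM, refractive_index=1.0):.1f} ms")
```

With these inputs, the floor comes out at roughly 55 ms per round trip through fiber, and about 37 ms even in a vacuum. That is before any routing, queuing, or processing. Real cable paths are longer than the great circle, so real round trips are slower still.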

There you have it. Why reason #1 for software engineering not being real engineering is wrong. Reasons 2 and 3 are topics for a later post.

¹ On the subject of bits and atoms, way back in 2015 I sat in a company all-hands meeting while Travis Kalanick described the new Uber branding. How the company was all about bits and atoms. Using technology to move things in the physical world.

May 16, 2025 by Leon Rosenshein

Slow is Smooth, Smooth is Fast

it depends context systems smooth domains dialectic mmmss

Move fast and break things. That’s the tech mantra, right? Do something. Might be right, might be wrong. Just do something and see what happens. Things will break. That’s OK. Just fix it later. As the Dothraki say, It is known.

There’s another saying: Slow is Smooth, Smooth is Fast. This one is courtesy of the Navy SEALs. It’s saying the opposite. Slow down. Think about what you’re doing. Make deliberate choices. Every step will be a little slower, but overall things will get done faster. Again, it is known.

And just as with the Dothraki, just because it is known doesn’t mean it’s true. Maybe they’re both true. It’s your classic dialectic thinking. It Depends on the context.

Or maybe, thinking about it with the dialectic lens, they’re really saying the same thing, but from different perspectives, so of course they’re both true. We just need to think about them the right way. A way that honors both sayings and leads us to the deeper truth.

From an outside-in perspective, move fast and break things is saying that you should perturb the system and see how it responds. Then, with that new knowledge, you make another change. Do that fast enough and often enough and you end up changing the entire paradigm. You will have broken the old system and replaced it with a new one. Quickly.

From an inside-out perspective, you want to be deliberate. You want to slow down just a bit and consider what you’re about to do. Then do something deliberately. Which leaves you well positioned to make the next deliberate step towards your goal. Do that deliberately enough and it looks like you’re moving smoothly. If you keep doing that, you’ll find that you’ve actually moved faster than if you had rushed each step, but spent more time between steps.

Bringing this back to software development, here’s something to keep in mind as you do your work. Neither of those says you should take shortcuts or write bad code. When you move fast and break things, the thing you’re breaking isn’t your code. You’re changing your code, but you don’t break it. You break the outside paradigm.

When you’re moving slowly and smoothly, you are always being careful to not break your code. You keep things smooth so you can keep taking the next step. You don’t need to take time to throw out your code and start again because it can change with you. You don’t need to take an extended period of time to figure out why your code has collapsed under its own weight. You use your understanding of the system to keep it the best simple system for now.

In both cases you might occasionally need to backtrack a bit. You’ve chosen to move fast and break some paradigm, and that has taught you that something you’ve done needs to change. That’s expected and it’s fine. Since you’ve done things deliberately, maintaining your optionality, it’s easy to smoothly make that change and move forward.

Which brings us right back to the dialectic. Move fast and break things. Slow is Smooth, Smooth is Fast. Statements that sound like they contradict each other. But are both true. By moving slowly and smoothly, you’re able to move fast and break the paradigm. There’s even a study showing this is true¹.

¹ Code Red: The Business Impact of Code Quality – A Quantitative Study of 39 Proprietary Production Codebases. Details are a story for another blog.

May 14, 2025 by Leon Rosenshein

Government Digital Services

context agile feedback user stories

A long time ago, in a country far away, the government released guidelines. Nothing unusual about that. It happens all the time. Usually, when I hear about government guidelines, I think of things that are well known, well understood, and generally accepted, now written down in obtuse language with lots of buzzwords and details. Enough fluff to make them largely incomprehensible. You know, standard bureaucratic language.

When I think about the government that did this, I think of powdered wigs, stiff upper lips, and traditions that date back hundreds, if not thousands of years. Very much rooted in what worked before, with only a passing nod to the current.

And then there’s this. The opposite of stuffy, hidebound, traditional, bureaucratic guidelines: the Government Design Principles, from the UK’s Government Digital Service (GDS). First published in 2012. Largely unchanged since then. Very forward-looking at the time. And still forward-looking.

Before I get too far into this, I do want to acknowledge that the design they’re talking about is software design, not interface design. There are some principles that touch on interface design, but it’s about software design and the software design process more than anything.

It might not be quite as pithy as the Agile Manifesto, but it’s close. Remarkably close for a government publication. If nothing else, look where it starts: with the user’s needs. That includes talking to users and recognizing that what they ask for isn’t always what they need. That’s a great place to start for design.

There were 10 points in the original version, and all of them still apply. From doing only what is needed to making things open and interoperable. Because context matters and we don’t know what we don’t know.

I believe all of these principles are good principles, and I would never use an appeal to authority, but it’s nice when others agree with you.

May 12, 2025 by Leon Rosenshein

Best Simple System For Now

dan north it depends systems thinking context feedback simple

When you’re writing code you have lots of choices. Even when working with 20-year-old legacy code, you have options. Not all of those options are equal though. Some are cheap and fast now, but may have a large cost later. Others are expensive and slow now, but might make things easier in the future. Your job as a software engineer is to choose the right one.

A system without feedback and a system with a feedback loop

Which one is right? You can probably guess my answer. It Depends. Of course it does. It always does. Without the context, there is no up-front answer. In fact, both are usually wrong. You don’t want to choose the cheapest/fastest option, and you don’t want to choose the one that gives you the most options in the future.

Instead, you want to choose the one that gives you a good balance of things. You want what Dan North calls the best simple system for now. It’s a very deliberate phrase. There’s a lot to think about in there.

For Now

One of the most important parts of the phrase is at the end: For Now. Consider what you know at this moment: where you are, what the immediate goal is, what stands between you and that goal, and what you think the long-term goals are. Given all that, what can you do right now? It’s going to change. You know that. You just don’t know how it’s going to change. So you want to maintain your options and not make more decisions than you need to.

Simple

One of the best ways to maintain that optionality is to keep things simple. Simple is easy to understand. It’s easy to reason about. And most importantly, it’s easy to change. But remember, simple doesn’t mean you get to ignore things. It still needs to work. It still needs to work at the scale you’re operating at. It still needs to work when the inputs change. Or at least it needs to work well enough to tell you that it can’t work in the new situation. Remember, KISS. The simpler it is, the easier it is to get right and the harder it is to get wrong.

System

Another thing to keep in mind is that it’s a system. Even the simplest program is a system. And the important thing about systems is that the parts of a system interact with each other. Often in strange and unexpected ways. You need to remember, and minimize, emergent behavior. By keeping things simple. By remembering that you’re building a system for now.

You need to remember that systems have feedback loops. So you need to identify and understand those loops. So you can work with those loops, instead of against them. When you work against the feedback loops in a system you’re working against the entire system. If you keep trying to do that, you either change the entire system or you end up not changing anything. As John Gall said:

A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.

Best

Finally, we get to best. How can you make something the best? By ensuring that what you’re building is for now. By keeping it simple. And by working with the system. If you do all of those things, you’ve got a very good chance of ending up with the best simple system for now.

May 9, 2025 by Leon Rosenshein

Respect The Problem

domain driven design context systems thinking done

The other day I ran across a really interesting quote.

The bottom line here is that you have to respect the problem. … There’s no silver bullet solution to just linearise them and wish them away.

– Dan Davies

Now, that was in the context of government regulation and the environment, but the quote applies just as well in many other settings.

Such as software development. When you’re trying to solve a user’s problem, you have to understand their problem. Not just the surface-level request, but the real problem behind it. Look at what your doctor and nurse do when you have an appointment. You can see that many years ago someone said “We need to turn this pile of paper records into an electronic record”. So they took the exact forms that were in use and implemented them on the computer as a scrolling screen with lots of tiny checkboxes. It did solve the problem of moving to electronic records. But it didn’t really solve the problem of “How do we efficiently track patients and their conditions, knowing that we have a new input modality?”

That’s not Domain Driven Design (DDD). That’s not respecting the problem. That’s a feature factory. Blindly implement the request without thinking about the why.

Respecting the problem means understanding not just the surface “what” of the ask, but the why behind it. I’m not going to pretend to understand the medical data entry domain, nor do I know the details of the regulations behind it. But I do know that if you really look at the domain and think about the goals, you’d come up with a different solution than taking the old pen-and-paper forms and replicating them on the computer.

The constraints are different, so the solution should be different. If nothing else, most laptop screens are landscape, and the paper forms were designed in portrait. The form factor is different, so the solution should be different. Today’s tablet displays might have enough resolution to use the same size and layout as the paper forms. They certainly didn’t when the online systems were developed, but that’s the way those systems were built anyway.

Here’s another constraint. There’s no natural physical way to bookmark 3 different pages. Back when there were paper forms, it was easy and common to have fingers on the different pages you needed to flip between. There’s no physical way to do that with a laptop or tablet, so that capability just went away.

On the flip side, there are new capabilities that come with the online form. You can build a decision tree and have the form follow it. You can hide things that don’t matter. You can group things based on what you’ve already done. A simple example would be a field where you can select one or more things, including an “other” option. Instead of always having a text box there taking up space, only show it when the user picks “other”. Another thing you could do is let things get bigger when you need to interact with them. You can imagine lots of others.
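Here is a minimal sketch of the “other” example in Python. It assumes a form is modeled as a list of fields, each with a predicate that decides, from the answers so far, whether the field is shown. The field names and the `Field`/`visible_fields` helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Answers collected so far, keyed by field name.
Answers = Dict[str, object]


@dataclass
class Field:
    name: str
    # Decides, from the answers so far, whether this field is shown at all.
    visible_when: Callable[[Answers], bool] = lambda answers: True


def visible_fields(fields: List[Field], answers: Answers) -> List[str]:
    """Names of the fields that should be on screen given the current answers."""
    return [f.name for f in fields if f.visible_when(answers)]


# Hypothetical intake form: the free-text box exists only once "other" is picked.
VISIT_FORM = [
    Field("reason_for_visit"),
    Field("other_reason_text",
          visible_when=lambda a: "other" in a.get("reason_for_visit", ())),
]
```

The same predicate mechanism covers the decision-tree idea. Each field’s visibility depends on earlier answers, so the form only asks what’s relevant.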

But those things didn’t happen either. They just replicated the old paper system. And called it done.

That’s not DDD. That’s not good software engineering. That’s not respecting the problem. And it’s not solving the user’s problem.

May 7, 2025 by Leon Rosenshein

Dijkstra On Bugs

Dijkstra software engineering flow debug code quality debugging

Unsurprisingly, there are hundreds of quotes about computers and programming by Edsger Dijkstra, and almost all of them are worthy of a post (or two). His work is foundational to much of what we do as software engineers. He was also a prolific, excellent, and memorable communicator. After all, he was the one who came up with Goto Considered Harmful and that one is certainly well known, almost dogma.

Image of Edsger Wybe Dijkstra
Attr: Hamilton Richards

But today I’m going to talk about one of his lesser known statements. A statement about how we view program correctness and debugging.

Let me start with a well-established fact: by and large the programming community displays a very ambivalent attitude towards the problem of program correctness. A major part of the average programmer’s activity is devoted to debugging, and from this observation we may conclude that the correctness of his programs —or should we say: their patent incorrectness?— is for him a matter of considerable concern. I claim that a programmer has only done a decent job when his program is flawless and not when his program is functioning properly only most of the time. But I have had plenty of opportunity to observe that this suggestion is repulsive to many professional programmers: they object to it violently! Apparently, many programmers derive the major part of their intellectual satisfaction and professional excitement from not quite understanding what they are doing. In this streamlined age, one of our most under-nourished psychological needs is the craving for Black Magic, and apparently the automatic computer can satisfy this need for the professional software engineers, who are secretly enthralled by the gigantic risks they take in their daring irresponsibility. They revel in the puzzles posed by the task of debugging. They defend —by appealing to all sorts of supposed Laws of Nature— the right of existence of their program bugs, because they are so attached to them: without the bugs, they feel, programming would no longer be what is used to be! (In the latter feeling I think —if I may say so— that they are quite correct.)

July 1970
prof.dr.Edsger W.Dijkstra
Department of Mathematics
Technological University
EINDHOVEN, the Netherlands
EWD288

That’s not quite as pithy as Simplicity is prerequisite for reliability, and there’s a lot to unpack there. Go read it again.

To me, the first and most important thing he’s saying is that, as a profession, we not just accept, but defend the existence of bugs. That’s a pretty damning accusation. That the profession of software engineering feels that all programs should have bugs.

Second is that debugging is the fun part. That we need the opportunity to debug. That without that part it’s boring.

Third, that we somehow need the Black Magic of the computer to fill some psychological need.

That’s not how I see it, but it does give you something to think about. Take the first part: that we defend the existence of bugs. There’s some truth to that. Except for the most trivial programs running in a constrained domain, I would assert that it’s impossible to ensure that future changes do not cause improper operation. Or at least impossible in practice. But that doesn’t mean we should ignore the possibility of bugs, or that we shouldn’t be as defensive as we can be. And we should maintain Zero Bugs. Prevent what you can, then fix what is exposed as fast as possible.

Personally, I don’t find debugging fun. I think that conflates the feeling of accomplishment we get from finding/fixing an issue with enjoyment. While there have been many occasions when I’ve been proud of myself for doing the work, and I’ve definitely felt the ease and fulfillment of getting into a flow state while tracking down an issue, I wouldn’t call it fun. And I don’t know many people who would.

As to needing the Black Magic of computers, that’s not something I experience, but it might be true for others. As a description of how people approach things, maybe? Regardless, I don’t think it’s a good reason to accept issues.

Having said that about the individual points, his meta-point, that we don’t do enough to keep issues out of the hands of our users/customers, is valid. I think we can, should, and must do better. In this age of fast and easy updates, I think we, as a profession, have somewhat forgotten the value of shipping good software in favor of shipping flashy software. And that reflects badly on us.

As software engineers, our goal should be to solve our users’ problems by balancing their needs and the system’s capabilities. Most of the time that’s by using more software. But sometimes it’s by using less software. And in both cases, it’s by delivering software that does the right thing. All of the time, not just most of the time.

That’s how we can honor our responsibilities as software engineers and respond to Dijkstra’s message.

May 5, 2025 by Leon Rosenshein

Zero Bugs

tdd zbb agile lean

Back when I worked on boxed products at Microsoft, we had 2-year release cycles. And towards the end of each one was a milestone called Feature Complete. That was the point in the project where all features we expected when we did planning 18 months earlier were done. Or at least the ones that we hadn’t decided to cut because we ran out of time. You would think that after feature complete, we’d be ready to ship. But that wasn’t the case.

Instead, the next big milestone was Zero Bug Bounce (ZBB). That was the second time in the history of the project that there were zero active bugs in our tracking system. The first was before we wrote any code. After that, the number of bugs climbed until shortly after Feature Complete. For ¾ of the project or more, the incoming bug rate was higher than the fix rate.

That wasn’t just our project. That was the way most software was written. You built it, then you tried to test quality in. It worked, after a fashion, but let’s not fool ourselves. It wasn’t very efficient, and it wasn’t a lot of fun. From the beginning of the project until some time after Feature Complete, the backlog of work kept getting bigger.

At the same time, in the early 2000s, extreme programming and the agile movement were getting started, borrowing some concepts from lean manufacturing, including the idea of building quality in instead of testing it in.

One of the ways that expressed itself was the idea of a Zero Bug Policy (ZBP): the idea that your software should have zero bugs. At the time, most folks looked at that and said it was impossible. Of course, there were already examples of bug-free software, but people still thought it was impossible to write bug-free code.

And those folks are right. Even with Test Driven Development (TDD), and a full suite of unit, integration, and system tests, you can’t guarantee bug-free software. But that’s not what a ZBP is about. It’s not that you never make a mistake, or a bug never gets shipped to a customer. Instead, a ZBP is really about not having a bug tracking system.

While Zero Bug Bounce and Zero Bug Policy have a Levenshtein distance of only 4¹, they’re completely different things. A ZBP means that instead of keeping track of your bugs and fixing them later, when you’re not so busy adding more bugs, you fix them now, for some reasonable value of now. You don’t drop everything and fix it², but as soon as you finish what you’re working on, you fix the problem before you start something new. That means that every day is potentially a ZBB.

That’s a very different way to build software, and it’s hard to do. You need to build the muscles for TDD and unit testing. You need to build the muscle to say “No” when schedule pressure pushes you to move on to the next feature even though there are still issues with the current task. You need to build the deployment muscle so it’s easy to make the fix. All of these things and more are hard to do, and don’t show any immediate benefit³. It takes discipline and commitment.

Another benefit of ZBP is that you’re always ready to ship. You might not have the feature set you originally planned, and it might not be as polished as you’d like, but if you need to do a demo, you can demo everything you’ve done. If something happens and the release date moves up, you have something to release. You can sleep at night and not have to worry about having the rug pulled out from under you.

Remember, even if you’re living in a ZBB world, you don’t have to stay there. You can bias your choice of work slightly so that the rate at which you find issues is lower than the rate at which you fix them. Even if this doesn’t get you to ZBB before Feature Complete, the wall you hit at Feature Complete will be shorter.

And finally, you need to differentiate between planned features, feature requests, learning more about the domain you’re operating in, and software bugs. The first two have nothing to do with a ZBP. You can have as many of them as you see fit, and you can track them however you want. The key is that they are NOT bugs. They’re just future work you need to do.

New learning about the domain might or might not be a bug. Learning there’s a better way to do something, or an abstraction you should be using, is not a bug. Finding that your domain model doesn’t match the system you’re trying to model IS a bug and needs to be fixed ASAP.

Simple coding errors are also bugs. First, write a test that fails because of the bug. Then fix the code so that test, and all other existing tests, pass. Again, don’t add those issues to a long-term tracker and wait to fix them. Just fix them now.
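Here’s a minimal sketch of that test-first fix in Python, using the standard `unittest` module. The `average` function, its empty-list bug, and the choice to return 0.0 for an empty list are all hypothetical, made up purely for illustration. The new regression test is written first and fails against the old code; the fix makes it pass while the existing test keeps passing.

```python
import unittest


def average(values):
    """Hypothetical function under test: the mean of a list of numbers.

    The buggy version was `return sum(values) / len(values)`, which raises
    ZeroDivisionError for an empty list. The fix (an assumed behavior, chosen
    for illustration) returns 0.0 for empty input instead.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageTests(unittest.TestCase):
    # Existing test: it must keep passing after the fix.
    def test_average_of_several_values(self):
        self.assertEqual(average([1, 2, 3, 4]), 2.5)

    # Regression test, written first to reproduce the bug.
    # Against the buggy version it fails with ZeroDivisionError.
    def test_average_of_empty_list_is_zero(self):
        self.assertEqual(average([]), 0.0)
```

Run it with `python -m unittest`. The ordering matters: seeing the new test fail first proves it actually captures the bug, and keeping it in the suite afterwards means the same bug can’t quietly come back.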

¹ Only one for the acronyms, but that’s cheating.
² Sometimes you do need to drop everything and fix the problem. Or at least part of the team does. If something changes and your production system goes down, you mitigate it immediately. Similarly, if the bug found is blocking a large portion of the dev team, you might choose to fix it immediately. In most cases, however, you can work the fix in as the next thing.
³ In the long run, putting more effort into how you write code will pay you back, but you can always rent time by taking on technical debt. You just have to pay it back later.
