
by Leon Rosenshein

Indirection Vs. Abstraction

I’ve heard it said that there’s no problem in computer science that can’t be solved by another level of indirection. I’ve also heard that there’s no problem in computer science that can’t be solved by another level of abstraction. That’s the same thing, right? Wrong1.

Let’s start with some definitions.

Abstraction: In software engineering and computer science, abstraction is the process of generalizing concrete details, such as attributes … to focus attention on details of greater importance.

    – Wikipedia

Indirection: In computer programming, an indirection (also called a reference) is a way of referring to something using a name, reference, or container instead of the value itself

    – Wikipedia

Here’s the thing. Abstractions provide simplicity by hiding details. Indirections provide flexibility by introducing a decision point. Both are important. You can’t write effective, resilient, maintainable software without both. And because hiding details means putting something between you and them, every abstraction is also an indirection. The reverse is not true.

Consider dependency injection. It’s an indirection. The goal is to let the developer not care which thing got injected. The operations, the verbs if you will, are defined by the thing on the other side of the indirection. Instead of calling some resource directly, you get told what resource to use. That makes it more flexible. Regardless of which one you get, though, they all need to act exactly the same. Whether you’re writing your data to the production database, a shared staging version, or your own private one, you know you’re writing to a database and the API you use reflects that. You’ve also made the overall system a bit more complex. And you’ve made it harder for the person looking at the code to know where the data was written, because they have to figure out what was injected.
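To make that concrete, here’s a minimal Go sketch of dependency injection as indirection. The package, types, and SQL are invented for illustration; the point is that the verbs come from the injected *sql.DB itself, and nothing in this code tells you which database you’re actually writing to.

```go
package orders

import "database/sql"

// Store writes to whatever *sql.DB it was handed: production, shared
// staging, or your own private instance. Same API either way.
type Store struct {
	db *sql.DB
}

// NewStore is the injection point. The decision about which database
// to use lives with the caller, not here.
func NewStore(db *sql.DB) *Store {
	return &Store{db: db}
}

// Save writes an order. Reading this function alone, you can't tell
// where the data lands. That's the flexibility, and the cost.
func (s *Store) Save(id string, total int) error {
	_, err := s.db.Exec(
		"INSERT INTO orders (id, total) VALUES ($1, $2)", id, total)
	return err
}
```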

An abstraction, on the other hand, changes what you can do with the underlying thing. The goal is to change the mental model of the thing. Instead of the verbs coming directly from the underlying thing(s), the abstraction defines its own operations. Some things that could have been done are not exposed. Some things that can’t be done directly are composed and presented as operations. Instead of the CRUD operations on a database, you might have Load, Save, and Filter on an entire dataset. And as a user of the abstraction, you don’t know (or care) if it’s a database, a bunch of files on disk, or a mechanical turk setup. Outside the abstraction all you need to know is what the operations do. Inside the abstraction, you don’t care how it is used. You’ve reduced the cognitive load of the user, but also reduced the system’s flexibility.
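And here’s a matching sketch of an abstraction, again with invented names. The interface defines its own verbs, and nothing about databases, files on disk, or mechanical turks leaks through.

```go
package dataset

// Record is whatever unit the dataset holds. Illustrative only.
type Record struct {
	ID     string
	Fields map[string]string
}

// Dataset is the mental model the abstraction presents. Note what's
// missing as much as what's there: no Update, no Delete. Operations
// the abstraction chooses not to expose simply don't exist here.
type Dataset interface {
	// Load returns the entire named dataset.
	Load(name string) ([]Record, error)
	// Save replaces the named dataset.
	Save(name string, records []Record) error
	// Filter returns matching records without exposing how, or
	// where, the matching happens.
	Filter(name string, keep func(Record) bool) ([]Record, error)
}
```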

Those two usages are clearly not the same. Yet I hear people using them interchangeably all the time. Not only does that cause communication problems, when people aren’t talking about the same thing, it causes problems in code as well. Particularly when a problem calls for an abstraction, but someone provides a solution that is really an indirection.

Therein lies the problem. If you confuse the two you make things worse. The most common mistake I’ve seen is using an indirection and thinking you’ve created an abstraction. For example, there’s the Enterprise FizzBuzz. That’s about as indirect as you can get, all in the name of abstraction.

On a personal level, I once spent over a year working on a Java project that really embraced Lombok. The theory was that there was an object model that did what we needed and everything was late-bound. In practice, everything was decorators and builders. Indirection everywhere. Tracing the code was almost impossible. Extending a model required changing multiple files that had no visible connection. You didn’t have to connect anything to anything else. But you also didn’t know what you needed to create or modify until things didn’t work. It was a nightmare to work with because it tried to be an abstraction, but all it really was, was unneeded indirection.

I love abstractions. They lower cognitive load, help people do the right things easily, and make it hard to do the wrong thing. I like indirections as well. They provide flexibility and monitoring points. Just like a short link URL lets you move the actual link and not have anyone care, indirection lets you move things without users caring.

However, they are not the same thing, and if you confuse them you’ll eventually realize it. And pay the price.


  1. It’s also not the same as the difference between an abstraction and an interface ↩︎

by Leon Rosenshein

Testing Is More Than Preventing Breakage

Here’s something I ran across on the interwebs the other day. It’s about the reason for testing. I don’t fully agree with the first point, but I definitely agree with the rest.

If your tests only tell you when something breaks, you’re missing the point.

Great tests accelerate learning. They guide design, expose bad assumptions, and make change safe.

Testing isn’t cleanup. It’s engineering. Still shipping without fast feedback? That’s not speed. That’s risk.

– Dave Farley

Using tests to tell you when something breaks is absolutely part of why we have tests. Whether the tests are written first or after, once you have them1 they’ll let you know when a change you made breaks something. That’s real value right there. It might not be the whole point, but it’s part of it.

Dave is right, though, that tests do more than just detect breakage. They help you clarify things. They help make things more concrete. They help you make your interfaces clearer. They validate what you think you know. And they point out the things you know that just ain’t so.

Especially if you write your tests first. You write the tests that expect the code to work correctly. To behave the way you want it to behave. Tests that use an abstraction and mental model that is consistent. Tests that show how the thing is going to be used.

That’s where the learning comes from. Each test builds upon the earlier ones. They let YOU test your mental model of the system before you write any of the code. If the mental model you’re using doesn’t allow you to write a test that has the behavior you want, it’s the wrong mental model. If the behavior you want is hard to get then your model needs more thought. Remember that you are not only allowed, but required to think about the whole system before you write that first test.

Once you have the tests written, all you need to do is write the code. When the tests pass, you’re done.

Yes, it’s not that simple in practice. You probably won’t write all the tests before you write any code. In fact, you shouldn’t. You should write your list of tests first. Then you should write one test2. Once you have a test, write the code needed to make that test pass3. Once the test passes, look at the code and fix it4. Then you’ll move on to the next test and its code. Repeat until you run out of tests to write.
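Here’s what a single turn of that loop might look like in Go, for a made-up Classify function. In real code the test would live in its own _test.go file; the two are shown together here for brevity.

```go
package classify

import "testing"

// RED: this test was written first, before Classify existed, so it
// failed (it didn't even compile).
func TestClassifyNegative(t *testing.T) {
	if got := Classify(-4); got != "negative" {
		t.Errorf("Classify(-4) = %q, want %q", got, "negative")
	}
}

// GREEN: just enough code to make the test above pass. REFACTOR
// comes next, and then the next test on the list.
func Classify(n int) string {
	if n < 0 {
		return "negative"
	}
	return "non-negative"
}
```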

As you add more tests and more code you’ll be learning about the code. You’ll find things that need to be refactored and combined. You’ll realize that some of what you thought you wanted was wrong and you’ll adapt. You’ll optimize for multiple things at once. You’ll make compromises. You’ll do the thinking needed to solve the business problem you need to solve. You know what else you’ve done? You’ve done Test Driven Development.

To reiterate, as Dave says, writing tests isn’t the cleanup you do at the end. It’s not just coding to a spec and checking a box marked ‘Write Tests’. Done right, writing the tests is engineering the system. And when you’re done you’ve not only solved the current problem, you’ve built a system that will reduce the risk of making changes when you learn something new.


  1. Of course, this assumes your tests are well written and validate behavior, not implementation, but that’s a topic for another time. ↩︎

  2. In TDD terms, this is the RED step. At least one test is failing ↩︎

  3. In TDD terms, this is the GREEN step. All the tests pass ↩︎

  4. In TDD terms, this is the REFACTOR step. You learned something about the code so apply that learning. This is also when you’ll add more tests to your list of tests to write ↩︎

by Leon Rosenshein

Stuttering

There are two hard things in computer science: cache invalidation, naming things, and off-by-1 errors. Today’s post is about the middle one of the two.

Naming is hard. Noun clumps1 for data. Verbs2 for functions. Hungarian Notation3 for clarity? Everyone’s got an opinion on what’s right. Even languages have opinions on what’s allowed. Almost all languages allow most of the ASCII characters in identifier names. Some have rules about which characters can come first in the identifier and how long it can be. Others assume certain types for different identifiers (I’m looking at you, FORTRAN).

And some languages are more opinionated than others. APL has its own character set (and keyboard). Python doesn’t care much. Go, the language, is not very opinionated. Go, the ecosystem and the community, on the other hand, are very opinionated.

gofmt, the Go formatter is so opinionated that, unlike every other formatting tool I’ve ever come across, it has ZERO options. You can add rules, but you can’t turn any off. You can’t pick the brace pattern. You can’t pick tabs, spaces, how much to indent each block, or anything else.

It’s simple, it’s fast, and all Go code looks the same. I consider that a win.

Go’s linters are similar. Both vet and golint let you choose which checks to run, but there is no ability to ignore a false positive. You’ve got to fix it or re-write the code to remove the false positive. I consider that mostly a win, but I have seen bad rewrites to make it happy.

Another thing golint will do for you is help prevent what it calls stuttering. According to golint, stuttering is when you have a public method on a struct with the same name as the struct itself. Something like run.Run(). I get that. Stuttering like that is both annoying and ambiguous. When you’re talking about the code and you mention run, are you talking about the struct or the method? You need to be careful when you talk about it, and the person you’re talking to needs to pay close attention to what you’re saying. That adds cognitive load, and part of Go’s design principles was the idea that it would be easy to read. Both for a person and for the compiler. The Go Proverbs really lean into this. Make it easy to understand. A few extra lines or methods is fine.
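Here’s a made-up example of the kind of stutter golint complains about, along with a non-stuttering alternative:

```go
// Package run is an invented example of stuttering.
package run

// Run stutters: callers see run.Run, and once the method below is
// involved, run.Run.Run(). Which "run" are we talking about?
type Run struct{}

// Run makes the ambiguity worse.
func (r Run) Run() {}

// Job doesn't stutter: callers see run.Job and job.Start(), and each
// name means exactly one thing.
type Job struct{}

// Start starts the job.
func (j Job) Start() {}
```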

But it’s not perfect. Sometimes code still stutters. And sometimes it’s hard to read and understand. Especially when there’s lots of text in the output to look at.

All of which is a very long way to say that naming is hard, and when a name stutters it adds to your cognitive load. The higher the cognitive load, the more likely you are to miss something. Missing something simple can cause you a lot of grief.

I was thinking of this because of an issue I tripped over the other day. I misread the name of a test in a test report because of a stuttering issue. Starting from that simple mistake, I spent 4 hours questioning how computers worked, the correctness of a very simple, core feature of our build system, bazel, and my sanity.

You see, bazel has the ability to filter out tests based on the keywords you tag them with. Want to run only tests tagged with needs_network? Just add --test_tag_filters=needs_network to your bazel test command. Want to skip any tests with that tag? Just add --test_tag_filters=-needs_network. Simple and straightforward.

And it seemed to work. I’d been happily marking bad tests with a tag and then filtering them out. Then, all of a sudden, one of them showed up in my list of failures. Why was it running? It was supposed to be filtered out. And when I tried to run just that test with the filter applied, it didn’t run.

However, when I ran all of the tests in that directory with the same filter, it did run. This apparent contradiction led me to 4 hours of questioning myself and computers. I did all sorts of debugging. Printf, tracing, log grepping. And it just didn’t make any sense.

I started asking others if they had seen anything like that. No one had. Until finally, someone (thanx Alex) pointed out the simple thing that I was missing. Our test names stuttered. We had one test called //a/really/long/path//that/leads/to/the/thing:go_default_test4, and one test called //a/really/long/path//that/leads/to/the/thing/thing:go_default_test. I marked //a/really/long/path//that/leads/to/the/thing:go_default_test as a test that should be skipped, and when I tried to run it by itself, it got skipped. Then, when I ran all the tests under //a/really/long/path//that/leads/to/the/…5 and looked at the results, it was still skipped. But I didn’t notice it.

What I did notice was //a/really/long/path//that/leads/to/the/thing/thing:go_default_test. And because the prefix was the same for all the tests, my eye was drawn to the suffix. And I saw /thing:go_default_test, which I wasn’t expecting. It wasn’t supposed to be there. And because I was task focused I completely ignored the prefix. Instead of recognizing that these were two different tests, I thought the tools were broken. And down the rabbit hole I went.

To wrap this up, the real problem was that, because of the stutter, I had marked the wrong test. I was excluding a test that was fine, but running a test that sometimes failed. Then, when debugging it, I never noticed that there were two tests with almost identical names.

The moral of the story? First, beware of target fixation. Instead of looking at the bigger picture, I got stuck looking for how the tools were broken. Second, make it easier on yourself and don’t stutter in the first place. There didn’t need to be two tests that started with the same long prefix and ended with the same medium-sized suffix, with just a tiny difference in the middle. That was a choice we made. We should have chosen better. If the test names weren’t so similar I wouldn’t have missed the obvious.


  1. A bunch of nouns and adjectives that describe a thing. Generally, a good way to name a data item. ↩︎

  2. Methods should be named for what they do. If you find you want to put “and” in the name, you’re probably wrong. ↩︎

  3. The real Hungarian Notation wasn’t about prefixing with base types, it was about prefixing with intent. And that’s not necessarily a bad thing. ↩︎

  4. Not only do we use bazel, we use gazelle, and we’ve been using it for a while, which makes things more complicated. ↩︎

  5. For the uninitiated, that weird notation basically means “all of the tests defined in or below this path”. ↩︎

by Leon Rosenshein

Optimization

I’m traveling on business this week. I’m staying in a hotel, as you do when traveling for work. As I’ve spent time in the hotel, I’ve noticed some very interesting things and learned from them. Just like I’ve learned things from my dryer. One of the things I’ve learned is about different kinds of optimization. Take a look at this photo.

Picture of disposable silverware and soap from a hotel stay

Look at that bar of soap. It’s got three holes all the way through it. It’s maybe half soap. That’s just the hotel being cheap, right?

Look at the silverware. They’ve all got big holes in the middle of the shaft. Only the ends of the silverware are covered. That’s just the hotel being cheap, right?

Wrong. That’s optimization. The question is, what are they optimizing for? Sure, cost is one of the things being optimized for. But it’s not the only thing. There are multiple other things. Like lifespan. Usefulness. Material used. Waste. And of course, customer satisfaction.

It’s not just any one of those things. Instead, it’s maximizing the collective impact of all those things. It’s got to be low cost, because the hotel goes through hundreds of thousands of them. If they’re too expensive, the price of the room goes up. If there’s too much material used then there’s more waste, which takes up more space and costs more to handle, which also drives up the cost of a room. On the other hand, if the silverware is made with too little material, it bends or breaks too easily and can’t be used.

The bar of soap needs to last for a few days, maybe a week. But that’s not a lot of soap. If it were solid, it would be too small to be used comfortably. The machine to make a bar of soap with holes in it is more complicated than one that makes a smaller solid bar. It’s more expensive to operate. But customer satisfaction is important.

The design of the silverware and the soap takes all that into account. They’re not the cheapest they could be. They’re not the best experience they could be. But they’re good enough. And they’re cheap enough. They’re not optimized for long term home use, and they’re not optimized for camping or restaurant use. They’re optimized for short stays in a hotel.

Software is similar. It’s optimized for use in a specific context. Kubernetes is great if you need to orchestrate the deployment of hundreds of different workloads with multiple instances of each across thousands of computers. But if you’re hosting this static website then Kubernetes is a bad idea. Need to store a dozen rows for a database? A simple text file can work. If it’s a complicated schema then maybe JSON. You don’t need, or want, a Postgres or Cassandra database. On the other hand, if you need to store data related to millions of images with thousands being added every day, Postgres might be what you’re looking for.
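As a sketch of the small end of that spectrum, here’s the entire “database” for the dozen-row case in Go. The file name and schema are invented; the point is how little machinery the context actually calls for.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Row is the entire schema. If this grows complicated enough, it's
// time to revisit the storage decision.
type Row struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

func main() {
	// Read and parse the whole "database" in two calls. No server,
	// no connection pool, no migrations.
	data, err := os.ReadFile("rows.json")
	if err != nil {
		panic(err)
	}
	var rows []Row
	if err := json.Unmarshal(data, &rows); err != nil {
		panic(err)
	}
	fmt.Printf("loaded %d rows\n", len(rows))
}
```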

And it’s not just in choice of technology. It applies to languages, architecture, and how you deploy things as well. These are all decisions we make on purpose.

Or at least we should be doing that. That’s what software engineering is really all about. Understanding not just the problem, but the context, the situation, and the environment, that you are solving the problem in. Then finding the combination of parameters that optimizes the value produced compared to the cost of production. I can’t tell you the right answer to all of those questions without a lot more context.

But I can tell you that the answer isn’t as simple as you think. And that the answer will change as the conditions change. So be prepared.

by Leon Rosenshein

Writing Legacy Code

One of the people I regularly read is Tim Ottinger. His writing has either put into words things that I’ve felt but hadn’t figured out how to say, or said things that I’ve said, but in a much clearer/more powerful way than I have. His recent post is one of the former.

One of the dominant, less-disciplined, processes that programmers follow is:

  1. Write a bunch of code (being very careful, of course)
  2. Edit it until it will compile.
  3. Debug it until it gets the right answers.
  4. Look it over for other errors and correct them.
  5. Send it off to code review

If they write tests, they usually do so between (4) and (5).

Notice that by performing steps 1-4 first, they have placed themselves in the legacy code situation.

You can read the long form version on his website. That article has a lot to say about TDD, and, as usual, I agree with all of it.

But what I really want to talk about here is New Legacy Code.


Or more specifically, the fact that so often, we do it to ourselves. I see it all the time. Code that gets more and more complex as time goes on. Where the lines between domains blur and functions start to do very different things depending on what parameters they’re called with. Where abstractions start leaking details up and down the call stack. Making changes (and testing) harder and harder as time goes on.

It starts with little things. Someone adds a special case based on one or two parameters, or if we’re in some particular state then skip a step. After a while the code starts looking like an arrow with all the nested conditionals. Instead of refactoring to isolate things, we make new connections to a database or call time.Now() directly. We take a function that did the one thing its name described, then add things to it until a better name would be doTheOldThingThenDoTheNewThingUnlessConditionXorYorZ().
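Here’s an invented example of what that looks like after a few of those special cases land:

```go
package orders

import "errors"

// Order is a placeholder type for this made-up example.
type Order struct {
	Valid   bool
	VIP     bool
	IsRetry bool
}

// process has the arrow shape: every special case someone bolted on
// pushed the happy path one level further right. The next condition
// will nest one level deeper still.
func process(o Order) error {
	if o.Valid {
		if !o.IsRetry {
			if o.VIP {
				// the newest special case lives here, three deep
				return nil
			}
			return nil
		}
		return nil
	}
	return errors.New("invalid order")
}
```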

I’m guilty of it. I get impatient. I have a problem. I have an idea for a solution. I start working on the solution. And of course, it’s not quite right, and I find myself in the legacy code situation. It doesn’t really feel like legacy code because at that moment I’ve got it all in my head, so making changes with context is easy. But months later, or even the next day, when I come back to it, all that context is gone. And I need to back my way into it.

It is undisciplined. I know better. Usually I remember and do something about it, but sometimes I forget. Sometimes I get lazy and choose not to. Either way, I always regret it when I don’t.

So go read Tim’s thoughts in his own words.

And remember that much of the pain you’re feeling when working without a net is self-inflicted.

by Leon Rosenshein

Who Do You Love?

On the topic of leading with the why, emotions are important. Two of the strongest emotions are love and its not-quite opposite, hate1. Three of the biggest drivers of innovation are necessity, laziness, and love. If you want a strong driver of innovation, love is a pretty good choice.

Like George Thorogood asked, the question is, Who Do You Love?. Answer that question and you’re on your way. Get the whole team loving the same thing and you’ll be amazed at what you create. So, who should you love? In Inspired: How to Create Products Customers Love, Marty Cagan gives us the answer.

Fall in love with the customer’s problem, not your solution.

That’s it. It’s that simple. Love (and hate) your customer’s problem. Be passionate about it. If that problem is the most important thing in your universe, and you hate it so much you want to eliminate it, your customer is going to be surprised and delighted.

It means understanding the domain and the problem so well that you understand not just what happens, but why. You understand the fundamentals and the theory behind them. You know not just what the problem is, but why it exists.

What you don’t want to do is fall in love with your solution. Yes, you should deeply understand your solution. What it is, how it works, and why it works.

But you also need to remember that your solution is not the important thing. It’s just a means to an end. A way to eliminate the customer’s problem.

The problem with loving your solution is that it becomes the important thing. You start seeing it as the solution to not only the customer’s specific problem, but the reason you’re doing the work in the first place. And that’s just wrong.

When the solution is the most important thing, you start rephrasing the problem to match the solution. You stop delighting your customer. Instead, you start teaching your customer why they’re misunderstanding their problem, and why they should listen to you about what the problem really is2. And that’s the tail wagging the dog.

When you and your team share a why of love (and hate) for the customer’s problem, you’ve tapped into some of the most powerful emotions and drivers of innovation.

Which will result in delivering maximum value for the customer.


  1. Yes, yes, yes, love and hate are in many ways opposites. However, the opposite of love is also indifference. Love and hate are strong emotions directed at a person/idea. The opposite of a strong emotion is no emotion, indifference. ↩︎

  2. Loving the problem doesn’t mean you’ll never have to educate your customers. Very often your outside view gives you an insight that customers are too close to the problem to see. The difference is that you’re not educating them on how wonderful your solution is, you’re helping them understand the problem better. After that, the solution comes naturally. ↩︎

by Leon Rosenshein

Practice Makes Perfect?

As I heard in marching band, don’t practice until you get it right, practice until you can’t get it wrong. It was certainly true there. It’s mostly true in Software Development too. Not that we’re doing the same thing, day in and day out, like a marching band, but there are a lot of processes we practice every day, and getting better at those processes, learning them so well that we can’t get them wrong, is a good thing, right?

Well, It Depends. Generally, it’s better to improve your understanding of, and ability to use, those processes. But not always. Consider this quote:

Do you want to get better at what you’re doing, or find a better way to get the results you want?

That’s a pretty powerful question. It cuts right to the heart of the matter. As I often ask, “What are you really trying to do?” Your goal is to add value. As much value as you can over time. Sometimes you can do that by getting better at doing what you’re doing. But sometimes, you can add even more value by rethinking the situation and approaching it differently1.

Here’s an example for you. Over the years I’ve built about half a dozen batch/parallel processing systems. The first few were very bespoke. They were ad-hoc, distributed build systems for game assets. Turning complex 3D models built with expensive modeling tools into runtime versions that could be rendered in-game quickly. No options on where sources would be found or where to put results. Run one command and hope it worked. Retry it if it didn’t. That was about it.

Over the years they got more generic. Multi-step pipelines. Then map-reduce like things. We added some error handling. We figured out how to do steps that involved people. We started to track the data so we could re-run things based on changed data, automatically. We got better and better at handling different teams’ specific needs and workflows. But through all that, changes to the pipeline required changes to the system, and only the framework team really understood the system. That bottlenecked the overall system.

Then we had an epiphany. We were getting better at understanding our customers and their needs, but we would never understand them as well as they did. No matter how much better we got, we were rapidly approaching a capability limit.

So instead of getting more detailed, we got more abstract. Instead of doing things for our users, we gave them more power and let them do things themselves.

We gave our users the ability to write their own control logic. They could look at their data, do some thinking and planning, then tell the processing system what their data and processing flow looked like. All we had to do was do what they wanted. And do that well.
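In spirit, the user-facing piece looked something like this Go sketch. The names and shapes are invented for illustration, not the real system’s API, but the division of labor is the point.

```go
package pipeline

// Task is one node in a user-defined processing graph: what to run,
// and which tasks feed it.
type Task struct {
	Name      string
	Cmd       []string
	DependsOn []string
}

// Graph is what users hand to the platform. They own the control
// logic and the data flow.
type Graph struct {
	Tasks []Task
}

// Submit hands the graph to the scheduler. Turning it into an
// optimal (ish) use of hardware and network is the platform's
// domain, not the user's.
func Submit(g Graph) error {
	// Illustrative stub: a real implementation would validate the
	// graph, topologically sort it, and place each task.
	return nil
}
```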

Instead of expanding our responsibility to include someone else’s domain and doing an OK job, we restricted our domain to what we knew best. Turning a graph of data processing tasks into an optimal (ish) usage of hardware and network resources. So that our customers could do what they needed to do, in their domain.

That’s just one example of how re-imagining the situation and really solving the user’s problem is what we want to do. There was no way we could be good enough at everyone’s domain and be the best at our domain at the same time. Getting better at understanding other teams’ domains helped, but there’s a limit to how well we could understand them. There were multiple teams that we needed to support at the same time. We just couldn’t do all of it well.

So, instead of trying to get better at the wrong thing, we made sure we understood the other teams well enough to know what they needed, then we focused on our strength2. That’s what let us add even more value.

Even more problematic, there’s often no path that keeps adding value while you switch from getting better at what you’re doing to getting good at doing something more valuable3.


  1. For those mathematically inclined, it’s the difference between a local maximum and a global maximum. ↩︎

  2. More on this later, but for those interested, check out Now Discover Your Strengths ↩︎

  3. Step changes (0 -> 1) are hard. But that’s a topic for another day. ↩︎

by Leon Rosenshein

Power Dynamics

Positional power is an interesting thing. The HIPPO effect is real. When you’re in a position of power, by definition, you have that power. Whether you’re aware of it or not. Whether you use it intentionally or not.

Of course, with great power comes great responsibility. One of the hardest things to remember is that because of that power, people don’t always hear things the way they are intended. And as the person who’s trying to communicate something, it’s on you to make sure the message gets across. The misinterpretation is your fault, not theirs.

Cartoon from workchronicles.com. Frame 1: Boss makes an offhand remark about color on a website. Frame 2: Team hears the comment. Frame 3: Team assumes there’s a problem with the color and starts a deep investigation. Frame 4: Two weeks later team presents the results of their research. The boss has no idea what they’re talking about or why they did it.

One of the most famous examples of this is from the movie Becket, where King Henry II exclaims “Will no one rid me of this meddlesome priest”, and the next day, the priest is dead. While this almost certainly didn’t happen exactly as depicted in the film, it’s not hard to imagine that something similar did. Did the King order his friend’s death? Not directly. However, if the King of England, arguably the most powerful man in Europe at the time, complained about something, it’s not at all surprising that his faithful followers did something about it.

Medieval England is not the only place, or time, that happened. It happens all the time. The less contact a person has with someone with significantly more positional power1, or the bigger the power differential, the more likely the subordinate is to listen very closely to the words spoken and do something about it. Whether the person speaking is a King, a General, or just someone with strong influence over your paycheck.

So as the person with that kind of power, it’s critical to think about how your words are heard. That doesn’t mean you shouldn’t say anything. That doesn’t mean you shouldn’t be relaxed and make comments or say what you think. It does, however, mean that you need to think about how your words are perceived. And you need to be explicit about the difference between orders, suggestions, questions, and personal opinions.

I learned this the hard way myself, back when I was a new manager. I asked one of the developers on my team why they had chosen to implement things in a new way when we already had an implementation that was very close. I commented that I would have probably just tweaked the existing functionality. At the time, to me, it was just a comment.

The next Monday I got to the office to find that the developer had done a major refactor of the old code to support the existing functionality and be able to handle the new situation. The new code was good. Clean boundaries, no repetition, and very flexible. And over the next 4 years we never used that flexibility. We did occasionally have to go back in and separate the functionality even further. The developer gave up a weekend of their personal time to make a change that wasn’t needed, and in fact, cost us time later. All from a casual comment I made.

Because without enough context, it’s hard to distinguish between a personal opinion, a suggestion, and explicit direction. And in the absence of clarity, people often defer to power.


  1. It’s not just positional power that can skew interpretation. Situational power can do the same thing, as can reputation. Or even volume. ↩︎

by Leon Rosenshein

Exsqueeze Me?

With all due respect to Mike Myers as Wayne Campbell, I saw something on the internet and the only possible response was Exsqueeze Me?. The quote started out OK, not great, but OK. Then, right there at the end, it took a sharp left into crazy town.

Good teams can and will delete tests that have high false positive rates – or that never fail.

Here’s the thing. If you’ve got a test with a high false positive rate, that’s bad. Flaky tests are very bad. Bad in many dimensions. To list just a few of them,

  • They waste your time: Every time a test fails, you’re expected to go analyze what failed and why. Then figure out how to prevent that specific issue. However, if you go analyze the failure and it turns out there really wasn’t a problem, you’ve wasted however long it took you to figure out there wasn’t a problem.

  • Failing tests become business as usual: It’s the broken window effect. When no tests are failing then a sudden failing test is noteworthy. When you have a test that randomly fails then an additional failing test isn’t nearly as noticeable. If it wasn’t important enough to fix the only failing test, then each additional failing test is that much less important.

  • Alert (or Alarm) Fatigue is real: When the siren sounds it’s supposed to be unusual and noteworthy. If your alarm is going off all the time then it’s not an alarm, it’s just life. Just like the boy who cried wolf, if the alarm keeps going off and there’s nothing you can, should, or must do, you start to ignore it.

  • Flaky tests indicate a lack of understanding: It could be a lack of understanding of the domain, the environment, the test setup, or any combination of those three. If you don’t understand the system and situation in this specific case, what else aren’t you understanding? What are you missing that’s going to cause you problems later on?

That’s just some of the reasons flaky tests are bad. Deleting them isn’t the worst thing you could do, and it will fix the first three problems above, but it doesn’t do anything to fix the fourth. In fact, it just hides the problem. Ignoring a problem rarely makes it go away.

Therefore, most of the quote is almost correct. Instead of just removing a flaky test, a much better response is to fix the test so that it’s not flaky. It could be a bug in the code, a bug in the test, a problem with your test methodology, or a lack of understanding. Whichever it is, once you make the test pass consistently, you’re in much better shape. You don’t waste time. You’re incentivized to keep things clean. Alerts mean something. You understand your situation that much better. Which means you get to sleep that much better at night.
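As one example of that kind of fix, here’s an invented Go sketch of a test that would have been flaky if the code called time.Now() internally, made deterministic by passing the clock in:

```go
package expiry

import (
	"testing"
	"time"
)

// Expired reports whether a deadline has passed, relative to a
// caller-supplied "now". Taking now as a parameter instead of
// calling time.Now() inside is what makes the behavior
// deterministic.
func Expired(deadline, now time.Time) bool {
	return now.After(deadline)
}

// The test controls the clock, so it can't flake on a slow or busy
// machine the way a time.Now()-based version could.
func TestExpired(t *testing.T) {
	now := time.Date(2024, 1, 2, 3, 4, 5, 0, time.UTC)
	if Expired(now.Add(time.Minute), now) {
		t.Error("future deadline reported as expired")
	}
	if !Expired(now.Add(-time.Minute), now) {
		t.Error("past deadline reported as not expired")
	}
}
```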

It’s the last part of the quote that’s just plain WRONG. Some will say that a test that never fails serves no purpose and is wasting resources. Time. Bandwidth. Cognitive load. For no measurable benefit. They haven’t stopped a single bug from getting through.

Those facts are real. All tests take time, bandwidth, and add some amount of cognitive load to the developers. But all of that, for all of your unit tests, should be minimal. If they’re not minimal then you have other problems (bad tests) you should fix1.

Just because a test hasn’t caught a bug yet, you can’t know that it won’t ever catch a bug. Even if no-one is changing the code directly, those tests can still help keep you safe. They do things like:

  • Protect against changes in dependencies: Dependencies outside of your control can change. Those changes can make your code break. If you don’t test, you don’t know.

  • Protect against environmental changes: There are lots of things in the environment that can change. Networks come and go. Clock speed changes. Processors get replaced and new ones show up. There can be subtle differences in the environment. If you don’t test, you don’t know.

  • Protect against bugs in tooling changes: Similarly, tools change. Runtime environments, compilers, and interpreters can change. Are you relying on undefined behavior? That can change without you knowing it. If you don’t test, you don’t know.

  • Provide examples of how to use the code being tested: Tests are great examples. They can be documentation. They can be used as a learning environment. They can be a reference design.

  • Acknowledge Hyrum’s Law: Given enough time and users, everything your code does, intended or not, is going to be relied upon by someone. You never want to change behavior on your users without knowing about it. That is not how you want to surprise your users. (There’s a sketch of one such test just after this list.)

  • Prevent bugs in the code under test: Finally, and certainly not the least important, you never know when a test is going to show you that you’ve broken something. Past performance is a good indicator of future performance, but it is not a guarantee. If you don’t test, you don’t know.
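Here’s a sketch of one such never-failing test in Go. It pins a well-known, stable behavior, so it should never fail. If a toolchain, platform, or dependency change ever does make it fail, that’s exactly the surprise you wanted to catch before your users did.

```go
package digest

import (
	"crypto/sha256"
	"encoding/hex"
	"testing"
)

// This test has, presumably, never failed. It still earns its keep:
// if a platform, toolchain, or dependency change ever alters this
// observable behavior, it fails loudly, and per Hyrum's Law someone
// somewhere was relying on the old behavior.
func TestDigestIsStable(t *testing.T) {
	sum := sha256.Sum256([]byte("hello"))
	got := hex.EncodeToString(sum[:])
	want := "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
	if got != want {
		t.Fatalf("sha256(hello) = %s, want %s", got, want)
	}
}
```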

And that’s why the only possible response to someone saying you should delete tests that never fail is Exsqueeze Me?


  1. Those tests might not be flaky, but they can be bad for many of the same reasons. And you should fix them, not delete them. But that’s a slightly different post. ↩︎

by Leon Rosenshein

The Messy Middle

I’ve talked about a messy middle before, but that’s not the only messy middle. That one was about the land of It Depends. Where you can’t know the answer until you have all of the context. But there are other messy middles.

Consider the space between products and platforms. The split between what customers bought, and what the company built internally so that they could then build the things that customers bought. In almost all cases people know about the products, but almost no-one outside the company knows about the platforms.

A software layer diagram showing two distinct layers, product and platform, and a messy middle blurring the interface of the two layers

Not every company has this split. There have to be enough products, or at least enough engineers, for that to happen. At some point the products and the platform become not just different things to work on, they become entirely different organizations within a single company.

It’s something you don’t see when working at small, single-product companies. You often don’t see it at companies with only a few products. Back when I was working for Microprose, the company was working on multiple games at any given time, but each game development team was completely self-contained. There were a few people who worked across games, but the code for each one was essentially unique. Everything was a product.

The first place I really saw the split was at Microsoft, where there was a Windows team, that was the platform for everything else, a few other big teams, like Office, Developer Division, and SQL, and a bunch of smaller orgs, like Online, Home, Games, and Research. The odd thing there was that Windows was also a product itself.

At the time I was on the Product side, working on games, and didn’t pay too much attention to the platform side of the world. I only cared about what APIs it provided. Also, we were such a tiny part of the Microsoft world that Windows, our platform, might as well have been made by a different company. We could ask a few questions that outsiders couldn’t, but we didn’t have much influence. The Office and Developer Division teams had that.

Then I moved to Uber. When I started at Uber they had just gone through their split. By that time I was on the platform side, working deep in the engine room. At that time Uber was already big enough that we even saw differences between real-time and offline/batch processing.

You see, down in the engine room we spent a lot of time building a product agnostic processing engine. We didn’t care what folks did with our CPUs and persistent storage. We abstracted away the complexity of scheduling and placement engines and let our customers focus on data and control flow. And that’s where I ran head-first into another messy middle.

Besides making sure the system worked and was easy to use, I spent a lot of my time as an evangelist. Showing countless other teams how they could benefit from our platform. The more success I had, the more I saw that there was at least one missing layer. I saw multiple teams creating their own internal platforms, based on our platform.

Being closer to the product, those platforms were more opinionated than ours. Because they knew what their 2-5 teams were doing and didn’t care about the other 20 or 30 teams using the base platform, they were able to make assumptions and build decisions into their tools.

That’s where it started to get messy. Those little infrastructure teams didn’t really know about or talk to each other. They started to duplicate work. Or at least solve similar problems. Not just each other’s, but problems that we were solving. The line between the different teams and the platform ebbed and flowed as problems were identified and solved.

It was a bit of a surprise, but it wasn’t all bad. Yes, it was messy. Yes, there was some duplication. But there was also a lot of advancement. By not forcing groups of teams to wait for a shared solution those teams were able to move faster. By waiting for consensus on the solution to form and move into the platform we avoided churn. By consolidating known shared work we freed up resources for product specific work.

We turned what could have been a bottleneck into a competitive advantage. Through lots of communication. Of problems. Of intent. Of timelines. By keeping the cross coupling down. By encouraging teams with similar context to make those specialized solutions that kept domain specific work with the domain. By taking the common parts of those domain specific solutions and commoditizing them. By recognizing that one team’s platform is another team’s product.