Recent Posts (page 2 / 63)

by Leon Rosenshein

Seeing Like a State

I was going to compare and contrast Scott’s Seeing Like A State with Brand’s How Buildings Learn, but when I went to find the link to what I wrote, I realized that How Buildings Learn is going to have to wait, because, somehow, I haven’t directly talked about Seeing Like A State. I have mentioned legibility though, which is directionally similar.

In Seeing Like a State, Scott talks about the tendency of the state, really any large organization, to want to be able to measure, record, and control a system. Making a thing measurable means making it possible to record it in a ledger. The organization (state) also has a model that it uses to predict the future. If you combine the record of how things were with the model of how things will be, it’s not a big leap to believing you can control the future by controlling the measurements. And if you’ve made that leap, you get to feel good about things. You have predictability. The model tells you what to expect. You have agency. Your results are the inputs to the model, so you have direct control over the results.

Unfortunately, things almost never work out that way. Models are, at best, approximations, so their results are at best approximations of the real world. The measurements that go into the model are often approximations as well. And when they’re not, they’re samples taken at a specific point in time, in a specific context. You can guess what happens when you use approximations as inputs to a model that is itself an approximation. You get a prediction that sometimes bears some similarity to reality, but very often doesn’t. You often run into the cobra effect.

This applies to software development as much as it applies to government. As much as software development is about making complex systems out of tiny parts that each do one thing, it’s also a social activity. Just like states and other large organizations, you can’t predict the output of software development without accounting for the people involved, each with their own internal thoughts and motivations. And while those things are generally qualitatively knowable, until someone like Hari Seldon arrives and gives us psychohistory, they’re not going to be legible.

Which means that the key takeaway from Seeing Like A State is not that you can measure and predict the future, but that you can’t. Or at least, you can’t predict to the level of precision and accuracy that you think you can. But that doesn’t mean you shouldn’t measure, or that you shouldn’t use models to predict. It just means you need to be much more thoughtful about it. You need to work with the system, from the inside. It’s much more about Governing the Commons than seeing like a state. But that, like How Buildings Learn, is a topic for another day.

by Leon Rosenshein

Code Coverage Is NOT useless

Mini rant today. There are lots of teams across the software industry that are called some variation of “Software Quality”. That’s a lovely term. It means different things to different people. There are (at least) two kinds of quality at play here. Internal software quality (ISQ) and external software quality (ESQ). ESQ is about correctness and suitability for the task at hand. ISQ is about the code itself, not whether or not it works as specified. Not all quality teams are responsible for both kinds of quality.

Furthermore, as much as people want it to mean that the team called “Software Quality” is responsible for ensuring that the entire org builds software with both internal and external quality, that isn’t the case. Those teams are not, and cannot be, responsible for what others do. After all, they’re not the ones writing the code. What they can do, and generally do, is define and promote good practices and, especially, point out places in the codebase where the code misses the mark.

There are two very important points in that last sentence. The first is that the quality team’s job is to identify where the code misses the mark, NOT where the developers do. Code ownership is important, and people write the code, but it’s important to distinguish between problems with code and process and problems with people. That, however, is a topic for another time.

The other point, and where I’m going with today’s post, is the pointing out part. The quality team’s job is to point out, with comparable, if not truly objective values, how much ISQ the code has. There are lots of ways to do that. Things like cyclomatic complexity, lint/static analysis warnings, code sanitizer checks, or code coverage percentages. Those measures are very objective. There are X lint errors. Your tests execute Y percent of your codebase and cover Z percent of the branch decisions. And you can track those numbers over time. Are they getting closer to your goal or further? You can argue the value of all of those metrics, but they’re (relatively) easy to calculate, so they’re easy to report and track.

Which, finally, gets us to today’s rant. I ran across this article that says code coverage is a useless metric. I have a real problem with that. I’m more than happy to discuss the value of code coverage metrics with anyone. I know that you can have 100% code coverage and still have bugs. It’s easy to get to a fairly high percentage of code coverage without the tests saying anything about correctness. In complex systems with significant amounts of emergent behavior it’s even harder to get correctness from low-level unit tests. Just look at that article.
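
To make that concrete, here’s a minimal Go sketch (the package, function, and test are invented for illustration). Running `go test -cover` on this package reports 100% statement coverage, and the bug is still there:

```go
// calendar.go
package calendar

// IsLeapYear is meant to report whether year is a leap year.
// BUG: century years not divisible by 400 (1900, 2100) are not
// leap years, but this version says they are.
func IsLeapYear(year int) bool {
	return year%4 == 0
}
```

```go
// calendar_test.go
package calendar

import "testing"

// This single case executes every statement in IsLeapYear, so
// coverage is 100%. IsLeapYear(1900) is still wrong.
func TestIsLeapYear(t *testing.T) {
	if !IsLeapYear(2024) {
		t.Error("2024 should be a leap year")
	}
}
```

Perfect coverage, wrong answer. The metric isn’t lying, it’s just answering a narrower question than “is the code correct?”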

What bothers me most about that article is the click-baity title and the initial premise. It starts from “Because it’s possible for a bad (or at least uncaring) actor to get great coverage and not find bugs, coverage metrics are useless.” If you have that approach to management, you’re going to get what you measure. To me, code coverage is a signal. A signal you need to balance with all of the other signals. Letting one signal overpower all the others hides the truth. And like any useful signal, its absence is just as enlightening as its presence. If you have a test suite that you think fully exercises your API and there are large areas of code without coverage, why do you even have that code? If you really don’t need it, remove it. Maybe your domain breakdown is wrong and it belongs somewhere else? Should it be moved? If you find that there are swaths of code that are untestable because you can’t craft inputs that exercise them, do you need a refactor? Is this an opportunity for dependency injection, as in the sketch below?
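
On that last question, here’s one shape the refactor often takes in Go. This is a sketch, not a prescription; the Clock interface and all of the names are invented for illustration:

```go
package report

import "time"

// Clock abstracts the time source. Production code injects
// realClock; a test injects a stub that returns whatever date it
// needs, making the branches below reachable from a unit test.
type Clock interface {
	Now() time.Time
}

type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// IsQuarterEnd was effectively untestable when it called
// time.Now() directly. With an injected Clock, a test can drive
// every branch, and the mysterious coverage gap disappears.
func IsQuarterEnd(c Clock) bool {
	switch c.Now().Month() {
	case time.March, time.June, time.September, time.December:
		return true
	default:
		return false
	}
}
```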

So the next time someone tells you that code coverage is a useless metric, maybe the problem isn’t the metric, it’s how they’re using code coverage. That’s an opportunity for education, and that’s always a good thing.

by Leon Rosenshein

Fly Like An Eagle

I’ve talked about time before. It passes, and there’s not much you can do about that. Even in a simulator, time passes. That adds a lot of complexity. Especially in keeping track of things. And when they happened. And when you found out about them. And when someone asks you about it.

I’ve talked about time and dates being hard to deal with before. Then there’s the winter solstice, which merges time, dates, durations, and the English language. You end up with something that is hard to track, hard to talk about, and, more to the point, hard to reason with and hard to program for.

Even if you go with the standard unidirectional time, there are still a lot of things to keep track of. There’s the very simple side of it: what happened and when. I started working at Aurora in January 2021. That’s straightforward. I stopped working there in February of 2022. Also straightforward. It’s pretty easy to keep track of that. That means I worked at Aurora for just over 13 months. Very simple. But it gets more complicated. I also started working at Aurora in May of 2023. So I have two start dates. And I’ve worked for Aurora for almost 15 months. It’s also been almost 30 months since I started working for Aurora. So that’s a great example of how knowing a start date isn’t nearly enough to really know what happened.

Another place where time isn’t as simple as it looks is time zones and things like daylight saving time. In the United States, twice a year, in most places (but not all), the clocks change, either forward or backward. Outside of those times, it’s easy to tell what time it is, but during those missing/added hours it gets a little odd. Add in time zones and it gets harder.
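
Here’s a small Go sketch of how that missing hour shows up in code. The date and zone are just for illustration; March 12, 2023 was the US spring-forward date:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	loc, err := time.LoadLocation("America/Denver")
	if err != nil {
		panic(err)
	}
	// Noon on the day before the 2023 US spring-forward change.
	start := time.Date(2023, time.March, 11, 12, 0, 0, 0, loc)

	// Add works on the absolute timeline: 24 real hours later is
	// 1:00 PM local, because the wall clocks skipped an hour.
	fmt.Println(start.Add(24 * time.Hour)) // 2023-03-12 13:00:00 -0600 MDT

	// AddDate works on the wall clock: "same time tomorrow" is
	// noon again, even though only 23 real hours have passed.
	fmt.Println(start.AddDate(0, 0, 1)) // 2023-03-12 12:00:00 -0600 MDT
}
```

Same starting point, two perfectly reasonable definitions of “a day later,” and they disagree by exactly the skipped hour.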

What makes it even worse, is that the rules for time zones and daylight savings time change. So if you ask when daylight savings time ends for a given year, you first need to find the rules that were in effect on that date, in that location. Which is very likely to be different than what the current rules are for your location. As for asking about the future, you can make a prediction, but you won’t know for sure until it happens.

Another thing you need to keep track of is the difference between when something happens and when you find out about it. One of the more common places this shows up is payroll. On any given day you have a pay rate. It might be hourly, weekly, monthly, annual, or even a percentage of something else, like sales. That’s simple (except for things like durations and daylight saving time). But what happens when there’s a change to the pay rate? Sometimes it’s forward-looking, and that’s not too bad. On some specified future date the rate changes, so when the date arrives you change the calculation, and all is well.

But what happens when it’s a retroactive change? As of the first of last month, your new rate is 5% higher. Now you need to go back and calculate a new payment, subtract what was paid, then pay the delta. Again, not too bad, as long as you remember to do it. Consider this though. On Aug 1st you’re told that as of June 1st your pay rate has been increased. Great. Congratulations. But you applied for a new mortgage on July 1st and you told them your pay was X. You were being honest, but on Aug 1st you find out that you were wrong. Does it matter? Maybe. Maybe not. But it’s real, and it happens. So you need to keep track of what happens, when it happens (takes effect), and when you found out about it. Because all of those things change the answer you’ll give when asked a question with a temporal component.
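
This is the textbook case for bitemporal data: one timeline for when a fact is true, another for when you learned it. Here’s a minimal Go sketch (the types and field names are mine, not any particular library’s):

```go
package payroll

import "time"

// PayRate is a bitemporal fact. ValidFrom is when the rate takes
// effect in the real world; RecordedAt is when we found out about
// it. History is append-only: a retroactive raise is a new entry
// with an old ValidFrom and a recent RecordedAt.
type PayRate struct {
	Cents      int       // the rate itself
	ValidFrom  time.Time // when it takes effect
	RecordedAt time.Time // when it entered the system
}

// RateAsOf answers: using only what we knew as of `known`, what
// rate was in effect at `valid`? Asking with known = July 1st
// reproduces the honest-but-wrong mortgage answer; asking with
// known = today reflects the retroactive raise.
func RateAsOf(history []PayRate, valid, known time.Time) (PayRate, bool) {
	var best PayRate
	found := false
	for _, r := range history {
		if r.RecordedAt.After(known) || r.ValidFrom.After(valid) {
			continue // not yet known, or not yet in effect
		}
		if !found || r.ValidFrom.After(best.ValidFrom) ||
			(r.ValidFrom.Equal(best.ValidFrom) && r.RecordedAt.After(best.RecordedAt)) {
			best, found = r, true
		}
	}
	return best, found
}
```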

As much as we’d like to make Time Stand Still, as Steve Miller said, time keeps on slippin’, slippin’, slippin’ into the future. We’d at least like time to be linear, monotonic, and never go backwards, but that’s not the way things are. At least not as people experience it. To physics, time might always go forward at a constant rate, but to the people who live it, things aren’t as simple. Things happen in their own time. We find out about them in our own time. Sometimes right as it’s happening. Sometimes before, so we can plan for it. And sometimes long after things happen. And we need to keep track of all of that.

by Leon Rosenshein

Milestones Vs. Steppingstones

In software, we’re very familiar with the idea of a milestone, but not steppingstones. Which is odd, because the two terms are very similar. Where does the term come from? Like many things in the western world, the term milestone comes from the Roman Empire. The Roman Empire did lots of things throughout Europe and Asia. Some good, some bad. One thing they did really well was build roads. Good, solid roads that you could count on to get you from here to there, regardless of the season or weather. You also knew where you were, because they put milestones along the road. At fixed, well-known intervals (every mile) along the major roads was a marker, a milestone, that you could use to know how much progress you had made.

These days we have mile markers along our major roads, not actual stones, but we still use the term. In projects we use the term to mark significant points along the project’s journey from start to finish. They’re usually big, complex, demo-able things with fixed dates. They can be pretty important. They are almost always something fairly concrete and definable, in a domain the user of your software can understand.

Steppingstones, on the other hand, aren’t something we talk about much. While milestones are the markers along the way that let us know how far we’ve come, steppingstones are the little increments you use as you proceed from milestone to milestone. They’re solid, well-anchored, stable places you can step to along the way. They usually help you avoid falling into the water or sinking into the mud, but you can use steppingstones any time you need a place along the way to keep from making a mess or getting stuck.

In software we love to talk in analogies. To the stakeholders, the people who are not closely involved in the development of the software but are responsible for ensuring the project succeeds, and often also responsible for providing resources, milestones often get used to provide confidence. Confidence that things are proceeding at the expected pace, that the result will be something like what they’re expecting, and that it will arrive on the date it’s expected.

For those directly working on the project, the implementors, milestones provide a goal to work towards. They explain, in generally plain language, the functionality that someone who isn’t deeply involved with the project is supposed to be able to see. They’re a little bit squishy, because they don’t describe all of the possible edge cases, problems, and oddities along the way, but that’s a good thing. They let you figure out how to meet the requirements. And when the requirements don’t make sense, they give an opportunity and a forum to explain, again, in simple language, why they don’t make sense. Even more important, they give you a date. A time when the result is expected. That’s really good for helping you focus on what’s important. Focusing on which decisions need to be made now, and which decisions can (and should) wait until later.

Just like milestones have an analogy in software development, steppingstones have one too. As with milestones, stakeholders and implementors view steppingstones differently. But where they both see milestones as important, stakeholders generally don’t care about any of the details of the steppingstones. In fact, as long as you don’t fall off the path, they don’t even want to hear about the steppingstones. They’re an implementation detail left to the implementors. For the implementors though, steppingstones are critical. They’re the stuff of the day-to-day work. Often you can’t see more than two or three steppingstones in front of you, so you can’t pick out which one you’re going to use until you get there. And where you find yourself directly impacts the choices you have on where to go next. You often have some idea of the steppingstones along the way, but which exact ones you end up using you won’t know until you get there.

Here’s another way to think about it. At the beginning of Raiders of the Lost Ark, Indy is trying to get a golden idol from a lost temple. He knows his milestones. Find the temple. Find the entrance. Find the idol in the temple. Get out. Get home. He has a certain amount of supplies and tools, and he plans his route accordingly. What he can’t do beforehand, though, is plan how to do each of those things. He knows there are booby traps along the way, but he doesn’t know what they are until he gets there. So he finds the steppingstones as he comes to them. In the room with the idol, he literally has to choose the correct first steppingstone before he can even start looking for the next one, and so on until he gets to the idol.

When you think a little more deeply about it, the difference between a milestone and a steppingstone is more of a question of scope and viewpoint than it is an objective reality. Just as software architecture can be seen as software design at a different scale, your steppingstones could be someone’s milestones, and your milestones are probably viewed as steppingstones by someone else. Which is another way of saying we need to think about the steppingstones along the way. And take many more much smaller steps.

by Leon Rosenshein

Complexity And Cognitive Load

Software design is not about minimizing design complexity, but rather spending our complexity budget where it can do the most good. — Kent Beck

Let’s face it. Very often the systems we build are complex. And they’re complex in many different ways. Ways you just need to deal with. And it’s got nothing to do with how easy (or hard) it is to explain the task in English.

Sometimes the complexity is in the domain. In the US, if you’re writing tax software then you have the complexity of the federal tax laws, which are at best ambiguous, and probably contradictory. Add to that taxes for state and local jurisdictions. And foreign work and income. And where you live. And where you work.

Other times the complexity is in the details. You would think time is the most predictable, monotonically increasing thing there is, but it isn’t that simple. Time is a lot more complex than you think. The same applies to people’s names and addresses. In fact, keeping track of pretty much any personal information is more complicated than you think. And that’s before you think about the privacy implications of storing that data.

It can also be scale that makes for complexity. It’s (relatively) easy to handle 10 transactions per second, but if you need to handle 10 million without adding latency, that’s a whole different level of complexity. Finding the longest word in a list of 10 words is easy. Finding the longest word in Tolstoy’s War And Peace is much more complex. And that’s before you even think about which language you’re counting in.

We can’t get rid of the complexity, so compartmentalizing it helps. Providing the right level of abstraction hides the complexity. Behind the abstraction, the complexity is all you worry about. Outside the abstraction, the complexity is invisible. You only need to think about the problem you’re solving and don’t need to think about the complex parts.
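
As a sketch of what that compartmentalization looks like in code (the interface and its names are invented for illustration):

```go
package storage

// Store is the abstraction boundary. Behind it live the complex
// parts: caching, sharding, retries, encoding. In front of it,
// callers get three verbs and spend none of their cognitive load
// on how those verbs are implemented.
type Store interface {
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
	Delete(key string) error
}
```

The interface is boring on purpose. All of the complexity budget is spent behind it, where it can do the most good.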

Now we’re talking about cognitive load. It’s in the title, and it’s something I’ve written about before. It’s a measure of how many things you need to keep thinking about and be aware of that aren’t the problem you’re trying to solve, but are critical to solving the problem. The more you reduce the cognitive load, the less effort you need to put into the ancillary problems, and the more effort you can put into solving the problem you’re trying to solve.

Which is what Beck is talking about. Figure out where the complexity in your problem is, and put your effort there. Make everything else as simple as it can be. Define the domain you work in. Don’t try to be everything to everyone, just solve the problem you’re solving. Use existing solutions. Don’t build your own encryption module, use a well-vetted one. Don’t build your own database system (you might need your own tables and stored procedures, but not a new DB).

You have a problem you’re trying to solve. You have a limited amount of cognitive load you can bring to bear on the problem, so spend your cognitive load (and complexity) wisely. Spend it on the part of the problem that is your value add, not somewhere you can hide it behind an existing abstraction.

by Leon Rosenshein

Monolith Is A Deployment Strategy, Not An Architecture

There was an article a few weeks ago about how the Amazon video team switched one of their tools from a distributed microservice architecture to a monolith that runs/scales on EC2. Does this mark the beginning of the end for microservices? Were we wrong to decompose all those monoliths into microservices? Should we recombine all of our microservices and serverless systems back into monoliths?

Or, is this just another case of It Depends? I say It Depends. Because the difference between a monolith based system and a microservice based system isn’t really the design and segmentation of the code. It’s in the tradeoffs you make when deploying the code. The tradeoffs you make with Conway’s Law to keep from shipping your org structure. The tradeoffs you make when you think about needing to scale part of the process, but not all of it. The tradeoffs you make for performance. The tradeoffs you make to manage cognitive load.

Sure, monoliths get a bad rap and we often think of monoliths as nothing more than a container for your Big Ball Of Mud. And sometimes they are. I’ve been involved in my share of monolithic balls of mud. But they don’t have to be that way. If you pay attention to domain-driven design you can have a well-written monolith. With separation of concerns. With clean boundaries. With good abstractions that keep your cognitive load down.
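
One way to picture that seam, as a sketch with invented names: the domain boundaries live in the code, and deployment stays a separate decision.

```go
// Package orders shows the seam in a well-factored monolith:
// callers depend on interfaces, not on sibling packages'
// internals. Whether Biller is an in-process struct or, someday,
// an RPC client is purely a deployment decision.
package orders

type Biller interface {
	Charge(orderID string, cents int) error
}

type Inventory interface {
	Reserve(sku string, qty int) error
}

func PlaceOrder(b Biller, inv Inventory, orderID, sku string, qty, cents int) error {
	if err := inv.Reserve(sku, qty); err != nil {
		return err
	}
	return b.Charge(orderID, cents)
}
```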

At the same time, we think of microservices as the answer to all of our scaling needs. Need a new API? Just make a new microservice. Need more of something? Just create more instances of that existing service. In practice, though, you end up with lots of different ways to do the same thing. Every team/service becomes an island and does things its own way. And each one of those calls between services takes time, slowing things down. Have you ever tried debugging across service boundaries? It’s not easy. Or even just tracing which services are used in any given call chain. At one point in Uber’s microservice journey there were more microservices than engineers. Personally, I don’t think that’s a good thing.

So now that we’ve determined that you can have good (or bad) design with both monoliths and microservices, how do you choose? You choose based on what makes sense as a deployment methodology. How are you going to update things when you need to? It comes back to those tradeoffs. There are lots of things you’re balancing. Ease of deployment. Horizontal vs vertical scaling. Depth and tightness of coupling. Debuggability. Cognitive load.

Deploying a monolith is easy. There’s only one thing to deploy, so you don’t have to worry about versions. You don’t have to worry about the order of deployment. It’s always compatible with itself. Rollback, if needed, is just as easy. Deploying a single microservice is also easy, but what if it’s a breaking change? What else do you need to deploy first? What do you need to deploy after? What is or isn’t backward compatible? How can you test the whole system? Lots to think about and lots to get wrong if you’re not careful.

On the other hand, scaling is much easier with a microservice. If you have a service that is slower than the others, you can just deploy more instances of that microservice. Or you can give just that service more CPU/memory. You get to scale what you need, when you need it. A monolith is the exact opposite. If you have one function you need to scale out/up, you need to scale everything out/up. So you have lots of waste.

Everywhere you look, you should be looking at monolith vs microservice as a question of what and how you deploy things, not how you decompose things into functions/libraries/APIs.

by Leon Rosenshein

0, 1, Many

Continuing on with looking at numbers, think about counting. We all know how to count. Or we think we do. But do we really think about how we should be counting? Consider the following quote.

“Common programmer thought pattern: there are only three numbers: 0, 1, and n.”     – Joel Spolsky

There’s more than a little truth to that statement. After all, from a linguistic standpoint there’s lots of precedent for it. My non-linguistic experience also tells me that there’s not just a quantitative difference between 0 and 1 and N, but there’s also a qualitative difference.

The qualitative difference shows up in many different ways. 0 is the same as never. That can’t/doesn’t happen, so don’t worry about it. 1 is the same as always. Count on it happening. Assume it already happened. Either way, always or never, 1 or 0, TRUE or FALSE, it’s a constant. There are no decisions needed. N, on the other hand, is maybe. You don’t know. It might happen. It might not. You can’t count on it. You need to handle it happening. You need to handle it NOT happening. Be prepared for both cases¹.

Another qualitative difference is that when there is a choice, it’s often not either/or, but one (or more) of many. In code that shows up as something that started as if/else, but eventually morphed into a series of if/elseif/elseif/elseif/…/else. Sure, that can work, but there are better ways. Listen to your data and let it guide you in your programming. This is where object-oriented programming, in its true sense, really comes into its own. You make the decision early about what the object is, then you just act on/with it in a linear sense. You get back to always (or never) for most decisions and let the object worry about what it means in that specific case. The sketch below shows the shape of it.
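
Here’s that shape in Go (a sketch; the shapes are stand-ins for whatever your data really is):

```go
package shapes

import "math"

// Shape makes the "one of many" decision once, at construction.
// After that, every caller is back in the always/never world:
// no if/elseif chain, just a method call.
type Shape interface {
	Area() float64
}

type Circle struct{ R float64 }
type Rect struct{ W, H float64 }

func (c Circle) Area() float64 { return math.Pi * c.R * c.R }
func (r Rect) Area() float64   { return r.W * r.H }

// TotalArea doesn't care which kinds of shapes it gets; each
// object knows what Area means in its own specific case.
func TotalArea(shapes []Shape) float64 {
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	return total
}
```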

Then there’s the learning case. I’ve said before that versions 1 and 2 of something are easy. It’s when you get to version N that things get interesting. Again, that first version is the never case. No one has done it before, so there are no special cases. Just do something and it will be ok. Version 2 is the always case. For anyone who has used version 1, it’s always been that way. There’s no ambiguity. Everyone, on both sides, knows what to expect. It’s only when you get to version 3+ that you get into the maybe case. You don’t know what your customer has experienced. You don’t know what they expect. They don’t know what is coming from you. And as I’ve said, that’s where the learning is. Dealing with the ambiguity is where you stretch and grow.

So, whether you’re thinking about your design, your implementation, your career, or life in general, think about how you deal with counting.


  1. Hint. You might think it’s a 0/1 situation, but check your assumptions. It might be a 0/1 situation, but our assumptions are often wrong, so think them through ↩︎

by Leon Rosenshein

1 > 2 > 0

I’m pretty sure this story is true, because I’ve heard it too many times, sometimes from people who could have been there. The people involved and timeline match up. Also, even if it’s not true, there’s still something to learn.

Amazon has always been about 2-pizza teams. You should be able to feed an entire team lunch with 2 pizzas. The idea is to keep them agile and innovative. To minimize communications delays and bottlenecks. It works pretty well too. It says nothing about the software architecture, only the scope of responsibility of a team.

Back around 2002, Amazon’s internal system was a large monolith. And there were lots of 2-pizza teams trying to work on it at the same time. It was pretty well designed, but with that much going on there were lots of interactions and coupling between teams and the work they were doing. So much that it really started to slow things down. It got so bad that Jeff Bezos issued an ultimatum.

All teams will henceforth expose their data and functionality through service interfaces.

That’s a pretty bold requirement. It meant that everything needed to change. From the code to the tooling to the deployment processes and automation. Over the next 2-3 years, Amazon did it. They changed to a Service-Oriented Architecture that endures to this day. It broke a lot of the direct coupling that had crept into the monolith. And it led directly to the creation of AWS and its rise as the dominant cloud platform. A cloud platform that can do everything from hosting Netflix’s compute and storage to hosting this blog.

It did that by clearly defining boundaries and letting teams innovate inside those boundaries to their hearts’ content. But it also led to some new problems. Because each team was responsible for everything inside those boundaries, teams started to write their own versions of things that were once shared libraries. And we all know that duplicated code is bad. You duplicate bugs. It makes updates hard. Teams ended up with code they were responsible for that they didn’t necessarily understand.

Enter Brian Valentine. He’d recently joined Amazon (in 2006) as a Senior VP, coming from Microsoft, where he’d led, among other things, the core Windows OS team. A huge organization, with thousands of people developing the hundreds of libraries and executables that made it up. He looked at what was going on and realized that lots of teams were writing the same code. That there were multiple implementations of the same functionality scattered throughout the codebase. That it was inefficient and wasteful, and that those sorts of functionality should be provided by a set of core 2-pizza teams so that the other teams could focus on their specific parts of the business.

He worked with his team and his peers to define a system where those core teams would be identified and created, then all the other teams would start using them. They wrote the 6-pager that defined the system, how it would be created, and all the benefits. Eventually it got to a meeting with Jeff Bezos, then Amazon CEO. I believe everything up to this point is true. Here’s where it gets apocryphal. I want to believe it’s true, but I just don’t know.

After the required reading, Valentine summarized the point of the meeting by writing a single line on the whiteboard:

1 > 2

Huh? One is definitely not greater than two. One is strictly less than two. What Valentine meant was that having one way to do some shared bit of functionality that is actually shared, not copied/reimplemented, is better. It’s more efficient. It means less duplicated effort. It lets teams focus on their value add instead of doing the same thing everyone else was doing. That makes sense. So that’s what Amazon does now, right?

Nope. After Valentine said that was how things should be done and stepped back, Bezos stepped up and changed it slightly, to:

1 > 2 > 0

What? That makes even less sense. Two is greater than zero, and so is one, but two is not between one and zero. What Bezos was saying was that having one solution might be better than having two, but waiting for some central team to build something new or update some existing service to add a new capability takes time. It adds coupling back in. It makes things more complex. And while you’re waiting for the central team to do its part, the dependent team can’t do anything. So for some, potentially long, period of time, you don’t have 1 solution, you don’t have 2 solutions, you have 0 solutions. And having zero solutions is even worse than having multiple solutions. The plan pretty much ended there. Like they say in the Go proverbs, “A little copying is better than a little dependency.”

Which is not to say you never go back and centralize common things. Amazon doesn’t expect every team to write their own operating system. They don’t write their own S3 access layer. They use Linux, or the AWS SDK. And when there is a critical mass of people doing something common then you write a centralized library that is shared. Or write a new service with a defined interface that everyone should call.

The trick is to do it at the right time. After there are enough instances to let you know what to build, but before there are too many versions to be able to replace them all with the central version.

by Leon Rosenshein

What You Do Next

“If you hit a wrong note, it’s the next note that you play that determines if it’s good or bad.”     –Miles Davis

That’s pretty deep. And it applies to things very far removed from jazz. Things that are very structured and precise. Things like code. Or, maybe, software isn’t as structured, precise, and linear as we think.

Before you get upset and tell me I’m nuts (which may be true, but doesn’t matter here), I want to be clear. Writing code that is well structured and precise is important. Being clear and understandable is important. Separation of concerns and listening to the data is important. Big ball of mud development is not what I’m talking about. I’m talking about how we write code, not the code we write.

Consider this alternate phrasing of what Miles Davis said.

You learned something new. How you respond determines if the knowledge was good or bad.

Those two sentences pretty much define the software development process. Everything else is an implementation detail. Waterfall or agile, greenfield or brownfield, startup or established in the industry, it doesn’t matter.

Put another way, it’s the OODA loop. Observe (see where you are). Orient (understand the situation). Decide (choose the next step). Act (do it). Because you are where you are, and the situation is what it is. Your involvement in getting there (Miles’ wrong note) doesn’t matter anymore. The only thing that matters is what you do next, and how often (or how fast) you run the loop.

Think about how empowering that is. The immutable past has happened. You can’t do anything about it. But you have a lot of control over the future. You have agency. You have power. You have the ability to make sure that the next step puts you in a better place than you are now.

If your next action moves you closer to your goal than where you currently find yourself, then your move was good. Whether you find yourself closer or farther¹, run the loop again. And keep running it. Eventually you find yourself at your goal. The more experience you have in the field, and with the situation, the more likely your choice of action will move you closer to your goal.

That applies whether you’re playing jazz or writing software. And now that I think about it, to life in general.


  1. Closer or farther in the minimizing time and effort sense. Sometimes refactoring, which takes time, appears to be irrelevant, and has no visible impact, actually gets you to your goal with less time and effort. ↩︎

by Leon Rosenshein

What Are You Testing? II

I’ve written about what you’re testing before. That article was about writing tests in a way that you could look at the test and understand what it was and wasn’t testing. It’s about writing readable and understandable code, and I stand by that article.

However, there are other ways to think about what you’re testing. The first, and most obvious, way is to think about what functionality the test is validating. Does the code do what you think it does? That’s pretty straightforward. You could think about it like I described in that article, in terms of how readable and understandable the test is. Or you could think a little more abstractly, about what part of the development process the test was written in support of.

Taking inspiration from Agile Otter, you can think about the test at the meta level. What are you writing the test for? Is the test to help you write the code? Is the test to help you maintain the code? Is the test supposed to validate your assumptions about how users will use your code? Is the test supposed to characterize the performance of the code? Is the test supposed to help you understand how your code fails, or what happens when some other part of the system fails? The reason you’re writing the test, the requirements you have for the test, help you write the test. They help you know how to write the test and what a successful result looks like.

[Industrial Logic slide: “So, only write fast automated tests, right?” “No. Only fast, automated microtests will support refactoring. Other tests support system correctness, release-worthiness, etc.”]

Why we write tests.

A set of tests written to validate the internal functionality of a class, with knowledge of how the class operates, has very different characteristics than a test written to validate the error handling when 25% of the network traffic times out or is otherwise lost. Tests written to validate the public interface of that class also look and feel different. They all have different runtime characteristics as well, and are expected to be run at different times.

Knowing, understanding, and honoring those differences is key to writing good tests. Because the tests need to not only be correct, in the sense that they test what their names say they do, but the results of the tests also need to accrue to the meta goal for the test. Integration and system-level performance tests are great for testing how things work as a whole, but they’re terrible for making sure you cover all of the possible branches in a utility class. Crafting a specific input to the entire system and then expecting that you can control exactly which branches get executed through a call chain of multiple microservices and database transactions is not going to work. You need unit tests and class functionality tests for that. The same goes for refactoring. Testing a low-level refactor by exercising the system is unlikely to test all of the cases and will probably take too long. On the other hand, if you have good unit tests for the public interface of a class, you can refactor to your heart’s content and feel confident that the system-level results will be the same. There’s a small sketch of that below.
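
Here’s a minimal sketch of that last point (invented names; the queue is a stand-in for any class with a public interface):

```go
// queue.go
package queue

// Queue's internals (a slice today, maybe a ring buffer
// tomorrow) are free to change; only New, Push, and Pop are
// public, and only they are tested.
type Queue struct {
	items []int
}

func New() *Queue           { return &Queue{} }
func (q *Queue) Push(v int) { q.items = append(q.items, v) }

func (q *Queue) Pop() (int, bool) {
	if len(q.items) == 0 {
		return 0, false
	}
	v := q.items[0]
	q.items = q.items[1:]
	return v, true
}
```

```go
// queue_test.go
package queue

import "testing"

// The test pins down observable behavior (FIFO order, the empty
// signal) without ever touching q.items, so any refactor of the
// internals that preserves the behavior keeps the test green.
func TestFIFOOrder(t *testing.T) {
	q := New()
	q.Push(1)
	q.Push(2)
	if v, ok := q.Pop(); !ok || v != 1 {
		t.Fatalf("first Pop = %d, %v; want 1, true", v, ok)
	}
	if v, ok := q.Pop(); !ok || v != 2 {
		t.Fatalf("second Pop = %d, %v; want 2, true", v, ok)
	}
	if _, ok := q.Pop(); ok {
		t.Fatal("Pop on an empty queue should report false")
	}
}
```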

Conversely, testing system performance by measuring the time taken in a specific function is unlikely to help you. Unless you already know that most of the total time is taken in that function, assuming you know anything about system perf from looking at a single function is fooling yourself. Even if it’s a long-running function. Say you do some work on a transaction and take it from 100ms to 10ms. A 90% reduction in time. Sounds great. But if you only measure that execution time you don’t know the whole story. If the transaction only takes place 0.1% of the time, and part of that workflow involves getting a user’s response to a notification, saving 90ms is probably never going to be noticed.

So when you’re writing tests, don’t just keep in mind what you’re testing, also keep in mind why you’re testing it.