
by Leon Rosenshein

The Finish Line Is Just Another Milestone

Speaking of remodeling and updating your code, if you’re releasing a new version of your code, that means you’ve already released a version of your code. That means that at some time in the past, you finished your code. Or at least thought you did. But now, here you are, releasing a new version. So I guess you weren’t done. Or to put it another way,

The finish line is just another milestone.

Yup. Unlike a race, the finish line with software almost never means you’re done.1 Especially with version 1, but really, for all versions. EVERY new project I’ve worked on that made it to the first release has had things that I knew could have been done better. Features I thought should have been added. Errors that needed better handling. Everything from personal projects to school projects to AAA games to multi-year, multi-million dollar defense projects released its initial version with known work I (and the team) wanted to do. Those things didn’t make the initial release, for very good reasons. Reasons ranging from boredom and more pressing issues, to homework due dates, to hitting a release date for game sales, to contractual obligations. Time is fleeting, but there’s always an important date in there somewhere.

First, you work hard. Then you release. You take a break and regroup. You listen to what your users are saying. You combine that with what you know you didn’t do but thought you should. Then you do it all over again.

This time, the list of things you didn’t do is smaller, but it’s more targeted, because you know more. More about the problem space. More about the constraints you added the first time. More about what your users really want to do. You’re starting from a different place, but the process is pretty similar. You develop the code. You learn more along the way. You make mistakes and fix them. Then, you release again.

But again, you’re not done. There are still things you wanted to do. User requests you didn’t get to. Rough edges that you want to fix. And the cycle begins again.

And it doesn’t matter how long your cycle is. It could be a 2-week sprint. It could be a 5-year plan. It doesn’t matter. Release is not the same as done.

Forrest Gump running with the caption ‘Might as well keep building’

That’s the real lesson here. Release is not a finish line. It’s just another milestone along the path. And like all those other milestones, there’s another one coming up. So be ready for it.


  1. While there is a tremendous amount of personal learning that can be done releasing version one, there’s also a whole different type of personal learning and growth that comes from releasing version 2. And the versions after that. If all you do is release version 1 of a project, then move on to a new green field and build another version 1, you’re missing out on a whole different class of things to learn. ↩︎

by Leon Rosenshein

Home Ownership and Software Updates

Seen on Bluesky

Homeownership is a lifelong commitment to discovering what it takes to modernize a legacy system and the lengths to which you will go and compromises you will make in doing so.

It’s often been said that software development is like home construction. You have a site, some requirements, a plan, you start to build, then you adjust as needed. There’s some truth to that, and from far enough away, it’s accurate.

However, when you look closer, the analogy falls apart. First, we’ve been building homes a lot longer than we’ve been writing programs. The number of things that are known to home builders as tacit knowledge vastly outnumbers the tacit knowledge of software developers. Most developers have built a handful of programs, or the same program many times. Especially developers that work on their own or in a small company. On the other hand, your typical home building company has a much larger legacy behind them. They made those mistakes before. And the industry made some of them so long ago that the fix is documented. As building codes.

Which is the second big difference. Home building is a regulated industry. Even if you’re building everything yourself, there aren’t many places left where you don’t have to follow the same rules and regulations as every other builder, and your work will be inspected by a third party when it’s done. There aren’t many software projects where the local government software inspector comes by and makes sure you don’t have any dangling pointers or missing null checks, or that you have all the known security patches. That almost certainly happened to your house though. Multiple times during the build, the inspector came out and checked the plumbing, the electrical, the framing, the layout of the stairs, and even how many electric outlets you have.

Those differences aside, one way that homes and software are similar is updates. At some point, potentially long after the original builder and owner are no longer involved, the current owner is going to reach out to a new builder and say “Can you change X because I want to do Y”. And the builder is going to say yes. Then figure out how to do it. Or at least try to. And find that there’s a pipe right where the new owner wants a doorway. Or the walls just aren’t strong enough to support a second floor. Or someone secretly added another set of outlets to an existing circuit, and plugging in one more phone charger blows the main breaker for the entire house.

So the builder figures out how to make things work. They convince the new owner that they want the door four feet to the left, or they add a parallel electric system to the house, or they build a second story on stilts so it’s really a new house carefully placed just above the current house, so the stairways line up, but in fact they’re completely separate.1 Then it happens again a few months later, and not only does the builder have to deal with the original build, but they also must deal with the compromises and issues that came from the first change. When you do a home remodel, you not only have to design in the things you want, you have to design around the choices of the original builder (and any changes between then and now).

With software it’s kind of the same thing. Especially legacy software. Decisions were made. For probably good reasons at the time. To deal with a known requirement. As with Chesterton’s Fence, unless you know exactly why something was done and that it’s no longer needed, you shouldn’t change it. So when you have to add a new feature to a piece of software, the first thing you need to do is make sure you haven’t broken any of your current users (Hyrum’s Law).

That means each time you need to change something, whether it’s a building or software, unless you work very hard, and do things that are not directly related to the thing you’re trying to enable, you make it harder to make the next change.

So THINK before you make that change. Think about what you’re trying to do. Think about how you can do it without impacting other users. And especially, think about future developers and what they’ll need to do when they have to make yet another change.

Uber's famous hanging staircase

  1. Back in Uber’s original headquarters there was a desire to have a stairway between the 4th and 5th floors in the middle of the common area, but the building owners didn’t allow it. In true Uber fashion, the interior design team was not deterred. Instead of giving up, they hung a spiral staircase from the deck between the 4th and 5th floors and had it go down to almost (but not quite) touch the deck of the fourth floor. Technically it met the rules. The 4th and 5th floors were not connected. It was just “Hanging Art”. Yet somehow, people could easily walk between the common areas of the two floors. But I pity the next owner of that space who just wants their floors to be floors. ↩︎

by Leon Rosenshein

Verification First Development

From the “Listen to Hillel” file, an article on Verification First Development. I really like the idea of Test Driven Development, but I have to be honest. I don’t always do it. Sometimes I practice Test After Development. Meaning I write my tests to verify that the code I wrote does what I wanted it to do. But after reading Hillel’s article, I realized that even though I sometimes write my tests after, I almost always know how I’m going to verify it. I have a positive test case in place, and I often have a large handful of known error cases. Having a name to put on things makes me feel good.
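To make that concrete, here’s a minimal sketch of what “knowing how you’ll verify it” looks like as a Go table test. ParsePort and its rules are invented for illustration; the shape is the point: one positive case, plus the error cases you already know about, named before (or right after) you write the code.

```go
package config

import (
	"errors"
	"strconv"
	"testing"
)

// ParsePort is a stand-in for the code under test.
func ParsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	if n < 1 || n > 65535 {
		return 0, errors.New("port out of range")
	}
	return n, nil
}

func TestParsePort(t *testing.T) {
	cases := []struct {
		name    string
		input   string
		want    int
		wantErr bool
	}{
		{"valid port", "8080", 8080, false}, // the positive case
		{"empty string", "", 0, true},       // the known error cases
		{"not a number", "http", 0, true},
		{"out of range", "70000", 0, true},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			got, err := ParsePort(tc.input)
			if (err != nil) != tc.wantErr {
				t.Fatalf("ParsePort(%q) error = %v, wantErr %v", tc.input, err, tc.wantErr)
			}
			if !tc.wantErr && got != tc.want {
				t.Errorf("ParsePort(%q) = %d, want %d", tc.input, got, tc.want)
			}
		})
	}
}
```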

Another good thing about verification first is that it’s a great definition of done. And knowing when you’re done is critical. You’ve added some kind of value when you’re done. You get to stop what you’re doing when you’re done. You get to build up some momentum and celebrate when you’re done. You get to move on to the next thing when you’re done. These are all good things. And it’s part of the Purpose that Pink talked about in Drive. It helps motivate us.

And thinking about verification first development of course makes me think of my current job. At Aurora, our mission is to deliver the benefits of self-driving technology, safely, quickly, and broadly. To do that, Aurora has a safety case. That safety case, which was built long before the code was finished, defines done. And not just in big broad terms, but also in detail. With specific requirements. To be done, we need to close the safety case. That means that a lot of work goes into building and executing the plan to verify that things are safe. So while different teams have their own flavors of development style, it’s all guided by a verification first mindset.

So, depending on what I’m developing, I might be using BDD, DDD, TDD, Test After Development, or sometimes I’m just experimenting to learn about a space and to figure out what I really want to do. Regardless of how I’m doing it though, there’s always at least a plan for how I verify the results are what I want.

The question now is, as that other Hillel would say,

If not now, then when?

by Leon Rosenshein

Safety Nets and Guardrails

Safety nets and guardrails sound like the same thing, but they’re not. They are very similar though. They both help prevent bad things from happening. Where they differ is in how and when they operate.

Safety nets help after something bad has happened. It’s what you do when things go wrong. Your traditional safety net catches something that has fallen. It could be a person off a roof, or a trapeze artist that missed a catch. It could also be a process that helps recover from a negative event. Like insurance (be it life, health, auto, home, or unemployment). It doesn’t mean the bad thing can’t happen, or that there will be no consequences, but it minimizes the negative impact/damage caused by the bad thing happening.

Safety net at the edge of a building

Or in the software world it could be a top-level error handling routine or executing a series of SQL statements inside a transaction so you can safely roll things back if there’s an error. Put another way, it’s using both a belt and suspenders even when your pants fit. Normally, the pants stay in place by themselves, but if they don’t for some reason, you’ve got the belt to hold your pants up. And if the belt snaps, there’s still the set of suspenders to hold them up. In terms of ilities, it’s resilience.1
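Here’s what that transactional safety net might look like with Go’s database/sql (the accounts table and transfer logic are invented; the pattern is the point). Nothing prevents a statement from failing, but the deferred Rollback means a failure can’t leave the data half-changed:

```go
package bank

import "database/sql"

// transferFunds is a sketch of a transaction as a safety net. If anything
// fails before Commit, the deferred Rollback undoes every statement.
func transferFunds(db *sql.DB, from, to int64, amount int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	// Rollback after a successful Commit is a harmless no-op, so deferring
	// it covers every early return below.
	defer tx.Rollback()

	if _, err := tx.Exec(
		"UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, from); err != nil {
		return err
	}
	if _, err := tx.Exec(
		"UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, to); err != nil {
		return err
	}
	return tx.Commit()
}
```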

Guardrails, on the other hand, help prevent something bad from happening. Like the guardrails along the highway. They work to keep cars on the road and heading in the right general direction. It doesn’t mean you can’t force your way off the road, or that everything will still be perfect if you end up needing the guardrail, but things will be much better with the guardrail, and you’ll probably still get to your destination. It’s Poka Yoke, which I’ve talked about before. And just like you can have multiple levels of safety nets, you can have multiple guardrails. Like the rumble strips on a highway that tell you you’re drifting, before the guardrail pushes you back on track, both of them help you do the right thing.

Guardrail along a road

In software, guardrails come in multiple flavors. It’s using types instead of primitive obsession. Sure, you could use a string to store a value that is one of a limited set, but it can also store many invalid strings. If you instead use an ENUM that only supports the limited set, the user simply can’t set the value to something invalid. Another guardrail is using a builder to initialize something so that it either works or tells you immediately that it can’t be initialized instead of leaving you with something that won’t work. There are lots of other guardrails you can add to your software.
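In Go (names invented, and Go’s typed constants are admittedly a looser guardrail than a true closed enum), those two guardrails might look something like this:

```go
package clockcfg

import "errors"

// Weekday is a typed value instead of a raw string. A string could hold
// "Munday"; a Weekday can only be one of the declared constants (modulo
// deliberate conversions, which is why this is a guardrail, not a wall).
type Weekday int

const (
	Monday Weekday = iota
	Tuesday
	Wednesday
	Thursday
	Friday
	Saturday
	Sunday
)

type Alarm struct {
	day  Weekday
	hour int
}

// NewAlarm is the builder-style guardrail: it hands back either a working
// Alarm or an immediate error, never a half-initialized value that fails
// much later, far from the mistake.
func NewAlarm(day Weekday, hour int) (*Alarm, error) {
	if hour < 0 || hour > 23 {
		return nil, errors.New("hour must be between 0 and 23")
	}
	return &Alarm{day: day, hour: hour}, nil
}
```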

And remember, while safety nets and guardrails have the same basic goal, keeping something terrible from happening, they are, in fact, orthogonal ideas. Which means you can (and should) use them both. Use guardrails that make it easy to use your API/functions the right way and hard to use them incorrectly. But recognize that it can still happen. So also include safety nets, so that when something is wrong, it gets handled the best way possible.


  1. Yes, I know resilience doesn’t end in ility, but it’s almost always in a list of -ilities ↩︎

by Leon Rosenshein

With Apologies to Uncle Bob

From the design by counter-example file, this might be the most practical definition of “unclean” code I’ve ever seen.

In the kitchen, the stuff I left on the counter is fine, I know why it’s there. Everything my family leaves on the counter is mess.

In our own software, we don’t trip over the rough edges, we can fix those later. For everyone else, our software is rough.

Jessica Joy Kerr

In your own kitchen, you don’t see your own clutter. It’s not bad, but there’s not a lot of available workspace. Similarly, that’s why everyone else’s code is not clean, but your code is. At least in your eyes.

A cluttered kitchen. It's not bad, but there's not a lot of available workspace

Clean code is a good goal. And there are lots of heuristics and rules of thumb to help you write clean code. You should always be thinking about them. Not blindly following them, but thinking about them. And you need to be aware of your biases and blind spots.

And one of the biggest, the one that makes identifying your own unclean code so hard, is the same one that makes it hard to accurately and effectively edit your own writing. It’s a problem of context. When you’re writing, whether it’s code, a novel, an email, or a text message, you have an immense amount of context. When you go back to review/edit that text, unless it’s been a long time, you still have all that context. And even if it has been a long time, that context will come back pretty quickly. That means you don’t see the missing or doubled words. You don’t see the misspellings. You don’t notice that your functions are long, that you have complicated conditionals, or that function and variable names no longer match what they actually do.

Unfortunately, unless you’re the copy editor for someone, everyone else has much less context. They don’t know what you know and they see those things immediately. It makes it hard for them to understand your code. Just like it makes it hard for you to understand their code.

Or to work in someone else’s kitchen.

So the next time you’re reviewing your own code prior to getting someone else to review it, make sure you’re looking at it not just with your own context, but also with the context of someone who hasn’t seen it before.

by Leon Rosenshein

Testing Schedules

Yesterday I talked about different kinds of tests, [unit, integration, and system](/posts/2025/04/02). I mentioned that not only are there different kinds of tests, but those tests have different characteristics. In addition to the differences in what you can learn from tests by classification, there are also differences in execution times and execution costs.

Venn diagram for different test types

NB: This is semi-orthogonal to test driven development. When you run the tests is not the same as when you write the tests.

These differences often lead to different schedules for running these tests. As with any dynamic system, one important tool for maintaining stability is a short feedback loop. The faster the tests run, and the more often you run them, the shorter you can make your feedback loop. The shorter your feedback loop, the faster you can get to the result you want.

Luckily, unit tests are both cheap and fast. That means you can run them a lot. And get results quickly. The questions are, which ones do you run, and what does “a lot” mean? If you’ve got a small system, and running all the tests takes a few seconds, run them all. While building and running any individual test is fast, in a more realistic setting, building and running them all can take minutes or hours. And that’s just not practical. You need to do something else. This is where a dependency management system, like Make or Bazel, can help. You can set them up to only run the tests that are directly dependent on the code that changed. Combine that with some thoughtful code layout and you can relatively easily keep the time it takes to run the relevant (directly impacted) tests down.
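For example, with Bazel (the query functions are real; the target names are made up) you can ask the build graph which tests could possibly be affected by a change and run only those:

```sh
# Find every test that transitively depends on the changed library,
# then run just those tests.
bazel query 'tests(rdeps(//..., //myapp/parser:parser))' |
  xargs bazel test
```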

Running quickly is important, but when do you run them? Recommendations vary from “every time a file is changed/saved” to every time changes are stored in your shared version control system. Personally, I think it’s every time there’s a logical change made locally. Sometimes logical changes span files, so testing based on any given file doesn’t make sense. You want to run the tests before you make a change to make sure the tests are still valid, you want to run the tests after each logical step in the overall change to make sure your changes haven’t broken anything, and you want to run the tests when you’re done to make sure that everything still works. That’s a good start, but it’s not enough. In a perfect world your unit tests would cover every possible combination of use cases for the SUT. But we don’t live in a perfect world, and as Hyrum’s Law tells us, someone, somewhere, is making use of some capability you don’t know you’ve exposed. A capability you don’t have a unit test for. So even when all your unit tests pass, you can still break something downstream. At some point you need to run all the unit tests for all the code that depends on the change. Ideally before anyone else sees the change. You run all those tests just before you push your changes to the shared version control system.
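One way to make “run everything before you push” automatic is a git pre-push hook. This sketch assumes a Go repo small enough to test in full; in a bigger repo you’d substitute the affected-tests query from above:

```sh
#!/bin/sh
# .git/hooks/pre-push: refuse to push if the unit tests don't pass.
go test ./... || {
  echo "unit tests failed; push aborted" >&2
  exit 1
}
```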

Unfortunately, unit tests aren’t enough. Everything can work properly on its own, but things also must work well together. That’s why we have integration tests in the first place. When do you run them? They cost more and take longer than unit tests, but the same basic rule applies. You should run them when there’s a complete logical change. That means when any component of any integration test changes, run the integration test. And again, just running the directly impacted integration tests isn’t enough. There will be integration tests that depend on the things that depend on the integrations you’re testing. You need to test them as well. Again, ideally before anyone else sees the change.

Then we have system level, or end-to-end tests. Those tests are almost always slow, expensive, and take real hardware. Every change impacts the system, but it’s just not practical to run them for every change. And even if you did, given the time it takes to run those tests (hours or days if there’s real hardware involved), running them for every change would slow you down so much you’d never get anything done. Of course, you need to run your system level tests for every release, or you don’t know what you’re shipping, but that’s not enough. You need to run the system tests, or at least the relevant system tests, often enough that you’ve got a fighting chance to figure out which change made the test fail. That’s dependent on the rate of change of your system. For systems under active development that might be every day or at least multiple times per week, for systems that rarely change, it might be much less frequently.

There you have it. Unit tests on changed code run locally on every logical change, before sharing, and centrally on everything impacted by the change after/as part of sharing with the rest of the team. Integration tests run on component level changes locally before sharing, and centrally on everything impacted by the change after/as part of sharing with the rest of the team. System level tests run on releases and on a schedule that makes sense based on the rate of change of the system.

Bonus points for allowing people to trigger system tests when they know there’s a high likelihood of emergent behavior to check for.

by Leon Rosenshein

Test Classification

Whether you are thinking about unit vs integration vs system tests, or build (or save) time vs check in time vs release time tests, what you’re really thinking about is test classification and test hierarchy. Or put another way, you’re thinking about why you’re running that test and what the goal of the test is.

Of course you want the test to pass. And you want that pass to mean something. Even if the result you’re looking for is a failure in the system under test (SUT), you want to see that failure so your test passes. But I’m not talking about what makes a good test. That’s a different topic for a different time.

The topic for today is, instead, what is the purpose of the test. What level of functionality are you trying to test? Knowing the purpose of the test can help you figure out how to classify it. That can then help you figure out how and when to run it.

First, some basics on test classification. There are many types of tests, but in broad strokes, you can think of them as applying at 3 levels: unit, integration, and end-to-end or system. To make things more real, let’s consider a clock application.

The test pyramid. Slow, expensive end to end tests on top, integration tests in the middle, and fast, cheap unit tests at the bottom

Unit tests are tests that validate things at the functional level. They typically live and execute at the function/class level. Do these handful of things work well in isolation? Do they do what they say they will do, and handle failure gracefully? They typically take less than a second to set up, run, and tear down. Things at the unit level might include configuration, storage, or accessing the host’s notion of date and time.
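For the clock app, a unit test might look like this sketch (Alarm and ShouldFire are invented names). Note that the host’s clock never gets involved; time is just a value passed in, which is what keeps the test fast and deterministic:

```go
package clock

import (
	"testing"
	"time"
)

type Alarm struct{ At time.Time }

// ShouldFire is the unit under test: pure logic, with the current time
// injected by the caller instead of read from the host.
func (a Alarm) ShouldFire(now time.Time) bool {
	return !now.Before(a.At)
}

func TestAlarmShouldFire(t *testing.T) {
	at := time.Date(2025, 4, 2, 7, 0, 0, 0, time.UTC)
	a := Alarm{At: at}
	if a.ShouldFire(at.Add(-time.Minute)) {
		t.Error("alarm fired a minute early")
	}
	if !a.ShouldFire(at) {
		t.Error("alarm did not fire at its set time")
	}
}
```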

Integration tests are the tests that validate things at the boundaries of domains, functions, or classes. How does class A work with class B? How does your CRUD layer work with the underlying database? How does your logging/timing/resource management system work with its consumers? These tests might take a couple of seconds, and might require access to some host system. For the clock app, you might test reading and writing configuration with the storage system.
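A matching integration test for that clock app might round-trip the configuration through real storage. SaveConfig and LoadConfig are invented stand-ins; the distinguishing feature is that the test crosses a boundary and touches the actual filesystem:

```go
package clock

import (
	"encoding/json"
	"os"
	"path/filepath"
	"testing"
)

type Config struct {
	TwentyFourHour bool `json:"twenty_four_hour"`
}

func SaveConfig(path string, c Config) error {
	b, err := json.Marshal(c)
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0o644)
}

func LoadConfig(path string) (Config, error) {
	var c Config
	b, err := os.ReadFile(path)
	if err != nil {
		return c, err
	}
	return c, json.Unmarshal(b, &c)
}

func TestConfigRoundTrip(t *testing.T) {
	path := filepath.Join(t.TempDir(), "clock.json")
	want := Config{TwentyFourHour: true}
	if err := SaveConfig(path, want); err != nil {
		t.Fatal(err)
	}
	got, err := LoadConfig(path)
	if err != nil {
		t.Fatal(err)
	}
	if got != want {
		t.Errorf("round trip: got %+v, want %+v", got, want)
	}
}
```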

End-to-end or System tests are the tests that validate how the system as a whole works. Does it meet the end user’s expectations, or at least not surprise them greatly when it doesn’t. System tests are the ones that validate that even though a bunch of things failed along the way, the system managed to do the right thing, or at least avoided doing the wrong thing. This is where you’ll test emergent behavior as the different parts of the system interact. It’s often only at the system level that you can test what happens when 4 different things fail in a specific way. Because the system level test is the only one where those 4 different components are working together. These tests can take much longer, and often require the real system, or at least a trusted emulator. For that clock, it might be setting up an alarm and making sure it sounds at the appropriate time.

I’ve mentioned the time it takes for these various test types, but it’s not just time that changes. Cost also changes. Running unit tests is almost free. Just a few CPU cycles and that’s it. System tests, on the other hand, can be very expensive. You need to build the entire system and deploy it. To not just the appropriate hardware, but special hardware that you have extra access to for debugging. That all takes time. And money. Unless you’re doing manual testing. Which takes even more time and money.

Most tests fit reasonably well into one of these three buckets. If one of your tests doesn’t, think about breaking the test up into multiple tests so that it does. Once you know which bucket a test goes in, you can move on to the next step, figuring out when you should be running it. I’ll cover that in a different post.

On the other hand, if most of your tests don’t, think about your test design. If your test design seems reasonable, but your tests themselves don’t fit into those three buckets, think about the underlying system design. If your system is untestable at the unit level, that’s not a testing problem, that’s a design/architecture problem. Fix that first. Then recognize that you’re practicing Test Driven Development.

And that’s a good thing.

by Leon Rosenshein

I'm Back

It’s been a while since I’ve published a new entry. Not because I haven’t thought of things, but because I got sidetracked with life and work for a bit, then I got out of the habit of writing. Which is a great topic to write about. So here I go. Talking about writing. And habits. And personal lessons.

We all have free will. We get to decide what we want to do. Not in a vacuum of course. There is always an impact to our choices. You need to balance the costs and benefits of a choice. The better visibility you have into those costs and benefits, the better decision you can make. Just remember that making a good decision is not the same as having a good outcome of the decision.

In my case, back in the middle of last year I got busy. Busy at work and busy outside work. As people familiar with Spoon Theory know, when you run out of spoons you need to stop, so you need to use your spoons thoughtfully. Looking at the things I needed to do, the things I wanted to do, and the things I could do, I decided to stop working on this blog.

And in retrospect, it was a good decision. The things that needed to be done got done. I was able to do the things I wanted to do that were most important to me, and I was able to put enough effort into them to do them well. I’m happy with the choice I made, and in the same situation, I’ll go through the same process.

However, in retrospect, one thing I missed in my decision was how much momentum and habit play into things. One of the reasons I was writing so much was that I was in the habit of writing. I had some momentum, and that kept me going. When I stopped, I got out of the habit and lost the momentum. Even worse, I got in the habit of saying “I’ll get back to it soon”. And that’s a dangerous habit to have.

What I should have done was extend my decision with some exit criteria. That would have helped me not get into the habit of not writing. Instead of realizing it’s been 9 months since I posted a new entry, I would have had both reminders and a reason to get back to it. Because I do like writing. And sharing. And hopefully others are getting something out of it as well. So here we are. I’m writing blog posts again, and working on building back that habit.

And to bring this back to helping you, my Friendgineers, it’s something that we need to remember as software developers. When we write, whether it’s emails, docs, blog posts, or code, we have habits. Generally, our habits help us by keeping us from having to decide every little detail. One space or two between sentences? (One) Oxford comma or not? (Yes) Indent or blank line between paragraphs? (Blank line) Those habits are useful.

But sometimes, when we make a decision that should have exit criteria but doesn’t, like deciding to move quickly to get something working right now, we end up without an important habit, or possibly worse, with a new habit that gets in our way later. Like the habit of not thinking about forward or backward compatibility, not worrying about separation of concerns, or not writing unit tests. Or maybe hard-coding configurations or choices. Sometimes you do those things for speed, or expediency, but those are not things you want to make a habit of.

So when you do make those decisions, know your exit criteria, and follow them. If you don’t have them, create them. And above all, be careful what habits you pick up. Or lose.

by Leon Rosenshein

On Busyness

As I’ve mentioned previously, Winnie the Pooh makes a great coach for folks interested in extreme programming. Things are the way they are and we have to live with what is. We can learn from it. We can change it. But we have to deal with the reality of what is.

There’s another part of the Tao Of Pooh we can learn from. It’s the Bisy Backson.

Gon Out. Backson. Bisy. Backson, C.R.

In the book, Rabbit is looking for Christopher Robin, but instead of finding him, Rabbit finds the note above. He can’t figure out exactly what it means and becomes a “Bisy Backson” trying to find Christopher Robin.

In the story, the Bisy Backson has to always be moving. Always doing something, going somewhere, full of sound and fury, signifying nothing. Just to prove, to themselves and others, that they’re important. Because if you’re doing something important, you must be important.

Unfortunately, that’s one of those things we all know that just ain’t so. And ain’t so on multiple levels. First, and most straightforward, there’s no transitive relationship between the importance of the work and the importance of who’s doing it. The work is important, and getting it done is important, but in most cases, it doesn’t matter who does it. A better way to look at this is through the value lens. Not “I am doing important things so I am important”, but “I am valuable because I am doing things that add value.” A subtle, but important, difference.

Second, and probably the most insidious thing, is that the Bisy Backson is a great example of the difference between outputs and outcomes. They do lots of things and there’s lots of activity, but not a lot of results. And from the outside it looks like progress. That’s the sound and fury part. To make matters worse, that appearance of progress is often incentivized by the systems we work in. This one is hard to manage because it requires self-awareness. Again, the value lens is a good way to combat this. Is what you’re doing high value or not? It doesn’t matter if it’s high output if the value is low.

Third, and hardest to see, is the opportunity cost of being busy. I’ve talked about the importance of slack time, and this is still a great explanation of how busyness and parallelization can work against reducing overall time. The Bisy Backson doesn’t see this. They’re too busy doing things to see that doing less might be faster. And it’s certainly faster than doing something that’s just going to sit in unfinished inventory for a while, or worse, doing the wrong thing because we don’t know what the right thing is yet. The value lens helps here, as it usually does, but it’s not enough. One of the things that traps the Bisy Backson is the local maximum (or minimum) problem. If you don’t take the time to look at the bigger picture the Bisy Backson will quickly find themselves on a peak looking across the valley at the higher peak they should have been moving towards. The antidote here is to step back and look at the bigger picture and understand what it is you’re really trying to do.


On a personal note, there’s another kind of time when I deal with the Bisy Backson inside myself. That’s when something significant enough happens that I need to take time to process it, but I’m not ready to process it headfirst in real time. At times like that I’ll often choose to be the Bisy Backson to engage the high-order processing nodes in my head and let the issue rattle around and clarify itself. That’s where I’ve been the past week. Someone at work passed away unexpectedly. Someone I’ve worked closely with for over 3 years. There are lots of little reminders of the loss, and each one is a distraction. I’ve been using my inner Bisy Backson to give me the time and space to work through it at my own pace.

So while busyness for its own sake might not be the best thing, busyness as a tool can be useful. The hard part is knowing which situation you’re in and making the appropriate choice.

by Leon Rosenshein

That's The Way It Is

I’ve said before that It Depends is just E_INSUFFICIENT_CONTEXT written so humans can understand it. There’s another common phrase that often hides a much deeper meaning.

That’s Just How It Is

The thing about that sentence is how passive and accepting it is. Particularly in the word just1. Without just it’s a description of the current state. Adding just adds another whole dimension. It changes the sentence from a description of what is to a comment on what is.

And implicit in that comment is context. The context that says not only are things the way they are, but that you’re powerless to do anything about it. I assert that that last part is untrue.

There may be limits on how much you can do, but it’s not nothing. At the very least, if you know that things are that way, you can expect it. And plan for it. Since I’m a software developer I’ll use a car analogy. Say you’re on a road trip and a road you want to use is closed.

You can drive right up to the sign, then stop and wait for someone to open the road, tell you to turn around and go home, or provide a detour. Or, depending on when you find out, you can plan a different route, decide not to go or to go somewhere else instead, or maybe decide a phone call gets you enough of what you want, and do that instead.

The difference is agency. If that’s just the way it is, you have no agency. On the other hand, if that’s the way it is, you have some control over your destiny. You can do something.

Coming back to software development, the same thing applies. There are events that happen that are outside your control. You do have to accept them. Requirements change. Hardware fails. You get bad input. What you do about it is up to you.

Depending on how much control you have, what you do is different. Sometimes you have enough control to prevent the problem. Or at least prevent the problem from impacting you. Ensure there are redundant systems to mitigate hardware issues. Sanitize your inputs when you get them, and if possible, where they are generated. Knowing that requirements change, leave some slack in the schedule. You’ll still run out of time (Hofstadter’s Law), but it won’t be as bad as it might have been.

Or maybe all you can do is add a bit of resilience to the system. Knowing that your inputs are unreliable even after some sanitizing, be ready to reject them. Instead of crashing or passing the problem on to someone else, stop what you’re doing and return some kind of error to someone who can do something about it. If you can’t do that, at least log enough information so that you know what happened. And automate the recovery process. Or if you can’t do that, script it. There have been many times where I wasn’t in a position to prevent a problem from happening, but once I knew it could happen, I can’t think of a single time where there was nothing I could do to make things easier to diagnose and/or recover from the situation.
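As a sketch (the sensor domain and its limits are invented), that can be as little as this: validate the input you know is unreliable, log enough to diagnose it later, and hand back an error someone can act on instead of crashing or passing the problem downstream:

```go
package ingest

import (
	"fmt"
	"log"
)

type Reading struct {
	SensorID string
	Celsius  float64
}

// Validate rejects bad input instead of letting it flow downstream.
func Validate(r Reading) error {
	if r.SensorID == "" {
		return fmt.Errorf("reading rejected: missing sensor id")
	}
	if r.Celsius < -90 || r.Celsius > 60 {
		// Log enough to know what happened, then return an error to
		// someone who can actually do something about it.
		log.Printf("sensor %s: out-of-range reading %.1fC", r.SensorID, r.Celsius)
		return fmt.Errorf("reading rejected: %.1fC out of range", r.Celsius)
	}
	return nil
}
```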

What makes it possible is the mindset change that comes from dropping the just. From changing a comment that makes you powerless to a statement of reality that you can do something about.

That’s just the way it is


That’s the way it is

Come to think of it, that’s good advice not just for software development, but for life in general.


  1. Just and But as modifiers, the difference between them, and how different people use them is a whole separate topic for another day. ↩︎