
by Leon Rosenshein

Games And Agency

I recently finished listening to the first chapter of Worlds Beyond Number. It’s a podcast that uses D&D rules to help define a world built for storytelling. Brennan Mulligan is the Dungeon Master (DM) and world builder. He’s responsible for (almost) all of the backstory, pacing, and events. He does an amazing job of not just building a world and making it feel alive, but also providing the context that the three live characters act within. He does it in a way that makes the people playing the characters, and the folks listening in along the way, feel like they have control of their destiny, even while knowing there is a lot they don’t know about the world and its limitations.

Another podcast I regularly listen to is Brian Marick’s Oddly Influenced. It’s all about how folks have taken the teachings and writings of other fields and applied them to software development. A recent episode was an interview with Jessica Kerr, where they discussed the book Games: Agency as Art. Or at least that was the reason they did the interview. The interview itself was about far more than games, agency, or art.

One of the things they talked about was, of course, agency. What it was, what it wasn’t, and where it came from. Although similar to what Daniel Pink talked about in Drive, it’s a slightly different take on the idea. It approaches it from how you craft or adjust the environment to help the player have agency. In games, the boundaries of the player’s agency come from the game designer (and the developer’s implementation of the designer’s vision). The designer provides the goals, which set the direction (purpose in Pink’s description), the player’s capabilities (Pink’s mastery), which define what the player can do, and the rules, which define what the player can’t do (the boundaries of Pink’s autonomy). The thing about the kind of agency that a game designer can provide, though, is that it has to be completely defined up front. The game has a beginning, a middle, and an end, and downloadable content aside, the designer gets no real-time feedback from the player, and has no ability to change the game after it ships.

A DM, on the other hand, such as Brennan in Worlds Beyond Number, has to do much of the same work as the game designer up front, but after that, the DM is right there in the world with the player, getting feedback and making adjustments (staying within the defined framework) that adapt and redefine both the world and the player’s agency. There’s a natural tension there, between maintaining the status quo and keeping the world operating per its rules, and ensuring that the players that inhabit the world are having a good time. Because if they’re not having a good time they’ll pick up their dice and go home. If they do that then the game is over. Even though it’s not technically player vs DM, if the players quit, the DM has clearly lost (whatever that means).

Which gets us to the Oddly Influenced part. It’s not a big stretch to think of an engineering manager (EM)/development lead as the DM in a world-building game. The EM has an initial set of goals that they want to see met. They have tools and capabilities they can provide to the team: compilers, platforms, compute and storage resources, consultants, and other teams (non-player characters in the D&D world). They also provide a set of constraints (rules) the team needs to work within. Deadlines and schedules. External regulatory requirements and internal processes that must be followed. Networks and the laws of physics. Just like a game designer or DM, they define the Goals, Capabilities, and Rules. They define the boundaries of the agency that the members of the team have.

And just like a designer or DM, a good EM uses those levers to provide not just agency, but purpose and fulfillment. Make the goals too difficult, or arbitrarily add a rule that makes using a capability impossible, and the team (or player) gets frustrated. Conversely, make the goals too easy or provide rewards arbitrarily, and there’s no challenge or growth. The team (or player) gets bored and finds something else to do.

Also like a DM, the EM gets immediate feedback from the team. Are goals being met? Is the team “enjoying” the journey? Are they getting ahead of the goals, or keeping the goals from being met? How can the environment (capabilities and rules) be changed to provide more fulfillment for the team while still incentivizing moving towards the goals?

The environment, however, is where things get more complicated for the EM. The game designer and the DM have complete control over the environment. That, unfortunately, isn’t so for the EM. The overall goals are given to the EM. And there’s not just one EM. Typically, not even one EM for any particular goal. The EM has to work with their partner EMs to reach the goals. Or adjust them so they can be met. Meanwhile the group of EMs is getting feedback from their managers and customers/users on the validity of what’s being built. You might even say the EMs are a team with some level of agency working within a framework of goals, capabilities, and rules.

Once again, it’s turtles all the way down. But at least at any given level, you’ve got another frame to view the situation through and to use to help make decisions so that, at that level, the designer/DM/EM knows what levers there are and the players/team knows what their level of agency is. Or, you could look at it the other way around. We’re all playing a game together. Someone else has defined the goals, capabilities, and rules. We can work with them, and each other, within and across the levels of the stack, to provide feedback to each other to jointly maximize goals met and enjoyment/fulfillment. Which is really the meta-goal.

by Leon Rosenshein

Shallow Hurry

I ran across the term shallow hurry the other day and it resonated deeply (no pun intended) with me. Shallow hurry means doing just what you’ve been doing, only doing it faster. The expectation is that you’ll get done sooner. And that might even be true. In the short term.

Typing faster will get your code written a little bit sooner, but there’s a natural limit on that. While there may be times when we’re actually limited by typing speed, that’s not usually the case. Not refactoring the code when you see a need/opportunity or not writing/running tests, on the other hand, will often get your code into production sooner. The first time. Sometimes the second time. And occasionally the third time. After that, not so much. You find you’re fighting the code. You’re spending a lot of time dealing with edge cases and weird constraints. You’re working harder, typing more and faster, and moving slower.

That’s an example of shallow hurry. It makes you faster in the moment, but long term, you’re slower. After the initial speedup, you spend more time avoiding problems than you do making forward progress. All the problems you pushed off until tomorrow are still there, and the shortcuts you took have added to that burden and made the problems interact in new, exciting, and damaging ways. So to make progress you need to keep finding corners to push the problems into. Or you bury them in another layer of abstraction, leaving the problem hidden under the covers to bite some unsuspecting maintainer in a few weeks/months.

There are lots of reasons why this might happen. Some are even existential. Back in the days of selling games in boxes on shelves in brick and mortar stores, 80%+ of sales happened between Thanksgiving and Christmas. If your box wasn’t on the shelves, you didn’t make the sale. If you missed too many sales you ran out of money, and that was the end. So, in that case, you take the chance, do the shallow hurry, and hope you get the chance to fix it.

On the other hand, most of the reasons aren’t nearly that existential. Instead, the drive for Shallow Hurry often comes from internal biases and misaligned incentives. Biases like sunk cost, anchoring, and overconfidence. You’ve made a choice and put some effort into it. You don’t want to admit you might have made a mistake. Others around you fixate on the proposed solution, and of course, you know you’ve got to solve one more small problem or write one simple function, and you’ll reach the goal. Just move a little faster because you’re almost there.

Add to that a typical incentive system. Heroes are rewarded for putting in extra effort, for executing on the plan and rescuing the project. At the same time, questioning the plan is seen as not being a team player, and discouraged. Include a deadline coming up that you’ve promised to meet, and you find folks doubling down on what they’re already doing. Do it more. Do it faster. Get to the end and get the prize. Just before disaster strikes.

Because what often happens is that you’ve pushed a mountain of problems out in front of you. You’ve managed to reach the goal, but as is often the case, what you’ve reached is an intermediate goal. So you look to take the next step, and find there isn’t one. You’ve backed yourself into a corner, and before you can move forward, you need to find a new path. You might even need to change your goal, just like Mike Mulligan and His Steam Shovel.

Luckily, as easy as it is to slip into shallow hurry, it’s just as easy to recognize. When you find yourself avoiding even looking at options, you might be dealing with shallow hurry. When you start thinking about ways to spend a few more hours just trying random things to see how it works, you might be dealing with shallow hurry. And when you’re spending more and more time working on the same things, but the results aren’t changing, you’re probably dealing with shallow hurry.

And that’s the time to take a step back, look at what you’re doing, why you’re doing it, and ask yourself my favorite question. “What are you really trying to do here?”

by Leon Rosenshein

Lead With the Why, Not the Way

Taking a break from the book reviews, but sticking with the theme of software development being a social endeavor, there are many ways to get teams working on things together and doing them in a similar fashion. Some work better than others.

One of the best ways to make sure that everyone is working together and in a similar fashion is to work ensemble style. If the team is sitting together talking about and editing the same code at the same time then, by definition, they’re working together and in a similar fashion. I’ve had really good experiences with this with small groups and small tasks, but folks I generally trust and respect have reported good results with larger groups and longer-term projects. Seems like a good goal to strive for.

That said, that’s not always going to be possible. For any number of organizational and structural reasons, it often doesn’t make sense for an entire team to be working on the exact same thing at the exact same time. So how can you get everyone working together?

One way is by fiat. Lay down the law and demand that people do what you tell them, exactly when you tell them, and in the precise fashion you have ordained. That might work. Once or twice. As long as everything goes exactly as you expected. Sometimes that’s the right approach, but only sometimes, and not over the long term. If you stick with that method then pretty soon a situation will arise where your exact instructions cause things to stop, or perhaps make things worse. That’s probably not your desired outcome, so that’s not a good approach.

Or, you could go to the other extreme. Tell people that they should all work together and get things done, then walk away. Again, that might work a time or two, but pretty soon everyone is going to have their own interpretation of what it means and how they should be working. That ends up in one of two places. Chaos, with everyone doing what they think you mean, or someone deciding and enforcing, again by fiat, their will. That might be what you want, and result in a success or two, but long-term it still doesn’t work.

Which leads us back to autonomy, alignment, purpose, and urgency. You’ve got autonomy and urgency. How can you manage alignment and purpose? You can do that with what Alexis de Tocqueville called enlightened self-interest. The idea that you get the best results when people are working for a desired common goal, not just because it’s the goal, but also because it’s good for them. Said another way, like this entry’s title, Lead With the Why, Not the Way.

The best way to get a team working together, towards the same goal, in a similar fashion, is to help them understand why they want to do that. How it helps the team reach its goal and how it helps them reach their goals. It aligns intrinsic and extrinsic goals. It sets shared purpose. It gives people a reason to want to achieve the goals. It’s even more powerful when your why has multiple levels. It’s what the customer wants. It increases sales. It reduces support burden. It makes it easier to add the feature everyone wants to build.

And now that I think of it, I am talking about Governing the Commons. That’s all about how to set up a system to give the people in the system the appropriate whys, so that they do what’s best for the system. Because in the long run, that’s also what’s best for themselves.

by Leon Rosenshein

How Buildings Learn

Now that I’ve written about Seeing Like A State, I want to talk about How Buildings Learn. How Buildings Learn is, in many ways, a counterpoint to Seeing Like a State. It also has a lot of relevance to software design.

How Buildings Learn starts from the premise that over-architecting is bad. That the best way to ensure longevity is to architect not only for the now, but also for the future. And then, when you get feedback, listen to it and adapt. It’s a very agile way of approaching building architecture.

Brand goes into some detail about how designing for specific constraints is limiting. Of course, that makes sense. Every time you optimize for one thing, you’re not optimizing for something else. Instead, what Brand recommends is simple designs that you can adjust as you learn the true usage patterns.

There are multiple examples where premature optimization of architecture has caused problems. Consider the Fuller Dome. If all you’re trying to do is minimize resource usage it’s a great idea. Or if you’re building in a zero gravity environment. If you’re not, then you end up with a lot of wasted space in the top/center of the dome. Other examples are Falling Water and Villa Savoye. Both are examples of form trumping function and causing problems later on.

Both construction and software use the term architecture, but does Brand’s approach really apply? After all, the Unix Way is all about being specific. Doing one thing and doing it well. Which is the opposite of what Brand proposes. Or is it? The Unix way is not just about doing one thing. It’s also about composability. Which is really what Brand is getting at. Build something that is easy to subdivide into parts that are composable. Build the parts that meet your current need. When things change or you learn you need more, adjust the parts to match the new understanding. That’s evolutionary architecture.

And that’s in direct contrast to pre-defined legibility. A state (or organization) is often looking for control and predictability. So instead of building something that could work and then adjusting it to fit the exact needs, it asks for detailed, involved plans. And then it sticks with them, even if the reality on the ground shows problems (see Villa Savoye above).

Another, more software-based approach, but still with an architectural basis, is The Cathedral and the Bazaar. In it, Raymond describes the differences between working from a centrally defined/controlled plan and working from a set of common goals. According to Raymond, the Bazaar will get you a superior result. He’s got more than a little evidence to prove it.

However, the model of starting with something adaptable and a set of common goals and then building the perfect building (or piece of software or really any other shared resource) comes with its own problems. Not the least of which is diffusion of responsibility. How you handle that issue is critical to having a good outcome when buildings (or code) learn. Anarchy is not the way to reach a solution that optimizes for what everyone is trying to get done.

Which leads right to Ostrom’s Governing the Commons. But that’s a topic for another day.

by Leon Rosenshein

Seeing Like a State

I was going to compare and contrast Scott’s Seeing Like A State with Brand’s How Buildings Learn, but when I went to find the link to what I wrote, I realized that How Buildings Learn is going to have to wait, because, somehow, I haven’t directly talked about Seeing Like A State. I have mentioned legibility though, which is directionally similar.

In Seeing Like a State, Scott talks about the tendency of the state, really any large organization, to want to be able to measure, record, and control a system. Making it measurable means making it possible to record it in a ledger. Also, the organization (state) has a model that is used to predict the future. If you combine the record of how things were, with the model of how things will be, it’s not a big leap to believing you can control the future by controlling the measurements. And if you’ve made that leap you get to feel good about things. You have predictability. The model tells you what to expect. You have agency. Your results are the inputs to the model, so you have direct control over the results.

Unfortunately, things almost never work out that way. Models are, at best, approximations. So the results are at best approximations of the real world. The measurements that go into the model are often approximations as well. And when they’re not, they’re samples taken at a specific point in time, with a specific context. You can guess what happens when you feed approximations into a model that is itself an approximation. You get a prediction that sometimes has some similarity with reality, but very often doesn’t. You often run into the cobra effect.

This applies to software development as much as it applies to government. As much as software development is about making complex systems out of tiny parts that do one thing, it’s also a social activity. Just like organizations and states, you can’t predict the output of software development without recognizing that there are people involved and including their own internal thoughts and motivations. And while those things are generally qualitatively knowable, until someone like Hari Seldon arrives and gives us psychohistory, it’s not going to be legible.

Which means that the key takeaway from Seeing Like A State is not that you can measure and predict the future, but that you can’t. Or at least, you can’t predict to the level of precision and accuracy that you think you can. But that doesn’t mean you shouldn’t measure, or that you shouldn’t use models to predict. It just means you need to be much more thoughtful about it. You need to work with the system, from the inside. It’s much more about Governing the Commons, than seeing like a state. But that, like How Buildings Learn, is a topic for another day.

by Leon Rosenshein

Code Coverage Is NOT useless

Mini rant today. There are lots of teams across the software industry that are called some variation of “Software Quality”. That’s a lovely term. It means different things to different people. There are (at least) two kinds of quality at play here. Internal software quality (ISQ) and external software quality (ESQ). ESQ is about correctness and suitability for the task at hand. ISQ is about the code itself, not whether or not it works as specified. Not all quality teams are responsible for both kinds of quality.

Furthermore, as much as people want it to mean that the team called “Software Quality” is responsible for ensuring that the entire org is building software with both internal and external quality, that isn’t the case. Those teams are not, and cannot be, responsible for what others do. After all, they’re not the ones writing the code. What it does mean, and what they can (and generally do) do, is that they are responsible for defining and promoting good practices and, especially, for pointing out places in the codebase where the code misses the mark.

There are two very important points in that last sentence. The first is that the quality team’s job is to identify where the code misses the mark, NOT which developers missed it. Code ownership is important, and people write the code, but it’s important to distinguish between problems with code and process and problems with people. That, however, is a topic for another time.

The other point, and where I’m going with today’s post, is the pointing out part. The quality team’s job is to point out, with comparable, if not truly objective values, how much ISQ the code has. There are lots of ways to do that. Things like cyclomatic complexity, lint/static analysis warnings, code sanitizer checks, or code coverage percentages. Those measures are very objective. There are X lint errors. Your tests execute Y percent of your codebase and cover Z percent of the branch decisions. And you can track those numbers over time. Are they getting closer to your goal or further? You can argue the value of all of those metrics, but they’re (relatively) easy to calculate, so they’re easy to report and track.

Which, finally, gets us to today’s rant. I ran across this article that says code coverage is a useless metric. I have a real problem with that. I’m more than happy to discuss the value of code coverage metrics with anyone. I know that you can have 100% code coverage and still have bugs. It’s easy to get to a fairly high percentage of code coverage that still says nothing about correctness. In complex systems with significant amounts of emergent behavior it’s even harder to get correctness from low-level unit tests. Just look at that article.

What bothers me most about that article is the click-baity title and the initial premise. It starts from “Because it’s possible for a bad (or at least uncaring) actor to get great coverage and not find bugs, coverage metrics are useless.” If you have that approach to management, you’re going to get what you measure. To me, code coverage is a signal. A signal you need to balance with all of the other signals. Letting one signal overpower all the others is hiding the truth. And like any useful signal, its absence is just as enlightening as its presence. If you have a test suite that you think fully exercises your API and there are large areas of code without coverage, why do you even have that code? If you really don’t need it, remove it. Maybe your domain breakdown is wrong and it belongs somewhere else? Should it be moved? If you find that there are swaths of code that are untestable because you can’t craft inputs that exercise them, do you need a refactor? Is this an opportunity for dependency injection?
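
Here’s what that last question can look like in practice. This is just a minimal sketch in Go, with names I made up for the illustration (RateSource, Convert, fixedRates): the hard-to-cover code was calling an external service directly, so no test input could reach its error path. Pull the dependency out behind an interface and both paths become coverable.

```go
package billing

import "fmt"

// RateSource abstracts where an exchange rate comes from, so a test can
// inject a stub instead of calling a real service over the network.
type RateSource interface {
	Rate(currency string) (float64, error)
}

// Convert multiplies an amount by the looked-up rate. With the dependency
// injected, both the happy path and the error path are easy to exercise.
func Convert(amount float64, currency string, rates RateSource) (float64, error) {
	r, err := rates.Rate(currency)
	if err != nil {
		return 0, fmt.Errorf("looking up rate for %s: %w", currency, err)
	}
	return amount * r, nil
}

// fixedRates is a map-backed test double that satisfies RateSource.
type fixedRates map[string]float64

func (f fixedRates) Rate(currency string) (float64, error) {
	r, ok := f[currency]
	if !ok {
		return 0, fmt.Errorf("no rate for %q", currency)
	}
	return r, nil
}
```

A test can now hand Convert a fixedRates{"USD": 1.0} and assert on both the conversion and the unknown-currency error, and go test -cover shows the branch being exercised instead of a permanent hole in the report.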

So the next time someone tells you that code coverage is a useless metric, maybe the problem isn’t the metric, it’s how they’re using code coverage. That’s an opportunity for education, and that’s always a good thing.

by Leon Rosenshein

Fly Like An Eagle

I’ve talked about time before. It passes, and there’s not much you can do about that. Even in a simulator, time passes. That adds a lot of complexity. Especially in keeping track of things. And when they happened. And when you found out about them. And when someone asks you about it.

I’ve talked about time and dates being hard to deal with before. Then there’s the winter solstice, which merges time, dates, durations, and the English language. You end up with something that is hard to track, hard to talk about, and, more to the point, hard to reason with and hard to program for.

Even if you go with the standard unidirectional time, there are still a lot of things to keep track of. There’s the very simple side of it. What happened and when. I started working at Aurora in January 2021. That’s straight-forward. I stopped working there in February of 2022. Also straight-forward. It’s pretty easy to keep track of that. That means I worked at Aurora for just over 13 months. Very simple. But it gets more complicated. I also started working at Aurora in May of 2023. So I have 2 start dates. And I’ve worked for Aurora for almost 15 months. It’s also been almost 30 months since I started working for Aurora. So that’s a great example of how knowing a start date isn’t nearly enough to really know what happened.
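
Here’s a small Go sketch of that. The dates are made-up stand-ins, not my actual employment records, but they show why “when did you start?” is a different question than “how long have you worked here?”

```go
package main

import (
	"fmt"
	"time"
)

// span is one continuous stretch of employment. A single start date can't
// capture a history with a gap in it; you need every span.
type span struct {
	start, end time.Time // end is the zero value for a still-open span
}

func main() {
	now := time.Date(2024, time.August, 1, 0, 0, 0, 0, time.UTC)
	spans := []span{
		{start: time.Date(2021, time.January, 4, 0, 0, 0, 0, time.UTC),
			end: time.Date(2022, time.February, 11, 0, 0, 0, 0, time.UTC)},
		{start: time.Date(2023, time.May, 1, 0, 0, 0, 0, time.UTC)}, // still open
	}

	var employed time.Duration
	for _, s := range spans {
		end := s.end
		if end.IsZero() {
			end = now
		}
		employed += end.Sub(s.start)
	}

	const month = 30 * 24 * time.Hour // rough, but good enough for the point
	fmt.Printf("months actually employed:   %.1f\n", float64(employed)/float64(month))
	fmt.Printf("months since the first day: %.1f\n", float64(now.Sub(spans[0].start))/float64(month))
}
```

Two different numbers, both true, and neither one is derivable from a single start date.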

Another place where time isn’t as simple is time zones and things like daylight savings time. In the United States, twice a year, in most places (but not all), the clocks change, either forward or backward. Outside of those times it’s easy to tell what time it is, but during those missing/added hours it gets a little odd. Add in time zones and it gets harder.

What makes it even worse, is that the rules for time zones and daylight savings time change. So if you ask when daylight savings time ends for a given year, you first need to find the rules that were in effect on that date, in that location. Which is very likely to be different than what the current rules are for your location. As for asking about the future, you can make a prediction, but you won’t know for sure until it happens.
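
Most languages deal with this by consulting a copy of the tz database (the system’s, or one they ship) and looking the rules up by location and instant. A quick Go sketch, with an example location and dates I picked for illustration, shows the same wall-clock time sitting at two different UTC offsets on either side of a US spring-forward:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// The rules live in the tz database on the machine. Ask for a location,
	// not an offset, because the offset depends on the instant.
	denver, err := time.LoadLocation("America/Denver")
	if err != nil {
		panic(err)
	}

	// Noon local time the day before and the day after the 2023 US
	// spring-forward (March 12th).
	before := time.Date(2023, time.March, 11, 12, 0, 0, 0, denver)
	after := time.Date(2023, time.March, 13, 12, 0, 0, 0, denver)

	for _, t := range []time.Time{before, after} {
		name, offset := t.Zone()
		fmt.Printf("%s is %s (UTC%+d)\n", t.Format("2006-01-02 15:04"), name, offset/3600)
	}
	// Same wall-clock time, different offsets, and 2:30 AM on the Sunday in
	// between never existed on local clocks at all.
}
```

And since that database gets updated when jurisdictions change their rules, the answer you compute also depends on which version of the rules you have installed, which is exactly the “rules in effect on that date” problem.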

Another thing you need to keep track of is the difference between when something happens and when you find out about it. One of the more common places where this happens is around payroll. On any given day you have a pay rate. It might be hourly, weekly, monthly, annual, or even a percentage of something else, like sales. That’s simple (except for things like time duration and daylight savings time). But what happens when there’s a change to pay rate? Sometimes it’s forward looking, and that’s not too bad. On some specified future date, the rate changes, so when the date arrives you change the calculation, and all is well.

But what happens when it’s a retroactive change? As of the first of last month, your new rate is 5% higher. Now you need to go back and calculate a new payment, subtract what was paid, then pay the delta. Again, not too bad, as long as you remember to do it. Consider this though. On Aug 1st you’re told that as of June 1st your pay rate has been increased. Great. Congratulations. But you just applied for a new mortgage on July 1st and you told them your pay was X. You were being honest, but on Aug 1st you find out that you were wrong. Does it matter? Maybe. Maybe not. But it’s real, and it happens. So you need to keep track of what happens, when it happens (takes effect), and when you found out about it. Because all of those things change the answer you’ll give when asked a question with a temporal component.
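
This is the classic bitemporal split: one timeline for when a change takes effect, another for when you learned about it. Here’s a sketch of one way to model it in Go; the names, and the simplification down to a single pay rate, are mine.

```go
package payroll

import "time"

// RateChange records both when a new rate takes effect in the real world
// and when it was recorded in the system. Keeping both lets you answer
// "what was the rate on June 1st?" and, separately, "what did we believe
// the rate was on June 1st, given only what was known on July 1st?"
type RateChange struct {
	Rate        float64
	EffectiveAt time.Time // when the change applies
	RecordedAt  time.Time // when the change entered the system
}

// RateAsOf returns the rate in effect at `effective`, using only the
// changes that had been recorded by `known`. Later-recorded retroactive
// changes are ignored, so past answers stay reproducible.
func RateAsOf(history []RateChange, effective, known time.Time) (float64, bool) {
	var best *RateChange
	for i := range history {
		c := &history[i]
		if c.RecordedAt.After(known) || c.EffectiveAt.After(effective) {
			continue
		}
		if best == nil || c.EffectiveAt.After(best.EffectiveAt) {
			best = c
		}
	}
	if best == nil {
		return 0, false
	}
	return best.Rate, true
}
```

With the retroactive raise recorded on Aug 1st and effective June 1st, asking RateAsOf for June 1st as known on July 1st still returns the old rate, because that’s what was known when the mortgage application went in, while asking as known on Aug 2nd returns the new one. Same effective date, two honest answers.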

As much as we’d like to make Time Stand Still, as Steve Miller said, time keeps on slippin’, slippin’, slippin’, into the future. We’d at least like time to be linear, monotonic, and never go backwards, but that’s not the way things are. At least not as people experience it. To physics, time might always go forward at a constant rate, but to the people who live it, things aren’t as simple. Things happen in their own time. We find out about them in our own time. Sometimes right as it’s happening. Sometimes before, so we can plan for it. And sometimes long after things happen. And we need to keep track of all of that.

by Leon Rosenshein

Milestones Vs. Steppingstones

In software, we’re very familiar with the idea of a milestone, but not steppingstones. Which is odd, because the two terms are very similar. Where does the term come from? Like many things in the western world, the term milestone comes from the Roman Empire. The Roman Empire did lots of things throughout Europe and Asia. Some good, some bad. One thing they did really well was build roads. Good, solid roads that you could count on to get you from here to there, regardless of the season or weather. You also knew where you were, because they put milestones along the road. At fixed, well-known intervals (every mile) along the major roads was a marker, a milestone, that you could use to know how much progress you had made.

These days we have mile markers along our major roads, not actual stones, but we still use the term. In projects we use the term to mark significant points along the project’s journey from start to finish. They’re usually big, complex, demo-able things with fixed dates. They can be pretty important. They are almost always something fairly concrete and definable in the domain the user of your software can understand.

Steppingstones, on the other hand, aren’t something we talk about much. While milestones are the markers along the way that let us know how far we’ve come, steppingstones are the little increments you use as you proceed from milestone to milestone. They’re solid, well anchored, stable places you can step to along the way. They usually help you to avoid falling into the water or sinking into the mud, but you can use steppingstones any time you need a place along the way to keep from making a mess or getting stuck.

In software we love to talk about analogies. To the stakeholders, the people who are not closely involved in the development of the software but are responsible for ensuring the project succeeds, and often also responsible for providing resources, milestones often get used to provide confidence. Confidence that things are proceeding at the expected pace, that the result will be something like what they’re expecting, and that it will arrive on the date it’s expected.

For those directly working on the project, the implementors, milestones provide a goal to work towards. They explain, in generally plain language, the functionality that someone who isn’t deeply involved with the project is supposed to be able to see. They’re a little bit squishy, because they don’t describe all of the possible edge cases, problems, and oddities along the way, but that’s a good thing. They let you figure out how to meet the requirements. And when the requirements don’t make sense, they give an opportunity and a forum to explain, again, in simple language, why they don’t make sense. Even more important, they give you a date. A time when the result is expected. That’s really good for helping you focus on what’s important. Focusing on which decisions need to be made now, and which decisions can (and should) wait until later.

Just like milestones have an analogy in software development, steppingstones have one too. As with milestones, stakeholders and implementors view steppingstones differently. But where they both see milestones as important, stakeholders generally don’t care about any of the details of the steppingstones. In fact, as long as you don’t fall off the path, they don’t even want to hear about the steppingstones. They’re an implementation detail left to the implementors. For the implementors though, steppingstones are critical. They’re the stuff of the day-to-day work. Often you can’t see more than two or three steppingstones in front of you, so you can’t pick out which one you’re going to use until you get there. And where you find yourself directly impacts the choices you have on where to go next. You often have some idea of the steppingstones along the way, but which exact ones you end up using you won’t know until you get there.

Here’s another way to think about it. At the beginning of Raiders of the Lost Ark, Indy is trying to get a golden idol from a lost temple. He knows his milestones. Find the temple. Find the entrance. Find the idol in the temple. Get out. Get home. He has a certain amount of supplies and tools, and he plans his route accordingly. What he can’t do beforehand, though, is plan how to do each of those things. He knows there are booby traps along the way, but he doesn’t know what they are until he gets there. So he finds the steppingstones as he comes to them. In the room with the idol, he literally has to choose the correct first steppingstone before he can even start looking for the next one, and so on until he gets to the idol.

When you think a little more deeply about it, the difference between a milestone and a steppingstone is more of a question of scope and viewpoint than it is an objective reality. Just as software architecture can be seen as software design at a different scale, your steppingstones could be someone’s milestones, and your milestones are probably viewed as steppingstones by someone else. Which is another way of saying we need to think about the steppingstones along the way. And take many more much smaller steps.

by Leon Rosenshein

Complexity And Cognitive Load

Software design is not about minimizing design complexity, but rather spending our complexity budget where it can do the most good. — Kent Beck

Let’s face it. Very often the systems we build are complex. And they’re complex in many different ways. Ways you just need to deal with. And it’s got nothing to do with how easy (or hard) it is to explain the task in English.

Sometimes the complexity is in the domain. In the US, if you’re writing tax software then you have the complexity of the federal tax laws, which are at best ambiguous, and probably contradictory. Add to that taxes for state and local jurisdictions. And foreign work and income. And where you live. And where you work.

Other times the complexity is in the details. You would think Time is the most monotonically increasing thing there is, but it isn’t that simple. Time is a lot more complex than you think. The same applies to people’s names and addresses. In fact, keeping track of pretty much any personal information is more complicated than you think. And that’s before you think about the privacy implications of storing that data.

It can also be scale that makes for complexity. It’s (relatively) easy to handle 10 transactions per second, but if you need to handle 10 million without adding latency that’s a whole different level of complexity. Finding the longest word in a list of 10 words is easy. Finding the longest word in Tolstoy’s War And Peace is much more complex. And that’s not even thinking about which language you’re counting in.

We can’t get rid of the complexity, so compartmentalizing it helps. Providing the right level of abstraction hides the complexity. Behind the abstraction, the complexity is the only thing you need to worry about. Outside the abstraction, there is no complexity. You only need to think about the problem you’re solving, and don’t need to think about the complex parts.
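
In code, that boundary is usually just an interface or a module surface. Here’s a tiny Go sketch reusing the tax example from above; the names and the flat placeholder rate are made up, but the shape is the point: everything messy lives behind one method, and callers never see any of it.

```go
package tax

// Calculator is the entire surface the rest of the system sees. All of the
// ambiguous, contradictory, ever-changing rules live behind this one method.
type Calculator interface {
	OwedFor(year int, income float64) (float64, error)
}

// usCalculator is where the complexity is allowed to live: federal brackets,
// state and local rules, residency, foreign income, and so on.
type usCalculator struct {
	// jurisdiction tables, residency rules, etc.
}

func (c *usCalculator) OwedFor(year int, income float64) (float64, error) {
	// The real rules go here. A flat rate stands in for them in this sketch.
	return income * 0.2, nil
}

// Withholding is what "outside the abstraction" looks like: it worries about
// its own problem and asks exactly one question about taxes.
func Withholding(c Calculator, year int, grossPay float64) (float64, error) {
	return c.OwedFor(year, grossPay)
}
```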

Now we’re talking about cognitive load. It’s in the title and something I’ve written about before. It’s a measure of how many things you need to keep thinking about and be aware of that aren’t the problem you’re trying to solve, but are critical to solving the problem. The more you can reduce the cognitive load, the less effort you need to put into the ancillary problems, the more effort you can put into solving the problem you’re trying to solve.

Which is what Beck is talking about. Figure out where the complexity in your problem is, and put your effort there. Make everything else as simple as it can be. Define the domain you work in. Don’t try to be everything to everyone, just solve the problem you’re solving. Use existing solutions. Don’t build your own encryption module, use a well-vetted one. Don’t build your own database system (you might need your own tables and stored procedures, but not a new DB).

You have a problem you’re trying to solve. You have a limited amount of cognitive load you can bring to bear on the problem, so spend your cognitive load (and complexity) wisely. Spend it on the part of the problem that is your value add, not on something you could hide behind an existing abstraction.

by Leon Rosenshein

Monolith Is A Deployment Strategy, Not An Architecture

There was an article a few weeks ago about how the Amazon video team switched one of their tools from a distributed microservice architecture to a monolith that runs/scales on EC2. Does this mark the beginning of the end for microservices? Were we wrong to decompose all those monoliths into microservices? Should we recombine all of our microservices and serverless systems back into monoliths?

Or, is this just another case of It Depends? I say It Depends. Because the difference between a monolith based system and a microservice based system isn’t really the design and segmentation of the code. It’s in the tradeoffs you make when deploying the code. The tradeoffs you make with Conway’s Law to keep from shipping your org structure. The tradeoffs you make when you think about needing to scale part of the process, but not all of it. The tradeoffs you make for performance. The tradeoffs you make to manage cognitive load.

Sure, monoliths get a bad rap and we often think of monoliths as nothing more than a container for your Big Ball Of Mud. And sometimes they are. I’ve been involved in my share of monolithic balls of mud. But they don’t have to be that way. If you pay attention to domain driven design you can have a well written monolith. With separation of concerns. With clean boundaries. With good abstractions that keep your cognitive load down.

At the same time, we think of microservices as the answer to all of our scaling needs. Need a new API? Just make a new microservice. Need more of something? Just create more instances of that existing service. At the same time, though, you end up with lots of different ways to do something. Every team/service becomes an island and does things its own way. And each one of those calls between services takes time, slowing things down. Have you ever tried debugging across service boundaries? It’s not easy. Or even just tracing what services are used in any given call chain. At one point in Uber’s microservice journey there were more microservices than engineers. Personally, I don’t think that’s a good thing.

So now that we’ve determined that you can have good (or bad) design with both monoliths and microservices, how do you choose? You choose based on what makes sense as a deployment methodology. How are you going to update things when you need to? It comes back to those tradeoffs. There are lots of things that you’re balancing. Ease of deployment. Horizontal vs vertical scaling. Depth and tightness of coupling. Debugability. Cognitive load.

Deploying a monolith is easy. There’s only one thing to deploy, so you don’t have to worry about versions. You don’t have to worry about the order of deployment. It’s always compatible with itself. Rollback, if needed, is just as easy. Deploying a single microservice is also easy, but what if it’s a breaking change? What else do you need to deploy first? What do you need to deploy after? What is or isn’t backward compatible? How can you test the whole system? Lots to think about and lots to get wrong if you’re not careful.

On the other hand, scaling is much easier with a microservice. If you have a service that is slower than the others, you can just deploy more of that microservice. Or you can give just that service more CPU/Memory. You get to scale what you need, when you need it. A monolith is the exact opposite. If there’s one function you need to scale out/up, you need to scale everything out/up. So you have lots of waste.

Everywhere you look, you should be looking at monolith vs microservice as a question of what and how you deploy things, not how you decompose things into functions/libraries/APIs.