by Leon Rosenshein

Milestones Vs. Steppingstones

In software, we’re very familiar with the idea of a milestone, but not steppingstones. Which is odd, because the two terms are very similar. Where do the terms come from? Like many things in the western world, the term milestone comes from the Roman Empire. The Roman Empire did lots of things throughout Europe and Asia. Some good, some bad. One thing they did really well was build roads. Good, solid roads that you could count on to get you from here to there, regardless of the season or weather. You also knew where you were, because they put milestones along the road. At fixed, well-known intervals (every mile) along the major roads was a marker, a milestone, that you could use to know how much progress you had made.

These days we have mile markers along our major roads, not actual stones, but we still use the term. In projects we use the term to mark significant points along the project’s journey from start to finish. They’re usually big, complex, demo-able things with fixed dates. They can be pretty important. They are almost always something fairly concrete and definable in the domain the user of your software can understand.

Steppingstones, on the other hand, aren’t something we talk about much. While milestones are the markers along the way that let us know how far we’ve come, steppingstones are the little increments you use as you proceed from milestone to milestone. They’re solid, well anchored, stable places you can step to along the way. They usually help you to avoid falling into the water or sinking into the mud, but you can use steppingstones any time you need a place along the way to keep from making a mess or getting stuck.

In software we love to talk about analogies. To the stakeholders, the people who are not closely involved in the development of the software, but are responsible for ensuring the project succeeds, and often also responsible for providing resources, milestones often get used to provide confidence. Confidence that things are proceeding at the expected pace, that the result will be something like what they’re expecting, and that it will arrive on the date it’s expected.

For those directly working on the project, the implementors, milestones provide a goal to work towards. They explain, in generally plain language, the functionality that someone who isn’t deeply involved with the project is supposed to be able to see. They’re a little bit squishy, because they don’t describe all of the possible edge cases, problems, and oddities along the way, but that’s a good thing. They let you figure out how to meet the requirements. And when the requirements don’t make sense, they give an opportunity and a forum to explain, again, in simple language, why they don’t make sense. Even more important, they give you a date. A time when the result is expected. That’s really good for helping you focus on what’s important. Focusing on which decisions need to be made now, and which decisions can (and should) wait until later.

Just like milestones have an analogy in software development, steppingstones have one too. As with milestones, stakeholders and implementors view steppingstones differently. But where they both see milestones as important, stakeholders generally don’t care about any of the details of the steppingstones. In fact, as long as you don’t fall off the path, they don’t even want to hear about the steppingstones. They’re an implementation detail left to the implementors. For the implementors though, steppingstones are critical. They’re the stuff of the day-to-day work. Often you can’t see more than two or three steppingstones in front of you, so you can’t pick out which one you’re going to use until you get there. And where you find yourself directly impacts the choices you have on where to go next. You often have some idea of the steppingstones along the way, but which exact ones you end up using you won’t know until you get there.

Here’s another way to think about it. At the beginning of Raiders of the Lost Ark, Indy is trying to get a golden idol from a lost temple. He knows his milestones. Find the temple. Find the entrance. Find the idol in the temple. Get out. Get home. He has a certain amount of supplies and tools, and he plans his route accordingly. What he can’t do beforehand though, is plan how to do each of those things. He knows there are booby traps along the way, but he doesn’t know what they are until he gets there. So he finds the steppingstones as he comes to them. In the room with the idol, he literally has to choose the correct first steppingstone before he can even start looking for the next one, and so on until he gets to the idol.

When you think a little more deeply about it, the difference between a milestone and a steppingstone is more of a question of scope and viewpoint than it is an objective reality. Just as software architecture can be seen as software design at a different scale, your steppingstones could be someone’s milestones, and your milestones are probably viewed as steppingstones by someone else. Which is another way of saying we need to think about the steppingstones along the way. And take many more much smaller steps.

by Leon Rosenshein

Complexity And Cognitive Load

Software design is not about minimizing design complexity, but rather spending our complexity budget where it can do the most good. — Kent Beck

Let’s face it. Very often the systems we build are complex. And they’re complex in many different ways. Ways you just need to deal with. And it’s got nothing to do with how easy (or hard) it is to explain the task in English.

Sometimes the complexity is in the domain. In the US, if you’re writing tax software then you have the complexity of the federal tax laws, which are at best ambiguous, and probably contradictory. Add to that taxes for state and local jurisdictions. And foreign work and income. And where you live. And where you work.

Other times the complexity is in the details. You would think Time is the most monotonically increasing thing there is, but it isn’t that simple. Time is a lot more complex than you think. The same applies to people’s names and addresses. In fact, keeping track of pretty much any personal information is more complicated than you think. And that’s before you think about the privacy implications of storing that data.
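As one small illustration of time being less monotonic than it looks, here is a minimal sketch using only Python's standard library. The fixed offsets for EDT and EST and the chosen date (the 2021 US fall-back) are assumptions picked for the example. The point is only that two different instants can show the same wall-clock reading.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration: US Eastern daylight and standard time.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# Two instants exactly one hour apart, on either side of the 2021 fall-back.
before = datetime(2021, 11, 7, 5, 30, tzinfo=timezone.utc)
after = before + timedelta(hours=1)

# What a wall clock in that zone would have shown at each instant.
wall_before = before.astimezone(EDT).replace(tzinfo=None)
wall_after = after.astimezone(EST).replace(tzinfo=None)

print("wall clock before:", wall_before)
print("wall clock after: ", wall_after)
print("same wall-clock reading:", wall_before == wall_after)
print("same instant:", before == after)
```

Sorting or comparing events by local wall-clock time would treat these two instants as simultaneous, which is one reason to store instants in UTC and convert only for display.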

It can also be scale that makes for complexity. It’s (relatively) easy to handle 10 transactions per second, but if you need to handle 10 million without adding latency that’s a whole different level of complexity. Finding the longest word in a list of 10 words is easy. Finding the longest word in Tolstoy’s War And Peace is much more complex. And that’s not even thinking about which language you’re counting in.

We can’t get rid of the complexity, so compartmentalizing it helps. Providing the right level of abstraction hides the complexity. Behind the abstraction you only need to worry about the complexity. Outside the abstraction there is no complexity. You only need to think about the problem you’re solving and don’t need to think about the complex parts.

Now we’re talking about cognitive load. It’s in the title and something I’ve written about before. It’s a measure of how many things you need to keep thinking about and be aware of that aren’t the problem you’re trying to solve, but are critical to solving the problem. The more you can reduce the cognitive load, the less effort you need to put into the ancillary problems, and the more effort you can put into solving the problem you’re trying to solve.

Which is what Beck is talking about. Figure out where the complexity in your problem is, and put your effort there. Make everything else as simple as it can be. Define the domain you work in. Don’t try to be everything to everyone, just solve the problem you’re solving. Use existing solutions. Don’t build your own encryption module, use a well vetted one. Don’t build your own database system (you might need your own tables and stored procedures, but not a new DB).

You have a problem you’re trying to solve. You have a limited amount of cognitive load you can bring to bear on the problem, so spend your cognitive load (and complexity) wisely. Spend it on the part of the problem that is your value add, not somewhere you can hide it behind an existing abstraction.

by Leon Rosenshein

Monolith Is A Deployment Strategy, Not An Architecture

There was an article a few weeks ago about how the Amazon video team switched one of their tools from a distributed microservice architecture to a monolith that runs/scales on EC2. Does this mark the beginning of the end for microservices? Were we wrong to decompose all those monoliths into microservices? Should we recombine all of our microservices and serverless systems back into monoliths?

Or, is this just another case of It Depends? I say It Depends. Because the difference between a monolith based system and a microservice based system isn’t really the design and segmentation of the code. It’s in the tradeoffs you make when deploying the code. The tradeoffs you make with Conway’s Law to keep from shipping your org structure. The tradeoffs you make when you think about needing to scale part of the process, but not all of it. The tradeoffs you make for performance. The tradeoffs you make to manage cognitive load.

Sure, monoliths get a bad rap and we often think of monoliths as nothing more than a container for your Big Ball Of Mud. And sometimes they are. I’ve been involved in my share of monolithic balls of mud. But they don’t have to be that way. If you pay attention to domain driven design you can have a well written monolith. With separation of concerns. With clean boundaries. With good abstractions that keep your cognitive load down.

At the same time, we think of microservices as the answer to all of our scaling needs. Need a new API? Just make a new microservice. Need more of something? Just create more instances of that existing service. At the same time though you end up with lots of different ways to do something. Every team/service becomes an island and does things its own way. And each one of those calls between services takes time, slowing things down. Have you ever tried debugging across service boundaries? It’s not easy. Or even just tracing what services are used in any given call chain. At one point in Uber’s microservice journey there were more microservices than engineers. Personally, I don’t think that’s a good thing.

So now that we’ve determined that you can have good (or bad) design with both monoliths and microservices, how do you choose? You choose based on what makes sense as a deployment methodology. How are you going to update things when you need to? It comes back to those tradeoffs. There are lots of things that you’re balancing. Ease of deployment. Horizontal vs vertical scaling. Depth and tightness of coupling. Debuggability. Cognitive load.

Deploying a monolith is easy. There’s only one thing to deploy, so you don’t have to worry about versions. You don’t have to worry about the order of deployment. It’s always compatible with itself. Rollback, if needed, is just as easy. Deploying a single microservice is also easy, but what if it’s a breaking change? What else do you need to deploy first? What do you need to deploy after? What is or isn’t backward compatible? How can you test the whole system? Lots to think about and lots to get wrong if you’re not careful.

On the other hand, scaling is much easier with a microservice. If you have a service that is slower than the others, you can just deploy more of that microservice. Or you can give just that service more CPU/Memory. You get to scale what you need, when you need it. A monolith is the exact opposite. If you have one function call you need to scale out/up, you need to scale everything out/up. So you have lots of waste.

Everywhere you look, you should be looking at monolith vs microservice as a question of what and how you deploy things, not how you decompose things into functions/libraries/APIs.

by Leon Rosenshein

0, 1, Many

Continuing on with looking at numbers, think about counting. We all know how to count. Or we think we do. But do we really think about how we should be counting? Consider the following quote.

“Common programmer thought pattern: there are only three numbers: 0, 1, and n.”     – Joel Spolsky

There’s more than a little truth to that statement. After all, from a linguistic standpoint there’s lots of precedent for it. My non-linguistic experience also tells me that there’s not just a quantitative difference between 0 and 1 and N, but there’s also a qualitative difference.

The qualitative difference shows up in many different ways. 0 is the same as never. That can’t/doesn’t happen, so don’t worry about it. 1 is the same as always. Count on it happening. Assume it already happened. Either way, always or never, 1 or 0, TRUE or FALSE, it’s a constant. There are no decisions needed. N, on the other hand, is maybe. You don’t know. It might happen. It might not. You can’t count on it. You need to handle it happening. You need to handle it NOT happening. Be prepared for both cases[1].

Another qualitative difference is that when there is a choice, it’s often not either/or, but one (or more) of many. In code that shows up as something that started as if/else, but eventually morphed into a series of if/elseif/elseif/elseif/…/else. Sure, that can work, but there are better ways. Listen to your data and let it guide you in your programming. This is where object-oriented programming, in its true sense, really comes into its own. You make the decision early about what the object is, then you just act on/with it in a linear sense. You get back to always (or never) for most decisions and let the object worry about what it means in that specific case.
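To make that concrete, here is a minimal sketch of the same behavior written both ways. The shape classes and the `area_by_kind` function are hypothetical, made up for illustration. The only point is that the decision about what the thing is gets made once, when the object is created, instead of at every place it is used.

```python
import math
from dataclasses import dataclass


# Version 1: the if/elif chain grows every time a new kind of shape appears,
# and every function that cares about the kind needs its own copy of the chain.
def area_by_kind(kind: str, size: float) -> float:
    if kind == "circle":
        return math.pi * size ** 2
    elif kind == "square":
        return size ** 2
    else:
        raise ValueError(f"unknown shape kind: {kind}")


# Version 2: decide what the object is once, then just act on it.
@dataclass
class Circle:
    radius: float

    def area(self) -> float:
        return math.pi * self.radius ** 2


@dataclass
class Square:
    side: float

    def area(self) -> float:
        return self.side ** 2


# The calling code is linear: no branching on the kind of shape.
shapes = [Circle(1.0), Square(2.0)]
total = sum(shape.area() for shape in shapes)
print(f"total area: {total:.3f}")
```

Adding a triangle in the second version means adding one class. In the first it means finding and editing every chain.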

Then there’s the learning case. I’ve said before that versions 1 and 2 of something are easy. It’s when you get to version N that things get interesting. Again, that first version is the never case. No one has done it before, so there are no special cases. Just do something and it will be ok. Version 2 is the always case. For anyone who has used version 1, it’s always been that way. There’s no ambiguity. Everyone, on both sides, knows what to expect. It’s only when you get to version 3+ that you get into the maybe case. You don’t know what your customer has experienced. You don’t know what they expect. They don’t know what is coming from you. And as I’ve said, that’s where the learning is. Dealing with the ambiguity is where you stretch and grow.

So, whether you’re thinking about your design, your implementation, your career, or life in general, think about how you deal with counting.


  1. Hint. You might think it’s a 0/1 situation, but check your assumptions. It might be a 0/1 situation, but our assumptions are often wrong, so think them through.

by Leon Rosenshein

1 > 2 > 0

I’m pretty sure this story is true, because I’ve heard it too many times, sometimes from people who could have been there. The people involved and timeline match up. Also, even if it’s not true, there’s still something to learn.

Amazon has always been about 2-pizza teams. You should be able to feed an entire team lunch with 2 pizzas. The idea is to keep them agile and innovative. To minimize communications delays and bottlenecks. It works pretty well too. It says nothing about the software architecture, only the scope of responsibility of a team.

Back around 2002, Amazon’s internal system was a large monolith. And there were lots of 2-pizza teams trying to work on it at the same time. It was pretty well designed, but with that much going on there were lots of interactions and coupling between teams and the work they were doing. So much that it really started to slow things down. It got so bad that Jeff Bezos issued an ultimatum.

All teams will henceforth expose their data and functionality through service interfaces.

That’s a pretty bold requirement. It meant that everything needed to change. From the code to the tooling to the deployment processes and automation. Over the next 2-3 years, Amazon did it. They changed to a Service-Oriented Architecture that endures to this day. It broke a lot of the direct coupling that had crept into the monolith. And it led directly to the creation of AWS and it being the dominant cloud platform. A cloud platform that can do everything from hosting Netflix’s compute and storage to hosting this blog.

It did that by clearly defining boundaries and letting teams innovate inside those boundaries to their hearts’ content. But it also led to some new problems. Because each team was responsible for everything inside those boundaries, teams started to write their own versions of things that were once shared libraries. And we all know that duplicating code is bad. You duplicate bugs. It makes updates hard. Teams ended up with code they were responsible for that they didn’t necessarily understand.

Enter Brian Valentine. He’d recently joined Amazon (in 2006) as a Senior VP, coming from Microsoft, where he’d led, among other things, the core Windows OS team. A huge organization with 1000s of people developing the hundreds of libraries and executables that made it up. He looked at what was going on and realized that lots of teams were writing the same code. That there were multiple implementations of the same functionality scattered throughout the codebase. That it was inefficient and wasteful and that those sorts of functionality should be provided by a set of core 2-pizza teams so that the other teams could focus on their specific parts of the business.

He worked with his team and his peers to define a system where those core teams would be identified and created, then all the other teams would start using them. They wrote the 6-pager that defined the system, how it would be created, and all the benefits. Eventually it got to a meeting with Jeff Bezos, then Amazon CEO. I believe everything up to this point is true. Here’s where it gets apocryphal. I want to believe it’s true, but I just don’t know.

After the required reading, Valentine summarized the point of the meeting by writing a single line on the whiteboard

1 > 2

Huh? One is definitely not greater than two. One is strictly less than two. What Valentine meant was that having one way to do some shared bit of functionality that is actually shared, not copied/reimplemented, is better. It’s more efficient. It means less duplicated effort. It lets teams focus on their value add instead of doing the same thing everyone else was doing. That makes sense. So that’s what Amazon does now, right?

Nope. After Valentine said that was how things should be done and stepped back, Bezos stepped up and changed it slightly. To

1 > 2 > 0

What? That makes even less sense. Two is greater than 0, and so is one, but two is not between one and zero. What Bezos was saying was that having one solution might be better than having two, but waiting for some central team to build something new or update some existing service to have a new capability takes time. It adds coupling back in. It makes things more complex. And while you’re waiting for the central team to do its part the dependent team can’t do anything. So for some, potentially long, period of time, you don’t have 1 solution, you don’t have 2 solutions, you have 0 solutions. And having zero solutions is even worse than having multiple solutions. The plan pretty much ended there. Like they say in the Go proverbs, “A little copying is better than a little dependency.”

Which is not to say you never go back and centralize common things. Amazon doesn’t expect every team to write their own operating system. They don’t write their own S3 access layer. They use Linux, or the AWS SDK. And when there is a critical mass of people doing something common then you write a centralized library that is shared. Or write a new service with a defined interface that everyone should call.

The trick is to do it at the right time. After there are enough instances to let you know what to build, but before there are too many versions to be able to replace them all with the central version.

by Leon Rosenshein

What You Do Next

“If you hit a wrong note, it’s the next note that you play that determines if it’s good or bad.”     –Miles Davis

That’s pretty deep. And it applies to things very far removed from jazz. Things that are very structured and precise. Things like code. Or, maybe, software isn’t as structured, precise, and linear as we think.

Before you get upset and tell me I’m nuts (which may be true, but doesn’t matter here), I want to be clear. Writing code that is well structured and precise is important. Being clear and understandable is important. Separation of concerns and listening to the data are important. Big ball of mud development is not what I’m talking about. I’m talking about how we write code, not the code we write.

Consider this alternate phrasing of what Miles Davis said.

You learned something new. How you respond determines if the knowledge was good or bad.

Those two sentences pretty much define the software development process. Everything else is an implementation detail. Waterfall or agile, greenfield or brownfield, startup or established in the industry, it doesn’t matter.

Put another way, it’s the OODA loop. Observe (see where you are). Orient (understand the situation). Decide (choose the next step). Act (do it). Because you are where you are, and the situation is what it is. Your involvement in getting there (Miles’ wrong note) doesn’t matter anymore. The only thing that matters is what you do next. How often (or fast) you run the loop.

Think about how empowering that is. The immutable past has happened. You can’t do anything about it. But you have a lot of control over the future. You have agency. You have power. You have the ability to make sure that the next step puts you in a better place than you are now.

If your next action moves you closer to your goal than where you currently find yourself then your move was good. Whether you find yourself closer or farther[1], run the loop again. And keep running it. Eventually you find yourself at your goal. The more experience you have in the field, and with the situation, the more likely your choice of action will move you closer to your goal.

That applies whether you’re playing jazz or writing software. And now that I think about it, to life in general.


  1. Closer or farther in the minimizing time and effort sense. Sometimes refactoring, which takes time, appears to be irrelevant, and has no visible impact, actually gets you to your goal with less time and effort.

by Leon Rosenshein

What Are You Testing? II

I’ve written about what you’re testing before. That article was about writing tests in a way that you could look at the test and understand what it was and wasn’t testing. It’s about writing readable and understandable code, and I stand by that article.

However, there are other ways to think about what you’re testing. The first, and most obvious, way to think about it is thinking about what functionality the test is validating. Does the code do what you think it does? That’s pretty straightforward. You could think about it like I described in the article and how readable and understandable the test is. Or, you could think a little more abstractly, and think about what part of the development process the test was written in support of.

Taking inspiration from Agile Otter, you can think about the test at the meta level. What are you writing the test for? Is the test to help you write the code? Is the test to help you maintain the code? Is the test supposed to validate your assumptions about how users will use your code? Is the test supposed to characterize the performance of the code? Is the test supposed to help you understand how your code fails or what happens when some other part of the system fails? The reason you’re writing the test, the requirements you have for the test, help you write the test. It helps you know how to write the test and what a successful result looks like.

Industrial Logic slide. Title: So, only write fast automated tests, right? Body: No. Only fast, automated microtests will support refactoring. Other tests support system correctness, release-worthiness, etc.

Why we write tests.

A set of tests written to validate the internal functionality of a class, with knowledge of how the class operates has very different characteristics than a test written to validate the error handling when 25% of the network traffic times out or is otherwise lost. Tests written to validate the public interface of that class also look and feel different. They all have different runtime characteristics as well and are expected to be run at different times.

Knowing, understanding, and honoring those differences is key to writing good tests. Because the tests need to not only be correct in the sense that they test what they say they do by their name, but that the results of the test also accrue to the meta goal for the test. Integration and system level performance tests are great for testing how things will work as a whole, but they’re terrible for making sure you cover all of the possible branches in a utility class. Crafting a specific input to the entire system and then expecting that you can control exactly which branches get executed through a call chain of multiple microservices and database transactions is not going to work. You need unit tests and class functionality tests for that. The same thing if you need to do some refactoring. Testing a low level refactor by exercising the system is unlikely to test all of the cases and will probably take too long. On the other hand, if you have good unit tests for the public interface of a class, you can refactor to your heart’s content and feel confident that the system level results will be the same.
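As a minimal sketch of that last point, here are two versions of a small class with the same public interface and one check that only uses that interface. The class names `MeanV1` and `MeanV2` and the helper `check_public_interface` are hypothetical, invented for this example. The check knows nothing about how either version stores its data, so it keeps passing across the refactor.

```python
class MeanV1:
    """Original implementation: keeps every value it has seen."""

    def __init__(self):
        self._values = []

    def add(self, x):
        self._values.append(x)

    def mean(self):
        return sum(self._values) / len(self._values)


class MeanV2:
    """Refactored implementation: keeps only a running total and a count."""

    def __init__(self):
        self._total = 0.0
        self._count = 0

    def add(self, x):
        self._total += x
        self._count += 1

    def mean(self):
        return self._total / self._count


def check_public_interface(cls):
    """Exercise only add() and mean(); no knowledge of the internals."""
    m = cls()
    for x in (2, 4, 6):
        m.add(x)
    return m.mean()


# The same check passes before and after the refactor.
for implementation in (MeanV1, MeanV2):
    assert check_public_interface(implementation) == 4.0
print("both implementations satisfy the public-interface check")
```

A test that reached into `_values` directly would have broken on the refactor even though nothing a caller can observe had changed.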

Conversely, testing system performance by measuring the time taken in a specific function is unlikely to help you. Unless you already know that most of the total time is taken in that function, assuming you know anything about system perf from looking at a single function is fooling yourself. Even if it’s a long-running function. Say you do some work on a transaction and take it from 100ms to 10ms. A 90% reduction in time. Sounds great. But if you only measure that execution time you don’t know the whole story. If the transaction only takes place 0.1% of the time, and part of that workflow involves getting a user’s response to a notification, saving 90ms is probably never going to be noticed.
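The arithmetic behind that example can be worked through in a few lines. The 100ms, 10ms, and 0.1% figures come from the paragraph above. The 5 second user response time is purely an assumption added for illustration, since the text gives no number for it.

```python
import math

before_ms = 100.0
after_ms = 10.0
frequency = 0.001  # the transaction takes place 0.1% of the time

saved_ms = before_ms - after_ms
reduction = saved_ms / before_ms

# Averaged over all requests, the saving is tiny.
avg_saved_per_request_ms = saved_ms * frequency

# Assumption for illustration only: the user takes 5 seconds to respond
# to the notification that is part of this workflow.
user_response_ms = 5000.0
workflow_before_ms = user_response_ms + before_ms
workflow_saving = saved_ms / workflow_before_ms

print(f"function-level reduction: {reduction:.0%}")
print(f"average saving per request: {avg_saved_per_request_ms:.2f} ms")
print(f"saving within the affected workflow: {workflow_saving:.1%}")
```

Under those assumptions the function gets 90% faster, but the average request is only about 0.09ms faster, and the workflow that actually uses the transaction improves by less than 2%.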

So when you’re writing tests, don’t just remember what one thing you’re testing, also keep in mind why you’re testing it.

by Leon Rosenshein

Release Stabilization

Release stabilization is traditionally the period at the end of a development cycle where the team minimizes change and spends time fixing things and making sure the result is stable. That sounds like a good thing, doesn’t it? And compared to its most obvious opposite, release de-stabilization, it’s definitely a good thing. Before you release any software, executable, library, website, whatever, you want to know that it’s good and that it’s stable and resilient to whatever the real world will throw at it. When thought of that way, it’s something we should always do.

On the other hand, there’s a different opposite state that’s implied by that term. A term that makes me think really hard about how we develop software in general. That other state is Development Instability. If we need to take some time at the end to make the software work, what the hell were we doing all the time before that? Working on job security? Of course not. But …

“Release stabilization?” I don’t understand. Why did you choose to make it unstable? In what world does that make sense?     – Kent Beck

I know at least one reason. The drive to get things done. Or at least call things done. And be able to close the Jira ticket. Because often, in the short term, that’s how we’re measured. And as Eli Goldratt said,

Tell me how you will measure me, and then I will tell you how I will behave. If you measure me in an illogical way, don’t complain about illogical behavior.

That doesn’t make it the right thing to do though. It’s a classic local maximum issue. The fastest thing I can do right now is the least amount of work needed to be able to close the ticket. Even if that means I leave a bunch of work for tomorrow (the release stabilization period). If it were as simple as delaying some work until later, we’d be OK, and it wouldn’t be a local maximum.

Unfortunately, it’s not just delaying some work. The work we’re delaying isn’t just moved to the end of the project, it also slows down progress for the rest of the project. Every time we delay work, we make things a little slower in the future. If you think of it as technical debt, which isn’t a bad metaphor, eventually the interest becomes too high and you go bankrupt.

You can hit a local maximum the other way too. You can spend too much time making some small thing perfect. You think by doing that you’ve made it so you don’t need the time at the end, and don’t have the problem of slowing yourself down in the future. It’s a good thought, but again, there’s a problem. You spend all that time making it perfect, given what you know about the rest of the system. Then you learn something new about the system and what was perfect becomes not perfect. So you perfect it again. Then you learn something new. And the same thing happens. It turns out there’s a name for this, Rework Avoidance Theory, and it doesn’t work either. You end up slowing yourself down because you need to keep changing things as you learn more about what you’re doing.

We know these things. We struggle against doing them. Like everything else in software (and life), the right choice at any specific time is dependent on the context. I can’t tell you what you should do in any specific context, but I can tell you that the right way to approach the problem is to be aware of your context. To think about which work to do now based on what you know, and which work to leave for later when you know more.

I can’t be sure, but experience tells me that you should probably be doing a little more work after you think you’re done, not less. There might be some extra conditions/experiments you run when you get closer to the end, but overall, doing more along the way and less stabilization will probably get you to the finish line sooner.

by Leon Rosenshein

Average Practice

I’ve talked about best practices a bunch of times. Generally, in praise of them, with the caveat that context is important. What was the best practice for one team, in one specific situation, might not be the best practice for you in your situation.

Over time, best practices get diluted. As they get described, and implemented, with less and less connection to the original context which they came out of, the less specific and focused they get. They become generic advice that applies everywhere. Also, by the time you hear about them, the team that came up with them has probably changed what they’re doing. Because their situation changed, the practices they follow have also changed to be optimized for that new situation.

Blindly following best practices doesn’t mean you’re doing what the best teams (whatever that actually means) are doing or that you’re going to get the same results. Think about it. Most documentation of best practices is so lacking in specifics that it’s easy to convince yourself that you’re already following them. And if they do have specifics, the specifics are so focused on the original situation that you can’t apply them in yours. What you end up with isn’t the best practice for you. It’s an average practice that is probably a very good idea.

Don’t get me wrong. I’m not saying you should ignore best practices. Instead, you should look closely at them. They may not tell you exactly what to do in order to get the same results as their originator, but they usually provide a very good baseline. Instead of considering them a goal to strive for, think of them as a starting point that is pretty good, but could be adjusted and optimized to provide even better results for your specific situation.

So don’t avoid best practices. The exact opposite in fact. Embrace them. Embrace them so hard you understand what problem they were created to solve. Then look at how they actually contributed to solving the problem, in the original context. Then look at how that context is the same as yours and how they differ. Then, and only then, should you start to think about how to apply a modified version of the best practice in your situation.

Consider the idea of code reviews. At some of the biggest tech firms, Microsoft, Google, Amazon, Meta, etc., there are bespoke tools for managing code reviews and ensuring that (almost) every change is reviewed by the right folks in a timely fashion. These companies all have billion-dollar market caps and huge profits. Therefore, if you want to have that kind of success, you need to build your own bespoke system to do the same, or even better, find the FOSS version of one of those systems and do exactly what they did, right?

Wrong. First, what problem was that system built to solve? What were the incentives and counterincentives in the environment it was built in? Consider the Microsoft Windows team. 1000s of developers, broken into larger and smaller orgs that need to work together. That need to share knowledge. That need to protect themselves from each other’s best intentions. They need to build 1000s of executables and libraries, in concert, quickly, and test them separately and together. The result is the Virtual Build Lab and all of the tooling and infrastructure around it. Microsoft has a large team of people whose only job is to run the system that does builds of Windows. 100s of folks who make sure the right changes go into the right libraries and executables and are then packaged up into some deployable thing that can be run through a set of automated tests, which are built and maintained by an equivalent team. So if you want to build and deploy something as ubiquitous as Windows, you need to do the same thing.

Or not. Do you have 1000s of developers working on the same product? Do you have millions of lines of source code? Do you need to maintain backward compatibility with 20-year-old software and hardware? Can you afford to dedicate 100 people to running your build system? Probably not, so why try?

Instead, look at what core business problems they’re trying to solve. Allow different teams to develop at their own pace. Make sure all teams have access to the latest code. Make sure builds don’t slow everyone down (too much). Decouple teams, but make sure the right people look at the right code before it gets into the system. Once you understand what they were trying to do, decide how much of that you care about.

Maybe it’s decoupling with proper oversight that is what you really want to solve. So solve that problem. At your scale. With the tools you already have at your disposal. Augment those tools as needed, but only with what’s needed. Something that assigns and requires specific reviewers based on the section of the codebase. Or maybe mob/ensemble programming solves the real problem, and you don’t even need to have additional code reviews after the code is written.
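To make that concrete, here is a minimal sketch of "something that assigns and requires specific reviewers based on the section of the codebase." Everything in it is an assumption for illustration: the OWNERS table, the team names, the path patterns, and the required_reviewers function are all hypothetical, and a real setup would more likely lean on whatever ownership feature your existing code host already provides.

```python
from fnmatch import fnmatch

# Hypothetical ownership rules: (path pattern, required reviewers).
# Later rules override earlier ones, so the most specific patterns go last.
OWNERS = [
    ("*", ["core-team"]),
    ("docs/*", ["tech-writers"]),
    ("billing/*", ["payments-team"]),
    ("billing/tax/*", ["payments-team", "finance-review"]),
]


def required_reviewers(changed_files, rules=OWNERS):
    """Return the sorted list of reviewers needed for a set of changed files."""
    reviewers = set()
    for path in changed_files:
        matched = []
        for pattern, owners in rules:
            # fnmatch's "*" also matches "/" so "billing/*" covers subfolders.
            if fnmatch(path, pattern):
                matched = owners  # the last matching rule wins
        reviewers.update(matched)
    return sorted(reviewers)


if __name__ == "__main__":
    print(required_reviewers(["billing/tax/rates.py", "docs/intro.md"]))
```

That is a couple of dozen lines, not a build lab. If it solves the problem you actually have, it might be all the tooling you need.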

Remember, the goal is to solve the business problem and provide value, not to implement what someone else did. So next time someone tells you that you should do what someone else does, make sure you know why they did it, make sure you want to have the same result, then use their best practice as a baseline and starting point for coming up with a best practice that works for you, in your situation.

by Leon Rosenshein

Green Fields And Platforms

I recently had the opportunity to work on a greenfield project. You know, the kind of project where you recognize a common problem a set of users have and you have an idea of how to solve the problem for them. It can be a great space to operate in. You have a green field to build your castle in. Sure, you’ve got the eternal constraints of time and resources, but other than that you’re free to innovate.

You get to choose your architecture. Need lots of extensibility governed by a common control? Maybe use a micro-kernel architecture. Have lots of different components that need to scale differently? Try microservices. You set the requirements and you decide on the architecture.

You get to examine the problem and solution space and figure out what the bounded contexts are. You pick the domains that drive the design. You pick the nouns (objects) and verbs (actions/API). You get to do it in a way that makes sense to you.

I was in that situation. We chose the language. We chose the architecture. We chose the domains. We built a proof of concept that showed us we were on the right path. Things were looking good.

Then we ran headlong into an observation Tolstoy made in Anna Karenina.

All happy families are alike; each unhappy family is unhappy in its own way.

We were trying to build a platform for others to build their solutions on. Like a good platform, one of the things we were doing was hiding complexity. We were trying to make it easier for our potential customers to focus on doing the things that added value and that their customers wanted. We wanted to make it so that they didn’t need to worry about the complexity and details of configuring and providing low latency connections between any two of a few hundred endpoints.

We spent some time talking to potential customers and it turns out that Tolstoy was right. At a high level all of them were doing the same kinds of things. It was easy to see how a single platform could provide a foundation for all of them to build on and be happy. However, when we dug into the details of what they were unhappy about, it wasn’t as uniform.

Sure, they all ran into the same problems, and they all used most of the same tools to approach a solution. Unfortunately for us though, the solution to those common problems wasn’t the same across customers. And the solutions weren’t just different in the details and approach, they were also different in how they integrated with the rest of each customer’s systems.

The domains had different shapes. The bounded contexts had similar names, but the boundaries were different. The sets of verbs they used were generally synonyms of each other, but because the domains and contexts were different, they were incompatible. They were all unhappy with their bespoke solutions, but they were all unhappy in very specific, and different, ways.

Getting rid of that unhappiness still made sense as a solution, but the field wasn’t as green as we thought. Sure, we could have whatever architecture we wanted on the inside, but along the edges we found a whole new set of constraints. We had to (mostly) match existing code and APIs. Much more of a brownfield project than we originally thought.

As a platform we needed to make customers’ lives easier. And one of the big things we needed to do to make their lives easier was make sure the transition from their bespoke solutions to our framework was straightforward and painless. We realized that doing that had just become our biggest problem. In some ways it was a bigger problem than the original problem our platform was supposed to solve. Instead of just needing to hide the underlying complexity, we needed to hide the complexity of building a single API that could support the workflows and patterns of multiple different approaches to the problem.
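Here is a rough sketch of what "a single API that supports multiple approaches" can look like at the edges. It is purely illustrative and not the code we wrote: the Platform class, its connect verb, the two customers, and their open_channel and attach verbs are all made up for the example. The point is only that synonymous verbs with different shapes each need their own thin adapter onto one core.

```python
class Platform:
    """Hypothetical platform core with one canonical verb: connect."""

    def __init__(self):
        self.links = set()

    def connect(self, source, target):
        # Record a directed link from source to target.
        self.links.add((source, target))
        return (source, target)


class CustomerAAdapter:
    """Hypothetical customer A 'opens a channel' from one node to another."""

    def __init__(self, platform):
        self._platform = platform

    def open_channel(self, from_node, to_node):
        return self._platform.connect(from_node, to_node)


class CustomerBAdapter:
    """Hypothetical customer B 'attaches' a consumer to a producer.

    Same idea as customer A, but the argument order is reversed,
    so the adapter has to swap them before calling the core.
    """

    def __init__(self, platform):
        self._platform = platform

    def attach(self, consumer, producer):
        return self._platform.connect(producer, consumer)


if __name__ == "__main__":
    platform = Platform()
    CustomerAAdapter(platform).open_channel("sensor", "logger")
    CustomerBAdapter(platform).attach("logger", "sensor")
    print(platform.links)
```

Two adapters is easy. The trouble starts when every customer needs one, and each one encodes a slightly different idea of where the boundaries are.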

To solve the problem, we kept reducing the scope of the initial version. Instead of a full-featured platform that could drop in, we focused on a very small subset of the problem space. Something we knew was important enough that all of our customers would want the change. Something we knew was separable enough that our customers could accept the change without having to do too much architectural surgery on their existing code. Something that would be the thin edge of the wedge, letting us insert our platform into our customers’ codebases and then expand out from there.

So next time someone tries to convince you that you’re going to be working on a greenfield project, remember that even the greenest of fields have edges, and the shape and context of those edges will turn what you think is a greenfield project into a brownfield project.

That doesn’t mean the idea is a bad one or that you should run away from it. On the contrary, it can be a great opportunity to learn a new domain, add lots of customer value, and have a chance to really stretch your design muscle. Just don’t forget the things on the edges, because that’s where the constraints and headaches come from.