Recent Posts

by Leon Rosenshein

Lumpers and Splitters

Are you a lumper or a splitter? I like to think of myself as a splitter, finding boundaries and cleaning up the architecture as I go, but I’m not sure that’s always true.

Because to be a splitter, you need a deep(ish) understanding of the problem. And you can’t have a deep understanding of the problem, the solution, and its internal boundaries until you’ve lived with both the problem and a solution for a while. Instead, the best you can do is put things that seem to go together, or at least get used together, in one place so you can find them next time you need them. That’s called v1, or the MVP.

Ideally you’ve done a good enough job on v1 and learned enough that it makes sense to continue. So you add to it. And as a new, successful product, you have some momentum and good will. You want to take advantage of that, so you make the small additions you need and put things where they seem to fit best. You still haven’t lived with it, so right next to that other similar thing seems like a good idea.

Lather, Rinse, Repeat. Suddenly you find yourself struggling to make the next change. Things aren’t fitting together well, and now that you’ve lived with it for a while, you recognize that the internal boundaries of your solution aren’t quite right. That’s OK. You’ve not only figured out the problem, you’ve got a good idea of what a better solution looks like.

So you dive in. Breaking things along clear boundaries. Tightening the bounded contexts and firming up the APIs between them. You find it’s much easier to make changes again. You’re a splitter, and you feel good about the code. Out of curiosity you check history to see who lumped it together that way. And it turns out that the lumper was you.

Lather, Rinse, Repeat

by Leon Rosenshein

Functional Options

I’ve written about the Builder pattern before. It’s a way to give folks a set of default options for a constructor along with a way to set all of the options, and then validate things before actually creating the object. It’s nice because it lets you set all the parameters, in whatever order you see fit, and only validate them at the end. This can be important if setting parameters one at a time can put things in an invalid state temporarily.
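As a refresher, a minimal sketch of the pattern might look like this. ThingBuilder, its fields, and the validation rule are all illustrative, not from any particular library:

// A minimal Builder sketch, assuming the standard errors package.
type Thing struct {
    Name  string
    Value int
}

type ThingBuilder struct {
    name  string
    value int
}

// NewThingBuilder starts from sensible defaults.
func NewThingBuilder() *ThingBuilder {
    return &ThingBuilder{name: "default", value: 1}
}

func (b *ThingBuilder) WithName(name string) *ThingBuilder {
    b.name = name
    return b
}

func (b *ThingBuilder) WithValue(value int) *ThingBuilder {
    b.value = value
    return b
}

// Build validates once, at the end, after all of the parameters are set.
func (b *ThingBuilder) Build() (*Thing, error) {
    if b.value < 0 {
        return nil, errors.New("value must be non-negative")
    }
    return &Thing{Name: b.name, Value: b.value}, nil
}

At the call site that reads as thing, err := NewThingBuilder().WithName("example").WithValue(42).Build(), and nothing is created until Build says the whole combination is valid.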

On the other hand, it requires that the initial developer come up with all of the possible combinations of builder parameters. And that might not include the way one of your customers wants to use it. So what can you do in that case?

One variation is the Functional Options pattern. It relies on a variadic “constructor” function. Something along the lines of 

func MakeThing(required1 string, required2 int, options ...func(*Thing)) (*Thing, error)

Then, the user provides one or more functions that take a *Thing and do whatever magic they need to modify it. Those functions might take arguments. They might do validation. They can do anything the language allows. That’s pretty extensible.
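For example, an option that takes an argument is usually written as a function that returns the actual option closure. This one is hypothetical, and it assumes Thing has a Description field:

// WithDescription is an illustrative option. The outer function captures
// the argument; the returned closure does the actual modification.
func WithDescription(desc string) func(*Thing) {
    return func(t *Thing) {
        t.Description = desc
    }
}

// At the call site:
// thing, err := MakeThing("name", 42, WithDescription("built with an option"))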

Then, inside MakeThing, you add a loop that calls all of those functions, which modify the thing as desired.

func MakeThing(required1 string, required2 int, options ...func(*Thing)) (*Thing, error) {
    thing := &Thing{
        Name:  required1,
        Value: required2,
    }

    for _, opt := range options {
        opt(thing)
    }

    return thing, nil
}

That gives the user all the control. There are two things I don’t like about it, though. The first is that there’s no final validation at the end. I have yet to see an example or tutorial that includes one. It’s trivial to do, and I’d certainly add one.
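A sketch of what that might look like at the end of MakeThing, assuming a hypothetical validate method on Thing (my addition, not part of the usual examples):

    for _, opt := range options {
        opt(thing)
    }

    // Validate the final state once, after all of the options have run.
    if err := thing.validate(); err != nil {
        return nil, fmt.Errorf("invalid Thing: %w", err)
    }

    return thing, nil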

The other is a bigger issue. The functional options pattern requires your users to have full knowledge of the object’s internals, while the Builder pattern hides those details. If you’re making a public API you probably want to hide those details, and the validator becomes crucial.

Should you use it? Well, it depends, but it’s an option to consider.

by Leon Rosenshein

The Way

I recently stumbled back across an article titled Ron Jeffries Says Developers Should Abandon "Agile". And it’s strictly true. Jeffries did say that. Unfortunately, it’s not the whole story. The whole story is much more nuanced, and won’t fit in a headline.

What he said was that many organizations are imposing processes and systems with “Agile” in their name. Those systems use many of the same words and descriptions as the original Agile Manifesto. And there might even be some short-term benefits to the organization, but long term, especially for the developers, they make things worse. He calls this Faux Agile or Dark Agile and says those systems should be abandoned.

Which leads me to another article that upset me the other day: 5 Things That Killed Software Development for Me. Again, it’s this person’s lived experience, and therefore true. But is that really the story? I think the story behind the story is really about forgetting my favorite question: What are you really trying to do here, and why? Because what upset me about the article is that there’s no attempt to understand the why, or to really achieve those goals.

It’s done because This is the way. And the way is all that matters. Or is it? Isn’t the result the important part? As the Mandalorian learns, there is the way, but the way is there for a reason, and the reason is what’s important, not just keeping your helmet on.

So too with Agile:

  • Individuals and interactions over processes and tools -- Scrum rituals are an outcome, not the goal
  • Working software over comprehensive documentation -- Add value with incremental change
  • Customer collaboration over contract negotiation -- Add value together, not as adversaries
  • Responding to change over following a plan -- Start with a plan, then adjust as details become clear

by Leon Rosenshein

Optimizing Tests

Like most things in engineering, the answer to the question “Is this a good test?” is it depends. Tests have properties, and the different kinds of tests make different tradeoffs between those properties.

While there are a lot of different properties a test could have, some of the most important are:

Deterministic: A good test gives the same result every time. Which usually means it has no external dependencies.

Fast: Tests should be fast. How fast? It depends. Programmer/Unit tests should be single digit seconds. Integration/acceptance tests should be single digit minutes. Bake/Soak tests might take hours or days. And of course this is for individual tests. The sum total of all tests of a given type, even with parallelization, might take longer.

Independent: Your tests shouldn’t have side effects. You should be able to run them in any order and get the same results. Adding or removing a test shouldn’t change any other results. This means that you need a good way to set up all the preconditions for a test.

Specific: If a test fails, knowing which test failed and how should be enough to isolate the problem to a handful of methods. If a single test covers generating a value, storing it, and retrieving it, a failure doesn’t tell you which part broke, and you have to examine the entire system to understand why. Much better to have tests for each part so you know where the problem is when a test fails.

Two-Sided: Of course you want to test that valid inputs give the correct results. But you also want to test that invalid inputs give the expected failure.
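As a sketch, assuming a hypothetical ParseAge function that returns an error for out-of-range input, a two-sided pair of Go tests might look like:

func TestParseAge_ValidInput(t *testing.T) {
    age, err := ParseAge("42")
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if age != 42 {
        t.Errorf("ParseAge(\"42\") = %d, want 42", age)
    }
}

func TestParseAge_InvalidInput(t *testing.T) {
    // The failure side: bad input should produce an error, not a value.
    if _, err := ParseAge("-1"); err == nil {
        t.Error("ParseAge(\"-1\") should have returned an error")
    }
}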

Uncoupled: Tests shouldn’t be concerned with the implementation of the solution. Ideally you would mock out an entire system and have it be functional and inspectable. We’ve done that with the in-memory file system we use for testing on the infra team. We can preload the system, read/write arbitrary things, and then see what happened. On the other hand, for some things, like network calls, our mocking system looks for certain patterns and responds in certain ways. Not ideal, but a compromise. And avoid a mocking system that just returns a canned set of responses in a specific order. That’s both brittle and not representative of the thing you’re mocking.
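As an illustrative sketch (the names are mine, not our actual infra code), the key enabler is depending on an interface rather than a concrete implementation, with an in-memory version for tests:

// FileStore is what the code under test depends on. Production code gets
// a real implementation; tests get an inspectable in-memory one.
type FileStore interface {
    Read(path string) ([]byte, error)
    Write(path string, data []byte) error
}

// memStore is the in-memory test implementation. Preload files before the
// test runs, then inspect files afterward to see what happened.
type memStore struct {
    files map[string][]byte
}

func (m *memStore) Read(path string) ([]byte, error) {
    data, ok := m.files[path]
    if !ok {
        return nil, fmt.Errorf("%s: file not found", path)
    }
    return data, nil
}

func (m *memStore) Write(path string, data []byte) error {
    m.files[path] = data
    return nil
}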

Finally, going back to the classes of tests and their different tradeoffs: Unit tests are run frequently, so you might trade off testing how things work together for speed. On the other hand, an integration test might have a more involved setup so you can test the interaction between components.

So what’s important to you when writing tests?

by Leon Rosenshein

Oracles

One of the things I was taught about driving, particularly night driving, way back when, was that you should never drive faster than you could see. Taken literally, that either makes no sense, or says never to drive faster than the speed of light. But it does make sense. It means you should make sure you always have time to avoid any obstacle you suddenly see, or at least that you can stop before you hit it.

Simple enough, but there’s more to it than making sure you can identify something further away than your braking distance. Especially if you define braking distance as how far you go before you stop after the brakes are fully applied on the track under ideal conditions. It depends on things like road conditions, brake temperature, gross vehicle weight, lag in the system, and tire condition. And then there’s the person in the loop. How long does it take you to notice that something is in the way? Then decide that you need to stop? Then actually mash the brakes? Depending on your speed, that can take longer than actually stopping.

The same kind of thing applies to automatic scaling as well. You want to scale up before you run out of whatever you’re scaling so your customers don’t notice, but you don’t want to scale up too much or too soon and pay for unused resources just because you might need them. The same goes when you scale down. You don’t want to hold on to resources too long, but you also don’t want to give them up only to need them again 2 seconds later. Which means your auto-scalers have to be Oracles.
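A minimal sketch of what lag-aware scale-up logic might look like. Every name and number here is an illustrative assumption, not a real autoscaler API:

// shouldScaleUp decides whether to request capacity now, based on where
// the load will be by the time new capacity actually arrives.
func shouldScaleUp(currentLoad, loadPerMinute, capacity, provisioningLagMinutes float64) bool {
    // Project the load out to when newly requested capacity would be ready.
    projected := currentLoad + loadPerMinute*provisioningLagMinutes

    // Scale up before the projected load eats into the safety headroom.
    const targetUtilization = 0.8 // illustrative: aim for at most 80% utilization
    return projected > capacity*targetUtilization
}

The interesting part is the prediction: the decision is made against the projected load at arrival time, not the load right now.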

Sure, you could be greedy and just grab the most you might ever need, but while that might work for you, it’s pretty hard on everyone else, and if you’re in some kind of multi-tenant system it won’t work if everyone is greedy all the time.

One place we see this a lot is with AWS and scaling our processing. Not only do we need to predict future load, but we also need to take into account the fact that scaling up the system can take 10 minutes or more. That means if we wait too long to scale up, our customers ping us and ask why their jobs aren’t making progress, and if we don’t scale down, we get asked why utilization is so low and whether we can do more with less. To make matters worse, because we run multiple clusters in multiple accounts in someone else’s multi-tenant system, we’ve seen situations where one cluster has cannibalized another for resources. There are ways around it, but it’s non-trivial.

Bottom line, prediction is hard, but the benefits are many. And the key is understanding what the driving variables are.

by Leon Rosenshein

Containment

I’ve talked about failure modes before, but there was an incident last weekend that reminded me of them again. United 328, leaving Denver for Honolulu, had a failure. A pretty catastrophic one. Most likely it lost a fan blade, which took out the rest of the engine. But as scary as it was for the folks on the plane and on the ground under it, no one got hurt. That’s because a group of engineers thought about the safety case and the possible failure modes and planned for them.

And that meant that the aircraft had systems built in to minimize the impact and risk of rapid, unplanned, disassembly. Things like a containment system for the flying fan blades, automated fuel cutoff and fire extinguishers, and sufficient thrust from the other engine to keep making forward progress. Not just mechanical systems, but also operational systems like cockpit resource management systems, ATC procedures, and cabin crew plans. All of which worked together to get the plane safely back on the ground.

What makes this even more impressive is that all of those mechanical safety systems are heavy, and one thing you try to eliminate on planes is weight, because every pound of the aircraft’s total weight is a pound of revenue-generating payload you can’t carry. That’s why the seats are so uncomfortable. And all those processes and training cost time and money, but they do them anyway. Because it’s just that important.

The same thing applies to software in general, not just robot software. What are the potential failure modes? What can you do to make sure you don’t make things worse? What do you need to do to ensure the best possible outcome when something goes wrong?

We can’t ensure 100% safety, but we need to do everything we can to minimize risk before something goes wrong and then do everything we can to get to the best result possible if it does.

by Leon Rosenshein

MBWA vs Gemba

Management By Walking Around (MBWA) is a management style where a manager walks around to the various people/workstations, seeing what’s going on, talking to people, observing and directing as they feel appropriate.

Gemba is Japanese for “the actual place” and is used in the Lean Methodology to represent the idea of managers periodically “walking the line”, or going to the workplace to see how things are going, listen to people, and observe what’s happening to better understand value creation and identify waste.

At first glance, MBWA and Gemba seem like the same thing. The manager leaves their office, goes to where the work is happening, looks around, and makes some changes. Sounds equivalent, no?

Actually, no. Because while the physical activity, walking around, is the same, the motivation and the process are pretty different. MBWA is spot-checking the people. Seeing what they’re doing, understanding what their concerns and impediments are, and making sure there’s good alignment on tasks and goals.

Gemba, on the other hand, is about checking on the value stream. Seeing what directly contributes to it and what adds friction and delay, then figuring out how to maximize the former and minimize the latter. And it often doesn’t happen during the Gemba walk. Instead, the walk is used to gather information. Asking questions and listening to answers. Getting the big picture and filling in the details. Then, later, with thoughtful deliberation, making decisions and acting on them.

Both have their place. Gemba focuses on the value stream, which is what we’re here for. Adding customer value. MBWA, on the other hand, focuses on the people. Since people are the ones adding value, we need to focus on them.

We need to be thoughtful and make sure both are getting done, at the right times. And, since we’re all WfH, do it without actually walking.

by Leon Rosenshein

Discoverability vs. Findability

First things first: discoverability and findability are not the same thing. Findability is relatively easy. You know something exists. You saw it once or someone told you about it. You know what you’re looking for, but you don’t know where it is. Like your car keys or that status dashboard. Discoverability, on the other hand, is about learning. You can only discover something once. The second time you see something you haven’t discovered it, you’ve found it. And that’s the key difference.

And that gets to the heart of the issue. Findability is important for the things you know you don’t know. You have a question and you want an answer. What’s the difference between merge sort and bubble sort? When is it better to use a micro-kernel based architecture instead of microservices? Search engines are great for that kind of thing. Put in your keywords and get the answer.

Discoverability, on the other hand, is about things you don’t know that you don’t know. And search engines are notoriously bad at that. Sure, you might get a better spelling or a search result might show you something related, but if you’re trying to find out how to use your microkernel architecture to process millions of queries per second you’re unlikely to find anything about how a distributed microservice might be a better choice. If you know a little bit more about the problem space you can use a search engine, but it’s harder. You need to change your thinking and search for architectural patterns, and then dig in from there. And that’s if you know the domain.

Or consider your IDE. Both VSCode and the various JetBrains IDEs do a good job of making it easy to find the functionality you’re looking for, with hierarchical menus, context menus, and a search mechanism. They also make it easy to discover keyboard shortcuts and related commands/options by advertising them and grouping things. Vim, on the other hand, has an OK search engine, but if you don’t know what you’re looking for it’s almost impossible to discover anything.

So why should you care? You should care because it applies not just to IDEs and search engines, but also to libraries, APIs, and pub-sub systems. We talk to our customers/users a lot and understand what they want/need. We use that information to build the things that provide the most value to them. If there were more valuable things to build we’d build them instead. But unless our customers/users know about what we’ve done and use it, we’ve wasted our time. Sure, you could assume that since they asked for it 3 months ago they’ll notice when it suddenly appears, but really, they’re busy too and probably won’t. So make it not just findable, but discoverable. How you make something discoverable is a different topic for later.

by Leon Rosenshein

I've got you covered

Code coverage is important. If you haven’t exercised your code then you don’t know if it works. If you don’t know it works, how can you be done?

So code coverage is important. If all your code is covered by your unit tests you can feel pretty confident that your code does what you think it does. Test Driven Development (TDD) is a good way to approach it. Figure out what you want your code to do. Write the tests that validate it does those things. Then write just enough code to make sure all the tests pass. Do that well and you’ve got 100% code coverage.

And that means your code is perfect and you can turn off your alerting, right? Wrong. For one thing, 100% code coverage and passing tests means that all of your code is executed by the tests. It doesn’t say anything about the correctness of the code. Consider this bit of Go:

func area(length int, width int) int {
    return length * length // the bug is intentional: this should be length * width
}

func TestArea(t *testing.T) {
    length, width := 2, 2 // 2 and 2 hide the bug, since 2*2 == 2*2
    got := area(length, width)
    if got != length*width {
        t.Errorf("area(%d, %d) = %d, want %d", length, width, got, length*width)
    }
}

100% coverage, all tests pass, and it’s wrong. So code coverage isn’t everything.

But you could argue that in this case coverage isn’t the problem, it’s that the test is wrong. You’d be correct, but writing tests to check the tests has no end.

Then there’s the issue of side effects. Everything does what you expect in isolation, but does it do so in concert? Do you have enough tests to cover the side effects? Your method to add to the global list does so, but is there another bit of code that has cached a copy of the global list and is now going to do the wrong thing? In this case each of the tests is correct, but there’s an interaction you haven’t tested.
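A Go-flavored sketch of that interaction (the names are illustrative, and it assumes the standard fmt package): appending to a slice can reallocate its backing array, so a cached copy of the slice header silently stops reflecting updates.

var globalList = []string{"a", "b"}

// addToGlobalList passes its own unit test in isolation.
func addToGlobalList(s string) {
    globalList = append(globalList, s) // may reallocate the backing array
}

func main() {
    cached := globalList // another component caches a copy of the slice header
    addToGlobalList("c")

    fmt.Println(len(globalList)) // 3
    fmt.Println(len(cached))     // still 2: the cached copy never sees "c"
}

Each piece passes its own tests; only a test that exercises both together catches the stale cache.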

Then there’s the whole world of integration. Each piece of code does exactly what it should. No less, and no more. But when you put them together you get some kind of non-linear response or positive feedback loop.

All of which is to say, code coverage is important. You can’t be sure you’re right without it. But don’t confuse a necessary component with a sufficient one.