Recent Posts (page 27 / 67)

by Leon Rosenshein

Best Practices?

Design patterns are best practices. Best practices are, by definition, the best way to do things. It’s right there in the name. Therefore we should use design patterns for everything. Taken to its logical conclusion, you end up turning the interview FizzBuzz question into the Fizz Buzz Enterprise Edition. That’s perfect, right? It’s got, among other things, frameworks, abstractions, factories, strategies, Travis for CI, and it’s even got an implementation. Which means if you need a scalable, enterprise-ready version of FizzBuzz, you’re set. End of story.

Or not. There are lots of design patterns there. But simply using a pattern doesn’t make it a good thing. After all, you can always add another level of abstraction.
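
For contrast, here’s roughly all the code the problem actually calls for. A minimal sketch in Go, with no frameworks, factories, or strategies:

package main

import "fmt"

// fizzBuzz returns the FizzBuzz output for a single number.
func fizzBuzz(n int) string {
    switch {
    case n%15 == 0:
        return "FizzBuzz"
    case n%3 == 0:
        return "Fizz"
    case n%5 == 0:
        return "Buzz"
    default:
        return fmt.Sprint(n)
    }
}

func main() {
    for i := 1; i <= 100; i++ {
        fmt.Println(fizzBuzz(i))
    }
}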

More seriously, all of those things are important. When used at the right time. You could use a Builder to set up a State Machine every time you need to get the current state, but is that the right thing? Probably not. 

There are two times when those best practices are almost certainly wrong. At the very start of a project, and when you’ve pushed your project into areas where most folks haven’t gone. Because that’s not what the Best Practices are best at.

When we did the O’Reilly kata last year we almost certainly over-designed it. We had microservices, an authentication/authorization module, and provisions for adding a machine-learning component. For a customer with 100s of customers at 10s of locations in one city. In reality, you probably could have managed it with a spreadsheet, a few web connections, and some Excel macros. The winning architecture was a monolith with some clear internal separation for if/when more scale was needed.

At the other end of the spectrum is getting the most performance out of a system. Which for me was epitomized by the Apple ][ hi-res graphics, with its weird modes and loading order, because it saved a chip in the design and was cheaper/faster. There’s no Best Practice that will tell you that it’s a good idea to load things like that. It’s just something needed in that particular time/place. The StackOverflow folks have a good description of how they traded best practices for scale/speed, and how now, with more power and different goals, they’re changing some things.

Which is to say that best practices are best practices when the situation you’re applying them to matches the conditions they were developed for.

by Leon Rosenshein

Leading Vs. Trailing

Indicators and metrics, that is. Or maybe we should call them drivers and outcomes. Leading indicators are predictors. Done right, they help you know what’s going to happen in the future. Trailing indicators, on the other hand, are all about what happened. Consider a car without a fuel gauge. From experience you know you’ve got 6 hours of fuel. After 5 hours driving down the highway you can predict that you’re low on fuel. Hours driven is a leading indicator. But since you’re on vacation and the car has a lot of luggage in it, after 5 ½ hours the engine sputters and dies. You’ve run out of fuel. RPM going to zero is a trailing indicator.

Let’s start with a reality. Anything you measure has happened. In that respect all metrics are trailing indicators. The hours you drove that car are a measure of what happened. On the other hand, you can also make predictions based on any metric, so they’re also leading indicators. If your car engine’s RPMs stay high you probably haven’t run out of fuel, so you can make a short-term prediction.

It’s really about what you want and what you need to do to achieve it. Your key performance indicators (KPIs) tell you if you’re achieving the desired results. Meet them and you are. Don’t meet them and you’re not. But past performance is not a guarantee of future results. Failing to meet your KPIs won’t tell you what you need to do to fix the problem. That’s just the way trailing indicators are. When you’ve run out of gas, you don’t know why. Maybe you were going faster than usual. Maybe your watch is slow. Maybe you’ve got a rooftop cargo box on your car now. Maybe your fuel tank has a leak.

In this case, hours driven (or hours remaining) is one way to predict if you’ll meet the KPI (have enough fuel). But it’s not enough. Or accurate enough. A better one is fuel burn rate. Better if you combine it with distance traveled. Average miles per gallon is a pretty good leading indicator of how many more miles you can drive. But it’s not perfect. You might have a leak. Or the conditions you’ve averaged over are different enough from your current situation. So you want trailing indicators on your leading indicators to improve confidence.
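
As a toy sketch of that idea (all the numbers here are invented for illustration), you can turn the leading indicator into a prediction, and then use the trailing result to decide how much to trust it:

package main

import "fmt"

// predictRange uses a leading indicator: average miles per gallon so far,
// times the fuel remaining, gives an estimate of how far you can still drive.
func predictRange(milesDriven, gallonsUsed, gallonsRemaining float64) float64 {
    avgMPG := milesDriven / gallonsUsed
    return avgMPG * gallonsRemaining
}

func main() {
    // Hypothetical trip: 300 miles on 10 gallons, 4 gallons left in the tank.
    predicted := predictRange(300, 10, 4)
    fmt.Printf("predicted remaining range: %.0f miles\n", predicted)

    // The trailing indicator: how far you actually got before the sputter.
    // Comparing it to the prediction tells you how good the predictor is.
    actual := 100.0
    fmt.Printf("prediction error: %.0f miles\n", predicted-actual)
}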

Obviously this kind of thing has a lot to do with writing code to predict things, but what has it got to do with development in general? Consider the development process. A couple of the big things we measure and report on are time for a build and time to fix a broken build. Critical things to keep low. But they’re trailing indicators, and as such are great for telling us there’s a problem, but not so good at telling us what to change. For that we need to measure a bunch of other things. Like time from diff creation to approval. Number of changes to a diff before approval. Time from diff submission to test start. The number of tests run. The number of tests that can be run at once. The reliability of the tests. The time that it takes to run all the tests.

There’s no silver bullet in that list. No one thing that will always make it better. What it points to is that you need to understand the drivers of the system (leading indicators) before you can get the results (trailing indicators) you want.

So what’s driving the result you’re looking for?

by Leon Rosenshein

Dogma

Dogma (n): a principle or set of principles laid down by an authority as incontrovertibly true.

        – Oxford Languages

That’s one definition of dogma. The problem I have with dogma is the incontrovertible part. Because even if something was absolutely true once, that doesn’t mean it’s true now. Or will be in the future.

And what happens when your dogma disagrees with itself? It’s not usually existence-ending, but it can be confusing. Consider this bit of code:

package main

import (
    "fmt"
)

const TWO = 2

func scaleList(numbers []int, scaleFactor int) []int {
    scaled := make([]int, len(numbers))
    for i, e := range numbers {
        scaled[i] = e * scaleFactor
    }
    return scaled
}

func doubleList(numbers []int) []int {
    return scaleList(numbers, TWO)
}

func main() {
    n1 := []int{1, 2, 3, 4}
    d := doubleList(n1)
    fmt.Println(d)
    n2 := []int{4, 2, 1, 3}
    fmt.Println(doubleList(n2))
    fmt.Println(doubleList(d))
}

It’s DRY. There’s no repetition of code. There are no magic hard-coded constants in functions. Functions do what they say they do. But is it YAGNI? KISS? Not really. Of course it’s a contrived example, but there’s no need for the scaleList function. There’s barely a need for doubleList, and a constant called TWO with a value of 2 is just added cognitive load.
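
For comparison, here’s one simpler take on the same program. A sketch, not the one true answer, but it’s still DRY where it counts and carries a lot less ceremony:

package main

import "fmt"

// doubleList returns a new slice with every element doubled.
// No scale factor parameter, no named constant; the 2 is obvious in context.
func doubleList(numbers []int) []int {
    doubled := make([]int, len(numbers))
    for i, e := range numbers {
        doubled[i] = e * 2
    }
    return doubled
}

func main() {
    n1 := []int{1, 2, 3, 4}
    d := doubleList(n1)
    fmt.Println(d)
    fmt.Println(doubleList([]int{4, 2, 1, 3}))
    fmt.Println(doubleList(d))
}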

Which is not to say that DRY, KISS, and YAGNI can’t live together. They can and they should. But it’s a balance. And the only bit of dogma I have is “It Depends.”

by Leon Rosenshein

Flow vs. Utilization

What should you optimize, utilization (how busy everyone is) or flow (how tasks are moving through the system)? How do you measure those things? What does it even mean if they’re high or low?

What does utilization mean for a developer? What does it include? Of course, time spent typing in code counts. And time spent writing tests. But what about time building/running code and tests? Or time documenting the design decisions made? And the time spent making those decisions? Those should probably count. What about time spent tracking the time spent? Or time planning what work to do? Is time spent helping a co-worker understand something utilization? What about training, or research? All important things, so how do you count them?

What does it mean to optimize flow? Is it as simple as increasing the number of Jira task state changes? Or maybe just increasing state changes to the right? Probably not because all Jira state changes aren’t equal. Is popping into existence a state change? Can you optimize development flow like an assembly line?

Or should we be optimizing something else? Like maybe value? Value as in customer value. Whoever your customer is, internal or external. And that value doesn’t exist until it’s delivered. Because working on your machine, or in staging, or some other test harness isn’t value in your customer’s hands.

Which might push you towards optimizing flow. After all, getting a task done adds value, right? Sometimes, but not always. Because, for tracking and dependency reasons, not every task in Jira adds value. In fact, most of them don’t. They’re necessary, but not sufficient. Until all of the required tasks are done, we can’t do the one task that adds value. You could spend a year doing precursor tasks and not add any value. So optimizing flow doesn’t seem right.

So how do you optimize value? First, figure out what value means. What use cases and user stories add the most value? Add new benefits to the user’s workflows? Then figure out which tasks are really required to provide those benefits. Once you know which tasks, focus on those specific tasks, not just any task. Look for bottlenecks and work to reduce them. Concentrate utilization in those areas. Think broadly. If you need domain expertise, get it. Ideally on the team, with the team if needed. If you need broad agreement between customers and suppliers, get that. Make sure everyone involved knows what you’re doing and why.

Or, to put it in the terms we started with, keep individual utilization high on the flow of tasks which roll up to adding value.

by Leon Rosenshein

We Like Ike

Dwight David Eisenhower. 34th president of the United States. Supreme Commander of the Allied Forces in Europe. Brought the term military-industrial complex into the vocabulary. All in all, a pretty busy guy for much of his career. He did a lot of things and made a lot of decisions. And regardless of whether you agree with those things or his decisions, one of the other things he gave us was a framework for deciding what needs to get done when.

It’s not perfect, and of course, like all decision frameworks, there’s some judgement needed, but what it does have is simplicity. Just take each problem/task/decision and map it on two axes. Urgency and Importance. Because, “What is important is seldom urgent and what is urgent is seldom important.”

Urgency is about timeliness and deadlines. Given the delta-V of your rocket, there’s a launch window in 24 hours. Use it or not? That window will close whether you launch or not. The next opportunity will be in a few days/weeks/months/years, but there will be one. In this case, to quote Geddy Lee, “If you choose not to decide, you still have made a choice.”

Importance, on the other hand, is about the long term impact of the thing, and how much time/effort/money it will take to change later. Getting the hardware design right before you make a million of something is important. Back when software was delivered in a box, making sure the gold master you sent to the duplicator was right was about as important a choice as you could make.

So once you’ve got the two axes, plot them out.

Importance and Urgency matrix

The highest priority things are up and to the right. Urgent and Important. Do those first. And make sure they’re done. These are the crises. If you don’t at least mitigate them something really bad will happen.

Next are the important ones that aren’t urgent, the top left. You want to get those things scheduled so you have time to give them the attention they need, before they end up in the top right. And by scheduled, I don’t mean schedule the crisis. I mean schedule the time you’re going to work on them and make sure they get done.

Then there’s the urgent things that aren’t as important. The key is less important, not unimportant. Can you get help on them? Is there someone better to make sure they get done? Find someone to help you work them in parallel with the other things on your list.

Finally, there are the non-urgent, unimportant things. These are the things that you try not to do. Requests that aren’t yours to resolve. Just say “No.”
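
If it helps to see the four quadrants as code, here’s a toy sketch in Go. The action strings are just shorthand for the advice above, not a real prioritization engine:

package main

import "fmt"

// nextAction maps the Eisenhower quadrants to a suggested response.
func nextAction(urgent, important bool) string {
    switch {
    case urgent && important:
        return "do it now, and make sure it gets done"
    case important:
        return "schedule dedicated time for it"
    case urgent:
        return "find someone to help or hand it to"
    default:
        return "just say no"
    }
}

func main() {
    fmt.Println(nextAction(true, true))   // crisis
    fmt.Println(nextAction(false, true))  // plan it
    fmt.Println(nextAction(true, false))  // delegate it
    fmt.Println(nextAction(false, false)) // drop it
}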

One important thing to remember though. Just because you don’t think it’s important or urgent doesn’t mean it isn’t. The classification isn’t just about you and your wants/desires. You need to keep others’ priorities and your own biases in mind while classifying things. If you don’t, you end up in a silo. And you probably don’t want that.

by Leon Rosenshein

While You're In There ...

When you’re in the middle of a complicated change and you come across something that isn’t quite right, but doesn’t impact the work you’re doing, what do you do? What about if you’re adding a new capability and you spot the cause of your personal annoyance? It could be a simple misspelling, incorrect documentation, or something more substantial, like common code that everyone agrees should be refactored out, but no one has done. You’re right there. Everyone would agree it should be done.

Do you make the change? Do you make it in the same git commit? Do you make it in the same Diff/PR? It depends. Maybe. No.

It depends on the change. It depends on the cost of the change itself. Will making the change substantially impact your ability to deliver the value you’re trying to add? Will it cause rework/changes to other things that wouldn’t be impacted if you didn’t make the change?

It also depends on the difference in the cost of making the change now or later. Making the change is a context switch away from the value you’re trying to add. Depending on the scope, that could be a pretty large cost, but if the change is contained it’s a small cost. And it depends on the cumulative costs of whatever context switches got you to where you are. If you’re 7 layers of abstraction deep in the code and your mental model currently holds that state, the cost to rebuild that context could be significant. On the other hand, if you’re working on code you work on regularly and not deep behind abstractions/helpers then the cost would be small.

Think of it this way. You go to the grocery store and buy everything on your list. While going up and down the aisles you notice the crushed walnuts and think they would go well in the salad you already have the ingredients for. You could just add them to next week’s list. After all, the salad will be good without them. But it will be better with them. And you certainly don’t want to get home, unload the car, then turn around and drive back to the store, buy the walnuts, and then drive home again.

So the “It depends” is the cost/benefit calculus of not just the benefit of the change, but the opportunity cost (what you’re not doing instead), and the sunk cost of the setup that brought you to this point.

Maybe it should be in the same commit. Changing spelling in a comment? Sure, just add that to the current commit. Refactoring/changing function arguments? Use a new commit (or two). Commits are cheap. They’re your friend when something doesn’t go the way you want. Rather than go back to the beginning of the level and fight through the dungeon to where you are, just back up a couple of steps and try again. And if you’re not sure where it went wrong, there’s always bisect.

No, don’t use the same Diff/PR for the two changes. Any time your commit message includes the phrase “and while I was there” stop. Think about what you’re doing. You’re almost certainly doing two separate things. So make them separate Diff/PRs. In that way they’re like git commits. Not quite as cheap, but they have many of the same benefits. So take advantage of them. Future you will thank you.

by Leon Rosenshein

Additive Vs. Subtractive

There are two basic ways to make colors, additive and subtractive. Then there’s positive and negative space. M.C. Escher loved to play with positive and negative space, and you’ve probably seen the vase/face image. Michelangelo is said to have claimed that the sculpture is already inside the marble; he just removes the superfluous stuff.

Escher birds, positive vs. negative space; two vases, positive vs. negative space

But what has this got to do with software? We write software, we don’t carve it. We don’t start with a wall of text and release the program hidden within it. Or do we?

It’s been said that programming can be the art of adding bugs to an empty file. But most of the time we don’t start with an empty file, or at least without a framework we’re working in, so it’s not purely additive. In fact, a big part of what we do is work within the confines of existing code/systems.

This is most apparent when we come across something new. A common task for someone new to a team is to take a bug off the list, figure out what’s wrong, then fix it. By its very nature, that’s a subtractive task. Remove the bug. And often the best way to do that is to replace a bunch of complex, brittle code with something simpler that fits the current data/processing model better.

Or consider a system where there are N ways of persisting data. Not just N places for data at rest, but N completely different ways to get the data there. Somewhere, hidden inside those N ways, is a much smaller number of ways that you really need. Ideally one. So subtract a few of them. Then there’s only one thing to maintain. One thing to update when the business/data/processing model changes. One place for bugs to hide instead of N.
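
As a sketch of what that consolidation can look like (the names here are invented for illustration), you pick the one way you actually need, put an interface in front of it, and delete the other N-1 behind it:

package main

import "fmt"

// Store is the one persistence path everything goes through.
// A hypothetical interface, just to illustrate the consolidation.
type Store interface {
    Save(key string, value []byte) error
    Load(key string) ([]byte, error)
}

// memStore is one implementation; swapping in a database-backed one
// later only touches this type, not the call sites.
type memStore struct {
    data map[string][]byte
}

func newMemStore() *memStore {
    return &memStore{data: map[string][]byte{}}
}

func (m *memStore) Save(key string, value []byte) error {
    m.data[key] = value
    return nil
}

func (m *memStore) Load(key string) ([]byte, error) {
    v, ok := m.data[key]
    if !ok {
        return nil, fmt.Errorf("no value for %q", key)
    }
    return v, nil
}

func main() {
    var s Store = newMemStore()
    _ = s.Save("greeting", []byte("hello"))
    v, _ := s.Load("greeting")
    fmt.Println(string(v))
}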

So next time you’re presented with that wall of text and a problem to solve, don’t just reach for another empty file. See if you can release the masterpiece that’s hidden in that wall of text.


by Leon Rosenshein

Monolith to Microservice

Who knew Arthur C. Clarke and Stanley Kubrick were software architects? I certainly didn't. And yet, in 1968 they came up with 2001: A Space Odyssey, which includes a great metaphor for what happens when a new team encounters an existing monolith.

There it is, in all its glory. It works. It does something. It also has what appears to be a featureless, impenetrable surface. We’re not sure how or why it works, and we certainly don’t know where to start digging in. But we have to.

2001 monolith on the plains

So what do we do? We tear it apart. Into the smallest parts we can possibly imagine. Then we break it up a little further. We end up with a pile of parts at our feet, and we’re not really sure what to do with them.

2001 Monkey smashing bones

And we start trying to mash them together. We’re not always sure what we want, but we keep trying. Eventually things start sticking together. Then more and more things. We have small successes and we build on them until, finally, it all comes together.

2001 bone morphs to spaceship

1968 was actually a pretty important year for computers. It brought us the Apollo Guidance Computer. A 70 lb system that took man to the moon. All with less computing power than some USB chargers. And it brought us the mother of all demos, which showed off just about everything folks do with computers these days.

So what are you going to build on that foundation?

by Leon Rosenshein

Technical Debt and Agility

How much technical debt can/should you accrue? How agile can you be? What’s the connection?

I’ve written about tech debt before. The important thing to remember about it is that the metaphor only holds if you understand that it’s about opportunity cost and the time value of money. It’s not a license to write crappy code and expect that the software elves will magically fix it later. Used correctly it’s building what you need now instead of what you think you’ll need in 12 months. Even if the rework later will have an overall higher absolute cost.

You do that for two reasons. The first and most important is that you can add value sooner by building what you need now. Because it’s the integration of released value over time that matters, not the instantaneous value released. The second reason is that while you have a good idea of what you’ll need in 12 months, and you probably even have the broad strokes correct, you don’t know the details yet. And since the real cost is in the details, you’re going to need to redo it later anyway, so why waste time doing it now?

The question, of course, is how to balance the integrated value over time with the instantaneous value. Is another day’s work (25%) that produces 50% more value worth it? Over a week it is for sure. On the other hand, 50% more work for 1% more value takes a long time to be worth it. Probably much longer than it will take to figure out that you really should have done something different anyway. Your case is somewhere in the middle. You just need to figure out where.
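
To make that concrete, here’s a toy calculation (all the numbers are invented) comparing value integrated over time for “ship the quick version now and rework it later” against “build the full version up front”:

package main

import "fmt"

// cumulativeValue is the total value delivered by a given week, for an
// option that ships in shipWeek and then delivers valuePerWeek, minus
// any later rework cost. Purely illustrative numbers.
func cumulativeValue(horizon, shipWeek int, valuePerWeek, reworkCost float64) float64 {
    weeksShipped := horizon - shipWeek + 1
    if weeksShipped < 0 {
        weeksShipped = 0
    }
    return float64(weeksShipped)*valuePerWeek - reworkCost
}

func main() {
    for _, horizon := range []int{26, 52} {
        quick := cumulativeValue(horizon, 2, 8, 16) // ship fast, pay rework later
        full := cumulativeValue(horizon, 10, 10, 0) // build it "right" up front
        fmt.Printf("horizon %d weeks: quick=%.0f  full=%.0f\n", horizon, quick, full)
    }
}

With these made-up numbers the quick-then-rework option wins over 26 weeks and loses over 52. Where that crossover lands for your project is the “somewhere in the middle” you have to find.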

That’s where agility comes in. Because agility is intimately tied to tech debt. The less tech debt you have, the more agile you can be. When you start a project you have no tech debt, and (almost) infinite agility. You can go in any direction with roughly the same cost. You just need to figure out the direction that provides the most value, and away you go.

As time goes on though, tech debt builds. And that starts to change the calculation. You spend more and more time paying the vig. That reduces the amount of effort you have to spend on adding value. Knowing where to add value also becomes harder. Unless you’ve already spent time and effort making sure you can change direction, changing direction takes more and more work.

That pushes you to spend more time up front preparing for changes. Keeping your tech debt low. Unfortunately, preparing for changes makes you more agile in the future, but doesn’t add current value. Which reduces your integrated value over time. So you don’t want to do too much of that. Which leads you right back to increased tech debt which slows you down.

Which, in a strange way, leads you to look for more frequent feedback and “optimize” your agility. By frequently looking at your vision/OKRs and comparing them to your current state you can adjust how much tech debt you carry and balance that with the agility you need to achieve your vision/objectives.

by Leon Rosenshein

A 'Making' App

Some of us are building very complicated systems that run in multiple processes across multiple machines with a pub/sub model. Others are building large-scale embarrassingly parallel systems, while others are building microservice-based systems that take user input and modify persistent data stores.

In production those systems have lots of discrete physical components talking with each other over systems with varying latencies. And as you’ve probably experienced, you don’t really have a working system until you’ve properly handled the issues that come from having multiple components.

Having said that, most of the issues you need to work through, the business logic, if you will, don’t require all of those systems to be in place to build and test them. When you think about it, that’s all your unit tests are. A making app that includes some production code. For unit tests it’s just a tiny bit of real code surrounded by mocks of the rest of the system. And it lets you determine how the code under test responds.

Then there’s integration development and testing. That’s where another version of the making app comes in. Something that takes all of those disparate components, removes the “network” transport in favor of function calls, and runs as a single binary on a single box.
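
One way to sketch that (the names here are hypothetical) is to hide the transport behind an interface, so the making app wires components together with plain function calls while production uses the real network:

package main

import "fmt"

// Transport is how one component sends a request to another.
// In production this would be HTTP, gRPC, a message bus, etc.
type Transport interface {
    Send(topic string, payload string) (string, error)
}

// inProcTransport is the "making app" version: no network, just a
// direct call into the handler registered for the topic.
type inProcTransport struct {
    handlers map[string]func(string) string
}

func (t *inProcTransport) Send(topic, payload string) (string, error) {
    h, ok := t.handlers[topic]
    if !ok {
        return "", fmt.Errorf("no handler for %q", topic)
    }
    return h(payload), nil
}

func main() {
    // In the making app the Transport is in-process; in production you’d
    // swap in an implementation that actually goes over the wire.
    var transport Transport = &inProcTransport{handlers: map[string]func(string) string{
        // The "other service", running in the same process.
        "greeter": func(name string) string { return "hello, " + name },
    }}

    // The calling component doesn't know or care that there's no network.
    reply, err := transport.Send("greeter", "world")
    fmt.Println(reply, err)
}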

Think of all the benefits. There are the positive benefits. There’s no deployment/environment setup overhead. You can step into any function call instead of having to do remote debugging or network tracing. You can do function-level refactoring with a simple IDE click. Cycle time (edit/compile/deploy/test) is very low.

There are the negative benefits. You don’t have to worry about hurting production traffic/data. You don’t need to worry about someone else’s test code getting in your way. You don’t have the expense of another environment sitting around waiting for you to do some testing. You don’t need to wait for that environment to be set up.

Down here in the engine room we do this a lot. There are these things called minikube and kind, which are basically single-machine Kubernetes clusters. You can hook them up to your local docker registry and they include everything else. API servers. etcd stores. Worker nodes. And other than scale, we can set them up just like our production clusters.

And since those local clusters are right there on the local machine, we have complete control. And isolation from others. When I was working on the admission controller, the thing that decides if a pod should be allowed to run or not, I got it wrong far more times than I got it right, but the only way to really be sure was to try it out. By running it completely locally I could do it quickly and not worry about impacting anyone else. A win-win situation.

So where could/should you be using your own making app?