
by Leon Rosenshein

Monolith to Microservice

Who knew Arthur C. Clarke and Stanley Kubrick were software architects? I certainly didn't. And yet, in 1968 they came up with 2001: A Space Odyssey, which includes a great metaphor for what happens when a new team encounters an existing monolith.

There it is, in all its glory. It works. It does something. It also has what appears to be a featureless, impenetrable surface. We’re not sure how or why it works, and we certainly don’t know where to start digging in. But we have to.

2001 monolith on the plains

So what do we do? We tear it apart. Into the smallest parts we can possibly imagine. Then we break it up a little further. We end up with a pile of parts at our feet, and we’re not really sure what to do with them.

2001 Monkey smashing bones

And we start trying to mash them together. We’re not always sure what we want, but we keep trying. Eventually things start sticking together. Then more and more things. We have small successes and we build on them until, finally, it all comes together.

2001 bone morphs to spaceship

1968 was actually a pretty important year for computers. It brought us the Apollo Guidance Computer, a 70 lb system that took men to the moon, all with less computing power than some USB chargers. And it brought us the Mother of All Demos, which showed just about everything folks do with computers these days.

So what are you going to build on that foundation?

by Leon Rosenshein

Technical Debt and Agility

How much technical debt can/should you accrue? How agile can you be? What’s the connection?

I’ve written about tech debt before. The important thing to remember is that the metaphor only holds if you understand that it’s about opportunity cost and the time value of money. It’s not a license to write crappy code and expect that the software elves will magically fix it later. Used correctly, it’s building what you need now instead of what you think you’ll need in 12 months. Even if the rework later will have an overall higher absolute cost.

You do that for two reasons. The first and most important is that you can add value sooner by building what you need now. Because it’s the integration of released value over time that matters, not the instantaneous value released. The second reason is that while you have a good idea of what you’ll need in 12 months, and you probably even have the broad strokes correct, you don’t know the details yet. And the real cost is in the details, so you’re going to need to redo it later anyway. Why waste time doing it now?

The question, of course, is how to balance the integrated value over time with the instantaneous value. Is another day’s work (25%) that produces 50% more value worth it? Over a week it is for sure. On the other hand, 50% more work for 1% more value takes a long time to be worth it. Probably much longer than it will take to figure out that you really should have done something different anyway. Your case is somewhere in the middle. You just need to figure out where.
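
The trade-off in that paragraph is easy to sketch with a toy calculation. This is a back-of-the-envelope model with made-up numbers, not a real valuation: ship option A in 4 days at 1.0 units of value per day, or spend one extra day (25% more work) for 50% more daily value. With these numbers the two options break even at exactly one week, and the extra day pays off thereafter.

```go
package main

import "fmt"

// integratedValue returns the total value delivered by day `days`,
// assuming the feature ships after `buildDays` and then delivers
// `dailyValue` units of value every day after that.
func integratedValue(buildDays, days int, dailyValue float64) float64 {
	if days <= buildDays {
		return 0
	}
	return float64(days-buildDays) * dailyValue
}

func main() {
	// Option A: ship in 4 days, worth 1.0/day.
	// Option B: one extra day of work (25% more) for 50% more value.
	for _, day := range []int{7, 14, 30} {
		fmt.Printf("day %2d - A: %5.1f  B: %5.1f\n",
			day,
			integratedValue(4, day, 1.0),
			integratedValue(5, day, 1.5))
	}
}
```

The point of the sketch isn’t the numbers, it’s that the answer depends on the horizon: the later you measure, the more the extra up-front work is worth.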

That’s where agility comes in. Because agility is intimately tied to tech debt. The less tech debt you have, the more agile you can be. When you start a project you have no tech debt, and (almost) infinite agility. You can go in any direction with roughly the same cost. You just need to figure out the direction that provides the most value, and away you go.

As time goes on though, tech debt builds. And that starts to change the calculation. You spend more and more time paying the vig. That reduces the amount of effort you have left to spend on adding value. Knowing where to add value also becomes harder. Unless you’ve already spent time and effort making sure you can change direction, changing direction takes more and more work.

That pushes you to spend more time up front preparing for changes. Keeping your tech debt low. Unfortunately, preparing for changes makes you more agile in the future, but doesn’t add current value. Which reduces your integrated value over time. So you don’t want to do too much of that. Which leads you right back to increased tech debt which slows you down.

Which, in a strange way, leads you to look for more frequent feedback and “optimize” your agility. By frequently looking at your vision/OKRs and comparing them to your current state you can adjust how much tech debt you carry and balance that with the agility you need to achieve your vision/objectives.

by Leon Rosenshein

A 'Making' App

Some of us are building very complicated systems that run in multiple processes across multiple machines with a pub/sub model. Others are building large scale embarrassingly parallel systems, while others are building microservice based systems that take user input and modify persistent data stores.

In production those systems have lots of discrete physical components talking with each other over systems with varying latencies. And as you’ve probably experienced, you don’t really have a working system until you’ve properly handled the issues that come from having multiple components.

Having said that, most of the issues you need to work through, the business logic, if you will, don’t require all of those systems to be in place to build and test. When you think about it, that’s all your unit tests are. A making app that includes some production code. For unit tests it’s just a tiny bit of real code surrounded by mocks of the rest of the system. And it lets you determine how the code under test responds.
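
A tiny Go sketch of that "unit test as making app" idea. The names here (`PriceSource`, `total`) are hypothetical, not from any real system: the business logic only sees an interface, so a map-backed mock stands in for the remote service.

```go
package main

import "fmt"

// PriceSource is the seam we mock. The production implementation
// would call a remote service; the tests don't need one.
type PriceSource interface {
	Price(item string) (int, error)
}

// mockPrices is the mock surrounding the code under test:
// a map pretending to be the rest of the system.
type mockPrices map[string]int

func (m mockPrices) Price(item string) (int, error) {
	return m[item], nil
}

// total is the real business logic under test. It doesn't care
// whether prices come over the network or from a map.
func total(src PriceSource, items []string) (int, error) {
	sum := 0
	for _, it := range items {
		p, err := src.Price(it)
		if err != nil {
			return 0, err
		}
		sum += p
	}
	return sum, nil
}

func main() {
	src := mockPrices{"apple": 3, "pear": 4}
	sum, _ := total(src, []string{"apple", "pear"})
	fmt.Println(sum)
}
```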

Then there’s integration development and testing. That’s where another version of the making app comes in. Something that takes all of those disparate components, removes the “network” transport in favor of function calls, and runs as a single binary on a single box.

Think of all the benefits. First, the positive ones. There’s no deployment/environment setup overhead. You can step into any function call instead of having to do remote debugging or network tracing. You can do function-level refactoring with a simple IDE click. Cycle time (edit/compile/deploy/test) is very low.

Then there are the negative benefits, the things you don’t have to deal with. You don’t have to worry about hurting production traffic/data. You don’t need to worry about someone else’s test code getting in your way. You don’t have the expense of another environment sitting around waiting for you to do some testing. You don’t need to wait for that environment to be set up.

Down here in the engine room we do this a lot. There are these things called minikube and kind, which are basically single-machine Kubernetes clusters. You can hook them up to your local Docker registry, and they include everything else. API servers. etcd stores. Worker nodes. And other than scale, we can set them up just like our production clusters.

And since those local clusters are right there on the local machine, we have complete control. And isolation from others. When I was working on the admission controller, the thing that decides if a pod should be allowed to run or not, I got it wrong far more times than I got it right, but the only way to really be sure was to try it out. By running it completely locally I could do it quickly and not worry about impacting anyone else. A win-win situation.

So where could/should you be using your own making app?

by Leon Rosenshein

Humor

Every once in a while I wander back to defprogramming.com to see what comes up. Today I got

Code is like humor. When you have to explain it, it’s bad.
    -- CORY HOUSE

Which I really like. Because code should be obvious when you read it. That doesn’t mean what to write is obvious. Far from it. The problem you’re trying to solve is nuanced. It’s complex. There are lots of corner cases to deal with. I get that.

But the resulting code shouldn’t be complex. It may be detailed, but it should be easy to understand and easy to explain. There shouldn’t be unintended side-effects, and there shouldn’t be unexpected requirements.

Or, to put it differently, yes, you should be commenting your code, but your comments shouldn’t describe what the code is doing. What the code does and how it does it should be obvious from reading it. Instead, your comments should be about why you’ve chosen to do it that way. What were the constraints? What didn’t work and why? If there’s an extension that seems obvious but won’t work, explain why before someone tries it and it doesn’t work.

Because cognitive load. Remember, how you write your code isn’t for the computer. The compiler/interpreter is going to take what you write and turn it into something the machine understands. Don’t worry about that part. The part you need to worry about is the next person to look at the code. Make it easy for them to understand. Otherwise you’re going to get paged in the middle of the night when something goes wrong. And no-one wants that.

by Leon Rosenshein

Naked Returns

TIL that in Go you don’t actually need to list the things you return from a function when you use the return keyword. That’s called a naked return. At first I thought of Scala’s return and how you generally shouldn’t use it, but it’s clearly not that. It’s not (basically) returning the last thing calculated when you exit a function. That has lots of potential issues by itself and has led to some odd looking code, but that’s a different story.

Go’s naked return is conceptually very different. You can use a naked return if you declare not only the types of the returned values, but their names as well. Under the covers, the compiler creates those variables, gives them their default (zero) values, and then, when you just `return` with no parameters, uses the current values of those variables.

1  package main
2
3  import (
4      "fmt"
5  )
6
7  func calculate(l, r int) (sum, prod int, block string) {
8
9      if l == 0 {
10         return
11     }
12
13     if l == 1 {
14         return sum, prod, block
15     }
16
17     sum = l + r
18     prod = l * r
19     block = "outer"
20
21     if l == 2 {
22         block := "inner2"
23         return sum, prod, block
24     }
25
26     if l == 3 {
27         block = "inner3"
28     }
29
30     if l == 4 {
31         block = "inner4"
32         return
33     }
34
35     if l == 5 {
36         // block := "inner5"
37         return
38     }
39
40     return
41 }
42
43 func main() {
44     sum, prod, block := calculate(0, 3)
45     fmt.Printf("For l = 0 - Sum: %d, Product %d. From %s\n", sum, prod, block)
46     sum, prod, block = calculate(1, 3)
47     fmt.Printf("For l = 1 - Sum: %d, Product %d. From %s\n", sum, prod, block)
48     sum, prod, block = calculate(2, 3)
49     fmt.Printf("For l = 2 - Sum: %d, Product %d. From %s\n", sum, prod, block)
50     sum, prod, block = calculate(3, 3)
51     fmt.Printf("For l = 3 - Sum: %d, Product %d. From %s\n", sum, prod, block)
52     sum, prod, block = calculate(4, 3)
53     fmt.Printf("For l = 4 - Sum: %d, Product %d. From %s\n", sum, prod, block)
54     sum, prod, block = calculate(5, 3)
55     fmt.Printf("For l = 5 - Sum: %d, Product %d. From %s\n", sum, prod, block)
56     sum, prod, block = calculate(6, 3)
57     fmt.Printf("For l = 6 - Sum: %d, Product %d. From %s\n", sum, prod, block)
58 }

For l = 0 - Sum: 0, Product 0. From
For l = 1 - Sum: 0, Product 0. From
For l = 2 - Sum: 5, Product 6. From inner2
For l = 3 - Sum: 6, Product 9. From inner3
For l = 4 - Sum: 7, Product 12. From inner4
For l = 5 - Sum: 8, Product 15. From outer
For l = 6 - Sum: 9, Product 18. From outer

In this contrived example, an immediate return, either naked (line 10) or explicit (line 14), gives you the default return values. Then, depending on the value of l, you get different combinations of where the block variable is set and which kind of return is used, but other than for l==0 and l==1, sum and prod are always calculated up front and are returned whether explicitly listed on the return or not. And in case you shadow the implicit declaration with an inner scope, if line 36 is uncommented you get a compile-time error

./prog.go:37:3: block is shadowed during return

Which ensures you don’t do it by accident.

Seems handy, right? So why not always use them? Because they’re hard to read. The farther the return is from the declaration, the harder it is to follow, and the more cognitive load there is when you come across a naked return. And if I’ve said it once I’ve said it a bunch of times, always try to reduce cognitive load. There are plenty of other places where you’ll need those cycles, so when you can make things obvious, you should.

If you want to see it in action, check out this playground

by Leon Rosenshein

Not Just About Crows

I didn’t know this, but apparently the twitter user known as @tef_ebooks is all about the crows. Or at least posts lots of pictures of crows. I ran across them for something different.

thinking about that time i was in a meeting with amazon engineers, and my co-workers asked "what's serverless"
i said "per-request, not per-instance billing" and there was an awkward silence like all the hype had been let out of the room

That’s pretty blunt. And accurate. Which is by no means a complaint. Serverless has a lot going for it. Workers can scale quickly. Machine/instance management overhead is cut way down. You get to focus on the business problem and adding value. All good and important things.

But it’s also a recognition that Serverless isn’t a panacea. It doesn’t make the complexity of the problem go away. After all the problem space hasn’t changed, just the solution framework.

By definition it’s stateless, which is just another way of saying “state is someone else’s problem”. Which is great, unless you’re that somebody, in which case it’s still your problem. So you still have to deal with state, whether it’s a database, a set of files/S3 buckets, or something else entirely. And that state management system needs to either scale to whatever the upstream system can scale to or include a throttling component.

Error handling is still a thing. Maybe you can drop bad requests or errors on the floor and ignore that transaction. But often you can’t. Which leads to retry loops, dead letter boxes, and manual mitigation. All of which are additional state to manage. Often, most of the complexity is actually in how to deal with those off-nominal cases. Not most of the compute, but most of the complexity.

There are other things. Warmup time per instance. You can set up pre-warmers, but that’s one more thing to manage. Emergent behavior. The more systems and interaction points, the more likely you are to see something unexpected. Billing surprises. Sure, you can put a cap on concurrency, but didn’t you set this up to remove that cap in the first place?

Even so, Serverless has its place. Not having to worry about managing machines, keeping them up to date, or dealing with hardware issues has lots of value. In many cases the benefits outweigh the costs. Just don’t blindly assume serverless is the silver bullet. Make an informed decision.

by Leon Rosenshein

README.md

Markdown is pretty simple. A few `#`’s, `*`’s, and `-`’s and you’ve got a formatted document. Unless you don’t.

But you know where markdown files really shine? As the lowly README.md. Not to be confused with Reamde, which is excellent, but something completely different.

That README file is amazing for a whole slew of reasons. Not the least of which is that, since by convention almost all other files have lower case names and upper case sorts first, it’s right there at the front of your directory listing, right next to your BUILD(.bazel) file and/or your Makefile and all of the code it’s describing. So it’s very easy to find.

README.md is easy to edit. It can be as simple as an ASCII text file, so your favorite IDE, even if it’s nano, will do just fine. All of the typography marks are just ASCII characters in the right places so it’s easy to type.

It’s also easy to read, and the typography marks do a really good job of letting you know not just that the author was marking something as special, but also how. Heading marks indent further for things of lower importance. Unordered lists look like lists, quotes look like old-school email forward/quotes, and bold/italic sections stand out clearly, even if you’re not sure which one is which.

So they’re easy to find and easy to write and read. Dayenu. But there’s even more.

The other part that’s important is that it’s a single file. Which means that in practice it’s not too long. If it gets too long you’ve probably got the wrong metaphor or are documenting at the wrong level. If you need to show more detail you can create another README.md deeper in the tree and reference it at the higher level.

And if you want a bigger set of online docs there’s doxygen/mkdocs to build a nice searchable website out of the various README.md files.

by Leon Rosenshein

OKRs and You

What is a vision? What about a mission?  What do Key Performance Indicators (KPIs) and Objectives and Key Results (OKRs) have to do with either of them? When and where can you use them?

A vision statement is a picture of the future. It defines the end goal. The future the company wants to achieve.

A mission statement, on the other hand, says what you’re going to do to enable that vision.

Which leaves us with KPIs and OKRs. KPIs are a cornerstone of Management by Objectives (MBO). They’re how you measure your progress against the defined objectives. And in OKRs your key results tell you if you’ve met your objectives. Two ways to organize, manage, and track that sound like the same thing, right?

MBO/KPI and OKR may both use the words objectives and key, but, in reality they’re pretty different. The simplest difference is in expectations. In MBO, the expectation is that you achieve all of the KPIs. With OKRs, not so much. Typically 70%-80% is considered success.

A more significant difference though, is the overall philosophy. OKRs are pretty directly tied to the vision and mission. They describe and measure both the what and the how. The objectives are tied to the change that the vision represents (the what) and the key results let you know if the method (the how) is working.

You should be able to express your OKRs with this simple formula.

We will (objective) as measured by (these key results)

The formula provides the overall framework that ties the company vision/mission to team objectives. The OKRs are built bottom up, in service to the vision/mission and partner team’s OKRs.

The other important part of OKRs is that they’re tied to value. If nothing else, they’re tied to the value that the vision brings. They describe the value of the objectives to the customer, whether that’s an internal or external customer. Which leads to the big difference between key results and KPIs.

KPIs are thresholds on measurements. Average page load time less than X. Daily cloud spend less than Y. But there’s no indication of what changes or why. No description of how that adds value. Key results, on the other hand, help describe how much value meeting the objectives brings.

To that end, key results come in two basic flavors: activity based and value based. Activity based KRs are typically used in the 0 -> 1 case, where you’re going to add value by starting to do something. By building, launching, or starting something new. You can add a lot of value with a step change to a system. Value based key results are more typically used to describe an incremental change. They let you know how effective your activity has been, without defining the activity. They describe how much value you’ve added by saying how much you’ve improved the result.

When possible, prefer value based KRs. For example, let’s say you’ve got an objective to “improve developer efficiency”. You might be planning to do that in part by reducing build time by enabling remote builds. You could have a KR that was “enable remote builds”. An activity based KR. You’ve succeeded in the KR by enabling it. But have you made developers more efficient? A better KR might be “reduce build time by 50%”. While not a direct measure of developer efficiency, build time is a much better proxy. And still very measurable.

Finally, when and where can you use OKRs? Now, and everywhere. Whenever you have a goal or a vision OKRs make sense. You can use them to create and maintain alignment inside a company, across an industry, or with your own personal board.

by Leon Rosenshein

Hyrum's Law

With a sufficient number of users of an API, it does not matter what you promise in the contract:
All observable behaviors of your system will be depended on by somebody.

    -- Hyrum’s Law

Or, what’s old is new again. The earliest reference to Hyrum’s Law I could find was in late 2016, but that’s hardly the first time I ran across the idea. That’s just the name with the weight of Google behind it.

I ran across a variant of it when I was working on Falcon 4.0. Our testers decided some debug code was a great feature, so we ended up shipping it. After that I went to work for this little software company called Microsoft. Microsoft was the dominant force in the industry at the time, for many reasons. Then, and still today to a large degree, Microsoft focused on the enterprise customer. Customers who would have 10s or 100s of thousands of licenses and build lots of internal tools and processes that used Microsoft products. And not just Windows/Office. Flight Simulator sold 100s of thousands of copies of each version. From the operating system to the office suite, to games, companies lived on MS software. And when you have a single customer with that many licenses you listen to what matters to them.

And one of the things that was important to customers was that things keep working. We dealt with that a lot on Flight Simulator. We spent a large percentage of our time in each development cycle making sure things were bug-for-bug compatible. In Windows it turned into an entire feature of its own, Compatibility Mode.

But unless you’re building the actual end user experience (and even there, your bug is their feature), you need to have some kind of interface to what you’re building. If there’s no way to provide input or read output, it’s not doing anything but turning electricity into heat, so there will be some kind of API. Given that constraint, there are a few things to keep in mind.

  1. Version your API: Get people into the habit of specifying the version, so when it changes it’s not a mental shift.
  2. Eliminate, or at least minimize, side effects: The fewer side effects, the less likely users are to start depending on them.
  3. Keep the API as focused as possible: The bigger the surface area of your API, the more unique combinations of things users will come up with.
  4. Expect the unexpected: Users will come up with new and interesting ways to use the API that you didn’t intend. Be prepared to support them.
  5. Instrument your API: That way you can understand how it’s being used and be less likely to be surprised when something stops working.

by Leon Rosenshein

Managing Expectations

A suggestion by a general to a private is an order. An order from a 2nd Lieutenant to a Master Sergeant is a suggestion.

    — Unknown

I cannae change the laws of physics! I've got to have thirty minutes!

     — Scotty (eight minutes before the Enterprise might be destroyed)

Although they come at it from different directions, both of those are about expectation management. Whether you’re asking/demanding something, or being asked/told to do something, the expectations, on both sides, are defined not just by the words, but by the situation.

When a general walks into a briefing room and asks someone to get a cup of coffee, he or she probably expects to get one, but doesn’t expect someone to stop doing what they’re doing and get the coffee. But chances are that the soldier fresh out of boot camp will do just that. Because their expectations are different. The opposite happens when the butter-bar fresh out of West Point tells the Sergeant Major what to do. The Sergeant Major doesn’t drop everything and do it, but instead explains the situation to the 2LT. Meanwhile, in the mid-2200s, Montgomery Scott got himself viewed as a miracle worker, partly by telling people he couldn’t change the laws of physics, and then delivering anyway.

But what has any of that got to do with work? It’s about managing expectations. It’s about how you handle the unexpected but foreseeable future. And it’s about understanding how your position relative to another person changes how what you say is heard.

One good example is what happens when your understanding of the situation changes. Your estimate of the time to complete should change as well. It might go up or down. What you do with that information is the important part. If you think you’re going to be finished sooner, say something. But also say something if you think it’s going to take longer. You don’t want to surprise anyone. What you’re doing is almost certainly a dependency of someone else’s. They’ve planned for it to arrive at a certain time. They have an expectation. And now it’s wrong.

Think about how you’ve felt when someone missed a deadline. No one likes it. But the sooner you find out about it, the sooner you can adjust your plan. Or, if the timing is the critical factor you can discuss ways to have something done in time. It might be less than ideal, but it can still add value. There are lots of ways to start that conversation, but that’s a topic for a different time.

And that’s effective expectation management.