
by Leon Rosenshein

Improve The Daily Work

Following up on last week's entry on DevOps, one of the key tenets in The Phoenix Project is that "Improving daily work is more important than doing daily work". Or maybe you've heard it said as "Work smarter, not harder".

Whether you call it friction, cycle time, down time, idle time, or wasted time, anything that isn't time spent thinking about, codifying, and implementing the solution to the business problem you're trying to solve is time not adding value. To maximize efficiency and productivity you want to minimize that time.

We spend lots of time trying to make computers do our bidding. Parts of it are very creative, figuring out ways to do something no one has ever done or figuring out a more efficient way to do something we already know how to do. In between those parts is a lot of doing the same thing over and over again. Everything from writing boilerplate code to running tests to deploying/validating products. A lot of that is rote repetition. It takes time. Sometimes a lot of time. And a little bit of our attention. Not a lot, but it breaks the train of thought and kicks us out of the flow. Then you spend 15 minutes getting back into the flow, then it happens again. And you think to yourself, "There's got to be a better way, but I don't have time for that right now," and you might be right the first time that happens, or even the second, but at some point, often a lot sooner than you think, you find that you would have been better off taking the time to fix your workflow instead of brute forcing it.

Case in point. The other day I was working on building some new machine images to stand up a Kubernetes cluster for our new datacenter. Turns out the easy way to test is to write the code, push it to a branch, trigger a remote build, deploy it, then see what happens. Lots of steps, but each one is easy, there were no prerequisites, and I got to mark lots of steps as done. It felt like I was really doing lots of work. But in reality it was slow. I had to start each step when the previous one finished. I had to wait for steps to finish. Then test. Then do it all over again. It was a 25 minute cycle. I made some quick progress at first, but then it got slow, and annoying. I'd make silly mistakes and waste a cycle. Or I'd have to back up. As Macbeth said, it was a tale, told by an idiot, full of sound and fury, signifying nothing.

A couple of hours in I stepped back, built some tools, wrote a makefile, figured out how to do things without needing to restart a remote service. I spent about 4 hours and turned a 25 minute cycle into a 5 minute cycle. Each cycle saved 20 minutes, so it took 12 cycles to make up that 4 hour setup time, and 12 cycles, with a break, took a little over an hour. So in less than a day I was way ahead of where I would have been. A couple of days later I was about a week ahead. And that doesn't count the time saved by teammates who didn't have to go through my struggle. If you're wondering where the breakeven point is, there's even an XKCD for that.
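
If you want to check the math, the breakeven calculation is simple enough to sketch in a few lines (the numbers below are just the ones from this story):

```go
package main

import "fmt"

func main() {
	// Numbers from the anecdote above.
	oldCycle := 25.0  // minutes per cycle before the tooling
	newCycle := 5.0   // minutes per cycle after the tooling
	setup := 4 * 60.0 // minutes spent building the tools

	savedPerCycle := oldCycle - newCycle
	breakevenCycles := setup / savedPerCycle
	fmt.Printf("each cycle saves %.0f minutes; breakeven after %.0f cycles (%.0f minutes of work)\n",
		savedPerCycle, breakevenCycles, breakevenCycles*newCycle)
}
```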

Whether it's a Makefile with some common commands and arguments, a fix-it week project to make something smoother, or, even better, a culture that says "If you have a choice between improving developer productivity or developing a new feature, choose developer productivity", one of the best ways to be more productive is to make it easier and smoother to do your work.

by Leon Rosenshein

Dev Ops Book Club

The Phoenix Project, Accelerate, The DevOps Handbook, and recently, The Unicorn Project. Two novels and two how-to books backed by research. Full disclosure, the novels, as novels, aren't that good. Character development is shallow, the plot moves at the speed of the message, and there are more stereotypes than you can shake a stick at.

However, if you can get past that there's some really good information in all of those books. At the top of the list are the 4 kinds of work (Phoenix) and 4 key metrics (Accelerate). They're at the top, not for themselves, but for what they drive, which is value, or impact. If what you're doing doesn't have value it doesn't matter how well you're doing it. You're wasting your own and others' time.

The 4 kinds of work are:

  • Business Projects - The things your customers ask for
  • Internal Projects - The things you do to make your life easier
  • Operational Changes - The work you do between finishing something and seeing value, deploying things
  • Unplanned Work - Things you need to do right now. The kind of work the on-call is doing. Dealing with failures, bugs, and sudden changes in plans.


The 4 metrics are:

  • Lead Time for changes - The time between a request coming in and customers seeing the change
  • Deployment Frequency - How many times you can deploy on any given day
  • Mean Time to Recovery - Average time to recover from a failure
  • Change fail percentage - The percentage of changes that fail to work as designed


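You don't need fancy tooling to start tracking these. Here's a rough sketch of computing all four from a simple log of changes; the deployment record and its field names are made up for illustration, not something from the books:

```go
package main

import (
	"fmt"
	"time"
)

// Deployment is a hypothetical record of one change reaching production.
type Deployment struct {
	Requested time.Time // when the change was asked for
	Deployed  time.Time // when customers could see it
	Failed    bool      // did it fail to work as designed?
	Recovered time.Time // when service was restored, if it failed
}

// report prints the four metrics for a set of deployments over a time window.
func report(deploys []Deployment, window time.Duration) {
	if len(deploys) == 0 {
		return
	}
	var lead, recovery time.Duration
	failures := 0
	for _, d := range deploys {
		lead += d.Deployed.Sub(d.Requested)
		if d.Failed {
			failures++
			recovery += d.Recovered.Sub(d.Deployed)
		}
	}
	n := len(deploys)
	fmt.Println("Lead time for changes: ", lead/time.Duration(n))
	fmt.Printf("Deployment frequency:   %.1f per day\n", float64(n)/(window.Hours()/24))
	if failures > 0 {
		fmt.Println("Mean time to recovery: ", recovery/time.Duration(failures))
	}
	fmt.Printf("Change fail percentage: %d%%\n", 100*failures/n)
}

func main() {
	now := time.Now()
	report([]Deployment{
		{Requested: now.Add(-48 * time.Hour), Deployed: now.Add(-24 * time.Hour)},
		{Requested: now.Add(-30 * time.Hour), Deployed: now.Add(-2 * time.Hour),
			Failed: true, Recovered: now.Add(-1 * time.Hour)},
	}, 7*24*time.Hour)
}
```
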
There are also other operational things mentioned and explained that help achieve those goals while delivering value. Things like customer focus, developers finding joy from getting into the flow, improvement of daily work, and minimizing work in progress (WIP). The first one is key to making sure you're not wasting your time and truly adding value.

That last one is also interesting and something my boss and I appear to have divergent views on, but really, that's not the case. He likes to minimize WIP. Do one thing, finish it, then do the next. And generally speaking, I agree with him. Sometimes there are good reasons to not do that. But that's a discussion for another day.

by Leon Rosenshein

Good Fences Make Good Neighbors

Speaking of bounded contexts, I love contexts, or more precisely, I love the idea of bounded contexts. Boundaries are great tools for simplification. They help you maintain separation of concerns and isolation, both inside a module and between modules. And that's the heart of Domain Driven Design. One of the things you'll find when you dig a little deeper into microservice architecture is that everyone talks about Bounded Contexts and Domain Driven Design. The important thing to remember though is that regardless of the "architecture" you're building, good architecture is good architecture. Microservices might give you more scale-out, but you pay for it with more cognitive load.

Most of the things we do with computers these days are simulations of some real-world thing, whether it's passing notes in class, driving a car, or high energy physics. Some are mostly in our heads, some are very dynamic, and some have more interaction with the physical world than others. They have different levels of detail, but they're all models or abstractions that we simulate. And like any other model, they have boundaries. Things that are part of the model and things that aren't. And the clearer the boundaries, the easier it is to know what's in and what's out. That's a Bounded Context. And once you know the Bounded Context for each model it's much easier to put them together into a model of the entire system.

Of course, scope matters. A Bounded Context isn't something that only applies to architects or to how microservices fit together. You should use Bounded Contexts at every scale. Functions in a class should have clear boundaries as well. It makes no more sense to have the function that sets the input path print out the final report than it does to have your sensor fusion service control emergency braking.
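
To make that concrete at the smallest scale, here's a minimal sketch of what ignoring a boundary versus respecting one looks like in code. The types and function names are invented for illustration:

```go
package main

import "fmt"

// Config and Report are hypothetical types, just to make the point.
type Config struct{ InputPath string }
type Report struct{ Lines int }

// Blurred boundary: a "setter" that reaches outside its context
// and prints the final report as a side effect.
func SetInputPathAndPrintReport(c *Config, path string, r Report) {
	c.InputPath = path
	fmt.Printf("processed %d lines from %s\n", r.Lines, c.InputPath)
}

// Clear boundaries: each piece stays inside its own context,
// and the caller composes them.
func (c *Config) SetInputPath(path string) {
	c.InputPath = path
}

func (r Report) Print(source string) {
	fmt.Printf("processed %d lines from %s\n", r.Lines, source)
}

func main() {
	cfg := &Config{}
	cfg.SetInputPath("/tmp/input.csv")
	Report{Lines: 42}.Print(cfg.InputPath)
}
```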

So know what your boundaries are. And if they're not clear, find the person on the other side and work to firm them up. You'll both be happier.

by Leon Rosenshein

The More Things Change

Last week I wrote about the difference between an engineer and a technician. Another way to think about it is where in the technology stack you operate. 30+ years ago the tech stack was a CPU, some ROM, some RAM, and maybe a persistent storage device. And that was enough to go to the moon. Today you can find that amount of processing and storage in the wall charger for your phone. Now we've got languages that manage all of your memory for you, IDEs that point out your mistakes, refactor your code, and suggest what you want before you type it, and serverless clouds to run your code in. Or, as others have said, we're "Standing on the shoulders of giants".

Best practices and the environment we're operating in might have changed, but the core problems really haven't. Functional may have replaced Object Oriented as the new hotness, but if you go back to first principles, you find they should live together. It's just a matter of understanding/expanding the scope. 

Cognitive load of the developer is still one of the biggest limiting factors of system scale. We just have new ways (microservices, Domain Driven Design, smarter IDEs, etc.) to help manage the load. Execution time is still important. Now we have more/faster processors, so we can spend time making it easier (if less efficient in the isolated single case) to parallelize, distribute, and scale things out without increasing cognitive load. But we're still limited by cognitive load.

Perceived wait time is still important to users. But now we have enough local horsepower/bandwidth to do a bunch of local validation instead of waiting for a round trip. We have spinning circles of hope on the screen during the roundtrip. We can send bigger/higher resolution images instead of reducing everything to some standard 128 colors. But wait time, bandwidth, and resolution are still things we need to worry about.

by Leon Rosenshein

Rightsizing

How big is too big? How small is too small? How do you decide? What criteria go into the decision? A good place to start thinking about this is the Unix Philosophy. Small and simple. Composable. Do one thing and do it well.

And you can apply it at just about any level you want. Functions should be small and single purpose. It would be odd to see a function called `HandleFile(...)` that parsed its inputs and sometimes printed a file, sometimes executed it, and sometimes deleted it. Similarly, you don't expect your graphics library to render images to a canvas and control a robot. Going even broader in scope, Microsoft Office and the Google Suite are collections of (mostly) single purpose tools (documents, spreadsheets, drawing, presentation) that are composable and look/feel like they belong together. But you don't want or expect to use them to control a 6 axis CNC mill.
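
Here's roughly what that difference looks like in code. It's just a sketch; `HandleFile` and the functions it's split into are hypothetical, not a real API:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// Too broad: one function that parses its input and does three unrelated things.
func HandleFile(action, path string) error {
	switch action {
	case "print":
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		fmt.Print(string(data))
		return nil
	case "exec":
		return exec.Command(path).Run()
	case "delete":
		return os.Remove(path)
	default:
		return fmt.Errorf("unknown action %q", action)
	}
}

// Right-sized: small, single-purpose functions the caller can compose.
func PrintFile(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	fmt.Print(string(data))
	return nil
}

func ExecuteFile(path string) error { return exec.Command(path).Run() }
func DeleteFile(path string) error  { return os.Remove(path) }

func main() {
	if err := PrintFile("notes.txt"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```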

You can go even larger than that. Think about a distributed microservice architecture. The microservices act as functions/libraries in a larger system. Separation of concerns and bounded contexts help you keep things "simple". Whether you're talking about a function, library, tool, application, or service, knowing where the boundaries are lets you maintain context.

So keep things small. Good advice. But like any other piece of advice, you can take it too far. You can make things too small. If your boilerplate comment block is larger than most of your functions you've probably gone too small. If, instead of ls and an option to be recursive (ls -R) or show attributes (ls -l), you have new commands for each option (lsr, lsl, lsa, etc.) you've gone too far.


by Leon Rosenshein

FI/RI, Flag, Merge, Rebase, Trunk

How do you develop a feature? Do you go off into your own little world (feature branch), develop for a few weeks/months, then spend another week dealing with merge issues and releasing your shiny new feature on the world fully formed and integrated? Or maybe after weeks of work you just rebase, handling the text conflicts relatively easily, but then dealing with hidden logic changes for a week. Or you might have tried to just live in master (trunk) and make small, innocuous changes all along until somehow there's enough there there for users to notice and it ever so slowly becomes a full feature? Or maybe you've got 1000s of developers working on "features" that are roughly the size of most applications and you want, no, need, to share code and release them in a big-bang event?

Lots of different options. And of course, when there are lots of options there's no one right way that always works. It's far more fluid and personal. It depends on the level of coupling between the changes you're making and the changes others are making. It depends on how long you expect to be different than everyone else. It depends on the overall velocity of the codebase. It depends on the size of the codebase and the size of the teams. It depends on your personal style and workflow, and the team/org's style and workflow.

So what's a poor software engineer to do? Compromise of course. Find the best solution for your current set of constraints. The worst thing you can do is blindly do things the same way all the time. So next time you have an option, think about it. Think about where you want the branch/PR to go in the future. Discuss it with someone. And do it mindfully, not out of habit.

So what's your preference and why? Put it in the thread.

by Leon Rosenshein

2038 == 2000

How many of you remember the Y2K problem? How many of you had to fix your way through it? For those that weren't involved, the Y2K bug was the result of an optimization. It went something like this, circa 1970:

Dev 1: Our database is getting too big. We need to save some space.
Dev 2: Ok. Let's not store redundant data.
Dev 3: Hey, Look. All of the years stored in our database start with 19. Let's not store that
Dev 1: Great idea. We've got 1,000,000 users, and 2 bytes per row is 2 MB.
Dev 2: Wow, we'll fit in memory again. Way to go man.

And it was so. And it worked. And for 20 years no one noticed. Then some tester got fancy and put in a date a few years in the future and something broke. Time seemed to go backward. Things that were supposed to be 2 years away were 96 years ago. Or displays were just wrong and showed as 19XX. But slowly people fixed the problems. Then the internet found out about it. And there was much wailing and gnashing of teeth. Prophecies of doomsday. The power grid was going to fail. The phones would stop working. Turn your computers off before midnight on Dec 31st, 1999 or they would melt.

Of course, as you can tell since you're reading this on a computer, most of those predictions didn't come true. Yes, some displays were wrong, some programs crashed, and others got things wrong, but generally it was a non-event.

So we're safe, right? Not exactly. In the Windows world, anything built with VC8+ (2005) is good until the year 30,000+, but if you're still running something older (like an embedded system running Win-CE) you'll probably have issues. And in the Unix world almost everything is time_t based and only good until 2038.

Of course 2038 isn't that far away anymore. And our favorite penguin isn't ready. The 5.6 kernel will be ready soon, but that doesn't mean the programs in user space are ready. That's on us to deal with. It could be as simple as a recompile to run on the new kernel, but there's more to future-proofing than just recompiling and deploying. Think about persisted data. What about databases and files with old timestamps in them? Or that nifty new gRPC protobuf you wrote to disk with a 32-bit timestamp? We need to think about these things and deal with them before time runs out.
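
If you want to see the cliff for yourself, here's a small sketch of the overflow itself (just the arithmetic, not any particular file format or database):

```go
package main

import (
	"fmt"
	"math"
	"time"
)

func main() {
	// The largest second a signed 32-bit time_t can represent.
	var last int32 = math.MaxInt32
	fmt.Println("last representable second:", time.Unix(int64(last), 0).UTC())

	// One tick later the value wraps to a large negative number,
	// which reads as a date back in 1901.
	wrapped := last + 1
	fmt.Println("one second later:         ", time.Unix(int64(wrapped), 0).UTC())
}
```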

by Leon Rosenshein

Architecture vs Design

Here's a tough one. What's the difference between software architecture and software design? Both are obviously about software. Both talk about how things fit together. Both have patterns and best practices. Can you have one without the other? Can you have good architecture and bad design at the same time? What about vice-versa?

I think the difference between the two is about the difference in scope. I think a lot of things that seem similar, but aren't, are different because of scope. In this case, it's not just scope, but scale.

Software design is about how things fit together at the function/class/module level. Are your functions really functions, or a wrapper around side effects? Does the function name help the caller understand what is going to happen, or does it just collect some stuff into one line so some other function fits on one screen? Do your classes encapsulate something and provide a useful abstraction or are they just collections of stuff and global things? Do your modules do what they say they do and nothing else? Can you trust your modules not to interfere with each other? Do they have flexibility, scalability, reliability, and replaceability?
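
Take the first question in that list. Here's a tiny, made-up example of the difference between a real function and a wrapper around side effects:

```go
package main

import "fmt"

var runningTotal float64 // hidden global state

// A wrapper around side effects: the name promises a calculation,
// but it quietly mutates a global and writes to the screen.
func AddWithTax(price float64) {
	runningTotal += price * 1.08
	fmt.Println("running total:", runningTotal)
}

// A real function: same inputs, same output, nothing else touched.
func WithTax(price, rate float64) float64 {
	return price * (1 + rate)
}

func main() {
	AddWithTax(100)                 // hard to test, hard to reason about
	fmt.Println(WithTax(100, 0.08)) // easy to test and reuse
}
```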

Software architecture is about how things fit together at a higher level. Are your databases coherent? Do your queues only handle one thing? Do your services/APIs/gateways do what callers think they will and nothing more or are they just collections of capabilities so you have fewer things to deploy? Can someone look at the architecture and understand which part is responsible for which functionality or are they just collections of stuff and persistent data? Can you replace one part without impacting others? Do they have flexibility, scalability, reliability, and replaceability?

Notice anything similar between those two paragraphs? They start and end the same. They use a lot of the same descriptions and requirements. It's just the scope of the things I'm talking about that's different. The same principles that go into a good function are the same things that make a good class/module are the same things that go into a good system architecture. They need to be SOLID. You need to think about the quality attributes. It's just the scope and scale that changes.

by Leon Rosenshein

Engineer vs Technician

I've been doing campus interviews for tech companies for a long time now. Long enough that it's been 10+ years since I told a candidate that I got my BS in 1988 and the response was "I was born in 1988." In that time I've done hundreds of interviews at dozens of schools. And one of the things I quickly learned is that while there are lots of schools and cultures and styles, not everyone who learns to code and looks for a job as a programmer is a software engineer. Out of those hundreds of interviews the vast majority of candidates could code, at least to the level expected for a 45 minute on-campus interview. But most of them weren't what I would call software engineers. And some of the people whose coding wasn't up to that level were. So what's the difference between a programmer and a software engineer?

I think it comes down to attitude and mindset. Programmers understand the basics of a language. They understand the syntax and know the best practices, and they regularly apply them. Give them a clear set of requirements and you'll get back a function/library/executable that meets all the requirements. And that's it. And they're happy with the results and doing things that way. They don't know why the best practices are best practices and don't care. They don't know what happens if something in the environment goes wrong. That's the other person's problem. They don't know the customer/end-user and they don't want to. And there's nothing wrong with that. In many cases that's what's needed. As IDEs and frameworks and infrastructure become more of a commodity it becomes easier for a technician to put the pieces together the right way and get things done.

A software engineer, on the other hand, asks what if and why. Why does that work that way? Why do you need it to do that? What happens if that packet doesn't arrive? What if the other person isn't following the rules? What if the requirements are conflicting? And my favorite question, "What is it you're really trying to do here?". Software engineers don't just make it work, they figure out why it works, what might break it, and what to do in case it does. If you think those questions are important you're a software engineer. Even if you haven't spent the last 4 years learning the intricacies of C++, you're an engineer. And it's the engineers who build those commodity IDEs and frameworks and infrastructure that let the technician be so productive.

by Leon Rosenshein

Let It Crash

This is the philosophy of Erlang. Only do the right thing, as specified in the requirements. If you can't do that, fail. Don't try to make it better. Don't try to hide it. Don't pretend it didn't happen. Just fail. And let something else deal with the problem. Something with more scope/visibility/understanding of what's going on. And that's not a bad idea. So maybe we should just let it crash.

I don't mean the robot. I mean the code. And I don't mean crash in some uncontrolled manner and leave it there. What I mean is that if you don't have the information/understanding/scope to appropriately handle a failure, don't. That's what the VIM is for.

And this doesn't mean you get to ignore errors. Oh no. Quite the opposite. You need to handle them, but you need to handle them in the right place and the right way. It might be killing and restarting the process. It might be failing over to a different master. It might be a voting algorithm to decide who's in charge now. If you have a triple-redundant flight control computer and one of them gets confused, rather than try and fix the problem, restart it and pick up from where you are. Embedded systems often have supervisor or watchdog processes that know when something bad happens and trigger restarts rather than continue the bad behavior. In high availability distributed software you often find multiple instances ready and waiting to take over if something happens to the current master.
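
Erlang gives you supervisors for this out of the box; in most other languages you end up sketching the same shape yourself. Here's a minimal, illustrative version of a supervisor that restarts a worker instead of trying to patch things up at the point of failure:

```go
package main

import (
	"fmt"
	"time"
)

// worker does its one job. When it hits a state it doesn't have the
// context to fix, it just crashes instead of guessing.
func worker(id int) {
	fmt.Println("worker", id, "starting")
	time.Sleep(100 * time.Millisecond)
	panic("unexpected state: let it crash")
}

// supervise has the scope to decide what recovery means.
// Here the policy is simply: note the failure and start a fresh worker.
func supervise(restarts int) {
	for i := 1; i <= restarts; i++ {
		func() {
			defer func() {
				if r := recover(); r != nil {
					fmt.Println("worker", i, "crashed:", r, "- restarting")
				}
			}()
			worker(i)
		}()
	}
}

func main() {
	supervise(3)
	fmt.Println("supervisor done")
}
```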

Everything is a trade-off. Can we build everything with standby systems ready to take over at a moment's notice? Do we need to? Should we do it anyway? Is it enough to make sure that we "fail safe" and that we can recover later? While there are wrong answers, there's not always a right answer. The important thing is to think about it and understand what the failure cases are and how we're going to deal with them.