Recent Posts

by Leon Rosenshein

When You Have A Hammer

You haven’t mastered a tool until you understand when it should not be used.

Kelsey Hightower

I’ve talked about patterns and best practices before. They’re great. They give you a starting point. But they’re no substitute for thinking about your situation.

Consider the hammer, screwdriver, and pliers. In theory, there’s very little you can’t do with that set of tools. You use the hammer to put screws in. You use the pliers to take screws out. And you use the screwdriver to open paint cans, pry valve covers off, and, with the hammer, chisel out notches in wood. That’s mastery of your tools, right?

Not exactly. Sure, it works, but screwdrivers are much better at putting in and taking out screws. Using screwdrivers as prybars or chisels can work, but you end up with a damaged screwdriver.

And the less said about using pliers for every nut, bolt, and pipe you need to turn, the better.

The same can be said of software. Consider DRY. Don’t repeat yourself. Short functions. Generally good ideas. But even that can be taken to extremes. Consider this silliness

package main

import (
    "fmt"
)

func increment(x int) int {
    return x + 1
}

func is_less_than(x int, threshold int) bool {
    return x < threshold
}

func print_i_and_j(i int, j int) {
    fmt.Printf("I: %d, J: %d\n", i, j)
}

func main() {
    for i := 0; is_less_than(i, 10); i = increment(i) {
        for j := 0; is_less_than(j, 15); j = increment(j) {
            print_i_and_j(i, j)
        }
    }
}

It works, but it’s actually less clear. Sure, we can guess what those functions do, but we can’t be sure. Especially if they’re in another package/library.

Microservices are a different example. Sure, at planet scale lots of things need to scale at different rates/shapes, so when you get to that scale you need them. But using a microservice architecture for a simple static website doesn’t make a lot of sense.

So when you reach into your toolbox, make sure you grab the right tool, not the one on top :)

by Leon Rosenshein

Wat

Heading into the weekend, a little programming humor for you. All languages have their oddities, but some are a little odder than others. For instance, JavaScript can lead to watman.

And if that’s not enough oddness, here are some more weird languages. Take a look, one of them might be just the thing for your next project, but probably not. 

by Leon Rosenshein

OOP at Scale

Quick, what’s the defining item of object oriented programming? Classes? Hierarchy? Inheritance? Enterprise Java Factories? 

According to Alan Kay, who came up with the term over 50 years ago, the 3 defining parts of OOP are:

  • Message passing
  • Encapsulation
  • Dynamic binding

Nothing about objects, classes, or inheritance. More about isolation and clear boundaries. You can (and should) do that with your code as well. Use function calls (message passing) to ask for work to be done instead of doing it yourself. Don’t set up some shared state and pass control off; explicitly send the required info and explicitly collect the answer.
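
In Go, that idea might look something like this minimal sketch (the function and values here are made up for illustration):

package main

import "fmt"

// nextBalance gets everything it needs in the call itself and hands
// the answer back explicitly. No shared state, no side effects.
func nextBalance(balance int, deposit int) int {
    return balance + deposit
}

func main() {
    balance := 100
    // The call is the message: it carries the required info, and the
    // caller explicitly collects the answer.
    balance = nextBalance(balance, 25)
    fmt.Println(balance)
}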

Keep data and the functionality (methods) that goes with it together. The most common way to do that is with a class, but duck typing can suffice. It doesn’t need to be a class. It could just be a library. Everything about an employee could start with Employee_ and just be a loose method in there. Use data structures that collect all the relevant information together, not just a big heap of data.
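
As a hypothetical sketch, keeping an employee’s data and the behavior that belongs to it together might look like this:

package main

import "fmt"

// Employee collects all the relevant information in one place...
type Employee struct {
    Name   string
    Salary int
}

// ...and the functionality that belongs to that data lives with it.
func (e *Employee) GiveRaise(amount int) {
    e.Salary += amount
}

func main() {
    e := Employee{Name: "Pat", Salary: 50000}
    e.GiveRaise(5000)
    fmt.Println(e.Name, e.Salary)
}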

Wait until as late as possible to decide exactly which method to call. Sure, you know what you want to do and how to call it (which message to send), but don’t decide exactly which implementation handles it until later. Dependency injection is a great example. If you need to durably store a blob of data, just call the Store() method. Which store (file system, S3, database, testing, etc.) you get is determined at runtime, because it’s encapsulated.
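
Here’s a hedged sketch of that in Go. BlobStore, fileStore, memStore, and newStore are all made-up names, not a real library; the point is that the caller only ever sends the Store() message:

package main

import "fmt"

// BlobStore is the message the caller knows how to send.
type BlobStore interface {
    Store(data []byte) error
}

type fileStore struct{}

func (f *fileStore) Store(data []byte) error {
    fmt.Println("pretend this went to disk:", len(data), "bytes")
    return nil
}

type memStore struct {
    blobs [][]byte
}

func (m *memStore) Store(data []byte) error {
    m.blobs = append(m.blobs, data)
    return nil
}

// newStore decides, at runtime, which concrete implementation will
// handle the message. The caller never sees the choice.
func newStore(env string) BlobStore {
    if env == "test" {
        return &memStore{}
    }
    return &fileStore{}
}

func main() {
    store := newStore("test") // could just as easily come from a config file
    if err := store.Store([]byte("hello")); err != nil {
        fmt.Println("store failed:", err)
    }
}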

That’s Object Oriented Programming.

Now consider microservices. While there are lots of transports, by definition, all communication between services is via a message. It has to be because the thing you're talking to is potentially running on a computer miles away. You’ve got to send a message.

And since that service is somewhere else you can’t just reach inside it for some info or to change something. You have to send a message. It doesn’t get much more encapsulated than that.

As for dynamic binding, there’s DNS, which can change between one message and another. And most microservice systems have another level of load balancing behind that DNS entry, so the actual service you’re talking to isn’t decided until after you make the request, milliseconds before the message is handled. It doesn’t get much more dynamic than that.

And that’s OOP at Scale

by Leon Rosenshein

Empirical Software Engineering

What is software engineering? What is important to software engineering? What do we know about software engineering? How do we know what we know about software engineering?

Perhaps more importantly, what do we know we don’t know about software engineering? According to Hillel Wayne

Empirical Software Engineering is the study of what actually works in programming. Instead of trusting our instincts we collect data, run studies, and peer-review our results.

Seems reasonable. But what does it mean? Are shorter functions better or worse? Is there a language that is better to use than some other language? Is a microservice architecture better than an event driven one? For that matter, what does better mean?

Like many things, the answer is, it depends. It depends on context. It depends on the problem space. It depends on the constraints you have to deal with. There’s a great scene in Apollo 13 where they need to “make this, fit into the hole for this, using nothing but that”. Those are constraints. Doesn’t matter what the perfect answer might be. That’s the best answer, right now.

In his talk, Hillel spends a lot of time on the definition of better. Not just what better means, but how you know if you’re right, and whether the cause you’re positing is really the cause. And qualitative vs. quantitative research. And the validity of the findings themselves.

I’ll let you watch the talk yourself for the details, but here are a few of the biggest takeaways.

The best predictor of the number of bugs in a code base is the number of lines of code. Pretty obvious. But also not very helpful, because it’s not very actionable.

Org structure has a big impact. The higher the number of different people/teams/groups adding or changing code in a module, the higher the number of bugs. But it’s a balance. Silos and gatekeepers may have a small positive impact on the number of bugs, but they have a large negative impact on velocity.

Finally, the things that are most impactful to development are Stress and Sleep. Missing a little bit of sleep on one day makes you less productive. Missing a few hours a night adds up until it’s like missing a whole night’s sleep. And even worse, along with that reduction in productivity and effectiveness comes an inability to notice the reduction. Which means not only do you make mistakes, you can’t tell you’re making mistakes. And you can’t do anything about them.

So whatever else you do, make sure you get enough sleep.

And on a mostly unrelated note, take a look at the slide deck that goes with the talk. That’s a really interesting way to do a talk. Very minimalist. So you focus on the presenter. Which is often a good thing. But if you consider a presentation as two channels of information (spoken and written), it basically leaves the second channel unused. But that’s a topic for another day.

by Leon Rosenshein

Rice and Garlic

Here’s another one from GeePaw Hill, all about rice and garlic. Actually, it’s not about rice and garlic. It’s about generic advice.

As GeePaw tells it, he knows a chef who has a standard answer to the context-less question, “How can I make my cooking better?”. The blind answer is “That’s too much rice, and not enough garlic.” And you know, that’s pretty good advice when you don’t know what’s being cooked. Of course, there are edge cases. You can’t put too much rice in a bowl of rice as a side dish, and contrary to what they say in Gilroy, some things don’t need more garlic.

So what’s the software development equivalent? GeePaw says you should take more, smaller, steps. Again, without much context, that’s pretty good advice. Rather than try to get from 0 to 100 in a single step, try 0 to 1. Then 1 to 2. Continue that approach and you’ll still get to 100. Assuming that along the way you haven’t realized that what you really want is 75, and stopped there. Or, more likely, you’ve found that the goal isn’t 100 anymore, it’s actually a close neighbor of 100, but off in a different dimension. Which you couldn’t have known without working towards the goal incrementally.

And not just that. That kind of advice scales as well. It doesn’t matter if you’re going to the moon, building a website, or writing a command line calculator. Make a step in the right direction. Test fire an engine. Get your website to deploy and log that it’s listening on a port. Get the CLI calculator to return an error about invalid input. And you can break those down further into actionable tasks. Figure out what each step is supposed to do. Then come up with a way to tell if it’s doing it, and every time you take a step, use that mechanism to see whether you’re getting closer, holding your ground, or falling backward.
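
As a hypothetical sketch, that first calculator step might be nothing more than a stub and the test that tells you it’s doing its one job:

package calc

import (
    "errors"
    "testing"
)

var errInvalidInput = errors.New("invalid input")

// Eval is deliberately tiny. The only behavior so far is the first
// small step: invalid input returns an error.
func Eval(expr string) (int, error) {
    if expr == "" {
        return 0, errInvalidInput
    }
    return 0, errors.New("not implemented yet") // the next small step
}

// The test is the mechanism that tells you whether you're getting
// closer, holding your ground, or falling backward.
func TestEvalRejectsEmptyInput(t *testing.T) {
    if _, err := Eval(""); !errors.Is(err, errInvalidInput) {
        t.Fatalf("Eval(\"\") error = %v, want errInvalidInput", err)
    }
}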

There’s a name for that kind of development. Test Driven Development (TDD). It has all kinds of benefits. Add in some customer value/feedback and now it’s not just TDD, it’s Agile TDD. Which is great.

Because you get to work with your customer to not only show your progress, but to give them value along the way. And you get a better understanding of their problem(s) and can focus on that, instead of trying to get to 100 in one fell swoop because 3 months ago you thought that was the exact target to hit.

by Leon Rosenshein

Discipline

I cannot emphasize too much that architecture is as much about programmer discipline as any technical consideration. Without programmer discipline, all systems, no matter how well designed, degrade quickly into gray goo at the hands of people who don’t understand the "why." -- Allen Holub

Back in the stone age (actually the late 80s/early 90s) I was working for a 3rd tier aerospace company in southern California. One of the things we built was a tool we called ARENA. Basically a squadron level air combat simulation system. We handled flight modeling of aircraft and missiles, Air Combat AI, and a really fancy (for the time) display and replay system. And, I think, a pretty good domain driven architecture.

It helped that the domain was pretty simple (vehicles moving in 3D space) and that the entities didn’t really communicate with each other. At least not actively. Each object did its own physical modeling and put its location in a distributed shared memory system. After that each object was on its own to detect and respond to the other things in the world. And for each object we broke things down like the real world. Propulsion, aerodynamics, sensors, and either an AI or a set of input devices. Interfaces between systems were defined up front and we all knew what they were and respected them. We had external customers and internal users doing research into different flight modes and doing requirements tradeoffs.

Then we got a new customer. Someone wanted to use our system as a test bench for their mission computer (MC). Seemed reasonable at first. We already had a simulated world for the MC to live in, and we had well defined models of an individual aircraft, so how hard could it be to add a little more hardware to the loop? Turns out that it’s approximately impossible. At least with the architecture we had. Because our idea of the interfaces inside an aircraft was purely logical, while the MC expected distinct physical components talking to it over a single bus. So we wrote some adapters. That worked for some things, like the engine, because there was one input (throttle) and 3 outputs (fuel burn, thrust, and ingest drag). But it didn’t work for some of the more complex systems, like the radar. It had lots of inputs, including vehicle state, pilot commands, world state, and mission computer commands. And to get all of them we needed our adapter to reach into multiple systems. The timeline was tight so, instead of refactoring, we did the expedient thing and reached around our interfaces and directly coupled things. And it almost worked.

Everything was very close to correct, and often was, but the timing just didn’t work out. Things would drift. Or miss a simulation frame and stop, or worse, go backwards. So we added some double and triple buffers. That got rid of the backwards motion, and the pauses were usually better, but sometimes worse. So we added some extrapolation to keep things moving. Then we added another adjustment. And another. What was supposed to be a 4-week installation turned into a 3-month death march and resulted in a system that worked well enough to pass acceptance tests, but really wasn’t very good at what it was supposed to do. It went from a clean distributed system to a distributed ball of mud.

And that happened with the same team that built the initial simulation doing the mods. We knew why we had built ARENA the way we did. The reasons we put the boundaries and interfaces where we did. And why we shouldn’t have violated those boundaries. But we did. Not all of them, but enough of them. And we paid the price. And the system suffered because of it. Because we didn’t have the discipline to do the right thing.

Now imagine what would have happened if we didn’t know why things were the way they were. And there wasn’t any documentation of why. Which of course there wasn’t, because hey, we all knew what was going on, so why write it down? Any semblance of structure would have fallen apart that much faster. And we probably would have not just had problems with the interface, but likely broken other things as well. It’s Chesterton’s Fence all over again. That’s where Architectural Decision Records (ADRs) come in. Writing down why you made the decisions you did. The code documents what the decision was, and the ADR documents why.
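
There’s no one required format for an ADR. A minimal sketch (a hypothetical template, not the one true format) just needs a few sections:

  • Title: a short noun phrase naming the decision
  • Status: proposed, accepted, or superseded
  • Context: the constraints and forces that shaped the decision
  • Decision: what you chose to do
  • Consequences: what gets easier, what gets harder, and what you gave up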

So, two takeaways. First, next time you get into a situation where you have a choice between the “right” thing and the “expedient” thing, remember the long-term cost of that decision. Remember that once you start down that slippery slope of expediency, it just gets easier and easier to make those decisions. Because once the boundaries and abstractions are broken, why bother to keep them almost correct? I’ll tell you why. Because that way lies madness, otherwise known as the big ball of mud.

Second, write down the why. The next person to work on that area won’t know all the background and the experiments that led to the current architecture/design. Having the why documented lets them avoid making those same mistakes all over again. Even (especially?) if that person is future you.

by Leon Rosenshein

NFRs and You

NFRs are non-functional requirements. I’ve talked about them before, but lately the name has been bothering me. It’s that non in the name. Unlike non-goals, which are the things you’re not going to do or worry about, your NFRs are not things that your project won’t do, and they’re not requirements you can ignore. NFRs are things you do need to worry about.

I mean, look at the phrase non-functional requirement. You mean you don’t want it to work? Or are you saying it doesn’t do anything, but it’s required anyway? Why would you put any effort into something that has no function?

Requirements without function or benefit should be pushed back against, but what we typically call NFRs have both. NFRs are usually drawn from the list of -ilities. Things like burstability, extendability, reliability, and durability. For any given project you will likely need one or more of them, so ignoring them is a bad idea.

That said, the -ilities aren’t your typical functional requirements either. “The method shall return a (nil, error code) tuple when receiving invalid input.” That’s a pretty straightforward requirement that defines how the code should function. It’s also pretty easy to unit test. Write up a set of test cases that gives you whatever level of coverage you need over the combinatorial explosion of possible inputs, and call the method. As long as you get the expected answer, you’ve met the functional requirement.
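
For instance, here’s a sketch of such a unit test. Parse and Widget are hypothetical stand-ins for your method and its return type:

package widget

import (
    "fmt"
    "strings"
    "testing"
)

type Widget struct {
    Name string
}

// Parse is a hypothetical stand-in for the method under test.
func Parse(s string) (*Widget, error) {
    if !strings.HasPrefix(s, "widget:") {
        return nil, fmt.Errorf("invalid input %q", s)
    }
    return &Widget{Name: strings.TrimPrefix(s, "widget:")}, nil
}

// The functional requirement maps directly onto a table-driven test:
// every invalid input must produce (nil, non-nil error).
func TestParseRejectsInvalidInput(t *testing.T) {
    cases := []string{"", "not-a-widget", "{unbalanced"}
    for _, in := range cases {
        got, err := Parse(in)
        if got != nil || err == nil {
            t.Errorf("Parse(%q) = (%v, %v), want (nil, error)", in, got, err)
        }
    }
}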

No, I think what we call NFRs are really operational requirements. Requirements around the way things behave when they’re operational. Things that are hard (impossible?) to test with unit testing. Consider burstability. Sure, you could write a test that, in parallel, calls your method hundreds of times per second. But does that give you any confidence that the system overall can handle it? To really get some confidence on burstability you need to actually try it. Maybe not in production, but in a production-like environment, with all of the vagaries that implies.
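
Here’s a sketch of that unit-level burst test, reusing the hypothetical Parse from the sketch above. Note how little it actually proves: the method survives 500 concurrent calls on one machine, and that’s all:

package widget

import (
    "sync"
    "testing"
)

func TestParseUnderBurst(t *testing.T) {
    var wg sync.WaitGroup
    for i := 0; i < 500; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if _, err := Parse("widget:burst"); err != nil {
                t.Error(err) // t.Error is safe to call from multiple goroutines
            }
        }()
    }
    wg.Wait()
}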

And like any other requirement, operational requirements need to be part of the plan. You can’t just bolt extensibility on at the end. Or at least you can’t do it in a sustainable way. Which means you need to think about operational requirements from the beginning. They’re requirements, just like any other requirement. They’re just usually a little harder to figure out. Especially if you’re just looking at the business logic, because they’re not in the business logic, they’re in the business case.

The business logic will tell you what to do when a customer places an order, and what should happen if they do something wrong. It won’t tell you what to do on the Friday after Thanksgiving. It’s the business case that will tell you that while your typical order rate is X, on that day it will be 10x, and for the subsequent 6 weeks it will average 5x. And you better be ready for it, or else.

So next time you hear someone asking you about your NFRs, think about what they’re really asking you. Think about the scope of the question, and plan for it.

by Leon Rosenshein

Toasters

Toasters are pretty simple. Put in a slice of bread. Push a knob/lever and a few minutes later your toast pops up, with some approximation of the level of doneness you want. If you’ve got a commercial toaster with a conveyor belt you don’t even need to push the knob.

But it’s only an approximation of the level of doneness you want. And it seems to drift over time. And with different breads and slice thicknesses. And bagels, which should only be toasted on one side, are a whole different kettle of fish. Which leads to this story, which is almost 30 years old.

But even the simple toaster is hard to build, if you have to go back to first principles. I don’t mean being a maker in the modern sense, with a shop complete with 3D printers, CNC mills, and laser cutters. I mean first principles. Consider this attempt to make a toaster from scratch. It didn’t exactly work, but it didn’t exactly fail either.

But what has that got to do with us? Just this. On any given day we make lots of choices. Choices about what to build, what to reuse, and what to extend. Choices about building what we need right now vs what we know we’ll need next week/month vs what we think we might want in the future. So we need a framework to make those choices in.

A good place to start that framework is integrated value over time. Which usually means building on existing things instead of building from scratch. Improving things instead of replacing them. But not always. Sometimes a new requirement comes along and the cost of adapting current systems exceeds the cost of making something new. So do it. Carefully. Within the existing framework. And then iterate on that thing as well.

And that’s where it gets really interesting. You find out how well your mental model of the system, as translated into architecture and then code, matches the problem domain. If they match well, it’s easy to add something new. If they don’t, the new thing will expose the gaps between your code and the problem space. Which brings you right back to deciding whether to adapt what you’ve got or build from scratch.

Because in the end, just like building a toaster, the right answer depends on figuring out the right place to start and the right place to go to.

by Leon Rosenshein

Pointers and Errors

Go is pass by value. But it has pointers. And multiple return values. So here’s a question. Let’s say you’ve got a Builder of some kind, but there are combinations of parameters that are invalid, so your builder returns a struct and an error. Simple enough. The caller just checks the returned error, and if it’s non-nil does whatever is needed. If it is nil, just continue on and use the struct.

But what if the user forgets to check the returned error? Then what? They’ve got a struct that might or might not be usable, but it’s certainly not what they asked for. What’s going to happen if they try to use it? Even worse, what happens if they store it for a while, pass it down some long call tree, and then it gets used, and fails, very far from where it was created? That can be a nightmare to debug.

One way to prevent that is by returning a pointer to a struct, instead of a struct itself. Then in the error case you can return a nil pointer and the error. There’s no chance of that nil pointer working. As soon as the code tries to use it you’ll get a panic. Panics aren’t good, but they’re often better than something random that almost works. So let’s always do that. Right?

Not so fast, because now you’re using a pointer to things, and suddenly you’re exposed to spooky action at a distance. If one of those functions deep in the call tree changes something in the struct via the pointer, it’s going to impact everyone who uses the struct from then on. That might or might not be what you want.
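
A minimal sketch of both edges of that trade-off, using a made-up Config type rather than the playground code:

package main

import (
    "errors"
    "fmt"
)

type Config struct {
    Retries int
}

// Value return: the caller always gets *something* back, even on error.
func NewConfig(retries int) (Config, error) {
    if retries < 0 {
        return Config{}, errors.New("retries must be non-negative")
    }
    return Config{Retries: retries}, nil
}

// Pointer return: on error there's nothing usable. Ignoring the error
// and using the nil pointer panics immediately instead of limping along.
func NewConfigPtr(retries int) (*Config, error) {
    if retries < 0 {
        return nil, errors.New("retries must be non-negative")
    }
    return &Config{Retries: retries}, nil
}

func main() {
    c, _ := NewConfig(-1) // error ignored: the zero value silently "works"
    fmt.Println(c.Retries)

    p, _ := NewConfigPtr(-1) // error ignored: p is nil
    fmt.Println(p == nil)    // p.Retries here would panic, right at the call site

    // The aliasing risk: everyone holding the pointer sees the change.
    q, _ := NewConfigPtr(3)
    r := q
    r.Retries = 0          // spooky action at a distance...
    fmt.Println(q.Retries) // ...q sees 0 too
}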

Consider this playground. When you create a Stopwatch you get one back, regardless of whether your offset string is valid (s1) or not (s2). So even if you get an error, .Elapsed() will return a value. You don’t get a panic, but you do get an answer you weren’t expecting.

On the other hand, if you create a *Stopwatch with a valid offset you get one back (s3), but if your offset is invalid (s4, s5) you don’t get one back. As long as you check your error value (s3, s4) you can do the right thing with a *Stopwatch, but if you don’t (s5), boom - panic.

Finally, consider what happens in the timeIt methods. With both the Stopwatch (s1) and the *Stopwatch (s3) you get the right sub-time, but when you get back to the main function and check the overall time, the Stopwatch (s1) gives you the right elapsed time while the *Stopwatch (s3) has been reset and shows only 1 second of elapsed time.

So what’s the right answer? Like everything else, it depends. Don’t be dogmatic. I lean towards pointer returns because it’s harder to use an invalid one, and not pointers for method parameters unless I want to mutate the object. But that can change based on use case and efficiency. What doesn’t change is the need to think about the actual use case and make an informed decision.

by Leon Rosenshein

Understanding

Quick. What happens when someone passes a poorly formatted string to your interface? The interface notices and responds with the correct error code. Great. What if they set the colorize_output and the no_output flags? Do you colorize nothing? Arguably correct, but also arguably wrong, and probably indicative of a confused user. You can, and should, be testing all those cases, and others, and have well defined answers. Unit testing can help you avoid lots of customer issues. But that's just based on your interpretation, not your customer's.
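
Here's a sketch of what a well defined answer can look like, using the flag names above (the validation logic is an assumption, one reasonable choice among several):

package main

import (
    "errors"
    "flag"
    "fmt"
    "os"
)

// validateFlags turns a confused combination into a well defined error
// instead of a guess about what the user meant.
func validateFlags(colorize bool, noOutput bool) error {
    if colorize && noOutput {
        return errors.New("colorize_output and no_output are mutually exclusive")
    }
    return nil
}

func main() {
    colorize := flag.Bool("colorize_output", false, "colorize the output")
    noOutput := flag.Bool("no_output", false, "suppress all output")
    flag.Parse()
    if err := validateFlags(*colorize, *noOutput); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(2)
    }
}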

Then there's the whole "You gave me what I asked for, not what I wanted" situation. Again, you're arguably correct, but you don't have a happy customer. Remember, your customers want a problem solved, not a bag of features. So unit testing isn't enough. You need to do more, which leads to:

"Testing leads to failure, and failure leads to understanding." - Burt Rutan

Understanding what? Not just the specific feature on the ticket you're implementing. Understanding your customer's problems. Their workflow. Their environment. And their expectations. The kind of understanding you get from testing the thing you're building in the environment it will live in. That doesn't mean that you don't do unit testing or integration testing. Instead, it's another kind of testing you need to do.

And it's not just throw-it-over-the-wall testing, although that has value too. At least at first, you need to work with your customer to understand how they're using it. You need to help them understand how you expect it to be used. And together you come to understand the differences between the two. Then you can close the understanding gap.

You need to interact with your customer. You need to interact frequently to keep cycle times low. You need to interact deeply because you need to understand not just the surface issues, but why they're issues. And that kind of understanding requires a lot of information to be transferred. So you want high bandwidth communications. Bug reports might tell you what, but they're not too good at the why. So in person if possible, or at least screen share.

Because you can't solve a problem you don't understand, and you can't understand your customer without really seeing how they work. And that applies to all customers, internal and external.