Recent Posts (page 35 / 71)

by Leon Rosenshein

Discoverability vs. Findability

First things first: discoverability and findability are not the same thing. Findable is relatively easy. You know something exists. You saw it once or someone told you about it. You know what you’re looking for, but you don’t know where it is. Like your car keys or that status dashboard. Discoverability, on the other hand, is about learning. You can only discover something once. The second time you see something you haven’t discovered it, you’ve found it. And that’s the key difference.

And that gets to the heart of the issue. Findability is important for the things you know you don’t know. You have a question and you want an answer. What’s the difference between merge sort and bubble sort? When is it better to use a micro-kernel based architecture instead of microservices? Search engines are great for that kind of thing. Put in your keywords and get the answer.

Discoverability, on the other hand, is about things you don’t know that you don’t know. And search engines are notoriously bad at that. Sure, you might get a better spelling or a search result might show you something related, but if you’re trying to find out how to use your microkernel architecture to process millions of queries per second you’re unlikely to find anything about how a distributed microservice might be a better choice. If you know a little bit more about the problem space you can use a search engine, but it’s harder. You need to change your thinking and search for architectural patterns, and then dig in from there. And that’s if you know the domain.

Or consider your IDE. Both VSCode and the various JetBrains IDEs do a good job of both making it easy to find the functionality you’re looking for with hierarchical menus, context menus, and a search mechanism, and making it easy to discover keyboard shortcuts and related commands/options through advertising them and grouping things. Vim, on the other hand, has an OK search engine, but if you don’t know what you’re looking for it’s almost impossible to discover.

So why should you care? You should care because it applies not just to IDEs and search engines, but also to libraries, APIs, and pub-sub systems. We talk to our customers/users a lot and understand what they want/need. We use that information to build the things that provide the most value to them. If there were more valuable things to build we’d build them instead. But unless our customers/users know about what we’ve done and use it, we’ve wasted our time. Sure, you could assume that since they asked for it 3 months ago they’ll notice when it suddenly appears, but really, they’re busy too and probably won’t. So make it not just findable, but discoverable. How you make something discoverable is a different topic for later.

by Leon Rosenshein

I've got you covered

Code coverage is important. If you haven’t exercised your code then you don’t know if it works. If you don’t know it works, how can you be done?

So code coverage is important. If all your code is covered by your unit tests you can feel pretty confident that your code does what you think it does. Test Driven Development (TDD) is a good way to approach it. Figure out what you want your code to do. Write the tests that validate it does those things. Then write just enough code to make sure all the tests pass. Do that well and you’ve got 100% code coverage.

And that means your code is perfect and you can turn off your alerting, right? Wrong. For one thing, 100% code coverage and passing tests means that all of your code is executed by the tests. It doesn’t say anything about the correctness of the code. Consider this bit of pseudo-code:

func area(length int, width int) int {
  return length * length  // bug: should be length * width
}

func test_Area() {
  length = 2
  width = 2
  result_area = area(length, width)
  assert_equal(result_area, length * width)  // passes anyway, since 2*2 == 2*2
}

100% coverage, all tests pass, and it’s wrong. So code coverage isn’t everything.

But you could argue that in this case the problem isn’t coverage, it’s that the test is wrong. You’d be correct, but writing tests to check the tests has no end.
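One cheap defense against that particular failure mode is to pin expected values as literals instead of recomputing them with a formula the code under test might share, and to pick inputs where the wrong answer can’t masquerade as the right one. Here’s a sketch of what the fixed code and a harder-to-fool test might look like in Go (the names mirror the pseudo-code above):

```go
package main

import "fmt"

// area now multiplies length by width, fixing the bug above.
func area(length int, width int) int {
	return length * width
}

func main() {
	// Pin the expected value as a literal, and use length != width so
	// a length*length bug would have produced 4, not 6, and failed.
	got := area(2, 3)
	if got != 6 {
		fmt.Println("FAIL: expected 6, got", got)
		return
	}
	fmt.Println("PASS")
}
```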

Then there’s the issue of side effects. Everything does what you expect in isolation, but does it do so in concert? Do you have enough tests to cover the side effects? Your method to add to the global list does so, but is there another bit of code that has cached a copy of the global list and is now going to do the wrong thing? In this case each of the tests is correct, but there’s an interaction you haven’t tested.
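The cached-copy problem is easy to trip over in any language with shared references. Here’s a minimal Go sketch of the interaction described above: the append is correct, the reader is correct, and together they’re wrong, because appending can reallocate the backing array out from under a cached slice header.

```go
package main

import "fmt"

var global = []string{"a", "b"}

// addItem appends to the global list. Correct in isolation, and a
// unit test of just this function would pass.
func addItem(s string) {
	global = append(global, s)
}

func main() {
	// Another bit of code takes a "cheap" copy of the slice header.
	cached := global

	addItem("c") // reallocates the backing array; cached now points at stale data

	// The cache never sees the new element: an interaction no
	// per-function unit test will catch.
	fmt.Println(len(cached), len(global)) // 2 3
}
```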

Then there’s the whole world of integration. Each piece of code does exactly what it should. No less, and no more. But when you put them together you get some kind of non-linear response or positive feedback loop.

All of which is to say, code coverage is important. You can’t be sure you’re right without it. But don’t confuse a necessary component with a sufficient one.

by Leon Rosenshein

YAGNI, but ...

I’ve said it before, and I’ll probably say it again. Remember YAGNI. You Ain’t Gonna Need It. But, YAGNI, until you do. And if you haven’t planned for it, then what?

We got about 5 inches of snow overnight here in sunny Colorado. It’s all shoveled now, and I used my legs, not my back, so I’m fine. Actually, this is Colorado Powder, so I used a 36” push-broom for most of it. But it reminded me of a teaching moment from my youth in the northeast, where the snow is wet and heavy, more like concrete. And it’s not so sunny, so it kind of sets like concrete too.

After one of those 12” dumps my father sent me out to shovel my grandmother’s sidewalk and plow her driveway. I argued a bit, but hey, I was going to get to drive the tractor (not me, but the same tractor), so I did it. I did it exactly. Nice straight cuts down to the edge of the sidewalk and put the snow right there on the edge. A couple of turns around the driveway with the tractor and if the berm was on the driveway a little, there was still plenty of room for the car, so no problem. I told my dad I was done and asked if I could borrow the tractor to plow the lake off for hockey.

He took a look outside and asked if I was sure I was done because there was going to be more snow later in the week, and it was only November, so … I said yes. Everything is clear. He said OK, you can borrow the tractor, but you have to keep doing your grandmother’s sidewalk and driveway.  I thought that was great and went and plowed the lake. Driving on ice with individual rear wheel braking is fun, btw.

And as it is wont to do in those parts, it got a little warmer (like mid 30s) then got cold and snowed again. So I plowed and shoveled, and if the driveway got a little narrower because the berms crept inward a little it still wasn’t a problem. The car still fit. Rinse and repeat a few times. And then one morning after plowing the car didn’t fit. And the berms were about 2 feet of near solid ice. So out I go w/ the tractor, pick, and shovel and beat the berms into submission. Got my grandmother’s car out and she got to work, so all was well.

At which point my father reminded me that just doing what you need at the moment has long term consequences, and saving a little time then can leave you in a really bad place later. Then he got out the big tractor and pushed the berms back to where they should have been.

That’s when I learned about the exceptions to YAGNI. You might not need it now, but you know you’re going to sometime in the future. Or at least the combination of the work needed now to prepare for the future, the chance of needing it later, and the amount of additional work needed to recover from the omission makes it worth buying the insurance of the pre-work.

And that applies to software just as much as it does to plowing the driveway. It’s certainly easier and faster to access S3 directly than it is to build and use a full POSIX compliant file system wrapper. So you probably shouldn’t do that since you don’t need the full wrapper now, but maybe a simple facade, with no semantic differences, that does nothing but funnel your accesses and maybe collect some metrics, is worth doing. The metrics are certainly nice, and it lets you see your access patterns better. And, if you realize later that you do need that full POSIX compliant system, or you need to switch to Azure blob store, or even just handle multi-region S3 access efficiently, you’ve got a place to do it.
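A facade like that can be as small as one interface plus a wrapper. This is a sketch, not a real S3 client API; all the type and method names here (`BlobStore`, `metricsStore`, `memStore`) are made up for illustration. The point is the seam: callers depend on the interface, and a different backend can slot in later without touching them.

```go
package main

import "fmt"

// BlobStore is the thin facade over object storage. No new semantics,
// just an interface callers can depend on instead of a concrete client.
type BlobStore interface {
	Get(key string) ([]byte, error)
	Put(key string, data []byte) error
}

// metricsStore wraps any BlobStore and counts accesses.
type metricsStore struct {
	inner BlobStore
	gets  int
	puts  int
}

func (m *metricsStore) Get(key string) ([]byte, error) {
	m.gets++
	return m.inner.Get(key)
}

func (m *metricsStore) Put(key string, data []byte) error {
	m.puts++
	return m.inner.Put(key, data)
}

// memStore is an in-memory stand-in backend for the sketch.
type memStore struct{ data map[string][]byte }

func (s *memStore) Get(key string) ([]byte, error) { return s.data[key], nil }
func (s *memStore) Put(key string, data []byte) error {
	s.data[key] = data
	return nil
}

func main() {
	store := &metricsStore{inner: &memStore{data: map[string][]byte{}}}
	store.Put("report", []byte("42"))
	b, _ := store.Get("report")
	fmt.Println(string(b), store.gets, store.puts) // 42 1 1
}
```

Swapping `memStore` for a real S3-backed implementation (or Azure, or a multi-region router) later only changes what gets passed to `inner`.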

So remember, YAGNI, but …

by Leon Rosenshein

Ch-ch-ch-ch-changes

The greatest payoff of domain-driven design is not that the *state* of a business domain maps cleanly onto software but that *changes* to a business domain map cleanly onto software.

   -- Chris Ford

Domain-Driven Design has lots of benefits. Ubiquitous language. Clear Boundaries. It works with the business, not against it. And done right it changes with the business.

Back in my early days I was working with a FORTRAN simulation of an F18. It was designed to see how changes to the flight computer would translate into changes in the handling characteristics. And it was pretty good at doing that. But that’s not what I was doing with it. I was using it as the flight model for a 1v1 simulation.

First, I turned it into a real-time, man-in-the-loop system, with a simple graphical display. That was relatively easy. Instead of reading a time series of control inputs hook the inputs up to some external hardware and the keyboard. It already ran faster than real-time at 50hz, so no problems there. Just connect the outputs to a simple out-the-window display and voila - a man-in-the-loop simulation. The only thing left to do was add another copy of the aircraft to the world.  And that’s where I ran into problems.

You see, the simulation was designed around “ownship”, the vehicle being simulated. And it did a great job modeling the forces acting on ownship. Thrust, drag, lift, gravity, asymmetric forces from control surfaces, inertia. All modeled. And the result was net forces and rates. In all 6 degrees of freedom. But it had no absolute location or orientation. It was always at the center of the universe, and the universe moved around it. And for speed of implementation I put the notion of absolute location and orientation in the display system.

That’s fine if you’re the only thing in the universe, or at least all the other things don’t move, but it’s kind of hard when you want to interact with something and you’re both at the center of your own private universe that moves around you. But still, it’s just math and transformations, so I made it work for 2 aircraft in one visual world. I had it working for 3 as well, when I ran into a problem.

My boss told me we need to do MvN sims as well. Since we don’t have enough pilots let’s make some of the aircraft AI. Skipping the whole “How do I build an air combat simulation (ACS) AI overnight?” issue, I also had a problem in that these simulations needed to interact outside of the graphical display. Unfortunately my design had the display system as the only place everything interacted.

My first thought was to duplicate that logic everywhere things had to interact. It worked after a fashion, but it was hard to keep track of and as more and more things needed to know where everything else was it got even harder. Because I had my domains wrong.

I needed to change. So, back to the drawing board to fix the domains. Start with a consistent world for everything to live in. Move the integration/transformation into each model so that they had an absolute position. Share those around the system and let each system do what it needed to. Just like the real world.
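The shape of that refactor can be sketched in a few lines. This is a heavily simplified 2D illustration, not the original FORTRAN design; the names (`Pose`, `Vehicle`, `World`) are mine. Each vehicle integrates its own rates into an absolute pose in one shared world frame, and every consumer (display, AI, recorder) reads the same shared state instead of each one redoing the transforms.

```go
package main

import "fmt"

// Pose is an absolute position/orientation in the shared world frame.
type Pose struct{ X, Y, Heading float64 }

// Vehicle owns its dynamics and integrates rates into an absolute pose,
// instead of sitting at the center of its own private universe.
type Vehicle struct {
	Name string
	Pose Pose
}

// Step integrates one tick of motion (rates simplified to deltas here).
func (v *Vehicle) Step(dx, dy float64) {
	v.Pose.X += dx
	v.Pose.Y += dy
}

// World is the one consistent frame everything lives in. The display,
// the ACS AI, and a record/playback system all read the same poses.
type World struct{ Vehicles []*Vehicle }

func main() {
	w := World{Vehicles: []*Vehicle{
		{Name: "ownship"},
		{Name: "bandit", Pose: Pose{X: 1000}},
	}}
	w.Vehicles[0].Step(10, 0)
	for _, v := range w.Vehicles {
		fmt.Println(v.Name, v.Pose.X, v.Pose.Y)
	}
}
```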

And unsurprisingly, after that I found that a lot of the things we had planned became easier. Things like adding new vehicles, weapon systems, upgraded ACS AI, or adding a record/playback system. They matched the domain, so I was able to add them within the system, without doing code gymnastics to get them to fit.

by Leon Rosenshein

MVP

Just what is an MVP anyway? Like everything else, it depends. In this case on context. The Super Bowl has a Most Valuable Player, but that’s not the MVP I mean. In this case I’m talking about the MVP as Minimum Viable Product.

But even there, MVP isn’t consistently used. I first ran into the term when I was working on the initial release of in-browser 3D mapping for Virtual Earth. We (the dev team) were extremely busy trying to figure out how to build a product that could stream a surface model of the entire globe down to a browser at sufficient level of detail to show the Microsoft Campus in the foreground and find our office windows, but not flood the pipe with data for Mount Rainier. While we were busy figuring that out, a different group of people was figuring out what feature set we needed to make a product that fit into the Microsoft online offerings. Two groups trying to figure out what the MVP was. We eventually shipped something, but I’m not sure we ever agreed on what the MVP was. Since then I’ve seen the MVP debate play out the same way multiple times.

There are two big reasons for that. First, MVP is made up of some highly subjective terms. Let’s start with the last one and work backwards. Product is probably the easiest. It’s the thing you’re selling/giving away. Of course you haven’t defined what it is or why someone wants it. That’s where viable comes in. A viable product is one that a customer is willing to pay for. And for a product to be viable the total of what the customer pays, both direct and indirect, must add up, over time, to more than the expenses. If not, you’ll eventually run out of money and not be selling anything. Which leads us to minimum. What is the absolute least you can do and still have a viable product? The assumption here is that the minimum takes the least amount of time to build, makes the fewest wrong choices, and gets you more information from the customer soonest, so you can iterate on it and make better choices going forward. That makes sense, and is a good definition of MVP.

Second, there is another definition of MVP out there. One pushed by the Lean Startup movement. It defines an MVP as the minimum thing you can build to test the viability of a product idea. Which is a very different thing. The definition of minimum is about the same, and viability still refers to the product, but now, instead of validating the success of a product, it’s applied to the idea of the product. Which means when you’re building an MVP, you’re not checking to see if you can make money off it, you’re checking to see if anyone else thinks you have a good idea.

And that’s why we never reached agreement on what the MVP for VirtualEarth was. The dev team wanted to build and deploy something to see if anyone cared, and if so, what they cared about. Oh for sure we had ideas and built some of them in the first release, but mostly we wanted to find out from our early adopters what they thought was important. So we could build more of that. Meanwhile, that other group was trying to figure out, without actual data, what people were willing to pay for. We’ll never know for sure why Virtual Earth (Bing Maps) never became the standard online map, despite having 3D models, weather, moving trees, Streetside, BirdsEye, and user generated content first, but building things that people didn’t want or know how to use instead of what they wanted in the order they wanted it probably played a part.

So when you build an MVP, remember why you’re building it.

Stages of an MVP

by Leon Rosenshein

Strict Product Liability

How many times have you heard “That’s not a bug, that’s a feature”? Usually it’s a developer saying that to a user or tester as an excuse for things not working as expected, but sometimes it’s “the street finds its own uses for things”. And how you respond depends on how you approach the situation.

Consider this video showing how a developer might react to a tester not using the product as intended. Who’s responsible here? Is anyone responsible? Is the tester wrong? Was the developer wrong? Was the specification wrong? Was no-one wrong and this is a learning experience for everyone? Should something be done? It depends on what the goal was.

In the US we have the idea of Strict Product Liability (SPL). IANAL, but my basic understanding of SPL is that the maker of a product is liable for all reasonable uses of a product. Which means if you’re using a screwdriver as a scraper or pry bar and something bad happens the manufacturer might be liable. There are lots of other factors, and you should probably consult your own lawyer before you act on that description, but there’s an important lesson in there for us developers too.

And it’s this. You can’t control what your customers are going to do with the capabilities you give them. You can (and should) make it simple to use things the way you intend them to be used. Streamline the happy path and funnel use-cases towards it. But unless you spend way too much time constraining your customer, they’re going to use the tools you give them to solve their problems.

So what do you do about it? Two big things. First, while you can’t force your customer to stay on the happy path, you can make it very hard for them to hurt themselves or others. If your schema has a string field in it for a title then limit it to a reasonable length. If not, you’ll find that someone is storing a 10MB JSON blob in every record and taking out your database. Or maybe they’re using it as temp storage for message passing for distributed processing and suddenly you’ve got 20K concurrent writes/reads.
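Enforcing that limit at the write path is a one-function job. A minimal Go sketch, where `maxTitleLen` and `setTitle` are illustrative names and 256 is an arbitrary limit picked for the example, not a universal rule:

```go
package main

import (
	"errors"
	"fmt"
)

const maxTitleLen = 256 // illustrative limit for the sketch

// setTitle rejects oversized values before they ever reach storage,
// so nobody can stash a multi-megabyte blob in a short-string field.
func setTitle(title string) (string, error) {
	if len(title) > maxTitleLen {
		return "", errors.New("title exceeds maximum length")
	}
	return title, nil
}

func main() {
	// A 1MB "title" gets turned away at the door.
	if _, err := setTitle(string(make([]byte, 1<<20))); err != nil {
		fmt.Println("rejected:", err)
	}
}
```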

Second, find out why they’re doing it. They probably have a good reason. And you probably have a solution to their problem, but they don’t know about it. So introduce them to it. And if you don’t, should you? If one customer needs it there’s a good chance there are others who would benefit from that capability as well. So maybe you should build it for them. They’ll be happier, your system will be happier, and the other users of the system will be happier.

by Leon Rosenshein

Kinds Of Tests

There are lots of classes of tests. It all depends on what your goals are. Using the correct test for the situation ensures you’re actually testing what you think you are.

Unit tests are good for testing things in isolation. They let you control the inputs, both direct and dependent, and make sure the result is what you expect. The tighter the control you have the more reproducible the test and the fewer false positives you get. And it’s important to test not just that good inputs produce correct answers. You also need to ensure that all the possible combinations of incorrect inputs produce the correct response, whatever that means in your case. Having good, hermetic, unit tests is crucial to minimizing regressions.
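One common way to cover both the good inputs and all the incorrect ones is a table-driven test: every case states its inputs and the exact response you expect, errors included. A small Go sketch (the `divide` function is a made-up example, and this is written as a plain program rather than a `testing` package test to keep it self-contained):

```go
package main

import "fmt"

// divide is the code under test; it reports an error for the one
// input it can't handle instead of crashing.
func divide(a, b int) (int, error) {
	if b == 0 {
		return 0, fmt.Errorf("division by zero")
	}
	return a / b, nil
}

func main() {
	// Each row controls the inputs and pins the expected response,
	// including the bad-input row at the bottom.
	cases := []struct {
		a, b    int
		want    int
		wantErr bool
	}{
		{10, 2, 5, false},
		{7, 7, 1, false},
		{1, 0, 0, true}, // bad input must produce an error, not a panic
	}
	for _, c := range cases {
		got, err := divide(c.a, c.b)
		if (err != nil) != c.wantErr || got != c.want {
			fmt.Println("FAIL:", c)
			return
		}
	}
	fmt.Println("PASS")
}
```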

Integration tests, on the other hand, have a whole different kind of isolation. You want integration tests to have as few controlled inputs as possible, but you want to be sure that the results of your integration tests are kept out of your production systems. Because what if there’s something wrong that you haven’t found yet? You run them carefully, and you know exactly what to expect as a result so you can look for small changes. Having good integration tests is crucial to finding problems with interactions and emergent behaviors. And if (when) you find some, it’s a good idea to figure out how to add them to the unit tests as controlled dependent inputs.

Scale tests are kind of like integration tests, but you’re looking for different things. Instead of having a small number of hand-crafted input sets and caring about the specific results, scale tests use lots of inputs and see what the impact of demand/concurrency is. The actual results aren’t checked. Instead the error rates and internal metrics, such as response time, queue sizes, memory used, are tracked and anomalies are flagged. Scale tests include not just the number of requests, but the number of requests per time and time at scale to see how a system responds to spikes and long periods of high demand. Good scale tests need lots of input, but give you confidence that your production systems will keep running if they get deployed.

Then there are tests you run in production. Some call those experiments or A/B tests, but they’re tests just the same. You’re just testing how something not under your direct control responds to changes. Things can get really dicey here. First, you need a good way to segment the population so only a subset get the new experience. You need to be able to define the group tightly and repeatably. If subjects go in and out of the group it’s probably not valid. You need to ensure that not too many subjects are in the test. You need to make sure that the test doesn’t have an unwanted impact on the control group. You need them though because good experiments let you test things safely in the real world with real users.
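The usual trick for tight, repeatable group membership is deterministic bucketing: hash a stable user ID and compare against the rollout percentage, so the same user always lands in the same bucket. A minimal Go sketch (the function name and the flat 100-bucket scheme are illustrative; real frameworks usually also salt the hash per experiment so groups don’t overlap across experiments):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inExperiment deterministically assigns a user to the test group.
// Repeated calls for the same user give the same answer, so subjects
// don't drift in and out of the experiment between sessions.
func inExperiment(userID string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32()%100 < percent
}

func main() {
	// Stable: the same user, the same bucket, every time.
	fmt.Println(inExperiment("user-42", 10) == inExperiment("user-42", 10)) // true
}
```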

And of course you need frameworks to handle all of these different kinds of tests. Sure, everyone could write their own, but that’s a huge duplication of effort. And worse than that, it increases cognitive load, because now you have to worry not only about the tests you’re doing, but how the tests are done as well. And the last thing I want people running tests to worry about is if the test harness is really running the tests that have been written and returning the correct results.

by Leon Rosenshein

Cognitive Load

I talk about cognitive load a lot (over 2 dozen times in the last year). Especially about reducing cognitive load. But what is it, and why is it important? And what does it have to do with software development, especially architecture and design?

To give an analogy, let’s imagine the brain is a tool. A general purpose tool that, with some instructions/adjustment, can do lots of different things. You can store those instructions away and get them back when you need to do what they’re for. This tool can only make contact with the environment around it in a few ways, but those ways can be combined for more flexibility. One tool that meets those requirements is a computer. You’ve probably heard of them.

So, the brain is like a computer. That’s a nice analogy to use to help understand cognitive load. Especially the CPU part. Consider the CPU. It’s got a few registers. It can act on those registers in a handful of ways, including getting data into and out of a register. And that’s it. Everything else it does is a combination of those things. Let’s say your CPU has 5 registers. You can do anything you want with the info in them, but if you need to work with more than 5 things you’ll need to keep stopping to put one of those pieces of info down somewhere safe, pick up the new one, and move on. The bigger the difference between the number of pieces of info and the number of registers the more time spent just moving things around, not doing anything with the info. And every time you need to move something in or out of a register there’s a chance to get interrupted, or worse, drop something and lose it.

In a related fashion, computers appear to do multiple things at once. But in reality, for a given CPU that’s not really true. It does one thing for a few milliseconds, switches to a new thing for a few more, and cycles through everything over and over again, giving the appearance of doing all of those things at once. We call the time spent between doing different things a context switch, and they can take orders of magnitude longer than actually doing the work because the computer needs to put all of the info in those registers somewhere safe, then bring back the info that was in them the last time it worked on the other thing. It also needs to remember exactly where in the list of steps it was, and pick up where it left off. Again, that context switch is a great opportunity to get something wrong.

Now, your brain isn’t a CPU, but let’s stick with the analogy. There are only a limited number of things you can keep in active memory at once. If the number of things you need to remember is higher than that you have to keep refreshing your memory. That’s cognitive load. The less time you spend refreshing your memory, the more time you can spend on what you’re trying to do.

Similarly, when you’re working on something, in the zone, as it were, you’ve got all the things you need in working memory and most of the instruction cycles in your brain are going towards getting that something done. When it’s time to context switch you need to save all that state, find and reload the old state. Until that’s done you aren’t making progress on anything. And for our brains that process is very slow and imprecise. Often you can’t get back to where you were, you can only get to a recent save point and then you need to go over the same things again to get back to where you were. That’s more cognitive load. Again, keeping it down helps you make progress against your goals.

So that’s what I mean by cognitive load and why it’s important. How it relates to development is a whole set of different topics for the future.

by Leon Rosenshein

Charting A Path

It's a sea chart, not a road map. Map out the destination (strategic goals) and the hazards, but the route depends on the wind. "Road map" is not a useful metaphor.

    -- Allen Holub

Sometimes you run across a phrase that really resonates. This is one of those cases. I’ve talked about roadmaps before, but it took me a few paragraphs and 6 questions to say what Allen said in 3 sentences.

Know where you want to go and what you need to avoid, but the actual path isn’t known until you can look back and see what it was. That’s pretty profound. Because metaphors are important. They provide context, and context is important. And that’s why a roadmap might not be the best metaphor. A roadmap is prescriptive about both path and time. Because it describes a journey over a well-known, static landscape. And development is often not a known, static landscape.

But it doesn’t mean don’t plan and don’t pick a direction. What it does mean is that you need to be both proactive and reactive at the same time. Either one alone won’t get you there. And you need to balance them.

You need to be proactive in that you need to keep the goal, the “landscape”, and hazards in mind. Where possible you want to take advantage of the situation you are in. Going with the wind, as it were. You also need to plan to avoid the hazards, the rocks and shoals along the way.

And you need to be reactive as you go. The situation is not static. The goal moves as you learn more about it. The wind might be stronger or weaker than expected. The cross-wind will be different than planned. Staying on heading X for Y hours won’t put you where you planned, so you need to react to where you are and re-plan.

So don’t skip the planning. If you don’t know where you want to go you’ll never get there, and there’s a good chance all you’ll do is go around in circles. But don’t slavishly follow the plan. Assuming nothing will change along the way will ensure you never get where you want to be just as certainly as not knowing where you’re going.

by Leon Rosenshein

The -ilities

In software engineering there are lots of different kinds of requirements. There are the functional ones. They are the obvious ones that describe what the software is supposed to do. There are less obvious ones, that talk about what the software should do when something goes wrong. Then there are business requirements, like time to market and operational costs. And finally there’s a whole set of requirements that have nothing to do with how the software should work, or when it should be ready. Instead, they talk about how the software should be designed.

These are the non-functional requirements (NFRs). The things that you need to think about when you design the system, not just the code. The NFRs are a set of nouns that describe the quality attributes of the system. You’ll often hear them called the -ilities since many of them end that way.

It’s usually easier to build a system that meets the functional requirements if you ignore the NFRs. And if you were only going to build one version, and only build it once, that might be the right thing to do. Because most of the -ilities are talking about things in the future. Operational things like reliability, scalability, and adaptability. If you don’t have to run it, grow it, or change it to meet future needs, why bother thinking about that or being able to handle it?

You shouldn’t. On the other hand, if you only have a rough idea of the current requirements, and a notion of which direction things are going to go in the future, it behooves you to not box yourself into a corner. But there are lots of -ilities, so how do you know which ones are important and which ones aren’t?

Well, it depends. It depends on what you know, what you don’t know, and unfortunately, on what you don’t know that you don’t know. So how do you decide? How do you architect the system so that you choose the right NFRs, and then use them to both add customer value and keep from painting yourself into a corner?

There’s no simple answer, but there are guidelines. Domain Driven Design helps you find the clear boundaries between things so that you can change one thing without needing to change everything. Test Driven Design helps you know that anything you do need to change still works the same as it did before. Working with subject matter experts on a Ubiquitous Language for your system helps ensure that you’re solving the right problems and that everyone is talking about the same thing.

And finally, build enough adaptability into your system to adjust to new learnings and requirements as they are discovered. That means not just adaptability in the system design, but in the overall process, so that you can make the changes you need without having to fight the system.