Recent Posts (page 58 / 69)

by Leon Rosenshein

Counting From Zero

Whole numbers, counting numbers, natural numbers. Which of those sets has the number 0 in it? You'd think mathematics would be precise, but naming things is hard. Depending on who you talk to, the natural numbers include 0, or they don't. It depends.

What about array indexes? They're zero based, right? Not exactly. Yes, the C family, Lisp, anything JVM based, ECMAScript, Go, and now Rust are 0 based. Other well known languages such as FORTRAN, COBOL, R, and Matlab are 1 based. But why?

The standard reason on the zero-based side is that it's easier for the computer/compiler. Just multiply the size of an element by the index, add it to the base, and you have the address of the element. That makes sense. And back when compilers were big and slow and there were no optimizers, saving time and memory was important. Look at who uses those 0 based languages: if you're using one of them, you're probably either writing frameworks/libraries for others to use or really concerned about performance.
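As a rough sketch (not from the original post; the base address and element size are made-up numbers), here's the arithmetic the compiler gets to do when indexes start at zero:

```go
package main

import "fmt"

// elementAddress shows why zero-based indexing is convenient for the machine:
// the address of element i is just base + i*size, with no correction term.
// With one-based indexing the same formula needs base + (i-1)*size.
func elementAddress(base, size uintptr, index int) uintptr {
	return base + uintptr(index)*size
}

func main() {
	const base = 0x1000 // hypothetical base address of the array
	const size = 8      // hypothetical element size in bytes
	for i := 0; i < 4; i++ {
		fmt.Printf("a[%d] lives at %#x\n", i, elementAddress(base, size, i))
	}
}
```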

Contrast that with who's using the 1 based languages. For many folks using that set the computer is a tool that does simple math quickly. They're concerned with lift and drag or modulus of elasticity or probability or transactional cash flow. And when they think of the first item in a list they're counting things, so the first thing (1st) goes by the number 1. They're optimizing for cognitive load in their own heads while they work on the problems they're trying to solve.

So who's right? As with everything else in engineering, the answer is, it depends. Figure out what you're optimizing for, make a choice, and stick with it. Or, if you're Edsger W. Dijkstra, you could go back to how to represent a sequence of numbers and make a decision based on that.
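For the curious, Dijkstra's note (EWD831, "Why numbering should start at zero") argues for representing a sequence of N elements as the half-open interval [0, N): the length is just the difference of the bounds, and adjacent ranges fit together without overlapping. A minimal sketch of that convention, mine rather than his:

```go
package main

import "fmt"

// A sequence of n elements is the half-open interval [lo, hi):
// hi-lo is the count, and adjacent ranges share no elements.
func sum(xs []int, lo, hi int) int {
	total := 0
	for i := lo; i < hi; i++ { // upper bound excluded
		total += xs[i]
	}
	return total
}

func main() {
	xs := []int{10, 20, 30, 40}
	left := sum(xs, 0, 2)        // elements 0 and 1
	right := sum(xs, 2, len(xs)) // elements 2 and 3
	whole := sum(xs, 0, len(xs))
	fmt.Println(left+right == whole) // true: the two halves tile the whole
}
```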

by Leon Rosenshein

APIs Are For Life

APIs do lots of things. One of the most important things is that they make the work you've been doing available to others. It doesn't matter if your API is HTTP, gRPC, C++, Python, a string of characters, a bunch of 1's and 0's stored in some kind of persistent store, or something else entirely. Without an API it's unusable, locked away where no-one can get at it and you've wasted your time.

If your API is your user interface, then approach it like one. The best user interfaces don't just make it easy to do what you want, they make it easy to understand how you should be using them, and they make it hard to do the wrong thing. If you're using a metaphor for part of your API, use it for the whole thing. Ensure your objects, whatever they are, are consistently used and named. Your API is your contract, so you have to live up to it, but that goes both ways. If the user makes a mistake, you don't have to guess what they meant, just make it obvious what the mistake was.

And that's the easy part. The hard part is future-proofing your API. Once you've released your API and your customers start using it, they're going to expect you to continue to uphold the contract, so how can you change it? You can extend it. You can make it do new things and handle different cases. You can version it. You can (and should) support at least the previous version of your API. The worst thing you can do is make it subtly different, so that it appears to continue to do the same thing, but really it doesn't. You might think it's a bug fix, but does your customer? If they built an entire workflow assuming that the way it works is the right thing and you "fix" it they're going to be upset. But that's a story for another day.
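As a hedged sketch of what versioning while supporting the previous contract can look like (the paths and payloads here are invented for illustration, not from the post):

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// v1 keeps returning exactly what existing callers already depend on.
func listTripsV1(w http.ResponseWriter, r *http.Request) {
	json.NewEncoder(w).Encode(map[string]interface{}{
		"trips": []string{"trip-1", "trip-2"},
	})
}

// v2 extends the contract; v1 callers never see the change.
func listTripsV2(w http.ResponseWriter, r *http.Request) {
	json.NewEncoder(w).Encode(map[string]interface{}{
		"trips":       []string{"trip-1", "trip-2"},
		"next_cursor": "", // additive field, not a silent behavior change
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/trips", listTripsV1) // old contract, still supported
	mux.HandleFunc("/v2/trips", listTripsV2) // new contract, opt-in
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```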


by Leon Rosenshein

Random Is Hard

Not only are people bad at understanding random events, seeing patterns where they don't exist and not expecting things that they should, we (and the computers we use) are bad at generating random things. The best a computer can usually do is a pseudo-random number generator: take a seed, calculate a number, and then calculate the next one from there. People have done a good job of making sure the overall distribution of those numbers is uniform, but the word calculate in that description tells you that given the same input you will get the same output. And that doesn't take into account the number of times people try to roll their own. Unless you really know what you're doing and have a good reason to, don't. You'll just get it wrong.

Of course, in many cases reproducibility is a good thing. We do thousands of simulations a day, and while we expect variability, we need controlled variability so we can have reproducibility. On the other hand, if you're writing a blackjack game and someone can figure out the way you shuffle the deck and the order of the cards because they know the seed, it doesn't matter how well distributed your random numbers are, they're not random.
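Here's a minimal sketch of that trade-off using Go's standard math/rand and crypto/rand packages (my example, not from the post): a fixed seed when you want reproducibility, an unpredictable seed when someone knowing it is the problem:

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	mrand "math/rand"
)

func main() {
	// Reproducible: same seed, same "random" sequence every run.
	// Exactly what you want when replaying a simulation.
	sim := mrand.New(mrand.NewSource(42))
	fmt.Println(sim.Intn(52), sim.Intn(52), sim.Intn(52))

	// Unpredictable: seed from a cryptographic source so knowing the
	// code doesn't tell you the order of the cards.
	var seed int64
	if err := binary.Read(rand.Reader, binary.LittleEndian, &seed); err != nil {
		panic(err)
	}
	game := mrand.New(mrand.NewSource(seed))
	fmt.Println(game.Intn(52), game.Intn(52), game.Intn(52))
}
```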

Speaking of card games, just shuffling a deck is harder than you think. If you do it wrong, then even if you have a perfectly flat distribution from your generator (or even a truly random generator), you'll end up with a non-random distribution of cards. Kind of annoying if you're playing solitaire online, but imagine if you were a casino operator and someone figured out how to predict what the cards were going to be.
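The classic correct answer is a Fisher-Yates shuffle, which picks each position uniformly from the cards that haven't been placed yet; naive approaches like swapping every element with a fully random index bias the result. A small sketch, with a made-up integer deck:

```go
package main

import (
	"fmt"
	"math/rand"
)

// fisherYates shuffles in place. At step i we pick uniformly from the
// i+1 positions that haven't been finalized yet, which gives every
// permutation the same probability.
func fisherYates(deck []int, rng *rand.Rand) {
	for i := len(deck) - 1; i > 0; i-- {
		j := rng.Intn(i + 1) // uniform over [0, i]
		deck[i], deck[j] = deck[j], deck[i]
	}
}

func main() {
	deck := make([]int, 52)
	for i := range deck {
		deck[i] = i
	}
	fisherYates(deck, rand.New(rand.NewSource(1)))
	fmt.Println(deck[:5]) // first five cards of one particular shuffle
}
```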

Random is hard and applications of random are even harder. Sometimes it really does matter how random your numbers are. So think about what you're doing and how you inject enough (but not too much) entropy into your system. Or, just build a wall of lava lamps and use that as the source.

by Leon Rosenshein

The Paper Of Record

From today's everything old is new again files, Blockchains and The Gray Lady.

While the most widely known use of blockchains today is crypto-currency, all a blockchain really is is an immutable, distributed, verifiable transaction log that lets you verify not only that something is what it says it is, but that it hasn't changed from the original.

Bitcoin has been around for almost 11 years, but what if I told you that almost 15 years earlier a company called Surety started one of (if not the) first public blockchains, and used notices in the NYT's classified section as the public, distributed ledger for their hashes? Send them a document and they'd send you back the timestamped, hashed document, and then they'd fold it into their public hash, which was (and still is) published in the NYT every week.

While I'm pretty sure we don't need to publish the git SHAs of our releases in the NYT, wouldn't it be great if every bit of software we released (internally and externally) intrinsically had its git SHA and creation date immutably embedded? You'd at least be able to get back to the code it was built from. Being able to build the identical (except for creation date) thing and then do testing on it would be great too, but hermetic, reproducible builds are a topic for a different day.
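One common way to get that in Go, sketched here as an illustration rather than a description of our actual builds, is to stamp package-level variables at link time with -ldflags:

```go
// version.go - a hypothetical main package
package main

import "fmt"

// These defaults get overwritten at link time, e.g.:
//   go build -ldflags "-X main.gitSHA=$(git rev-parse HEAD) \
//                      -X main.buildTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
var (
	gitSHA    = "unknown"
	buildTime = "unknown"
)

func main() {
	fmt.Printf("built from %s at %s\n", gitSHA, buildTime)
}
```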

by Leon Rosenshein

Programming In The Dark

We all know about serverless programming in general, and AWS Lambda in particular, but have you ever heard of Dark? Dark is a holistic programming language, editor, and backend all in one. It sounds pretty intriguing. Everything is boiled down to HTTP endpoints, datastores, background workers, and scheduled jobs. Everything else is handled by the backend. It autoscales. It's highly available. Deploys in 50ms. Built-in version control. Built-in feature flags. See live traffic in your IDE. Of course, to get those benefits you need to learn a new language and IDE, and you give up all knowledge and control of how and where things actually happen.

I'm not part of the closed Beta, so I have no idea how well it actually works, but there are a lot of good ideas there. It might not be the right answer for something that needs to support starting hundreds of new trips/second and hundreds of thousands of concurrent trips, but there are lots of use cases that don't require that kind of scale. Think about being able to focus on the problem you're trying to solve and not needing to worry about all those pesky things that distributed systems bring to the table like consistency issues, race conditions, latencies, and SPOFs.

The value isn't really in the things that you can do, but in the things you don't have to do. Imagine a world where you don't need to set up databases, servers, routers, repos, hosts, CI/CD pipelines, or test environments. Complexity and cognitive load slow us down, and easing that burden makes us more productive.

Dark isn't ready for PrimeTime yet, and may never scale to our needs, but reducing complexity and cycle time is a good thing. That's what those of us down here in the engine room are trying to do. Our value-add is making it easier for others to focus on their value-add.

by Leon Rosenshein

Lessons From Ancient Rome

A few years ago, when Uber did the first Harvard Business School classes, one of them was called Leadership Lessons From Ancient Rome. The pre-readings were a bit dense, and since they were direct translations from the original Latin they seemed a bit harsh to our way of thinking. When the class was run this year the readings weren't as hard to get through, but the intent was the same.

The basic idea was to look at leadership as a continuum along two axes: strict adherence to rules/standards, and deep devotion to a person/idea. In the class the first axis was discussed around the rule of law. If a law applies to an individual, then how much more does it apply to a leader? Things are very black and white. Either the letter of the law was followed, or it wasn't. The second axis was about following a person or idea, regardless of the cost to you or others. If you have too little of both, everyone does what seems best for themselves. Absolute adherence to the letter of the law is absolute severity: prisoner 24601 goes to jail for taking a loaf of bread to feed a starving family. Blindly following a person or idea leads to demagoguery. It is by balancing adherence to standards and devotion that justice emerges.

The question was, how does that relate to being a good leader, and how do you create an environment where people are free to do the right thing while doing their best work? How do you create a culture with sufficient guidance/direction without stifling people and creativity? What does a healthy organization look like?

While there are lots of details in the linked articles, I think it comes down to a few things. Having clear, well-defined goals and ensuring that everyone really understands what the goals and priorities are. Defining enough process and procedure so that information is shared and the left hand knows what the right is doing. Clarifying ownership and responsibilities, and then helping to resolve conflicts when there are disagreements over them. Finally, making sure that meeting the goals and priorities is what takes precedence, not who is doing it.

And if you're interested in the original readings about standards and severity let me know and I'll share them. Things were different back then.

by Leon Rosenshein

Failure Modes

How does your system fail? What are the impacts of the most common/likely modes of failure? What are the workarounds? Does it fail safe? Does it fail degraded?

Consider the lowly moving walkway. The Pittsburgh airport has a bunch of them. They just sit there, going around and around. They make your life easier. But sometimes they fail. They can fail in multiple different ways, but the most common is to simply stop moving. No-one likes that, but you know what happens when a moving walkway stops moving? It becomes a floor. It stops reducing your workload, but you can still walk on it. The same goes for an escalator. If your escalator stops moving it just becomes a set of stairs. It might be steeper than you want, and the risers at the top and bottom are likely different from the ones in the middle (that's a building code violation in most places), but you can still go up and down. The Denver airport is like that. Yes, every escalator has nearby stairs in case there's a problem, especially when they're being worked on, but if the escalator stops moving, you don't have to. That's fail-degraded.

Elevators have a different failure mode. When they stop moving, at best they're closets. The closets are safe, but there's no useful functionality. Fire doors often have a mechanism to hold them open, but if something happens and the power fails, they close. Not ideal, but safe. Back at the airport again, that luggage cart you rented has a bar you need to hold on to or it won't move. If the linkage breaks the cart doesn't roll away, it's stuck. Truck and train air brakes work on the same principle. If there's no pressure the brakes are full on. You need to apply the baseline pressure to release the brakes, then add more to engage them again. If the hose/line breaks, you stop. Fail-safe. Your car, on the other hand, won't stop if there's a hole in the brake line.

So how does this apply to software? How do you respond to failures of subsystems or bad input? Can you keep operating, store things in some durable fashion, and then catch up? How do you ensure that you don't make things worse by only applying part of a change? If your snappy local cache is down, can you go back to the source of truth yourself? Or do you just say no and let the upstream thing, with more (hopefully enough) knowledge of the situation, do the right thing?
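A hedged sketch of the fail-degraded version of that, with invented cache and store types standing in for the real things:

```go
package main

import (
	"errors"
	"fmt"
)

// getter is anything we can read a value from.
type getter interface {
	Get(key string) (string, error)
}

// mapStore stands in for the slow-but-authoritative source of truth.
type mapStore map[string]string

func (m mapStore) Get(key string) (string, error) {
	if v, ok := m[key]; ok {
		return v, nil
	}
	return "", errors.New("not found")
}

// downCache stands in for a fast path that's currently broken.
type downCache struct{}

func (downCache) Get(string) (string, error) {
	return "", errors.New("cache unavailable")
}

// lookup fails degraded: if the cache is down we fall back to the store.
// Slower, like an escalator that's become stairs, but still working.
func lookup(cache, store getter, key string) (string, error) {
	if v, err := cache.Get(key); err == nil {
		return v, nil
	}
	return store.Get(key)
}

func main() {
	v, err := lookup(downCache{}, mapStore{"trip-1": "active"}, "trip-1")
	fmt.Println(v, err) // "active <nil>": degraded, not dead
}
```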

by Leon Rosenshein

Wall Of Perl

Back in my Flight Simulator days I ran the build system. The build system supported about 15 developers, 20 testers, and 30 artists. We did daily builds and internal releases of code and content. No check-in builds, no automated unit tests. Code builds were easy. Fire up MSVC from the command line with the right options and wait. If it failed, send email to lots of people.

Content builds were a little different. We used 3DStudio Max for models and custom plugins for model export/processing. We had another custom tool to take textures (images saved in a lossless format) and convert them to pyramided DirectX texture files. And we had lots of models/textures. Around a terabyte of content source, and this was in the late '90s, so large for the time. The only way to make it run in a timely fashion was a distributed build farm (which during the day was the artists' development machines). And holding the whole thing together was a set of Perl scripts.

From the beginning, Perl has always surprised me by how terse it is. Working with lists, whether modifying them or just acting on them, is easy and (usually) clear. It has pretty close integration with the underlying OS, so acting on system resources (usually files) is straightforward. It's a procedural language, so it's familiar to most developers. It's strongly typed in that there's some enforcement at runtime, but it's loosely typed in that there are no compile-time checks, and you may need your actual data to know whether things really work.

Perl gets a bad rap because there is a lot of read-only Perl code out there. Particularly since a big use of Perl is string handling, a lot of Perl code is replete with regular expressions, extracting data from strings and then using it to do something else. In the spirit of transparency, I admit that I've written code in Perl, then come back the next day and been unable to figure out what I was doing, even knowing what I was trying to accomplish.

On the other hand, Perl is still amazingly powerful and useful. Larry Wall is the Benevolent Dictator For Life, and does a great job. Unlike other languages that went through a painful schism when updating to a new major version, Perl 5 is still the Perl; what might have become Perl 6 has spun off into its own language (Raku), and Perl continues to grow with active community support. There's even Perl being used in the NA toolset. So what's your experience with Perl? Share in the thread.

by Leon Rosenshein

On Testing

My title is Senior Software Engineer II. I work on Infrastructure. No mention of customers or testing in there, but I'm a tester. I write and execute tests to make sure my customers always get what they expect, and if something goes wrong *I* know before they do. It wasn't always that way.

Back in the day, there were different roles, and they kept to themselves. PMs talked to business leaders and customers and published the requirements. Devs wrote some code, hopefully related to the requirements, and tossed it over the wall to the testers, who tried to run it based on their understanding of the requirements and told the developers it was broken. Lather, rinse, repeat. Eventually something got sent to customers, and then the patching, which went through the same cycle, began.

Now, not so much. There are lots of PM types, the walls between roles are much more porous, and the test org is largely gone. No STEs (Software Test Engineers), no SDETs (Software Development Engineer in Test), no SEs (Sustained Engineering). Instead we have lots of different kinds of tests and times/places to run them. There are Dev Tests, Unit tests, Functional Tests, Integration Tests, UI Tests, Simulations, Staging Environments, Black Box Tests, Chaos Monkeys, A/B experiments, and HealthChecks. So how do you know which ones to use and when?

That depends on where you are in the product life cycle and what you're trying to accomplish. Unit tests are great for making sure things do the right things when the inputs are defined and give the proper error when expected problems occur. They should be written early in the cycle and executed all the time to make sure unexpected changes don't sneak in. And of course, they should be completely deterministic, carrying all the data they need and not relying on any outside functionality.
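For illustration, here's what that kind of test can look like in Go; the function under test is made up, and everything the test needs is carried in the test itself:

```go
package fare

import "testing"

// Surcharge is a hypothetical function under test: the fee for a trip of
// the given distance, with a negative result signaling invalid input.
func Surcharge(km float64) float64 {
	switch {
	case km < 0:
		return -1 // expected problem, reported deterministically
	case km <= 5:
		return 0
	default:
		return (km - 5) * 0.5
	}
}

// TestSurcharge carries all of its data: no clock, no network, no randomness.
// (In a real project the test would live in its own _test.go file.)
func TestSurcharge(t *testing.T) {
	cases := []struct {
		name string
		km   float64
		want float64
	}{
		{"invalid input", -1, -1},
		{"under threshold", 3, 0},
		{"over threshold", 9, 2},
	}
	for _, c := range cases {
		if got := Surcharge(c.km); got != c.want {
			t.Errorf("%s: Surcharge(%v) = %v, want %v", c.name, c.km, got, c.want)
		}
	}
}
```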

At the other end of the cycle is black box testing and health checks. Assuming your system is deployed and running, use it like a customer would and make sure you get the correct responses. If you do this kind of testing regularly you stand a good chance of finding out there's a problem at the same time your customers do, possibly even sooner.
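A minimal sketch of that kind of black-box check, assuming an HTTP service with a /health endpoint (the URL and endpoint are illustrative):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// checkHealth exercises the service the way a customer would: over the
// network, against the deployed endpoint, expecting a real 200.
func checkHealth(url string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	// Run on a schedule, not just once, so you hear about problems no
	// later than your customers do.
	if err := checkHealth("https://service.example.com/health"); err != nil {
		fmt.Println("health check failed:", err)
	}
}
```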

In between are all kinds of tests that focus on different things. Chaos monkeys for dealing with random problems. Integration tests to make sure your assumptions about another system are valid. Staging environments for medium scale testing. A/B experiments for large scale testing. 

Martin Fowler has a whole page of articles about testing and when and how to use them.