by Leon Rosenshein

2038 == 2000

How many of you remember the Y2K problem? How many of you had to fix your way through it? For those that weren't involved, the Y2K bug was the result of an optimization. It went something like this, circa 1970:

Dev 1: Our database is getting too big. We need to save some space.
Dev 2: Ok. Let's not store redundant data.
Dev 3: Hey, look. All of the years stored in our database start with 19. Let's not store that.
Dev 1: Great idea. We've got 1,000,000 users, and 2 bytes per row is 2 MB.
Dev 2: Wow, we'll fit in memory again. Way to go man.

And it was so. And it worked. And for 20 years no one noticed. Then some tester got fancy and put in a date a few years in the future and something broke. Time seemed to go backward. Things that were supposed to be 2 years away were 96 years ago. Or displays were just wrong and showed 19XX. But slowly people fixed the problems. Then the internet found out about it. And there was much wailing and gnashing of teeth. Prophecies of doomsday. The power grid was going to fail. The phones would stop working. Turn your computers off before midnight on Dec 31st, 1999, or they would melt.

Of course, as you can tell since you're reading this on a computer, most of those predictions didn't come true. Yes, some displays were wrong, some programs crashed, and others got things wrong, but generally it was a non-event.

So we're safe, right? Not exactly. In the Windows world, anything built with VC8+ (2005) is good until the year 30,000+, but if you're still running something older (like an embedded system running Win-CE) you'll probably have issues. And in the Unix world almost everything is time_t-based and only good until 2038.

Of course, 2038 isn't that far away anymore. And our favorite penguin isn't ready. The 5.6 kernel will be ready soon, but that doesn't mean the programs in user space are ready. That's on us to deal with. It could be as simple as a recompile to run on the new kernel, but there's more to future-proofing than just recompiling and deploying. Think about persisted data. What about databases and files with old timestamps in them? Or that nifty new gRPC protobuf you wrote to disk with a 32-bit timestamp? We need to think about these things and deal with them before time runs out.
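The rollover itself is easy to demonstrate. Here's a minimal Python sketch (the 32-bit packing is a stand-in for any legacy on-disk format) showing exactly where a signed 32-bit timestamp runs out of room:

```python
import struct
from datetime import datetime, timezone

def to_time32(dt: datetime) -> bytes:
    """Pack a timestamp the way a 32-bit signed time_t would store it."""
    seconds = int(dt.timestamp())
    return struct.pack("<i", seconds)  # raises struct.error on overflow

def from_time32(raw: bytes) -> datetime:
    (seconds,) = struct.unpack("<i", raw)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# The last moment a signed 32-bit time_t can represent: 2^31 - 1 seconds
# after the Unix epoch, i.e. 2038-01-19T03:14:07Z.
last = datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)
assert from_time32(to_time32(last)) == last

# One second later the value no longer fits in 32 bits.
try:
    to_time32(datetime(2038, 1, 19, 3, 14, 8, tzinfo=timezone.utc))
except struct.error:
    print("overflow: 2038-01-19T03:14:08Z does not fit in 32 bits")
```

In a real C program with a 32-bit time_t the counter silently wraps negative instead of raising an error, which is exactly how "2 years away" becomes "96 years ago".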

by Leon Rosenshein

Architecture vs Design

Here's a tough one. What's the difference between software architecture and software design? Both are obviously about software. Both talk about how things fit together. Both have patterns and best practices. Can you have one without the other? Can you have good architecture and bad design at the same time? What about vice-versa?

I think the difference between the two comes down to scope. A lot of things that seem similar, but aren't, differ only in scope. In this case it's not just scope, but scale.

Software design is about how things fit together at the function/class/module level. Are your functions really functions, or a wrapper around side effects? Does the function name help the caller understand what is going to happen, or does it just collect some stuff into one line so some other function fits on one screen? Do your classes encapsulate something and provide a useful abstraction or are they just collections of stuff and global things? Do your modules do what they say they do and nothing else? Can you trust your modules not to interfere with each other? Do they have flexibility, scalability, reliability, and replaceability?

Software architecture is about how things fit together at a higher level. Are your databases coherent? Do your queues only handle one thing? Do your services/APIs/gateways do what callers think they will and nothing more or are they just collections of capabilities so you have fewer things to deploy? Can someone look at the architecture and understand which part is responsible for which functionality or are they just collections of stuff and persistent data? Can you replace one part without impacting others? Do they have flexibility, scalability, reliability, and replaceability?

Notice anything similar between those two paragraphs? They start and end the same. They use a lot of the same descriptions and requirements. It's just the scope of the things I'm talking about that's different. The same principles that go into a good function are the same things that make a good class/module are the same things that go into a good system architecture. They need to be SOLID. You need to think about the quality attributes. It's just the scope and scale that changes.
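At the smallest scope, "are your functions really functions?" is easy to make concrete. A toy Python sketch (the discount logic and names are invented for illustration) of the same logic written both ways:

```python
# A "function" that's really a wrapper around side effects: it depends on
# and mutates state that never appears in its signature.
DISCOUNTS = {"gold": 0.8}  # hidden global the caller can't see

def apply_discount(order: dict) -> None:
    order["total"] *= DISCOUNTS.get(order["tier"], 1.0)  # mutates its argument

# The same logic as a real function: everything it needs comes in as
# arguments, the result comes out as the return value, and calling it
# twice with the same inputs always gives the same answer.
def discounted_total(total: float, tier: str, discounts: dict) -> float:
    return total * discounts.get(tier, 1.0)

order = {"total": 100.0, "tier": "gold"}
apply_discount(order)
print(order["total"], discounted_total(100.0, "gold", DISCOUNTS))  # 80.0 80.0
```

The second version is trivially testable and replaceable; the first one can only be understood by reading the globals around it. Scale that difference up and you get the architecture questions in the second paragraph.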

by Leon Rosenshein

Engineer vs Technician

I've been doing campus interviews for tech companies for a long time now. Long enough that it's been 10+ years since I told a candidate that I got my BS in 1988 and the response was "I was born in 1988." In that time I've done hundreds of interviews at dozens of schools. And one of the things I quickly learned is that while there are lots of schools and cultures and styles, not everyone who learns to code and looks for a job as a programmer is a software engineer. Out of those hundreds of interviews the vast majority of candidates could code, at least to the level expected for a 45-minute on-campus interview. But most of them weren't what I would call software engineers. And some of the people whose coding wasn't up to that level were. So what's the difference between a programmer and a software engineer?

I think it comes down to attitude and mindset. Programmers understand the basics of a language. They understand the syntax and know the best practices, and they regularly apply them. Give them a clear set of requirements and you'll get back a function/library/executable that meets all the requirements. And that's it. And they're happy with the results and doing things that way. They don't know why the best practices are best practices and don't care. They don't know what happens if something in the environment goes wrong. That's the other person's problem. They don't know the customer/end-user and they don't want to. And there's nothing wrong with that. In many cases that's what's needed. As IDEs and frameworks and infrastructure become more of a commodity it becomes easier for a technician to put the pieces together the right way and get things done.

A software engineer, on the other hand, asks what if and why. Why does that work that way? Why do you need it to do that? What happens if that packet doesn't arrive? What if the other person isn't following the rules? What if the requirements are conflicting? And my favorite question, "What is it you're really trying to do here?" Software engineers don't just make it work, they figure out why it works, what might break it, and what to do in case it does. If you think those questions are important, you're a software engineer. Even if you haven't spent the last 4 years learning the intricacies of C++, you're an engineer. And it's the engineers who build those commodity IDEs and frameworks and infrastructure that let the technician be so productive.

by Leon Rosenshein

Let It Crash

This is the philosophy of Erlang. Only do the right thing, as specified in the requirements. If you can't do that, fail. Don't try to make it better. Don't try to hide it. Don't pretend it didn't happen. Just fail. And let something else deal with the problem. Something with more scope/visibility/understanding of what's going on. And that's not a bad idea. So maybe we should just let it crash.

I don't mean the robot. I mean the code. And I don't mean crash in some uncontrolled manner and leave it there. What I mean is that if you don't have the information/understanding/scope to appropriately handle a failure, don't. That's what the VIM is for.

And this doesn't mean you get to ignore errors. Oh no. Quite the opposite. You need to handle them, but you need to handle them in the right place and the right way. It might be killing and restarting the process. It might be failing over to a different master. It might be a voting algorithm to decide who's in charge now. If you have a triple-redundant flight control computer and one of them gets confused, rather than try to fix the problem, restart it and pick up from where you are. Embedded systems often have supervisor or watchdog processes that know when something bad happens and trigger restarts rather than continue the bad behavior. In high-availability distributed software you often find multiple instances ready and waiting to take over if something happens to the current master.
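The supervisor pattern itself is easy to sketch. Here's a minimal Python watchdog; the fixed back-off and restart budget are assumptions for illustration, not Erlang's actual OTP behavior:

```python
import subprocess
import time

def supervise(cmd: list, max_restarts: int = 3) -> int:
    """Minimal watchdog: run cmd, restart it when it dies abnormally.

    The supervisor doesn't try to understand *why* the child failed.
    It just restarts from a known-good state, Erlang-style, and
    escalates if the failures keep coming. Assumes max_restarts >= 1.
    """
    for attempt in range(max_restarts):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return 0                      # clean exit; nothing to do
        print(f"child exited with {result.returncode}, restart {attempt + 1}")
        time.sleep(1)                     # back off before retrying
    return result.returncode              # escalate: let *our* supervisor decide
```

Note the last line: when the restart budget runs out, the supervisor itself "lets it crash" and hands the problem to whatever is watching it.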

Everything is a trade-off. Can we build everything with standby systems ready to take over at a moment's notice? Do we need to? Should we do it anyway? Is it enough to make sure that we "fail safe" and that we can recover later? While there are wrong answers, there's not always a right answer. The important thing is to think about it and understand what the failure cases are and how we're going to deal with them.

by Leon Rosenshein

Innovative Data Transfer

While we're on the subject of moving things over the interwebs, it's not just copper wires or fiber that can be the backbone of the internet. Have you ever heard of IPoAC? There's IETF RFC 1149, A Standard for the Transmission of IP Datagrams on Avian Carriers. And if that's not good enough there's also IETF RFC 2549, IP over Avian Carriers with Quality of Service, so you can ensure you get enough bandwidth for your VOIP data. And these aren't just jokes. A group in Norway was able to issue a ping command via carrier pigeon. This was a little more complex as it involved some OCR, and it was pretty slow (64 bytes/42 minutes), so probably not something Netflix will be using.

Of course, avian data transfer isn't always slower than traditional copper. Many moons ago I did a campus interview with a candidate who had worked at South Africa's Telkom the previous summer and learned about Winston, an 11-month-old pigeon that was able to transfer a 4GB memory stick 60 miles in just over 2 hours (25x faster than the local ISP), including local upload time.
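Winston's effective throughput is worth a back-of-the-envelope check. Assuming exactly 2 hours and decimal gigabytes (the real trip was "just over 2 hours", so treat this as an estimate):

```python
# Effective throughput of a 4 GB memory stick carried for 2 hours.
payload_bits = 4 * 8 * 10**9   # 4 GB expressed as bits
duration_s = 2 * 60 * 60       # 2 hours in seconds

throughput_mbps = payload_bits / duration_s / 10**6
print(f"{throughput_mbps:.1f} Mbit/s")  # about 4.4 Mbit/s
```

About 4.4 Mbit/s sustained, which in 2009 really did beat plenty of last-mile links. Never underestimate the bandwidth of a station wagon full of tapes, or a pigeon with a memory stick.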

And remember, there's an XKCD for that.

by Leon Rosenshein

GeoStatisticians vs Default Values

Statisticians love numbers. They love patterns. They find patterns everywhere. Add in a healthy understanding of GIS and/or an ArcGIS license and you never know what they'll find.

Consider the case of the 500-mile email. Imagine you're the part-time admin of the email system for a university and you get a call from a department head letting you know that they haven't been able to send email more than 500 miles away for a few days. What do you do? First, you ask if they're joking. Then you ask how they came up with that range, then why it took so long to complain. Once you've convinced yourself there's really a problem, you debug and fix it.

It turns out that in SendMail 5 the default connection timeout is 0ms. Not 50 or 10000, but 0. So if you have a bad config file the default is used. In most multi-user, shared systems the timeout is a minimum value, not an exact value. In this case, what with context switching, kernel transitions and blocking I/O calls, 0ms turned out to be about 6ms.

Do you know how fast a signal travels over fiber? Pretty close to c. And a good NIC can send a SYN/ACK back pretty darn quick (microseconds). So the transport delay allowed for a connection is about 3ms * c, or 3 millilightseconds. And 3 millilightseconds is right about 500 miles. So with a bad config file, any time the recipient was over 500 miles away the connection would time out, and the send would fail.
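The arithmetic checks out. The ~6ms effective timeout covers a round trip, so the one-way budget is about 3ms:

```python
# Speed-of-light arithmetic behind the 500-mile email.
C_MILES_PER_S = 186_282   # speed of light in vacuum, miles/second
ONE_WAY_S = 0.003         # 3 ms one-way budget, i.e. 3 millilightseconds

max_distance_miles = C_MILES_PER_S * ONE_WAY_S
print(f"{max_distance_miles:.0f} miles")  # 559 miles
```

That's using c in vacuum; light in fiber is noticeably slower, and routers add delay, which is why the observed cutoff sat "right about" 500 miles rather than exactly at 559.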

The moral of the story? ALWAYS have sensible defaults for your config variables OR fail on startup if they're not set.
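Both halves of that moral are a few lines of code. A Python sketch, where the setting name `CONNECT_TIMEOUT_MS` and the 5000ms default are invented for illustration:

```python
import os

def require_setting(name: str) -> str:
    """Fail fast at startup if a required setting is missing, instead of
    silently inheriting a nonsense default like a 0ms timeout."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"required setting {name!r} is not set")
    return value

def connect_timeout_ms(default: int = 5000) -> int:
    """The sensible-default variant: explicit, documented, and non-zero."""
    return int(os.environ.get("CONNECT_TIMEOUT_MS", default))
```

Either approach would have surfaced SendMail's misread config at startup, long before a statistically minded department head had to map the failures.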

by Leon Rosenshein

Owning Your Comments

<soapbox>

I've been providing feedback on documents (RFCs, RFPs, Incident Reports, Proposals, Code Reviews/PRs, etc) for a long time. I've been doing it so long that I remember doing it with notes in the margin and sticky notes attached to the relevant pages. For the last 10+ years, of course, the comments have been in MS Word/Google Docs and Codeflow/Phab/GitHub. Generally, there's no question who owns the document being reviewed. Who owns those comments however, is less clear.

The way I see it, a comment starts a new virtual collaborative document. Like any other document, the person who created it "owns" it. That person owns the content and is responsible for driving the resolution of any issues raised. And like with any other document, that doesn't mean the owner does the work. It could be that the person who produced the work being commented on needs to do something. It could be a mentioned third party. It could be that the comment just brings awareness and nothing needs to be done. Regardless of what ends up happening, the author owns the "document". And that means only the author can close the document.

Clear ownership is important to our success. If everyone is responsible then no one is responsible. At Uber the idea of "being an owner" is a thing, and for a long time ownership meant grabbing things and holding on to them. And when problems are languishing without owners, that's a good thing. But, like anything else, it can be taken too far. And that's something I want to push back on, particularly in this area.

Recently I've seen an uptick in comments getting closed by someone other than the person who opened them. That's bad for a bunch of reasons. First, it's not very respectful. When you answer someone's question it's not up to you to decide if the question has been answered, and closing a comment abruptly ends the conversation. Second, the discussion helps others understand the situation. They could learn from it, recognize similar situations, or take it one step further and solve bigger issues. Finally, you prevent others from seeing the comment/response. If you leave it out there for others to see you might save yourself from having to answer the same question multiple times.

So, unless the "owner" has left the company, or you've reached out and asked them and gotten no response, don't close someone else's comments.

</soapbox>

by Leon Rosenshein

Managing Your Inbox

Remember Inbox Zero? The idea that to keep your sanity and maximize productivity you should keep your inbox empty. Basically, treat your email inbox as a task list, and make sure that by the end of every day you've dealt with all of the items. Now this didn't mean everything was done, just that you'd dispositioned it. There were 5 basic options: delegate, delete, defer, do, or respond. Regardless of which one you chose, you didn't need to worry about that email (at least for a while). And for many people it worked.

Of course, like anything, the law of unintended consequences kicked in. Inbox Zero was supposed to free people from the tyranny of their inbox. Instead, for many it brought lots of activity and anxiety. It got a lot of people checking their email nights and weekends to keep their inbox empty. And that was almost 15 years ago, when life (and social media) was much simpler.

Which of course led to a recent update of the concept. Which was really an acknowledgement of how the world had changed, and a return to the basics, the idea of getting things done. Starting with the idea that we don't just have one inbox anymore. There's work, personal, professional (but not work), and some number of social media inputs. And after acknowledging that there are multiple input streams, changing the goals. Instead of an empty inbox being the goal, understanding and focusing on the right priorities at the right time is the key. Rather than just taking control of your email, take control of your time. Figure out what your priorities are, not just at work, but in general. Differentiate between the important and the urgent. Then spend your time appropriately.

And that's not just a good way to manage your inbox. It applies to your sprint goals, your OKRs, and your life in general.

by Leon Rosenshein

From BeyondCorp to BeyondProd

Like what they do and stand for or not, there's little question that Google has a very large presence and impact on the technology world. From defining a language (Golang), a build system (Bazel), a container management system (Kubernetes), and a database (Bigtable) to an entire service mesh to tie them together (Istio), a lot of what is used is based on what Google has done and is doing. And it's not just the technology itself, it's how those technology pieces get put together. Google's recent white paper BeyondProd is all about how to securely connect all those microservices. And not just securely connect, but know that the thing you're running is the thing you expect it to be. That it was built securely, from a secure, reviewed source repository, with approved libraries and dependencies. 

And it all starts with identity. In the security world it's called Authentication (AuthN). Every command/message/request has to come with an identity that is verifiable. And until you have AuthN you don't need to worry about Authorization (AuthZ). Without AuthN, AuthZ is a false sense of security.
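That ordering is worth making concrete. A toy Python sketch of AuthN-before-AuthZ, where the token store, identities, and action names are all invented for illustration (a real system would verify a signed credential such as an mTLS certificate or JWT, not a bare lookup):

```python
VALID_TOKENS = {"token-abc": "rtapi"}         # credential -> verified identity
PERMISSIONS = {"rtapi": {"dispatch.create"}}  # identity -> allowed actions

def authenticate(token):
    """AuthN: who is this, verifiably? Returns an identity or None."""
    return VALID_TOKENS.get(token)

def authorize(identity, action):
    """AuthZ: is this verified identity allowed to take this action?"""
    return action in PERMISSIONS.get(identity, set())

def handle(token, action):
    identity = authenticate(token)
    if identity is None:
        return "401 Unauthenticated"  # never consult AuthZ without AuthN
    if not authorize(identity, action):
        return "403 Forbidden"
    return "200 OK"

print(handle("token-abc", "dispatch.create"))          # 200 OK
print(handle("i-claim-to-be-rtapi", "dispatch.create"))  # 401 Unauthenticated
```

The point of the sketch is the second call: merely claiming to be `rtapi` gets you nothing, because the claim can't be verified. That's exactly what was missing in the header-based scheme below.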

And that's what Zero Trust is all about. Ensure that everything you hear is from who you think sent it, and that that identity is allowed to do whatever it's trying to do. That's "Zero Trust", and where we're headed. Without it, you're never really sure. For a long time in Core Business there was a set of APIs that were only accessible to the rtapi service. And those APIs knew that a caller was rtapi because there was a header, `uber-caller`, that was set to `rtapi`. As long as the caller was on the PROD network and had that header, the services accepted it and did their thing.

As you might imagine, pretty soon every service added that header. A few different things happened. First was that there was no traceability. Then the services fell over because they were scaled for the expected traffic. Then the abstraction leaked. Then some data got to other services. Eventually it turned into a big bowl of spaghetti that we're still trying to unravel.

So let's not do that to ourselves. We might not be Zero Trust now, but we will be, so keep that in mind as you design your services/libraries/systems.

by Leon Rosenshein

Technical Debt And Inflation

I've talked before about technical debt, and as I've noted, I think it's a really good way to think about and explain to others the fallacy of a simple, linear tradeoff inside the quality, feature set, and ship date triangle. You can usually trade between feature set and ship date and think of it as an opportunity cost. At any given moment you have a certain amount of "developer capital" (capacity) available. Every bit of that capital you apply to one (feature set or ship date) is capital you can't apply to the other. Quality doesn't quite work that way. You can trade quality for either features or development time, but that's not opportunity cost, that's borrowing against the future, and you will have to pay it back, with interest. And that's something that's pretty easy to understand and explain to others. And if you don't at least make the "vig" then eventually you get so far behind that you can't do anything.

What I'd like to add to that is the concept of inflation. That's the idea that if you do nothing, you fall further behind. If you take $100 and put it under your mattress for safekeeping and then go to spend it 5 years later you'll find it won't buy as much. If you don't make investments then the value goes down.

For software it's not the value of a dollar, but the environment. Think about Python. If you don't put in the effort to keep it up to date you fall behind, lose out on security updates, and eventually all support. If you have any COBOL lying around, think about what you need to go through to maintain that. Not just that, but requirements and expectations change. If Netflix was still delivering movies at 1024x768, would it be worth as much? Probably not.

All of this is not to say that you never take on new debt, only that you should do it knowingly and with a plan to repay it. Like I mentioned a few days ago, when games shipped in cardboard boxes you needed to have them on the shelves for Black Friday. If you missed that date you lost an entire development cycle. In cases like that taking on debt to ship made sense. But those situations are rare. With the advent of online sales and distribution, Black Friday is still important, but not nearly as critical.

So how are you investing your capital today?