by Leon Rosenshein

Overcommunicating

I'm one of the folks down in the engine room (the infra team). Our job is to make sure that all of our customers are happy and that their appetite for processing (batch, long-running services, stateful, stateless, etc.) and storage (HDFS, Blob, POSIX, etc.) is met. As such, we're essentially a service team. That doesn't mean that we write services (although we do), but that we provide a service to our customers. And our customers use those services to do their work, whatever it is. And like other service teams, when things are going well and our customers' needs are being met they don't think about us too much. And that's the way it should be. Processing and storage as reliable as running water.

Of course, when things go wrong people notice. That's an incident. So we do what's called Incident Management. And that consists of two things. The first is dealing with the problem. We mitigate the problem as soon as possible, work to understand not just what happened but why, and then do what we can to make sure it doesn't happen again.

The second part, which is at least as important, is communicating. Making sure people know there's a problem, and that we're working on it. Making sure people understand what caused it. This communication helps in lots of ways. It keeps people from wasting time trying to figure out what they're doing wrong when the problem isn't theirs. It keeps people from wondering what's going on and distracting us from fixing the problem. It helps our customers understand how they can make better use of our services without impacting others.

This communicating is such a big part of the process that it's one of the named roles in our SOP. We typically have two people on call for each area. When there's a problem the primary is the lead, responsible for making sure the problem is solved and that the right people are working on it. The secondary is the communicator. Their job is to keep everyone else informed so that the primary can work the problem. This includes posting to Slack in #breaking, updating our dashboard with a banner, creating/updating an incident, and making sure that there's a Post-Incident review. They also handle all the ad-hoc queries that come in during the incident.

And while not everyone at ATG is working on services handling 1000s of requests/sec, we all have customers. Some internal, some at Core Business, and some external. And they all have expectations. So, whether you're responsible for a function call, library, REST/gRPC service, website, or a car showing up, what's your plan for when your customer complains? If you take a look at our SOP, it explicitly says it's not about how to solve the problem. It's about how to manage the process of solving the problem and making sure everyone who wants to know what's happening knows.

by Leon Rosenshein

I Feel You Man

Is your code empathetic? Do your users think it is? Do your users think you're empathetic? Is there anything you can do to change that? Does it matter?

All good questions. And of course it does. But before you can answer them you need to understand what empathy is. People often think that having empathy means that you feel what the other person is feeling, and that's part of it, but there's a more important part. Not feeling it, but knowing how they feel, and doing something about it. Being able to predict how someone is going to feel or react, and taking that into consideration before it happens.

And that's a learned ability. Something you can get better at. It's training yourself to think about things from the other side. When you're writing code, thinking about what it would be like for someone who doesn't have the context to see it for the first time. If you're wondering what that person would be feeling just take a look at some code you wrote over 6 months ago. Do it now. I'll wait.

That slightly lost feeling you had? Now you know how they would feel. Think about what would have made it easier for you to understand what you were reading. Now add that to your next PR.

Remember the last time you got an error like "There has been an unexpected error. Operation Failed."? Wasn't that helpful? Of course not. So do better. Think about user expectations. Give pointers and direction when there's a problem.
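As a minimal sketch of what "pointers and direction" might look like in practice, here is a small Python example. The `ConfigError` class, the `load_port` function, and the `port` setting are all hypothetical names made up for illustration. The only point is that each message says what went wrong, why, and what to try next.

```python
class ConfigError(Exception):
    """Error that carries what happened, why, and a suggested next step."""

    def __init__(self, what, why, hint):
        self.what = what
        self.why = why
        self.hint = hint
        super().__init__(f"{what}: {why}. {hint}")


def load_port(settings):
    """Return the TCP port from a settings dict (hypothetical example)."""
    raw = settings.get("port")
    if raw is None:
        raise ConfigError(
            "Could not read 'port'",
            "the setting is missing",
            "Add a line such as port=8080 to your settings.",
        )
    try:
        port = int(raw)
    except ValueError:
        # Tell the user what we saw and what we expected instead.
        raise ConfigError(
            "Could not read 'port'",
            f"{raw!r} is not a whole number",
            "Use a value between 1 and 65535, for example port=8080.",
        ) from None
    if not 1 <= port <= 65535:
        raise ConfigError(
            "Could not use 'port'",
            f"{port} is outside the valid range",
            "Use a value between 1 and 65535, for example port=8080.",
        )
    return port
```

Compare the message a user would get from `load_port({"port": "eighty"})` with "Operation Failed" and think about which one you'd rather see at 2 AM.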

We can't make foolproof code, users can't read the developer's mind, and the code can't read the user's mind, so there are going to be mistakes. And that's where empathy comes in. Be prepared and make it easier for everyone.

Postscript

Now that we're all WDP, this is even more important. Communication is harder and slower, so mistakes are magnified, and clarity has an outsized impact. There's more than enough stress going around, so taking the time to not add to that stress really helps. And not just in the code. In every form of communication, take a deep breath, assume good intent, and don't overreact.

by Leon Rosenshein

Shiny Happy People

We've been working from home for almost 2 weeks now. How are you feeling about it? There's no question that there's been a huge disruption in the force. It's something we all need to get used to. And as someone pointed out the other day, this isn't really WFH, it's WDP (work during pandemic) which is not exactly the same thing. But there are some similarities. The priorities might have changed, and the relative ranking of taking care of yourself and your family vs work has certainly changed. But for most of us the work itself hasn't changed. The long term goals haven't.

What has changed is the environment. Both the physical and psychological environments are very different than they were 2 weeks ago. We're home, practicing social distancing, and dealing with our teammates over a screen. What used to be turning in your chair and asking a question is a much more involved project. There are new and different demands on our time. Feedback from coworkers is slower, but feedback from the people in the house is immediate.

So what can we, as employees and teams, do to be productive and feel fulfilled? I know for myself, one of the things I need to be productive is pixels. Way more pixels than my MacBook has. So I brought home my 2 external monitors. Those that work with me can attest that I don't sit down much, so I got myself a standing desk. And I'm fortunate enough to have a room at home that can be my office so I can "go to work" in the morning and "come home" at night. I used to walk to work, so now I take the dog for a walk to separate work from home.

Our team has its own little Zoom chat that sits in the corner of one of my screens (remember all those pixels) so I'm not alone. But I'm not always there either. Sometimes I just wander off. And that's ok. Sometimes I focus, sometimes I'm checking on my reactor, and sometimes I'm watching some online continuing education video. Some of them are even work related :)

And above all, I've talked to my manager about things. About having the space and time to get things done, but also the time to not be doing things. And I have trust. Trust that I'm doing the right thing, that the company is doing the right thing, and trust that my manager trusts me to do the right thing. And sometimes I just watch silly videos. What works for you?

by Leon Rosenshein

Keeping It Clean

Compilers are really good at turning the code you write into some kind of executable, be it machine code that executes directly or bytecode that gets run by some virtual machine. They always do what you tell them, and most of the time they do what you want.

But that doesn't make it clean. To write clean code, you first need to know what clean code is. There are lots of lists and acronyms that talk about what clean code is, but I think Martin Fowler summed it up best, with "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." It's not enough to make sure the compiler knows what you mean, you need to make sure future you (and your teammates) know what you mean.

The code needs to be simple to understand (minimize cognitive load). It needs to be easy to navigate (although your IDE can help). It needs to do what the reader expects (from the library/class/method name) and nothing else. It should fail in predictable and informative ways.

But truly clean code goes beyond the actual source code. Comments (and doxygen notes in our case) should explain not what choices were made, but why. They should help the reader understand not only when to use something, but also when NOT to use it. And it's not just in the source itself. There are RFCs that give the big picture and the reasoning behind it. There might be presentations that explain some of the details. There might be codelabs that walk a user through use cases. Maybe a reference implementation or two.
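Going back to comments for a moment, here is a rough sketch of what "why, and when NOT to use it" could look like. It's written in Python rather than with doxygen notes, and the `retry` helper and the flaky storage backend it mentions are hypothetical, invented only for this example.

```python
import time


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Why: the (hypothetical) storage backend occasionally drops a request
    under load, and a short pause is usually enough for it to recover.

    When NOT to use: operations that are not idempotent. A retried write
    that actually succeeded the first time would be applied twice.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                # Out of attempts: let the caller see the real failure.
                raise
            # Double the delay each time so a struggling backend gets
            # progressively more room instead of a steady stream of retries.
            sleep(base_delay * 2 ** attempt)
```

Notice that none of the comments restate what the code does. The loop is easy enough to read. What the reader can't get from the code is the reason for the backoff and the warning about non-idempotent calls.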

In addition to the usual web links and links to other posts, there are 2 links to books on the O'Reilly/Safari website (Clean Code and Code Complete). That website is a treasure trove of books, presentations, and online classes. If you find yourself with some spare time and bandwidth, check out what's there.

by Leon Rosenshein

On Conflict

Conflict is real, inevitable, and everywhere. Every comment on a PR or RFC is conflict. So is every question in a design review. If you're on a team that truly has no visible conflict then take a deep look around and try to figure out why. It's probably not a good sign. I doubt you all really agree about everything.

Unless you never make decisions anyone disagrees with, every decision you make involves some conflict. Everyone who disagrees with your decision puts a cost on the impact of your decision and a cost on the potential conflict (raising the issue). Most people tend to dislike conflict so that cost is relatively high. In many (most?) cases, the cost of the decision appears relatively low, so the conflict is never voiced. And generally that's ok and the team and work progress.

But is that really the best thing? Is conflict bad? Of course it can be. And unvoiced conflict isn't much better. There are two obvious problems with unvoiced conflict. First, the person who isn't saying anything feels stifled, unheard, and diminished. That's never a good thing. And in these days of Zoom and WFH, it's even easier to feel that way.

Second, we're missing out on important points and opportunities. There's a high probability that by not addressing the issue you're taking on unknown technical debt that is just going to bite you later. It might very well be the right thing to do, but we should never take on future work without being aware of it.

So what can we do about it? The part that we all can work on is reducing the cost of conflict. Keep it about the topic. The Cambridge dictionary says conflict is an active disagreement between people with opposing opinions or principles. The key here is that it's people with opposing opinions, not opposing people. So talk about the ideas. Be open to new ideas. Listen to the other person. That person has decided to expend significant energy to raise the issue. Respect that commitment and listen to the ideas. Talk about the ideas and how they interact. Talk about use cases and future implications. Don't talk about people.

At its best, conflict is more like improv. The best jam sessions grow not out of agreement, but out of collaboration. Reaching the best possible solution that includes previously missed thoughts and ideas. The kind of conflict that starts with "I don't think that solution handles 'X' well" and ends with it handling 'X', 'Y', and 'Z'.

by Leon Rosenshein

Naming Is Still Hard

Back in January I wrote about naming things, and I came across another article with more info, so I figured I'd share that. The one piece I have some disagreement with is the emphatic advice to avoid Hungarian notation. While I agree that prefixing with the base type information is wasteful and makes things harder for your IDE, the original usage for Hungarian notation, particularly when writing C code for Windows, makes a lot of sense. When your language/compiler can't help you out, you help yourself. That's still relevant today with some DSLs. When I was writing shaders in HLSL, for instance, the compilers weren't very smart, so we had to watch out for things ourselves.
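To make the distinction concrete, here is a small sketch of the semantic style of prefixing (often called Apps Hungarian), where the prefix records what kind of value something is rather than its base type. The `us_`/`s_` prefixes for unsafe and safe strings, and the function names, are illustrative assumptions and not taken from the article or from any code I've worked on.

```python
import html


def s_from_us(us_text):
    """Convert an unsafe (raw user input) string into a safe, escaped one."""
    return html.escape(us_text)


def render_greeting(s_name):
    """Build markup. The s_ prefix says the caller must pass escaped text."""
    return f"<p>Hello, {s_name}!</p>"


us_name = "<script>alert(1)</script>"  # us_ = unsafe: straight from the user
s_name = s_from_us(us_name)            # s_  = safe: already escaped
page = render_greeting(s_name)
print(page)  # <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```

Both values are plain strings, so the language can't tell them apart. The prefix is what makes `render_greeting(us_name)` look wrong in a code review.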

by Leon Rosenshein

Circles

I was wandering around the internet the other day and came across an article describing something called the trinity architecture. It's a way to indicate composition, dependencies, and abstractions. And this is important because everything we do in the world of 0's and 1's is a model of something that we interact with. Ideally the model and the interactions develop together or at least the model works to handle the expected interactions. Compositions, dependencies and abstractions are how you describe those things. And the better those things fit together the easier it is to reason about. The easier it is to reason about, the lower the cognitive load.

And in many ways, that's the goal of architecture. To reduce the complexity and cognitive load of understanding the entire system so you can focus on one part and get it right. That's Domain Driven Design in a nutshell. Make it possible for people to do their jobs mostly in isolation, but being able to rely on what someone else has done or is doing.

So what does trinity bring to the table? It brings a slightly different naming scheme with three layers and it uses nested and tangential circles to indicate composition, dependencies, and implementations. It's certainly a new way to look at things, but as I see it, it's a little too abstract. Sure, it's got a domain, which is the model of what you're doing, and a clearly defined Public API to interact with that model, but then everything else is thrown into the Aux bucket. Everything from the physical computers and networks to the libraries, databases, queues, and event buses that you build things out of. And there's no actual customer/user facing thing, just an API. Sure, in some B2B situations you're building an API, but in most cases, you're supposed to be solving a problem, not just building an API. Which means unless you have the Domain exactly right the API doesn't let your customers solve their problems. Maybe you know the space you're working in that well, but I'm pretty sure I don't.

The trinity is also a framework for building applications. And there's value in frameworks, especially when they have all the buzzwords. And of course it's an open source framework, so you can start using it for free. And since it's backed by a company you can probably buy all the support you want.

So what do you think of the trinity? Is it a good idea and a way to think about building software? Is there something we can take from it as we build massively scalable distributed systems? Is it just a thinly veiled marketing campaign for unwary developers? Or is it somewhere in between?

by Leon Rosenshein

Intro To Architecture

Name dropping today. Neal Ford and Mark Richards. I've met and talked to both of them a few times. Had some really interesting conversations around scope of influence and the difference between a software developer and a software architect. They're both very good at not only high level architecture discussions but digging into the details and thinking about specific use cases and approaches. They've got a new book coming out, Fundamentals of Software Architecture, and they're doing a webinar about it next month. Since we're all spending all our time online anyway, if software architecture is something you're interested in, think about checking it out.

Another thing to think about if you're interested in architecture is an architectural kata. I've done them with both Neal and Mark, and led a few sessions, at Uber and elsewhere. If you're interested in doing a kata session after we're back in the office let me know. If there's enough interest I'd love to run another session.

by Leon Rosenshein

Write It Down 2

Documentation is a thing. And there are lots of different kinds, each with their own use case and requirements. But one thing is consistent. Keep the audience in mind. What do they know? What don’t they know? What are they looking to know? What do you want the reader to know?

The most important thing to remember is that it’s for the reader. Whether you’re writing API docs for the RNA Docset, end user documentation for a public facing website, a postmortem, a bug report/feature request, or notes to yourself, remember the reader. Even if you’re writing notes to yourself, what you’re writing isn’t for you now, it’s for future you without all of the context you have in your head while you’re writing. The reader doesn’t know what you know now. So everything you think is obvious and straightforward now won’t be when you read it. And that’s the simplest case, when you have a chance to know the details.

Another important thing to keep in mind is what the reader wants to do. People are trying to get something done, and they’re looking for help. So provide it. To do <X>, perform these steps. Think about the common mistakes or problems people might have and discuss them. Provide troubleshooting tips. As the writer of a tool/UI/library you know not only what should happen, but what to avoid. The person who has nothing but the documentation doesn’t. If you don’t say it, they don’t know it.

Provide levels of detail. For codelabs we have 100, 200, and 300 level labs. Even if you’re not making it that isolated, you should tell people what they need when they need it and make more detailed information easily available when they’re ready for it.

And then have someone you trust, but who doesn’t know how to do the thing you’re documenting, do it. We’re working on turning up a new DC and I put together a runbook and a set of scripts to do it, then gave it to someone on my team to try. I thought about what he would know, what he would have available, and tried to document it as unambiguously as possible. Guess what happened. He got stuck on step one because there were some setup steps that I hadn’t documented. So we fixed that and tried again. Little things kept popping up where I wasn’t as clear as I thought. So we fixed those. Then he ran into an error that I hadn’t seen before. So we added more to the troubleshooting section. We’re still tweaking it and making it easier, but it’s at the point where someone else could come in, pick it up, and have a very good chance of making it work.

Having written all that, I find that I’ve actually missed the most important thing of all. You need to actually write something down. Once you have something you can make it better. But you need to start.

by Leon Rosenshein

Dates Are Hard Too

I mentioned a while back that one of the things developers make assumptions about is that time is a monotonically increasing function. If only that were so. But it's not just time, one second ticking into the next, that isn't as simple as it seems. Duration is hard too. Not just the number of seconds between two times, but how you turn seconds into weeks, months, and years. For example, the Windows Calculator can tell you how many days, weeks, and months there are between two dates. Seems like a relatively straightforward thing to do. But sometimes it goes wrong.

For instance, it claimed that between July 31st 2019 and Dec 30th 2019 there were 5 months, 613566756 weeks, 3 days (152 days). The 152 days part is right, but the weeks? How in the world did they come up with that number? Turns out counting is hard. I'll leave the details to the article, but it comes from mixing signed and unsigned numbers and ambiguities in what it means to advance a calendar by one month since months aren't all the same length. It usually works, but there are always edge cases. Something to think about as we work on safety critical systems.
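The weeks and days figures are at least consistent with a remainder of -1 days being read as an unsigned 32-bit number. The sketch below is my guess at the kind of arithmetic involved, not the calculator's actual code (the article has the real details), and the helper names are made up. It also shows one way "advance by a month" can be ambiguous: starting from July 31st, adding five months in one jump and adding one month five times land on different days.

```python
import calendar
from datetime import date


def as_uint32(value):
    """Reinterpret a signed integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def add_months_clamped(d, months):
    """Advance by whole months, clamping the day to the target month's length."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


start = date(2019, 7, 31)

# Five months in one jump keeps the 31st.
all_at_once = add_months_clamped(start, 5)

# One month at a time loses the 31st as soon as it hits a 30-day month.
stepwise = start
for _ in range(5):
    stepwise = add_months_clamped(stepwise, 1)

print(all_at_once, stepwise)  # 2019-12-31 2019-12-30

# The two answers differ by one day. If that -1 ends up in an unsigned
# variable, splitting it into weeks and days gives a familiar result.
leftover = (stepwise - all_at_once).days
weeks, days = divmod(as_uint32(leftover), 7)
print(leftover, weeks, days)  # -1 613566756 3
```

Neither month calculation is wrong on its own. The trouble starts when two pieces of code each pick a different definition and then a signed result gets stored in an unsigned variable.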