by Leon Rosenshein

Policies

I’ve talked about Chesterton’s Fence before. It’s the idea that you have to understand why something was done in the first place before you decide to undo it. Say you buy a vacation house with a fence and a gate across the driveway. Every time you arrive, you stop, open the gate, drive through, and close the gate behind you. To save time and trouble, you remove the gate, because all it does is slow you down. A few weeks later you come back and find that the wild goats have gotten in and, as they are wont to do, turned your yard into a goat desert, eating not just the grass but the trees and shrubs as well. Now you know why the fence (Chesterton’s Fence) was there. It wasn’t to slow you down, it was to protect your landscaping.

As I mentioned before, you can see things like that in code: input validation that tests for things that should never happen, error handling even when the input is validated and the call should never fail, an extra watchdog timer wrapped around an event handler. You could take them out and things would be fine for a while, probably for a long time. But remember, saying something hardly ever happens is the same as saying that it happens, so eventually there will be a problem. Until you can be sure the thing can’t happen, you need to be ready for when it does.

Which brings me to policies. Policies are rules about when and how to do things. Sometimes they’re written down, like in an employee handbook, or even enforced by the system (like requiring tests to run before a PR can land). Sometimes they’re part of the team/org’s minhag, the custom, handed down as tribal knowledge, where you get told that the team lets downstream users know about planned changes and outages by posting in a certain format in a specific Slack channel. And sometimes you only find them when you violate them, like the policy that says if you need some new hardware you could technically just order it, but you’d better ask the admin first and they’ll take care of it. Regardless of how you learn about them, they’re there.

The thing is, as redundant or arbitrary as they may seem, they were almost certainly put in place in response to something that happened. As Jason Fried said,

Policies are organizational scar tissue.

However, even if a policy made sense when it was created, it might not make sense now. That’s where the rest of that quote comes in:

They are codified overreactions to situations that are unlikely to happen again. They are collective punishment for the misdeeds of an individual. This is how bureaucracies are born. No one sets out to create a bureaucracy. They sneak up on companies slowly. They are created one policy—one scar—at a time. So don’t scar on the first cut. Don’t create a policy because one person did something wrong once. Policies are only meant for situations that come up over and over again.

The problem is that, in general, policies don’t have expiration dates. In fact, it’s the opposite: the longer they’ve been around, the harder they are to change. That can be a problem, because policies are set in isolation from each other, and they accumulate. They can even conflict with each other. So you have to be careful when setting policies.

You probably don’t need a policy the first time something happens. You need to think about how likely it is to happen again, what it would cost if it did, and the cost of having and living with the policy. Unless the cost of it happening again is greater than the cost of the policy, treat it as a teaching moment and remind people of the goals and consequences. Just send an email to the right group of people.
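One way to make that trade-off concrete is to compare the expected cost of a repeat against the cost of the policy. The sketch below is a simplification that assumes you can put rough numbers on the probability of a repeat, the cost of one repeat, and the total cost of living with the policy over the same period. The function name and all the numbers are hypothetical, and in practice they are educated guesses rather than measurements.

```python
def policy_worth_it(p_repeat: float, cost_if_repeated: float, cost_of_policy: float) -> bool:
    """Return True if the expected cost of a repeat exceeds the cost of the policy.

    All inputs are hypothetical estimates over the same time window:
    p_repeat:          probability (0..1) that the incident happens again
    cost_if_repeated:  cost of a single recurrence
    cost_of_policy:    total cost of everyone following the policy
    """
    if not 0.0 <= p_repeat <= 1.0:
        raise ValueError("p_repeat must be between 0 and 1")
    # Simplification: only one possible recurrence in the window.
    expected_cost_of_incident = p_repeat * cost_if_repeated
    return expected_cost_of_incident > cost_of_policy


# Rare, cheap mistake: a reminder email is probably enough.
print(policy_worth_it(p_repeat=0.05, cost_if_repeated=2_000, cost_of_policy=10_000))   # False
# Likely and expensive: a policy (or, better, a mechanism) earns its keep.
print(policy_worth_it(p_repeat=0.5, cost_if_repeated=100_000, cost_of_policy=10_000))  # True
```

The point isn’t precision. Writing the numbers down, even roughly, makes it obvious when a one-off mistake is being answered with a permanent cost.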

Instead, use policies for things that happen repeatedly, have a very high cost, and can’t (easily) be prevented by putting a mechanism in place. If you want to make sure tests are run on every PR/commit, don’t make it a policy and hope folks do it; make it a part of the system so they don’t have to think about it. On the other hand, if you want to let your customers know about upcoming downtime or service interruptions, make a policy, write it down, and make sure everyone knows.

Finally, put a re-evaluation date on your policies. Write down why the policy is in place, what its goal is, and, ideally, what criteria need to be met to remove the policy. For instance, you might have a policy to run unit tests today, then re-evaluate it every week while you build the mechanism to do it automatically. Once the mechanism is in place, you can remove the policy.
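If your policies live somewhere structured, like a wiki template or a small registry, each one could carry those fields directly. Here’s a minimal sketch of such a record, assuming a weekly review cadence like the unit-test example. The `Policy` class, its field names, the "reason" text, and the dates are all hypothetical illustrations, not any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Policy:
    """A policy that records why it exists and when to look at it again."""
    name: str
    reason: str            # what happened that prompted it (hypothetical here)
    goal: str              # what it is meant to protect
    removal_criteria: str  # what has to be true before it can go away
    last_reviewed: date
    review_every: timedelta

    def next_review(self) -> date:
        # The re-evaluation date: last review plus the review interval.
        return self.last_reviewed + self.review_every

    def is_due_for_review(self, today: date) -> bool:
        return today >= self.next_review()


run_tests = Policy(
    name="Run unit tests before landing a PR",
    reason="Broken changes landed because tests were skipped",
    goal="Keep the main branch green",
    removal_criteria="CI runs the tests automatically and blocks failing PRs",
    last_reviewed=date(2024, 1, 1),
    review_every=timedelta(weeks=1),
)

print(run_tests.next_review())                        # 2024-01-08
print(run_tests.is_due_for_review(date(2024, 1, 5)))  # False
print(run_tests.is_due_for_review(date(2024, 1, 8)))  # True
```

Keeping `removal_criteria` next to the review date means every review asks the same question: has the thing that would let us drop this policy happened yet?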

And if you find a policy you don’t understand the reason for, remember Chesterton’s Fence. Don’t just remove it because you don’t know why it’s there. Figure out why it’s there, decide if it’s still needed, for that or some other reason, and then make a decision to keep, modify, or remove it.