by Leon Rosenshein

Something Smells ... Primitive

I like types. I like typed languages. I find they prevent me from making some simple mistakes. The simplest example is that if you have something like int cookiesAvailableToSell you can’t do cookiesAvailableToSell = 2.5. You either have 2 or 3 cookies to sell. If you can sell the half cookie as a whole one you have 3. If you can’t then you have 2 cookies to sell and a little snack.

Picture of primitive tools
image source

I like domains and bounded contexts. They’re great at helping you keep separate things separate and related things together. Together, domains and bounded contexts help you stay flexible. They give you clear boundaries to work with so you know what not to mix. They make responding to business and operational changes easier by localizing contact points between components.

You’re probably wondering what types and domains have in common. It’s that a type is a domain. A byte (depending on language, obviously) is the set of all integers x such that -127 <= x <= 128. That’s a pretty specific domain. A character is also a domain. It’s very similar to a byte in that it takes up one byte, and can have a numeric value just like a byte, but it’s actually a very different domain, and represents a single character. They may have the same in-memory representation, but operationally they’re very different. If you try to add (+) an int and a char, in a typed language you’ll get some kind of error at compile time.

In an untyped language you never know what will happen. On the other hand, if you try to + a string and a char the result is generally the string with the character appended. That works because in the domain of text that makes sense. In the mixed domain of integers and text it doesn’t.

Which brings me to the code smell known as Primitive Obsession. It’s pretty straightforward. It’s using the primitive, built-in types in your typed language to represent a value in a specific domain. Using an int to represent a unique identifier. A string to represent a Universally Unique ID. Or a string to represent an email address. Or even an int to represent which one value of a defined (enumerated) set of values that something could possibly be. I’ve done all of those things. I’ve seen others do all of those things. And I’ve seen it work. So why not do it that way?

The most obvious is that you often end up with code duplication. Consider the case where there’s a string that represents an email address. Every public function that function takes an email address now needs to validate it. Hopefully there’s a method to do that, but even if there is you (actually all of the developers on the team) need to remember to call that method every time the user of the method passes in a string for the email. You also need to handle the failure mode of the string not being a valid email address, so that code gets duplicated as well.

Another problem is what happens if the domain of the thing you’re representing changes? You’ve got something represented with a byte, but now you need to handle a larger domain of values. Instead of changing the type in one place and possibly updating some constructors/factories, you’re now on a search for all of the places you used byte instead of int for this use case. And you’re looking not just in your code, but in all code that uses your code. That’s a long, complicated, error-prone search. And you probably won’t find all of them at first. Someone, somewhere, is using your code without your knowledge. Next time they do an update they’re going to find out that what they have doesn’t work anymore. And they’re going to find out the hard way.

Those are two very real problems. They make life harder on you and your customers/users. But they’re not, in my opinion, the most important reasons. There’s a much more important reason. Still thinking about that email address as a string, what if you have an API that sends an email. It’s going to need, at a minimum, the user name, domain, subject, and body. If you have all of them as type string then you make it easier for your user to get the order of the parameters wrong and not know until some kind of runtime error happens.

How else could it be done?

A better choice is to create a new type. A new type that is specific to your domain. That enforces the limits of your domain. That collects all of the logic that belongs to that domain into one bounded context. That abstracts the implementation of the domain away from the user and focuses on the functionality.

Sticking with the string/email, changing your APIs to take an email address instead of a string solves all of the issues above. Instead of getting an InvalidEmailAddress error from the SendEmail function the user gets an error when they try to create an email address. The problem is very localized. It’s a problem creating the address, not one of 12 possible errors when sending the email.

You never need to remember to check if the input string is a valid email address. You know it is when you get it because every email address created has been validated. Do the construction right and they can’t even send in an uninitialized email address.

If for some reason later you want/need to change from taking a single string to creating an email address from a username and domain you just do it. You can create a new constructor that does whatever you want with whatever validation you think is appropriate. All without impacting your users.

And best of all, this happens at compile time. Get the order of the parameters wrong and the types are wrong. A whole class of possible errors is avoided by ensuring it fails long before it gets deployed.

Because the best way to fix an error is to make sure it doesn’t happen in the first place.