by Leon Rosenshein

Global Data(base)

A database is a giant global variable that pervades your code. How, exactly, is that a good thing?

  -- Allen Holub

Well. That’s a pretty bold statement. Let’s break it down and see how true it is.

A database is certainly a “global” variable. Or at least it is in the sense that any code with the correct permissions/connection string can connect to it and read/write data. That does make it global. Can’t really argue with that.

Especially if you only have one database. With one (widely) shared secret that lots of different things use to connect to that database. To read/write from the same table with a magic set of filters. Or lots of different tables that all have the same unrestricted permissions.

In that case your DB is a SPOF, a single point of failure. SPOFs are almost never a good thing. They’re rarely completely unavoidable, but sometimes eliminating one costs more (however you define cost) than the elimination is worth, so you live with it, even though it’s not a good thing. So that part is true too.

Which leaves us with the middle part, “pervades your code.” That’s an interesting one. And about as subjective as you want to make it. Liberally sprinkling your codebase with code (not function calls) that opens a connection to a database, reads/writes some data, then closes the connection and moves on is almost certainly a bad thing. Can’t really argue with that.
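
To make that concrete, here’s a minimal Go sketch of the anti-pattern. Everything in it, the table, the connection string variable, the function name, is invented for illustration:

```go
package legacy

import (
	"database/sql"
	"os"

	_ "github.com/lib/pq" // Postgres driver, assumed for illustration
)

// Hypothetical business logic that talks to the database inline.
// Every function written this way couples itself to the global
// connection string, the schema, and the database's availability.
func updateUserEmail(userID int64, email string) error {
	db, err := sql.Open("postgres", os.Getenv("DB_CONN")) // the widely shared secret
	if err != nil {
		return err
	}
	defer db.Close()

	_, err = db.Exec("UPDATE users SET email = $1 WHERE id = $2", email, userID)
	return err
}
```

Multiply that by every function that needs a piece of data and the database really does pervade your code.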

So databases are bad and we should just stop using them, right? Of course not.

It’s about using the right tool for the job, and using it the right way. If you have one service/tool that needs to deal with a few hundred kilobytes of inter-related data, the right tool is probably a clear data model with an API designed for how the data is actually used. Behind that data model, put an in-memory set of maps and a little bit of logic to connect them. If you need to get really fancy, add some kind of write-through mode so any changes get persisted for the next time you start up. No (formal) database involved.
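
As a sketch, assuming a hypothetical user store (the type, fields, and file format are all invented for illustration), that might look something like this:

```go
package main

import (
	"encoding/json"
	"os"
	"sync"
)

// User is a hypothetical record type.
type User struct {
	ID    int64  `json:"id"`
	Email string `json:"email"`
}

// UserStore is the data model: a small API over in-memory maps, with an
// optional write-through file so changes survive a restart.
type UserStore struct {
	mu    sync.RWMutex
	users map[int64]User
	path  string // if non-empty, every write is persisted here
}

// NewUserStore loads any previously persisted data, then serves from memory.
func NewUserStore(path string) (*UserStore, error) {
	s := &UserStore{users: map[int64]User{}, path: path}
	if data, err := os.ReadFile(path); err == nil {
		if err := json.Unmarshal(data, &s.users); err != nil {
			return nil, err
		}
	}
	return s, nil
}

// Get is shaped by how the data is actually used; callers never touch the map.
func (s *UserStore) Get(id int64) (User, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	u, ok := s.users[id]
	return u, ok
}

// Put updates memory first, then writes through so the change persists.
func (s *UserStore) Put(u User) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.users[u.ID] = u
	if s.path == "" {
		return nil
	}
	data, err := json.Marshal(s.users)
	if err != nil {
		return err
	}
	return os.WriteFile(s.path, data, 0o644)
}
```

The point is the API. Callers ask for users; they don’t reach into maps, tables, or files.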

What if a bunch of things need access to that data? Put it in a (micro?) service. Same API, but now accessible from lots of things at the same time, regardless of what process they’re in. Still no formal DB involved.
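
Continuing the same hypothetical sketch, a second file in the same package could put that store behind HTTP. Nothing about the data model or its API changes:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"strconv"
)

// main wraps the same UserStore API in a tiny HTTP service. Other
// processes now share the data through the API, not through a database.
func main() {
	store, err := NewUserStore("users.json")
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/users", func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet:
			id, err := strconv.ParseInt(r.URL.Query().Get("id"), 10, 64)
			if err != nil {
				http.Error(w, "bad id", http.StatusBadRequest)
				return
			}
			if u, ok := store.Get(id); ok {
				json.NewEncoder(w).Encode(u)
				return
			}
			http.NotFound(w, r)
		case http.MethodPost:
			var u User
			if err := json.NewDecoder(r.Body).Decode(&u); err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			if err := store.Put(u); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
			}
		}
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```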

Eighteen months later, things are going so well that you’ve got hundreds of gigabytes of active, interrelated data. Read/write rates are up. You can’t afford to lose changes, and the ability to undo a set of changes as a unit is now important. So you start to build that in. Then you get smart, and instead of reinventing the wheel you put your data in a traditional database.
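
What you’re actually buying at that point is durability and transactions. A hedged sketch, with a hypothetical accounts table standing in for whatever your data really is:

```go
package ledger

import "database/sql"

// transfer shows the part worth not reinventing: two writes that must
// succeed or fail as a unit. The database's transaction machinery does
// the undo-a-set-of-changes-as-a-unit work for you.
func transfer(db *sql.DB, from, to, amount int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // a no-op once Commit succeeds

	if _, err := tx.Exec(
		"UPDATE accounts SET balance = balance - $1 WHERE id = $2",
		amount, from); err != nil {
		return err
	}
	if _, err := tx.Exec(
		"UPDATE accounts SET balance = balance + $1 WHERE id = $2",
		amount, to); err != nil {
		return err
	}
	return tx.Commit() // both updates persist, or neither does
}
```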

So was Allen right? It depends. Did you need a 256-core MS-SQL server with 512 GB of RAM and 2 PB of SSD to start out with? Of course not. You didn’t even need a simple AWS RDS instance. So in that sense, he was right.

On the other hand, a database is just a collection of data you access via computer, so pretty much everything is a database. So in the most fundamental sense, he was completely wrong. You already have a database, because storing and accessing data is what programs do.

Like everything though, context matters. It was a tweet, written for retweetability and impact. As a conversation starter it was effective. So the next time you go to create a new AWS RDS instance, think about whether you need that much database yet. You still need to do all the other hard parts of design, but do you need that bit of complexity? I can’t tell you for sure, but I do know it’s worth thinking about.