by Leon Rosenshein

Listen to Your Data

I’ve touched on the importance of listening to your data before, but I decided that the topic is worth revisiting. That time it was about the difference between 0, 1, and many. As a side note, I mentioned the relationship between data and Object Oriented Programming, and how your data can tell you what your objects are.

That’s still true. When people ask me to take a look at their design and architecture and wonder what the right answer is, my first answer is usually, of course, It Depends. And when they ask what it depends on, I generally say it’s a combination of two things: the problem you’re trying to solve, and your data. They’ve usually thought about the problem they’re solving, but they often haven’t thought about the data. So I tell them, go listen to your data and figure out what it’s trying to tell you.

It’s about boundaries and bounded contexts. It’s about composition vs inheritance. It’s about cardinality, what things change together and what things change by themselves. It’s all of those things. How you store, access, and change your data will help you design systems that work with your data instead of fighting against it. From your internal and public APIs to your user interfaces to your logging and alerting.

But it’s also more than that. You have to listen to your data not only about how you store, access, and change it, but about how you process it. What parts do you need to process sequentially, and what parts can you process in parallel? Are you processing real-time streams, or are you analyzing huge piles of historical data? Do you want/need recency bias in your data or do you need to have long term trends? Or maybe both? All of this is going to impact your system.
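To make the recency question concrete, here is a minimal sketch in Python. It assumes a stream of plain numeric samples; the function names `running_mean` and `ewma` and the `alpha` parameter are made up for illustration. The same stream gets summarized two ways: a running mean that weighs every sample equally (the long term trend) and an exponentially weighted average that leans toward the newest samples (recency bias).

```python
def running_mean(samples):
    """Long-term view: every sample counts equally, however old it is."""
    total = 0.0
    means = []
    for count, value in enumerate(samples, start=1):
        total += value
        means.append(total / count)
    return means


def ewma(samples, alpha=0.5):
    """Recency-biased view: each new sample gets weight alpha,
    and everything older decays by (1 - alpha) per step."""
    current = None
    averages = []
    for value in samples:
        if current is None:
            current = float(value)  # first sample seeds the average
        else:
            current = alpha * value + (1 - alpha) * current
        averages.append(current)
    return averages


# Hypothetical stream: steady readings, then a sudden spike.
readings = [10, 10, 10, 10, 50]
print(running_mean(readings)[-1])  # 18.0, the spike barely moves the long-term view
print(ewma(readings)[-1])          # 30.0, the recent spike dominates
```

Neither summary is the right one in general. Which one you reach for, or whether you keep both, is exactly the kind of thing the data and the problem have to tell you.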

The trick is to learn to listen to your data at small scale, where you have the luxury of being able to try out something and see what the pain points are while you’re still able to get things to work. Try different data structures. See what kind of algorithms they push you towards. See what makes them work well, and what gets in the way. You can usually make any data structure work with any algorithm, but some things work better together. Trees lend themselves to depth first searches. There are other ways to do depth first, but it’s a lot easier with a tree than with an array.
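Here is a small sketch of that tree versus array point, in Python. It is only an illustration under a few assumptions: the `Node` class, the two function names, and the parent-index array layout (where `parents[i]` is the index of item `i`’s parent and `-1` marks the root) are all invented for this example. Both functions do the same pre-order depth first walk over the same data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    value: str
    children: list["Node"] = field(default_factory=list)


def dfs_tree(node):
    """With a tree, the recursion simply follows the shape of the data."""
    visited = [node.value]
    for child in node.children:
        visited.extend(dfs_tree(child))
    return visited


def dfs_parent_array(values, parents):
    """With a flat array of parent indexes, we first have to rebuild
    who the children are, then manage our own stack."""
    children = [[] for _ in values]
    root = -1
    for index, parent in enumerate(parents):
        if parent == -1:
            root = index
        else:
            children[parent].append(index)

    visited = []
    stack = [root] if root != -1 else []
    while stack:
        index = stack.pop()
        visited.append(values[index])
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(children[index]))
    return visited


# The same four items, stored two ways: a -> (b -> d), c
tree = Node("a", [Node("b", [Node("d")]), Node("c")])
values = ["a", "b", "c", "d"]
parents = [-1, 0, 0, 1]

print(dfs_tree(tree))                     # ['a', 'b', 'd', 'c']
print(dfs_parent_array(values, parents))  # ['a', 'b', 'd', 'c']
```

Both versions work and give the same answer. The array version just spends most of its code reconstructing structure the tree already had.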

One of the hard parts about learning like this is having a source of problems that have answers, so you can check yourself. One possible source is an undergraduate comp-sci class. In many cases you can find an online class with problems and valid answers. Another is interview prep systems, like leetcode problems. In general, I hate leetcode as an interview technique, for lots of reasons that I’ll get into another time, but as a learning opportunity, I think it’s a great place to start. Or, if you want a bit of competition to spur you on, another good place is the Advent of Code. Once you’re done speed-running it for points and have a working answer, take some time to experiment with the problem space.

Regardless of how you do it, once you learn how to listen to your data, you’ll hear it talking to you in unexpected ways. You’ll be able to look at a problem with large scale data and see how to break it down, categorize it, and work with it. So your solution works with your data. Not just today, but tomorrow as well.