I had a different topic in mind for today, but life and the internet have conspired to change it. Today seems to be about refactoring instead. I’m trying to upgrade a Docker image to use some newer libraries, and the definitions of which library versions are used and depended upon are scattered hither and yon. That’s where they’re defined at all, and not just picked by happy accident at the time things were set up. At the same time, I got the latest pre-release part of Kent Beck’s Tidy First? on why you might want to do your tidying at different times in the lifecycle, and saw the Code Whisperer’s article on What is refactoring? So I guess I’ll talk about refactoring instead.
Most of what I’m doing down in the depths of docker base images is refactoring. It’s Tidy FIRST. Moving definitions around and collecting them into fewer places. Using those definitions instead of specifying directly in all of the individual use cases. Making sure things still work. Adding some tests that work before any changes and making sure they still pass after the refactoring. When I get done there are no observable changes. Or at least that’s the goal.
Turns out there are some observable changes. Things didn’t actually work when I started, and the image does me no good if it doesn’t work. So even before tidying comes making it work. The world isn’t as static as some code might like. Some code isn’t as backward compatible as other bits of code would like. Some security systems have been updated and require a different set of keys than they used to. Some things have just moved. All of that needs to be handled. For example, what does
RUN pip3 install --no-cache --upgrade nvidia-ml-py do in a Docker file? It installs the latest version of
nvidia-ml-py, that’s what it does. It did that yesterday, last month, last year, and probably will next year. It’s good that it always does the same semantic thing. Unfortunately, the specific version it installs is going to be different in some of those cases. Which means a docker image built today, using the same version of Docker, and the same Docker file doesn’t always give you the same image. There’s an implicit external dependency in that line. A better choice would be something like
RUN pip3 install --no-cache -r requirements.txt, where requirements.txt specifies which versions of the libraries you want.
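As a sketch, pinning might look something like this. The package version and the path are illustrative, not the actual ones from this image:

```dockerfile
# requirements.txt (illustrative contents):
#   nvidia-ml-py==12.535.133

# Copy the pinned list into the image and install exactly those versions.
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache -r /tmp/requirements.txt
```

Now the versions live in one file under version control, instead of being whatever happened to be latest on the day somebody built the image.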
Which gets us to when to tidy. When you’re building that Docker file, you don’t know which versions of which libraries you want, and getting the latest versions is probably a good place to start. Once it’s working, the Docker image is immutable, so you know the image won’t change. (NB: while the image itself might be immutable, if you’re using tags and expecting consistency, think again.) So this could be an opportunity for Tidy NEVER. The code won’t change. The image won’t change. Don’t spend more time on it than needed. There’s always something else to do, so why tidy?
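That NB about tags deserves a concrete sketch. A tag is just a movable pointer that can be re-pointed at a newer image at any time; a digest reference is immutable. The digest below is a placeholder for illustration, not a real one:

```dockerfile
# Tag reference: "python:3.11" can point at a different image tomorrow.
# FROM python:3.11

# Digest reference: always the exact same image bytes.
# (placeholder digest, for illustration only)
FROM python@sha256:<digest-of-the-image-you-tested>
```

Pinning the base image by digest closes the same kind of implicit external dependency as pinning library versions.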
In this case, it’s because it was reasonable to think that someone might need to update the image in the future. Which means making the process more repeatable would have been a good choice. Which moves us to the realm of Tidy AFTER. In this case, you move faster by not locking the versions until you have things working. Once you have things working, that’s the time to tidy. To use pinned versions. Leaving things in good shape for the maintainer.
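One way to make that tidying stick is a small check that fails whenever a dependency slips in unpinned. A minimal sketch in shell; the requirements.txt contents here are made up for illustration:

```shell
# Create an illustrative requirements file with one pinned dependency.
cat > requirements.txt <<'EOF'
# runtime dependencies
nvidia-ml-py==12.535.133
EOF

# Strip comments and blank lines, then flag any line without '=='.
# A line lacking '==' means that version is floating, not pinned.
if grep -vE '^[[:space:]]*(#|$)' requirements.txt | grep -qv '=='; then
  echo "unpinned dependency found"
else
  echo "all dependencies pinned"
fi
```

Run in CI, a check like this turns “someone forgot to pin” from a silent future surprise into an immediate build failure.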
But that wasn’t done. So here I am now, doing Tidy FIRST. Not just tidying. Not just the classic refactor of making the change easy, then making the change. First I have to make it work. Figure out the right versions. Make sure they’re being used. Get it working again. Then I can do some tidying. Then do some more tidying, making sure things still work. Then, and only then, make the change.
Because, as the Code Whisperer said,
a refactoring is a code transformation that preserves enough observable behavior to satisfy the project community
That’s what a refactoring is. Sometimes though, you have to make it work before you can do one.