by Leon Rosenshein

How Big Is That File Anyway?

There are lots of hard problems in computing, but you wouldn't think counting bytes is one of them. Counting bytes is easy, right? If you want to know how big a file is, just count the number of bytes in it.

Or maybe not. It depends on what you're counting and what you're going to do with the number. If you want to know how many bytes of RAM it will take to hold the data in a file (assuming it's just a blob of data) then that might be correct. But what if it's a compressed file?
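To make the compressed-file case concrete, here is a minimal Python sketch. It assumes gzip compression and a made-up file name (`blob.gz`); neither comes from any particular system. It writes some data to a gzip file, then compares the number of bytes in the file with the number of bytes you would need in RAM to hold the decompressed contents.

```python
import gzip
import os
import tempfile
from typing import Tuple


def file_and_ram_sizes(payload: bytes) -> Tuple[int, int]:
    """Write payload to a gzip file; return (bytes in file, bytes once decompressed)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blob.gz")  # hypothetical file name
        with gzip.open(path, "wb") as f:
            f.write(payload)
        # What a naive "count the bytes in the file" reports.
        file_bytes = os.path.getsize(path)
        # What you actually need in memory to hold the data.
        with gzip.open(path, "rb") as f:
            ram_bytes = len(f.read())
    return file_bytes, ram_bytes


file_bytes, ram_bytes = file_and_ram_sizes(b"a" * 1_000_000)
print("bytes in file:", file_bytes)
print("bytes in RAM: ", ram_bytes)
```

For highly repetitive data like this, the two numbers differ by orders of magnitude, so which one is "the size" depends entirely on the question being asked.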

Or maybe you want to know how much disk space you'll get back if you delete it. In that case you need the number of bytes in the file, rounded up to the next whole multiple of the block size, because your disk allocates space by the block. Different OSes and different devices have different block sizes, so the space used on one storage device could be different from that on another, even on the same computer.
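The rounding described above can be sketched in a few lines of Python. The helper names here are invented for illustration, and the 4096-byte block size in the usage line is an assumption, not a universal value. The second function asks the OS what it actually allocated. It relies on `st_blocks`, which POSIX systems report in 512-byte units and which is not available on Windows, so the sketch returns `None` there.

```python
import os
from typing import Optional, Tuple


def round_up_to_block(n_bytes: int, block_size: int) -> int:
    """Smallest multiple of block_size that is >= n_bytes."""
    return -(-n_bytes // block_size) * block_size  # ceiling division


def apparent_and_allocated(path: str) -> Tuple[int, Optional[int]]:
    """Return (bytes in the file, bytes the filesystem says it allocated)."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX, regardless of the
    # filesystem's real block size. It is missing on Windows.
    blocks = getattr(st, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


# A 1-byte file still costs a whole block (4096 is an assumed block size).
print(round_up_to_block(1, 4096))
print(apparent_and_allocated(__file__) if "__file__" in globals() else "no file to inspect")
```

The allocated figure will not always match the simple rounding: some filesystems store tiny files inline or delay allocation, which is one more reason the answer depends on where you ask.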

And you can't forget the overhead of actually remembering where you put that file and its blocks. That bookkeeping data gets stored on the disk somewhere too, usually with multiple copies.

If you're trying to figure out where your disk space went, things get even more complicated. How do you count a soft link? What about a hard link? What are you really measuring: the disk space used by a directory, or how much data you would transfer if you copied the directory?
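One way to see the difference between those two questions is to compute both totals in the same pass. The sketch below is an illustration under stated assumptions, not a replacement for a real disk-usage tool: it assumes a POSIX-style filesystem where `(st_dev, st_ino)` identifies a file, it skips soft links entirely, and it treats "bytes to copy" as what a naive copy would transfer if it wrote every hard-linked name as a separate file. The function name is made up for this example.

```python
import os
import stat
from typing import Set, Tuple


def directory_sizes(root: str) -> Tuple[int, int]:
    """Return (bytes a naive copy would transfer, bytes of unique file data)."""
    seen: Set[Tuple[int, int]] = set()
    copy_bytes = 0
    unique_bytes = 0
    # os.walk does not follow soft links to directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # lstat: look at the link itself
            if not stat.S_ISREG(st.st_mode):
                continue  # skip soft links, sockets, devices, and so on
            copy_bytes += st.st_size
            key = (st.st_dev, st.st_ino)  # the same key means the same underlying file
            if key not in seen:
                seen.add(key)
                unique_bytes += st.st_size
    return copy_bytes, unique_bytes
```

For a directory containing a 1000-byte file, a hard link to it, and a soft link to it, this returns (2000, 1000). Both numbers are defensible answers to "how big is that directory?"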

And what if you have file versioning enabled at the OS level? Windows Shadow Copy, ZFS snapshots, and LVM snapshots all take space. Is that space included in the file size? Should the older versions be deleted when you delete the file? Replicated file systems like HDFS make this particularly complicated by sometimes reporting the number of bytes in a file and sometimes reporting the total bytes used for all replicas.

Or, to paraphrase Clausewitz: everything in computing is very simple, but the simplest thing is difficult.