by Leon Rosenshein


A SPOF is a single point of failure. It’s that one little thing that everything else depends on and that has no redundancy. It’s like building an entire electric grid to make sure you have power available, adding an external generator and battery backup in case the grid fails, and building your house with multiple circuits to each room, then having a single line going from the battery/generator/grid multiplexer to the house. If that single line fails, you’re out of power. At least in that case, if the line fails, you can probably run an extension cord from the working power supply directly to the house.

Now consider the James Webb Space Telescope (JWST). As I write this, the JWST is about 500,000 miles from Earth and is going through a complex, complicated process to get ready for its real work. One of the biggest things it needs to do is unfold its sunshield so the cold side stays cold. The sunshield, when unfolded, is the size of a tennis court and consists of 5 layers of incredibly thin (< 0.05 mm) Kapton film. What makes it complicated is that the sunshield is packed for travel: it, along with the rest of the satellite, needs to fit into a 5 x 15 meter fairing for launch. What makes it complex is that during the unfolding process everything needs to move together, at the right speed and with the right force, to avoid wrinkles and tears. On top of that, it needs to do it in a vacuum and in microgravity. So you end up with 344 SPOFs.

Since it’s 500,000 miles away, in conditions that can’t be replicated reliably for any meaningful duration, testing in real conditions is hard. You can do lots of unit tests. You can do some underwater tests to approximate microgravity. But integration tests, not so much. So you plan. And you plan some more. You consider failure modes and build in contingencies. Then you build contingencies for your contingencies. And after all that, and $10,000,000,000, you launch with 344 SPOFs. In that case you need way more than 2 9’s of confidence in each of your SPOFs. You need 4 or 5. That’s impressive.
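To see why 2 9’s per SPOF isn’t close to enough, here is a rough back-of-the-envelope sketch. It assumes (my simplification, not anything from the JWST team) that all 344 SPOFs are independent, equally reliable, and that every one of them has to work, so the overall odds are just the per-SPOF reliability raised to the 344th power. The function names are made up for illustration.

```python
# Back-of-the-envelope sketch. Assumptions: every SPOF is independent,
# all have the same reliability, and all of them must succeed.

SPOF_COUNT = 344  # the number quoted for the JWST deployment


def nines_to_probability(nines: int) -> float:
    """Convert a count of 9's to a success probability, e.g. 2 -> 0.99."""
    return 1.0 - 10.0 ** (-nines)


def system_reliability(per_component: float, n_components: int) -> float:
    """Chance that n independent components in series all succeed."""
    return per_component ** n_components


for nines in (2, 4, 5):
    p = nines_to_probability(nines)
    overall = system_reliability(p, SPOF_COUNT)
    print(f"{nines} 9's per SPOF -> {overall:.4f} chance everything works")
```

Under those assumptions, 2 9’s per SPOF gives you only about a 3% chance that everything works. At 4 9’s it’s roughly 97%, and at 5 9’s it’s about 99.7%.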

So next time someone says it’s too hard to get 4 9’s on a highly available system, remind them of the JWST, which got that kind of reliability with 344 SPOFs. Doing it Earthside, where you can touch things, change them, and have redundancy, should be (relatively) easy.