How Third Grade Arithmetic Can Sink Your Software Architecture


Do you really understand single points of failure?

A Single Point of Failure (SPOF) will cause an entire system to fail, even if it’s the only broken component. The Power of Multiplying By Zero is a mental model we can use to better understand SPOFs. Consider these two expressions:

865309 x 42 x 1984 x 221 x 0

865309 + 42 + 1984 + 221 + 0

Assuming you have beyond a third-grade education (or your country’s equivalent), the second expression may take you a moment, but the first expression should evaluate itself.

The answer is ZERO.

No matter how large the remaining numbers are, only one multiplication by zero wipes everything out. This is no different than saying, “a chain is only as strong as its weakest link.” We call these Multiplicative Systems.

Distributed systems give us another example. We know the maximum theoretical availability of a node is the product of its collaborating nodes’ availabilities. Four nines (99.99%) of availability is a pretty decent goal. But these collaborating nodes compose a multiplicative system:

99.99% x 99.99% = 99.98%

Even if both collaborators meet their availability targets, our node can only hope to achieve three nines (99.9%) at best!

Consider these examples of real-world systems defeated by the Power of Multiplying By Zero:

  • On Christmas Eve 2012, the viewing experience of multiple Netflix customers was multiplied by zero. Their devices were unable to contact load balancers wiped out by a maintenance script executed by accident.
  • In February 2017, multiple sites including Quora, Trello, and IFTTT, were multiplied by zero due to a massive AWS Simple Storage Service (S3) failure triggered by a single typo. S3 had historically been so stable most engineers never considered failure a possibility.
  • Also in 2017, the data privacy of 143 million Equifax customers was multiplied by zero due to dozens of services containing an unpatched version of Apache Struts. The vulnerable services allowed attackers to execute arbitrary commands on Equifax’s servers.

As you can see, multiplicative systems aren’t just theoretical. So as you design your next system, continually ask yourself: “What would happen if we multiplied this component by zero?”