Culture, Trust, and Reliability

Modern software grows without bound in complexity, defying understanding. Unknown and changing relationships between components surface as surprising failures. Technologists, so focused on automation, lose sight of their own collective role in these systems. It is precisely trust among software communities that will allow us to navigate the vast complexity of our own making.

I think of the following collection as a web of related facts just as entangled as the systems we create.

* No one understands our complex system. It is already more complex than one person can understand.

* Incidents are, by their very existence, behaviors of our system that catch us by surprise.

* Fault tolerance in our system is created by replacing single points of failure with various kinds of redundancy.

* Incidents in highly redundant systems always involve simultaneous failure of related components. Often, the relationship is only discovered because of the incident.

* Incidents involving components that cross team boundaries are especially challenging because they require coordination between members of all involved teams.

* Balancing all of these forces requires human coordination.

* Human relationships depend on trust.

* Building trust with other humans requires being vulnerable, especially being mutually vulnerable.

* Working deeply in diversity, equity, and inclusion creates myriad opportunities to practice supporting one another in our vulnerability.

* Complexity grows combinatorially. We must absolutely practice building trust within and between teams to have any hope of keeping all these components moving and growing.
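
To make "grows combinatorially" concrete, here is a small sketch in Python. It assumes a simplified model that is not part of the original list: every pair of components is a potential relationship, and any group of three components is a potential simultaneous failure. The function names are illustrative only.

```python
from math import comb


def pairwise_relationships(n: int) -> int:
    """Number of distinct pairs among n components (assumed model)."""
    return comb(n, 2)


def simultaneous_failure_sets(n: int, k: int) -> int:
    """Number of ways exactly k of n components could fail together."""
    return comb(n, k)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, pairwise_relationships(n), simultaneous_failure_sets(n, 3))
    # 10 45 120
    # 100 4950 161700
    # 1000 499500 166167000
```

Under these assumptions, a hundredfold increase in components yields roughly ten thousand times as many pairs and over a million times as many triples, which is far more than any one person or team can hold in mind.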


John Allspaw draws attention to all the humans above the line of representation and all the technology below the line of representation. See Human Performance in Systems.

Peter Alvaro dives into combinatorial explosion and suggests we use distributed tracing to graph the dependencies in our complex systems, then use SAT solvers over the resulting graph to identify targets for fault injection. See Twilight of the Experts.
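
The idea of searching a dependency graph for fault injection targets can be sketched roughly as follows. This is a toy illustration and not Alvaro's implementation: the component names and traces are hypothetical, and a brute-force search over small sets stands in for the SAT solver. Each traced path is a set of components that together served a request successfully; a candidate target is a set of components whose simultaneous failure would break every known path.

```python
from itertools import combinations

# Hypothetical traces: each set holds the components one successful
# request path depended on. Redundancy appears as alternative paths.
SUPPORT_PATHS = [
    {"api-a", "db-primary"},
    {"api-b", "db-primary"},
    {"api-a", "db-replica"},
    {"api-b", "db-replica"},
]


def minimal_fault_sets(paths, max_size=3):
    """Return the smallest component sets that intersect every path.

    Brute force stands in for a SAT solver here (an assumption made
    for illustration); it is only practical for tiny graphs.
    """
    components = sorted(set().union(*paths))
    for size in range(1, max_size + 1):
        hits = [
            set(candidate)
            for candidate in combinations(components, size)
            if all(set(candidate) & path for path in paths)
        ]
        if hits:
            return hits
    return []


if __name__ == "__main__":
    print(minimal_fault_sets(SUPPORT_PATHS))
```

In this toy graph no single failure breaks every path, but failing both API instances or both databases does. Those pairs are the kind of related components whose simultaneous failure is worth injecting deliberately, before an incident reveals the relationship.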