Ability to recover from failure
In tech, the ability to switch off the whole system, or parts of it, and still recover is the first step toward resiliency.
Making lower-level components replaceable is one way to improve both reliability and resiliency.
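A minimal sketch of the two ideas above, assuming a service that depends on a storage component through an interface. `Storage`, `PrimaryStore`, `FallbackStore`, `Service` and the `enabled` switch are hypothetical names for illustration only. Because the service depends on the interface rather than a concrete class, the primary component can be switched off and a replacement takes over; switching it back on restores normal behaviour.

```python
from typing import Dict, Optional, Protocol


class Storage(Protocol):
    """Hypothetical interface for a replaceable lower-level component."""

    def get(self, key: str) -> Optional[str]: ...


class PrimaryStore:
    """Hypothetical primary component that can be switched off."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.enabled = True  # set to False to simulate switching this part off

    def get(self, key: str) -> Optional[str]:
        if not self.enabled:
            raise ConnectionError("primary store is switched off")
        return self.data.get(key)


class FallbackStore:
    """Hypothetical replacement component, e.g. a read-only cache."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)


class Service:
    """Depends only on the Storage interface, so components are swappable."""

    def __init__(self, primary: Storage, fallback: Storage) -> None:
        self.primary = primary
        self.fallback = fallback

    def lookup(self, key: str) -> Optional[str]:
        try:
            return self.primary.get(key)
        except ConnectionError:
            # Primary is unavailable: degrade gracefully to the replacement.
            return self.fallback.get(key)


primary = PrimaryStore()
primary.data["user:1"] = "alice"
service = Service(primary, FallbackStore({"user:1": "alice (cached)"}))

print(service.lookup("user:1"))  # served by the primary
primary.enabled = False
print(service.lookup("user:1"))  # primary switched off, fallback answers
primary.enabled = True
print(service.lookup("user:1"))  # primary back on, normal service resumes
```

The fallback here only covers reads; a real system would also need to decide what happens to writes while the primary is off.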
Observability plays a critical role in resiliency, and root cause analyses (RCAs) help us learn from our failures, so that we better understand what leads to failures and how to recover from them.
terms