
How It Stays Healthy

The software now reaches users safely. The next question is how it stays healthy once it is running.

Software can fail, slow down, or behave strangely after release. If you cannot see those problems, you cannot fix them quickly.

Healthy software is observable, recoverable, and manageable.

You want to notice problems early, understand what is happening, and restore service without making things worse.

Weak operations | Strong operations
Problems are hidden | Problems are visible
Slow recovery | Fast recovery
No clear ownership | Clear response path
Hard to learn from failures | Failures improve the system

If the clinic reminders stop sending, the team should be able to notice it quickly, identify where the failure is, and restore the service before many appointments are missed.

That same pattern applies to any user-facing system that people depend on every day.
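To make the clinic example concrete, here is a minimal sketch of a staleness check: it flags the reminder service as unhealthy when no reminder has been sent for too long. The function name `check_reminder_health`, the 15-minute threshold, and the shape of the report are assumptions made for illustration, not part of any particular monitoring tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long the reminder job may stay silent
# before it is treated as unhealthy. Choose a value that fits the real schedule.
MAX_SILENCE = timedelta(minutes=15)


def check_reminder_health(last_sent_at, now, max_silence=MAX_SILENCE):
    """Return a small status report based on the time since the last reminder."""
    silence = now - last_sent_at
    healthy = silence <= max_silence
    return {
        "healthy": healthy,
        "minutes_since_last_send": int(silence.total_seconds() // 60),
        "message": "ok" if healthy else "no reminders sent recently; alert the owner",
    }


# Fixed timestamps keep the example deterministic.
now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
recent = check_reminder_health(now - timedelta(minutes=5), now)
stale = check_reminder_health(now - timedelta(minutes=40), now)
```

A check like this could run on a schedule and notify whoever owns the service, so the team hears about the problem from the system rather than from patients who missed appointments.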

Common mistakes:

  • Waiting for users to report everything.
  • Logging too little useful information.
  • Not knowing who responds.
  • Treating incidents as surprises instead of expected events.

Questions to ask:

  • Can you tell when something is wrong?
  • Can you find the cause quickly?
  • Can you recover safely?
  • Do you learn from failures?
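The point about logging too little useful information can be made concrete. The sketch below writes one structured log line per failure, carrying the context a responder would need to find the cause. The logger name `reminders`, the event name, and the fields `appointment_id` and `channel` are hypothetical; only Python's standard `logging` and `json` modules are used.

```python
import json
import logging

logger = logging.getLogger("reminders")  # hypothetical logger name


def format_event(event, **context):
    """Build one JSON log line that includes the context around an event."""
    return json.dumps({"event": event, **context}, sort_keys=True)


def log_send_failure(appointment_id, channel, error):
    """Log a failed reminder with enough detail to trace it later."""
    line = format_event(
        "reminder_send_failed",
        appointment_id=appointment_id,
        channel=channel,
        error=str(error),
    )
    logger.error(line)
    return line


# Example values are invented for illustration.
line = log_send_failure("appt-123", "sms", TimeoutError("provider timed out"))
```

Compared with a bare "send failed" message, a line like this tells the responder which appointment was affected, which channel failed, and what the error was.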

Pick one system and ask:

  • What would tell you it is unhealthy?
  • Who would respond?
  • What would help you recover?

Healthy software is visible, responsive, and recoverable.

Next, learn how it changes over time.

