Adds docs/RUNBOOK-OPERATIONS.md as the single operator-facing entry point for
provisioning, troubleshooting, and recovering Catalyst Sovereigns:

A. Pre-provision checklist: Hetzner project + token, Dynadot pool zones +
credentials, GHCR pull token (cross-link SECRET-ROTATION.md), PowerDNS pool
zones bootstrapped, PDM healthy, bp-* chart versions, subchart-guard CI green.

B. Step-by-step walkthrough with timing: Phase 0 OpenTofu (30-60 s plan +
60-120 s apply), PDM /commit (~5 s), cloud-init (3-5 min), Phase 1
bootstrap-kit (10-15 min), cert-manager + Cilium Gateway (1-2 min). Total
15-25 min for a solo Sovereign.

C. 18 known failure modes, each documented as SYMPTOM / ROOT CAUSE /
DIAGNOSIS / RECOVERY and pinned to its canonical fix commit (one of
c6cbfe68, e571ec7a, 54872009, 2022e1af, 34c8de84, dddbab4b, 43aff202,
418cead0, 64d7de97, 330211d2, 41c7ac13), or marked fix-in-flight where no
fix has landed yet.

D. Idempotent recovery script: purges Hetzner resources (followed by a
verification sweep for resources that persist even after DELETE returns
204), releases the PDM allocation, and cancels the catalyst-api deployment
record. Dry-run by default; --apply gates real deletes on a validated
HETZNER_API_TOKEN.

E. Cross-links to INVIOLABLE-PRINCIPLES, SOVEREIGN-PROVISIONING,
RUNBOOK-PROVISIONING, BLUEPRINT-AUTHORING, CHART-AUTHORING, SECRET-ROTATION,
PLATFORM-POWERDNS, and IMPLEMENTATION-STATUS: references them rather than
duplicating their content.

F. Mermaid phase timeline diagram at the top, showing ownership boundaries
(catalyst-provisioner -> cloud-init -> Sovereign cluster) and hand-off points.

G. Mermaid failure decision tree at the end that routes operators to the
matching §C entry in 4-6 yes/no questions.
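
The per-phase ranges in B can be summed to check the quoted total. This is a minimal Python sketch: the dictionary keys are labels copied from B, and the "~5s" PDM /commit is assumed to be a flat 5 s at both ends of the range.

```python
# Per-phase duration ranges from section B, in seconds: (low, high).
PHASES = {
    "Phase 0 OpenTofu plan": (30, 60),
    "Phase 0 OpenTofu apply": (60, 120),
    "PDM /commit": (5, 5),  # "~5s" treated as a flat 5 s (assumption)
    "cloud-init": (3 * 60, 5 * 60),
    "Phase 1 bootstrap-kit": (10 * 60, 15 * 60),
    "cert-manager + Cilium Gateway": (1 * 60, 2 * 60),
}


def total_minutes(phases):
    """Sum the low and high ends separately and convert to minutes."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low / 60, high / 60


if __name__ == "__main__":
    low, high = total_minutes(PHASES)
    print(f"end-to-end: {low:.1f}-{high:.1f} min")
```

Running it prints `end-to-end: 15.6-25.1 min`, consistent with the 15-25 min total quoted in B.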

In dry-run mode, the recovery script degrades gracefully to a name-only
preview when HETZNER_API_TOKEN is unset, so operators can review what WOULD
happen before exporting the token. Apply mode still hard-fails on a missing
or invalid token.
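
The mode gating and the DELETE-204-but-resource-persists sweep described in D can be sketched as follows. This is a minimal Python illustration, not the script itself. `resolve_mode`, `purge_with_sweep`, `TokenError`, and the injected `delete` / `still_exists` / `token_is_valid` callables are hypothetical names, and no real Hetzner API calls are made. It also assumes that a dry-run with a token set does not validate the token, and it treats a 404 on DELETE as already gone.

```python
from typing import Callable, Iterable, List, Optional


class TokenError(RuntimeError):
    """Raised when --apply is requested without a usable token."""


def resolve_mode(apply: bool, token: Optional[str],
                 token_is_valid: Callable[[str], bool]) -> str:
    """Return "apply", "dry-run", or "dry-run-names-only"."""
    if apply:
        # Apply mode hard-fails on a missing or invalid token.
        if not token:
            raise TokenError("HETZNER_API_TOKEN is unset")
        if not token_is_valid(token):
            raise TokenError("HETZNER_API_TOKEN failed validation")
        return "apply"
    # Assumption: dry-run does not validate the token. Without one,
    # it can only preview resource names.
    return "dry-run" if token else "dry-run-names-only"


def purge_with_sweep(resource_ids: Iterable[str],
                     delete: Callable[[str], int],
                     still_exists: Callable[[str], bool]) -> List[str]:
    """Delete every resource, then re-check all of them.

    A 204 from DELETE is not trusted on its own. Anything that still
    exists after the deletes is returned so the caller can retry or
    escalate. A 404 is treated as already gone, which keeps reruns idempotent.
    """
    ids = list(resource_ids)
    for rid in ids:
        status = delete(rid)
        if status not in (200, 204, 404):
            raise RuntimeError(f"DELETE {rid} returned {status}")
    # Verification sweep: one pass over all ids after the deletes finish.
    return [rid for rid in ids if still_exists(rid)]


if __name__ == "__main__":
    print(resolve_mode(apply=False, token=None, token_is_valid=lambda t: True))
    # -> dry-run-names-only

    # Fake backend: "lb-1" answers 204 but is still present afterwards.
    live = {"srv-1", "net-1", "lb-1"}

    def fake_delete(rid: str) -> int:
        if rid != "lb-1":
            live.discard(rid)
        return 204

    print(purge_with_sweep(sorted(live), fake_delete, lambda rid: rid in live))
    # -> ['lb-1']
```

The sweep runs once after all deletes instead of trusting each 204. That single pass is what surfaces the lingering `lb-1` in the fake backend.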

Verified the dry-run output against the live omantel.omani.works Sovereign:
- Step 1 lists 8 Hetzner resource kinds plus 8 verification-sweep targets
  to inspect
- Step 2 confirms PDM currently reports the subdomain as RESERVED (live
  state)
- Step 3 correctly identifies catalyst-api deployment 6274daeb7a9873cd

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>