Multi-Region DNS — health-checked failover with PowerDNS lua-records
Status: Authoritative. Updated: 2026-04-29 (Reconcile Pass 1).
This document is the canonical reference for how Catalyst routes traffic across regions. Geographic redundancy in OpenOva is realized at the authoritative DNS layer, not at the K8s controller layer. PowerDNS lua-records (ifurlup, ifportup, pickclosest, pickrandom, pickwhashed) provide everything Catalyst needs:
- Geo-aware response selection — answer the closest healthy backend for the resolver's source IP / ECS subnet.
- Health-checked failover — drop a backend from the response set when a TCP/HTTP probe fails, restore it when the probe recovers.
- Latency-aware routing — combine ifurlup (health) with pickclosest (geo) for active-active steering.
- Same operational layer Catalyst already runs — PowerDNS is bp-powerdns, deployed by the bootstrap kit on every Sovereign's mgt cluster. No separate operator, no extra CRDs, no extra reconciliation loop.
This subsumes the role previously assigned to k8gb. The k8gb component has been removed from componentGroups.ts, the umbrella chart, and the wizard; lua-records cover every failover scenario k8gb covered without the dedicated GSLB controller.
1. Why PowerDNS lua-records (and why not k8gb)
| Concern | k8gb (removed) | PowerDNS lua-records (current) |
|---|---|---|
| Authoritative DNS | CoreDNS plugin, separate zone | PowerDNS authoritative — same zones used for external-dns, ACME, etc. |
| Operator footprint | k8gb controller + CRDs (Gslb, GslbHttpRoute) + per-cluster CoreDNS pod set | None — declarative LUA records in the existing PowerDNS zone |
| Health-check primitive | k8gb-managed liveness probes | PowerDNS ifurlup / ifportup (HTTP / TCP probes from PowerDNS pods) |
| Geo selection | EdgeDNS witness + custom logic | pickclosest (geo by source IP), pickrandom (RR), pickwhashed (sticky weighted) |
| DNSSEC | Layered on top, separate signer | Native — PowerDNS signs the lua-record's computed answer with the zone's KSK/ZSK |
| Operational surface | k8gb pods + CoreDNS pods + custom CRDs | Existing PowerDNS deployment + dnsdist rate-limit shield |
| Cluster-coordination | Required (gslb endpoints sync between clusters) | Not required — authoritative DNS is the source of truth |
The architectural cost difference is large enough that the deletion is the right move per INVIOLABLE-PRINCIPLES.md #2 ("never compromise from quality — pick the unified primitive, not the dual-shape design") and #4 ("never hardcode — health probes, weights, geo policy are configuration in the lua-record body, not code in a controller").
2. Failover patterns (the lua-record cookbook)
Every Catalyst Sovereign zone is hosted on PowerDNS. The records below sit alongside ordinary A/AAAA/CNAME records that external-dns writes via the PowerDNS REST API. Lua-record syntax follows the upstream PowerDNS documentation.
Note on examples. Backend IPv4 addresses (5.161.42.18, 95.217.189.42) and the FQDN primary.example.com below are placeholders — they illustrate the lua-record shape only. The canonical 6-record set per Sovereign zone is written by pool-domain-manager (PDM, core/pool-domain-manager/) on /v1/commit; lua-records (geo / health-check policy) are written by the catalyst-dns controller (Catalyst control-plane sidecar) from each Application's Placement spec — see docs/PLATFORM-POWERDNS.md §"In-cluster consumers".
2.1 Active-active across two regions, health-checked
foo.acme.com. IN LUA A "ifurlup('https://primary.example.com/healthz', {'5.161.42.18', '95.217.189.42'}, {selector='all'})"
- PowerDNS HTTP-probes https://primary.example.com/healthz from each PowerDNS pod every 5s (default; configurable via interval option).
- selector='all' returns every healthy backend — the resolver's stub then picks one (typical client behaviour: rotate, retry on failure).
- When the probe to a backend fails three times in a row (default failOnIncerror=true, 3 fails to drop), that backend is removed from the answer set within the next TTL window.
- When the probe recovers, the backend is restored automatically.
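The drop-and-restore behaviour described above can be modelled in a few lines. This is a minimal sketch of the semantics only, not PowerDNS code: the `ProbeTracker` class and its `fail_threshold` parameter are hypothetical names, and the threshold of 3 simply mirrors the "three times in a row" rule stated in this section.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeTracker:
    """Illustrative model of a health-filtered answer set (selector='all')."""

    backends: list[str]
    fail_threshold: int = 3  # assumption: taken from the rule described above
    _fails: dict[str, int] = field(default_factory=dict)

    def record(self, backend: str, ok: bool) -> None:
        # A successful probe resets the counter; a failure increments it.
        self._fails[backend] = 0 if ok else self._fails.get(backend, 0) + 1

    def answer_set(self) -> list[str]:
        # Return every backend that has not reached the failure threshold.
        return [
            b for b in self.backends
            if self._fails.get(b, 0) < self.fail_threshold
        ]


tracker = ProbeTracker(["5.161.42.18", "95.217.189.42"])
for _ in range(3):
    tracker.record("5.161.42.18", ok=False)   # three consecutive failures
dropped = tracker.answer_set()                # first backend is now absent
tracker.record("5.161.42.18", ok=True)        # probe recovers
restored = tracker.answer_set()               # both backends are back
```

A single success resets the counter, which matches the "restored automatically" bullet: no operator action is modelled anywhere in the sketch.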
2.2 Geo-aware active-active (pickclosest)
api.acme.com. IN LUA A "pickclosest({'5.161.42.18', '95.217.189.42'})"
- PowerDNS uses ECS (EDNS Client Subnet) when present, falling back to the resolver's source IP.
- The closer regional LB by GeoIP wins.
- Combine with ifurlup for health-aware closeness:
api.acme.com. IN LUA A "
ifurlup('https://primary.example.com/healthz', {
{'5.161.42.18', '95.217.189.42'}
}, {selector='pickclosest'})
"
2.3 Active-passive (primary → DR)
api.acme.com. IN LUA A "ifurlup('https://primary.example.com/healthz', {'5.161.42.18', '95.217.189.42'}, {selector='pickfirst'})"
- pickfirst returns the first healthy backend in the list.
- When 5.161.42.18 (primary) is healthy → answer is 5.161.42.18.
- When primary fails the probe → answer flips to 95.217.189.42 (DR) within one TTL window.
- When primary recovers → answer flips back to primary on the next probe success.
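The primary-then-DR ordering described in these bullets amounts to "first healthy entry wins". A minimal sketch, assuming the health state is already known: `pick_first` is a hypothetical helper, and returning an empty answer when nothing is healthy is an assumption of this sketch, since the section does not say what happens in that case.

```python
def pick_first(backends: list[str], healthy: set[str]) -> list[str]:
    """Return the first backend (in list order) that is currently healthy."""
    for backend in backends:
        if backend in healthy:
            return [backend]
    return []  # assumption: no healthy backend means an empty answer


ORDERED = ["5.161.42.18", "95.217.189.42"]  # primary first, DR second

normal = pick_first(ORDERED, {"5.161.42.18", "95.217.189.42"})   # primary
failover = pick_first(ORDERED, {"95.217.189.42"})                # DR
failback = pick_first(ORDERED, {"5.161.42.18", "95.217.189.42"}) # primary again
```

The list order carries the policy: swapping the two addresses swaps which region is treated as primary.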
2.4 TCP-only / non-HTTP services (ifportup)
For services that don't expose an HTTP /healthz (e.g. SMTP, IMAP, custom TCP):
mail.acme.com. IN LUA A "ifportup(587, {'5.161.42.18', '95.217.189.42'})"
- PowerDNS attempts a TCP connect to port 587 on each backend.
- Connect-fail → drop from the response set; connect-success → include.
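The connect-success / connect-fail rule can be reproduced with a plain TCP connect, which is handy when checking by hand what a probe would see. A minimal sketch: `port_is_up` and `tcp_answer_set` are hypothetical helper names, and the 2-second timeout is an assumption rather than a documented Catalyst setting.

```python
import socket


def port_is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def tcp_answer_set(port: int, backends: list[str], timeout: float = 2.0) -> list[str]:
    """Keep only the backends that accept a TCP connection on the given port."""
    return [b for b in backends if port_is_up(b, port, timeout)]
```

Calling `tcp_answer_set(587, ['5.161.42.18', '95.217.189.42'])` from a host with the same network view as the DNS pods approximates the answer the record would produce.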
2.5 Weighted round-robin (pickwhashed)
For canary releases or traffic-shifting:
api.acme.com. IN LUA A "pickwhashed({{80, '5.161.42.18'}, {20, '95.217.189.42'}})"
- 80% of distinct client IPs are pinned to 5.161.42.18, 20% to 95.217.189.42 (consistent hash on source IP — the same client gets the same answer until the weight changes).
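The "same client, same answer" property comes from hashing the source IP onto a weighted range. The sketch below illustrates the idea only: it uses SHA-256, which is an assumption chosen for clarity and not the hash PowerDNS uses internally, and `pick_weighted_hashed` is a hypothetical name.

```python
import hashlib


def pick_weighted_hashed(client_ip: str, weighted: list[tuple[int, str]]) -> str:
    """Map a client IP deterministically onto one backend, honouring weights."""
    total = sum(weight for weight, _ in weighted)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    # Project the hash onto [0, total) and walk the weight buckets.
    point = int.from_bytes(digest[:8], "big") % total
    for weight, backend in weighted:
        if point < weight:
            return backend
        point -= weight
    raise ValueError("weights must be non-negative")


CANARY = [(80, "5.161.42.18"), (20, "95.217.189.42")]
first = pick_weighted_hashed("203.0.113.7", CANARY)
second = pick_weighted_hashed("203.0.113.7", CANARY)  # identical to `first`
```

Because the mapping depends only on the client IP and the weight table, a client keeps its backend until the weights are edited, at which point some clients move to the other bucket.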
3. Catalyst integration points
3.1 Where lua-records are written
Lua-records are part of each Sovereign's PowerDNS zone, alongside the canonical 6-record set (PLATFORM-POWERDNS.md §"Per-Sovereign zone model"). The 6-record set is written once at provisioning by pool-domain-manager (PDM /v1/commit); ongoing A/AAAA/CNAME records are written by external-dns; LUA records are written by the catalyst-dns controller (sidecar to the Catalyst control plane on the mgt cluster):
PDM ──► PowerDNS REST API ──► canonical 6-record set (one-shot at provision)
external-dns ──► PowerDNS REST API ──► A/AAAA/CNAME records (per-region LB IPs)
catalyst-dns ──► PowerDNS REST API ──► LUA records (geo / health-check policy)
This separation matters: external-dns knows about a single K8s Service or Ingress; it has no concept of multi-region health policy. The catalyst-dns controller reads the Application's Placement field from the per-Org Gitea repo, sees placement: active-active (or active-hotstandby, etc.), and synthesizes the corresponding lua-record body.
3.2 Application Placement → lua-record selector mapping
| Application Placement | lua-record idiom |
|---|---|
| single-region | Plain A record(s) — no lua-record needed |
| active-active | ifurlup(..., {selector='all'}) (or selector='pickclosest' for geo-affinity) |
| active-hotstandby | ifurlup(..., {selector='pickfirst'}) — primary first, DR second |
| active-passive-warm | ifurlup(..., {selector='pickfirst'}) + longer TTL (manual operator promotion is the contract; the LUA only flips when the probe fails enough times) |
| weighted-canary | pickwhashed({{w1, ip1}, {w2, ip2}}) — adjust weights via Catalyst console (re-emits the lua-record body with new weights) |
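The mapping in this table can be read as a small rendering function. The sketch below is illustrative and is not the catalyst-dns implementation: `render_lua_record` and its parameters are hypothetical names, and the output strings are built to match the record bodies shown in section 2.

```python
from typing import Optional


def _ip_list(backends: list[str]) -> str:
    # Render a Lua table of quoted addresses, e.g. {'a', 'b'}.
    return "{" + ", ".join(f"'{b}'" for b in backends) + "}"


def render_lua_record(
    placement: str,
    probe_url: str,
    backends: list[str],
    weights: Optional[list[int]] = None,
) -> Optional[str]:
    """Return the lua-record body for a Placement, or None for plain A records."""
    if placement == "single-region":
        return None  # plain A record(s), no lua-record needed
    if placement == "active-active":
        return f"ifurlup('{probe_url}', {_ip_list(backends)}, {{selector='all'}})"
    if placement in ("active-hotstandby", "active-passive-warm"):
        return f"ifurlup('{probe_url}', {_ip_list(backends)}, {{selector='pickfirst'}})"
    if placement == "weighted-canary":
        if weights is None or len(weights) != len(backends):
            raise ValueError("weighted-canary needs one weight per backend")
        pairs = ", ".join(f"{{{w}, '{b}'}}" for w, b in zip(weights, backends))
        return f"pickwhashed({{{pairs}}})"
    raise ValueError(f"unknown placement: {placement}")
```

For example, `render_lua_record("weighted-canary", "", ["5.161.42.18", "95.217.189.42"], [80, 20])` yields the same body as the record in section 2.5. The longer TTL for active-passive-warm is a property of the record, not of the body, so it is outside this sketch.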
3.3 Probe target
Every Catalyst Application Blueprint MUST expose /healthz on its public endpoint. The catalyst-dns controller defaults to https://<app-fqdn>/healthz as the probe target, configurable per-Application via spec.healthCheck.path in the Blueprint instance.
DNS pods are inside the Sovereign — they probe outbound to the regional LB IPs over the public internet (or via the Cilium Cluster Mesh + WireGuard back-channel for cross-region private probes). The probe direction is intentional: DNS pods are the source of truth on whether a regional LB is reachable from the same place the public internet would reach it.
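The default-plus-override rule for the probe target can be sketched as follows. The function name `probe_target` and the dictionary shape of the Blueprint spec are assumptions for illustration; only the default path /healthz and the spec.healthCheck.path override come from this section.

```python
from typing import Optional


def probe_target(app_fqdn: str, spec: Optional[dict] = None) -> str:
    """Build the HTTPS probe URL, honouring an optional healthCheck.path override."""
    health = (spec or {}).get("healthCheck") or {}
    path = health.get("path") or "/healthz"   # default from this section
    if not path.startswith("/"):
        path = "/" + path                     # tolerate a path given without a slash
    return f"https://{app_fqdn}{path}"


default_url = probe_target("api.acme.com")
custom_url = probe_target("api.acme.com", {"healthCheck": {"path": "/ready"}})
```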
3.4 Split-brain protection (failover-controller)
Lua-records are necessary but not sufficient for split-brain protection during a network partition. The failover-controller layers a lease-based witness on top:
- During healthy operation, each regional cluster renews a lease in a cloud witness (Cloudflare KV or similar — out of band from the Sovereign's own infra).
- The PowerDNS lua-record probes are the primary failover signal (sub-minute response).
- The lease becomes the tie-breaker for stateful promotion (OpenBao DR, CNPG primary promotion) — only the cluster holding a valid lease is allowed to take over write authority.
- See SRE.md §2.4 for the witness protocol; this doc covers only the DNS-routing half.
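The tie-breaker rule ("only the cluster holding a valid lease may take over write authority") reduces to a small predicate. This is a sketch of the rule as stated here, not the witness protocol from SRE.md: the `Lease` type, its fields, and `may_promote` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    holder: str        # cluster name that last renewed the lease
    expires_at: float  # expiry as epoch seconds


def may_promote(cluster: str, lease: Optional[Lease], now: float) -> bool:
    """A cluster may promote stateful services only while it holds an unexpired lease."""
    return lease is not None and lease.holder == cluster and lease.expires_at > now


lease = Lease(holder="hz-fsn-rtz-prod", expires_at=1_000.0)
holder_ok = may_promote("hz-fsn-rtz-prod", lease, now=900.0)     # holds a valid lease
other_blocked = may_promote("hz-hel-rtz-prod", lease, now=900.0) # not the holder
```

DNS answers may already have flipped by the time this check runs; the predicate only gates write authority, which is why both signals are needed.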
4. When to add a second Sovereign region (the HA upgrade path)
A single-region Sovereign is the SME default (PLATFORM-TECH-STACK.md §9.2). For corporate / regulated tier (and for any Sovereign that signs an SLA strict enough that single-region downtime would breach it), the upgrade path is:
- Sovereign provisioned in Region A (e.g. hz-fsn-rtz-prod) — single LB IP, plain A records.
- Operator decides to add Region B via the Catalyst admin UI: Admin → Infrastructure → Add Region (see SOVEREIGN-PROVISIONING.md §8).
- Crossplane provisions Region B's clusters (rtz + dmz) with the same building blocks as Region A.
- Region B's PowerDNS replicas join the Sovereign's authoritative NS set via SOA NOTIFY + AXFR (PowerDNS-native zone replication; no external sync layer needed).
- catalyst-dns rewrites every Application's lua-record from single-region → active-active (or whichever Placement the Application opts into). Old plain A records are replaced with ifurlup(...) lua-records pointing at both regional LBs.
- The cloud witness (failover-controller) starts arbitrating leases across the two clusters.
The cluster name never changes during this upgrade — Region A's cluster is still hz-fsn-rtz-prod, Region B is now hz-hel-rtz-prod, and neither is "primary" or "DR". This is the explicit design from NAMING-CONVENTION.md §1.3 — failover is a routing event, not a renaming event.
4.1 Triggers for adding a second region
| Trigger | Recommendation |
|---|---|
| SLA target ≥ 99.95% uptime | Mandatory second region — single-region cannot meet this |
| Compliance requirement (DORA, NIS2, GDPR data residency split) | Mandatory — typically one region per data-residency boundary |
| Application's Placement set to active-active / active-hotstandby / active-passive-warm | Mandatory — these placements require ≥ 2 regions to honour |
| Latency-sensitive global traffic (regional users far from Region A) | Strongly recommended — pickclosest lua-records cut median RTT |
| Cost-sensitive single-tenant Sovereign on a low-tier SLA | Defer — pay for it when a workload demands it |
5. Operational checks
5.1 Verify a lua-record is healthy
dig +short api.acme.com @ns1.openova.io
# Expected: an A record from the healthy regional LB set.
dig +short api.acme.com @ns1.openova.io \
+subnet=80.81.82.0/24
# Expected: with an EU client subnet, pickclosest returns the EU regional LB.
5.2 Force a probe-failure simulation (chaos-engineering)
The Litmus chaos suite includes a scenario that black-holes a regional LB's probe target. After ~1 TTL window:
dig +short api.acme.com @ns1.openova.io
# Expected: the affected backend IP is absent from the response.
When the probe target is restored, the IP returns automatically — no operator action.
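When this check runs inside a chaos pipeline, the `dig +short` output can be asserted on programmatically. A minimal sketch, assuming the output has already been captured as text: `parse_dig_short` and `backend_absent` are hypothetical helpers, not part of the Litmus suite.

```python
import ipaddress


def parse_dig_short(output: str) -> set[str]:
    """Extract the IP addresses from `dig +short` output, ignoring other lines."""
    ips: set[str] = set()
    for line in output.splitlines():
        token = line.strip()
        if not token:
            continue
        try:
            ips.add(str(ipaddress.ip_address(token)))
        except ValueError:
            continue  # e.g. a CNAME target printed before the addresses
    return ips


def backend_absent(output: str, backend_ip: str) -> bool:
    """True when the given backend does not appear in the answer set."""
    return backend_ip not in parse_dig_short(output)


during_fault = "95.217.189.42\n"
fault_detected = backend_absent(during_fault, "5.161.42.18")
```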
5.3 Read PowerDNS probe state
kubectl exec -n openova-system deploy/powerdns -- pdns_control bind-list-record api.acme.com
PowerDNS exposes the current probe status (last probe timestamp, last result, current selection set) — useful when investigating "why is the answer set what it is?" during an incident.
6. References
- PowerDNS Lua Records — upstream documentation — every selector, every option.
- PLATFORM-POWERDNS.md — the bp-powerdns deployment, DNSSEC posture, REST API contract.
- SOVEREIGN-PROVISIONING.md §7-§8 — multi-region topology + add-region workflow.
- NAMING-CONVENTION.md §1.3 + §7 — building-block naming, no "primary"/"DR" labels.
- SRE.md §2 — multi-region strategy, split-brain protection, data-replication patterns.
- SECURITY.md §5 — OpenBao independent-Raft-per-region (DNS failover doesn't touch secret authority).
- Issue #171 — the change that retired k8gb in favour of PowerDNS lua-records.
Part of OpenOva Catalyst. Read Inviolable Principles before any changes.