PowerDNS — Authoritative DNS for OpenOva Sovereigns
Status: Authoritative. Closes #167.
PowerDNS Authoritative is the canonical DNS service for every Sovereign zone in the OpenOva fleet. It replaces the previously-listed k8gb component — PowerDNS Lua records cover geo and health-checked failover natively, removing the need for a dedicated GSLB controller.
This document defines the per-Sovereign zone model, DNSSEC posture, REST API contract, and the operational interfaces that bp-powerdns exposes to the rest of Catalyst.
Per-Sovereign zone model
Every Sovereign — pool (e.g. omantel.omani.works, acme.openova.io) and BYO (acme.bank.com) — gets its own PowerDNS zone. No exceptions.
PowerDNS Authoritative (Catalyst-Zero, Contabo-mkt initially)
├── openova.io. (root pool — admin + console + api)
├── omani.works. (Oman pool — Huawei + Omantel partners)
├── omantel.omani.works. (Omantel Sovereign — wildcard records)
├── acme.openova.io. (acme Sovereign — pool subdomain)
└── acme.bank.com. (acme BYO — operator brought their own domain)
Each Sovereign zone holds the canonical 6-record set written by catalyst-dns:
@ A <regional-LB-IPv4>
* A <regional-LB-IPv4>
console A <regional-LB-IPv4>
api A <regional-LB-IPv4>
gitea A <regional-LB-IPv4>
harbor A <regional-LB-IPv4>
The wildcard A record (*.<sub>.<domain>) covers every additional subdomain a Sovereign might add at runtime (e.g. axon, umami, langfuse) without re-issuing certificates.
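As a rough illustration of the record set above, the following sketch builds a PowerDNS HTTP API `PATCH` body that replaces all six A records for one zone. The function name `canonical_rrsets`, the 300-second TTL, and the example IP are assumptions made for this example; they are not taken from catalyst-dns or the chart.

```python
import json

# Names in the canonical 6-record set ("@" is the zone apex, "*" the wildcard).
CANONICAL_NAMES = ["@", "*", "console", "api", "gitea", "harbor"]


def canonical_rrsets(zone: str, lb_ipv4: str, ttl: int = 300) -> dict:
    """Build a PATCH body that replaces the six A records for one zone.

    ttl=300 is an illustrative default, not a value taken from the chart.
    """
    zone = zone.rstrip(".") + "."  # the API expects fully-qualified names
    rrsets = []
    for label in CANONICAL_NAMES:
        name = zone if label == "@" else f"{label}.{zone}"
        rrsets.append({
            "name": name,
            "type": "A",
            "ttl": ttl,
            "changetype": "REPLACE",
            "records": [{"content": lb_ipv4, "disabled": False}],
        })
    return {"rrsets": rrsets}


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-range placeholder for the regional LB.
    print(json.dumps(canonical_rrsets("omantel.omani.works", "203.0.113.10"), indent=2))
```

Using `REPLACE` for every rrset keeps the write idempotent: re-running it with the same LB address leaves the zone unchanged.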
Authority chain
. (root)
└── openova.io.
NS ns1.openova.io.
NS ns2.openova.io.
NS ns3.openova.io.
└── DS records for each Sovereign zone (per-zone DNSSEC anchors)
The three public NS endpoints (ns1, ns2, ns3) are anycast Floating IPs across Hetzner regions. The Phase-0 stand-in is a Service of type LoadBalancer (see "Anycast deferral" below).
For pool-domain-based Sovereigns (<sub>.openova.io, <sub>.omani.works) the parent zone openova.io / omani.works is delegated to the OpenOva PowerDNS NS set via the registrar (Dynadot). Each child Sovereign zone (<sub>.openova.io) publishes its own DS in the parent zone for DNSSEC chaining.
For BYO Sovereigns (<sub>.acme.com) the operator's existing registrar publishes the NS delegation pointing at OpenOva's NS endpoints. The operator follows the runbook in docs/RUNBOOK-PROVISIONING.md to add the DS record to their parent zone.
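The pool/BYO split above can be sketched as a small classifier. This is a hypothetical helper, not code from PDM: `delegation_plan` and `POOL_DOMAINS` are names invented here, and deriving the parent zone by stripping the first label is a simplifying assumption (it ignores multi-label public suffixes).

```python
# Pool parent zones named in this document; the tuple itself is illustrative.
POOL_DOMAINS = ("openova.io", "omani.works")


def delegation_plan(sovereign: str, pool_domains=POOL_DOMAINS) -> dict:
    """Classify a Sovereign zone and say where its DS record must be published."""
    fqdn = sovereign.rstrip(".").lower()
    labels = fqdn.split(".")
    if len(labels) < 2:
        raise ValueError(f"not a delegable zone name: {sovereign!r}")
    # Assumption: the parent zone is the name with its first label removed.
    parent = ".".join(labels[1:])
    is_pool = parent in pool_domains
    return {
        "zone": fqdn + ".",
        "parent": parent + ".",
        "mode": "pool" if is_pool else "byo",
        # Pool parents are served by the OpenOva NS set; BYO parents belong
        # to the operator, who adds the DS via their own registrar.
        "ds_published_by": "openova" if is_pool else "operator",
    }
```

For example, `delegation_plan("acme.bank.com")` reports mode `byo` with parent `bank.com.`, which is the case where the operator follows the runbook to add the DS record.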
DNSSEC
DNSSEC is mandatory. Disabling it requires an explicit cluster-overlay override AND a documented exception in the Sovereign's onboarding ticket.
Algorithm
ECDSAP256SHA256 (algorithm 13) — the IETF-recommended curve for new deployments since 2014, smaller signatures than RSA, sufficient strength for the foreseeable future.
Key model
Each zone gets an automatically-generated KSK (Key Signing Key) + ZSK (Zone Signing Key) pair:
# Generated by the catalyst-dns sidecar after zone creation.
pdnsutil add-zone-key <zone> ksk active ecdsa256
pdnsutil add-zone-key <zone> zsk active ecdsa256
Keys live in the cryptokeys table inside the CNPG-managed pdns-pg Postgres database. CNPG's WAL archiving + base backup schedule (configured at the bp-cnpg level) is the disaster-recovery anchor — losing the database means losing every zone's keys, so the Postgres backup posture is part of the security story.
SOA serials
default-soa-edit-signed=INCEPTION-EPOCH keeps SOA serials in sync across replicas after every signed-zone change. Operators don't manually bump SOA on edits — PowerDNS handles it.
Rotation
KSK rotation follows RFC 6781 / RFC 7583:
- Add a new KSK (pdnsutil add-zone-key <zone> ksk active ecdsa256)
- Wait for TTL to expire on the parent DS record
- Submit the new DS to the registrar / parent zone
- Wait for the new DS to propagate
- Mark the old KSK inactive (pdnsutil deactivate-zone-key <zone> <old-id>)
- Wait one more TTL window
- Remove the old KSK (pdnsutil remove-zone-key <zone> <old-id>)
ZSK rotation is fully automated — PowerDNS handles ZSK rollover internally.
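The DS record submitted to the parent during a KSK rollover is derived from the KSK's DNSKEY record. The sketch below shows that derivation for a SHA-256 digest (DS digest type 2), following RFC 4034 for the key tag and wire format. It is for illustration only: in practice the DS would be read from PowerDNS itself (for example from `pdnsutil show-zone <zone>`), and the all-zero public key in the usage note is a placeholder, not a real key.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    wire = b""
    for label in filter(None, name.rstrip(".").lower().split(".")):
        raw = label.encode("ascii")
        wire += bytes([len(raw)]) + raw
    return wire + b"\x00"  # root label terminates the name


def dnskey_rdata(flags: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: flags (2 bytes), protocol (always 3), algorithm, key."""
    return struct.pack("!HBB", flags, 3, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag as defined in RFC 4034 Appendix B."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte << 8 if i % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_record(zone: str, flags: int, algorithm: int, public_key: bytes) -> str:
    """Render a DS record (digest type 2 = SHA-256) for one DNSKEY."""
    rdata = dnskey_rdata(flags, algorithm, public_key)
    digest = hashlib.sha256(name_to_wire(zone) + rdata).hexdigest().upper()
    return f"{zone.rstrip('.')}. IN DS {key_tag(rdata)} {algorithm} 2 {digest}"
```

A KSK has flags 257 and, for ECDSAP256SHA256, algorithm 13 with a 64-byte public key, so a call looks like `ds_record("acme.openova.io", 257, 13, public_key_bytes)`.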
Lua records (replacing k8gb)
PowerDNS Lua records (upstream docs) provide:
- Geo-aware responses — return different A records based on the resolver's source IP / ECS subnet
- Health-checked failover — drop a backend from the response set when a TCP/HTTP probe fails
- Weighted round-robin — split traffic across multiple regional LBs by weight
This subsumes everything k8gb was doing — same primitive (DNS-level GSLB), no separate operator + CRD set + reconciliation loop. The enable-lua-records=yes directive is set in the Catalyst overlay and cannot be turned off without removing geo + health-check capability.
Example (Catalyst-curated)
www IN LUA A ("ifurlup('https://www-fra.example.com/healthz', "
              "{{'1.2.3.4'}, {'5.6.7.8'}})")
This record returns 1.2.3.4 while the /healthz probe against that address returns 200, and falls through to 5.6.7.8 otherwise. Used by Catalyst's regional active-active LB pattern.
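The fall-through behaviour can be modelled in a few lines. This is a simplified model of the selection logic as I understand it from the upstream Lua records documentation, not PowerDNS's implementation; `select_addresses` and the `is_up` callback are names invented for this sketch.

```python
def select_addresses(groups, is_up):
    """Return the healthy addresses of the first group that has any.

    groups: ordered list of address lists, highest priority first.
    is_up:  callable taking an address and returning True if its probe passed.
    """
    for group in groups:
        healthy = [ip for ip in group if is_up(ip)]
        if healthy:
            return healthy
    # Assumption: when every probe fails, answer with all addresses rather
    # than an empty response, so clients still have something to try.
    return [ip for group in groups for ip in group]


if __name__ == "__main__":
    groups = [["1.2.3.4"], ["5.6.7.8"]]
    down = {"1.2.3.4"}  # pretend the primary backend failed its probe
    print(select_addresses(groups, lambda ip: ip not in down))
```

The ordering of `groups` is what encodes priority: a lower-priority group is only consulted when every address ahead of it is unhealthy.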
REST API
Exposed at https://pdns.openova.io/api, behind a Traefik basicAuth Middleware. The plaintext password is generated per-cluster (random 32 chars per INVIOLABLE-PRINCIPLES.md #10), bcrypt-hashed in-cluster only, and stored in K8s Secret powerdns-api-basicauth in the openova-system namespace.
Endpoints
The full API surface is documented at https://doc.powerdns.com/authoritative/http-api/. The Catalyst-relevant endpoints:
GET /api/v1/servers/localhost
GET /api/v1/servers/localhost/zones
POST /api/v1/servers/localhost/zones # create zone
PUT /api/v1/servers/localhost/zones/<zone>
PATCH /api/v1/servers/localhost/zones/<zone> # add/remove records
DELETE /api/v1/servers/localhost/zones/<zone>
GET /api/v1/servers/localhost/zones/<zone>/cryptokeys # DNSSEC keys
PUT /api/v1/servers/localhost/zones/<zone>/rectify
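To make the create-zone call concrete, here is a minimal sketch that builds (but does not send) the `POST .../zones` request using only the standard library. The zone kind `Native`, the plain-HTTP scheme on the in-cluster Service, and the helper name `create_zone_request` are assumptions for this example; PDM's actual client may differ.

```python
import json
import urllib.request

# In-cluster Service address used by Catalyst consumers; the http:// scheme is assumed.
API_BASE = "http://powerdns:8081/api/v1/servers/localhost"


def create_zone_request(zone: str, api_key: str, nameservers: list) -> urllib.request.Request:
    """Build a POST request that creates a DNSSEC-enabled zone."""
    body = {
        "name": zone.rstrip(".") + ".",  # zone names must be fully qualified
        "kind": "Native",                # assumed; depends on the replication model
        "nameservers": [ns.rstrip(".") + "." for ns in nameservers],
        "dnssec": True,
    }
    return urllib.request.Request(
        f"{API_BASE}/zones",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = create_zone_request(
        "acme.openova.io",
        api_key="<api-key>",
        nameservers=["ns1.openova.io", "ns2.openova.io", "ns3.openova.io"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

Passing the returned object to `urllib.request.urlopen(req, timeout=10)` would perform the call; keeping construction separate makes the payload easy to inspect or unit-test.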
In-cluster consumers
Three Catalyst services hit the PowerDNS API in-cluster (via the ClusterIP Service powerdns:8081, NOT through the public ingress):
- pool-domain-manager (PDM): core/pool-domain-manager/internal/pdns/ writes the canonical 6-record set on /v1/commit, creates per-Sovereign zones, and bootstraps DNSSEC keys (#163, #167, #168). PDM also wraps the parent-zone NS-flip via its registrar adapters (#170: Cloudflare, Namecheap, GoDaddy, OVH, Dynadot).
- cert-manager-webhook-pdns: DNS-01 ACME challenges for wildcard certs (replaces the planned cert-manager-dynadot-webhook for any zone hosted on PowerDNS)
- external-dns-pdns: automatic A/AAAA/CNAME records for K8s Ingress + LB services
The API key (x-api-key header) is read from K8s Secret powerdns-api-credentials (key api-key); the same secret is mounted by all three consumers.
External consumers
Operators may hit https://pdns.openova.io/api from a laptop using curl-with-basic-auth. The basicAuth middleware terminates at Traefik, so the API key is still required end-to-end:
curl -u operator:<password> -H 'X-API-Key: <api-key>' \
https://pdns.openova.io/api/v1/servers/localhost
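For operators who prefer a script to curl, the same two-layer authentication can be sketched in Python. The helper name `operator_request` is invented here; the request is only built, not sent.

```python
import base64
import urllib.request


def operator_request(path: str, user: str, password: str, api_key: str,
                     base: str = "https://pdns.openova.io/api/v1") -> urllib.request.Request:
    """Build a GET request carrying both credentials.

    Basic auth satisfies the Traefik middleware; X-API-Key satisfies PowerDNS.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base}/{path.lstrip('/')}",
        headers={"Authorization": f"Basic {token}", "X-API-Key": api_key},
    )


if __name__ == "__main__":
    req = operator_request("servers/localhost", "operator", "<password>", "<api-key>")
    print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req, timeout=10)` should return the same JSON as the curl command, assuming both credentials are valid.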
dnsdist companion
A dnsdist Deployment fronts every PowerDNS replica for query rate-limiting and DDoS posture. Default policy:
- MaxQPSIPRule(100): drop queries from any source IP exceeding 100 qps for 60s
- 1% sampled query logging to stderr (k8s log)
- Health-checked backend (the in-cluster powerdns:5353 Service)
The threshold is configurable via .Values.dnsdist.ratelimit.qpsPerSource. Per-Sovereign overrides go in the cluster overlay's HelmRelease values: block.
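To show what a per-source QPS threshold means in practice, the sketch below counts queries per source IP within each one-second window and rejects anything over the limit. It is a deliberately simplified fixed-window model, not dnsdist's actual algorithm, and the class name `PerSourceQpsLimiter` is invented for this example.

```python
class PerSourceQpsLimiter:
    """Fixed-window model of a per-source-IP queries-per-second limit."""

    def __init__(self, qps_per_source: int = 100):
        self.limit = qps_per_source
        self._window = {}  # source IP -> (second, queries seen in that second)

    def allow(self, source_ip: str, now: float) -> bool:
        """Record one query at time `now` (seconds); return False if it should be dropped."""
        second = int(now)
        seen_second, count = self._window.get(source_ip, (second, 0))
        if seen_second != second:
            count = 0  # a new one-second window starts for this source
        count += 1
        self._window[source_ip] = (second, count)
        return count <= self.limit


if __name__ == "__main__":
    limiter = PerSourceQpsLimiter(qps_per_source=3)
    # Five queries from one source inside the same second: the last two are dropped.
    print([limiter.allow("198.51.100.7", 10.0 + i * 0.1) for i in range(5)])
```

Each source IP is tracked independently, so one noisy resolver hitting the limit does not affect queries from other sources.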
Anycast deferral
The target state for ns1.openova.io, ns2.openova.io, ns3.openova.io is anycast Hetzner Floating IPs spread across regions. The Crossplane XHetznerFloatingIP composite resource definition that would allocate these does not yet exist in platform/crossplane/compositions/ (the compositions currently authored cover Server, Network, Firewall, LoadBalancer, and Pool-Allocation).
Phase-0 stand-in: A Kubernetes Service of type=LoadBalancer (rendered by templates/anycast-endpoint.yaml). On Hetzner-managed Sovereigns the Hetzner cloud-controller-manager allocates a public IPv4 automatically; on Contabo this falls back to NodePort + external-IP routing.
Cutover: Once xrd-floating-ip.yaml + composition-floating-ip.yaml land in platform/crossplane/compositions/, bp-powerdns is bumped to 1.1.0 with .Values.crossplane.floatingIP.enabled=true flipped on by default. The placeholder Service stays in the chart but is disabled by setting .Values.anycast.enabled=false in cluster overlays. The follow-up issue tracking the composition is referenced from the gap comment in templates/crossplane-floatingip.yaml.
Cluster manifests (private repo)
The Flux-managed deployment lives in clusters/contabo-mkt/apps/powerdns/ in openova-private:
clusters/contabo-mkt/apps/powerdns/
├── kustomization.yaml # references the chart
├── helmrelease.yaml # pulls bp-powerdns:1.0.6 from ghcr.io/openova-io
├── helm-repository.yaml # OCI HelmRepository pointing at ghcr.io/openova-io
├── namespace.yaml # openova-system (already exists)
├── api-credentials-secret.yaml # ExternalSecret reading from openbao
└── api-basicauth-secret.yaml # bcrypt(operator:<pw>) for the Traefik middleware
The HelmRelease's values: block carries cluster-specific overrides (replicaCount, dnsdist threshold, region, etc.). The base chart's defaults are sufficient for Contabo-mkt's first deployment.
Acceptance (per #167)
- Wrapper chart platform/powerdns/chart/ with Chart.yaml + values.yaml + blueprint.yaml + templates
- CNPG-backed pdns-pg cluster (separate from pdm-pg)
- DNSSEC enabled by default (ECDSAP256SHA256)
- lua-records enabled by default
- dnsdist companion with 100 qps default rate limit
- REST API at pdns.openova.io/api behind Traefik basicAuth
- CI publishes bp-powerdns:1.0.6 cosign-signed + SBOM-attested (current chart version on main; see platform/powerdns/Chart.yaml)
- Cluster manifest in private repo clusters/contabo-mkt/apps/powerdns/
- kubectl get cluster pdns-pg -n openova-system healthy: running today
- kubectl get deploy powerdns -n openova-system 1/1 ready: running today (replicaCount=1 on Contabo-mkt; default chart value is 3 for production Sovereigns)
- curl https://pdns.openova.io/api/v1/servers/localhost returns JSON: verified post-Flux-reconcile
- dig @ns1.openova.io openova.io NS returns the 3 NS records: DEFERRED (anycast composition gap, see "Anycast deferral")
- dig +dnssec validates: DEFERRED (depends on parent-zone DS submission, separate ticket)
Runbook — first deploy on Contabo-mkt
# 1. Generate API key + webserver password (random 32 chars per INVIOLABLE-PRINCIPLES #10)
api_key=$(python3 -c "import secrets,string; print(''.join(secrets.choice(string.ascii_letters+string.digits) for _ in range(48)))")
ws_pw=$(python3 -c "import secrets,string; print(''.join(secrets.choice(string.ascii_letters+string.digits) for _ in range(32)))")
# 2. Stage them in OpenBao (already deployed at openbao.openova.io)
# — ExternalSecret in clusters/contabo-mkt/apps/powerdns/ pulls them into K8s
# 3. Generate basicAuth bcrypt for the operator user
op_pw=$(python3 -c "import secrets,string; print(''.join(secrets.choice(string.ascii_letters+string.digits) for _ in range(32)))")
htpasswd -nbB operator "$op_pw" # → operator:$2y$05$<hash>
# 4. Apply (Flux reconciles automatically every 1m)
git -C openova-private add clusters/contabo-mkt/apps/powerdns/
git -C openova-private commit -m "deploy(powerdns): initial bp-powerdns:1.0.6 on contabo-mkt"
git -C openova-private push
# 5. Wait for reconcile
flux get helmrelease powerdns -n openova-system
# 6. Verify
kubectl get cluster pdns-pg -n openova-system
kubectl get deploy powerdns -n openova-system
curl -u "operator:$op_pw" -H "X-API-Key: $api_key" https://pdns.openova.io/api/v1/servers/localhost
References
- PowerDNS Authoritative — Settings
- PowerDNS REST API
- Lua records
- dnsdist documentation
- Issue #167
- Inviolable Principles: docs/INVIOLABLE-PRINCIPLES.md
- Naming convention: docs/NAMING-CONVENTION.md
Part of OpenOva Catalyst. Read Inviolable Principles before any changes.