openova/platform/powerdns/blueprint.yaml
e3mrah 684759564e
fix(powerdns+catalyst-api): zero-touch contabo PowerDNS API key for Sovereign cert-manager (PR #681 followup) (#686)
* fix(cilium-gateway): listener ports 80/443 → 30080/30443 + LB retarget

cilium-envoy refuses to bind privileged ports (80/443) on Sovereigns
even with all of:

- gatewayAPI.hostNetwork.enabled=true on the Cilium chart
- securityContext.privileged=true on the cilium-envoy DaemonSet
- securityContext.capabilities.add=[NET_BIND_SERVICE]
- envoy-keep-cap-netbindservice=true in cilium-config ConfigMap
- Gateway API CRDs at v1.3.0 (matching cilium 1.19.3 schema)

Repeatable error from cilium-envoy logs across otech45, otech46, otech47:

  listener 'kube-system/cilium-gateway-cilium-gateway/listener' failed
  to bind or apply socket options: cannot bind '0.0.0.0:80':
  Permission denied

The bind() syscall is intercepted by cilium-agent's BPF socket-LB
program in a way that does not honour container capabilities. Even
PID 1 with CapEff=0x000001ffffffffff (all caps) and uid=0 gets
"Permission denied". Swapping Cilium 1.19.3 for 1.16.5 made no
difference. F1 (PR #684) still ships: the version bump is sound for
other reasons, and the listener bind is a separate fix.

This commit moves the listeners to high ports (30080/30443) and lets
the Hetzner LB do the public-facing port translation:

  HCLB :80   → CP node :30080  (cilium-gateway HTTP listener)
  HCLB :443  → CP node :30443  (cilium-gateway HTTPS listener)

External users still hit `https://console.<sov>.omani.works/auth/handover`
on port 443; the high port is invisible. High-port bind succeeds
without NET_BIND_SERVICE because the kernel only gates ports below
`net.ipv4.ip_unprivileged_port_start` (default 1024).
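
A minimal Go sketch of the port arithmetic above, assuming it runs on
(or in the host network namespace of) a CP node. It reads
`net.ipv4.ip_unprivileged_port_start` from procfs, falls back to the
kernel default of 1024, and checks that each HCLB target listener port
sits at or above that threshold. The helper names and the `lbTargets`
map are hypothetical illustrations; the real LB retarget lives in the
Hetzner infra, not in Go.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// defaultUnprivilegedPortStart is the kernel default for
// net.ipv4.ip_unprivileged_port_start.
const defaultUnprivilegedPortStart = 1024

// unprivilegedPortStart reads the sysctl from procfs and falls back to
// the kernel default if the file is missing or unparsable.
func unprivilegedPortStart() int {
	b, err := os.ReadFile("/proc/sys/net/ipv4/ip_unprivileged_port_start")
	if err != nil {
		return defaultUnprivilegedPortStart
	}
	n, err := strconv.Atoi(strings.TrimSpace(string(b)))
	if err != nil {
		return defaultUnprivilegedPortStart
	}
	return n
}

// needsNetBindService reports whether binding port requires
// CAP_NET_BIND_SERVICE: only ports below the threshold are gated.
func needsNetBindService(port, unprivStart int) bool {
	return port < unprivStart
}

// lbTargets (hypothetical) mirrors the HCLB translation:
// public port -> cilium-gateway listener port on the CP node.
var lbTargets = map[int]int{
	80:  30080,
	443: 30443,
}

func main() {
	start := unprivilegedPortStart()
	for _, public := range []int{80, 443} {
		listener := lbTargets[public]
		fmt.Printf("HCLB :%d -> node :%d (listener needs NET_BIND_SERVICE: %v)\n",
			public, listener, needsNetBindService(listener, start))
	}
}
```

With the default threshold both listeners report `false`, which is the
point of the move: the bind no longer depends on a capability that the
socket-LB path appears to ignore.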

Will be verified on otech48: the next fresh provision should serve
console.otech48/auth/handover end-to-end without the 502/timeout
chain seen on otech45–47.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(powerdns+catalyst-api): zero-touch contabo PowerDNS API key for Sovereign cert-manager

PR #681 followup. The new bp-cert-manager-powerdns-webhook (PR #681)
calls contabo's authoritative PowerDNS at pdns.openova.io to write
DNS-01 challenge TXT records for *.otech<N>.omani.works. That webhook
needs an X-API-Key Secret in the Sovereign's cert-manager namespace,
but PR #681 didn't ship the materialization seam, so on otech43..otech47
the Secret was missing and the wildcard cert never issued.

This commit closes the seam from contabo to the Sovereign:

1. bp-powerdns chart 1.1.7 to 1.1.8: Reflector annotations on
   openova-system/powerdns-api-credentials extended from "external-dns"
   to "external-dns,catalyst" so contabo catalyst-api can mount the
   API key.

2. bp-powerdns: the api.basicAuth.enabled default flips from true to false.
   Layered Traefik basicAuth + PowerDNS X-API-Key was double auth that
   blocked machine-to-machine API access from Sovereigns. The X-API-Key
   contract is unchanged.

3. bp-catalyst-platform 1.2.3 to 1.2.4: api-deployment.yaml adds the
   CATALYST_POWERDNS_API_KEY env from the powerdns-api-credentials/api-key
   secret (optional=true, so Sovereign-side catalyst-api Pods without the
   reflected Secret still start cleanly).

4. catalyst-api provisioner.go: new Provisioner.PowerDNSAPIKey field,
   read from the CATALYST_POWERDNS_API_KEY env var in New(). It is
   stamped onto every Request before Validate() and forwarded as the
   tofu var powerdns_api_key.

5. infra/hetzner/variables.tf: new var.powerdns_api_key (sensitive,
   default "").

6. infra/hetzner/cloudinit-control-plane.tftpl: replaces the defunct
   dynadot-api-credentials Secret block (PR #681 dropped
   bp-cert-manager-dynadot-webhook) with a new
   cert-manager/powerdns-api-credentials Secret block. runcmd applies
   it BEFORE Flux reconciles bp-cert-manager-powerdns-webhook.

End-to-end seam mirrors PR #543 ghcr-pull and PR #680 harbor-robot-token.

Will be verified live on otech48 (next provision after this lands).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:23:27 +04:00


apiVersion: catalyst.openova.io/v1alpha1
kind: Blueprint
metadata:
  name: bp-powerdns
  labels:
    catalyst.openova.io/category: per-host-cluster-infrastructure
    catalyst.openova.io/section: pts-3-2-gitops-and-iac
spec:
  version: 1.1.8
  card:
    title: PowerDNS
    summary: |
      Authoritative DNS for every Sovereign zone (pool + BYO). Per-zone
      DNSSEC (ECDSAP256SHA256), lua-records for geo + health-checked
      failover, dnsdist front-end for query rate-limiting + DDoS posture,
      REST API at pdns.openova.io/api (operator-only). See
      docs/MULTI-REGION-DNS.md for the lua-record patterns.
    icon: powerdns.svg
    category: infrastructure
    visibility: unlisted # mandatory infra, auto-installed by bootstrap kit
  configSchema:
    type: object
    properties:
      replicaCount:
        type: integer
        default: 3
        description: PowerDNS Authoritative replicas. Each connects to the same CNPG database.
      dnssec:
        type: boolean
        default: true
        description: |
          DNSSEC ON (ECDSAP256SHA256) per #167 acceptance. Off requires explicit
          override in cluster overlay AND a documented exception.
      luaRecords:
        type: boolean
        default: true
        description: PowerDNS Lua records (geo + health-checked failover). See docs/MULTI-REGION-DNS.md.
      dnsdistEnabled:
        type: boolean
        default: true
        description: Companion dnsdist for query rate-limiting (default 100 qps per source IP).
      qpsPerSource:
        type: integer
        default: 100
        description: dnsdist MaxQPSIPRule threshold per source IP.
  placementSchema:
    modes: [single-region, active-active]
    default: active-active # the public NS endpoints span regions
  manifests:
    chart: ./chart
  # Hard depends: these primitives MUST be present in-cluster before
  # bp-powerdns reconciles cleanly:
  #   - postgresql.cnpg.io/v1.Cluster CRD (CNPG operator), consumed by
  #     templates/cnpg-cluster.yaml. Catalyst-Zero installs CNPG as a
  #     mandatory platform component (FABRIC group, componentGroups.ts
  #     `cnpg`); the wrapper Blueprint bp-cnpg is on the roadmap as a
  #     follow-up. Until then we depend on the CRD being present, not
  #     on a sibling Blueprint.
  #   - cert-manager.io/v1.ClusterIssuer (letsencrypt-prod), referenced by
  #     templates/api-ingress.yaml. bp-cert-manager already exists.
  #   - traefik.io/v1alpha1.Middleware CRD. Traefik is the Catalyst-Zero
  #     ingress controller and is a fixture of every Sovereign.
  depends:
    - bp-cert-manager
  upgrades:
    from: ["0.x"]