* fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook

Caught live on otech43-46: cert-manager DNS-01 challenges for *.otechN.omani.works failed because the Sovereign-side webhook wrote challenge TXT records to the Sovereign's local PowerDNS. omani.works is delegated from Dynadot to ns1/2/3.openova.io, which run on contabo's central PowerDNS; the Sovereign's local PowerDNS is INVISIBLE on the public DNS chain until pool-domain-manager seals the per-Sovereign NS delegation. Let's Encrypt resolvers walk the public chain, query contabo, get NXDOMAIN, and the cert never issues. The manual workaround was seeding the challenge TXT directly in contabo PowerDNS.

This PR automates the right write path:

- bp-cert-manager-powerdns-webhook chart bumped to 1.0.4. Default powerdns.host flips from "" (skip-render) to https://pdns.openova.io (contabo's public PowerDNS API ingress, authoritative for omani.works).
- ClusterIssuer letsencrypt-dns01-prod-powerdns is now usable with no per-cluster powerdns.host override for the omani.works pool. apiKeySecretRef.namespace clarified: upstream ignores it; the Secret must live in the cert-manager namespace (= ChallengeRequest.ResourceNamespace for ClusterIssuers).
- bootstrap-kit slot 49 updated: drops the bp-powerdns dependsOn (the webhook calls out-of-cluster contabo, not local PowerDNS), bumps the chart version, removes the inline powerdns.host override (defaults are correct).
- bootstrap-kit slot 49b (bp-cert-manager-dynadot-webhook) DELETED entirely. Dynadot is NOT the API-level authority for omani.works subdomains, so the dynadot webhook silently fails the same way the Sovereign-local powerdns one did.
- clusters/_template/sovereign-tls/cilium-gateway-cert.yaml flips issuerRef from letsencrypt-dns01-prod (was dynadot-backed) to letsencrypt-dns01-prod-powerdns (the new contabo-backed issuer).
- bp-cert-manager chart: certManager.issuers.dns01.enabled defaults to false (deprecated dynadot path). letsencrypt-http01-prod retained for per-host certs. Cluster overlays MAY flip dns01.enabled=true for non-omani.works pools where Dynadot IS the API-level authority.
- scripts/expected-bootstrap-deps.yaml: drops slot 49b, drops the bp-powerdns edge from slot 49.
- Documentation (README + blueprint.yaml + Chart.yaml description) rewritten to reflect the contabo retarget and lifecycle reasoning.

Credential plumbing (out of scope here, must be done in cloud-init):

- Every Sovereign needs a `powerdns-api-credentials` Secret in the `cert-manager` namespace whose `api-key` value matches contabo's PowerDNS API key. Same seeding pattern as `dynadot-api-credentials` in infra/hetzner/cloudinit-control-plane.tftpl.

Caveat: basicAuth on contabo's PowerDNS API ingress. contabo currently fronts pdns.openova.io with Traefik basicAuth (per clusters/contabo-mkt/apps/powerdns/helmrelease.yaml). The upstream zachomedia/cert-manager-webhook-pdns binary supports the X-API-Key header but not HTTP Basic Auth out of the box. To make this end-to-end green, one of two things must happen:

- contabo's basicAuth requirement is relaxed (X-API-Key alone provides the auth posture, and contabo's API endpoint is restricted to operator IPs by other means), OR
- the Sovereign's webhook gets an Authorization header injected via the chart's powerdns.headers map (plaintext password in the ClusterIssuer config, which is not ideal).

This PR ships the chart side; the basicAuth question is a follow-up on the contabo side.

Verified locally:

- helm lint platform/cert-manager-powerdns-webhook/chart -> PASS
- helm template platform/cert-manager-powerdns-webhook/chart -> renders
- helm template ... --set clusterIssuer.enabled=true -> renders the ClusterIssuer with host="https://pdns.openova.io" + correct apiKey Secret reference.
- helm template platform/cert-manager/chart -> renders ONLY letsencrypt-http01-prod (the dns01 dynadot issuer correctly gated off).
- scripts/check-bootstrap-deps.sh: net-zero new drift; my branch reduces pre-existing errors from 3 to 2 (the dropped slot 49b removed the only drift my branch was responsible for).

Closes follow-up to #373. Precondition for handover URL TLS going green on the otech43-46 lineage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): repair YAML structure in expected-bootstrap-deps.yaml

Two pre-existing drifts were blocking dependency-graph-audit CI:

1. Slot 5a (bp-reflector) was missing its closing list separator, causing yq to merge the bp-nats-jetstream entry into the bp-reflector map and effectively drop bp-reflector from the expected DAG. Added an explicit `- slot: 7` for bp-nats-jetstream and quoted "5a" so yq treats it as a string slot (matches the convention with "49b").
2. bp-powerdns slot 11: the actual bootstrap-kit declares dependsOn bp-cnpg (live since otech28, because of the pdns-pg-app secret race) but the expected DAG was missing this edge.

This unblocks merging fix/cert-manager-powerdns-webhook-contabo (PR above). These drifts existed on main but weren't surfaced until the last expected-deps edit forced a re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
132 lines
6.6 KiB
YAML
# bp-cert-manager-powerdns-webhook - Catalyst bootstrap-kit Blueprint #49.
# (Slot 36 was reserved in the W2.K0 forward-declared DAG for `bp-stunner`;
# this Phase-2 webhook lands at slot 49, the first free slot after the W2.K4
# forward declarations end at 48. Source of truth:
# scripts/expected-bootstrap-deps.yaml.)
# DNS-01 ACME solver against contabo's central PowerDNS (authoritative for
# omani.works) for wildcard TLS on *.${SOVEREIGN_FQDN}. Supersedes
# bp-cert-manager-dynadot-webhook (slot 49b, dropped in this PR).
# Closes openova#373.
#
# ──────────────────────────────────────────────────────────────────────────
# Why this slot exists
# ──────────────────────────────────────────────────────────────────────────
# The per-Sovereign Gateway in 01-cilium.yaml requests a wildcard
# Certificate covering `*.${SOVEREIGN_FQDN}`, e.g. `*.otechN.omani.works`.
# omani.works itself is registered at Dynadot but is delegated to
# ns1/2/3.openova.io which run on contabo's PowerDNS in the
# openova-system namespace. Dynadot is NOT the API-level authority for
# omani.works subdomains; contabo PowerDNS is.
#
# When Let's Encrypt validates a DNS-01 challenge for `*.otechN.omani.works`,
# its resolvers walk the public DNS chain: Dynadot → ns1/2/3.openova.io
# (contabo PowerDNS). Until pool-domain-manager has committed the
# per-Sovereign NS delegation into contabo PowerDNS (and that delegation has
# propagated), the Sovereign's own PowerDNS is INVISIBLE on the public
# chain: LE queries contabo, gets NXDOMAIN, and the cert never issues.
#
# Caught live on otech43–46: manual workaround was to seed the challenge
# TXT record directly in contabo PowerDNS. This blueprint automates that
# write path: every Sovereign's cert-manager webhook calls contabo's
# PowerDNS API at https://pdns.openova.io to PATCH the challenge TXT
# record, regardless of whether the Sovereign's own DNS delegation has
# sealed yet.
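#
# Illustrative sketch only (not rendered by this blueprint): the kind of
# wildcard Certificate the Gateway requests, pointed at the issuer this
# slot provides. The dnsNames entry and issuerRef name come from this
# file; metadata.name, metadata.namespace and secretName are hypothetical
# placeholders, and the real manifest lives in
# clusters/_template/sovereign-tls/cilium-gateway-cert.yaml.
#
#   apiVersion: cert-manager.io/v1
#   kind: Certificate
#   metadata:
#     name: sovereign-wildcard            # hypothetical
#     namespace: kube-system              # hypothetical
#   spec:
#     secretName: sovereign-wildcard-tls  # hypothetical
#     dnsNames:
#       - "*.${SOVEREIGN_FQDN}"
#     issuerRef:
#       kind: ClusterIssuer
#       name: letsencrypt-dns01-prod-powerdns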
#
# ──────────────────────────────────────────────────────────────────────────
# Wiring
# ──────────────────────────────────────────────────────────────────────────
# Wrapper chart: platform/cert-manager-powerdns-webhook/chart/
# Catalyst-curated values: platform/cert-manager-powerdns-webhook/chart/values.yaml
# Reconciled by: Flux on the new Sovereign's k3s control plane.
#
# dependsOn:
# - bp-cert-manager: provides the cert-manager.io CRDs + controllers.
#   Without this the ClusterIssuer + Certificate
#   resources templated by this blueprint can't apply.
#
# Note: this slot does NOT depend on bp-powerdns. The webhook calls
# contabo's central PowerDNS (https://pdns.openova.io), an out-of-cluster
# endpoint, not the Sovereign's local PowerDNS. The Sovereign's
# bp-powerdns slot (11) is still installed (it backs the Sovereign's own
# subzone for app-level records via bp-external-dns), but it is NOT in
# the cert-issuance path.
#
# Credentials: the chart's apiKeySecretRef points at a Secret named
# `powerdns-api-credentials` in the cert-manager namespace. That Secret's
# `api-key` value MUST match the API key configured on contabo's central
# PowerDNS. It is provisioned onto every Sovereign by cloud-init at
# control-plane boot time (mirrors the dynadot-api-credentials seeding
# pattern; see infra/hetzner/cloudinit-control-plane.tftpl).
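#
# A minimal sketch of the Secret shape described above, assuming cloud-init
# applies a plain Opaque Secret. The name, namespace and `api-key` key come
# from this file; the value is a placeholder and must never be committed.
#
#   apiVersion: v1
#   kind: Secret
#   metadata:
#     name: powerdns-api-credentials
#     namespace: cert-manager
#   type: Opaque
#   stringData:
#     api-key: "<contabo PowerDNS API key>"   # placeholder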
#
# Per docs/INVIOLABLE-PRINCIPLES.md #4 ("never hardcode") every URL/zone
# is operator-overridable. ${SOVEREIGN_FQDN} is substituted by Flux
# envsubst at the per-Sovereign apply time; contabo's bootstrap path
# does NOT include this template (per ADR-0001 §9.4 contabo stays on
# the legacy Traefik + per-host HTTP-01 stack).

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-cert-manager-powerdns-webhook
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-cert-manager-powerdns-webhook
  namespace: flux-system
spec:
  interval: 15m
  releaseName: cert-manager-powerdns-webhook
  # Co-located with cert-manager so the webhook's serving Certificate
  # (issued by the chart's selfSigned + CA Issuers) and APIService
  # caBundle injection live in the same namespace cert-manager itself
  # watches. Mirrors upstream chart convention.
  targetNamespace: cert-manager
  dependsOn:
    - name: bp-cert-manager
  chart:
    spec:
      chart: bp-cert-manager-powerdns-webhook
      version: 1.0.4
      sourceRef:
        kind: HelmRepository
        name: bp-cert-manager-powerdns-webhook
        namespace: flux-system
  # Event-driven install: the chart's ClusterIssuer template uses a
  # post-install Helm hook that runs AFTER cert-manager's CRDs land,
  # so blocking on Helm `--wait` for the leaf Certificate to reach
  # Ready is unnecessary. Replaces blanket spec.timeout band-aids.
  install:
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    disableWait: true
    remediation:
      retries: 3
  values:
    # ─── PowerDNS API endpoint ──────────────────────────────────────────
    # The chart's default value (https://pdns.openova.io, contabo's
    # central PowerDNS, authoritative for omani.works) is correct for
    # every Sovereign in the omani.works pool, so no override is needed
    # here. Operators provisioning a Sovereign in a non-omani.works pool
    # add a `powerdns: { host: "https://pdns.<other-pool>" }` override
    # in their per-cluster overlay.
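    #
    # A minimal sketch of such an overlay, assuming it is applied as a
    # patch against this HelmRelease (how overlays are assembled is an
    # assumption here; the host is a placeholder, not a real endpoint):
    #
    #   apiVersion: helm.toolkit.fluxcd.io/v2
    #   kind: HelmRelease
    #   metadata:
    #     name: bp-cert-manager-powerdns-webhook
    #     namespace: flux-system
    #   spec:
    #     values:
    #       powerdns:
    #         host: "https://pdns.<other-pool>"   # placeholder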

    # ─── Paired ClusterIssuer ───────────────────────────────────────────
    # Operator opts in here; the chart's default render skips this
    # resource (skip-render pattern, lesson from #387 follow-up #402).
    clusterIssuer:
      enabled: true
      name: letsencrypt-dns01-prod-powerdns
      email: "ops@${SOVEREIGN_FQDN}"
      acmeServer: "https://acme-v02.api.letsencrypt.org/directory"