openova/scripts/expected-bootstrap-deps.yaml
e3mrah 2b60e944e2
fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook (#681)
* fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook

Caught live on otech43-46: cert-manager DNS-01 challenges for
*.otechN.omani.works failed because the Sovereign-side webhook wrote
challenge TXT records to the Sovereign's local PowerDNS. omani.works is
delegated from Dynadot to ns1/2/3.openova.io which run on contabo's
central PowerDNS — the Sovereign's local PowerDNS is INVISIBLE on the
public DNS chain until pool-domain-manager seals the per-Sovereign NS
delegation. Let's Encrypt resolvers walk the public chain, query
contabo, get NXDOMAIN, and the cert never issues. The manual workaround
was seeding challenge TXT records directly in contabo PowerDNS.

This PR automates the right write path:

- bp-cert-manager-powerdns-webhook chart bumped to 1.0.4. Default
  powerdns.host flips from "" (skip-render) to https://pdns.openova.io
  (contabo's public PowerDNS API ingress, authoritative for omani.works).
- ClusterIssuer letsencrypt-dns01-prod-powerdns now usable with no
  per-cluster powerdns.host override for the omani.works pool.
  apiKeySecretRef.namespace clarified — upstream ignores it; the Secret
  must live in cert-manager namespace (= ChallengeRequest.ResourceNamespace
  for ClusterIssuers).
- bootstrap-kit slot 49 updated: drops bp-powerdns dependsOn (webhook
  calls out-of-cluster contabo, not local PowerDNS), bumps chart version,
  removes inline powerdns.host override (defaults are correct).
- bootstrap-kit slot 49b (bp-cert-manager-dynadot-webhook) DELETED
  entirely: Dynadot is NOT the API-level authority for omani.works
  subdomains, so the dynadot webhook silently fails the same way the
  Sovereign-local powerdns one did.
- clusters/_template/sovereign-tls/cilium-gateway-cert.yaml flips
  issuerRef from letsencrypt-dns01-prod (was dynadot-backed) to
  letsencrypt-dns01-prod-powerdns (the new contabo-backed issuer).
- bp-cert-manager chart: certManager.issuers.dns01.enabled defaults to
  false (deprecated dynadot path). letsencrypt-http01-prod retained for
  per-host certs. Cluster overlays MAY flip dns01.enabled=true for
  non-omani.works pools where Dynadot IS the API-level authority.
- scripts/expected-bootstrap-deps.yaml: drops slot 49b, drops bp-powerdns
  edge from slot 49.
- Documentation (README + blueprint.yaml + Chart.yaml description)
  rewritten to reflect contabo retarget and lifecycle reasoning.
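
For orientation, the issuer and certificate wiring described in the list above could look roughly like the sketch below. Only the issuer name, the host, the Secret name and the `api-key` key come from this PR text. The `groupName`, `solverName`, account-key Secret name and all Certificate metadata are assumptions for illustration and may differ from what the chart actually renders.

```yaml
# Sketch only, not the rendered chart output.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01-prod-powerdns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-dns01-prod-powerdns-account-key  # hypothetical name
    solvers:
      - dns01:
          webhook:
            groupName: acme.zacharyseguin.ca  # assumed upstream default
            solverName: pdns                  # assumed upstream default
            config:
              host: https://pdns.openova.io
              # No namespace field: per the PR text upstream ignores it and
              # reads the Secret from the cert-manager namespace.
              apiKeySecretRef:
                name: powerdns-api-credentials
                key: api-key
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sovereign-wildcard-tls        # hypothetical
  namespace: kube-system              # hypothetical
spec:
  secretName: sovereign-wildcard-tls  # hypothetical
  dnsNames:
    - "*.otechN.omani.works"          # N is a placeholder
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-dns01-prod-powerdns
```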

Credential plumbing (out of scope here, must be done in cloud-init):
- Every Sovereign needs a `powerdns-api-credentials` Secret in the
  `cert-manager` namespace whose `api-key` value matches contabo's
  PowerDNS API key. Same seeding pattern as `dynadot-api-credentials`
  in infra/hetzner/cloudinit-control-plane.tftpl.
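
A minimal sketch of the Secret that cloud-init would need to seed, assuming a plain Opaque Secret. The key value is a placeholder; how the real key is templated into cloud-init is not shown here.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: powerdns-api-credentials
  namespace: cert-manager
type: Opaque
stringData:
  api-key: "<contabo PowerDNS API key>"  # placeholder, never commit the real value
```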

Caveat — basicAuth on contabo's PowerDNS API ingress: contabo currently
fronts pdns.openova.io with Traefik basicAuth (per
clusters/contabo-mkt/apps/powerdns/helmrelease.yaml). The upstream
zachomedia/cert-manager-webhook-pdns binary supports the X-API-Key
header but not HTTP Basic Auth out of the box. To make this end-to-end
green, either contabo's basicAuth requirement must be relaxed (X-API-Key
alone provides the auth posture, and contabo's API endpoint is restricted
to operator IPs by other means), or the Sovereign's webhook needs an
Authorization header injected via the chart's powerdns.headers map
(plaintext password in the ClusterIssuer config, which is not ideal). This
PR ships the chart side; the basicAuth question is a follow-up on the
contabo side.
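
If the header-injection route were chosen, the override might look like the values fragment below. This is a sketch: only `powerdns.host` and the existence of a `powerdns.headers` map come from the text above, and the exact shape the chart expects for header values is an assumption.

```yaml
# values.yaml fragment (illustrative only)
powerdns:
  host: https://pdns.openova.io
  headers:
    # Standard HTTP Basic scheme: base64 of "user:password". Placeholder value.
    # This ends up in plaintext in the rendered ClusterIssuer config.
    Authorization: "Basic <base64 of user:password>"
```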

Verified locally:
- helm lint platform/cert-manager-powerdns-webhook/chart -> PASS
- helm template platform/cert-manager-powerdns-webhook/chart -> renders
- helm template ... --set clusterIssuer.enabled=true -> renders the
  ClusterIssuer with host="https://pdns.openova.io" + correct apiKey
  Secret reference.
- helm template platform/cert-manager/chart -> renders ONLY
  letsencrypt-http01-prod (the dns01 dynadot issuer correctly gated off).
- scripts/check-bootstrap-deps.sh: net-zero new drift; my branch reduces
  pre-existing errors from 3 to 2 (the dropped slot 49b removed the only
  drift my branch was responsible for).

Closes follow-up to #373. Preconditions for handover URL TLS green
on otech43-46 lineage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): repair YAML structure in expected-bootstrap-deps.yaml

Two pre-existing drifts were blocking dependency-graph-audit CI:

1. Slot 5a (bp-reflector) was missing its closing list separator,
   causing yq to merge the bp-nats-jetstream entry into the bp-reflector
   map and effectively drop bp-reflector from the expected DAG.
   Added explicit `- slot: 7` for bp-nats-jetstream and quoted "5a" so
   yq treats it as a string slot (matches the convention with "49b").

2. bp-powerdns slot 11: actual bootstrap-kit declares dependsOn
   bp-cnpg (live since otech28 — pdns-pg-app secret race) but the
   expected DAG was missing this edge.
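
The slot 5a breakage in item 1 can be pictured as follows. The "before" document is a reconstruction for illustration, not the literal pre-fix text. The "after" document matches the entries as they now appear in the file.

```yaml
# Before (reconstructed): no "- slot:" line starts the next entry, so the
# bp-nats-jetstream keys land in the bp-reflector map as duplicate keys.
# Parsers that tolerate duplicates typically keep the last value, which
# would leave an entry named bp-nats-jetstream and no bp-reflector.
slots:
  - slot: 5a
    name: bp-reflector
    depends_on: [bp-cert-manager]
    wave: present
    name: bp-nats-jetstream
    depends_on: []
    wave: present
---
# After: explicit list separator, and "5a" quoted so it is read as a string.
slots:
  - slot: "5a"
    name: bp-reflector
    depends_on: [bp-cert-manager]
    wave: present
  - slot: 7
    name: bp-nats-jetstream
    depends_on: []
    wave: present
```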

This unblocks merging fix/cert-manager-powerdns-webhook-contabo (PR
above). These drifts existed on main but weren't surfaced until the
last expected-deps edit forced a re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:12:48 +04:00

301 lines
11 KiB
YAML

# Expected dependency DAG for clusters/_template/bootstrap-kit/*.yaml
#
# Authoritative spec: docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.
# Consumed by: scripts/check-bootstrap-deps.sh
# Updated by:  W2.K0 (slots 01-14 baseline + slots 15-48 forward declarations)
#              W2.K1, K2, K3, K4 PRs add the corresponding HR files; this
#              file already declares the expected deps for those slots so
#              each W2 PR can be mechanically verified at merge time.
#
# Schema:
#   slots:
#     - slot: <int|string>       # numeric prefix on the HR file (01..49);
#                                # suffixed slots such as "5a" are strings
#       name: <string>           # value of metadata.name on the HelmRelease
#       depends_on: [<string>]   # ordered or unordered; comparison is set-based
#       wave: <"present"|"W2.K1"|"W2.K2"|"W2.K3"|"W2.K4">
#
# Comparison semantics enforced by check-bootstrap-deps.sh:
#   - Each HR file present on disk MUST declare exactly the depends_on set listed
#     here (missing edges -> error, extra edges -> error).
#   - HRs declared here but not yet present on disk are reported as "deferred"
#     (info, not an error) so that this file can be the static authoritative list
#     while W2.K1..K4 land their HR files in series.
#   - The graph is checked for cycles after merging declared+actual edges.
#
# The slot-numbering convention is documented in BOOTSTRAP-KIT-EXPANSION-PLAN.md §3.
slots:
  # ---- Tier 0-4: present today (post-PR-247 baseline) -----------------------
  - slot: 1
    name: bp-cilium
    depends_on: []
    wave: present
  - slot: 1a
    name: bp-gateway-api
    # Upstream Kubernetes Gateway API CRDs (Standard channel, issue #503).
    # Cilium 1.16's `gatewayAPI.enabled=true` enables the controller but does
    # NOT install the gateway.networking.k8s.io CRDs themselves; without them
    # every chart that ships HTTPRoute templates (bp-keycloak / bp-gitea /
    # bp-powerdns / bp-openbao / bp-harbor / bp-grafana / bp-catalyst-platform)
    # fails install with `no matches for kind HTTPRoute`. Same split-CRD
    # pattern as bp-crossplane-claims and bp-external-secrets-stores.
    depends_on: [bp-cilium]
    wave: present
  - slot: 2
    name: bp-cert-manager
    depends_on: [bp-cilium]
    wave: present
  - slot: 3
    name: bp-flux
    depends_on: [bp-cert-manager]
    wave: present
  - slot: 4
    name: bp-crossplane
    depends_on: [bp-flux]
    wave: present
  - slot: 5
    name: bp-sealed-secrets
    depends_on: [bp-cert-manager]
    wave: present
  - slot: "5a"
    name: bp-reflector
    # emberstack/reflector: secret/configmap mirror controller (issue #543).
    # Propagates ghcr-pull secret to every namespace so cross-namespace
    # ImagePullBackOff gaps are eliminated. Slot 5a: after sealed-secrets,
    # before spire. dependsOn bp-cert-manager (CRDs must exist).
    # Used by bp-gitea + bp-harbor to propagate CNPG-generated pg-app Secrets.
    depends_on: [bp-cert-manager]
    wave: present
  - slot: 7
    name: bp-nats-jetstream
    depends_on: []
    wave: present
  - slot: 8
    name: bp-openbao
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template;
    # gateway.networking.k8s.io/v1 CRDs must be registered before install.
    # bp-cnpg dep (issue #512): post-install init hook (`bao operator init`)
    # races cnpg readiness on a fresh Sovereign, hitting the 15m install
    # timeout. Explicit dep makes Flux wait for cnpg Ready=True first.
    depends_on: [bp-gateway-api, bp-cnpg]
    wave: present
  - slot: 9
    name: bp-keycloak
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template.
    depends_on: [bp-cert-manager, bp-gateway-api]
    wave: present
  - slot: 10
    name: bp-gitea
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template.
    # bp-cnpg dep (issue #584): chart ships a CNPG Cluster CR; postgresql.cnpg.io/v1
    # CRD must be registered before bp-gitea applies so Capabilities gate fires.
    depends_on: [bp-keycloak, bp-gateway-api, bp-cnpg]
    wave: present
  - slot: 11
    name: bp-powerdns
    # bp-gateway-api dep (issue #503): chart ships an api-httproute.yaml template.
    # bp-cnpg dep: chart's templates/cnpg-cluster.yaml renders a
    # postgresql.cnpg.io/v1.Cluster gated on Capabilities.APIVersions.
    # Without this dep Helm renders before the CRD is registered, the
    # gate evaluates false, the Cluster CR is silently skipped, CNPG
    # never creates pdns-pg-app, and powerdns Pods fail at boot with
    # "secret pdns-pg-app not found" (caught live during otech28).
    depends_on: [bp-cert-manager, bp-gateway-api, bp-cnpg]
    wave: present
  - slot: 12
    name: bp-external-dns
    # bp-reflector dep (issue #543): external-dns HTTPRoute uses reflector-mirrored
    # ghcr-pull secret; reflector must be Ready before external-dns deploys.
    depends_on: [bp-cert-manager, bp-powerdns, bp-reflector]
    wave: present
  - slot: 13
    name: bp-catalyst-platform
    # bp-gateway-api dep (issue #503): umbrella chart ships catalyst-ui +
    # catalyst-api HTTPRoute templates.
    # bp-keycloak + bp-cnpg deps (issue #512): umbrella post-install Jobs
    # bootstrap OIDC clients + seed PG schemas; both deps take 5+ min to
    # reach Ready on a fresh Sovereign, racing the 15m install timeout.
    # Explicit deps make Flux wait for both Ready=True before umbrella starts.
    depends_on: [bp-gitea, bp-gateway-api, bp-keycloak, bp-cnpg]
    wave: present
  - slot: 14
    name: bp-crossplane-claims
    depends_on: [bp-crossplane]
    wave: present
  # ---- Tier 5: storage + DB (W2.K1, slots 15-19) ----------------------------
  - slot: 15
    name: bp-external-secrets
    depends_on: [bp-openbao, bp-cert-manager]
    wave: W2.K1
  - slot: 15a
    name: bp-external-secrets-stores
    # Default ClusterSecretStore CR(s). Split from bp-external-secrets@1.0.0
    # at PR #334 (issue #331) to resolve CRD-ordering deadlock:
    # ClusterSecretStore CR cannot live in the same Helm release as the ESO
    # subchart that registers its CRD. Mirrors bp-crossplane ↔
    # bp-crossplane-claims pattern.
    depends_on: [bp-external-secrets, bp-openbao]
    wave: W2.K1
  - slot: 16
    name: bp-cnpg
    depends_on: [bp-flux]
    wave: W2.K1
  - slot: 17
    name: bp-valkey
    depends_on: [bp-flux]
    wave: W2.K1
  - slot: 18
    name: bp-seaweedfs
    depends_on: [bp-flux, bp-cert-manager]
    wave: W2.K1
  - slot: 19
    name: bp-harbor
    # bp-seaweedfs dependency REMOVED per ADR-0001 §13 (cloud-direct).
    # Harbor on Sovereigns writes blobs directly to cloud Object Storage
    # (Hetzner / R2 / S3 / Azure / GCS), not via SeaweedFS. See
    # clusters/_template/bootstrap-kit/19-harbor.yaml lines 35-37.
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template;
    # gateway.networking.k8s.io/v1 CRDs must be registered first.
    depends_on: [bp-cnpg, bp-cert-manager, bp-gateway-api]
    wave: W2.K1
  # ---- Tier 6: observability (W2.K2, slots 20-26) ---------------------------
  - slot: 20
    name: bp-opentelemetry
    depends_on: [bp-cert-manager]
    wave: W2.K2
  - slot: 21
    name: bp-alloy
    depends_on: [bp-opentelemetry]
    wave: W2.K2
  - slot: 22
    name: bp-loki
    depends_on: [bp-seaweedfs]
    wave: W2.K2
  - slot: 23
    name: bp-mimir
    depends_on: [bp-seaweedfs]
    wave: W2.K2
  - slot: 24
    name: bp-tempo
    depends_on: [bp-seaweedfs]
    wave: W2.K2
  - slot: 25
    name: bp-grafana
    # bp-gateway-api dep (issue #503): chart ships an HTTPRoute template.
    depends_on: [bp-cnpg, bp-loki, bp-mimir, bp-tempo, bp-keycloak, bp-gateway-api]
    wave: W2.K2
  # ---- Tier 7: security + policy (W2.K3, slots 27-34) -----------------------
  - slot: 27
    name: bp-kyverno
    depends_on: [bp-cilium]
    wave: W2.K3
  - slot: 28
    name: bp-reloader
    depends_on: []
    wave: W2.K3
  - slot: 29
    name: bp-vpa
    depends_on: []
    wave: W2.K3
  - slot: 30
    name: bp-trivy
    depends_on: [bp-cert-manager]
    wave: W2.K3
  - slot: 31
    name: bp-falco
    depends_on: [bp-cilium]
    wave: W2.K3
  - slot: 32
    name: bp-sigstore
    depends_on: [bp-cert-manager]
    wave: W2.K3
  - slot: 33
    name: bp-syft-grype
    depends_on: [bp-cert-manager]
    wave: W2.K3
  - slot: 34
    name: bp-velero
    # No dependsOn: Velero on Hetzner Sovereigns writes DIRECTLY to
    # Hetzner Object Storage per ADR-0001 §13 + WBS §3 (S3-aware app
    # rule). The previous SeaweedFS dependency was retired in #384;
    # Velero's BackupStorageLocation now consumes the
    # flux-system/hetzner-object-storage Secret (issue #371) via Flux
    # valuesFrom, populated at HelmRelease apply time, so there is no
    # in-cluster prerequisite Blueprint.
    depends_on: []
    wave: W2.K3
  # ---- Tier 8 + 9: edge + apps + AI runtime (W2.K4, slots 35-48) ------------
  - slot: 35
    name: bp-coraza
    depends_on: [bp-cilium, bp-cert-manager]
    wave: W2.K4
  - slot: 36
    name: bp-stunner
    depends_on: [bp-cilium, bp-cert-manager]
    wave: W2.K4
  - slot: 37
    name: bp-knative
    depends_on: [bp-cert-manager]
    wave: W2.K4
  - slot: 38
    name: bp-kserve
    depends_on: [bp-knative]
    wave: W2.K4
  - slot: 39
    name: bp-vllm
    depends_on: [bp-kserve]
    wave: W2.K4
  - slot: 40
    name: bp-llm-gateway
    depends_on: [bp-cnpg, bp-keycloak]
    wave: W2.K4
  - slot: 41
    name: bp-anthropic-adapter
    depends_on: [bp-llm-gateway]
    wave: W2.K4
  - slot: 42
    name: bp-bge
    depends_on: [bp-cnpg]
    wave: W2.K4
  - slot: 43
    name: bp-nemo-guardrails
    depends_on: [bp-llm-gateway, bp-bge, bp-cnpg]
    wave: W2.K4
  - slot: 44
    name: bp-temporal
    depends_on: [bp-cnpg, bp-cert-manager]
    wave: W2.K4
  - slot: 45
    name: bp-openmeter
    depends_on: [bp-cnpg, bp-nats-jetstream]
    wave: W2.K4
  - slot: 46
    name: bp-livekit
    depends_on: [bp-stunner, bp-cert-manager]
    wave: W2.K4
  - slot: 47
    name: bp-matrix
    depends_on: [bp-cnpg, bp-keycloak, bp-cert-manager]
    wave: W2.K4
  - slot: 48
    name: bp-librechat
    depends_on: [bp-llm-gateway, bp-vllm, bp-bge, bp-keycloak]
    wave: W2.K4
  # ---- Slot 49: DNS-01 wildcard TLS solver against contabo's central PowerDNS
  # Authored under #373; lands at slot 49 because slots 35-48 were already
  # forward-declared by the W2.K4 batch. Re-targeted from per-Sovereign
  # PowerDNS to contabo central PowerDNS (https://pdns.openova.io) because
  # omani.works is delegated from Dynadot to ns1/2/3.openova.io which run
  # on contabo PowerDNS; the Sovereign's own PowerDNS is not on the
  # public DNS chain until pool-domain-manager seals the per-Sovereign
  # NS delegation. Caught live on otech43-46. Slot 49b
  # (bp-cert-manager-dynadot-webhook) was dropped in the same PR
  # (Dynadot is NOT the API-level authority for omani.works subdomains).
  - slot: 49
    name: bp-cert-manager-powerdns-webhook
    depends_on: [bp-cert-manager]
    wave: present