openova/clusters/_template/bootstrap-kit/12-external-dns.yaml
e3mrah 7e8c0f2944
fix(bootstrap-kit): add explicit install/upgrade timeout to all HR templates (#154) (#1357)
Audit of clusters/_template/bootstrap-kit/ found 36 HelmRelease templates
without an explicit timeout under install/upgrade — relying on Helm's
default 5m which races on cold-start hooks (CRD apply, post-install Jobs,
PVC binding on fresh nodes). PRs #127 / #131 / #143 / #150 already added
`timeout: 15m` to bp-self-sovereign-cutover, bp-gitea,
bp-external-secrets-stores and bp-harbor reactively after each new
blueprint hit a 5m race.

Preempt the next 30+ reactive PRs by adding the same explicit
`timeout: 15m` to install AND upgrade across the full template surface.
Pattern matches the existing fixes: kept alongside `disableWait: true`
where present (the timeout protects the Helm install/upgrade transaction
itself — manifest apply, CRD establishment, hook Job — even when wait
on workload Ready is disabled).
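
A minimal sketch of the resulting stanza, mirroring the values in the
external-dns template from this change. `disableWait: true` appears only
on templates that already set it; the inline comments are explanatory and
are not part of the templates.

```yaml
# Fragment of a Flux HelmRelease (helm.toolkit.fluxcd.io/v2) spec.
spec:
  install:
    timeout: 15m        # bounds the Helm install action (manifest apply, CRDs, hook Jobs)
    disableWait: true   # kept as-is where present; skips waiting on workload Ready
    remediation:
      retries: 3
  upgrade:
    timeout: 15m        # same bound for the upgrade action
    disableWait: true
    remediation:
      retries: 3
```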

Modified blueprints (alphabetical inside each cohort):
  CNI/Gateway: cilium, gateway-api
  Cert/Identity: cert-manager, sealed-secrets, reflector, openbao*, keycloak*
  GitOps/IaC: flux, crossplane, crossplane-claims
  Messaging: nats-jetstream
  DNS: powerdns, external-dns, bp-cert-manager-powerdns-webhook
  Secrets: external-secrets
  Data: cnpg, valkey, seaweedfs
  Observability: opentelemetry, alloy, loki, mimir, tempo, grafana
  Policy: kyverno, reloader, vpa
  Security: trivy, falco, sigstore, syft-grype, coraza
  Backup: velero
  Platform: cluster-autoscaler, bp-k8s-ws-proxy, bp-guacamole, bp-hcloud-ccm
  Apps: newapi

(* openbao/keycloak/gitea/harbor/cutover/es-stores already had timeout
   from prior PRs and were not modified.)

## Claimed TCs
Infra-only template change — preempts future Helm 5m-default cold-start
race wedges across the full bootstrap-kit. Validation surface is the
next fresh provision (TC: zero-touch Sovereign provision reaches
Ready=True on all HRs without per-blueprint timeout fix-forwards).

Refs #154, #127, #131, #143, #150.
Per principle 16: HR-level install/upgrade timeout is the canonical seam.
Per principle 4: target-state — preempt rather than react.

Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 08:41:50 +04:00

93 lines
3.1 KiB
YAML

# bp-external-dns — Catalyst Blueprint #12 of 13. Per-Sovereign DNS sync —
# ExternalDNS reconciles Service/Ingress hostnames into the per-Sovereign
# PowerDNS authoritative server via the native `pdns` provider. Geo +
# health-checked failover responses are owned by PowerDNS lua-records,
# NOT by ExternalDNS.
#
# Wrapper chart: platform/external-dns/chart/
#
# dependsOn:
#   - bp-cert-manager: the ExternalDNS HelmRelease installs only after TLS
#     issuers are reconciled, so any cert-manager-fronted webhook endpoints
#     in downstream overlays come up cleanly.
#   - bp-powerdns: the native `pdns` provider points at the in-cluster
#     bp-powerdns Service and reads the `powerdns-api-credentials` Secret
#     it renders. Without bp-powerdns the ExternalDNS pod CrashLoops
#     trying to dial a non-existent DNS API.
#   - bp-reflector: Reflector mirrors the `powerdns-api-credentials`
#     Secret from the `powerdns` namespace to `external-dns` automatically
#     (issue #544). bp-reflector must be running before bp-external-dns
#     installs so the reflected Secret is present when the pod starts.
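#
# Illustrative sketch only (not part of this template): assuming bp-reflector
# wraps emberstack Reflector, the mirroring described above is typically
# driven by annotations on the source Secret, along these lines. The real
# Secret is rendered by bp-powerdns; its data keys are not shown here.
#
#   apiVersion: v1
#   kind: Secret
#   metadata:
#     name: powerdns-api-credentials
#     namespace: powerdns
#     annotations:
#       reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
#       reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "external-dns"
#       reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
#       reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "external-dns"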
---
apiVersion: v1
kind: Namespace
metadata:
  name: external-dns
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-external-dns
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-external-dns
  namespace: flux-system
spec:
  interval: 15m
  releaseName: external-dns
  targetNamespace: external-dns
  dependsOn:
    - name: bp-cert-manager
    - name: bp-powerdns
    - name: bp-reflector
  chart:
    spec:
      chart: bp-external-dns
      # 1.1.7: companion CiliumNetworkPolicy with toEntities[kube-apiserver]
      # so external-dns can reach the kube-apiserver on Cilium clusters
      # (default policy-cidr-match-mode=""). Fixes #770: the vanilla
      # NetworkPolicy 0.0.0.0/0 ipBlock does NOT match apiserver traffic
      # under Cilium's identity model.
      version: 1.1.7
      sourceRef:
        kind: HelmRepository
        name: bp-external-dns
        namespace: flux-system
  # Event-driven install: ExternalDNS pod readiness depends on a
  # successful initial reconcile against the per-Sovereign PowerDNS API
  # (which itself stabilises after pdns-pg CNPG bootstraps), a legitimate
  # slow-Ready cascade. Helm install completes when manifests apply.
  # Replaces PR #221 spec.timeout: 15m.
  install:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3
  # Per-Sovereign overrides: txtOwnerId MUST be the Sovereign FQDN so two
  # Sovereigns sharing a parent zone don't fight over the same record set.
  # domainFilters narrow the zones ExternalDNS will manage; per-Sovereign
  # cluster overlays patch this with the actual zone list.
  values:
    external-dns:
      txtOwnerId: ${SOVEREIGN_FQDN}
      txtPrefix: _externaldns.
      domainFilters:
        - ${SOVEREIGN_FQDN}
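#
# Illustrative sketch only (not part of this template): a per-Sovereign
# overlay could replace domainFilters with a Kustomize JSON6902 patch such
# as the one below. The overlay path, the resources entry and the zone
# names are hypothetical placeholders, not values from this repository.
#
#   # clusters/<sovereign>/bootstrap-kit/kustomization.yaml (hypothetical path)
#   apiVersion: kustomize.config.k8s.io/v1beta1
#   kind: Kustomization
#   resources:
#     - ../../_template/bootstrap-kit
#   patches:
#     - target:
#         kind: HelmRelease
#         name: bp-external-dns
#         namespace: flux-system
#       patch: |-
#         - op: replace
#           path: /spec/values/external-dns/domainFilters
#           value:
#             - sovereign.example.org
#             - apps.sovereign.example.org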