Audit of clusters/_template/bootstrap-kit/ found 36 HelmRelease templates without an explicit timeout under install/upgrade. They rely on Helm's default 5m, which races on cold-start hooks (CRD apply, post-install Jobs, PVC binding on fresh nodes).

PRs #127 / #131 / #143 / #150 already added `timeout: 15m` to bp-self-sovereign-cutover, bp-gitea, bp-external-secrets-stores and bp-harbor reactively, after each new blueprint hit a 5m race. Preempt the next 30+ reactive PRs by adding the same explicit `timeout: 15m` to install AND upgrade across the full template surface.

Pattern matches the existing fixes: the timeout is kept alongside `disableWait: true` where present. The timeout protects the Helm install/upgrade transaction itself (manifest apply, CRD establishment, hook Job) even when wait on workload Ready is disabled.

Modified blueprints, grouped by cohort:

- CNI/Gateway: cilium, gateway-api
- Cert/Identity: cert-manager, sealed-secrets, reflector, openbao*, keycloak*
- GitOps/IaC: flux, crossplane, crossplane-claims
- Messaging: nats-jetstream
- DNS: powerdns, external-dns, bp-cert-manager-powerdns-webhook
- Secrets: external-secrets
- Data: cnpg, valkey, seaweedfs
- Observability: opentelemetry, alloy, loki, mimir, tempo, grafana
- Policy: kyverno, reloader, vpa
- Security: trivy, falco, sigstore, syft-grype, coraza
- Backup: velero
- Platform: cluster-autoscaler, bp-k8s-ws-proxy, bp-guacamole, bp-hcloud-ccm
- Apps: newapi

(* openbao/keycloak/gitea/harbor/cutover/es-stores already had timeout from prior PRs and were not modified.)

## Claimed TCs

Infra-only template change: preempts future Helm 5m-default cold-start race wedges across the full bootstrap-kit. Validation surface is the next fresh provision (TC: zero-touch Sovereign provision reaches Ready=True on all HRs without per-blueprint timeout fix-forwards).

Refs #154, #127, #131, #143, #150.

Per principle 16: HR-level install/upgrade timeout is the canonical seam.
Per principle 4: target-state, preempt rather than react.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
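
The audit described in the commit message could be scripted along the following lines. This is a minimal sketch, not the tooling actually used for the audit: the function names (`split_documents`, `block_has_key`, `missing_timeouts`, `audit`) and the embedded `SAMPLE` manifest are hypothetical, and the line-based check assumes block-style YAML with space indentation and no trailing comments on the `install:` / `upgrade:` lines. A real audit would more likely use a proper YAML parser.

```python
import re
from pathlib import Path

# Hypothetical two-document manifest used only to exercise the checker.
SAMPLE = """\
apiVersion: v1
kind: Namespace
metadata:
  name: demo
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-demo
spec:
  install:
    timeout: 15m
    disableWait: true
  upgrade:
    disableWait: true
"""


def split_documents(text):
    """Split a multi-document YAML string on '---' separator lines."""
    return [doc for doc in re.split(r"(?m)^---\s*$", text) if doc.strip()]


def block_has_key(lines, parent, key):
    """Return True if `key:` is a direct child of the first `parent:` block."""
    parent_indent = None
    child_indent = None
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        indent = len(line) - len(line.lstrip(" "))
        if parent_indent is None:
            if stripped == f"{parent}:":
                parent_indent = indent
            continue
        if indent <= parent_indent:
            return False  # left the parent block without finding the key
        if child_indent is None:
            child_indent = indent  # indentation of the first direct child
        if indent == child_indent and stripped.startswith(f"{key}:"):
            return True
    return False


def missing_timeouts(doc):
    """List the actions (install/upgrade) of a HelmRelease lacking `timeout`."""
    if not re.search(r"(?m)^kind:\s*HelmRelease\s*$", doc):
        return []  # not a HelmRelease, nothing to check
    lines = doc.splitlines()
    return [a for a in ("install", "upgrade")
            if not block_has_key(lines, a, "timeout")]


def audit(root):
    """Map each YAML file under `root` to the actions lacking a timeout."""
    findings = {}
    for path in sorted(Path(root).rglob("*.yaml")):
        for doc in split_documents(path.read_text(encoding="utf-8")):
            missing = missing_timeouts(doc)
            if missing:
                findings.setdefault(str(path), []).extend(missing)
    return findings


if __name__ == "__main__":
    for doc in split_documents(SAMPLE):
        print(missing_timeouts(doc))
```

On the embedded sample, the Namespace document yields an empty list and the HelmRelease yields `['upgrade']`, since only its `install` block sets a timeout. Calling `audit("clusters/_template/bootstrap-kit")` would apply the same check to every `*.yaml` file under that directory. A HelmRelease with no `install:` or `upgrade:` block at all is also reported, because it has no explicit timeout either.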
93 lines
3.1 KiB
YAML
# bp-external-dns — Catalyst Blueprint #12 of 13. Per-Sovereign DNS sync —
# ExternalDNS reconciles Service/Ingress hostnames into the per-Sovereign
# PowerDNS authoritative server via the native `pdns` provider. Geo +
# health-checked failover responses are owned by PowerDNS lua-records,
# NOT by ExternalDNS.
#
# Wrapper chart: platform/external-dns/chart/
#
# dependsOn:
#   - bp-cert-manager — ExternalDNS HelmRelease only after TLS issuers
#     are reconciled, so any cert-manager-fronted webhook endpoints in
#     downstream overlays come up cleanly.
#   - bp-powerdns — native `pdns` provider points at the in-cluster
#     bp-powerdns Service and reads the `powerdns-api-credentials` Secret
#     it renders. Without bp-powerdns the ExternalDNS pod CrashLoops
#     trying to dial a non-existent DNS API.
#   - bp-reflector — Reflector mirrors the `powerdns-api-credentials`
#     Secret from the `powerdns` namespace to `external-dns` automatically
#     (issue #544). bp-reflector must be running before bp-external-dns
#     installs so the reflected Secret is present when the pod starts.

---
apiVersion: v1
kind: Namespace
metadata:
  name: external-dns
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-external-dns
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-external-dns
  namespace: flux-system
spec:
  interval: 15m
  releaseName: external-dns
  targetNamespace: external-dns
  dependsOn:
    - name: bp-cert-manager
    - name: bp-powerdns
    - name: bp-reflector
  chart:
    spec:
      chart: bp-external-dns
      # 1.1.7: companion CiliumNetworkPolicy with toEntities[kube-apiserver]
      # so external-dns can reach the kube-apiserver on Cilium clusters
      # (default policy-cidr-match-mode=""). Fixes #770 — the vanilla
      # NetworkPolicy 0.0.0.0/0 ipBlock does NOT match apiserver traffic
      # under Cilium's identity model.
      version: 1.1.7
      sourceRef:
        kind: HelmRepository
        name: bp-external-dns
        namespace: flux-system
  # Event-driven install: ExternalDNS pod readiness depends on a
  # successful initial reconcile against the per-Sovereign PowerDNS API
  # (which itself stabilises after pdns-pg CNPG bootstraps) — legitimate
  # slow-Ready cascade. Helm install completes when manifests apply.
  # Replaces PR #221 spec.timeout: 15m.
  install:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3
  # Per-Sovereign overrides — txtOwnerId MUST be the Sovereign FQDN so two
  # Sovereigns sharing a parent zone don't fight over the same record set.
  # domainFilters narrow the zones ExternalDNS will manage; per-Sovereign
  # cluster overlays patch this with the actual zone list.
  values:
    external-dns:
      txtOwnerId: ${SOVEREIGN_FQDN}
      txtPrefix: _externaldns.
      domainFilters:
        - ${SOVEREIGN_FQDN}