Audit of clusters/_template/bootstrap-kit/ found 36 HelmRelease templates without an explicit timeout under install/upgrade. They rely on Helm's default 5m, which races on cold-start hooks (CRD apply, post-install Jobs, PVC binding on fresh nodes).

PRs #127 / #131 / #143 / #150 already added `timeout: 15m` to bp-self-sovereign-cutover, bp-gitea, bp-external-secrets-stores and bp-harbor reactively after each new blueprint hit a 5m race. Preempt the next 30+ reactive PRs by adding the same explicit `timeout: 15m` to install AND upgrade across the full template surface.

The pattern matches the existing fixes: the timeout is kept alongside `disableWait: true` where present, because it protects the Helm install/upgrade transaction itself (manifest apply, CRD establishment, hook Job) even when wait on workload Ready is disabled.

Modified blueprints (alphabetical inside each cohort):

  CNI/Gateway: cilium, gateway-api
  Cert/Identity: cert-manager, sealed-secrets, reflector, openbao*, keycloak*
  GitOps/IaC: flux, crossplane, crossplane-claims
  Messaging: nats-jetstream
  DNS: powerdns, external-dns, bp-cert-manager-powerdns-webhook
  Secrets: external-secrets
  Data: cnpg, valkey, seaweedfs
  Observability: opentelemetry, alloy, loki, mimir, tempo, grafana
  Policy: kyverno, reloader, vpa
  Security: trivy, falco, sigstore, syft-grype, coraza
  Backup: velero
  Platform: cluster-autoscaler, bp-k8s-ws-proxy, bp-guacamole, bp-hcloud-ccm
  Apps: newapi

(* openbao/keycloak/gitea/harbor/cutover/es-stores already had timeout from prior PRs and were not modified.)

## Claimed TCs

Infra-only template change: preempts future Helm 5m-default cold-start race wedges across the full bootstrap-kit. Validation surface is the next fresh provision (TC: zero-touch Sovereign provision reaches Ready=True on all HRs without per-blueprint timeout fix-forwards).

Refs #154, #127, #131, #143, #150.

Per principle 16: HR-level install/upgrade timeout is the canonical seam.
Per principle 4: target-state, preempt rather than react.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
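
For a blueprint that does not set `disableWait`, the change described in the commit message might look like the sketch below. This is illustrative only: the blueprint name `bp-example`, its HelmRepository, and the chart version are hypothetical placeholders, not templates from the bootstrap-kit. Only the `timeout: 15m` lines under `install` and `upgrade` reflect the pattern the message describes.

```yaml
# Hypothetical blueprint: bp-example and its chart version are placeholders.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-example
  namespace: flux-system
spec:
  interval: 15m
  chart:
    spec:
      chart: bp-example
      version: 1.0.0
      sourceRef:
        kind: HelmRepository
        name: bp-example
        namespace: flux-system
  install:
    # Explicit budget for the install action instead of the 5m default.
    timeout: 15m
    remediation:
      retries: 3
  upgrade:
    # Same budget on upgrade, so a slow hook does not fail an otherwise healthy rollout.
    timeout: 15m
    remediation:
      retries: 3
```

Setting the value on both actions keeps first install and later upgrades on the same budget, which is the symmetry the commit applies across the template surface.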
71 lines
2.0 KiB
YAML
# bp-syft-grype: Catalyst bootstrap-kit Blueprint #33 (W2.K3, Tier 7, Security/Policy).
# Anchore Syft + Grype as a scheduled CronJob. SBOM generation (Syft)
# paired with vulnerability matching (Grype): the offline / scheduled
# half of the supply-chain stack. Anchore does not publish a Helm chart
# for the open-source CLIs, so this Blueprint is a scratch chart that
# wires the official ghcr.io/anchore/syft and ghcr.io/anchore/grype
# containers into a CronJob that scans the Sovereign's image inventory.
#
# Wrapper chart: platform/syft-grype/chart/ (Catalyst-authored scratch
#   chart, no upstream subchart).
# Reconciled by: Flux on the new Sovereign's k3s control plane.
#
# dependsOn:
#   - bp-cert-manager: the result-export sidecar publishes SBOMs over
#     mTLS to the central scan-result store; cert-manager issues the
#     workload's TLS material via the cluster's ClusterIssuer.

---
apiVersion: v1
kind: Namespace
metadata:
  name: syft-grype
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-syft-grype
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-syft-grype
  namespace: flux-system
spec:
  interval: 15m
  releaseName: syft-grype
  targetNamespace: syft-grype
  dependsOn:
    - name: bp-cert-manager
  chart:
    spec:
      chart: bp-syft-grype
      version: 1.0.0
      sourceRef:
        kind: HelmRepository
        name: bp-syft-grype
        namespace: flux-system
  # Event-driven install: the Blueprint is mostly a CronJob + RBAC
  # surface. There is no long-running Deployment whose Ready=True is
  # meaningful, so disableWait is the correct shape: Flux marks Ready
  # as soon as manifests apply.
  install:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3