openova/clusters/_template/bootstrap-kit/33-syft-grype.yaml
e3mrah 7e8c0f2944
fix(bootstrap-kit): add explicit install/upgrade timeout to all HR templates (#154) (#1357)
Audit of clusters/_template/bootstrap-kit/ found 36 HelmRelease templates
without an explicit timeout under install/upgrade. They relied instead on
Helm's default 5m, which can expire before cold-start work finishes (CRD
apply, post-install Jobs, PVC binding on fresh nodes). PRs #127 / #131 /
#143 / #150 already added `timeout: 15m` to bp-self-sovereign-cutover,
bp-gitea, bp-external-secrets-stores and bp-harbor. Each of those was a
reactive fix, made after that blueprint hit the 5m race.
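The audit can be re-run with a short sketch like the one below. It assumes the templates have already been parsed into dicts, for example with PyYAML's `yaml.safe_load_all`, which is a third-party dependency and not part of this change. `missing_timeouts` is a hypothetical helper, not a script in this repo. It flags any HelmRelease that lacks `timeout` under either `spec.install` or `spec.upgrade`.

```python
from typing import Any, Iterable, Optional


def missing_timeouts(manifests: Iterable[Optional[dict[str, Any]]]) -> list[str]:
    """Return "namespace/name" for each HelmRelease lacking an explicit
    timeout under spec.install or spec.upgrade (hypothetical audit helper)."""
    offenders: list[str] = []
    for doc in manifests:
        # Empty YAML documents load as None; also skip Namespace,
        # HelmRepository and every other non-HelmRelease kind.
        if not doc or doc.get("kind") != "HelmRelease":
            continue
        spec = doc.get("spec") or {}
        # Both actions need their own timeout; a release-wide spec.timeout
        # is intentionally not counted, matching the install/upgrade audit.
        has_install = "timeout" in (spec.get("install") or {})
        has_upgrade = "timeout" in (spec.get("upgrade") or {})
        if not (has_install and has_upgrade):
            meta = doc.get("metadata") or {}
            offenders.append(f"{meta.get('namespace', '')}/{meta.get('name', '')}")
    return offenders
```

To cover the full template surface, feed it the loaded documents from every `*.yaml` under clusters/_template/bootstrap-kit/. An empty result means every HelmRelease carries both timeouts.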

Preempt the next 30+ reactive PRs by adding the same explicit
`timeout: 15m` to install AND upgrade across the full template surface.
The pattern matches the existing fixes: the timeout is kept alongside
`disableWait: true` where present. It still bounds the Helm
install/upgrade transaction itself (manifest apply, CRD establishment,
hook Jobs) even when waiting for workload Ready is disabled.

Modified blueprints (grouped by cohort):
  CNI/Gateway: cilium, gateway-api
  Cert/Identity: cert-manager, sealed-secrets, reflector, openbao*, keycloak*
  GitOps/IaC: flux, crossplane, crossplane-claims
  Messaging: nats-jetstream
  DNS: powerdns, external-dns, bp-cert-manager-powerdns-webhook
  Secrets: external-secrets
  Data: cnpg, valkey, seaweedfs
  Observability: opentelemetry, alloy, loki, mimir, tempo, grafana
  Policy: kyverno, reloader, vpa
  Security: trivy, falco, sigstore, syft-grype, coraza
  Backup: velero
  Platform: cluster-autoscaler, bp-k8s-ws-proxy, bp-guacamole, bp-hcloud-ccm
  Apps: newapi

(* openbao and keycloak, like gitea, harbor, cutover and es-stores,
   already had the timeout from prior PRs and were not modified.)

## Claimed TCs
Infra-only template change: preempts future cold-start wedges caused by
the Helm 5m default timeout across the full bootstrap-kit. The validation
surface is the next fresh provision (TC: zero-touch Sovereign provision
reaches Ready=True on all HRs without per-blueprint timeout fix-forwards).
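The TC can be spot-checked after a provision with a similar sketch. It assumes the HelmRelease list comes from `kubectl get helmreleases.helm.toolkit.fluxcd.io -A -o json`, parsed with `json.loads`. `not_ready` is a hypothetical helper that reports every HelmRelease whose `Ready` condition is absent or not `"True"`.

```python
from typing import Any


def not_ready(helmrelease_list: dict[str, Any]) -> list[str]:
    """Return "namespace/name" for each HelmRelease whose Ready condition
    is missing or not "True" (hypothetical TC check)."""
    pending: list[str] = []
    for item in helmrelease_list.get("items", []):
        meta = item.get("metadata") or {}
        # Conditions follow the Kubernetes shape:
        # {"type": ..., "status": "True" | "False" | "Unknown", ...}
        conditions = (item.get("status") or {}).get("conditions") or []
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            pending.append(f"{meta.get('namespace', '')}/{meta.get('name', '')}")
    return pending
```

An empty result corresponds to "Ready=True on all HRs". A release that has not reported status yet counts as pending.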

Refs #154, #127, #131, #143, #150.
Per principle 16: HR-level install/upgrade timeout is the canonical seam.
Per principle 4: target-state — preempt rather than react.

Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 08:41:50 +04:00


# bp-syft-grype — Catalyst bootstrap-kit Blueprint #33 (W2.K3, Tier 7 — Security/Policy).
# Anchore Syft + Grype as a scheduled CronJob. SBOM generation (Syft)
# paired with vulnerability matching (Grype) — the offline / scheduled
# half of the supply-chain stack. Anchore does not publish a Helm chart
# for the open-source CLIs, so this Blueprint is a scratch chart that
# wires the official ghcr.io/anchore/syft and ghcr.io/anchore/grype
# containers into a CronJob that scans the Sovereign's image inventory.
#
# Wrapper chart: platform/syft-grype/chart/ (Catalyst-authored scratch
# chart — no upstream subchart).
# Reconciled by: Flux on the new Sovereign's k3s control plane.
#
# dependsOn:
#   - bp-cert-manager: the result-export sidecar publishes SBOMs over
#     mTLS to the central scan-result store; cert-manager issues the
#     workload's TLS material via the cluster's ClusterIssuer.
---
apiVersion: v1
kind: Namespace
metadata:
  name: syft-grype
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-syft-grype
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-syft-grype
  namespace: flux-system
spec:
  interval: 15m
  releaseName: syft-grype
  targetNamespace: syft-grype
  dependsOn:
    - name: bp-cert-manager
  chart:
    spec:
      chart: bp-syft-grype
      version: 1.0.0
      sourceRef:
        kind: HelmRepository
        name: bp-syft-grype
        namespace: flux-system
  # Event-driven install: the Blueprint is mostly a CronJob + RBAC
  # surface. There is no long-running Deployment whose Ready=True is
  # meaningful, so disableWait is the correct shape: Flux marks Ready
  # as soon as manifests apply. The explicit 15m timeout (instead of
  # the 5m default) still bounds the Helm action itself, such as
  # manifest apply and any hook Jobs.
  install:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3