652d796f4b
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1ddd569789 |
fix(bp-*): observability toggles default false — break circular CRD dependency
Extends the v1.1.1 hardening that started with cilium / cert-manager /
crossplane to the remaining 8 bootstrap-kit + per-Sovereign Blueprints.
Every observability toggle in every Catalyst-curated Blueprint now ships
`false`/`null` by default; the operator opts in via a per-cluster values
overlay at clusters/<sovereign>/bootstrap-kit/* once
bp-kube-prometheus-stack reconciles.
Live failure mode that prompted this (omantel.omani.works 2026-04-29):
bp-cilium @ 1.1.0 defaulted hubble.relay/ui + prometheus.serviceMonitor
to true. The upstream Cilium 1.16.5 chart renders a
monitoring.coreos.com/v1 ServiceMonitor whose CRD ships with
kube-prometheus-stack — a tier-2 Application Blueprint that depends on
the bootstrap-kit (cilium first). Helm install fails on a fresh
Sovereign with "no matches for kind ServiceMonitor in version
monitoring.coreos.com/v1 — ensure CRDs are installed first" and every
downstream HelmRelease reports `dep is not ready`. The earlier
trustCRDsExist=true mitigation only suppresses Helm's render-time gate;
the apiserver still rejects the resource at install-time.
Per-Blueprint changes:
- bp-cilium: hubble.relay.enabled, hubble.ui.enabled → false;
hubble.metrics.enabled → null (this is the exact value that disables
the upstream metrics ServiceMonitor template branch — verified by
reading cilium 1.16.5's _hubble.tpl); hubble.metrics.serviceMonitor
.enabled → false. tests/observability-toggle.sh extended with Case 4
(default render produces no hubble-relay / hubble-ui Deployments).
- bp-flux: flux2.prometheus.podMonitor.create → false.
- bp-sealed-secrets: sealed-secrets.metrics.serviceMonitor.enabled
→ false (explicit lock; upstream already defaults false).
- bp-spire: spire.global.spire.recommendations.enabled +
recommendations.prometheus → false.
- bp-nats-jetstream: nats.promExporter.enabled +
promExporter.podMonitor.enabled → false.
- bp-openbao: openbao.injector.metrics.enabled +
openbao.serviceMonitor.enabled → false.
- bp-keycloak: keycloak.metrics.enabled + metrics.serviceMonitor.enabled
+ metrics.prometheusRule.enabled → false.
- bp-gitea: gitea.gitea.metrics.* and gitea.postgresql.metrics.*
serviceMonitor + prometheusRule → false.
- bp-powerdns: powerdns.serviceMonitor.enabled + powerdns.metrics.enabled
→ false (forward-compatibility guard; current upstream
pschichtel/powerdns 0.10.0 has no ServiceMonitor template, but a future
upstream bump cannot silently regress).
Each chart ships a tests/observability-toggle.sh that asserts the rule
in three cases (default off / explicit on opt-in / explicit off) — runs
under blueprint-release.yaml's chart-test gate (added
|
||
|
|
b1638f51ea |
fix(bp-* tests): skip helm dep build when charts/ already vendored
Earlier rerun failure on the CI workflow (bp-cert-manager 25120060270): Error: no repository definition for https://charts.jetstack.io. Please add the missing repos via 'helm repo add' Root cause: blueprint-release.yaml's earlier `helm dependency build` step (line 181) successfully resolves the upstream chart and populates chart/charts/ — but it does NOT `helm repo add` the upstream repo first. Helm 3.20's `helm dep build` succeeds on the first call by falling back to direct-URL fetch from Chart.yaml `dependencies[].repository`. A SECOND `helm dep build` (run by the test script) hits a different code path that requires the repo to be in the helm repo cache. Fix: tests/observability-toggle.sh now skips `helm dep build` when chart/charts/ is already populated (which is always the case in CI since the workflow's own `helm dependency build` step ran first). Local dev runs from a fresh checkout still resolve subcharts. Refs #182 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d34facc040 |
fix(bp-*): observability toggles default false — break circular CRD dependency
bp-cilium@1.1.0 install fails on every fresh Sovereign with:
no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
— ensure CRDs are installed first
Cascades to all 10 other bp-* HelmReleases ("dep is not ready") since
bp-cilium is the root of the bootstrap dep graph. Verified live on
omantel.omani.works 2026-04-29 (issue #182).
Root cause: platform/cilium/chart/values.yaml and
platform/cert-manager/chart/values.yaml hardcoded
`serviceMonitor.enabled: true`. The monitoring.coreos.com/v1 CRDs ship
with kube-prometheus-stack — an Application-tier Blueprint that itself
depends on the bootstrap-kit. Hardcoding `true` creates a circular CRD
ordering: bp-cilium wants the CRD bp-kube-prometheus-stack provides, but
bp-kube-prometheus-stack cannot install before bp-cilium.
The `trustCRDsExist=true` mitigation only suppresses Helm's render-time
gate; the apiserver still rejects the resource at install-time.
Violates INVIOLABLE-PRINCIPLES.md #4 (never hardcode): observability
toggles MUST be operator-tunable, not chart-level constants assuming an
observability tier exists.
This commit:
A. Defaults every observability toggle false in the affected wrappers:
- platform/cilium/chart/values.yaml:
cilium.prometheus.enabled: false
cilium.prometheus.serviceMonitor.enabled: false
(trustCRDsExist removed — no longer relevant)
- platform/cert-manager/chart/values.yaml:
cert-manager.prometheus.enabled: false
cert-manager.prometheus.servicemonitor.enabled: false
- platform/crossplane/chart/values.yaml:
crossplane.metrics.enabled: false
(uniformity rule — does not break install but holds the invariant)
B. Bumps affected wrapper charts 1.1.0 → 1.1.1:
- bp-cilium, bp-cert-manager, bp-crossplane (leaves)
- bp-catalyst-platform (umbrella; deps repinned to 1.1.1 for the 3)
C. Updates clusters/_template/bootstrap-kit/* and
clusters/omantel.omani.works/bootstrap-kit/* HelmRelease versions to
1.1.1 so the live Sovereign picks up the fix on Flux reconcile.
D. Adds platform/<name>/chart/tests/observability-toggle.sh under each
affected chart. Each script asserts:
- default render produces zero monitoring.coreos.com refs
- opt-in render with --set <toggle>=true succeeds and produces a
ServiceMonitor (proves the toggle is wired)
- explicit-off render succeeds and produces zero refs
Wired into .github/workflows/blueprint-release.yaml via a new
"Run chart integration tests" step that executes every chart/tests/
*.sh on every publish — a regression that re-introduces a hardcoded
`true` fails the publish job before the OCI artifact is pushed.
E. Documents the rule in docs/BLUEPRINT-AUTHORING.md §11.2
"Observability toggles must default false". References Principle #4
and provides the canonical pattern (default off in wrapper values,
opt-in via per-cluster overlay at clusters/<sovereign>/...).
Per-chart audit table (which toggle was hardcoded → new default):
| Chart | Toggle | Was | Now |
|------------------|----------------------------------------------------------|------|-------|
| bp-cilium | cilium.prometheus.enabled | true | false |
| bp-cilium | cilium.prometheus.serviceMonitor.enabled | true | false |
| bp-cert-manager | cert-manager.prometheus.enabled | true | false |
| bp-cert-manager | cert-manager.prometheus.servicemonitor.enabled | true | false |
| bp-crossplane | crossplane.metrics.enabled | true | false |
| bp-flux | (no observability hardcodes) | n/a | n/a |
| bp-sealed-secrets| (no observability hardcodes) | n/a | n/a |
| bp-spire | (no observability hardcodes) | n/a | n/a |
| bp-nats-jetstream| (no observability hardcodes) | n/a | n/a |
| bp-openbao | (no observability hardcodes) | n/a | n/a |
| bp-keycloak | (no observability hardcodes) | n/a | n/a |
| bp-gitea | (no observability hardcodes) | n/a | n/a |
| bp-powerdns | (no observability hardcodes) | n/a | n/a |
| bp-catalyst-platform | (umbrella, no values overlay) | n/a | n/a |
Local gates green:
helm dep build ✓ all 3 affected charts
helm lint ✓ all 3
helm template ✓ all 3 — 0 monitoring.coreos.com refs in default
tests/observability-toggle.sh ✓ all 9 sub-cases pass
Closes the install path for bp-cilium 1.1.1 on a fresh Sovereign;
unblocks the full bp-* dep graph.
Refs: https://github.com/openova-io/openova/issues/182
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
43aff20254 |
feat(bp-*): convert all 11 bootstrap-kit charts to umbrella charts depending on upstream
Each platform/<name>/chart/Chart.yaml now declares the canonical upstream chart as a dependencies: entry. helm dependency build pulls the upstream payload into the OCI artifact at publish time, so Flux helm install of bp-<name>:1.1.0 actually installs the upstream Helm release alongside the Catalyst-curated overlays (NetworkPolicy, ServiceMonitor, ClusterIssuer, ExternalSecret) under templates/. Pinned upstream chart versions per platform/<name>/blueprint.yaml: - cilium 1.16.5 https://helm.cilium.io - cert-manager v1.16.2 https://charts.jetstack.io - flux 2.4.0 https://fluxcd-community.github.io/helm-charts - crossplane 1.17.x https://charts.crossplane.io/stable - sealed-secrets 2.16.x https://bitnami-labs.github.io/sealed-secrets - spire ... https://spiffe.github.io/helm-charts-hardened - nats-jetstream ... https://nats-io.github.io/k8s/helm/charts - openbao ... https://openbao.github.io/openbao-helm - keycloak ... https://charts.bitnami.com/bitnami - gitea ... https://dl.gitea.com/charts - catalyst-platform umbrella over the 10 leaf bp-* charts via helm dependency values.yaml in each chart adopts the umbrella convention: catalystBlueprint metadata block (provenance + version) at top level, upstream subchart values namespaced under the dependency name. cert-manager specifically: clusterissuer-letsencrypt-dns01.yaml gets the helm.sh/hook: post-install,post-upgrade annotation so it applies AFTER cert-manager controllers are running and CRDs registered (the previous hollow-chart shape ran the ClusterIssuer at install time when CRDs didn't exist yet, which was the omantel cluster's exact failure mode). Wrapper chart version bumped 1.0.0 → 1.1.0 across the board (umbrella conversion is a meaningful structural revision). Cluster manifests in clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/ bootstrap-kit/ updated to reference 1.1.0. The blueprint-release.yaml workflow's helm package step needs an explicit helm dependency build before push so the upstream subchart bytes ship inside the OCI artifact. That CI change is a follow-up commit on this same branch (separate file scope). |
||
|
|
62d9c7d936 |
fix(charts): drop dependencies block — wrappers carry values overlay only
The first 2 blueprint-release CI runs failed on `helm package` with containerd permission errors because the wrapper Chart.yaml's `dependencies:` block triggered helm to pull the upstream charts via OCI/containerd at package time, which the GitHub Actions runner blocks.
Architectural fix: each Catalyst Blueprint wrapper carries the values overlay + metadata only. The bootstrap installer reads the upstream chart reference from the wrapper's values.yaml `catalystBlueprint.upstream.{chart,version,repo}` metadata block, points `helm install` at the upstream chart's repo, and overlays our values.
This keeps:
- blueprint-release CI lightweight (no upstream pulls during package; helm package now works without containerd)
- the "bp-<name> wrapper does NOT drift from upstream" property (we ship the overlay, not a fork)
- the single Blueprint contract from BLUEPRINT-AUTHORING §1 (a wrapper is still a Catalyst-curated Helm chart published as bp-<name>:<semver>)
Changes:
- 11 platform/<name>/chart/Chart.yaml: removed dependencies block. Each is now a plain Helm chart with no remote pulls during package.
- 11 platform/<name>/chart/values.yaml: prepended catalystBlueprint.upstream.{chart,version,repo} metadata block at the top. Bootstrap installer parses it to know which upstream chart to install with these values.
- products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go: installCilium now does `helm repo add cilium https://helm.cilium.io --force-update` then `helm install cilium cilium/cilium --version 1.16.5 --values -` (the cilium/cilium upstream chart, with our overlay values piped from values.yaml). Same pattern needs propagating to the other 10 install functions in a follow-up.
After this commit, blueprint-release CI should green-build all 11 wrappers (helm package now works without containerd access since there's nothing to pull). The bootstrap installer's actual `helm install` calls in production reach upstream chart repos via the runtime k3s cluster's pod network, which has full network access.
|
||
|
|
8c0f76640c |
feat(charts): G2 wrapper Helm charts for 11 bootstrap-kit components + blueprint-release CI
Per docs/PROVISIONING-PLAN.md and tickets [F] chart. Adds Catalyst-curated wrapper Helm charts at platform/<name>/chart/ for every component the bootstrap-kit installer (introduced in commit |