Commit Graph

6 Commits

Author SHA1 Message Date
hatiyildiz
1ddd569789 fix(bp-*): observability toggles default false — break circular CRD dependency
Extends the v1.1.1 hardening that started with cilium / cert-manager /
crossplane to the remaining 8 bootstrap-kit + per-Sovereign Blueprints.
Every observability toggle in every Catalyst-curated Blueprint now ships
`false`/`null` by default; the operator opts in via a per-cluster values
overlay at clusters/<sovereign>/bootstrap-kit/* once
bp-kube-prometheus-stack reconciles.

Live failure mode that prompted this (omantel.omani.works 2026-04-29):
bp-cilium @ 1.1.0 defaulted hubble.relay/ui + prometheus.serviceMonitor
to true. The upstream Cilium 1.16.5 chart renders a
monitoring.coreos.com/v1 ServiceMonitor whose CRD ships with
kube-prometheus-stack — a tier-2 Application Blueprint that depends on
the bootstrap-kit (cilium first). Helm install fails on a fresh
Sovereign with "no matches for kind ServiceMonitor in version
monitoring.coreos.com/v1 — ensure CRDs are installed first" and every
downstream HelmRelease reports `dep is not ready`. The earlier
trustCRDsExist=true mitigation only suppresses Helm's render-time gate;
the apiserver still rejects the resource at install-time.

Per-Blueprint changes:
- bp-cilium: hubble.relay.enabled, hubble.ui.enabled → false;
  hubble.metrics.enabled → null (this is the exact value that disables
  the upstream metrics ServiceMonitor template branch — verified by
  reading cilium 1.16.5's _hubble.tpl); hubble.metrics.serviceMonitor
  .enabled → false. tests/observability-toggle.sh extended with Case 4
  (default render produces no hubble-relay / hubble-ui Deployments).
- bp-flux: flux2.prometheus.podMonitor.create → false.
- bp-sealed-secrets: sealed-secrets.metrics.serviceMonitor.enabled
  → false (explicit lock; upstream already defaults false).
- bp-spire: spire.global.spire.recommendations.enabled +
  recommendations.prometheus → false.
- bp-nats-jetstream: nats.promExporter.enabled +
  promExporter.podMonitor.enabled → false.
- bp-openbao: openbao.injector.metrics.enabled +
  openbao.serviceMonitor.enabled → false.
- bp-keycloak: keycloak.metrics.enabled + metrics.serviceMonitor.enabled
  + metrics.prometheusRule.enabled → false.
- bp-gitea: gitea.gitea.metrics.* and gitea.postgresql.metrics.*
  serviceMonitor + prometheusRule → false.
- bp-powerdns: powerdns.serviceMonitor.enabled + powerdns.metrics.enabled
  → false (forward-compatibility guard; current upstream
  pschichtel/powerdns 0.10.0 has no ServiceMonitor template, but a future
  upstream bump cannot silently regress).

Each chart ships a tests/observability-toggle.sh that asserts the rule
in three cases (default off / explicit on opt-in / explicit off) — runs
under blueprint-release.yaml's chart-test gate (added bdeb0f54 + the
existing wiring) before helm push. A regression that re-introduces a
hardcoded enabled: true in any chart fails CI before the OCI artifact
is published.

Versioning:
- All 11 leaf charts bumped 1.1.0 → 1.1.1.
- products/catalyst/chart (bp-catalyst-platform umbrella) deps updated
  to 1.1.1 across the board.
- clusters/_template/bootstrap-kit/03-flux through 10-gitea bumped to
  1.1.1; clusters/omantel.omani.works/bootstrap-kit/* mirror.

docs/BLUEPRINT-AUTHORING.md §11.2 table extended to enumerate every
toggle disabled across all 11 Blueprints. References
docs/INVIOLABLE-PRINCIPLES.md #4.

GATES (all green):
- helm dep build resolves cleanly post-change for every chart whose
  upstream is published (umbrella waits on per-leaf publish).
- helm lint clean on all 11 leaves.
- helm template . default render produces zero monitoring.coreos.com
  references on every leaf (verified locally).
- tests/observability-toggle.sh PASS on all 11 leaves.

Live verification: with v1.1.1 published the omantel.omani.works
HelmRelease can roll forward without a manual values patch — Flux picks
up the new chart digest automatically (semver: 1.x in OCIRepository).

Refs: issue #182.
2026-04-29 19:23:52 +02:00
hatiyildiz
b1638f51ea fix(bp-* tests): skip helm dep build when charts/ already vendored
Earlier rerun failure on the CI workflow (bp-cert-manager 25120060270):

  Error: no repository definition for https://charts.jetstack.io.
  Please add the missing repos via 'helm repo add'

Root cause: blueprint-release.yaml's earlier `helm dependency build`
step (line 181) successfully resolves the upstream chart and populates
chart/charts/ — but it does NOT `helm repo add` the upstream repo
first. Helm 3.20's `helm dep build` succeeds on the first call by
falling back to direct-URL fetch from Chart.yaml `dependencies[].repository`.
A SECOND `helm dep build` (run by the test script) hits a different
code path that requires the repo to be in the helm repo cache.

Fix: tests/observability-toggle.sh now skips `helm dep build` when
chart/charts/ is already populated (which is always the case in CI
since the workflow's own `helm dependency build` step ran first). Local
dev runs from a fresh checkout still resolve subcharts.

Refs #182

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:12:21 +02:00
hatiyildiz
d34facc040 fix(bp-*): observability toggles default false — break circular CRD dependency
bp-cilium@1.1.0 install fails on every fresh Sovereign with:

  no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
  — ensure CRDs are installed first

Cascades to all 10 other bp-* HelmReleases ("dep is not ready") since
bp-cilium is the root of the bootstrap dep graph. Verified live on
omantel.omani.works 2026-04-29 (issue #182).

Root cause: platform/cilium/chart/values.yaml and
platform/cert-manager/chart/values.yaml hardcoded
`serviceMonitor.enabled: true`. The monitoring.coreos.com/v1 CRDs ship
with kube-prometheus-stack — an Application-tier Blueprint that itself
depends on the bootstrap-kit. Hardcoding `true` creates a circular CRD
ordering: bp-cilium wants the CRD bp-kube-prometheus-stack provides, but
bp-kube-prometheus-stack cannot install before bp-cilium.

The `trustCRDsExist=true` mitigation only suppresses Helm's render-time
gate; the apiserver still rejects the resource at install-time.

Violates INVIOLABLE-PRINCIPLES.md #4 (never hardcode): observability
toggles MUST be operator-tunable, not chart-level constants assuming an
observability tier exists.

This commit:

A. Defaults every observability toggle false in the affected wrappers:
   - platform/cilium/chart/values.yaml:
     cilium.prometheus.enabled: false
     cilium.prometheus.serviceMonitor.enabled: false
     (trustCRDsExist removed — no longer relevant)
   - platform/cert-manager/chart/values.yaml:
     cert-manager.prometheus.enabled: false
     cert-manager.prometheus.servicemonitor.enabled: false
   - platform/crossplane/chart/values.yaml:
     crossplane.metrics.enabled: false
     (uniformity rule — does not break install but holds the invariant)

B. Bumps affected wrapper charts 1.1.0 → 1.1.1:
   - bp-cilium, bp-cert-manager, bp-crossplane (leaves)
   - bp-catalyst-platform (umbrella; deps repinned to 1.1.1 for the 3)

C. Updates clusters/_template/bootstrap-kit/* and
   clusters/omantel.omani.works/bootstrap-kit/* HelmRelease versions to
   1.1.1 so the live Sovereign picks up the fix on Flux reconcile.

D. Adds platform/<name>/chart/tests/observability-toggle.sh under each
   affected chart. Each script asserts:
     - default render produces zero monitoring.coreos.com refs
     - opt-in render with --set <toggle>=true succeeds and produces a
       ServiceMonitor (proves the toggle is wired)
     - explicit-off render succeeds and produces zero refs
   Wired into .github/workflows/blueprint-release.yaml via a new
   "Run chart integration tests" step that executes every chart/tests/
   *.sh on every publish — a regression that re-introduces a hardcoded
   `true` fails the publish job before the OCI artifact is pushed.

E. Documents the rule in docs/BLUEPRINT-AUTHORING.md §11.2
   "Observability toggles must default false". References Principle #4
   and provides the canonical pattern (default off in wrapper values,
   opt-in via per-cluster overlay at clusters/<sovereign>/...).

Per-chart audit table (which toggle was hardcoded → new default):

| Chart            | Toggle                                                   | Was  | Now   |
|------------------|----------------------------------------------------------|------|-------|
| bp-cilium        | cilium.prometheus.enabled                                | true | false |
| bp-cilium        | cilium.prometheus.serviceMonitor.enabled                 | true | false |
| bp-cert-manager  | cert-manager.prometheus.enabled                          | true | false |
| bp-cert-manager  | cert-manager.prometheus.servicemonitor.enabled           | true | false |
| bp-crossplane    | crossplane.metrics.enabled                               | true | false |
| bp-flux          | (no observability hardcodes)                             | n/a  | n/a   |
| bp-sealed-secrets| (no observability hardcodes)                             | n/a  | n/a   |
| bp-spire         | (no observability hardcodes)                             | n/a  | n/a   |
| bp-nats-jetstream| (no observability hardcodes)                             | n/a  | n/a   |
| bp-openbao       | (no observability hardcodes)                             | n/a  | n/a   |
| bp-keycloak      | (no observability hardcodes)                             | n/a  | n/a   |
| bp-gitea         | (no observability hardcodes)                             | n/a  | n/a   |
| bp-powerdns      | (no observability hardcodes)                             | n/a  | n/a   |
| bp-catalyst-platform | (umbrella, no values overlay)                        | n/a  | n/a   |

Local gates green:
  helm dep build      ✓ all 3 affected charts
  helm lint           ✓ all 3
  helm template       ✓ all 3 — 0 monitoring.coreos.com refs in default
  tests/observability-toggle.sh  ✓ all 9 sub-cases pass

Closes the install path for bp-cilium 1.1.1 on a fresh Sovereign;
unblocks the full bp-* dep graph.

Refs: https://github.com/openova-io/openova/issues/182

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:08:09 +02:00
hatiyildiz
43aff20254 feat(bp-*): convert all 11 bootstrap-kit charts to umbrella charts depending on upstream
Each platform/<name>/chart/Chart.yaml now declares the canonical upstream
chart as a dependencies: entry. helm dependency build pulls the upstream
payload into the OCI artifact at publish time, so Flux helm install of
bp-<name>:1.1.0 actually installs the upstream Helm release alongside the
Catalyst-curated overlays (NetworkPolicy, ServiceMonitor, ClusterIssuer,
ExternalSecret) under templates/.

Pinned upstream chart versions per platform/<name>/blueprint.yaml:
- cilium                 1.16.5  https://helm.cilium.io
- cert-manager           v1.16.2 https://charts.jetstack.io
- flux                   2.4.0   https://fluxcd-community.github.io/helm-charts
- crossplane             1.17.x  https://charts.crossplane.io/stable
- sealed-secrets         2.16.x  https://bitnami-labs.github.io/sealed-secrets
- spire                  ...     https://spiffe.github.io/helm-charts-hardened
- nats-jetstream         ...     https://nats-io.github.io/k8s/helm/charts
- openbao                ...     https://openbao.github.io/openbao-helm
- keycloak               ...     https://charts.bitnami.com/bitnami
- gitea                  ...     https://dl.gitea.com/charts
- catalyst-platform      umbrella over the 10 leaf bp-* charts via
                         helm dependency

values.yaml in each chart adopts the umbrella convention: catalystBlueprint
metadata block (provenance + version) at top level, upstream subchart
values namespaced under the dependency name.

cert-manager specifically: clusterissuer-letsencrypt-dns01.yaml gets the
helm.sh/hook: post-install,post-upgrade annotation so it applies AFTER
cert-manager controllers are running and CRDs registered (the previous
hollow-chart shape ran the ClusterIssuer at install time when CRDs
didn't exist yet, which was the omantel cluster's exact failure mode).

Wrapper chart version bumped 1.0.0 → 1.1.0 across the board (umbrella
conversion is a meaningful structural revision). Cluster manifests in
clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/
bootstrap-kit/ updated to reference 1.1.0.

The blueprint-release.yaml workflow's helm package step needs an
explicit helm dependency build before push so the upstream subchart
bytes ship inside the OCI artifact. That CI change is a follow-up
commit on this same branch (separate file scope).
2026-04-29 17:21:36 +02:00
hatiyildiz
62d9c7d936 fix(charts): drop dependencies block — wrappers carry values overlay only
The first 2 blueprint-release CI runs failed on `helm package` with containerd permission errors because the wrapper Chart.yaml's `dependencies:` block triggered helm to pull the upstream charts via OCI/containerd at package time, which the GitHub Actions runner blocks.

Architectural fix: each Catalyst Blueprint wrapper carries the values overlay + metadata only. The bootstrap installer reads the upstream chart reference from the wrapper's values.yaml `catalystBlueprint.upstream.{chart,version,repo}` metadata block, points `helm install` at the upstream chart's repo, and overlays our values.

This keeps:
- blueprint-release CI lightweight (no upstream pulls during package; helm package now works without containerd)
- the "bp-<name> wrapper does NOT drift from upstream" property (we ship the overlay, not a fork)
- the single Blueprint contract from BLUEPRINT-AUTHORING §1 (a wrapper is still a Catalyst-curated Helm chart published as bp-<name>:<semver>)

Changes:
- 11 platform/<name>/chart/Chart.yaml: removed dependencies block. Each is now a plain Helm chart with no remote pulls during package.
- 11 platform/<name>/chart/values.yaml: prepended catalystBlueprint.upstream.{chart,version,repo} metadata block at the top. Bootstrap installer parses it to know which upstream chart to install with these values.
- products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go: installCilium now does `helm repo add cilium https://helm.cilium.io --force-update` then `helm install cilium cilium/cilium --version 1.16.5 --values -` (the cilium/cilium upstream chart, with our overlay values piped from values.yaml). Same pattern needs propagating to the other 10 install functions in a follow-up.

After this commit, blueprint-release CI should green-build all 11 wrappers (helm package now works without containerd access since there's nothing to pull). The bootstrap installer's actual `helm install` calls in production reach upstream chart repos via the runtime k3s cluster's pod network, which has full network access.
2026-04-28 12:57:29 +02:00
hatiyildiz
8c0f76640c feat(charts): G2 wrapper Helm charts for 11 bootstrap-kit components + blueprint-release CI
Per docs/PROVISIONING-PLAN.md and tickets [F] chart. Adds Catalyst-curated wrapper Helm charts at platform/<name>/chart/ for every component the bootstrap-kit installer (introduced in commit 07b4bcf) needs. Each chart is the canonical bp-<name> source per BLUEPRINT-AUTHORING.md §1's source-location rule.

11 charts created with Chart.yaml + values.yaml + blueprint.yaml each:

Network + GitOps:
- platform/cilium/chart — wraps cilium 1.16.5; kubeProxyReplacement, WireGuard mTLS, Hubble, Gateway API
- platform/flux/chart — wraps flux 2.4.0
- platform/crossplane/chart — wraps crossplane 1.18.0 + provider-hcloud manifest

Security:
- platform/cert-manager/chart — wraps cert-manager 1.16.2 with CRDs+ServiceMonitor
- platform/sealed-secrets/chart — wraps sealed-secrets 2.16.1 (transient bootstrap-only)
- platform/spire/chart — wraps spiffe/spire 1.10.4 (5-min SVID rotation)

Catalyst control-plane services:
- platform/nats-jetstream/chart — wraps nats 2.10.22 (3-node cluster, JetStream + KV)
- platform/openbao/chart — wraps openbao 2.1.0 (3-node Raft, region-local per SECURITY §5)
- platform/keycloak/chart — wraps keycloak 25.0.6 (Bitnami flavor, edge proxy mode)
- platform/gitea/chart — wraps gitea 10.5.0 (CNPG Postgres backend, no chart-bundled valkey/redis since Catalyst control plane uses JetStream)

New platform/ folders (added per AUDIT-PROCEDURE component-count anchor — was 53, now 55):
- platform/spire/README.md — workload identity Catalyst control plane component
- platform/nats-jetstream/README.md — control-plane event spine
- platform/sealed-secrets/README.md — transient bootstrap-only

Each blueprint.yaml declares:
- catalyst.openova.io/v1alpha1 Blueprint kind (canonical CRD per BLUEPRINT-AUTHORING §3)
- visibility: unlisted (mandatory infra, auto-installed by bootstrap kit, not a marketplace card)
- manifests.chart: ./chart pointer
- depends: [] (foundational components have no Blueprint dependencies; control-plane services depend on each other implicitly via bootstrap order, not via Blueprint depends)

.github/workflows/blueprint-release.yaml:
- New CI workflow per BLUEPRINT-AUTHORING §11 (path-matrix per Blueprint folder)
- Triggers on push to main touching platform/*/chart/** or products/*/chart/**
- detect job: emits matrix of changed Blueprint folders via git diff
- build job (per chart): helm dependency build → helm package → helm push to GHCR → cosign keyless sign (GitHub OIDC) → Syft SBOM attestation
- Output: ghcr.io/openova-io/bp-<name>:<semver> with SLSA-3-style supply-chain provenance

Closes [F] tickets: 11 G2 charts (cilium, cert-manager, flux, crossplane, sealed-secrets, spire, nats-jetstream, openbao, keycloak, gitea, plus the umbrella products/catalyst/chart already exists from Pass 105). blueprint.yaml CRDs added across 11 entries. CI fan-out workflow live.

After this commit lands, the bootstrap-kit installer in commit 07b4bcf has real OCI artifacts to install. The first push to main will trigger 10 build matrix jobs (cilium was created in a separate commit earlier in this session) which produce 10 cosigned bp-<name>:<semver> artifacts on GHCR.

Component-count anchor update follows: 53 → 55 (added spire + nats-jetstream + sealed-secrets — but sealed-secrets was already conceptually counted under "supporting services"). Per AUDIT-PROCEDURE the count needs updating in CLAUDE.md, BUSINESS-STRATEGY, TECHNOLOGY-FORECAST L11. Tracked as separate ticket [K] docs.
2026-04-28 12:51:06 +02:00