Commit Graph

13 Commits

Author SHA1 Message Date
e3mrah
ab67a48fe7
fix(blueprints): align blueprint.yaml spec.version with Chart.yaml version (#817) (#819)
TestBootstrapKit_BlueprintCardsHaveRequiredFields was failing on main for
9 blueprints because their platform/<name>/chart/Chart.yaml version had
been bumped without a matching update to platform/<name>/blueprint.yaml
spec.version. The pre-existing failure forced 7 recent PRs to self-merge
with --admin, masking real CI failures.

Aligned spec.version to match Chart.yaml version on:

  cert-manager   1.1.1 -> 1.1.2
  flux           1.1.3 -> 1.1.4
  crossplane     1.1.3 -> 1.1.4
  sealed-secrets 1.1.1 -> 1.1.2
  spire          1.1.4 -> 1.1.7
  nats-jetstream 1.1.1 -> 1.1.2
  openbao        1.2.0  -> 1.2.14
  keycloak       1.3.1 -> 1.3.2
  gitea          1.2.1 -> 1.2.3

Verified locally:

  $ go test ./... -run TestBootstrapKit_BlueprintCardsHaveRequiredFields -count=1
  --- PASS: TestBootstrapKit_BlueprintCardsHaveRequiredFields (0.01s)
      ... all 10 sub-tests pass (cilium + the 9 above)

The existing test (tests/e2e/bootstrap-kit/main_test.go:145) is itself
the drift guardrail: it fails CI whenever Chart.yaml is bumped without a
matching blueprint.yaml bump. No additional script needed.

Closes #817 once verified on main.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-04 22:32:49 +04:00
e3mrah
2b60e944e2
fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook (#681)
* fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook

Caught live on otech43-46: cert-manager DNS-01 challenges for
*.otechN.omani.works failed because the Sovereign-side webhook wrote
challenge TXT records to the Sovereign's local PowerDNS. omani.works is
delegated from Dynadot to ns1/2/3.openova.io which run on contabo's
central PowerDNS — the Sovereign's local PowerDNS is INVISIBLE on the
public DNS chain until pool-domain-manager seals the per-Sovereign NS
delegation. Let's Encrypt resolvers walk the public chain, query
contabo, get NXDOMAIN, the cert never issues. Manual workaround was
seeding challenge TXT directly in contabo PowerDNS.

This PR automates the right write path:

- bp-cert-manager-powerdns-webhook chart bumped to 1.0.4. Default
  powerdns.host flips from "" (skip-render) to https://pdns.openova.io
  (contabo's public PowerDNS API ingress, authoritative for omani.works).
- ClusterIssuer letsencrypt-dns01-prod-powerdns now usable with no
  per-cluster powerdns.host override for the omani.works pool.
  apiKeySecretRef.namespace clarified — upstream ignores it; the Secret
  must live in cert-manager namespace (= ChallengeRequest.ResourceNamespace
  for ClusterIssuers).
- bootstrap-kit slot 49 updated: drops bp-powerdns dependsOn (webhook
  calls out-of-cluster contabo, not local PowerDNS), bumps chart version,
  removes inline powerdns.host override (defaults are correct).
- bootstrap-kit slot 49b (bp-cert-manager-dynadot-webhook) DELETED
  entirely — Dynadot is NOT the API-level authority for omani.works
  subdomains, the dynadot webhook silently fails the same way the
  Sovereign-local powerdns one did.
- clusters/_template/sovereign-tls/cilium-gateway-cert.yaml flips
  issuerRef from letsencrypt-dns01-prod (was dynadot-backed) to
  letsencrypt-dns01-prod-powerdns (the new contabo-backed issuer).
- bp-cert-manager chart: certManager.issuers.dns01.enabled defaults to
  false (deprecated dynadot path). letsencrypt-http01-prod retained for
  per-host certs. Cluster overlays MAY flip dns01.enabled=true for
  non-omani.works pools where Dynadot IS the API-level authority.
- scripts/expected-bootstrap-deps.yaml: drops slot 49b, drops bp-powerdns
  edge from slot 49.
- Documentation (README + blueprint.yaml + Chart.yaml description)
  rewritten to reflect contabo retarget and lifecycle reasoning.

Credential plumbing (out of scope here, must be done in cloud-init):
- Every Sovereign needs a `powerdns-api-credentials` Secret in the
  `cert-manager` namespace whose `api-key` value matches contabo's
  PowerDNS API key. Same seeding pattern as `dynadot-api-credentials`
  in infra/hetzner/cloudinit-control-plane.tftpl.

Caveat — basicAuth on contabo's PowerDNS API ingress: contabo currently
fronts pdns.openova.io with Traefik basicAuth (per
clusters/contabo-mkt/apps/powerdns/helmrelease.yaml). The upstream
zachomedia/cert-manager-webhook-pdns binary supports the X-API-Key
header but not HTTP Basic Auth out of the box. To make this end-to-end
green, contabo's basicAuth requirement must be relaxed (X-API-Key alone
provides the auth posture, and contabo's API endpoint is restricted to
operator IPs by other means OR the Sovereign's webhook needs an
Authorization header injected via the chart's powerdns.headers map
(plaintext password in the ClusterIssuer config — not ideal). This PR
ships the chart side; the basicAuth question is a follow-up on the
contabo side.

Verified locally:
- helm lint platform/cert-manager-powerdns-webhook/chart -> PASS
- helm template platform/cert-manager-powerdns-webhook/chart -> renders
- helm template ... --set clusterIssuer.enabled=true -> renders the
  ClusterIssuer with host="https://pdns.openova.io" + correct apiKey
  Secret reference.
- helm template platform/cert-manager/chart -> renders ONLY
  letsencrypt-http01-prod (the dns01 dynadot issuer correctly gated off).
- scripts/check-bootstrap-deps.sh: net-zero new drift; my branch reduces
  pre-existing errors from 3 to 2 (the dropped slot 49b removed the only
  drift my branch was responsible for).

Closes follow-up to #373. Preconditions for handover URL TLS green
on otech43-46 lineage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): repair YAML structure in expected-bootstrap-deps.yaml

Two pre-existing drifts were blocking dependency-graph-audit CI:

1. Slot 5a (bp-reflector) was missing its closing list separator,
   causing yq to merge the bp-nats-jetstream entry into the bp-reflector
   map and effectively drop bp-reflector from the expected DAG.
   Added explicit `- slot: 7` for bp-nats-jetstream and quoted "5a" so
   yq treats it as a string slot (matches the convention with "49b").

2. bp-powerdns slot 11: actual bootstrap-kit declares dependsOn
   bp-cnpg (live since otech28 — pdns-pg-app secret race) but the
   expected DAG was missing this edge.

This is unblocks merging fix/cert-manager-powerdns-webhook-contabo (PR
above) — these drifts existed on main but weren't surfaced until the
last expected-deps edit forced a re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:12:48 +04:00
e3mrah
a7fa0626b2
feat(platform): add global.imageRegistry to bp-cilium/cert-manager/cert-manager-pdns-webhook/sealed-secrets (PR 1/3 #560) (#562)
* docs(wbs): Mermaid DAG shows actual Phase-8a dependency cascade

Per founder corrective: existing diagram missed the real blockers
surfaced during otech10..otech22 burns. The image-pull-through gap
(#557) and the cross-namespace secret gap (#543, #544) gate every
workload pull from a public registry — without them, Sovereign hits
DockerHub anonymous rate-limit on first provision and 30+ HRs are
ImagePullBackOff/CreateContainerConfigError.

Adds:
- Phase 0b · Image pull-through (#557 + #557B Sovereign-Harbor swap +
  #557C charts global.imageRegistry templating). Edges to NATS / Gitea
  / Harbor / Grafana / Loki / Mimir / PowerDNS / Crossplane /
  cert-manager-powerdns-webhook / Trivy / Kyverno / SPIRE / OpenBao
- Phase 0c · Cross-namespace secrets (#543 ghcr-pull Reflector + #544
  powerdns-api-credentials reflect). Edges to bp-catalyst-platform and
  bp-cert-manager-powerdns-webhook
- Phase 1 additions: #542 kubeconfig CP-IP fix and #547 helmwatch
  38-HR threshold both gate Phase 8a integration test
- Phase 0b → Phase 8b edge: post-handover Sovereign-Harbor swap is
  what makes "zero contabo dependency" DoD-met possible

WBS now reflects the cascade observed live, not the pre-Phase-8a model.

* feat(platform): add global.imageRegistry to bp-cilium/cert-manager/cert-manager-powerdns-webhook/sealed-secrets (PR 1/3, #560)

- bp-cilium 1.1.1→1.1.2: global.imageRegistry stub added; upstream cilium
  subchart does not expose a single registry knob — per-Sovereign overlays
  wire specific image.repository fields alongside this value.
- bp-cert-manager 1.1.1→1.1.2: global.imageRegistry stub added; upstream
  chart exposes per-component image.registry knobs documented in the comment.
- bp-cert-manager-powerdns-webhook 1.0.2→1.0.3: global.imageRegistry stub
  added + deployment.yaml templated to prefix the webhook image repository
  when the value is non-empty. Verified: helm template with
  --set global.imageRegistry=harbor.openova.io produces
  harbor.openova.io/zachomedia/cert-manager-webhook-pdns:<appVersion>.
- bp-sealed-secrets 1.1.1→1.1.2: global.imageRegistry stub added; upstream
  subchart exposes sealed-secrets.image.registry for overlay wiring.

All four charts render clean with default values (empty imageRegistry).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: alierenbaysal <alierenbaysal@openova.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:48:37 +04:00
e3mrah
5502d9aa48
feat(dns): cert-manager-dynadot-webhook for DNS-01 wildcard TLS (closes #159) (#291)
Activates the previously-templated `letsencrypt-dns01-prod` ClusterIssuer
in bp-cert-manager by shipping the missing piece — a Go binary that
satisfies cert-manager's external webhook contract
(`webhook.acme.cert-manager.io/v1alpha1`) against the Dynadot api3.json.

Architecture
============

* `core/pkg/dynadot-client/` — canonical Dynadot HTTP client (shared with
  pool-domain-manager and catalyst-dns). Encapsulates the api3.json
  transport, command builders, response decoding, and the safe
  read-modify-write semantics required to never accidentally wipe a
  zone (memory: feedback_dynadot_dns.md). Destructive `set_dns2`
  variant is unexported.
* `core/cmd/cert-manager-dynadot-webhook/` — the cert-manager webhook
  binary. Implements `Solver.Present` via the client's append-only
  `AddRecord` path and `Solver.CleanUp` via the read-modify-write
  `RemoveSubRecord` path. Domain allowlist (`DYNADOT_MANAGED_DOMAINS`)
  rejects challenges for unmanaged apexes BEFORE any Dynadot call.
* `platform/cert-manager-dynadot-webhook/` — Catalyst-authored Helm
  wrapper. Templates Deployment + Service + APIService + serving
  Certificate (CA chain via cert-manager Issuer self-signing) +
  RBAC + ServiceAccount. Mirrors the standard cert-manager external-
  webhook deployment shape.
* `platform/cert-manager/chart/` — flips `dns01.enabled: true` so the
  paired ClusterIssuer activates. The interim http01 issuer remains
  templated as the rollback path.

Test results
============

  core/pkg/dynadot-client          — 7 tests PASS  (race-clean)
  core/cmd/cert-manager-dynadot-... — 9 tests PASS  (race-clean)

Test coverage includes a Present/CleanUp round-trip against an
httptest fixture that models Dynadot's zone state, an explicit
unmanaged-domain rejection, a regression preserving a pre-existing
CNAME across the DNS-01 round-trip (the zone-wipe defence), and a
typed-error propagation test that surfaces `ErrInvalidToken` to
cert-manager so the controller will retry.

Helm template smoke render
==========================

`helm template` against the new chart with default values yields 12
resources / 424 lines (APIService, Certificate, ClusterRoleBinding,
Deployment, Issuer, Role, RoleBinding, Service, ServiceAccount). The
modified bp-cert-manager chart still renders both ClusterIssuers
(`letsencrypt-dns01-prod` + `letsencrypt-http01-prod`) with default
values; flipping `certManager.issuers.dns01.enabled=false` is the
clean rollback.

Smoke command (post-deploy)
===========================

  kubectl get apiservices.apiregistration.k8s.io \
    v1alpha1.acme.dynadot.openova.io
  # Issue a *.<sovereign>.<pool> wildcard cert and watch the
  # Order/Challenge progress through cert-manager.

CI
==

`.github/workflows/build-cert-manager-dynadot-webhook.yaml` mirrors the
pool-domain-manager-build pattern (cosign keyless signing, SBOM
attestation, GHCR push at `ghcr.io/openova-io/openova/cert-manager-
dynadot-webhook:<sha>`). Triggered by changes to either the binary or
the shared dynadot-client package.

Closes #159

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:37:47 +04:00
e3mrah
1f5c76def1
fix(platform): sync blueprint.yaml versions with Chart.yaml (#199)
* feat(ui): Playwright cosmetic + step-flow regression guards

15 regression guards in products/catalyst/bootstrap/ui/e2e/cosmetic-
guards.spec.ts that fail HARD when each user-flagged defect class
returns:

  1.  card height drift from canonical 108px
  2.  reserved right padding eating description width
  3.  logo tile drift from per-brand LOGO_SURFACE
  4.  invisible glyph (white-on-white) via luminance proxy
  5.  wizard step order Org/Topology/Provider/Credentials/Components/
      Domain/Review
  6.  legacy "Choose Your Stack" / "Always Included" tab labels
  7.  Domain step reachable before Components
  8.  CPX32 not the recommended Hetzner SKU
  9.  per-region SKU dropdown shows wrong provider catalog
  10. provision page is .html (static) not SPA route
  11. legacy bubble/edge DAG SVG markup on provision page
  12. admin sidebar drift from canonical core/console (w-56 + 7 labels)
  13. AppDetail uses tablist instead of sectioned layout
  14. job rows navigate to /job/<id> instead of expand-in-place
  15. Phase 0 banners (Hetzner infra / Cluster bootstrap) on AdminPage

Each test prints a failure message naming the canonical reference,
the source-of-truth file, and the data-testid PR needed (if any) so
the implementing agent has a precise target. No .skip() — per
INVIOLABLE-PRINCIPLES #2, missing components fail loud.

CI: .github/workflows/cosmetic-guards.yaml runs the suite on every
PR that touches products/catalyst/bootstrap/ui/** or core/console/**.

Docs: docs/UI-REGRESSION-GUARDS.md maps each test to the user's
original complaint, the canonical reference, and the green/red
semantics (5 tests intentionally RED on main today — they stay red
until the companion-agent's UI work lands).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(platform): sync blueprint.yaml versions with Chart.yaml so manifest-validation passes

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 22:07:55 +04:00
hatiyildiz
b1638f51ea fix(bp-* tests): skip helm dep build when charts/ already vendored
Earlier rerun failure on the CI workflow (bp-cert-manager 25120060270):

  Error: no repository definition for https://charts.jetstack.io.
  Please add the missing repos via 'helm repo add'

Root cause: blueprint-release.yaml's earlier `helm dependency build`
step (line 181) successfully resolves the upstream chart and populates
chart/charts/ — but it does NOT `helm repo add` the upstream repo
first. Helm 3.20's `helm dep build` succeeds on the first call by
falling back to direct-URL fetch from Chart.yaml `dependencies[].repository`.
A SECOND `helm dep build` (run by the test script) hits a different
code path that requires the repo to be in the helm repo cache.

Fix: tests/observability-toggle.sh now skips `helm dep build` when
chart/charts/ is already populated (which is always the case in CI
since the workflow's own `helm dependency build` step ran first). Local
dev runs from a fresh checkout still resolve subcharts.

Refs #182

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:12:21 +02:00
hatiyildiz
d34facc040 fix(bp-*): observability toggles default false — break circular CRD dependency
bp-cilium@1.1.0 install fails on every fresh Sovereign with:

  no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
  — ensure CRDs are installed first

Cascades to all 10 other bp-* HelmReleases ("dep is not ready") since
bp-cilium is the root of the bootstrap dep graph. Verified live on
omantel.omani.works 2026-04-29 (issue #182).

Root cause: platform/cilium/chart/values.yaml and
platform/cert-manager/chart/values.yaml hardcoded
`serviceMonitor.enabled: true`. The monitoring.coreos.com/v1 CRDs ship
with kube-prometheus-stack — an Application-tier Blueprint that itself
depends on the bootstrap-kit. Hardcoding `true` creates a circular CRD
ordering: bp-cilium wants the CRD bp-kube-prometheus-stack provides, but
bp-kube-prometheus-stack cannot install before bp-cilium.

The `trustCRDsExist=true` mitigation only suppresses Helm's render-time
gate; the apiserver still rejects the resource at install-time.

Violates INVIOLABLE-PRINCIPLES.md #4 (never hardcode): observability
toggles MUST be operator-tunable, not chart-level constants assuming an
observability tier exists.

This commit:

A. Defaults every observability toggle false in the affected wrappers:
   - platform/cilium/chart/values.yaml:
     cilium.prometheus.enabled: false
     cilium.prometheus.serviceMonitor.enabled: false
     (trustCRDsExist removed — no longer relevant)
   - platform/cert-manager/chart/values.yaml:
     cert-manager.prometheus.enabled: false
     cert-manager.prometheus.servicemonitor.enabled: false
   - platform/crossplane/chart/values.yaml:
     crossplane.metrics.enabled: false
     (uniformity rule — does not break install but holds the invariant)

B. Bumps affected wrapper charts 1.1.0 → 1.1.1:
   - bp-cilium, bp-cert-manager, bp-crossplane (leaves)
   - bp-catalyst-platform (umbrella; deps repinned to 1.1.1 for the 3)

C. Updates clusters/_template/bootstrap-kit/* and
   clusters/omantel.omani.works/bootstrap-kit/* HelmRelease versions to
   1.1.1 so the live Sovereign picks up the fix on Flux reconcile.

D. Adds platform/<name>/chart/tests/observability-toggle.sh under each
   affected chart. Each script asserts:
     - default render produces zero monitoring.coreos.com refs
     - opt-in render with --set <toggle>=true succeeds and produces a
       ServiceMonitor (proves the toggle is wired)
     - explicit-off render succeeds and produces zero refs
   Wired into .github/workflows/blueprint-release.yaml via a new
   "Run chart integration tests" step that executes every chart/tests/
   *.sh on every publish — a regression that re-introduces a hardcoded
   `true` fails the publish job before the OCI artifact is pushed.

E. Documents the rule in docs/BLUEPRINT-AUTHORING.md §11.2
   "Observability toggles must default false". References Principle #4
   and provides the canonical pattern (default off in wrapper values,
   opt-in via per-cluster overlay at clusters/<sovereign>/...).

Per-chart audit table (which toggle was hardcoded → new default):

| Chart            | Toggle                                                   | Was  | Now   |
|------------------|----------------------------------------------------------|------|-------|
| bp-cilium        | cilium.prometheus.enabled                                | true | false |
| bp-cilium        | cilium.prometheus.serviceMonitor.enabled                 | true | false |
| bp-cert-manager  | cert-manager.prometheus.enabled                          | true | false |
| bp-cert-manager  | cert-manager.prometheus.servicemonitor.enabled           | true | false |
| bp-crossplane    | crossplane.metrics.enabled                               | true | false |
| bp-flux          | (no observability hardcodes)                             | n/a  | n/a   |
| bp-sealed-secrets| (no observability hardcodes)                             | n/a  | n/a   |
| bp-spire         | (no observability hardcodes)                             | n/a  | n/a   |
| bp-nats-jetstream| (no observability hardcodes)                             | n/a  | n/a   |
| bp-openbao       | (no observability hardcodes)                             | n/a  | n/a   |
| bp-keycloak      | (no observability hardcodes)                             | n/a  | n/a   |
| bp-gitea         | (no observability hardcodes)                             | n/a  | n/a   |
| bp-powerdns      | (no observability hardcodes)                             | n/a  | n/a   |
| bp-catalyst-platform | (umbrella, no values overlay)                        | n/a  | n/a   |

Local gates green:
  helm dep build      ✓ all 3 affected charts
  helm lint           ✓ all 3
  helm template       ✓ all 3 — 0 monitoring.coreos.com refs in default
  tests/observability-toggle.sh  ✓ all 9 sub-cases pass

Closes the install path for bp-cilium 1.1.1 on a fresh Sovereign;
unblocks the full bp-* dep graph.

Refs: https://github.com/openova-io/openova/issues/182

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:08:09 +02:00
hatiyildiz
43aff20254 feat(bp-*): convert all 11 bootstrap-kit charts to umbrella charts depending on upstream
Each platform/<name>/chart/Chart.yaml now declares the canonical upstream
chart as a dependencies: entry. helm dependency build pulls the upstream
payload into the OCI artifact at publish time, so Flux helm install of
bp-<name>:1.1.0 actually installs the upstream Helm release alongside the
Catalyst-curated overlays (NetworkPolicy, ServiceMonitor, ClusterIssuer,
ExternalSecret) under templates/.

Pinned upstream chart versions per platform/<name>/blueprint.yaml:
- cilium                 1.16.5  https://helm.cilium.io
- cert-manager           v1.16.2 https://charts.jetstack.io
- flux                   2.4.0   https://fluxcd-community.github.io/helm-charts
- crossplane             1.17.x  https://charts.crossplane.io/stable
- sealed-secrets         2.16.x  https://bitnami-labs.github.io/sealed-secrets
- spire                  ...     https://spiffe.github.io/helm-charts-hardened
- nats-jetstream         ...     https://nats-io.github.io/k8s/helm/charts
- openbao                ...     https://openbao.github.io/openbao-helm
- keycloak               ...     https://charts.bitnami.com/bitnami
- gitea                  ...     https://dl.gitea.com/charts
- catalyst-platform      umbrella over the 10 leaf bp-* charts via
                         helm dependency

values.yaml in each chart adopts the umbrella convention: catalystBlueprint
metadata block (provenance + version) at top level, upstream subchart
values namespaced under the dependency name.

cert-manager specifically: clusterissuer-letsencrypt-dns01.yaml gets the
helm.sh/hook: post-install,post-upgrade annotation so it applies AFTER
cert-manager controllers are running and CRDs registered (the previous
hollow-chart shape ran the ClusterIssuer at install time when CRDs
didn't exist yet, which was the omantel cluster's exact failure mode).

Wrapper chart version bumped 1.0.0 → 1.1.0 across the board (umbrella
conversion is a meaningful structural revision). Cluster manifests in
clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/
bootstrap-kit/ updated to reference 1.1.0.

The blueprint-release.yaml workflow's helm package step needs an
explicit helm dependency build before push so the upstream subchart
bytes ship inside the OCI artifact. That CI change is a follow-up
commit on this same branch (separate file scope).
2026-04-29 17:21:36 +02:00
hatiyildiz
97e942e0bc feat(cert-manager): #113 — Lets Encrypt DNS-01 + HTTP-01 ClusterIssuers
Adds platform/cert-manager/chart/templates/clusterissuer-letsencrypt-dns01.yaml
with two ClusterIssuers, both Catalyst-curated, rendered conditionally
from values.yaml:

- letsencrypt-dns01-prod (TARGET STATE, default disabled) — ACME DNS-01
  via the cert-manager webhook solver, pointing at a future
  `cert-manager-dynadot-webhook` Catalyst binary that will implement the
  webhook.acme.cert-manager.io/v1alpha1 contract against the existing
  internal/dynadot/ package. Shipping the issuer template ahead of the
  webhook so cluster overlays only need a values flip + secret ref —
  no template edits — once the webhook lands.

- letsencrypt-http01-prod (INTERIM, default enabled) — ACME HTTP-01
  via the cilium ingress class. Issues certs for the explicit hostnames
  (console, gitea, harbor, admin, api) but NOT for wildcards; the
  canonical *.<sub>.<domain> record needs DNS-01.

Header comment explains the gap: the Catalyst external-dns webhook
(products/catalyst/bootstrap/api/cmd/external-dns-dynadot-webhook/)
implements a DIFFERENT RPC contract (records.list/add/delete) than what
cert-manager DNS-01 expects (Present/CleanUp on ChallengeRequest CRD),
so it cannot be reused; a dedicated cmd/cert-manager-dynadot-webhook/
must be built. Operator runbook for cutover is in the file header.

values.yaml gains a `certManager.issuers.{email,acmeServer,dns01,http01}`
section so all knobs are runtime-configurable per
docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode); cluster overlays in
clusters/<sovereign>/ can flip dns01.enabled via the bp-catalyst-platform
umbrella's values without rebuilding the Blueprint OCI artifact.

blueprint.yaml gains a spec.outputs section advertising:
- issuerName: letsencrypt-http01-prod (default)
- wildcardIssuerName: letsencrypt-dns01-prod (target state)
- issuerKind: ClusterIssuer

so dependent Blueprints (cilium-gateway, harbor, gitea) can consume the
issuer name without hardcoding it.

Closes #113.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 19:44:56 +02:00
hatiyildiz
62d9c7d936 fix(charts): drop dependencies block — wrappers carry values overlay only
The first 2 blueprint-release CI runs failed on `helm package` with containerd permission errors because the wrapper Chart.yaml's `dependencies:` block triggered helm to pull the upstream charts via OCI/containerd at package time, which the GitHub Actions runner blocks.

Architectural fix: each Catalyst Blueprint wrapper carries the values overlay + metadata only. The bootstrap installer reads the upstream chart reference from the wrapper's values.yaml `catalystBlueprint.upstream.{chart,version,repo}` metadata block, points `helm install` at the upstream chart's repo, and overlays our values.

This keeps:
- blueprint-release CI lightweight (no upstream pulls during package; helm package now works without containerd)
- the "bp-<name> wrapper does NOT drift from upstream" property (we ship the overlay, not a fork)
- the single Blueprint contract from BLUEPRINT-AUTHORING §1 (a wrapper is still a Catalyst-curated Helm chart published as bp-<name>:<semver>)

Changes:
- 11 platform/<name>/chart/Chart.yaml: removed dependencies block. Each is now a plain Helm chart with no remote pulls during package.
- 11 platform/<name>/chart/values.yaml: prepended catalystBlueprint.upstream.{chart,version,repo} metadata block at the top. Bootstrap installer parses it to know which upstream chart to install with these values.
- products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go: installCilium now does `helm repo add cilium https://helm.cilium.io --force-update` then `helm install cilium cilium/cilium --version 1.16.5 --values -` (the cilium/cilium upstream chart, with our overlay values piped from values.yaml). Same pattern needs propagating to the other 10 install functions in a follow-up.

After this commit, blueprint-release CI should green-build all 11 wrappers (helm package now works without containerd access since there's nothing to pull). The bootstrap installer's actual `helm install` calls in production reach upstream chart repos via the runtime k3s cluster's pod network, which has full network access.
2026-04-28 12:57:29 +02:00
hatiyildiz
8c0f76640c feat(charts): G2 wrapper Helm charts for 11 bootstrap-kit components + blueprint-release CI
Per docs/PROVISIONING-PLAN.md and tickets [F] chart. Adds Catalyst-curated wrapper Helm charts at platform/<name>/chart/ for every component the bootstrap-kit installer (introduced in commit 07b4bcf) needs. Each chart is the canonical bp-<name> source per BLUEPRINT-AUTHORING.md §1's source-location rule.

11 charts created with Chart.yaml + values.yaml + blueprint.yaml each:

Network + GitOps:
- platform/cilium/chart — wraps cilium 1.16.5; kubeProxyReplacement, WireGuard mTLS, Hubble, Gateway API
- platform/flux/chart — wraps flux 2.4.0
- platform/crossplane/chart — wraps crossplane 1.18.0 + provider-hcloud manifest

Security:
- platform/cert-manager/chart — wraps cert-manager 1.16.2 with CRDs+ServiceMonitor
- platform/sealed-secrets/chart — wraps sealed-secrets 2.16.1 (transient bootstrap-only)
- platform/spire/chart — wraps spiffe/spire 1.10.4 (5-min SVID rotation)

Catalyst control-plane services:
- platform/nats-jetstream/chart — wraps nats 2.10.22 (3-node cluster, JetStream + KV)
- platform/openbao/chart — wraps openbao 2.1.0 (3-node Raft, region-local per SECURITY §5)
- platform/keycloak/chart — wraps keycloak 25.0.6 (Bitnami flavor, edge proxy mode)
- platform/gitea/chart — wraps gitea 10.5.0 (CNPG Postgres backend, no chart-bundled valkey/redis since Catalyst control plane uses JetStream)

New platform/ folders (added per AUDIT-PROCEDURE component-count anchor — was 53, now 55):
- platform/spire/README.md — workload identity Catalyst control plane component
- platform/nats-jetstream/README.md — control-plane event spine
- platform/sealed-secrets/README.md — transient bootstrap-only

Each blueprint.yaml declares:
- catalyst.openova.io/v1alpha1 Blueprint kind (canonical CRD per BLUEPRINT-AUTHORING §3)
- visibility: unlisted (mandatory infra, auto-installed by bootstrap kit, not a marketplace card)
- manifests.chart: ./chart pointer
- depends: [] (foundational components have no Blueprint dependencies; control-plane services depend on each other implicitly via bootstrap order, not via Blueprint depends)

.github/workflows/blueprint-release.yaml:
- New CI workflow per BLUEPRINT-AUTHORING §11 (path-matrix per Blueprint folder)
- Triggers on push to main touching platform/*/chart/** or products/*/chart/**
- detect job: emits matrix of changed Blueprint folders via git diff
- build job (per chart): helm dependency build → helm package → helm push to GHCR → cosign keyless sign (GitHub OIDC) → Syft SBOM attestation
- Output: ghcr.io/openova-io/bp-<name>:<semver> with SLSA-3-style supply-chain provenance

Closes [F] tickets: 11 G2 charts (cilium, cert-manager, flux, crossplane, sealed-secrets, spire, nats-jetstream, openbao, keycloak, gitea, plus the umbrella products/catalyst/chart already exists from Pass 105). blueprint.yaml CRDs added across 11 entries. CI fan-out workflow live.

After this commit lands, the bootstrap-kit installer in commit 07b4bcf has real OCI artifacts to install. The first push to main will trigger 10 build matrix jobs (cilium was created in a separate commit earlier in this session) which produce 10 cosigned bp-<name>:<semver> artifacts on GHCR.

Component-count anchor update follows: 53 → 55 (added spire + nats-jetstream + sealed-secrets — but sealed-secrets was already conceptually counted under "supporting services"). Per AUDIT-PROCEDURE the count needs updating in CLAUDE.md, BUSINESS-STRATEGY, TECHNOLOGY-FORECAST L11. Tracked as separate ticket [K] docs.
2026-04-28 12:51:06 +02:00
hatiyildiz
14ed84de41 docs(pass-8): role-in-Catalyst banners + dead-link fix in component READMEs
Pass 8 — line-by-line read of platform/cnpg, platform/strimzi,
platform/k8gb, platform/keycloak, platform/cert-manager, platform/cilium.

CNPG and Strimzi: read in full and confirmed clean — they correctly
position themselves as Application Blueprints and don't drift from
the canonical model. CNPG's `<org>-postgres-dr` cluster name
(Application-tier database role) is acceptable per NAMING-CONVENTION
§1.3 (which only forbids primary/dr in K8s host-cluster names, not
in Application-internal CRD names).

Four READMEs updated:

k8gb:
- Header reframed: per-host-cluster infrastructure pointer to
  PLATFORM-TECH-STACK §3.1 and SRE §2.4 split-brain protection.
- Removed dead link to ../failover-controller/docs/ADR-FAILOVER-
  CONTROLLER.md (the failover-controller folder has no docs/);
  replaced with link to that component's README + SRE §2.4.

keycloak:
- Header reframed from "FAPI Authorization Server for Open Banking"
  (narrow) to "User identity for Catalyst Sovereigns" (broad).
  Keycloak handles ALL user identity in Catalyst, not just FAPI.
- Added per-Org / per-Sovereign topology callout matching SECURITY
  §6. Clarified that "Multi-tenant TPP" refers to PSD2 Third Party
  Providers, not Catalyst's Organization-level multi-tenancy.
- FAPI features kept since Keycloak still serves Fingate as the
  FAPI Authorization Server.

cert-manager:
- Header reframed as per-host-cluster infrastructure with pointer
  to PLATFORM-TECH-STACK §3.3.

cilium:
- Header reframed as per-host-cluster infrastructure with pointer
  to PLATFORM-TECH-STACK §3.1, including the install-first note
  (CNI must come before any other workload during Phase 0).

VALIDATION-LOG: Pass 8 entry added.

Refs #37
2026-04-27 21:39:03 +02:00
talent-mesh
c9d04a53b4 refactor: flatten platform/ structure (41 components)
Remove hierarchical grouping (networking/, security/, etc.) and use flat
structure for all 41 platform components.

Changes:
- All components now directly under platform/ (no subfolders)
- AI Hub components moved from meta-platforms/ai-hub/components/ to platform/
- Open Banking components (lago, openmeter) moved to platform/
- meta-platforms/ now only contains README files that reference platform/
- Open Banking custom services remain in meta-platforms/open-banking/services/

Structure:
- platform/ (41 components, flat)
- meta-platforms/ai-hub/ (README only, references platform/)
- meta-platforms/open-banking/ (README + 6 custom services)

All documentation links updated.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 15:19:48 +00:00