# openova/products/catalyst/chart/Chart.yaml


apiVersion: v2
name: bp-catalyst-platform
# 1.4.22 (#915 SME blockers — issues #934/#940/#941/#942/#943/#944): six
# coupled chart + orchestrator fixes that unblock alice signup gates 2-6
# on a freshly franchised Sovereign. C5-final got Gate 1 GREEN on
# otech113 (2026-05-05) but every downstream gate failed because the SME
# bundle hardcoded contabo-only assumptions:
#
# - #934: auth + notification SME services pinned SMTP env to bytes
# the operator placed in `sme-secrets` via .Values.smeSecrets.smtp.*.
# On a Sovereign nothing populated those values — auth.yaml's POST
# /auth/send-pin returned `failed to send email` and gate 2 (PIN
# delivery) timed out. Fix: sme-secrets.yaml now reads SMTP_*
# from `catalyst-system/sovereign-smtp-credentials` (the same
# A5-seeded source #883/#905 the chart 1.4.20 catalyst-openova-kc-
# credentials Secret already uses) with source-wins precedence.
# Empty source falls back to legacy chart-level defaults so
# contabo paths stay clean. Both canonical (smtp-host/port/from/
# user/pass) AND legacy (host/port/from/user/password) source-Secret
# key shapes are accepted.
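#
#   The source-wins shape, in outline (a sketch — variable names and
#   the exact template flow are illustrative, not the verbatim
#   sme-secrets.yaml code):
#
#     {{- $src := lookup "v1" "Secret" "catalyst-system" "sovereign-smtp-credentials" }}
#     {{- $host := .Values.smeSecrets.smtp.host }}
#     {{- if $src }}
#     {{-   with coalesce (index $src.data "smtp-host") (index $src.data "host") }}
#     {{-     $host = . | b64dec }}{{/* canonical key first, then legacy, then chart default */}}
#     {{-   end }}
#     {{- end }}
#
#   An empty or absent source Secret leaves $host at the chart-level
#   default, so contabo paths are untouched.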
#
# - #940: Sovereign provisioning service shipped with GITHUB_TOKEN
# placeholder bytes AND with GITHUB_OWNER + GITHUB_REPO hardcoded
# to upstream `openova-io/openova` so per-tenant commits attempted
# authenticated POST against api.github.com — failed every time
# with 401. Fix: chart values
# .Values.smeServices.provisioning.{githubToken,git.{apiURL,owner,
# repo,branch}} make every GitHub-API coordinate operator-overridable
# with topology-aware defaults (Sovereign ⇒ in-cluster Gitea REST
# API + `openova` org; contabo ⇒ api.github.com + `openova-io` org).
# Provisioning binary's startup gate validates the GITHUB_TOKEN
# does NOT contain placeholder substrings (`<placeholder>`,
# `PLACEHOLDER`, `REPLACE_ME`, ...) and crashes the Pod into
# CrashLoopBackOff if it does — the operator sees the misconfig
# immediately instead of after alice signups have failed silently
# in Pod logs.
#
# - #941: marketplace UI drew "COMING SOON" overlay on every AI +
# Communication card on a fresh Sovereign because catalog handler's
# migrateAppDeployable() map at core/services/catalog/handlers/
# seed.go omitted `openclaw` and `stalwart-mail` even though both
# blueprints (bp-openclaw, bp-stalwart-{sovereign,tenant}) are
# visibility=listed in the embedded blueprints.json. C5-final hit
# "27 apps COMING SOON" because of this — gates 4 (LLM) and 5
# (mail) blocked before alice could click Install. Fix: add both
# slugs to the deployable map.
#
# - #942: configmap.yaml hardcoded REDPANDA_BROKERS to
# `redpanda.talentmesh.svc.cluster.local:9092`. talentmesh ns does
# not exist on a Sovereign and the OpenOva architecture uses NATS
# JetStream as the only local bus per ADR-0001 (slot 09 ships
# bp-nats-jetstream into namespace `nats-jetstream`). Every SME
# service crashlooped at startup with `lookup ...: no such host`,
# blocking gate 3 (tenant ready). Fix: data-driven via
# .Values.smeServices.eventBus.brokers with a topology-aware default
# — Sovereign ⇒ NATS JetStream Service, contabo ⇒ legacy Redpanda
# Service. The ConfigMap key name stays REDPANDA_BROKERS for
# back-compat with existing SME service Go env wiring.
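#
#   Illustrative shape of the topology-aware default (values plus the
#   ConfigMap line; the NATS Service name and port here are
#   assumptions, not chart facts):
#
#     # values.yaml
#     smeServices:
#       eventBus:
#         brokers: ""   # empty = choose by topology
#
#     # configmap.yaml
#     REDPANDA_BROKERS: {{ .Values.smeServices.eventBus.brokers | default
#       (ternary "nats.nats-jetstream.svc.cluster.local:4222"
#                "redpanda.talentmesh.svc.cluster.local:9092"
#                (ne .Values.global.sovereignFQDN "")) | quote }}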
#
# - #943: bp-newapi chart silently skipped Deployment render on a
# fresh Sovereign because the Pod gate REQUIRED operator-supplied
# `database.existingSecret` AND `credentials.existingSecret`. The
# bootstrap-kit slot 80 overlay supplied neither, so NewAPI never
# came up and gate 5 (LLM) timed out. Fix: bp-newapi 1.4.0 auto-
# provisions a CNPG-backed Postgres Cluster + a chart-emitted DSN
# Secret + a Helm-lookup-persistent SESSION_SECRET/CRYPTO_SECRET
# Secret when the operator hasn't overridden either. The
# deployment.yaml gate now passes by default. Capabilities-gated
# on postgresql.cnpg.io/v1 so a cold install before bp-cnpg is
# Ready surfaces as "no Cluster yet" rather than an install error.
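#
#   The capabilities gate is the standard Helm idiom (sketch):
#
#     {{- if .Capabilities.APIVersions.Has "postgresql.cnpg.io/v1" }}
#     apiVersion: postgresql.cnpg.io/v1
#     kind: Cluster
#     ...
#     {{- end }}
#
#   so a render before the CNPG CRD is registered emits nothing
#   instead of failing the install.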
#
# - #944 (CRITICAL — cross-cluster pollution): Sovereign provisioning
# service had GIT_BASE_PATH hardcoded to `clusters/contabo-mkt/
# tenants` so every alice tenant overlay landed in the upstream
# openova/openova repo's contabo overlay, which contabo Flux would
# then install on the contabo cluster. C5-final caught + reverted
# the alice2 incident at commit 5715db04 (2026-05-05). Fix:
# provisioning.yaml templates GIT_BASE_PATH from
# .Values.smeServices.provisioning.gitBasePath with a topology-
# aware default `clusters/<sovereignFQDN>/sme-tenants` on
# Sovereigns. Provisioning binary's startup AND every commit code
# path validate the path begins with `clusters/<self-FQDN>/` via
# a new shared `core/services/provisioning/gitguard` package —
# refusing to commit to any other cluster's tree. Defence in depth
# so a runtime env mutation (kubectl exec, ConfigMap update without
# Pod restart, hostile sidecar) cannot bypass the check.
#
# Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
# 13-bp-catalyst-platform.yaml bumps from 1.4.21 → 1.4.22.
# Coupled bp-newapi bump 1.3.0 → 1.4.0 for the #943 CNPG auto-
# provisioning. 2026-05-05.
#
# 1.4.20 (#924): Phase-2 SMTP source-wins extended to non-secret fields
# (smtp-host, smtp-port, smtp-from) AND to canonical key shape `smtp-user`/
# `smtp-pass` in addition to legacy `user`/`password`. Pairs with the
# new bp-stalwart-sovereign chart whose post-install Job materialises
# `catalyst-system/sovereign-smtp-credentials` carrying Sovereign-local
# infrastructure addresses (`mail.<sovereignFQDN>` / `noreply@<sovereignFQDN>`).
# Once bp-stalwart-sovereign installs (bootstrap-kit slot 95), the
# next Flux reconcile of THIS umbrella picks up the Sovereign-local
# coordinates and Console PIN delivery flips from mothership relay
# (`mail.openova.io`, Phase-1 #883) to Sovereign-local relay without
# operator action. Pre-#924 catalyst-system/sovereign-smtp-credentials
# carried only credentials and the chart fell back to
# .Values.sovereign.smtp.* defaults — that fallback path remains as
# the Sovereign-without-bp-stalwart-sovereign back-compat seam.
# 1.4.24 (#934 follow-up): smeSecrets.smtp.{host,port,from,user}
# defaults flipped from "" to the mothership relay
# (mail.openova.io:587, noreply@openova.io). On otech113 the
# `catalyst-system/sovereign-smtp-credentials` Secret seeded by A5's
# provisioner only carried smtp-user + smtp-pass (host/port/from
# missing in the seed) — sme-secrets source-wins lookup correctly
# kept SMTP_HOST="" because the source field was unset, but the
# auth Pod then failed `failed to send email` for gate 2 (PIN
# delivery). Defaults match `.Values.sovereign.smtp.*` which is the
# proven catalyst-api PIN delivery path. When A5 ships the missing
# host/port/from coverage these defaults become unused (source wins).
# 2026-05-05.
# 1.4.26 (#957 follow-up): catalyst-api-cutover-driver ClusterRole
# gains a `create tokenreviews.authentication.k8s.io` rule so that
# HandleCutoverInternalTrigger can validate the auto-trigger Job's
# projected SA token via the apiserver's TokenReview API. Without
# this rule the endpoint returns 502 "token-review-failed" on every
# call; PR #947 wired the endpoint but not its RBAC. Caught live on
# otech113 2026-05-05 — chart 0.1.18 fixed the readiness-probe loop
# but every trigger immediately got 502 in <10ms (synchronous
# apiserver permission rejection). 2026-05-05.
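# The added rule is the standard TokenReview RBAC shape (sketch):
#
#   - apiGroups: ["authentication.k8s.io"]
#     resources: ["tokenreviews"]
#     verbs: ["create"]
#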
version: 1.4.34
appVersion: 1.4.34
description: |
Catalyst Platform — the unified Catalyst control plane umbrella chart for Catalyst-Zero.
Composes the catalyst-{ui,api}, console, admin, marketplace UI modules and the marketplace-api backend.
Deployed via Flux on Catalyst-Zero (Contabo k3s) and on every franchised Sovereign provisioned by Catalyst-Zero.
Per docs/PROVISIONING-PLAN.md — this is the canonical bp-catalyst-platform Helm chart.
As of 1.1.9 this umbrella contains ONLY the Catalyst-Zero control-plane
workloads (catalyst-ui, catalyst-api, ProvisioningState CRD, Sovereign
HTTPRoute). Foundation Blueprints (cilium, cert-manager, flux,
crossplane, sealed-secrets, spire, nats-jetstream, openbao, keycloak,
gitea) are installed independently by the bootstrap-kit at slots
01..10 (see clusters/_template/bootstrap-kit/). Each lands in its own
namespace (flux-system, cert-manager, kube-system, etc.) under its own
Flux HelmRelease — install order owned by Flux dependsOn rather than
this umbrella's Helm dependency graph.
Bumped to 1.1.1 in lockstep with bp-external-dns 1.1.0 to reflect the
dependency removal. Bumped to 1.1.2 to pull in bp-flux:1.1.2 — the
catastrophic-double-install fix (omantel.omani.works incident,
2026-04-29). See docs/RUNBOOK-PROVISIONING.md §"bp-flux double-install".
Bumped to 1.1.3 to drop three stray kustomize index files
(templates/kustomization.yaml, templates/marketplace-api/kustomization.yaml,
templates/sme-services/kustomization.yaml) that Helm was rendering as
resources with empty metadata.name — Helm post-render rejected the
install on otech.omani.works, 2026-04-30.
Bumped to 1.1.4 to give the bp-keycloak/bp-gitea embedded postgresql
subcharts distinct fullnameOverride values (keycloak-postgresql /
gitea-postgresql). Both bitnami postgresql subcharts default to
`<release>-postgresql`, so they collided as
`catalyst-platform-postgresql.catalyst-system` and Helm post-render
refused the second occurrence — install_failed on otech.omani.works,
2026-04-30 (issue #252).
Bumped to 1.1.5 to remove three legacy Traefik-era ingress template
files (templates/ingress.yaml, templates/sme-services/ingress.yaml,
templates/marketplace-api/ingress.yaml). They emitted
`traefik.io/v1alpha1 Middleware` (strip-sovereign, strip-nova,
root-to-nova) plus Ingress objects hardcoded to `console.openova.io` /
`admin.openova.io` / `marketplace.openova.io` / `openova.io` with
`ingressClassName: traefik`. Sovereigns use Cilium native gateway
(per docs/ARCHITECTURE.md §11) — Traefik CRDs are not installed and
never will be — and per-Sovereign Catalyst hostnames are
`console.${SOVEREIGN_FQDN}` / `admin.${SOVEREIGN_FQDN}` etc., not the
contabo-mkt openova.io domain. Helm install was failing on otech with
`no matches for kind "Middleware" in version "traefik.io/v1alpha1"`.
Per-Sovereign HTTPRoute resources for the Catalyst console/admin/
marketplace will be authored separately (out of scope here) — issue
#279, 2026-04-30.
Bumped to 1.1.6 to delete the entire `templates/sme-services/`
directory (admin/auth/billing/catalog/configmap/console/domain/
gateway/marketplace/notification/provisioning/serviceaccounts/tenant
— 13 manifests, ~36 resources). Every one of them was hardcoded to
`namespace: sme` and to `sme.openova.io` URLs. The SME microservice
mesh is a contabo-mkt-only product (the OpenOva.io marketplace) that
was dragged into the Catalyst umbrella during Group C cutover; it
has no role on franchised Sovereigns. Sovereigns don't run SME and
don't have an `sme` namespace, so the Helm install was failing with
`failed to create resource: namespaces "sme" not found` on
otech.omani.works. Resolution: SME services are out of scope for the
bp-catalyst-platform Blueprint — they will be re-homed in a
contabo-mkt-only Kustomization (or a separate `bp-sme` Blueprint)
if/when SME is re-deployed. Issue #281, 2026-04-30.
Bumped to 1.1.9 to remove the 10 foundation-Blueprint subchart
dependencies (bp-cilium, bp-cert-manager, bp-flux, bp-crossplane,
bp-sealed-secrets, bp-spire, bp-nats-jetstream, bp-openbao,
bp-keycloak, bp-gitea). When this umbrella reconciled with
`targetNamespace: catalyst-system`, Helm rendered every subchart's
`flux2` / `cilium` / etc. controllers into catalyst-system —
duplicating the foundation stack the bootstrap-kit had already
installed at slots 01..10 in their own canonical namespaces
(flux-system, cert-manager, kube-system, ...). On Phase-8a-preflight
otech16 (2026-05-02) this manifested as a duplicate source-controller
in catalyst-system NS that other HRs (bp-cnpg, bp-spire,
bp-crossplane-claims) intermittently routed to via service discovery,
failing chart pulls with "i/o timeout" against
`source-controller.catalyst-system.svc.cluster.local`. Resolution:
the umbrella ships ONLY Catalyst-Zero control-plane workloads; the
foundation layer is owned end-to-end by the bootstrap-kit. Issue
#510, 2026-05-02.
Bumped to 1.1.12 to add optional=true to the DYNADOT_API_KEY and
DYNADOT_API_SECRET secretKeyRef entries in the catalyst-api Deployment.
Sovereign clusters don't hold Dynadot credentials (their tenant DNS
is served by the Sovereign's own PowerDNS instance); without
optional=true Kubernetes refuses to start the pod when the
dynadot-api-credentials Secret is absent, crashlooping catalyst-api
on every new Sovereign. The fix mirrors the existing optional=true on
DYNADOT_MANAGED_DOMAINS and DYNADOT_DOMAIN. Issue #547, 2026-05-02.
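The optional secretKeyRef shape, for reference (an env-entry sketch;
the Secret key name is assumed for illustration, not the verbatim
Deployment):
  - name: DYNADOT_API_KEY
    valueFrom:
      secretKeyRef:
        name: dynadot-api-credentials
        key: api-key          # key name assumed
        optional: true        # absent Secret no longer blocks Pod start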
Bumped to 1.1.13 to rename all imagePullSecrets references from
ghcr-pull-secret to ghcr-pull (canonical name written by cloud-init at
/var/lib/catalyst/ghcr-pull-secret.yaml). The wrong name was causing
ImagePullBackOff on catalyst-api, catalyst-ui, marketplace-api and all
11 SME service deployments. Paired with new bp-reflector (slot 05a)
that auto-mirrors flux-system/ghcr-pull to every namespace via
reflector.v1.k8s.emberstack.com annotations. Issue #543, 2026-05-02.
Bumped to 1.1.14 to add global.imageRegistry value and template all
Catalyst-authored image refs (catalyst-api, catalyst-ui, marketplace-api,
console, and all 10 SME service deployments). Post-handover per-Sovereign
overlays set global.imageRegistry to the local Harbor mirror. Issue #560.
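Registry templating follows the usual prefix pattern (sketch; the
default registry and image path shown are assumptions):
  image: "{{ default "ghcr.io" .Values.global.imageRegistry }}/openova/catalyst-api:{{ .Values.catalystApi.tag }}"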
Bumped to 1.1.15 to rebuild catalyst-ui with Vite base: '/' (was
/sovereign/). The previous base caused blank pages on Sovereign clusters:
the browser requested /sovereign/assets/index-*.js but nginx served the
dist at / so every asset returned 404. On contabo
(console.openova.io/sovereign/*) Traefik's strip-sovereign Middleware strips
the prefix before reaching nginx — both environments now serve assets at
/assets/* as expected. Also fixes router.tsx basepath from '/sovereign' to
'/' so TanStack Router Link/navigate calls emit correct paths. Issue #596,
2026-05-02.
Bumped to 1.1.16 to bundle catalyst-ui image tag 59fb2b7 (Vite base:/
fix from #596) into the OCI chart values.yaml. Chart 1.1.15 was
published at commit 32c5e433 before the deploy job updated values.yaml
SHA tags to 59fb2b7, so Sovereigns pulling 1.1.15 got the old
ccc3898 image. 1.1.16 ships with catalystUi.tag + catalystApi.tag =
59fb2b7 baked in. Issue #596, 2026-05-02.
Bumped to 1.2.0 — feature add: GET /auth/handover seamless single-identity
flow (issue #606, Phase-8b Agent C). Adds:
- CATALYST_KC_ADDR / CATALYST_KC_SA_CLIENT_ID / CATALYST_KC_SA_CLIENT_SECRET env
- CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH env + Secret volume for handover JWK
Sovereign-side catalyst-api pods receive the operator's browser redirect from
Catalyst-Zero, validate the one-time RS256 JWT, create/update the operator in
Keycloak (sovereign realm), exchange for a user session via token-exchange,
set HttpOnly session cookies, and redirect to /console/dashboard. 2026-05-02.
Bumped to 1.2.1 — Option-B pure passwordless magic-link (issue #614,
Phase-8b). Replaces Agent A's Keycloak execute-actions-email (PKCE) flow with
a fully server-side path:
- catalyst-api mints its own RS256 JWT (same signer keypair as Agent B)
- Sends link via Stalwart SMTP (noreply@openova.io)
- GET /api/v1/auth/magic validates JWT, single-use jti, KC token-exchange,
sets HttpOnly cookies, redirects to /sovereign/wizard
- ZERO Keycloak UI exposure, ZERO browser PKCE round-trip
Adds CATALYST_OPENOVA_KC_* env refs from new catalyst-openova-kc-credentials
Secret + CATALYST_SESSION_COOKIE_DOMAIN. 2026-05-02.
Bumped to 1.2.5 — Phase-8b live followup on otech48 (2026-05-03). Two
handover bugs caught on the live single-identity flow:
1. Sovereign-side catalyst-api responded to GET /auth/handover with
"server misconfiguration: public key unavailable" — the K8s Secret
`catalyst-handover-jwt-public` was never created, so the optional
Secret-volume mount fell through and the JWK file was absent inside
the container. 1.2.0 wired the volume mount but no provisioning
step materialised the Secret. Fix paired with infra/hetzner/
cloudinit-control-plane.tftpl — cloud-init now writes the Secret
manifest into catalyst-system NS and runcmd applies it BEFORE
flux-bootstrap, mirroring the canonical pattern that flux-system/
ghcr-pull (PR #543) and flux-system/harbor-robot-token (PR #680)
already follow. The chart-side change moves the volume mount off
the catalyst-api PVC (mountPath /etc/catalyst/handover-jwt-public,
no subPath) so a leftover empty directory in the PVC from pre-#606
installs cannot collide with a re-provisioned Secret mount, and
updates CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH to point at the new
location.
2. /auth/handover validator rejected every valid JWT with 401
"invalid audience" because SOVEREIGN_FQDN was unset — the audience
check collapsed to the literal "https://console." prefix.
bp-catalyst-platform's HelmRelease overlay was already setting
`global.sovereignFQDN` but the chart template never plumbed it
through to the Pod env. Added a SOVEREIGN_FQDN env reading
`.Values.global.sovereignFQDN` (default "" so Catalyst-Zero
installs, where catalyst-api is the SIGNER not the validator,
stay clean).
To verify live on otech49+: a fresh provision should reach
https://console.otech49.omani.works/auth/handover?token=... and
exchange to a Keycloak session WITHOUT manual Secret creation.
Issue #606 followup, 2026-05-03.
Bumped to 1.2.3 — RCA + permanent fix for catalyst-api Pods stuck in
CreateContainerConfigError on every fresh Sovereign because the
required (non-optional) `harbor-robot-token` secretKeyRef had no
source. Caught live on otech43, otech45, otech46 — operator was
hand-creating a placeholder Secret each iteration. Root cause: the
chart references `harbor-robot-token` as required but nothing
materialised it on the Sovereign cluster. The token VALUE was
already arriving (cloud-init interpolates var.harbor_robot_token
into /etc/rancher/k3s/registries.yaml), but no Kubernetes Secret
was created for catalyst-api to mount. Fix paired with
infra/hetzner/cloudinit-control-plane.tftpl: cloud-init now writes
/var/lib/catalyst/harbor-robot-token-secret.yaml into flux-system ns
with auto-mirror Reflector annotations, runcmd applies it BEFORE
flux-bootstrap, and bp-reflector (slot 05a) propagates it into
catalyst-system on first reconcile — exactly the canonical pattern
flux-system/ghcr-pull already uses (PR #543). Chart-side change is
a comment update on the secretKeyRef explaining the new seam.
Issue #557 follow-up, 2026-05-03.
Bumped to 1.2.6 — Phase-1 watcher status transition fix (otech48
incident, 2026-05-03). All 37 bp-* HelmReleases reached Ready=True
on the Sovereign cluster but the catalyst-api deployment record
stayed status=phase1-watching. Wizard's POST /mint-handover-token
returned 409 not-handover-ready, blocking the auto-redirect to
console.otech48.omani.works/auth/handover.
Root cause: helmwatch's terminate-on-all-done gate required
`len(observed) >= MinBootstrapKitHRs`. Chart shipped
CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS=38 (matched the kit count
it was originally tuned against), but the actual bootstrap-kit
cardinality had drifted to 37 — making the gate permanently
unsatisfiable. Watch ran until 60-minute WatchTimeout fired.
Fix:
- helmwatch: gate terminate-on-all-done on the informer's
HasSynced signal (after WaitForCacheSync the full bp-* set is
in cache regardless of cardinality). MinBootstrapKitHRs stays
as a defence-in-depth floor (now default 1).
- chart env: CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS=1 (was 38).
- watcher: emit operator-visible "All N blueprints reconciled.
Sovereign ready for handover." SSE event on transition
(idempotent).
- handler: persistDeployment after markPhase1Done so the on-disk
JSON reflects status=ready before any wizard poll. Refuse to
downgrade adopted status on late watcher events. Issue #TBD.
Bumped to 1.3.1 — Phase-8b handover DNS-resolution fix (otech94
incident, 2026-05-04, issue #781). On a fresh Sovereign the
handover URL returned `{"error":"keycloak error: ensure user"}`
with a `dial tcp: lookup auth.<sov-fqdn> on 10.43.0.10:53: no
such host` inside the catalyst-api Pod. Root cause: the cluster's
CoreDNS resolves *.<sov-fqdn> via the upstream resolvers — it
does NOT forward to the in-cluster PowerDNS that holds those
records. Public DNS works (PowerDNS authoritative), but Pod-side
lookups of auth.<sov-fqdn> return NXDOMAIN.
No catalyst chart manifest needed change (api-deployment.yaml
already reads CATALYST_KC_ADDR from a secretKeyRef into
catalyst-kc-sa-credentials). The fix lives in bp-keycloak 1.3.2:
the Secret's `addr` value now resolves to the in-cluster Service
URL (http://keycloak.keycloak.svc.cluster.local) instead of the
public gateway host (https://auth.<sov-fqdn>). The HTTPRoute
hostname (.Values.gateway.host) stays at auth.<sov-fqdn> for
operator browsers — only the catalyst-api Pod's intra-cluster
OAuth client_credentials calls switch to the Service URL.
Catalyst-Zero (contabo) uses keycloak-zero (separate chart) and
is unaffected. 2026-05-04.
Bumped to 1.3.2 — Day-2 cutover RBAC P0 fix (otech102 incident,
2026-05-04, issue #830 Bug 1). The /api/v1/sovereign/cutover/start
endpoint returned 502 status-read-failed: "User
\"system:serviceaccount:catalyst-system:default\" cannot get resource
\"configmaps\" in API group \"\" in the namespace \"catalyst\"". The
catalyst-api Pod was running under the catalyst-system/default
ServiceAccount with no Role/ClusterRole binding to read or patch the
cutover ConfigMaps + create/watch Jobs in the `catalyst` namespace
where bp-self-sovereign-cutover ships its step ConfigMaps.
Fix: add a dedicated ServiceAccount + ClusterRole + ClusterRoleBinding
shipped by THIS chart:
- serviceaccount-cutover-driver.yaml — ServiceAccount
catalyst-api-cutover-driver in catalyst-system
- clusterrole-cutover-driver.yaml — ClusterRole granting
get/list/watch + patch on configmaps; create/get/list/watch/
delete/patch on batch/jobs; get/list/watch on pods + apps/
deployments + apps/daemonsets; create on events. Per
feedback_rbac_create_no_resourcenames.md the `create` verbs are
split into their own Rule WITHOUT resourceNames (combining
create + resourceNames produces 403 every POST).
- clusterrolebinding-cutover-driver.yaml — bind the SA to the
ClusterRole at cluster scope (cutover namespace is runtime-
configurable via CATALYST_CUTOVER_NAMESPACE).
Plus api-deployment.yaml: spec.serviceAccountName set to
catalyst-api-cutover-driver. Issue #830, 2026-05-04.
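The create-verb split looks like this (abridged ClusterRole rules —
a sketch of the pattern, not the verbatim file):
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "delete", "patch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create"]   # own rule, no resourceNames — create is
                        # authorised before the object has a name,
                        # so resourceNames+create denies every POST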
Bumped to 1.4.0 — multi-zone parent-domain support (issue #827,
parent epic #825). A franchised Sovereign now supports N parent
zones, NOT one. New values:
- parentZones: [] — list of parent domains (`omani.works`,
`omani.trade`, ...)
- wildcardCert.enabled — toggle the per-zone Cert render
- wildcardCert.namespace — kube-system (Cilium Gateway home)
- wildcardCert.issuerName — letsencrypt-dns01-prod-powerdns
- catalystApi.powerdnsURL — base URL of the Sovereign's in-cluster
PowerDNS REST API, threaded into the catalyst-api Pod as
CATALYST_POWERDNS_API_URL so the admin-console "Add another parent
domain" flow (#829) can call the real PowerDNS for runtime zone
creation. Empty = in-code default (powerdns.powerdns.svc:8081).
New template templates/sovereign-wildcard-certs.yaml renders one
cert-manager.io/v1.Certificate per parentZone. Each cert renews
independently; a stalled DNS-01 challenge on one zone does not
block another. The chart skips render entirely when parentZones
is empty so the legacy single-zone path
(clusters/_template/sovereign-tls/cilium-gateway-cert.yaml) keeps
ownership of `sovereign-wildcard-tls` without helm-vs-kustomize
ownership flap. Pairs with bp-powerdns 1.2.0 (which now creates
N zones at install time via a Helm hook Job) and the
/api/v1/sovereign/parent-domains catalyst-api endpoint (the
admin-console add-domain flow #829). 2026-05-04.
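The per-zone render reduces to a plain range (sketch of
sovereign-wildcard-certs.yaml; metadata names are illustrative):
  {{- range .Values.parentZones }}
  ---
  apiVersion: cert-manager.io/v1
  kind: Certificate
  metadata:
    name: wildcard-{{ . | replace "." "-" }}
    namespace: {{ $.Values.wildcardCert.namespace }}
  spec:
    secretName: wildcard-{{ . | replace "." "-" }}-tls
    dnsNames: ["*.{{ . }}"]
    issuerRef:
      name: {{ $.Values.wildcardCert.issuerName }}
      kind: ClusterIssuer
  {{- end }}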
Bumped to 1.4.1 — Day-2 cutover RBAC dual-mode fix (issue #830 Bug 1
follow-up, 2026-05-04). Chart 1.3.2 shipped serviceaccount-cutover-
driver.yaml + clusterrole-cutover-driver.yaml + clusterrolebinding-
cutover-driver.yaml with `{{ .Release.Namespace }}` directives that
rendered fine via Helm on Sovereigns but BROKE the Kustomize-mode
contabo-mkt deploy: the directives made Kustomize parse the files as
invalid YAML and silently skip them. Worse, the new files were never
added to templates/kustomization.yaml's resources list, so even if
the YAML had been valid Kustomize would not have rendered them.
Result on contabo: catalyst-api Pod's spec.serviceAccountName
references a non-existent SA — the Pod fails ContainerCreating with
the same RBAC forbidden error #830 was meant to fix.
Fix:
- Strip all `{{ .Release.Namespace }}` directives from the SA +
ClusterRole files. metadata.namespace auto-fills from Helm's
--namespace flag and from Kustomize's `namespace:` directive.
- Split ClusterRoleBinding into Helm-only +
Kustomize-only sibling files because Helm does NOT auto-inject
subjects[0].namespace the way it does metadata.namespace, and the
apiserver rejects bindings without it. clusterrolebinding-
cutover-driver.yaml uses {{ .Release.Namespace }} (Helm-only,
excluded from .helmignore for Sovereigns); clusterrolebinding-
cutover-driver-kustomize.yaml omits subjects[0].namespace and
relies on Kustomize's native injection (contabo-only).
- Add the three new files to templates/kustomization.yaml's
resources list so Kustomize-mode (contabo-mkt) actually renders
them.
This fix mirrors the same dual-mode contract documented in api-
deployment.yaml comments. Verified with `helm template` (subjects[0].
namespace=catalyst-system) AND `kubectl kustomize` (subjects[0].
namespace=catalyst). 2026-05-04.
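The two sibling bindings differ only in who fills
subjects[0].namespace (sketch):
  # clusterrolebinding-cutover-driver.yaml — Helm-only
  subjects:
    - kind: ServiceAccount
      name: catalyst-api-cutover-driver
      namespace: {{ .Release.Namespace }}
  # clusterrolebinding-cutover-driver-kustomize.yaml — contabo-only
  subjects:
    - kind: ServiceAccount
      name: catalyst-api-cutover-driver
      # namespace omitted: Kustomize's `namespace:` directive injects it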
Bumped to 1.4.2 — dual-mode contract violation in 1.4.0
CATALYST_POWERDNS_API_URL block (issue #830 follow-up, 2026-05-04).
PR #838 introduced two `value: {{ default "..." .Values... | quote }}`
Helm directives in api-deployment.yaml's CATALYST_POWERDNS_API_URL +
CATALYST_POWERDNS_SERVER_ID env entries. Both broke the Kustomize-
mode contabo-mkt build with "yaml: invalid map key: map[string]
interface {}{...}", stalling every contabo reconciliation including
THIS chart's own RBAC fix from 1.4.1.
Same pattern as the SOVEREIGN_FQDN block right below in the same
file (extensively documented as a dual-mode hazard): replace the
Helm directive with a literal default. The in-cluster Service URL
is a non-secret constant on every Sovereign that ships bp-powerdns
at its canonical release name; per-Sovereign overrides are still
possible via the HelmRelease overlay's `catalystApi.env` additional-
env patch (which takes precedence). 2026-05-04.
Bumped to 1.4.3 — auto-provision SME Postgres + secrets bundle on
Sovereign install (issue #859, 2026-05-04). Ten of the 11 SME service
Deployments (auth, billing, catalog, console, domain, gateway,
marketplace, notification, provisioning, tenant — admin, the 11th,
has no DB/secret refs) reference two pre-provisioned resources:
- `sme-pg-app` Secret (basic-auth: username + password) backing the
sme-pg-rw.sme.svc.cluster.local Postgres Service
- `sme-secrets` Secret with 11 keys: JWT_SECRET, JWT_REFRESH_SECRET,
GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, SMTP_HOST/PORT/FROM/USER/
PASS, ADMIN_EMAIL, ADMIN_PASSWORD
On contabo-mkt these are pre-provisioned in
clusters/contabo-mkt/apps/sme/data/{postgresql,secrets}.yaml. On a
freshly franchised Sovereign nothing equivalent existed — caught
live on otech103 (2026-05-04 23:18 Berlin) where 10 of 11 SME pods
landed in CreateContainerConfigError after MARKETPLACE_ENABLED=true.
Fix:
- templates/sme-services/cnpg-cluster.yaml — gated on the same
.Values.ingress.marketplace.enabled flag the rest of the SME
bundle uses. Renders postgresql.cnpg.io/v1.Cluster `sme-pg` in
`sme` namespace, instances=1, storage=10Gi, primary DB sme_auth
+ secondary DB sme_billing via postInitApplicationSQL. CNPG
auto-creates `sme-pg-app` Secret and the `sme-pg-rw` Service.
Capabilities-gated on postgresql.cnpg.io/v1 so a misordered
overlay surfaces as "no Cluster yet" rather than chart install
failure (mirrors platform/powerdns/chart/templates/cnpg-cluster.
yaml). bp-catalyst-platform (slot 13) declares dependsOn:
bp-cnpg (slot 16) — already in place since 2026-05-02 (see
1.1.9 changelog) — so by reconcile time the CRD is registered.
- templates/sme-services/sme-secrets.yaml — gated on the same
flag. JWT_SECRET / JWT_REFRESH_SECRET / ADMIN_PASSWORD are
auto-generated via sprig randAlphaNum (64 / 64 / 32 chars
respectively) AND PERSISTED across reconciles via Helm `lookup`
— same load-bearing pattern as platform/gitea/chart/templates/
admin-secret.yaml (issue #830 Bug 2). Without lookup every
reconcile would invalidate every active SME session and lock
out every admin (feedback_passwords.md). Operator-supplied
GOOGLE_CLIENT_*, SMTP_* values default to empty placeholders;
operator brings real values via the per-Sovereign overlay or
the admin-console signup form. helm.sh/resource-policy: keep
so the Secret survives helm uninstall.
- values.yaml — add `smePostgres.cluster.*` (storage / pgVersion
/ resources / ...) and `smeSecrets.{smtp,admin}.*` blocks; both
fully data-driven per Inviolable Principle #4.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.2 → 1.4.3. 2026-05-04.
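The lookup-persistence pattern, in outline (sketch — Secret/key names
match the entry above, the template flow is illustrative):
  {{- $jwt := randAlphaNum 64 }}
  {{- with lookup "v1" "Secret" "sme" "sme-secrets" }}
  {{-   $jwt = index .data "JWT_SECRET" | b64dec }}
  {{- end }}
  apiVersion: v1
  kind: Secret
  metadata:
    name: sme-secrets
    namespace: sme
    annotations:
      helm.sh/resource-policy: keep
  stringData:
    JWT_SECRET: {{ $jwt | quote }}
On first install the lookup finds nothing and the random value wins;
every later reconcile re-reads the live Secret, so sessions survive.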
Bumped to 1.4.4 — deploy FerretDB in sme ns + cross-ns Valkey wire
to unblock catalog/tenant/domain SME services on franchised
Sovereigns (issue #861, 2026-05-04). After 1.4.3 landed sme-pg +
sme-secrets, 7/12 SME pods reached Running on otech103 but 3 stayed
in CrashLoopBackOff with the same DNS error:
catalog: failed to ping MongoDB
error=...lookup ferretdb.sme.svc.cluster.local on 10.43.0.10:53:
no such host
Root cause: SME service ConfigMap (sme-services-config) hardcoded
two URLs that have no Sovereign-side workload behind them:
- MONGODB_URI: mongodb://ferretdb.sme.svc.cluster.local:27017
(FerretDB has no Deployment on Sovereigns — only on contabo-mkt
via clusters/contabo-mkt/apps/sme/data/ferretdb.yaml)
- VALKEY_ADDR: valkey.sme.svc.cluster.local:6379
(bp-valkey 1.0.0 deploys to namespace `valkey`, not `sme`,
and exposes Services `valkey-primary` / `valkey-replicas` /
`valkey-headless` — no plain `valkey` service)
Fix:
- NEW templates/sme-services/ferretdb.yaml — gated on the same
.Values.ingress.marketplace.enabled flag. Deployment + Service
`ferretdb` in `sme` ns, image pinned ghcr.io/ferretdb/ferretdb:1.24
(matches contabo's data/ferretdb.yaml — v2.x requires PostgreSQL
with the DocumentDB extension which the sme-pg CNPG cluster from
PR #860 does not ship; v1.24 works against vanilla CNPG postgres:
16 and is the proven path). Backed by sme-pg via FERRETDB_POSTGRESQL_
URL env interpolating PG_USER + PG_PASSWORD from the sme-pg-app
Secret (auto-created by CNPG in 1.4.3) and pointing at
sme-pg-rw.sme.svc.cluster.local:5432/sme_documents. Image is
operator-overridable via .Values.smeServices.ferretdb.{image,tag}
(Inviolable Principle #4).
- cnpg-cluster.yaml — extend postInitApplicationSQL to also
CREATE DATABASE sme_documents OWNER sme so FerretDB has a DB to
write into on first install. The DB list is data-driven from
.Values.smePostgres.cluster.additionalDatabases (defaulting to
[sme_billing, sme_documents]) so adding a new SME service is a
values-only change.
- configmap.yaml — VALKEY_ADDR now reads from .Values.smeServices.
valkey.host (default valkey-primary.valkey.svc.cluster.local:6379
— the actual Service name bitnami/valkey 5.5.1 with replication
architecture renders, NOT the issue's `valkey.valkey.svc.cluster.
local` which doesn't exist on Sovereigns). MONGODB_URI also uses
.Values.smeServices.ferretdb.{host,port} for symmetry.
- NEW templates/sme-services/valkey-cross-ns-policy.yaml —
CiliumNetworkPolicy in `valkey` namespace allowing ingress on
6379/TCP from any Pod in the `sme` namespace. Defense-in-depth on
top of bp-valkey 1.0.0's upstream NetworkPolicy (which already
permits port 6379 from any source). Gated on the same
marketplace.enabled flag.
- values.yaml — add `smeServices.ferretdb.{image,tag,replicas,
resources}` and `smeServices.valkey.host` blocks. Every URL,
image ref, and resource value is operator-overridable per
Inviolable Principle #4.
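The FerretDB env wiring above can be sketched as a container-level
excerpt. This is a hypothetical fragment, not the shipped template:
the Helm gate placement and the sme-pg-app key names
(`username`/`password`, CNPG's convention) are assumptions.

```yaml
# Hypothetical excerpt of templates/sme-services/ferretdb.yaml
# (env wiring only; selector/replicas/resources/probes omitted).
{{- if .Values.ingress.marketplace.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ferretdb
  namespace: sme
spec:
  template:
    spec:
      containers:
        - name: ferretdb
          image: "{{ .Values.smeServices.ferretdb.image }}:{{ .Values.smeServices.ferretdb.tag }}"
          env:
            - name: PG_USER
              valueFrom:
                secretKeyRef: { name: sme-pg-app, key: username }
            - name: PG_PASSWORD
              valueFrom:
                secretKeyRef: { name: sme-pg-app, key: password }
            # Kubernetes expands $(VAR) references to env vars declared
            # earlier in the same list, so the DSN can interpolate creds.
            - name: FERRETDB_POSTGRESQL_URL
              value: "postgres://$(PG_USER):$(PG_PASSWORD)@sme-pg-rw.sme.svc.cluster.local:5432/sme_documents"
{{- end }}
```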
Known follow-up: bp-valkey ships with `auth.enabled: true` (bitnami
default). SME services pass only VALKEY_ADDR (no password env). Two
remediation paths exist: (a) per-Sovereign overlay disables
bp-valkey auth, or (b) plumb VALKEY_PASSWORD through SME service
Deployments + service code. Filed separately. This PR ships the
infrastructure (FQDN + CiliumNetworkPolicy) so the wire is in place
when one of those auth fixes lands.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.3 → 1.4.4. 2026-05-04.
Bumped to 1.4.5 — wire VALKEY_PASSWORD into SME auth + gateway services
to clear cross-ns Valkey auth crashloop on franchised Sovereigns
(issue #863, 2026-05-04). After 1.4.4 landed FerretDB + the cross-ns
CiliumNetworkPolicy, 11/13 SME pods reached Running 1/1 on otech103
but `auth` stayed in CrashLoopBackOff and `gateway`'s rate limiter
was disabled, both with the same error:
ERROR failed to connect to Valkey error="NOAUTH HELLO must be
called with the client already authenticated, otherwise the
HELLO <proto> AUTH <user> <pass> option can be used..."
Root cause: bp-valkey 1.0.0 (slot 17) ships with `auth.enabled=true`
(bitnami valkey 5.5.1 default convention). The bitnami subchart
auto-generates a random password and exposes it via the
`valkey-password` key in the `valkey` Secret in the `valkey`
namespace. SME service code (`core/services/shared/db/valkey.go`)
only accepted an addr — no password — and the auth.yaml + gateway.yaml
Deployments only set VALKEY_ADDR. Cross-ns AUTH was never plumbed
through. Pre-1.4.4 this was masked because VALKEY_ADDR pointed at a
non-existent `valkey.sme.svc.cluster.local` and the connect failed
at DNS not at AUTH.
Fix:
- core/services/shared/db/valkey.go — add ConnectValkeyWithAuth
overload that takes username + password. ConnectValkey kept
backwards-compatible for callers that don't pass auth (contabo-mkt
auth-less in-namespace Valkey under data/valkey.yaml).
- core/services/auth/main.go + core/services/gateway/main.go —
read VALKEY_USERNAME + VALKEY_PASSWORD env, call
ConnectValkeyWithAuth when password is non-empty, else fall through
to the no-auth path. Empty password = current contabo behaviour.
- NEW templates/sme-services/valkey-cross-ns-secret.yaml — use Helm
`lookup` to read the bp-valkey auto-generated password from
`valkey/valkey` Secret and re-emit it as `sme-valkey-auth` in
`sme` namespace. Same lookup-and-mirror pattern as
sme-secrets.yaml (issue #859) and gitea-admin-secret (issue #830
Bug 2). On first install the lookup may return nil — Flux's 15m
reconcile picks up the mirror once bp-valkey is Ready.
- auth.yaml + gateway.yaml — add VALKEY_PASSWORD env reading from
`sme-valkey-auth` Secret with `optional: true` so contabo-mkt's
auth-less Valkey path keeps working when the mirror Secret is
absent. Valkey's `requirepass` maps to the `default` ACL user, so
VALKEY_USERNAME stays unset by convention.
- values.yaml — add `smeServices.valkey.{sourceSecretName,
sourcePasswordKey, destNamespace, destSecretName}` knobs so a
forked bp-valkey with non-default Secret naming can override
without forking the chart (Inviolable Principle #4).
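The lookup-and-mirror template described above could look roughly
like this. A sketch only: the destination key name (`valkey-password`)
and exact values paths are assumptions.

```yaml
# Hypothetical shape of templates/sme-services/valkey-cross-ns-secret.yaml
{{- if .Values.ingress.marketplace.enabled }}
{{- $v := .Values.smeServices.valkey }}
{{- $src := lookup "v1" "Secret" "valkey" $v.sourceSecretName }}
{{- if $src }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ $v.destSecretName }}
  namespace: {{ $v.destNamespace }}
type: Opaque
data:
  # Re-emit bp-valkey's auto-generated password under the key the SME
  # Deployments read; .data values are already base64-encoded.
  valkey-password: {{ index $src.data $v.sourcePasswordKey }}
{{- end }}
{{- end }}
```

On first install `lookup` returns nil, the inner `if` renders nothing,
and a later Flux reconcile emits the mirror once bp-valkey is Ready,
matching the behaviour described above.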
No SME smeTag bump needed at chart-source time — the
services-build.yaml workflow rebuilds the auth + gateway images
from this commit's SHA and updates the `image:` line in auth.yaml +
gateway.yaml directly. The chart's blueprint-release pipeline picks
up those updated SHAs in its values.yaml on the next chart push.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.4 → 1.4.5. 2026-05-04.
Bumped to 1.4.6 — bundle the rebuilt services-auth + services-gateway
image SHA fa4395f from PR #864 into the chart artifact (issue #863
follow-up, 2026-05-05). 1.4.5 was published at commit fa4395fa BEFORE
the deploy job updated auth.yaml's hardcoded `image:` to fa4395f, so
Sovereigns pulling 1.4.5 got the OLD image (5cdb738) without the
ConnectValkeyWithAuth Go change — VALKEY_PASSWORD env was wired but
the binary ignored it and still hit "NOAUTH HELLO" on connect.
Same race documented in the 1.1.16 changelog above (catalyst-ui
base:/ fix). 1.4.6 republishes the chart with the deploy-committed
image SHAs already in tree (auth.yaml + gateway.yaml `image:` lines
point at fa4395f as of commit 9731701c).
No template/code changes — pure version bump to roll a fresh OCI
artifact whose `helm template` output references the
ConnectValkeyWithAuth-enabled image.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.5 → 1.4.6. 2026-05-05.
Bumped to 1.4.7 — provision the `provisioning-github-token` Secret
on Sovereign install so the last 1/13 SME pod (provisioning) reaches
Running 1/1 (issue #866, 2026-05-04). After 1.4.6 cleared 12/13 SME
pods on otech103, the provisioning Deployment stayed in
CreateContainerConfigError waiting on
`secret/provisioning-github-token` (key GITHUB_TOKEN) which exists
on contabo-mkt as a hand-rolled SealedSecret but had no Sovereign-
side equivalent. Without this Secret the Pod can't even start —
blocks the full SME stack on every fresh Sovereign.
Fix (issue #866 Option C — local-Gitea target):
Post-cutover the canonical Git target on a Sovereign IS the local
Gitea instance (the GitRepository CRs already point there). New
template templates/sme-services/provisioning-github-token.yaml
uses Helm `lookup` to read the auto-generated gitea admin password
from `gitea/gitea-admin-secret` (already generated by
platform/gitea/chart/templates/admin-secret.yaml with the same
lookup-persistence pattern) and re-emit it as
`sme/provisioning-github-token` under the GITHUB_TOKEN key. Same
lookup-and-mirror precedent as valkey-cross-ns-secret.yaml (#863)
and sme-secrets.yaml (#859).
bp-gitea (slot 10) reaches Ready before bp-catalyst-platform
(slot 13) — the Flux dependsOn chain in
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
lists bp-gitea explicitly — so by the time this template renders,
gitea-admin-secret EXISTS in the gitea namespace and lookup
returns its decoded password.
values.yaml — new `smeServices.provisioning.gitToken.*` block
(sourceNamespace / sourceSecretName / sourcePasswordKey /
destNamespace / destSecretName / destKey) so per-Sovereign
overlays pointing the provisioning service at a non-Gitea Git
host (e.g. a GitHub PAT via OpenBao + ExternalSecret) can swap
the source ref without forking the chart (Inviolable Principle #4).
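As a sketch, that values block might look like this (defaults inferred
from the description above; the authoritative defaults live in
values.yaml):

```yaml
smeServices:
  provisioning:
    gitToken:
      sourceNamespace: gitea
      sourceSecretName: gitea-admin-secret
      sourcePasswordKey: password
      destNamespace: sme
      destSecretName: provisioning-github-token
      destKey: GITHUB_TOKEN
```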
Out of scope for this chart bump — full Gitea REST-API target
support in core/services/provisioning/github/client.go (which
hardcodes https://api.github.com today) is a follow-up Go change.
This Secret unblocks the Pod reaching Running 1/1, completing the
SME stack 12/13 → 13/13.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.6 → 1.4.7. 2026-05-04.
1.4.8 (issue #868): fix the marketplace UI PIN-signin flow that 503'd
on otech103 because the public /api/* HTTPRoute backend-ref'd a dead
Service (catalyst-system/marketplace-api with zero matching Pods).
Two template fixes:
- templates/sme-services/marketplace-routes.yaml: /api/* rule now
cross-namespace backendRef sme/gateway:8080 (the SME BSS gateway
Pod that already fronts services-auth, catalog, tenant, billing,
provisioning).
- templates/sme-services/marketplace-reference-grant.yaml: extend
`to:` list with the gateway Service so the cross-ns hop is
authorised by Gateway API.
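The cross-namespace hop can be sketched in two fragments; the route's
namespace and the grant name are assumptions here, not the shipped
manifests:

```yaml
# marketplace-routes.yaml: the /api/* rule's backend (fragment)
backendRefs:
  - name: gateway
    namespace: sme
    port: 8080
---
# marketplace-reference-grant.yaml: ReferenceGrants live in the TARGET
# namespace and authorise references coming from another namespace.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: marketplace-to-sme      # name assumed
  namespace: sme
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: catalyst-system   # route namespace assumed
  to:
    - group: ""
      kind: Service
      name: gateway
```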
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.7 → 1.4.8. 2026-05-04.
1.4.9 (issue #871): no template change — chart-version-only bump to
republish the OCI artifact with the current services-auth image SHA
baked into templates/sme-services/auth.yaml. 1.4.8 was published from
commit 95a06f56 BEFORE the deploy-bot updated auth.yaml's image pin
from `services-auth:fa4395f` (old) → `services-auth:95a06f5` (new,
with the /auth/send-pin alias), so 1.4.8 OCI bytes still reference
the OLD SHA and otech103 reconciled the broken image. Bumping the
chart version forces blueprint-release to publish a fresh artifact
with the current pin. Same race documented in
feedback_idempotent_iac_purge.md and overnight DoD doc as
"deploy-step race". Lockstep slot 13 pin bumps to 1.4.9. 2026-05-05.
1.4.10 (issue #876): wire CATALYST_OTECH_FQDN env on the catalyst-api
Deployment from the same `sovereign-fqdn` ConfigMap (key `fqdn`) that
feeds SOVEREIGN_FQDN. The SME tenant create handler (sme_tenant.go)
and the sovereign-parent-domains seed (sovereign_parent_domains.go)
both read CATALYST_OTECH_FQDN — without it, POST /api/v1/sme/tenants
returns 503 {"error":"otech-fqdn-unconfigured"} on every Sovereign,
and the SME-pool fallback returns an empty list. The two env names
exist for historical reasons (Phase-8b handover vs SME-tier tenant
pipeline) but ultimately point at the Sovereign's public FQDN.
optional=true since Catalyst-Zero (contabo) doesn't run the SME
tenant pipeline. Lockstep slot 13 pin bumps to 1.4.10. 2026-05-05.
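A minimal sketch of that env wiring on the catalyst-api container
(fragment only; the surrounding Deployment spec is omitted):

```yaml
env:
  - name: CATALYST_OTECH_FQDN
    valueFrom:
      configMapKeyRef:
        name: sovereign-fqdn
        key: fqdn
        # optional: the ConfigMap is absent on Catalyst-Zero (contabo),
        # which doesn't run the SME tenant pipeline.
        optional: true
```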
1.4.11 (issue #878): wire CATALYST_GITOPS_USER + CATALYST_GITOPS_TOKEN
env on the catalyst-api Deployment, sourced from the local Gitea
admin secret (`gitea-admin-secret`, keys `username` + `password`).
Without these, the SME tenant pipeline (#804) and the marketplace-
settings GitOps writer fail at the first reconcile with "gitops
token unconfigured" (post-cutover Sovereign has no GitHub PAT — the
GitOps target is the local Gitea). optional=true so Catalyst-Zero
(contabo) keeps using the existing GitHub PAT path. Pairs with a
catalyst-api code change (marketplace_settings.go +
sme_tenant_gitops.go): injectTokenIntoURL now takes a configurable
username (was hardcoded "x-access-token"; GitHub PAT-only) so the
same code path works for both GitHub and Gitea. Also adds `git` to
the catalyst-api Containerfile (Alpine 3.20 base + apk add git) —
the pipeline shells out to git clone/commit/push, and without the
binary the first reconcile fails with `exec: "git": executable
file not found in $PATH`. Lockstep slot 13 pin bumps to 1.4.11.
2026-05-05.
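The GitOps credential wiring might be sketched as the following
fragment (key names from the description above; not the shipped
manifest):

```yaml
env:
  - name: CATALYST_GITOPS_USER
    valueFrom:
      secretKeyRef:
        name: gitea-admin-secret
        key: username
        optional: true   # contabo keeps its existing GitHub PAT path
  - name: CATALYST_GITOPS_TOKEN
    valueFrom:
      secretKeyRef:
        name: gitea-admin-secret
        key: password
        optional: true
```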
1.4.12 (issue #878 follow-up): chart-version-only bump to republish
the OCI artifact with the new catalyst-api image SHA (7bdd14f) baked
into values.yaml. 1.4.11 was published from commit 7bdd14fc BEFORE
the deploy-bot updated values.yaml's catalystApi.tag from 20413ec →
7bdd14f, so 1.4.11 OCI bytes still reference the OLD image without
the git binary. Same deploy-step race fixed in CI by #874 (services-
build auto-bumps chart patch + dispatches blueprint-release) — the
catalyst-build workflow needs the equivalent. Until then this manual
bump is required after every catalyst-api image change. Lockstep
slot 13 pin bumps to 1.4.12. 2026-05-05.
1.4.13 (issue #879): unblock the multi-domain Day-2 add-domain happy
path on a fresh post-handover Sovereign. Five stacked wiring fixes,
three of which are chart-side:
Bug 1 — POOL_DOMAIN_MANAGER_URL: api-deployment.yaml now wires
`POOL_DOMAIN_MANAGER_URL=https://pool.openova.io` so the Sovereign-
side catalyst-api hits the public PDM ingress on contabo (the
in-cluster default `pool-domain-manager.openova-system.svc` only
resolves on contabo and is NXDOMAIN on franchised Sovereigns).
Caught live on otech103, 2026-05-05: every Day-2 add-domain POST
failed with `dial tcp: lookup pool-domain-manager.openova-system.
svc.cluster.local: no such host`.
Bug 2 — CATALYST_PDM_BASIC_AUTH_USER / _PASS: api-deployment.yaml
now mounts the `pdm-basicauth` Secret (keys `username`+`password`)
so pdmFlipNS can `Authorization: Basic ...` against the Traefik
basicAuth Middleware in front of pool.openova.io. optional=true:
Catalyst-Zero pods skip the header (in-cluster Service path is
unauthenticated) and CI / older Sovereigns degrade to a clear 401
log line instead of crashlooping. The Secret is provisioned by
cloud-init at handover-time (paired infra change in
cloudinit-control-plane.tftpl).
Bug 5 — HTTPRoute /auth/handover Exact match: httproute.yaml
catalyst-ui rule changed from PathPrefix `/auth/` to Exact
`/auth/handover`. The previous PathPrefix collided with the OIDC
PKCE redirect_uri `/auth/callback` — catalyst-api 404s on that
path because it only registers `/api/v1/auth/callback`. Result
post-handover-JWT-cookie-expiry (8h TTL): the operator could not
log into the Sovereign Console at all (caught live on otech103).
Exact-match keeps /auth/handover routed to catalyst-api while
every other /auth/* path falls through to catalyst-ui's React
Router for client-side OIDC.
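The Bug 5 rule change can be sketched as follows (backend name and
port are assumptions):

```yaml
rules:
  # Exact match: only /auth/handover reaches catalyst-api.
  - matches:
      - path:
          type: Exact
          value: /auth/handover
    backendRefs:
      - name: catalyst-api
        port: 8080   # port assumed
  # Every other /auth/* path (including the OIDC PKCE /auth/callback)
  # falls through to the catalyst-ui rule for client-side routing.
```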
Three coupled code-side fixes ship in catalyst-api as part of the
same #879 PR (parent_domains.go):
Bug 2-code: pdmFlipNS now SetBasicAuth from the env (read every
call so a Secret rotation propagates without Pod restart).
Bug 3-code: pdmFlipNS body now includes `nameservers` (computed
from expectedNSFor — PDM's SetNSRequest schema requires it; the
previous body got 422 missing-nameservers).
Bug 4-code: lookupPrimaryDomain falls back to SOVEREIGN_FQDN env
after CATALYST_PRIMARY_DOMAIN. On a post-handover Sovereign no
Deployment record is persisted, so without this fallback GET
/parent-domains returned {"items":[]} and the propagation panel
showed `expectedNs: null`. The SOVEREIGN_FQDN env is already
wired by api-deployment.yaml from the sovereign-fqdn ConfigMap.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.12 → 1.4.13. 2026-05-05.
Bumped to 1.4.13 — Flux Kustomization watching SME tenant overlays
(issue #882, 2026-05-05). The catalyst-api SME-tenant pipeline's
GitOps writer (sme_tenant_gitops.go::WriteTenantOverlay) commits
per-tenant Kustomize overlays to clusters/<sov-fqdn>/sme-tenants/
<tenant-id>/ on every successful POST /api/v1/sme/tenants — but no
Flux Kustomization on the Sovereign cluster watched that path. The
state machine (sme_tenant.go) advanced optimistically through every
step (vcluster → bp_charts → dns → certs → keycloak_clients →
registry) and reported state=done, while no actual K8s resources
materialised because nothing was reconciling the orchestrator's
write target.
Verified live on otech103 (2026-05-04 23:18 Berlin): the orchestrator
successfully committed the 9-file overlay for tenant 15f1e45e-...
to the local Gitea openova/openova repo @main, but `kubectl get hr
-n sme-15f1e45e-...` returned No resources found indefinitely.
Fix: NEW templates/sme-services/sme-tenants-kustomization.yaml,
gated on .Values.ingress.marketplace.enabled (same flag the rest of
the SME bundle uses) — non-marketplace Sovereigns don't run the SME
tenant pipeline so they don't render this Kustomization. Renders one
Flux Kustomization in flux-system that sweeps the entire
./clusters/<sovereignFQDN>/sme-tenants directory tree:
- sourceRef: flux-system/openova GitRepository (the same one the
cluster bootstraps from; cutover Step 5 flips its
.spec.url to the local in-cluster Gitea, which is
precisely where sme_tenant_gitops.go pushes via
CATALYST_GITOPS_REPO_URL=http://gitea-http.gitea.svc
.cluster.local:3000/openova/openova)
- path: ./clusters/{{ .Values.global.sovereignFQDN }}/sme-tenants
- interval: 1m (matches the orchestrator's "Flux reconciles
within ~1 min" SLA documented at the top of
sme_tenant_gitops.go)
- prune: true (DELETE /api/v1/sme/tenants/<id> removes the
overlay directory; Flux GCs the tenant resources)
- wait: false (per-tenant overlays each install ~5 bp-* HRs
asynchronously and have their own readiness watcher
in the orchestrator; blocking this top-level
Kustomization on every tenant's full readiness would
let one stuck tenant gate every other tenant)
Per Inviolable Principle #4 (never hardcode), every knob is
operator-overridable via .Values.smeTenants.kustomization.* —
the GitRepository sourceRef name/namespace, the resource name,
the cadence (interval/retryInterval/timeout), and the toggles
(prune/wait). Defaults match the canonical bootstrap-kit
conventions documented in clusters/_template/bootstrap-kit/03-flux
.yaml + the cloud-init flux-bootstrap.yaml block.
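Rendered with its documented defaults, the Kustomization above would
look roughly like this (the path is shown as its template expression;
the resource name is an assumption, and every field is overridable via
.Values.smeTenants.kustomization.*):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sme-tenants      # name assumed
  namespace: flux-system
spec:
  interval: 1m
  path: ./clusters/{{ .Values.global.sovereignFQDN }}/sme-tenants
  prune: true    # tenant DELETE removes the overlay dir; Flux GCs it
  wait: false    # per-tenant readiness is the orchestrator's job
  sourceRef:
    kind: GitRepository
    name: openova
    namespace: flux-system
```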
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.12 → 1.4.13. 2026-05-05.
1.4.14 (issue #879 follow-up): chart-version-only republish so the OCI
artifact carries the catalyst-api image SHA 7bfd6df (the #879 fix
commit). Chart 1.4.13 was published from commit 7bfd6df5 BEFORE the
deploy-bot updated values.yaml's catalystApi.tag from aa226df →
7bfd6df, so 1.4.13 OCI bytes still reference the OLD catalyst-api
image without the pdmFlipNS basic-auth + nameservers + lookup-
primary-domain SOVEREIGN_FQDN-fallback fixes. Same deploy-step race
fixed in CI by #874 (services-build auto-bumps chart patch + dispatches
blueprint-release) — the catalyst-build workflow needs the equivalent.
Until then this manual bump is required after every catalyst-api
image change. Lockstep slot 13 pin bumps to 1.4.14. 2026-05-05.
1.4.15 (issue #887): auto-provision marketplace-api-secrets Secret on
Sovereign install. templates/marketplace-api/deployment.yaml has always
referenced a secretKeyRef on `marketplace-api-secrets` (key:
`jwt-secret`); on contabo-mkt this Secret is hand-rolled in
clusters/contabo-mkt/apps/.../marketplace-api-secrets.yaml. On a freshly
franchised Sovereign with ingress.marketplace.enabled=true, nothing
equivalent existed — caught live on otech103 (2026-05-05) where
marketplace-api landed in CreateContainerConfigError "secret not found"
every reconcile. Fix: NEW templates/marketplace-api/secret.yaml uses
Helm `lookup` to persist a 64-char randAlphaNum jwt-secret across
reconciles (same load-bearing pattern as sme-secrets, valkey-cross-ns-
secret, provisioning-github-token, gitea-admin-secret per
feedback_passwords.md). Without lookup every reconcile would
invalidate every active marketplace JWT. helm.sh/resource-policy: keep
so the Secret survives helm uninstall. Lockstep slot 13 pin bumps to
1.4.15. 2026-05-05.
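The lookup-persistence pattern named above, sketched (the namespace is
assumed to be catalyst-system, matching the dead Service reference in
the 1.4.8 entry):

```yaml
# Hypothetical shape of templates/marketplace-api/secret.yaml
{{- $existing := lookup "v1" "Secret" "catalyst-system" "marketplace-api-secrets" }}
apiVersion: v1
kind: Secret
metadata:
  name: marketplace-api-secrets
  namespace: catalyst-system
  annotations:
    helm.sh/resource-policy: keep   # survives helm uninstall
type: Opaque
data:
  {{- if $existing }}
  # Existing bytes win: re-emitting them keeps every active
  # marketplace JWT valid across reconciles.
  jwt-secret: {{ index $existing.data "jwt-secret" }}
  {{- else }}
  jwt-secret: {{ randAlphaNum 64 | b64enc }}
  {{- end }}
```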
1.4.17 (issue #901): unblock Sovereign Console login on every fresh
provision. https://console.<sov>/login PIN-issue endpoint returned 503
with "CATALYST_OPENOVA_KC_SA_CLIENT_SECRET not set" — a 3-bug chain:
Bug 1: api-deployment.yaml lines 676-739 reference a Secret
`catalyst-openova-kc-credentials` for the full PIN-auth env block
(CATALYST_OPENOVA_KC_* + CATALYST_SMTP_*). On contabo-mkt this Secret
is hand-rolled out-of-band (clusters/contabo-mkt/apps/keycloak-zero/
helmrelease.yaml mounts it via extraEnvVars). On a freshly franchised
Sovereign nothing equivalent existed — every secretKeyRef has
optional=true so the Pod started, but POST /api/v1/auth/pin/issue
503'd on the missing client-secret env. Fix: NEW
templates/catalyst-openova-kc-credentials-secret.yaml mirrors the
canonical KC SA Secret (`keycloak/catalyst-kc-sa-credentials`,
created by bp-keycloak's openbao-bridge post-install hook) into
catalyst-system as `catalyst-openova-kc-credentials` with the key
shape api-deployment.yaml expects. Same Helm-`lookup` persistence
pattern as templates/marketplace-api/secret.yaml (#887),
sme-secrets.yaml (#859), valkey-cross-ns-secret.yaml (#863),
provisioning-github-token.yaml (#866) and gitea-admin-secret.yaml
(#830). helm.sh/resource-policy: keep — Secret survives helm
uninstall.
Sovereign-vs-contabo gate (load-bearing): the new template is
rendered ONLY when `lookup "v1" "Secret" "keycloak"
"catalyst-kc-sa-credentials"` returns non-nil. On Catalyst-Zero
(contabo) Keycloak runs as `keycloak-zero` in its own namespace
and there is NO Secret by that name in the `keycloak` namespace
— lookup returns nil → the template renders empty bytes → the
existing hand-rolled Secret in clusters/contabo-mkt/apps/...
remains untouched (no helm-vs-kustomize ownership flap). The
new file is intentionally NOT added to templates/kustomization.yaml
`resources:` so Kustomize-mode contabo build skips it entirely
(same dual-mode pattern as templates/marketplace-api/secret.yaml).
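The gate itself is one lookup wrapped around the whole manifest,
sketched here with an assumed key name:

```yaml
# Render only when bp-keycloak's SA Secret exists; on Catalyst-Zero
# the lookup returns nil, the template emits nothing, and the
# hand-rolled contabo Secret stays untouched.
{{- $sa := lookup "v1" "Secret" "keycloak" "catalyst-kc-sa-credentials" }}
{{- if $sa }}
apiVersion: v1
kind: Secret
metadata:
  name: catalyst-openova-kc-credentials
  namespace: catalyst-system
  annotations:
    helm.sh/resource-policy: keep
type: Opaque
data:
  client-secret: {{ index $sa.data "client-secret" }}   # key assumed
{{- end }}
```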
Bug 2: SMTP host default `stalwart-web.stalwart.svc.cluster.local`
(an in-code constant) doesn't exist on Sovereign — even after Bug 1
the PIN-email delivery would fail at the next step. Fix: chart now
populates smtp-host/smtp-port/smtp-from from .Values.sovereign.smtp.*
defaulting to mail.openova.io:587 / noreply@openova.io. SMTP
user/pass come from a SECONDARY lookup against
`catalyst-system/sovereign-smtp-credentials` (Secret seeded by
cloud-init at provision time — issue #883 follow-up). If the source
Secret is missing, the Secret renders with empty smtp-user/smtp-pass
so the login surface still works and PIN delivery surfaces as a
clear "email delivery failed" log line, not as a 503.
Bug 3: CATALYST_POST_AUTH_REDIRECT default `/sovereign/wizard` is
mothership-only — the wizard page is the Provisioning Wizard the
operator drives at signup, not a post-handover Sovereign page. Fix:
chart-level default flips to `/sovereign/components` (the post-
handover Sovereign Console homepage). Per-Sovereign overlays
override via the catalystApi.env additional-env patch — the chart
value is a literal (per the dual-mode contract documented in the
CATALYST_POWERDNS_API_URL block of api-deployment.yaml).
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.16 → 1.4.17. 2026-05-05.
1.4.18 (issue #910 — TBD): create the `sme` namespace on Sovereigns
where the marketplace is enabled. Every template under
templates/sme-services/* (billing, auth, ferretdb, valkey-cross-ns-
secret, sme-secrets, provisioning-github-token, cnpg-cluster, ...)
emits resources with `namespace: sme`. On Catalyst-Zero (contabo)
the `sme` namespace is pre-provisioned by clusters/contabo-mkt/apps/
sme/* — so the chart never created it. On a fresh franchised
Sovereign nothing else creates the `sme` namespace, so chart 1.4.17
install failed 23 times with `failed to create resource: namespaces
"sme" not found` — caught live on otech105 (2026-05-05). Fix: NEW
templates/sme-services/sme-namespace.yaml gated on the same
ingress.marketplace.enabled flag as the rest of the SME bundle so
non-marketplace Sovereigns and the Kustomize-mode contabo build
(which does NOT include sme-namespace.yaml in templates/sme-services/
kustomization.yaml's `resources:` list) skip this entirely.
helm.sh/resource-policy: keep — never cascade-delete the namespace
on chart uninstall (would erase every SME workload + tenant).
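A sketch of the namespace template, with the gate and annotation from
the description above:

```yaml
# Hypothetical shape of templates/sme-services/sme-namespace.yaml
{{- if .Values.ingress.marketplace.enabled }}
apiVersion: v1
kind: Namespace
metadata:
  name: sme
  annotations:
    # Never cascade-delete on chart uninstall: the namespace holds
    # every SME workload and tenant.
    helm.sh/resource-policy: keep
{{- end }}
```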
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.17 → 1.4.18. 2026-05-05.
1.4.19 (issue #910 — zero-touch provisioning, Bugs 2 + 3): two
coupled fixes that unblocked Sovereign Console PIN-login on a
freshly franchised cluster (1.4.18 closed Bug 1, the missing `sme`
namespace).
Bug 2 — CATALYST_SESSION_COOKIE_DOMAIN was hardcoded to
console.openova.io in templates/api-deployment.yaml. On a Sovereign
the request host is console.<sov-fqdn>, so the browser silently
rejected the Set-Cookie (RFC 6265 §5.3 step 6 — Domain mismatch)
and every /api/* request landed without a session, redirecting back
to /login forever. Caught live on otech105 (2026-05-05).
Fix: change the literal default to `""` (empty). Per the dual-mode
contract (CATALYST_POWERDNS_API_URL block in api-deployment.yaml),
this MUST stay a literal — Helm template directives in `value:`
fields break the contabo Kustomize-mode build. Empty value is
correct on BOTH paths: when CATALYST_SESSION_COOKIE_DOMAIN is empty
the auth handler omits the Domain attribute and the browser binds
the cookie to the exact request host. On contabo that is
console.openova.io (wizard + magic-link served from the same
host); on a Sovereign that is console.<sov-fqdn> (likewise). Per-
Sovereign overlays MAY override via the catalystApi.env additional-
env patch in the per-cluster HelmRelease for unusual topologies.
Bug 3 — catalyst-openova-kc-credentials-secret.yaml's smtp-user/
smtp-pass lookup used "existing target wins" persistence over the
source `sovereign-smtp-credentials` Secret seeded by A5's
provisioner (issue #883). On first install the source Secret had
not yet been seeded (race between catalyst-api's seedSovereignSMTP
step and the chart reconcile), so the chart rendered empty SMTP
creds, persisted them into the target, and NEVER picked up A5's
seeded bytes on subsequent reconciles. POST /api/v1/auth/pin/issue
502'd with `email-send-failed` for the life of the cluster.
Caught live on otech105 (2026-05-05).
Fix: invert the SMTP-cred lookup precedence. SOURCE
(sovereign-smtp-credentials) wins over the persisted target. Every
Flux reconcile (1m cadence) re-reads the source, so as soon as A5's
seed completes the chart picks it up on the next tick. Operator
rotation: edit sovereign-smtp-credentials (the operator-facing
seam); the target is a chart-derived projection and never an
operator surface. KC fields keep the previous "existing target
wins" contract because bp-keycloak's openbao-bridge auto-rotates
the client-secret on every Helm upgrade and we want that rotation
to require explicit operator action (delete the target) rather
than picking up automatically and rolling the catalyst-api Pod.
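The inverted precedence can be sketched for a single SMTP key (key
shapes assumed; sprig's `dig` tolerates a missing source or key):

```yaml
{{- $src := lookup "v1" "Secret" "catalyst-system" "sovereign-smtp-credentials" }}
{{- $dst := lookup "v1" "Secret" "catalyst-system" "catalyst-openova-kc-credentials" }}
{{- $user := "" }}
{{- /* Start from the persisted target, then let the A5-seeded
       source override it: SOURCE wins, re-read every reconcile. */}}
{{- with $dst }}{{- $user = dig "data" "smtp-user" $user . }}{{- end }}
{{- with $src }}{{- $user = dig "data" "smtp-user" $user . }}{{- end }}
data:
  smtp-user: {{ $user }}
```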
No values.yaml schema change. No bootstrap-kit slot 13 envsubst
change. Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.18 → 1.4.19. 2026-05-05.
type: application
# Opt-out from the blueprint-release hollow-chart guard (issue #181 / #510).
# This umbrella legitimately ships only Catalyst-authored workloads
# (catalyst-ui, catalyst-api, ProvisioningState CRD, Sovereign HTTPRoute);
# the foundation layer is installed independently by the bootstrap-kit
# and must NOT be re-rendered into catalyst-system as subcharts.
annotations:
catalyst.openova.io/no-upstream: "true"
# No subchart dependencies — see 1.1.9 changelog above. The 10
# foundation Blueprints are installed by clusters/_template/bootstrap-kit/
# at their own slots, each as a top-level Flux HelmRelease in its own
# canonical namespace. This umbrella renders only the Catalyst-Zero
# control-plane workloads (catalyst-ui, catalyst-api, ProvisioningState
# CRD, Sovereign HTTPRoute) into targetNamespace: catalyst-system.