1047 lines
61 KiB
YAML
1047 lines
61 KiB
YAML
apiVersion: v2
|
|
name: bp-catalyst-platform
|
|
# 1.4.22 (#915 SME blockers — issues #934/#940/#941/#942/#943/#944): six
|
|
# coupled chart + orchestrator fixes that unblock alice signup gates 2-6
|
|
# on a freshly franchised Sovereign. C5-final got Gate 1 GREEN on
|
|
# otech113 (2026-05-05) but every downstream gate failed because the SME
|
|
# bundle hardcoded contabo-only assumptions:
|
|
#
|
|
# - #934: auth + notification SME services pinned SMTP env to bytes
|
|
# the operator placed in `sme-secrets` via .Values.smeSecrets.smtp.*.
|
|
# On a Sovereign nothing populated those values — auth.yaml's POST
|
|
# /auth/send-pin returned `failed to send email` and gate 2 (PIN
|
|
# delivery) timed out. Fix: sme-secrets.yaml now reads SMTP_*
|
|
# from `catalyst-system/sovereign-smtp-credentials` (the same
|
|
# A5-seeded source #883/#905 the chart 1.4.20 catalyst-openova-kc-
|
|
# credentials Secret already uses) with source-wins precedence.
|
|
# Empty source falls back to legacy chart-level defaults so
|
|
# contabo paths stay clean. Both canonical (smtp-host/port/from/
|
|
# user/pass) AND legacy (host/port/from/user/password) source-Secret
|
|
# key shapes are accepted.
|
|
#
|
|
# - #940: Sovereign provisioning service shipped with GITHUB_TOKEN
|
|
# placeholder bytes AND with GITHUB_OWNER + GITHUB_REPO hardcoded
|
|
# to upstream `openova-io/openova` so per-tenant commits attempted
|
|
# authenticated POST against api.github.com — failed every time
|
|
# with 401. Fix: chart values
|
|
# .Values.smeServices.provisioning.{githubToken,git.{apiURL,owner,
|
|
# repo,branch}} make every GitHub-API coordinate operator-overridable
|
|
# with topology-aware defaults (Sovereign ⇒ in-cluster Gitea REST
|
|
# API + `openova` org; contabo ⇒ api.github.com + `openova-io` org).
|
|
# Provisioning binary's startup gate validates the GITHUB_TOKEN
|
|
# does NOT contain placeholder substrings (`<placeholder>`,
|
|
# `PLACEHOLDER`, `REPLACE_ME`, ...) and crashes the Pod into
|
|
# Pending if it does — the operator sees the misconfig immediately
|
|
# instead of after alice signups have failed silently in Pod logs.
|
|
#
|
|
# - #941: marketplace UI drew "COMING SOON" overlay on every AI +
|
|
# Communication card on a fresh Sovereign because catalog handler's
|
|
# migrateAppDeployable() map at core/services/catalog/handlers/
|
|
# seed.go omitted `openclaw` and `stalwart-mail` even though both
|
|
# blueprints (bp-openclaw, bp-stalwart-{sovereign,tenant}) are
|
|
# visibility=listed in the embedded blueprints.json. C5-final hit
|
|
# "27 apps COMING SOON" because of this — gates 4 (LLM) and 5
|
|
# (mail) blocked before alice could click Install. Fix: add both
|
|
# slugs to the deployable map.
|
|
#
|
|
# - #942: configmap.yaml hardcoded REDPANDA_BROKERS to
|
|
# `redpanda.talentmesh.svc.cluster.local:9092`. talentmesh ns does
|
|
# not exist on a Sovereign and the OpenOva architecture uses NATS
|
|
# JetStream as the only local bus per ADR-0001 (slot 09 ships
|
|
# bp-nats-jetstream into namespace `nats-jetstream`). Every SME
|
|
# service crashlooped at startup with `lookup ...: no such host`,
|
|
# blocking gate 3 (tenant ready). Fix: data-driven via
|
|
# .Values.smeServices.eventBus.brokers with a topology-aware default
|
|
# — Sovereign ⇒ NATS JetStream Service, contabo ⇒ legacy Redpanda
|
|
# Service. The ConfigMap key name stays REDPANDA_BROKERS for
|
|
# back-compat with existing SME service Go env wiring.
|
|
#
|
|
# - #943: bp-newapi chart silently skipped Deployment render on a
|
|
# fresh Sovereign because the Pod gate REQUIRED operator-supplied
|
|
# `database.existingSecret` AND `credentials.existingSecret`. The
|
|
# bootstrap-kit slot 80 overlay supplied neither, so NewAPI never
|
|
# came up and gate 5 (LLM) timed out. Fix: bp-newapi 1.4.0 auto-
|
|
# provisions a CNPG-backed Postgres Cluster + a chart-emitted DSN
|
|
# Secret + a Helm-lookup-persistent SESSION_SECRET/CRYPTO_SECRET
|
|
# Secret when the operator hasn't overridden either. The
|
|
# deployment.yaml gate now passes by default. Capabilities-gated
|
|
# on postgresql.cnpg.io/v1 so a cold install before bp-cnpg is
|
|
# Ready surfaces as "no Cluster yet" rather than an install error.
|
|
#
|
|
# - #944 (CRITICAL — cross-cluster pollution): Sovereign provisioning
|
|
# service had GIT_BASE_PATH hardcoded to `clusters/contabo-mkt/
|
|
# tenants` so every alice tenant overlay landed in the upstream
|
|
# openova/openova repo's contabo overlay, which contabo Flux would
|
|
# then install on the contabo cluster. C5-final caught + reverted
|
|
# the alice2 incident at commit 5715db04 (2026-05-05). Fix:
|
|
# provisioning.yaml templates GIT_BASE_PATH from
|
|
# .Values.smeServices.provisioning.gitBasePath with a topology-
|
|
# aware default `clusters/<sovereignFQDN>/sme-tenants` on
|
|
# Sovereigns. Provisioning binary's startup AND every commit code
|
|
# path validate the path begins with `clusters/<self-FQDN>/` via
|
|
# a new shared `core/services/provisioning/gitguard` package —
|
|
# refusing to commit to any other cluster's tree. Defence in depth
|
|
# so a runtime env mutation (kubectl exec, ConfigMap update without
|
|
# Pod restart, hostile sidecar) cannot bypass the check.
|
|
#
|
|
# Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
# 13-bp-catalyst-platform.yaml bumps from 1.4.21 → 1.4.22.
|
|
# Coupled bp-newapi bump 1.3.0 → 1.4.0 for the #943 CNPG auto-
|
|
# provisioning. 2026-05-05.
|
|
#
|
|
# 1.4.20 (#924): Phase-2 SMTP source-wins extended to non-secret fields
|
|
# (smtp-host, smtp-port, smtp-from) AND to canonical key shape `smtp-user`/
|
|
# `smtp-pass` in addition to legacy `user`/`password`. Pairs with the
|
|
# new bp-stalwart-sovereign chart whose post-install Job materialises
|
|
# `catalyst-system/sovereign-smtp-credentials` carrying Sovereign-local
|
|
# infrastructure addresses (`mail.<sovereignFQDN>` / `noreply@<sovereignFQDN>`).
|
|
# Once bp-stalwart-sovereign installs (bootstrap-kit slot 95), the
|
|
# next Flux reconcile of THIS umbrella picks up the Sovereign-local
|
|
# coordinates and Console PIN delivery flips from mothership relay
|
|
# (`mail.openova.io`, Phase-1 #883) to Sovereign-local relay without
|
|
# operator action. Pre-#924 catalyst-system/sovereign-smtp-credentials
|
|
# carried only credentials and the chart fell back to
|
|
# .Values.sovereign.smtp.* defaults — that fallback path remains as
|
|
# the Sovereign-without-bp-stalwart-sovereign back-compat seam.
|
|
# 1.4.24 (#934 follow-up): smeSecrets.smtp.{host,port,from,user}
|
|
# defaults flipped from "" to the mothership relay
|
|
# (mail.openova.io:587, noreply@openova.io). On otech113 the
|
|
# `catalyst-system/sovereign-smtp-credentials` Secret seeded by A5's
|
|
# provisioner only carried smtp-user + smtp-pass (host/port/from
|
|
# missing in the seed) — sme-secrets source-wins lookup correctly
|
|
# kept SMTP_HOST="" because the source field was unset, but the
|
|
# auth Pod then failed `failed to send email` for gate 2 (PIN
|
|
# delivery). Defaults match `.Values.sovereign.smtp.*` which is the
|
|
# proven catalyst-api PIN delivery path. When A5 ships the missing
|
|
# host/port/from coverage these defaults become unused (source wins).
|
|
# 2026-05-05.
|
|
# 1.4.26 (#957 follow-up): catalyst-api-cutover-driver ClusterRole
|
|
# gains a `create tokenreviews.authentication.k8s.io` rule so that
|
|
# HandleCutoverInternalTrigger can validate the auto-trigger Job's
|
|
# projected SA token via the apiserver's TokenReview API. Without
|
|
# this rule the endpoint returns 502 "token-review-failed" on every
|
|
# call; PR #947 wired the endpoint but not its RBAC. Caught live on
|
|
# otech113 2026-05-05 — chart 0.1.18 fixed the readiness-probe loop
|
|
# but every trigger immediately got 502 in <10ms (synchronous
|
|
# apiserver permission rejection). 2026-05-05.
|
|
version: 1.4.34
|
|
appVersion: 1.4.34
|
|
description: |
|
|
Catalyst Platform — the unified Catalyst control plane umbrella chart for Catalyst-Zero.
|
|
Composes the catalyst-{ui,api}, console, admin, marketplace UI modules and the marketplace-api backend.
|
|
Deployed via Flux on Catalyst-Zero (Contabo k3s) and on every franchised Sovereign provisioned by Catalyst-Zero.
|
|
Per docs/PROVISIONING-PLAN.md — this is the canonical bp-catalyst-platform Helm chart.
|
|
|
|
As of 1.1.9 this umbrella contains ONLY the Catalyst-Zero control-plane
|
|
workloads (catalyst-ui, catalyst-api, ProvisioningState CRD, Sovereign
|
|
HTTPRoute). Foundation Blueprints (cilium, cert-manager, flux,
|
|
crossplane, sealed-secrets, spire, nats-jetstream, openbao, keycloak,
|
|
gitea) are installed independently by the bootstrap-kit at slots
|
|
01..10 (see clusters/_template/bootstrap-kit/). Each lands in its own
|
|
namespace (flux-system, cert-manager, kube-system, etc.) under its own
|
|
Flux HelmRelease — install order owned by Flux dependsOn rather than
|
|
this umbrella's Helm dependency graph.
|
|
|
|
Bumped to 1.1.1 in lockstep with bp-external-dns 1.1.0 to reflect the
|
|
dependency removal. Bumped to 1.1.2 to pull in bp-flux:1.1.2 — the
|
|
catastrophic-double-install fix (omantel.omani.works incident,
|
|
2026-04-29). See docs/RUNBOOK-PROVISIONING.md §"bp-flux double-install".
|
|
Bumped to 1.1.3 to drop three stray kustomize index files
|
|
(templates/kustomization.yaml, templates/marketplace-api/kustomization.yaml,
|
|
templates/sme-services/kustomization.yaml) that Helm was rendering as
|
|
resources with empty metadata.name — Helm post-render rejected the
|
|
install on otech.omani.works, 2026-04-30.
|
|
Bumped to 1.1.4 to give the bp-keycloak/bp-gitea embedded postgresql
|
|
subcharts distinct fullnameOverride values (keycloak-postgresql /
|
|
gitea-postgresql). Both bitnami postgresql subcharts default to
|
|
`<release>-postgresql`, so they collided as
|
|
`catalyst-platform-postgresql.catalyst-system` and Helm post-render
|
|
refused the second occurrence — install_failed on otech.omani.works,
|
|
2026-04-30 (issue #252).
|
|
Bumped to 1.1.5 to remove three legacy Traefik-era ingress template
|
|
files (templates/ingress.yaml, templates/sme-services/ingress.yaml,
|
|
templates/marketplace-api/ingress.yaml). They emitted
|
|
`traefik.io/v1alpha1 Middleware` (strip-sovereign, strip-nova,
|
|
root-to-nova) plus Ingress objects hardcoded to `console.openova.io` /
|
|
`admin.openova.io` / `marketplace.openova.io` / `openova.io` with
|
|
`ingressClassName: traefik`. Sovereigns use Cilium native gateway
|
|
(per docs/ARCHITECTURE.md §11) — Traefik CRDs are not installed and
|
|
never will be — and per-Sovereign Catalyst hostnames are
|
|
`console.${SOVEREIGN_FQDN}` / `admin.${SOVEREIGN_FQDN}` etc., not the
|
|
contabo-mkt openova.io domain. Helm install was failing on otech with
|
|
`no matches for kind "Middleware" in version "traefik.io/v1alpha1"`.
|
|
Per-Sovereign HTTPRoute resources for the Catalyst console/admin/
|
|
marketplace will be authored separately (out of scope here) — issue
|
|
#279, 2026-04-30.
|
|
Bumped to 1.1.6 to delete the entire `templates/sme-services/`
|
|
directory (admin/auth/billing/catalog/configmap/console/domain/
|
|
gateway/marketplace/notification/provisioning/serviceaccounts/tenant
|
|
— 13 manifests, ~36 resources). Every one of them was hardcoded to
|
|
`namespace: sme` and to `sme.openova.io` URLs. The SME microservice
|
|
mesh is a contabo-mkt-only product (the OpenOva.io marketplace) that
|
|
was dragged into the Catalyst umbrella during Group C cutover; it
|
|
has no role on franchised Sovereigns. Sovereigns don't run SME and
|
|
don't have an `sme` namespace, so the Helm install was failing with
|
|
`failed to create resource: namespaces "sme" not found` on
|
|
otech.omani.works. Resolution: SME services are out of scope for the
|
|
bp-catalyst-platform Blueprint — they will be re-homed in a
|
|
contabo-mkt-only Kustomization (or a separate `bp-sme` Blueprint)
|
|
if/when SME is re-deployed. Issue #281, 2026-04-30.
|
|
Bumped to 1.1.9 to remove the 10 foundation-Blueprint subchart
|
|
dependencies (bp-cilium, bp-cert-manager, bp-flux, bp-crossplane,
|
|
bp-sealed-secrets, bp-spire, bp-nats-jetstream, bp-openbao,
|
|
bp-keycloak, bp-gitea). When this umbrella reconciled with
|
|
`targetNamespace: catalyst-system`, Helm rendered every subchart's
|
|
`flux2` / `cilium` / etc. controllers into catalyst-system —
|
|
duplicating the foundation stack the bootstrap-kit had already
|
|
installed at slots 01..10 in their own canonical namespaces
|
|
(flux-system, cert-manager, kube-system, ...). On Phase-8a-preflight
|
|
otech16 (2026-05-02) this manifested as a duplicate source-controller
|
|
in catalyst-system NS that other HRs (bp-cnpg, bp-spire,
|
|
bp-crossplane-claims) intermittently routed to via service discovery,
|
|
failing chart pulls with "i/o timeout" against
|
|
`source-controller.catalyst-system.svc.cluster.local`. Resolution:
|
|
the umbrella ships ONLY Catalyst-Zero control-plane workloads; the
|
|
foundation layer is owned end-to-end by the bootstrap-kit. Issue
|
|
#510, 2026-05-02.
|
|
Bumped to 1.1.12 to add optional=true to the DYNADOT_API_KEY and
|
|
DYNADOT_API_SECRET secretKeyRef entries in the catalyst-api Deployment.
|
|
Sovereign clusters don't hold Dynadot credentials (their tenant DNS
|
|
is served by the Sovereign's own PowerDNS instance); without
|
|
optional=true Kubernetes refuses to start the pod when the
|
|
dynadot-api-credentials Secret is absent, crashlooping catalyst-api
|
|
on every new Sovereign. The fix mirrors the existing optional=true on
|
|
DYNADOT_MANAGED_DOMAINS and DYNADOT_DOMAIN. Issue #547, 2026-05-02.
|
|
Bumped to 1.1.13 to rename all imagePullSecrets references from
|
|
ghcr-pull-secret to ghcr-pull (canonical name written by cloud-init at
|
|
/var/lib/catalyst/ghcr-pull-secret.yaml). The wrong name was causing
|
|
ImagePullBackOff on catalyst-api, catalyst-ui, marketplace-api and all
|
|
11 SME service deployments. Paired with new bp-reflector (slot 05a)
|
|
that auto-mirrors flux-system/ghcr-pull to every namespace via
|
|
reflector.v1.k8s.emberstack.com annotations. Issue #543, 2026-05-02.
|
|
Bumped to 1.1.14 to add global.imageRegistry value and template all
|
|
Catalyst-authored image refs (catalyst-api, catalyst-ui, marketplace-api,
|
|
console, and all 10 SME service deployments). Post-handover per-Sovereign
|
|
overlays set global.imageRegistry to the local Harbor mirror. Issue #560.
|
|
Bumped to 1.1.15 to rebuild catalyst-ui with Vite base: '/' (was
|
|
/sovereign/). The previous base caused blank pages on Sovereign clusters:
|
|
the browser requested /sovereign/assets/index-*.js but nginx served the
|
|
dist at / so every asset returned 404. On contabo
|
|
(console.openova.io/sovereign/*) Traefik's strip-sovereign Middleware strips
|
|
the prefix before reaching nginx — both environments now serve assets at
|
|
/assets/* as expected. Also fixes router.tsx basepath from '/sovereign' to
|
|
'/' so TanStack Router Link/navigate calls emit correct paths. Issue #596,
|
|
2026-05-02.
|
|
|
|
Bumped to 1.1.16 to bundle catalyst-ui image tag 59fb2b7 (Vite base:/
|
|
fix from #596) into the OCI chart values.yaml. Chart 1.1.15 was
|
|
published at commit 32c5e433 before the deploy job updated values.yaml
|
|
SHA tags to 59fb2b7, so Sovereigns pulling 1.1.15 got the old
|
|
ccc3898 image. 1.1.16 ships with catalystUi.tag + catalystApi.tag =
|
|
59fb2b7 baked in. Issue #596, 2026-05-02.
|
|
|
|
Bumped to 1.2.0 — feature add: GET /auth/handover seamless single-identity
|
|
flow (issue #606, Phase-8b Agent C). Adds:
|
|
- CATALYST_KC_ADDR / CATALYST_KC_SA_CLIENT_ID / CATALYST_KC_SA_CLIENT_SECRET env
|
|
- CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH env + Secret volume for handover JWK
|
|
Sovereign-side catalyst-api pods receive the operator's browser redirect from
|
|
Catalyst-Zero, validate the one-time RS256 JWT, create/update the operator in
|
|
Keycloak (sovereign realm), exchange for a user session via token-exchange,
|
|
set HttpOnly session cookies, and redirect to /console/dashboard. 2026-05-02.
|
|
|
|
Bumped to 1.2.1 — Option-B pure passwordless magic-link (issue #614,
|
|
Phase-8b). Replaces Agent A's Keycloak execute-actions-email (PKCE) flow with
|
|
a fully server-side path:
|
|
- catalyst-api mints its own RS256 JWT (same signer keypair as Agent B)
|
|
- Sends link via Stalwart SMTP (noreply@openova.io)
|
|
- GET /api/v1/auth/magic validates JWT, single-use jti, KC token-exchange,
|
|
sets HttpOnly cookies, redirects to /sovereign/wizard
|
|
- ZERO Keycloak UI exposure, ZERO browser PKCE round-trip
|
|
Adds CATALYST_OPENOVA_KC_* env refs from new catalyst-openova-kc-credentials
|
|
Secret + CATALYST_SESSION_COOKIE_DOMAIN. 2026-05-02.
|
|
|
|
Bumped to 1.2.5 — Phase-8b live followup on otech48 (2026-05-03). Two
|
|
handover bugs caught on the live single-identity flow:
|
|
|
|
1. Sovereign-side catalyst-api responded to GET /auth/handover with
|
|
"server misconfiguration: public key unavailable" — the K8s Secret
|
|
`catalyst-handover-jwt-public` was never created, so the optional
|
|
Secret-volume mount fell through and the JWK file was absent inside
|
|
the container. 1.2.0 wired the volume mount but no provisioning
|
|
step materialised the Secret. Fix paired with infra/hetzner/
|
|
cloudinit-control-plane.tftpl — cloud-init now writes the Secret
|
|
manifest into catalyst-system NS and runcmd applies it BEFORE
|
|
flux-bootstrap, mirroring the canonical pattern that flux-system/
|
|
ghcr-pull (PR #543) and flux-system/harbor-robot-token (PR #680)
|
|
already follow. The chart-side change moves the volume mount off
|
|
the catalyst-api PVC (mountPath /etc/catalyst/handover-jwt-public,
|
|
no subPath) so a leftover empty directory in the PVC from pre-#606
|
|
installs cannot collide with a re-provisioned Secret mount, and
|
|
updates CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH to point at the new
|
|
location.
|
|
|
|
2. /auth/handover validator rejected every valid JWT with 401
|
|
"invalid audience" because SOVEREIGN_FQDN was unset — the audience
|
|
check collapsed to the literal "https://console." prefix.
|
|
bp-catalyst-platform's HelmRelease overlay was already setting
|
|
`global.sovereignFQDN` but the chart template never plumbed it
|
|
through to the Pod env. Added a SOVEREIGN_FQDN env reading
|
|
`.Values.global.sovereignFQDN` (default "" so Catalyst-Zero
|
|
installs, where catalyst-api is the SIGNER not the validator,
|
|
stay clean).
|
|
|
|
Verifies live on otech49+ — fresh provision should reach
|
|
https://console.otech49.omani.works/auth/handover?token=... and
|
|
exchange to a Keycloak session WITHOUT manual Secret creation.
|
|
Issue #606 followup, 2026-05-03.
|
|
|
|
Bumped to 1.2.3 — RCA + permanent fix for catalyst-api Pods stuck in
|
|
CreateContainerConfigError on every fresh Sovereign because the
|
|
required (non-optional) `harbor-robot-token` secretKeyRef had no
|
|
source. Caught live on otech43, otech45, otech46 — operator was
|
|
hand-creating a placeholder Secret each iteration. Root cause: the
|
|
chart references `harbor-robot-token` as required but nothing
|
|
materialised it on the Sovereign cluster. The token VALUE was
|
|
already arriving (cloud-init interpolates var.harbor_robot_token
|
|
into /etc/rancher/k3s/registries.yaml), but no Kubernetes Secret
|
|
was created for catalyst-api to mount. Fix paired with
|
|
infra/hetzner/cloudinit-control-plane.tftpl: cloud-init now writes
|
|
/var/lib/catalyst/harbor-robot-token-secret.yaml into flux-system ns
|
|
with auto-mirror Reflector annotations, runcmd applies it BEFORE
|
|
flux-bootstrap, and bp-reflector (slot 05a) propagates it into
|
|
catalyst-system on first reconcile — exactly the canonical pattern
|
|
flux-system/ghcr-pull already uses (PR #543). Chart-side change is
|
|
a comment update on the secretKeyRef explaining the new seam.
|
|
Issue #557 follow-up, 2026-05-03.
|
|
|
|
Bumped to 1.2.6 — Phase-1 watcher status transition fix (otech48
|
|
incident, 2026-05-03). All 37 bp-* HelmReleases reached Ready=True
|
|
on the Sovereign cluster but the catalyst-api deployment record
|
|
stayed status=phase1-watching. Wizard's POST /mint-handover-token
|
|
returned 409 not-handover-ready, blocking the auto-redirect to
|
|
console.otech48.omani.works/auth/handover.
|
|
Root cause: helmwatch's terminate-on-all-done gate required
|
|
`len(observed) >= MinBootstrapKitHRs`. Chart shipped
|
|
CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS=38 (matched the kit count
|
|
it was originally tuned against), but the actual bootstrap-kit
|
|
cardinality had drifted to 37 — making the gate permanently
|
|
unsatisfiable. Watch ran until 60-minute WatchTimeout fired.
|
|
Fix:
|
|
- helmwatch: gate terminate-on-all-done on the informer's
|
|
HasSynced signal (after WaitForCacheSync the full bp-* set is
|
|
in cache regardless of cardinality). MinBootstrapKitHRs stays
|
|
as a defence-in-depth floor (now default 1).
|
|
- chart env: CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS=1 (was 38).
|
|
- watcher: emit operator-visible "All N blueprints reconciled.
|
|
Sovereign ready for handover." SSE event on transition
|
|
(idempotent).
|
|
- handler: persistDeployment after markPhase1Done so the on-disk
|
|
JSON reflects status=ready before any wizard poll. Refuse to
|
|
downgrade adopted status on late watcher events. Issue #TBD.
|
|
|
|
Bumped to 1.3.1 — Phase-8b handover DNS-resolution fix (otech94
|
|
incident, 2026-05-04, issue #781). On a fresh Sovereign the
|
|
handover URL returned `{"error":"keycloak error: ensure user"}`
|
|
with a `dial tcp: lookup auth.<sov-fqdn> on 10.43.0.10:53: no
|
|
such host` inside the catalyst-api Pod. Root cause: the cluster's
|
|
CoreDNS resolves *.<sov-fqdn> via the upstream resolvers — it
|
|
does NOT forward to the in-cluster PowerDNS that holds those
|
|
records. Public DNS works (PowerDNS authoritative), but Pod-side
|
|
lookups of auth.<sov-fqdn> return NXDOMAIN.
|
|
|
|
No catalyst chart manifest needed change (api-deployment.yaml
|
|
already reads CATALYST_KC_ADDR from a secretKeyRef into
|
|
catalyst-kc-sa-credentials). The fix lives in bp-keycloak 1.3.2:
|
|
the Secret's `addr` value now resolves to the in-cluster Service
|
|
URL (http://keycloak.keycloak.svc.cluster.local) instead of the
|
|
public gateway host (https://auth.<sov-fqdn>). The HTTPRoute
|
|
hostname (.Values.gateway.host) stays at auth.<sov-fqdn> for
|
|
operator browsers — only the catalyst-api Pod's intra-cluster
|
|
OAuth client_credentials calls switch to the Service URL.
|
|
Catalyst-Zero (contabo) uses keycloak-zero (separate chart) and
|
|
is unaffected. 2026-05-04.
|
|
|
|
Bumped to 1.3.2 — Day-2 cutover RBAC P0 fix (otech102 incident,
|
|
2026-05-04, issue #830 Bug 1). The /api/v1/sovereign/cutover/start
|
|
endpoint returned 502 status-read-failed: "User
|
|
\"system:serviceaccount:catalyst-system:default\" cannot get resource
|
|
\"configmaps\" in API group \"\" in the namespace \"catalyst\"". The
|
|
catalyst-api Pod was running under the catalyst-system/default
|
|
ServiceAccount with no Role/ClusterRole binding to read or patch the
|
|
cutover ConfigMaps + create/watch Jobs in the `catalyst` namespace
|
|
where bp-self-sovereign-cutover ships its step ConfigMaps.
|
|
Fix: add a dedicated ServiceAccount + ClusterRole + ClusterRoleBinding
|
|
shipped by THIS chart:
|
|
- serviceaccount-cutover-driver.yaml — ServiceAccount
|
|
catalyst-api-cutover-driver in catalyst-system
|
|
- clusterrole-cutover-driver.yaml — ClusterRole granting
|
|
get/list/watch + patch on configmaps; create/get/list/watch/
|
|
delete/patch on batch/jobs; get/list/watch on pods + apps/
|
|
deployments + apps/daemonsets; create on events. Per
|
|
feedback_rbac_create_no_resourcenames.md the `create` verbs are
|
|
split into their own Rule WITHOUT resourceNames (combining
|
|
create + resourceNames produces 403 every POST).
|
|
- clusterrolebinding-cutover-driver.yaml — bind the SA to the
|
|
ClusterRole at cluster scope (cutover namespace is runtime-
|
|
configurable via CATALYST_CUTOVER_NAMESPACE).
|
|
Plus api-deployment.yaml: spec.serviceAccountName set to
|
|
catalyst-api-cutover-driver. Issue #830, 2026-05-04.
|
|
|
|
Bumped to 1.4.0 — multi-zone parent-domain support (issue #827,
|
|
parent epic #825). A franchised Sovereign now supports N parent
|
|
zones, NOT one. New values:
|
|
- parentZones: [] — list of parent domains (`omani.works`,
|
|
`omani.trade`, ...)
|
|
- wildcardCert.enabled — toggle the per-zone Cert render
|
|
- wildcardCert.namespace — kube-system (Cilium Gateway home)
|
|
- wildcardCert.issuerName — letsencrypt-dns01-prod-powerdns
|
|
- catalystApi.powerdnsURL — base URL of the Sovereign's
|
|
in-cluster PowerDNS REST API,
|
|
threaded into the catalyst-api Pod
|
|
as CATALYST_POWERDNS_API_URL so the
|
|
admin-console "Add another parent
|
|
domain" flow (#829) can call the
|
|
real PowerDNS for runtime zone
|
|
creation. Empty = in-code default
|
|
(powerdns.powerdns.svc:8081).
|
|
New template templates/sovereign-wildcard-certs.yaml renders one
|
|
cert-manager.io/v1.Certificate per parentZone. Each cert renews
|
|
independently; a stalled DNS-01 challenge on one zone does not
|
|
block another. The chart skips render entirely when parentZones
|
|
is empty so the legacy single-zone path
|
|
(clusters/_template/sovereign-tls/cilium-gateway-cert.yaml) keeps
|
|
ownership of `sovereign-wildcard-tls` without helm-vs-kustomize
|
|
ownership flap. Pairs with bp-powerdns 1.2.0 (which now creates
|
|
N zones at install time via a Helm hook Job) and the
|
|
/api/v1/sovereign/parent-domains catalyst-api endpoint (the
|
|
admin-console add-domain flow #829). 2026-05-04.
|
|
|
|
Bumped to 1.4.1 — Day-2 cutover RBAC dual-mode fix (issue #830 Bug 1
|
|
follow-up, 2026-05-04). Chart 1.3.2 shipped serviceaccount-cutover-
|
|
driver.yaml + clusterrole-cutover-driver.yaml + clusterrolebinding-
|
|
cutover-driver.yaml with `{{ .Release.Namespace }}` directives that
|
|
rendered fine via Helm on Sovereigns but BROKE the Kustomize-mode
|
|
contabo-mkt deploy: the directives made Kustomize parse the files as
|
|
invalid YAML and silently skip them. Worse, the new files were never
|
|
added to templates/kustomization.yaml's resources list, so even if
|
|
the YAML had been valid Kustomize would not have rendered them.
|
|
Result on contabo: catalyst-api Pod's spec.serviceAccountName
|
|
references a non-existent SA — the Pod fails ContainerCreating with
|
|
the same RBAC forbidden error #830 was meant to fix.
|
|
Fix:
|
|
- Strip all `{{ .Release.Namespace }}` directives from the SA +
|
|
ClusterRole files. metadata.namespace auto-fills from Helm's
|
|
--namespace flag and from Kustomize's `namespace:` directive.
|
|
- Split ClusterRoleBinding into Helm-only +
|
|
Kustomize-only sibling files because Helm does NOT auto-inject
|
|
subjects[0].namespace the way it does metadata.namespace, and the
|
|
apiserver rejects bindings without it. clusterrolebinding-
|
|
cutover-driver.yaml uses {{ .Release.Namespace }} (Helm-only,
|
|
excluded from .helmignore for Sovereigns); clusterrolebinding-
|
|
cutover-driver-kustomize.yaml omits subjects[0].namespace and
|
|
relies on Kustomize's native injection (contabo-only).
|
|
- Add the three new files to templates/kustomization.yaml's
|
|
resources list so Kustomize-mode (contabo-mkt) actually renders
|
|
them.
|
|
This fix mirrors the same dual-mode contract documented in api-
|
|
deployment.yaml comments. Verified with `helm template` (subjects[0].
|
|
namespace=catalyst-system) AND `kubectl kustomize` (subjects[0].
|
|
namespace=catalyst). 2026-05-04.
|
|
|
|
Bumped to 1.4.2 — dual-mode contract violation in 1.4.0
|
|
CATALYST_POWERDNS_API_URL block (issue #830 follow-up, 2026-05-04).
|
|
PR #838 introduced two `value: {{ default "..." .Values... | quote }}`
|
|
Helm directives in api-deployment.yaml's CATALYST_POWERDNS_API_URL +
|
|
CATALYST_POWERDNS_SERVER_ID env entries. Both broke the Kustomize-
|
|
mode contabo-mkt build with "yaml: invalid map key: map[string]
|
|
interface {}{...}", stalling every contabo reconciliation including
|
|
THIS chart's own RBAC fix from 1.4.1.
|
|
Same pattern as the SOVEREIGN_FQDN block right below in the same
|
|
file (extensively documented as a dual-mode hazard): replace the
|
|
Helm directive with a literal default. The in-cluster Service URL
|
|
is a non-secret constant on every Sovereign that ships bp-powerdns
|
|
at its canonical release name; per-Sovereign overrides are still
|
|
possible via the HelmRelease overlay's `catalystApi.env` additional-
|
|
env patch (which takes precedence). 2026-05-04.
|
|
|
|
Bumped to 1.4.3 — auto-provision SME Postgres + secrets bundle on
|
|
Sovereign install (issue #859, 2026-05-04). The 11 SME service
|
|
Deployments (auth, billing, catalog, console, domain, gateway,
|
|
marketplace, notification, provisioning, tenant — plus admin which
|
|
has no DB/secret refs) reference two cluster-scoped resources:
|
|
- `sme-pg-app` Secret (basic-auth: username + password) backing the
|
|
sme-pg-rw.sme.svc.cluster.local Postgres Service
|
|
- `sme-secrets` Secret with 11 keys: JWT_SECRET, JWT_REFRESH_SECRET,
|
|
GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, SMTP_HOST/PORT/FROM/USER/
|
|
PASS, ADMIN_EMAIL, ADMIN_PASSWORD
|
|
On contabo-mkt these are pre-provisioned in
|
|
clusters/contabo-mkt/apps/sme/data/{postgresql,secrets}.yaml. On a
|
|
freshly franchised Sovereign nothing equivalent existed — caught
|
|
live on otech103 (2026-05-04 23:18 Berlin) where 10 of 11 SME pods
|
|
landed in CreateContainerConfigError after MARKETPLACE_ENABLED=true.
|
|
|
|
Fix:
|
|
- templates/sme-services/cnpg-cluster.yaml — gated on the same
|
|
.Values.ingress.marketplace.enabled flag the rest of the SME
|
|
bundle uses. Renders postgresql.cnpg.io/v1.Cluster `sme-pg` in
|
|
`sme` namespace, instances=1, storage=10Gi, primary DB sme_auth
|
|
+ secondary DB sme_billing via postInitApplicationSQL. CNPG
|
|
auto-creates `sme-pg-app` Secret and the `sme-pg-rw` Service.
|
|
Capabilities-gated on postgresql.cnpg.io/v1 so a misordered
|
|
overlay surfaces as "no Cluster yet" rather than chart install
|
|
failure (mirrors platform/powerdns/chart/templates/cnpg-cluster.
|
|
yaml). bp-catalyst-platform (slot 13) declares dependsOn:
|
|
bp-cnpg (slot 16) — already in place since 2026-05-02 (see
|
|
1.1.9 changelog) — so by reconcile time the CRD is registered.
|
|
- templates/sme-services/sme-secrets.yaml — gated on the same
|
|
flag. JWT_SECRET / JWT_REFRESH_SECRET / ADMIN_PASSWORD are
|
|
auto-generated via sprig randAlphaNum (64 / 64 / 32 chars
|
|
respectively) AND PERSISTED across reconciles via Helm `lookup`
|
|
— same load-bearing pattern as platform/gitea/chart/templates/
|
|
admin-secret.yaml (issue #830 Bug 2). Without lookup every
|
|
reconcile would invalidate every active SME session and lock
|
|
out every admin (feedback_passwords.md). Operator-supplied
|
|
GOOGLE_CLIENT_*, SMTP_* values default to empty placeholders;
|
|
operator brings real values via the per-Sovereign overlay or
|
|
the admin-console signup form. helm.sh/resource-policy: keep
|
|
so the Secret survives helm uninstall.
|
|
- values.yaml — add `smePostgres.cluster.*` (storage / pgVersion
|
|
/ resources / ...) and `smeSecrets.{smtp,admin}.*` blocks; both
|
|
fully data-driven per Inviolable Principle #4.
|
|
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.2 → 1.4.3. 2026-05-04.
|
|
|
|
Bumped to 1.4.4 — deploy FerretDB in sme ns + cross-ns Valkey wire
|
|
to unblock catalog/tenant/domain SME services on franchised
|
|
Sovereigns (issue #861, 2026-05-04). After 1.4.3 landed sme-pg +
|
|
sme-secrets, 7/12 SME pods reached Running on otech103 but 3 stayed
|
|
in CrashLoopBackOff with the same DNS error:
|
|
|
|
catalog: failed to ping MongoDB
|
|
error=...lookup ferretdb.sme.svc.cluster.local on 10.43.0.10:53:
|
|
no such host
|
|
|
|
Root cause: SME service ConfigMap (sme-services-config) hardcoded
|
|
two URLs that have no Sovereign-side workload behind them:
|
|
- MONGODB_URI: mongodb://ferretdb.sme.svc.cluster.local:27017
|
|
(FerretDB has no Deployment on Sovereigns — only on contabo-mkt
|
|
via clusters/contabo-mkt/apps/sme/data/ferretdb.yaml)
|
|
- VALKEY_ADDR: valkey.sme.svc.cluster.local:6379
|
|
(bp-valkey 1.0.0 deploys to namespace `valkey`, not `sme`,
|
|
and exposes Services `valkey-primary` / `valkey-replicas` /
|
|
`valkey-headless` — no plain `valkey` service)
|
|
|
|
Fix:
|
|
- NEW templates/sme-services/ferretdb.yaml — gated on the same
|
|
.Values.ingress.marketplace.enabled flag. Deployment + Service
|
|
`ferretdb` in `sme` ns, image pinned ghcr.io/ferretdb/ferretdb:1.24
|
|
(matches contabo's data/ferretdb.yaml — v2.x requires PostgreSQL
|
|
with the DocumentDB extension which the sme-pg CNPG cluster from
|
|
PR #860 does not ship; v1.24 works against vanilla CNPG postgres:
|
|
16 and is the proven path). Backed by sme-pg via FERRETDB_POSTGRESQL_
|
|
URL env interpolating PG_USER + PG_PASSWORD from the sme-pg-app
|
|
Secret (auto-created by CNPG in 1.4.3) and pointing at
|
|
sme-pg-rw.sme.svc.cluster.local:5432/sme_documents. Image is
|
|
operator-overridable via .Values.smeServices.ferretdb.{image,tag}
|
|
(Inviolable Principle #4).
|
|
- cnpg-cluster.yaml — extend postInitApplicationSQL to also
|
|
CREATE DATABASE sme_documents OWNER sme so FerretDB has a DB to
|
|
write into on first install. The DB list is data-driven from
|
|
.Values.smePostgres.cluster.additionalDatabases (defaulting to
|
|
[sme_billing, sme_documents]) so adding a new SME service is a
|
|
values-only change.
|
|
- configmap.yaml — VALKEY_ADDR now reads from .Values.smeServices.
|
|
valkey.host (default valkey-primary.valkey.svc.cluster.local:6379
|
|
— the actual Service name bitnami/valkey 5.5.1 with replication
|
|
architecture renders, NOT the issue's `valkey.valkey.svc.cluster.
|
|
local` which doesn't exist on Sovereigns). MONGODB_URI also uses
|
|
.Values.smeServices.ferretdb.{host,port} for symmetry.
|
|
- NEW templates/sme-services/valkey-cross-ns-policy.yaml —
|
|
CiliumNetworkPolicy in `valkey` namespace allowing ingress on
|
|
6379/TCP from any Pod in the `sme` namespace. Defense-in-depth on
|
|
top of bp-valkey 1.0.0's upstream NetworkPolicy (which already
|
|
permits port 6379 from any source). Gated on the same
|
|
marketplace.enabled flag.
|
|
- values.yaml — add `smeServices.ferretdb.{image,tag,replicas,
|
|
resources}` and `smeServices.valkey.host` blocks. Every URL,
|
|
image ref, and resource value is operator-overridable per
|
|
Inviolable Principle #4.
|
|
|
|
Known follow-up: bp-valkey ships with `auth.enabled: true` (bitnami
|
|
default). SME services pass only VALKEY_ADDR (no password env). Two
|
|
remediation paths exist: (a) per-Sovereign overlay disables
|
|
bp-valkey auth, or (b) plumb VALKEY_PASSWORD through SME service
|
|
Deployments + service code. Filed separately. This PR ships the
|
|
infrastructure (FQDN + CiliumNetworkPolicy) so the wire is in place
|
|
when one of those auth fixes lands.
|
|
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.3 → 1.4.4. 2026-05-04.
|
|
|
|
Bumped to 1.4.5 — wire VALKEY_PASSWORD into SME auth + gateway services
|
|
to clear cross-ns Valkey auth crashloop on franchised Sovereigns
|
|
(issue #863, 2026-05-04). After 1.4.4 landed FerretDB + the cross-ns
|
|
CiliumNetworkPolicy, 11/13 SME pods reached Running 1/1 on otech103
|
|
but `auth` stayed in CrashLoopBackOff and `gateway`'s rate limiter
|
|
was disabled, both with the same error:
|
|
|
|
ERROR failed to connect to Valkey error="NOAUTH HELLO must be
|
|
called with the client already authenticated, otherwise the
|
|
HELLO <proto> AUTH <user> <pass> option can be used..."
|
|
|
|
Root cause: bp-valkey 1.0.0 (slot 17) ships with `auth.enabled=true`
|
|
(bitnami valkey 5.5.1 default convention). The bitnami subchart
|
|
auto-generates a random password and exposes it via the
|
|
`valkey-password` key in the `valkey` Secret in the `valkey`
|
|
namespace. SME service code (`core/services/shared/db/valkey.go`)
|
|
only accepted an addr — no password — and the auth.yaml + gateway.yaml
|
|
Deployments only set VALKEY_ADDR. Cross-ns AUTH was never plumbed
|
|
through. Pre-1.4.4 this was masked because VALKEY_ADDR pointed at a
|
|
non-existent `valkey.sme.svc.cluster.local` and the connect failed
|
|
at DNS not at AUTH.
|
|
|
|
Fix:
|
|
- core/services/shared/db/valkey.go — add ConnectValkeyWithAuth
|
|
overload that takes username + password. ConnectValkey kept
|
|
backwards-compatible for callers that don't pass auth (contabo-mkt
|
|
auth-less in-namespace Valkey under data/valkey.yaml).
|
|
- core/services/auth/main.go + core/services/gateway/main.go —
|
|
read VALKEY_USERNAME + VALKEY_PASSWORD env, call
|
|
ConnectValkeyWithAuth when password is non-empty, else fall through
|
|
to the no-auth path. Empty password = current contabo behaviour.
|
|
- NEW templates/sme-services/valkey-cross-ns-secret.yaml — use Helm
|
|
`lookup` to read the bp-valkey auto-generated password from
|
|
`valkey/valkey` Secret and re-emit it as `sme-valkey-auth` in
|
|
`sme` namespace. Same lookup-and-mirror pattern as
|
|
sme-secrets.yaml (issue #859) and gitea-admin-secret (issue #830
|
|
Bug 2). On first install the lookup may return nil — Flux's 15m
|
|
reconcile picks up the mirror once bp-valkey is Ready.
|
|
- auth.yaml + gateway.yaml — add VALKEY_PASSWORD env reading from
|
|
`sme-valkey-auth` Secret with `optional: true` so contabo-mkt's
|
|
auth-less Valkey path keeps working when the mirror Secret is
|
|
absent. valkey-go's `default` ACL user uses `requirepass`, so
|
|
VALKEY_USERNAME stays unset by convention.
|
|
- values.yaml — add `smeServices.valkey.{sourceSecretName,
|
|
sourcePasswordKey, destNamespace, destSecretName}` knobs so a
|
|
forked bp-valkey with non-default Secret naming can override
|
|
without forking the chart (Inviolable Principle #4).
|
|
|
|
No SME smeTag bump needed at chart-source time — the
|
|
services-build.yaml workflow rebuilds the auth + gateway images
|
|
from this commit's SHA and updates the `image:` line in auth.yaml +
|
|
gateway.yaml directly. The chart's blueprint-release pipeline picks
|
|
up those updated SHAs in its values.yaml on the next chart push.
|
|
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.4 → 1.4.5. 2026-05-04.
|
|
|
|
Bumped to 1.4.6 — bundle the rebuilt services-auth + services-gateway
|
|
image SHA fa4395f from PR #864 into the chart artifact (issue #863
|
|
follow-up, 2026-05-05). 1.4.5 was published at commit fa4395fa BEFORE
|
|
the deploy job updated auth.yaml's hardcoded `image:` to fa4395f, so
|
|
Sovereigns pulling 1.4.5 got the OLD image (5cdb738) without the
|
|
ConnectValkeyWithAuth Go change — VALKEY_PASSWORD env was wired but
|
|
the binary ignored it and still hit "NOAUTH HELLO" on connect.
|
|
|
|
Same race documented in the 1.1.16 changelog above (catalyst-ui
|
|
base:/ fix). 1.4.6 republishes the chart with the deploy-committed
|
|
image SHAs already in tree (auth.yaml + gateway.yaml `image:` lines
|
|
point at fa4395f as of commit 9731701c).
|
|
|
|
No template/code changes — pure version bump to roll a fresh OCI
|
|
artifact whose `helm template` output references the
|
|
ConnectValkeyWithAuth-enabled image.
|
|
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.5 → 1.4.6. 2026-05-05.
|
|
|
|
Bumped to 1.4.7 — provision the `provisioning-github-token` Secret
|
|
on Sovereign install so the last 1/13 SME pod (provisioning) reaches
|
|
Running 1/1 (issue #866, 2026-05-04). After 1.4.6 cleared 12/13 SME
|
|
pods on otech103, the provisioning Deployment stayed in
|
|
CreateContainerConfigError waiting on
|
|
`secret/provisioning-github-token` (key GITHUB_TOKEN) which exists
|
|
on contabo-mkt as a hand-rolled SealedSecret but had no Sovereign-
|
|
side equivalent. Without this Secret the Pod can't even start —
|
|
blocks the full SME stack on every fresh Sovereign.
|
|
|
|
Fix (issue #866 Option C — local-Gitea target):
|
|
Post-cutover the canonical Git target on a Sovereign IS the local
|
|
Gitea instance (the GitRepository CRs already point there). New
|
|
template templates/sme-services/provisioning-github-token.yaml
|
|
uses Helm `lookup` to read the auto-generated gitea admin password
|
|
from `gitea/gitea-admin-secret` (already generated by
|
|
platform/gitea/chart/templates/admin-secret.yaml with the same
|
|
lookup-persistence pattern) and re-emit it as
|
|
`sme/provisioning-github-token` under the GITHUB_TOKEN key. Same
|
|
lookup-and-mirror precedent as valkey-cross-ns-secret.yaml (#863)
|
|
and sme-secrets.yaml (#859).
|
|
|
|
bp-gitea (slot 10) reaches Ready before bp-catalyst-platform
|
|
(slot 13) — the Flux dependsOn chain in
|
|
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
|
|
lists bp-gitea explicitly — so by the time this template renders,
|
|
gitea-admin-secret EXISTS in the gitea namespace and lookup
|
|
returns its decoded password.
|
|
|
|
values.yaml — new `smeServices.provisioning.gitToken.*` block
|
|
(sourceNamespace / sourceSecretName / sourcePasswordKey /
|
|
destNamespace / destSecretName / destKey) so per-Sovereign
|
|
overlays pointing the provisioning service at a non-Gitea Git
|
|
host (e.g. a GitHub PAT via OpenBao + ExternalSecret) can swap
|
|
the source ref without forking the chart (Inviolable Principle #4).
|
|
|
|
Out of scope for this chart bump — full Gitea REST-API target
|
|
support in core/services/provisioning/github/client.go (which
|
|
hardcodes https://api.github.com today) is a follow-up Go change.
|
|
This Secret unblocks the Pod reaching Running 1/1, completing the
|
|
SME stack 12/13 → 13/13.
|
|
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.6 → 1.4.7. 2026-05-04.
|
|
|
|
1.4.8 (issue #868): fix the marketplace UI PIN-signin flow that 503'd
|
|
on otech103 because the public /api/* HTTPRoute backend-ref'd a dead
|
|
Service (catalyst-system/marketplace-api with zero matching Pods).
|
|
Two template fixes:
|
|
- templates/sme-services/marketplace-routes.yaml: /api/* rule now
|
|
cross-namespace backendRef sme/gateway:8080 (the SME BSS gateway
|
|
Pod that already fronts services-auth, catalog, tenant, billing,
|
|
provisioning).
|
|
- templates/sme-services/marketplace-reference-grant.yaml: extend
|
|
`to:` list with the gateway Service so the cross-ns hop is
|
|
authorised by Gateway API.
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.7 → 1.4.8. 2026-05-04.
|
|
|
|
1.4.9 (issue #871): no template change — chart-version-only bump to
|
|
republish the OCI artifact with the current services-auth image SHA
|
|
baked into templates/sme-services/auth.yaml. 1.4.8 was published from
|
|
commit 95a06f56 BEFORE the deploy-bot updated auth.yaml's image pin
|
|
from `services-auth:fa4395f` (old) → `services-auth:95a06f5` (new,
|
|
with the /auth/send-pin alias), so 1.4.8 OCI bytes still reference
|
|
the OLD SHA and otech103 reconciled the broken image. Bumping the
|
|
chart version forces blueprint-release to publish a fresh artifact
|
|
with the current pin. Same race documented in
|
|
feedback_idempotent_iac_purge.md and overnight DoD doc as
|
|
"deploy-step race". Lockstep slot 13 pin bumps to 1.4.9. 2026-05-05.
|
|
|
|
1.4.10 (issue #876): wire CATALYST_OTECH_FQDN env on the catalyst-api
|
|
Deployment from the same `sovereign-fqdn` ConfigMap (key `fqdn`) that
|
|
feeds SOVEREIGN_FQDN. The SME tenant create handler (sme_tenant.go)
|
|
and the sovereign-parent-domains seed (sovereign_parent_domains.go)
|
|
both read CATALYST_OTECH_FQDN — without it, POST /api/v1/sme/tenants
|
|
returns 503 {"error":"otech-fqdn-unconfigured"} on every Sovereign,
|
|
and the SME-pool fallback returns an empty list. The two env names
|
|
exist for historical reasons (Phase-8b handover vs SME-tier tenant
|
|
pipeline) but ultimately point at the Sovereign's public FQDN.
|
|
optional=true since Catalyst-Zero (contabo) doesn't run the SME
|
|
tenant pipeline. Lockstep slot 13 pin bumps to 1.4.10. 2026-05-05.
|
|
|
|
1.4.11 (issue #878): wire CATALYST_GITOPS_USER + CATALYST_GITOPS_TOKEN
|
|
env on the catalyst-api Deployment, sourced from the local Gitea
|
|
admin secret (`gitea-admin-secret`, keys `username` + `password`).
|
|
Without these, the SME tenant pipeline (#804) and the marketplace-
|
|
settings GitOps writer fail at the first reconcile with "gitops
|
|
token unconfigured" (post-cutover Sovereign has no GitHub PAT — the
|
|
GitOps target is the local Gitea). optional=true so Catalyst-Zero
|
|
(contabo) keeps using the existing GitHub PAT path. Pairs with a
|
|
catalyst-api code change (marketplace_settings.go +
|
|
sme_tenant_gitops.go): injectTokenIntoURL now takes a configurable
|
|
username (was hardcoded "x-access-token"; GitHub PAT-only) so the
|
|
same code path works for both GitHub and Gitea. Also adds `git` to
|
|
the catalyst-api Containerfile (Alpine 3.20 base + apk add git) —
|
|
the pipeline shells out to git clone/commit/push, and without the
|
|
binary the first reconcile fails with `exec: "git": executable
|
|
file not found in $PATH`. Lockstep slot 13 pin bumps to 1.4.11.
|
|
2026-05-05.
|
|
|
|
1.4.12 (issue #878 follow-up): chart-version-only bump to republish
|
|
the OCI artifact with the new catalyst-api image SHA (7bdd14f) baked
|
|
into values.yaml. 1.4.11 was published from commit 7bdd14fc BEFORE
|
|
the deploy-bot updated values.yaml's catalystApi.tag from 20413ec ->
|
|
7bdd14f, so 1.4.11 OCI bytes still reference the OLD image without
|
|
the git binary. Same deploy-step race fixed in CI by #874 (services-
|
|
build auto-bumps chart patch + dispatches blueprint-release) — the
|
|
catalyst-build workflow needs the equivalent. Until then this manual
|
|
bump is required after every catalyst-api image change. Lockstep
|
|
slot 13 pin bumps to 1.4.12. 2026-05-05.
|
|
|
|
1.4.13 (issue #879): unblock the multi-domain Day-2 add-domain happy
|
|
path on a fresh post-handover Sovereign. Five stacked wiring fixes,
|
|
three of which are chart-side:
|
|
|
|
Bug 1 — POOL_DOMAIN_MANAGER_URL: api-deployment.yaml now wires
|
|
`POOL_DOMAIN_MANAGER_URL=https://pool.openova.io` so the Sovereign-
|
|
side catalyst-api hits the public PDM ingress on contabo (the
|
|
in-cluster default `pool-domain-manager.openova-system.svc` only
|
|
resolves on contabo and is NXDOMAIN on franchised Sovereigns).
|
|
Caught live on otech103, 2026-05-05: every Day-2 add-domain POST
|
|
failed with `dial tcp: lookup pool-domain-manager.openova-system.
|
|
svc.cluster.local: no such host`.
|
|
|
|
Bug 2 — CATALYST_PDM_BASIC_AUTH_USER / _PASS: api-deployment.yaml
|
|
now mounts the `pdm-basicauth` Secret (keys `username`+`password`)
|
|
so pdmFlipNS can `Authorization: Basic ...` against the Traefik
|
|
basicAuth Middleware in front of pool.openova.io. optional=true:
|
|
Catalyst-Zero pods skip the header (in-cluster Service path is
|
|
unauthenticated) and CI / older Sovereigns degrade to a clear 401
|
|
log line instead of crashlooping. The Secret is provisioned by
|
|
cloud-init at handover-time (paired infra change in
|
|
cloudinit-control-plane.tftpl).
|
|
|
|
Bug 5 — HTTPRoute /auth/handover Exact match: httproute.yaml
|
|
catalyst-ui rule changed from PathPrefix `/auth/` to Exact
|
|
`/auth/handover`. The previous PathPrefix collided with the OIDC
|
|
PKCE redirect_uri `/auth/callback` — catalyst-api 404s on that
|
|
path because it only registers `/api/v1/auth/callback`. Result
|
|
post-handover-JWT-cookie-expiry (8h TTL): the operator could not
|
|
log into the Sovereign Console at all (caught live on otech103).
|
|
Exact-match keeps /auth/handover routed to catalyst-api while
|
|
every other /auth/* path falls through to catalyst-ui's React
|
|
Router for client-side OIDC.
|
|
|
|
Three coupled code-side fixes ship in catalyst-api as part of the
|
|
same #879 PR (parent_domains.go):
|
|
|
|
Bug 2-code: pdmFlipNS now SetBasicAuth from the env (read every
|
|
call so a Secret rotation propagates without Pod restart).
|
|
Bug 3-code: pdmFlipNS body now includes `nameservers` (computed
|
|
from expectedNSFor — PDM's SetNSRequest schema requires it; the
|
|
previous body got 422 missing-nameservers).
|
|
Bug 4-code: lookupPrimaryDomain falls back to SOVEREIGN_FQDN env
|
|
after CATALYST_PRIMARY_DOMAIN. On a post-handover Sovereign no
|
|
Deployment record is persisted, so without this fallback GET
|
|
/parent-domains returned {"items":[]} and the propagation panel
|
|
showed `expectedNs: null`. The SOVEREIGN_FQDN env is already
|
|
wired by api-deployment.yaml from the sovereign-fqdn ConfigMap.
|
|
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.11 → 1.4.12. 2026-05-05.
|
|
|
|
Bumped to 1.4.13 — Flux Kustomization watching SME tenant overlays
|
|
(issue #882, 2026-05-05). The catalyst-api SME-tenant pipeline's
|
|
GitOps writer (sme_tenant_gitops.go::WriteTenantOverlay) commits
|
|
per-tenant Kustomize overlays to clusters/<sov-fqdn>/sme-tenants/
|
|
<tenant-id>/ on every successful POST /api/v1/sme/tenants — but no
|
|
Flux Kustomization on the Sovereign cluster watched that path. The
|
|
state machine (sme_tenant.go) advanced optimistically through every
|
|
step (vcluster → bp_charts → dns → certs → keycloak_clients →
|
|
registry) and reported state=done, while no actual K8s resources
|
|
materialised because nothing was reconciling the orchestrator's
|
|
write target.
|
|
|
|
Verified live on otech103 (2026-05-04 23:18 Berlin): the orchestrator
|
|
successfully committed the 9-file overlay for tenant 15f1e45e-...
|
|
to the local Gitea openova/openova repo @main, but `kubectl get hr
|
|
-n sme-15f1e45e-...` returned No resources found indefinitely.
|
|
|
|
Fix: NEW templates/sme-services/sme-tenants-kustomization.yaml,
|
|
gated on .Values.ingress.marketplace.enabled (same flag the rest of
|
|
the SME bundle uses) — non-marketplace Sovereigns don't run the SME
|
|
tenant pipeline so they don't render this Kustomization. Renders one
|
|
Flux Kustomization in flux-system that sweeps the entire
|
|
./clusters/<sovereignFQDN>/sme-tenants directory tree:
|
|
- sourceRef: flux-system/openova GitRepository (the same one the
|
|
cluster bootstraps from; cutover Step 5 flips its
|
|
.spec.url to the local in-cluster Gitea, which is
|
|
precisely where sme_tenant_gitops.go pushes via
|
|
CATALYST_GITOPS_REPO_URL=http://gitea-http.gitea.svc
|
|
.cluster.local:3000/openova/openova)
|
|
- path: ./clusters/{{ .Values.global.sovereignFQDN }}/sme-tenants
|
|
- interval: 1m (matches the orchestrator's "Flux reconciles
|
|
within ~1 min" SLA documented at the top of
|
|
sme_tenant_gitops.go)
|
|
- prune: true (DELETE /api/v1/sme/tenants/<id> removes the
|
|
overlay directory; Flux GCs the tenant resources)
|
|
- wait: false (per-tenant overlays each install ~5 bp-* HRs
|
|
asynchronously and have their own readiness watcher
|
|
in the orchestrator; blocking this top-level
|
|
Kustomization on every tenant's full readiness would
|
|
let one stuck tenant gate every other tenant)
|
|
|
|
Per Inviolable Principle #4 (never hardcode), every knob is
|
|
operator-overridable via .Values.smeTenants.kustomization.* —
|
|
the GitRepository sourceRef name/namespace, the resource name,
|
|
the cadence (interval/retryInterval/timeout), and the toggles
|
|
(prune/wait). Defaults match the canonical bootstrap-kit
|
|
conventions documented in clusters/_template/bootstrap-kit/03-flux
|
|
.yaml + the cloud-init flux-bootstrap.yaml block.
|
|
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.12 → 1.4.13. 2026-05-05.
|
|
|
|
1.4.14 (issue #879 follow-up): chart-version-only republish so the OCI
|
|
artifact carries the catalyst-api image SHA 7bfd6df (the #879 fix
|
|
commit). Chart 1.4.13 was published from commit 7bfd6df5 BEFORE the
|
|
deploy-bot updated values.yaml's catalystApi.tag from aa226df ->
|
|
7bfd6df, so 1.4.13 OCI bytes still reference the OLD catalyst-api
|
|
image without the pdmFlipNS basic-auth + nameservers + lookup-
|
|
primary-domain SOVEREIGN_FQDN-fallback fixes. Same deploy-step race
|
|
fixed in CI by #874 (services-build auto-bumps chart patch + dispatches
|
|
blueprint-release) — the catalyst-build workflow needs the equivalent.
|
|
Until then this manual bump is required after every catalyst-api
|
|
image change. Lockstep slot 13 pin bumps to 1.4.14. 2026-05-05.
|
|
|
|
1.4.15 (issue #887): auto-provision marketplace-api-secrets Secret on
|
|
Sovereign install. templates/marketplace-api/deployment.yaml has always
|
|
referenced a secretKeyRef on `marketplace-api-secrets` (key:
|
|
`jwt-secret`); on contabo-mkt this Secret is hand-rolled in
|
|
clusters/contabo-mkt/apps/.../marketplace-api-secrets.yaml. On a freshly
|
|
franchised Sovereign with ingress.marketplace.enabled=true, nothing
|
|
equivalent existed — caught live on otech103 (2026-05-05) where
|
|
marketplace-api landed in CreateContainerConfigError "secret not found"
|
|
every reconcile. Fix: NEW templates/marketplace-api/secret.yaml uses
|
|
Helm `lookup` to persist a 64-char randAlphaNum jwt-secret across
|
|
reconciles (same load-bearing pattern as sme-secrets, valkey-cross-ns-
|
|
secret, provisioning-github-token, gitea-admin-secret per
|
|
feedback_passwords.md). Without lookup every reconcile would
|
|
invalidate every active marketplace JWT. helm.sh/resource-policy: keep
|
|
so the Secret survives helm uninstall. Lockstep slot 13 pin bumps to
|
|
1.4.15. 2026-05-05.
|
|
|
|
1.4.17 (issue #901): unblock Sovereign Console login on every fresh
|
|
provision. https://console.<sov>/login PIN-issue endpoint returned 503
|
|
with "CATALYST_OPENOVA_KC_SA_CLIENT_SECRET not set" — a 3-bug chain:
|
|
|
|
Bug 1: api-deployment.yaml lines 676-739 reference a Secret
|
|
`catalyst-openova-kc-credentials` for the full PIN-auth env block
|
|
(CATALYST_OPENOVA_KC_* + CATALYST_SMTP_*). On contabo-mkt this Secret
|
|
is hand-rolled out-of-band (clusters/contabo-mkt/apps/keycloak-zero/
|
|
helmrelease.yaml mounts it via extraEnvVars). On a freshly franchised
|
|
Sovereign nothing equivalent existed — every secretKeyRef has
|
|
optional=true so the Pod started, but POST /api/v1/auth/pin/issue
|
|
503'd on the missing client-secret env. Fix: NEW
|
|
templates/catalyst-openova-kc-credentials-secret.yaml mirrors the
|
|
canonical KC SA Secret (`keycloak/catalyst-kc-sa-credentials`,
|
|
created by bp-keycloak's openbao-bridge post-install hook) into
|
|
catalyst-system as `catalyst-openova-kc-credentials` with the key
|
|
shape api-deployment.yaml expects. Same Helm-`lookup` persistence
|
|
pattern as templates/marketplace-api/secret.yaml (#887),
|
|
sme-secrets.yaml (#859), valkey-cross-ns-secret.yaml (#863),
|
|
provisioning-github-token.yaml (#866) and gitea-admin-secret.yaml
|
|
(#830). helm.sh/resource-policy: keep — Secret survives helm
|
|
uninstall.
|
|
|
|
Sovereign-vs-contabo gate (load-bearing): the new template is
|
|
rendered ONLY when `lookup "v1" "Secret" "keycloak"
|
|
"catalyst-kc-sa-credentials"` returns non-nil. On Catalyst-Zero
|
|
(contabo) Keycloak runs as `keycloak-zero` in its own namespace
|
|
and there is NO Secret by that name in the `keycloak` namespace
|
|
— lookup returns nil → the template renders empty bytes → the
|
|
existing hand-rolled Secret in clusters/contabo-mkt/apps/...
|
|
remains untouched (no helm-vs-kustomize ownership flap). The
|
|
new file is intentionally NOT added to templates/kustomization.yaml
|
|
`resources:` so Kustomize-mode contabo build skips it entirely
|
|
(same dual-mode pattern as templates/marketplace-api/secret.yaml).
|
|
|
|
Bug 2: SMTP host default `stalwart-web.stalwart.svc.cluster.local`
|
|
(an in-code constant) doesn't exist on Sovereign — even after Bug 1
|
|
the PIN-email delivery would fail at the next step. Fix: chart now
|
|
populates smtp-host/smtp-port/smtp-from from .Values.sovereign.smtp.*
|
|
defaulting to mail.openova.io:587 / noreply@openova.io. SMTP
|
|
user/pass come from a SECONDARY lookup against
|
|
`catalyst-system/sovereign-smtp-credentials` (Secret seeded by
|
|
cloud-init at provision time — issue #883 follow-up). If the source
|
|
Secret is missing, the Secret renders with empty smtp-user/smtp-pass
|
|
so the login surface still works and PIN delivery surfaces as a
|
|
clear "email delivery failed" log line, not as a 503.
|
|
|
|
Bug 3: CATALYST_POST_AUTH_REDIRECT default `/sovereign/wizard` is
|
|
mothership-only — the wizard page is the Provisioning Wizard the
|
|
operator drives at signup, not a post-handover Sovereign page. Fix:
|
|
chart-level default flips to `/sovereign/components` (the post-
|
|
handover Sovereign Console homepage). Per-Sovereign overlays
|
|
override via the catalystApi.env additional-env patch — the chart
|
|
value is a literal (per the dual-mode contract documented in the
|
|
CATALYST_POWERDNS_API_URL block of api-deployment.yaml).
|
|
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.16 → 1.4.17. 2026-05-05.
|
|
|
|
1.4.18 (issue #910 — TBD): create the `sme` namespace on Sovereigns
|
|
where the marketplace is enabled. Every template under
|
|
templates/sme-services/* (billing, auth, ferretdb, valkey-cross-ns-
|
|
secret, sme-secrets, provisioning-github-token, cnpg-cluster, ...)
|
|
emits resources with `namespace: sme`. On Catalyst-Zero (contabo)
|
|
the `sme` namespace is pre-provisioned by clusters/contabo-mkt/apps/
|
|
sme/* — so the chart never created it. On a fresh franchised
|
|
Sovereign nothing else creates the `sme` namespace, so chart 1.4.17
|
|
install failed 23 times with `failed to create resource: namespaces
|
|
"sme" not found` — caught live on otech105 (2026-05-05). Fix: NEW
|
|
templates/sme-services/sme-namespace.yaml gated on the same
|
|
ingress.marketplace.enabled flag as the rest of the SME bundle so
|
|
non-marketplace Sovereigns and the Kustomize-mode contabo build
|
|
(which does NOT include sme-namespace.yaml in templates/sme-services/
|
|
kustomization.yaml's `resources:` list) skip this entirely.
|
|
helm.sh/resource-policy: keep — never cascade-delete the namespace
|
|
on chart uninstall (would erase every SME workload + tenant).
|
|
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.17 → 1.4.18. 2026-05-05.
|
|
|
|
1.4.19 (issue #910 — zero-touch provisioning, Bugs 2 + 3): two
|
|
coupled fixes that unblocked Sovereign Console PIN-login on a
|
|
freshly franchised cluster (1.4.18 closed Bug 1, the missing `sme`
|
|
namespace).
|
|
|
|
Bug 2 — CATALYST_SESSION_COOKIE_DOMAIN was hardcoded to
|
|
console.openova.io in templates/api-deployment.yaml. On a Sovereign
|
|
the request host is console.<sov-fqdn>, so the browser silently
|
|
rejected the Set-Cookie (RFC 6265 §5.3 step 6 — Domain mismatch)
|
|
and every /api/* request landed without a session, redirecting back
|
|
to /login forever. Caught live on otech105 (2026-05-05).
|
|
Fix: change the literal default to `""` (empty). Per the dual-mode
|
|
contract (CATALYST_POWERDNS_API_URL block in api-deployment.yaml),
|
|
this MUST stay a literal — Helm template directives in `value:`
|
|
fields break the contabo Kustomize-mode build. Empty value is
|
|
correct on BOTH paths: when CATALYST_SESSION_COOKIE_DOMAIN is empty
|
|
the auth handler omits the Domain attribute and the browser binds
|
|
the cookie to the exact request host. On contabo that is
|
|
console.openova.io (wizard + magic-link served from the same
|
|
host); on a Sovereign that is console.<sov-fqdn> (likewise). Per-
|
|
Sovereign overlays MAY override via the catalystApi.env additional-
|
|
env patch in the per-cluster HelmRelease for unusual topologies.
|
|
|
|
Bug 3 — catalyst-openova-kc-credentials-secret.yaml's smtp-user/
|
|
smtp-pass lookup used "existing target wins" persistence over the
|
|
source `sovereign-smtp-credentials` Secret seeded by A5's
|
|
provisioner (issue #883). On first install the source Secret had
|
|
not yet been seeded (race between catalyst-api's seedSovereignSMTP
|
|
step and the chart reconcile), so the chart rendered empty SMTP
|
|
creds, persisted them into the target, and NEVER picked up A5's
|
|
seeded bytes on subsequent reconciles. POST /api/v1/auth/pin/issue
|
|
502'd with `email-send-failed` for the life of the cluster.
|
|
Caught live on otech105 (2026-05-05).
|
|
Fix: invert the SMTP-cred lookup precedence. SOURCE
|
|
(sovereign-smtp-credentials) wins over the persisted target. Every
|
|
Flux reconcile (1m cadence) re-reads the source, so as soon as A5's
|
|
seed completes the chart picks it up on the next tick. Operator
|
|
rotation: edit sovereign-smtp-credentials (the operator-facing
|
|
seam); the target is a chart-derived projection and never an
|
|
operator surface. KC fields keep the previous "existing target
|
|
wins" contract because bp-keycloak's openbao-bridge auto-rotates
|
|
the client-secret on every Helm upgrade and we want that rotation
|
|
to require explicit operator action (delete the target) rather
|
|
than picking up automatically and rolling the catalyst-api Pod.
|
|
|
|
No values.yaml schema change. No bootstrap-kit slot 13 envsubst
|
|
change. Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
|
|
13-bp-catalyst-platform.yaml bumps from 1.4.18 → 1.4.19. 2026-05-05.
|
|
type: application
|
|
|
|
# Opt-out from the blueprint-release hollow-chart guard (issue #181 / #510).
|
|
# This umbrella legitimately ships only Catalyst-authored workloads
|
|
# (catalyst-ui, catalyst-api, ProvisioningState CRD, Sovereign HTTPRoute);
|
|
# the foundation layer is installed independently by the bootstrap-kit
|
|
# and must NOT be re-rendered into catalyst-system as subcharts.
|
|
annotations:
|
|
catalyst.openova.io/no-upstream: "true"
|
|
|
|
# No subchart dependencies — see 1.1.9 changelog above. The 10
|
|
# foundation Blueprints are installed by clusters/_template/bootstrap-kit/
|
|
# at their own slots, each as a top-level Flux HelmRelease in its own
|
|
# canonical namespace. This umbrella renders only the Catalyst-Zero
|
|
# control-plane workloads (catalyst-ui, catalyst-api, ProvisioningState
|
|
# CRD, Sovereign HTTPRoute) into targetNamespace: catalyst-system.
|