* fix(catalyst-api,bp-keycloak): handover 401 root-causes — Reloader annot + realm SA users array (#713)
Closes #713
Two distinct chart bugs surfaced live on otech62 (2026-05-03), both producing
401 on /auth/handover:
1. SOVEREIGN_FQDN race
api-deployment.yaml reads SOVEREIGN_FQDN from ConfigMap "sovereign-fqdn"
with optional:true. On Sovereigns, that ConfigMap is rendered by the
sovereign-tls Flux Kustomization concurrently with bp-catalyst-platform
HelmRelease. When the Pod starts first, valueFrom collapses to "" and
stays empty — audience check rejects every valid token as "invalid
audience". Fix: add Reloader annotations so the Pod rolls when the
ConfigMap (and the handover-jwt-public Secret) appears.
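The Reloader fix amounts to annotating the Deployment metadata; a minimal sketch (annotation values taken from the resources named above, everything else illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: catalyst-api
  annotations:
    # roll the Pod when either named resource changes
    configmap.reloader.stakater.com/reload: "sovereign-fqdn"
    secret.reloader.stakater.com/reload: "handover-jwt-public"
```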
2. catalyst-api-server SA missing user-level realm-management role mappings
bp-keycloak realm import granted roles via clientScopeMappings — wrong
level. The actual service-account user had no clientRoles entry, so KC
rejected GET /users with 403 when catalyst-api tried to ensure the
operator user during handover. Fix: add explicit "users" array binding
service-account-catalyst-api-server to realm-management.{impersonation,
manage-users, view-users, query-users}.
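The resulting realm-import fragment looks roughly like this (field layout follows the Keycloak realm JSON representation; the surrounding realm structure is assumed):

```json
{
  "users": [
    {
      "username": "service-account-catalyst-api-server",
      "enabled": true,
      "serviceAccountClientId": "catalyst-api-server",
      "clientRoles": {
        "realm-management": [
          "impersonation",
          "manage-users",
          "view-users",
          "query-users"
        ]
      }
    }
  ]
}
```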
* fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (#715)
Closes #715
Two architectural bugs surfaced live on otech64 (2026-05-03), both leading
to a healthy-looking Sovereign that the operator could not reach.
1. catalyst-api tofu workdir on emptyDir
CATALYST_TOFU_WORKDIR pointed at /tmp/catalyst/tofu (emptyDir-backed). When contabo's
catalyst-api Pod rolled mid-apply (the PR #714 deploy commit triggered
a rolling restart 3 minutes into otech64's tofu run), in-progress state
was lost. Tofu had created LB/network/server/services but not the
hcloud_load_balancer_target.control_plane resource yet — the cluster
came up at the k3s level but the public LB had no targets, returning
TLS handshake failure for every console.<sov> request.
Move CATALYST_TOFU_WORKDIR to /var/lib/catalyst/tofu (PVC-backed,
fsGroup=65534 already wires write access). tofu apply resumes from
where it left off after any Pod restart.
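A sketch of the Deployment pieces involved (volume and claim names are assumptions, not taken from the chart):

```yaml
spec:
  template:
    spec:
      securityContext:
        fsGroup: 65534        # grants the Pod write access on the PVC mount
      containers:
        - name: catalyst-api
          env:
            - name: CATALYST_TOFU_WORKDIR
              value: /var/lib/catalyst/tofu   # was /tmp/catalyst/tofu (emptyDir)
          volumeMounts:
            - name: catalyst-data             # hypothetical volume name
              mountPath: /var/lib/catalyst
      volumes:
        - name: catalyst-data
          persistentVolumeClaim:
            claimName: catalyst-api-data      # hypothetical claim name
```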
2. bp-reloader env-vars strategy
reloadStrategy=env-vars only injects checksum env vars for ConfigMaps
referenced via envFrom. Workloads using valueFrom: configMapKeyRef
(catalyst-api's SOVEREIGN_FQDN) are silently not reloaded — the
configmap.reloader.stakater.com/reload annotation added in PR #714
was a no-op under env-vars.
Switch to reloadStrategy=annotations. Reloader bumps a pod-template
annotation, triggering rollout regardless of how the CM/Secret is
referenced.
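If the upstream stakater/reloader chart values pass through unmodified, the switch is a one-line values change (the exact nesting under bp-reloader is an assumption):

```yaml
reloader:
  # 'annotations' patches a pod-template annotation instead of injecting
  # checksum env vars, so valueFrom/configMapKeyRef refs also trigger rollouts
  reloadStrategy: annotations
```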
* fix(bp-catalyst-platform): emit sovereign-fqdn ConfigMap inside chart, drop sovereign-tls duplicate (#717)
Closes #717
Reloader v1.4.16 is silent on the SOVEREIGN_FQDN race (#713). Tried all
annotation forms (configmap.reloader.stakater.com/reload, reloader/auto)
and both reload strategies (env-vars, annotations). RBAC is correct, watch
coverage is global, but manual CM patches produce zero Reloader log output
and zero Pod rollouts. Abandoning Reloader as the race fix.
Move the sovereign-fqdn ConfigMap into bp-catalyst-platform chart
templates, guarded by {{ if .Values.global.sovereignFQDN }}. Helm install
applies all chart manifests in a single etcd transaction so the ConfigMap
commits before the Pod schedules. valueFrom resolves correctly the first
time. No race possible.
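The in-chart template is then roughly (the data key name is assumed to match the configMapKeyRef key in api-deployment.yaml):

```yaml
{{- if .Values.global.sovereignFQDN }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: sovereign-fqdn
data:
  SOVEREIGN_FQDN: {{ .Values.global.sovereignFQDN | quote }}
{{- end }}
```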
Drop the duplicate from clusters/_template/sovereign-tls/ to avoid
Helm-vs-Flux ownership flapping. The Kustomize path on contabo enumerates
files in templates/kustomization.yaml so this Helm-templated file is never
parsed by Kustomize.
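Concretely, because the contabo path enumerates resources explicitly, the Helm-only file never enters the Kustomize build (file names other than the ConfigMap template are illustrative):

```yaml
# products/catalyst/chart/templates/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - api-deployment.yaml
  # sovereign-fqdn ConfigMap template deliberately not listed (Helm-only)
```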
Verified live: deleting the existing CM and re-running Helm install
produced an immediately-correct catalyst-api Pod with SOVEREIGN_FQDN
populated, where the same install with the previous out-of-chart CM had
left the env empty for the Pod's lifetime.
---------
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Helm parses the entire file (including YAML comments) for template
directives BEFORE YAML parsing strips comments. Literal '{{ ... }}'
inside a # comment was treated as a template directive and failed
with 'unexpected <.> in operand' at line 419.
PR #698 introduced this in the explanatory comment for the
SOVEREIGN_FQDN ConfigMap workaround. Reword to avoid the literal
double-curlies — the comment still describes the constraint without
tripping the Helm parser.
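An illustrative before/after of such a comment (the actual wording in the chart is not reproduced here):

```yaml
# BEFORE: the literal placeholder below is seen by Helm's template pass
# (which runs before YAML comment stripping) and fails with
# 'unexpected <.> in operand':
#   an inline Helm-template value of the form {{ ... }} is not usable here

# AFTER: same constraint described without literal double-curlies:
#   an inline Helm-template value is not usable here; Kustomize parses
#   this file as raw YAML on contabo
```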
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #692 added an inline Helm-template `value:` for SOVEREIGN_FQDN in
api-deployment.yaml. That broke contabo-mkt's catalyst-platform Flux
Kustomization (path: ./products/catalyst/chart/templates) because Kustomize
parses raw YAML and Helm `{{ ... }}` is not valid YAML syntax. Live error
on contabo at adf8dc7d:
kustomize build failed: yaml: invalid map key:
map[string]interface {}{".Values.global.sovereignFQDN | default \"\" | quote":""}
Replace the Helm-template form with `valueFrom.configMapKeyRef.optional:
true` so the same template renders cleanly under both consumers:
- contabo-mkt (Kustomize): ConfigMap `sovereign-fqdn` doesn't exist →
optional ref → env stays empty → catalyst-api on contabo never validates
handover JWTs anyway (it's the SIGNER, not the validator). Correct.
- Sovereigns (Helm via bp-catalyst-platform OCI chart): on apply, the
sovereign-tls Kustomization renders `sovereign-fqdn-configmap.yaml` with
envsubst on ${SOVEREIGN_FQDN}, creating the ConfigMap with the per-
Sovereign FQDN. catalyst-api Pod resolves the ref → env populated →
audience check works.
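The replacement env entry is plain YAML, so it renders cleanly for both consumers (the ConfigMap data key name is an assumption):

```yaml
env:
  - name: SOVEREIGN_FQDN
    valueFrom:
      configMapKeyRef:
        name: sovereign-fqdn
        key: SOVEREIGN_FQDN    # assumed key name
        optional: true         # contabo: CM absent, env stays empty (by design)
```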
This restores the bridge between the two consumers without forking the
template. The bp-catalyst-platform 1.2.5 → 1.2.7 bump publishes the new
chart; bootstrap-kit overlay pin updated.
Will be verified on otech49 (next provision after this lands).
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>