openova/platform
e3mrah 9a58289786
fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (closes #715) (#716)
* fix(catalyst-api,bp-keycloak): handover 401 root-causes — Reloader annot + realm SA users array (#713)

Closes #713

Two distinct chart bugs surfaced live on otech62 (2026-05-03), both producing
401 on /auth/handover:

1. SOVEREIGN_FQDN race
   api-deployment.yaml reads SOVEREIGN_FQDN from ConfigMap "sovereign-fqdn"
   with optional:true. On Sovereigns, that ConfigMap is rendered by the
   sovereign-tls Flux Kustomization concurrently with bp-catalyst-platform
   HelmRelease. When the Pod starts first, valueFrom collapses to "" and
   stays empty — audience check rejects every valid token as "invalid
   audience". Fix: add Reloader annotations so the Pod rolls when the
   ConfigMap (and the handover-jwt-public Secret) appears.

2. catalyst-api-server SA missing user-level realm-management role mappings
   bp-keycloak realm import granted roles via clientScopeMappings — wrong
   level. The actual service-account user had no clientRoles entry, so KC
   rejected GET /users with 403 when catalyst-api tried to ensure the
   operator user during handover. Fix: add explicit "users" array binding
   service-account-catalyst-api-server to realm-management.{impersonation,
   manage-users, view-users, query-users}.

* fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (#715)

Closes #715

Two architectural bugs surfaced live on otech64 (2026-05-03), both leading
to a healthy-looking Sovereign that the operator could not reach.

1. catalyst-api tofu workdir on emptyDir
   CATALYST_TOFU_WORKDIR=/tmp/catalyst/tofu (emptyDir). When contabo's
   catalyst-api Pod rolled mid-apply (the PR #714 deploy commit triggered
   a rolling restart 3 minutes into otech64's tofu run), in-progress state
   was lost. Tofu had created LB/network/server/services but not the
   hcloud_load_balancer_target.control_plane resource yet — the cluster
   came up at the k3s level but the public LB had no targets, returning
   TLS handshake failure for every console.<sov> request.

   Move CATALYST_TOFU_WORKDIR to /var/lib/catalyst/tofu (PVC-backed,
   fsGroup=65534 already wires write access). tofu apply resumes from
   where it left off after any Pod restart.

2. bp-reloader env-vars strategy
   reloadStrategy=env-vars only injects checksum env vars for ConfigMaps
   referenced via envFrom. Workloads using valueFrom: configMapKeyRef
   (catalyst-api's SOVEREIGN_FQDN) are silently not reloaded — the
   configmap.reloader.stakater.com/reload annotation added in PR #714
   was a no-op under env-vars.

   Switch to reloadStrategy=annotations. Reloader bumps a pod-template
   annotation, triggering rollout regardless of how the CM/Secret is
   referenced.

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 02:04:26 +04:00
..
alloy feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
anthropic-adapter feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288) 2026-04-30 19:37:19 +04:00
bge feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283) 2026-04-30 18:37:07 +04:00
cert-manager fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook (#681) 2026-05-03 17:12:48 +04:00
cert-manager-dynadot-webhook fix(bp-cert-manager-dynadot-webhook): pin SHA + add ghcr-pull imagePullSecret (#643) 2026-05-02 23:52:42 +04:00
cert-manager-powerdns-webhook fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook (#681) 2026-05-03 17:12:48 +04:00
cilium fix(bp-cilium): upgrade upstream cilium 1.16.5 → 1.19.3 (1.2.0) (#684) 2026-05-03 17:20:54 +04:00
clickhouse docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
cnpg feat(platform): add global.imageRegistry to bp-openbao/external-secrets/cnpg/valkey/nats-jetstream/powerdns/gitea (PR 2/3, #560) (#565) 2026-05-02 12:52:43 +04:00
coraza fix(bp-coraza,bp-syft-grype): add common library subchart to satisfy hollow-chart gate (#220) 2026-04-30 06:15:28 +02:00
crossplane feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
crossplane-claims feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
debezium docs(pass-32): registry-DNS sweep — harbor.<domain> across 9 component READMEs 2026-04-27 22:36:39 +02:00
external-dns fix(bp-external-dns): livenessProbe.initialDelaySeconds=180 for cold-cluster cache-sync (closes #700) (#707) 2026-05-03 23:39:36 +04:00
external-secrets feat(platform): add global.imageRegistry to bp-openbao/external-secrets/cnpg/valkey/nats-jetstream/powerdns/gitea (PR 2/3, #560) (#565) 2026-05-02 12:52:43 +04:00
external-secrets-stores fix(bp-external-secrets-stores): split ClusterSecretStore into separate chart per #247 pattern (closes #331) (#426) 2026-05-01 17:33:47 +04:00
failover-controller refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
falco fix(bp-falco): rename rules_file → rules_files (Falco 0.36+ canonical key, Closes #570) (#574) 2026-05-02 12:59:29 +04:00
ferretdb docs(pass-11b): retry banners on failover-controller/trivy/clickhouse/ferretdb (Edit needed Read first) 2026-04-27 21:45:56 +02:00
flink docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
flux feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
gateway-api fix: bp-gateway-api 5→10 CRDs + bp-gitea CNPG + bp-harbor CNPG race fix + DAG audit (#592) 2026-05-02 15:20:05 +04:00
gitea fix(bp-gitea): use explicit labels in sync-job template (chart 1.2.3 retry) (#670) 2026-05-03 14:37:24 +04:00
grafana feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
guacamole docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
harbor fix(bp-harbor): grep-oE for password (multi-line tolerant) (chart 1.2.13) (#651) 2026-05-03 09:26:23 +04:00
iceberg docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
keda docs(pass-10): banners on 7 more components + opentofu active-active drift fix 2026-04-27 21:43:45 +02:00
keycloak fix(catalyst-api,bp-keycloak): handover 401 root-causes — Reloader annot + realm SA users array (#713) (#714) 2026-05-04 01:37:36 +04:00
knative feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290) 2026-04-30 19:37:38 +04:00
kserve feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290) 2026-04-30 19:37:38 +04:00
kyverno feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
langfuse fix(bp-langfuse): drop apostrophe from description to clear GHCR 500 (resolves #215) (#278) 2026-04-30 17:31:51 +04:00
librechat feat(charts): bp-librechat wrapper chart (closes #275) (#287) 2026-04-30 18:56:59 +04:00
litmus feat(platform): security umbrellas (falco/kyverno/trivy/sigstore/syft-grype/reloader/coraza/litmus) (#216) 2026-04-30 06:07:38 +02:00
livekit feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274) (#289) 2026-04-30 19:37:28 +04:00
llm-gateway feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288) 2026-04-30 19:37:19 +04:00
loki feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
matrix feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274) (#289) 2026-04-30 19:37:28 +04:00
milvus docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
mimir fix: drop bp-langfuse from minimal + bp-mimir 1.0.2 push_grpc fix (#664) 2026-05-03 13:50:38 +04:00
nats-jetstream feat(platform): add global.imageRegistry to bp-openbao/external-secrets/cnpg/valkey/nats-jetstream/powerdns/gitea (PR 2/3, #560) (#565) 2026-05-02 12:52:43 +04:00
nemo-guardrails feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283) 2026-04-30 18:37:07 +04:00
neo4j docs(pass-12): role-in-Catalyst banners on 11 AI/ML Application Blueprints 2026-04-27 21:47:45 +02:00
newapi feat(platform): add bp-newapi — multi-tenant LLM marketplace gateway (#394) (#396) 2026-05-01 15:57:06 +04:00
openbao fix(bp-openbao): add BAO_TOKEN+NAMESPACE env to auth-bootstrap (chart 1.2.14) (#666) 2026-05-03 14:02:34 +04:00
openmeter feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274) (#289) 2026-04-30 19:37:28 +04:00
opensearch docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
opentelemetry feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
opentofu refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
powerdns fix(powerdns+catalyst-api): zero-touch contabo PowerDNS API key for Sovereign cert-manager (PR #681 followup) (#686) 2026-05-03 18:23:27 +04:00
reflector/chart fix: bp-reflector + rename ghcr-pull-secret->ghcr-pull (Closes #543) (#554) 2026-05-02 12:17:51 +04:00
reloader fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (closes #715) (#716) 2026-05-04 02:04:26 +04:00
sealed-secrets feat(platform): add global.imageRegistry to bp-cilium/cert-manager/cert-manager-pdns-webhook/sealed-secrets (PR 1/3 #560) (#562) 2026-05-02 12:48:37 +04:00
seaweedfs fix(bp-seaweedfs): remove trailing slash in registry — fixes double-slash image ref (Closes #568) (#576) 2026-05-02 13:02:48 +04:00
sigstore feat(platform): security umbrellas (falco/kyverno/trivy/sigstore/syft-grype/reloader/coraza/litmus) (#216) 2026-04-30 06:07:38 +02:00
spire fix(bp-spire): disable oidc ClusterSPIFFEID + chart bump (1.1.7) (#645) 2026-05-03 08:27:33 +04:00
stalwart docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
strimzi docs(pass-35): completion sweep for surviving DNS placeholders (8 components) 2026-04-27 22:46:16 +02:00
stunner feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290) 2026-04-30 19:37:38 +04:00
syft-grype fix(bp-coraza,bp-syft-grype): add common library subchart to satisfy hollow-chart gate (#220) 2026-04-30 06:15:28 +02:00
tempo feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214) 2026-04-29 22:11:04 +02:00
temporal feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288) 2026-04-30 19:37:19 +04:00
trivy fix(bp-trivy): raise operator memory limit 256Mi→512Mi — OOMKilled on 38-HR Sovereign (Closes #588) (#590) 2026-05-02 15:20:03 +04:00
valkey feat(platform): add global.imageRegistry to bp-openbao/external-secrets/cnpg/valkey/nats-jetstream/powerdns/gitea (PR 2/3, #560) (#565) 2026-05-02 12:52:43 +04:00
velero feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580) 2026-05-02 13:21:53 +04:00
vllm feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283) 2026-04-30 18:37:07 +04:00
vpa fix(bp-vpa): drop registry.k8s.io/ prefix in repository (upstream prepends it) (#641) 2026-05-02 23:32:35 +04:00