Commit Graph

1215 Commits

Author SHA1 Message Date
e3mrah
0a721506d1
fix(catalyst-api): eventual-consistent Phase-1 watcher with late-poll (#910) (#913)
When the all-terminal trip fires with at least one failed HelmRelease,
keep the informer running for an additional LatePollTimeout window
(default 10 minutes) to give Flux helm-controller's remediation.retries
path room to flip the failed HR back to installing → installed. If
every component reaches StateInstalled during the late-poll window,
classify as OutcomeReady; if the deadline elapses with any HR still
failed, classify as OutcomeFailed exactly as before.

Motivated by the otech105 incident (2026-05-05): bp-catalyst-platform
1.4.17 hit the missing-sme-namespace InstallFailed on first install,
1.4.18 (chart-version bump) succeeded a few minutes later — the
Sovereign reached 40/40 HRs Ready=True but the orchestrator had
already marked the deployment FAILED at the moment of the 1.4.17
terminal observation.

Specifically:
* internal/helmwatch: new Config fields LatePollTimeout +
  LatePollInterval, new runLatePoll loop that re-reads the live
  state map until convergence-or-deadline. Per-component events
  fire via the existing dispatch path so the wizard log pane
  surfaces the recovery window. New CompileLatePollTimeout +
  CompileLatePollInterval env helpers parse
  CATALYST_PHASE1_LATE_POLL_TIMEOUT +
  CATALYST_PHASE1_LATE_POLL_INTERVAL.
* internal/handler: phase1WatchConfigForDeployment threads the
  two new knobs through. Two new test-only handler fields
  phase1LatePollTimeout / phase1LatePollInterval mirror the
  existing Phase-1 knobs.
* clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
  bump install/upgrade timeout from 15m to 25m for the
  bp-catalyst-platform umbrella specifically. The chart genuinely
  needs ~20 minutes worst-case on a fresh franchised Sovereign
  with the full SME service stack; every other bp-* chart stays
  at its previous default since they install in well under 5
  minutes empirically.

New tests cover:
* TestWatch_LatePollRecoversFailedComponentToReady — happy path
* TestWatch_LatePollExhaustsKeepsOutcomeFailed — exhaustion path
* TestWatch_LatePollMultipleFailedPartialRecovery — partial recovery
* TestWatch_LatePollDoesNotRunWhenNoFailures — happy-path regression
* TestLatePollActive_FlagToggles — accessor wiring
* TestCompileLatePoll{Timeout,Interval}_DefaultOnEmpty — env helpers
* TestRunPhase1Watch_LatePollRecoversFailedToReady — handler integration
* TestRunPhase1Watch_LatePollExhaustsFlipsToFailed — handler integration
* TestPhase1WatchConfig_LatePollEnvVarOverride — env wiring
* TestPhase1WatchConfig_LatePollFieldOverrideBeatsEnv — test injection

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:25:51 +04:00
github-actions[bot]
937491b17d deploy: update catalyst images to dd2fe1a 2026-05-05 08:16:17 +00:00
e3mrah
dd2fe1aa62
fix(bp-catalyst-platform): unblock Sovereign Console PIN-login on fresh provision (1.4.19, #910 Bugs 2+3) (#912)
Two coupled fixes that unblock Sovereign Console PIN-login on every
freshly franchised cluster (1.4.18 closed Bug 1 — the missing `sme`
namespace).

Bug 2 — CATALYST_SESSION_COOKIE_DOMAIN was hardcoded to
console.openova.io in templates/api-deployment.yaml. On a Sovereign the
request host is console.<sov-fqdn>, so the browser silently rejected
the Set-Cookie (RFC 6265 §5.3 step 6 — Domain mismatch) and every
/api/* request landed without a session, redirecting back to /login
forever. Caught live on otech105 (2026-05-05).

Fix: change the literal default to "" (empty). Per the dual-mode
contract documented in the CATALYST_POWERDNS_API_URL block of
api-deployment.yaml, this MUST stay a literal — Helm template
directives in `value:` fields break the contabo Kustomize-mode build.
Empty value is correct on BOTH paths: when CATALYST_SESSION_COOKIE_DOMAIN
is empty the auth handler omits the Domain attribute and the browser
binds the cookie to the exact request host. On contabo that is
console.openova.io (wizard + magic-link served from the same host); on
a Sovereign that is console.<sov-fqdn> (likewise). Per-Sovereign
overlays MAY override via the catalystApi.env additional-env patch in
the per-cluster HelmRelease for unusual topologies.

Bug 3 — catalyst-openova-kc-credentials-secret.yaml's smtp-user/
smtp-pass lookup used "existing target wins" persistence over the
source `sovereign-smtp-credentials` Secret seeded by A5's provisioner
(issue #883). On first install the source Secret had not yet been
seeded (race between catalyst-api's seedSovereignSMTP step and the
chart reconcile), so the chart rendered empty SMTP creds, persisted
them into the target, and operator-edited target bytes would be
overwritten on every subsequent reconcile because the source ALSO
won at that point — a footgun. Caught live on otech105 (2026-05-05):
POST /api/v1/auth/pin/issue 502'd with `email-send-failed`.

Fix: invert the SMTP-cred lookup precedence. SOURCE
(sovereign-smtp-credentials) wins over the persisted target. Every
Flux reconcile (1m cadence) re-reads the source, so as soon as A5's
seed completes the chart picks it up on the next tick. Operator
rotation: edit sovereign-smtp-credentials (the operator-facing seam);
the target is a chart-derived projection and never an operator surface.

KC fields keep the previous "existing target wins" contract because
bp-keycloak's openbao-bridge auto-rotates the client-secret on every
Helm upgrade and we want that rotation to require explicit operator
action (delete the target Secret) rather than auto-roll the
catalyst-api Pod.

Lockstep:
  - products/catalyst/chart/Chart.yaml: 1.4.18 → 1.4.19 with full
    1.4.19 changelog block.
  - clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
    pinned chart version 1.4.18 → 1.4.19 with inline rationale
    comment matching the 1.4.x changelog format.

Verification:
  - helm template (default values) clean — Kustomize-mode contabo
    build path unchanged.
  - helm template Sovereign-mode (ingress.marketplace.enabled=true,
    sovereignFQDN=otech106.omani.works) renders 62 resources;
    CATALYST_SESSION_COOKIE_DOMAIN renders as `value: ""`.
  - kubectl kustomize products/catalyst/chart/templates clean —
    contabo Kustomize-mode build emits same resource set, with
    CATALYST_SESSION_COOKIE_DOMAIN: "".

Refs: #910

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:14:20 +04:00
e3mrah
58bfdb5eb3
fix(catalyst-api): align SME tenant orchestrator emit with bp-keycloak / bp-cnpg chart contracts (#910) (#911)
The sme_tenant_gitops.go emit for per-tenant bp-keycloak HelmReleases
used a values shape (`topology`, `realm.*`, `bootstrap.*`, `ingress.*`)
that the bp-keycloak chart does NOT consume. Result: tenant Keycloak
Pod ran but the chart's templates/httproute.yaml guard rendered
nothing (`gateway.host` was unset), so tenant users could not reach
their own Keycloak and downstream WordPress / OpenClaw / Stalwart
OIDC integration broke.

Chart contract (platform/keycloak/chart/values.yaml):
  - sovereignFQDN
  - sovereignRealm.enabled
  - gateway.enabled / gateway.host / gateway.parentRef
  - smtp.{host,port,from,user,password,ssl,starttls,auth}

This change emits the canonical shape, plus a forward-looking
realmConfig.tenant.* marker for the future tenant-mode realm template
(Helm accepts unknown values silently — the marker is harmless until
the chart honours it).

Also fixes bp-cnpg emit: the chart is a pure umbrella subchart of
cloudnative-pg; per-Sovereign overrides MUST flow through the
`cloudnative-pg.*` namespace. The previous top-level `namespace` /
`operator.enabled` keys were silently ignored by Helm. Tenant install
also disables CRD creation since the mothership bp-cnpg already owns
them.

Tenant SMTP credentials are wired via spec.valuesFrom referring to a
per-tenant `sme-tenant-smtp-credentials` Secret (optional=true so the
chart still installs before the Secret is reflected — outbound mail
silently no-ops, login flows work).

Tests:
  - TestBPKeycloakEmittedYAMLParses        (every byte parses as YAML)
  - TestBPKeycloakValuesContract           (sovereignFQDN/gateway/smtp/sovereignRealm)
  - TestBPKeycloakValuesContract_NoLegacyKeys
  - TestBPCNPGSubchartKey
  - TestBPKeycloakValuesFromSMTPSecret     (optional, smtp.* targetPath)
  - TestBPKeycloakInstallTimeout

Verified WP / OpenClaw / Stalwart emit shapes already align with their
chart values.yaml (smeDomain / keycloak.realmURL / clientID /
clientSecretName / ingress.host) — no change needed in those templates.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:12:50 +04:00
github-actions[bot]
abea3af1e5 deploy: update catalyst images to 4969525 2026-05-05 07:40:42 +00:00
e3mrah
496952587e
fix(bp-catalyst-platform): create sme namespace on marketplace Sovereigns (1.4.18) (#909)
Every template under templates/sme-services/* (billing, auth, ferretdb,
valkey-cross-ns-secret, sme-secrets, provisioning-github-token,
cnpg-cluster, ...) emits resources with `namespace: sme`. On
Catalyst-Zero (contabo) the `sme` namespace is pre-provisioned by
clusters/contabo-mkt/apps/sme/* — so the chart never needed to create
it. On a fresh franchised Sovereign nothing else creates the `sme`
namespace, so chart 1.4.17 install failed 23 times with
`failed to create resource: namespaces "sme" not found`. Caught live
on otech105 (2026-05-05) — bp-catalyst-platform stuck Ready=False
for 18 minutes blocking every downstream Sovereign Console login + the
full marketplace UI.

Fix:
  - NEW templates/sme-services/sme-namespace.yaml — gated on the same
    `.Values.ingress.marketplace.enabled` flag the rest of the SME
    bundle uses. Renders a Namespace `sme` with
    `helm.sh/resource-policy: keep` so a chart uninstall never
    cascade-deletes every SME workload + tenant.
  - Same dual-mode contract as templates/marketplace-api/secret.yaml
    (#887) and templates/catalyst-openova-kc-credentials-secret.yaml
    (#901): the new file is intentionally NOT added to
    templates/sme-services/kustomization.yaml's `resources:` list, so
    the Kustomize-mode contabo build skips it entirely (contabo's
    `sme` namespace is owned by clusters/contabo-mkt/apps/sme/
    namespace.yaml).

Lockstep:
  - products/catalyst/chart/Chart.yaml: 1.4.17 -> 1.4.18 with
    full 1.4.18 changelog block.
  - clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
    pinned chart version 1.4.17 -> 1.4.18 with inline rationale
    comment matching the 1.4.x changelog format.

Verified live on otech105: after the runtime hot-fix
(`kubectl create ns sme`) bp-catalyst-platform reached
Ready=True ("Helm upgrade succeeded for release catalyst-system/
catalyst-platform.v2 with chart bp-catalyst-platform@1.4.17") and
all 40/40 bootstrap-kit HRs converged. This PR ensures future
Sovereigns provision cleanly without operator intervention.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:38:31 +04:00
github-actions[bot]
82ade7397c deploy: update catalyst images to aec4aca 2026-05-05 07:09:37 +00:00
e3mrah
aec4aca296
fix(catalyst-api): PDM client must add basic auth for public ingress (#907) (#908)
# What

The pdm.Client (Reserve / Commit / Release / Check) never sets the
`Authorization: Basic …` header — but the Sovereign-side catalyst-api
talks to PDM via the public ingress at https://pool.openova.io which is
gated by Traefik basicAuth Middleware. Every fresh provision attempt
fails at the very first PDM hop with:

    {"detail":"pool-domain-manager is temporarily unreachable: pdm reserve status 401: 401 Unauthorized\n",
     "error":"pdm-unavailable"}

This blocks 100% of fresh otechN provisions on pool-mode Sovereigns.

# Why now

Caught live during DoD A6 verification on otech104. The
`pdm-basicauth` Secret is already provisioned on Sovereigns (per
api-deployment.yaml lines 588-625, the env vars
CATALYST_PDM_BASIC_AUTH_USER / _PASS are wired through Reflector from
contabo). The handler-side `pdmFlipNS` and `pdmCreatePowerDNSZone`
(Day-2 add-domain operations) already use these credentials — but the
core `pdm.Client` used during initial provisioning does not. This is
the asymmetry the fix corrects.

# What changes

* `internal/pdm/client.go` — add a private `do(req)` helper that
  decorates outbound requests with basic auth from Pod env. Replace
  the four direct `c.HTTP.Do(req)` callsites with `c.do(req)`.
  Read every call so a Secret rotation propagates without a Pod
  restart (Reloader handles env reload). When env is unset the
  helper is a no-op — preserving the in-cluster Service path used
  by Catalyst-Zero (contabo) where Traefik basicAuth is not in
  front of the request.
* `internal/pdm/client_test.go` — two new tests:
  - `TestClient_BasicAuth_AppliedFromEnv` — every method (Check /
    Reserve / Commit / Release) carries the expected `Basic …`
    header when env is set.
  - `TestClient_BasicAuth_OmittedWhenEnvUnset` — defensive shape
    for in-cluster Service path.

Per Inviolable Principle #10, the credentials never enter a struct
that gets logged — read-and-set inside `do()` only.

Per Inviolable Principle #4 (never hardcode), the basic-auth shape
mirrors the existing `pdmBasicAuth()` seam in
`handler/parent_domains.go` — same env-var contract, same defensive
"empty creds = skip auth" semantics.

# Verification

`go test ./internal/pdm/...` passes locally.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 11:07:25 +04:00
github-actions[bot]
300c774ff4 deploy: update catalyst images to e08d872 2026-05-05 07:03:01 +00:00
e3mrah
e08d8721e1
fix(pdm/dynadot): pre-register glue records before set_ns (#900) (#906)
Multi-domain Day-2 add-domain on a Sovereign was failing with Dynadot's
"'ns1.<sov>.omani.works' needs to be registered with an ip address
before it can be used" error. Dynadot rejects set_ns whenever the NS
hostnames aren't registered as account-level "host records" first.

This change wires the glue pre-registration into the PDM dynadot
adapter as an optional registrar.GlueRegistrar interface, threads the
Sovereign's load-balancer IPv4 from cloud-init through Flux postBuild
into the chart's `global.sovereignLBIP`, and forwards it via
catalyst-api's pdmFlipNS to PDM's /set-ns endpoint as a new `glueIP`
field. PDM's SetNS handler calls RegisterGlueRecord for each
out-of-bailiwick NS before SetNameservers, with idempotent get_ns →
register_ns / set_ns_ip semantics so retries are free.

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:00:45 +04:00
e3mrah
7658f9d937
fix(catalyst-api): seed sovereign-smtp-credentials Secret on freshly franchised Sovereigns (#883) (#905)
On a freshly franchised Sovereign the console-side magic-link / PIN
email flow fails because there's no SMTP relay reachable in the
cluster. Phase-1 architectural decision (founder-confirmed): the
Sovereign Console relays mail through the mothership Stalwart at
mail.openova.io:587 during initial provisioning. A Sovereign-local
Stalwart-relay is Phase-2 work tracked separately.

This PR teaches the catalyst-api Sovereign provisioner to seed the
catalyst-system/sovereign-smtp-credentials Secret on the new cluster
right after the cloud-init kubeconfig postback lands and BEFORE
runPhase1Watch fires. The bp-catalyst-platform chart's auto-create
step (#901) reads this Secret via Helm `lookup` when rendering the
Sovereign-local catalyst-openova-kc-credentials Secret, so the
chart-rendered bytes carry working SMTP submission credentials and
the auth service's SMTP-PLAIN dial against mail.openova.io:587
succeeds on the first send-pin.

What's seeded:
  Secret catalyst-system/sovereign-smtp-credentials
    smtp-user: <mothership CATALYST_SMTP_USER>
    smtp-pass: <mothership CATALYST_SMTP_PASS>

The mothership catalyst-api Pod already has both env vars wired via
secretKeyRef → catalyst-openova-kc-credentials in the catalyst
namespace (chart api-deployment.yaml.679-740) — no new K8s read
against the mothership API is needed.

Idempotent: an already-existing sovereign-smtp-credentials Secret
short-circuits to AlreadyExists. The helper does NOT update an
existing Secret — operator-supplied bytes take precedence over
mothership re-seed. This survives the kubeconfig PUT retry path,
the kubeconfig-missing relaunch (#538), and operator manual replay
during incident response.

Failure modes are surfaced via the SSE event bus (sovereign-smtp-seed
phase) so the wizard renders the seed outcome inline with helmwatch
events. A failure does NOT abort Phase-1 — the chart's lookup will
not find the Secret, the auth pod will log SMTP-refused on first
send-pin (exactly the pre-fix behaviour), and the operator sees a
loud warn at provision time rather than a silent "ready" with broken
email.

Per docs/INVIOLABLE-PRINCIPLES.md #10 (credential hygiene): the
catalyst-api never logs the SMTP password. Logs include the
deployment id, target namespace + secret name, and byte length —
never the plaintext.

Per #4 (never hardcode): namespace + secret name are fixed-by-chart-
contract (#901); timeout is overridable via
CATALYST_SOVEREIGN_SMTP_SEED_TIMEOUT.

Tests:
  - skipped-no-env outcome when mothership env unset
  - happy path: Secret + Namespace created, data + labels +
    annotations verified
  - already-exists pre-Create: no overwrite of operator bytes
  - race during Create: AlreadyExists treated as success
  - client-build failure: ClientFailure outcome
  - api-failure on Get (non-NotFound): APIFailure outcome
  - emit event matrix: every outcome maps to expected level + substr

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:58:49 +04:00
e3mrah
368545369b
fix(bp-stalwart-tenant): unbootable on fresh tenants — values shape, missing admin Secret, sec ctx (#898) (#904)
Three fixes that left bp-stalwart-tenant 0.1.0 unable to come up on a
freshly-franchised SME tenant. All surfaced on the otech103 alice
tenant during the Phase-1 DoD sweep.

1. Tenant-domain values shape (HelmRelease render error)

   The 0.1.0 chart referenced `.Values.domain.primary` in five
   templates. The live HR on otech103 had `values.domain:
   acme.omani.works` (a string), emitted by a pre-#897 catalyst-api
   build, so every reconcile died with:

     can't evaluate field primary in type interface {}

   Added `bp-stalwart-tenant.tenantDomain` + `tenantMode` helpers
   that resolve in priority order:

     1. `tenant.domain`        (forward-looking flat shape)
     2. `domain.primary`       (canonical post-#897 map shape)
     3. `domain` (string)      (legacy pre-#897 shape — back-compat)

   Returns "" when no shape resolves, keeping smoke renders safe;
   per-template gates skip rendering when the value is empty.

2. Missing stalwart-admin Secret

   deployment.yaml + mailbox-provision-job.yaml reference a Secret
   key `ADMIN_PASSWORD` on `.Values.admin.secretName`. The 0.1.0
   chart only emitted an ExternalSecret, and only when
   `admin.externalSecret.remoteRef.key` was non-empty (smoke-render
   concession). Fresh tenants land in CreateContainerConfigError.

   Added `templates/admin-secret.yaml` mirroring marketplace-api/
   secret.yaml (#887): random 32-char ADMIN_PASSWORD generated by
   sprig randAlphaNum, persisted across reconcile via lookup,
   helm.sh/resource-policy: keep so reinstall picks it back up.
   Auto-disabled when an authoritative ExternalSecret is wired —
   no double-bind between two controllers.

3. Pod sec ctx vs. upstream image's file capabilities

   `getcap docker.io/stalwartlabs/stalwart:v0.16.3 /usr/local/bin/stalwart`
   reports `cap_net_bind_service=ep`. The image creates
   user `stalwart` at UID 2000 and the binary IS the entrypoint
   (no demotion script). The 0.1.0 chart ran as UID 65534 with
   `drop: ALL` — kernel refuses to elevate file caps with empty
   bounding set, so exec failed with `operation not permitted`.

   Aligned to image's native UID 2000, kept `drop: ALL` and added
   `NET_BIND_SERVICE` explicitly. fsGroup 2000 ensures /opt/stalwart
   PVC is writable.

Other:
- Bumped Chart.yaml + blueprint.yaml to 0.1.1 (#817 alignment).
- configSchema in blueprint.yaml now permits the legacy + tenant
  shapes alongside the canonical map.
- mailboxProvisioner.setupJob.enabled defaults to false until the
  canonical stalwart-cli image is published (re-uses upstream
  stalwart container as fallback CLI host).

Acceptance: targeted at otech103 alice tenant
(sme-789ae512-bc0f-467c-a016-001f5496c403) where 0.1.0 reconciliation
fails with the value-shape error and the pod CrashLoops with `exec
... operation not permitted`. Verification on otech103 in #898.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:55:03 +04:00
e3mrah
cab0a30e4a
fix(catalyst): unblock Sovereign Console login on fresh provision (#901) (#903)
Three-bug chain blocked https://console.<sov-fqdn>/login PIN-issue on
every fresh Sovereign with HTTP 503 "CATALYST_OPENOVA_KC_SA_CLIENT_SECRET
not set":

1. catalyst-openova-kc-credentials Secret was hand-rolled on contabo-mkt
   and never provisioned on Sovereign by the chart. NEW
   templates/catalyst-openova-kc-credentials-secret.yaml mirrors the
   canonical KC SA Secret (keycloak/catalyst-kc-sa-credentials, created
   by bp-keycloak's openbao-bridge post-install hook) into
   catalyst-system/catalyst-openova-kc-credentials with the keys
   api-deployment.yaml's PIN-auth env block expects. Same Helm-`lookup`
   persistence + `helm.sh/resource-policy: keep` pattern as
   templates/marketplace-api/secret.yaml (#887).

   Sovereign-vs-contabo gate: render only when `lookup "v1" "Secret"
   "keycloak" "catalyst-kc-sa-credentials"` returns non-nil. On contabo
   that lookup is nil (Catalyst-Zero uses keycloak-zero in its own ns
   with its own hand-rolled Secret); template emits empty bytes, no
   ownership flap. Not added to templates/kustomization.yaml `resources:`
   so Kustomize-mode contabo build skips it entirely.

2. SMTP host default `stalwart-web.stalwart.svc.cluster.local` doesn't
   resolve on Sovereign. Chart now populates smtp-host/smtp-port/smtp-from
   from .Values.sovereign.smtp.* defaulting to mail.openova.io:587 /
   noreply@openova.io. SMTP user/pass mirrored from a SECONDARY lookup
   against catalyst-system/sovereign-smtp-credentials (#883 seam). When
   the source Secret is absent the new Secret renders with empty
   smtp-user/smtp-pass — login surface still works and PIN delivery
   surfaces as a clear "email delivery failed" log line, not as a 503.

3. CATALYST_POST_AUTH_REDIRECT default `/sovereign/wizard` is mothership-
   only. Default flips to `/sovereign/components` (the post-handover
   Sovereign Console homepage). Per-Sovereign overlays override via the
   catalystApi.env additional-env patch — the chart value is a literal
   per the dual-mode contract documented in the CATALYST_POWERDNS_API_URL
   block of api-deployment.yaml.

Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.16 → 1.4.17.

Refs: #901

Signed-off-by: hatice.yildiz <hatice.yildiz@openova.io>
Co-authored-by: hatice.yildiz <hatice.yildiz@openova.io>
2026-05-05 10:54:09 +04:00
e3mrah
93c4b700de
fix(bp-keycloak): templatize existingConfigmap reference for per-tenant installs (#899) (#902)
bp-keycloak 1.3.2 hardcoded `keycloak.keycloakConfigCli.existingConfigmap` to
the literal "keycloak-sovereign-realm-config". This worked for the Sovereign-
mothership bootstrap-kit (releaseName=keycloak emits matching ConfigMap) but
broke for every per-tenant install where releaseName=bp-keycloak emits
"bp-keycloak-sovereign-realm-config" — the post-install keycloak-config-cli
Job stuck in ContainerCreating with `MountVolume.SetUp failed for volume
"config-volume" : configmap "keycloak-sovereign-realm-config" not found`,
HelmRelease InstallFailed after 15m timeout, cascading to bp-openclaw and
bp-wordpress-tenant which dependsOn it.

The bitnami/keycloak subchart's `keycloak.keycloakConfigCli.configmapName`
helper (charts/keycloak/templates/_helpers.tpl) applies `tpl` to the
existingConfigmap value, so embedding `{{ .Release.Name }}` inside the
string resolves at chart-render time. With this single-line change:

  - Sovereign-mothership (releaseName=keycloak) → keycloak-sovereign-realm-config (unchanged)
  - Per-tenant (releaseName=bp-keycloak)        → bp-keycloak-sovereign-realm-config (matches actual emitted ConfigMap)

Verified via helm template both modes — backendRef and config-volume
configMap.name match the actual ConfigMap emitted by
templates/configmap-sovereign-realm.yaml.

Chart bumped 1.3.2 → 1.3.3 + bootstrap-kit slot 09 + blueprint.yaml.

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:49:39 +04:00
github-actions[bot]
febad0249d deploy: update catalyst images to 6b0d6c3 2026-05-05 06:00:29 +00:00
e3mrah
6b0d6c37af
fix(catalyst-api): SME tenant bp-stalwart overlay uses correct domain.{primary,mode} schema (#897)
* fix(bp-catalyst-platform): bump 1.4.15 -> 1.4.16 to republish with #893/#889 catalyst-api image (727fb2f)

* fix(catalyst-api): SME tenant bp-stalwart overlay uses correct domain.{primary,mode} schema

The bp-stalwart-tenant chart values schema is:
  domain:
    primary: <fqdn>
    mode: free-subdomain | byo

But the tenant overlay template emitted a flat scalar:
  domain: <fqdn>

Helm rendered the mailbox-provision-job template and hit:
  template: bp-stalwart-tenant/templates/mailbox-provision-job.yaml:67:
  can't evaluate field primary in type interface {}

Fix: emit the correct nested object with .DomainMode threaded through
from smeTenantTemplateData (already populated by renderSMETenantOverlay).

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:58:11 +04:00
github-actions[bot]
d084cceeba deploy: update catalyst images to 98f5543 2026-05-05 05:54:30 +00:00
e3mrah
98f5543bdc
fix(bp-catalyst-platform): bump 1.4.15 -> 1.4.16 to republish with #893/#889 catalyst-api image (727fb2f) (#896)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:52:30 +04:00
github-actions[bot]
98fc72dfd4 deploy: update catalyst images to 727fb2f 2026-05-05 05:47:47 +00:00
e3mrah
727fb2ffdd
fix(catalyst-api): SME tenant orchestrator emits shared helmrepositories.yaml (#893 follow-up) (#895)
* fix(catalyst-api): SME-tenant orchestrator writes parent kustomization.yaml index (#889)

The Flux Kustomization rendered by bp-catalyst-platform 1.4.13+ at
clusters/<sov-fqdn>/sme-tenants/ requires a parent kustomization.yaml
that enumerates tenant subdirectories. The orchestrator only wrote
per-tenant overlays without the parent index, so on otech103 Flux
hit:

  kustomization path not found: stat /tmp/kustomization-...
  /clusters/otech103.omani.works/sme-tenants: no such file or directory

Even after a tenant signup, the parent path lacked a kustomization.yaml
so Flux couldn't enumerate subdirs.

Fix: NEW writeParentTenantsIndex helper called from both
WriteTenantOverlay and DeleteTenantOverlay. Scans the parent dir for
subdirectories that contain kustomization.yaml, sorts them lexically
for deterministic output (no spurious diffs), and writes a parent
kustomization.yaml listing them under `resources:`. Empty list (no
tenants) renders as `resources: []` — still a valid Kustomization
root, so Flux stays Ready=True after the last tenant teardown.

git add covers both the per-tenant subdir AND the parent index, so a
single commit captures the delta.

Live on otech103 post-cutover, 2026-05-05.

* fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter to include clusters/<sov-fqdn>/ (#891)

After Day-2 cutover, the GitRepository ignore filter excluded the
Sovereign's own clusters/<sov-fqdn>/ subtree. This made every
Sovereign-specific Flux Kustomization (sme-tenants, future per-Sov
overlays) hit "kustomization path not found" because source-controller
filtered the path out of the artifact tarball.

Live on otech103 (2026-05-05): sme-tenants Kustomization stuck for
20+ minutes despite the orchestrator successfully committing the
overlay to local Gitea.

Fix: Step-5 (flux-gitrepository-patch) now writes the patch as a
multi-line YAML strategic-merge file via /tmp emptyDir (since the
Pod runs readOnlyRootFilesystem), composing the new ignore filter:

  /*
  !/clusters/_template
  !/clusters/${SOVEREIGN_FQDN}
  !/platform
  !/products

The SOVEREIGN_FQDN is wired from .Values.sovereign.fqdn (already
established in the chart values).

Bumps chart 0.1.14 -> 0.1.15. Slot 06a pin bumps in lockstep.

* fix(catalyst-api): SME tenant HR templates reference correct per-blueprint HelmRepository names (#893)

Five overlay templates in sme_tenant_gitops.go hardcoded:
  sourceRef:
    name: openova-blueprints

But Sovereign clusters have NO HelmRepository named `openova-blueprints`.
Each blueprint ships its own HelmRepository named after itself:
- bp-keycloak / bp-cnpg / bp-wordpress-tenant / bp-openclaw /
  bp-stalwart-tenant

Live on otech103 (2026-05-05): all 5 tenant bp-* HRs stuck in
"HelmChart not ready: latest generation of object has not been
reconciled" because the HelmRepository didn't exist.

Fix: each template's sourceRef.name now matches the actual
HelmRepository name. Verified live patch works on otech103.

* fix(catalyst-api): SME tenant orchestrator emits shared helmrepositories.yaml at parent level (#893 follow-up)

After #893 fixed the per-tenant HR sourceRef.name to match the actual
HelmRepository name, the HelmRepositories themselves were absent on
Sovereigns: the bootstrap-kit only ships a small canonical set
(bp-cilium, bp-cnpg, bp-keycloak, bp-gitea, ...). The SME tenant
charts (bp-wordpress-tenant, bp-openclaw, bp-stalwart-tenant) and the
vcluster (loft) repo aren't on a Sovereign by default.

Fix: extend writeParentTenantsIndex to ALSO emit a shared
helmrepositories.yaml at clusters/<sov-fqdn>/sme-tenants/
helmrepositories.yaml. The parent kustomization.yaml lists it FIRST
so source-controller reconciles the HelmRepositories before any
tenant HelmChart is requested.

Six HelmRepositories total: bp-keycloak, bp-cnpg, bp-wordpress-tenant,
bp-openclaw, bp-stalwart-tenant (oci://ghcr.io/openova-io), and loft
(https://charts.loft.sh) for the vcluster chart.

Live verification on otech103: applied the four missing repos
(bp-wordpress-tenant, bp-openclaw, bp-stalwart-tenant, loft) and the
tenant HRs progress past SourceNotReady.

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:44:52 +04:00
github-actions[bot]
4a810ddcf7 deploy: update catalyst images to 3eb0cd6 2026-05-05 05:43:58 +00:00
e3mrah
3eb0cd6d0b
fix(catalyst-api): SME tenant HR templates reference correct per-blueprint HelmRepository names (#893) (#894)
* fix(catalyst-api): SME-tenant orchestrator writes parent kustomization.yaml index (#889)

The Flux Kustomization rendered by bp-catalyst-platform 1.4.13+ at
clusters/<sov-fqdn>/sme-tenants/ requires a parent kustomization.yaml
that enumerates tenant subdirectories. The orchestrator only wrote
per-tenant overlays without the parent index, so on otech103 Flux
hit:

  kustomization path not found: stat /tmp/kustomization-...
  /clusters/otech103.omani.works/sme-tenants: no such file or directory

Even after a tenant signup, the parent path lacked a kustomization.yaml
so Flux couldn't enumerate subdirs.

Fix: NEW writeParentTenantsIndex helper called from both
WriteTenantOverlay and DeleteTenantOverlay. Scans the parent dir for
subdirectories that contain kustomization.yaml, sorts them lexically
for deterministic output (no spurious diffs), and writes a parent
kustomization.yaml listing them under `resources:`. Empty list (no
tenants) renders as `resources: []` — still a valid Kustomization
root, so Flux stays Ready=True after the last tenant teardown.

git add covers both the per-tenant subdir AND the parent index, so a
single commit captures the delta.

Live on otech103 post-cutover, 2026-05-05.

* fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter to include clusters/<sov-fqdn>/ (#891)

After Day-2 cutover, the GitRepository ignore filter excluded the
Sovereign's own clusters/<sov-fqdn>/ subtree. This made every
Sovereign-specific Flux Kustomization (sme-tenants, future per-Sov
overlays) hit "kustomization path not found" because source-controller
filtered the path out of the artifact tarball.

Live on otech103 (2026-05-05): sme-tenants Kustomization stuck for
20+ minutes despite the orchestrator successfully committing the
overlay to local Gitea.

Fix: Step-5 (flux-gitrepository-patch) now writes the patch as a
multi-line YAML strategic-merge file via /tmp emptyDir (since the
Pod runs readOnlyRootFilesystem), composing the new ignore filter:

  /*
  !/clusters/_template
  !/clusters/${SOVEREIGN_FQDN}
  !/platform
  !/products

The SOVEREIGN_FQDN is wired from .Values.sovereign.fqdn (already
established in the chart values).

Bumps chart 0.1.14 -> 0.1.15. Slot 06a pin bumps in lockstep.

* fix(catalyst-api): SME tenant HR templates reference correct per-blueprint HelmRepository names (#893)

Five overlay templates in sme_tenant_gitops.go hardcoded:
  sourceRef:
    name: openova-blueprints

But Sovereign clusters have NO HelmRepository named `openova-blueprints`.
Each blueprint ships its own HelmRepository named after itself:
- bp-keycloak / bp-cnpg / bp-wordpress-tenant / bp-openclaw /
  bp-stalwart-tenant

Live on otech103 (2026-05-05): all 5 tenant bp-* HRs stuck in
"HelmChart not ready: latest generation of object has not been
reconciled" because the HelmRepository didn't exist.

Fix: each template's sourceRef.name now matches the actual
HelmRepository name. Verified via a live patch on otech103.
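After the fix, each tenant HelmRelease's chart source reference takes this shape (a sketch; the kind/namespace fields are assumed from standard Flux conventions, and bp-wordpress-tenant stands in for any of the five blueprints):

```yaml
spec:
  chart:
    spec:
      chart: bp-wordpress-tenant
      sourceRef:
        kind: HelmRepository
        name: bp-wordpress-tenant   # previously the nonexistent openova-blueprints
        namespace: flux-system
```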

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:41:47 +04:00
e3mrah
eddf0e62a4
fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter (#891) (#892)
* fix(catalyst-api): SME-tenant orchestrator writes parent kustomization.yaml index (#889)

The Flux Kustomization rendered by bp-catalyst-platform 1.4.13+ at
clusters/<sov-fqdn>/sme-tenants/ requires a parent kustomization.yaml
that enumerates tenant subdirectories. The orchestrator only wrote
per-tenant overlays without the parent index, so on otech103 Flux
hit:

  kustomization path not found: stat /tmp/kustomization-...
  /clusters/otech103.omani.works/sme-tenants: no such file or directory

Even after a tenant signup, the parent path lacked a kustomization.yaml
so Flux couldn't enumerate subdirs.

Fix: NEW writeParentTenantsIndex helper called from both
WriteTenantOverlay and DeleteTenantOverlay. Scans the parent dir for
subdirectories that contain kustomization.yaml, sorts them lexically
for deterministic output (no spurious diffs), and writes a parent
kustomization.yaml listing them under `resources:`. Empty list (no
tenants) renders as `resources: []` — still a valid Kustomization
root, so Flux stays Ready=True after the last tenant teardown.

git add covers both the per-tenant subdir AND the parent index, so a
single commit captures the delta.

Live on otech103 post-cutover, 2026-05-05.

* fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter to include clusters/<sov-fqdn>/ (#891)

After Day-2 cutover, the GitRepository ignore filter excluded the
Sovereign's own clusters/<sov-fqdn>/ subtree. This made every
Sovereign-specific Flux Kustomization (sme-tenants, future per-Sov
overlays) hit "kustomization path not found" because source-controller
filtered the path out of the artifact tarball.

Live on otech103 (2026-05-05): sme-tenants Kustomization stuck for
20+ minutes despite the orchestrator successfully committing the
overlay to local Gitea.

Fix: Step-5 (flux-gitrepository-patch) now writes the patch as a
multi-line YAML strategic-merge file via /tmp emptyDir (since the
Pod runs readOnlyRootFilesystem), composing the new ignore filter:

  /*
  !/clusters/_template
  !/clusters/${SOVEREIGN_FQDN}
  !/platform
  !/products

The SOVEREIGN_FQDN is wired from .Values.sovereign.fqdn (already
established in the chart values).

Bumps chart 0.1.14 -> 0.1.15. Slot 06a pin bumps in lockstep.
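As a strategic-merge patch file, the widened filter could be applied roughly like this (a sketch; the flux-system/openova GitRepository name follows the convention mentioned elsewhere in this log, and the FQDN is shown already substituted):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: openova
  namespace: flux-system
spec:
  ignore: |
    /*
    !/clusters/_template
    !/clusters/otech103.omani.works
    !/platform
    !/products
```

Applied with something like `kubectl patch gitrepository openova -n flux-system --type merge --patch-file /tmp/ignore-patch.yaml`.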

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:39:42 +04:00
github-actions[bot]
c2ff6da073 deploy: update catalyst images to a9f0626 2026-05-05 05:31:48 +00:00
e3mrah
a9f06265fb
fix(catalyst-api): SME-tenant orchestrator writes parent kustomization.yaml index (#889) (#890)
The Flux Kustomization rendered by bp-catalyst-platform 1.4.13+ at
clusters/<sov-fqdn>/sme-tenants/ requires a parent kustomization.yaml
that enumerates tenant subdirectories. The orchestrator only wrote
per-tenant overlays without the parent index, so on otech103 Flux
hit:

  kustomization path not found: stat /tmp/kustomization-...
  /clusters/otech103.omani.works/sme-tenants: no such file or directory

Even after a tenant signup, the parent path lacked a kustomization.yaml
so Flux couldn't enumerate subdirs.

Fix: NEW writeParentTenantsIndex helper called from both
WriteTenantOverlay and DeleteTenantOverlay. Scans the parent dir for
subdirectories that contain kustomization.yaml, sorts them lexically
for deterministic output (no spurious diffs), and writes a parent
kustomization.yaml listing them under `resources:`. Empty list (no
tenants) renders as `resources: []` — still a valid Kustomization
root, so Flux stays Ready=True after the last tenant teardown.

git add covers both the per-tenant subdir AND the parent index, so a
single commit captures the delta.

Live on otech103 post-cutover, 2026-05-05.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:29:44 +04:00
github-actions[bot]
654ac4fb5e deploy: update catalyst images to 3726176 2026-05-05 05:28:33 +00:00
e3mrah
3726176e19
fix(bp-catalyst-platform): auto-provision marketplace-api-secrets on Sovereign install (#887) (#888)
* fix(bp-catalyst-platform): bump 1.4.13 -> 1.4.14 to republish with #879 catalyst-api image (7bfd6df)

Chart 1.4.13 was published from commit 7bfd6df5 (the #879 fix) BEFORE the
deploy-bot updated values.yaml's catalystApi.tag from aa226df -> 7bfd6df,
so 1.4.13 OCI bytes still reference the OLD catalyst-api image without
the pdmFlipNS basic-auth + nameservers + lookup-primary-domain
SOVEREIGN_FQDN-fallback fixes.

Same deploy-step race already documented in 1.4.6 / 1.4.9 / 1.4.12
changelog entries — catalyst-build CI doesn't yet auto-bump chart patch
+ dispatch blueprint-release the way services-build does (per #874), so
this manual republish is required after every catalyst-api image change.

No template/code changes — pure version bump to roll a fresh OCI artifact
whose values.yaml references catalystApi.tag=7bfd6df. Lockstep slot 13
pin bumps to 1.4.14.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bp-catalyst-platform): auto-provision marketplace-api-secrets on Sovereign install (#887)

templates/marketplace-api/deployment.yaml referenced a secretKeyRef on
`marketplace-api-secrets` (key: `jwt-secret`) but the chart never rendered
the Secret. On contabo-mkt this is hand-rolled; on a freshly franchised
Sovereign with ingress.marketplace.enabled=true the marketplace-api Pod
hit CreateContainerConfigError on every reconcile.

Fix: NEW templates/marketplace-api/secret.yaml uses Helm `lookup` to
persist a 64-char randAlphaNum jwt-secret across reconciles (same
load-bearing pattern as sme-secrets, valkey-cross-ns-secret,
provisioning-github-token, gitea-admin-secret per
feedback_passwords.md). Without lookup every reconcile would invalidate
every active marketplace JWT.

helm.sh/resource-policy: keep so the Secret survives helm uninstall.
Lockstep slot 13 pin bumps 1.4.14 -> 1.4.15.

Caught live on otech103 post-cutover, 2026-05-05.
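The lookup-persist pattern described above renders roughly like this (a sketch of the pattern only, not the chart's actual template):

```yaml
{{- $existing := lookup "v1" "Secret" .Release.Namespace "marketplace-api-secrets" -}}
apiVersion: v1
kind: Secret
metadata:
  name: marketplace-api-secrets
  annotations:
    helm.sh/resource-policy: keep   # survive helm uninstall
type: Opaque
data:
  {{- if $existing }}
  jwt-secret: {{ index $existing.data "jwt-secret" }}   # preserve across reconciles
  {{- else }}
  jwt-secret: {{ randAlphaNum 64 | b64enc }}            # first install only
  {{- end }}
```

Without the `lookup` branch, every helm-controller reconcile would regenerate the value and invalidate every active marketplace JWT.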

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 09:26:23 +04:00
github-actions[bot]
87e090dd0c deploy: update catalyst images to 213039d 2026-05-05 05:12:35 +00:00
e3mrah
213039dc31
fix(bp-catalyst-platform): bump 1.4.13 -> 1.4.14 to republish with #879 catalyst-api image (7bfd6df) (#886)
Chart 1.4.13 was published from commit 7bfd6df5 (the #879 fix) BEFORE the
deploy-bot updated values.yaml's catalystApi.tag from aa226df -> 7bfd6df,
so 1.4.13 OCI bytes still reference the OLD catalyst-api image without
the pdmFlipNS basic-auth + nameservers + lookup-primary-domain
SOVEREIGN_FQDN-fallback fixes.

Same deploy-step race already documented in 1.4.6 / 1.4.9 / 1.4.12
changelog entries — catalyst-build CI doesn't yet auto-bump chart patch
+ dispatch blueprint-release the way services-build does (per #874), so
this manual republish is required after every catalyst-api image change.

No template/code changes — pure version bump to roll a fresh OCI artifact
whose values.yaml references catalystApi.tag=7bfd6df. Lockstep slot 13
pin bumps to 1.4.14.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 09:10:37 +04:00
e3mrah
4120e4ed9d
fix(bp-catalyst-platform): Flux Kustomization watching SME tenant overlays (#882) (#885)
The catalyst-api SME-tenant pipeline's GitOps writer
(sme_tenant_gitops.go::WriteTenantOverlay) commits per-tenant Kustomize
overlays to clusters/<sov-fqdn>/sme-tenants/<tenant-id>/ on every
successful POST /api/v1/sme/tenants — but no Flux Kustomization on the
Sovereign cluster watched that path.

The state machine (sme_tenant.go) advanced optimistically through every
step (vcluster -> bp_charts -> dns -> certs -> keycloak_clients ->
registry) and reported state=done, while no actual K8s resources
materialised because nothing was reconciling the orchestrator's write
target.

Verified live on otech103 (2026-05-04 23:18 Berlin): the orchestrator
successfully committed the 9-file overlay for tenant 15f1e45e-... to
the local Gitea openova/openova repo @main, but `kubectl get hr -n
sme-15f1e45e-...` returned No resources found indefinitely.

Fix:
- NEW templates/sme-services/sme-tenants-kustomization.yaml renders
  one Flux Kustomization in flux-system that sweeps the entire
  ./clusters/<global.sovereignFQDN>/sme-tenants directory tree.
- sourceRef: flux-system/openova GitRepository (the same one the
  cluster bootstraps from; cutover Step 5 flips its .spec.url to the
  local in-cluster Gitea, which is precisely where sme_tenant_gitops.go
  pushes via CATALYST_GITOPS_REPO_URL).
- interval=1m (matches the orchestrator's documented "Flux reconciles
  within ~1 min" SLA), prune=true (DELETE /api/v1/sme/tenants/<id>
  removes the overlay; Flux GCs the resources), wait=false (per-tenant
  overlays each install ~5 bp-* HRs asynchronously and have their own
  readiness watcher in the orchestrator; blocking this top-level
  Kustomization on every tenant's full readiness would let one stuck
  tenant gate every other tenant).
- Gated on .Values.ingress.marketplace.enabled — non-marketplace
  Sovereigns don't run the SME tenant pipeline.
- Per Inviolable Principle #4, every knob is operator-overridable
  via .Values.smeTenants.kustomization.* (sourceRef name/namespace,
  interval, retryInterval, timeout, prune, wait).

Lockstep slot 13 pin in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
bumps from 1.4.12 -> 1.4.13.
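The rendered Kustomization would look roughly like this (a sketch with the defaults listed above; the object name and the already-substituted FQDN are illustrative):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sme-tenants
  namespace: flux-system
spec:
  interval: 1m          # matches the orchestrator's ~1 min reconcile SLA
  path: ./clusters/otech103.omani.works/sme-tenants
  prune: true           # tenant DELETE removes the overlay; Flux GCs resources
  wait: false           # per-tenant readiness is watched by the orchestrator
  sourceRef:
    kind: GitRepository
    name: openova
    namespace: flux-system
```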

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 09:09:00 +04:00
github-actions[bot]
be54707bfb deploy: update catalyst images to 7bfd6df 2026-05-05 05:04:30 +00:00
e3mrah
7bfd6df588
fix(catalyst-api,bp-catalyst-platform,infra): unblock multi-domain Day-2 add-domain flow on Sovereigns (#879) (#884)
5 stacked wiring bugs blocked the Day-2 add-parent-domain happy path on a
fresh post-handover Sovereign — surfaced live on otech103, 2026-05-05 — plus
a 6th gap (ghcr-pull reflector for catalyst-system). All six fixed in one PR
so a single chart bump + cloud-init re-render closes the gap end-to-end.

Bug 1 (chart, api-deployment.yaml): wire POOL_DOMAIN_MANAGER_URL=
https://pool.openova.io. The in-cluster Service default only resolves on
contabo; on Sovereigns every Day-2 POST died with NXDOMAIN.

Bug 2 (chart + code): wire CATALYST_PDM_BASIC_AUTH_USER / _PASS env from a
new pdm-basicauth Secret, and have pdmFlipNS SetBasicAuth from those envs.
The PDM public ingress at pool.openova.io is gated by Traefik basicAuth;
calls without Authorization: Basic returned 401. optional=true so contabo
+ CI + older Sovereigns degrade to a clear 401 log line. Per Inviolable
Principle #10, the credentials only ever live in Pod env + are read once
per call by pdmFlipNS — never enter a logged struct or persisted record.
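Wired as optional secretKeyRef envs, the degrade-to-401 path looks like this (a sketch; the Secret key names are assumptions):

```yaml
env:
  - name: CATALYST_PDM_BASIC_AUTH_USER
    valueFrom:
      secretKeyRef:
        name: pdm-basicauth
        key: user
        optional: true   # contabo / CI / older Sovereigns start without the Secret
  - name: CATALYST_PDM_BASIC_AUTH_PASS
    valueFrom:
      secretKeyRef:
        name: pdm-basicauth
        key: password
        optional: true
```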

Bug 3 (code, parent_domains.go): pdmFlipNS body now includes the required
nameservers field (computed from expectedNSFor). PDM's SetNSRequest schema
requires it; the previous body got 422 missing-nameservers.

Bug 4 (code, parent_domains.go): lookupPrimaryDomain falls back to
SOVEREIGN_FQDN env after CATALYST_PRIMARY_DOMAIN. On a post-handover
Sovereign no Deployment record is persisted, so without this fallback GET
/parent-domains returned {"items":[]} and the propagation panel showed
expectedNs:null. SOVEREIGN_FQDN is already wired by api-deployment.yaml
from the sovereign-fqdn ConfigMap.

Bug 5 (chart, httproute.yaml): catalyst-ui /auth/* PathPrefix narrowed to
Exact /auth/handover. The previous PathPrefix collided with OIDC PKCE
redirect_uri /auth/callback — catalyst-api 404s on that path because it
only registers /api/v1/auth/callback, breaking login post-handover-JWT-
cookie expiry. Exact match keeps /auth/handover routed to catalyst-api
while every other /auth/* path falls through to catalyst-ui's React
Router for client-side OIDC.
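The narrowed HTTPRoute rule takes this shape (a sketch; the backend Service name and port are assumptions):

```yaml
rules:
  - matches:
      - path:
          type: Exact            # was: PathPrefix, which swallowed /auth/callback
          value: /auth/handover
    backendRefs:
      - name: catalyst-api
        port: 8080
```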

Bug 6 (cloud-init): ghcr-pull + harbor-robot-token + new pdm-basicauth
Reflector annotations enumerate explicit allowed/auto-namespaces (sme,
catalyst, catalyst-system, gitea, harbor) instead of empty-string. The
ambiguous empty-string interpretation caused otech103 to require a manual
catalyst-system mirror creation; explicit list back-ports the verified
working state.

Provisioner wiring: Request.PDMBasicAuthUser/Pass + Provisioner fields
+ tfvars emission so the contabo catalyst-api can stamp the credentials
onto every Sovereign provision request. variables.tf adds matching
pdm_basic_auth_user / pdm_basic_auth_pass tofu vars (sensitive, default
empty) so older provisioner builds that pre-date this change keep
rendering valid cloud-init (the Secret renders with empty values and
Pod start is unaffected).

Chart bumped 1.4.11 -> 1.4.12, lockstep slot 13 pin updated. Closes
the architectural blockers tracked in #879; the catalyst-api image
rebuild + chart republish run via the existing CI pipelines (services-
build.yaml + blueprint-release.yaml) on this commit's SHA.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 09:02:39 +04:00
github-actions[bot]
2bcff5b43b deploy: update catalyst images to aa226df 2026-05-05 04:52:11 +00:00
e3mrah
aa226df757
fix(bp-catalyst-platform): bump 1.4.11 -> 1.4.12 to republish with current catalyst-api image (#878 follow-up) (#881)
Same deploy-step race as #871 (chart 1.4.9): chart 1.4.11 was
published from commit 7bdd14fc BEFORE the deploy-bot updated
values.yaml's catalystApi.tag from 20413ec -> 7bdd14f. The OCI
artifact for 1.4.11 still bakes in the OLD image SHA without the
git binary, so otech103 reconciles 1.4.11 and the catalyst-api Pod
runs an image that still fails the SME tenant pipeline at git clone.

Long-term fix is the catalyst-build equivalent of #874 (auto-bump
chart patch on Catalyst-API image rebuild). Short-term: this manual
bump.

No template change. Lockstep slot 13 pin bumps to 1.4.12.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:50:06 +04:00
github-actions[bot]
1d7023d7c0 deploy: update catalyst images to 7bdd14f 2026-05-05 04:47:59 +00:00
e3mrah
7bdd14fcb1
fix(catalyst-api,bp-catalyst-platform): SME tenant gitops auth + git binary (#878) (#880)
Three-part fix that unblocks the SME tenant pipeline post-Day-2-
Independence cutover. Live-reproduced on otech103 — POST /api/v1/sme/
tenants succeeds (HTTP 202) but the first reconcile fails with
"gitops token unconfigured" → after wiring the env, fails with
`exec: "git": executable file not found in $PATH` → after fixing
the URL hardcoding, would still 401 against local Gitea because
the basic-auth username is hardcoded "x-access-token".

Part A — code (marketplace_settings.go + sme_tenant_gitops.go):
- Add gitOpsConfig.User (loaded from CATALYST_GITOPS_USER env,
  default "x-access-token" for back-compat with GitHub PATs).
- New injectTokenIntoURLWithUser(rawURL, user, token) — variant of
  injectTokenIntoURL that takes a configurable basic-auth username.
- Update all 3 call sites in marketplace_settings.go +
  sme_tenant_gitops.go to use the new variant with cfg.User.
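A minimal sketch of what the new variant does (the function body here is illustrative, not the actual catalyst-api implementation):

```go
package main

import "net/url"

// injectTokenIntoURLWithUser embeds basic-auth credentials into a
// clone URL, with a configurable username instead of the hardcoded
// "x-access-token" that only local Gitea rejects.
func injectTokenIntoURLWithUser(rawURL, user, token string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	u.User = url.UserPassword(user, token)
	return u.String(), nil
}
```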

Part B — Containerfile:
- apk add git in the runtime stage. The SME tenant pipeline (#804)
  and marketplace-settings GitOps writer both shell out to git
  clone/commit/push; without the binary every first reconcile fails.

Part C — chart (api-deployment.yaml):
- Wire CATALYST_GITOPS_USER + CATALYST_GITOPS_TOKEN envs on
  catalyst-api Deployment, sourced from the local `gitea-admin-secret`
  (already mirrored into catalyst-system via bp-reflector annotation
  per #866). optional=true so Catalyst-Zero (contabo) keeps using
  its existing GitHub PAT path.

Bump bp-catalyst-platform 1.4.10 -> 1.4.11 + lockstep slot 13 pin.

Closes #878

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:45:45 +04:00
e3mrah
8e4c88fd28
fix(bp-self-sovereign-cutover): auto-sync local Gitea mirror from upstream GitHub (#870) (#875)
Step-1 gitea-mirror Job replaces the legacy one-shot create-empty-repo +
git-push pattern with a single call to Gitea's native /repos/migrate API
with mirror=true and mirror_interval=10m0s. Gitea now polls the upstream
openova-io/openova repo on a 10-minute interval and replicates branches
+ tags into the local Sovereign Gitea automatically.

Closes the "Sovereign drifts from upstream main forever after Day-2
cutover" bug — hit twice during the otech103 2026-05-04 overnight DoD
session, requiring manual `git fetch` inside the Gitea pod for every
chart rollout.

Why /repos/migrate over the previous git push approach:
- Gitea cannot convert a regular repo into a pull-mirror after creation
  (the mirror flag is set at create-time only). The migrate endpoint
  creates the repo AS a mirror in one shot.
- The migrate endpoint accepts toggles for issues / pull-requests /
  wiki / labels / milestones / releases — we set them all to false so
  Gitea only replicates branches+tags, the only refs the Sovereign's
  Flux GitRepository needs.
- Recurring sync is a Gitea-native capability; using it avoids a
  parallel CronJob (which would violate the "event-driven not cron"
  inviolable principle) or a long-poll sidecar (which would duplicate
  what Gitea already does).
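Per the description above, the migrate request body would look roughly like this (field names from Gitea's /repos/migrate schema; the clone address and owner values are illustrative):

```json
{
  "clone_addr": "https://github.com/openova-io/openova.git",
  "repo_owner": "openova",
  "repo_name": "openova",
  "mirror": true,
  "mirror_interval": "10m0s",
  "issues": false,
  "pull_requests": false,
  "wiki": false,
  "labels": false,
  "milestones": false,
  "releases": false
}
```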

Idempotency: if the repo already exists from a prior cutover attempt,
the script PATCHes mirror_interval to the desired value and POSTs to
/mirror-sync to trigger an immediate refresh. Note that PATCH alone
cannot convert a legacy non-mirror repo to a mirror — Sovereigns
seeded by chart < 0.1.14 would need an operator-driven repo delete +
re-migrate to retrofit auto-sync, but new provisions take the
migrate path automatically.

Verification on the rendered ConfigMap:
  $ helm template smoke .                   # renders 16 docs cleanly
  $ bash tests/cutover-contract.sh          # all 7 gates green
  $ sh -n <rendered-script>                 # POSIX shell syntax OK

Chart bumped 0.1.13 → 0.1.14 (Chart.yaml + blueprint.yaml spec.version
aligned per #817 invariant + slot 06a-bp-self-sovereign-cutover.yaml
pin lockstep).

Refs #870, #790.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:35:40 +04:00
e3mrah
5a8210856f
fix(bp-catalyst-platform): wire CATALYST_OTECH_FQDN env on catalyst-api Deployment (#876) (#877)
The SME tenant create handler (sme_tenant.go:481) and the parent-
domain pool seed (sovereign_parent_domains.go:45) both read the
CATALYST_OTECH_FQDN env. The chart only wired SOVEREIGN_FQDN (same
value semantically — the Sovereign's public FQDN — but a different
env name). Without CATALYST_OTECH_FQDN, POST /api/v1/sme/tenants
returns 503 {"error":"otech-fqdn-unconfigured"} on every Sovereign,
and the SME-pool fallback path returns an empty list.

Fix: add a CATALYST_OTECH_FQDN env entry on the catalyst-api
Deployment, sourced from the same `sovereign-fqdn` ConfigMap (key
`fqdn`) that feeds SOVEREIGN_FQDN. optional=true since Catalyst-Zero
(contabo) doesn't run the SME tenant pipeline. The two env names
exist for historical reasons (Phase-8b handover vs SME-tier tenant
pipeline #804); they ultimately point at the same value.
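The added env entry looks like this (a sketch; only the ConfigMap name and key are stated in the commit):

```yaml
- name: CATALYST_OTECH_FQDN
  valueFrom:
    configMapKeyRef:
      name: sovereign-fqdn
      key: fqdn
      optional: true   # absent on Catalyst-Zero (contabo)
```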

Bump bp-catalyst-platform 1.4.9 -> 1.4.10 + lockstep slot 13 pin.

Closes #876

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:35:27 +04:00
e3mrah
db332f6767
fix(ci): services-build auto-bumps chart patch + dispatches blueprint-release (#874)
* fix(bp-catalyst-platform): bump 1.4.8 -> 1.4.9 to republish with current services-auth image (#871)

Chart 1.4.8 was published from commit 95a06f56 BEFORE the deploy-bot
updated templates/sme-services/auth.yaml's image pin from
services-auth:fa4395f -> services-auth:95a06f5 (which has the
/auth/send-pin alias from PR #869). The blueprint-release workflow
fired on 95a06f56 only, so the OCI artifact for 1.4.8 was published
with the OLD image SHA in chart bytes. otech103 reconciled 1.4.8 and
rendered the auth Deployment with the OLD image -> /auth/send-pin
returns 404 -> SME marketplace signup blocked.

Same deploy-step race documented in feedback_idempotent_iac_purge.md
and the overnight DoD bookmark. Long-term fix is a double-bump
sequencing PR (file separately); short-term fix is bumping the chart
version so blueprint-release republishes the artifact with the
current image pin.

No template change. Lockstep slot 13 pin in
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml bumps
from 1.4.8 -> 1.4.9.

Closes #871

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): services-build deploy auto-bumps chart patch + dispatches blueprint-release (#872)

Eliminate the recurring race between services-build's deploy commit
and blueprint-release's path-trigger on chart-version-bumping PRs.

Before: a PR bumping `products/catalyst/chart/Chart.yaml` AND touching
`core/services/**` triggered both workflows on the same merge SHA in
parallel. blueprint-release packaged the chart at the merge commit
(which still held the OLD image SHAs) and published the bumped
chart version with stale image refs. services-build's deploy commit
landed AFTER, but per GitHub Actions design GITHUB_TOKEN-authored
pushes do NOT re-trigger workflows, so blueprint-release never fired
again on the corrected chart. A manual no-op chart bump PR was the
only way to republish (PR #865 chasing PR #864 was the live incident).

After: services-build's deploy step
  1. sed-rewrites image: lines under products/catalyst/chart/templates/sme-services/*.yaml (unchanged)
  2. Pure-bash semver patch-bumps Chart.yaml `version:` and `appVersion:` atomically
  3. Single commit captures both rewrites
  4. Explicit `gh workflow run blueprint-release.yaml -f blueprint=catalyst -f tree=products` dispatches the chart publish (matches catalyst-build's PR #720 pattern)
  5. Idempotent push retry re-reads origin/main and bumps from THAT version on conflict, so concurrent CI runs produce strictly increasing patch versions instead of clobbering each other
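Step 2's pure-bash semver patch bump can be sketched like this (illustrative, not the actual workflow step — it assumes a plain MAJOR.MINOR.PATCH version string):

```shell
# Split MAJOR.MINOR.PATCH with parameter expansion and increment the
# patch component arithmetically, with no external tools.
bump_patch() {
  ver="$1"
  major="${ver%%.*}"   # text before the first dot
  rest="${ver#*.}"     # text after the first dot
  minor="${rest%%.*}"
  patch="${rest#*.}"
  echo "${major}.${minor}.$((patch + 1))"
}
```

The result would then be sed-rewritten into Chart.yaml's `version:` and `appVersion:` lines in the same commit as the image-pin rewrite.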

Adds `actions: write` to the deploy job permissions so the
gh workflow run dispatch doesn't return HTTP 403.

The manual chart-version field in author PRs becomes a floor; CI
auto-bumps from there. PR authors should NOT bump the patch
themselves any more — the deploy step does it. Major/minor bumps
remain the author's call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:32:34 +04:00
github-actions[bot]
8e8bb642aa deploy: update catalyst images to 20413ec 2026-05-05 04:31:32 +00:00
e3mrah
20413ecc14
fix(bp-catalyst-platform): bump 1.4.8 -> 1.4.9 to republish with current services-auth image (#871) (#873)
Chart 1.4.8 was published from commit 95a06f56 BEFORE the deploy-bot
updated templates/sme-services/auth.yaml's image pin from
services-auth:fa4395f -> services-auth:95a06f5 (which has the
/auth/send-pin alias from PR #869). The blueprint-release workflow
fired on 95a06f56 only, so the OCI artifact for 1.4.8 was published
with the OLD image SHA in chart bytes. otech103 reconciled 1.4.8 and
rendered the auth Deployment with the OLD image -> /auth/send-pin
returns 404 -> SME marketplace signup blocked.

Same deploy-step race documented in feedback_idempotent_iac_purge.md
and the overnight DoD bookmark. Long-term fix is a double-bump
sequencing PR (file separately); short-term fix is bumping the chart
version so blueprint-release republishes the artifact with the
current image pin.

No template change. Lockstep slot 13 pin in
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml bumps
from 1.4.8 -> 1.4.9.

Closes #871

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:29:37 +04:00
github-actions[bot]
43a31f680c deploy: update sme service images to 95a06f5 2026-05-05 04:23:28 +00:00
e3mrah
95a06f56f8
fix(sme-marketplace): unblock PIN signin — route /api/* to sme/gateway + add send-pin alias (#868) (#869)
Two-part fix for marketplace UI signin flow which 503'd then 404'd on
otech103. Live debugging found two stacked bugs.

Part A — chart (HTTPRoute backend):
- marketplace-routes.yaml: /api/* rule now backendRefs sme/gateway:8080
  (cross-namespace) instead of catalyst-system/marketplace-api which had
  a Service selector matching zero Pods. The gateway in sme already
  fronts services-auth, catalog, tenant, billing, provisioning.
- marketplace-reference-grant.yaml: extend `to:` list with the gateway
  Service so the cross-ns hop is authorised by Gateway API.
- Bump bp-catalyst-platform 1.4.7 → 1.4.8 + lockstep slot 13 pin.

Part B — services-auth (route name):
- Add /auth/send-pin alias delegating to existing SendMagicLink handler,
  and /auth/verify-pin alias delegating to VerifyMagicLink. The
  marketplace UI surfaces a 6-digit PIN ("Send PIN" button), so the
  PIN-named routes are the canonical UX-facing names. /auth/magic-link
  and /auth/verify remain registered for backward compat.
- services-build workflow auto-rebuilds the auth image on push to
  core/services/** — no manual dispatch needed.

Refs: #868

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 08:22:17 +04:00
github-actions[bot]
b42a61f883 deploy: update catalyst images to 3bfc97d 2026-05-05 02:28:04 +00:00
e3mrah
3bfc97dcea
feat(bp-catalyst-platform): provision provisioning-github-token Secret on Sovereign install (#866) (#867)
After #859 + #861 + #863 cleared 12/13 SME pods on otech103, the
provisioning Deployment stayed in CreateContainerConfigError waiting
on `secret/provisioning-github-token` (key GITHUB_TOKEN) which exists
on contabo-mkt as a hand-rolled SealedSecret but had no Sovereign-side
equivalent. Without this Secret the Pod can't even start.

Fix (issue #866 Option C — local-Gitea target):
Post-cutover the canonical Git target on a Sovereign IS the local
Gitea instance (the GitRepository CRs already point there). New
template templates/sme-services/provisioning-github-token.yaml uses
Helm `lookup` to read the auto-generated gitea admin password from
`gitea/gitea-admin-secret` and re-emit it as
`sme/provisioning-github-token` under the GITHUB_TOKEN key. Same
lookup-and-mirror pattern as valkey-cross-ns-secret.yaml (#863) and
sme-secrets.yaml (#859). bp-gitea (slot 10) reaches Ready before
bp-catalyst-platform (slot 13) so the lookup has data by the time
this template renders.

values.yaml — new `smeServices.provisioning.gitToken.*` block
(sourceNamespace / sourceSecretName / sourcePasswordKey /
destNamespace / destSecretName / destKey) so per-Sovereign overlays
pointing the provisioning service at a non-Gitea Git host (e.g. a
GitHub PAT via OpenBao + ExternalSecret) can swap the source ref
without forking the chart (Inviolable Principle #4).

Out of scope: full Gitea REST-API target support in
core/services/provisioning/github/client.go (which hardcodes
https://api.github.com today) is a follow-up Go change.

Chart 1.4.6 → 1.4.7. Slot 13 pin bumped in lockstep.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:26:03 +04:00
github-actions[bot]
348b70a7d9 deploy: update catalyst images to b0debf9 2026-05-05 02:18:30 +00:00
e3mrah
b0debf93a6
fix(bp-catalyst-platform): bump 1.4.5 -> 1.4.6 to bundle rebuilt SME images (#863) (#865)
Chart 1.4.5 was published at commit fa4395fa BEFORE the services-build
deploy step committed 9731701c updating auth.yaml + gateway.yaml `image:`
lines to fa4395f. Result: Sovereigns pulling 1.4.5 got the OLD image
(5cdb738) without the ConnectValkeyWithAuth Go change — VALKEY_PASSWORD
env was wired but the binary ignored it and still failed with "NOAUTH
HELLO" on connect.

Same race documented in 1.1.16 changelog (catalyst-ui base:/ fix).

No template/code changes — pure version bump to roll a fresh OCI
artifact whose `helm template` output references the rebuilt image.

Slot 13 pin lockstep 1.4.5 -> 1.4.6.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:16:27 +04:00
github-actions[bot]
9731701c56 deploy: update sme service images to fa4395f 2026-05-05 02:10:45 +00:00
e3mrah
fa4395fa3a
fix(bp-catalyst-platform): wire VALKEY_PASSWORD into SME auth + gateway (#863) (#864)
After PR #862 (1.4.4) made cross-ns Valkey reachable from `sme` ns, the
auth Pod started CrashLoopBackOff with "NOAUTH HELLO must be called with
the client already authenticated". Root cause: bp-valkey 1.0.0 ships
auth.enabled=true (bitnami default) but SME service code + Deployment
templates never plumbed a password through.

Chart 1.4.4 -> 1.4.5. Slot 13 pin lockstep.

Changes:
- core/services/shared/db/valkey.go: add ConnectValkeyWithAuth overload
  taking username + password. ConnectValkey kept backwards-compatible
  for contabo-mkt's auth-less in-namespace Valkey.
- core/services/auth/main.go + gateway/main.go: read VALKEY_USERNAME +
  VALKEY_PASSWORD env, call ConnectValkeyWithAuth when password set,
  else fall through to no-auth path.
- NEW templates/sme-services/valkey-cross-ns-secret.yaml: Helm `lookup`
  reads bp-valkey's auto-generated `valkey-password` from the
  `valkey/valkey` Secret and re-emits it as `sme-valkey-auth` in `sme`
  ns. Same pattern as sme-secrets.yaml (#859) and gitea-admin-secret
  (#830 Bug 2). On first install the lookup may return nil; Flux's 15m
  reconcile picks up the mirror once bp-valkey is Ready.
- auth.yaml + gateway.yaml: add VALKEY_PASSWORD env from `sme-valkey-
  auth` Secret with optional=true so contabo-mkt's auth-less path keeps
  working when the mirror Secret is absent.
- values.yaml: add `smeServices.valkey.{sourceSecretName,
  sourcePasswordKey, destNamespace, destSecretName}` knobs (Inviolable
  Principle #4).

Live-verified the failure mode on otech103: 11/13 SME pods Running 1/1,
auth in CrashLoopBackOff with NOAUTH HELLO error. Provisioning Pod's
CreateContainerConfigError is unrelated (ghcr-pull, separate ticket).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:09:38 +04:00
github-actions[bot]
329baf0d65 deploy: update catalyst images to ee00ec0 2026-05-05 01:55:09 +00:00