Commit Graph

1044 Commits

github-actions[bot]
a78b4e2e51 deploy: update catalyst images to dad5ead 2026-05-04 07:54:28 +00:00
e3mrah
dad5ead534
feat(wizard): Marketplace mode step (#710 wave 3a) (#725)
Inserts StepMarketplace between StepComponents and StepDomain so the
operator can opt the new Sovereign into a multi-tenant SaaS platform
during provisioning. The toggle drives store.marketplaceEnabled, which
StepReview now ships in the POST /v1/deployments body — the catalyst-api
Request struct + OpenTofu var.marketplace_enabled + cloud-init Flux
substitute + bp-catalyst-platform ingress.marketplace.enabled values
were all wired earlier (PR #719); this PR is the missing UI seam.

Brand fields (name / tagline / primary colour) persist on the wizard
state so a future settings page can read them without re-prompting on
every wizard run. The chart only consumes the enabled flag for now.

Wizard step list grows from 7 to 8 steps (StepMarketplace at id=6,
shifting Domain → 7 and Review → 8). WizardLayout test updated to
assert the new count; the pre-existing StepComponents test failures
(CORTEX cascade) and the @tabler/icons-react typecheck error are
untouched and unrelated.

Companion PRs (other agents): post-launch settings page + catalog
publish/unpublish admin. This is 1 of 3 parallel pieces on #710 wave 3.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:52:17 +04:00
github-actions[bot]
f7365de162 deploy: update sme service images to 2a034a0 2026-05-04 07:38:18 +00:00
github-actions[bot]
84d40a58c7 deploy: update Catalyst marketplace image to 2a034a0 2026-05-04 07:37:45 +00:00
e3mrah
2a034a0959
feat(catalog): unified catalog with Published flag — operator curates marketplace (#710 wave 2) (#724)
Single source of truth for apps; Sovereign-console operator decides which
apps marketplace customers see; marketplace storefront filters by
Published. Per founder rule 2026-05-04: unpublish is a marketplace-
visibility toggle, not a deployment-lifecycle action — existing tenant
deployments of an unpublished app keep running unaffected.

core/services/catalog/store/store.go
====================================
- App.Published bool — operator-controlled visibility
- ListPublishedApps: marketplace-storefront subset
  (Published=true AND System=false AND Deployable=true).
  System and Deployable are catalog-team-controlled; Published is the
  operator's curation knob.
- SetAppPublished(slug, bool) — hot-path one-bit write the Sovereign
  console hits per row toggle. Cheaper than UpdateApp; slug-keyed so
  the UI doesn't need the internal Mongo _id.
- UpdateApp: thread published through full-update path too.
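
In miniature, the storefront subset plus the one-bit toggle look like
this (in-memory sketch; only the field names and the predicate come
from this commit — the Mongo-backed store is elided):

```go
package main

import "fmt"

// App carries only the catalog fields named in this commit.
type App struct {
	Slug       string
	Published  bool // operator's curation knob
	System     bool // catalog-team-controlled
	Deployable bool // catalog-team-controlled
}

// listPublishedApps is the marketplace-storefront subset:
// Published=true AND System=false AND Deployable=true.
func listPublishedApps(apps []App) []App {
	var out []App
	for _, a := range apps {
		if a.Published && !a.System && a.Deployable {
			out = append(out, a)
		}
	}
	return out
}

// setAppPublished is the slug-keyed one-bit write; no Mongo _id needed.
func setAppPublished(apps []App, slug string, published bool) bool {
	for i := range apps {
		if apps[i].Slug == slug {
			apps[i].Published = published
			return true
		}
	}
	return false
}

func main() {
	apps := []App{
		{Slug: "crm", Published: true, Deployable: true},
		{Slug: "ops-agent", Published: true, System: true, Deployable: true},
	}
	setAppPublished(apps, "crm", false)
	// crm is now unpublished and ops-agent is a system app, so both drop out
	fmt.Println(len(listPublishedApps(apps)))
}
```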

core/services/catalog/handlers/handlers.go + routes.go
======================================================
- ListApps now honours ?published=true query param:
    GET /catalog/apps                  → operator view: every app
    GET /catalog/apps?published=true   → marketplace view: filtered
- New PATCH /catalog/admin/apps/{slug}/publish?value={true|false}
  for the Sovereign-console operator's row toggle.
- requireAdmin gating preserved on the admin endpoint.

core/services/catalog/handlers/seed.go
======================================
- migrateAppPublished: defaults Published=true on every existing app
  on the day Catalyst 1.3.x ships. Operators opt OUT of marketplace
  visibility per app, not IN — matches how a real SaaS storefront is
  curated and prevents an empty marketplace on flag-introduction day.
  Idempotent on re-run.
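
The migration's shape, sketched against plain maps instead of Mongo
documents; the only-if-missing guard is what makes re-runs no-ops:

```go
package main

import "fmt"

// migrateAppPublished defaults Published=true on docs that predate the
// flag. Touching only missing fields leaves operator opt-outs alone and
// makes re-runs idempotent.
func migrateAppPublished(docs []map[string]any) int {
	n := 0
	for _, d := range docs {
		if _, ok := d["published"]; !ok {
			d["published"] = true
			n++
		}
	}
	return n
}

func main() {
	docs := []map[string]any{
		{"slug": "crm"},                      // pre-1.3.x doc: no flag yet
		{"slug": "beta", "published": false}, // operator already opted out
	}
	fmt.Println(migrateAppPublished(docs)) // first run touches the flagless doc
	fmt.Println(migrateAppPublished(docs)) // re-run touches nothing
}
```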

core/marketplace/src/lib/api.ts
================================
- getApps() now hits /catalog/apps?published=true so the marketplace
  storefront only renders the operator-curated subset.

DoD pending wave 2.5
====================
The Sovereign-console "Catalog & publishing" admin page (per-row
toggle UI) is the next chunk and ships in a follow-up — backend +
storefront filter are the load-bearing change here. Catalog admins
can flip the flag today via the PATCH endpoint; the per-row UI is
quality-of-life on top.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 11:37:03 +04:00
github-actions[bot]
52f68420ac deploy: update Catalyst marketplace image to 73d68d9 2026-05-04 07:31:20 +00:00
e3mrah
73d68d99c1
fix(auth-ux): HTML PIN email + copyable email pill + 6-box marketplace PIN + drop UI debris (#721) (#723)
Wave 1 of #721 — what the founder actually saw on console.openova.io
and marketplace.openova.io / marketplace.<sov>.

PIN email rewrite (catalyst-api auth.go)
========================================
Was: plaintext "Your OpenOva sign-in code:\n\n    9 6 5 1 2 8\n…"
Now: multipart/alternative MIME with a polished HTML alternative —
white card on neutral background, OpenOva mark + wordmark,
"Your sign-in code" heading, big tinted code block (34px monospaced,
10px letter-spacing, one-tap copy on iOS Mail), expiration + ignore
notice, footer credit. Inline styles only — Gmail/Outlook web strip
<style>. Card pinned at 480px so narrow webmail panes render correctly.
text/plain fallback kept for clients without HTML.

Catalyst-Zero verify page (VerifyPinPage.tsx)
=============================================
- Email shown as a copyable PILL with copy icon — click copies to
  clipboard, icon flips to a check for 1.5s. Selection-fallback for
  browsers without clipboard API.
- Centered title + subtitle (was left-aligned in 1.2.x).
- Microcopy: "Codes expire after 10 minutes — check your spam folder."

Marketplace checkout sign-in (CheckoutStep.svelte)
==================================================
- One <input maxlength=6> → 6 separate <input maxlength=1> boxes with
  auto-advance, paste fan-out (paste a 6-digit code anywhere on the
  row and all 6 boxes fill and autosubmit), backspace-back,
  ArrowLeft/Right navigation, autocomplete=one-time-code on the first
  box for iOS SMS autofill, caret-transparent so the digit IS the caret.
- Email shown as the same copyable pill pattern (svg copy/check icons,
  hover-to-brand affordance).
- Dropped "Use a different email" link (browser back works).
- Added expire/spam microcopy below button.

Header + wayfinding cleanup
===========================
- Header.svelte: top-right "Sign in" button hidden when pathname is
  /checkout or /login. Two sign-in CTAs on the same screen were the UI
  debris caught live 2026-05-04.
- CheckoutStep.svelte: "← Back to Review" moved from bottom-left
  (where users don't look) to top-left above the Checkout heading,
  rendered with a chevron icon.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 11:30:24 +04:00
github-actions[bot]
f375533ffa deploy: update catalyst images to 88bfa34 2026-05-04 05:44:50 +00:00
e3mrah
88bfa347d4
fix(auth): sign-out actually signs out + iCloud-style PIN UX (closes #721) (#722)
* feat(bp-catalyst-platform): expose marketplace + tenant wildcard, bump 1.3.0 (closes #710)

Marketplace exposure for franchised Sovereigns. Otech becomes a SaaS
operator with a single overlay toggle.

Changes
=======

products/catalyst/chart:
- Chart.yaml 1.2.7 → 1.3.0
- values.yaml: ingress.marketplace.enabled toggle (default false) +
  marketplace.{brand,currency,paymentProvider,signupPolicy} surface
- templates/sme-services/marketplace-routes.yaml: HTTPRoute
  marketplace.<sov> with /api/ → marketplace-api, /back-office/ → admin,
  / → marketplace; HTTPRoute *.<sov> → console (per-tenant wildcard)
- templates/sme-services/marketplace-reference-grant.yaml: cross-
  namespace ReferenceGrant from catalyst-system HTTPRoute → sme Services
- .helmignore: stop excluding sme-services/* and marketplace-api/* (only
  *.kustomization.yaml + *.ingress.yaml remain Kustomize-only)
- All sme-services/* + marketplace-api/* manifests wrapped with
  {{ if .Values.ingress.marketplace.enabled }} so non-marketplace
  Sovereigns render the chart unchanged

clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
- chart version 1.2.7 → 1.3.0
- ingress.hosts.marketplace.host: marketplace.${SOVEREIGN_FQDN}
- ingress.marketplace.enabled: ${MARKETPLACE_ENABLED:-false}

infra/hetzner:
- variables.tf: marketplace_enabled var (string "true"/"false", default "false")
- main.tf: thread var into cloudinit-control-plane.tftpl
- cloudinit-control-plane.tftpl: postBuild.substitute.MARKETPLACE_ENABLED
  on bootstrap-kit, sovereign-tls, infrastructure-config Kustomizations

products/catalyst/bootstrap/api/internal/provisioner/provisioner.go:
- Request.MarketplaceEnabled bool (json:"marketplaceEnabled")
- writeTfvars: marketplace_enabled = "true"|"false"

core/pool-domain-manager/internal/allocator/allocator.go:
- canonicalRecordSet adds "marketplace" prefix → marketplace.<sov>
  resolves via PDM at zone-commit time (PR #710 explicit record so
  caches don't depend on the *.<sov> wildcard alone)

DoD ready
=========
- helm template with ingress.marketplace.enabled=false → identical
  manifest set to 1.2.7 (verified locally)
- helm template with ingress.marketplace.enabled=true → emits 17 extra
  resources: 13 sme-services workloads + 2 marketplace-api + 1
  HTTPRoute pair + 1 ReferenceGrant
- pdm tests: TestCanonicalRecordSet, TestCommitDNSShape green
- catalyst-api builds, provisioner cloudinit_path_test green

* fix(ci): catalyst-build dispatches blueprint-release after deploy commit (closes #712)

The deploy job's `git push` is made under GITHUB_TOKEN; per GitHub
Actions design, commits authored by GITHUB_TOKEN don't re-trigger
workflows. blueprint-release.yaml's `on.push.paths: products/*/chart/**`
filter matches the deploy commit's diff (chart/values.yaml +
chart/templates/{api,ui}-deployment.yaml), so the workflow SHOULD fire,
but doesn't — leaving the bp-catalyst-platform:1.2.7 OCI artifact stuck
on whatever catalyst-api SHA was current at the last manual chart-
touching PR.

Today (2026-05-03) this stranded otech62-otech66 on catalyst-api:74d08eb
six PRs after the SHA was superseded — every fresh Sovereign installed
the buggy pre-#701 image and rejected handover with 401 unauthenticated.

Fix: after `git push` succeeds in the deploy job, dispatch
blueprint-release explicitly via `gh workflow run`. The dispatched run
re-renders + re-publishes the chart with the just-pushed values.yaml.

Closes #712.

* fix(auth): sign-out actually signs out + iCloud-style PIN UX (closes #721)

Sign-out
========
1. Cookie-clear Domain mismatch
   PIN-verify SETS catalyst_session with Domain:$CATALYST_SESSION_COOKIE_DOMAIN
   so the cookie carries across console.<sov> and marketplace.<sov>.
   HandleAuthLogout was clearing WITHOUT the Domain attribute. Browsers
   require an exact-match Set-Cookie (Path + Domain + SameSite) to
   actually drop a cookie — a mismatched Domain creates a new empty
   cookie scoped to the current host while the original parent-domain
   cookie stays alive. Next /whoami picks it up and the operator looks
   "still signed in".

   Fix: mirror the EXACT Domain/Path/Secure/SameSite the cookie was
   set with. Same fix on catalyst_refresh.

2. Keycloak SSO session survives local cookie drop
   Even if the local cookie clear worked, the upstream KC SSO session
   stayed alive. The next OIDC PKCE auth-guard fetch silently re-
   authenticated against KC and the operator landed back as the same
   identity.

   Fix: HandleAuthLogout returns 200 with
   { ok: true, keycloakLogoutURL: "<kc>/realms/<realm>/protocol/
     openid-connect/logout?client_id=...&post_logout_redirect_uri=
     <origin>/login" }.
   UI's signOut() hard-navigates to keycloakLogoutURL so KC drops the
   SSO session and 302s back to /login. qc.clear() flushes all
   TanStack Query caches before the navigation.
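
The logout-URL assembly, sketched (host, realm, and client_id values
are placeholders):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildKeycloakLogoutURL sketches the RP-initiated-logout URL the
// handler returns in its JSON body.
func buildKeycloakLogoutURL(kcBase, realm, clientID, origin string) string {
	q := url.Values{}
	q.Set("client_id", clientID)
	q.Set("post_logout_redirect_uri", origin+"/login")
	return fmt.Sprintf("%s/realms/%s/protocol/openid-connect/logout?%s",
		kcBase, realm, q.Encode())
}

func main() {
	fmt.Println(buildKeycloakLogoutURL(
		"https://auth.otech62.example", "sovereign",
		"catalyst-ui", "https://console.otech62.example"))
}
```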

PIN UX (iCloud reference)
=========================
PinInput6.tsx
  - Box size 48×56 → 56×64 (sm: 64×72)
  - Border 1px → 1.5px, rounded-lg → rounded-xl
  - Soft inner-shadow on top + bottom
  - Filled box gets a brand-tinted border (operator sees progress)
  - Focus: scale 1.04 + 3px ring at 30% brand alpha
  - text-xl → text-2xl (sm: text-3xl), tracking-tight, tabular-nums
  - caret-transparent — the digit IS the caret (matches iOS native)
  - Webkit autofill background normalised

VerifyPinPage.tsx
  - Title + subtitle centered (was left-aligned)
  - Title 20px → 24px, semibold, tracking-tight
  - Subtitle in two lines: "A 6-digit code was sent to" / email
  - "Didn't get a code? Send a new one" + spam-folder microcopy below
  - Error message centered

LoginPage.tsx
  - Centered title + subtitle to match
  - Copy: "We'll email you a 6-digit code to verify it's you."

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 09:41:49 +04:00
github-actions[bot]
4c7e1e6d4c deploy: update catalyst images to 35183af 2026-05-04 03:51:04 +00:00
e3mrah
35183af5be
fix(ci): catalyst-build dispatches blueprint-release after deploy commit (closes #712) (#720)
* feat(bp-catalyst-platform): expose marketplace + tenant wildcard, bump 1.3.0 (closes #710)

Marketplace exposure for franchised Sovereigns. Otech becomes a SaaS
operator with a single overlay toggle.

Changes
=======

products/catalyst/chart:
- Chart.yaml 1.2.7 → 1.3.0
- values.yaml: ingress.marketplace.enabled toggle (default false) +
  marketplace.{brand,currency,paymentProvider,signupPolicy} surface
- templates/sme-services/marketplace-routes.yaml: HTTPRoute
  marketplace.<sov> with /api/ → marketplace-api, /back-office/ → admin,
  / → marketplace; HTTPRoute *.<sov> → console (per-tenant wildcard)
- templates/sme-services/marketplace-reference-grant.yaml: cross-
  namespace ReferenceGrant from catalyst-system HTTPRoute → sme Services
- .helmignore: stop excluding sme-services/* and marketplace-api/* (only
  *.kustomization.yaml + *.ingress.yaml remain Kustomize-only)
- All sme-services/* + marketplace-api/* manifests wrapped with
  {{ if .Values.ingress.marketplace.enabled }} so non-marketplace
  Sovereigns render the chart unchanged

clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
- chart version 1.2.7 → 1.3.0
- ingress.hosts.marketplace.host: marketplace.${SOVEREIGN_FQDN}
- ingress.marketplace.enabled: ${MARKETPLACE_ENABLED:-false}

infra/hetzner:
- variables.tf: marketplace_enabled var (string "true"/"false", default "false")
- main.tf: thread var into cloudinit-control-plane.tftpl
- cloudinit-control-plane.tftpl: postBuild.substitute.MARKETPLACE_ENABLED
  on bootstrap-kit, sovereign-tls, infrastructure-config Kustomizations

products/catalyst/bootstrap/api/internal/provisioner/provisioner.go:
- Request.MarketplaceEnabled bool (json:"marketplaceEnabled")
- writeTfvars: marketplace_enabled = "true"|"false"

core/pool-domain-manager/internal/allocator/allocator.go:
- canonicalRecordSet adds "marketplace" prefix → marketplace.<sov>
  resolves via PDM at zone-commit time (PR #710 explicit record so
  caches don't depend on the *.<sov> wildcard alone)

DoD ready
=========
- helm template with ingress.marketplace.enabled=false → identical
  manifest set to 1.2.7 (verified locally)
- helm template with ingress.marketplace.enabled=true → emits 17 extra
  resources: 13 sme-services workloads + 2 marketplace-api + 1
  HTTPRoute pair + 1 ReferenceGrant
- pdm tests: TestCanonicalRecordSet, TestCommitDNSShape green
- catalyst-api builds, provisioner cloudinit_path_test green

* fix(ci): catalyst-build dispatches blueprint-release after deploy commit (closes #712)

The deploy job's `git push` is made under GITHUB_TOKEN; per GitHub
Actions design, commits authored by GITHUB_TOKEN don't re-trigger
workflows. blueprint-release.yaml's `on.push.paths: products/*/chart/**`
filter matches the deploy commit's diff (chart/values.yaml +
chart/templates/{api,ui}-deployment.yaml), so the workflow SHOULD fire,
but doesn't — leaving the bp-catalyst-platform:1.2.7 OCI artifact stuck
on whatever catalyst-api SHA was current at the last manual chart-
touching PR.

Today (2026-05-03) this stranded otech62-otech66 on catalyst-api:74d08eb
six PRs after the SHA was superseded — every fresh Sovereign installed
the buggy pre-#701 image and rejected handover with 401 unauthenticated.

Fix: after `git push` succeeds in the deploy job, dispatch
blueprint-release explicitly via `gh workflow run`. The dispatched run
re-renders + re-publishes the chart with the just-pushed values.yaml.

Closes #712.

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 07:49:03 +04:00
e3mrah
4946ccd125
feat(bp-catalyst-platform): expose marketplace + tenant wildcard, bump 1.3.0 (closes #710) (#719)
Marketplace exposure for franchised Sovereigns. Otech becomes a SaaS
operator with a single overlay toggle.

Changes
=======

products/catalyst/chart:
- Chart.yaml 1.2.7 → 1.3.0
- values.yaml: ingress.marketplace.enabled toggle (default false) +
  marketplace.{brand,currency,paymentProvider,signupPolicy} surface
- templates/sme-services/marketplace-routes.yaml: HTTPRoute
  marketplace.<sov> with /api/ → marketplace-api, /back-office/ → admin,
  / → marketplace; HTTPRoute *.<sov> → console (per-tenant wildcard)
- templates/sme-services/marketplace-reference-grant.yaml: cross-
  namespace ReferenceGrant from catalyst-system HTTPRoute → sme Services
- .helmignore: stop excluding sme-services/* and marketplace-api/* (only
  *.kustomization.yaml + *.ingress.yaml remain Kustomize-only)
- All sme-services/* + marketplace-api/* manifests wrapped with
  {{ if .Values.ingress.marketplace.enabled }} so non-marketplace
  Sovereigns render the chart unchanged

clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
- chart version 1.2.7 → 1.3.0
- ingress.hosts.marketplace.host: marketplace.${SOVEREIGN_FQDN}
- ingress.marketplace.enabled: ${MARKETPLACE_ENABLED:-false}

infra/hetzner:
- variables.tf: marketplace_enabled var (string "true"/"false", default "false")
- main.tf: thread var into cloudinit-control-plane.tftpl
- cloudinit-control-plane.tftpl: postBuild.substitute.MARKETPLACE_ENABLED
  on bootstrap-kit, sovereign-tls, infrastructure-config Kustomizations

products/catalyst/bootstrap/api/internal/provisioner/provisioner.go:
- Request.MarketplaceEnabled bool (json:"marketplaceEnabled")
- writeTfvars: marketplace_enabled = "true"|"false"

core/pool-domain-manager/internal/allocator/allocator.go:
- canonicalRecordSet adds "marketplace" prefix → marketplace.<sov>
  resolves via PDM at zone-commit time (PR #710 explicit record so
  caches don't depend on the *.<sov> wildcard alone)

DoD ready
=========
- helm template with ingress.marketplace.enabled=false → identical
  manifest set to 1.2.7 (verified locally)
- helm template with ingress.marketplace.enabled=true → emits 17 extra
  resources: 13 sme-services workloads + 2 marketplace-api + 1
  HTTPRoute pair + 1 ReferenceGrant
- pdm tests: TestCanonicalRecordSet, TestCommitDNSShape green
- catalyst-api builds, provisioner cloudinit_path_test green

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 07:47:37 +04:00
github-actions[bot]
3a7fdad13f deploy: update catalyst images to 1b1ea52 2026-05-03 22:47:22 +00:00
e3mrah
1b1ea52c39
fix(bp-catalyst-platform): emit sovereign-fqdn ConfigMap atomically in chart (closes #717) (#718)
* fix(catalyst-api,bp-keycloak): handover 401 root-causes — Reloader annot + realm SA users array (#713)

Closes #713

Two distinct chart bugs surfaced live on otech62 (2026-05-03), both producing
401 on /auth/handover:

1. SOVEREIGN_FQDN race
   api-deployment.yaml reads SOVEREIGN_FQDN from ConfigMap "sovereign-fqdn"
   with optional:true. On Sovereigns, that ConfigMap is rendered by the
   sovereign-tls Flux Kustomization concurrently with bp-catalyst-platform
   HelmRelease. When the Pod starts first, valueFrom collapses to "" and
   stays empty — audience check rejects every valid token as "invalid
   audience". Fix: add Reloader annotations so the Pod rolls when the
   ConfigMap (and the handover-jwt-public Secret) appears.

2. catalyst-api-server SA missing user-level realm-management role mappings
   bp-keycloak realm import granted roles via clientScopeMappings — wrong
   level. The actual service-account user had no clientRoles entry, so KC
   rejected GET /users with 403 when catalyst-api tried to ensure the
   operator user during handover. Fix: add explicit "users" array binding
   service-account-catalyst-api-server to realm-management.{impersonation,
   manage-users, view-users, query-users}.

* fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (#715)

Closes #715

Two architectural bugs surfaced live on otech64 (2026-05-03), both leading
to a healthy-looking Sovereign that the operator could not reach.

1. catalyst-api tofu workdir on emptyDir
   CATALYST_TOFU_WORKDIR=/tmp/catalyst/tofu (emptyDir). When contabo's
   catalyst-api Pod rolled mid-apply (the PR #714 deploy commit triggered
   a rolling restart 3 minutes into otech64's tofu run), in-progress state
   was lost. Tofu had created LB/network/server/services but not the
   hcloud_load_balancer_target.control_plane resource yet — the cluster
   came up at the k3s level but the public LB had no targets, returning
   TLS handshake failure for every console.<sov> request.

   Move CATALYST_TOFU_WORKDIR to /var/lib/catalyst/tofu (PVC-backed,
   fsGroup=65534 already wires write access). tofu apply resumes from
   where it left off after any Pod restart.

2. bp-reloader env-vars strategy
   reloadStrategy=env-vars only injects checksum env vars for ConfigMaps
   referenced via envFrom. Workloads using valueFrom: configMapKeyRef
   (catalyst-api's SOVEREIGN_FQDN) are silently not reloaded — the
   configmap.reloader.stakater.com/reload annotation added in PR #714
   was a no-op under env-vars.

   Switch to reloadStrategy=annotations. Reloader bumps a pod-template
   annotation, triggering rollout regardless of how the CM/Secret is
   referenced.

* fix(bp-catalyst-platform): emit sovereign-fqdn ConfigMap inside chart, drop sovereign-tls duplicate (#717)

Closes #717

Reloader v1.4.16 is silent on the SOVEREIGN_FQDN race (#713). Tried all
annotation forms (configmap.reloader.stakater.com/reload, reloader/auto)
and both reload strategies (env-vars, annotations). RBAC is correct, watch
coverage is global, but manual CM patches produce zero Reloader log output
and zero Pod rollouts. Abandoning Reloader as the race fix.

Move the sovereign-fqdn ConfigMap into bp-catalyst-platform chart
templates, guarded by {{ if .Values.global.sovereignFQDN }}. Helm install
applies chart manifests in kind-sorted order (ConfigMaps before
Deployments), so the ConfigMap commits before the Pod schedules.
valueFrom resolves correctly the first time. No race possible.

Drop the duplicate from clusters/_template/sovereign-tls/ to avoid
Helm-vs-Flux ownership flapping. The Kustomize path on contabo enumerates
files in templates/kustomization.yaml so this Helm-templated file is never
parsed by Kustomize.

Verified live: deleting the existing CM and re-running Helm install
produced an immediately-correct catalyst-api Pod with SOVEREIGN_FQDN
populated, where the same install with the previous out-of-chart CM had
left the env empty for the Pod's lifetime.

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 02:45:24 +04:00
github-actions[bot]
b2f78a81e1 deploy: update catalyst images to 9a58289 2026-05-03 22:06:35 +00:00
e3mrah
9a58289786
fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (closes #715) (#716)
* fix(catalyst-api,bp-keycloak): handover 401 root-causes — Reloader annot + realm SA users array (#713)

Closes #713

Two distinct chart bugs surfaced live on otech62 (2026-05-03), both producing
401 on /auth/handover:

1. SOVEREIGN_FQDN race
   api-deployment.yaml reads SOVEREIGN_FQDN from ConfigMap "sovereign-fqdn"
   with optional:true. On Sovereigns, that ConfigMap is rendered by the
   sovereign-tls Flux Kustomization concurrently with bp-catalyst-platform
   HelmRelease. When the Pod starts first, valueFrom collapses to "" and
   stays empty — audience check rejects every valid token as "invalid
   audience". Fix: add Reloader annotations so the Pod rolls when the
   ConfigMap (and the handover-jwt-public Secret) appears.

2. catalyst-api-server SA missing user-level realm-management role mappings
   bp-keycloak realm import granted roles via clientScopeMappings — wrong
   level. The actual service-account user had no clientRoles entry, so KC
   rejected GET /users with 403 when catalyst-api tried to ensure the
   operator user during handover. Fix: add explicit "users" array binding
   service-account-catalyst-api-server to realm-management.{impersonation,
   manage-users, view-users, query-users}.

* fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (#715)

Closes #715

Two architectural bugs surfaced live on otech64 (2026-05-03), both leading
to a healthy-looking Sovereign that the operator could not reach.

1. catalyst-api tofu workdir on emptyDir
   CATALYST_TOFU_WORKDIR=/tmp/catalyst/tofu (emptyDir). When contabo's
   catalyst-api Pod rolled mid-apply (the PR #714 deploy commit triggered
   a rolling restart 3 minutes into otech64's tofu run), in-progress state
   was lost. Tofu had created LB/network/server/services but not the
   hcloud_load_balancer_target.control_plane resource yet — the cluster
   came up at the k3s level but the public LB had no targets, returning
   TLS handshake failure for every console.<sov> request.

   Move CATALYST_TOFU_WORKDIR to /var/lib/catalyst/tofu (PVC-backed,
   fsGroup=65534 already wires write access). tofu apply resumes from
   where it left off after any Pod restart.

2. bp-reloader env-vars strategy
   reloadStrategy=env-vars only injects checksum env vars for ConfigMaps
   referenced via envFrom. Workloads using valueFrom: configMapKeyRef
   (catalyst-api's SOVEREIGN_FQDN) are silently not reloaded — the
   configmap.reloader.stakater.com/reload annotation added in PR #714
   was a no-op under env-vars.

   Switch to reloadStrategy=annotations. Reloader bumps a pod-template
   annotation, triggering rollout regardless of how the CM/Secret is
   referenced.

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 02:04:26 +04:00
github-actions[bot]
c179cba12a deploy: update catalyst images to e96e31a 2026-05-03 21:39:29 +00:00
e3mrah
e96e31a781
fix(catalyst-api,bp-keycloak): handover 401 root-causes — Reloader annot + realm SA users array (#713) (#714)
Closes #713

Two distinct chart bugs surfaced live on otech62 (2026-05-03), both producing
401 on /auth/handover:

1. SOVEREIGN_FQDN race
   api-deployment.yaml reads SOVEREIGN_FQDN from ConfigMap "sovereign-fqdn"
   with optional:true. On Sovereigns, that ConfigMap is rendered by the
   sovereign-tls Flux Kustomization concurrently with bp-catalyst-platform
   HelmRelease. When the Pod starts first, valueFrom collapses to "" and
   stays empty — audience check rejects every valid token as "invalid
   audience". Fix: add Reloader annotations so the Pod rolls when the
   ConfigMap (and the handover-jwt-public Secret) appears.

2. catalyst-api-server SA missing user-level realm-management role mappings
   bp-keycloak realm import granted roles via clientScopeMappings — wrong
   level. The actual service-account user had no clientRoles entry, so KC
   rejected GET /users with 403 when catalyst-api tried to ensure the
   operator user during handover. Fix: add explicit "users" array binding
   service-account-catalyst-api-server to realm-management.{impersonation,
   manage-users, view-users, query-users}.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 01:37:36 +04:00
github-actions[bot]
2eb499e9d7 deploy: update catalyst images to f254ff1 2026-05-03 20:27:20 +00:00
e3mrah
f254ff1f8d
fix(catalyst-ui): auth-guard honors catalyst_session cookie before OIDC PKCE fallback (Phase-8b followup) (#711)
The wizard handover lands the operator at
  GET https://console.<sov>.omani.works/auth/handover?token=<jwt>
which the Sovereign-side catalyst-api validates and 302-redirects to
/console/dashboard with a fresh `catalyst_session` HttpOnly Secure
SameSite=Lax cookie. Verified live with curl on otech49:

  HTTP/1.1 302 Found
  location: /console/dashboard
  set-cookie: catalyst_session=eyJhbGciOiJSUzI1NiI...; HttpOnly; Secure; SameSite=Lax

The browser arrived at /console/dashboard with the cookie attached but
SovereignConsoleLayout went straight from "no sessionStorage tokens"
to initiateLogin() (PKCE redirect to Keycloak). Operators landed on
auth.<sov>.../auth?response_type=code&client_id=catalyst-ui&... — a
username/password screen. User from the field on otech49 + otech52
today: "fuck, this is asking username password!!!"

Fix: probe GET /api/v1/whoami (with credentials:'include') BEFORE
considering Keycloak. The whoami handler is gated by the catalyst-api
session middleware, which validates the cookie's RS256 signature
against the local handover signer's public key. On 200, the layout
enters a new `cookie-authenticated` AuthState and renders the console
shell directly. On 401, the existing OIDC flow runs unchanged so
returning users with an expired cookie still get the silent refresh
plus PKCE fallback. 5xx is treated like 401 (fall through to OIDC) so
a flaky API never traps an authenticated user behind a Keycloak
login they don't need.

Sign-out is also branch-aware: the cookie path DELETEs
/api/v1/auth/session and reloads to '/'; the OIDC path keeps calling
initiateLogout() so the Keycloak end-session URL is still reached.

File changed: products/catalyst/bootstrap/ui/src/app/layouts/SovereignConsoleLayout.tsx
Tests added:  products/catalyst/bootstrap/ui/src/app/layouts/SovereignConsoleLayout.test.tsx

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 00:25:19 +04:00
github-actions[bot]
4984488b41 deploy: update catalyst images to 4a9b2b2 2026-05-03 20:01:47 +00:00
e3mrah
4a9b2b2bff
fix(catalyst-api/wipe): retry firewall delete + purge Hetzner S3 buckets (closes #706) (#709)
* fix(catalyst-api/wipe): retry firewall delete on 422 resource_in_use

Hetzner server delete is asynchronous — returns 200 'action started'
while the firewall stays attached for 5-30s. Single-shot delete saw
422, swallowed it, reported '0 firewalls deleted' while leaving the
firewall live (verified on otech50 2026-05-03).

Adds deleteFirewallWithRetry with exponential backoff (6s/12s/24s/48s,
5 attempts). PurgeReport gains FirewallsRetried + S3Buckets fields.

Issue #706.

* feat(catalyst-api/wipe): add Hetzner Object Storage bucket purge

Adds PurgeBuckets() that empties + deletes the per-Sovereign Hetzner
Object Storage bucket via the S3 API. tofu destroy can't remove
`minio_s3_bucket` while objects are present, so 28 orphan buckets
accumulated from otech23..otech50 (audit 2026-05-03).

Sequence: BucketExists → ListObjectVersions → RemoveObjects (batch
1000) → ListIncompleteUploads → RemoveIncompleteUpload → RemoveBucket.
404 anywhere is idempotent success.

Issue #706.
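
The purge sequence in sketch form, behind a tiny interface rather than
a real S3 client (method names mirror the sequence above; signatures
and the fake store are assumptions):

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("404 not found") // idempotent-success signal

// objectStore abstracts the handful of S3 calls the purge needs.
type objectStore interface {
	BucketExists(bucket string) (bool, error)
	ListObjectVersions(bucket string) ([]string, error)
	RemoveObjects(bucket string, keys []string) error // batched 1000/call for real
	ListIncompleteUploads(bucket string) ([]string, error)
	RemoveIncompleteUpload(bucket, key string) error
	RemoveBucket(bucket string) error
}

// purgeBucket runs the sequence; any 404 counts as idempotent success
// so re-running a wipe never fails.
func purgeBucket(s objectStore, bucket string) error {
	ok, err := s.BucketExists(bucket)
	if err != nil || !ok {
		return err // already gone: nothing to purge
	}
	keys, err := s.ListObjectVersions(bucket)
	if err != nil {
		return ignore404(err)
	}
	if len(keys) > 0 {
		if err := s.RemoveObjects(bucket, keys); err != nil {
			return ignore404(err)
		}
	}
	uploads, err := s.ListIncompleteUploads(bucket)
	if err != nil {
		return ignore404(err)
	}
	for _, u := range uploads {
		if err := s.RemoveIncompleteUpload(bucket, u); err != nil {
			return ignore404(err)
		}
	}
	return ignore404(s.RemoveBucket(bucket))
}

func ignore404(err error) error {
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

// fakeStore is an in-memory stand-in used only for the demo.
type fakeStore struct {
	objects []string
	gone    bool
}

func (f *fakeStore) BucketExists(string) (bool, error)           { return !f.gone, nil }
func (f *fakeStore) ListObjectVersions(string) ([]string, error) { return f.objects, nil }
func (f *fakeStore) RemoveObjects(string, []string) error        { f.objects = nil; return nil }
func (f *fakeStore) ListIncompleteUploads(string) ([]string, error) { return nil, nil }
func (f *fakeStore) RemoveIncompleteUpload(string, string) error { return nil }
func (f *fakeStore) RemoveBucket(string) error                   { f.gone = true; return nil }

func main() {
	s := &fakeStore{objects: []string{"backup/a", "backup/b"}}
	fmt.Println(purgeBucket(s, "sov-otech50"), s.gone)
}
```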

* test(catalyst-api/wipe): firewall retry + bucket purge regression coverage

Adds purge_firewall_retry_test.go with three cases:
- TestFirewallRetry_Server_Detach_Async: 422 twice then 204 → 1 fw deleted
- TestFirewallRetry_Exhausted: always 422 → no fw deleted, error reported
- TestFirewallRetry_AlreadyGone_404: idempotent success path

Adds buckets_test.go with stubbed S3 endpoints exercising:
- BucketNameForSovereign/HetznerObjectStorageEndpoint contract
- empty bucket, 1500-version bucket (3 keys, multi-delete batches),
  in-progress multipart upload abort, 404 idempotent, progress callback

Issue #706.

* fix(catalyst-api/wipe): wire bucket purge into WipeDeployment handler

After hetzner.Purge() returns (which now retries firewall delete on
422), call hetzner.PurgeBuckets() with the per-Sovereign Object Storage
credentials from dep.Request. Runs AFTER tofu destroy so it doesn't
fight tofu state, and BEFORE local-record cleanup so the wizard banner
shows the count.

Skips with a logged warning when in-memory credentials are unavailable
(Pod restart between provision and wipe). The SSE log + UI banner now
report the s3-buckets count alongside the existing resource tallies.

Issue #706.

* feat(catalyst-ui): wipe banner now reports S3 buckets + firewall retries

Adds s3_buckets and firewalls_retried fields to the WipeReport
TypeScript shape and renders the new bucket count alongside the
existing servers/lbs/networks/firewalls/ssh-keys tally. When the
firewall retry counter is non-zero, surfaces it in a parenthetical so
operators see why the wipe took an extra few seconds.

Both the AppsPage Cancel & Wipe modal and the DecommissionPage success
view consume the same WipeReport interface so this single update
covers both surfaces.

Issue #706.

---------

Co-authored-by: hatiyildiz <hatice@openova.io>
2026-05-03 23:59:48 +04:00
github-actions[bot]
cdbb617231 deploy: update catalyst images to e4ef4c0 2026-05-03 19:56:21 +00:00
e3mrah
e4ef4c0671
fix(catalyst-api/jobs): bridge subscribes to helmwatch transition events (closes #695) (#708)
* fix(bp-external-dns): livenessProbe.initialDelaySeconds=180 for cold-cluster cache-sync (closes #700)

PR #679 added --request-timeout=120s but external-dns has TWO timeouts:
RequestTimeout (per-API-call, controlled by --request-timeout) and
WaitForCacheSync (initial informer sync, hardcoded 60s in upstream binary,
NOT exposed as a flag). On a fresh Sovereign with k3s apiserver
CPU-saturated, the cache sync misses the 60s deadline -> fatal: failed
to sync *v1.Node: context deadline exceeded -> CrashLoopBackOff 5-10
times. Caught live on otech49+ (2026-05-03), 5 restarts before
stabilizing.

Bump livenessProbe.initialDelaySeconds from upstream 10s default to 180s
so kubelet does NOT restart the Pod while the initial cache sync runs
against a CPU-saturated freshly-provisioned k3s apiserver. The Sovereign
apiserver reaches steady-state within ~2 min so 3 min comfortably covers
cold starts. Also bumps periodSeconds=30 + failureThreshold=3 so a
genuinely-hung pod is still killed within ~90s once steady-state is
reached.
readinessProbe gets a corresponding initialDelaySeconds=30 so endpoint
flapping during sync doesn't churn services.

Helm overrides REPLACE whole maps (not merge), so the override preserves
the upstream httpGet.path: /healthz + port: http shape verbatim.

Bumps:
- platform/external-dns/chart/Chart.yaml: 1.1.5 -> 1.1.6
- clusters/_template/bootstrap-kit/12-external-dns.yaml: HelmRelease pin 1.1.5 -> 1.1.6

* fix(catalyst-api/jobs): bridge subscribes to helmwatch transition events (closes #695)

Wires the per-deployment jobs.Bridge directly to the helmwatch
Watcher's runtime event stream so every per-component HelmRelease
transition observed AFTER the initial-list seed advances the per-Job
state map. The wizard's /jobs page now reflects the live cluster state
instead of pinning Install rows to whatever the initial-list snapshot
saw at attach time.

Symptom (verified on otech48/49/50/52, 2026-05-03 14:40-19:20):
the wizard rendered Install rows as "running"/"pending" even after
`kubectl --context=otech<N> -n flux-system get hr` showed every
bp-* HelmRelease at Ready=True.

Wiring change:

  helmwatch.Watcher.Subscribe(fn func(provisioner.Event)) — fan-out
  callback registered alongside the primary `emit` Emit. Every event
  the Watcher dispatches reaches both sinks. Used by the handler at
  attachBridgeSeederHook + RefreshWatch construction sites:

    watcher.Subscribe(func(ev provisioner.Event) {
        if err := bridge.OnProvisionerEvent(ev); err != nil {
            h.log.Warn("jobs bridge: runtime event forward failed",
                "id", depID, "phase", ev.Phase,
                "component", ev.Component, "err", err)
        }
    })

Tests:

  - internal/jobs/helmwatch_bridge_test.go::TestBridge_SeedThenRuntimeTransitions
    seeds 3 pending HRs, asserts 3 pending jobs; emits Ready=True for
    HR-1 → asserts 1 succeeded + 2 pending; emits Ready=Unknown for
    HR-2 → asserts 1 succeeded + 1 running + 1 pending. Verifies
    StartedAt / FinishedAt / DurationMs / LatestExecutionID stamps
    too.

  - internal/helmwatch/helmwatch_test.go::TestWatch_SubscribeFanOut
    proves a Subscribe callback receives the same set of per-component
    events as the primary emit, including the "ready for handover"
    terminal event.

  - internal/helmwatch/helmwatch_test.go::TestWatch_SubscribeNilIsNoop
    guards against panic on nil callback.
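
The fan-out seam can be sketched as follows — a subscriber slice
dispatched alongside the primary emit, with the nil-callback guard the
test asserts. Type and field names are illustrative stand-ins for the
real helmwatch/provisioner types:

```go
package main

import "fmt"

// Event is a stand-in for provisioner.Event.
type Event struct{ Component, Phase string }

// Watcher fans every dispatched event out to the primary emit sink plus
// any Subscribe callbacks.
type Watcher struct {
	emit func(Event)
	subs []func(Event)
}

// Subscribe registers a fan-out callback; nil is a no-op (no panic).
func (w *Watcher) Subscribe(fn func(Event)) {
	if fn == nil {
		return
	}
	w.subs = append(w.subs, fn)
}

// dispatch delivers one event to every sink.
func (w *Watcher) dispatch(ev Event) {
	if w.emit != nil {
		w.emit(ev)
	}
	for _, fn := range w.subs {
		fn(ev)
	}
}

func main() {
	var primary, bridged []string
	w := &Watcher{emit: func(ev Event) { primary = append(primary, ev.Component) }}
	w.Subscribe(func(ev Event) { bridged = append(bridged, ev.Component) })
	w.Subscribe(nil) // must not panic
	w.dispatch(Event{Component: "bp-keycloak", Phase: "Ready"})
	fmt.Println(primary, bridged) // both sinks saw the same event
}
```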

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 23:54:20 +04:00
e3mrah
c5ffaa2fd7
fix(bp-external-dns): livenessProbe.initialDelaySeconds=180 for cold-cluster cache-sync (closes #700) (#707)
PR #679 added --request-timeout=120s but external-dns has TWO timeouts:
RequestTimeout (per-API-call, controlled by --request-timeout) and
WaitForCacheSync (initial informer sync, hardcoded 60s in upstream binary,
NOT exposed as a flag). On a fresh Sovereign with k3s apiserver
CPU-saturated, the cache sync misses the 60s deadline -> fatal: failed
to sync *v1.Node: context deadline exceeded -> CrashLoopBackOff 5-10
times. Caught live on otech49+ (2026-05-03), 5 restarts before
stabilizing.

Bump livenessProbe.initialDelaySeconds from upstream 10s default to 180s
so kubelet does NOT restart the Pod while the initial cache sync runs
against a CPU-saturated freshly-provisioned k3s apiserver. The Sovereign
apiserver reaches steady-state within ~2 min so 3 min comfortably covers
cold starts. Also bumps periodSeconds=30 + failureThreshold=3 so a
genuinely-hung pod is still killed within ~90s once steady-state is
reached.
readinessProbe gets a corresponding initialDelaySeconds=30 so endpoint
flapping during sync doesn't churn services.

Helm overrides REPLACE whole maps (not merge), so the override preserves
the upstream httpGet.path: /healthz + port: http shape verbatim.

Bumps:
- platform/external-dns/chart/Chart.yaml: 1.1.5 -> 1.1.6
- clusters/_template/bootstrap-kit/12-external-dns.yaml: HelmRelease pin 1.1.5 -> 1.1.6
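
A sketch of the override shape this implies. Because Helm value overrides
replace the whole probe map, the upstream httpGet block must be restated
verbatim; field names follow the standard Kubernetes probe schema, but the
exact values-key path in the external-dns chart is assumed:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: http
  initialDelaySeconds: 180   # cover cold-cluster cache sync (~2 min to steady-state)
  periodSeconds: 30
  failureThreshold: 3        # genuinely-hung pod still killed within ~90s
readinessProbe:
  httpGet:
    path: /healthz
    port: http
  initialDelaySeconds: 30    # don't churn endpoints while the informer syncs
```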

Co-authored-by: hatiyildiz <hatice@openova.io>
2026-05-03 23:39:36 +04:00
github-actions[bot]
6df37b032c deploy: update catalyst images to 0238a2b 2026-05-03 18:53:12 +00:00
e3mrah
0238a2bde0
fix(flow-canvas): round-5 — variable slots + fit-to-host + zigzag + 60ms resize (#669) (#705)
Co-authored-by: hatiyildiz <hatice@openova.io>
2026-05-03 22:51:10 +04:00
github-actions[bot]
21122116dd deploy: update catalyst images to bceaa20 2026-05-03 18:03:55 +00:00
e3mrah
bceaa20c43
fix(catalyst-api): mint local session JWT in auth_handover (PR #694 pattern) (#703)
Keycloak v26 dropped legacy 'requested_subject' token-exchange. The
auth_handover.go path still called kc.ImpersonateToken() which uses
that parameter, returning 400 'invalid_request'. PR #694 already
moved PIN-verify to local JWT minting via handoverSigner.SignCustomClaims;
apply the same pattern to /auth/handover.

Caught live on otech49 (2026-05-03):
  ERROR auth_handover: ImpersonateToken failed
  err=token endpoint 400: Parameter 'requested_subject' is not
  supported for standard token exchange

Sovereign Keycloak still owns the canonical user record (created via
EnsureUser before token mint) — only the session-cookie minting
moves local. IdP brokering and federation paths are unaffected.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 22:01:06 +04:00
github-actions[bot]
4ba39c2d60 deploy: update catalyst images to 3144eed 2026-05-03 17:42:30 +00:00
e3mrah
3144eedd5e
fix(catalyst-api): read CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH env (PR #692 followup) (#702)
PR #692 moved the Sovereign-side JWK volume mount from
/var/lib/catalyst/handover-jwt-public.jwk (subPath, conflicted with
the catalyst-api PVC) to /etc/catalyst/handover-jwt-public/public.jwk
(directory mount). The chart sets CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH
to the new path, but the AuthHandover handler never read that env.
Result: auth_handover.go used the hardcoded default
/var/lib/catalyst/handover-jwt-public.jwk which no longer exists,
returning 401 'public key unavailable' on every handover.

Caught live on otech49 (2026-05-03):
  ERROR auth_handover: load public key failed
  err=read /var/lib/catalyst/handover-jwt-public.jwk: no such file
  path=/var/lib/catalyst/handover-jwt-public.jwk

Fix:
- Resolution order: handler field -> env var -> default const
- Default const updated to the new path so cold-starts work without
  the env var (defence in depth)

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 21:40:39 +04:00
github-actions[bot]
0e6ac5cd29 deploy: update catalyst images to ed2b374 2026-05-03 17:36:22 +00:00
e3mrah
ed2b374b5e
fix(catalyst-api): move /auth/handover OUTSIDE the session-gate (Phase-8b followup) (#701)
The Sovereign-side /auth/handover handler is the ENTRY POINT that
establishes the session. The operator's browser arrives with the
handover JWT in the URL query and zero cookies. Putting the route
inside the RequireSession middleware group rejects every handover
with 401 {error:unauthenticated} before AuthHandover ever runs.

Caught live on otech49 (2026-05-03):
  GET /auth/handover?token=<valid-jwt> -> 401 in 43us (middleware
  rejection, no body log line emitted).

This was working on otech48 only because catalyst-api there had no
Keycloak credentials wired (kc-sa-credentials Secret was missing) so
GetAuthConfig() returned nil and RequireSession became a passthrough.
Once PR #691 wired the credentials cleanly on otech49, the gate
activated and broke the handover.

Fix: register the route at the top-level mux outside the auth group,
mirroring the same pattern as /api/v1/deployments/{id}/kubeconfig
(cloud-init postback that also has no cookies). The handler's own
JWT validation IS the authentication.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 21:33:14 +04:00
github-actions[bot]
cf9946f4f1 deploy: update catalyst images to 2146deb 2026-05-03 17:10:05 +00:00
e3mrah
2146deb427
fix(catalyst-platform): escape literal Helm-curly in api-deployment.yaml comment (#699)
Helm parses the entire file (including YAML comments) for template
directives BEFORE YAML parsing strips comments. Literal '{{ ... }}'
inside a # comment was treated as a template directive and failed
with 'unexpected <.> in operand' at line 419.

PR #698 introduced this in the explanatory comment for the
SOVEREIGN_FQDN ConfigMap workaround. Reword to avoid the literal
double-curlies — the comment still describes the constraint without
tripping the Helm parser.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 21:08:13 +04:00
github-actions[bot]
7edc4370a3 deploy: update catalyst images to 74d08eb 2026-05-03 16:51:31 +00:00
e3mrah
74d08eb5a6
fix(catalyst-api+sovereign-tls): SOVEREIGN_FQDN via ConfigMap, not Helm template (PR #692 followup) (#698)
PR #692 added an inline Helm-template `value:` for SOVEREIGN_FQDN in
api-deployment.yaml. That broke contabo-mkt's catalyst-platform Flux
Kustomization (path: ./products/catalyst/chart/templates) because Kustomize
parses raw YAML and Helm `{{ ... }}` is not valid YAML syntax. Live error
on contabo at adf8dc7d:

  kustomize build failed: yaml: invalid map key:
  map[string]interface {}{".Values.global.sovereignFQDN | default \"\" | quote":""}

Replace the Helm-template form with `valueFrom.configMapKeyRef.optional:
true` so the same template renders cleanly under both consumers:

- contabo-mkt (Kustomize): ConfigMap `sovereign-fqdn` doesn't exist →
  optional ref → env stays empty → catalyst-api on contabo never validates
  handover JWTs anyway (it's the SIGNER, not the validator). Correct.

- Sovereigns (Helm via bp-catalyst-platform OCI chart): on apply, the
  sovereign-tls Kustomization renders `sovereign-fqdn-configmap.yaml` with
  envsubst on ${SOVEREIGN_FQDN}, creating the ConfigMap with the per-
  Sovereign FQDN. catalyst-api Pod resolves the ref → env populated →
  audience check works.

This restores the bridge between the two consumers without forking the
template. The bp-catalyst-platform 1.2.5 → 1.2.7 bump publishes the new
chart; bootstrap-kit overlay pin updated.

Will be verified on otech49 (next provision after this lands).

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:49:36 +04:00
github-actions[bot]
01a2e3bdb4 deploy: update catalyst images to 1946e0a 2026-05-03 16:40:41 +00:00
e3mrah
1946e0a46e
fix(flow-canvas): variable-width depth columns + ResizeObserver debounce (#669 round 3) (#693)
* fix(flow-canvas): variable-width depth columns + ResizeObserver debounce (#669 round 3)

Round-2 UAT showed:
1. Dense bucket of 30+ siblings piled at the right edge while 60% of
   canvas (left side) sat empty with one bubble per depth.
2. Sim "trying never stabilizing" during pane-transition animations.

Root cause #1: round-2 used a constant `perDepthX` for every depth.
With one-bubble depths next to a 30+ sibling depth, the dense bucket
got 80% × perDepthX (~128 px) of horizontal room and had to pile into
8+ sub-columns; sparse depths each got the same perDepthX (~160 px)
for a single bubble. Net: 60% canvas unused on the left, dense
cluster jammed at right.

Round-3 fix #1: variable-width depth columns. Each depth gets a slot
whose width tracks its bucket's natural extent at radius R:
sparse buckets need 2R + small gap; dense buckets need
(totalCols - 1) * (2R + COLLIDE_PADDING) to fit sub-columns
side-by-side. depthToX returns the centerline of slot[depth];
adjacent slots are separated by `gap = clamp(r*4, MIN, MAX)`. Total
layout width = sum(slots) + gaps.
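
The slot math above can be sketched as follows. Constants and the exact
width formula are illustrative (the real code clamps the gap and snaps
widths); the point is that dense buckets claim width proportional to
their sub-column count while sparse depths shrink to one diameter:

```go
package main

import "fmt"

// slotWidths gives each depth a width tracking its bucket's natural
// extent at radius r: a single bubble needs 2r; a dense bucket needs
// (cols-1)*(2r+pad) of sub-column pitch plus the end bubbles' diameter.
func slotWidths(bucketCols []int, r, pad float64) []float64 {
	ws := make([]float64, len(bucketCols))
	for i, cols := range bucketCols {
		if cols <= 1 {
			ws[i] = 2 * r
		} else {
			ws[i] = float64(cols-1)*(2*r+pad) + 2*r
		}
	}
	return ws
}

// depthToX returns the centerline of slot[depth]; adjacent slots are
// separated by a fixed gap (the real code uses clamp(r*4, MIN, MAX)).
func depthToX(widths []float64, gap float64) []float64 {
	xs := make([]float64, len(widths))
	x := 0.0
	for i, w := range widths {
		xs[i] = x + w/2
		x += w + gap
	}
	return xs
}

func main() {
	// UAT shape: one bubble, one bubble, then a dense bucket in 8 sub-columns.
	widths := slotWidths([]int{1, 1, 8}, 40, 12)
	fmt.Println(widths)            // dense slot dominates total layout width
	fmt.Println(depthToX(widths, 60))
}
```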

Root cause #2: ResizeObserver fired on every animation frame during
the 220ms padding-right transition (pane open/close). Every firing
called setHostSize, which retriggered layoutMetrics → R changed by
1-2 px → all node targets shifted → sim re-seeded → never settled.

Round-3 fix #2: 180ms debounce on the observer + 8 px epsilon gate
(sub-pixel changes ignored entirely). Combined with snap-to-4 on R
and snap-to-8 on slot widths in layoutMetrics, the metrics now hold
constant during pane-transition animations and the sim converges
once.

Tests: bounded layout (17) + JobDetail (5) all green; tsc -b clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(flow-canvas): sqrt-aspect dense buckets + tight grid clamps (#669 round 4)

Round-3 still piled the dense bucket at the right edge. Distribution
test on the founder's exact screenshot shape (1+1+30) showed the dense
slot occupied only 28% of total X-extent — better than round-2 (~13%)
but not enough.

Round-4 fix:
1. layoutMetrics targets a sqrt-aspect-ratio for dense buckets:
   targetRows = round(sqrt(count / 1.6))
   30 leaves → 4 rows × 8 cols → ~700 px slot at R=40, occupying
   >50% of total X-extent. The densest bucket's targetRows now sets
   R via vertical-fit, so wide buckets actually claim X-room rather
   than collapsing into thin tall columns.
2. gridTargets reads cols/rows from layoutMetrics.slotInfo instead
   of recomputing — guarantees the per-tick clamp uses the same
   sub-grid dimensions as the slot-width math.
3. Per-cell clamp window narrowed to ±(pitch/2 - R) so the bubble
   edge can never reach a neighbour's centre. Old clamp used the
   full pitch which let forceCollide push bubbles into a neighbour's
   territory and then ratcheted them in — centres could collapse to
   <2R apart.
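
The sqrt-aspect formula reproduces the commit's 30-leaf example; a
minimal sketch (colsFor is an assumed helper, not necessarily how the
real gridTargets derives columns):

```go
package main

import (
	"fmt"
	"math"
)

// targetRows is the round-4 formula: round(sqrt(count / 1.6)).
func targetRows(count int) int {
	return int(math.Round(math.Sqrt(float64(count) / 1.6)))
}

// colsFor is ceiling division: enough columns to hold count at rows.
func colsFor(count, rows int) int {
	return (count + rows - 1) / rows
}

func main() {
	rows := targetRows(30)
	fmt.Println(rows, colsFor(30, rows)) // 30 leaves -> 4 rows x 8 cols
}
```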

Adds FlowCanvasOrganic.distribution.test.tsx replicating the founder's
UAT screenshot (depth 0: 1, depth 1: 1, depth 2: 30). Asserts:
- depth-0 X < depth-1 X < depth-2 X (left-to-right)
- dense leafSpan ≥ 30% of total layout extent
- no centre-to-centre distance < 2R

All tests green: distribution (2/2), bounded (17/17), JobDetail (5/5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:38:44 +04:00
github-actions[bot]
3da196ec42 deploy: update catalyst images to 46c956b 2026-05-03 16:36:40 +00:00
e3mrah
46c956b21e
feat(catalyst-ui+api): wizard guest mode + ownership check (#689) (#696)
The wizard surface is now anonymous-first. A visitor lands on
console.openova.io and runs the entire 7-step provisioning flow
without a session; auth fires only when they click Launch.

Frontend (catalyst-ui):
- Drop the wizardAuthGuard so the wizard route renders for anonymous
  visitors. The existing zustand+persist store already keeps every
  form field in localStorage with credential-hygiene partitioning
  (Hetzner token, SSH private key, registrar token NEVER persisted),
  so the guest-mode hydration on refresh works for free.
- New shared/lib/useSession hook polls /api/v1/whoami via React
  Query; exposes signedIn / email / refetch / signOut.
- New widgets/auth/ProfileMenu in the wizard header — Sign in button
  for anonymous, email-initial avatar with sign-out dropdown for
  signed-in.
- New widgets/auth/PinSignInModal — two-stage email → 6-digit PIN
  modal that POSTs /auth/pin/issue + /auth/pin/verify (issue #688).
  Falls back to /auth/magic-link when the PIN endpoint is not
  available, so this PR is shippable independent of #688's merge
  order.
- StepReview Launch handler routes anonymous through the PIN modal;
  on verify it stamps the verified email into orgEmail and POSTs
  the deployment immediately.
- New /provision/* beforeLoad guard: anonymous → redirect to wizard
  with a sessionStorage flash banner; signed-in cross-tenant gets
  the canonical 404 from the API (no UI-side branch).
- New shared/lib/flashBanner — sessionStorage seam for the guard →
  wizard banner hand-off.

Backend (catalyst-api):
- Add OwnerEmail to store.Record and handler.Deployment, stamped
  from X-User-Email at CreateDeployment.
- New checkOwnership helper enforces 404 (NEVER 403) on cross-tenant
  access — never leak existence of someone else's deployment via
  the response code. Legacy records (OwnerEmail == "") pass through
  with a warning so in-place upgrade does not lock operators out.
- Wired into GetDeployment, StreamLogs, GetDeploymentEvents,
  WipeDeployment, GetKubeconfig, MintHandoverToken, ListJobs, and
  GetJob. PutKubeconfig keeps its bearer-token auth (cloud-init
  postback path).

Tests:
- Backend: deployments_owner_test.go covers legacy passthrough,
  no-session passthrough, owner match (case-insensitive), the
  load-bearing 404-not-403 cross-tenant assertion, and end-to-end
  proof through GetDeployment + GetDeploymentEvents.
- Frontend: flashBanner round-trip + clear-on-read; useSession
  signed-in / 401 / signOut paths; WizardLayout guest-mode
  [Sign in] button + flash banner rendering.

Closes #689.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:34:38 +04:00
e3mrah
4764b69e4c
fix(catalyst-api): Phase-1 watcher transitions status to ready when all HRs Ready (#697)
otech48 incident (2026-05-03): all 37 bp-* HelmReleases on the Sovereign
cluster reached Ready=True, but the catalyst-api deployment record stayed
status=phase1-watching. Wizard's POST /mint-handover-token returned 409
not-handover-ready, blocking the auto-redirect to console.<sov>/auth/handover.

Root cause: helmwatch's terminate-on-all-done gate required len(observed) >=
MinBootstrapKitHRs. Chart shipped CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS=38,
but the actual bootstrap-kit cardinality had drifted to 37 — making the
gate permanently unsatisfiable. Watch ran until 60-minute WatchTimeout fired.

Fix: gate terminate-on-all-done on the informer's HasSynced signal instead
of the brittle count. After WaitForCacheSync returns the full bp-* set is
in the cache regardless of cardinality. MinBootstrapKitHRs stays as a
defence-in-depth floor (default lowered 11 → 1) for the empty-cache
footgun. Chart env CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS dropped to 1.

Implementation:
- helmwatch.Watcher: new informerSynced bool gate, set after
  WaitForCacheSync. processEvent refuses to consider terminate-on-all-done
  while informerSynced=false. After WaitForCacheSync, re-evaluate the
  all-terminal check once on the synced cache (handles the rehydrate-
  after-restart path where every HR is already Ready=True at attach).
- helmwatch.maybeEmitReadyTransition: emits the operator-visible
  "All N blueprints reconciled. Sovereign ready for handover." SSE event
  exactly once when the gate fires (idempotency guard against flicker
  re-triggering the gate).
- handler.markPhase1Done: persistDeployment after status flip so the
  on-disk JSON reflects status=ready before any wizard poll. Also
  refuses to downgrade an already-adopted deployment if a late watcher
  event tries to flap it.
- Tests: new transition_test.go with happy-path, idempotency, partial-
  ready, realistic 37-HR convergence, and empty-cache scenarios. New
  TestMarkPhase1Done_RefusesToDowngradeAdopted in phase1_watch_test.go.
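
The gate logic can be sketched as below — terminate-on-all-done refuses
to fire before the informer syncs, and the ready transition emits exactly
once. Types and method names are illustrative, not the real helmwatch
code:

```go
package main

import "fmt"

// watcher tracks per-HelmRelease Ready state plus the two gates from
// the fix: informerSynced and the one-shot emitted flag.
type watcher struct {
	informerSynced bool
	ready          map[string]bool // HR name -> Ready=True observed
	emitted        bool
	emit           func(msg string)
}

func (w *watcher) observe(hr string, readyTrue bool) {
	w.ready[hr] = readyTrue
	w.maybeEmitReadyTransition()
}

// maybeEmitReadyTransition fires the operator-visible event exactly once,
// and never while the cache might still be partial.
func (w *watcher) maybeEmitReadyTransition() {
	if !w.informerSynced || w.emitted || len(w.ready) == 0 {
		return // refuse to terminate on a possibly-partial cache
	}
	for _, ok := range w.ready {
		if !ok {
			return
		}
	}
	w.emitted = true
	w.emit(fmt.Sprintf("All %d blueprints reconciled. Sovereign ready for handover.", len(w.ready)))
}

func main() {
	var events []string
	w := &watcher{ready: map[string]bool{}, emit: func(m string) { events = append(events, m) }}
	w.observe("bp-keycloak", true) // pre-sync: the gate holds
	w.informerSynced = true
	w.maybeEmitReadyTransition() // re-evaluate once on the synced cache
	w.maybeEmitReadyTransition() // idempotent: no second emit
	fmt.Println(len(events))
}
```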

Will be verified live on otech49 (next provision after this lands):
- Wizard auto-shows "Open your Sovereign Console" button within 30s of
  all HRs reaching Ready
- No manual API calls or kubectl exec needed to flip status
- catalyst-api logs show "All 37 blueprints reconciled" event in SSE buffer

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:34:26 +04:00
github-actions[bot]
8afb667da9 deploy: update catalyst images to ba31f24 2026-05-03 16:28:50 +00:00
e3mrah
ba31f24922
feat(catalyst-ui+api): replace magic-link with 6-digit PIN auth (#688) (#694)
Replace the magic-link login flow on console.openova.io with a paste-friendly
6-digit numeric PIN, modelled on bank/Google verification screens. Founder
rejected magic links because they look like phishing (2026-05-03).

## Backend (products/catalyst/bootstrap/api)

- New handler/pinstore.go — sync.Mutex-guarded in-memory map keyed by email
  with 10-minute TTL, 60-second per-email rate limit, 3-attempt lockout, and
  a background goroutine that sweeps expired entries every minute.
  PINs are NEVER persisted to disk per credential-hygiene rules.

- handler/auth.go rewritten:
  * POST /api/v1/auth/pin/issue — body {email}. EnsureUser in openova realm,
    generate 6-digit PIN with crypto/rand (NEVER math/rand), store, send
    plaintext email with prominent "3 7 2 4 5 8" code and NO clickable URL,
    return {ok, requestId, expiresInSec}. Rate-limit 60s.
  * POST /api/v1/auth/pin/verify — body {email, pin, requestId}. Atomic
    verify+decrement, on match mint self-signed session JWT (same handover
    signer; KC 24.7 removed legacy token-exchange) and set HttpOnly Secure
    SameSite=Lax cookie. Wrong: 401 with attemptsRemaining. Locked/expired:
    410. Stable error codes: pin-invalid / pin-expired / attempts-exceeded /
    email-required / pin-rate-limited.

- Routes wired in cmd/api/main.go. Legacy /auth/magic and /auth/callback
  redirect to /login?error=flow_changed for stale bookmarks.

- Handler struct gets a pinStore field; openovaKC keycloakClient kept for
  the EnsureUser call.

- Tests: auth_pin_test.go (14 tests covering happy path, all error codes,
  SMTP rollback, rate limit, request-mismatch) + pinstore_test.go (12 tests
  on the store invariants).

## Frontend (products/catalyst/bootstrap/ui)

- New PinInput6.tsx component — 6 inputs, inputmode=numeric, maxlength=1,
  auto-advance focus, Backspace steps back, paste-anywhere splits clipboard
  digits across boxes (extracts /\d/g), auto-submits on the 6th digit or
  Enter. one-time-code autocomplete on box 0 for SMS prefill.

- LoginPage rewritten — single email field, "Send code" button, on success
  navigates to /login/verify with email + requestId in the URL. PIN never
  enters the URL.

- New VerifyPinPage — renders PinInput6, calls /pin/verify, on 401 shows
  "Code incorrect, X attempts remaining", on 410 routes back to /login
  with the error code, on 200 navigates to /wizard (or ?next=...).

- AuthCallbackPage stripped of magic-link code path; Catalyst-Zero branch
  is now a 302 safety net for stale Keycloak redirect URIs.

- Router gets /login/verify route.

- 17 vitest cases on PinInput6 covering paste, typing, backspace, Enter,
  pasting alphanumerics/long strings, controlled value, disabled state.

## DoD verification

- go test ./internal/handler/... -run "Pin|Handover|Auth" → PASS
  (12 pinstore_test + 14 auth_pin_test + handover/auth tests)
- npm test src/components/PinInput6.test.tsx → 17 passed
- helm template products/catalyst/chart → renders without error
- Email body contains zero clickable URLs: TestSendPinEmail_NoMagicLinkURL
  asserts ?token=, &token=, magic-link substrings absent

Closes #688

Co-authored-by: hatiyildiz <hatice@openova.io>
2026-05-03 20:26:05 +04:00
e3mrah
7ca9541ef9
fix(handover): provision Keycloak service-account credentials zero-touch (Phase-8b followup) (#691)
* fix(handover): provision Keycloak service-account credentials zero-touch (Phase-8b followup)

Sovereign-side catalyst-api needs Keycloak service-account credentials
to provision the operator's user during /auth/handover. Today the chart
references K8s Secret `catalyst-kc-sa-credentials` with keys addr/realm/
client-id/client-secret in the catalyst-system namespace — but no
zero-touch path materialised it. The dead SealedSecret template at
09a-keycloak-catalyst-api-secret.yaml had a different name AND different
keys (CATALYST_KC_*), used PLACEHOLDER_SEALED_VALUE markers no
provisioner replaced, and wasn't even listed in the bootstrap-kit
kustomization.

Symptom on otech48: GET /auth/handover?token=<valid-jwt> returns
"server misconfiguration: keycloak not configured"
(auth_handover.go:169).

Fix: bp-keycloak chart's configmap-sovereign-realm.yaml template now
emits the realm-import ConfigMap AND the catalyst-kc-sa-credentials
Secret in a single template scope so they share the same generated
client secret. Pattern mirrors platform/powerdns/chart/templates/
api-credentials-secret.yaml (canonical seam, ADR-0001 §11.3
anti-duplication).

Secret-value resolution order (first match wins):
  1. operator-supplied .Values.catalystApiServerClientSecret
  2. helm `lookup` of existing Secret in keycloak ns (idempotent)
  3. fresh randAlphaNum 32 (zero-touch on first install)

The Secret carries the four keys exactly as the catalyst-api Pod's
secretKeyRef expects — addr / realm / client-id / client-secret —
with addr derived from gateway.host (https://auth.<sovereignFQDN>).
Reflector annotations auto-mirror the Secret to catalyst-system as
soon as that namespace materialises (bootstrap-kit slot 13).

The realm import already creates the catalyst-api-server client with
serviceAccountsEnabled + impersonation/manage-users/view-users/
query-users role mappings — so once Keycloak is Ready and the realm
imports, the SA is fully provisioned and the K8s Secret carries a
matching client secret. No post-install Job, no Admin-API script,
no out-of-band SealedSecret ceremony.

Cleanup: removes the dead 09a SealedSecret template (not in
kustomization, never produced a working Secret).

Bumps:
  - bp-keycloak chart 1.3.0 -> 1.3.1
  - clusters/_template/bootstrap-kit/09-keycloak.yaml HelmRelease
    pin 1.3.0 -> 1.3.1

Existing per-Sovereign overlays (clusters/otech.omani.works/,
clusters/omantel.omani.works/) intentionally remain on 1.3.0 — fresh
otechN provisioning consumes _template at provision time.

Will be verified live on otech49 — handover end-to-end without ANY
manual Secret creation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(keycloak): bump blueprint.yaml spec.version to match chart 1.3.1

TestBootstrapKit_BlueprintCardsHaveRequiredFields/keycloak asserts
Chart.yaml.version == blueprint.yaml.spec.version. Forgot to bump
blueprint.yaml in the previous commit.

Note: 8 other blueprints (cert-manager, flux, crossplane, sealed-secrets,
spire, nats-jetstream, openbao, gitea) carry the same pre-existing
mismatch and the test fails on main too. Out of scope for this PR;
fixing the keycloak case to keep the new chart version internally
consistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:50:06 +04:00
github-actions[bot]
2146279083 deploy: update catalyst images to 6f3e15b 2026-05-03 15:49:28 +00:00
e3mrah
6f3e15b1ec
fix(handover): provision JWK Secret on Sovereign + inject SOVEREIGN_FQDN env (Phase-8b followup) (#692)
Two handover bugs caught live on otech48 (2026-05-03):

1. Sovereign-side catalyst-api responded to GET /auth/handover with
   "server misconfiguration: public key unavailable". Root cause: the
   K8s Secret `catalyst-handover-jwt-public` (referenced by the chart's
   optional Secret-volume) was never materialised on the Sovereign,
   so the optional volume mount fell through and the JWK file was
   absent inside the container. 1.2.0 wired the mount but no
   provisioning step created the Secret. Fix mirrors the canonical
   pattern from PR #543 (ghcr-pull) and PR #680 (harbor-robot-token):
   cloud-init now writes the Secret manifest into catalyst-system NS
   and runcmd applies it BEFORE flux-bootstrap, so the Secret exists
   by the time bp-catalyst-platform reconciles. Also moves the chart
   volume mount off the catalyst-api PVC (mountPath
   /etc/catalyst/handover-jwt-public, no subPath) so a leftover empty
   directory in the PVC from pre-#606 installs cannot collide with
   the re-provisioned Secret mount.

2. /auth/handover validator rejected every valid JWT with 401
   "invalid audience" because SOVEREIGN_FQDN was unset on Sovereigns
   — the audience check collapsed to the literal "https://console."
   prefix. The bp-catalyst-platform HelmRelease overlay was already
   setting `global.sovereignFQDN` but the chart template never plumbed
   it through to the Pod env. Added a SOVEREIGN_FQDN env reading
   `.Values.global.sovereignFQDN` (default "" so Catalyst-Zero
   installs, where catalyst-api is the SIGNER not the validator,
   stay clean).

Bumps:
- bp-catalyst-platform 1.2.4 -> 1.2.5
- clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml HelmRelease pin

Will be verified live on otech49 — fresh provision should reach
https://console.otech49.omani.works/auth/handover?token=... and
exchange to a Keycloak session WITHOUT manual Secret creation.

Issue #606 followup.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:47:21 +04:00
github-actions[bot]
adf8dc7ded deploy: update catalyst images to d0b574b 2026-05-03 14:36:29 +00:00
e3mrah
d0b574bd68
fix(hetzner-tofu): add powerdns_api_key to templatefile() vars (#687)
PR #686 added var.powerdns_api_key to variables.tf and referenced it as
${powerdns_api_key} in cloudinit-control-plane.tftpl, but missed wiring
it into the templatefile() vars dict in main.tf. Result on otech48:

  Invalid value for "vars" parameter: vars map does not contain key
  "powerdns_api_key", referenced at ./cloudinit-control-plane.tftpl:273

This commit closes the gap: powerdns_api_key now flows from var ->
templatefile vars -> cloud-init -> Secret manifest.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:34:36 +04:00
github-actions[bot]
351ab9b584 deploy: update catalyst images to 6847595 2026-05-03 14:25:30 +00:00