Two unrelated production-bug fixes, squashed because they came out of
the same live verification pass on console.openova.io on 2026-05-04.
1. catalyst-build.yaml deploy job permissions
PR #720 added a `gh workflow run blueprint-release.yaml` dispatch
step at the end of the deploy job to close the bot-deploy-doesn't-
trigger-workflows gap from #712. The step has been failing on every
run since with HTTP 403 "Resource not accessible by integration"
because GITHUB_TOKEN lacks `actions: write` by default.
Result: blueprint-release was never dispatched after PRs #722–727
merged; the bp-catalyst-platform OCI artifact stayed on the
pre-fix chart, and any Sovereign provisioned afterwards picked up
the buggy chart. Add the missing permission so the dispatch succeeds.
2. AuthLayout.tsx vertical centering at small viewport heights
The sign-in / verify cards were mathematically centered at
1440×900 (Δ=0.008px verified via getBoundingClientRect in
Playwright), but the founder reported the card sitting at the top
of the screen on real-world viewports. Root cause: the right panel
used `flex flex-1 items-center justify-center`, which centers ONLY
when the inner content fits within the viewport — at smaller
heights the form's natural content flow pushed the card off-screen
with no scroll fallback.
Fix: add `items-stretch` to the outer flex (so the right panel
fills full viewport height), `overflow-y-auto` on the right
column (so the card can scroll inside its column when too tall),
and `py-8` padding on the card wrapper (breathing room when
scrolling kicks in). Result: card is vertically centered when
content fits, and stays visible (column-scrollable) when it
doesn't, on every viewport height from 1024×600 up.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Operators of a live Sovereign can now enable / disable marketplace
mode (and edit storefront branding) from the console's Settings →
Marketplace page without re-running provisioning. The page POSTs to
a new auth-gated endpoint that commits the change to the per-Sovereign
overlay file in the GitOps repo; Flux reconciles the chart on the
target Sovereign within ~1 min and the marketplace HTTPRoutes /
ConfigMaps re-render off the new values.
Per the founder's 2026-05-04 GitOps rule + INVIOLABLE-PRINCIPLES.md
#3, the handler does NOT touch in-cluster ConfigMaps directly — every
mutation is a git commit on the audit trail.
Backend:
- new handler POST /api/v1/sovereigns/{id}/marketplace
- looks up deployment, verifies #689 ownership, decodes body
- shallow-clones openova-public to a scratch tempdir using a
CATALYST_GITOPS_TOKEN PAT (env-gated; 503 if unset)
- patches clusters/<fqdn>/bootstrap-kit/13-bp-catalyst-platform.yaml
via yaml.v3 Node round-trip (ingress.marketplace.enabled +
marketplace.brand.{name,tagline,primaryColor})
- commits as "catalyst-api <ops@openova.io>" with message
"settings: marketplace enabled=<bool> for <fqdn>" + pushes
origin HEAD:<branch>; returns commit SHA + appliedAt
- 5-minute deadline + scratch RemoveAll to never leak the auth URL
- token-bearing URLs redacted on every error path so a 500 body
never echoes the GitOps PAT
- hex-colour validator + handler-side rejection of malformed brand
colour so the chart's CSS template can't 500 on a typo (see the
sketch after this list)
- route wired inside the existing RequireSession group in main.go
- 5 unit tests cover YAML patch round-trip, hex validation, token
URL injection, and stderr redaction
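A minimal Go sketch of those two guard rails, with hypothetical names
(validateBrandColour / redactToken are illustrative, not the real
handler identifiers):

    package handler

    import (
        "fmt"
        "regexp"
        "strings"
    )

    // Accept #RGB or #RRGGBB so a typo can never reach the chart's CSS template.
    var hexColour = regexp.MustCompile(`^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$`)

    func validateBrandColour(c string) error {
        if !hexColour.MatchString(c) {
            return fmt.Errorf("primaryColor %q is not a #RGB/#RRGGBB hex colour", c)
        }
        return nil
    }

    // redactToken scrubs the PAT-bearing clone URL from any message that
    // could reach a 500 body or a log line.
    func redactToken(msg, token string) string {
        if token == "" {
            return msg
        }
        return strings.ReplaceAll(msg, token, "<redacted>")
    }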
Frontend:
- new page src/pages/sovereign/settings/MarketplaceSettings.tsx
- render: heading + toggle card + brand fields (Name, Tagline,
primary colour with picker + hex input + inline error)
- footer: idle / saving / reconciling (with short SHA) / applied /
error states; auto-clears applied after 8s
- route /console/settings/marketplace under the existing
SovereignConsoleLayout
- SovereignSidebar grows a sub-nav under Settings showing
"Marketplace" only when /console/settings/* is active
- 4 vitest cases lock in render, toggle flip, colour validation, and
the fetch contract (URL + credentials:'include' + payload shape)
2 of 3 parallel pieces; wizard step + catalog admin page in companion PRs.
Partially closes #710.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 of 3 parallel pieces; wizard step + settings page in companion PRs.
Adds the Sovereign-console operator surface for marketplace curation.
Backend support shipped in PR #724 (#710 wave 2): GET /catalog/apps and
PATCH /catalog/admin/apps/{slug}/publish?value={true|false}. This PR
wires the per-row toggle UI on top of those endpoints.
products/catalyst/bootstrap/ui/src/pages/sovereign/CatalogAdminPage.tsx
======================================================================
- Header: "Catalog & marketplace publishing" + subtitle naming the
marketplace.<sovereignFQDN> hostname so the operator knows exactly
which storefront they're curating.
- Toolbar: search input (matches name/slug/tagline/description) +
category filter dropdown derived from the loaded set.
- Table: per-app row with icon + name + slug + tagline / category pill /
status pills (Backing service / Deployable / Coming soon / Featured) /
Published switch.
- Optimistic UI: flipping the toggle updates the row immediately. On
PATCH failure the previous state is restored and a toast is raised
via useNotifications. Per-slug pending bookkeeping debounces rapid
clicks so a second click waits for the first PATCH to resolve.
- System apps (mysql/postgres/redis) render with the toggle disabled
and a tooltip explaining "Backing services are never shown in
marketplace" — matches the storefront filter in
ListPublishedApps (system: false).
- Apps with deployable=false render a "Coming soon" pill but the
Published toggle still works — operators may pre-publish so when the
catalog team flips deployable=true the storefront row appears
instantly.
- Auth: fetch and PATCH both use credentials:'include' so the
catalyst_session cookie minted by /auth/handover travels along. Backend
requireAdmin enforcement is unchanged; UI only adapts the wire-level
contract.
products/catalyst/bootstrap/ui/src/app/router.tsx
==================================================
- New /console/catalog route mounted under SovereignConsoleLayout
(so the OIDC + cookie auth gate runs first).
products/catalyst/bootstrap/ui/src/pages/sovereign/SovereignSidebar.tsx
======================================================================
- Catalog entry in the left rail between Users and Settings, with the
bookshelf icon. Adds 'catalog' to ActiveSection + path regex so the
active highlight follows /console/catalog.
Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), the API URL
flows through API_BASE so the same image works on Sovereign clusters
(BASE='/') and Catalyst-Zero (BASE='/sovereign/').
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inserts StepMarketplace between StepComponents and StepDomain so the
operator can opt the new Sovereign into a multi-tenant SaaS platform
during provisioning. The toggle drives store.marketplaceEnabled, which
StepReview now ships in the POST /v1/deployments body — the catalyst-api
Request struct + OpenTofu var.marketplace_enabled + cloud-init Flux
substitute + bp-catalyst-platform ingress.marketplace.enabled values
were all wired earlier (PR #719); this PR is the missing UI seam.
Brand fields (name / tagline / primary colour) persist on the wizard
state so a future settings page can read them without re-prompting on
every wizard run. The chart only consumes the enabled flag for now.
Wizard step list grows from 7 to 8 stops (StepMarketplace at id=6,
shifting Domain → 7 and Review → 8). WizardLayout test updated to
assert the new count; the pre-existing StepComponents test
failures (CORTEX cascade) and the @tabler/icons-react typecheck error
are untouched and unrelated.
Companion PRs (other agents): post-launch settings page + catalog
publish/unpublish admin. This is 1 of 3 parallel pieces on #710 wave 3.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single source of truth for apps; Sovereign-console operator decides which
apps marketplace customers see; marketplace storefront filters by
Published. Per founder rule 2026-05-04: unpublish is a marketplace-
visibility toggle, not a deployment-lifecycle action — existing tenant
deployments of an unpublished app keep running unaffected.
core/services/catalog/store/store.go
====================================
- App.Published bool — operator-controlled visibility
- ListPublishedApps: marketplace-storefront subset
(Published=true AND System=false AND Deployable=true); see the
sketch after this list.
System and Deployable are catalog-team-controlled; Published is the
operator's curation knob.
- SetAppPublished(slug, bool) — hot-path one-bit write the Sovereign
console hits per row toggle. Cheaper than UpdateApp; slug-keyed so
the UI doesn't need the internal Mongo _id.
- UpdateApp: thread published through full-update path too.
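A hedged sketch of that storefront subset, assuming the standard
mongo-driver and that s.apps is a *mongo.Collection (both
assumptions; field names mirror the bullets):

    func (s *Store) ListPublishedApps(ctx context.Context) ([]App, error) {
        cur, err := s.apps.Find(ctx, bson.M{
            "published":  true,  // operator's curation knob
            "system":     false, // backing services never surface
            "deployable": true,  // catalog-team gate
        })
        if err != nil {
            return nil, err
        }
        var apps []App
        return apps, cur.All(ctx, &apps)
    }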
core/services/catalog/handlers/handlers.go + routes.go
======================================================
- ListApps now honours ?published=true query param:
GET /catalog/apps → operator view: every app
GET /catalog/apps?published=true → marketplace view: filtered
- New PATCH /catalog/admin/apps/{slug}/publish?value={true|false}
for the Sovereign-console operator's row toggle.
- requireAdmin gating preserved on the admin endpoint.
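A sketch of the ?published=true branch; the handler shape is
illustrative, only the query-param contract is from the bullets above:

    func (h *Handlers) ListApps(w http.ResponseWriter, r *http.Request) {
        var (
            apps []store.App
            err  error
        )
        if r.URL.Query().Get("published") == "true" {
            apps, err = h.store.ListPublishedApps(r.Context()) // marketplace view
        } else {
            apps, err = h.store.ListApps(r.Context()) // operator view: every app
        }
        if err != nil {
            http.Error(w, "catalog unavailable", http.StatusInternalServerError)
            return
        }
        _ = json.NewEncoder(w).Encode(apps)
    }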
core/services/catalog/handlers/seed.go
======================================
- migrateAppPublished: defaults Published=true on every existing app
on the day Catalyst 1.3.x ships. Operators opt OUT of marketplace
visibility per app, not IN — matches how a real SaaS storefront is
curated and prevents an empty marketplace on flag-introduction day.
Idempotent on re-run.
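The migration reduces to one idempotent UpdateMany. A sketch under the
same mongo-driver assumption; only documents that predate the field
are touched, so re-runs are no-ops:

    func migrateAppPublished(ctx context.Context, apps *mongo.Collection) error {
        _, err := apps.UpdateMany(ctx,
            bson.M{"published": bson.M{"$exists": false}}, // pre-1.3.x apps only
            bson.M{"$set": bson.M{"published": true}},     // opt-out default
        )
        return err
    }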
core/marketplace/src/lib/api.ts
================================
- getApps() now hits /catalog/apps?published=true so the marketplace
storefront only renders the operator-curated subset.
DoD pending wave 2.5
====================
The Sovereign-console "Catalog & publishing" admin page (per-row
toggle UI) is the next chunk and ships in a follow-up — backend +
storefront filter are the load-bearing change here. Catalog admins
can flip the flag today via the PATCH endpoint; the per-row UI is
quality-of-life on top.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Wave 1 of #721 — what the founder actually saw on console.openova.io
and marketplace.openova.io / marketplace.<sov>.
PIN email rewrite (catalyst-api auth.go)
========================================
Was: plaintext "Your OpenOva sign-in code:\n\n 9 6 5 1 2 8\n…"
Now: multipart/alternative MIME with a polished HTML alternative —
white card on neutral background, OpenOva mark + wordmark,
"Your sign-in code" heading, big tinted code block (34px monospaced,
10px letter-spacing, one-tap copy on iOS Mail), expiration + ignore
notice, footer credit. Inline styles only — Gmail/Outlook web strip
<style>. Card pinned at 480px so narrow webmail panes render correctly.
text/plain fallback kept for clients without HTML.
Catalyst-Zero verify page (VerifyPinPage.tsx)
=============================================
- Email shown as a copyable PILL with copy icon — click copies to
clipboard, icon flips to a check for 1.5s. Selection-fallback for
browsers without clipboard API.
- Centered title + subtitle (was left-aligned in 1.2.x).
- Microcopy: "Codes expire after 10 minutes — check your spam folder."
Marketplace checkout sign-in (CheckoutStep.svelte)
==================================================
- 1 single <input maxlength=6> → 6 separate <input maxlength=1>
boxes with auto-advance, paste fan-out (paste a 6-digit code anywhere
on the row and all 6 boxes fill, then auto-submit), backspace-back,
ArrowLeft/Right navigation, autocomplete=one-time-code on the first
box for iOS SMS autofill, and caret-transparent so the digit IS the
caret.
- Email shown as the same copyable pill pattern (svg copy/check icons,
hover-to-brand affordance).
- Dropped "Use a different email" link (browser back works).
- Added expire/spam microcopy below button.
Header + wayfinding cleanup
===========================
- Header.svelte: top-right "Sign in" button hidden when pathname is
/checkout or /login. Two sign-in CTAs on the same screen were the UI
debris caught live on 2026-05-04.
- CheckoutStep.svelte: "← Back to Review" moved from bottom-left
(where users don't look) to top-left above the Checkout heading,
rendered with a chevron icon.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
* feat(bp-catalyst-platform): expose marketplace + tenant wildcard, bump 1.3.0 (closes #710)
Marketplace exposure for franchised Sovereigns. Otech becomes a SaaS
operator with a single overlay toggle.
Changes
=======
products/catalyst/chart:
- Chart.yaml 1.2.7 → 1.3.0
- values.yaml: ingress.marketplace.enabled toggle (default false) +
marketplace.{brand,currency,paymentProvider,signupPolicy} surface
- templates/sme-services/marketplace-routes.yaml: HTTPRoute
marketplace.<sov> with /api/ → marketplace-api, /back-office/ → admin,
/ → marketplace; HTTPRoute *.<sov> → console (per-tenant wildcard)
- templates/sme-services/marketplace-reference-grant.yaml: cross-
namespace ReferenceGrant from catalyst-system HTTPRoute → sme Services
- .helmignore: stop excluding sme-services/* and marketplace-api/* (only
*.kustomization.yaml + *.ingress.yaml remain Kustomize-only)
- All sme-services/* + marketplace-api/* manifests wrapped with
{{ if .Values.ingress.marketplace.enabled }} so non-marketplace
Sovereigns render the chart unchanged
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
- chart version 1.2.7 → 1.3.0
- ingress.hosts.marketplace.host: marketplace.${SOVEREIGN_FQDN}
- ingress.marketplace.enabled: ${MARKETPLACE_ENABLED:-false}
infra/hetzner:
- variables.tf: marketplace_enabled var (string "true"/"false", default "false")
- main.tf: thread var into cloudinit-control-plane.tftpl
- cloudinit-control-plane.tftpl: postBuild.substitute.MARKETPLACE_ENABLED
on bootstrap-kit, sovereign-tls, infrastructure-config Kustomizations
products/catalyst/bootstrap/api/internal/provisioner/provisioner.go:
- Request.MarketplaceEnabled bool (json:"marketplaceEnabled")
- writeTfvars: marketplace_enabled = "true"|"false"
core/pool-domain-manager/internal/allocator/allocator.go:
- canonicalRecordSet adds "marketplace" prefix → marketplace.<sov>
resolves via PDM at zone-commit time (PR #710 explicit record so
caches don't depend on the *.<sov> wildcard alone)
DoD ready
=========
- helm template with ingress.marketplace.enabled=false → identical
manifest set to 1.2.7 (verified locally)
- helm template with ingress.marketplace.enabled=true → emits 17 extra
resources: 13 sme-services workloads + 2 marketplace-api + 1
HTTPRoute pair + 1 ReferenceGrant
- pdm tests: TestCanonicalRecordSet, TestCommitDNSShape green
- catalyst-api builds, provisioner cloudinit_path_test green
* fix(ci): catalyst-build dispatches blueprint-release after deploy commit (closes #712)
The deploy job's `git push` is made under GITHUB_TOKEN; per GitHub
Actions design, commits authored by GITHUB_TOKEN don't re-trigger
workflows. blueprint-release.yaml's `on.push.paths: products/*/chart/**`
filter matches the deploy commit's diff (chart/values.yaml +
chart/templates/{api,ui}-deployment.yaml), so the workflow SHOULD fire,
but doesn't — leaving the bp-catalyst-platform:1.2.7 OCI artifact stuck
on whatever catalyst-api SHA was current at the last manual chart-
touching PR.
Today (2026-05-03) this stranded otech62-otech66 on catalyst-api:74d08eb
six PRs after the SHA was superseded — every fresh Sovereign installed
the buggy pre-#701 image and rejected handover with 401 unauthenticated.
Fix: after `git push` succeeds in the deploy job, dispatch
blueprint-release explicitly via `gh workflow run`. The dispatched run
re-renders + re-publishes the chart with the just-pushed values.yaml.
Closes #712.
* fix(auth): sign-out actually signs out + iCloud-style PIN UX (closes #721)
Sign-out
========
1. Cookie-clear Domain mismatch
PIN-verify SETS catalyst_session with Domain:$CATALYST_SESSION_COOKIE_DOMAIN
so the cookie carries across console.<sov> and marketplace.<sov>.
HandleAuthLogout was clearing WITHOUT the Domain attribute. A browser
only drops a cookie when the clearing Set-Cookie matches the
original's name, Domain, and Path — a mismatched Domain instead
creates a new empty cookie scoped to the current host while the
original parent-domain cookie stays alive. The next /whoami picks it
up and the operator looks "still signed in".
Fix: mirror the EXACT Domain/Path/Secure/SameSite the cookie was
set with. Same fix on catalyst_refresh.
2. Keycloak SSO session survives local cookie drop
Even if the local cookie clear worked, the upstream KC SSO session
stayed alive. The next OIDC PKCE auth-guard fetch silently re-
authenticated against KC and the operator landed back as the same
identity.
Fix: HandleAuthLogout returns 200 with
{ ok: true, keycloakLogoutURL: "<kc>/realms/<realm>/protocol/
openid-connect/logout?client_id=...&post_logout_redirect_uri=
<origin>/login" }.
UI's signOut() hard-navigates to keycloakLogoutURL so KC drops the
SSO session and 302s back to /login. qc.clear() flushes all
TanStack Query caches before the navigation.
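The attribute-matched clear reduces to a deletion cookie that mirrors
the original's scope. A hedged Go sketch (helper name illustrative):

    func clearSessionCookie(w http.ResponseWriter, domain string) {
        http.SetCookie(w, &http.Cookie{
            Name:     "catalyst_session",
            Value:    "",
            Domain:   domain, // same $CATALYST_SESSION_COOKIE_DOMAIN it was set with
            Path:     "/",
            MaxAge:   -1, // expire now; matching name+Domain+Path replaces the original
            HttpOnly: true,
            Secure:   true,
            SameSite: http.SameSiteLaxMode,
        })
    }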
PIN UX (iCloud reference)
=========================
PinInput6.tsx
- Box size 48×56 → 56×64 (sm: 64×72)
- Border 1px → 1.5px, rounded-lg → rounded-xl
- Soft inner-shadow on top + bottom
- Filled box gets a brand-tinted border (operator sees progress)
- Focus: scale 1.04 + 3px ring at 30% brand alpha
- text-xl → text-2xl (sm: text-3xl), tracking-tight, tabular-nums
- caret-transparent — the digit IS the caret (matches iOS native)
- Webkit autofill background normalised
VerifyPinPage.tsx
- Title + subtitle centered (was left-aligned)
- Title 20px → 24px, semibold, tracking-tight
- Subtitle in two lines: "A 6-digit code was sent to" / email
- "Didn't get a code? Send a new one" + spam-folder microcopy below
- Error message centered
LoginPage.tsx
- Centered title + subtitle to match
- Copy: "We'll email you a 6-digit code to verify it's you."
---------
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
* fix(catalyst-api,bp-keycloak): handover 401 root-causes — Reloader annot + realm SA users array (#713)
Closes #713
Two distinct chart bugs surfaced live on otech62 (2026-05-03), both producing
401 on /auth/handover:
1. SOVEREIGN_FQDN race
api-deployment.yaml reads SOVEREIGN_FQDN from ConfigMap "sovereign-fqdn"
with optional:true. On Sovereigns, that ConfigMap is rendered by the
sovereign-tls Flux Kustomization concurrently with bp-catalyst-platform
HelmRelease. When the Pod starts first, valueFrom collapses to "" and
stays empty — audience check rejects every valid token as "invalid
audience". Fix: add Reloader annotations so the Pod rolls when the
ConfigMap (and the handover-jwt-public Secret) appears.
2. catalyst-api-server SA missing user-level realm-management role mappings
bp-keycloak realm import granted roles via clientScopeMappings — wrong
level. The actual service-account user had no clientRoles entry, so KC
rejected GET /users with 403 when catalyst-api tried to ensure the
operator user during handover. Fix: add explicit "users" array binding
service-account-catalyst-api-server to realm-management.{impersonation,
manage-users, view-users, query-users}.
* fix(catalyst-api,bp-reloader): tofu state on PVC + Reloader annotations strategy (#715)
Closes #715
Two architectural bugs surfaced live on otech64 (2026-05-03), both leading
to a healthy-looking Sovereign that the operator could not reach.
1. catalyst-api tofu workdir on emptyDir
CATALYST_TOFU_WORKDIR=/tmp/catalyst/tofu (emptyDir). When contabo's
catalyst-api Pod rolled mid-apply (the PR #714 deploy commit triggered
a rolling restart 3 minutes into otech64's tofu run), in-progress state
was lost. Tofu had created LB/network/server/services but not the
hcloud_load_balancer_target.control_plane resource yet — the cluster
came up at the k3s level but the public LB had no targets, returning
TLS handshake failure for every console.<sov> request.
Move CATALYST_TOFU_WORKDIR to /var/lib/catalyst/tofu (PVC-backed,
fsGroup=65534 already wires write access). tofu apply resumes from
where it left off after any Pod restart.
2. bp-reloader env-vars strategy
reloadStrategy=env-vars only injects checksum env vars for ConfigMaps
referenced via envFrom. Workloads using valueFrom: configMapKeyRef
(catalyst-api's SOVEREIGN_FQDN) are silently not reloaded — the
configmap.reloader.stakater.com/reload annotation added in PR #714
was a no-op under env-vars.
Switch to reloadStrategy=annotations. Reloader bumps a pod-template
annotation, triggering rollout regardless of how the CM/Secret is
referenced.
* fix(bp-catalyst-platform): emit sovereign-fqdn ConfigMap inside chart, drop sovereign-tls duplicate (#717)
Closes #717
Reloader v1.4.16 is silent on the SOVEREIGN_FQDN race (#713). Tried all
annotation forms (configmap.reloader.stakater.com/reload, reloader/auto)
and both reload strategies (env-vars, annotations). RBAC is correct, watch
coverage is global, but manual CM patches produce zero Reloader log output
and zero Pod rollouts. Abandoning Reloader as the race fix.
Move the sovereign-fqdn ConfigMap into bp-catalyst-platform chart
templates, guarded by {{ if .Values.global.sovereignFQDN }}. Helm install
applies all chart manifests in a single etcd transaction so the ConfigMap
commits before the Pod schedules. valueFrom resolves correctly the first
time. No race possible.
Drop the duplicate from clusters/_template/sovereign-tls/ to avoid
Helm-vs-Flux ownership flapping. The Kustomize path on contabo enumerates
files in templates/kustomization.yaml so this Helm-templated file is never
parsed by Kustomize.
Verified live: deleting the existing CM and re-running Helm install
produced an immediately-correct catalyst-api Pod with SOVEREIGN_FQDN
populated, where the same install with the previous out-of-chart CM had
left the env empty for the Pod's lifetime.
---------
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
The wizard handover lands the operator at
GET https://console.<sov>.omani.works/auth/handover?token=<jwt>
which the Sovereign-side catalyst-api validates and 302-redirects to
/console/dashboard with a fresh `catalyst_session` HttpOnly Secure
SameSite=Lax cookie. Verified live with curl on otech49:
HTTP/1.1 302 Found
location: /console/dashboard
set-cookie: catalyst_session=eyJhbGciOiJSUzI1NiI...; HttpOnly; Secure; SameSite=Lax
The browser arrived at /console/dashboard with the cookie attached but
SovereignConsoleLayout went straight from "no sessionStorage tokens"
to initiateLogin() (PKCE redirect to Keycloak). Operators landed on
auth.<sov>.../auth?response_type=code&client_id=catalyst-ui&... — a
username/password screen. Field report from otech49 + otech52 the
same day: "fuck, this is asking username password!!!"
Fix: probe GET /api/v1/whoami (with credentials:'include') BEFORE
considering Keycloak. The whoami handler is gated by the catalyst-api
session middleware, which validates the cookie's RS256 signature
against the local handover signer's public key. On 200, the layout
enters a new `cookie-authenticated` AuthState and renders the console
shell directly. On 401, the existing OIDC flow runs unchanged so
returning users with an expired cookie still get the silent refresh
plus PKCE fallback. 5xx is treated like 401 (fall through to OIDC) so
a flaky API never traps an authenticated user behind a Keycloak
login they don't need.
Sign-out is also branch-aware: the cookie path DELETEs
/api/v1/auth/session and reloads to '/'; the OIDC path keeps calling
initiateLogout() so the Keycloak end-session URL is still reached.
File changed: products/catalyst/bootstrap/ui/src/app/layouts/SovereignConsoleLayout.tsx
Tests added: products/catalyst/bootstrap/ui/src/app/layouts/SovereignConsoleLayout.test.tsx
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalyst-api/wipe): retry firewall delete on 422 resource_in_use
Hetzner server delete is asynchronous — it returns 200 'action started'
while the firewall stays attached for 5-30s. The single-shot delete saw
the 422, swallowed it, and reported '0 firewalls deleted' while leaving
the firewall live (verified on otech50 2026-05-03).
Adds deleteFirewallWithRetry with exponential backoff (6s/12s/24s/48s,
5 attempts). PurgeReport gains FirewallsRetried + S3Buckets fields.
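The retry shape, hedged: Client, ErrResourceInUse and ErrNotFound
stand in for however the real hcloud wrapper surfaces the 422/404:

    func deleteFirewallWithRetry(ctx context.Context, hc Client, id int64) error {
        delay := 6 * time.Second
        for attempt := 1; ; attempt++ {
            err := hc.DeleteFirewall(ctx, id)
            if err == nil || errors.Is(err, ErrNotFound) {
                return nil // deleted, or already gone (idempotent success)
            }
            if !errors.Is(err, ErrResourceInUse) || attempt == 5 {
                return err // real failure, or retries exhausted
            }
            select {
            case <-time.After(delay): // 6s, 12s, 24s, 48s
            case <-ctx.Done():
                return ctx.Err()
            }
            delay *= 2
        }
    }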
Issue #706.
* feat(catalyst-api/wipe): add Hetzner Object Storage bucket purge
Adds PurgeBuckets() that empties + deletes the per-Sovereign Hetzner
Object Storage bucket via the S3 API. tofu destroy can't remove
`minio_s3_bucket` while objects are present, so 28 orphan buckets
accumulated from otech23..otech50 (audit 2026-05-03).
Sequence: BucketExists → ListObjectVersions → RemoveObjects (batch
1000) → ListIncompleteUploads → RemoveIncompleteUpload → RemoveBucket.
404 anywhere is idempotent success.
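A condensed sketch of that sequence using minio-go v7 (the real
helper's signatures may differ; the caller treats a missing bucket as
success):

    func purgeBucket(ctx context.Context, mc *minio.Client, bucket string) error {
        ok, err := mc.BucketExists(ctx, bucket)
        if err != nil || !ok {
            return err // nil when the bucket is already gone (idempotent)
        }
        // Stream every object version into batched multi-deletes
        // (RemoveObjects batches up to 1000 keys per call internally).
        objs := mc.ListObjects(ctx, bucket, minio.ListObjectsOptions{
            Recursive:    true,
            WithVersions: true,
        })
        for rm := range mc.RemoveObjects(ctx, bucket, objs, minio.RemoveObjectsOptions{}) {
            if rm.Err != nil {
                return rm.Err
            }
        }
        // Abort in-progress multipart uploads, then drop the bucket itself.
        for up := range mc.ListIncompleteUploads(ctx, bucket, "", true) {
            if err := mc.RemoveIncompleteUpload(ctx, bucket, up.Key); err != nil {
                return err
            }
        }
        return mc.RemoveBucket(ctx, bucket)
    }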
Issue #706.
* test(catalyst-api/wipe): firewall retry + bucket purge regression coverage
Adds purge_firewall_retry_test.go with three cases:
- TestFirewallRetry_Server_Detach_Async: 422 twice then 204 → 1 fw deleted
- TestFirewallRetry_Exhausted: always 422 → no fw deleted, error reported
- TestFirewallRetry_AlreadyGone_404: idempotent success path
Adds buckets_test.go with stubbed S3 endpoints exercising:
- BucketNameForSovereign/HetznerObjectStorageEndpoint contract
- empty bucket, 1500-version bucket (3 keys, multi-delete batches),
in-progress multipart upload abort, 404 idempotent, progress callback
Issue #706.
* fix(catalyst-api/wipe): wire bucket purge into WipeDeployment handler
After hetzner.Purge() returns (which now retries firewall delete on
422), call hetzner.PurgeBuckets() with the per-Sovereign Object Storage
credentials from dep.Request. Runs AFTER tofu destroy so the purge
never fights tofu state, and BEFORE local-record cleanup so the wizard
banner shows the count.
Skips with a logged warning when in-memory credentials are unavailable
(Pod restart between provision and wipe). The SSE log + UI banner now
report the s3-buckets count alongside the existing resource tallies.
Issue #706.
* feat(catalyst-ui): wipe banner now reports S3 buckets + firewall retries
Adds s3_buckets and firewalls_retried fields to the WipeReport
TypeScript shape and renders the new bucket count alongside the
existing servers/lbs/networks/firewalls/ssh-keys tally. When the
firewall retry counter is non-zero, surfaces it in a parenthetical so
operators see why the wipe took an extra few seconds.
Both the AppsPage Cancel & Wipe modal and the DecommissionPage success
view consume the same WipeReport interface so this single update
covers both surfaces.
Issue #706.
---------
Co-authored-by: hatiyildiz <hatice@openova.io>
* fix(bp-external-dns): livenessProbe.initialDelaySeconds=180 for cold-cluster cache-sync (closes #700)
PR #679 added --request-timeout=120s but external-dns has TWO timeouts:
RequestTimeout (per-API-call, controlled by --request-timeout) and
WaitForCacheSync (initial informer sync, hardcoded 60s in upstream binary,
NOT exposed as a flag). On a fresh Sovereign with k3s apiserver
CPU-saturated, the cache sync misses 60s -> fatal: failed to sync
*v1.Node: context deadline exceeded -> CrashLoopBackOff 5-10 times.
Caught live on otech49+ (2026-05-03), 5 restarts before stable.
Bump livenessProbe.initialDelaySeconds from upstream 10s default to 180s
so kubelet does NOT restart the Pod while the initial cache sync runs
against a CPU-saturated freshly-provisioned k3s apiserver. The Sovereign
apiserver reaches steady-state within ~2 min so 3 min comfortably covers
cold starts. Also bumps periodSeconds=30 + failureThreshold=3 so a
genuinely-hung pod is still killed within ~90s at steady state.
readinessProbe gets a corresponding initialDelaySeconds=30 so endpoint
flapping during sync doesn't churn services.
Helm overrides REPLACE whole maps (not merge), so the override preserves
the upstream httpGet.path: /healthz + port: http shape verbatim.
Bumps:
- platform/external-dns/chart/Chart.yaml: 1.1.5 -> 1.1.6
- clusters/_template/bootstrap-kit/12-external-dns.yaml: HelmRelease pin 1.1.5 -> 1.1.6
* fix(catalyst-api/jobs): bridge subscribes to helmwatch transition events (closes #695)
Wires the per-deployment jobs.Bridge directly to the helmwatch
Watcher's runtime event stream so every per-component HelmRelease
transition observed AFTER the initial-list seed advances the per-Job
state map. The wizard's /jobs page now reflects the live cluster state
instead of pinning Install rows to whatever the initial-list snapshot
saw at attach time.
Symptom (verified on otech48/49/50/52, 2026-05-03 14:40-19:20):
the wizard rendered Install rows as "running"/"pending" even after
`kubectl --context=otech<N> -n flux-system get hr` showed every
bp-* HelmRelease at Ready=True.
Wiring change:
helmwatch.Watcher.Subscribe(fn func(provisioner.Event)) — fan-out
callback registered alongside the primary `emit` sink. Every event
the Watcher dispatches reaches both sinks. Used by the handler at the
attachBridgeSeederHook + RefreshWatch construction sites:
watcher.Subscribe(func(ev provisioner.Event) {
if err := bridge.OnProvisionerEvent(ev); err != nil {
h.log.Warn("jobs bridge: runtime event forward failed",
"id", depID, "phase", ev.Phase,
"component", ev.Component, "err", err)
}
})
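The Subscribe side is a plain fan-out. A sketch with illustrative
field names:

    func (w *Watcher) Subscribe(fn func(provisioner.Event)) {
        if fn == nil {
            return // nil callback is a no-op (guarded by test below)
        }
        w.subscribers = append(w.subscribers, fn)
    }

    // dispatch sends every event to the primary emit sink and every subscriber.
    func (w *Watcher) dispatch(ev provisioner.Event) {
        w.emit(ev)
        for _, fn := range w.subscribers {
            fn(ev)
        }
    }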
Tests:
- internal/jobs/helmwatch_bridge_test.go::TestBridge_SeedThenRuntimeTransitions
seeds 3 pending HRs, asserts 3 pending jobs; emits Ready=True for
HR-1 → asserts 1 succeeded + 2 pending; emits Ready=Unknown for
HR-2 → asserts 1 succeeded + 1 running + 1 pending. Verifies
StartedAt / FinishedAt / DurationMs / LatestExecutionID stamps
too.
- internal/helmwatch/helmwatch_test.go::TestWatch_SubscribeFanOut
proves a Subscribe callback receives the same set of per-component
events as the primary emit, including the "ready for handover"
terminal event.
- internal/helmwatch/helmwatch_test.go::TestWatch_SubscribeNilIsNoop
guards against panic on nil callback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keycloak v26 dropped legacy 'requested_subject' token-exchange. The
auth_handover.go path still called kc.ImpersonateToken() which uses
that parameter, returning 400 'invalid_request'. PR #694 already
moved PIN-verify to local JWT minting via handoverSigner.SignCustomClaims;
apply the same pattern to /auth/handover.
Caught live on otech49 (2026-05-03):
ERROR auth_handover: ImpersonateToken failed
err=token endpoint 400: Parameter 'requested_subject' is not
supported for standard token exchange
Sovereign Keycloak still owns the canonical user record (created via
EnsureUser before token mint) — only the session-cookie minting
moves local. IdP brokering and federation paths are unaffected.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #692 moved the Sovereign-side JWK volume mount from
/var/lib/catalyst/handover-jwt-public.jwk (subPath, conflicted with
the catalyst-api PVC) to /etc/catalyst/handover-jwt-public/public.jwk
(directory mount). The chart sets CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH
to the new path, but the AuthHandover handler never read that env.
Result: auth_handover.go used the hardcoded default
/var/lib/catalyst/handover-jwt-public.jwk which no longer exists,
returning 401 'public key unavailable' on every handover.
Caught live on otech49 (2026-05-03):
ERROR auth_handover: load public key failed
err=read /var/lib/catalyst/handover-jwt-public.jwk: no such file
path=/var/lib/catalyst/handover-jwt-public.jwk
Fix:
- Resolution order: handler field -> env var -> default const
- Default const updated to the new path so cold-starts work without
the env var (defence in depth)
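The resolution order collapses to a three-step fallback. Sketch with
an illustrative method name; the env var and default path are the
real ones named above:

    func (h *Handler) handoverPublicKeyPath() string {
        if h.publicKeyPath != "" { // 1. explicit handler field (tests)
            return h.publicKeyPath
        }
        if p := os.Getenv("CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH"); p != "" {
            return p // 2. chart-provided env
        }
        // 3. default const, now the directory-mount path
        return "/etc/catalyst/handover-jwt-public/public.jwk"
    }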
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Sovereign-side /auth/handover handler is the ENTRY POINT that
establishes the session. The operator's browser arrives with the
handover JWT in the URL query and zero cookies. Putting the route
inside the RequireSession middleware group rejects every handover
with 401 {error:unauthenticated} before AuthHandover ever runs.
Caught live on otech49 (2026-05-03):
GET /auth/handover?token=<valid-jwt> -> 401 in 43us (middleware
rejection, no body log line emitted).
This was working on otech48 only because catalyst-api there had no
Keycloak credentials wired (kc-sa-credentials Secret was missing) so
GetAuthConfig() returned nil and RequireSession became a passthrough.
Once PR #691 wired the credentials cleanly on otech49, the gate
activated and broke the handover.
Fix: register the route at the top-level mux outside the auth group,
mirroring the same pattern as /api/v1/deployments/{id}/kubeconfig
(cloud-init postback that also has no cookies). The handler's own
JWT validation IS the authentication.
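A sketch of the routing split (mux shape illustrative; RequireSession
is the existing middleware group):

    func routes(h *Handler) *http.ServeMux {
        mux := http.NewServeMux()
        // Entry point: arrives with zero cookies. The handler's own JWT
        // validation IS the authentication.
        mux.HandleFunc("GET /auth/handover", h.AuthHandover)
        // Everything session-gated stays behind the middleware group.
        api := http.NewServeMux()
        api.HandleFunc("GET /api/v1/whoami", h.Whoami)
        mux.Handle("/api/v1/", RequireSession(api))
        return mux
    }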
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Helm parses the entire file (including YAML comments) for template
directives BEFORE YAML parsing strips comments. Literal '{{ ... }}'
inside a # comment was treated as a template directive and failed
with 'unexpected <.> in operand' at line 419.
PR #698 introduced this in the explanatory comment for the
SOVEREIGN_FQDN ConfigMap workaround. Reword to avoid the literal
double-curlies — the comment still describes the constraint without
tripping the Helm parser.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #692 added an inline Helm-template `value:` for SOVEREIGN_FQDN in
api-deployment.yaml. That broke contabo-mkt's catalyst-platform Flux
Kustomization (path: ./products/catalyst/chart/templates) because Kustomize
parses raw YAML and Helm `{{ ... }}` is not valid YAML syntax. Live error
on contabo at adf8dc7d:
kustomize build failed: yaml: invalid map key:
map[string]interface {}{".Values.global.sovereignFQDN | default \"\" | quote":""}
Replace the Helm-template form with `valueFrom.configMapKeyRef.optional:
true` so the same template renders cleanly under both consumers:
- contabo-mkt (Kustomize): ConfigMap `sovereign-fqdn` doesn't exist →
optional ref → env stays empty → catalyst-api on contabo never validates
handover JWTs anyway (it's the SIGNER, not the validator). Correct.
- Sovereigns (Helm via bp-catalyst-platform OCI chart): on apply, the
sovereign-tls Kustomization renders `sovereign-fqdn-configmap.yaml` with
envsubst on ${SOVEREIGN_FQDN}, creating the ConfigMap with the per-
Sovereign FQDN. catalyst-api Pod resolves the ref → env populated →
audience check works.
This restores the bridge between the two consumers without forking the
template. The bp-catalyst-platform 1.2.5 → 1.2.7 bump publishes the new
chart; bootstrap-kit overlay pin updated.
Will be verified on otech49 (next provision after this lands).
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(flow-canvas): variable-width depth columns + ResizeObserver debounce (#669 round 3)
Round-2 UAT showed:
1. Dense bucket of 30+ siblings piled at the right edge while 60% of
canvas (left side) sat empty with one bubble per depth.
2. The sim "trying, never stabilizing" during pane-transition
animations.
Root cause #1: round-2 used a constant `perDepthX` for every depth.
With one-bubble depths next to a 30+ sibling depth, the dense bucket
got 80% × perDepthX (~128 px) of horizontal room and had to pile into
8+ sub-columns; sparse depths each got the same perDepthX (~160 px)
for a single bubble. Net: 60% canvas unused on the left, dense
cluster jammed at right.
Round-3 fix #1: variable-width depth columns. Each depth gets a slot
whose width tracks its bucket's natural extent at radius R:
sparse buckets need 2R + small gap; dense buckets need
(totalCols - 1) * (2R + COLLIDE_PADDING) to fit sub-columns
side-by-side. depthToX returns the centerline of slot[depth];
adjacent slots are separated by `gap = clamp(r*4, MIN, MAX)`. Total
layout width = sum(slots) + gaps.
Root cause #2: ResizeObserver fired on every animation frame during
the 220ms padding-right transition (pane open/close). Every fire
called setHostSize, which retriggered layoutMetrics → R changed by
1-2 px → all node targets shifted → sim re-seeded → never settled.
Round-3 fix #2: 180ms debounce on the observer + 8 px epsilon gate
(sub-pixel changes ignored entirely). Combined with snap-to-4 on R
and snap-to-8 on slot widths in layoutMetrics, the metrics now hold
constant during pane-transition animations and the sim converges
once.
Tests: bounded layout (17) + JobDetail (5) all green; tsc -b clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(flow-canvas): sqrt-aspect dense buckets + tight grid clamps (#669 round 4)
Round-3 still piled the dense bucket at the right edge. Distribution
test on the founder's exact screenshot shape (1+1+30) showed the dense
slot occupied only 28% of total X-extent — better than round-2 (~13%)
but not enough.
Round-4 fix:
1. layoutMetrics targets a sqrt-aspect-ratio for dense buckets:
targetRows = round(sqrt(count / 1.6))
30 leaves → 4 rows × 8 cols → ~700 px slot at R=40, occupying
>50% of total X-extent. The densest bucket's targetRows now sets
R via vertical-fit, so wide buckets actually claim X-room rather
than collapsing into thin tall columns.
2. gridTargets reads cols/rows from layoutMetrics.slotInfo instead
of recomputing — guarantees the per-tick clamp uses the same
sub-grid dimensions as the slot-width math.
3. Per-cell clamp window narrowed to ±(pitch/2 - R) so the bubble
edge can never reach a neighbour's centre. Old clamp used the
full pitch which let forceCollide push bubbles into a neighbour's
territory and then ratcheted them in — centres could collapse to
<2R apart.
Adds FlowCanvasOrganic.distribution.test.tsx replicating the founder's
UAT screenshot (depth 0: 1, depth 1: 1, depth 2: 30). Asserts:
- depth-0 X < depth-1 X < depth-2 X (left-to-right)
- dense leafSpan ≥ 30% of total layout extent
- no centre-to-centre distance < 2R
All tests green: distribution (2/2), bounded (17/17), JobDetail (5/5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wizard surface is now anonymous-first. A visitor lands on
console.openova.io and runs the entire 7-step provisioning flow
without a session; auth fires only when they click Launch.
Frontend (catalyst-ui):
- Drop the wizardAuthGuard so the wizard route renders for anonymous
visitors. The existing zustand+persist store already keeps every
form field in localStorage with credential-hygiene partitioning
(Hetzner token, SSH private key, registrar token NEVER persisted),
so the guest-mode hydration on refresh works for free.
- New shared/lib/useSession hook polls /api/v1/whoami via React
Query; exposes signedIn / email / refetch / signOut.
- New widgets/auth/ProfileMenu in the wizard header — Sign in button
for anonymous, email-initial avatar with sign-out dropdown for
signed-in.
- New widgets/auth/PinSignInModal — two-stage email → 6-digit PIN
modal that POSTs /auth/pin/issue + /auth/pin/verify (issue #688).
Falls back to /auth/magic-link when the PIN endpoint is not
available, so this PR is shippable independent of #688's merge
order.
- StepReview Launch handler routes anonymous through the PIN modal;
on verify it stamps the verified email into orgEmail and POSTs
the deployment immediately.
- New /provision/* beforeLoad guard: anonymous → redirect to wizard
with a sessionStorage flash banner; signed-in cross-tenant gets
the canonical 404 from the API (no UI-side branch).
- New shared/lib/flashBanner — sessionStorage seam for the guard →
wizard banner hand-off.
Backend (catalyst-api):
- Add OwnerEmail to store.Record and handler.Deployment, stamped
from X-User-Email at CreateDeployment.
- New checkOwnership helper (sketched after this list) enforces 404
(NEVER 403) on cross-tenant access — never leak existence of someone
else's deployment via the response code. Legacy records
(OwnerEmail == "") pass through with a warning so in-place upgrade
does not lock operators out.
- Wired into GetDeployment, StreamLogs, GetDeploymentEvents,
WipeDeployment, GetKubeconfig, MintHandoverToken, ListJobs, and
GetJob. PutKubeconfig keeps its bearer-token auth (cloud-init
postback path).
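A hedged sketch of the helper; ErrNotFound and the exact signature are
illustrative, while OwnerEmail and the 404-not-403 rule come from the
bullets above:

    func checkOwnership(rec store.Record, sessionEmail string, log *slog.Logger) error {
        if rec.OwnerEmail == "" { // legacy record: warn, never lock out
            log.Warn("deployment has no owner; passing through", "id", rec.ID)
            return nil
        }
        if !strings.EqualFold(rec.OwnerEmail, sessionEmail) { // case-insensitive
            // 404, NEVER 403: the status code must not reveal that the
            // deployment exists at all.
            return ErrNotFound
        }
        return nil
    }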
Tests:
- Backend: deployments_owner_test.go covers legacy passthrough,
no-session passthrough, owner match (case-insensitive), the
load-bearing 404-not-403 cross-tenant assertion, and end-to-end
proof through GetDeployment + GetDeploymentEvents.
- Frontend: flashBanner round-trip + clear-on-read; useSession
signed-in / 401 / signOut paths; WizardLayout guest-mode
[Sign in] button + flash banner rendering.
Closes #689.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
otech48 incident (2026-05-03): all 37 bp-* HelmReleases on the Sovereign
cluster reached Ready=True, but the catalyst-api deployment record stayed
status=phase1-watching. Wizard's POST /mint-handover-token returned 409
not-handover-ready, blocking the auto-redirect to console.<sov>/auth/handover.
Root cause: helmwatch's terminate-on-all-done gate required len(observed) >=
MinBootstrapKitHRs. Chart shipped CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS=38,
but the actual bootstrap-kit cardinality had drifted to 37 — making the
gate permanently unsatisfiable. Watch ran until 60-minute WatchTimeout fired.
Fix: gate terminate-on-all-done on the informer's HasSynced signal instead
of the brittle count. After WaitForCacheSync returns the full bp-* set is
in the cache regardless of cardinality. MinBootstrapKitHRs stays as a
defence-in-depth floor (default lowered 11 → 1) for the empty-cache
footgun. Chart env CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS dropped to 1.
Implementation:
- helmwatch.Watcher: new informerSynced bool gate, set after
WaitForCacheSync. processEvent refuses to consider terminate-on-all-done
while informerSynced=false. After WaitForCacheSync, re-evaluate the
all-terminal check once on the synced cache (handles the rehydrate-
after-restart path where every HR is already Ready=True at attach).
- helmwatch.maybeEmitReadyTransition: emits the operator-visible
"All N blueprints reconciled. Sovereign ready for handover." SSE event
exactly once when the gate fires (idempotency guard against flicker
re-triggering the gate).
- handler.markPhase1Done: persistDeployment after status flip so the
on-disk JSON reflects status=ready before any wizard poll. Also
refuses to downgrade an already-adopted deployment if a late watcher
event tries to flap it.
- Tests: new transition_test.go with happy-path, idempotency, partial-
ready, realistic 37-HR convergence, and empty-cache scenarios. New
TestMarkPhase1Done_RefusesToDowngradeAdopted in phase1_watch_test.go.
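The gate itself stays small. A paraphrased sketch (method and field
names illustrative):

    func (w *Watcher) maybeTerminateAllDone() {
        if !w.informerSynced { // set only after WaitForCacheSync returns
            return
        }
        if len(w.observed) < w.minBootstrapKitHRs { // defence-in-depth floor (1)
            return
        }
        for _, hr := range w.observed {
            if !hr.Terminal() {
                return // at least one HR still reconciling
            }
        }
        w.emitReadyOnce() // "All N blueprints reconciled" fires exactly once
    }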
Will be verified live on otech49 (next provision after this lands):
- Wizard auto-shows "Open your Sovereign Console" button within 30s of
all HRs reaching Ready
- No manual API calls or kubectl exec needed to flip status
- catalyst-api logs show "All 37 blueprints reconciled" event in SSE buffer
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the magic-link login flow on console.openova.io with a paste-friendly
6-digit numeric PIN, modelled on bank/Google verification screens. Founder
rejected magic links because they look like phishing (2026-05-03).
## Backend (products/catalyst/bootstrap/api)
- New handler/pinstore.go — sync.Mutex-guarded in-memory map keyed by email
with 10-minute TTL, 60-second per-email rate limit, 3-attempt lockout, and
a background goroutine that sweeps expired entries every minute.
PINs are NEVER persisted to disk per credential-hygiene rules.
- handler/auth.go rewritten:
* POST /api/v1/auth/pin/issue — body {email}. EnsureUser in openova realm,
generate 6-digit PIN with crypto/rand (NEVER math/rand), store, send
plaintext email with prominent "3 7 2 4 5 8" code and NO clickable URL,
return {ok, requestId, expiresInSec}. Rate-limit 60s.
* POST /api/v1/auth/pin/verify — body {email, pin, requestId}. Atomic
verify+decrement, on match mint self-signed session JWT (same handover
signer; KC 24.7 removed legacy token-exchange) and set HttpOnly Secure
SameSite=Lax cookie. Wrong: 401 with attemptsRemaining. Locked/expired:
410. Stable error codes: pin-invalid / pin-expired / attempts-exceeded /
email-required / pin-rate-limited.
- Routes wired in cmd/api/main.go. Legacy /auth/magic and /auth/callback
redirect to /login?error=flow_changed for stale bookmarks.
- Handler struct gets a pinStore field; openovaKC keycloakClient kept for
the EnsureUser call.
- Tests: auth_pin_test.go (14 tests covering happy path, all error codes,
SMTP rollback, rate limit, request-mismatch) + pinstore_test.go (12 tests
on the store invariants).
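The PIN mint is the one non-negotiable detail: crypto/rand, never
math/rand. A sketch (helper name illustrative; imports: crypto/rand,
math/big, fmt):

    func newPIN() (string, error) {
        n, err := rand.Int(rand.Reader, big.NewInt(1_000_000)) // [0, 999999]
        if err != nil {
            return "", err
        }
        return fmt.Sprintf("%06d", n.Int64()), nil // zero-padded 6 digits
    }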
## Frontend (products/catalyst/bootstrap/ui)
- New PinInput6.tsx component — 6 inputs, inputmode=numeric, maxlength=1,
auto-advance focus, Backspace steps back, paste-anywhere splits clipboard
digits across boxes (extracts /\d/g), auto-submits on the 6th digit or
Enter. one-time-code autocomplete on box 0 for SMS prefill.
- LoginPage rewritten — single email field, "Send code" button, on success
navigates to /login/verify with email + requestId in the URL. PIN never
enters the URL.
- New VerifyPinPage — renders PinInput6, calls /pin/verify, on 401 shows
"Code incorrect, X attempts remaining", on 410 routes back to /login
with the error code, on 200 navigates to /wizard (or ?next=...).
- AuthCallbackPage stripped of magic-link code path; Catalyst-Zero branch
is now a 302 safety net for stale Keycloak redirect URIs.
- Router gets /login/verify route.
- 17 vitest cases on PinInput6 covering paste, typing, backspace, Enter,
pasting alphanumerics/long strings, controlled value, disabled state.
## DoD verification
- go test ./internal/handler/... -run "Pin|Handover|Auth" → PASS
(12 pinstore_test + 14 auth_pin_test + handover/auth tests)
- npm test src/components/PinInput6.test.tsx → 17 passed
- helm template products/catalyst/chart → renders without error
- Email body contains zero clickable URLs: TestSendPinEmail_NoMagicLinkURL
asserts ?token=, &token=, magic-link substrings absent
Closes #688
Co-authored-by: hatiyildiz <hatice@openova.io>
* fix(handover): provision Keycloak service-account credentials zero-touch (Phase-8b followup)
Sovereign-side catalyst-api needs Keycloak service-account credentials
to provision the operator's user during /auth/handover. Today the chart
references K8s Secret `catalyst-kc-sa-credentials` with keys addr/realm/
client-id/client-secret in the catalyst-system namespace — but no
zero-touch path materialised it. The dead SealedSecret template at
09a-keycloak-catalyst-api-secret.yaml had a different name AND different
keys (CATALYST_KC_*), used PLACEHOLDER_SEALED_VALUE markers no
provisioner replaced, and wasn't even listed in the bootstrap-kit
kustomization.
Symptom on otech48: GET /auth/handover?token=<valid-jwt> returns
"server misconfiguration: keycloak not configured"
(auth_handover.go:169).
Fix: bp-keycloak chart's configmap-sovereign-realm.yaml template now
emits the realm-import ConfigMap AND the catalyst-kc-sa-credentials
Secret in a single template scope so they share the same generated
client secret. Pattern mirrors platform/powerdns/chart/templates/
api-credentials-secret.yaml (canonical seam, ADR-0001 §11.3
anti-duplication).
Secret-value resolution order (first match wins):
1. operator-supplied .Values.catalystApiServerClientSecret
2. helm `lookup` of existing Secret in keycloak ns (idempotent)
3. fresh randAlphaNum 32 (zero-touch on first install)
The Secret carries the four keys exactly as the catalyst-api Pod's
secretKeyRef expects — addr / realm / client-id / client-secret —
with addr derived from gateway.host (https://auth.<sovereignFQDN>).
Reflector annotations auto-mirror the Secret to catalyst-system as
soon as that namespace materialises (bootstrap-kit slot 13).
The realm import already creates the catalyst-api-server client with
serviceAccountsEnabled + impersonation/manage-users/view-users/
query-users role mappings — so once Keycloak is Ready and the realm
imports, the SA is fully provisioned and the K8s Secret carries a
matching client secret. No post-install Job, no Admin-API script,
no out-of-band SealedSecret ceremony.
Cleanup: removes the dead 09a SealedSecret template (not in
kustomization, never produced a working Secret).
Bumps:
- bp-keycloak chart 1.3.0 -> 1.3.1
- clusters/_template/bootstrap-kit/09-keycloak.yaml HelmRelease
pin 1.3.0 -> 1.3.1
Existing per-Sovereign overlays (clusters/otech.omani.works/,
clusters/omantel.omani.works/) intentionally remain on 1.3.0 — fresh
otechN provisioning consumes _template at provision time.
Will be verified live on otech49 — handover end-to-end without ANY
manual Secret creation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(keycloak): bump blueprint.yaml spec.version to match chart 1.3.1
TestBootstrapKit_BlueprintCardsHaveRequiredFields/keycloak asserts
Chart.yaml.version == blueprint.yaml.spec.version. Forgot to bump
blueprint.yaml in the previous commit.
Note: 8 other blueprints (cert-manager, flux, crossplane, sealed-secrets,
spire, nats-jetstream, openbao, gitea) carry the same pre-existing
mismatch and the test fails on main too. Out of scope for this PR;
fixing the keycloak case to keep the new chart version internally
consistent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>