This is the comprehensive fix for the chroot Sovereign Console DoD
gaps caught on omantel.biz 2026-05-06. Eight pages were broken with
"Something went wrong!" / "Invariant failed" / "Couldn't load" /
"Not Found"; the root causes were (a) /api/v1/sovereign/self
returning 503 because env vars weren't populated post-handover,
(b) several Sovereign endpoints (/users, /catalog, /settings,
/topology) missing server-side, and (c) several pages using
strict-mode useParams against the mother-side /provision/$id/...
route, which throws an Invariant on the chroot /apps, /users,
/settings, and /app/$id routes.
Server changes:
- auth.Claims gains SovereignFQDN + DeploymentID fields.
- auth_handover.go authHandoverClaims gains the same; the minted
Sovereign session JWT now carries them so downstream handlers
can resolve identity without env or store-fallback.
- sovereign_self.go reads sovereign_fqdn / deployment_id from the
catalyst_session cookie payload (best-effort base64 decode; no
signature check needed since this catalyst-api minted the cookie
in the first place). Resolution order: env → cookie → store →
503/404.
- new handlers in sovereign_more.go:
GET /api/v1/sovereign/users — Keycloak realm users
GET /api/v1/sovereign/catalog — embedded blueprints catalog
GET /api/v1/sovereign/settings — tenant identity + features
GET /api/v1/sovereign/topology — hierarchical infra view
for CloudPage's getHierarchicalInfrastructure()
All return well-shaped empty responses on any error (no 500s
that bubble into UI error boundaries).
UI changes:
- SettingsPage / AppDetail / UserAccessListPage replace strict-mode
  useParams({ from: '/provision/$deploymentId/...' }) with
  useParams({ strict: false }) plus a useResolvedDeploymentId()
  fallback, so they now work on BOTH the mother route AND the chroot
  Sovereign route without throwing Invariant (see the sketch after
  this list).
- CatalogAdminPage's fetchApps swaps /catalog/apps →
  /api/v1/sovereign/catalog when window.location.hostname is not
  console.openova.io.
- getHierarchicalInfrastructure (CloudPage's source) swaps
  /api/v1/deployments/{id}/infrastructure/topology →
  /api/v1/sovereign/topology under the same chroot guard.
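A minimal sketch of the shared chroot-aware pattern, in TypeScript
(the isChroot helper and import paths are illustrative assumptions;
useParams, useResolvedDeploymentId, and the endpoint literals are the
names this PR touches):

```ts
import { useParams } from '@tanstack/react-router';
// Assumed location of the existing fallback hook:
import { useResolvedDeploymentId } from './useResolvedDeploymentId';

// Hypothetical guard: anything that isn't the mother console is chroot.
const isChroot = () => window.location.hostname !== 'console.openova.io';

function useDeploymentIdSafe(): string | undefined {
  // strict: false returns whatever params are in scope instead of
  // throwing Invariant when /provision/$deploymentId isn't mounted.
  const params = useParams({ strict: false }) as { deploymentId?: string };
  const resolved = useResolvedDeploymentId(); // unconditional: rules of hooks
  return params.deploymentId ?? resolved;
}

// The same guard drives the fetchApps endpoint swap:
const catalogUrl = () =>
  isChroot() ? '/api/v1/sovereign/catalog' : '/catalog/apps';
```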
Bumps bp-catalyst-platform 1.4.49 → 1.4.50.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
The /api/v1/sovereign/jobs endpoint returns a minimal shape
{id, name, namespace, kind, status, startedAt, finishedAt} — no
appId, parentId, dependsOn, childIds. JobsTable iterates
`for (const d of job.dependsOn)` and reads
`job.appId.toLowerCase()` etc., which throws TypeError
'Cannot read properties of undefined (reading length)' and
breaks the page render entirely (0 rows shown).
Coerce missing fields to safe defaults in defaultFetchJobs so
the table renders. Followup: server-side handler should return
the full Job shape with empty arrays for missing fields.
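A sketch of the coercion in defaultFetchJobs (the Job type and the
defaults are simplified assumptions; the field names come from this
commit):

```ts
type Job = {
  id: string; name: string; namespace: string; kind: string;
  status: string; startedAt?: string; finishedAt?: string;
  // Fields the minimal /api/v1/sovereign/jobs shape omits:
  appId: string; parentId: string | null;
  dependsOn: string[]; childIds: string[];
};

function coerceJob(raw: Partial<Job> & Pick<Job, 'id'>): Job {
  return {
    name: '', namespace: '', kind: '', status: 'pending', // assumed defaults
    ...raw,
    id: raw.id,
    // Safe defaults so `for (const d of job.dependsOn)` and
    // `job.appId.toLowerCase()` no longer throw on undefined:
    appId: raw.appId ?? '',
    parentId: raw.parentId ?? null,
    dependsOn: raw.dependsOn ?? [],
    childIds: raw.childIds ?? [],
  };
}
```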
Bumps bp-catalyst-platform 1.4.48 → 1.4.49.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
The useLiveJobsBackfill hook gates with `enabled: enabled && !!deploymentId`.
On chroot Sovereign Console where /sovereign/self returns 503
(deployment-id-not-yet-stamped) and the route doesn't carry an
:deploymentId param, deploymentId is the empty string and the query
NEVER mounts. Live jobs always remained empty, so mergeJobs fell
through to the reducer-derived imported snapshot (every job pinned
at 'pending').
Fix: when DETECTED_MODE.mode === 'sovereign', enable the query
regardless of deploymentId emptiness. The URL is FQDN-scoped via
the session cookie, so no deploymentId is needed in the path.
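A sketch of the gate change (the mode singleton and fetcher are
declared as assumptions; the enabled expression is the fix):

```ts
import { useQuery } from '@tanstack/react-query';

declare const DETECTED_MODE: { mode: string };        // existing singleton
declare function fetchLiveJobs(): Promise<unknown[]>; // existing fetcher

function useLiveJobsBackfill(enabled: boolean, deploymentId: string) {
  const isSovereign = DETECTED_MODE.mode === 'sovereign';
  return useQuery({
    queryKey: ['live-jobs', deploymentId],
    queryFn: fetchLiveJobs,
    // Before: enabled && !!deploymentId, never true on chroot (the route
    // carries no :deploymentId). After: sovereign mode enables the query
    // regardless; the endpoint is FQDN-scoped via the session cookie.
    enabled: enabled && (isSovereign || !!deploymentId),
  });
}
```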
Bumps bp-catalyst-platform 1.4.47 → 1.4.48.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
CloudPage's switcher rendered `d.id.slice(0, 8)` without a nullish
guard. When listDeployments returns an entry with undefined id (e.g.
malformed/legacy record), this throws TypeError 'Cannot read
properties of undefined (reading slice)' which the React error
boundary catches as 'Invariant failed', breaking all of /cloud.
Caught on omantel.biz 2026-05-06.
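The guard itself is a one-liner; a sketch (the fallback label is an
assumption):

```ts
// Render a placeholder instead of throwing when a malformed/legacy
// record has no id.
const shortId = (d: { id?: string }) => d.id?.slice(0, 8) ?? 'unknown';
```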
Also bumps the literal :91eeeed → :2122fb8 in api-deployment.yaml /
ui-deployment.yaml so freshly provisioned Sovereigns pick up the
JobsPage+AppsPage live-status fix from PR #1039 (chart 1.4.46's
values.yaml had :2122fb8 but the templated literals didn't).
Bumps bp-catalyst-platform 1.4.46 → 1.4.47.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
Symptom on omantel.biz 2026-05-06: every job and every app on the
Sovereign Console showed "Pending" forever, even when the underlying
HelmReleases were Ready=True and the cluster was fully operational.
Root cause:
- JobsPage's useLiveJobsBackfill was gated by `inFlight =
streamStatus !== 'completed' && streamStatus !== 'failed'`. The
imported snapshot that mother POSTs at handover ALWAYS arrives with
streamStatus="completed" (mother considered phase-1 done before
firing the JWT). So inFlight=false and disablePolling=true on
Sovereign mode → liveJobs.length=0 → mergeJobs returns the
reducer-derived imported snapshot (every job pinned at "pending").
- AppsPage read `state.apps[id].status` from the same imported
reducer state. No live-status overlay.
Fix:
- JobsPage: bypass the inFlight gate when DETECTED_MODE.mode ===
'sovereign'. Live polling /api/v1/sovereign/jobs is the
authoritative source on chroot Sovereign Console.
- AppsPage: add a useQuery polling /api/v1/sovereign/apps every 5s
  in Sovereign mode, mapping the server's status enum
  (installed | installing | bootstrap | available) to the UI's
  ApplicationStatus vocabulary and overlaying it on top of the
  reducer-derived status (sketched below).
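A sketch of the AppsPage overlay query (the UI-side status names,
hook shape, and response shape are assumptions; the server enum,
endpoint, and 5s interval come from this commit):

```ts
import { useQuery } from '@tanstack/react-query';

declare const DETECTED_MODE: { mode: string }; // existing singleton

// Server enum (left) to assumed UI ApplicationStatus names (right):
const STATUS_MAP: Record<string, string> = {
  installed: 'ready',
  installing: 'installing',
  bootstrap: 'installing',
  available: 'pending',
};

function useLiveAppStatuses() {
  return useQuery({
    queryKey: ['sovereign-apps'],
    queryFn: async () => {
      const res = await fetch('/api/v1/sovereign/apps', {
        credentials: 'include',
      });
      const apps: { id: string; status: string }[] = await res.json();
      // id -> mapped status; callers overlay this on the reducer status.
      return Object.fromEntries(
        apps.map((a) => [a.id, STATUS_MAP[a.status] ?? a.status]),
      );
    },
    enabled: DETECTED_MODE.mode === 'sovereign',
    refetchInterval: 5_000, // poll every 5s
  });
}
```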
Bumps bp-catalyst-platform 1.4.45 → 1.4.46.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
The Sovereign Console's chroot /cloud and /apps panes back onto
HandleSovereignCloud / HandleSovereignApps in catalyst-api, which
use the in-cluster client to enumerate cluster-wide K8s resources
(Nodes, Namespaces, Services, PVCs, StorageClasses, Ingresses,
HTTPRoutes, HelmReleases). The pre-existing ClusterRole only
covered the cutover-step Job-driving verbs (configmaps/jobs/pods).
Caught on otech130 2026-05-06: /api/v1/sovereign/cloud returned
{nodes:[], namespaces:[], …} because every List call hit a silent
apiserver Forbidden, and the handler's err branch falls through
to an empty response shape.
Adds get/list/watch on:
- core: nodes, namespaces, services, persistentvolumes,
persistentvolumeclaims
- networking.k8s.io: ingresses
- gateway.networking.k8s.io: httproutes, gateways
- storage.k8s.io: storageclasses
- helm.toolkit.fluxcd.io: helmreleases
Bumps bp-catalyst-platform 1.4.44 → 1.4.45.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
Chart 1.4.43 was built before PR #1032 bumped Chart.yaml in the
same commit, so its values.yaml had tag :91eeeed but the hardcoded
image refs in templates/api-deployment.yaml and
templates/ui-deployment.yaml stayed at :ff864e9 (the previous bump
from PR #1030). Sovereigns provisioned with chart 1.4.43 therefore
still have the duplicate-sidebar bug — caught on otech129
2026-05-05.
This bump pins the literal refs to :91eeeed, which is PR #1032's
commit SHA. The bootstrap-kit pin moves 1.4.43 → 1.4.44 so
otech130+ get the PortalShell skip-inner-Sidebar logic.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
Symptom on otech127 2026-05-05: every page on the Sovereign Console
rendered TWO overlapping sidebars, where the inner one had broken
URLs like /provision//jobs (empty $deploymentId after the slash).
Clicking sidebar links failed because the broken sidebar was on top
and intercepted clicks.
Root cause: SovereignConsoleLayout (the chroot-route layout) mounts
SovereignSidebar with clean-root URLs (/jobs, /apps, etc.). The page
component (e.g. JobsPage) wraps its content in PortalShell, which
ALSO mounts the older Sidebar with deploymentId-templated URLs
(/provision/$deploymentId/jobs). On the chroot route there's no
deploymentId path param, so TanStack Router renders /provision//jobs.
Fix: PortalShell skips its inner Sidebar when DETECTED_MODE.mode ===
'sovereign'. The outer SovereignSidebar (mounted by
SovereignConsoleLayout) is the correct chroot sidebar in that mode.
On mother-mode (/provision/$id/X) the inner Sidebar renders normally.
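A sketch of the change (the component and Sidebar shapes are
assumptions; the skip condition is the fix):

```tsx
import React from 'react';

declare const DETECTED_MODE: { mode: string }; // existing singleton
declare function Sidebar(): JSX.Element; // inner, deploymentId-templated

function PortalShell({ children }: { children: React.ReactNode }) {
  // On chroot Sovereign routes the outer SovereignSidebar (mounted by
  // SovereignConsoleLayout) is the correct sidebar; skipping the inner
  // one removes the /provision//jobs links and the click interception.
  const isSovereign = DETECTED_MODE.mode === 'sovereign';
  return (
    <div>
      {!isSovereign && <Sidebar />}
      <main>{children}</main>
    </div>
  );
}
```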
Bumps bp-catalyst-platform 1.4.42 → 1.4.43.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
The Sovereign Console at /sovereign/deployments rendered every row's FQDN
as a Link to=`/dashboard` regardless of which row was clicked. On contabo
(mother) this resolved to /sovereign/dashboard (the CURRENT user's
Sovereign), so clicking ANY row in the deployments list always
navigated to the same dashboard — breaking the operator's expectation
that "click row X to see deployment X's pages."
Fix: route each row to /provision/<row-id>/dashboard on the mother view
(Catalyst-Zero), and to /dashboard on the chroot Sovereign view (where
each Sovereign sees only its own deployment, so /dashboard is correct).
Mode resolved via the existing DETECTED_MODE singleton.
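A sketch of the mode-aware row link (the component shape is
illustrative; the $deploymentId param name follows the route literals
quoted elsewhere in this log):

```tsx
import { Link } from '@tanstack/react-router';

declare const DETECTED_MODE: { mode: string }; // existing singleton

function DeploymentRowLink({ row }: { row: { id: string; fqdn: string } }) {
  // Chroot Sovereign view: each tenant sees only its own deployment.
  if (DETECTED_MODE.mode === 'sovereign') {
    return <Link to="/dashboard">{row.fqdn}</Link>;
  }
  // Mother view (Catalyst-Zero): route to the clicked row's deployment.
  return (
    <Link
      to="/provision/$deploymentId/dashboard"
      params={{ deploymentId: row.id }}
    >
      {row.fqdn}
    </Link>
  );
}
```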
Bumps bp-catalyst-platform chart 1.4.40 → 1.4.41.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
Live on otech124 right now: /api/v1/sovereign/self returns 503
deployment-id-not-yet-stamped because:
- CATALYST_SELF_DEPLOYMENT_ID env is empty (orchestrator never patches
it, and #984's cutover-step-09-graduate idea wasn't merged either)
- The handler doesn't fall back to the local store
The deployment record IS imported on Sovereign (verified — POST
/api/v1/internal/deployments/import returns 200, persisted log
confirmed). Once the handler scans the store, /sovereign/self
returns the deploymentId and every chroot-aware UI Link
(/dashboard, /jobs, /apps, /cloud) finally renders correctly.
Without this, every <Link> built via useResolvedDeploymentId on
Sovereign mode produces /provision//<page> with empty id segment,
which the route validator rejects with 'Deployment id in the URL
is malformed' (founder report).
Closes the live regression on otech124.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the founder report on otech122, the Sovereign Console /jobs page
showed every job as 'Pending' — the imported deployment record's job
snapshot was captured at mother's phase1-watching state and frozen
forever.
The fix is small: useLiveJobsBackfill in Sovereign mode
(DETECTED_MODE.mode === 'sovereign') prefers /api/v1/sovereign/jobs,
which sovereign.go already exposes — it reads HelmRelease history +
recent K8s Jobs from the local cluster's apiserver via in-cluster
config and returns LIVE status. The /api/v1/deployments/<id>/jobs
path stays the default for the contabo monitor surface (mother view
of an in-flight provision — that's where the imported record IS the
canonical view).
Also added credentials:'include' so the cookie reaches the endpoint.
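A sketch of the preference logic (URL shapes and the credentials
option are from this commit; the fetcher shape is an assumption):

```ts
declare const DETECTED_MODE: { mode: string }; // existing singleton

async function fetchJobs(deploymentId: string): Promise<unknown[]> {
  const url =
    DETECTED_MODE.mode === 'sovereign'
      ? '/api/v1/sovereign/jobs' // live status from the local apiserver
      : `/api/v1/deployments/${deploymentId}/jobs`; // mother monitor surface
  // credentials:'include' so the session cookie reaches the endpoint.
  const res = await fetch(url, { credentials: 'include' });
  if (!res.ok) throw new Error(`jobs fetch failed: ${res.status}`);
  return res.json();
}
```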
Closes the user-reported 'all jobs Pending forever' on Sovereign
Console.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last leftover from PR #983's URL contract that the PR #992 revert undid.
PR #996 caught the auth_handover.go + router.tsx /console/dashboard
references but missed AuthCallbackPage.tsx:80. The Sovereign-side
PKCE callback after Keycloak login was navigating to a route that
doesn't exist in the consoleLayoutRoute tree.
Found while verifying otech124 mid-Phase-1.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three problems surfaced live on otech122 (founder report):
1. SovereignSidebar.tsx still has /console/X paths.
PR #983 originally fixed this. PR #984 introduced the same fix in a
different shape. PR #992 (revert of broken redirect chain) reverted
#984 and accidentally reverted #983's SovereignSidebar fix too —
both PRs touched the same nav literals. PR #998 re-fixed
Sidebar.tsx (mother) but missed re-fixing SovereignSidebar.tsx.
Symptoms: clicking Settings on console.<sov-fqdn> goes to
/console/settings (route doesn't exist → 'Not found'); other nav
items fall through to wizard-side /provision//<page> handlers.
2. AppsPage.tsx's app card row link is not chroot-aware.
On the mother monitor surface, the row's <Link to='/app/$id'>
escapes /sovereign/provision/<dep-id>/ to /sovereign/app/<id>.
Fix: same DETECTED_MODE-aware pattern as PR #1000 used for JobsTable
and FlowPage.
3. SovereignConsoleLayout's settings dropdown navigate also still
pointed at /console/settings — fixed inline.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same architectural reasoning as PR #1008 (subdomains/check). The wizard's
StepCredentials, StepDomain, StepCloud-creds and StepSSH all run BEFORE
the operator authenticates. Gating those endpoints on a session cookie
returned 401 to every anonymous visitor and blocked the only flow that
matters.
Move from rg (session-gated) to r (unauthenticated):
- /api/v1/credentials/validate (Hetzner token + project id)
- /api/v1/credentials/object-storage/validate (S3 creds)
- /api/v1/sshkey/generate (read-only ephemeral keypair)
- /api/v1/registrar/{r}/validate (Dynadot key+secret)
All four are read-only probes — they call the upstream API
(Hetzner/S3/Dynadot) with the operator-supplied credential and return
200/400 based on whether it works. No state change on success. The
upstream API itself is the auth gate (a wrong credential simply gets
rejected at the upstream).
/api/v1/registrar/{r}/set-ns stays in rg (session-gated) — it's
called from CreateDeployment which is itself post-auth.
Closes the wizard 401 the founder hit on Domain (BYO Dynadot) +
Credentials (Hetzner) steps trying otech with omantel.biz.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* deploy: re-bump chart literal :b45a49f → :8ec8c01 (mistake-rollback fix)
PR #1006 rolled back to :b45a49f because the catalyst-api pod was
ImagePullBackOff for ~30s while pulling :8ec8c01. The image was IN
GHCR; the pull just took time. Pod recovered to Running on :8ec8c01,
THEN my rollback kicked in and reverted to :b45a49f — losing the
wizard credentials fix from PR #1004 that the founder needed.
Re-bump forward. :8ec8c01 contains useSubdomainAvailability's
credentials:'include' fix that closes the wizard 401 → false-502.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalyst-api): make /api/v1/subdomains/check public (no session required)
The wizard's Domain step renders BEFORE the operator authenticates —
PIN issue + verify happen AFTER they pick a subdomain. Requiring a
session cookie on /api/v1/subdomains/check forced 401 on every
anonymous visitor and trapped logged-out operators in a 'check
unavailable' state.
Move the route from rg (session-gated) to r (unauthenticated). Same
model as /auth/pin/issue: read-only public-facing endpoint with no
state change. Information disclosure is negligible — 'is this
subdomain taken?' is what DNS itself answers to anyone with a
resolver.
The handler routes to PDM (managed pool) or DNS (BYO); both are
read-only. PDM has its own rate-limiting middleware on the public
ingress, so anonymous spam is bounded by that.
Closes the wizard 401 the founder hit on otech119 Domain step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Chart 1.4.35 referenced :8ec8c01 before the catalyst-build for that
SHA finished pushing to GHCR. Flux applied → catalyst-api pod stuck
ImagePullBackOff → wizard breaks ('worked few seconds then failed').
Roll the literal back to :b45a49f (the previous working SHA from
chart 1.4.34). Chart version stays 1.4.35 to avoid re-publishing
churn. The wizard credentials fix in :8ec8c01 will land when the
build catches up — at which point we manually re-bump the literal.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>