Commit Graph

1457 Commits

Author SHA1 Message Date
github-actions[bot]
1cde1a085f deploy: update catalyst images to b004820 2026-05-07 17:57:25 +00:00
e3mrah
b00482007e
fix(catalyst): /jobs/timeline page renders without crash (#1076)
* fix(catalyst): /jobs/timeline page renders without crash

Root cause: JobsTimeline used a strict useParams({ from:
'/provision/$deploymentId/jobs/timeline' }) call, which threw "Invariant
failed" inside useSyncExternalStoreWithSelector when the actual route
tree-match was the chroot consoleJobsTimelineRoute (path '/jobs/timeline'
— added in PR #1073). The throw bubbled into the React Error Boundary
and replaced the entire surface with the "Something went wrong! Show
Error" overlay.

Fix: switch to the canonical useResolvedDeploymentId() pattern that
JobsPage / NotificationsPage / Dashboard use — it reads the URL
:deploymentId param when present (mothership tenant route) and falls
back to /api/v1/sovereign/self when absent (chroot Sovereign route).
Same module owns both topologies; no behaviour change for the
mothership tenant route.
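
A minimal sketch of the resolver pattern, assuming the hook's shape
(the /api/v1/sovereign/self path is taken from this message; everything
else is illustrative, not the real module):

  import { useQuery } from '@tanstack/react-query';
  import { useParams } from '@tanstack/react-router';

  // Sketch only: resolve the deployment id from the URL param when the
  // mothership route supplies it, otherwise from /api/v1/sovereign/self.
  export function useResolvedDeploymentId(): string | undefined {
    const params = useParams({ strict: false }) as { deploymentId?: string };
    const self = useQuery({
      queryKey: ['sovereign-self'],
      queryFn: async () => {
        const res = await fetch('/api/v1/sovereign/self');
        if (!res.ok) throw new Error(`sovereign/self failed: ${res.status}`);
        return (await res.json()) as { deploymentId?: string };
      },
      enabled: !params.deploymentId, // chroot fallback only when no URL param
    });
    return params.deploymentId ?? self.data?.deploymentId;
  }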

Caught on console.omantel.biz QA pass 2026-05-07 (TC-050).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(catalyst): JobsTimeline header notes both routes

Refer to both /provision/$deploymentId/jobs/timeline (mothership) and
/jobs/timeline (Sovereign chroot) so future readers understand the
component is shared across topologies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 21:55:03 +04:00
github-actions[bot]
3fa187bc35 deploy: update catalyst images to 76830d9 2026-05-07 17:54:53 +00:00
e3mrah
76830d9c62
fix(catalyst): chroot — skip tenantDiscover polling, /auth/handover redirects authed user to / (#1077)
Two bugs surfaced live on console.omantel.biz on 2026-05-07.

TC-229 (P0) — chroot continuous /api/v1/tenant/discover 404 polling.
The Sovereign chroot's catalyst-api does not register the
tenant/discover endpoint (it is mother-only — only the Catalyst-Zero
apex `console.openova.io` knows about the tenant registry). The SPA's
bootstrapTenant() at app boot still ran on the chroot, returned 404,
and the SPA's React-Query layer kept re-issuing the call as the
Dashboard mounted/unmounted. 50+ HTTP 404 lines were captured during a
single Dashboard navigation. Fix: short-circuit bootstrapTenant() at
the single tenantDiscover.ts seam when DETECTED_MODE.mode ===
'sovereign'. Returns the existing 'unwired' status (no registry
available; proceed on the host's own identity), caches it so a second
call is a no-op, and never touches the network. Tenant identity on
chroot is already encoded in the session JWT (sovereign_fqdn /
deployment_id claims) so no registry payload is needed.
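
A sketch of the seam, assuming DETECTED_MODE and the cached result are
shaped roughly as below (names taken from this message):

  // Assumed shape of the mode detector referenced above.
  declare const DETECTED_MODE: { mode: 'sovereign' | 'mother' };

  type TenantBootstrap = { status: 'unwired' | 'resolved'; tenant?: unknown };

  let cached: TenantBootstrap | undefined;

  export async function bootstrapTenant(): Promise<TenantBootstrap> {
    if (cached) return cached;               // second call is a no-op
    if (DETECTED_MODE.mode === 'sovereign') {
      cached = { status: 'unwired' };        // no registry on the chroot;
      return cached;                         // identity lives in the JWT claims
    }
    const res = await fetch('/api/v1/tenant/discover');
    cached = res.ok
      ? { status: 'resolved', tenant: await res.json() }
      : { status: 'unwired' };
    return cached;
  }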

TC-004 (P1) — /auth/handover authenticated visit shows error page.
Fix #2 PR #1075 added the SPA-friendly handover-error page for browser
visits with no token. That branch fired even when the operator already
had a live catalyst_session cookie, so an authed user pasting the bare
/auth/handover URL saw "Handover incomplete" copy that confuses people
who are already logged in. Fix: add a three-way branch on no-token
visits — authenticated browser (302 to authHandoverRedirect, default
/dashboard), unauthenticated browser (existing 302 to handover-error
page from PR #1075), programmatic caller (existing 401 JSON contract
from auth_handover_test.go). New helper hasValidCatalystSession reads
the session token via auth.Config.ReadSessionToken (cookie / Bearer /
?access_token query — same channels RequireSession honours) and
validates it via auth.Config.ValidateToken (same path RequireSession
uses, including LocalPublicKey fallback for self-signed handover-
session JWTs). Returns false when authConfig is nil so unconfigured
Sovereigns / CI keep working unchanged.
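
The real handler is Go; a language-agnostic sketch of the three-way
decision (written in TypeScript for consistency with the other sketches
here) could look like:

  type HandoverAction = 'redirect-dashboard' | 'redirect-error-page' | 'json-401';

  // Sketch of the no-token branch order described above. The inputs are
  // illustrative; the real code reads them from the HTTP request.
  function decideMissingTokenResponse(opts: {
    hasValidSession: boolean;     // hasValidCatalystSession in this message
    isBrowserNavigation: boolean; // Accept: text/html or Sec-Fetch-Mode: navigate
  }): HandoverAction {
    if (opts.hasValidSession) return 'redirect-dashboard';     // 302 to authHandoverRedirect
    if (opts.isBrowserNavigation) return 'redirect-error-page'; // 302 to /auth/handover-error
    return 'json-401';                                          // programmatic caller contract
  }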

Tests: TestAuthHandover_MissingTokenAuthedRedirectsToDashboard
(raw-JWT cookie + Bearer header), MissingTokenExpiredSessionFallsThrough
(expired session falls through to error page),
MissingTokenNoAuthConfigKeepsHTMLBranch (nil authConfig keeps the
existing branches working). Existing missing-token tests unchanged.

Files touched (per Fix Author #6 brief):
- products/catalyst/bootstrap/ui/src/shared/lib/tenantDiscover.ts
- products/catalyst/bootstrap/api/internal/handler/auth_handover.go
- products/catalyst/bootstrap/api/internal/handler/auth_handover_test.go

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 21:52:21 +04:00
github-actions[bot]
56a568dc1c deploy: update catalyst images to 3dc9f42 2026-05-07 16:32:02 +00:00
e3mrah
3dc9f42c95
fix(catalyst): chroot SPA 404s for /cloud/legacy + /notifications + /readyz shadow + /auth/handover html error (#1075)
Five live bugs surfaced on console.omantel.biz 2026-05-07:

  TC-090..092  /cloud/architecture, /cloud/compute, /cloud/network/ingresses
               returned the SPA shell with TanStack Router default 404 in
               sovereign mode. The legacy redirects (LEGACY_CLOUD_REDIRECTS)
               were only mounted under the mothership /provision/$id/cloud
               subtree, never at root for sovereign mode.

  TC-160       /notifications returned the SPA shell + 404 because the only
               notifications route was /provision/$id/notifications and
               NotificationsPage hard-required the URL :deploymentId param
               via useParams({ from: '/provision/$deploymentId/notifications' }).

  TC-211       /readyz returned the SPA shell (HTTP 200 + index.html)
               instead of a real Go-handler probe response, because no
               Gateway rule routed it to catalyst-api — nginx try_files
               and the SPA catch-all both shadowed the path.

  TC-004       /auth/handover with no token returned raw 401 JSON
               {"error":"missing token parameter"} to browser visits,
               breaking the seamless-handover UX promise for stale
               email-link clicks.

Fixes:

* products/catalyst/chart/templates/httproute.yaml — Exact matches
  for /readyz and /healthz on the console hostname route to catalyst-api.
  External monitors pointing at console.<sov>/readyz now hit the real
  Go probe; pod-level k8s probes still hit nginx-internal /healthz.

* products/catalyst/bootstrap/api/internal/handler/auth_handover.go —
  Browser visits (Accept: text/html or Sec-Fetch-Mode: navigate) on
  the missing-token path 302-redirect to /auth/handover-error?reason=
  missing_token. Programmatic callers (Accept: application/json or no
  Accept header) keep the legacy 401 JSON contract that the test
  matrix pins. New tests cover both branches.

* products/catalyst/bootstrap/ui/src/app/router.tsx — Adds
  authHandoverErrorRoute (/auth/handover-error) with a friendly
  error surface; consoleNotificationsRoute (/notifications under the
  Sovereign console layout); consoleLegacyCloudRedirectRoutes
  (sovereign-mode siblings of legacyCloudRedirectRoutes, reusing
  LEGACY_CLOUD_REDIRECTS verbatim so the two redirect sets cannot
  drift). consoleCloudRoute gains validateSearch matching
  provisionCloudRoute.

* products/catalyst/bootstrap/ui/src/pages/sovereign/NotificationsPage.tsx —
  Replaces strict useParams({ from: '/provision/$deploymentId/...' })
  with useResolvedDeploymentId so the page works on both /provision/$id/
  notifications (URL param) and sovereign-mode /notifications
  (/api/v1/sovereign/self self-discovery). Mirrors the pattern used by
  JobsPage / SettingsPage / Dashboard.
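
A sketch of how the sovereign-mode redirect siblings from the router.tsx
bullet above can reuse LEGACY_CLOUD_REDIRECTS verbatim. Route names and
the from/to entry shape are assumptions; the redirect-in-beforeLoad
pattern is standard TanStack Router:

  import { createRoute, redirect } from '@tanstack/react-router';
  import { consoleLayoutRoute } from './routes';               // assumed export
  import { LEGACY_CLOUD_REDIRECTS } from './legacyRedirects';  // assumed export

  // One redirect route per legacy path, mounted under the Sovereign console
  // layout so /cloud/architecture etc. resolve at root in sovereign mode.
  export const consoleLegacyCloudRedirectRoutes = LEGACY_CLOUD_REDIRECTS.map(
    ({ from, to }: { from: string; to: string }) =>
      createRoute({
        getParentRoute: () => consoleLayoutRoute,
        path: from,
        beforeLoad: () => {
          throw redirect({ to });
        },
      }),
  );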

Verification:
  helm template products/catalyst/chart  — clean
  npm run build                          — clean (1.88MB bundle, vite v8)
  npx tsc --noEmit                       — clean
  go build ./...                         — clean
  go test -run TestAuthHandover_MissingToken — PASS (legacy + new HTML branch)

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 20:29:49 +04:00
github-actions[bot]
5a1216992d deploy: update catalyst images to 369b60e 2026-05-07 16:18:19 +00:00
e3mrah
369b60ec5c
fix(catalyst): chroot EventSource auth via access_token query param — unblocks 13 cloud list views (#1074)
The chroot Sovereign Console SPA performs its own PKCE OIDC flow with
Keycloak and stores the access_token in sessionStorage. installFetchAuthInterceptor
patches window.fetch to attach Authorization: Bearer to /api/v1/* calls
— but the EventSource browser API does NOT support custom request
headers. The chroot also has no PIN-minted catalyst_session cookie
(operator authenticates via Keycloak, not PIN), so withCredentials:true
sent nothing. Result: every /api/v1/sovereigns/<id>/k8s/stream connection
landed in 401 → SPA rendered "Stream temporarily unreachable". Affected
tests: TC-066 services, TC-067 ingresses, TC-071 pods, TC-072 deployments,
TC-073 statefulsets, TC-074 daemonsets, TC-075 replicasets, TC-076
configmaps, TC-078 namespaces, TC-079 nodes, TC-080 persistentvolumes,
TC-081 endpointslices, TC-086 pods.

Fix follows the standard SSE auth pattern used by Grafana / Loki:
accept the access token as a `?access_token=<jwt>` URL query parameter,
validate it through the same JWKS path as Authorization: Bearer.

BE — products/catalyst/bootstrap/api/internal/auth/session.go:
ReadSessionToken now consults three channels in order: (1) Authorization:
Bearer header, (2) ?access_token=<jwt> query parameter, (3) catalyst_session
cookie. Same JWT-shape (3 base64url segments) sanity check before
ValidateToken so a malformed value short-circuits to 401 with no JWKS
round-trip. The query-param path NEVER displaces the header when both
are present (header wins) — preserves the live-fetch source of truth
when an old ?access_token= is left in the address bar after a refresh.
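
The change itself is Go (auth/session.go); this TypeScript sketch only
illustrates the channel precedence and the cheap JWT-shape check:

  function readSessionToken(opts: {
    authorizationHeader?: string; // "Bearer <jwt>"
    accessTokenQuery?: string;    // ?access_token=<jwt>
    sessionCookie?: string;       // catalyst_session
  }): string | undefined {
    const fromHeader = opts.authorizationHeader?.replace(/^Bearer\s+/i, '').trim();
    // Precedence: header, then query param, then cookie — the header always wins.
    const token = fromHeader || opts.accessTokenQuery || opts.sessionCookie;
    if (!token) return undefined;
    // Cheap JWT-shape check (3 base64url segments) before any JWKS validation,
    // so a malformed value can be rejected without a network round-trip.
    return token.split('.').length === 3 ? token : undefined;
  }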

BE — products/catalyst/bootstrap/api/cmd/api/main.go:
Replaced chi's middleware.Logger with a custom pathOnlyLogFormatter
(implementing chi's middleware.LogFormatter) that emits r.URL.Path only
— never r.RequestURI. Critical for credential hygiene per CLAUDE.md §10:
chi.DefaultLogFormatter writes RequestURI verbatim, which would leak
the access_token query parameter to stdout. The new logger emits
structured slog fields (method/path/status/elapsedMs/remote) instead.

FE — useK8sCacheStream.ts + useK8sStream.ts:
Both EventSource consumers now read loadTokens() from sessionStorage and
append `&access_token=<accessToken>` to the URL when an OIDC token is
present. Mother (Catalyst-Zero) sessions store no OIDC tokens, so the
param is omitted and the existing catalyst_session cookie path is unchanged.
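
A sketch of the EventSource side (the sessionStorage key and stream URL
builder are assumed shapes; the query-param name comes from this message):

  function openK8sStream(deploymentId: string, kinds: string[]): EventSource {
    const url = new URL(
      `/api/v1/sovereigns/${deploymentId}/k8s/stream`,
      window.location.origin,
    );
    url.searchParams.set('kinds', kinds.join(','));
    const accessToken = sessionStorage.getItem('oidc_access_token'); // assumed key
    if (accessToken) {
      // EventSource cannot send Authorization headers, so the OIDC token rides
      // as a query parameter on the chroot; mother sessions store no token and
      // keep the catalyst_session cookie path unchanged.
      url.searchParams.set('access_token', accessToken);
    }
    return new EventSource(url.toString(), { withCredentials: true });
  }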

Tests:
- 8 new Go tests in session_test.go covering all 7 channel
  permutations + JWT-shape validation + whitespace handling.
- 2 new vitest cases in useK8sStream.test.ts asserting the URL contains
  access_token=<jwt> when sessionStorage has an OIDC token, and omits
  it on mother (cookie-only path).

Verification:
  $ go build ./... && go test ./internal/auth/... → ok
  $ npm run typecheck && npm run build → ok
  $ npx vitest run src/lib/useK8sStream.test.ts → 11/11 passing
  $ curl -i 'https://console.omantel.biz/.../k8s/stream?kinds=pod' → 401
    (will return 200 + SSE frames after deploy)

Risk surface: a stale ?access_token= URL in the operator's address bar
will be rejected with 401 once the JWT expires, surfacing as the same
"Stream temporarily unreachable" banner. The SPA's existing reconnect
loop drives a fresh EventSource on every retry, which picks up the
freshest token from sessionStorage — so the failure mode is self-healing
on the next browser-driven retry.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 20:15:54 +04:00
github-actions[bot]
23558f90a7 deploy: update catalyst images to 67e55eb 2026-05-07 16:13:56 +00:00
e3mrah
67e55ebb0b
fix(catalyst): /jobs/timeline router precedence + bp-spire/keycloak detail copy (#1073)
Sovereign Console (chroot, console.<sov-fqdn>) was missing the static
/jobs/timeline route entirely — TanStack Router fell through to the
dynamic /jobs/$jobId route with jobId='timeline', rendering the
'Job not found' surface. The mothership /provision/$deploymentId/jobs
tree already had the correct precedence (timeline before $jobId);
this PR ports the same pattern to consoleLayoutRoute children.

Also corrects a stale comment in applicationCatalog.ts that listed
bp-spire among the bootstrap kit. The generated BOOTSTRAP_KIT (sourced
from clusters/_template/bootstrap-kit/) does not include spire — it is
a tier-up selection. Documents that /app/bp-spire correctly renders
'App not found' on Sovereigns where the operator did not select it.

Caught on console.omantel.biz QA pass 2026-05-07 (TC-050).

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-07 20:11:38 +04:00
github-actions[bot]
a8da886a18 deploy: update catalyst images to 0286276 2026-05-07 13:19:06 +00:00
hatiyildiz
02862769cf fix(catalyst): JobDetail crash on Phase-0 jobs (undefined appId.startsWith)
The Phase-0 lifecycle jobs I added in PR #1072 have empty appId
(they are NOT Sovereign components). The Job struct serialises
appId with omitempty → undefined on the wire. FlowPage.tsx (the
canvas embedded inside JobDetail) called j.appId.startsWith('bp-')
unguarded, throwing TypeError 'Cannot read properties of undefined
(reading startsWith)' the moment any Phase-0 job appeared in the
merged jobs list. The whole JobDetail page crashed under the React
Error Boundary — exactly what the founder caught on
/jobs/install-tempo and /jobs/install-catalyst-platform.

Fix: coerce j.appId to '' before .startsWith and fall back to
j.jobName when bare is empty. Also skip empty-bare entries from
the liveIdByBare map.
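
A sketch of the guard (the real FlowPage logic derives more than this;
the point is the coercion and the fallback):

  type LiveJob = { appId?: string; jobName: string };

  // Never call .startsWith on a possibly-undefined appId, and fall back to
  // jobName when the coerced value is empty.
  function bareIdFor(j: LiveJob): string {
    const appId = j.appId ?? '';       // Phase-0 lifecycle jobs carry no appId
    if (appId.startsWith('bp-')) return appId;
    return appId !== '' ? appId : j.jobName;
  }

  // Building the lookup map: skip empty-bare entries entirely.
  function buildLiveIdByBare(jobs: LiveJob[]): Map<string, LiveJob> {
    const map = new Map<string, LiveJob>();
    for (const j of jobs) {
      const bare = j.appId ?? '';
      if (bare !== '') map.set(bare, j);
    }
    return map;
  }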

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:16:51 +02:00
github-actions[bot]
cbb653a938 deploy: update catalyst images to 0316c44 2026-05-07 13:12:38 +00:00
hatiyildiz
0316c444e1 fix(catalyst): chroot JobDetail 'Job not found' + graph WorkerNode duplicates
User found two bugs after the previous round, both verified live:

1. /jobs/install-tempo (and every other deep-link) rendered "Job
   not found" because useLiveJobsBackfill keyed its React Query on a
   constant 'sovereign' string. First render fired with empty
   deploymentId (useResolvedDeploymentId hadn't resolved yet) →
   /api/v1/deployments//jobs → 400. When the real id arrived, the
   query key DIDN'T change, so React Query kept the failed cache and
   never refetched. JobDetail's jobsById stayed empty → Job not
   found banner. Fix: include resolved deploymentId in the queryKey
   AND gate enabled on !!deploymentId so the first fetch waits.

2. /cloud?view=graph showed duplicate WorkerNodes (8 instead of 4)
   because the cloud-side topology synth emitted node id
   'node-<k8s-name>' while the k8sAdapter emits bare '<k8s-name>'.
   mergeGraphs couldn't dedupe across the prefix mismatch. Fix:
   topology_loader synth now uses the bare K8s node name as the
   topology id so WorkerNode composite ids match exactly.
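
A sketch of the query-key fix in (1); hook internals are assumptions,
the key/enabled shape is the point:

  import { useQuery } from '@tanstack/react-query';

  function useLiveJobsBackfill(deploymentId: string | undefined) {
    return useQuery({
      // Including the resolved id in the key means the cache entry created by
      // an early empty-id render can never be reused once the real id arrives.
      queryKey: ['live-jobs', deploymentId],
      queryFn: async () => {
        const res = await fetch(`/api/v1/deployments/${deploymentId}/jobs`);
        if (!res.ok) throw new Error(`jobs fetch failed: ${res.status}`);
        return res.json();
      },
      enabled: !!deploymentId, // wait for useResolvedDeploymentId before the first fetch
    });
  }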

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:10:17 +02:00
github-actions[bot]
46d868738e deploy: update catalyst images to d7c8c47 2026-05-07 12:24:22 +00:00
hatiyildiz
d7c8c47f8c fix(catalyst): apps status — ignore reducer's default-pending init on chroot
Previous fix's fallback chain skipped to state.apps[app.id]?.status
which is 'pending' by default for every app at reducer init, never
reaching the 'available' fallback. Now: live API status wins; SSE
reducer state honoured only when it's an explicit non-pending
transition; on Sovereign mode with live query loaded, missing
app.id falls to 'available' (AVAILABLE pill) instead of 'pending'.
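
A sketch of the precedence described above (status names are assumptions):
live API status, then an explicit non-default reducer state, then
'available' on a Sovereign once the live query has loaded.

  type AppStatus = 'pending' | 'running' | 'failed' | 'succeeded' | 'available';

  function resolveAppStatus(opts: {
    apiStatus?: AppStatus;
    reducerStatus?: AppStatus;   // 'pending' is the reducer's init default
    isSovereignMode: boolean;
    liveQueryLoaded: boolean;
  }): AppStatus {
    if (opts.apiStatus) return opts.apiStatus;
    if (opts.reducerStatus && opts.reducerStatus !== 'pending') return opts.reducerStatus;
    if (opts.isSovereignMode && opts.liveQueryLoaded) return 'available';
    return 'pending';
  }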

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:22:17 +02:00
github-actions[bot]
de309e149a deploy: update catalyst images to 2f97710 2026-05-07 12:19:26 +00:00
hatiyildiz
2f97710be4 fix(catalyst): apps fallback to AVAILABLE not PENDING when no API entry
componentGroups.ts references blueprints not in blueprints.json
(KEDA, Axon, Debezium, Envoy, frpc, NetBird, etc) — data drift
between the two catalog sources. The FE was rendering these as
PENDING (implying install in progress) instead of AVAILABLE
(implying not yet deployed). Default to 'available' when no API
or reducer state exists so the operator sees the right
call-to-action pill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:17:01 +02:00
github-actions[bot]
f376ee4551 deploy: update catalyst images to 1a85a9b 2026-05-07 12:11:54 +00:00
hatiyildiz
1a85a9b226 fix(catalyst): chroot /jobs lifecycle seed runs even when bootstrap-kit children already in store
The early-return guard (existing>0) short-circuited the lifecycle seed
on every Sovereign that had previously seeded the bootstrap-kit
children. Split the guard so the provisioner-group seed fires
independently when missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:09:22 +02:00
github-actions[bot]
15bf2f28cc deploy: update catalyst images to 4a171b0 2026-05-07 12:06:40 +00:00
e3mrah
4a171b00d8
fix(catalyst): chroot /jobs Phase-0 + /cloud topology synth + AVAILABLE pill (#1072)
Three issues raised on console.omantel.biz, each verified live in
Playwright BEFORE this fix and to be re-verified after deploy:

1. /jobs missing Phase-0 lifecycle rows. Only the 40 install-* rows
   from bootstrap-kit children showed; tofu-init/plan/apply/output and
   cluster-bootstrap rows were absent because those Job records live
   on the mother only. Fix: chrootSeedJobsStoreIfEmpty now also calls
   bridge.SeedProvisionerJobs() + MarkProvisionerComplete() so the
   chroot view shows the full deployment history under a "Provision
   Hetzner" group, all stamped Succeeded.

2. /cloud kind=clusters / node-pools / vclusters / load-balancers
   rendered "No clusters yet". The topology loader required the
   deployment record's Regions to be non-empty; the chroot's
   synthesised Deployment has empty Regions. Fix:
   topology_loader.buildTopology now falls through to a chroot path
   that lists live K8s Nodes via the in-cluster dynamic client,
   groups them by `node.kubernetes.io/instance-type` to derive
   NodePools, and emits one Region/Cluster carrying every real Node.
   lookupDeploymentForInfra now also calls chrootEnsureDeployment so
   the chroot path actually fires.

3. KEDA (and 14 other catalog items) showed "PENDING" pill with no
   install affordance — confusing because PENDING is what in-flight
   installs render. Fix: introduce ApplicationStatus='available' as
   a distinct value; map API status="available" to it; render an
   "AVAILABLE" pill (accent-tinted, distinct from neutral PENDING)
   so the operator sees the right call-to-action.

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:03:59 +04:00
github-actions[bot]
d45fa4a8b4 deploy: update catalyst images to 8e631eb 2026-05-07 11:28:11 +00:00
e3mrah
8e631ebd05
fix(catalyst): chroot Sovereign Console OIDC bearer auth + self synth id (#1071)
The chroot Sovereign Console SPA performs its own PKCE OIDC flow
(client-side token exchange — no server-minted catalyst_session
cookie). Until now, every /api/v1/* fetch from the chroot 401'd
because the BE's session middleware ONLY read catalyst_session
cookie. The user observed: /apps showed all 36 apps as "pending"
(liveAppsQuery 401 → fell back to wizard frozen state); /jobs
appeared limited; /cloud, /dashboard etc all degraded.

Three coupled fixes:

1. BE session middleware now ALSO accepts Authorization: Bearer
   <jwt>. ValidateToken handles signature verification against the
   same JWKS regardless of whether the JWT arrived via cookie or
   header. (auth/session.go: ReadSessionToken)

2. FE installs a global window.fetch interceptor at boot
   (main.tsx → installFetchAuthInterceptor). When the SPA holds an
   OIDC access_token in sessionStorage (Sovereign Console only,
   never on mother), every /api/v1/ fetch automatically picks up
   Authorization: Bearer. Mother (cookie-based) is a transparent
   no-op since sessionStorage has no token.

3. HandleSovereignSelf now also reads SOVEREIGN_FQDN env (the
   chroot's standard sovereign-fqdn ConfigMap entry — same name
   used by k8scache.factory.go). When no deployment id resolves
   from any source, synthesise "sovereign-<fqdn>" — matching the
   k8scache self-register convention so /api/v1/sovereigns/{id}/*
   handlers' chroot-aliasing finds the same single registered
   cluster the FE is targeting.
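
A sketch of the interceptor from (2); the storage key is an assumption,
the mother no-op behaviour comes from this message:

  // Installed once at boot. On the Sovereign Console an OIDC access_token sits
  // in sessionStorage; on the mother there is none, so this is a no-op.
  export function installFetchAuthInterceptor(): void {
    const originalFetch = window.fetch.bind(window);
    window.fetch = (input: RequestInfo | URL, init?: RequestInit) => {
      const url =
        typeof input === 'string' ? input : input instanceof URL ? input.href : input.url;
      const token = sessionStorage.getItem('oidc_access_token'); // assumed key
      if (token && url.includes('/api/v1/')) {
        const headers = new Headers(init?.headers);
        if (!headers.has('Authorization')) {
          headers.set('Authorization', `Bearer ${token}`);
        }
        return originalFetch(input, { ...init, headers });
      }
      return originalFetch(input, init);
    };
  }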

End-to-end: a fresh-cutover Sovereign Console serves real-time
apps + jobs + cloud data to operators who logged in via direct
Keycloak (no handover JWT), no per-deployment cutover-import
step required.

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:26:03 +04:00
github-actions[bot]
deaf74270a deploy: update catalyst images to 118b9eb 2026-05-07 08:31:47 +00:00
e3mrah
118b9eb67d
fix(catalyst): durable Phase-0 jobs + chroot post-cutover live data (#1070)
Three coupled fixes for what the user observed post-cutover on
console.omantel.biz:

1. JobsTable rows for tofu-init/plan/apply/output/cluster-bootstrap
   disappeared the moment bootstrap-kit children landed. Root cause:
   those rows were synthesised on the FE from the SSE event reducer;
   when liveJobs from the BE arrived, mergeJobs() switched to backend-
   only and the reducer-derived rows vanished.

   Fix: register the 5 Phase-0 lifecycle phases as durable Job records
   under a new "provisioner" group inside jobs.Store. The bridge now
   transitions them through Pending → Running → Succeeded/Failed as
   the provisioner emits its named-phase events; "tofu" stdout/stderr
   stream lines append to the currently-active phase's Execution.
   /jobs/tofu-apply (and the four siblings) now resolve from the very
   first emit and never disappear when the BE feed takes over.

2. /api/v1/sovereigns/<id>/k8s/stream returned 404 on every chroot
   post-cutover, so /cloud?view=list&kind=services and every other
   k8scache-backed view rendered "Stream temporarily unreachable".
   Root cause: the chroot's k8scache.Factory.FromEnv self-register
   path needed a deployment id, but cutover never imports the mother's
   record AND step-07 only patches CATALYST_GITOPS_REPO_URL — not
   CATALYST_SELF_DEPLOYMENT_ID. Result: chroot deferred forever, no
   informers, no clusters registered.

   Fix: factory.go now derives a stable "sovereign-<fqdn>" id from
   SOVEREIGN_FQDN when no other id resolves, so the chroot self-
   registers exactly one cluster on every Sovereign. The k8s handlers
   alias any incoming URL cluster id onto that single chroot cluster
   when SOVEREIGN_FQDN is set, so existing FE that targets the
   mother's deployment id keeps working byte-identically.

3. /api/v1/deployments/<id>/jobs returned every job as Pending with
   no Started/Duration/exec-logs because chrootSeedJobsStoreIfEmpty's
   in-memory ownership-check gate never matched (no deployment record
   imported). Fix: jobs.go now synthesises an in-memory Deployment
   record from SOVEREIGN_FQDN on first read, so the lazy seed fires
   and converts the live HelmRelease state into rich Job records.

Together these mean post-cutover Sovereign Consoles serve real-time
data for ALL future Sovereigns without any per-deployment cutover
import step required.

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:29:33 +04:00
github-actions[bot]
3b930793c5 deploy: update catalyst images to 25f1446 2026-05-07 07:29:52 +00:00
e3mrah
25f14469d3
fix(provisioner): map wizard's three-mode domain selector to tofu's binary pool/byo enum (#1069)
Caught live on omantel.biz re-provision (deploymentId ab0bf689620f4102):
tofu plan failed at exit 1 with:

  Error: Invalid value for variable
    on variables.tf line 296:
   296: variable "domain_mode" {
      ├────────────────
      │ var.domain_mode is "byo-manual"
    Domain mode must be 'pool' or 'byo'.

The wizard's StepDomain has three options (pool / byo-manual /
byo-api) so the UX can branch the operator into the right flow:

  - pool:        OpenOva owns the parent zone via Dynadot+PDM
  - byo-manual:  operator pastes NS records into their registrar
  - byo-api:     operator's registrar API drives NS automatically

The OpenTofu module's `variable "domain_mode"` validation only
accepts the binary pool/byo distinction — from the cloud-infra layer
(Hetzner servers, network, LB) NONE of those wizard distinctions
matter; tofu only needs to know whether to call Dynadot at apply
time. The three-mode wizard value was being written verbatim to the
tfvars without mapping.

Add `mapDomainModeForTofu(wizardMode)` helper:
  - "pool"      → "pool"
  - "byo-manual"→ "byo"
  - "byo-api"   → "byo"
  - empty       → "byo"  (test path that doesn't set the field)
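
The helper lives in the Go provisioner; in TypeScript terms the mapping
is simply:

  // Sketch of the mapping only — the real helper is Go and writes the tfvars.
  function mapDomainModeForTofu(wizardMode: string): 'pool' | 'byo' {
    return wizardMode === 'pool' ? 'pool' : 'byo'; // byo-manual, byo-api, empty → byo
  }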

Bump chart 1.4.83 → 1.4.84.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:26:50 +04:00
github-actions[bot]
adda972dd8 deploy: update catalyst images to 0a0b912 2026-05-06 20:35:36 +00:00
e3mrah
0a0b912e0d
fix(wizard): KServe was wrongly under Always Included on every Sovereign (#1068)
* fix(hetzner-purge): close volumes/primary_ips/floating_ips gap — wipe was leaving Crossplane orphans

Founder caught the gap on omantel.biz post-decommission: Hetzner
console showed 0 servers/LBs/IPs but 1 Volume + 2 Networks + 1
Firewall lingering. Networks/Firewall were the existing async-detach
window (handled by name-prefix fallback in the next provision); the
**Volume** was a hard miss — Purge() never called /v1/volumes.

Root cause: post-handover, the Hetzner Cloud Volume CSI driver
allocates Hetzner Volumes for every CNPG/Harbor/Loki/Mimir
StatefulSet PVC. tofu state never tracks them. When the operator
decommissions, `tofu destroy` is a no-op for the Volume and the
existing label-sweep didn't list /v1/volumes either. Result: orphan
volumes accrue cloud cost across re-provision cycles.

Same architectural gap for primary_ips (CCM-allocated for LoadBalancer
services since Hetzner's 2023 IP-decoupling) and floating_ips
(rare in Catalyst stack but listed for completeness).

Fix: extend Purge() + purgeByNamePrefix() to walk three additional
endpoints in dependency order:

  servers → load_balancers → firewalls → networks → ssh_keys
  → volumes (after servers detach)
  → primary_ips (after LBs free their IPs)
  → floating_ips

Both label-pass AND name-prefix-pass cover all 8 kinds. PurgeReport
extended with Volumes/PrimaryIPs/FloatingIPs slices; Total() updated.

CSI-named volumes (`pvc-<uid>` form) won't match either pass — those
need the canonical `catalyst.openova.io/sovereign=<fqdn>` label which
the Crossplane composition for VolumeClaim must apply. That
composition-layer fix is tracked separately; this PR closes the
wipe gap for everything labelled OR name-prefixed.

Bump chart 1.4.80 → 1.4.81.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(wizard): KServe was wrongly under Always Included on every Sovereign

Founder caught on console.openova.io/sovereign/wizard step 4: KServe
appeared in the "Always Included" section as if every Sovereign had
to install it. False positive — KServe is conditionally mandatory
ONLY when the operator opts into the CORTEX (AI/ML) product family.

Two coupled bugs:

(1) Data model: kserve was tagged tier:'mandatory' inside the CORTEX
    product family, but tier:'mandatory' is consumed everywhere in
    the wizard as "always-on regardless of family selection":
      - componentGroups.ts:543 — seedIds.add(c.id) → auto-selected at
        wizard init for every Sovereign
      - applicationCatalog.ts:97 — seeded into the apps grid
      - store.ts:642 — special-cased as undeselectable
      - StepComponents.tsx — surfaced under "Always Included" tab
    Demote to tier:'recommended'. CORTEX has
    cascadeOnMemberSelection:true so picking any CORTEX member (vLLM,
    Specter, BGE, Milvus, …) still auto-pulls KServe via the cascade
    — that's the right semantics. KServe stays visible under CORTEX
    in Tab 1 ("Choose Your Stack") and locks-in once CORTEX is
    selected.

(2) UI filter: AlwaysIncludedTab was iterating every PRODUCTS entry
    regardless of product.tier and listing every member with
    component.tier === 'mandatory'. That mixes the platform-mandatory
    layer (PILOT/SPINE/SURGE/SILO/GUARDIAN tier:'mandatory' families)
    with conditional-mandatory members of opt-in families
    (CORTEX/RELAY tier:'optional', INSIGHTS/FABRIC tier:'recommended').
    Filter by product.tier === 'mandatory' so only the always-on
    families' mandatory members appear. Defence-in-depth — even if a
    new opt-in family ships with internal-mandatory members, they
    won't leak into "Always Included".
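
A sketch of the corrected filter; field names are assumptions based on
the componentGroups.ts references above:

  type Tier = 'mandatory' | 'recommended' | 'optional';
  type Component = { id: string; tier: Tier };
  type ProductFamily = { id: string; tier: Tier; members: Component[] };

  // Only always-on families contribute their mandatory members to
  // "Always Included"; opt-in families are excluded regardless of member tier.
  function alwaysIncluded(products: ProductFamily[]): Component[] {
    return products
      .filter((p) => p.tier === 'mandatory')
      .flatMap((p) => p.members.filter((c) => c.tier === 'mandatory'));
  }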

Audit confirmed kserve was the only offender across all 9 product
families today. PILOT/SPINE/SURGE/SILO/GUARDIAN remain unchanged
(their members rightfully tier:'mandatory'); CORTEX kserve fixed;
others have no internal mandatories.

Bump chart 1.4.81 → 1.4.82.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:33:19 +04:00
github-actions[bot]
9b4376fba7 deploy: update catalyst images to b233202 2026-05-06 20:10:53 +00:00
e3mrah
b233202b65
fix(hetzner-purge): close volumes/primary_ips/floating_ips gap — wipe was leaving Crossplane orphans (#1067)
Founder caught the gap on omantel.biz post-decommission: Hetzner
console showed 0 servers/LBs/IPs but 1 Volume + 2 Networks + 1
Firewall lingering. Networks/Firewall were the existing async-detach
window (handled by name-prefix fallback in the next provision); the
**Volume** was a hard miss — Purge() never called /v1/volumes.

Root cause: post-handover, the Hetzner Cloud Volume CSI driver
allocates Hetzner Volumes for every CNPG/Harbor/Loki/Mimir
StatefulSet PVC. tofu state never tracks them. When the operator
decommissions, `tofu destroy` is a no-op for the Volume and the
existing label-sweep didn't list /v1/volumes either. Result: orphan
volumes accrue cloud cost across re-provision cycles.

Same architectural gap for primary_ips (CCM-allocated for LoadBalancer
services since Hetzner's 2023 IP-decoupling) and floating_ips
(rare in Catalyst stack but listed for completeness).

Fix: extend Purge() + purgeByNamePrefix() to walk three additional
endpoints in dependency order:

  servers → load_balancers → firewalls → networks → ssh_keys
  → volumes (after servers detach)
  → primary_ips (after LBs free their IPs)
  → floating_ips

Both label-pass AND name-prefix-pass cover all 8 kinds. PurgeReport
extended with Volumes/PrimaryIPs/FloatingIPs slices; Total() updated.

CSI-named volumes (`pvc-<uid>` form) won't match either pass — those
need the canonical `catalyst.openova.io/sovereign=<fqdn>` label which
the Crossplane composition for VolumeClaim must apply. That
composition-layer fix is tracked separately; this PR closes the
wipe gap for everything labelled OR name-prefixed.

Bump chart 1.4.80 → 1.4.81.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:08:50 +04:00
github-actions[bot]
f958643dc7 deploy: update catalyst images to daeff32 2026-05-06 19:00:38 +00:00
e3mrah
daeff32cbe
fix(cloudpage): hoist k8sStream above ctx — TS use-before-declaration broke build-ui (#1066)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly
   from the now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers don't need a pod restart's restoreFromStore to see the
   chroot-imported deployment.

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.
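
A sketch of both halves; helper names beyond those in this message are
illustrative:

  // 1. Traefik-safe link builder: strip the "<deploymentId>:" prefix so the
  //    path segment never needs a %3A.
  function jobHref(canonicalId: string): string {
    const bare = canonicalId.includes(':') ? canonicalId.split(':').pop()! : canonicalId;
    return `/jobs/${encodeURIComponent(bare)}`;
  }

  // 2. Dual index so either URL form resolves.
  function indexJobs(jobs: { id: string; jobName: string }[]) {
    const byId = new Map<string, { id: string; jobName: string }>();
    for (const j of jobs) {
      byId.set(j.id, j);      // canonical "deploymentId:jobName"
      byId.set(j.jobName, j); // bare jobName
    }
    return byId;
  }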

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the
JWT cookie's deployment_id claim via /api/v1/sovereign/self, falling
back from URL params. Topology query also gates on `!!deploymentId`
so it doesn't waste a 404 round-trip during cookie resolution.

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.

  /settings/marketplace   → MarketplaceSettings  (wrapped in PortalShell)
  /parent-domains         → ParentDomainsPage    (wrapped in PortalShell)
  /catalog                → CatalogAdminPage     (deleted)

Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
  - /catalog route registration in router.tsx
  - 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from ActiveSection union + deriveActiveSection regex

The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.

Bump chart 1.4.64 → 1.4.65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(apps): publish chip on each card — replaces deleted /catalog page

Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" — drop
the standalone /catalog page (#1058), put the publish toggle on each
/apps card.

Backend (catalyst-api):
- New file sme_catalog_client.go — best-effort client for the
  in-cluster SME catalog microservice at
  http://catalog.sme.svc.cluster.local:8082. 30s response cache,
  1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier
  not deployed on this Sovereign — common when marketplace.enabled
  is false).
- HandleSovereignApps decorates each app with `marketplacePublished`
  *bool joined by slug from the SME catalog. nil ⇒ slug not in SME
  catalog (bootstrap component, or marketplace not deployed) ⇒ FE
  suppresses the chip.
- New handler HandleSovereignAppPublish at PATCH
  /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
  Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME
  catalog. Surfaces upstream status verbatim. Invalidates the cache
  so the next /apps poll reflects the change immediately.

Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of
  the bare status map.
- Each AppCard with a non-null marketplacePublished renders a
  PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
  PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil →
  no chip (correct: nothing to toggle).
- Cards with marketplace.enabled=false render no chips at all (SME
  catalog unreachable → nil for every slug).
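
A sketch of the chip's toggle handler; the endpoint path comes from this
message, the query key and error handling are assumptions:

  import { useMutation, useQueryClient } from '@tanstack/react-query';

  function useTogglePublish(slug: string) {
    const queryClient = useQueryClient();
    return useMutation({
      mutationFn: async (published: boolean) => {
        const res = await fetch(`/api/v1/sovereign/apps/${slug}/publish`, {
          method: 'PATCH',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ published }),
        });
        if (!res.ok) throw new Error(`publish toggle failed: ${res.status}`);
      },
      // Refetch the apps list so the chip reflects the SME catalog's new state.
      onSuccess: () => queryClient.invalidateQueries({ queryKey: ['sovereign-apps'] }),
    });
  }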

Bump chart 1.4.66 → 1.4.67.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code

Audit triggered by founder asking if PRs #1051..#1059 reach NEW
Sovereigns or just my manual `kubectl set image` patches on omantel.
Answer was: nothing reached anyone except omantel via manual patches.
Both contabo AND every fresh Sovereign would install :2122fb8 — the
SHA frozen at PR #1040's last manual chart-touch on May 6 morning.

Root cause:
- chart/templates/api-deployment.yaml + ui-deployment.yaml carry
  LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"),
  not Helm-templated `{{ .Values.images.catalystApi.tag }}`.
- catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag
  on every push — but no template reads from it. Dead code.
- contabo's catalyst-platform Flux Kustomization at
  ./products/catalyst/chart/templates applies these as raw manifests.
- Sovereigns Helm-install the same chart; Helm passes the literal
  through unchanged.
- Both ended up frozen at whatever literal was committed at the last
  manual chart-touching PR.

Fix:
1. CI's deploy step now bumps both the literal SHAs in the two
   template files AND the unused-but-kept-for-SME-services
   values.yaml. Sed-patches the literal directly so contabo's Kustomize
   path keeps working.
2. The commit step adds the two templates to the staged set alongside
   values.yaml, so every "deploy: update catalyst images to <sha>"
   commit propagates to contabo (10-min reconcile) AND Sovereigns
   (next OCI chart publish via blueprint-release).
3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with
   the latest literal (currently :8361df4) gets republished and
   pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.

Why drop the "freeze contabo" intent of the previous comment:
The previous comment said contabo auto-roll on every PR was bad
because PR #975's image broke contabo (k8scache startup loop).
The answer there is to fix the bug in the code, not to freeze contabo.
Freezing masked real divergence — the reason the founder caught
this is that manual omantel patches were the only thing keeping
omantel current while contabo + every other fresh Sovereign quietly
ran 9 PRs behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane

Founder asked: "make the real-time k8s information propagation
development reused — find the reverted prior work and implement the
final working one."

History:
- PR #358 (May 1) shipped the full informer + SSE data plane:
  internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics}
  + handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) +
  UI hook lib/useK8sStream.ts + widget useK8sCacheStream.
- PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream
  with kinds=namespace,node,pv,pod,deployment,...,server.hcloud,
  volume.hcloud and `&initialState=1` for live cloud-graph deltas.
- PR #981 hotfix dropped the synchronous discovery probe in
  factory.go:AddCluster (it was calling
  core.Discovery().ServerResourcesForGroupVersion(gv) with NO context
  timeout — on a kubeconfig pointing at a decommissioned otech the
  call hung the catalyst-api startup for minutes per dead cluster).

After #981 the discovery-probe surgery was clean — no follow-up
broke. The data plane code stayed in the codebase. The remaining
gap was operational, not architectural:

  On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>),
  the catalyst-api boots without a posted-back kubeconfig in
  /var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns []
  → factory has zero clusters → every
  /api/v1/sovereigns/{depId}/k8s/* request 404s with
  "sovereign \"...\" not registered". The architecture-graph
  in-flight call confirmed live on omantel.biz today.

Fix in this PR:

1. **k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN
   env is set (chroot mode), build a ClusterRef with id resolved from
   CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by
   scanning /var/lib/catalyst/deployments/*.json for a record matching
   the FQDN (mirrors HandleSovereignSelf's store-fallback path for
   consistency). DynamicClient + CoreClient built from
   rest.InClusterConfig(). Append to the cluster list. Mother behavior
   unchanged — SOVEREIGN_FQDN unset → branch is a no-op.

2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide
   get/list/watch on every kind in the k8scache registry (pods,
   deployments, statefulsets, daemonsets, replicasets, services,
   endpointslices, ingresses, configmaps, secrets, persistentvolumes,
   persistentvolumeclaims, hcloud.crossplane.io managed resources,
   vclusters), plus authorization.k8s.io/subjectaccessreviews so the
   per-event SAR gating in the SSE handler doesn't 403 silently.

3. Bump chart 1.4.70 → 1.4.71.

The discovery-probe failure mode that triggered the original revert
(synchronous ServerResourcesForGroupVersion blocking startup) does
NOT recur here — InClusterConfig() returns immediately, NewForConfig
is lazy, and the first network call happens inside the informer
goroutine after Start, off the boot critical path. Mother-side
LoadClustersFromDir behavior is untouched (no probe, just kubeconfig
file parsing as it has been since #981).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): + More popover escapes overflow clip + graph centers via gravity force

Two cloud-page bugs caught live on omantel.biz:

(1) /cloud?view=list&kind=clusters → +More popover non-functional.
    The popover renders at its anchor coords but pointer events pass
    through to the toolbar below it. Diagnosis:
        .cloud-page-toolbar > [data-testid="cloud-kind-chips"] {
          overflow-x: auto;
        }
    Per CSS spec, when one overflow axis is non-visible, the OTHER
    axis becomes auto/hidden too. So overflow-x:auto on the chips
    strip silently sets overflow-y:auto, which clips the absolutely-
    positioned popover that hangs DOWN from the +More button.

    Fix: render the popover via React.createPortal to document.body
    so it's outside any overflow ancestor. Position via fixed
    coordinates computed from the +More button's
    getBoundingClientRect, recomputed on resize/scroll. Click-outside
    dismissal updated to check both wrapper AND portaled popover.

(2) /cloud?view=graph → bubbles drift to canvas edges, leaving the
    centre empty until enough nodes (e.g. worker nodes) are added
    to anchor things via link tension.

    Two coupled root causes:

    a) `forceCenter` only adjusts the centroid — it shifts ALL
       nodes uniformly so their average sits at (cx, cy). It does
       NOT pull individual nodes inward. With small node counts
       and high charge repulsion (-160 for ≤50 nodes), nothing
       opposes outward drift.

    b) `makeForceBound` was a HARD clamp: `if (n.x < minX) n.x =
       minX`. Nodes that hit the wall get arrested with their
       velocity preserved on the perpendicular axis but no inward
       impulse → they slide along the wall and stack at corners.
       The simulation never relaxes back to the centre.

    Fix:
    a) Add forceX(cx) + forceY(cy) with `centerGravity` strength
       per node-count tier (0.08 for ≤50, scaling down with
       larger graphs where link tension is sufficient). This pulls
       every individual node toward the centre proportional to its
       offset.
    b) Replace the hard clamp with an elastic bounce: when a node
       hits the boundary, reverse its velocity component (×0.4
       damping) instead of zeroing it. Energy returns to the
       system, the simulation actually relaxes.

Bump chart 1.4.72 → 1.4.73.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloud): expose all live K8s kinds in +More popover + chip counts + tighter graph centering

Founder feedback (after PR #1062 lit up the data plane):
1. The +More popover was missing pods, deployments, statefulsets,
   daemonsets, configmaps, secrets, namespaces, etc. — it only
   carried the 6 placeholder kinds the legacy topology API knew
   about.
2. Several chips (Services, Ingresses, Storage Classes) showed "—"
   for count even though the data IS in the live cluster (visible in
   the graph view).
3. The graph view still pushed bubbles to canvas edges; only adding
   worker nodes brought things back. The previous gravity tuning
   wasn't strong enough for ~300 nodes.

This PR addresses all three.

(1) Eleven new K8s-backed list pages exposed in +More:
    Pods, Deployments, StatefulSets, DaemonSets, ReplicaSets,
    ConfigMaps, Secrets, Namespaces, Nodes, PersistentVolumes,
    EndpointSlices.
    Plus replaced the placeholder Services and Ingresses pages with
    live K8s tables.

    All built on a new generic K8sListPage that subscribes to
    /api/v1/sovereigns/{depId}/k8s/stream (same SSE channel the
    architecture-graph already uses) and renders a typed-column
    table per kind. Columns are declared once per kind in
    kindsPages.tsx; the rendering is uniform so adding a kind is a
    ~12-line wrapper.

(2) CloudPage.kindCounts now folds the live K8s snapshot into the
    chip-count map. KIND_TO_REGISTRY in kinds.ts maps each chip id
    to the registry kind name (pods → 'pod' etc). Counts that came
    from null (data not available) flip to live counts the moment
    the SSE stream's initialState=1 arrives.

(3) GraphCanvas physics retuned for live-data scale:
    - centerGravity: 0.08→0.18 for ≤50 nodes, 0.06→0.16 for ≤200,
      0.04→0.14 for ≤1000, 0.03→0.10 for ≤5000, 0.02→0.08 for >5000.
      The forceX/forceY pulls every individual node toward (cx,cy)
      proportional to its offset — 2-3× stronger than the original
      tuning so the canvas centre stays populated.
    - Charge softened: -160→-90 for ≤50 nodes, scaled down through
      every tier. The previous values were calibrated against a
      ~20-node topology stub; live data delivers 10-50× more nodes
      per Sovereign so charge needs to relax proportionally.

Bump chart 1.4.74 → 1.4.75.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud-list): share single SSE subscription via CloudContext — list pages were stuck connecting

After PR #1064 the +More popover was correctly populated and chip
counts were live, but clicking through to a list page (e.g.
/cloud?view=list&kind=pods) hung at "Connecting to live cluster
stream…" while the chip count beside the same kind already showed
the right number (110 pods).

Diagnosis: the K8sListPage was calling useK8sCacheStream with kinds:[kind],
opening its OWN EventSource. The parent CloudPage already had an
EventSource open (subscribing to all kinds — the source of the chip
counts). Two long-lived SSE streams from the same browser to the
same origin starve the connection budget; the second connection
hangs at "connecting" while the first holds the slot.

Fix: hoist the snapshot via CloudContext. CloudPage is already the
owner of the page-level useK8sCacheStream invocation; expose its
snapshot/status/revision through the existing useCloud() context.
K8sListPage now reads from useCloud() instead of opening a duplicate
stream. Single subscription, single source of truth for both chip
counts AND list rows.
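
A sketch of the shared-subscription shape: CloudPage owns the single
useK8sCacheStream invocation and exposes its result through context;
list pages consume it instead of opening a second EventSource. Field
names are assumptions:

  import { createContext, useContext } from 'react';

  type K8sStreamState = { snapshot: unknown; status: string; revision: number };

  const CloudContext = createContext<K8sStreamState | null>(null);

  // K8sListPage calls this instead of useK8sCacheStream, so only one SSE
  // connection exists per /cloud view.
  export function useCloud(): K8sStreamState {
    const ctx = useContext(CloudContext);
    if (!ctx) throw new Error('useCloud must be used under CloudPage');
    return ctx;
  }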

Bump chart 1.4.76 → 1.4.77.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloudpage): hoist k8sStream above ctx — was used before declaration

PR #1065 added k8sStream into the ctx useMemo deps but the
useK8sCacheStream() call was at line 396, well after the ctx build at
line 290. tsc -b caught it: TS2448/TS2454 use-before-declaration. CI
build-ui failed.

Move the useK8sCacheStream invocation to immediately precede the ctx
build. No behaviour change.

Bump chart 1.4.78 → 1.4.79.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:58:25 +04:00
e3mrah
f02136a89c
fix(cloud-list): share single SSE via CloudContext — list pages were stuck connecting (#1065)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly
   from the now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers don't need a pod restart's restoreFromStore to see the
   chroot-imported deployment.

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.
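
Hedged sketch of the two coupled changes; helper names approximate the ones
mentioned above and the exact signatures are assumptions.

    // 1. Link builder: emit the bare jobName so no %3A ever reaches Traefik.
    function jobLinkFor(deploymentId: string, jobId: string): string {
      const bare = jobId.startsWith(`${deploymentId}:`)
        ? jobId.slice(deploymentId.length + 1)
        : jobId;
      return `/jobs/${encodeURIComponent(bare)}`;
    }

    // 2. Index jobs by BOTH canonical id and bare jobName so either URL form resolves.
    interface Job { id: string; name: string }
    function indexJobs(jobs: Job[]): Map<string, Job> {
      const byId = new Map<string, Job>();
      for (const job of jobs) {
        byId.set(job.id, job);   // e.g. "69e73b3abe673840:install-keycloak"
        byId.set(job.name, job); // e.g. "install-keycloak"
      }
      return byId;
    }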

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the
URL :deploymentId param when present and otherwise resolves the JWT
cookie's deployment_id claim via /api/v1/sovereign/self. Topology
query also gates on `!!deploymentId`
so it doesn't waste a 404 round-trip during cookie resolution.
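
A sketch of the gated query, assuming a React Query setup; the topology path and
hook signature here are illustrative, not the literal CloudPage code.

    import { useQuery } from '@tanstack/react-query';

    // Real hook exists per this commit; signature assumed for the sketch.
    declare function useResolvedDeploymentId(): string | undefined;

    function useTopology() {
      const deploymentId = useResolvedDeploymentId();
      return useQuery({
        queryKey: ['topology', deploymentId],
        queryFn: () =>
          fetch(`/api/v1/deployments/${deploymentId}/infrastructure/topology`) // path assumed
            .then((r) => { if (!r.ok) throw new Error(String(r.status)); return r.json(); }),
        // Don't fire against /deployments/undefined/... while the cookie resolves.
        enabled: !!deploymentId,
      });
    }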

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.

  /settings/marketplace   → MarketplaceSettings  (wrapped in PortalShell)
  /parent-domains         → ParentDomainsPage    (wrapped in PortalShell)
  /catalog                → CatalogAdminPage     (deleted)

Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
  - /catalog route registration in router.tsx
  - 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from ActiveSection union + deriveActiveSection regex

The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.

Bump chart 1.4.64 → 1.4.65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(apps): publish chip on each card — replaces deleted /catalog page

Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" — drop
the standalone /catalog page (#1058), put the publish toggle on each
/apps card.

Backend (catalyst-api):
- New file sme_catalog_client.go — best-effort client for the
  in-cluster SME catalog microservice at
  http://catalog.sme.svc.cluster.local:8082. 30s response cache,
  1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier
  not deployed on this Sovereign — common when marketplace.enabled
  is false).
- HandleSovereignApps decorates each app with `marketplacePublished`
  *bool joined by slug from the SME catalog. nil ⇒ slug not in SME
  catalog (bootstrap component, or marketplace not deployed) ⇒ FE
  suppresses the chip.
- New handler HandleSovereignAppPublish at PATCH
  /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
  Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME
  catalog. Surfaces upstream status verbatim. Invalidates the cache
  so the next /apps poll reflects the change immediately.

Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of
  the bare status map.
- Each AppCard with a non-null marketplacePublished renders a
  PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
  PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil →
  no chip (correct: nothing to toggle).
- Cards with marketplace.enabled=false render no chips at all (SME
  catalog unreachable → nil for every slug).
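
A sketch of the chip's toggle wiring, assuming React Query; the PATCH path
matches the backend route above, while the hook name and query key are
illustrative assumptions.

    import { useMutation, useQueryClient } from '@tanstack/react-query';

    function usePublishToggle(slug: string) {
      const queryClient = useQueryClient();
      return useMutation({
        mutationFn: (published: boolean) =>
          fetch(`/api/v1/sovereign/apps/${slug}/publish`, {
            method: 'PATCH',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ published }),
          }).then((r) => { if (!r.ok) throw new Error(String(r.status)); }),
        // Refetch the live apps query so the chip reflects the new state.
        onSuccess: () => queryClient.invalidateQueries({ queryKey: ['live-apps'] }), // key assumed
      });
    }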

Bump chart 1.4.66 → 1.4.67.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code

Audit triggered by founder asking if PRs #1051..#1059 reach NEW
Sovereigns or just my manual `kubectl set image` patches on omantel.
Answer was: nothing reached anyone except omantel via manual patches.
Both contabo AND every fresh Sovereign would install :2122fb8 — the
SHA frozen at PR #1040's last manual chart-touch on May 6 morning.

Root cause:
- chart/templates/api-deployment.yaml + ui-deployment.yaml carry
  LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"),
  not Helm-templated `{{ .Values.images.catalystApi.tag }}`.
- catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag
  on every push — but no template reads from it. Dead code.
- contabo's catalyst-platform Flux Kustomization at
  ./products/catalyst/chart/templates applies these as raw manifests.
- Sovereigns Helm-install the same chart; Helm passes the literal
  through unchanged.
- Both ended up frozen at whatever literal was committed at the last
  manual chart-touching PR.

Fix:
1. CI's deploy step now bumps both the literal SHAs in the two
   template files AND the unused-but-kept-for-SME-services
   values.yaml. Sed-patches the literal directly so contabo's Kustomize
   path keeps working.
2. The commit step adds the two templates to the staged set alongside
   values.yaml, so every "deploy: update catalyst images to <sha>"
   commit propagates to contabo (10-min reconcile) AND Sovereigns
   (next OCI chart publish via blueprint-release).
3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with
   the latest literal (currently :8361df4) gets republished and
   pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.

Why drop the "freeze contabo" intent of the previous comment:
The previous comment said contabo auto-roll on every PR was bad
because PR #975's image broke contabo (k8scache startup loop).
Solution there is: fix the bug in the code, not freeze contabo.
Freezing masked real divergence — the reason the founder caught
this is that manual omantel patches were the only thing keeping
omantel current while contabo + every other fresh Sovereign quietly
ran 9 PRs behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane

Founder asked: "make the real-time k8s information propagation
development reused — find the reverted prior work and implement the
final working one."

History:
- PR #358 (May 1) shipped the full informer + SSE data plane:
  internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics}
  + handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) +
  UI hook lib/useK8sStream.ts + widget useK8sCacheStream.
- PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream
  with kinds=namespace,node,pv,pod,deployment,...,server.hcloud,
  volume.hcloud and `&initialState=1` for live cloud-graph deltas.
- PR #981 hotfix dropped the synchronous discovery probe in
  factory.go:AddCluster (it was calling
  core.Discovery().ServerResourcesForGroupVersion(gv) with NO context
  timeout — on a kubeconfig pointing at a decommissioned otech the
  call hung the catalyst-api startup for minutes per dead cluster).

After #981 the discovery-probe surgery was clean — no follow-up
broke. The data plane code stayed in the codebase. The remaining
gap was operational, not architectural:

  On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>),
  the catalyst-api boots without a posted-back kubeconfig in
  /var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns []
  → factory has zero clusters → every
  /api/v1/sovereigns/{depId}/k8s/* request 404s with
  "sovereign \"...\" not registered". The architecture-graph
  in-flight call confirmed live on omantel.biz today.

Fix in this PR:

1. **k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN
   env is set (chroot mode), build a ClusterRef with id resolved from
   CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by
   scanning /var/lib/catalyst/deployments/*.json for a record matching
   the FQDN (mirrors HandleSovereignSelf's store-fallback path for
   consistency). DynamicClient + CoreClient built from
   rest.InClusterConfig(). Append to the cluster list. Mother behavior
   unchanged — SOVEREIGN_FQDN unset → branch is a no-op.

2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide
   get/list/watch on every kind in the k8scache registry (pods,
   deployments, statefulsets, daemonsets, replicasets, services,
   endpointslices, ingresses, configmaps, secrets, persistentvolumes,
   persistentvolumeclaims, hcloud.crossplane.io managed resources,
   vclusters), plus authorization.k8s.io/subjectaccessreviews so the
   per-event SAR gating in the SSE handler doesn't 403 silently.

3. Bump chart 1.4.70 → 1.4.71.

The discovery-probe failure mode that triggered the original revert
(synchronous ServerResourcesForGroupVersion blocking startup) does
NOT recur here — InClusterConfig() returns immediately, NewForConfig
is lazy, and the first network call happens inside the informer
goroutine after Start, off the boot critical path. Mother-side
LoadClustersFromDir behavior is untouched (no probe, just kubeconfig
file parsing as it has been since #981).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): + More popover escapes overflow clip + graph centers via gravity force

Two cloud-page bugs caught live on omantel.biz:

(1) /cloud?view=list&kind=clusters → +More popover non-functional.
    The popover renders at its anchor coords but pointer events pass
    through to the toolbar below it. Diagnosis:
        .cloud-page-toolbar > [data-testid="cloud-kind-chips"] {
          overflow-x: auto;
        }
    Per CSS spec, when one overflow axis is non-visible, the OTHER
    axis becomes auto/hidden too. So overflow-x:auto on the chips
    strip silently sets overflow-y:auto, which clips the absolutely-
    positioned popover that hangs DOWN from the +More button.

    Fix: render the popover via React.createPortal to document.body
    so it's outside any overflow ancestor. Position via fixed
    coordinates computed from the +More button's
    getBoundingClientRect, recomputed on resize/scroll. Click-outside
    dismissal updated to check both wrapper AND portaled popover.

(2) /cloud?view=graph → bubbles drift to canvas edges, leaving the
    centre empty until enough nodes (e.g. worker nodes) are added
    to anchor things via link tension.

    Two coupled root causes:

    a) `forceCenter` only adjusts the centroid — it shifts ALL
       nodes uniformly so their average sits at (cx, cy). It does
       NOT pull individual nodes inward. With small node counts
       and high charge repulsion (-160 for ≤50 nodes), nothing
       opposes outward drift.

    b) `makeForceBound` was a HARD clamp: `if (n.x < minX) n.x =
       minX`. Nodes that hit the wall get arrested with their
       velocity preserved on the perpendicular axis but no inward
       impulse → they slide along the wall and stack at corners.
       The simulation never relaxes back to the centre.

    Fix:
    a) Add forceX(cx) + forceY(cy) with `centerGravity` strength
       per node-count tier (0.08 for ≤50, scaling down with
       larger graphs where link tension is sufficient). This pulls
       every individual node toward the centre proportional to its
       offset.
    b) Replace the hard clamp with an elastic bounce: when a node
       hits the boundary, reverse its velocity component (×0.4
       damping) instead of zeroing it. Energy returns to the
       system, the simulation actually relaxes.
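
    A sketch of that elastic bounce as a custom d3 force — the ×0.4 damping is
    the value above; bounds, names, and the node shape are illustrative.

        type SimNode = { x: number; y: number; vx: number; vy: number };

        // Custom d3 force: nodes that cross a boundary are pushed back inside
        // and their velocity component reversed with x0.4 damping (vs hard clamp).
        function makeForceBounce(minX: number, maxX: number, minY: number, maxY: number) {
          let nodes: SimNode[] = [];
          const force = () => {
            for (const n of nodes) {
              if (n.x < minX)      { n.x = minX; n.vx = -n.vx * 0.4; }
              else if (n.x > maxX) { n.x = maxX; n.vx = -n.vx * 0.4; }
              if (n.y < minY)      { n.y = minY; n.vy = -n.vy * 0.4; }
              else if (n.y > maxY) { n.y = maxY; n.vy = -n.vy * 0.4; }
            }
          };
          force.initialize = (ns: SimNode[]) => { nodes = ns; };
          return force;
        }

        // usage: sim.force('bounds', makeForceBounce(0, width, 0, height));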

Bump chart 1.4.72 → 1.4.73.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloud): expose all live K8s kinds in +More popover + chip counts + tighter graph centering

Founder feedback (after PR #1062 lit up the data plane):
1. The +More popover was missing pods, deployments, statefulsets,
   daemonsets, configmaps, secrets, namespaces, etc. — it only
   carried the 6 placeholder kinds the legacy topology API knew
   about.
2. Several chips (Services, Ingresses, Storage Classes) showed "—"
   for count even though the data IS in the live cluster (visible in
   the graph view).
3. The graph view still pushed bubbles to canvas edges; only adding
   worker nodes brought things back. The previous gravity tuning
   wasn't strong enough for ~300 nodes.

This PR addresses all three.

(1) Eleven new K8s-backed list pages exposed in +More:
    Pods, Deployments, StatefulSets, DaemonSets, ReplicaSets,
    ConfigMaps, Secrets, Namespaces, Nodes, PersistentVolumes,
    EndpointSlices.
    Plus replaced the placeholder Services and Ingresses pages with
    live K8s tables.

    All built on a new generic K8sListPage that subscribes to
    /api/v1/sovereigns/{depId}/k8s/stream (same SSE channel the
    architecture-graph already uses) and renders a typed-column
    table per kind. Columns are declared once per kind in
    kindsPages.tsx; the rendering is uniform so adding a kind is a
    ~12-line wrapper.

(2) CloudPage.kindCounts now folds the live K8s snapshot into the
    chip-count map. KIND_TO_REGISTRY in kinds.ts maps each chip id
    to the registry kind name (pods → 'pod' etc). Counts that came
    from null (data not available) flip to live counts the moment
    the SSE stream's initialState=1 arrives.

(3) GraphCanvas physics retuned for live-data scale:
    - centerGravity: 0.08→0.18 for ≤50 nodes, 0.06→0.16 for ≤200,
      0.04→0.14 for ≤1000, 0.03→0.10 for ≤5000, 0.02→0.08 for >5000.
      The forceX/forceY pulls every individual node toward (cx,cy)
      proportional to its offset — 2-3× stronger than the original
      tuning so the canvas centre stays populated.
    - Charge softened: -160→-90 for ≤50 nodes, scaled down through
      every tier. The previous values were calibrated against a
      ~20-node topology stub; live data delivers 10-50× more nodes
      per Sovereign so charge needs to relax proportionally.
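
As a sketch of the ~12-line per-kind wrapper described in (1) above — the column
and prop names are assumptions, not the literal kindsPages.tsx contents.

    import type { FC } from 'react';

    interface K8sObject {
      metadata: { name: string; namespace?: string; creationTimestamp?: string };
    }
    interface Column { header: string; cell: (obj: K8sObject) => string }

    // Generic list page exists per this commit; props shape assumed.
    declare const K8sListPage: FC<{ kind: string; title: string; columns: Column[] }>;

    const podColumns: Column[] = [
      { header: 'Name',      cell: (o) => o.metadata.name },
      { header: 'Namespace', cell: (o) => o.metadata.namespace ?? '-' },
      { header: 'Age',       cell: (o) => o.metadata.creationTimestamp ?? '-' },
    ];

    // The whole "Pods" page is just this wrapper around the generic list page.
    export const PodsPage: FC = () => (
      <K8sListPage kind="pod" title="Pods" columns={podColumns} />
    );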

Bump chart 1.4.74 → 1.4.75.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud-list): share single SSE subscription via CloudContext — list pages were stuck connecting

After PR #1064 the +More popover was correctly populated and chip
counts were live, but clicking through to a list page (e.g.
/cloud?view=list&kind=pods) hung at "Connecting to live cluster
stream…" while the chip count beside the same kind already showed
the right number (110 pods).

Diagnosis: the K8sListPage was calling useK8sCacheStream with kinds:[kind],
opening its OWN EventSource. The parent CloudPage already had an
EventSource open (subscribing to all kinds — the source of the chip
counts). Two long-lived SSE streams from the same browser to the
same origin starve the connection budget; the second connection
hangs at "connecting" while the first holds the slot.

Fix: hoist the snapshot via CloudContext. CloudPage is already the
owner of the page-level useK8sCacheStream invocation; expose its
snapshot/status/revision through the existing useCloud() context.
K8sListPage now reads from useCloud() instead of opening a duplicate
stream. Single subscription, single source of truth for both chip
counts AND list rows.

Bump chart 1.4.76 → 1.4.77.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:34:16 +04:00
github-actions[bot]
0cfbb106dc deploy: update catalyst images to 2604c9c 2026-05-06 18:17:51 +00:00
e3mrah
2604c9cf36
feat(cloud): all live K8s kinds in +More + chip counts + tighter graph centering (#1064)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly
   from the now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers don't need a pod restart's restoreFromStore to see the
   chroot-imported deployment.

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the
URL :deploymentId param when present and otherwise resolves the JWT
cookie's deployment_id claim via /api/v1/sovereign/self. Topology
query also gates on `!!deploymentId`
so it doesn't waste a 404 round-trip during cookie resolution.

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.

  /settings/marketplace   → MarketplaceSettings  (wrapped in PortalShell)
  /parent-domains         → ParentDomainsPage    (wrapped in PortalShell)
  /catalog                → CatalogAdminPage     (deleted)

Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
  - /catalog route registration in router.tsx
  - 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from ActiveSection union + deriveActiveSection regex

The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.

Bump chart 1.4.64 → 1.4.65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(apps): publish chip on each card — replaces deleted /catalog page

Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" — drop
the standalone /catalog page (#1058), put the publish toggle on each
/apps card.

Backend (catalyst-api):
- New file sme_catalog_client.go — best-effort client for the
  in-cluster SME catalog microservice at
  http://catalog.sme.svc.cluster.local:8082. 30s response cache,
  1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier
  not deployed on this Sovereign — common when marketplace.enabled
  is false).
- HandleSovereignApps decorates each app with `marketplacePublished`
  *bool joined by slug from the SME catalog. nil ⇒ slug not in SME
  catalog (bootstrap component, or marketplace not deployed) ⇒ FE
  suppresses the chip.
- New handler HandleSovereignAppPublish at PATCH
  /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
  Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME
  catalog. Surfaces upstream status verbatim. Invalidates the cache
  so the next /apps poll reflects the change immediately.

Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of
  the bare status map.
- Each AppCard with a non-null marketplacePublished renders a
  PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
  PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil →
  no chip (correct: nothing to toggle).
- Cards with marketplace.enabled=false render no chips at all (SME
  catalog unreachable → nil for every slug).

Bump chart 1.4.66 → 1.4.67.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code

Audit triggered by founder asking if PRs #1051..#1059 reach NEW
Sovereigns or just my manual `kubectl set image` patches on omantel.
Answer was: nothing reached anyone except omantel via manual patches.
Both contabo AND every fresh Sovereign would install :2122fb8 — the
SHA frozen at PR #1040's last manual chart-touch on May 6 morning.

Root cause:
- chart/templates/api-deployment.yaml + ui-deployment.yaml carry
  LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"),
  not Helm-templated `{{ .Values.images.catalystApi.tag }}`.
- catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag
  on every push — but no template reads from it. Dead code.
- contabo's catalyst-platform Flux Kustomization at
  ./products/catalyst/chart/templates applies these as raw manifests.
- Sovereigns Helm-install the same chart; Helm passes the literal
  through unchanged.
- Both ended up frozen at whatever literal was committed at the last
  manual chart-touching PR.

Fix:
1. CI's deploy step now bumps both the literal SHAs in the two
   template files AND the unused-but-kept-for-SME-services
   values.yaml. Sed-patches the literal directly so contabo's Kustomize
   path keeps working.
2. The commit step adds the two templates to the staged set alongside
   values.yaml, so every "deploy: update catalyst images to <sha>"
   commit propagates to contabo (10-min reconcile) AND Sovereigns
   (next OCI chart publish via blueprint-release).
3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with
   the latest literal (currently :8361df4) gets republished and
   pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.

Why drop the "freeze contabo" intent of the previous comment:
The previous comment said contabo auto-roll on every PR was bad
because PR #975's image broke contabo (k8scache startup loop).
Solution there is: fix the bug in the code, not freeze contabo.
Freezing masked real divergence — the reason the founder caught
this is that manual omantel patches were the only thing keeping
omantel current while contabo + every other fresh Sovereign quietly
ran 9 PRs behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane

Founder asked: "make the real-time k8s information propagation
development reused — find the reverted prior work and implement the
final working one."

History:
- PR #358 (May 1) shipped the full informer + SSE data plane:
  internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics}
  + handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) +
  UI hook lib/useK8sStream.ts + widget useK8sCacheStream.
- PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream
  with kinds=namespace,node,pv,pod,deployment,...,server.hcloud,
  volume.hcloud and `&initialState=1` for live cloud-graph deltas.
- PR #981 hotfix dropped the synchronous discovery probe in
  factory.go:AddCluster (it was calling
  core.Discovery().ServerResourcesForGroupVersion(gv) with NO context
  timeout — on a kubeconfig pointing at a decommissioned otech the
  call hung the catalyst-api startup for minutes per dead cluster).

After #981 the discovery-probe surgery was clean — no follow-up
broke. The data plane code stayed in the codebase. The remaining
gap was operational, not architectural:

  On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>),
  the catalyst-api boots without a posted-back kubeconfig in
  /var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns []
  → factory has zero clusters → every
  /api/v1/sovereigns/{depId}/k8s/* request 404s with
  "sovereign \"...\" not registered". The architecture-graph
  in-flight call confirmed live on omantel.biz today.

Fix in this PR:

1. **k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN
   env is set (chroot mode), build a ClusterRef with id resolved from
   CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by
   scanning /var/lib/catalyst/deployments/*.json for a record matching
   the FQDN (mirrors HandleSovereignSelf's store-fallback path for
   consistency). DynamicClient + CoreClient built from
   rest.InClusterConfig(). Append to the cluster list. Mother behavior
   unchanged — SOVEREIGN_FQDN unset → branch is a no-op.

2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide
   get/list/watch on every kind in the k8scache registry (pods,
   deployments, statefulsets, daemonsets, replicasets, services,
   endpointslices, ingresses, configmaps, secrets, persistentvolumes,
   persistentvolumeclaims, hcloud.crossplane.io managed resources,
   vclusters), plus authorization.k8s.io/subjectaccessreviews so the
   per-event SAR gating in the SSE handler doesn't 403 silently.

3. Bump chart 1.4.70 → 1.4.71.

The discovery-probe failure mode that triggered the original revert
(synchronous ServerResourcesForGroupVersion blocking startup) does
NOT recur here — InClusterConfig() returns immediately, NewForConfig
is lazy, and the first network call happens inside the informer
goroutine after Start, off the boot critical path. Mother-side
LoadClustersFromDir behavior is untouched (no probe, just kubeconfig
file parsing as it has been since #981).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): + More popover escapes overflow clip + graph centers via gravity force

Two cloud-page bugs caught live on omantel.biz:

(1) /cloud?view=list&kind=clusters → +More popover non-functional.
    The popover renders at its anchor coords but pointer events pass
    through to the toolbar below it. Diagnosis:
        .cloud-page-toolbar > [data-testid="cloud-kind-chips"] {
          overflow-x: auto;
        }
    Per CSS spec, when one overflow axis is non-visible, the OTHER
    axis becomes auto/hidden too. So overflow-x:auto on the chips
    strip silently sets overflow-y:auto, which clips the absolutely-
    positioned popover that hangs DOWN from the +More button.

    Fix: render the popover via React.createPortal to document.body
    so it's outside any overflow ancestor. Position via fixed
    coordinates computed from the +More button's
    getBoundingClientRect, recomputed on resize/scroll. Click-outside
    dismissal updated to check both wrapper AND portaled popover.

(2) /cloud?view=graph → bubbles drift to canvas edges, leaving the
    centre empty until enough nodes (e.g. worker nodes) are added
    to anchor things via link tension.

    Two coupled root causes:

    a) `forceCenter` only adjusts the centroid — it shifts ALL
       nodes uniformly so their average sits at (cx, cy). It does
       NOT pull individual nodes inward. With small node counts
       and high charge repulsion (-160 for ≤50 nodes), nothing
       opposes outward drift.

    b) `makeForceBound` was a HARD clamp: `if (n.x < minX) n.x =
       minX`. Nodes that hit the wall get arrested with their
       velocity preserved on the perpendicular axis but no inward
       impulse → they slide along the wall and stack at corners.
       The simulation never relaxes back to the centre.

    Fix:
    a) Add forceX(cx) + forceY(cy) with `centerGravity` strength
       per node-count tier (0.08 for ≤50, scaling down with
       larger graphs where link tension is sufficient). This pulls
       every individual node toward the centre proportional to its
       offset.
    b) Replace the hard clamp with an elastic bounce: when a node
       hits the boundary, reverse its velocity component (×0.4
       damping) instead of zeroing it. Energy returns to the
       system, the simulation actually relaxes.
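
A sketch of the portaled popover from (1) above, assuming React; the positioning
math is simplified and the component/prop names are illustrative.

    import React, { useLayoutEffect, useState } from 'react';
    import { createPortal } from 'react-dom';

    function MorePopover(props: {
      open: boolean;
      anchorRef: React.RefObject<HTMLButtonElement>;
      children: React.ReactNode;
    }) {
      const { open, anchorRef, children } = props;
      const [pos, setPos] = useState({ top: 0, left: 0 });

      useLayoutEffect(() => {
        if (!open) return;
        const place = () => {
          const el = anchorRef.current;
          if (!el) return;
          const r = el.getBoundingClientRect();
          setPos({ top: r.bottom + 4, left: r.left }); // fixed coords below the +More button
        };
        place();
        window.addEventListener('resize', place);
        window.addEventListener('scroll', place, true);
        return () => {
          window.removeEventListener('resize', place);
          window.removeEventListener('scroll', place, true);
        };
      }, [open, anchorRef]);

      if (!open) return null;
      // Rendering into document.body escapes the overflow-clipping chips strip.
      return createPortal(
        <div style={{ position: 'fixed', top: pos.top, left: pos.left }}>{children}</div>,
        document.body,
      );
    }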

Bump chart 1.4.72 → 1.4.73.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloud): expose all live K8s kinds in +More popover + chip counts + tighter graph centering

Founder feedback (after PR #1062 lit up the data plane):
1. The +More popover was missing pods, deployments, statefulsets,
   daemonsets, configmaps, secrets, namespaces, etc. — it only
   carried the 6 placeholder kinds the legacy topology API knew
   about.
2. Several chips (Services, Ingresses, Storage Classes) showed "—"
   for count even though the data IS in the live cluster (visible in
   the graph view).
3. The graph view still pushed bubbles to canvas edges; only adding
   worker nodes brought things back. The previous gravity tuning
   wasn't strong enough for ~300 nodes.

This PR addresses all three.

(1) Eleven new K8s-backed list pages exposed in +More:
    Pods, Deployments, StatefulSets, DaemonSets, ReplicaSets,
    ConfigMaps, Secrets, Namespaces, Nodes, PersistentVolumes,
    EndpointSlices.
    Plus replaced the placeholder Services and Ingresses pages with
    live K8s tables.

    All built on a new generic K8sListPage that subscribes to
    /api/v1/sovereigns/{depId}/k8s/stream (same SSE channel the
    architecture-graph already uses) and renders a typed-column
    table per kind. Columns are declared once per kind in
    kindsPages.tsx; the rendering is uniform so adding a kind is a
    ~12-line wrapper.

(2) CloudPage.kindCounts now folds the live K8s snapshot into the
    chip-count map. KIND_TO_REGISTRY in kinds.ts maps each chip id
    to the registry kind name (pods → 'pod' etc). Counts that came
    from null (data not available) flip to live counts the moment
    the SSE stream's initialState=1 arrives.

(3) GraphCanvas physics retuned for live-data scale:
    - centerGravity: 0.08→0.18 for ≤50 nodes, 0.06→0.16 for ≤200,
      0.04→0.14 for ≤1000, 0.03→0.10 for ≤5000, 0.02→0.08 for >5000.
      The forceX/forceY pulls every individual node toward (cx,cy)
      proportional to its offset — 2-3× stronger than the original
      tuning so the canvas centre stays populated.
    - Charge softened: -160→-90 for ≤50 nodes, scaled down through
      every tier. The previous values were calibrated against a
      ~20-node topology stub; live data delivers 10-50× more nodes
      per Sovereign so charge needs to relax proportionally.
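
Sketch of the chip-count folding from (2) above — the mapping entries are a
subset (only pods → 'pod' is stated; the rest are assumptions) and the snapshot
shape is assumed.

    // Chip id -> k8scache registry kind name (subset; the full map lives in kinds.ts).
    const KIND_TO_REGISTRY: Record<string, string> = {
      pods: 'pod',
      deployments: 'deployment', // assumed entry
      services: 'service',       // assumed entry
      ingresses: 'ingress',      // assumed entry
    };

    // Fold the live snapshot into the chip-count map: null means "not available
    // yet"; once initialState=1 has arrived, the live count wins.
    function foldKindCounts(
      base: Record<string, number | null>,
      snapshot: Record<string, unknown[]> | null,
    ): Record<string, number | null> {
      const out = { ...base };
      for (const [chipId, registryKind] of Object.entries(KIND_TO_REGISTRY)) {
        const live = snapshot?.[registryKind];
        if (live) out[chipId] = live.length;
      }
      return out;
    }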

Bump chart 1.4.74 → 1.4.75.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:15:25 +04:00
github-actions[bot]
9d60bbab91 deploy: update catalyst images to 167d093 2026-05-06 17:53:26 +00:00
e3mrah
167d09348e
fix(cloud): +More popover escapes overflow clip + graph centers via gravity force (#1063)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly
   from the now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers don't need a pod restart's restoreFromStore to see the
   chroot-imported deployment.

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the
URL :deploymentId param when present and otherwise resolves the JWT
cookie's deployment_id claim via /api/v1/sovereign/self. Topology
query also gates on `!!deploymentId`
so it doesn't waste a 404 round-trip during cookie resolution.

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.
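
A structural TypeScript sketch of the chrome split (every imported module
here is an assumption standing in for the real components; this shows the
shape, not the actual files):

    import { Outlet } from '@tanstack/react-router';
    import type { ReactNode } from 'react';
    import { Sidebar, SovereignSidebar, PortalHeader } from './chrome';      // assumed
    import { useAuthGate, RequiredActionsModal } from './auth';              // assumed
    import { isSovereignMode, useResolvedDeploymentId } from './sovereign';  // assumed

    // Chroot layout: auth gate only — no sidebar, no header of its own.
    export function SovereignConsoleLayout() {
      const authed = useAuthGate();
      if (!authed) return null; // the gate triggers the cookie/OIDC flow
      return (
        <>
          <RequiredActionsModal />
          <Outlet />
        </>
      );
    }

    // PortalShell is the single chrome owner on both surfaces.
    export function PortalShell({ children }: { children: ReactNode }) {
      const deploymentId = useResolvedDeploymentId();
      return (
        <div className="portal-shell">
          {isSovereignMode() ? <SovereignSidebar /> : <Sidebar deploymentId={deploymentId} />}
          <main>
            <PortalHeader />
            {children}
          </main>
        </div>
      );
    }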

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.

  /settings/marketplace   → MarketplaceSettings  (wrapped in PortalShell)
  /parent-domains         → ParentDomainsPage    (wrapped in PortalShell)
  /catalog                → CatalogAdminPage     (deleted)

Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
  - /catalog route registration in router.tsx
  - 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from ActiveSection union + deriveActiveSection regex

The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.

Bump chart 1.4.64 → 1.4.65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(apps): publish chip on each card — replaces deleted /catalog page

Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" — drop
the standalone /catalog page (#1058), put the publish toggle on each
/apps card.

Backend (catalyst-api):
- New file sme_catalog_client.go — best-effort client for the
  in-cluster SME catalog microservice at
  http://catalog.sme.svc.cluster.local:8082. 30s response cache,
  1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier
  not deployed on this Sovereign — common when marketplace.enabled
  is false).
- HandleSovereignApps decorates each app with `marketplacePublished`
  *bool joined by slug from the SME catalog. nil ⇒ slug not in SME
  catalog (bootstrap component, or marketplace not deployed) ⇒ FE
  suppresses the chip.
- New handler HandleSovereignAppPublish at PATCH
  /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
  Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME
  catalog. Surfaces upstream status verbatim. Invalidates the cache
  so the next /apps poll reflects the change immediately.

Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of
  the bare status map.
- Each AppCard with a non-null marketplacePublished renders a
  PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
  PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil →
  no chip (correct: nothing to toggle).
- Cards with marketplace.enabled=false render no chips at all (SME
  catalog unreachable → nil for every slug).
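
A TypeScript sketch of the chip's toggle path (the query key and error
handling are assumptions; the PATCH route and request body come from the
handler described above):

    import { useMutation, useQueryClient } from '@tanstack/react-query';

    export function usePublishToggle() {
      const qc = useQueryClient();
      return useMutation({
        mutationFn: async (vars: { slug: string; published: boolean }) => {
          const res = await fetch(`/api/v1/sovereign/apps/${vars.slug}/publish`, {
            method: 'PATCH',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ published: vars.published }),
          });
          // Upstream status is surfaced verbatim, so just propagate failures.
          if (!res.ok) throw new Error(`publish: HTTP ${res.status}`);
          return res.json();
        },
        // Refetch the live apps query so the chip reflects the new state.
        onSuccess: () => qc.invalidateQueries({ queryKey: ['sovereign-apps'] }),
      });
    }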

Bump chart 1.4.66 → 1.4.67.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code

Audit triggered by founder asking if PRs #1051..#1059 reach NEW
Sovereigns or just my manual `kubectl set image` patches on omantel.
Answer was: nothing reached anyone except omantel via manual patches.
Both contabo AND every fresh Sovereign would install :2122fb8 — the
SHA frozen at PR #1040's last manual chart-touch on May 6 morning.

Root cause:
- chart/templates/api-deployment.yaml + ui-deployment.yaml carry
  LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"),
  not Helm-templated `{{ .Values.images.catalystApi.tag }}`.
- catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag
  on every push — but no template reads from it. Dead code.
- contabo's catalyst-platform Flux Kustomization at
  ./products/catalyst/chart/templates applies these as raw manifests.
- Sovereigns Helm-install the same chart; Helm passes the literal
  through unchanged.
- Both ended up frozen at whatever literal was committed at the last
  manual chart-touching PR.

Fix:
1. CI's deploy step now bumps both the literal SHAs in the two
   template files AND the unused-but-kept-for-SME-services
   values.yaml. Sed-patches the literal directly so contabo's Kustomize
   path keeps working.
2. The commit step adds the two templates to the staged set alongside
   values.yaml, so every "deploy: update catalyst images to <sha>"
   commit propagates to contabo (10-min reconcile) AND Sovereigns
   (next OCI chart publish via blueprint-release).
3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with
   the latest literal (currently :8361df4) gets republished and
   pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.

Why drop the "freeze contabo" intent of the previous comment:
The previous comment said contabo auto-roll on every PR was bad
because PR #975's image broke contabo (k8scache startup loop).
The solution there is to fix the bug in the code, not to freeze contabo.
Freezing masked real divergence — the reason the founder caught
this is that manual omantel patches were the only thing keeping
omantel current while contabo + every other fresh Sovereign quietly
ran 9 PRs behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane

Founder asked: "make the real-time k8s information propagation
development reused — find the reverted prior work and implement the
final working one."

History:
- PR #358 (May 1) shipped the full informer + SSE data plane:
  internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics}
  + handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) +
  UI hook lib/useK8sStream.ts + widget useK8sCacheStream.
- PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream
  with kinds=namespace,node,pv,pod,deployment,...,server.hcloud,
  volume.hcloud and `&initialState=1` for live cloud-graph deltas.
- PR #981 hotfix dropped the synchronous discovery probe in
  factory.go:AddCluster (it was calling
  core.Discovery().ServerResourcesForGroupVersion(gv) with NO context
  timeout — on a kubeconfig pointing at a decommissioned otech the
  call hung the catalyst-api startup for minutes per dead cluster).

After #981 the discovery-probe surgery was clean — no follow-up
broke. The data plane code stayed in the codebase. The remaining
gap was operational, not architectural:

  On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>),
  the catalyst-api boots without a posted-back kubeconfig in
  /var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns []
  → factory has zero clusters → every
  /api/v1/sovereigns/{depId}/k8s/* request 404s with
  "sovereign \"...\" not registered". The architecture-graph
  in-flight call confirmed live on omantel.biz today.

Fix in this PR:

1. **k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN
   env is set (chroot mode), build a ClusterRef with id resolved from
   CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by
   scanning /var/lib/catalyst/deployments/*.json for a record matching
   the FQDN (mirrors HandleSovereignSelf's store-fallback path for
   consistency). DynamicClient + CoreClient built from
   rest.InClusterConfig(). Append to the cluster list. Mother behavior
   unchanged — SOVEREIGN_FQDN unset → branch is a no-op.

2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide
   get/list/watch on every kind in the k8scache registry (pods,
   deployments, statefulsets, daemonsets, replicasets, services,
   endpointslices, ingresses, configmaps, secrets, persistentvolumes,
   persistentvolumeclaims, hcloud.crossplane.io managed resources,
   vclusters), plus authorization.k8s.io/subjectaccessreviews so the
   per-event SAR gating in the SSE handler doesn't 403 silently.

3. Bump chart 1.4.70 → 1.4.71.

The discovery-probe failure mode that triggered the original revert
(synchronous ServerResourcesForGroupVersion blocking startup) does
NOT recur here — InClusterConfig() returns immediately, NewForConfig
is lazy, and the first network call happens inside the informer
goroutine after Start, off the boot critical path. Mother-side
LoadClustersFromDir behavior is untouched (no probe, just kubeconfig
file parsing as it has been since #981).
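
For orientation, a hedged TypeScript sketch of the UI side of that data
plane once the chroot cluster is registered (the exact stream path
segment and event payload shape are assumptions; the
/api/v1/sovereigns/{depId}/k8s/* prefix, the kinds list and
&initialState=1 come from the description above):

    export function openK8sStream(
      deploymentId: string,
      kinds: string[],
      onEvent: (evt: { kind: string; object: unknown }) => void,
    ): () => void {
      const qs = new URLSearchParams({
        kinds: kinds.join(','),
        initialState: '1',
      });
      const es = new EventSource(
        `/api/v1/sovereigns/${deploymentId}/k8s/stream?${qs}`,
      );
      es.onmessage = (msg) => onEvent(JSON.parse(msg.data));
      // EventSource reconnects on its own; surface errors to the UI if needed.
      es.onerror = () => {};
      return () => es.close();
    }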

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): + More popover escapes overflow clip + graph centers via gravity force

Two cloud-page bugs caught live on omantel.biz:

(1) /cloud?view=list&kind=clusters → +More popover non-functional.
    The popover renders at its anchor coords but pointer events pass
    through to the toolbar below it. Diagnosis:
        .cloud-page-toolbar > [data-testid="cloud-kind-chips"] {
          overflow-x: auto;
        }
    Per the CSS overflow spec, when one overflow axis is set to a
    non-visible value, a `visible` value on the other axis computes
    to `auto`. So overflow-x:auto on the chips strip silently sets
    overflow-y:auto, which clips the absolutely-positioned popover
    that hangs DOWN from the +More button.

    Fix: render the popover via React.createPortal to document.body
    so it's outside any overflow ancestor. Position via fixed
    coordinates computed from the +More button's
    getBoundingClientRect, recomputed on resize/scroll. Click-outside
    dismissal updated to check both wrapper AND portaled popover.

(2) /cloud?view=graph → bubbles drift to canvas edges, leaving the
    centre empty until enough nodes (e.g. worker nodes) are added
    to anchor things via link tension.

    Two coupled root causes:

    a) `forceCenter` only adjusts the centroid — it shifts ALL
       nodes uniformly so their average sits at (cx, cy). It does
       NOT pull individual nodes inward. With small node counts
       and high charge repulsion (-160 for ≤50 nodes), nothing
       opposes outward drift.

    b) `makeForceBound` was a HARD clamp: `if (n.x < minX) n.x =
       minX`. Nodes that hit the wall get arrested with their
       velocity preserved on the perpendicular axis but no inward
       impulse → they slide along the wall and stack at corners.
       The simulation never relaxes back to the centre.

    Fix:
    a) Add forceX(cx) + forceY(cy) with `centerGravity` strength
       per node-count tier (0.08 for ≤50, scaling down with
       larger graphs where link tension is sufficient). This pulls
       every individual node toward the centre proportional to its
       offset.
    b) Replace the hard clamp with an elastic bounce: when a node
       hits the boundary, reverse its velocity component (×0.4
       damping) instead of zeroing it. Energy returns to the
       system and the simulation actually relaxes.
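
Two hedged TypeScript sketches of the shapes described above (component,
prop and tier values are illustrative, not the real CloudPage code).

For (1), the portal escape — fixed positioning from the anchor's
getBoundingClientRect, recomputed on resize/scroll:

    import { useLayoutEffect, useState } from 'react';
    import type { ReactNode, RefObject } from 'react';
    import { createPortal } from 'react-dom';

    export function PortaledPopover(props: {
      anchorRef: RefObject<HTMLElement>;
      children: ReactNode;
    }) {
      const [pos, setPos] = useState({ top: 0, left: 0 });

      useLayoutEffect(() => {
        const update = () => {
          const r = props.anchorRef.current?.getBoundingClientRect();
          if (r) setPos({ top: r.bottom + 4, left: r.left });
        };
        update();
        window.addEventListener('resize', update);
        window.addEventListener('scroll', update, true); // catches ancestor scrolls
        return () => {
          window.removeEventListener('resize', update);
          window.removeEventListener('scroll', update, true);
        };
      }, [props.anchorRef]);

      // document.body is outside every overflow ancestor, so the chips
      // strip's implicit overflow-y:auto can no longer clip the popover.
      return createPortal(
        <div style={{ position: 'fixed', top: pos.top, left: pos.left, zIndex: 1000 }}>
          {props.children}
        </div>,
        document.body,
      );
    }

For (2), per-node gravity plus an elastic boundary (d3-force; the tier
value for >50 nodes is illustrative):

    import {
      forceManyBody,
      forceSimulation,
      forceX,
      forceY,
      type SimulationNodeDatum,
    } from 'd3-force';

    interface GraphNode extends SimulationNodeDatum {
      id: string;
    }

    // b) Elastic bounce: reverse the offending velocity component with
    //    0.4 damping instead of hard-clamping position only.
    function makeForceBounce(minX: number, maxX: number, minY: number, maxY: number) {
      let nodes: GraphNode[] = [];
      function force() {
        for (const n of nodes) {
          if (n.x! < minX) { n.x = minX; n.vx = -n.vx! * 0.4; }
          if (n.x! > maxX) { n.x = maxX; n.vx = -n.vx! * 0.4; }
          if (n.y! < minY) { n.y = minY; n.vy = -n.vy! * 0.4; }
          if (n.y! > maxY) { n.y = maxY; n.vy = -n.vy! * 0.4; }
        }
      }
      force.initialize = (n: GraphNode[]) => { nodes = n; };
      return force;
    }

    // a) forceX/forceY pull every individual node toward the centre,
    //    with strength tiered by node count.
    const centerGravity = (count: number) => (count <= 50 ? 0.08 : 0.03);

    export function buildSimulation(
      nodes: GraphNode[], cx: number, cy: number, w: number, h: number,
    ) {
      return forceSimulation(nodes)
        .force('charge', forceManyBody().strength(-160))
        .force('x', forceX(cx).strength(centerGravity(nodes.length)))
        .force('y', forceY(cy).strength(centerGravity(nodes.length)))
        .force('bound', makeForceBounce(0, w, 0, h));
    }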

Bump chart 1.4.72 → 1.4.73.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:51:07 +04:00
github-actions[bot]
eca1e00ab7 deploy: update catalyst images to 2ad31b4 2026-05-06 17:29:00 +00:00
e3mrah
2ad31b4481
feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes real-time data plane (#1062)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly
   from the now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers don't need a pod restart's restoreFromStore to see the
   chroot-imported deployment.

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the
URL deploymentId param when present and otherwise resolves the JWT
cookie's deployment_id claim via /api/v1/sovereign/self. The topology
query also gates on `!!deploymentId` so it doesn't waste a 404
round-trip during cookie resolution.

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.

  /settings/marketplace   → MarketplaceSettings  (wrapped in PortalShell)
  /parent-domains         → ParentDomainsPage    (wrapped in PortalShell)
  /catalog                → CatalogAdminPage     (deleted)

Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
  - /catalog route registration in router.tsx
  - 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from ActiveSection union + deriveActiveSection regex

The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.

Bump chart 1.4.64 → 1.4.65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(apps): publish chip on each card — replaces deleted /catalog page

Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" — drop
the standalone /catalog page (#1058), put the publish toggle on each
/apps card.

Backend (catalyst-api):
- New file sme_catalog_client.go — best-effort client for the
  in-cluster SME catalog microservice at
  http://catalog.sme.svc.cluster.local:8082. 30s response cache,
  1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier
  not deployed on this Sovereign — common when marketplace.enabled
  is false).
- HandleSovereignApps decorates each app with `marketplacePublished`
  *bool joined by slug from the SME catalog. nil ⇒ slug not in SME
  catalog (bootstrap component, or marketplace not deployed) ⇒ FE
  suppresses the chip.
- New handler HandleSovereignAppPublish at PATCH
  /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
  Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME
  catalog. Surfaces upstream status verbatim. Invalidates the cache
  so the next /apps poll reflects the change immediately.

Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of
  the bare status map.
- Each AppCard with a non-null marketplacePublished renders a
  PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
  PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil →
  no chip (correct: nothing to toggle).
- Cards with marketplace.enabled=false render no chips at all (SME
  catalog unreachable → nil for every slug).

Bump chart 1.4.66 → 1.4.67.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code

Audit triggered by founder asking if PRs #1051..#1059 reach NEW
Sovereigns or just my manual `kubectl set image` patches on omantel.
Answer was: nothing reached anyone except omantel via manual patches.
Both contabo AND every fresh Sovereign would install :2122fb8 — the
SHA frozen at PR #1040's last manual chart-touch on May 6 morning.

Root cause:
- chart/templates/api-deployment.yaml + ui-deployment.yaml carry
  LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"),
  not Helm-templated `{{ .Values.images.catalystApi.tag }}`.
- catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag
  on every push — but no template reads from it. Dead code.
- contabo's catalyst-platform Flux Kustomization at
  ./products/catalyst/chart/templates applies these as raw manifests.
- Sovereigns Helm-install the same chart; Helm passes the literal
  through unchanged.
- Both ended up frozen at whatever literal was committed at the last
  manual chart-touching PR.

Fix:
1. CI's deploy step now bumps both the literal SHAs in the two
   template files AND the unused-but-kept-for-SME-services
   values.yaml. Sed-patches the literal directly so contabo's Kustomize
   path keeps working.
2. The commit step adds the two templates to the staged set alongside
   values.yaml, so every "deploy: update catalyst images to <sha>"
   commit propagates to contabo (10-min reconcile) AND Sovereigns
   (next OCI chart publish via blueprint-release).
3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with
   the latest literal (currently :8361df4) gets republished and
   pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.

Why drop the "freeze contabo" intent of the previous comment:
The previous comment said contabo auto-roll on every PR was bad
because PR #975's image broke contabo (k8scache startup loop).
The solution there is to fix the bug in the code, not to freeze contabo.
Freezing masked real divergence — the reason the founder caught
this is that manual omantel patches were the only thing keeping
omantel current while contabo + every other fresh Sovereign quietly
ran 9 PRs behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane

Founder asked: "make the real-time k8s information propagation
development reused — find the reverted prior work and implement the
final working one."

History:
- PR #358 (May 1) shipped the full informer + SSE data plane:
  internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics}
  + handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) +
  UI hook lib/useK8sStream.ts + widget useK8sCacheStream.
- PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream
  with kinds=namespace,node,pv,pod,deployment,...,server.hcloud,
  volume.hcloud and `&initialState=1` for live cloud-graph deltas.
- PR #981 hotfix dropped the synchronous discovery probe in
  factory.go:AddCluster (it was calling
  core.Discovery().ServerResourcesForGroupVersion(gv) with NO context
  timeout — on a kubeconfig pointing at a decommissioned otech the
  call hung the catalyst-api startup for minutes per dead cluster).

After #981 the discovery-probe surgery was clean — no follow-up
broke. The data plane code stayed in the codebase. The remaining
gap was operational, not architectural:

  On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>),
  the catalyst-api boots without a posted-back kubeconfig in
  /var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns []
  → factory has zero clusters → every
  /api/v1/sovereigns/{depId}/k8s/* request 404s with
  "sovereign \"...\" not registered". The architecture-graph
  in-flight call confirmed live on omantel.biz today.

Fix in this PR:

1. **k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN
   env is set (chroot mode), build a ClusterRef with id resolved from
   CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by
   scanning /var/lib/catalyst/deployments/*.json for a record matching
   the FQDN (mirrors HandleSovereignSelf's store-fallback path for
   consistency). DynamicClient + CoreClient built from
   rest.InClusterConfig(). Append to the cluster list. Mother behavior
   unchanged — SOVEREIGN_FQDN unset → branch is a no-op.

2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide
   get/list/watch on every kind in the k8scache registry (pods,
   deployments, statefulsets, daemonsets, replicasets, services,
   endpointslices, ingresses, configmaps, secrets, persistentvolumes,
   persistentvolumeclaims, hcloud.crossplane.io managed resources,
   vclusters), plus authorization.k8s.io/subjectaccessreviews so the
   per-event SAR gating in the SSE handler doesn't 403 silently.

3. Bump chart 1.4.70 → 1.4.71.

The discovery-probe failure mode that triggered the original revert
(synchronous ServerResourcesForGroupVersion blocking startup) does
NOT recur here — InClusterConfig() returns immediately, NewForConfig
is lazy, and the first network call happens inside the informer
goroutine after Start, off the boot critical path. Mother-side
LoadClustersFromDir behavior is untouched (no probe, just kubeconfig
file parsing as it has been since #981).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:26:59 +04:00
github-actions[bot]
f88da5ff6e deploy: update catalyst images to eb6a3c1 2026-05-06 17:12:39 +00:00
e3mrah
eb6a3c1812
fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs — Sovereigns + contabo were frozen at :2122fb8 (#1060)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly
   from the now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers don't need a pod restart's restoreFromStore to see the
   chroot-imported deployment.

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the
URL deploymentId param when present and otherwise resolves the JWT
cookie's deployment_id claim via /api/v1/sovereign/self. The topology
query also gates on `!!deploymentId` so it doesn't waste a 404
round-trip during cookie resolution.

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.

  /settings/marketplace   → MarketplaceSettings  (wrapped in PortalShell)
  /parent-domains         → ParentDomainsPage    (wrapped in PortalShell)
  /catalog                → CatalogAdminPage     (deleted)

Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
  - /catalog route registration in router.tsx
  - 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from ActiveSection union + deriveActiveSection regex

The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.

Bump chart 1.4.64 → 1.4.65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(apps): publish chip on each card — replaces deleted /catalog page

Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" — drop
the standalone /catalog page (#1058), put the publish toggle on each
/apps card.

Backend (catalyst-api):
- New file sme_catalog_client.go — best-effort client for the
  in-cluster SME catalog microservice at
  http://catalog.sme.svc.cluster.local:8082. 30s response cache,
  1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier
  not deployed on this Sovereign — common when marketplace.enabled
  is false).
- HandleSovereignApps decorates each app with `marketplacePublished`
  *bool joined by slug from the SME catalog. nil ⇒ slug not in SME
  catalog (bootstrap component, or marketplace not deployed) ⇒ FE
  suppresses the chip.
- New handler HandleSovereignAppPublish at PATCH
  /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
  Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME
  catalog. Surfaces upstream status verbatim. Invalidates the cache
  so the next /apps poll reflects the change immediately.

Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of
  the bare status map.
- Each AppCard with a non-null marketplacePublished renders a
  PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
  PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil →
  no chip (correct: nothing to toggle).
- Cards with marketplace.enabled=false render no chips at all (SME
  catalog unreachable → nil for every slug).

Bump chart 1.4.66 → 1.4.67.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code

Audit triggered by founder asking if PRs #1051..#1059 reach NEW
Sovereigns or just my manual `kubectl set image` patches on omantel.
Answer was: nothing reached anyone except omantel via manual patches.
Both contabo AND every fresh Sovereign would install :2122fb8 — the
SHA frozen at PR #1040's last manual chart-touch on May 6 morning.

Root cause:
- chart/templates/api-deployment.yaml + ui-deployment.yaml carry
  LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"),
  not Helm-templated `{{ .Values.images.catalystApi.tag }}`.
- catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag
  on every push — but no template reads from it. Dead code.
- contabo's catalyst-platform Flux Kustomization at
  ./products/catalyst/chart/templates applies these as raw manifests.
- Sovereigns Helm-install the same chart; Helm passes the literal
  through unchanged.
- Both ended up frozen at whatever literal was committed at the last
  manual chart-touching PR.

Fix:
1. CI's deploy step now bumps both the literal SHAs in the two
   template files AND the unused-but-kept-for-SME-services
   values.yaml. Sed-patches the literal directly so contabo's Kustomize
   path keeps working.
2. The commit step adds the two templates to the staged set alongside
   values.yaml, so every "deploy: update catalyst images to <sha>"
   commit propagates to contabo (10-min reconcile) AND Sovereigns
   (next OCI chart publish via blueprint-release).
3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with
   the latest literal (currently :8361df4) gets republished and
   pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.

Why drop the "freeze contabo" intent of the previous comment:
The previous comment said contabo auto-roll on every PR was bad
because PR #975's image broke contabo (k8scache startup loop).
The solution there is to fix the bug in the code, not to freeze contabo.
Freezing masked real divergence — the reason the founder caught
this is that manual omantel patches were the only thing keeping
omantel current while contabo + every other fresh Sovereign quietly
ran 9 PRs behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:10:31 +04:00
github-actions[bot]
66eca90c16 deploy: update catalyst images to 8361df4 2026-05-06 16:46:25 +00:00
e3mrah
8361df46ac
feat(apps): publish chip on each card — replaces deleted /catalog page (#1059)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly
   from the now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers don't need a pod restart's restoreFromStore to see the
   chroot-imported deployment.

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the
URL deploymentId param when present and otherwise resolves the JWT
cookie's deployment_id claim via /api/v1/sovereign/self. The topology
query also gates on `!!deploymentId` so it doesn't waste a 404
round-trip during cookie resolution.

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.

  /settings/marketplace   → MarketplaceSettings  (wrapped in PortalShell)
  /parent-domains         → ParentDomainsPage    (wrapped in PortalShell)
  /catalog                → CatalogAdminPage     (deleted)

Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
  - /catalog route registration in router.tsx
  - 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from ActiveSection union + deriveActiveSection regex

The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.

Bump chart 1.4.64 → 1.4.65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(apps): publish chip on each card — replaces deleted /catalog page

Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" — drop
the standalone /catalog page (#1058), put the publish toggle on each
/apps card.

Backend (catalyst-api):
- New file sme_catalog_client.go — best-effort client for the
  in-cluster SME catalog microservice at
  http://catalog.sme.svc.cluster.local:8082. 30s response cache,
  1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier
  not deployed on this Sovereign — common when marketplace.enabled
  is false).
- HandleSovereignApps decorates each app with `marketplacePublished`
  *bool joined by slug from the SME catalog. nil ⇒ slug not in SME
  catalog (bootstrap component, or marketplace not deployed) ⇒ FE
  suppresses the chip.
- New handler HandleSovereignAppPublish at PATCH
  /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
  Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME
  catalog. Surfaces upstream status verbatim. Invalidates the cache
  so the next /apps poll reflects the change immediately.

Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of
  the bare status map.
- Each AppCard with a non-null marketplacePublished renders a
  PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
  PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil →
  no chip (correct: nothing to toggle).
- When marketplace.enabled=false, no card renders a chip at all (SME
  catalog unreachable → nil for every slug).
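
A hedged TypeScript sketch of the frontend wiring (endpoint and body taken from the backend notes above; the query key and helper names are illustrative, not the repository's actual code):

```ts
import { useMutation, useQueryClient } from '@tanstack/react-query';

// marketplacePublished mirrors the *bool from the API: null means "not in
// the SME catalog / marketplace not deployed", so no chip is rendered.
interface AppCardModel {
  slug: string;
  marketplacePublished: boolean | null;
}

// Chip label rule: only render when the API reported a publish state at all.
export function publishChipLabel(app: AppCardModel): string | null {
  if (app.marketplacePublished === null) return null;
  return app.marketplacePublished ? 'PUBLISHED' : 'UNPUBLISHED';
}

// Toggle: PATCH the sovereign publish endpoint, then refetch the live apps
// poll so the chip reflects the new state immediately.
export function usePublishToggle() {
  const queryClient = useQueryClient();
  return useMutation({
    mutationFn: async ({ slug, published }: { slug: string; published: boolean }) => {
      const res = await fetch(`/api/v1/sovereign/apps/${encodeURIComponent(slug)}/publish`, {
        method: 'PATCH',
        headers: { 'Content-Type': 'application/json' },
        credentials: 'include',
        body: JSON.stringify({ published }),
      });
      if (!res.ok) throw new Error(`publish toggle failed: HTTP ${res.status}`);
      return res.json();
    },
    onSuccess: () => queryClient.invalidateQueries({ queryKey: ['sovereign-apps'] }), // illustrative key
  });
}
```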

Bump chart 1.4.66 → 1.4.67.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 20:43:59 +04:00
github-actions[bot]
45b73651f8 deploy: update catalyst images to aed0a81 2026-05-06 16:30:28 +00:00
e3mrah
aed0a81f75
fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page (#1058)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly
   from the now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers see the chroot-imported deployment right away instead of
   waiting for a pod restart's restoreFromStore pass.
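
Going back to point 1 above (the fixture fallback): a minimal TypeScript sketch of the fixture-free topology query, assuming an illustrative endpoint path suffix and query key rather than CloudPage's actual code:

```ts
import { useQuery } from '@tanstack/react-query';

type Topology = { nodes: unknown[]; links: unknown[] }; // placeholder shape

// Any failure resolves to null; the page renders its empty state from null.
// No test fixture is imported anywhere on this path.
export function useTopologyOrNull(deploymentId: string | undefined) {
  return useQuery<Topology | null>({
    queryKey: ['cloud-topology', deploymentId],   // illustrative key
    enabled: !!deploymentId,
    retry: false,
    queryFn: async (): Promise<Topology | null> => {
      // illustrative path suffix under /api/v1/deployments/{id}/
      const res = await fetch(`/api/v1/deployments/${deploymentId}/topology`);
      return res.ok ? ((await res.json()) as Topology) : null;
    },
  });
}
```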

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() prefers the
URL :deploymentId param when present and otherwise reads the JWT
cookie's deployment_id claim via /api/v1/sovereign/self. The topology
query also gates on `!!deploymentId` so it doesn't waste a 404
round-trip while the cookie claim is still resolving.

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.

  /settings/marketplace   → MarketplaceSettings  (wrapped in PortalShell)
  /parent-domains         → ParentDomainsPage    (wrapped in PortalShell)
  /catalog                → CatalogAdminPage     (deleted)

Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
  - /catalog route registration in router.tsx
  - 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from ActiveSection union + deriveActiveSection regex

The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.

Bump chart 1.4.64 → 1.4.65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 20:28:11 +04:00
github-actions[bot]
5d9fa2a5e7 deploy: update catalyst images to 8c8ccfb 2026-05-06 16:08:33 +00:00
e3mrah
8c8ccfbfed
fix(chroot): single chrome — no frame in frame, no mother handover banner (#1057)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly
   from the now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers see the chroot-imported deployment right away instead of
   waiting for a pod restart's restoreFromStore pass.

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() prefers the
URL :deploymentId param when present and otherwise reads the JWT
cookie's deployment_id claim via /api/v1/sovereign/self. The topology
query also gates on `!!deploymentId` so it doesn't waste a 404
round-trip while the cookie claim is still resolving.

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 20:05:15 +04:00
github-actions[bot]
bda5617aed deploy: update catalyst images to 933b321 2026-05-06 15:15:15 +00:00