a987748b42
235 Commits
25f14469d3
fix(provisioner): map wizard's three-mode domain selector to tofu's binary pool/byo enum (#1069)
Caught live on omantel.biz re-provision (deploymentId ab0bf689620f4102):
tofu plan failed at exit 1 with:
Error: Invalid value for variable
on variables.tf line 296:
296: variable "domain_mode" {
├────────────────
│ var.domain_mode is "byo-manual"
Domain mode must be 'pool' or 'byo'.
The wizard's StepDomain has three options (pool / byo-manual /
byo-api) so the UX can branch the operator into the right flow:
- pool: OpenOva owns the parent zone via Dynadot+PDM
- byo-manual: operator pastes NS records into their registrar
- byo-api: operator's registrar API drives NS automatically
The OpenTofu module's `variable "domain_mode"` validation only
accepts the binary pool/byo distinction — from the cloud-infra layer
(Hetzner servers, network, LB) NONE of those wizard distinctions
matter; tofu only needs to know whether to call Dynadot at apply
time. The three-mode wizard value was being written verbatim to the
tfvars without mapping.
Add `mapDomainModeForTofu(wizardMode)` helper:
- "pool" → "pool"
- "byo-manual"→ "byo"
- "byo-api" → "byo"
- empty → "byo" (test path that doesn't set the field)
Bump chart 1.4.83 → 1.4.84.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
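The mapping listed above can be sketched as follows. This is an illustration only: the commit message does not show the helper's real language or signature, so the types are assumptions, and throwing on unknown values is a choice made for this sketch rather than documented behaviour.

```typescript
// Hypothetical types; only the four mappings come from the commit message.
type TofuDomainMode = "pool" | "byo";

function mapDomainModeForTofu(wizardMode: string): TofuDomainMode {
  switch (wizardMode) {
    case "pool":
      return "pool";
    case "byo-manual":
    case "byo-api":
      return "byo";
    case "":
      // Test path that does not set the field.
      return "byo";
    default:
      // Assumption: reject unknown modes early instead of letting
      // `tofu plan` fail on the variable validation later.
      throw new Error("unknown wizard domain mode: " + wizardMode);
  }
}
```

With this shape, the tfvars writer would only ever emit one of the two values that the `domain_mode` validation accepts.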
0a0b912e0d
fix(wizard): KServe was wrongly under Always Included on every Sovereign (#1068)
* fix(hetzner-purge): close volumes/primary_ips/floating_ips gap — wipe was leaving Crossplane orphans
Founder caught the gap on omantel.biz post-decommission: Hetzner
console showed 0 servers/LBs/IPs but 1 Volume + 2 Networks + 1
Firewall lingering. Networks/Firewall were the existing async-detach
window (handled by name-prefix fallback in the next provision); the
**Volume** was a hard miss — Purge() never called /v1/volumes.
Root cause: post-handover, the Hetzner Cloud Volume CSI driver
allocates Hetzner Volumes for every CNPG/Harbor/Loki/Mimir
StatefulSet PVC. tofu state never tracks them. When the operator
decommissions, `tofu destroy` is a no-op for the Volume and the
existing label-sweep didn't list /v1/volumes either. Result: orphan
volumes accrue cloud cost across re-provision cycles.
Same architectural gap for primary_ips (CCM-allocated for LoadBalancer
services since Hetzner's 2023 IP-decoupling) and floating_ips
(rare in Catalyst stack but listed for completeness).
Fix: extend Purge() + purgeByNamePrefix() to walk three additional
endpoints in dependency order:
servers → load_balancers → firewalls → networks → ssh_keys
→ volumes (after servers detach)
→ primary_ips (after LBs free their IPs)
→ floating_ips
Both label-pass AND name-prefix-pass cover all 8 kinds. PurgeReport
extended with Volumes/PrimaryIPs/FloatingIPs slices; Total() updated.
CSI-named volumes (`pvc-<uid>` form) won't match either pass — those
need the canonical `catalyst.openova.io/sovereign=<fqdn>` label which
the Crossplane composition for VolumeClaim must apply. That's a
separate composition-layer fix tracked separately; this PR closes
the wipe gap for everything labelled OR name-prefixed.
Bump chart 1.4.80 → 1.4.81.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
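A minimal sketch of the purge ordering and the two matching passes described above. The real Purge() is not shown in the commit message; the resource shape, the function names and the name-prefix parameter here are hypothetical, and only the kind order and the label key are taken from the text.

```typescript
// Kinds in the dependency order quoted in the commit message.
const PURGE_ORDER: string[] = [
  "servers", "load_balancers", "firewalls", "networks", "ssh_keys",
  "volumes", "primary_ips", "floating_ips",
];

// Label key quoted in the commit message.
const SOVEREIGN_LABEL = "catalyst.openova.io/sovereign";

// Hypothetical resource shape, for this sketch only.
interface CloudResource {
  kind: string;
  name: string;
  labels: { [key: string]: string };
}

// A resource is swept if the label pass OR the name-prefix pass matches.
// The prefix value itself is an assumption of this sketch.
function matchesSovereign(r: CloudResource, fqdn: string, namePrefix: string): boolean {
  return r.labels[SOVEREIGN_LABEL] === fqdn || r.name.indexOf(namePrefix) === 0;
}

// Returns the matching resources sorted into deletion order.
function planPurge(resources: CloudResource[], fqdn: string, namePrefix: string): CloudResource[] {
  const plan: CloudResource[] = [];
  for (const kind of PURGE_ORDER) {
    for (const r of resources) {
      if (r.kind === kind && matchesSovereign(r, fqdn, namePrefix)) plan.push(r);
    }
  }
  return plan;
}
```

An unlabelled volume named in the `pvc-<uid>` form matches neither pass in this sketch, which mirrors the remaining gap the commit message calls out.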
* fix(wizard): KServe was wrongly under Always Included on every Sovereign
Founder caught on console.openova.io/sovereign/wizard step 4: KServe
appeared in the "Always Included" section as if every Sovereign had
to install it. False positive — KServe is conditionally mandatory
ONLY when the operator opts into the CORTEX (AI/ML) product family.
Two coupled bugs:
(1) Data model: kserve was tagged tier:'mandatory' inside the CORTEX
product family, but tier:'mandatory' is consumed everywhere in
the wizard as "always-on regardless of family selection":
- componentGroups.ts:543 — seedIds.add(c.id) → auto-selected at
wizard init for every Sovereign
- applicationCatalog.ts:97 — seeded into the apps grid
- store.ts:642 — special-cased as undeselectable
- StepComponents.tsx — surfaced under "Always Included" tab
Demote to tier:'recommended'. CORTEX has
cascadeOnMemberSelection:true so picking any CORTEX member (vLLM,
Specter, BGE, Milvus, …) still auto-pulls KServe via the cascade
— that's the right semantics. KServe stays visible under CORTEX
in Tab 1 ("Choose Your Stack") and locks-in once CORTEX is
selected.
(2) UI filter: AlwaysIncludedTab was iterating every PRODUCTS entry
regardless of product.tier and listing every member with
component.tier === 'mandatory'. That mixes the platform-mandatory
layer (PILOT/SPINE/SURGE/SILO/GUARDIAN tier:'mandatory' families)
with conditional-mandatory members of opt-in families
(CORTEX/RELAY tier:'optional', INSIGHTS/FABRIC tier:'recommended').
Filter by product.tier === 'mandatory' so only the always-on
families' mandatory members appear. Defence-in-depth — even if a
new opt-in family ships with internal-mandatory members, they
won't leak into "Always Included".
Audit confirmed kserve was the only offender across all 9 product
families today. PILOT/SPINE/SURGE/SILO/GUARDIAN remain unchanged
(their members rightfully tier:'mandatory'); CORTEX kserve fixed;
others have no internal mandatories.
Bump chart 1.4.81 → 1.4.82.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
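The UI filter from bug (2) can be sketched like this. `product.tier` and `component.tier` come from the commit message; the `components` field name and the sample ids in the usage note are assumptions for illustration, not the wizard's actual data model.

```typescript
type Tier = "mandatory" | "recommended" | "optional";

// Hypothetical shapes; only the `tier` fields are named in the commit message.
interface ComponentEntry { id: string; tier: Tier; }
interface ProductEntry { id: string; tier: Tier; components: ComponentEntry[]; }

// Only mandatory members of mandatory product families are "Always Included".
function alwaysIncluded(products: ProductEntry[]): string[] {
  const ids: string[] = [];
  for (const product of products) {
    if (product.tier !== "mandatory") continue; // skip opt-in families entirely
    for (const component of product.components) {
      if (component.tier === "mandatory") ids.push(component.id);
    }
  }
  return ids;
}
```

With the family-level check in place, a member tagged tier:'mandatory' inside an opt-in family such as CORTEX no longer appears in the result, even before its own tier is corrected.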
b233202b65
fix(hetzner-purge): close volumes/primary_ips/floating_ips gap — wipe was leaving Crossplane orphans (#1067)
Founder caught the gap on omantel.biz post-decommission: Hetzner
console showed 0 servers/LBs/IPs but 1 Volume + 2 Networks + 1
Firewall lingering. Networks/Firewall were the existing async-detach
window (handled by name-prefix fallback in the next provision); the
**Volume** was a hard miss: Purge() never called /v1/volumes.
Root cause: post-handover, the Hetzner Cloud Volume CSI driver
allocates Hetzner Volumes for every CNPG/Harbor/Loki/Mimir
StatefulSet PVC. tofu state never tracks them. When the operator
decommissions, `tofu destroy` is a no-op for the Volume and the
existing label-sweep didn't list /v1/volumes either. Result: orphan
volumes accrue cloud cost across re-provision cycles.
Same architectural gap for primary_ips (CCM-allocated for LoadBalancer
services since Hetzner's 2023 IP-decoupling) and floating_ips
(rare in Catalyst stack but listed for completeness).
Fix: extend Purge() + purgeByNamePrefix() to walk three additional
endpoints in dependency order:
servers → load_balancers → firewalls → networks → ssh_keys
→ volumes (after servers detach)
→ primary_ips (after LBs free their IPs)
→ floating_ips
Both label-pass AND name-prefix-pass cover all 8 kinds. PurgeReport
extended with Volumes/PrimaryIPs/FloatingIPs slices; Total() updated.
CSI-named volumes (`pvc-<uid>` form) won't match either pass; those
need the canonical `catalyst.openova.io/sovereign=<fqdn>` label which
the Crossplane composition for VolumeClaim must apply. That's a
separate composition-layer fix tracked separately; this PR closes
the wipe gap for everything labelled OR name-prefixed.
Bump chart 1.4.80 → 1.4.81.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
daeff32cbe
fix(cloudpage): hoist k8sStream above ctx — TS use-before-declaration broke build-ui (#1066)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56
PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings,
HandleSovereignTopology) but left four route registrations in
cmd/api/main.go that still referenced those handler methods. The
catalyst-api build for the merged revert (run 25439549879) failed
with:
cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
cmd/api/main.go:693:42: h.HandleSovereignTopology undefined
That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published: only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.
Drop the four route registrations. Same baby, new address: the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.
Also revert two more parallel-baby fragments still on main:
- getHierarchicalInfrastructure mode-aware fetcher → single mother URL
(the chroot resolves deploymentId from the cookie and the mother-side
topology handler serves byte-identical data once cutover-import has
persisted the deployment record on the Sovereign's local store)
- CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere
Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign
The chroot Sovereign Console at console.<sov-fqdn> is the SAME
catalyst-api binary as the mother.
When that binary runs ON the Sovereign cluster (catalyst-system
namespace on the Sovereign itself), there is no posted-back
kubeconfig: the catalyst-api IS in the cluster it needs to talk to,
and rest.InClusterConfig() returns the right credentials.
Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back", including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.
Detection: SOVEREIGN_FQDN env (set on every Sovereign-side
catalyst-api deployment by the chart) matches
dep.Request.SovereignFQDN. On the mother, SOVEREIGN_FQDN is unset →
unchanged behavior. On the chroot, SOVEREIGN_FQDN matches the only
deployment served (its own) → use in-cluster.
Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(user-access): empty list when CRD absent + RBAC for chroot
Two coupled fixes for the /users page on chroot Sovereign Console:
1. catalyst-api-cutover-driver ClusterRole: grant read/write on
useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
uses the in-cluster ServiceAccount (per PR #1052). The list call
was returning 403 from the apiserver because the SA had no rule
covering this CRD.
2. ListUserAccess: return 200 with empty items when the CRD itself is
not installed (apierrors.IsNotFound). The access.openova.io CRD
ships via a separate blueprint that may not yet be installed on a
fresh Sovereign; the page should render its empty state, not a 500
toast.
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroot): byte-identical /jobs + /cloud - kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint
Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.
1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
the topology query errored, because it fell back to
`infrastructureTopologyFixture` from `src/test/fixtures/`. That is
a test-only file leaking into production via the production import
tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
placeholder data: empty state when you don't know).
Fix: drop the fixture fallback. On error → null → empty-state
render. The mother shows the same empty state when its loader
returns nothing; byte-identical.
2. JobsTable + JobDetail rendered a flat green-grid because the chroot
was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
(no dependsOn, no parentId, no exec records). Mother's
`/api/v1/deployments/{depId}/jobs` returns the rich shape from a
per-deployment jobs.Store, which on the chroot starts empty (the
mother's exportDeploymentToChild only ships the deployment record,
not the jobs.Store contents).
Fix: ship one URL on both surfaces:
`/api/v1/deployments/{id}/jobs`.
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the
per-deployment jobs.Store has 0 records: do a one-shot HelmRelease
list via the in-cluster client
(helmwatch.ListAndSnapshotHelmReleases, exported here, mirrors
Watcher.SnapshotComponents without spinning up an informer), pass
through snapshotsToSeeds + Bridge.SeedJobsFromInformerList.
Subsequent calls read directly from the now-populated store and
return rich Job records with dependsOn / parentId / status, exactly
like the mother.
useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses
the same `/api/v1/deployments/{id}/jobs` URL as the mother.
3. HandleDeploymentImport now also loads the imported record into the
in-memory deployments map immediately, so `/deployments/{id}/*`
handlers don't need a pod restart's restoreFromStore to see the
chroot-imported deployment.
Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(jobdetail): bare-jobName URL - Traefik strips %3A so canonical id 404s
JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw the
literal "%3A" and Store.GetJob's exact-match path missed.
Two coupled fixes:
1. useJobLinkBuilder strips the "<deploymentId>:" prefix before
encoding, producing /jobs/install-keycloak (Traefik-safe) instead
of /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
accepts both bare jobName and canonical id (see
store.go:781-789).
2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
the URL param resolves regardless of which format the link emitted.
Bump chart 1.4.58 → 1.4.59.
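A minimal sketch of the two jobdetail fixes, under stated assumptions: the function names and the job record shape below are hypothetical (the real useJobLinkBuilder and JobDetail.jobsById are not shown), and the sketch assumes the deployment id itself contains no colon.

```typescript
// Hypothetical record shape, for this sketch only.
interface JobRecord { id: string; jobName: string; }

// Strip a leading "<deploymentId>:" prefix, if present.
function bareJobName(jobId: string): string {
  const colon = jobId.indexOf(":");
  return colon === -1 ? jobId : jobId.slice(colon + 1);
}

// Build a path with no encoded colon in it.
function jobPath(jobId: string): string {
  return "/jobs/" + encodeURIComponent(bareJobName(jobId));
}

// Index by BOTH canonical id and bare job name so either URL form resolves.
function indexJobs(jobs: JobRecord[]): { [key: string]: JobRecord } {
  const byKey: { [key: string]: JobRecord } = {};
  for (const job of jobs) {
    byKey[job.id] = job;
    byKey[bareJobName(job.id)] = job;
  }
  return byKey;
}
```

For the id quoted above, `jobPath` yields /jobs/install-keycloak, whereas encoding the canonical id directly would yield the `%3A` form that the proxy does not decode.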
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(cloud): resolve deploymentId from cookie on chroot - was firing topology against undefined
CloudPage's topology query fired against /deployments/undefined/... on
the chroot (URL is /cloud, no deploymentId path segment), so the page
showed "Couldn't load architecture" with all node counts at 0/0.
Fix: same pattern as JobDetail: useResolvedDeploymentId() reads the
JWT cookie's deployment_id claim via /api/v1/sovereign/self, falling
back from URL params. Topology query also gates on `!!deploymentId` so
it doesn't waste a 404 round-trip during cookie resolution.
Bump chart 1.4.60 → 1.4.61.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroot): single chrome - no frame in frame, no mother handover banner
Two visible bleed-throughs from the mother's wizard UX onto the chroot
Sovereign Console at console.<sov-fqdn>:
1. **Two stacked headers + sidebar inside sidebar** ("frame in
frame"). SovereignConsoleLayout rendered its own sidebar+header AND
the page inside rendered PortalShell which rendered ANOTHER header
(its sidebar was already skipped for chroot per a prior fix). User
saw two horizontal title bars stacked.
Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
It runs the cookie/OIDC auth gate + RequiredActionsModal, then
renders <Outlet/> with NO chrome. PortalShell is now the single
chrome owner on both surfaces:
- Mother (/sovereign/provision/$id): renders Sidebar with
/provision/$id/X URLs + its header.
- Chroot (console.<sov-fqdn>): renders SovereignSidebar with clean
/X URLs + the same header.
One sidebar, one header, byte-identical to mother layout.
2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
banner on /apps.** This is the mother's wizard celebration that
tells the operator "you can now jump to your new Sovereign".
On the chroot the operator IS already on the Sovereign Console; the
banner bleeds through because the imported deployment record
carries the mother's handover-ready event in its history.
Resolution: AppsPage gates the banner, the toast, and the
auto-redirect timer on `!isSovereignMode`. Chroot stays clean.
Bump chart 1.4.62 → 1.4.63.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page
Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header, a visible look-and-feel break.
/settings/marketplace → MarketplaceSettings (wrapped in PortalShell)
/parent-domains → ParentDomainsPage (wrapped in PortalShell)
/catalog → CatalogAdminPage (deleted)
Drop /catalog entirely per founder direction: a separate page just to
flip a "publish to marketplace" boolean per app is the wrong shape.
The natural place for that toggle is on each /apps card (future PR;
needs HandleSovereignApps to join publish state from the SME catalog
microservice).
Removed:
- /catalog route registration in router.tsx
- 'Catalog' entry in SovereignSidebar's FLAT_NAV
- CatalogAdminPage.tsx (525 lines)
- 'catalog' from ActiveSection union + deriveActiveSection regex
The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.
Bump chart 1.4.64 → 1.4.65.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(apps): publish chip on each card - replaces deleted /catalog page
Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" Drop the
standalone /catalog page (#1058), put the publish toggle on each /apps
card.
Backend (catalyst-api):
- New file sme_catalog_client.go: best-effort client for the
in-cluster SME catalog microservice at
http://catalog.sme.svc.cluster.local:8082. 30s response cache, 1.5s
probe budget, returns nil on DNS NXDOMAIN (SME services tier not
deployed on this Sovereign; common when marketplace.enabled is
false).
- HandleSovereignApps decorates each app with `marketplacePublished`
*bool joined by slug from the SME catalog. nil ⇒ slug not in SME
catalog (bootstrap component, or marketplace not deployed) ⇒ FE
suppresses the chip.
- New handler HandleSovereignAppPublish at
PATCH /api/v1/sovereign/apps/{slug}/publish. Body
{"published": bool}. Proxies to
PATCH /catalog/admin/apps/{slug}/publish on the SME catalog.
Surfaces upstream status verbatim. Invalidates the cache so the next
/apps poll reflects the change immediately.
Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of the
bare status map.
- Each AppCard with a non-null marketplacePublished renders a
PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil → no
chip (correct: nothing to toggle).
- Cards with marketplace.enabled=false render no chips at all (SME
catalog unreachable → nil for every slug).
Bump chart 1.4.66 → 1.4.67.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code
Audit triggered by founder asking if PRs #1051..#1059 reach NEW
Sovereigns or just my manual `kubectl set image` patches on omantel.
Answer was: nothing reached anyone except omantel via manual patches.
Both contabo AND every fresh Sovereign would install :2122fb8, the SHA
frozen at PR #1040's last manual chart-touch on May 6 morning.
Root cause:
- chart/templates/api-deployment.yaml + ui-deployment.yaml carry
LITERAL image refs
("ghcr.io/openova-io/openova/catalyst-api:2122fb8"), not
Helm-templated `{{ .Values.images.catalystApi.tag }}`.
- catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag
on every push, but no template reads from it. Dead code.
- contabo's catalyst-platform Flux Kustomization at
./products/catalyst/chart/templates applies these as raw manifests.
- Sovereigns Helm-install the same chart; Helm passes the literal
through unchanged.
- Both ended up frozen at whatever literal was committed at the last
manual chart-touching PR.
Fix:
1. CI's deploy step now bumps both the literal SHAs in the two
template files AND the unused-but-kept-for-SME-services
values.yaml. Sed-patches the literal directly so contabo's
Kustomize path keeps working.
2. The commit step adds the two templates to the staged set alongside
values.yaml, so every "deploy: update catalyst images to <sha>"
commit propagates to contabo (10-min reconcile) AND Sovereigns
(next OCI chart publish via blueprint-release).
3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with the
latest literal (currently :8361df4) gets republished and pinned in
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.
Why drop the "freeze contabo" intent of the previous comment:
The previous comment said contabo auto-roll on every PR was bad
because PR #975's image broke contabo (k8scache startup loop).
Solution there is: fix the bug in the code, not freeze contabo.
Freezing masked real divergence: the reason the founder caught this is
that manual omantel patches were the only thing keeping omantel
current while contabo + every other fresh Sovereign quietly ran 9 PRs
behind.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(k8scache): chroot Sovereign self-registers via in-cluster config - completes the real-time data plane
Founder asked: "make the real-time k8s information propagation
development reused — find the reverted prior work and implement the
final working one."
History:
- PR #358 (May 1) shipped the full informer + SSE data plane:
internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics}
+ handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync)
+ UI hook lib/useK8sStream.ts + widget useK8sCacheStream.
- PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream
with kinds=namespace,node,pv,pod,deployment,...,server.hcloud,
volume.hcloud and `&initialState=1` for live cloud-graph deltas.
- PR #981 hotfix dropped the synchronous discovery probe in
factory.go:AddCluster (it was calling
core.Discovery().ServerResourcesForGroupVersion(gv) with NO context
timeout; on a kubeconfig pointing at a decommissioned otech the call
hung the catalyst-api startup for minutes per dead cluster).
After #981 the discovery-probe surgery was clean: no follow-up broke.
The data plane code stayed in the codebase. The remaining gap was
operational, not architectural:
On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>), the
catalyst-api boots without a posted-back kubeconfig in
/var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns [] →
factory has zero clusters → every
/api/v1/sovereigns/{depId}/k8s/* request 404s with
"sovereign \"...\" not registered". The architecture-graph in-flight
call confirmed live on omantel.biz today.
Fix in this PR:
1. **k8scache.FactoryFromEnv chroot self-register**: when
SOVEREIGN_FQDN env is set (chroot mode), build a ClusterRef with id
resolved from CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped)
or by scanning /var/lib/catalyst/deployments/*.json for a record
matching the FQDN (mirrors HandleSovereignSelf's store-fallback
path for consistency). DynamicClient + CoreClient built from
rest.InClusterConfig(). Append to the cluster list. Mother behavior
unchanged: SOVEREIGN_FQDN unset → branch is a no-op.
2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide
get/list/watch on every kind in the k8scache registry (pods,
deployments, statefulsets, daemonsets, replicasets, services,
endpointslices, ingresses, configmaps, secrets, persistentvolumes,
persistentvolumeclaims, hcloud.crossplane.io managed resources,
vclusters), plus authorization.k8s.io/subjectaccessreviews so the
per-event SAR gating in the SSE handler doesn't 403 silently.
3. Bump chart 1.4.70 → 1.4.71.
The discovery-probe failure mode that triggered the original revert
(synchronous ServerResourcesForGroupVersion blocking startup) does NOT
recur here: InClusterConfig() returns immediately, NewForConfig is
lazy, and the first network call happens inside the informer goroutine
after Start, off the boot critical path. Mother-side
LoadClustersFromDir behavior is untouched (no probe, just kubeconfig
file parsing as it has been since #981).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(cloud): + More popover escapes overflow clip + graph centers via gravity force
Two cloud-page bugs caught live on omantel.biz:
(1) /cloud?view=list&kind=clusters → +More popover non-functional.
The popover renders at its anchor coords but pointer events pass
through to the toolbar below it. Diagnosis:
.cloud-page-toolbar > [data-testid="cloud-kind-chips"] { overflow-x: auto; }
Per CSS spec, when one overflow axis is non-visible, the OTHER axis
becomes auto/hidden too.
So overflow-x:auto on the chips strip silently sets overflow-y:auto,
which clips the absolutely-positioned popover that hangs DOWN from the
+More button.
Fix: render the popover via React.createPortal to document.body so
it's outside any overflow ancestor. Position via fixed coordinates
computed from the +More button's getBoundingClientRect, recomputed on
resize/scroll. Click-outside dismissal updated to check both wrapper
AND portaled popover.
(2) /cloud?view=graph → bubbles drift to canvas edges, leaving the
centre empty until enough nodes (e.g. worker nodes) are added to
anchor things via link tension.
Two coupled root causes:
a) `forceCenter` only adjusts the centroid: it shifts ALL nodes
uniformly so their average sits at (cx, cy). It does NOT pull
individual nodes inward. With small node counts and high charge
repulsion (-160 for ≤50 nodes), nothing opposes outward drift.
b) `makeForceBound` was a HARD clamp: `if (n.x < minX) n.x = minX`.
Nodes that hit the wall get arrested with their velocity preserved
on the perpendicular axis but no inward impulse → they slide along
the wall and stack at corners. The simulation never relaxes back to
the centre.
Fix:
a) Add forceX(cx) + forceY(cy) with `centerGravity` strength per
node-count tier (0.08 for ≤50, scaling down with larger graphs
where link tension is sufficient). This pulls every individual node
toward the centre proportional to its offset.
b) Replace the hard clamp with an elastic bounce: when a node hits the
boundary, reverse its velocity component (×0.4 damping) instead of
zeroing it. Energy returns to the system, the simulation actually
relaxes.
Bump chart 1.4.72 → 1.4.73.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(cloud): expose all live K8s kinds in +More popover + chip counts + tighter graph centering
Founder feedback (after PR #1062 lit up the data plane):
1. The +More popover was missing pods, deployments, statefulsets,
daemonsets, configmaps, secrets, namespaces, etc.; it only carried
the 6 placeholder kinds the legacy topology API knew about.
2. Several chips (Services, Ingresses, Storage Classes) showed "—" for
count even though the data IS in the live cluster (visible in the
graph view).
3. The graph view still pushed bubbles to canvas edges; only adding
worker nodes brought things back. The previous gravity tuning
wasn't strong enough for ~300 nodes.
This PR addresses all three.
(1) Eleven new K8s-backed list pages exposed in +More: Pods,
Deployments, StatefulSets, DaemonSets, ReplicaSets, ConfigMaps,
Secrets, Namespaces, Nodes, PersistentVolumes, EndpointSlices. Plus
replaced the placeholder Services and Ingresses pages with live K8s
tables.
All built on a new generic K8sListPage that subscribes to
/api/v1/sovereigns/{depId}/k8s/stream (same SSE channel the
architecture-graph already uses) and renders a typed-column table per
kind. Columns are declared once per kind in kindsPages.tsx; the
rendering is uniform so adding a kind is a ~12-line wrapper.
(2) CloudPage.kindCounts now folds the live K8s snapshot into the
chip-count map. KIND_TO_REGISTRY in kinds.ts maps each chip id to the
registry kind name (pods → 'pod' etc). Counts that came from null
(data not available) flip to live counts the moment the SSE stream's
initialState=1 arrives.
(3) GraphCanvas physics retuned for live-data scale:
- centerGravity: 0.08→0.18 for ≤50 nodes, 0.06→0.16 for ≤200,
0.04→0.14 for ≤1000, 0.03→0.10 for ≤5000, 0.02→0.08 for >5000. The
forceX/forceY pulls every individual node toward (cx,cy)
proportional to its offset, 2-3× stronger than the original tuning
so the canvas centre stays populated.
- Charge softened: -160→-90 for ≤50 nodes, scaled down through every
tier. The previous values were calibrated against a ~20-node
topology stub; live data delivers 10-50× more nodes per Sovereign so
charge needs to relax proportionally.
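The retuned gravity tiers and the elastic bounce can be sketched without any graph library. This is an illustration under assumptions: the node shape, function names and the use of the absolute velocity on bounce are choices made for this sketch, not the GraphCanvas implementation; only the tier values and the ×0.4 damping come from the commit messages.

```typescript
// centerGravity per node-count tier (the new values listed above).
function centerGravity(nodeCount: number): number {
  if (nodeCount <= 50) return 0.18;
  if (nodeCount <= 200) return 0.16;
  if (nodeCount <= 1000) return 0.14;
  if (nodeCount <= 5000) return 0.10;
  return 0.08;
}

// Hypothetical simulation node: position plus velocity.
interface SimNode { x: number; y: number; vx: number; vy: number; }

// Per-node pull toward (cx, cy), proportional to the node's offset.
// `alpha` stands for the simulation's cooling factor.
function applyGravity(n: SimNode, cx: number, cy: number, strength: number, alpha: number): void {
  n.vx += (cx - n.x) * strength * alpha;
  n.vy += (cy - n.y) * strength * alpha;
}

// Elastic bound: on hitting a wall, point the velocity component back
// inward and damp it, instead of clamping the position and leaving the
// node with no inward impulse.
function elasticBound(n: SimNode, minX: number, maxX: number, minY: number, maxY: number, damping: number = 0.4): void {
  if (n.x < minX) { n.x = minX; n.vx = Math.abs(n.vx) * damping; }
  else if (n.x > maxX) { n.x = maxX; n.vx = -Math.abs(n.vx) * damping; }
  if (n.y < minY) { n.y = minY; n.vy = Math.abs(n.vy) * damping; }
  else if (n.y > maxY) { n.y = maxY; n.vy = -Math.abs(n.vy) * damping; }
}
```

Unlike a centroid-only force, `applyGravity` acts on each node individually, which is the property the commit relies on to keep the canvas centre populated.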
Bump chart 1.4.74 → 1.4.75.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(cloud-list): share single SSE subscription via CloudContext - list pages were stuck connecting
After PR #1064 the +More popover was correctly populated and chip
counts were live, but clicking through to a list page (e.g.
/cloud?view=list&kind=pods) hung at "Connecting to live cluster
stream…" while the chip count beside the same kind already showed the
right number (110 pods).
Diagnosis: the K8sListPage was calling useK8sCacheStream with
kinds:[kind], opening its OWN EventSource. The parent CloudPage
already had an EventSource open (subscribing to all kinds, the source
of the chip counts). Two long-lived SSE streams from the same browser
to the same origin starve the connection budget; the second connection
hangs at "connecting" while the first holds the slot.
Fix: hoist the snapshot via CloudContext. CloudPage is already the
owner of the page-level useK8sCacheStream invocation; expose its
snapshot/status/revision through the existing useCloud() context.
K8sListPage now reads from useCloud() instead of opening a duplicate
stream. Single subscription, single source of truth for both chip
counts AND list rows.
Bump chart 1.4.76 → 1.4.77.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(cloudpage): hoist k8sStream above ctx - was used before declaration
PR #1065 added k8sStream into the ctx useMemo deps but the
useK8sCacheStream() call was at line 396, well after the ctx build at
line 290. tsc -b caught it: TS2448/TS2454 use-before-declaration. CI
build-ui failed.
Move the useK8sCacheStream invocation to immediately precede the ctx
build. No behaviour change.
Bump chart 1.4.78 → 1.4.79.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
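The single-subscription idea behind the cloud-list fix can be sketched framework-free. The actual change shares the snapshot through the existing React context (CloudContext / useCloud()); the `SharedStream` class and `Opener` type below are hypothetical stand-ins that only illustrate "one upstream connection, many readers".

```typescript
type Listener<T> = (snapshot: T) => void;
// Opens the upstream connection and returns a function that closes it.
type Opener<T> = (emit: Listener<T>) => () => void;

class SharedStream<T> {
  opens = 0; // how many times the upstream was opened
  private listeners: Listener<T>[] = [];
  private closeUpstream: (() => void) | null = null;
  private readonly open: Opener<T>;

  constructor(open: Opener<T>) {
    this.open = open;
  }

  // Every reader shares one upstream; it is opened by the first
  // subscriber and closed when the last one leaves.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.push(listener);
    if (this.closeUpstream === null) {
      this.opens += 1;
      this.closeUpstream = this.open((snapshot) => {
        for (const l of this.listeners.slice()) l(snapshot);
      });
    }
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
      if (this.listeners.length === 0 && this.closeUpstream !== null) {
        this.closeUpstream();
        this.closeUpstream = null;
      }
    };
  }
}
```

In this shape the chip counts and a list page would both call `subscribe` on the same instance, so only one long-lived connection to the stream endpoint exists at a time.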
f02136a89c
fix(cloud-list): share single SSE via CloudContext — list pages were stuck connecting (#1065)
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per- deployment jobs.Store has 0 records: do a one-shot HelmRelease list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases — exported here, mirrors Watcher.SnapshotComponents without spinning up an informer), pass through snapshotsToSeeds + Bridge.SeedJobsFromInformerList. Subsequent calls read directly from the now-populated store and return rich Job records with dependsOn / parentId / status — exactly like the mother. useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses the same `/api/v1/deployments/{id}/jobs` URL as the mother. 3. HandleDeploymentImport now also loads the imported record into the in-memory deployments map immediately, so `/deployments/{id}/*` handlers don't need a pod restart's restoreFromStore to see the chroot-imported deployment. Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s JobDetail navigation was 404ing on the chroot because the link builder URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak") and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does not decode `%3A` inside path segments. The catalyst-api router saw the literal "%3A" and Store.GetJob's exact-match path missed. Two coupled fixes: 1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding, producing /jobs/install-keycloak (Traefik-safe) instead of /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already accepts both bare jobName and canonical id (see store.go:781-789). 2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so the URL param resolves regardless of which format the link emitted. Bump chart 1.4.58 → 1.4.59. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined CloudPage's topology query fired against /deployments/undefined/... on the chroot (URL is /cloud, no deploymentId path segment), so the page showed "Couldn't load architecture" with all node counts at 0/0. Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the JWT cookie's deployment_id claim via /api/v1/sovereign/self, falling back from URL params. Topology query also gates on `!!deploymentId` so it doesn't waste a 404 round-trip during cookie resolution. Bump chart 1.4.60 → 1.4.61. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): single chrome — no frame in frame, no mother handover banner Two visible bleed-throughs from the mother's wizard UX onto the chroot Sovereign Console at console.<sov-fqdn>: 1. **Two stacked headers + sidebar inside sidebar** ("frame in frame"). SovereignConsoleLayout rendered its own sidebar+header AND the page inside rendered PortalShell which rendered ANOTHER header (its sidebar was already skipped for chroot per a prior fix). User saw two horizontal title bars stacked. Resolution: SovereignConsoleLayout becomes auth-only on the chroot. It runs the cookie/OIDC auth gate + RequiredActionsModal, then renders <Outlet/> with NO chrome. PortalShell is now the single chrome owner on both surfaces: - Mother (/sovereign/provision/$id): renders Sidebar with /provision/$id/X URLs + its header. - Chroot (console.<sov-fqdn>): renders SovereignSidebar with clean /X URLs + the same header. One sidebar, one header, byte-identical to mother layout. 2. **"✓ Sovereign is ready — Redirecting to your Sovereign console" banner on /apps.** This is the mother's wizard celebration that tells the operator "you can now jump to your new Sovereign". 
On the chroot the operator IS already on the Sovereign Console; the banner bleeds through because the imported deployment record carries the mother's handover-ready event in its history. Resolution: AppsPage gates the banner, the toast, and the auto-redirect timer on `!isSovereignMode`. Chroot stays clean. Bump chart 1.4.62 → 1.4.63. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page Three chroot-only pages bypassed PortalShell entirely. After SovereignConsoleLayout went auth-only in #1057, they rendered full-bleed with no sidebar / no header — visible look-and-feel break. /settings/marketplace → MarketplaceSettings (wrapped in PortalShell) /parent-domains → ParentDomainsPage (wrapped in PortalShell) /catalog → CatalogAdminPage (deleted) Drop /catalog entirely per founder direction: a separate page just to flip a "publish to marketplace" boolean per app is the wrong shape. The natural place for that toggle is on each /apps card (future PR — needs HandleSovereignApps to join publish state from the SME catalog microservice). Removed: - /catalog route registration in router.tsx - 'Catalog' entry in SovereignSidebar's FLAT_NAV - CatalogAdminPage.tsx (525 lines) - 'catalog' from ActiveSection union + deriveActiveSection regex The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish on the SME catalog service is unaffected; it's exposed at marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future apps-card toggle will call it via the same path. Bump chart 1.4.64 → 1.4.65. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(apps): publish chip on each card — replaces deleted /catalog page Per founder direction: "if the catalog is just labeling an app to be shown in marketplace, why don't we do it through the apps?" — drop the standalone /catalog page (#1058), put the publish toggle on each /apps card. 
Backend (catalyst-api): - New file sme_catalog_client.go — best-effort client for the in-cluster SME catalog microservice at http://catalog.sme.svc.cluster.local:8082. 30s response cache, 1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier not deployed on this Sovereign — common when marketplace.enabled is false). - HandleSovereignApps decorates each app with `marketplacePublished` *bool joined by slug from the SME catalog. nil ⇒ slug not in SME catalog (bootstrap component, or marketplace not deployed) ⇒ FE suppresses the chip. - New handler HandleSovereignAppPublish at PATCH /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}. Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME catalog. Surfaces upstream status verbatim. Invalidates the cache so the next /apps poll reflects the change immediately. Frontend (AppsPage): - liveAppsQuery returns { statusById, publishedBySlug } instead of the bare status map. - Each AppCard with a non-null marketplacePublished renders a PUBLISHED / UNPUBLISHED chip alongside the status chip. Click → PATCH → optimistic refetch via React Query. - Bootstrap components and apps not in the SME catalog have nil → no chip (correct: nothing to toggle). - Cards with marketplace.enabled=false render no chips at all (SME catalog unreachable → nil for every slug). Bump chart 1.4.66 → 1.4.67. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code Audit triggered by founder asking if PRs #1051..#1059 reach NEW Sovereigns or just my manual `kubectl set image` patches on omantel. Answer was: nothing reached anyone except omantel via manual patches. Both contabo AND every fresh Sovereign would install :2122fb8 — the SHA frozen at PR #1040's last manual chart-touch on May 6 morning. 
Root cause: - chart/templates/api-deployment.yaml + ui-deployment.yaml carry LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"), not Helm-templated `{{ .Values.images.catalystApi.tag }}`. - catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag on every push — but no template reads from it. Dead code. - contabo's catalyst-platform Flux Kustomization at ./products/catalyst/chart/templates applies these as raw manifests. - Sovereigns Helm-install the same chart; Helm passes the literal through unchanged. - Both ended up frozen at whatever literal was committed at the last manual chart-touching PR. Fix: 1. CI's deploy step now bumps both the literal SHAs in the two template files AND the unused-but-kept-for-SME-services values.yaml. Sed-patches the literal directly so contabo's Kustomize path keeps working. 2. The commit step adds the two templates to the staged set alongside values.yaml, so every "deploy: update catalyst images to <sha>" commit propagates to contabo (10-min reconcile) AND Sovereigns (next OCI chart publish via blueprint-release). 3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with the latest literal (currently :8361df4) gets republished and pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml. Why drop the "freeze contabo" intent of the previous comment: The previous comment said contabo auto-roll on every PR was bad because PR #975's image broke contabo (k8scache startup loop). Solution there is: fix the bug in the code, not freeze contabo. Freezing masked real divergence — the reason the founder caught this is that manual omantel patches were the only thing keeping omantel current while contabo + every other fresh Sovereign quietly ran 9 PRs behind. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane Founder asked: "make the real-time k8s information propagation development reused — find the reverted prior work and implement the final working one." History: - PR #358 (May 1) shipped the full informer + SSE data plane: internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics} + handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) + UI hook lib/useK8sStream.ts + widget useK8sCacheStream. - PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream with kinds=namespace,node,pv,pod,deployment,...,server.hcloud, volume.hcloud and `&initialState=1` for live cloud-graph deltas. - PR #981 hotfix dropped the synchronous discovery probe in factory.go:AddCluster (it was calling core.Discovery().ServerResourcesForGroupVersion(gv) with NO context timeout — on a kubeconfig pointing at a decommissioned otech the call hung the catalyst-api startup for minutes per dead cluster). After #981 the discovery-probe surgery was clean — no follow-up broke. The data plane code stayed in the codebase. The remaining gap was operational, not architectural: On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>), the catalyst-api boots without a posted-back kubeconfig in /var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns [] → factory has zero clusters → every /api/v1/sovereigns/{depId}/k8s/* request 404s with "sovereign \"...\" not registered". The architecture-graph in-flight call confirmed live on omantel.biz today. Fix in this PR: 1. 
**k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN env is set (chroot mode), build a ClusterRef with id resolved from CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by scanning /var/lib/catalyst/deployments/*.json for a record matching the FQDN (mirrors HandleSovereignSelf's store-fallback path for consistency). DynamicClient + CoreClient built from rest.InClusterConfig(). Append to the cluster list. Mother behavior unchanged — SOVEREIGN_FQDN unset → branch is a no-op. 2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide get/list/watch on every kind in the k8scache registry (pods, deployments, statefulsets, daemonsets, replicasets, services, endpointslices, ingresses, configmaps, secrets, persistentvolumes, persistentvolumeclaims, hcloud.crossplane.io managed resources, vclusters), plus authorization.k8s.io/subjectaccessreviews so the per-event SAR gating in the SSE handler doesn't 403 silently. 3. Bump chart 1.4.70 → 1.4.71. The discovery-probe failure mode that triggered the original revert (synchronous ServerResourcesForGroupVersion blocking startup) does NOT recur here — InClusterConfig() returns immediately, NewForConfig is lazy, and the first network call happens inside the informer goroutine after Start, off the boot critical path. Mother-side LoadClustersFromDir behavior is untouched (no probe, just kubeconfig file parsing as it has been since #981). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cloud): + More popover escapes overflow clip + graph centers via gravity force Two cloud-page bugs caught live on omantel.biz: (1) /cloud?view=list&kind=clusters → +More popover non-functional. The popover renders at its anchor coords but pointer events pass through to the toolbar below it. Diagnosis: .cloud-page-toolbar > [data-testid="cloud-kind-chips"] { overflow-x: auto; } Per CSS spec, when one overflow axis is non-visible, the OTHER axis becomes auto/hidden too. 
So overflow-x:auto on the chips strip silently sets overflow-y:auto, which clips the absolutely- positioned popover that hangs DOWN from the +More button. Fix: render the popover via React.createPortal to document.body so it's outside any overflow ancestor. Position via fixed coordinates computed from the +More button's getBoundingClientRect, recomputed on resize/scroll. Click-outside dismissal updated to check both wrapper AND portaled popover. (2) /cloud?view=graph → bubbles drift to canvas edges, leaving the centre empty until enough nodes (e.g. worker nodes) are added to anchor things via link tension. Two coupled root causes: a) `forceCenter` only adjusts the centroid — it shifts ALL nodes uniformly so their average sits at (cx, cy). It does NOT pull individual nodes inward. With small node counts and high charge repulsion (-160 for ≤50 nodes), nothing opposes outward drift. b) `makeForceBound` was a HARD clamp: `if (n.x < minX) n.x = minX`. Nodes that hit the wall get arrested with their velocity preserved on the perpendicular axis but no inward impulse → they slide along the wall and stack at corners. The simulation never relaxes back to the centre. Fix: a) Add forceX(cx) + forceY(cy) with `centerGravity` strength per node-count tier (0.08 for ≤50, scaling down with larger graphs where link tension is sufficient). This pulls every individual node toward the centre proportional to its offset. b) Replace the hard clamp with an elastic bounce: when a node hits the boundary, reverse its velocity component (×0.4 damping) instead of zeroing it. Energy returns to the system, the simulation actually relaxes. Bump chart 1.4.72 → 1.4.73. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cloud): expose all live K8s kinds in +More popover + chip counts + tighter graph centering Founder feedback (after PR #1062 lit up the data plane): 1. 
The +More popover was missing pods, deployments, statefulsets, daemonsets, configmaps, secrets, namespaces, etc. — it only carried the 6 placeholder kinds the legacy topology API knew about. 2. Several chips (Services, Ingresses, Storage Classes) showed "—" for count even though the data IS in the live cluster (visible in the graph view). 3. The graph view still pushed bubbles to canvas edges; only adding worker nodes brought things back. The previous gravity tuning wasn't strong enough for ~300 nodes. This PR addresses all three. (1) Eleven new K8s-backed list pages exposed in +More: Pods, Deployments, StatefulSets, DaemonSets, ReplicaSets, ConfigMaps, Secrets, Namespaces, Nodes, PersistentVolumes, EndpointSlices. Plus replaced the placeholder Services and Ingresses pages with live K8s tables. All built on a new generic K8sListPage that subscribes to /api/v1/sovereigns/{depId}/k8s/stream (same SSE channel the architecture-graph already uses) and renders a typed-column table per kind. Columns are declared once per kind in kindsPages.tsx; the rendering is uniform so adding a kind is a ~12-line wrapper. (2) CloudPage.kindCounts now folds the live K8s snapshot into the chip-count map. KIND_TO_REGISTRY in kinds.ts maps each chip id to the registry kind name (pods → 'pod' etc). Counts that came from null (data not available) flip to live counts the moment the SSE stream's initialState=1 arrives. (3) GraphCanvas physics retuned for live-data scale: - centerGravity: 0.08→0.18 for ≤50 nodes, 0.06→0.16 for ≤200, 0.04→0.14 for ≤1000, 0.03→0.10 for ≤5000, 0.02→0.08 for >5000. The forceX/forceY pulls every individual node toward (cx,cy) proportional to its offset — 2-3× stronger than the original tuning so the canvas centre stays populated. - Charge softened: -160→-90 for ≤50 nodes, scaled down through every tier. The previous values were calibrated against a ~20-node topology stub; live data delivers 10-50× more nodes per Sovereign so charge needs to relax proportionally. 
Bump chart 1.4.74 → 1.4.75.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud-list): share single SSE subscription via CloudContext — list pages were stuck connecting

After PR #1064 the +More popover was correctly populated and chip counts were live, but clicking through to a list page (e.g. /cloud?view=list&kind=pods) hung at "Connecting to live cluster stream…" while the chip count beside the same kind already showed the right number (110 pods).

Diagnosis: the K8sListPage was calling useK8sCacheStream with kinds:[kind], opening its OWN EventSource. The parent CloudPage already had an EventSource open (subscribing to all kinds — the source of the chip counts). Two long-lived SSE streams from the same browser to the same origin starve the connection budget; the second connection hangs at "connecting" while the first holds the slot.

Fix: hoist the snapshot via CloudContext. CloudPage is already the owner of the page-level useK8sCacheStream invocation; expose its snapshot/status/revision through the existing useCloud() context. K8sListPage now reads from useCloud() instead of opening a duplicate stream. Single subscription, single source of truth for both chip counts AND list rows.

Bump chart 1.4.76 → 1.4.77.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
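The jobdetail fix in the commit body above (strip the "<deploymentId>:" prefix before encoding, and index jobs by both the canonical id and the bare jobName) can be sketched as plain functions. This is a minimal illustration under assumptions: `bareJobName`, `jobDetailPath`, `indexJobs`, and the `Job` shape are hypothetical names, not the repository's useJobLinkBuilder or JobDetail.jobsById code.

```typescript
// Hypothetical sketch of the bare-jobName link scheme; names are illustrative only.
interface Job {
  id: string;      // canonical id, e.g. "<deploymentId>:<jobName>"
  status: string;
}

// Drop the "<deploymentId>:" prefix when present; leave other ids untouched.
function bareJobName(jobId: string, deploymentId: string): string {
  const prefix = `${deploymentId}:`;
  return jobId.startsWith(prefix) ? jobId.slice(prefix.length) : jobId;
}

// Build a path with no ":" in the segment, so nothing is emitted as %3A.
function jobDetailPath(jobId: string, deploymentId: string): string {
  return `/jobs/${encodeURIComponent(bareJobName(jobId, deploymentId))}`;
}

// Index by BOTH forms so a URL param in either format resolves to the same job.
function indexJobs(jobs: Job[], deploymentId: string): Map<string, Job> {
  const byKey = new Map<string, Job>();
  for (const job of jobs) {
    byKey.set(job.id, job);
    byKey.set(bareJobName(job.id, deploymentId), job);
  }
  return byKey;
}
```

With the id quoted in the commit message, `jobDetailPath("69e73b3abe673840:install-keycloak", "69e73b3abe673840")` yields `/jobs/install-keycloak`, whereas encoding the canonical id directly would produce the `%3A` form that 404ed.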
||
|
|
2604c9cf36
|
feat(cloud): all live K8s kinds in +More + chip counts + tighter graph centering (#1064)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56 PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers, HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology) but left four route registrations in cmd/api/main.go that still referenced those handler methods. The catalyst-api build for the merged revert (run 25439549879) failed with: cmd/api/main.go:690:39: h.HandleSovereignUsers undefined cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined cmd/api/main.go:692:42: h.HandleSovereignSettings undefined cmd/api/main.go:693:42: h.HandleSovereignTopology undefined That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never published — only the UI image rolled. Result: omantel.biz catalyst-api pod stuck in ImagePullBackOff. Drop the four route registrations. Same baby, new address — the chroot Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/* endpoints. Also revert two more parallel-baby fragments still on main: - getHierarchicalInfrastructure mode-aware fetcher → single mother URL (the chroot resolves deploymentId from the cookie and the mother-side topology handler serves byte-identical data once cutover-import has persisted the deployment record on the Sovereign's local store) - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster Kustomization version pin to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api binary as the mother. 
When that binary runs ON the Sovereign cluster (catalyst-system namespace on the Sovereign itself), there is no posted-back kubeconfig — the catalyst-api IS in the cluster it needs to talk to, and rest.InClusterConfig() returns the right credentials. Without this, every endpoint that needs the Sovereign-side dynamic client returned 503 with "sovereign cluster kubeconfig not yet posted back" — including ListUserAccess (/users page), CreateUserAccess, infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users rendered "list user-access: HTTP 503" because the Sovereign-side catalyst-api was looking for a kubeconfig that doesn't exist on the chroot side of the cutover boundary. Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api deployment by the chart) matches dep.Request.SovereignFQDN. On the mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot, SOVEREIGN_FQDN matches the only deployment served (its own) → use in-cluster. Same fallback applied to tryDynamicClientLocked (loaderInputFor's best-effort live-source client) so /infrastructure/topology and the /cloud graph render with live data on the chroot too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(user-access): empty list when CRD absent + RBAC for chroot Two coupled fixes for the /users page on chroot Sovereign Console: 1. catalyst-api-cutover-driver ClusterRole: grant read/write on useraccesses.access.openova.io. The Sovereign chroot's catalyst-api uses the in-cluster ServiceAccount (per PR #1052). The list call was returning 403 from the apiserver because the SA had no rule covering this CRD. 2. ListUserAccess: return 200 with empty items when the CRD itself is not installed (apierrors.IsNotFound). The access.openova.io CRD ships via a separate blueprint that may not yet be installed on a fresh Sovereign — the page should render its empty state, not a 500 toast. 
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the in-cluster client path: list call surfaced first as 403 (RBAC), then as 500 "server could not find the requested resource" (CRD absent). Both now resolve to a 200 + []. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint Two parallel-baby paths still made the chroot diverge from the mother on /cloud and /jobs/{jobId}. Both now ship one path that serves byte-identical data on both surfaces. 1. CloudPage rendered fictional topology (Frankfurt, Helsinki, omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when the topology query errored — because it fell back to `infrastructureTopologyFixture` from `src/test/fixtures/`. That is a test-only file leaking into production via the production import tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no placeholder data — empty state when you don't know). Fix: drop the fixture fallback. On error → null → empty-state render. The mother shows the same empty state when its loader returns nothing; byte-identical. 2. JobsTable + JobDetail rendered a flat green-grid because the chroot was hitting `/api/v1/sovereign/jobs` which returns a minimal shape (no dependsOn, no parentId, no exec records). Mother's `/api/v1/deployments/{depId}/jobs` returns the rich shape from a per-deployment jobs.Store, which on the chroot starts empty (the mother's exportDeploymentToChild only ships the deployment record, not the jobs.Store contents). Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`. 
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per- deployment jobs.Store has 0 records: do a one-shot HelmRelease list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases — exported here, mirrors Watcher.SnapshotComponents without spinning up an informer), pass through snapshotsToSeeds + Bridge.SeedJobsFromInformerList. Subsequent calls read directly from the now-populated store and return rich Job records with dependsOn / parentId / status — exactly like the mother. useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses the same `/api/v1/deployments/{id}/jobs` URL as the mother. 3. HandleDeploymentImport now also loads the imported record into the in-memory deployments map immediately, so `/deployments/{id}/*` handlers don't need a pod restart's restoreFromStore to see the chroot-imported deployment. Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s JobDetail navigation was 404ing on the chroot because the link builder URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak") and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does not decode `%3A` inside path segments. The catalyst-api router saw the literal "%3A" and Store.GetJob's exact-match path missed. Two coupled fixes: 1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding, producing /jobs/install-keycloak (Traefik-safe) instead of /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already accepts both bare jobName and canonical id (see store.go:781-789). 2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so the URL param resolves regardless of which format the link emitted. Bump chart 1.4.58 → 1.4.59. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined CloudPage's topology query fired against /deployments/undefined/... on the chroot (URL is /cloud, no deploymentId path segment), so the page showed "Couldn't load architecture" with all node counts at 0/0. Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the JWT cookie's deployment_id claim via /api/v1/sovereign/self, falling back from URL params. Topology query also gates on `!!deploymentId` so it doesn't waste a 404 round-trip during cookie resolution. Bump chart 1.4.60 → 1.4.61. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): single chrome — no frame in frame, no mother handover banner Two visible bleed-throughs from the mother's wizard UX onto the chroot Sovereign Console at console.<sov-fqdn>: 1. **Two stacked headers + sidebar inside sidebar** ("frame in frame"). SovereignConsoleLayout rendered its own sidebar+header AND the page inside rendered PortalShell which rendered ANOTHER header (its sidebar was already skipped for chroot per a prior fix). User saw two horizontal title bars stacked. Resolution: SovereignConsoleLayout becomes auth-only on the chroot. It runs the cookie/OIDC auth gate + RequiredActionsModal, then renders <Outlet/> with NO chrome. PortalShell is now the single chrome owner on both surfaces: - Mother (/sovereign/provision/$id): renders Sidebar with /provision/$id/X URLs + its header. - Chroot (console.<sov-fqdn>): renders SovereignSidebar with clean /X URLs + the same header. One sidebar, one header, byte-identical to mother layout. 2. **"✓ Sovereign is ready — Redirecting to your Sovereign console" banner on /apps.** This is the mother's wizard celebration that tells the operator "you can now jump to your new Sovereign". 
On the chroot the operator IS already on the Sovereign Console; the banner bleeds through because the imported deployment record carries the mother's handover-ready event in its history. Resolution: AppsPage gates the banner, the toast, and the auto-redirect timer on `!isSovereignMode`. Chroot stays clean. Bump chart 1.4.62 → 1.4.63. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page Three chroot-only pages bypassed PortalShell entirely. After SovereignConsoleLayout went auth-only in #1057, they rendered full-bleed with no sidebar / no header — visible look-and-feel break. /settings/marketplace → MarketplaceSettings (wrapped in PortalShell) /parent-domains → ParentDomainsPage (wrapped in PortalShell) /catalog → CatalogAdminPage (deleted) Drop /catalog entirely per founder direction: a separate page just to flip a "publish to marketplace" boolean per app is the wrong shape. The natural place for that toggle is on each /apps card (future PR — needs HandleSovereignApps to join publish state from the SME catalog microservice). Removed: - /catalog route registration in router.tsx - 'Catalog' entry in SovereignSidebar's FLAT_NAV - CatalogAdminPage.tsx (525 lines) - 'catalog' from ActiveSection union + deriveActiveSection regex The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish on the SME catalog service is unaffected; it's exposed at marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future apps-card toggle will call it via the same path. Bump chart 1.4.64 → 1.4.65. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(apps): publish chip on each card — replaces deleted /catalog page Per founder direction: "if the catalog is just labeling an app to be shown in marketplace, why don't we do it through the apps?" — drop the standalone /catalog page (#1058), put the publish toggle on each /apps card. 
Backend (catalyst-api): - New file sme_catalog_client.go — best-effort client for the in-cluster SME catalog microservice at http://catalog.sme.svc.cluster.local:8082. 30s response cache, 1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier not deployed on this Sovereign — common when marketplace.enabled is false). - HandleSovereignApps decorates each app with `marketplacePublished` *bool joined by slug from the SME catalog. nil ⇒ slug not in SME catalog (bootstrap component, or marketplace not deployed) ⇒ FE suppresses the chip. - New handler HandleSovereignAppPublish at PATCH /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}. Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME catalog. Surfaces upstream status verbatim. Invalidates the cache so the next /apps poll reflects the change immediately. Frontend (AppsPage): - liveAppsQuery returns { statusById, publishedBySlug } instead of the bare status map. - Each AppCard with a non-null marketplacePublished renders a PUBLISHED / UNPUBLISHED chip alongside the status chip. Click → PATCH → optimistic refetch via React Query. - Bootstrap components and apps not in the SME catalog have nil → no chip (correct: nothing to toggle). - Cards with marketplace.enabled=false render no chips at all (SME catalog unreachable → nil for every slug). Bump chart 1.4.66 → 1.4.67. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code Audit triggered by founder asking if PRs #1051..#1059 reach NEW Sovereigns or just my manual `kubectl set image` patches on omantel. Answer was: nothing reached anyone except omantel via manual patches. Both contabo AND every fresh Sovereign would install :2122fb8 — the SHA frozen at PR #1040's last manual chart-touch on May 6 morning. 
Root cause: - chart/templates/api-deployment.yaml + ui-deployment.yaml carry LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"), not Helm-templated `{{ .Values.images.catalystApi.tag }}`. - catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag on every push — but no template reads from it. Dead code. - contabo's catalyst-platform Flux Kustomization at ./products/catalyst/chart/templates applies these as raw manifests. - Sovereigns Helm-install the same chart; Helm passes the literal through unchanged. - Both ended up frozen at whatever literal was committed at the last manual chart-touching PR. Fix: 1. CI's deploy step now bumps both the literal SHAs in the two template files AND the unused-but-kept-for-SME-services values.yaml. Sed-patches the literal directly so contabo's Kustomize path keeps working. 2. The commit step adds the two templates to the staged set alongside values.yaml, so every "deploy: update catalyst images to <sha>" commit propagates to contabo (10-min reconcile) AND Sovereigns (next OCI chart publish via blueprint-release). 3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with the latest literal (currently :8361df4) gets republished and pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml. Why drop the "freeze contabo" intent of the previous comment: The previous comment said contabo auto-roll on every PR was bad because PR #975's image broke contabo (k8scache startup loop). Solution there is: fix the bug in the code, not freeze contabo. Freezing masked real divergence — the reason the founder caught this is that manual omantel patches were the only thing keeping omantel current while contabo + every other fresh Sovereign quietly ran 9 PRs behind. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane Founder asked: "make the real-time k8s information propagation development reused — find the reverted prior work and implement the final working one." History: - PR #358 (May 1) shipped the full informer + SSE data plane: internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics} + handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) + UI hook lib/useK8sStream.ts + widget useK8sCacheStream. - PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream with kinds=namespace,node,pv,pod,deployment,...,server.hcloud, volume.hcloud and `&initialState=1` for live cloud-graph deltas. - PR #981 hotfix dropped the synchronous discovery probe in factory.go:AddCluster (it was calling core.Discovery().ServerResourcesForGroupVersion(gv) with NO context timeout — on a kubeconfig pointing at a decommissioned otech the call hung the catalyst-api startup for minutes per dead cluster). After #981 the discovery-probe surgery was clean — no follow-up broke. The data plane code stayed in the codebase. The remaining gap was operational, not architectural: On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>), the catalyst-api boots without a posted-back kubeconfig in /var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns [] → factory has zero clusters → every /api/v1/sovereigns/{depId}/k8s/* request 404s with "sovereign \"...\" not registered". The architecture-graph in-flight call confirmed live on omantel.biz today. Fix in this PR: 1. 
**k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN env is set (chroot mode), build a ClusterRef with id resolved from CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by scanning /var/lib/catalyst/deployments/*.json for a record matching the FQDN (mirrors HandleSovereignSelf's store-fallback path for consistency). DynamicClient + CoreClient built from rest.InClusterConfig(). Append to the cluster list. Mother behavior unchanged — SOVEREIGN_FQDN unset → branch is a no-op. 2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide get/list/watch on every kind in the k8scache registry (pods, deployments, statefulsets, daemonsets, replicasets, services, endpointslices, ingresses, configmaps, secrets, persistentvolumes, persistentvolumeclaims, hcloud.crossplane.io managed resources, vclusters), plus authorization.k8s.io/subjectaccessreviews so the per-event SAR gating in the SSE handler doesn't 403 silently. 3. Bump chart 1.4.70 → 1.4.71. The discovery-probe failure mode that triggered the original revert (synchronous ServerResourcesForGroupVersion blocking startup) does NOT recur here — InClusterConfig() returns immediately, NewForConfig is lazy, and the first network call happens inside the informer goroutine after Start, off the boot critical path. Mother-side LoadClustersFromDir behavior is untouched (no probe, just kubeconfig file parsing as it has been since #981). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cloud): + More popover escapes overflow clip + graph centers via gravity force Two cloud-page bugs caught live on omantel.biz: (1) /cloud?view=list&kind=clusters → +More popover non-functional. The popover renders at its anchor coords but pointer events pass through to the toolbar below it. Diagnosis: .cloud-page-toolbar > [data-testid="cloud-kind-chips"] { overflow-x: auto; } Per CSS spec, when one overflow axis is non-visible, the OTHER axis becomes auto/hidden too. 
So overflow-x:auto on the chips strip silently sets overflow-y:auto, which clips the absolutely-positioned popover that hangs DOWN from the +More button.
Fix: render the popover via React.createPortal to document.body so it's outside any overflow ancestor. Position via fixed coordinates computed from the +More button's getBoundingClientRect, recomputed on resize/scroll. Click-outside dismissal updated to check both wrapper AND portaled popover.
(2) /cloud?view=graph → bubbles drift to canvas edges, leaving the centre empty until enough nodes (e.g. worker nodes) are added to anchor things via link tension.
Two coupled root causes:
  a) `forceCenter` only adjusts the centroid — it shifts ALL nodes uniformly so their average sits at (cx, cy). It does NOT pull individual nodes inward. With small node counts and high charge repulsion (-160 for ≤50 nodes), nothing opposes outward drift.
  b) `makeForceBound` was a HARD clamp: `if (n.x < minX) n.x = minX`. Nodes that hit the wall get arrested with their velocity preserved on the perpendicular axis but no inward impulse → they slide along the wall and stack at corners. The simulation never relaxes back to the centre.
Fix:
  a) Add forceX(cx) + forceY(cy) with `centerGravity` strength per node-count tier (0.08 for ≤50, scaling down with larger graphs where link tension is sufficient). This pulls every individual node toward the centre proportional to its offset.
  b) Replace the hard clamp with an elastic bounce: when a node hits the boundary, reverse its velocity component (×0.4 damping) instead of zeroing it. Energy returns to the system, the simulation actually relaxes.
Bump chart 1.4.72 → 1.4.73.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(cloud): expose all live K8s kinds in +More popover + chip counts + tighter graph centering
Founder feedback (after PR #1062 lit up the data plane):
1. The +More popover was missing pods, deployments, statefulsets, daemonsets, configmaps, secrets, namespaces, etc. — it only carried the 6 placeholder kinds the legacy topology API knew about.
2. Several chips (Services, Ingresses, Storage Classes) showed "—" for count even though the data IS in the live cluster (visible in the graph view).
3. The graph view still pushed bubbles to canvas edges; only adding worker nodes brought things back. The previous gravity tuning wasn't strong enough for ~300 nodes.
This PR addresses all three.
(1) Eleven new K8s-backed list pages exposed in +More: Pods, Deployments, StatefulSets, DaemonSets, ReplicaSets, ConfigMaps, Secrets, Namespaces, Nodes, PersistentVolumes, EndpointSlices. Plus replaced the placeholder Services and Ingresses pages with live K8s tables. All built on a new generic K8sListPage that subscribes to /api/v1/sovereigns/{depId}/k8s/stream (same SSE channel the architecture-graph already uses) and renders a typed-column table per kind. Columns are declared once per kind in kindsPages.tsx; the rendering is uniform so adding a kind is a ~12-line wrapper.
(2) CloudPage.kindCounts now folds the live K8s snapshot into the chip-count map. KIND_TO_REGISTRY in kinds.ts maps each chip id to the registry kind name (pods → 'pod' etc). Counts that came from null (data not available) flip to live counts the moment the SSE stream's initialState=1 arrives.
(3) GraphCanvas physics retuned for live-data scale:
  - centerGravity: 0.08→0.18 for ≤50 nodes, 0.06→0.16 for ≤200, 0.04→0.14 for ≤1000, 0.03→0.10 for ≤5000, 0.02→0.08 for >5000. The forceX/forceY pulls every individual node toward (cx,cy) proportional to its offset — 2-3× stronger than the original tuning so the canvas centre stays populated.
  - Charge softened: -160→-90 for ≤50 nodes, scaled down through every tier. The previous values were calibrated against a ~20-node topology stub; live data delivers 10-50× more nodes per Sovereign so charge needs to relax proportionally.
Bump chart 1.4.74 → 1.4.75.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
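The retuned centre-gravity tiers described in the commit above can be sketched as a small lookup. This is an illustrative sketch only: the tier boundaries and strengths are the ones quoted in the message, but `GravityTier`, `GRAVITY_TIERS`, `centerGravityFor` and `applyCenterPull` are hypothetical names, not the actual GraphCanvas code. The per-tick pull is written out by hand on the assumption that it behaves like d3-force's `forceX`/`forceY` (velocity nudged by offset × strength × alpha).

```typescript
// Hypothetical names throughout; only the numeric tiers come from the commit message.
interface GravityTier {
  maxNodes: number;      // inclusive upper bound of the tier
  centerGravity: number; // strength used for the per-node pull toward (cx, cy)
}

const GRAVITY_TIERS: readonly GravityTier[] = [
  { maxNodes: 50, centerGravity: 0.18 },
  { maxNodes: 200, centerGravity: 0.16 },
  { maxNodes: 1000, centerGravity: 0.14 },
  { maxNodes: 5000, centerGravity: 0.10 },
  { maxNodes: Infinity, centerGravity: 0.08 },
];

// Pick the strength for a graph of the given size.
function centerGravityFor(nodeCount: number): number {
  for (const tier of GRAVITY_TIERS) {
    if (nodeCount <= tier.maxNodes) return tier.centerGravity;
  }
  return 0.08; // unreachable in practice: the last tier is unbounded
}

interface GravityNode {
  x: number;
  y: number;
  vx: number;
  vy: number;
}

// One simulation tick of the centre pull: each node's velocity is nudged
// toward (cx, cy) in proportion to its own offset, unlike a centroid shift.
function applyCenterPull(
  nodes: GravityNode[],
  cx: number,
  cy: number,
  alpha: number,
): void {
  const strength = centerGravityFor(nodes.length);
  for (const n of nodes) {
    n.vx += (cx - n.x) * strength * alpha;
    n.vy += (cy - n.y) * strength * alpha;
  }
}
```

Keeping the tiers in one table means a retune like this one touches data rather than control flow.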
||
|
|
167d09348e
|
fix(cloud): +More popover escapes overflow clip + graph centers via gravity force (#1063)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56
PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers, HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology) but left four route registrations in cmd/api/main.go that still referenced those handler methods. The catalyst-api build for the merged revert (run 25439549879) failed with:
    cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
    cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
    cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
    cmd/api/main.go:693:42: h.HandleSovereignTopology undefined
That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never published — only the UI image rolled. Result: omantel.biz catalyst-api pod stuck in ImagePullBackOff.
Drop the four route registrations. Same baby, new address — the chroot Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/* endpoints.
Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother URL (the chroot resolves deploymentId from the cookie and the mother-side topology handler serves byte-identical data once cutover-import has persisted the deployment record on the Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere
Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster Kustomization version pin to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign
The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api binary as the mother.
When that binary runs ON the Sovereign cluster (catalyst-system namespace on the Sovereign itself), there is no posted-back kubeconfig — the catalyst-api IS in the cluster it needs to talk to, and rest.InClusterConfig() returns the right credentials. Without this, every endpoint that needs the Sovereign-side dynamic client returned 503 with "sovereign cluster kubeconfig not yet posted back" — including ListUserAccess (/users page), CreateUserAccess, infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users rendered "list user-access: HTTP 503" because the Sovereign-side catalyst-api was looking for a kubeconfig that doesn't exist on the chroot side of the cutover boundary. Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api deployment by the chart) matches dep.Request.SovereignFQDN. On the mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot, SOVEREIGN_FQDN matches the only deployment served (its own) → use in-cluster. Same fallback applied to tryDynamicClientLocked (loaderInputFor's best-effort live-source client) so /infrastructure/topology and the /cloud graph render with live data on the chroot too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(user-access): empty list when CRD absent + RBAC for chroot Two coupled fixes for the /users page on chroot Sovereign Console: 1. catalyst-api-cutover-driver ClusterRole: grant read/write on useraccesses.access.openova.io. The Sovereign chroot's catalyst-api uses the in-cluster ServiceAccount (per PR #1052). The list call was returning 403 from the apiserver because the SA had no rule covering this CRD. 2. ListUserAccess: return 200 with empty items when the CRD itself is not installed (apierrors.IsNotFound). The access.openova.io CRD ships via a separate blueprint that may not yet be installed on a fresh Sovereign — the page should render its empty state, not a 500 toast. 
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the in-cluster client path: list call surfaced first as 403 (RBAC), then as 500 "server could not find the requested resource" (CRD absent). Both now resolve to a 200 + []. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint Two parallel-baby paths still made the chroot diverge from the mother on /cloud and /jobs/{jobId}. Both now ship one path that serves byte-identical data on both surfaces. 1. CloudPage rendered fictional topology (Frankfurt, Helsinki, omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when the topology query errored — because it fell back to `infrastructureTopologyFixture` from `src/test/fixtures/`. That is a test-only file leaking into production via the production import tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no placeholder data — empty state when you don't know). Fix: drop the fixture fallback. On error → null → empty-state render. The mother shows the same empty state when its loader returns nothing; byte-identical. 2. JobsTable + JobDetail rendered a flat green-grid because the chroot was hitting `/api/v1/sovereign/jobs` which returns a minimal shape (no dependsOn, no parentId, no exec records). Mother's `/api/v1/deployments/{depId}/jobs` returns the rich shape from a per-deployment jobs.Store, which on the chroot starts empty (the mother's exportDeploymentToChild only ships the deployment record, not the jobs.Store contents). Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`. 
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per- deployment jobs.Store has 0 records: do a one-shot HelmRelease list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases — exported here, mirrors Watcher.SnapshotComponents without spinning up an informer), pass through snapshotsToSeeds + Bridge.SeedJobsFromInformerList. Subsequent calls read directly from the now-populated store and return rich Job records with dependsOn / parentId / status — exactly like the mother. useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses the same `/api/v1/deployments/{id}/jobs` URL as the mother. 3. HandleDeploymentImport now also loads the imported record into the in-memory deployments map immediately, so `/deployments/{id}/*` handlers don't need a pod restart's restoreFromStore to see the chroot-imported deployment. Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s JobDetail navigation was 404ing on the chroot because the link builder URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak") and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does not decode `%3A` inside path segments. The catalyst-api router saw the literal "%3A" and Store.GetJob's exact-match path missed. Two coupled fixes: 1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding, producing /jobs/install-keycloak (Traefik-safe) instead of /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already accepts both bare jobName and canonical id (see store.go:781-789). 2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so the URL param resolves regardless of which format the link emitted. Bump chart 1.4.58 → 1.4.59. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined CloudPage's topology query fired against /deployments/undefined/... on the chroot (URL is /cloud, no deploymentId path segment), so the page showed "Couldn't load architecture" with all node counts at 0/0. Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the JWT cookie's deployment_id claim via /api/v1/sovereign/self, falling back from URL params. Topology query also gates on `!!deploymentId` so it doesn't waste a 404 round-trip during cookie resolution. Bump chart 1.4.60 → 1.4.61. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): single chrome — no frame in frame, no mother handover banner Two visible bleed-throughs from the mother's wizard UX onto the chroot Sovereign Console at console.<sov-fqdn>: 1. **Two stacked headers + sidebar inside sidebar** ("frame in frame"). SovereignConsoleLayout rendered its own sidebar+header AND the page inside rendered PortalShell which rendered ANOTHER header (its sidebar was already skipped for chroot per a prior fix). User saw two horizontal title bars stacked. Resolution: SovereignConsoleLayout becomes auth-only on the chroot. It runs the cookie/OIDC auth gate + RequiredActionsModal, then renders <Outlet/> with NO chrome. PortalShell is now the single chrome owner on both surfaces: - Mother (/sovereign/provision/$id): renders Sidebar with /provision/$id/X URLs + its header. - Chroot (console.<sov-fqdn>): renders SovereignSidebar with clean /X URLs + the same header. One sidebar, one header, byte-identical to mother layout. 2. **"✓ Sovereign is ready — Redirecting to your Sovereign console" banner on /apps.** This is the mother's wizard celebration that tells the operator "you can now jump to your new Sovereign". 
On the chroot the operator IS already on the Sovereign Console; the banner bleeds through because the imported deployment record carries the mother's handover-ready event in its history. Resolution: AppsPage gates the banner, the toast, and the auto-redirect timer on `!isSovereignMode`. Chroot stays clean. Bump chart 1.4.62 → 1.4.63. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page Three chroot-only pages bypassed PortalShell entirely. After SovereignConsoleLayout went auth-only in #1057, they rendered full-bleed with no sidebar / no header — visible look-and-feel break. /settings/marketplace → MarketplaceSettings (wrapped in PortalShell) /parent-domains → ParentDomainsPage (wrapped in PortalShell) /catalog → CatalogAdminPage (deleted) Drop /catalog entirely per founder direction: a separate page just to flip a "publish to marketplace" boolean per app is the wrong shape. The natural place for that toggle is on each /apps card (future PR — needs HandleSovereignApps to join publish state from the SME catalog microservice). Removed: - /catalog route registration in router.tsx - 'Catalog' entry in SovereignSidebar's FLAT_NAV - CatalogAdminPage.tsx (525 lines) - 'catalog' from ActiveSection union + deriveActiveSection regex The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish on the SME catalog service is unaffected; it's exposed at marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future apps-card toggle will call it via the same path. Bump chart 1.4.64 → 1.4.65. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(apps): publish chip on each card — replaces deleted /catalog page Per founder direction: "if the catalog is just labeling an app to be shown in marketplace, why don't we do it through the apps?" — drop the standalone /catalog page (#1058), put the publish toggle on each /apps card. 
Backend (catalyst-api): - New file sme_catalog_client.go — best-effort client for the in-cluster SME catalog microservice at http://catalog.sme.svc.cluster.local:8082. 30s response cache, 1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier not deployed on this Sovereign — common when marketplace.enabled is false). - HandleSovereignApps decorates each app with `marketplacePublished` *bool joined by slug from the SME catalog. nil ⇒ slug not in SME catalog (bootstrap component, or marketplace not deployed) ⇒ FE suppresses the chip. - New handler HandleSovereignAppPublish at PATCH /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}. Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME catalog. Surfaces upstream status verbatim. Invalidates the cache so the next /apps poll reflects the change immediately. Frontend (AppsPage): - liveAppsQuery returns { statusById, publishedBySlug } instead of the bare status map. - Each AppCard with a non-null marketplacePublished renders a PUBLISHED / UNPUBLISHED chip alongside the status chip. Click → PATCH → optimistic refetch via React Query. - Bootstrap components and apps not in the SME catalog have nil → no chip (correct: nothing to toggle). - Cards with marketplace.enabled=false render no chips at all (SME catalog unreachable → nil for every slug). Bump chart 1.4.66 → 1.4.67. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code Audit triggered by founder asking if PRs #1051..#1059 reach NEW Sovereigns or just my manual `kubectl set image` patches on omantel. Answer was: nothing reached anyone except omantel via manual patches. Both contabo AND every fresh Sovereign would install :2122fb8 — the SHA frozen at PR #1040's last manual chart-touch on May 6 morning. 
Root cause: - chart/templates/api-deployment.yaml + ui-deployment.yaml carry LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"), not Helm-templated `{{ .Values.images.catalystApi.tag }}`. - catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag on every push — but no template reads from it. Dead code. - contabo's catalyst-platform Flux Kustomization at ./products/catalyst/chart/templates applies these as raw manifests. - Sovereigns Helm-install the same chart; Helm passes the literal through unchanged. - Both ended up frozen at whatever literal was committed at the last manual chart-touching PR. Fix: 1. CI's deploy step now bumps both the literal SHAs in the two template files AND the unused-but-kept-for-SME-services values.yaml. Sed-patches the literal directly so contabo's Kustomize path keeps working. 2. The commit step adds the two templates to the staged set alongside values.yaml, so every "deploy: update catalyst images to <sha>" commit propagates to contabo (10-min reconcile) AND Sovereigns (next OCI chart publish via blueprint-release). 3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with the latest literal (currently :8361df4) gets republished and pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml. Why drop the "freeze contabo" intent of the previous comment: The previous comment said contabo auto-roll on every PR was bad because PR #975's image broke contabo (k8scache startup loop). Solution there is: fix the bug in the code, not freeze contabo. Freezing masked real divergence — the reason the founder caught this is that manual omantel patches were the only thing keeping omantel current while contabo + every other fresh Sovereign quietly ran 9 PRs behind. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane Founder asked: "make the real-time k8s information propagation development reused — find the reverted prior work and implement the final working one." History: - PR #358 (May 1) shipped the full informer + SSE data plane: internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics} + handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) + UI hook lib/useK8sStream.ts + widget useK8sCacheStream. - PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream with kinds=namespace,node,pv,pod,deployment,...,server.hcloud, volume.hcloud and `&initialState=1` for live cloud-graph deltas. - PR #981 hotfix dropped the synchronous discovery probe in factory.go:AddCluster (it was calling core.Discovery().ServerResourcesForGroupVersion(gv) with NO context timeout — on a kubeconfig pointing at a decommissioned otech the call hung the catalyst-api startup for minutes per dead cluster). After #981 the discovery-probe surgery was clean — no follow-up broke. The data plane code stayed in the codebase. The remaining gap was operational, not architectural: On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>), the catalyst-api boots without a posted-back kubeconfig in /var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns [] → factory has zero clusters → every /api/v1/sovereigns/{depId}/k8s/* request 404s with "sovereign \"...\" not registered". The architecture-graph in-flight call confirmed live on omantel.biz today. Fix in this PR: 1. 
**k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN env is set (chroot mode), build a ClusterRef with id resolved from CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by scanning /var/lib/catalyst/deployments/*.json for a record matching the FQDN (mirrors HandleSovereignSelf's store-fallback path for consistency). DynamicClient + CoreClient built from rest.InClusterConfig(). Append to the cluster list. Mother behavior unchanged — SOVEREIGN_FQDN unset → branch is a no-op. 2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide get/list/watch on every kind in the k8scache registry (pods, deployments, statefulsets, daemonsets, replicasets, services, endpointslices, ingresses, configmaps, secrets, persistentvolumes, persistentvolumeclaims, hcloud.crossplane.io managed resources, vclusters), plus authorization.k8s.io/subjectaccessreviews so the per-event SAR gating in the SSE handler doesn't 403 silently. 3. Bump chart 1.4.70 → 1.4.71. The discovery-probe failure mode that triggered the original revert (synchronous ServerResourcesForGroupVersion blocking startup) does NOT recur here — InClusterConfig() returns immediately, NewForConfig is lazy, and the first network call happens inside the informer goroutine after Start, off the boot critical path. Mother-side LoadClustersFromDir behavior is untouched (no probe, just kubeconfig file parsing as it has been since #981). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cloud): + More popover escapes overflow clip + graph centers via gravity force Two cloud-page bugs caught live on omantel.biz: (1) /cloud?view=list&kind=clusters → +More popover non-functional. The popover renders at its anchor coords but pointer events pass through to the toolbar below it. Diagnosis: .cloud-page-toolbar > [data-testid="cloud-kind-chips"] { overflow-x: auto; } Per CSS spec, when one overflow axis is non-visible, the OTHER axis becomes auto/hidden too. 
So overflow-x:auto on the chips strip silently sets overflow-y:auto, which clips the absolutely-positioned popover that hangs DOWN from the +More button.
Fix: render the popover via React.createPortal to document.body so it's outside any overflow ancestor. Position via fixed coordinates computed from the +More button's getBoundingClientRect, recomputed on resize/scroll. Click-outside dismissal updated to check both wrapper AND portaled popover.
(2) /cloud?view=graph → bubbles drift to canvas edges, leaving the centre empty until enough nodes (e.g. worker nodes) are added to anchor things via link tension.
Two coupled root causes:
  a) `forceCenter` only adjusts the centroid — it shifts ALL nodes uniformly so their average sits at (cx, cy). It does NOT pull individual nodes inward. With small node counts and high charge repulsion (-160 for ≤50 nodes), nothing opposes outward drift.
  b) `makeForceBound` was a HARD clamp: `if (n.x < minX) n.x = minX`. Nodes that hit the wall get arrested with their velocity preserved on the perpendicular axis but no inward impulse → they slide along the wall and stack at corners. The simulation never relaxes back to the centre.
Fix:
  a) Add forceX(cx) + forceY(cy) with `centerGravity` strength per node-count tier (0.08 for ≤50, scaling down with larger graphs where link tension is sufficient). This pulls every individual node toward the centre proportional to its offset.
  b) Replace the hard clamp with an elastic bounce: when a node hits the boundary, reverse its velocity component (×0.4 damping) instead of zeroing it. Energy returns to the system, the simulation actually relaxes.
Bump chart 1.4.72 → 1.4.73.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
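The elastic-bounce boundary described in fix (b) of the commit above could look roughly like the sketch below. It is not the project's `makeForceBound`: `SimNode`, `Bounds`, `bounceAxis` and `applyElasticBounds` are hypothetical names, and only the 0.4 damping factor comes from the commit message. One further assumption is that the reflected velocity is always pointed inward (via `Math.abs`), so a node that is already moving back into the canvas is not flipped outward again.

```typescript
// Hypothetical sketch; only the 0.4 damping factor is taken from the commit message.
interface SimNode {
  x: number;
  y: number;
  vx: number;
  vy: number;
}

interface Bounds {
  minX: number;
  maxX: number;
  minY: number;
  maxY: number;
}

const BOUNCE_DAMPING = 0.4;

// Resolve one axis: snap the position back onto the wall and return a damped,
// inward-pointing velocity instead of leaving the node pinned with no impulse.
function bounceAxis(
  pos: number,
  vel: number,
  min: number,
  max: number,
): [number, number] {
  if (pos < min) return [min, Math.abs(vel) * BOUNCE_DAMPING];
  if (pos > max) return [max, -Math.abs(vel) * BOUNCE_DAMPING];
  return [pos, vel];
}

// Apply the bounce to every node; intended to run once per simulation tick.
function applyElasticBounds(nodes: SimNode[], b: Bounds): void {
  for (const n of nodes) {
    [n.x, n.vx] = bounceAxis(n.x, n.vx, b.minX, b.maxX);
    [n.y, n.vy] = bounceAxis(n.y, n.vy, b.minY, b.maxY);
  }
}
```

Compared with the hard clamp quoted in the message (`if (n.x < minX) n.x = minX`), the only behavioural difference is that the velocity on the clamped axis is reflected and damped, which is what lets nodes leave the wall on later ticks.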
||
|
|
2ad31b4481
|
feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes real-time data plane (#1062)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56
PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers, HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology) but left four route registrations in cmd/api/main.go that still referenced those handler methods. The catalyst-api build for the merged revert (run 25439549879) failed with:
    cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
    cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
    cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
    cmd/api/main.go:693:42: h.HandleSovereignTopology undefined
That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never published — only the UI image rolled. Result: omantel.biz catalyst-api pod stuck in ImagePullBackOff.
Drop the four route registrations. Same baby, new address — the chroot Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/* endpoints.
Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother URL (the chroot resolves deploymentId from the cookie and the mother-side topology handler serves byte-identical data once cutover-import has persisted the deployment record on the Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere
Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster Kustomization version pin to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign
The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api binary as the mother.
When that binary runs ON the Sovereign cluster (catalyst-system namespace on the Sovereign itself), there is no posted-back kubeconfig — the catalyst-api IS in the cluster it needs to talk to, and rest.InClusterConfig() returns the right credentials. Without this, every endpoint that needs the Sovereign-side dynamic client returned 503 with "sovereign cluster kubeconfig not yet posted back" — including ListUserAccess (/users page), CreateUserAccess, infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users rendered "list user-access: HTTP 503" because the Sovereign-side catalyst-api was looking for a kubeconfig that doesn't exist on the chroot side of the cutover boundary. Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api deployment by the chart) matches dep.Request.SovereignFQDN. On the mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot, SOVEREIGN_FQDN matches the only deployment served (its own) → use in-cluster. Same fallback applied to tryDynamicClientLocked (loaderInputFor's best-effort live-source client) so /infrastructure/topology and the /cloud graph render with live data on the chroot too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(user-access): empty list when CRD absent + RBAC for chroot Two coupled fixes for the /users page on chroot Sovereign Console: 1. catalyst-api-cutover-driver ClusterRole: grant read/write on useraccesses.access.openova.io. The Sovereign chroot's catalyst-api uses the in-cluster ServiceAccount (per PR #1052). The list call was returning 403 from the apiserver because the SA had no rule covering this CRD. 2. ListUserAccess: return 200 with empty items when the CRD itself is not installed (apierrors.IsNotFound). The access.openova.io CRD ships via a separate blueprint that may not yet be installed on a fresh Sovereign — the page should render its empty state, not a 500 toast. 
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the in-cluster client path: list call surfaced first as 403 (RBAC), then as 500 "server could not find the requested resource" (CRD absent). Both now resolve to a 200 + []. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint Two parallel-baby paths still made the chroot diverge from the mother on /cloud and /jobs/{jobId}. Both now ship one path that serves byte-identical data on both surfaces. 1. CloudPage rendered fictional topology (Frankfurt, Helsinki, omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when the topology query errored — because it fell back to `infrastructureTopologyFixture` from `src/test/fixtures/`. That is a test-only file leaking into production via the production import tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no placeholder data — empty state when you don't know). Fix: drop the fixture fallback. On error → null → empty-state render. The mother shows the same empty state when its loader returns nothing; byte-identical. 2. JobsTable + JobDetail rendered a flat green-grid because the chroot was hitting `/api/v1/sovereign/jobs` which returns a minimal shape (no dependsOn, no parentId, no exec records). Mother's `/api/v1/deployments/{depId}/jobs` returns the rich shape from a per-deployment jobs.Store, which on the chroot starts empty (the mother's exportDeploymentToChild only ships the deployment record, not the jobs.Store contents). Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`. 
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-deployment jobs.Store has 0 records: do a one-shot HelmRelease list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases — exported here, mirrors Watcher.SnapshotComponents without spinning up an informer), pass through snapshotsToSeeds + Bridge.SeedJobsFromInformerList. Subsequent calls read directly from the now-populated store and return rich Job records with dependsOn / parentId / status — exactly like the mother. useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses the same `/api/v1/deployments/{id}/jobs` URL as the mother. 3. HandleDeploymentImport now also loads the imported record into the in-memory deployments map immediately, so `/deployments/{id}/*` handlers don't need a pod restart's restoreFromStore to see the chroot-imported deployment. Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s JobDetail navigation was 404ing on the chroot because the link builder URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak") and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does not decode `%3A` inside path segments. The catalyst-api router saw the literal "%3A" and Store.GetJob's exact-match path missed. Two coupled fixes: 1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding, producing /jobs/install-keycloak (Traefik-safe) instead of /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already accepts both bare jobName and canonical id (see store.go:781-789). 2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so the URL param resolves regardless of which format the link emitted. Bump chart 1.4.58 → 1.4.59. 
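The two jobdetail fixes above (strip the "<deploymentId>:" prefix before encoding, and index jobs under both key formats) might look roughly like this. It is a sketch under assumptions, not the code of `useJobLinkBuilder` or `JobDetail.jobsById`: the helper names `bareJobName`, `buildJobLink` and `indexJobs`, and the `Job` shape, are hypothetical.

```typescript
// Hypothetical job shape: `id` is the canonical "<deploymentId>:<jobName>".
interface Job { id: string; jobName: string; }

// Drop the "<deploymentId>:" prefix so no colon needs percent-encoding.
function bareJobName(jobId: string, deploymentId: string): string {
  const prefix = `${deploymentId}:`;
  return jobId.startsWith(prefix) ? jobId.slice(prefix.length) : jobId;
}

// Build a link whose path segment contains only the bare job name.
function buildJobLink(jobId: string, deploymentId: string): string {
  return `/jobs/${encodeURIComponent(bareJobName(jobId, deploymentId))}`;
}

// Index each job under both its canonical id and its bare name, so a URL
// parameter in either format resolves to the same record.
function indexJobs(jobs: Job[]): Map<string, Job> {
  const byId = new Map<string, Job>();
  for (const job of jobs) {
    byId.set(job.id, job);
    byId.set(job.jobName, job);
  }
  return byId;
}
```

With the ids quoted in the commit message, `buildJobLink("69e73b3abe673840:install-keycloak", "69e73b3abe673840")` yields `/jobs/install-keycloak`, whereas encoding the canonical id directly would produce the `%3A` form that the proxy does not decode.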
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined CloudPage's topology query fired against /deployments/undefined/... on the chroot (URL is /cloud, no deploymentId path segment), so the page showed "Couldn't load architecture" with all node counts at 0/0. Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the JWT cookie's deployment_id claim via /api/v1/sovereign/self, falling back from URL params. Topology query also gates on `!!deploymentId` so it doesn't waste a 404 round-trip during cookie resolution. Bump chart 1.4.60 → 1.4.61. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): single chrome — no frame in frame, no mother handover banner Two visible bleed-throughs from the mother's wizard UX onto the chroot Sovereign Console at console.<sov-fqdn>: 1. **Two stacked headers + sidebar inside sidebar** ("frame in frame"). SovereignConsoleLayout rendered its own sidebar+header AND the page inside rendered PortalShell which rendered ANOTHER header (its sidebar was already skipped for chroot per a prior fix). User saw two horizontal title bars stacked. Resolution: SovereignConsoleLayout becomes auth-only on the chroot. It runs the cookie/OIDC auth gate + RequiredActionsModal, then renders <Outlet/> with NO chrome. PortalShell is now the single chrome owner on both surfaces: - Mother (/sovereign/provision/$id): renders Sidebar with /provision/$id/X URLs + its header. - Chroot (console.<sov-fqdn>): renders SovereignSidebar with clean /X URLs + the same header. One sidebar, one header, byte-identical to mother layout. 2. **"✓ Sovereign is ready — Redirecting to your Sovereign console" banner on /apps.** This is the mother's wizard celebration that tells the operator "you can now jump to your new Sovereign". 
On the chroot the operator IS already on the Sovereign Console; the banner bleeds through because the imported deployment record carries the mother's handover-ready event in its history. Resolution: AppsPage gates the banner, the toast, and the auto-redirect timer on `!isSovereignMode`. Chroot stays clean. Bump chart 1.4.62 → 1.4.63. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page Three chroot-only pages bypassed PortalShell entirely. After SovereignConsoleLayout went auth-only in #1057, they rendered full-bleed with no sidebar / no header — visible look-and-feel break. /settings/marketplace → MarketplaceSettings (wrapped in PortalShell) /parent-domains → ParentDomainsPage (wrapped in PortalShell) /catalog → CatalogAdminPage (deleted) Drop /catalog entirely per founder direction: a separate page just to flip a "publish to marketplace" boolean per app is the wrong shape. The natural place for that toggle is on each /apps card (future PR — needs HandleSovereignApps to join publish state from the SME catalog microservice). Removed: - /catalog route registration in router.tsx - 'Catalog' entry in SovereignSidebar's FLAT_NAV - CatalogAdminPage.tsx (525 lines) - 'catalog' from ActiveSection union + deriveActiveSection regex The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish on the SME catalog service is unaffected; it's exposed at marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future apps-card toggle will call it via the same path. Bump chart 1.4.64 → 1.4.65. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(apps): publish chip on each card — replaces deleted /catalog page Per founder direction: "if the catalog is just labeling an app to be shown in marketplace, why don't we do it through the apps?" — drop the standalone /catalog page (#1058), put the publish toggle on each /apps card. 
Backend (catalyst-api): - New file sme_catalog_client.go — best-effort client for the in-cluster SME catalog microservice at http://catalog.sme.svc.cluster.local:8082. 30s response cache, 1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier not deployed on this Sovereign — common when marketplace.enabled is false). - HandleSovereignApps decorates each app with `marketplacePublished` *bool joined by slug from the SME catalog. nil ⇒ slug not in SME catalog (bootstrap component, or marketplace not deployed) ⇒ FE suppresses the chip. - New handler HandleSovereignAppPublish at PATCH /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}. Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME catalog. Surfaces upstream status verbatim. Invalidates the cache so the next /apps poll reflects the change immediately. Frontend (AppsPage): - liveAppsQuery returns { statusById, publishedBySlug } instead of the bare status map. - Each AppCard with a non-null marketplacePublished renders a PUBLISHED / UNPUBLISHED chip alongside the status chip. Click → PATCH → optimistic refetch via React Query. - Bootstrap components and apps not in the SME catalog have nil → no chip (correct: nothing to toggle). - Cards with marketplace.enabled=false render no chips at all (SME catalog unreachable → nil for every slug). Bump chart 1.4.66 → 1.4.67. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code Audit triggered by founder asking if PRs #1051..#1059 reach NEW Sovereigns or just my manual `kubectl set image` patches on omantel. Answer was: nothing reached anyone except omantel via manual patches. Both contabo AND every fresh Sovereign would install :2122fb8 — the SHA frozen at PR #1040's last manual chart-touch on May 6 morning. 
Root cause: - chart/templates/api-deployment.yaml + ui-deployment.yaml carry LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"), not Helm-templated `{{ .Values.images.catalystApi.tag }}`. - catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag on every push — but no template reads from it. Dead code. - contabo's catalyst-platform Flux Kustomization at ./products/catalyst/chart/templates applies these as raw manifests. - Sovereigns Helm-install the same chart; Helm passes the literal through unchanged. - Both ended up frozen at whatever literal was committed at the last manual chart-touching PR. Fix: 1. CI's deploy step now bumps both the literal SHAs in the two template files AND the unused-but-kept-for-SME-services values.yaml. Sed-patches the literal directly so contabo's Kustomize path keeps working. 2. The commit step adds the two templates to the staged set alongside values.yaml, so every "deploy: update catalyst images to <sha>" commit propagates to contabo (10-min reconcile) AND Sovereigns (next OCI chart publish via blueprint-release). 3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with the latest literal (currently :8361df4) gets republished and pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml. Why drop the "freeze contabo" intent of the previous comment: The previous comment said contabo auto-roll on every PR was bad because PR #975's image broke contabo (k8scache startup loop). Solution there is: fix the bug in the code, not freeze contabo. Freezing masked real divergence — the reason the founder caught this is that manual omantel patches were the only thing keeping omantel current while contabo + every other fresh Sovereign quietly ran 9 PRs behind. 
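The literal-SHA bump described above is done with sed in CI; the same idea is sketched here in TypeScript purely for illustration. The function name `bumpImageTag`, the `escapeRegExp` helper, and the assumption that tags are 7 to 40 lowercase hex characters are not from the repository.

```typescript
// Escape regex metacharacters so the image name is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace the tag of every literal "<image>:<hex sha>" reference in a
// manifest with the new SHA, leaving all other text untouched.
function bumpImageTag(manifest: string, image: string, sha: string): string {
  const pattern = new RegExp(`(${escapeRegExp(image)}:)[0-9a-f]{7,40}`, "g");
  // A replacer function avoids ambiguity between "$1" and a SHA that
  // starts with a digit.
  return manifest.replace(pattern, (_match: string, prefix: string) => prefix + sha);
}

const before = 'image: "ghcr.io/openova-io/openova/catalyst-api:2122fb8"';
const after = bumpImageTag(before, "ghcr.io/openova-io/openova/catalyst-api", "8361df4");
console.log(after);
```

Patching the literal in the template file, rather than only a values.yaml key that no template reads, is what lets both the raw-manifest Kustomize path and the Helm install path pick up the new image.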
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(k8scache): chroot Sovereign self-registers via in-cluster config — completes the real-time data plane Founder asked: "make the real-time k8s information propagation development reused — find the reverted prior work and implement the final working one." History: - PR #358 (May 1) shipped the full informer + SSE data plane: internal/k8scache/{factory,kinds,sar,redact,snapshot,hydrate,metrics} + handler/k8s.go (HandleK8sList, HandleK8sStream, HandleK8sSync) + UI hook lib/useK8sStream.ts + widget useK8sCacheStream. - PR #978 (May 5) wired ArchitectureGraphPage to useK8sCacheStream with kinds=namespace,node,pv,pod,deployment,...,server.hcloud, volume.hcloud and `&initialState=1` for live cloud-graph deltas. - PR #981 hotfix dropped the synchronous discovery probe in factory.go:AddCluster (it was calling core.Discovery().ServerResourcesForGroupVersion(gv) with NO context timeout — on a kubeconfig pointing at a decommissioned otech the call hung the catalyst-api startup for minutes per dead cluster). After #981 the discovery-probe surgery was clean — no follow-up broke. The data plane code stayed in the codebase. The remaining gap was operational, not architectural: On a chroot Sovereign Console (post-cutover, console.<sov-fqdn>), the catalyst-api boots without a posted-back kubeconfig in /var/lib/catalyst/kubeconfigs/. LoadClustersFromDir returns [] → factory has zero clusters → every /api/v1/sovereigns/{depId}/k8s/* request 404s with "sovereign \"...\" not registered". The architecture-graph in-flight call confirmed live on omantel.biz today. Fix in this PR: 1. 
**k8scache.FactoryFromEnv chroot self-register**: when SOVEREIGN_FQDN env is set (chroot mode), build a ClusterRef with id resolved from CATALYST_SELF_DEPLOYMENT_ID env (orchestrator-stamped) or by scanning /var/lib/catalyst/deployments/*.json for a record matching the FQDN (mirrors HandleSovereignSelf's store-fallback path for consistency). DynamicClient + CoreClient built from rest.InClusterConfig(). Append to the cluster list. Mother behavior unchanged — SOVEREIGN_FQDN unset → branch is a no-op. 2. **ClusterRole catalyst-api-cutover-driver**: grant cluster-wide get/list/watch on every kind in the k8scache registry (pods, deployments, statefulsets, daemonsets, replicasets, services, endpointslices, ingresses, configmaps, secrets, persistentvolumes, persistentvolumeclaims, hcloud.crossplane.io managed resources, vclusters), plus authorization.k8s.io/subjectaccessreviews so the per-event SAR gating in the SSE handler doesn't 403 silently. 3. Bump chart 1.4.70 → 1.4.71. The discovery-probe failure mode that triggered the original revert (synchronous ServerResourcesForGroupVersion blocking startup) does NOT recur here — InClusterConfig() returns immediately, NewForConfig is lazy, and the first network call happens inside the informer goroutine after Start, off the boot critical path. Mother-side LoadClustersFromDir behavior is untouched (no probe, just kubeconfig file parsing as it has been since #981). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
eb6a3c1812
|
fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs — Sovereigns + contabo were frozen at :2122fb8 (#1060)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56 PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers, HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology) but left four route registrations in cmd/api/main.go that still referenced those handler methods. The catalyst-api build for the merged revert (run 25439549879) failed with: cmd/api/main.go:690:39: h.HandleSovereignUsers undefined cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined cmd/api/main.go:692:42: h.HandleSovereignSettings undefined cmd/api/main.go:693:42: h.HandleSovereignTopology undefined That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never published — only the UI image rolled. Result: omantel.biz catalyst-api pod stuck in ImagePullBackOff. Drop the four route registrations. Same baby, new address — the chroot Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/* endpoints. Also revert two more parallel-baby fragments still on main: - getHierarchicalInfrastructure mode-aware fetcher → single mother URL (the chroot resolves deploymentId from the cookie and the mother-side topology handler serves byte-identical data once cutover-import has persisted the deployment record on the Sovereign's local store) - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster Kustomization version pin to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api binary as the mother. 
When that binary runs ON the Sovereign cluster (catalyst-system namespace on the Sovereign itself), there is no posted-back kubeconfig — the catalyst-api IS in the cluster it needs to talk to, and rest.InClusterConfig() returns the right credentials. Without this, every endpoint that needs the Sovereign-side dynamic client returned 503 with "sovereign cluster kubeconfig not yet posted back" — including ListUserAccess (/users page), CreateUserAccess, infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users rendered "list user-access: HTTP 503" because the Sovereign-side catalyst-api was looking for a kubeconfig that doesn't exist on the chroot side of the cutover boundary. Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api deployment by the chart) matches dep.Request.SovereignFQDN. On the mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot, SOVEREIGN_FQDN matches the only deployment served (its own) → use in-cluster. Same fallback applied to tryDynamicClientLocked (loaderInputFor's best-effort live-source client) so /infrastructure/topology and the /cloud graph render with live data on the chroot too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(user-access): empty list when CRD absent + RBAC for chroot Two coupled fixes for the /users page on chroot Sovereign Console: 1. catalyst-api-cutover-driver ClusterRole: grant read/write on useraccesses.access.openova.io. The Sovereign chroot's catalyst-api uses the in-cluster ServiceAccount (per PR #1052). The list call was returning 403 from the apiserver because the SA had no rule covering this CRD. 2. ListUserAccess: return 200 with empty items when the CRD itself is not installed (apierrors.IsNotFound). The access.openova.io CRD ships via a separate blueprint that may not yet be installed on a fresh Sovereign — the page should render its empty state, not a 500 toast. 
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the in-cluster client path: list call surfaced first as 403 (RBAC), then as 500 "server could not find the requested resource" (CRD absent). Both now resolve to a 200 + []. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint Two parallel-baby paths still made the chroot diverge from the mother on /cloud and /jobs/{jobId}. Both now ship one path that serves byte-identical data on both surfaces. 1. CloudPage rendered fictional topology (Frankfurt, Helsinki, omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when the topology query errored — because it fell back to `infrastructureTopologyFixture` from `src/test/fixtures/`. That is a test-only file leaking into production via the production import tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no placeholder data — empty state when you don't know). Fix: drop the fixture fallback. On error → null → empty-state render. The mother shows the same empty state when its loader returns nothing; byte-identical. 2. JobsTable + JobDetail rendered a flat green-grid because the chroot was hitting `/api/v1/sovereign/jobs` which returns a minimal shape (no dependsOn, no parentId, no exec records). Mother's `/api/v1/deployments/{depId}/jobs` returns the rich shape from a per-deployment jobs.Store, which on the chroot starts empty (the mother's exportDeploymentToChild only ships the deployment record, not the jobs.Store contents). Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`. 
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-deployment jobs.Store has 0 records: do a one-shot HelmRelease list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases — exported here, mirrors Watcher.SnapshotComponents without spinning up an informer), pass through snapshotsToSeeds + Bridge.SeedJobsFromInformerList. Subsequent calls read directly from the now-populated store and return rich Job records with dependsOn / parentId / status — exactly like the mother. useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses the same `/api/v1/deployments/{id}/jobs` URL as the mother. 3. HandleDeploymentImport now also loads the imported record into the in-memory deployments map immediately, so `/deployments/{id}/*` handlers don't need a pod restart's restoreFromStore to see the chroot-imported deployment. Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s JobDetail navigation was 404ing on the chroot because the link builder URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak") and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does not decode `%3A` inside path segments. The catalyst-api router saw the literal "%3A" and Store.GetJob's exact-match path missed. Two coupled fixes: 1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding, producing /jobs/install-keycloak (Traefik-safe) instead of /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already accepts both bare jobName and canonical id (see store.go:781-789). 2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so the URL param resolves regardless of which format the link emitted. Bump chart 1.4.58 → 1.4.59. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined CloudPage's topology query fired against /deployments/undefined/... on the chroot (URL is /cloud, no deploymentId path segment), so the page showed "Couldn't load architecture" with all node counts at 0/0. Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the JWT cookie's deployment_id claim via /api/v1/sovereign/self, falling back from URL params. Topology query also gates on `!!deploymentId` so it doesn't waste a 404 round-trip during cookie resolution. Bump chart 1.4.60 → 1.4.61. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): single chrome — no frame in frame, no mother handover banner Two visible bleed-throughs from the mother's wizard UX onto the chroot Sovereign Console at console.<sov-fqdn>: 1. **Two stacked headers + sidebar inside sidebar** ("frame in frame"). SovereignConsoleLayout rendered its own sidebar+header AND the page inside rendered PortalShell which rendered ANOTHER header (its sidebar was already skipped for chroot per a prior fix). User saw two horizontal title bars stacked. Resolution: SovereignConsoleLayout becomes auth-only on the chroot. It runs the cookie/OIDC auth gate + RequiredActionsModal, then renders <Outlet/> with NO chrome. PortalShell is now the single chrome owner on both surfaces: - Mother (/sovereign/provision/$id): renders Sidebar with /provision/$id/X URLs + its header. - Chroot (console.<sov-fqdn>): renders SovereignSidebar with clean /X URLs + the same header. One sidebar, one header, byte-identical to mother layout. 2. **"✓ Sovereign is ready — Redirecting to your Sovereign console" banner on /apps.** This is the mother's wizard celebration that tells the operator "you can now jump to your new Sovereign". 
On the chroot the operator IS already on the Sovereign Console; the banner bleeds through because the imported deployment record carries the mother's handover-ready event in its history. Resolution: AppsPage gates the banner, the toast, and the auto-redirect timer on `!isSovereignMode`. Chroot stays clean. Bump chart 1.4.62 → 1.4.63. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page Three chroot-only pages bypassed PortalShell entirely. After SovereignConsoleLayout went auth-only in #1057, they rendered full-bleed with no sidebar / no header — visible look-and-feel break. /settings/marketplace → MarketplaceSettings (wrapped in PortalShell) /parent-domains → ParentDomainsPage (wrapped in PortalShell) /catalog → CatalogAdminPage (deleted) Drop /catalog entirely per founder direction: a separate page just to flip a "publish to marketplace" boolean per app is the wrong shape. The natural place for that toggle is on each /apps card (future PR — needs HandleSovereignApps to join publish state from the SME catalog microservice). Removed: - /catalog route registration in router.tsx - 'Catalog' entry in SovereignSidebar's FLAT_NAV - CatalogAdminPage.tsx (525 lines) - 'catalog' from ActiveSection union + deriveActiveSection regex The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish on the SME catalog service is unaffected; it's exposed at marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future apps-card toggle will call it via the same path. Bump chart 1.4.64 → 1.4.65. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(apps): publish chip on each card — replaces deleted /catalog page Per founder direction: "if the catalog is just labeling an app to be shown in marketplace, why don't we do it through the apps?" — drop the standalone /catalog page (#1058), put the publish toggle on each /apps card. 
Backend (catalyst-api): - New file sme_catalog_client.go — best-effort client for the in-cluster SME catalog microservice at http://catalog.sme.svc.cluster.local:8082. 30s response cache, 1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier not deployed on this Sovereign — common when marketplace.enabled is false). - HandleSovereignApps decorates each app with `marketplacePublished` *bool joined by slug from the SME catalog. nil ⇒ slug not in SME catalog (bootstrap component, or marketplace not deployed) ⇒ FE suppresses the chip. - New handler HandleSovereignAppPublish at PATCH /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}. Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME catalog. Surfaces upstream status verbatim. Invalidates the cache so the next /apps poll reflects the change immediately. Frontend (AppsPage): - liveAppsQuery returns { statusById, publishedBySlug } instead of the bare status map. - Each AppCard with a non-null marketplacePublished renders a PUBLISHED / UNPUBLISHED chip alongside the status chip. Click → PATCH → optimistic refetch via React Query. - Bootstrap components and apps not in the SME catalog have nil → no chip (correct: nothing to toggle). - Cards with marketplace.enabled=false render no chips at all (SME catalog unreachable → nil for every slug). Bump chart 1.4.66 → 1.4.67. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code Audit triggered by founder asking if PRs #1051..#1059 reach NEW Sovereigns or just my manual `kubectl set image` patches on omantel. Answer was: nothing reached anyone except omantel via manual patches. Both contabo AND every fresh Sovereign would install :2122fb8 — the SHA frozen at PR #1040's last manual chart-touch on May 6 morning. 
Root cause: - chart/templates/api-deployment.yaml + ui-deployment.yaml carry LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"), not Helm-templated `{{ .Values.images.catalystApi.tag }}`. - catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag on every push — but no template reads from it. Dead code. - contabo's catalyst-platform Flux Kustomization at ./products/catalyst/chart/templates applies these as raw manifests. - Sovereigns Helm-install the same chart; Helm passes the literal through unchanged. - Both ended up frozen at whatever literal was committed at the last manual chart-touching PR. Fix: 1. CI's deploy step now bumps both the literal SHAs in the two template files AND the unused-but-kept-for-SME-services values.yaml. Sed-patches the literal directly so contabo's Kustomize path keeps working. 2. The commit step adds the two templates to the staged set alongside values.yaml, so every "deploy: update catalyst images to <sha>" commit propagates to contabo (10-min reconcile) AND Sovereigns (next OCI chart publish via blueprint-release). 3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with the latest literal (currently :8361df4) gets republished and pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml. Why drop the "freeze contabo" intent of the previous comment: The previous comment said contabo auto-roll on every PR was bad because PR #975's image broke contabo (k8scache startup loop). Solution there is: fix the bug in the code, not freeze contabo. Freezing masked real divergence — the reason the founder caught this is that manual omantel patches were the only thing keeping omantel current while contabo + every other fresh Sovereign quietly ran 9 PRs behind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8361df46ac
|
feat(apps): publish chip on each card — replaces deleted /catalog page (#1059)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56
PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still referenced
those handler methods. The catalyst-api build for the merged revert
(run 25439549879) failed with:
  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined
That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published; only the UI image rolled. Result: omantel.biz catalyst-api pod
stuck in ImagePullBackOff.
Drop the four route registrations. Same baby, new address: the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via the
JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/* endpoints.
Also revert two more parallel-baby fragments still on main:
- getHierarchicalInfrastructure mode-aware fetcher → single mother URL
  (the chroot resolves deploymentId from the cookie and the mother-side
  topology handler serves byte-identical data once cutover-import has
  persisted the deployment record on the Sovereign's local store)
- CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere
Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign
The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig: the catalyst-api IS in the cluster it needs to
talk to, and rest.InClusterConfig() returns the right credentials.
Without this, every endpoint that needs the Sovereign-side dynamic client
returned 503 with "sovereign cluster kubeconfig not yet posted back",
including ListUserAccess (/users page), CreateUserAccess, infrastructure
CRUD, etc. Caught on omantel.biz 2026-05-06: /users rendered
"list user-access: HTTP 503" because the Sovereign-side catalyst-api was
looking for a kubeconfig that doesn't exist on the chroot side of the
cutover boundary.
Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the mother,
SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot, SOVEREIGN_FQDN
matches the only deployment served (its own) → use in-cluster.
Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the /cloud
graph render with live data on the chroot too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(user-access): empty list when CRD absent + RBAC for chroot
Two coupled fixes for the /users page on chroot Sovereign Console:
1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call was
   returning 403 from the apiserver because the SA had no rule covering
   this CRD.
2. ListUserAccess: return 200 with empty items when the CRD itself is not
   installed (apierrors.IsNotFound). The access.openova.io CRD ships via a
   separate blueprint that may not yet be installed on a fresh Sovereign;
   the page should render its empty state, not a 500 toast.
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then as
500 "server could not find the requested resource" (CRD absent). Both now
resolve to a 200 + [].
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroot): byte-identical /jobs + /cloud - kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint
Two parallel-baby paths still made the chroot diverge from the mother on
/cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.
1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when the
   topology query errored, because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is a
   test-only file leaking into production via the production import tree,
   in direct violation of INVIOLABLE-PRINCIPLES #1 (no placeholder data;
   empty state when you don't know).
   Fix: drop the fixture fallback. On error → null → empty-state render.
   The mother shows the same empty state when its loader returns nothing;
   byte-identical.
2. JobsTable + JobDetail rendered a flat green-grid because the chroot was
   hitting `/api/v1/sovereign/jobs` which returns a minimal shape (no
   dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record, not
   the jobs.Store contents).
   Fix: ship one URL on both surfaces, `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-deployment
   jobs.Store has 0 records: do a one-shot HelmRelease list via the
   in-cluster client (helmwatch.ListAndSnapshotHelmReleases, exported
   here, mirrors Watcher.SnapshotComponents without spinning up an
   informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList. Subsequent calls read directly from
   the now-populated store and return rich Job records with dependsOn /
   parentId / status, exactly like the mother.
   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses
   the same `/api/v1/deployments/{id}/jobs` URL as the mother.
3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers don't need a pod restart's restoreFromStore to see the
   chroot-imported deployment.
Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(jobdetail): bare-jobName URL - Traefik strips %3A so canonical id 404s
JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak") and
Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does not
decode `%3A` inside path segments. The catalyst-api router saw the literal
"%3A" and Store.GetJob's exact-match path missed.
Two coupled fixes:
1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already accepts
   both bare jobName and canonical id (see store.go:781-789).
2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so the
   URL param resolves regardless of which format the link emitted.
Bump chart 1.4.58 → 1.4.59.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(cloud): resolve deploymentId from cookie on chroot - was firing topology against undefined
CloudPage's topology query fired against /deployments/undefined/... on the
chroot (URL is /cloud, no deploymentId path segment), so the page showed
"Couldn't load architecture" with all node counts at 0/0.
Fix: same pattern as JobDetail. useResolvedDeploymentId() reads the JWT
cookie's deployment_id claim via /api/v1/sovereign/self, falling back from
URL params. Topology query also gates on `!!deploymentId` so it doesn't
waste a 404 round-trip during cookie resolution.
Bump chart 1.4.60 → 1.4.61.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroot): single chrome - no frame in frame, no mother handover banner
Two visible bleed-throughs from the mother's wizard UX onto the chroot
Sovereign Console at console.<sov-fqdn>:
1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its sidebar
   was already skipped for chroot per a prior fix). User saw two
   horizontal title bars stacked.
   Resolution: SovereignConsoleLayout becomes auth-only on the chroot. It
   runs the cookie/OIDC auth gate + RequiredActionsModal, then renders
   <Outlet/> with NO chrome. PortalShell is now the single chrome owner on
   both surfaces:
   - Mother (/sovereign/provision/$id): renders Sidebar with
     /provision/$id/X URLs + its header.
   - Chroot (console.<sov-fqdn>): renders SovereignSidebar with clean /X
     URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.
2. **"✓ Sovereign is ready — Redirecting to your Sovereign console" banner
   on /apps.** This is the mother's wizard celebration that tells the
   operator "you can now jump to your new Sovereign". On the chroot the
   operator IS already on the Sovereign Console; the banner bleeds through
   because the imported deployment record carries the mother's
   handover-ready event in its history.
   Resolution: AppsPage gates the banner, the toast, and the auto-redirect
   timer on `!isSovereignMode`. Chroot stays clean.
Bump chart 1.4.62 → 1.4.63.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page
Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered full-bleed
with no sidebar / no header, a visible look-and-feel break.
  /settings/marketplace → MarketplaceSettings (wrapped in PortalShell)
  /parent-domains → ParentDomainsPage (wrapped in PortalShell)
  /catalog → CatalogAdminPage (deleted)
Drop /catalog entirely per founder direction: a separate page just to flip
a "publish to marketplace" boolean per app is the wrong shape. The natural
place for that toggle is on each /apps card (future PR; needs
HandleSovereignApps to join publish state from the SME catalog
microservice).
Removed:
- /catalog route registration in router.tsx
- 'Catalog' entry in SovereignSidebar's FLAT_NAV
- CatalogAdminPage.tsx (525 lines)
- 'catalog' from ActiveSection union + deriveActiveSection regex
The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish on
the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future apps-card
toggle will call it via the same path.
Bump chart 1.4.64 → 1.4.65.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(apps): publish chip on each card - replaces deleted /catalog page
Per founder direction: "if the catalog is just labeling an app to be shown
in marketplace, why don't we do it through the apps?" Drop the standalone
/catalog page (#1058), put the publish toggle on each /apps card.
Backend (catalyst-api):
- New file sme_catalog_client.go: best-effort client for the in-cluster
  SME catalog microservice at http://catalog.sme.svc.cluster.local:8082.
  30s response cache, 1.5s probe budget, returns nil on DNS NXDOMAIN (SME
  services tier not deployed on this Sovereign; common when
  marketplace.enabled is false).
- HandleSovereignApps decorates each app with `marketplacePublished` *bool
  joined by slug from the SME catalog. nil ⇒ slug not in SME catalog
  (bootstrap component, or marketplace not deployed) ⇒ FE suppresses the
  chip.
- New handler HandleSovereignAppPublish at
  PATCH /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
  Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME catalog.
  Surfaces upstream status verbatim. Invalidates the cache so the next
  /apps poll reflects the change immediately.
Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of the
  bare status map.
- Each AppCard with a non-null marketplacePublished renders a PUBLISHED /
  UNPUBLISHED chip alongside the status chip. Click → PATCH → optimistic
  refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil → no chip
  (correct: nothing to toggle).
- Cards with marketplace.enabled=false render no chips at all (SME catalog
  unreachable → nil for every slug).
Bump chart 1.4.66 → 1.4.67.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
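The feat(apps) message above describes joining publish state onto each app by slug, with a nil pointer meaning "no chip". A minimal Go sketch of that tri-state join follows. The `App` struct and the `decorate` function are hypothetical stand-ins for this example; the real response type and handler are not shown in the log.

```go
package main

import "fmt"

// App is a hypothetical, trimmed-down stand-in for one /apps response item.
type App struct {
	Slug string
	// MarketplacePublished is nil when the slug is unknown to the SME
	// catalog; the frontend is described as hiding the chip in that case.
	MarketplacePublished *bool
}

// decorate returns a copy of apps with publish state joined by slug.
// A nil map models an unreachable SME catalog: every pointer stays nil.
func decorate(apps []App, published map[string]bool) []App {
	out := make([]App, len(apps))
	copy(out, apps)
	for i := range out {
		if state, ok := published[out[i].Slug]; ok {
			s := state // fresh variable so each app owns its own pointer
			out[i].MarketplacePublished = &s
		}
	}
	return out
}

func main() {
	apps := []App{{Slug: "wordpress"}, {Slug: "cilium"}}
	for _, a := range decorate(apps, map[string]bool{"wordpress": true}) {
		if a.MarketplacePublished == nil {
			fmt.Println(a.Slug, "no chip")
			continue
		}
		fmt.Println(a.Slug, "published:", *a.MarketplacePublished)
	}
}
```

A pointer rather than a plain bool keeps "unpublished" and "not in the catalog" distinguishable, which is what lets bootstrap components render without a toggle.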
||
|
|
aed0a81f75
|
fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page (#1058)
|
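The fix(jobdetail) message in this log describes stripping the "<deploymentId>:" prefix before building a job link, and indexing jobs under both id forms. The frontend hooks named there are TypeScript. The Go sketch below only mirrors the idea, and `bareJobName`, `jobPath` and `indexJobs` are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// bareJobName drops a leading "<deploymentID>:" prefix when present and
// returns the id unchanged otherwise.
func bareJobName(deploymentID, jobID string) string {
	return strings.TrimPrefix(jobID, deploymentID+":")
}

// jobPath builds a /jobs/<name> path from either id form. Because the
// colon is removed first, no percent-encoded colon reaches the proxy.
func jobPath(deploymentID, jobID string) string {
	return "/jobs/" + url.PathEscape(bareJobName(deploymentID, jobID))
}

// indexJobs keys every canonical id under both its canonical and bare
// form, so a lookup succeeds whichever form the URL carried.
func indexJobs(deploymentID string, canonicalIDs []string) map[string]string {
	byID := make(map[string]string, 2*len(canonicalIDs))
	for _, id := range canonicalIDs {
		byID[id] = id
		byID[bareJobName(deploymentID, id)] = id
	}
	return byID
}

func main() {
	dep := "69e73b3abe673840"
	fmt.Println(jobPath(dep, dep+":install-keycloak"))
	fmt.Println(indexJobs(dep, []string{dep + ":install-keycloak"})["install-keycloak"])
}
```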
||
|
|
8c8ccfbfed
|
fix(chroot): single chrome — no frame in frame, no mother handover banner (#1057)
|
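The fix(sovereignDynamicClient) message in this log describes the detection rule for the in-cluster fallback: the SOVEREIGN_FQDN env var is set and equals the deployment's Sovereign FQDN. A minimal Go sketch of that predicate is below. `useInCluster` is a hypothetical name, the example FQDNs are illustrative, and the real code presumably goes on to call rest.InClusterConfig(), which is omitted here to keep the example dependency-free.

```go
package main

import (
	"fmt"
	"os"
)

// useInCluster reports whether the binary is serving its own Sovereign.
// An empty envFQDN models the mother, where SOVEREIGN_FQDN is unset and
// the posted-back kubeconfig path stays in use.
func useInCluster(envFQDN, deploymentFQDN string) bool {
	return envFQDN != "" && envFQDN == deploymentFQDN
}

func main() {
	// On a Sovereign the chart is described as setting SOVEREIGN_FQDN.
	env := os.Getenv("SOVEREIGN_FQDN")
	fmt.Println(useInCluster(env, "omantel.biz"))
}
```

Taking the env value as a parameter, instead of reading it inside the predicate, keeps the rule testable without mutating process state.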
||
|
|
933b321890
|
fix(cloud): resolve deploymentId from cookie on chroot (#1056)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56 PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers, HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology) but left four route registrations in cmd/api/main.go that still referenced those handler methods. The catalyst-api build for the merged revert (run 25439549879) failed with: cmd/api/main.go:690:39: h.HandleSovereignUsers undefined cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined cmd/api/main.go:692:42: h.HandleSovereignSettings undefined cmd/api/main.go:693:42: h.HandleSovereignTopology undefined That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never published — only the UI image rolled. Result: omantel.biz catalyst-api pod stuck in ImagePullBackOff. Drop the four route registrations. Same baby, new address — the chroot Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/* endpoints. Also revert two more parallel-baby fragments still on main: - getHierarchicalInfrastructure mode-aware fetcher → single mother URL (the chroot resolves deploymentId from the cookie and the mother-side topology handler serves byte-identical data once cutover-import has persisted the deployment record on the Sovereign's local store) - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster Kustomization version pin to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api binary as the mother. 
When that binary runs ON the Sovereign cluster (catalyst-system namespace on the Sovereign itself), there is no posted-back kubeconfig — the catalyst-api IS in the cluster it needs to talk to, and rest.InClusterConfig() returns the right credentials. Without this, every endpoint that needs the Sovereign-side dynamic client returned 503 with "sovereign cluster kubeconfig not yet posted back" — including ListUserAccess (/users page), CreateUserAccess, infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users rendered "list user-access: HTTP 503" because the Sovereign-side catalyst-api was looking for a kubeconfig that doesn't exist on the chroot side of the cutover boundary. Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api deployment by the chart) matches dep.Request.SovereignFQDN. On the mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot, SOVEREIGN_FQDN matches the only deployment served (its own) → use in-cluster. Same fallback applied to tryDynamicClientLocked (loaderInputFor's best-effort live-source client) so /infrastructure/topology and the /cloud graph render with live data on the chroot too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(user-access): empty list when CRD absent + RBAC for chroot Two coupled fixes for the /users page on chroot Sovereign Console: 1. catalyst-api-cutover-driver ClusterRole: grant read/write on useraccesses.access.openova.io. The Sovereign chroot's catalyst-api uses the in-cluster ServiceAccount (per PR #1052). The list call was returning 403 from the apiserver because the SA had no rule covering this CRD. 2. ListUserAccess: return 200 with empty items when the CRD itself is not installed (apierrors.IsNotFound). The access.openova.io CRD ships via a separate blueprint that may not yet be installed on a fresh Sovereign — the page should render its empty state, not a 500 toast. 
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the in-cluster client path: list call surfaced first as 403 (RBAC), then as 500 "server could not find the requested resource" (CRD absent). Both now resolve to a 200 + []. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint Two parallel-baby paths still made the chroot diverge from the mother on /cloud and /jobs/{jobId}. Both now ship one path that serves byte-identical data on both surfaces. 1. CloudPage rendered fictional topology (Frankfurt, Helsinki, omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when the topology query errored — because it fell back to `infrastructureTopologyFixture` from `src/test/fixtures/`. That is a test-only file leaking into production via the production import tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no placeholder data — empty state when you don't know). Fix: drop the fixture fallback. On error → null → empty-state render. The mother shows the same empty state when its loader returns nothing; byte-identical. 2. JobsTable + JobDetail rendered a flat green-grid because the chroot was hitting `/api/v1/sovereign/jobs` which returns a minimal shape (no dependsOn, no parentId, no exec records). Mother's `/api/v1/deployments/{depId}/jobs` returns the rich shape from a per-deployment jobs.Store, which on the chroot starts empty (the mother's exportDeploymentToChild only ships the deployment record, not the jobs.Store contents). Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`. 
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per- deployment jobs.Store has 0 records: do a one-shot HelmRelease list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases — exported here, mirrors Watcher.SnapshotComponents without spinning up an informer), pass through snapshotsToSeeds + Bridge.SeedJobsFromInformerList. Subsequent calls read directly from the now-populated store and return rich Job records with dependsOn / parentId / status — exactly like the mother. useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses the same `/api/v1/deployments/{id}/jobs` URL as the mother. 3. HandleDeploymentImport now also loads the imported record into the in-memory deployments map immediately, so `/deployments/{id}/*` handlers don't need a pod restart's restoreFromStore to see the chroot-imported deployment. Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s JobDetail navigation was 404ing on the chroot because the link builder URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak") and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does not decode `%3A` inside path segments. The catalyst-api router saw the literal "%3A" and Store.GetJob's exact-match path missed. Two coupled fixes: 1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding, producing /jobs/install-keycloak (Traefik-safe) instead of /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already accepts both bare jobName and canonical id (see store.go:781-789). 2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so the URL param resolves regardless of which format the link emitted. Bump chart 1.4.58 → 1.4.59. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined
CloudPage's topology query fired against /deployments/undefined/... on
the chroot (URL is /cloud, no deploymentId path segment), so the page
showed "Couldn't load architecture" with all node counts at 0/0.
Fix: same pattern as JobDetail. useResolvedDeploymentId() reads the
JWT cookie's deployment_id claim via /api/v1/sovereign/self as a
fallback when the URL params carry no deploymentId. The topology query
also gates on `!!deploymentId` so it doesn't waste a 404 round-trip
during cookie resolution.
Bump chart 1.4.60 → 1.4.61.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
fb7cfbcf8e
|
fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s (#1055)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56 PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers, HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology) but left four route registrations in cmd/api/main.go that still referenced those handler methods. The catalyst-api build for the merged revert (run 25439549879) failed with: cmd/api/main.go:690:39: h.HandleSovereignUsers undefined cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined cmd/api/main.go:692:42: h.HandleSovereignSettings undefined cmd/api/main.go:693:42: h.HandleSovereignTopology undefined That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never published — only the UI image rolled. Result: omantel.biz catalyst-api pod stuck in ImagePullBackOff. Drop the four route registrations. Same baby, new address — the chroot Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/* endpoints. Also revert two more parallel-baby fragments still on main: - getHierarchicalInfrastructure mode-aware fetcher → single mother URL (the chroot resolves deploymentId from the cookie and the mother-side topology handler serves byte-identical data once cutover-import has persisted the deployment record on the Sovereign's local store) - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster Kustomization version pin to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api binary as the mother. 
When that binary runs ON the Sovereign cluster (catalyst-system namespace on the Sovereign itself), there is no posted-back kubeconfig — the catalyst-api IS in the cluster it needs to talk to, and rest.InClusterConfig() returns the right credentials. Without this, every endpoint that needs the Sovereign-side dynamic client returned 503 with "sovereign cluster kubeconfig not yet posted back" — including ListUserAccess (/users page), CreateUserAccess, infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users rendered "list user-access: HTTP 503" because the Sovereign-side catalyst-api was looking for a kubeconfig that doesn't exist on the chroot side of the cutover boundary. Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api deployment by the chart) matches dep.Request.SovereignFQDN. On the mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot, SOVEREIGN_FQDN matches the only deployment served (its own) → use in-cluster. Same fallback applied to tryDynamicClientLocked (loaderInputFor's best-effort live-source client) so /infrastructure/topology and the /cloud graph render with live data on the chroot too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(user-access): empty list when CRD absent + RBAC for chroot Two coupled fixes for the /users page on chroot Sovereign Console: 1. catalyst-api-cutover-driver ClusterRole: grant read/write on useraccesses.access.openova.io. The Sovereign chroot's catalyst-api uses the in-cluster ServiceAccount (per PR #1052). The list call was returning 403 from the apiserver because the SA had no rule covering this CRD. 2. ListUserAccess: return 200 with empty items when the CRD itself is not installed (apierrors.IsNotFound). The access.openova.io CRD ships via a separate blueprint that may not yet be installed on a fresh Sovereign — the page should render its empty state, not a 500 toast. 
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the in-cluster client path: list call surfaced first as 403 (RBAC), then as 500 "server could not find the requested resource" (CRD absent). Both now resolve to a 200 + []. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint Two parallel-baby paths still made the chroot diverge from the mother on /cloud and /jobs/{jobId}. Both now ship one path that serves byte-identical data on both surfaces. 1. CloudPage rendered fictional topology (Frankfurt, Helsinki, omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when the topology query errored — because it fell back to `infrastructureTopologyFixture` from `src/test/fixtures/`. That is a test-only file leaking into production via the production import tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no placeholder data — empty state when you don't know). Fix: drop the fixture fallback. On error → null → empty-state render. The mother shows the same empty state when its loader returns nothing; byte-identical. 2. JobsTable + JobDetail rendered a flat green-grid because the chroot was hitting `/api/v1/sovereign/jobs` which returns a minimal shape (no dependsOn, no parentId, no exec records). Mother's `/api/v1/deployments/{depId}/jobs` returns the rich shape from a per-deployment jobs.Store, which on the chroot starts empty (the mother's exportDeploymentToChild only ships the deployment record, not the jobs.Store contents). Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`. 
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per- deployment jobs.Store has 0 records: do a one-shot HelmRelease list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases — exported here, mirrors Watcher.SnapshotComponents without spinning up an informer), pass through snapshotsToSeeds + Bridge.SeedJobsFromInformerList. Subsequent calls read directly from the now-populated store and return rich Job records with dependsOn / parentId / status — exactly like the mother. useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses the same `/api/v1/deployments/{id}/jobs` URL as the mother. 3. HandleDeploymentImport now also loads the imported record into the in-memory deployments map immediately, so `/deployments/{id}/*` handlers don't need a pod restart's restoreFromStore to see the chroot-imported deployment. Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s JobDetail navigation was 404ing on the chroot because the link builder URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak") and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does not decode `%3A` inside path segments. The catalyst-api router saw the literal "%3A" and Store.GetJob's exact-match path missed. Two coupled fixes: 1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding, producing /jobs/install-keycloak (Traefik-safe) instead of /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already accepts both bare jobName and canonical id (see store.go:781-789). 2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so the URL param resolves regardless of which format the link emitted. Bump chart 1.4.58 → 1.4.59. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
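The two coupled fixes in #1055 (strip the "<deploymentId>:" prefix before building the link, and index jobs under both id formats) can be sketched roughly as follows. This is an illustrative sketch only: `bareJobName`, `jobLink`, and `indexJobs` are hypothetical names, not the actual `useJobLinkBuilder` or `JobDetail.jobsById` code.

```typescript
// Hypothetical helper: drop the "<deploymentId>:" prefix from a canonical
// job id, leaving the bare job name. Ids without the prefix pass through.
function bareJobName(jobId: string, deploymentId: string): string {
  const prefix = deploymentId + ":";
  return deploymentId !== "" && jobId.startsWith(prefix)
    ? jobId.slice(prefix.length)
    : jobId;
}

// Build a link whose path segment contains no ':' and therefore no "%3A".
function jobLink(jobId: string, deploymentId: string): string {
  return "/jobs/" + encodeURIComponent(bareJobName(jobId, deploymentId));
}

// Index each job under BOTH its canonical id and its bare name so a URL
// param in either format resolves to the same record.
function indexJobs<T extends { id: string }>(
  jobs: T[],
  deploymentId: string,
): Map<string, T> {
  const byId = new Map<string, T>();
  for (const job of jobs) {
    byId.set(job.id, job);
    byId.set(bareJobName(job.id, deploymentId), job);
  }
  return byId;
}

// jobLink("69e73b3abe673840:install-keycloak", "69e73b3abe673840")
//   returns "/jobs/install-keycloak"
```

Indexing under both keys keeps links emitted in the older canonical format working.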
|
|
ee8d2e2b0e
|
fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store, single endpoint (#1054)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56 PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers, HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology) but left four route registrations in cmd/api/main.go that still referenced those handler methods. The catalyst-api build for the merged revert (run 25439549879) failed with: cmd/api/main.go:690:39: h.HandleSovereignUsers undefined cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined cmd/api/main.go:692:42: h.HandleSovereignSettings undefined cmd/api/main.go:693:42: h.HandleSovereignTopology undefined That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never published — only the UI image rolled. Result: omantel.biz catalyst-api pod stuck in ImagePullBackOff. Drop the four route registrations. Same baby, new address — the chroot Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/* endpoints. Also revert two more parallel-baby fragments still on main: - getHierarchicalInfrastructure mode-aware fetcher → single mother URL (the chroot resolves deploymentId from the cookie and the mother-side topology handler serves byte-identical data once cutover-import has persisted the deployment record on the Sovereign's local store) - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster Kustomization version pin to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api binary as the mother. 
When that binary runs ON the Sovereign cluster (catalyst-system namespace on the Sovereign itself), there is no posted-back kubeconfig — the catalyst-api IS in the cluster it needs to talk to, and rest.InClusterConfig() returns the right credentials. Without this, every endpoint that needs the Sovereign-side dynamic client returned 503 with "sovereign cluster kubeconfig not yet posted back" — including ListUserAccess (/users page), CreateUserAccess, infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users rendered "list user-access: HTTP 503" because the Sovereign-side catalyst-api was looking for a kubeconfig that doesn't exist on the chroot side of the cutover boundary. Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api deployment by the chart) matches dep.Request.SovereignFQDN. On the mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot, SOVEREIGN_FQDN matches the only deployment served (its own) → use in-cluster. Same fallback applied to tryDynamicClientLocked (loaderInputFor's best-effort live-source client) so /infrastructure/topology and the /cloud graph render with live data on the chroot too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(user-access): empty list when CRD absent + RBAC for chroot Two coupled fixes for the /users page on chroot Sovereign Console: 1. catalyst-api-cutover-driver ClusterRole: grant read/write on useraccesses.access.openova.io. The Sovereign chroot's catalyst-api uses the in-cluster ServiceAccount (per PR #1052). The list call was returning 403 from the apiserver because the SA had no rule covering this CRD. 2. ListUserAccess: return 200 with empty items when the CRD itself is not installed (apierrors.IsNotFound). The access.openova.io CRD ships via a separate blueprint that may not yet be installed on a fresh Sovereign — the page should render its empty state, not a 500 toast. 
Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the in-cluster client path: list call surfaced first as 403 (RBAC), then as 500 "server could not find the requested resource" (CRD absent). Both now resolve to a 200 + []. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint Two parallel-baby paths still made the chroot diverge from the mother on /cloud and /jobs/{jobId}. Both now ship one path that serves byte-identical data on both surfaces. 1. CloudPage rendered fictional topology (Frankfurt, Helsinki, omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when the topology query errored — because it fell back to `infrastructureTopologyFixture` from `src/test/fixtures/`. That is a test-only file leaking into production via the production import tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no placeholder data — empty state when you don't know). Fix: drop the fixture fallback. On error → null → empty-state render. The mother shows the same empty state when its loader returns nothing; byte-identical. 2. JobsTable + JobDetail rendered a flat green-grid because the chroot was hitting `/api/v1/sovereign/jobs` which returns a minimal shape (no dependsOn, no parentId, no exec records). Mother's `/api/v1/deployments/{depId}/jobs` returns the rich shape from a per-deployment jobs.Store, which on the chroot starts empty (the mother's exportDeploymentToChild only ships the deployment record, not the jobs.Store contents). Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`. 
Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per- deployment jobs.Store has 0 records: do a one-shot HelmRelease list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases — exported here, mirrors Watcher.SnapshotComponents without spinning up an informer), pass through snapshotsToSeeds + Bridge.SeedJobsFromInformerList. Subsequent calls read directly from the now-populated store and return rich Job records with dependsOn / parentId / status — exactly like the mother. useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI uses the same `/api/v1/deployments/{id}/jobs` URL as the mother. 3. HandleDeploymentImport now also loads the imported record into the in-memory deployments map immediately, so `/deployments/{id}/*` handlers don't need a pod restart's restoreFromStore to see the chroot-imported deployment. Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9ec32e3311
|
fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56 (#1051)
PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:
cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
cmd/api/main.go:693:42: h.HandleSovereignTopology undefined
That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published; only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.
Drop the four route registrations. Same baby, new address: the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.
Also revert two more parallel-baby fragments still on main:
- getHierarchicalInfrastructure mode-aware fetcher → single mother URL
(the chroot resolves deploymentId from the cookie and the mother-side
topology handler serves byte-identical data once cutover-import has
persisted the deployment record on the Sovereign's local store)
- CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere
Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
366395c9d1
|
fix(graphcanvas): defensive label render + adapter never-undefined labels (#1049)
Crash on omantel.biz /cloud: 'TypeError: Cannot read properties of
undefined (reading length)' at GraphCanvas line 975 — n.label was
undefined when adapter produced a Region node from a topology where
region.name was empty AND region.providerRegion was undefined
(legacy mother-side adapter assumed both were populated).
Two-layer fix:
1. GraphCanvas — coerce label to '' before .length / .slice.
2. adapter.ts — addRegion / addCluster fall back to id then a
literal placeholder so the produced node always has a non-
empty label.
Bumps bp-catalyst-platform 1.4.54 → 1.4.55.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
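The two-layer label fix in #1049 might look roughly like the sketch below. The `RegionLike` shape, the function names, the placeholder text, and the exact fallback order are assumptions for illustration; the real GraphCanvas and adapter.ts code is not shown in this log.

```typescript
// Hypothetical minimal region shape; every field may be missing or empty.
interface RegionLike {
  id?: string;
  name?: string;
  providerRegion?: string;
}

// Adapter layer: always produce a non-empty label. The fallback order
// (name, providerRegion, id, literal placeholder) is an assumption.
function regionLabel(region: RegionLike): string {
  const candidates = [region.name, region.providerRegion, region.id];
  for (const c of candidates) {
    if (typeof c === "string" && c.trim().length > 0) return c;
  }
  return "(unnamed region)";
}

// Render layer: coerce the label to '' before touching .length / .slice,
// so an undefined label can no longer throw.
function truncateLabel(label: string | null | undefined, max = 24): string {
  const safe = label ?? "";
  return safe.length > max ? safe.slice(0, max - 1) + "…" : safe;
}
```

Guarding in both layers means either fix alone is enough to avoid the crash, which is useful while older adapter output is still in circulation.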
|
|
959879a7e4
|
fix(architecture-graph): try/catch hierarchyToGraph + k8sToGraph (#1048)
The Sovereign-mode /api/v1/sovereign/topology shape lacks some fields
the legacy hierarchyToGraph adapter dereferences (skuCp, skuWorker,
providerRegion, etc.). Wrap both adapter calls in try/catch so a
missing field falls through to an empty graph rather than crashing the
entire /cloud page via the React error boundary.
Caught on omantel.biz 2026-05-06.
Bumps bp-catalyst-platform 1.4.53 → 1.4.54.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
28d2cf17df
|
fix(cloud-page): defensive normalize + try/catch fallback to empty topology (#1047)
CloudPage threw 'Cannot read properties of undefined (reading length)'
on omantel.biz because the Sovereign-mode topology shape carried
slimmer fields than the wizard mother-side shape (region.id/name
empty, node.region missing, etc.). Add per-field nullish defaults at
each level of the normalize + a try/catch fallback that renders an
empty topology instead of crashing the entire page via the React
error boundary.
Bumps bp-catalyst-platform 1.4.52 → 1.4.53.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
862c77be1b
|
fix(jobs/jobdetail): URL-encode multi-segment live job ids + strict:false params (#1046)
The live /api/v1/sovereign/jobs endpoint returns job ids like
'job/syft-grype/syft-grype-bp-syft-grype-29633910' that contain '/'.
TanStack Router's '/jobs/$jobId' route matches a single segment, so
links to multi-segment ids 404'd. Encode the id in the link builder +
decode in JobDetail.
Also switches JobDetail's strict-mode useParams (the
'/provision/$deploymentId/jobs/$jobId' from-clause) to strict:false +
useResolvedDeploymentId fallback so it works on the chroot Sovereign
route too.
Caught on omantel.biz 2026-05-06.
Bumps bp-catalyst-platform 1.4.51 → 1.4.52.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
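The encode-in-the-link-builder, decode-in-JobDetail round trip from #1046 can be sketched as below. The function names are hypothetical, and the tolerant decode is an added assumption rather than something the commit describes.

```typescript
// Hypothetical link builder: percent-encode the whole id so any '/'
// inside it stays within a single path segment.
function jobDetailPath(jobId: string): string {
  return "/jobs/" + encodeURIComponent(jobId);
}

// Hypothetical param reader: decode the route param back to the raw id.
// A malformed escape sequence falls back to the param as received
// instead of throwing (assumption, not from the commit).
function jobIdFromParam(param: string): string {
  try {
    return decodeURIComponent(param);
  } catch {
    return param;
  }
}

// jobDetailPath("job/syft-grype/syft-grype-bp-syft-grype-29633910")
//   returns "/jobs/job%2Fsyft-grype%2Fsyft-grype-bp-syft-grype-29633910"
```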
|
|
fe4aa109d5
|
fix(sovereign-topology): return CloudSpec[] not object — CloudPage iterates (#1045)
CloudPage threw 'TypeError: e.cloud is not iterable' on omantel.biz
because /api/v1/sovereign/topology returned cloud as a JSON object
{provider, providerRegion} but the UI's HierarchicalInfrastructure
contract is cloud: CloudSpec[] (CloudPage runs for-of and useMemo
over it). Fixed: shape cloud as a single-element array of CloudSpec
(id/name/provider/regionCount/quotaUsed/quotaLimit) and add the
missing storage block (storageClasses/pools/volumes/buckets) the
UI also expects.
Bumps bp-catalyst-platform 1.4.50 → 1.4.51.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
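The contract mismatch behind #1045 is easier to see with types. The fix itself was made server-side; the sketch below only illustrates the `cloud: CloudSpec[]` contract with a hypothetical client-side guard. The field names come from the commit message, while the field types, `LegacyCloud`, and `normalizeCloud` are assumptions.

```typescript
// Field names are from the commit message; the types are assumptions.
interface CloudSpec {
  id: string;
  name: string;
  provider: string;
  regionCount: number;
  quotaUsed: number;
  quotaLimit: number;
}

// The pre-fix wire shape: a bare object rather than an array.
interface LegacyCloud {
  provider: string;
  providerRegion?: string;
}

// Hypothetical guard: always hand callers an iterable CloudSpec[], so a
// for-of loop over the result cannot hit "is not iterable".
function normalizeCloud(
  cloud: CloudSpec[] | LegacyCloud | null | undefined,
): CloudSpec[] {
  if (Array.isArray(cloud)) return cloud;
  if (cloud == null) return [];
  return [
    {
      id: cloud.provider,
      name: cloud.provider,
      provider: cloud.provider,
      regionCount: cloud.providerRegion ? 1 : 0,
      quotaUsed: 0, // unknown in the legacy shape; zero is a placeholder
      quotaLimit: 0,
    },
  ];
}
```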
|
|
15ae8796bc
|
fix(sovereign-console): close DoD gaps — Invariant + missing endpoints + chroot fetchers (#1044)
This is the comprehensive fix for the chroot Sovereign Console DoD
gaps caught on omantel.biz 2026-05-06. Eight pages were broken with
"Something went wrong!" / "Invariant failed" / "Couldn't load" /
"Not Found"; root causes traced to (a) /api/v1/sovereign/self
returning 503 because env vars weren't populated post-handover,
(b) several Sovereign endpoints (/users, /catalog, /settings,
/topology) didn't exist server-side, and (c) several pages used
strict-mode useParams against the mother-side /provision/$id/...
route which throws Invariant on the chroot /apps, /users, /settings,
/app/$id routes.
Server changes:
- auth.Claims gains SovereignFQDN + DeploymentID fields.
- auth_handover.go authHandoverClaims gains the same; the minted
Sovereign session JWT now carries them so downstream handlers
can resolve identity without env or store-fallback.
- sovereign_self.go reads sovereign_fqdn / deployment_id from the
catalyst_session cookie payload (best-effort base64 decode; no
signature check needed since this catalyst-api minted the cookie
in the first place). Resolution order: env → cookie → store →
503/404.
- new handlers in sovereign_more.go:
GET /api/v1/sovereign/users — Keycloak realm users
GET /api/v1/sovereign/catalog — embedded blueprints catalog
GET /api/v1/sovereign/settings — tenant identity + features
GET /api/v1/sovereign/topology — hierarchical infra view
for CloudPage's getHierarchicalInfrastructure()
All return well-shaped empty responses on any error (no 500s
that bubble into UI error boundaries).
UI changes:
- SettingsPage / AppDetail / UserAccessListPage replace strict-mode
useParams({ from: '/provision/$deploymentId/...' }) with
useParams({ strict: false }) + useResolvedDeploymentId() fallback.
Now works on BOTH the mother route AND the chroot Sovereign route
without throwing Invariant.
- CatalogAdminPage's fetchApps swaps /catalog/apps →
/api/v1/sovereign/catalog when window.location.hostname is not
console.openova.io.
- getHierarchicalInfrastructure (CloudPage's source) swaps
/api/v1/deployments/{id}/infrastructure/topology →
/api/v1/sovereign/topology under the same chroot guard.
Bumps bp-catalyst-platform 1.4.49 → 1.4.50.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
68e61eb306
|
fix(jobs): coerce Sovereign live response into full Job shape (#1042)
The /api/v1/sovereign/jobs endpoint returns a minimal shape
{id, name, namespace, kind, status, startedAt, finishedAt} — no
appId, parentId, dependsOn, childIds. JobsTable iterates
`for (const d of job.dependsOn)` and reads
`job.appId.toLowerCase()` etc., which throws TypeError
'Cannot read properties of undefined (reading length)' and
breaks page render entirely (0 rows shown).
Coerce missing fields to safe defaults in defaultFetchJobs so
the table renders. Follow-up: the server-side handler should return
the full Job shape with empty arrays for missing fields.
Bumps bp-catalyst-platform 1.4.48 → 1.4.49.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
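A rough sketch of the coercion described in #1042 follows. The minimal field list matches the commit message; the `Job` field types, the default values, and the name `coerceJob` are assumptions, not the actual `defaultFetchJobs` code.

```typescript
// Minimal shape returned by the live endpoint, per the commit message.
interface LiveJob {
  id: string;
  name: string;
  namespace: string;
  kind: string;
  status: string;
  startedAt?: string;
  finishedAt?: string;
}

// Full shape the table expects; the field types here are assumptions.
interface Job extends LiveJob {
  appId: string;
  parentId: string | null;
  dependsOn: string[];
  childIds: string[];
}

// Fill every field the table dereferences with a safe default so
// `for (const d of job.dependsOn)` and `job.appId.toLowerCase()` cannot throw.
function coerceJob(raw: LiveJob & Partial<Job>): Job {
  return {
    ...raw,
    appId: raw.appId ?? "",
    parentId: raw.parentId ?? null,
    dependsOn: Array.isArray(raw.dependsOn) ? raw.dependsOn : [],
    childIds: Array.isArray(raw.childIds) ? raw.childIds : [],
  };
}
```

Fields already present on the response are kept, so the same function remains safe once the server starts returning the full shape.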
|
|
8638613225
|
fix(useLiveJobsBackfill): enable query on Sovereign mode even when deploymentId empty (#1041)
The useLiveJobsBackfill hook gates with
`enabled: enabled && !!deploymentId`. On the chroot Sovereign Console,
where /sovereign/self returns 503 (deployment-id-not-yet-stamped) and
the route doesn't carry a :deploymentId param, deploymentId is the
empty string and the query NEVER mounts. Live jobs always remained
empty, and mergeJobs fell through to the reducer-derived imported
snapshot (every job pinned at 'pending').
Fix: when DETECTED_MODE.mode === 'sovereign', enable the query
regardless of deploymentId emptiness. The URL is FQDN-scoped via the
session cookie, no deploymentId needed in the path.
Bumps bp-catalyst-platform 1.4.47 → 1.4.48.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
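The gating change in #1041 reduces to a small predicate. This is a sketch under assumptions: the `"mother"` mode label and the function name are hypothetical, and the real hook passes the result to its query's `enabled` option.

```typescript
// "sovereign" is the mode value named in the commit; "mother" is an
// assumed label for the other mode.
type Mode = "mother" | "sovereign";

// Decide whether the live-jobs query should mount.
function shouldEnableLiveJobs(
  enabled: boolean,
  mode: Mode,
  deploymentId: string | undefined,
): boolean {
  if (!enabled) return false;
  // On the Sovereign the URL is scoped by the session cookie, so an
  // empty deploymentId must not block the query.
  if (mode === "sovereign") return true;
  return !!deploymentId;
}
```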
|
|
6f64753ea9
|
fix(cloud-page): defensive slice guard + bump chart 1.4.47 with literal :2122fb8 (#1040)
CloudPage's switcher rendered `d.id.slice(0, 8)` without a nullish
guard. When listDeployments returns an entry with undefined id (e.g.
malformed/legacy record), this throws TypeError 'Cannot read
properties of undefined (reading slice)' which the React error
boundary catches as 'Invariant failed', breaking all of /cloud.
Caught on omantel.biz 2026-05-06.
Also bumps the literal :91eeeed → :2122fb8 in api-deployment.yaml /
ui-deployment.yaml so freshly provisioned Sovereigns pick up the
JobsPage+AppsPage live-status fix from PR #1039 (chart 1.4.46's
values.yaml had :2122fb8 but the templated literals didn't).
Bumps bp-catalyst-platform 1.4.46 → 1.4.47.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
2122fb81c0
|
fix(sovereign-console): jobs + apps pages show LIVE status (not imported snapshot Pending) (#1039)
Symptom on omantel.biz 2026-05-06: every job and every app on the
Sovereign Console showed "Pending" forever, even when the underlying
HelmReleases were Ready=True and the cluster was fully operational.
Root cause:
- JobsPage's useLiveJobsBackfill was gated by
`inFlight = streamStatus !== 'completed' && streamStatus !== 'failed'`.
The imported snapshot the mother POSTs at handover ALWAYS arrives
with streamStatus="completed" (the mother considered phase-1 done
before firing the JWT). So inFlight=false and disablePolling=true on
Sovereign mode → liveJobs.length=0 → mergeJobs returns the
reducer-derived imported snapshot (every job pinned at "pending").
- AppsPage read `state.apps[id].status` from the same imported
reducer state. No live-status overlay.
Fix:
- JobsPage: bypass the inFlight gate when DETECTED_MODE.mode ===
'sovereign'. Live polling of /api/v1/sovereign/jobs is the
authoritative source on the chroot Sovereign Console.
- AppsPage: add a useQuery polling /api/v1/sovereign/apps every 5s on
Sovereign mode, mapping the server's status enum (installed |
installing | bootstrap | available) to the UI's ApplicationStatus
vocabulary, and overlay it on top of the reducer-derived status.
Bumps bp-catalyst-platform 1.4.45 → 1.4.46.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
838094348a
|
fix(rbac): grant catalyst-api SA cluster reads for /sovereign/cloud + /apps (#1038)
The Sovereign Console's chroot /cloud and /apps panes back onto
HandleSovereignCloud / HandleSovereignApps in catalyst-api, which
use the in-cluster client to enumerate cluster-wide K8s resources
(Nodes, Namespaces, Services, PVCs, StorageClasses, Ingresses,
HTTPRoutes, HelmReleases). The pre-existing ClusterRole only
covered the cutover-step Job-driving verbs (configmaps/jobs/pods).
Caught on otech130 2026-05-06: /api/v1/sovereign/cloud returned
{nodes:[], namespaces:[], …} because every List call hit a silent
apiserver Forbidden, and the handler's err branch falls through
to an empty response shape.
Adds get/list/watch on:
- core: nodes, namespaces, services, persistentvolumes,
persistentvolumeclaims
- networking.k8s.io: ingresses
- gateway.networking.k8s.io: httproutes, gateways
- storage.k8s.io: storageclasses
- helm.toolkit.fluxcd.io: helmreleases
Bumps bp-catalyst-platform 1.4.44 → 1.4.45.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
d2ca2d492b
|
chore(bp-catalyst-platform): bump 1.4.43 → 1.4.44 + literal :ff864e9 → :91eeeed (#1032 PortalShell sidebar fix) (#1037)
Chart 1.4.43 was built before PR #1032 bumped chart Chart.yaml in the same commit, so its values.yaml had tag :91eeeed but the hardcoded image refs in templates/api-deployment.yaml and templates/ui-deployment.yaml stayed at :ff864e9 (the previous bump from PR #1030). Sovereigns provisioned with chart 1.4.43 therefore still have the duplicate-sidebar bug (caught on otech129 2026-05-05).
This bump pins the literal refs to :91eeeed, which is PR #1032's commit SHA. Bootstrap-kit pin moves 1.4.43 → 1.4.44 so otech130+ get the PortalShell skip-inner-Sidebar logic.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
fc36731b4a
|
chore(bootstrap-kit): pin bp-catalyst-platform 1.4.41 → 1.4.43 (PR #1032 PortalShell sidebar fix) (#1035)
PR #1032's sed target was '1.4.42' but the in-tree pin was still 1.4.41 (chart Chart.yaml had been bumped to 1.4.42 by the deploy job, but the bootstrap-kit YAML file pinning the chart version for freshly provisioned Sovereigns was untouched). Picked up live on otech128 2026-05-05: it provisioned with chart 1.4.41 and still exhibited the duplicate sidebar bug PR #1032 was meant to fix.
This commit bumps the pin so otech129+ get chart 1.4.43.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
a6fb97f2ef
|
fix(cutover step-01): clone+push (regular repo) instead of pull-mirror (#1033)
PR #1029 added a step-06 PATCH to flip mirror=false before push so the cutover-helmrepository-patches Job could write HelmRepository URL pivots to local Gitea. On Gitea 1.22.3 the PATCH returns 200 but silently no-ops: `mirror_interval` updates but `mirror: true` stays. The repo remains read-only and step-06 still hits HTTP 403 "remote: mirror repository is read-only". Reproduced on otech127 2026-05-05 with chart 0.1.22 deployed.
Per ADR (cutover ends upstream tracking; the Sovereign goes self-hosted from this point), the architecturally correct fix is to never create the mirror in the first place. Step-01 now creates a regular Gitea repo and bare-clones+pushes upstream content. All refs (branches+tags) replicate via `git push --mirror --force`, which is idempotent on re-runs.
Trade-off: post-cutover Sovereigns no longer auto-sync from upstream, which is the intended cutover semantics anyway. The operator re-runs this Job manually for chart rollouts (next-session follow-up: a dedicated post-cutover sync mechanism, perhaps a periodic CronJob the operator can opt into).
Bumps:
- bp-self-sovereign-cutover chart 0.1.22 → 0.1.23
- bootstrap-kit pin 0.1.22 → 0.1.23
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
a070808eda
|
fix(cutover step-06): convert pull-mirror to standalone before pushing patches (#1029)
Step-01 creates openova/openova on the Sovereign's local Gitea as a
pull mirror so it tracks upstream openova-public during early
bootstrap. After cutover, the Sovereign is self-hosted and MUST
diverge from upstream — but Gitea blocks pushes to a mirror with
HTTP 403 "remote: mirror repository is read-only".
Step-06 adds a Phase-1.5 PATCH /api/v1/repos/{owner}/{repo}
{"mirror": false, "mirror_interval": "0"} BEFORE attempting to
clone+push the HelmRepository URL pivot. This converts the
pull-mirror into a standalone writable repo — the way the post-
cutover Sovereign architecture expects it.
Caught on otech125 2026-05-05: cutover-helmrepository-patches Job
returned "FATAL: git push failed" with no upstream stderr (chart
0.1.20 lacks the printf '%s\n' "$push_err" fix from PR #1022, which
was published in 0.1.21 only). Reproduced by cloning openova/openova
from a debug pod and running git push: "remote: mirror repository
is read-only / fatal: ... HTTP 403". Without the demirror step,
EVERY Sovereign provisioned fails handover at this step.
Bumps:
- bp-self-sovereign-cutover chart 0.1.21 → 0.1.22
- bootstrap-kit pin 0.1.20 → 0.1.22
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
||
|
|
4e2192ef4a
|
fix(deployments-list): row click goes to that row's dashboard, not the current one (#1026)
The Sovereign Console at /sovereign/deployments rendered every row's FQDN as a Link to=`/dashboard` regardless of which row was clicked. On contabo (mother) this resolved to /sovereign/dashboard (the CURRENT user's Sovereign), so clicking ANY row in the deployments list always navigated to the same dashboard, breaking the operator's expectation that "click row X to see deployment X's pages."
Fix: route each row to /provision/<row-id>/dashboard on the mother view (Catalyst-Zero), and to /dashboard on the chroot Sovereign view (where each Sovereign sees only its own deployment, so /dashboard is correct). Mode resolved via the existing DETECTED_MODE singleton.
Bumps bp-catalyst-platform chart 1.4.40 → 1.4.41.
Co-authored-by: Hati Yildiz <hatiyildiz@openova.io>
|
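The per-row routing rule can be sketched as a small pure function. The function name `dashboardPathForRow` and the `"mother"` mode label are hypothetical; the two target paths are the ones named in the commit message, and in the real code the mode would come from the DETECTED_MODE singleton rather than a parameter.

```typescript
// "sovereign" is named in the commit history; "mother" is a hypothetical label for the Catalyst-Zero view.
type ConsoleMode = "mother" | "sovereign";

// Hypothetical helper: compute the dashboard link target for one deployments-list row.
function dashboardPathForRow(mode: ConsoleMode, rowId: string): string {
  if (mode === "sovereign") {
    // A chroot Sovereign only ever lists its own deployment, so the clean root is correct.
    return "/dashboard";
  }
  // On the mother view each row must link to its own deployment id.
  return `/provision/${encodeURIComponent(rowId)}/dashboard`;
}

console.log(dashboardPathForRow("mother", "abc123")); // /provision/abc123/dashboard
console.log(dashboardPathForRow("sovereign", "abc123")); // /dashboard
```

Keeping the rule in one function makes it straightforward to unit-test that two different rows never resolve to the same link on the mother view.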
||
|
|
aba77c09a1
|
chore(bp-catalyst-platform): bump 1.4.39 → 1.4.40 + literal :1b62da7 → :074d65c (#1023 store-fallback) (#1024)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
||
|
|
362a377dc3
|
chore(bp-catalyst-platform): bump 1.4.38 → 1.4.39 + literal :69f3be2 → :1b62da7 (#1017 LIVE jobs) (#1020)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
||
|
|
b8ef07def4
|
chore(bp-catalyst-platform): bump 1.4.37 → 1.4.38 + literal :32d4a87 → :69f3be2 (#1014 sidebar redux) (#1015)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
||
|
|
4f3cce668d
|
chore(bp-catalyst-platform): bump 1.4.36 → 1.4.37 + literal :a1b30cc → :32d4a87 (#1012 wizard validators public) (#1013)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
||
|
|
78fe10aa87
|
chore(bp-catalyst-platform): bump 1.4.35 → 1.4.36 + literal :8ec8c01 → :a1b30cc (#1008 public subdomains/check) (#1009)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
||
|
|
b887f95d29
|
chore(bp-catalyst-platform): bump 1.4.34 → 1.4.35 + literal :b45a49f → :8ec8c01 (#1005)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
||
|
|
1b85ab9227
|
chore(bp-catalyst-platform): bump 1.4.33 → 1.4.34 + literal :11dd19e → :b45a49f (#1000 cloud chroot + wizard banner) (#1003)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
||
|
|
b15f08bc1e
|
chore(bp-catalyst-platform): bump 1.4.32 → 1.4.33 + literal :1af1c0d → :11dd19e (#998 chroot fix) (#999)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
||
|
|
2e493fc4f7
|
chore(bp-catalyst-platform): bump 1.4.31 → 1.4.32 + literal :ffe3607 → :1af1c0d (#996 redirect fixes) (#997)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
||
|
|
498a02549a
|
chore(bp-catalyst-platform): bump 1.4.30 → 1.4.31 + literal :019309f → :ffe3607 (#995)
Lands #994's wizard redirect fix on contabo + Sovereigns. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
92f1eb8468
|
chore(bp-catalyst-platform): bump 1.4.29 → 1.4.30 + chart literal :8a1fe04 → :019309f (#993)
Lands the clean post-revert image on Sovereigns:
- :019309f is the catalyst-build output for commit
|
||
|
|
e8fcd66a2b
|
chore(bp-catalyst-platform): bump 1.4.28 → 1.4.29 — pulls in #983 URL contract (#986)
Bumps the chart version + the per-Sovereign HelmRelease pin in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml so all Sovereigns reconciling against the template (otech117 et al.) pick up PR #983's fixes:
- /dashboard /apps /jobs /cloud … render at clean roots; no /console/ prefix and no /provision/<id>/ prefix on Sovereign mode.
- sovereign_self.go store fallback: data flows on clean URLs the moment fireHandover POSTs the deployment record to /api/v1/internal/deployments/import; no waiting for a chart-values overlay roundtrip.
- Sidebar links land on clean roots; no more /provision//cloud.
- Auth handover redirect target → /dashboard (was /console/dashboard).
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
ed8872a15b
|
feat(catalyst-api): mother→child cutover data transfer at handover (#977)
The data half of the mother→child contract that PR #976 set up the URL routing for. At handover the mother POSTs the full deployment record (events, jobs history, HRs, cloud topology, kubeconfig meta) to the child's POST /api/v1/internal/deployments/import. The child persists it locally so its /api/v1/deployments/{id}/* endpoints answer with data byte-for-byte identical to what the operator sees on the mother view at /sovereign/provision/<id>/<page>.
Result: on the child cluster, clean URLs (/dashboard, /apps, /jobs, /cloud) render with REAL data (events, exec logs, job statuses, treemap utilisation) instead of empty arrays.
- New endpoint: POST /api/v1/internal/deployments/import (child). Validates by FQDN match against CATALYST_OTECH_FQDN. Idempotent.
- Mother fireHandover() now posts the record to the child after the JWT mint as a fire-and-forget goroutine. Failure logs loudly per INVIOLABLE-PRINCIPLES #3 but does not block SSE emit.
Bumped: bp-catalyst-platform 1.4.27 → 1.4.28.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6ec7851bc2
|
feat(sovereign-console): kill duplicate /console/* pages, redirect to canonical /provision/$id/* (Iteration 1) (#972)
* feat(sovereign-console): kill duplicate /console/* pages, redirect to canonical /provision/$id/* (Iteration 1)
Founder-reported on otech116/117: the /console/dashboard, /console/apps, /console/jobs, /console/cloud, /console/users, /console/settings pages are STUBS that look completely different from the canonical Sovereign Console operators see at console.openova.io/sovereign/provision/$id/*.
Investigation: 6 duplicate Console*Page React components were shipped in PR #937. They are separate stub implementations of pages that already exist as the canonical Dashboard / AppsPage / JobsPage / CloudPage / UserAccessListPage / SettingsPage components used by the /provision/$deploymentId/* route tree (the same the wizard renders).
Fix (Iteration 1):
- DELETE the 6 duplicate Console*Page components.
- Replace the /console/* router routes with SovereignConsoleRedirect: a tiny component that fetches /api/v1/sovereign/self for the Sovereign's own deployment id, then router-navigates to the canonical /provision/<self-id>/<page>. Same components, same data, a UI byte-for-byte identical to the mothership view.
- Add catalyst-api endpoint GET /api/v1/sovereign/self that returns the deployment id from CATALYST_SELF_DEPLOYMENT_ID env. Mothership (env unset) → 404. Sovereign with stamped id → 200. Sovereign pre-handover → 503 deployment-id-not-yet-stamped.
- Wire env via the existing sovereign-fqdn ConfigMap (B1 PR #912): new key `selfDeploymentId`, sourced from .Values.global.sovereignSelfDeploymentId. Empty until the orchestrator's per-Sovereign overlay writer stamps it.
- Add useResolvedDeploymentId React hook (URL params first, then /sovereign/self fallback); wires Iteration 2 (clean URLs) below.
Iteration 2 (next PR, out of scope here):
- Drop the /sovereign/provision/<id>/ URL prefix on Sovereign by refactoring 6 canonical components to use useResolvedDeploymentId instead of strict useParams. Then /console/dashboard renders the canonical Dashboard at the clean URL with deployment id resolved from /sovereign/self.
Iteration 3 (next PR after, also out of scope):
- Handover history transfer: contabo's catalyst-api at handover POSTs the full deployment record (events, jobs, HRs, cloud topology) to the Sovereign's catalyst-api so /provision/<id>/* on the Sovereign answers with byte-for-byte identical data.
Bumped: bp-catalyst-platform 1.4.26 → 1.4.27.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(sovereign-console): clean URLs: /console/* mounts canonical components directly
Removes the SovereignConsoleRedirect indirection. The 6 canonical operator components (Dashboard, AppsPage, JobsPage, JobDetail, CloudPage, AppDetail, UserAccessListPage, UserAccessEditPage, SettingsPage) now render at clean /console/<page> URLs on Sovereign, NOT under /sovereign/provision/<id>/<page>.
Pages that previously hard-coupled to the URL via useParams({ from: '/provision/$deploymentId/...' }) now use useResolvedDeploymentId() which:
1. reads URL params (when on the legacy /provision/$id/* tree on contabo's mothership wizard)
2. falls back to GET /api/v1/sovereign/self (Sovereign self-discovery)
Refactored: Dashboard, AppsPage, JobsPage, SettingsPage, UserAccessListPage. CloudPage already used strict:false, so no change was needed.
Wires the /console/* router subtree to the canonical components + adds the missing children routes (/jobs/$jobId, /users/new, /users/$name, /app/$componentId) so the canonical UI's deep-links work on the clean URL surface too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
608db53a25
|
fix(cutover 0.1.20): Step-06 pushes YAML edit to local Gitea so patches survive Flux reconcile (#970) (#971)
## Root cause (live on otech116 2026-05-05 14:38)
After the #968 fix shipped (0.1.19), the cutover engine reached Step-7 (87%) successfully; Step-01..07 all completed. Then Step-08 (egress-block-test) caught that 38/38 HelmRepositories had reverted to upstream:
```
external HelmRepositories still pointing at ghcr.io/openova-io: 38
OFFENDER flux-system/bp-cilium=oci://ghcr.io/openova-io
... (37 more)
FAIL — at least one HelmRepository did not pivot
```
But Step-06's job logs say:
```
[helmrepository-patches] OK bp-cilium -> oci://harbor.otech116.omani.works/openova-io
... (37 more OK)
ok=38 skip=0 fail=0
```
So Step-06 thought it succeeded, and it had, momentarily. But then the bootstrap-kit Kustomization (which had successfully pivoted to local Gitea via Step-05) reconciled its YAML from local Gitea, where the YAML still declared `url: oci://ghcr.io/openova-io`. Within ~30s every kubectl patch was undone. The cutover engine then aborted at Step-8 verification.
## Fix
Step-06 now runs in two phases:
1. **Live K8s patches** (existing behaviour): flips spec.url on every HelmRepository immediately. Useful for the cluster between cutover and the next reconcile.
2. **NEW: Push YAML edit to local Gitea**: clones `openova/openova` from the local Gitea over basic-auth, sed-rewrites every `clusters/_template/bootstrap-kit/*.yaml` declaration of `url: oci://ghcr.io/openova-io` → `oci://harbor.<sov-fqdn>/openova-io`, commits with a clear message, pushes back. Subsequent reconciles see local Harbor as the steady-state.
After the push, the script annotates the `flux-system/openova` GitRepository to trigger immediate reconciliation so the new YAML lands without waiting for the polling interval.
## Image change
Step-06 image bumped from `bitnami/kubectl:1.31.4` to `alpine/k8s:1.31.4` because the new phase needs both `kubectl` and `git` in one image (verified live on otech116: both binaries present).
## Acceptance gate
Test case 16 added to cutover-contract.sh. It guards against future regressions that remove the `git clone`, the `git push origin main`, or the `clusters/_template/bootstrap-kit` target dir reference.
## Live verification
Will fire on otech117 (next provision). Expected:
- Step-06 logs `cloning gitea-http.gitea.../openova/openova.git` then `pushed to ...`
- Step-08 verify PASSES (38/38 HelmRepositories pivoted in K8s + Gitea)
- self-sovereign-cutover-status `cutoverComplete: "true"`
- Egress block to ghcr.io safely activates
Co-authored-by: e3mrah <ebaysal@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
3db19b76b1
|
fix(cutover 0.1.19): Step-01 gitea-mirror DNS readiness probe + backoffLimit=3 (#968) (#969)
## Root cause (live on otech115 2026-05-05 14:15)
After PR #959 (0.1.18) unblocked the auto-trigger to actually call /internal/cutover/trigger, the cutover engine fired Step-01 within ~8s of bp-self-sovereign-cutover Helm-install completing. The gitea Pod had only just reached Ready state; cluster-DNS endpoint publication for the headless service `gitea-http` was still in flight. One wget returned `bad address gitea-http.gitea.svc.cluster.local` and exited non-zero.
Catalyst-api's cutover engine stamped Jobs with backoffLimit=0 (cutover.go:584), so a single DNS miss was terminal and aborted all 8 cutover steps. otech115 finished provisioning with cutoverComplete=false and tethered to upstream github.com/ghcr.io.
## Fix (dual-layer)
**Layer A: catalyst-api (cutover.go)**: backoffLimit lifted from 0 to 3. A single transient miss is recoverable (4 attempts over each step's activeDeadlineSeconds) without burning operator attention. Hard failures still surface within budget.
**Layer B: chart Step-01 (01-gitea-mirror-job.yaml)**: explicit nslookup readiness probe at the top of the bash script, before any wget call. 30 attempts × 5s = 150s budget; alpine/git ships nslookup in /usr/bin (verified live on otech115).
Layer B is faster than Layer A (in-script DNS retry vs Pod recreate); Layer A is the safety net for any other transient pre-cluster-stable race we haven't yet enumerated.
## Acceptance gate
Test case 15 added to platform/self-sovereign-cutover/chart/tests/cutover-contract.sh. It guards against future regressions that drop either the gitea_host extraction or the nslookup loop.
## Live verification
Will fire on the next provision (otech116). Expected:
- Step-01 logs `[gitea-mirror] DNS ready for gitea-http.gitea.svc.cluster.local (attempt N)`
- All 8 cutover Jobs reach Complete
- self-sovereign-cutover-status ConfigMap reaches cutoverComplete=true
Co-authored-by: e3mrah <ebaysal@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d1431bed09
|
fix(autoscaler+wizard): wire HCLOUD_CLOUD_INIT, validate SKU/region in catalyst-api (#965)
Closes #921: the bp-cluster-autoscaler-hcloud chart shipped without HCLOUD_CLUSTER_CONFIG / HCLOUD_CLOUD_INIT, so cluster-autoscaler 1.32.x FATALs at startup with "HCLOUD_CLUSTER_CONFIG or HCLOUD_CLOUD_INIT is not specified" on every Sovereign (otech112 evidence). The HelmRelease reports Ready=True (Helm install succeeded) but the Pod CrashLoopBackOffs invisibly behind the false-positive condition.
Closes #916: the wizard let operators dispatch unbuildable topologies (otech109: cpx32 worker in `ash`) because PROVIDER_NODE_SIZES did not encode regional orderability. Hetzner rejected the worker creation 41s into `tofu apply` after Phase-0 had already created the CP + network + LB + firewall.
Chart fix (issue #921):
- Add `clusterAutoscalerHcloud.{clusterConfig,cloudInit}` values to the umbrella chart (base64-encoded per upstream contract).
- Render `hetzner-node-config` Secret unconditionally with both keys so the upstream Deployment's secretKeyRef references resolve cleanly during `helm template` AND in the live cluster regardless of overlay state.
- Wire HCLOUD_CLUSTER_CONFIG + HCLOUD_CLOUD_INIT extraEnvSecrets onto the upstream chart's deployment.
- Tofu Phase 0 base64-encodes the Phase-0 worker cloud-init and stamps it under `flux-system/cloud-credentials.hcloud-cloud-init`; the bootstrap-kit overlay lifts that key via Flux `valuesFrom` into `clusterAutoscalerHcloud.cloudInit`. Autoscaler-spawned workers thus receive the IDENTICAL bootstrap as the Phase-0 worker fleet.
- Bump bp-cluster-autoscaler-hcloud chart 1.0.0 → 1.1.0.
- Chart-test smoke gate (chart/tests/hetzner-node-config.sh) verifies Secret + env var wiring + no-regression of HCLOUD_TOKEN; runs in CI's blueprint-release "Run chart integration tests" step.
Wizard fix (issue #916):
- Add `availableRegions?: string[]` to NodeSize interface; encode cpx32 = ['fsn1','nbg1','hel1'], cpx21/cpx31 = [] (orderable nowhere new) per Hetzner /v1/server_types vs POST /v1/servers gap.
- Add `isSkuAvailableInRegion()` + `suggestAlternativeSkus()` helpers.
- StepProvider filters SKU dropdowns by selected region; auto-swaps current SKU to recommended default when region change drops it out of orderability.
- Mirror the matrix Go-side in sku_availability.go; gate `provisioner.Request.Validate()` with the same predicate so a stale wizard build OR a direct API caller bypassing the UI cannot dispatch otech109's failure mode.
- Two-sided enforcement covers both r.Regions[] (multi-region) and the legacy singular path.
Tests: 13 vitest cases on the wizard side + 38 Go subtests on the API side. Chart smoke renders + helm template gates the env wiring at publish time.
Co-authored-by: hatiyildiz <hati.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
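The wizard-side availability check might look roughly like the sketch below. Only the `availableRegions?: string[]` field, the cpx32/cpx21/cpx31 encodings, and the two helper names come from the commit message; the helper signatures, the `id` field, the `cx-example` SKU, and the rule that an undefined `availableRegions` means "no restriction recorded" are assumptions made for illustration.

```typescript
// Trimmed-down NodeSize; the real interface has more fields. `id` is an assumed field name.
interface NodeSize {
  id: string;
  availableRegions?: string[];
}

const SIZES: NodeSize[] = [
  { id: "cpx32", availableRegions: ["fsn1", "nbg1", "hel1"] },
  { id: "cpx21", availableRegions: [] }, // orderable nowhere new
  { id: "cpx31", availableRegions: [] }, // orderable nowhere new
  { id: "cx-example" }, // hypothetical SKU with no restriction recorded
];

// Assumed semantics: undefined means unrestricted; an empty array means orderable nowhere.
function isSkuAvailableInRegion(size: NodeSize, region: string): boolean {
  return size.availableRegions === undefined || size.availableRegions.includes(region);
}

// Assumed signature: return the ids of every SKU still orderable in the region.
function suggestAlternativeSkus(sizes: NodeSize[], region: string): string[] {
  return sizes.filter((s) => isSkuAvailableInRegion(s, region)).map((s) => s.id);
}

console.log(isSkuAvailableInRegion(SIZES[0], "ash")); // false
console.log(suggestAlternativeSkus(SIZES, "fsn1")); // [ 'cpx32', 'cx-example' ]
```

Expressing the rule as a single predicate is what makes the two-sided enforcement practical: the Go side can mirror the same matrix and reject the same combinations.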
||
|
|
ae5766f2d0
|
fix(bp-catalyst-platform 1.4.26): grant catalyst-api TokenReview RBAC for cutover trigger (#957) (#962)
Chart 0.1.18 fixed the readiness-probe loop on the auto-trigger Job (was 401-looping forever on /sovereign/cutover/status). The trigger now reaches /api/v1/internal/cutover/trigger, but every call returns 502 "token-review-failed" in <10ms because the catalyst-api SA does not have permission to create TokenReviews against the apiserver. PR #947 wired the endpoint but not its RBAC.
The ClusterRole catalyst-api-cutover-driver had every verb the cutover engine needs (configmaps, jobs, events, deployments, daemonsets) EXCEPT authentication.k8s.io/tokenreviews, which the in-cluster trigger endpoint depends on for SA bearer-token validation.
Live evidence on otech113 2026-05-05 12:02:55:
GET /healthz → 200 (probe success; 0.1.18 fix working)
POST /api/v1/internal/cutover/trigger → 502 in 8.879ms
$ kubectl auth can-i create tokenreviews \
--as=system:serviceaccount:catalyst-system:catalyst-api-cutover-driver
no
Fix: add a separate Rule in clusterrole-cutover-driver.yaml for authentication.k8s.io/tokenreviews verbs=[create]. Per feedback_rbac_create_no_resourcenames.md the create verb stays in its own Rule (TokenReview is a virtual sub-resource with no name to scope to anyway).
Bumped:
- products/catalyst/chart/Chart.yaml: 1.4.25 → 1.4.26
- clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml: pin 1.4.26
Closes the #957 follow-up RBAC gap; PR #959 fixed the readiness loop.
Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|