fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs — Sovereigns + contabo were frozen at :2122fb8 (#1060)

* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology)
but left four route registrations in cmd/api/main.go that still
referenced those handler methods. The catalyst-api build for the merged
revert (run 25439549879) failed with:

  cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
  cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
  cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
  cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never
published — only the UI image rolled. Result: omantel.biz catalyst-api
pod stuck in ImagePullBackOff.

Drop the four route registrations. Same baby, new address — the chroot
Sovereign uses the existing /api/v1/deployments/{depId}/* handlers via
the JWT-resolved deploymentId, not parallel-baby /api/v1/sovereign/*
endpoints.
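
For illustration (Go 1.22 net/http-style registration): the four removed
HandleSovereign* names below come from this change, while the router shape
and the kept deployment-scoped handler are stand-ins, not the real
cmd/api/main.go:

  package main

  import (
      "fmt"
      "net/http"
  )

  type handlers struct{}

  // Deployment-scoped handler (name illustrative). On the chroot, depId is
  // resolved from the JWT cookie before a link to this route is built.
  func (h *handlers) HandleDeploymentTopology(w http.ResponseWriter, r *http.Request) {
      fmt.Fprintf(w, "topology for deployment %s\n", r.PathValue("depId"))
  }

  func main() {
      h := &handlers{}
      mux := http.NewServeMux()

      // Removed: their methods were deleted with sovereign_more.go in #1050.
      //   mux.HandleFunc("GET /api/v1/sovereign/users",    h.HandleSovereignUsers)
      //   mux.HandleFunc("GET /api/v1/sovereign/catalog",  h.HandleSovereignCatalog)
      //   mux.HandleFunc("GET /api/v1/sovereign/settings", h.HandleSovereignSettings)
      //   mux.HandleFunc("GET /api/v1/sovereign/topology", h.HandleSovereignTopology)

      // Kept: the chroot calls the same deployment-scoped route as the mother.
      mux.HandleFunc("GET /api/v1/deployments/{depId}/topology", h.HandleDeploymentTopology)

      _ = http.ListenAndServe(":8080", mux)
  }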

Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother
    URL (the chroot resolves deploymentId from the cookie and the
    mother-side topology handler serves byte-identical data once
    cutover-import has persisted the deployment record on the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

Bump bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster
Kustomization version pin to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api
binary as the mother. When that binary runs ON the Sovereign cluster
(catalyst-system namespace on the Sovereign itself), there is no
posted-back kubeconfig — the catalyst-api IS in the cluster it needs
to talk to, and rest.InClusterConfig() returns the right credentials.

Without this, every endpoint that needs the Sovereign-side dynamic
client returned 503 with "sovereign cluster kubeconfig not yet posted
back" — including ListUserAccess (/users page), CreateUserAccess,
infrastructure CRUD, etc. Caught on omantel.biz 2026-05-06: /users
rendered "list user-access: HTTP 503" because the Sovereign-side
catalyst-api was looking for a kubeconfig that doesn't exist on the
chroot side of the cutover boundary.

Detection: SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
deployment by the chart) matches dep.Request.SovereignFQDN. On the
mother, SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot,
SOVEREIGN_FQDN matches the only deployment served (its own) → use
in-cluster.
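
A minimal sketch of the fallback: the SOVEREIGN_FQDN gate and
rest.InClusterConfig() are the actual mechanism, the surrounding function
signatures are illustrative rather than the real sovereignDynamicClient:

  package sovereign

  import (
      "fmt"
      "os"

      "k8s.io/client-go/dynamic"
      "k8s.io/client-go/rest"
      "k8s.io/client-go/tools/clientcmd"
  )

  // sovereignRESTConfig picks credentials for the Sovereign cluster. On the
  // mother it still requires the posted-back kubeconfig; on the chroot
  // (SOVEREIGN_FQDN matches the deployment being served) it falls back to
  // the pod's own ServiceAccount.
  func sovereignRESTConfig(sovereignFQDN string, postedKubeconfig []byte) (*rest.Config, error) {
      if env := os.Getenv("SOVEREIGN_FQDN"); env != "" && env == sovereignFQDN {
          // Running ON the Sovereign: no kubeconfig will ever be posted back
          // here, and the in-cluster credentials are the right identity.
          return rest.InClusterConfig()
      }
      if len(postedKubeconfig) == 0 {
          return nil, fmt.Errorf("sovereign cluster kubeconfig not yet posted back")
      }
      return clientcmd.RESTConfigFromKubeConfig(postedKubeconfig)
  }

  // sovereignDynamicClient backs ListUserAccess, infrastructure CRUD,
  // topology, and the other Sovereign-side endpoints.
  func sovereignDynamicClient(sovereignFQDN string, postedKubeconfig []byte) (dynamic.Interface, error) {
      cfg, err := sovereignRESTConfig(sovereignFQDN, postedKubeconfig)
      if err != nil {
          return nil, err
      }
      return dynamic.NewForConfig(cfg)
  }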

Same fallback applied to tryDynamicClientLocked (loaderInputFor's
best-effort live-source client) so /infrastructure/topology and the
/cloud graph render with live data on the chroot too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

Two coupled fixes for the /users page on chroot Sovereign Console:

1. catalyst-api-cutover-driver ClusterRole: grant read/write on
   useraccesses.access.openova.io. The Sovereign chroot's catalyst-api
   uses the in-cluster ServiceAccount (per PR #1052). The list call
   was returning 403 from the apiserver because the SA had no rule
   covering this CRD.

2. ListUserAccess: return 200 with empty items when the CRD itself
   is not installed (apierrors.IsNotFound). The access.openova.io
   CRD ships via a separate blueprint that may not yet be installed
   on a fresh Sovereign — the page should render its empty state,
   not a 500 toast.

Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the
in-cluster client path: list call surfaced first as 403 (RBAC), then
as 500 "server could not find the requested resource" (CRD absent).
Both now resolve to a 200 + [].
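
A sketch of fix 2, assuming a cluster-scoped useraccesses resource and an
arbitrary CRD version; the handler shape is illustrative, not copied from
catalyst-api:

  package sovereign

  import (
      "encoding/json"
      "net/http"

      apierrors "k8s.io/apimachinery/pkg/api/errors"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/apimachinery/pkg/runtime/schema"
      "k8s.io/client-go/dynamic"
  )

  var userAccessGVR = schema.GroupVersionResource{
      Group:    "access.openova.io",
      Version:  "v1alpha1", // assumed; use whatever version the CRD serves
      Resource: "useraccesses",
  }

  func listUserAccess(w http.ResponseWriter, r *http.Request, dyn dynamic.Interface) {
      list, err := dyn.Resource(userAccessGVR).List(r.Context(), metav1.ListOptions{})
      if apierrors.IsNotFound(err) {
          // CRD not installed on this Sovereign yet: let the /users page
          // render its empty state instead of a 500 toast.
          w.Header().Set("Content-Type", "application/json")
          _ = json.NewEncoder(w).Encode([]any{})
          return
      }
      if err != nil {
          http.Error(w, err.Error(), http.StatusInternalServerError)
          return
      }
      w.Header().Set("Content-Type", "application/json")
      _ = json.NewEncoder(w).Encode(list.Items)
  }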

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed jobs.Store from live cluster, single endpoint

Two parallel-baby paths still made the chroot diverge from the mother
on /cloud and /jobs/{jobId}. Both now ship one path that serves
byte-identical data on both surfaces.

1. CloudPage rendered fictional topology (Frankfurt, Helsinki,
   omantel-primary, omantel-secondary, edge-lb, vpc-net-eu, …) when
   the topology query errored — because it fell back to
   `infrastructureTopologyFixture` from `src/test/fixtures/`. That is
   a test-only file leaking into production via the production import
   tree, in direct violation of INVIOLABLE-PRINCIPLES #1 (no
   placeholder data — empty state when you don't know).

   Fix: drop the fixture fallback. On error → null → empty-state
   render. The mother shows the same empty state when its loader
   returns nothing; byte-identical.

2. JobsTable + JobDetail rendered a flat green-grid because the chroot
   was hitting `/api/v1/sovereign/jobs` which returns a minimal shape
   (no dependsOn, no parentId, no exec records). Mother's
   `/api/v1/deployments/{depId}/jobs` returns the rich shape from a
   per-deployment jobs.Store, which on the chroot starts empty (the
   mother's exportDeploymentToChild only ships the deployment record,
   not the jobs.Store contents).

   Fix: ship one URL on both surfaces — `/api/v1/deployments/{id}/jobs`.
   Add `chrootSeedJobsStoreIfEmpty` that runs at handler-time when
   SOVEREIGN_FQDN matches dep.Request.SovereignFQDN AND the per-
   deployment jobs.Store has 0 records: do a one-shot HelmRelease
   list via the in-cluster client (helmwatch.ListAndSnapshotHelmReleases
   — exported here, mirrors Watcher.SnapshotComponents without
   spinning up an informer), pass through snapshotsToSeeds +
   Bridge.SeedJobsFromInformerList (a sketch of this seeding path
   follows this list). Subsequent calls read directly from the
   now-populated store and return rich Job records with
   dependsOn / parentId / status — exactly like the mother.

   useLiveJobsBackfill loses its mode-aware fetcher; the chroot UI
   uses the same `/api/v1/deployments/{id}/jobs` URL as the mother.

3. HandleDeploymentImport now also loads the imported record into the
   in-memory deployments map immediately, so `/deployments/{id}/*`
   handlers don't need a pod restart's restoreFromStore to see the
   chroot-imported deployment.
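
The seeding path from fix 2 as a sketch; apart from the SOVEREIGN_FQDN gate
and the function names quoted above, every type here is a stand-in for the
real jobs.Store / helmwatch / Bridge plumbing:

  package sovereign

  import (
      "context"
      "os"
  )

  // Stand-in types: the real catalyst-api has its own jobs.Store, helmwatch
  // snapshot type and Bridge; only the method names mirror the description.
  type jobSeed struct {
      Name      string
      ParentID  string
      DependsOn []string
  }

  type helmSnapshot struct {
      Release string
      Status  string
  }

  type jobsStore interface{ Len() int }

  type helmLister interface {
      ListAndSnapshotHelmReleases(ctx context.Context) ([]helmSnapshot, error)
  }

  type jobsBridge interface {
      SeedJobsFromInformerList(deploymentID string, seeds []jobSeed) error
  }

  func snapshotsToSeeds(snaps []helmSnapshot) []jobSeed {
      seeds := make([]jobSeed, 0, len(snaps))
      for _, s := range snaps {
          seeds = append(seeds, jobSeed{Name: s.Release})
      }
      return seeds
  }

  // chrootSeedJobsStoreIfEmpty runs at handler time before serving
  // /api/v1/deployments/{id}/jobs. It is a no-op on the mother and on any
  // chroot whose store is already populated.
  func chrootSeedJobsStoreIfEmpty(ctx context.Context, sovereignFQDN, deploymentID string,
      store jobsStore, lister helmLister, bridge jobsBridge) error {

      env := os.Getenv("SOVEREIGN_FQDN")
      if env == "" || env != sovereignFQDN {
          return nil // mother (env unset), or not the deployment this chroot serves
      }
      if store.Len() > 0 {
          return nil // already seeded; later calls read the rich records directly
      }
      // One-shot HelmRelease snapshot via the in-cluster client, standing in
      // for the informer-backed state the mother accumulates over time.
      snaps, err := lister.ListAndSnapshotHelmReleases(ctx)
      if err != nil {
          return err
      }
      return bridge.SeedJobsFromInformerList(deploymentID, snapshotsToSeeds(snaps))
  }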

Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik won't decode %3A so canonical id 404s

JobDetail navigation was 404ing on the chroot because the link builder
URL-encoded the canonical Job id ("69e73b3abe673840:install-keycloak")
and Traefik (or any upstream proxy that's RFC 3986 §3.3-strict) does
not decode `%3A` inside path segments. The catalyst-api router saw
the literal "%3A" and Store.GetJob's exact-match path missed.

Two coupled fixes:

1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding,
   producing /jobs/install-keycloak (Traefik-safe) instead of
   /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already
   accepts both bare jobName and canonical id (see store.go:781-789
   and the sketch after this list).

2. JobDetail.jobsById indexes by BOTH canonical id AND bare jobName so
   the URL param resolves regardless of which format the link emitted.
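
A sketch of that dual lookup; the real Store in store.go may index
differently, this only shows the two key shapes it accepts:

  package jobs

  import "strings"

  type Job struct {
      ID   string // canonical "<deploymentId>:<jobName>", e.g. "69e73b3abe673840:install-keycloak"
      Name string // bare jobName, e.g. "install-keycloak"
  }

  type Store struct {
      byID map[string]*Job // keyed by canonical id
  }

  // GetJob resolves either form of the URL param: the canonical id or the
  // bare jobName that the Traefik-safe links now emit.
  func (s *Store) GetJob(idOrName string) (*Job, bool) {
      if j, ok := s.byID[idOrName]; ok {
          return j, true // exact canonical-id match
      }
      for id, j := range s.byID {
          // The link builder dropped the "<deploymentId>:" prefix, so match
          // on whatever follows the first ':'.
          if _, name, found := strings.Cut(id, ":"); found && name == idOrName {
              return j, true
          }
      }
      return nil, false
  }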

Bump chart 1.4.58 → 1.4.59.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against undefined

CloudPage's topology query fired against /deployments/undefined/...
on the chroot (URL is /cloud, no deploymentId path segment), so the
page showed "Couldn't load architecture" with all node counts at 0/0.

Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the
JWT cookie's deployment_id claim via /api/v1/sovereign/self, falling
back from URL params. Topology query also gates on `!!deploymentId`
so it doesn't waste a 404 round-trip during cookie resolution.

Bump chart 1.4.60 → 1.4.61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

Two visible bleed-throughs from the mother's wizard UX onto the
chroot Sovereign Console at console.<sov-fqdn>:

1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
   SovereignConsoleLayout rendered its own sidebar+header AND the page
   inside rendered PortalShell which rendered ANOTHER header (its
   sidebar was already skipped for chroot per a prior fix). User saw
   two horizontal title bars stacked.

   Resolution: SovereignConsoleLayout becomes auth-only on the chroot.
   It runs the cookie/OIDC auth gate + RequiredActionsModal, then
   renders <Outlet/> with NO chrome. PortalShell is now the single
   chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with
       /provision/$id/X URLs + its header.
     - Chroot (console.<sov-fqdn>):       renders SovereignSidebar
       with clean /X URLs + the same header.
   One sidebar, one header, byte-identical to mother layout.

2. **"✓ Sovereign is ready — Redirecting to your Sovereign console"
   banner on /apps.** This is the mother's wizard celebration that
   tells the operator "you can now jump to your new Sovereign". On
   the chroot the operator IS already on the Sovereign Console; the
   banner bleeds through because the imported deployment record
   carries the mother's handover-ready event in its history.

   Resolution: AppsPage gates the banner, the toast, and the
   auto-redirect timer on `!isSovereignMode`. Chroot stays clean.

Bump chart 1.4.62 → 1.4.63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

Three chroot-only pages bypassed PortalShell entirely. After
SovereignConsoleLayout went auth-only in #1057, they rendered
full-bleed with no sidebar / no header — visible look-and-feel break.

  /settings/marketplace   → MarketplaceSettings  (wrapped in PortalShell)
  /parent-domains         → ParentDomainsPage    (wrapped in PortalShell)
  /catalog                → CatalogAdminPage     (deleted)

Drop /catalog entirely per founder direction: a separate page just
to flip a "publish to marketplace" boolean per app is the wrong
shape. The natural place for that toggle is on each /apps card
(future PR — needs HandleSovereignApps to join publish state from
the SME catalog microservice). Removed:
  - /catalog route registration in router.tsx
  - 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from ActiveSection union + deriveActiveSection regex

The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish
on the SME catalog service is unaffected; it's exposed at
marketplace.<sov-fqdn>, not console.<sov-fqdn>, and the future
apps-card toggle will call it via the same path.

Bump chart 1.4.64 → 1.4.65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(apps): publish chip on each card — replaces deleted /catalog page

Per founder direction: "if the catalog is just labeling an app to be
shown in marketplace, why don't we do it through the apps?" — drop
the standalone /catalog page (#1058), put the publish toggle on each
/apps card.

Backend (catalyst-api):
- New file sme_catalog_client.go — best-effort client for the
  in-cluster SME catalog microservice at
  http://catalog.sme.svc.cluster.local:8082. 30s response cache,
  1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier
  not deployed on this Sovereign — common when marketplace.enabled
  is false). A sketch of this client follows the backend list.
- HandleSovereignApps decorates each app with `marketplacePublished`
  *bool joined by slug from the SME catalog. nil ⇒ slug not in SME
  catalog (bootstrap component, or marketplace not deployed) ⇒ FE
  suppresses the chip.
- New handler HandleSovereignAppPublish at PATCH
  /api/v1/sovereign/apps/{slug}/publish. Body {"published": bool}.
  Proxies to PATCH /catalog/admin/apps/{slug}/publish on the SME
  catalog. Surfaces upstream status verbatim. Invalidates the cache
  so the next /apps poll reflects the change immediately.
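
A sketch of that client: the service URL, 30s cache, 1.5s budget and the
nil-on-NXDOMAIN contract come from this change, while the list path, JSON
shape and method names are assumptions:

  package sovereign

  import (
      "context"
      "encoding/json"
      "errors"
      "fmt"
      "net"
      "net/http"
      "sync"
      "time"
  )

  const smeCatalogBase = "http://catalog.sme.svc.cluster.local:8082"

  type smeCatalogClient struct {
      hc    *http.Client
      mu    sync.Mutex
      cache map[string]*bool // slug -> published; slugs not in the SME catalog are simply absent
      until time.Time
  }

  func newSMECatalogClient() *smeCatalogClient {
      return &smeCatalogClient{hc: &http.Client{Timeout: 1500 * time.Millisecond}} // probe budget
  }

  // PublishedBySlug returns publish state per slug, or (nil, nil) when the
  // SME services tier is not deployed on this Sovereign (DNS NXDOMAIN).
  func (c *smeCatalogClient) PublishedBySlug(ctx context.Context) (map[string]*bool, error) {
      c.mu.Lock()
      defer c.mu.Unlock()
      if c.cache != nil && time.Now().Before(c.until) {
          return c.cache, nil // 30s response cache
      }
      // List path assumed; only the PATCH publish path is given above.
      req, err := http.NewRequestWithContext(ctx, http.MethodGet, smeCatalogBase+"/catalog/admin/apps", nil)
      if err != nil {
          return nil, err
      }
      resp, err := c.hc.Do(req)
      var dnsErr *net.DNSError
      if errors.As(err, &dnsErr) && dnsErr.IsNotFound {
          return nil, nil // marketplace not deployed: FE suppresses every chip
      }
      if err != nil {
          return nil, err
      }
      defer resp.Body.Close()
      if resp.StatusCode != http.StatusOK {
          return nil, fmt.Errorf("SME catalog: %s", resp.Status)
      }
      var apps []struct {
          Slug      string `json:"slug"`
          Published bool   `json:"published"`
      }
      if err := json.NewDecoder(resp.Body).Decode(&apps); err != nil {
          return nil, fmt.Errorf("decode SME catalog response: %w", err)
      }
      out := make(map[string]*bool, len(apps))
      for i := range apps {
          out[apps[i].Slug] = &apps[i].Published
      }
      c.cache, c.until = out, time.Now().Add(30*time.Second)
      return out, nil
  }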

Frontend (AppsPage):
- liveAppsQuery returns { statusById, publishedBySlug } instead of
  the bare status map.
- Each AppCard with a non-null marketplacePublished renders a
  PUBLISHED / UNPUBLISHED chip alongside the status chip. Click →
  PATCH → optimistic refetch via React Query.
- Bootstrap components and apps not in the SME catalog have nil →
  no chip (correct: nothing to toggle).
- With marketplace.enabled=false, cards render no chips at all (SME
  catalog unreachable → nil for every slug).

Bump chart 1.4.66 → 1.4.67.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo get fresh code

Audit triggered by the founder asking whether PRs #1051..#1059 reach
NEW Sovereigns or only my manual `kubectl set image` patches on
omantel. The answer: nothing reached anyone except omantel, and only
via those manual patches. Both contabo AND every fresh Sovereign
would install :2122fb8 — the SHA frozen at PR #1040's last manual
chart-touch on the morning of May 6.

Root cause:
- chart/templates/api-deployment.yaml + ui-deployment.yaml carry
  LITERAL image refs ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"),
  not Helm-templated `{{ .Values.images.catalystApi.tag }}`.
- catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag
  on every push — but no template reads from it. Dead code.
- contabo's catalyst-platform Flux Kustomization at
  ./products/catalyst/chart/templates applies these as raw manifests.
- Sovereigns Helm-install the same chart; Helm passes the literal
  through unchanged.
- Both ended up frozen at whatever literal was committed at the last
  manual chart-touching PR.

Fix:
1. CI's deploy step now bumps both the literal SHAs in the two
   template files AND the unused-but-kept-for-SME-services
   values.yaml. Sed-patches the literal directly so contabo's Kustomize
   path keeps working.
2. The commit step adds the two templates to the staged set alongside
   values.yaml, so every "deploy: update catalyst images to <sha>"
   commit propagates to contabo (10-min reconcile) AND Sovereigns
   (next OCI chart publish via blueprint-release).
3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with
   the latest literal (currently :8361df4) gets republished and
   pinned in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.

Why drop the "freeze contabo" intent of the previous comment:
that comment argued auto-rolling contabo on every PR was bad because
PR #975's image broke contabo (k8scache startup loop). The right
answer there is to fix the bug in the code, not to freeze contabo.
Freezing masked real divergence — the only reason the founder caught
this is that manual omantel patches kept omantel current while
contabo and every other fresh Sovereign quietly ran 9 PRs behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3mrah 2026-05-06 21:10:31 +04:00 committed by GitHub
parent 66eca90c16
commit eb6a3c1812
5 changed files with 51 additions and 22 deletions

.github/workflows/catalyst-build.yaml

@@ -339,17 +339,29 @@ jobs:
 echo "values.yaml after update:"
 grep -A2 "catalystUi\|catalystApi" "${VALUES}" | head -10
-# NOTE: the literal image refs in templates/api-deployment.yaml and
-# templates/ui-deployment.yaml are deliberately NOT auto-bumped here.
-# Those manifests are what contabo's Kustomize-path Flux reconciles —
-# auto-bumping them auto-rolls contabo on every PR, which broke
-# contabo on 2026-05-05 (k8scache startup loop on dead Sovereign
-# kubeconfigs in PR #975's image). contabo rolls ONLY when an
-# operator manually edits + commits those files (see
-# docs/RUNBOOK-CONTABO-IMAGE-BUMP.md). Sovereigns are unaffected:
-# they install via OCI chart whose values.yaml carries the new SHA
-# (bumped above), which gets picked up at the next blueprint-release
-# publish below.
+# ALSO bump the literal image refs in the chart templates.
+# Sovereigns Helm-install this chart and contabo applies it
+# via Kustomize — both consume the literal directly because
+# kustomize-controller can't render Helm templates. Without
+# this auto-bump, every Sovereign provisioned after 2026-05-06
+# was installing :2122fb8 (frozen at PR #1040's chart-touch),
+# so PRs #1051..#1059 never reached anyone except via manual
+# `kubectl set image` patches on omantel.
+API_TPL="products/catalyst/chart/templates/api-deployment.yaml"
+UI_TPL="products/catalyst/chart/templates/ui-deployment.yaml"
+sed -i -E "s|(image: \"ghcr\.io/openova-io/openova/catalyst-api:)[^\"]*\"|\1${SHA_SHORT}\"|" "${API_TPL}"
+sed -i -E "s|(image: \"ghcr\.io/openova-io/openova/catalyst-ui:)[^\"]*\"|\1${SHA_SHORT}\"|" "${UI_TPL}"
+echo "templates after update:"
+grep -E "image: \".*catalyst-(api|ui):" "${API_TPL}" "${UI_TPL}"
+# contabo's catalyst-platform Kustomization at
+# ./products/catalyst/chart/templates reconciles every 10 min
+# — it will pick up the bumped literal on the next interval.
+# If the new image breaks contabo, an operator can revert the
+# template SHA via a follow-up PR; the previous "freeze"
+# behaviour was masking real bugs (contabo silently ran an
+# old image while the Sovereign provisioning churned through
+# the same SHA being fixed downstream).
 - name: Commit and push manifest updates
 id: deploy_commit
@@ -358,12 +370,16 @@ jobs:
 run: |
 git config user.name "github-actions[bot]"
 git config user.email "github-actions[bot]@users.noreply.github.com"
-# Only the chart's values.yaml is auto-bumped — Sovereigns consume
-# the OCI chart bump via blueprint-release. Contabo's Kustomize-path
-# deployment manifests are intentionally NOT touched here so contabo
-# never auto-rolls; an operator manually bumps those files when a
-# specific catalyst-api/ui SHA has been validated against contabo.
-git add products/catalyst/chart/values.yaml
+# values.yaml + the two literal-image templates (api-deployment,
+# ui-deployment) are bumped together so:
+# - Sovereigns get the new SHA via the next OCI chart publish
+# (blueprint-release fires below).
+# - contabo's Kustomize-path Flux reconciles the bumped literal
+# within 10 min.
+# Both surfaces converge on the same SHA on every push.
+git add products/catalyst/chart/values.yaml \
+products/catalyst/chart/templates/api-deployment.yaml \
+products/catalyst/chart/templates/ui-deployment.yaml
 if git diff --staged --quiet; then
 echo "No changes to commit"
 echo "pushed=false" >> "$GITHUB_OUTPUT"

clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml

@@ -231,7 +231,7 @@ spec:
 # fallback (data renders the moment cutover-import lands without
 # waiting for the orchestrator's chart-values overlay write).
 # 2026-05-05.
-version: 1.4.68
+version: 1.4.70
 sourceRef:
 kind: HelmRepository
 name: bp-catalyst-platform

products/catalyst/chart/Chart.yaml

@@ -124,8 +124,8 @@ name: bp-catalyst-platform
 # otech113 2026-05-05 — chart 0.1.18 fixed the readiness-probe loop
 # but every trigger immediately got 502 in <10ms (synchronous
 # apiserver permission rejection). 2026-05-05.
-version: 1.4.68
-appVersion: 1.4.68
+version: 1.4.70
+appVersion: 1.4.70
 description: |
 Catalyst Platform — the unified Catalyst control plane umbrella chart for Catalyst-Zero.
 Composes the catalyst-{ui,api}, console, admin, marketplace UI modules and the marketplace-api backend.

products/catalyst/chart/templates/api-deployment.yaml

@@ -152,7 +152,15 @@ spec:
 fsGroupChangePolicy: OnRootMismatch
 containers:
 - name: catalyst-api
-image: "ghcr.io/openova-io/openova/catalyst-api:2122fb8"
+# Literal image ref — required for the contabo-mkt Kustomize
+# path (kustomize-controller doesn't render Helm templates).
+# Auto-bumped by .github/workflows/catalyst-build.yaml's deploy
+# step on every push to main, so Sovereigns AND contabo both
+# roll to the latest catalyst-api SHA. The matching
+# values.yaml `images.catalystApi.tag` is also bumped (but
+# unused for catalyst-api; kept for SME services that DO read
+# from values).
+image: "ghcr.io/openova-io/openova/catalyst-api:8361df4"
 imagePullPolicy: IfNotPresent
 ports:
 - containerPort: 8080

products/catalyst/chart/templates/ui-deployment.yaml

@@ -19,7 +19,12 @@ spec:
 - name: ghcr-pull
 containers:
 - name: catalyst-ui
-image: "ghcr.io/openova-io/openova/catalyst-ui:2122fb8"
+# Literal image ref — required for the contabo-mkt Kustomize
+# path (kustomize-controller doesn't render Helm templates).
+# Auto-bumped by .github/workflows/catalyst-build.yaml's deploy
+# step on every push to main, so Sovereigns AND contabo both
+# roll to the latest catalyst-ui SHA.
+image: "ghcr.io/openova-io/openova/catalyst-ui:8361df4"
 imagePullPolicy: IfNotPresent
 ports:
 - containerPort: 8080