fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs — Sovereigns + contabo were frozen at :2122fb8 (#1060)
* fix(catalyst-api): rip out dangling sovereign_* route registrations + chart 1.4.56

  PR #1050 deleted sovereign_more.go (which defined HandleSovereignUsers,
  HandleSovereignCatalog, HandleSovereignSettings, HandleSovereignTopology) but left
  four route registrations in cmd/api/main.go that still referenced those handler
  methods. The catalyst-api build for the merged revert (run 25439549879) failed with:

      cmd/api/main.go:690:39: h.HandleSovereignUsers undefined
      cmd/api/main.go:691:41: h.HandleSovereignCatalog undefined
      cmd/api/main.go:692:42: h.HandleSovereignSettings undefined
      cmd/api/main.go:693:42: h.HandleSovereignTopology undefined

  That's why ghcr.io/openova-io/openova/catalyst-api:fdd3354 was never published — only
  the UI image rolled. Result: the omantel.biz catalyst-api pod was stuck in
  ImagePullBackOff.

  Drop the four route registrations. Same baby, new address — the chroot Sovereign uses
  the existing /api/v1/deployments/{depId}/* handlers via the JWT-resolved deploymentId,
  not parallel-baby /api/v1/sovereign/* endpoints.

  Also revert two more parallel-baby fragments still on main:
  - getHierarchicalInfrastructure mode-aware fetcher → single mother URL (the chroot
    resolves deploymentId from the cookie, and the mother-side topology handler serves
    byte-identical data once cutover-import has persisted the deployment record in the
    Sovereign's local store)
  - CatalogAdminPage.fetchApps mode-aware → /catalog/apps everywhere

  Bump the bp-catalyst-platform chart 1.4.55 → 1.4.56 and the cluster Kustomization
  version pin to match.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sovereignDynamicClient): in-cluster fallback when running ON the Sovereign

  The chroot Sovereign Console at console.<sov-fqdn> is the SAME catalyst-api binary as
  the mother.
  When that binary runs ON the Sovereign cluster (catalyst-system namespace on the
  Sovereign itself), there is no posted-back kubeconfig — the catalyst-api IS in the
  cluster it needs to talk to, and rest.InClusterConfig() returns the right credentials.

  Without this, every endpoint that needs the Sovereign-side dynamic client returned 503
  with "sovereign cluster kubeconfig not yet posted back" — including ListUserAccess
  (the /users page), CreateUserAccess, infrastructure CRUD, etc. Caught on omantel.biz
  2026-05-06: /users rendered "list user-access: HTTP 503" because the Sovereign-side
  catalyst-api was looking for a kubeconfig that doesn't exist on the chroot side of the
  cutover boundary.

  Detection: the SOVEREIGN_FQDN env (set on every Sovereign-side catalyst-api
  deployment by the chart) matches dep.Request.SovereignFQDN. On the mother,
  SOVEREIGN_FQDN is unset → unchanged behavior. On the chroot, SOVEREIGN_FQDN matches
  the only deployment served (its own) → use in-cluster.

  The same fallback is applied to tryDynamicClientLocked (loaderInputFor's best-effort
  live-source client) so /infrastructure/topology and the /cloud graph render with live
  data on the chroot too.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(user-access): empty list when CRD absent + RBAC for chroot

  Two coupled fixes for the /users page on the chroot Sovereign Console:

  1. catalyst-api-cutover-driver ClusterRole: grant read/write on
     useraccesses.access.openova.io. The Sovereign chroot's catalyst-api uses the
     in-cluster ServiceAccount (per PR #1052). The list call was returning 403 from the
     apiserver because the SA had no rule covering this CRD.

  2. ListUserAccess: return 200 with empty items when the CRD itself is not installed
     (apierrors.IsNotFound). The access.openova.io CRD ships via a separate blueprint
     that may not yet be installed on a fresh Sovereign — the page should render its
     empty state, not a 500 toast.
  Caught live on omantel.biz 2026-05-06 after PR #1052 unblocked the in-cluster client
  path: the list call surfaced first as 403 (RBAC), then as 500 "server could not find
  the requested resource" (CRD absent). Both now resolve to 200 + [].

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): byte-identical /jobs + /cloud — kill fixture fallback, lazy-seed
  jobs.Store from live cluster, single endpoint

  Two parallel-baby paths still made the chroot diverge from the mother on /cloud and
  /jobs/{jobId}. Both now ship one path that serves byte-identical data on both
  surfaces.

  1. CloudPage rendered fictional topology (Frankfurt, Helsinki, omantel-primary,
     omantel-secondary, edge-lb, vpc-net-eu, …) when the topology query errored,
     because it fell back to `infrastructureTopologyFixture` from `src/test/fixtures/`.
     That is a test-only file leaking into production via the production import tree,
     in direct violation of INVIOLABLE-PRINCIPLES #1 (no placeholder data — empty
     state when you don't know). Fix: drop the fixture fallback. On error → null →
     empty-state render. The mother shows the same empty state when its loader returns
     nothing; byte-identical.

  2. JobsTable + JobDetail rendered a flat green grid because the chroot was hitting
     `/api/v1/sovereign/jobs`, which returns a minimal shape (no dependsOn, no
     parentId, no exec records). The mother's `/api/v1/deployments/{depId}/jobs`
     returns the rich shape from a per-deployment jobs.Store, which on the chroot
     starts empty (the mother's exportDeploymentToChild only ships the deployment
     record, not the jobs.Store contents). Fix: ship one URL on both surfaces —
     `/api/v1/deployments/{id}/jobs`.
     Add `chrootSeedJobsStoreIfEmpty`, which runs at handler-time when SOVEREIGN_FQDN
     matches dep.Request.SovereignFQDN AND the per-deployment jobs.Store has 0 records:
     do a one-shot HelmRelease list via the in-cluster client
     (helmwatch.ListAndSnapshotHelmReleases — exported here, mirrors
     Watcher.SnapshotComponents without spinning up an informer), then pass the result
     through snapshotsToSeeds + Bridge.SeedJobsFromInformerList. Subsequent calls read
     directly from the now-populated store and return rich Job records with dependsOn /
     parentId / status — exactly like the mother. useLiveJobsBackfill loses its
     mode-aware fetcher; the chroot UI uses the same `/api/v1/deployments/{id}/jobs`
     URL as the mother.

  3. HandleDeploymentImport now also loads the imported record into the in-memory
     deployments map immediately, so `/deployments/{id}/*` handlers don't need a pod
     restart's restoreFromStore to see the chroot-imported deployment.

  Bump bp-catalyst-platform 1.4.56 → 1.4.57 (chart + Kustomization).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(jobdetail): bare-jobName URL — Traefik strips %3A so canonical id 404s

  JobDetail navigation was 404ing on the chroot because the link builder URL-encoded
  the canonical Job id ("69e73b3abe673840:install-keycloak") and Traefik (or any
  upstream proxy that is RFC 3986 §3.3-strict) does not decode `%3A` inside path
  segments. The catalyst-api router saw the literal "%3A" and Store.GetJob's
  exact-match path missed.

  Two coupled fixes:

  1. useJobLinkBuilder strips the "<deploymentId>:" prefix before encoding, producing
     /jobs/install-keycloak (Traefik-safe) instead of
     /jobs/69e73b3abe673840%3Ainstall-keycloak. Store.GetJob already accepts both the
     bare jobName and the canonical id (see store.go:781-789).

  2. JobDetail.jobsById indexes by BOTH the canonical id AND the bare jobName, so the
     URL param resolves regardless of which format the link emitted.

  Bump chart 1.4.58 → 1.4.59.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): resolve deploymentId from cookie on chroot — was firing topology against
  undefined

  CloudPage's topology query fired against /deployments/undefined/... on the chroot
  (the URL is /cloud, with no deploymentId path segment), so the page showed "Couldn't
  load architecture" with all node counts at 0/0.

  Fix: same pattern as JobDetail — useResolvedDeploymentId() reads the JWT cookie's
  deployment_id claim via /api/v1/sovereign/self, falling back from URL params. The
  topology query also gates on `!!deploymentId` so it doesn't waste a 404 round-trip
  during cookie resolution.

  Bump chart 1.4.60 → 1.4.61.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): single chrome — no frame in frame, no mother handover banner

  Two visible bleed-throughs from the mother's wizard UX onto the chroot Sovereign
  Console at console.<sov-fqdn>:

  1. **Two stacked headers + sidebar inside sidebar** ("frame in frame").
     SovereignConsoleLayout rendered its own sidebar + header, AND the page inside
     rendered PortalShell, which rendered ANOTHER header (its sidebar was already
     skipped for chroot per a prior fix). The user saw two horizontal title bars
     stacked.

     Resolution: SovereignConsoleLayout becomes auth-only on the chroot. It runs the
     cookie/OIDC auth gate + RequiredActionsModal, then renders <Outlet/> with NO
     chrome. PortalShell is now the single chrome owner on both surfaces:
     - Mother (/sovereign/provision/$id): renders Sidebar with /provision/$id/X URLs +
       its header.
     - Chroot (console.<sov-fqdn>): renders SovereignSidebar with clean /X URLs + the
       same header.
     One sidebar, one header, byte-identical to the mother layout.

  2. **"✓ Sovereign is ready — Redirecting to your Sovereign console" banner on
     /apps.** This is the mother's wizard celebration that tells the operator "you can
     now jump to your new Sovereign".
     On the chroot the operator IS already on the Sovereign Console; the banner bleeds
     through because the imported deployment record carries the mother's handover-ready
     event in its history.

     Resolution: AppsPage gates the banner, the toast, and the auto-redirect timer on
     `!isSovereignMode`. The chroot stays clean.

  Bump chart 1.4.62 → 1.4.63.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): wrap chroot-only pages in PortalShell + drop /catalog page

  Three chroot-only pages bypassed PortalShell entirely. After SovereignConsoleLayout
  went auth-only in #1057, they rendered full-bleed with no sidebar / no header — a
  visible look-and-feel break.

  - /settings/marketplace → MarketplaceSettings (wrapped in PortalShell)
  - /parent-domains → ParentDomainsPage (wrapped in PortalShell)
  - /catalog → CatalogAdminPage (deleted)

  Drop /catalog entirely per founder direction: a separate page just to flip a
  "publish to marketplace" boolean per app is the wrong shape. The natural place for
  that toggle is on each /apps card (future PR — needs HandleSovereignApps to join
  publish state from the SME catalog microservice).

  Removed:
  - the /catalog route registration in router.tsx
  - the 'Catalog' entry in SovereignSidebar's FLAT_NAV
  - CatalogAdminPage.tsx (525 lines)
  - 'catalog' from the ActiveSection union + the deriveActiveSection regex

  The publish-state PATCH endpoint at /catalog/admin/apps/{slug}/publish on the SME
  catalog service is unaffected; it's exposed at marketplace.<sov-fqdn>, not
  console.<sov-fqdn>, and the future apps-card toggle will call it via the same path.

  Bump chart 1.4.64 → 1.4.65.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(apps): publish chip on each card — replaces deleted /catalog page

  Per founder direction: "if the catalog is just labeling an app to be shown in
  marketplace, why don't we do it through the apps?" — drop the standalone /catalog
  page (#1058), put the publish toggle on each /apps card.
  Backend (catalyst-api):
  - New file sme_catalog_client.go — a best-effort client for the in-cluster SME
    catalog microservice at http://catalog.sme.svc.cluster.local:8082. 30s response
    cache, 1.5s probe budget, returns nil on DNS NXDOMAIN (SME services tier not
    deployed on this Sovereign — common when marketplace.enabled is false).
  - HandleSovereignApps decorates each app with a `marketplacePublished` *bool joined
    by slug from the SME catalog. nil ⇒ slug not in the SME catalog (bootstrap
    component, or marketplace not deployed) ⇒ the FE suppresses the chip.
  - New handler HandleSovereignAppPublish at PATCH
    /api/v1/sovereign/apps/{slug}/publish. Body: {"published": bool}. Proxies to PATCH
    /catalog/admin/apps/{slug}/publish on the SME catalog. Surfaces the upstream status
    verbatim. Invalidates the cache so the next /apps poll reflects the change
    immediately.

  Frontend (AppsPage):
  - liveAppsQuery returns { statusById, publishedBySlug } instead of the bare status
    map.
  - Each AppCard with a non-null marketplacePublished renders a PUBLISHED /
    UNPUBLISHED chip alongside the status chip. Click → PATCH → optimistic refetch via
    React Query.
  - Bootstrap components and apps not in the SME catalog get nil → no chip (correct:
    nothing to toggle).
  - With marketplace.enabled=false, cards render no chips at all (SME catalog
    unreachable → nil for every slug).

  Bump chart 1.4.66 → 1.4.67.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart,ci): auto-bump literal catalyst-{api,ui} SHAs so all Sovereigns + contabo
  get fresh code

  Audit triggered by the founder asking whether PRs #1051..#1059 reach NEW Sovereigns
  or just my manual `kubectl set image` patches on omantel. The answer was: nothing
  reached anyone except omantel, via manual patches. Both contabo AND every fresh
  Sovereign would install :2122fb8 — the SHA frozen at PR #1040's last manual
  chart-touch on the morning of May 6.
  Root cause:
  - chart/templates/api-deployment.yaml + ui-deployment.yaml carry LITERAL image refs
    ("ghcr.io/openova-io/openova/catalyst-api:2122fb8"), not a Helm-templated
    `{{ .Values.images.catalystApi.tag }}`.
  - catalyst-build CI's deploy step bumped values.yaml's catalystApi.tag on every push
    — but no template reads from it. Dead code.
  - contabo's catalyst-platform Flux Kustomization at
    ./products/catalyst/chart/templates applies these as raw manifests.
  - Sovereigns Helm-install the same chart; Helm passes the literal through unchanged.
  - Both ended up frozen at whatever literal was committed at the last manual
    chart-touching PR.

  Fix:
  1. CI's deploy step now bumps both the literal SHAs in the two template files AND
     the unused-but-kept-for-SME-services values.yaml. It sed-patches the literal
     directly so contabo's Kustomize path keeps working.
  2. The commit step adds the two templates to the staged set alongside values.yaml,
     so every "deploy: update catalyst images to <sha>" commit propagates to contabo
     (10-min reconcile) AND to Sovereigns (next OCI chart publish via
     blueprint-release).
  3. Bump bp-catalyst-platform 1.4.68 → 1.4.69 so the new chart with the latest
     literal (currently :8361df4) gets republished and pinned in
     clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml.

  Why drop the "freeze contabo" intent of the previous comment: that comment said
  auto-rolling contabo on every PR was bad because PR #975's image broke contabo
  (k8scache startup loop). The solution there is to fix the bug in the code, not
  freeze contabo. Freezing masked real divergence — the reason the founder caught this
  is that manual omantel patches were the only thing keeping omantel current while
  contabo + every other fresh Sovereign quietly ran 9 PRs behind.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 66eca90c16
commit eb6a3c1812
50  .github/workflows/catalyst-build.yaml (vendored)
@@ -339,17 +339,29 @@ jobs:
           echo "values.yaml after update:"
           grep -A2 "catalystUi\|catalystApi" "${VALUES}" | head -10

-          # NOTE: the literal image refs in templates/api-deployment.yaml and
-          # templates/ui-deployment.yaml are deliberately NOT auto-bumped here.
-          # Those manifests are what contabo's Kustomize-path Flux reconciles —
-          # auto-bumping them auto-rolls contabo on every PR, which broke
-          # contabo on 2026-05-05 (k8scache startup loop on dead Sovereign
-          # kubeconfigs in PR #975's image). contabo rolls ONLY when an
-          # operator manually edits + commits those files (see
-          # docs/RUNBOOK-CONTABO-IMAGE-BUMP.md). Sovereigns are unaffected:
-          # they install via OCI chart whose values.yaml carries the new SHA
-          # (bumped above), which gets picked up at the next blueprint-release
-          # publish below.
+          # ALSO bump the literal image refs in the chart templates.
+          # Sovereigns Helm-install this chart and contabo applies it
+          # via Kustomize — both consume the literal directly because
+          # kustomize-controller can't render Helm templates. Without
+          # this auto-bump, every Sovereign provisioned after 2026-05-06
+          # was installing :2122fb8 (frozen at PR #1040's chart-touch),
+          # so PRs #1051..#1059 never reached anyone except via manual
+          # `kubectl set image` patches on omantel.
+          API_TPL="products/catalyst/chart/templates/api-deployment.yaml"
+          UI_TPL="products/catalyst/chart/templates/ui-deployment.yaml"
+          sed -i -E "s|(image: \"ghcr\.io/openova-io/openova/catalyst-api:)[^\"]*\"|\1${SHA_SHORT}\"|" "${API_TPL}"
+          sed -i -E "s|(image: \"ghcr\.io/openova-io/openova/catalyst-ui:)[^\"]*\"|\1${SHA_SHORT}\"|" "${UI_TPL}"
+          echo "templates after update:"
+          grep -E "image: \".*catalyst-(api|ui):" "${API_TPL}" "${UI_TPL}"
+
+          # contabo's catalyst-platform Kustomization at
+          # ./products/catalyst/chart/templates reconciles every 10 min
+          # — it will pick up the bumped literal on the next interval.
+          # If the new image breaks contabo, an operator can revert the
+          # template SHA via a follow-up PR; the previous "freeze"
+          # behaviour was masking real bugs (contabo silently ran an
+          # old image while the Sovereign provisioning churned through
+          # the same SHA being fixed downstream).
       - name: Commit and push manifest updates
         id: deploy_commit
@@ -358,12 +370,16 @@ jobs:
         run: |
           git config user.name "github-actions[bot]"
           git config user.email "github-actions[bot]@users.noreply.github.com"
-          # Only the chart's values.yaml is auto-bumped — Sovereigns consume
-          # the OCI chart bump via blueprint-release. Contabo's Kustomize-path
-          # deployment manifests are intentionally NOT touched here so contabo
-          # never auto-rolls; an operator manually bumps those files when a
-          # specific catalyst-api/ui SHA has been validated against contabo.
-          git add products/catalyst/chart/values.yaml
+          # values.yaml + the two literal-image templates (api-deployment,
+          # ui-deployment) are bumped together so:
+          # - Sovereigns get the new SHA via the next OCI chart publish
+          #   (blueprint-release fires below).
+          # - contabo's Kustomize-path Flux reconciles the bumped literal
+          #   within 10 min.
+          # Both surfaces converge on the same SHA on every push.
+          git add products/catalyst/chart/values.yaml \
+            products/catalyst/chart/templates/api-deployment.yaml \
+            products/catalyst/chart/templates/ui-deployment.yaml
           if git diff --staged --quiet; then
             echo "No changes to commit"
             echo "pushed=false" >> "$GITHUB_OUTPUT"
@@ -231,7 +231,7 @@ spec:
       # fallback (data renders the moment cutover-import lands without
       # waiting for the orchestrator's chart-values overlay write).
       # 2026-05-05.
-      version: 1.4.68
+      version: 1.4.70
       sourceRef:
         kind: HelmRepository
         name: bp-catalyst-platform
@@ -124,8 +124,8 @@ name: bp-catalyst-platform
 # otech113 2026-05-05 — chart 0.1.18 fixed the readiness-probe loop
 # but every trigger immediately got 502 in <10ms (synchronous
 # apiserver permission rejection). 2026-05-05.
-version: 1.4.68
-appVersion: 1.4.68
+version: 1.4.70
+appVersion: 1.4.70
 description: |
   Catalyst Platform — the unified Catalyst control plane umbrella chart for Catalyst-Zero.
   Composes the catalyst-{ui,api}, console, admin, marketplace UI modules and the marketplace-api backend.
@@ -152,7 +152,15 @@ spec:
         fsGroupChangePolicy: OnRootMismatch
       containers:
         - name: catalyst-api
-          image: "ghcr.io/openova-io/openova/catalyst-api:2122fb8"
+          # Literal image ref — required for the contabo-mkt Kustomize
+          # path (kustomize-controller doesn't render Helm templates).
+          # Auto-bumped by .github/workflows/catalyst-build.yaml's deploy
+          # step on every push to main, so Sovereigns AND contabo both
+          # roll to the latest catalyst-api SHA. The matching
+          # values.yaml `images.catalystApi.tag` is also bumped (but
+          # unused for catalyst-api; kept for SME services that DO read
+          # from values).
+          image: "ghcr.io/openova-io/openova/catalyst-api:8361df4"
           imagePullPolicy: IfNotPresent
           ports:
             - containerPort: 8080
@@ -19,7 +19,12 @@ spec:
         - name: ghcr-pull
       containers:
         - name: catalyst-ui
-          image: "ghcr.io/openova-io/openova/catalyst-ui:2122fb8"
+          # Literal image ref — required for the contabo-mkt Kustomize
+          # path (kustomize-controller doesn't render Helm templates).
+          # Auto-bumped by .github/workflows/catalyst-build.yaml's deploy
+          # step on every push to main, so Sovereigns AND contabo both
+          # roll to the latest catalyst-ui SHA.
+          image: "ghcr.io/openova-io/openova/catalyst-ui:8361df4"
           imagePullPolicy: IfNotPresent
           ports:
             - containerPort: 8080