openova/products/catalyst
e3mrah c9507c8369
fix(catalyst-api): durable Phase-1 watcher across Pod restart (#830) (#833)
The Phase-1 helmwatch watcher used to lose state on every catalyst-api
Pod roll. fromRecord rewrote any "phase1-watching" status to "failed"
on the next Pod start — even though Phase 0 had already committed its
tofu state, the Sovereign cluster was healthy, the kubeconfig was on
the PVC, and the bootstrap-kit HelmReleases kept reconciling regardless
of whether catalyst-api's in-memory watcher was alive.

Caught live on otech102 (2026-05-04): a transient catalyst-api roll
mid-Phase-1 latched the deployment record to status=failed, the
auto-fire handover never triggered, and the operator was stranded on
the wizard page. The manual workaround was to patch the record back to
status=ready and mint a handover token by hand.

Fix: split the in-flight rewrite into two cases:
  - Phase-0 in-flight (pending/provisioning/tofu-applying/
    flux-bootstrapping): STILL rewritten to failed (tofu workdir on /tmp
    emptyDir died with the Pod, Hetzner resources orphaned).
  - phase1-watching: preserved across restart so the post-restart
    resume path picks it up via shouldResumePhase1 + resumePhase1Watch
    (already wired). The on-disk store record stays consistent with
    the in-memory state during rehydrate.
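
The two-case split above can be sketched roughly as follows. This is an
illustration only, not the catalyst-api source: the status strings and the
helper names isPhase0InFlightStatus / isInFlightStatus come from this commit
message, but their bodies here are assumptions, and statusAfterRestart is a
hypothetical stand-in for the rewrite step inside fromRecord.

```go
package main

import "fmt"

// Status values as named in the commit message.
const (
	statusPending           = "pending"
	statusProvisioning      = "provisioning"
	statusTofuApplying      = "tofu-applying"
	statusFluxBootstrapping = "flux-bootstrapping"
	statusPhase1Watching    = "phase1-watching"
	statusFailed            = "failed"
)

// isPhase0InFlightStatus reports whether s is one of the four Phase-0
// in-flight statuses (assumed body; the real helper may differ).
func isPhase0InFlightStatus(s string) bool {
	switch s {
	case statusPending, statusProvisioning, statusTofuApplying, statusFluxBootstrapping:
		return true
	}
	return false
}

// isInFlightStatus also counts phase1-watching, which is the semantics the
// release-subdomain conflict check is described as keeping.
func isInFlightStatus(s string) bool {
	return isPhase0InFlightStatus(s) || s == statusPhase1Watching
}

// statusAfterRestart is a HYPOTHETICAL helper showing the rehydrate rule:
// Phase-0 in-flight records become failed, everything else is preserved.
func statusAfterRestart(s string) string {
	if isPhase0InFlightStatus(s) {
		return statusFailed
	}
	return s
}

func main() {
	for _, s := range []string{
		statusPending, statusProvisioning, statusTofuApplying,
		statusFluxBootstrapping, statusPhase1Watching,
	} {
		fmt.Printf("%-18s -> %s (in-flight for subdomain check: %v)\n",
			s, statusAfterRestart(s), isInFlightStatus(s))
	}
}
```

Keeping two predicates instead of widening or narrowing one lets the restart
path and the subdomain-release path disagree about phase1-watching on purpose.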

Helmwatch's existing resume path (jobs_backfill.go) is idempotent —
it just observes HelmRelease.status, never patches/applies, so a fresh
informer over the same kubeconfig produces the same per-component
events the previous Pod was streaming.
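
Why a read-only watcher is safe to restart can be shown with a toy model.
The sketch below is not helmwatch and uses no Kubernetes client; the
componentEvent type, the observe function, and the component names are all
hypothetical. It assumes only what the paragraph above states: events are
derived from observed release status and nothing is written back.

```go
package main

import (
	"fmt"
	"sort"
)

// componentEvent is a HYPOTHETICAL per-component event.
type componentEvent struct {
	Component string
	Ready     bool
}

// observe derives events purely from a snapshot of release readiness
// (component name -> ready). It only reads the snapshot, so calling it
// again over the same state yields the same events.
func observe(snapshot map[string]bool) []componentEvent {
	names := make([]string, 0, len(snapshot))
	for name := range snapshot {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order regardless of map iteration

	events := make([]componentEvent, 0, len(names))
	for _, name := range names {
		events = append(events, componentEvent{Component: name, Ready: snapshot[name]})
	}
	return events
}

func main() {
	// Placeholder component names, not real bootstrap-kit releases.
	cluster := map[string]bool{"component-a": true, "component-b": false}

	before := observe(cluster) // watcher in the Pod that was rolled
	after := observe(cluster)  // fresh watcher in the replacement Pod

	fmt.Println("same events:", fmt.Sprint(before) == fmt.Sprint(after))
}
```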

Also:
  - Added isPhase0InFlightStatus helper to distinguish the two
    semantics; isInFlightStatus retained for the release-subdomain
    conflict check (still includes phase1-watching, so it won't release
    a slot mid-Phase-1).
  - Updated TestPodRestart_StuckPhase1WatchingRewrittenToFailed →
    TestPodRestart_Phase1WatchingPreservedNotRewrittenToFailed (now
    asserts the new correct behavior).
  - New test TestPodRestart_Phase1WatchingResumesWithKubeconfig proves
    the gating decision (shouldResumePhase1=true) and the preserved
    Status value.
  - New parameterized test
    TestPodRestart_Phase0InFlightStillRewrittenToFailed proves the
    Phase-0 carve-out still works for all four Phase-0 statuses.
  - Updated TestShouldResumePhase1_GatesProperly cases to reflect the
    new phase1-watching=resumable / Phase-0=non-resumable split.

Issue: openova-io/openova#830 (Bug 3)

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 23:28:07 +04:00
bootstrap fix(catalyst-api): durable Phase-1 watcher across Pod restart (#830) (#833) 2026-05-04 23:28:07 +04:00
chart fix(bp-catalyst-platform): add cutover-driver RBAC for catalyst-api (#830) (#831) 2026-05-04 23:26:51 +04:00
README.md feat(consolidation): Phase 1 — move Catalyst-Zero apps + CI + manifests into public monorepo 2026-04-28 12:08:09 +02:00

OpenOva Catalyst (composite Blueprint)

The umbrella Blueprint bp-catalyst-platform — composes the Catalyst control plane.

Status: Deployed. Updated: 2026-04-28.

This product directory contains:

  • chart/ — the Helm chart that deploys Catalyst-Zero on a Kubernetes cluster (and every franchised Sovereign).
  • chart/templates/{ui,api}-deployment.yaml + service + ingress — the catalyst-ui (React SPA wizard scaffold) and catalyst-api (Go bootstrap API) workloads.
  • chart/templates/sme-services/ — 11 manifests for the legacy SME backend services plus the consolidated console, admin, and marketplace UI workloads (sourced from core/{console,admin,marketplace}/).
  • chart/templates/marketplace-api/ — manifests for the Go marketplace-api backend (sourced from core/marketplace-api/).
  • bootstrap/{ui,api}/ — the source code for catalyst-ui and catalyst-api (deployed via the catalyst-build CI workflow).

For the unified architecture and the wizard's target shape, see docs/PROVISIONING-PLAN.md, docs/ARCHITECTURE.md, and docs/SOVEREIGN-PROVISIONING.md.


How Catalyst-Zero is deployed today

A Flux Kustomization on the Catalyst-Zero cluster (Contabo k3s) reconciles products/catalyst/chart/templates/ from this public repo. CI workflows (.github/workflows/{catalyst,console,admin,marketplace,marketplace-api}-build.yaml) build and push images on every push to main. The deploy step then pins the image SHA into the corresponding manifest in this directory and commits the change back. Flux picks up the commit and rolls the deployment.

Image registry: ghcr.io/openova-io/openova/{catalyst-ui,catalyst-api,console,admin,marketplace,marketplace-api}:<sha>.
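
The "pin the image SHA into the manifest" step can be pictured with the sketch below. How the real workflows do this is not stated here, so treat everything in it as an assumption: pinImage is a hypothetical helper, the manifest snippet and the tags 0000000 / abc1234 are placeholders, and only the image path format comes from the registry line above.

```go
package main

import (
	"fmt"
	"regexp"
)

// pinImage rewrites the tag of every "<repo>:<tag>" reference in manifest
// to sha. HYPOTHETICAL helper: untagged references are left alone, and a
// repo that is a prefix of another (marketplace vs marketplace-api) does
// not match the longer name because the colon must follow repo directly.
func pinImage(manifest, repo, sha string) string {
	re := regexp.MustCompile(regexp.QuoteMeta(repo) + `:[A-Za-z0-9._-]+`)
	return re.ReplaceAllLiteralString(manifest, repo+":"+sha)
}

func main() {
	// Placeholder manifest fragment and tags, for illustration only.
	manifest := "containers:\n" +
		"- name: catalyst-api\n" +
		"  image: ghcr.io/openova-io/openova/catalyst-api:0000000\n"

	fmt.Print(pinImage(manifest, "ghcr.io/openova-io/openova/catalyst-api", "abc1234"))
}
```

Because the pinned tag lives in Git, the commit itself is what triggers Flux; no separate deploy call is implied by this sketch.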

Migration status (per docs/PROVISIONING-PLAN.md)

| Component | Source location | Image | Status |
|---|---|---|---|
| catalyst-ui | products/catalyst/bootstrap/ui/ | ghcr.io/openova-io/openova/catalyst-ui | public repo |
| catalyst-api | products/catalyst/bootstrap/api/ | ghcr.io/openova-io/openova/catalyst-api | public repo |
| console | core/console/ | ghcr.io/openova-io/openova/console | public repo (Phase 1) |
| admin | core/admin/ | ghcr.io/openova-io/openova/admin | public repo (Phase 1) |
| marketplace | core/marketplace/ | ghcr.io/openova-io/openova/marketplace | public repo (Phase 1) |
| marketplace-api | core/marketplace-api/ | ghcr.io/openova-io/openova/marketplace-api | public repo (Phase 1) |
| sme-{auth,billing,catalog,domain,gateway,notification,provisioning,tenant} | (still in openova-private/services/) | ghcr.io/openova-io/openova-private/sme-* | follow-up phase; source not yet moved |