Commit Graph

575 Commits

hatiyildiz
07d27b33d4 feat(routing): /flow route + JobsPage drops Tab strip + batch chip → /flow
Routing restructure (founder rejected PR #242 Tab-on-JobsPage pattern):

router.tsx:
- Register the new /provision/$deploymentId/flow route with FlowPage.
- Drop the validateSearch{ view: table|flow } wiring on /jobs — the
  Tab strip is gone, search params no longer drive view selection.
- Add validateSearch{ scope, view } on /flow so deep links survive
  unknown values.

JobsPage.tsx:
- Remove the entire jobs-view-tabs strip (JOBS_VIEW_TABS, setView,
  resolveJobsView). The Flow surface now lives at /flow.
- Add a "Show as Flow" button in the page header that navigates to
  /flow?scope=all. Founder spec: "[Show as Flow] button in JobsPage
  header → /flow?scope=all".
- Drop the JobsFlowView import + the activeView render switch.

JobsPage.test.tsx:
- Replace the BatchDetail-link assertion with a /flow?scope=batch:<id>
  assertion (the v3 routing model).
- Add anti-regression guards for the retired Tab strip + new Show-as-
  Flow button.

JobsTable.tsx:
- Batch chip in each row now Links to /flow?scope=batch:<batchId>
  (was previously a Link to the BatchDetail page). Founder spec:
  "JobsTable batch chip click navigates to /flow?scope=batch:<id>".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:39:45 +02:00
hatiyildiz
9241864604 feat(flow): FlowPage canvas at /flow with scope + mode + click semantics
New per-deployment flow canvas served at:
  /sovereign/provision/$deploymentId/flow

Routing contract:
- ?scope=all              → render every job in the deployment
- ?scope=batch:<id>       → filter to a single batch
- ?view=jobs|batches      → mode toggle (default = jobs)
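
A sketch of the validation this contract implies (FlowSearch /
parseFlowSearch are illustrative names, not the actual router.tsx code):

  type FlowScope = 'all' | `batch:${string}`
  type FlowSearch = { scope: FlowScope; view: 'jobs' | 'batches' }

  // Unknown or garbled deep-link params coerce back to safe defaults.
  function parseFlowSearch(raw: Record<string, unknown>): FlowSearch {
    const scope =
      typeof raw.scope === 'string' &&
      (raw.scope === 'all' || raw.scope.startsWith('batch:'))
        ? (raw.scope as FlowScope)
        : 'all'
    const view = raw.view === 'batches' ? 'batches' : 'jobs'
    return { scope, view }
  }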

Mode contract:
- Jobs mode: every job rendered as a bubble; node border colour by
  status. Single-click bubble → opens FloatingLogPane (right 25vw).
  Double-click bubble → navigates to /jobs/$jobId. Click empty
  canvas → closes the floating pane.
- Batches mode: each batch as a single supernode. Single-click →
  highlights it (no log pane — batches have no execution logs).
  Double-click → drills into Jobs mode scoped to that batch
  (URL becomes ?scope=batch:<id>).

Embedded variant (`embedded` prop) — used by JobDetail's Flow tab:
- Reduces canvas height to ~50vh.
- Hides the StatusStrip (JobDetail's header already shows job-level
  breadcrumb + status badge).
- `highlightJobId` prop pre-emphasises the parent job (thicker
  accent border + glow rect overlay).
- `deploymentIdOverride` prop bypasses TanStack Router's strict
  useParams(from:'/flow'), since JobDetail mounts FlowPage from a
  different route.

Single-vs-double-click: SVG `onClick` fires on every click in a double-
click, so we debounce the single-click handler by 220ms — if a second click
arrives first, cancel the timer and fire the double-click handler
instead. Matches OS double-click threshold.
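
A minimal sketch of that debounce, assuming a plain handler factory
(names are illustrative; 220ms mirrors the threshold above):

  const DOUBLE_CLICK_MS = 220

  function makeBubbleClickHandler(
    onSingle: (jobId: string) => void,
    onDouble: (jobId: string) => void,
  ) {
    let timer: ReturnType<typeof setTimeout> | null = null
    return (jobId: string) => {
      if (timer !== null) {
        // Second click arrived inside the window → treat as double-click.
        clearTimeout(timer)
        timer = null
        onDouble(jobId)
        return
      }
      timer = setTimeout(() => {
        timer = null
        onSingle(jobId)
      }, DOUBLE_CLICK_MS)
    }
  }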

Per docs/INVIOLABLE-PRINCIPLES.md #1 (waterfall) — full target shape
in this PR: route, mode toggle, log pane, double-click drill, embedded
variant. Per #2 (no compromise) — pure SVG + computed bezier; reuses
the existing Sugiyama core in pipelineLayout.ts. Per #4 (never
hardcode) — every CSS token comes from --color-*; the 25vw width
binds to the spec verbatim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:39:31 +02:00
hatiyildiz
63a289e3ce feat(flow): FloatingLogPane (25vw slide-in) + StatusStrip components
Two new presentational components for the v3 Flow surface:

FloatingLogPane (products/.../components/FloatingLogPane.tsx):
- Slide-in 25vw log viewer that overlays the right edge of the canvas.
- Reuses the canonical <ExecutionLogs /> body — no rebuild.
- Closes on X click, Escape key, or canvas-background click (handled
  by the FlowPage parent).
- Renders an empty-state branch when executionId is falsy (pending
  jobs without an execution row).

StatusStrip (products/.../components/StatusStrip.tsx):
- Top contextual strip mirroring provision-mockup.html's geometry:
  breadcrumb / provisioning pill (animated pulse) / progress bar /
  optional Jobs↔Batches mode toggle.
- Mode toggle is URL-driven via a parent-supplied onChange callback.
- All colours bind to existing theme tokens; light/dark theme stays
  intact (no new CSS variables).

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), every dimension /
status / count is a prop. Per #2 (no compromise), no graph library and
no Mantine — pure CSS-token-bound styles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:39:13 +02:00
hatiyildiz
a41a240626 feat(flow): pipelineLayout supports highlightJobId option
Add an optional `highlightJobId` to PipelineLayoutOptions. When set, the
matching FlowNode is emitted with `highlighted = true`, which the new
FlowPage canvas renders with a thicker accent-coloured border + glow.
Used by JobDetail's embedded Flow tab to draw the operator's eye to the
parent job on first paint. Pure flag — no layout change.
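
Roughly the surface involved (a sketch; only highlightJobId and
highlighted come from this change, the remaining fields are elided):

  interface PipelineLayoutOptions {
    highlightJobId?: string
    // ...existing layout options unchanged
  }

  interface FlowNode {
    id: string
    highlighted: boolean
    // ...position / status fields unchanged
  }

  // Pure flag: only node metadata changes, geometry is untouched.
  const isHighlighted = (nodeId: string, opts: PipelineLayoutOptions) =>
    opts.highlightJobId !== undefined && nodeId === opts.highlightJobId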

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:39:00 +02:00
github-actions[bot]
511374ed35 deploy: update catalyst images to 6746fde 2026-04-30 10:08:52 +00:00
e3mrah
6746fdefd0
fix(pipeline-layout): index jobs by both id and jobName so dependsOn resolves (#244)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 12:07:00 +02:00
github-actions[bot]
363346bfcd deploy: update catalyst images to 32c4168 2026-04-30 09:29:55 +00:00
e3mrah
32c41687a2
fix(jobs): wire HelmRelease spec.dependsOn → Job.dependsOn so Flow view edges render (#243)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 11:27:05 +02:00
github-actions[bot]
88a7f686e0 deploy: update catalyst images to 4bbe22c 2026-04-30 09:16:39 +00:00
e3mrah
4bbe22c8a6
feat(jobs): Flow tab — two-level Sugiyama (batches as meta-stages, jobs as inner stages) (#242)
Adds a Flow tab to /sovereign/provision/$id/jobs (peer of the
existing Table tab) that renders the dependency chain as a
two-level Sugiyama layered DAG:
  - outer: batches arranged as meta-stages, left → right
  - inner: jobs within each batch as stages, left → right

Layout is a pure function (lib/pipelineLayout.ts) with crossing-
minimising barycenter sweeps + dummy nodes for long edges; the same
sugiyama() impl runs at both scales. Edges are SVG paths — straight
lines for span 1, cubic bezier for span ≥ 2 so long edges curve over
empty stage columns. Cross-batch edges fan out into job-level arrows
when both lanes are expanded; collapse to a single meta-arrow when
either side is a supernode. Source-batch failure dashes the arrow
red ("blocked by upstream"). Default zoom: in-flight batches expanded;
all-succeeded batches collapsed to supernodes.
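
The span rule for edge paths, as a sketch (coordinates and the control-
point offset are illustrative, not the pipelineLayout.ts geometry):

  type Pt = { x: number; y: number }

  // span = number of stage columns the edge crosses
  function edgePath(from: Pt, to: Pt, span: number): string {
    if (span <= 1) {
      return `M ${from.x} ${from.y} L ${to.x} ${to.y}`
    }
    // Long edges curve over the empty stage columns between endpoints.
    const dx = (to.x - from.x) / 3
    return (
      `M ${from.x} ${from.y} ` +
      `C ${from.x + dx} ${from.y}, ${to.x - dx} ${to.y}, ${to.x} ${to.y}`
    )
  }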

URL state: ?view=table (default) | flow — bookmarkable, browser-back
works. Search param is validated on the route so older deep links
without ?view= keep working unchanged.

Tests:
  - 34 unit tests for pipelineLayout: empty input, canonical 5-job
    fan-in (4 stages, 5 edges, 2→5 bezier, zero crossings), real
    bootstrap-kit (13 jobs, 5 stages, fan-in at external-dns, zero
    crossings), two-batch meta-DAG (cross-batch source = last stage
    of phase-0), collapse semantics, default-collapse policy.
  - 13 component tests for JobsFlowView: empty state, 5-job render,
    4-stage assertion, click batch toggle (collapse/expand in place),
    click job navigates to /provision/$id/jobs/$jobId, edge kind
    classification, blocked-edge marker.
  - 4 new e2e cosmetic guards: tab strip exists, Flow URL flips to
    ?view=flow + canvas mounts, expanded batch shows job cards +
    toggle shrinks to supernode, default-expanded for in-flight
    batches.

No new fetch path — JobsFlowView reuses the same flatJobs the
JobsTable consumes (useLiveJobsBackfill + reducer derivation).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:14:52 +02:00
github-actions[bot]
ea70f14deb deploy: update catalyst images to c9e2bfd 2026-04-30 08:22:28 +00:00
hatiyildiz
c9e2bfd817 fix(infrastructure): coerce LB ports string to listeners array (backend compat)
The /infrastructure/topology backend serialises load-balancer listener
ports as a CSV string in the legacy shape, while the hierarchical UI
expects an array of {port, protocol}. The InfrastructurePage data
normaliser now accepts either form. The Network tab also defaults
listeners/targets to [] so flat-table renders never crash on a half-
shaped LB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:19:23 +02:00
github-actions[bot]
1c6c93f32f deploy: update catalyst images to 84fa004 2026-04-30 08:14:18 +00:00
hatiyildiz
84fa0046d8 fix(infrastructure): tolerate sparse backend responses (missing optional collections)
The live /infrastructure/topology backend currently returns regions
without networks/peerings/firewalls. The hierarchical UI assumed every
collection was always an array and crashed ("e.firewalls is not
iterable") on the Network tab.

This commit:
  - Defaults every optional collection field to [] in both
    InfrastructurePage's data normalisation and downstream tab
    consumers.
  - Synthesises a `cloud` block from the distinct providers in the
    regions list when the backend omits it.
  - Hardens topologyLayout against missing nested arrays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:12:05 +02:00
github-actions[bot]
2213420c76 deploy: update catalyst images to e9c2e19 2026-04-30 08:07:21 +00:00
hatiyildiz
e9c2e19933 deploy: update catalyst images to d0ef984
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:04:18 +02:00
e3mrah
392f56e6a9
fix(jobs): backend-only when live data — kills duplicate rows + log 404 (#241)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 10:01:38 +02:00
e3mrah
d0ef984e27
feat(infrastructure): topology default + CRUD modals reusing wizard steps (#240)
Refactors /sovereign/provision/$id/infrastructure to match the founder's
wizard mental model:

  Org/Sovereign
    └─ Topology pattern (SOLO | HA-PAIR | MULTI-REGION | AIR-GAP)
        └─ Region(s)
            └─ Physical Cluster(s)
                ├─ vClusters [DMZ · RTZ · MGMT]
                ├─ LBs / peerings / firewalls
                └─ Worker nodes / pools

The 4 tabs (Topology / Compute / Storage / Network) are filtered lenses
over ONE backend response. Topology view is the default landing, rendered
as a 4-level hierarchy (Cloud → Region → Cluster → vCluster), and the
detail panel slides in on click. Click a cluster to zoom — vClusters of
that cluster un-dim.

Per founder spec, every CRUD action is delivered through a delta-wizard
modal that creates a Job entry. Modals shipped:

  • AddRegionModal (3-step, re-uses StepProvider in mode='add-region')
  • AddClusterModal (re-uses StepTopology in mode='add-cluster')
  • AddVClusterModal · AddNodePoolModal · ScalePoolModal · ChangeSKUModal
  • AddLBModal · AddPeeringModal
  • EditFirewallRulesModal · EditDNSRecordsModal
  • NodeActionConfirm (cordon / drain / replace)
  • DeleteCascadeConfirm (with cascade preview)

NEW pure layout function `lib/topologyLayout.ts` produces the layered
graph (no force-directed, no reactflow). Typed CRUD client wrappers in
`lib/infrastructure-crud.ts`. Synthetic fixture under
`test/fixtures/infrastructure-topology.fixture.ts` so the page is
navigable when the live `/infrastructure/topology` backend isn't
deployed yet.

Header gains a per-Sovereign switcher fed by GET /v1/deployments.

Wizard step components (StepProvider, StepTopology) get a `mode` prop
for in-place reuse — they are NOT forked.

Tests: 51 infra tests pass (10 InfrastructurePage + 6 Topology +
5 Compute + 4 Storage + 6 Network + 11 topologyLayout + 9 fixture
shape). Wizard test coverage (135 tests) unchanged. Cosmetic guards
extended for layered canvas, side panel, and Sovereign switcher.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:00:54 +02:00
github-actions[bot]
7f1b5983d2 deploy: update catalyst images to 948bf61 2026-04-30 07:59:59 +00:00
e3mrah
948bf61b91
feat(catalyst-api): infrastructure CRUD via Crossplane XRC + unified topology endpoint (#239)
Refactors the Sovereign Infrastructure surface so every Day-2 mutation
flows through a Crossplane Composite Resource Claim (XRC) the catalyst-
api writes against the SOVEREIGN cluster's kubeconfig — never bespoke
hcloud-go calls, never `exec.Command("kubectl", ...)`, never client-go
direct mutation. Per docs/INVIOLABLE-PRINCIPLES.md #3 Crossplane is the
ONLY Day-2 IaC seam.

When the third-sibling chart's Composition for a given XRC kind isn't
present yet, Crossplane stores the claim and leaves it Pending; the
catalyst-api emits a Job log line "Awaiting Crossplane Composition
for <kind>" so an operator browsing /jobs sees the gap. Each mutation
also commits a Job + Execution + LogLines via the existing audit-trail
Bridge so the table view shows every Day-2 action.

Endpoints + XRC kinds (all Composition targets owned by the third-
sibling agent):

  GET    .../infrastructure/topology                                     unified hierarchical read
  POST   .../infrastructure/regions                       RegionClaim          region-composition
  POST   .../infrastructure/regions/{id}/clusters         ClusterClaim         cluster-composition
  POST   .../infrastructure/clusters/{id}/vclusters       VClusterClaim        vcluster-composition
  POST   .../infrastructure/clusters/{id}/pools           NodePoolClaim        nodepool-composition
  PATCH  .../infrastructure/pools/{id}                    NodePoolClaim        nodepool-composition
  POST   .../infrastructure/loadbalancers                 LoadBalancerClaim    lb-composition
  POST   .../infrastructure/peerings                      PeeringClaim         peering-composition
  POST   .../infrastructure/firewalls/{id}/rules          FirewallRuleClaim    firewall-composition
  POST   .../infrastructure/nodes/{id}/{cordon|drain|replace}
                                                          NodeActionClaim      node-action-composition
  DELETE .../infrastructure/{kind}/{id}                   <kind>'s claim       <kind>-composition

Response shape per write: 202 Accepted with
  { jobId, xrcKind, xrcName, status: "submitted-pending-composition",
    submittedAt, cascade?: [...] }
DELETE additionally returns a Cascade preview (computed from the live
topology) so the FE confirm dialog can render "deleting region X
will drain Y workloads, remove Z PVCs".

Unified topology endpoint emits TopologyResponse: cloud[*] +
topology.regions[*].clusters[*].(vclusters|nodePools|nodes|
loadBalancers) + storage.(pvcs|buckets|volumes). The four FE tabs
(Topology, Compute, Storage, Network) all derive their views off this
single response. Live-source fields fall back to empty arrays — never
placeholder data per the founder's "no synthetic rows" rule. Legacy
flat /compute, /storage, /network endpoints stay wired with their
pre-existing shapes until the FE migrates.

New files:
  - internal/infrastructure/types.go            wire types
  - internal/infrastructure/xrc.go              Crossplane writer + DNS-1123 namer
  - internal/infrastructure/topology_loader.go  composes from tofu outputs
                                                + informer cache + Crossplane MR list
  - internal/jobs/mutation_bridge.go            RegisterMutationJob /
                                                AppendXRCSubmittedLog /
                                                FinishMutationJob — every
                                                mutation lands in batch
                                                "day-2-mutations"

Tests (`go test -race`):
  - infrastructure_test.go        unified TopologyResponse shape +
                                  empty fallback for storage/peerings
                                  + legacy compute/storage/network
                                  endpoints unchanged
  - infrastructure_crud_test.go   per-endpoint 202 happy path, 404 unknown
                                  deployment, 409 XRC name conflict, 503
                                  on missing kubeconfig, DELETE cascade
                                  preview, audit-Job materialised in
                                  jobs.Store

CORS now allows PATCH + DELETE so the FE wire calls succeed.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 09:57:46 +02:00
github-actions[bot]
6cbfbd18ef deploy: update catalyst images to f658757 2026-04-30 07:55:58 +00:00
e3mrah
f658757962
fix(bp-crossplane): resolve CHART_DIR to absolute path in composition-validate.sh (#237)
CI invokes the script as `bash <script> "platform/crossplane/chart"` from
the repo root. The script then `cd`s into that relative path, which works,
but every later `"$CHART_DIR/<sub>"` reference (notably FIXTURE_DIR for
Case 6) inherits the now-stale relative prefix and resolves under the
wrong cwd. Fix: resolve CHART_DIR via `(cd ... && pwd)` to an absolute
path BEFORE the chdir.

Local repro before fix:

  $ bash platform/crossplane/chart/tests/composition-validate.sh \
        platform/crossplane/chart
  ...
  Case 6: every fixture XRC kind is matched by an XRD
  FAIL: fixtures dir platform/crossplane/chart/tests/fixtures missing

Local result after fix:

  $ bash platform/crossplane/chart/tests/composition-validate.sh \
        platform/crossplane/chart
  ...
  Case 6: every fixture XRC kind is matched by an XRD
    PASS
  All bp-crossplane Day-2 CRUD Composition gates green.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 09:36:07 +02:00
e3mrah
8592d20919
feat(bp-crossplane): 6 XRDs + Compositions for Day-2 CRUD (RegionClaim/ClusterClaim/NodePoolClaim/LoadBalancerClaim/PeeringClaim/NodeActionClaim) (#236)
Adds the 6 CompositeResourceDefinitions and matching Compositions that
back the catalyst-api Day-2 CRUD endpoints. catalyst-api writes XRCs of
these kinds; Crossplane materialises them into provider-hcloud (and a
small number of provider-kubernetes) managed resources. Per
docs/INVIOLABLE-PRINCIPLES.md #3, every cloud-side op flows through
provider-hcloud — never bespoke hcloud-go calls or shell-outs to the
hcloud CLI.

XRDs (canonical group: compose.openova.io/v1alpha1):

  - RegionClaim       → composes the Phase-0 quartet via provider-hcloud:
                        Network + NetworkSubnet + Firewall + Server (cp1)
                        + LoadBalancer + LoadBalancerNetwork +
                        LoadBalancerService×2 + LoadBalancerTarget. Mirrors
                        infra/hetzner/main.tf 1:1 so deletion of a
                        RegionClaim cascades the whole slice.
  - ClusterClaim      → composes a provider-kubernetes Object that
                        materialises a cluster-identity ConfigMap. The
                        catalyst-environment-controller reads the CM to
                        template per-server cloud-init.
  - NodePoolClaim     → composes up to 100 provider-hcloud Server
                        resources. UPDATE flow: patching replicas n→m
                        flips the per-index Required-policy gate so
                        Crossplane creates/deletes Server CRs.
  - LoadBalancerClaim → composes provider-hcloud LoadBalancer +
                        LoadBalancerNetwork + up to 50
                        LoadBalancerService entries (per listener) + up
                        to 50 LoadBalancerTarget entries. UPDATE: patch
                        listeners[]/targets[] → composite controller
                        adds/removes services/targets.
  - PeeringClaim      → composes 1 or 2 provider-hcloud Route resources
                        (bidirectional flag toggles the second one
                        through a Required-policy gate).
  - NodeActionClaim   → composes a provider-kubernetes Object that
                        creates a batch/v1 Job running kubectl
                        cordon/drain (k8s-side op, not a cloud op, per
                        the task spec). action=replace additionally
                        composes a provider-hcloud Server for the
                        replacement node.

UPDATE/DELETE summary:

  - UPDATE: every mutable schema field is patched onto the underlying
    managed resource; Crossplane's composite controller drives the diff
    and provider-hcloud reconciles to the new state.
  - DELETE: every composed resource has deletionPolicy: Delete, so a
    cascade delete of the composite tears down the whole resource graph
    in dependency-safe order (Crossplane retries until deps unblock).

New tests:
  - tests/composition-validate.sh — 7 gates: helm renders cleanly,
    exactly 6 XRDs, ≥ 6 Compositions, all 6 expected claim kinds
    present, every rendered doc is valid YAML, every fixture references
    a real XRD, and (when KUBECONFIG + Crossplane CRDs available)
    server-side dry-run for every fixture.
  - tests/fixtures/<kind>-sample.yaml — one XRC fixture per kind.

Version bump:
  - platform/crossplane/chart/Chart.yaml             1.1.1 → 1.1.2
  - platform/crossplane/blueprint.yaml               1.1.1 → 1.1.2
  - clusters/_template/bootstrap-kit/04-crossplane.yaml         → 1.1.2
  - clusters/otech.omani.works/bootstrap-kit/04-crossplane.yaml → 1.1.2

Hard rules respected:
  - provider-hcloud only for cloud ops (never hcloud-go, never CLI).
  - provider-kubernetes Object for k8s-side ops (never raw kubectl).
  - No bespoke kubectl manifests for cloud resources.
  - Frontend + catalyst-api Go code untouched (sibling-owned).
  - Target state, no MVP framing — all 6 Compositions ship.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 09:33:38 +02:00
github-actions[bot]
0379291948 deploy: update catalyst images to 7cb145e 2026-04-30 07:30:34 +00:00
e3mrah
7cb145e3b5
fix(jobs): GetJob accepts bare jobName (Traefik mangles encoded colons) (#235)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 09:28:41 +02:00
github-actions[bot]
a076a369dd deploy: update catalyst images to 785744f 2026-04-30 06:53:28 +00:00
e3mrah
785744f528
fix(catalyst-api): bridge backfills Jobs from informer initial-list + new refresh-watch + components/state endpoints (#234)
The pre-existing internal/jobs.Bridge only writes Jobs on state
TRANSITIONS, so a HelmRelease that has been Ready=True for an hour
shows up as an empty /jobs response — the founder's symptom report
on https://console.openova.io/sovereign/provision/<id>/jobs.

Three changes converge to fix backend-side backfill:

  1. Bridge.SeedJobsFromInformerList — given a snapshot of the
     helmwatch informer's local cache (one entry per bp-* HelmRelease
     at HasSynced time), the bridge writes a Job per HR plus a
     synthetic-log-line Execution for every terminal HR. Idempotent:
     calling it again on every helmwatch start is a no-op when the
     Job already has a LatestExecutionID.

  2. helmwatch.Watcher.OnInitialListSynced — the canonical hook the
     handler uses to wire (1) into every Watcher it constructs.
     Combined with SnapshotComponents(), it gives the new
     /components/state endpoint a stateless read against the
     in-memory cache.

  3. Two new HTTP endpoints:
       POST /api/v1/deployments/{depId}/refresh-watch
            202 + seededAt + components on a fresh watcher start;
            200 alreadyActive when one is already running; 409
            watch-not-resumable when kubeconfig is missing; 504
            on bridge-seed timeout. The FE uses this as an
            explicit handshake to re-attach watching after a Pod
            restart.
       GET  /api/v1/deployments/{depId}/components/state
            Snapshot of the live informer cache as JSON; falls
            back to dep.Result.ComponentStates when no Watcher
            is attached so the FE always renders consistent rows.
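
From the console side, the refresh-watch handshake described above could
be driven by something like this (a sketch; only the path and status
codes come from this commit):

  async function refreshWatch(depId: string): Promise<'seeded' | 'already-active'> {
    const res = await fetch(`/api/v1/deployments/${depId}/refresh-watch`, {
      method: 'POST',
    })
    if (res.status === 202) return 'seeded'          // fresh watcher + bridge seed
    if (res.status === 200) return 'already-active'  // a watcher is already running
    // 409 watch-not-resumable, 504 bridge-seed timeout, etc. bubble up.
    throw new Error(`refresh-watch failed: ${res.status}`)
  }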

Concurrency: the Bridge gains a sync.Mutex covering activeExecID +
lastState because the seed hook (fired from Watcher.fireOnSyncedHooks)
now races OnHelmReleaseEvent (fired from the informer's processEvent
goroutine). Store-level writes are still serialised under Store.mu;
the new bridge mutex is purely for in-memory cursor state.

Tests:
  - TestSeedJobsFromInformerList_idempotent: dup-call writes 0
    new Executions (the load-bearing invariant).
  - TestSeedJobsFromInformerList_writesSyntheticLogLine: every
    terminal seed produces exactly one INFO/ERROR log line of
    the form [seeded] state=<state> at <ts>: <message>.
  - TestSeedJobsFromInformerList_subsequentTransitionSuppressed:
    a follow-up OnHelmReleaseEvent with the same state as the
    seed is dropped by lastState — no second Job upsert, no
    second Execution.
  - TestRefreshWatch_*: 202 happy path, 200 alreadyActive, 409
    no-kubeconfig, 404 unknown deployment, 503 no-jobs-store.
  - TestComponentsState_*: live-watcher shape, persisted-fallback
    shape, 404 unknown deployment.
  - All tests run under -race.

Verified:
  - go build ./... clean
  - go vet ./... clean
  - go test -race ./... 9 packages all pass

Refs openova-io/openova#232

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:51:20 +02:00
github-actions[bot]
357b70561e deploy: update catalyst images to 2cdc7a3 2026-04-30 06:48:18 +00:00
e3mrah
2cdc7a34ce
fix(wizard): clear stale phase1-skipped banner + backfill jobs view from live HR state (#233)
* fix(wizard): reducer clears stale phase1WatchSkipped on fresh component events

The AdminPage banner "Per-component install monitoring is unavailable for
this deployment" was sticking permanently because the reducer treated
phase1WatchSkipped as monotonic. When the SSE replay buffer carried a
single early `state: skipped` warn event from a transient `status: ready`
phase, the flag latched on — even after the deployment transitioned back
to `phase1-watching` and a healthy stream of per-component events
proved helmwatch was observing the new cluster.

The flag is now cleared by either:
  - a `phase: component` event that carries a real per-component state
    (anything except the explicit `state: skipped` no-data marker), OR
  - a `phase: deployment` event with status `phase1-watching` or
    `installing` (the API's explicit "helmwatch attached" signal).

The skipped-flag still latches on for warn/error events without a
component field (the kubeconfig-missing case) and survives unrelated
events (tofu-*, flux-bootstrap, noise) — only fresh ground-truth data
unsets it.
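
Reduced to a predicate, the clear rule reads roughly like this (field
names follow the wording above, not the actual reducer types):

  type WizardEvent = {
    phase: string
    state?: string
    status?: string
  }

  // True when an event carries fresh ground truth that unsets the flag.
  function clearsPhase1Skipped(e: WizardEvent): boolean {
    if (e.phase === 'component' && e.state !== 'skipped') return true
    return (
      e.phase === 'deployment' &&
      (e.status === 'phase1-watching' || e.status === 'installing')
    )
  }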

Updated the eventReducer tests: the old "monotonic" case is replaced by
three new tests covering each clear-rule path, plus a guard that
`state: skipped` does NOT clear the flag.

Refs openova-io/openova#232

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(wizard): backfill JobsTable from live deployments/{id}/jobs API

The Jobs view was showing every row pending despite 6 HelmReleases
being Ready=True on the live cluster. Two reasons converged:

  1. helmwatch only fires on TRANSITIONS — a HelmRelease that's
     already Ready=True at watch-attach time emits no SSE event the
     reducer can fold in.
  2. The SSE replay buffer carries old events that contradict live
     state; fresh per-component installations got masked by stale
     `state: skipped` markers.

The fix is a per-deployment polling hook that reads the canonical
backend Jobs endpoint:

  GET /api/v1/deployments/{depId}/jobs → { jobs: Job[] }

every 5s while the deployment is in flight, and merges the result with
the reducer-derived list (live data wins on conflict — same job.id).
When the live list is empty (404, network error, or the backend hasn't
populated yet), the reducer-derived list passes through unchanged.
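
The merge rule, as a sketch (mergeJobs is the helper named in the tests
below; the Job shape is reduced to an id for illustration):

  type Job = { id: string; [k: string]: unknown }

  // Live backend rows win on id conflict; reducer rows fill the rest.
  function mergeJobs(reducerJobs: Job[], liveJobs: Job[]): Job[] {
    if (liveJobs.length === 0) return reducerJobs // silent pass-through
    const liveIds = new Set(liveJobs.map((j) => j.id))
    return [...liveJobs, ...reducerJobs.filter((j) => !liveIds.has(j.id))]
  }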

Polling stops automatically when streamStatus reaches `completed` or
`failed` — by then the snapshot's componentStates already seeded every
card and further polling is wasteful.

JobsPage surfaces a small "Live state stream re-attached" banner when
backfill data is present so operators viewing a stalled-looking page
know the table is being refreshed from the backend, not the local SSE
replay.

Tests:
  - mergeJobs unit tests: empty live → reducer pass-through; conflict
    → live wins; non-overlapping ids → union; the issue-#232 symptom
    (0 reducer + 5 backend = 5 rows rendered).
  - useLiveJobsBackfill hook tests: fetcher resolves → liveJobs
    populated; enabled:false → no fetch; fetcher throws → isError:true,
    liveJobs:[] (silent fallback).
  - JobsTable render: 5 backend jobs render 5 rows with verbatim
    statuses (no demotion to pending).

The JobsPage test now wraps in a QueryClientProvider since the page
mounts useQuery() unconditionally; existing tests opt out of the
network call via the new disableJobsBackfill prop.

Refs openova-io/openova#232

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(wizard): ExecutionLogs distinguishes empty success from fetch error

The execution-logs panel was showing "Failed to load log page —
retrying" whenever the backend returned the perfectly-valid empty
shape:

  { lines: [], total: 0, executionFinished: false }

This is exactly what the catalyst-api Jobs bridge returns for jobs
whose state hasn't been recorded yet — most Phase 0 jobs fall into
this bucket until the bridge starts capturing state-transition lines.
The empty-success response was indistinguishable from the actual
failure path because both mounted the same generic placeholder, and
the error overlay rendered on every isError tick.

Three states are now distinguished:
  - isLoading            → "Connecting to log stream…"
  - !isError, no lines   → "No logs captured yet for this job."
  - executionFinished    → "Execution finished — no log lines were
                           emitted."

The error overlay (with a new explicit Retry button replacing the
implicit "retrying..." copy) only renders on a real fetch failure
(query.isError). A successful response with an empty lines array is
NOT an error and never triggers the red banner.
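
The branch order, as a sketch (state names mirror the list above; the
helper itself is illustrative):

  type LogsPanelState =
    | 'connecting'
    | 'error'
    | 'no-logs-yet'
    | 'finished-empty'
    | 'lines'

  function resolvePanelState(q: {
    isLoading: boolean
    isError: boolean
    lines: unknown[]
    executionFinished: boolean
  }): LogsPanelState {
    if (q.isLoading) return 'connecting'
    if (q.isError) return 'error' // only a real fetch failure
    if (q.lines.length > 0) return 'lines'
    return q.executionFinished ? 'finished-empty' : 'no-logs-yet'
  }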

Tests:
  - Empty success → shows "No logs captured yet"; error banner and
    retry button are both ABSENT (anti-regression for the issue-#232
    symptom).
  - Fetch throws → error banner present, retry button rendered.
  - All previous tests still pass.

Refs openova-io/openova#232

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:46:11 +02:00
e3mrah
c747fe2265
fix(bp-gitea): override postgresql to bitnamilegacy (Bitnami evacuated docker.io tags) (#231)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 08:27:49 +02:00
e3mrah
da87fb38c4
fix(bp-spire): disable ALL default-enabled clusterSPIFFEIDs (default+oidc+test-keys) (#230)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 08:13:41 +02:00
github-actions[bot]
f21d3a4676 deploy: update catalyst images to 1a4a54f 2026-04-30 06:03:50 +00:00
e3mrah
1a4a54f72e
feat(wizard): Infrastructure page (topology default + Compute/Storage/Network tabs) (#229)
* feat(ui): infrastructure.types — wire types + topology layout (#227)

Introduces the shared TypeScript contract the Infrastructure surface
consumes: TopologyNode/Edge, ComputeItem, StorageItem, NetworkItem,
fetchers keyed off API_BASE, and a deterministic layered topology
layout (cloud → region → cluster → node | lb → pvc | volume | network)
mirroring the depsLayout pattern from #206. Pure-function tests pin
the layer-by-NodeKind invariant, edge poly-line emission and
deterministic ordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): InfrastructurePage shell with 4 tabs (Topology default) (#227)

Page shell rendered at /sovereign/provision/$deploymentId/infrastructure.
Header + four-tab nav (Topology / Compute / Storage / Network) in the
canonical AppsPage tab style; active tab derived from the URL suffix
so back/forward keeps the active tab in sync. Founder spec verbatim:
"the infrastructure page must be opened by default with the topology
page" — Topology is the default and the bare URL redirects to it via
the router.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): InfrastructureTopology SVG canvas + detail panel (#227)

Topology tab — pure-SVG layered-graph canvas using the deterministic
topologyLayout. Status colour comes from canonical --color-success /
warn / danger / text-dim CSS variables. Clicking a node opens a right-rail
detail panel listing the node's metadata; closing the panel returns
to the bare canvas. Empty state shows a "Provisioning…" overlay rather
than placeholder data — the canvas is the canonical empty state until
the cluster reports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): Infrastructure Compute/Storage/Network card grids (#227)

Three card-grid tabs in the canonical .app-card visual rhythm:

  Compute  — Clusters + Worker Nodes
  Storage  — Persistent Volume Claims + Object Buckets + Block Volumes
  Network  — Load Balancers + DRGs / VPC Gateways + Peerings

Each tab fetches its slice from /api/v1/deployments/<id>/infrastructure/
<tab> with React Query, shows a section heading + count chip, renders
status-aware cards. Empty state per tab is a typographic empty card —
no placeholder data per founder spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): wire Infrastructure routes + sidebar nav item (#227)

Registers parent route /provision/$deploymentId/infrastructure with
four sub-routes (topology, compute, storage, network) plus an index
beforeLoad redirect that sends bare /infrastructure to /infrastructure/
topology. Adds the Infrastructure entry to the Sidebar nav with a
server-stack glyph distinct from Apps and Dashboard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): infrastructure REST surface (topology/compute/storage/network) (#227)

Four GET endpoints for the Sovereign Infrastructure page:

  /api/v1/deployments/{depId}/infrastructure/topology
  /api/v1/deployments/{depId}/infrastructure/compute
  /api/v1/deployments/{depId}/infrastructure/storage
  /api/v1/deployments/{depId}/infrastructure/network

Topology + Compute + Network compose from the deployment record's
Request + Result (always available post-Phase-0). Storage requires
the live cluster's kubeconfig; until that integration lands, the
handler returns the well-shaped empty response per the founder's
"no placeholder data, empty state instead" rule. JSON arrays serialise
as `[]` not `null` so the UI can iterate them safely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ui): cosmetic guards for Infrastructure tabs + redirect (#227)

Three new @cosmetic-guard tests:

  1. /infrastructure redirects to /infrastructure/topology (default tab)
  2. Tabs are exactly Topology / Compute / Storage / Network in that
     order, with Topology aria-selected by default
  3. Sidebar exposes a sov-nav-infrastructure link to /infrastructure

Each test fails LOUD with the source-file pointer the next agent must
edit, matching the existing cosmetic-guard idiom.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:01:46 +02:00
github-actions[bot]
58c7497db8 deploy: update catalyst images to 719c3ba 2026-04-30 05:53:52 +00:00
e3mrah
719c3bac35
fix(bp-spire): disable default ClusterSPIFFEID — CRD not observable in time on fresh install (#228)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 07:51:03 +02:00
github-actions[bot]
0fe0f11e5c deploy: update catalyst images to 6b8a161 2026-04-30 05:33:58 +00:00
e3mrah
6b8a161bc9
feat(ui): dashboard with Recharts treemap (cpu/memory utilization) (#226)
Adds the Sovereign Dashboard surface at
/sovereign/provision/$deploymentId/dashboard rendering a Recharts
<Treemap> where box AREA tracks the selected resource limit and box
COLOR is a continuous gradient (blue -> green -> red) over a
selectable utilisation/health/age metric. Toolbar lets the operator
pick Size, Color, and up to 4 nested Layer dimensions
(sovereign/cluster/family/namespace/application). Capacity size
metrics auto-lock the colour scale to utilisation. Drill-down walks
the in-memory tree (no refetch); breadcrumb chips pop back. Hover
yields a viewport-clamped tooltip with a deep link to AppDetail.
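
The colour mapping, sketched (breakpoints follow the unit tests listed
below; the exact RGB stops are assumptions):

  // t in [0, 1]: 0 → blue, 0.5 → green, 1 → red (linear interpolation)
  function utilisationColor(t: number): string {
    const lerp = (a: number, b: number, k: number) => Math.round(a + (b - a) * k)
    const blue: [number, number, number] = [59, 130, 246]
    const green: [number, number, number] = [34, 197, 94]
    const red: [number, number, number] = [239, 68, 68]
    const from = t < 0.5 ? blue : green
    const to = t < 0.5 ? green : red
    const k = t < 0.5 ? t * 2 : (t - 0.5) * 2
    const [r, g, b] = [0, 1, 2].map((i) => lerp(from[i], to[i], k))
    return `rgb(${r}, ${g}, ${b})`
  }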

Architecture notes baked into the code:
- Module-level callback refs (_onCellHover/_onCellClick/_activeColorFn
  /_itemsByName) are required because Recharts clones the cell content
  component without preserving React closures or hooks.
- Parent-bounds Map clips child labels under the 24px nested header
  strip so a tall narrow child can't render under its parent's title.
- Cell renderers gate label visibility on width >= 50px / height >= 24px
  to avoid noisy text on tiny cells.
- isAnimationActive=false for perf on 500+ cells.

Backend (catalyst-api):
- New GET /api/v1/dashboard/treemap?group_by=A,B&color_by=C&size_by=D
  returning the nested TreemapItem[] shape the UI consumes.
- v1 emits a static placeholder shape derived from the canonical
  Catalyst-Zero family list (20 cells across 6 families). The HTTP
  schema is the target schema; only the data SOURCE is a placeholder.
  Replacing it with metrics-server integration is a follow-up.

Tests:
- 30 colour-gradient + drill-walk unit tests in
  src/lib/treemap.types.test.ts (0%->blue, 50%->green, 100%->red,
  interpolation, walk, query string).
- 9 controller toolbar tests (add/remove layer caps, capacity-metric
  auto-lock, dimension exclusion).
- 6 Dashboard render tests (toolbar, empty state, total count,
  breadcrumb root chip).
- 6 Go handler tests (default/nested response shape, dimension/colour/
  size validation, percentage-in-range invariant).

Sidebar gets a Dashboard nav entry. Sidebar.test updated to reflect.

Vite dev proxy gains a /sovereign/api passthrough (rewrites to /api)
so dev mirrors the production traefik prefix-strip.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 07:06:13 +02:00
e3mrah
245b359057
feat(ui): theme toggle + card cosmetics (refs #179) (#225)
* feat(ui): add light/dark theme toggle in PortalShell header

Mount ThemeToggle (sun/moon icon button) in the top-right of every
PortalShell page (Sovereign Apps, Jobs, AppDetail, JobDetail). Click
flips the `data-theme` attribute on `<html>` and persists to
`localStorage['oo-theme']`, in lockstep with the existing bootstrap
script in index.html and the useTheme hook.

Light theme palette: extend [data-theme="light"] in globals.css with
peers for every console token (--color-bg, --color-bg-2, --color-text,
--color-text-strong, --color-text-dim, --color-text-dimmer,
--color-border, --color-border-strong, --color-surface,
--color-surface-hover, --color-accent, --color-accent-hover,
--color-warn, --color-danger, --color-success). All ratios are
WCAG AA-or-better against --color-bg = #ffffff:
  text-on-bg          17.85:1  AAA
  text-strong-on-bg   20.17:1  AAA
  text-dim-on-bg       7.58:1  AAA
  text-dimmer-on-bg    4.76:1  AA
  accent-on-bg         5.17:1  AA
  danger-on-bg         6.47:1  AAA
  warn-on-bg           5.02:1  AA
  success-on-bg        5.48:1  AA

Two cosmetic-guard regression tests are added:
  • theme-toggle is present in PortalShell header
  • clicking theme-toggle flips data-theme on the html element +
    persists to localStorage[oo-theme]

Refs #179.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(wizard): card description = 2 lines, + bubble floats over body

Two regressions on the StepComponents grid (#179):

1) Some cards rendered with a 1-line description because the
   .corp-comp-desc rule clamped at 2 lines but did NOT reserve 2
   lines of vertical space. Short descriptions collapsed the card
   body by ~14px and pulled the chip row (line 4) up, leaving the
   chips on a visibly ragged Y across the grid.

   Fix: add `min-height: 2.5em` to .corp-comp-desc. Computed value
   = 2.5 × 0.76rem × 16px/rem = 30.4px reserved height — every
   card now hosts 2 lines of description even when the actual copy
   is one line. Verified: chipsY identical at 523.1 / 641.5 / 759.8
   across each row of three cards on the choose-stack grid.

2) The right ¼ of every card body was effectively empty because
   the inline "+" Add button shared line 1 with the family chip,
   reserving horizontal space the description never got to use.

   Fix: lift the toggle button out of .corp-comp-body and absolute-
   position it at top: 0.5rem; right: 0.5rem; z-index: 10 so it
   OVERLAYS the description's top-right corner instead of reserving
   width. Lines 2-3 (description) now span the full body width.

Acceptance:
  • All 93 StepComponents.test.tsx unit tests still pass
  • All 17 cosmetic-guard tests still pass (16 unrelated failures
    are pre-existing on origin/main, sibling agent territory)
  • New cosmetic-guard test "every component card has min-h:108px
    and 2-line description" added (asserts webkitLineClamp === '2',
    descBox.height ≥ 26px, chip-row Y spread ≤ 2px within a row)

Refs #179.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 07:00:21 +02:00
e3mrah
d1d351b384
fix(bp-cilium): set l7Proxy=true so envoyconfig CRDs install (agent crash fix) (#224)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 06:51:13 +02:00
e3mrah
3c12995230
feat(wizard): drop top batches strip on JobsPage; new BatchDetail page with progress bar (#223)
Founder verbatim feedback (epic #204 item #4, addresses #222):
> "On the jobs page the top 3 cards are not required, the progress bar
>  needs to be shown only when I click a specific batch and it shows
>  the batch page along with its batch progress at the top"

Changes:
- JobsPage no longer renders the 3-card BatchProgress strip. Heading
  + tagline now sits directly above the JobsTable.
- JobsTable batch chip is now an `<a>` Link to /batches/$batchId; adds
  `initialBatchFilter` prop (mirrors `appIdFilter`) that hides the
  Batch dropdown when set.
- BatchProgress component grows a `singleBatch?: Batch` mode that
  renders ONE large card: batch label, finished/total summary, big
  progress bar, and 5 status tiles (Running / Pending / Succeeded /
  Failed / Total).
- New BatchDetail page at /provision/$deploymentId/batches/$batchId:
  back-link to Jobs, breadcrumb, batch title, single-batch progress
  card, and JobsTable filtered to that batch's rows.
- Router registers /batches/$batchId.
- JobsPage.test grows two cosmetic-guard cases:
  * does NOT render the per-batch progress strip
  * batch chip in a row is an <a> linking to /batches/...
- BatchDetail.test covers chrome, single-batch progress card,
  filtered JobsTable, hidden Batch dropdown, and not-found state.

Tests: 311 passed (20 files). Typecheck + build clean.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:44:09 +02:00
e3mrah
0eadc8d7b1
fix(bootstrap-kit): 15m install timeout (cert-manager exceeds 5m default) (#221)
* fix(bootstrap-kit): drop disableTakeOwnership — field not in HelmRelease v2 schema

* fix(bootstrap-kit): add 15m install/upgrade timeout — cert-manager + heavier charts exceed 5m default

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 06:29:37 +02:00
e3mrah
1689ffcd1a
fix(bp-coraza,bp-syft-grype): add common library subchart to satisfy hollow-chart gate (#220)
Both charts are scratch (no upstream Helm chart published — Coraza
project + anchore/syft+grype CLIs ship containers only). The
blueprint-release.yaml hollow-chart gate (issue #181) rejects charts
with zero declared dependencies. Adding sigstore/common as a tiny
library subchart satisfies the gate; common is a library type so it
contributes zero runtime resources to either chart's rendered output.

The Catalyst-side templates (Deployment+Service for bp-coraza,
CronJob+PVC for bp-syft-grype) remain entirely in templates/ — the
library dep is purely a CI-gate mechanism, NOT a functional dependency.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 06:15:28 +02:00
e3mrah
1a2c6ae146
fix(bootstrap-kit): drop disableTakeOwnership — field not in HelmRelease v2 schema (#219)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 06:12:34 +02:00
e3mrah
4b3376ff48
fix(clusters): seed otech.omani.works tree (temp diag — canonical fix in #216) (#217)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 06:10:37 +02:00
e3mrah
3a57e287e5
feat(platform): security umbrellas (falco/kyverno/trivy/sigstore/syft-grype/reloader/coraza/litmus) (#216)
* feat(bp-falco): umbrella chart for security layer

Catalyst Blueprint umbrella chart for falco — security/policy layer.

Pinned upstream + appVersion verified against the helm index on
2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2.
Solo-Sovereign defaults; per-Sovereign overlays bump to HA later.

Part of security-stack umbrellas batch 3.

* feat(bp-kyverno): umbrella chart for security layer

Catalyst Blueprint umbrella chart for kyverno — security/policy layer.

Pinned upstream + appVersion verified against the helm index on
2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2.
Solo-Sovereign defaults; per-Sovereign overlays bump to HA later.

Part of security-stack umbrellas batch 3.

* feat(bp-trivy): umbrella chart for security layer

Catalyst Blueprint umbrella chart for trivy — security/policy layer.

Pinned upstream + appVersion verified against the helm index on
2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2.
Solo-Sovereign defaults; per-Sovereign overlays bump to HA later.

Part of security-stack umbrellas batch 3.

* feat(bp-sigstore): umbrella chart for security layer

Catalyst Blueprint umbrella chart for sigstore — security/policy layer.

Pinned upstream + appVersion verified against the helm index on
2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2.
Solo-Sovereign defaults; per-Sovereign overlays bump to HA later.

Part of security-stack umbrellas batch 3.

* feat(bp-syft-grype): umbrella chart for security layer

Catalyst Blueprint umbrella chart for syft-grype — security/policy layer.

Pinned upstream + appVersion verified against the helm index on
2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2.
Solo-Sovereign defaults; per-Sovereign overlays bump to HA later.

Part of security-stack umbrellas batch 3.

* feat(bp-reloader): umbrella chart for security layer

Catalyst Blueprint umbrella chart for reloader — security/policy layer.

Pinned upstream + appVersion verified against the helm index on
2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2.
Solo-Sovereign defaults; per-Sovereign overlays bump to HA later.

Part of security-stack umbrellas batch 3.

* feat(bp-coraza): umbrella chart for security layer

Catalyst Blueprint umbrella chart for coraza — security/policy layer.

Pinned upstream + appVersion verified against the helm index on
2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2.
Solo-Sovereign defaults; per-Sovereign overlays bump to HA later.

Part of security-stack umbrellas batch 3.

* feat(bp-litmus): umbrella chart for security layer

Catalyst Blueprint umbrella chart for litmus — security/policy layer.

Pinned upstream + appVersion verified against the helm index on
2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2.
Solo-Sovereign defaults; per-Sovereign overlays bump to HA later.

Part of security-stack umbrellas batch 3.

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 06:07:38 +02:00
e3mrah
75128781b3
feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214)
* feat(bp-grafana): umbrella chart for observability stack

Catalyst Blueprint umbrella for Grafana — visualization layer of the
LGTM observability stack (Loki/Grafana/Tempo/Mimir).

Pinned to grafana/grafana 10.5.15 (appVersion 12.3.1) — current stable
on 2026-04-29. Solo-Sovereign defaults: 1 replica, 10Gi PVC,
ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2.

Part of issue #204 observability-stack umbrellas batch.

* feat(bp-loki): umbrella chart for observability stack

Catalyst Blueprint umbrella for Grafana Loki — log aggregation backend
of the LGTM stack. SingleBinary mode by default (solo-Sovereign min);
SimpleScalable/Distributed are values toggles.

Pinned to grafana/loki 7.0.0 (appVersion 3.6.7) on 2026-04-29.
Filesystem storage default; SeaweedFS S3 wiring is per-Sovereign overlay
when scaling out. All observability toggles default false per
BLUEPRINT-AUTHORING.md §11.2.

Part of issue #204 observability-stack umbrellas batch.

* feat(bp-mimir): umbrella chart for observability stack

Catalyst Blueprint umbrella for Grafana Mimir — metrics storage tier of
the LGTM stack.

Pinned to grafana/mimir-distributed 6.0.6 (appVersion 3.0.4) on
2026-04-29. Solo-Sovereign defaults: every component scaled to 1
replica, zoneAwareReplication disabled, Kafka ingest-storage disabled.
Bundled MinIO kept enabled as a stop-gap so the chart renders;
SeaweedFS S3 wiring is per-Sovereign overlay. All metaMonitoring
toggles default false per BLUEPRINT-AUTHORING.md §11.2.

Part of issue #204 observability-stack umbrellas batch.

* feat(bp-tempo): umbrella chart for observability stack

Catalyst Blueprint umbrella for Grafana Tempo — distributed tracing
backend of the LGTM stack. Single-binary mode by default
(solo-Sovereign min); microservice mode (tempo-distributed) is a chart
swap toggle.

Pinned to grafana/tempo 1.24.4 (appVersion 2.9.0) on 2026-04-29. Local
PVC storage default; SeaweedFS S3 wiring is per-Sovereign overlay.
Metrics generator disabled by default (depends on bp-mimir).
ServiceMonitor default false per BLUEPRINT-AUTHORING.md §11.2.

Part of issue #204 observability-stack umbrellas batch.

* feat(bp-alloy): umbrella chart for observability stack

Catalyst Blueprint umbrella for Grafana Alloy — unified telemetry
collector for the LGTM stack (logs, metrics, traces; OTLP-native).

Pinned to grafana/alloy 1.8.0 (appVersion v1.16.0) on 2026-04-29.
DaemonSet controller default (one Alloy per node) so node + container
telemetry work out of the box. Empty Alloy config by default;
per-Sovereign overlays populate forwarders to bp-loki/bp-mimir/bp-tempo
once those reconcile. ServiceMonitor + ingress + CRDs default false per
BLUEPRINT-AUTHORING.md §11.2.

Part of issue #204 observability-stack umbrellas batch.

* feat(bp-opentelemetry): umbrella chart for observability stack

Catalyst Blueprint umbrella for the OpenTelemetry Collector — vendor-
neutral telemetry collector. Sibling to bp-alloy; per-Sovereign overlays
choose one.

Pinned to open-telemetry/opentelemetry-collector 0.152.0 (appVersion
0.150.1) on 2026-04-29. Uses the contrib distribution
(otel/opentelemetry-collector-contrib:0.150.1) so Loki/Mimir/Tempo
exporters are bundled. Deployment mode default (1 replica); DaemonSet
+ StatefulSet are values toggles. All presets default false; ingress
+ ServiceMonitor + PodMonitor + PrometheusRule + NetworkPolicy default
false per BLUEPRINT-AUTHORING.md §11.2.

Part of issue #204 observability-stack umbrellas batch.

* feat(bp-langfuse): umbrella chart for observability stack

Catalyst Blueprint umbrella for Langfuse — LLM observability platform.
Complements bp-grafana (infrastructure metrics) with AI-specific
telemetry (traces, evaluations, prompts, cost attribution).

Pinned to langfuse/langfuse 1.5.28 (appVersion 3.171.0) on 2026-04-29.

Catalyst convention: ALL bundled Bitnami subcharts are disabled —
PostgreSQL via cnpg.io/Cluster (bp-cnpg), Redis via bp-valkey,
ClickHouse via bp-clickhouse, S3 via bp-seaweedfs. Per-Sovereign
overlays wire external endpoints + Secret references. Telemetry to
Langfuse Inc. defaulted false; signUpDisabled defaulted true.

Part of issue #204 observability-stack umbrellas batch.

* feat(bp-velero): umbrella chart for observability stack

Catalyst Blueprint umbrella for Velero — Kubernetes-native backup and
disaster recovery. Per platform/velero/README.md, ALL Velero output
goes to SeaweedFS (Catalyst's unified S3 encapsulation), which
transitions to a cloud archival backend on the cold tier.

Pinned to vmware-tanzu/velero 12.0.1 (appVersion 1.18.0) on 2026-04-29.
Bundled velero-plugin-for-aws:v1.14.0 init container so SeaweedFS S3 is
reachable. backupsEnabled/snapshotsEnabled defaulted false at this
layer (placeholders for backupStorageLocation); per-Sovereign overlays
flip on after wiring SeaweedFS endpoint + credentials. ServiceMonitor +
PodMonitor + PrometheusRule default false per BLUEPRINT-AUTHORING.md
§11.2.

Part of issue #204 observability-stack umbrellas batch.

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-29 22:11:04 +02:00
github-actions[bot]
4215cdf49c deploy: update catalyst images to 2d0ea7c 2026-04-29 19:43:32 +00:00
e3mrah
2d0ea7c750
feat(wizard): replace jobs accordion with table view + batches + AppDetail Jobs tab (refs #204) (#211)
The founder rejected the expand-in-place job accordion verbatim — "NEVER
use accordions anywhere — the wizard filled them everywhere for jobs.
Unacceptable." (issue #204 comment 0). This rebuild replaces the
canonical core/console JobsPage.svelte port with a table-view that
matches the founder's verbatim spec for items 1, 2, 4, 6, 7, 8a, 8b, 10.

Frontend changes:
- New JobsTable.tsx — Tailwind + TanStack table with seven columns
  (Name / App / Deps / Batch / Status / Started / Duration), search
  input, status/app/batch filter dropdowns, and a default sort that
  honours item #10 (status priority running > pending > succeeded >
  failed, then startedAt DESC).
- New BatchProgress.tsx — per-batch progress strip rendered above the
  table (item #4: "Jobs in groups → batches with overall progress bar
  based on finishing count").
- Rewritten JobsPage.tsx — now mounts <BatchProgress /> + <JobsTable />
  in place of the per-row JobCard accordion. Existing reducer-derived
  Job model is adapted to the flat row shape via new jobsAdapter.ts so
  the live SSE event stream still populates the table.
- Modified AppDetail.tsx — Jobs section now exposes a tablist ([Jobs |
  Dependencies]) with the Jobs tab selected by default (item #9 +
  #8b: AppDetail → Jobs tab filtered to that app's jobs only). The
  remaining canonical sections (About / Connection / Bundled deps /
  Tenant / Configuration) keep their h2/h3 layout — only the bottom
  Jobs section was tabbed.
- Deleted JobCard.tsx + JobCard.test.tsx — the accordion row is gone.
- New router stub for /provision/<id>/jobs/<jobId> so the table's row
  link resolves; full page is owned by the JobDetail sibling agent.
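
A sketch of the default sort described in the JobsTable bullet above
(status ranks and the startedAt tiebreak follow item #10; the row shape
is trimmed to the fields the comparator needs):

  type JobStatus = 'running' | 'pending' | 'succeeded' | 'failed'
  type JobRow = { status: JobStatus; startedAt?: string }

  const STATUS_RANK: Record<JobStatus, number> = {
    running: 0,
    pending: 1,
    succeeded: 2,
    failed: 3,
  }

  // Status priority first, then most recently started first.
  function compareJobs(a: JobRow, b: JobRow): number {
    const byStatus = STATUS_RANK[a.status] - STATUS_RANK[b.status]
    if (byStatus !== 0) return byStatus
    return (b.startedAt ?? '').localeCompare(a.startedAt ?? '')
  }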

Contract:
- New src/lib/jobs.types.ts exports { Job, Batch, JobStatus } per the
  contract the backend sibling agent on #205 will emit on
    GET /api/v1/deployments/{depId}/jobs
    GET /api/v1/deployments/{depId}/jobs/batches
- New src/test/fixtures/jobs.fixture.ts has 8 jobs across 2 batches
  with every status bucket represented; reusable across sibling test
  surfaces.

Tests:
- 4 new cosmetic-guard e2e tests (cosmetic-guards.spec.ts):
    1. data-testid="jobs-table" exists; legacy job-row-/job-expansion-
       testids are gone.
    2. Table headers are name / app / deps / batch / status / started /
       duration in that order.
    3. Typing in jobs-search filters the row count.
    4. AppDetail page has a tab labelled "Jobs".
- New JobsTable.test.tsx — unit coverage for compareJobs (status
  priority, startedAt DESC, pending-jumps-to-top tiebreak), matchJob
  (search predicate spans jobName/appId/dependsOn/status/batchId),
  formatDuration ("12s" / "1m 24s" / "2h 5m"), and the rendered
  surface (search/status filter/appIdFilter/columns/row link).
- New BatchProgress.test.tsx — empty state, per-batch render, aria
  progressbar, failed-chip visibility, deriveBatches helper.
- Updated JobsPage.test.tsx + AppDetail.test.tsx to assert the new
  table/tab shape and that no legacy accordion testids remain.
- Updated cosmetic-guard test 13 (AppDetail layout) to permit the
  founder-requested Jobs tab while still banning the retired
  Logs/Status/Overview tab vocabulary.

Verification:
- `npm test` → 16 files / 265 tests, all green.
- `npm run typecheck` → clean.
- `npm run build` → vite build produces the production bundle.
- Playwright MCP at 1440px screenshots saved under
  .playwright-mcp/jobs-table-rework/ (JobsPage populated, search
  filtered, BatchProgress strip, AppDetail Jobs tab).

Founder items addressed: 1, 2, 4, 6, 7, 8a, 8b, 10.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 21:41:31 +02:00
e3mrah
fad36836ed
fix(ci): tempo + ntfy logos are now .svg (logo-fix-batch-2) (#213)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-29 21:41:29 +02:00
e3mrah
896dc041d7
feat(wizard): job dependencies SVG DAG + (stretch) timeline view (closes #206) (#212)
Implements item 11 of epic #204 — job dependency visualisation. Ships
both a primary surface and a stretch surface per the proposal at
`docs/proposals/jobs-dependencies-viz.md`:

  PRIMARY  — pure-SVG topological-layered DAG inline on the JobDetail
             Dependencies tab. Color-coded by status (succeeded / running
             / failed / pending), click-a-node to navigate, keyboard
             accessible (Enter / Space). 350-450px clamp.

  STRETCH  — fullscreen Gantt timeline at /sovereign/provision/$id/jobs/
             timeline. One row per job, bars from startedAt → finishedAt
             (or now if running), nice-tick time axis, hover tooltips.

New files:

  • src/widgets/job-deps-graph/JobDependenciesGraph.tsx — SVG DAG widget,
    structurally typed against any Job-like shape so it works with both
    today's jobs.ts model and the evolved-but-not-yet-merged backend
    contract from #205.
  • src/shared/lib/depsLayout.ts — pure topological-layered layout. Kahn
    topo-sort + cycle break + within-layer sort by descendant count.
    Zero external graph deps (no reactflow / cytoscape / d3-dag — per
    the issue hard rules + INVIOLABLE-PRINCIPLES.md #2).
  • src/pages/sovereign/JobsTimeline.tsx — Gantt page chrome.
  • src/test/fixtures/deps-graph.fixture.ts — shared mock graph for
    sibling agents per the contract in the epic.
  • src/pages/designs/JobsDepsVizDemo.tsx — visual lock-in surface at
    /sovereign/designs/jobs-deps-viz for reviewer eye-checks.
  • docs/proposals/jobs-dependencies-viz.md — recommendation rationale.

Integration into the merged JobDetail surface (PR #208):

  • src/components/JobDependencies.tsx — replaces the placeholder list-
    only surface with DAG-on-top-of-list. List preserved for keyboard
    accessibility + screen readers.

Tests:

  • depsLayout.test.ts (15 cases): topo order, no overlap, cycle break,
    unknown-id drop, custom options, edge geometry.
  • JobDependenciesGraph.test.tsx (8 cases): render counts, status data
    attribute, click + keyboard handlers, height clamp.

Cosmetic verification: 4 screenshots at 1440px under
`.playwright-mcp/jobs-deps-viz/` showing the DAG (5-job + 3-node
fixtures), the integrated Dependencies tab on a real JobDetail page,
and the Gantt timeline route.

Refs #204
Closes #206

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 21:40:43 +02:00