Commit Graph

634 Commits

Author SHA1 Message Date
hatiyildiz
876d5e170b test(catalyst-ui): Playwright E2E for Cloud accordion + redirects
Adds e2e/cloud-nav.spec.ts — 7 Playwright assertions that lock in
the Sovereign-portal Cloud accordion contract from issue #309:

  1. Sidebar exposes Cloud (not Infrastructure) accordion.
  2. Clicking the Cloud header toggles expanded state and reveals 4
     sub-items (Architecture / Compute / Network / Storage).
  3. Each sub-item routes to /provision/$id/cloud/{suffix} and
     declares aria-current=page when active.
  4. Legacy /infrastructure/* paths redirect to /cloud/* equivalents.
  5. Expanded state persists across page reloads via the
     `sov-nav-cloud-expanded` localStorage key.
  6. Accordion auto-expands when the operator deep-links onto a
     /cloud/* route.
  7. Captures three 1440x900 screenshots (collapsed, expanded with
     Architecture active, expanded with Compute active) under
     e2e/screenshots/p1-cloud-nav-*.png for visual evidence.

Also fixes a Sidebar bug surfaced by the e2e run: the active-section
detector was using `pathname.includes('/cloud')`, which would falsely
flag any deploymentId containing the substring "cloud" as being on a
/cloud/* route. Replaced with a path-segment regex.
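The substring bug and the segment-aware replacement can be sketched as follows (helper names are illustrative, not the actual Sidebar code):

```typescript
// Old check: substring match — a deploymentId like "cloudy123" makes the
// full path contain "/cloud", so the Cloud section lights up incorrectly.
const naiveIsCloud = (pathname: string): boolean => pathname.includes('/cloud');

// Fixed check: "/cloud" must be a whole path segment, i.e. followed by
// another "/" or the end of the path.
const isCloudRoute = (pathname: string): boolean => /\/cloud(\/|$)/.test(pathname);
```

With `/provision/cloudy123/jobs` the naive check reports true while the segment check correctly reports false; both accept `/provision/abc/cloud/compute`.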

Adds e2e/screenshots/ to .gitignore (regenerated each run, never
committed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:08:45 +04:00
hatiyildiz
4ba99525f1 feat(catalyst-ui): rename InfrastructureTopology/Compute/Network/Storage files + testids
Renames the four Sovereign-Cloud sub-page files + classes + testids
(issue #309). The component contents stay otherwise unchanged in P1
— the force-graph rewrite (P2) and per-resource list pages (P3) are
separate phases.

Renames:
  InfrastructureTopology.tsx → Architecture.tsx
  InfrastructureTopology  → Architecture
  InfrastructureCompute.tsx → CloudCompute.tsx
  InfrastructureCompute   → CloudCompute
  InfrastructureNetwork.tsx → CloudNetwork.tsx
  InfrastructureNetwork   → CloudNetwork
  InfrastructureStorage.tsx → CloudStorage.tsx
  InfrastructureStorage   → CloudStorage

Testid prefix renames (data-testid + FlatTable testId props):
  infrastructure-topology-* → cloud-architecture-*
  infrastructure-compute-*  → cloud-compute-*
  infrastructure-network-*  → cloud-network-*
  infrastructure-storage-*  → cloud-storage-*
  infrastructure-pools-*    → cloud-pools-*
  infrastructure-pool-row-* → cloud-pool-row-*
  infrastructure-nodes-*    → cloud-nodes-*
  infrastructure-node-row-* → cloud-node-row-*
  infrastructure-pvcs-*     → cloud-pvcs-*
  infrastructure-pvc-row-*  → cloud-pvc-row-*
  infrastructure-buckets-*  → cloud-buckets-*
  infrastructure-bucket-row-* → cloud-bucket-row-*
  infrastructure-volumes-*  → cloud-volumes-*
  infrastructure-volume-row-* → cloud-volume-row-*
  infrastructure-lbs-*      → cloud-lbs-*
  infrastructure-lb-row-*   → cloud-lb-row-*
  infrastructure-peerings-* → cloud-peerings-*
  infrastructure-peering-row-* → cloud-peering-row-*
  infrastructure-firewalls-* → cloud-firewalls-*
  infrastructure-firewall-row-* → cloud-firewall-row-*
  infra-edge-*              → cloud-edge-*
  infra-node-*              → cloud-node-*
  infra-topology-arrow      → cloud-architecture-arrow

Modal testids (`infrastructure-modal-*`) are out of scope for P1 and
keep their current shape — those modal components are reused beyond
the Cloud surface.

Architecture sub-page user-visible strings updated:
  "Loading topology…" → "Loading architecture…"
  "Couldn't load topology" → "Couldn't load architecture"
  "Topology will appear here..." → "The cloud architecture will appear here..."
  aria-label: "Sovereign infrastructure topology" → "Sovereign cloud architecture"

Router imports + component references switched to the renamed
exports. Test files updated alongside.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:08:45 +04:00
hatiyildiz
344a8009df feat(catalyst-ui): redirect /infrastructure/* → /cloud/*
Converts every legacy /provision/$deploymentId/infrastructure/* path
into a beforeLoad redirect that targets the equivalent /cloud/* route,
preserving the $deploymentId param so deep links and bookmarks land
on the renamed surface without an extra hop:

  /infrastructure                    → /cloud/architecture
  /infrastructure/topology           → /cloud/architecture
  /infrastructure/compute            → /cloud/compute
  /infrastructure/network            → /cloud/network
  /infrastructure/storage            → /cloud/storage

The redirect routes still register tanstack-router components (a
no-op stub), because the route node must exist for the path to match
before `beforeLoad` fires.
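The path mapping itself reduces to a small table; a dependency-free sketch (function and table names are assumptions — the real code lives inside the tanstack-router route definitions):

```typescript
// Legacy suffix → cloud suffix, per the table above.
const LEGACY_TO_CLOUD: Record<string, string> = {
  '': 'architecture',
  topology: 'architecture',
  compute: 'compute',
  network: 'network',
  storage: 'storage',
};

// Builds the redirect target, preserving the $deploymentId param.
function cloudRedirectTarget(deploymentId: string, legacySuffix: string): string {
  const suffix = LEGACY_TO_CLOUD[legacySuffix] ?? 'architecture';
  return `/provision/${deploymentId}/cloud/${suffix}`;
}
```

In the actual routes this logic runs inside `beforeLoad`, which throws a redirect so the legacy URL never renders.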

Updates the cosmetic-guard suite to assert the new redirect
behaviour + the new sidebar shape (sov-nav-cloud accordion replacing
the flat sov-nav-infrastructure entry). The original `infrastructure
page` describe block is replaced by a tighter `cloud section` one
that focuses on structural surface contract; deeper accordion
behaviour is owned by the new cloud-nav.spec.ts (added in a
subsequent commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:08:45 +04:00
hatiyildiz
9b47b44cf6 feat(catalyst-ui): sidebar accordion under Cloud + persist expand state
Replaces the flat Infrastructure entry in the Sovereign sidebar with a
Cloud accordion (issue #309). The four sub-pages — Architecture,
Compute, Network, Storage — render as indented entries under the Cloud
header instead of as an in-page tab strip.

Behavior:
  - Cloud header is a <button> (not a Link) that toggles the
    accordion. Active when on any /cloud/* (or legacy /infrastructure/*)
    route.
  - Sub-items are tanstack-router <Link>s targeting
    /provision/$deploymentId/cloud/{architecture,compute,network,storage}.
    Active sub-item carries aria-current="page".
  - Auto-expanded by default when the operator is on a /cloud/* route.
  - Persists expand state in localStorage under
    `sov-nav-cloud-expanded` so it survives page reloads.
  - ARIA: aria-expanded + aria-controls on the header; the sub-list
    is role="group" with the matching id (sov-nav-cloud-group).
  - Keyboard accessible: Enter / Space toggle the accordion.
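The persistence half of this behavior can be sketched as two small helpers (the key is the one named above; helper names and the try/catch guard are assumptions):

```typescript
const STORAGE_KEY = 'sov-nav-cloud-expanded';

// Minimal storage surface so the helpers work with window.localStorage
// or any test double.
type KVStore = { getItem(k: string): string | null; setItem(k: string, v: string): void };

function readExpanded(store: KVStore): boolean {
  try {
    return store.getItem(STORAGE_KEY) === 'true';
  } catch {
    return false; // storage blocked (private mode, SSR) — default collapsed
  }
}

function writeExpanded(store: KVStore, expanded: boolean): void {
  try {
    store.setItem(STORAGE_KEY, String(expanded));
  } catch {
    // best effort — losing persistence only loses a convenience
  }
}
```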

Test IDs:
  sov-nav-cloud (header), sov-nav-cloud-toggle (chevron),
  sov-nav-cloud-architecture, sov-nav-cloud-compute,
  sov-nav-cloud-network, sov-nav-cloud-storage (sub-items),
  sov-nav-cloud-group (group container).

Issue #309 founder verbatim:
  "have accordion menu under cloud left pane"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:08:45 +04:00
hatiyildiz
4b4241a7e3 feat(catalyst-ui): rename InfrastructurePage→CloudPage, drop tab strip
Renames the Sovereign Cloud shell + replaces the in-page Topology /
Compute / Storage / Network tab strip with a future sidebar accordion.
The sub-page contents are unchanged in this commit (they keep their
file names + testids; the next commits rename those).

Changes:
  - InfrastructurePage.tsx → CloudPage.tsx (file + class + context).
  - InfrastructureContext / useInfrastructure() → CloudContext /
    useCloud() — sub-pages updated to pull from the renamed hook.
  - Page header "Infrastructure" → "Cloud"; tagline rewritten so it no
    longer enumerates the legacy tab labels.
  - Drop INFRA_TABS, resolveActiveTab, the <nav role=tablist> block,
    and the .tabs / .tab CSS rules. The sidebar accordion (next
    commit) replaces the in-page navigation.
  - data-testid renames: infrastructure-page → cloud-page,
    infrastructure-title → cloud-title,
    infrastructure-content → cloud-content,
    infrastructure-sovereign-switcher → cloud-sovereign-switcher.
  - Compute table cluster-link target updated from /topology →
    /cloud/architecture so it lands on the renamed canvas route.
  - InfrastructurePage.test.tsx renamed; tab-strip assertions
    converted into "tab strip is absent" assertions.
  - Sub-page test fixtures updated to mount under /cloud/* paths.

Issue #309 founder verbatim:
  "we call it as cloud maybe"
  "have accordion menu under cloud left pane"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:08:45 +04:00
hatiyildiz
c007bc41e0 feat(catalyst-ui): add /cloud/* routes alongside /infrastructure/*
Adds the new Sovereign-portal Cloud surface routing tree (issue #309)
without removing the legacy /infrastructure/* paths yet:

  /provision/$deploymentId/cloud                  → CloudPage shell
    ↳ /                                            → redirect to /architecture
    ↳ /architecture                                → Architecture canvas
    ↳ /compute                                     → CloudCompute
    ↳ /network                                     → CloudNetwork
    ↳ /storage                                     → CloudStorage

Both /infrastructure/* and /cloud/* now resolve to the same components.
Subsequent commits will rename the components, drop the in-page tab
strip, switch the sidebar to an accordion, and convert /infrastructure/*
into redirects to /cloud/*.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:08:45 +04:00
e3mrah
23b0d648fd
docs(lessons-learned): helm-controller RBAC + parse behavior — from #338, #340 (#343)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-01 08:02:41 +04:00
e3mrah
b8d7a8b9cf
fix(bp-seaweedfs): disable global.enableSecurity to avoid fromToml on helm-controller v1.1.0 (#339)
Upstream seaweedfs/seaweedfs templates/shared/security-configmap.yaml
uses Helm template fromToml; helm-controller v1.1.0's bundled helm SDK
(v3.x older than 3.13) doesn't define fromToml so the install fails:
  parse error at security-configmap.yaml:21: function fromToml not defined
Setting global.seaweedfs.enableSecurity: false skips the entire template.
The internal SeaweedFS API is cluster-IP only on Sovereign-1, so deferring
chart-level security until helm-controller is bumped is acceptable.
Bumped 1.0.0 → 1.0.1.
Unblocks the chain: bp-loki, bp-mimir, bp-tempo, bp-velero, bp-harbor,
bp-grafana all dependsOn bp-seaweedfs.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 23:42:43 +04:00
e3mrah
9554be4a5e
fix(bp-external-secrets): gate ClusterSecretStore on CRD presence + drop delete-policy (#337)
The chart's post-install hook was failing on otech.omani.works:
  failed post-install: unable to build kubernetes object for deleting hook
  bp-external-secrets/templates/clustersecretstore-vault-region1.yaml:
  resource mapping not found for kind ClusterSecretStore in version
  external-secrets.io/v1beta1
Two corrections:
1. Capabilities-gate the entire template — don't render unless the
   ClusterSecretStore CRD is registered (it ships with the upstream
   ESO subchart but isn't live on first install)
2. Remove 'before-hook-creation' delete-policy (was the actual trigger
   for the 'deleting hook' failure path)
Bumped 1.0.0 → 1.0.1.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 23:31:24 +04:00
e3mrah
2de8bb68b9
fix(ci): bump helm 3.16.3 → 3.18.4 in blueprint-release — fixes seaweedfs smoke-render (#336)
'function fromToml not defined' error on bp-seaweedfs publish.
Upstream seaweedfs/seaweedfs 4.22.0 (templates/shared/security-configmap.yaml:21)
uses fromToml, available from Helm 3.13+, but the smoke step's render
context also needs newer Sprig functions shipped in 3.18+. The bump unblocks the
chain of HRs (bp-loki, bp-mimir, bp-tempo, bp-velero, bp-harbor, bp-grafana)
all blocked on bp-seaweedfs publish.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 23:27:45 +04:00
github-actions[bot]
2261b89289 deploy: update catalyst images to 4f80be2 2026-04-30 19:17:23 +00:00
e3mrah
4f80be232a
fix(catalyst-ui): ExecutionLogs uses API_BASE so /api/ → /sovereign/api/ routes correctly (#305 follow-up 4) (#332)
Pre-existing bug exposed by #305: ExecutionLogs fetched
`/api/v1/actions/executions/{id}/logs` directly instead of going
through API_BASE (`${BASE}api`). Under Vite's `/sovereign/` base path,
the Traefik ingress only routes `/sovereign/api/...` — bare `/api/...`
returns 404.

Live evidence after #328 (jobId raw colon fix):
  GET /sovereign/api/v1/deployments/.../jobs/{id} → 200  (FE rewire OK)
  GET /api/v1/actions/executions/{realExecId}/logs → 404 (this bug)

Note that the executionId in the failing URL is a real 32-char hex
(5f59cb0bc9df2c720b4cf07989e4dc4f), not the synthetic `:latest` —
proving the rewire in #307 + the colon fix in #328 both worked. Only
the logs URL prefix remained wrong.

Fix: import API_BASE; use `${API_BASE}/v1/actions/executions/...`.
Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode URLs in app
source) — the original direct `/api/...` was a violation that this
PR settles permanently.
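The shape of the fix, with the base-path values from this message (a sketch — the real `API_BASE` module is shared across catalyst-ui):

```typescript
const BASE = '/sovereign/';    // Vite base path, per the commit
const API_BASE = `${BASE}api`; // '/sovereign/api'

// After the fix every request goes through API_BASE; the old bare
// '/api/...' prefix never matched Traefik's routed prefix and 404'd.
function executionLogsUrl(executionId: string): string {
  return `${API_BASE}/v1/actions/executions/${executionId}/logs`;
}
```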

Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:15:29 +04:00
e3mrah
aa77537be1
fix(catalyst-ui): Flow — pipeline spacing, click highlight, no standalone /flow (#333)
Five operator-spec corrections:

1. More structured (pipeline-like)
   forceX strength 0.32 → 0.55. Same-depth siblings now cluster around
   their depth column; pipeline-y horizontal feel preserved.

2. Min spacing between bubbles + smaller bubbles
   NODE_RADIUS 30 → 22 (more breathing room).
   COLLIDE_PADDING 6 → 14 (forces wider gap regardless of zoom).

3. Hard MAX bubble size — no more elephant in batch view
   Auto-fit viewBox now enforces a MIN viewBox size (1200×700). Single-
   bubble or few-bubble cases (batch detail, etc.) keep the canvas at
   that minimum so the bubble can't scale up to fill the whole screen.
   bbox is centered within the (possibly larger) viewBox.

4. Click highlight — selected node + neighbors + connecting edges
   • openJobId node: amber outer ring (4px) + amber glow halo
   • Direct neighbors: lighter-amber ring (3px) + softer halo
   • Edges connecting selected node: amber stroke 2.6px + amber arrow
   • Non-selected non-neighbor nodes: dimmed to opacity 0.35
   • Status fill kept (so we still see succeeded/failed/running/pending)
   The amber palette is distinct from any status colour so selection
   reads clearly even on running (cyan) or failed (red) bubbles.

5. Remove standalone /flow route + 'Show as Flow' button
   Operator: 'we cannot hard code a specific flow, we'll have multiple
   flows, therefore we should show the flows only under the respective
   jobs.' Removed:
   • provisionFlowRoute from router.tsx
   • 'Show as Flow' button from JobsPage.tsx
   • JobsTable batch chip retargeted from /flow?scope=batch:<id> to the
     canonical /batches/ page (which embeds the flow internally)
   FlowPage component preserved — it's still embedded inside JobDetail
   and BatchDetail as the in-context Flow tab.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 23:13:56 +04:00
github-actions[bot]
eeabe26dbe deploy: update catalyst images to 8c884a8 2026-04-30 19:08:16 +00:00
e3mrah
8c884a8988
fix(catalyst-ui): JobDetail fetches /jobs/{id} with RAW colon, not %3A (#305 follow-up 3) (#328)
encodeURIComponent rewrites `:` as `%3A` when applied to a path
segment. Chi's router does NOT decode %3A before
matching the route, so every JobDetail fetch returned 404 against the
catalyst-api.

Live evidence (Playwright network log on otech wizard, 2026-04-30):

  GET https://console.openova.io/sovereign/api/v1/deployments/
      ce476aaf80731a46/jobs/ce476aaf80731a46%3Ainstall-seaweedfs
  → 404

Internal probe with the raw colon:

  wget http://localhost:8080/api/v1/deployments/.../jobs/
       ce476aaf80731a46:install-seaweedfs
  → 200

Result on the live deployment: every JobDetail page rendered the
"Execution metadata pending" placeholder even though the catalyst-api
DID have a valid execution to surface. Bug is in the FE encoder, not
the backend or the route.

Fix:
  - useJobDetail inserts jobId raw into the URL template. The colon
    is RFC 3986 path-safe so this is correct per spec.
  - deploymentId stays encodeURIComponent'd defensively (it's a hex
    string, no-op in practice, but the encode is cheap insurance).
  - Test now asserts the URL contains the raw `:` and rejects %3A.
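A minimal reproduction of the encoder behaviour (IDs copied from the evidence above):

```typescript
const deploymentId = 'ce476aaf80731a46';
const jobId = 'ce476aaf80731a46:install-seaweedfs';

// Broken: encodeURIComponent escapes ':' to '%3A', which Chi never
// decodes before route matching.
const brokenUrl = `/api/v1/deployments/${encodeURIComponent(deploymentId)}/jobs/${encodeURIComponent(jobId)}`;

// Fixed: jobId inserted raw — ':' is legal in an RFC 3986 path segment.
const fixedUrl = `/api/v1/deployments/${encodeURIComponent(deploymentId)}/jobs/${jobId}`;
```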

Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:06:20 +04:00
github-actions[bot]
87c8626d92 deploy: update catalyst images to 787b284 2026-04-30 18:44:30 +00:00
e3mrah
787b284990
fix(helmwatch): logtailer parses flux v2.4 nested-object HelmRelease format (#305 follow-up 2) (#314)
helm-controller in flux v2.4 (the version Catalyst-Zero pins) emits
structured JSON log lines with HelmRelease as a NESTED OBJECT:

  "HelmRelease":{"name":"bp-mimir","namespace":"flux-system"}

The old regex only matched the legacy flat-string format
(`helmrelease="flux-system/bp-X"` or `"helmrelease":"flux-system/bp-X"`).
Result on otech.omani.works: every helm-controller stdout line was
parsed but did not match → silently dropped → zero PhaseComponentLog
events emitted → exec log viewer rendered only synthetic [seeded] /
[<state>] anchor lines.

Verified by tailing helm-controller-86c6b84dcd-t58td on the live otech
cluster (10h reconcile activity, format consistent across hundreds of
lines).

Fix:
  - logtailer.helmControllerNameRe now alternates across all three
    observed formats: flat-string colon, flat-string equals, and
    nested-object name+namespace.
  - pumpLines picks whichever capture group fired (regex alternation
    leaves the other group empty).
  - logtailer_test.go fixtures extended with two real flux v2.4
    nested-object samples copied verbatim from the live otech
    cluster's helm-controller stdout.
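The production regex is Go, but the alternation idea translates directly; a TypeScript sketch over the three observed formats (pattern details are illustrative, not the exact `helmControllerNameRe`):

```typescript
// One alternation, three formats; whichever capture group fires carries
// the HelmRelease name, the other groups stay empty.
const helmReleaseRe =
  /helmrelease="[^"\/]+\/([^"]+)"|"helmrelease":"[^"\/]+\/([^"]+)"|"HelmRelease":\{"name":"([^"]+)"/;

function extractReleaseName(line: string): string | null {
  const m = helmReleaseRe.exec(line);
  return m ? (m[1] ?? m[2] ?? m[3] ?? null) : null;
}
```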

Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:42:34 +04:00
e3mrah
7956a780c1
fix(catalyst-ui): Flow — straight edges, drag pins permanently, auto-fit viewBox (#315)
Three operator-spec corrections to the organic Flow canvas:

1. Straight edges, not bezier curves
   FlowEdge now renders <line x1 y1 x2 y2> rim-to-rim instead of a
   cubic bezier with perpendicular control points.

2. Drag pins permanently — no spring-back
   d3-drag 'end' handler no longer clears d.fx/d.fy. The bubble stays
   exactly where the operator dropped it. Operator can re-drag any time.
   forceX/forceY anchors only act on non-pinned (fx/fy === null) nodes.

3. Auto-fit viewBox — smart canvas filling regardless of node count
   Replaced fixed viewBox="0 0 2000 1100" with bbox computed each
   render: vbX/vbY = min(x|y) - padding, vbW/vbH = (max - min) +
   2*padding. preserveAspectRatio="xMidYMid meet" then auto-scales.
   Result:
     • 2 bubbles at depth 0/1 → small bbox → tight zoom (no
       irrelevant left-right corner flight)
     • 35 bubbles at depth 0..6 → wide bbox → full canvas use (~85-95%)
   Bubble radius stays 30px; per-depth x step stays 150px; per-region
   band height 240px — all bounded so links can't stretch arbitrarily.
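The bbox computation in point 3 reduces to the following (shape names and the padding parameter are assumptions; the real renderer reads live simulation positions each render):

```typescript
interface Point { x: number; y: number }
interface ViewBox { x: number; y: number; w: number; h: number }

function fitViewBox(nodes: Point[], padding: number): ViewBox {
  const xs = nodes.map((n) => n.x);
  const ys = nodes.map((n) => n.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return {
    x: minX - padding,                       // vbX = min(x) - padding
    y: minY - padding,                       // vbY = min(y) - padding
    w: Math.max(...xs) - minX + 2 * padding, // vbW = (max - min) + 2*padding
    h: Math.max(...ys) - minY + 2 * padding, // vbH = (max - min) + 2*padding
  };
}
```

The result feeds the SVG `viewBox` attribute, and `preserveAspectRatio="xMidYMid meet"` does the scaling.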

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 22:41:24 +04:00
github-actions[bot]
7ef7ad68cf deploy: update catalyst images to 20fd788 2026-04-30 18:22:52 +00:00
e3mrah
20fd78807f
fix(catalyst-ui): inject canonical bootstrap-kit dep graph so organic depth resolves (#312)
PR #308 shipped the organic layout. Live verification at 1440px showed:
- bubbles cluster at depth=0 (left ~12% of canvas)
- only 1 edge rendered

Root cause: live Job objects from the backend bridge don't carry their
upstream dependsOn arrays — the bridge surfaces flat status only. The
useJobHints hook was relying on Job.dependsOn + ApplicationDescriptor
deps; both are empty for bootstrap-kit jobs (cilium, cert-manager,
spire, etc.) because they're not user-selected components.

Fix: encode the canonical bootstrap-kit dep graph from
docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2 directly in useJobHints, with
a bareName→liveJobId resolver that handles the various id formats
the backend may use ('bp-cnpg' / 'install-cnpg' / 'install-cnpg::r1').

Result: depth populates 0..6 (longest chain cilium → cert-manager →
spire → openbao → keycloak → gitea → catalyst-platform), bubbles
spread across full canvas width via depthToX(depth/maxDepth), edges
render between every parent→child pair.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 22:20:56 +04:00
github-actions[bot]
2bc0e1179e deploy: update catalyst images to 3628b73 2026-04-30 18:19:07 +00:00
e3mrah
3628b73a4d
fix(helmwatch): default CoreFactory in production so logtailer actually runs (#305 follow-up) (#311)
In production, handler.New() never assigns h.coreFactory, so phase1_watch
left cfg.CoreFactory == nil. helmwatch.NewWatcher had no default for
CoreFactory (DynamicFactory had one) → the helm-controller log tailer was
never launched → every PhaseComponentLog event was silently dropped.

Result on the live otech cluster: the bridge fix in #307 worked
correctly for state transitions, but the GitLab-style log viewer only
ever saw the synthetic [seeded] / [<state>] anchor lines because the
upstream emission path of raw helm-controller stdout was disconnected.

Fix:
  - helmwatch.NewWatcher defaults CoreFactory to
    NewKubernetesClientFromKubeconfig (mirroring the existing
    DynamicFactory default).
  - New regression test TestNewWatcher_DefaultsBothFactories asserts
    both factories are non-nil after construction.

Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:17:10 +04:00
github-actions[bot]
add7f47ba1 deploy: update catalyst images to 4bf1e12 2026-04-30 18:10:36 +00:00
e3mrah
4bf1e1285e
fix(jobs): JobDetail uses real exec id; bridge captures raw helm-controller logs (#305) (#307)
End-to-end fix for the JobDetail log viewer. Three stacked bugs surfaced
by https://console.openova.io/sovereign/provision/ce476aaf80731a46/jobs/install-seaweedfs:

A. Frontend constructed `${jobId}:latest` and sent it to
   /api/v1/actions/executions/{id}/logs. The catalyst-api resolves
   execId by exact match against 16-byte hex IDs — there is no
   `:latest` route, so every log fetch returned 404 and the viewer
   rendered "Failed to load log page" / "No logs captured for this log".

B. SeedJobsFromInformerList wrote a Job row with status=running for
   non-terminal HR states (installing/degraded) but skipped
   StartExecution AND set b.lastState[comp]=state. Subsequent
   OnHelmReleaseEvent calls with the same state took the prev==state
   early-return and never allocated an Execution. 7 jobs on the live
   otech cluster were stuck this way.

C. OnProvisionerEvent rejected any event with ev.Phase != "component",
   dropping every PhaseComponentLog event the helmwatch logtailer emits. Raw
   helm-controller stdout (one line per reconcile/error/event) never
   reached the persisted Execution log file — the GitLab-style viewer
   only ever rendered synthetic [seeded] / [<state>] summary lines.

Fixes:

- helmwatch_bridge.go::SeedJobsFromInformerList now allocates an
  Execution + writes a [seeded] anchor line for installing/degraded
  states. The Execution is left OPEN so OnHelmReleaseEvent and
  OnRawComponentLog can keep appending until the HR transitions to a
  terminal state.

- helmwatch_bridge.go::OnProvisionerEvent dispatches on Phase:
  "component" → OnHelmReleaseEvent (state transitions);
  "component-log" → new OnRawComponentLog (raw helm-controller line
  appended verbatim to the active Execution). Resolution policy on a
  missing in-memory cursor: re-attach to the persisted
  LatestExecutionID for non-terminal Jobs; allocate fresh for unknown
  Jobs; drop for terminal Jobs (post-install drift-check chatter).

- ui/src/pages/sovereign/useJobDetail.ts (new) — React Query hook
  fetches /api/v1/deployments/{id}/jobs/{jobId} and exposes
  executions[0].id as the latestExecutionId. 5s poll while the
  deployment is in flight.

- ui/src/pages/sovereign/JobDetail.tsx — replaces the synthetic
  `${jobId}:latest` with detail.latestExecutionId. When executions[]
  is empty, renders ExecutionLogsPlaceholder with status-aware copy
  (pending / loading / empty / error) instead of an empty log viewer.
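The executions[0].id selection the new hook makes can be reduced to this (types inferred from this message; field names beyond executions[].id are assumptions):

```typescript
interface JobDetail { executions: Array<{ id: string }> }

// Never synthesize `${jobId}:latest` — the API resolves only real hex ids.
function latestExecutionId(detail: JobDetail | undefined): string | null {
  return detail?.executions?.[0]?.id ?? null;
}
```

JobDetail renders the placeholder whenever this returns null.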

Tests:

- 4 new Go tests on the bridge: raw-log appendsToActiveExecution,
  allocatesExecutionWhenJobMissing, dropsAfterTerminal, and
  dropsUnknownPhases. Existing seed-idempotency tests updated for
  the new "non-terminal seed allocates Execution" contract.

- 2 new vitest cases on JobDetail: uses real executions[0].id (NOT
  `${jobId}:latest`) when fetching log lines; renders placeholder
  (not viewer) when executions[] is empty.

- All 502 vitest pass; all api Go tests pass; production UI build
  clean.

Closes via UAT on https://console.openova.io/sovereign/provision/ce476aaf80731a46/jobs/install-seaweedfs

Refs #204, supersedes the cosmetic #232 surface.

Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:08:45 +04:00
e3mrah
a856bfb92d
fix(catalyst-ui): Flow organic layout — full-width spread by depth, no STAGE grid (#308)
Replaces the stage-column / Sugiyama grid that all prior Flow PRs
inherited (#245, #282, #299, #303, #304). The grid was the actual
cause of the "8x5 squashed in middle 1/3" bug operators kept rejecting
— bubbles spawned in column-grid positions and physics could only
nudge them slightly off the grid.

Per operator spec (2026-04-30):
  • Bubbles spread organically across full canvas width.
  • X-axis = dependency depth (longest-path-from-root); depth 0 left,
    deepest right; 6%-94% of viewport.
  • Y-axis = region midpoint + per-node deterministic vertical jitter,
    so same-depth siblings scatter naturally — NOT a strict column.
  • Edges are bezier curves with status-colored arrowheads, drawn
    each tick from live simulation positions.
  • NO "STAGE 1/2/..." labels. NO column dividers. NO grid.
  • Bubbles draggable (d3-drag); collision avoidance via d3-force.
  • Batch view: single-click → BatchSummaryPane (start, finish OR ETA,
    duration, succeeded/running/pending/failed counts).
  • Batch view: double-click drills via ?scope=batch:<id>&view=jobs
    (siblings stay rendered at parent level via the URL scope).

New files:
  • src/lib/flowLayoutOrganic.ts — pure data prep (depth, region,
    family, edges); NO precomputed positions.
  • src/pages/sovereign/FlowCanvasOrganic.tsx — full SVG renderer
    with d3-force seed + drag.
  • src/pages/sovereign/BatchSummaryPane.tsx — right floating pane
    for batch-mode single-click.

Updated:
  • FlowPage.tsx — switches imports + renderer; routes batch dbl-click
    via ?scope= URL; routes single-click pane by mode.

Old flowLayoutV4.ts + FlowCanvasV4.tsx are kept on disk for now (only
DEFAULT_FAMILIES is still imported); a follow-up PR will delete them.

Per docs/INVIOLABLE-PRINCIPLES.md:
  §1 (waterfall) — full target-state organic layout in this PR.
  §2 (no compromise) — replace the wrong layout, not patch it.
  §8 (disclose divergence) — flowLayoutV4.ts intentionally retained
    for the DEFAULT_FAMILIES export only; cleanup follow-up.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:07:06 +04:00
github-actions[bot]
aacafb3a23 deploy: update catalyst images to d4440af 2026-04-30 17:29:15 +00:00
e3mrah
d4440afe2a
fix(catalyst-ui): d3-drag listener attach — was failing because .data() join cleared the binding (#304)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 21:27:22 +04:00
github-actions[bot]
42d3df99fd deploy: update catalyst images to dde56cc 2026-04-30 17:14:41 +00:00
e3mrah
dde56cc8e1
fix(catalyst-ui): Flow page single-pane + drag/physics + working batch toggle (#303)
Three corrections per founder spec (verbatim 2026-04-30):

1. SINGLE pane only — removed the persistent left deployment-tree
   panel and the persistent right log-feed panel. Canvas is now full
   width. The exec log appears as a FloatingLogPane only on
   single-click of a job bubble (existing behaviour, unchanged).

2. Job/Batch toggle now actually switches detail level:
   • mode='jobs' (default) renders one bubble per job (~35 nodes)
   • mode='batches' renders one bubble per batch with rolled-up
     status (failed > running > pending > succeeded), startedAt
     (earliest), finishedAt (latest), durationMs (max-earliest),
     and inferred cross-batch dependsOn edges.

3. Bubbles draggable + physics — added d3-force simulation with:
   • forceCollide(r=node.r+4) — natural collision avoidance
   • forceX/forceY toward layout-suggested anchor — soft return
     to canonical position when not held
   • forceLink between dependsOn pairs — gentle attraction
   • d3-drag wired via data-flow-draggable on each <g> node group;
     drag pins node, release lets physics resettle
   • bezier edges recompute control points each tick so they
     follow dragged nodes naturally
   • cursor: grab on every bubble
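The drag/physics contract in point 3, reduced to its two tunables (d3 itself is elided so the sketch stays dependency-free; fx/fy follow d3-force's pin convention):

```typescript
interface FlowNode { r: number; fx: number | null; fy: number | null }

// forceCollide radius: node radius plus a small gap, per the commit.
const collideRadius = (n: FlowNode) => n.r + 4;

// forceX/forceY soft anchors act only on nodes the operator isn't holding
// (d3-drag sets fx/fy while dragging; release clears them in this
// iteration, letting physics resettle).
const anchorActive = (n: FlowNode) => n.fx === null && n.fy === null;
```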

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:12:48 +04:00
e3mrah
729cd73d4c
fix: restore templates/kustomization.yaml — PR #246 deleted it, blocks contabo Flux (#302)
* fix: correct sme-services/kustomization.yaml corrupted by stderr-capture in #297 — unblocks contabo Flux

* fix: restore templates/kustomization.yaml that PR #246 deleted — sets namespace: catalyst for contabo Flux

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 20:36:17 +04:00
e3mrah
b32fd24588
fix: correct sme-services/kustomization.yaml corrupted by stderr-capture in #297 — unblocks contabo Flux (#301)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 20:30:01 +04:00
e3mrah
ce744d9c70
fix(bp-catalyst-platform): correct sme-services kustomization.yaml that I corrupted in PR #297 (#300)
In PR #297 I tried to restore sme-services/kustomization.yaml via
'git show f4a83a27^:...path... > file 2>&1' but git show failed (the
file didn't exist in that commit) and the stderr got captured into the
file as literal text:

    fatal: path 'products/.../kustomization.yaml' exists on disk, but not in 'f4a83a27^'

Kustomize then choked on this file with:
    invalid Kustomization: json: unknown field "fatal"

This blocked contabo's flux-system/catalyst-platform Kustomization from
applying anything since 16:16 UTC.

Restoring the correct kustomization.yaml content from commit 6eac8a72^.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:27:26 +04:00
github-actions[bot]
b3b2829002 deploy: update catalyst images to 3e6f495 2026-04-30 16:16:36 +00:00
e3mrah
3e6f49596d
fix(bp-catalyst-platform): restore contabo Flux paths broken by PRs 246/280/281/286 (#297)
Three contabo-mkt Flux Kustomizations were broken by my recent PRs:

- flux-system/catalyst-platform: PR #260 added a Helm-template-syntax CRD
  at products/catalyst/chart/templates/crd-provisioningstate.yaml.
  Contabo's Flux Kustomization reads this path as raw YAML and chokes on
  the {{ }} blocks. Moved the CRD to products/catalyst/chart/crds/
  (Helm convention — installed unconditionally, not Helm-templated).

- flux-system/marketplace-api: PR #246 deleted the kustomization.yaml
  index file that contabo's Flux Kustomization needs to enumerate
  manifests. PR #280 deleted the marketplace-api/ingress.yaml. Restored
  both as raw YAML.

- flux-system/sme-services: PR #281 deleted the entire sme-services/
  directory. Restored all 14 manifest files as raw YAML.

Sovereign-side: added .helmignore entries so Sovereign HelmRelease
installs (otech, omantel) skip the contabo-only files entirely:
- templates/ingress.yaml (Traefik Middleware + Ingress for console)
- templates/ingress-console-tls.yaml (TLS-terminating ingress, NEW —
  was missing on contabo, causing TRAEFIK DEFAULT CERT errors)
- templates/sme-services/
- templates/marketplace-api/

Bumped 1.1.6 -> 1.1.8.

Cluster impact:
- contabo: 3 broken Kustomizations recover; console.openova.io gets
  proper Let's Encrypt cert via the new console-openova-tls Certificate.
- otech / omantel Sovereigns: no contabo-mkt content rendered; install
  works clean against chart 1.1.8.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:02:46 +04:00
e3mrah
5502d9aa48
feat(dns): cert-manager-dynadot-webhook for DNS-01 wildcard TLS (closes #159) (#291)
Activates the previously-templated `letsencrypt-dns01-prod` ClusterIssuer
in bp-cert-manager by shipping the missing piece — a Go binary that
satisfies cert-manager's external webhook contract
(`webhook.acme.cert-manager.io/v1alpha1`) against the Dynadot api3.json.

Architecture
============

* `core/pkg/dynadot-client/` — canonical Dynadot HTTP client (shared with
  pool-domain-manager and catalyst-dns). Encapsulates the api3.json
  transport, command builders, response decoding, and the safe
  read-modify-write semantics required to never accidentally wipe a
  zone (memory: feedback_dynadot_dns.md). Destructive `set_dns2`
  variant is unexported.
* `core/cmd/cert-manager-dynadot-webhook/` — the cert-manager webhook
  binary. Implements `Solver.Present` via the client's append-only
  `AddRecord` path and `Solver.CleanUp` via the read-modify-write
  `RemoveSubRecord` path. Domain allowlist (`DYNADOT_MANAGED_DOMAINS`)
  rejects challenges for unmanaged apexes BEFORE any Dynadot call.
* `platform/cert-manager-dynadot-webhook/` — Catalyst-authored Helm
  wrapper. Templates Deployment + Service + APIService + serving
  Certificate (CA chain via cert-manager Issuer self-signing) +
  RBAC + ServiceAccount. Mirrors the standard cert-manager external-
  webhook deployment shape.
* `platform/cert-manager/chart/` — flips `dns01.enabled: true` so the
  paired ClusterIssuer activates. The interim http01 issuer remains
  templated as the rollback path.

Test results
============

  core/pkg/dynadot-client          — 7 tests PASS  (race-clean)
  core/cmd/cert-manager-dynadot-... — 9 tests PASS  (race-clean)

Test coverage includes a Present/CleanUp round-trip against an
httptest fixture that models Dynadot's zone state, an explicit
unmanaged-domain rejection, a regression preserving a pre-existing
CNAME across the DNS-01 round-trip (the zone-wipe defence), and a
typed-error propagation test that surfaces `ErrInvalidToken` to
cert-manager so the controller will retry.

Helm template smoke render
==========================

`helm template` against the new chart with default values yields 12
resources / 424 lines (APIService, Certificate, ClusterRoleBinding,
Deployment, Issuer, Role, RoleBinding, Service, ServiceAccount). The
modified bp-cert-manager chart still renders both ClusterIssuers
(`letsencrypt-dns01-prod` + `letsencrypt-http01-prod`) with default
values; flipping `certManager.issuers.dns01.enabled=false` is the
clean rollback.

Smoke command (post-deploy)
===========================

  kubectl get apiservices.apiregistration.k8s.io \
    v1alpha1.acme.dynadot.openova.io
  # Issue a *.<sovereign>.<pool> wildcard cert and watch the
  # Order/Challenge progress through cert-manager.

CI
==

`.github/workflows/build-cert-manager-dynadot-webhook.yaml` mirrors the
pool-domain-manager-build pattern (cosign keyless signing, SBOM
attestation, GHCR push at
`ghcr.io/openova-io/openova/cert-manager-dynadot-webhook:<sha>`).
Triggered by changes to either the binary or the shared dynadot-client
package.

Closes #159

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:37:47 +04:00
e3mrah
c09109a61a
feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290)
Edge + serverless + model-serving batch (W2.5.C) — three upstream-
subchart umbrella Blueprints completing the bootstrap-kit slots for
WebRTC media relay (bp-relay → bp-stunner) and the AI/ML serving stack
(bp-cortex → bp-kserve → bp-knative).

Each chart follows the canonical umbrella pattern from
docs/BLUEPRINT-AUTHORING.md §11.1: Chart.yaml declares the upstream
chart under `dependencies:` so `helm dependency build` bundles the
upstream payload into the OCI artifact, and Catalyst-curated overlay
values + templates sit alongside in chart/values.yaml + chart/templates/.

Per-chart highlights:
- bp-stunner/1.0.0 — wraps stunner/stunner-gateway-operator 1.1.0.
  Ships a Cilium-native GatewayClass (Capabilities-gated on
  gateway.networking.k8s.io/v1) so bp-relay (LiveKit / SFU) can claim
  Gateway CRs without an operator-ordering dance. Default UDP TURN port
  range 30000-32767 matches the range opened at the Sovereign edge
  firewall (Crossplane bp-firewall composition).
- bp-knative/1.0.0 — wraps knative-operator v1.21.1. Ships a
  KnativeServing CR pre-configured for **istio-less mode**
  (ingress.istio.enabled=false, ingress.contour.enabled=false,
  ingress.kourier.enabled=false; config.network.ingress-class=cilium).
  Sovereign FQDN sourced from values, no hardcoded fallback per
  inviolable principle #4 — render fails loudly if cluster overlay
  doesn't set knativeOverlay.knativeServing.sovereignFqdn.
- bp-kserve/1.0.0 — wraps kserve/kserve v0.16.0 (latest version
  published on the official OCI registry as of 2026-04-30). Default
  deploymentMode=RawDeployment (no Knative hop on the hot path) but
  bp-knative is still installed (declared as a hard dep) so the
  per-InferenceService annotation `serving.kserve.io/deploymentMode:
  Serverless` opts in to scale-to-zero per tenant. Cilium native
  Gateway-API ingress (enableGatewayApi=true, className=cilium,
  disableIstioVirtualHost=true).

Observability discipline (issue #182): every observability toggle
(ServiceMonitor, HPA, GatewayClass) defaults false and is operator-
tunable via per-cluster overlay once bp-kube-prometheus-stack reconciles.
Each chart ships tests/observability-toggle.sh covering default-off,
opt-in (with `--api-versions monitoring.coreos.com/v1` to simulate
Prometheus Operator CRDs), and explicit-off cases.
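
The per-chart toggle scripts all reduce to the same stdin assertions; a
hedged sketch of that core check (the helper names are illustrative,
not the scripts' actual contents):

```shell
# assert_no_monitoring: fail when the rendered manifest stream on stdin
# contains any monitoring.coreos.com/v1 resource (ServiceMonitor,
# PrometheusRule, ...): the "default-off" contract.
assert_no_monitoring() {
  if grep -q 'monitoring\.coreos\.com/v1'; then
    echo "FAIL: monitoring.coreos.com/v1 resource in default render" >&2
    return 1
  fi
}

# assert_has_servicemonitor: the opt-in case must render one.
assert_has_servicemonitor() {
  grep -q '^kind: ServiceMonitor'
}

# Wiring against a real chart would look like (not run here):
# helm template chart/ | assert_no_monitoring
# helm template chart/ --set serviceMonitor.enabled=true \
#   --api-versions monitoring.coreos.com/v1 | assert_has_servicemonitor
```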

Per-chart kind summary (helm template default render):

  bp-stunner: ClusterRole, ClusterRoleBinding, ConfigMap, Dataplane,
              Deployment, Role, RoleBinding, Service, ServiceAccount.
              (+ GatewayClass when --api-versions
              gateway.networking.k8s.io/v1 is passed.)

  bp-knative: ClusterRole, ClusterRoleBinding, ConfigMap,
              CustomResourceDefinition, Deployment, KnativeServing,
              Role, RoleBinding, Secret, Service, ServiceAccount.

  bp-kserve:  Certificate, ClusterRole, ClusterRoleBinding,
              ClusterServingRuntime, ClusterStorageContainer,
              ConfigMap, Deployment, Gateway, Issuer,
              MutatingWebhookConfiguration, Role, RoleBinding,
              Service, ServiceAccount, ValidatingWebhookConfiguration.

`helm lint` clean for all three (single INFO on missing icon — icons
land with marketplace card work).

`bash tests/observability-toggle.sh` green for all three (3 cases each:
default-off, opt-in, explicit-off).

Closes #263 #264 #265

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:37:38 +04:00
e3mrah
782d8015c5
feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274) (#289)
W2.5.F — three Catalyst Blueprint umbrella charts at platform/{openmeter,
livekit,matrix}/, each declaring its upstream chart under Chart.yaml
`dependencies:` so `helm dependency build` bundles the upstream payload
into the published OCI artifact (per docs/BLUEPRINT-AUTHORING.md §11.1
— hollow charts forbidden, CI-enforced by issue #181).

Per-chart kind summary
======================

bp-openmeter (closes #272)
  default `helm template` kinds: ConfigMap, Deployment, Service, ServiceAccount
  upstream chart: openmeter 1.0.0-beta.213 (oci://ghcr.io/openmeterio/helm-charts)

  ClickHouse-less profile per docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §6.4.
  The upstream chart's bundled clickhouse / kafka / postgresql / redis /
  svix subcharts are all DISABLED — Catalyst supplies CNPG (postgres),
  JetStream (event bus), and Valkey (redis-compat) at the platform tier.
  Chart-level toggle `catalystBlueprint.backend.kind` (default `cnpg`,
  alt `clickhouse`) records the active profile so observability/audit
  pipelines can report it. The OpenMeter binary's
  `aggregation.clickhouse.address` is left blank — per-Sovereign overlay
  supplies it once a host cluster adds bp-clickhouse and the operator
  re-rolls with `backend.kind: clickhouse`. Catalyst overlay templates
  (NetworkPolicy / ServiceMonitor / HPA) all default OFF per
  docs/BLUEPRINT-AUTHORING.md §11.2.

bp-livekit (closes #273)
  default `helm template` kinds: ConfigMap, Deployment, Service, ServiceAccount
  upstream chart: livekit-server 1.9.0 (https://helm.livekit.io)

  WebRTC SFU. Powers the Huawei iFlytek voice demo. Catalyst defaults
  pair LiveKit with bp-stunner (the upstream chart's bundled co-located
  TURN server is OFF; per-Sovereign overlay points the LiveKit TURN
  config at the stunner UDP-gateway Service). RTC UDP port range is
  50000-60000 (matches the Hetzner firewall rule the per-Sovereign
  overlay opens). Catalyst overlay templates (NetworkPolicy /
  ServiceMonitor / HPA) all default OFF; the chart's NetworkPolicy
  template documents that LiveKit's hostNetwork mode means pod-level
  policies do NOT cover the SFU port range — the firewall rule is the
  load-bearing control. blueprint.yaml `depends:` declares bp-stunner +
  bp-cert-manager + bp-valkey.

bp-matrix (closes #274)
  default `helm template` kinds: ConfigMap, Deployment, Ingress, Job,
  PersistentVolumeClaim, Pod, Role, RoleBinding, Secret, Service,
  ServiceAccount
  upstream chart: matrix-synapse 3.12.25 (https://ananace.gitlab.io/charts)

  Synapse (the Matrix server implementation, NOT the retired OpenOva
  product noun). Federation OFF by default (Catalyst per-Sovereign
  tenancy default — operator overlays flip it on per-Organization).
  Postgres backend via bp-cnpg externalPostgresql; OIDC SSO via
  bp-keycloak; bundled bitnami postgresql + redis subcharts both
  disabled. Catalyst overlay NetworkPolicy gates the federation port
  (8448) on `federation.enabled` — verified by Case 5 of the
  observability-toggle test. Catalyst-overlay ServiceMonitor (upstream
  chart has none) + HPA both default OFF.

Lint
====
All three charts pass `helm lint` clean (only the noisy "icon is
recommended" INFO message).

Observability tests
===================
Each chart's `tests/observability-toggle.sh` enforces the Catalyst
contract from docs/BLUEPRINT-AUTHORING.md §11.2:
  Case 1: default render produces zero monitoring.coreos.com/v1
          resources (no ServiceMonitor / PrometheusRule).
  Case 2: opt-in (--set serviceMonitor.enabled=true --api-versions
          monitoring.coreos.com/v1) renders a ServiceMonitor.
  Case 3: explicit-off render is clean.
  Case 4 (per chart):
    - openmeter: ClickHouse-less profile asserts no
      clickhouse.altinity.com / Kafka subchart resources leak into the
      default render.
    - livekit:   asserts upstream livekit-server.serviceMonitor.create
      defaults false.
    - matrix:    asserts default render carries an empty
      federation_domain_whitelist (the per-Sovereign tenancy default).
  Case 5 (matrix only): `--set federation.enabled=true
          networkPolicy.enabled=true` opens port 8448 in the Catalyst
          NetworkPolicy.

All gates green for all three charts.

Closes #272 #273 #274

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 19:37:28 +04:00
e3mrah
87d9a4afa7
feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288)
W2.5.E batch — three Application-tier Blueprints completing the LLM
serving / workflow stack:

- bp-temporal/1.0.0 — wraps temporal/temporal 1.2.0 (the new chart
  rewrite that removed cassandra:/mysql:/postgresql:/elasticsearch:/
  prometheus:/grafana: top-level keys in favour of
  server.config.persistence.datastores). Postgres-only via CNPG-backed
  visibility store (skip Cassandra). Web UI ON. Keycloak OIDC
  integration via --auth-claim-mapper renders auth.yaml ConfigMap
  (operator wires via additionalVolumes once bp-keycloak is
  reconciled, default OFF). dependsOn: bp-cnpg + bp-cert-manager.
  Closes #271.
  Kinds: Cluster (CNPG) + ConfigMap + Deployment + Job + Pod +
  Service.

- bp-llm-gateway/1.0.0 — wraps berriai/litellm-helm 0.1.572 from OCI.
  Subscription-aware proxy for Claude Code: routes to Anthropic (via
  operator OAuth/Max subscription — NEVER an ANTHROPIC_API_KEY,
  per memory/feedback_no_api_key.md), Bedrock, Vertex,
  OpenAI-compatible (via bp-anthropic-adapter), and self-hosted
  vLLM. CNPG-backed audit log (every prompt + response persisted
  for compliance). Bundled bitnami postgresql + redis subcharts
  DISABLED (db.useExisting=true points at the CNPG cluster).
  Keycloak SSO via auth.yaml ConfigMap (default OFF).
  ExternalSecret-backed environmentSecrets brings tokens / IAM
  creds in without inlining plaintext. dependsOn: bp-cnpg +
  bp-keycloak + bp-external-secrets. Closes #267.
  Kinds: Cluster (CNPG audit) + ConfigMap + Deployment + Job +
  Pod + Secret + Service + ServiceAccount.

- bp-anthropic-adapter/1.0.0 — Catalyst-authored scratch chart for
  the OpenAI ↔ Anthropic translation Go service. SHA-pinned image
  ghcr.io/openova-io/openova/anthropic-adapter:<sha> (Inviolable
  Principle #4a — GitHub Actions is the only build path; empty
  default tag fails the render with a clear error instead of
  silently shipping :latest). OAuth/Max subscription token mounted
  from K8s Secret materialized by ESO from bp-openbao —
  ANTHROPIC_OAUTH_TOKEN env var, NEVER an ANTHROPIC_API_KEY.
  Includes OpenAI → Anthropic model-mapping ConfigMap (gpt-4 →
  claude-3-5-sonnet, gpt-4o-mini → claude-3-5-haiku, etc.).
  sigstore/common library subchart included to satisfy the
  hollow-chart gate (matches bp-vllm pattern from #283).
  dependsOn: bp-external-secrets. Closes #268.
  Kinds: ConfigMap + Deployment + Service + ServiceAccount.

CRITICAL — bp-llm-gateway and bp-anthropic-adapter both consume the
operator's Claude OAuth/Max subscription. Per memory/
feedback_no_api_key.md and the user's standing instruction, neither
chart accepts or generates an ANTHROPIC_API_KEY. Tokens flow
exclusively through ExternalSecret-managed K8s Secrets that ESO
materializes from bp-openbao at install time.

Per docs/BLUEPRINT-AUTHORING.md §11.2 (issue #182): every
observability toggle defaults `false` (ServiceMonitor / metrics
sidecar / PodMonitor) and is operator-tunable via per-cluster
overlay once bp-kube-prometheus-stack reconciles. Each chart ships
tests/observability-toggle.sh covering default-off, opt-in (with
--api-versions monitoring.coreos.com/v1 to simulate the CRDs), and
explicit-off cases. bp-anthropic-adapter additionally tests the
never-:latest gate via Case 4 (empty image tag must fail render).
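
The never-`:latest` gate reduces to a plain predicate; the chart
enforces it in Helm template logic via `fail`, and the shell form
below is a hedged illustration only (the `check_image_tag` name is
hypothetical):

```shell
# check_image_tag TAG: accept only a pinned, non-empty, non-:latest
# tag, mirroring Case 4 of tests/observability-toggle.sh, where an
# empty image tag must fail the render instead of shipping :latest.
check_image_tag() {
  case "$1" in
    "")     echo "error: image tag is empty; set a pinned <sha>" >&2; return 1 ;;
    latest) echo "error: ':latest' is forbidden; pin a <sha>" >&2; return 1 ;;
    *)      return 0 ;;
  esac
}
```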

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every
upstream version, namespace, server URL, role, secret name, model
default, and toggle is exposed under values.yaml. Cluster overlays
in clusters/<sovereign>/ may override without rebuilding the
Blueprint OCI artifact.

Per docs/BLUEPRINT-AUTHORING.md §11.1 (umbrella shape — hard
contract): bp-temporal and bp-llm-gateway declare their upstream
charts under Chart.yaml dependencies: so helm dependency build
bundles the upstream payload into the OCI artifact. bp-anthropic-
adapter is a scratch chart (no upstream Helm chart exists) and
includes sigstore/common as the obligatory hollow-chart-gate
dependency, matching the bp-vllm precedent from W2.5.D (#283).

Closes #267
Closes #268
Closes #271

helm lint: 1 chart(s) linted, 0 chart(s) failed (each, INFO icon-recommended only)

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 19:37:19 +04:00
e3mrah
a6bf07b0ce
feat(charts): bp-librechat wrapper chart (closes #275) (#287)
W2.5.G — Catalyst-authored scratch chart for LibreChat (slot 48 of the
omantel-1 bootstrap-kit). LibreChat upstream does not publish a Helm
chart, so this chart hand-wires the official ghcr.io/danny-avila/librechat
container as Deployment + Service + Ingress + ConfigMap + ServiceAccount
+ NetworkPolicy + ServiceMonitor + HPA, with the sigstore/common
library subchart declared to satisfy the hollow-chart gate (issue #181).

Per docs/BLUEPRINT-AUTHORING.md §11.2: every observability toggle
(serviceMonitor, hpa) defaults false; opt-in via per-cluster overlay
once kube-prometheus-stack reconciles. The ServiceMonitor template is
double-gated by .Values.serviceMonitor.enabled AND
Capabilities.APIVersions.Has "monitoring.coreos.com/v1" so flipping the
toggle on a too-early Sovereign cannot break the bp-librechat reconcile.

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every endpoint
URL, model name, secret reference, namespace selector, and image tag is
operator-tunable via values.yaml. The Sovereign FQDN, Keycloak issuer,
llm-gateway URL, embeddings URL, and TLS ClusterIssuer are all
operator-supplied at install time. The image tag is pinned to v0.7.5
(no :latest).

Connectors:
- Chat completions: bp-llm-gateway (OpenAI-compatible /v1/chat/completions)
  exposed as a "custom" endpoint named "Catalyst LLM"
- Embeddings (RAG): bp-bge — provider=bge maps to EMBEDDINGS_PROVIDER=openai
  + RAG_OPENAI_BASEURL=<bge.svc> at template-render time
- SSO: bp-keycloak (OpenID Connect) — issuer/clientId from values,
  client secret + session secret from ExternalSecret
- Conversation store: FerretDB on bp-cnpg (MongoDB wire protocol over
  Postgres) — operator-supplied connection URI

Hosted at chat-app.<sovereign-fqdn>; the chart `fail`s the render if
ingress.host is empty (no platform-wide default).

helm template (default values, --set ingress.host=...):
  ConfigMap, Deployment, Ingress, NetworkPolicy, Service, ServiceAccount

helm template (--set hpa.enabled=true serviceMonitor.enabled=true
              --api-versions monitoring.coreos.com/v1):
  ConfigMap, Deployment, HorizontalPodAutoscaler, Ingress, NetworkPolicy,
  Service, ServiceAccount, ServiceMonitor

helm lint: 1 chart(s) linted, 0 chart(s) failed (single INFO on
missing icon — icons land with the marketplace card work).

tests/observability-toggle.sh: PASS on default-off, opt-in
(--api-versions monitoring.coreos.com/v1 to simulate the CRDs), and
explicit-off cases.

Path isolation: only platform/librechat/ — no HR slot files,
blueprint-release.yaml, or other charts touched. The HR slot files
(clusters/.../48-librechat.yaml) and blueprint-release.yaml will land
in a separate slot-wiring PR per the W2.K4 expansion plan.

Closes #275

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:56:59 +04:00
e3mrah
f4a83a27eb
fix(bp-catalyst-platform): remove SME-namespace references — Sovereigns don't run SME (closes #281) (#286)
Sovereigns don't have an `sme` namespace, so installing bp-catalyst-platform
1.1.5 on otech.omani.works failed with:

  Helm install failed for release catalyst-system/catalyst-platform
  with chart bp-catalyst-platform@1.1.5:
    failed to create resource: namespaces "sme" not found

Same family of bug as #279 (Traefik Middleware): Group C cutover dragged
contabo-mkt-only SME product manifests into the Catalyst umbrella chart.
PR #280 deleted the SME *ingresses* but the deeper microservice mesh
remained.

Fix
---
Delete the entire `products/catalyst/chart/templates/sme-services/`
directory — 13 manifests, ~36 K8s resources. Every one of them is
hardcoded to `namespace: sme` and to `sme.openova.io` URLs. The SME
service mesh (auth/catalog/tenant/provisioning/billing/domain/
notification/gateway/console/admin/marketplace + configmap + SAs) is
the OpenOva.io contabo-mkt marketplace product, not part of the
Catalyst control plane that ships with every Sovereign.

If/when SME is redeployed it will live in a contabo-mkt-only
Kustomization or a separate `bp-sme` Blueprint — out of scope for the
bp-catalyst-platform umbrella, which must remain Sovereign-portable.

Verification
------------
- `grep -rn 'namespace: sme' products/catalyst/chart/templates/` → 0 hits
- `grep -rn 'sme' products/catalyst/chart/templates/` → 0 hits
- `helm template products/catalyst/chart` → exit 0, 260 kinds, 0 SME refs
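
The verification greps reduce to one reusable assertion; a hedged
sketch (the `assert_no_refs` name is illustrative, not a repo script):

```shell
# assert_no_refs PATTERN DIR: fail when any file under DIR still
# matches PATTERN. Here it would prove the `namespace: sme` and bare
# `sme` references are gone from products/catalyst/chart/templates/.
assert_no_refs() {
  if grep -rq "$1" "$2" 2>/dev/null; then
    echo "FAIL: '$1' still referenced under $2" >&2
    return 1
  fi
}

# assert_no_refs 'namespace: sme' products/catalyst/chart/templates/
# assert_no_refs 'sme'            products/catalyst/chart/templates/
```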

Versions bumped 1.1.5 → 1.1.6 in:
  - products/catalyst/chart/Chart.yaml (chart + appVersion)
  - clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
  - clusters/otech.omani.works/bootstrap-kit/13-bp-catalyst-platform.yaml
  - clusters/omantel.omani.works/bootstrap-kit/13-bp-catalyst-platform.yaml

Closes #281
Related #279, #280 (same root-cause family — Group C cutover artifacts)

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 18:50:36 +04:00
e3mrah
9dc8506dd9
feat(charts): bp-external-secrets + bp-cnpg + bp-valkey wrapper charts (#285)
Storage-substrate batch (W2.5.A) — closes #254 by shipping the three
upstream-subchart umbrella Blueprints that the Flux HRs at
clusters/_template/bootstrap-kit/{15-external-secrets,16-cnpg,17-valkey}
.yaml (merged via PR #262) target.

Each chart follows the canonical umbrella pattern documented in
docs/BLUEPRINT-AUTHORING.md §11.1: Chart.yaml declares the upstream
chart under `dependencies:` so `helm dependency build` bundles the
upstream payload into the OCI artifact, and Catalyst-curated overlay
values + templates sit alongside in chart/values.yaml + chart/templates/.

Per-chart highlights:
- bp-external-secrets/1.0.0 — wraps external-secrets/external-secrets
  0.10.7. Ships a default `vault-region1` ClusterSecretStore (via Helm
  post-install/post-upgrade hook to defer the CR application until the
  upstream chart's CRDs are registered) wired to the in-cluster
  bp-openbao service. clusterSecretStore.enabled toggle lets cluster
  overlays opt out and author their own multi-region CRs.
- bp-cnpg/1.0.0 — wraps cnpg/cloudnative-pg 0.28.0. Operator-only
  surface (Cluster CRs are per-Application). CRDs ship in-chart so
  bp-powerdns / bp-keycloak / bp-gitea / bp-langfuse / bp-grafana /
  bp-temporal / bp-matrix / bp-llm-gateway / bp-bge / bp-nemo-guardrails
  / bp-openmeter / pool-domain-manager can `dependsOn: bp-cnpg` via
  Flux — closing #254 (bp-powerdns CreateContainerConfigError on
  pdns-pg-app secret).
- bp-valkey/1.0.0 — wraps bitnami/valkey 5.5.1. BSD-3 Redis-compatible
  cache, replication architecture, password auth ON, NetworkPolicy ON,
  replicas 0 by default for solo Sovereigns (cluster overlays bump for
  HA). Application-tier cache only — Catalyst control plane uses NATS
  JetStream KV (per ARCHITECTURE.md §5).

Per docs/BLUEPRINT-AUTHORING.md §11.2 (issue #182): every observability
toggle defaults `false` (ServiceMonitor / PodMonitor / PrometheusRule /
metrics sidecar) and is operator-tunable via per-cluster overlay once
bp-kube-prometheus-stack reconciles. Each chart ships
tests/observability-toggle.sh covering default-off, opt-in (--api-versions
monitoring.coreos.com/v1 to simulate the CRDs), and explicit-off cases.

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every upstream
version, namespace, server URL, role, and password toggle is exposed
under values.yaml. Cluster overlays in clusters/<sovereign>/ may
override without rebuilding the Blueprint OCI artifact.

helm lint: 1 chart(s) linted, 0 chart(s) failed (each, INFO icon-recommended only)
helm template default render kinds:
  bp-external-secrets: ClusterRole, ClusterRoleBinding, ClusterSecretStore, CustomResourceDefinition, Deployment, Role, RoleBinding, Secret, Service, ServiceAccount, ValidatingWebhookConfiguration
  bp-cnpg:             ClusterRole, ClusterRoleBinding, ConfigMap, CustomResourceDefinition, Deployment, MutatingWebhookConfiguration, Service, ServiceAccount, ValidatingWebhookConfiguration
  bp-valkey:           ConfigMap, NetworkPolicy, PodDisruptionBudget, Secret, Service, ServiceAccount, StatefulSet
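
The kind summaries above are mechanical to reproduce; a small hedged
helper (name assumed) derives the same sorted, de-duplicated kind list
from any `helm template` stream:

```shell
# kind_summary: read '---'-separated manifests on stdin and print the
# sorted, de-duplicated set of top-level resource kinds, the same
# shape as the per-chart summaries listed in this commit message.
kind_summary() {
  grep '^kind:' | awk '{print $2}' | sort -u
}

# e.g.  helm template platform/valkey/chart | kind_summary
```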

Closes #254

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 18:39:29 +04:00
e3mrah
ba2ff05292
feat(charts): bp-seaweedfs + bp-harbor + bp-vpa wrapper charts (#284)
W2.5.B — first authoring of the three Catalyst Blueprint wrapper charts
that fill bootstrap-kit slots 18 (seaweedfs), 19 (harbor) and 29 (vpa).
Each wraps an upstream chart as a Helm subchart and ships Catalyst-
curated overlay templates (NetworkPolicy + ServiceMonitor) gated behind
opt-in toggles, per docs/BLUEPRINT-AUTHORING.md §11 and
docs/INVIOLABLE-PRINCIPLES.md.

bp-seaweedfs (slot 18 — storage foundation)
  - Wraps seaweedfs/seaweedfs 4.22.0; Chart name `bp-seaweedfs`.
  - Catalyst defaults: 1 master + 3 volume + 1 filer + 2 s3 replicas.
  - S3 API on 8333 — single S3 surface every consumer talks to per
    docs/PLATFORM-TECH-STACK.md §3.5 (no per-app MinIO).
  - Overlay templates: NetworkPolicy (cross-namespace S3 reachability,
    cold-tier egress allowlist), ServiceMonitor (Capabilities-gated,
    DEFAULT FALSE per §11.2).
  - Default helm template kinds: ClusterRole, ClusterRoleBinding,
    ConfigMap, Deployment, Secret, Service, ServiceAccount, StatefulSet.

bp-harbor (slot 19 — per-Sovereign OCI registry)
  - Wraps goharbor/harbor 1.18.3 (appVersion 2.14.3); Chart name
    `bp-harbor`.
  - Catalyst defaults: blob backend = SeaweedFS S3 (regionendpoint
    seaweedfs-s3.seaweedfs.svc:8333), metadata DB = bp-cnpg external
    Postgres, ingress class `cilium`, expose.tls.enabled true (cert-
    manager-issued Secret).
  - Overlay templates: NetworkPolicy (CNPG/SeaweedFS/Keycloak egress),
    ServiceMonitor (Capabilities-gated, DEFAULT FALSE).
  - Trivy + SSO + pull-mirror are operator-flag opt-ins via the
    per-Sovereign overlay (default false; trivy/keycloak/cnpg deps
    land on later slots).
  - Default helm template kinds: ConfigMap, Deployment, Ingress,
    PersistentVolumeClaim, Secret, Service, StatefulSet.

bp-vpa (slot 29 — vertical autoscaling)
  - Wraps cowboysysop/vertical-pod-autoscaler 11.1.1 (appVersion
    1.5.0); Chart name `bp-vpa`.
  - Catalyst defaults: 1 replica each of recommender + updater +
    admission-controller. Default mode `Off` (recommend only).
  - Admission webhook self-signs via init Job (cluster-internal); per-
    Sovereign overlay MAY swap to cert-manager.
  - Overlay templates: NetworkPolicy (apiserver + metrics-server
    egress, admission webhook ingress).
  - Upstream metrics.serviceMonitor / metrics.prometheusRule defaulted
    false per §11.2.
  - Default helm template kinds: ClusterRole, ClusterRoleBinding,
    ConfigMap, Deployment, Job, Pod, Secret, Service, ServiceAccount.

Lint + observability-toggle results
  helm lint: 1 chart(s) linted, 0 chart(s) failed (each)
  tests/observability-toggle.sh: PASS on all three (default render has
  zero monitoring.coreos.com/v1 references; opt-in render produces a
  ServiceMonitor; explicit-off render is clean).

Path isolation: only platform/seaweedfs/, platform/harbor/, and
platform/vpa/ — no HR slot files or other charts touched.

Refs: bootstrap-kit slots 18, 19, 29 reconcile against
ghcr.io/openova-io/bp-seaweedfs:1.0.0, bp-harbor:1.0.0, bp-vpa:1.0.0,
which this commit produces on the next blueprint-release CI run.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 18:37:50 +04:00
e3mrah
c3c9c0cf27
feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283)
Catalyst-authored umbrella charts for the W2.5.D AI-inference stack.
None of the three upstream projects publish a Helm chart, so each
chart hand-wires the upstream container as Deployment + Service +
ConfigMap + ServiceMonitor + NetworkPolicy + HPA, with the
sigstore/common library subchart declared to satisfy the
hollow-chart gate (issue #181).

bp-vllm (slot 39) — wraps vllm/vllm-openai:v0.6.4. GPU-aware
(nvidia.com/gpu when vllm.gpu.enabled=true; CPU fallback for dev).
Default model meta-llama/Llama-3.1-8B-Instruct, port 8000,
OpenAI-compatible /v1/chat/completions. All engine knobs
(maxModelLen, gpuMemoryUtilization, dtype, quantization,
tensorParallelSize, prefix-caching) overlay-tunable. Closes #266.

bp-bge (slot 42) — wraps ghcr.io/huggingface/text-embeddings-inference:cpu-1.5.
Default model BAAI/bge-small-en-v1.5 + BAAI/bge-reranker-base
sidecar in same Pod. Two-port Service (8080 embed, 8081 rerank)
annotated for bp-llm-gateway discovery. CPU-friendly defaults;
overlay swaps in BAAI/bge-m3 on GPU Sovereigns. Closes #269.

bp-nemo-guardrails (slot 43) — wraps the upstream NVIDIA/NeMo-Guardrails
Dockerfile (nemoguardrails server, FastAPI, port 8000). LLM endpoint
+ model + engine all overlay-tunable; Colang flow bundle mounts via
configMap.externalName for production rails. ConfigMap stub renders
a default rail for smoke testing. Closes #270.

All three charts:
- Default observability toggles to false per BLUEPRINT-AUTHORING.md §11.2
- Pin upstream image tags (no :latest) per INVIOLABLE-PRINCIPLES.md #4
- Non-root securityContext (runAsUser 1000, drop ALL capabilities)
- prometheus.io scrape annotations on the Pod for fallback discovery
- Operator-tunable NetworkPolicy gating ingress to bp-llm-gateway and
  egress to HuggingFace / bp-vllm / bp-bge as appropriate

helm template (default values) per chart:
  bp-vllm:            ConfigMap, Deployment, Service, ServiceAccount
  bp-bge:             ConfigMap, Deployment, Service, ServiceAccount
  bp-nemo-guardrails: ConfigMap, Deployment, Service, ServiceAccount

helm template (--set serviceMonitor.enabled=true networkPolicy.enabled=true hpa.enabled=true):
  All three render ConfigMap + Deployment + Service + ServiceAccount +
  ServiceMonitor + NetworkPolicy + HorizontalPodAutoscaler.

helm lint: 0 chart(s) failed for all three (single INFO on missing icon —
icons land with the marketplace card work).

Closes #266
Closes #269
Closes #270

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:37:07 +04:00
e3mrah
2997edc4f9
fix(catalyst-ui): rebuild Flow canvas to match provision-mockup-v4.png (#282)
Replaces the pill-card swimlane layout shipped in PR #245 — which the
operator rejected as "intentional divergence from the canonical
mockup" (issue #251) — with a circular-node, multi-region, bezier-edge
canvas that matches `marketing/mockups/provision-mockup-v4.png`.

What changed
------------
New geometry library (`src/lib/flowLayoutV4.ts`):
  • Multi-region partition — caller supplies FlowRegion[]; every
    region renders as a horizontal band stacked top → bottom.
  • Per-region longest-path stage assignment (Kahn) keyed on Job
    dependsOn + caller-supplied `extraDepIds` (component-graph edges
    surfaced from ApplicationDescriptor.dependencies so dense families
    like SPINE / GUARDIAN read as columns even when the test catalog
    has no per-job dependsOn yet).
  • Sub-column splitting at MAX_PER_COLUMN (8) so dense families
    stack vertically rather than fanning out into many sub-columns.
  • Caller-injected family palette (PRODUCTS taxonomy from
    componentGroups.ts) so the flow + the wizard StepComponents
    page colour-code identically.
  • Bezier router — straight for span=0 within-region, cubic-bezier
    for span≥1 and all cross-region edges (warm-amber dashed).

New presentation layer:
  • `FlowCanvasV4.tsx` — circular nodes with status-tinted progress
    arcs, family-colour rings, single-letter glyphs (✓ / ✕ / family
    initial), per-status arrow markers, region band frames, and
    stage column dividers + labels.
  • `FlowDeploymentTree.tsx` + `flowDeploymentTreeData.ts` — left
    "DEPLOYMENT PROGRESS" panel; static tree (NO accordion per the
    operator's standing rule), groups by region → family → job.
  • `FlowLogFeed.tsx` — right "LIVE LOG" panel; replays the focused
    job's recent reducer events, status-coloured, blinking cursor
    when live.

`FlowPage.tsx`:
  • Replaces the JobBubble pipelineLayout pipeline with the
    flowLayoutV4 + FlowCanvasV4 + tree + log triplet.
  • Wires region descriptors from `useWizardStore().regions` (with a
    fallback single-region for empty stores).
  • Derives stage hints — Phase 0 = stage 1, cluster-bootstrap =
    stage 2, components = 3 + componentGraphDepth.
  • Picks an initial focused job (running > failed > first) so the
    log feed always shows something on first paint.
  • Inlines the surface CSS so canvas + tree + log stay in lockstep.

Preserved testid contract
-------------------------
  data-testid="flow-canvas-svg"           — root <svg>
  data-testid="flow-job-<jobId>"          — every node group
  data-testid="flow-batch-<regionId>"     — every region band
  data-testid="flow-canvas-empty"         — empty placeholder

So the existing cosmetic-guard Tests #6/#7/#8 continue to pass without
edits (the Jobs↔Batches mode toggle and the single-click →
FloatingLogPane behaviour are unchanged).

New testids for the upgraded V4 surface:
  data-testid="flow-node-circle-<jobId>"  — actual <circle>
  data-testid="flow-region-<regionId>"    — region band frame
  data-testid="flow-stage-<n>"            — stage column divider
  data-testid="flow-edge-<from>-<to>"     — directional edges
  data-testid="flow-deployment-tree"      — left tree
  data-testid="flow-log-feed"             — right log panel

Tests
-----
21 new unit tests in `src/lib/flowLayoutV4.test.ts` lock the layout
contract (multi-region partitioning, cross-region edge classification,
family palette mapping, bezier routing). All 500 vitest tests + tsc
typecheck green.

Per docs/INVIOLABLE-PRINCIPLES.md
---------------------------------
  #1 (waterfall) — full target shape ships in one PR: circular nodes,
                   multi-region bands, bezier edges, log feed,
                   deployment tree, family palette, stage hints.
  #2 (no compromise) — no graph library, no canvas; pure SVG so testids
                       work and the operator can right-click → inspect.
  #4 (never hardcode) — every dimension is in FlowGeometryV4 knobs;
                        every colour comes from the FlowFamily palette
                        (caller-injected, sourced from PRODUCTS).

Closes #251.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:32:02 +04:00
e3mrah
f446407843
fix(bp-catalyst-platform): remove legacy Traefik Middleware references — Sovereigns use Cilium gateway (#280)
Three template files emitted traefik.io/v1alpha1 Middleware resources
(strip-sovereign in catalyst, strip-nova + root-to-nova in sme) that
are leftover artifacts from the contabo-mkt cluster's k3s-bundled
Traefik. Sovereigns use Cilium native gateway per ARCHITECTURE.md §11
and the Traefik CRDs are never installed there.

Decision: DELETE rather than migrate to Gateway API HTTPRoute. The
strip-sovereign middleware was contabo-mkt-specific (catalyst console
was previously hosted under /sovereign path; on Sovereigns the console
serves at the root of console.<FQDN>). The Nova middlewares are for
the SME product not deployed on Sovereigns under the franchise model.

Bumped 1.1.4 -> 1.1.5.

Closes #279

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:21:47 +04:00
e3mrah
0cfd0defa9
fix(bp-langfuse): drop apostrophe from description to clear GHCR 500 (resolves #215) (#278)
Root cause: Helm's `helm push` collapses the chart `description` field
into a single-line OCI manifest annotation
`org.opencontainers.image.description`. The GHCR manifest-PUT validator
returns a deterministic 500 Internal Server Error when that annotation
is long AND contains an ASCII apostrophe. bp-langfuse 1.0.0 was the
only chart in the observability batch (PR #214) carrying both
characteristics, so it was the only one that failed to publish.

Fix: reword the affected sentence from "Langfuse's persistent state" to
"the Langfuse persistent state" — drops the apostrophe, preserves the
meaning, and crucially preserves every byte of the actual chart payload
(values, templates, all 350 entries of the upstream langfuse-1.5.28
subchart with its 4-level-deep Bitnami vendoring). No runtime
behavioural change; helm template renders the exact same 6 resources
across 490 lines.

The narrowing was done by progressively reducing the Chart.yaml from
the failing version to a passing version while pushing to a scratch
GHCR namespace, with the bp-langfuse repo deleted between attempts
(verified via `DELETE /orgs/openova-io/packages/container/bp-langfuse`
and re-querying). The trigger is reproducible: long description +
apostrophe → 500; long description without apostrophe → push succeeds;
short description with apostrophe → push succeeds.

Added a multi-line WARNING comment immediately above `description:`
documenting the trigger so future authors do not reintroduce a
possessive form. Issue #215 captures the full reproduction.

Closes #215

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 17:31:51 +04:00
e3mrah
6166b97345
feat(bootstrap-kit): edge + apps + AI batch — slot 35 (W2.K4) (#261)
Per docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.6 + §2.7, W2.K4 is the
14-slot batch (35-48) covering Tier 8 (edge) + Tier 9 (apps + AI
runtime). Pre-flight chart-existence check found that only `bp-coraza`
(slot 35) currently has an authored chart — the remaining 13 platform
directories (stunner/knative/kserve/vllm/llm-gateway/anthropic-adapter/
bge/nemo-guardrails/temporal/openmeter/livekit/matrix/librechat) contain
README scaffolding only, no Chart.yaml or blueprint.yaml.

Per the W2 dispatch rule (skip slots whose chart isn't ready, file an
issue, ship what is ready), this PR ships slot 35 only and tracks the
13 missing charts as separate issues. Each missing-chart issue links
back to this PR and to the BOOTSTRAP-KIT-EXPANSION-PLAN.md slot row so
follow-up work has a clean DAG anchor.

Slot 35 — bp-coraza
- chart: platform/coraza/chart/ (1.0.0, scratch chart wiring
  ghcr.io/corazawaf/coraza-spoa:0.7.0 as Deployment + Service)
- dependsOn: bp-cilium (01) [L7 enforcement substrate],
             bp-cert-manager (02) [TLS issuers for SPOA listeners]
- HR knobs: install/upgrade.disableWait: true (event-driven readiness
  via Flux dependsOn graph; per session-2026-04-30 architectural
  decision, never use blanket `spec.timeout: Nm` watchdogs).
- Replicated to all 3 cluster trees: _template, otech.omani.works,
  omantel.omani.works.

Validation
- python3 yaml.safe_load_all on all 6 touched files: OK
- kubectl kustomize on all 3 bootstrap-kit dirs: OK
  (Namespace + HelmRepository + HelmRelease bp-coraza render cleanly)

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:23:59 +04:00
e3mrah
fd5a9ecfad
feat(bootstrap-kit): security+policy batch — slots 27-34 (W2.K3) (#276)
Adds 8 Tier 7 (Security/Policy) HelmReleases per
docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.5 — three cluster copies
(_template, omantel.omani.works, otech.omani.works).

Slots:
  27 bp-kyverno     dependsOn: bp-cilium       (admission policy engine)
  28 bp-reloader    dependsOn: (none)          (configmap/secret-rotation glue)
  29 bp-vpa         dependsOn: (none)          (vertical autoscaler)
  30 bp-trivy       dependsOn: bp-cert-manager (static scanner / operator)
  31 bp-falco       dependsOn: bp-cilium       (runtime threat detection / eBPF)
  32 bp-sigstore    dependsOn: bp-cert-manager (cosign admission verifier)
  33 bp-syft-grype  dependsOn: bp-cert-manager (SBOM + vulnerability matcher)
  34 bp-velero      dependsOn: bp-seaweedfs    (backup; SeaweedFS-backed)

Conventions followed:
  - HR shape mirrors the post-PR-250 event-driven pattern:
    install.disableWait + upgrade.disableWait, no blanket spec.timeout.
  - SOVEREIGN_FQDN substitution: `_template` carries the literal
    `${SOVEREIGN_FQDN}` placeholder; cluster copies have it expanded
    to the per-Sovereign FQDN at provisioning time (matches slot 11/12
    convention introduced by PR #168).
  - bp-reloader and bp-vpa intentionally have no dependsOn — they are
    fully independent infrastructure helpers per the plan's §2.5.
  - kustomization.yaml entries appended in numeric order (slots 15–26
    intentionally empty — reserved for W2.K1 storage+DB and W2.K2
    observability; W2.K3 ships independently).

Validation:
  - `kubectl kustomize clusters/_template/bootstrap-kit/`           OK
  - `kubectl kustomize clusters/omantel.omani.works/bootstrap-kit/` OK
  - `kubectl kustomize clusters/otech.omani.works/bootstrap-kit/`   OK
    (each: 22 HelmReleases, 22 HelmRepositories, 19 Namespaces)
  - All 24 new HR YAML files parse as 3 docs (Namespace + HelmRepository
    + HelmRelease).

Charts and OCI artifacts: charts already present at platform/<name>/
(kyverno, reloader, trivy, falco, sigstore, syft-grype, velero — all
v1.0.0 umbrella charts). Note: platform/vpa/ currently has README.md
only — chart authoring is tracked separately and does not block this
HR-shape PR (Flux will retry until the OCI artifact lands).

Refs docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.5, §3.1 (W2.K3 row),
§4.2 (kustomization merge protocol).

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:22:34 +04:00
e3mrah
adebbddca6
feat(bootstrap-kit): observability batch — slots 20-26 (W2.K2) (#277)
Adds the 7 Tier-6 observability HelmReleases per
docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.4 (W2.K2 batch). Files added in
all three cluster directories (_template, omantel.omani.works,
otech.omani.works) and listed in each cluster's kustomization.yaml.

Slots:

| Slot | Blueprint            | dependsOn                                          |
|-----:|----------------------|----------------------------------------------------|
|   20 | bp-opentelemetry     | bp-cert-manager                                    |
|   21 | bp-alloy             | bp-opentelemetry                                   |
|   22 | bp-loki              | bp-seaweedfs                                       |
|   23 | bp-mimir             | bp-seaweedfs                                       |
|   24 | bp-tempo             | bp-seaweedfs                                       |
|   25 | bp-grafana           | bp-cnpg, bp-loki, bp-mimir, bp-tempo, bp-keycloak  |
|   26 | bp-langfuse          | bp-cnpg, bp-keycloak, bp-cert-manager              |

Pattern follows existing slot files (e.g. 11-powerdns, 13-bp-catalyst-
platform): Namespace + HelmRepository (oci://ghcr.io/openova-io,
ghcr-pull secret) + HelmRelease with disableWait: true on install and
upgrade per the locked decision in MEMORY/session-2026-04-30-handover.md
(disableWait avoids deadlock when downstream backends or CRDs are not
yet reconciled; runtime convergence is observed via kubectl, not gated
on Helm).
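A minimal HelmRelease of the shape described above (an illustrative sketch of the pattern, not a file copied from the repo — the slot/namespace values and `interval` are placeholders):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-loki
  namespace: loki            # placeholder namespace
spec:
  interval: 10m              # placeholder reconcile interval
  chart:
    spec:
      chart: bp-loki
      version: 1.0.0
      sourceRef:
        kind: HelmRepository
        name: openova-io     # oci://ghcr.io/openova-io, ghcr-pull secret
  dependsOn:
    - name: bp-seaweedfs     # Flux gates install on the upstream HR's Ready
  install:
    disableWait: true        # event-driven: Flux dependsOn is the gate,
  upgrade:
    disableWait: true        # not a Helm wait/timeout watchdog
```

`dependsOn` does the sequencing, while `disableWait` lets the HR go Ready without blocking on workloads whose backends or CRDs have not reconciled yet.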

Validated with W2.K0's scripts/check-bootstrap-deps.sh — 0 drift, 0
cycles, all 21 declared slots match scripts/expected-bootstrap-deps.yaml.

Forward-prep notice for slot 26 (bp-langfuse): bp-langfuse:1.0.0 has
not yet published to ghcr.io/openova-io due to issue #215 (Helm v3.16 +
GHCR manifest 500 with nested OCI subchart deps). W1.G is the concurrent
track fixing the publish path. Until that lands, this HelmRelease will
fail to install with a chart-pull error; this is expected and the HR
file is committed now so Flux reconciles automatically once the OCI
artifact is published.

Refs: docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.4, §3.1, §4
Depends on (deferred — flagged in PR body): #215 (langfuse publish)

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:21:26 +04:00
e3mrah
ca295c78a4
feat(bootstrap-kit): storage+DB foundation batch — slots 15-19 (W2.K1; resolves #254) (#262)
W2.K1 of the bootstrap-kit expansion plan (docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md).
Adds the Tier 5 storage+DB foundation as 5 contiguous HRs, mirrored across
the 3 cluster manifest trees (_template, otech.omani.works, omantel.omani.works).

| Slot | File                       | Blueprint           | Tier | dependsOn (Flux) |
|-----:|----------------------------|---------------------|------|------------------|
|   15 | 15-external-secrets.yaml   | bp-external-secrets | 0/3  | bp-openbao(08), bp-cert-manager(02) |
|   16 | 16-cnpg.yaml               | bp-cnpg             | 5    | bp-flux(03) |
|   17 | 17-valkey.yaml             | bp-valkey           | 5    | bp-flux(03) |
|   18 | 18-seaweedfs.yaml          | bp-seaweedfs        | 5    | bp-flux(03), bp-cert-manager(02) |
|   19 | 19-harbor.yaml             | bp-harbor           | 5    | bp-cnpg(16), bp-seaweedfs(18), bp-cert-manager(02) |

Per docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.3 the dependsOn graph for
Tier 5 is finite-depth: ESO routes through bp-openbao (slot 08, Tier 1)
so Flux gates ESO install on OpenBao Ready=True regardless of slot order;
bp-cnpg and bp-valkey only need Flux Ready (their own CRDs ship in-chart);
bp-seaweedfs requests TLS from cert-manager; bp-harbor closes the cohort
by depending on cnpg + seaweedfs + cert-manager.

All 5 HRs use spec.install.disableWait=true + spec.upgrade.disableWait=true
per docs/INVIOLABLE-PRINCIPLES.md #3 (event-driven; Flux dependsOn is the
gate, not Helm timeout). Replaces the pre-PR-250 blanket spec.timeout: 15m
band-aid pattern.

Namespaces:
  bp-external-secrets → external-secrets-system
  bp-cnpg             → cnpg-system
  bp-valkey           → valkey
  bp-seaweedfs        → seaweedfs
  bp-harbor           → harbor

Resolves issue #254 — bp-powerdns pod stuck in CreateContainerConfigError
because pdns-pg-app Secret is generated by a CNPG Cluster CR; without the
operator the secret never materializes. Wiring bp-cnpg into the kit is
the structural fix; PR #248's disableWait keeps the HR Ready=True while
the pod itself recovers once the Cluster CR materializes the Secret.

Validation:
  kubectl kustomize clusters/_template/bootstrap-kit/           → 54 objects, 19 HRs
  kubectl kustomize clusters/otech.omani.works/bootstrap-kit/   → 54 objects, 19 HRs
  kubectl kustomize clusters/omantel.omani.works/bootstrap-kit/ → 54 objects, 19 HRs

Path isolation: this commit touches only slots 15-19 + the 3 kustomization.yaml
files (numeric-append). Charts under platform/<name>/ are NOT touched —
chart authoring is owned by separate parallel agents per the W2 dispatch.
The HelmRelease 1.0.0 version is the first-release convention (cf. slot 14
bp-crossplane-claims:1.0.0 in PR #247); the OCI artifact lands once the
chart is authored and the blueprint-release workflow publishes it.

Closes #254

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:18:12 +04:00