Adds e2e/cloud-nav.spec.ts — 7 Playwright assertions that lock in
the Sovereign-portal Cloud accordion contract from issue #309:
1. Sidebar exposes Cloud (not Infrastructure) accordion.
2. Clicking the Cloud header toggles expanded state and reveals 4
sub-items (Architecture / Compute / Network / Storage).
3. Each sub-item routes to /provision/$id/cloud/{suffix} and
declares aria-current=page when active.
4. Legacy /infrastructure/* paths redirect to /cloud/* equivalents.
5. Expanded state persists across page reloads via the
`sov-nav-cloud-expanded` localStorage key.
6. Accordion auto-expands when the operator deep-links onto a
/cloud/* route.
7. Captures three 1440x900 screenshots (collapsed, expanded with
Architecture active, expanded with Compute active) under
e2e/screenshots/p1-cloud-nav-*.png for visual evidence.
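A minimal sketch of assertions 5 + 6 (the deployment-id fixture is
hypothetical; the test ids and localStorage key are the ones listed
above):

  import { test, expect } from '@playwright/test';

  test('cloud accordion expansion survives a reload', async ({ page }) => {
    // deep-link onto a /cloud/* route — accordion must auto-expand
    await page.goto('/provision/demo-deployment/cloud/architecture');
    await expect(page.getByTestId('sov-nav-cloud'))
      .toHaveAttribute('aria-expanded', 'true');

    // reload — expansion persists via the sov-nav-cloud-expanded key
    await page.reload();
    await expect(page.getByTestId('sov-nav-cloud'))
      .toHaveAttribute('aria-expanded', 'true');
    await expect(page.getByTestId('sov-nav-cloud-architecture'))
      .toHaveAttribute('aria-current', 'page');
  });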
Also fixes a Sidebar bug surfaced by the e2e run: the active-section
detector was using `pathname.includes('/cloud')`, which would falsely
flag any deploymentId containing the substring "cloud" as being on a
/cloud/* route. Replaced with a path-segment regex.
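Roughly (the exact regex in Sidebar.tsx may differ):

  // match 'cloud' only as a whole path segment
  const CLOUD_SEGMENT_RE = /(^|\/)cloud(\/|$)/;

  CLOUD_SEGMENT_RE.test('/provision/abc123/cloud/compute'); // true
  CLOUD_SEGMENT_RE.test('/provision/cloudy-id/overview');   // false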
Adds e2e/screenshots/ to .gitignore (regenerated each run, never
committed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Converts every legacy /provision/$deploymentId/infrastructure/* path
into a beforeLoad redirect that targets the equivalent /cloud/* route,
preserving the $deploymentId param so deep links and bookmarks land
on the renamed surface without an extra hop:
/infrastructure → /cloud/architecture
/infrastructure/topology → /cloud/architecture
/infrastructure/compute → /cloud/compute
/infrastructure/network → /cloud/network
/infrastructure/storage → /cloud/storage
The redirect routes still register tanstack-router components (a
no-op stub), because the route node must exist for the path to match
before `beforeLoad` fires.
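Shape of each redirect route, sketched (route and parent names
assumed; the real definitions live in router.tsx):

  import { createRoute, redirect } from '@tanstack/react-router';
  import { provisionDeploymentRoute } from './router'; // assumed export

  const legacyComputeRoute = createRoute({
    getParentRoute: () => provisionDeploymentRoute,
    path: 'infrastructure/compute',
    component: () => null, // no-op stub so the route node exists
    beforeLoad: ({ params }) => {
      throw redirect({
        to: '/provision/$deploymentId/cloud/compute',
        params, // carries $deploymentId through
      });
    },
  });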
Updates the cosmetic-guard suite to assert the new redirect
behaviour + the new sidebar shape (sov-nav-cloud accordion replacing
the flat sov-nav-infrastructure entry). The original `infrastructure
page` describe block is replaced by a tighter `cloud section` one
that focuses on structural surface contract; deeper accordion
behaviour is owned by the new cloud-nav.spec.ts (added in a
subsequent commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the flat Infrastructure entry in the Sovereign sidebar with a
Cloud accordion (issue #309). The four sub-pages — Architecture,
Compute, Network, Storage — render as indented entries under the Cloud
header instead of as an in-page tab strip.
Behavior:
- Cloud header is a <button> (not a Link) that toggles the
accordion. Active when on any /cloud/* (or legacy /infrastructure/*)
route.
- Sub-items are tanstack-router <Link>s targeting
/provision/$deploymentId/cloud/{architecture,compute,network,storage}.
Active sub-item carries aria-current="page".
- Auto-expanded by default when the operator is on a /cloud/* route.
- Persists expand state in localStorage under
`sov-nav-cloud-expanded` so it survives page reloads.
- ARIA: aria-expanded + aria-controls on the header; the sub-list
is role="group" with the matching id (sov-nav-cloud-group).
- Keyboard accessible: Enter / Space toggle the accordion.
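A minimal sketch of the expand-state wiring (hook name and shape
assumed; the real Sidebar may differ):

  import * as React from 'react';

  const STORAGE_KEY = 'sov-nav-cloud-expanded';

  function useCloudExpanded(onCloudRoute: boolean) {
    const [expanded, setExpanded] = React.useState<boolean>(
      // auto-expand on /cloud/*; otherwise restore the persisted flag
      () => onCloudRoute || localStorage.getItem(STORAGE_KEY) === 'true',
    );
    const toggle = () =>
      setExpanded((prev) => {
        localStorage.setItem(STORAGE_KEY, String(!prev));
        return !prev;
      });
    return { expanded, toggle }; // header renders aria-expanded={expanded}
  }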
Test IDs:
sov-nav-cloud (header), sov-nav-cloud-toggle (chevron),
sov-nav-cloud-architecture, sov-nav-cloud-compute,
sov-nav-cloud-network, sov-nav-cloud-storage (sub-items),
sov-nav-cloud-group (group container).
Issue #309 founder verbatim:
"have accordion menu under cloud left pane"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Renames the Sovereign Cloud shell and drops the in-page Topology /
Compute / Storage / Network tab strip in favour of the sidebar
accordion that lands in the next commit. The sub-page contents are
unchanged in this commit (they keep their file names + testids; the
next commits rename those).
Changes:
- InfrastructurePage.tsx → CloudPage.tsx (file + class + context).
- InfrastructureContext / useInfrastructure() → CloudContext /
useCloud() — sub-pages updated to pull from the renamed hook.
- Page header "Infrastructure" → "Cloud"; tagline rewritten so it no
longer enumerates the legacy tab labels.
- Drop INFRA_TABS, resolveActiveTab, the <nav role=tablist> block,
and the .tabs / .tab CSS rules. The sidebar accordion (next
commit) replaces the in-page navigation.
- data-testid renames: infrastructure-page → cloud-page,
infrastructure-title → cloud-title,
infrastructure-content → cloud-content,
infrastructure-sovereign-switcher → cloud-sovereign-switcher.
- Compute table cluster-link target updated from /topology →
/cloud/architecture so it lands on the renamed canvas route.
- InfrastructurePage.test.tsx renamed; tab-strip assertions
converted into "tab strip is absent" assertions.
- Sub-page test fixtures updated to mount under /cloud/* paths.
Issue #309 founder verbatim:
"we call it as cloud maybe"
"have accordion menu under cloud left pane"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the new Sovereign-portal Cloud surface routing tree (issue #309)
without removing the legacy /infrastructure/* paths yet:
/provision/$deploymentId/cloud → CloudPage shell
↳ / → redirect to /architecture
↳ /architecture → Architecture canvas
↳ /compute → CloudCompute
↳ /network → CloudNetwork
↳ /storage → CloudStorage
Both /infrastructure/* and /cloud/* now resolve to the same components.
Subsequent commits will rename the components, drop the in-page tab
strip, switch the sidebar to an accordion, and convert /infrastructure/*
into redirects to /cloud/*.
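Tree shape, sketched (identifier names and import paths assumed; see
router.tsx for the real definitions):

  import { createRoute, redirect } from '@tanstack/react-router';
  import { provisionDeploymentRoute } from './router'; // assumed export
  import { CloudPage } from './CloudPage';             // assumed path

  const cloudRoute = createRoute({
    getParentRoute: () => provisionDeploymentRoute,
    path: 'cloud',
    component: CloudPage, // the shell
  });

  const cloudIndexRoute = createRoute({
    getParentRoute: () => cloudRoute,
    path: '/',
    beforeLoad: ({ params }) => {
      throw redirect({
        to: '/provision/$deploymentId/cloud/architecture',
        params,
      });
    },
  });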
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream seaweedfs/seaweedfs templates/shared/security-configmap.yaml
uses the Helm template function fromToml; helm-controller v1.1.0's
bundled Helm SDK (a v3.x older than 3.13) doesn't define fromToml, so
the install fails:
parse error at security-configmap.yaml:21: function fromToml not defined
Setting global.seaweedfs.enableSecurity: false skips the entire template.
Internal SeaweedFS API is cluster-IP only on Sovereign-1; chart-level
security is acceptable to defer until helm-controller is bumped.
Bumped 1.0.0 → 1.0.1.
Unblocks the chain: bp-loki, bp-mimir, bp-tempo, bp-velero, bp-harbor,
bp-grafana all dependsOn bp-seaweedfs.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
The chart's post-install hook was failing on otech.omani.works:
failed post-install: unable to build kubernetes object for deleting hook
bp-external-secrets/templates/clustersecretstore-vault-region1.yaml:
resource mapping not found for kind ClusterSecretStore in version
external-secrets.io/v1beta1
Two corrections:
1. Capabilities-gate the entire template — don't render unless the
ClusterSecretStore CRD is registered (it ships with the upstream
ESO subchart but isn't live on first install).
2. Remove the 'before-hook-creation' delete-policy (which was the
actual trigger for the 'deleting hook' failure path).
Bumped 1.0.0 → 1.0.1.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
'function fromToml not defined' error on bp-seaweedfs publish.
Upstream seaweedfs/seaweedfs 4.22.0 (templates/shared/security-configmap.yaml:21)
uses fromToml, which exists in Helm 3.13+; the render context in the
smoke step also needs the newer Sprig functions that ship in 3.18+.
The bump unblocks the chain of HRs (bp-loki, bp-mimir, bp-tempo,
bp-velero, bp-harbor, bp-grafana) all blocked on the bp-seaweedfs
publish.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Pre-existing bug exposed by #305: ExecutionLogs fetched
`/api/v1/actions/executions/{id}/logs` directly instead of going
through API_BASE (`${BASE}api`). Under Vite's `/sovereign/` base path,
the Traefik ingress only routes `/sovereign/api/...` — bare `/api/...`
returns 404.
Live evidence after #328 (jobId raw colon fix):
GET /sovereign/api/v1/deployments/.../jobs/{id} → 200 (FE rewire OK)
GET /api/v1/actions/executions/{realExecId}/logs → 404 (this bug)
Note that the executionId in the failing URL is a real 32-char hex
(5f59cb0bc9df2c720b4cf07989e4dc4f), not the synthetic `:latest` —
proving the rewire in #307 + the colon fix in #328 both worked. Only
the logs URL prefix remained wrong.
Fix: import API_BASE; use `${API_BASE}/v1/actions/executions/...`.
Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode URLs in app
source) — the original direct `/api/...` was a violation that this
PR settles permanently.
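Sketch of the corrected construction (the API_BASE module path is
assumed):

  import { API_BASE } from '../lib/apiBase'; // path assumed

  const executionLogsUrl = (executionId: string) =>
    `${API_BASE}/v1/actions/executions/${encodeURIComponent(executionId)}/logs`;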
Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five operator-spec corrections:
1. More structured (pipeline-like)
forceX strength 0.32 → 0.55. Same-depth siblings now cluster around
their depth column; pipeline-y horizontal feel preserved.
2. Min spacing between bubbles + smaller bubbles
NODE_RADIUS 30 → 22 (more breathing room).
COLLIDE_PADDING 6 → 14 (forces wider gap regardless of zoom).
3. Hard MAX bubble size — no more elephant in batch view
Auto-fit viewBox now enforces a MIN viewBox size (1200×700). Single-
bubble or few-bubble cases (batch detail, etc.) keep the canvas at
that minimum so the bubble can't scale up to fill the whole screen.
bbox is centered within the (possibly larger) viewBox.
4. Click highlight — selected node + neighbors + connecting edges
• openJobId node: amber outer ring (4px) + amber glow halo
• Direct neighbors: lighter-amber ring (3px) + softer halo
• Edges connecting selected node: amber stroke 2.6px + amber arrow
• Non-selected non-neighbor nodes: dimmed to opacity 0.35
• Status fill kept (so we still see succeeded/failed/running/pending)
The amber palette is distinct from any status colour so selection
reads clearly even on running (cyan) or failed (red) bubbles.
5. Remove standalone /flow route + 'Show as Flow' button
Operator: 'we cannot hard code a specific flow, we'll have multiple
flows, therefore we should show the flows only under the respective
jobs.' Removed:
• provisionFlowRoute from router.tsx
• 'Show as Flow' button from JobsPage.tsx
• JobsTable batch chip retargeted from /flow?scope=batch:<id> to the
canonical /batches/ page (which embeds the flow internally)
FlowPage component preserved — it's still embedded inside JobDetail
and BatchDetail as the in-context Flow tab.
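The highlight sets behind correction 4, sketched (types simplified):

  type Edge = { from: string; to: string };

  function selectionSets(openJobId: string, edges: Edge[]) {
    const neighbors = new Set<string>();
    const activeEdges: Edge[] = [];
    for (const e of edges) {
      if (e.from === openJobId || e.to === openJobId) {
        activeEdges.push(e); // amber stroke + amber arrowhead
        neighbors.add(e.from === openJobId ? e.to : e.from);
      }
    }
    // everything not selected / neighboring dims to opacity 0.35
    return { neighbors, activeEdges };
  }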
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
encodeURIComponent encodes `:` as `%3A` when applied to a path
segment. Chi's router does NOT decode %3A before matching the route,
so every JobDetail fetch returned 404 against the catalyst-api.
Live evidence (Playwright network log on otech wizard, 2026-04-30):
GET https://console.openova.io/sovereign/api/v1/deployments/
ce476aaf80731a46/jobs/ce476aaf80731a46%3Ainstall-seaweedfs
→ 404
Internal probe with the raw colon:
wget http://localhost:8080/api/v1/deployments/.../jobs/
ce476aaf80731a46:install-seaweedfs
→ 200
Result on the live deployment: every JobDetail page rendered the
"Execution metadata pending" placeholder even though the catalyst-api
DID have a valid execution to surface. Bug is in the FE encoder, not
the backend or the route.
Fix:
- useJobDetail inserts jobId raw into the URL template. The colon
is RFC 3986 path-safe so this is correct per spec.
- deploymentId stays encodeURIComponent'd defensively (it's a hex
string, no-op in practice, but the encode is cheap insurance).
- Test now asserts the URL contains the raw `:` and rejects %3A.
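Sketch of the URL construction (helper name hypothetical):

  const jobUrl = (base: string, deploymentId: string, jobId: string) =>
    // deploymentId defensively encoded; jobId inserted RAW so the
    // ':' reaches Chi un-escaped (RFC 3986 allows ':' in a path)
    `${base}/v1/deployments/${encodeURIComponent(deploymentId)}/jobs/${jobId}`;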
Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
helm-controller in flux v2.4 (the version Catalyst-Zero pins) emits
structured JSON log lines with HelmRelease as a NESTED OBJECT:
"HelmRelease":{"name":"bp-mimir","namespace":"flux-system"}
The old regex only matched the legacy flat-string format
(`helmrelease="flux-system/bp-X"` or `"helmrelease":"flux-system/bp-X"`).
Result on otech.omani.works: every helm-controller stdout line was
parsed but did not match → silently dropped → zero PhaseComponentLog
events emitted → exec log viewer rendered only synthetic [seeded] /
[<state>] anchor lines.
Verified by tailing helm-controller-86c6b84dcd-t58td on the live otech
cluster (10h reconcile activity, format consistent across hundreds of
lines).
Fix:
- logtailer.helmControllerNameRe now alternates across all three
observed formats: flat-string colon, flat-string equals, and
nested-object name+namespace.
- pumpLines picks whichever capture group fired (regex alternation
leaves the other group empty).
- logtailer_test.go fixtures extended with two real flux v2.4
nested-object samples copied verbatim from the live otech
cluster's helm-controller stdout.
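The alternation, sketched in TypeScript for brevity (the real regex is
Go, in logtailer.go; group names assumed):

  const helmControllerNameRe = new RegExp(
    'helmrelease="[^/"]+/(?<flatEq>[^"]+)"' +        // flat-string equals
    '|"helmrelease":"[^/"]+/(?<flatColon>[^"]+)"' +  // flat-string colon
    '|"HelmRelease":\\{"name":"(?<nested>[^"]+)"',   // nested object
  );

  function releaseName(line: string): string | undefined {
    const m = helmControllerNameRe.exec(line);
    // whichever capture group fired; the other two stay undefined
    return m?.groups?.flatEq ?? m?.groups?.flatColon ?? m?.groups?.nested;
  }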
Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three operator-spec corrections to the organic Flow canvas:
1. Straight edges, not bezier curves
FlowEdge now renders <line x1 y1 x2 y2> rim-to-rim instead of a
cubic bezier with perpendicular control points.
2. Drag pins permanently — no spring-back
d3-drag 'end' handler no longer clears d.fx/d.fy. The bubble stays
exactly where the operator dropped it. Operator can re-drag any time.
forceX/forceY anchors only act on non-pinned (fx/fy === null) nodes.
3. Auto-fit viewBox — smart canvas filling regardless of node count
Replaced fixed viewBox="0 0 2000 1100" with bbox computed each
render: vbX/vbY = min(x|y) - padding, vbW/vbH = (max - min) +
2*padding. preserveAspectRatio="xMidYMid meet" then auto-scales.
Result:
• 2 bubbles at depth 0/1 → small bbox → tight zoom (no
irrelevant left-right corner flight)
• 35 bubbles at depth 0..6 → wide bbox → full canvas use (~85-95%)
Bubble radius stays 30px; per-depth x step stays 150px; per-region
band height 240px — all bounded so links can't stretch arbitrarily.
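The bbox computation, sketched (padding value assumed):

  function fitViewBox(pts: { x: number; y: number }[], pad = 60) {
    const xs = pts.map((p) => p.x);
    const ys = pts.map((p) => p.y);
    const minX = Math.min(...xs), minY = Math.min(...ys);
    const vbW = Math.max(...xs) - minX + 2 * pad;
    const vbH = Math.max(...ys) - minY + 2 * pad;
    return { vbX: minX - pad, vbY: minY - pad, vbW, vbH };
  }
  // <svg viewBox={`${vbX} ${vbY} ${vbW} ${vbH}`}
  //      preserveAspectRatio="xMidYMid meet" />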
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
PR #308 shipped the organic layout. Live verification at 1440px showed:
- bubbles cluster at depth=0 (left ~12% of canvas)
- only 1 edge rendered
Root cause: live Job objects from the backend bridge don't carry their
upstream dependsOn arrays — the bridge surfaces flat status only. The
useJobHints hook was relying on Job.dependsOn + ApplicationDescriptor
deps; both are empty for bootstrap-kit jobs (cilium, cert-manager,
spire, etc.) because they're not user-selected components.
Fix: encode the canonical bootstrap-kit dep graph from
docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2 directly in useJobHints, with
a bareName→liveJobId resolver that handles the various id formats
the backend may use ('bp-cnpg' / 'install-cnpg' / 'install-cnpg::r1').
Result: depth populates 0..6 (longest chain cilium → cert-manager →
spire → openbao → keycloak → gitea → catalyst-platform), bubbles
spread across full canvas width via depthToX(depth/maxDepth), edges
render between every parent→child pair.
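The resolver, sketched (matching rules assumed from the id formats
listed above):

  function resolveLiveJobId(
    bareName: string, // e.g. 'cnpg'
    liveJobIds: string[],
  ): string | undefined {
    return liveJobIds.find((id) =>
      id === `bp-${bareName}` ||
      id === `install-${bareName}` ||
      id.startsWith(`install-${bareName}::`),
    );
  }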
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
In production, handler.New() never assigns h.coreFactory, so phase1_watch
left cfg.CoreFactory == nil. helmwatch.NewWatcher had no default for
CoreFactory (DynamicFactory had one) → the helm-controller log tailer was
never launched → every PhaseComponentLog event was silently dropped.
Result on the live otech cluster: the bridge fix in #307 worked
correctly for state transitions, but the GitLab-style log viewer only
ever saw the synthetic [seeded] / [<state>] anchor lines because the
upstream emission path of raw helm-controller stdout was disconnected.
Fix:
- helmwatch.NewWatcher defaults CoreFactory to
NewKubernetesClientFromKubeconfig (mirroring the existing
DynamicFactory default).
- New regression test TestNewWatcher_DefaultsBothFactories asserts
both factories are non-nil after construction.
Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end fix for the JobDetail log viewer. Three stacked bugs surfaced
by https://console.openova.io/sovereign/provision/ce476aaf80731a46/jobs/install-seaweedfs:
A. Frontend constructed `${jobId}:latest` and sent it to
/api/v1/actions/executions/{id}/logs. The catalyst-api resolves
execId by exact match against 16-byte hex IDs — there is no
`:latest` route, so every log fetch returned 404 and the viewer
rendered "Failed to load log page" / "No logs captured for this log".
B. SeedJobsFromInformerList wrote a Job row with status=running for
non-terminal HR states (installing/degraded) but skipped
StartExecution AND set b.lastState[comp]=state. Subsequent
OnHelmReleaseEvent calls with the same state took the prev==state
early-return and never allocated an Execution. 7 jobs on the live
otech cluster were stuck this way.
C. OnProvisionerEvent accepted only events with ev.Phase == "component"
and so dropped every PhaseComponentLog event the helmwatch logtailer
emits. Raw
helm-controller stdout (one line per reconcile/error/event) never
reached the persisted Execution log file — the GitLab-style viewer
only ever rendered synthetic [seeded] / [<state>] summary lines.
Fixes:
- helmwatch_bridge.go::SeedJobsFromInformerList now allocates an
Execution + writes a [seeded] anchor line for installing/degraded
states. The Execution is left OPEN so OnHelmReleaseEvent and
OnRawComponentLog can keep appending until the HR transitions to a
terminal state.
- helmwatch_bridge.go::OnProvisionerEvent dispatches on Phase:
"component" → OnHelmReleaseEvent (state transitions);
"component-log" → new OnRawComponentLog (raw helm-controller line
appended verbatim to the active Execution). Resolution policy on a
missing in-memory cursor: re-attach to the persisted
LatestExecutionID for non-terminal Jobs; allocate fresh for unknown
Jobs; drop for terminal Jobs (post-install drift-check chatter).
- ui/src/pages/sovereign/useJobDetail.ts (new) — React Query hook
fetches /api/v1/deployments/{id}/jobs/{jobId} and exposes
executions[0].id as the latestExecutionId. 5s poll while the
deployment is in flight.
- ui/src/pages/sovereign/JobDetail.tsx — replaces the synthetic
`${jobId}:latest` with detail.latestExecutionId. When executions[]
is empty, renders ExecutionLogsPlaceholder with status-aware copy
(pending / loading / empty / error) instead of an empty log viewer.
Tests:
- 4 new Go tests on the bridge: raw-log appendsToActiveExecution,
allocatesExecutionWhenJobMissing, dropsAfterTerminal, and
dropsUnknownPhases. Existing seed-idempotency tests updated for
the new "non-terminal seed allocates Execution" contract.
- 2 new vitest cases on JobDetail: uses real executions[0].id (NOT
`${jobId}:latest`) when fetching log lines; renders placeholder
(not viewer) when executions[] is empty.
- All 502 vitest pass; all api Go tests pass; production UI build
clean.
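The hook, sketched (field names beyond those described above are
assumed):

  import { useQuery } from '@tanstack/react-query';
  import { API_BASE } from '../lib/apiBase'; // path assumed

  export function useJobDetail(
    deploymentId: string, jobId: string, inFlight: boolean,
  ) {
    return useQuery({
      queryKey: ['jobDetail', deploymentId, jobId],
      queryFn: async () => {
        const res = await fetch(
          `${API_BASE}/v1/deployments/${encodeURIComponent(deploymentId)}/jobs/${jobId}`,
        );
        if (!res.ok) throw new Error(`job detail ${res.status}`);
        return res.json();
      },
      refetchInterval: inFlight ? 5_000 : false, // 5s poll while in flight
      select: (d: any) => ({
        ...d,
        // real execution id — never the synthetic `${jobId}:latest`
        latestExecutionId: d.executions?.[0]?.id as string | undefined,
      }),
    });
  }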
Closes via UAT on https://console.openova.io/sovereign/provision/ce476aaf80731a46/jobs/install-seaweedfs
Refs #204, supersedes the cosmetic #232 surface.
Co-authored-by: hatice yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the stage-column / Sugiyama grid that all prior Flow PRs
inherited (#245, #282, #299, #303, #304). The grid was the actual
cause of the "8x5 squashed in middle 1/3" bug operators kept rejecting
— bubbles spawned in column-grid positions and physics could only
nudge them slightly off the grid.
Per operator spec (2026-04-30):
• Bubbles spread organically across full canvas width.
• X-axis = dependency depth (longest-path-from-root); depth 0 left,
deepest right; 6%-94% of viewport.
• Y-axis = region midpoint + per-node deterministic vertical jitter,
so same-depth siblings scatter naturally — NOT a strict column.
• Edges are bezier curves with status-colored arrowheads, drawn
each tick from live simulation positions.
• NO "STAGE 1/2/..." labels. NO column dividers. NO grid.
• Bubbles draggable (d3-drag); collision avoidance via d3-force.
• Batch view: single-click → BatchSummaryPane (start, finish OR ETA,
duration, succeeded/running/pending/failed counts).
• Batch view: double-click drills via ?scope=batch:<id>&view=jobs
(siblings stay rendered at parent level via the URL scope).
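The two positional mappings from the spec above, sketched (the jitter
hash is illustrative):

  // X: dependency depth → 6%..94% of the viewport width
  const depthToX = (depth: number, maxDepth: number, vw: number) =>
    vw * (0.06 + (maxDepth ? depth / maxDepth : 0) * 0.88);

  // Y: region midpoint + deterministic per-node jitter
  function jitter(jobId: string, amplitude: number): number {
    let h = 0;
    for (const ch of jobId) h = (h * 31 + ch.charCodeAt(0)) | 0;
    return (((h >>> 0) % 1000) / 1000 - 0.5) * amplitude;
  }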
New files:
• src/lib/flowLayoutOrganic.ts — pure data prep (depth, region,
family, edges); NO precomputed positions.
• src/pages/sovereign/FlowCanvasOrganic.tsx — full SVG renderer
with d3-force seed + drag.
• src/pages/sovereign/BatchSummaryPane.tsx — right floating pane
for batch-mode single-click.
Updated:
• FlowPage.tsx — switches imports + renderer; routes batch dbl-click
via ?scope= URL; routes single-click pane by mode.
Old flowLayoutV4.ts + FlowCanvasV4.tsx are kept on disk for now (only
DEFAULT_FAMILIES is still imported); a follow-up PR will delete them.
Per docs/INVIOLABLE-PRINCIPLES.md:
§1 (waterfall) — full target-state organic layout in this PR.
§2 (no compromise) — replace the wrong layout, not patch it.
§8 (disclose divergence) — flowLayoutV4.ts intentionally retained
for the DEFAULT_FAMILIES export only; cleanup follow-up.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three corrections per founder spec (verbatim 2026-04-30):
1. SINGLE pane only — removed the persistent left deployment-tree
panel and the persistent right log-feed panel. Canvas is now full
width. The exec log appears as a FloatingLogPane only on
single-click of a job bubble (existing behaviour, unchanged).
2. Job/Batch toggle now actually switches detail level:
• mode='jobs' (default) renders one bubble per job (~35 nodes)
• mode='batches' renders one bubble per batch with rolled-up
status (failed > running > pending > succeeded), startedAt
(earliest), finishedAt (latest), durationMs (max-earliest),
and inferred cross-batch dependsOn edges.
3. Bubbles draggable + physics — added d3-force simulation with:
• forceCollide(r=node.r+4) — natural collision avoidance
• forceX/forceY toward layout-suggested anchor — soft return
to canonical position when not held
• forceLink between dependsOn pairs — gentle attraction
• d3-drag wired via data-flow-draggable on each <g> node group;
drag pins node, release lets physics resettle
• bezier edges recompute control points each tick so they
follow dragged nodes naturally
• cursor: grab on every bubble
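Simulation wiring, sketched (strengths and paddings assumed):

  import {
    forceSimulation, forceCollide, forceLink, forceX, forceY,
    type SimulationNodeDatum,
  } from 'd3-force';

  interface FlowNode extends SimulationNodeDatum {
    id: string; r: number; anchorX: number; anchorY: number;
  }

  declare const nodes: FlowNode[];
  declare const links: { source: string; target: string }[];
  declare function redraw(): void; // recomputes bezier control points

  const sim = forceSimulation(nodes)
    .force('collide', forceCollide<FlowNode>((d) => d.r + 4))
    .force('link', forceLink(links).id((d: any) => d.id).strength(0.1))
    .force('x', forceX<FlowNode>((d) => d.anchorX).strength(0.3))
    .force('y', forceY<FlowNode>((d) => d.anchorY).strength(0.3))
    .on('tick', redraw);
  // drag start pins: d.fx = event.x; d.fy = event.y
  // drag end releases: d.fx = d.fy = null (physics resettles)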
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In PR #297 I tried to restore sme-services/kustomization.yaml via
'git show f4a83a27^:...path... > file 2>&1' but git show failed (the
file didn't exist in that commit) and the stderr got captured into the
file as literal text:
fatal: path 'products/.../kustomization.yaml' exists on disk, but not in 'f4a83a27^'
Kustomize then choked on this file with:
invalid Kustomization: json: unknown field "fatal"
This blocked contabo's flux-system/catalyst-platform Kustomization from
applying anything since 16:16 UTC.
Restores the correct kustomization.yaml content from commit 6eac8a72^.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three contabo-mkt Flux Kustomizations were broken by my recent PRs:
- flux-system/catalyst-platform: PR #260 added a Helm-template-syntax CRD
at products/catalyst/chart/templates/crd-provisioningstate.yaml.
Contabo's Flux Kustomization reads this path as raw YAML and chokes on
the {{ }} blocks. Moved the CRD to products/catalyst/chart/crds/
(Helm convention — installed unconditionally, not Helm-templated).
- flux-system/marketplace-api: PR #246 deleted the kustomization.yaml
index file that contabo's Flux Kustomization needs to enumerate
manifests. PR #280 deleted the marketplace-api/ingress.yaml. Restored
both as raw YAML.
- flux-system/sme-services: PR #281 deleted the entire sme-services/
directory. Restored all 14 manifest files as raw YAML.
Sovereign-side: added .helmignore entries so Sovereign HelmRelease
installs (otech, omantel) skip the contabo-only files entirely:
- templates/ingress.yaml (Traefik Middleware + Ingress for console)
- templates/ingress-console-tls.yaml (TLS-terminating ingress, NEW —
was missing on contabo, causing TRAEFIK DEFAULT CERT errors)
- templates/sme-services/
- templates/marketplace-api/
Bumped 1.1.6 -> 1.1.8.
Cluster impact:
- contabo: 3 broken Kustomizations recover; console.openova.io gets
proper Let's Encrypt cert via the new console-openova-tls Certificate.
- otech / omantel Sovereigns: no contabo-mkt content rendered; install
works clean against chart 1.1.8.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Activates the previously-templated `letsencrypt-dns01-prod` ClusterIssuer
in bp-cert-manager by shipping the missing piece — a Go binary that
satisfies cert-manager's external webhook contract
(`webhook.acme.cert-manager.io/v1alpha1`) against the Dynadot api3.json.
Architecture
============
* `core/pkg/dynadot-client/` — canonical Dynadot HTTP client (shared with
pool-domain-manager and catalyst-dns). Encapsulates the api3.json
transport, command builders, response decoding, and the safe
read-modify-write semantics required to never accidentally wipe a
zone (memory: feedback_dynadot_dns.md). Destructive `set_dns2`
variant is unexported.
* `core/cmd/cert-manager-dynadot-webhook/` — the cert-manager webhook
binary. Implements `Solver.Present` via the client's append-only
`AddRecord` path and `Solver.CleanUp` via the read-modify-write
`RemoveSubRecord` path. Domain allowlist (`DYNADOT_MANAGED_DOMAINS`)
rejects challenges for unmanaged apexes BEFORE any Dynadot call.
* `platform/cert-manager-dynadot-webhook/` — Catalyst-authored Helm
wrapper. Templates Deployment + Service + APIService + serving
Certificate (CA chain via cert-manager Issuer self-signing) +
RBAC + ServiceAccount. Mirrors the standard cert-manager external-
webhook deployment shape.
* `platform/cert-manager/chart/` — flips `dns01.enabled: true` so the
paired ClusterIssuer activates. The interim http01 issuer remains
templated as the rollback path.
Test results
============
core/pkg/dynadot-client — 7 tests PASS (race-clean)
core/cmd/cert-manager-dynadot-... — 9 tests PASS (race-clean)
Test coverage includes a Present/CleanUp round-trip against an
httptest fixture that models Dynadot's zone state, an explicit
unmanaged-domain rejection, a regression preserving a pre-existing
CNAME across the DNS-01 round-trip (the zone-wipe defence), and a
typed-error propagation test that surfaces `ErrInvalidToken` to
cert-manager so the controller will retry.
Helm template smoke render
==========================
`helm template` against the new chart with default values yields 12
resources / 424 lines (APIService, Certificate, ClusterRoleBinding,
Deployment, Issuer, Role, RoleBinding, Service, ServiceAccount). The
modified bp-cert-manager chart still renders both ClusterIssuers
(`letsencrypt-dns01-prod` + `letsencrypt-http01-prod`) with default
values; flipping `certManager.issuers.dns01.enabled=false` is the
clean rollback.
Smoke command (post-deploy)
===========================
kubectl get apiservices.apiregistration.k8s.io \
v1alpha1.acme.dynadot.openova.io
# Issue a *.<sovereign>.<pool> wildcard cert and watch the
# Order/Challenge progress through cert-manager.
CI
==
`.github/workflows/build-cert-manager-dynadot-webhook.yaml` mirrors the
pool-domain-manager-build pattern (cosign keyless signing, SBOM
attestation, GHCR push at
`ghcr.io/openova-io/openova/cert-manager-dynadot-webhook:<sha>`).
Triggered by changes to either the binary or
the shared dynadot-client package.
Closes #159
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Edge + serverless + model-serving batch (W2.5.C) — three upstream-
subchart umbrella Blueprints completing the bootstrap-kit slots for
WebRTC media relay (bp-relay → bp-stunner) and the AI/ML serving stack
(bp-cortex → bp-kserve → bp-knative).
Each chart follows the canonical umbrella pattern from
docs/BLUEPRINT-AUTHORING.md §11.1: Chart.yaml declares the upstream
chart under `dependencies:` so `helm dependency build` bundles the
upstream payload into the OCI artifact, and Catalyst-curated overlay
values + templates sit alongside in chart/values.yaml + chart/templates/.
Per-chart highlights:
- bp-stunner/1.0.0 — wraps stunner/stunner-gateway-operator 1.1.0.
Ships a Cilium-native GatewayClass (Capabilities-gated on
gateway.networking.k8s.io/v1) so bp-relay (LiveKit / SFU) can claim
Gateway CRs without an operator-ordering dance. Default UDP TURN port
range 30000-32767 matches the range opened at the Sovereign edge
firewall (Crossplane bp-firewall composition).
- bp-knative/1.0.0 — wraps knative-operator v1.21.1. Ships a
KnativeServing CR pre-configured for istio-less mode
(ingress.istio.enabled=false, ingress.contour.enabled=false,
ingress.kourier.enabled=false; config.network.ingress-class=cilium).
Sovereign FQDN sourced from values, no hardcoded fallback per
inviolable principle #4 — render fails loudly if cluster overlay
doesn't set knativeOverlay.knativeServing.sovereignFqdn.
- bp-kserve/1.0.0 — wraps kserve/kserve v0.16.0 (latest version
published on the official OCI registry as of 2026-04-30). Default
deploymentMode=RawDeployment (no Knative hop on the hot path) but
bp-knative is still installed (declared as a hard dep) so per-IS
annotation `serving.kserve.io/deploymentMode: Serverless` opts in to
scale-to-zero per tenant. Cilium native Gateway-API ingress
(enableGatewayApi=true, className=cilium, disableIstioVirtualHost=
true).
Observability discipline (issue #182): every observability toggle
(ServiceMonitor, HPA, GatewayClass) defaults false and is operator-
tunable via per-cluster overlay once bp-kube-prometheus-stack reconciles.
Each chart ships tests/observability-toggle.sh covering default-off,
opt-in (with `--api-versions monitoring.coreos.com/v1` to simulate
Prometheus Operator CRDs), and explicit-off cases.
Per-chart kind summary (helm template default render):
bp-stunner: ClusterRole, ClusterRoleBinding, ConfigMap, Dataplane,
Deployment, Role, RoleBinding, Service, ServiceAccount.
(+ GatewayClass when --api-versions
gateway.networking.k8s.io/v1 is passed.)
bp-knative: ClusterRole, ClusterRoleBinding, ConfigMap,
CustomResourceDefinition, Deployment, KnativeServing,
Role, RoleBinding, Secret, Service, ServiceAccount.
bp-kserve: Certificate, ClusterRole, ClusterRoleBinding,
ClusterServingRuntime, ClusterStorageContainer,
ConfigMap, Deployment, Gateway, Issuer,
MutatingWebhookConfiguration, Role, RoleBinding,
Service, ServiceAccount, ValidatingWebhookConfiguration.
`helm lint` clean for all three (single INFO on missing icon — icons
land with marketplace card work).
`bash tests/observability-toggle.sh` green for all three (3 cases each:
default-off, opt-in, explicit-off).
Closes #263, closes #264, closes #265
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
W2.5.F — three Catalyst Blueprint umbrella charts at platform/{openmeter,
livekit,matrix}/, each declaring its upstream chart under Chart.yaml
`dependencies:` so `helm dependency build` bundles the upstream payload
into the published OCI artifact (per docs/BLUEPRINT-AUTHORING.md §11.1
— hollow charts forbidden, CI-enforced by issue #181).
Per-chart kind summary
======================
bp-openmeter (closes #272)
default `helm template` kinds: ConfigMap, Deployment, Service, ServiceAccount
upstream chart: openmeter 1.0.0-beta.213 (oci://ghcr.io/openmeterio/helm-charts)
ClickHouse-less profile per docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §6.4.
The upstream chart's bundled clickhouse / kafka / postgresql / redis /
svix subcharts are all DISABLED — Catalyst supplies CNPG (postgres),
JetStream (event bus), and Valkey (redis-compat) at the platform tier.
Chart-level toggle `catalystBlueprint.backend.kind` (default `cnpg`,
alt `clickhouse`) records the active profile so observability/audit
pipelines can report it. The OpenMeter binary's
`aggregation.clickhouse.address` is left blank — per-Sovereign overlay
supplies it once a host cluster adds bp-clickhouse and the operator
re-rolls with `backend.kind: clickhouse`. Catalyst overlay templates
(NetworkPolicy / ServiceMonitor / HPA) all default OFF per
docs/BLUEPRINT-AUTHORING.md §11.2.
bp-livekit (closes #273)
default `helm template` kinds: ConfigMap, Deployment, Service, ServiceAccount
upstream chart: livekit-server 1.9.0 (https://helm.livekit.io)
WebRTC SFU. Powers the Huawei iFlytek voice demo. Catalyst defaults
pair LiveKit with bp-stunner (the upstream chart's bundled co-located
TURN server is OFF; per-Sovereign overlay points the LiveKit TURN
config at the stunner UDP-gateway Service). RTC UDP port range is
50000-60000 (matches the Hetzner firewall rule the per-Sovereign
overlay opens). Catalyst overlay templates (NetworkPolicy /
ServiceMonitor / HPA) all default OFF; the chart's NetworkPolicy
template documents that LiveKit's hostNetwork mode means pod-level
policies do NOT cover the SFU port range — the firewall rule is the
load-bearing control. blueprint.yaml `depends:` declares bp-stunner +
bp-cert-manager + bp-valkey.
bp-matrix (closes #274)
default `helm template` kinds: ConfigMap, Deployment, Ingress, Job,
PersistentVolumeClaim, Pod, Role, RoleBinding, Secret, Service,
ServiceAccount
upstream chart: matrix-synapse 3.12.25 (https://ananace.gitlab.io/charts)
Synapse (the Matrix server implementation, NOT the retired OpenOva
product noun). Federation OFF by default (Catalyst per-Sovereign
tenancy default — operator overlays flip it on per-Organization).
Postgres backend via bp-cnpg externalPostgresql; OIDC SSO via
bp-keycloak; bundled bitnami postgresql + redis subcharts both
disabled. Catalyst overlay NetworkPolicy gates the federation port
(8448) on `federation.enabled` — verified by Case 5 of the
observability-toggle test. Catalyst-overlay ServiceMonitor (upstream
chart has none) + HPA both default OFF.
Lint
====
All three charts pass `helm lint` clean (only the noisy "icon is
recommended" INFO message).
Observability tests
===================
Each chart's `tests/observability-toggle.sh` enforces the Catalyst
contract from docs/BLUEPRINT-AUTHORING.md §11.2:
Case 1: default render produces zero monitoring.coreos.com/v1
resources (no ServiceMonitor / PrometheusRule).
Case 2: opt-in (--set serviceMonitor.enabled=true --api-versions
monitoring.coreos.com/v1) renders a ServiceMonitor.
Case 3: explicit-off render is clean.
Case 4 (per chart):
- openmeter: ClickHouse-less profile asserts no
clickhouse.altinity.com / Kafka subchart resources leak into the
default render.
- livekit: asserts upstream livekit-server.serviceMonitor.create
defaults false.
- matrix: asserts default render carries an empty
federation_domain_whitelist (the per-Sovereign tenancy default).
Case 5 (matrix only): `--set federation.enabled=true,networkPolicy.enabled=true`
opens port 8448 in the Catalyst NetworkPolicy.
All gates green for all three charts.
Closes #272, closes #273, closes #274
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
W2.5.E batch — three Application-tier Blueprints completing the LLM
serving / workflow stack:
- bp-temporal/1.0.0 — wraps temporal/temporal 1.2.0 (the new chart
rewrite that removed cassandra:/mysql:/postgresql:/elasticsearch:/
prometheus:/grafana: top-level keys in favour of
server.config.persistence.datastores). Postgres-only via CNPG-backed
visibility store (skip Cassandra). Web UI ON. Keycloak OIDC
integration via --auth-claim-mapper renders auth.yaml ConfigMap
(operator wires via additionalVolumes once bp-keycloak is
reconciled, default OFF). dependsOn: bp-cnpg + bp-cert-manager.
Closes #271.
Kinds: Cluster (CNPG) + ConfigMap + Deployment + Job + Pod +
Service.
- bp-llm-gateway/1.0.0 — wraps berriai/litellm-helm 0.1.572 from OCI.
Subscription-aware proxy for Claude Code: routes to Anthropic (via
operator OAuth/Max subscription — NEVER an ANTHROPIC_API_KEY,
per memory/feedback_no_api_key.md), Bedrock, Vertex,
OpenAI-compatible (via bp-anthropic-adapter), and self-hosted
vLLM. CNPG-backed audit log (every prompt + response persisted
for compliance). Bundled bitnami postgresql + redis subcharts
DISABLED (db.useExisting=true points at the CNPG cluster).
Keycloak SSO via auth.yaml ConfigMap (default OFF).
ExternalSecret-backed environmentSecrets brings tokens / IAM
creds in without inlining plaintext. dependsOn: bp-cnpg +
bp-keycloak + bp-external-secrets. Closes #267.
Kinds: Cluster (CNPG audit) + ConfigMap + Deployment + Job +
Pod + Secret + Service + ServiceAccount.
- bp-anthropic-adapter/1.0.0 — Catalyst-authored scratch chart for
the OpenAI ↔ Anthropic translation Go service. SHA-pinned image
ghcr.io/openova-io/openova/anthropic-adapter:<sha> (Inviolable
Principle #4a — GitHub Actions is the only build path; empty
default tag fails the render with a clear error instead of
silently shipping :latest). OAuth/Max subscription token mounted
from K8s Secret materialized by ESO from bp-openbao —
ANTHROPIC_OAUTH_TOKEN env var, NEVER an ANTHROPIC_API_KEY.
Includes OpenAI → Anthropic model-mapping ConfigMap (gpt-4 →
claude-3-5-sonnet, gpt-4o-mini → claude-3-5-haiku, etc.).
sigstore/common library subchart included to satisfy the
hollow-chart gate (matches bp-vllm pattern from #283).
dependsOn: bp-external-secrets. Closes #268.
Kinds: ConfigMap + Deployment + Service + ServiceAccount.
CRITICAL — bp-llm-gateway and bp-anthropic-adapter both consume the
operator's Claude OAuth/Max subscription. Per memory/
feedback_no_api_key.md and the user's standing instruction, neither
chart accepts or generates an ANTHROPIC_API_KEY. Tokens flow
exclusively through ExternalSecret-managed K8s Secrets that ESO
materializes from bp-openbao at install time.
Per docs/BLUEPRINT-AUTHORING.md §11.2 (issue #182): every
observability toggle defaults `false` (ServiceMonitor / metrics
sidecar / PodMonitor) and is operator-tunable via per-cluster
overlay once bp-kube-prometheus-stack reconciles. Each chart ships
tests/observability-toggle.sh covering default-off, opt-in (with
--api-versions monitoring.coreos.com/v1 to simulate the CRDs), and
explicit-off cases. bp-anthropic-adapter additionally tests the
never-:latest gate via Case 4 (empty image tag must fail render).
Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every
upstream version, namespace, server URL, role, secret name, model
default, and toggle is exposed under values.yaml. Cluster overlays
in clusters/<sovereign>/ may override without rebuilding the
Blueprint OCI artifact.
Per docs/BLUEPRINT-AUTHORING.md §11.1 (umbrella shape — hard
contract): bp-temporal and bp-llm-gateway declare their upstream
charts under Chart.yaml dependencies: so helm dependency build
bundles the upstream payload into the OCI artifact. bp-anthropic-
adapter is a scratch chart (no upstream Helm chart exists) and
includes sigstore/common as the obligatory hollow-chart-gate
dependency, matching the bp-vllm precedent from W2.5.D (#283).
Closes #267, closes #268, closes #271
helm lint: 1 chart(s) linted, 0 chart(s) failed for each chart (only the icon-recommended INFO)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
W2.5.G — Catalyst-authored scratch chart for LibreChat (slot 48 of the
omantel-1 bootstrap-kit). LibreChat upstream does not publish a Helm
chart, so this chart hand-wires the official ghcr.io/danny-avila/librechat
container as Deployment + Service + Ingress + ConfigMap + ServiceAccount
+ NetworkPolicy + ServiceMonitor + HPA, with the sigstore/common
library subchart declared to satisfy the hollow-chart gate (issue #181).
Per docs/BLUEPRINT-AUTHORING.md §11.2: every observability toggle
(serviceMonitor, hpa) defaults false; opt-in via per-cluster overlay
once kube-prometheus-stack reconciles. The ServiceMonitor template is
double-gated by .Values.serviceMonitor.enabled AND
Capabilities.APIVersions.Has "monitoring.coreos.com/v1" so flipping the
toggle on a too-early Sovereign cannot break the bp-librechat reconcile.
Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every endpoint
URL, model name, secret reference, namespace selector, and image tag is
operator-tunable via values.yaml. The Sovereign FQDN, Keycloak issuer,
llm-gateway URL, embeddings URL, and TLS ClusterIssuer are all
operator-supplied at install time. The image tag is pinned to v0.7.5
(no :latest).
Connectors:
- Chat completions: bp-llm-gateway (OpenAI-compatible /v1/chat/completions)
exposed as a "custom" endpoint named "Catalyst LLM"
- Embeddings (RAG): bp-bge — provider=bge maps to EMBEDDINGS_PROVIDER=openai
+ RAG_OPENAI_BASEURL=<bge.svc> at template-render time
- SSO: bp-keycloak (OpenID Connect) — issuer/clientId from values,
client secret + session secret from ExternalSecret
- Conversation store: FerretDB on bp-cnpg (MongoDB wire protocol over
Postgres) — operator-supplied connection URI
Hosted at chat-app.<sovereign-fqdn>; the chart fails the render (via
Helm's `fail`) if ingress.host is empty (no platform-wide default).
helm template (default values, --set ingress.host=...):
ConfigMap, Deployment, Ingress, NetworkPolicy, Service, ServiceAccount
helm template (--set hpa.enabled=true,serviceMonitor.enabled=true
--api-versions monitoring.coreos.com/v1):
ConfigMap, Deployment, HorizontalPodAutoscaler, Ingress, NetworkPolicy,
Service, ServiceAccount, ServiceMonitor
helm lint: 1 chart(s) linted, 0 chart(s) failed (single INFO on
missing icon — icons land with the marketplace card work).
tests/observability-toggle.sh: PASS on default-off, opt-in
(--api-versions monitoring.coreos.com/v1 to simulate the CRDs), and
explicit-off cases.
Path isolation: only platform/librechat/ — no HR slot files,
blueprint-release.yaml, or other charts touched. The HR slot files
(clusters/.../48-librechat.yaml) and blueprint-release.yaml will land
in a separate slot-wiring PR per the W2.K4 expansion plan.
Closes #275
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sovereigns don't have an `sme` namespace, so installing bp-catalyst-platform
1.1.5 on otech.omani.works failed with:
Helm install failed for release catalyst-system/catalyst-platform
with chart bp-catalyst-platform@1.1.5:
failed to create resource: namespaces "sme" not found
Same family of bug as #279 (Traefik Middleware): Group C cutover dragged
contabo-mkt-only SME product manifests into the Catalyst umbrella chart.
PR #280 deleted the SME *ingresses* but the deeper microservice mesh
remained.
Fix
---
Delete the entire `products/catalyst/chart/templates/sme-services/`
directory — 13 manifests, ~36 K8s resources. Every one of them is
hardcoded to `namespace: sme` and to `sme.openova.io` URLs. The SME
service mesh (auth/catalog/tenant/provisioning/billing/domain/
notification/gateway/console/admin/marketplace + configmap + SAs) is
the OpenOva.io contabo-mkt marketplace product, not part of the
Catalyst control plane that ships with every Sovereign.
If/when SME is redeployed it will live in a contabo-mkt-only
Kustomization or a separate `bp-sme` Blueprint — out of scope for the
bp-catalyst-platform umbrella, which must remain Sovereign-portable.
Verification
------------
- `grep -rn 'namespace: sme' products/catalyst/chart/templates/` → 0 hits
- `grep -rn 'sme' products/catalyst/chart/templates/` → 0 hits
- `helm template products/catalyst/chart` → exit 0, 260 kinds, 0 SME refs
Versions bumped 1.1.5 → 1.1.6 in:
- products/catalyst/chart/Chart.yaml (chart + appVersion)
- clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
- clusters/otech.omani.works/bootstrap-kit/13-bp-catalyst-platform.yaml
- clusters/omantel.omani.works/bootstrap-kit/13-bp-catalyst-platform.yaml
Closes #281
Related #279, #280 (same root-cause family — Group C cutover artifacts)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Catalyst-authored umbrella charts for the W2.5.D AI-inference stack.
None of the three upstream projects publish a Helm chart, so each
chart hand-wires the upstream container as Deployment + Service +
ConfigMap + ServiceMonitor + NetworkPolicy + HPA, with the
sigstore/common library subchart declared to satisfy the
hollow-chart gate (issue #181).
bp-vllm (slot 39) — wraps vllm/vllm-openai:v0.6.4. GPU-aware
(nvidia.com/gpu when vllm.gpu.enabled=true; CPU fallback for dev).
Default model meta-llama/Llama-3.1-8B-Instruct, port 8000,
OpenAI-compatible /v1/chat/completions. All engine knobs
(maxModelLen, gpuMemoryUtilization, dtype, quantization,
tensorParallelSize, prefix-caching) overlay-tunable. Closes #266.
bp-bge (slot 42) — wraps ghcr.io/huggingface/text-embeddings-inference:cpu-1.5.
Default model BAAI/bge-small-en-v1.5 + BAAI/bge-reranker-base
sidecar in same Pod. Two-port Service (8080 embed, 8081 rerank)
annotated for bp-llm-gateway discovery. CPU-friendly defaults;
overlay swaps in BAAI/bge-m3 on GPU Sovereigns. Closes #269.
bp-nemo-guardrails (slot 43) — wraps the upstream NVIDIA/NeMo-Guardrails
Dockerfile (nemoguardrails server, FastAPI, port 8000). LLM endpoint
+ model + engine all overlay-tunable; Colang flow bundle mounts via
configMap.externalName for production rails. ConfigMap stub renders
a default rail for smoke testing. Closes #270.
All three charts:
- Default observability toggles to false per BLUEPRINT-AUTHORING.md §11.2
- Pin upstream image tags (no :latest) per INVIOLABLE-PRINCIPLES.md #4
- Non-root securityContext (runAsUser 1000, drop ALL capabilities)
- prometheus.io scrape annotations on the Pod for fallback discovery
- Operator-tunable NetworkPolicy gating ingress to bp-llm-gateway and
egress to HuggingFace / bp-vllm / bp-bge as appropriate
helm template (default values) per chart:
bp-vllm: ConfigMap, Deployment, Service, ServiceAccount
bp-bge: ConfigMap, Deployment, Service, ServiceAccount
bp-nemo-guardrails: ConfigMap, Deployment, Service, ServiceAccount
helm template (--set serviceMonitor.enabled=true,networkPolicy.enabled=true,hpa.enabled=true):
All three render ConfigMap + Deployment + Service + ServiceAccount +
ServiceMonitor + NetworkPolicy + HorizontalPodAutoscaler.
helm lint: 0 chart(s) failed for all three (single INFO on missing icon —
icons land with the marketplace card work).
Closes #266, closes #269, closes #270
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the pill-card swimlane layout shipped in PR #245 — which the
operator rejected as "intentional divergence from the canonical
mockup" (issue #251) — with a circular-node, multi-region, bezier-edge
canvas that matches `marketing/mockups/provision-mockup-v4.png`.
What changed
------------
New geometry library (`src/lib/flowLayoutV4.ts`):
• Multi-region partition — caller supplies FlowRegion[]; every
region renders as a horizontal band stacked top → bottom.
• Per-region longest-path stage assignment (Kahn) keyed on Job
dependsOn + caller-supplied `extraDepIds` (component-graph edges
surfaced from ApplicationDescriptor.dependencies so dense families
like SPINE / GUARDIAN read as columns even when the test catalog
has no per-job dependsOn yet).
• Sub-column splitting at MAX_PER_COLUMN (8): a stage column stacks
up to 8 nodes vertically, then wraps overflow into a further
sub-column so dense families stay readable.
• Caller-injected family palette (PRODUCTS taxonomy from
componentGroups.ts) so the flow + the wizard StepComponents
page colour-code identically.
• Bezier router — straight for span=0 within-region, cubic-bezier
for span≥1 and all cross-region edges (warm-amber dashed).
New presentation layer:
• `FlowCanvasV4.tsx` — circular nodes with status-tinted progress
arcs, family-colour rings, single-letter glyphs (✓ / ✕ / family
initial), per-status arrow markers, region band frames, and
stage column dividers + labels.
• `FlowDeploymentTree.tsx` + `flowDeploymentTreeData.ts` — left
"DEPLOYMENT PROGRESS" panel; static tree (NO accordion per the
operator's standing rule), groups by region → family → job.
• `FlowLogFeed.tsx` — right "LIVE LOG" panel; replays the focused
job's recent reducer events, status-coloured, blinking cursor
when live.
`FlowPage.tsx`:
• Replaces the JobBubble pipelineLayout pipeline with the
flowLayoutV4 + FlowCanvasV4 + tree + log triplet.
• Wires region descriptors from `useWizardStore().regions` (with a
fallback single-region for empty stores).
• Derives stage hints — Phase 0 = stage 1, cluster-bootstrap =
stage 2, components = 3 + componentGraphDepth.
• Picks an initial focused job (running > failed > first) so the
log feed always shows something on first paint.
• Inlines the surface CSS so canvas + tree + log stay in lockstep.
Preserved testid contract
-------------------------
data-testid="flow-canvas-svg" — root <svg>
data-testid="flow-job-<jobId>" — every node group
data-testid="flow-batch-<regionId>" — every region band
data-testid="flow-canvas-empty" — empty placeholder
So the existing cosmetic-guards Test #6/#7/#8 continue to pass without
edit (Jobs↔Batches mode toggle + single-click → FloatingLogPane behaviour
is unchanged).
New testids for the upgraded V4 surface:
data-testid="flow-node-circle-<jobId>" — actual <circle>
data-testid="flow-region-<regionId>" — region band frame
data-testid="flow-stage-<n>" — stage column divider
data-testid="flow-edge-<from>-<to>" — directional edges
data-testid="flow-deployment-tree" — left tree
data-testid="flow-log-feed" — right log panel
Tests
-----
21 new unit tests in `src/lib/flowLayoutV4.test.ts` lock the layout
contract (multi-region partitioning, cross-region edge classification,
family palette mapping, bezier routing). All 500 vitest tests + tsc
typecheck green.
Per docs/INVIOLABLE-PRINCIPLES.md
---------------------------------
#1 (waterfall) — full target shape ships in one PR: circular nodes,
multi-region bands, bezier edges, log feed,
deployment tree, family palette, stage hints.
#2 (no compromise) — no graph library, no canvas; pure SVG so testids
work and the operator can right-click → inspect.
#4 (never hardcode) — every dimension is in FlowGeometryV4 knobs;
every colour comes from the FlowFamily palette
(caller-injected, sourced from PRODUCTS).
Closes #251.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three template files emitted traefik.io/v1alpha1 Middleware resources
(strip-sovereign in catalyst, strip-nova + root-to-nova in sme) that
are leftover artifacts from the contabo-mkt cluster's k3s-bundled
Traefik. Sovereigns use Cilium native gateway per ARCHITECTURE.md §11
and the Traefik CRDs are never installed there.
Decision: DELETE rather than migrate to Gateway API HTTPRoute. The
strip-sovereign middleware was contabo-mkt-specific (catalyst console
was previously hosted under /sovereign path; on Sovereigns the console
serves at the root of console.<FQDN>). The Nova middlewares are for
the SME product not deployed on Sovereigns under the franchise model.
Bumped 1.1.4 -> 1.1.5.
Closes #279
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause: Helm's `helm push` collapses the chart `description` field
into a single-line OCI manifest annotation
`org.opencontainers.image.description`. The GHCR manifest-PUT validator
returns a deterministic 500 Internal Server Error when that annotation
is long AND contains an ASCII apostrophe. bp-langfuse 1.0.0 was the
only chart in the observability batch (PR #214) carrying both
characteristics, so it was the only one that failed to publish.
Fix: reword the affected sentence from "Langfuse's persistent state" to
"the Langfuse persistent state" — drops the apostrophe, preserves the
meaning, and crucially preserves every byte of the actual chart payload
(values, templates, all 350 entries of the upstream langfuse-1.5.28
subchart with its 4-level-deep Bitnami vendoring). No runtime
behavioural change; helm template renders the exact same 6 resources
across 490 lines.
The narrowing was done by progressively reducing the Chart.yaml from
the failing version to a passing version while pushing to a scratch
GHCR namespace, with the bp-langfuse repo deleted between attempts
(verified via `DELETE /orgs/openova-io/packages/container/bp-langfuse`
and re-querying). The trigger is reproducible: long description +
apostrophe → 500; long description without apostrophe → push succeeds;
short description with apostrophe → push succeeds.
Added a multi-line WARNING comment immediately above `description:`
documenting the trigger so future authors do not reintroduce a
possessive form. Issue #215 captures the full reproduction.
Closes #215
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Per docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.6 + §2.7, W2.K4 is the
14-slot batch (35-48) covering Tier 8 (edge) + Tier 9 (apps + AI
runtime). Pre-flight chart-existence check found that only `bp-coraza`
(slot 35) currently has an authored chart — the remaining 13 platform
directories (stunner/knative/kserve/vllm/llm-gateway/anthropic-adapter/
bge/nemo-guardrails/temporal/openmeter/livekit/matrix/librechat) contain
README scaffolding only, no Chart.yaml or blueprint.yaml.
Per the W2 dispatch rule (skip slots whose chart isn't ready, file an
issue, ship what is ready), this PR ships slot 35 only and tracks the
13 missing charts as separate issues. Each missing-chart issue links
back to this PR and to the BOOTSTRAP-KIT-EXPANSION-PLAN.md slot row so
follow-up work has a clean DAG anchor.
Slot 35 — bp-coraza
- chart: platform/coraza/chart/ (1.0.0, scratch chart wiring
ghcr.io/corazawaf/coraza-spoa:0.7.0 as Deployment + Service)
- dependsOn: bp-cilium (01) [L7 enforcement substrate],
bp-cert-manager (02) [TLS issuers for SPOA listeners]
- HR knobs: install/upgrade.disableWait: true (event-driven readiness
via Flux dependsOn graph; per session-2026-04-30 architectural
decision, never use blanket `spec.timeout: Nm` watchdogs).
- Replicated to all 3 cluster trees: _template, otech.omani.works,
omantel.omani.works.
Validation
- python3 yaml.safe_load_all on all 6 touched files: OK
- kubectl kustomize on all 3 bootstrap-kit dirs: OK
(Namespace + HelmRepository + HelmRelease bp-coraza render cleanly)
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the 7 Tier-6 observability HelmReleases per
docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.4 (W2.K2 batch). Files added in
all three cluster directories (_template, omantel.omani.works,
otech.omani.works) and listed in each cluster's kustomization.yaml.
Slots:
| Slot | Blueprint | dependsOn |
|-----:|----------------------|----------------------------------------------------|
| 20 | bp-opentelemetry | bp-cert-manager |
| 21 | bp-alloy | bp-opentelemetry |
| 22 | bp-loki | bp-seaweedfs |
| 23 | bp-mimir | bp-seaweedfs |
| 24 | bp-tempo | bp-seaweedfs |
| 25 | bp-grafana | bp-cnpg, bp-loki, bp-mimir, bp-tempo, bp-keycloak |
| 26 | bp-langfuse | bp-cnpg, bp-keycloak, bp-cert-manager |
Pattern follows existing slot files (e.g. 11-powerdns, 13-bp-catalyst-
platform): Namespace + HelmRepository (oci://ghcr.io/openova-io,
ghcr-pull secret) + HelmRelease with disableWait: true on install and
upgrade per the locked decision in MEMORY/session-2026-04-30-handover.md
(disableWait avoids deadlock when downstream backends or CRDs are not
yet reconciled; runtime convergence is observed via kubectl, not gated
on Helm).
Validated with W2.K0's scripts/check-bootstrap-deps.sh — 0 drift, 0
cycles, all 21 declared slots match scripts/expected-bootstrap-deps.yaml.
Forward-prep notice for slot 26 (bp-langfuse): bp-langfuse:1.0.0 has
not yet published to ghcr.io/openova-io due to issue #215 (Helm v3.16 +
GHCR manifest 500 with nested OCI subchart deps). W1.G is the concurrent
track fixing the publish path. Until that lands, this HelmRelease will
fail to install with a chart-pull error; this is expected and the HR
file is committed now so Flux reconciles automatically once the OCI
artifact is published.
Refs: docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.4, §3.1, §4
Depends on (deferred — flagged in PR body): #215 (langfuse publish)
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
W2.K1 of the bootstrap-kit expansion plan (docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md).
Adds the Tier 5 storage+DB foundation as 5 contiguous HRs, mirrored across
the 3 cluster manifest trees (_template, otech.omani.works, omantel.omani.works).
| Slot | File | Blueprint | Tier | dependsOn (Flux) |
|-----:|----------------------------|---------------------|------|------------------|
| 15 | 15-external-secrets.yaml | bp-external-secrets | 0/3 | bp-openbao(08), bp-cert-manager(02) |
| 16 | 16-cnpg.yaml | bp-cnpg | 5 | bp-flux(03) |
| 17 | 17-valkey.yaml | bp-valkey | 5 | bp-flux(03) |
| 18 | 18-seaweedfs.yaml | bp-seaweedfs | 5 | bp-flux(03), bp-cert-manager(02) |
| 19 | 19-harbor.yaml | bp-harbor | 5 | bp-cnpg(16), bp-seaweedfs(18), bp-cert-manager(02) |
Per docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2.3 the dependsOn graph for
Tier 5 is finite-depth: ESO routes through bp-openbao (slot 08, Tier 1)
so Flux gates ESO install on OpenBao Ready=True regardless of slot order;
bp-cnpg and bp-valkey only need Flux Ready (their own CRDs ship in-chart);
bp-seaweedfs requests TLS from cert-manager; bp-harbor closes the cohort
by depending on cnpg + seaweedfs + cert-manager.
All 5 HRs use spec.install.disableWait=true + spec.upgrade.disableWait=true
per docs/INVIOLABLE-PRINCIPLES.md #3 (event-driven; Flux dependsOn is the
gate, not Helm timeout). Replaces the pre-PR-250 blanket spec.timeout: 15m
band-aid pattern.
Namespaces:
bp-external-secrets → external-secrets-system
bp-cnpg → cnpg-system
bp-valkey → valkey
bp-seaweedfs → seaweedfs
bp-harbor → harbor
Resolves issue #254 — bp-powerdns pod stuck in CreateContainerConfigError
because pdns-pg-app Secret is generated by a CNPG Cluster CR; without the
operator the secret never materializes. Wiring bp-cnpg into the kit is
the structural fix; PR #248's disableWait keeps the HR Ready=True while
the pod itself recovers once the Cluster CR materializes the Secret.
Validation:
kubectl kustomize clusters/_template/bootstrap-kit/ → 54 objects, 19 HRs
kubectl kustomize clusters/otech.omani.works/bootstrap-kit/ → 54 objects, 19 HRs
kubectl kustomize clusters/omantel.omani.works/bootstrap-kit/ → 54 objects, 19 HRs
Path isolation: this commit touches only slots 15-19 + the 3 kustomization.yaml
files (numeric-append). Charts under platform/<name>/ are NOT touched —
chart authoring is owned by separate parallel agents per the W2 dispatch.
The HelmRelease 1.0.0 version is the first-release convention (cf. slot 14
bp-crossplane-claims:1.0.0 in PR #247); the OCI artifact lands once the
chart is authored and the blueprint-release workflow publishes it.
Closes #254
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>