Commit Graph

859 Commits

Author SHA1 Message Date
e3mrah
956b976558
fix(ci): playwright-smoke port 4321→5173 for Vite 8 default (#335) (#418)
The catalyst-ui dev-server bind moved from 4321 to 5173 when the Vite default
changed (Vite 8). The smoke workflow's curl-wait + BASE_URL env still
pointed at 4321, so:

  Vite 8 starts fine on 5173 →
    workflow polls 4321 for 60s → never returns 200 →
      step exits 1 before Playwright ever runs.

Effect across last ~30 main commits: every push generated a 'Playwright UI
smoke failed' email despite the UI itself being healthy. We've been
shipping with --admin bypass + post-deploy verification against
console.openova.io. This restores actual smoke coverage on every PR.

Three substitutions on .github/workflows/playwright-smoke.yaml:
  - line 80 curl wait URL: localhost:4321 → localhost:5173
  - line 93 BASE_URL env: 4321 → 5173
  - lines 72-73 comment: stale 'Vite binds 4321 by default' → 5173
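
A minimal sketch of the failure mode described above, not the workflow's
actual shell step. `wait_for_ok` and `make_probe` are hypothetical names; the
probe stands in for the workflow's curl call so the loop can be exercised
without a running dev server.

```python
import time
from typing import Callable


def wait_for_ok(probe: Callable[[], int], timeout_s: float = 60.0,
                interval_s: float = 1.0,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll probe() until it returns HTTP 200 or the time budget is spent."""
    attempts = int(timeout_s // interval_s)
    for _ in range(attempts):
        try:
            if probe() == 200:
                return True
        except OSError:
            pass  # connection refused: nothing is listening on that port yet
        sleep(interval_s)
    return False


def make_probe(listening_port: int, polled_port: int) -> Callable[[], int]:
    """Hypothetical stand-in for `curl localhost:<polled_port>`."""
    def probe() -> int:
        if polled_port != listening_port:
            raise ConnectionRefusedError(f"nothing on port {polled_port}")
        return 200
    return probe
```

Polling 4321 while the server listens on 5173 exhausts the 60s budget and
returns False, which is the point at which the step exited 1.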

Closes #335.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:04:11 +04:00
e3mrah
b3383557eb
feat(bp-gitea): chart-verified on contabo (#376) (#417)
bp-gitea:1.1.2 already published; smoke-installed in `gitea-smoke` ns on
contabo, both pods Ready in ~2m38s, /api/v1/version returns 1.22.3 (HTTP
200), admin auth verified. Smoke torn down clean.

In-scope hygiene fix to clusters/otech.omani.works/bootstrap-kit/10-gitea.yaml
— replaces stale upstream `ingress.hosts[]` overlay with the
post-#387/#402 `gateway.host` shape so otech matches the _template/ and
omantel.omani.works/ overlays. helm-template default-values renders 15
manifests clean (HTTPRoute correctly skip-renders without `gateway.host`).

WBS §2 row 13 + §9 row #376 updated to chart-verified.

Closes #376.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:55:19 +04:00
e3mrah
2913c4f27a
feat(bp-grafana): chart-verified — smoke OK on contabo + per-Sovereign overlay drift fix (closes #381) (#416)
bp-grafana 1.0.0 was published by blueprint-release run 25214143810 on
commit a1bd5502 (alongside the #387 Gateway API HTTPRoute templates).
This commit verifies the chart on contabo and brings the per-Sovereign
overlays in line with the _template (and with the bp-keycloak pattern
shipped in #377).

Verification:
  - helm template defaults → 13 kinds (HTTPRoute skip-renders when
    gateway.host is empty, per the #387/#402 if-host-emit pattern)
  - helm template with gateway.host=grafana.test.example.com → 14 kinds
    (incl. HTTPRoute)
  - smoke install in grafana-smoke ns: 1/1 Ready in 65s; in-cluster GET
    http://smoke-grafana/login → HTTP 200; /api/health → 200; image
    docker.io/grafana/grafana:12.3.1 confirmed; smoke torn down clean.

Per-Sovereign overlay drift fix:
  - clusters/omantel.omani.works/bootstrap-kit/25-grafana.yaml — add
    values.gateway.host = grafana.omantel.omani.works (was missing).
  - clusters/otech.omani.works/bootstrap-kit/25-grafana.yaml — add
    values.gateway.host = grafana.otech.omani.works (was missing).

Both now match the _template and the bp-keycloak otech overlay shape.

Scope clarification: the original ticket said "Bundle: Alloy + Loki +
Mimir + Tempo + Grafana dashboards" but the actual chart split has
Alloy/Loki/Mimir/Tempo as sibling Blueprints at slots 21-24, with
bp-grafana as the visualizer-only at slot 25. WBS §2 row updated to
reflect this. Each LGTM sibling has its own ticket.

Closes #381

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:55:07 +04:00
e3mrah
1e17668055
feat(catalyst): Hetzner Object Storage credential pattern — Phase 0b (#371) (#409)
* feat(catalyst): Hetzner Object Storage credential pattern (Phase 0b, #371)

Adds the per-Sovereign Hetzner Object Storage credential capture + bucket
provisioning Phase 0b path described in the omantel handover WBS §5.
Hybrid Option A+B: wizard collects operator-issued S3 credentials (Hetzner
exposes no Cloud API to mint them — they're issued once in the Hetzner
Console and the secret half is shown exactly once), and OpenTofu
auto-provisions the per-Sovereign bucket via the aminueza/minio provider
+ writes a flux-system/hetzner-object-storage Secret into the new
Sovereign at cloud-init time so Harbor (#383) and Velero (#384) find
their backing-store credentials already in the cluster from Phase 1
onwards.

Extends the EXISTING canonical seam at every layer (per the founder's
anti-duplication rule for #371's session): the existing Tofu module at
infra/hetzner/, the existing handler/credentials.go validator, the
existing provisioner.Request struct, the existing store.Redact path,
and the existing wizard StepCredentials. No parallel binaries / scripts
/ operators introduced.

infra/hetzner/ (Tofu module — Phase 0):
  - versions.tf: declare aminueza/minio provider (Hetzner's official
    recommendation for S3-compatible bucket creation per
    docs.hetzner.com/storage/object-storage/getting-started/...)
  - variables.tf: 4 sensitive vars — region (validated against
    fsn1/nbg1/hel1, the European-only OS regions as of 2026-04),
    access_key, secret_key, bucket_name (RFC-compliant S3 naming)
  - main.tf: minio_s3_bucket.main resource — idempotent on re-apply,
    no force_destroy (Velero archive must survive a control-plane
    reinstall), object_locking=false (content-addressed digests are
    the immutability guarantee for Harbor; Velero uses S3 versioning)
  - cloudinit-control-plane.tftpl: write
    flux-system/hetzner-object-storage Secret with the canonical
    s3-endpoint/s3-region/s3-bucket/s3-access-key/s3-secret-key keys
    Harbor + Velero charts consume via existingSecret refs
  - outputs.tf: surface endpoint/region/bucket back to catalyst-api
    for the deployment record (credentials NEVER returned)

products/catalyst/bootstrap/api/ (Go):
  - internal/hetzner/objectstorage.go: NEW — minio-go/v7-based
    ListBuckets validator. Distinguishes auth failure ("rejected") from
    network failure ("unreachable") so the wizard renders the right
    error card. NOT a parallel cloud-resource path — the existing
    purge.go handles hcloud purge; objectstorage.go handles a separate
    API surface (S3-compatible) that has no equivalent client today.
  - internal/handler/credentials.go: extend with
    ValidateObjectStorageCredentials handler — same wire shape
    (200 valid:true / 200 valid:false / 503 unreachable / 400 bad
    input) as the existing token validator so the wizard's failure-
    card machinery handles both without per-endpoint switches.
  - cmd/api/main.go: wire POST
    /api/v1/credentials/object-storage/validate
  - internal/provisioner/provisioner.go: extend Request with
    ObjectStorageRegion/AccessKey/SecretKey/Bucket; Validate()
    rejects empty/malformed values fail-fast at /api/v1/deployments
    POST time; writeTfvars() emits the 4 new tfvars.
  - internal/handler/deployments.go: derive bucket name from FQDN slug
    pre-Validate (catalyst-<fqdn-with-dots-replaced-by-dashes>) so
    Hetzner's globally-namespaced bucket pool gets a deterministic,
    collision-resistant per-Sovereign name without operator input.
  - internal/store/store.go: redact access/secret keys; preserve
    region+bucket plain (they're public in tofu outputs anyway).
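
A rough illustration of the bucket-name derivation and the fail-fast checks
listed above. The real implementation is Go (deployments.go and
provisioner.go); this Python sketch only mirrors the stated rule
catalyst-<fqdn-with-dots-replaced-by-dashes>. Function names are
hypothetical, and the lowercasing/trimming and the exact naming regex are
assumptions based on general S3 bucket naming rules.

```python
import re
from typing import List

# Regions named in the commit message.
ALLOWED_REGIONS = {"fsn1", "nbg1", "hel1"}

# Assumed S3-style rule: 3-63 chars, lowercase letters, digits, dots and
# hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def derive_bucket_name(fqdn: str) -> str:
    """Deterministic per-Sovereign bucket name derived from the FQDN."""
    return "catalyst-" + fqdn.strip().lower().replace(".", "-")


def validate_object_storage(region: str, bucket: str) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if region not in ALLOWED_REGIONS:
        errors.append(f"unsupported region: {region!r}")
    if not _BUCKET_RE.match(bucket):
        errors.append(f"invalid bucket name: {bucket!r}")
    return errors
```

Deriving the name server-side keeps it deterministic per Sovereign, so the
operator never has to type a bucket name.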

products/catalyst/bootstrap/ui/ (TypeScript / React):
  - entities/deployment/model.ts + store.ts: 4 new wizard fields
    (objectStorageRegion/AccessKey/SecretKey/Validated) with merge()
    coercion for legacy persisted state.
  - pages/wizard/steps/StepCredentials.tsx: ObjectStorageSection —
    region picker (fsn1/nbg1/hel1), masked secret-key input,
    Validate button gating Next. Same FailureCard taxonomy
    (rejected/too-short/unreachable/network/parse/http) the existing
    TokenSection uses, so the operator UX is consistent. Section
    only renders when Hetzner is among chosen providers — non-Hetzner
    Sovereigns skip Phase 0b until their own backing-store path lands.
  - pages/wizard/steps/StepReview.tsx: include
    objectStorageRegion/AccessKey/SecretKey in the
    POST /v1/deployments payload (bucket derived server-side).

Tests:
  - api: 7 new provisioner Validate tests (region/keys/bucket
    required + RFC-compliant + valid-region acceptance), 5 handler
    tests for the new endpoint (bad JSON / missing region / invalid
    region / short keys), 4 hetzner/objectstorage_test.go tests
    (endpoint composition + early input rejection), 1 handler test
    for the bucket-name derivation. Existing tests updated to supply
    the new required fields.
  - ui: StepCredentials.test.tsx pre-populates objectStorageValidated
    in beforeEach so the existing 11 SSH-section tests aren't gated
    on Object Storage validation.

DoD: a fresh Sovereign provision results in a usable S3 endpoint URL +
access/secret keys available as a K8s Secret in the Sovereign's home
cluster (flux-system/hetzner-object-storage), ready for consumption by
Harbor + Velero charts via existingSecret references.

Closes #371.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(wbs): #371 done — Hetzner Object Storage Phase 0b shipped (#409)

Marks #371 done with the architectural rationale (hybrid Option A + B —
Hetzner exposes no Cloud API to mint S3 keys, so the wizard MUST capture
them; OpenTofu auto-provisions the bucket + cloud-init writes the
flux-system/hetzner-object-storage Secret with the canonical s3-* keys
Harbor + Velero consume).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:54:22 +04:00
e3mrah
1cbd759e0f
docs(wbs): tick 7 — §2 prose updated (#316 + #375 chart-released); #379 RESTART after watchdog kill (#415)
Bursty completion: #316 + #375 prose rows now reflect chart-released state
(was stale from earlier 'not deployed').

#379 first agent watchdog-killed (no work survived) — restarted with
tighter STAY-TIGHT brief modeled on the successful #378/#377/#375 patterns
(5-15 min wall time, smoke + close as duplicate if chart already published).

In flight (5): #371 #376 #379-RESTART #380 #381

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:53:00 +04:00
e3mrah
8695ab82c5
docs(wbs): tick #316 chart-released — bp-openbao 1.2.0 (auto-unseal) (#414)
PR #408 merged at d2ada908. Blueprint-release run 25214747925 SUCCESS,
bp-openbao:1.2.0 published to GHCR with cosign signature + SBOM
attestation. Cluster overlay clusters/_template/bootstrap-kit/08-openbao.yaml
already wired with autoUnseal.enabled=true in the same PR.

Sovereign-impact deferred to Phase 8 — next omantel provision run.

Co-authored-by: hatiyildiz <hat.yil@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:50:18 +04:00
e3mrah
38e6a2a528
docs(wbs): tick 6 — 9 done; #380 dispatched to maintain 5 parallel (#413)
Done (9): #316 #338 #370 #373 #375 #377 #378 #387 #392
In flight (5): #371 #376 #379 #380 #381

Bursty completion window — #316 #373 #375 #377 #378 all landed within ~10 min.
Sovereign-impact for chart-released/chart-verified items deferred to Phase 8.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:48:04 +04:00
e3mrah
6e0f734d62
fix(bootstrap-kit): renumber bp-cert-manager-powerdns-webhook 36→49 + register in expected DAG (#373 followup) (#412)
PR #410 landed slot 36 for bp-cert-manager-powerdns-webhook, but slot 36
was already reserved in scripts/expected-bootstrap-deps.yaml for
bp-stunner (W2.K4 forward-declaration). The bootstrap-kit dependency
audit failed on the merge SHA 04308af7 with:

  ERROR: HR 'bp-cert-manager-powerdns-webhook' (file
  clusters/_template/bootstrap-kit/36-bp-cert-manager-powerdns-webhook.yaml)
  is present on disk but NOT declared in
  scripts/expected-bootstrap-deps.yaml.

Two fixes here:

  1. Move the file to slot 49 (first free slot after W2.K4's 35-48
     forward declarations). File renamed; kustomization.yaml updated;
     in-file comment block updated to explain the slot choice.

  2. Register slot 49 in scripts/expected-bootstrap-deps.yaml as
     `wave: present` with `depends_on: [bp-cert-manager, bp-powerdns]` —
     matches the HelmRelease's actual dependsOn block.

Local audit:
  $ bash scripts/check-bootstrap-deps.sh
  Present on disk:       36
  Declared expected:     49
  Deferred (W2.K1-K4):   13
  Drift:                 0
  Cycles:                0
  OK: bootstrap-kit dependency graph audit PASSED
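
For readers unfamiliar with the audit, the check it performs can be sketched
roughly as follows. This is an illustration in Python, not the contents of
scripts/check-bootstrap-deps.sh; the `audit` function and the dict layout
are assumptions, with only the `wave` and `depends_on` keys taken from the
message above.

```python
from typing import Dict, List, Set


def audit(present: Set[str], expected: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare HelmReleases on disk with the declared dependency graph.

    expected maps a HelmRelease name to
    {"wave": "present" | "deferred", "depends_on": [...]}.
    """
    declared_present = {n for n, e in expected.items() if e.get("wave") == "present"}
    undeclared = sorted(present - set(expected))   # on disk, not declared
    missing = sorted(declared_present - present)   # declared present, not on disk

    cycles: List[str] = []
    state: Dict[str, int] = {}  # 1 = on the current path, 2 = fully explored

    def visit(node: str) -> None:
        state[node] = 1
        for dep in expected.get(node, {}).get("depends_on", []):
            if state.get(dep) == 1:
                cycles.append(f"{node} -> {dep}")  # back edge closes a cycle
            elif dep not in state:
                visit(dep)
        state[node] = 2

    for name in sorted(expected):
        if name not in state:
            visit(name)
    return {"undeclared": undeclared, "missing": missing, "cycles": cycles}
```

The original failure corresponds to a non-empty `undeclared` list: the file
existed on disk but had no entry in the expected graph.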

This is a CI-only follow-up; chart and runtime semantics from #410 are
unchanged. Sovereign-impact deferred to Phase 8 per chart-only DoD.

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:46:49 +04:00
e3mrah
d2ada908c9
feat(bp-openbao): auto-unseal flow — cloud-init seed + post-install init Job (closes #316) (#408)
Catalyst-curated auto-unseal pipeline for OpenBao on Hetzner Sovereigns
(no managed-KMS available). Selected **Option A — Shamir + cloud-init
seed** because:

  - Hetzner has no managed-KMS service → Cloud-KMS auto-unseal (Option C)
    is structurally unavailable.
  - Transit-seal (Option B) requires a peer OpenBao cluster, only
    applicable to multi-region tier-1; out of scope for single-region
    omantel.
  - Manual unseal (Option D) violates the "first sovereign-admin lands
    on console.<sovereign-fqdn> ready to use" goal in
    SOVEREIGN-PROVISIONING.md §5.

Architecture (per issue #316 spec + acceptance criteria 1-6):

  1. Cloud-init on the control-plane node generates a 32-byte recovery
     seed from /dev/urandom and writes it to a single-use K8s Secret
     `openbao-recovery-seed` in the openbao namespace, with annotation
     `openbao.openova.io/single-use: "true"`. Pre-creates the openbao
     namespace to eliminate the race with Flux's HelmRelease apply.
  2. bp-openbao chart v1.2.0 ships two new Helm post-install hooks:
       - `templates/init-job.yaml` (hook weight 5): consumes the seed,
         calls `bao operator init -recovery-shares=1 -recovery-threshold=1`,
         persists the recovery key inside OpenBao's auto-unseal config,
         deletes the seed Secret on success. Idempotent — re-runs detect
         Initialized=true and exit 0.
       - `templates/auth-bootstrap-job.yaml` (hook weight 10): enables
         the Kubernetes auth method, mounts kv-v2 at `secret/`, writes
         the `external-secrets-read` policy, binds the `external-secrets`
         role to the ESO ServiceAccount in `external-secrets-system`.
  3. `templates/auto-unseal-rbac.yaml` declares the least-privilege SA
     + Role + RoleBinding the Jobs need (Secret get/list/delete in the
     openbao namespace; create/get/patch on the openbao-init-marker).
     Also emits the permanent `system:auth-delegator` ClusterRoleBinding
     bound to the OpenBao ServiceAccount so the Kubernetes auth method
     can call tokenreviews.authentication.k8s.io.
  4. Cluster overlay `clusters/_template/bootstrap-kit/08-openbao.yaml`
     bumps version 1.1.1 → 1.2.0 and flips `autoUnseal.enabled: true`
     per-Sovereign.
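
The idempotent init flow in step 2 can be sketched as below. This is a
hedged outline of the control flow only: the real Job is a Helm hook running
`bao` commands, and every callable here (`is_initialized`, `read_seed`,
`operator_init`, `delete_seed`) is a hypothetical stand-in injected so the
logic can be exercised without a cluster.

```python
from typing import Callable, Optional


def run_init(is_initialized: Callable[[], bool],
             read_seed: Callable[[], Optional[bytes]],
             operator_init: Callable[[bytes], None],
             delete_seed: Callable[[], None]) -> str:
    """Initialise once; later runs detect the initialised state and do nothing."""
    if is_initialized():
        return "already-initialized"  # re-run: succeed without side effects
    seed = read_seed()
    if seed is None:
        raise RuntimeError("seed Secret missing and OpenBao not initialised")
    operator_init(seed)
    delete_seed()  # only reached when init did not raise
    return "initialized"
```

Checking the initialised state first is what makes a re-run safe after the
seed Secret has already been deleted.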

Per #402 lesson: skip-render pattern (`{{- if .Values.X }}{{ emit }}
{{- end }}`) used throughout — never `{{ fail }}`. Default `helm
template` render emits NOTHING new; opt-in via autoUnseal.enabled=true.

Acceptance criteria coverage:
  1. Provision fresh Sovereign — cloud-init writes seed, Flux installs
     bp-openbao 1.2.0, post-install Jobs run automatically. 
  2. bp-openbao HR Ready=True without manual intervention — install
     keeps `disableWait: true` (Helm Ready ≠ OpenBao initialised; the
     init Job drives initialisation out-of-band on the same install). 
  3. `bao status` shows Sealed=false, Initialized=true within 5 minutes
     — init Job polls + retries up to 60×5s. 
  4. ESO ClusterSecretStore vault-region1 reaches Status: Valid — the
     auth-bootstrap Job binds the `external-secrets` role to ESO's SA
     before the Job exits. 
  5. Seed Secret deleted post-init — init Job deletes it via K8s API
     after consuming. 
  6. No openbao-root-token Secret in K8s — root token captured to
     /tmp/.root-token in the Job pod's tmpfs only; never written to a
     K8s Secret. The recovery key persists ONLY inside OpenBao's Raft
     state (auto-unseal config). 

Tests:
  - tests/auto-unseal-toggle.sh — 4 cases:
    * default render → no auto-unseal artefacts (skip-render works)
    * autoUnseal.enabled=true → both Jobs + correct hook weights
    * kubernetesAuth.enabled=false → init Job only, no auth-bootstrap
    * idempotency annotations present on all 5 hook objects
  - tests/observability-toggle.sh — unchanged, all 3 cases green.
  - helm lint . — clean.

Files:
  - platform/openbao/chart/Chart.yaml — version 1.1.1 → 1.2.0
  - platform/openbao/blueprint.yaml — version 1.1.1 → 1.2.0
  - platform/openbao/chart/values.yaml — `autoUnseal.*` block
  - platform/openbao/chart/templates/auto-unseal-rbac.yaml — new
  - platform/openbao/chart/templates/init-job.yaml — new
  - platform/openbao/chart/templates/auth-bootstrap-job.yaml — new
  - platform/openbao/chart/tests/auto-unseal-toggle.sh — new
  - platform/openbao/README.md — bootstrap procedure §2-3 expanded;
    auto-unseal alternatives table added.
  - clusters/_template/bootstrap-kit/08-openbao.yaml — chart 1.1.1 →
    1.2.0, autoUnseal.enabled=true.
  - infra/hetzner/cloudinit-control-plane.tftpl — seed-token block
    inserted between ghcr-pull-secret apply and flux-bootstrap apply.
  - docs/omantel-handover-wbs.md §9 — #316 ticked chart-released.

Canonical seam used: extended existing `platform/openbao/chart/` per
the anti-duplication rule. NO standalone scripts. NO bespoke Go cloud
calls. NO `{{ fail }}`. All knobs configurable via values.yaml per
INVIOLABLE-PRINCIPLES.md #4 (never hardcode).

Co-authored-by: hatiyildiz <hat.yil@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:45:44 +04:00
e3mrah
74d232538a
docs(wbs): #375 bp-nats-jetstream chart-verified — smoke OK, close as duplicate (#411)
bp-nats-jetstream:1.1.1 already published on GHCR. Helm template renders
8 kinds clean (StatefulSet replicas=3 per ADR-0001 §9.2 B5). Smoke install
on contabo `nats-smoke` ns reached 3/3 Ready in 33s; JetStream R=3 stream
created with leader+2 replica quorum; pub/sub round-trip verified.
Bootstrap-kit slot 07 already wired in `_template/`. No code change needed.

Same verify-and-close pattern as #378.

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:45:21 +04:00
e3mrah
04308af7e9
feat(cert-manager): bp-cert-manager-powerdns-webhook (#373) (#410)
Authors a Catalyst Blueprint for the cert-manager DNS-01 external webhook
backed by PowerDNS, for post-handover wildcard TLS issuance against the
Sovereign's OWN PowerDNS — eliminating the last reachback to openova-
controlled Dynadot credentials per ADR-0001 §9.4.

Structure mirrors bp-cert-manager-dynadot-webhook (canonical seam):
- platform/cert-manager-powerdns-webhook/blueprint.yaml — Blueprint CR
  with depends: [bp-cert-manager, bp-powerdns]
- platform/cert-manager-powerdns-webhook/chart/Chart.yaml — wraps upstream
  zachomedia/cert-manager-webhook-pdns v2.5.5 (chart 3.2.5); declares the
  sigstore/common stub dep to satisfy the hollow-chart guard (#181)
- chart/templates/ — 8 templates (Deployment, Service, APIService, RBAC,
  selfSigned/CA Issuer + serving Certificate, ServiceAccount,
  ClusterIssuer)
- ClusterIssuer (letsencrypt-dns01-prod-powerdns) ships with the chart,
  paired with the webhook's solver. Gated behind clusterIssuer.enabled
  AND powerdns.host (skip-render pattern, lesson from #387 follow-up
  #402 — never use {{ fail }})

Bootstrap-kit slot:
- clusters/_template/bootstrap-kit/36-bp-cert-manager-powerdns-webhook.yaml
  wires the HelmRelease to the per-Sovereign in-cluster PowerDNS endpoint
  (http://powerdns.powerdns:8081) and flips clusterIssuer.enabled=true.
- ${SOVEREIGN_FQDN} envsubst keeps the slot operator-overridable per
  Inviolable Principle #4. Contabo bootstrap path does NOT include this
  template — contabo stays on legacy http01 + Traefik per ADR-0001 §9.4.

Helm-template verification:
  helm template t platform/cert-manager-powerdns-webhook/chart/
    → 14 resources, 0 ClusterIssuer (skip-render works)
  helm template t platform/cert-manager-powerdns-webhook/chart/ \
      --set powerdns.host=http://powerdns.test:8081 \
      --set clusterIssuer.enabled=true \
      --set powerdns.apiKeySecretRef.name=fake
    → 15 resources incl. ClusterIssuer with PowerDNS solver config
  Both renders parse cleanly through python yaml.safe_load_all.

Updates docs/omantel-handover-wbs.md §2 row 4 + §9 row #373 to
chart-released. Sovereign-impact deferred to Phase 8 (handover E2E).

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:44:27 +04:00
e3mrah
43c93d1875
feat(bp-keycloak): chart-verified on contabo (#377) (#407)
bp-keycloak:1.1.2 already published by blueprint-release run 25214143810
on commit a1bd5502 (digest sha256:c284c3dc...). Verified end-to-end:

- helm dependency build pulls bitnami/keycloak 25.2.0
- helm template (default values, no gateway.host) renders without error
  (HTTPRoute skip-renders per #387/#402 pattern)
- helm install in disposable keycloak-smoke ns on contabo:
  smoke-postgresql-0 + smoke-keycloak-0 reached Ready in ~2m39s
- /realms/master returns HTTP 200 in-cluster
- admin OIDC password-grant returned valid RS256 JWT access_token
- teardown clean (PVC + namespace deleted)

In-scope hygiene fix:
- clusters/otech.omani.works/bootstrap-kit/09-keycloak.yaml: add
  values.gateway.host=auth.otech.omani.works (mirrors omantel overlay
  authored under #387; otech overlay was authored before that and
  would have shipped without an HTTPRoute on its Sovereign).

Wizard catalog already lists keycloak under layer:'bootstrap-kit'
(mandatory, auto-installed) — no UI work needed.

WBS §2 row 14 + §9 row #377 updated to chart-verified.

Closes #377

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:42:06 +04:00
e3mrah
513508f224
docs(wbs): tick 5 — #378 done, #375 dispatched, dedupe §9 (#406)
#378 completed (chart-verified, closed as duplicate per agent finding).
#375 dispatched as next from queue to maintain 5-parallel.

In-flight now: #371 #373 #316 #375 #377 (5).
Done: #338 #370 #378 #387 #392 (5 of 24 minimal blueprints).

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:40:25 +04:00
e3mrah
1a20cc50b9
docs(wbs): #378 bp-crossplane chart-verified — smoke OK, close as duplicate (#405)
Investigation by Agent #378-bp-crossplane:

VALIDATION
- platform/crossplane/chart/ is umbrella (Chart.yaml + values.yaml + Chart.lock + charts/)
  by design after the v1.1.3 split (CR-of-CRD ordering moved to bp-crossplane-claims)
- helm template bp-crossplane . --namespace crossplane-system renders 23 kinds, 0 errors
- bp-crossplane v1.1.3 already published to oci://ghcr.io/openova-io/bp-crossplane
- Latest blueprint-release.yaml run on main is SUCCESS (f004300f)

SMOKE INSTALL (contabo, crossplane-smoke ns, torn down)
- helm install: deployed in 26s
- crossplane controller: 1/1 Ready
- crossplane-rbac-manager: 1/1 Ready
- 16 CRDs admitted (apiextensions.crossplane.io + pkg.crossplane.io + secrets.crossplane.io)
- Provider.pkg.crossplane.io/v1 admitted
- provider-hcloud:v0.4.0 Provider CR admitted (xpkg.upbound.io/crossplane-contrib)
- Teardown clean (provider deleted, helm uninstall, namespace deleted, CRDs deleted)

BOOTSTRAP-KIT WIRING (already done — verified, not changed)
- clusters/_template/bootstrap-kit/04-crossplane.yaml — bp-crossplane HelmRelease,
  dependsOn bp-flux, namespace crossplane-system, version pinned 1.1.3
- clusters/_template/bootstrap-kit/14-crossplane-claims.yaml — bp-crossplane-claims
  HelmRelease, dependsOn bp-crossplane (post-v1.1.3 split rationale documented inline)
- clusters/omantel.omani.works/bootstrap-kit/{04,14}-*.yaml — same content with
  catalyst.openova.io/sovereign label substituted

Per ADR-0001 §9.2 #2 Crossplane is the only day-2 cloud-API seam — chart deployed
per-Sovereign on the management k3s, not on contabo-mkt (which is the marketing
cluster). The smoke install above is a transient verification only.

#378 closes as duplicate — chart pre-exists, renders clean, installs clean,
bootstrap-kit wiring pre-exists. Nothing new to ship.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:37:17 +04:00
e3mrah
32864b58df
docs(wbs): tick 4 — 5 agents in flight (#371 #373 #316 #377 #378) (#404)
Phase 0/2/3/4 fan-out at full 5-parallel:
  - #371 RESUME (Hetzner OS credentials, in-worktree state)
  - #373 NEW (cert-mgr-powerdns-webhook authoring)
  - #316 NEW (OpenBao auto-unseal)
  - #377 NEW (bp-keycloak install verification)
  - #378 NEW (bp-crossplane install verification)

#370 promoted to done (unblocked + scope superseded by working wipe.go).

Class assignments updated; §9 status rows added.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:36:51 +04:00
e3mrah
f004300ff9
docs(wbs): tick 3 — #387 chart-released, #392 DoD-met (e2e proven), #370 unblocked (#403)
State after #401 + #402 + #399 land:
- #338 chart-released, Sovereign-impact deferred (bp-flux is cloud-init bootstrapped)
- #387 chart-released, follow-up #402 fixed default-values render; blueprint-release SUCCESS on a1bd5502
- #392  DoD-met — fake-Hetzner E2E test exercises full Purge() flow
- #370 unblocked (purge.go fix proven); reframed scope superseded
- #371 still in flight (Hetzner OS credentials)

DAG class: T338 T387 T392 → done; T370 T371 → wip.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:26:49 +04:00
github-actions[bot]
3e980654a9 deploy: update catalyst images to a1bd550 2026-05-01 12:25:50 +00:00
e3mrah
a1bd550208
fix(charts): HTTPRoute templates skip-render on missing host (was failing default-values render) (#402)
Blueprint-release for #401 failed because HTTPRoute templates use
{{- fail }} when gateway.host is not set, which trips the chart default-values
render gate in CI. Switched 6 templates from 'fail loud' to 'skip render':

  if .Values.gateway.host  →  emit HTTPRoute
  else                     →  emit nothing

The Gateway API admission already rejects HTTPRoute with empty hostnames,
so the loud-fail wasn't buying anything an operator wouldn't see at apply
time. Default-values render now produces zero HTTPRoute resources, which
is the correct shape for the upstream chart consumers that don't set
the Sovereign-only gateway block.
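
The skip-render decision, modelled in Python purely for illustration. The
real templates are Helm/Go templates; `render_httproute` is a hypothetical
function, and the manifest it returns is a simplified HTTPRoute shape rather
than the charts' actual output.

```python
from typing import Any, Dict, Optional


def render_httproute(values: Dict[str, Any], name: str,
                     port: int) -> Optional[Dict[str, Any]]:
    """Emit an HTTPRoute only when gateway.host is set; otherwise emit nothing."""
    host = (values.get("gateway") or {}).get("host")
    if not host:
        return None  # skip render instead of failing the whole template run
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            "hostnames": [host],
            "rules": [{"backendRefs": [{"name": name, "port": port}]}],
        },
    }
```

With empty values the function returns None, which corresponds to the
default-values render producing zero HTTPRoute resources.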

Files: keycloak, gitea, openbao, grafana, harbor, catalyst-platform.

Verified:
  helm template t products/catalyst/chart/ → 0 HTTPRoutes (clean)
  helm template t products/catalyst/chart/ \
      --set ingress.gateway.enabled=true \
      --set ingress.hosts.console.host=console.test \
      --set ingress.hosts.api.host=api.test
    → 2 HTTPRoutes

Closes the blueprint-release failure on commit abf01b6f.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:23:58 +04:00
github-actions[bot]
eded68eccd deploy: update catalyst images to abf01b6 2026-05-01 12:21:08 +00:00
e3mrah
abf01b6f21
feat(platform): Gateway API migration audit (#387) (#401)
Migrates every minimal-Sovereign-set blueprint chart from
networking.k8s.io/v1.Ingress to gateway.networking.k8s.io/v1.HTTPRoute,
replacing the legacy Traefik-on-Sovereigns assumption with the canonical
Cilium + Envoy + Gateway API path per ADR-0001 §9.4 and the WBS §2
correction note (#388).

The single per-Sovereign Gateway is added as additional documents in
the existing bootstrap-kit slot clusters/_template/bootstrap-kit/01-cilium.yaml
(NOT a new top-level slot), since Cilium owns the GatewayClass. It
includes:

  - Certificate `sovereign-wildcard-tls` requesting `*.${SOVEREIGN_FQDN}`
    from `letsencrypt-dns01-prod` (cert-manager + #373 webhook)
  - Gateway `cilium-gateway` in `kube-system` with HTTPS (443, TLS
    terminate) + HTTP (80) listeners, allowedRoutes.namespaces.from=All

Per-blueprint HTTPRoute templates (canonical seam: each wrapper chart's
existing `templates/` directory):

  | Blueprint           | Host pattern                    | Backend port |
  |---------------------|---------------------------------|--------------|
  | bp-keycloak         | auth.<sov>                      | 80           |
  | bp-gitea            | git.<sov>                       | 3000         |
  | bp-openbao          | bao.<sov>                       | 8200         |
  | bp-grafana          | grafana.<sov>                   | 80           |
  | bp-harbor           | registry.<sov>                  | 80           |
  | bp-powerdns         | pdns.<sov>/api  (dual-mode)     | 8081         |
  | bp-catalyst-platform| console.<sov>, api.<sov>        | 80, 8080     |

bp-powerdns supports both Ingress (contabo legacy) and HTTPRoute
(Sovereign) simultaneously — the per-Sovereign overlay sets
`api.gateway.enabled=true` while leaving `api.enabled=true`. The
Ingress object is harmless on Cilium clusters with no Traefik. This
preserves contabo's existing pdns.openova.io flow per ADR-0001 §9.4.

bp-harbor flips `expose.type` from `ingress` to `clusterIP` in
platform/harbor/chart/values.yaml so the upstream chart no longer
emits its own Ingress; the HTTPRoute is the sole HTTP exposure.
TLS terminates at the Gateway (wildcard cert) rather than per-host
Certificates inside the chart.

bp-catalyst-platform's `templates/httproute.yaml` is NOT excluded by
.helmignore (unlike templates/ingress.yaml + templates/ingress-console-tls.yaml,
which remain contabo-only legacy demo infra). The contabo path keeps
serving console.openova.io/sovereign via Traefik unchanged.

Bootstrap-kit slot updates (per-Sovereign hostname interpolation):

  - 08-openbao.yaml      → gateway.host: bao.${SOVEREIGN_FQDN}
  - 09-keycloak.yaml     → gateway.host: auth.${SOVEREIGN_FQDN}
  - 10-gitea.yaml        → gateway.host: gitea.${SOVEREIGN_FQDN}
  - 11-powerdns.yaml     → api.host: pdns.${SOVEREIGN_FQDN}, api.gateway.enabled: true
  - 19-harbor.yaml       → gateway.host: registry.${SOVEREIGN_FQDN}
  - 25-grafana.yaml      → gateway.host: grafana.${SOVEREIGN_FQDN}
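
As a small illustration of the hostname interpolation listed above, the
sketch below substitutes ${SOVEREIGN_FQDN} using Python's string.Template.
The slot-to-pattern mapping is copied from the list; `interpolate` is a
hypothetical helper, and the actual substitution mechanism in the bootstrap
path may behave differently (for example, Template.substitute raises
KeyError on an unset variable rather than substituting an empty string).

```python
from string import Template
from typing import Dict

# Host patterns as listed in the bootstrap-kit slot updates.
SLOT_HOSTS = {
    "08-openbao.yaml": "bao.${SOVEREIGN_FQDN}",
    "09-keycloak.yaml": "auth.${SOVEREIGN_FQDN}",
    "10-gitea.yaml": "gitea.${SOVEREIGN_FQDN}",
    "11-powerdns.yaml": "pdns.${SOVEREIGN_FQDN}",
    "19-harbor.yaml": "registry.${SOVEREIGN_FQDN}",
    "25-grafana.yaml": "grafana.${SOVEREIGN_FQDN}",
}


def interpolate(fqdn: str) -> Dict[str, str]:
    """Resolve every slot's host pattern for one Sovereign FQDN."""
    return {
        slot: Template(pattern).substitute(SOVEREIGN_FQDN=fqdn)
        for slot, pattern in SLOT_HOSTS.items()
    }
```

For example, interpolate("omantel.omani.works") resolves the grafana slot to
grafana.omantel.omani.works.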

Server-side dry-run validation against the live Cilium Gateway API
CRDs on contabo: every HTTPRoute and the per-Sovereign Gateway
+ Certificate apply cleanly via `kubectl apply --dry-run=server`.

Contabo unaffected: clusters/contabo-mkt/* not modified. The legacy
SME ingresses (console-nova, marketplace, admin, axon, talentmesh,
stalwart, ...) continue to serve via Traefik as before. powerdns
on contabo remains on the Ingress path (api.gateway.enabled defaults
to false at the chart level).

Closes #387.

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:19:30 +04:00
e3mrah
c1782cf6f1
docs(wbs): DAG compressed + light theme + clickable tickets + #338/#392 marked done (#398) (#400)
Three founder-requested DAG improvements:
1. Vertical compression: subgraph direction LR (was TB) + single-line node
   labels — roughly halves the rendered height.
2. Light-theme phase blocks: slate-100 fill with dark text; light-tinted
   semantic colours for done/wip/blocked/gate. Readable in both GitHub
   light and dark modes.
3. Clickable ticket numbers: every node carries a click directive opening
   the GitHub issue in a new tab. Phase 8 gate links to epic #369.

Status updates folded in:
- #338 done (PR #393 merged at 05cb39c0)
- #392 done (PR #397 merged at aa8ed4e7) — unblocks #370
- #370 still blocked but gate cleared
- #371 RESUMED, #387 RESTARTED with anti-duplication brief

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:18:29 +04:00
e3mrah
0904f54a54
test(catalyst-api): purge.go end-to-end fake-Hetzner integration test (#392 DoD) (#399)
Adds the missing behavior-level proof for #392. The unit tests in
purge_test.go pin the label-key constant; this file exercises the full
Purge() flow against an httptest fake-Hetzner that:

  1. Asserts the label_selector wire format matches the canonical label
  2. Returns one resource per kind (server/LB/FW/network/ssh_key)
  3. Records DELETE calls against /v1/<kind>/{id}

Two tests:
  - TestPurge_EndToEnd_FakeHetzner: full happy-path round-trip; PurgeReport
    totals to 5 with each kind's expected id deleted
  - TestPurge_EndToEnd_RegressionGuard: same flow, named to communicate
    that any future drift in the label selector (regression of #392)
    causes the fake's t.Errorf to fire AND the Purge() call to return an
    error — making sure the "silent no-op" failure mode that hid the
    original bug cannot recur.

Both pass locally (29ms). No real Hetzner credit consumed — the test
swaps purgeHTTPClient with one whose Transport rewrites
api.hetzner.cloud → httptest server URL.

Closes the DoD-chain step ("behavior-verified") for #392 that was
deferred by the agent due to redacted tokens on the live deployment
records.
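
The transport swap described above can be sketched roughly as follows. This is a minimal illustration, not the actual test code: `rewriteTransport` and `deleteViaFake` are hypothetical names, and the fake only records DELETE paths rather than asserting the label selector or serving one resource per kind.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync"
)

// rewriteTransport (hypothetical name) redirects every request to the fake
// server while keeping the original path and query intact.
type rewriteTransport struct {
	target *url.URL
	base   http.RoundTripper
}

func (t rewriteTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	clone := req.Clone(req.Context()) // never mutate the caller's request
	clone.URL.Scheme = t.target.Scheme
	clone.URL.Host = t.target.Host
	clone.Host = t.target.Host
	return t.base.RoundTrip(clone)
}

// deleteViaFake issues DELETE https://api.hetzner.cloud<path> through the
// rewriting client and returns the paths the fake recorded plus the status.
func deleteViaFake(path string) ([]string, int, error) {
	var (
		mu      sync.Mutex
		deleted []string
	)
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodDelete {
			mu.Lock()
			deleted = append(deleted, r.URL.Path)
			mu.Unlock()
		}
		w.WriteHeader(http.StatusNoContent)
	}))
	defer fake.Close()

	target, err := url.Parse(fake.URL)
	if err != nil {
		return nil, 0, err
	}
	client := &http.Client{Transport: rewriteTransport{target: target, base: http.DefaultTransport}}

	req, err := http.NewRequest(http.MethodDelete, "https://api.hetzner.cloud"+path, nil)
	if err != nil {
		return nil, 0, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, 0, err
	}
	defer resp.Body.Close()

	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), deleted...), resp.StatusCode, nil
}

func main() {
	paths, status, err := deleteViaFake("/v1/servers/42")
	if err != nil {
		panic(err)
	}
	fmt.Println(status, paths)
}
```

Because the rewrite happens in the transport, the code under test keeps building URLs against the real API host and never learns it is talking to a fake.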

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:17:29 +04:00
e3mrah
bf7218b878
docs(wbs): DAG compressed + light theme + clickable tickets + #338/#392 marked done (#398)
Three founder-requested DAG improvements:
1. Vertical compression: subgraph direction LR (was TB) + single-line node
   labels — roughly halves the rendered height.
2. Light-theme phase blocks: slate-100 fill with dark text; light-tinted
   semantic colours for done/wip/blocked/gate. Readable in both GitHub
   light and dark modes.
3. Clickable ticket numbers: every node carries a click directive opening
   the GitHub issue in a new tab. Phase 8 gate links to epic #369.

Status updates folded in:
- #338 done (PR #393 merged at 05cb39c0)
- #392 done (PR #397 merged at aa8ed4e7) — unblocks #370
- #370 still blocked but gate cleared
- #371 RESUMED, #387 RESTARTED with anti-duplication brief

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:02:33 +04:00
github-actions[bot]
e97ae0f448 deploy: update catalyst images to aa8ed4e 2026-05-01 12:01:58 +00:00
e3mrah
aa8ed4e7a3
fix(catalyst-api): purge.go label key matches Tofu emit (#392) (#397)
Bug: `hetzner.Purge` filtered by `catalyst-deployment-id=<id>`. The
OpenTofu module at `infra/hetzner/main.tf` actually emits
`catalyst.openova.io/sovereign=<fqdn>` on every taggable resource
(network, firewall, ssh-key, server, load-balancer). The mismatch made
the wizard's Cancel-and-Wipe orphan-purge step (#318, wipe.go) silently
no-op for every failed deployment since the bug landed.

Fix (minimum-impact, 2 prod files):
- `purge.go`: introduce `PurgeLabelKey` constant + `FilterByLabel()`
  helper; rename parameter from `deploymentID` to `sovereignFQDN`;
  filter by `catalyst.openova.io/sovereign=<fqdn>`.
- `wipe.go`: pass `dep.Request.SovereignFQDN` instead of `id`.

Regression sentinel (`purge_test.go`):
- pins the constant to `catalyst.openova.io/sovereign`
- reads `infra/hetzner/main.tf` and asserts the constant appears there
- exercises the wire-format helper
- guards empty-token and empty-fqdn input rejection

If either Tofu or purge.go drifts from the canonical key, the test
fails locally before CI ships the bug.
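
A minimal sketch of the selector logic described above. Only the constant's value comes from this message; the `FilterByLabel` signature, the error text, and the `/v1/servers` path used in `main` are assumptions for illustration and may not match purge.go.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// PurgeLabelKey is the label key this commit pins; per the message it must
// match what infra/hetzner/main.tf emits on every taggable resource.
const PurgeLabelKey = "catalyst.openova.io/sovereign"

// FilterByLabel builds the label selector for one Sovereign.
// Hypothetical signature: the real helper in purge.go may differ.
func FilterByLabel(sovereignFQDN string) (string, error) {
	if sovereignFQDN == "" {
		// Mirrors the empty-fqdn rejection that the regression test guards.
		return "", errors.New("purge: empty sovereign FQDN")
	}
	return PurgeLabelKey + "=" + sovereignFQDN, nil
}

func main() {
	sel, err := FilterByLabel("otech.omani.works")
	if err != nil {
		panic(err)
	}
	// Encode the selector as the label_selector query parameter.
	q := url.Values{}
	q.Set("label_selector", sel)
	fmt.Println("/v1/servers?" + q.Encode())
}
```

Keeping the key in one exported constant is what lets a test compare it against the Tofu source text, so drift on either side fails before deployment.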

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:00:08 +04:00
e3mrah
eb92e0496b
feat(platform): add bp-newapi — multi-tenant LLM marketplace gateway (#394) (#396)
Catalyst Blueprint wrapping the upstream NewAPI
(github.com/Calcium-Ion/new-api, MIT) for Sovereign operators whose
business model is reselling LLM access to their own customers.

Backend-only mode: the OpenAI-compatible API at api.<host>/v1/* is
customer-facing; the upstream's portal UI is disabled at ingress;
Catalyst replaces it as the customer surface; NewAPI's admin UI at
admin.<host> is exposed only to ops staff (IdP-gated).

Compliance posture enforced at the blueprint layer:
- Channel attestation gate (refuses to render if any enabled channel
  lacks verifiable provenance — in-cluster, commercial-contract, or
  byok)
- Geographic AUP enforcement (sanctioned-region block on commercial-
  provider channels; US/EU export-control baseline)
- BYOK isolation (request-scoped, never aggregated)
- Reseller disclosure required
- Audit log on bp-cnpg (metadata-only by default)

ACME placeholder used throughout the README; replace with operator
identity in per-Sovereign overlays at clusters/<sovereign>/bootstrap-
kit/.

Files:
- platform/newapi/README.md (design doc + setup checklist)
- platform/newapi/blueprint.yaml (Catalyst Blueprint CR)
- platform/newapi/chart/{Chart.yaml,values.yaml}
- platform/newapi/chart/templates/{_helpers.tpl,deployment.yaml,
  service.yaml,ingress.yaml,configmap.yaml,serviceaccount.yaml,
  networkpolicy.yaml}

Closes design portion of #394.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:57:06 +04:00
e3mrah
05cb39c042
fix(bp-flux): catalyst-cluster-reconciler ClusterRoleBinding overlay (closes #338) (#393)
PROBLEM
-------
On Sovereign-1 (otech.omani.works, 2026-04-30) every HelmRelease that
transitioned through pending-install/pending-upgrade got stuck because
the helm-controller SA could not UPDATE its own helm-storage Secrets
(sh.helm.release.v1.<name>.<n>) in flux-system. Symptom:

  secrets "sh.helm.release.v1.catalyst-platform.v1" is forbidden:
  User "system:serviceaccount:flux-system:helm-controller" cannot
  update resource "secrets" in API group "" in the namespace "flux-system"

Runtime workaround on otech (added 2026-04-30): manual ClusterRoleBinding
flux-system-helm-controller-admin → cluster-admin → flux-system/helm-controller.
The permanent fix is tracked in #338.

FIX
---
Add platform/flux/chart/templates/catalyst-cluster-reconciler-rbac.yaml — a
Catalyst-managed ClusterRoleBinding (catalyst-cluster-reconciler) that
binds cluster-admin to helm-controller AND kustomize-controller in
.Values.catalyst.fluxNamespace (default flux-system). It is independent of
the upstream subchart's cluster-reconciler binding (different name, no
ownership conflict), so if the upstream binding ever drifts again the
overlay still keeps the cluster's RBAC correct.

WHY cluster-admin (not narrower)
--------------------------------
helm-controller installs arbitrary user-supplied Helm charts which can
ship any K8s resource (CRDs, ClusterRoles, MutatingWebhookConfigurations,
etc.). There is no narrower role that satisfies the full install path.
The Flux project's own bootstrap install.yaml binds cluster-admin for
the same reason (upstream default multitenancy.privileged=true).
Multi-tenancy lockdown is a Sovereign Day-2 hardening choice tracked
separately.

NEVER-HARDCODE COMPLIANCE
-------------------------
Per docs/INVIOLABLE-PRINCIPLES.md #4, the namespace is operator-overridable
via .Values.catalyst.fluxNamespace. Default is flux-system because that's
the canonical Catalyst install namespace (matches cloud-init's flux2
install.yaml + clusters/_template/bootstrap-kit/03-flux.yaml).

VERSION
-------
- bp-flux 1.1.2 → 1.1.3 (Chart.yaml + blueprint.yaml + 3 bootstrap-kit refs).
- The flux2 subchart pin (2.14.1) is unchanged — version-pin replay test
  remains green (cloud-init v2.4.0 == subchart appVersion 2.4.0).

VERIFICATION
------------
- platform/flux/chart/tests/version-pin-replay.sh — all 6 cases PASS.
- platform/flux/chart/tests/observability-toggle.sh — all 3 cases PASS.
- helm template renders the new ClusterRoleBinding with correct subjects
  (flux-system by default; verified --set catalyst.fluxNamespace=custom
  override path).
- scripts/check-bootstrap-deps.sh — 0 drift, 0 cycles.

FILES
-----
- platform/flux/chart/templates/catalyst-cluster-reconciler-rbac.yaml (new)
- platform/flux/chart/Chart.yaml (1.1.2 → 1.1.3)
- platform/flux/chart/values.yaml (catalyst.fluxNamespace default)
- platform/flux/blueprint.yaml (1.1.2 → 1.1.3)
- clusters/{_template,otech.omani.works,omantel.omani.works}/bootstrap-kit/03-flux.yaml (chart version)
- docs/lessons-learned/helm-controller-rbac.md (permanent-fix note)
- docs/omantel-handover-wbs.md (#338 status row)

Refs: #43 #369 #338
Lesson: docs/lessons-learned/helm-controller-rbac.md

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
2026-05-01 15:56:45 +04:00
e3mrah
4fbced47e8
docs(wbs): progress tick 2 — anti-duplication corrective applied to all in-flight agents (#395)
Founder directive 2026-05-01: all agents prepended with explicit anti-duplication
rule listing the canonical seam for every kind of work. Lesson recorded in §9.

State after corrective:
- #338 PR #393 open (scoped catalyst-cluster-reconciler RBAC, NOT cluster-admin
  overgrant) — awaiting founder review
- #371 RESUMED in-worktree (already correctly extending existing seams)
- #387 RESTARTED with tightened scope (no new 'bootstrap-kit slot')
- #392 RESTARTED with minimum-impact mandate (single-line label-key fix)
- #370 still parked, blocked on #392

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:54:46 +04:00
e3mrah
90a597128c
docs(wbs): progress tick — 4 agents dispatched on #338 #370 #371 #387 (#390)
Phase 0 + Phase 1 in flight in parallel:
  Agent #338-bp-flux-rbac           — bp-flux helm-controller SA
  Agent #370-hetzner-purge-runbook  — Hetzner purge script + execution
  Agent #371-hetzner-os-credentials — Hetzner Object Storage cred pattern
  Agent #387-gateway-api-audit      — Cilium GW API per-blueprint migration

DAG legend extended: 🟡 wip, 🟢 done, 🔴 blocked, 🟧 gate.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:37:20 +04:00
e3mrah
801862725c
docs(wbs): redraw omantel handover DAG left-to-right with phase subgraphs (#389)
Mermaid `flowchart LR` + `subgraph` per phase. Critical-path edges made
explicit (every blueprint install depends on #338 bp-flux RBAC; #385
catalyst-platform is the convergence node; #319 + #374 + #370 gate
Phase 8). Adds reading-key prose under the diagram.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:28:36 +04:00
e3mrah
7a21c2724f
docs(wbs): drop bp-traefik from minimal Sovereign set, replace with Cilium Gateway API migration (#387) (#388)
Per founder correction 2026-05-01:
- Sovereigns use Cilium + Envoy + Gateway API (gateway.networking.k8s.io/v1)
- Traefik stays contabo-only for legacy nova/website demos per ADR §9.4
- bp-traefik was never a Sovereign blueprint
- #372 closed; #387 is the actual gap (per-blueprint chart audit
  to migrate Ingress → HTTPRoute/Gateway)

Minimal blueprint count: 24 → 23. Status field updated.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:21:19 +04:00
e3mrah
43839526fe
docs(wbs): omantel handover work-breakdown structure (#369) (#386)
Canonical reference for the minimal self-sufficient Sovereign blueprint
set, the 7-phase DAG, per-ticket dependencies, realistic timeline, and
the DoD execution checklist.

Companion to #369 epic and ADR-0001.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:13:48 +04:00
github-actions[bot]
664697995a deploy: update catalyst images to dba8a80 2026-05-01 10:01:21 +00:00
e3mrah
dba8a80c36
test(catalyst-ui): popover-aware legend assertions in cloud-architecture suite (#366 follow-up) (#368)
* fix(catalyst-ui): list view — chip strip in toolbar replaces 12-tile card grid

Issue #366 item 1. The 12-tile resource-kind card grid + redundant
dropdown were pushing the active list table below the fold. Replaced
with a compact horizontal chip strip rendered inline in the
CloudPage toolbar between the Graph|List view toggle and the
fullscreen button (List view only). 6 primary chips render inline
(Clusters, vClusters, Node Pools, PVCs, Load Balancers, Buckets);
the remaining 6 overflow kinds live in a + More popover.

The kind catalogue (icons, labels, primary/overflow split, validation
helpers) is extracted to a single source of truth at
cloud-list/kinds.ts so CloudListView (active-list dispatcher) and
CloudKindChips (toolbar strip) share one definition. CloudListView's
body collapses to just the active list table — the toolbar owns the
switcher affordance.

The CloudPage toolbar simultaneously absorbs the centre-slot title
move (issue #366 item 2 — pageTitle prop on PortalShell), the
fullscreen icon-only button (issue #366 item 4), and :fullscreen CSS
that fills the viewport. Subsequent commits in this PR cover the
remaining items.

Per docs/INVIOLABLE-PRINCIPLES.md #4, every chip / kind id / icon
flows through a typed constant — no hand-maintained string list at
any call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-ui): PortalShell — page title in header centre slot, drop body title row

Issue #366 item 2. The Sovereign-portal pages all rendered an empty
56px header band on top of the body, with the H1 page title sitting
in a separate row below. Wasted ~80px of vertical real-estate on
every page (Apps, Jobs, Dashboard, Cloud, AppDetail, JobDetail,
JobsTimeline, FlowPage).

PortalShell now exposes a 3-slot flex header:
  • [data-testid=portal-header-left]   — breadcrumb / back link.
  • [data-testid=portal-header-center] — h1 title at
    [data-testid=portal-header-title].
  • [data-testid=portal-header-right]  — page-specific affordances
    (FQDN switcher, provisioning pill) + ThemeToggle.

Each slot grabs flex: 1 so the title is visually centred regardless
of whether the side slots have content. Pages pass `pageTitle`,
`headerSlotLeft`, and `headerSlotRight` as props — no page renders a
body H1 row anymore (the legacy testids `cloud-title`,
`dashboard-title`, `sov-jobs-timeline-heading` are preserved as
hidden anchors so unit tests keep working).

CloudPage was migrated alongside the chip strip in the previous
commit; this commit migrates the rest of the PortalShell consumers.

Per docs/INVIOLABLE-PRINCIPLES.md #4, the slot layout is Tailwind
utility classes — no inline px / hex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-ui): GraphCanvas — actually consume EDGE_STROKE/DASHED/MARKER_END per edge type

Issue #366 item 3 (first half). The GraphCanvas already wired
EDGE_STROKE / EDGE_DASHED / EDGE_MARKER_START / EDGE_MARKER_END per
edge type, but founder feedback was that the visible canvas didn't
read as ArchiMate-styled — edges blurred together at the default
1.5px / 0.75 opacity stroke and the marker presence was hard to
verify.

Bumped the live-edge stroke from 1.5px / 0.75 opacity to 1.75px /
0.85 so the type-coloured stroke + marker reads against the
canvas, and exposed the resolved marker / dashed metadata via
data-marker-start, data-marker-end, data-dashed attributes on each
<line> so Playwright can assert the wiring without poking at the
React state.

This pairs with the legend-popover work in the next commit — the
two together close item 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-ui): ArchiMate legend becomes Popover with persistence

Issue #366 item 3 (second half). The 8-row ArchiMate legend at the
bottom of the Architecture graph was a permanent panel that
crowded the canvas vertical real estate. Founder feedback: make it
a Popover that's closed by default, surfaced behind a single
ⓘ ArchiMate connections (12) trigger button.

Added EdgeLegendPopover in ArchitectureGraphPage:
  • Trigger button always visible at the bottom of the graph.
  • Click → opens the legend in an absolutely-positioned popover
    above the trigger.
  • Click-outside / Escape / explicit ✕ button closes.
  • Open state persists in localStorage `sov-arch-legend-open` so
    operators who prefer always-visible can keep it pinned.

The existing legend body (8 ArchiMate-symbol thumbnails + relation
names + counts) is preserved verbatim inside the popover, so the
visual contract of the legend itself is unchanged — only the
chrome around it.

The Architecture.test.tsx vitest case + the cloud-architecture.spec.ts
Playwright case are both updated to click the trigger before asserting
the inner rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(catalyst-ui): Playwright cases + screenshots for #366 polish

Adds e2e/post-v2-polish-366.spec.ts which locks in all four post-v2
UX polish items end-to-end on the deployed surface:

  1. Chip strip in toolbar — assert toolbar contains the chip strip
     element, the legacy 12-tile grid is gone, and the active list
     table is in the viewport at 1440x900.
  2. Header centre slot title — visit Apps, Jobs, Dashboard, Cloud,
     assert portal-header-title is visible inside portal-header-center
     with the right text.
  3. ArchiMate edges — read marker-start / marker-end attributes from
     `[data-edge-type=contains]` and `[data-edge-type=runs-on]` lines
     and assert at least one of each carries the relation-correct
     marker URL. Legend trigger button always visible; legend body
     only present after click; localStorage `sov-arch-legend-open`
     flips on open.
  4. Fullscreen — fullscreen toggle has no visible text (icon only),
     aria-label preserved; clicking flips data-fullscreen=true and
     the cloud-content bounding box is at viewport height (≥700px @
     900px viewport).

Captures 5 screenshots at 1440x900:
  • p366-chip-strip-list.png
  • p366-centre-title-cloud.png
  • p366-archimate-legend-popover.png
  • p366-archimate-edges-zoomed.png
  • p366-fullscreen-100pct.png

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(catalyst-ui): also flip cloud-architecture polish suite to popover-aware legend

Two existing legend assertions in cloud-architecture.spec.ts (the
"shows ArchiMate-style symbol thumbnails for every relation type"
case at line 305 and the polish-screenshot case at line 411) still
expected the legend to be a permanent panel. Updated them to click
the trigger button first so the popover body is in the DOM before
the assertions run.

Closes the last gap from #366 item 3 — full deployed-SHA Playwright
suite is now 48/48 green against console.openova.io.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:59:38 +04:00
github-actions[bot]
adf06a7ec2 deploy: update catalyst images to 98f2a36 2026-05-01 09:47:49 +00:00
e3mrah
98f2a360f2
fix(catalyst-ui): post-v2 UX polish — chip strip + centre title + ArchiMate edges + fullscreen height (#366) (#367)
* fix(catalyst-ui): list view — chip strip in toolbar replaces 12-tile card grid

Issue #366 item 1. The 12-tile resource-kind card grid + redundant
dropdown were pushing the active list table below the fold. Replaced
with a compact horizontal chip strip rendered inline in the
CloudPage toolbar between the Graph|List view toggle and the
fullscreen button (List view only). 6 primary chips render inline
(Clusters, vClusters, Node Pools, PVCs, Load Balancers, Buckets);
the remaining 6 overflow kinds live in a + More popover.

The kind catalogue (icons, labels, primary/overflow split, validation
helpers) is extracted to a single source of truth at
cloud-list/kinds.ts so CloudListView (active-list dispatcher) and
CloudKindChips (toolbar strip) share one definition. CloudListView's
body collapses to just the active list table — the toolbar owns the
switcher affordance.

The CloudPage toolbar simultaneously absorbs the centre-slot title
move (issue #366 item 2 — pageTitle prop on PortalShell), the
fullscreen icon-only button (issue #366 item 4), and :fullscreen CSS
that fills the viewport. Subsequent commits in this PR cover the
remaining items.

Per docs/INVIOLABLE-PRINCIPLES.md #4, every chip / kind id / icon
flows through a typed constant — no hand-maintained string list at
any call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-ui): PortalShell — page title in header centre slot, drop body title row

Issue #366 item 2. The Sovereign-portal pages all rendered an empty
56px header band on top of the body, with the H1 page title sitting
in a separate row below. Wasted ~80px of vertical real-estate on
every page (Apps, Jobs, Dashboard, Cloud, AppDetail, JobDetail,
JobsTimeline, FlowPage).

PortalShell now exposes a 3-slot flex header:
  • [data-testid=portal-header-left]   — breadcrumb / back link.
  • [data-testid=portal-header-center] — h1 title at
    [data-testid=portal-header-title].
  • [data-testid=portal-header-right]  — page-specific affordances
    (FQDN switcher, provisioning pill) + ThemeToggle.

Each slot grabs flex: 1 so the title is visually centred regardless
of whether the side slots have content. Pages pass `pageTitle`,
`headerSlotLeft`, and `headerSlotRight` as props — no page renders a
body H1 row anymore (the legacy testids `cloud-title`,
`dashboard-title`, `sov-jobs-timeline-heading` are preserved as
hidden anchors so unit tests keep working).

CloudPage was migrated alongside the chip strip in the previous
commit; this commit migrates the rest of the PortalShell consumers.

Per docs/INVIOLABLE-PRINCIPLES.md #4, the slot layout is Tailwind
utility classes — no inline px / hex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-ui): GraphCanvas — actually consume EDGE_STROKE/DASHED/MARKER_END per edge type

Issue #366 item 3 (first half). The GraphCanvas already wired
EDGE_STROKE / EDGE_DASHED / EDGE_MARKER_START / EDGE_MARKER_END per
edge type, but founder feedback was that the visible canvas didn't
read as ArchiMate-styled — edges blurred together at the default
1.5px / 0.75 opacity stroke and the marker presence was hard to
verify.

Bumped the live-edge stroke from 1.5px / 0.75 opacity to 1.75px /
0.85 so the type-coloured stroke + marker reads against the
canvas, and exposed the resolved marker / dashed metadata via
data-marker-start, data-marker-end, data-dashed attributes on each
<line> so Playwright can assert the wiring without poking at the
React state.

This pairs with the legend-popover work in the next commit — the
two together close item 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-ui): ArchiMate legend becomes Popover with persistence

Issue #366 item 3 (second half). The 8-row ArchiMate legend at the
bottom of the Architecture graph was a permanent panel that
crowded the canvas vertical real estate. Founder feedback: make it
a Popover that's closed by default, surfaced behind a single
ⓘ ArchiMate connections (12) trigger button.

Added EdgeLegendPopover in ArchitectureGraphPage:
  • Trigger button always visible at the bottom of the graph.
  • Click → opens the legend in an absolutely-positioned popover
    above the trigger.
  • Click-outside / Escape / explicit ✕ button closes.
  • Open state persists in localStorage `sov-arch-legend-open` so
    operators who prefer always-visible can keep it pinned.

The existing legend body (8 ArchiMate-symbol thumbnails + relation
names + counts) is preserved verbatim inside the popover, so the
visual contract of the legend itself is unchanged — only the
chrome around it.

The Architecture.test.tsx vitest case + the cloud-architecture.spec.ts
Playwright case are both updated to click the trigger before asserting
the inner rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(catalyst-ui): Playwright cases + screenshots for #366 polish

Adds e2e/post-v2-polish-366.spec.ts which locks in all four post-v2
UX polish items end-to-end on the deployed surface:

  1. Chip strip in toolbar — assert toolbar contains the chip strip
     element, the legacy 12-tile grid is gone, and the active list
     table is in the viewport at 1440x900.
  2. Header centre slot title — visit Apps, Jobs, Dashboard, Cloud,
     assert portal-header-title is visible inside portal-header-center
     with the right text.
  3. ArchiMate edges — read marker-start / marker-end attributes from
     `[data-edge-type=contains]` and `[data-edge-type=runs-on]` lines
     and assert at least one of each carries the relation-correct
     marker URL. Legend trigger button always visible; legend body
     only present after click; localStorage `sov-arch-legend-open`
     flips on open.
  4. Fullscreen — fullscreen toggle has no visible text (icon only),
     aria-label preserved; clicking flips data-fullscreen=true and
     the cloud-content bounding box is at viewport height (≥700px @
     900px viewport).

Captures 5 screenshots at 1440x900:
  • p366-chip-strip-list.png
  • p366-centre-title-cloud.png
  • p366-archimate-legend-popover.png
  • p366-archimate-edges-zoomed.png
  • p366-fullscreen-100pct.png

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:46:07 +04:00
e3mrah
19dcd0a147 docs(lessons-learned): renaming persisted JSON tag silently drops legacy data (#351) 2026-05-01 11:08:05 +02:00
github-actions[bot]
3a8181fac6 deploy: update catalyst images to ba09007 2026-05-01 08:21:59 +00:00
e3mrah
ba09007427
fix(catalyst-api): migrate legacy batchId + synthesize missing parent groups on read (#351) (#365)
Old deployments (e.g. ce476aaf80731a46) were provisioned before #351
landed. Their on-disk index.json carries the deprecated `batchId`
JSON field; after the rename the field is silently dropped, leaving
every leaf orphaned. The bridge only writes parents on NEW events,
so the canvas + table render zero parent relationships for old data.

Three changes restore the relationship without a data migration:

1. Job.LegacyBatchID — read-only `batchId` JSON tag for read-tolerant
   unmarshal. Stripped before every persistIndex write.
2. loadIndex — when ParentID is empty and LegacyBatchID is non-empty,
   ParentID is set to JobID(deploymentID, batchID); LegacyBatchID is
   cleared. Pre-refactor leaves with empty Type default to
   JobTypeInstall.
3. deriveTreeView — every leaf whose ParentID points at an id without
   a corresponding on-disk row triggers an in-memory synthesized
   group Job (Type=group, DisplayName resolved from the slug). The
   synthesis runs BEFORE the rollup pass so the synthesized group
   participates in childIds + status + timing aggregation just like a
   real on-disk parent. New deployments are unaffected (their bridge
   writes the parent row directly).

Test: TestStore_LegacyBatchID_HoistedToParentID hand-writes a
pre-#351 index.json with `batchId` only, asserts ListJobs returns 3
jobs (2 leaves + 1 synthesized group) with rolled-up running status,
ChildIDs populated, and LegacyBatchID cleared on the leaves.

TestStore_UpsertJob_RoundTrip updated to assert the new behaviour:
inserting a leaf whose ParentID points at the bootstrap-kit group
returns 2 jobs from ListJobs (leaf + synthesized parent).
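
The read-path behaviour described in the three changes above can be sketched as follows. This is an illustration under stated assumptions, not the store's code: only the `batchId` tag comes from this message, while the other JSON tags, the array-shaped index, the `jobID` format, the `"install"` and `"group"` string values, and the helper names are hypothetical. Status and timing rollup are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Job is a trimmed, hypothetical shape; only the `batchId` tag is taken from
// the commit message, the other tags are assumptions.
type Job struct {
	ID            string `json:"id"`
	Type          string `json:"type,omitempty"`
	ParentID      string `json:"parentId,omitempty"`
	LegacyBatchID string `json:"batchId,omitempty"`
}

// jobID stands in for the real JobID(deploymentID, batchID); the format is assumed.
func jobID(deploymentID, slug string) string { return deploymentID + ":" + slug }

// hoistLegacy maps a legacy batchId onto ParentID and clears the legacy field
// so it is never written back to disk.
func hoistLegacy(deploymentID string, jobs []Job) {
	for i := range jobs {
		if jobs[i].ParentID == "" && jobs[i].LegacyBatchID != "" {
			jobs[i].ParentID = jobID(deploymentID, jobs[i].LegacyBatchID)
		}
		jobs[i].LegacyBatchID = ""
		if jobs[i].Type == "" {
			jobs[i].Type = "install" // pre-refactor leaves default to install
		}
	}
}

// synthesizeGroups appends an in-memory group for every ParentID that has no row.
func synthesizeGroups(jobs []Job) []Job {
	known := make(map[string]bool, len(jobs))
	for _, j := range jobs {
		known[j.ID] = true
	}
	out := append([]Job(nil), jobs...)
	for _, j := range jobs {
		if j.ParentID != "" && !known[j.ParentID] {
			known[j.ParentID] = true
			out = append(out, Job{ID: j.ParentID, Type: "group"})
		}
	}
	return out
}

// loadLegacy unmarshals an index, hoists legacy ids, and synthesizes parents.
func loadLegacy(deploymentID string, raw []byte) ([]Job, error) {
	var jobs []Job
	if err := json.Unmarshal(raw, &jobs); err != nil {
		return nil, err
	}
	hoistLegacy(deploymentID, jobs)
	return synthesizeGroups(jobs), nil
}

func main() {
	raw := []byte(`[{"id":"a","batchId":"bootstrap-kit"},{"id":"b","batchId":"bootstrap-kit"}]`)
	jobs, err := loadLegacy("dep1", raw)
	if err != nil {
		panic(err)
	}
	for _, j := range jobs {
		fmt.Printf("%s type=%s parent=%s\n", j.ID, j.Type, j.ParentID)
	}
}
```

Keeping the legacy field on the struct makes the unmarshal tolerant of old files, and clearing it before any write means the deprecated tag disappears from disk the next time the index is persisted.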

Refs #351

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:20:17 +04:00
github-actions[bot]
45fd2b5d9a deploy: update catalyst images to c183e76 2026-05-01 08:17:32 +00:00
e3mrah
c183e760ac
feat: Cloud IA restructure + graph/list toggle + fullscreen + cloud icon (#350) (#364)
* feat(catalyst-ui): sidebar — single Cloud entry, drop accordion, IconCloud

Issue openova-io/openova#350 phase 1.

Replaces the two-level Cloud accordion (#309 P3) with a single flat
<Link> entry. The new Cloud parent page (CloudPage.tsx) owns the
in-page graph/list view dispatch and resource-kind switching, so the
sidebar no longer needs to expose category/resource sub-items.

Drops:
  - sov-nav-cloud-toggle (button → link)
  - sov-nav-cloud-{architecture,compute,network,storage} sub-items
  - sov-nav-cloud-{compute,network,storage}-toggle second-level toggles
  - sov-nav-cloud-{compute,network,storage}-{clusters,vclusters,…}
    sub-sub items
  - localStorage keys sov-nav-cloud(-{compute,network,storage})-expanded
    (no longer relevant; the parent page has its own persistence)

Adds:
  - Cloud icon swapped from server-stack rectangles to the verbatim
    Tabler IconCloud path (lifted from @tabler/icons-react v3.41.1).

Active-state matcher unchanged: Cloud highlights on any /cloud/* or
legacy /infrastructure/* path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): CloudPage parent shell with graph/list toggle + fullscreen

Issue openova-io/openova#350 phases 2 + 4.

Promotes CloudPage from a thin <Outlet /> host (#309) to the parent
view shell for the consolidated Cloud surface. The page now:

  - Renders the canonical header (title + tagline + Sovereign switcher).
  - Adds a segmented View toggle (Graph | List) immediately below.
  - Owns the active view via the URL ?view= query, falling back to a
    persisted `sov-cloud-view` localStorage key, falling back to graph.
  - Dispatches the body: view=graph → Architecture (force-graph);
    view=list → CloudListView (12-tile grid + active list table).
  - Adds a fullscreen toggle button with smooth scale + fade
    transition (~250ms). Native `requestFullscreen()` on the content
    container; falls back to a synthetic-overlay state when the
    user-agent denies. Esc exits (browser-native); a floating "Exit
    fullscreen" button is rendered inside the overlay (top-right).
  - aria-pressed on the fullscreen toggle reflects state.
  - Preserves the Sovereign-switcher cross-Sovereign navigation, now
    carrying the active view + kind on the redirect.

The URL is canonicalised on every navigation (replace:true) so deep
links and bookmarks always carry an explicit view param.
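
The view precedence listed above (URL query, then persisted value, then graph) can be sketched as a small pure function. The actual implementation lives in the TypeScript UI; this Go version only illustrates the ordering, and `resolveView`, `validView`, and the choice to treat unknown values as absent are assumptions.

```go
package main

import "fmt"

// defaultView is the final fallback named in the commit message.
const defaultView = "graph"

// validView reports whether v is one of the two views the toggle offers.
func validView(v string) bool { return v == "graph" || v == "list" }

// resolveView applies the precedence: URL ?view= first, then the value
// persisted under sov-cloud-view, then the default. Hypothetical helper.
func resolveView(queryView, persistedView string) string {
	if validView(queryView) {
		return queryView
	}
	if validView(persistedView) {
		return persistedView
	}
	return defaultView
}

func main() {
	fmt.Println(resolveView("list", "graph")) // URL wins over the persisted value
	fmt.Println(resolveView("", "list"))      // persisted value used when URL is silent
	fmt.Println(resolveView("", ""))          // nothing set, fall back to the default
}
```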

Tests:
  - CloudPage.test.tsx asserts the segmented control is present and
    aria-selected reflects state, the fullscreen toggle button is
    present with aria-pressed=false, and the legacy in-page tab strip
    remains absent.
  - Architecture.test.tsx is updated to mount the new shell with
    viewOverride='graph' (the production dispatch path); the legacy
    /cloud/architecture child route is no longer needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): CloudListView — card grid + dropdown switcher reusing P3 list components

Issue openova-io/openova#350 phase 3.

CloudListView is the body rendered by CloudPage when view=list. It
replaces the previous CloudComputePage / CloudNetworkPage /
CloudStoragePage three-tile category surfaces with a single 12-tile
card grid covering every resource kind in one place.

Surface contract:
  - Top-of-page: a 12-tile resource card grid (Clusters, vClusters,
    Node Pools, Worker Nodes, Load Balancers, Services, Ingresses,
    DNS Zones, PVCs, Buckets, Volumes, Storage Classes). Each tile
    shows an icon + count + tagline; clicking sets the active kind.
    Tiles whose informer isn't wired yet (Services / Ingresses / DNS
    Zones / Storage Classes) show a "—" instead of a count.
  - Toolbar: a compact <select> dropdown that mirrors the card-grid
    selection, giving an alternative keyboard-driven path.
  - Below: the active kind's existing P3 list page rendered inline.
    Components (ClustersPage, PvcsPage, …) are reused as-is — none of
    them rewritten.

Active-kind state lives in the URL (?kind=…) and persists to
localStorage under `sov-cloud-list-kind`. The URL takes precedence on
mount so deep links / shared URLs always win.

Per docs/INVIOLABLE-PRINCIPLES.md #1 (target-state shape) — the entire
12-resource list view ships in this first cut. No "for now" stubs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): router consolidation + redirects from old /cloud/<category>/<resource> URLs

Issue openova-io/openova#350 phase 5.

Consolidates the seventeen P3 sub-routes (#309) into the single Cloud
parent route plus a redirect-only chain. The route tree now has:

  /provision/$id/cloud
    ↳ /architecture                      → ?view=graph
    ↳ /compute                           → ?view=list&kind=clusters
    ↳ /compute/clusters                  → ?view=list&kind=clusters
    ↳ /compute/vclusters                 → ?view=list&kind=vclusters
    ↳ /compute/node-pools                → ?view=list&kind=node-pools
    ↳ /compute/worker-nodes              → ?view=list&kind=worker-nodes
    ↳ /network                           → ?view=list&kind=load-balancers
    ↳ /network/services                  → ?view=list&kind=services
    ↳ /network/ingresses                 → ?view=list&kind=ingresses
    ↳ /network/load-balancers            → ?view=list&kind=load-balancers
    ↳ /network/dns-zones                 → ?view=list&kind=dns-zones
    ↳ /storage                           → ?view=list&kind=pvcs
    ↳ /storage/pvcs                      → ?view=list&kind=pvcs
    ↳ /storage/storage-classes           → ?view=list&kind=storage-classes
    ↳ /storage/buckets                   → ?view=list&kind=buckets
    ↳ /storage/volumes                   → ?view=list&kind=volumes

  /provision/$id/infrastructure          → /cloud?view=graph (legacy P1)
    ↳ /topology                          → /cloud?view=graph
    ↳ /compute                           → /cloud?view=list&kind=clusters
    ↳ /storage                           → /cloud?view=list&kind=pvcs
    ↳ /network                           → /cloud?view=list&kind=load-balancers

Redirects fire in `beforeLoad` so they happen before paint. The Cloud
parent route gains a `validateSearch` schema for ?view= and ?kind=
query params, narrowing the type to the union of valid values.
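
As a rough illustration of the redirect table above, the legacy child paths
can be modelled as a lookup from path to consolidated search params. This
sketch only covers the lookup; the real redirects are issued from
`beforeLoad`, and `LEGACY_CLOUD_REDIRECTS`, `toQueryString` and
`redirectTargetFor` are hypothetical names.

```typescript
type CloudSearch = { view: "graph" } | { view: "list"; kind: string };

// Legacy child path (relative to /provision/$id/cloud) mapped to the
// consolidated search params, copied from the table in the commit message.
const LEGACY_CLOUD_REDIRECTS: Record<string, CloudSearch> = {
  "/architecture": { view: "graph" },
  "/compute": { view: "list", kind: "clusters" },
  "/compute/clusters": { view: "list", kind: "clusters" },
  "/compute/vclusters": { view: "list", kind: "vclusters" },
  "/compute/node-pools": { view: "list", kind: "node-pools" },
  "/compute/worker-nodes": { view: "list", kind: "worker-nodes" },
  "/network": { view: "list", kind: "load-balancers" },
  "/network/services": { view: "list", kind: "services" },
  "/network/ingresses": { view: "list", kind: "ingresses" },
  "/network/load-balancers": { view: "list", kind: "load-balancers" },
  "/network/dns-zones": { view: "list", kind: "dns-zones" },
  "/storage": { view: "list", kind: "pvcs" },
  "/storage/pvcs": { view: "list", kind: "pvcs" },
  "/storage/storage-classes": { view: "list", kind: "storage-classes" },
  "/storage/buckets": { view: "list", kind: "buckets" },
  "/storage/volumes": { view: "list", kind: "volumes" },
};

function toQueryString(search: CloudSearch): string {
  return search.view === "graph"
    ? "?view=graph"
    : `?view=list&kind=${search.kind}`;
}

// Returns the consolidated query string, or undefined for unknown paths.
function redirectTargetFor(legacyChildPath: string): string | undefined {
  const search = LEGACY_CLOUD_REDIRECTS[legacyChildPath];
  return search ? toQueryString(search) : undefined;
}
```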

The three CloudComputePage / CloudNetworkPage / CloudStoragePage
landing pages are dropped from the route tree (their function is
folded into CloudListView's card grid). The per-resource list pages
(ClustersPage / PvcsPage / …) remain — they're imported and rendered
by CloudListView based on active kind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(catalyst-ui): Playwright e2e/cloud-shell.spec.ts + screenshots

Issue openova-io/openova#350 phase 6.

New: e2e/cloud-shell.spec.ts (17 tests)
  - Sidebar exposes a single flat Cloud entry (no accordion / chevron /
    sub-items / second-level toggles).
  - Clicking Cloud lands on /cloud and canonicalises ?view=graph.
  - View toggle switches Graph ↔ List, persists across reload via
    localStorage `sov-cloud-view`.
  - List view: 12 resource tiles render with counts; clicking a tile
    switches the active list and updates the URL.
  - Dropdown switcher mirrors the active kind and changes it.
  - Fullscreen toggle flips data-fullscreen + aria-pressed; the
    floating Exit button restores the windowed state.
  - 10 legacy /cloud/<category>(/<resource>)? URLs redirect to the
    consolidated query-string shape.
  - 1440×900 screenshots: graph view, list view (PVCs), fullscreen
    graph, sidebar Cloud icon close-up.

Updated: e2e/cloud-nav.spec.ts (#309 P1 → #350 IA restructure)
  - Asserts the Cloud entry is a flat link, not an accordion button.
  - Legacy /infrastructure/* paths redirect to the new query-string
    shape.

Updated: e2e/cloud-list-pages.spec.ts
  - Drops the accordion-second-level test (replaced by the
    cloud-shell tile-grid coverage).
  - Replaces the "category landing has 4 tiles" check with the
    consolidated 12-tile grid count.
  - Bumps the screenshot-sweep timeout to 120s (12 redirects + waits
    blow past the default 30s).

Updated: e2e/cosmetic-guards.spec.ts
  - Cloud sidebar entry is a flat anchor (no accordion contracts).
  - Per-Sovereign switcher check uses the new /cloud?view=graph URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:15:40 +04:00
github-actions[bot]
b4e7455e41 deploy: update catalyst images to 3459597 2026-05-01 08:14:09 +00:00
e3mrah
3459597589
feat(catalyst-ui): Cloud IA restructure + graph/list toggle + fullscreen + cloud icon (#350) (#363)
* feat(catalyst-ui): sidebar — single Cloud entry, drop accordion, IconCloud

Issue openova-io/openova#350 phase 1.

Replaces the two-level Cloud accordion (#309 P3) with a single flat
<Link> entry. The new Cloud parent page (CloudPage.tsx) owns the
in-page graph/list view dispatch and resource-kind switching, so the
sidebar no longer needs to expose category/resource sub-items.

Drops:
  - sov-nav-cloud-toggle (button → link)
  - sov-nav-cloud-{architecture,compute,network,storage} sub-items
  - sov-nav-cloud-{compute,network,storage}-toggle second-level toggles
  - sov-nav-cloud-{compute,network,storage}-{clusters,vclusters,…}
    sub-sub items
  - localStorage keys sov-nav-cloud(-{compute,network,storage})-expanded
    (no longer relevant; the parent page has its own persistence)

Adds:
  - Cloud icon swapped from server-stack rectangles to the verbatim
    Tabler IconCloud path (lifted from @tabler/icons-react v3.41.1).

Active-state matcher unchanged: Cloud highlights on any /cloud/* or
legacy /infrastructure/* path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): CloudPage parent shell with graph/list toggle + fullscreen

Issue openova-io/openova#350 phases 2 + 4.

Promotes CloudPage from a thin <Outlet /> host (#309) to the parent
view shell for the consolidated Cloud surface. The page now:

  - Renders the canonical header (title + tagline + Sovereign switcher).
  - Adds a segmented View toggle (Graph | List) immediately below.
  - Owns the active view via the URL ?view= query, falling back first to a
    persisted `sov-cloud-view` localStorage key and then to graph.
  - Dispatches the body: view=graph → Architecture (force-graph);
    view=list → CloudListView (12-tile grid + active list table).
  - Adds a fullscreen toggle button with smooth scale + fade
    transition (~250ms). Native `requestFullscreen()` on the content
    container; falls back to a synthetic-overlay state when the
    user-agent denies. Esc exits (browser-native); a floating "Exit
    fullscreen" button is rendered inside the overlay (top-right).
  - aria-pressed on the fullscreen toggle reflects state.
  - Preserves the Sovereign-switcher cross-Sovereign navigation, now
    carrying the active view + kind on the redirect.

The URL is canonicalised on every navigation (replace:true) so deep
links and bookmarks always carry an explicit view param.

Tests:
  - CloudPage.test.tsx asserts the segmented control is present and
    aria-selected reflects state, the fullscreen toggle button is
    present with aria-pressed=false, and the legacy in-page tab strip
    remains absent.
  - Architecture.test.tsx is updated to mount the new shell with
    viewOverride='graph' (the production dispatch path); the legacy
    /cloud/architecture child route is no longer needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): CloudListView — card grid + dropdown switcher reusing P3 list components

Issue openova-io/openova#350 phase 3.

CloudListView is the body rendered by CloudPage when view=list. It
replaces the previous CloudComputePage / CloudNetworkPage /
CloudStoragePage three-tile category surfaces with a single 12-tile
card grid covering every resource kind in one place.

Surface contract:
  - Top-of-page: a 12-tile resource card grid (Clusters, vClusters,
    Node Pools, Worker Nodes, Load Balancers, Services, Ingresses,
    DNS Zones, PVCs, Buckets, Volumes, Storage Classes). Each tile
    shows an icon + count + tagline; clicking sets the active kind.
    Tiles whose informer isn't wired yet (Services / Ingresses / DNS
    Zones / Storage Classes) show a "—" instead of a count.
  - Toolbar: a compact <select> dropdown that mirrors the card-grid
    selection, giving an alternative keyboard-driven path.
  - Below: the active kind's existing P3 list page rendered inline.
    Components (ClustersPage, PvcsPage, …) are reused as-is — none of
    them rewritten.

Active-kind state lives in the URL (?kind=…) and persists to
localStorage under `sov-cloud-list-kind`. The URL takes precedence on
mount so deep links / shared URLs always win.

Per docs/INVIOLABLE-PRINCIPLES.md #1 (target-state shape) — the entire
12-resource list view ships in this first cut. No "for now" stubs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): router consolidation + redirects from old /cloud/<category>/<resource> URLs

Issue openova-io/openova#350 phase 5.

Consolidates the seventeen P3 sub-routes (#309) into the single Cloud
parent route plus a redirect-only chain. The route tree now has:

  /provision/$id/cloud
    ↳ /architecture                      → ?view=graph
    ↳ /compute                           → ?view=list&kind=clusters
    ↳ /compute/clusters                  → ?view=list&kind=clusters
    ↳ /compute/vclusters                 → ?view=list&kind=vclusters
    ↳ /compute/node-pools                → ?view=list&kind=node-pools
    ↳ /compute/worker-nodes              → ?view=list&kind=worker-nodes
    ↳ /network                           → ?view=list&kind=load-balancers
    ↳ /network/services                  → ?view=list&kind=services
    ↳ /network/ingresses                 → ?view=list&kind=ingresses
    ↳ /network/load-balancers            → ?view=list&kind=load-balancers
    ↳ /network/dns-zones                 → ?view=list&kind=dns-zones
    ↳ /storage                           → ?view=list&kind=pvcs
    ↳ /storage/pvcs                      → ?view=list&kind=pvcs
    ↳ /storage/storage-classes           → ?view=list&kind=storage-classes
    ↳ /storage/buckets                   → ?view=list&kind=buckets
    ↳ /storage/volumes                   → ?view=list&kind=volumes

  /provision/$id/infrastructure          → /cloud?view=graph (legacy P1)
    ↳ /topology                          → /cloud?view=graph
    ↳ /compute                           → /cloud?view=list&kind=clusters
    ↳ /storage                           → /cloud?view=list&kind=pvcs
    ↳ /network                           → /cloud?view=list&kind=load-balancers

Redirects fire in `beforeLoad` so they happen before paint. The Cloud
parent route gains a `validateSearch` schema for ?view= and ?kind=
query params, narrowing the type to the union of valid values.
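
A search validator of the kind described here might look like the sketch
below. It is written as a plain function over raw search params; whether the
real route uses a function or a schema library is not stated, and
`validateCloudSearch` and `oneOf` are hypothetical names. The kind values are
taken from the redirect table above.

```typescript
const CLOUD_VIEWS = ["graph", "list"] as const;
const CLOUD_KINDS = [
  "clusters", "vclusters", "node-pools", "worker-nodes",
  "load-balancers", "services", "ingresses", "dns-zones",
  "pvcs", "buckets", "volumes", "storage-classes",
] as const;

type CloudView = (typeof CLOUD_VIEWS)[number];
type CloudKind = (typeof CLOUD_KINDS)[number];

interface CloudSearch {
  view: CloudView | undefined;
  kind: CloudKind | undefined;
}

// Narrow an unknown value to one member of a string-literal union.
function oneOf<T extends string>(allowed: readonly T[], value: unknown): T | undefined {
  if (typeof value !== "string") return undefined;
  const index = (allowed as readonly string[]).indexOf(value);
  return index === -1 ? undefined : allowed[index];
}

// Unknown or malformed params are dropped rather than rejected, so the
// page can fall back to its own defaults.
function validateCloudSearch(raw: Record<string, unknown>): CloudSearch {
  return {
    view: oneOf(CLOUD_VIEWS, raw["view"]),
    kind: oneOf(CLOUD_KINDS, raw["kind"]),
  };
}
```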

The three CloudComputePage / CloudNetworkPage / CloudStoragePage
landing pages are dropped from the route tree (their function is
folded into CloudListView's card grid). The per-resource list pages
(ClustersPage / PvcsPage / …) remain — they're imported and rendered
by CloudListView based on active kind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(catalyst-ui): Playwright e2e/cloud-shell.spec.ts + screenshots

Issue openova-io/openova#350 phase 6.

New: e2e/cloud-shell.spec.ts (17 tests)
  - Sidebar exposes a single flat Cloud entry (no accordion / chevron /
    sub-items / second-level toggles).
  - Clicking Cloud lands on /cloud and canonicalises ?view=graph.
  - View toggle switches Graph ↔ List, persists across reload via
    localStorage `sov-cloud-view`.
  - List view: 12 resource tiles render with counts; clicking a tile
    switches the active list and updates the URL.
  - Dropdown switcher mirrors the active kind and changes it.
  - Fullscreen toggle flips data-fullscreen + aria-pressed; the
    floating Exit button restores the windowed state.
  - 10 legacy /cloud/<category>(/<resource>)? URLs redirect to the
    consolidated query-string shape.
  - 1440×900 screenshots: graph view, list view (PVCs), fullscreen
    graph, sidebar Cloud icon close-up.

Updated: e2e/cloud-nav.spec.ts (#309 P1 → #350 IA restructure)
  - Asserts the Cloud entry is a flat link, not an accordion button.
  - Legacy /infrastructure/* paths redirect to the new query-string
    shape.

Updated: e2e/cloud-list-pages.spec.ts
  - Drops the accordion-second-level test (replaced by the
    cloud-shell tile-grid coverage).
  - Replaces the "category landing has 4 tiles" check with the
    consolidated 12-tile grid count.
  - Bumps the screenshot-sweep timeout to 120s (12 redirects + waits
    blow past the default 30s).

Updated: e2e/cosmetic-guards.spec.ts
  - Cloud sidebar entry is a flat anchor (no accordion contracts).
  - Per-Sovereign switcher check uses the new /cloud?view=graph URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:12:29 +04:00
e3mrah
4588492e10 docs(lessons-learned): Helm hooks + CRD ordering, catalyst-bootstrap-api credentials behavior
Two lessons from the #318 / #346 wipe-endpoint shipping pass:

1. helm-hooks-and-crd-ordering.md — `helm.sh/hook-delete-policy:
   before-hook-creation` deadlocks on first install when the CRD comes
   from the same chart's upstream subchart. The lookup runs before the
   subchart's CRDs finish registering. Hit twice (bp-crossplane@1.1.2
   in PR #247, bp-external-secrets@1.0.0 in PR #334). Architectural
   fix is the same: chart-split + Flux dependsOn so the CR chart only
   starts after the controller is Ready=True.

2. catalyst-bootstrap-api.md — catalyst-api intentionally GCs the
   in-memory Hetzner token after writeTfvars per credential hygiene,
   but `tofu destroy` still works against the on-disk workdir without
   re-prompting because the token is persisted into tofu.auto.tfvars.json
   on the PVC. Verified during #318 wipe-endpoint testing. The body-
   supplied token at the wipe endpoint is for the Hetzner-direct
   orphan-purge safety net, not for tofu itself. Reviewers should not
   add re-prompt-or-401 guards on the tofu path.

Refs: #318 #331 #247

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:11:42 +02:00
e3mrah
9e7bfc6e3a
fix(catalyst-ui): live deployed-SHA Playwright fixes for #348 P1 (#362)
Three deployed-SHA validation fixes uncovered by running the new e2e
suite against console.openova.io:

1. Drop the hidden legacy `infrastructure-detail-panel-neighbor-{id}`
   span in DetailPanel — having display:none on it broke the legacy
   test 4's `toBeVisible()` assertion. The legacy testid was not
   needed; the existing tests now key off the new
   `arch-detail-panel-neighbor-{relation}-{id}` ids.

2. Tighten the NodePool+PVC isolation test selector from
   `[data-testid^="arch-graph-node-"]` to `g[data-node-type]` — the
   broad prefix selector was matching the per-icon test ids
   (`arch-graph-node-icon-{type}`) which don't carry data-node-type
   and produced null `getAttribute()` reads.

3. Make the ArchiMate legend close-up screenshot resilient to a
   legend that's below the viewport: scrollIntoViewIfNeeded() and
   bound the clip box against the actual viewport size before
   passing to page.screenshot.
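
The clip bounding in fix 3 amounts to intersecting the legend's bounding box
with the viewport rectangle. A minimal sketch, assuming the box is in
viewport coordinates; `clampClipToViewport`, `Box` and `Size` are
hypothetical names, not taken from the spec file.

```typescript
interface Box { x: number; y: number; width: number; height: number; }
interface Size { width: number; height: number; }

// Intersect an element box with the viewport. Returns null when the box
// lies entirely outside, so the caller can skip the clipped screenshot.
function clampClipToViewport(box: Box, viewport: Size): Box | null {
  const x = Math.max(0, box.x);
  const y = Math.max(0, box.y);
  const right = Math.min(viewport.width, box.x + box.width);
  const bottom = Math.min(viewport.height, box.y + box.height);
  if (right <= x || bottom <= y) return null;
  return { x, y, width: right - x, height: bottom - y };
}
```

The result has the same x / y / width / height shape that a screenshot clip
option expects, so it can be passed through unchanged when it is non-null.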

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:09:38 +04:00
e3mrah
18b42680da
fix(catalyst-ui): live deployed-SHA Playwright fixes for #348 P1 (#361)
Three deployed-SHA validation fixes uncovered by running the new e2e
suite against console.openova.io:

1. Drop the hidden legacy `infrastructure-detail-panel-neighbor-{id}`
   span in DetailPanel — having display:none on it broke the legacy
   test 4's `toBeVisible()` assertion. The legacy testid was not
   needed; the existing tests now key off the new
   `arch-detail-panel-neighbor-{relation}-{id}` ids.

2. Tighten the NodePool+PVC isolation test selector from
   `[data-testid^="arch-graph-node-"]` to `g[data-node-type]` — the
   broad prefix selector was matching the per-icon test ids
   (`arch-graph-node-icon-{type}`) which don't carry data-node-type
   and produced null `getAttribute()` reads.

3. Make the ArchiMate legend close-up screenshot resilient to a
   legend that's below the viewport: scrollIntoViewIfNeeded() and
   bound the clip box against the actual viewport size before
   passing to page.screenshot.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:08:15 +04:00
github-actions[bot]
433dd33943 deploy: update catalyst images to 5862fce 2026-05-01 07:59:26 +00:00
e3mrah
5862fcec3b
feat: Architecture graph polish (P1 of #348) (#360)
* feat(catalyst-ui): SMALL_TYPE_THRESHOLD + auto-100% density for small types

Item 1 of #348. Small types (total < 20) bypass the global density
slider's per-type cap calculation and always render at 100% as long as
the chip is active. Threshold is exported from
widgets/architecture-graph/types.ts so adapter, page, GraphCanvas, and
the test suite all key off the same constant. The per-type popover is
already short-circuited for small types (chip click toggles visibility
without opening the slider) — semantics confirmed.
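
A minimal sketch of the small-type bypass, for illustration only. The commit
does not spell out the per-type cap formula, so the `Math.ceil(total *
density)` cap below is an assumption, as is the `visibleCount` name; only
`SMALL_TYPE_THRESHOLD` and the "total < 20" rule come from the message.

```typescript
const SMALL_TYPE_THRESHOLD = 20; // "total < 20" per the commit message

// How many nodes of one type to render for a global density in [0, 1].
function visibleCount(total: number, density: number, chipActive: boolean): number {
  if (!chipActive) return 0; // inactive chip hides the type
  if (total < SMALL_TYPE_THRESHOLD) return total; // small types ignore the slider
  const clamped = Math.min(1, Math.max(0, density));
  return Math.ceil(total * clamped); // assumed cap formula for large types
}
```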

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): chip add/remove + full relation cache regardless of active chips

Item 2 of #348. The adapter now emits every node type — including PVC,
Bucket, Volume (storage block) and reserved Service / Ingress slots —
plus every relation type from the spec (contains, member-of, runs-on,
routes-to, attached-to, depends-on, used-by, peers-with, flows-to,
realizes, triggers, associates). The page-level orchestrator holds an
`activeTypes` Set; chips have an explicit "×" remove button and the
strip ends with a "+" Popover that lists inactive types with their
counts. Removing a chip filters its nodes out of the canvas; re-adding
restores them. The data layer is the single source of truth — chip
add/remove never re-queries.

Verified the founder's example: removing every chip except NodePool +
PVC isolates the canvas to those types and the edges between them.

Per ADR-0001 §B4 — "full relation cache" aligns with the #321 informer
cache foundation; today's adapter is the placeholder until that lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): relation types in detail panel grouped by relation

Item 3 of #348. The right-side detail panel's neighbor list now carries
the relation type per neighbor. Neighbors are grouped under sticky
per-relation subheaders ordered by ALL_EDGE_TYPES so the panel reads
consistently between renders. Each row exposes a stable testid:
arch-detail-panel-neighbor-{relation}-{nodeId} (plus a hidden legacy
infrastructure-detail-panel-neighbor-{nodeId} for backwards-compat with
#309 tests).
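
The grouping could be sketched as below. The contents and order of
`ALL_EDGE_TYPES` here are an assumption (copied from the relation list in the
item 2 message), and `groupNeighbors` / `neighborTestId` are hypothetical
helper names; only the testid pattern is taken from this commit.

```typescript
// Assumed order; the real constant lives in the widget's own module.
const ALL_EDGE_TYPES = [
  "contains", "member-of", "runs-on", "routes-to", "attached-to",
  "depends-on", "used-by", "peers-with", "flows-to", "realizes",
  "triggers", "associates",
] as const;
type EdgeType = (typeof ALL_EDGE_TYPES)[number];

interface Neighbor { nodeId: string; relation: EdgeType; }
interface NeighborGroup { relation: EdgeType; neighbors: Neighbor[]; }

// Groups follow ALL_EDGE_TYPES order so the panel is stable across renders;
// relations with no neighbors produce no subheader.
function groupNeighbors(neighbors: Neighbor[]): NeighborGroup[] {
  const groups: NeighborGroup[] = [];
  for (const relation of ALL_EDGE_TYPES) {
    const members = neighbors.filter((n) => n.relation === relation);
    if (members.length > 0) groups.push({ relation, neighbors: members });
  }
  return groups;
}

function neighborTestId(n: Neighbor): string {
  return `arch-detail-panel-neighbor-${n.relation}-${n.nodeId}`;
}
```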

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): ArchiMate edge marker styles + updated legend

Item 4 of #348. Each relation type maps to an ArchiMate-derived end
decoration: composition (filled diamond at parent end) for `contains`,
aggregation (hollow diamond) for `member-of`, assignment (filled dots
at both ends) for `runs-on`, triggering (filled triangle) for
`routes-to` / `triggers` / `flows-to`, used-by (open triangle) for
`depends-on` / `used-by`, realization (hollow triangle) for `realizes`,
and association (plain line) for `peers-with` / `associates`.

Implementation: SVG `<defs><marker>` patterns rendered into the canvas
once per (kind, stroke) pair (`uniqueMarkerDefs`); the marker palette
is stable across animation frames so React doesn't re-allocate every
tick. Per-edge `markerStart` / `markerEnd` URL refs in the line
elements drive the rendering. The legend at the bottom now shows the
ArchiMate symbol thumbnail + name + count, with self-contained marker
defs scoped to each thumbnail SVG (`-legend` id suffix).
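
For illustration, the relation-to-marker mapping and the once-per-(kind,
stroke) deduplication might be sketched like this. The marker id format, the
`Edge` shape and the plain-line fallback for relations the message does not
map (such as `attached-to`) are assumptions; the real `uniqueMarkerDefs` may
differ.

```typescript
type MarkerKind =
  | "composition" | "aggregation" | "assignment" | "triggering"
  | "used-by" | "realization" | "association";

// Mapping as listed in the commit message.
const MARKER_FOR_RELATION: Record<string, MarkerKind> = {
  "contains": "composition",
  "member-of": "aggregation",
  "runs-on": "assignment",
  "routes-to": "triggering",
  "triggers": "triggering",
  "flows-to": "triggering",
  "depends-on": "used-by",
  "used-by": "used-by",
  "realizes": "realization",
  "peers-with": "association",
  "associates": "association",
};

interface Edge { relation: string; stroke: string; }
interface MarkerDef { id: string; kind: MarkerKind; stroke: string; }

function markerKindFor(relation: string): MarkerKind {
  // Assumed fallback: unmapped relations render as a plain line.
  return MARKER_FOR_RELATION[relation] || "association";
}

// One def per distinct (kind, stroke) pair, in first-seen order.
function uniqueMarkerDefs(edges: Edge[]): MarkerDef[] {
  const seen: Record<string, boolean> = {};
  const defs: MarkerDef[] = [];
  for (const edge of edges) {
    const kind = markerKindFor(edge.relation);
    const id = `marker-${kind}-${edge.stroke.replace(/[^a-zA-Z0-9]/g, "")}`; // hypothetical id scheme
    if (seen[id]) continue;
    seen[id] = true;
    defs.push({ id, kind, stroke: edge.stroke });
  }
  return defs;
}
```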

`markers.ts` is a separate module so GraphCanvas.tsx satisfies
react-refresh/only-export-components.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): bounded physics — nodes constrained to canvas

Item 5 of #348. A custom d3-force `forceBound(width, height,
padding=20)` clamps each node's x/y inside the canvas every tick. The
clamp also handles fx/fy when set via drag-pin so a manual drag past
the edge instantly snaps inside.
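
A custom d3-force force is a function called with alpha on every tick, plus
an `initialize(nodes)` hook. A minimal sketch of a bounding force in that
shape follows; it avoids importing d3 so it stays self-contained, and the
`SimNode` / `BoundForce` types are hypothetical stand-ins for the real node
type. The body is an assumption about how the clamp could work, not the
shipped code.

```typescript
interface SimNode {
  x?: number;
  y?: number;
  fx?: number | null; // pinned position, set while dragging
  fy?: number | null;
}

interface BoundForce {
  (alpha: number): void;
  initialize(nodes: SimNode[]): void;
}

function clamp(value: number, lo: number, hi: number): number {
  return Math.max(lo, Math.min(hi, value));
}

function forceBound(width: number, height: number, padding = 20): BoundForce {
  let nodes: SimNode[] = [];
  const force = ((_alpha: number) => {
    for (const n of nodes) {
      if (n.x != null) n.x = clamp(n.x, padding, width - padding);
      if (n.y != null) n.y = clamp(n.y, padding, height - padding);
      // Pinned coordinates are clamped too, so a drag past the edge snaps back.
      if (n.fx != null) n.fx = clamp(n.fx, padding, width - padding);
      if (n.fy != null) n.fy = clamp(n.fy, padding, height - padding);
    }
  }) as BoundForce;
  force.initialize = (initial: SimNode[]) => { nodes = initial; };
  return force;
}
```

With d3-force this would be registered like any other force, for example
`simulation.force("bound", forceBound(width, height))`.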

Adaptive physics tiers retuned: charge magnitudes lowered slightly so
strong repulsion doesn't fight the bound at small canvas sizes (the
≤50-node tier drops from -240 to -160; the ≤200 tier from -180 to -120,
etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): per-type tabler icons replace plain circles

Item 10 of #348. Each architecture-graph node renders with a
@tabler/icons-react glyph at its centre plus a type-color stroke ring,
replacing the prior plain disc. Locked mapping: Cloud→IconCloud,
Region→IconMapPin, Cluster→IconBox, vCluster→IconStack3,
NodePool→IconStack2, WorkerNode→IconCpu, LoadBalancer→IconArrowsSplit,
Network→IconNetwork, PVC→IconDatabase, Bucket→IconBucketDroplet,
Volume→IconDisc, Service→IconWorld, Ingress→IconRouteAltLeft.

Icons sized 14-18px scaled to node radius; minimum disc radius
NODE_R=14 so the icon always reads against the canvas. The detail
panel's neighbor list also picks up the per-type icons.

`icons.ts` is a separate module so GraphCanvas.tsx remains a
component-only file (react-refresh/only-export-components).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(catalyst-ui): Playwright cases + screenshots for 348 polish

Item 7 of #348. Extends e2e/cloud-architecture.spec.ts with eight new
cases targeting #348 P1:
- type chips carry "×" + the strip ends with "+"
- removing every chip except NodePool + PVC isolates only those nodes
- "+" Popover re-adds a removed type
- detail panel groups neighbors by relation with sticky subheaders
- edge legend renders ArchiMate symbol thumbnails for every relation
- per-type tabler icons render (`arch-graph-node-icon-{type}` testids)
- bounded physics — drag node toward (-100,-100) clamps inside canvas
- global density slider does not affect small types (auto-100%)

Plus a screenshot suite at 1440x900 capturing default / NodePool+PVC
isolated / single-type focus / ArchiMate legend close-up.

All graph-node interactions use `force: true` per the established
continuous-simulation flake-fix pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:57:37 +04:00
github-actions[bot]
a86449f840 deploy: update catalyst images to 7cd4c57 2026-05-01 07:55:11 +00:00
e3mrah
7cd4c57ab8
feat: K8s informer + SSE data plane (#321) (#358)
* feat(catalyst-api): k8scache package — SharedInformerFactory per Sovereign

Core data-plane primitive for ADR-0001 §5: catalyst-api's in-process
view of every managed Sovereign cluster. One dynamicinformer per
cluster watches the kinds registry (Pod, Deployment, StatefulSet,
DaemonSet, Service, Ingress, Namespace, Node, PVC, ConfigMap, Secret,
plus Crossplane provider-hcloud Server/LoadBalancer/Network/Volume
and vCluster.io VClusters). Event-driven only — no time.Tick, no
poll loops. Redaction strips Secret/ConfigMap data before any object
leaves the informer goroutine. Prometheus metrics expose informer
liveness, cache size, resyncs, SSE subscribers, drop rate, SAR cache
effectiveness. Registry is runtime-mutable via a ConfigMap so
operators add a watched GVR without a code change.

Refs #321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-api): k8scache disk snapshot + hydrate (cold-start mitigation)

Per ADR-0001 §5.1 the catalyst-api Pod's cold-start budget is the
biggest data-plane risk. Without snapshot, a tier-1 Sovereign with
thousands of objects re-LISTs every (cluster × kind) on every
restart — 1–30s of dead UI per restart, multiplied by 6+ restarts
per provisioning run.

Disk snapshot:
  - One JSON per (cluster, kind) under /var/cache/sov-cache/
  - Atomic temp-file + rename
  - Mode 0600, redacted Secret/ConfigMap data
  - Snapshot loop fires every 60s
  - Snapshots older than 1h are pruned on each pass

Hydrate:
  - Pre-seeds the Indexer BEFORE factory.Start opens the watch
  - Stale or version-mismatched snapshots fall back to a normal LIST
  - Per-(cluster, kind) outcome metric ("hydrated" / "missing" /
    "expired" / "failed") so an operator sees how often the
    cold-start mitigation pays off

Refs #321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-api): k8s REST list + multiplexed SSE stream — SAR-gated

Per ADR-0001 §5:

GET /api/v1/sovereigns/{id}/k8s/{kind}
  - reads the in-process Indexer
  - Kubernetes label selector + minimal field selector
  - paginates via opaque continuation cursor (base64 of stable index)
  - X-Cache-Stale-Seconds header + Warning: 110 when cache > 30s
  - per-namespace SubjectAccessReview gating

GET /api/v1/sovereigns/{id}/k8s/stream?kinds=pod,deployment,...
  - Server-Sent Events with multiplexed kinds
  - per-event SAR filter (cached for 30s per user+kind+namespace)
  - 15s heartbeat (": ping" comment frames)
  - optional ?initialState=1 emits a synthetic ADDED for every
    cached object before live events begin
  - drop-oldest backpressure on slow consumers

Decision-cache (sar.go) holds positive + negative SAR decisions for
30s; cache hits + misses + apiserver fallback failures are
Prometheus-exported. Fail-closed on apiserver error so a transient
SAR failure can never leak data.
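
The decision cache itself is Go (sar.go); the TypeScript below is only a
language-neutral sketch of the idea: cache allow and deny decisions for 30s
and deny on any checker error. `createSarCache`, the synchronous `check`
callback and the injected clock are all hypothetical simplifications (a real
SubjectAccessReview is a network call).

```typescript
type SarCheck = (user: string, kind: string, namespace: string) => boolean;

interface SarCache {
  allowed(user: string, kind: string, namespace: string): boolean;
  stats(): { hits: number; misses: number; failures: number };
}

function createSarCache(
  check: SarCheck,            // stand-in for the apiserver SAR call; may throw
  ttlMs = 30000,              // 30s, as stated in the commit message
  now: () => number = Date.now,
): SarCache {
  const entries = new Map<string, { allowed: boolean; expiresAt: number }>();
  let hits = 0, misses = 0, failures = 0;
  return {
    allowed(user, kind, namespace) {
      const key = `${user}|${kind}|${namespace}`;
      const t = now();
      const cached = entries.get(key);
      if (cached && cached.expiresAt > t) { hits++; return cached.allowed; }
      misses++;
      try {
        const decision = check(user, kind, namespace);
        // Positive and negative decisions are both cached.
        entries.set(key, { allowed: decision, expiresAt: t + ttlMs });
        return decision;
      } catch (err) {
        failures++;
        return false; // fail closed; the failure itself is not cached
      }
    },
    stats: () => ({ hits, misses, failures }),
  };
}
```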

Refs #321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-api): Prometheus metrics + healthz informer-sync wiring

main.go wires k8scache.FactoryFromEnv at startup, calls Start(ctx),
binds the Factory + a SARCache + the user-header name onto the
Handler via SetK8sCache. /metrics is mounted at the root via
promhttp.Handler so Prometheus can scrape catalyst-internal
informer state alongside the existing K8s ServiceMonitor surface.

/healthz now negotiates content type:
  - default: legacy "ok" plain-text — preserves the readinessProbe
    contract the chart's container has had since #163
  - Accept: application/json — structured body listing each
    registered Sovereign and the per-kind sync map. Returns 503
    when the lexically-first cluster has not yet synced Pod +
    Deployment informers (per the issue spec)

The home-cluster typed client is built from rest.InClusterConfig so
the optional kinds-registry ConfigMap is loadable from the catalyst
namespace; out-of-cluster (CI smoke test) the client build fails
softly and the default kinds registry is used.

Refs #321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-chart): catalyst-api-cache PVC + mount

Mounts a 5Gi RWO PVC at /var/cache/sov-cache on the catalyst-api
Pod, backing the k8scache disk-snapshot loop (issue #321). Separate
from the existing catalyst-api-deployments PVC so the cache size is
independent of the deployment-record store and a snapshot blow-out
cannot evict the durable provisioning state.

Wires three new env vars on the api Deployment:
  CATALYST_K8SCACHE_KUBECONFIGS_DIR — kubeconfig directory the
    Factory reads at startup (one Sovereign per file)
  CATALYST_K8SCACHE_SNAPSHOT_DIR    — base directory for the
    snapshot loop (the new PVC mount)
  CATALYST_K8SCACHE_KINDS_CONFIGMAP — optional registry extension

Per docs/INVIOLABLE-PRINCIPLES.md #4 every value is a runtime
parameter; air-gapped deploys override via Kustomize patch.

Refs #321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): useK8sStream hook + EventSource consumer

React hook over the catalyst-api's /sovereigns/{id}/k8s/stream SSE
endpoint (issue #321). Mirrors the pattern of useDeploymentEvents
but generalised over arbitrary kinds:

  - Stable URL build via API_BASE (per INVIOLABLE-PRINCIPLES.md #4)
  - Local Map keyed by ${kind}:${ns}/${name}; ADDED/MODIFIED set,
    DELETED removes
  - Auto-reconnect on EventSource error with 0.5s → 30s exponential
    backoff
  - Per-kind grouping for List pages, flat array for graph paths
  - Generic over the K8s object shape with a getMeta helper
  - disableStream test seam, manual reconnect() trigger
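
A minimal sketch of the two core pieces listed above, the keyed local Map
and the reconnect backoff. This is not the hook itself: `applyEvent`,
`cacheKey` and `backoffDelayMs` are hypothetical names, the event shape is
assumed, and the doubling factor is an assumption (the message only gives
the 0.5s and 30s bounds).

```typescript
type EventType = "ADDED" | "MODIFIED" | "DELETED";

interface K8sObject {
  metadata: { name: string; namespace?: string };
}

interface StreamEvent<T extends K8sObject> {
  type: EventType;
  kind: string;
  object: T;
}

// Key shape from the commit message: ${kind}:${ns}/${name}.
function cacheKey(kind: string, obj: K8sObject): string {
  return `${kind}:${obj.metadata.namespace || ""}/${obj.metadata.name}`;
}

// ADDED and MODIFIED upsert the object; DELETED removes it.
function applyEvent<T extends K8sObject>(cache: Map<string, T>, ev: StreamEvent<T>): void {
  const key = cacheKey(ev.kind, ev.object);
  if (ev.type === "DELETED") {
    cache.delete(key);
  } else {
    cache.set(key, ev.object);
  }
}

// Exponential backoff from 0.5s, capped at 30s (doubling is assumed).
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}
```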

Tests use a FakeEventSource shim — jsdom doesn't ship EventSource
natively. Coverage: open/close, ADDED/MODIFIED/DELETED, malformed
events, URL parameter shape, disableStream early-out.

Also commits the matching backend tests for k8scache (registry,
factory, hydrate-then-resume, hydrate-stale-then-relist, snapshot
during shutdown, secret data redaction, fail-closed SAR) and the
handler-level k8s.go tests (list, 404 with kind catalogue, sync
map, /healthz JSON shape, SSE initial-state ADDED).

Refs #321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(catalyst-ui): migrate useCloud to useK8sStream live updates

Per ADR-0001 §5 the Cloud surface reads off ONE Indexer-fed source.
The legacy getHierarchicalInfrastructure REST call remains as the
cold-start seed (deep-links render without waiting for SSE); the K8s
stream provides live updates from the catalyst-api's in-process
Indexer (issue #321).

CloudPage now opens a useK8sStream against the Sovereign id, watching
the kinds the four sub-pages render: pod, deployment, statefulset,
service, persistentvolumeclaim, node, and the Crossplane provider-
hcloud projections (server, loadbalancer, network, volume) plus
vCluster.io tenants.

The CloudContext shape gains four new fields:
  liveItems        — flat array of K8s objects
  liveByKind       — same data grouped by short kind name
  liveLastEventAt  — Date of the last received event
  liveStreaming    — true once SSE is open and not in error backoff

#348/#349/#350 agents continue to consume the existing
HierarchicalInfrastructure shape; this commit is purely additive on
the context — no consumer is forced to refactor.

Refs #321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(catalyst): Playwright E2E for live K8s stream + screenshots

Two tests under the existing UI Playwright config:
  • synthetic ADDED Deployment renders new graph node + list row
  • disconnect + reconnect restores graph state

Both mock the SSE endpoint via page.route so the spec is fully
self-contained — runs against the dev Vite server without needing
a live catalyst-api or a real Sovereign cluster. Screenshots saved
at 1440x900 to playwright-report/ for visual regression diffing.

When this lands on console.openova.io the same tests run against the
deployed surface; the page.route mocks are kept disabled in that
context so a real catalyst-api / Indexer pipeline drives events.

Refs #321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:53:31 +04:00