8d2ba0495d
859 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
956b976558
|
fix(ci): playwright-smoke port 4321→5173 for Vite 8 default (#335) (#418)
The catalyst-ui dev-server bind moved from 4321 to 5173 when the Vite default
changed (Vite 8). The smoke workflow's curl-wait + BASE_URL env still
pointed at 4321, so:
Vite 8 starts fine on 5173 →
workflow polls 4321 for 60s → never returns 200 →
step exits 1 before Playwright ever runs.
Effect across last ~30 main commits: every push generated a 'Playwright UI
smoke failed' email despite the UI itself being healthy. We've been
shipping with --admin bypass + post-deploy verification against
console.openova.io. This restores actual smoke coverage on every PR.
Three substitutions on .github/workflows/playwright-smoke.yaml:
- line 80 curl wait URL: localhost:4321 → localhost:5173
- line 93 BASE_URL env: 4321 → 5173
- lines 72-73 comment: stale 'Vite binds 4321 by default' → 5173
Closes #335.
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
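The failure mode described in this commit (polling a port nothing listens on until the budget expires) can be sketched as a small wait loop. This is a minimal illustration, not the workflow's actual step: `wait_for_http_200`, `http_status`, and the injectable `probe`/`sleep`/`clock` parameters are hypothetical names; only the 60s budget and the two port numbers come from the commit message.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status for url, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_for_http_200(
    url: str,
    timeout_s: float = 60.0,   # budget quoted in the commit message
    interval_s: float = 1.0,   # assumption: poll once per second
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll url until it answers 200 or the time budget runs out."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe(url) == 200:
            return True
        sleep(interval_s)
    return False
```

With the dev server bound to 5173, a call against `http://localhost:4321` can only ever return `False` after the full budget, which matches the "step exits 1 before Playwright ever runs" symptom.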
|
||
|
|
b3383557eb
|
feat(bp-gitea): chart-verified on contabo (#376) (#417)
bp-gitea:1.1.2 already published; smoke-installed in `gitea-smoke` ns on
contabo, both pods Ready in ~2m38s, /api/v1/version returns 1.22.3 (HTTP
200), admin auth verified. Smoke torn down clean.
In-scope hygiene fix to
clusters/otech.omani.works/bootstrap-kit/10-gitea.yaml: replaces the stale
upstream `ingress.hosts[]` overlay with the post-#387/#402 `gateway.host`
shape so otech matches the _template/ and omantel.omani.works/ overlays.
helm-template default-values renders 15 manifests clean (HTTPRoute
correctly skip-renders without `gateway.host`).
WBS §2 row 13 + §9 row #376 updated to chart-verified.
Closes #376.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
2913c4f27a
|
feat(bp-grafana): chart-verified — smoke OK on contabo + per-Sovereign overlay drift fix (closes #381) (#416)
bp-grafana 1.0.0 was published by blueprint-release run 25214143810 on
commit
|
||
|
|
1e17668055
|
feat(catalyst): Hetzner Object Storage credential pattern — Phase 0b (#371) (#409)
* feat(catalyst): Hetzner Object Storage credential pattern (Phase 0b, #371)
Adds the per-Sovereign Hetzner Object Storage credential capture + bucket
provisioning Phase 0b path described in the omantel handover WBS §5.
Hybrid Option A+B: wizard collects operator-issued S3 credentials
(Hetzner exposes no Cloud API to mint them; they're issued once in the
Hetzner Console and the secret half is shown exactly once), and OpenTofu
auto-provisions the per-Sovereign bucket via the aminueza/minio provider +
writes a flux-system/hetzner-object-storage Secret into the new Sovereign
at cloud-init time so Harbor (#383) and Velero (#384) find their
backing-store credentials already in the cluster from Phase 1 onwards.
Extends the EXISTING canonical seam at every layer (per the founder's
anti-duplication rule for #371's session): the existing Tofu module at
infra/hetzner/, the existing handler/credentials.go validator, the
existing provisioner.Request struct, the existing store.Redact path, and
the existing wizard StepCredentials. No parallel binaries / scripts /
operators introduced.
infra/hetzner/ (Tofu module, Phase 0):
- versions.tf: declare aminueza/minio provider (Hetzner's official
  recommendation for S3-compatible bucket creation per
  docs.hetzner.com/storage/object-storage/getting-started/...)
- variables.tf: 4 sensitive vars: region (validated against
  fsn1/nbg1/hel1, the European-only OS regions as of 2026-04),
  access_key, secret_key, bucket_name (RFC-compliant S3 naming)
- main.tf: minio_s3_bucket.main resource: idempotent on re-apply, no
  force_destroy (Velero archive must survive a control-plane reinstall),
  object_locking=false (content-addressed digests are the immutability
  guarantee for Harbor; Velero uses S3 versioning)
- cloudinit-control-plane.tftpl: write flux-system/hetzner-object-storage
  Secret with the canonical
  s3-endpoint/s3-region/s3-bucket/s3-access-key/s3-secret-key keys Harbor +
  Velero charts consume via existingSecret refs
- outputs.tf: surface endpoint/region/bucket back to catalyst-api for the
  deployment record (credentials NEVER returned)
products/catalyst/bootstrap/api/ (Go):
- internal/hetzner/objectstorage.go: NEW. minio-go/v7-based ListBuckets
  validator. Distinguishes auth failure ("rejected") from network failure
  ("unreachable") so the wizard renders the right error card. NOT a
  parallel cloud-resource path: the existing purge.go handles hcloud
  purge; objectstorage.go handles a separate API surface (S3-compatible)
  that has no equivalent client today.
- internal/handler/credentials.go: extend with
  ValidateObjectStorageCredentials handler, same wire shape (200
  valid:true / 200 valid:false / 503 unreachable / 400 bad input) as the
  existing token validator so the wizard's failure-card machinery handles
  both without per-endpoint switches.
- cmd/api/main.go: wire POST /api/v1/credentials/object-storage/validate
- internal/provisioner/provisioner.go: extend Request with
  ObjectStorageRegion/AccessKey/SecretKey/Bucket; Validate() rejects
  empty/malformed values fail-fast at /api/v1/deployments POST time;
  writeTfvars() emits the 4 new tfvars.
- internal/handler/deployments.go: derive bucket name from FQDN slug
  pre-Validate (catalyst-<fqdn-with-dots-replaced-by-dashes>) so Hetzner's
  globally-namespaced bucket pool gets a deterministic,
  collision-resistant per-Sovereign name without operator input.
- internal/store/store.go: redact access/secret keys; preserve
  region+bucket plain (they're public in tofu outputs anyway).
products/catalyst/bootstrap/ui/ (TypeScript / React):
- entities/deployment/model.ts + store.ts: 4 new wizard fields
  (objectStorageRegion/AccessKey/SecretKey/Validated) with merge()
  coercion for legacy persisted state.
- pages/wizard/steps/StepCredentials.tsx: ObjectStorageSection with region
  picker (fsn1/nbg1/hel1), masked secret-key input, Validate button gating
  Next. Same FailureCard taxonomy
  (rejected/too-short/unreachable/network/parse/http) the existing
  TokenSection uses, so the operator UX is consistent. Section only
  renders when Hetzner is among chosen providers; non-Hetzner Sovereigns
  skip Phase 0b until their own backing-store path lands.
- pages/wizard/steps/StepReview.tsx: include
  objectStorageRegion/AccessKey/SecretKey in the POST /v1/deployments
  payload (bucket derived server-side).
Tests:
- api: 7 new provisioner Validate tests (region/keys/bucket required +
  RFC-compliant + valid-region acceptance), 5 handler tests for the new
  endpoint (bad JSON / missing region / invalid region / short keys), 4
  hetzner/objectstorage_test.go tests (endpoint composition + early input
  rejection), 1 handler test for the bucket-name derivation. Existing
  tests updated to supply the new required fields.
- ui: StepCredentials.test.tsx pre-populates objectStorageValidated in
  beforeEach so the existing 11 SSH-section tests aren't gated on Object
  Storage validation.
DoD: a fresh Sovereign provision results in a usable S3 endpoint URL +
access/secret keys available as a K8s Secret in the Sovereign's home
cluster (flux-system/hetzner-object-storage), ready for consumption by
Harbor + Velero charts via existingSecret references.
Closes #371.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(wbs): #371 done, Hetzner Object Storage Phase 0b shipped (#409)
Marks #371 done with the architectural rationale (hybrid Option A + B:
Hetzner exposes no Cloud API to mint S3 keys, so the wizard MUST capture
them; OpenTofu auto-provisions the bucket + cloud-init writes the
flux-system/hetzner-object-storage Secret with the canonical s3-* keys
Harbor + Velero consume).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
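The bucket-name derivation in this commit (`catalyst-<fqdn-with-dots-replaced-by-dashes>`) can be sketched as follows. This is an illustration under stated assumptions, not the code in `internal/handler/deployments.go`: the function names, the lowercasing and trailing-dot handling, and the naming regex (a conservative subset of the common S3 bucket-naming rules) are all hypothetical. Only the `catalyst-` prefix, the dot-to-dash rule, and the three region names come from the commit message.

```python
import re

# Regions named in the commit message.
VALID_REGIONS = {"fsn1", "nbg1", "hel1"}

# Assumption: 3-63 chars, lowercase letters, digits and hyphens,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def derive_bucket_name(sovereign_fqdn: str) -> str:
    """Build a deterministic per-Sovereign bucket name from the FQDN."""
    slug = sovereign_fqdn.strip().lower().rstrip(".").replace(".", "-")
    if not slug:
        raise ValueError("sovereign FQDN must not be empty")
    name = f"catalyst-{slug}"
    if not _BUCKET_RE.match(name):
        raise ValueError(f"derived bucket name is not a valid S3 name: {name!r}")
    return name


def validate_region(region: str) -> str:
    """Reject any region outside the set listed in the commit message."""
    if region not in VALID_REGIONS:
        raise ValueError(f"unsupported Object Storage region: {region!r}")
    return region
```

Because the name is a pure function of the FQDN, re-provisioning the same Sovereign maps to the same bucket, which fits the "idempotent on re-apply" note for `minio_s3_bucket.main`.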
||
|
|
1cbd759e0f
|
docs(wbs): tick 7 — §2 prose updated (#316 + #375 chart-released); #379 RESTART after watchdog kill (#415)
Bursty completion: #316 + #375 prose rows now reflect chart-released state
(was stale from earlier 'not deployed').
#379 first agent watchdog-killed (no work survived); restarted with tighter
STAY-TIGHT brief modeled on the successful #378/#377/#375 patterns (5-15
min wall time, smoke + close as duplicate if chart already published).
In flight (5): #371 #376 #379-RESTART #380 #381
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8695ab82c5
|
docs(wbs): tick #316 chart-released — bp-openbao 1.2.0 (auto-unseal) (#414)
PR #408 merged at
|
||
|
|
38e6a2a528
|
docs(wbs): tick 6 — 9 done; #380 dispatched to maintain 5 parallel (#413)
Done (9): #316 #338 #370 #373 #375 #377 #378 #387 #392
In flight (5): #371 #376 #379 #380 #381
Bursty completion window: #316 #373 #375 #377 #378 all landed within ~10
min. Sovereign-impact for chart-released/chart-verified items deferred to
Phase 8.
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6e0f734d62
|
fix(bootstrap-kit): renumber bp-cert-manager-powerdns-webhook 36→49 + register in expected DAG (#373 followup) (#412)
PR #410 landed slot 36 for bp-cert-manager-powerdns-webhook, but slot 36
was already reserved in scripts/expected-bootstrap-deps.yaml for
bp-stunner (W2.K4 forward-declaration). The bootstrap-kit dependency
audit failed on the merge SHA
|
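The slot collision behind the #412 renumbering (slot 36 already reserved for bp-stunner) is the kind of drift a small pre-merge check can catch. The sketch below is hypothetical throughout: the layout of `scripts/expected-bootstrap-deps.yaml` is not shown in the log, so plain dictionaries stand in for it, and `find_slot_collisions` is an invented name rather than part of the real audit script.

```python
from typing import Dict, List, Tuple


def find_slot_collisions(
    reserved: Dict[int, str],
    proposed: Dict[int, str],
) -> List[Tuple[int, str, str]]:
    """Return (slot, reserved_owner, proposed_owner) for every clash.

    A slot clashes when it is already reserved for a different blueprint
    than the one a new bootstrap-kit file wants to place there.
    """
    collisions = []
    for slot, blueprint in sorted(proposed.items()):
        owner = reserved.get(slot)
        if owner is not None and owner != blueprint:
            collisions.append((slot, owner, blueprint))
    return collisions


# Slot numbers and blueprint names below are taken from the commit message.
RESERVED = {36: "bp-stunner"}
before = find_slot_collisions(RESERVED, {36: "bp-cert-manager-powerdns-webhook"})
after = find_slot_collisions(RESERVED, {49: "bp-cert-manager-powerdns-webhook"})
```

Here `before` reports the clash on slot 36 and `after` is empty, mirroring the 36→49 move.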
||
|
|
d2ada908c9
|
feat(bp-openbao): auto-unseal flow — cloud-init seed + post-install init Job (closes #316) (#408)
Catalyst-curated auto-unseal pipeline for OpenBao on Hetzner Sovereigns
(no managed-KMS available). Selected **Option A — Shamir + cloud-init
seed** because:
- Hetzner has no managed-KMS service → Cloud-KMS auto-unseal (Option C)
is structurally unavailable.
- Transit-seal (Option B) requires a peer OpenBao cluster, only
applicable to multi-region tier-1; out of scope for single-region
omantel.
- Manual unseal (Option D) violates the "first sovereign-admin lands
on console.<sovereign-fqdn> ready to use" goal in
SOVEREIGN-PROVISIONING.md §5.
Architecture (per issue #316 spec + acceptance criteria 1-6):
1. Cloud-init on the control-plane node generates a 32-byte recovery
seed from /dev/urandom and writes it to a single-use K8s Secret
`openbao-recovery-seed` in the openbao namespace, with annotation
`openbao.openova.io/single-use: "true"`. Pre-creates the openbao
namespace to eliminate the race with Flux's HelmRelease apply.
2. bp-openbao chart v1.2.0 ships two new Helm post-install hooks:
- `templates/init-job.yaml` (hook weight 5): consumes the seed,
calls `bao operator init -recovery-shares=1 -recovery-threshold=1`,
persists the recovery key inside OpenBao's auto-unseal config,
deletes the seed Secret on success. Idempotent — re-runs detect
Initialized=true and exit 0.
- `templates/auth-bootstrap-job.yaml` (hook weight 10): enables
the Kubernetes auth method, mounts kv-v2 at `secret/`, writes
the `external-secrets-read` policy, binds the `external-secrets`
role to the ESO ServiceAccount in `external-secrets-system`.
3. `templates/auto-unseal-rbac.yaml` declares the least-privilege SA
+ Role + RoleBinding the Jobs need (Secret get/list/delete in the
openbao namespace; create/get/patch on the openbao-init-marker).
Also emits the permanent `system:auth-delegator` ClusterRoleBinding
bound to the OpenBao ServiceAccount so the Kubernetes auth method
can call tokenreviews.authentication.k8s.io.
4. Cluster overlay `clusters/_template/bootstrap-kit/08-openbao.yaml`
bumps version 1.1.1 → 1.2.0 and flips `autoUnseal.enabled: true`
per-Sovereign.
Per #402 lesson: skip-render pattern (`{{- if .Values.X }}{{ emit }}
{{- end }}`) used throughout — never `{{ fail }}`. Default `helm
template` render emits NOTHING new; opt-in via autoUnseal.enabled=true.
Acceptance criteria coverage:
1. Provision fresh Sovereign — cloud-init writes seed, Flux installs
bp-openbao 1.2.0, post-install Jobs run automatically. ✅
2. bp-openbao HR Ready=True without manual intervention — install
keeps `disableWait: true` (Helm Ready ≠ OpenBao initialised; the
init Job drives initialisation out-of-band on the same install). ✅
3. `bao status` shows Sealed=false, Initialized=true within 5 minutes
— init Job polls + retries up to 60×5s. ✅
4. ESO ClusterSecretStore vault-region1 reaches Status: Valid — the
auth-bootstrap Job binds the `external-secrets` role to ESO's SA
before the Job exits. ✅
5. Seed Secret deleted post-init — init Job deletes it via K8s API
after consuming. ✅
6. No openbao-root-token Secret in K8s — root token captured to
/tmp/.root-token in the Job pod's tmpfs only; never written to a
K8s Secret. The recovery key persists ONLY inside OpenBao's Raft
state (auto-unseal config). ✅
Tests:
- tests/auto-unseal-toggle.sh — 4 cases:
* default render → no auto-unseal artefacts (skip-render works)
* autoUnseal.enabled=true → both Jobs + correct hook weights
* kubernetesAuth.enabled=false → init Job only, no auth-bootstrap
* idempotency annotations present on all 5 hook objects
- tests/observability-toggle.sh — unchanged, all 3 cases green.
- helm lint . — clean.
Files:
- platform/openbao/chart/Chart.yaml — version 1.1.1 → 1.2.0
- platform/openbao/blueprint.yaml — version 1.1.1 → 1.2.0
- platform/openbao/chart/values.yaml — `autoUnseal.*` block
- platform/openbao/chart/templates/auto-unseal-rbac.yaml — new
- platform/openbao/chart/templates/init-job.yaml — new
- platform/openbao/chart/templates/auth-bootstrap-job.yaml — new
- platform/openbao/chart/tests/auto-unseal-toggle.sh — new
- platform/openbao/README.md — bootstrap procedure §2-3 expanded;
auto-unseal alternatives table added.
- clusters/_template/bootstrap-kit/08-openbao.yaml — chart 1.1.1 →
1.2.0, autoUnseal.enabled=true.
- infra/hetzner/cloudinit-control-plane.tftpl — seed-token block
inserted between ghcr-pull-secret apply and flux-bootstrap apply.
- docs/omantel-handover-wbs.md §9 — #316 ticked chart-released.
Canonical seam used: extended existing `platform/openbao/chart/` per
the anti-duplication rule. NO standalone scripts. NO bespoke Go cloud
calls. NO `{{ fail }}`. All knobs configurable via values.yaml per
INVIOLABLE-PRINCIPLES.md #4 (never hardcode).
Co-authored-by: hatiyildiz <hat.yil@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
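The init Job's control flow as described in this commit (poll up to 60×5s, exit 0 if already initialised, otherwise initialise and then delete the seed) can be sketched like this. It is a simplified model rather than the chart's `templates/init-job.yaml`: `BaoStatus`, `run_init`, and the injected callables are hypothetical stand-ins for the `bao status`, `bao operator init`, and Secret-delete calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BaoStatus:
    initialized: bool
    sealed: bool


def run_init(
    get_status: Callable[[], Optional[BaoStatus]],  # None = not reachable yet
    do_init: Callable[[], None],
    delete_seed: Callable[[], None],
    sleep: Callable[[float], None],
    attempts: int = 60,       # "60×5s" from the commit message
    interval_s: float = 5.0,
) -> int:
    """Return a process-style exit code: 0 on success, 1 on timeout."""
    for _ in range(attempts):
        status = get_status()
        if status is None:
            sleep(interval_s)
            continue
        if status.initialized:
            return 0  # idempotent re-run: nothing left to do
        do_init()
        delete_seed()  # only reached when initialisation did not raise
        return 0
    return 1
```

Deleting the seed only after `do_init` returns keeps the single-use Secret available for a retry if initialisation fails partway.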
|
||
|
|
74d232538a
|
docs(wbs): #375 bp-nats-jetstream chart-verified — smoke OK, close as duplicate (#411)
bp-nats-jetstream:1.1.1 already published on GHCR. Helm template renders 8
kinds clean (StatefulSet replicas=3 per ADR-0001 §9.2 B5). Smoke install on
contabo `nats-smoke` ns reached 3/3 Ready in 33s; JetStream R=3 stream
created with leader+2 replica quorum; pub/sub round-trip verified.
Bootstrap-kit slot 07 already wired in `_template/`. No code change needed.
Same verify-and-close pattern as #378.
Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
04308af7e9
|
feat(cert-manager): bp-cert-manager-powerdns-webhook (#373) (#410)
Authors a Catalyst Blueprint for the cert-manager DNS-01 external webhook
backed by PowerDNS, for post-handover wildcard TLS issuance against the
Sovereign's OWN PowerDNS, eliminating the last reachback to
openova-controlled Dynadot credentials per ADR-0001 §9.4.
Structure mirrors bp-cert-manager-dynadot-webhook (canonical seam):
- platform/cert-manager-powerdns-webhook/blueprint.yaml: Blueprint CR with
  depends: [bp-cert-manager, bp-powerdns]
- platform/cert-manager-powerdns-webhook/chart/Chart.yaml: wraps upstream
  zachomedia/cert-manager-webhook-pdns v2.5.5 (chart 3.2.5); declares the
  sigstore/common stub dep to satisfy the hollow-chart guard (#181)
- chart/templates/: 8 templates (Deployment, Service, APIService, RBAC,
  selfSigned/CA Issuer + serving Certificate, ServiceAccount,
  ClusterIssuer)
- ClusterIssuer (letsencrypt-dns01-prod-powerdns) ships with the chart,
  paired with the webhook's solver. Gated behind clusterIssuer.enabled AND
  powerdns.host (skip-render pattern, lesson from #387 follow-up #402:
  never use {{ fail }})
Bootstrap-kit slot:
- clusters/_template/bootstrap-kit/36-bp-cert-manager-powerdns-webhook.yaml
  wires the HelmRelease to the per-Sovereign in-cluster PowerDNS endpoint
  (http://powerdns.powerdns:8081) and flips clusterIssuer.enabled=true.
- ${SOVEREIGN_FQDN} envsubst keeps the slot operator-overridable per
  Inviolable Principle #4.
Contabo bootstrap path does NOT include this template; contabo stays on
legacy http01 + Traefik per ADR-0001 §9.4.
Helm-template verification:
  helm template t platform/cert-manager-powerdns-webhook/chart/
    → 14 resources, 0 ClusterIssuer (skip-render works)
  helm template t platform/cert-manager-powerdns-webhook/chart/ \
    --set powerdns.host=http://powerdns.test:8081 \
    --set clusterIssuer.enabled=true \
    --set powerdns.apiKeySecretRef.name=fake
    → 15 resources incl. ClusterIssuer with PowerDNS solver config
Both renders parse cleanly through python yaml.safe_load_all.
Updates docs/omantel-handover-wbs.md §2 row 4 + §9 row #373 to
chart-released. Sovereign-impact deferred to Phase 8 (handover E2E).
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
43c93d1875
|
feat(bp-keycloak): chart-verified on contabo (#377) (#407)
bp-keycloak:1.1.2 already published by blueprint-release run 25214143810
on commit
|
||
|
|
513508f224
|
docs(wbs): tick 5 — #378 ✅ done, #375 dispatched, dedupe §9 (#406)
#378 completed (chart-verified, closed as duplicate per agent finding).
#375 dispatched as next from queue to maintain 5-parallel.
In-flight now: #371 #373 #316 #375 #377 (5).
Done: #338 #370 #378 #387 #392 (5 of 24 minimal blueprints).
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1a20cc50b9
|
docs(wbs): #378 bp-crossplane chart-verified — smoke OK, close as duplicate (#405)
Investigation by Agent #378-bp-crossplane:
VALIDATION
- platform/crossplane/chart/ is umbrella (Chart.yaml + values.yaml + Chart.lock + charts/)
by design after the v1.1.3 split (CR-of-CRD ordering moved to bp-crossplane-claims)
- helm template bp-crossplane . --namespace crossplane-system renders 23 kinds, 0 errors
- bp-crossplane v1.1.3 already published to oci://ghcr.io/openova-io/bp-crossplane
- Latest blueprint-release.yaml run on main is SUCCESS (
|
||
|
|
32864b58df
|
docs(wbs): tick 4 — 5 agents in flight (#371 #373 #316 #377 #378) (#404)
Phase 0/2/3/4 fan-out at full 5-parallel:
- #371 RESUME (Hetzner OS credentials, in-worktree state)
- #373 NEW (cert-mgr-powerdns-webhook authoring)
- #316 NEW (OpenBao auto-unseal)
- #377 NEW (bp-keycloak install verification)
- #378 NEW (bp-crossplane install verification)
#370 promoted to done (unblocked + scope superseded by working wipe.go).
Class assignments updated; §9 status rows added.
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f004300ff9
|
docs(wbs): tick 3 — #387 chart-released, #392 DoD-met (e2e proven), #370 unblocked (#403)
State after #401 + #402 + #399 land:
- #338 chart-released, Sovereign-impact deferred (bp-flux is cloud-init bootstrapped)
- #387 chart-released, follow-up #402 fixed default-values render; blueprint-release SUCCESS on
|
||
|
|
3e980654a9 |
deploy: update catalyst images to a1bd550
|
||
|
|
a1bd550208
|
fix(charts): HTTPRoute templates skip-render on missing host (was failing default-values render) (#402)
Blueprint-release for #401 failed because HTTPRoute templates use
{{- fail }} when gateway.host is not set, which trips the chart default-values
render gate in CI. Switched 6 templates from 'fail loud' to 'skip render':
if .Values.gateway.host → emit HTTPRoute
else → emit nothing
The Gateway API admission already rejects HTTPRoute with empty hostnames,
so the loud-fail wasn't buying anything an operator wouldn't see at apply
time. Default-values render now produces zero HTTPRoute resources, which
is the correct shape for the upstream chart consumers that don't set
the Sovereign-only gateway block.
Files: keycloak, gitea, openbao, grafana, harbor, catalyst-platform.
Verified:
helm template t products/catalyst/chart/ → 0 HTTPRoutes (clean)
helm template t products/catalyst/chart/ --set ingress.gateway.enabled=true --set ingress.hosts.console.host=console.test --set ingress.hosts.api.host=api.test → 2 HTTPRoutes
Closes the blueprint-release failure on commit
|
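The 'fail loud' to 'skip render' switch described in the #402 commit above amounts to returning an empty document when the host is unset instead of aborting the whole render. The function below models that decision in Python purely for illustration; the real change lives in Helm templates, and `render_httproute` plus the exact manifest fields it emits are assumptions rather than the charts' actual output.

```python
from typing import Optional


def render_httproute(
    name: str,
    gateway_host: Optional[str],
    backend_port: int = 80,
) -> str:
    """Emit an HTTPRoute manifest, or nothing when no host is configured."""
    if not gateway_host:
        # Skip-render: a default-values render yields no HTTPRoute at all,
        # rather than failing the entire chart render.
        return ""
    return "\n".join([
        "apiVersion: gateway.networking.k8s.io/v1",
        "kind: HTTPRoute",
        "metadata:",
        f"  name: {name}",
        "spec:",
        "  hostnames:",
        f"    - {gateway_host}",
        "  rules:",
        "    - backendRefs:",
        f"        - name: {name}",
        f"          port: {backend_port}",
    ]) + "\n"


default_render = render_httproute("gitea", None)
overlay_render = render_httproute("gitea", "gitea.example.test", 3000)
```

`default_render` is the empty string, the analogue of "0 HTTPRoutes (clean)" in the verification notes, while `overlay_render` carries the hostname supplied by a per-Sovereign overlay.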
||
|
|
eded68eccd |
deploy: update catalyst images to abf01b6
|
||
|
|
abf01b6f21
|
feat(platform): Gateway API migration audit (#387) (#401)
Migrates every minimal-Sovereign-set blueprint chart from
networking.k8s.io/v1.Ingress to gateway.networking.k8s.io/v1.HTTPRoute,
replacing the legacy Traefik-on-Sovereigns assumption with the canonical
Cilium + Envoy + Gateway API path per ADR-0001 §9.4 and the WBS §2
correction note (#388).
The single per-Sovereign Gateway is added as additional documents in the
existing bootstrap-kit slot clusters/_template/bootstrap-kit/01-cilium.yaml
(NOT a new top-level slot), since Cilium owns the GatewayClass. It
includes:
- Certificate `sovereign-wildcard-tls` requesting `*.${SOVEREIGN_FQDN}`
  from `letsencrypt-dns01-prod` (cert-manager + #373 webhook)
- Gateway `cilium-gateway` in `kube-system` with HTTPS (443, TLS terminate)
  + HTTP (80) listeners, allowedRoutes.namespaces.from=All
Per-blueprint HTTPRoute templates (canonical seam: each wrapper chart's
existing `templates/` directory):
| Blueprint            | Host pattern               | Backend port |
|----------------------|----------------------------|--------------|
| bp-keycloak          | auth.<sov>                 | 80           |
| bp-gitea             | git.<sov>                  | 3000         |
| bp-openbao           | bao.<sov>                  | 8200         |
| bp-grafana           | grafana.<sov>              | 80           |
| bp-harbor            | registry.<sov>             | 80           |
| bp-powerdns          | pdns.<sov>/api (dual-mode) | 8081         |
| bp-catalyst-platform | console.<sov>, api.<sov>   | 80, 8080     |
bp-powerdns supports both Ingress (contabo legacy) and HTTPRoute
(Sovereign) simultaneously; the per-Sovereign overlay sets
`api.gateway.enabled=true` while leaving `api.enabled=true`. The Ingress
object is harmless on Cilium clusters with no Traefik. This preserves
contabo's existing pdns.openova.io flow per ADR-0001 §9.4.
bp-harbor flips `expose.type` from `ingress` to `clusterIP` in
platform/harbor/chart/values.yaml so the upstream chart no longer emits
its own Ingress; the HTTPRoute is the sole HTTP exposure. TLS terminates
at the Gateway (wildcard cert) rather than per-host Certificates inside
the chart.
bp-catalyst-platform's `templates/httproute.yaml` is NOT excluded by
.helmignore (unlike templates/ingress.yaml +
templates/ingress-console-tls.yaml, which remain contabo-only legacy demo
infra). The contabo path keeps serving console.openova.io/sovereign via
Traefik unchanged.
Bootstrap-kit slot updates (per-Sovereign hostname interpolation):
- 08-openbao.yaml → gateway.host: bao.${SOVEREIGN_FQDN}
- 09-keycloak.yaml → gateway.host: auth.${SOVEREIGN_FQDN}
- 10-gitea.yaml → gateway.host: gitea.${SOVEREIGN_FQDN}
- 11-powerdns.yaml → api.host: pdns.${SOVEREIGN_FQDN}, api.gateway.enabled: true
- 19-harbor.yaml → gateway.host: registry.${SOVEREIGN_FQDN}
- 25-grafana.yaml → gateway.host: grafana.${SOVEREIGN_FQDN}
Server-side dry-run validation against the live Cilium Gateway API CRDs on
contabo: every HTTPRoute and the per-Sovereign Gateway + Certificate apply
cleanly via `kubectl apply --dry-run=server`.
Contabo unaffected: clusters/contabo-mkt/* not modified. The legacy SME
ingresses (console-nova, marketplace, admin, axon, talentmesh, stalwart,
...) continue to serve via Traefik as before. powerdns on contabo remains
on the Ingress path (api.gateway.enabled defaults to false at the chart
level).
Closes #387.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
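The per-Sovereign hostname interpolation listed in this commit follows the `${SOVEREIGN_FQDN}` envsubst convention, which Python's `string.Template` happens to share. The sketch below resolves the slot hostnames for one Sovereign. It mirrors the "Bootstrap-kit slot updates" list, which uses `gitea.${SOVEREIGN_FQDN}` even though the host-pattern table in the same message lists `git.<sov>`. `SLOT_HOSTS` and `resolve_hosts` are hypothetical names, not part of the repository.

```python
from string import Template
from typing import Dict

# Host patterns copied from the slot-update list in the commit message.
SLOT_HOSTS: Dict[str, str] = {
    "08-openbao.yaml": "bao.${SOVEREIGN_FQDN}",
    "09-keycloak.yaml": "auth.${SOVEREIGN_FQDN}",
    "10-gitea.yaml": "gitea.${SOVEREIGN_FQDN}",
    "11-powerdns.yaml": "pdns.${SOVEREIGN_FQDN}",
    "19-harbor.yaml": "registry.${SOVEREIGN_FQDN}",
    "25-grafana.yaml": "grafana.${SOVEREIGN_FQDN}",
}


def resolve_hosts(sovereign_fqdn: str) -> Dict[str, str]:
    """Substitute the Sovereign FQDN into every slot's host pattern."""
    if not sovereign_fqdn:
        raise ValueError("sovereign FQDN must not be empty")
    return {
        slot: Template(pattern).substitute(SOVEREIGN_FQDN=sovereign_fqdn)
        for slot, pattern in SLOT_HOSTS.items()
    }


hosts = resolve_hosts("omantel.omani.works")
```

Each resolved name falls under `*.${SOVEREIGN_FQDN}`, so the single `sovereign-wildcard-tls` Certificate on the Gateway would cover all of them.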
||
|
|
c1782cf6f1
|
docs(wbs): DAG compressed + light theme + clickable tickets + #338/#392 marked done (#398) (#400)
Three founder-requested DAG improvements:
1. Vertical compression: subgraph direction LR (was TB) + single-line node
   labels; roughly halves the rendered height.
2. Light-theme phase blocks: slate-100 fill with dark text; light-tinted
   semantic colours for done/wip/blocked/gate. Readable in both GitHub
   light and dark modes.
3. Clickable ticket numbers: every node carries a click directive opening
   the GitHub issue in a new tab. Phase 8 gate links to epic #369.
Status updates folded in:
- #338 done (PR #393 merged at |
||
|
|
0904f54a54
|
test(catalyst-api): purge.go end-to-end fake-Hetzner integration test (#392 DoD) (#399)
Adds the missing behavior-level proof for #392. The unit tests in
purge_test.go pin the label-key constant; this file exercises the full
Purge() flow against an httptest fake-Hetzner that:
1. Asserts the label_selector wire format matches the canonical label
2. Returns one resource per kind (server/LB/FW/network/ssh_key)
3. Records DELETE calls against /v1/<kind>/{id}
Two tests:
- TestPurge_EndToEnd_FakeHetzner: full happy-path round-trip; PurgeReport
  totals to 5 with each kind's expected id deleted
- TestPurge_EndToEnd_RegressionGuard: same flow, named to communicate that
  any future drift in the label selector (regression of #392) causes the
  fake's t.Errorf to fire AND the Purge() call to return an error, making
  sure the "silent no-op" failure mode that hid the original bug cannot
  recur.
Both pass locally (29ms). No real Hetzner credit consumed: the test swaps
purgeHTTPClient with one whose Transport rewrites api.hetzner.cloud →
httptest server URL.
Closes the DoD-chain step ("behavior-verified") for #392 that was deferred
by the agent due to redacted tokens on the live deployment records.
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bf7218b878
|
docs(wbs): DAG compressed + light theme + clickable tickets + #338/#392 marked done (#398)
Three founder-requested DAG improvements:
1. Vertical compression: subgraph direction LR (was TB) + single-line node
   labels; roughly halves the rendered height.
2. Light-theme phase blocks: slate-100 fill with dark text; light-tinted
   semantic colours for done/wip/blocked/gate. Readable in both GitHub
   light and dark modes.
3. Clickable ticket numbers: every node carries a click directive opening
   the GitHub issue in a new tab. Phase 8 gate links to epic #369.
Status updates folded in:
- #338 done (PR #393 merged at |
||
|
|
e97ae0f448 |
deploy: update catalyst images to aa8ed4e
|
||
|
|
aa8ed4e7a3
|
fix(catalyst-api): purge.go label key matches Tofu emit (#392) (#397)
Bug: `hetzner.Purge` filtered by `catalyst-deployment-id=<id>`. The
OpenTofu module at `infra/hetzner/main.tf` actually emits
`catalyst.openova.io/sovereign=<fqdn>` on every taggable resource (network,
firewall, ssh-key, server, load-balancer). The mismatch made the wizard's
Cancel-and-Wipe orphan-purge step (#318, wipe.go) silently no-op for every
failed deployment since the bug landed.
Fix (minimum-impact, 2 prod files):
- `purge.go`: introduce `PurgeLabelKey` constant + `FilterByLabel()`
  helper; rename parameter from `deploymentID` to `sovereignFQDN`; filter
  by `catalyst.openova.io/sovereign=<fqdn>`.
- `wipe.go`: pass `dep.Request.SovereignFQDN` instead of `id`.
Regression sentinel (`purge_test.go`):
- pins the constant to `catalyst.openova.io/sovereign`
- reads `infra/hetzner/main.tf` and asserts the constant appears there
- exercises the wire-format helper
- guards empty-token and empty-fqdn input rejection
If either Tofu or purge.go drifts from the canonical key, the test fails
locally before CI ships the bug.
Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
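The label-key fix and its regression sentinel can be sketched as below. This is a Python model of the idea, not the Go code in `purge.go`: the label key is taken from the commit message, while `filter_by_label`, `list_url`, `key_appears_in`, the `servers` resource path, and the sample Tofu snippet are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Canonical key from the commit message; the Tofu module emits it as a label.
PURGE_LABEL_KEY = "catalyst.openova.io/sovereign"


def filter_by_label(sovereign_fqdn: str) -> str:
    """Build the label selector used to find one Sovereign's resources."""
    if not sovereign_fqdn:
        raise ValueError("sovereign FQDN must not be empty")
    return f"{PURGE_LABEL_KEY}={sovereign_fqdn}"


def list_url(kind: str, sovereign_fqdn: str,
             base: str = "https://api.hetzner.cloud/v1") -> str:
    """Compose a list URL carrying the selector as a query parameter."""
    query = urlencode({"label_selector": filter_by_label(sovereign_fqdn)})
    return f"{base}/{kind}?{query}"


def key_appears_in(tofu_source: str) -> bool:
    """Drift sentinel: the purge key must also appear in the Tofu module."""
    return PURGE_LABEL_KEY in tofu_source


url = list_url("servers", "otech.omani.works")
selector = parse_qs(urlsplit(url).query)["label_selector"][0]
# Hypothetical excerpt standing in for infra/hetzner/main.tf.
sample_tofu = 'labels = { "catalyst.openova.io/sovereign" = var.sovereign_fqdn }'
```

Pinning the key in one constant and asserting it against the Tofu source is what turns a silent no-op purge into a failing test.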
||
|
|
eb92e0496b
|
feat(platform): add bp-newapi — multi-tenant LLM marketplace gateway (#394) (#396)
Catalyst Blueprint wrapping the upstream NewAPI
(github.com/Calcium-Ion/new-api, MIT) for Sovereign operators whose
business model is reselling LLM access to their own customers.
Backend-only mode: the OpenAI-compatible API at api.<host>/v1/* is
customer-facing; the upstream's portal UI is disabled at ingress;
Catalyst replaces it as the customer surface; NewAPI's admin UI at
admin.<host> is exposed only to ops staff (IdP-gated).
Compliance posture enforced at the blueprint layer:
- Channel attestation gate (refuses to render if any enabled channel
lacks verifiable provenance — in-cluster, commercial-contract, or
byok)
- Geographic AUP enforcement (sanctioned-region block on commercial-
provider channels; US/EU export-control baseline)
- BYOK isolation (request-scoped, never aggregated)
- Reseller disclosure required
- Audit log on bp-cnpg (metadata-only by default)
ACME placeholder used throughout the README; replace with operator
identity in per-Sovereign overlays at clusters/<sovereign>/bootstrap-
kit/.
Files:
- platform/newapi/README.md (design doc + setup checklist)
- platform/newapi/blueprint.yaml (Catalyst Blueprint CR)
- platform/newapi/chart/{Chart.yaml,values.yaml}
- platform/newapi/chart/templates/{_helpers.tpl,deployment.yaml,
service.yaml,ingress.yaml,configmap.yaml,serviceaccount.yaml,
networkpolicy.yaml}
Closes design portion of #394.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
05cb39c042
|
fix(bp-flux): catalyst-cluster-reconciler ClusterRoleBinding overlay (closes #338) (#393)
PROBLEM
-------
On Sovereign-1 (otech.omani.works, 2026-04-30) every HelmRelease that
transitioned through pending-install/pending-upgrade got stuck because the
helm-controller SA could not UPDATE its own helm-storage Secrets
(sh.helm.release.v1.<name>.<n>) in flux-system. Symptom:
  secrets "sh.helm.release.v1.catalyst-platform.v1" is forbidden: User
  "system:serviceaccount:flux-system:helm-controller" cannot update
  resource "secrets" in API group "" in the namespace "flux-system"
Runtime workaround on otech (added 2026-04-30): manual ClusterRoleBinding
flux-system-helm-controller-admin → cluster-admin →
flux-system/helm-controller. Tracked as the permanent fix in #338.
FIX
---
Add platform/flux/chart/templates/catalyst-cluster-reconciler-rbac.yaml, a
Catalyst-managed ClusterRoleBinding (catalyst-cluster-reconciler) that
binds cluster-admin to helm-controller AND kustomize-controller in
.Values.catalyst.fluxNamespace (default flux-system). Independent from the
upstream subchart's cluster-reconciler binding (different name, no
ownership conflict), so if the upstream binding ever drifts again the
overlay still holds the cluster correct.
WHY cluster-admin (not narrower)
--------------------------------
helm-controller installs arbitrary user-supplied Helm charts which can
ship any K8s resource (CRDs, ClusterRoles, MutatingWebhookConfigurations,
etc.). There is no narrower role that satisfies the full install path. The
Flux project's own bootstrap install.yaml binds cluster-admin for the same
reason (upstream default multitenancy.privileged=true). Multi-tenancy
lockdown is a Sovereign Day-2 hardening choice tracked separately.
NEVER-HARDCODE COMPLIANCE
-------------------------
Per docs/INVIOLABLE-PRINCIPLES.md #4, the namespace is operator-overridable
via .Values.catalyst.fluxNamespace. Default is flux-system because that's
the canonical Catalyst install namespace (matches cloud-init's flux2
install.yaml + clusters/_template/bootstrap-kit/03-flux.yaml).
VERSION
-------
- bp-flux 1.1.2 → 1.1.3 (Chart.yaml + blueprint.yaml + 3 bootstrap-kit
  refs).
- The flux2 subchart pin (2.14.1) is unchanged; version-pin replay test
  remains green (cloud-init v2.4.0 == subchart appVersion 2.4.0).
VERIFICATION
------------
- platform/flux/chart/tests/version-pin-replay.sh: all 6 cases PASS.
- platform/flux/chart/tests/observability-toggle.sh: all 3 cases PASS.
- helm template renders the new ClusterRoleBinding with correct subjects
  (flux-system by default; verified --set catalyst.fluxNamespace=custom
  override path).
- scripts/check-bootstrap-deps.sh: 0 drift, 0 cycles.
FILES
-----
- platform/flux/chart/templates/catalyst-cluster-reconciler-rbac.yaml (new)
- platform/flux/chart/Chart.yaml (1.1.2 → 1.1.3)
- platform/flux/chart/values.yaml (catalyst.fluxNamespace default)
- platform/flux/blueprint.yaml (1.1.2 → 1.1.3)
- clusters/{_template,otech.omani.works,omantel.omani.works}/bootstrap-kit/03-flux.yaml (chart version)
- docs/lessons-learned/helm-controller-rbac.md (permanent-fix note)
- docs/omantel-handover-wbs.md (#338 status row)
Refs: #43 #369 #338
Lesson: docs/lessons-learned/helm-controller-rbac.md
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com> |
||
|
|
4fbced47e8
|
docs(wbs): progress tick 2 — anti-duplication corrective applied to all in-flight agents (#395)
Founder directive 2026-05-01: all agents prepended with explicit anti-duplication rule listing the canonical seam for every kind of work. Lesson recorded in §9.

State after corrective:
- #338 PR #393 open (scoped catalyst-cluster-reconciler RBAC, NOT cluster-admin overgrant) — awaiting founder review
- #371 RESUMED in-worktree (already correctly extending existing seams)
- #387 RESTARTED with tightened scope (no new 'bootstrap-kit slot')
- #392 RESTARTED with minimum-impact mandate (single-line label-key fix)
- #370 still parked, blocked on #392

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
90a597128c
|
docs(wbs): progress tick — 4 agents dispatched on #338 #370 #371 #387 (#390)
Phase 0 + Phase 1 in flight in parallel:
  Agent #338-bp-flux-rbac — bp-flux helm-controller SA
  Agent #370-hetzner-purge-runbook — Hetzner purge script + execution
  Agent #371-hetzner-os-credentials — Hetzner Object Storage cred pattern
  Agent #387-gateway-api-audit — Cilium GW API per-blueprint migration

DAG legend extended: 🟡 wip, 🟢 done, 🔴 blocked, 🟧 gate.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
801862725c
|
docs(wbs): redraw omantel handover DAG left-to-right with phase subgraphs (#389)
Mermaid `flowchart LR` + `subgraph` per phase. Critical-path edges made explicit (every blueprint install depends on #338 bp-flux RBAC; #385 catalyst-platform is the convergence node; #319 + #374 + #370 gate Phase 8). Adds reading-key prose under the diagram.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
7a21c2724f
|
docs(wbs): drop bp-traefik from minimal Sovereign set, replace with Cilium Gateway API migration (#387) (#388)
Per founder correction 2026-05-01:
- Sovereigns use Cilium + Envoy + Gateway API (gateway.networking.k8s.io/v1)
- Traefik stays contabo-only for legacy nova/website demos per ADR §9.4
- bp-traefik was never a Sovereign blueprint
- #372 closed; #387 is the actual gap (per-blueprint chart audit to migrate Ingress → HTTPRoute/Gateway)

Minimal blueprint count: 24 → 23. Status field updated.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
43839526fe
|
docs(wbs): omantel handover work-breakdown structure (#369) (#386)
Canonical reference for the minimal self-sufficient Sovereign blueprint set, the 7-phase DAG, per-ticket dependencies, realistic timeline, and the DoD execution checklist. Companion to #369 epic and ADR-0001.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
664697995a |
deploy: update catalyst images to dba8a80
|
||
|
|
dba8a80c36
|
test(catalyst-ui): popover-aware legend assertions in cloud-architecture suite (#366 follow-up) (#368)
* fix(catalyst-ui): list view — chip strip in toolbar replaces 12-tile card grid Issue #366 item 1. The 12-tile resource-kind card grid + redundant dropdown were pushing the active list table below the fold. Replaced with a compact horizontal chip strip rendered inline in the CloudPage toolbar between the Graph|List view toggle and the fullscreen button (List view only). 6 primary chips render inline (Clusters, vClusters, Node Pools, PVCs, Load Balancers, Buckets); the remaining 6 overflow kinds live in a + More popover. The kind catalogue (icons, labels, primary/overflow split, validation helpers) is extracted to a single source of truth at cloud-list/kinds.ts so CloudListView (active-list dispatcher) and CloudKindChips (toolbar strip) share one definition. CloudListView's body collapses to just the active list table — the toolbar owns the switcher affordance. The CloudPage toolbar simultaneously absorbs the centre-slot title move (issue #366 item 2 — pageTitle prop on PortalShell), the fullscreen icon-only button (issue #366 item 4), and :fullscreen CSS that fills the viewport. Subsequent commits in this PR cover the remaining items. Per docs/INVIOLABLE-PRINCIPLES.md #4, every chip / kind id / icon flows through a typed constant — no hand-maintained string list at any call site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-ui): PortalShell — page title in header centre slot, drop body title row Issue #366 item 2. The Sovereign-portal pages all rendered an empty 56px header band on top of the body, with the H1 page title sitting in a separate row below. Wasted ~80px of vertical real-estate on every page (Apps, Jobs, Dashboard, Cloud, AppDetail, JobDetail, JobsTimeline, FlowPage). PortalShell now exposes a 3-slot flex header: • [data-testid=portal-header-left] — breadcrumb / back link. • [data-testid=portal-header-center] — h1 title at [data-testid=portal-header-title]. 
• [data-testid=portal-header-right] — page-specific affordances (FQDN switcher, provisioning pill) + ThemeToggle. Each slot grabs flex: 1 so the title is visually centred regardless of whether the side slots have content. Pages pass `pageTitle`, `headerSlotLeft`, and `headerSlotRight` as props — no page renders a body H1 row anymore (the legacy testids `cloud-title`, `dashboard-title`, `sov-jobs-timeline-heading` are preserved as hidden anchors so unit tests keep working). CloudPage was migrated alongside the chip strip in the previous commit; this commit migrates the rest of the PortalShell consumers. Per docs/INVIOLABLE-PRINCIPLES.md #4, the slot layout is Tailwind utility classes — no inline px / hex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-ui): GraphCanvas — actually consume EDGE_STROKE/DASHED/MARKER_END per edge type Issue #366 item 3 (first half). The GraphCanvas already wired EDGE_STROKE / EDGE_DASHED / EDGE_MARKER_START / EDGE_MARKER_END per edge type, but founder feedback was that the visible canvas didn't read as ArchiMate-styled — edges blurred together at the default 1.5px / 0.75 opacity stroke and the marker presence was hard to verify. Bumped the live-edge stroke from 1.5px / 0.75 opacity to 1.75px / 0.85 so the type-coloured stroke + marker reads against the canvas, and exposed the resolved marker / dashed metadata via data-marker-start, data-marker-end, data-dashed attributes on each <line> so Playwright can assert the wiring without poking at the React state. This pairs with the legend-popover work in the next commit — the two together close item 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-ui): ArchiMate legend becomes Popover with persistence Issue #366 item 3 (second half). The 8-row ArchiMate legend at the bottom of the Architecture graph was a permanent panel that crowded the canvas vertical real estate. 
Founder feedback: make it a Popover that's closed by default, surfaced behind a single ⓘ ArchiMate connections (12) trigger button. Added EdgeLegendPopover in ArchitectureGraphPage: • Trigger button always visible at the bottom of the graph. • Click → opens the legend in an absolutely-positioned popover above the trigger. • Click-outside / Escape / explicit ✕ button closes. • Open state persists in localStorage `sov-arch-legend-open` so operators who prefer always-visible can keep it pinned. The existing legend body (8 ArchiMate-symbol thumbnails + relation names + counts) is preserved verbatim inside the popover, so the visual contract of the legend itself is unchanged — only the chrome around it. The Architecture.test.tsx vitest case + the cloud-architecture.spec.ts Playwright case both update to click the trigger before asserting the inner rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(catalyst-ui): Playwright cases + screenshots for #366 polish Adds e2e/post-v2-polish-366.spec.ts which locks in all four post-v2 UX polish items end-to-end on the deployed surface: 1. Chip strip in toolbar — assert toolbar contains the chip strip element, the legacy 12-tile grid is gone, and the active list table is in the viewport at 1440x900. 2. Header centre slot title — visit Apps, Jobs, Dashboard, Cloud, assert portal-header-title is visible inside portal-header-center with the right text. 3. ArchiMate edges — read marker-start / marker-end attributes from `[data-edge-type=contains]` and `[data-edge-type=runs-on]` lines and assert at least one of each carries the relation-correct marker URL. Legend trigger button always visible; legend body only present after click; localStorage `sov-arch-legend-open` flips on open. 4. Fullscreen — fullscreen toggle has no visible text (icon only), aria-label preserved; clicking flips data-fullscreen=true and the cloud-content bounding box is at viewport height (≥700px @ 900px viewport). 
Captures 5 screenshots at 1440x900:
• p366-chip-strip-list.png
• p366-centre-title-cloud.png
• p366-archimate-legend-popover.png
• p366-archimate-edges-zoomed.png
• p366-fullscreen-100pct.png

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(catalyst-ui): also flip cloud-architecture polish suite to popover-aware legend

Two existing legend assertions in cloud-architecture.spec.ts (the "shows ArchiMate-style symbol thumbnails for every relation type" case at line 305 and the polish-screenshot case at line 411) still expected the legend to be a permanent panel. Updated them to click the trigger button first so the popover body is in the DOM before the assertions run. Closes the last gap from #366 item 3 — full deployed-SHA Playwright suite is now 48/48 green against console.openova.io.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
adf06a7ec2 |
deploy: update catalyst images to 98f2a36
|
||
|
|
98f2a360f2
|
fix(catalyst-ui): post-v2 UX polish — chip strip + centre title + ArchiMate edges + fullscreen height (#366) (#367)
* fix(catalyst-ui): list view — chip strip in toolbar replaces 12-tile card grid Issue #366 item 1. The 12-tile resource-kind card grid + redundant dropdown were pushing the active list table below the fold. Replaced with a compact horizontal chip strip rendered inline in the CloudPage toolbar between the Graph|List view toggle and the fullscreen button (List view only). 6 primary chips render inline (Clusters, vClusters, Node Pools, PVCs, Load Balancers, Buckets); the remaining 6 overflow kinds live in a + More popover. The kind catalogue (icons, labels, primary/overflow split, validation helpers) is extracted to a single source of truth at cloud-list/kinds.ts so CloudListView (active-list dispatcher) and CloudKindChips (toolbar strip) share one definition. CloudListView's body collapses to just the active list table — the toolbar owns the switcher affordance. The CloudPage toolbar simultaneously absorbs the centre-slot title move (issue #366 item 2 — pageTitle prop on PortalShell), the fullscreen icon-only button (issue #366 item 4), and :fullscreen CSS that fills the viewport. Subsequent commits in this PR cover the remaining items. Per docs/INVIOLABLE-PRINCIPLES.md #4, every chip / kind id / icon flows through a typed constant — no hand-maintained string list at any call site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-ui): PortalShell — page title in header centre slot, drop body title row Issue #366 item 2. The Sovereign-portal pages all rendered an empty 56px header band on top of the body, with the H1 page title sitting in a separate row below. Wasted ~80px of vertical real-estate on every page (Apps, Jobs, Dashboard, Cloud, AppDetail, JobDetail, JobsTimeline, FlowPage). PortalShell now exposes a 3-slot flex header: • [data-testid=portal-header-left] — breadcrumb / back link. • [data-testid=portal-header-center] — h1 title at [data-testid=portal-header-title]. 
• [data-testid=portal-header-right] — page-specific affordances (FQDN switcher, provisioning pill) + ThemeToggle. Each slot grabs flex: 1 so the title is visually centred regardless of whether the side slots have content. Pages pass `pageTitle`, `headerSlotLeft`, and `headerSlotRight` as props — no page renders a body H1 row anymore (the legacy testids `cloud-title`, `dashboard-title`, `sov-jobs-timeline-heading` are preserved as hidden anchors so unit tests keep working). CloudPage was migrated alongside the chip strip in the previous commit; this commit migrates the rest of the PortalShell consumers. Per docs/INVIOLABLE-PRINCIPLES.md #4, the slot layout is Tailwind utility classes — no inline px / hex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-ui): GraphCanvas — actually consume EDGE_STROKE/DASHED/MARKER_END per edge type Issue #366 item 3 (first half). The GraphCanvas already wired EDGE_STROKE / EDGE_DASHED / EDGE_MARKER_START / EDGE_MARKER_END per edge type, but founder feedback was that the visible canvas didn't read as ArchiMate-styled — edges blurred together at the default 1.5px / 0.75 opacity stroke and the marker presence was hard to verify. Bumped the live-edge stroke from 1.5px / 0.75 opacity to 1.75px / 0.85 so the type-coloured stroke + marker reads against the canvas, and exposed the resolved marker / dashed metadata via data-marker-start, data-marker-end, data-dashed attributes on each <line> so Playwright can assert the wiring without poking at the React state. This pairs with the legend-popover work in the next commit — the two together close item 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-ui): ArchiMate legend becomes Popover with persistence Issue #366 item 3 (second half). The 8-row ArchiMate legend at the bottom of the Architecture graph was a permanent panel that crowded the canvas vertical real estate. 
Founder feedback: make it a Popover that's closed by default, surfaced behind a single ⓘ ArchiMate connections (12) trigger button. Added EdgeLegendPopover in ArchitectureGraphPage: • Trigger button always visible at the bottom of the graph. • Click → opens the legend in an absolutely-positioned popover above the trigger. • Click-outside / Escape / explicit ✕ button closes. • Open state persists in localStorage `sov-arch-legend-open` so operators who prefer always-visible can keep it pinned. The existing legend body (8 ArchiMate-symbol thumbnails + relation names + counts) is preserved verbatim inside the popover, so the visual contract of the legend itself is unchanged — only the chrome around it. The Architecture.test.tsx vitest case + the cloud-architecture.spec.ts Playwright case both update to click the trigger before asserting the inner rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(catalyst-ui): Playwright cases + screenshots for #366 polish Adds e2e/post-v2-polish-366.spec.ts which locks in all four post-v2 UX polish items end-to-end on the deployed surface: 1. Chip strip in toolbar — assert toolbar contains the chip strip element, the legacy 12-tile grid is gone, and the active list table is in the viewport at 1440x900. 2. Header centre slot title — visit Apps, Jobs, Dashboard, Cloud, assert portal-header-title is visible inside portal-header-center with the right text. 3. ArchiMate edges — read marker-start / marker-end attributes from `[data-edge-type=contains]` and `[data-edge-type=runs-on]` lines and assert at least one of each carries the relation-correct marker URL. Legend trigger button always visible; legend body only present after click; localStorage `sov-arch-legend-open` flips on open. 4. Fullscreen — fullscreen toggle has no visible text (icon only), aria-label preserved; clicking flips data-fullscreen=true and the cloud-content bounding box is at viewport height (≥700px @ 900px viewport). 
Captures 5 screenshots at 1440x900:
• p366-chip-strip-list.png
• p366-centre-title-cloud.png
• p366-archimate-legend-popover.png
• p366-archimate-edges-zoomed.png
• p366-fullscreen-100pct.png

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
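Editor's sketch (not part of the commit): the legend popover described in this commit is closed by default and persists its open state under the localStorage key `sov-arch-legend-open`. A minimal TypeScript illustration of that persistence follows. The `KeyValueStore` interface, the `MemoryStore` stand-in, the helper names, and the `"true"`/`"false"` encoding are all assumptions made for illustration; the commit message does not say how the value is stored.

```typescript
// Minimal storage contract; window.localStorage satisfies it in a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Key name taken from the commit message.
const LEGEND_OPEN_KEY = "sov-arch-legend-open";

// Closed by default: anything other than the literal "true" reads as closed.
// (The "true"/"false" encoding is an assumption, not confirmed by the source.)
function readLegendOpen(store: KeyValueStore): boolean {
  return store.getItem(LEGEND_OPEN_KEY) === "true";
}

function writeLegendOpen(store: KeyValueStore, open: boolean): void {
  store.setItem(LEGEND_OPEN_KEY, open ? "true" : "false");
}

// In-memory stand-in so the sketch runs outside a browser (hypothetical helper).
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In a browser, `window.localStorage` could be passed wherever a `KeyValueStore` is expected.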
||
|
|
19dcd0a147 | docs(lessons-learned): renaming persisted JSON tag silently drops legacy data (#351) | ||
|
|
3a8181fac6 |
deploy: update catalyst images to ba09007
|
||
|
|
ba09007427
|
fix(catalyst-api): migrate legacy batchId + synthesize missing parent groups on read (#351) (#365)
Old deployments (e.g. ce476aaf80731a46) were provisioned before #351 landed. Their on-disk index.json carries the deprecated `batchId` JSON field; after the rename the field is silently dropped, leaving every leaf orphaned. The bridge only writes parents on NEW events, so the canvas + table render zero parent relationships for old data.

Three changes restore the relationship without a data migration:

1. Job.LegacyBatchID — read-only `batchId` JSON tag for read-tolerant unmarshal. Stripped before every persistIndex write.
2. loadIndex — when ParentID is empty and LegacyBatchID is non-empty, ParentID is set to JobID(deploymentID, batchID); LegacyBatchID is cleared. Pre-refactor leaves with empty Type default to JobTypeInstall.
3. deriveTreeView — every leaf whose ParentID points at an id without a corresponding on-disk row triggers an in-memory synthesized group Job (Type=group, DisplayName resolved from the slug). The synthesis runs BEFORE the rollup pass so the synthesized group participates in childIds + status + timing aggregation just like a real on-disk parent.

New deployments are unaffected (their bridge writes the parent row directly).

Test: TestStore_LegacyBatchID_HoistedToParentID hand-writes a pre-#351 index.json with `batchId` only, asserts ListJobs returns 3 jobs (2 leaves + 1 synthesized group) with rolled-up running status, ChildIDs populated, and LegacyBatchID cleared on the leaves. TestStore_UpsertJob_RoundTrip updated to assert the new behaviour: inserting a leaf whose ParentID points at the bootstrap-kit group returns 2 jobs from ListJobs (leaf + synthesized parent).

Refs #351

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
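Editor's sketch (not part of the commit): the actual fix lives in the Go catalyst-api, so the TypeScript below is only an illustration of the two read-side steps the message describes, hoisting a legacy `batchId` into the parent id and synthesizing a group for any parent id with no row. The `Job` shape, the `jobId` format, and both function names are hypothetical; status and timing rollup is omitted.

```typescript
type JobType = "install" | "group";

interface Job {
  id: string;
  type?: JobType;
  parentId?: string;
  batchId?: string; // legacy field, tolerated on read only
  childIds?: string[];
}

// Hypothetical id scheme; the real JobID(deploymentID, batchID) format is not shown in the source.
const jobId = (deploymentId: string, slug: string): string => `${deploymentId}:${slug}`;

// Step 2 of the commit: derive parentId from the legacy field, drop the legacy field,
// and default a missing type to "install".
function hoistLegacyBatchId(jobs: Job[], deploymentId: string): Job[] {
  return jobs.map((job) => {
    const { batchId, ...rest } = job;
    const migrated: Job = { ...rest, type: rest.type ?? "install" };
    if (!migrated.parentId && batchId) {
      migrated.parentId = jobId(deploymentId, batchId);
    }
    return migrated;
  });
}

// Step 3 of the commit: any parentId without a matching row gets an in-memory group job.
function synthesizeMissingParents(jobs: Job[]): Job[] {
  const known = new Set(jobs.map((job) => job.id));
  const children = new Map<string, string[]>();
  for (const job of jobs) {
    if (!job.parentId || known.has(job.parentId)) continue;
    const list = children.get(job.parentId) ?? [];
    list.push(job.id);
    children.set(job.parentId, list);
  }
  const groups: Job[] = Array.from(children.entries()).map(
    ([id, childIds]): Job => ({ id, type: "group", childIds }),
  );
  return [...jobs, ...groups];
}
```

Running both steps over two legacy leaves that share one `batchId` yields three jobs, mirroring the shape the commit's test asserts (2 leaves + 1 synthesized group).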
||
|
|
45fd2b5d9a |
deploy: update catalyst images to c183e76
|
||
|
|
c183e760ac
|
feat: Cloud IA restructure + graph/list toggle + fullscreen + cloud icon (#350) (#364)
* feat(catalyst-ui): sidebar — single Cloud entry, drop accordion, IconCloud Issue openova-io/openova#350 phase 1. Replaces the two-level Cloud accordion (#309 P3) with a single flat <Link> entry. The new Cloud parent page (CloudPage.tsx) owns the in-page graph/list view dispatch and resource-kind switching, so the sidebar no longer needs to expose category/resource sub-items. Drops: - sov-nav-cloud-toggle (button → link) - sov-nav-cloud-{architecture,compute,network,storage} sub-items - sov-nav-cloud-{compute,network,storage}-toggle second-level toggles - sov-nav-cloud-{compute,network,storage}-{clusters,vclusters,…} sub-sub items - localStorage keys sov-nav-cloud(-{compute,network,storage})-expanded (no longer relevant; the parent page has its own persistence) Adds: - Cloud icon swapped from server-stack rectangles to the verbatim Tabler IconCloud path (lifted from @tabler/icons-react v3.41.1). Active-state matcher unchanged: Cloud highlights on any /cloud/* or legacy /infrastructure/* path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): CloudPage parent shell with graph/list toggle + fullscreen Issue openova-io/openova#350 phases 2 + 4. Promotes CloudPage from a thin <Outlet /> host (#309) to the parent view shell for the consolidated Cloud surface. The page now: - Renders the canonical header (title + tagline + Sovereign switcher). - Adds a segmented View toggle (Graph | List) immediately below. - Owns the active view via the URL ?view= query, falling back to a persisted `sov-cloud-view` localStorage key, falling back to graph. - Dispatches the body: view=graph → Architecture (force-graph); view=list → CloudListView (12-tile grid + active list table). - Adds a fullscreen toggle button with smooth scale + fade transition (~250ms). Native `requestFullscreen()` on the content container; falls back to a synthetic-overlay state when the user-agent denies. 
Esc exits (browser-native); a floating "Exit fullscreen" button is rendered inside the overlay (top-right). - aria-pressed on the fullscreen toggle reflects state. - Preserves the Sovereign-switcher cross-Sovereign navigation, now carrying the active view + kind on the redirect. The URL is canonicalised on every navigation (replace:true) so deep links and bookmarks always carry an explicit view param. Tests: - CloudPage.test.tsx asserts the segmented control is present and aria-selected reflects state, the fullscreen toggle button is present with aria-pressed=false, and the legacy in-page tab strip remains absent. - Architecture.test.tsx is updated to mount the new shell with viewOverride='graph' (the production dispatch path); the legacy /cloud/architecture child route is no longer needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): CloudListView — card grid + dropdown switcher reusing P3 list components Issue openova-io/openova#350 phase 3. CloudListView is the body rendered by CloudPage when view=list. It replaces the previous CloudComputePage / CloudNetworkPage / CloudStoragePage three-tile category surfaces with a single 12-tile card grid covering every resource kind in one place. Surface contract: - Top-of-page: a 12-tile resource card grid (Clusters, vClusters, Node Pools, Worker Nodes, Load Balancers, Services, Ingresses, DNS Zones, PVCs, Buckets, Volumes, Storage Classes). Each tile shows an icon + count + tagline; clicking sets the active kind. Tiles whose informer isn't wired yet (Services / Ingresses / DNS Zones / Storage Classes) show a "—" instead of a count. - Toolbar: a compact <select> dropdown that mirrors the card-grid selection — alternative kbd-driven path. - Below: the active kind's existing P3 list page rendered inline. Components (ClustersPage, PvcsPage, …) are reused as-is — none of them rewritten. 
Active-kind state lives in the URL (?kind=…) and persists to localStorage under `sov-cloud-list-kind`. The URL takes precedence on mount so deep links / shared URLs always win. Per docs/INVIOLABLE-PRINCIPLES.md #1 (target-state shape) — the entire 12-resource list view ships in this first cut. No "for now" stubs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): router consolidation + redirects from old /cloud/<category>/<resource> URLs Issue openova-io/openova#350 phase 5. Consolidates the seventeen P3 sub-routes (#309) into the single Cloud parent route plus a redirect-only chain. The route tree now has: /provision/$id/cloud ↳ /architecture → ?view=graph ↳ /compute → ?view=list&kind=clusters ↳ /compute/clusters → ?view=list&kind=clusters ↳ /compute/vclusters → ?view=list&kind=vclusters ↳ /compute/node-pools → ?view=list&kind=node-pools ↳ /compute/worker-nodes → ?view=list&kind=worker-nodes ↳ /network → ?view=list&kind=load-balancers ↳ /network/services → ?view=list&kind=services ↳ /network/ingresses → ?view=list&kind=ingresses ↳ /network/load-balancers → ?view=list&kind=load-balancers ↳ /network/dns-zones → ?view=list&kind=dns-zones ↳ /storage → ?view=list&kind=pvcs ↳ /storage/pvcs → ?view=list&kind=pvcs ↳ /storage/storage-classes → ?view=list&kind=storage-classes ↳ /storage/buckets → ?view=list&kind=buckets ↳ /storage/volumes → ?view=list&kind=volumes /provision/$id/infrastructure → /cloud?view=graph (legacy P1) ↳ /topology → /cloud?view=graph ↳ /compute → /cloud?view=list&kind=clusters ↳ /storage → /cloud?view=list&kind=pvcs ↳ /network → /cloud?view=list&kind=load-balancers Redirects fire in `beforeLoad` so they happen before paint. The Cloud parent route gains a `validateSearch` schema for ?view= and ?kind= query params, narrowing the type to the union of valid values. 
The four CloudComputePage / CloudNetworkPage / CloudStoragePage landing pages are dropped from the route tree (their function is folded into CloudListView's card grid). The per-resource list pages (ClustersPage / PvcsPage / …) remain — they're imported and rendered by CloudListView based on active kind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(catalyst-ui): Playwright e2e/cloud-shell.spec.ts + screenshots Issue openova-io/openova#350 phase 6. New: e2e/cloud-shell.spec.ts (17 tests) - Sidebar exposes a single flat Cloud entry (no accordion / chevron / sub-items / second-level toggles). - Clicking Cloud lands on /cloud and canonicalises ?view=graph. - View toggle switches Graph ↔ List, persists across reload via localStorage `sov-cloud-view`. - List view: 12 resource tiles render with counts; clicking a tile switches the active list and updates the URL. - Dropdown switcher mirrors the active kind and changes it. - Fullscreen toggle flips data-fullscreen + aria-pressed; the floating Exit button restores the windowed state. - 10 legacy /cloud/<category>(/<resource>)? URLs redirect to the consolidated query-string shape. - 1440×900 screenshots: graph view, list view (PVCs), fullscreen graph, sidebar Cloud icon close-up. Updated: e2e/cloud-nav.spec.ts (#309 P1 → #350 IA restructure) - Asserts the Cloud entry is a flat link, not an accordion button. - Legacy /infrastructure/* paths redirect to the new query-string shape. Updated: e2e/cloud-list-pages.spec.ts - Drops the accordion-second-level test (replaced by the cloud-shell tile-grid coverage). - Replaces the "category landing has 4 tiles" check with the consolidated 12-tile grid count. - Bumps the screenshot-sweep timeout to 120s (12 redirects + waits blow past the default 30s). Updated: e2e/cosmetic-guards.spec.ts - Cloud sidebar entry is a flat anchor (no accordion contracts). - Per-Sovereign switcher check uses the new /cloud?view=graph URL. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hati@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
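Editor's sketch (not part of the commit): the CloudPage commit above says the active view comes from the URL `?view=` query, falls back to the persisted `sov-cloud-view` localStorage value, and finally falls back to graph. A minimal TypeScript illustration of that precedence follows. The function names are hypothetical, and treating an unrecognised URL value as absent is an assumption rather than confirmed behaviour.

```typescript
type CloudView = "graph" | "list";

// Narrow an arbitrary string to the two valid views.
function isCloudView(value: string | null): value is CloudView {
  return value === "graph" || value === "list";
}

// Precedence per the commit message: URL query, then persisted value, then "graph".
// urlView would come from ?view=, storedView from localStorage "sov-cloud-view".
function resolveCloudView(urlView: string | null, storedView: string | null): CloudView {
  if (isCloudView(urlView)) return urlView;
  if (isCloudView(storedView)) return storedView;
  return "graph";
}
```

Keeping the resolver a pure function of two strings makes the precedence testable without a router or a browser.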
||
|
|
b4e7455e41 |
deploy: update catalyst images to 3459597
|
||
|
|
3459597589
|
feat(catalyst-ui): Cloud IA restructure + graph/list toggle + fullscreen + cloud icon (#350) (#363)
* feat(catalyst-ui): sidebar — single Cloud entry, drop accordion, IconCloud Issue openova-io/openova#350 phase 1. Replaces the two-level Cloud accordion (#309 P3) with a single flat <Link> entry. The new Cloud parent page (CloudPage.tsx) owns the in-page graph/list view dispatch and resource-kind switching, so the sidebar no longer needs to expose category/resource sub-items. Drops: - sov-nav-cloud-toggle (button → link) - sov-nav-cloud-{architecture,compute,network,storage} sub-items - sov-nav-cloud-{compute,network,storage}-toggle second-level toggles - sov-nav-cloud-{compute,network,storage}-{clusters,vclusters,…} sub-sub items - localStorage keys sov-nav-cloud(-{compute,network,storage})-expanded (no longer relevant; the parent page has its own persistence) Adds: - Cloud icon swapped from server-stack rectangles to the verbatim Tabler IconCloud path (lifted from @tabler/icons-react v3.41.1). Active-state matcher unchanged: Cloud highlights on any /cloud/* or legacy /infrastructure/* path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): CloudPage parent shell with graph/list toggle + fullscreen Issue openova-io/openova#350 phases 2 + 4. Promotes CloudPage from a thin <Outlet /> host (#309) to the parent view shell for the consolidated Cloud surface. The page now: - Renders the canonical header (title + tagline + Sovereign switcher). - Adds a segmented View toggle (Graph | List) immediately below. - Owns the active view via the URL ?view= query, falling back to a persisted `sov-cloud-view` localStorage key, falling back to graph. - Dispatches the body: view=graph → Architecture (force-graph); view=list → CloudListView (12-tile grid + active list table). - Adds a fullscreen toggle button with smooth scale + fade transition (~250ms). Native `requestFullscreen()` on the content container; falls back to a synthetic-overlay state when the user-agent denies. 
Esc exits (browser-native); a floating "Exit fullscreen" button is rendered inside the overlay (top-right). - aria-pressed on the fullscreen toggle reflects state. - Preserves the Sovereign-switcher cross-Sovereign navigation, now carrying the active view + kind on the redirect. The URL is canonicalised on every navigation (replace:true) so deep links and bookmarks always carry an explicit view param. Tests: - CloudPage.test.tsx asserts the segmented control is present and aria-selected reflects state, the fullscreen toggle button is present with aria-pressed=false, and the legacy in-page tab strip remains absent. - Architecture.test.tsx is updated to mount the new shell with viewOverride='graph' (the production dispatch path); the legacy /cloud/architecture child route is no longer needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): CloudListView — card grid + dropdown switcher reusing P3 list components Issue openova-io/openova#350 phase 3. CloudListView is the body rendered by CloudPage when view=list. It replaces the previous CloudComputePage / CloudNetworkPage / CloudStoragePage three-tile category surfaces with a single 12-tile card grid covering every resource kind in one place. Surface contract: - Top-of-page: a 12-tile resource card grid (Clusters, vClusters, Node Pools, Worker Nodes, Load Balancers, Services, Ingresses, DNS Zones, PVCs, Buckets, Volumes, Storage Classes). Each tile shows an icon + count + tagline; clicking sets the active kind. Tiles whose informer isn't wired yet (Services / Ingresses / DNS Zones / Storage Classes) show a "—" instead of a count. - Toolbar: a compact <select> dropdown that mirrors the card-grid selection — alternative kbd-driven path. - Below: the active kind's existing P3 list page rendered inline. Components (ClustersPage, PvcsPage, …) are reused as-is — none of them rewritten. 
Active-kind state lives in the URL (?kind=…) and persists to localStorage under `sov-cloud-list-kind`. The URL takes precedence on mount so deep links / shared URLs always win. Per docs/INVIOLABLE-PRINCIPLES.md #1 (target-state shape) — the entire 12-resource list view ships in this first cut. No "for now" stubs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): router consolidation + redirects from old /cloud/<category>/<resource> URLs Issue openova-io/openova#350 phase 5. Consolidates the seventeen P3 sub-routes (#309) into the single Cloud parent route plus a redirect-only chain. The route tree now has: /provision/$id/cloud ↳ /architecture → ?view=graph ↳ /compute → ?view=list&kind=clusters ↳ /compute/clusters → ?view=list&kind=clusters ↳ /compute/vclusters → ?view=list&kind=vclusters ↳ /compute/node-pools → ?view=list&kind=node-pools ↳ /compute/worker-nodes → ?view=list&kind=worker-nodes ↳ /network → ?view=list&kind=load-balancers ↳ /network/services → ?view=list&kind=services ↳ /network/ingresses → ?view=list&kind=ingresses ↳ /network/load-balancers → ?view=list&kind=load-balancers ↳ /network/dns-zones → ?view=list&kind=dns-zones ↳ /storage → ?view=list&kind=pvcs ↳ /storage/pvcs → ?view=list&kind=pvcs ↳ /storage/storage-classes → ?view=list&kind=storage-classes ↳ /storage/buckets → ?view=list&kind=buckets ↳ /storage/volumes → ?view=list&kind=volumes /provision/$id/infrastructure → /cloud?view=graph (legacy P1) ↳ /topology → /cloud?view=graph ↳ /compute → /cloud?view=list&kind=clusters ↳ /storage → /cloud?view=list&kind=pvcs ↳ /network → /cloud?view=list&kind=load-balancers Redirects fire in `beforeLoad` so they happen before paint. The Cloud parent route gains a `validateSearch` schema for ?view= and ?kind= query params, narrowing the type to the union of valid values. 
The three CloudComputePage / CloudNetworkPage / CloudStoragePage landing pages are dropped from the route tree (their function is folded into CloudListView's card grid). The per-resource list pages (ClustersPage / PvcsPage / …) remain; they're imported and rendered by CloudListView based on active kind.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(catalyst-ui): Playwright e2e/cloud-shell.spec.ts + screenshots
Issue openova-io/openova#350 phase 6.
New: e2e/cloud-shell.spec.ts (17 tests)
- Sidebar exposes a single flat Cloud entry (no accordion / chevron / sub-items / second-level toggles).
- Clicking Cloud lands on /cloud and canonicalises ?view=graph.
- View toggle switches Graph ↔ List, persists across reload via localStorage `sov-cloud-view`.
- List view: 12 resource tiles render with counts; clicking a tile switches the active list and updates the URL.
- Dropdown switcher mirrors the active kind and changes it.
- Fullscreen toggle flips data-fullscreen + aria-pressed; the floating Exit button restores the windowed state.
- 10 legacy /cloud/<category>(/<resource>)? URLs redirect to the consolidated query-string shape.
- 1440×900 screenshots: graph view, list view (PVCs), fullscreen graph, sidebar Cloud icon close-up.
Updated: e2e/cloud-nav.spec.ts (#309 P1 → #350 IA restructure)
- Asserts the Cloud entry is a flat link, not an accordion button.
- Legacy /infrastructure/* paths redirect to the new query-string shape.
Updated: e2e/cloud-list-pages.spec.ts
- Drops the accordion-second-level test (replaced by the cloud-shell tile-grid coverage).
- Replaces the "category landing has 4 tiles" check with the consolidated 12-tile grid count.
- Bumps the screenshot-sweep timeout to 120s (12 redirects + waits blow past the default 30s).
Updated: e2e/cosmetic-guards.spec.ts
- Cloud sidebar entry is a flat anchor (no accordion contracts).
- Per-Sovereign switcher check uses the new /cloud?view=graph URL.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hati@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
4588492e10 |
docs(lessons-learned): Helm hooks + CRD ordering, catalyst-bootstrap-api credentials behavior
Two lessons from the #318 / #346 wipe-endpoint shipping pass:
1. helm-hooks-and-crd-ordering.md: `helm.sh/hook-delete-policy: before-hook-creation` deadlocks on first install when the CRD comes from the same chart's upstream subchart. The lookup runs before the subchart's CRDs finish registering. Hit twice (bp-crossplane@1.1.2 in PR #247, bp-external-secrets@1.0.0 in PR #334). The architectural fix is the same in both cases: chart-split + Flux dependsOn so the CR chart only starts after the controller is Ready=True.
2. catalyst-bootstrap-api.md: catalyst-api intentionally GCs the in-memory Hetzner token after writeTfvars per credential hygiene, but `tofu destroy` still works against the on-disk workdir without re-prompting because the token is persisted into tofu.auto.tfvars.json on the PVC. Verified during #318 wipe-endpoint testing. The body-supplied token at the wipe endpoint is for the Hetzner-direct orphan-purge safety net, not for tofu itself. Reviewers should not add re-prompt-or-401 guards on the tofu path.
Refs: #318 #331 #247
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9e7bfc6e3a
|
fix(catalyst-ui): live deployed-SHA Playwright fixes for #348 P1 (#362)
Three deployed-SHA validation fixes uncovered by running the new e2e
suite against console.openova.io:
1. Drop the hidden legacy `infrastructure-detail-panel-neighbor-{id}`
span in DetailPanel — having display:none on it broke the legacy
test 4's `toBeVisible()` assertion. The legacy testid was not
needed; the existing tests now key off the new
`arch-detail-panel-neighbor-{relation}-{id}` ids.
2. Tighten the NodePool+PVC isolation test selector from
`[data-testid^="arch-graph-node-"]` to `g[data-node-type]` — the
broad prefix selector was matching the per-icon test ids
(`arch-graph-node-icon-{type}`) which don't carry data-node-type
and produced null `getAttribute()` reads.
3. Make the ArchiMate legend close-up screenshot resilient to a
legend that's below the viewport: scrollIntoViewIfNeeded() and
bound the clip box against the actual viewport size before
passing to page.screenshot.
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
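Fix 3 above bounds the screenshot clip box against the viewport. A minimal sketch of that clamping step follows, assuming the box comes from the element's bounding box after scrolling it into view. `clampClipToViewport`, `ClipBox` and `ViewportSize` are hypothetical names, not identifiers from the spec file.

```typescript
// Hypothetical helper; the field names match the { x, y, width, height }
// shape that Playwright's screenshot `clip` option expects.
interface ClipBox { x: number; y: number; width: number; height: number }
interface ViewportSize { width: number; height: number }

// Intersect the requested box with the viewport rectangle so the clip
// never extends past the visible area.
function clampClipToViewport(box: ClipBox, viewport: ViewportSize): ClipBox {
  const x = Math.min(Math.max(box.x, 0), viewport.width);
  const y = Math.min(Math.max(box.y, 0), viewport.height);
  const right = Math.min(Math.max(box.x + box.width, 0), viewport.width);
  const bottom = Math.min(Math.max(box.y + box.height, 0), viewport.height);
  return { x, y, width: right - x, height: bottom - y };
}
```

A caller would typically skip the screenshot (or fail the test with a clear message) when the clamped box has zero width or height, since that means the element is entirely outside the viewport.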
|
||
|
|
18b42680da
|
fix(catalyst-ui): live deployed-SHA Playwright fixes for #348 P1 (#361)
Three deployed-SHA validation fixes uncovered by running the new e2e
suite against console.openova.io:
1. Drop the hidden legacy `infrastructure-detail-panel-neighbor-{id}`
span in DetailPanel — having display:none on it broke the legacy
test 4's `toBeVisible()` assertion. The legacy testid was not
needed; the existing tests now key off the new
`arch-detail-panel-neighbor-{relation}-{id}` ids.
2. Tighten the NodePool+PVC isolation test selector from
`[data-testid^="arch-graph-node-"]` to `g[data-node-type]` — the
broad prefix selector was matching the per-icon test ids
(`arch-graph-node-icon-{type}`) which don't carry data-node-type
and produced null `getAttribute()` reads.
3. Make the ArchiMate legend close-up screenshot resilient to a
legend that's below the viewport: scrollIntoViewIfNeeded() and
bound the clip box against the actual viewport size before
passing to page.screenshot.
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
433dd33943 |
deploy: update catalyst images to 5862fce
|
||
|
|
5862fcec3b
|
feat: Architecture graph polish (P1 of #348) (#360)
* feat(catalyst-ui): SMALL_TYPE_THRESHOLD + auto-100% density for small types Item 1 of #348. Small types (total < 20) bypass the global density slider's per-type cap calculation and always render at 100% as long as the chip is active. Threshold is exported from widgets/architecture-graph/types.ts so adapter, page, GraphCanvas, and the test suite all key off the same constant. The per-type popover is already short-circuited for small types (chip click toggles visibility without opening the slider) — semantics confirmed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): chip add/remove + full relation cache regardless of active chips Item 2 of #348. The adapter now emits every node type — including PVC, Bucket, Volume (storage block) and reserved Service / Ingress slots — plus every relation type from the spec (contains, member-of, runs-on, routes-to, attached-to, depends-on, used-by, peers-with, flows-to, realizes, triggers, associates). The page-level orchestrator holds an `activeTypes` Set; chips have an explicit "×" remove button and the strip ends with a "+" Popover that lists inactive types with their counts. Removing a chip filters its nodes out of the canvas; re-adding restores them. The data layer is the single source of truth — chip add/remove never re-queries. Verified the founder's example: removing every chip except NodePool + PVC isolates the canvas to those types and the edges between them. Per ADR-0001 §B4 — "full relation cache" aligns with the #321 informer cache foundation; today's adapter is the placeholder until that lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): relation types in detail panel grouped by relation Item 3 of #348. The right-side detail panel's neighbor list now carries the relation type per neighbor. Neighbors are grouped under sticky per-relation subheaders ordered by ALL_EDGE_TYPES so the panel reads consistently between renders. 
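Items 1 and 2 above (auto-100% density for small types, and chip add/remove filtering a cached data layer) can be sketched as follows. This is an illustration under assumptions: the per-type cap formula `ceil(total * density)` is a guess, and `visibleCount`, `filterByActiveTypes`, `GraphNode` and `GraphEdge` are hypothetical names. Only `SMALL_TYPE_THRESHOLD`, its value of 20 and the `activeTypes` Set are taken from the commit message.

```typescript
// Threshold from the commit message: types with fewer than 20 nodes are "small".
const SMALL_TYPE_THRESHOLD = 20;

interface GraphNode { id: string; type: string }
interface GraphEdge { source: string; target: string; relation: string }

// Number of nodes of one type to render for a global density in [0, 1].
// Assumption: large types are capped at ceil(total * density).
function visibleCount(total: number, density: number): number {
  if (total < SMALL_TYPE_THRESHOLD) return total; // small types bypass the slider
  const clamped = Math.min(Math.max(density, 0), 1);
  return Math.ceil(total * clamped);
}

// Chip add/remove is a pure filter over the cached graph: no re-query.
// An edge survives only if both of its endpoints survive.
function filterByActiveTypes(
  nodes: GraphNode[],
  edges: GraphEdge[],
  activeTypes: ReadonlySet<string>,
): { nodes: GraphNode[]; edges: GraphEdge[] } {
  const keptNodes = nodes.filter((n) => activeTypes.has(n.type));
  const keptIds = new Set(keptNodes.map((n) => n.id));
  const keptEdges = edges.filter((e) => keptIds.has(e.source) && keptIds.has(e.target));
  return { nodes: keptNodes, edges: keptEdges };
}
```

Because the filter never mutates its inputs, re-adding a chip is just another call with a larger `activeTypes` set over the same cached nodes and edges.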
Each row exposes a stable testid: arch-detail-panel-neighbor-{relation}-{nodeId} (plus a hidden legacy infrastructure-detail-panel-neighbor-{nodeId} for backwards-compat with #309 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): ArchiMate edge marker styles + updated legend Item 4 of #348. Each relation type maps to an ArchiMate-derived end decoration: composition (filled diamond at parent end) for `contains`, aggregation (hollow diamond) for `member-of`, assignment (filled dots at both ends) for `runs-on`, triggering (filled triangle) for `routes-to` / `triggers` / `flows-to`, used-by (open triangle) for `depends-on` / `used-by`, realization (hollow triangle) for `realizes`, and association (plain line) for `peers-with` / `associates`. Implementation: SVG `<defs><marker>` patterns rendered into the canvas once per (kind, stroke) pair (`uniqueMarkerDefs`); the marker palette is stable across animation frames so React doesn't re-allocate every tick. Per-edge `markerStart` / `markerEnd` URL refs in the line elements drive the rendering. The legend at the bottom now shows the ArchiMate symbol thumbnail + name + count, with self-contained marker defs scoped to each thumbnail SVG (`-legend` id suffix). `markers.ts` is a separate module so GraphCanvas.tsx satisfies react-refresh/only-export-components. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): bounded physics — nodes constrained to canvas Item 5 of #348. A custom d3-force `forceBound(width, height, padding=20)` clamps each node's x/y inside the canvas every tick. The clamp also handles fx/fy when set via drag-pin so a manual drag past the edge instantly snaps inside. Adaptive physics tiers retuned: charge magnitudes lowered slightly so strong repulsion doesn't fight the bound at small canvas sizes (the ≤50-node tier drops from -240 → -160; the ≤200 tier from -180 → -120, etc.). 
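The bounded-physics clamp from Item 5 can be sketched without importing d3. The sketch relies on the d3-force convention that a custom force is a function of `alpha` with an `initialize(nodes)` method. The body is an assumption about how such a clamp might be written, not the PR's implementation; `SimNode` and `BoundForce` are hypothetical type names.

```typescript
// Minimal node shape following d3-force conventions (x/y position, fx/fy pin).
interface SimNode { x?: number; y?: number; fx?: number | null; fy?: number | null }

// A custom force in the d3-force style: callable per tick, plus initialize().
interface BoundForce {
  (alpha: number): void;
  initialize(nodes: SimNode[]): void;
}

function clamp(v: number, lo: number, hi: number): number {
  return Math.min(Math.max(v, lo), hi);
}

// Keeps every node inside [padding, width - padding] x [padding, height - padding].
// Pinned coordinates (fx/fy) are clamped too, so a drag past the edge snaps inside.
function forceBound(width: number, height: number, padding = 20): BoundForce {
  let nodes: SimNode[] = [];
  const force = ((_alpha: number): void => {
    for (const n of nodes) {
      if (n.x !== undefined) n.x = clamp(n.x, padding, width - padding);
      if (n.y !== undefined) n.y = clamp(n.y, padding, height - padding);
      if (n.fx != null) n.fx = clamp(n.fx, padding, width - padding);
      if (n.fy != null) n.fy = clamp(n.fy, padding, height - padding);
    }
  }) as BoundForce;
  force.initialize = (ns: SimNode[]): void => {
    nodes = ns;
  };
  return force;
}
```

With d3-force this would be registered as a named force on the simulation, which calls `initialize` with the node array and then invokes the function once per tick.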
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): per-type tabler icons replace plain circles Item 10 of #348. Each architecture-graph node renders with a @tabler/icons-react glyph at its centre plus a type-color stroke ring, replacing the prior plain disc. Locked mapping: Cloud→IconCloud, Region→IconMapPin, Cluster→IconBox, vCluster→IconStack3, NodePool→IconStack2, WorkerNode→IconCpu, LoadBalancer→IconArrowsSplit, Network→IconNetwork, PVC→IconDatabase, Bucket→IconBucketDroplet, Volume→IconDisc, Service→IconWorld, Ingress→IconRouteAltLeft. Icons sized 14-18px scaled to node radius; minimum disc radius NODE_R=14 so the icon always reads against the canvas. The detail panel's neighbor list also picks up the per-type icons. `icons.ts` is a separate module so GraphCanvas.tsx remains a component-only file (react-refresh/only-export-components). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(catalyst-ui): Playwright cases + screenshots for 348 polish Item 7 of #348. Extends e2e/cloud-architecture.spec.ts with eight new cases targeting #348 P1: - type chips carry "×" + the strip ends with "+" - removing every chip except NodePool + PVC isolates only those nodes - "+" Popover re-adds a removed type - detail panel groups neighbors by relation with sticky subheaders - edge legend renders ArchiMate symbol thumbnails for every relation - per-type tabler icons render (`arch-graph-node-icon-{type}` testids) - bounded physics — drag node toward (-100,-100) clamps inside canvas - global density slider does not affect small types (auto-100%) Plus a screenshot suite at 1440x900 capturing default / NodePool+PVC isolated / single-type focus / ArchiMate legend close-up. All graph-node interactions use `force: true` per the established continuous-simulation flake-fix pattern. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hati@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
a86449f840 |
deploy: update catalyst images to 7cd4c57
|
||
|
|
7cd4c57ab8
|
feat: K8s informer + SSE data plane (#321) (#358)
* feat(catalyst-api): k8scache package — SharedInformerFactory per Sovereign Core data-plane primitive for ADR-0001 §5: catalyst-api's in-process view of every managed Sovereign cluster. One dynamicinformer per cluster watches the kinds registry (Pod, Deployment, StatefulSet, DaemonSet, Service, Ingress, Namespace, Node, PVC, ConfigMap, Secret, plus Crossplane provider-hcloud Server/LoadBalancer/Network/Volume and vCluster.io VClusters). Event-driven only — no time.Tick, no poll loops. Redaction strips Secret/ConfigMap data before any object leaves the informer goroutine. Prometheus metrics expose informer liveness, cache size, resyncs, SSE subscribers, drop rate, SAR cache effectiveness. Registry is runtime-mutable via a ConfigMap so operators add a watched GVR without a code change. Refs #321. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-api): k8scache disk snapshot + hydrate (cold-start mitigation) Per ADR-0001 §5.1 the catalyst-api Pod's cold-start budget is the biggest data-plane risk. Without snapshot, a tier-1 Sovereign with thousands of objects re-LISTs every (cluster × kind) on every restart — 1–30s of dead UI per restart, multiplied by 6+ restarts per provisioning run. Disk snapshot: - One JSON per (cluster, kind) under /var/cache/sov-cache/ - Atomic temp-file + rename - Mode 0600, redacted Secret/ConfigMap data - Snapshot loop fires every 60s - Snapshots older than 1h are pruned on each pass Hydrate: - Pre-seeds the Indexer BEFORE factory.Start opens the watch - Stale or version-mismatched snapshots fall back to a normal LIST - Per-(cluster, kind) outcome metric ("hydrated" / "missing" / "expired" / "failed") so an operator sees how often the cold-start mitigation pays off Refs #321. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-api): k8s REST list + multiplexed SSE stream — SAR-gated Per ADR-0001 §5: GET /api/v1/sovereigns/{id}/k8s/{kind} - reads the in-process Indexer - Kubernetes label selector + minimal field selector - paginates via opaque continuation cursor (base64 of stable index) - X-Cache-Stale-Seconds header + Warning: 110 when cache > 30s - per-namespace SubjectAccessReview gating GET /api/v1/sovereigns/{id}/k8s/stream?kinds=pod,deployment,... - Server-Sent Events with multiplexed kinds - per-event SAR filter (cached for 30s per user+kind+namespace) - 15s heartbeat (": ping" comment frames) - optional ?initialState=1 emits a synthetic ADDED for every cached object before live events begin - drop-oldest backpressure on slow consumers Decision-cache (sar.go) holds positive + negative SAR decisions for 30s; cache hits + misses + apiserver fallback failures are Prometheus-exported. Fail-closed on apiserver error so a transient SAR failure can never leak data. Refs #321. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-api): Prometheus metrics + healthz informer-sync wiring main.go wires k8scache.FactoryFromEnv at startup, calls Start(ctx), binds the Factory + a SARCache + the user-header name onto the Handler via SetK8sCache. /metrics is mounted at the root via promhttp.Handler so Prometheus can scrape catalyst-internal informer state alongside the existing K8s ServiceMonitor surface. /healthz now negotiates content type: - default: legacy "ok" plain-text — preserves the readinessProbe contract the chart's container has had since #163 - Accept: application/json — structured body listing each registered Sovereign and the per-kind sync map. 
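The SAR decision cache described above (positive and negative decisions held for 30s per user+kind+namespace, fail-closed on apiserver error) lives in the Go backend (sar.go). The sketch below only illustrates the caching logic and is written in TypeScript for consistency with the UI examples; it is not a translation of the real code. `SarDecisionCache` and its `check` callback are hypothetical, the check is synchronous for brevity, and not caching a failed check is an assumption.

```typescript
interface CachedDecision { allowed: boolean; expiresAt: number }

// Hypothetical TTL cache for access decisions, keyed by user + kind + namespace.
class SarDecisionCache {
  private readonly entries = new Map<string, CachedDecision>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // `check` stands in for the SubjectAccessReview call and may throw.
  allowed(user: string, kind: string, namespace: string, check: () => boolean): boolean {
    const key = `${user}|${kind}|${namespace}`;
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.allowed;
    let decision: boolean;
    try {
      decision = check();
    } catch {
      return false; // fail closed; assumption: the failure itself is not cached
    }
    // Both positive and negative decisions are cached for the TTL.
    this.entries.set(key, { allowed: decision, expiresAt: this.now() + this.ttlMs });
    return decision;
  }
}
```

Injecting the clock keeps expiry testable without real waits, and returning `false` on any thrown error is what prevents a transient review failure from leaking an object to a subscriber.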
Returns 503 when the lexically-first cluster has not yet synced Pod + Deployment informers (per the issue spec) The home-cluster typed client is built from rest.InClusterConfig so the optional kinds-registry ConfigMap is loadable from the catalyst namespace; out-of-cluster (CI smoke test) the client build fails softly and the default kinds registry is used. Refs #321. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-chart): catalyst-api-cache PVC + mount Mounts a 5Gi RWO PVC at /var/cache/sov-cache on the catalyst-api Pod, backing the k8scache disk-snapshot loop (issue #321). Separate from the existing catalyst-api-deployments PVC so the cache size is independent of the deployment-record store and a snapshot blow-out cannot evict the durable provisioning state. Wires three new env vars on the api Deployment: CATALYST_K8SCACHE_KUBECONFIGS_DIR — kubeconfig directory the Factory reads at startup (one Sovereign per file) CATALYST_K8SCACHE_SNAPSHOT_DIR — base directory for the snapshot loop (the new PVC mount) CATALYST_K8SCACHE_KINDS_CONFIGMAP — optional registry extension Per docs/INVIOLABLE-PRINCIPLES.md #4 every value is a runtime parameter; air-gapped deploys override via Kustomize patch. Refs #321. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalyst-ui): useK8sStream hook + EventSource consumer React hook over the catalyst-api's /sovereigns/{id}/k8s/stream SSE endpoint (issue #321). 
Mirrors the pattern of useDeploymentEvents but generalised over arbitrary kinds:
- Stable URL build via API_BASE (per INVIOLABLE-PRINCIPLES.md #4)
- Local Map keyed by ${kind}:${ns}/${name}; ADDED/MODIFIED set, DELETED removes
- Auto-reconnect on EventSource error with 0.5s → 30s exponential backoff
- Per-kind grouping for List pages, flat array for graph paths
- Generic over the K8s object shape with a getMeta helper
- disableStream test seam, manual reconnect() trigger
Tests use a FakeEventSource shim because jsdom doesn't ship EventSource natively. Coverage: open/close, ADDED/MODIFIED/DELETED, malformed events, URL parameter shape, disableStream early-out.
Also commits the matching backend tests for k8scache (registry, factory, hydrate-then-resume, hydrate-stale-then-relist, snapshot during shutdown, secret data redaction, fail-closed SAR) and the handler-level k8s.go tests (list, 404 with kind catalogue, sync map, /healthz JSON shape, SSE initial-state ADDED).
Refs #321.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(catalyst-ui): migrate useCloud to useK8sStream live updates
Per ADR-0001 §5 the Cloud surface reads off ONE Indexer-fed source. The legacy getHierarchicalInfrastructure REST call remains as the cold-start seed (deep-links render without waiting for SSE); the K8s stream provides live updates from the catalyst-api's in-process Indexer (issue #321).
CloudPage now opens a useK8sStream against the Sovereign id, watching the kinds the four sub-pages render: pod, deployment, statefulset, service, persistentvolumeclaim, node, and the Crossplane provider-hcloud projections (server, loadbalancer, network, volume) plus vCluster.io tenants.
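The core of the useK8sStream hook described above (a local Map keyed by `${kind}:${ns}/${name}` and a 0.5s → 30s reconnect backoff) can be sketched as two pure functions, independent of React and EventSource. `applyEvent`, `backoffDelayMs` and the type names are hypothetical, and the doubling factor is an assumption: the commit message only says the backoff is exponential between those bounds.

```typescript
type WatchEventType = "ADDED" | "MODIFIED" | "DELETED";
interface K8sObjectMeta { kind: string; namespace: string; name: string }
interface WatchEvent<T> { type: WatchEventType; object: T }

// Key shape from the commit message: ${kind}:${ns}/${name}
function objectKey(m: K8sObjectMeta): string {
  return `${m.kind}:${m.namespace}/${m.name}`;
}

// ADDED and MODIFIED set the entry, DELETED removes it. Returns a new Map so
// a React state setter would see a changed reference.
function applyEvent<T>(
  state: Map<string, T>,
  ev: WatchEvent<T>,
  getMeta: (obj: T) => K8sObjectMeta,
): Map<string, T> {
  const next = new Map(state);
  const key = objectKey(getMeta(ev.object));
  if (ev.type === "DELETED") {
    next.delete(key);
  } else {
    next.set(key, ev.object);
  }
  return next;
}

// Reconnect delay for the n-th consecutive failure (attempt 0 is the first).
// Assumption: the delay doubles each attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}
```

Keeping the reducer pure means the same function can serve the flat-array graph path and the per-kind grouping for list pages, each derived from the Map's values.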
The CloudContext shape gains four new fields: liveItems — flat array of K8s objects liveByKind — same data grouped by short kind name liveLastEventAt — Date of the last received event liveStreaming — true once SSE is open and not in error backoff #348/#349/#350 agents continue to consume the existing HierarchicalInfrastructure shape; this commit is purely additive on the context — no consumer is forced to refactor. Refs #321. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(catalyst): Playwright E2E for live K8s stream + screenshots Two tests under the existing UI Playwright config: • synthetic ADDED Deployment renders new graph node + list row • disconnect + reconnect restores graph state Both mock the SSE endpoint via page.route so the spec is fully self-contained — runs against the dev Vite server without needing a live catalyst-api or a real Sovereign cluster. Screenshots saved at 1440x900 to playwright-report/ for visual regression diffing. When this lands on console.openova.io the same tests run against the deployed surface; the page.route mocks are kept disabled in that context so a real catalyst-api / Indexer pipeline drives events. Refs #321. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hati@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |