openova

Author	SHA1	Message	Date
hatiyildiz	e28b16935a	merge: cloudinit Cilium k8sServiceHost=127.0.0.1 (verified working on omantel)	2026-04-29 15:31:34 +02:00
hatiyildiz	548720095a	fix(cloudinit): use 127.0.0.1 for Cilium k8sServiceHost (host's local apiserver) Cilium with --set k8sServiceHost=10.0.1.2 (the cp1 private NIC IP) sat in init phase forever — the agent's API client kept logging "Establishing connection to apiserver host=https://10.0.1.2:6443" and never got a response, even though `curl https://10.0.1.2:6443/healthz` from the host returned 401 (TLS+auth challenge = endpoint reachable). Switching to k8sServiceHost=127.0.0.1 brought the DaemonSet up immediately. Verified end-to-end on the live cluster: $ kubectl get nodes catalyst-omantel-omani-works-cp1 Ready ... 32m v1.31.4+k3s1 The node's local apiserver always binds 127.0.0.1:6443; using that as the bootstrap apiserver endpoint sidesteps whatever was rejecting the private-NIC IP route during Cilium's pre-CNI bring-up. Once Cilium is the CNI and the cluster has real Service VIPs, every other component reaches the apiserver via the kubernetes.default service as usual.	2026-04-29 15:31:21 +02:00
github-actions[bot]	9f2f3416f5	deploy: update catalyst images to `f0f2513`	2026-04-29 13:30:31 +00:00
hatiyildiz	f0f2513c3d	merge: cloudinit installs Cilium before Flux (fix CNI bootstrap deadlock)	2026-04-29 15:29:20 +02:00
hatiyildiz	e571ec7aa2	fix(cloudinit): install Cilium BEFORE Flux to break CNI bootstrap deadlock omantel.omani.works deployment 5cd1bceaaacb71f6 reached Phase 0 success (10 Hetzner resources up, LB IP 49.12.16.160, DNS committed via PDM) but stayed silent for 25 minutes — `https://console.omantel.omani.works` returned no response, every Flux pod was Pending, and the node was NotReady. SSH'd into the cp1 box (firewall opened temporarily for the operator IP) and found the canonical CNI bootstrap deadlock: Ready: False (KubeletNotReady) message: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady cni plugin not initialized cloud-init started k3s with --flannel-backend=none + --disable-network-policy (the right Cilium-ready posture), then immediately applied the Flux install.yaml. Flux pods are Pending because there is no CNI yet, so Flux never starts → never reconciles bp-cilium → CNI never installs → deadlock. The "wait for deployment Available --timeout=300s" line silently times out and cloud-init proceeds anyway with the Flux GitRepository + Kustomization that nothing reconciles. Resolution: install Cilium ONCE in cloud-init via the canonical Helm chart at the SAME version (1.16.5) that platform/cilium/blueprint.yaml declares for bp-cilium. When Flux later reconciles clusters/<sovereign_fqdn>/bootstrap-kit/01-cilium.yaml it adopts the existing Helm release (release name + namespace match), so the wizard's ownership model stays single-source-of-truth (Flux + Blueprints) after the bootstrap exception. Per INVIOLABLE-PRINCIPLES.md #3, this Helm install is the one-shot bootstrap exception authorised by "the GitOps engine is Flux — everything ELSE gets installed by Flux". Cilium IS the CNI Flux needs, so it cannot be installed by Flux without bootstrapping itself first. Every other component still flows through the Blueprint pipeline. 
Verified: ssh'd into the running omantel cp1 (firewall opened for the operator IP), ran the same `helm install cilium ...` command this patch encodes, and the cluster recovered — node Ready, Flux pods scheduling, GitRepository pulling. Will redeploy from scratch with the patched cloud-init to validate the full unattended path. Cloud-init is the Phase-0 OpenTofu artifact baked into the Hetzner server's user_data, so this change activates on the NEXT `tofu apply` that creates a new control-plane server. Existing omantel cp1 is manually unblocked already; new Sovereigns provisioned after the catalyst-api image with this template is rolled will not hit the deadlock.	2026-04-29 15:29:10 +02:00
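The deadlock described in e571ec7aa2 is a dependency cycle, and the fix breaks it by giving Cilium an install path that does not pass through Flux. The following is a minimal Python sketch of that reasoning, not code from the repository: the node names (`cilium-helm`, `cni`, `k3s`) and the `install_order` helper are hypothetical labels chosen for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Each key lists what it needs before it can come up (hypothetical labels).
before_fix = {
    "flux": {"cni"},        # Flux pods stay Pending without a CNI
    "bp-cilium": {"flux"},  # the Blueprint is reconciled by Flux
    "cni": {"bp-cilium"},   # the CNI only arrives via that Blueprint
}

after_fix = {
    "cilium-helm": {"k3s"},  # one-shot Helm install run from cloud-init
    "cni": {"cilium-helm"},
    "flux": {"cni"},
    "bp-cilium": {"flux"},   # Flux later adopts the existing release
}


def install_order(deps):
    """Return a valid bring-up order, or None when the graph has a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None
```

Under these assumptions `install_order(before_fix)` finds no order at all, while `install_order(after_fix)` places the Helm install ahead of Flux.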
hatiyildiz	7a10ae6c4e	merge: SSE events buffer + replay endpoint for completed deployments	2026-04-29 15:27:21 +02:00
hatiyildiz	29fcb9a8db	fix(catalyst-api): buffer SSE events on Deployment + replay on connect for ProvisionPage history Closes the user-reported regression "this is empty are you sure this is progressing?" — `/sovereign/provision/<id>` rendered `0 events · done` even when the deployment succeeded with 10 Hetzner resources, because a browser that connected after `event: done` arrived at an already-closed channel with nothing to replay. API: - Add `eventsBuf` durable slice (mutex-guarded) on `Deployment`, capped at 10,000 events with FIFO eviction so a runaway producer cannot OOM. - Tee every emit through `recordEvent` — single source of truth for the buffer + the live channel, so they cannot diverge. - StreamLogs replays the buffer on connect; if the deployment is already done, replays + emits `event: done` and closes. - New `GET /api/v1/deployments/{id}/events` returns slice + state JSON for stateless reconnect / fast-path render. - `Deployment.State()` includes `numEvents` summary. - New tests prove buffer fill, replay-on-completed, GET endpoint shape, and FIFO eviction at cap. UI: - ProvisionPage fetches GET /events on mount BEFORE attaching the SSE stream; replays through `applyEventToContext()` so a deep-link to a completed deployment renders the FULL history of bubbles + log entries instead of an empty shell. - Live SSE `seen` counter de-duplicates the SSE replay-on-connect against the GET fetch we already applied. - Elapsed clock anchors on first event time for completed deployments. - 4 new vitest tests (153 total) cover the GET fetch, completed-state bubble flip, 404 graceful handling, and elapsed-clock anchor. Closes #180. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 15:23:39 +02:00
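The buffer-plus-replay behaviour described in 29fcb9a8db can be sketched roughly as follows. This is an illustrative Python model, not the Go code in catalyst-api: the class name `EventBuffer` and the methods `record`, `mark_done`, and `connect` are assumptions; only the 10,000-event cap, FIFO eviction, single write path, and replay-then-done behaviour come from the commit message.

```python
import threading
from collections import deque


class EventBuffer:
    """Bounded, lock-guarded event history with FIFO eviction (sketch)."""

    def __init__(self, cap=10_000):
        # deque(maxlen=cap) silently drops the oldest entry at the cap.
        self._events = deque(maxlen=cap)
        self._lock = threading.Lock()
        self._subscribers = []
        self._done = False

    def record(self, event):
        # Single write path: store, then fan out, so the history and the
        # live stream cannot diverge. Callbacks run under the lock to keep
        # ordering simple; a real server would likely use per-client queues.
        with self._lock:
            self._events.append(event)
            for callback in self._subscribers:
                callback(event)

    def mark_done(self):
        with self._lock:
            self._done = True

    def connect(self, callback):
        # Replay the history to a late subscriber, then either signal
        # completion or attach the subscriber to the live stream.
        with self._lock:
            for event in self._events:
                callback(event)
            if self._done:
                callback("done")
            else:
                self._subscribers.append(callback)
```

A browser that connects after completion then receives the retained history followed by a final `done`, instead of an already-closed channel.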
github-actions[bot]	6f9c0b261a	deploy: update catalyst images to `b3497e6`	2026-04-29 13:15:38 +00:00
hatiyildiz	b3497e6a16	merge: smoke-test bp-* via fixed-string grep on extracted bundle (false-negative fix)	2026-04-29 15:14:27 +02:00
hatiyildiz	19e7ba14e8	ci(catalyst-build): smoke-test bp-* bundle presence via fixed-string grep on extracted bundle	2026-04-29 15:14:16 +02:00
hatiyildiz	a336a3315c	merge: bundle bootstrap-kit + platform + products into catalyst-ui build (fix empty Provision DAG)	2026-04-29 15:10:09 +02:00
hatiyildiz	0898a0dfd9	fix(ui): bundle bootstrap-kit + platform + products into catalyst-ui build The wizard's /sovereign/provision/<id> page rendered only 2 supernodes (Hetzner-infra + Flux-bootstrap) instead of the 11 bootstrap-kit Blueprints + the user's selected components. Verified by grepping the deployed bundle: $ kubectl exec -n catalyst <ui-pod> -- \ grep -c "bp-cilium\\|bp-cert-manager" /usr/share/nginx/html/assets/index-.js 0 Root cause: scripts/build-catalog.mjs computes REPO_ROOT relative to the script's own location and walks platform/<name>/blueprint.yaml, products/<name>/blueprint.yaml, clusters/_template/bootstrap-kit/. The docker build context for catalyst-ui was set to products/catalyst/bootstrap/ui/, so REPO_ROOT in the container resolved to a directory ABOVE the build context that holds nothing. The script silently emitted catalog.generated.ts with BOOTSTRAP_KIT = [] and ALL_BLUEPRINTS = [], shipping an empty bundle. Three coupled fixes (no bandaid): 1. scripts/build-catalog.mjs: accept OPENOVA_REPO_ROOT env override AND fail loudly with a clear message if any of platform/, products/, clusters/_template/bootstrap-kit/ is missing. A future misconfigured context cannot silently regress the bundle. 2. products/catalyst/bootstrap/ui/Containerfile: build context is now /repo (the OpenOva repo root). Containerfile COPYs the four needed subtrees explicitly (platform/, products/, clusters/_template/bootstrap-kit/, products/catalyst/bootstrap/ui/) and exports OPENOVA_REPO_ROOT=/repo so the prebuild script picks them up. 3. .github/workflows/catalyst-build.yaml: UI build context flipped from openova-src/products/catalyst/bootstrap/ui to openova-src. Plus a new bootstrap-kit smoke test that asserts every bp- id (cilium, cert-manager, flux, crossplane, sealed-secrets, spire, nats-jetstream, openbao, keycloak, gitea) is present in the built bundle. 
Failure of this step fails the build — the regression is now caught in CI, not by the user staring at an empty progress page. Verified locally: `node scripts/build-catalog.mjs` still emits 11 blueprints when run from the dev path (env override falls back to the relative-resolve mode).	2026-04-29 15:09:58 +02:00
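The "env override, else relative default, and fail loudly" rule from fix 1 in 0898a0dfd9 can be sketched as below. This is a hedged Python illustration of the idea, not the contents of scripts/build-catalog.mjs: the function name `resolve_repo_root`, its parameters, and the error wording are assumptions; the OPENOVA_REPO_ROOT variable and the three required directories are taken from the commit message.

```python
import os
from pathlib import Path

# Directories the commit message says must exist under the repo root.
REQUIRED = ("platform", "products", "clusters/_template/bootstrap-kit")


def resolve_repo_root(default_root, env=os.environ):
    # An explicit OPENOVA_REPO_ROOT wins; otherwise fall back to the
    # location-relative default the script would compute on its own.
    root = Path(env.get("OPENOVA_REPO_ROOT") or default_root)
    missing = [d for d in REQUIRED if not (root / d).is_dir()]
    if missing:
        # Abort instead of emitting an empty catalog.
        raise SystemExit(f"build-catalog: {root} is missing {', '.join(missing)}")
    return root
```

Aborting here is what turns a misconfigured build context into a failed build rather than a bundle with no blueprints in it.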
github-actions[bot]	364a4903d4	deploy: update catalyst images to `7b1eb7b`	2026-04-29 13:04:35 +00:00
hatiyildiz	7b1eb7badf	merge: per-brand-colour logo tiles (Alloy orange, FerretDB navy, Temporal blue, etc.)	2026-04-29 15:01:28 +02:00
hatiyildiz	8d99acf38c	fix(wizard): logo tiles use each project's canonical brand colour as backplate Replaces the synthetic 2-tone classification (light=slate-900, color=slate-100) with a per-brand surface map keyed by each project's canonical homepage / press-kit colour. Every component's logo tile now renders against its own brand surface — exactly how each project displays its mark on its own homepage: - Alloy → Grafana orange (#FF671D), white wordmark crisp - FerretDB → navy (#042B41), fawn glyph clearly visible - Temporal → signature blue (#127ED1), white logo crisp - Cilium → navy (#1A2236), hexagon mosaic visible - Grafana → dark navy (#0B0F19), orange-yellow gradient pops - Cert-manager / OpenSearch → white tile (matches their on-white brand) - Stalwart → navy (#100E42), coral red wordmark - Strimzi → navy (#192C47), cyan accent visible Per-brand surface is theme-INDEPENDENT — homepage logos look the same regardless of viewer theme, and the wizard mirrors that. The card BODY surrounding the tile still flips with the wizard theme; only the LOGO TILE is brand-locked. Internal letter-mark components without a finalized upstream brand mark (axon, bge, continuum, specter, powerdns) are assigned distinct slate / navy tones from the OpenOva platform palette so the letter reads cleanly and the tile doesn't visually clash with neighbouring brand tiles in the same family. Backwards-compatibility shim retained: `getLogoToneStyle` aliases `getLogoSurface`, so the four call sites (StepComponents, StepReview, MarketplaceFamilyPage, MarketplaceProductPage) work unchanged. Their descriptive comments are updated to reflect the per-brand semantics. Refs #179 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:55:54 +02:00
github-actions[bot]	cbc09d1109	deploy: update catalyst images to `58b5d6d`	2026-04-29 12:54:22 +00:00
hatiyildiz	58b5d6d6f4	merge: drop redundant null_resource.dns_pool — PDM owns DNS writes	2026-04-29 14:53:06 +02:00
hatiyildiz	330211d275	fix(tofu): drop redundant null_resource.dns_pool — PDM owns DNS writes Every tofu apply on a pool deployment was hitting: null_resource.dns_pool[0]: Provisioning with 'local-exec'... null_resource.dns_pool[0] (local-exec): (output suppressed due to sensitive value in config) Error: Invalid field in API request catalyst-dns: write DNS: add *.omantel record: dynadot api error: code= Two separate code paths were both writing Dynadot records for the same deployment: 1. The OpenTofu module's null_resource.dns_pool — a local-exec that shells out to /usr/local/bin/catalyst-dns inside the catalyst-api container. The binary's request payload is rejected by Dynadot. 2. catalyst-api's pool-domain-manager call — pdm.Commit() at handler/deployments.go:247 writes the canonical record set with the LB IP after tofu apply returns. This path works. Per #168 PDM is the single owner of all pool-domain Dynadot writes. The null_resource path is a pre-#168 artifact that should have been removed when PDM took ownership; keeping it dual-wrote DNS records (when it worked) and broke the entire provision flow (when it didn't). Verified end-to-end against the live catalyst-api at console.openova.io: tofu apply created 7 of 11 Hetzner resources (network, firewall, subnet, LB, 2 LB services, ssh_key) before failing at null_resource.dns_pool[0]. With this commit the DNS-write step disappears from the plan, and PDM /commit handles record creation after the LB IP is known. The dynadot_key + dynadot_secret variables in variables.tf remain declared (provisioner.go still passes them through tfvars.json) but are no longer referenced by any resource. Removing them is a separate sweep — left for a follow-up to keep this commit narrowly scoped to the failure path.	2026-04-29 14:52:57 +02:00
hatiyildiz	132d3dcd38	fix(tofu): drop redundant null_resource.dns_pool — PDM owns DNS writes Every tofu apply on a pool deployment was hitting: null_resource.dns_pool[0]: Provisioning with 'local-exec'... null_resource.dns_pool[0] (local-exec): (output suppressed due to sensitive value in config) Error: Invalid field in API request catalyst-dns: write DNS: add *.omantel record: dynadot api error: code= Two separate code paths were both writing Dynadot records for the same deployment: 1. The OpenTofu module's null_resource.dns_pool — a local-exec that shells out to /usr/local/bin/catalyst-dns inside the catalyst-api container. The binary's request payload is rejected by Dynadot. 2. catalyst-api's pool-domain-manager call — pdm.Commit() at handler/deployments.go:247 writes the canonical record set with the LB IP after tofu apply returns. This path works. Per #168 PDM is the single owner of all pool-domain Dynadot writes. The null_resource path is a pre-#168 artifact that should have been removed when PDM took ownership; keeping it dual-wrote DNS records (when it worked) and broke the entire provision flow (when it didn't). Verified end-to-end against the live catalyst-api at console.openova.io: tofu apply created 7 of 11 Hetzner resources (network, firewall, subnet, LB, 2 LB services, ssh_key) before failing at null_resource.dns_pool[0]. With this commit the DNS-write step disappears from the plan, and PDM /commit handles record creation after the LB IP is known. The dynadot_key + dynadot_secret variables in variables.tf remain declared (provisioner.go still passes them through tfvars.json) but are no longer referenced by any resource. Removing them is a separate sweep — left for a follow-up to keep this commit narrowly scoped to the failure path.	2026-04-29 14:52:24 +02:00
github-actions[bot]	96f4fe9265	deploy: update catalyst images to `80b86a1`	2026-04-29 12:45:06 +00:00
hatiyildiz	80b86a14ac	merge: accept cpx* SKU family + empty worker_size for solo Sovereigns	2026-04-29 14:44:02 +02:00
hatiyildiz	c6cbfe684c	fix(tofu): accept cpx* SKU family + empty worker_size for solo Sovereigns The wizard's recommended Hetzner SKU is CPX32 (4 vCPU AMD / 8 GB / €0.0232/hr) but the module's variables.tf validation rule only accepted the cx / ccx / cax families; CPX (AMD shared) was missing entirely. Every Launch through the wizard hit: Error: Invalid value for variable on variables.tf line 68: variable "control_plane_size" { var.control_plane_size is "cpx32" control_plane_size must match Hetzner server-type naming (cxNN | ccxNN | caxNN) Solo Sovereigns (worker_count = 0) also legitimately have an empty worker_size; the validation rejected that too: Error: Invalid value for variable on variables.tf line 91: variable "worker_size" { var.worker_size is "" Both fixed by extending the regex with the cpx* family AND permitting the empty string on worker_size when the operator runs a solo Sovereign. Reproduced end-to-end against the deployed catalyst-api before the fix: the SSE stream surfaced exactly these two validation errors. With the regex updated they no longer fire; failure now requires a real Hetzner token instead of being blocked at module-validation time.	2026-04-29 14:43:52 +02:00
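The validation change in c6cbfe684c amounts to widening a server-type pattern and allowing an empty worker size for solo Sovereigns. The real rule is an OpenTofu validation block in variables.tf that is not shown in this log, so the Python below is only a sketch: the exact pattern, the function names, and the choice to tie the empty string to `worker_count == 0` are assumptions.

```python
import re

# Assumed pattern: a family prefix (cx, cpx, ccx, cax) followed by digits.
SERVER_TYPE = re.compile(r"(cx|cpx|ccx|cax)[0-9]+")


def valid_control_plane_size(size):
    return SERVER_TYPE.fullmatch(size) is not None


def valid_worker_size(size, worker_count):
    # A solo Sovereign (worker_count == 0) may leave worker_size empty.
    if size == "":
        return worker_count == 0
    return SERVER_TYPE.fullmatch(size) is not None
```

With this pattern `cpx32` passes, and an empty `worker_size` passes only when no workers are requested.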
github-actions[bot]	a646afa041	deploy: update catalyst images to `dc07b0d`	2026-04-29 12:24:02 +00:00
hatiyildiz	dc07b0d68e	merge: logo tile mirrors canonical marketplace treatment (theme-aware, Temporal visible)	2026-04-29 14:21:56 +02:00
hatiyildiz	5ba0c1c53b	fix(wizard): logo tile mirrors canonical marketplace treatment (theme-aware, Temporal visible) The universal `rgba(255,255,255,0.96)` tile from `691467b4` dropped white-on-transparent brand marks (Temporal, LiveKit, Mimir, Tempo, Velero, OpenBao …) into a blinding white pill — the user's "almost nothing is visible" complaint. Mirrors the SME marketplace's per-asset PNG approach (https://marketplace.openova.io/apps/) with metadata-driven backplates instead of universal chrome: - new `logoTone.ts` classifies every vendored component logo as `light` (white-glyph, needs slate-900 backplate) or `color` (full-colour or dark-glyph, reads on slate-100). Both tones are theme-independent — exactly like marketplace PNGs ship the same surface regardless of card theme. Empirically validated against every asset under public/component-logos/ on five candidate surfaces. - StepComponents.tsx — `.corp-comp-logo` tile + IconFallback now consume `getLogoToneStyle(entry.id)`. - StepReview.tsx — ComponentMiniCard 40×40 tile + LetterFallback same. - MarketplaceFamilyPage.tsx — `.mp-related-logo` / `.mp-related-icon` CSS rules now own geometry only; surface is per-asset inline style. - MarketplaceProductPage.tsx — `.mp-product-logo` / `.mp-product-icon` same pattern on the 80×80 hero tile. Per-component verification (dark + light wizard themes): Temporal — light tone → slate-900 backplate, white logo crisp Cilium — color tone → slate-100, full hexagon visible Cert-manager — color tone → slate-100, blue badge readable Grafana — color tone → slate-100, orange G readable Strimzi — color tone → slate-100, dark mark visible Keycloak — color tone → slate-100, color badge readable FerretDB — color tone → slate-100, wordmark + glyph visible Gates: tsc --noEmit clean · 149/149 vitest tests pass · vite build OK.	2026-04-29 14:21:12 +02:00
hatiyildiz	cea9621072	merge: bundle OpenTofu CLI in catalyst-api image; fix catalyst-system → catalyst namespace string	2026-04-29 14:08:36 +02:00
hatiyildiz	9b6c297dd8	fix(catalyst-api): bundle OpenTofu CLI in runtime image (pinned + checksum verified) The previous image bundled the infra/hetzner/ .tf sources but not the tofu binary itself, so every Launch failed with: tofu init: exec: "tofu": executable file not found in $PATH Add a dedicated builder stage that downloads OpenTofu v1.11.6 from the canonical GitHub release, verifies the SHA256 against the upstream SHA256SUMS file before extraction, and ships the binary into the runtime image at /usr/local/bin/tofu (mode 0755 so UID 65534 can exec it). The stage branches on $TARGETARCH (amd64 / arm64) to keep multi-arch buildx correct; both arch checksums are pinned as build args so version bumps are an explicit two-line change. Add a CI smoke step in catalyst-build.yaml's build-api job that runs `tofu version` inside the freshly-built image and asserts the output matches EXPECTED_TOFU_VERSION; failure fails the build. Also re-run with `--user 65534:65534` to gate exec-as-non-root at build time. The prior infra/hetzner/ presence smoke step is preserved unchanged. Sibling fix in ProvisionPage's FailureCard: the kubectl hint pointed at namespace `catalyst-system`, but catalyst-api actually runs in namespace `catalyst` (per chart/templates/api-deployment.yaml + live cluster). Replace the namespace literal so the diagnostic command copy-pastes correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:08:03 +02:00
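The "verify the SHA256 before extraction, with one pinned checksum per architecture" step from 9b6c297dd8 can be illustrated as follows. The actual check lives in a Containerfile builder stage that this log does not show, so this Python sketch only models the logic: `verify_archive`, `sha256_hex`, and the `pinned` mapping are hypothetical names.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_archive(data: bytes, pinned: dict, arch: str) -> None:
    """Raise before extraction if the archive does not match its pinned digest."""
    if arch not in pinned:
        # No silent fallback: an unpinned architecture is a build error.
        raise ValueError(f"no pinned checksum for arch {arch!r}")
    actual = sha256_hex(data)
    if actual != pinned[arch]:
        raise ValueError(f"checksum mismatch for {arch}: got {actual}")
```

Keeping the digests in one per-architecture mapping mirrors the commit's point that a version bump should be an explicit, small change.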
github-actions[bot]	5e3cd1efbe	deploy: update catalyst images to `80db0da`	2026-04-29 11:56:13 +00:00
hatiyildiz	80db0da908	merge: contrast audit — restore theme tokens on ProvisionPage non-logo surfaces	2026-04-29 13:54:47 +02:00
hatiyildiz	6327d8db8b	fix(wizard): contrast audit — restore theme tokens on non-logo surfaces Provision page styled three surfaces with hardcoded rgba(255,255,255,...) literals rather than the page's theme tokens. The theme tokens (--s1, --md, --lo) already flip correctly under .provision-shell[data-theme="light"], so any element painted with the raw rgba was theme-locked to dark and washed out / invisible against the light radial-gradient page background. Three surfaces switched to tokens that already exist on the same page and flip per-theme: • DAG bubble label fill (pending state) — colour rgba(255,255,255,0.45) → var(--lo) Dark: --lo = rgba(255,255,255,0.40) (≈ same) Light: --lo = #475569 (slate-600, readable on light bg) • Live-log info-line text — color rgba(255,255,255,.78) → var(--md) Dark: --md = rgba(255,255,255,0.65) Light: --md = #334155 (readable on light log panel) • Live-log meta pill + failure-card hint <code> background — rgba(255,255,255,.04) → var(--s1) Dark: --s1 = rgba(255,255,255,0.04) (unchanged) Light: --s1 = #fff (lifted pill on slate page bg) The wizard StepReview surfaces (Section / Field / RegionCard / ComponentMiniCard) and the marketplace family/product pages were already migrated off raw rgba in 4f6dd10a; logo TILES intentionally keep rgba(255,255,255,0.96) per the documented contract in StepComponents.tsx LOGO_TILE_BG (vendored brand marks render in mixed treatments — dark glyphs designed for white backdrops, white glyphs on transparent — and a near-white pill keeps every glyph legible regardless of theme). Verification: • npx tsc --noEmit ✓ • npm run build ✓ • ./node_modules/.bin/vitest run — 149 passed (149) ✓ • Live wizard at /sovereign/wizard — every step's section surfaces and card surfaces render with proper contrast in BOTH dark and light themes; logo tiles still readable. • Live marketplace at /sovereign/marketplace/family/cortex and /sovereign/marketplace/product/axon — flat-section layout intact, logo tiles crisp. 
No layout, no test selectors, no router, no componentGroups.ts, no providerSizes.ts changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 13:54:03 +02:00
hatiyildiz	5520e91443	merge: review components — drop family summary, pixel-match marketplace.openova.io/review	2026-04-29 13:49:07 +02:00
hatiyildiz	4f6dd10a20	fix(wizard): review components — drop family summary, pixel-match marketplace.openova.io/review The Components section on StepReview rendered both a family-summary mini-card grid (PILOT M5 / SPINE M5 R1 O1 / …) AND a per-component card grid below. The summary was a duplicate read of the same data — each per-component card already shows its family chip, so the strip above counted what the cards already display. Drop it. The per-component cards themselves were tiny `auto-fill, minmax(180px, 1fr)` chips with logo + name + tier letter + family chip. Replace with a pixel-mirror of the canonical `.stack-card` on https://marketplace.openova.io/review/ — same horizontal flex layout, 40×40 logo tile, semibold name, low-key category pill, and single-line description. Tokens map 1:1 (light theme): marketplace `--color-bg` → wizard `--wiz-bg-input` marketplace `--color-border` → wizard `--wiz-border` marketplace `--color-text-strong` → wizard `--wiz-text-hi` marketplace `--color-text-dim` → wizard `--wiz-text-md` (desc), `--wiz-text-sub` (cat) Card geometry verified pixel-identical to marketplace at 1440px width: padding 10.4px, gap 10.4px, border-radius 8px, card height 66.078125px, 2-column grid with 8px gap collapsing to 1 column under 700px. Tier (M/R/O) intentionally dropped — not on the canonical card; the Components step before review already enforces tier semantics. The legend below the grid goes with it. Section + Field shells switched from `--wiz-bg-xs` to `--wiz-bg-sub` so the card surfaces lift visibly off the section background in light mode — the previous near-white tint was the same colour as the cards, so cards visually melted into the section ("white-on-white"). 
Verification: • npx tsc --noEmit ✓ • npm run build ✓ • ./node_modules/.bin/vitest run — 149 passed (149) ✓ • Live wizard at /sovereign/wizard step 7 — components section renders 2-col grid of stack-card-shaped components, no family summary, no tier legend, computed CSS matches marketplace. POST body to /v1/deployments unchanged. componentGroups.ts, provider/topology cards, router.tsx untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 13:48:28 +02:00
github-actions[bot]	e62fd5f3eb	deploy: update catalyst images to `7931f79`	2026-04-29 11:46:08 +00:00
hatiyildiz	7931f79ac4	merge: bundle infra/hetzner/ tofu module into catalyst-api image	2026-04-29 13:44:50 +02:00
hatiyildiz	61c6122633	fix(catalyst-api): bundle infra/hetzner/ tofu module into the image The catalyst-api Pod is the OpenTofu runner — provisioner.New() reads CATALYST_TOFU_MODULE_PATH (default /infra/hetzner) and stageModule() copies the canonical .tf / .tftpl files into a per-deployment workdir on every Launch. The previous Containerfile did not COPY the module in, so every Launch failed: {"level":"ERROR","msg":"provision failed", "err":"stage tofu module: open /infra/hetzner: no such file or directory"} Containerfile changes - Build context is now the public openova repo root (Containerfile paths COPY from products/catalyst/bootstrap/api/ explicitly). - New `COPY infra/hetzner/ /infra/hetzner/` brings the FULL tree (main.tf, variables.tf, outputs.tf, versions.tf, cloudinit-.tftpl, README.md) into the runtime image. The path /infra/hetzner/ matches provisioner.New()'s default and the catalyst-platform Helm chart's CATALYST_TOFU_MODULE_PATH override. Workflow changes (.github/workflows/catalyst-build.yaml, build-api job) - context: openova-src/products/catalyst/bootstrap/api -> openova-src (the repo root is needed so infra/hetzner/ is in the build context). - Split build into Build (load: true) + Smoke + Push, mirroring the UI job pattern. The smoke step runs `ls -la /infra/hetzner/` inside the built image and asserts main.tf, variables.tf, outputs.tf, versions.tf, and both cloudinit-.tftpl files are present. Failure fails the build — broken images can no longer ship. Verification (local) - go vet ./... + go test ./... in products/catalyst/bootstrap/api: clean - docker build -f products/catalyst/bootstrap/api/Containerfile . at the repo root succeeds; `docker run --rm --entrypoint sh catalyst-api:test -c 'ls -la /infra/hetzner/'` lists main.tf, variables.tf, outputs.tf, versions.tf, cloudinit-control-plane.tftpl, cloudinit-worker.tftpl. provisioner.go business logic untouched. 
catalyst-platform Helm chart api-deployment.yaml untouched (CATALYST_TOFU_MODULE_PATH already aligns with /infra/hetzner).	2026-04-29 13:44:11 +02:00
github-actions[bot]	127398e969	deploy: update catalyst images to `36747a3`	2026-04-29 11:39:01 +00:00
hatiyildiz	36747a3b26	merge: provision route invariant fix (use internal route id)	2026-04-29 13:38:00 +02:00
hatiyildiz	18d56ab8b8	fix(provision): use internal route id for useParams (basepath stripped) The /provision/ route is registered against the router's internal path; '/sovereign' is the basepath, stripped before matching. The 'from: "/sovereign/provision/$deploymentId"' lookup matched no route at runtime — TanStack Router throws 'Invariant failed' for any useParams call against an unknown route id. Cast was hiding the type error. This unblocks the SPA route — /sovereign/provision/<id> now renders the ProvisionPage without throwing.	2026-04-29 13:36:34 +02:00
github-actions[bot]	0745945eb8	deploy: update catalyst images to `4e5c75e`	2026-04-29 11:17:59 +00:00
hatiyildiz	4e5c75e05c	merge: provision as SPA route /sovereign/provision/:deploymentId; fix FQDN, components count, failure UX # Conflicts: # products/catalyst/bootstrap/ui/src/pages/wizard/steps/StepReview.tsx	2026-04-29 13:16:53 +02:00
hatiyildiz	8f8d9c0d8a	merge: dense multi-card review rows; per-component cards in Components	2026-04-29 13:15:41 +02:00
hatiyildiz	6a54782c7f	merge: neutral high-contrast logo tile across cards, review, marketplace	2026-04-29 13:14:41 +02:00
hatiyildiz	08cd438762	fix(wizard): provision as SPA route /sovereign/provision/:deploymentId; fix FQDN, components count, failure UX The provision page was a 1198-line static public/provision.html artefact plus a sibling provision.js / catalog.js triple. The .html URL was the visible give-away that the page wasn't first-class — it was rendered outside the React app, did not share design tokens, did not get bundled, and could not consume the wizard's zustand store directly. The result was a page that displayed "omantel.omani-works · SOLO · 0 components · Failed" with no actionable detail when something went wrong. This commit deletes all three static artefacts and ships a real SPA route at `/sovereign/provision/$deploymentId` instead. Same DAG visual, same EventSource wiring, same phase→bubble state machine — but as a React component that: - reads the deploymentId from URL params (deep-linkable, refresh-safe) - reads selectedComponents + topology from useWizardStore directly - resolves the FQDN via resolveSovereignDomain(store) — fixes the "omantel.omani-works" hyphen bug; the page now shows "omantel.omani.works" - renders a real FailureCard when SSE surfaces status="failed", carrying the deployment's actual error message + Retry / Back-to-wizard CTAs - handles 404 / EventSource error with a clean retry surface Wiring: - New /sovereign/provision/$deploymentId route in router.tsx - StepReview's provision() callback now navigates via router.navigate instead of window.location.href = path('provision.html') - BOOTSTRAP_KIT export added to catalog.generated.ts (read from clusters/_template/bootstrap-kit/ at build time, ordered by NN- prefix) so the React route can import the same source-of-truth the deleted catalog.js used to surface as window.CATALYST_CATALOG - emitPublicCatalog() removed from build-catalog.mjs — no static page consumes it any more Files deleted: - public/provision.html - public/provision.js - public/catalog.js Files added: - 
src/pages/provision/ProvisionPage.tsx (1300+ lines: catalog read, expandWithDependencies, buildNodes, buildEdges, computeLayout, applyEvent state machine, sidebar, log panel, failure card, status pill) Verified: tsc clean, 149/149 vitest tests pass.	2026-04-29 13:14:31 +02:00
hatiyildiz	9280cd4a4b	fix(wizard): dense multi-card review rows; per-component cards in Components Review page packs small fields/cards in horizontal rows instead of stacking them top-to-bottom. The Components section now renders every selected component as its own mini-card (logo + name + family chip + tier) so the operator sees exactly what will be installed, not just family-level counts. Reduced section padding and dropped redundant whitespace between rows so the review fits a typical viewport without scrolling. The provision()-to-/v1/deployments POST body is unchanged — visual only.	2026-04-29 13:10:41 +02:00
hatiyildiz	691467b486	fix(wizard): neutral high-contrast logo tile across cards, review, marketplace Component-logos vendored under public/component-logos/ are upstream brand marks rendered as-shipped — some are dark glyphs designed for white backdrops, some are white glyphs on transparent (designed for dark surfaces), some are full-colour. The previous tile (rgba(255,255,255,0.04) with the icon-fallback using oklch hue rotation) made dark glyphs invisible in dark mode and white glyphs invisible against the dim tile. Worse, the contrast story was inconsistent across surfaces — the wizard cards, the review page, and the marketplace family/product pages each picked their own background. This commit pins ONE tile contract used in every place a component logo renders: - background: rgba(255,255,255,0.96) (near-white pill, theme-independent) - border-radius: 10px - 1px outer border in --wiz-border-sub so the tile doesn't fight the card - 6px internal padding so tight square SVGs aren't cropped - IconFallback letter colour pinned to fixed slate (#0f172a) so the letter reads against the white tile in BOTH dark- and light-mode themes (--wiz-text-hi flips with the theme and would white-out in dark mode) Files updated: - StepComponents.tsx — .corp-comp-logo + IconFallback - MarketplaceFamilyPage.tsx — .mp-related-logo + .mp-related-icon - MarketplaceProductPage.tsx — .mp-product-logo + .mp-product-icon Verified by toggling dark/light theme and walking the wizard + marketplace pages — every brand mark legible regardless of glyph palette or theme.	2026-04-29 13:09:37 +02:00
github-actions[bot]	676889d67c	deploy: update catalyst images to `4149c44`	2026-04-29 10:38:14 +00:00
hatiyildiz	4149c443e4	merge: 4-line card grid; 6-10 word professional descs; full-width text body	2026-04-29 12:36:38 +02:00
hatiyildiz	9af51d980e	fix(wizard): 4-line card grid; 6-10 word descs; full-width text body The wizard component cards were copying the SME marketplace's `app-body { padding-right: 72px }` pattern, which reserves the right quarter of every card for an absolute-positioned hover-only round Add button. Combined with one- to three-word `desc` strings, every card showed a name, a chip line, a single half-line of description, and a visually empty right column — a quarter of valuable space wasted. This change restructures the cards around a rigid 4-line grid that spans the FULL body width: Line 1 — name (left, flex) + family chip + inline toggle (right) Line 2 — description line 1 (full width) Line 3 — description line 2 (full width, two-line clamp) Line 4 — tier chip + dependency chips + SELECTED dot (right) Chips appear ONLY on line 1 or line 4, never on lines 2-3. The `.corp-comp-body` no longer reserves any horizontal padding for overlay buttons; descriptions use the entire body column. The toggle affordance is relocated from an absolute-positioned 32×32 overlay (top-right of the card, opacity-0 until hover) to an inline 22×22 round button at the trailing edge of line 1, sharing the chip row with the family chip. It still fades in on card hover and stays visible when in-cart, but it occupies a single inline cell instead of reserving a vertical column. The bottom-right SELECTED text pill is replaced by a compact green dot anchored to the right end of line 4. The card already conveys selection through its green border, green-tinted background, and the green ✓ toggle button on line 1; the loud text pill duplicated those signals while crowding the dependency chips on cards with deps. Every component description in `componentGroups.ts` is rewritten as a 6-10 word professional sentence-fragment distilled from the long-form `COMPONENT_COPY.positioning` text in `marketplaceCopy.ts`. Same voice: factual, technical, terse — no hype, no forbidden vocabulary. 
Five before/after samples: flux: "GitOps delivery engine" → "GitOps reconciler driving every Sovereign cluster from Git"; cilium: "CNI & eBPF service mesh" → "eBPF CNI and service mesh with kernel-level policy"; cert-manager: "TLS certificate automation" → "Automated TLS issuance and rotation for every ingress"; grafana: "Dashboards & alerting" → "Curated dashboards across metrics, logs, and traces"; langfuse: "LLM observability & tracing" → "Prompt, completion, and cost tracing for the AI plane". All 63 component descriptions verified within 6-10 words; no forbidden vocabulary ("MVP", "for now", "stub", "iterative", "demo"); no marketing fluff. CSS changes preserve the canonical 108px resting height; tablet/mobile responsive floor unchanged. All 149 vitest specs continue to pass; existing data-testid selectors (`toggle-<id>`, `family-chip-<id>`, `tier-<id>`, `selected-<id>`, `deps-<id>-<dep>`, `includes-<id>`, `component-card-<id>`) are preserved unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 12:35:57 +02:00
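The 6-10 word and forbidden-vocabulary constraints described in 9af51d980e could be checked mechanically. A minimal TypeScript sketch, assuming descriptions are plain strings: `checkDescription`, `wordCount`, `FORBIDDEN`, and `samples` are hypothetical names, not identifiers from the repository, and the sample strings are the five "after" descriptions quoted in the commit message.

```typescript
// Hypothetical check, not taken from componentGroups.ts.
// The forbidden terms are the ones listed in the commit message.
const FORBIDDEN: string[] = ["MVP", "for now", "stub", "iterative", "demo"];

// Count whitespace-separated tokens; "&" or "kernel-level" each count as one word.
function wordCount(desc: string): number {
  return desc.trim().split(/\s+/).filter((w) => w.length > 0).length;
}

// Return a list of problems; an empty list means the description passes.
function checkDescription(desc: string): string[] {
  const problems: string[] = [];
  const n = wordCount(desc);
  if (n < 6 || n > 10) {
    problems.push(`expected 6-10 words, got ${n}`);
  }
  for (const term of FORBIDDEN) {
    // Case-insensitive, whole-word match so "demo" does not flag "democratic".
    const pattern = new RegExp(`\\b${term}\\b`, "i");
    if (pattern.test(desc)) {
      problems.push(`forbidden term: ${term}`);
    }
  }
  return problems;
}

// The five "after" samples quoted in the commit message.
const samples: Record<string, string> = {
  flux: "GitOps reconciler driving every Sovereign cluster from Git",
  cilium: "eBPF CNI and service mesh with kernel-level policy",
  "cert-manager": "Automated TLS issuance and rotation for every ingress",
  grafana: "Curated dashboards across metrics, logs, and traces",
  langfuse: "Prompt, completion, and cost tracing for the AI plane",
};

for (const [id, desc] of Object.entries(samples)) {
  const problems = checkDescription(desc);
  console.log(id, problems.length === 0 ? "ok" : problems.join("; "));
}
```

Run against the old one- to three-word strings (for example "GitOps delivery engine"), the same check reports a word-count problem, which is the gap the rewrite was meant to close.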
hatiyildiz	570147cd8f	merge: canonical SKU catalogs (Hetzner CPX32 recommended; Huawei c7n.xlarge.2; OCI E5.Flex.2.16; AWS m6i.xlarge; Azure D4s_v5)	2026-04-29 12:32:08 +02:00
hatiyildiz	183c3066f2	fix(wizard): canonical SKU catalog from each provider's pricing page (Hetzner, Huawei, OCI, AWS, Azure) Replaces the guessed per-provider SKU catalog with values that match what each cloud provider publishes on its canonical pricing page today (snapshot 2026-04-29). Confused CX (Intel) vs CPX (AMD) vs CAX (ARM) vs CCX (dedicated) labels are gone; each id, label, vCPU/RAM/disk spec, and EUR price now comes from the source pricing page directly. Hetzner (19 SKUs): full CX23/33/43/53 (Intel), CPX22/32/42/52/62 (AMD), CAX11/21/31/41 (ARM), CCX13/23/33/43/53/63 (dedicated). Recommended: CPX32 (4 vCPU AMD / 8 GB / 160 GB SSD, €0.0232/hr, €14.49/mo; founder-stated EU starter). Sources: hetzner.com/cloud/regular-performance, /cost-optimized, /general-purpose. Huawei (11 SKUs): s7 / c7n / m7 families across 2/4/8/16 vCPU sizes. Recommended: c7n.xlarge.2 (4 vCPU / 8 GB). Source: huaweicloud.com/intl/en-us/product/ecs/pricing.html (specs cross-checked on Cloud Mercato). OCI (11 SKUs): VM.Standard.E5.Flex (AMD Genoa), .E4.Flex (Milan), .Standard3.Flex (Intel), .A1.Flex (Ampere ARM). Recommended: VM.Standard.E5.Flex (2 OCPU / 16 GB). Source: oracle.com/cloud/compute/pricing/ ($0.030/OCPU + $0.002/GB AMD; $0.010/OCPU ARM). AWS (15 SKUs): m6i / c6i / r6i (Intel Ice Lake) plus m7g (Graviton3 ARM) at .large/.xlarge/.2xlarge/.4xlarge. Recommended: m6i.xlarge (4 vCPU / 16 GB). Source: aws.amazon.com/ec2/pricing/on-demand/ (us-east-1 Linux on-demand, verified on Vantage). Azure (10 SKUs): Dsv5 / Esv5 / Dpsv5 v5 generation (Intel + Ampere ARM) at 2/4/8/16 vCPU sizes. Recommended: Standard_D4s_v5 (4 vCPU / 16 GB). Source: azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/ (West Europe, verified on Vantage). NodeSize interface gains `disk: number | string` (local SSD GB or "EBS-only"/"Variable") and `priceMonth: number` (Hetzner cap; hyperscaler hour×730). 
USD list prices converted to EUR at 1 USD = 0.92 EUR (snapshot 2026-04, applied once at table-build time via priceUSDtoEUR helper). StepProvider sublabel now renders disk + monthly cap alongside vCPU/RAM/hourly. Stale comment references to "cx32"/"cx42" updated to "CPX32" (the canonical Hetzner page calls it CPX32, never "CX32 — Standard"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 12:31:32 +02:00
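The pricing rules stated in 183c3066f2 (1 USD = 0.92 EUR, hyperscaler monthly price = hourly × 730, Hetzner monthly price taken as the published cap) can be illustrated with a short TypeScript sketch. Only `disk`, `priceMonth`, and the helper name `priceUSDtoEUR` come from the commit message; every other field name, the helper's signature and rounding, `monthlyFromHourly`, `hyperscalerSize`, and the `example.xlarge` entry with its USD price are assumptions made for illustration.

```typescript
// Rates stated in the commit message (snapshot 2026-04).
const USD_TO_EUR = 0.92;
const HOURS_PER_MONTH = 730;

interface NodeSize {
  id: string;            // assumed field
  label: string;         // assumed field
  vcpu: number;          // assumed field
  ramGB: number;         // assumed field
  disk: number | string; // from the commit: local SSD GB, or "EBS-only"/"Variable"
  priceHour: number;     // assumed field, EUR per hour
  priceMonth: number;    // from the commit: Hetzner cap, or hourly x 730
}

// Name from the commit; signature and 4-decimal rounding are assumptions.
function priceUSDtoEUR(usd: number): number {
  return Math.round(usd * USD_TO_EUR * 10000) / 10000;
}

// Hypothetical helper: hyperscaler monthly price, rounded to cents.
function monthlyFromHourly(priceHour: number): number {
  return Math.round(priceHour * HOURS_PER_MONTH * 100) / 100;
}

// Hetzner publishes a monthly cap, so priceMonth is copied, not derived.
// Figures are the CPX32 values quoted in the commit; the id casing is assumed.
const hetznerCpx32: NodeSize = {
  id: "cpx32",
  label: "CPX32",
  vcpu: 4,
  ramGB: 8,
  disk: 160,
  priceHour: 0.0232,
  priceMonth: 14.49,
};

// Hypothetical builder for a hyperscaler row priced in USD per hour.
function hyperscalerSize(
  id: string,
  vcpu: number,
  ramGB: number,
  usdPerHour: number,
): NodeSize {
  const priceHour = priceUSDtoEUR(usdPerHour);
  return {
    id,
    label: id,
    vcpu,
    ramGB,
    disk: "EBS-only",
    priceHour,
    priceMonth: monthlyFromHourly(priceHour),
  };
}

// "example.xlarge" and its 0.20 USD/hr price are made up, not a real SKU.
const example = hyperscalerSize("example.xlarge", 4, 16, 0.2);
console.log(hetznerCpx32.priceMonth, example.priceHour, example.priceMonth);
```

Note that the Hetzner row is not hourly × 730: 0.0232 × 730 would be about €16.94, while the published cap is €14.49, which is why the commit distinguishes the two cases.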


662 Commits