Commit Graph

409 Commits

Author SHA1 Message Date
hatiyildiz
4ee9e7dd6f fix(wizard): topology before provider; per-provider SKU catalog; per-region sizing
The wizard step order was inverted: it asked for the provider before the
topology, then put hetzner-only SKUs inside the topology step. Topology
decides how many regions exist; provider is a per-region property; SKU
vocabulary is per-provider (cx32 means nothing on Azure). Fixes all three.

New step order (WIZARD_STEPS + WizardPage STEPS): Org -> Topology ->
Provider -> Credentials -> Components -> Domain -> Review.

Per-provider SKU catalog at
products/catalyst/bootstrap/ui/src/shared/constants/providerSizes.ts
replaces the legacy hetzner-only HETZNER_NODE_SIZES.
Five providers (hetzner, huawei, oci, aws, azure), each with realistic SKU
options drawn from that vendor's native instance-type vocabulary. Every
SKU read in the wizard goes through PROVIDER_NODE_SIZES[provider] -- no
SKU literal lives anywhere else.

StepProvider now renders one card per topology slot. Each card carries:
provider chooser, that provider's region picker, that provider's
control-plane SKU, that provider's worker SKU + count. Cost rollup sums
each region's (cp + worker*count) at its OWN provider's pricing, so a
mixed-cloud topology computes correctly.
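
A minimal sketch of that rollup, assuming the catalog is a `Record<Provider, NodeSize[]>`. The `NodeSize` and `RegionSelection` shapes, the `priceOf`/`totalMonthlyCost` names, and every price below are hypothetical placeholders, not the real contents of providerSizes.ts; only the lookup-then-sum structure follows the description above.

```typescript
type Provider = "hetzner" | "huawei" | "oci" | "aws" | "azure";

interface NodeSize {
  id: string;           // vendor-native SKU id, e.g. "cx32" on Hetzner
  monthlyPrice: number; // hypothetical placeholder, one shared currency
}

// Hypothetical catalog; the real SKUs and prices live in providerSizes.ts.
const PROVIDER_NODE_SIZES: Record<Provider, NodeSize[]> = {
  hetzner: [{ id: "cx22", monthlyPrice: 5 }, { id: "cx32", monthlyPrice: 8 }],
  huawei: [],
  oci: [],
  aws: [{ id: "m6i.large", monthlyPrice: 70 }],
  azure: [{ id: "Standard_D2s_v5", monthlyPrice: 75 }],
};

interface RegionSelection {
  provider: Provider;
  controlPlaneSku: string;
  workerSku: string | null;
  workerCount: number;
}

// Look a SKU up in its OWN provider's list only; an unknown id throws,
// so "cx32" can never be priced on Azure.
function priceOf(provider: Provider, sku: string): number {
  const size = PROVIDER_NODE_SIZES[provider].find((s) => s.id === sku);
  if (!size) throw new Error(`SKU ${sku} is not offered by ${provider}`);
  return size.monthlyPrice;
}

// Sum each region's (cp + worker * count) at that region's provider pricing.
function totalMonthlyCost(regions: RegionSelection[]): number {
  return regions.reduce((sum, r) => {
    const cp = priceOf(r.provider, r.controlPlaneSku);
    const workers =
      r.workerSku !== null && r.workerCount > 0
        ? priceOf(r.provider, r.workerSku) * r.workerCount
        : 0;
    return sum + cp + workers;
  }, 0);
}
```

Throwing on an unknown SKU, rather than returning 0, makes a leaked cross-provider id fail loudly instead of silently under-pricing a region.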

StepTopology drops the SkuCard + NodeSizingPanel; it now captures only
the topology template, HA flag, and AIR-GAP add-on.

Per-region store fields (regionControlPlaneSizes, regionWorkerSizes,
regionWorkerCounts) replace the singular controlPlaneSize/workerSize/
workerCount as the canonical shape. Migration in store.merge() hydrates
the arrays from any persisted singular fields; the cx22 legacy default
is treated as "no selection" so a hetzner-only id never leaks into a
non-hetzner region.
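
The migration can be sketched as below. It assumes the singular fields seed region 0 only (later regions start unselected) and that already-migrated arrays win over singular fields; both are assumptions, as are the names `PersistedSizing`, `RegionSizing`, `fillSlots`, and `hydrateRegionSizing`.

```typescript
interface PersistedSizing {
  controlPlaneSize?: string | null;
  workerSize?: string | null;
  workerCount?: number;
  regionControlPlaneSizes?: (string | null)[];
  regionWorkerSizes?: (string | null)[];
  regionWorkerCounts?: number[];
}

interface RegionSizing {
  regionControlPlaneSizes: (string | null)[];
  regionWorkerSizes: (string | null)[];
  regionWorkerCounts: number[];
}

const LEGACY_DEFAULT_SKU = "cx22";

// A persisted cx22 was the old hard-coded default, not a user choice.
function asSelection(sku: string | null | undefined): string | null {
  return !sku || sku === LEGACY_DEFAULT_SKU ? null : sku;
}

// Keep existing per-region values; otherwise slot 0 gets `first`, the rest get `rest`.
function fillSlots<T>(existing: T[] | undefined, first: T, rest: T, count: number): T[] {
  const out: T[] = [];
  for (let i = 0; i < count; i++) {
    if (existing && i < existing.length) out.push(existing[i]);
    else out.push(i === 0 ? first : rest);
  }
  return out;
}

function hydrateRegionSizing(p: PersistedSizing, regionCount: number): RegionSizing {
  return {
    regionControlPlaneSizes: fillSlots(p.regionControlPlaneSizes, asSelection(p.controlPlaneSize), null, regionCount),
    regionWorkerSizes: fillSlots(p.regionWorkerSizes, asSelection(p.workerSize), null, regionCount),
    regionWorkerCounts: fillSlots(p.regionWorkerCounts, p.workerCount ?? 0, 0, regionCount),
  };
}
```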

Backend Request gains an optional Regions []RegionSpec field. Validate
mirrors Regions[0] into the legacy singular fields for the existing
solo-Hetzner writeTfvars path. infra/hetzner/variables.tf accepts the
list-of-objects shape; the for_each iteration that activates the
remaining regions is deferred to the multi-region tofu wiring follow-up.
Structurally the door is open; no part of the shape is compromised.

Dead code removed: StepInfrastructure and shared/constants/hetzner.ts
(both orphaned, contained the only HETZNER_NODE_SIZES reference outside
the catalog).

Gates: tsc --noEmit, vite build, vitest (149 tests), go vet, go test
(provisioner + handler).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 11:44:33 +02:00
hatiyildiz
6afdb3038c merge: align marketplace family/product pages with canonical marketplace tokens 2026-04-29 11:39:07 +02:00
hatiyildiz
a44408e095 fix(wizard): align marketplace family/product pages with canonical marketplace tokens
Replaces the bespoke hero-card + ad-hoc spacing pattern in
MarketplaceFamilyPage and MarketplaceProductPage with a layout that
mirrors the canonical marketplace at https://marketplace.openova.io/apps/
(source: core/marketplace/src/components/AppDetail.svelte).

Tokens aligned:
  - H1 hero            1.5rem / 700 / --wiz-text-hi (canonical 24px / 700)
  - Section H2         1rem / 600 / --wiz-text-hi (canonical 16px / 600)
  - Subtitle           0.9rem / --wiz-text-sub (canonical 14.4px / dim)
  - Body paragraph     0.9rem / line-height 1.7 (canonical 14.4px / 1.7)
  - Bullets            0.85rem with 6px green dot bullet (canonical match)
  - Tier pills         0.62rem uppercase, family-tinted bg (mirrors
                       .detail-meta span, with light-mode WCAG override)
  - Member tile        36×36 logo + name + 2-line tagline + tier pill
                       (mirrors canonical .related-card)
  - Product hero       80×80 logo / centred body / right-aligned CTA
                       (mirrors canonical .detail-hero)
  - CTA buttons        0.6rem 1.4rem / radius 8 / 0.88rem / 600 (mirrors
                       canonical .detail-add)
  - Family chips       4px-radius accent-tinted, low-opacity (mirrors
                       canonical .detail-cat)
  - Dependency tiles   surface chip with mono name + dim group label
                       (mirrors canonical .detail-dependencies li)
  - Sections           flat, divided by 1px subtle border (mirrors
                       canonical .detail-section)
  - Hover state        border → accent + 1px lift (canonical match)

Removed:
  - Custom rounded-14px hero card with full background fill
  - Inline-style "made-up" right-arrow on member rows (replaced with
    actual component logos)
  - Stacked tier pill + button column (replaced with canonical's
    horizontal meta-row + right-aligned CTA pattern)
  - 1.05rem section h2 (canonical is 1rem)
  - 1.6 paragraph line-height (canonical is 1.7)

Forbidden words audit clean: no "MVP", "for now", "stub", "iterative",
"demo" in copy. Family palette colours preserved (sky/violet/amber/
emerald/rose/pink/indigo/cyan) — they are the canonical
brand-identification tier and align with the marketplace's role of
distinguishing platform families.

Tests: all 145 vitest cases pass; tsc --noEmit clean; vite build clean.
componentGroups.ts and StepComponents.tsx untouched per parallel-agent
ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 11:38:28 +02:00
hatiyildiz
0b432dd711 fix(wizard): pixel-match component cards to canonical marketplace UX
The d3346441 family-chip refactor bumped the wizard component-card
height from 108px to 130px and added an always-visible "Add / Selected"
pill button at bottom-right with a 1.85rem padding-bottom carved into
the card body. That broke the documented "pixel-match SME marketplace"
contract — the corporate Catalyst wizard cards no longer matched
https://marketplace.openova.io/apps/.

Restore the canonical SME marketplace card surface:
  - card height: 130px → 108px (read-only path stays 108px, no special-case)
  - body padding: 0.4rem right + 1.85rem bottom → 4.5rem right (no bottom)
  - replace the bottom-right "Select / Selected" pill button with the
    canonical 32×32 round icon button at top-right (Plus → Check),
    opacity 0 by default, opacity 1 on card hover, always visible
    when in-cart (mirrors AppsStep.svelte .app-add-btn 1:1)
  - re-introduce the bottom-right SELECTED status pill (only when
    in-cart) — mirrors AppsStep.svelte .status-corner / .s-selected
  - render dependencies as one chip per dep ("+ DepName"), matching
    AppsStep's chip-dep pattern (replaces the single deps-count chip
    + extra paragraph that forced the height bloat)
  - keep the test/a11y `includes-<id>` paragraph but absolute-position
    it off-screen (sr-only) so layout stays at 108px

Affordance reconciliation (no card-height growth):
  - the entire card is now an anchor to /marketplace/product/<id>,
    matching SME's `<a href="/app?slug=X" class="app-card">` wrapper
  - the family chip nested inside is a `<Link>` to
    /marketplace/family/<id> with stopPropagation
  - the round +/✓ button stops propagation and toggles selection via
    the wizard store (data-testid=`toggle-<id>` preserved for tests)
  - all three navigation surfaces preserved: family chip → family
    portfolio, card body → product detail, +/✓ button → wizard store

Read-only Tab 2 ("Always Included") path unchanged behaviourally —
renders as a plain `<div>` (not a `<Link>`) so it stays inert.

All 145 vitest cases pass (including the 89 in StepComponents.test.tsx).
TypeScript clean. Production vite build clean.

Refs: docs/INVIOLABLE-PRINCIPLES.md #2 (never compromise quality —
the SME marketplace IS the proven shape; do not diverge from it).
2026-04-29 11:29:13 +02:00
github-actions[bot]
c0f3e63ffc deploy: update catalyst images to b0ec0c4 2026-04-29 08:51:14 +00:00
hatiyildiz
b0ec0c4300 merge: family chips, product detail, family portfolio routes 2026-04-29 10:49:22 +02:00
hatiyildiz
6a7d2dd89b ci(catalyst-build): align UI smoke-test asset list with canonical extensions
Agent 1 (#176 logos) sourced each component's official upstream brand
mark in whatever format the project itself publishes — most projects
ship SVG, but Grafana docs (loki/mimir/tempo), Aqua (trivy), Anchore
(syft-grype), the LangFuse repo, vLLM, Ntfy, FerretDB, OpenMeter,
Coraza, External-DNS, NetBird, and StrongSwan only publish PNG. The
old smoke test hard-asserted every spot-checked id resolved as
.svg, so the langfuse PNG broke the build.

Replaced the hardcoded extension loop with an explicit list of full
paths matching componentGroups.ts. Every entry mirrors the actual
logoUrl the wizard renders, so a missing or mis-named asset still
fails the build — but in lockstep with the data file, not against
a stale extension assumption.
2026-04-29 10:49:09 +02:00
hatiyildiz
d3346441d6 feat(wizard): family chips, product detail, family portfolio routes 2026-04-29 10:48:22 +02:00
hatiyildiz
c78041c518 merge: reorder wizard steps (domain after components), revamp review
# Conflicts:
#	products/catalyst/bootstrap/ui/src/pages/wizard/steps/StepReview.tsx
2026-04-29 10:42:24 +02:00
hatiyildiz
8aec6244c5 merge: worker SKU + count selector in topology step 2026-04-29 10:40:46 +02:00
hatiyildiz
60e403ae6b merge: dynamic DAG + SSE wiring on provision page 2026-04-29 10:40:43 +02:00
hatiyildiz
56519aef5f merge: dependency mapping audit (fixes Specter→FABRIC and other bogus edges)
# Conflicts:
#	products/catalyst/bootstrap/ui/src/pages/wizard/steps/componentGroups.ts
2026-04-29 10:40:41 +02:00
hatiyildiz
169b1d1c70 merge: original product logos (replaces stylized placeholders with canonical upstream marks) 2026-04-29 10:38:56 +02:00
hatiyildiz
30ff318d0d fix(wizard): use canonical upstream logos for component cards
Every platform-component card now renders the OFFICIAL upstream brand
mark instead of a stylized OpenOva placeholder. Logos are sourced from
the CNCF artwork repo and each project's own repository:

  Source                           Components
  ────────────────────────────────────────────────────────────────────
  cncf/artwork                     cert-manager, cilium, cnpg
                                   (cloudnativepg), crossplane, envoy,
                                   external-secrets (eso), falco,
                                   flux, harbor, keda, keycloak,
                                   knative, kserve, kyverno, litmus,
                                   opentelemetry, opentofu, sigstore,
                                   strimzi, vpa (kubernetes)
  Project repo                     alloy, clickhouse, debezium,
                                   ferretdb, frpc, gitea, grafana,
                                   iceberg, kserve, langfuse,
                                   librechat, livekit, loki, matrix,
                                   milvus, mimir, neo4j, netbird,
                                   ntfy, openbao, openmeter,
                                   opensearch, reloader, seaweedfs,
                                   stalwart, strongswan, stunner,
                                   superset, syft-grype, temporal,
                                   tempo, trivy, valkey, vcluster,
                                   velero, vllm, flink, coraza

44 components ship as SVG; 14 components whose upstream publishes only
PNG marks (Loki, Mimir, Tempo, Trivy, NetBird, ntfy, OpenMeter, vLLM,
Coraza, FerretDB, Syft+Grype, External-DNS, strongSwan, LangFuse) ship
as `<id>.png` with an explicit `logoUrl` override.

Five components retain `logoUrl: null` (letter-mark fallback): the
existing PowerDNS plus BGE (a model-family identifier rather than a
branded product) and the OpenOva-internal Axon, Continuum, Specter
components whose brand marks are not yet finalized.
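
The three `logoUrl` cases (absent, explicit override, `null`) can be modelled as a small resolver. `ComponentLogoEntry`, `LogoSource`, and `resolveLogo` are hypothetical names, the base prefix is passed in explicitly, and deriving the fallback letter from the id is an assumption about what the letter-mark shows.

```typescript
interface ComponentLogoEntry {
  id: string;
  // undefined -> default `<id>.svg`; string -> explicit override (e.g. a PNG);
  // null -> no brand mark, render the letter-mark fallback.
  logoUrl?: string | null;
}

type LogoSource =
  | { kind: "image"; src: string }
  | { kind: "letter"; letter: string };

function resolveLogo(c: ComponentLogoEntry, base: string): LogoSource {
  if (c.logoUrl === null) {
    return { kind: "letter", letter: c.id.charAt(0).toUpperCase() };
  }
  // `??` only falls through on undefined here, because null was handled above.
  const src = c.logoUrl ?? `${base}component-logos/${c.id}.svg`;
  return { kind: "image", src };
}
```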

Card markup, `depends:`, and family flags are intentionally not
touched in this commit (handled by parallel agents).

Quality gates:
  - npx tsc --noEmit            green
  - npm run build               green
  - vitest StepComponents.test  90/90 passed
2026-04-29 10:34:29 +02:00
hatiyildiz
a02f33cec0 feat(wizard): dynamic DAG + SSE wiring on provision page
Drop the 1100-line static-mock provision.html in favour of a runtime-
generated DAG keyed off the wizard's persisted localStorage state and the
build-time blueprint catalog. Bubbles, edges, sub-progress, log routing
and final CTA are all computed from real backend data.

What is now dynamic:
- Hardcoded NODES/TOPO/EDGES/LOGS arrays gone — DAG is built from
  window.CATALYST_CATALOG (components + bootstrap-kit) and the wizard
  selection at page load.
- One Hetzner-infra supernode and one Flux-bootstrap supernode anchor the
  graph; bootstrap-kit Blueprints render in numeric install order; user
  selection from selectedComponents (with transitive HARD deps expanded
  via blueprint.depends) makes up the rest.
- EventSource wired to <BASE>api/v1/deployments/<id>/logs. Phase events
  drive bubble state transitions (tofu-init|tofu-plan run Hetzner-infra
  through 0→.30 progress; raw `tofu` lines parse hcloud_network/
  hcloud_firewall/hcloud_server/hcloud_load_balancer markers to advance
  the supernode's sub-progress; tofu-output finishes it; flux-bootstrap
  opens the second supernode).
- Raw stdout/stderr lines stream into the live-log panel and the active
  bubble's expandable detail (per-node accumulation, click any bubble to
  read its slice).
- On `event: done` with status=ready, surface "Open Console →" CTA
  pointing at result.consoleURL from the snapshot.
- Empty-state path renders a clean "no active deployment" view when the
  page is hit without a wizard session or deploymentId in localStorage.
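
The transitive HARD-dependency expansion described above is a reachability walk over `blueprint.depends`. A sketch, assuming each catalog entry exposes a `slug` and a `depends` array holding HARD deps only; the `BlueprintEntry` type and the `expandSelection` name are hypothetical.

```typescript
interface BlueprintEntry {
  slug: string;
  depends: string[]; // assumed: HARD dependencies only
}

// Expand the user's selection with every transitive HARD dependency.
// An explicit stack avoids deep recursion; the visited check also stops cycles.
function expandSelection(selected: string[], catalog: BlueprintEntry[]): Set<string> {
  const bySlug = new Map(catalog.map((b) => [b.slug, b] as [string, BlueprintEntry]));
  const result = new Set<string>();
  const stack = [...selected];
  while (stack.length > 0) {
    const slug = stack.pop()!;
    if (result.has(slug)) continue;
    result.add(slug);
    for (const dep of bySlug.get(slug)?.depends ?? []) stack.push(dep);
  }
  return result;
}
```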

Build-catalog change:
- scripts/build-catalog.mjs now also emits public/catalog.js setting
  window.CATALYST_CATALOG = { components, bootstrapKit }. bootstrapKit is
  read from clusters/_template/bootstrap-kit/ (numbered prefix → install
  order). Same scan as the typed catalog.generated.ts so both surfaces
  stay in lock-step.
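
Turning the numbered prefixes into install order could look like the following, assuming directory names of the form `<NN>-<name>`. The real logic lives in scripts/build-catalog.mjs and may parse differently; `bootstrapKitOrder` is a hypothetical name.

```typescript
interface KitEntry {
  order: number;
  slug: string;
}

function bootstrapKitOrder(dirNames: string[]): KitEntry[] {
  const entries: KitEntry[] = [];
  for (const name of dirNames) {
    const m = /^(\d+)-(.+)$/.exec(name);
    if (!m) continue; // skip entries without a numeric prefix (e.g. kustomization.yaml)
    entries.push({ order: parseInt(m[1], 10), slug: m[2] });
  }
  // Numeric, not lexicographic: "10-x" must sort after "2-y".
  return entries.sort((a, b) => a.order - b.order);
}
```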

Per-component states beyond flux-bootstrap are not yet emitted by
catalyst-api; nodeForPhase() already routes phase=bp-<slug> events onto
the matching bubble so wiring the Flux Kustomization watcher on the
backend lights up the rest with no further page work.
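
The routing rule can be sketched as a pure function. The bubble ids `hetzner-infra` and `flux-bootstrap`, sending every `tofu-*` phase to the infra supernode, and keying Blueprint bubbles by bare slug are all assumptions for illustration; the page's actual `nodeForPhase()` may use other ids.

```typescript
// Hypothetical bubble ids for the two supernodes.
const HETZNER_INFRA = "hetzner-infra";
const FLUX_BOOTSTRAP = "flux-bootstrap";

function nodeForPhase(phase: string, knownNodes: Set<string>): string | null {
  if (phase.startsWith("tofu-")) return HETZNER_INFRA; // tofu-init, tofu-plan, tofu-output
  if (phase === "flux-bootstrap") return FLUX_BOOTSTRAP;
  if (phase.startsWith("bp-")) {
    // bp-<slug> lands on that Blueprint's bubble, if it is part of this DAG.
    const slug = phase.slice("bp-".length);
    return knownNodes.has(slug) ? slug : null;
  }
  return null; // unrouted phases only reach the live-log panel
}
```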

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:30:14 +02:00
hatiyildiz
5ab70abad9 feat(wizard): reorder steps (domain after components), revamp review
Order changes:
- StepOrg is now scoped to name, industry, size, HQ, and compliance only
  (email + domain inputs removed)
- New step sequence: Org -> Provider -> Credentials -> Topology ->
  Components -> Domain -> Review
- StepDomain captures the admin contact email alongside the Sovereign
  FQDN; the email pairs naturally with the deployment's external
  surface (Let's Encrypt registration, completion notifications)
- WIZARD_STEPS labels updated to match the new flow

StepReview revamp — single source of truth for the POST body:
- Sections in order: Organisation, Cloud Provider, SSH Access,
  Topology (incl. workerSize / workerCount with defensive null-guard),
  Components, Domain (admin email lives here)
- Hetzner token + registrar token rendered as fixed-length mask plus
  character count, never plaintext (INVIOLABLE-PRINCIPLES.md #10)
- SSH source row distinguishes auto-generated vs. pasted; fingerprint
  truncated for readability
- Domain section explicitly shows the resolved FQDN and the chosen mode
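
A sketch of the masking rule, assuming a 12-character mask (the real length is not stated here) and a hypothetical `truncateFingerprint` helper that keeps both ends of the fingerprint. Because the mask length is fixed, the rendered width reveals nothing; the length is reported only as a count.

```typescript
const MASK_LENGTH = 12; // assumption

function maskSecret(secret: string): string {
  if (secret.length === 0) return "(not set)";
  return `${"•".repeat(MASK_LENGTH)} (${secret.length} chars)`;
}

// Display-only shortening; the full fingerprint stays in the store.
function truncateFingerprint(fp: string, keep = 8): string {
  return fp.length <= keep * 2 ? fp : `${fp.slice(0, keep)}…${fp.slice(-keep)}`;
}
```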

Gates: tsc --noEmit clean, vite build green, vitest 146/146 pass,
dev server boots cleanly on /sovereign/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:27:34 +02:00
hatiyildiz
0b6bb3eaea fix(wizard): audit and correct component depends + family cascade
Operator-reported defect: "if I select spectoer it is brining the entire
fabric family as well, I dont think there is such depenency in reality".
The user is right — the cascade was over-broad.

Root cause: PRODUCTS['cortex'].familyDependencies = ['fabric']. With
CORTEX.cascadeOnMemberSelection = true, selecting any CORTEX member (or
Specter, whose component-level deps include CORTEX members) walked the
family graph and pulled every FABRIC à-la-carte member — Strimzi/Kafka,
Debezium/CDC, Flink, Temporal, ClickHouse, Iceberg, Superset — onto the
selection. Specter and the rest of CORTEX have no runtime dependency on
any of those workloads. The only real cross-family need is cnpg (for
LangFuse) and a Mongo-compatible store (for LibreChat). cnpg is already
mandatory via transitive promotion; the Mongo backend is satisfied by
FerretDB, which is now reached via the corrected component-level dep
(librechat → ferretdb → cnpg).

Changes (one line per change):
- componentGroups.ts: PRODUCTS.cortex.familyDependencies: ['fabric'] → []
- componentGroups.ts: librechat.dependencies: ['cnpg'] → ['ferretdb']
  (LibreChat speaks MongoDB, not PG; FerretDB cascades cnpg transitively)
- componentGroups.ts: grafana.dependencies: ['seaweedfs'] → []
  (Grafana the dashboard server uses SQLite/PG; only its companion stores
  Loki/Mimir/Tempo need object storage)
- StepComponents.test.tsx: regression test "selecting Specter does NOT
  auto-select the FABRIC family" + companion tests asserting CORTEX
  familyDependencies is empty, librechat → ferretdb, grafana has no deps,
  and addProduct(cortex) does not drag à-la-carte FABRIC members.

Verification:
- npm run test (vitest run): 150/150 pass on the worktree
- npx tsc --noEmit: clean
- npm run build: clean
- Live wizard probe (vite dev): addComponent('specter') yields 34 ids,
  zero of {strimzi, debezium, flink, temporal, clickhouse, iceberg,
  superset} present, full CORTEX family present, librechat → ferretdb
  cascade fires.

The CORTEX cascadeOnMemberSelection flag remains true (per issue #175
operator intent: "BGE alone doesn't have much meaning unless we have
Cortex"). FABRIC stays cascadeOnMemberSelection: false (à-la-carte). The
wizard now mirrors real-world component coupling: Specter brings only
the CORTEX runtime members it actually needs.
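
The corrected cascade can be checked against a reduced model. The component edges and family membership below are a hypothetical slice (in particular, `specter -> bge` stands in for whichever CORTEX members Specter really depends on), and this `addComponent` is a sketch of the described behaviour, not the store's implementation.

```typescript
interface Product {
  id: string;
  components: string[];
  familyDependencies: string[];
  cascadeOnMemberSelection: boolean;
}

// Hypothetical, reduced component-level edges.
const COMPONENT_DEPS: Record<string, string[]> = {
  specter: ["bge"],        // stand-in for Specter's real CORTEX deps
  bge: [],
  librechat: ["ferretdb"], // LibreChat speaks MongoDB; FerretDB provides it
  ferretdb: ["cnpg"],      // FerretDB is backed by PostgreSQL (cnpg)
  cnpg: [],
  strimzi: [],
  flink: [],
};

// Hypothetical membership, with the corrected CORTEX edge: [] instead of ["fabric"].
const PRODUCTS: Product[] = [
  { id: "cortex", components: ["bge", "librechat"], familyDependencies: [], cascadeOnMemberSelection: true },
  { id: "fabric", components: ["cnpg", "ferretdb", "strimzi", "flink"], familyDependencies: [], cascadeOnMemberSelection: false },
];

function addComponent(id: string, selected: Set<string>): Set<string> {
  const next = new Set(selected);
  const queue: string[] = [id];
  while (queue.length > 0) {
    const c = queue.shift()!;
    if (next.has(c)) continue;
    next.add(c);
    // Component-level deps always cascade.
    queue.push(...(COMPONENT_DEPS[c] ?? []));
    // Family cascade only for products flagged cascadeOnMemberSelection.
    for (const p of PRODUCTS) {
      if (!p.cascadeOnMemberSelection || !p.components.includes(c)) continue;
      queue.push(...p.components);
      for (const famId of p.familyDependencies) {
        const fam = PRODUCTS.find((f) => f.id === famId);
        if (fam) queue.push(...fam.components);
      }
    }
  }
  return next;
}
```

With `familyDependencies: ["fabric"]` restored on CORTEX, the same call would pull strimzi and flink in, which is the defect this commit removes.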

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:27:11 +02:00
hatiyildiz
b96a03a585 feat(wizard): worker SKU + count selector in topology step
Closes the wizard polish gap "Selecting the shapes of the worker
nodes should be there." StepInfrastructure had a worker SKU + count
selector but was never wired into WizardPage.STEPS — the user walks
through StepTopology, where no sizing controls existed.

Adds a NodeSizingPanel inside StepTopology that:
  • Renders control-plane and worker SKU cards from
    HETZNER_NODE_SIZES (single source of truth — no SKU duplication).
  • Exposes a worker-count stepper and editable spinbutton, clamped
    to the topology-aware floor (0 for solo, 3 for multi-region) and
    a ceiling of 6 to stay inside Hetzner's default project quota.
  • Shows the worker SKU grid only when count > 0.
  • Surfaces a hard validation error when count > 0 but workerSize
    is unset; gates the Topology step's Continue button on the same.

Updates the store's setTopology to seed the worker-count default at
topology-pick time (solo → 0, multi-region → max(current, 3)) so
users land on a sensible default and the existing partialize() rules
keep persisting controlPlaneSize / workerSize / workerCount across
sessions unchanged.
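
The clamp, the topology-pick seeding, and the Continue gate reduce to a few pure functions. The two-value `Topology` union is a simplification (the wizard has richer templates), and all function names are hypothetical.

```typescript
type Topology = "solo" | "multi-region"; // simplified

const WORKER_CEILING = 6; // stays inside Hetzner's default project quota

function workerFloor(t: Topology): number {
  return t === "solo" ? 0 : 3;
}

// Clamp user input (stepper or spinbutton) into [floor, ceiling].
function clampWorkerCount(t: Topology, requested: number): number {
  const n = Number.isFinite(requested) ? Math.trunc(requested) : workerFloor(t);
  return Math.min(WORKER_CEILING, Math.max(workerFloor(t), n));
}

// Default applied by setTopology when the user picks a topology.
function seedWorkerCount(t: Topology, current: number): number {
  return t === "solo" ? 0 : Math.max(current, 3);
}

// Continue is blocked when workers are requested without a worker SKU.
function topologyStepValid(count: number, workerSize: string | null): boolean {
  return count === 0 || workerSize !== null;
}
```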

StepReview now renders three chips inside the Infrastructure section
(control plane, workers, compute-total cost rollup) so the SKU + count
choice is visible at launch time, alongside the per-region cards
that were already there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:25:48 +02:00
hatiyildiz
27527e4ca5 fix(catalyst-api): pin TOFU_WORKDIR to writable /tmp + raise cpu/mem caps
Launch failed instantly with "create workdir: mkdir /var/lib/catalyst:
permission denied". The catalyst-api Pod runs as UID 65534 with emptyDir
mounts only at /tmp and /home/nonroot — /var/lib was never writable, so
the provisioner.New() default for CATALYST_TOFU_WORKDIR
(/var/lib/catalyst/tofu) lost on the very first MkdirAll call.

Three coupled fixes:

- Set CATALYST_TOFU_WORKDIR=/tmp/catalyst/tofu so the per-deployment
  workdir tree lands in the existing /tmp emptyDir.
- Bump cpu limit 100m → 1000m, memory limit 64Mi → 1Gi. tofu init pulls
  ~80MB hcloud + ~30MB dynadot provider plugins; tofu plan/apply hold
  the state file in memory; 64Mi was always going to OOM on first init.
- Grow /tmp emptyDir sizeLimit 256Mi → 2Gi to fit the per-Sovereign
  subdirectory tree (provider binaries + state + plan output).

Manifest-only change — Flux reconciles, kubectl rollout swaps the Pod,
no image rebuild required.
2026-04-29 10:12:44 +02:00
github-actions[bot]
f74e2816f1 deploy: update catalyst images to beefe02 2026-04-29 07:45:25 +00:00
hatiyildiz
beefe0262a merge: #175 — product-family dependency model + transitive-mandatory promotion
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 09:44:11 +02:00
hatiyildiz
0887073735 feat(wizard): #175 — product-family dependency model + transitive-mandatory promotion
Two interlocking fixes for StepComponents per operator feedback (#175):

1. **Transitive-mandatory promotion** (Fix A) — at module-load time walk
   the dependency graph from every mandatory-tier component and promote
   every reached component to mandatory. cnpg + valkey are lifted from
   recommended → mandatory because Harbor / Gitea / PowerDNS / Keycloak
   (mandatory or transitively mandatory) cannot run without them. They
   no longer surface in Tab 1 ("Choose Your Stack"); they appear in Tab 2
   ("Always Included") under the FABRIC product section.

2. **Product-family model** (Fix B) — new `Product` type in
   `componentGroups.ts` with `tier`, `components`, `familyDependencies`,
   and `cascadeOnMemberSelection`. CORTEX is flagged as
   cascade-on-member-selection (operator: "BGE alone doesn't have much
   meaning unless we have Cortex... when chosen the entire family needs
   to be selected"). Selecting any CORTEX member or Specter (whose deps
   reach into CORTEX) cascades the rest of CORTEX plus FABRIC (CORTEX's
   familyDependency). À-la-carte products (FABRIC, RELAY) keep
   independent member selection.
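
Fix A is a reachability pass run once at module load. A sketch, assuming a `ComponentDef` with `tier` and `dependencies` fields; the `optional` tier name and the `promoteTransitiveMandatory` name are hypothetical.

```typescript
type Tier = "mandatory" | "recommended" | "optional";

interface ComponentDef {
  id: string;
  tier: Tier;
  dependencies: string[];
}

// Walk from every mandatory component and promote everything reachable.
// Returns a new array; the input definitions are not mutated.
function promoteTransitiveMandatory(defs: ComponentDef[]): ComponentDef[] {
  const byId = new Map(defs.map((d) => [d.id, d] as [string, ComponentDef]));
  const reached = new Set<string>();
  const stack = defs.filter((d) => d.tier === "mandatory").map((d) => d.id);
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (reached.has(id)) continue;
    reached.add(id);
    stack.push(...(byId.get(id)?.dependencies ?? []));
  }
  return defs.map((d) => (reached.has(d.id) ? { ...d, tier: "mandatory" as Tier } : d));
}
```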

UX additions:
- Product header per family in Tab 1 with "Select entire X family" CTA
  (selectable via product-cta-<id> testid)
- Cascade-add toast surfaces both component-deps and family additions
- Cascade-remove confirmation modal lists every dependent that will go
- All operator-visible strings sourced from new
  `stepComponentsCopy.ts` i18n module — no inline literals in JSX

Store actions: `addProduct(id)` / `removeProduct(id)` plus a
member-selection cascade in `addComponent` that respects the product
flag. Mandatory components are protected from any cascade-remove path.
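
The cascade-remove path with the mandatory guard might look like this sketch. It removes the target plus every selected component that depends on it transitively, never removes a mandatory id, and returns the removed list a confirmation modal could display; all names and signatures are hypothetical.

```typescript
interface RemoveResult {
  next: Set<string>;
  removed: string[];
}

function removeComponent(
  id: string,
  selected: Set<string>,
  deps: Record<string, string[]>,
  mandatory: Set<string>,
): RemoveResult {
  if (mandatory.has(id)) return { next: new Set(selected), removed: [] };
  const removed = new Set<string>([id]);
  // Repeat until no further selected component depends on something removed.
  let changed = true;
  while (changed) {
    changed = false;
    selected.forEach((c) => {
      if (removed.has(c) || mandatory.has(c)) return;
      if ((deps[c] ?? []).some((d) => removed.has(d))) {
        removed.add(c);
        changed = true;
      }
    });
  }
  const next = new Set<string>();
  selected.forEach((c) => {
    if (!removed.has(c)) next.add(c);
  });
  return { next, removed: Array.from(removed).filter((c) => selected.has(c)) };
}
```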

Documentation: `docs/PRODUCT-FAMILIES.md` describes the dependency
model, every product entry, and worked examples (Specter, BGE, Harbor,
ClickHouse).

Vitest: 43 new test cases including transitive-promotion verification,
cross-product cascade, product CTA flow, and i18n wiring. All 146
tests pass; typecheck + build green.

Closes #175.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 09:43:00 +02:00
hatiyildiz
04559e5c37 docs(reconcile-pass-1): align docs with ground truth at dd578d1c
Reconcile Pass 1 — first holistic LLM-driven reconciliation pass per
~/.claude/skills/reconcile-catalyst-docs/SKILL.md. Skill triggered after
the post-Group-M architectural batch (#161, #162, #163, #167, #168,
#169, #170, #171, #173, #174, #175). Live ground truth verified against
kubectl + ls platform/ + git log + GHCR + componentGroups.ts.

Drift categories fixed:

- A. Numerical: bp-powerdns 1.0.5 → 1.0.6; component-logos 63 → 62
  (powerdns SVG missing, tracked under #173); bootstrap kit 11 → 12
  with bp-powerdns added per #167.
- B. Service: pool-domain-manager + 5 registrar adapters
  (Cloudflare/Namecheap/GoDaddy/OVH/Dynadot, #170) added to
  IMPLEMENTATION-STATUS, ARCHITECTURE, PLATFORM-TECH-STACK, GLOSSARY,
  and PROVISIONING-PLAN; bp-powerdns added to ARCHITECTURE bootstrap
  kit + Catalyst-on-Catalyst dependency tree.
- C. Architectural: SOVEREIGN-PROVISIONING §3 + DEMO-RUNBOOK Step 4
  + ORCHESTRATOR-STATE Step 6 rewritten from Dynadot-direct DNS writes
  to PowerDNS authoritative + PDM /v1/commit + registrar-adapter
  NS-flip; PROVISIONING-PLAN Phase 4 paths corrected to
  products/catalyst/bootstrap/api/ (per INVIOLABLE-PRINCIPLES #3 the
  Go provisioner does NOT call cloud APIs); Phase 6 retitled and
  rewritten for the new DNS architecture.
- D. Process: RUNBOOK-PROVISIONING §2 wizard-step table + DEMO-RUNBOOK
  Step 2 wizard-step table updated to canonical 7-step ordering
  (Org → Domain → Topology → Provider → Credentials → Components →
  Review per WIZARD_STEPS in WizardLayout.tsx, post #169 + #174); the
  three-mode StepDomain (pool / byo-manual / byo-api per #169) and
  two-tab StepComponents (mandatory infra + apps per #161/#162/#175)
  now documented.
- E. Cross-doc: Group G  across PROVISIONING-PLAN +
  ORCHESTRATOR-STATE (superseded by #167+#163+#170, not by the
  original Dynadot-multi-domain plan); Group C  in
  PROVISIONING-PLAN (Flux is reconciling from openova-public today);
  README Stack-at-a-glance DNS row expanded.
- F. Stale terminology: 11-grep banned-terms scan clean — every k8gb
  residual is a legitimate "removed at #171, replaced by lua-records"
  reference.

VALIDATION-LOG.md gains the Reconcile Pass 1 entry per skill spec.
Reconcile-skill numbering is independent of the Audit-skill numbering
(which continues at Pass 108+).

Files: 13 docs + VALIDATION-LOG entry.
Escalations: none.
2026-04-29 09:40:10 +02:00
github-actions[bot]
c83171805c deploy: update catalyst images to 2e6cfd7 2026-04-29 07:20:21 +00:00
hatiyildiz
2e6cfd79c3 merge: #173 — fix wizard component-card logos under /sovereign/ base
Squashed-via-no-ff: #173 root-cause fix for absolute logo URLs that
ignored the Vite base. componentGroups.ts now derives every logo path
from `import.meta.env.BASE_URL` via `path()` so the URL stays in sync
with vite.config.ts. Adds CI smoke step that curls the logos to fail
the build on any missing/mis-cased SVG, plus Vitest coverage for the
letter-mark fallback path.
2026-04-29 09:19:18 +02:00
hatiyildiz
d382d99e45 fix(catalyst-ui): #173 — wizard component logos render under /sovereign/ base
Root cause: componentGroups.ts hardcoded `/component-logos/<id>.svg`. The
catalyst-ui SPA is served at the Vite base `/sovereign/`, so the browser
fetches `/component-logos/...` (no prefix), which Traefik routes to the
website ingress, not catalyst-ui — every logo 404'd and the IconFallback
letter avatar took over for all 63 cards.

Fix: derive logo URLs from `path()` in shared/config/urls.ts, which reads
`import.meta.env.BASE_URL`. Vite injects the base at build time
(`/sovereign/` in prod, `/` in dev/test) so the URL stays in sync with
`vite.config.ts` and the ingress without any hardcoded prefix
(INVIOLABLE PRINCIPLE #4).
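
The real `path()` reads `import.meta.env.BASE_URL`, which only exists inside a Vite build. The sketch below takes the base as a parameter so it runs anywhere; `joinBase` is a hypothetical name and the slash handling is an assumption.

```typescript
// Join a base such as "/sovereign/" or "/" with an app-relative path,
// tolerating a missing trailing slash on the base or a leading slash on the path.
function joinBase(base: string, p: string): string {
  const b = base.endsWith("/") ? base : `${base}/`;
  return b + p.replace(/^\/+/, "");
}
```

For example, `joinBase("/sovereign/", "component-logos/cilium.svg")` yields `/sovereign/component-logos/cilium.svg`, the prefixed URL the ingress routes to catalyst-ui.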

Also:
- powerdns.svg was never vendored — set logoUrl: null so the wizard
  renders the letter-mark fallback for that one card by design.
- Add Vitest coverage for the null-logoUrl fallback path (PowerDNS).
- Add CI smoke step that asserts /component-logos/<id>.svg returns 200
  for 11 representative components so a missing or mis-cased vendored
  SVG fails the build, not the user.
- Document the logo path convention in a docblock at the top of
  componentGroups.ts so future devs can't reintroduce a hardcoded path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 09:18:50 +02:00
github-actions[bot]
dd578d1c13 deploy: update catalyst images to 7a5e5db 2026-04-29 07:16:20 +00:00
Emrah Baysal
7a5e5db9ba merge: hoist wizard step indicator into page header (#174)
Brings fix/wizard-step-header to main. Wizard's 7-step progress
indicator now lives in a single 56px page-header band (alongside the
OpenOva brand mark + theme/exit actions), matching the nova/core console
chrome convention. Step body reclaims the vertical real estate. New
vitest suite asserts the header layout, the 7 step indicators, the
active-step class, and the mobile fallback.

Closes #174.
2026-04-29 09:15:06 +02:00
hatiyildiz
dbf37e1ba5 fix(catalyst-wizard): hoist 7-step indicator into page header (#174)
The wizard's progress stepper used to live inside `.corp-main` (the step
body region). Nova / core console renders all chrome in a top header
band, so the wizard now does the same:

  - Single 56px-tall sticky `<header data-testid="wizard-header">` band
    hosts brand mark + 7-step indicator + theme/exit actions
  - Step indicator carries `data-testid="wizard-stepper"` and exposes
    `wizard-step-{1..7}` testids; the active step gets `.active` and
    `aria-current="step"`, completed steps get `.done`
  - At ≤1024px the per-step labels collapse, ≤720px the dotted indicator
    hides and a "Step X of Y · <Label>" string takes over
  - All dimensions/colors come from the wizard's `--wiz-*` token set
    (per Inviolable Principle #4 — never hardcode); the 56px height
    matches nova's Sidebar.svelte logo row (`h-14`) and the border-bottom
    uses the shared `--wiz-border` token
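
The breakpoint rules and the compact label can be modelled as below. In the component they are presumably CSS media queries; this sketch only mirrors the thresholds stated above, and both function names are hypothetical.

```typescript
type StepperMode = "full" | "labels-collapsed" | "text-only";

// <=1024px: per-step labels collapse; <=720px: dots hide, text summary only.
function stepperMode(viewportWidth: number): StepperMode {
  if (viewportWidth <= 720) return "text-only";
  if (viewportWidth <= 1024) return "labels-collapsed";
  return "full";
}

function stepSummary(current: number, total: number, label: string): string {
  return `Step ${current} of ${total} · ${label}`;
}
```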

Vitest covers: header presence, brand mark, exactly 7 step indicators,
active/done class application, mobile-collapsed indicator, and the
absence of a duplicate stepper inside the step body.

Closes #174.
2026-04-29 09:14:37 +02:00
github-actions[bot]
40805334a8 deploy: update catalyst images to 194b0ee 2026-04-29 07:04:00 +00:00
e3mrah
194b0ee413
Merge pull request #172 from openova-io/feat/wizard-byo-domain
feat(wizard): #169 — StepDomain three-mode (pool / byo-manual / byo-api)
2026-04-29 11:02:21 +04:00
hatiyildiz
20f5dca902 feat(wizard): #169 — StepDomain three-mode (pool / byo-manual / byo-api)
Closes openova#169.

Wizard UI:
- New StepDomain.tsx with three radio modes (pool / BYO manual NS / BYO
  registrar API). Pool flow unchanged from #163. BYO-manual surfaces the
  three OpenOva nameservers (ns1-3.openova.io) verbatim with copy buttons.
  BYO-api adds a registrar dropdown (Cloudflare, Namecheap, GoDaddy, OVH,
  Dynadot) + token field + Validate button — read-only validation hits
  /api/v1/registrar/{r}/validate before Next is enabled.
- StepOrg trimmed to org-only fields (domain capture moved to StepDomain).
- WizardPage + WizardLayout add the new "Domain" step (now 7 steps total).

Wizard store:
- DomainMode expanded to 'pool' | 'byo-manual' | 'byo-api' with legacy
  'byo' coerced to 'byo-manual' on rehydrate.
- New fields: registrarType (RegistrarType | null), registrarToken,
  registrarTokenValidated.
- partialize() strips registrarToken + registrarTokenValidated from
  localStorage (credential hygiene per docs/INVIOLABLE-PRINCIPLES.md #10).
- setSovereignDomainMode cascades a clean reset of irrelevant fields.
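
The rehydrate coercion and the credential-stripping persist step can be sketched as below. The `sovereignDomainMode` field name, the lowercase registrar literals, and the fallback to `pool` for unrecognised values are assumptions. Writing the persist step as an allowlist means a credential field added later is not persisted by accident.

```typescript
type DomainMode = "pool" | "byo-manual" | "byo-api";
type RegistrarType = "cloudflare" | "namecheap" | "godaddy" | "ovh" | "dynadot";

interface DomainState {
  sovereignDomainMode: DomainMode;
  registrarType: RegistrarType | null;
  registrarToken: string;
  registrarTokenValidated: boolean;
}

type PersistedDomain = Pick<DomainState, "sovereignDomainMode" | "registrarType">;

// Rehydrate: the legacy value "byo" becomes "byo-manual".
function coerceDomainMode(raw: unknown): DomainMode {
  if (raw === "byo") return "byo-manual";
  if (raw === "pool" || raw === "byo-manual" || raw === "byo-api") return raw as DomainMode;
  return "pool"; // assumed fallback for anything unrecognised
}

// Allowlist: only non-secret fields reach localStorage.
function partializeDomain(s: DomainState): PersistedDomain {
  return { sovereignDomainMode: s.sovereignDomainMode, registrarType: s.registrarType };
}
```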

PDM (core/pool-domain-manager):
- New endpoint POST /api/v1/registrar/{registrar}/validate — read-only
  twin of /set-ns. Calls adapter.ValidateToken; never flips NS records.
  Maps registrar errors to canonical HTTP statuses (401/403/429/502).
  Token never enters a logged struct.

catalyst-api (products/catalyst/bootstrap/api):
- New handler/registrar.go — thin proxy that forwards
  /api/v1/registrar/{r}/{validate|set-ns} to PDM's matching endpoint,
  reading the body once and streaming PDM's response status + body
  verbatim so the wizard's error-mapping vocabulary stays consistent.

Tests:
- StepDomain.test.tsx — 18 vitest cases covering all three modes,
  mode-switch field cleanup, validate fetch happy/error paths, token
  invalidation on edit.
- store.test.ts — wizard-store mutations + persist hygiene.
- StepSuccess.test.tsx — fixture updated 'byo' -> 'byo-manual'.
- registrar_test.go (PDM) — 7 new test cases for /validate covering
  happy, invalid-token, domain-not-in-account, unsupported-registrar,
  missing-fields, bad-JSON, response-doesnt-leak-token.

103 vitest cases pass. Go tests pass for both PDM and catalyst-api.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 09:01:07 +02:00
github-actions[bot]
f06b6d9ce7 deploy: update catalyst images to 67fdecb 2026-04-29 06:52:27 +00:00
hatiyildiz
67fdecb770 merge: remove k8gb (#171) 2026-04-29 08:51:21 +02:00
hatiyildiz
f5daac52af refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171)
PowerDNS lua-records (`ifurlup`, `pickclosest`, `ifportup`) cover everything
k8gb was doing — geo-aware response selection, health-checked failover,
weighted round-robin — at the authoritative DNS layer. Eliminates a
separate K8s controller, CRD set, and CoreDNS plugin from every Sovereign.

Changes:
- platform/k8gb/ deleted (only a README existed; Chart.yaml, values.yaml,
  and blueprint.yaml were never authored)
- products/catalyst/bootstrap/ui/public/component-logos/k8gb.svg deleted
- componentGroups.ts: remove k8gb component (PowerDNS already there)
- componentLogos.tsx: drop logo_k8gb + k8gb map entry
- model.ts DEFAULT_COMPONENT_GROUPS spine: replace k8gb with powerdns
- StepInfrastructure.tsx: copy refers to PowerDNS lua-records, not k8gb
- provision.html: replace k8gb tile and edges with powerdns
- catalog.generated.ts regenerated (now includes bp-powerdns)
- docs sweep — every k8gb reference in PLATFORM-TECH-STACK, NAMING-
  CONVENTION, SOVEREIGN-PROVISIONING, SRE, ARCHITECTURE, GLOSSARY,
  COMPONENT-LOGOS, IMPLEMENTATION-STATUS, BUSINESS-STRATEGY,
  TECHNOLOGY-FORECAST, README, infra/hetzner/README, platform READMEs
  (cilium, external-dns, failover-controller, litmus, flux, opentofu)
  rewritten to point at PowerDNS lua-records / MULTI-REGION-DNS.md.
  Historical entries in VALIDATION-LOG.md preserved as audit trail.
- New docs/MULTI-REGION-DNS.md — canonical reference for the lua-record
  patterns (ifurlup all/pickclosest/pickfirst, ifportup, pickwhashed),
  Application Placement → lua-record selector mapping, when to add a
  second Sovereign region, operational checks.

Closes #171.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:51:09 +02:00
hatiyildiz
6e9b9fe8a3 merge: bp-powerdns 1.0.6 — gpgsql-dnssec=yes (openova#168 followup)
Fixes the 422 "no DNSSEC-capable backends loaded" surfaced when PDM
tried to enable DNSSEC on parent pool zones at startup.
2026-04-29 08:42:27 +02:00
hatiyildiz
f4679e2748 fix(powerdns): enable gpgsql-dnssec for DNSSEC API (1.0.6)
Without `gpgsql-dnssec=yes` the gpgsql backend driver does not expose
the DNSSEC API surface — `PUT /zones/<zone>` with `dnssec:true` returns
422 "no DNSSEC-capable backends are loaded". This blocks pool-domain-
manager from enabling DNSSEC on every Sovereign child zone (mandatory
per docs/PLATFORM-POWERDNS.md).

Fix lands in additionalConfig so the directive is rendered alongside
`default-soa-edit-signed=INCEPTION-EPOCH` and `direct-dnskey=yes`. No
schema migration needed — the gpgsql 5.0.3 schema already includes the
cryptokeys table; the missing piece was just the backend feature flag.

Bumps Chart.yaml to 1.0.6. Verified: after this lands the PUT call
returns 204 and POST /cryptokeys mints a usable KSK.

Discovered while bringing up openova#168 (PDM per-Sovereign zones).
2026-04-29 08:42:18 +02:00
hatiyildiz
f777394367 merge: PDM per-Sovereign PowerDNS zones (openova#168)
PDM /reserve now creates a per-Sovereign child zone in PowerDNS with
apex NS RRset + adds NS delegation into the parent pool zone +
enables DNSSEC. /commit writes the canonical 6-record set into the
child zone (atomic PATCH). /release drops the child zone and removes
the parent NS delegation.

Includes pdns client (22 tests), allocator with DNSWriter interface
(fake-DNS state-machine tests), startup parent-zone bootstrap, and
trimmed dynadot package (now config helpers only — registrar adapter
under internal/registrar/dynadot/ untouched for #170 BYO Flow B).
2026-04-29 08:37:07 +02:00
hatiyildiz
a6fb7410f4 feat(pdm): per-Sovereign PowerDNS zones for #168
Refactor pool-domain-manager to own per-Sovereign zones in PowerDNS,
replacing the previous Dynadot-set_dns2 record-write flow.

Phase 1 — internal/pdns: REST client for PowerDNS Authoritative API
  - CreateZone / DeleteZone / EnsureZone / ZoneExists
  - PatchRRSets (atomic batch RRset writes)
  - AddARecord / AddNSDelegation / RemoveNSDelegation
  - EnableDNSSEC: PUT dnssec flag, generate KSK+ZSK (algorithm 13
    ECDSAP256SHA256 per docs/PLATFORM-POWERDNS.md), POST rectify
  - retry-once-on-5xx with exponential backoff (250ms, 1s)
  - X-API-Key header from K8s Secret, never logged
  - 22 unit tests covering every method against httptest mock

Phase 2 — allocator: DNSWriter interface + per-Sovereign lifecycle
  - /reserve: insert pdm-pg row + create child zone with apex NS
    RRset + add NS delegation into parent + enable DNSSEC on child
  - /commit: write the canonical 6-record set (apex, *, console,
    api, gitea, harbor) into child zone, TTL 300, atomic PATCH
  - /release: drop child zone (DNSSEC keys retire) + remove parent
    NS delegation, idempotent on 404
  - sweeper teardowns DNS for expired reservations before deleting
    pdm-pg rows
  - rollback path on Reserve failure preserves operator UX
  - allocator_test.go: fake DNSWriter for state-machine assertions

Phase 3 — startup parent-zone bootstrap
  - BootstrapParentZones runs at PDM startup before HTTP serves
  - EnsureZone for every entry in DYNADOT_MANAGED_DOMAINS
  - DNSSEC enabled on each parent zone (idempotent)
  - PDM exits non-zero if bootstrap fails

Phase 4 — schema unchanged
  - child zone name derived as <subdomain>.<poolDomain>, no new column
  - existing pool_allocations table works as-is

Phase 5 — dynadot package trimmed
  - removed AddSovereignRecords / DeleteSubdomainRecords / AddRecord /
    getZone / writeZone (Dynadot DNS write code)
  - kept IsManagedDomain / ManagedDomains / ResetManagedDomains /
    ErrUnmanagedDomain (config-resolution helpers)
  - registrar adapter at internal/registrar/dynadot/ untouched (handles
    BYO Flow B NS-delegation via #170)

Phase 6 — env-var contract
  PDM_PDNS_BASE_URL, PDM_PDNS_API_KEY, PDM_PDNS_SERVER_ID, PDM_NAMESERVERS
  all runtime-configurable per docs/INVIOLABLE-PRINCIPLES.md #4.

Quality bar (all met):
  - DNSSEC enabled on every child zone (mandatory per spec)
  - parent NS delegation TTL 3600, child A-record TTL 300
  - retry-once-on-5xx with exponential backoff in pdns client
  - all credentials flow from env vars sourced from K8s Secrets
  - no hardcoded URLs, regions, or NS endpoints

Closes openova#168 (DNS-side; private-repo manifest update lands separately).
2026-04-29 08:36:45 +02:00
hatiyildiz
22d430eaa8 merge: bp-powerdns 1.0.5 (postInitSQL syntax fix, openova#167)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:17:34 +02:00
hatiyildiz
fa84cac438 fix(powerdns): plain ALTER TABLE in postInitSQL (avoid $$ escape battle, 1.0.5)
By the time the 1.0.4 DO block reached CNPG's postInitApplicationSQL, its
$$ delimiters had collapsed to $, giving "syntax error at or near $".
Both Helm template processing and the YAML scalar block were consuming
the dollar signs.

Replaced with explicit ALTER TABLE statements (one per gpgsql table) +
GRANT: same end state, no PL/pgSQL quoting required. Verified at
runtime on contabo-mkt: the powerdns Pod went from CrashLoopBackOff to
Running 1/1 immediately after the same ALTER statements were run by hand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:17:28 +02:00
hatiyildiz
30f3015dc8 merge: bp-powerdns 1.0.4 (CNPG ownership fix, openova#167)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:14:18 +02:00
hatiyildiz
214a3e1ada fix(powerdns): grant table ownership to pdns user in CNPG bootstrap (1.0.4)
Verified at runtime on contabo-mkt: postInitApplicationSQL runs as the
postgres superuser, not the application owner, so the schema tables
created by the bootstrap block were owned by postgres. PowerDNS connects
as 'pdns' and got 'permission denied for table domains' on the first
SELECT against the zone cache.

Added a DO block at the end of the schema bootstrap that walks every
table in the public schema and ALTERs OWNER TO {{ .Values.postgres.cluster.owner }},
plus GRANT ALL PRIVILEGES ON SCHEMA public. This is the same shape PDM
uses, and the fix was verified at runtime on the contabo-mkt cluster: the
powerdns Pod went from CrashLoopBackOff to 1/1 Ready immediately after the
same DDL was run by hand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:14:12 +02:00
hatiyildiz
036dc39800 merge: bp-powerdns 1.0.3 (dnsdist backend env-injection + table ownership, openova#167)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:13:54 +02:00
hatiyildiz
db20e9d42b fix(powerdns): dnsdist backend resolution + drop DnstapLogAction (1.0.3)
dnsdist 1.9.14 runtime errors:
  1. newServer{address='powerdns:5353'} → "Unable to convert presentation
     address" — dnsdist's address parser expects IP[:port], not a DNS
     name. Kubernetes auto-injects POWERDNS_SERVICE_HOST as an env var
     into every pod in the powerdns Service's namespace that is created
     after the Service exists; using that gives us the ClusterIP at
     config-load time without needing an init container or runtime DNS
     resolution.
  2. DnstapLogAction(name, bool, fn) signature changed in 1.9 — the
     2nd parameter now expects a shared_ptr to a RemoteLoggerInterface,
     not a boolean. Rather than wire up a remote dnstap server (which
     adds a moving part for marginal observability gain), drop the line.
     Catalyst observability is the dnsdist /metrics endpoint surfaced
     to Prometheus + the k8s container log.
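
   dnsdist's config itself is Lua, so the Go below only illustrates the
   env-var naming rule and the IP:port string newServer ends up receiving;
   the function names and the example ClusterIP are hypothetical.
   Kubernetes derives the variable from the Service name by upper-casing it
   and turning dashes into underscores.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// serviceHostEnv returns the variable the kubelet sets for a Service's
// ClusterIP: upper-cased name, dashes turned into underscores.
func serviceHostEnv(service string) string {
	return strings.ToUpper(strings.ReplaceAll(service, "-", "_")) + "_SERVICE_HOST"
}

// backendAddress builds the IP[:port] form dnsdist's address parser accepts,
// instead of a DNS name it cannot convert.
func backendAddress(lookup func(string) (string, bool), service string, port int) (string, error) {
	key := serviceHostEnv(service)
	ip, ok := lookup(key)
	if !ok || net.ParseIP(ip) == nil {
		return "", fmt.Errorf("%s not set or not an IP (Pod created before the Service?)", key)
	}
	// JoinHostPort brackets IPv6 ClusterIPs correctly.
	return net.JoinHostPort(ip, strconv.Itoa(port)), nil
}
```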

Bumped chart to 1.0.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:12:27 +02:00
hatiyildiz
790fc7efb0 merge: bp-powerdns 1.0.2 (dnsdist tag + RO rootfs fix, openova#167)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:06:47 +02:00
hatiyildiz
20c0543806 fix(powerdns): correct dnsdist image tag + drop readOnlyRootFilesystem (1.0.2)
Two runtime issues caught during first contabo-mkt rollout:

1. dnsdist image tag was "1.9" (default) — that tag doesn't exist in
   docker.io/powerdns/dnsdist-19. The 1.9.x line publishes 1.9.0 .. 1.9.14
   (no rolling "1.9" alias). Pinned to 1.9.14 (current latest).

2. PowerDNS pod crash-looped on Errno 30 (Read-only file system:
   /etc/powerdns/pdns.d/0-api.conf.conf). The upstream pdns_server-startup
   script writes rendered config files to /etc/powerdns/pdns.d/ at
   container start, and the upstream template doesn't expose an emptyDir
   we could redirect that path to. Set readOnlyRootFilesystem=false with
   a verbose comment explaining why; the rest of the security context
   (runAsNonRoot, runAsUser=953, drop ALL caps) stays in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:06:39 +02:00
hatiyildiz
134b3fbedf merge: bp-powerdns 1.0.1 (dnsdist checksum fix, openova#167)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:03:00 +02:00
hatiyildiz
19d926bfeb fix(powerdns): avoid recursive include in dnsdist checksum, bump to 1.0.1
Helm flagged dnsdist.yaml's checksum/config annotation as a recursive
template self-reference (the file included itself). Replaced with a
hash of the rendered .Values.dnsdist.config (post-tpl), which is the
substantive content the annotation is supposed to track anyway.

Bumped Chart.yaml to 1.0.1 so the OCIRepository semver "1.x" picks
up the fix automatically on next reconcile. Blueprint API version stays
at 1.0.0 (Blueprint contract is unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:02:53 +02:00
hatiyildiz
e3a006bc6f merge: bp-powerdns wrapper + per-Sovereign zone model (closes #167 phases 1-3)
Closes #167 (public-repo phases). Cluster manifest deploy in
openova-private feat/powerdns-deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 07:50:16 +02:00