8 Commits

Author SHA1 Message Date
e3mrah
cd6b2555a0
fix(pdm/dynadot): remove fictional ResponseHeader wrapper from api3.json adapter (#939) (#948)
Dynadot's real api3.json response places ResponseCode + Status + Error
DIRECTLY under each <Command>Response envelope; there is no nested
`ResponseHeader` object — the prior decode shape was a misread of the
docs that survived because every test fixture used the same fictional
shape.

Live capture (2026-05-05, omani.works domain_info success):
  {"DomainInfoResponse":{"ResponseCode":0,"Status":"success",
   "DomainInfo":{...}}}

Live capture (error envelope):
  {"DomainInfoResponse":{"ResponseCode":"-1","Status":"error",
   "Error":"could not find domain in your account"}}

Note: ResponseCode is JSON int 0 on success but JSON string "-1" on
error. Switched to json.Number so both shapes round-trip without an
Unmarshal failure, and added codeIsZero() to normalise comparison.
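
A minimal sketch of the mixed-type decode described above. It assumes Go's encoding/json, whose json.Number accepts both a JSON number and a quoted numeric string. The envelope struct and decodeDomainInfo helper are hypothetical names for illustration, and codeIsZero here is a guess at the helper's behaviour, not the adapter's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope mirrors the flat shape in the live captures above. The field
// set is an assumption for illustration, not the adapter's real struct.
type envelope struct {
	ResponseCode json.Number `json:"ResponseCode"`
	Status       string      `json:"Status"`
	Error        string      `json:"Error"`
}

// codeIsZero reports whether the code is numerically zero, whether it
// arrived as the JSON number 0 or as a quoted string such as "0".
// A missing ResponseCode (empty json.Number) is treated as non-zero.
func codeIsZero(n json.Number) bool {
	v, err := n.Int64()
	return err == nil && v == 0
}

// decodeDomainInfo is a hypothetical helper that unwraps DomainInfoResponse.
func decodeDomainInfo(raw string) (envelope, error) {
	var outer struct {
		DomainInfoResponse envelope `json:"DomainInfoResponse"`
	}
	err := json.Unmarshal([]byte(raw), &outer)
	return outer.DomainInfoResponse, err
}

func main() {
	captures := []string{
		`{"DomainInfoResponse":{"ResponseCode":0,"Status":"success"}}`,
		`{"DomainInfoResponse":{"ResponseCode":"-1","Status":"error","Error":"could not find domain in your account"}}`,
	}
	for _, raw := range captures {
		env, err := decodeDomainInfo(raw)
		fmt.Println(codeIsZero(env.ResponseCode), env.Status, err)
	}
}
```

Comparing the parsed integer rather than the raw text keeps "0" and 0 equivalent without a custom UnmarshalJSON.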

What's fixed in this commit:

- core/pool-domain-manager/internal/registrar/dynadot:
  ValidateToken / SetNameservers / GetNameservers / GetGlueRecord /
  RegisterGlueRecord (all five command paths) now decode against the
  real shape. Tightened classifyDynadotError so "could not find domain
  in your account" maps to ErrDomainNotInAccount before the auth
  matcher (which would otherwise match on the substring "auth").

- core/pkg/dynadot-client: GetDomainInfo (was the last set_dns2 sibling
  still using the wrapper) aligned with the rest of the client.

- products/catalyst/bootstrap/api/internal/dynadot: AddRecord rebound
  to SetDnsResponse (not the SetDns2Response key it never returned)
  with code+status at the top — fixes the silent-success-on-failure
  loophole the catalyst-api was hitting.

Tests use real api3.json fixture shapes; new regression coverage for:
  - ResponseCode=int 0 w/o Status field (Dynadot omits Status sometimes)
  - "could not find domain in your account" → ErrDomainNotInAccount
  - "needs to be registered with an ip address" set_ns rejection (#900)

Verified via live integration call against api.dynadot.com:
  - ValidateToken(omani.works)  -> success
  - ValidateToken(google.com)   -> ErrDomainNotInAccount
  - GetNameservers(omani.works) -> ["ns1.openova.io","ns2.openova.io"]

Refs #939, #170, #900, #825.

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:11:39 +04:00
e3mrah
e08d8721e1
fix(pdm/dynadot): pre-register glue records before set_ns (#900) (#906)
Multi-domain Day-2 add-domain on a Sovereign was failing with Dynadot's
"'ns1.<sov>.omani.works' needs to be registered with an ip address
before it can be used" error. Dynadot rejects set_ns whenever the NS
hostnames aren't registered as account-level "host records" first.

This change wires the glue pre-registration into the PDM dynadot
adapter as an optional registrar.GlueRegistrar interface, threads the
Sovereign's load-balancer IPv4 from cloud-init through Flux postBuild
into the chart's `global.sovereignLBIP`, and forwards it via
catalyst-api's pdmFlipNS to PDM's /set-ns endpoint as a new `glueIP`
field. PDM's SetNS handler calls RegisterGlueRecord for each
out-of-bailiwick NS before SetNameservers, with idempotent get_ns →
register_ns / set_ns_ip semantics so retries are free.
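
The idempotent get_ns → register_ns / set_ns_ip flow can be sketched roughly as follows. The glueAPI interface, ensureGlue, and the in-memory fake are hypothetical names for illustration; the real RegisterGlueRecord signature and the Dynadot wire calls are not shown in this commit message. The host and IP values are documentation placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// glueAPI is a hypothetical seam over the three Dynadot commands named above.
type glueAPI interface {
	GetNS(host string) (ip string, found bool, err error)
	RegisterNS(host, ip string) error
	SetNSIP(host, ip string) error
}

// ensureGlue registers the host if it is unknown, updates its IP if it
// differs, and does nothing when the record already matches.
func ensureGlue(api glueAPI, host, ip string) error {
	if host == "" || ip == "" {
		return errors.New("host and ip are required")
	}
	current, found, err := api.GetNS(host)
	if err != nil {
		return err
	}
	switch {
	case !found:
		return api.RegisterNS(host, ip)
	case current != ip:
		return api.SetNSIP(host, ip)
	default:
		return nil // already correct: a retry costs one read and no writes
	}
}

// fakeGlue is an in-memory stand-in that counts write calls.
type fakeGlue struct {
	hosts  map[string]string
	writes int
}

func (f *fakeGlue) GetNS(host string) (string, bool, error) {
	ip, ok := f.hosts[host]
	return ip, ok, nil
}

func (f *fakeGlue) RegisterNS(host, ip string) error {
	f.hosts[host] = ip
	f.writes++
	return nil
}

func (f *fakeGlue) SetNSIP(host, ip string) error {
	f.hosts[host] = ip
	f.writes++
	return nil
}

func main() {
	api := &fakeGlue{hosts: map[string]string{}}
	for i := 0; i < 2; i++ {
		if err := ensureGlue(api, "ns1.example.test", "203.0.113.10"); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("writes:", api.writes) // prints "writes: 1"
}
```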

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:00:45 +04:00
e3mrah
4946ccd125
feat(bp-catalyst-platform): expose marketplace + tenant wildcard, bump 1.3.0 (closes #710) (#719)
Marketplace exposure for franchised Sovereigns. Otech becomes a SaaS
operator with a single overlay toggle.

Changes
=======

products/catalyst/chart:
- Chart.yaml 1.2.7 → 1.3.0
- values.yaml: ingress.marketplace.enabled toggle (default false) +
  marketplace.{brand,currency,paymentProvider,signupPolicy} surface
- templates/sme-services/marketplace-routes.yaml: HTTPRoute
  marketplace.<sov> with /api/ → marketplace-api, /back-office/ → admin,
  / → marketplace; HTTPRoute *.<sov> → console (per-tenant wildcard)
- templates/sme-services/marketplace-reference-grant.yaml: cross-
  namespace ReferenceGrant from catalyst-system HTTPRoute → sme Services
- .helmignore: stop excluding sme-services/* and marketplace-api/* (only
  *.kustomization.yaml + *.ingress.yaml remain Kustomize-only)
- All sme-services/* + marketplace-api/* manifests wrapped with
  {{ if .Values.ingress.marketplace.enabled }} so non-marketplace
  Sovereigns render the chart unchanged

clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
- chart version 1.2.7 → 1.3.0
- ingress.hosts.marketplace.host: marketplace.${SOVEREIGN_FQDN}
- ingress.marketplace.enabled: ${MARKETPLACE_ENABLED:-false}

infra/hetzner:
- variables.tf: marketplace_enabled var (string "true"/"false", default "false")
- main.tf: thread var into cloudinit-control-plane.tftpl
- cloudinit-control-plane.tftpl: postBuild.substitute.MARKETPLACE_ENABLED
  on bootstrap-kit, sovereign-tls, infrastructure-config Kustomizations

products/catalyst/bootstrap/api/internal/provisioner/provisioner.go:
- Request.MarketplaceEnabled bool (json:"marketplaceEnabled")
- writeTfvars: marketplace_enabled = "true"|"false"

core/pool-domain-manager/internal/allocator/allocator.go:
- canonicalRecordSet adds "marketplace" prefix → marketplace.<sov>
  resolves via PDM at zone-commit time (PR #710 explicit record so
  caches don't depend on the *.<sov> wildcard alone)

DoD ready
=========
- helm template with ingress.marketplace.enabled=false → identical
  manifest set to 1.2.7 (verified locally)
- helm template with ingress.marketplace.enabled=true → emits 17 extra
  resources: 13 sme-services workloads + 2 marketplace-api + 1
  HTTPRoute pair + 1 ReferenceGrant
- pdm tests: TestCanonicalRecordSet, TestCommitDNSShape green
- catalyst-api builds, provisioner cloudinit_path_test green

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 07:47:37 +04:00
e3mrah
3a34969a2f
feat(catalyst+pdm): Sovereign self-decommission + post-handover redirect (closes #319) (#451)
Customer-side decommission UI + PDM release endpoints + Catalyst-Zero
redirect to console.<sovereign-fqdn> once handover is finalised.

Anti-duplication map (canonical seams reused, NOT duplicated):
  - catalyst-api wipe.go: existing wipe endpoint already drives PDM
    release + Hetzner purge + tofu destroy + local cleanup. The new
    DecommissionPage POSTs to the same endpoint with an optional
    backup-destination payload.
  - PDM Allocator.Release: child zone delete + parent-zone NS revert
    + allocation row delete already idempotent. The new sovereign-side
    POST /api/v1/release is a thin FQDN-shaped wrapper that splits at
    the first dot and delegates to Allocator.Release.
  - The orphan force-release path adds gates (X-Force-Release-Confirm
    header, 30-day grace, DNS-NXDOMAIN check) on top of the same seam.
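
The "splits at the first dot" step of the FQDN-shaped wrapper might look like the sketch below. splitFQDN is a hypothetical name, and the lowercasing and trailing-dot trimming are assumptions that the commit message does not confirm.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitFQDN splits "sub.pool.tld" at the first dot into the subdomain
// label and the pool domain. Normalisation rules here are assumptions.
func splitFQDN(fqdn string) (subdomain, poolDomain string, err error) {
	name := strings.TrimSuffix(strings.ToLower(strings.TrimSpace(fqdn)), ".")
	sub, pool, ok := strings.Cut(name, ".")
	if !ok || sub == "" || pool == "" {
		return "", "", errors.New("fqdn must have the form <subdomain>.<poolDomain>")
	}
	return sub, pool, nil
}

func main() {
	sub, pool, err := splitFQDN("acme.omani.works")
	fmt.Println(sub, pool, err)
}
```

The two parts would then be handed to Allocator.Release, whose exact signature is not given here.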

Scope contract with #317 (handover finalisation): NOT touching
internal/handler/handover.go. AdoptedAt is a new contract field on
Deployment + store.Record that the redirect helper consumes; future
#317 enhancement will populate it before deletion.

Files:
  core/pool-domain-manager/internal/handler/release.go         (NEW)
  core/pool-domain-manager/internal/handler/release_test.go    (NEW)
  core/pool-domain-manager/internal/handler/handler.go         (route wiring)
  products/catalyst/bootstrap/api/internal/handler/deployments.go     (AdoptedAt field + State()/toRecord/fromRecord)
  products/catalyst/bootstrap/api/internal/handler/deployments_adopted_test.go (NEW)
  products/catalyst/bootstrap/api/internal/store/store.go      (AdoptedAt persistence)
  products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.tsx        (NEW)
  products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.test.tsx   (NEW)
  products/catalyst/bootstrap/ui/src/pages/sovereign/Dashboard.tsx    (Decommission link)
  products/catalyst/bootstrap/ui/src/app/router.tsx            (redirect + decom route)
  docs/omantel-handover-wbs.md                                 (T319 → done)

Tests: 13 new Go test cases + 5 new vitest cases all green. catalyst-api
+ PDM full suites pass. Live execution against omantel deferred to
Phase 8 per ticket scope (no Dynadot/Hetzner exec here).

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 19:27:18 +04:00
hatiyildiz
20f5dca902 feat(wizard): #169 — StepDomain three-mode (pool / byo-manual / byo-api)
Closes openova#169.

Wizard UI:
- New StepDomain.tsx with three radio modes (pool / BYO manual NS / BYO
  registrar API). Pool flow unchanged from #163. BYO-manual surfaces the
  three OpenOva nameservers (ns1-3.openova.io) verbatim with copy buttons.
  BYO-api adds a registrar dropdown (Cloudflare, Namecheap, GoDaddy, OVH,
  Dynadot) + token field + Validate button — read-only validation hits
  /api/v1/registrar/{r}/validate before Next is enabled.
- StepOrg trimmed to org-only fields (domain capture moved to StepDomain).
- WizardPage + WizardLayout add the new "Domain" step (now 7 steps total).

Wizard store:
- DomainMode expanded to 'pool' | 'byo-manual' | 'byo-api' with legacy
  'byo' coerced to 'byo-manual' on rehydrate.
- New fields: registrarType (RegistrarType | null), registrarToken,
  registrarTokenValidated.
- partialize() strips registrarToken + registrarTokenValidated from
  localStorage (credential hygiene per docs/INVIOLABLE-PRINCIPLES.md #10).
- setSovereignDomainMode cascades a clean reset of irrelevant fields.

PDM (core/pool-domain-manager):
- New endpoint POST /api/v1/registrar/{registrar}/validate — read-only
  twin of /set-ns. Calls adapter.ValidateToken; never flips NS records.
  Maps registrar errors to canonical HTTP statuses (401/403/429/502).
  Token never enters a logged struct.
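
One way the error-to-status mapping could be written, as a sketch only. The error names follow the typed errors listed for #170, but which error pairs with which of the 401/403/429/502 statuses is an assumption, and statusFor is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Sentinel errors named after the registrar package's typed errors.
var (
	ErrInvalidToken       = errors.New("invalid token")
	ErrDomainNotInAccount = errors.New("domain not in account")
	ErrRateLimited        = errors.New("rate limited")
	ErrAPIUnavailable     = errors.New("registrar API unavailable")
)

// statusFor maps an adapter error to an HTTP status. The pairing below
// is assumed; anything unrecognised is reported as an upstream failure.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrInvalidToken):
		return http.StatusUnauthorized // 401
	case errors.Is(err, ErrDomainNotInAccount):
		return http.StatusForbidden // 403
	case errors.Is(err, ErrRateLimited):
		return http.StatusTooManyRequests // 429
	default:
		return http.StatusBadGateway // 502, includes ErrAPIUnavailable
	}
}

func main() {
	// errors.Is sees through wrapping, so adapters can add context freely.
	wrapped := fmt.Errorf("dynadot: %w", ErrRateLimited)
	fmt.Println(statusFor(wrapped), statusFor(ErrAPIUnavailable))
}
```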

catalyst-api (products/catalyst/bootstrap/api):
- New handler/registrar.go — thin proxy that forwards
  /api/v1/registrar/{r}/{validate|set-ns} to PDM's matching endpoint,
  reading the body once and streaming PDM's response status + body
  verbatim so the wizard's error-mapping vocabulary stays consistent.

Tests:
- StepDomain.test.tsx — 18 vitest cases covering all three modes,
  mode-switch field cleanup, validate fetch happy/error paths, token
  invalidation on edit.
- store.test.ts — wizard-store mutations + persist hygiene.
- StepSuccess.test.tsx — fixture updated 'byo' -> 'byo-manual'.
- registrar_test.go (PDM) — 7 new test cases for /validate covering
  happy, invalid-token, domain-not-in-account, unsupported-registrar,
  missing-fields, bad-JSON, response-doesnt-leak-token.

103 vitest cases pass. Go tests pass for both PDM and catalyst-api.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 09:01:07 +02:00
hatiyildiz
a6fb7410f4 feat(pdm): per-Sovereign PowerDNS zones for #168
Refactor pool-domain-manager to own per-Sovereign zones in PowerDNS,
replacing the previous Dynadot-set_dns2 record-write flow.

Phase 1 — internal/pdns: REST client for PowerDNS Authoritative API
  - CreateZone / DeleteZone / EnsureZone / ZoneExists
  - PatchRRSets (atomic batch RRset writes)
  - AddARecord / AddNSDelegation / RemoveNSDelegation
  - EnableDNSSEC: PUT dnssec flag, generate KSK+ZSK (algorithm 13
    ECDSAP256SHA256 per docs/PLATFORM-POWERDNS.md), POST rectify
  - retry-once-on-5xx with exponential backoff (250ms, 1s)
  - X-API-Key header from K8s Secret, never logged
  - 22 unit tests covering every method against httptest mock
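
A rough sketch of retrying on 5xx with a backoff schedule. The commit message does not spell out how the two listed delays combine with a single retry, so the schedule is passed in as a parameter. doWithRetry, defaultBackoff, and recorder are hypothetical names, and the sleep function is injected so the example runs without waiting.

```go
package main

import (
	"fmt"
	"time"
)

// defaultBackoff holds the two delays quoted in the commit message.
var defaultBackoff = []time.Duration{250 * time.Millisecond, time.Second}

// doWithRetry runs attempt once, then retries while it returns a 5xx
// status, sleeping for the next delay before each retry. Transport
// errors and non-5xx statuses are returned immediately.
func doWithRetry(attempt func() (int, error), delays []time.Duration, sleep func(time.Duration)) (int, error) {
	status, err := attempt()
	for _, d := range delays {
		if err != nil || status < 500 {
			break
		}
		sleep(d)
		status, err = attempt()
	}
	return status, err
}

// recorder captures requested sleeps instead of blocking.
type recorder struct{ slept []time.Duration }

func (r *recorder) sleep(d time.Duration) { r.slept = append(r.slept, d) }

func main() {
	calls := 0
	flaky := func() (int, error) {
		calls++
		if calls == 1 {
			return 503, nil // first call fails with a server error
		}
		return 200, nil
	}
	rec := &recorder{}
	status, err := doWithRetry(flaky, defaultBackoff, rec.sleep)
	fmt.Println(status, err, calls, rec.slept)
}
```

In production the sleep argument would be time.Sleep; passing defaultBackoff[:1] gives a strict single retry.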

Phase 2 — allocator: DNSWriter interface + per-Sovereign lifecycle
  - /reserve: insert pdm-pg row + create child zone with apex NS
    RRset + add NS delegation into parent + enable DNSSEC on child
  - /commit: write the canonical 6-record set (apex, *, console,
    api, gitea, harbor) into child zone, TTL 300, atomic PATCH
  - /release: drop child zone (DNSSEC keys retire) + remove parent
    NS delegation, idempotent on 404
  - sweeper tears down DNS for expired reservations before deleting
    pdm-pg rows
  - rollback path on Reserve failure preserves operator UX
  - allocator_test.go: fake DNSWriter for state-machine assertions
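
As an illustration of the /commit step, the six-record set can be expressed as a single PowerDNS zone PATCH body. This sketch assumes every record is an A record pointing at one IPv4 address and uses the rrsets / changetype REPLACE shape of the PowerDNS Authoritative HTTP API. canonicalPatch and the type names are hypothetical, not the allocator's real code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type record struct {
	Content  string `json:"content"`
	Disabled bool   `json:"disabled"`
}

type rrset struct {
	Name       string   `json:"name"`
	Type       string   `json:"type"`
	TTL        int      `json:"ttl"`
	ChangeType string   `json:"changetype"`
	Records    []record `json:"records"`
}

type patchBody struct {
	RRSets []rrset `json:"rrsets"`
}

// canonicalPrefixes lists the six names from the commit message; "" is the apex.
var canonicalPrefixes = []string{"", "*", "console", "api", "gitea", "harbor"}

// canonicalPatch builds one PATCH body so all six RRsets are sent together.
func canonicalPatch(zone, ipv4 string) patchBody {
	zone = strings.TrimSuffix(zone, ".") + "." // PowerDNS expects absolute names
	body := patchBody{}
	for _, prefix := range canonicalPrefixes {
		name := zone
		if prefix != "" {
			name = prefix + "." + zone
		}
		body.RRSets = append(body.RRSets, rrset{
			Name:       name,
			Type:       "A",
			TTL:        300,
			ChangeType: "REPLACE",
			Records:    []record{{Content: ipv4}},
		})
	}
	return body
}

func main() {
	out, err := json.MarshalIndent(canonicalPatch("acme.omani.works", "203.0.113.10"), "", "  ")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(out))
}
```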

Phase 3 — startup parent-zone bootstrap
  - BootstrapParentZones runs at PDM startup before HTTP serves
  - EnsureZone for every entry in DYNADOT_MANAGED_DOMAINS
  - DNSSEC enabled on each parent zone (idempotent)
  - PDM exits non-zero if bootstrap fails

Phase 4 — schema unchanged
  - child zone name derived as <subdomain>.<poolDomain>, no new column
  - existing pool_allocations table works as-is

Phase 5 — dynadot package trimmed
  - removed AddSovereignRecords / DeleteSubdomainRecords / AddRecord /
    getZone / writeZone (Dynadot DNS write code)
  - kept IsManagedDomain / ManagedDomains / ResetManagedDomains /
    ErrUnmanagedDomain (config-resolution helpers)
  - registrar adapter at internal/registrar/dynadot/ untouched (handles
    BYO Flow B NS-delegation via #170)

Phase 6 — env-var contract
  PDM_PDNS_BASE_URL, PDM_PDNS_API_KEY, PDM_PDNS_SERVER_ID, PDM_NAMESERVERS
  all runtime-configurable per docs/INVIOLABLE-PRINCIPLES.md #4.

Quality bar (all met):
  - DNSSEC enabled on every child zone (mandatory per spec)
  - parent NS delegation TTL 3600, child A-record TTL 300
  - retry-once-on-5xx with exponential backoff in pdns client
  - all credentials flow from env vars sourced from K8s Secrets
  - no hardcoded URLs, regions, or NS endpoints

Closes openova#168 (DNS-side; private-repo manifest update lands separately).
2026-04-29 08:36:45 +02:00
hatiyildiz
567d7e1f60 feat(pdm): registrar adapters for Cloudflare, Namecheap, GoDaddy, OVH, Dynadot (#170)
Adds the BYO Flow B (#166) registrar-flip seam: PDM now exposes a
provider-agnostic Registrar interface and 5 adapter implementations
plus a new HTTP endpoint that dispatches to them.

Wire surface
- POST /api/v1/registrar/{registrar}/set-ns
  Body: {"domain":"...","token":"...","nameservers":["..."]}
  Reply: {"success":true,"registrar":"...","domain":"...",
          "nameservers":["..."],"propagation":"..."}
- GET /healthz now lists the wired-in registrar names.

Interface (internal/registrar/registrar.go)
- Name(), ValidateToken, SetNameservers, GetNameservers
- Typed errors: ErrInvalidToken, ErrRateLimited, ErrDomainNotInAccount,
  ErrAPIUnavailable, ErrUnsupportedRegistrar
- Registry map[string]Registrar with Lookup + Names helpers
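
A trimmed-down sketch of the registry shape described above. Only Name() is included because the commit message does not give the signatures of ValidateToken, SetNameservers, or GetNameservers. The sorted output of Names and the stub adapter type are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ErrUnsupportedRegistrar is named after the typed error listed above.
var ErrUnsupportedRegistrar = errors.New("unsupported registrar")

// Registrar is reduced to Name() here; the other methods are omitted.
type Registrar interface {
	Name() string
}

// Registry maps a registrar name to its adapter.
type Registry map[string]Registrar

// Lookup returns the adapter for name or wraps ErrUnsupportedRegistrar.
func (r Registry) Lookup(name string) (Registrar, error) {
	adapter, ok := r[name]
	if !ok {
		return nil, fmt.Errorf("%w: %q", ErrUnsupportedRegistrar, name)
	}
	return adapter, nil
}

// Names returns the registered names in sorted order (an assumption),
// which keeps output such as a /healthz listing stable.
func (r Registry) Names() []string {
	names := make([]string, 0, len(r))
	for name := range r {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// stub is a hypothetical adapter used only for this example.
type stub string

func (s stub) Name() string { return string(s) }

func main() {
	reg := Registry{"dynadot": stub("dynadot"), "cloudflare": stub("cloudflare")}
	fmt.Println(reg.Names())
	_, err := reg.Lookup("example-registrar")
	fmt.Println(errors.Is(err, ErrUnsupportedRegistrar))
}
```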

Adapters
- internal/registrar/cloudflare/  — API v4 with Bearer token; verifies
  via /user/tokens/verify, looks up zone by name, PATCHes name_servers
- internal/registrar/namecheap/   — XML API; ApiUser+ApiKey+UserName+
  ClientIp auth; getBalances probe + getList domain check; setCustom
  for write. IP-whitelisting requirement documented in source comments
- internal/registrar/godaddy/     — v1 API with sso-key auth;
  GET /v1/domains list + PATCH /v1/domains/{d} with nameServers body
- internal/registrar/ovh/         — request signing (HMAC-SHA1 over
  appSecret+consumerKey+method+url+body+timestamp); GET /domain probe;
  POST /domain/{d}/nameServers/update for write; GET .../nameServer[/{id}]
  for read
- internal/registrar/dynadot/     — api3.json with key+secret as colon-
  separated token; uses set_ns + domain_info commands. Distinct from
  the existing internal/dynadot package which is the DNS-record writer
  for OpenOva-managed pool domains (different concern: pool DNS vs.
  customer-domain registrar NS-flip)

Token hygiene (per docs/INVIOLABLE-PRINCIPLES.md #10)
- Tokens never persisted: in-memory only for the duration of the call
- Never logged: handler uses classifyOutcome to render redacted
  outcome labels, never the raw error message or token
- Never echoed in responses
- TestSetNSResponseDoesNotEchoToken + TestSetNSHappy assert no token
  bytes appear in JSON body or zerolog/slog output

Tests
- 74 new unit tests (httptest server per adapter):
  cloudflare 11, dynadot 11, godaddy 11, namecheap 13, ovh 12,
  handler 14, registrar interface 2
- Each adapter covers: happy path, bad-token, rate-limited (429),
  bad-domain (404 / not-in-account), empty-NS guard, name+default
- OVH signature math verified deterministically via injected nowFn

Acceptance (issue #170)
- All 5 adapters pass their unit tests
- PDM /api/v1/registrar/{r}/set-ns endpoint live
- Wired into cmd/pdm/main.go: every adapter registered at startup

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), each adapter's
BaseURL is constructor-default + struct-overridable, so tests inject
httptest endpoints without environment shenanigans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 07:46:30 +02:00
hatiyildiz
585b046f5d feat(pdm): pool-domain-manager service skeleton (Phase 1 of #163)
Build a new Go service core/pool-domain-manager that becomes the SOLE
authority for OpenOva-pool subdomain allocation across the fleet.

Why this exists: today products/catalyst/bootstrap/api/internal/handler/
subdomains.go does a naive net.LookupHost() call to decide whether a candidate
subdomain is taken. Dynadot's wildcard parking record at the apex of
omani.works (and any future pool domain) makes EVERY subdomain resolve
to 185.53.179.128, so the check rejects everything. DNS is the wrong
source of truth for an OpenOva-managed pool — the central control plane
must own the allocation table.

What this commit adds (no integration with catalyst-api yet — that lands
in a follow-up commit):

  core/pool-domain-manager/
    cmd/pdm/main.go                     chi router, healthz, sweeper boot
    api/openapi.yaml                     wire contract for every endpoint
    Containerfile                        alpine final stage, UID 65534
    internal/store/                      pgx + CNPG; pool_allocations table
      migrations.sql                       idempotent CREATE TABLE schema
      store.go                             Reserve/Get/Commit/Release/List
      store_test.go                        integration tests (PDM_TEST_DSN)
    internal/dynadot/                    moved + extended; SOLE Dynadot caller
      dynadot.go                           AddRecord, AddSovereignRecords,
                                           DeleteSubdomainRecords (read-modify-
                                           write to honour feedback_dynadot_dns)
      dynadot_test.go                      managed-domain resolution tests
    internal/reserved/                   centralised reserved-name list
      reserved.go                          IsReserved/All; pulled out of
                                           catalyst-api's subdomains.go
    internal/handler/                    HTTP surface
      handler.go                           /api/v1/pool/{domain}/{check,reserve,
                                           commit,release,list}, /healthz,
                                           /api/v1/reserved
    internal/allocator/                  state machine + sweeper goroutine

Architecture choices and how they map to docs/INVIOLABLE-PRINCIPLES.md:

  - Principle #4 (never hardcode): every value (PORT, PDM_DATABASE_URL,
    DYNADOT_MANAGED_DOMAINS, PDM_RESERVATION_TTL, PDM_SWEEPER_INTERVAL)
    flows from env vars; the K8s ExternalSecret will populate them at
    deploy time. The reserved-subdomain list lives in ONE place
    (internal/reserved); catalyst-api will not duplicate it.

  - Principle #2 (no quality compromise): the state machine commits the
    DB row before the Dynadot side-effect, so a crash between the two
    leaves the system in a recoverable state (operator runs Release).
    The reservation_token in the row protects against stale-tab commit
    races. UPSERT semantics + a CHECK constraint mean two operators
    racing /reserve get a clean 23505 (unique_violation) → HTTP 409.

  - Principle #3 (follow architecture): PDM is a ClusterIP service in
    openova-system — it is not a Crossplane provider, not a Flux
    HelmRelease, not bespoke OpenTofu state. catalyst-api speaks to it
    via plain HTTP. The Crossplane Composition that wraps PDM as a
    declarative MR (XDynadotPoolAllocation) lands in a follow-up phase.

The DNS-wildcard problem the issue describes is fixed STRUCTURALLY here:
PDM never calls net.LookupHost. The /check path is a single SELECT
against pool_allocations. omani.works's wildcard A record at the apex
becomes architecturally irrelevant.

Tests exercised in this commit:
  - internal/reserved: full unit coverage (case-insensitive, sorted, set
    membership)
  - internal/dynadot: managed-domain runtime resolution (env-var,
    legacy single-domain fallback, built-in defaults, list parsing)
  - internal/store: integration suite gated on PDM_TEST_DSN env var,
    covers reserve happy-path, reserve race (ErrConflict), TTL expiry
    frees, commit happy-path, commit token mismatch, release removes
    row, sweeper deletes expired rows

Closes phase 1 of #163. Phase 2 (catalyst-api wiring), Phase 3 (CI +
manifests), Phase 4 (Crossplane composition), Phase 6 (deploy +
verification curl) follow in separate commits.

Refs: #163
2026-04-29 06:37:38 +02:00