feat(catalyst-api): wire pool-domain-manager into the wizard lifecycle (Phase 2 of #163)

The wizard's StepDomain debounced check, the deployment-creation reserve,
the post-tofu-apply commit, and the on-failure release now all flow
through the pool-domain-manager service that landed in the previous
commit. The DNS-wildcard regression at omani.works (where every
subdomain resolved to 185.53.179.128 because of the apex parking record
and broke the LookupHost-based check) is now FIXED STRUCTURALLY:

  - Managed pools: route through PDM, which has zero DNS dependency.
  - BYO domains:   keep the legacy LookupHost path because the customer
                   owns the zone — that nameserver IS the source of truth.

Files changed:

  internal/pdm/client.go (new)
    Tiny HTTP client for PDM (Check, Reserve, Commit, Release) plus a
    package-level IsManagedDomain runtime resolver that mirrors the legacy
    catalyst-api dynadot.IsManagedDomain semantics WITHOUT importing the
    dynadot package. The DYNADOT_MANAGED_DOMAINS env var is the contract;
    PDM is the writer of any actual Dynadot side-effect.

  internal/handler/handler.go
    New(...) reads POOL_DOMAIN_MANAGER_URL from env (default = in-cluster
    service FQDN). NewWithPDM(client) is exposed for tests so a fake can
    be injected without spinning up a real HTTP server. Per
    docs/INVIOLABLE-PRINCIPLES.md #4 the URL is configuration, not code.

  internal/handler/subdomains.go (rewritten)
    Removed: net.LookupHost on '<sub>.<pool>' for managed pools. Removed:
    duplicate reservedSubdomains map (lives ONLY in PDM now). Added:
    h.checkManagedPool() that delegates to PDM and surfaces PDM's
    Available/Reason/Detail verbatim. Added: h.checkBYO() that keeps the
    legacy DNS path for non-managed domains. Defence in depth: when PDM
    URL is misconfigured the handler returns reason='pdm-unavailable'
    rather than silently falling back to DNS (which would resurrect the
    wildcard bug).

  internal/handler/deployments.go
    CreateDeployment now reserves the pool subdomain via PDM BEFORE
    launching the runProvisioning goroutine, captures the
    reservation_token onto the Deployment struct, and returns 409 on
    PDM ErrConflict so the wizard's StepReview can surface the race
    cleanly. runProvisioning issues PDM /commit on success (with the
    LB IP) or /release on failure. PDM owns the eventual Dynadot write —
    catalyst-api never calls api.dynadot.com directly for the wizard's
    lifecycle after this lands.

  internal/handler/{subdomains,deployments}_test.go (new)
    Subdomains: prove (a) managed pool delegates to PDM and surfaces
    PDM's response verbatim, (b) DNS-wildcard parking records cannot
    cause Available=false for any random subdomain (regression guard
    for #163), (c) PDM returns active-state → handler returns
    Available=false with the right reason, (d) BYO falls back to DNS
    correctly, (e) invalid label short-circuits before PDM is called,
    (f) PDM unavailable surfaces 'pdm-unavailable' rather than
    silently succeeding.
    Deployments: prove (a) managed pool reserves via PDM exactly once,
    (b) PDM 409 conflict on reserve blocks the deployment with HTTP
    409, (c) BYO mode does NOT consult PDM.
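
As a rough illustration of the IsManagedDomain resolver described for
internal/pdm/client.go above, the sketch below parses a comma-separated
DYNADOT_MANAGED_DOMAINS value into a lookup set. It is not the code from
this commit: parseManagedDomains and isManaged are hypothetical names, and
the case, whitespace, and trailing-dot normalisation is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseManagedDomains turns a comma-separated list (the format the tests
// pass via DYNADOT_MANAGED_DOMAINS) into a lookup set.
// Hypothetical helper; the real resolver may normalise differently.
func parseManagedDomains(raw string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, part := range strings.Split(raw, ",") {
		d := strings.TrimSuffix(strings.ToLower(strings.TrimSpace(part)), ".")
		if d != "" {
			set[d] = struct{}{}
		}
	}
	return set
}

// isManaged reports whether domain is in the set, ignoring case,
// surrounding whitespace, and a trailing root dot (assumed behaviour).
func isManaged(set map[string]struct{}, domain string) bool {
	d := strings.TrimSuffix(strings.ToLower(strings.TrimSpace(domain)), ".")
	_, ok := set[d]
	return ok
}

func main() {
	set := parseManagedDomains("omani.works, openova.io")
	fmt.Println("omani.works managed:", isManaged(set, "omani.works"))
	fmt.Println("acme.io managed:    ", isManaged(set, "acme.io"))
}
```

Keeping the parser free of os.Getenv makes it trivially testable; a real
resolver would read the env var once and cache the resulting set.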

Architectural compliance:

  - Principle #4 (never hardcode): every URL/domain/region is runtime
    configuration. POOL_DOMAIN_MANAGER_URL has a sane default so the
    common case 'just works' but is overridable for air-gap installs.
  - Principle #2 (no quality compromise): the PDM lifecycle is the
    target-state shape. Reserve before tofu apply guarantees a name
    can't be double-allocated by a parallel wizard tab. Commit AFTER
    tofu apply guarantees we don't write DNS for a Sovereign that
    doesn't exist yet.
  - Lesson #24 (don't bypass off-the-shelf primitives): the catalyst-api
    no longer carries its own copy of the reserved-name list, no longer
    calls Dynadot directly for the lifecycle, and no longer does
    DNS-based availability checks for managed pools. PDM IS the
    off-the-shelf primitive for this concern; we use it.

Refs: #163
hatiyildiz 2026-04-29 06:44:22 +02:00
parent 585b046f5d
commit 01183cb44c
6 changed files with 906 additions and 84 deletions


@ -16,6 +16,7 @@ import (
"crypto/rand"
"encoding/hex"
"encoding/json"
"errors"
"fmt"
"net/http"
"sync"
@ -23,6 +24,7 @@ import (
"github.com/go-chi/chi/v5"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/pdm"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/provisioner"
)
@ -37,6 +39,17 @@ type Deployment struct {
FinishedAt time.Time
Events chan provisioner.Event
mu sync.Mutex
// PDM reservation captured before `tofu apply` for managed-pool
// deployments. The reservationToken is held until `tofu apply`
// returns the LB IP, at which point we POST it to PDM /commit. On
// `tofu destroy` (or a phase-0 retry that decides to abandon) we
// DELETE /release.
//
// Empty for BYO deployments — those keep their own DNS off-platform.
pdmReservationToken string
pdmPoolDomain string
pdmSubdomain string
}
// State returns a JSON-safe snapshot for the GET endpoint.
@ -92,6 +105,47 @@ func (h *Handler) CreateDeployment(w http.ResponseWriter, r *http.Request) {
StartedAt: time.Now(),
Events: make(chan provisioner.Event, 256),
}
// Reserve the pool subdomain via PDM BEFORE we kick off `tofu apply`.
// PDM holds the name with a TTL: if `tofu apply` fails or this
// catalyst-api Pod crashes, the TTL expires and the name is freed automatically.
// On the success path the runProvisioning goroutine calls /commit with
// the LB IP, which flips the reservation to ACTIVE and writes the
// Dynadot DNS records.
//
// For BYO deployments (the customer owns the DNS zone) we skip PDM
// entirely. The customer points their own A record at the LB IP shown
// on the success screen.
if req.SovereignDomainMode == "pool" && pdm.IsManagedDomain(req.SovereignPoolDomain) {
if h.pdm == nil {
writeJSON(w, http.StatusInternalServerError, map[string]string{
"error": "pool-domain-manager client is not configured (POOL_DOMAIN_MANAGER_URL)",
})
return
}
reserveCtx, reserveCancel := context.WithTimeout(r.Context(), 10*time.Second)
reservation, reserveErr := h.pdm.Reserve(reserveCtx, req.SovereignPoolDomain, req.SovereignSubdomain, "catalyst-api/deployment-"+id)
reserveCancel()
if reserveErr != nil {
if errors.Is(reserveErr, pdm.ErrConflict) {
writeJSON(w, http.StatusConflict, map[string]string{
"error": "subdomain-conflict",
"detail": "this subdomain has been reserved or activated for the chosen pool — pick a different name",
})
return
}
h.log.Error("pdm reserve failed", "id", id, "err", reserveErr)
writeJSON(w, http.StatusServiceUnavailable, map[string]string{
"error": "pdm-unavailable",
"detail": "pool-domain-manager is temporarily unreachable: " + reserveErr.Error(),
})
return
}
dep.pdmReservationToken = reservation.ReservationToken
dep.pdmPoolDomain = reservation.PoolDomain
dep.pdmSubdomain = reservation.Subdomain
}
h.deployments.Store(id, dep)
// Capture status before launching the goroutine — runProvisioning races
@ -181,6 +235,48 @@ func (h *Handler) runProvisioning(dep *Deployment) {
)
}
dep.mu.Unlock()
// PDM lifecycle: on success, /commit with the LB IP; on failure, /release
// so the reservation TTL doesn't have to expire to free the name. PDM is
// the single owner of the Dynadot side-effect (it is also responsible for
// AddSovereignRecords on commit; catalyst-api never writes DNS itself).
if dep.pdmReservationToken != "" && h.pdm != nil {
pdmCtx, pdmCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer pdmCancel()
if err == nil && result != nil {
commitErr := h.pdm.Commit(pdmCtx, dep.pdmPoolDomain, pdm.CommitInput{
Subdomain: dep.pdmSubdomain,
ReservationToken: dep.pdmReservationToken,
SovereignFQDN: result.SovereignFQDN,
LoadBalancerIP: result.LoadBalancerIP,
})
if commitErr != nil {
h.log.Error("pdm commit failed; sovereign is live but DNS records may be stale",
"id", dep.ID,
"poolDomain", dep.pdmPoolDomain,
"subdomain", dep.pdmSubdomain,
"err", commitErr,
)
} else {
h.log.Info("pdm commit complete",
"id", dep.ID,
"poolDomain", dep.pdmPoolDomain,
"subdomain", dep.pdmSubdomain,
"loadBalancerIP", result.LoadBalancerIP,
)
}
} else {
releaseErr := h.pdm.Release(pdmCtx, dep.pdmPoolDomain, dep.pdmSubdomain)
if releaseErr != nil && !errors.Is(releaseErr, pdm.ErrNotFound) {
h.log.Error("pdm release failed; reservation will expire on TTL",
"id", dep.ID,
"poolDomain", dep.pdmPoolDomain,
"subdomain", dep.pdmSubdomain,
"err", releaseErr,
)
}
}
}
}
func newID() string {


@ -0,0 +1,122 @@
// Tests for the catalyst-api → PDM lifecycle: reserve before tofu apply,
// commit on success, release on failure. These cover the deployment-level
// path #163 introduced — the wizard creates a deployment, PDM holds the
// reservation while tofu runs, and PDM owns the eventual DNS write.
package handler
import (
"bytes"
"context"
"encoding/json"
"log/slog"
"net/http"
"net/http/httptest"
"testing"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/pdm"
)
func TestCreateDeployment_ManagedPoolReservesViaPDM(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works")
pdm.ResetManagedDomains()
fake := &fakePDM{}
h := NewWithPDM(slog.Default(), fake)
body, _ := json.Marshal(map[string]any{
"sovereignFQDN": "omantel.omani.works",
"sovereignDomainMode": "pool",
"sovereignPoolDomain": "omani.works",
"sovereignSubdomain": "omantel",
"hetznerToken": "tok",
"hetznerProjectID": "proj",
"region": "fsn1",
"orgName": "Omantel",
"orgEmail": "ops@omantel.om",
"sshPublicKey": "ssh-ed25519 AAAA test",
})
w := httptest.NewRecorder()
r := httptest.NewRequest(http.MethodPost, "/api/v1/deployments", bytes.NewReader(body))
h.CreateDeployment(w, r)
// 201: deployment row created. runProvisioning is launched in a
// background goroutine; in this unit test it fails at tofu exec (tofu is
// not installed), but here we only care that CreateDeployment reserved
// before launching it.
if w.Code != http.StatusCreated {
t.Fatalf("status=%d body=%s", w.Code, w.Body.String())
}
if len(fake.reserves) != 1 {
t.Fatalf("expected 1 PDM reserve, got %d", len(fake.reserves))
}
if fake.reserves[0].pool != "omani.works" || fake.reserves[0].sub != "omantel" {
t.Errorf("reserve called with wrong args: %+v", fake.reserves[0])
}
}
func TestCreateDeployment_PDMConflictBlocksDeployment(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works")
pdm.ResetManagedDomains()
fake := &fakePDM{
reserve: func(ctx context.Context, pool, sub, by string) (*pdm.Reservation, error) {
return nil, pdm.ErrConflict
},
}
h := NewWithPDM(slog.Default(), fake)
body, _ := json.Marshal(map[string]any{
"sovereignFQDN": "omantel.omani.works",
"sovereignDomainMode": "pool",
"sovereignPoolDomain": "omani.works",
"sovereignSubdomain": "omantel",
"hetznerToken": "tok",
"hetznerProjectID": "proj",
"region": "fsn1",
"orgName": "Omantel",
"orgEmail": "ops@omantel.om",
"sshPublicKey": "ssh-ed25519 AAAA test",
})
w := httptest.NewRecorder()
r := httptest.NewRequest(http.MethodPost, "/api/v1/deployments", bytes.NewReader(body))
h.CreateDeployment(w, r)
if w.Code != http.StatusConflict {
t.Fatalf("status=%d want 409 (subdomain-conflict), body=%s", w.Code, w.Body.String())
}
}
func TestCreateDeployment_BYODoesNotReserve(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works")
pdm.ResetManagedDomains()
fake := &fakePDM{}
h := NewWithPDM(slog.Default(), fake)
body, _ := json.Marshal(map[string]any{
"sovereignFQDN": "k8s.acme.io",
"sovereignDomainMode": "byo",
"sovereignPoolDomain": "acme.io",
"sovereignSubdomain": "k8s",
"hetznerToken": "tok",
"hetznerProjectID": "proj",
"region": "fsn1",
"orgName": "Acme",
"orgEmail": "ops@acme.io",
"sshPublicKey": "ssh-ed25519 AAAA test",
})
w := httptest.NewRecorder()
r := httptest.NewRequest(http.MethodPost, "/api/v1/deployments", bytes.NewReader(body))
h.CreateDeployment(w, r)
if w.Code != http.StatusCreated {
t.Fatalf("status=%d body=%s", w.Code, w.Body.String())
}
// BYO must NOT consult PDM — the customer owns DNS.
if len(fake.reserves) != 0 {
t.Errorf("BYO reserved via PDM unexpectedly: %+v", fake.reserves)
}
}


@ -1,3 +1,4 @@
// Package handler holds shared state for all HTTP handlers.
package handler
import (
@ -6,28 +7,62 @@ import (
"net/http"
"os"
"sync"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/pdm"
)
// Handler holds shared state for all HTTP handlers.
//
// dynadotAPIKey + dynadotAPISecret are read from environment variables that
// are mounted from the dynadot-api-credentials K8s secret in the
// openova-system namespace via ESO at deploy time. They are injected into
// pool-domain ProvisionRequests so the provisioner can write DNS records
// for *.{subdomain}.{pool-domain}.
// dynadotAPIKey + dynadotAPISecret remain on the Handler so the OpenTofu
// module's `dynadot_*` variables can still receive credentials for the
// Phase-0 DNS bootstrap that runs at first `tofu apply` time. After #163
// Phase 4 lands the Crossplane Composition that wraps PDM as a declarative
// MR, even those fields go away (PDM holds the credentials; catalyst-api
// merely calls PDM via the in-cluster service FQDN).
//
// pdm is the central authority for OpenOva-pool subdomain allocation
// (introduced by #163). catalyst-api never calls api.dynadot.com directly
// for the availability check / reservation lifecycle after this lands —
// every interaction with the Dynadot zone flows through PDM.
type Handler struct {
log *slog.Logger
deployments sync.Map // map[string]*Deployment
dynadotAPIKey string
dynadotAPISecret string
// pdm — pool-domain-manager client. Required in production; tests can
// inject a fake via NewWithPDM. The default URL points at the in-cluster
// service FQDN so a stock Catalyst-Zero deployment "just works" without
// per-pod configuration.
pdm pdmClient
}
// New creates a Handler.
// New creates a Handler with the runtime configuration loaded from env.
//
// POOL_DOMAIN_MANAGER_URL — defaults to the in-cluster service FQDN. Per
// docs/INVIOLABLE-PRINCIPLES.md #4 the URL is configuration, not code; an
// air-gapped install can override it to point at the operator's own
// PDM endpoint.
func New(log *slog.Logger) *Handler {
pdmURL := os.Getenv("POOL_DOMAIN_MANAGER_URL")
if pdmURL == "" {
pdmURL = "http://pool-domain-manager.openova-system.svc.cluster.local:8080"
}
return &Handler{
log: log,
dynadotAPIKey: os.Getenv("DYNADOT_API_KEY"),
dynadotAPISecret: os.Getenv("DYNADOT_API_SECRET"),
pdm: pdm.New(pdmURL),
}
}
// NewWithPDM is exposed for tests; production code uses New.
func NewWithPDM(log *slog.Logger, client pdmClient) *Handler {
return &Handler{
log: log,
dynadotAPIKey: os.Getenv("DYNADOT_API_KEY"),
dynadotAPISecret: os.Getenv("DYNADOT_API_SECRET"),
pdm: client,
}
}


@ -1,39 +1,34 @@
// Package handler — subdomains.go: pre-submit availability check.
//
// Closes #124 ([I] ux: error handling — what happens if subdomain already
// taken). The wizard's StepOrg debounces keystrokes and POSTs the
// candidate subdomain + pool-domain pair here BEFORE the user clicks
// Next, so the validator catches collisions early instead of failing
// at provisioning time when Dynadot rejects the duplicate record.
// Closes the DNS-wildcard regression in #163 by routing every check for an
// OpenOva-managed pool domain through pool-domain-manager (PDM). PDM is the
// authoritative allocation source — it does not consult DNS at all, so the
// Dynadot wildcard parking record at the apex of omani.works (which made
// EVERY subdomain resolve to 185.53.179.128 and broke the previous
// LookupHost-based check) is now architecturally irrelevant for managed
// pools.
//
// How "taken" is determined:
// Decision tree per request:
//
// 1. Pool-domain check — only OpenOva-managed pool domains are
// candidates for this endpoint; BYO domains are the customer's
// responsibility. We reject any pool the wizard doesn't recognise
// (defence-in-depth — the wizard already filters its own dropdown,
// but the handler must validate independently).
// 1. Validate the subdomain as an RFC 1035 label (cheap, local).
// 2. If poolDomain is in the runtime DYNADOT_MANAGED_DOMAINS list →
// delegate to PDM via Client.Check. PDM owns the reserved-name list
// and the allocation table; we just surface its response verbatim.
// 3. Otherwise the caller is asking about a BYO domain (a customer's own
// DNS zone) — fall back to a DNS-based check via net.LookupHost. PDM
// doesn't manage BYO zones; the customer's nameserver IS the source
// of truth there.
//
// 2. Reserved-name check — short list of RFC 1035 / OpenOva
// Sovereign-control-plane subdomains we never let a tenant claim
// (api, admin, console, gitea, harbor, www, mail). Tenants get
// *those* names automatically as part of the Sovereign FQDN
// structure once they pick their root subdomain.
// Per docs/INVIOLABLE-PRINCIPLES.md #4: PDM's URL is read from the
// POOL_DOMAIN_MANAGER_URL env var (default = in-cluster service FQDN). The
// reserved-name list lives ONLY in PDM after this commit — catalyst-api no
// longer maintains a copy.
//
// 3. DNS resolution — net.DefaultResolver.LookupHost on
// "<subdomain>.<pool>" with a 2-second timeout. If anything
// resolves, the name is considered taken (whether it's an A,
// AAAA, or CNAME, the global DNS already knows about it).
//
// Per docs/INVIOLABLE-PRINCIPLES.md #4 ("never hardcode") the pool
// list is shared with the package-level IsManagedDomain check in
// internal/dynadot/. The reserved-name list is centralised here.
//
// Per the auto-memory `feedback_dynadot_dns.md`: NEVER run exploratory
// set_dns2 calls. We deliberately do NOT call Dynadot's API for the
// availability check — Dynadot's API is write-only-safe. The global
// DNS resolver is the eventually-consistent source of truth for what
// names already point somewhere.
// Per Lesson #24 in docs/INVIOLABLE-PRINCIPLES.md: this is a STRUCTURAL fix,
// not a bandaid. The previous DNS-based path is REMOVED for managed pools,
// not augmented. The only remaining net.LookupHost call lives in the BYO
// branch — and it is the right tool there because BYO zones are owned by
// the customer, not by OpenOva.
package handler
import (
@ -45,37 +40,9 @@ import (
"strings"
"time"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/dynadot"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/pdm"
)
// reservedSubdomains — names we never let a tenant claim as their
// Sovereign root subdomain. Tenants get *.omantel.omani.works style
// records automatically; the wizard prevents claiming any name that
// would collide with the canonical control-plane sub-records.
var reservedSubdomains = map[string]struct{}{
"api": {},
"admin": {},
"console": {},
"gitea": {},
"harbor": {},
"keycloak": {},
"www": {},
"mail": {},
"smtp": {},
"imap": {},
"vpn": {},
"openova": {},
"catalyst": {},
"docs": {},
"status": {},
"app": {},
"system": {},
"openbao": {},
"vault": {},
"flux": {},
"k8s": {},
}
type subdomainCheckRequest struct {
Subdomain string `json:"subdomain"`
// PoolDomain — the apex pool domain (e.g. "omani.works"), NOT the
@ -88,14 +55,19 @@ type subdomainCheckRequest struct {
//
// available=true → subdomain is free, user can submit.
// available=false → subdomain is taken; reason explains why.
// (no error field) → backend reached resolver / pool list cleanly.
//
// reason values:
// reason values (managed pools mirror PDM verbatim, BYO uses local strings):
// "invalid-format" subdomain is not a valid RFC 1035 label
// "unsupported-pool" poolDomain is not an OpenOva-managed pool
// "reserved" subdomain is in reservedSubdomains
// "exists" DNS resolver returned at least one record
// "lookup-error" DNS lookup itself failed (transient — user retries)
// "unsupported-pool" poolDomain is not an OpenOva-managed pool (PDM)
// — only surfaced for the BYO path's sanity check;
// managed-pool requests delegate to PDM which owns
// this verdict.
// "reserved" subdomain is in PDM's reserved list (managed)
// "reserved-state" PDM holds a non-expired reservation (managed)
// "active-state" PDM has an active allocation (managed)
// "exists" BYO DNS resolver returned at least one record
// "lookup-error" BYO DNS lookup itself failed (transient)
// "pdm-unavailable" PDM call failed — wizard treats as transient
type SubdomainCheckResponse struct {
Available bool `json:"available"`
Reason string `json:"reason,omitempty"`
@ -137,36 +109,71 @@ func (h *Handler) CheckSubdomain(w http.ResponseWriter, r *http.Request) {
return
}
if !dynadot.IsManagedDomain(pool) {
// Managed pools — PDM is the authoritative source of truth.
if pdm.IsManagedDomain(pool) {
h.checkManagedPool(w, r.Context(), pool, sub)
return
}
// BYO domain — fall back to the legacy DNS-based check. The customer
// owns the zone; resolving the name is the only signal we have.
h.checkBYO(w, r.Context(), pool, sub)
}
// checkManagedPool delegates to PDM. We surface PDM's response verbatim
// (available, reason, detail, fqdn) so the wizard can render PDM's
// authoritative messages without an extra mapping layer.
func (h *Handler) checkManagedPool(w http.ResponseWriter, ctx context.Context, pool, sub string) {
if h.pdm == nil {
// Defence-in-depth: if the deployment forgot POOL_DOMAIN_MANAGER_URL,
// surface a transient error rather than silently falling back to DNS
// (which would resurrect the wildcard-parking bug this file exists
// to fix).
writeJSON(w, http.StatusOK, SubdomainCheckResponse{
Available: false,
Reason: "unsupported-pool",
Detail: "pool domain " + pool + " is not managed by OpenOva — pick a different pool or use BYO",
Reason: "pdm-unavailable",
Detail: "pool-domain-manager client is not configured — operator must set POOL_DOMAIN_MANAGER_URL",
FQDN: sub + "." + pool,
})
return
}
if _, taken := reservedSubdomains[sub]; taken {
pdmCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
res, err := h.pdm.Check(pdmCtx, pool, sub)
if err != nil {
h.log.Error("pdm check failed", "pool", pool, "sub", sub, "err", err)
writeJSON(w, http.StatusOK, SubdomainCheckResponse{
Available: false,
Reason: "reserved",
Detail: "this subdomain is reserved for the Sovereign control plane — pick a different name",
Reason: "pdm-unavailable",
Detail: "pool-domain-manager is temporarily unreachable — try again",
FQDN: sub + "." + pool,
})
return
}
writeJSON(w, http.StatusOK, SubdomainCheckResponse{
Available: res.Available,
Reason: res.Reason,
Detail: res.Detail,
FQDN: res.FQDN,
})
}
// checkBYO performs the DNS-based availability check for customer-owned
// (Bring-Your-Own) domains. PDM doesn't manage BYO zones — the customer's
// nameserver is the source of truth — so net.LookupHost is the right
// primitive here.
func (h *Handler) checkBYO(w http.ResponseWriter, ctx context.Context, pool, sub string) {
fqdn := sub + "." + pool
// Two-second timeout — long enough for global DNS but short enough
// that the wizard's debounced keystroke loop stays responsive.
ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
dnsCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
defer cancel()
addrs, err := net.DefaultResolver.LookupHost(ctx, fqdn)
addrs, err := net.DefaultResolver.LookupHost(dnsCtx, fqdn)
if err != nil {
// NXDOMAIN is "not taken" — the most common, success case. Any
// other error class (timeout, server-fail) is a transient lookup
// problem the wizard surfaces but doesn't treat as taken.
if isNXDomain(err) {
writeJSON(w, http.StatusOK, SubdomainCheckResponse{
Available: true,
@ -182,7 +189,6 @@ func (h *Handler) CheckSubdomain(w http.ResponseWriter, r *http.Request) {
})
return
}
if len(addrs) == 0 {
writeJSON(w, http.StatusOK, SubdomainCheckResponse{
Available: true,
@ -190,7 +196,6 @@ func (h *Handler) CheckSubdomain(w http.ResponseWriter, r *http.Request) {
})
return
}
writeJSON(w, http.StatusOK, SubdomainCheckResponse{
Available: false,
Reason: "exists",
@ -235,3 +240,12 @@ func isNXDomain(err error) bool {
}
return false
}
// pdmClient is implemented by *pdm.Client. The interface lets us pass a
// fake in tests without wiring a real HTTP server.
type pdmClient interface {
Check(ctx context.Context, poolDomain, subdomain string) (*pdm.CheckResult, error)
Reserve(ctx context.Context, poolDomain, subdomain, createdBy string) (*pdm.Reservation, error)
Commit(ctx context.Context, poolDomain string, in pdm.CommitInput) error
Release(ctx context.Context, poolDomain, subdomain string) error
}


@ -0,0 +1,260 @@
// Tests for subdomains.go — the catalyst-api side of the PDM contract.
// These cover three architectural invariants:
//
// 1. Managed pools NEVER call net.LookupHost. The DNS-wildcard parking
// record at omani.works (which previously made every subdomain
// resolve to 185.53.179.128) cannot cause a false positive when
// the pool is in DYNADOT_MANAGED_DOMAINS — PDM is the single source
// of truth.
// 2. BYO domains use net.LookupHost — the customer's nameserver is
// authoritative; PDM doesn't manage their zone.
// 3. The PDM client is consulted exactly once per managed-pool request,
// with the response surfaced verbatim.
//
// These guarantees are what prevent the regression #163 was opened for.
package handler
import (
"context"
"encoding/json"
"errors"
"io"
"log/slog"
"net/http"
"net/http/httptest"
"strings"
"testing"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/pdm"
)
// fakePDM is a stub pdmClient that records every call. We assert against the
// recorded calls to prove the behaviour the architecture requires.
type fakePDM struct {
checks []checkCall
check func(ctx context.Context, pool, sub string) (*pdm.CheckResult, error)
reserves []reserveCall
reserve func(ctx context.Context, pool, sub, by string) (*pdm.Reservation, error)
commits []pdm.CommitInput
commit func(ctx context.Context, pool string, in pdm.CommitInput) error
releases []releaseCall
release func(ctx context.Context, pool, sub string) error
}
type checkCall struct{ pool, sub string }
type reserveCall struct{ pool, sub, by string }
type releaseCall struct{ pool, sub string }
func (f *fakePDM) Check(ctx context.Context, pool, sub string) (*pdm.CheckResult, error) {
f.checks = append(f.checks, checkCall{pool, sub})
if f.check != nil {
return f.check(ctx, pool, sub)
}
return &pdm.CheckResult{Available: true, FQDN: sub + "." + pool}, nil
}
func (f *fakePDM) Reserve(ctx context.Context, pool, sub, by string) (*pdm.Reservation, error) {
f.reserves = append(f.reserves, reserveCall{pool, sub, by})
if f.reserve != nil {
return f.reserve(ctx, pool, sub, by)
}
return &pdm.Reservation{
PoolDomain: pool, Subdomain: sub, State: "reserved",
ReservationToken: "00000000-0000-0000-0000-000000000000",
}, nil
}
func (f *fakePDM) Commit(ctx context.Context, pool string, in pdm.CommitInput) error {
f.commits = append(f.commits, in)
if f.commit != nil {
return f.commit(ctx, pool, in)
}
return nil
}
func (f *fakePDM) Release(ctx context.Context, pool, sub string) error {
f.releases = append(f.releases, releaseCall{pool, sub})
if f.release != nil {
return f.release(ctx, pool, sub)
}
return nil
}
func decodeResp(t *testing.T, body io.Reader) SubdomainCheckResponse {
t.Helper()
var got SubdomainCheckResponse
raw, _ := io.ReadAll(body)
if err := json.Unmarshal(raw, &got); err != nil {
t.Fatalf("decode response: %v (body=%s)", err, string(raw))
}
return got
}
func makeRequest(body string) *http.Request {
r := httptest.NewRequest(http.MethodPost, "/api/v1/subdomains/check", strings.NewReader(body))
r.Header.Set("Content-Type", "application/json")
return r
}
func TestCheckSubdomain_ManagedPoolDelegatesToPDM(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works,openova.io")
pdm.ResetManagedDomains()
fake := &fakePDM{
check: func(ctx context.Context, pool, sub string) (*pdm.CheckResult, error) {
return &pdm.CheckResult{Available: true, FQDN: sub + "." + pool}, nil
},
}
h := NewWithPDM(slog.Default(), fake)
w := httptest.NewRecorder()
h.CheckSubdomain(w, makeRequest(`{"subdomain":"dadasg4543sdfs","poolDomain":"omani.works"}`))
if w.Code != http.StatusOK {
t.Fatalf("status=%d body=%s", w.Code, w.Body.String())
}
got := decodeResp(t, w.Body)
if !got.Available {
t.Errorf("Available=false body=%+v", got)
}
if len(fake.checks) != 1 || fake.checks[0].pool != "omani.works" || fake.checks[0].sub != "dadasg4543sdfs" {
t.Errorf("expected single PDM check call, got %+v", fake.checks)
}
}
// The architectural invariant: even if the customer happens to type a
// random string that "would" resolve via the omani.works wildcard, PDM
// (which has no DNS dependency) returns Available=true — i.e. the
// wildcard parking record is NEVER consulted on the managed-pool path.
func TestCheckSubdomain_WildcardParkingIsIgnored(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works")
pdm.ResetManagedDomains()
fake := &fakePDM{
check: func(ctx context.Context, pool, sub string) (*pdm.CheckResult, error) {
// PDM has nothing in its allocation table — it returns
// Available=true regardless of what DNS says.
return &pdm.CheckResult{Available: true, FQDN: sub + "." + pool}, nil
},
}
h := NewWithPDM(slog.Default(), fake)
for _, sub := range []string{"foo", "dadasg4543sdfs", "totally-random-name"} {
w := httptest.NewRecorder()
h.CheckSubdomain(w, makeRequest(`{"subdomain":"`+sub+`","poolDomain":"omani.works"}`))
got := decodeResp(t, w.Body)
if !got.Available {
t.Errorf("sub=%s: Available=false (wildcard regression!): %+v", sub, got)
}
}
// Exactly one check per call — PDM is consulted, DNS is not.
if len(fake.checks) != 3 {
t.Errorf("expected 3 PDM checks, got %d", len(fake.checks))
}
}
func TestCheckSubdomain_ManagedPoolPDMConflict(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works")
pdm.ResetManagedDomains()
fake := &fakePDM{
check: func(ctx context.Context, pool, sub string) (*pdm.CheckResult, error) {
return &pdm.CheckResult{
Available: false,
Reason: "active-state",
Detail: "this subdomain is already taken by a live Sovereign — pick a different name",
FQDN: sub + "." + pool,
}, nil
},
}
h := NewWithPDM(slog.Default(), fake)
w := httptest.NewRecorder()
h.CheckSubdomain(w, makeRequest(`{"subdomain":"omantel","poolDomain":"omani.works"}`))
got := decodeResp(t, w.Body)
if got.Available {
t.Fatalf("expected unavailable, got %+v", got)
}
if got.Reason != "active-state" {
t.Errorf("Reason=%s want active-state", got.Reason)
}
}
func TestCheckSubdomain_BYOFallsBackToDNS(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works")
pdm.ResetManagedDomains()
fake := &fakePDM{}
h := NewWithPDM(slog.Default(), fake)
// www.example.com is expected to resolve in public DNS (this test needs
// network access). The handler should call LookupHost and surface "exists".
w := httptest.NewRecorder()
h.CheckSubdomain(w, makeRequest(`{"subdomain":"www","poolDomain":"example.com"}`))
got := decodeResp(t, w.Body)
if got.Available {
t.Errorf("www.example.com should resolve and be unavailable: %+v", got)
}
if len(fake.checks) != 0 {
t.Errorf("BYO path must NOT consult PDM; got %d checks", len(fake.checks))
}
}
func TestCheckSubdomain_BYONXDomainAvailable(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works")
pdm.ResetManagedDomains()
fake := &fakePDM{}
h := NewWithPDM(slog.Default(), fake)
// A name expected to return NXDOMAIN under example.com, a special-use
// example domain under RFC 6761. Like the test above, this relies on live DNS.
w := httptest.NewRecorder()
h.CheckSubdomain(w, makeRequest(`{"subdomain":"this-name-must-not-resolve-1234567","poolDomain":"example.com"}`))
got := decodeResp(t, w.Body)
if !got.Available {
t.Errorf("BYO NXDOMAIN should be available, got %+v", got)
}
}
func TestCheckSubdomain_InvalidLabel(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works")
pdm.ResetManagedDomains()
fake := &fakePDM{}
h := NewWithPDM(slog.Default(), fake)
w := httptest.NewRecorder()
h.CheckSubdomain(w, makeRequest(`{"subdomain":"-bad-","poolDomain":"omani.works"}`))
got := decodeResp(t, w.Body)
if got.Available {
t.Errorf("invalid label should be unavailable")
}
if got.Reason != "invalid-format" {
t.Errorf("Reason=%s want invalid-format", got.Reason)
}
if len(fake.checks) != 0 {
t.Errorf("invalid label must short-circuit before PDM is called")
}
}
func TestCheckSubdomain_PDMUnavailable(t *testing.T) {
t.Setenv("DYNADOT_MANAGED_DOMAINS", "omani.works")
pdm.ResetManagedDomains()
fake := &fakePDM{
check: func(ctx context.Context, pool, sub string) (*pdm.CheckResult, error) {
return nil, errors.New("connection refused")
},
}
h := NewWithPDM(slog.Default(), fake)
w := httptest.NewRecorder()
h.CheckSubdomain(w, makeRequest(`{"subdomain":"omantel","poolDomain":"omani.works"}`))
got := decodeResp(t, w.Body)
if got.Available {
t.Errorf("PDM unavailable must NOT be reported as Available=true")
}
if got.Reason != "pdm-unavailable" {
t.Errorf("Reason=%s want pdm-unavailable", got.Reason)
}
}
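
The routing that the tests above pin down can be sketched in isolation. This is a minimal, hypothetical model and not the handler's actual code: `route`, `checker`, `resolver`, `labelRE`, and the rule that any lookup error counts as "available" are assumptions made for illustration. Only the reason strings (`invalid-format`, `pdm-unavailable`, `active-state`, `exists`) are taken from the tests.

```go
package main

import (
	"errors"
	"regexp"
)

// checker is a hypothetical stand-in for the PDM client's Check call.
type checker func(pool, sub string) (available bool, reason string, err error)

// resolver is a hypothetical stand-in for net.LookupHost.
type resolver func(host string) ([]string, error)

// result is a simplified version of the wizard's check response.
type result struct {
	Available bool
	Reason    string
}

// errDown is a placeholder error for "dependency unreachable / no such host".
var errDown = errors.New("connection refused")

// labelRE is an assumed DNS-label rule: lowercase letters, digits and
// hyphens, no leading or trailing hyphen, at most 63 characters.
var labelRE = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)

// route decides which backend answers an availability check.
func route(sub, pool string, managed map[string]bool, pdmCheck checker, lookup resolver) result {
	// 1. Invalid labels short-circuit before any backend is consulted.
	if !labelRE.MatchString(sub) {
		return result{Available: false, Reason: "invalid-format"}
	}
	// 2. Managed pools go to PDM only. A PDM error is surfaced as
	//    unavailable and never falls back to DNS.
	if managed[pool] {
		ok, reason, err := pdmCheck(pool, sub)
		if err != nil {
			return result{Available: false, Reason: "pdm-unavailable"}
		}
		return result{Available: ok, Reason: reason}
	}
	// 3. BYO domains keep the DNS path: a name that resolves is taken.
	//    Treating every lookup error as "available" is a simplification.
	addrs, err := lookup(sub + "." + pool)
	if err != nil || len(addrs) == 0 {
		return result{Available: true}
	}
	return result{Available: false, Reason: "exists"}
}
```

The point of step 2 is the same one `TestCheckSubdomain_PDMUnavailable` makes: a PDM outage must surface as `pdm-unavailable`, because a DNS fallback would bring the wildcard bug back.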


@ -0,0 +1,295 @@
// Package pdm — HTTP client for pool-domain-manager.
//
// This package is the catalyst-api side of the contract introduced by #163.
// PDM owns every Dynadot write in the OpenOva fleet; catalyst-api never calls
// api.dynadot.com directly anymore. The wizard's pre-submit check, the
// reservation taken before `tofu apply`, the commit after the LB IP is known,
// and the release on `tofu destroy` all flow through this client.
//
// Per docs/INVIOLABLE-PRINCIPLES.md #4 the base URL is read from the
// POOL_DOMAIN_MANAGER_URL env var — defaulting to the in-cluster service
// FQDN so a stock catalyst-api deployment "just works" against the PDM
// running in openova-system. Tests/dev override the env var.
package pdm
import (
"bytes"
"context"
"encoding/json"
"errors"
"fmt"
"io"
"net/http"
"net/url"
"os"
"strings"
"sync"
"time"
)
// Client is the catalyst-api → PDM HTTP client.
type Client struct {
BaseURL string
HTTP *http.Client
}
// New constructs a Client. Any trailing slashes on baseURL are trimmed.
func New(baseURL string) *Client {
return &Client{
BaseURL: strings.TrimRight(baseURL, "/"),
HTTP: &http.Client{Timeout: 15 * time.Second},
}
}
// CheckResult mirrors PDM's response shape — kept loose so the wizard can
// surface PDM's reason/detail strings verbatim without an extra mapping.
type CheckResult struct {
Available bool `json:"available"`
Reason string `json:"reason,omitempty"`
Detail string `json:"detail,omitempty"`
FQDN string `json:"fqdn,omitempty"`
}
// Check calls GET /api/v1/pool/{domain}/check?sub=X. Only 5xx responses are
// treated as errors; any other response body is decoded as a CheckResult.
func (c *Client) Check(ctx context.Context, poolDomain, subdomain string) (*CheckResult, error) {
u := fmt.Sprintf("%s/api/v1/pool/%s/check?sub=%s",
c.BaseURL, url.PathEscape(poolDomain), url.QueryEscape(subdomain))
req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
if err != nil {
return nil, fmt.Errorf("build request: %w", err)
}
resp, err := c.HTTP.Do(req)
if err != nil {
return nil, fmt.Errorf("pdm check: %w", err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
if resp.StatusCode >= 500 {
return nil, fmt.Errorf("pdm /check status %d: %s", resp.StatusCode, truncate(string(body), 256))
}
var out CheckResult
if err := json.Unmarshal(body, &out); err != nil {
return nil, fmt.Errorf("decode pdm check: %w (body=%s)", err, truncate(string(body), 256))
}
return &out, nil
}
// Reservation is the wire response of POST /reserve.
type Reservation struct {
PoolDomain string `json:"poolDomain"`
Subdomain string `json:"subdomain"`
State string `json:"state"`
ReservedAt string `json:"reservedAt"`
ExpiresAt string `json:"expiresAt"`
ReservationToken string `json:"reservationToken"`
CreatedBy string `json:"createdBy"`
}
// ErrConflict — PDM returned 409 Conflict (subdomain already taken).
var ErrConflict = errors.New("pool allocation conflict")
// ErrNotFound — PDM returned 404 (no row to commit/release).
var ErrNotFound = errors.New("pool allocation not found")
// Reserve calls POST /api/v1/pool/{domain}/reserve. Returns ErrConflict on
// 409 so callers can distinguish "name taken" from "PDM down".
func (c *Client) Reserve(ctx context.Context, poolDomain, subdomain, createdBy string) (*Reservation, error) {
body := map[string]string{
"subdomain": subdomain,
"createdBy": createdBy,
}
raw, err := json.Marshal(body)
if err != nil {
return nil, err
}
u := fmt.Sprintf("%s/api/v1/pool/%s/reserve", c.BaseURL, url.PathEscape(poolDomain))
req, err := http.NewRequestWithContext(ctx, http.MethodPost, u, bytes.NewReader(raw))
if err != nil {
return nil, err
}
req.Header.Set("Content-Type", "application/json")
resp, err := c.HTTP.Do(req)
if err != nil {
return nil, fmt.Errorf("pdm reserve: %w", err)
}
defer resp.Body.Close()
respBody, _ := io.ReadAll(resp.Body)
switch resp.StatusCode {
case http.StatusCreated:
var out Reservation
if err := json.Unmarshal(respBody, &out); err != nil {
return nil, fmt.Errorf("decode reserve: %w (body=%s)", err, truncate(string(respBody), 256))
}
return &out, nil
case http.StatusConflict:
return nil, ErrConflict
default:
return nil, fmt.Errorf("pdm reserve status %d: %s", resp.StatusCode, truncate(string(respBody), 256))
}
}
// CommitInput maps to PDM's commit body shape.
type CommitInput struct {
Subdomain string
ReservationToken string
SovereignFQDN string
LoadBalancerIP string
}
// Commit calls POST /api/v1/pool/{domain}/commit. Returns ErrNotFound on 404.
func (c *Client) Commit(ctx context.Context, poolDomain string, in CommitInput) error {
body := map[string]string{
"subdomain": in.Subdomain,
"reservationToken": in.ReservationToken,
"sovereignFQDN": in.SovereignFQDN,
"loadBalancerIP": in.LoadBalancerIP,
}
raw, err := json.Marshal(body)
if err != nil {
return err
}
u := fmt.Sprintf("%s/api/v1/pool/%s/commit", c.BaseURL, url.PathEscape(poolDomain))
req, err := http.NewRequestWithContext(ctx, http.MethodPost, u, bytes.NewReader(raw))
if err != nil {
return err
}
req.Header.Set("Content-Type", "application/json")
resp, err := c.HTTP.Do(req)
if err != nil {
return fmt.Errorf("pdm commit: %w", err)
}
defer resp.Body.Close()
respBody, _ := io.ReadAll(resp.Body)
switch resp.StatusCode {
case http.StatusOK, http.StatusAccepted:
return nil
case http.StatusNotFound:
return ErrNotFound
default:
return fmt.Errorf("pdm commit status %d: %s", resp.StatusCode, truncate(string(respBody), 256))
}
}
// Release calls DELETE /api/v1/pool/{domain}/release. Returns ErrNotFound on
// 404.
func (c *Client) Release(ctx context.Context, poolDomain, subdomain string) error {
body := map[string]string{"subdomain": subdomain}
raw, err := json.Marshal(body)
if err != nil {
return err
}
u := fmt.Sprintf("%s/api/v1/pool/%s/release", c.BaseURL, url.PathEscape(poolDomain))
req, err := http.NewRequestWithContext(ctx, http.MethodDelete, u, bytes.NewReader(raw))
if err != nil {
return err
}
req.Header.Set("Content-Type", "application/json")
resp, err := c.HTTP.Do(req)
if err != nil {
return fmt.Errorf("pdm release: %w", err)
}
defer resp.Body.Close()
respBody, _ := io.ReadAll(resp.Body)
switch resp.StatusCode {
case http.StatusOK, http.StatusAccepted:
return nil
case http.StatusNotFound:
return ErrNotFound
default:
return fmt.Errorf("pdm release status %d: %s", resp.StatusCode, truncate(string(respBody), 256))
}
}
func truncate(s string, max int) string {
if len(s) <= max {
return s
}
return s[:max] + "..."
}
// ── Managed-pool resolution ─────────────────────────────────────────────
//
// catalyst-api needs to know which pool domains PDM owns (so it knows when
// to delegate to PDM vs. fall back to the BYO/DNS path). PDM exposes the
// list at /healthz, but fetching that on every wizard keystroke is wasteful.
// Instead — per docs/INVIOLABLE-PRINCIPLES.md #4 — we read the same
// DYNADOT_MANAGED_DOMAINS env var that the K8s ExternalSecret projects into
// the PDM Pod, and that the same secret can project into the catalyst-api
// Pod for this purpose. The env var value is the contract; PDM is the writer.
var managedDomainsState struct {
once sync.Once
set map[string]struct{}
}
// IsManagedDomain reports whether the given domain is in the runtime
// DYNADOT_MANAGED_DOMAINS list. catalyst-api uses this to route /check
// requests: managed → PDM, BYO → DNS lookup.
//
// Resolution order mirrors the legacy dynadot package's so a deployment
// migrating to PDM keeps working without secret edits:
// 1. DYNADOT_MANAGED_DOMAINS env var (canonical)
// 2. DYNADOT_DOMAIN single-value fallback
// 3. Built-in defaults: openova.io, omani.works
func IsManagedDomain(domain string) bool {
d := strings.ToLower(strings.TrimSpace(domain))
if d == "" {
return false
}
managedDomainsState.once.Do(func() {
managedDomainsState.set = computeManagedDomains()
})
_, ok := managedDomainsState.set[d]
return ok
}
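// ResetManagedDomains clears the cache so tests can re-evaluate after
// mutating env vars. It is intended for tests only and is not safe to call
// concurrently with IsManagedDomain or ManagedDomains.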
func ResetManagedDomains() {
managedDomainsState.once = sync.Once{}
managedDomainsState.set = nil
}
// ManagedDomains returns a sorted, deduplicated copy of the configured
// managed-domain list.
func ManagedDomains() []string {
managedDomainsState.once.Do(func() {
managedDomainsState.set = computeManagedDomains()
})
out := make([]string, 0, len(managedDomainsState.set))
for d := range managedDomainsState.set {
out = append(out, d)
}
for i := 1; i < len(out); i++ {
for j := i; j > 0 && out[j-1] > out[j]; j-- {
out[j-1], out[j] = out[j], out[j-1]
}
}
return out
}
func computeManagedDomains() map[string]struct{} {
out := make(map[string]struct{})
if raw := os.Getenv("DYNADOT_MANAGED_DOMAINS"); strings.TrimSpace(raw) != "" {
out = splitDomainsList(raw)
if len(out) > 0 {
return out
}
}
if d := strings.ToLower(strings.TrimSpace(os.Getenv("DYNADOT_DOMAIN"))); d != "" {
out[d] = struct{}{}
return out
}
out["openova.io"] = struct{}{}
out["omani.works"] = struct{}{}
return out
}
func splitDomainsList(raw string) map[string]struct{} {
raw = strings.ToLower(raw)
raw = strings.ReplaceAll(raw, ",", " ")
out := make(map[string]struct{})
for _, p := range strings.Fields(raw) {
out[p] = struct{}{}
}
return out
}
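
As a usage illustration, the reserve, commit, release ordering described in the package comment could be driven roughly as follows. This is a sketch under stated assumptions, not the code in internal/handler/deployments.go: `poolClient` is a hypothetical, simplified interface (the real methods take and return the structs defined above), and `provision`, `recordingClient`, `demo`, the `"wizard"` creator string, and the `token-1` value are invented for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errConflict plays the role of pdm.ErrConflict ("name taken").
var errConflict = errors.New("pool allocation conflict")

// errApplyFailed is a placeholder for a failed infrastructure apply.
var errApplyFailed = errors.New("tofu apply failed")

// poolClient is a hypothetical, simplified view of the PDM client.
type poolClient interface {
	Reserve(ctx context.Context, pool, sub, createdBy string) (token string, err error)
	Commit(ctx context.Context, pool, sub, token, fqdn, lbIP string) error
	Release(ctx context.Context, pool, sub string) error
}

// provision reserves the name, runs apply, then commits on success or
// releases the reservation on failure.
func provision(ctx context.Context, c poolClient, pool, sub string,
	apply func(context.Context) (lbIP string, err error)) error {
	token, err := c.Reserve(ctx, pool, sub, "wizard")
	if errors.Is(err, errConflict) {
		// "Name taken" is a user-facing condition, distinct from an outage.
		return fmt.Errorf("subdomain %q is taken: %w", sub, err)
	}
	if err != nil {
		return fmt.Errorf("reserve: %w", err)
	}
	lbIP, err := apply(ctx)
	if err != nil {
		// Best-effort release so the name does not stay reserved.
		_ = c.Release(ctx, pool, sub)
		return fmt.Errorf("apply: %w", err)
	}
	if err := c.Commit(ctx, pool, sub, token, sub+"."+pool, lbIP); err != nil {
		return fmt.Errorf("commit: %w", err)
	}
	return nil
}

// recordingClient is a test double that records the order of calls.
type recordingClient struct {
	reserveErr error
	calls      []string
}

func (r *recordingClient) Reserve(ctx context.Context, pool, sub, createdBy string) (string, error) {
	r.calls = append(r.calls, "reserve")
	if r.reserveErr != nil {
		return "", r.reserveErr
	}
	return "token-1", nil
}

func (r *recordingClient) Commit(ctx context.Context, pool, sub, token, fqdn, lbIP string) error {
	r.calls = append(r.calls, "commit")
	return nil
}

func (r *recordingClient) Release(ctx context.Context, pool, sub string) error {
	r.calls = append(r.calls, "release")
	return nil
}

// demo runs provision against the recording double and returns the calls made.
func demo(reserveErr, applyErr error) ([]string, error) {
	c := &recordingClient{reserveErr: reserveErr}
	apply := func(context.Context) (string, error) {
		if applyErr != nil {
			return "", applyErr
		}
		return "203.0.113.10", nil // documentation-range IP, illustrative only
	}
	err := provision(context.Background(), c, "omani.works", "omantel", apply)
	return c.calls, err
}
```

Checking for the conflict sentinel with `errors.Is` is what lets a caller tell "name taken" apart from "PDM down", which is the distinction the `Reserve` doc comment calls out.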