openova/core/pool-domain-manager/cmd/pdm/main.go
hatiyildiz 585b046f5d feat(pdm): pool-domain-manager service skeleton (Phase 1 of #163)
Build a new Go service core/pool-domain-manager that becomes the SOLE
authority for OpenOva-pool subdomain allocation across the fleet.

Why this exists: today products/catalyst/bootstrap/api/internal/handler/
subdomains.go does naive net.LookupHost() to decide whether a candidate
subdomain is taken. Dynadot's wildcard parking record at the apex of
omani.works (and any future pool domain) makes EVERY subdomain resolve
to 185.53.179.128, so the check rejects everything. DNS is the wrong
source of truth for an OpenOva-managed pool — the central control plane
must own the allocation table.
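
For the record, the naive probe being replaced looks roughly like this
(a minimal sketch; the real check lives in catalyst-api's subdomains.go
and the names here are illustrative):

    package main

    import (
        "fmt"
        "net"
    )

    // looksTaken is the naive probe: under a wildcard A record at the
    // apex, LookupHost succeeds for *every* candidate, so every name
    // reports as taken.
    func looksTaken(candidate, poolDomain string) bool {
        addrs, err := net.LookupHost(candidate + "." + poolDomain)
        return err == nil && len(addrs) > 0
    }

    func main() {
        // Prints "true" even for a name nobody has allocated.
        fmt.Println(looksTaken("definitely-unallocated", "omani.works"))
    }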

What this commit adds (no integration with catalyst-api yet — that lands
in a follow-up commit):

  core/pool-domain-manager/
    cmd/pdm/main.go                     chi router, healthz, sweeper boot
    api/openapi.yaml                     wire contract for every endpoint
    Containerfile                        alpine final stage, UID 65534
    internal/store/                      pgx + CNPG; pool_allocations table
      migrations.sql                       idempotent CREATE TABLE schema
      store.go                             Reserve/Get/Commit/Release/List
      store_test.go                        integration tests (PDM_TEST_DSN)
    internal/dynadot/                    moved + extended; SOLE Dynadot caller
      dynadot.go                           AddRecord, AddSovereignRecords,
                                           DeleteSubdomainRecords (read-modify-
                                           write to honour feedback_dynadot_dns;
                                           pattern sketched below the tree)
      dynadot_test.go                      managed-domain resolution tests
    internal/reserved/                   centralised reserved-name list
      reserved.go                          IsReserved/All; pulled out of
                                           catalyst-api's subdomains.go
    internal/handler/                    HTTP surface
      handler.go                           /api/v1/pool/{domain}/{check,reserve,
                                           commit,release,list}, /healthz,
                                           /api/v1/reserved
    internal/allocator/                  state machine + sweeper goroutine
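
The read-modify-write shape behind DeleteSubdomainRecords, reduced to
the part that matters (hypothetical record type and helper; the real
client lives in internal/dynadot):

    package dynadot // illustrative only

    import "strings"

    type record struct{ Subdomain, Type, Value string }

    // prune drops only the target subdomain's rows from the fetched
    // record set; writing the remainder back unchanged is what keeps
    // records added out-of-band (feedback_dynadot_dns) intact.
    func prune(all []record, sub string) []record {
        keep := all[:0]
        for _, r := range all {
            if !strings.EqualFold(r.Subdomain, sub) {
                keep = append(keep, r)
            }
        }
        return keep
    }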

Architecture choices and how they map to docs/INVIOLABLE-PRINCIPLES.md:

  - Principle #4 (never hardcode): every value (PORT, PDM_DATABASE_URL,
    DYNADOT_MANAGED_DOMAINS, PDM_RESERVATION_TTL, PDM_SWEEPER_INTERVAL)
    flows from env vars; the K8s ExternalSecret will populate them at
    deploy time. The reserved-subdomain list lives in ONE place
    (internal/reserved); catalyst-api will not duplicate it.

  - Principle #2 (no quality compromise): the state machine commits the
    DB row before the Dynadot side-effect, so a crash between the two
    leaves the system in a recoverable state (an operator runs Release).
    The reservation_token in the row protects against stale-tab commit
    races. UPSERT semantics + a CHECK constraint mean two operators
    racing /reserve get a clean 23505 (unique_violation) → HTTP 409; a
    sketch of the commit ordering follows this list.

  - Principle #3 (follow architecture): PDM is a ClusterIP service in
    openova-system — it is not a Crossplane provider, not a Flux
    HelmRelease, not bespoke OpenTofu state. catalyst-api speaks to it
    via plain HTTP. The Crossplane Composition that wraps PDM as a
    declarative MR (XDynadotPoolAllocation) lands in a follow-up phase.
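
A sketch of that commit ordering, with hypothetical interface and
method names (the real state machine lives in internal/allocator):

    package allocator // illustrative only

    import (
        "context"
        "errors"
    )

    var ErrTokenMismatch = errors.New("reservation token mismatch")

    type row struct{ ReservationToken string }

    type storeAPI interface {
        Get(ctx context.Context, domain, sub string) (row, error)
        Commit(ctx context.Context, domain, sub, token string) error
    }

    type dnsAPI interface {
        AddRecord(ctx context.Context, domain, sub string) error
    }

    type Allocator struct {
        store storeAPI
        dns   dnsAPI
    }

    // Commit flips the DB row before touching Dynadot. A crash between
    // the two leaves a committed row without DNS; an operator runs
    // Release and retries, so no state is unrecoverable.
    func (a *Allocator) Commit(ctx context.Context, domain, sub, token string) error {
        r, err := a.store.Get(ctx, domain, sub)
        if err != nil {
            return err
        }
        if r.ReservationToken != token {
            return ErrTokenMismatch // stale-tab commit race
        }
        if err := a.store.Commit(ctx, domain, sub, token); err != nil {
            return err // DNS untouched; the DB stays the source of truth
        }
        return a.dns.AddRecord(ctx, domain, sub) // side-effect last
    }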

The DNS-wildcard problem the issue describes is fixed STRUCTURALLY here:
PDM never calls net.LookupHost. The /check path is a single SELECT
against pool_allocations, so the wildcard A record at the apex of
omani.works becomes architecturally irrelevant.
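
The conflict mapping in the store layer, assuming pgx v5 error handling
(a sketch; the real code is internal/store/store.go):

    package store // illustrative only

    import (
        "errors"

        "github.com/jackc/pgx/v5/pgconn"
    )

    var ErrConflict = errors.New("subdomain already allocated")

    // mapPgError turns Postgres 23505 (unique_violation) into the
    // store's ErrConflict, which the handler renders as HTTP 409.
    func mapPgError(err error) error {
        var pgErr *pgconn.PgError
        if errors.As(err, &pgErr) && pgErr.Code == "23505" {
            return ErrConflict
        }
        return err
    }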

Tests exercised in this commit:
  - internal/reserved: full unit coverage (case-insensitive, sorted, set
    membership)
  - internal/dynadot: managed-domain runtime resolution (env-var,
    legacy single-domain fallback, built-in defaults, list parsing)
  - internal/store: integration suite gated on PDM_TEST_DSN env var,
    covers reserve happy-path, reserve race (ErrConflict), TTL expiry
    frees, commit happy-path, commit token mismatch, release removes
    row, sweeper deletes expired rows
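
The PDM_TEST_DSN gate is the usual skip-if-unset pattern, roughly:

    package store_test // illustrative only

    import (
        "os"
        "testing"
    )

    // testDSN skips the integration suite when no database is wired up,
    // so a plain `go test ./...` stays green on laptops and in CI legs
    // without Postgres.
    func testDSN(t *testing.T) string {
        t.Helper()
        dsn := os.Getenv("PDM_TEST_DSN")
        if dsn == "" {
            t.Skip("PDM_TEST_DSN not set; skipping store integration tests")
        }
        return dsn
    }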

Closes phase 1 of #163. Phase 2 (catalyst-api wiring), Phase 3 (CI +
manifests), Phase 4 (Crossplane composition), Phase 6 (deploy +
verification curl) follow in separate commits.

Refs: #163
2026-04-29 06:37:38 +02:00

// Command pdm — pool-domain-manager service entrypoint.
//
// Wires CNPG/Postgres (store), the Dynadot client, and the chi-based HTTP
// router. Starts the TTL-expiry sweeper as a goroutine. Handles SIGTERM by
// closing the listener gracefully so K8s rolling deploys finish in-flight
// requests before the pod terminates.
//
// All configuration is read from environment variables — per
// docs/INVIOLABLE-PRINCIPLES.md #4 nothing here is hardcoded:
//
//	PORT                    — listen port (default 8080)
//	PDM_DATABASE_URL        — postgres DSN, REQUIRED
//	DYNADOT_API_KEY         — dynadot api key, REQUIRED
//	DYNADOT_API_SECRET      — dynadot api secret, REQUIRED
//	DYNADOT_MANAGED_DOMAINS — comma-separated managed pool list
//	DYNADOT_DOMAIN          — legacy single-domain fallback
//	PDM_RESERVATION_TTL     — go duration string, default "10m"
//	PDM_SWEEPER_INTERVAL    — go duration string, default "30s"
//	PDM_LOG_LEVEL           — debug | info | warn | error (default info)
package main

import (
	"context"
	"errors"
	"log/slog"
	"net/http"
	"os"
	"os/signal"
	"strings"
	"syscall"
	"time"

	"github.com/go-chi/chi/v5"
	"github.com/go-chi/chi/v5/middleware"

	"github.com/openova-io/openova/core/pool-domain-manager/internal/allocator"
	"github.com/openova-io/openova/core/pool-domain-manager/internal/dynadot"
	"github.com/openova-io/openova/core/pool-domain-manager/internal/handler"
	"github.com/openova-io/openova/core/pool-domain-manager/internal/store"
)

func main() {
	log := newLogger(env("PDM_LOG_LEVEL", "info"))
	slog.SetDefault(log)

	cfg, err := loadConfig()
	if err != nil {
		log.Error("config load failed", "err", err)
		os.Exit(2)
	}

	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	startCtx, startCancel := context.WithTimeout(ctx, 30*time.Second)
	defer startCancel()
	s, err := store.New(startCtx, cfg.DatabaseURL)
	if err != nil {
		log.Error("postgres connect failed", "err", err)
		os.Exit(1)
	}
	defer s.Close()

	dyn := dynadot.New(cfg.DynadotAPIKey, cfg.DynadotAPISecret)
	alloc := allocator.New(s, dyn, log, cfg.ReservationTTL)
	go alloc.RunSweeper(ctx, cfg.SweeperInterval)

	h := handler.New(alloc, s, log)
	root := chi.NewRouter()
	root.Use(middleware.RequestID)
	root.Use(middleware.RealIP)
	root.Use(middleware.Logger)
	root.Use(middleware.Recoverer)
	root.Mount("/", h.Routes())

	srv := &http.Server{
		Addr:              ":" + cfg.Port,
		Handler:           root,
		ReadHeaderTimeout: 10 * time.Second,
		ReadTimeout:       30 * time.Second,
		WriteTimeout:      30 * time.Second,
		IdleTimeout:       2 * time.Minute,
	}

	// Surface the managed-domain list at startup so operators can grep logs
	// for misconfiguration (e.g. typo in the secret's `domains` key).
	log.Info("pool-domain-manager starting",
		"port", cfg.Port,
		"reservationTTL", cfg.ReservationTTL.String(),
		"sweeperInterval", cfg.SweeperInterval.String(),
		"managedDomains", dynadot.ManagedDomains(),
	)

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Error("http server failed", "err", err)
			os.Exit(1)
		}
	}()

	<-ctx.Done()
	log.Info("shutdown signal received, draining")
	shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 20*time.Second)
	defer shutdownCancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Error("graceful shutdown failed", "err", err)
		os.Exit(1)
	}
	log.Info("shutdown complete")
}

// config bundles the runtime configuration so loadConfig can return a single
// struct + error.
type config struct {
	Port             string
	DatabaseURL      string
	DynadotAPIKey    string
	DynadotAPISecret string
	ReservationTTL   time.Duration
	SweeperInterval  time.Duration
}

// loadConfig reads every setting from the environment, enforcing the
// required values and parsing the duration knobs.
func loadConfig() (*config, error) {
	c := &config{
		Port: env("PORT", "8080"),
	}
	c.DatabaseURL = strings.TrimSpace(os.Getenv("PDM_DATABASE_URL"))
	if c.DatabaseURL == "" {
		return nil, errors.New("PDM_DATABASE_URL is required")
	}
	c.DynadotAPIKey = strings.TrimSpace(os.Getenv("DYNADOT_API_KEY"))
	if c.DynadotAPIKey == "" {
		return nil, errors.New("DYNADOT_API_KEY is required")
	}
	c.DynadotAPISecret = strings.TrimSpace(os.Getenv("DYNADOT_API_SECRET"))
	if c.DynadotAPISecret == "" {
		return nil, errors.New("DYNADOT_API_SECRET is required")
	}
	ttl, err := time.ParseDuration(env("PDM_RESERVATION_TTL", "10m"))
	if err != nil {
		return nil, errors.New("PDM_RESERVATION_TTL is not a valid duration: " + err.Error())
	}
	c.ReservationTTL = ttl
	sw, err := time.ParseDuration(env("PDM_SWEEPER_INTERVAL", "30s"))
	if err != nil {
		return nil, errors.New("PDM_SWEEPER_INTERVAL is not a valid duration: " + err.Error())
	}
	c.SweeperInterval = sw
	return c, nil
}

// env returns key's value, or fallback when the variable is unset or
// empty.
func env(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// newLogger builds the process-wide JSON slog logger at the requested
// level, defaulting to info on unknown values.
func newLogger(level string) *slog.Logger {
	var lvl slog.Level
	switch strings.ToLower(level) {
	case "debug":
		lvl = slog.LevelDebug
	case "warn":
		lvl = slog.LevelWarn
	case "error":
		lvl = slog.LevelError
	default:
		lvl = slog.LevelInfo
	}
	return slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: lvl}))
}