Build a new Go service core/pool-domain-manager that becomes the SOLE
authority for OpenOva-pool subdomain allocation across the fleet.
Why this exists: today products/catalyst/bootstrap/api/internal/handler/
subdomains.go performs a naive net.LookupHost() to decide whether a candidate
subdomain is taken. Dynadot's wildcard parking record at the apex of
omani.works (and any future pool domain) makes EVERY subdomain resolve
to 185.53.179.128, so the check rejects everything. DNS is the wrong
source of truth for an OpenOva-managed pool — the central control plane
must own the allocation table.
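For illustration, a minimal sketch of the failure mode (the function name
is hypothetical; the domain and address are the ones described above):

    package handler

    import "net"

    // subdomainTaken mirrors the naive check in catalyst-api's
    // subdomains.go. With the wildcard parking record at the apex of
    // omani.works, every label resolves to 185.53.179.128, so err is
    // always nil and every candidate looks taken.
    func subdomainTaken(candidate string) bool {
        addrs, err := net.LookupHost(candidate + ".omani.works")
        return err == nil && len(addrs) > 0
    }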
What this commit adds (no integration with catalyst-api yet — that lands
in a follow-up commit):
core/pool-domain-manager/
  cmd/pdm/main.go       chi router, healthz, sweeper boot
  api/openapi.yaml      wire contract for every endpoint
  Containerfile         alpine final stage, UID 65534
  internal/store/       pgx + CNPG; pool_allocations table
    migrations.sql      idempotent CREATE TABLE schema
    store.go            Reserve/Get/Commit/Release/List
                        (interface sketched below)
    store_test.go       integration tests (PDM_TEST_DSN)
  internal/dynadot/     moved + extended; SOLE Dynadot caller
    dynadot.go          AddRecord, AddSovereignRecords,
                        DeleteSubdomainRecords (read-modify-write
                        to honour feedback_dynadot_dns)
    dynadot_test.go     managed-domain resolution tests
  internal/reserved/    centralised reserved-name list
    reserved.go         IsReserved/All; pulled out of
                        catalyst-api's subdomains.go
  internal/handler/     HTTP surface
    handler.go          /api/v1/pool/{domain}/{check,reserve,
                        commit,release,list}, /healthz,
                        /api/v1/reserved
  internal/allocator/   state machine + sweeper goroutine
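The internal/store surface, as a hedged sketch: the method names come
from the listing above, but the signatures and the Allocation fields are
assumptions, not the wire contract in api/openapi.yaml.

    package store

    import (
        "context"
        "time"
    )

    // Allocation mirrors one pool_allocations row. The exact column
    // set is an assumption; migrations.sql is authoritative.
    type Allocation struct {
        Domain           string
        Subdomain        string
        ReservationToken string
        State            string // "reserved" | "committed"
        ExpiresAt        time.Time
    }

    // Store lists the five operations named above; signatures are
    // illustrative.
    type Store interface {
        Reserve(ctx context.Context, domain, subdomain string, ttl time.Duration) (*Allocation, error)
        Get(ctx context.Context, domain, subdomain string) (*Allocation, error)
        Commit(ctx context.Context, domain, subdomain, token string) error
        Release(ctx context.Context, domain, subdomain string) error
        List(ctx context.Context, domain string) ([]Allocation, error)
    }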
Architecture choices and how they map to docs/INVIOLABLE-PRINCIPLES.md:
- Principle #4 (never hardcode): every value (PORT, PDM_DATABASE_URL,
DYNADOT_MANAGED_DOMAINS, PDM_RESERVATION_TTL, PDM_SWEEPER_INTERVAL)
flows from env vars; the K8s ExternalSecret will populate them at
deploy time (a config-loading sketch follows this list). The
reserved-subdomain list lives in ONE place (internal/reserved);
catalyst-api will not duplicate it.
- Principle #2 (no quality compromise): the state machine commits the
DB row before the Dynadot side-effect, so a crash between the two
leaves the system in a recoverable state (operator runs Release).
The reservation_token in the row protects against stale-tab commit
races. UPSERT semantics + a CHECK constraint mean two operators
racing /reserve get a clean 23505 (unique_violation) → HTTP 409;
the commit-path sketch after this list shows the ordering and the
error mapping.
- Principle #3 (follow architecture): PDM is a ClusterIP service in
openova-system — it is not a Crossplane provider, not a Flux
HelmRelease, not bespoke OpenTofu state. catalyst-api speaks to it
via plain HTTP. The Crossplane Composition that wraps PDM as a
declarative MR (XDynadotPoolAllocation) lands in a follow-up phase.
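As flagged in the first bullet, a sketch of the env-driven config. The
variable names are the ones listed above; the defaults, the helper
names, and the struct layout are illustrative, not the deployed values.

    package main

    import (
        "os"
        "strings"
        "time"
    )

    type config struct {
        Port            string
        DatabaseURL     string
        ManagedDomains  []string
        ReservationTTL  time.Duration
        SweeperInterval time.Duration
    }

    // loadConfig reads everything from the environment (Principle #4);
    // nothing below is compiled in except last-resort fallbacks.
    func loadConfig() config {
        return config{
            Port:            envOr("PORT", "8080"),
            DatabaseURL:     os.Getenv("PDM_DATABASE_URL"),
            ManagedDomains:  strings.Split(envOr("DYNADOT_MANAGED_DOMAINS", "omani.works"), ","),
            ReservationTTL:  durationOr("PDM_RESERVATION_TTL", 15*time.Minute),
            SweeperInterval: durationOr("PDM_SWEEPER_INTERVAL", time.Minute),
        }
    }

    func envOr(key, def string) string {
        if v := os.Getenv(key); v != "" {
            return v
        }
        return def
    }

    func durationOr(key string, def time.Duration) time.Duration {
        if v := os.Getenv(key); v != "" {
            if d, err := time.ParseDuration(v); err == nil {
                return d
            }
        }
        return def
    }

    func main() {
        cfg := loadConfig()
        _ = cfg // router, store, and sweeper wiring live in the real cmd/pdm/main.go
    }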
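And for the second bullet, the commit-path ordering and error mapping,
sketched with illustrative names (the narrow consumer-side interfaces
and sentinel errors stand in for internal/store and internal/dynadot;
only AddRecord itself is named in the listing above):

    package handler

    import (
        "context"
        "errors"
        "net/http"
    )

    // Sentinel errors the store is described as surfacing; names assumed.
    var (
        ErrTokenMismatch = errors.New("reservation token mismatch")
        ErrConflict      = errors.New("allocation conflict") // store's mapping of Postgres 23505
    )

    type committer interface {
        Commit(ctx context.Context, domain, subdomain, token string) error
    }

    type recordAdder interface {
        AddRecord(ctx context.Context, domain, subdomain string) error
    }

    // commitAllocation encodes the ordering claim: DB row first,
    // Dynadot side-effect second. A crash between the two leaves a
    // committed row with no DNS record, which an operator recovers by
    // running Release; nothing is silently lost.
    func commitAllocation(ctx context.Context, st committer, dyn recordAdder, domain, subdomain, token string) (int, error) {
        if err := st.Commit(ctx, domain, subdomain, token); err != nil {
            if errors.Is(err, ErrTokenMismatch) || errors.Is(err, ErrConflict) {
                return http.StatusConflict, err // stale-tab token or 23505 race → 409
            }
            return http.StatusInternalServerError, err
        }
        if err := dyn.AddRecord(ctx, domain, subdomain); err != nil {
            return http.StatusBadGateway, err
        }
        return http.StatusOK, nil
    }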
The DNS-wildcard problem the issue describes is fixed STRUCTURALLY here:
PDM never calls net.LookupHost. The /check path is a single SELECT
against pool_allocations (sketched below), so the wildcard A record at
the apex of omani.works becomes architecturally irrelevant.
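A sketch of that SELECT with pgx (the column names are assumptions;
migrations.sql holds the authoritative schema):

    package store

    import (
        "context"

        "github.com/jackc/pgx/v5/pgxpool"
    )

    // SubdomainTaken is the entire /check path: one query against
    // pool_allocations, no DNS resolution anywhere.
    func SubdomainTaken(ctx context.Context, db *pgxpool.Pool, domain, subdomain string) (bool, error) {
        var taken bool
        err := db.QueryRow(ctx,
            `SELECT EXISTS (
                 SELECT 1 FROM pool_allocations
                 WHERE domain = $1 AND subdomain = $2
             )`, domain, subdomain).Scan(&taken)
        return taken, err
    }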
Tests exercised in this commit:
- internal/reserved: full unit coverage (case-insensitive, sorted, set
membership)
- internal/dynadot: managed-domain runtime resolution (env-var,
legacy single-domain fallback, built-in defaults, list parsing)
- internal/store: integration suite gated on the PDM_TEST_DSN env var;
covers the reserve happy path, a reserve race (ErrConflict), TTL
expiry freeing the name, the commit happy path, a commit token
mismatch, release removing the row, and the sweeper deleting expired
rows
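The PDM_TEST_DSN gate follows the usual skip-if-unset pattern; a sketch
(the helper name is illustrative):

    package store_test

    import (
        "context"
        "os"
        "testing"

        "github.com/jackc/pgx/v5/pgxpool"
    )

    // newTestPool skips the integration suite unless PDM_TEST_DSN
    // points at a disposable Postgres, so a plain `go test ./...`
    // stays green without a database.
    func newTestPool(t *testing.T) *pgxpool.Pool {
        t.Helper()
        dsn := os.Getenv("PDM_TEST_DSN")
        if dsn == "" {
            t.Skip("PDM_TEST_DSN not set; skipping store integration tests")
        }
        pool, err := pgxpool.New(context.Background(), dsn)
        if err != nil {
            t.Fatalf("connect: %v", err)
        }
        t.Cleanup(pool.Close)
        return pool
    }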
Closes Phase 1 of #163. Phase 2 (catalyst-api wiring), Phase 3 (CI +
manifests), Phase 4 (Crossplane composition), Phase 6 (deploy +
verification curl) follow in separate commits.
Refs: #163
Containerfile:
# pool-domain-manager — central authority for OpenOva-pool subdomain
# allocation. Per docs/INVIOLABLE-PRINCIPLES.md the image is statically
# compiled, runs as a non-root numeric UID, and ships nothing beyond the
# binary + CA bundle.
#
# Two stages:
#   build — golang:1.23-alpine with go modules cached
#   final — alpine:3.20 minimal runtime (CA certs + the binary)

FROM docker.io/library/golang:1.23-alpine AS build
WORKDIR /app

# Cache layer for go.mod / go.sum so day-to-day source rebuilds skip the
# module download.
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags="-s -w -X main.version=$(cat /etc/hostname)" \
    -o /pdm ./cmd/pdm

# Use a minimal runtime stage. We need:
#   - ca-certificates so the Dynadot HTTPS calls can verify the API cert
#   - tzdata so timestamps render correctly in operator logs
# Nothing else.
FROM docker.io/library/alpine:3.20

RUN apk add --no-cache ca-certificates tzdata
COPY --from=build /pdm /pdm

# Alpine 3.20 already ships UID 65534 as `nobody`. Reuse that rather than
# creating a duplicate `nonroot` account. The numeric form satisfies
# runAsNonRoot=true + runAsUser=65534 in the Deployment.
USER 65534:65534

EXPOSE 8080
ENTRYPOINT ["/pdm"]
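The -ldflags line above only takes effect if cmd/pdm/main.go declares a
matching package-level string variable; note that $(cat /etc/hostname)
in the build stage yields the build container's hostname, i.e. a pseudo
build ID rather than a release tag. A minimal sketch of the entrypoint,
with the router and sweeper wiring elided and only the /healthz route
shown (assuming chi, as listed in the tree above):

    package main

    import (
        "log"
        "net/http"
        "os"

        "github.com/go-chi/chi/v5"
    )

    // version is overwritten at build time by the Containerfile's
    // -ldflags "-X main.version=...". It must stay a plain string var.
    var version = "dev"

    func main() {
        r := chi.NewRouter()
        r.Get("/healthz", func(w http.ResponseWriter, _ *http.Request) {
            w.WriteHeader(http.StatusOK)
            _, _ = w.Write([]byte("ok"))
        })
        port := os.Getenv("PORT")
        if port == "" {
            port = "8080"
        }
        log.Printf("pdm %s listening on :%s", version, port)
        log.Fatal(http.ListenAndServe(":"+port, r))
    }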