Activates the previously-templated `letsencrypt-dns01-prod` ClusterIssuer
in bp-cert-manager by shipping the missing piece: a Go binary that
satisfies cert-manager's external webhook contract
(`webhook.acme.cert-manager.io/v1alpha1`) against the Dynadot api3.json
API.
Architecture
============
* `core/pkg/dynadot-client/`: canonical Dynadot HTTP client (shared with
  pool-domain-manager and catalyst-dns). Encapsulates the api3.json
  transport, command builders, response decoding, and the safe
  read-modify-write semantics required to never accidentally wipe a
  zone (memory: feedback_dynadot_dns.md). The destructive `set_dns2`
  variant is unexported.
* `core/cmd/cert-manager-dynadot-webhook/`: the cert-manager webhook
  binary. Implements `Solver.Present` via the client's append-only
  `AddRecord` path and `Solver.CleanUp` via the read-modify-write
  `RemoveSubRecord` path. The domain allowlist
  (`DYNADOT_MANAGED_DOMAINS`) rejects challenges for unmanaged apexes
  BEFORE any Dynadot call.
* `platform/cert-manager-dynadot-webhook/`: Catalyst-authored Helm
  wrapper. Templates Deployment + Service + APIService + serving
  Certificate (CA chain via a self-signing cert-manager Issuer) +
  RBAC + ServiceAccount. Mirrors the standard cert-manager
  external-webhook deployment shape.
* `platform/cert-manager/chart/`: flips `dns01.enabled: true` so the
  paired ClusterIssuer activates. The interim http01 issuer remains
  templated as the rollback path.
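The allowlist check described above can be sketched in isolation. This
is a minimal illustration, not the shipped code: `parseAllowlist`,
`splitChallenge`, and `errUnmanaged` are hypothetical names, and the
real parsing lives in the shared client's `ManagedDomains` type, whose
internals are not reproduced here. The sketch assumes a comma- or
whitespace-separated list and longest-suffix matching, as the webhook's
own comments describe.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errUnmanaged is a hypothetical sentinel for challenges outside the allowlist.
var errUnmanaged = errors.New("challenge FQDN is not under a managed domain")

// parseAllowlist splits a comma- or whitespace-separated list into
// lowercase domain names, dropping empty entries and trailing dots.
func parseAllowlist(raw string) []string {
	fields := strings.FieldsFunc(raw, func(r rune) bool {
		return r == ',' || r == ' ' || r == '\t' || r == '\n'
	})
	out := make([]string, 0, len(fields))
	for _, f := range fields {
		if d := strings.ToLower(strings.TrimSuffix(f, ".")); d != "" {
			out = append(out, d)
		}
	}
	return out
}

// splitChallenge picks the longest managed domain that is a suffix of
// fqdn and returns it with the remaining subdomain. It fails before any
// network call when no managed domain matches.
func splitChallenge(fqdn string, managed []string) (apex, sub string, err error) {
	host := strings.ToLower(strings.TrimSuffix(strings.TrimSpace(fqdn), "."))
	for _, d := range managed {
		if (host == d || strings.HasSuffix(host, "."+d)) && len(d) > len(apex) {
			apex = d
		}
	}
	if apex == "" {
		return "", "", fmt.Errorf("%w: %q", errUnmanaged, host)
	}
	if host == apex {
		return apex, "@", nil // challenge at the apex itself
	}
	return apex, strings.TrimSuffix(host, "."+apex), nil
}

func main() {
	managed := parseAllowlist("openova.io, omani.works")

	apex, sub, err := splitChallenge("_acme-challenge.console.omantel.omani.works.", managed)
	fmt.Println(apex, sub, err)

	_, _, err = splitChallenge("_acme-challenge.example.com.", managed)
	fmt.Println(errors.Is(err, errUnmanaged))
}
```

Deriving the apex from the allowlist rather than from the request means
an unmanaged name is rejected locally, with no Dynadot credentials ever
used on its behalf.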
Test results
============
  core/pkg/dynadot-client            7 tests PASS (race-clean)
  core/cmd/cert-manager-dynadot-...  9 tests PASS (race-clean)
Test coverage includes a Present/CleanUp round-trip against an
httptest fixture that models Dynadot's zone state, an explicit
unmanaged-domain rejection, a regression preserving a pre-existing
CNAME across the DNS-01 round-trip (the zone-wipe defence), and a
typed-error propagation test that surfaces `ErrInvalidToken` to
cert-manager so the controller will retry.
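The zone-wipe defence and the idempotency the tests rely on can be
illustrated with an in-memory zone model. This is only a sketch under
stated assumptions: `record`, `zone`, `add`, and `remove` are
hypothetical stand-ins, not the fixture or client types used in the
real tests. It assumes records are deduplicated and matched by the full
(subdomain, type, value) triple.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a DNS record in the zone.
type record struct {
	Subdomain, Type, Value string
}

// zone models the complete record set of one domain.
type zone struct{ records []record }

// add appends rec unless an identical record already exists, so a
// retried Present leaves the zone unchanged.
func (z *zone) add(rec record) {
	for _, r := range z.records {
		if r == rec {
			return
		}
	}
	z.records = append(z.records, rec)
}

// remove rewrites the zone without the exact match and keeps every
// other record. It reports whether anything was removed; removing a
// record that is already gone is not an error.
func (z *zone) remove(match record) bool {
	kept := make([]record, 0, len(z.records))
	removed := false
	for _, r := range z.records {
		if r == match {
			removed = true
			continue
		}
		kept = append(kept, r)
	}
	z.records = kept
	return removed
}

func main() {
	z := &zone{records: []record{{"www", "CNAME", "example.net"}}}
	a := record{"_acme-challenge.console", "TXT", "key-a"}
	b := record{"_acme-challenge.console", "TXT", "key-b"}

	z.add(a)
	z.add(a) // retried Present
	z.add(b) // parallel order for the same name
	z.remove(a)
	z.remove(a) // retried CleanUp

	fmt.Println(z.records) // the CNAME and key-b remain
}
```

Matching on the full triple is what lets two concurrent orders share
one `_acme-challenge` name without one CleanUp deleting the other
order's TXT value.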
Helm template smoke render
==========================
`helm template` against the new chart with default values yields 12
resources / 424 lines (APIService, Certificate, ClusterRoleBinding,
Deployment, Issuer, Role, RoleBinding, Service, ServiceAccount). The
modified bp-cert-manager chart still renders both ClusterIssuers
(`letsencrypt-dns01-prod` + `letsencrypt-http01-prod`) with default
values; flipping `certManager.issuers.dns01.enabled=false` is the
clean rollback.
Smoke command (post-deploy)
===========================
  kubectl get apiservices.apiregistration.k8s.io \
    v1alpha1.acme.dynadot.openova.io
  # Issue a *.<sovereign>.<pool> wildcard cert and watch the
  # Order/Challenge progress through cert-manager.
CI
==
`.github/workflows/build-cert-manager-dynadot-webhook.yaml` mirrors the
pool-domain-manager-build pattern (cosign keyless signing, SBOM
attestation, GHCR push at
`ghcr.io/openova-io/openova/cert-manager-dynadot-webhook:<sha>`).
Triggered by changes to either the binary or the shared dynadot-client
package.
Closes #159
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
310 lines
12 KiB
Go
// Command cert-manager-dynadot-webhook is the cert-manager external
// DNS-01 webhook for Dynadot.
//
// It implements the cert-manager webhook contract documented at
// https://cert-manager.io/docs/configuration/acme/dns01/webhook/ and
// uses the canonical Dynadot HTTP client at
// github.com/openova-io/openova/core/pkg/dynadot-client to perform the
// underlying record mutations.
//
// Why this binary exists separately from external-dns-dynadot-webhook:
// the external-dns webhook contract is a different protocol (records.list /
// records.add / records.delete RPCs); see
// platform/cert-manager/chart/templates/clusterissuer-letsencrypt-dns01.yaml
// for the historical context. cert-manager's webhook is an aggregated
// apiserver registered via APIService, served on TCP/443 with mTLS, and
// receives ChallengeRequest objects for Present/CleanUp.
//
// Configuration is environment-variable driven so a Sovereign overlay can
// retune the binary without rebuilding the image (per
// docs/INVIOLABLE-PRINCIPLES.md #4):
//
//	GROUP_NAME               webhook API group, default
//	                         "acme.dynadot.openova.io". MUST match the
//	                         ClusterIssuer's solvers[].dns01.webhook.groupName.
//	DYNADOT_API_KEY          Dynadot api3.json API key. REQUIRED.
//	DYNADOT_API_SECRET       Dynadot api3.json API secret. REQUIRED.
//	DYNADOT_MANAGED_DOMAINS  comma- or whitespace-separated allowlist
//	                         of pool domains the webhook is permitted
//	                         to mutate (e.g.
//	                         "openova.io,omani.works,omanyx.works").
//	                         REQUIRED for production; the allowlist is a
//	                         defence against a misconfigured or stolen
//	                         ClusterIssuer pointing at a third-party
//	                         domain. Single-domain operators may set
//	                         DYNADOT_DOMAIN as a fallback.
//	DYNADOT_DOMAIN           optional single-domain fallback when
//	                         DYNADOT_MANAGED_DOMAINS is empty. Honoured
//	                         for parity with pool-domain-manager (#108).
//	DYNADOT_BASE_URL         override for tests; production uses
//	                         https://api.dynadot.com/api3.json.
//	LOG_LEVEL                log verbosity: "debug", "info" (default),
//	                         "warn" or "error".
//
// At Present time the webhook splits the ChallengeRequest's ResolvedFQDN
// into (subdomain, apex) by matching the apex against the managed-domains
// allowlist, then writes a TXT record at `_acme-challenge.<subdomain>`
// using AddRecord (append-only path; never wipes the zone, see the
// core/pkg/dynadot-client/doc.go safety contract). At CleanUp it does a
// safe read-modify-write via RemoveSubRecord.
//
// Idempotency: cert-manager retries Present and CleanUp on transient
// errors. AddRecord is idempotent because Dynadot dedupes by
// (subdomain, type, value); RemoveSubRecord returns nil when nothing
// matches. Both behaviours are required by the webhook spec.
package main

import (
	"context"
	"errors"
	"fmt"
	"log/slog"
	"os"
	"strings"

	dynadot "github.com/openova-io/openova/core/pkg/dynadot-client"

	"github.com/cert-manager/cert-manager/pkg/acme/webhook"
	"github.com/cert-manager/cert-manager/pkg/acme/webhook/apis/acme/v1alpha1"
	"github.com/cert-manager/cert-manager/pkg/acme/webhook/cmd"
	"k8s.io/client-go/rest"
)

// defaultGroupName matches the value baked into
// platform/cert-manager/chart/templates/clusterissuer-letsencrypt-dns01.yaml.
// Operators MAY override via the GROUP_NAME env so a Sovereign overlay
// can retune the API group without rebuilding the image.
const defaultGroupName = "acme.dynadot.openova.io"

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{
		Level: parseLogLevel(os.Getenv("LOG_LEVEL")),
	}))
	slog.SetDefault(logger)

	groupName := strings.TrimSpace(os.Getenv("GROUP_NAME"))
	if groupName == "" {
		groupName = defaultGroupName
	}

	solver, err := newDynadotSolver(loadConfigFromEnv())
	if err != nil {
		logger.Error("solver init failed", "err", err)
		os.Exit(2)
	}

	logger.Info("cert-manager-dynadot-webhook starting",
		"groupName", groupName,
		"managedDomains", solver.managed.List(),
	)

	// RunWebhookServer blocks until the apiserver process is signalled
	// to terminate. It reads --secure-port / --tls-cert-file etc. from
	// argv (set by the chart's args:) and serves the aggregated apiserver
	// that cert-manager calls into.
	cmd.RunWebhookServer(groupName, solver)
}

// solverConfig is the fully-resolved configuration of the webhook,
// captured into a struct so the unit tests can inject overrides without
// touching process-global env state.
type solverConfig struct {
	APIKey         string
	APISecret      string
	ManagedDomains string
	Fallback       string // legacy DYNADOT_DOMAIN
	BaseURL        string // optional override for tests
}

// loadConfigFromEnv builds a solverConfig from the documented env vars.
func loadConfigFromEnv() solverConfig {
	return solverConfig{
		APIKey:         os.Getenv("DYNADOT_API_KEY"),
		APISecret:      os.Getenv("DYNADOT_API_SECRET"),
		ManagedDomains: os.Getenv("DYNADOT_MANAGED_DOMAINS"),
		Fallback:       os.Getenv("DYNADOT_DOMAIN"),
		BaseURL:        os.Getenv("DYNADOT_BASE_URL"),
	}
}

// dynadotSolver is the cert-manager webhook.Solver implementation.
//
// It is split from main() so tests can construct one with a fixture
// httptest.Server and a deterministic managed-domain list, then drive
// Present / CleanUp directly without wiring up the aggregated-apiserver
// transport.
type dynadotSolver struct {
	client  *dynadot.Client
	managed *dynadot.ManagedDomains
}

// newDynadotSolver validates configuration and constructs a solver.
// Returns an error rather than panicking so the caller's structured
// logger can surface a clean error path on misconfiguration.
func newDynadotSolver(cfg solverConfig) (*dynadotSolver, error) {
	if strings.TrimSpace(cfg.APIKey) == "" || strings.TrimSpace(cfg.APISecret) == "" {
		return nil, errors.New("DYNADOT_API_KEY and DYNADOT_API_SECRET are required")
	}
	managedRaw := cfg.ManagedDomains
	if strings.TrimSpace(managedRaw) == "" {
		managedRaw = cfg.Fallback
	}
	if strings.TrimSpace(managedRaw) == "" {
		return nil, errors.New("DYNADOT_MANAGED_DOMAINS (or legacy DYNADOT_DOMAIN) must list at least one domain")
	}
	c := dynadot.New(cfg.APIKey, cfg.APISecret)
	if cfg.BaseURL != "" {
		c.BaseURL = cfg.BaseURL
	}
	return &dynadotSolver{
		client:  c,
		managed: dynadot.NewManagedDomains(managedRaw),
	}, nil
}

// Name is the solverName referenced by the ClusterIssuer's
// solvers[].dns01.webhook.solverName field. cert-manager dispatches to
// this solver only when the issuer's solverName matches.
func (s *dynadotSolver) Name() string { return "dynadot" }

// Initialize is a no-op for this webhook. cert-manager passes its own
// kube REST config in case a solver wants to reconcile a CR; we don't.
// The signal channel is closed on shutdown; callers must return
// promptly when it closes. Since Initialize itself returns immediately,
// there is nothing to wind down.
func (s *dynadotSolver) Initialize(_ *rest.Config, _ <-chan struct{}) error {
	return nil
}

// Present writes the TXT record cert-manager needs Let's Encrypt to see
// at `_acme-challenge.<subdomain>` on the apex domain.
//
// The ChallengeRequest carries:
//   - ResolvedFQDN: fully-qualified challenge name with trailing dot,
//     e.g. "_acme-challenge.console.omantel.omani.works."
//   - ResolvedZone: the zone cert-manager believes is authoritative,
//     e.g. "omani.works."
//   - Key: the TXT value Let's Encrypt is expecting.
//
// We resolve apex from the managed-domains allowlist (NOT from
// ResolvedZone) so a misconfigured Issuer or compromised
// kube-apiserver cannot trick the webhook into mutating a domain we
// don't own. If no managed domain is a suffix of ResolvedFQDN the
// challenge is rejected with a typed error.
func (s *dynadotSolver) Present(ch *v1alpha1.ChallengeRequest) error {
	apex, sub, err := s.resolveDomain(ch.ResolvedFQDN)
	if err != nil {
		return err
	}
	slog.Info("Present",
		"apex", apex, "subdomain", sub,
		"resolvedFQDN", ch.ResolvedFQDN, "resolvedZone", ch.ResolvedZone,
	)

	rec := dynadot.Record{
		Subdomain: sub,
		Type:      "TXT",
		Value:     ch.Key,
		TTL:       60,
	}

	ctx, cancel := context.WithTimeout(context.Background(), defaultPresentTimeout)
	defer cancel()
	if err := s.client.AddRecord(ctx, apex, rec); err != nil {
		return fmt.Errorf("dynadot AddRecord %s/%s TXT: %w", apex, sub, err)
	}
	return nil
}

// CleanUp removes the TXT record written by Present.
//
// Per the webhook spec, CleanUp MUST be idempotent: Let's Encrypt may
// have already validated the challenge, or cert-manager may retry after
// a transient failure. RemoveSubRecord uses GetDomainInfo →
// SetFullDNS so the entire zone state is preserved verbatim except for
// the matching record; if no matching record exists, it returns nil.
//
// The match key is (subdomain, TXT, key). We DO NOT remove every TXT
// at `_acme-challenge.<subdomain>` because two parallel orders for the
// same hostname (concurrent renewal + new cert) write different keys to
// the same name and BOTH must validate.
func (s *dynadotSolver) CleanUp(ch *v1alpha1.ChallengeRequest) error {
	apex, sub, err := s.resolveDomain(ch.ResolvedFQDN)
	if err != nil {
		return err
	}
	slog.Info("CleanUp",
		"apex", apex, "subdomain", sub,
		"resolvedFQDN", ch.ResolvedFQDN, "resolvedZone", ch.ResolvedZone,
	)

	match := dynadot.Record{
		Subdomain: sub,
		Type:      "TXT",
		Value:     ch.Key,
	}
	ctx, cancel := context.WithTimeout(context.Background(), defaultCleanUpTimeout)
	defer cancel()
	if err := s.client.RemoveSubRecord(ctx, apex, match); err != nil {
		return fmt.Errorf("dynadot RemoveSubRecord %s/%s TXT: %w", apex, sub, err)
	}
	return nil
}

// resolveDomain matches a fully-qualified ACME challenge FQDN against
// the managed-domains allowlist and returns (apex, subdomain) suitable
// for the Dynadot api3.json `set_dns2` parameters.
//
// Examples:
//
//	"_acme-challenge.console.omantel.omani.works." with apex "omani.works"
//	  → apex="omani.works", subdomain="_acme-challenge.console.omantel"
//	"_acme-challenge.openova.io." with apex "openova.io"
//	  → apex="openova.io", subdomain="_acme-challenge"
//
// We strip the trailing dot, lowercase, and pick the longest matching
// apex from the allowlist (so "omani.works" wins over "works" if both
// were configured, which guards against operator typos).
func (s *dynadotSolver) resolveDomain(fqdn string) (apex, sub string, err error) {
	host := strings.ToLower(strings.TrimSuffix(strings.TrimSpace(fqdn), "."))
	if host == "" {
		return "", "", errors.New("dynadot webhook: ChallengeRequest.ResolvedFQDN is empty")
	}
	var bestApex string
	for _, d := range s.managed.List() {
		if host == d || strings.HasSuffix(host, "."+d) {
			if len(d) > len(bestApex) {
				bestApex = d
			}
		}
	}
	if bestApex == "" {
		return "", "", fmt.Errorf("dynadot webhook: %q is not under any DYNADOT_MANAGED_DOMAINS entry %v", host, s.managed.List())
	}
	if host == bestApex {
		// Apex challenge: Dynadot uses the special "@" subdomain (or
		// equivalently empty). The client encodes this as a main_record0.
		return bestApex, "@", nil
	}
	return bestApex, strings.TrimSuffix(host, "."+bestApex), nil
}

// parseLogLevel maps the LOG_LEVEL env to a slog.Level. Defaults to
// info; "debug", "warn" (or "warning"), and "error" are honoured.
func parseLogLevel(s string) slog.Level {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "debug":
		return slog.LevelDebug
	case "warn", "warning":
		return slog.LevelWarn
	case "error":
		return slog.LevelError
	default:
		return slog.LevelInfo
	}
}

// Compile-time guard: dynadotSolver implements the cert-manager webhook
// Solver interface. If cert-manager's contract changes the build fails
// here rather than at runtime when the apiserver dispatches the first
// ChallengeRequest.
var _ webhook.Solver = (*dynadotSolver)(nil)