Group J: closes #127, #128, #129, #130, #131, #132.
Defaults
- control_plane_size default cx42 (16 GB) — cx32 (8 GB) is INSUFFICIENT
for a solo Sovereign per PLATFORM-TECH-STACK.md §7.1 (~11.3 GB Catalyst)
+ §7.4 (~8.8 GB per-host-cluster) = ~20 GB minimum. The previous cx32
default would OOM during the OpenBao + Keycloak step of bootstrap.
- New k3s_version variable (v1.31.4+k3s1) — pinned, validated against
the INSTALL_K3S_VERSION format. Previously hardcoded inside the
cloud-init templates, in violation of INVIOLABLE-PRINCIPLES.md §4.
Validation
- Region restricted to the 5 known Hetzner locations.
- control_plane_size + worker_size restricted to the cxNN | ccxNN | caxNN
namespace (blocks tiny dev sizes that would OOM at runtime).
- k3s_version regex matches the upstream installer's version format.
- ssh_allowed_cidrs validated as proper CIDRs.
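
The checks above live in variables.tf as OpenTofu validation blocks. As a rough Go sketch of the same rules (for instance, if the provisioner wants to pre-validate wizard input), assuming hypothetical regexes that this message does not spell out: the k3s pattern is inferred from the v1.31.4+k3s1 example and the size pattern from the cxNN | ccxNN | caxNN wording.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// Both patterns are assumptions for illustration; the regexes actually
// used in variables.tf may differ.
var (
	k3sVersionRe = regexp.MustCompile(`^v\d+\.\d+\.\d+\+k3s\d+$`)
	serverTypeRe = regexp.MustCompile(`^(cx|ccx|cax)\d+$`)
)

// validK3sVersion reports whether v looks like "v1.31.4+k3s1".
func validK3sVersion(v string) bool { return k3sVersionRe.MatchString(v) }

// validServerType reports whether s is in the cxNN, ccxNN, or caxNN family.
func validServerType(s string) bool { return serverTypeRe.MatchString(s) }

// validCIDRs returns an error for the first entry that is not a valid CIDR.
// An empty list is valid (it means "no SSH at the firewall").
func validCIDRs(cidrs []string) error {
	for _, c := range cidrs {
		if _, _, err := net.ParseCIDR(c); err != nil {
			return fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
	}
	return nil
}

func main() {
	fmt.Println(validK3sVersion("v1.31.4+k3s1"))
	fmt.Println(validServerType("cx42"))
	fmt.Println(validCIDRs([]string{"203.0.113.0/24", "203.0.113.5"}))
}
```

Note that net.ParseCIDR rejects a bare address such as 203.0.113.5; a single host has to be written as 203.0.113.5/32.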
Firewall
- Each open port (80, 443, 6443, ICMP) and each blocked port
(22, 10250, 2379/2380, 8472) is now documented in README.md §"Firewall rules".
- SSH (22) is now a dynamic rule keyed off ssh_allowed_cidrs (default
empty = no SSH at the firewall, break-glass via Hetzner Console).
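
The module expresses the SSH rule as a dynamic block in main.tf. The Go sketch below only illustrates the intended behaviour: the port 22 rule exists if and only if ssh_allowed_cidrs is non-empty. The Rule type, the inboundRules helper, and the "anywhere" source ranges for the public ports are assumptions made for the example, not code from the module.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified view of one inbound firewall rule.
type Rule struct {
	Protocol  string
	Port      string // empty for ICMP
	SourceIPs []string
}

// inboundRules returns the always-open rules plus an SSH rule only when
// at least one allowed CIDR is supplied.
func inboundRules(sshAllowedCIDRs []string) []Rule {
	// Assumption: the public ports are open to all IPv4 and IPv6 sources.
	anywhere := []string{"0.0.0.0/0", "::/0"}
	rules := []Rule{
		{Protocol: "tcp", Port: "80", SourceIPs: anywhere},
		{Protocol: "tcp", Port: "443", SourceIPs: anywhere},
		{Protocol: "tcp", Port: "6443", SourceIPs: anywhere},
		{Protocol: "icmp", SourceIPs: anywhere},
	}
	if len(sshAllowedCIDRs) > 0 {
		rules = append(rules, Rule{Protocol: "tcp", Port: "22", SourceIPs: sshAllowedCIDRs})
	}
	return rules
}

func main() {
	// Default (empty list): no rule for port 22 is emitted at all.
	for _, r := range inboundRules(nil) {
		fmt.Printf("%s %s %v\n", r.Protocol, r.Port, r.SourceIPs)
	}
}
```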
OS hardening (cloudinit-*.tftpl)
- sshd drop-in: PasswordAuthentication no, PermitRootLogin
prohibit-password, no forwarding, MaxAuthTries=3, LoginGraceTime=30.
- enable_unattended_upgrades (default true): security-only pocket,
auto-reboot at 02:30, removes unused kernels.
- enable_fail2ban (default true): sshd jail, systemd backend.
- Both control-plane and worker templates carry the same baseline.
Documentation
- New infra/hetzner/README.md (operator-facing) covers:
* What the module creates + Phase-0/Phase-1 boundary.
* Sizing rationale with the §7.1+§7.4 RAM math + upgrade path.
* Firewall rules: every open port, every blocked port, every
deliberate egress flow.
* k3s flag-by-flag rationale tied to PLATFORM-TECH-STACK.md §8.
* SSH key management: why no auto-generated keys (break-glass +
audit-trail + custody + compliance).
* OS hardening table.
* Standalone CLI invocation pattern (tofu apply -var-file=...).
* What the module does NOT do (Crossplane / Flux territory).
Closes #127, #128, #129, #130, #131, #132
Per docs/INVIOLABLE-PRINCIPLES.md Lesson #24: the previous commits 915c467 and 07b4bcf shipped bespoke Go code that called the Hetzner Cloud API directly and exec'd helm/kubectl. That violates principle #3 (OpenTofu provisions Phase 0, Crossplane is the ONLY day-2 IaC, Flux is the ONLY GitOps reconciler, Blueprints are the ONLY install unit). This commit reverts all of that and replaces it with the canonical architecture.
REVERTED (deleted):
- products/catalyst/bootstrap/api/internal/hetzner/resources.go (379 lines bespoke Hetzner API client)
- products/catalyst/bootstrap/api/internal/hetzner/cloudinit.go (bespoke cloud-init builder)
- products/catalyst/bootstrap/api/internal/hetzner/provisioner.go (306 lines orchestrator)
- products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go (helm-exec installer for 11 components)
- products/catalyst/bootstrap/api/internal/bootstrap/exec.go (kubectl/helm exec wrappers)
KEPT:
- products/catalyst/bootstrap/api/internal/hetzner/client.go: fast token-validity probe used by the StepCredentials wizard step. NOT architectural drift; just a UX pre-flight check.
- products/catalyst/bootstrap/api/internal/dynadot/dynadot.go — DNS API client. Will be invoked by the OpenTofu module via local-exec (the catalyst-dns helper binary).
NEW (canonical architecture):
infra/hetzner/ — OpenTofu module per docs/SOVEREIGN-PROVISIONING.md §3 Phase 0:
- versions.tf: hetznercloud/hcloud provider ~> 1.49
- variables.tf: 17 typed variables matching wizard inputs (sovereign_fqdn, hcloud_token, region, control_plane_size, ssh_public_key, domain_mode, gitops_repo_url, etc.) — all runtime parameters, none hardcoded per principle #4
- main.tf: hcloud_network + subnet + firewall + ssh_key + control-plane server(s) with cloud-init + worker servers + load_balancer with services + null_resource calling /usr/local/bin/catalyst-dns for pool-domain DNS writes
- outputs.tf: control_plane_ip, load_balancer_ip, sovereign_fqdn, console_url, gitops_repo_url
- cloudinit-control-plane.tftpl: installs k3s with --flannel-backend=none --disable=traefik --disable=servicelb (Cilium replaces all of these), then installs Flux core, then applies a GitRepository pointing at clusters/${sovereign_fqdn}/ in the public OpenOva monorepo. From this point Flux is the GitOps engine — it reconciles bp-cilium → bp-cert-manager → bp-crossplane → ... → bp-catalyst-platform via the Kustomization tree the cluster directory ships. NO bespoke helm install from outside the cluster. NO direct kubectl apply. Flux is the install layer.
- cloudinit-worker.tftpl: k3s agent join via private-IP control plane
products/catalyst/bootstrap/api/internal/provisioner/provisioner.go — thin OpenTofu invoker:
- Validates wizard inputs
- Stages the canonical infra/hetzner/ module into a per-deployment workdir
- Writes tofu.auto.tfvars.json from the wizard request
- Execs `tofu init`, `tofu plan -out=tfplan`, `tofu apply tfplan`, streaming stdout/stderr lines as SSE events to the wizard
- Reads tofu output -json for control_plane_ip + load_balancer_ip
- Returns Result. Flux on the new cluster takes over from here.
products/catalyst/bootstrap/api/internal/handler/deployments.go — rewritten:
- Uses provisioner.Request and provisioner.New() (no more hetzner.Provisioner)
- Same SSE/poll endpoints; same Dynadot env-var injection for pool-domain mode
What this commit DOES NOT yet include (intentionally — separate work):
- clusters/${sovereign_fqdn}/ Kustomization tree in the monorepo that Flux will reconcile (each Sovereign gets its own cluster directory). Tracked separately as part of the bp-catalyst-platform umbrella work.
- /usr/local/bin/catalyst-dns helper binary in the catalyst-api Containerfile. Tracked as ticket [G] dns Dynadot client.
- Crossplane Compositions for hcloud resources at platform/crossplane/compositions/. Tracked as part of [F] crossplane chart.
Lesson #24 closed. Architecture now matches docs/ARCHITECTURE.md §10 + SOVEREIGN-PROVISIONING.md §3-§4 exactly.