Commit Graph

2 Commits

Author SHA1 Message Date
hatiyildiz
e7a74f0eef feat(infra/hetzner): bump default to cx42, add OS hardening + operator README
Group J — closes #127, #128, #129, #130, #131, #132.

Defaults
- control_plane_size default cx42 (16 GB) — cx32 (8 GB) is INSUFFICIENT
  for a solo Sovereign per PLATFORM-TECH-STACK.md §7.1 (~11.3 GB Catalyst)
  + §7.4 (~8.8 GB per-host-cluster) = ~20 GB minimum. The previous cx32
  default would OOM during the OpenBao + Keycloak step of bootstrap.
- New k3s_version variable (v1.31.4+k3s1) — pinned, validated against
  the INSTALL_K3S_VERSION format. Previously hardcoded inside the
  cloud-init templates, in violation of INVIOLABLE-PRINCIPLES.md §4.

Validation
- Region restricted to the 5 known Hetzner locations.
- control_plane_size + worker_size restricted to the cxNN | ccxNN | caxNN
  namespace (blocks tiny dev sizes that would OOM at runtime).
- k3s_version regex matches the upstream installer's version format.
- ssh_allowed_cidrs validated as proper CIDRs.

Firewall
- Document each open port (80, 443, 6443, ICMP) and each blocked port
  (22, 10250, 2379/2380, 8472) in README.md §"Firewall rules".
- SSH (22) is now a dynamic rule keyed off ssh_allowed_cidrs (default
  empty = no SSH at the firewall, break-glass via Hetzner Console).

OS hardening (cloudinit-*.tftpl)
- sshd drop-in: PasswordAuthentication no, PermitRootLogin
  prohibit-password, no forwarding, MaxAuthTries=3, LoginGraceTime=30.
- enable_unattended_upgrades (default true): security-only pocket,
  auto-reboot at 02:30, removes unused kernels.
- enable_fail2ban (default true): sshd jail, systemd backend.
- Both control-plane and worker templates carry the same baseline.

Documentation
- New infra/hetzner/README.md (operator-facing) covers:
  * What the module creates + Phase-0/Phase-1 boundary.
  * Sizing rationale with the §7.1+§7.4 RAM math + upgrade path.
  * Firewall rules: every open port, every blocked port, every
    deliberate egress flow.
  * k3s flag-by-flag rationale tied to PLATFORM-TECH-STACK.md §8.
  * SSH key management: why no auto-generated keys (break-glass +
    audit-trail + custody + compliance).
  * OS hardening table.
  * Standalone CLI invocation pattern (tofu apply -var-file=...).
  * What the module does NOT do (Crossplane / Flux territory).

Closes #127 #128 #129 #130 #131 #132
2026-04-28 13:54:15 +02:00
hatiyildiz
e668637bc9 feat(provisioner): replace bespoke Hetzner+helm-exec code with OpenTofu→Crossplane→Flux
Per docs/INVIOLABLE-PRINCIPLES.md Lesson #24 — the previous commits 915c467 + 07b4bcf shipped bespoke Go code that called Hetzner Cloud API directly + exec'd helm/kubectl, which violates principle #3 (OpenTofu provisions Phase 0, Crossplane is the ONLY day-2 IaC, Flux is the ONLY GitOps reconciler, Blueprints are the ONLY install unit). This commit reverts all of that and replaces it with the canonical architecture.

REVERTED (deleted):
- products/catalyst/bootstrap/api/internal/hetzner/resources.go (379 lines bespoke Hetzner API client)
- products/catalyst/bootstrap/api/internal/hetzner/cloudinit.go (bespoke cloud-init builder)
- products/catalyst/bootstrap/api/internal/hetzner/provisioner.go (306 lines orchestrator)
- products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go (helm-exec installer for 11 components)
- products/catalyst/bootstrap/api/internal/bootstrap/exec.go (kubectl/helm exec wrappers)

KEPT:
- products/catalyst/bootstrap/api/internal/hetzner/client.go — fast token validity probe used by StepCredentials wizard step. NOT architectural drift; just a UX pre-flight check.
- products/catalyst/bootstrap/api/internal/dynadot/dynadot.go — DNS API client. Will be invoked by the OpenTofu module via local-exec (the catalyst-dns helper binary).

NEW (canonical architecture):

infra/hetzner/ — OpenTofu module per docs/SOVEREIGN-PROVISIONING.md §3 Phase 0:
- versions.tf: hetznercloud/hcloud provider ~> 1.49
- variables.tf: 17 typed variables matching wizard inputs (sovereign_fqdn, hcloud_token, region, control_plane_size, ssh_public_key, domain_mode, gitops_repo_url, etc.) — all runtime parameters, none hardcoded per principle #4
- main.tf: hcloud_network + subnet + firewall + ssh_key + control-plane server(s) with cloud-init + worker servers + load_balancer with services + null_resource calling /usr/local/bin/catalyst-dns for pool-domain DNS writes
- outputs.tf: control_plane_ip, load_balancer_ip, sovereign_fqdn, console_url, gitops_repo_url
- cloudinit-control-plane.tftpl: installs k3s with --flannel-backend=none --disable=traefik --disable=servicelb (Cilium replaces all of these), then installs Flux core, then applies a GitRepository pointing at clusters/${sovereign_fqdn}/ in the public OpenOva monorepo. From this point Flux is the GitOps engine — it reconciles bp-cilium → bp-cert-manager → bp-crossplane → ... → bp-catalyst-platform via the Kustomization tree the cluster directory ships. NO bespoke helm install from outside the cluster. NO direct kubectl apply. Flux is the install layer.
- cloudinit-worker.tftpl: k3s agent join via private-IP control plane

products/catalyst/bootstrap/api/internal/provisioner/provisioner.go — thin OpenTofu invoker:
- Validates wizard inputs
- Stages the canonical infra/hetzner/ module into a per-deployment workdir
- Writes tofu.auto.tfvars.json from the wizard request
- Execs `tofu init`, `tofu plan -out=tfplan`, `tofu apply tfplan`, streaming stdout/stderr lines as SSE events to the wizard
- Reads tofu output -json for control_plane_ip + load_balancer_ip
- Returns Result. Flux on the new cluster takes over from here.

products/catalyst/bootstrap/api/internal/handler/deployments.go — rewritten:
- Uses provisioner.Request and provisioner.New() (no more hetzner.Provisioner)
- Same SSE/poll endpoints; same Dynadot env-var injection for pool-domain mode

What this commit DOES NOT yet include (intentionally — separate work):
- clusters/${sovereign_fqdn}/ Kustomization tree in the monorepo that Flux will reconcile (each Sovereign gets its own cluster directory). Tracked separately as part of the bp-catalyst-platform umbrella work.
- /usr/local/bin/catalyst-dns helper binary in the catalyst-api Containerfile. Tracked as ticket [G] dns Dynadot client.
- Crossplane Compositions for hcloud resources at platform/crossplane/compositions/. Tracked as part of [F] crossplane chart.

Lesson #24 closed. Architecture now matches docs/ARCHITECTURE.md §10 + SOVEREIGN-PROVISIONING.md §3-§4 exactly.
2026-04-28 13:38:56 +02:00