openova/infra/hetzner/cloudinit-worker.tftpl
hatiyildiz e7a74f0eef feat(infra/hetzner): bump default to cx42, add OS hardening + operator README
Group J — closes #127, #128, #129, #130, #131, #132.

Defaults
- control_plane_size default cx42 (16 GB) — cx32 (8 GB) is INSUFFICIENT
  for a solo Sovereign per PLATFORM-TECH-STACK.md §7.1 (~11.3 GB Catalyst)
  + §7.4 (~8.8 GB per-host-cluster) = ~20 GB minimum. The previous cx32
  default would OOM during the OpenBao + Keycloak step of bootstrap.
- New k3s_version variable (v1.31.4+k3s1) — pinned, validated against
  the INSTALL_K3S_VERSION format. Previously hardcoded inside the
  cloud-init templates, in violation of INVIOLABLE-PRINCIPLES.md §4.

Validation
- Region restricted to the 5 known Hetzner locations.
- control_plane_size + worker_size restricted to the cxNN | ccxNN | caxNN
  namespace (blocks tiny dev sizes that would OOM at runtime).
- k3s_version regex matches the upstream installer's version format.
- ssh_allowed_cidrs validated as proper CIDRs.
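
The size and version checks above can be approximated outside OpenTofu as a pre-flight step. This is a minimal sketch, not the module's validation blocks: the function names are hypothetical, and both patterns are assumptions inferred from the wording above (a `v<major>.<minor>.<patch>+k3s<n>` version string and the cxNN | ccxNN | caxNN size namespace).

```shell
#!/bin/sh
# Hypothetical pre-flight checks. The patterns are assumptions based on the
# commit message, not copied from the module's variable validation rules.

# Succeeds when $1 looks like v<major>.<minor>.<patch>+k3s<n>,
# for example v1.31.4+k3s1.
valid_k3s_version() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+\+k3s[0-9]+$'
}

# Succeeds when $1 is in the cxNN, ccxNN, or caxNN namespace.
valid_server_size() {
  printf '%s\n' "$1" | grep -Eq '^(cx|ccx|cax)[0-9]+$'
}
```

A wrapper script could call `valid_k3s_version "$K3S_VERSION" || exit 1` before invoking `tofu apply`, so a malformed value fails before any plan is computed.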

Firewall
- Document each open port (80, 443, 6443, ICMP) and each blocked port
  (22, 10250, 2379/2380, 8472) in README.md §"Firewall rules".
- SSH (22) is now a dynamic rule keyed off ssh_allowed_cidrs (default
  empty = no SSH at the firewall, break-glass via Hetzner Console).

OS hardening (cloudinit-*.tftpl)
- sshd drop-in: PasswordAuthentication no, PermitRootLogin
  prohibit-password, no forwarding, MaxAuthTries=3, LoginGraceTime=30.
- enable_unattended_upgrades (default true): security-only pocket,
  auto-reboot at 02:30, removes unused kernels.
- enable_fail2ban (default true): sshd jail, systemd backend.
- Both control-plane and worker templates carry the same baseline.

Documentation
- New infra/hetzner/README.md (operator-facing) covers:
  * What the module creates + Phase-0/Phase-1 boundary.
  * Sizing rationale with the §7.1+§7.4 RAM math + upgrade path.
  * Firewall rules: every open port, every blocked port, every
    deliberate egress flow.
  * k3s flag-by-flag rationale tied to PLATFORM-TECH-STACK.md §8.
  * SSH key management: why no auto-generated keys (break-glass +
    audit-trail + custody + compliance).
  * OS hardening table.
  * Standalone CLI invocation pattern (tofu apply -var-file=...).
  * What the module does NOT do (Crossplane / Flux territory).

Closes #127 #128 #129 #130 #131 #132
2026-04-28 13:54:15 +02:00


#cloud-config
# Catalyst Sovereign worker bootstrap.
# Sovereign: ${sovereign_fqdn}
#
# This script:
# 1. Installs OS hardening (SSH password-auth off, fail2ban, unattended-upgrades).
# 2. Joins the cluster as a k3s agent via the control plane's private IP.
# 3. Touches /var/lib/catalyst/cloud-init-complete for the provisioner.
package_update: true
package_upgrade: false
packages:
- curl
- iptables
- ca-certificates
%{ if enable_fail2ban ~}
- fail2ban
%{ endif ~}
%{ if enable_unattended_upgrades ~}
- unattended-upgrades
- apt-listchanges
%{ endif ~}
write_files:
# ── OS hardening: SSH daemon ──────────────────────────────────────────
# Identical drop-in to the control plane — Phase-0 baseline. Operators
# tighten further via Crossplane Composition once Phase 1 completes.
- path: /etc/ssh/sshd_config.d/99-catalyst-hardening.conf
  permissions: '0644'
  content: |
    # Managed by Catalyst Sovereign cloud-init; do not edit by hand.
    PasswordAuthentication no
    KbdInteractiveAuthentication no
    ChallengeResponseAuthentication no
    PermitRootLogin prohibit-password
    PermitEmptyPasswords no
    UsePAM yes
    X11Forwarding no
    AllowAgentForwarding no
    AllowTcpForwarding no
    ClientAliveInterval 300
    ClientAliveCountMax 2
    MaxAuthTries 3
    LoginGraceTime 30
%{ if enable_unattended_upgrades ~}
- path: /etc/apt/apt.conf.d/20auto-upgrades
  permissions: '0644'
  content: |
    APT::Periodic::Update-Package-Lists "1";
    APT::Periodic::Unattended-Upgrade "1";
    APT::Periodic::AutocleanInterval "7";
- path: /etc/apt/apt.conf.d/52unattended-upgrades-catalyst
  permissions: '0644'
  content: |
    Unattended-Upgrade::Allowed-Origins {
      "$${distro_id}:$${distro_codename}-security";
      "$${distro_id}ESMApps:$${distro_codename}-apps-security";
      "$${distro_id}ESM:$${distro_codename}-infra-security";
    };
    Unattended-Upgrade::Automatic-Reboot "true";
    Unattended-Upgrade::Automatic-Reboot-Time "02:30";
    Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
    Unattended-Upgrade::Remove-Unused-Dependencies "true";
%{ endif ~}
%{ if enable_fail2ban ~}
- path: /etc/fail2ban/jail.d/catalyst-sshd.local
  permissions: '0644'
  content: |
    [sshd]
    enabled = true
    port = ssh
    filter = sshd
    maxretry = 5
    findtime = 10m
    bantime = 1h
    backend = systemd
%{ endif ~}
runcmd:
- swapoff -a
- sed -i '/swap/d' /etc/fstab
- update-alternatives --set iptables /usr/sbin/iptables-legacy || true
- update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy || true
- systemctl reload ssh || systemctl reload sshd || true
%{ if enable_fail2ban ~}
- systemctl enable --now fail2ban
%{ endif ~}
%{ if enable_unattended_upgrades ~}
- systemctl enable --now unattended-upgrades
%{ endif ~}
# Join the control plane via private network IP (10.0.1.2 — the first
# control-plane node in the network subnet). k3s_version pinned so all
# workers in this Sovereign land on the same Kubernetes minor as the CP.
- 'curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=${k3s_version} K3S_URL=https://${cp_private_ip}:6443 K3S_TOKEN=${k3s_token} INSTALL_K3S_EXEC="agent --node-label catalyst.openova.io/role=worker" sh -'
- mkdir -p /var/lib/catalyst
- touch /var/lib/catalyst/cloud-init-complete
final_message: "Catalyst worker bootstrap complete after $UPTIME seconds"
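
The template's last step touches /var/lib/catalyst/cloud-init-complete so the provisioner can tell when bootstrap has finished. How the provisioner actually checks for that file is not shown here. The following is a minimal sketch of one way to poll for it, assuming the check runs somewhere the path is visible. The function name `wait_for_sentinel` and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: poll until a sentinel file exists or a timeout elapses.
# Usage: wait_for_sentinel PATH TIMEOUT_SECONDS [INTERVAL_SECONDS]
wait_for_sentinel() {
  sentinel=$1
  timeout=$2
  interval=${3:-5}   # poll every 5 seconds unless told otherwise
  elapsed=0
  while [ ! -e "$sentinel" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s waiting for $sentinel" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

For example, `wait_for_sentinel /var/lib/catalyst/cloud-init-complete 600` would give the node up to ten minutes to finish. The timeout is only checked between sleeps, so the total wait can overshoot it by up to one interval.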