openova/infra/hetzner/cloudinit-control-plane.tftpl
hatiyildiz acf426c5a9 feat(catalyst-api): cloud-init PUTs kubeconfig back via bearer token (closes #183)
Implement Option D from issue #183: the new Sovereign's cloud-init
PUTs its rewritten kubeconfig (server URL pinned to the LB public
IP, k3s service-account token in the body) to catalyst-api over
HTTPS using a per-deployment bearer token. catalyst-api never SSHs
into the Sovereign — by design, it does not hold the SSH private
key (the wizard returns it once to the browser and does not
persist it on the catalyst-api side).

How the bearer flow works
-------------------------
1. CreateDeployment mints a 32-byte random bearer (crypto/rand,
   hex-encoded), computes its SHA-256, and persists ONLY the
   hash on Deployment.kubeconfigBearerHash. Plaintext is stamped
   onto provisioner.Request just long enough for writeTfvars to
   render it into the per-deployment OpenTofu workdir, then GC'd.
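
   For illustration, a minimal Go sketch of the mint/hash pair. The
   helper names newBearerToken / hashBearerToken appear in the tests
   below; the exact signatures are an assumption, and only the stated
   behaviour (32 bytes from crypto/rand, hex-encoded, hashed with
   SHA-256) is guaranteed by this commit:

      package handler

      import (
          "crypto/rand"
          "crypto/sha256"
          "encoding/hex"
      )

      // newBearerToken mints the per-deployment bearer: 32 random bytes, hex-encoded.
      func newBearerToken() (string, error) {
          b := make([]byte, 32)
          if _, err := rand.Read(b); err != nil {
              return "", err
          }
          return hex.EncodeToString(b), nil
      }

      // hashBearerToken is the only value persisted (Deployment.kubeconfigBearerHash);
      // the plaintext bearer itself is never stored by catalyst-api.
      func hashBearerToken(token string) string {
          sum := sha256.Sum256([]byte(token))
          return hex.EncodeToString(sum[:])
      }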

2. infra/hetzner/variables.tf adds three variables — deployment_id,
   kubeconfig_bearer_token (sensitive), catalyst_api_url. main.tf
   passes them through templatefile() with load_balancer_ipv4 read
   from hcloud_load_balancer.main.ipv4.

3. cloudinit-control-plane.tftpl, after `kubectl --raw /healthz`
   succeeds, sed-rewrites k3s.yaml's https://127.0.0.1:6443 to the
   LB's public IPv4, writes the result to a 0600 file, and curls
   PUT to {catalyst_api_url}/api/v1/deployments/{deployment_id}/
   kubeconfig with `Authorization: Bearer {token}`. --retry 60
   --retry-delay 10 --retry-all-errors handles transient
   reachability gaps. The 0600 file is removed after the PUT.

4. PUT /api/v1/deployments/{id}/kubeconfig:
   - Reads `Authorization: Bearer <token>` (RFC 6750).
   - Computes SHA-256 of the inbound bearer, constant-time-compares
     to the persisted hash via subtle.ConstantTimeCompare.
   - 401 on missing/malformed Authorization, 403 on bearer
     mismatch, 403 if no hash on record, 403 if KubeconfigPath
     already set (single-use replay defence), 422 on empty/oversize
     body, 503 if the kubeconfigs directory is unwritable.
   - On success (204): writes the body to /var/lib/catalyst/kubeconfigs/
     <id>.yaml at mode 0600 (atomic temp+rename), sets
     Result.KubeconfigPath, calls persistDeployment, then spawns
     `go runPhase1Watch(dep)`. A sketch of the full check order follows
     this list.
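
   As referenced above, a compressed Go sketch of that check order in the
   same package as the sketch under step 1. The Deployment shape, the size
   cap, and the handler wiring are assumptions for illustration; the status
   codes and the temp+rename write mirror the spec:

      import (
          "crypto/sha256"
          "crypto/subtle"
          "encoding/hex"
          "io"
          "net/http"
          "os"
          "path/filepath"
          "strings"
      )

      // Deployment carries only the fields this sketch needs.
      type Deployment struct {
          ID                   string
          KubeconfigBearerHash string
          KubeconfigPath       string
      }

      const maxKubeconfigBytes = 1 << 20 // assumed cap; the real limit is not stated here

      func putKubeconfig(w http.ResponseWriter, r *http.Request, dep *Deployment, dir string) {
          auth := r.Header.Get("Authorization")
          const scheme = "bearer "
          if len(auth) <= len(scheme) || !strings.EqualFold(auth[:len(scheme)], scheme) {
              w.WriteHeader(http.StatusUnauthorized) // 401: missing/malformed Authorization
              return
          }
          token := auth[len(scheme):]
          if dep.KubeconfigBearerHash == "" || dep.KubeconfigPath != "" {
              w.WriteHeader(http.StatusForbidden) // 403: no hash on record / single-use replay
              return
          }
          sum := sha256.Sum256([]byte(token))
          if subtle.ConstantTimeCompare([]byte(hex.EncodeToString(sum[:])), []byte(dep.KubeconfigBearerHash)) != 1 {
              w.WriteHeader(http.StatusForbidden) // 403: bearer mismatch
              return
          }
          body, err := io.ReadAll(io.LimitReader(r.Body, maxKubeconfigBytes+1))
          if err != nil || len(body) == 0 || len(body) > maxKubeconfigBytes {
              w.WriteHeader(http.StatusUnprocessableEntity) // 422: empty or oversize body
              return
          }
          dst := filepath.Join(dir, dep.ID+".yaml")
          tmp := dst + ".tmp"
          if err := os.WriteFile(tmp, body, 0o600); err != nil {
              w.WriteHeader(http.StatusServiceUnavailable) // 503: kubeconfigs dir unwritable
              return
          }
          if err := os.Rename(tmp, dst); err != nil { // atomic temp+rename
              w.WriteHeader(http.StatusServiceUnavailable)
              return
          }
          dep.KubeconfigPath = dst
          // real handler: persistDeployment(dep), then `go runPhase1Watch(dep)`
          w.WriteHeader(http.StatusNoContent) // 204
      }

   The 404 deployment lookup and the phase1Started guard sit outside this
   sketch, in the routing and watch-launch code.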

5. GET /api/v1/deployments/{id}/kubeconfig now reads the file at
   Result.KubeconfigPath. It returns 409 with {"error":"not-implemented"}
   when the postback hasn't happened yet (preserving the wizard's
   existing StepSuccess fallback), and 409 with
   {"error":"kubeconfig-file-missing"} on PVC drift.

6. internal/store: Record carries KubeconfigBearerHash. The path
   pointer round-trips via Result.KubeconfigPath; the JSON record
   NEVER contains the kubeconfig plaintext (a test greps the on-disk
   JSON for the kubeconfig sentinels and asserts zero matches).
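
   Illustrative record shape only (the real struct carries more fields);
   the point is that the JSON on disk holds a hash and a path pointer,
   never kubeconfig bytes:

      type Record struct {
          ID                   string `json:"id"`
          KubeconfigBearerHash string `json:"kubeconfigBearerHash,omitempty"`
          KubeconfigPath       string `json:"kubeconfigPath,omitempty"`
          // no plaintext kubeconfig field: the YAML lives only at KubeconfigPath
      }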

7. restoreFromStore relaunches helmwatch on Pod restart for any
   rehydrated deployment whose Result.KubeconfigPath points at an
   existing file, whose Phase1FinishedAt is nil, and whose original
   status was not in-flight (the existing
   in-flight-status-rewrite-to-failed contract is preserved; see the
   shouldResumePhase1 sketch below). Channels are re-allocated for
   resumed deployments because the ones loaded via fromRecord are
   closed.
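
   The resume gate, sketched in Go (shouldResumePhase1 is the name used
   in the tests below; the parameter shape is an assumption):

      import (
          "os"
          "time"
      )

      func shouldResumePhase1(kubeconfigPath string, phase1FinishedAt *time.Time, wasInFlight bool) bool {
          if wasInFlight { // in-flight statuses are rewritten to failed, never resumed
              return false
          }
          if phase1FinishedAt != nil { // Phase 1 already finished, nothing to watch
              return false
          }
          if kubeconfigPath == "" { // postback never happened
              return false
          }
          if _, err := os.Stat(kubeconfigPath); err != nil { // file must still exist on the PVC
              return false
          }
          return true
      }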

8. internal/handler/phase1_watch.go reads kubeconfig YAML from
   the file at Result.KubeconfigPath (not from a string field on
   Result). The Result.Kubeconfig field is removed entirely; the
   on-disk JSON only carries kubeconfigPath.

Tests
-----
internal/handler/kubeconfig_test.go covers every spec gate:
- PUT 401 missing/malformed Authorization
- PUT 403 bearer mismatch / no-bearer-hash / already-set
- PUT 422 empty body / oversize body
- PUT 404 deployment not found
- PUT 204 first success, file at <dir>/<id>.yaml mode 0600,
  Result.KubeconfigPath set, on-disk JSON has kubeconfigPath
  pointer with no plaintext leak
- PUT triggers Phase 1 helmwatch goroutine
- GET reads from path-pointer
- GET 409 path-pointer-set-but-file-missing
- newBearerToken / hashBearerToken round-trip + entropy
- subtle.ConstantTimeCompare correctness
- shouldResumePhase1 gates every branch
- restoreFromStore re-launches helmwatch on rehydrated deployments
- phase1Started guard prevents double watch (PUT then runProvisioning)
- extractBearer RFC 6750 case-insensitive scheme
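
As one concrete example of the cases above, a sketch of the RFC 6750
scheme test, assuming a signature of extractBearer(header string)
(string, bool) in the handler package's test file:

    func TestExtractBearerCaseInsensitiveScheme(t *testing.T) {
        cases := []struct {
            header string
            want   string
            ok     bool
        }{
            {"Bearer abc123", "abc123", true},
            {"bearer abc123", "abc123", true}, // RFC 6750 scheme is case-insensitive
            {"BEARER abc123", "abc123", true},
            {"Basic abc123", "", false}, // wrong scheme
            {"", "", false},             // missing header
        }
        for _, c := range cases {
            got, ok := extractBearer(c.header)
            if ok != c.ok || got != c.want {
                t.Errorf("extractBearer(%q) = %q, %v; want %q, %v", c.header, got, ok, c.want, c.ok)
            }
        }
    }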

Chart
-----
products/catalyst/chart/templates/api-deployment.yaml mounts the
existing catalyst-api-deployments PVC at /var/lib/catalyst (one
level up) so deployments/<id>.json and kubeconfigs/<id>.yaml live
on the same single-attach volume — no second PVC. Adds env vars
CATALYST_KUBECONFIGS_DIR=/var/lib/catalyst/kubeconfigs and
CATALYST_API_PUBLIC_URL=https://console.openova.io/sovereign.

Per docs/INVIOLABLE-PRINCIPLES.md
- #3: OpenTofu is still the only Phase-0 IaC; cloud-init is part of
  the OpenTofu module's templated user_data, not a separate code
  path. catalyst-api never execs helm/kubectl/ssh.
- #4: catalyst_api_url is runtime-configurable
  (CATALYST_API_PUBLIC_URL env var), so air-gapped franchises
  override without code changes.
- #10: Bearer plaintext NEVER lands on disk on the catalyst-api
  side (only the SHA-256 hash). Kubeconfig plaintext NEVER lands
  in the JSON record (only the file path). The kubeconfig file is
  chmod 0600 and the directory 0700 owned by the catalyst-api UID.

Closes #183.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:26:53 +02:00

#cloud-config
# Catalyst Sovereign control-plane bootstrap.
# Sovereign: ${sovereign_fqdn}
# Provisioned by: catalyst-provisioner (https://console.openova.io/sovereign)
#
# This script:
#   1. Installs OS hardening (SSH password-auth off, fail2ban, unattended-upgrades).
#   2. Installs k3s with --flannel-backend=none (Cilium replaces it).
#   3. Installs Flux + bootstraps the GitRepository pointing at
#      clusters/${sovereign_fqdn}/ in the public OpenOva monorepo. From this
#      point Flux is the GitOps reconciler and installs the 11-component
#      bootstrap kit (Cilium → cert-manager → Crossplane → ... →
#      bp-catalyst-platform) in dependency order via Kustomizations the
#      cluster directory ships.
#   4. Touches /var/lib/catalyst/cloud-init-complete so the catalyst-api
#      provisioner can detect cloud-init has finished.
package_update: true
package_upgrade: false
packages:
  - curl
  - iptables
  - jq
  - ca-certificates
  - git
%{ if enable_fail2ban ~}
  - fail2ban
%{ endif ~}
%{ if enable_unattended_upgrades ~}
  - unattended-upgrades
  - apt-listchanges
%{ endif ~}
write_files:
  - path: /var/lib/catalyst/sovereign.json
    permissions: '0644'
    content: |
      {
        "sovereignFQDN": "${sovereign_fqdn}",
        "sovereignSubdomain": "${sovereign_subdomain}",
        "orgName": ${jsonencode(org_name)},
        "orgEmail": ${jsonencode(org_email)},
        "region": "${region}",
        "haEnabled": ${ha_enabled},
        "workerCount": ${worker_count},
        "k3sVersion": "${k3s_version}",
        "gitopsRepoUrl": "${gitops_repo_url}",
        "gitopsBranch": "${gitops_branch}"
      }
  # ── OS hardening: SSH daemon ──────────────────────────────────────────
  # Drop-in overrides /etc/ssh/sshd_config defaults. Per Catalyst's threat
  # model the operator's only valid path in is the Hetzner-project SSH key
  # injected via cloud-init authorized_keys. Password auth, KbdInteractive,
  # and root password login are all off.
  - path: /etc/ssh/sshd_config.d/99-catalyst-hardening.conf
    permissions: '0644'
    content: |
      # Managed by Catalyst Sovereign cloud-init — do not edit by hand.
      PasswordAuthentication no
      KbdInteractiveAuthentication no
      ChallengeResponseAuthentication no
      PermitRootLogin prohibit-password
      PermitEmptyPasswords no
      UsePAM yes
      X11Forwarding no
      AllowAgentForwarding no
      AllowTcpForwarding no
      ClientAliveInterval 300
      ClientAliveCountMax 2
      MaxAuthTries 3
      LoginGraceTime 30
%{ if enable_unattended_upgrades ~}
  # ── Unattended security upgrades ──────────────────────────────────────
  # Ubuntu's stock unattended-upgrades, restricted to the security pocket.
  # Runs daily, reboots automatically at 02:30 if a kernel upgrade requires
  # it (k3s tolerates single-node restarts on a solo Sovereign within the
  # ~60s window the Hetzner LB health-check covers).
  - path: /etc/apt/apt.conf.d/20auto-upgrades
    permissions: '0644'
    content: |
      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";
      APT::Periodic::AutocleanInterval "7";
  - path: /etc/apt/apt.conf.d/52unattended-upgrades-catalyst
    permissions: '0644'
    content: |
      Unattended-Upgrade::Allowed-Origins {
          "$${distro_id}:$${distro_codename}-security";
          "$${distro_id}ESMApps:$${distro_codename}-apps-security";
          "$${distro_id}ESM:$${distro_codename}-infra-security";
      };
      Unattended-Upgrade::Automatic-Reboot "true";
      Unattended-Upgrade::Automatic-Reboot-Time "02:30";
      Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
      Unattended-Upgrade::Remove-Unused-Dependencies "true";
%{ endif ~}
%{ if enable_fail2ban ~}
  # ── fail2ban: sshd jail ───────────────────────────────────────────────
  # Even though SSH is firewalled to ssh_allowed_cidrs (or fully closed at
  # the firewall), fail2ban remains a defence-in-depth layer for the case
  # where the firewall rule is widened by an operator post-bootstrap.
  - path: /etc/fail2ban/jail.d/catalyst-sshd.local
    permissions: '0644'
    content: |
      [sshd]
      enabled = true
      port = ssh
      filter = sshd
      maxretry = 5
      findtime = 10m
      bantime = 1h
      backend = systemd
%{ endif ~}
  # ── flux-system/ghcr-pull Secret ─────────────────────────────────────
  #
  # Every HelmRepository CR in clusters/${sovereign_fqdn}/bootstrap-kit/
  # references `secretRef: name: ghcr-pull` because the bp-* OCI artifacts
  # at `ghcr.io/openova-io/` are PRIVATE. Without this Secret, the
  # source-controller logs:
  #
  #   failed to get authentication secret 'flux-system/ghcr-pull':
  #   secrets "ghcr-pull" not found
  #
  # …and Phase 1 stalls at bp-cilium. The operator workaround (kubectl
  # apply the Secret by hand after Flux installs) is not durable across
  # re-provisioning of the same Sovereign — every fresh control-plane
  # boots without the Secret.
  #
  # We write the Secret into flux-system at cloud-init time, BEFORE
  # /var/lib/catalyst/flux-bootstrap.yaml is applied, so the GitRepository +
  # Kustomization land into a cluster that already has working GHCR creds.
  # The apply step is in runcmd: below; the manifest itself lives here.
  #
  # Token rotation policy: yearly, stored in 1Password under
  # "Catalyst — GHCR pull token (catalyst-ghcr-pull-token)". See
  # docs/SECRET-ROTATION.md. The token NEVER lives in git.
  - path: /var/lib/catalyst/ghcr-pull-secret.yaml
    permissions: '0600'
    content: |
      apiVersion: v1
      kind: Secret
      metadata:
        name: ghcr-pull
        namespace: flux-system
      type: kubernetes.io/dockerconfigjson
      data:
        .dockerconfigjson: ${base64encode(jsonencode({
          auths = {
            "ghcr.io" = {
              username = ghcr_pull_username
              password = ghcr_pull_token
              auth = ghcr_pull_auth_b64
            }
          }
        }))}
  # Flux GitRepository + Kustomization that take over after k3s is up.
  # The clusters/${sovereign_fqdn}/ directory in the public OpenOva monorepo
  # contains a Kustomization tree that installs the 11-component bootstrap
  # kit + bp-catalyst-platform umbrella in dependency order.
  - path: /var/lib/catalyst/flux-bootstrap.yaml
    permissions: '0644'
    content: |
      apiVersion: source.toolkit.fluxcd.io/v1
      kind: GitRepository
      metadata:
        name: openova
        namespace: flux-system
      spec:
        interval: 1m
        url: ${gitops_repo_url}
        ref:
          branch: ${gitops_branch}
        ignore: |
          /*
          !/clusters/${sovereign_fqdn}
          !/platform
          !/products
      ---
      # Two Flux Kustomizations with dependsOn so Crossplane CRDs land
      # before any resource that uses them is dry-run-applied.
      #
      # bootstrap-kit installs the 11 HelmReleases (Cilium, cert-manager,
      # Flux, Crossplane core, sealed-secrets, SPIRE, NATS-JetStream,
      # OpenBao, Keycloak, Gitea, bp-catalyst-platform). bp-crossplane
      # registers the Crossplane core CRDs (Provider, ProviderConfig…)
      # AND the bp-catalyst-platform umbrella reconciles the rest.
      #
      # infrastructure-config applies the cluster's Provider package +
      # ProviderConfig + Compositions. Because it dependsOn bootstrap-kit
      # AND uses wait: true, Flux waits until bootstrap-kit's HelmReleases
      # are Ready (Crossplane core + provider-hcloud installed,
      # hcloud.crossplane.io/v1beta1 CRDs registered) before dry-running
      # ProviderConfig — which is the exact ordering the prior single-
      # Kustomization model tripped over with:
      #   no matches for kind "ProviderConfig" in version
      #   "hcloud.crossplane.io/v1beta1"
      apiVersion: kustomize.toolkit.fluxcd.io/v1
      kind: Kustomization
      metadata:
        name: bootstrap-kit
        namespace: flux-system
      spec:
        interval: 5m
        path: ./clusters/${sovereign_fqdn}/bootstrap-kit
        prune: true
        sourceRef:
          kind: GitRepository
          name: openova
        wait: true
        timeout: 30m
      ---
      apiVersion: kustomize.toolkit.fluxcd.io/v1
      kind: Kustomization
      metadata:
        name: infrastructure-config
        namespace: flux-system
      spec:
        interval: 5m
        path: ./clusters/${sovereign_fqdn}/infrastructure
        prune: true
        sourceRef:
          kind: GitRepository
          name: openova
        dependsOn:
          - name: bootstrap-kit
        wait: true
        timeout: 30m
runcmd:
- swapoff -a
- sed -i '/swap/d' /etc/fstab
- update-alternatives --set iptables /usr/sbin/iptables-legacy || true
- update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy || true
# Activate hardened sshd config (cloud-init may have written authorized_keys
# already from Hetzner ssh_keys[]; we never touch that file).
- systemctl reload ssh || systemctl reload sshd || true
%{ if enable_fail2ban ~}
- systemctl enable --now fail2ban
%{ endif ~}
%{ if enable_unattended_upgrades ~}
- systemctl enable --now unattended-upgrades
%{ endif ~}
# k3s control-plane. Flags per docs/SOVEREIGN-PROVISIONING.md §3 and
# docs/PLATFORM-TECH-STACK.md §8.1:
#   --cluster-init               Initialise embedded etcd (HA-ready).
#   --flannel-backend=none       Cilium replaces flannel.
#   --disable=traefik            Cilium Gateway replaces traefik.
#   --disable=servicelb          Hetzner LB handles ingress.
#   --disable=local-storage      Crossplane-provisioned hcloud-csi instead.
#   --disable-network-policy     Cilium handles NetworkPolicy.
#   --tls-san=${sovereign_fqdn}  API server cert valid for the sovereign FQDN.
- 'curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=${k3s_version} K3S_TOKEN=${k3s_token} INSTALL_K3S_EXEC="server --cluster-init --flannel-backend=none --disable-network-policy --disable=traefik --disable=servicelb --disable=local-storage --tls-san=${sovereign_fqdn} --node-label catalyst.openova.io/role=control-plane --write-kubeconfig-mode=0644" sh -'
# Wait for the API server to be reachable. Cilium hasn't been installed yet,
# so nodes won't report Ready; wait specifically on the API /healthz endpoint.
- 'until kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml get --raw /healthz; do sleep 5; done'
%{ if deployment_id != "" && kubeconfig_bearer_token != "" && catalyst_api_url != "" ~}
# ── Cloud-init kubeconfig postback (issue #183, Option D) ───────────────
#
# The k3s install above wrote /etc/rancher/k3s/k3s.yaml with the API
# server URL pinned to https://127.0.0.1:6443 — kubectl's default for a
# local single-node install. catalyst-api lives off-cluster (Catalyst-Zero
# franchise console on contabo-mkt) and cannot reach 127.0.0.1 on this
# node, so we MUST rewrite that field before sending the kubeconfig
# back. The Hetzner load balancer at $${load_balancer_ipv4} forwards
# 6443 to the control plane's 6443 (firewall rule above), so a kubeconfig
# pointing at the LB's public IPv4 is reachable from anywhere.
#
# Plaintext: we read from /etc/rancher/k3s/k3s.yaml (mode 0644 written
# by k3s), apply the rewrite via sed, write the result to
# /etc/rancher/k3s/k3s.yaml.public (mode 0600 explicitly), then
# curl --data-binary the file content to catalyst-api with the bearer
# token. The .public file is removed at the end of the runcmd block
# so the bearer-protected kubeconfig only lives on this node for the
# few seconds it takes to PUT.
#
# --retry 60 --retry-delay 10 --retry-all-errors handles the case
# where catalyst-api is briefly unreachable (image roll, ingress
# reconciliation) — the cloud-init runcmd budget is bounded by the
# systemd cloud-final timeout (~30 minutes).
- install -m 0600 /dev/null /etc/rancher/k3s/k3s.yaml.public
- sed 's|https://127.0.0.1:6443|https://${load_balancer_ipv4}:6443|g' /etc/rancher/k3s/k3s.yaml > /etc/rancher/k3s/k3s.yaml.public
- chmod 0600 /etc/rancher/k3s/k3s.yaml.public
- |
  curl -fsSL --retry 60 --retry-delay 10 --retry-all-errors \
    -X PUT \
    -H "Authorization: Bearer ${kubeconfig_bearer_token}" \
    -H "Content-Type: application/x-yaml" \
    --data-binary @/etc/rancher/k3s/k3s.yaml.public \
    ${catalyst_api_url}/api/v1/deployments/${deployment_id}/kubeconfig
- rm -f /etc/rancher/k3s/k3s.yaml.public
%{ endif ~}
# ── Cilium FIRST (before Flux) ───────────────────────────────────────────
#
# k3s started with --flannel-backend=none, so the cluster has NO CNI yet.
# If we apply Flux install.yaml at this point, the Flux controller pods
# stay Pending forever — kubelet rejects them with
# "container runtime network not ready: cni plugin not initialized"
# Flux is then unable to reconcile bp-cilium, so Cilium is never
# installed → bootstrap deadlock that we hit in production at
# omantel.omani.works deployment 5cd1bceaaacb71f6 (25 min stuck Pending).
#
# Bootstrap chicken-and-egg: Cilium IS the install unit (bp-cilium), but
# Flux needs a CNI to run, and Cilium IS the CNI. Resolution: install
# Cilium ONCE here via Helm with the same chart + values bp-cilium would
# apply later. When Flux reconciles bp-cilium, it adopts the existing
# release (Helm release-name match), so there is no churn.
#
# Per INVIOLABLE-PRINCIPLES.md #3 the GitOps engine is Flux — this Helm
# install is the one-shot bootstrap exception explicitly authorised by
# the same principle's "everything ELSE" qualifier. The chart version
# matches platform/cilium/blueprint.yaml's chartVersion to keep the
# bootstrap install and the reconciled HelmRelease byte-identical.
- 'curl -sSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash'
- 'helm repo add cilium https://helm.cilium.io/'
- 'helm repo update'
- |
  KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm install cilium cilium/cilium \
    --version 1.16.5 \
    --namespace kube-system \
    --set kubeProxyReplacement=true \
    --set k8sServiceHost=127.0.0.1 \
    --set k8sServicePort=6443 \
    --set ipam.mode=kubernetes \
    --set tunnelProtocol=vxlan \
    --set bpf.masquerade=true
- 'kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml -n kube-system rollout status ds/cilium --timeout=240s'
# Install Flux core. Cilium is now the cluster's CNI, so Flux pods will
# actually start. Flux then reconciles clusters/${sovereign_fqdn}/ which
# adopts the Helm release above as bp-cilium and continues with
# bp-cert-manager, bp-flux (host-level Flux, distinct from this Flux
# which is the CONTROL-PLANE Flux), bp-crossplane, etc.
- 'curl -fsSL https://github.com/fluxcd/flux2/releases/download/v2.4.0/install.yaml | kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml apply -f -'
- 'kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml -n flux-system wait --for=condition=Available --timeout=300s deployment --all'
# ── flux-system/ghcr-pull Secret (applied BEFORE GitRepository) ──────
#
# Apply the docker-registry pull secret rendered above. This MUST land
# before the GitRepository + Kustomization in flux-bootstrap.yaml,
# because the bootstrap-kit Kustomization includes HelmRepository CRs
# that reference this Secret by name; the source-controller resolves
# them on its first reconciliation tick and a missing Secret propagates
# as a Ready=False/AuthError state that has been observed to persist
# for 5+ minutes even after the Secret is later applied.
#
# Idempotent: `kubectl apply` against an existing Secret is a no-op
# when the manifest's bytes match. A reprovision (same Sovereign FQDN)
# rewrites this with the same content; a token rotation propagates
# through here on the next cloud-init render.
- 'kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml apply -f /var/lib/catalyst/ghcr-pull-secret.yaml'
# Apply the Flux bootstrap GitRepository + Kustomization. From here, Flux
# owns the cluster: pulls clusters/${sovereign_fqdn}/, installs Cilium
# via bp-cilium, cert-manager via bp-cert-manager, etc., then bp-catalyst-platform.
- 'kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml apply -f /var/lib/catalyst/flux-bootstrap.yaml'
# Marker for the catalyst-api provisioner to detect cloud-init is done.
- mkdir -p /var/lib/catalyst
- touch /var/lib/catalyst/cloud-init-complete
final_message: "Catalyst control-plane bootstrap complete after $UPTIME seconds — Flux is now reconciling clusters/${sovereign_fqdn}/"