fix(api): cloud-init kubeconfig postback must live outside RequireSession (#637)

* fix(infra): break tofu cycle; resolve CP public IP at boot via metadata service

  PR #546 (Closes #542) introduced a dependency cycle:

      hcloud_server.control_plane.user_data → local.control_plane_cloud_init
      local.control_plane_cloud_init → hcloud_server.control_plane[0].ipv4_address

  `tofu plan` failed with:

      Error: Cycle: local.control_plane_cloud_init (expand), hcloud_server.control_plane

  Caught live during otech23 first-end-to-end provisioning attempt.

  Fix: stop templating `control_plane_ipv4` at plan time. cloud-init runs ON
  the CP node, so it resolves its own public IPv4 at boot via Hetzner's
  metadata service:

      curl http://169.254.169.254/hetzner/v1/metadata/public-ipv4

  Same observable behavior as #546 (kubeconfig server: rewritten to CP public
  IP, not LB IP; preserves the wizard-jobs-page-not-stuck-PENDING fix), with
  no graph cycle.

  Co-authored-by: hatiyildiz <hatiyildiz@openova.io>

* fix(infra+api): wire handover_jwt_public_key end-to-end

  The OpenTofu cloud-init template references ${handover_jwt_public_key}
  (infra/hetzner/cloudinit-control-plane.tftpl:371) and variables.tf declares
  the variable, but neither side wires it:

  - main.tf templatefile() call did not pass the key → "vars map does not
    contain key handover_jwt_public_key" on tofu plan
  - provisioner.writeTfvars never set the var → empty even when wired

  Caught live during otech23 provisioning, immediately after the tofu-cycle
  fix landed. tofu plan failed with:

      Error: Invalid function argument
        on main.tf line 170, in locals:
       170:   control_plane_cloud_init = replace(templatefile(...
      Invalid value for "vars" parameter: vars map does not contain key
      "handover_jwt_public_key", referenced at
      ./cloudinit-control-plane.tftpl:371,9-32.

  Fix:

  - main.tf templatefile() now passes
    handover_jwt_public_key = var.handover_jwt_public_key
  - provisioner.Request gains a HandoverJWTPublicKey field (json:"-",
    server-stamped, never accepted from client JSON)
  - handler.CreateDeployment stamps it from h.handoverSigner.PublicJWK()
    when the signer is configured (CATALYST_HANDOVER_KEY_PATH set)
  - writeTfvars emits the value into tofu.auto.tfvars.json

  variables.tf default "" preserves the no-signer path: cloud-init writes an
  empty handover-jwt-public.jwk and the new Sovereign is provisioned without
  the handover-validation surface (handover flow simply not wired on that
  Sovereign; degraded gracefully, not a hard failure).

  Co-authored-by: hatiyildiz <hatiyildiz@openova.io>

* fix(api): cloud-init kubeconfig postback must live outside RequireSession

  The PUT /api/v1/deployments/{id}/kubeconfig route was registered inside the
  RequireSession-gated chi.Group, so every cloud-init postback was rejected
  with HTTP 401 {"error":"unauthenticated"} before PutKubeconfig could run.
  Cloud-init has no browser session cookie; it authenticates with the
  SHA-256-hashed bearer token PutKubeconfig already verifies internally.

  Result on otech23: Phase 0 finished (Hetzner CP + LB up), but every
  cloud-init `curl --retry 60 -X PUT ... /kubeconfig` returned 401 unauth.
  catalyst-api never received the kubeconfig, Phase 1 helmwatch never started,
  the wizard's Jobs page stayed in PENDING forever.

  Fix: register the PUT outside the auth group so cloud-init's bearer-hash
  auth path is the only gate. The matching GET stays inside session auth: the
  operator's "Download kubeconfig" button needs the session cookie.

  Caught live during otech23 first end-to-end provisioning. Per the new
  "punish-back-to-zero" rule, otech23 was wiped (Hetzner + PDM + PowerDNS +
  on-disk state) and the next provision will use otech24.

  Co-authored-by: hatiyildiz <hatiyildiz@openova.io>

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
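
The bearer-hash gate that the last fix relies on can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual PutKubeconfig code, which this commit does not show: `hashToken` and `verifyBearer` are hypothetical helper names, and the hex encoding of the stored digest is an assumption.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashToken returns the hex-encoded SHA-256 digest of a postback token.
// Assumption: only this digest is persisted server-side, while the raw
// token is handed to cloud-init.
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// verifyBearer checks an Authorization header value against a stored digest.
// Hypothetical helper for illustration only.
func verifyBearer(authHeader, storedHash string) bool {
	const prefix = "Bearer "
	if !strings.HasPrefix(authHeader, prefix) {
		return false
	}
	token := strings.TrimSpace(strings.TrimPrefix(authHeader, prefix))
	if token == "" {
		return false
	}
	got := hashToken(token)
	// Constant-time compare avoids leaking how many leading bytes matched.
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	stored := hashToken("example-postback-token")
	fmt.Println(verifyBearer("Bearer example-postback-token", stored)) // true
	fmt.Println(verifyBearer("Bearer wrong-token", stored))            // false
	fmt.Println(verifyBearer("", stored))                              // false
}
```

Because a check of this kind needs nothing but the request header and the persisted digest, it can serve as the only gate on the route, which is what allows the PUT to sit outside the session-cookie middleware.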
parent 12233290d1
commit 9402970da2
@@ -194,6 +194,16 @@ func main() {
 })
 r.Delete("/api/v1/auth/session", h.HandleAuthLogout)

+// Unauthenticated cloud-init postback (issue #183, Option D + #634).
+// The new Sovereign's control plane PUTs its rewritten kubeconfig
+// here with `Authorization: Bearer <postback-token>`. PutKubeconfig
+// has its own SHA-256-hash-vs-stored-hash compare — it MUST live
+// outside the session-cookie middleware because cloud-init has no
+// browser cookies. Putting this inside the RequireSession group
+// rejected every postback with 401 {"error":"unauthenticated"} and
+// stuck Phase-1 in PENDING forever (caught live on otech23).
+r.Put("/api/v1/deployments/{id}/kubeconfig", h.PutKubeconfig)
+
 // Auth-gated wizard endpoints — RequireSession validates the
 // HMAC-signed catalyst_session cookie on every request. When
 // cfg is nil (Sovereign clusters, CI without CATALYST_KC_ADDR)
@@ -242,13 +252,8 @@ func main() {
 // catalyst-api Pod cold-starts mid-Phase-1 and has to reattach
 // to a deployment whose kubeconfig is on the PVC.
 rg.Get("/api/v1/deployments/{id}/kubeconfig", h.GetKubeconfig)
-// PUT — cloud-init postback (issue #183, Option D). The new
-// Sovereign's control plane PUTs its rewritten kubeconfig here
-// with an Authorization: Bearer header. The handler verifies
-// SHA-256 of the bearer against the persisted hash, writes the
-// kubeconfig file to the PVC at mode 0600, and triggers the
-// Phase-1 helmwatch goroutine.
-rg.Put("/api/v1/deployments/{id}/kubeconfig", h.PutKubeconfig)
+// (PUT /kubeconfig is registered ABOVE the session group — see
+// the cloud-init postback comment near r.Delete /auth/session.)
 // Registrar proxy — wizard's BYO Flow B (#169). /validate is called
 // pre-submit so a typo'd token surfaces at the prompt; /set-ns is
 // called from CreateDeployment when domainMode == byo-api.