fix(tofu): pass sovereign_fqdn_slug into secondary regions templatefile (#1511)

* fix(clustermesh): default clustermesh-apiserver to LoadBalancer (DoD A3)

DoD A3 from docs/SOVEREIGN-MULTI-REGION-DOD.md: Cilium ClusterMesh
apiserver Service MUST be LoadBalancer (NEVER NodePort).

Before this change, bootstrap-kit/01-cilium.yaml defaulted to
${CLUSTERMESH_SERVICE_TYPE:=NodePort}, so every multi-region Sovereign
landed with clustermesh-apiserver as NodePort. That violates A3 and
breaks AutoEstablishClusterMesh (handler/clustermesh.go, PR #1508),
which hard-fails on Service.type != LoadBalancer.

Caught on prov t112.omani.works (f2e7f02e6ffb6a18, 2026-05-15):
- 3-region cpx52 cluster (hel1+nbg1+sin) converged with HRs Ready=True
- clustermesh-apiserver Service = NodePort on all 3 regions
- cilium-clustermesh peer Secret empty (0 peers): the orchestrator
  never wrote any peers because of the type check
- D10 + D12 both failed silently

The fix flips the chart default to LoadBalancer and threads Hetzner CCM
LB annotations (location, type, name) from the bootstrap-kit
substitute env. The provisioner now emits CLUSTERMESH_SERVICE_TYPE +
HCLOUD_LB_LOCATION + SOVEREIGN_FQDN_SLUG into the cloud-init
postBuild substitute block alongside the existing CLUSTER_MESH_NAME
+ CLUSTER_MESH_ID.

Operator escape hatch preserved: bare-metal / non-cloud Sovereigns
override CLUSTERMESH_SERVICE_TYPE=NodePort in their per-Sovereign
bootstrap-kit overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tofu): pass sovereign_fqdn_slug into secondary regions templatefile

PR #1509 added a ${sovereign_fqdn_slug} reference to cloudinit-control-plane.tftpl
(for the Hetzner CCM LB name annotation on clustermesh-apiserver) and wired
it into the PRIMARY templatefile() invocation in main.tf, but missed the
SECONDARY-regions templatefile() at line ~990. As a result, every
multi-region prov now fails at `tofu plan`:

  Invalid value for "vars" parameter: vars map does not contain key
  "sovereign_fqdn_slug", referenced at ./cloudinit-control-plane.tftpl:991,37-56.

Caught on prov t113.omani.works (82c3587b97156a08, 2026-05-15), the first
multi-region prov against #1509's chart fix. Phase-0 failed at plan
before any servers spun up.

Fix is trivial: thread the same replace(var.sovereign_fqdn, ".", "-")
through the for_each secondary block.
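
A minimal sketch of one way to keep the two call sites from drifting
again, assuming nothing about the real main.tf beyond what is quoted
above. The names common_template_vars, primary_vars and secondary_vars
are hypothetical, and the region keys are only borrowed from the t112
run for illustration. The slug is computed once and merged into every
per-region vars map, so a key added for the primary is automatically
present for the secondaries:

```hcl
variable "sovereign_fqdn" {
  type    = string
  default = "t113.omani.works" # example value taken from the failing prov
}

locals {
  # Computed once; "." is a literal substring here, not a regex.
  sovereign_fqdn_slug = replace(var.sovereign_fqdn, ".", "-")

  # Hypothetical shared map: every key the template references lives here.
  common_template_vars = {
    sovereign_fqdn      = var.sovereign_fqdn
    sovereign_fqdn_slug = local.sovereign_fqdn_slug
  }

  # Hypothetical per-region maps; each would be passed as the vars
  # argument of templatefile() for that region.
  primary_vars = merge(local.common_template_vars, { region = "hel1" })
  secondary_vars = {
    for k in ["nbg1", "sin"] :
    k => merge(local.common_template_vars, { region = k })
  }
}

output "secondary_vars" {
  value = local.secondary_vars
}
```

With the default above, local.sovereign_fqdn_slug evaluates to
"t113-omani-works". This commit keeps the existing structure and only
adds the missing key; the shared-map layout is an optional follow-up,
not part of this change.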

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3mrah 2026-05-16 00:00:19 +04:00 committed by GitHub
parent 2585b439d4
commit 0c9e391d59


@@ -988,6 +988,7 @@ locals {
 cluster_cidr = local.region_cluster_cidr[k]
 service_cidr = local.region_service_cidr[k]
 sovereign_fqdn = var.sovereign_fqdn
+sovereign_fqdn_slug = replace(var.sovereign_fqdn, ".", "-")
 sovereign_subdomain = var.sovereign_subdomain
 # OpenovaFlow integration (Agent #3). The secondary CP's region
 # key is each.key from the secondary_regions for_each (e.g. "hel1"