openova/infra/hetzner/variables.tf
e3mrah fcfed6408c
feat(infra,cilium): wire Cilium ClusterMesh anchors via tofu→cloudinit→envsubst (#1101) (#1226)
* feat(infra,cilium): wire Cilium ClusterMesh anchors via tofu→cloudinit→envsubst (#1101)

Follow-up to #1223. The Flux Kustomization on every Sovereign points
at clusters/_template/bootstrap-kit/ and post-build-substitutes per-
Sovereign vars (SOVEREIGN_FQDN, MARKETPLACE_ENABLED, ...). The
per-Sovereign overlay file at clusters/<sov>/bootstrap-kit/01-cilium.yaml
that #1223 added is therefore dead code (Flux doesn't read that
path). The canonical mechanism is to extend the template with
envsubst placeholders + thread the values through tofu vars.

Wires five layers end-to-end (layer 4's peer-name/id derivation is sketched below the list):

1. clusters/_template/bootstrap-kit/01-cilium.yaml — adds
   `cluster.name: ${CLUSTER_MESH_NAME:=}` and
   `cluster.id: ${CLUSTER_MESH_ID:=0}` plus
   `clustermesh.useAPIServer: true` + NodePort 32379. Empty defaults
   = single-cluster Sovereign (no peer connects); the cilium subchart
   accepts empty cluster.name when id=0.

2. infra/hetzner/cloudinit-control-plane.tftpl — adds
   CLUSTER_MESH_NAME / CLUSTER_MESH_ID to the bootstrap-kit
   Kustomization's postBuild.substitute block (alongside
   SOVEREIGN_FQDN, MARKETPLACE_ENABLED, PARENT_DOMAINS_YAML).

3. infra/hetzner/variables.tf — declares cluster_mesh_name (string,
   default "") and cluster_mesh_id (number, default 0, validated 0-255).

4. infra/hetzner/main.tf — primary cloud-init passes
   var.cluster_mesh_{name,id} verbatim. Secondary regions (when
   var.regions[i>0] is non-empty per slice G3) auto-derive each
   peer's name as `<sovereign-stem>-<region-code-no-digits>` and
   increment id from var.cluster_mesh_id+1. Per-region override via
   the new RegionSpec.ClusterMeshName field.

5. products/catalyst/bootstrap/api/internal/provisioner/provisioner.go
   — adds ClusterMeshName + ClusterMeshID to Request and threads them
   into writeTfvars(); RegionSpec gains ClusterMeshName for per-peer
   override.
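
Layer 4's derivation, sketched (the `sovereign_stem` local and the
`secondary_mesh_peers` name are illustrative, not necessarily what
main.tf calls them):

    locals {
      secondary_mesh_peers = [
        for i, r in slice(var.regions, 1, length(var.regions)) : {
          # <sovereign-stem>-<region-code-no-digits>: fsn1 -> fsn
          name = "${local.sovereign_stem}-${replace(r.cloudRegion, "/[0-9]/", "")}"
          # ids increment from the primary's cluster_mesh_id
          id = var.cluster_mesh_id + 1 + i
        }
      ]
    }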

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), the chart-side
default is intentionally empty — operator request OR per-Sovereign
overlay must supply the values when ClusterMesh is enabled. The
allocation registry lives at docs/CLUSTERMESH-CLUSTER-IDS.md
(introduced in #1223).

Refs: #1101 (EPIC-6), qa-loop iter-6 fix-33 follow-up to #1223

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(infra): escape $ in tftpl comments referencing envsubst placeholders

`tofu validate` reads `${CLUSTER_MESH_NAME}` inside YAML comments as a
template variable reference; the comment was meant to refer to the Flux
envsubst placeholder consumed downstream by the bootstrap-kit cilium
HelmRelease. Escaped both refs with `$$` per Terraform's templatefile
escape syntax so the comment renders verbatim.
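
Sketched: a tftpl comment line written as

    # bootstrap-kit later substitutes $${CLUSTER_MESH_NAME}

renders in the emitted cloud-init as

    # bootstrap-kit later substitutes ${CLUSTER_MESH_NAME}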

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(infra): replace coalesce with conditional in secondary_region_cluster_mesh_name

coalesce errors when every arg is empty (the not-in-mesh path). Switch
to a conditional that yields '' when both the per-region override AND
var.cluster_mesh_name are empty.
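
The shape of the swap, with `override` standing in for the per-region
value:

    # coalesce(override, var.cluster_mesh_name) errors when both are ""
    name = override != "" ? override : var.cluster_mesh_name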

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:19:53 +04:00

# All wizard inputs, as OpenTofu variables. The catalyst-api provisioner
# package writes these as tofu.auto.tfvars.json before running tofu apply.
#
# Per docs/INVIOLABLE-PRINCIPLES.md principle #4: nothing is hardcoded. Every
# value the wizard captures or the operator chose at provisioning time is a
# variable here. Defaults below describe the COMMON case (solo Sovereign on
# Hetzner) — see infra/hetzner/README.md for the rationale behind each default.
# ── Identity ──────────────────────────────────────────────────────────────
variable "sovereign_fqdn" {
type = string
description = "Fully-qualified domain for this Sovereign — e.g. omantel.omani.works"
validation {
condition = can(regex("^[a-z][a-z0-9-]*(\\.[a-z][a-z0-9-]*)+$", var.sovereign_fqdn))
error_message = "Sovereign FQDN must be a valid lowercase domain (RFC 1035)."
}
}
variable "sovereign_subdomain" {
type = string
description = "Subdomain portion when domain_mode=pool e.g. 'omantel' for omantel.omani.works. Empty when BYO."
default = ""
}
variable "marketplace_enabled" {
type = string
description = "When 'true', bp-catalyst-platform 1.3.0+ renders the marketplace + tenant-wildcard HTTPRoutes exposing marketplace.<sov> + *.<sov>. Operator opt-in (issue #710). Default 'false' for non-marketplace Sovereigns."
default = "false"
validation {
condition = contains(["true", "false"], var.marketplace_enabled)
error_message = "marketplace_enabled must be the string 'true' or 'false'."
}
}
# ── Multi-domain Sovereign (issue #827, parent epic #825) ─────────────────
#
# The Sovereign supports N parent zones, NOT one. The wizard captures the
# operator's parent-domain list (one for own use, optionally one per SME
# pool, etc.) and serialises it as a YAML inline-array literal. The
# string is interpolated into Flux's postBuild.substitute as
# PARENT_DOMAINS_YAML, then consumed by:
# - bootstrap-kit slot 11 (bp-powerdns) — values.zones
# - bootstrap-kit slot 13 (bp-catalyst-platform) — values.parentZones
# in lockstep so the two slots agree on what the Sovereign considers a
# parent zone.
#
# The default below renders a single-entry array derived from
# sovereign_fqdn so legacy single-zone provisioning paths keep working
# without per-overlay edits. The wizard / catalyst-api populates this
# explicitly when the operator brings 2+ parent zones at signup.
variable "parent_domains_yaml" {
type = string
description = "Parent-domain list for the Sovereign as a YAML inline-array literal. Each entry: {name: <apex>, role: <primary|sme-pool>, ...}. Empty = single-zone fallback derived from sovereign_fqdn."
default = ""
}
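# Illustrative two-zone value (the second zone is a hypothetical example;
# keys follow the entry shape documented above):
#   parent_domains_yaml = "[{name: omani.works, role: primary}, {name: sme-pool.example, role: sme-pool}]"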
# Cilium ClusterMesh per-Sovereign anchors (#1101 EPIC-6 multi-region DR).
# Empty + 0 = not joined to any mesh (single-cluster Sovereign — the chart
# still installs the clustermesh-apiserver Pod but no peer connects). When
# the operator joins this Sovereign to a multi-region mesh (e.g. omantel-fsn
# + omantel-hel), set both to the registered values from
# docs/CLUSTERMESH-CLUSTER-IDS.md. Per docs/INVIOLABLE-PRINCIPLES.md #4
# (never hardcode), the values flow operator-request → catalyst-api
# Request.ClusterMeshName/ClusterMeshID → tofu vars → cloudinit
# postBuild.substitute → bootstrap-kit slot 01-cilium.yaml.
variable "cluster_mesh_name" {
type = string
description = "Cilium ClusterMesh peer name for this Sovereign (e.g. omantel-fsn). Empty = not in a mesh. Convention: <sovereign-stem>-<region-code>. Allocated via docs/CLUSTERMESH-CLUSTER-IDS.md."
default = ""
}
variable "cluster_mesh_id" {
type = number
description = "Cilium ClusterMesh peer id (1-255 unique within a mesh; 0 reserved for not-in-mesh). Allocated via docs/CLUSTERMESH-CLUSTER-IDS.md — every PR adding a new peer MUST claim a row in that registry."
default = 0
validation {
condition = var.cluster_mesh_id >= 0 && var.cluster_mesh_id <= 255
error_message = "cluster_mesh_id must be 0 (not-in-mesh) or 1-255 (peer id)."
}
}
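# Illustrative meshed pair (names follow the convention above; the ids are
# examples, the real ones come from docs/CLUSTERMESH-CLUSTER-IDS.md):
#   omantel-fsn: cluster_mesh_name = "omantel-fsn", cluster_mesh_id = 1
#   omantel-hel: cluster_mesh_name = "omantel-hel", cluster_mesh_id = 2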
variable "org_name" {
type = string
description = "Organisation name for resource labels + initial sovereign-admin Org name"
}
variable "org_email" {
type = string
description = "Initial sovereign-admin email — becomes the first user in Keycloak's catalyst-admin realm"
validation {
condition = can(regex("^[^@]+@[^@]+\\.[^@]+$", var.org_email))
error_message = "Email must be a syntactically valid address."
}
}
# ── Hetzner ───────────────────────────────────────────────────────────────
variable "hcloud_token" {
type = string
description = "Hetzner Cloud API token (read+write). Never logged. Never committed to git."
sensitive = true
}
variable "hcloud_project_id" {
type = string
description = "Hetzner project ID for resource attribution + audit log"
}
variable "region" {
type = string
description = "Hetzner location (region). Runtime parameter — never hardcoded."
validation {
# Authoritative list of Hetzner Cloud locations as of 2026-04-28.
# Update when Hetzner adds a new location AND the operator wants to
# provision there. The local.network_zone lookup in main.tf must be
# updated in the same PR.
condition = contains(["fsn1", "nbg1", "hel1", "ash", "hil"], var.region)
error_message = "Region must be a valid Hetzner location: fsn1 (Falkenstein), nbg1 (Nuremberg), hel1 (Helsinki), ash (Ashburn), hil (Hillsboro)."
}
}
# ── Topology ──────────────────────────────────────────────────────────────
variable "control_plane_size" {
type = string
description = <<-EOT
Hetzner server type for the control plane node.
Default cpx22 (2 vCPU / 4 GB AMD shared): the cost-optimised default
for the Phase-8a CP working set. The control plane carries ONLY
k3s (apiserver/etcd/scheduler/controller-manager) + cilium-operator
+ flux controllers + cert-manager + sealed-secrets. Heavy stack
(bp-keycloak / bp-cnpg / bp-harbor / bp-openbao / bp-grafana)
schedules to workers because the bootstrap-kit explicitly tolerates
away from the CP taint. RAM budget: etcd ~512 MB + control plane
~1.5 GB + cilium/flux/cert-manager/sealed-secrets ~1 GB + OS ~512
MB = ~3.5 GB on CPX22's 4 GB.
Smaller SKUs in the cpx family (cpx21: 3 vCPU / 4 GB / €10.99/mo)
are LISTED in /v1/server_types with EU prices but POST /v1/servers
returns {"error":{"code":"invalid_input","message":"unsupported
location for server type"}} for cpx11/cpx21/cpx31/cpx41 in any of
fsn1/nbg1/hel1 (verified 2026-05-04, see issue #752 + the README
§"Why cpx21/cpx31 are NOT the default" for the curl reproducer).
cpx22 is the smallest orderable AMD shared SKU with 4 GB RAM in
EU DCs.
Operators picking SOLO mode (worker_count=0) should still pick
CPX52 explicitly so all Blueprints can fit on a single node.
Operators picking large/HA topologies still pick larger SKUs
(cax41/ccx33) for dedicated-vCPU control planes.
If a Sovereign experiences CP RAM pressure with this default,
the next step UP is cpx32 (4 vCPU / 8 GB, ~€16.49/mo).
EOT
default = "cpx22"
validation {
# Accepted families per Hetzner Cloud (https://www.hetzner.com/cloud/):
# cx* — shared-vCPU Intel
# cpx* — shared-vCPU AMD (the wizard's recommended CPX22 is here)
# ccx* — dedicated-vCPU Intel
# cax* — Ampere Arm
# Earlier rule omitted the CPX family entirely, which rejected the
# wizard's default selection at plan-time before the operator could
# ever provision.
condition = can(regex("^(cx[0-9]+|cpx[0-9]+|ccx[0-9]+|cax[0-9]+)$", var.control_plane_size))
error_message = "control_plane_size must match Hetzner server-type naming (cxNN | cpxNN | ccxNN | caxNN). Minimum orderable in EU DCs (2026-05): cpx22 (4 GB AMD) for the Phase-8a CP working set; cpx32 (8 GB AMD) when the CP exhibits RAM pressure."
}
}
variable "worker_size" {
type = string
description = <<-EOT
Hetzner server type for worker nodes.
Default cpx32 (4 vCPU / 8 GB AMD shared): the smallest AMD shared
SKU with 8 GB RAM that is orderable for new servers in fsn1/nbg1/
hel1 as of 2026-05-04. RAM is the binding constraint for the
bootstrap-kit's worker pods (cnpg, harbor, keycloak, openbao,
grafana stack); 8 GB per worker is the sweet spot. The smaller
cpx31 (also 4 vCPU / 8 GB at ~€20.49/mo published) is LISTED in
/v1/server_types with EU prices but POST /v1/servers rejects every
cpx11/cpx21/cpx31/cpx41 order in fsn1/nbg1/hel1 with "unsupported
location for server type" (issue #752 — see infra/hetzner/README.md
§"Why cpx21/cpx31 are NOT the default" for the curl reproducer).
Per docs/INVIOLABLE-PRINCIPLES.md #4 every workload pod is
reschedulable across nodes; once worker_count >= 2 the per-host
overhead is amortised across nodes. Solo Sovereigns set
worker_count=0 explicitly and run all workloads on the control
plane; in that mode this variable is unused.
If a worker exhibits CPU pressure under load, scale by adding a
third worker (worker_count=3) before bumping the SKU.
EOT
default = "cpx32"
validation {
# Empty string is valid — solo Sovereigns set worker_count = 0 and
# never read worker_size; the wizard surfaces the empty-SKU state as
# "no workers" in the review screen. Non-empty values must match the
# same Hetzner server-type families control_plane_size accepts.
condition = var.worker_size == "" || can(regex("^(cx[0-9]+|cpx[0-9]+|ccx[0-9]+|cax[0-9]+)$", var.worker_size))
error_message = "worker_size must be empty (solo Sovereign, worker_count=0) or match Hetzner server-type naming (cxNN | cpxNN | ccxNN | caxNN)."
}
}
variable "worker_count" {
type = number
description = <<-EOT
Number of worker nodes joined to the k3s control plane.
Default 2 restores the horizontal-scale agreement (issue #733):
every Sovereign should land with at least 1 CP + 2 workers so the
operator sees a TRULY multi-node cluster from handover. Workloads
requiring `replicas: 2` (catalyst-api, catalyst-ui, marketplace-api)
can spread across nodes; node failure no longer takes the whole
Sovereign down.
0 = single-node solo Sovereign (control plane handles all workloads;
used for dev/POC). Operators opt into solo mode explicitly via the
wizard's worker count picker.
EOT
default = 2
validation {
condition = var.worker_count >= 0 && var.worker_count <= 50
error_message = "Worker count must be between 0 and 50."
}
}
variable "ha_enabled" {
type = bool
description = "When true, provisions 3 control-plane nodes for HA. When false, single control-plane node."
default = false
}
# ── Per-region SKU payload ────────────────────────────────────────────────
#
# The wizard captures sizing per-region (each region has its own provider,
# its own cloud-region, and its own control-plane + worker SKUs). The
# canonical request shape carries one entry per topology slot via this
# variable; the legacy singular control_plane_size / worker_size /
# worker_count above mirror regions[0] for the single-region apply path
# main.tf currently drives.
#
# Multi-region tofu wiring is structurally correct (variables.tf accepts the
# list, the catalyst-api provisioner emits it to tofu.auto.tfvars.json),
# but only regions[0] is end-to-end exercised today against a real Hetzner
# project. The for_each iteration that activates the rest will replace
# main.tf's single-server hcloud_server resources with one per-region
# block — at that point this variable becomes the source of truth and the
# legacy singular fields drop out. The door is open structurally so that
# activation is a follow-up commit, not a redesign.
variable "regions" {
type = list(object({
provider = string
cloudRegion = string
controlPlaneSize = string
workerSize = string
workerCount = number
}))
description = <<-EOT
Per-region SKU payload from the wizard's StepProvider. One entry per
topology slot (plus 1 for AIR-GAP when enabled). SKU strings are the
provider's NATIVE instance-type identifier (cx32, m6i.xlarge,
Standard_D4s_v5, ...) passed verbatim to that provider's API.
When empty, main.tf falls back to the singular control_plane_size /
worker_size / worker_count variables (the back-compat path used by
handler/load_test.go and any pre-rework wizard payload).
EOT
default = []
validation {
condition = alltrue([
for r in var.regions :
contains(["hetzner", "huawei", "oci", "aws", "azure"], r.provider)
])
error_message = "Each regions[].provider must be one of: hetzner, huawei, oci, aws, azure."
}
}
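# Illustrative single-entry payload mirroring the singular defaults above:
#   regions = [{
#     provider         = "hetzner"
#     cloudRegion      = "fsn1"
#     controlPlaneSize = "cpx22"
#     workerSize       = "cpx32"
#     workerCount      = 2
#   }]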
# ── k3s ───────────────────────────────────────────────────────────────────
variable "k3s_version" {
type = string
description = <<-EOT
k3s release pinned for both control-plane and workers. Must match the
INSTALL_K3S_VERSION format (e.g. v1.31.4+k3s1). Pinned so a Sovereign
provisioned today and one provisioned next month land on the same
Kubernetes minor required for blueprint compatibility guarantees
documented in docs/PLATFORM-TECH-STACK.md §8.1.
EOT
default = "v1.31.4+k3s1"
validation {
condition = can(regex("^v[0-9]+\\.[0-9]+\\.[0-9]+\\+k3s[0-9]+$", var.k3s_version))
error_message = "k3s_version must match the INSTALL_K3S_VERSION format vMAJOR.MINOR.PATCH+k3sN (e.g. v1.31.4+k3s1)."
}
}
# ── SSH ───────────────────────────────────────────────────────────────────
variable "ssh_public_key" {
type = string
description = <<-EOT
Public SSH key (OpenSSH format) attached to all servers for
sovereign-admin break-glass access.
The key MUST come from the operator's Hetzner project / SSO-linked
identity, never auto-generated by this module. See
infra/hetzner/README.md §"SSH key management" for why ephemeral keys
are rejected (break-glass + audit-trail requirements).
EOT
validation {
condition = can(regex("^(ssh-rsa|ssh-ed25519|ecdsa-sha2-nistp256) ", var.ssh_public_key))
error_message = "SSH public key must be in OpenSSH format starting with ssh-rsa, ssh-ed25519, or ecdsa-sha2-nistp256."
}
}
# ── DNS ───────────────────────────────────────────────────────────────────
variable "domain_mode" {
type = string
description = "How DNS is managed: 'pool' (Catalyst writes records via Dynadot), 'byo' (customer manages own DNS)"
default = "pool"
validation {
condition = contains(["pool", "byo"], var.domain_mode)
error_message = "Domain mode must be 'pool' or 'byo'."
}
}
variable "pool_domain" {
type = string
description = "Pool domain when domain_mode=pool e.g. 'omani.works'"
default = ""
}
variable "dynadot_key" {
type = string
description = "Dynadot API key (required when domain_mode=pool)"
default = ""
sensitive = true
}
variable "dynadot_secret" {
type = string
description = "Dynadot API secret (required when domain_mode=pool)"
default = ""
sensitive = true
}
variable "dynadot_managed_domains" {
type = string
description = "Comma-separated list of pool domains the Dynadot webhook is permitted to mutate. Defaults to the parent zone of sovereign_fqdn when blank (e.g. 'omani.works' for 'console.otech22.omani.works')."
default = ""
}
variable "powerdns_api_key" {
type = string
description = "Contabo PowerDNS API key. Interpolated by cloudinit-control-plane.tftpl into the Sovereign's cert-manager/powerdns-api-credentials Secret so bp-cert-manager-powerdns-webhook can write DNS-01 challenge TXT records to contabo's authoritative omani.works zone (PR #681 followup). Required when domain_mode=pool."
default = ""
sensitive = true
}
# ── GHCR pull token ───────────────────────────────────────────────────────
#
# Long-lived GHCR token (GitHub PAT or fine-grained token, scope
# `packages:read` on `openova-io`) that the new Sovereign's Flux
# source-controller uses to pull the private bp-* OCI artifacts from
# `ghcr.io/openova-io/`. Cloud-init writes this into the
# flux-system/ghcr-pull Secret on the freshly-installed k3s control
# plane BEFORE applying the GitRepository + Kustomization that wires up
# clusters/<sovereign-fqdn>/.
#
# Without this, every HelmRepository CR in
# clusters/<sovereign-fqdn>/bootstrap-kit/ (each carrying
# `secretRef: name: ghcr-pull`) errors with:
# failed to get authentication secret 'flux-system/ghcr-pull':
# secrets "ghcr-pull" not found
# Phase 1 stalls at bp-cilium and the bootstrap kit never lands. The
# operator-applied workaround (kubectl apply the secret by hand) is not
# durable across reprovisioning of the same Sovereign.
#
# Source: catalyst-api Pod mounts this from the
# `catalyst-ghcr-pull-token` Kubernetes Secret in the catalyst namespace
# as the env var CATALYST_GHCR_PULL_TOKEN. Rotation policy + storage:
# docs/SECRET-ROTATION.md.
variable "ghcr_pull_token" {
type = string
description = <<-EOT
GHCR pull token (GitHub PAT or fine-grained token, scope `packages:read`
on openova-io). Written to flux-system/ghcr-pull at cloud-init time so
Flux source-controller can pull private bp-* OCI artifacts.
Empty default exists so the OpenTofu module renders for BYO
catalyst-api Pods that have not yet adopted the
`catalyst-ghcr-pull-token` Secret; provisioner.Validate() in
products/catalyst/bootstrap/api/internal/provisioner enforces
non-empty for managed-pool deployments where Phase 1 absolutely
needs the token. Sensitive: never logged, never committed to git.
Rotation policy: yearly, stored in 1Password; see
docs/SECRET-ROTATION.md.
EOT
sensitive = true
default = ""
}
# ── Cloud-init kubeconfig postback (issue #183, Option D) ────────────────
variable "deployment_id" {
type = string
description = <<-EOT
catalyst-api's per-deployment 16-char hex identifier. Templated
into the new Sovereign's cloud-init runcmd so the new control
plane PUTs its rewritten kubeconfig to the correct deployment
record:
PUT $${var.catalyst_api_url}/api/v1/deployments/$${var.deployment_id}/kubeconfig
Empty when the catalyst-api caller is using the legacy
out-of-band kubeconfig fetch path; cloud-init then skips the PUT
runcmd entirely.
EOT
default = ""
}
variable "kubeconfig_bearer_token" {
type = string
description = <<-EOT
32-byte cryptographic-random bearer token the new Sovereign's
cloud-init attaches as `Authorization: Bearer <token>` when
PUTting back its kubeconfig (issue #183, Option D). Consumed
once. The catalyst-api persists ONLY the SHA-256 hash on the
deployment record; the plaintext lives in this tfvars file
(file mode 0600 on the catalyst-api PVC) until `tofu destroy`
removes the workdir.
Empty when deployment_id is empty (legacy out-of-band fetch
path); cloud-init then skips the PUT runcmd. Sensitive: never
logged by OpenTofu, never committed to git.
EOT
sensitive = true
default = ""
}
variable "catalyst_api_url" {
type = string
description = <<-EOT
Public origin the new Sovereign's cloud-init PUTs its kubeconfig
back to. The full URL is
$${var.catalyst_api_url}/api/v1/deployments/$${var.deployment_id}/kubeconfig
Defaults to the OpenOva-hosted franchise console; air-gapped
franchises override this with their own catalyst-api ingress
via the CATALYST_API_PUBLIC_URL env var on the catalyst-api
Pod. Per docs/INVIOLABLE-PRINCIPLES.md #4 this is runtime
configuration, not code.
EOT
default = "https://console.openova.io/sovereign"
}
# ── GitOps source for Flux bootstrap ──────────────────────────────────────
variable "gitops_repo_url" {
type = string
description = "Git URL Flux on the new cluster watches for clusters/<sovereign-fqdn>/. Defaults to public OpenOva monorepo."
default = "https://github.com/openova-io/openova"
}
variable "gitops_branch" {
type = string
description = "Branch Flux watches"
default = "main"
}
# ── OS hardening ──────────────────────────────────────────────────────────
variable "ssh_allowed_cidrs" {
type = list(string)
description = <<-EOT
Source CIDRs allowed to reach SSH (port 22). Default empty list = SSH
is NOT exposed at the firewall and break-glass requires an out-of-band
path (Hetzner console / VNC). Operators tighten/widen this via
Crossplane Composition once the cluster is up; the firewall rule below
is the Phase 0 fallback only.
EOT
default = []
validation {
condition = alltrue([for c in var.ssh_allowed_cidrs : can(cidrnetmask(c))])
error_message = "Each entry in ssh_allowed_cidrs must be a valid CIDR (e.g. 203.0.113.7/32)."
}
}
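# Illustrative break-glass entry (one operator workstation, /32):
#   ssh_allowed_cidrs = ["203.0.113.7/32"]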
variable "enable_unattended_upgrades" {
type = bool
description = "Install + enable unattended-upgrades for security patches on Ubuntu. Default true; disable only for short-lived test sovereigns."
default = true
}
variable "enable_fail2ban" {
type = bool
description = "Install + enable fail2ban with the sshd jail. Default true; disable only when an upstream WAF/IDS already covers the same surface."
default = true
}
# ── Hetzner Object Storage (Phase 0b — issue #371) ────────────────────────
#
# Hetzner Object Storage is the canonical S3 backing for Harbor (#383) and
# Velero (#384) on Hetzner Sovereigns per the omantel handover WBS §3 and
# the ADR-0001-derived "S3 vs SeaweedFS" rule (S3-aware apps write to the
# cloud-provider's native S3; only POSIX-only apps go through SeaweedFS as
# a buffer). For Hetzner that native S3 is Object Storage.
#
# Constraints baked into the rest of this module:
# 1. No native `hcloud_object_storage_*` Terraform resource exists today
# (see versions.tf for the upstream provider audit). Bucket creation
# is delegated to the `aminueza/minio` provider, which speaks the
# S3 bucket API against `<region>.your-objectstorage.com`.
# 2. Hetzner does NOT expose a Cloud API to create S3 access keys
# programmatically — the operator issues them once in the Hetzner
# Console (Object Storage → Manage Credentials, secret half shown
# exactly once and irretrievable thereafter). The wizard collects
# both halves; the catalyst-api validates them via S3 ListBuckets;
# this module receives them as variables and uses them for both
# bucket creation AND interpolation into the Sovereign cloud-init's
# `flux-system/object-storage` Kubernetes Secret (vendor-agnostic
# name since #425).
# 3. Object Storage is available only in fsn1/nbg1/hel1 today. For
# ash/hil compute Sovereigns the operator picks a European Object
# Storage region — Velero/Harbor are latency-tolerant and the
# backup path is asynchronous.
variable "object_storage_region" {
type = string
description = <<-EOT
Hetzner Object Storage region, one of fsn1 / nbg1 / hel1 (the
European-only availability zones for Object Storage as of 2026-04).
The endpoint URL is derived as `<region>.your-objectstorage.com` per
https://docs.hetzner.com/storage/object-storage/getting-started/
using-s3-api-tools/. Per docs/INVIOLABLE-PRINCIPLES.md #4 this is a
runtime variable, never hardcoded; every Sovereign picks its own
Object Storage region in the wizard.
EOT
validation {
# Authoritative list of Hetzner Object Storage regions as of 2026-04-30.
# Update when Hetzner adds a new Object Storage region (NOT the same
# as Cloud regions — Cloud has ash/hil but Object Storage does not).
condition = contains(["fsn1", "nbg1", "hel1"], var.object_storage_region)
error_message = "Object Storage region must be one of: fsn1 (Falkenstein), nbg1 (Nuremberg), hel1 (Helsinki). Object Storage is European-only as of 2026-04."
}
}
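# One way to express the endpoint derivation described above (main.tf owns
# the real expression):
#   "${var.object_storage_region}.your-objectstorage.com"  # e.g. fsn1.your-objectstorage.com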
variable "object_storage_access_key" {
type = string
description = <<-EOT
Hetzner Object Storage S3 access key, operator-issued once in the
Hetzner Console (Object Storage → Manage Credentials). The
catalyst-api validates this against the chosen region's S3 endpoint
via ListBuckets BEFORE `tofu apply` runs, so a typo'd key surfaces
at the wizard credential step, not 5 minutes into provisioning.
Sensitive never logged. Lives only in the per-deployment OpenTofu
workdir (encrypted PVC, mode 0600) and in the Sovereign's cloud-init
user_data; wiped on `tofu destroy`.
EOT
sensitive = true
validation {
# Hetzner S3 access keys are 20-character ASCII per the AWS S3 v4
# signing convention they emulate. We accept the broad shape rather
# than the precise length so future Hetzner format changes don't
# bounce off this validator with a stale literal.
condition = length(var.object_storage_access_key) >= 16 && length(var.object_storage_access_key) <= 64
error_message = "Object Storage access key must be 1664 characters."
}
}
variable "object_storage_secret_key" {
type = string
description = <<-EOT
Hetzner Object Storage S3 secret key, operator-issued alongside the
access key in the Hetzner Console. Per Hetzner's docs the secret is
shown EXACTLY ONCE at issue time; if the operator loses it they must
rotate. Sensitive: never logged. Same persistence boundary as the
access key: per-deployment encrypted workdir + Sovereign cloud-init
only; wiped on `tofu destroy`.
EOT
sensitive = true
validation {
# Hetzner S3 secret keys are typically 40 base64 characters (AWS-style)
# but the public spec does not pin a length and rotations may emit
# different lengths in the future. 32-128 is the resilient range.
condition = length(var.object_storage_secret_key) >= 32 && length(var.object_storage_secret_key) <= 128
error_message = "Object Storage secret key must be 32128 characters."
}
}
variable "harbor_robot_token" {
type = string
description = <<-EOT
Harbor robot account token for `robot$openova-bot` on harbor.openova.io.
Written into the Sovereign's /etc/rancher/k3s/registries.yaml at
cloud-init time so containerd can authenticate against the central
Harbor proxy-cache projects (proxy-dockerhub, proxy-gcr, proxy-quay,
proxy-k8s, proxy-ghcr) when pulling images on fresh Hetzner IPs.
The token is issued on harbor.openova.io via Harbor's robot account API
after the central Harbor instance stands up (issue #557 Step 2). The
catalyst-api provisioner reads it from the `harbor-robot-token` K8s
Secret in the openova-harbor namespace on contabo and forwards it here
at provisioning time. Sensitive: never logged, never committed to git.
Default empty: existing test scripts and pre-#557 provisioner builds
that do not pass this variable still render a valid cloud-init (the
registries.yaml password field will be blank, causing containerd to
attempt anonymous pulls on harbor.openova.io which are allowed for
Public proxy projects). Non-empty is enforced by the provisioner for
production Sovereign deployments once harbor.openova.io is live.
EOT
sensitive = true
default = ""
}
variable "pdm_basic_auth_user" {
type = string
description = <<-EOT
Username for the Pool Domain Manager (PDM) public ingress at
`pool.openova.io`. The Sovereign-side catalyst-api uses this
value (paired with `pdm_basic_auth_pass`) to authenticate
every PDM call (Day-2 multi-domain "Add another parent
domain" flow issue #879). Cloud-init writes the value into
a `pdm-basicauth` Secret in the `flux-system` namespace with
Reflector annotations so the Secret mirrors into
`catalyst-system` where catalyst-api reads it via secretKeyRef.
Source on contabo: `openova-system/pool-domain-manager-basicauth`
Secret (operator-managed). The catalyst-api provisioner forwards
plaintext at provisioning time; never logged, never committed.
Default empty: when unset, the cloud-init still renders the
`pdm-basicauth` Secret with empty values. The Sovereign-side
pdmFlipNS skips SetBasicAuth when the env value is empty, so
older Sovereigns that pre-date this variable degrade to a
clear PDM 401 instead of a panic. Once the operator fills
this in, a re-provision (or a Secret rotation via cloud-init
re-render) supplies real credentials.
EOT
sensitive = true
default = ""
}
variable "pdm_basic_auth_pass" {
type = string
description = <<-EOT
Password for the Pool Domain Manager (PDM) public ingress.
See `pdm_basic_auth_user` for the full lifecycle. Sensitive.
EOT
sensitive = true
default = ""
}
variable "object_storage_bucket_name" {
type = string
description = <<-EOT
Hetzner Object Storage bucket name. Bucket names share a global
namespace across ALL Hetzner Object Storage tenants per
https://docs.hetzner.com/storage/object-storage/getting-started/
creating-a-bucket/, so we derive a deterministic per-Sovereign name
from the FQDN slug (catalyst-api computes this; the wizard never
surfaces a free-form bucket-name input to the operator). Pattern:
`catalyst-<sovereign-fqdn-with-dots-replaced-by-dashes>`.
The bucket is created idempotently via the `aminueza/minio` provider
in main.tf. Existing buckets with a matching name are adopted (the
minio_s3_bucket resource is idempotent on Create when the bucket
already exists in the same tenant, so re-running `tofu apply` against
a previously-provisioned Sovereign is a no-op, never an error).
EOT
validation {
# S3 bucket naming rules:
# - 3-63 chars
# - lowercase letters, digits, hyphens
# - must start and end with alphanumeric
condition = can(regex("^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$", var.object_storage_bucket_name))
error_message = "Object Storage bucket name must be 3-63 chars, lowercase alphanumeric + hyphens, starting and ending with alphanumeric (RFC-compliant S3 bucket naming)."
}
}
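# catalyst-api computes the slug in Go; an equivalent HCL expression, for
# illustration only:
#   "catalyst-${replace(var.sovereign_fqdn, ".", "-")}"
#   # e.g. omantel.omani.works -> catalyst-omantel-omani-works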
# ── Handover JWT public key (issue #605, Phase-8b) ────────────────────────
#
# RFC 7517 JWK JSON bytes of the Catalyst-Zero RS256 public key. Written to
# /var/lib/catalyst/handover-jwt-public.jwk (mode 0600) on the new Sovereign
# control-plane by cloud-init. The Sovereign-side Agent-C (auth_handover.go)
# reads this file to verify the one-time handover JWT without a cross-cluster
# RPC to Catalyst-Zero.
#
# Source: the catalyst-api provisioner reads the live Signer's PublicJWK()
# and stamps it onto provisioner.Request.HandoverJWTPublicKey before writing
# tofu.auto.tfvars.json. The field carries json:"-" so the wizard POST body
# can never inject it — it always comes from the live Signer.
#
# Default empty: pre-#605 provisioner builds that do not pass this variable
# write an empty file; auth/handover returns 503 (key unavailable) on any
# Sovereign provisioned without it until a subsequent reprovisioning run.
variable "handover_jwt_public_key" {
type = string
description = <<-EOT
RFC 7517 JWK JSON of the Catalyst-Zero RS256 handover-JWT public key.
Written to /var/lib/catalyst/handover-jwt-public.jwk (mode 0600) on
the new Sovereign control-plane by cloud-init so Agent-C can verify
the one-time JWT without a cross-cluster network call to Catalyst-Zero.
Supplied by the catalyst-api provisioner from h.handoverSigner.PublicJWK().
Empty when the provisioner has no signer (CATALYST_HANDOVER_KEY_PATH unset).
EOT
sensitive = true
default = ""
}
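# Shape of the expected value (RFC 7517 RS256 public half; field values
# elided, not a real key):
#   {"kty":"RSA","alg":"RS256","kid":"...","n":"<base64url modulus>","e":"AQAB"}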