apiVersion: apps/v1
kind: Deployment
metadata:
  name: catalyst-api
  labels:
    app.kubernetes.io/name: catalyst-api
    app.kubernetes.io/component: api
  annotations:
    # `kustomize.toolkit.fluxcd.io/force: enabled` is the durable
    # remediation for the `RollingUpdate -> Recreate` strategy-flip
    # collision documented in docs/CHART-AUTHORING.md §"Strategy flips
    # on existing Deployments".
    #
    # Failure mode this addresses
    # ---------------------------
    # On 2026-04-29 the `catalyst` Flux Kustomization on contabo-mkt
    # got stuck at Ready=False with:
    #
    #   Deployment.apps "catalyst-api" is invalid:
    #   spec.strategy.rollingUpdate: Forbidden:
    #   may not be specified when strategy `type` is 'Recreate'
    #
    # Root cause: the live Deployment had previously been created with
    # the default `RollingUpdate` strategy (so `rollingUpdate.maxSurge=25%`
    # and `maxUnavailable=25%` were present on the live object, owned
    # by the `kubectl-client-side-apply` field manager). Flux's
    # kustomize-controller submits this manifest via Server-Side Apply
    # with field manager `kustomize-controller`. SSA's contract is
    # "set the fields you declare" — it does NOT remove fields owned
    # by other managers. Result: the post-merge object had `type: Recreate`
    # AND the residual `rollingUpdate.*` block, which the API server's
    # validator rejects as invalid (Recreate forbids any rollingUpdate
    # keys). SSA is REQUIRED to reject the merge. No SSA-only chart
    # change can fix this.
    #
    # Why `$patch: replace` does NOT solve this
    # -----------------------------------------
    # The Strategic Merge Patch directive `$patch: replace` would tell
    # an SMP-aware merger to REPLACE the strategy block instead of
    # merging into it. But:
    #   - SSA rejects `$patch` outright with "field not declared in
    #     schema" (it's not in apps/v1 Deployment).
    #   - kubectl strict-decoding rejects `$patch` on CREATE under any
    #     mode with "unknown field spec.strategy.$patch" — so adding
    #     it to the chart manifest BREAKS fresh installs.
    # `$patch: replace` is a runtime SMP directive, never a chart-spec
    # value. It belongs in a Kustomize `patches:` entry (where the
    # kustomize binary consumes it at build time and emits a clean
    # output) — never inline in a base resource.
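    #
    # For illustration only (this chart does not ship a kustomization
    # overlay; the file name and target below are hypothetical): a
    # Kustomize `patches:` entry is the legitimate home for the
    # `$patch: replace` directive, because the kustomize binary resolves
    # it at build time and the rendered output carries no `$patch` key:
    #
    #   # kustomization.yaml (hypothetical overlay)
    #   resources:
    #     - deployment.yaml
    #   patches:
    #     - target:
    #         kind: Deployment
    #         name: catalyst-api
    #       patch: |
    #         spec:
    #           strategy:
    #             $patch: replace
    #             type: Recreate
    #
    # Even so, a build-time patch only cleans the rendered manifest; it
    # cannot remove fields already owned by another manager on the live
    # object, which is why the force annotation below remains the fix.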
    #
    # Why the Flux force annotation IS the right fix
    # ----------------------------------------------
    # When kustomize-controller's SSA submission fails dry-run with an
    # Invalid response, this annotation directs the controller to
    # recover by deleting and recreating THIS resource specifically
    # (not the whole Kustomization). The recreated Deployment has no
    # residual `rollingUpdate.*` fields — the regression cannot
    # recur on the rebuilt object.
    #
    # That is NOT a "kubectl delete bandaid": the annotation is part
    # of the IaC manifest, version-controlled, applied declaratively
    # via Flux on every reconciliation, scoped to this single
    # Deployment, and removed only by editing the chart. Per
    # docs/INVIOLABLE-PRINCIPLES.md #3 (Follow the documented
    # architecture, exactly — Flux is the ONLY GitOps reconciler) and
    # #4 (Never hardcode — runtime configuration in Git, not in shell
    # history): the remediation lives in source control.
    #
    # Why this Deployment in particular tolerates a recreate: the
    # spec declares `strategy.type: Recreate`, so the steady-state
    # update path is delete-and-recreate anyway. Flux falling back to
    # delete-and-recreate on a strategy-flip is a no-op relative to a
    # normal pod-spec change. The deployments PVC is ReadWriteOnce;
    # the recreate flow detaches it from the old Pod before mounting
    # it on the new one, which is exactly the contract `Recreate`
    # enforces. State persistence is maintained because the PVC
    # itself is NOT recreated by this annotation — only the
    # Deployment resource is.
    kustomize.toolkit.fluxcd.io/force: enabled
    # Reloader watches the sovereign-fqdn + handover-jwt-public ConfigMaps/Secrets
    # this Pod reads via valueFrom. On Sovereigns, those resources are applied
    # by the sovereign-tls Kustomization concurrently with the bp-catalyst-platform
    # HelmRelease. If the Pod started first, optional valueFrom resolves to ""
    # and SOVEREIGN_FQDN stays empty for the lifetime of the Pod — every handover
    # then fails the audience check with 401 "invalid audience" (caught live on
    # otech62, 2026-05-03). Reloader rolls the Deployment when those resources
    # land, fixing the race without requiring strict Flux dependsOn ordering.
    configmap.reloader.stakater.com/reload: "sovereign-fqdn"
    secret.reloader.stakater.com/reload: "handover-jwt-public"
spec:
  replicas: 1
  # Recreate strategy is required because the deployments PVC is RWO
  # (single-attach). A rolling update would try to schedule a second
  # Pod that mounts the same PVC, which Kubernetes rejects as a
  # MultiAttachError. RWX with a multi-writer-aware filesystem
  # (NFS, CephFS) is the path to HA, but Catalyst-Zero today is
  # single-replica by design — the wizard is interactive and PDM owns
  # cross-tenant isolation, so a single API server is sufficient.
  #
  # The strategy-flip regression that bit contabo-mkt on 2026-04-29
  # (apply over a pre-existing RollingUpdate Deployment fails with
  # `spec.strategy.rollingUpdate: Forbidden`) is recovered by the
  # `kustomize.toolkit.fluxcd.io/force: enabled` annotation above —
  # see that annotation's comment for the full failure-mode analysis
  # and the docs/CHART-AUTHORING.md §"Strategy flips on existing
  # Deployments" entry. Do NOT add an inline `$patch: replace` here:
  # it BREAKS fresh installs (kubectl strict-decoding rejects
  # `spec.strategy.$patch` on create), and Flux's SSA path rejects it
  # anyway. The integration test at tests/integration/strategy-flip.yaml
  # asserts both that the recovery path works and that the regression
  # mode is still detected.
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: catalyst-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: catalyst-api
    spec:
      # serviceAccountName — bind the Pod to the dedicated cutover-driver
      # ServiceAccount so the /api/v1/sovereign/cutover/start handler can
      # read/patch the cutover ConfigMaps + create/watch Jobs in the
      # `catalyst` namespace. See serviceaccount-cutover-driver.yaml +
      # clusterrole-cutover-driver.yaml + clusterrolebinding-cutover-driver.yaml
      # for the full RBAC graph (issue #830 P0 Bug 1).
      #
      # The SA is created by THIS chart in the same namespace catalyst-api
      # runs in (catalyst-system) and bound at cluster scope (the cutover
      # endpoint is namespace-configurable via CATALYST_CUTOVER_NAMESPACE).
      # Without this, the Pod runs as
      # system:serviceaccount:catalyst-system:default and every
      # cutover-status read returns 502 "configmaps is forbidden"
      # (caught live on otech102, 2026-05-04).
      serviceAccountName: catalyst-api-cutover-driver
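      # Illustrative sketch only; the authoritative rules live in
      # clusterrole-cutover-driver.yaml. The verbs and resource list below
      # are inferred from the handler description above, not copied from
      # that file:
      #
      #   apiVersion: rbac.authorization.k8s.io/v1
      #   kind: ClusterRole
      #   metadata:
      #     name: catalyst-api-cutover-driver   # assumed name
      #   rules:
      #     - apiGroups: [""]
      #       resources: ["configmaps"]
      #       verbs: ["get", "list", "watch", "patch"]
      #     - apiGroups: ["batch"]
      #       resources: ["jobs"]
      #       verbs: ["create", "get", "list", "watch"]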
      imagePullSecrets:
        - name: ghcr-pull
      # fsGroup applies to the volumes mounted into the Pod so the
      # non-root container UID (65534) can write to the deployments
      # PVC. Without this, Hetzner Cloud Volumes default to root:root
      # and the catalyst-api process gets EACCES on every store.Save —
      # surfacing as the "deployment store unavailable" warning at
      # startup and silent persistence failures at runtime.
      #
      # fsGroupChangePolicy: OnRootMismatch limits the chown traversal
      # to first start (where the volume is freshly provisioned with
      # the wrong UID). Subsequent restarts skip the recursive chown
      # if the root dir already matches, keeping Pod start times
      # bounded as the deployments directory grows.
      securityContext:
        fsGroup: 65534
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: catalyst-api
          image: "ghcr.io/openova-io/openova/catalyst-api:b45a49f"
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              protocol: TCP
          env:
            - name: PORT
              value: "8080"
            - name: CORS_ORIGIN
              value: "https://console.openova.io"
            - name: DYNADOT_API_KEY
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: api-key
                  # optional=true: Sovereign clusters don't hold Dynadot
                  # credentials — their tenant DNS is served by the
                  # Sovereign's own PowerDNS instance, not the parent
                  # account. Catalyst-Zero (contabo-mkt) supplies the
                  # real secret; Sovereigns use an empty stub or omit it
                  # entirely. Without optional=true the pod refuses to
                  # start when the secret is absent (issue #547).
                  optional: true
            - name: DYNADOT_API_SECRET
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: api-secret
                  optional: true
            # DYNADOT_MANAGED_DOMAINS — comma-separated list of pool domains
            # the same Dynadot account manages. Per docs/INVIOLABLE-PRINCIPLES.md
            # #4, this is runtime configuration so adding a third pool domain
            # (e.g. acme.io) does NOT require a code change — only a secret
            # update. The Dynadot API is account-scoped (one api-key/api-secret
            # pair covers every domain owned by the account); this list scopes
            # which domains the catalyst-api is *allowed* to write records for,
            # defending against misconfiguration that would let a wizard-supplied
            # poolDomain trigger writes against an unrelated domain.
            - name: DYNADOT_MANAGED_DOMAINS
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: domains
                  # optional=true so deployments using the legacy single-value
                  # `domain` key (pre-#108) keep working until the secret is
                  # migrated; the dynadot package falls through to DYNADOT_DOMAIN
                  # then to its built-in defaults if neither key is present.
                  optional: true
            - name: DYNADOT_DOMAIN
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: domain
                  optional: true
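            # For reference, a sketch of the dynadot-api-credentials Secret the
            # four entries above resolve against. Illustrative only: the key
            # names match the secretKeyRef entries above, but the values are
            # placeholders and the creation path is outside this chart:
            #
            #   apiVersion: v1
            #   kind: Secret
            #   metadata:
            #     name: dynadot-api-credentials
            #   type: Opaque
            #   stringData:
            #     api-key: "<dynadot api key>"
            #     api-secret: "<dynadot api secret>"
            #     domains: "<pool-domain-1>,<pool-domain-2>"   # comma-separated pool domains
            #     domain: "<pool-domain-1>"                    # legacy single-value key (pre-#108)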
            # CATALYST_TOFU_WORKDIR — provisioner runs `tofu init/plan/apply`
            # inside this directory. PVC-backed (catalyst-api-deployments) so
            # in-progress tofu state survives Pod restarts. Without this,
            # any catalyst-api Pod roll mid-apply (e.g. an unrelated chart
            # bump that triggers a rolling restart on Catalyst-Zero, or a
            # node reboot) leaks Hetzner resources because partial apply
            # state is in emptyDir. Caught live on otech64, 2026-05-03:
            # contabo's catalyst-api was rolled at 21:40:11 (3 minutes
            # into otech64's tofu apply), terminal_LB was created without its
            # control_plane target, and otech64 came up with an unreachable
            # 49.12.16.160 LB. Relies on fsGroup=65534 above to provide
            # write access to /var/lib/catalyst (the PVC mountPath).
            - name: CATALYST_TOFU_WORKDIR
              value: /var/lib/catalyst/tofu
            # CATALYST_DEPLOYMENTS_DIR — flat-file store for deployment
            # records (one JSON file per deployment id). Backed by the
            # PVC mount below so deployments persist across Pod
            # restarts. Each record is the full Deployment state with
            # credentials redacted; see internal/store/store.go.
            - name: CATALYST_DEPLOYMENTS_DIR
              value: /var/lib/catalyst/deployments
            # CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS — defensive floor
            # only. The load-bearing termination gate is now the
            # informer's HasSynced signal (after WaitForCacheSync the
            # full bp-* HelmRelease set is in the cache, regardless of
            # cardinality). Set to 1 so the watch still refuses to
            # terminate when the cache is completely empty (the
            # "bootstrap-kit Kustomization never reconciled at all"
            # footgun, classified as OutcomeFluxNotReconciling).
            #
            # Earlier values (11, then 38) tied this to the kit count;
            # that coupling is brittle — otech48 (2026-05-03) sat
            # phase1-watching forever because the env was 38 but the
            # kit had drifted to 37. The HasSynced gate is drift-proof.
            - name: CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS
              value: "1"
            # CATALYST_KUBECONFIGS_DIR — sibling directory on the same
            # PVC for the plaintext kubeconfigs the new Sovereign POSTs
            # back via the bearer-token endpoint (issue #183, Option D).
            # One .yaml per deployment, mode 0600. The store JSON
            # record carries only the file path + a SHA-256 hash of
            # the bearer; the plaintext kubeconfig is NEVER serialized
            # into the JSON.
            - name: CATALYST_KUBECONFIGS_DIR
              value: /var/lib/catalyst/kubeconfigs
            # CATALYST_API_PUBLIC_URL — the public origin the new
            # Sovereign's cloud-init PUTs its kubeconfig back to. The
            # OpenTofu module templates this into the Sovereign's
            # user_data so the Sovereign knows where to call. Per
            # docs/INVIOLABLE-PRINCIPLES.md #4 this is runtime
            # configuration; air-gapped franchises override it
            # without a code change.
            - name: CATALYST_API_PUBLIC_URL
              value: https://console.openova.io/sovereign
            # CATALYST_K8SCACHE_KUBECONFIGS_DIR — issue #321. Directory
            # the k8scache.Factory reads kubeconfigs from at startup.
            # The data-plane SharedInformerFactory opens one informer
            # per kubeconfig file; the cloud-init postback handler
            # (PUT /api/v1/deployments/{id}/kubeconfig) writes here on
            # Phase-1 attach so a fresh Sovereign id is automatically
            # picked up at the next catalyst-api restart. The same PVC
            # (catalyst-api-deployments) backs the existing
            # deployments store; the data-plane reads the kubeconfigs/
            # subdirectory directly.
            - name: CATALYST_K8SCACHE_KUBECONFIGS_DIR
              value: /var/lib/catalyst/kubeconfigs
            # CATALYST_K8SCACHE_SNAPSHOT_DIR — issue #321 cold-start
            # mitigation. Backed by a separate 5Gi PVC
            # (catalyst-api-cache) so its size is independent of the
            # deployments store. See api-cache-pvc.yaml for the sizing
            # rationale + the cold-start latency contract.
            - name: CATALYST_K8SCACHE_SNAPSHOT_DIR
              value: /var/cache/sov-cache
            # CATALYST_K8SCACHE_KINDS_CONFIGMAP — optional ConfigMap
            # extending the built-in kinds registry. Per
            # docs/INVIOLABLE-PRINCIPLES.md #4 a new watched GVR (e.g.
            # HelmRelease, Kustomization) is a runtime configuration
            # change, not a code change. Empty disables ConfigMap
            # loading; the built-in DefaultKinds is used.
            - name: CATALYST_K8SCACHE_KINDS_CONFIGMAP
              value: catalyst-k8scache-kinds
            - name: CATALYST_K8SCACHE_KINDS_CONFIGMAP_NAMESPACE
              value: catalyst
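            # Illustrative only: the data key and entry format below are
            # assumptions, not the actual schema consumed by the k8scache
            # package. Check the package's ConfigMap loader before copying
            # this shape.
            #
            #   apiVersion: v1
            #   kind: ConfigMap
            #   metadata:
            #     name: catalyst-k8scache-kinds
            #     namespace: catalyst
            #   data:
            #     kinds.yaml: |            # hypothetical key name
            #       - group: helm.toolkit.fluxcd.io
            #         version: v2
            #         kind: HelmRelease
            #       - group: kustomize.toolkit.fluxcd.io
            #         version: v1
            #         kind: Kustomization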
            # CATALYST_GHCR_PULL_TOKEN — long-lived GHCR pull token that
            # the provisioner stamps onto every Request and the OpenTofu
            # cloud-init template writes into the new Sovereign's
            # flux-system/ghcr-pull Secret so Flux source-controller
            # can pull private bp-* OCI artifacts from
            # ghcr.io/openova-io/. Without this, Phase 1 stalls at
            # bp-cilium with "secrets ghcr-pull not found" — verified
            # live on omantel.omani.works pre-fix.
            #
            # optional: true — when the Secret or key is missing the
            # Pod still starts (with the env var unset). The
            # provisioner's Validate() rejects deployments that need
            # the token (Phase 1 bootstrap-kit pulls private bp-*
            # charts) with a clear pointer to docs/SECRET-ROTATION.md,
            # so a misconfigured catalyst-api fails fast on
            # /api/v1/deployments POST instead of silently mid-apply.
            # /healthz, /api/v1/credentials/validate, and the BYO
            # registrar proxy keep working — they don't read the
            # token at all.
            #
            # Rotation: yearly, see docs/SECRET-ROTATION.md. The Secret
            # is created out-of-band by an operator (never via Flux,
            # never committed to git) — the chart references it but
            # does not template it.
            - name: CATALYST_GHCR_PULL_TOKEN
              valueFrom:
                secretKeyRef:
                  name: catalyst-ghcr-pull-token
                  key: token
                  optional: true
            # CATALYST_HARBOR_ROBOT_TOKEN — central Harbor proxy-cache
            # robot account secret (issue #557 + #557 follow-up). The
            # value is interpolated into the new Sovereign's
            # /etc/rancher/k3s/registries.yaml at cloud-init time so
            # containerd authenticates against harbor.openova.io's proxy
            # projects (proxy-dockerhub etc).
            #
            # Provisioning seam (catalyst-system Pod gets the Secret):
            #   1. Tofu var.harbor_robot_token enters cloud-init
            #      (infra/hetzner/cloudinit-control-plane.tftpl).
            #   2. Cloud-init writes /var/lib/catalyst/harbor-robot-token-secret.yaml
            #      into flux-system ns with the auto-mirror Reflector
            #      annotations (reflection-auto-enabled: "true").
            #   3. runcmd applies it BEFORE flux-bootstrap, so the
            #      Secret exists before any Helm release runs.
            #   4. bp-reflector (slot 05a) propagates it into every
            #      namespace (incl. catalyst-system) on first reconcile.
            #   5. This Pod's secretKeyRef resolves once the mirror lands.
            # Mirrors the canonical pattern that flux-system/ghcr-pull
            # already uses (PR #543).
            #
            # NOT optional — provisioner.Validate() rejects deployments
            # with an empty token. The architecture mandate is that every
            # Sovereign image pull goes through harbor.openova.io; falling
            # through to docker.io is forbidden (rate-limit makes a fresh
            # Hetzner IP unbootable within minutes). When `optional: true`
            # was previously contemplated we chose against it: a missing
            # token must surface immediately as a Pod start failure
            # (CreateContainerConfigError), not silently mid-provision.
            #
            # Rotation: yearly. Re-render Tofu plan → re-apply cloud-init
            # → kubectl apply runs against the existing Secret with
            # rotated bytes; bp-reflector propagates the rotation to all
            # mirrored copies on the next watch tick. Plaintext NEVER
            # lives in git.
            - name: CATALYST_HARBOR_ROBOT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: harbor-robot-token
                  key: token
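            # Sketch of the Reflector-annotated Secret described in step 2 of
            # the provisioning seam above. Illustrative only: the annotation
            # prefix assumes the emberstack Reflector convention, and the real
            # file is written by cloud-init, not by this chart:
            #
            #   apiVersion: v1
            #   kind: Secret
            #   metadata:
            #     name: harbor-robot-token
            #     namespace: flux-system
            #     annotations:
            #       reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
            #       reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: ""
            #   stringData:
            #     token: "<harbor robot account token>"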
            # CATALYST_POWERDNS_API_KEY — contabo PowerDNS API key (PR
            # #681 followup). The value is interpolated into the new
            # Sovereign's `cert-manager/powerdns-api-credentials` Secret
            # at cloud-init time so bp-cert-manager-powerdns-webhook
            # can write DNS-01 challenge TXT records to contabo's
            # authoritative omani.works zone.
            #
            # Provisioning seam:
            #   1. Source: contabo's `openova-system/powerdns-api-credentials`
            #      Secret (created by bp-powerdns chart).
            #   2. Reflector mirrors it into every namespace incl.
            #      catalyst (annotations on the source:
            #      reflection-auto-enabled: "true",
            #      reflection-auto-namespaces: "").
            #   3. This Pod resolves it via secretKeyRef.
            #   4. provisioner.New() reads CATALYST_POWERDNS_API_KEY at
            #      startup, stamps onto every Request.
            #   5. cloud-init writes the Sovereign-side Secret in the
            #      cert-manager namespace BEFORE Flux reconciles
            #      bp-cert-manager-powerdns-webhook.
            #
            # optional=true: Catalyst-Zero pods on Sovereigns don't have
            # this Secret reflected (their PowerDNS is local) so the
            # bootstrap shape stays clean across both contabo+Sovereign
            # catalyst-api deployments.
            - name: CATALYST_POWERDNS_API_KEY
              valueFrom:
                secretKeyRef:
                  name: powerdns-api-credentials
                  key: api-key
                  optional: true
            # CATALYST_POWERDNS_API_URL — base URL of the per-Sovereign
            # PowerDNS REST API (issue #827). Used by:
            #   - the SME-tenant pipeline's PATCH-RRset writer
            #     (sme_tenant_dns.go) for free-subdomain provisioning
            #   - the multi-zone parent-domain handler
            #     (parent_domains.go) for runtime add-zone
            # Default is the in-cluster Service FQDN of the Sovereign's
            # own PowerDNS (the Helm chart targets namespace `powerdns`
            # with default release name `powerdns`). Operators in
            # non-standard layouts override via the Helm values overlay
            # at clusters//bootstrap-kit/13-bp-catalyst-platform.yaml.
            #
            # NOTE — DUAL-MODE CONTRACT (see SOVEREIGN_FQDN block below
            # for the canonical explanation): this file is consumed BOTH
            # by Helm (per-Sovereign install) AND by Kustomize (contabo-mkt's
            # flux Kustomization at path: ./products/catalyst/chart/templates).
            # Helm template syntax (double-curly directives) in this file
            # BREAKS the Kustomize build with "yaml: invalid map key" and
            # stalls every contabo reconciliation. The 1.4.0 version of this
            # block used {{ default "..." .Values.catalystApi.powerdnsURL }} —
            # that broke contabo's catalyst-platform Kustomization until this
            # follow-up landed. Issue #830 follow-up.
            #
            # Solution: the in-cluster Service URL is a non-secret
            # constant on every Sovereign that ships bp-powerdns at its
            # canonical release name (powerdns/powerdns). Hardcode the
            # literal here so the Kustomize build stays clean.
            # Per-Sovereign overrides are still possible via the
            # per-Sovereign HelmRelease overlay's `catalystApi.env`
            # additional-env patch that takes precedence over the
            # default below.
            - name: CATALYST_POWERDNS_API_URL
              value: "http://powerdns.powerdns.svc.cluster.local:8081"
            # CATALYST_POWERDNS_SERVER_ID — virtually always "localhost"
            # per the PowerDNS REST API contract. Operator-overridable
            # for multi-tenant PowerDNS deployments where a single
            # PowerDNS instance hosts multiple servers (override via the
            # HelmRelease overlay env patch — same pattern as
            # CATALYST_POWERDNS_API_URL above).
            - name: CATALYST_POWERDNS_SERVER_ID
              value: "localhost"
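            # Sketch of the per-Sovereign override seam referenced above.
            # Illustrative only: the HelmRelease name, apiVersion, and the
            # exact values schema around `catalystApi.env` are assumptions;
            # only the `catalystApi.env` key itself is documented:
            #
            #   # clusters//bootstrap-kit/13-bp-catalyst-platform.yaml (overlay)
            #   apiVersion: helm.toolkit.fluxcd.io/v2
            #   kind: HelmRelease
            #   metadata:
            #     name: bp-catalyst-platform
            #   spec:
            #     values:
            #       catalystApi:
            #         env:
            #           - name: CATALYST_POWERDNS_API_URL
            #             value: "http://pdns.dns.svc.cluster.local:8081"   # placeholder
            #           - name: CATALYST_POWERDNS_SERVER_ID
            #             value: "localhost"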
            # ── /auth/handover Keycloak service-account (issue #606) ──────────
            # CATALYST_KC_ADDR — Keycloak base URL. Defaults to in-cluster
            # service FQDN in code; override here for non-standard Sovereign
            # Keycloak deployments.
            # optional=true: Catalyst-Zero pods don't run Keycloak locally.
            - name: CATALYST_KC_ADDR
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: addr
                  optional: true
            - name: CATALYST_KC_REALM
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: realm
                  optional: true
            - name: CATALYST_KC_SA_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: client-id
                  optional: true
            - name: CATALYST_KC_SA_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: client-secret
                  optional: true
            # CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH — path to the JWK file that
            # holds the RS256 public key for validating one-time handover JWTs.
            # The K8s Secret `catalyst-handover-jwt-public` (created by
            # cloud-init at provision time, see
            # infra/hetzner/cloudinit-control-plane.tftpl) is mounted as a
            # directory at /etc/catalyst/handover-jwt-public/, so the JWK lives
            # at /etc/catalyst/handover-jwt-public/public.jwk. We deliberately
            # mount the Secret as a directory rather than using subPath: the
            # catalyst-api PVC at /var/lib/catalyst is ReadWriteOnce and a
            # leftover empty directory at the legacy path
            # /var/lib/catalyst/handover-jwt-public.jwk/ from earlier installs
            # (where the Secret was missing and Kubernetes created an empty
            # directory in the volume) collides with the subPath file mount on
            # re-provisioning. Mounting under /etc/ keeps the JWK off the PVC
            # entirely so the conflict cannot recur. Caught live on otech48,
            # 2026-05-03.
            - name: CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH
              value: /etc/catalyst/handover-jwt-public/public.jwk
            # SOVEREIGN_FQDN — Sovereign's public FQDN. The /auth/handover
            # validator (auth_handover.go) reads this to compute the expected
            # JWT audience claim ("https://console." + SOVEREIGN_FQDN). When
            # unset on a Sovereign, the audience check collapses to
            # "https://console." and every valid token is rejected with
            # "invalid audience" 401 — caught live on otech48, 2026-05-03.
            #
            # NOTE: this file is consumed BOTH by Helm (per-Sovereign install
            # via the bp-catalyst-platform OCI chart) AND by Kustomize
            # (contabo-mkt's clusters/contabo-mkt/apps/catalyst-platform
            # Kustomization at path: ./products/catalyst/chart/templates).
            # Kustomize parses raw YAML — Helm template syntax (double-curly
            # directives) here breaks the Kustomize build (caught live on
            # contabo 2026-05-03 from commit adf8dc7d: "yaml: invalid map key").
            #
            # Solution: read the value from a ConfigMap that exists ONLY on
            # Sovereigns (not contabo). On contabo the optional reference
            # resolves to empty (correct — catalyst-api on contabo is the
            # SIGNER never the validator, /auth/handover never hits there).
            # On Sovereigns,
            # clusters/_template/sovereign-tls/sovereign-fqdn-configmap.yaml
            # renders the ConfigMap from the envsubst-ed ${SOVEREIGN_FQDN}
            # when Flux applies the kustomization.
            - name: SOVEREIGN_FQDN
              valueFrom:
                configMapKeyRef:
                  name: sovereign-fqdn
                  key: fqdn
                  optional: true
            # CATALYST_OTECH_FQDN — same value as SOVEREIGN_FQDN, but read by
            # the SME tenant create handler (sme_tenant.go) and the
            # sovereign-parent-domains seed (sovereign_parent_domains.go).
            # The two envs exist for historical reasons: SOVEREIGN_FQDN is the
            # Phase-8b handover-flow JWT-audience env; CATALYST_OTECH_FQDN is
            # the SME-tier tenant-pipeline env (epic #795 / #804). Both
            # ultimately point at the Sovereign's public FQDN. Wired from the
            # same `sovereign-fqdn` ConfigMap (key `fqdn`). optional=true since
            # Catalyst-Zero (contabo) doesn't run the SME tenant pipeline.
            # Issue #876 — without this, POST /api/v1/sme/tenants returns
            # 503 {"error":"otech-fqdn-unconfigured"} on every Sovereign.
            - name: CATALYST_OTECH_FQDN
              valueFrom:
                configMapKeyRef:
                  name: sovereign-fqdn
                  key: fqdn
                  optional: true
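            # For reference, the shape of the `sovereign-fqdn` ConfigMap that the
            # SOVEREIGN_FQDN / CATALYST_OTECH_FQDN entries above and the
            # CATALYST_SELF_DEPLOYMENT_ID / SOVEREIGN_LB_IP entries below all
            # resolve against. Illustrative only: the real object is rendered by
            # clusters/_template/sovereign-tls/sovereign-fqdn-configmap.yaml on
            # Sovereigns and does not exist on contabo; the values shown are
            # placeholders:
            #
            #   apiVersion: v1
            #   kind: ConfigMap
            #   metadata:
            #     name: sovereign-fqdn
            #   data:
            #     fqdn: "<sovereign-fqdn>"            # e.g. an otechNN.omani.works name
            #     selfDeploymentId: "<deployment id>" # stamped at handover
            #     lbIP: "203.0.113.10"                # placeholder (global.sovereignLBIP)
            #
            # Namespace omitted above: the configMapKeyRef references resolve in
            # the Pod's own namespace.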
            # CATALYST_SELF_DEPLOYMENT_ID — the deployment-record id this
            # Sovereign was provisioned under on the contabo orchestrator.
            # Read by HandleSovereignSelf (sovereign_self.go) so the
            # Sovereign-side catalyst-ui can resolve /console/ to the
            # canonical /provision// deployment-scoped UI.
            # Sourced from the sovereign-fqdn ConfigMap (key
            # selfDeploymentId), stamped by the orchestrator's
            # per-Sovereign overlay writer at handover. Empty on contabo and
            # on freshly-provisioned Sovereigns whose handover hasn't run
            # yet — HandleSovereignSelf returns 503 in that window so
            # the UI shows a "waiting for handover" pill.
            - name: CATALYST_SELF_DEPLOYMENT_ID
              valueFrom:
                configMapKeyRef:
                  name: sovereign-fqdn
                  key: selfDeploymentId
                  optional: true
            # SOVEREIGN_LB_IP — Sovereign's load-balancer public IPv4. Used by
            # the Day-2 multi-domain add-domain flow (issue #900) to
            # pre-register glue records at the customer's registrar before
            # the set_ns flip. Without it Dynadot rejects with
            # "'ns1..omani.works' needs to be registered with an ip
            # address before it can be used" — caught live during otech103
            # multi-domain verification.
            #
            # Sourced from the chart's `global.sovereignLBIP` value (rendered
            # into the same `sovereign-fqdn` ConfigMap that holds `fqdn`).
            # optional=true: Catalyst-Zero (contabo) doesn't run the
            # Sovereign-side multi-domain pipeline; the env stays empty and
            # the glue path becomes a no-op (plain set_ns flows through
            # unchanged).
            - name: SOVEREIGN_LB_IP
              valueFrom:
                configMapKeyRef:
                  name: sovereign-fqdn
                  key: lbIP
                  optional: true
            # CATALYST_GITOPS_USER + CATALYST_GITOPS_TOKEN — basic-auth
            # credentials embedded in the GitOps clone URL (issue #878).
            # Pre-cutover (Catalyst-Zero): User=x-access-token, Token=GitHub
            # PAT (already wired via a separate CATALYST_GITOPS_TOKEN secret
            # on contabo). Post-cutover (Sovereign): User=gitea_admin,
            # Token= from the local Gitea admin secret.
            # The same secret (`gitea-admin-secret`) is mirrored into
            # catalyst-system via the bp-reflector annotation block on
            # bp-gitea (issue #866), so this Sovereign-side wiring works
            # post-Day-2-Independence without a manual mirror step.
            # optional=true: Catalyst-Zero (contabo) does not run the SME
            # tenant pipeline.
            - name: CATALYST_GITOPS_USER
              valueFrom:
                secretKeyRef:
                  name: gitea-admin-secret
                  key: username
                  optional: true
            - name: CATALYST_GITOPS_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitea-admin-secret
                  key: password
                  optional: true
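            # Illustrative clone-URL shapes the two values above are embedded
            # into (hosts and repo paths are placeholders, not the real wiring):
            #
            #   pre-cutover:  https://x-access-token:<github-pat>@github.com/<org>/<repo>.git
            #   post-cutover: https://gitea_admin:<gitea-admin-password>@<gitea-host>/<org>/<repo>.git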
            # POOL_DOMAIN_MANAGER_URL — base URL of the central Pool Domain
            # Manager (PDM) ingress on Catalyst-Zero (contabo). Sovereign-side
            # catalyst-api calls PDM's /api/v1/registrar/{r}/set-ns
            # endpoint for the Day-2 multi-domain "Add another parent
            # domain" flow (issue #879, parent epic #825 / #829).
            #
            # Why a public ingress URL (not an in-cluster Service):
            # the in-cluster default
            # `pool-domain-manager.openova-system.svc.cluster.local` ONLY
            # resolves on the contabo cluster (PDM lives in `openova-system`
            # ns there). On a franchised Sovereign post-handover, that DNS
            # name is NXDOMAIN, so every Day-2 add-domain call returned
            # `dial tcp: lookup pool-domain-manager.openova-system.svc.cluster.local
            # on 10.43.0.10:53: no such host` (caught live on otech103,
            # 2026-05-05 — issue #879 verification).
            #
            # The default below points at the public PDM ingress on
            # contabo (`pool.openova.io`). Per Inviolable Principle #4
            # (never hardcode), per-Sovereign overlays may override via
            # `catalystApi.poolDomainManagerURL` in values. Catalyst-Zero
            # (contabo) leaves this default — its catalyst-api Pod hits
            # the SAME public URL via its own loopback ingress (the proxy
            # is idempotent on the source cluster).
            #
            # Pairs with CATALYST_PDM_BASIC_AUTH_USER / _PASS below: the
            # PDM ingress at pool.openova.io is gated by Traefik basicAuth
            # (clusters/contabo-mkt/apps/pool-domain-manager/ingress.yaml).
            # Both halves wired together so a fresh Sovereign reaches PDM
            # without a manual env-var patch.
            #
            # NOTE — DUAL-MODE CONTRACT: this file is consumed BOTH by
            # Helm (per-Sovereign install via bp-catalyst-platform OCI)
            # AND by Kustomize (contabo-mkt's
            # clusters/contabo-mkt/apps/catalyst-platform). The default
            # literal below (no Helm template directives) keeps both build
            # paths clean. Per-Sovereign overlays override via the
            # HelmRelease overlay's `catalystApi.env` additional-env patch
            # (Helm-only, takes precedence over THIS default at
            # template-render time).
            - name: POOL_DOMAIN_MANAGER_URL
              value: "https://pool.openova.io"
            # CATALYST_PDM_BASIC_AUTH_USER / _PASS — basic-auth credentials
            # for the PDM public ingress (issue #879 Bug 2). The
            # Sovereign-side catalyst-api adds `Authorization: Basic …` to
            # every PDM call so the Traefik basicAuth Middleware in front of
            # pool.openova.io accepts the request. Without this, every
            # Day-2 add-domain call returns 401 from PDM (caught live on
            # otech103).
            #
            # Source Secret (`pdm-basicauth`, keys `username` + `password`)
            # is pre-provisioned by cloud-init on every Sovereign at
            # provision time, mirrored via the same Reflector seam
            # ghcr-pull / harbor-robot-token already use. optional=true so:
            #   - Catalyst-Zero pods (contabo's catalyst-api) start cleanly
            #     when the Secret is absent. On contabo the in-cluster
            #     Service path bypasses the ingress entirely and BasicAuth
            #     is a no-op.
            #   - CI / local dev / older Sovereigns that pre-date this
            #     provisioning seam start cleanly. POSTs without auth get
            #     401 from PDM with a clear log line, instead of the Pod
            #     crashlooping on start.
            #
            # Per Inviolable Principle #10: the credentials never enter a
            # logged struct or a deployment record — loaded into the Pod
            # env once at start, read per-call by pdmFlipNS only.
            - name: CATALYST_PDM_BASIC_AUTH_USER
              valueFrom:
                secretKeyRef:
                  name: pdm-basicauth
                  key: username
                  optional: true
            - name: CATALYST_PDM_BASIC_AUTH_PASS
              valueFrom:
                secretKeyRef:
                  name: pdm-basicauth
                  key: password
                  optional: true
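            # Illustrative sketch of the Traefik basicAuth gate described above.
            # The real Middleware lives in
            # clusters/contabo-mkt/apps/pool-domain-manager/ingress.yaml, not in
            # this chart; the apiVersion and users-Secret name are assumptions:
            #
            #   apiVersion: traefik.io/v1alpha1
            #   kind: Middleware
            #   metadata:
            #     name: pdm-basicauth
            #   spec:
            #     basicAuth:
            #       secret: pdm-basicauth-users   # htpasswd-formatted Secret (assumed name)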
            # CATALYST_HANDOVER_KEY_PATH — path to the RS256 PRIVATE key
            # catalyst-api uses to mint magic-link + handover JWTs. The
            # signer auto-generates the keypair on first start if absent.
            # MUST be on a writable PVC mount. Catalyst-Zero only.
            - name: CATALYST_HANDOVER_KEY_PATH
              value: /var/lib/catalyst/handover-jwt-private.pem
            # ── Magic-link auth (issue #608, Phase-8b Agent A) ──────────────
            # CATALYST_KC_CLIENT_ID — OIDC client ID for the Catalyst-Zero
            # UI (catalyst-zero-ui PKCE client). Defaults to "catalyst-zero-ui"
            # in code; override here for multi-tenant or custom client names.
            # optional=true: Sovereign clusters don't use this auth path.
            - name: CATALYST_KC_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: catalyst-magic-link-credentials
                  key: kc-client-id
                  optional: true
            # CATALYST_KC_REDIRECT_URI — OAuth callback URL the Keycloak
            # magic-link redirects to after verification (e.g.
            # https://console.openova.io/sovereign/auth/callback).
            # Per INVIOLABLE-PRINCIPLES #4: runtime configuration, not hardcoded.
            - name: CATALYST_KC_REDIRECT_URI
              valueFrom:
                secretKeyRef:
                  name: catalyst-magic-link-credentials
                  key: kc-redirect-uri
                  optional: true
            # CATALYST_SESSION_COOKIE_SECRET — HMAC-SHA256 key for signing the
            # catalyst_session HttpOnly cookie value. 32 random bytes (base64url
            # encoded). Rotation invalidates all active sessions.
            - name: CATALYST_SESSION_COOKIE_SECRET
              valueFrom:
                secretKeyRef:
                  name: catalyst-magic-link-credentials
                  key: session-cookie-secret
                  optional: true
            # CATALYST_POST_AUTH_REDIRECT — URL the browser is sent to after a
            # successful magic-link / PIN callback. Defaults to /wizard in code.
            # Catalyst-Zero (contabo) routes the UI under the /sovereign prefix
            # (Traefik strip-prefix is transparent to the server-side Location
            # header), so contabo overrides this to /sovereign/wizard via the
            # per-environment overlay. On a freshly franchised Sovereign the
            # wizard is mothership-only — empty page on /sovereign/wizard.
            # The post-handover Sovereign Console homepage is /sovereign/components,
            # so that's the default we now ship (issue #901, 2026-05-05).
            #
            # DUAL-MODE CONTRACT — see the CATALYST_POWERDNS_API_URL block above:
            # this file is consumed by both Helm (Sovereign) and Kustomize
            # (contabo-mkt). Helm template directives (curly-brace syntax) in
            # `value:` break the Kustomize render with "yaml: invalid map key".
            # So this default is a literal. Per-Sovereign overrides go through
            # the HelmRelease overlay's `catalystApi.env` additional-env patch,
            # NOT through this file.
            #
            # Per INVIOLABLE-PRINCIPLES #4: the override seam exists (overlay
            # env patch); only the chart-shipped default is a literal.
            - name: CATALYST_POST_AUTH_REDIRECT
              value: "/sovereign/components"
            # ── Option-B magic-link: openova realm service account ───────────
            # CATALYST_OPENOVA_KC_ADDR — Keycloak base URL for the openova realm.
            # Defaults in code to keycloak-zero.keycloak-zero.svc (in-cluster
            # on Catalyst-Zero). optional=true: Sovereign clusters don't run
            # the openova realm.
            - name: CATALYST_OPENOVA_KC_ADDR
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-addr
                  optional: true
            - name: CATALYST_OPENOVA_KC_REALM
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-realm
                  optional: true
            - name: CATALYST_OPENOVA_KC_SA_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-sa-client-id
                  optional: true
            - name: CATALYST_OPENOVA_KC_SA_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-sa-client-secret
                  optional: true
            # CATALYST_OPENOVA_KC_AUDIENCE — OIDC audience for KC token-exchange.
            # Defaults to "catalyst-zero-ui" in code. optional=true.
            - name: CATALYST_OPENOVA_KC_AUDIENCE
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-audience
                  optional: true
            # CATALYST_SMTP_HOST / CATALYST_SMTP_PORT — Stalwart SMTP relay for
            # magic-link email delivery. Defaults in code to
            # stalwart-web.stalwart.svc.cluster.local:587. optional=true.
            - name: CATALYST_SMTP_HOST
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-host
                  optional: true
            - name: CATALYST_SMTP_PORT
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-port
                  optional: true
            - name: CATALYST_SMTP_USER
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-user
                  optional: true
            - name: CATALYST_SMTP_PASS
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-pass
                  optional: true
            - name: CATALYST_SMTP_FROM
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-from
                  optional: true
            # CATALYST_SESSION_COOKIE_DOMAIN — optional domain scoping for the
            # catalyst_session + catalyst_refresh cookies.
            #
            # Why this is empty by default (issue #910 Bug 2)
            # ===============================================
            # Pre-1.4.19 this was hardcoded `console.openova.io` because that
            # was the host Catalyst-Zero (contabo) serves both /sovereign/wizard
            # and /sovereign/auth/magic from. On contabo that worked: the
            # request host == the cookie domain, so the browser accepted the
            # Set-Cookie and re-presented it on every subsequent request.
            #
            # On a freshly franchised Sovereign (e.g. console.otech105.omani.works,
            # caught live 2026-05-05) the same hardcoded value made the
            # browser refuse to bind the cookie at all: the Set-Cookie header
            # had `Domain=console.openova.io` while the request host was
            # `console.otech105.omani.works`. RFC 6265 §5.3 step 6 rejects any
            # Set-Cookie where the request URI's host is not the cookie's
            # domain (or a sub-domain). The browser silently dropped the
            # cookie → next /api/* request had no session → backend redirected
            # to /login → infinite loop. Login broke for every Sovereign.
            #
            # Empty value contract: when CATALYST_SESSION_COOKIE_DOMAIN is
            # empty, the auth handler omits the Domain attribute from
            # Set-Cookie. Per RFC 6265 the browser binds the cookie to the
            # exact request host. That is the correct behaviour on BOTH:
            #   - Sovereign: request host = console., cookie binds
            #     there, /api/* on the same host re-presents it.
            #   - Catalyst-Zero (contabo): request host = console.openova.io,
            #     cookie binds there. Wizard + magic-link callbacks are
            #     served from the same Ingress so a single cookie jar is
            #     sufficient.
            #
            # Per the dual-mode contract documented in the
            # CATALYST_POWERDNS_API_URL block above, this MUST stay a literal
            # value (no Helm template directives) so the Kustomize-mode
            # contabo build keeps parsing. Per-Sovereign overlays MAY
            # override via the `catalystApi.env` additional-env patch in the
            # per-cluster HelmRelease (Helm-only codepath, takes precedence
            # over this default at template-render time).
            - name: CATALYST_SESSION_COOKIE_DOMAIN
              value: ""
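            # Illustration of the two Set-Cookie shapes described above (header
            # values are abbreviated, not verbatim output of the auth handler):
            #
            #   With a Domain attribute (pre-1.4.19 behaviour):
            #     Set-Cookie: catalyst_session=<sig>; Domain=console.openova.io; HttpOnly
            #     -> accepted when the request host is console.openova.io,
            #        silently dropped when it is console.otech105.omani.works.
            #
            #   With the Domain attribute omitted (empty env, current default):
            #     Set-Cookie: catalyst_session=<sig>; HttpOnly
            #     -> host-only cookie, bound to whichever host served the response.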
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              # tofu provider plugins (hcloud ~80MB, dynadot ~30MB) + state +
              # plan files easily exceed the prior 64Mi cap. 1Gi gives headroom
              # for parallel provider init and sustained `apply` work.
              cpu: 1000m
              memory: 1Gi
          # Liveness vs readiness — the split is REQUIRED, not cosmetic
          # (issue #530). /healthz is liveness: it returns 200 whenever
          # the catalyst-api process is up and the HTTP server is
          # serving. /readyz is readiness: it returns 200 only when the
          # primary Sovereign's Pod + Deployment informers are synced
          # (or no Sovereigns are registered yet).
          #
          # The previous wiring pointed BOTH probes at /healthz AND
          # /healthz performed the strict informer-sync check. The
          # crashloop chain that followed:
          #
          #   1. Operator POSTs a fresh deployment.
          #   2. catalyst-api registers the Sovereign in k8scache and
          #      starts looking for a kubeconfig file on the PVC.
          #   3. The kubeconfig will NOT arrive until the new Sovereign's
          #      cloud-init runs (~60-120s) and PUTs it back. Until
          #      then, informers cannot start and sync flips false.
          #   4. /healthz returns 503. kubelet kills the Pod on the
          #      next liveness probe (~33s).
          #   5. The restarted Pod restores deployments from the PVC,
          #      re-registers the Sovereign, and re-enters the same
          #      no-kubeconfig state. The loop repeats.
          #   6. The Service has zero ready endpoints throughout. nginx
          #      returns 502 to cloud-init's kubeconfig PUT. The PUT
          #      never reaches catalyst-api. Provisioning stalls forever.
          #
          # The fix: liveness must be process-level (am I up?), NOT
          # workload-level (do I have a kubeconfig?). The strict
          # informer-sync check stays — moved to /readyz — so a Pod
          # whose primary Sovereign is mid-sync briefly drops out of
          # the Service rotation but is NOT restarted. The kubeconfig
          # PUT endpoint reaches catalyst-api the moment cloud-init
          # calls it, breaking the deadlock.
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            # readOnlyRootFilesystem deliberately false: the bootstrap installer
            # writes kubeconfig temp files (mode 0600) under /tmp and helm
            # downloads chart caches under $HOME. Per Catalyst security policy
            # these writes are scoped via the emptyDirs below, never to the
            # image's actual root FS.
            readOnlyRootFilesystem: false
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: home
              mountPath: /home/nonroot
            # Catalyst PVC — mounted at /var/lib/catalyst so two
            # subdirectories live on the same single-attach volume:
            #
            #   deployments/.json — flat-file deployment store.
            #     Rehydrating from this directory on every catalyst-api
            #     restart closes the user-reported regression where a
            #     deployment id created at 12:57 vanished after 6 image
            #     rolls. The store walks every *.json on startup;
            #     in-flight rows are rewritten to `failed` with operator
            #     instructions for purging orphaned Hetzner resources.
            #
            #   kubeconfigs/.yaml — plaintext kubeconfig POSTed
            #     back from cloud-init via the bearer-token endpoint
            #     (issue #183, Option D). Mode 0600 per file. The
            #     path is persisted in the deployment record so a
            #     Pod restart mid-Phase-1 reattaches the helmwatch
            #     goroutine.
            #
            # One PVC, one mount — keeps the failure modes (PVC
            # unbind, fs full) bounded to one volume, and lets the
            # Go process create both subdirectories on startup
            # without a second volume claim or init container.
            - name: catalyst
              mountPath: /var/lib/catalyst
            # k8scache disk-snapshot mount (issue #321). Separate PVC
            # so cache size is independent of deployment-record
            # storage. The k8scache loop writes one JSON per
            # (cluster, kind) here, mode 0600. Pruned by the loop
            # itself when a snapshot ages past 1h.
            - name: sov-cache
              mountPath: /var/cache/sov-cache
            # handover-jwt-public — RS256 public key JWK distributed by
            # cloud-init from Catalyst-Zero's signing keypair. Mounted
            # read-only as a directory under /etc/catalyst/ (NOT under
            # /var/lib/catalyst because that is the catalyst-api PVC; a
            # leftover empty directory at the legacy file path from
            # pre-#606 installs would collide with a subPath file mount on
            # re-provision). The JWK lives at
            # /etc/catalyst/handover-jwt-public/public.jwk — see
            # CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH above. optional=true on
            # the Secret so pods start on Catalyst-Zero (which is the
            # SIGNER, not the verifier) and in CI where the Secret may be
            # absent.
            - name: handover-jwt-public
              mountPath: /etc/catalyst/handover-jwt-public
              readOnly: true
      volumes:
        - name: tmp
          emptyDir:
            # 2Gi to hold the per-deployment OpenTofu workdir tree under
            # /tmp/catalyst/tofu// (provider plugins + state
            # + plan binary). Each Sovereign run gets its own subdirectory.
            sizeLimit: 2Gi
        - name: home
          emptyDir:
            sizeLimit: 256Mi
        # Persistent catalyst-api state — mounted at /var/lib/catalyst
        # so deployments/ and kubeconfigs/ share one volume. The PVC
        # must already exist in the same namespace under the name
        # catalyst-api-deployments; see api-deployments-pvc.yaml in
        # this chart. Single-attach (RWO) is fine because the
        # Deployment is single-replica with the Recreate strategy
        # declared above; a future HA rework would need RWX or a
        # different persistence layer.
        - name: catalyst
          persistentVolumeClaim:
            claimName: catalyst-api-deployments
        # k8scache disk-snapshot PVC (issue #321). 5Gi RWO; see
        # api-cache-pvc.yaml for the sizing + cold-start contract.
        - name: sov-cache
          persistentVolumeClaim:
            claimName: catalyst-api-cache
        # handover-jwt-public — RS256 public key JWK written by cloud-init
        # from Catalyst-Zero's signing keypair. Secret is optional so
        # Catalyst-Zero pods (the signer) and CI start without it.
        - name: handover-jwt-public
          secret:
            secretName: catalyst-handover-jwt-public
            optional: true
            items:
              - key: public.jwk
                path: public.jwk