openova/products/catalyst/chart/templates/api-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: catalyst-api
  labels:
    app.kubernetes.io/name: catalyst-api
    app.kubernetes.io/component: api
  annotations:
    # `kustomize.toolkit.fluxcd.io/force: enabled` is the durable
    # remediation for the `RollingUpdate -> Recreate` strategy-flip
    # collision documented in docs/CHART-AUTHORING.md §"Strategy flips
    # on existing Deployments".
    #
    # Failure mode this addresses
    # ---------------------------
    # On 2026-04-29 the `catalyst` Flux Kustomization on contabo-mkt
    # stuck at Ready=False with:
    #
    #   Deployment.apps "catalyst-api" is invalid:
    #     spec.strategy.rollingUpdate: Forbidden:
    #       may not be specified when strategy `type` is 'Recreate'
    #
    # Root cause: the live Deployment had been previously created with
    # the default `RollingUpdate` strategy (so `rollingUpdate.maxSurge=25%`
    # and `maxUnavailable=25%` were present on the live object, owned
    # by the `kubectl-client-side-apply` field manager). Flux's
    # kustomize-controller submits this manifest via Server-Side Apply
    # with field manager `kustomize-controller`. SSA's contract is
    # "set the fields you declare" — it does NOT remove fields owned
    # by other managers. Result: post-merge object had `type: Recreate`
    # AND the residual `rollingUpdate.*` block, which the API server's
    # validator rejects as invalid (Recreate forbids any rollingUpdate
    # keys). SSA is REQUIRED to reject the merge. No SSA-only chart
    # change can fix this.
    #
    # Why `$patch: replace` does NOT solve this
    # -----------------------------------------
    # The Strategic Merge Patch directive `$patch: replace` would tell
    # an SMP-aware merger to REPLACE the strategy block instead of
    # merging into it. But:
    #   - SSA rejects `$patch` outright with "field not declared in
    #     schema" (it's not in apps/v1 Deployment).
    #   - kubectl strict-decoding rejects `$patch` on CREATE under any
    #     mode with "unknown field spec.strategy.$patch" — so adding
    #     it to the chart manifest BREAKS fresh installs.
    # `$patch: replace` is a runtime SMP directive, never a chart-spec
    # value. It belongs in a Kustomize `patches:` entry (where the
    # kustomize binary consumes it at build time and emits a clean
    # output) — never inline in a base resource.
    #
    # Why the Flux force annotation IS the right fix
    # ----------------------------------------------
    # When kustomize-controller's SSA submission fails dry-run with an
    # Invalid response, this annotation directs the controller to
    # recover by deleting and recreating THIS resource specifically
    # (not the whole Kustomization). The recreated Deployment has no
    # residual `rollingUpdate.*` fields — the regression cannot
    # recur on the rebuilt object.
    #
    # That is NOT a "kubectl delete bandaid": the annotation is part
    # of the IaC manifest, version-controlled, applied declaratively
    # via Flux on every reconciliation, scoped to this single
    # Deployment, and removed only by editing the chart. Per
    # docs/INVIOLABLE-PRINCIPLES.md #3 (Follow the documented
    # architecture, exactly — Flux is the ONLY GitOps reconciler) and
    # #4 (Never hardcode — runtime configuration in Git, not in shell
    # history): the remediation lives in source control.
    #
    # Why this Deployment in particular tolerates a recreate: the
    # spec declares `strategy.type: Recreate`, so the steady-state
    # update path is delete-and-recreate anyway. Flux falling back to
    # delete-and-recreate on a strategy-flip is a no-op relative to a
    # normal pod-spec change. The deployments PVC is ReadWriteOnce;
    # the recreate flow detaches it from the old Pod before mounting
    # it on the new one, which is exactly the contract `Recreate`
    # enforces. State persistence is maintained because the PVC
    # itself is NOT recreated by this annotation — only the
    # Deployment resource is.
    kustomize.toolkit.fluxcd.io/force: enabled
    # Reloader watches the sovereign-fqdn + handover-jwt-public ConfigMaps/Secrets
    # this Pod reads via valueFrom. On Sovereigns, those resources are applied
    # by the sovereign-tls Kustomization concurrently with the bp-catalyst-platform
    # HelmRelease. If the Pod started first, optional valueFrom resolves to ""
    # and SOVEREIGN_FQDN stays empty for the lifetime of the Pod — every handover
    # then fails the audience check with 401 "invalid audience" (caught live on
    # otech62, 2026-05-03). Reloader rolls the Deployment when those resources
    # land, fixing the race without requiring strict Flux dependsOn ordering.
    configmap.reloader.stakater.com/reload: "sovereign-fqdn"
    secret.reloader.stakater.com/reload: "handover-jwt-public"
spec:
  replicas: 1
  # Recreate strategy is required because the deployments PVC is RWO
  # (single-attach). A rolling update would try to schedule a second
  # Pod that mounts the same PVC, which Kubernetes rejects as a
  # MultiAttachError. RWX with a multi-writer-aware filesystem
  # (NFS, CephFS) is the path to HA, but Catalyst-Zero today is
  # single-replica by design — the wizard is interactive and PDM owns
  # cross-tenant isolation, so a single API server is sufficient.
  #
  # The strategy-flip regression that bit contabo-mkt on 2026-04-29
  # (apply over a pre-existing RollingUpdate Deployment fails with
  # `spec.strategy.rollingUpdate: Forbidden`) is recovered by the
  # `kustomize.toolkit.fluxcd.io/force: enabled` annotation above —
  # see that annotation's comment for the full failure-mode analysis
  # and the docs/CHART-AUTHORING.md §"Strategy flips on existing
  # Deployments" entry. Do NOT add an inline `$patch: replace` here:
  # it BREAKS fresh installs (kubectl strict-decoding rejects
  # `spec.strategy.$patch` on create), and Flux's SSA path strips it
  # anyway. The integration test at tests/integration/strategy-flip.yaml
  # asserts both the recovery path works and the regression mode is
  # still detected.
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: catalyst-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: catalyst-api
    spec:
      imagePullSecrets:
        - name: ghcr-pull
      # fsGroup applies to the volumes mounted into the Pod so the
      # non-root container UID (65534) can write to the deployments
      # PVC. Without this, Hetzner Cloud Volumes default to root:root
      # and the catalyst-api process gets EACCES on every store.Save —
      # surfacing as the "deployment store unavailable" warning at
      # startup and silent persistence failures at runtime.
      #
      # fsGroupChangePolicy: OnRootMismatch limits the chown traversal
      # to first start (where the volume is freshly provisioned with
      # the wrong UID). Subsequent restarts skip the recursive chown
      # if the root dir already matches, keeping Pod start times
      # bounded as the deployments directory grows.
      securityContext:
        fsGroup: 65534
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: catalyst-api
          image: "ghcr.io/openova-io/openova/catalyst-api:9a58289"
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              protocol: TCP
          env:
            - name: PORT
              value: "8080"
            - name: CORS_ORIGIN
              value: "https://console.openova.io"
            - name: DYNADOT_API_KEY
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: api-key
                  # optional=true: Sovereign clusters don't hold Dynadot
                  # credentials — their tenant DNS is served by the
                  # Sovereign's own PowerDNS instance, not the parent
                  # account. Catalyst-Zero (contabo-mkt) supplies the
                  # real secret; Sovereigns use an empty stub or omit it
                  # entirely. Without optional=true the pod refuses to
                  # start when the secret is absent (issue #547).
                  optional: true
            - name: DYNADOT_API_SECRET
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: api-secret
                  optional: true
            # DYNADOT_MANAGED_DOMAINS — comma-separated list of pool domains
            # the same Dynadot account manages. Per docs/INVIOLABLE-PRINCIPLES.md
            # #4, this is runtime configuration so adding a third pool domain
            # (e.g. acme.io) does NOT require a code change — only a secret
            # update. The Dynadot API is account-scoped (one api-key/api-secret
            # pair covers every domain owned by the account); this list scopes
            # which domains the catalyst-api is *allowed* to write records for,
            # defending against misconfiguration that would let a wizard-
            # supplied poolDomain trigger writes against an unrelated domain.
            - name: DYNADOT_MANAGED_DOMAINS
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: domains
                  # optional=true so deployments using the legacy single-value
                  # `domain` key (pre-#108) keep working until the secret is
                  # migrated; the dynadot package falls through to DYNADOT_DOMAIN
                  # then to its built-in defaults if neither key is present.
                  optional: true
            - name: DYNADOT_DOMAIN
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: domain
                  optional: true
            # CATALYST_TOFU_WORKDIR — provisioner runs `tofu init/plan/apply`
            # inside this directory. PVC-backed (catalyst-api-deployments) so
            # in-progress tofu state survives Pod restarts. Without this,
            # any catalyst-api Pod roll mid-apply (e.g. an unrelated chart
            # bump that triggers rolling restart on Catalyst-Zero, or a
            # node reboot) leaks Hetzner resources because partial apply
            # state is in emptyDir. Caught live on otech64, 2026-05-03:
            # contabo's catalyst-api was rolled at 21:40:11 (3 minutes
            # into otech64's tofu apply), terminal_LB created without its
            # control_plane target, and otech64 came up with an unreachable
            # 49.12.16.160 LB. Reasonable for fsGroup=65534 above to
            # provide write access to /var/lib/catalyst (PVC mountPath).
            - name: CATALYST_TOFU_WORKDIR
              value: /var/lib/catalyst/tofu
            # CATALYST_DEPLOYMENTS_DIR — flat-file store for deployment
            # records (one JSON file per deployment id). Backed by the
            # PVC mount below so deployments persist across Pod
            # restarts. Each record is the full Deployment state with
            # credentials redacted; see internal/store/store.go.
            - name: CATALYST_DEPLOYMENTS_DIR
              value: /var/lib/catalyst/deployments
            # CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS — defensive floor
            # only. The load-bearing termination gate is now the
            # informer's HasSynced signal (after WaitForCacheSync the
            # full bp-* HelmRelease set is in the cache, regardless of
            # cardinality). Set to 1 so the watch still refuses to
            # terminate when the cache is completely empty (the
            # "bootstrap-kit Kustomization never reconciled at all"
            # footgun, classified as OutcomeFluxNotReconciling).
            #
            # Earlier values (11, then 38) tied this to the kit count;
            # that coupling is brittle — otech48 (2026-05-03) sat
            # phase1-watching forever because the env was 38 but the
            # kit had drifted to 37. The HasSynced gate is drift-proof.
            - name: CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS
              value: "1"
            # CATALYST_KUBECONFIGS_DIR — sibling directory on the same
            # PVC for the plaintext kubeconfigs the new Sovereign POSTs
            # back via the bearer-token endpoint (issue #183, Option D).
            # One <id>.yaml per deployment, mode 0600. The store JSON
            # record carries only the file path + a SHA-256 hash of
            # the bearer; the plaintext kubeconfig is NEVER serialized
            # into the JSON.
            - name: CATALYST_KUBECONFIGS_DIR
              value: /var/lib/catalyst/kubeconfigs
            # CATALYST_API_PUBLIC_URL — the public origin the new
            # Sovereign's cloud-init PUTs its kubeconfig back to. The
            # OpenTofu module templates this into the Sovereign's
            # user_data so the Sovereign knows where to call. Per
            # docs/INVIOLABLE-PRINCIPLES.md #4 this is runtime
            # configuration; air-gapped franchises override it
            # without code change.
            - name: CATALYST_API_PUBLIC_URL
              value: https://console.openova.io/sovereign
            # CATALYST_K8SCACHE_KUBECONFIGS_DIR — issue #321. Directory
            # the k8scache.Factory reads kubeconfigs from at startup.
            # The data-plane SharedInformerFactory opens one informer
            # per kubeconfig file; the cloud-init postback handler
            # (PUT /api/v1/deployments/{id}/kubeconfig) writes here on
            # Phase-1 attach so a fresh Sovereign id is automatically
            # picked up at next catalyst-api restart. The same PVC
            # (catalyst-api-deployments) backs the existing
            # deployments store; the data-plane reads the kubeconfigs/
            # subdirectory directly.
            - name: CATALYST_K8SCACHE_KUBECONFIGS_DIR
              value: /var/lib/catalyst/kubeconfigs
            # CATALYST_K8SCACHE_SNAPSHOT_DIR — issue #321 cold-start
            # mitigation. Backed by a separate 5Gi PVC
            # (catalyst-api-cache) so its size is independent of the
            # deployments store. See api-cache-pvc.yaml for the sizing
            # rationale + the cold-start latency contract.
            - name: CATALYST_K8SCACHE_SNAPSHOT_DIR
              value: /var/cache/sov-cache
            # CATALYST_K8SCACHE_KINDS_CONFIGMAP — optional ConfigMap
            # extending the built-in kinds registry. Per docs/
            # INVIOLABLE-PRINCIPLES.md #4 a new watched GVR (e.g.
            # HelmRelease, Kustomization) is a runtime configuration
            # change, not a code change. Empty disables ConfigMap
            # loading; built-in DefaultKinds is used.
            - name: CATALYST_K8SCACHE_KINDS_CONFIGMAP
              value: catalyst-k8scache-kinds
            - name: CATALYST_K8SCACHE_KINDS_CONFIGMAP_NAMESPACE
              value: catalyst
            # CATALYST_GHCR_PULL_TOKEN — long-lived GHCR pull token that
            # the provisioner stamps onto every Request and the OpenTofu
            # cloud-init template writes into the new Sovereign's
            # flux-system/ghcr-pull Secret so Flux source-controller
            # can pull private bp-* OCI artifacts from
            # ghcr.io/openova-io/. Without this, Phase 1 stalls at
            # bp-cilium with "secrets ghcr-pull not found" — verified
            # live on omantel.omani.works pre-fix.
            #
            # optional: true — when the Secret or key is missing the
            # Pod still starts (with the env var unset). The
            # provisioner's Validate() rejects deployments that need
            # the token (Phase 1 bootstrap-kit pulls private bp-*
            # charts) with a clear pointer to docs/SECRET-ROTATION.md,
            # so a misconfigured catalyst-api fails fast on
            # /api/v1/deployments POST instead of silently mid-apply.
            # /healthz, /api/v1/credentials/validate, and the BYO
            # registrar proxy keep working — they don't read the
            # token at all.
            #
            # Rotation: yearly, see docs/SECRET-ROTATION.md. The Secret
            # is created out-of-band by an operator (never via Flux,
            # never committed to git) — the chart references it but
            # does not template it.
            - name: CATALYST_GHCR_PULL_TOKEN
              valueFrom:
                secretKeyRef:
                  name: catalyst-ghcr-pull-token
                  key: token
                  optional: true
            # CATALYST_HARBOR_ROBOT_TOKEN — central Harbor proxy-cache
            # robot account secret (issue #557 + #557 follow-up). The
            # value is interpolated into the new Sovereign's
            # /etc/rancher/k3s/registries.yaml at cloud-init time so
            # containerd authenticates against harbor.openova.io's proxy
            # projects (proxy-dockerhub etc).
            #
            # Provisioning seam (catalyst-system Pod gets the Secret):
            #   1. Tofu var.harbor_robot_token enters cloud-init
            #      (infra/hetzner/cloudinit-control-plane.tftpl).
            #   2. Cloud-init writes /var/lib/catalyst/harbor-robot-
            #      token-secret.yaml into flux-system ns with the
            #      auto-mirror Reflector annotations
            #      (reflection-auto-enabled: "true").
            #   3. runcmd applies it BEFORE flux-bootstrap, so the
            #      Secret exists before any Helm release runs.
            #   4. bp-reflector (slot 05a) propagates it into every
            #      namespace (incl. catalyst-system) on first reconcile.
            #   5. This Pod's secretKeyRef resolves once the mirror lands.
            # Mirrors the canonical pattern that flux-system/ghcr-pull
            # already uses (PR #543).
            #
            # NOT optional — provisioner.Validate() rejects deployments
            # with an empty token. The architecture mandate is that every
            # Sovereign image pull goes through harbor.openova.io; falling
            # through to docker.io is forbidden (rate-limit makes a fresh
            # Hetzner IP unbootable within minutes). When `optional: true`
            # was previously contemplated we chose against it: a missing
            # token must surface immediately as a Pod start failure
            # (CreateContainerConfigError), not silently mid-provision.
            #
            # Rotation: yearly. Re-render Tofu plan → re-apply cloud-init
            # → kubectl apply runs against the existing Secret with
            # rotated bytes; bp-reflector propagates the rotation to all
            # mirrored copies on the next watch tick. Plaintext NEVER
            # lives in git.
            - name: CATALYST_HARBOR_ROBOT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: harbor-robot-token
                  key: token
            # CATALYST_POWERDNS_API_KEY — contabo PowerDNS API key (PR
            # #681 followup). The value is interpolated into the new
            # Sovereign's `cert-manager/powerdns-api-credentials` Secret
            # at cloud-init time so bp-cert-manager-powerdns-webhook
            # can write DNS-01 challenge TXT records to contabo's
            # authoritative omani.works zone.
            #
            # Provisioning seam:
            #   1. Source: contabo's `openova-system/powerdns-api-
            #      credentials` Secret (created by bp-powerdns chart).
            #   2. Reflector mirrors it into every namespace incl.
            #      catalyst (annotations on the source: reflection-
            #      auto-enabled: "true", reflection-auto-namespaces: "").
            #   3. This Pod resolves it via secretKeyRef.
            #   4. provisioner.New() reads CATALYST_POWERDNS_API_KEY at
            #      startup, stamps onto every Request.
            #   5. cloud-init writes the Sovereign-side Secret in
            #      cert-manager namespace BEFORE Flux reconciles
            #      bp-cert-manager-powerdns-webhook.
            #
            # optional=true: Catalyst-Zero pods on Sovereigns don't have
            # this Secret reflected (their PowerDNS is local) so the
            # bootstrap shape stays clean across both contabo+Sovereign
            # catalyst-api deployments.
            - name: CATALYST_POWERDNS_API_KEY
              valueFrom:
                secretKeyRef:
                  name: powerdns-api-credentials
                  key: api-key
                  optional: true
            # ── /auth/handover Keycloak service-account (issue #606) ──────────
            # CATALYST_KC_ADDR — Keycloak base URL. Defaults to in-cluster
            # service FQDN in code; override here for non-standard Sovereign
            # Keycloak deployments.
            # optional=true: Catalyst-Zero pods don't run Keycloak locally.
            - name: CATALYST_KC_ADDR
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: addr
                  optional: true
            - name: CATALYST_KC_REALM
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: realm
                  optional: true
            - name: CATALYST_KC_SA_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: client-id
                  optional: true
            - name: CATALYST_KC_SA_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: client-secret
                  optional: true
            # CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH — path to the JWK file that
            # holds the RS256 public key for validating one-time handover JWTs.
            # The K8s Secret `catalyst-handover-jwt-public` (created by
            # cloud-init at provision time, see infra/hetzner/cloudinit-control-
            # plane.tftpl) is mounted as a directory at /etc/catalyst/handover-
            # jwt-public/, so the JWK lives at /etc/catalyst/handover-jwt-public/
            # public.jwk. We deliberately mount the Secret as a directory rather
            # than using subPath: the catalyst-api PVC at /var/lib/catalyst is
            # ReadWriteOnce and a leftover empty directory at the legacy path
            # /var/lib/catalyst/handover-jwt-public.jwk/ from earlier installs
            # (where the Secret was missing and Kubernetes created an empty
            # directory in the volume) collides with the subPath file mount on
            # re-provisioning. Mounting under /etc/ keeps the JWK off the PVC
            # entirely so the conflict cannot recur. Caught live on otech48,
            # 2026-05-03.
            - name: CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH
              value: /etc/catalyst/handover-jwt-public/public.jwk
            # SOVEREIGN_FQDN — Sovereign's public FQDN. The /auth/handover
            # validator (auth_handover.go) reads this to compute the expected
            # JWT audience claim ("https://console." + SOVEREIGN_FQDN). When
            # unset on a Sovereign, the audience check collapses to "https://
            # console." and every valid token is rejected with "invalid
            # audience" 401 — caught live on otech48, 2026-05-03.
            #
            # NOTE: this file is consumed BOTH by Helm (per-Sovereign install
            # via bp-catalyst-platform OCI chart) AND by Kustomize (contabo-
            # mkt's clusters/contabo-mkt/apps/catalyst-platform Kustomization
            # at path: ./products/catalyst/chart/templates). Kustomize parses
            # raw YAML — Helm template syntax (double-curly directives) here
            # breaks the Kustomize build (caught live on contabo 2026-05-03
            # from commit adf8dc7d: "yaml: invalid map key").
            #
            # Solution: read the value from a ConfigMap that exists ONLY on
            # Sovereigns (not contabo). On contabo the optional reference
            # resolves to empty (correct — catalyst-api on contabo is the
            # SIGNER never the validator, /auth/handover never hits there).
            # On Sovereigns, clusters/_template/sovereign-tls/sovereign-fqdn-
            # configmap.yaml renders the ConfigMap from envsubst-ed
            # ${SOVEREIGN_FQDN} when Flux applies the kustomization.
            - name: SOVEREIGN_FQDN
              valueFrom:
                configMapKeyRef:
                  name: sovereign-fqdn
                  key: fqdn
                  optional: true
            # CATALYST_HANDOVER_KEY_PATH — path to the RS256 PRIVATE key
            # catalyst-api uses to mint magic-link + handover JWTs. The
            # signer auto-generates the keypair on first start if absent.
            # MUST be on a writable PVC mount. Catalyst-Zero only.
            - name: CATALYST_HANDOVER_KEY_PATH
              value: /var/lib/catalyst/handover-jwt-private.pem
            # ── Magic-link auth (issue #608, Phase-8b Agent A) ──────────────
            # CATALYST_KC_CLIENT_ID — OIDC client ID for the Catalyst-Zero
            # UI (catalyst-zero-ui PKCE client). Defaults to "catalyst-zero-ui"
            # in code; override here for multi-tenant or custom client names.
            # optional=true: Sovereign clusters don't use this auth path.
            - name: CATALYST_KC_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: catalyst-magic-link-credentials
                  key: kc-client-id
                  optional: true
            # CATALYST_KC_REDIRECT_URI — OAuth callback URL the Keycloak magic-
            # link redirects to after verification (e.g.
            # https://console.openova.io/sovereign/auth/callback).
            # Per INVIOLABLE-PRINCIPLES #4: runtime configuration, not hardcoded.
            - name: CATALYST_KC_REDIRECT_URI
              valueFrom:
                secretKeyRef:
                  name: catalyst-magic-link-credentials
                  key: kc-redirect-uri
                  optional: true
            # CATALYST_SESSION_COOKIE_SECRET — HMAC-SHA256 key for signing the
            # catalyst_session HttpOnly cookie value. 32 random bytes (base64url
            # encoded). Rotation invalidates all active sessions.
            - name: CATALYST_SESSION_COOKIE_SECRET
              valueFrom:
                secretKeyRef:
                  name: catalyst-magic-link-credentials
                  key: session-cookie-secret
                  optional: true
            # CATALYST_POST_AUTH_REDIRECT — URL the browser is sent to after a
            # successful magic-link callback. Defaults to /wizard in code; set
            # to /sovereign/wizard here because Catalyst-Zero routes the UI
            # under the /sovereign prefix and the Traefik strip-prefix middleware
            # is transparent to the server-side Location header.
            # Per INVIOLABLE-PRINCIPLES #4: runtime configuration, not hardcoded.
            - name: CATALYST_POST_AUTH_REDIRECT
              value: /sovereign/wizard
            # ── Option-B magic-link: openova realm service account ───────────
            # CATALYST_OPENOVA_KC_ADDR — Keycloak base URL for the openova realm.
            # Defaults in code to keycloak-zero.keycloak-zero.svc (in-cluster
            # on Catalyst-Zero). optional=true: Sovereign clusters don't run
            # the openova realm.
            - name: CATALYST_OPENOVA_KC_ADDR
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-addr
                  optional: true
            - name: CATALYST_OPENOVA_KC_REALM
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-realm
                  optional: true
            - name: CATALYST_OPENOVA_KC_SA_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-sa-client-id
                  optional: true
            - name: CATALYST_OPENOVA_KC_SA_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-sa-client-secret
                  optional: true
            # CATALYST_OPENOVA_KC_AUDIENCE — OIDC audience for KC token-exchange.
            # Defaults to "catalyst-zero-ui" in code. optional=true.
            - name: CATALYST_OPENOVA_KC_AUDIENCE
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-audience
                  optional: true
            # CATALYST_SMTP_HOST / CATALYST_SMTP_PORT — Stalwart SMTP relay for
            # magic-link email delivery. Defaults in code to
            # stalwart-web.stalwart.svc.cluster.local:587. optional=true.
            - name: CATALYST_SMTP_HOST
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-host
                  optional: true
            - name: CATALYST_SMTP_PORT
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-port
                  optional: true
            - name: CATALYST_SMTP_USER
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-user
                  optional: true
            - name: CATALYST_SMTP_PASS
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-pass
                  optional: true
            - name: CATALYST_SMTP_FROM
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-from
                  optional: true
            # CATALYST_SESSION_COOKIE_DOMAIN — optional domain scoping for the
            # catalyst_session + catalyst_refresh cookies. Set to console.openova.io
            # so both /sovereign/wizard and /sovereign/auth/magic share the cookie jar.
            - name: CATALYST_SESSION_COOKIE_DOMAIN
              value: "console.openova.io"
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              # tofu provider plugins (hcloud ~80MB, dynadot ~30MB) + state +
              # plan files easily exceed the prior 64Mi cap. 1Gi gives headroom
              # for parallel provider init and sustained `apply` work.
              cpu: 1000m
              memory: 1Gi
          # Liveness vs readiness — the split is REQUIRED, not cosmetic
          # (issue #530). /healthz is liveness: it returns 200 whenever
          # the catalyst-api process is up and the HTTP server is
          # serving. /readyz is readiness: it returns 200 only when the
          # primary Sovereign's Pod + Deployment informers are synced
          # (or no Sovereigns are registered yet).
          #
          # The previous wiring pointed BOTH probes at /healthz AND
          # /healthz performed the strict informer-sync check. The
          # crashloop chain that followed:
          #
          #   1. Operator POSTs a fresh deployment.
          #   2. catalyst-api registers the Sovereign in k8scache and
          #      starts looking for a kubeconfig file on the PVC.
          #   3. Kubeconfig will NOT arrive until the new Sovereign's
          #      cloud-init runs (~60-120s) and PUTs it back. Until
          #      then, informers cannot start, sync flips false.
          #   4. /healthz returns 503. kubelet kills the Pod on the
          #      next liveness probe (~33s).
          #   5. Restarted Pod restores deployments from the PVC,
          #      re-registers the Sovereign, re-enters the same
          #      no-kubeconfig state. Loop repeats.
          #   6. Service has zero ready endpoints throughout. nginx
          #      returns 502 to cloud-init's kubeconfig PUT. The PUT
          #      never reaches catalyst-api. Provision stalls forever.
          #
          # The fix: liveness must be process-level (am I up?), NOT
          # workload-level (do I have a kubeconfig?). The strict
          # informer-sync check stays — moved to /readyz — so a Pod
          # whose primary Sovereign is mid-sync briefly drops out of
          # the Service rotation but is NOT restarted. The kubeconfig
          # PUT endpoint reaches catalyst-api the moment cloud-init
          # calls it, breaking the deadlock.
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            # readOnlyRootFilesystem deliberately false: the bootstrap installer
            # writes kubeconfig temp files (mode 0600) under /tmp and helm
            # downloads chart caches under $HOME. Per Catalyst security policy
            # these writes are scoped via emptyDir below, never to the image's
            # actual root FS.
            readOnlyRootFilesystem: false
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: home
              mountPath: /home/nonroot
            # Catalyst PVC — mounted at /var/lib/catalyst so two
            # subdirectories live on the same single-attach volume:
            #
            #   deployments/<id>.json   — flat-file deployment store.
            #     Every catalyst-api restart that rehydrates from
            #     this directory closes the user-reported regression
            #     where a deployment id created at 12:57 vanished
            #     after 6 image rolls. The store walks every *.json
            #     on startup; in-flight rows are rewritten to
            #     `failed` with operator instructions for purging
            #     orphaned Hetzner resources.
            #
            #   kubeconfigs/<id>.yaml   — plaintext kubeconfig POSTed
            #     back from cloud-init via the bearer-token endpoint
            #     (issue #183, Option D). Mode 0600 per file. The
            #     path is persisted in the deployment record so a
            #     Pod restart mid-Phase-1 reattaches the helmwatch
            #     goroutine.
            #
            # One PVC, one mount — keeps the failure modes (PVC
            # unbind, fs full) bounded to one volume, and lets the
            # Go process create both subdirectories on startup
            # without a second volume claim or init container.
            - name: catalyst
              mountPath: /var/lib/catalyst
            # k8scache disk-snapshot mount (issue #321). Separate PVC
            # so cache size is independent of deployment-record
            # storage. The k8scache loop writes one JSON per
            # (cluster, kind) here, mode 0600. Pruned by the loop
            # itself when a snapshot ages past 1h.
            - name: sov-cache
              mountPath: /var/cache/sov-cache
            # handover-jwt-public — RS256 public key JWK distributed by
            # cloud-init from Catalyst-Zero's signing keypair. Mounted
            # read-only as a directory under /etc/catalyst/ (NOT under
            # /var/lib/catalyst because that is the catalyst-api PVC; a
            # leftover empty directory at the legacy file path from
            # pre-#606 installs would collide with a subPath file mount on
            # re-provision). The JWK lives at /etc/catalyst/handover-jwt-
            # public/public.jwk — see CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH
            # above. optional=true on the Secret so pods start on
            # Catalyst-Zero (which is the SIGNER, not the verifier) and
            # in CI where the Secret may be absent.
            - name: handover-jwt-public
              mountPath: /etc/catalyst/handover-jwt-public
              readOnly: true
      volumes:
        - name: tmp
          emptyDir:
            # 2Gi to hold the per-deployment OpenTofu workdir tree under
            # /tmp/catalyst/tofu/<sovereign-fqdn>/ (provider plugins + state
            # + plan binary). Each Sovereign run gets its own subdirectory.
            sizeLimit: 2Gi
        - name: home
          emptyDir:
            sizeLimit: 256Mi
        # Persistent catalyst-api state — mounted at /var/lib/catalyst
        # so deployments/ and kubeconfigs/ share one volume. The PVC
        # must already exist in the same namespace under the name
        # catalyst-api-deployments; see api-deployments-pvc.yaml in
        # this chart. Single-attach (RWO) is fine because the
        # Deployment is single-replica with the Recreate strategy
        # declared above; a future HA rework would need RWX or a
        # different persistence layer.
        - name: catalyst
          persistentVolumeClaim:
            claimName: catalyst-api-deployments
        # k8scache disk-snapshot PVC (issue #321). 5Gi RWO; see
        # api-cache-pvc.yaml for the sizing + cold-start contract.
        - name: sov-cache
          persistentVolumeClaim:
            claimName: catalyst-api-cache
        # handover-jwt-public — RS256 public key JWK written by cloud-init
        # from Catalyst-Zero's signing keypair. Secret is optional so
        # Catalyst-Zero pods (the signer) and CI start without it.
        - name: handover-jwt-public
          secret:
            secretName: catalyst-handover-jwt-public
            optional: true
            items:
              - key: public.jwk
                path: public.jwk