apiVersion: apps/v1
kind: Deployment
metadata:
  name: catalyst-api
  labels:
    app.kubernetes.io/name: catalyst-api
    app.kubernetes.io/component: api
  annotations:
    # `kustomize.toolkit.fluxcd.io/force: enabled` is the durable
    # remediation for the `RollingUpdate -> Recreate` strategy-flip
    # collision documented in docs/CHART-AUTHORING.md §"Strategy flips
    # on existing Deployments".
    #
    # Failure mode this addresses
    # ---------------------------
    # On 2026-04-29 the `catalyst` Flux Kustomization on contabo-mkt
    # got stuck at Ready=False with:
    #
    #   Deployment.apps "catalyst-api" is invalid:
    #   spec.strategy.rollingUpdate: Forbidden:
    #   may not be specified when strategy `type` is 'Recreate'
    #
    # Root cause: the live Deployment had been previously created with
    # the default `RollingUpdate` strategy (so `rollingUpdate.maxSurge=25%`
    # and `maxUnavailable=25%` were present on the live object, owned
    # by the `kubectl-client-side-apply` field manager). Flux's
    # kustomize-controller submits this manifest via Server-Side Apply
    # with field manager `kustomize-controller`. SSA's contract is
    # "set the fields you declare" — it does NOT remove fields owned
    # by other managers. Result: the post-merge object had `type: Recreate`
    # AND the residual `rollingUpdate.*` block, which the API server's
    # validator rejects as invalid (Recreate forbids any rollingUpdate
    # keys). SSA is REQUIRED to reject the merge. No SSA-only chart
    # change can fix this.
    #
    # Why `$patch: replace` does NOT solve this
    # -----------------------------------------
    # The Strategic Merge Patch directive `$patch: replace` would tell
    # an SMP-aware merger to REPLACE the strategy block instead of
    # merging into it. But:
    #   - SSA rejects `$patch` outright with "field not declared in
    #     schema" (it's not in apps/v1 Deployment).
    #   - kubectl strict-decoding rejects `$patch` on CREATE under any
    #     mode with "unknown field spec.strategy.$patch" — so adding
    #     it to the chart manifest BREAKS fresh installs.
    # `$patch: replace` is a runtime SMP directive, never a chart-spec
    # value. It belongs in a Kustomize `patches:` entry (where the
    # kustomize binary consumes it at build time and emits a clean
    # output) — never inline in a base resource. See the sketch below.
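    #
    # A minimal sketch of where the directive does belong: a build-time
    # patch in a hypothetical kustomization.yaml (names illustrative),
    # never in this chart file:
    #
    #   patches:
    #     - target:
    #         kind: Deployment
    #         name: catalyst-api
    #       patch: |
    #         apiVersion: apps/v1
    #         kind: Deployment
    #         metadata:
    #           name: catalyst-api
    #         spec:
    #           strategy:
    #             $patch: replace
    #             type: Recreate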
    #
    # Why the Flux force annotation IS the right fix
    # ----------------------------------------------
    # When kustomize-controller's SSA submission fails dry-run with an
    # Invalid response, this annotation directs the controller to
    # recover by deleting and recreating THIS resource specifically
    # (not the whole Kustomization). The recreated Deployment has no
    # residual `rollingUpdate.*` fields — the regression cannot
    # recur on the rebuilt object.
    #
    # That is NOT a "kubectl delete bandaid": the annotation is part
    # of the IaC manifest, version-controlled, applied declaratively
    # via Flux on every reconciliation, scoped to this single
    # Deployment, and removed only by editing the chart. Per
    # docs/INVIOLABLE-PRINCIPLES.md #3 (Follow the documented
    # architecture, exactly — Flux is the ONLY GitOps reconciler) and
    # #4 (Never hardcode — runtime configuration in Git, not in shell
    # history): the remediation lives in source control.
    #
    # Why this Deployment in particular tolerates a recreate: the
    # spec declares `strategy.type: Recreate`, so the steady-state
    # update path is delete-and-recreate anyway. Flux falling back to
    # delete-and-recreate on a strategy-flip is a no-op relative to a
    # normal pod-spec change. The deployments PVC is ReadWriteOnce;
    # the recreate flow detaches it from the old Pod before mounting
    # it on the new one, which is exactly the contract `Recreate`
    # enforces. State persistence is maintained because the PVC
    # itself is NOT recreated by this annotation — only the
    # Deployment resource is.
    kustomize.toolkit.fluxcd.io/force: enabled
    # Reloader watches the sovereign-fqdn + handover-jwt-public ConfigMaps/Secrets
    # this Pod reads via valueFrom. On Sovereigns, those resources are applied
    # by the sovereign-tls Kustomization concurrently with the bp-catalyst-platform
    # HelmRelease. If the Pod started first, optional valueFrom resolves to ""
    # and SOVEREIGN_FQDN stays empty for the lifetime of the Pod — every handover
    # then fails the audience check with 401 "invalid audience" (caught live on
    # otech62, 2026-05-03). Reloader rolls the Deployment when those resources
    # land, fixing the race without requiring strict Flux dependsOn ordering.
    configmap.reloader.stakater.com/reload: "sovereign-fqdn"
    secret.reloader.stakater.com/reload: "handover-jwt-public"
spec:
  replicas: 1
  # Recreate strategy is required because the deployments PVC is RWO
  # (single-attach). A rolling update would try to schedule a second
  # Pod that mounts the same PVC, which Kubernetes rejects as a
  # MultiAttachError. RWX with a multi-writer-aware filesystem
  # (NFS, CephFS) is the path to HA, but Catalyst-Zero today is
  # single-replica by design — the wizard is interactive and PDM owns
  # cross-tenant isolation, so a single API server is sufficient.
  #
  # The strategy-flip regression that bit contabo-mkt on 2026-04-29
  # (apply over a pre-existing RollingUpdate Deployment fails with
  # `spec.strategy.rollingUpdate: Forbidden`) is recovered by the
  # `kustomize.toolkit.fluxcd.io/force: enabled` annotation above —
  # see that annotation's comment for the full failure-mode analysis
  # and the docs/CHART-AUTHORING.md §"Strategy flips on existing
  # Deployments" entry. Do NOT add an inline `$patch: replace` here:
  # it BREAKS fresh installs (kubectl strict-decoding rejects
  # `spec.strategy.$patch` on create), and Flux's SSA path strips it
  # anyway. The integration test at tests/integration/strategy-flip.yaml
  # asserts both that the recovery path works and that the regression
  # mode is still detected.
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: catalyst-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: catalyst-api
    spec:
      # serviceAccountName — bind the Pod to the dedicated cutover-driver
      # ServiceAccount so the /api/v1/sovereign/cutover/start handler can
      # read/patch the cutover ConfigMaps + create/watch Jobs in the
      # `catalyst` namespace. See serviceaccount-cutover-driver.yaml +
      # clusterrole-cutover-driver.yaml + clusterrolebinding-cutover-
      # driver.yaml for the full RBAC graph (issue #830 P0 Bug 1).
      #
      # The SA is created by THIS chart in the same namespace catalyst-api
      # runs in (catalyst-system) and bound at cluster scope (the cutover
      # endpoint is namespace-configurable via CATALYST_CUTOVER_NAMESPACE).
      # Without this, the Pod runs as system:serviceaccount:catalyst-
      # system:default and every cutover-status read returns 502
      # "configmaps is forbidden" (caught live on otech102, 2026-05-04).
      serviceAccountName: catalyst-api-cutover-driver
      imagePullSecrets:
        - name: ghcr-pull
      # fsGroup applies to the volumes mounted into the Pod so the
      # non-root container UID (65534) can write to the deployments
      # PVC. Without this, Hetzner Cloud Volumes default to root:root
      # and the catalyst-api process gets EACCES on every store.Save —
      # surfacing as the "deployment store unavailable" warning at
      # startup and silent persistence failures at runtime.
      #
      # fsGroupChangePolicy: OnRootMismatch limits the chown traversal
      # to first start (where the volume is freshly provisioned with
      # the wrong UID). Subsequent restarts skip the recursive chown
      # if the root dir already matches, keeping Pod start times
      # bounded as the deployments directory grows.
      securityContext:
        fsGroup: 65534
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: catalyst-api
          # Literal image ref — required for the contabo-mkt Kustomize
          # path (kustomize-controller doesn't render Helm templates).
          # Auto-bumped by .github/workflows/catalyst-build.yaml's deploy
          # step on every push to main, so Sovereigns AND contabo both
          # roll to the latest catalyst-api SHA. The matching
          # values.yaml `images.catalystApi.tag` is also bumped (but
          # unused for catalyst-api; kept for SME services that DO read
          # from values).
          image: "ghcr.io/openova-io/openova/catalyst-api:1a85a9b"
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              protocol: TCP
          env:
            - name: PORT
              value: "8080"
            - name: CORS_ORIGIN
              value: "https://console.openova.io"
            - name: DYNADOT_API_KEY
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: api-key
                  # optional=true: Sovereign clusters don't hold Dynadot
                  # credentials — their tenant DNS is served by the
                  # Sovereign's own PowerDNS instance, not the parent
                  # account. Catalyst-Zero (contabo-mkt) supplies the
                  # real secret; Sovereigns use an empty stub or omit it
                  # entirely. Without optional=true the pod refuses to
                  # start when the secret is absent (issue #547).
                  optional: true
            - name: DYNADOT_API_SECRET
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: api-secret
                  optional: true
            # DYNADOT_MANAGED_DOMAINS — comma-separated list of pool domains
            # the same Dynadot account manages. Per docs/INVIOLABLE-PRINCIPLES.md
            # #4, this is runtime configuration so adding a third pool domain
            # (e.g. acme.io) does NOT require a code change — only a secret
            # update. The Dynadot API is account-scoped (one api-key/api-secret
            # pair covers every domain owned by the account); this list scopes
            # which domains the catalyst-api is *allowed* to write records for,
            # defending against misconfiguration that would let a wizard-
            # supplied poolDomain trigger writes against an unrelated domain.
            # The Secret's shape is sketched after these entries.
            - name: DYNADOT_MANAGED_DOMAINS
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: domains
                  # optional=true so deployments using the legacy single-value
                  # `domain` key (pre-#108) keep working until the secret is
                  # migrated; the dynadot package falls through to DYNADOT_DOMAIN
                  # then to its built-in defaults if neither key is present.
                  optional: true
            - name: DYNADOT_DOMAIN
              valueFrom:
                secretKeyRef:
                  name: dynadot-api-credentials
                  key: domain
                  optional: true
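            # Illustrative shape of the out-of-band Secret the three
            # entries above resolve against (all values below are
            # hypothetical placeholders; never commit real credentials):
            #
            #   apiVersion: v1
            #   kind: Secret
            #   metadata:
            #     name: dynadot-api-credentials
            #   stringData:
            #     api-key: "<dynadot-api-key>"
            #     api-secret: "<dynadot-api-secret>"
            #     domains: "pool-a.example,pool-b.example"
            #     domain: "pool-a.example"   # legacy single-value key (pre-#108)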
            # CATALYST_TOFU_WORKDIR — the provisioner runs `tofu init/plan/apply`
            # inside this directory. PVC-backed (catalyst-api-deployments) so
            # in-progress tofu state survives Pod restarts. Without this,
            # any catalyst-api Pod roll mid-apply (e.g. an unrelated chart
            # bump that triggers a rolling restart on Catalyst-Zero, or a
            # node reboot) leaks Hetzner resources because partial apply
            # state is in emptyDir. Caught live on otech64, 2026-05-03:
            # contabo's catalyst-api was rolled at 21:40:11 (3 minutes
            # into otech64's tofu apply), terminal_LB created without its
            # control_plane target, and otech64 came up with an unreachable
            # 49.12.16.160 LB. Relies on fsGroup=65534 above to
            # provide write access to /var/lib/catalyst (the PVC mountPath).
            - name: CATALYST_TOFU_WORKDIR
              value: /var/lib/catalyst/tofu
            # CATALYST_DEPLOYMENTS_DIR — flat-file store for deployment
            # records (one JSON file per deployment id). Backed by the
            # PVC mount below so deployments persist across Pod
            # restarts. Each record is the full Deployment state with
            # credentials redacted; see internal/store/store.go.
            - name: CATALYST_DEPLOYMENTS_DIR
              value: /var/lib/catalyst/deployments
            # CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS — defensive floor
            # only. The load-bearing termination gate is now the
            # informer's HasSynced signal (after WaitForCacheSync the
            # full bp-* HelmRelease set is in the cache, regardless of
            # cardinality). Set to 1 so the watch still refuses to
            # terminate when the cache is completely empty (the
            # "bootstrap-kit Kustomization never reconciled at all"
            # footgun, classified as OutcomeFluxNotReconciling).
            #
            # Earlier values (11, then 38) tied this to the kit count;
            # that coupling is brittle — otech48 (2026-05-03) sat
            # phase1-watching forever because the env was 38 but the
            # kit had drifted to 37. The HasSynced gate is drift-proof.
            - name: CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS
              value: "1"
            # CATALYST_KUBECONFIGS_DIR — sibling directory on the same
            # PVC for the plaintext kubeconfigs the new Sovereign POSTs
            # back via the bearer-token endpoint (issue #183, Option D).
            # One <id>.yaml per deployment, mode 0600. The store JSON
            # record carries only the file path + a SHA-256 hash of
            # the bearer; the plaintext kubeconfig is NEVER serialized
            # into the JSON.
            - name: CATALYST_KUBECONFIGS_DIR
              value: /var/lib/catalyst/kubeconfigs
            # CATALYST_API_PUBLIC_URL — the public origin the new
            # Sovereign's cloud-init PUTs its kubeconfig back to. The
            # OpenTofu module templates this into the Sovereign's
            # user_data so the Sovereign knows where to call. Per
            # docs/INVIOLABLE-PRINCIPLES.md #4 this is runtime
            # configuration; air-gapped franchises override it
            # without a code change.
            - name: CATALYST_API_PUBLIC_URL
              value: https://console.openova.io/sovereign
            # CATALYST_K8SCACHE_KUBECONFIGS_DIR — issue #321. Directory
            # the k8scache.Factory reads kubeconfigs from at startup.
            # The data-plane SharedInformerFactory opens one informer
            # per kubeconfig file; the cloud-init postback handler
            # (PUT /api/v1/deployments/{id}/kubeconfig) writes here on
            # Phase-1 attach so a fresh Sovereign id is automatically
            # picked up at the next catalyst-api restart. The same PVC
            # (catalyst-api-deployments) backs the existing
            # deployments store; the data-plane reads the kubeconfigs/
            # subdirectory directly.
            - name: CATALYST_K8SCACHE_KUBECONFIGS_DIR
              value: /var/lib/catalyst/kubeconfigs
            # CATALYST_K8SCACHE_SNAPSHOT_DIR — issue #321 cold-start
            # mitigation. Backed by a separate 5Gi PVC
            # (catalyst-api-cache) so its size is independent of the
            # deployments store. See api-cache-pvc.yaml for the sizing
            # rationale + the cold-start latency contract.
            - name: CATALYST_K8SCACHE_SNAPSHOT_DIR
              value: /var/cache/sov-cache
            # CATALYST_K8SCACHE_KINDS_CONFIGMAP — optional ConfigMap
            # extending the built-in kinds registry. Per docs/
            # INVIOLABLE-PRINCIPLES.md #4 a new watched GVR (e.g.
            # HelmRelease, Kustomization) is a runtime configuration
            # change, not a code change. Empty disables ConfigMap
            # loading; the built-in DefaultKinds registry is used.
            # A sketch of the extension seam follows below.
            - name: CATALYST_K8SCACHE_KINDS_CONFIGMAP
              value: catalyst-k8scache-kinds
            - name: CATALYST_K8SCACHE_KINDS_CONFIGMAP_NAMESPACE
              value: catalyst
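            # A minimal sketch of that extension seam. The data schema
            # below is hypothetical; the authoritative format lives in
            # the k8scache package. This only shows where a new watched
            # GVR would be declared at runtime:
            #
            #   apiVersion: v1
            #   kind: ConfigMap
            #   metadata:
            #     name: catalyst-k8scache-kinds
            #     namespace: catalyst
            #   data:
            #     kinds: |
            #       - group: helm.toolkit.fluxcd.io
            #         version: v2
            #         kind: HelmRelease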
            # CATALYST_GHCR_PULL_TOKEN — long-lived GHCR pull token that
            # the provisioner stamps onto every Request and the OpenTofu
            # cloud-init template writes into the new Sovereign's
            # flux-system/ghcr-pull Secret so Flux source-controller
            # can pull private bp-* OCI artifacts from
            # ghcr.io/openova-io/. Without this, Phase 1 stalls at
            # bp-cilium with "secrets ghcr-pull not found" — verified
            # live on omantel.omani.works pre-fix.
            #
            # optional: true — when the Secret or key is missing the
            # Pod still starts (with the env var unset). The
            # provisioner's Validate() rejects deployments that need
            # the token (Phase 1 bootstrap-kit pulls private bp-*
            # charts) with a clear pointer to docs/SECRET-ROTATION.md,
            # so a misconfigured catalyst-api fails fast on
            # /api/v1/deployments POST instead of silently mid-apply.
            # /healthz, /api/v1/credentials/validate, and the BYO
            # registrar proxy keep working — they don't read the
            # token at all.
            #
            # Rotation: yearly, see docs/SECRET-ROTATION.md. The Secret
            # is created out-of-band by an operator (never via Flux,
            # never committed to git) — the chart references it but
            # does not template it.
            - name: CATALYST_GHCR_PULL_TOKEN
              valueFrom:
                secretKeyRef:
                  name: catalyst-ghcr-pull-token
                  key: token
                  optional: true
            # CATALYST_HARBOR_ROBOT_TOKEN — central Harbor proxy-cache
            # robot account secret (issue #557 + its follow-up). The
            # value is interpolated into the new Sovereign's
            # /etc/rancher/k3s/registries.yaml at cloud-init time so
            # containerd authenticates against harbor.openova.io's proxy
            # projects (proxy-dockerhub etc).
            #
            # Provisioning seam (how the catalyst-system Pod gets the Secret):
            #   1. Tofu var.harbor_robot_token enters cloud-init
            #      (infra/hetzner/cloudinit-control-plane.tftpl).
            #   2. Cloud-init writes /var/lib/catalyst/harbor-robot-
            #      token-secret.yaml into the flux-system ns with the
            #      auto-mirror Reflector annotations
            #      (reflection-auto-enabled: "true").
            #   3. runcmd applies it BEFORE flux-bootstrap, so the
            #      Secret exists before any Helm release runs.
            #   4. bp-reflector (slot 05a) propagates it into every
            #      namespace (incl. catalyst-system) on first reconcile.
            #   5. This Pod's secretKeyRef resolves once the mirror lands.
            # Mirrors the canonical pattern that flux-system/ghcr-pull
            # already uses (PR #543); see the annotation sketch below.
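            #
            # A minimal sketch of that Reflector auto-mirror annotation
            # block on the source Secret (these are the emberstack
            # Reflector annotation keys; empty namespace lists mean
            # "all namespaces"):
            #
            #   metadata:
            #     annotations:
            #       reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
            #       reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: ""
            #       reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
            #       reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: ""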
            #
            # NOT optional — provisioner.Validate() rejects deployments
            # with an empty token. The architecture mandate is that every
            # Sovereign image pull goes through harbor.openova.io; falling
            # through to docker.io is forbidden (the rate limit makes a fresh
            # Hetzner IP unbootable within minutes). When `optional: true`
            # was previously contemplated we chose against it: a missing
            # token must surface immediately as a Pod start failure
            # (CreateContainerConfigError), not silently mid-provision.
            #
            # Rotation: yearly. Re-render Tofu plan → re-apply cloud-init
            # → kubectl apply runs against the existing Secret with
            # rotated bytes; bp-reflector propagates the rotation to all
            # mirrored copies on the next watch tick. Plaintext NEVER
            # lives in git.
            - name: CATALYST_HARBOR_ROBOT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: harbor-robot-token
                  key: token
            # CATALYST_POWERDNS_API_KEY — contabo PowerDNS API key (PR
            # #681 follow-up). The value is interpolated into the new
            # Sovereign's `cert-manager/powerdns-api-credentials` Secret
            # at cloud-init time so bp-cert-manager-powerdns-webhook
            # can write DNS-01 challenge TXT records to contabo's
            # authoritative omani.works zone.
            #
            # Provisioning seam:
            #   1. Source: contabo's `openova-system/powerdns-api-
            #      credentials` Secret (created by the bp-powerdns chart).
            #   2. Reflector mirrors it into every namespace incl.
            #      catalyst (annotations on the source: reflection-
            #      auto-enabled: "true", reflection-auto-namespaces: "").
            #   3. This Pod resolves it via secretKeyRef.
            #   4. provisioner.New() reads CATALYST_POWERDNS_API_KEY at
            #      startup and stamps it onto every Request.
            #   5. cloud-init writes the Sovereign-side Secret in the
            #      cert-manager namespace BEFORE Flux reconciles
            #      bp-cert-manager-powerdns-webhook.
            #
            # optional=true: Catalyst-Zero pods on Sovereigns don't have
            # this Secret reflected (their PowerDNS is local), so the
            # bootstrap shape stays clean across both contabo and Sovereign
            # catalyst-api deployments.
            - name: CATALYST_POWERDNS_API_KEY
              valueFrom:
                secretKeyRef:
                  name: powerdns-api-credentials
                  key: api-key
                  optional: true
            # CATALYST_POWERDNS_API_URL — base URL of the per-Sovereign
            # PowerDNS REST API (issue #827). Used by:
            #   - the SME-tenant pipeline's PATCH-RRset writer
            #     (sme_tenant_dns.go) for free-subdomain provisioning
            #   - the multi-zone parent-domain handler
            #     (parent_domains.go) for runtime add-zone
            # The default is the in-cluster Service FQDN of the Sovereign's
            # own PowerDNS (the Helm chart targets namespace `powerdns`
            # with default release name `powerdns`). Operators in
            # non-standard layouts override via the Helm values overlay
            # at clusters/<sovereign>/bootstrap-kit/13-bp-catalyst-
            # platform.yaml.
            #
            # NOTE — DUAL-MODE CONTRACT (see the SOVEREIGN_FQDN block below
            # for the canonical explanation): this file is consumed BOTH
            # by Helm (per-Sovereign install) AND by Kustomize (contabo-
            # mkt's flux Kustomization at path: ./products/catalyst/chart/
            # templates). Helm template syntax (double-curly directives)
            # in this file BREAKS the Kustomize build with
            # "yaml: invalid map key" and stalls every contabo
            # reconciliation. The 1.4.0 version of this block used
            # {{ default "..." .Values.catalystApi.powerdnsURL }} — that
            # broke contabo's catalyst-platform Kustomization until this
            # follow-up landed. Issue #830 follow-up.
            #
            # Solution: the in-cluster Service URL is a non-secret
            # constant on every Sovereign that ships bp-powerdns at its
            # canonical release name (powerdns/powerdns). Hardcode the
            # literal here so the Kustomize build stays clean. Per-
            # Sovereign overrides are still possible via the per-
            # Sovereign HelmRelease overlay's `catalystApi.env`
            # additional-env patch, which takes precedence over the
            # default below (sketched after the next entry).
            - name: CATALYST_POWERDNS_API_URL
              value: "http://powerdns.powerdns.svc.cluster.local:8081"
            # CATALYST_POWERDNS_SERVER_ID — virtually always "localhost"
            # per the PowerDNS REST API contract. Operator-overridable
            # for multi-tenant PowerDNS deployments where a single
            # PowerDNS instance hosts multiple servers (override via the
            # HelmRelease overlay env patch — same pattern as
            # CATALYST_POWERDNS_API_URL above).
            - name: CATALYST_POWERDNS_SERVER_ID
              value: "localhost"
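            # A minimal sketch of the Helm-only override seam named
            # above: a values fragment in the per-Sovereign HelmRelease
            # overlay (the URL is a hypothetical non-standard layout;
            # `catalystApi.env` is the additional-env key the contract
            # above refers to):
            #
            #   spec:
            #     values:
            #       catalystApi:
            #         env:
            #           - name: CATALYST_POWERDNS_API_URL
            #             value: "http://powerdns.dns-system.svc.cluster.local:8081"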
            # ── /auth/handover Keycloak service-account (issue #606) ──────────
            # CATALYST_KC_ADDR — Keycloak base URL. Defaults to the in-cluster
            # service FQDN in code; override here for non-standard Sovereign
            # Keycloak deployments.
            # optional=true: Catalyst-Zero pods don't run Keycloak locally.
            - name: CATALYST_KC_ADDR
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: addr
                  optional: true
            - name: CATALYST_KC_REALM
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: realm
                  optional: true
            - name: CATALYST_KC_SA_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: client-id
                  optional: true
            - name: CATALYST_KC_SA_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: catalyst-kc-sa-credentials
                  key: client-secret
                  optional: true
            # CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH — path to the JWK file that
            # holds the RS256 public key for validating one-time handover JWTs.
            # The K8s Secret `catalyst-handover-jwt-public` (created by
            # cloud-init at provision time, see infra/hetzner/cloudinit-control-
            # plane.tftpl) is mounted as a directory at /etc/catalyst/handover-
            # jwt-public/, so the JWK lives at /etc/catalyst/handover-jwt-public/
            # public.jwk. We deliberately mount the Secret as a directory rather
            # than using subPath: the catalyst-api PVC at /var/lib/catalyst is
            # ReadWriteOnce, and a leftover empty directory at the legacy path
            # /var/lib/catalyst/handover-jwt-public.jwk/ from earlier installs
            # (where the Secret was missing and Kubernetes created an empty
            # directory in the volume) collides with the subPath file mount on
            # re-provisioning. Mounting under /etc/ keeps the JWK off the PVC
            # entirely so the conflict cannot recur. Caught live on otech48,
            # 2026-05-03.
            - name: CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH
              value: /etc/catalyst/handover-jwt-public/public.jwk
            # SOVEREIGN_FQDN — the Sovereign's public FQDN. The /auth/handover
            # validator (auth_handover.go) reads this to compute the expected
            # JWT audience claim ("https://console." + SOVEREIGN_FQDN). When
            # unset on a Sovereign, the audience check collapses to "https://
            # console." and every valid token is rejected with "invalid
            # audience" 401 — caught live on otech48, 2026-05-03.
            #
            # NOTE: this file is consumed BOTH by Helm (per-Sovereign install
            # via the bp-catalyst-platform OCI chart) AND by Kustomize (contabo-
            # mkt's clusters/contabo-mkt/apps/catalyst-platform Kustomization
            # at path: ./products/catalyst/chart/templates). Kustomize parses
            # raw YAML — Helm template syntax (double-curly directives) here
            # breaks the Kustomize build (caught live on contabo 2026-05-03
            # from commit adf8dc7d: "yaml: invalid map key").
            #
            # Solution: read the value from a ConfigMap that exists ONLY on
            # Sovereigns (not contabo). On contabo the optional reference
            # resolves to empty (correct — catalyst-api on contabo is the
            # SIGNER, never the validator; /auth/handover never hits there).
            # On Sovereigns, clusters/_template/sovereign-tls/sovereign-fqdn-
            # configmap.yaml renders the ConfigMap from the envsubst-ed
            # ${SOVEREIGN_FQDN} when Flux applies the kustomization.
            - name: SOVEREIGN_FQDN
              valueFrom:
                configMapKeyRef:
                  name: sovereign-fqdn
                  key: fqdn
                  optional: true
            # CATALYST_OTECH_FQDN — same value as SOVEREIGN_FQDN, but read by
            # the SME tenant create handler (sme_tenant.go) and the
            # sovereign-parent-domains seed (sovereign_parent_domains.go).
            # The two envs exist for historical reasons: SOVEREIGN_FQDN is the
            # Phase-8b handover-flow JWT-audience env; CATALYST_OTECH_FQDN is
            # the SME-tier tenant-pipeline env (epic #795 / #804). Both
            # ultimately point at the Sovereign's public FQDN. Wired from the
            # same `sovereign-fqdn` ConfigMap (key `fqdn`). optional=true since
            # Catalyst-Zero (contabo) doesn't run the SME tenant pipeline.
            # Issue #876 — without this, POST /api/v1/sme/tenants returns
            # 503 {"error":"otech-fqdn-unconfigured"} on every Sovereign.
            - name: CATALYST_OTECH_FQDN
              valueFrom:
                configMapKeyRef:
                  name: sovereign-fqdn
                  key: fqdn
                  optional: true
            # CATALYST_SELF_DEPLOYMENT_ID — the deployment-record id this
            # Sovereign was provisioned under on the contabo orchestrator.
            # Read by HandleSovereignSelf (sovereign_self.go) so the
            # Sovereign-side catalyst-ui can resolve /console/<page> to the
            # canonical /provision/<self-id>/<page> deployment-scoped UI.
            # Sourced from the sovereign-fqdn ConfigMap (key
            # selfDeploymentId), stamped by the orchestrator's per-
            # Sovereign overlay writer at handover. Empty on contabo and
            # on freshly-provisioned Sovereigns whose handover hasn't run
            # yet — HandleSovereignSelf returns 503 in that window so
            # the UI shows a "waiting for handover" pill.
            - name: CATALYST_SELF_DEPLOYMENT_ID
              valueFrom:
                configMapKeyRef:
                  name: sovereign-fqdn
                  key: selfDeploymentId
                  optional: true
            # SOVEREIGN_LB_IP — the Sovereign's load-balancer public IPv4. Used
            # by the Day-2 multi-domain add-domain flow (issue #900) to
            # pre-register glue records at the customer's registrar before
            # the set_ns flip. Without it Dynadot rejects with
            # "'ns1.<sov>.omani.works' needs to be registered with an ip
            # address before it can be used" — caught live during otech103
            # multi-domain verification.
            #
            # Sourced from the chart's `global.sovereignLBIP` value (rendered
            # into the same `sovereign-fqdn` ConfigMap that holds `fqdn`;
            # see the ConfigMap sketch below).
            # optional=true: Catalyst-Zero (contabo) doesn't run the Sovereign-
            # side multi-domain pipeline; the env stays empty and the glue
            # path becomes a no-op (plain set_ns flows through unchanged).
            - name: SOVEREIGN_LB_IP
              valueFrom:
                configMapKeyRef:
                  name: sovereign-fqdn
                  key: lbIP
                  optional: true
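            # Illustrative shape of the Sovereign-side ConfigMap that the
            # four configMapKeyRefs above resolve against (key names are
            # the ones documented above; values are placeholders, rendered
            # by sovereign-tls / the orchestrator's overlay writer):
            #
            #   apiVersion: v1
            #   kind: ConfigMap
            #   metadata:
            #     name: sovereign-fqdn
            #   data:
            #     fqdn: "otech105.omani.works"
            #     selfDeploymentId: "<deployment-record-id>"
            #     lbIP: "203.0.113.10"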
            # CATALYST_GITOPS_USER + CATALYST_GITOPS_TOKEN — basic-auth
            # credentials embedded in the GitOps clone URL (issue #878).
            # Pre-cutover (Catalyst-Zero): User=x-access-token, Token=GitHub
            # PAT (already wired via separate CATALYST_GITOPS_TOKEN secret on
            # contabo). Post-cutover (Sovereign): User=gitea_admin,
            # Token=<gitea-admin-password> from the local Gitea admin secret.
            # The same secret (`gitea-admin-secret`) is mirrored into
            # catalyst-system via the bp-reflector annotation block on
            # bp-gitea (issue #866), so this Sovereign-side wiring works
            # post-Day-2-Independence without a manual mirror step.
            # optional=true: Catalyst-Zero (contabo) does not run the SME
            # tenant pipeline.
            - name: CATALYST_GITOPS_USER
              valueFrom:
                secretKeyRef:
                  name: gitea-admin-secret
                  key: username
                  optional: true
            - name: CATALYST_GITOPS_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitea-admin-secret
                  key: password
                  optional: true
            # POOL_DOMAIN_MANAGER_URL — base URL of the central Pool Domain
            # Manager (PDM) ingress on Catalyst-Zero (contabo). Sovereign-
            # side catalyst-api calls PDM's /api/v1/registrar/{r}/set-ns
            # endpoint for the Day-2 multi-domain "Add another parent
            # domain" flow (issue #879, parent epic #825 / #829).
            #
            # Why a public ingress URL (not an in-cluster Service):
            # the in-cluster default `pool-domain-manager.openova-system.
            # svc.cluster.local` ONLY resolves on the contabo cluster
            # (PDM lives in `openova-system` ns there). On a franchised
            # Sovereign post-handover, that DNS name is NXDOMAIN, so
            # every Day-2 add-domain call returned `dial tcp: lookup
            # pool-domain-manager.openova-system.svc.cluster.local on
            # 10.43.0.10:53: no such host` (caught live on otech103,
            # 2026-05-05 — issue #879 verification).
            #
            # The default below points at the public PDM ingress on
            # contabo (`pool.openova.io`). Per Inviolable Principle #4
            # (never hardcode), per-Sovereign overlays may override via
            # `catalystApi.poolDomainManagerURL` in values. Catalyst-Zero
            # (contabo) leaves this default — its catalyst-api Pod hits
            # the SAME public URL via its own loopback ingress (the proxy
            # is idempotent on the source cluster).
            #
            # Pairs with CATALYST_PDM_BASIC_AUTH_USER / _PASS below: the
            # PDM ingress at pool.openova.io is gated by Traefik basicAuth
            # (clusters/contabo-mkt/apps/pool-domain-manager/ingress.yaml).
            # Both halves are wired together so a fresh Sovereign reaches
            # PDM without a manual env-var patch.
            #
            # NOTE — DUAL-MODE CONTRACT: this file is consumed BOTH by
            # Helm (per-Sovereign install via bp-catalyst-platform OCI)
            # AND by Kustomize (contabo-mkt's clusters/contabo-mkt/apps/
            # catalyst-platform). The default literal below (no Helm
            # template directives) keeps both build paths clean. Per-
            # Sovereign overlays override via the HelmRelease overlay's
            # `catalystApi.env` additional-env patch (Helm-only, takes
            # precedence over THIS default at template-render time).
            - name: POOL_DOMAIN_MANAGER_URL
              value: "https://pool.openova.io"
            # CATALYST_PDM_BASIC_AUTH_USER / _PASS — basic-auth credentials
            # for the PDM public ingress (issue #879 Bug 2). The Sovereign-
            # side catalyst-api adds `Authorization: Basic …` to every PDM
            # call so the Traefik basicAuth Middleware in front of
            # pool.openova.io accepts the request. Without this, every
            # Day-2 add-domain call returns 401 from PDM (caught live on
            # otech103).
            #
            # The source Secret (`pdm-basicauth`, keys `username` + `password`)
            # is pre-provisioned by cloud-init on every Sovereign at
            # provision time, mirrored via the same Reflector seam ghcr-
            # pull / harbor-robot-token already use. optional=true so:
            #   - Catalyst-Zero pods (contabo's catalyst-api) start cleanly
            #     when the Secret is absent. On contabo the in-cluster
            #     Service path bypasses the ingress entirely and basicAuth
            #     is a no-op.
            #   - CI / local dev / older Sovereigns that pre-date this
            #     provisioning seam start cleanly. POSTs without auth get
            #     401 from PDM with a clear log line, instead of the Pod
            #     crashlooping on start.
            #
            # Per Inviolable Principle #10: the credentials never enter a
            # logged struct or a deployment record — loaded into the Pod
            # env once at start, read per-call by pdmFlipNS only.
            - name: CATALYST_PDM_BASIC_AUTH_USER
              valueFrom:
                secretKeyRef:
                  name: pdm-basicauth
                  key: username
                  optional: true
            - name: CATALYST_PDM_BASIC_AUTH_PASS
              valueFrom:
                secretKeyRef:
                  name: pdm-basicauth
                  key: password
                  optional: true
            # CATALYST_HANDOVER_KEY_PATH — path to the RS256 PRIVATE key
            # catalyst-api uses to mint magic-link + handover JWTs. The
            # signer auto-generates the keypair on first start if absent.
            # MUST be on a writable PVC mount. Catalyst-Zero only.
            - name: CATALYST_HANDOVER_KEY_PATH
              value: /var/lib/catalyst/handover-jwt-private.pem
            # ── Magic-link auth (issue #608, Phase-8b Agent A) ──────────────
            # CATALYST_KC_CLIENT_ID — OIDC client ID for the Catalyst-Zero
            # UI (catalyst-zero-ui PKCE client). Defaults to "catalyst-zero-ui"
            # in code; override here for multi-tenant or custom client names.
            # optional=true: Sovereign clusters don't use this auth path.
            - name: CATALYST_KC_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: catalyst-magic-link-credentials
                  key: kc-client-id
                  optional: true
            # CATALYST_KC_REDIRECT_URI — OAuth callback URL the Keycloak magic-
            # link redirects to after verification (e.g.
            # https://console.openova.io/sovereign/auth/callback).
            # Per INVIOLABLE-PRINCIPLES #4: runtime configuration, not hardcoded.
            - name: CATALYST_KC_REDIRECT_URI
              valueFrom:
                secretKeyRef:
                  name: catalyst-magic-link-credentials
                  key: kc-redirect-uri
                  optional: true
            # CATALYST_SESSION_COOKIE_SECRET — HMAC-SHA256 key for signing the
            # catalyst_session HttpOnly cookie value. 32 random bytes (base64url
            # encoded). Rotation invalidates all active sessions.
            - name: CATALYST_SESSION_COOKIE_SECRET
              valueFrom:
                secretKeyRef:
                  name: catalyst-magic-link-credentials
                  key: session-cookie-secret
                  optional: true
            # CATALYST_POST_AUTH_REDIRECT — URL the browser is sent to after a
            # successful magic-link / PIN callback. Defaults to /wizard in code.
            # Catalyst-Zero (contabo) routes the UI under the /sovereign prefix
            # (Traefik strip-prefix is transparent to the server-side Location
            # header), so contabo overrides this to /sovereign/wizard via the
            # per-environment overlay. On a freshly franchised Sovereign the
            # wizard is mothership-only — an empty page on /sovereign/wizard.
            # The post-handover Sovereign Console homepage is /sovereign/components,
            # so that's the default we now ship (issue #901, 2026-05-05).
            #
            # DUAL-MODE CONTRACT — see the CATALYST_POWERDNS_API_URL block above:
            # this file is consumed by both Helm (Sovereign) and Kustomize
            # (contabo-mkt). Helm template directives (curly-brace syntax) in
            # `value:` break the Kustomize render with "yaml: invalid map key".
            # So this default is a literal. Per-Sovereign overrides go through
            # the HelmRelease overlay's `catalystApi.env` additional-env patch,
            # NOT through this file.
            #
            # Per INVIOLABLE-PRINCIPLES #4: the override seam exists (the overlay
            # env patch); only the chart-shipped default is a literal.
            - name: CATALYST_POST_AUTH_REDIRECT
              value: "/sovereign/components"
            # ── Option-B magic-link: openova realm service account ───────────
            # CATALYST_OPENOVA_KC_ADDR — Keycloak base URL for the openova realm.
            # Defaults in code to keycloak-zero.keycloak-zero.svc (in-cluster
            # on Catalyst-Zero). optional=true: Sovereign clusters don't run
            # the openova realm.
            - name: CATALYST_OPENOVA_KC_ADDR
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-addr
                  optional: true
            - name: CATALYST_OPENOVA_KC_REALM
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-realm
                  optional: true
            - name: CATALYST_OPENOVA_KC_SA_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-sa-client-id
                  optional: true
            - name: CATALYST_OPENOVA_KC_SA_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-sa-client-secret
                  optional: true
            # CATALYST_OPENOVA_KC_AUDIENCE — OIDC audience for KC token-exchange.
            # Defaults to "catalyst-zero-ui" in code. optional=true.
            - name: CATALYST_OPENOVA_KC_AUDIENCE
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: kc-audience
                  optional: true
            # CATALYST_SMTP_HOST / CATALYST_SMTP_PORT — Stalwart SMTP relay for
            # magic-link email delivery. Defaults in code to
            # stalwart-web.stalwart.svc.cluster.local:587. optional=true.
            - name: CATALYST_SMTP_HOST
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-host
                  optional: true
            - name: CATALYST_SMTP_PORT
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-port
                  optional: true
            - name: CATALYST_SMTP_USER
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-user
                  optional: true
            - name: CATALYST_SMTP_PASS
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-pass
                  optional: true
            - name: CATALYST_SMTP_FROM
              valueFrom:
                secretKeyRef:
                  name: catalyst-openova-kc-credentials
                  key: smtp-from
                  optional: true
            # CATALYST_SESSION_COOKIE_DOMAIN — optional domain scoping for the
            # catalyst_session + catalyst_refresh cookies.
            #
            # Why this is empty by default (issue #910 Bug 2)
            # ===============================================
            # Pre-1.4.19 this was hardcoded `console.openova.io` because that
            # was the host Catalyst-Zero (contabo) serves both /sovereign/wizard
            # and /sovereign/auth/magic from. On contabo that worked: the
            # request host == the cookie domain, so the browser accepted the
            # Set-Cookie and re-presented it on every subsequent request.
            #
            # On a freshly franchised Sovereign (e.g. console.otech105.omani.
            # works, caught live 2026-05-05) the same hardcoded value made the
            # browser refuse to bind the cookie at all: the Set-Cookie header
            # had `Domain=console.openova.io` while the request host was
            # `console.otech105.omani.works`. RFC 6265 §5.3 step 6 rejects any
            # Set-Cookie where the request URI's host is not the cookie's
            # domain (or a sub-domain). The browser silently dropped the
            # cookie → the next /api/* request had no session → the backend
            # redirected to /login → infinite loop. Login broke for every
            # Sovereign.
            #
            # Empty-value contract: when CATALYST_SESSION_COOKIE_DOMAIN is
            # empty, the auth handler omits the Domain attribute from
            # Set-Cookie. Per RFC 6265 the browser then binds the cookie to
            # the exact request host. That is the correct behaviour on BOTH:
            #   - Sovereign: request host = console.<sov-fqdn>, the cookie
            #     binds there, /api/* on the same host re-presents it.
            #   - Catalyst-Zero (contabo): request host = console.openova.io,
            #     the cookie binds there. Wizard + magic-link callbacks are
            #     served from the same Ingress so a single cookie jar is
            #     sufficient.
            # Both header shapes are sketched after this entry.
            #
            # Per the dual-mode contract documented in the
            # CATALYST_POWERDNS_API_URL block above, this MUST stay a literal
            # value (no Helm template directives) so the Kustomize-mode
            # contabo build keeps parsing. Per-Sovereign overlays MAY
            # override via the `catalystApi.env` additional-env patch in the
            # per-cluster HelmRelease (a Helm-only codepath that takes
            # precedence over this default at template-render time).
            - name: CATALYST_SESSION_COOKIE_DOMAIN
              value: ""
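            # Illustrative only: the two Set-Cookie shapes discussed above
            # (cookie value and any attributes beyond Domain are
            # hypothetical placeholders):
            #
            #   pre-1.4.19 (rejected by the browser on Sovereigns per
            #   RFC 6265 §5.3):
            #     Set-Cookie: catalyst_session=<token>; Domain=console.openova.io; HttpOnly
            #   current default (Domain omitted; binds to the exact
            #   request host):
            #     Set-Cookie: catalyst_session=<token>; HttpOnly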
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              # tofu provider plugins (hcloud ~80MB, dynadot ~30MB) + state +
              # plan files easily exceed the prior 64Mi cap. 1Gi gives headroom
              # for parallel provider init and sustained `apply` work.
              cpu: 1000m
              memory: 1Gi
          # Liveness vs readiness — the split is REQUIRED, not cosmetic
          # (issue #530). /healthz is liveness: it returns 200 whenever
          # the catalyst-api process is up and the HTTP server is
          # serving. /readyz is readiness: it returns 200 only when the
          # primary Sovereign's Pod + Deployment informers are synced
          # (or no Sovereigns are registered yet).
          #
          # The previous wiring pointed BOTH probes at /healthz AND
          # /healthz performed the strict informer-sync check. The
          # crashloop chain that followed:
          #
          #   1. Operator POSTs a fresh deployment.
          #   2. catalyst-api registers the Sovereign in k8scache and
          #      starts looking for a kubeconfig file on the PVC.
          #   3. The kubeconfig will NOT arrive until the new Sovereign's
          #      cloud-init runs (~60-120s) and PUTs it back. Until
          #      then, informers cannot start and the sync check flips
          #      to false.
          #   4. /healthz returns 503. kubelet kills the Pod on the
          #      next liveness probe (~33s).
          #   5. The restarted Pod restores deployments from the PVC,
          #      re-registers the Sovereign, and re-enters the same
          #      no-kubeconfig state. The loop repeats.
          #   6. The Service has zero ready endpoints throughout. nginx
          #      returns 502 to cloud-init's kubeconfig PUT. The PUT
          #      never reaches catalyst-api. Provision stalls forever.
          #
          # The fix: liveness must be process-level (am I up?), NOT
          # workload-level (do I have a kubeconfig?). The strict
          # informer-sync check stays — moved to /readyz — so a Pod
          # whose primary Sovereign is mid-sync briefly drops out of
          # the Service rotation but is NOT restarted. The kubeconfig
          # PUT endpoint reaches catalyst-api the moment cloud-init
          # calls it, breaking the deadlock.
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            # readOnlyRootFilesystem is deliberately false: the bootstrap
            # installer writes kubeconfig temp files (mode 0600) under /tmp
            # and helm downloads chart caches under $HOME. Per Catalyst
            # security policy these writes are scoped via the emptyDirs
            # below, never to the image's actual root FS.
            readOnlyRootFilesystem: false
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: home
              mountPath: /home/nonroot
            # Catalyst PVC — mounted at /var/lib/catalyst so two
            # subdirectories live on the same single-attach volume:
            #
            #   deployments/<id>.json — flat-file deployment store.
            #     Every catalyst-api restart rehydrates from this
            #     directory, closing the user-reported regression
            #     where a deployment id created at 12:57 vanished
            #     after 6 image rolls. The store walks every *.json
            #     on startup; in-flight rows are rewritten to
            #     `failed` with operator instructions for purging
            #     orphaned Hetzner resources.
            #
            #   kubeconfigs/<id>.yaml — plaintext kubeconfig POSTed
            #     back from cloud-init via the bearer-token endpoint
            #     (issue #183, Option D). Mode 0600 per file. The
            #     path is persisted in the deployment record so a
            #     Pod restart mid-Phase-1 reattaches the helmwatch
            #     goroutine.
            #
            # One PVC, one mount — keeps the failure modes (PVC
            # unbind, fs full) bounded to one volume, and lets the
            # Go process create both subdirectories on startup
            # without a second volume claim or init container. The
            # resulting on-disk layout is sketched below.
            - name: catalyst
              mountPath: /var/lib/catalyst
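            # On-disk layout of the catalyst PVC, assembled from the env
            # vars above (illustrative):
            #
            #   /var/lib/catalyst/
            #     deployments/<id>.json      # flat-file deployment store
            #     kubeconfigs/<id>.yaml      # mode 0600, cloud-init postback
            #     tofu/                      # CATALYST_TOFU_WORKDIR
            #     handover-jwt-private.pem   # CATALYST_HANDOVER_KEY_PATH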
            # k8scache disk-snapshot mount (issue #321). Separate PVC
            # so cache size is independent of deployment-record
            # storage. The k8scache loop writes one JSON per
            # (cluster, kind) here, mode 0600. Pruned by the loop
            # itself when a snapshot ages past 1h.
            - name: sov-cache
              mountPath: /var/cache/sov-cache
            # handover-jwt-public — RS256 public key JWK distributed by
            # cloud-init from Catalyst-Zero's signing keypair. Mounted
            # read-only as a directory under /etc/catalyst/ (NOT under
            # /var/lib/catalyst because that is the catalyst-api PVC; a
            # leftover empty directory at the legacy file path from
            # pre-#606 installs would collide with a subPath file mount on
            # re-provision). The JWK lives at /etc/catalyst/handover-jwt-
            # public/public.jwk — see CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH
            # above. optional=true on the Secret so pods start on
            # Catalyst-Zero (which is the SIGNER, not the verifier) and
            # in CI where the Secret may be absent.
            - name: handover-jwt-public
              mountPath: /etc/catalyst/handover-jwt-public
              readOnly: true
      volumes:
        - name: tmp
          emptyDir:
            # 2Gi to hold the per-deployment OpenTofu workdir tree under
            # /tmp/catalyst/tofu/<sovereign-fqdn>/ (provider plugins + state
            # + plan binary). Each Sovereign run gets its own subdirectory.
            sizeLimit: 2Gi
        - name: home
          emptyDir:
            sizeLimit: 256Mi
        # Persistent catalyst-api state — mounted at /var/lib/catalyst
        # so deployments/ and kubeconfigs/ share one volume. The PVC
        # must already exist in the same namespace under the name
        # catalyst-api-deployments; see api-deployments-pvc.yaml in
        # this chart. Single-attach (RWO) is fine because the
        # Deployment is single-replica with the Recreate strategy
        # declared above; a future HA rework would need RWX or a
        # different persistence layer.
        - name: catalyst
          persistentVolumeClaim:
            claimName: catalyst-api-deployments
        # k8scache disk-snapshot PVC (issue #321). 5Gi RWO; see
        # api-cache-pvc.yaml for the sizing + cold-start contract.
        - name: sov-cache
          persistentVolumeClaim:
            claimName: catalyst-api-cache
        # handover-jwt-public — RS256 public key JWK written by cloud-init
        # from Catalyst-Zero's signing keypair. The Secret is optional so
        # Catalyst-Zero pods (the signer) and CI start without it.
        - name: handover-jwt-public
          secret:
            secretName: catalyst-handover-jwt-public
            optional: true
            items:
              - key: public.jwk
                path: public.jwk