# openova/products/catalyst/chart/templates/api-deployment.yaml
# Last modified: 2026-04-29 18:17:32 +00:00
apiVersion: apps/v1
kind: Deployment
metadata:
name: catalyst-api
labels:
app.kubernetes.io/name: catalyst-api
app.kubernetes.io/component: api
annotations:
# `kustomize.toolkit.fluxcd.io/force: enabled` is the durable
# remediation for the `RollingUpdate -> Recreate` strategy-flip
# collision documented in docs/CHART-AUTHORING.md §"Strategy flips
# on existing Deployments".
#
# Failure mode this addresses
# ---------------------------
    # On 2026-04-29 the `catalyst` Flux Kustomization on contabo-mkt
    # got stuck at Ready=False with:
#
# Deployment.apps "catalyst-api" is invalid:
# spec.strategy.rollingUpdate: Forbidden:
# may not be specified when strategy `type` is 'Recreate'
#
# Root cause: the live Deployment had been previously created with
# the default `RollingUpdate` strategy (so `rollingUpdate.maxSurge=25%`
# and `maxUnavailable=25%` were present on the live object, owned
# by the `kubectl-client-side-apply` field manager). Flux's
# kustomize-controller submits this manifest via Server-Side Apply
# with field manager `kustomize-controller`. SSA's contract is
# "set the fields you declare" — it does NOT remove fields owned
# by other managers. Result: post-merge object had `type: Recreate`
# AND the residual `rollingUpdate.*` block, which the API server's
# validator rejects as invalid (Recreate forbids any rollingUpdate
    # keys). Because validation runs on the merged result, every SSA
    # apply of this manifest is rejected; no SSA-only chart change can
    # fix this.
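    #
    # A read-only sketch of how to confirm the residual ownership on
    # the live object (the field-manager names are the ones identified
    # in the analysis above):
    #
    #   kubectl get deployment catalyst-api -o yaml --show-managed-fields
    #   # look for f:rollingUpdate under the managedFields entry whose
    #   # .manager is kubectl-client-side-apply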
#
# Why `$patch: replace` does NOT solve this
# -----------------------------------------
# The Strategic Merge Patch directive `$patch: replace` would tell
# an SMP-aware merger to REPLACE the strategy block instead of
# merging into it. But:
# - SSA rejects `$patch` outright with "field not declared in
# schema" (it's not in apps/v1 Deployment).
# - kubectl strict-decoding rejects `$patch` on CREATE under any
# mode with "unknown field spec.strategy.$patch" — so adding
# it to the chart manifest BREAKS fresh installs.
# `$patch: replace` is a runtime SMP directive, never a chart-spec
# value. It belongs in a Kustomize `patches:` entry (where the
# kustomize binary consumes it at build time and emits a clean
# output) — never inline in a base resource.
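    #
    # For reference, a sketch (overlay layout hypothetical, not from
    # this repo) of where `$patch: replace` legitimately lives — an
    # overlay kustomization.yaml that the kustomize binary consumes at
    # build time:
    #
    #   patches:
    #     - target:
    #         kind: Deployment
    #         name: catalyst-api
    #       patch: |
    #         apiVersion: apps/v1
    #         kind: Deployment
    #         metadata:
    #           name: catalyst-api
    #         spec:
    #           strategy:
    #             $patch: replace
    #             type: Recreate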
#
# Why the Flux force annotation IS the right fix
# ----------------------------------------------
# When kustomize-controller's SSA submission fails dry-run with an
# Invalid response, this annotation directs the controller to
# recover by deleting and recreating THIS resource specifically
# (not the whole Kustomization). The recreated Deployment has no
# residual `rollingUpdate.*` fields — the regression cannot
# recur on the rebuilt object.
#
    # That is NOT a "kubectl delete band-aid": the annotation is part
# of the IaC manifest, version-controlled, applied declaratively
# via Flux on every reconciliation, scoped to this single
# Deployment, and removed only by editing the chart. Per
# docs/INVIOLABLE-PRINCIPLES.md #3 (Follow the documented
# architecture, exactly — Flux is the ONLY GitOps reconciler) and
# #4 (Never hardcode — runtime configuration in Git, not in shell
# history): the remediation lives in source control.
#
# Why this Deployment in particular tolerates a recreate: the
# spec declares `strategy.type: Recreate`, so the steady-state
# update path is delete-and-recreate anyway. Flux falling back to
# delete-and-recreate on a strategy-flip is a no-op relative to a
# normal pod-spec change. The deployments PVC is ReadWriteOnce;
# the recreate flow detaches it from the old Pod before mounting
# it on the new one, which is exactly the contract `Recreate`
# enforces. State persistence is maintained because the PVC
# itself is NOT recreated by this annotation — only the
# Deployment resource is.
kustomize.toolkit.fluxcd.io/force: enabled
spec:
replicas: 1
# Recreate strategy is required because the deployments PVC is RWO
# (single-attach). A rolling update would try to schedule a second
# Pod that mounts the same PVC, which Kubernetes rejects as a
# MultiAttachError. RWX with a multi-writer-aware filesystem
# (NFS, CephFS) is the path to HA, but Catalyst-Zero today is
# single-replica by design — the wizard is interactive and PDM owns
# cross-tenant isolation, so a single API server is sufficient.
#
# The strategy-flip regression that bit contabo-mkt on 2026-04-29
# (apply over a pre-existing RollingUpdate Deployment fails with
# `spec.strategy.rollingUpdate: Forbidden`) is recovered by the
# `kustomize.toolkit.fluxcd.io/force: enabled` annotation above —
# see that annotation's comment for the full failure-mode analysis
# and the docs/CHART-AUTHORING.md §"Strategy flips on existing
# Deployments" entry. Do NOT add an inline `$patch: replace` here:
# it BREAKS fresh installs (kubectl strict-decoding rejects
  # `spec.strategy.$patch` on create), and Flux's SSA path rejects it
  # as a field not declared in the schema. The integration test at
  # tests/integration/strategy-flip.yaml
# asserts both the recovery path works and the regression mode is
# still detected.
strategy:
type: Recreate
selector:
matchLabels:
app.kubernetes.io/name: catalyst-api
template:
metadata:
labels:
app.kubernetes.io/name: catalyst-api
spec:
imagePullSecrets:
- name: ghcr-pull-secret
# fsGroup applies to the volumes mounted into the Pod so the
# non-root container UID (65534) can write to the deployments
# PVC. Without this, Hetzner Cloud Volumes default to root:root
# and the catalyst-api process gets EACCES on every store.Save —
# surfacing as the "deployment store unavailable" warning at
# startup and silent persistence failures at runtime.
#
# fsGroupChangePolicy: OnRootMismatch limits the chown traversal
# to first start (where the volume is freshly provisioned with
# the wrong UID). Subsequent restarts skip the recursive chown
# if the root dir already matches, keeping Pod start times
# bounded as the deployments directory grows.
securityContext:
fsGroup: 65534
fsGroupChangePolicy: OnRootMismatch
containers:
- name: catalyst-api
image: ghcr.io/openova-io/openova/catalyst-api:139ce37
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
protocol: TCP
env:
- name: PORT
value: "8080"
- name: CORS_ORIGIN
value: "https://catalyst.openova.io"
- name: DYNADOT_API_KEY
valueFrom:
secretKeyRef:
name: dynadot-api-credentials
key: api-key
- name: DYNADOT_API_SECRET
valueFrom:
secretKeyRef:
name: dynadot-api-credentials
key: api-secret
# DYNADOT_MANAGED_DOMAINS — comma-separated list of pool domains
# the same Dynadot account manages. Per docs/INVIOLABLE-PRINCIPLES.md
# #4, this is runtime configuration so adding a third pool domain
# (e.g. acme.io) does NOT require a code change — only a secret
# update. The Dynadot API is account-scoped (one api-key/api-secret
# pair covers every domain owned by the account); this list scopes
# which domains the catalyst-api is *allowed* to write records for,
# defending against misconfiguration that would let a wizard-
# supplied poolDomain trigger writes against an unrelated domain.
- name: DYNADOT_MANAGED_DOMAINS
valueFrom:
secretKeyRef:
name: dynadot-api-credentials
key: domains
# optional=true so deployments using the legacy single-value
# `domain` key (pre-#108) keep working until the secret is
# migrated; the dynadot package falls through to DYNADOT_DOMAIN
# then to its built-in defaults if neither key is present.
optional: true
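            # A hedged sketch (identifiers hypothetical, not from the
            # dynadot package) of the allowlist check described above:
            #
            #   managed := strings.Split(os.Getenv("DYNADOT_MANAGED_DOMAINS"), ",")
            #   if !slices.Contains(managed, req.PoolDomain) {
            #       return fmt.Errorf("pool domain %q is not in DYNADOT_MANAGED_DOMAINS",
            #           req.PoolDomain)
            #   }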
- name: DYNADOT_DOMAIN
valueFrom:
secretKeyRef:
name: dynadot-api-credentials
key: domain
optional: true
# CATALYST_TOFU_WORKDIR — provisioner runs `tofu init/plan/apply`
# inside this directory. Default in code (/var/lib/catalyst/tofu)
# is unwritable for UID 65534 because the only emptyDir mounts on
# this Pod are /tmp and /home/nonroot. We pin to /tmp/catalyst so
# the writable emptyDir backs the per-Sovereign workdir tree.
- name: CATALYST_TOFU_WORKDIR
value: /tmp/catalyst/tofu
# CATALYST_DEPLOYMENTS_DIR — flat-file store for deployment
# records (one JSON file per deployment id). Backed by the
# PVC mount below so deployments persist across Pod
# restarts. Each record is the full Deployment state with
# credentials redacted; see internal/store/store.go.
- name: CATALYST_DEPLOYMENTS_DIR
value: /var/lib/catalyst/deployments
# CATALYST_KUBECONFIGS_DIR — sibling directory on the same
# PVC for the plaintext kubeconfigs the new Sovereign POSTs
# back via the bearer-token endpoint (issue #183, Option D).
# One <id>.yaml per deployment, mode 0600. The store JSON
# record carries only the file path + a SHA-256 hash of
# the bearer; the plaintext kubeconfig is NEVER serialized
# into the JSON.
- name: CATALYST_KUBECONFIGS_DIR
value: /var/lib/catalyst/kubeconfigs
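            # A hedged sketch (field names hypothetical, not from
            # internal/store) of the hash-only persistence described
            # above:
            #
            #   sum := sha256.Sum256([]byte(bearerToken))
            #   record.KubeconfigPath = filepath.Join(kubeconfigsDir, id+".yaml")
            #   record.BearerSHA256 = hex.EncodeToString(sum[:])
            #   // the plaintext kubeconfig goes to disk mode 0600,
            #   // never into the JSON record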
# CATALYST_API_PUBLIC_URL — the public origin the new
# Sovereign's cloud-init PUTs its kubeconfig back to. The
# OpenTofu module templates this into the Sovereign's
# user_data so the Sovereign knows where to call. Per
# docs/INVIOLABLE-PRINCIPLES.md #4 this is runtime
# configuration; air-gapped franchises override it
# without code change.
- name: CATALYST_API_PUBLIC_URL
value: https://console.openova.io/sovereign
# CATALYST_GHCR_PULL_TOKEN — long-lived GHCR pull token that
# the provisioner stamps onto every Request and the OpenTofu
# cloud-init template writes into the new Sovereign's
# flux-system/ghcr-pull Secret so Flux source-controller
# can pull private bp-* OCI artifacts from
# ghcr.io/openova-io/. Without this, Phase 1 stalls at
# bp-cilium with "secrets ghcr-pull not found" — verified
# live on omantel.omani.works pre-fix.
#
# optional: true — when the Secret or key is missing the
# Pod still starts (with the env var unset). The
# provisioner's Validate() rejects deployments that need
# the token (Phase 1 bootstrap-kit pulls private bp-*
# charts) with a clear pointer to docs/SECRET-ROTATION.md,
# so a misconfigured catalyst-api fails fast on
# /api/v1/deployments POST instead of silently mid-apply.
# /healthz, /api/v1/credentials/validate, and the BYO
# registrar proxy keep working — they don't read the
# token at all.
#
# Rotation: yearly, see docs/SECRET-ROTATION.md. The Secret
# is created out-of-band by an operator (never via Flux,
# never committed to git) — the chart references it but
# does not template it.
- name: CATALYST_GHCR_PULL_TOKEN
valueFrom:
secretKeyRef:
name: catalyst-ghcr-pull-token
key: token
optional: true
resources:
requests:
cpu: 50m
memory: 128Mi
limits:
# tofu provider plugins (hcloud ~80MB, dynadot ~30MB) + state +
# plan files easily exceed the prior 64Mi cap. 1Gi gives headroom
# for parallel provider init and sustained `apply` work.
cpu: 1000m
memory: 1Gi
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 3
periodSeconds: 10
readinessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 2
periodSeconds: 5
securityContext:
allowPrivilegeEscalation: false
# readOnlyRootFilesystem deliberately false: the bootstrap installer
# writes kubeconfig temp files (mode 0600) under /tmp and helm
# downloads chart caches under $HOME. Per Catalyst security policy
# these writes are scoped via emptyDir below, never to the image's
# actual root FS.
readOnlyRootFilesystem: false
runAsNonRoot: true
runAsUser: 65534
volumeMounts:
- name: tmp
mountPath: /tmp
- name: home
mountPath: /home/nonroot
# Catalyst PVC — mounted at /var/lib/catalyst so two
# subdirectories live on the same single-attach volume:
#
# deployments/<id>.json — flat-file deployment store.
        #   Rehydrating from this directory on every
        #   catalyst-api restart closes the user-reported
        #   regression where a deployment id created at 12:57
        #   vanished after 6 image rolls. The store walks every *.json
# on startup; in-flight rows are rewritten to
# `failed` with operator instructions for purging
# orphaned Hetzner resources.
#
# kubeconfigs/<id>.yaml — plaintext kubeconfig POSTed
# back from cloud-init via the bearer-token endpoint
# (issue #183, Option D). Mode 0600 per file. The
# path is persisted in the deployment record so a
# Pod restart mid-Phase-1 reattaches the helmwatch
# goroutine.
#
# One PVC, one mount — keeps the failure modes (PVC
# unbind, fs full) bounded to one volume, and lets the
# Go process create both subdirectories on startup
# without a second volume claim or init container.
- name: catalyst
mountPath: /var/lib/catalyst
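        # A hedged sketch (names hypothetical) of the startup
        # directory creation this single mount enables:
        #
        #   for _, d := range []string{
        #       os.Getenv("CATALYST_DEPLOYMENTS_DIR"),
        #       os.Getenv("CATALYST_KUBECONFIGS_DIR"),
        #   } {
        #       if err := os.MkdirAll(d, 0o700); err != nil {
        #           log.Fatalf("init %s: %v", d, err)
        #       }
        #   }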
volumes:
- name: tmp
emptyDir:
# 2Gi to hold the per-deployment OpenTofu workdir tree under
# /tmp/catalyst/tofu/<sovereign-fqdn>/ (provider plugins + state
# + plan binary). Each Sovereign run gets its own subdirectory.
sizeLimit: 2Gi
- name: home
emptyDir:
sizeLimit: 256Mi
# Persistent catalyst-api state — mounted at /var/lib/catalyst
# so deployments/ and kubeconfigs/ share one volume. The PVC
# must already exist in the same namespace under the name
# catalyst-api-deployments; see api-deployments-pvc.yaml in
# this chart. Single-attach (RWO) is fine because the
# Deployment is single-replica with the Recreate strategy
# declared above; a future HA rework would need RWX or a
# different persistence layer.
- name: catalyst
persistentVolumeClaim:
claimName: catalyst-api-deployments