DIVERGES from the literal "$patch: replace" prescription on the issue
because that directive cannot survive any apply path that actually
runs in production (verified end-to-end in
tests/integration/strategy-flip.sh):
- Flux's kustomize-controller submits via Server-Side Apply. SSA
rejects `.spec.strategy.$patch` with "field not declared in
schema" — fluxcd/pkg/ssa Manager.Apply does not preprocess SMP
directives.
- kubectl strict-decoding rejects `$patch` on every CREATE path
(`kubectl create`, `kubectl apply` to an empty namespace, every
`--server-side` flavor) with "unknown field spec.strategy.$patch"
— adding it to a chart base resource BREAKS fresh installs of
every new Sovereign.
The durable fix is the documented Flux annotation
`kustomize.toolkit.fluxcd.io/force: enabled` on the Deployment.
When kustomize-controller's SSA dry-run fails with Invalid (the
contabo-mkt failure mode: `spec.strategy.rollingUpdate: Forbidden`
on the post-merge object, which retained `rollingUpdate.maxSurge=25%` /
`maxUnavailable=25%` from the prior `kubectl-client-side-apply`
field manager), the controller falls back to deleting and recreating
THIS resource only. The recreated Deployment carries no residual
`rollingUpdate.*` fields, so the regression cannot recur. The
annotation is IaC, is scoped to the Deployment, and applies on every
reconcile.
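A minimal sketch of a structural check for the annotation, in the spirit of the runner's "structural annotation check" step. The helper name `has_force_annotation` and the grep pattern are assumptions for illustration, not the contents of tests/integration/strategy-flip.sh; a templated manifest that renders the annotation differently would need a different match.

```shell
# Hypothetical helper (not taken from the repository).
# Succeeds only if the manifest contains an uncommented line that sets
# kustomize.toolkit.fluxcd.io/force to "enabled" (optionally quoted).
has_force_annotation() {
  grep -Eq '^[[:space:]]*kustomize\.toolkit\.fluxcd\.io/force:[[:space:]]*"?enabled"?[[:space:]]*$' "$1"
}
```

Possible usage: `has_force_annotation products/catalyst/chart/templates/api-deployment.yaml || exit 1`.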
Verified gates:
- `kubectl apply --dry-run=server -f .../api-deployment.yaml`
over a Deployment in the bad pre-state (RollingUpdate +
maxSurge=25% / maxUnavailable=25%) → exit 0,
"deployment.apps/catalyst-api configured (server dry run)".
- Same manifest applied to an empty namespace via SSA + CSA →
both succeed (the fresh-install gate that catches `$patch:`-
shaped regressions).
- SSA path correctly REPRODUCES the regression mode (asserted
in step 3 of the integration test) → proves the recovery layer
is necessary.
- Flux force-recovery equivalent (delete + apply) succeeds →
proves the recovery path itself works.
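The first gate reduces to three conditions on the apply step: exit code 0, the "configured" line present, and the Forbidden error absent. A sketch of that evaluation follows; the helper name `check_apply_result` is hypothetical, and the two strings are copied from the gate output and the failure mode quoted above.

```shell
# Hypothetical helper (not taken from the repository).
# $1: exit code of the apply step
# $2: combined stdout+stderr of the apply step
# Returns 0 only if the exit code is 0, the forbidden error is absent,
# and the expected "configured" line is present.
check_apply_result() {
  exit_code="$1"
  output="$2"
  [ "$exit_code" -eq 0 ] || return 1
  case "$output" in
    *"spec.strategy.rollingUpdate: Forbidden"*) return 1 ;;
  esac
  case "$output" in
    *"deployment.apps/catalyst-api configured"*) return 0 ;;
  esac
  return 1
}
```

A runner could capture the output of the `kubectl apply --dry-run=server` invocation together with its exit status and pass both to the helper, failing the run when it returns non-zero.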
Files:
- products/catalyst/chart/templates/api-deployment.yaml: add
`kustomize.toolkit.fluxcd.io/force: enabled` annotation +
inline reference comment explaining failure mode and rejecting
inline `$patch: replace` as a future regression vector.
- docs/CHART-AUTHORING.md (new): authoritative chart-authoring
doc, with §"Strategy flips on existing Deployments" anchoring
the failure mode + canonical fix + table of related fields
(selector, clusterIP, accessModes, etc.) that share the
pattern. References docs/INVIOLABLE-PRINCIPLES.md #3 (Flux is
the only GitOps reconciler) and #4 (never hardcode runtime
knobs in operator runbooks).
- tests/integration/strategy-flip.yaml (new): bad-state fixture
+ assertion ConfigMap. Reproduces the exact 25%/25% pre-state
that triggered contabo-mkt.
- tests/integration/strategy-flip.sh (new): 6-step runner —
bad-state stage, CSA gate, SSA failure-mode reproduction,
structural annotation check, recovery-path proof, fresh-
install gate. Exits non-zero on any regression.
- .github/workflows/test-strategy-flip.yaml (new): CI wiring on
kind v1.30.6 (matches contabo-mkt k3s decoding behavior),
triggered by edits to the chart manifest, the test, the doc,
or the workflow itself.
Sweep of the rest of the Catalyst chart templates: the only
`strategy.type: Recreate` Deployment in the chart is catalyst-api.
catalyst-ui, marketplace-api, and all 11 sme-services Deployments
declare default RollingUpdate and live as RollingUpdate on contabo-
mkt — no latent flips. Services use ClusterIP with default IP
allocation; the api-deployments PVC is RWO and never re-shaped by
the chart. No additional resources needed hardening.
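A sweep like the one described can be approximated with a recursive grep. This is only a sketch: the helper name `list_recreate_manifests` is hypothetical, the templates directory is inferred from the path in the Files list, and nothing here claims the original sweep was performed this way.

```shell
# Hypothetical helper (not taken from the repository).
# Prints every file under directory $1 that declares `type: Recreate`
# on an uncommented line. Exits non-zero when no file matches.
list_recreate_manifests() {
  grep -rlE '^[[:space:]]*type:[[:space:]]*Recreate[[:space:]]*$' "$1"
}
```

Possible usage: `list_recreate_manifests products/catalyst/chart/templates`, which under the sweep's finding would be expected to list only the catalyst-api Deployment template.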
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
129 lines
5.6 KiB
YAML
# Integration test fixture: RollingUpdate -> Recreate strategy flip.
#
# Why this test exists
# --------------------
# On 2026-04-29 the contabo-mkt cluster's `catalyst` Flux Kustomization
# stuck at Ready=False with:
#
#   Deployment/catalyst/catalyst-api dry-run failed (Invalid):
#     Deployment.apps "catalyst-api" is invalid:
#       spec.strategy.rollingUpdate: Forbidden:
#         may not be specified when strategy `type` is 'Recreate'
#
# Root cause: the live Deployment had been created earlier by a
# `kubectl apply` that landed the default RollingUpdate strategy
# (maxSurge=25%, maxUnavailable=25%) on the object. When the chart's
# new manifest declared `strategy.type: Recreate`, Flux's server-side
# apply dry-run merged the new `type` into the existing strategy block
# WITHOUT removing the residual rolling-update fields. Kubernetes API
# validation rejected the merged result because Recreate forbids any
# rollingUpdate keys.
#
# The fix in `products/catalyst/chart/templates/api-deployment.yaml`
# adds the `kustomize.toolkit.fluxcd.io/force: enabled` annotation, so
# that kustomize-controller deletes and recreates the Deployment when
# the dry-run fails Invalid, dropping the leftover keys. An inline
# `$patch: replace` on the strategy block is NOT used: it is rejected
# under server-side apply and on every create path.
#
# This test reproduces the failing pre-state and asserts the new chart
# manifest applies cleanly over it. It is wired into CI by the
# `.github/workflows/test-strategy-flip.yaml` workflow which spins up
# kind, applies the fixture below, then applies the chart manifest.
#
# Three documents in this multi-doc YAML:
#   1. Namespace: isolation for the test run.
#   2. bad-state Deployment: the pre-existing live object that carries
#      RollingUpdate + maxSurge 25% / maxUnavailable 25%, exactly the
#      shape that triggered the regression on contabo-mkt.
#   3. assertion ConfigMap: captures the expected SUCCESS contract so
#      the CI script can diff against `kubectl apply` output and fail
#      loudly if the error string regresses.
---
apiVersion: v1
kind: Namespace
metadata:
  name: strategy-flip-test
  labels:
    test/owner: catalyst-chart
    test/purpose: strategy-flip-regression
---
# Pre-existing live Deployment, shaped to reproduce the original bug.
# This MUST be applied via `kubectl apply` (client-side apply) so the
# `kubectl-client-side-apply` field manager owns the rollingUpdate
# fields. That is the exact scenario kustomize-controller hits in
# production on objects created before the chart pinned Recreate.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: catalyst-api
  namespace: strategy-flip-test
  labels:
    app.kubernetes.io/name: catalyst-api
    app.kubernetes.io/component: api
    test/role: pre-existing-bad-state
spec:
  replicas: 1
  # The default RollingUpdate strategy with the default 25%/25% knobs.
  # Identical to what `kubectl apply` of a Deployment that did NOT
  # declare `strategy:` synthesizes on the API server side.
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  selector:
    matchLabels:
      app.kubernetes.io/name: catalyst-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: catalyst-api
    spec:
      # registry.k8s.io/pause is the canonical no-op container: never
      # pulls a heavy image, has no side effects, exits as designed.
      # Keeping the test side-effect-free means CI can run it in
      # under 30 seconds against a kind cluster.
      containers:
        - name: catalyst-api
          image: registry.k8s.io/pause:3.9
          ports:
            - containerPort: 8080
              protocol: TCP
---
# Assertion contract: the test runner reads this ConfigMap to know
#   (a) which command to run as the apply step,
#   (b) which manifest to apply,
#   (c) which exit code AND which kubectl output to require.
#
# Encoding the contract as data on the cluster (instead of in the test
# script) makes the contract reviewable in the same git diff as the
# chart change: every chart edit that touches strategy must update
# this contract or fail the integration test.
apiVersion: v1
kind: ConfigMap
metadata:
  name: strategy-flip-assertions
  namespace: strategy-flip-test
data:
  # Which manifest is the system-under-test. Path is relative to the
  # repository root so the same value works in CI and on a developer
  # workstation invoking the test runner from anywhere.
  target-manifest: products/catalyst/chart/templates/api-deployment.yaml
  # The kubectl invocation the runner must execute. `--dry-run=server`
  # uses client-side apply with server-side validation; this is the
  # same path the user prescribed as the verification gate.
  apply-command: kubectl apply --dry-run=server -n strategy-flip-test -f products/catalyst/chart/templates/api-deployment.yaml
  # Required exit code. A non-zero exit reproduces the bug.
  expected-exit-code: "0"
  # Required substring in stdout. It proves the apply was processed and
  # would have updated the existing Deployment (rather than failing or
  # creating a brand-new one because the namespace was empty).
  expected-stdout-substring: "deployment.apps/catalyst-api configured"
  # Forbidden substring in combined output. If this string appears
  # the regression is back. Quote it exactly as Kubernetes emits it
  # so a flaky-error rewording on a future K8s version surfaces as a
  # real test failure rather than a silent regression.
  forbidden-error-substring: "spec.strategy.rollingUpdate: Forbidden"