openova/tests/integration/strategy-flip.yaml
hatiyildiz 015e7ab18b fix(catalyst-chart): annotate api-deployment for Flux strategy-flip recovery
DIVERGES from the literal "$patch: replace" prescription on the issue
because that directive cannot survive any apply path that actually
runs in production (verified end-to-end in
tests/integration/strategy-flip.sh):

  - Flux's kustomize-controller submits via Server-Side Apply. SSA
    rejects `.spec.strategy.$patch` with "field not declared in
    schema" — fluxcd/pkg/ssa Manager.Apply does not preprocess SMP
    directives.
  - kubectl strict-decoding rejects `$patch` on every CREATE path
    (`kubectl create`, `kubectl apply` to an empty namespace, every
    `--server-side` flavor) with "unknown field spec.strategy.$patch"
    — adding it to a chart base resource BREAKS fresh installs of
    every new Sovereign.
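
As a rough illustration of guarding against this, the sketch below is a
hypothetical lint (not part of this commit) that scans manifests for a
`$patch:` key. The names `find_patch_directives` and
`reject_patch_directives` are assumptions, and the match is plain text
rather than a YAML parse.

```shell
#!/usr/bin/env bash
# Hypothetical lint, for illustration only: flag strategic-merge
# `$patch:` directive keys in the given manifest files.

# Prints file:line:match for each hit; returns 0 if any hit was found.
find_patch_directives() {
  grep -HnE '^[[:space:]]*(-[[:space:]]+)?\$patch:' "$@"
}

# Returns 1 (and explains why) if any manifest carries a `$patch:` key.
reject_patch_directives() {
  if find_patch_directives "$@"; then
    echo "error: \$patch directive found; rejected on SSA and CREATE paths" >&2
    return 1
  fi
  return 0
}
```

A call such as `reject_patch_directives products/catalyst/chart/templates/*.yaml`
would return non-zero as soon as one template carries the directive.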

The durable fix is the documented Flux annotation
`kustomize.toolkit.fluxcd.io/force: enabled` on the Deployment.
When kustomize-controller's SSA dry-run fails Invalid (the contabo-
mkt failure mode: `spec.strategy.rollingUpdate: Forbidden` on the
post-merge object that retained `rollingUpdate.maxSurge=25%` /
`maxUnavailable=25%` from the prior `kubectl-client-side-apply`
field manager), the controller falls back to delete-and-recreate
THIS resource. The recreated Deployment carries no residual
`rollingUpdate.*` fields, so the regression cannot recur. The
annotation is IaC, scoped to the Deployment, and applies on every
reconcile.
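
For orientation, the annotation sits under `metadata.annotations` on the
Deployment. The sketch below shows one way a structural check for it
could look; `has_force_annotation` and the sample manifest are
hypothetical and may differ from the check in
tests/integration/strategy-flip.sh.

```shell
#!/usr/bin/env bash
# Hypothetical structural check: succeeds only if the manifest text
# declares the Flux force annotation. Plain-text match, not a YAML parse.
has_force_annotation() {
  local manifest="$1"
  grep -Eq '^[[:space:]]*kustomize\.toolkit\.fluxcd\.io/force:[[:space:]]*"?enabled"?[[:space:]]*$' "$manifest"
}

# Sample manifest fragment (illustrative, trimmed to the metadata block).
demo="$(mktemp)"
cat > "$demo" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: catalyst-api
  annotations:
    kustomize.toolkit.fluxcd.io/force: enabled
EOF

if has_force_annotation "$demo"; then
  echo "annotation present"
else
  echo "annotation missing" >&2
fi
rm -f "$demo"
```

Because the check only greps the file, it can run before any cluster
exists, which is why it suits a fast pre-apply gate.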

Verified gates:
  - `kubectl apply --dry-run=server -f .../api-deployment.yaml`
    over a Deployment in the bad pre-state (RollingUpdate +
    maxSurge=25% / maxUnavailable=25%) → exit 0,
    "deployment.apps/catalyst-api configured (server dry run)".
  - Same manifest applied to an empty namespace via SSA + CSA →
    both succeed (the fresh-install gate that catches `$patch:`-
    shaped regressions).
  - SSA path correctly REPRODUCES the regression mode (asserted
    in step 3 of the integration test) → proves the recovery layer
    is necessary.
  - Flux force-recovery equivalent (delete + apply) succeeds →
    proves the recovery path itself works.
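
The first gate above reduces to three checks on one `kubectl apply` run:
the exit code, a required substring, and a forbidden substring. A minimal
sketch of such an evaluator follows, assuming the contract values are
read from the `strategy-flip-assertions` ConfigMap in the fixture.
`check_contract` and `read_assertion` are hypothetical names; the real
runner may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one key of the assertion ConfigMap.
# Assumes the fixture has already been applied to the cluster.
read_assertion() {
  kubectl get configmap strategy-flip-assertions -n strategy-flip-test \
    -o "jsonpath={.data.$1}"
}

# check_contract ACTUAL_EXIT OUTPUT EXPECTED_EXIT REQUIRED FORBIDDEN
# Prints PASS and returns 0 only if all three conditions hold.
check_contract() {
  local actual_exit="$1" output="$2" expected_exit="$3"
  local required="$4" forbidden="$5"

  if [ "$actual_exit" != "$expected_exit" ]; then
    echo "FAIL: exit code $actual_exit, expected $expected_exit" >&2
    return 1
  fi
  # Quoted expansions inside the pattern are matched literally.
  case "$output" in
    *"$forbidden"*)
      echo "FAIL: forbidden substring present: $forbidden" >&2
      return 1
      ;;
  esac
  case "$output" in
    *"$required"*) ;;
    *)
      echo "FAIL: required substring missing: $required" >&2
      return 1
      ;;
  esac
  echo "PASS"
}
```

A runner could capture the apply step with
`output="$($(read_assertion apply-command) 2>&1)"; rc=$?` and then pass
`"$rc"`, `"$output"`, and the three remaining ConfigMap values to
`check_contract`.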

Files:
  - products/catalyst/chart/templates/api-deployment.yaml: add
    `kustomize.toolkit.fluxcd.io/force: enabled` annotation +
    inline reference comment explaining failure mode and rejecting
    inline `$patch: replace` as a future regression vector.
  - docs/CHART-AUTHORING.md (new): authoritative chart-authoring
    doc, with §"Strategy flips on existing Deployments" anchoring
    the failure mode + canonical fix + table of related fields
    (selector, clusterIP, accessModes, etc.) that share the
    pattern. References docs/INVIOLABLE-PRINCIPLES.md #3 (Flux is
    the only GitOps reconciler) and #4 (never hardcode runtime
    knobs in operator runbooks).
  - tests/integration/strategy-flip.yaml (new): bad-state fixture
    + assertion ConfigMap. Reproduces the exact 25%/25% pre-state
    that triggered contabo-mkt.
  - tests/integration/strategy-flip.sh (new): 6-step runner —
    bad-state stage, CSA gate, SSA failure-mode reproduction,
    structural annotation check, recovery-path proof, fresh-
    install gate. Exits non-zero on any regression.
  - .github/workflows/test-strategy-flip.yaml (new): CI wiring on
    kind v1.30.6 (matches contabo-mkt k3s decoding behavior),
    triggered by edits to the chart manifest, the test, the doc,
    or the workflow itself.

Sweep of the rest of the Catalyst chart templates: the only
`strategy.type: Recreate` Deployment in the chart is catalyst-api.
catalyst-ui, marketplace-api, and all 11 sme-services Deployments
declare default RollingUpdate and live as RollingUpdate on contabo-
mkt — no latent flips. Services use ClusterIP with default IP
allocation; the api-deployments PVC is RWO and never re-shaped by
the chart. No additional resources needed hardening.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:04:07 +02:00


# Integration test fixture — RollingUpdate -> Recreate strategy flip.
#
# Why this test exists
# --------------------
# On 2026-04-29 the contabo-mkt cluster's `catalyst` Flux Kustomization
# stuck at Ready=False with:
#
#   Deployment/catalyst/catalyst-api dry-run failed (Invalid):
#     Deployment.apps "catalyst-api" is invalid:
#     spec.strategy.rollingUpdate: Forbidden:
#     may not be specified when strategy `type` is 'Recreate'
#
# Root cause: the live Deployment had been created earlier by a
# `kubectl apply` that landed the default RollingUpdate strategy
# (maxSurge=25%, maxUnavailable=25%) on the object, owned by the
# `kubectl-client-side-apply` field manager. When the chart's new
# manifest declared `strategy.type: Recreate`, kustomize-controller's
# server-side apply dry-run merged the new `type` into the existing
# strategy block WITHOUT removing the residual rollingUpdate fields.
# Kubernetes API validation rejected the merged result because
# Recreate forbids any rollingUpdate keys.
#
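# The fix in `products/catalyst/chart/templates/api-deployment.yaml`
# adds the Flux annotation `kustomize.toolkit.fluxcd.io/force: enabled`
# to the Deployment. When the dry-run fails Invalid, kustomize-controller
# falls back to delete-and-recreate for that resource, and the recreated
# Deployment carries no leftover rollingUpdate keys. An inline
# `$patch: replace` on the strategy block was rejected because it is
# refused by server-side apply and on every CREATE path.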
#
# This test reproduces the failing pre-state and asserts the new chart
# manifest applies cleanly over it. It is wired into CI by the
# `.github/workflows/test-strategy-flip.yaml` workflow which spins up
# kind, applies the fixture below, then applies the chart manifest.
#
# Three documents in this multi-doc YAML:
#   1. Namespace: isolation for the test run.
#   2. bad-state Deployment: the pre-existing live object that carries
#      RollingUpdate + maxSurge 25% / maxUnavailable 25%, exactly the
#      shape that triggered the regression on contabo-mkt.
#   3. assertion ConfigMap: captures the expected SUCCESS contract so
#      the CI script can diff against `kubectl apply` output and fail
#      loudly if the error string regresses.
---
apiVersion: v1
kind: Namespace
metadata:
  name: strategy-flip-test
  labels:
    test/owner: catalyst-chart
    test/purpose: strategy-flip-regression
---
# Pre-existing live Deployment, shaped to reproduce the original bug.
# This MUST be applied via `kubectl apply` (client-side apply) so the
# `kubectl-client-side-apply` field manager owns the rollingUpdate
# fields — that is the exact scenario kustomize-controller hits in
# production on objects created before the chart pinned Recreate.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: catalyst-api
  namespace: strategy-flip-test
  labels:
    app.kubernetes.io/name: catalyst-api
    app.kubernetes.io/component: api
    test/role: pre-existing-bad-state
spec:
  replicas: 1
  # The default RollingUpdate strategy with the default 25%/25% knobs.
  # Identical to what `kubectl apply` of a Deployment that did NOT
  # declare `strategy:` synthesizes on the API server side.
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  selector:
    matchLabels:
      app.kubernetes.io/name: catalyst-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: catalyst-api
    spec:
      # registry.k8s.io/pause is the canonical no-op container: never
      # pulls a heavy image, has no side effects, exits as designed.
      # Keeping the test side-effect-free means CI can run it in
      # under 30 seconds against a kind cluster.
      containers:
        - name: catalyst-api
          image: registry.k8s.io/pause:3.9
          ports:
            - containerPort: 8080
              protocol: TCP
---
# Assertion contract — the test runner reads this ConfigMap to know
# (a) which command to run as the apply step,
# (b) which manifest to apply,
# (c) which exit code AND which kubectl output to require.
#
# Encoding the contract as data on the cluster (instead of in the test
# script) makes the contract reviewable in the same git diff as the
# chart change — every chart edit that touches strategy must update
# this contract or fail the integration test.
apiVersion: v1
kind: ConfigMap
metadata:
  name: strategy-flip-assertions
  namespace: strategy-flip-test
data:
  # Which manifest is the system-under-test. Path is relative to the
  # repository root so the same value works in CI and on a developer
  # workstation invoking the test runner from anywhere.
  target-manifest: products/catalyst/chart/templates/api-deployment.yaml
  # The kubectl invocation the runner must execute. `--dry-run=server`
  # uses client-side apply with server-side validation; this is the
  # same path the user prescribed as the verification gate.
  apply-command: kubectl apply --dry-run=server -n strategy-flip-test -f products/catalyst/chart/templates/api-deployment.yaml
  # Required exit code. A non-zero exit reproduces the bug.
  expected-exit-code: "0"
  # Required substring in stdout. It proves the apply was processed and
  # would have updated the existing Deployment (rather than failing or
  # creating a brand-new one because the namespace was empty).
  expected-stdout-substring: "deployment.apps/catalyst-api configured"
  # Forbidden substring in combined output. If this string appears
  # the regression is back. Quote it exactly as Kubernetes emits it
  # so a flaky-error rewording on a future K8s version surfaces as a
  # real test failure rather than a silent regression.
  forbidden-error-substring: "spec.strategy.rollingUpdate: Forbidden"