fix(cutover 0.1.20): Step-06 pushes YAML edit to local Gitea so patches survive Flux reconcile (#970) (#971)

## Root cause (live on otech116 2026-05-05 14:38)

After the #968 fix shipped (0.1.19), the cutover engine got through
Step-07 (87%), with Step-01..07 all completing. Step-08
(egress-block-test) then found that 38/38 HelmRepositories had reverted
to upstream:

```
external HelmRepositories still pointing at ghcr.io/openova-io: 38
  OFFENDER flux-system/bp-cilium=oci://ghcr.io/openova-io
  ... (37 more)
FAIL — at least one HelmRepository did not pivot
```

But Step-06's job logs say:
```
[helmrepository-patches] OK bp-cilium -> oci://harbor.otech116.omani.works/openova-io
... (37 more OK)
ok=38 skip=0 fail=0
```

So Step-06 reported success, and the patches did land momentarily. But
the bootstrap-kit Kustomization (already pivoted to local Gitea by
Step-05) then reconciled from local Gitea, where the YAML still
declared `url: oci://ghcr.io/openova-io`. Within ~30s every kubectl
patch was undone, and the cutover engine aborted at Step-08
verification.
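
The drift described above can be spotted by comparing each HelmRepository URL against the upstream prefix. A minimal sketch, assuming input lines in `namespace/name=url` form (the shape of the OFFENDER lines in the Step-08 output). The function name `count_offenders` is hypothetical and is not part of the chart:

```sh
# Hypothetical helper, not part of the chart.
# Reads "namespace/name=url" lines on stdin and prints how many URLs
# still start with the upstream prefix given as $1.
count_offenders() {
  upstream="$1"
  offenders=0
  while IFS= read -r line; do
    url="${line#*=}"            # everything after the first '='
    case "${url}" in
      "${upstream}"*) offenders=$((offenders + 1)) ;;
    esac
  done
  printf '%s\n' "${offenders}"
}
```

Assuming cluster access, the input could come from `kubectl get helmrepositories -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}={.spec.url}{"\n"}{end}'`. Running it once right after Step-06 and again a minute or two later would show whether the patches survive a reconcile.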

## Fix

Step-06 now runs in two phases:
1. **Live K8s patches** (existing behaviour) — flips spec.url on every
   HelmRepository immediately. Useful for the cluster between cutover
   and the next reconcile.
2. **NEW — Push YAML edit to local Gitea** — clones `openova/openova`
   from the local Gitea over basic-auth, sed-rewrites every
   `clusters/_template/bootstrap-kit/*.yaml` declaration of `url:
   oci://ghcr.io/openova-io` → `oci://harbor.<sov-fqdn>/openova-io`,
   commits with a clear message, pushes back. Subsequent reconciles
   see local Harbor as the steady-state.
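
The phase-2 rewrite can be tried on a single file outside the cluster. This is a sketch under stated assumptions, not the script the chart ships: the function name `rewrite_helmrepo_url` is illustrative, and it writes through a temp file rather than `sed -i` so it behaves the same on GNU and BSD sed.

```sh
# Illustrative helper, not part of the chart.
# $1 = YAML file, $2 = upstream prefix, $3 = local prefix.
# Only lines of the form "<indent>url: <upstream>" are rewritten.
# Assumes neither prefix contains ',' or '&' (both are special to sed here).
rewrite_helmrepo_url() {
  file="$1"; upstream="$2"; local_prefix="$3"
  tmp="${file}.tmp.$$"
  sed -E "s,^([[:space:]]*url:[[:space:]]*)${upstream}([[:space:]]*)$,\1${local_prefix}\2," \
    "${file}" > "${tmp}" && mv "${tmp}" "${file}"
}
```

As in the Step-06 script, the `.` characters in the prefix act as regex wildcards, so the match is slightly looser than a literal string comparison.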

After the push, the script annotates `flux-system/openova` GitRepository
to trigger immediate reconciliation so the new YAML lands without
waiting for the polling interval.
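
The reconcile trigger can be wrapped so it is easy to dry-run. A minimal sketch: `request_reconcile` and the `KUBECTL` override are assumptions introduced here for illustration, while the annotation key is the one Step-06 uses.

```sh
# Illustrative helper, not part of the chart.
# $1 = namespace, $2 = GitRepository name.
# KUBECTL is an assumed override (e.g. KUBECTL=echo prints the command
# instead of running it); the chart does not define this variable.
request_reconcile() {
  namespace="$1"; name="$2"
  ${KUBECTL:-kubectl} annotate --overwrite gitrepository "${name}" \
    -n "${namespace}" \
    "reconcile.fluxcd.io/requestedAt=$(date +%s)"
}
```

For example, `KUBECTL=echo request_reconcile flux-system openova` prints the arguments that would be passed to kubectl. The epoch timestamp is there so the annotation value changes on every run.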

## Image change

Step-06 image bumped from `bitnami/kubectl:1.31.4` to `alpine/k8s:1.31.4`
because the new phase needs both `kubectl` and `git` in one image
(verified live on otech116 — both binaries present).
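
A guard at the top of the script could make a missing binary fail fast instead of partway through phase 2. This is a sketch only; `require_tools` is a hypothetical helper and is not part of this commit.

```sh
# Hypothetical helper, not part of the chart.
# Returns non-zero if any of the named commands is not on PATH.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "${tool}" >/dev/null 2>&1; then
      echo "missing required tool: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

In the Step-06 container this might be called as `require_tools kubectl git || exit 1`.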

## Acceptance gate

Test case 16 added to cutover-contract.sh — guards against future
regressions that remove the `git clone`, the `git push origin main`,
or the `clusters/_template/bootstrap-kit` target dir reference.
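
The three checks in this gate follow one pattern, which can be sketched as a single function. This assumes the rendered manifest is available as a file; `require_patterns` is an illustrative name, not something cutover-contract.sh defines.

```sh
# Illustrative helper, not part of cutover-contract.sh.
# $1 = rendered manifest; remaining args = fixed strings that must appear.
require_patterns() {
  rendered="$1"; shift
  for pattern in "$@"; do
    if ! grep -qF -- "${pattern}" "${rendered}"; then
      echo "FAIL: missing '${pattern}'" >&2
      return 1
    fi
  done
  echo "  PASS"
}
```

Case 16 would then reduce to `require_patterns "$TMP/render.yaml" 'git clone' 'git push origin main' 'clusters/_template/bootstrap-kit' || exit 1`, at the cost of the per-check failure messages the current gate prints.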

## Live verification

Will fire on otech117 (next provision). Expected:
- Step-06 logs `cloning gitea-http.gitea.../openova/openova.git` then `pushed to ...`
- Step-08 verify PASSES (38/38 HelmRepositories pivoted in K8s + Gitea)
- self-sovereign-cutover-status reports `cutoverComplete: "true"`
- Egress block to ghcr.io safely activates

Co-authored-by: e3mrah <ebaysal@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3mrah 2026-05-05 18:55:22 +04:00 committed by GitHub
parent 9ed579d4ba
commit 608db53a25
4 changed files with 151 additions and 8 deletions


@@ -184,11 +184,27 @@ spec:
 # one DNS miss was terminal and the cutover engine aborted all
 # 8 steps. Fix is dual: (a) catalyst-api now stamps Jobs with
 # `backoffLimit=3` so a single miss is recoverable; (b) Step-01
-# bash script gains an explicit `nslookup` readiness loop (30 ×
+# bash script gains an explicit `nslookup` readiness loop (30 x
 # 5s) at the top, before any wget call. Both layers are needed —
 # the in-script probe is fastest; the backoffLimit is the
 # safety net for any other transient pre-cluster-stable race.
-version: 0.1.19
+# 0.1.20: Step-06 helmrepository-patches reverted by Flux (#970).
+# 0.1.19 unblocked the cutover through Step-7, but Step-08
+# verify caught 38/38 HelmRepositories had reverted to
+# oci://ghcr.io/openova-io despite Step-06's job logs showing
+# `OK ${name} -> oci://harbor.<sov-fqdn>/openova-io` for each.
+# Root cause: Step-06 only `kubectl patch`ed the live K8s
+# objects; bootstrap-kit Kustomization reconciled YAML from
+# local Gitea every 1m, where the YAML still declared the
+# upstream URL, undoing each patch within ~30s. Fix: Step-06
+# now does both phases — (a) live kubectl patches as before,
+# then (b) clones local Gitea, sed-rewrites every
+# clusters/_template/bootstrap-kit/*.yaml declaration of
+# `url: oci://ghcr.io/openova-io` → local Harbor prefix,
+# commits, and pushes. Subsequent reconciles see local Harbor
+# as steady-state. Image bumped to alpine/k8s:1.31.4 (kubectl
+# + git in one image; verified live on otech116).
+version: 0.1.20
 sourceRef:
 kind: HelmRepository
 name: bp-self-sovereign-cutover


@@ -1,6 +1,6 @@
 apiVersion: v2
 name: bp-self-sovereign-cutover
-version: 0.1.19
+version: 0.1.20
 description: |
 Catalyst Self-Sovereignty Cutover Blueprint. Installs DORMANT — this
 chart ships eight step ConfigMaps (PodSpec ConfigMaps, one per step),


@@ -29,7 +29,9 @@ data:
 activeDeadlineSeconds: {{ .Values.stepTimeouts.helmRepositoryPatchesSeconds }}
 containers:
 - name: helmrepository-patches
-image: {{ include "bp-self-sovereign-cutover.image" (dict "repository" .Values.images.kubectl.repository "tag" .Values.images.kubectl.tag "cutoverPhase" "post" "Values" .Values) }}
+image: alpine/k8s:1.31.4 # ships kubectl + git so we can both
+# patch the live K8s object AND push
+# the YAML edit to local Gitea (#970).
 imagePullPolicy: IfNotPresent
 env:
 - name: HELMREPO_NAMESPACE
@@ -38,10 +40,30 @@ data:
 value: {{ .Values.helmRepositories.upstreamPrefix | quote }}
 - name: HARBOR_PUBLIC_URL
 value: {{ .Values.sovereign.harborPublicURL | quote }}
+- name: SOVEREIGN_FQDN
+value: {{ .Values.sovereign.fqdn | quote }}
+- name: GITEA_INTERNAL_URL
+value: {{ .Values.sovereign.giteaInternalURL | quote }}
+- name: GITEA_USERNAME
+valueFrom:
+secretKeyRef:
+name: {{ .Values.gitea.adminSecretRef.name }}
+key: {{ .Values.gitea.adminSecretRef.usernameKey }}
+- name: GITEA_PASSWORD
+valueFrom:
+secretKeyRef:
+name: {{ .Values.gitea.adminSecretRef.name }}
+key: {{ .Values.gitea.adminSecretRef.passwordKey }}
+- name: GITEA_ORG
+value: {{ .Values.gitea.org | quote }}
+- name: GITEA_REPO
+value: {{ .Values.gitea.repo | quote }}
 volumeMounts:
 - name: helmrepository-list
 mountPath: /work
 readOnly: true
+- name: tmp
+mountPath: /tmp
 command: ["/bin/sh", "-c"]
 args:
 - |
@@ -51,6 +73,7 @@ data:
 echo "[helmrepository-patches] upstream=${UPSTREAM_PREFIX} -> local=${local_prefix}"
+# ---- Phase 1: live K8s patches (immediate) ----
 ok=0
 skip=0
 fail=0
@@ -90,11 +113,89 @@ data:
 fi
 done < /work/helmrepository-names.txt
-echo "[helmrepository-patches] ok=${ok} skip=${skip} fail=${fail}"
-[ "${fail}" -eq 0 ]
+echo "[helmrepository-patches] live-K8s ok=${ok} skip=${skip} fail=${fail}"
+[ "${fail}" -eq 0 ] || exit 1
+# ---- Phase 2: push YAML edit to local Gitea (#970) ----
+#
+# The kubectl patches above flip the live HelmRepository
+# objects. But the bootstrap-kit Kustomization reconciles
+# YAML from the Sovereign's local Gitea every minute, and
+# those YAML files still declare `url: oci://ghcr.io/openova-io`.
+# Without this phase, Flux reverts every patch within ~1m,
+# and Step-08's verify catches the regression as "OFFENDER".
+#
+# Fix: clone the local Gitea, sed-rewrite every
+# clusters/_template/bootstrap-kit/*.yaml that declares the
+# upstream URL, commit, push. Subsequent reconciles pick up
+# the local Harbor URL as the steady-state.
+export HOME=/tmp
+git config --global user.name "self-sovereign-cutover"
+git config --global user.email "cutover@${SOVEREIGN_FQDN}"
+git config --global advice.detachedHead false
+# gitea-http is headless (#968); wait for DNS just in case.
+gitea_host="$(printf '%s' "${GITEA_INTERNAL_URL}" | sed -E 's|^https?://||' | cut -d: -f1 | cut -d/ -f1)"
+for i in $(seq 1 30); do
+if nslookup "${gitea_host}" >/dev/null 2>&1; then break; fi
+sleep 5
+done
+# URL with embedded basic auth — credential goes to git via
+# URL only, never echoed to stdout.
+push_url=$(printf '%s' "${GITEA_INTERNAL_URL}" | sed -E "s,^(https?://),\1${GITEA_USERNAME}:${GITEA_PASSWORD}@,")"/${GITEA_ORG}/${GITEA_REPO}.git"
+redacted=$(printf '%s' "${GITEA_INTERNAL_URL}/${GITEA_ORG}/${GITEA_REPO}.git")
+echo "[helmrepository-patches] cloning ${redacted}"
+cd /tmp
+rm -rf repo
+git clone --depth 1 --branch main "${push_url}" repo >/dev/null 2>&1
+cd repo
+# Find every HelmRepository YAML that declares the upstream
+# URL under clusters/_template/bootstrap-kit/. Rewrite to
+# the local Harbor prefix in-place.
+target_dir="clusters/_template/bootstrap-kit"
+if [ ! -d "${target_dir}" ]; then
+echo "[helmrepository-patches] FATAL: ${target_dir} not present in local mirror" >&2
+exit 1
+fi
+edited=0
+for f in $(grep -lE "^[[:space:]]*url:[[:space:]]*${UPSTREAM_PREFIX}[[:space:]]*$" "${target_dir}"/*.yaml 2>/dev/null || true); do
+sed -i -E "s,^([[:space:]]*url:[[:space:]]*)${UPSTREAM_PREFIX}([[:space:]]*)$,\1${local_prefix}\2," "${f}"
+edited=$((edited+1))
+echo "[helmrepository-patches] edited ${f}"
+done
+echo "[helmrepository-patches] sed edited ${edited} files"
+if [ "${edited}" -eq 0 ]; then
+echo "[helmrepository-patches] no edits — already pivoted in Gitea or upstream prefix not present"
+# Don't fail; phase-1 already succeeded.
+exit 0
+fi
+git add "${target_dir}"
+if git diff --staged --quiet; then
+echo "[helmrepository-patches] git diff empty after sed — nothing to commit"
+exit 0
+fi
+git commit -m "cutover: pivot ${edited} HelmRepository URLs to local Harbor" >/dev/null
+git push origin main >/dev/null 2>&1 || {
+echo "[helmrepository-patches] FATAL: git push failed" >&2
+exit 1
+}
+echo "[helmrepository-patches] pushed to ${redacted} (commit will reconcile via bootstrap-kit Kustomization)"
+# Trigger an immediate Flux reconciliation so the new YAML
+# lands without waiting for the polling interval.
+kubectl annotate --overwrite gitrepository openova \
+-n flux-system \
+"reconcile.fluxcd.io/requestedAt=$(date +%s)" >/dev/null || true
+echo "[helmrepository-patches] step complete"
 resources:
-requests: { cpu: 50m, memory: 64Mi }
-limits: { memory: 256Mi }
+requests: { cpu: 50m, memory: 128Mi }
+limits: { memory: 384Mi }
 securityContext:
 runAsNonRoot: true
 runAsUser: 1001
@@ -109,3 +210,5 @@ data:
 items:
 - key: helmrepository-names.txt
 path: helmrepository-names.txt
+- name: tmp
+emptyDir: {}


@@ -267,4 +267,28 @@ if ! grep -q 'gitea_host=' "$TMP/render.yaml"; then
 fi
 echo " PASS (Step-01 has DNS readiness probe)"
+echo "[cutover-contract] Case 16: Step-06 helmrepository-patches pushes YAML edit to local Gitea (#970)"
+# 0.1.19 Step-06 only ran kubectl patch against live HelmRepository
+# objects. bootstrap-kit Kustomization reconciled YAML from local
+# Gitea every 1m and reverted each patch within ~30s — Step-08
+# verify caught 38/38 OFFENDERS (caught live on otech116 2026-05-05).
+#
+# 0.1.20 Step-06 has a Phase-2 that clones local Gitea, sed-rewrites
+# every clusters/_template/bootstrap-kit/*.yaml that declares the
+# upstream URL, commits, and pushes. This gate guards against future
+# regressions that drop the git-push.
+if ! grep -q 'git clone' "$TMP/render.yaml"; then
+echo "FAIL: Step-06 missing git clone (no Gitea push — patches will be reverted by Flux) (#970)" >&2
+exit 1
+fi
+if ! grep -q 'git push origin main' "$TMP/render.yaml"; then
+echo "FAIL: Step-06 missing git push origin main (#970)" >&2
+exit 1
+fi
+if ! grep -q 'clusters/_template/bootstrap-kit' "$TMP/render.yaml"; then
+echo "FAIL: Step-06 missing target_dir reference (#970)" >&2
+exit 1
+fi
+echo " PASS (Step-06 pushes YAML edit to local Gitea)"
 echo "[cutover-contract] All gates green."