fix(self-sovereign-cutover): set HR timeout 15m + lower hook deadlines below it (#127)

Two prior provisions (#12 d22b6d4dada2aef2, #14 12e194090631a885) wedged
identically at phase1-watching: the bp-self-sovereign-cutover@0.1.25
post-install hook timed out at 5m (Helm default), Flux marked the release
Failed, retried 3x, and gave up. catalyst-api never received the kubeconfig
PUT-back because the cutover chain inside the Sovereign couldn't complete.

Root cause: the HelmRelease had no explicit install/upgrade timeout → Helm's
5m default → hit before the auto-trigger Job's activeDeadlineSeconds (600s)
and WAIT_TIMEOUT_SECONDS (300s) could complete cleanly.

Fix:
- HR install/upgrade timeout: 15m (covers cold-start cluster + auto-trigger)
- values.autoWaitForAPISeconds: 300 → 720 (12m wait, exits 0 below 15m HR cap)
- values.autoTimeoutSeconds: 600 → 840 (14m Job deadline, below 15m HR cap)
- Chart bump 0.1.25 → 0.1.26

Per CLAUDE.md principle 16: canonical seam = HR timeout + chart hook
deadlines. Both must align: hook deadlines < HR timeout, otherwise Helm gives
up before the hook completes regardless of how the Job exits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
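
The alignment rule in this message (API wait < Job deadline < HR timeout) can be checked mechanically. The following is a minimal sketch, not part of the chart or of Flux: the helper names `to_seconds` and `check_deadlines` are hypothetical, and the duration parser assumes only simple `h`/`m`/`s` strings such as `15m`.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a simple duration such as '15m', '900s' or '1h30m' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([hms])", duration)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != duration:
        raise ValueError(f"unsupported duration: {duration!r}")
    return sum(int(n) * units[u] for n, u in parts)


def check_deadlines(hr_timeout: str, wait_seconds: int, job_deadline_seconds: int) -> list[str]:
    """Return a list of violations of: wait < Job deadline < HR timeout."""
    errors = []
    cap = to_seconds(hr_timeout)
    if wait_seconds >= job_deadline_seconds:
        errors.append(
            f"autoWaitForAPISeconds ({wait_seconds}s) must be below "
            f"autoTimeoutSeconds ({job_deadline_seconds}s)"
        )
    if job_deadline_seconds >= cap:
        errors.append(
            f"autoTimeoutSeconds ({job_deadline_seconds}s) must be below "
            f"the HR timeout ({cap}s)"
        )
    return errors


if __name__ == "__main__":
    # Values from this commit: 720s wait, 840s Job deadline, 15m HR timeout.
    print(check_deadlines("15m", 720, 840))
    # Previous values against Helm's 5m default.
    print(check_deadlines("5m", 300, 600))
```

With the values from this commit the check reports no violations; with the previous values against the 5m default it flags the 600s Job deadline as exceeding the 300s cap.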
parent 86231d1d2f
commit 40612f19ea
@@ -241,10 +241,12 @@ spec:
         namespace: flux-system
   install:
     disableWait: true
+    timeout: 15m
     remediation:
       retries: 3
   upgrade:
     disableWait: true
+    timeout: 15m
     remediation:
       retries: 3
   # Per-Sovereign overrides — the chart's values.yaml carries

@@ -1,6 +1,6 @@
 apiVersion: v2
 name: bp-self-sovereign-cutover
-version: 0.1.25
+version: 0.1.26
 description: |
   Catalyst Self-Sovereignty Cutover Blueprint. Installs DORMANT — this
   chart ships eight step ConfigMaps (PodSpec ConfigMaps, one per step),

@@ -331,14 +331,19 @@ trigger:
   catalystAPIURL: "http://catalyst-api.catalyst-system.svc.cluster.local:8080"
   # How long the auto-trigger Job will wait for catalyst-api to be
   # reachable before giving up (and exiting 0 so the operator can fire
-  # manually). 5 minutes is enough for a Sovereign mid-cold-start.
-  autoWaitForAPISeconds: 300
+  # manually). Must finish below the HelmRelease install/upgrade
+  # timeout (15m for bp-self-sovereign-cutover) AND the activeDeadline
+  # below so the Job exits cleanly even when catalyst-api never comes
+  # up — 12 minutes leaves a healthy 3m buffer below the 15m HR cap.
+  autoWaitForAPISeconds: 720
   # Overall cap on the auto-trigger Job runtime. activeDeadlineSeconds
   # on the Job spec — anything longer means catalyst-api is sick and
   # the operator should investigate. The Job exiting at this deadline
   # is non-fatal for the chart install (the cutover engine already
   # runs detached inside catalyst-api once /start returns 200).
-  autoTimeoutSeconds: 600
+  # Must stay below the HelmRelease install/upgrade timeout (15m =
+  # 900s) so the Job ends and the hook unblocks before Helm gives up.
+  autoTimeoutSeconds: 840
   # TTL on the completed Job — kept for audit so operators can read
   # the trigger Pod logs if something looks wrong.
   autoJobTTLSeconds: 86400
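
The values comments describe a bounded wait: poll catalyst-api until it is reachable or until autoWaitForAPISeconds elapses, then give up without failing. The loop below is only an illustrative sketch of that pattern. It is not the chart's actual trigger script, and `wait_for_api`, `probe`, `clock` and `sleep` are hypothetical names; the clock and sleep functions are injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for_api(
    probe: Callable[[], bool],
    timeout_seconds: float,
    interval_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_seconds` has elapsed.

    Returns True if the API became reachable, False on timeout. The caller
    decides the exit code; per the values comments the Job exits 0 either
    way so the operator can fire the trigger manually.
    """
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        if probe():
            return True
        # Never sleep past the deadline, so the wait ends on time.
        sleep(min(interval_seconds, max(0.0, deadline - clock())))
    return False
```

Keeping the wait (720s here) strictly below the Job's activeDeadlineSeconds (840s) gives a loop like this room to return and the process room to exit on its own before Kubernetes terminates the Job.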