fix(openbao): make auth-bootstrap Job idempotent on post-upgrade (token already revoked)

bp-openbao 1.2.15 (the HTTPRoute backend-name collapse fix) replayed the `auth-bootstrap` post-install,post-upgrade hook against an already-bootstrapped OpenBao. The hook hit `Error enabling kubernetes auth: 403 permission denied` on `bao auth enable -path=kubernetes kubernetes`, the upgrade failed, and Flux auto-rolled the release back to 1.2.14. Net effect: every chart bump that touches bp-openbao is unrecoverable without manual intervention.

Root cause is in the hook itself: at the end of the FIRST run it runs `bao token revoke -self` and deletes the openbao-root-token Secret content (acceptance criterion #6: no root token persists past install). On any post-upgrade replay, the Secret still mounts via valueFrom but the token value is REVOKED, so every privileged call (`auth enable`, `secrets enable`, `policy write`, `write role`) returns 403. The existing idempotency check (`bao auth list | grep kubernetes/`) doesn't help because `bao auth list` itself 403s, and the `|| echo "{}"` fallback masks that failure, so the script concludes the auth method is missing.

Fix: add a token-validity gate immediately after the `initialized=true sealed=false` wait. Call `bao token lookup` (zero-cost, strictly read-only on the caller's token). If it 403s, BAO_TOKEN was revoked by a prior successful run, so exit 0. The auth method, role, kv backend, and ESO policy are all already configured; there is nothing for this Job to do on a re-run.

Chart bump: bp-openbao 1.2.15 → 1.2.16.

Caught live on prov #80 (omantel.biz, 2026-05-14) when bp-openbao 1.2.14 → 1.2.15 was rolled by Flux and immediately failed and rolled back in a loop, blocking bp-newapi's dependsOn and stalling the bootstrap-kit Kustomization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
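The masking behaviour described in the root-cause paragraph can be reproduced without an OpenBao server. The following is a minimal sketch, not the hook's actual code: `fake_auth_list_revoked` is a hypothetical stand-in for `bao auth list` called with a revoked token, and its error text and exit code are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for `bao auth list` with a revoked token:
# writes an error to stderr and exits non-zero, as a 403 would.
fake_auth_list_revoked() {
  echo "403 permission denied" >&2
  return 2
}

# The masked check: any failure is replaced by "{}", so the grep
# below cannot tell "method not enabled" from "call was rejected".
auth_json=$(fake_auth_list_revoked 2>/dev/null || echo "{}")

if echo "$auth_json" | grep -q 'kubernetes/'; then
  masked_result="already-enabled"
else
  masked_result="missing"
fi

echo "masked check concludes: $masked_result"
```

Because the fallback turns the rejected call into an empty JSON object, the check reports the auth method as missing and the script goes on to attempt `bao auth enable`, which is where the 403 finally surfaces.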
parent 3d929e69d7
commit 0af69e8728
@@ -54,7 +54,7 @@ spec:
   chart:
     spec:
       chart: bp-openbao
-      version: 1.2.15
+      version: 1.2.16
       sourceRef:
         kind: HelmRepository
         name: bp-openbao
@@ -1,6 +1,6 @@
 apiVersion: v2
 name: bp-openbao
-version: 1.2.15
+version: 1.2.16
 description: |
   Catalyst-curated Blueprint umbrella chart for OpenBao. Depends on the
   upstream `openbao` chart as a Helm subchart so `helm dependency build`
@@ -137,6 +137,24 @@ spec:
   sleep 5
 done

+# ─── Token-validity gate (post-upgrade no-op) ─────────────
+# On post-install (first run) BAO_TOKEN is the freshly-minted
+# root token from openbao-root-token Secret and is valid.
+# On post-upgrade re-runs the same root token has ALREADY
+# been revoked by the previous run (see "revoke + cleanup"
+# section below) so every privileged call returns 403. There
+# is nothing meaningful for this Job to do on a re-run — the
+# auth method, the role, and the kv backend are already
+# configured. Detect the revoked-token case and exit 0 so a
+# routine chart bump doesn't fail the HR's post-upgrade hook
+# and rollback the release. Caught live on prov #80 when
+# bp-openbao 1.2.14 → 1.2.15 (an HTTPRoute-only bump) replayed
+# the hook and 403'd on `bao auth enable`.
+if ! bao token lookup >/dev/null 2>&1; then
+  echo "[openbao-auth-bootstrap] BAO_TOKEN no longer valid (likely revoked by a prior successful run); nothing to do — exiting 0"
+  exit 0
+fi
+
 # ─── Idempotency: skip if auth method + role already exist ─
 # `bao auth list` returns 200 and the JSON includes the mount
 # path key (e.g. "kubernetes/") when the method is enabled.
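The control flow of the token-validity gate can be exercised in isolation. This is a sketch under stated assumptions: `should_bootstrap`, `valid_lookup`, and `revoked_lookup` are hypothetical names introduced here, with the two stubs standing in for `bao token lookup` succeeding and failing. Like the gate in the hunk, it treats any non-zero exit from the lookup as "token revoked", so it relies on the earlier wait loop having already confirmed the server is reachable.

```shell
#!/bin/sh
# should_bootstrap CMD...: returns 0 when the lookup command succeeds
# (bootstrap should proceed) and 1 when it fails (skip the re-run).
should_bootstrap() {
  if ! "$@" >/dev/null 2>&1; then
    echo "token no longer valid; nothing to do"
    return 1
  fi
  return 0
}

# Hypothetical stubs for `bao token lookup` on first run and on replay.
valid_lookup() { return 0; }
revoked_lookup() { echo "403 permission denied" >&2; return 2; }

if should_bootstrap valid_lookup; then first_run="proceed"; else first_run="skip"; fi
if should_bootstrap revoked_lookup; then rerun="proceed"; else rerun="skip"; fi

echo "first run: $first_run, post-upgrade replay: $rerun"
```

In the real hook the skip branch exits 0 rather than returning, so Helm records the post-upgrade hook as successful and the release is not rolled back.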