fix(openbao): make auth-bootstrap Job idempotent on post-upgrade (token already revoked)

bp-openbao 1.2.15 (the HTTPRoute backend-name collapse fix) replayed the
`auth-bootstrap` post-install,post-upgrade hook against an already-bootstrapped
OpenBao. The hook hit `Error enabling kubernetes auth: 403 permission denied`
on `bao auth enable -path=kubernetes kubernetes`, the upgrade failed, and Flux
auto-rolled the release back to 1.2.14. Net effect: every chart bump that
touches bp-openbao is unrecoverable without manual intervention.

Root cause is in the hook itself: at the end of the FIRST run it runs
`bao token revoke -self` and deletes the openbao-root-token Secret content
(acceptance criterion #6: no root token persists past install). On any
post-upgrade replay, the Secret still mounts via valueFrom but the token
value is REVOKED, so every privileged call (`auth enable`, `secrets enable`,
`policy write`, `write role`) returns 403. The existing idempotency check
(`bao auth list | grep kubernetes/`) doesn't help because `bao auth list`
itself silently 403s and the `|| echo "{}"` mask makes the script think the
auth method is missing.
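
The masking effect can be reproduced without an OpenBao server. This is a
minimal sketch, not the hook's actual script: `fake_auth_list` is a
hypothetical stand-in for `bao auth list` called with a revoked token, and
the way the `|| echo "{}"` fallback feeds the `grep` check is assumed from
the description above.

```shell
#!/bin/sh
# Hypothetical stand-in for `bao auth list` with a revoked token:
# nothing useful on stdout, an error on stderr, non-zero exit status.
fake_auth_list() {
  echo "403 permission denied" >&2
  return 2
}

# The masking pattern: a failed listing is replaced by an empty JSON object,
# so the failure never reaches the caller.
auths=$(fake_auth_list 2>/dev/null || echo "{}")

# The existence check finds no "kubernetes/" key and concludes the auth
# method is missing, although the real cause is the rejected token.
if printf '%s' "$auths" | grep -q 'kubernetes/'; then
  auth_state="present"
else
  auth_state="missing"
fi

echo "auth method looks: $auth_state"
```

The script then falls through to `bao auth enable`, which is where the 403
finally surfaces.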

Fix: add a token-validity gate immediately after the
`initialized=true sealed=false` wait. Call `bao token lookup` (zero-cost,
strictly read-only on the caller's token). If it fails, BAO_TOKEN was most
likely revoked by a prior successful run, so the Job exits 0. The auth
method, role, kv backend, and ESO policy are all already configured, so
there is nothing for this Job to do on a re-run.
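
The gate's decision logic can be exercised in isolation. The sketch below
is illustrative only: the `bao` function is a hypothetical stub driven by a
made-up `FAKE_TOKEN_STATE` variable, and `gate_decision` prints the outcome
instead of calling `exit 0` so both paths can run in one script.

```shell
#!/bin/sh
# Hypothetical stub so the sketch runs without an OpenBao server.
# `bao token lookup` succeeds only while FAKE_TOKEN_STATE is "valid".
bao() {
  if [ "$1 $2" = "token lookup" ] && [ "${FAKE_TOKEN_STATE:-valid}" = "valid" ]; then
    return 0
  fi
  echo "403 permission denied" >&2
  return 2
}

# Same test as the gate in the hook: any lookup failure means "skip".
gate_decision() {
  if ! bao token lookup >/dev/null 2>&1; then
    echo "skip"
  else
    echo "proceed"
  fi
}

FAKE_TOKEN_STATE=valid
first_run=$(gate_decision)

FAKE_TOKEN_STATE=revoked
rerun=$(gate_decision)

echo "first run: $first_run, re-run: $rerun"
```

Because the test looks only at the exit status, the gate cannot tell a
revoked token from any other lookup failure; it relies on the preceding
wait loop having already confirmed that OpenBao is initialized and unsealed.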

Chart bump: bp-openbao 1.2.15 → 1.2.16.

Caught live on prov #80 (omantel.biz, 2026-05-14) when bp-openbao
1.2.14 → 1.2.15 was rolled by Flux and immediately failed + rolled back
in a loop, blocking bp-newapi's dependsOn and stalling the bootstrap-kit
Kustomization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3mrah 2026-05-14 17:12:55 +02:00
parent 3d929e69d7
commit 0af69e8728
3 changed files with 20 additions and 2 deletions


@@ -54,7 +54,7 @@ spec:
   chart:
     spec:
       chart: bp-openbao
-      version: 1.2.15
+      version: 1.2.16
       sourceRef:
         kind: HelmRepository
         name: bp-openbao


@@ -1,6 +1,6 @@
 apiVersion: v2
 name: bp-openbao
-version: 1.2.15
+version: 1.2.16
 description: |
   Catalyst-curated Blueprint umbrella chart for OpenBao. Depends on the
   upstream `openbao` chart as a Helm subchart so `helm dependency build`


@@ -137,6 +137,24 @@ spec:
   sleep 5
 done
 
+# ─── Token-validity gate (post-upgrade no-op) ─────────────
+# On post-install (first run) BAO_TOKEN is the freshly-minted
+# root token from openbao-root-token Secret and is valid.
+# On post-upgrade re-runs the same root token has ALREADY
+# been revoked by the previous run (see "revoke + cleanup"
+# section below) so every privileged call returns 403. There
+# is nothing meaningful for this Job to do on a re-run — the
+# auth method, the role, and the kv backend are already
+# configured. Detect the revoked-token case and exit 0 so a
+# routine chart bump doesn't fail the HR's post-upgrade hook
+# and rollback the release. Caught live on prov #80 when
+# bp-openbao 1.2.14 → 1.2.15 (an HTTPRoute-only bump) replayed
+# the hook and 403'd on `bao auth enable`.
+if ! bao token lookup >/dev/null 2>&1; then
+  echo "[openbao-auth-bootstrap] BAO_TOKEN no longer valid (likely revoked by a prior successful run); nothing to do — exiting 0"
+  exit 0
+fi
+
 # ─── Idempotency: skip if auth method + role already exist ─
 # `bao auth list` returns 200 and the JSON includes the mount
 # path key (e.g. "kubernetes/") when the method is enabled.