bp-openbao 1.2.15 (the HTTPRoute backend-name collapse fix) replayed the `auth-bootstrap` post-install,post-upgrade hook against an already-bootstrapped OpenBao. The hook hit `Error enabling kubernetes auth: 403 permission denied` on `bao auth enable -path=kubernetes kubernetes`, the upgrade failed, and Flux auto-rolled the release back to 1.2.14. Net effect: every chart bump that touches bp-openbao is unrecoverable without manual intervention.

Root cause is in the hook itself: at the end of the FIRST run it runs `bao token revoke -self` and deletes the openbao-root-token Secret content (acceptance criterion #6: no root token persists past install). On any post-upgrade replay, the Secret still mounts via valueFrom but the token value is REVOKED, so every privileged call (`auth enable`, `secrets enable`, `policy write`, `write role`) returns 403. The existing idempotency check (`bao auth list | grep kubernetes/`) doesn't help because `bao auth list` itself 403s, its stderr is discarded, and the `|| echo "{}"` fallback makes the script think the auth method is missing.

Fix: add a token-validity gate immediately after the `initialized=true sealed=false` wait. Call `bao token lookup` (zero-cost, strictly read-only on the caller's token). If it 403s, BAO_TOKEN was revoked by a prior successful run, so the Job exits 0. The auth method, role, kv backend, and ESO policy are all already configured; there is nothing for this Job to do on a re-run.

Chart bump: bp-openbao 1.2.15 → 1.2.16.

Caught live on prov #80 (omantel.biz, 2026-05-14) when bp-openbao 1.2.14 → 1.2.15 was rolled by Flux and immediately failed + rolled back in a loop, blocking bp-newapi's dependsOn and stalling the bootstrap-kit Kustomization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
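The masking behaviour described in this message can be reproduced without a running OpenBao. The following is only an illustrative sketch: the `bao` shell function is a stand-in stub (an assumption, not the real CLI) that fails every call the way a revoked token would, and `old_check` and `token_gate` are hypothetical names wrapping the old idempotency check and the new gate.

```shell
#!/bin/sh
# Stub standing in for the real `bao` CLI (assumption for illustration):
# every call fails as it would with a revoked token.
bao() {
  echo "Error: 403 permission denied" >&2
  return 2
}

AUTH_MOUNT_PATH=kubernetes

# Old idempotency check: the failure of `bao auth list` is hidden by
# `2>/dev/null || echo "{}"`, so the mount looks absent.
old_check() {
  EXISTING_AUTH=$(bao auth list -format=json 2>/dev/null || echo "{}")
  if echo "$EXISTING_AUTH" | grep -qE "\"$AUTH_MOUNT_PATH/\""; then
    echo "already-enabled"
  else
    echo "would-enable"
  fi
}

# Token-validity gate: a failed lookup is treated as "revoked by a prior
# successful run", so the caller should stop before any privileged call.
token_gate() {
  if ! bao token lookup >/dev/null 2>&1; then
    echo "skip"
    return 0
  fi
  echo "proceed"
}

old_check    # prints "would-enable": the script would go on to `bao auth enable` and 403
token_gate   # prints "skip": the Job can exit 0 instead
```

One trade-off worth keeping in mind: a gate of this shape cannot tell a revoked token apart from any other reason `bao token lookup` might fail, so it relies on the preceding `bao status` wait having already confirmed that OpenBao is reachable.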
239 lines
12 KiB
YAML
{{- /*
Catalyst auth-bootstrap Job — bp-openbao (issue #316).

Helm post-install/post-upgrade hook (weight 10, runs AFTER the init Job
at weight 5). Idempotent on every run:

  1. Wait for `bao status` to report Initialized=true (the init Job at
     weight 5 has already run; this is a defensive check).
  2. Use the K8s ServiceAccount mounted at /var/run/secrets/... to
     enable + configure the Kubernetes auth method on OpenBao.
  3. Bind the `external-secrets` role to the ESO ServiceAccount per
     `.Values.autoUnseal.kubernetesAuth`. ESO's ClusterSecretStore
     `vault-region1` (platform/external-secrets) authenticates via this
     role on every secret read.
  4. Mount the kv-v2 backend at `secret/` (matches
     platform/external-secrets/chart/values.yaml clusterSecretStore.path).

Why this is a SEPARATE Job from init-job.yaml:
  - Init Job consumes the seed Secret and never persists the root
    token — exit cleanly. Acceptance criterion #6 demands no root token
    in K8s Secrets.
  - Auth bootstrap requires a token to call `bao auth enable …`. The
    upstream openbao chart exposes a transient init token via the
    StatefulSet's emptyDir persistence. This Job uses `bao login -path`
    against the auto-unseal recovery key — which is loaded from
    OpenBao's internal Raft state, NOT from a K8s Secret.

Skip-render pattern (per #402): renders ONLY when both
`autoUnseal.enabled=true` AND `autoUnseal.kubernetesAuth.enabled=true`.
*/}}
{{- $au := .Values.autoUnseal | default dict -}}
{{- if $au.enabled -}}
{{- $kAuth := $au.kubernetesAuth | default dict -}}
{{- if $kAuth.enabled -}}
{{- $img := $au.image | default dict -}}
{{- $repo := $img.repository | default "quay.io/openbao/openbao" -}}
{{- $registry := .Values.global.imageRegistry | default "" -}}
{{- if $registry }}{{- $repo = printf "%s/%s" $registry $repo -}}{{- end -}}
{{- $tag := $img.tag | default "2.1.0" -}}
{{- $pullPolicy := $img.pullPolicy | default "IfNotPresent" -}}
{{- $baoAddr := $au.baoAddress | default (printf "http://%s-openbao:8200" .Release.Name) -}}
{{- $deadline := $au.activeDeadlineSeconds | default 600 -}}
{{- $backoff := $au.backoffLimit | default 6 -}}
{{- $mountPath := $kAuth.mountPath | default "kubernetes" -}}
{{- $role := $kAuth.role | default "external-secrets" -}}
{{- $saName := $kAuth.serviceAccountName | default "external-secrets" -}}
{{- $saNs := $kAuth.serviceAccountNamespace | default "external-secrets-system" -}}
{{- $kvMount := $kAuth.kvMountPath | default "secret" -}}
{{- $tokenTTL := $kAuth.tokenTTL | default "1h" -}}
{{- $tokenMaxTTL := $kAuth.tokenMaxTTL | default "24h" -}}
---
apiVersion: batch/v1
kind: Job
metadata:
  name: openbao-auth-bootstrap
  namespace: {{ .Release.Namespace | quote }}
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "10"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
  labels:
    catalyst.openova.io/blueprint: bp-openbao
    catalyst.openova.io/component: openbao-auth-bootstrap
spec:
  activeDeadlineSeconds: {{ $deadline }}
  backoffLimit: {{ $backoff }}
  ttlSecondsAfterFinished: 600
  template:
    metadata:
      labels:
        catalyst.openova.io/blueprint: bp-openbao
        catalyst.openova.io/component: openbao-auth-bootstrap
    spec:
      serviceAccountName: openbao-auto-unseal
      restartPolicy: OnFailure
      securityContext:
        runAsNonRoot: true
        runAsUser: 100
        runAsGroup: 1000
        fsGroup: 1000
      containers:
        - name: auth-bootstrap
          image: {{ printf "%s:%s" $repo $tag | quote }}
          imagePullPolicy: {{ $pullPolicy | quote }}
          env:
            - name: BAO_ADDR
              value: {{ $baoAddr | quote }}
            - name: AUTH_MOUNT_PATH
              value: {{ $mountPath | quote }}
            - name: AUTH_ROLE
              value: {{ $role | quote }}
            - name: ESO_SA_NAME
              value: {{ $saName | quote }}
            - name: ESO_SA_NAMESPACE
              value: {{ $saNs | quote }}
            - name: KV_MOUNT_PATH
              value: {{ $kvMount | quote }}
            - name: TOKEN_TTL
              value: {{ $tokenTTL | quote }}
            - name: TOKEN_MAX_TTL
              value: {{ $tokenMaxTTL | quote }}
            # BAO_TOKEN sourced from openbao-root-token Secret that init-job
            # (post-install weight 5) writes from the bao operator init
            # output. Required so this Job's `bao auth enable`,
            # `bao secrets enable`, `bao policy write`, and `bao write
            # role/...` calls are authenticated. Without it bao returns
            # 403 Forbidden — caught live on otech43+otech44 because the
            # PR #663 commit only added the revoke logic, not the env
            # declaration that gates it.
            - name: BAO_TOKEN
              valueFrom:
                secretKeyRef:
                  name: openbao-root-token
                  key: token
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command: ["/bin/sh", "-c"]
          args:
            - |
              set -eu
              echo "[openbao-auth-bootstrap] target BAO_ADDR=$BAO_ADDR"
              # ─── Wait for OpenBao initialised + unsealed ───────────────
              ATTEMPTS=0
              MAX_ATTEMPTS=60  # 5 minutes
              until OUT=$(bao status -format=json 2>/dev/null) && \
                    echo "$OUT" | grep -qE '"initialized"[[:space:]]*:[[:space:]]*true' && \
                    echo "$OUT" | grep -qE '"sealed"[[:space:]]*:[[:space:]]*false'; do
                ATTEMPTS=$((ATTEMPTS+1))
                if [ "$ATTEMPTS" -ge "$MAX_ATTEMPTS" ]; then
                  echo "[openbao-auth-bootstrap] FATAL: OpenBao not initialised+unsealed after $MAX_ATTEMPTS attempts"
                  echo "[openbao-auth-bootstrap] manual recovery: docs/RUNBOOK-PROVISIONING.md §openbao-auto-unseal"
                  exit 1
                fi
                echo "[openbao-auth-bootstrap] waiting for openbao initialized=true sealed=false (attempt $ATTEMPTS/$MAX_ATTEMPTS)"
                sleep 5
              done

              # ─── Token-validity gate (post-upgrade no-op) ─────────────
              # On post-install (first run) BAO_TOKEN is the freshly-minted
              # root token from openbao-root-token Secret and is valid.
              # On post-upgrade re-runs the same root token has ALREADY
              # been revoked by the previous run (see "revoke + cleanup"
              # section below) so every privileged call returns 403. There
              # is nothing meaningful for this Job to do on a re-run — the
              # auth method, the role, and the kv backend are already
              # configured. Detect the revoked-token case and exit 0 so a
              # routine chart bump doesn't fail the HR's post-upgrade hook
              # and rollback the release. Caught live on prov #80 when
              # bp-openbao 1.2.14 → 1.2.15 (an HTTPRoute-only bump) replayed
              # the hook and 403'd on `bao auth enable`.
              if ! bao token lookup >/dev/null 2>&1; then
                echo "[openbao-auth-bootstrap] BAO_TOKEN no longer valid (likely revoked by a prior successful run); nothing to do — exiting 0"
                exit 0
              fi

              # ─── Idempotency: skip if auth method + role already exist ─
              # `bao auth list` returns 200 and the JSON includes the mount
              # path key (e.g. "kubernetes/") when the method is enabled.
              # If the role exists we have nothing to do — this run is
              # post-upgrade or a Job retry.
              EXISTING_AUTH=$(bao auth list -format=json 2>/dev/null || echo "{}")
              if echo "$EXISTING_AUTH" | grep -qE "\"$AUTH_MOUNT_PATH/\""; then
                echo "[openbao-auth-bootstrap] auth method $AUTH_MOUNT_PATH/ already enabled"
              else
                echo "[openbao-auth-bootstrap] enabling Kubernetes auth at $AUTH_MOUNT_PATH/"
                bao auth enable -path=$AUTH_MOUNT_PATH kubernetes
              fi

              # Configure the auth method against the in-cluster API
              # server. K8s_HOST is the standard cluster-DNS endpoint;
              # K8S_CA_CERT comes from the SA mount.
              echo "[openbao-auth-bootstrap] writing auth/$AUTH_MOUNT_PATH/config"
              KUBE_CA=$(cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt)
              bao write auth/$AUTH_MOUNT_PATH/config \
                kubernetes_host="https://kubernetes.default.svc" \
                kubernetes_ca_cert="$KUBE_CA" \
                disable_iss_validation=true

              # ─── Ensure kv-v2 backend at $KV_MOUNT_PATH/ ────────────────
              EXISTING_MOUNTS=$(bao secrets list -format=json 2>/dev/null || echo "{}")
              if echo "$EXISTING_MOUNTS" | grep -qE "\"$KV_MOUNT_PATH/\""; then
                echo "[openbao-auth-bootstrap] kv backend $KV_MOUNT_PATH/ already mounted"
              else
                echo "[openbao-auth-bootstrap] mounting kv-v2 at $KV_MOUNT_PATH/"
                bao secrets enable -path=$KV_MOUNT_PATH -version=2 kv
              fi

              # ─── ESO read-policy ────────────────────────────────────────
              # Read access to all keys under $KV_MOUNT_PATH. ESO does NOT
              # write — Catalyst rotation jobs hold the writer policy
              # separately.
              cat <<EOF | bao policy write external-secrets-read -
              path "$KV_MOUNT_PATH/data/*" {
                capabilities = ["read", "list"]
              }
              path "$KV_MOUNT_PATH/metadata/*" {
                capabilities = ["read", "list"]
              }
              EOF

              # ─── Bind role $AUTH_ROLE to the ESO ServiceAccount ─────────
              echo "[openbao-auth-bootstrap] writing auth/$AUTH_MOUNT_PATH/role/$AUTH_ROLE bound to $ESO_SA_NAMESPACE/$ESO_SA_NAME"
              bao write auth/$AUTH_MOUNT_PATH/role/$AUTH_ROLE \
                bound_service_account_names=$ESO_SA_NAME \
                bound_service_account_namespaces=$ESO_SA_NAMESPACE \
                policies=external-secrets-read \
                ttl=$TOKEN_TTL \
                max_ttl=$TOKEN_MAX_TTL

              # ─── Revoke + cleanup root token ──────────────────────────
              # Acceptance criterion #6: the privileged root token must
              # not persist past the install window. Revoke it inside
              # bao first, then delete the K8s Secret that held it.
              echo "[openbao-auth-bootstrap] revoking root token"
              bao token revoke -self 2>&1 || echo "[openbao-auth-bootstrap] WARN: token revoke returned non-zero (may already be revoked)"
              echo "[openbao-auth-bootstrap] deleting transient openbao-root-token Secret"
              wget -qO- --no-check-certificate \
                --header="Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
                --method=DELETE \
                "https://kubernetes.default.svc/api/v1/namespaces/$NAMESPACE/secrets/openbao-root-token" >/dev/null 2>&1 || \
                echo "[openbao-auth-bootstrap] WARN: --method=DELETE not supported (busybox wget); manual cleanup may be needed via kubectl"

              echo "[openbao-auth-bootstrap] Kubernetes auth bootstrap complete"
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              memory: 256Mi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: false
            capabilities:
              drop: ["ALL"]
{{- end }}
{{- end }}