Every bootstrap-kit HelmRepository CR carries `secretRef: name: ghcr-pull`
because bp-* OCI artifacts at ghcr.io/openova-io/ are private. Cloud-init
never created the Secret, so every fresh Sovereign's source-controller
logs `secrets "ghcr-pull" not found` and Phase 1 stalls at bp-cilium.
The operator workaround (kubectl apply by hand) is not durable across
reprovisioning. Verified live on omantel.omani.works pre-fix.
Changes:
- provisioner.Request gains GHCRPullToken (json:"-") so it is never
serialized into persisted deployment records. provisioner.New() reads
CATALYST_GHCR_PULL_TOKEN at startup; Provision() stamps it onto the
Request before tofu.auto.tfvars.json. Validate() rejects empty for
domain_mode=pool with a pointer to docs/SECRET-ROTATION.md.
- handler.CreateDeployment also stamps the env var onto the Request so
the synchronous validation path returns 400 early on misconfiguration.
- infra/hetzner: variables.tf adds ghcr_pull_token (sensitive=true,
default=""). main.tf computes ghcr_pull_username + ghcr_pull_auth_b64
locals and passes both to templatefile().
cloudinit-control-plane.tftpl emits a kubernetes.io/dockerconfigjson
Secret manifest into /var/lib/catalyst/ghcr-pull-secret.yaml; runcmd
applies it AFTER Flux core install but BEFORE flux-bootstrap.yaml so
the GitRepository + Kustomization land into a cluster that already
has working GHCR creds.
- products/catalyst/chart/templates/api-deployment.yaml mounts
CATALYST_GHCR_PULL_TOKEN from the catalyst-ghcr-pull-token Secret in
the catalyst namespace (key: token, optional: true so the Pod still
starts on misconfigured installs and Validate() owns the gate).
- docs/SECRET-ROTATION.md: yearly-rotation runbook for the GHCR token,
Hetzner per-Sovereign tokens, and the Dynadot pool-domain creds.
Includes the kubectl create secret one-liner with <GHCR_PULL_TOKEN>
placeholder; the token never lives in git.
- Tests: provisioner unit tests cover New() reading the env var,
tolerance of missing env, pool-mode validation rejection with
operator-facing error, BYO acceptance, and the json:"-" serialization
invariant. tests/e2e/hetzner-provisioning gains a
TestCloudInit_RendersGHCRPullSecret render-only integration test that
asserts the rendered cloud-init contains the Secret, applies it
before flux-bootstrap, and that the dockerconfigjson round-trips the
sample token through templatefile() correctly. Existing
pool-mode handler tests now t.Setenv the placeholder token; the
on-disk redaction test asserts the placeholder never reaches disk.
Gates:
- go vet ./... and go test -race -count=1 ./... in
products/catalyst/bootstrap/api: PASS.
- helm lint products/catalyst/chart: PASS (warnings pre-existing).
- tofu fmt + tofu validate: deferred to CI (no tofu binary on the
development host).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>