Founder corrective: prior diagram missed:
- 9 chart bugs surfaced + fixed today (#549, #553, #561, #567-#571, #568)
- 3 still in flight (#562 cilium-operator gateway-controller race,
#563 NS delegation + LB:53 + DNS-01 wildcard, #565 harbor CNPG)
- 15 chart bugs from prior session days (#474, #488, #489, #491, #492,
#494, #503, #506, #508, #510, #519, #536, #538, #539, #340)
Adds a Phase 0d · Phase-8a chart bug bash covering all of them.
Edges: every fix gates the bp-* HR it makes possible on a fresh
Sovereign integration test. Edge from #563 (handover-URL DNS-01
wildcard chain) → #454 makes the actual gating relationship explicit:
without #563 there is no working `console.<sovereign>.omani.works`,
which means no Phase-8a gate met.
The diagram should now match what the founder sees actually failing
on otech22, not the chart-released optimism of an earlier draft.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Per founder corrective: existing diagram missed the real blockers
surfaced during otech10..otech22 burns. The image-pull-through gap
(#557) and the cross-namespace secret gap (#543, #544) gate every
workload pull from a public registry — without them, Sovereign hits
DockerHub's anonymous rate limit on first provision and 30+ HRs land
in ImagePullBackOff/CreateContainerConfigError.
Adds:
- Phase 0b · Image pull-through (#557 + #557B Sovereign-Harbor swap +
#557C charts global.imageRegistry templating). Edges to NATS / Gitea
/ Harbor / Grafana / Loki / Mimir / PowerDNS / Crossplane /
cert-manager-powerdns-webhook / Trivy / Kyverno / SPIRE / OpenBao
- Phase 0c · Cross-namespace secrets (#543 ghcr-pull Reflector + #544
powerdns-api-credentials reflect). Edges to bp-catalyst-platform and
bp-cert-manager-powerdns-webhook
- Phase 1 additions: #542 kubeconfig CP-IP fix and #547 helmwatch
38-HR threshold both gate Phase 8a integration test
- Phase 0b → Phase 8b edge: post-handover Sovereign-Harbor swap is
what makes "zero contabo dependency" DoD-met possible
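As a sketch of the shape #557C implies, every chart image resolves through a single registry knob; the values names below are an assumption for illustration, not the shipped templating:

```yaml
# Hypothetical per-Sovereign overlay: route chart images through the
# in-cluster Harbor pull-through cache instead of DockerHub.
global:
  imageRegistry: harbor.sovereign.example/proxy-cache

# A chart template would then resolve images roughly as:
#   image: "{{ .Values.global.imageRegistry | default \"docker.io\" }}/grafana/grafana:10.4.0"
```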
WBS now reflects the cascade observed live, not the pre-Phase-8a model.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Per founder corrective: WBS hadn't been updated in 16h. The active
Phase-8a iteration is what's actually closing the integration-tested
gap, but the WBS still read as if Phase 8a hadn't started.
New §9b captures:
- 18 fixes landed in last 36h (#317, #340, #474, #487, #488, #489,
#491, #492, #494, #503, #506, #508, #510, #519, #531/#532/#534/#535/
#537, #536, #538, #539/#540, #542, #544, #547, #549, #553)
- Symptom → root cause → fix → PR per row, all linked to deployed SHAs
- Background agents in flight (#543 ghcr-pull Reflector, #548 dynadot
ClusterIssuer)
- Risk Register status — R3 / R4 exercised + resolved, R2 / R5 / R7 /
R8 still open
Updated as bugs land. The handover-state truth lives here, not in
Claude memory files.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
First runs of preflight A (bootstrap-kit) and E (Keycloak) failed with the
same error: helm OCI pull from ghcr.io/openova-io/bp-* returning 401
'unauthorized: authentication required'. bp-* are PRIVATE GHCR packages.
#460's agent fixed it for B in c26fbcaf; #461's workflow already had GHCR login.
This commit applies the same helm-registry-login pattern to A and E.
WBS state on main after this commit:
- done (35): all chart-level + #317 + #319 + #453 + 4 preflights
- wip (0)
- blocked (3): 454, 455, 456 (Phase-8 live runs, operator-driven)
The preflights' first runs ALREADY surfaced a real CI bug pattern that
would have hit Phase 8a — exactly what they're for.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
PR #465 merged at 48b73af6 ships
.github/workflows/preflight-cilium-httproute.yaml — Phase-8a Risk R3
preflight (Cilium Gateway HTTPRoute admission for bp-catalyst-platform
on kind). Update §9 status row from "in flight" to "done".
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Surfaces Risk R6 (docs/omantel-handover-wbs.md §9a — Keycloak
realm-import config-CLI bootstrap timing untested). bp-keycloak 1.2.0
ships a sovereign realm + a public kubectl OIDC client via the
upstream bitnami/keycloak chart's keycloakConfigCli post-install Helm
hook (issue #326); this workflow proves it actually wires up on a
clean cluster before we run it on a real Sovereign.
Workflow installs bp-keycloak 1.2.0 on a kind cluster (helm/kind-action
v1, kindest/node:v1.30.6 — same versions as test-bootstrap-kit), waits
for the keycloak StatefulSet to roll out, polls for the
keycloakConfigCli post-install Job by label
(app.kubernetes.io/component=keycloak-config-cli), waits for it to
Complete, port-forwards svc/keycloak and asserts:
1. /realms/sovereign returns 200 (realm exists in Keycloak's DB).
2. The kubectl OIDC client is provisioned with publicClient=true,
redirectUris contains http://localhost:8000 (kubectl-oidc-login
default), and the groups client scope is wired with the
oidc-group-membership-mapper (the per-Sovereign k3s api-server's
--oidc-groups-claim flag depends on this).
Acceptance per ticket: if the post-install Job fails, the workflow
summary captures Job logs + StatefulSet logs + cluster state via
GITHUB_STEP_SUMMARY so a failed run is debuggable without re-running.
Triggers are event-driven only per CLAUDE.md "every workflow MUST be
event-driven, NEVER scheduled" rule — push on the workflow file itself
plus workflow_dispatch for ad-hoc re-runs.
Closes #462.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Twice-corrected discipline rule per founder pushback at 15:55 UTC:
- Original 15:38 'max 1-2 agents' was over-correction
- Real rule: scope-based, not count-based
- 'Min 3, max 5 in flight' from feedback_agent_orchestration_discipline.md
still holds; what was wrong was dispatching out-of-scope work
- 4 agents in flight now: #459/#460/#461/#462 — all Phase-8a preflight
de-risking against §9a Risk register
State on main after this commit:
- done (31): all minimal Sovereign blueprints + foundation + CI + Phase 6 +
Phase 7 (#317 + #319 + #453 contract reconciliation)
- wip (4): 459, 460, 461, 462 (Phase-8a preflights, kind-cluster de-risking)
- blocked (3): 454, 455, 456 (Phase 8 operator-driven live runs)
DAG additions:
- New PRE subgraph 'Phase-8a preflight · de-risk before live run'
- Edges T459/T460/T461/T462 → T454 (preflights gate Phase 8a)
- §9 rows for #459-#462
- §13 rewritten with twice-corrected scope-not-count discipline
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
#317's FinaliseHandover deleted the deployment record entirely, which
meant #319's `AdoptedAt` field was dormant — the post-handover redirect
at console.openova.io/sovereign/<id> 404'd instead of 301-ing to
console.<sovereign-fqdn>.
Fix: replace `store.Delete(id)` at the end of FinaliseHandover with a
slim-record save via the new `Deployment.SlimForHandover(adoptedAt)`
seam. The slim shape retains:
- id, sovereignFQDN, orgName, orgEmail, startedAt (audit-minimum)
- AdoptedAt = now() (redirect contract from #319 PR #451)
- Status: "adopted"
- closed eventsCh + done channels
Operational fields are zeroed: Result/tofuState, kubeconfig hash, PDM
reservation token, error, credentials. Consistent with §0
minimum-retention principle.
Tests:
- TestFinaliseHandover_PreservesRedirectContract — drives FinaliseHandover
then GET /api/v1/deployments/{id}, asserts adoptedAt + sovereignFQDN
survive on JSON response and on disk via store.Load round-trip
- TestSlimForHandover (table-driven) — full-record + minimal-record
transforms; asserts audit fields kept, redirect field set,
operational fields zeroed, credentials zeroed, channels closed
- TestSlimForHandover_StoreRecordRoundTrip — JSON encode/decode
cross-Pod-restart guard
- TestFinaliseHandover_FullFlow extended with slim-shape assertions
Anti-duplication: SlimForHandover lives next to other Deployment methods
in deployments.go (canonical seam). FinaliseHandover modifies the same
file referenced in the issue (handover.go); no parallel binary or
script.
WBS row #453 → done; class line T453 wip → done.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per founder corrective 2026-05-01. Prior WBS over-promised by:
1. Treating chart-released and chart-verified as 'done', indistinguishable
from DoD-met
2. Bundling epic #320 IAM access plane (#322-#326) as if part of omantel
handover scope
3. Hiding the fact that ZERO of the 23 minimal blueprints have ever been
reconciled together on a fresh Sovereign
Rewrite changes:
- §0 (NEW): Truth-of-state — explicit ladder chart-released → chart-verified
→ integration-tested → DoD-met. Today every 'done' ticket is at chart
level; zero are integration-tested; zero are DoD-met.
- §1: explicit out-of-scope carve-out for epic #320
- §2: split chart-status from reconcile-chain-status; latter reads ❓
unknown for all 23 (truthful)
- §4 DAG:
* adds Phase 7 cleanup #453 (#317↔#319 contract reconciliation)
* adds Phase 8a/8b/8c live-execution gates (#454/#455/#456)
* adds 🎯 DoD-met gate node tied to #456
* promotes T425 into Phase 4 (it was wrongly in SCAF subgraph as if it
were sustainment work — it's the foundation for #383/#384)
* keeps SCAF subgraph for genuine CI guardrails (#428/#438/#429/#430)
- §9: adds rows for #453/#454/#455/#456 explicitly bold + marks #324/#325
as ⏸ parked per scope rewrite
- §9a (NEW): Risk register — 8 known gaps that will surface in Phase 8a
- §12 (NEW): What we are NOT doing now — scope discipline
- §13 (NEW): Agent-orchestration reset — max 1-2 agents on Phase-8
follow-ups; NO capacity-fill on post-omantel scope until #456 closes
The 5 sequential steps to DoD-met are listed in §12. There are no
parallel-agent shortcuts past Phase 7. Phase 8 is operator-driven.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Customer-side decommission UI + PDM release endpoints + Catalyst-Zero
redirect to console.<sovereign-fqdn> once handover is finalised.
Anti-duplication map (canonical seams reused, NOT duplicated):
- catalyst-api wipe.go: existing wipe endpoint already drives PDM
release + Hetzner purge + tofu destroy + local cleanup. The new
DecommissionPage POSTs to the same endpoint with an optional
backup-destination payload.
- PDM Allocator.Release: child zone delete + parent-zone NS revert
+ allocation row delete already idempotent. The new sovereign-side
POST /api/v1/release is a thin FQDN-shaped wrapper that splits at
the first dot and delegates to Allocator.Release.
- The orphan force-release path adds gates (X-Force-Release-Confirm
header, 30-day grace, DNS-NXDOMAIN check) on top of the same seam.
Scope contract with #317 (handover finalisation): NOT touching
internal/handler/handover.go. AdoptedAt is a new contract field on
Deployment + store.Record that the redirect helper consumes; future
#317 enhancement will populate it before deletion.
Files:
core/pool-domain-manager/internal/handler/release.go (NEW)
core/pool-domain-manager/internal/handler/release_test.go (NEW)
core/pool-domain-manager/internal/handler/handler.go (route wiring)
products/catalyst/bootstrap/api/internal/handler/deployments.go (AdoptedAt field + State()/toRecord/fromRecord)
products/catalyst/bootstrap/api/internal/handler/deployments_adopted_test.go (NEW)
products/catalyst/bootstrap/api/internal/store/store.go (AdoptedAt persistence)
products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.tsx (NEW)
products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.test.tsx (NEW)
products/catalyst/bootstrap/ui/src/pages/sovereign/Dashboard.tsx (Decommission link)
products/catalyst/bootstrap/ui/src/app/router.tsx (redirect + decom route)
docs/omantel-handover-wbs.md (T319 → done)
Tests: 13 new Go test cases + 5 new vitest cases, all green.
catalyst-api + PDM full suites pass. Live execution against omantel
deferred to Phase 8 per ticket scope (no Dynadot/Hetzner exec here).
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Class line had stale T326 in wip — both #322 and #326 merged on main
(b6810c19 and 20b89607). State on main after this tick:
- done (29)
- wip (2): 319 (decommission, Phase 7), 323 (user-access editor)
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Wires the per-Sovereign K8s api-server's --oidc-* validator to the
per-Sovereign Keycloak realm so customer admins can authenticate
kubectl directly against their Sovereign — no static admin-kubeconfig
handoff, no rotated bearer-token exchange.
infra (cloud-init):
- Add 6 --kube-apiserver-arg=oidc-* flags to the k3s install line in
infra/hetzner/cloudinit-control-plane.tftpl. Issuer URL composed
from sovereign_fqdn (https://auth.\${sovereign_fqdn}/realms/sovereign)
per INVIOLABLE-PRINCIPLES #4 — never hardcoded. Username/groups
prefixes scope OIDC subjects under "oidc:" so RoleBindings reference
e.g. subjects[0].name=oidc:alice@org, distinct from local SAs/x509.
Canonical seam (anti-duplication rule, ADR-0001 §11.3):
- The bp-keycloak chart already bundles bitnami/keycloak's
keycloakConfigCli post-install Helm hook Job, which imports realms
declared under values.keycloak.keycloakConfigCli.configuration. We
enable the existing seam — no bespoke kubectl-exec realm-creation
script, no custom Admin-API call from catalyst-api.
bp-keycloak chart (1.1.2 → 1.2.0):
- Enable keycloakConfigCli + ship inline sovereign-realm.json with:
realm "sovereign" (invariant per Sovereign — Keycloak resolves the
issuer claim from the request hostname, so no per-FQDN realm rename),
default groups sovereign-admins/-ops/-viewers, an
oidc-group-membership-mapper emitting the "groups" claim, public OIDC
client "kubectl" with localhost:8000 + OOB redirect URIs
(kubectl-oidc-login defaults), publicClient=true (kubectl runs locally
and cannot safely hold a secret), PKCE S256 enforced.
- Bump version 1.1.2 → 1.2.0 (semver MINOR, additive shape).
- Bump bootstrap-kit slot 09 in _template/, omantel.omani.works/,
otech.omani.works/ to version: 1.2.0.
- New chart test tests/oidc-kubectl-client.sh (4 cases) — all green.
- Existing tests/observability-toggle.sh — still green.
Documentation:
- Add §11 "kubectl OIDC for customer admins" runbook to
docs/omantel-handover-wbs.md with one-time workstation setup
(kubectl krew install oidc-login + config set-credentials), the
sovereign-admin RBAC binding (oidc:sovereign-admins → cluster-admin),
and a 401-debugging table mapping common symptoms to root causes.
- Carve #326 out of §7 "Out of scope" — it is shipped.
- Add §9 status row.
Validation:
- grep -c 'oidc-issuer-url' infra/hetzner/cloudinit-control-plane.tftpl
→ 2 (comment + the actual flag in the curl line)
- grep -c 'oidc-username-claim' → 2
- helm template platform/keycloak/chart → renders post-install
keycloak-config-cli Job + ConfigMap with kubectl client (3 hits
on grep "kubectl"; 1 hit on "clientId": "kubectl")
- bash scripts/check-vendor-coupling.sh → exit 0 (HARD-FAIL mode)
- 4/4 oidc-kubectl-client gates green; 3/3 observability-toggle
gates green
Out of scope (deferred to follow-up tickets):
- Per-Sovereign user provisioning UI (#322, #323)
- Refresh-token revocation on RoleBinding deletion (#324)
- provider-kubernetes Crossplane ProviderConfig per Sovereign (#321)
- omantel migration / Phase 8 live execution
NO catalyst-api or UI source files touched (those are #319/#322/#323
agents' territories per agent brief).
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Ship the server-side machinery for issue #317 — zero-Sovereign-footprint
retention. When bp-catalyst-platform.Ready=True on the new Sovereign,
the wizard / post-install hook calls /api/v1/handover/finalise/{id}
and Catalyst-Zero runs the 4-step finalisation:
1. Emit final SSE event (`event: handover, data: {sovereignFqdn,
consoleURL, finalisedAt}`) through the existing emitWatchEvent
seam — the wizard's reducer picks it up without code change.
2. Cancel the per-deployment helmwatch informer via a new
helmwatch.Watcher.Cancel() method that wraps the existing
watchCtx cancel func — same teardown path as the timeout branch,
no new informer or goroutine.
3. Walk the per-deployment OpenTofu workdir, base64-archive every
regular file, POST to the new Sovereign's
/api/v1/handover/tofu-archive endpoint. The new Sovereign's
catalyst-api seals the blob into its OpenBao at
`secret/catalyst/tofu-phase0-archive` (KV-v2). On 200 OK,
Catalyst-Zero deletes /var/lib/catalyst/tofu/<sovereign>/.
4. Delete the kubeconfig file + the deployment record JSON.
Receiver endpoint (POST /api/v1/handover/tofu-archive) lives on the
same catalyst-api binary; production Sovereigns set
CATALYST_OPENBAO_ADDR + CATALYST_OPENBAO_TOKEN and the receiver is
active. Catalyst-Zero leaves both unset so a misrouted POST returns
503 ("not handover target") instead of misbehaving.
Hetzner-token rotation (issue body step 4) is deferred to Crossplane
Provider rotation per #425 — catalyst-api never makes bespoke cloud-
API calls (docs/INVIOLABLE-PRINCIPLES.md #3). The operator-supplied
Phase-0 token is already GC'd from memory after writeTfvars.
Live execution against a real omantel cluster is deferred to Phase 8
(epic #369, scaffold #429). This PR ships code + tests only.
Anti-duplication audit (canonical seams used):
- internal/handler/handler.go (existing Handler) extended with
3 new fields + 3 setter methods. No new Handler shape.
- internal/handler/deployments.go emitWatchEvent is the SSE emit
seam — handover handler reuses it.
- internal/helmwatch/helmwatch.go Watcher gets Cancel() — extends
existing struct, no parallel watcher.
- internal/openbao/ is the FIRST and ONLY OpenBao client (verified
by grep: no prior internal/vault, internal/secrets/openbao, or
similar package existed).
- internal/provisioner provides WorkDir for tofu workdir cleanup.
- internal/store provides Delete(id) for record removal.
- Receiver endpoint lives on the SAME binary; per-deployment file
walking via filepath.Walk is stdlib, not a duplicated archive
package.
Tests:
- 9 new handler-side cases (handover_test.go) — full flow, dry-run,
receiver-failure-keeps-local-state, 404, no-OpenBao→503, OpenBao
seal, validation errors, archive build, missing-dir empty.
- 4 new openbao package cases (client_test.go) — happy path,
default mount, status error wrap, required-field validation.
- All existing tests still pass: handler, helmwatch, openbao,
provisioner, store, jobs, dynadot, hetzner, k8scache, objectstorage.
WBS row #317 → 🟢 done; DAG class line includes T317.
Out of scope (per ticket guardrails):
- No core/pool-domain-manager changes (#319's territory)
- No products/catalyst/bootstrap/ui changes (decommission UI is #319)
- No SME-namespace touch (ADR-0001 §9.4)
- No live Hetzner / Dynadot / OpenBao calls
- No vendor-name reintroduction; no schedule: cron triggers
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Verify bp-catalyst-platform:1.1.8 (the umbrella over 10 leaf bp-* deps —
cilium / cert-manager / flux / crossplane / sealed-secrets / spire /
nats-jetstream / openbao / keycloak / gitea) installs cleanly. This is
Phase 6 of #369 and the convergence point pulling from Phase 3-5
(gitea+keycloak+crossplane+harbor+grafana) and Phase 2a (TLS via the
powerdns webhook).
Verification (chart-only, contabo, ~25 min wall time):
* `helm dep build products/catalyst/chart/` — clean, all 10 OCI deps
pulled from `oci://ghcr.io/openova-io`.
* `helm template` defaults render 259 docs / 36k+ lines clean — no
HTTPRoute (skip-render without `ingress.hosts.console.host`/`api.host`
per the #387/#402 if-host-emit pattern), legacy contabo Ingress
templates excluded by `.helmignore` on Sovereign installs.
* With per-Sovereign overlay (sovereignFQDN + ingress.hosts.console.host
+ ingress.hosts.api.host) renders 261 docs incl. 2 HTTPRoutes:
- catalyst-ui → hostname console.<sov>, backend port 80
- catalyst-api → hostname api.<sov>, backend port 8080
both attached to `cilium-gateway/kube-system` parentRef sectionName
`https`.
* Server-side dry-run of catalyst-specific resources (api-deployment,
api-service, ui-deployment, ui-service, httproute, api-deployments-pvc,
api-cache-pvc) — all 8 accepted by API server.
* Smoke-install of catalyst-specific manifests in `catalyst-platform-smoke`
ns on contabo:
- catalyst-ui Deployment 1/1 Ready in <30s
- catalyst-api Deployment 1/1 Ready 18s (after stub
`dynadot-api-credentials` + `ghcr-pull-secret` provided)
- kubelet liveness/readiness HTTP 200 on `/healthz`
- in-cluster curl http://catalyst-api.catalyst-platform-smoke.svc:8080/healthz
→ HTTP 200
- both PVCs (catalyst-api-deployments 1Gi + catalyst-api-cache 5Gi)
Bound on local-path StorageClass.
Smoke torn down clean.
Per-Sovereign overlay drift check
---------------------------------
`clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml` ↔
`omantel.omani.works/` ↔ `otech.omani.works/` differ ONLY in literal
${SOVEREIGN_FQDN} substitution. No drift fix needed (in contrast to #381
grafana, which DID need a `gateway.host` retrofit on overlays).
helmwatch
---------
helmwatch is an in-process Go internal package inside catalyst-api
(`products/catalyst/bootstrap/api/internal/helmwatch/`) — NOT a separate
Deployment. Its readiness is exercised by api-deployment readiness via
the catalyst-api `/healthz` probe.
HTTPRoute admission
-------------------
Deferred to a real Sovereign run. contabo runs Traefik for the SME demo
(ADR-0001 §9.4 protected) and has no `cilium-gateway` Gateway, so the
HTTPRoute parentRef cannot be satisfied here. Phase 8 omantel E2E
(#429 scaffold) covers Gateway admission on the live Sovereign.
Sub-chart cluster-scoped CRD installs
-------------------------------------
The umbrella's 10 leaf bp-* deps install cluster-scoped CRDs (bp-cilium
ciliumnetworkpolicies, bp-spire ClusterSPIFFEID, bp-cert-manager
clusterissuers, bp-cnpg postgresql.cnpg.io, etc.) plus DaemonSets (CNI,
spire-agent). On contabo these are owned by the SME demo or unavailable;
installing the full umbrella here would either clobber SME (forbidden)
or fail on missing CRDs. Per Flux `dependsOn` chain, sub-charts install
FIRST on a Sovereign, then bp-catalyst-platform. Each sub-chart's
correctness is independently verified by sibling chart-verify tickets:
- #376 bp-gitea chart-verified
- #377 bp-keycloak chart-verified
- #378 bp-crossplane chart-verified
- #382 bp-spire chart-verified
- #381 bp-grafana chart-verified
- #380 bp-trivy chart-verified
- #379 bp-kyverno chart-verified
- #375 bp-nats-jetstream chart-verified
- #383 bp-harbor chart-released
Vendor-coupling guardrail
-------------------------
`bash scripts/check-vendor-coupling.sh` → exit 0, "no vendor-coupling
violations found across 4 scan path(s)".
Files touched
-------------
docs/omantel-handover-wbs.md only:
- §2 row 23: bp-catalyst-platform marked chart-verified
- §9 row #385: parked → 🟢 chart-verified with full verification
evidence
- DAG class line: T385 added to the `done` class
No chart edits — the existing 1.1.8 chart renders + smoke-installs
clean. No bootstrap-kit edits — overlays already match template modulo
${SOVEREIGN_FQDN}. No new files authored (anti-duplication rule).
Sovereign-impact deferred to Phase 7 handover machinery (#317 / #319)
and Phase 8 omantel E2E (#429 spec).
Closes #385.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
State on main after this commit:
- done (25): all minimal Sovereign blueprints + foundation + #438
- wip (1): 385 (catalyst-platform single-blueprint verify, Phase 6 gate)
#438 merged at 87ba48c4 — vendor-coupling guardrail hard-fail mode now
auto-engaged on this repo.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
State on main after this commit:
- done (24): 316,327,331,338,370,371,373,374,375,376,377,378,379,380,381,382,383,384,387,392,425,428,429,430
- wip (2): 385 (catalyst-platform single-blueprint verify, Phase 6 gate), 438 (CI guardrail path mode-gate fix)
#383 merged at 0511efbd. All 23 minimal Sovereign blueprints now
chart-released or chart-verified. Phase 6 → 7 → 8 path is open.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
State on main after this commit:
- done (23): 316,327,331,338,370,371,373,374,375,376,377,378,379,380,381,382,384,387,392,425,428,429,430
- wip (1): 383 (Harbor chart rework on post-#425 vendor-agnostic shape)
#425 merged at 0172b9a8 — vendor-agnostic Object Storage abstraction +
OpenTofu→Crossplane handover. #383 unblocked + dispatched against the
new shape (objectStorage.s3.* / flux-system/object-storage).
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Adds the post-handover wizard step that delegates the parent zone (e.g.
omani.works) to the new Sovereign's PowerDNS, plus a light catalyst-api
stub for live execution in Phase 8.
Wizard (UI):
- New StepNSDelegation slotted as terminal post-handover step (after
StepSuccess) so the LB IP is in hand before we ask the operator to
delegate.
- Default mode: emit-runbook only. Renders the exact set_dns2 curl
command with add_dns_to_current_setting=yes (record-preserving) for
copy-paste. NEVER embeds the API key — operator exports
$DYNADOT_API_KEY in their shell.
- Auto-apply mode: gated behind a toggle + double-confirm field
matching the parent zone. Defaults OFF. POSTs to a stub
/api/v1/dns/parent-zone/delegate which is 501 today; the wizard
surfaces a "Phase 8" hint instead of a generic error.
- Memory rule honoured: NO live set_dns2 call reachable on a normal
wizard flow without explicit operator double-confirm.
- 17 new vitest cases (helper + render + auto-apply gating + 501
stub-aware error) all green.
Catalyst-API (Go):
- Extends existing internal/dynadot package (canonical seam — no new
package, no PDM source touched).
- New Client.AddNSDelegation(parentZone, sovereignFQDN, lbIP, extraNS)
writes 3 NS + 1 glue A record using add_dns_to_current_setting=yes.
Fail-closed via IsManagedDomain gate (refuses to call the API for an
unmanaged zone).
- New pure BuildNSDelegationRunbook helper that mirrors the JSX-side
buildDynadotRunbookCommand so wizard and API emit the same shape.
- 6 new test cases (happy path / unmanaged-zone refusal / table-driven
validation / custom NS hosts / runbook builder) all green.
Per ticket #374 scope: wizard step + emitted runbook + light stub;
live execution deferred to Phase 8 of the omantel handover WBS. WBS
row updated to wizard-shipped state.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Phase 8 of the omantel handover (#369) needs an automated E2E that proves
DoD: omantel.omani.works runs as a fully self-sufficient Sovereign with
zero contabo dependency post-handover. Today this is a SCAFFOLD — when
Phase 4/6/7 land, dispatching the new workflow against a live omantel is
the entire Phase 8.
Canonical seam (anti-duplication, per memory/feedback_anti_duplication_seam_first.md):
- tests/e2e/playwright/tests/ ← mirror of sovereign-wizard.spec.ts shape
(NOT specs/ as the issue body said — actual repo path is tests/)
- tests/e2e/playwright/playwright.config.ts (BASE_URL handling, retries,
workers=1, reporter=list) — reused as-is
- tests/e2e/playwright/tests/_helpers.ts:reachable() — reused for the
pre-flight skip-when-unreachable pattern
- .github/workflows/playwright-smoke.yaml — workflow shape (checkout v4,
setup-node v4, npm install, playwright install --with-deps chromium,
upload-artifact on failure) — mirrored, NOT duplicated
What ships:
- tests/e2e/playwright/tests/omantel-handover.spec.ts (NEW, 6 tests):
1. sovereign Ready + 23/23 blueprints
2. all bp-* HelmReleases Ready=True
3. catalyst-platform self-hosts (healthz + dashboard "23 / 23 ready")
4. vendor-agnostic Object Storage (post-#425 canonical secret name
flux-system/object-storage — NOT hetzner-object-storage)
5. dig +trace omantel.omani.works ends at omantel NS, not contabo
6. zero contabo dependency (omantel /api/healthz keeps returning 200)
Self-skips when OMANTEL_BASE_URL/OMANTEL_API_BASE/OPERATOR_BEARER unset.
- .github/workflows/omantel-e2e-handover.yaml (NEW):
workflow_dispatch ONLY (no schedule cron — per CLAUDE.md "every workflow
MUST be event-driven, NEVER scheduled"). Inputs let the operator override
base URLs at dispatch time.
- docs/omantel-handover-wbs.md:
new §10 "Phase 8 acceptance criteria (executable DoD)" — 6 bullets 1:1
with the spec test() blocks; §9 status row added for #429
(🟢 scaffold-shipped).
Local verification:
cd tests/e2e/playwright && npm install && \
npx playwright test --list tests/omantel-handover.spec.ts
→ 6 tests listed cleanly
npx playwright test tests/omantel-handover.spec.ts
→ 6 skipped (env vars unset, expected)
Out of scope (per #425 / #428 territory split):
- internal/hetzner/, infra/hetzner/, platform/velero/chart/,
clusters/.../34-velero.yaml — #425's vendor-agnostic sweep
- .github/workflows/check-vendor-coupling.yaml — #428's coupling guard
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
* fix(bp-external-secrets): split ClusterSecretStore into bp-external-secrets-stores chart (resolves CRD ordering, closes #331)
bp-external-secrets@1.0.0 deadlocked on first install on otech.omani.works:
Helm install failed for release external-secrets-system/external-secrets
with chart bp-external-secrets@1.0.0:
failed post-install: unable to build kubernetes object for deleting hook
bp-external-secrets/templates/clustersecretstore-vault-region1.yaml:
resource mapping not found for name: "vault-region1" namespace: ""
no matches for kind "ClusterSecretStore" in version "external-secrets.io/v1beta1"
Root cause: Helm's `helm.sh/hook-delete-policy: before-hook-creation` ran
a kubectl-style lookup of the existing ClusterSecretStore CR before the
upstream `external-secrets` subchart's CRDs finished registration. The
in-line ClusterSecretStore template (templates/clustersecretstore-vault-
region1.yaml) and the upstream subchart's CRDs co-installed in the same
release; admission ordering wasn't deterministic enough to make the
post-install hook safe.
Fix — same pattern as PR #247 (bp-crossplane@1.1.3 ↔ bp-crossplane-claims@1.0.0):
split the chart into controller + stores. Flux dependsOn orders them.
- bp-external-secrets@1.1.0 — controller-only (just upstream subchart
+ NetworkPolicy + ServiceMonitor toggle). CRDs register here.
- bp-external-secrets-stores@1.0.0 (NEW) — the default
ClusterSecretStore CR; depends on bp-external-secrets being Ready.
No Helm hooks needed: by the time this chart's HelmRelease starts,
Flux has already verified bp-external-secrets is Ready=True and
therefore the CRDs are registered.
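The ordering contract reads roughly like this as a Flux HelmRelease fragment (a sketch of the slot-15a shape, not the literal file; apiVersion may differ):

```yaml
# Sketch: bp-external-secrets-stores only starts once both deps are Ready
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-external-secrets-stores
  namespace: external-secrets-system
spec:
  dependsOn:
    - name: bp-external-secrets # Ready=True implies the CRDs are registered
    - name: bp-openbao          # the store's Vault backend exists
  chart:
    spec:
      chart: bp-external-secrets-stores
      version: 1.0.0
```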
Files:
NEW: platform/external-secrets-stores/blueprint.yaml (1.0.0)
NEW: platform/external-secrets-stores/chart/Chart.yaml (1.0.0; no upstream subchart, annotation `catalyst.openova.io/no-upstream: "true"`)
NEW: platform/external-secrets-stores/chart/values.yaml (clusterSecretStore.* knobs moved from controller chart)
MOVED: platform/external-secrets/chart/templates/clustersecretstore-vault-region1.yaml
→ platform/external-secrets-stores/chart/templates/clustersecretstore-vault-region1.yaml
(Helm hook annotations removed — Flux dependsOn now handles ordering)
TOUCHED: platform/external-secrets/chart/Chart.yaml (1.0.0 → 1.1.0; description note appended)
TOUCHED: platform/external-secrets/blueprint.yaml (1.0.0 → 1.1.0)
TOUCHED: platform/external-secrets/chart/values.yaml (clusterSecretStore block removed; pointer comment added)
NEW: clusters/_template/bootstrap-kit/15a-external-secrets-stores.yaml
(Flux HelmRelease, dependsOn: [bp-external-secrets, bp-openbao])
TOUCHED: clusters/_template/bootstrap-kit/15-external-secrets.yaml
(chart version 1.0.0 → 1.1.0)
TOUCHED: clusters/_template/bootstrap-kit/kustomization.yaml
(slot 15a inserted after 15)
Out of scope for this PR (separate tickets):
- blueprint-release.yaml CI fan-out: verify the path-matrix picks up
the new platform/external-secrets-stores/ directory automatically;
if not, add the directory to the matrix in a follow-up.
- Per-Sovereign cluster directory edits (#257 will delete those).
- Phase 0 minimum trim (#310 will renumber slots; this PR uses 15a as
a non-disruptive sub-slot insertion that works with both the current
35-slot kustomization and the eventual 15-slot canonical layout —
when #310 renumbers, 15 + 15a become 08 + 09 in the canonical order).
Refs: #331 (this issue), #247 (pattern reference — bp-crossplane split),
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(scripts): register bp-external-secrets-stores in expected-bootstrap-deps.yaml
The dependency-graph-audit CI step rejected PR #334 because the new
bp-external-secrets-stores HR was on disk at slot 15a but missing from
the expected DAG. This commit adds it with the same dependsOn shape as
clusters/_template/bootstrap-kit/15a-external-secrets-stores.yaml:
[bp-external-secrets, bp-openbao].
Refs: #331, #310 (Phase 0 minimum), PR #334.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bp-external-secrets): retire CR cases from controller test, add stores-toggle (#331)
After splitting the default ClusterSecretStore into bp-external-secrets-stores
@1.0.0, the controller chart's observability-toggle integration test still
expected the CR to render in the controller chart (Cases 4 + 5). Those
assertions now belong on the new chart.
Changes:
- platform/external-secrets/chart/tests/observability-toggle.sh:
Replace Cases 4+5 with a single inverted assertion — the controller
chart MUST render ZERO ClusterSecretStore CRs (top-level kind:); only
the upstream subchart's CRD definition (whose spec.names.kind value is
"ClusterSecretStore" at non-zero indent) is allowed.
- platform/external-secrets-stores/chart/tests/clustersecretstore-toggle.sh:
NEW. Mirrors the retired Cases 4+5 against the stores chart, plus a
Case 3 that asserts clusterSecretStore.server overrides propagate.
Local smoke:
bash platform/external-secrets/chart/tests/observability-toggle.sh → 4/4 PASS
bash platform/external-secrets-stores/chart/tests/clustersecretstore-toggle.sh → 3/3 PASS
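The inverted assertion can be sketched against a stand-in rendered manifest (the grep shape is illustrative, not the actual test script): a CRD whose spec.names.kind is "ClusterSecretStore" sits at non-zero indent and must not trip the column-zero kind match.

```shell
# Stand-in for `helm template` output: only the upstream CRD definition,
# no top-level ClusterSecretStore CR.
rendered='apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
spec:
  names:
    kind: ClusterSecretStore'
# Column-zero match only; the indented spec.names.kind line does not count.
top_level=$(printf '%s\n' "$rendered" | grep -c '^kind: ClusterSecretStore' || true)
echo "top-level ClusterSecretStore CRs: $top_level"   # -> 0
```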
Refs: #331, PR #334.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(scripts): handle alphanumeric sub-slot suffixes in check-bootstrap-deps.sh
PR #334 (issue #331) added slot 15a-external-secrets-stores as a sub-slot
between numeric slots 15 and 16. The bootstrap-deps audit script's
`printf '%02d'` formatter rejected `15a` with:
scripts/check-bootstrap-deps.sh: line 390: printf: 15a: invalid number
Fix: detect non-numeric slot tokens and pass them through verbatim. Numeric
slots still render as zero-padded `01..49` for output alignment.
Local smoke:
$ bash scripts/check-bootstrap-deps.sh
...
[P] slot 15 bp-external-secrets <-- bp-cert-manager bp-openbao
[P] slot 15a bp-external-secrets-stores <-- bp-external-secrets bp-openbao
...
OK: bootstrap-kit dependency graph audit PASSED
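A minimal sketch of the pass-through formatter (function name hypothetical — the actual script inlines this logic):

```shell
# Zero-pad numeric slot tokens; pass alphanumeric sub-slots (e.g. 15a)
# through verbatim instead of feeding them to printf '%02d'.
format_slot() {
  case "$1" in
    ''|*[!0-9]*) printf '%s' "$1" ;;   # non-numeric: verbatim
    *)           printf '%02d' "$1" ;; # numeric: zero-padded for alignment
  esac
}
echo "$(format_slot 9) $(format_slot 15a)"   # -> 09 15a
```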
Refs: #331, PR #334.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(wbs): tick #331 chart-released
bp-external-secrets@1.1.0 (controller-only) + bp-external-secrets-stores@1.0.0
(NEW) shipped in PR #426. Helm-template acceptance + both toggle tests +
dependency-graph-audit all green. Sovereign-impact deferred to Phase 8.
Refs: #331, PR #426.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
* feat(bp-velero): Hetzner Object Storage backend wiring (closes #384)
Velero on a Hetzner Sovereign now writes its backups DIRECTLY to Hetzner
Object Storage per ADR-0001 §13 (S3-aware app architecture rule) +
docs/omantel-handover-wbs.md §3 — NOT SeaweedFS, which is reserved as a
POSIX→S3 buffer for legacy POSIX-only writers and is not in the minimal
Sovereign set.
Mirrors the Hetzner-direct backend pattern Agent #383 is wiring for
Harbor; both consume the canonical flux-system/hetzner-object-storage
Secret shipped by issue #371 (cloud-init writes 5 keys: s3-endpoint /
s3-region / s3-bucket / s3-access-key / s3-secret-key, derived from
the operator-issued Hetzner-Console keys + the per-Sovereign bucket
provisioned by OpenTofu's aminueza/minio resource).
platform/velero/chart/ (umbrella chart, bumped to 1.1.0):
- templates/_helpers.tpl: NEW — bp-velero.fullname / bp-velero.labels
helpers + bp-velero.hetznerCredentialsSecretName (default
`velero-hetzner-credentials`).
- templates/hetzner-credentials-secret.yaml: NEW — synthesises a
velero-namespace Secret with a single `cloud` key in AWS-CLI INI
format from .Values.veleroOverlay.hetzner.s3.{accessKey,secretKey}.
The upstream Velero deployment mounts this at /credentials/cloud
via existingSecret + AWS_SHARED_CREDENTIALS_FILE. Skip-render path
when veleroOverlay.hetzner.enabled is false (default — keeps
contabo render clean) or useExistingSecret is true (operator
supplied Secret out-of-band).
- values.yaml: BSL provider/region/s3Url/bucket fields populated as
placeholders the per-Sovereign HelmRelease overrides via Flux
valuesFrom; backupsEnabled defaults FALSE so default render emits
no half-broken BSL; veleroOverlay.hetzner block surfaces the
operator-overridable fields. Long-form rationale comments inline
on each value per the chart's existing docstring style.
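The `cloud` key the Secret template synthesises can be sketched as follows, assuming the standard AWS shared-credentials profile shape (AK_TEST/SK_TEST are placeholders, not real credentials):

```shell
# Sketch of the AWS-CLI INI payload mounted at /credentials/cloud.
access_key='AK_TEST'   # placeholder value
secret_key='SK_TEST'   # placeholder value
cloud_ini=$(printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
  "$access_key" "$secret_key")
printf '%s\n' "$cloud_ini"
```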
clusters/_template/bootstrap-kit/34-velero.yaml (+ omantel + otech):
- dependsOn: bp-seaweedfs REMOVED — Velero is no longer a SeaweedFS
consumer on Sovereigns (was the old SeaweedFS-tiered architecture
that minimal-omantel retired in favour of cloud-native S3).
- chart version bumped 1.0.0 → 1.1.0.
- valuesFrom block added: 5 Secret-key entries pull each canonical
s3-* key into the matching umbrella value path. Plaintext
credentials never appear in the committed manifest; Flux
dereferences valuesFrom at HelmRelease apply time.
- values block adds the baseline veleroOverlay.hetzner.enabled=true
+ velero.credentials.{useSecret:true,existingSecret:velero-hetzner-
credentials} + BSL provider/credential/s3ForcePathStyle scaffolding
that the valuesFrom entries fill in.
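One of the five valuesFrom entries, sketched for illustration (the targetPath is assumed from the veleroOverlay values shape above; Secret name per #371):

```shell
# Hypothetical shape of a single valuesFrom entry in the HelmRelease.
vf=$(cat <<'EOF'
valuesFrom:
  - kind: Secret
    name: hetzner-object-storage
    valuesKey: s3-access-key
    targetPath: veleroOverlay.hetzner.s3.accessKey
EOF
)
printf '%s\n' "$vf"
```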
docs/omantel-handover-wbs.md:
- §2 row 19: "❌ chart needs S3 endpoint rework" → "🟢 chart-released
v1.1.0 — Hetzner Object Storage backend wired to #371 secret".
- §9 #384 row: detailed status with smoke evidence.
Smoke evidence (contabo, default values — no Hetzner credentials):
- helm template t . → renders cleanly (no Hetzner Secret, no BSL).
- helm template t . --set veleroOverlay.hetzner.enabled=true \
--set ...accessKey=AK_TEST --set ...secretKey=SK_TEST \
--set velero.backupsEnabled=true (+ BSL config) →
Secret/velero-hetzner-credentials with `cloud` INI key emitted +
BackupStorageLocation/default with provider=aws,
bucket=omantel-velero, region=fsn1,
s3Url=https://fsn1.your-objectstorage.com.
- helm install velero-smoke . -n velero-smoke (defaults) → pod
velero-69bb84c5-669sh Ready 1/1 in 48s. Smoke torn down clean.
Hetzner-S3 E2E deferred to Phase 8 (first omantel run) — contabo has
no Hetzner Object Storage credentials so end-to-end backup→restore
verification can't run here.
Anti-duplication rule: NO bash scripts authored, NO parallel
implementations of upstream Velero functionality. Upstream Velero +
velero-plugin-for-aws natively support any S3-compatible backend; the
work here is values + a credential-shape adapter Secret, not a fork.
Closes #384.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(scripts): drop bp-seaweedfs dep from bp-velero expected DAG (#384)
Mirrors the dependsOn removal in clusters/_template/bootstrap-kit/34-
velero.yaml from the parent commit. Velero on Hetzner Sovereigns now
writes directly to Hetzner Object Storage (ADR-0001 §13 + WBS §3); no
in-cluster prerequisite Blueprint is required.
Local `bash scripts/check-bootstrap-deps.sh` now passes (0 drift,
0 cycles). The CI failure on the parent commit's PR was the audit
flagging bp-velero as having a missing edge to bp-seaweedfs because
this expected-DAG file still listed it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bp-trivy:1.0.0 already published; smoke install on contabo (trivy-smoke
ns) reached operator Ready in ~30s; the log4shell-vulnerable-app test
Deployment yielded a VulnerabilityReport with 386 CVEs (15 CRITICAL / 74
HIGH), including the target CVE-2021-44228 (log4shell) on log4j-core
2.14.1, flagged CRITICAL. Bootstrap-kit slot 30 wired in _template/,
omantel.omani.works/, otech.omani.works/. Smoke torn down clean.
Closes #380.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bp-kyverno:1.0.0 (digest sha256:16edc78e…) was already published on GHCR
on 2026-04-30. The chart is correct for the minimal-Sovereign use case —
confirmed via smoke install on contabo.
Smoke evidence:
- helm template renders 80 resources clean (22 CRDs, 4 controller
Deployments, 5 Pods, 6 Services, ServiceAccounts, ClusterRoles, etc.)
- helm install in kyverno-smoke ns: all 4 controllers (admission,
background, cleanup, reports) reached 1/1 Ready in 81s
- ClusterPolicy 'disallow :latest' admission denial verified end-to-end:
- nginx:latest BLOCKED with 'admission webhook "validate.kyverno.svc-fail"
denied the request'
- nginx:1.27-alpine admitted normally
- Smoke torn down clean (release uninstalled, namespaces deleted,
no leftover CRDs)
Bootstrap-kit slot 27-kyverno.yaml is already wired in _template/,
omantel.omani.works/, and otech.omani.works/ — all overlays clean
(only ${SOVEREIGN_FQDN} sovereign-label substitution diff).
WBS §2 row 20 + §9 row #379 updated to chart-verified. Class moves from
wip to done in the §6 Mermaid graph.
Sovereign-impact (running on omantel cluster) deferred to Phase 8 per
ADR-0001 §9.4.
Closes #379.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bp-gitea:1.1.2 already published; smoke-installed in `gitea-smoke` ns on
contabo, both pods Ready in ~2m38s, /api/v1/version returns 1.22.3 (HTTP
200), admin auth verified. Smoke torn down clean.
In-scope hygiene fix to clusters/otech.omani.works/bootstrap-kit/10-gitea.yaml
— replaces stale upstream `ingress.hosts[]` overlay with the
post-#387/#402 `gateway.host` shape so otech matches the _template/ and
omantel.omani.works/ overlays. helm-template default-values renders 15
manifests clean (HTTPRoute correctly skip-renders without `gateway.host`).
WBS §2 row 13 + §9 row #376 updated to chart-verified.
Closes #376.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bp-grafana 1.0.0 was published by blueprint-release run 25214143810 on
commit a1bd5502 (alongside the #387 Gateway API HTTPRoute templates).
This commit verifies the chart on contabo and brings the per-Sovereign
overlays in line with the _template (and with the bp-keycloak pattern
shipped in #377).
Verification:
- helm template defaults → 13 kinds (HTTPRoute skip-renders when
gateway.host is empty, per the #387/#402 if-host-emit pattern)
- helm template with gateway.host=grafana.test.example.com → 14 kinds
(incl. HTTPRoute)
- smoke install in grafana-smoke ns: 1/1 Ready in 65s; in-cluster GET
http://smoke-grafana/login → HTTP 200; /api/health → 200; image
docker.io/grafana/grafana:12.3.1 confirmed; smoke torn down clean.
Per-Sovereign overlay drift fix:
- clusters/omantel.omani.works/bootstrap-kit/25-grafana.yaml — add
values.gateway.host = grafana.omantel.omani.works (was missing).
- clusters/otech.omani.works/bootstrap-kit/25-grafana.yaml — add
values.gateway.host = grafana.otech.omani.works (was missing).
Both now match the _template and the bp-keycloak otech overlay shape.
Scope clarification: the original ticket said "Bundle: Alloy + Loki +
Mimir + Tempo + Grafana dashboards" but the actual chart split has
Alloy/Loki/Mimir/Tempo as sibling Blueprints at slots 21-24, with
bp-grafana as the visualizer-only at slot 25. WBS §2 row updated to
reflect this. Each LGTM sibling has its own ticket.
Closes #381.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(catalyst): Hetzner Object Storage credential pattern (Phase 0b, #371)
Adds the per-Sovereign Hetzner Object Storage credential capture + bucket
provisioning Phase 0b path described in the omantel handover WBS §5.
Hybrid Option A+B: wizard collects operator-issued S3 credentials (Hetzner
exposes no Cloud API to mint them — they're issued once in the Hetzner
Console and the secret half is shown exactly once), and OpenTofu
auto-provisions the per-Sovereign bucket via the aminueza/minio provider
+ writes a flux-system/hetzner-object-storage Secret into the new
Sovereign at cloud-init time so Harbor (#383) and Velero (#384) find
their backing-store credentials already in the cluster from Phase 1
onwards.
Extends the EXISTING canonical seam at every layer (per the founder's
anti-duplication rule for #371's session): the existing Tofu module at
infra/hetzner/, the existing handler/credentials.go validator, the
existing provisioner.Request struct, the existing store.Redact path,
and the existing wizard StepCredentials. No parallel binaries / scripts
/ operators introduced.
infra/hetzner/ (Tofu module — Phase 0):
- versions.tf: declare aminueza/minio provider (Hetzner's official
recommendation for S3-compatible bucket creation per
docs.hetzner.com/storage/object-storage/getting-started/...)
- variables.tf: 4 sensitive vars — region (validated against
fsn1/nbg1/hel1, the European-only OS regions as of 2026-04),
access_key, secret_key, bucket_name (RFC-compliant S3 naming)
- main.tf: minio_s3_bucket.main resource — idempotent on re-apply,
no force_destroy (Velero archive must survive a control-plane
reinstall), object_locking=false (content-addressed digests are
the immutability guarantee for Harbor; Velero uses S3 versioning)
- cloudinit-control-plane.tftpl: write
flux-system/hetzner-object-storage Secret with the canonical
s3-endpoint/s3-region/s3-bucket/s3-access-key/s3-secret-key keys
Harbor + Velero charts consume via existingSecret refs
- outputs.tf: surface endpoint/region/bucket back to catalyst-api
for the deployment record (credentials NEVER returned)
products/catalyst/bootstrap/api/ (Go):
- internal/hetzner/objectstorage.go: NEW — minio-go/v7-based
ListBuckets validator. Distinguishes auth failure ("rejected") from
network failure ("unreachable") so the wizard renders the right
error card. NOT a parallel cloud-resource path — the existing
purge.go handles hcloud purge; objectstorage.go handles a separate
API surface (S3-compatible) that has no equivalent client today.
- internal/handler/credentials.go: extend with
ValidateObjectStorageCredentials handler — same wire shape
(200 valid:true / 200 valid:false / 503 unreachable / 400 bad
input) as the existing token validator so the wizard's failure-
card machinery handles both without per-endpoint switches.
- cmd/api/main.go: wire POST
/api/v1/credentials/object-storage/validate
- internal/provisioner/provisioner.go: extend Request with
ObjectStorageRegion/AccessKey/SecretKey/Bucket; Validate()
rejects empty/malformed values fail-fast at /api/v1/deployments
POST time; writeTfvars() emits the 4 new tfvars.
- internal/handler/deployments.go: derive bucket name from FQDN slug
pre-Validate (catalyst-<fqdn-with-dots-replaced-by-dashes>) so
Hetzner's globally-namespaced bucket pool gets a deterministic,
collision-resistant per-Sovereign name without operator input.
- internal/store/store.go: redact access/secret keys; preserve
region+bucket plain (they're public in tofu outputs anyway).
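The bucket-name derivation reduces to a dot-to-dash slug (sketch — the real implementation is Go in deployments.go; the example FQDN is illustrative):

```shell
# catalyst-<fqdn-with-dots-replaced-by-dashes>, deterministic per Sovereign.
fqdn="omantel.omani.works"
bucket="catalyst-$(printf '%s' "$fqdn" | tr '.' '-')"
echo "$bucket"   # -> catalyst-omantel-omani-works
```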
products/catalyst/bootstrap/ui/ (TypeScript / React):
- entities/deployment/model.ts + store.ts: 4 new wizard fields
(objectStorageRegion/AccessKey/SecretKey/Validated) with merge()
coercion for legacy persisted state.
- pages/wizard/steps/StepCredentials.tsx: ObjectStorageSection —
region picker (fsn1/nbg1/hel1), masked secret-key input,
Validate button gating Next. Same FailureCard taxonomy
(rejected/too-short/unreachable/network/parse/http) the existing
TokenSection uses, so the operator UX is consistent. Section
only renders when Hetzner is among chosen providers — non-Hetzner
Sovereigns skip Phase 0b until their own backing-store path lands.
- pages/wizard/steps/StepReview.tsx: include
objectStorageRegion/AccessKey/SecretKey in the
POST /v1/deployments payload (bucket derived server-side).
Tests:
- api: 7 new provisioner Validate tests (region/keys/bucket
required + RFC-compliant + valid-region acceptance), 5 handler
tests for the new endpoint (bad JSON / missing region / invalid
region / short keys), 4 hetzner/objectstorage_test.go tests
(endpoint composition + early input rejection), 1 handler test
for the bucket-name derivation. Existing tests updated to supply
the new required fields.
- ui: StepCredentials.test.tsx pre-populates objectStorageValidated
in beforeEach so the existing 11 SSH-section tests aren't gated
on Object Storage validation.
DoD: a fresh Sovereign provision results in a usable S3 endpoint URL +
access/secret keys available as a K8s Secret in the Sovereign's home
cluster (flux-system/hetzner-object-storage), ready for consumption by
Harbor + Velero charts via existingSecret references.
Closes #371.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(wbs): #371 done — Hetzner Object Storage Phase 0b shipped (#409)
Marks #371 done with the architectural rationale (hybrid Option A + B —
Hetzner exposes no Cloud API to mint S3 keys, so the wizard MUST capture
them; OpenTofu auto-provisions the bucket + cloud-init writes the
flux-system/hetzner-object-storage Secret with the canonical s3-* keys
Harbor + Velero consume).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bursty completion: #316 + #375 prose rows now reflect chart-released state
(was stale from earlier 'not deployed').
#379 first agent watchdog-killed (no work survived) — restarted with
tighter STAY-TIGHT brief modeled on the successful #378/#377/#375 patterns
(5-15 min wall time, smoke + close as duplicate if chart already published).
In flight (5): #371 #376 #379-RESTART #380 #381
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #408 merged at d2ada908. Blueprint-release run 25214747925 SUCCESS,
bp-openbao:1.2.0 published to GHCR with cosign signature + SBOM
attestation. Cluster overlay clusters/_template/bootstrap-kit/08-openbao.yaml
already wired with autoUnseal.enabled=true in the same PR.
Sovereign-impact deferred to Phase 8 — next omantel provision run.
Co-authored-by: hatiyildiz <hat.yil@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Done (9): #316 #338 #370 #373 #375 #377 #378 #387 #392
In flight (5): #371 #376 #379 #380 #381
Bursty completion window — #316 #373 #375 #377 #378 all landed within ~10 min.
Sovereign-impact for chart-released/chart-verified items deferred to Phase 8.
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Catalyst-curated auto-unseal pipeline for OpenBao on Hetzner Sovereigns
(no managed-KMS available). Selected **Option A — Shamir + cloud-init
seed** because:
- Hetzner has no managed-KMS service → Cloud-KMS auto-unseal (Option C)
is structurally unavailable.
- Transit-seal (Option B) requires a peer OpenBao cluster, only
applicable to multi-region tier-1; out of scope for single-region
omantel.
- Manual unseal (Option D) violates the "first sovereign-admin lands
on console.<sovereign-fqdn> ready to use" goal in
SOVEREIGN-PROVISIONING.md §5.
Architecture (per issue #316 spec + acceptance criteria 1-6):
1. Cloud-init on the control-plane node generates a 32-byte recovery
seed from /dev/urandom and writes it to a single-use K8s Secret
`openbao-recovery-seed` in the openbao namespace, with annotation
`openbao.openova.io/single-use: "true"`. Pre-creates the openbao
namespace to eliminate the race with Flux's HelmRelease apply.
2. bp-openbao chart v1.2.0 ships two new Helm post-install hooks:
- `templates/init-job.yaml` (hook weight 5): consumes the seed,
calls `bao operator init -recovery-shares=1 -recovery-threshold=1`,
persists the recovery key inside OpenBao's auto-unseal config,
deletes the seed Secret on success. Idempotent — re-runs detect
Initialized=true and exit 0.
- `templates/auth-bootstrap-job.yaml` (hook weight 10): enables
the Kubernetes auth method, mounts kv-v2 at `secret/`, writes
the `external-secrets-read` policy, binds the `external-secrets`
role to the ESO ServiceAccount in `external-secrets-system`.
3. `templates/auto-unseal-rbac.yaml` declares the least-privilege SA
+ Role + RoleBinding the Jobs need (Secret get/list/delete in the
openbao namespace; create/get/patch on the openbao-init-marker).
Also emits the permanent `system:auth-delegator` ClusterRoleBinding
bound to the OpenBao ServiceAccount so the Kubernetes auth method
can call tokenreviews.authentication.k8s.io.
4. Cluster overlay `clusters/_template/bootstrap-kit/08-openbao.yaml`
bumps version 1.1.1 → 1.2.0 and flips `autoUnseal.enabled: true`
per-Sovereign.
Per #402 lesson: skip-render pattern (`{{- if .Values.X }}{{ emit }}
{{- end }}`) used throughout — never `{{ fail }}`. Default `helm
template` render emits NOTHING new; opt-in via autoUnseal.enabled=true.
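The skip-render shape, sketched as a template fragment (the Job name is hypothetical; the guard condition is the chart's actual toggle):

```shell
# Emit the fragment a skip-rendered template takes: the whole document is
# wrapped in the toggle, so a default render produces nothing — never {{ fail }}.
skip=$(cat <<'EOF'
{{- if .Values.autoUnseal.enabled }}
apiVersion: batch/v1
kind: Job
metadata:
  name: openbao-init   # hypothetical name for illustration
{{- end }}
EOF
)
printf '%s\n' "$skip"
```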
Acceptance criteria coverage:
1. Provision fresh Sovereign — cloud-init writes seed, Flux installs
bp-openbao 1.2.0, post-install Jobs run automatically. ✅
2. bp-openbao HR Ready=True without manual intervention — install
keeps `disableWait: true` (Helm Ready ≠ OpenBao initialised; the
init Job drives initialisation out-of-band on the same install). ✅
3. `bao status` shows Sealed=false, Initialized=true within 5 minutes
— init Job polls + retries up to 60×5s. ✅
4. ESO ClusterSecretStore vault-region1 reaches Status: Valid — the
auth-bootstrap Job binds the `external-secrets` role to ESO's SA
before the Job exits. ✅
5. Seed Secret deleted post-init — init Job deletes it via K8s API
after consuming. ✅
6. No openbao-root-token Secret in K8s — root token captured to
/tmp/.root-token in the Job pod's tmpfs only; never written to a
K8s Secret. The recovery key persists ONLY inside OpenBao's Raft
state (auto-unseal config). ✅
Tests:
- tests/auto-unseal-toggle.sh — 4 cases:
* default render → no auto-unseal artefacts (skip-render works)
* autoUnseal.enabled=true → both Jobs + correct hook weights
* kubernetesAuth.enabled=false → init Job only, no auth-bootstrap
* idempotency annotations present on all 5 hook objects
- tests/observability-toggle.sh — unchanged, all 3 cases green.
- helm lint . — clean.
Files:
- platform/openbao/chart/Chart.yaml — version 1.1.1 → 1.2.0
- platform/openbao/blueprint.yaml — version 1.1.1 → 1.2.0
- platform/openbao/chart/values.yaml — `autoUnseal.*` block
- platform/openbao/chart/templates/auto-unseal-rbac.yaml — new
- platform/openbao/chart/templates/init-job.yaml — new
- platform/openbao/chart/templates/auth-bootstrap-job.yaml — new
- platform/openbao/chart/tests/auto-unseal-toggle.sh — new
- platform/openbao/README.md — bootstrap procedure §2-3 expanded;
auto-unseal alternatives table added.
- clusters/_template/bootstrap-kit/08-openbao.yaml — chart 1.1.1 →
1.2.0, autoUnseal.enabled=true.
- infra/hetzner/cloudinit-control-plane.tftpl — seed-token block
inserted between ghcr-pull-secret apply and flux-bootstrap apply.
- docs/omantel-handover-wbs.md §9 — #316 ticked chart-released.
Canonical seam used: extended existing `platform/openbao/chart/` per
the anti-duplication rule. NO standalone scripts. NO bespoke Go cloud
calls. NO `{{ fail }}`. All knobs configurable via values.yaml per
INVIOLABLE-PRINCIPLES.md #4 (never hardcode).
Co-authored-by: hatiyildiz <hat.yil@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bp-nats-jetstream:1.1.1 already published on GHCR. Helm template renders
8 kinds clean (StatefulSet replicas=3 per ADR-0001 §9.2 B5). Smoke install
on contabo `nats-smoke` ns reached 3/3 Ready in 33s; JetStream R=3 stream
created with leader+2 replica quorum; pub/sub round-trip verified.
Bootstrap-kit slot 07 already wired in `_template/`. No code change needed.
Same verify-and-close pattern as #378.
Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bp-keycloak:1.1.2 already published by blueprint-release run 25214143810
on commit a1bd5502 (digest sha256:c284c3dc...). Verified end-to-end:
- helm dependency build pulls bitnami/keycloak 25.2.0
- helm template (default values, no gateway.host) renders without error
(HTTPRoute skip-renders per #387/#402 pattern)
- helm install in disposable keycloak-smoke ns on contabo:
smoke-postgresql-0 + smoke-keycloak-0 reached Ready in ~2m39s
- /realms/master returns HTTP 200 in-cluster
- admin OIDC password-grant returned valid RS256 JWT access_token
- teardown clean (PVC + namespace deleted)
In-scope hygiene fix:
- clusters/otech.omani.works/bootstrap-kit/09-keycloak.yaml: add
values.gateway.host=auth.otech.omani.works (mirrors omantel overlay
authored under #387; otech overlay was authored before that and
would have shipped without an HTTPRoute on its Sovereign).
Wizard catalog already lists keycloak under layer:'bootstrap-kit'
(mandatory, auto-installed) — no UI work needed.
WBS §2 row 14 + §9 row #377 updated to chart-verified.
Closes #377.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#378 completed (chart-verified, closed as duplicate per agent finding).
#375 dispatched as next from queue to maintain 5-parallel.
In-flight now: #371 #373 #316 #375 #377 (5).
Done: #338 #370 #378 #387 #392 (5 of 24 minimal blueprints).
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0/2/3/4 fan-out at full 5-parallel:
- #371 RESUME (Hetzner OS credentials, in-worktree state)
- #373 NEW (cert-mgr-powerdns-webhook authoring)
- #316 NEW (OpenBao auto-unseal)
- #377 NEW (bp-keycloak install verification)
- #378 NEW (bp-crossplane install verification)
#370 promoted to done (unblocked + scope superseded by working wipe.go).
Class assignments updated; §9 status rows added.
Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migrates every minimal-Sovereign-set blueprint chart from
networking.k8s.io/v1.Ingress to gateway.networking.k8s.io/v1.HTTPRoute,
replacing the legacy Traefik-on-Sovereigns assumption with the canonical
Cilium + Envoy + Gateway API path per ADR-0001 §9.4 and the WBS §2
correction note (#388).
The single per-Sovereign Gateway is added as additional documents in
the existing bootstrap-kit slot clusters/_template/bootstrap-kit/01-cilium.yaml
(NOT a new top-level slot), since Cilium owns the GatewayClass. It
includes:
- Certificate `sovereign-wildcard-tls` requesting `*.${SOVEREIGN_FQDN}`
from `letsencrypt-dns01-prod` (cert-manager + #373 webhook)
- Gateway `cilium-gateway` in `kube-system` with HTTPS (443, TLS
terminate) + HTTP (80) listeners, allowedRoutes.namespaces.from=All
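The Gateway described above can be sketched roughly as follows (gatewayClassName and the exact listener layout are assumptions; name, namespace, ports, TLS mode, cert ref, and allowedRoutes come from this commit):

```shell
gateway=$(cat <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cilium-gateway
  namespace: kube-system
spec:
  gatewayClassName: cilium   # assumed class name; Cilium owns the GatewayClass
  listeners:
    - name: https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: sovereign-wildcard-tls
      allowedRoutes:
        namespaces:
          from: All
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
EOF
)
printf '%s\n' "$gateway"
```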
Per-blueprint HTTPRoute templates (canonical seam: each wrapper chart's
existing `templates/` directory):
| Blueprint | Host pattern | Backend port |
|---------------------|---------------------------------|--------------|
| bp-keycloak | auth.<sov> | 80 |
| bp-gitea | git.<sov> | 3000 |
| bp-openbao | bao.<sov> | 8200 |
| bp-grafana | grafana.<sov> | 80 |
| bp-harbor | registry.<sov> | 80 |
| bp-powerdns | pdns.<sov>/api (dual-mode) | 8081 |
| bp-catalyst-platform| console.<sov>, api.<sov> | 80, 8080 |
bp-powerdns supports both Ingress (contabo legacy) and HTTPRoute
(Sovereign) simultaneously — the per-Sovereign overlay sets
`api.gateway.enabled=true` while leaving `api.enabled=true`. The
Ingress object is harmless on Cilium clusters with no Traefik. This
preserves contabo's existing pdns.openova.io flow per ADR-0001 §9.4.
bp-harbor flips `expose.type` from `ingress` to `clusterIP` in
platform/harbor/chart/values.yaml so the upstream chart no longer
emits its own Ingress; the HTTPRoute is the sole HTTP exposure.
TLS terminates at the Gateway (wildcard cert) rather than per-host
Certificates inside the chart.
bp-catalyst-platform's `templates/httproute.yaml` is NOT excluded by
.helmignore (unlike templates/ingress.yaml + templates/ingress-console-tls.yaml,
which remain contabo-only legacy demo infra). The contabo path keeps
serving console.openova.io/sovereign via Traefik unchanged.
Bootstrap-kit slot updates (per-Sovereign hostname interpolation):
- 08-openbao.yaml → gateway.host: bao.${SOVEREIGN_FQDN}
- 09-keycloak.yaml → gateway.host: auth.${SOVEREIGN_FQDN}
- 10-gitea.yaml → gateway.host: gitea.${SOVEREIGN_FQDN}
- 11-powerdns.yaml → api.host: pdns.${SOVEREIGN_FQDN}, api.gateway.enabled: true
- 19-harbor.yaml → gateway.host: registry.${SOVEREIGN_FQDN}
- 25-grafana.yaml → gateway.host: grafana.${SOVEREIGN_FQDN}
Server-side dry-run validation against the live Cilium Gateway API
CRDs on contabo: every HTTPRoute and the per-Sovereign Gateway
+ Certificate apply cleanly via `kubectl apply --dry-run=server`.
Contabo unaffected: clusters/contabo-mkt/* not modified. The legacy
SME ingresses (console-nova, marketplace, admin, axon, talentmesh,
stalwart, ...) continue to serve via Traefik as before. powerdns
on contabo remains on the Ingress path (api.gateway.enabled defaults
to false at the chart level).
Closes #387.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>