Commit Graph

229 Commits

e3mrah
7bd1821473
docs(wbs): Mermaid reflects ALL Phase-8a 2026-05-02 chart bug bash (#577)
Founder corrective: prior diagram missed:
- 9 chart bugs surfaced + fixed today (#549, #553, #561, #567-#571, #568)
- 3 still in flight (#562 cilium-operator gateway-controller race,
  #563 NS delegation + LB:53 + DNS-01 wildcard, #565 harbor CNPG)
- 12 chart bugs from prior session days (#474, #488, #489, #491, #492,
  #494, #503, #506, #508, #510, #519, #536, #538, #539, #340)

Adds Phase 0d · Phase-8a chart bug bash with all of them.

Edges: every fix gates the bp-* HR it makes possible on a fresh
Sovereign integration test. Edge from #563 (handover-URL DNS-01
wildcard chain) → #454 makes the actual gating relationship explicit:
without #563 there is no working `console.<sovereign>.omani.works`,
which means no Phase-8a gate met.

The diagram should now match what the founder sees actually failing
on otech22, not the chart-released optimism of an earlier draft.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-02 13:06:04 +04:00
e3mrah
dee2be5cc8
docs(wbs): Mermaid DAG shows actual Phase-8a dependency cascade (#559)
Per founder corrective: existing diagram missed the real blockers
surfaced during otech10..otech22 burns. The image-pull-through gap
(#557) and the cross-namespace secret gap (#543, #544) gate every
workload pull from a public registry — without them, Sovereign hits
DockerHub anonymous rate-limit on first provision and 30+ HRs are
ImagePullBackOff/CreateContainerConfigError.

Adds:
- Phase 0b · Image pull-through (#557 + #557B Sovereign-Harbor swap +
  #557C charts global.imageRegistry templating). Edges to NATS / Gitea
  / Harbor / Grafana / Loki / Mimir / PowerDNS / Crossplane /
  cert-manager-powerdns-webhook / Trivy / Kyverno / SPIRE / OpenBao
- Phase 0c · Cross-namespace secrets (#543 ghcr-pull Reflector + #544
  powerdns-api-credentials reflect). Edges to bp-catalyst-platform and
  bp-cert-manager-powerdns-webhook
- Phase 1 additions: #542 kubeconfig CP-IP fix and #547 helmwatch
  38-HR threshold both gate Phase 8a integration test
- Phase 0b → Phase 8b edge: post-handover Sovereign-Harbor swap is
  what makes "zero contabo dependency" DoD-met possible

WBS now reflects the cascade observed live, not the pre-Phase-8a model.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-02 12:45:11 +04:00
e3mrah
a6a3a9b3b1
docs(wbs): add §9b Phase-8a live iteration log (2026-05-01→05-02) (#555)
Per founder corrective: WBS hadn't been updated in 16h. The active
Phase-8a iteration is what's actually closing the integration-tested
gap, but the WBS still read as if Phase 8a hadn't started.

New §9b captures:
- 18 fixes landed in last 36h (#317, #340, #474, #487, #488, #489,
  #491, #492, #494, #503, #506, #508, #510, #519, #531/#532/#534/#535/
  #537, #536, #538, #539/#540, #542, #544, #547, #549, #553)
- Symptom → root cause → fix → PR per row, all linked to deployed SHAs
- Background agents in flight (#543 ghcr-pull Reflector, #548 dynadot
  ClusterIssuer)
- Risk Register status — R3 / R4 exercised + resolved, R2 / R5 / R7 /
  R8 still open

Updated as bugs land. The handover-state truth lives here, not in
Claude memory files.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-02 12:18:35 +04:00
e3mrah
1628a1b3aa
ci(preflight): GHCR auth for A+E + WBS tick — all 4 preflights done (#470)
First runs of preflight A (bootstrap-kit) and E (Keycloak) failed with the
same error: helm OCI pull from ghcr.io/openova-io/bp-* returning 401
'unauthorized: authentication required'. bp-* are PRIVATE GHCR packages.

#460's agent fixed it for B in c26fbcaf. #461's already had GHCR login.
This commit applies the same helm-registry-login pattern to A and E.

WBS state on main after this commit:
- done (35): all chart-level + #317 + #319 + #453 + 4 preflights
- wip (0)
- blocked (3): 454, 455, 456 (Phase-8 live runs, operator-driven)

The preflights' first runs ALREADY surfaced a real CI bug pattern that
would have hit Phase 8a — exactly what they're for.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 20:06:36 +04:00
e3mrah
a7a90619e5
docs(wbs): mark #461 done — preflight C cilium-httproute shipped (#469)
PR #465 merged at 48b73af6 ships
.github/workflows/preflight-cilium-httproute.yaml — Phase-8a Risk R3
preflight (Cilium Gateway HTTPRoute admission for bp-catalyst-platform
on kind). Update §9 status row from "in flight" to "done".

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 20:04:37 +04:00
e3mrah
4a7eb42d26
feat(ci): Phase-8a preflight E — Keycloak realm-import + kubectl OIDC client (closes #462) (#468)
Surfaces Risk R6 (docs/omantel-handover-wbs.md §9a — Keycloak
realm-import config-CLI bootstrap timing untested). bp-keycloak 1.2.0
ships a sovereign realm + a public kubectl OIDC client via the
upstream bitnami/keycloak chart's keycloakConfigCli post-install Helm
hook (issue #326); this workflow proves it actually wires up on a
clean cluster before we run it on a real Sovereign.

Workflow installs bp-keycloak 1.2.0 on a kind cluster (helm/kind-action
v1, kindest/node:v1.30.6 — same versions as test-bootstrap-kit), waits
for the keycloak StatefulSet to roll out, polls for the
keycloakConfigCli post-install Job by label
(app.kubernetes.io/component=keycloak-config-cli), waits for it to
Complete, port-forwards svc/keycloak and asserts:

  1. /realms/sovereign returns 200 (realm exists in Keycloak's DB).
  2. The kubectl OIDC client is provisioned with publicClient=true,
     redirectUris contains http://localhost:8000 (kubectl-oidc-login
     default), and the groups client scope is wired with the
     oidc-group-membership-mapper (the per-Sovereign k3s api-server's
     --oidc-groups-claim flag depends on this).

Acceptance per ticket: if the post-install Job fails, the workflow
summary captures Job logs + StatefulSet logs + cluster state via
GITHUB_STEP_SUMMARY so a failed run is debuggable without re-running.

Triggers are event-driven only per CLAUDE.md "every workflow MUST be
event-driven, NEVER scheduled" rule — push on the workflow file itself
plus workflow_dispatch for ad-hoc re-runs.

Closes #462.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 20:01:30 +04:00
e3mrah
abac00d8b3
feat(ci): Phase-8a preflight A — bootstrap-kit reconcile dry-run on kind (closes #459) (#467)
Surfaces Risk-register R4 (docs/omantel-handover-wbs.md §9a — bootstrap-kit
reconcile-chain order untested under load) before Phase 8a (#454) burns
Hetzner credit on test.omani.works.

New workflow .github/workflows/preflight-bootstrap-kit.yaml:
- kind v0.25.0 + kindest/node:v1.30.6
- Gateway API CRDs v1.2.0 standard channel
- Full Flux controller set (fluxcd/flux2/action@main + flux install)
- Mock Secrets: flux-system/object-storage, flux-system/cloud-credentials,
  flux-system/ghcr-pull
- Renders clusters/_template/bootstrap-kit/ with SOVEREIGN_FQDN_PLACEHOLDER
  + ${SOVEREIGN_FQDN} -> test-sov.example.com (matches test harness pattern
  in tests/e2e/bootstrap-kit/main_test.go:247)
- 30 x 30s HR poll loop, never-fail-fast (goal: surface ALL bugs, not stop
  at first)
- $GITHUB_STEP_SUMMARY emits Markdown table of every HR's terminal Ready
  condition + per-HR describe blocks for non-Ready + recent flux-system
  events + raw hrs.json artefact (14d retention)
- Event-driven only: push on self-edit + workflow_dispatch; no schedule:
  cron (per CLAUDE.md "every workflow MUST be event-driven")
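The never-fail-fast poll shape can be sketched in Go for illustration (the actual workflow is a bash loop over `kubectl get hr` output; names below are made up):

```go
package main

import "fmt"

// pollAll runs every check each round and keeps going until all pass or
// rounds are exhausted — surfacing ALL failing HRs, never stopping at
// the first, per the preflight's goal.
func pollAll(rounds int, checks map[string]func() bool) []string {
	var failing []string
	for r := 0; r < rounds; r++ {
		failing = failing[:0]
		for name, ready := range checks {
			if !ready() {
				failing = append(failing, name)
			}
		}
		if len(failing) == 0 {
			return nil // every HR reached Ready
		}
	}
	return failing // terminal non-Ready set after the last round
}

func main() {
	got := pollAll(3, map[string]func() bool{
		"bp-cilium": func() bool { return true },
		"bp-harbor": func() bool { return false },
	})
	fmt.Println(got) // [bp-harbor]
}
```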

Canonical seam reused (no duplication):
- kind setup + flux install pattern from .github/workflows/test-bootstrap-kit.yaml
- bootstrap-kit kustomization at clusters/_template/bootstrap-kit/ (the
  same overlay production Sovereigns consume; substitution shape mirrors
  tests/e2e/bootstrap-kit/main_test.go:247)
- event-driven shape per .github/workflows/check-vendor-coupling.yaml (#428)

Out of scope (sibling preflights):
- #460 Crossplane provider-hcloud Healthy probe
- #461 Cilium Gateway HTTPRoute admission
- #462 Keycloak realm-import

Validated: actionlint clean, YAML parses cleanly.

WBS row #459 in §9 updated: 🟡 in flight -> 🟢 done (workflow shipped).

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 20:01:26 +04:00
e3mrah
56b7cdbb6d
docs(wbs): tick 21 — #453 done; 4 Phase-8a preflights dispatched; §13 cap rule corrected (#464)
Twice-corrected discipline rule per founder pushback at 15:55 UTC:
- Original 15:38 'max 1-2 agents' was over-correction
- Real rule: scope-based not count-based
- 'Min 3, max 5 in flight' from feedback_agent_orchestration_discipline.md
  still holds; what was wrong was dispatching out-of-scope work
- 4 agents in flight now: #459/#460/#461/#462 — all Phase-8a preflight
  de-risking against §9a Risk register

State on main after this commit:
- done (31): all minimal Sovereign blueprints + foundation + CI + Phase 6 +
  Phase 7 (#317 + #319 + #453 contract reconciliation)
- wip (4): 459, 460, 461, 462 (Phase-8a preflights, kind-cluster de-risking)
- blocked (3): 454, 455, 456 (Phase 8 operator-driven live runs)

DAG additions:
- New PRE subgraph 'Phase-8a preflight · de-risk before live run'
- Edges T459/T460/T461/T462 → T454 (preflights gate Phase 8a)
- §9 rows for #459-#462
- §13 rewritten with twice-corrected scope-not-count discipline

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 19:59:50 +04:00
e3mrah
18d59174d3
fix(catalyst-api): #317↔#319 contract — preserve slim deployment record post-handover for redirect (closes #453) (#458)
#317's FinaliseHandover deleted the deployment record entirely, which
meant #319's `AdoptedAt` field was dormant — the post-handover redirect
at console.openova.io/sovereign/<id> 404'd instead of 301-ing to
console.<sovereign-fqdn>.

Fix: replace `store.Delete(id)` at the end of FinaliseHandover with a
slim-record save via the new `Deployment.SlimForHandover(adoptedAt)`
seam. The slim shape retains:
  - id, sovereignFQDN, orgName, orgEmail, startedAt (audit-minimum)
  - AdoptedAt = now() (redirect contract from #319 PR #451)
  - Status: "adopted"
  - closed eventsCh + done channels

Operational fields are zeroed: Result/tofuState, kubeconfig hash, PDM
reservation token, error, credentials. Consistent with §0
minimum-retention principle.

Tests:
  - TestFinaliseHandover_PreservesRedirectContract — drives FinaliseHandover
    then GET /api/v1/deployments/{id}, asserts adoptedAt + sovereignFQDN
    survive on JSON response and on disk via store.Load round-trip
  - TestSlimForHandover (table-driven) — full-record + minimal-record
    transforms; asserts audit fields kept, redirect field set,
    operational fields zeroed, credentials zeroed, channels closed
  - TestSlimForHandover_StoreRecordRoundTrip — JSON encode/decode
    cross-Pod-restart guard
  - TestFinaliseHandover_FullFlow extended with slim-shape assertions

Anti-duplication: SlimForHandover lives next to other Deployment methods
in deployments.go (canonical seam). FinaliseHandover modifies the same
file referenced in the issue (handover.go); no parallel binary or
script.

WBS row #453 → done; class line T453 wip → done.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:52:58 +04:00
e3mrah
51e24ea3b8
docs(wbs): truthful rewrite — match real DoD; carve out post-omantel epic #320 (#457)
Per founder corrective 2026-05-01. Prior WBS over-promised by:
1. Treating chart-released and chart-verified as 'done' indistinguishable
   from DoD-met
2. Bundling epic #320 IAM access plane (#322-#326) as if part of omantel
   handover scope
3. Hiding the fact that ZERO of the 23 minimal blueprints have ever been
   reconciled together on a fresh Sovereign

Rewrite changes:
- §0 (NEW): Truth-of-state — explicit ladder chart-released → chart-verified
  → integration-tested → DoD-met. Today every 'done' ticket is at chart
  level; zero are integration-tested; zero are DoD-met.
- §1: explicit out-of-scope carve-out for epic #320
- §2: split chart-status from reconcile-chain-status; latter reads 
  unknown for all 23 (truthful)
- §4 DAG:
  * adds Phase 7 cleanup #453 (#317↔#319 contract reconciliation)
  * adds Phase 8a/8b/8c live-execution gates (#454/#455/#456)
  * adds 🎯 DoD-met gate node tied to #456
  * promotes T425 into Phase 4 (it was wrongly in SCAF subgraph as if it
    were sustainment work — it's the foundation for #383/#384)
  * keeps SCAF subgraph for genuine CI guardrails (#428/#438/#429/#430)
- §9: adds rows for #453/#454/#455/#456 explicitly bold + marks #324/#325
  as ⏸ parked per scope rewrite
- §9a (NEW): Risk register — 8 known gaps that will surface in Phase 8a
- §12 (NEW): What we are NOT doing now — scope discipline
- §13 (NEW): Agent-orchestration reset — max 1-2 agents on Phase-8
  follow-ups; NO capacity-fill on post-omantel scope until #456 closes

The 5 sequential steps to DoD-met are listed in §12. There are no
parallel-agent shortcuts past Phase 7. Phase 8 is operator-driven.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 19:41:37 +04:00
e3mrah
3a34969a2f
feat(catalyst+pdm): Sovereign self-decommission + post-handover redirect (closes #319) (#451)
Customer-side decommission UI + PDM release endpoints + Catalyst-Zero
redirect to console.<sovereign-fqdn> once handover is finalised.

Anti-duplication map (canonical seams reused, NOT duplicated):
  - catalyst-api wipe.go: existing wipe endpoint already drives PDM
    release + Hetzner purge + tofu destroy + local cleanup. The new
    DecommissionPage POSTs to the same endpoint with an optional
    backup-destination payload.
  - PDM Allocator.Release: child zone delete + parent-zone NS revert
    + allocation row delete already idempotent. The new sovereign-side
    POST /api/v1/release is a thin FQDN-shaped wrapper that splits at
    the first dot and delegates to Allocator.Release.
  - The orphan force-release path adds gates (X-Force-Release-Confirm
    header, 30-day grace, DNS-NXDOMAIN check) on top of the same seam.
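The split-at-first-dot shape of the thin FQDN wrapper can be sketched as follows — the function name and error handling are illustrative, not the actual PDM code:

```go
package main

import (
	"fmt"
	"strings"
)

// splitFQDN splits a sovereign FQDN at the first dot into the child
// label and the parent zone, the two arguments the wrapper is assumed
// to hand to Allocator.Release.
func splitFQDN(fqdn string) (label, parent string, err error) {
	parts := strings.SplitN(fqdn, ".", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("not a delegatable FQDN: %q", fqdn)
	}
	return parts[0], parts[1], nil
}

func main() {
	label, parent, _ := splitFQDN("omantel.omani.works")
	fmt.Println(label, parent) // omantel omani.works
}
```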

Scope contract with #317 (handover finalisation): NOT touching
internal/handler/handover.go. AdoptedAt is a new contract field on
Deployment + store.Record that the redirect helper consumes; future
#317 enhancement will populate it before deletion.

Files:
  core/pool-domain-manager/internal/handler/release.go         (NEW)
  core/pool-domain-manager/internal/handler/release_test.go    (NEW)
  core/pool-domain-manager/internal/handler/handler.go         (route wiring)
  products/catalyst/bootstrap/api/internal/handler/deployments.go     (AdoptedAt field + State()/toRecord/fromRecord)
  products/catalyst/bootstrap/api/internal/handler/deployments_adopted_test.go (NEW)
  products/catalyst/bootstrap/api/internal/store/store.go      (AdoptedAt persistence)
  products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.tsx        (NEW)
  products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.test.tsx   (NEW)
  products/catalyst/bootstrap/ui/src/pages/sovereign/Dashboard.tsx    (Decommission link)
  products/catalyst/bootstrap/ui/src/app/router.tsx            (redirect + decom route)
  docs/omantel-handover-wbs.md                                 (T319 → done)

Tests: 13 new Go test cases + 5 new vitest cases all green. catalyst-
api + PDM full suites pass. Live execution against omantel deferred to
Phase 8 per ticket scope (no Dynadot/Hetzner exec here).

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 19:27:18 +04:00
e3mrah
efedbb04af
docs(wbs): tick 20 — #324 + #325 dispatched (4 in flight while #319 finishes) (#450)
Filling capacity with the heavy IAM-epic tickets while #319 is still
running through its test-fix loops. Non-overlap matrix maintained:

- #319: PDM release + sovereign/Decommission + Dashboard + router + deployments + store
- #323: handler/user_access + UI admin/user-access
- #324: handler/bastion + internal/bastion/ + UI sovereign/BastionPage
- #325: handler/pod_exec + internal/podexec/ + UI admin/pod-console + asciinema → Object Storage

State on main after this commit:
- done (29)
- wip (4): 319, 323, 324, 325

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 19:18:14 +04:00
e3mrah
d50b1d73fd
docs(wbs): tick 19 — #326 done; #319 + #323 sole wip (#449)
Class line had stale T326 in wip — both #322 and #326 merged on main
(b6810c19 and 20b89607). State on main after this tick:
- done (29)
- wip (2): 319 (decommission, Phase 7), 323 (user-access editor)

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 19:12:07 +04:00
e3mrah
20b896070f
feat(bp-keycloak + infra): Sovereign K8s OIDC config for kubectl via per-Sovereign Keycloak realm (closes #326) (#448)
Wires the per-Sovereign K8s api-server's --oidc-* validator to the
per-Sovereign Keycloak realm so customer admins can authenticate
kubectl directly against their Sovereign — no static admin-kubeconfig
handoff, no rotated bearer-token exchange.

infra (cloud-init):
  - Add 6 --kube-apiserver-arg=oidc-* flags to the k3s install line in
    infra/hetzner/cloudinit-control-plane.tftpl. Issuer URL composed
    from sovereign_fqdn (https://auth.${sovereign_fqdn}/realms/sovereign)
    per INVIOLABLE-PRINCIPLES #4 — never hardcoded. Username/groups
    prefixes scope OIDC subjects under "oidc:" so RoleBindings reference
    e.g. subjects[0].name=oidc:alice@org, distinct from local SAs/x509.
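A sketch of how the six flags might compose from sovereign_fqdn — flag values such as the username claim are assumptions for illustration; the actual flags live in the tftpl above:

```go
package main

import "fmt"

// oidcFlags composes the k3s api-server OIDC flags from the sovereign
// FQDN — never a hardcoded issuer (INVIOLABLE-PRINCIPLES #4). The
// "oidc:" prefixes scope subjects away from local SAs/x509.
func oidcFlags(sovereignFQDN string) []string {
	return []string{
		fmt.Sprintf("--kube-apiserver-arg=oidc-issuer-url=https://auth.%s/realms/sovereign", sovereignFQDN),
		"--kube-apiserver-arg=oidc-client-id=kubectl",
		"--kube-apiserver-arg=oidc-username-claim=email", // assumed claim
		"--kube-apiserver-arg=oidc-username-prefix=oidc:",
		"--kube-apiserver-arg=oidc-groups-claim=groups",
		"--kube-apiserver-arg=oidc-groups-prefix=oidc:",
	}
}

func main() {
	fmt.Println(oidcFlags("omantel.omani.works")[0])
}
```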

Canonical seam (anti-duplication rule, ADR-0001 §11.3):
  - The bp-keycloak chart already bundles bitnami/keycloak's
    keycloakConfigCli post-install Helm hook Job, which imports realms
    declared under values.keycloak.keycloakConfigCli.configuration. We
    enable the existing seam — no bespoke kubectl-exec realm-creation
    script, no custom Admin-API call from catalyst-api.

bp-keycloak chart (1.1.2 → 1.2.0):
  - Enable keycloakConfigCli + ship inline sovereign-realm.json with:
    realm "sovereign" (invariant per Sovereign — Keycloak resolves the
    issuer claim from the request hostname, so no per-FQDN realm
    rename), default groups sovereign-admins/-ops/-viewers, oidc-group
    -membership-mapper emitting "groups" claim, public OIDC client
    "kubectl" with localhost:8000 + OOB redirect URIs (kubectl-oidc
    -login defaults), publicClient=true (kubectl runs locally and
    cannot safely hold a secret), PKCE S256 enforced.
  - Bump version 1.1.2 → 1.2.0 (semver MINOR, additive shape).
  - Bump bootstrap-kit slot 09 in _template/, omantel.omani.works/,
    otech.omani.works/ to version: 1.2.0.
  - New chart test tests/oidc-kubectl-client.sh (4 cases) — all green.
  - Existing tests/observability-toggle.sh — still green.

Documentation:
  - Add §11 "kubectl OIDC for customer admins" runbook to
    docs/omantel-handover-wbs.md with one-time workstation setup
    (kubectl krew install oidc-login + config set-credentials),
    sovereign-admin RBAC binding (oidc:sovereign-admins → cluster
    -admin), and 401-debugging table mapping common symptoms to
    root causes.
  - Carve #326 out of §7 "Out of scope" — it is shipped.
  - Add §9 status row.

Validation:
  - grep -c 'oidc-issuer-url' infra/hetzner/cloudinit-control-plane.tftpl
    → 2 (comment + the actual flag in the curl line)
  - grep -c 'oidc-username-claim' → 2
  - helm template platform/keycloak/chart → renders post-install
    keycloak-config-cli Job + ConfigMap with kubectl client (3 hits
    on grep "kubectl"; 1 hit on "clientId": "kubectl")
  - bash scripts/check-vendor-coupling.sh → exit 0 (HARD-FAIL mode)
  - 4/4 oidc-kubectl-client gates green; 3/3 observability-toggle
    gates green

Out of scope (deferred to follow-up tickets):
  - Per-Sovereign user provisioning UI (#322, #323)
  - Refresh-token revocation on RoleBinding deletion (#324)
  - provider-kubernetes Crossplane ProviderConfig per Sovereign (#321)
  - omantel migration / Phase 8 live execution

NO catalyst-api or UI source files touched (those are #319/#322/#323
agents' territories per agent brief).

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 19:07:52 +04:00
e3mrah
c1c5766706
docs(wbs): tick 18 — #322 UserAccess CRD released (PR #446, bp-crossplane-claims 1.1.0) (#447)
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 19:04:19 +04:00
e3mrah
7ea496ba64
docs(wbs): tick 17 — Phase 7 + IAM epic #320 dispatched (4 in flight) (#445)
State on main after this commit:
- done (27): all minimal Sovereign blueprints + foundation + CI guards + scaffolds + Phase 6 + #317 (handover finalisation server-side)
- wip (4): 319 (decommission), 322 (UserAccess CRD), 323 (user-access editor), 326 (kubectl OIDC)

Filling capacity while #319 finishes — IAM epic #320 sub-tickets dispatched
(322/323/326). #322 unblocks #323; #326 independent. Non-overlap matrix:
- 319: core/pool-domain-manager + UI sovereign-decommission + redirect
- 322: platform/crossplane-claims/ (CRD + Composition + ClusterRoles)
- 323: products/catalyst/bootstrap/api/internal/handler/user_access* + UI admin/user-access
- 326: infra/hetzner/cloudinit-control-plane.tftpl + platform/keycloak/chart/

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 18:59:20 +04:00
e3mrah
180a687eef
feat(catalyst-api): handover finalisation flow (closes #317) (#444)
Ship the server-side machinery for issue #317 — zero-Sovereign-footprint
retention. When bp-catalyst-platform.Ready=True on the new Sovereign,
the wizard / post-install hook calls /api/v1/handover/finalise/{id}
and Catalyst-Zero runs the 4-step finalisation:

  1. Emit final SSE event (`event: handover, data: {sovereignFqdn,
     consoleURL, finalisedAt}`) through the existing emitWatchEvent
     seam — the wizard's reducer picks it up without code change.
  2. Cancel the per-deployment helmwatch informer via a new
     helmwatch.Watcher.Cancel() method that wraps the existing
     watchCtx cancel func — same teardown path as the timeout branch,
     no new informer or goroutine.
  3. Walk the per-deployment OpenTofu workdir, base64-archive every
     regular file, POST to the new Sovereign's
     /api/v1/handover/tofu-archive endpoint. The new Sovereign's
     catalyst-api seals the blob into its OpenBao at
     `secret/catalyst/tofu-phase0-archive` (KV-v2). On 200 OK,
     Catalyst-Zero deletes /var/lib/catalyst/tofu/<sovereign>/.
  4. Delete the kubeconfig file + the deployment record JSON.

Receiver endpoint (POST /api/v1/handover/tofu-archive) lives on the
same catalyst-api binary; production Sovereigns set
CATALYST_OPENBAO_ADDR + CATALYST_OPENBAO_TOKEN and the receiver is
active. Catalyst-Zero leaves both unset so a misrouted POST returns
503 ("not handover target") instead of misbehaving.

Hetzner-token rotation (issue body step 4) is deferred to Crossplane
Provider rotation per #425 — catalyst-api never makes bespoke cloud-
API calls (docs/INVIOLABLE-PRINCIPLES.md #3). The operator-supplied
Phase-0 token is already GC'd from memory after writeTfvars.

Live execution against a real omantel cluster is deferred to Phase 8
(epic #369, scaffold #429). This PR ships code + tests only.

Anti-duplication audit (canonical seams used):
- internal/handler/handler.go (existing Handler) extended with
  3 new fields + 3 setter methods. No new Handler shape.
- internal/handler/deployments.go emitWatchEvent is the SSE emit
  seam — handover handler reuses it.
- internal/helmwatch/helmwatch.go Watcher gets Cancel() — extends
  existing struct, no parallel watcher.
- internal/openbao/ is the FIRST and ONLY OpenBao client (verified
  by grep: no prior internal/vault, internal/secrets/openbao, or
  similar package existed).
- internal/provisioner provides WorkDir for tofu workdir cleanup.
- internal/store provides Delete(id) for record removal.
- Receiver endpoint lives on the SAME binary; per-deployment file
  walking via filepath.Walk is stdlib, not a duplicated archive
  package.

Tests:
- 9 new handler-side cases (handover_test.go) — full flow, dry-run,
  receiver-failure-keeps-local-state, 404, no-OpenBao→503, OpenBao
  seal, validation errors, archive build, missing-dir empty.
- 4 new openbao package cases (client_test.go) — happy path,
  default mount, status error wrap, required-field validation.
- All existing tests still pass: handler, helmwatch, openbao,
  provisioner, store, jobs, dynadot, hetzner, k8scache, objectstorage.

WBS row #317🟢 done; DAG class line includes T317.

Out of scope (per ticket guardrails):
- No core/pool-domain-manager changes (#319's territory)
- No products/catalyst/bootstrap/ui changes (decommission UI is #319)
- No SME-namespace touch (ADR-0001 §9.4)
- No live Hetzner / Dynadot / OpenBao calls
- No vendor-name reintroduction; no schedule: cron triggers

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 18:48:29 +04:00
e3mrah
5d211fe249
docs(wbs): tick 16 — Phase 7 dispatched (#317 + #319 in flight) (#443)
State on main after this commit:
- done (26): all 23 minimal Sovereign blueprints + foundation (425) + CI (428,438) + Phase-8 scaffold (429) + Phase 6 gate (385) + sweeps (430)
- wip (2): 317 (handover finalisation, catalyst-api server-side), 319 (self-decommission UI + PDM release + console redirect)

Phase 6 #385 chart-verified at 73dc78a3 unblocked Phase 7. After #317/#319
land, Phase 8 omantel E2E execution path opens (live run via #429 spec).

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 18:36:17 +04:00
e3mrah
73dc78a30a
feat(bp-catalyst-platform): single-blueprint verification (closes #385) (#442)
Verify bp-catalyst-platform:1.1.8 (the umbrella over 10 leaf bp-* deps —
cilium / cert-manager / flux / crossplane / sealed-secrets / spire /
nats-jetstream / openbao / keycloak / gitea) installs cleanly. This is
Phase 6 of #369 and the convergence point pulling from Phase 3-5
(gitea+keycloak+crossplane+harbor+grafana) and Phase 2a (TLS via the
powerdns webhook).

Verification (chart-only, contabo, ~25 min wall time):

* `helm dep build products/catalyst/chart/` — clean, all 10 OCI deps
  pulled from `oci://ghcr.io/openova-io`.
* `helm template` defaults render 259 docs / 36k+ lines clean — no
  HTTPRoute (skip-render without `ingress.hosts.console.host`/`api.host`
  per the #387/#402 if-host-emit pattern), legacy contabo Ingress
  templates excluded by `.helmignore` on Sovereign installs.
* With per-Sovereign overlay (sovereignFQDN + ingress.hosts.console.host
  + ingress.hosts.api.host) renders 261 docs incl. 2 HTTPRoutes:
  - catalyst-ui  → hostname console.<sov>, backend port 80
  - catalyst-api → hostname api.<sov>,    backend port 8080
  both attached to `cilium-gateway/kube-system` parentRef sectionName
  `https`.
* Server-side dry-run of catalyst-specific resources (api-deployment,
  api-service, ui-deployment, ui-service, httproute, api-deployments-pvc,
  api-cache-pvc) — all 8 accepted by API server.
* Smoke-install of catalyst-specific manifests in `catalyst-platform-smoke`
  ns on contabo:
  - catalyst-ui  Deployment 1/1 Ready in <30s
  - catalyst-api Deployment 1/1 Ready  18s (after stub
    `dynadot-api-credentials` + `ghcr-pull-secret` provided)
  - kubelet liveness/readiness HTTP 200 on `/healthz`
  - in-cluster curl http://catalyst-api.catalyst-platform-smoke.svc:8080/healthz
    → HTTP 200
  - both PVCs (catalyst-api-deployments 1Gi + catalyst-api-cache 5Gi)
    Bound on local-path StorageClass.
  Smoke torn down clean.

Per-Sovereign overlay drift check
---------------------------------
`clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml` ↔
`omantel.omani.works/` ↔ `otech.omani.works/` differ ONLY in literal
${SOVEREIGN_FQDN} substitution. No drift fix needed (in contrast to #381
grafana, which DID need a `gateway.host` retrofit on overlays).

helmwatch
---------
helmwatch is an in-process Go internal package inside catalyst-api
(`products/catalyst/bootstrap/api/internal/helmwatch/`) — NOT a separate
Deployment. Its readiness is exercised by api-deployment readiness via
the catalyst-api `/healthz` probe.

HTTPRoute admission
-------------------
Deferred to a real Sovereign run. contabo runs Traefik for the SME demo
(ADR-0001 §9.4 protected) and has no `cilium-gateway` Gateway, so the
HTTPRoute parentRef cannot be satisfied here. Phase 8 omantel E2E
(#429 scaffold) covers Gateway admission on the live Sovereign.

Sub-chart cluster-scoped CRD installs
-------------------------------------
The umbrella's 10 leaf bp-* deps install cluster-scoped CRDs (bp-cilium
ciliumnetworkpolicies, bp-spire ClusterSPIFFEID, bp-cert-manager
clusterissuers, bp-cnpg postgresql.cnpg.io, etc.) plus DaemonSets (CNI,
spire-agent). On contabo these are owned by the SME demo or unavailable;
installing the full umbrella here would either clobber SME (forbidden)
or fail on missing CRDs. Per Flux `dependsOn` chain, sub-charts install
FIRST on a Sovereign, then bp-catalyst-platform. Each sub-chart's
correctness is independently verified by sibling chart-verify tickets:

  - #376 bp-gitea            chart-verified
  - #377 bp-keycloak         chart-verified
  - #378 bp-crossplane       chart-verified
  - #382 bp-spire            chart-verified
  - #381 bp-grafana          chart-verified
  - #380 bp-trivy            chart-verified
  - #379 bp-kyverno          chart-verified
  - #375 bp-nats-jetstream   chart-verified
  - #383 bp-harbor           chart-released

Vendor-coupling guardrail
-------------------------
`bash scripts/check-vendor-coupling.sh` → exit 0, "no vendor-coupling
violations found across 4 scan path(s)".

Files touched
-------------
docs/omantel-handover-wbs.md only:
  - §2 row 23: bp-catalyst-platform marked chart-verified
  - §9 row #385: parked → 🟢 chart-verified with full verification
    evidence
  - DAG class line: T385 added to the `done` class

No chart edits — the existing 1.1.8 chart renders + smoke-installs
clean. No bootstrap-kit edits — overlays already match template modulo
${SOVEREIGN_FQDN}. No new files authored (anti-duplication rule).

Sovereign-impact deferred to Phase 7 handover machinery (#317 / #319)
and Phase 8 omantel E2E (#429 spec).

Closes #385.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 18:30:09 +04:00
e3mrah
f740a97aa9
docs(wbs): tick 15 — #438 done; #385 sole wip (#441)
State on main after this commit:
- done (25): all minimal Sovereign blueprints + foundation + #438
- wip (1): 385 (catalyst-platform single-blueprint verify, Phase 6 gate)

#438 merged at 87ba48c4 — vendor-coupling guardrail hard-fail mode now
auto-engaged on this repo.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 18:23:39 +04:00
e3mrah
feeabb63cb
docs(wbs): tick 14 — #383 done; #385 + #438 in flight (#439)
State on main after this commit:
- done (24): 316,327,331,338,370,371,373,374,375,376,377,378,379,380,381,382,383,384,387,392,425,428,429,430
- wip (2): 385 (catalyst-platform single-blueprint verify, Phase 6 gate), 438 (CI guardrail path mode-gate fix)

#383 merged at 0511efbd. All 23 minimal Sovereign blueprints now
chart-released or chart-verified. Phase 6 → 7 → 8 path is open.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 18:21:42 +04:00
e3mrah
0511efbdac
feat(bp-harbor): vendor-agnostic Object Storage backend (closes #383) (#437)
Reworks bp-harbor to write blobs DIRECTLY to the cloud-provider's
native S3 endpoint (Hetzner Object Storage on Hetzner Sovereigns)
per ADR-0001 §13. Mirrors the post-#425 vendor-agnostic seam shipped
in bp-velero:1.2.0 (PR #435 / SHA 0172b9a8) 1:1.

Canonical seam used (per anti-duplication rule + docs/omantel-
handover-wbs.md §3a):
  - Sealed Secret name:   flux-system/object-storage  (NOT hetzner-prefixed)
  - Chart values block:   .Values.objectStorage.s3.{enabled,credentialsSecretName,s3.{accessKey,secretKey}}
  - Template filename:    templates/objectstorage-credentials.yaml
  - Reference impl:       platform/velero/chart/ (PR #435)

Chart changes (platform/harbor/chart/):
  - Chart.yaml: 1.0.0 → 1.1.0; description rewritten to emphasise
    cloud-direct architecture + remove SeaweedFS hard-dep claim.
  - values.yaml: REMOVED hardcoded SeaweedFS endpoint
    (http://seaweedfs-s3.seaweedfs.svc.cluster.local:8333) from
    persistence.imageChartStorage.s3.regionendpoint. Default
    type flipped to `filesystem` so contabo/dev render is clean.
    Added vendor-agnostic objectStorage block:
      objectStorage:
        enabled: false
        useExistingSecret: false
        credentialsSecretName: ""
        s3: { accessKey: "", secretKey: "" }
  - templates/objectstorage-credentials.yaml (NEW): synthesises a
    harbor-namespace Secret with REGISTRY_STORAGE_S3_ACCESSKEY +
    REGISTRY_STORAGE_S3_SECRETKEY keys (the upstream chart's
    persistence.imageChartStorage.s3.existingSecret consumption
    shape — envFrom on the registry pod). Skip-render branch
    when objectStorage.enabled=false (default).
  - templates/_helpers.tpl: added bp-harbor.objectStorageCredentialsSecretName
    helper.
  - templates/networkpolicy.yaml: egress rule retargeted from
    SeaweedFS service-namespace selector → external HTTPS:443
    (works for any cloud-native S3 endpoint without vendor coupling).
    Gated on `.Values.objectStorage.enabled`. Removed
    seaweedfsNamespace + seaweedfsS3Port overlay keys.

Per-Sovereign overlays (clusters/{_template,omantel,otech}/bootstrap-
kit/19-harbor.yaml):
  - Chart version reference bumped 1.0.0 → 1.1.0.
  - dependsOn: bp-seaweedfs REMOVED. New dependsOn = bp-cnpg + bp-cert-manager.
  - Added valuesFrom block mapping the 5 keys of flux-system/object-
    storage Secret:
      s3-bucket     → harbor.persistence.imageChartStorage.s3.bucket
      s3-region     → harbor.persistence.imageChartStorage.s3.region
      s3-endpoint   → harbor.persistence.imageChartStorage.s3.regionendpoint
      s3-access-key → objectStorage.s3.accessKey
      s3-secret-key → objectStorage.s3.secretKey
  - Inline values flip objectStorage.enabled=true,
    harbor.persistence.imageChartStorage.type=s3, and
    harbor.persistence.imageChartStorage.s3.existingSecret=harbor-
    objectstorage-credentials.
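
The five-key mapping above lands in the overlay as Flux `valuesFrom` entries. A hedged sketch of two of them, assuming the standard HelmRelease `valuesFrom` field shape (Secret name and target paths from the commit text; the real 19-harbor.yaml layout may differ):

```yaml
valuesFrom:
  - kind: Secret
    name: object-storage
    valuesKey: s3-bucket
    targetPath: harbor.persistence.imageChartStorage.s3.bucket
  - kind: Secret
    name: object-storage
    valuesKey: s3-access-key
    targetPath: objectStorage.s3.accessKey
```

Flux dereferences these at HelmRelease apply time, so plaintext credentials never appear in the committed manifest.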

UI catalog (products/catalyst/bootstrap/ui/src/shared/constants/components.ts):
  - Harbor's `dependencies` array drops `seaweedfs`. Now ['cnpg', 'valkey'].

Validation:
  helm template default render →
    1448 lines, 5 Secrets (Harbor internal: core/jobservice/registry/
    registry-htpasswd/database — NO objectstorage-credentials),
    type=filesystem, 0 SeaweedFS references.
  helm template overlay render with objectStorage.enabled=true +
  type=s3 + bucket=omantel-harbor + region=fsn1 +
  regionendpoint=https://fsn1.your-objectstorage.com +
  existingSecret=harbor-objectstorage-credentials →
    1452 lines, 6 Secrets (5 internal + 1 objectstorage-credentials),
    type=s3 with Hetzner endpoint, registry pod envFrom wired to the
    new Secret, 0 SeaweedFS references.
  scripts/check-vendor-coupling.sh → exit 0 (no violations across
    platform/, clusters/, products/catalyst/bootstrap/{api,ui}/).
  helm lint → 0 failures.

WBS:
  §2 row 18 → 🟢 chart-released (#383).
  §9 #383 row → 🟢 chart-released narrative.
  §6 DAG: T383 moved from `class blocked` → `class done`.

Hetzner-S3 E2E deferred to Phase 8 (first omantel run).

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 18:18:37 +04:00
e3mrah
512639a1aa
docs(wbs): tick 13 — #425 done; #383 in flight on new shape (#436)
State on main after this commit:
- done (23): 316,327,331,338,370,371,373,374,375,376,377,378,379,380,381,382,384,387,392,425,428,429,430
- wip (1): 383 (Harbor chart rework on post-#425 vendor-agnostic shape)

#425 merged at 0172b9a8 — vendor-agnostic Object Storage abstraction +
OpenTofu→Crossplane handover. #383 unblocked + dispatched against the
new shape (objectStorage.s3.* / flux-system/object-storage).

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 18:07:17 +04:00
e3mrah
0172b9a89a
wip(#425): vendor-agnostic OS rename — partial (rate-limited mid-run) (#435)
Files staged from prior agent run before rate-limit. Re-dispatch will
verify, complete missing pieces (Crossplane Provider+ProviderConfig in
cloud-init, grep-zero acceptance, helm/go test runs, WBS row update),
and finalise the PR.

Includes:
- platform/velero/chart/templates/{hetzner-credentials-secret -> objectstorage-credentials}.yaml
- platform/velero/chart/values.yaml (objectStorage.s3.* block)
- platform/velero/chart/Chart.yaml (1.1.0 -> 1.2.0)
- products/catalyst/bootstrap/api/internal/objectstorage/ (NEW package)
- internal/hetzner/objectstorage{,_test}.go DELETED
- credentials handler + StepCredentials.tsx renamed
- infra/hetzner/{main.tf,variables.tf,cloudinit-control-plane.tftpl}
- clusters/{_template,omantel.omani.works,otech.omani.works}/bootstrap-kit/34-velero.yaml
- platform/seaweedfs/* (out-of-scope drift — re-dispatch will revert if not part of #425)

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 18:05:19 +04:00
e3mrah
11afb27e95
docs(wbs): tick 12 — #374/#428/#429/#430 done; SCAF subgraph + click directives (#434)
State on main after this commit:
- done (22): 316,327,331,338,370,371,373,374,375,376,377,378,379,380,381,382,384,387,392,428,429,430
- wip (1): 425 (vendor-agnostic OS + Tofu→Crossplane handover)
- blocked (1): 383 (gates on #425)

Adds new SCAF (sustainment/scaffolding/cross-cutting) subgraph carrying
T425/T428/T429/T430 + cross-cutting edges: T425→T383, T425→T428, T429→P8.
§9 rows added for #428 (CI guardrail merged) + #430 (audit-only).
T374 moves wip → done after PR #433 (NS-delegation wizard step) merged.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 17:59:28 +04:00
e3mrah
6e7a878b1c
feat(catalyst): NS delegation wizard step (closes #374) (#433)
Adds the post-handover wizard step that delegates the parent zone (e.g.
omani.works) to the new Sovereign's PowerDNS, plus a light catalyst-api
stub for live execution in Phase 8.

Wizard (UI):
- New StepNSDelegation slotted as terminal post-handover step (after
  StepSuccess) so the LB IP is in hand before we ask the operator to
  delegate.
- Default mode: emit-runbook only. Renders the exact set_dns2 curl
  command with add_dns_to_current_setting=yes (record-preserving) for
  copy-paste. NEVER embeds the API key — operator exports
  $DYNADOT_API_KEY in their shell.
- Auto-apply mode: gated behind a toggle + double-confirm field
  matching the parent zone. Defaults OFF. POSTs to a stub
  /api/v1/dns/parent-zone/delegate which is 501 today; the wizard
  surfaces a "Phase 8" hint instead of a generic error.
- Memory rule honoured: NO live set_dns2 call reachable on a normal
  wizard flow without explicit operator double-confirm.
- 17 new vitest cases (helper + render + auto-apply gating + 501
  stub-aware error) all green.
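
The never-embed-the-key rule reduces to one detail: the emitted runbook string carries a literal `${DYNADOT_API_KEY}` reference that only the operator's shell expands at paste time. A minimal sketch (the Dynadot endpoint and query-string layout here are assumptions beyond what the commit states):

```shell
# Single quotes keep ${DYNADOT_API_KEY} unexpanded: the wizard never sees
# the key; the operator's shell substitutes it when the command is pasted.
runbook='curl -s "https://api.dynadot.com/api3.json?key=${DYNADOT_API_KEY}&command=set_dns2&domain=omani.works&add_dns_to_current_setting=yes"'
echo "$runbook"
```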

Catalyst-API (Go):
- Extends existing internal/dynadot package (canonical seam — no new
  package, no PDM source touched).
- New Client.AddNSDelegation(parentZone, sovereignFQDN, lbIP, extraNS)
  writes 3 NS + 1 glue A record using add_dns_to_current_setting=yes.
  Fail-closed via IsManagedDomain gate (refuses to call the API for an
  unmanaged zone).
- New pure BuildNSDelegationRunbook helper that mirrors the JSX-side
  buildDynadotRunbookCommand so wizard and API emit the same shape.
- 6 new test cases (happy path / unmanaged-zone refusal / table-driven
  validation / custom NS hosts / runbook builder) all green.

Per ticket #374 scope: wizard step + emitted runbook + light stub;
live execution deferred to Phase 8 of the omantel handover WBS. WBS
row updated to wizard-shipped state.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 17:53:41 +04:00
e3mrah
1e7d1e67c9
test(e2e): omantel handover Playwright scaffold for Phase 8 (closes #429) (#432)
Phase 8 of the omantel handover (#369) needs an automated E2E that proves
DoD: omantel.omani.works runs as a fully self-sufficient Sovereign with
zero contabo dependency post-handover. Today this is a SCAFFOLD — when
Phase 4/6/7 land, dispatching the new workflow against a live omantel is
the entire Phase 8.

Canonical seam (anti-duplication, per memory/feedback_anti_duplication_seam_first.md):
  - tests/e2e/playwright/tests/  ← mirror of sovereign-wizard.spec.ts shape
    (NOT specs/ as the issue body said — actual repo path is tests/)
  - tests/e2e/playwright/playwright.config.ts (BASE_URL handling, retries,
    workers=1, reporter=list) — reused as-is
  - tests/e2e/playwright/tests/_helpers.ts:reachable() — reused for the
    pre-flight skip-when-unreachable pattern
  - .github/workflows/playwright-smoke.yaml — workflow shape (checkout v4,
    setup-node v4, npm install, playwright install --with-deps chromium,
    upload-artifact on failure) — mirrored, NOT duplicated

What ships:
  - tests/e2e/playwright/tests/omantel-handover.spec.ts (NEW, 6 tests):
      1. sovereign Ready + 23/23 blueprints
      2. all bp-* HelmReleases Ready=True
      3. catalyst-platform self-hosts (healthz + dashboard "23 / 23 ready")
      4. vendor-agnostic Object Storage (post-#425 canonical secret name
         flux-system/object-storage — NOT hetzner-object-storage)
      5. dig +trace omantel.omani.works ends at omantel NS, not contabo
      6. zero contabo dependency (omantel /api/healthz keeps returning 200)
    Self-skips when OMANTEL_BASE_URL/OMANTEL_API_BASE/OPERATOR_BEARER unset.

  - .github/workflows/omantel-e2e-handover.yaml (NEW):
    workflow_dispatch ONLY (no schedule cron — per CLAUDE.md "every workflow
    MUST be event-driven, NEVER scheduled"). Inputs let the operator override
    base URLs at dispatch time.

  - docs/omantel-handover-wbs.md:
    new §10 "Phase 8 acceptance criteria (executable DoD)" — 6 bullets 1:1
    with the spec test() blocks; §9 status row added for #429
    (🟢 scaffold-shipped).

Local verification:
  cd tests/e2e/playwright && npm install && \
    npx playwright test --list tests/omantel-handover.spec.ts
  → 6 tests listed cleanly
  npx playwright test tests/omantel-handover.spec.ts
  → 6 skipped (env vars unset, expected)

Out of scope (per #425 / #428 territory split):
  - internal/hetzner/, infra/hetzner/, platform/velero/chart/,
    clusters/.../34-velero.yaml — #425's vendor-agnostic sweep
  - .github/workflows/check-vendor-coupling.yaml — #428's coupling guard

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 17:52:18 +04:00
e3mrah
095433ee55
docs(wbs): tick 11 — #331 done, #383 paused on #425, #425 dispatched, §3a vendor-agnostic rule (#427)
State:
- done (18): 316,327,331,338,370,371,373,375,376,377,378,379,380,381,382,384,387,392
- wip   (2): 374 (re-dispatching after watchdog kill), 425 (vendor-agnostic rename + Tofu→Crossplane handover)
- blocked (1): 383 (paused on #425; first agent stopped before any commits — no work lost)

Adds §3a — vendor-agnostic provider abstraction architecture rule:
  every cloud-provider capability is consumed by Sovereign blueprints through a
  capability-named seam (objectStorage, dns, cloud, smtp, tls); the provider
  name appears only in the infra/<provider>/ Tofu module path + the Crossplane
  Provider CR.
  OpenTofu → Crossplane handover formalised: Tofu Phase-0 emits both canonical
  Secret AND Crossplane Provider+ProviderConfig; Day-2 = XRC writes only.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 17:39:01 +04:00
e3mrah
92b7db622d
fix(bp-external-secrets-stores): split ClusterSecretStore into separate chart per #247 pattern (closes #331) (#426)
* fix(bp-external-secrets): split ClusterSecretStore into bp-external-secrets-stores chart (resolves CRD ordering, closes #331)

bp-external-secrets@1.0.0 deadlocked on first install on otech.omani.works:

  Helm install failed for release external-secrets-system/external-secrets
  with chart bp-external-secrets@1.0.0:
  failed post-install: unable to build kubernetes object for deleting hook
  bp-external-secrets/templates/clustersecretstore-vault-region1.yaml:
  resource mapping not found for name: "vault-region1" namespace: ""
  no matches for kind "ClusterSecretStore" in version "external-secrets.io/v1beta1"

Root cause: Helm's `helm.sh/hook-delete-policy: before-hook-creation` ran
a kubectl-style lookup of the existing ClusterSecretStore CR before the
upstream `external-secrets` subchart's CRDs finished registration. The
in-line ClusterSecretStore template (templates/clustersecretstore-vault-
region1.yaml) and the upstream subchart's CRDs co-installed in the same
release; admission ordering wasn't deterministic enough to make the
post-install hook safe.

Fix — same pattern as PR #247 (bp-crossplane@1.1.3 ↔ bp-crossplane-claims@1.0.0):
split the chart into controller + stores. Flux dependsOn orders them.

  - bp-external-secrets@1.1.0 — controller-only (just upstream subchart
    + NetworkPolicy + ServiceMonitor toggle). CRDs register here.
  - bp-external-secrets-stores@1.0.0 (NEW) — the default
    ClusterSecretStore CR; depends on bp-external-secrets being Ready.
    No Helm hooks needed: by the time this chart's HelmRelease starts,
    Flux has already verified bp-external-secrets is Ready=True and
    therefore the CRDs are registered.
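
The ordering that replaces the deleted Helm hooks is plain Flux `dependsOn`. A sketch of the stores HelmRelease spec (assumed shape, matching the dependsOn list the commit gives for slot 15a; chart/install fields elided):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-external-secrets-stores
spec:
  dependsOn:
    - name: bp-external-secrets  # controller Ready=True ⇒ CRDs registered
    - name: bp-openbao
```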

Files:
  NEW: platform/external-secrets-stores/blueprint.yaml             (1.0.0)
  NEW: platform/external-secrets-stores/chart/Chart.yaml           (1.0.0; no upstream subchart, annotation `catalyst.openova.io/no-upstream: "true"`)
  NEW: platform/external-secrets-stores/chart/values.yaml          (clusterSecretStore.* knobs moved from controller chart)
  MOVED: platform/external-secrets/chart/templates/clustersecretstore-vault-region1.yaml
       → platform/external-secrets-stores/chart/templates/clustersecretstore-vault-region1.yaml
       (Helm hook annotations removed — Flux dependsOn now handles ordering)
  TOUCHED: platform/external-secrets/chart/Chart.yaml              (1.0.0 → 1.1.0; description note appended)
  TOUCHED: platform/external-secrets/blueprint.yaml                (1.0.0 → 1.1.0)
  TOUCHED: platform/external-secrets/chart/values.yaml             (clusterSecretStore block removed; pointer comment added)
  NEW: clusters/_template/bootstrap-kit/15a-external-secrets-stores.yaml
       (Flux HelmRelease, dependsOn: [bp-external-secrets, bp-openbao])
  TOUCHED: clusters/_template/bootstrap-kit/15-external-secrets.yaml
       (chart version 1.0.0 → 1.1.0)
  TOUCHED: clusters/_template/bootstrap-kit/kustomization.yaml
       (slot 15a inserted after 15)

Out of scope for this PR (separate tickets):
  - blueprint-release.yaml CI fan-out: verify the path-matrix picks up
    the new platform/external-secrets-stores/ directory automatically;
    if not, add the directory to the matrix in a follow-up.
  - Per-Sovereign cluster directory edits (#257 will delete those).
  - Phase 0 minimum trim (#310 will renumber slots; this PR uses 15a as
    a non-disruptive sub-slot insertion that works with both the current
    35-slot kustomization and the eventual 15-slot canonical layout —
    when #310 renumbers, 15 + 15a become 08 + 09 in the canonical order).

Refs: #331 (this issue), #247 (pattern reference — bp-crossplane split),

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): register bp-external-secrets-stores in expected-bootstrap-deps.yaml

The dependency-graph-audit CI step rejected PR #334 because the new
bp-external-secrets-stores HR was on disk at slot 15a but missing from
the expected DAG. This commit adds it with the same dependsOn shape as
clusters/_template/bootstrap-kit/15a-external-secrets-stores.yaml:
[bp-external-secrets, bp-openbao].

Refs: #331, #310 (Phase 0 minimum), PR #334.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bp-external-secrets): retire CR cases from controller test, add stores-toggle (#331)

After splitting the default ClusterSecretStore into bp-external-secrets-stores
@1.0.0, the controller chart's observability-toggle integration test still
expected the CR to render in the controller chart (Cases 4 + 5). Those
assertions now belong on the new chart.

Changes:
  - platform/external-secrets/chart/tests/observability-toggle.sh:
    Replace Cases 4+5 with a single inverted assertion — the controller
    chart MUST render ZERO ClusterSecretStore CRs (top-level kind:); only
    the upstream subchart's CRD definition (whose spec.names.kind value is
    "ClusterSecretStore" at non-zero indent) is allowed.
  - platform/external-secrets-stores/chart/tests/clustersecretstore-toggle.sh:
    NEW. Mirrors the retired Cases 4+5 against the stores chart, plus a
    Case 3 that asserts clusterSecretStore.server overrides propagate.
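
The inverted assertion hinges on indentation: a rendered CR puts `kind: ClusterSecretStore` at column zero, while the CRD definition only carries it indented under `spec.names.kind`. A minimal sketch of that distinction (sample manifest inline; the real test greps `helm template` output):

```shell
# Two-document sample: a CRD that mentions the kind name, but no CR.
rendered='kind: CustomResourceDefinition
spec:
  names:
    kind: ClusterSecretStore'
# Count only unindented occurrences — CR manifests, not the CRD definition.
count=$(printf '%s\n' "$rendered" | grep -c '^kind: ClusterSecretStore' || true)
[ "$count" -eq 0 ] && echo "PASS: controller chart renders zero ClusterSecretStore CRs"
```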

Local smoke:
  bash platform/external-secrets/chart/tests/observability-toggle.sh         → 4/4 PASS
  bash platform/external-secrets-stores/chart/tests/clustersecretstore-toggle.sh → 3/3 PASS

Refs: #331, PR #334.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): handle alphanumeric sub-slot suffixes in check-bootstrap-deps.sh

PR #334 (issue #331) added slot 15a-external-secrets-stores as a sub-slot
between numeric slots 15 and 16. The bootstrap-deps audit script's
`printf '%02d'` formatter rejected `15a` with:

  scripts/check-bootstrap-deps.sh: line 390: printf: 15a: invalid number

Fix: detect non-numeric slot tokens and pass them through verbatim. Numeric
slots still render as zero-padded `01..49` for output alignment.
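
The pass-through fix reduces to one branch: zero-pad when the token is all digits, emit verbatim otherwise. A minimal sketch (the function name is hypothetical; the real script inlines the check around its printf call):

```shell
format_slot() {
  case "$1" in
    *[!0-9]*) printf '%s' "$1" ;;   # alphanumeric sub-slot like 15a: verbatim
    *)        printf '%02d' "$1" ;; # numeric slot: zero-pad for alignment
  esac
}
format_slot 7    # → 07
format_slot 15a  # → 15a
```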

Local smoke:
  $ bash scripts/check-bootstrap-deps.sh
  ...
    [P] slot 15  bp-external-secrets        <-- bp-cert-manager bp-openbao
    [P] slot 15a bp-external-secrets-stores <-- bp-external-secrets bp-openbao
  ...
  OK: bootstrap-kit dependency graph audit PASSED

Refs: #331, PR #334.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(wbs): tick #331 chart-released

bp-external-secrets@1.1.0 (controller-only) + bp-external-secrets-stores@1.0.0
(NEW) shipped in PR #426. Helm-template acceptance + both toggle tests +
dependency-graph-audit all green. Sovereign-impact deferred to Phase 8.

Refs: #331, PR #426.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 17:33:47 +04:00
e3mrah
f7796ef807
feat(bp-velero): Hetzner Object Storage backend wiring (closes #384) (#423)
* feat(bp-velero): Hetzner Object Storage backend wiring (closes #384)

Velero on a Hetzner Sovereign now writes its backups DIRECTLY to Hetzner
Object Storage per ADR-0001 §13 (S3-aware app architecture rule) +
docs/omantel-handover-wbs.md §3 — NOT SeaweedFS, which is reserved as a
POSIX→S3 buffer for legacy POSIX-only writers and is not in the minimal
Sovereign set.

Mirrors the Hetzner-direct backend pattern Agent #383 is wiring for
Harbor; both consume the canonical flux-system/hetzner-object-storage
Secret shipped by issue #371 (cloud-init writes 5 keys: s3-endpoint /
s3-region / s3-bucket / s3-access-key / s3-secret-key, derived from
the operator-issued Hetzner-Console keys + the per-Sovereign bucket
provisioned by OpenTofu's aminueza/minio resource).

platform/velero/chart/ (umbrella chart, bumped to 1.1.0):
  - templates/_helpers.tpl: NEW — bp-velero.fullname / bp-velero.labels
    helpers + bp-velero.hetznerCredentialsSecretName (default
    `velero-hetzner-credentials`).
  - templates/hetzner-credentials-secret.yaml: NEW — synthesises a
    velero-namespace Secret with a single `cloud` key in AWS-CLI INI
    format from .Values.veleroOverlay.hetzner.s3.{accessKey,secretKey}.
    The upstream Velero deployment mounts this at /credentials/cloud
    via existingSecret + AWS_SHARED_CREDENTIALS_FILE. Skip-render path
    when veleroOverlay.hetzner.enabled is false (default — keeps
    contabo render clean) or useExistingSecret is true (operator
    supplied Secret out-of-band).
  - values.yaml: BSL provider/region/s3Url/bucket fields populated as
    placeholders the per-Sovereign HelmRelease overrides via Flux
    valuesFrom; backupsEnabled defaults FALSE so default render emits
    no half-broken BSL; veleroOverlay.hetzner block surfaces the
    operator-overridable fields. Long-form rationale comments inline
    on each value per the chart's existing docstring style.
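
The adapter Secret's single `cloud` key is just an AWS shared-credentials INI body. A sketch of the payload shape the template synthesises (placeholder values; the real template renders this from the two chart values):

```shell
access_key="AK_TEST"
secret_key="SK_TEST"
# AWS-CLI INI profile body; Velero reads it via AWS_SHARED_CREDENTIALS_FILE.
cloud=$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
  "$access_key" "$secret_key")
printf '%s\n' "$cloud"
```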

clusters/_template/bootstrap-kit/34-velero.yaml (+ omantel + otech):
  - dependsOn: bp-seaweedfs REMOVED — Velero is no longer a SeaweedFS
    consumer on Sovereigns (was the old SeaweedFS-tiered architecture
    that minimal-omantel retired in favour of cloud-native S3).
  - chart version bumped 1.0.0 → 1.1.0.
  - valuesFrom block added: 5 Secret-key entries pull each canonical
    s3-* key into the matching umbrella value path. Plaintext
    credentials never appear in the committed manifest; Flux
    dereferences valuesFrom at HelmRelease apply time.
  - values block adds the baseline veleroOverlay.hetzner.enabled=true
    + velero.credentials.{useSecret:true,existingSecret:velero-hetzner-
    credentials} + BSL provider/credential/s3ForcePathStyle scaffolding
    that the valuesFrom entries fill in.

docs/omantel-handover-wbs.md:
  - §2 row 19: "chart needs S3 endpoint rework" → "🟢 chart-released
    v1.1.0 — Hetzner Object Storage backend wired to #371 secret".
  - §9 #384 row: detailed status with smoke evidence.

Smoke evidence (contabo, default values — no Hetzner credentials):
  - helm template t . → renders cleanly (no Hetzner Secret, no BSL).
  - helm template t . --set veleroOverlay.hetzner.enabled=true \
      --set ...accessKey=AK_TEST --set ...secretKey=SK_TEST \
      --set velero.backupsEnabled=true (+ BSL config) →
      Secret/velero-hetzner-credentials with `cloud` INI key emitted +
      BackupStorageLocation/default with provider=aws,
      bucket=omantel-velero, region=fsn1,
      s3Url=https://fsn1.your-objectstorage.com.
  - helm install velero-smoke . -n velero-smoke (defaults) → pod
    velero-69bb84c5-669sh Ready 1/1 in 48s. Smoke torn down clean.

Hetzner-S3 E2E deferred to Phase 8 (first omantel run) — contabo has
no Hetzner Object Storage credentials so end-to-end backup→restore
verification can't run here.

Anti-duplication rule: NO bash scripts authored, NO parallel
implementations of upstream Velero functionality. Upstream Velero +
velero-plugin-for-aws natively support any S3-compatible backend; the
work here is values + a credential-shape adapter Secret, not a fork.

Closes #384.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): drop bp-seaweedfs dep from bp-velero expected DAG (#384)

Mirrors the dependsOn removal in clusters/_template/bootstrap-kit/34-
velero.yaml from the parent commit. Velero on Hetzner Sovereigns now
writes directly to Hetzner Object Storage (ADR-0001 §13 + WBS §3); no
in-cluster prerequisite Blueprint is required.

Local `bash scripts/check-bootstrap-deps.sh` now passes (0 drift,
0 cycles). The CI failure on the parent commit's PR was the audit
flagging bp-velero as having a missing edge to bp-seaweedfs because
this expected-DAG file still listed it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:24:44 +04:00
e3mrah
a853a653a3
docs(wbs): tick 10 — 16 done (incl. #327); #331/#374 dispatched (#424)
Done (16): 316,327,338,370,371,373,375,376,377,378,379,380,381,382,387,392
Wip  (4):  331 (ESO split), 374 (NS delegation), 383 (Harbor S3), 384 (Velero S3)

#327 PR merged 511e96de — bp-crossplane-claims event-driven HR install.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 17:23:09 +04:00
e3mrah
47898ca59f
docs(wbs): tick 9 — 15 done (incl. #382); #383/#384 dispatched (#422)
DAG class lines updated to reflect reality on main:
- done (15): 316,338,370,371,373,375,376,377,378,379,380,381,382,387,392
- wip (2):   383 (Harbor → Hetzner S3 rework), 384 (Velero → Hetzner S3)

§9 status table rows for #383/#384 marked 'in flight' with worktree paths.

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 17:16:27 +04:00
e3mrah
5b6d854837
docs(wbs): tick #382 — bp-spire chart-verified (smoke OK on contabo) (#421)
bp-spire:1.1.4 already published on GHCR (32 versions cumulative).
Smoke install in `spire-smoke` ns on contabo:
- server-0 reached 2/2 Ready in ~30s
- agent DaemonSet reached 1/1 Ready in ~70s
- k8s_psat agent attestation succeeded (server log confirms
  AttestAgent for spiffe://catalyst.local/spire/agent/k8s_psat/...)
- 3 CRDs (clusterspiffeids / clusterstaticentries /
  clusterfederatedtrustdomains) registered cleanly via spire-crds subchart
- helm template renders 50 resources clean
- Smoke torn down clean

Bootstrap-kit slot 06 wired in `_template/`, `omantel.omani.works/`,
`otech.omani.works/` — overlays clean (only ${SOVEREIGN_FQDN}
substitution diff). dependsOn: bp-cert-manager, disableWait: true.

No code change required — this PR ticks WBS only.

Closes #382

Co-authored-by: hatiyildiz <hatice@openova.io>
2026-05-01 17:14:30 +04:00
e3mrah
ab636a64f1
docs(wbs): bp-trivy chart-verified on contabo (#380) (#420)
bp-trivy:1.0.0 already published; smoke install on contabo (trivy-smoke
ns) reached operator Ready in ~30s, log4shell-vulnerable-app test
Deployment yielded VulnerabilityReport with 386 CVEs (15 CRITICAL / 74
HIGH) including the target CVE-2021-44228 (log4shell) on log4j-core
2.14.1 flagged CRITICAL. Bootstrap-kit slot 30 wired in _template/,
omantel.omani.works/, otech.omani.works/. Smoke torn down clean.

Closes #380.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:09:03 +04:00
e3mrah
ef57a28165
docs(wbs): #379 bp-kyverno chart-verified — smoke OK on contabo, close as duplicate (#419)
bp-kyverno:1.0.0 (digest sha256:16edc78e…) was already published on GHCR
on 2026-04-30. The chart is correct for the minimal-Sovereign use case —
confirmed via smoke install on contabo.

Smoke evidence:
- helm template renders 80 resources clean (22 CRDs, 4 controller
  Deployments, 5 Pods, 6 Services, ServiceAccounts, ClusterRoles, etc.)
- helm install in kyverno-smoke ns: all 4 controllers (admission,
  background, cleanup, reports) reached 1/1 Ready in 81s
- ClusterPolicy 'disallow :latest' admission denial verified end-to-end:
  - nginx:latest BLOCKED with 'admission webhook "validate.kyverno.svc-fail"
    denied the request'
  - nginx:1.27-alpine admitted normally
- Smoke torn down clean (release uninstalled, namespaces deleted,
  no leftover CRDs)

Bootstrap-kit slot 27-kyverno.yaml is already wired in _template/,
omantel.omani.works/, and otech.omani.works/ — all overlays clean
(only ${SOVEREIGN_FQDN} sovereign-label substitution diff).

WBS §2 row 20 + §9 row #379 updated to chart-verified. Class moves from
wip to done in the §6 Mermaid graph.

Sovereign-impact (running on omantel cluster) deferred to Phase 8 per
ADR-0001 §9.4.

Closes #379

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:07:13 +04:00
e3mrah
b3383557eb
feat(bp-gitea): chart-verified on contabo (#376) (#417)
bp-gitea:1.1.2 already published; smoke-installed in `gitea-smoke` ns on
contabo, both pods Ready in ~2m38s, /api/v1/version returns 1.22.3 (HTTP
200), admin auth verified. Smoke torn down clean.

In-scope hygiene fix to clusters/otech.omani.works/bootstrap-kit/10-gitea.yaml
— replaces stale upstream `ingress.hosts[]` overlay with the
post-#387/#402 `gateway.host` shape so otech matches the _template/ and
omantel.omani.works/ overlays. helm-template default-values renders 15
manifests clean (HTTPRoute correctly skip-renders without `gateway.host`).

WBS §2 row 13 + §9 row #376 updated to chart-verified.

Closes #376.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:55:19 +04:00
e3mrah
2913c4f27a
feat(bp-grafana): chart-verified — smoke OK on contabo + per-Sovereign overlay drift fix (closes #381) (#416)
bp-grafana 1.0.0 was published by blueprint-release run 25214143810 on
commit a1bd5502 (alongside the #387 Gateway API HTTPRoute templates).
This commit verifies the chart on contabo and brings the per-Sovereign
overlays in line with the _template (and with the bp-keycloak pattern
shipped in #377).

Verification:
  - helm template defaults → 13 kinds (HTTPRoute skip-renders when
    gateway.host is empty, per the #387/#402 if-host-emit pattern)
  - helm template with gateway.host=grafana.test.example.com → 14 kinds
    (incl. HTTPRoute)
  - smoke install in grafana-smoke ns: 1/1 Ready in 65s; in-cluster GET
    http://smoke-grafana/login → HTTP 200; /api/health → 200; image
    docker.io/grafana/grafana:12.3.1 confirmed; smoke torn down clean.

Per-Sovereign overlay drift fix:
  - clusters/omantel.omani.works/bootstrap-kit/25-grafana.yaml — add
    values.gateway.host = grafana.omantel.omani.works (was missing).
  - clusters/otech.omani.works/bootstrap-kit/25-grafana.yaml — add
    values.gateway.host = grafana.otech.omani.works (was missing).

Both now match the _template and the bp-keycloak otech overlay shape.

Scope clarification: the original ticket said "Bundle: Alloy + Loki +
Mimir + Tempo + Grafana dashboards" but the actual chart split has
Alloy/Loki/Mimir/Tempo as sibling Blueprints at slots 21-24, with
bp-grafana as the visualizer-only at slot 25. WBS §2 row updated to
reflect this. Each LGTM sibling has its own ticket.

Closes #381

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:55:07 +04:00
e3mrah
1e17668055
feat(catalyst): Hetzner Object Storage credential pattern — Phase 0b (#371) (#409)
* feat(catalyst): Hetzner Object Storage credential pattern (Phase 0b, #371)

Adds the per-Sovereign Hetzner Object Storage credential capture + bucket
provisioning Phase 0b path described in the omantel handover WBS §5.
Hybrid Option A+B: wizard collects operator-issued S3 credentials (Hetzner
exposes no Cloud API to mint them — they're issued once in the Hetzner
Console and the secret half is shown exactly once), and OpenTofu
auto-provisions the per-Sovereign bucket via the aminueza/minio provider
+ writes a flux-system/hetzner-object-storage Secret into the new
Sovereign at cloud-init time so Harbor (#383) and Velero (#384) find
their backing-store credentials already in the cluster from Phase 1
onwards.

Extends the EXISTING canonical seam at every layer (per the founder's
anti-duplication rule for #371's session): the existing Tofu module at
infra/hetzner/, the existing handler/credentials.go validator, the
existing provisioner.Request struct, the existing store.Redact path,
and the existing wizard StepCredentials. No parallel binaries / scripts
/ operators introduced.

infra/hetzner/ (Tofu module — Phase 0):
  - versions.tf: declare aminueza/minio provider (Hetzner's official
    recommendation for S3-compatible bucket creation per
    docs.hetzner.com/storage/object-storage/getting-started/...)
  - variables.tf: 4 sensitive vars — region (validated against
    fsn1/nbg1/hel1, the European-only OS regions as of 2026-04),
    access_key, secret_key, bucket_name (RFC-compliant S3 naming)
  - main.tf: minio_s3_bucket.main resource — idempotent on re-apply,
    no force_destroy (Velero archive must survive a control-plane
    reinstall), object_locking=false (content-addressed digests are
    the immutability guarantee for Harbor; Velero uses S3 versioning)
  - cloudinit-control-plane.tftpl: write
    flux-system/hetzner-object-storage Secret with the canonical
    s3-endpoint/s3-region/s3-bucket/s3-access-key/s3-secret-key keys
    Harbor + Velero charts consume via existingSecret refs
  - outputs.tf: surface endpoint/region/bucket back to catalyst-api
    for the deployment record (credentials NEVER returned)
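For illustration, the Secret cloud-init writes has this shape — name, namespace, and the canonical s3-* keys are per the commit above; the endpoint/bucket/key values here are examples only:

```yaml
# Sketch of the cloud-init-written Secret (values illustrative)
apiVersion: v1
kind: Secret
metadata:
  name: hetzner-object-storage
  namespace: flux-system
type: Opaque
stringData:
  s3-endpoint: https://fsn1.your-objectstorage.com   # example endpoint
  s3-region: fsn1
  s3-bucket: catalyst-example-omani-works            # derived from FQDN slug
  s3-access-key: <access-key>
  s3-secret-key: <secret-key>
```

Harbor and Velero then point their `existingSecret` references at this one object rather than carrying credentials in chart values.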

products/catalyst/bootstrap/api/ (Go):
  - internal/hetzner/objectstorage.go: NEW — minio-go/v7-based
    ListBuckets validator. Distinguishes auth failure ("rejected") from
    network failure ("unreachable") so the wizard renders the right
    error card. NOT a parallel cloud-resource path — the existing
    purge.go handles hcloud purge; objectstorage.go handles a separate
    API surface (S3-compatible) that has no equivalent client today.
  - internal/handler/credentials.go: extend with
    ValidateObjectStorageCredentials handler — same wire shape
    (200 valid:true / 200 valid:false / 503 unreachable / 400 bad
    input) as the existing token validator so the wizard's failure-
    card machinery handles both without per-endpoint switches.
  - cmd/api/main.go: wire POST
    /api/v1/credentials/object-storage/validate
  - internal/provisioner/provisioner.go: extend Request with
    ObjectStorageRegion/AccessKey/SecretKey/Bucket; Validate()
    rejects empty/malformed values fail-fast at /api/v1/deployments
    POST time; writeTfvars() emits the 4 new tfvars.
  - internal/handler/deployments.go: derive bucket name from FQDN slug
    pre-Validate (catalyst-<fqdn-with-dots-replaced-by-dashes>) so
    Hetzner's globally-namespaced bucket pool gets a deterministic,
    collision-resistant per-Sovereign name without operator input.
  - internal/store/store.go: redact access/secret keys; preserve
    region+bucket plain (they're public in tofu outputs anyway).

products/catalyst/bootstrap/ui/ (TypeScript / React):
  - entities/deployment/model.ts + store.ts: 4 new wizard fields
    (objectStorageRegion/AccessKey/SecretKey/Validated) with merge()
    coercion for legacy persisted state.
  - pages/wizard/steps/StepCredentials.tsx: ObjectStorageSection —
    region picker (fsn1/nbg1/hel1), masked secret-key input,
    Validate button gating Next. Same FailureCard taxonomy
    (rejected/too-short/unreachable/network/parse/http) the existing
    TokenSection uses, so the operator UX is consistent. Section
    only renders when Hetzner is among chosen providers — non-Hetzner
    Sovereigns skip Phase 0b until their own backing-store path lands.
  - pages/wizard/steps/StepReview.tsx: include
    objectStorageRegion/AccessKey/SecretKey in the
    POST /v1/deployments payload (bucket derived server-side).

Tests:
  - api: 7 new provisioner Validate tests (region/keys/bucket
    required + RFC-compliant + valid-region acceptance), 5 handler
    tests for the new endpoint (bad JSON / missing region / invalid
    region / short keys), 4 hetzner/objectstorage_test.go tests
    (endpoint composition + early input rejection), 1 handler test
    for the bucket-name derivation. Existing tests updated to supply
    the new required fields.
  - ui: StepCredentials.test.tsx pre-populates objectStorageValidated
    in beforeEach so the existing 11 SSH-section tests aren't gated
    on Object Storage validation.

DoD: a fresh Sovereign provision results in a usable S3 endpoint URL +
access/secret keys available as a K8s Secret in the Sovereign's home
cluster (flux-system/hetzner-object-storage), ready for consumption by
Harbor + Velero charts via existingSecret references.

Closes #371.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(wbs): #371 done — Hetzner Object Storage Phase 0b shipped (#409)

Marks #371 done with the architectural rationale (hybrid Option A + B —
Hetzner exposes no Cloud API to mint S3 keys, so the wizard MUST capture
them; OpenTofu auto-provisions the bucket + cloud-init writes the
flux-system/hetzner-object-storage Secret with the canonical s3-* keys
Harbor + Velero consume).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:54:22 +04:00
e3mrah
1cbd759e0f
docs(wbs): tick 7 — §2 prose updated (#316 + #375 chart-released); #379 RESTART after watchdog kill (#415)
Bursty completion: the #316 + #375 prose rows now reflect their chart-released
state (previously stale at 'not deployed').

#379's first agent was watchdog-killed (no work survived) — restarted with a
tighter STAY-TIGHT brief modeled on the successful #378/#377/#375 patterns
(5-15 min wall time; smoke, then close as duplicate if the chart is already
published).

In flight (5): #371 #376 #379-RESTART #380 #381

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:53:00 +04:00
e3mrah
8695ab82c5
docs(wbs): tick #316 chart-released — bp-openbao 1.2.0 (auto-unseal) (#414)
PR #408 merged at d2ada908. Blueprint-release run 25214747925 SUCCESS,
bp-openbao:1.2.0 published to GHCR with cosign signature + SBOM
attestation. Cluster overlay clusters/_template/bootstrap-kit/08-openbao.yaml
already wired with autoUnseal.enabled=true in the same PR.

Sovereign-impact deferred to Phase 8 — next omantel provision run.

Co-authored-by: hatiyildiz <hat.yil@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:50:18 +04:00
e3mrah
38e6a2a528
docs(wbs): tick 6 — 9 done; #380 dispatched to maintain 5 parallel (#413)
Done (9): #316 #338 #370 #373 #375 #377 #378 #387 #392
In flight (5): #371 #376 #379 #380 #381

Bursty completion window — #316 #373 #375 #377 #378 all landed within ~10 min.
Sovereign-impact for chart-released/chart-verified items deferred to Phase 8.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:48:04 +04:00
e3mrah
d2ada908c9
feat(bp-openbao): auto-unseal flow — cloud-init seed + post-install init Job (closes #316) (#408)
Catalyst-curated auto-unseal pipeline for OpenBao on Hetzner Sovereigns
(no managed-KMS available). Selected **Option A — Shamir + cloud-init
seed** because:

  - Hetzner has no managed-KMS service → Cloud-KMS auto-unseal (Option C)
    is structurally unavailable.
  - Transit-seal (Option B) requires a peer OpenBao cluster, only
    applicable to multi-region tier-1; out of scope for single-region
    omantel.
  - Manual unseal (Option D) violates the "first sovereign-admin lands
    on console.<sovereign-fqdn> ready to use" goal in
    SOVEREIGN-PROVISIONING.md §5.

Architecture (per issue #316 spec + acceptance criteria 1-6):

  1. Cloud-init on the control-plane node generates a 32-byte recovery
     seed from /dev/urandom and writes it to a single-use K8s Secret
     `openbao-recovery-seed` in the openbao namespace, with annotation
     `openbao.openova.io/single-use: "true"`. Pre-creates the openbao
     namespace to eliminate the race with Flux's HelmRelease apply.
  2. bp-openbao chart v1.2.0 ships two new Helm post-install hooks:
       - `templates/init-job.yaml` (hook weight 5): consumes the seed,
         calls `bao operator init -recovery-shares=1 -recovery-threshold=1`,
         persists the recovery key inside OpenBao's auto-unseal config,
         deletes the seed Secret on success. Idempotent — re-runs detect
         Initialized=true and exit 0.
       - `templates/auth-bootstrap-job.yaml` (hook weight 10): enables
         the Kubernetes auth method, mounts kv-v2 at `secret/`, writes
         the `external-secrets-read` policy, binds the `external-secrets`
         role to the ESO ServiceAccount in `external-secrets-system`.
  3. `templates/auto-unseal-rbac.yaml` declares the least-privilege SA
     + Role + RoleBinding the Jobs need (Secret get/list/delete in the
     openbao namespace; create/get/patch on the openbao-init-marker).
     Also emits the permanent `system:auth-delegator` ClusterRoleBinding
     bound to the OpenBao ServiceAccount so the Kubernetes auth method
     can call tokenreviews.authentication.k8s.io.
  4. Cluster overlay `clusters/_template/bootstrap-kit/08-openbao.yaml`
     bumps version 1.1.1 → 1.2.0 and flips `autoUnseal.enabled: true`
     per-Sovereign.
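The single-use seed Secret from step 1 looks roughly like the following — name, namespace, and the annotation are per the spec above, while the `seed` data key and its value format are assumptions for illustration:

```yaml
# Sketch of the single-use recovery-seed Secret (data key name assumed)
apiVersion: v1
kind: Secret
metadata:
  name: openbao-recovery-seed
  namespace: openbao
  annotations:
    openbao.openova.io/single-use: "true"   # init Job deletes this Secret after consuming it
type: Opaque
stringData:
  seed: <32-byte-value-from-/dev/urandom>
```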

Per #402 lesson: skip-render pattern (`{{- if .Values.X }}{{ emit }}
{{- end }}`) used throughout — never `{{ fail }}`. Default `helm
template` render emits NOTHING new; opt-in via autoUnseal.enabled=true.
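Concretely, the skip-render guard wraps each opt-in template like this (a sketch — the `bp-openbao.fullname` helper name is assumed; the hook annotations match the weights described above):

```yaml
# Skip-render guard: with autoUnseal.enabled unset, `helm template` emits nothing here
{{- if .Values.autoUnseal.enabled }}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "bp-openbao.fullname" . }}-init
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": before-hook-creation
# ...Job spec elided...
{{- end }}
```

Unlike `{{ fail }}`, an unset value here degrades to an empty render instead of breaking every default `helm template` invocation downstream.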

Acceptance criteria coverage:
  1. Provision fresh Sovereign — cloud-init writes seed, Flux installs
     bp-openbao 1.2.0, post-install Jobs run automatically. 
  2. bp-openbao HR Ready=True without manual intervention — install
     keeps `disableWait: true` (Helm Ready ≠ OpenBao initialised; the
     init Job drives initialisation out-of-band on the same install). 
  3. `bao status` shows Sealed=false, Initialized=true within 5 minutes
     — init Job polls + retries up to 60×5s. 
  4. ESO ClusterSecretStore vault-region1 reaches Status: Valid — the
     auth-bootstrap Job binds the `external-secrets` role to ESO's SA
     before the Job exits. 
  5. Seed Secret deleted post-init — init Job deletes it via K8s API
     after consuming. 
  6. No openbao-root-token Secret in K8s — root token captured to
     /tmp/.root-token in the Job pod's tmpfs only; never written to a
     K8s Secret. The recovery key persists ONLY inside OpenBao's Raft
     state (auto-unseal config). 

Tests:
  - tests/auto-unseal-toggle.sh — 4 cases:
    * default render → no auto-unseal artefacts (skip-render works)
    * autoUnseal.enabled=true → both Jobs + correct hook weights
    * kubernetesAuth.enabled=false → init Job only, no auth-bootstrap
    * idempotency annotations present on all 5 hook objects
  - tests/observability-toggle.sh — unchanged, all 3 cases green.
  - helm lint . — clean.

Files:
  - platform/openbao/chart/Chart.yaml — version 1.1.1 → 1.2.0
  - platform/openbao/blueprint.yaml — version 1.1.1 → 1.2.0
  - platform/openbao/chart/values.yaml — `autoUnseal.*` block
  - platform/openbao/chart/templates/auto-unseal-rbac.yaml — new
  - platform/openbao/chart/templates/init-job.yaml — new
  - platform/openbao/chart/templates/auth-bootstrap-job.yaml — new
  - platform/openbao/chart/tests/auto-unseal-toggle.sh — new
  - platform/openbao/README.md — bootstrap procedure §2-3 expanded;
    auto-unseal alternatives table added.
  - clusters/_template/bootstrap-kit/08-openbao.yaml — chart 1.1.1 →
    1.2.0, autoUnseal.enabled=true.
  - infra/hetzner/cloudinit-control-plane.tftpl — seed-token block
    inserted between ghcr-pull-secret apply and flux-bootstrap apply.
  - docs/omantel-handover-wbs.md §9 — #316 ticked chart-released.

Canonical seam used: extended existing `platform/openbao/chart/` per
the anti-duplication rule. NO standalone scripts. NO bespoke Go cloud
calls. NO `{{ fail }}`. All knobs configurable via values.yaml per
INVIOLABLE-PRINCIPLES.md #4 (never hardcode).

Co-authored-by: hatiyildiz <hat.yil@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:45:44 +04:00
e3mrah
74d232538a
docs(wbs): #375 bp-nats-jetstream chart-verified — smoke OK, close as duplicate (#411)
bp-nats-jetstream:1.1.1 already published on GHCR. Helm template renders
8 kinds clean (StatefulSet replicas=3 per ADR-0001 §9.2 B5). Smoke install
on contabo `nats-smoke` ns reached 3/3 Ready in 33s; JetStream R=3 stream
created with leader+2 replica quorum; pub/sub round-trip verified.
Bootstrap-kit slot 07 already wired in `_template/`. No code change needed.

Same verify-and-close pattern as #378.

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:45:21 +04:00
e3mrah
04308af7e9
feat(cert-manager): bp-cert-manager-powerdns-webhook (#373) (#410)
Authors a Catalyst Blueprint for the cert-manager DNS-01 external webhook
backed by PowerDNS, for post-handover wildcard TLS issuance against the
Sovereign's OWN PowerDNS — eliminating the last reachback to openova-
controlled Dynadot credentials per ADR-0001 §9.4.

Structure mirrors bp-cert-manager-dynadot-webhook (canonical seam):
- platform/cert-manager-powerdns-webhook/blueprint.yaml — Blueprint CR
  with depends: [bp-cert-manager, bp-powerdns]
- platform/cert-manager-powerdns-webhook/chart/Chart.yaml — wraps upstream
  zachomedia/cert-manager-webhook-pdns v2.5.5 (chart 3.2.5); declares the
  sigstore/common stub dep to satisfy the hollow-chart guard (#181)
- chart/templates/ — 8 templates (Deployment, Service, APIService, RBAC,
  selfSigned/CA Issuer + serving Certificate, ServiceAccount,
  ClusterIssuer)
- ClusterIssuer (letsencrypt-dns01-prod-powerdns) ships with the chart,
  paired with the webhook's solver. Gated behind clusterIssuer.enabled
  AND powerdns.host (skip-render pattern, lesson from #387 follow-up
  #402 — never use {{ fail }})

Bootstrap-kit slot:
- clusters/_template/bootstrap-kit/36-bp-cert-manager-powerdns-webhook.yaml
  wires the HelmRelease to the per-Sovereign in-cluster PowerDNS endpoint
  (http://powerdns.powerdns:8081) and flips clusterIssuer.enabled=true.
- ${SOVEREIGN_FQDN} envsubst keeps the slot operator-overridable per
  Inviolable Principle #4. Contabo bootstrap path does NOT include this
  template — contabo stays on legacy http01 + Traefik per ADR-0001 §9.4.

Helm-template verification:
  helm template t platform/cert-manager-powerdns-webhook/chart/
    → 14 resources, 0 ClusterIssuer (skip-render works)
  helm template t platform/cert-manager-powerdns-webhook/chart/ \
      --set powerdns.host=http://powerdns.test:8081 \
      --set clusterIssuer.enabled=true \
      --set powerdns.apiKeySecretRef.name=fake
    → 15 resources incl. ClusterIssuer with PowerDNS solver config
  Both renders parse cleanly through python yaml.safe_load_all.
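For reference, the gated ClusterIssuer render looks approximately like this — a sketch only: the ACME server/key-secret names, and the solver `groupName`/`solverName`/config keys (taken from the upstream webhook's defaults as understood), are illustrative:

```yaml
# Sketch of the rendered ClusterIssuer with the PowerDNS webhook solver
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01-prod-powerdns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-dns01-prod-powerdns   # illustrative
    solvers:
      - dns01:
          webhook:
            groupName: acme.zacharyseguin.ca  # upstream default (assumed)
            solverName: pdns
            config:
              host: http://powerdns.powerdns:8081
              apiKeySecretRef:
                name: powerdns-api-key        # illustrative
                key: api-key
```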

Updates docs/omantel-handover-wbs.md §2 row 4 + §9 row #373 to
chart-released. Sovereign-impact deferred to Phase 8 (handover E2E).

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:44:27 +04:00
e3mrah
43c93d1875
feat(bp-keycloak): chart-verified on contabo (#377) (#407)
bp-keycloak:1.1.2 already published by blueprint-release run 25214143810
on commit a1bd5502 (digest sha256:c284c3dc...). Verified end-to-end:

- helm dependency build pulls bitnami/keycloak 25.2.0
- helm template (default values, no gateway.host) renders without error
  (HTTPRoute skip-renders per #387/#402 pattern)
- helm install in disposable keycloak-smoke ns on contabo:
  smoke-postgresql-0 + smoke-keycloak-0 reached Ready in ~2m39s
- /realms/master returns HTTP 200 in-cluster
- admin OIDC password-grant returned valid RS256 JWT access_token
- teardown clean (PVC + namespace deleted)

In-scope hygiene fix:
- clusters/otech.omani.works/bootstrap-kit/09-keycloak.yaml: add
  values.gateway.host=auth.otech.omani.works (mirrors omantel overlay
  authored under #387; otech overlay was authored before that and
  would have shipped without an HTTPRoute on its Sovereign).

Wizard catalog already lists keycloak under layer:'bootstrap-kit'
(mandatory, auto-installed) — no UI work needed.

WBS §2 row 14 + §9 row #377 updated to chart-verified.

Closes #377

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:42:06 +04:00
e3mrah
513508f224
docs(wbs): tick 5 — #378 done, #375 dispatched, dedupe §9 (#406)
#378 completed (chart-verified, closed as duplicate per agent finding).
#375 dispatched as next from queue to maintain 5-parallel.

In-flight now: #371 #373 #316 #375 #377 (5).
Done: #338 #370 #378 #387 #392 (5 of 24 minimal blueprints).

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:40:25 +04:00
e3mrah
1a20cc50b9
docs(wbs): #378 bp-crossplane chart-verified — smoke OK, close as duplicate (#405)
Investigation by Agent #378-bp-crossplane:

VALIDATION
- platform/crossplane/chart/ is umbrella (Chart.yaml + values.yaml + Chart.lock + charts/)
  by design after the v1.1.3 split (CR-of-CRD ordering moved to bp-crossplane-claims)
- helm template bp-crossplane . --namespace crossplane-system renders 23 kinds, 0 errors
- bp-crossplane v1.1.3 already published to oci://ghcr.io/openova-io/bp-crossplane
- Latest blueprint-release.yaml run on main is SUCCESS (f004300f)

SMOKE INSTALL (contabo, crossplane-smoke ns, torn down)
- helm install: deployed in 26s
- crossplane controller: 1/1 Ready
- crossplane-rbac-manager: 1/1 Ready
- 16 CRDs admitted (apiextensions.crossplane.io + pkg.crossplane.io + secrets.crossplane.io)
- Provider.pkg.crossplane.io/v1 admitted
- provider-hcloud:v0.4.0 Provider CR admitted (xpkg.upbound.io/crossplane-contrib)
- Teardown clean (provider deleted, helm uninstall, namespace deleted, CRDs deleted)

BOOTSTRAP-KIT WIRING (already done — verified, not changed)
- clusters/_template/bootstrap-kit/04-crossplane.yaml — bp-crossplane HelmRelease,
  dependsOn bp-flux, namespace crossplane-system, version pinned 1.1.3
- clusters/_template/bootstrap-kit/14-crossplane-claims.yaml — bp-crossplane-claims
  HelmRelease, dependsOn bp-crossplane (post-v1.1.3 split rationale documented inline)
- clusters/omantel.omani.works/bootstrap-kit/{04,14}-*.yaml — same content with
  catalyst.openova.io/sovereign label substituted
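The dependsOn ordering between the two slots can be sketched as follows (interval, version pin, and the sourceRef name are illustrative; the real slots pull from oci://ghcr.io/openova-io):

```yaml
# Sketch of the 14-crossplane-claims slot's ordering against 04-crossplane
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-crossplane-claims
  namespace: crossplane-system
spec:
  interval: 10m
  dependsOn:
    - name: bp-crossplane   # CRDs must be admitted before the claims chart applies its CRs
  chart:
    spec:
      chart: bp-crossplane-claims
      version: "1.1.3"
      sourceRef:
        kind: HelmRepository          # assumed: an oci-type repo for ghcr.io/openova-io
        name: openova-blueprints
```

This is the CR-of-CRD ordering rationale behind the v1.1.3 split: Flux holds the claims release until bp-crossplane reports Ready.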

Per ADR-0001 §9.2 #2, Crossplane is the only day-2 cloud-API seam — chart deployed
per-Sovereign on the management k3s, not on contabo-mkt (which is the marketing
cluster). The smoke install above is a transient verification only.

#378 closes as duplicate — chart pre-exists, renders clean, installs clean,
bootstrap-kit wiring pre-exists. Nothing new to ship.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:37:17 +04:00
e3mrah
32864b58df
docs(wbs): tick 4 — 5 agents in flight (#371 #373 #316 #377 #378) (#404)
Phase 0/2/3/4 fan-out at full 5-parallel:
  - #371 RESUME (Hetzner OS credentials, in-worktree state)
  - #373 NEW (cert-mgr-powerdns-webhook authoring)
  - #316 NEW (OpenBao auto-unseal)
  - #377 NEW (bp-keycloak install verification)
  - #378 NEW (bp-crossplane install verification)

#370 promoted to done (unblocked + scope superseded by working wipe.go).

Class assignments updated; §9 status rows added.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:36:51 +04:00
e3mrah
f004300ff9
docs(wbs): tick 3 — #387 chart-released, #392 DoD-met (e2e proven), #370 unblocked (#403)
State after #401 + #402 + #399 land:
- #338 chart-released, Sovereign-impact deferred (bp-flux is cloud-init bootstrapped)
- #387 chart-released, follow-up #402 fixed default-values render; blueprint-release SUCCESS on a1bd5502
- #392 DoD-met — fake-Hetzner E2E test exercises full Purge() flow
- #370 unblocked (purge.go fix proven); reframed scope superseded
- #371 still in flight (Hetzner OS credentials)

DAG class: T338 T387 T392 → done; T370 T371 → wip.

Co-authored-by: hatiyildiz <hati@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:26:49 +04:00
e3mrah
abf01b6f21
feat(platform): Gateway API migration audit (#387) (#401)
Migrates every minimal-Sovereign-set blueprint chart from
networking.k8s.io/v1.Ingress to gateway.networking.k8s.io/v1.HTTPRoute,
replacing the legacy Traefik-on-Sovereigns assumption with the canonical
Cilium + Envoy + Gateway API path per ADR-0001 §9.4 and the WBS §2
correction note (#388).

The single per-Sovereign Gateway is added as additional documents in
the existing bootstrap-kit slot clusters/_template/bootstrap-kit/01-cilium.yaml
(NOT a new top-level slot), since Cilium owns the GatewayClass. It
includes:

  - Certificate `sovereign-wildcard-tls` requesting `*.${SOVEREIGN_FQDN}`
    from `letsencrypt-dns01-prod` (cert-manager + #373 webhook)
  - Gateway `cilium-gateway` in `kube-system` with HTTPS (443, TLS
    terminate) + HTTP (80) listeners, allowedRoutes.namespaces.from=All
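Assembled, the two additional documents in the 01-cilium.yaml slot look roughly like the following — names and listener layout are per the bullets above; the gatewayClassName value and the Certificate's namespace are assumptions:

```yaml
# Sketch: wildcard Certificate + per-Sovereign Gateway (additional docs in 01-cilium.yaml)
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sovereign-wildcard-tls
  namespace: kube-system          # assumed: co-located with the Gateway
spec:
  secretName: sovereign-wildcard-tls
  dnsNames: ["*.${SOVEREIGN_FQDN}"]
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-dns01-prod
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cilium-gateway
  namespace: kube-system
spec:
  gatewayClassName: cilium        # assumed: Cilium-owned GatewayClass name
  listeners:
    - name: https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: sovereign-wildcard-tls
      allowedRoutes:
        namespaces:
          from: All
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
```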

Per-blueprint HTTPRoute templates (canonical seam: each wrapper chart's
existing `templates/` directory):

  | Blueprint            | Host pattern                | Backend port |
  |----------------------|-----------------------------|--------------|
  | bp-keycloak          | auth.<sov>                  | 80           |
  | bp-gitea             | git.<sov>                   | 3000         |
  | bp-openbao           | bao.<sov>                   | 8200         |
  | bp-grafana           | grafana.<sov>               | 80           |
  | bp-harbor            | registry.<sov>              | 80           |
  | bp-powerdns          | pdns.<sov>/api (dual-mode)  | 8081         |
  | bp-catalyst-platform | console.<sov>, api.<sov>    | 80, 8080     |
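Taking the bp-keycloak row as an example, each per-blueprint HTTPRoute follows this shape — the backend Service name and the example hostname are illustrative:

```yaml
# Sketch of one per-blueprint HTTPRoute attaching to the shared Gateway
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: keycloak
spec:
  parentRefs:
    - name: cilium-gateway
      namespace: kube-system      # cross-namespace attach allowed by the Gateway's from: All
  hostnames:
    - auth.example.omani.works    # auth.<sov>
  rules:
    - backendRefs:
        - name: keycloak          # illustrative Service name
          port: 80
```

TLS is not mentioned here by design: it terminates at the Gateway's wildcard listener, so routes stay host+backend only.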

bp-powerdns supports both Ingress (contabo legacy) and HTTPRoute
(Sovereign) simultaneously — the per-Sovereign overlay sets
`api.gateway.enabled=true` while leaving `api.enabled=true`. The
Ingress object is harmless on Cilium clusters with no Traefik. This
preserves contabo's existing pdns.openova.io flow per ADR-0001 §9.4.

bp-harbor flips `expose.type` from `ingress` to `clusterIP` in
platform/harbor/chart/values.yaml so the upstream chart no longer
emits its own Ingress; the HTTPRoute is the sole HTTP exposure.
TLS terminates at the Gateway (wildcard cert) rather than per-host
Certificates inside the chart.

bp-catalyst-platform's `templates/httproute.yaml` is NOT excluded by
.helmignore (unlike templates/ingress.yaml + templates/ingress-console-tls.yaml,
which remain contabo-only legacy demo infra). The contabo path keeps
serving console.openova.io/sovereign via Traefik unchanged.

Bootstrap-kit slot updates (per-Sovereign hostname interpolation):

  - 08-openbao.yaml      → gateway.host: bao.${SOVEREIGN_FQDN}
  - 09-keycloak.yaml     → gateway.host: auth.${SOVEREIGN_FQDN}
  - 10-gitea.yaml        → gateway.host: gitea.${SOVEREIGN_FQDN}
  - 11-powerdns.yaml     → api.host: pdns.${SOVEREIGN_FQDN}, api.gateway.enabled: true
  - 19-harbor.yaml       → gateway.host: registry.${SOVEREIGN_FQDN}
  - 25-grafana.yaml      → gateway.host: grafana.${SOVEREIGN_FQDN}

Server-side dry-run validation against the live Cilium Gateway API
CRDs on contabo: every HTTPRoute and the per-Sovereign Gateway
+ Certificate apply cleanly via `kubectl apply --dry-run=server`.

Contabo unaffected: clusters/contabo-mkt/* not modified. The legacy
SME ingresses (console-nova, marketplace, admin, axon, talentmesh,
stalwart, ...) continue to serve via Traefik as before. powerdns
on contabo remains on the Ingress path (api.gateway.enabled defaults
to false at the chart level).

Closes #387.

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:19:30 +04:00