From 43839526fe79257a569185920f9b43ada94018a6 Mon Sep 17 00:00:00 2001
From: e3mrah <81884938+emrahbaysal@users.noreply.github.com>
Date: Fri, 1 May 2026 15:13:48 +0400
Subject: [PATCH] docs(wbs): omantel handover work-breakdown structure (#369) (#386)

Canonical reference for the minimal self-sufficient Sovereign blueprint set, the 7-phase DAG, per-ticket dependencies, realistic timeline, and the DoD execution checklist. Companion to #369 epic and ADR-0001.

Co-authored-by: hatiyildiz
Co-authored-by: Claude Opus 4.7 (1M context)
---
 docs/omantel-handover-wbs.md | 243 +++++++++++++++++++++++++++++++++++
 1 file changed, 243 insertions(+)
 create mode 100644 docs/omantel-handover-wbs.md

diff --git a/docs/omantel-handover-wbs.md b/docs/omantel-handover-wbs.md
new file mode 100644
index 00000000..72dd2e2a
--- /dev/null
+++ b/docs/omantel-handover-wbs.md
@@ -0,0 +1,243 @@
+# omantel Handover — Work Breakdown Structure
+
+| | |
+|---|---|
+| **Parent epic** | [#369](https://github.com/openova-io/openova/issues/369) |
+| **Authoritative architecture** | [ADR-0001](adr/0001-catalyst-control-plane-architecture.md) |
+| **Definition of Done** | omantel.omani.works runs as a fully self-sufficient Sovereign Cloud on Hetzner with **zero contabo dependency** post-handover |
+
+---
+
+## 1. Goal
+
+Provision **omantel.omani.works** as the first fully self-sufficient Sovereign Cloud on Hetzner. Validate the wizard end-to-end. Complete the handover transition. Verify that killing catalyst-api on contabo for 5 minutes does not affect omantel.
+
+The hard rule from ADR-0001 §9.4: the legacy SME demos (`console.openova.io/nova`, `marketplace.openova.io`, `admin.openova.io`) stay running and untouched throughout this work.
+
+## 2. Minimal Self-Sufficient Sovereign — 24 blueprints
+
+A handed-over Sovereign must own its own GitOps loop, its own DNS, its own cert issuance, its own identity, its own secrets, its own registry, its own observability, its own Day-2 IaC, and its own multi-tenant isolation. The 24 blueprints below are the floor.
+
+| # | Blueprint | Role | Today on contabo |
+|---|---|---|---|
+| 1 | `bp-cilium` | CNI / eBPF | ✅ deployed |
+| 2 | `bp-flux` | GitOps reconciler — pulls from Sovereign's own Gitea | ✅ deployed (gated on RBAC fix #338) |
+| 3 | `bp-traefik` | L7 Ingress | ⚠️ **blueprint missing — installed via cluster manifests** ([#372](https://github.com/openova-io/openova/issues/372)) |
+| 4 | `bp-cert-manager` | TLS issuance | ✅ deployed |
+| 5 | `bp-cert-manager-powerdns-webhook` | DNS-01 against Sovereign's own PowerDNS post-handover | ❌ **not authored** ([#373](https://github.com/openova-io/openova/issues/373)) |
+| 6 | `bp-sealed-secrets` | Git-committed encrypted secrets | ✅ deployed |
+| 7 | `bp-openbao` | Dynamic secrets, rotation, audit log | ❌ not deployed — gates [#316](https://github.com/openova-io/openova/issues/316) auto-unseal |
+| 8 | `bp-external-secrets` | OpenBao → K8s Secret materialiser | ⚠️ chart exists; [#331](https://github.com/openova-io/openova/issues/331) ClusterSecretStore split open |
+| 9 | `bp-cnpg` | Postgres operator | ✅ deployed |
+| 10 | `bp-valkey` | Redis-API cache | ✅ deployed |
+| 11 | `bp-nats-jetstream` | Event bus per ADR-0001 §9.2 B5 | ❌ not deployed ([#375](https://github.com/openova-io/openova/issues/375)) |
+| 12 | `bp-vcluster` | Per-tenant vCluster operator | ✅ deployed (3 active tenants) |
+| 13 | `bp-powerdns` | Authoritative DNS for the Sovereign's delegated subdomain (PDM + dnsdist included) | ✅ deployed |
+| 14 | `bp-gitea` | Sovereign-owned Git server — replaces github.com dependency | ❌ not deployed ([#376](https://github.com/openova-io/openova/issues/376)) |
+| 15 | `bp-keycloak` | OIDC IDP — per-Sovereign realm | ❌ not deployed ([#377](https://github.com/openova-io/openova/issues/377)) |
+| 16 | `bp-spire` | Workload identity — service-to-service mTLS | ❌ not deployed ([#382](https://github.com/openova-io/openova/issues/382)) |
+| 17 | `bp-crossplane` | Day-2 cloud-resource provisioning | ❌ not deployed ([#378](https://github.com/openova-io/openova/issues/378)) |
+| 18 | `bp-crossplane-claims` | XRDs + Compositions for Sovereign-level claims | ⚠️ chart exists; [#327](https://github.com/openova-io/openova/issues/327) event-driven HR install in flight |
+| 19 | `bp-harbor` | Container registry — avoids Docker Hub rate limits | ❌ not deployed; **chart hardcodes SeaweedFS endpoint** ([#383](https://github.com/openova-io/openova/issues/383)) |
+| 20 | `bp-velero` | Cluster-state backup → Hetzner Object Storage | ❌ not deployed; chart needs S3 endpoint rework ([#384](https://github.com/openova-io/openova/issues/384)) |
+| 21 | `bp-kyverno` | Admission policy | ❌ not deployed ([#379](https://github.com/openova-io/openova/issues/379)) |
+| 22 | `bp-trivy` | Image CVE scanning | ❌ not deployed ([#380](https://github.com/openova-io/openova/issues/380)) |
+| 23 | `bp-grafana` | Bundle: Alloy + Loki + Mimir + Tempo + Grafana dashboards | ❌ not deployed ([#381](https://github.com/openova-io/openova/issues/381)) |
+| 24 | `bp-catalyst-platform` | catalyst-api + catalyst-ui + helmwatch (the self-sufficient console) | ✅ deployed; needs single-blueprint verification ([#385](https://github.com/openova-io/openova/issues/385)) |
+
+## 3. Architecture rule — S3 vs SeaweedFS
+
+Per ADR-0001 §13 (recorded from this session):
+
+```
+S3-aware app (Harbor, Velero, OpenBao audit log, future analytics)
+  → cloud-provider native S3 (Hetzner Object Storage on Hetzner Sovereigns)
+
+POSIX-only app that needs S3 archival (Guacamole session recordings,
+  any legacy POSIX writer) → SeaweedFS as POSIX→S3 buffer in front of cloud-native S3
+```
+
+For minimal omantel, neither Guacamole nor any POSIX-only writer is selected. **SeaweedFS is NOT in the minimal set.** Harbor + Velero write directly to Hetzner Object Storage.
+
+## 4. Phase ordering (DAG)
+
+Phases run in order, except that Phase 5 can run in parallel with Phase 4; tickets within a phase parallelize except where a same-phase dependency is noted.
+
+```mermaid
+flowchart TB
+    P0a[Phase 0a · #370<br/>Hetzner mock-data purge] --> P8
+    P0b[Phase 0b · #371<br/>Hetzner Object Storage<br/>credential pattern]
+    P1a[Phase 1a · #372<br/>bp-traefik<br/>blueprint authoring]
+    P1b[Phase 1b · #338<br/>bp-flux helm-controller<br/>SA cluster-admin] --> Phase2
+    P2a[Phase 2a · #373<br/>cert-manager-powerdns<br/>-webhook] --> P2b
+    P2b[Phase 2b · #374<br/>NS delegation<br/>.omani.works → omantel] --> P6
+    P1a --> Phase2[Phase 2 — Infrastructure]
+    Phase2 --> P3[Phase 3 — Data + State]
+    P3 --> P3a[#375 nats-jetstream]
+    P3 --> P3b[#376 gitea]
+    P3 --> P3c[#377 keycloak]
+    P3 --> P316[#316 OpenBao auto-unseal]
+    P3 --> P331[#331 ESO ClusterSecretStore split]
+    Phase2 --> P4[Phase 4 — Registry + IaC + Backup]
+    P4 --> P4a[#378 bp-crossplane]
+    P4 --> P327[#327 crossplane-claims]
+    P4 --> P4b[#383 bp-harbor S3 rework]
+    P4 --> P4c[#384 bp-velero S3]
+    P0b --> P4b
+    P0b --> P4c
+    P3 --> P5[Phase 5 — Security + Observability]
+    P5 --> P5a[#379 kyverno]
+    P5 --> P5b[#380 trivy]
+    P5 --> P5c[#381 grafana stack]
+    P5 --> P5d[#382 spire]
+    P4 --> P6[Phase 6 · #385<br/>bp-catalyst-platform<br/>single-blueprint verify]
+    P5 --> P6
+    P6 --> P7a[Phase 7a · #317<br/>handover finalisation]
+    P7a --> P7b[Phase 7b · #319<br/>self-decommission + redirect]
+    P7b --> P8[Phase 8<br/>End-to-end omantel run<br/>+ DoD verification]
+```
+
+## 5. Phase-by-phase detail
+
+### Phase 0 — Pre-flight (parallelizable)
+
+| Ticket | Title | Depends on |
+|---|---|---|
+| [#370](https://github.com/openova-io/openova/issues/370) | Hetzner mock-data purge runbook | nothing |
+| [#371](https://github.com/openova-io/openova/issues/371) | Hetzner Object Storage credential pattern (wizard step OR Phase-0 OpenTofu auto-provision) | nothing |
+
+### Phase 1 — Foundational platform fixes
+
+| Ticket | Title | Depends on | Gates |
+|---|---|---|---|
+| [#338](https://github.com/openova-io/openova/issues/338) | bp-flux helm-controller SA cluster-admin | nothing | every Helm install on omantel |
+| [#372](https://github.com/openova-io/openova/issues/372) | bp-traefik blueprint authoring | nothing | every Sovereign needing L7 ingress |
+
+### Phase 2 — Infrastructure layer (depends on Phase 1)
+
+| Ticket | Title | Depends on |
+|---|---|---|
+| [#373](https://github.com/openova-io/openova/issues/373) | cert-manager-powerdns-webhook | bp-powerdns deployed |
+| [#374](https://github.com/openova-io/openova/issues/374) | NS delegation .omani.works → omantel.omani.works | bp-powerdns deployed on omantel |
+
+### Phase 3 — Data + State layer (depends on Phase 2)
+
+| Ticket | Title | Depends on |
+|---|---|---|
+| [#375](https://github.com/openova-io/openova/issues/375) | bp-nats-jetstream install | #338 |
+| [#376](https://github.com/openova-io/openova/issues/376) | bp-gitea install | bp-cnpg, #338 |
+| [#377](https://github.com/openova-io/openova/issues/377) | bp-keycloak install | bp-cnpg, #338 |
+| [#316](https://github.com/openova-io/openova/issues/316) | bp-openbao auto-unseal | #338 |
+| [#331](https://github.com/openova-io/openova/issues/331) | bp-external-secrets ClusterSecretStore split | bp-openbao (#316) |
+
+### Phase 4 — Registry + IaC + Backup (depends on Phase 3)
+
+| Ticket | Title | Depends on |
+|---|---|---|
+| [#378](https://github.com/openova-io/openova/issues/378) | bp-crossplane install | #338 |
+| [#327](https://github.com/openova-io/openova/issues/327) | bp-crossplane-claims event-driven HR install | #378 |
+| [#383](https://github.com/openova-io/openova/issues/383) | bp-harbor Hetzner Object Storage backend rework | bp-cnpg, bp-valkey, #371 (Hetzner Object Storage credentials) |
+| [#384](https://github.com/openova-io/openova/issues/384) | bp-velero install + Hetzner S3 wiring | #371, #338 |
+
+### Phase 5 — Security + Observability (depends on Phase 3; can run in parallel with Phase 4)
+
+| Ticket | Title | Depends on |
+|---|---|---|
+| [#379](https://github.com/openova-io/openova/issues/379) | bp-kyverno install | #338 |
+| [#380](https://github.com/openova-io/openova/issues/380) | bp-trivy install | #338 |
+| [#381](https://github.com/openova-io/openova/issues/381) | bp-grafana stack install | #338 |
+| [#382](https://github.com/openova-io/openova/issues/382) | bp-spire install | #338, bp-cert-manager |
+
+### Phase 6 — Catalyst control plane (depends on Phases 2 + 4 + 5)
+
+| Ticket | Title | Depends on |
+|---|---|---|
+| [#385](https://github.com/openova-io/openova/issues/385) | bp-catalyst-platform single-blueprint verification | #338, bp-cnpg, bp-cert-manager + #373, bp-sealed-secrets, #372, bp-powerdns + #374 |
+
+### Phase 7 — Handover machinery (sequential)
+
+| Ticket | Title | Depends on |
+|---|---|---|
+| [#317](https://github.com/openova-io/openova/issues/317) | Handover finalisation — minimum-retention model (zero state retained on contabo for handed-over Sovereigns) | #385 |
+| [#319](https://github.com/openova-io/openova/issues/319) | Self-decommission + redirect (`console.openova.io/sovereign/` → omantel.omani.works) | #317, #374 |
+
+### Phase 8 — End-to-end omantel run + DoD verification
+
+Not a code ticket; an execution gate. Pre-conditions:
+1. Hetzner is clean (#370 done).
+2. All blueprints in §2 install cleanly on contabo as a dry-run (proven by Phases 1–6 closing).
+3. Handover machinery is in place (proven by Phase 7 closing).
+
+DoD execution checklist:
+- [ ] Run wizard end-to-end against fresh Hetzner with the 24-blueprint minimal set.
+- [ ] Validate each step's job time matches helmwatch estimate ±20%.
+- [ ] No error chains; if anything fails, the failed-deployment wipe ([#318](https://github.com/openova-io/openova/issues/318)) cleanup is exercised and the run is repeated.
+- [ ] Trigger handover. omantel takes over its own `omantel.omani.works`.
+- [ ] Kill catalyst-api on contabo for 5 minutes — omantel keeps running, customer requests still served.
+- [ ] `console.openova.io/sovereign/` 301-redirects to `omantel.omani.works/sovereign/`.
+- [ ] `dig +trace omantel.omani.works` ends at omantel's PowerDNS, not contabo's.
+- [ ] cert-manager on omantel renews its TLS cert via local PowerDNS DNS-01 with no Dynadot reachback.
+- [ ] Operator opens `omantel.omani.works/sovereign//cloud/architecture` — sees the Sovereign's own Architecture graph, sourced from omantel's catalyst-api informer (per ADR-0001 §5).
+- [ ] Operator adds a NodePool via the Cloud surface — Crossplane on omantel reconciles to Hetzner.
+- [ ] All Velero backups go to omantel's Hetzner Object Storage bucket.
+- [ ] All Harbor pushes go to omantel's Hetzner Object Storage bucket.
+- [ ] Legacy SME demos (`console.openova.io/nova`, `marketplace.openova.io`, `admin.openova.io`) keep responding 200 throughout — ADR-0001 §9.4 honoured.
+
+## 6. Realistic timeline
+
+| Phase | Duration | Parallelizable? |
+|---|---|---|
+| 0 | ~1 day | yes (#370 + #371) |
+| 1 | ~1-2 days | yes (#338 + #372) |
+| 2 | ~1-2 days | partially (#373 → #374) |
+| 3 | ~3-4 days | yes (5 install tickets, parallelizable on different agents) |
+| 4 | ~3-4 days | yes (4 install tickets), but Harbor + Velero gate on #371 |
+| 5 | ~2-3 days | yes (4 install tickets, all parallel) |
+| 6 | ~1-2 days | sequential gate — depends on Phases 2/4/5 done |
+| 7 | ~3-5 days | sequential (#317 → #319), each non-trivial new code |
+| 8 | ~2-3 days | sequential gate; bug-fix loop expected |
+| **Total** | **~3 weeks** with parallel agents at peak (3-6 in flight); ~5-6 weeks if executed strictly serially |
+
+## 7. Out of scope (explicitly post-MVP)
+
+These are real future work but **not in the minimal omantel handover**:
+
+- **#320 IAM family** (#322, #323, #324, #325, #326): Bastion + pod console + UserAccess editor. In the minimal set the Sovereign owner uses a static admin kubeconfig; the IAM family adds Day-2 enrichment.
+- **#37**: Catalyst docs overhaul.
+- **#264, #265**: bp-knative, bp-kserve — W2.K4 batch.
+- **#109** (private): Cart-during-initial silent loss — SME-side legacy bug.
+- **#335**: CI rot fix — convenient but doesn't gate omantel.
+- **#257**: Per-Sovereign cluster-directory cleanup — convenient.
+- **#127** (private) + PR #128: Credential rotation — important but parallel.
+- **bp-falco**, **bp-coraza**, **bp-debezium**, etc. — every blueprint NOT in the §2 list of 24.
+
+## 8. Out-of-scope architecture amendments worth filing
+
+If the founder wants to amend ADR-0001 with §13 formalised (S3 vs SeaweedFS rule), file it as a new ADR (`0002-…`) referencing this WBS.
+
+## 9. Status field — fill as work progresses
+
+| Ticket | Status | PR(s) | Deployed-SHA evidence |
+|---|---|---|---|
+| #338 | (pending) | | |
+| #316 | (pending) | | |
+| #317 | (pending) | | |
+| #319 | (pending) | | |
+| #327 | (in flight, other session) | | |
+| #331 | (pending) | | |
+| #370 | (parked) | | |
+| #371 | (parked) | | |
+| #372 | (parked) | | |
+| #373 | (parked) | | |
+| #374 | (parked) | | |
+| #375 | (parked) | | |
+| #376 | (parked) | | |
+| #377 | (parked) | | |
+| #378 | (parked) | | |
+| #379 | (parked) | | |
+| #380 | (parked) | | |
+| #381 | (parked) | | |
+| #382 | (parked) | | |
+| #383 | (parked) | | |
+| #384 | (parked) | | |
+| #385 | (parked) | | |
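
Reviewer note on the per-ticket dependencies in §5: a minimal sketch, assuming Python 3.9+ (for the standard-library `graphlib`), of how the ticket-to-ticket edges from the "Depends on" columns could be grouped into waves that can run in parallel. The `DEPENDS_ON` mapping and the `waves` helper are illustrative names and not part of the repository. Blueprint prerequisites (bp-cnpg, bp-powerdns, and so on) and the phase-level gates from §4 are deliberately left out, so this is only a lower bound on ordering.

```python
from graphlib import TopologicalSorter

# Ticket -> tickets it waits on, transcribed from the "Depends on" columns in §5.
# Assumption: blueprint prerequisites (bp-cnpg, bp-powerdns, ...) are already
# installed, so only ticket-to-ticket edges are modelled here.
DEPENDS_ON = {
    "#370": [], "#371": [], "#338": [], "#372": [], "#373": [], "#374": [],
    "#375": ["#338"], "#376": ["#338"], "#377": ["#338"],
    "#316": ["#338"], "#331": ["#316"],
    "#378": ["#338"], "#327": ["#378"],
    "#383": ["#371"], "#384": ["#371", "#338"],
    "#379": ["#338"], "#380": ["#338"], "#381": ["#338"], "#382": ["#338"],
    "#385": ["#338", "#373", "#372", "#374"],
    "#317": ["#385"], "#319": ["#317", "#374"],
}


def waves(graph):
    """Group tickets into waves: each ticket's dependencies sit in earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        result.append(ready)
        sorter.done(*ready)  # mark the whole wave finished to unblock the next
    return result


if __name__ == "__main__":
    for number, wave in enumerate(waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Ticket-level edges alone would let #385 start as soon as #338, #372, #373, and #374 close; the phase gate in §5 (Phase 6 depends on Phases 2 + 4 + 5) is stricter and still applies. The sketch is mainly useful as a cycle check and as a way to see which tickets are unblocked at any point.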