openova/docs
e3mrah ff0ff84b37
fix(cnpg-pair, cilium): qa-loop iter-6 Phase-2 multi-region closeout (#1101) (#1223)
Two bugs blocked the Phase-2 multi-region pair from converging on
omantel-fsn ↔ omantel-hel; both are addressed here:

bp-cilium overlay (omantel-fsn)
- Promote the kubectl-patched ClusterMesh values into the
  per-Sovereign overlay at clusters/omantel.omani.works/bootstrap-kit/
  01-cilium.yaml so resuming Flux on bootstrap-kit Kustomization keeps
  the live mesh state. This is the chart-side fix mandated by
  feedback_no_mvp_no_workarounds.md (operational kubectl patch is the
  hack; overlay commit is the fix).
- Bump chart version 1.1.1 → 1.2.0 (already the live version after
  manual reconcile; matches platform/cilium/chart/Chart.yaml).
- Add docs/CLUSTERMESH-CLUSTER-IDS.md as the registry for
  cluster.id allocation (1 = omantel-fsn, 2 = omantel-hel, 3..255
  reserved). Adds a duplicate-id check the next PR adding a peer
  must run.
- Document the convention in platform/cilium/README.md.

bp-cnpg-pair chart 0.1.0 → 0.1.1
Three chart bugs found during Phase-2 deploy on the live mesh
(qa-loop-state/incidents.md "bp-cnpg-pair chart bugs surfaced ..."):

  1. hot_standby is a fixed parameter in PG16 — CNPG rejects
     explicit set with phase "Unable to create required cluster
     objects". Removed from primary + replica postgresql.parameters.
  2. Replica Cluster CR was missing bootstrap.pg_basebackup —
     replica.enabled: true alone leaves phase stuck at
     "Setting up primary". Added pg_basebackup referencing the
     primary externalCluster + sslKey/sslCert/sslRootCert pinning
     the streaming_replica TLS material.
  3. Hand-rendered service-replication.yaml created
     <name>-primary-r which COLLIDED with CNPG's auto-created
     <name>-r Service (operator log: "refusing to reconcile
     service ..., not owned by the cluster"). Removed the standalone
     template; the global Service is now declared via the primary
     Cluster's spec.managed.services.additional[] (CNPG ≥ 1.22) and
     renamed <name>-primary-mesh to avoid the collision permanently.

- Add helm test (templates/tests/test-replication.yaml) asserting:
  * primary Cluster CR reaches Ready=True
  * CNPG-managed -mesh Service exists
  * service.cilium.io/global=true annotation propagated
  * pg_isready against -rw endpoint succeeds
- Update render-gate test: expected count 8 → 7 (Service removed),
  added fail-closed checks for hot_standby absence,
  bootstrap.pg_basebackup presence, and -mesh externalCluster host.
- Update README + values.yaml comments + DESIGN-style header in
  replica-cluster.yaml to reflect the new shape.

Phase-2 state captured in
.claude/qa-loop-state/phase-2-multi-region-state.md
.claude/qa-loop-state/incidents.md (incident #3 — bp-cnpg-pair
chart bugs surfaced).

Refs: #1101 (EPIC-6), qa-loop iter-6 fix-33

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:36:17 +04:00
..
adr docs(adr-0001): ratify Accepted with §2.3 K8s-Composition amendment (#1095 slice A1) (#1103) 2026-05-08 21:50:59 +04:00
lessons-learned fix(bp-flux): catalyst-cluster-reconciler ClusterRoleBinding overlay (closes #338) (#393) 2026-05-01 15:56:45 +04:00
proposals feat(wizard): job dependencies SVG DAG + (stretch) timeline view (closes #206) (#212) 2026-04-29 21:40:43 +02:00
ARCHITECTURE.md docs: ADR-0002 + ARCHITECTURE §11.1 + Inviolable #11 — post-handover sovereignty cutover (#794) (#797) 2026-05-04 21:23:29 +04:00
AUDIT-PROCEDURE.md docs(component-count): update 53 → 56 anchors after Pass 105 (spire + nats-jetstream + sealed-secrets) 2026-04-28 13:48:24 +02:00
BLUEPRINT-AUTHORING.md fix(bp-*): observability toggles default false — break circular CRD dependency 2026-04-29 19:23:52 +02:00
BOOTSTRAP-KIT-EXPANSION-PLAN.md docs(bootstrap-kit): expansion plan to 40+ HRs (Wave 2 dispatch reference) (#255) 2026-04-30 17:08:16 +04:00
BUSINESS-STRATEGY.md refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
CHART-AUTHORING.md fix(catalyst-chart): annotate api-deployment for Flux strategy-flip recovery 2026-04-29 18:04:07 +02:00
CLUSTERMESH-CLUSTER-IDS.md fix(cnpg-pair, cilium): qa-loop iter-6 Phase-2 multi-region closeout (#1101) (#1223) 2026-05-09 23:36:17 +04:00
COMPONENT-LOGOS.md docs(reconcile-pass-2): align docs with ground truth at 6afdb303 2026-04-29 11:48:57 +02:00
DEMO-RUNBOOK.md docs(reconcile-pass-2): align docs with ground truth at 6afdb303 2026-04-29 11:48:57 +02:00
EPICS-1-6-unified-design.md docs: flip 8 CRDs to 🚧 + amend ProvisioningState decision (slices A2+A3, #1095) (#1113) 2026-05-08 22:27:04 +04:00
FRANCHISE-MODEL.md docs(franchise),test(billing): voucher CRD propagation invariant 2026-04-28 13:59:31 +02:00
GLOSSARY.md docs(reconcile-pass-1): align docs with ground truth at dd578d1c 2026-04-29 09:40:10 +02:00
IMPLEMENTATION-STATUS.md docs: flip 8 CRDs to 🚧 + amend ProvisioningState decision (slices A2+A3, #1095) (#1113) 2026-05-08 22:27:04 +04:00
INVIOLABLE-PRINCIPLES.md docs: ADR-0002 + ARCHITECTURE §11.1 + Inviolable #11 — post-handover sovereignty cutover (#794) (#797) 2026-05-04 21:23:29 +04:00
MULTI-REGION-DNS.md docs(reconcile-pass-1): align docs with ground truth at dd578d1c 2026-04-29 09:40:10 +02:00
NAMING-CONVENTION.md refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
omantel-handover-wbs.md docs(wbs): Mermaid reflects ALL Phase-8a 2026-05-02 chart bug bash (#577) 2026-05-02 13:06:04 +04:00
ORCHESTRATOR-STATE.md docs(reconcile-pass-2): align docs with ground truth at 6afdb303 2026-04-29 11:48:57 +02:00
PERSONAS-AND-JOURNEYS.md docs(unified-repo-model): collapse SME and corporate to one shape — Application = Gitea Repo 2026-04-28 10:13:02 +02:00
PLATFORM-POWERDNS.md docs(reconcile-pass-1): align docs with ground truth at dd578d1c 2026-04-29 09:40:10 +02:00
PLATFORM-TECH-STACK.md docs(reconcile-pass-1): align docs with ground truth at dd578d1c 2026-04-29 09:40:10 +02:00
PRODUCT-FAMILIES.md docs(reconcile-pass-2): align docs with ground truth at 6afdb303 2026-04-29 11:48:57 +02:00
PROVISIONING-PLAN.md docs(reconcile-pass-2): align docs with ground truth at 6afdb303 2026-04-29 11:48:57 +02:00
RUNBOOK-OPERATIONS.md docs(ops): comprehensive operator runbook + remediation playbook + idempotent recovery script 2026-04-29 19:26:29 +02:00
RUNBOOK-PROVISIONING.md merge: keep k3s local-path-provisioner; mark StorageClass default before Flux runs (closes #189) 2026-04-29 19:43:59 +02:00
SECRET-ROTATION.md fix(cloudinit): create flux-system/ghcr-pull secret on Sovereign so private bp-* charts pull cleanly 2026-04-29 18:07:27 +02:00
SECURITY.md refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
SOVEREIGN-PROVISIONING.md docs(reconcile-pass-2): align docs with ground truth at 6afdb303 2026-04-29 11:48:57 +02:00
SRE.md refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
TECHNOLOGY-FORECAST-2027-2030.md refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171) 2026-04-29 08:51:09 +02:00
UI-REGRESSION-GUARDS.md fix(platform): sync blueprint.yaml versions with Chart.yaml (#199) 2026-04-29 22:07:55 +04:00
VALIDATION-LOG.md docs(reconcile-pass-2): align docs with ground truth at 6afdb303 2026-04-29 11:48:57 +02:00