Two problems blocked the Phase-2 multi-region pair from converging on
omantel-fsn ↔ omantel-hel (overlay drift in bp-cilium and chart bugs in
bp-cnpg-pair); both are addressed here:

bp-cilium overlay (omantel-fsn)
- Promote the kubectl-patched ClusterMesh values into the
per-Sovereign overlay at clusters/omantel.omani.works/bootstrap-kit/
01-cilium.yaml so that resuming Flux on the bootstrap-kit
Kustomization keeps the live mesh state. This is the chart-side fix
mandated by
feedback_no_mvp_no_workarounds.md (operational kubectl patch is the
hack; overlay commit is the fix).
- Bump chart version 1.1.1 → 1.2.0 (already the live version after
manual reconcile; matches platform/cilium/chart/Chart.yaml).
- Add docs/CLUSTERMESH-CLUSTER-IDS.md as the registry for
cluster.id allocation (1 = omantel-fsn, 2 = omantel-hel, 3..255
reserved). Also add a duplicate-id check that the next PR adding a
peer must run.
- Document the convention in platform/cilium/README.md.
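
For orientation, a minimal sketch of the ClusterMesh-related values the
overlay now pins. The keys are the upstream Cilium Helm values; how they
nest inside the bp-cilium overlay is an assumption and is not shown.

```yaml
# Illustrative only: upstream Cilium chart keys. The nesting inside
# 01-cilium.yaml (e.g. under a wrapper key) is an assumption.
cluster:
  name: omantel-fsn    # must be unique across the mesh
  id: 1                # allocated in docs/CLUSTERMESH-CLUSTER-IDS.md
clustermesh:
  useAPIServer: true   # run the clustermesh-apiserver on this cluster
```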

bp-cnpg-pair chart 0.1.0 → 0.1.1
Three chart bugs found during Phase-2 deploy on the live mesh
(qa-loop-state/incidents.md "bp-cnpg-pair chart bugs surfaced ..."):
1. hot_standby is a fixed parameter in PG16, so CNPG rejects an
explicit setting and the cluster stalls in phase "Unable to create
required cluster objects". Removed it from the primary + replica
postgresql.parameters.
2. Replica Cluster CR was missing bootstrap.pg_basebackup;
replica.enabled: true alone leaves the phase stuck at
"Setting up primary". Added pg_basebackup referencing the
primary externalCluster + sslKey/sslCert/sslRootCert pinning
the streaming_replica TLS material.
3. Hand-rendered service-replication.yaml created
<name>-primary-r, which collided with CNPG's auto-created
<name>-r Service (operator log: "refusing to reconcile
service ..., not owned by the cluster"). Removed the standalone
template; the global Service is now declared via the primary
Cluster's spec.managed.services.additional[] (CNPG ≥ 1.22) and
renamed to <name>-primary-mesh to avoid the collision permanently.
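
The resulting shape can be sketched roughly as below. This is not the
chart's rendered output: the names example-primary / example-replica,
the storage sizes, and the Secret names are hypothetical, and only the
fields touched by the three fixes are shown.

```yaml
# Primary: no hot_standby under postgresql.parameters; the mesh Service
# is declared through CNPG instead of a standalone template.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-primary              # hypothetical name
spec:
  instances: 1
  storage:
    size: 1Gi                        # hypothetical size
  managed:
    services:
      additional:
        - selectorType: rw
          serviceTemplate:
            metadata:
              name: example-primary-mesh
              annotations:
                service.cilium.io/global: "true"
---
# Replica: bootstraps via pg_basebackup from the primary externalCluster.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-replica              # hypothetical name
spec:
  instances: 1
  storage:
    size: 1Gi
  bootstrap:
    pg_basebackup:
      source: example-primary
  replica:
    enabled: true
    source: example-primary
  externalClusters:
    - name: example-primary
      connectionParameters:
        host: example-primary-mesh   # the -mesh Service from the primary
        user: streaming_replica
        dbname: postgres
        sslmode: verify-full
      sslKey:
        name: example-primary-replication   # hypothetical Secret name
        key: tls.key
      sslCert:
        name: example-primary-replication
        key: tls.crt
      sslRootCert:
        name: example-primary-ca            # hypothetical Secret name
        key: ca.crt
```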
- Add helm test (templates/tests/test-replication.yaml) asserting:
* primary Cluster CR reaches Ready=True
* CNPG-managed -mesh Service exists
* service.cilium.io/global=true annotation propagated
* pg_isready against -rw endpoint succeeds
- Update render-gate test: expected count 8 → 7 (Service removed);
add fail-closed checks for hot_standby absence,
bootstrap.pg_basebackup presence, and -mesh externalCluster host.
- Update README + values.yaml comments + DESIGN-style header in
replica-cluster.yaml to reflect the new shape.
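
As a rough illustration of the helm test hook mechanism, the sketch
below covers only the pg_isready assertion. The pod name, image, and
Service host are hypothetical; the real
templates/tests/test-replication.yaml also checks the Cluster
condition, the -mesh Service, and its annotation.

```yaml
# Sketch of a Helm test hook pod; runs on `helm test <release>`.
apiVersion: v1
kind: Pod
metadata:
  name: example-test-replication     # hypothetical name
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: pg-isready
      image: postgres:16             # hypothetical image
      # Exits non-zero if the -rw endpoint is not accepting connections.
      command: ["pg_isready", "-h", "example-primary-rw", "-p", "5432"]
```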

Phase-2 state is captured in:
.claude/qa-loop-state/phase-2-multi-region-state.md
.claude/qa-loop-state/incidents.md (incident #3: bp-cnpg-pair
chart bugs surfaced).

Refs: #1101 (EPIC-6), qa-loop iter-6 fix-33
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>