The recurring drift: Catalyst control-plane DNS placeholders that omit the
<location-code> segment, producing forms like gitea.<sovereign>,
gitea.<sovereign>.<domain>, gitea.<sovereign-domain>, keycloak.<domain>.
Per NAMING §5.1 the canonical form is
{component}.{location-code}.{sovereign-domain} (e.g. gitea.hfmp.openova.io).
The shorter forms aren't just abbreviations: they collapse the multi-region
location dimension, and the drift recurs whenever a reader takes them for
obvious shorthand.
Fixes:
- CLAUDE.md "Customer Sync" — both gitea.<sovereign>/catalog/... lines.
- docs/SOVEREIGN-PROVISIONING.md §3 DNS-records bullet (3 lines) + §5
Day-1 login line.
- docs/ARCHITECTURE.md §4 write-path Gitea label.
- docs/BLUEPRINT-AUTHORING.md §6.4 private-Blueprint Studio target.
- platform/librechat/README.md Keycloak issuer (Pass 22 marked this file
clean but missed it; banner scans do not catch YAML-block drift).
platform/nemo-guardrails/README.md verified clean.
Final grep confirms only canonical forms remain. Validation log Pass 29
entry added with the recurring-drift-pattern note for future passes.
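A check like the final grep can be sketched in a few lines. This is a minimal sketch, not the command used in this pass: the component list, the regexes and the function name are illustrative assumptions, and it only looks at angle-bracket placeholders written the way they appear above.

```python
import re

# Hypothetical component list for illustration; the real set lives in the canonical docs.
CONTROL_PLANE = ("gitea", "keycloak")
_HOSTS = "|".join(CONTROL_PLANE)

# Canonical placeholder form per NAMING §5.1.
CANONICAL = re.compile(r"\b(?:%s)\.<location-code>\.<sovereign-domain>" % _HOSTS)
# Any control-plane host followed by one or more angle-bracket placeholders.
CANDIDATE = re.compile(r"\b(?:%s)(?:\.<[a-z-]+>)+" % _HOSTS)


def find_drifted_placeholders(text: str) -> list[str]:
    """Return control-plane placeholder hosts that are not in canonical form."""
    return [
        m.group(0)
        for m in CANDIDATE.finditer(text)
        if not CANONICAL.fullmatch(m.group(0))
    ]
```

Concrete hostnames such as gitea.hfmp.openova.io are deliberately ignored; only placeholder forms are checked.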
platform/llm-gateway/README.md had three malformed DNS placeholders:
- KEYCLOAK_URL collapsed location-code + sovereign-domain into <domain> and
used Application namespace `ai-hub` as a Keycloak realm name. Per NAMING §7
and SECURITY §7, Keycloak realms are per-Org in SME-style or per-Sovereign
in corporate-style — never per-Application-namespace. Fixed to
`keycloak.<location-code>.<sovereign-domain>/realms/<org>`.
- ANTHROPIC_BASE_URL and `claude config set api_base` examples used
`llm-gateway.ai-hub.<domain>/v1` — but NAMING §5.2 establishes
Application endpoints as `{app}.{environment}.{sovereign-domain}`.
Fixed to `llm-gateway.<env>.<sovereign-domain>/v1`.
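The two corrected forms can be spelled out as string builders. A minimal sketch only: the function names are hypothetical, the `https://` scheme is an assumption, and whether the `<env>` segment carries an env_type or a full Environment name is left to NAMING §5.2.

```python
def keycloak_realm_url(location_code: str, sovereign_domain: str, org: str) -> str:
    """Control-plane form: keycloak.<location-code>.<sovereign-domain>/realms/<org>."""
    # The https scheme is an assumption made for this sketch.
    return f"https://keycloak.{location_code}.{sovereign_domain}/realms/{org}"


def application_endpoint(app: str, env: str, sovereign_domain: str, path: str = "") -> str:
    """Application form per NAMING §5.2: {app}.{environment}.{sovereign-domain}."""
    return f"https://{app}.{env}.{sovereign_domain}{path}"
```

The point of keeping two builders is that the control-plane form takes a location code while the Application form takes an environment; neither accepts a bare `<domain>`.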
docs/IMPLEMENTATION-STATUS.md confirmed clean: CRD list, surfaces, and
control-plane component list all match canonical docs.
Sweep concern logged for `harbor.<domain>` / `:latest` image patterns
appearing across many platform READMEs; these will be addressed in a
dedicated sweep pass rather than fixed piecemeal in this one.
Validation log Pass 25 entry added.
Pass 20 — drift-detection on SOVEREIGN-PROVISIONING + platform/kyverno.
Two real findings.
SOVEREIGN-PROVISIONING.md §8:
- "Existing Applications with `placement: active-active: false,
single-region` do not migrate automatically": invalid YAML
mixing a boolean with an enum. The canonical placement model
(per GLOSSARY) has
`placement.mode: single-region | active-active | active-hotstandby`,
with no boolean toggle.
- Rewrote: "Existing Applications with `placement.mode: single-region`
... user explicitly switches Placement to active-active
(or active-hotstandby) and adds the new region to
placement.regions".
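The enum-not-boolean rule is easy to check mechanically. A minimal sketch, assuming the placement block is already parsed into a dict; the function name is hypothetical, and the "at least two regions for multi-region modes" rule is an assumption of this sketch rather than something the GLOSSARY is quoted as saying.

```python
from enum import Enum


class PlacementMode(str, Enum):
    SINGLE_REGION = "single-region"
    ACTIVE_ACTIVE = "active-active"
    ACTIVE_HOTSTANDBY = "active-hotstandby"


def validate_placement(placement: dict) -> list[str]:
    """Return a list of problems found in a parsed placement block."""
    errors = []
    valid = [m.value for m in PlacementMode]
    mode = placement.get("mode")
    if mode not in valid:
        errors.append(f"placement.mode must be one of {valid}, got {mode!r}")
    # The rejected form used a mode name as a key with a boolean value.
    for key in placement:
        if key in valid:
            errors.append(f"{key!r} is a mode value, not a key; use placement.mode")
    # Assumption: multi-region modes list at least two regions.
    regions = placement.get("regions", [])
    if mode in ("active-active", "active-hotstandby") and len(regions) < 2:
        errors.append("multi-region modes need at least two placement.regions")
    return errors
```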
platform/kyverno/README.md:
- Policy V5 (minimum-replicas-production) targeted namespaces
labeled `openova.io/env: production` — out-of-spec label name
AND value. NAMING-CONVENTION §6 establishes `openova.io/env-type:
prod` (hyphen-form, short value).
- Fixed to `openova.io/env-type: prod`.
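A label check along these lines would have flagged the policy. A minimal sketch: the value set is the env_type list from NAMING-CONVENTION §2.4 as quoted elsewhere in this log, the prefix match on `openova.io/env` is a heuristic chosen for illustration, and the function name is hypothetical.

```python
ENV_TYPE_LABEL = "openova.io/env-type"           # per NAMING-CONVENTION §6
ENV_TYPES = {"prod", "stg", "uat", "dev", "poc"}  # per NAMING-CONVENTION §2.4


def env_label_errors(labels: dict[str, str]) -> list[str]:
    """Flag env labels whose key or value is out of spec."""
    errors = []
    for key, value in labels.items():
        # Heuristic: any other key under the openova.io/env prefix is treated as drift.
        if key.startswith("openova.io/env") and key != ENV_TYPE_LABEL:
            errors.append(f"out-of-spec label key {key!r}; expected {ENV_TYPE_LABEL!r}")
        elif key == ENV_TYPE_LABEL and value not in ENV_TYPES:
            errors.append(f"out-of-spec value {value!r} for {ENV_TYPE_LABEL}")
    return errors
```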
Both findings show the same pattern: schema-level details that
survive grep-based banned-term checks but contradict the canonical
spec once the body is actually read.
VALIDATION-LOG: Pass 20 entry added.
Refs #37
Pass 18 — drift-detection on NAMING-CONVENTION + platform/keycloak.
Two real findings.
NAMING-CONVENTION §11.1:
- The example list of Catalyst Environments included `bankdhofar-dr`
— but `dr` is NOT a valid env_type. Canonical values per §2.4 are
prod / stg / uat / dev / poc. DR is a Placement mode
(active-active / active-hotstandby across regions inside the
*-prod Environment), not a separate Environment.
- Replaced `bankdhofar-dr` with `bankdhofar-uat` and added an
explicit "DR is a Placement, not an Env Type" note.
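The env_type rule can be expressed as a small parser. A minimal sketch, assuming Environment names take the form `<org>-<env_type>` (inferred from the examples above, not quoted from §11.1); the function name is hypothetical.

```python
ENV_TYPES = ("prod", "stg", "uat", "dev", "poc")  # per NAMING-CONVENTION §2.4


def split_environment(name: str) -> tuple[str, str]:
    """Split an Environment name into (org, env_type) or raise ValueError."""
    org, sep, env_type = name.rpartition("-")
    if not sep or not org or env_type not in ENV_TYPES:
        raise ValueError(f"{name!r} does not end in a valid env_type {ENV_TYPES}")
    return org, env_type
```

With this, `bankdhofar-uat` parses and `bankdhofar-dr` is rejected, since DR is a Placement mode rather than an env_type.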
platform/keycloak/README.md:
- Keycloak Deployment YAML example used `namespace: open-banking`
with 2 replicas — Fingate-specific narrative that contradicted
the per-Org / per-Sovereign topology stated in the banner.
Rewrote with two side-by-side examples:
* shared-sovereign (3 HA replicas, catalyst-keycloak namespace,
CNPG-backed)
* per-organization (1 replica in <org> namespace, optional
embedded DB for smallest SME tier)
- HA section was a single set of claims (2+ replicas, CNPG, Infinispan)
that only matched the corporate-style topology. It now branches on
topology: corporate gets HA + Infinispan; SME gets a single replica,
with restart-on-deploy accepted for that tier's SLAs.
Same kind of drift Pass 17 caught in Harbor: banner says one thing,
body still describes the older model. Both fixed.
VALIDATION-LOG: Pass 18 entry added.
Refs #37
Pass 17 — drift-detection sweep on ARCHITECTURE + harbor. Two real
findings.
ARCHITECTURE §13 (OAM table):
- `| Trait | Blueprint overlay (`overlays/small|medium|large`) |`
has pipe chars inside backticks inside a Markdown table cell —
a known GFM rendering hazard. Replaced with comma-separated
examples.
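This hazard is detectable with a line-level scan. A minimal sketch, assuming table rows are written with a leading pipe and code spans use single backticks; the function name is hypothetical.

```python
import re

CODE_SPAN = re.compile(r"`[^`]*`")
UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def pipes_in_code_spans(line: str) -> list[str]:
    """Return code spans in a Markdown table row that contain an unescaped '|'."""
    if not line.lstrip().startswith("|"):
        return []  # only rows written with a leading pipe are considered
    return [
        m.group(0)
        for m in CODE_SPAN.finditer(line)
        if UNESCAPED_PIPE.search(m.group(0))
    ]
```

In GFM tables the cell splitter runs before inline parsing, so a pipe inside backticks still ends the cell unless it is escaped.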
platform/harbor/README.md:
- The banner added in Pass 9 said "every host cluster runs a
Harbor instance" but the body still described an older
"Harbor Primary / Harbor Replica" cross-region replication
topology. Same shape of architectural drift Pass 7 caught in
OpenBao/ESO/Gitea/Flux — banner-add doesn't rewrite the body.
- Three sections rewritten:
* Overview mermaid: now shows upstream-OCI → multiple
independent per-cluster Harbors with local Trivy scan + local
Pod pulls.
* "Multi-Region Replication" → "Per-host-cluster mirroring (NOT
primary-replica)". Single source of truth = upstream OCI
(ghcr.io/openova-io/* for Catalyst+Blueprints, customer CI for
application images), not a "primary Harbor".
* Example replication policy: was a `dest_registry` cross-region
push policy → now a pull-mirror policy from ghcr.io with
scheduled-cron trigger.
- "Why Mandatory" table reframed in per-host-cluster terms.
VALIDATION-LOG: Pass 17 entry added with the specific drift-detection
lesson — banner-addition passes don't catch body-level drift; need
explicit body re-reads.
Refs #37
Pass 15 swept all 52 platform/*/README.md files for the role-in-
Catalyst banner. 3 still lacked one (cnpg, flux, strimzi) and got
banners added:
- cnpg (§4.1): production Postgres; underlying engine for FerretDB +
Gitea metadata.
- flux (§3.2): per-vcluster Flux + host-level Flux for Catalyst
itself; pulls from single per-Sovereign Gitea.
- strimzi (§4.1): Application-tier event streaming; NOT the Catalyst
control-plane spine (which uses NATS JetStream). Same upstream-
tech-different-tier disambiguation pattern as Valkey.
CONVERGENCE: 52 / 52 platform components have role-in-Catalyst
banners. All cross-refs resolve. No banned terms. No architectural
drift detected on this pass.
VALIDATION-LOG: Pass 15 entry + "Convergence achieved (initial
banner sweep)" marker added. The validation loop continues per
the standing instruction — but subsequent passes will be brief
drift-detection sweeps rather than systematic rewrites.
Refs #37
Seven more Application Blueprint banners landed:
- temporal (§4.3): durable workflow orchestration; bp-fabric.
- flink (§4.3): stream + batch processing; bp-fabric.
- debezium (§4.2): CDC into Strimzi/Kafka; bp-fabric pipeline source.
- iceberg (§4.4): open table format on MinIO + archival S3.
- openmeter (§4.8): API metering for bp-fingate.
- litmus (§4.9): chaos engineering required by DORA / NIS2.
- valkey (§4.1): banner explicitly states NOT a Catalyst control-
plane component — control plane uses NATS JetStream KV per
ARCHITECTURE §5 / GLOSSARY event-spine. Valkey is Application-tier
caching only. This is the disambiguation that PLATFORM-TECH-STACK
§1 establishes ("same upstream technology can serve in multiple
categories") — pinned in the per-component README so it can't be
misread.
VALIDATION-LOG: Pass 14 entry added.
Refs #37
All 4 communication components (composing under bp-relay) got role-
in-Catalyst banners pointing at PLATFORM-TECH-STACK §4.5:
- stalwart: JMAP/IMAP/SMTP self-hosted email.
- livekit: WebRTC SFU for video/audio/data; pairs with STUNner.
- stunner: K8s-native TURN/STUN for WebRTC NAT traversal.
- matrix: Matrix protocol via Synapse server. Banner explicitly
disambiguates "Synapse" as the chat-server implementation, NOT
the deprecated OpenOva product noun (retired in favor of bp-axon).
All 4 are explicitly Application Blueprints, NOT Catalyst control
plane.
VALIDATION-LOG: Pass 13 entry added.
Refs #37
7 more component READMEs got role-in-Catalyst banners:
- vpa, keda, reloader → per-host-cluster scaling/ops layer (§3.4).
Reloader specifically calls out its role in Catalyst's secret-
rotation flow (rolling deploy on K8s Secret hash change).
- external-dns → per-host-cluster DNS-sync (§3.1); pairs with k8gb
for the GSLB zone separation.
- coraza → DMZ-block WAF on every host cluster (§3.1).
- crossplane → per-Sovereign on the management cluster (§3.2);
banner explicitly emphasizes the agreed "never a user-facing
surface" rule (Users don't write Compositions in Application
configs; Blueprint authors and advanced contributors do). Cross-
references the no-fourth-surface clause in ARCHITECTURE §4/§7
and the Crossplane Composition section in BLUEPRINT-AUTHORING §8.
- opentofu → repositioned as Phase-0-only, runs on `catalyst-
provisioner` only, NOT installed on host clusters at runtime.
opentofu drift fixes (uncovered by line-by-line read):
- Section 5 line 182: "Bootstrap Wizard prompts for cloud credentials"
→ "Catalyst Bootstrap (Phase 0) prompts for cloud credentials"
(banned term).
- Same section line 186: "ESO PushSecrets sync to both regional
OpenBao instances" — the active-active drift Pass 7 corrected
elsewhere, still here. Replaced with "writes go to the primary
OpenBao region only; replicas pick up via async perf replication".
VALIDATION-LOG: Pass 10 entry added.
Refs #37
Pass 9 — six more component READMEs got Catalyst-role banners
matching the rule of thumb in CLAUDE.md (every platform/<x>/README.md
should state its role in Catalyst).
- grafana: observability stack on every host cluster; Catalyst's
own self-monitoring + Application telemetry flows here.
- harbor: per-host-cluster container registry for Catalyst images,
mirrored Blueprint OCI artifacts, customer images.
- falco: runtime security on every host cluster; feeds SIEM/SOAR.
- kyverno: policy engine on every host cluster; enforces Catalyst
policy contracts (cosign on Blueprints, default-deny NetworkPolicies
on Organization namespaces, priority-class injection).
- sigstore: cosign-signed Blueprint OCI artifacts + admission
verification chain on every host cluster.
- syft-grype: SBOM generation in CI per Blueprint + runtime CVE scans.
Plus Kyverno priority-class clarification: prose around the `tenant-high`
/ `tenant-default` / `tenant-batch` priority class names now reads
"Organization workloads" instead of "tenant workloads". An explicit
note states that the PriorityClass object names themselves stay as-is
until a separate migration ticket renames them in deployed clusters
(renaming PriorityClass objects requires recreate, not in-place
rename).
VALIDATION-LOG: Pass 9 entry added.
Refs #37
Pass 8 — line-by-line read of platform/cnpg, platform/strimzi,
platform/k8gb, platform/keycloak, platform/cert-manager, platform/cilium.
CNPG and Strimzi: read in full and confirmed clean — they correctly
position themselves as Application Blueprints and don't drift from
the canonical model. CNPG's `<org>-postgres-dr` cluster name
(Application-tier database role) is acceptable per NAMING-CONVENTION
§1.3 (which only forbids primary/dr in K8s host-cluster names, not
in Application-internal CRD names).
Four READMEs updated:
k8gb:
- Header reframed: per-host-cluster infrastructure pointer to
PLATFORM-TECH-STACK §3.1 and SRE §2.4 split-brain protection.
- Removed dead link to ../failover-controller/docs/ADR-FAILOVER-
CONTROLLER.md (the failover-controller folder has no docs/);
replaced with link to that component's README + SRE §2.4.
keycloak:
- Header reframed from "FAPI Authorization Server for Open Banking"
(narrow) to "User identity for Catalyst Sovereigns" (broad).
Keycloak handles ALL user identity in Catalyst, not just FAPI.
- Added per-Org / per-Sovereign topology callout matching SECURITY
§6. Clarified that "Multi-tenant TPP" refers to PSD2 Third Party
Providers, not Catalyst's Organization-level multi-tenancy.
- FAPI features kept since Keycloak still serves Fingate as the
FAPI Authorization Server.
cert-manager:
- Header reframed as per-host-cluster infrastructure with pointer
to PLATFORM-TECH-STACK §3.3.
cilium:
- Header reframed as per-host-cluster infrastructure with pointer
to PLATFORM-TECH-STACK §3.1, including the install-first note
(CNI must come before any other workload during Phase 0).
VALIDATION-LOG: Pass 8 entry added.
Refs #37
Continuing Pass 7 cleanup after the OpenBao/ESO rewrite (42aeb62).
Gitea README:
- Was describing "Bidirectional mirroring for multi-region" with two
Gitea instances mirroring repos cross-region. Wrong: Catalyst's
agreed model has one Gitea per Sovereign on the management cluster
(PLATFORM-TECH-STACK §2.3). Replaced the multi-region mirror
diagram with a single-Gitea + intra-cluster HA topology and added
a "Why not cross-region bidirectional mirror" explainer (write-
conflict semantics would break EnvironmentPolicy enforcement).
- Status banner: notes the canonical references.
- Backup section: removed "Repository mirror for redundancy"
(replaced with Velero scheduled backups).
Flux README:
- "Multi-Region GitOps" section was showing one Gitea per region
with bidirectional mirror. Replaced with one Gitea per Sovereign
topology. Per-vcluster Flux pulls from this single Gitea.
Mermaid syntax bug:
- Earlier mass replace_all of "Catalyst IDP" → "Catalyst console"
had left an invalid mermaid node identifier
`Catalyst console[Catalyst console]` (mermaid forbids spaces in
node IDs). Fixed to `Console[Catalyst console]`. Would have
rendered as a broken diagram on GitHub.
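A post-replace check for this class of bug could look like the following. A minimal sketch: it is a heuristic regex, not a mermaid parser, it only inspects lines that begin with a bracket-labelled node declaration, and the function name is hypothetical.

```python
import re

# Matches "<id>[<label>]" at the start of a line where <id> contains whitespace.
# Lines starting with "subgraph" are skipped, since "subgraph id [title]" is valid.
BAD_NODE = re.compile(
    r"^\s*(?!subgraph\b)([A-Za-z_][\w-]*(?:[ \t]+[\w-]+)+)\[[^\]]*\]"
)


def bad_node_ids(mermaid_src: str) -> list[str]:
    """Return node IDs that contain spaces."""
    return [
        m.group(1)
        for line in mermaid_src.splitlines()
        if (m := BAD_NODE.match(line))
    ]
```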
VALIDATION-LOG: Pass 7 entry added documenting the OpenBao/ESO
active-active rewrite (the most consequential drift fix in any pass).
Refs #37
Pass 7 — line-by-line read of platform/openbao/README.md and
platform/external-secrets/README.md found a major architectural drift:
both files described an OLD active-active bidirectional sync model
that contradicts docs/SECURITY.md §5 (the canonical reference).
The active-active design was rejected during the architecture session
because it would have been a stretched cluster — a single region's
network blip would block writes everywhere. The agreed model is:
- Independent Raft cluster per region (intra-region quorum only).
- Single-primary writes; replicas accept reads only.
- Async Performance Replication primary → replicas (lag <1s typical).
- Explicit DR promotion (sovereign-admin or failover-controller).
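The agreed model can be illustrated with a toy in-memory simulation. This is a sketch of the four rules above only: none of the class or method names are OpenBao or ESO APIs, and `replicate()` stands in for asynchronous replication by copying state on demand.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    primary: bool = False
    secrets: dict = field(default_factory=dict)


class Topology:
    """Toy model only; not an OpenBao or ESO interface."""

    def __init__(self, regions):
        self.regions = {r.name: r for r in regions}
        if sum(r.primary for r in regions) != 1:
            raise ValueError("exactly one primary region is required")

    def write(self, region, key, value):
        # Single-primary writes: replicas accept reads only.
        if not self.regions[region].primary:
            raise PermissionError(f"{region} is a replica; write to the primary")
        self.regions[region].secrets[key] = value

    def read(self, region, key):
        # Reads are always served from the local region.
        return self.regions[region].secrets.get(key)

    def replicate(self):
        # Stand-in for async replication: primary to replicas, one direction.
        source = next(r for r in self.regions.values() if r.primary)
        for r in self.regions.values():
            if not r.primary:
                r.secrets.update(source.secrets)

    def promote(self, region):
        # Explicit DR promotion: nothing fails over automatically in this model.
        if region not in self.regions:
            raise KeyError(region)
        for r in self.regions.values():
            r.primary = r.name == region
```

A replica that has not yet received a replicated write simply returns a stale or missing value; it never blocks the primary, which is the property the stretched-cluster design lacked.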
Fixes:
platform/openbao/README.md:
- Overview: removed "active-active deployments" / "either region can
update secrets". Replaced with "independent Raft cluster per region",
"asynchronous Performance Replication".
- Architecture diagram: replaced bidirectional-push diagram with the
primary→replicas async perf replication topology that matches
SECURITY.md §5.
- ClusterSecretStores: simplified from "two stores (local+remote)" to
"one local store"; reads always pull locally.
- Renamed "PushSecret (Bidirectional)" → "Writes go to the primary
region" with a single-target PushSecret pointing at bao-primary.
- Added DR promotion section pointing at SECURITY.md §5.2.
- Status banner: notes that the canonical multi-region reference is
SECURITY.md.
platform/external-secrets/README.md:
- Header line: repositioned as per-host-cluster infrastructure with
pointer to PLATFORM-TECH-STACK §3.3.
- Removed broken link to non-existent ../openbao/docs/ADR-OPENBAO.md
(replaced with link to ../openbao/README.md).
- "Multi-region sync | Push to both OpenBao instances simultaneously"
→ "Multi-region reads | Async perf replication".
- "PushSecret to Multiple OpenBao Instances" example was writing to
two ClusterSecretStores in parallel — replaced with single-target
primary write.
- "Multi-region sync via single PushSecret" in Consequences →
"Cross-region availability via Performance Replication".
- Mermaid sequence diagram: "Bootstrap Wizard" actor → "Catalyst
Bootstrap (Phase 0)"; "Terraform" → "OpenTofu"; ESO connection
description "via K8s auth" → "via SPIFFE SVID (workload identity)".
These were the most consequential drift fixes found in any pass —
two READMEs were documenting an architecture explicitly rejected by
the agreed model.
Refs #37
Pass 2 — fresh-eyes sweep across the entire docs tree. One residual
entity-noun usage found:
- platform/external-secrets/README.md:75 (in a Mermaid sequence
diagram): "Note over Wizard: Operator saves unseal keys offline"
— "Operator" used as person/entity. Renamed to "sovereign-admin"
to match the role from GLOSSARY.md.
All other banned-term sweeps clean:
- No tenant (architectural) anywhere.
- No Catalyst IDP anywhere.
- No Synapse-as-product anywhere (only the legitimate
"Matrix/Synapse server" usages).
- No workspace-controller (only the banned-term entries that define
the rename).
- No capital-W Workspace as Catalyst scope.
- No github.com/openova (without -io).
- All cross-doc Markdown links resolve.
- All §X references resolve to the new section numbering after
PLATFORM-TECH-STACK reorg.
- API group catalyst.openova.io/v1alpha1 consistent across 6 references.
- OCI artifact prefix `bp-` consistent across README, CLAUDE,
BLUEPRINT-AUTHORING, IMPLEMENTATION-STATUS.
Other "Operator" mentions intentionally retained (legitimate
technical usage):
- "External Secrets Operator (ESO)", "Trivy Operator" — K8s
Operator pattern (controllers), explicitly allowed by GLOSSARY.
- "Operator compatibility" in BUSINESS-STRATEGY's OpenShift migration
table — refers to compatibility with K8s Operators (the technology),
not as an entity/role.
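A sweep with an allow-list for the retained usages can be sketched as follows. The term subset, the regexes and the function name are illustrative assumptions; the authoritative banned-term list is the one in GLOSSARY.md.

```python
import re

# Illustrative subset of the banned terms listed above.
BANNED = {
    "Catalyst IDP": re.compile(r"Catalyst IDP"),
    "github.com/openova (without -io)": re.compile(r"github\.com/openova(?!-io)"),
    "Operator as entity": re.compile(r"\bOperator\b"),
}
# Legitimate technical usages that must not be flagged.
ALLOWED = re.compile(
    r"External Secrets Operator|Trivy Operator|Operator compatibility"
)


def sweep(text: str) -> list[tuple[int, str]]:
    """Return (line number, banned term) pairs, ignoring allow-listed phrases."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        scrubbed = ALLOWED.sub("", line)  # drop allowed phrases before matching
        for name, pattern in BANNED.items():
            if pattern.search(scrubbed):
                hits.append((lineno, name))
    return hits
```

A sweep like this only finds terms; it does not replace reading the body for schema-level or architectural drift.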
Refs #37
Remove hierarchical grouping (networking/, security/, etc.) and use a flat
structure for all 41 platform components.
Changes:
- All components now directly under platform/ (no subfolders)
- AI Hub components moved from meta-platforms/ai-hub/components/ to platform/
- Open Banking components (lago, openmeter) moved to platform/
- meta-platforms/ now only contains README files that reference platform/
- Open Banking custom services remain in meta-platforms/open-banking/services/
Structure:
- platform/ (41 components, flat)
- meta-platforms/ai-hub/ (README only, references platform/)
- meta-platforms/open-banking/ (README + 6 custom services)
All documentation links updated.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Harbor moved from storage/ to registry/ (artifact management, not storage)
- Kyverno moved from security/ to policy/ (policy engine for validation,
mutation, generation - broader than just security)
Updated structure:
- platform/registry/harbor/
- platform/policy/kyverno/
All documentation links updated accordingly.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>