a987748b42
19 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
59cdfe5a77
|
docs: ADR-0002 + ARCHITECTURE §11.1 + Inviolable #11 — post-handover sovereignty cutover (#794) (#797)
Adds the documentation set for the self-sovereignty cutover seam:
- NEW docs/adr/0002-post-handover-sovereignty-cutover.md following ADR-0001's shape (Status, Context, Decision, Consequences, Alternatives Considered). Documents the 8-tether map, the 30/70 provisioning split, the operator-driven trigger model, and the egress-block DoD proof.
- ARCHITECTURE.md §11 now carries a §11.1 Phase 2 — Self-Sovereignty Cutover subsection with the 8-Job table, mermaid Phase-0 → Phase-1 → Handover → Phase-2 → Day-2 diagram, and links to issues #790/#791/#792/#793/#794.
- INVIOLABLE-PRINCIPLES.md adds Principle #11: Sovereigns must be independent of openova-io after handover. Trigger phrase, cold-start exception, and cutover requirement spelled out.
Cites #790 (umbrella), #791 (chart), #792 (api), #793 (ui), #794 (this PR).
Extends, does not contradict, ADR-0001 §11 (Catalyst-on-Catalyst) and §2 (Inviolable Principles).
Closes #794
Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
|
||
|
|
53bc4357ca
|
feat(provisioner): cluster-autoscaler-hcloud + wizard footprint estimate (closes #767) (#776)
* feat(provisioner): cluster-autoscaler-hcloud + wizard footprint estimate (closes #767)
Two-pronged fix for the FailedScheduling pattern that hit otech92 (2x cpx32 workers couldn't fit external-secrets-webhook because the bootstrap-kit ate the full 16 GB):
1. PRE-LAUNCH ESTIMATE — wizard StepReview now surfaces a "Footprint estimate" Section with: bootstrap-kit baseline (sum of mandatory-tier component footprints), selected components delta, control-plane overhead, and a "Recommended N x <SKU>" line that turns amber when the operator's chosen worker count is below the rollup. Backed by per-component RAM/CPU floors in components/wizard/steps/componentFootprints.ts (covered by 12 unit tests including the otech92 reproduction).
2. RUNTIME AUTOSCALING — new bp-cluster-autoscaler-hcloud Blueprint added at bootstrap-kit slot 40. Wraps the upstream kubernetes/autoscaler chart 9.46.6 (appVersion 1.32.0) with the Hetzner cloud-provider. Token wired from the canonical flux-system/cloud-credentials.hcloud-token Secret cloud-init writes (mirrors the velero/harbor object-storage pattern). Pinned to the control-plane node so the autoscaler never schedules onto a worker it could itself terminate. 10-minute scale-down idle as the cost-saving default.
Documented in docs/ARCHITECTURE.md sec.14 (Autoscaling) — explains how VPA / HPA / KEDA / cluster-autoscaler compose, why we picked cluster-autoscaler over KEDA for cluster scaling, and the bounds + safety story.
Per the issue's MVP scope, this PR ships the blueprint + StepReview estimate WITHOUT the wizard StepProvider min/max pair refactor or the tofu node-pool template restructuring. Those are tracked as a follow-up issue (scope-control rule per docs/INVIOLABLE-PRINCIPLES.md #1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(provisioner): move cluster-autoscaler to slot 50 + register in expected-bootstrap-deps
Slot 40 was already forward-declared for bp-llm-gateway in scripts/expected-bootstrap-deps.yaml — the dependency-graph-audit CI check fired on PR #776 because the file existed without a matching entry in the expected DAG, AND collided with a reserved slot.
Move to slot 50 (after the W2.K4 cohort + slot 49 bp-cert-manager-powerdns-webhook) and add the matching entry to the expected-bootstrap-deps.yaml so the audit passes.
`scripts/check-bootstrap-deps.sh` runs clean locally now (drift=0, cycles=0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
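The pre-launch estimate described in this commit can be illustrated with a minimal Python sketch. It is not the code in componentFootprints.ts: the `Footprint` type, the `recommend_workers` and `is_under_provisioned` helpers, and every number in the usage note are hypothetical, and the sketch assumes the rollup is a plain sum of RAM/CPU floors divided by the capacity of one worker SKU.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Hypothetical per-component resource floor."""
    ram_mib: int
    cpu_millicores: int


def recommend_workers(baseline, selected, overhead, sku_ram_mib, sku_cpu_millicores):
    """Return the worker count needed to fit the summed footprints.

    baseline: footprints of the mandatory bootstrap-kit components
    selected: footprints of the optional components the operator picked
    overhead: a single control-plane overhead footprint
    """
    components = list(baseline) + list(selected) + [overhead]
    total_ram = sum(f.ram_mib for f in components)
    total_cpu = sum(f.cpu_millicores for f in components)
    by_ram = math.ceil(total_ram / sku_ram_mib)
    by_cpu = math.ceil(total_cpu / sku_cpu_millicores)
    # The tighter of the two constraints decides; never recommend zero workers.
    return max(1, by_ram, by_cpu)


def is_under_provisioned(chosen_workers, recommended_workers):
    """True when the UI should flag the estimate (the 'amber' state)."""
    return chosen_workers < recommended_workers
```

With made-up floors totalling 18000 MiB and 4500 millicores on a hypothetical 8192 MiB / 4000 millicore SKU, `recommend_workers` returns 3, so a chosen count of 2 would be flagged.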
||
|
|
04559e5c37 |
docs(reconcile-pass-1): align docs with ground truth at dd578d1c
Reconcile Pass 1 — first holistic LLM-driven reconciliation pass per ~/.claude/skills/reconcile-catalyst-docs/SKILL.md. Skill triggered after the post-Group-M architectural batch (#161, #162, #163, #167, #168, #169, #170, #171, #173, #174, #175). Live ground truth verified against kubectl + ls platform/ + git log + GHCR + componentGroups.ts.
Drift categories fixed:
- A. Numerical: bp-powerdns 1.0.5 → 1.0.6; component-logos 63 → 62 (powerdns SVG missing, tracked under #173); bootstrap kit 11 → 12 with bp-powerdns added per #167.
- B. Service: pool-domain-manager + 5 registrar adapters (Cloudflare/Namecheap/GoDaddy/OVH/Dynadot, #170) added to IMPLEMENTATION-STATUS, ARCHITECTURE, PLATFORM-TECH-STACK, GLOSSARY, and PROVISIONING-PLAN; bp-powerdns added to ARCHITECTURE bootstrap kit + Catalyst-on-Catalyst dependency tree.
- C. Architectural: SOVEREIGN-PROVISIONING §3 + DEMO-RUNBOOK Step 4 + ORCHESTRATOR-STATE Step 6 rewritten from Dynadot-direct DNS writes to PowerDNS authoritative + PDM /v1/commit + registrar-adapter NS-flip; PROVISIONING-PLAN Phase 4 paths corrected to products/catalyst/bootstrap/api/ (per INVIOLABLE-PRINCIPLES #3 the Go provisioner does NOT call cloud APIs); Phase 6 retitled and rewritten for the new DNS architecture.
- D. Process: RUNBOOK-PROVISIONING §2 wizard-step table + DEMO-RUNBOOK Step 2 wizard-step table updated to canonical 7-step ordering (Org → Domain → Topology → Provider → Credentials → Components → Review per WIZARD_STEPS in WizardLayout.tsx, post #169 + #174); the three-mode StepDomain (pool / byo-manual / byo-api per #169) and two-tab StepComponents (mandatory infra + apps per #161/#162/#175) now documented.
- E. Cross-doc: Group G ✅ across PROVISIONING-PLAN + ORCHESTRATOR-STATE (superseded by #167+#163+#170, not by the original Dynadot-multi-domain plan); Group C ✅ in PROVISIONING-PLAN (Flux is reconciling from openova-public today); README Stack-at-a-glance DNS row expanded.
- F. Stale terminology: 11-grep banned-terms scan clean — every k8gb residual is a legitimate "removed at #171, replaced by lua-records" reference.
VALIDATION-LOG.md gains the Reconcile Pass 1 entry per skill spec. Reconcile-skill numbering is independent of the Audit-skill numbering (which continues at Pass 108+).
Files: 13 docs + VALIDATION-LOG entry. Escalations: none.
|
||
|
|
f5daac52af |
refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171)
PowerDNS lua-records (`ifurlup`, `pickclosest`, `ifportup`) cover everything k8gb was doing — geo-aware response selection, health-checked failover, weighted round-robin — at the authoritative DNS layer. Eliminates a separate K8s controller, CRD set, and CoreDNS plugin from every Sovereign.
Changes:
- platform/k8gb/ deleted (Chart.yaml, values.yaml, blueprint.yaml never authored — only README existed)
- products/catalyst/bootstrap/ui/public/component-logos/k8gb.svg deleted
- componentGroups.ts: remove k8gb component (PowerDNS already there)
- componentLogos.tsx: drop logo_k8gb + k8gb map entry
- model.ts DEFAULT_COMPONENT_GROUPS spine: replace k8gb with powerdns
- StepInfrastructure.tsx: copy refers to PowerDNS lua-records, not k8gb
- provision.html: replace k8gb tile and edges with powerdns
- catalog.generated.ts regenerated (now includes bp-powerdns)
- docs sweep — every k8gb reference in PLATFORM-TECH-STACK, NAMING-CONVENTION, SOVEREIGN-PROVISIONING, SRE, ARCHITECTURE, GLOSSARY, COMPONENT-LOGOS, IMPLEMENTATION-STATUS, BUSINESS-STRATEGY, TECHNOLOGY-FORECAST, README, infra/hetzner/README, platform READMEs (cilium, external-dns, failover-controller, litmus, flux, opentofu) rewritten to point at PowerDNS lua-records / MULTI-REGION-DNS.md. Historical entries in VALIDATION-LOG.md preserved as audit trail.
- New docs/MULTI-REGION-DNS.md — canonical reference for the lua-record patterns (ifurlup all/pickclosest/pickfirst, ifportup, pickwhashed), Application Placement → lua-record selector mapping, when to add a second Sovereign region, operational checks.
Closes #171.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7cafa3c894 |
docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay
Component-level architectural correction (two changes):
1. MinIO → SeaweedFS as unified S3 encapsulation layer
The old design used MinIO for in-cluster S3 plus separate cold-tier configuration scattered across consumers. The new design positions SeaweedFS as the single S3 encapsulation layer: every Catalyst component talks to one endpoint (seaweedfs.storage.svc:8333). SeaweedFS internally handles hot tier (in-cluster NVMe), warm tier (in-cluster bulk), and cold tier (transparent passthrough to cloud archival storage — Cloudflare R2 / AWS S3 / Hetzner Object Storage / etc., chosen at Sovereign provisioning). One audit/lifecycle/encryption boundary instead of N. No Catalyst component talks to cloud S3 directly anymore — Velero, CNPG WAL archive, OpenSearch snapshots, Loki/Mimir/Tempo, Iceberg, Harbor blob store, Application buckets all share one S3 surface.
2. Apache Guacamole added as Application Blueprint §4.5 Communication
Clientless browser-based RDP/VNC/SSH/kubectl-exec gateway. Keycloak SSO, full session recording to SeaweedFS for compliance evidence (PSD2/DORA/SOX). Composed into bp-relay. Replaces VPN+native-client distribution for auditable remote access.
Component changes:
- DELETED: platform/minio/
- CREATED: platform/seaweedfs/README.md (unified S3 + cold-tier encapsulation; bucket layout; multi-region replication via shared cold backend; migration-from-MinIO section)
- CREATED: platform/guacamole/README.md (clientless remote-desktop gateway; GuacamoleConnection CRD; compliance integration via session recordings)
Doc updates: PLATFORM-TECH-STACK §1+§3.5+§4.5+§5+§7.4; TECHNOLOGY-FORECAST L11+mandatory+a-la-carte counts (52 → 53); ARCHITECTURE §3 topology; SECURITY §4 DB engines; SOVEREIGN-PROVISIONING §1 inputs; SRE §2.5+§7; IMPLEMENTATION-STATUS §3; BLUEPRINT-AUTHORING stateful examples; BUSINESS-STRATEGY 13 component-count anchors + Relay product line; README.md backup row; CLAUDE.md folder count.
Component README updates (S3 endpoint + dependency renames): cnpg, clickhouse, flink, gitea, iceberg, harbor, grafana, livekit, kserve, milvus, opensearch, flux, stalwart, velero (substantive rewrite of velero — now writes exclusively to SeaweedFS with cold-tier auto-routing). Products: relay, fabric.
UI scaffold: products/catalyst/bootstrap/ui/src/shared/constants/components.ts — minio entry replaced with seaweedfs; velero+harbor deps updated; new guacamole entry added.
VALIDATION-LOG entry "Pass 104 — MinIO → SeaweedFS swap + Guacamole add" captures the encapsulation principle and adds Lesson #22: storage tier policy belongs at the encapsulation boundary, not inside every consumer.
Verification: zero remaining MinIO references in canonical docs (one intentional retention in TECHNOLOGY-FORECAST L37 explaining the swap); 53 platform/ folders matching all "53 components" anchors; bp-relay composition includes guacamole.
|
||
|
|
0a6179dd21 |
docs(unified-repo-model): collapse SME and corporate to one shape — Application = Gitea Repo
Architectural correction. Replaces the previous "one Gitea repo per Environment with Apps as folders" rule with a single uniform shape that scales by configuration only:
- Catalyst Application = one Gitea Repo (always, regardless of scale)
- Branches develop/staging/main map to dev/stg/prod environments
- 5 conventional Gitea Orgs per Sovereign: catalog (public mirror), catalog-sovereign (Sovereign-curated private Blueprints), one per Catalyst Organization (with shared-blueprints + N App repos), system (sovereign-admin scope)
- EnvironmentPolicy CR lives in system/catalyst-config/policies/, same shape for SME and corporate; only field values differ
Removes the SME-vs-corporate dual-shape design that violated the "Application is application" invariant. Teams primitive (proposed for corporate scale) is dropped — team boundaries emerge from CODEOWNERS at the App-repo level. RE-score thresholds and EnvironmentPolicy fields are universal defaults; only their values vary per Org's policy choice.
Files updated line-by-line: GLOSSARY (Application + Environment definitions, new Gitea-Orgs section, 6 component-row updates), NAMING §11.2 (Realization 7-bullet rewrite), ARCHITECTURE (§1, §3 topology, §4 write-side ASCII, §7.1+§7.2+§7.3, §8 promotion, §9 multi-App linkage), PERSONAS-AND-JOURNEYS (§2 surfaces, §4.1 Ahmed, §4.2 Layla full rewrite), BLUEPRINT-AUTHORING §1 (catalog-sovereign source location), PLATFORM-TECH-STACK §2.2+§2.3, SECURITY §3, SOVEREIGN-PROVISIONING §5+§8+§10, IMPLEMENTATION-STATUS §5, SRE §14.
VALIDATION-LOG entry "Pass 103 — UNIFIED REPO MODEL REFACTOR" captures the architectural correction and acknowledges the prior 102-pass audit anchored on the wrong shape (text-shape consistency was correct; the chosen text-shape was inadequate). Lesson #21 added: text-shape audits don't substitute for architectural review.
Verification: zero remaining old-model assertions in canonical docs (grep clean for 'Environment Gitea repo', '/{org}/{org}-{env_type}', 'per-Environment Gitea repos', 'applications/<app>/values', etc.).
|
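The branch-to-environment rule in this commit (develop/staging/main map to dev/stg/prod) can be written down as a small lookup. This is only a sketch: `BRANCH_TO_ENV_TYPE` and `env_type_for_branch` are hypothetical names, and rejecting any other branch is an assumption, not something the commit states.

```python
# Mapping taken from the commit message; the names here are illustrative only.
BRANCH_TO_ENV_TYPE = {
    "develop": "dev",
    "staging": "stg",
    "main": "prod",
}


def env_type_for_branch(branch):
    """Return the env_type a branch of an Application repo deploys to."""
    try:
        return BRANCH_TO_ENV_TYPE[branch]
    except KeyError:
        # Assumption: branches outside the convention map to no environment.
        raise ValueError(f"branch {branch!r} has no conventional environment")
```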
||
|
|
9af6717dcc |
docs(pass-61): ARCHITECTURE §4 box alignment (Pass 29 carry-over); cnpg clean
ARCHITECTURE §4 (Write side) box at L121 had alignment drift from
Pass 29's expansion to canonical FQDN form. Line content reached
89 chars while box border was 74 chars — overflow. Same drift
category as Pass 53's §8 acme-stg alignment fix.
Fixed by replacing the in-box content with a shorter form pointing
to NAMING §11.2 for the FQDN (already canonical there + 4 other places):
- Old: │ Gitea: gitea.<location-code>.<sovereign-domain>/{org}/{org}-{env_type} │ (89 chars)
- New: │ Environment Gitea repo: {org}/{org}-{env_type} │
│ (FQDN form per NAMING §11.2) │
Also normalized whitespace padding across L122-L130 (uniform 76 chars).
ARCHITECTURE §1-§14 third-cycle deep re-scan with all current methodology
lenses confirmed otherwise clean. §5 <env> shorthand explicitly defined,
§9 catalyst.openova.io/v1alpha1 canonical, §10/§11/§12 all consistent
with downstream canonical references.
platform/cnpg/README.md: clean. Banner correct (§4.1 Data services).
namespace: databases ✓, minio.storage.svc ✓, postgres.<env>.<sovereign-domain> ✓
(Pass 35 fix held). Cross-region DR example uses canonical Application
DNS — no Pass-60-style fully-qualified-hostname drift.
Methodology lesson #19: Pass-N expansion of placeholder-to-canonical-form
inside ASCII tables/diagrams must verify box alignment afterward. Pass 29
expansion broke alignment at §4 (this pass) and §8 (Pass 53).
|
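Lesson #19 suggests a mechanical check after any placeholder expansion inside an ASCII box. A minimal sketch, assuming every row of a box should be exactly as wide as its top border and that no double-width characters are involved; `misaligned_rows` is a hypothetical helper, not part of the repository.

```python
def misaligned_rows(box_lines):
    """Return (index, width) for each row whose width differs from the top border.

    box_lines: the lines of one ASCII box, top border first.
    Width is measured in code points, which matches column width for
    box-drawing characters and plain ASCII.
    """
    if not box_lines:
        return []
    expected = len(box_lines[0])
    return [
        (index, len(line))
        for index, line in enumerate(box_lines)
        if len(line) != expected
    ]
```

An empty result means the box is rectangular; any entry points at a row that overflows or falls short of the border, as the 89-character row against a 74-character border did here.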
||
|
|
bb15e03884 |
docs(pass-53): ARCHITECTURE §8 column alignment (Pass 39 carry-over); langfuse clean
ARCHITECTURE §8 (Promotion across Environments) L287 had column-alignment drift from Pass 39's `replace_all acme-staging → acme-stg`. The 12-char acme-staging filled the column padding; the 8-char acme-stg shifted "1.3.0" left of the adjacent "1.4.0"/"1.2.0" values. PERSONAS-AND-JOURNEYS L230 had the same Pass 39 fix but I'd done that as an explicit Edit with proper padding; ARCHITECTURE used replace_all which produced misaligned 7-space gap.
Fixed: acme-stg padded to acme-stg + 11 spaces (was 7) so all four rows in the §8 mockup table align at the version column.
Methodology lesson #17: replace_all on shorter strings inside ASCII code-block tables silently breaks column alignment. Greps can't detect whitespace-alignment drift; manual column-check after replace_all is needed.
ARCHITECTURE.md §1-§14 deep re-scan with all current lessons:
- §3 Topology: 15-component Catalyst control plane matches PTS §2 union (post-Pass 40). Per-host-cluster list omits OpenTofu (bootstrap-only/not-runtime) defensibly.
- §5 explicitly defines <env> as {org}-{env_type} — anchors the ws.<env>.> shorthand Pass 30 noted.
- §10 11-component bootstrap kit matches SOVEREIGN-PROVISIONING §3.
- §11 bp-catalyst-* list matches IMPLEMENTATION-STATUS §2.
- §12 Independent-failure-domains cites OpenBao per-region Raft ✓.
platform/langfuse/README.md: clean. Banner correct (§4.7 AI Observability). Distinguishes per-host-cluster Grafana stack from Application-level LangFuse correctly.
Drift found. Consecutive-clean count remains 0 but drift surface shifting toward cosmetic territory (column alignment, freshness) rather than architectural.
|
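Lesson #17 can be turned into a width-preserving replacement for rows inside ASCII tables. The sketch below pads the shorter replacement to the length of the string it replaces; `replace_keep_width` is a hypothetical helper and is meant only for fixed-width rows, since in running prose the padding would show up as stray spaces.

```python
def replace_keep_width(row, old, new):
    """Replace old with new, padding new with spaces so columns stay aligned.

    If new is longer than old, no padding is added and the row grows,
    so the caller still has to re-check alignment in that case.
    """
    padded = new.ljust(len(old))
    return row.replace(old, padded)
```

Applied to a row holding acme-staging followed by 7 spaces, this yields acme-stg followed by 11 spaces, which is the padding the fix in this commit applied by hand.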
||
|
|
9ae1531878 |
docs(pass-39): non-canonical *-staging env_type drift; clickhouse clean
NAMING §2.4 establishes the 3-char env_type form (prod|stg|uat|dev|poc)
but multiple Environment-name examples used the long form `staging`.
ARCHITECTURE.md §8 (Promotion across Environments): 3 instances of
acme-staging (Blueprint detail mockup L287, prose L295, EnvironmentPolicy
sourceEnvironment L310) renamed to acme-stg.
PERSONAS-AND-JOURNEYS.md: 3 instances renamed —
- digital-channels-staging → digital-channels-stg (Layla narrative L126, L135)
- acme-staging → acme-stg (Blueprint detail mockup L230)
Pass 33 fixed Layla's DNS but left the env_type spelling.
Preserved: payment-rail-staging (Application name, free-form per NAMING)
and minimum-replicas-production (Kyverno policy identifier).
ARCHITECTURE.md deep re-scan with Pass 23 lesson (focus on later
sections): §5-§13 substantively clean. §5 explicitly defines <env> as
{org}-{env_type} which retroactively grounds the ws.<env>.> shorthand
Pass 30 noted as "documented shorthand".
platform/clickhouse/README.md: clean. minioadmin literal placeholder
flagged for future security-hardening pass but not Catalyst drift.
|
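The env_type rule cited from NAMING §2.4 lends itself to a quick check on Environment names. A minimal sketch, assuming an Environment name is `{org}-{env_type}` and that the org part is lowercase alphanumerics separated by hyphens (the org format is an assumption). `is_canonical_environment_name` is a hypothetical helper; it cannot tell Environment names from free-form Application names such as payment-rail-staging, so it should only be fed strings already known to be Environment names.

```python
import re

# env_type values as listed in the commit message.
ENV_TYPES = ("prod", "stg", "uat", "dev", "poc")

_ENVIRONMENT_NAME = re.compile(
    r"[a-z0-9]+(?:-[a-z0-9]+)*-(?:" + "|".join(ENV_TYPES) + r")"
)


def is_canonical_environment_name(name):
    """True when name ends in one of the canonical env_type suffixes."""
    return _ENVIRONMENT_NAME.fullmatch(name) is not None
```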
||
|
|
4793cab8b6 |
docs(pass-29): DNS-placeholder sweep across canonical docs
The recurring drift: Catalyst control-plane DNS placeholders that omit the
<location-code> segment, producing forms like gitea.<sovereign>,
gitea.<sovereign>.<domain>, gitea.<sovereign-domain>, keycloak.<domain>.
Per NAMING §5.1 the canonical form is
{component}.{location-code}.{sovereign-domain} (e.g. gitea.hfmp.openova.io).
The shorter forms aren't just abbreviations — they collapse the multi-region
location dimension and re-drift every time a reader reads them as obvious
shorthand.
Fixes:
- CLAUDE.md "Customer Sync" — both gitea.<sovereign>/catalog/... lines.
- docs/SOVEREIGN-PROVISIONING.md §3 DNS-records bullet (3 lines) + §5
Day-1 login line.
- docs/ARCHITECTURE.md §4 write-path Gitea label.
- docs/BLUEPRINT-AUTHORING.md §6.4 private-Blueprint Studio target.
- platform/librechat/README.md Keycloak issuer (Pass 22 marked clean and
missed this — banner scans miss YAML-block drift).
platform/nemo-guardrails/README.md verified clean.
Final grep confirms only canonical forms remain. Validation log Pass 29
entry added with the recurring-drift-pattern note for future passes.
|
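The short placeholder forms listed in this commit are regular enough to scan for. The sketch below is an assumption about how such a sweep could be automated, not the grep the pass actually ran: `find_short_placeholders` is a hypothetical helper that flags a component label followed directly by a `<sovereign>`, `<sovereign-domain>`, or `<domain>` placeholder, that is, with no `<location-code>` segment in between.

```python
import re

# A component label immediately followed by a domain-level placeholder.
_SHORT_FORM = re.compile(
    r"\b[a-z][a-z0-9-]*\.<(?:sovereign|sovereign-domain|domain)>"
)


def find_short_placeholders(text):
    """Return every placeholder in text that skips the location-code segment."""
    return _SHORT_FORM.findall(text)
```

The canonical `gitea.<location-code>.<sovereign-domain>` is not flagged, because the label before `.<sovereign-domain>` there is itself a placeholder rather than a component name.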
||
|
|
eff264b077 |
docs(pass-17): ARCHITECTURE OAM table pipe-fix + Harbor README de-drift
Pass 17 — drift-detection sweep on ARCHITECTURE + harbor. Two real
findings.
ARCHITECTURE §13 (OAM table):
- `| Trait | Blueprint overlay (`overlays/small|medium|large`) |`
has pipe chars inside backticks inside a Markdown table cell —
a known GFM rendering hazard. Replaced with comma-separated
examples.
platform/harbor/README.md:
- The banner added in Pass 9 said "every host cluster runs a
Harbor instance" but the body still described an older
"Harbor Primary / Harbor Replica" cross-region replication
topology. Same shape of architectural drift Pass 7 caught in
OpenBao/ESO/Gitea/Flux — banner-add doesn't rewrite the body.
- Three sections rewritten:
* Overview mermaid: now shows upstream-OCI → multiple
independent per-cluster Harbors with local Trivy scan + local
Pod pulls.
* "Multi-Region Replication" → "Per-host-cluster mirroring (NOT
primary-replica)". Single source of truth = upstream OCI
(ghcr.io/openova-io/* for Catalyst+Blueprints, customer CI for
application images), not a "primary Harbor".
* Example replication policy: was a `dest_registry` cross-region
push policy → now a pull-mirror policy from ghcr.io with
scheduled-cron trigger.
- "Why Mandatory" table reframed in per-host-cluster terms.
VALIDATION-LOG: Pass 17 entry added with the specific drift-detection
lesson — banner-addition passes don't catch body-level drift; need
explicit body re-reads.
Refs #37
|
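The OAM-table finding (pipe characters inside backticks inside a table cell) can be detected line by line. A minimal sketch, assuming single-backtick code spans and treating a backslash-escaped pipe as safe; `risky_code_spans` is a hypothetical helper and does not parse Markdown fully.

```python
import re

_CODE_SPAN = re.compile(r"`([^`]*)`")
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def risky_code_spans(table_row):
    """Return the code spans in a table row that contain an unescaped pipe."""
    return [
        span
        for span in _CODE_SPAN.findall(table_row)
        if _UNESCAPED_PIPE.search(span)
    ]
```

Running it on the original §13 row returns the `overlays/small|medium|large` span; a row with comma-separated examples returns nothing.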
||
|
|
fec0c342a8 |
docs(pass-6): reconcile topology diagram + unify JetStream Account scoping
Pass 6 — fresh-eyes line-by-line read of ARCHITECTURE.md. Found two
internal contradictions that earlier passes missed.
ARCHITECTURE §3 (topology diagram) listed Crossplane, Flux, Harbor,
and grafana-stack INSIDE the Catalyst control plane block. But §11
(Catalyst-on-Catalyst) explicitly says these are per-host-cluster
infrastructure, NOT Catalyst control-plane components. PLATFORM-TECH-
STACK §3 also classifies them as per-host-cluster.
Fixed: §3 topology diagram now shows only true Catalyst control-plane
components (console, marketplace, admin, catalog-svc, projector,
provisioning, environment-controller, blueprint-controller, billing,
gitea, nats-jetstream, openbao, keycloak, spire-server, observability)
and adds a separate line for "Plus per-host-cluster infrastructure"
that defers to PLATFORM-TECH-STACK §3 for the full list (Cilium, Flux,
Crossplane, cert-manager, ESO, Kyverno, Harbor, Reloader, Trivy, Falco,
Sigstore, Syft+Grype, VPA, KEDA, External-DNS, k8gb, Coraza, MinIO,
Velero, failover-controller). Also added the previously-missing
`provisioning` row.
JetStream Account scoping was contradictory:
- ARCHITECTURE §5 said "Per-Org account: ws.{org}-{env_type}.>" —
reads ambiguously: is the Account per-Org or per-Env?
- NAMING-CONVENTION §11.2 said "One JetStream Account scoped to
ws.{org}-{env_type}.>" — implied per-Environment.
- GLOSSARY + PLATFORM-TECH-STACK + SECURITY all say per-Organization.
Reconciled to the per-Org-Account-with-per-Env-subjects model:
- Account isolation: ONE NATS Account per Organization.
- Subjects within the Account use prefix `ws.{org}-{env_type}.>` for
per-Environment partitioning.
This is the cleanest isolation model: Accounts are NATS' strongest
isolation boundary (per-Org); subjects partition further within each
Account (per-Env).
Refs #37
|
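The reconciled model (one NATS Account per Organization, subjects partitioned per Environment) can be sketched as two small functions. The subject prefix format comes from the commit message; naming the Account after the Organization is an assumption made for illustration, and both helper names are hypothetical.

```python
def account_for(org):
    """Account boundary: one per Organization (naming is an assumption)."""
    return org


def subject_prefix(org, env_type):
    """Subject wildcard covering one Environment inside the Org's Account."""
    return f"ws.{org}-{env_type}.>"
```

Two Environments of the same Organization, for example acme-prod and acme-dev, resolve to the same Account but to different subject prefixes, which is the per-Org isolation with per-Env partitioning described above.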
||
|
|
ba048d2fd7 |
docs(pass-5b): scrub remaining "instance" usages where "Application" is meant
Two user-facing residuals where the banned product term "instance" slipped through:
- docs/ARCHITECTURE.md §9: example console dialog "Use existing instance or create a dedicated one?" → "Use an existing Postgres Application or create a new dedicated one?". This is a UI prompt text — must use the user-facing noun "Application", not "instance".
- docs/NAMING-CONVENTION.md §6.2 tag comment: "Application instance name" → "Application name within the Environment". The CRD might internally still use the noun Instance for class-vs-instance semantics, but in tag annotations and user-visible context the Application IS the instance.
Other "instance" occurrences confirmed legitimate (Postgres instance as Crossplane resource type, Flux instance as software deployment, EC2/Hetzner instance as cloud-provider terminology) and retained.
Final cross-reference check: all Markdown links across all canonical docs resolve. No residual banned terms.
Refs #37
|
||
|
|
79c59a27a2 |
docs(pass-5): reconcile Phase-0 install order, IMPLEMENTATION-STATUS section numbering
Pass-5A — fresh-eyes deep read found two structural drifts.
ARCHITECTURE §10 Phase-0 install order:
- Old order: cert-manager → Cilium → Flux → ... → Catalyst control plane.
- SOVEREIGN-PROVISIONING §3 has the correct order: Cilium first (CNI must be in place before pods can network), THEN cert-manager.
- ARCHITECTURE updated to match: Cilium → cert-manager → Flux → Crossplane → Sealed Secrets → SPIRE → JetStream → OpenBao → Keycloak → Gitea → Catalyst control plane (11 items, matching the SOVEREIGN-PROVISIONING list which had Keycloak and Gitea spelled out separately).
IMPLEMENTATION-STATUS section numbering:
- Old: §1 → §2 → §2bis → §3 → §4 → §5 → §6 → §7 → §8. The "§2bis" was a workaround for inserting per-host-cluster infrastructure without renumbering. Reads weird.
- New: §1 → §2 → §3 → §4 → §5 → §6 → §7 → §8 → §9. Clean numbering.
Refs #37
|
||
|
|
d1a2ed73a3 |
docs(pass-4): align ARCHITECTURE phase numbering with SOVEREIGN-PROVISIONING
ARCHITECTURE §10 listed 3 provisioning phases (Phase 0 / 1 / 2) and labeled Phase 2 as "Self-sufficient". SOVEREIGN-PROVISIONING.md uses 4 phases (Phase 0 Bootstrap / Phase 1 Hand-off / Phase 2 Day-1 setup / Phase 3 Steady-state). The same phase number meant different things in the two docs.
Aligned ARCHITECTURE to the 4-phase numbering. SOVEREIGN-PROVISIONING is now explicitly the canonical reference for phase semantics.
Refs #37
|
||
|
|
80b91709e1 |
docs(iter-3-5): purge operator-as-entity, fix Workspace-controller capital, JetStream KV references
ARCHITECTURE (iter 3):
- Removed catalystctl from the §4 write-side diagram (it's read-only; presenting it as a write input contradicted §7.4).
- "Both tabs read the same Valkey snapshot" → "JetStream KV snapshot" in §5 (Valkey is no longer in the control plane).
- §7.4: catalystctl reframed as "may exist as small read-only debug CLI" rather than implying it ships today.
- §11 dependency list: added bp-catalyst-provisioning; removed bp-catalyst-crossplane (Crossplane is per-host-cluster infra, not a Catalyst control-plane component); added clarifying note.
- §12 CRD list: added SecretPolicy + Runbook (were already in IMPLEMENTATION-STATUS but missing from the principles table).
- §2 SME-style description: "SaaS Operator team (Omantel staff)" → "SaaS provider's cloud team" (Operator banned as entity).
NAMING-CONVENTION (iter 4):
- §5.1 heading "operator domain" → "Sovereign domain".
- §7 multi-region diagram: replaced piecemeal Catalyst component list with a deferral to PLATFORM-TECH-STACK §2; added SPIRE server; fixed "per-Org workspaces" → "per-Environment Gitea repos"; added per-host-cluster infrastructure callout.
SECURITY (iter 6 — partial; fold into this commit):
- "operator-approved" → "sovereign-admin-approved" for DR promotion.
- Realm name "catalyst-operator" → "catalyst-admin" (entity-noun scrubbed from the realm naming itself).
SOVEREIGN-PROVISIONING (iter 7 — partial):
- "single operator's laptop" → "single person's laptop" (avoid "operator" as entity).
- "the next operator" → "the next Sovereign provisioning request, regardless of who initiates it".
- "catalyst-operator realm" → "catalyst-admin realm" (×2).
- Capital-W "Workspace-controller" residuals (3) → "Environment-controller" (replace_all is case-sensitive; previous iter caught lowercase only).
PERSONAS (iter 5):
- P3 "within a Sovereign Operator team" → "within a Sovereign's operations team".
- Two capital-W "Workspace-controller" residuals fixed.
SRE (iter 11 — partial):
- §13.2 "Workspace-controller stuck" runbook entry → "Environment-controller stuck".
Banned-term sweep result post-fix: no `Operator team|role|account|user|admin` anywhere; no capital-W Workspace as Catalyst scope; no Valkey-as-control-plane refs.
Refs #37
|
||
|
|
27325edb32 |
docs(iter-2): glossary alignment — rename workspace-controller, fix definitions
GLOSSARY.md line-by-line audit. Nine corrections.
1. workspace-controller → environment-controller everywhere. The
controller reconciles the Environment CRD; "workspace" is banned as
a Catalyst scope, so it cannot be in a component name either. Fixed
in: GLOSSARY, ARCHITECTURE, PLATFORM-TECH-STACK, NAMING-CONVENTION,
SOVEREIGN-PROVISIONING, IMPLEMENTATION-STATUS, core/README,
BUSINESS-STRATEGY. Banned-term entry in GLOSSARY now explicitly
covers component names too.
2. "workspace repos" (per-Environment Gitea repos) → "Environment
Gitea repos" in GLOSSARY, PLATFORM-TECH-STACK.
3. JWT claim {workspace, org, role} → {environment, org, role} in
ARCHITECTURE projector diagram.
4. OpenOva definition refined: was "Never used to name a product",
which contradicted "OpenOva Catalyst", "OpenOva Cortex". Now: brand
prefix in product names; bare "OpenOva" = the company; bare
"Catalyst" = the platform.
5. Catalyst definition completed: was missing provisioning, billing,
gitea, observability — now lists all 14 control-plane components,
pointing at the table below.
6. Catalyst components table: added `provisioning` (validates
configSchema, commits to Environment Gitea); reordered to match
ARCHITECTURE §3 grouping; clarified each component's source-of-truth
(catalog-svc reads monorepo + Gitea, blueprint-controller watches
monorepo + Gitea, etc.).
7. Environment definition: refers to NAMING §2.4 for env_type values;
removed inline list that didn't match canonical ordering. Added
concrete examples (acme-prod, acme-dev, bankdhofar-uat).
8. Application example: dropped "RocketChat" which appeared nowhere
else; replaced with generic "running deployment" plus the
established WordPress / Postgres examples.
9. sovereign-admin description: was "runs Crossplane" — Crossplane is
platform plumbing not user-facing. Now: "manages the underlying
clusters via Crossplane (which is platform plumbing, not a
user-facing surface)".
Banned-term coverage:
- "Workspace" entry now covers BOTH the Catalyst scope AND component
naming (workspace-controller → environment-controller).
Refs #37
|
||
|
|
2c4902b409 |
docs(iter-1): add IMPLEMENTATION-STATUS, fix wrong-org refs, reconcile monorepo
First validation iteration. Three concrete corrections.
1. Add docs/IMPLEMENTATION-STATUS.md as the bridge between target architecture and current code state. Status legend (✅ / 🚧 / 📐 / ⏸) applied per-component. Catalyst control plane = mostly 📐. Component READMEs = 🚧 (README only, no Blueprint manifests yet). products/axon = ✅ (only product with real code). core/ = 📐 (just .gitkeep).
2. Status banner added to ARCHITECTURE, SECURITY, SOVEREIGN-PROVISIONING, BLUEPRINT-AUTHORING, PERSONAS-AND-JOURNEYS, PLATFORM-TECH-STACK, SRE pointing readers at IMPLEMENTATION-STATUS.md before they treat any described feature as built. GLOSSARY also references it.
3. Architectural decision (Option A — monorepo canonical):
- Each platform/<name>/ and products/<name>/ folder is the source of ONE Blueprint, published as ghcr.io/openova-io/<name>:<semver> by CI fan-out from the monorepo root.
- BLUEPRINT-AUTHORING.md §1, §2, §13 rewritten to match.
- README.md "what's in this repo" rewritten to clarify monorepo + OCI-fan-out shape; no longer claims every directory is a Blueprint in a way that contradicts BLUEPRINT-AUTHORING.
Wrong-org fixes (3 places):
- docs/PERSONAS-AND-JOURNEYS.md:13 github.com/openova → openova-io
- docs/BLUEPRINT-AUTHORING.md:13 github.com/openova → openova-io
- docs/BLUEPRINT-AUTHORING.md:404 github.com/openova → openova-io
- docs/BLUEPRINT-AUTHORING.md ghcr.io/openova/* (3 refs) → openova-io
API group consistency:
- All references unified to catalyst.openova.io/v1alpha1 (was mixed v1 / v1alpha1; v1alpha1 is correct since the CRDs are design-stage with no implementation).
core/README.md updated to honestly describe the directory tree as "target structure with .gitkeep placeholders" rather than implying the apps/console, apps/projector, etc. binaries already exist. The legacy apps/bootstrap and apps/manager directories are acknowledged as transitional placeholders that will be removed when the new apps/ layout is scaffolded.
CLAUDE.md and .claude/project-memory.md updated to put IMPLEMENTATION-STATUS.md second in the read-first ordering.
Refs #37
|
||
|
|
d51a3fba4d |
docs: add canonical Catalyst documentation set
Six new docs that establish the unified Catalyst model — Sovereign as
deployed instance, Organization as multi-tenancy unit, Environment as
{org}-{env_type} scope, Application as user-facing handle, Blueprint as
unified module+template successor.
- docs/GLOSSARY.md single source of truth for terminology;
every other doc defers to it; banned terms
(tenant, operator-as-entity, module, template,
Backstage, etc.) listed with replacements.
- docs/ARCHITECTURE.md overall Catalyst architecture: control plane
vs application Blueprints, write path
(Git → Flux → K8s + Crossplane), read path
(CQRS via NATS JetStream → projector → SSE),
SPIFFE/SPIRE workload identity, OpenBao
independent Raft per region (no stretched
cluster), Keycloak per-Org (SME) vs
per-Sovereign (corporate).
- docs/PERSONAS-AND-JOURNEYS.md personas × journeys matrix; only
three first-class surfaces (UI, Git, API);
explicit removal of Terraform/Pulumi/CLI as
user-facing IaC; Application card anatomy.
- docs/SECURITY.md identity (workload + user), OpenBao + ESO
credential flow, dynamic credentials with
auto-rotation sidecar, multi-region
OpenBao (independent Raft per region with
async perf replication — explicitly NOT
stretched), rotation policy CRDs, threat
model.
- docs/SOVEREIGN-PROVISIONING.md Phase 0 (catalyst-provisioner +
OpenTofu one-shot) → Phase 1 (Crossplane
adopts) → Phase 2 (self-sufficient Catalyst
control plane); air-gap procedure;
Organization migration; decommission.
- docs/BLUEPRINT-AUTHORING.md Blueprint CRD spec, configSchema,
placementSchema, depends, manifests,
overlays; Crossplane Composition authoring
for non-K8s; signing/publishing pipeline;
public vs private (Org-scoped) visibility;
contribution path.
Refs #37
|