

Platform Tech Stack

Status: Authoritative target stack. Updated: 2026-04-29. Implementation: Component READMEs exist; Catalyst control-plane glue is mostly design-stage. The DNS plane (bp-powerdns + pool-domain-manager + registrar adapters) is deployed today in openova-system on Catalyst-Zero — see IMPLEMENTATION-STATUS.md.

This document lists every component in Catalyst, what it does, and where it sits — control plane, application layer, or both. See GLOSSARY.md for terminology and ARCHITECTURE.md for the overall model.


1. Component categorization

Catalyst's components fall into three categories:

| Category | Where it runs | Examples |
|---|---|---|
| Catalyst control plane | The Sovereign's mgt cluster | console, marketplace, admin, projector, catalog-svc, provisioning, environment-controller, blueprint-controller, billing, gitea, nats-jetstream (control-plane account), openbao, keycloak, spire-server, observability (Grafana stack) |
| Per-host-cluster infrastructure | Every host cluster (mgt, rtz, dmz) | cilium, external-dns, powerdns, coraza, flux, crossplane, opentofu (bootstrap-only), sealed-secrets (bootstrap-only — transient until ESO+OpenBao take over), cert-manager, external-secrets, kyverno, trivy, falco, sigstore, syft-grype, vpa, keda, reloader, seaweedfs, velero, harbor, failover-controller |
| Application Blueprints | Inside per-Org vclusters | cnpg, ferretdb, valkey, strimzi, clickhouse, opensearch, stalwart, livekit, matrix, stunner, guacamole, milvus, neo4j, vllm, kserve, knative, librechat, bge, llm-gateway, anthropic-adapter, langfuse, nemo-guardrails, temporal, flink, debezium, iceberg, openmeter, litmus |

The same upstream technology can serve in multiple categories. For example: Valkey is not part of the control plane (JetStream KV replaces it there) but is available as an Application Blueprint when a User wants Redis-compatible caching for their app. Similarly, Strimzi/Kafka is an Application Blueprint; the Catalyst control plane uses NATS JetStream for events, not Kafka.

This separation is critical and is the main reason to read this document carefully.


2. Catalyst control-plane components (per-Sovereign, on the mgt cluster)

These components make a Kubernetes cluster a Sovereign. Installed exactly once per Sovereign, on its management cluster, as part of the bp-catalyst-platform umbrella Blueprint.

2.1 User-facing surfaces

| Component | Source | Purpose |
|---|---|---|
| console | core/ (Go + Astro/Svelte UI) | Primary UI for end users. Form / Advanced / IaC editor depths. |
| marketplace | (UI module of core/) | Public-facing Blueprint card grid. |
| admin | (UI module of core/) | Sovereign-admin operations UI. |

2.2 Catalyst backend services

| Component | Purpose |
|---|---|
| projector | CQRS read-side. Subscribes to NATS JetStream, materializes per-Environment KV, fans out SSE to console. |
| catalog-svc | Reads Blueprint CRDs, serves catalog API to console + marketplace. |
| provisioning | Validates configSchema, composes manifests, creates one Gitea repo per Application under the Org's Gitea Org, commits initial branches (develop/staging/main). |
| environment-controller | Reconciles Environment CRD: vcluster + Flux-bootstrap (watching the appropriate branch across the Org's Application repos) + webhooks. |
| blueprint-controller | Watches Blueprint sources (this monorepo + per-Sovereign catalog-sovereign Gitea Org + Org-private shared-blueprints repos), registers Blueprint CRDs. |
| billing | Per-Organization metering, invoicing. |
| pool-domain-manager (PDM) | Allocates pool subdomains (under omani.works / openova.io), owns the per-Sovereign PowerDNS zone lifecycle (/v1/reserve then /v1/commit writes the 6-record set + parent-zone NS delegation), and exposes registrar adapters (Cloudflare / Namecheap / GoDaddy / OVH / Dynadot — #170) for the byo-api BYO flow's NS-flip. CNPG-backed pdm-pg. Source: core/pool-domain-manager/. Lives on the OpenOva-run Catalyst-Zero (the catalyst-provisioner), not on every Sovereign — it is part of the bootstrap surface, not the per-Sovereign control plane. |
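The PDM pool-allocation handshake can be sketched as a two-phase exchange. Only the /v1/reserve and /v1/commit paths come from this document; the payload and response shapes below are illustrative assumptions, not the real API contract.

```text
POST /v1/reserve      # hypothetical body: { "org": "acme", "pool": "omani.works" }
  -> { "fqdn": "acme.omani.works", "reservation": "<token>" }

POST /v1/commit       # hypothetical body: { "reservation": "<token>" }
  -> writes the 6-record set into the per-Sovereign PowerDNS zone
     and the NS delegation into the parent zone
```

The reserve/commit split lets the wizard show the candidate FQDN before anything is written to DNS.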

2.3 Per-Sovereign supporting services

These run once per Sovereign (on the mgt cluster, with sibling replicas in workload regions where noted). They are part of the Catalyst control plane.

| Component | Purpose |
|---|---|
| keycloak | User identity. Per-Org realm in SME-style Sovereigns; per-Sovereign realm in corporate-style. |
| openbao | Secret backend. Primary on mgt; sibling Raft cluster per workload region with async perf replication. No stretched clusters. See SECURITY.md §5. |
| spire (server + agent) | SPIFFE/SPIRE workload identity. 5-min rotating SVIDs. Root server on mgt; per-host-cluster agent + cluster-local SPIRE-server replica. |
| nats-jetstream | Event spine (pub/sub + Streams + KV). Per-Organization Accounts. Replaces Redpanda + Valkey for the control plane only. Apache 2.0. |
| gitea | Per-Sovereign Git server. Hosts the conventional Gitea Orgs: catalog (public Blueprint mirror), catalog-sovereign (Sovereign-curated private Blueprints), one per Catalyst Organization (each with shared-blueprints + one repo per Application), and system (sovereign-admin scope). See GLOSSARY.md §"Gitea Orgs". |
| observability (Grafana stack) | Catalyst's own self-monitoring: Alloy collector, Loki (logs), Mimir (metrics), Tempo (traces), Grafana visualization. Customer Application telemetry also flows here unless an Org installs its own observability stack. |

3. Per-host-cluster infrastructure (on every host cluster: mgt, rtz, dmz)

These are deployed on every host cluster a Sovereign owns — not just the management cluster. They form the substrate Catalyst (and Application workloads) sit on. Installed by the bootstrap kit during Phase 0 (or by Crossplane when a new region is added later).

3.1 Networking and service mesh

| Component | Purpose |
|---|---|
| cilium | CNI + Service Mesh (eBPF). mTLS, L7 policies, Gateway API. |
| external-dns | DNS sync (registers/deletes records via the PowerDNS API). |
| powerdns | Authoritative DNS + DNSSEC + lua-records (geo + health-checked failover). See MULTI-REGION-DNS.md for the failover patterns. |
| coraza | WAF (OWASP CRS) at the DMZ edge. |
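For concreteness, a health-checked lua-record could look like the fragment below. `ifportup()` is a standard PowerDNS lua-records function; the owner name, TTL, and addresses are placeholders, not values from this platform.

```text
; illustrative gateway record with health-checked failover:
; answer with whichever regional gateway currently answers on :443
gw.example-sovereign.tld. 30 IN LUA A "ifportup(443, {'198.51.100.10', '203.0.113.10'})"
```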

3.2 GitOps and IaC

| Component | Purpose |
|---|---|
| flux | GitOps reconciler. One Flux instance per vcluster (lightweight: source + kustomize + helm controllers). Plus a host-level Flux on each host cluster for Catalyst itself. |
| crossplane | The only IaC. Manages all non-Kubernetes resources via Compositions. Never user-facing. Installed on the mgt cluster (manages cloud resources for the whole Sovereign). |
| opentofu | Bootstrap IaC only. Used in Phase 0 of Sovereign provisioning by catalyst-provisioner, then archived. Not deployed on host clusters. |
| sealed-secrets | Bootstrap-only secret distribution. Used during Phase 0 to seal the initial OpenBao unseal keys before ESO + OpenBao are up. Decommissioned after the Phase 1 hand-off; ESO + OpenBao is the day-2 secret pipeline. |
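The per-vcluster Flux wiring amounts to a GitRepository pointing at an Application repo's environment branch plus a Kustomization reconciling it. A minimal sketch, with hypothetical repo URL, names, and manifest path (the apiVersions are standard Flux v2):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-app                 # hypothetical Application repo
  namespace: flux-system
spec:
  interval: 1m
  url: https://gitea.sovereign.example/org-acme/my-app
  ref:
    branch: develop            # the branch matching this Environment
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: my-app
  path: ./deploy               # hypothetical manifest path
  prune: true
```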

3.3 Security and policy

| Component | Purpose |
|---|---|
| cert-manager | TLS certificate automation. |
| external-secrets | ESO — reads OpenBao paths, materializes K8s Secrets. |
| kyverno | Policy engine — admission control, mutation, generation. |
| trivy | Image and IaC vulnerability scanning (CI + runtime). |
| falco | Runtime security (eBPF). |
| sigstore | Container image signing verification (cosign admission). |
| syft-grype | SBOM generation + vulnerability matching. |

3.4 Scaling and operations

| Component | Purpose |
|---|---|
| vpa | Vertical Pod Autoscaler — right-sizing. |
| keda | Event-driven horizontal autoscaling, scale-to-zero. |
| reloader | Auto-restart Pods when ConfigMap/Secret hashes change. |

3.5 Storage and registry

| Component | Purpose |
|---|---|
| seaweedfs | Unified S3 layer. Acts as the encapsulation in front of cloud archival storage — every Catalyst component talks to one S3 endpoint while SeaweedFS routes hot/warm/cold tiers transparently. |
| velero | K8s backup/restore. Backups land in cloud archival storage. |
| harbor | Container registry per host cluster. Stores Catalyst component images, mirrored Blueprint OCI artifacts, customer images. |

3.6 Resilience

| Component | Purpose |
|---|---|
| failover-controller | Multi-region failover orchestration. Lease-based (cloud witness) to prevent split-brain. |

4. Application Blueprints (optional, à la carte)

These are not part of the Catalyst control plane. Users install them as Applications when they need them.

4.1 Data services

| Blueprint | Purpose | Multi-region replication |
|---|---|---|
| cnpg | PostgreSQL operator | WAL streaming (async primary-replica) |
| ferretdb | MongoDB wire protocol on PostgreSQL | Via CNPG WAL streaming |
| strimzi | Apache Kafka streaming | MirrorMaker2 |
| valkey | Redis-compatible cache | REPLICAOF |
| clickhouse | OLAP analytics | ReplicatedMergeTree |
| opensearch | Search + hot SIEM backend | Cross-cluster replication |

4.2 CDC

| Blueprint | Purpose |
|---|---|
| debezium | Change data capture |

4.3 Workflow and processing

| Blueprint | Purpose |
|---|---|
| temporal | Saga orchestration + compensation |
| flink | Stream + batch processing |

4.4 Data lakehouse

| Blueprint | Purpose |
|---|---|
| iceberg | Open table format |

4.5 Communication

| Blueprint | Purpose |
|---|---|
| stalwart | Email server (JMAP/IMAP/SMTP) |
| stunner | K8s-native TURN/STUN |
| livekit | Video/audio (WebRTC SFU) |
| matrix | Team chat (Matrix protocol; Synapse is the server implementation) |
| guacamole | Clientless remote-desktop gateway (RDP/VNC/SSH/kubectl-exec via browser, Keycloak SSO, full session recording to SeaweedFS) |

4.6 AI / ML

| Blueprint | Purpose |
|---|---|
| knative | Serverless platform |
| kserve | Model serving |
| vllm | LLM inference |
| milvus | Vector database |
| neo4j | Graph database |
| librechat | Chat UI |
| bge | Embeddings + reranking |
| llm-gateway | Subscription proxy for Claude Code |
| anthropic-adapter | OpenAI-to-Anthropic translation |

4.7 AI safety and observability

| Blueprint | Purpose |
|---|---|
| nemo-guardrails | AI safety firewall |
| langfuse | LLM observability |

4.8 Identity and metering

| Blueprint | Purpose |
|---|---|
| openmeter | Usage metering |

4.9 Chaos engineering

| Blueprint | Purpose |
|---|---|
| litmus | Chaos engineering experiments |

5. Composite Blueprints (Products)

OpenOva ships these as ready-made composite Blueprints. Each is a package of Blueprints with curated configuration:

| Composite | Composes |
|---|---|
| bp-catalyst-platform | The Catalyst control plane itself — see §2 above. |
| bp-cortex | AI Hub — kserve, knative, vllm, milvus, neo4j, librechat, bge, llm-gateway, anthropic-adapter, nemo-guardrails, langfuse |
| bp-axon | SaaS LLM Gateway (also installable as a managed gateway when Cortex is too heavy) |
| bp-fingate | Open Banking — keycloak (FAPI mode), openmeter, ext_authz + 6 banking services |
| bp-fabric | Data & Integration — strimzi, flink, temporal, debezium, iceberg, clickhouse, seaweedfs |
| bp-relay | Communication — stalwart, livekit, stunner, matrix, guacamole |

OpenOva also ships Specter (AIOps agents) and Exodus (migration program). Specter is a composite Blueprint (bp-specter) typically installed in corporate Sovereigns. Exodus is a deliverable services engagement, not a Blueprint.


6. Multi-Region Architecture

```mermaid
flowchart TB
    subgraph Mgt["Management host cluster (one per Sovereign)"]
        CC[Catalyst control plane]
        Gitea
        Bao0[OpenBao primary]
        Nats[NATS JetStream]
        KC[Keycloak]
    end

    subgraph RegionA["Region A (rtz + dmz)"]
        K8sA[Workload host cluster<br>per-Org vclusters]
        BaoA[OpenBao replica<br>region-local Raft]
        NatsA[NATS leaf node]
        IngressA[Cilium Gateway + WAF]
    end

    subgraph RegionB["Region B (rtz + dmz)"]
        K8sB[Workload host cluster<br>per-Org vclusters]
        BaoB[OpenBao replica<br>region-local Raft]
        NatsB[NATS leaf node]
        IngressB[Cilium Gateway + WAF]
    end

    Mgt -->|"Crossplane provisions"| RegionA
    Mgt -->|"Crossplane provisions"| RegionB
    Bao0 -.->|"async perf replication"| BaoA
    Bao0 -.->|"async perf replication"| BaoB
    Nats <-->|"leaf node sync"| NatsA
    Nats <-->|"leaf node sync"| NatsB
    IngressA <-.->|"PowerDNS lua-records (geo + health-checked failover)"| IngressB
```

Each region is its own failure domain. OpenBao Raft is intra-region only; cross-region is async perf replication. See SECURITY.md §5.


7. Resource estimates

7.1 Catalyst control plane (per Sovereign, on the mgt cluster)

This is the budget for the Catalyst-specific layer only — the components in §2. Per-host-cluster infrastructure (§3 — Cilium, Flux, Crossplane, Kyverno, Harbor, etc.) runs on the mgt cluster too, but its budget is in §7.4 below.

| Layer | Approx RAM | Notes |
|---|---|---|
| Control-plane services (console, projector, catalog-svc, provisioning, environment-controller, blueprint-controller, billing) | ~3 GB | Several small Go services |
| NATS JetStream | ~0.5 GB | 3 replicas |
| OpenBao | ~1.5 GB | 3-node Raft |
| Keycloak (corporate / shared-sovereign) | ~2 GB | HA, Postgres-backed |
| Keycloak (SME / per-organization × N orgs) | ~150 MB × N | Single replica each, embedded H2 |
| Gitea | ~1 GB | |
| SPIRE server | ~0.3 GB | |
| Catalyst observability (Grafana stack) | ~3 GB | Grafana, Loki, Mimir, Tempo, Alloy |
| Catalyst-only subtotal | ~11.3 GB | For the mgt cluster |

For a single-region SME Sovereign with 100 Orgs: ~11.3 GB Catalyst + 100 × 150 MB Keycloak ≈ ~26 GB Catalyst-only on the management host cluster (before per-host-cluster infrastructure overhead).
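The stack-up above is plain arithmetic; as a quick sanity check (the 11.3 GB subtotal and the 150 MB-per-Org Keycloak figure are taken directly from the table):

```shell
# Catalyst-only RAM on the mgt cluster for an SME Sovereign with N Orgs:
# fixed subtotal (GB) + N * per-Org Keycloak footprint (GB)
N=100
awk -v n="$N" 'BEGIN { printf "%.1f GB\n", 11.3 + n * 0.150 }'
```

This prints `26.3 GB`, matching the "~26 GB" quoted above.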

7.2 Per-Organization vcluster (workload regions)

| Layer | Approx RAM |
|---|---|
| vcluster control plane | ~150 MB |
| Lightweight Flux | ~150 MB |
| ESO + reloader | ~100 MB |
| Subtotal per Org per region | ~400 MB + workload RAM |

7.3 Per-Application

Application-specific. A WordPress with embedded Postgres on a medium overlay: ~2 GB. A multi-region Strimzi Kafka cluster: 4–16 GB per region.

7.4 Per-host-cluster infrastructure overhead

Adds to every host cluster a Sovereign owns (mgt, rtz, dmz):

| Layer | Approx RAM | Notes |
|---|---|---|
| Cilium | ~0.5 GB | Per node; agents + Hubble |
| Flux (host-level) | ~0.2 GB | source + kustomize + helm controllers |
| Crossplane | ~0.5 GB | Only on mgt; manages cloud resources for the whole Sovereign |
| cert-manager | ~0.2 GB | |
| ESO | ~0.2 GB | |
| Kyverno | ~0.5 GB | |
| Trivy Operator | ~0.5 GB | |
| Falco | ~0.5 GB | Per node |
| Harbor | ~3 GB | Per host cluster |
| SeaweedFS | ~1.2 GB | Per host cluster (3 master + 6 volume + 2 filer + 2 s3 replicas) |
| Velero | ~0.2 GB | |
| Reloader, VPA, KEDA, External-DNS, Sigstore, Syft+Grype, failover-controller | ~1.3 GB | Combined small operators |
| PowerDNS + dnsdist | ~0.4 GB | Authoritative DNS + rate-limit shield (mgt only) |
| Per-host-cluster subtotal | ~8.8 GB | Per host cluster |

Total mgt cluster RAM ≈ Catalyst (§7.1) + per-host-cluster (§7.4) ≈ ~20 GB + 100 × 150 MB Keycloak (SME tier with 100 orgs) ≈ ~35 GB.


8. Cluster deployment

8.1 K3s installation

```shell
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --disable traefik \
  --disable servicelb \
  --disable local-storage \
  --flannel-backend=none \
  --disable-network-policy \
  --kube-controller-manager-arg="node-monitor-period=5s" \
  --kube-controller-manager-arg="node-monitor-grace-period=20s" \
  --kube-apiserver-arg="default-watch-cache-size=50" \
  --etcd-arg="quota-backend-bytes=1073741824" \
  --kubelet-arg="max-pods=50"
```
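One of these flags is easier to audit in human units: quota-backend-bytes=1073741824 is exactly 1 GiB, a deliberately tight etcd quota for a small control plane. A one-liner confirms the conversion:

```shell
# express the etcd backend quota (bytes) in GiB
awk 'BEGIN { printf "%d GiB\n", 1073741824 / (1024 ^ 3) }'
```

printing `1 GiB`.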

8.2 Disabled K3s components

| Component | Replacement |
|---|---|
| traefik | Cilium Gateway API |
| servicelb | Cloud LB + PowerDNS lua-records for cross-region failover (see MULTI-REGION-DNS.md) |
| local-storage | Application-level replication |
| flannel | Cilium CNI |

8.3 Cilium installation

```shell
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=${API_SERVER_IP} \
  --set k8sServicePort=6443 \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set encryption.enabled=true \
  --set encryption.type=wireguard \
  --set gatewayAPI.enabled=true \
  --set envoy.enabled=true
```

9. User choice options

9.1 Cloud Provider

| Provider | Status | Crossplane provider |
|---|---|---|
| Hetzner Cloud | Available | hcloud |
| AWS | Available (Crossplane provider stable) | aws |
| GCP | Available (Crossplane provider stable) | gcp |
| Azure | Available (Crossplane provider stable) | azure |
| Oracle Cloud (OCI) | Available | oci |
| Huawei Cloud | Available | huaweicloud |

Hetzner is the most-tested path; the OpenOva Sovereign runs on Hetzner.

9.2 Regions

| Option | Description |
|---|---|
| 1 region | SME default — single rtz cluster, no geographic redundancy |
| 2 regions | Recommended for production — symmetric rtz clusters + DMZ, PowerDNS lua-records route across regions |
| 3+ regions | Regulated tier — adds a DR replica region |

9.3 LoadBalancer

| Option | How | Cost |
|---|---|---|
| Cloud Provider LB | Native LB | ~EUR 5–10/mo |
| PowerDNS lua-records | Cilium Gateway + PowerDNS authoritative + dnsdist | Free |
| Cilium L2 Mode | ARP-based (same subnet) | Free |

9.4 DNS Provider

The customer registers the Sovereign domain themselves; Cloudflare is a frequent default. Per-cloud DNS providers (Route53, Cloud DNS, Azure DNS, Hetzner DNS) work too — Crossplane providers exist for all.

9.5 Archival S3 Storage

| Provider | Notes |
|---|---|
| Cloudflare R2 | Always available; zero egress |
| AWS S3 | If AWS chosen |
| GCP GCS | If GCP chosen |
| Azure Blob | If Azure chosen |
| OCI Object Storage | If OCI chosen |

10. SIEM / SOAR architecture

```mermaid
flowchart LR
    subgraph Detect
        Falco
        Trivy
        Kyverno
    end

    Detect -->|"Falcosidekick / hooks"| Strimzi["Strimzi/Kafka<br>(Application Blueprint)"]
    Strimzi --> OS["OpenSearch<br>(hot SIEM)"]
    OS -->|"Age-out"| CH["ClickHouse<br>(cold storage)"]
    OS -->|"Correlation"| Specter["bp-specter<br>(AIOps Blueprint)"]
    Specter -->|"Auto-remediate"| Detect
```

This pipeline is not part of the Catalyst control plane — it's a composition of Application Blueprints (Strimzi for transport, OpenSearch for hot SIEM, ClickHouse for cold storage, bp-specter for SOAR/correlation) plus per-host-cluster security tooling already there (Falco, Trivy, Kyverno). Customers install OpenSearch + ClickHouse + bp-specter when they want SIEM; the rest is already running.

The Catalyst control plane's own audit log (commits, RBAC events, SecretPolicy actions) ships to OpenSearch via this pipeline when the SIEM components are installed; otherwise audit logs are retained in the local Grafana stack with rotation.


11. License posture

Every Catalyst control-plane component carries an open-source license that allows redistribution as a customer-deployable platform:

| Component | License | Notes |
|---|---|---|
| OpenBao | MPL 2.0 | Fork of Vault's pre-BSL MPL 2.0 code; OK to redistribute. |
| NATS JetStream | Apache 2.0 | Clean. |
| Cilium | Apache 2.0 | Clean. |
| Flux | Apache 2.0 | Clean. |
| Crossplane | Apache 2.0 | Clean. |
| Gitea | MIT | Clean. |
| Keycloak | Apache 2.0 | Clean. |
| cert-manager | Apache 2.0 | Clean. |
| ESO | Apache 2.0 | Clean. |
| OpenTofu | MPL 2.0 | Clean (Terraform fork). |
| OpenSearch | Apache 2.0 | Clean (Elasticsearch fork). |
| Valkey | BSD-3 | Clean (Redis fork). |

Application Blueprints carry their upstream licenses; most are Apache 2.0 (e.g. CNPG, Strimzi, FerretDB, vLLM), though not all (e.g. Valkey is BSD-3). The Catalyst control plane never bundles BSL-licensed software.


See ARCHITECTURE.md for how these components fit together. See docs/TECHNOLOGY-FORECAST-2027-2030.md for the roadmap.