Closes #133.

The previous §3 used a target/aspirational diagram with no cross-link to the actual implementation. Per the orchestrator brief and INVIOLABLE-PRINCIPLES.md #3 ('follow the documented architecture, exactly') + #7 ('verify before claiming done'), §3 now records what exists in this monorepo, where, and what is verifiably runtime-true vs structurally-complete.

Changes:
- Status header updated: 'design-stage' → 'deployed shape exists; DoD pending'
- §3 replaced the target ASCII diagram with a 5-row table mapping each bootstrap step to its concrete artifact:
  1. Wizard → tofu vars: products/catalyst/bootstrap/api/internal/provisioner/
  2. Cloud resources: infra/hetzner/main.tf
  3. k3s + Flux bootstrap: infra/hetzner/cloudinit-control-plane.tftpl + cloudinit-worker.tftpl
  4. Bootstrap-kit install: clusters/<sovereign-fqdn>/ Flux-reconciled, 11 G2 charts in dependency order matching the canonical sequence (cilium → cert-manager → flux → crossplane → sealed-secrets → spire → nats-jetstream → openbao → keycloak → gitea → bp-catalyst-platform)
  5. Crossplane adoption / sealed-secrets decommission at Phase 1 hand-off
- DNS records section preserved (managed-pool only; BYO require customer CNAME)
- OpenTofu state location specified (catalyst-api PVC; air-gap remote backend guidance retained)
- Implementation-status banner cross-links IMPLEMENTATION-STATUS.md §7 + PROVISIONING-PLAN.md Group M for end-to-end DoD

What did NOT change: the architectural model (Phase 0 OpenTofu, Phase 1 Crossplane adoption, Flux as GitOps, Blueprints as install unit) is preserved exactly per INVIOLABLE-PRINCIPLES.md #3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sovereign Provisioning
Status: Authoritative procedure. Updated: 2026-04-28.
Implementation: §3 below now reflects the deployed shape — the Go provisioner, OpenTofu module, and 11 G2 wrapper Helm charts all exist in this monorepo today (per IMPLEMENTATION-STATUS.md §7). End-to-end DoD against a real Hetzner project is pending Group M of PROVISIONING-PLAN.md. Catalyst-Zero (Contabo k3s, namespace catalyst) is the running catalyst-provisioner today.
How to provision a new Sovereign — a self-sufficient deployed instance of Catalyst. Defer to GLOSSARY.md for terminology and ARCHITECTURE.md for the model.
1. Inputs
| Input | Required | Notes |
|---|---|---|
| Cloud provider | Hetzner / AWS / GCP / Azure / OCI / Huawei | Hetzner is the most-tested path. |
| Cloud credentials | Provider API token | Used by OpenTofu (one-shot bootstrap) and Crossplane (ongoing). |
| Sovereign name | e.g. omantel, bankdhofar | Slug, lowercase, 3–32 chars. |
| Sovereign domain | e.g. omantel.openova.io, bankdhofar.com | Customers may use openova subdomains initially, then migrate. |
| Region(s) | 1+ | Single-region simplest for SME; 2+ for regulated/HA. |
| Building blocks per region | typically mgt + rtz (+ dmz) | At minimum mgt + rtz. |
| Keycloak topology | per-organization (SME) / shared-sovereign (corporate) | Determines Keycloak deployment shape. |
| Federation IdP (optional) | Azure AD / Okta / Google / etc. | For corporate; SME tier defers to per-Org Org-IdP federation. |
| TLS strategy | Let's Encrypt / cert-manager / corporate CA | cert-manager-managed, Let's Encrypt by default. |
| Object storage | Cloud-provider native | Used as the cold-tier backend behind SeaweedFS (which is the in-cluster S3 encapsulation layer that all consumers — Velero, Harbor, CNPG WAL, OpenSearch snapshots, Loki/Mimir/Tempo, Iceberg — talk to). |
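The Sovereign name constraint in the table (slug, lowercase, 3–32 chars) can be checked before anything is provisioned. The following is a minimal Python sketch, not the wizard's actual validation. The exact character set is an assumption: the table only says "slug", so this sketch allows lowercase letters, digits, and hyphens, requires a leading letter, and rejects a trailing hyphen. `is_valid_sovereign_name` is a hypothetical helper name.

```python
import re

# Assumption: a "slug" means lowercase letters, digits, and hyphens, starting
# with a letter and not ending with a hyphen. The source table only states
# "slug, lowercase, 3-32 chars"; the character set here is illustrative.
# Length: 1 leading char + 1..30 middle chars + 1 trailing char = 3..32.
_SLUG = re.compile(r"[a-z][a-z0-9-]{1,30}[a-z0-9]")


def is_valid_sovereign_name(name: str) -> bool:
    """Return True if name is a lowercase slug of 3 to 32 characters."""
    return _SLUG.fullmatch(name) is not None
```

Rejecting a bad name at input time is cheaper than discovering it later, because the name ends up in DNS labels and cluster paths.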
2. Provisioning runs from catalyst-provisioner
The bootstrap is performed by catalyst-provisioner.openova.io, an always-on provisioning service operated by OpenOva. It is not part of any Sovereign at runtime — once a Sovereign is up, it is fully self-sufficient.
Why a permanent provisioner instead of "boot from your laptop":
- OpenTofu state must be durably stored — keeping it on a single person's laptop is fragile and a security risk.
- Provider credentials are scoped, stored in OpenBao on the provisioner, and never leave it.
- New Sovereigns can be created without a manual installer dance — the same machinery serves the next Sovereign provisioning request, regardless of who initiates it.
A self-host route exists for organizations that want zero OpenOva involvement: catalyst-provisioner is itself a Blueprint (bp-catalyst-provisioner) and can be deployed in a customer's own infrastructure. From there it bootstraps further Sovereigns. This is the air-gap path.
3. Phase 0 — Bootstrap
The implementation maps onto concrete artifacts in this monorepo, one row per bootstrap step:
| Step | Lives in | What runs |
|---|---|---|
| 1. Wizard input → tofu vars | products/catalyst/bootstrap/api/internal/provisioner/ | Go service writes tofu.auto.tfvars.json from validated wizard input, runs tofu init && tofu plan && tofu apply -auto-approve against the canonical OpenTofu module, streams stdout/stderr lines to the wizard via SSE. No cloud APIs called from Go (per INVIOLABLE-PRINCIPLES.md #3). |
| 2. Cloud resources | infra/hetzner/main.tf | OpenTofu provisions: hcloud_network (10.0.0.0/16) + subnet (10.0.1.0/24), hcloud_firewall (80/443/6443/ICMP open; 22 closed by default; operator adds source-CIDR rule via Crossplane post-bootstrap), hcloud_ssh_key from wizard input, 1 control-plane server (or 3 if ha_enabled) on Ubuntu 24.04 with cloud-init, worker_count worker servers, hcloud_load_balancer (lb11) targeting NodePorts 31080/31443. DNS records written via null_resource.dns_pool → catalyst-dns helper invoking the Dynadot API (managed pool domains only; BYO Sovereigns require the customer to point their own CNAME at the LB IP). |
| 3. k3s + Flux bootstrap | infra/hetzner/cloudinit-control-plane.tftpl | cloud-init on the control-plane node installs k3s v1.31.4+k3s1 with --flannel-backend=none --disable-network-policy --disable=traefik --disable=servicelb --disable=local-storage --tls-san=<sovereign-fqdn>, then installs Flux v2.4.0 core, then applies the Flux GitRepository + Kustomization pointing at clusters/<sovereign-fqdn>/ in the public OpenOva monorepo. From this point Flux owns the cluster. Workers join via cloudinit-worker.tftpl using the project-derived k3s_token. |
| 4. Bootstrap-kit install | clusters/<sovereign-fqdn>/ (Flux-reconciled) | Flux installs the 11 G2 wrapper Helm charts (each a bp-<name>:<semver> OCI artifact published by .github/workflows/blueprint-release.yaml) in dependency order: cilium → cert-manager → flux (host-level reconciler for the cluster's own Kustomizations) → crossplane → sealed-secrets (transient) → spire (server + agent) → nats-jetstream → openbao (3-node Raft) → keycloak (per topology choice) → gitea (with public Blueprint mirror) → bp-catalyst-platform (umbrella). |
| 5. Crossplane adoption | Crossplane Compositions in clusters/<sovereign-fqdn>/ | Crossplane adopts management of all infrastructure created by OpenTofu in step 2; sealed-secrets is decommissioned in favour of ESO + OpenBao for day-2 secret distribution; further DNS records (gitea/admin/api/harbor) written through the Crossplane provider rather than Dynadot directly. Phase 1 begins (see §4). |
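Step 1 of the table can be illustrated with a short sketch. The real provisioner is a Go service; the Python below is only an illustration of the same flow, under stated assumptions. The file name `tofu.auto.tfvars.json` and the command sequence come from the table. The helper names (`write_tfvars`, `tofu_commands`) and the `sovereign_fqdn` variable in the usage note are hypothetical. OpenTofu loads any `*.auto.tfvars.json` file in the working directory automatically, so no `-var-file` flag is needed.

```python
import json
from pathlib import Path


def write_tfvars(workdir: Path, wizard_input: dict) -> Path:
    """Write validated wizard input as tofu.auto.tfvars.json in workdir."""
    workdir.mkdir(parents=True, exist_ok=True)
    path = workdir / "tofu.auto.tfvars.json"
    # sort_keys keeps the file stable across runs, which makes diffs readable.
    path.write_text(json.dumps(wizard_input, indent=2, sort_keys=True) + "\n")
    return path


def tofu_commands() -> list:
    """Return the command sequence from step 1, in execution order.

    The commands are returned rather than executed so a caller can run each
    one in workdir and stream its output line by line.
    """
    return [
        ["tofu", "init"],
        ["tofu", "plan"],
        ["tofu", "apply", "-auto-approve"],
    ]
```

A caller might invoke `write_tfvars(Path("/tmp/example"), {"sovereign_fqdn": "omantel.openova.io", "ha_enabled": False, "worker_count": 2})` and then run each command from `tofu_commands()` with that directory as the working directory. `ha_enabled` and `worker_count` appear in the table above; the remaining variable names in the real module may differ.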
The wizard's progress page polls Flux Kustomizations on the new cluster and renders steady-state to the user when every Kustomization is Ready=True.
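The readiness rule above (every Kustomization is Ready=True) can be sketched over plain Kubernetes objects. This is a hedged illustration, not the wizard's code. It assumes one Flux Kustomization per bootstrap-kit chart, named after the chart, which the source does not state. The `status.conditions` shape with `type: Ready` and `status: "True"` is the standard Kubernetes condition format that Flux uses. `first_pending` is a hypothetical helper that reports which component the bootstrap is waiting on, following the dependency order from step 4.

```python
from typing import Optional

# Dependency order of the bootstrap kit, as listed in step 4 of the table.
BOOTSTRAP_ORDER = [
    "cilium", "cert-manager", "flux", "crossplane", "sealed-secrets", "spire",
    "nats-jetstream", "openbao", "keycloak", "gitea", "bp-catalyst-platform",
]


def is_ready(kustomization: dict) -> bool:
    """True if status.conditions contains a Ready condition with status "True"."""
    conditions = kustomization.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def first_pending(kustomizations: list) -> Optional[str]:
    """Return the earliest component that is missing or not Ready, else None.

    Assumption: each Kustomization's metadata.name equals the chart name.
    """
    by_name = {k["metadata"]["name"]: k for k in kustomizations}
    for name in BOOTSTRAP_ORDER:
        if name not in by_name or not is_ready(by_name[name]):
            return name
    return None
```

When `first_pending` returns `None`, every component is Ready and the progress page can show steady state. Any other return value names the first component in the chain that is still reconciling.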
DNS records written in Phase 0:
*.<subdomain>.<pool-domain> A → load balancer IP
console.<subdomain>.<pool-domain> A → load balancer IP
gitea.<subdomain>.<pool-domain> A → load balancer IP
harbor.<subdomain>.<pool-domain> A → load balancer IP
admin.<subdomain>.<pool-domain> A → load balancer IP
api.<subdomain>.<pool-domain> A → load balancer IP
(Per NAMING §5.1 the canonical control-plane DNS pattern is {component}.{location-code}.{sovereign-domain} — the wildcard handles per-Application records under per-Environment subdomains.)
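The Phase 0 record set listed above follows one pattern, so it can be generated from three inputs. The sketch below is illustrative only and does not call the Dynadot API or the catalyst-dns helper. `phase0_records` is a hypothetical name, and the example values in the usage note are placeholders (203.0.113.10 is a documentation-range address).

```python
import ipaddress

# Host labels written in Phase 0, in the order listed above.
PHASE0_HOSTS = ["*", "console", "gitea", "harbor", "admin", "api"]


def phase0_records(subdomain: str, pool_domain: str, lb_ip: str) -> list:
    """Return (name, type, value) tuples for the Phase 0 A records."""
    # A records carry IPv4 addresses; this raises ValueError on anything else.
    ipaddress.IPv4Address(lb_ip)
    return [
        (f"{host}.{subdomain}.{pool_domain}", "A", lb_ip) for host in PHASE0_HOSTS
    ]
```

For example, `phase0_records("omantel", "openova.io", "203.0.113.10")` yields six records, starting with `("*.omantel.openova.io", "A", "203.0.113.10")`.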
OpenTofu state: kept in the catalyst-api PVC under /var/lib/catalyst/tofu/<sovereign-fqdn>/ — re-running with the same FQDN is idempotent (tofu apply on existing state). For air-gap installs the operator MUST configure a remote backend with encryption-at-rest so the Hetzner token isn't carried only on a single PVC.
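The per-Sovereign state layout can be sketched as a path mapping plus an existing-state check. This is an assumption-laden illustration, not the provisioner's code. The root path comes from the paragraph above. The helper names are hypothetical, and the check for `terraform.tfstate` assumes the default local backend, whose state file keeps that name in OpenTofu. A deployment with a remote backend would not have this file.

```python
import re
from pathlib import Path

STATE_ROOT = Path("/var/lib/catalyst/tofu")

# Two or more dot-separated labels of lowercase letters, digits, and hyphens.
# This also rejects path separators and "..", so the FQDN cannot escape root.
_FQDN = re.compile(r"[a-z0-9-]+(\.[a-z0-9-]+)+")


def state_dir(sovereign_fqdn: str, root: Path = STATE_ROOT) -> Path:
    """Map a Sovereign FQDN to its OpenTofu working directory under root."""
    fqdn = sovereign_fqdn.lower()
    if _FQDN.fullmatch(fqdn) is None:
        raise ValueError(f"not a plausible FQDN: {sovereign_fqdn!r}")
    return root / fqdn


def has_existing_state(sovereign_fqdn: str, root: Path = STATE_ROOT) -> bool:
    """True if a local state file exists, meaning a re-run applies onto it.

    Assumption: default local backend, state stored as terraform.tfstate.
    """
    return (state_dir(sovereign_fqdn, root) / "terraform.tfstate").is_file()
```

Keying the directory on the FQDN is what makes a re-run with the same FQDN land on the same state, which is the idempotence property described above.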
Implementation status: the Go wrapper, OpenTofu module, and 11 G2 wrapper charts all exist today (verified at IMPLEMENTATION-STATUS.md §7). End-to-end DoD against a real Hetzner project is pending Group M of the Catalyst-Zero Provisioning Plan.
Total Phase 0 time: 30–60 minutes for a single-region Hetzner Sovereign once DoD lands.
4. Phase 1 — Hand-off
After Phase 0 completes:
- Crossplane in the new Sovereign adopts management of all infrastructure created by OpenTofu. From this point forward, all infrastructure changes go through Crossplane.
- The bootstrap k3s nodes are not "thrown away" — they are claimed by Crossplane via the cloud provider's adoption mechanism.
- OpenTofu state is archived and read-only. It is never touched again.
- catalyst-provisioner no longer has any active connection to the new Sovereign.
The Sovereign is now self-sufficient. It has the full Catalyst control-plane set per PLATFORM-TECH-STACK.md §2.3:
- Its own Crossplane managing further infrastructure.
- Its own OpenBao for secrets.
- Its own JetStream as event spine.
- Its own Keycloak for users.
- Its own SPIFFE/SPIRE for workload identity (5-min rotating SVIDs).
- Its own Gitea (with mirror of the public Blueprint catalog).
- Its own observability stack (Grafana + Alloy + Loki + Mimir + Tempo) for self-monitoring.
- Its own Catalyst control plane (console, marketplace, admin, projector, catalog-svc, provisioning, environment-controller, blueprint-controller, billing).
5. Phase 2 — Day-1 setup
The first sovereign-admin logs into console.<location-code>.<sovereign-domain>:
Day-1 actions
──────────────────────────────────────────────────────────────────
1. Configure cert-manager issuers (Let's Encrypt / corporate CA).
2. Configure backup destination (cloud object storage for Velero).
3. Configure Harbor with image-scanning policies.
4. (Optional) Federate Keycloak's catalyst-admin realm to corporate IdP.
5. (Optional) Configure observability exports (SIEM, Datadog, etc.).
6. Onboard the first Organization:
   Catalyst console → Admin → Organizations → New
   Provide: name, contact, plan.
   Environment-controller does NOT create vclusters yet.
   They are created when the first Environment is provisioned.
7. Create the first Environment in that Organization:
   Console → switch to Org context → Environments → New
   Environment-controller spins up a vcluster on the chosen host cluster
   and bootstraps Flux inside (watching the env-appropriate branch on
   every Application repo within this Org's Gitea Org). Apps not yet
   installed have no repos yet; repos are created on demand by the
   provisioning-service when each App is installed.
   Ready in ~60 seconds.
6. Phase 3 — Steady-state operation
From here on, the Sovereign runs autonomously. Sovereign-admins use the Catalyst admin UI for:
- Onboarding more Organizations
- Adding host clusters in new regions (Crossplane provisions them, environment-controller adopts them)
- Updating Catalyst itself (umbrella Blueprint version bumps, applied via Flux PR)
- Configuring SecretPolicies and EnvironmentPolicies
- Monitoring the Sovereign's own observability stack
- Reviewing audit logs
Everyday Application installs and configurations are done by org-admins and org-developers within their Organizations — see PERSONAS-AND-JOURNEYS.md.
7. Multi-region topology
7.1 Single-region (SME default)
Region A
└── Host cluster: hz-fsn-mgt-prod   ← Catalyst control plane + per-Org vclusters
    └── all building blocks collapse onto one cluster (mgt + rtz + dmz workloads
        in separate namespaces, with Cilium NetworkPolicies enforcing isolation)
Cheapest topology. Single-region failure = Sovereign down. Acceptable for SME tier where customers also accept SME-tier SLAs.
7.2 Multi-region (corporate default)
Region A (primary mgt)          Region B                       Region C (DR)
─────────────────               ─────────────                  ─────────────
hz-nbg-mgt-prod                 hz-fsn-rtz-prod                hz-hel-rtz-prod
  Catalyst control plane          per-Org vclusters              per-Org vclusters
  Gitea, JetStream, OpenBao,      (sibling realizations          (sibling realizations
  Keycloak, projector,            of each Org's Environment)     of each Org's Environment)
  catalog-svc, marketplace,
  console, admin, billing

hz-nbg-dmz-prod                 hz-fsn-dmz-prod                hz-hel-dmz-prod
  ingress, WAF, k8gb              ingress, WAF, k8gb             ingress, WAF, k8gb
The mgt building block is typically NOT replicated (one Catalyst control plane per Sovereign). The rtz and dmz blocks ARE replicated for workload HA.
OpenBao runs in BOTH the mgt cluster (primary) and each rtz region (replica) — see SECURITY.md §5 for replication semantics.
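The host cluster names in the diagrams appear to follow a four-part pattern: provider, region, building block, environment. That pattern is inferred from the examples here and is an assumption; the naming document is the authority. Under that assumption, a short sketch can parse the names and show which regions carry each building block. `HostCluster`, `parse_host_cluster`, and `regions_by_block` are hypothetical names.

```python
from typing import NamedTuple

# Building blocks named in this document.
BUILDING_BLOCKS = {"mgt", "rtz", "dmz"}


class HostCluster(NamedTuple):
    provider: str
    region: str
    block: str
    env: str


def parse_host_cluster(name: str) -> HostCluster:
    """Parse a name such as hz-fsn-mgt-prod (pattern assumed from examples)."""
    parts = name.split("-")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected provider-region-block-env, got {name!r}")
    cluster = HostCluster(*parts)
    if cluster.block not in BUILDING_BLOCKS:
        raise ValueError(f"unknown building block {cluster.block!r} in {name!r}")
    return cluster


def regions_by_block(names: list) -> dict:
    """Group regions by building block, preserving input order."""
    grouped = {}
    for name in names:
        cluster = parse_host_cluster(name)
        grouped.setdefault(cluster.block, []).append(cluster.region)
    return grouped
```

Applied to the six clusters in the multi-region diagram, `regions_by_block` returns one region for `mgt` and three each for `rtz` and `dmz`, which matches the replication rule stated above.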
8. Adding a region post-provisioning
sovereign-admin in Catalyst admin UI:
  Admin → Infrastructure → Add Region
    Provider: Hetzner
    Region: hel
    Building blocks: rtz, dmz
    Apply
Catalyst:
- Crossplane provisions the new VPC, hosts, k3s cluster, etc.
- Cluster registered in Catalyst's cluster registry.
- cert-manager + Cilium + Flux + Crossplane + SPIRE + ESO + OpenBao replica deployed via the cluster's Flux Kustomization.
- New region available as a Placement target for new and existing Environments.
Existing Applications with placement.mode: single-region do not migrate automatically. To extend an existing Application to the new region, the user explicitly switches Placement to active-active (or active-hotstandby) and adds the new region to placement.regions — that's a one-line edit in the Application's Gitea repo on the appropriate branch (or a click in the Topology tab).
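The Placement change described above amounts to two field updates. The sketch below shows them on a plain dictionary. The field names `placement.mode` and `placement.regions` and the mode values come from the paragraph; the surrounding manifest shape and the helper name `extend_placement` are assumptions for illustration, not the Application schema.

```python
import copy

# Multi-region modes named in the text above.
MULTI_REGION_MODES = {"active-active", "active-hotstandby"}


def extend_placement(app: dict, new_region: str, mode: str = "active-active") -> dict:
    """Return a copy of app with a multi-region mode and new_region added."""
    if mode not in MULTI_REGION_MODES:
        raise ValueError(f"mode must be one of {sorted(MULTI_REGION_MODES)}")
    updated = copy.deepcopy(app)  # leave the caller's manifest untouched
    placement = updated.setdefault("placement", {})
    placement["mode"] = mode
    regions = placement.setdefault("regions", [])
    if new_region not in regions:  # adding the same region twice is a no-op
        regions.append(new_region)
    return updated
```

Starting from `{"placement": {"mode": "single-region", "regions": ["fsn"]}}`, calling `extend_placement(app, "hel")` produces a placement with mode `active-active` and regions `["fsn", "hel"]`. Nothing happens to the Application until that explicit edit is made, which is the opt-in behaviour described above.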
9. Air-gap deployment
Connected zone (one-time)                Air-gapped Sovereign
──────────────────────────               ───────────────────────────────
1. Mirror public Blueprint OCI           Harbor receives blobs via physical
   artifacts to portable media.          transfer / data diode.
2. Mirror Catalyst control-plane         Sovereign's Gitea adopts blobs as
   container images.                     OCI manifests in local registry.
3. Mirror cert-manager root +            cert-manager configured with
   organization CA bundle.               internal CA only.
4. Configure Keycloak to local LDAP      Keycloak federates to internal AD/LDAP.
   (no external IdPs).
Catalyst is air-gap-ready by construction: every artifact (Blueprints, Catalyst code, base images) is OCI-signed. Mirror once, run forever.
10. Migration and decommission
10.1 Migrating an Organization between Sovereigns
Rare but supported. Example: a Bank Dhofar Organization started life on the openova Sovereign (paid SaaS), now wants to move to its own bankdhofar Sovereign (self-host).
1. Provision bankdhofar Sovereign (Phases 0–2).
2. On openova Sovereign: Admin → Organization → Export
   Catalyst produces an export bundle:
     - Org metadata
     - All Application Gitea repos under this Org (cloned + bundled, including all branches)
     - The Org's `shared-blueprints` repo
     - Keycloak realm export (users, federated identities)
     - OpenBao export (sealed secrets only)
3. On bankdhofar Sovereign: Admin → Organization → Import
   Environment-controller recreates Environments → vclusters.
   Flux pulls manifests, reconciles.
   Apps come up.
4. Final cutover: DNS swap.
5. Verify, then decommission on openova side.
Time depends on data volume; typically minutes to hours per Org.
10.2 Decommissioning a Sovereign
Reverse of provisioning:
1. Migrate all Organizations off (§10.1).
2. Catalyst admin → Sovereign → Decommission
3. Crossplane begins teardown of host clusters.
4. OpenBao final state exported and stored encrypted.
5. DNS records removed.
6. Cloud resources reclaimed.
The customer keeps the OpenBao export and Gitea bundles for whatever retention period their compliance demands.
Cross-reference ARCHITECTURE.md and SECURITY.md. For day-to-day operation see SRE.md.