feat(catalyst-api): infrastructure CRUD via Crossplane XRC + unified topology endpoint (#239)

Refactors the Sovereign Infrastructure surface so every Day-2 mutation
flows through a Crossplane Composite Resource Claim (XRC) the catalyst-
api writes against the SOVEREIGN cluster's kubeconfig — never bespoke
hcloud-go calls, never `exec.Command("kubectl", ...)`, never client-go
direct mutation. Per docs/INVIOLABLE-PRINCIPLES.md #3, Crossplane is
the ONLY Day-2 IaC seam.
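
The write seam in sketch form (type + helper names taken from
internal/infrastructure as the tests exercise them; depID/ctx/dyn are
assumed in scope and the spec field is illustrative, not the real
claim schema):

  // Build the claim as an unstructured object; Crossplane's matching
  // Composition, once present, expands it into managed resources.
  u := &unstructured.Unstructured{}
  u.SetAPIVersion(infrastructure.XRCAPIGroup + "/" + infrastructure.XRCAPIVersion)
  u.SetKind(infrastructure.KindRegionClaim)
  u.SetNamespace(infrastructure.XRCNamespace)
  u.SetName(infrastructure.XRCName(depID, "region", "hel1")) // DNS-1123 namer
  u.SetLabels(map[string]string{infrastructure.LabelDeploymentID: depID})
  _ = unstructured.SetNestedField(u.Object, "hel1", "spec", "region") // illustrative

  // dyn is the dynamic.Interface built from the SOVEREIGN kubeconfig;
  // it is the only mutation seam. AlreadyExists surfaces as HTTP 409.
  gvr := schema.GroupVersionResource{
      Group:    infrastructure.XRCAPIGroup,
      Version:  infrastructure.XRCAPIVersion,
      Resource: "regionclaims",
  }
  _, err := dyn.Resource(gvr).Namespace(infrastructure.XRCNamespace).
      Create(ctx, u, metav1.CreateOptions{})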

When the third-sibling chart's Composition for a given XRC kind isn't
present yet, Crossplane stores the claim and leaves it Pending; the
catalyst-api emits a Job log line "Awaiting Crossplane Composition
for <kind>" so an operator browsing /jobs sees the gap. Each mutation
also commits a Job + Execution + LogLines via the existing audit-trail
Bridge so the table view shows every Day-2 action.

Endpoints + XRC kinds (all Composition targets owned by the third-
sibling agent):

  GET    .../infrastructure/topology                                     unified hierarchical read
  POST   .../infrastructure/regions                       RegionClaim          region-composition
  POST   .../infrastructure/regions/{id}/clusters         ClusterClaim         cluster-composition
  POST   .../infrastructure/clusters/{id}/vclusters       VClusterClaim        vcluster-composition
  POST   .../infrastructure/clusters/{id}/pools           NodePoolClaim        nodepool-composition
  PATCH  .../infrastructure/pools/{id}                    NodePoolClaim        nodepool-composition
  POST   .../infrastructure/loadbalancers                 LoadBalancerClaim    lb-composition
  POST   .../infrastructure/peerings                      PeeringClaim         peering-composition
  POST   .../infrastructure/firewalls/{id}/rules          FirewallRuleClaim    firewall-composition
  POST   .../infrastructure/nodes/{id}/{cordon|drain|replace}
                                                          NodeActionClaim      node-action-composition
  DELETE .../infrastructure/{kind}/{id}                   <kind>'s claim       <kind>-composition

Response shape per write: 202 Accepted with
  { jobId, xrcKind, xrcName, status: "submitted-pending-composition",
    submittedAt, cascade?: [...] }
DELETE additionally returns a Cascade preview (computed from the live
topology) so the FE confirm dialog can render "deleting region X
will drain Y workloads, remove Z PVCs".
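
Illustratively, a DELETE .../regions/{id} 202 decodes to the value
below (field names per the tests' infrastructure.MutationResponse;
all concrete values invented):

  infrastructure.MutationResponse{
      JobID:   "job-7f3a...",           // invented id
      XRCKind: infrastructure.KindRegionClaim,
      XRCName: "dep-x-region-fsn1",     // XRCName(depID, "region", "fsn1")
      Status:  "submitted-pending-composition",
      Cascade: []infrastructure.CascadeImpact{
          {Kind: "cluster", ID: "cluster-dep-x-fsn1", Name: "omantel.omani.works",
              Note: "cluster will drain + be reaped"},
          {Kind: "node", ID: "node-w-0-fsn1", Name: "worker-1-fsn1",
              Note: "workloads will be drained"},
      },
  }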

Unified topology endpoint emits TopologyResponse: cloud[*] +
topology.regions[*].clusters[*].(vclusters|nodePools|nodes|
loadBalancers) + storage.(pvcs|buckets|volumes). The four FE tabs
(Topology, Compute, Storage, Network) all derive their views off this
single response. Live-source fields fall back to empty arrays — never
placeholder data per the founder's "no synthetic rows" rule. Legacy
flat /compute, /storage, /network endpoints stay wired with their
pre-existing shapes until the FE migrates.
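
In Go terms, a tab derivation is just a walk over the one response
(the real projections live in the FE; this is a sketch only):

  // computeRows — the Compute tab is this flattening of nodes across
  // regions × clusters in a single TopologyResponse.
  func computeRows(t infrastructure.TopologyResponse) []infrastructure.Node {
      rows := []infrastructure.Node{}
      for _, rg := range t.Topology.Regions {
          for _, c := range rg.Clusters {
              rows = append(rows, c.Nodes...)
          }
      }
      return rows
  }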

New files:
  - internal/infrastructure/types.go            wire types
  - internal/infrastructure/xrc.go              Crossplane writer + DNS-1123 namer
  - internal/infrastructure/topology_loader.go  composes from tofu outputs
                                                + informer cache + Crossplane MR list
  - internal/jobs/mutation_bridge.go            RegisterMutationJob /
                                                AppendXRCSubmittedLog /
                                                FinishMutationJob — every
                                                mutation lands in batch
                                                "day-2-mutations"

Tests (`go test -race`):
  - infrastructure_test.go        unified TopologyResponse shape +
                                  empty fallback for storage/peerings
                                  + legacy compute/storage/network
                                  endpoints unchanged
  - infrastructure_crud_test.go   per-endpoint 202 happy path, 404 unknown
                                  deployment, 409 XRC name conflict, 503
                                  on missing kubeconfig, DELETE cascade
                                  preview, audit-Job materialised in
                                  jobs.Store

CORS now allows PATCH + DELETE so the FE wire calls succeed.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3mrah 2026-04-30 11:57:46 +04:00 committed by GitHub
parent 6cbfbd18ef
commit 948bf61b91
8 changed files with 3102 additions and 272 deletions

View File

@ -32,7 +32,7 @@ func main() {
// from the new VM, not a browser), but enabling PUT here
// keeps the policy consistent for any future browser-side
// resume flow that re-uses the same endpoint.
AllowedMethods: []string{"GET", "POST", "PUT", "OPTIONS"},
AllowedMethods: []string{"GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"},
AllowedHeaders: []string{"Accept", "Content-Type", "Authorization"},
MaxAge: 300,
}))
@ -97,16 +97,34 @@ func main() {
// V1 emits a static placeholder shape — see dashboard.go header
// for the metrics-server upgrade plan.
r.Get("/api/v1/dashboard/treemap", h.GetDashboardTreemap)
-// Sovereign Infrastructure surface (issue #227) — Topology canvas
-// + Compute / Storage / Network card grids. Each endpoint reads
-// from the deployment record + (future) live cluster kubeconfig;
-// see internal/handler/infrastructure.go for the data-source
-// contract.
+// Sovereign Infrastructure surface — unified topology read +
+// Day-2 CRUD via Crossplane XRC writes (issue #227 + Day-2 IaC).
+// Read endpoints compose from the deployment record + live
+// cluster informer cache; mutation endpoints write Composite
+// Resource Claims to the Sovereign cluster's kubeconfig per
+// docs/INVIOLABLE-PRINCIPLES.md #3 (Crossplane is the ONLY
+// Day-2 IaC seam). Every mutation also commits a Job entry to
+// the existing /jobs surface for full audit-trail.
+r.Get("/api/v1/deployments/{depId}/infrastructure/topology", h.GetInfrastructureTopology)
r.Get("/api/v1/deployments/{depId}/infrastructure/topology", h.GetInfrastructureTopology)
r.Get("/api/v1/deployments/{depId}/infrastructure/compute", h.GetInfrastructureCompute)
r.Get("/api/v1/deployments/{depId}/infrastructure/storage", h.GetInfrastructureStorage)
r.Get("/api/v1/deployments/{depId}/infrastructure/network", h.GetInfrastructureNetwork)
+// CRUD — every endpoint writes a Crossplane XRC + a mutation Job.
+// The third-sibling chart authors the matching Compositions; until
+// they land Crossplane sits the claim Pending and the catalyst-api
+// surfaces "Awaiting Composition for <kind>" in the audit log.
+r.Post("/api/v1/deployments/{depId}/infrastructure/regions", h.CreateInfrastructureRegion)
+r.Post("/api/v1/deployments/{depId}/infrastructure/regions/{id}/clusters", h.CreateInfrastructureCluster)
+r.Post("/api/v1/deployments/{depId}/infrastructure/clusters/{id}/vclusters", h.CreateInfrastructureVCluster)
+r.Post("/api/v1/deployments/{depId}/infrastructure/clusters/{id}/pools", h.CreateInfrastructurePool)
+r.Patch("/api/v1/deployments/{depId}/infrastructure/pools/{id}", h.PatchInfrastructurePool)
+r.Post("/api/v1/deployments/{depId}/infrastructure/loadbalancers", h.CreateInfrastructureLoadBalancer)
+r.Post("/api/v1/deployments/{depId}/infrastructure/peerings", h.CreateInfrastructurePeering)
+r.Post("/api/v1/deployments/{depId}/infrastructure/firewalls/{id}/rules", h.CreateInfrastructureFirewallRule)
+r.Post("/api/v1/deployments/{depId}/infrastructure/nodes/{id}/{action}", h.CreateInfrastructureNodeAction)
+r.Delete("/api/v1/deployments/{depId}/infrastructure/{kind}/{id}", h.DeleteInfrastructureResource)
log.Info("catalyst api listening", "port", port)
if err := http.ListenAndServe(":"+port, r); err != nil {
log.Error("server error", "err", err)

View File: internal/handler/infrastructure_crud_test.go

@ -0,0 +1,592 @@
// infrastructure_crud_test.go — coverage for the Day-2 mutation
// endpoints (POST/PATCH/DELETE) that write Crossplane XRCs against
// the Sovereign cluster's dynamic client.
//
// The fake dynamic client is seeded with the right list-kinds for
// each XRC kind so the catalyst-api's create call returns a typed
// success without hitting a real apiserver. Tests assert:
//
// - 202 happy path: response carries jobId + xrcKind + xrcName + status
// - 404 unknown deployment
// - 409 conflict when same XRC name already exists
// - 503 when the sovereign cluster is unreachable (no kubeconfig)
// - DELETE returns the cascade preview
package handler
import (
"bytes"
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strings"
"sync"
"testing"
"github.com/go-chi/chi/v5"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/schema"
"k8s.io/client-go/dynamic"
dynamicfake "k8s.io/client-go/dynamic/fake"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/infrastructure"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/jobs"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/provisioner"
)
// xrcListKinds — every Composite Resource Claim kind the CRUD
// handlers can write. Mirrors the Kind* constants in
// internal/infrastructure/xrc.go. Tests register the matching
// list-kind names so the fake dynamic client's List+Create paths
// behave correctly.
func xrcListKinds() map[schema.GroupVersionResource]string {
mk := func(plural string) schema.GroupVersionResource {
return schema.GroupVersionResource{
Group: infrastructure.XRCAPIGroup,
Version: infrastructure.XRCAPIVersion,
Resource: plural,
}
}
out := map[schema.GroupVersionResource]string{
mk("regionclaims"): "RegionClaimList",
mk("clusterclaims"): "ClusterClaimList",
mk("vclusterclaims"): "VClusterClaimList",
mk("nodepoolclaims"): "NodePoolClaimList",
mk("loadbalancerclaims"): "LoadBalancerClaimList",
mk("peeringclaims"): "PeeringClaimList",
mk("firewallruleclaims"): "FirewallRuleClaimList",
mk("nodeactionclaims"): "NodeActionClaimList",
}
// The DELETE handler calls infrastructure.Load to compute the
// cascade preview, which queries vcluster.io/v1alpha1/vclusters
// and core/v1/persistentvolumeclaims. Register those kinds with
// the fake client so List doesn't panic on "unregistered list
// kind". Production hits a real apiserver that either has the
// CRD or returns 404 — both code paths return gracefully.
out[schema.GroupVersionResource{Group: "vcluster.io", Version: "v1alpha1", Resource: "vclusters"}] = "VClusterList"
out[schema.GroupVersionResource{Group: "", Version: "v1", Resource: "persistentvolumeclaims"}] = "PersistentVolumeClaimList"
return out
}
// fakeXRCDynamicFactory — closure factory the handler reads via
// h.dynamicFactory. Returns a single fake client seeded with the
// xrcListKinds map; tests can append additional unstructured
// objects to simulate pre-existing claims (for the 409 conflict
// path).
func fakeXRCDynamicFactory(seed ...runtime.Object) func(string) (dynamic.Interface, error) {
scheme := runtime.NewScheme()
client := dynamicfake.NewSimpleDynamicClientWithCustomListKinds(scheme, xrcListKinds(), seed...)
return func(_ string) (dynamic.Interface, error) {
return client, nil
}
}
// installCRUDDeployment — like installInfraDeployment but with a
// Result.KubeconfigPath pointing at a temp file so
// sovereignDynamicClient resolves to the injected fake. Each test
// gets its own deployment id so concurrent tests don't share a
// fake apiserver.
func installCRUDDeployment(t *testing.T, h *Handler, id string) *Deployment {
t.Helper()
path := filepath.Join(t.TempDir(), id+".yaml")
// Kubeconfig contents are ignored by the fake factory, but the
// file must exist + be readable. Per
// docs/INVIOLABLE-PRINCIPLES.md #10 the fake content carries no
// real credentials.
if err := os.WriteFile(path, []byte("apiVersion: v1\nkind: Config"), 0o600); err != nil {
t.Fatalf("write kubeconfig: %v", err)
}
dep := &Deployment{
ID: id,
Status: "ready",
Request: provisioner.Request{
SovereignFQDN: "omantel.omani.works",
Region: "fsn1",
ControlPlaneSize: "cpx21",
WorkerSize: "cpx41",
WorkerCount: 2,
HetznerProjectID: "test-project",
},
Result: &provisioner.Result{
SovereignFQDN: "omantel.omani.works",
ControlPlaneIP: "5.6.7.8",
LoadBalancerIP: "203.0.113.10",
KubeconfigPath: path,
},
mu: sync.Mutex{},
}
h.deployments.Store(id, dep)
return dep
}
// callCRUDInfra fires a request through a freshly-built chi router
// that knows the depId + nested path params. Returns the recorder.
func callCRUDInfra(t *testing.T, h *Handler, method, suffix string, depID string, body any, register func(r chi.Router, h *Handler)) *httptest.ResponseRecorder {
t.Helper()
r := chi.NewRouter()
register(r, h)
var buf *bytes.Buffer
if body != nil {
raw, err := json.Marshal(body)
if err != nil {
t.Fatalf("marshal: %v", err)
}
buf = bytes.NewBuffer(raw)
} else {
buf = bytes.NewBuffer(nil)
}
req := httptest.NewRequest(method, "/api/v1/deployments/"+depID+"/infrastructure/"+suffix, buf)
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
return rec
}
func mustDecodeMutation(t *testing.T, rec *httptest.ResponseRecorder) infrastructure.MutationResponse {
t.Helper()
var out infrastructure.MutationResponse
if err := json.Unmarshal(rec.Body.Bytes(), &out); err != nil {
t.Fatalf("decode mutation: %v body=%s", err, rec.Body.String())
}
return out
}
/* ── POST /infrastructure/regions ────────────────────────────── */
func TestCreateRegion_Happy(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-region-happy")
body := map[string]any{
"region": "hel1",
"skuCP": "cpx21",
"skuWorker": "cpx41",
"workerCount": 2,
}
rec := callCRUDInfra(t, h, http.MethodPost, "regions", dep.ID, body, func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/regions", h.CreateInfrastructureRegion)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
out := mustDecodeMutation(t, rec)
if out.XRCKind != infrastructure.KindRegionClaim {
t.Fatalf("xrcKind: got %q want %q", out.XRCKind, infrastructure.KindRegionClaim)
}
if !strings.Contains(out.XRCName, "region-hel1") {
t.Fatalf("xrcName must contain 'region-hel1': got %q", out.XRCName)
}
if out.JobID == "" {
t.Fatalf("jobId must be set")
}
if out.Status != "submitted-pending-composition" {
t.Fatalf("status: got %q want submitted-pending-composition", out.Status)
}
}
func TestCreateRegion_NotFound(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
rec := callCRUDInfra(t, h, http.MethodPost, "regions", "ghost",
map[string]any{"region": "hel1", "skuCP": "cpx21"},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/regions", h.CreateInfrastructureRegion)
})
if rec.Code != http.StatusNotFound {
t.Fatalf("status: got %d want 404; body=%s", rec.Code, rec.Body.String())
}
}
func TestCreateRegion_503WhenKubeconfigMissing(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
// Build a deployment WITHOUT a kubeconfig path so the
// sovereignDynamicClient short-circuits with 503.
dep := &Deployment{
ID: "dep-no-kubeconfig",
Status: "ready",
Request: provisioner.Request{
SovereignFQDN: "x.example",
Region: "fsn1",
ControlPlaneSize: "cpx21",
},
Result: &provisioner.Result{
// Intentionally empty KubeconfigPath
},
mu: sync.Mutex{},
}
h.deployments.Store(dep.ID, dep)
rec := callCRUDInfra(t, h, http.MethodPost, "regions", dep.ID,
map[string]any{"region": "hel1", "skuCP": "cpx21"},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/regions", h.CreateInfrastructureRegion)
})
if rec.Code != http.StatusServiceUnavailable {
t.Fatalf("status: got %d want 503; body=%s", rec.Code, rec.Body.String())
}
if !strings.Contains(rec.Body.String(), "sovereign-cluster-unreachable") {
t.Fatalf("expected sovereign-cluster-unreachable body; got %s", rec.Body.String())
}
}
func TestCreateRegion_409Conflict(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
// Pre-seed the fake dynamic client with the EXACT same XRC the
// handler will compute from (depID, "region", "hel1"). The
// second create() then returns AlreadyExists which the helper
// surfaces as ErrXRCNameConflict → HTTP 409.
depID := "dep-region-conflict"
wantName := infrastructure.XRCName(depID, "region", "hel1")
existing := newUnstructuredXRC(infrastructure.KindRegionClaim, wantName)
h.dynamicFactory = fakeXRCDynamicFactory(existing)
dep := installCRUDDeployment(t, h, depID)
rec := callCRUDInfra(t, h, http.MethodPost, "regions", dep.ID,
map[string]any{"region": "hel1", "skuCP": "cpx21"},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/regions", h.CreateInfrastructureRegion)
})
if rec.Code != http.StatusConflict {
t.Fatalf("status: got %d want 409; body=%s", rec.Code, rec.Body.String())
}
if !strings.Contains(rec.Body.String(), "xrc-name-conflict") {
t.Fatalf("expected xrc-name-conflict; got %s", rec.Body.String())
}
}
/* ── POST /infrastructure/regions/{id}/clusters ──────────────── */
func TestCreateCluster_Happy(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-cluster-happy")
rec := callCRUDInfra(t, h, http.MethodPost, "regions/region-fsn1/clusters", dep.ID,
map[string]any{"name": "edge-1", "version": "v1.30", "ha": false},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/regions/{id}/clusters", h.CreateInfrastructureCluster)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
out := mustDecodeMutation(t, rec)
if out.XRCKind != infrastructure.KindClusterClaim {
t.Fatalf("xrcKind: got %q want %q", out.XRCKind, infrastructure.KindClusterClaim)
}
}
/* ── POST /infrastructure/clusters/{id}/vclusters ────────────── */
func TestCreateVCluster_Happy(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-vcluster-happy")
rec := callCRUDInfra(t, h, http.MethodPost, "clusters/cluster-x/vclusters", dep.ID,
map[string]any{"name": "dmz", "namespace": "dmz", "role": "dmz"},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/clusters/{id}/vclusters", h.CreateInfrastructureVCluster)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
out := mustDecodeMutation(t, rec)
if out.XRCKind != infrastructure.KindVClusterClaim {
t.Fatalf("xrcKind: got %q want %q", out.XRCKind, infrastructure.KindVClusterClaim)
}
}
/* ── POST /infrastructure/clusters/{id}/pools ────────────────── */
func TestCreatePool_Happy(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-pool-happy")
rec := callCRUDInfra(t, h, http.MethodPost, "clusters/cluster-x/pools", dep.ID,
map[string]any{"name": "gpu-1", "role": "worker", "sku": "cpx51", "region": "fsn1", "desiredSize": 3},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/clusters/{id}/pools", h.CreateInfrastructurePool)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
out := mustDecodeMutation(t, rec)
if out.XRCKind != infrastructure.KindNodePoolClaim {
t.Fatalf("xrcKind: got %q want %q", out.XRCKind, infrastructure.KindNodePoolClaim)
}
}
/* ── PATCH /infrastructure/pools/{id} ─────────────────────────── */
func TestPatchPool_Happy(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-pool-patch")
size := 5
rec := callCRUDInfra(t, h, http.MethodPatch, "pools/gpu-1", dep.ID,
map[string]any{"desiredSize": size},
func(r chi.Router, h *Handler) {
r.Patch("/api/v1/deployments/{depId}/infrastructure/pools/{id}", h.PatchInfrastructurePool)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
out := mustDecodeMutation(t, rec)
if out.XRCKind != infrastructure.KindNodePoolClaim {
t.Fatalf("xrcKind: got %q want %q", out.XRCKind, infrastructure.KindNodePoolClaim)
}
}
func TestPatchPool_ConflictBecomes202(t *testing.T) {
// PATCH semantics: an existing claim with the same name is the
// expected case (PATCH targets convergence). Handler must NOT
// surface 409 for the patch path — it's 202.
h := NewWithPDM(silentLogger(), &fakePDM{})
depID := "dep-pool-patch-conflict"
xrcName := infrastructure.XRCName(depID, "pool", "gpu-1")
existing := newUnstructuredXRC(infrastructure.KindNodePoolClaim, xrcName)
h.dynamicFactory = fakeXRCDynamicFactory(existing)
dep := installCRUDDeployment(t, h, depID)
size := 5
rec := callCRUDInfra(t, h, http.MethodPatch, "pools/gpu-1", dep.ID,
map[string]any{"desiredSize": size},
func(r chi.Router, h *Handler) {
r.Patch("/api/v1/deployments/{depId}/infrastructure/pools/{id}", h.PatchInfrastructurePool)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
}
/* ── POST /infrastructure/loadbalancers ──────────────────────── */
func TestCreateLB_Happy(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-lb-happy")
rec := callCRUDInfra(t, h, http.MethodPost, "loadbalancers", dep.ID,
map[string]any{"name": "edge-lb", "region": "hel1", "ports": "443"},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/loadbalancers", h.CreateInfrastructureLoadBalancer)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
}
/* ── POST /infrastructure/peerings ────────────────────────────── */
func TestCreatePeering_Happy(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-peering-happy")
rec := callCRUDInfra(t, h, http.MethodPost, "peerings", dep.ID,
map[string]any{"name": "fsn1-hel1", "vpcFrom": "vpc-fsn1", "vpcTo": "vpc-hel1", "subnets": "10.0.0.0/16<>10.1.0.0/16"},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/peerings", h.CreateInfrastructurePeering)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
}
/* ── POST /infrastructure/firewalls/{id}/rules ───────────────── */
func TestCreateFirewallRule_Happy(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-fw-happy")
rec := callCRUDInfra(t, h, http.MethodPost, "firewalls/fw-1/rules", dep.ID,
map[string]any{"direction": "in", "protocol": "tcp", "port": "443", "sources": "0.0.0.0/0", "action": "accept"},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/firewalls/{id}/rules", h.CreateInfrastructureFirewallRule)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
}
/* ── POST /infrastructure/nodes/{id}/{action} ─────────────────── */
func TestCreateNodeAction_HappyDrain(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-node-drain")
rec := callCRUDInfra(t, h, http.MethodPost, "nodes/node-w-0/drain", dep.ID, nil,
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/nodes/{id}/{action}", h.CreateInfrastructureNodeAction)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
out := mustDecodeMutation(t, rec)
if out.XRCKind != infrastructure.KindNodeActionClaim {
t.Fatalf("xrcKind: got %q want %q", out.XRCKind, infrastructure.KindNodeActionClaim)
}
}
func TestCreateNodeAction_BadAction(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-node-bad")
rec := callCRUDInfra(t, h, http.MethodPost, "nodes/node-w-0/yeet", dep.ID, nil,
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/nodes/{id}/{action}", h.CreateInfrastructureNodeAction)
})
if rec.Code != http.StatusBadRequest {
t.Fatalf("status: got %d want 400", rec.Code)
}
}
/* ── DELETE /infrastructure/{kind}/{id} ───────────────────────── */
func TestDeleteResource_HappyWithCascade(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
// Pre-seed an XRC the DELETE call can find. The CascadeFor
// helper composes the cascade rows from the LIVE topology
// (which today doesn't include the XRC's children — it pulls
// from the deployment record), so the cascade always emits at
// least one descriptor row.
depID := "dep-region-delete"
xrcName := infrastructure.XRCName(depID, "region", "fsn1")
existing := newUnstructuredXRC(infrastructure.KindRegionClaim, xrcName)
h.dynamicFactory = fakeXRCDynamicFactory(existing)
dep := installCRUDDeployment(t, h, depID)
rec := callCRUDInfra(t, h, http.MethodDelete, "regions/fsn1", dep.ID, nil,
func(r chi.Router, h *Handler) {
r.Delete("/api/v1/deployments/{depId}/infrastructure/{kind}/{id}", h.DeleteInfrastructureResource)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
out := mustDecodeMutation(t, rec)
if out.XRCKind != infrastructure.KindRegionClaim {
t.Fatalf("xrcKind: got %q want %q", out.XRCKind, infrastructure.KindRegionClaim)
}
if len(out.Cascade) == 0 {
t.Fatalf("expected non-empty cascade preview; got %v", out.Cascade)
}
}
func TestDeleteResource_UnknownKind(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-bad-kind")
rec := callCRUDInfra(t, h, http.MethodDelete, "widgets/foo", dep.ID, nil,
func(r chi.Router, h *Handler) {
r.Delete("/api/v1/deployments/{depId}/infrastructure/{kind}/{id}", h.DeleteInfrastructureResource)
})
if rec.Code != http.StatusBadRequest {
t.Fatalf("status: got %d want 400; body=%s", rec.Code, rec.Body.String())
}
}
func TestDeleteResource_AlreadyAbsent(t *testing.T) {
// Delete returning NotFound is treated as "already gone" — the
// audit Job is still committed and the response is 202 with
// status="already-absent" so the FE can re-render.
h := NewWithPDM(silentLogger(), &fakePDM{})
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-region-already-gone")
rec := callCRUDInfra(t, h, http.MethodDelete, "regions/fsn1", dep.ID, nil,
func(r chi.Router, h *Handler) {
r.Delete("/api/v1/deployments/{depId}/infrastructure/{kind}/{id}", h.DeleteInfrastructureResource)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
out := mustDecodeMutation(t, rec)
if out.Status != "already-absent" {
t.Fatalf("status: got %q want already-absent", out.Status)
}
}
/* ── Audit-trail end-to-end: mutation Job materialised ───────── */
// TestCreateRegion_AuditJobMaterialised — after a successful create,
// the catalyst-api MUST have committed a Job + Execution + LogLines
// to the jobs Store under the current deployment id. This is the
// audit-trail invariant: every Day-2 mutation is observable via the
// existing /jobs surface.
func TestCreateRegion_AuditJobMaterialised(t *testing.T) {
dir := t.TempDir()
js, err := jobs.NewStore(dir)
if err != nil {
t.Fatalf("jobs.NewStore: %v", err)
}
h := NewWithJobsStore(silentLogger(), js)
h.dynamicFactory = fakeXRCDynamicFactory()
dep := installCRUDDeployment(t, h, "dep-audit-region")
rec := callCRUDInfra(t, h, http.MethodPost, "regions", dep.ID,
map[string]any{"region": "hel1", "skuCP": "cpx21", "workerCount": 2},
func(r chi.Router, h *Handler) {
r.Post("/api/v1/deployments/{depId}/infrastructure/regions", h.CreateInfrastructureRegion)
})
if rec.Code != http.StatusAccepted {
t.Fatalf("status: got %d want 202; body=%s", rec.Code, rec.Body.String())
}
jobsList, err := js.ListJobs(dep.ID)
if err != nil {
t.Fatalf("ListJobs: %v", err)
}
if len(jobsList) == 0 {
t.Fatalf("expected at least one mutation Job committed; got none")
}
found := false
for _, j := range jobsList {
if strings.HasPrefix(j.JobName, jobs.MutationJobNamePrefix) && j.BatchID == jobs.BatchDay2Mutations {
found = true
if j.Status != jobs.StatusSucceeded {
t.Fatalf("mutation Job status: got %q want %q", j.Status, jobs.StatusSucceeded)
}
}
}
if !found {
t.Fatalf("expected a Job with name prefix %q and batch %q; got %+v", jobs.MutationJobNamePrefix, jobs.BatchDay2Mutations, jobsList)
}
}
/* ── Helpers ──────────────────────────────────────────────────── */
// newUnstructuredXRC builds a bare XRC unstructured suitable for
// pre-seeding the fake dynamic client. Only metadata.name +
// apiVersion + kind are populated; tests don't assert on .spec
// content.
func newUnstructuredXRC(kind, name string) *unstructured.Unstructured {
u := &unstructured.Unstructured{}
u.SetAPIVersion(infrastructure.XRCAPIGroup + "/" + infrastructure.XRCAPIVersion)
u.SetKind(kind)
u.SetNamespace(infrastructure.XRCNamespace)
u.SetName(name)
return u
}
// silenceUnused — keep the metav1 + context imports anchored even
// when the test file's surface evolves.
var (
_ = metav1.ObjectMeta{}
_ = context.Background
)

View File: internal/handler/infrastructure_test.go

@ -1,7 +1,11 @@
// infrastructure_test.go — coverage for the Sovereign Infrastructure
-// REST surface. Pins the wire shape every endpoint emits + the 404
-// path so the UI's contract stays stable as the data sources evolve
-// (today: deployment record only; future: live-cluster kubeconfig).
+// REST surface.
+//
+// The unified GET .../infrastructure/topology emits the hierarchical
+// TopologyResponse shape (cloud → topology.regions[*] → clusters →
+// vclusters | pools | nodes | LBs + storage). The legacy flat
+// /compute, /storage, /network endpoints remain wired with their
+// pre-existing shapes until the FE migrates to the unified topology.
package handler
import (
@ -14,6 +18,7 @@ import (
"github.com/go-chi/chi/v5"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/infrastructure"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/provisioner"
)
@ -32,6 +37,7 @@ func installInfraDeployment(t *testing.T, h *Handler, status string) (*Deploymen
ControlPlaneSize: "cpx21",
WorkerSize: "cpx41",
WorkerCount: 2,
+HetznerProjectID: "test-project",
},
mu: sync.Mutex{},
}
@ -63,6 +69,9 @@ func TestInfrastructureTopology_NotFound(t *testing.T) {
}
}
+// TestInfrastructureTopology_OKShape pins the unified hierarchical
+// shape: cloud, topology.regions[*].clusters[*].nodes[*]/pools[*]/LBs,
+// storage. The legacy flat nodes/edges shape is no longer emitted.
func TestInfrastructureTopology_OKShape(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
dep, id := installInfraDeployment(t, h, "ready")
@ -76,37 +85,61 @@ func TestInfrastructureTopology_OKShape(t *testing.T) {
if rec.Code != http.StatusOK {
t.Fatalf("status: got %d want 200; body=%s", rec.Code, rec.Body.String())
}
-var out infraTopologyResponse
+var out infrastructure.TopologyResponse
if err := json.Unmarshal(rec.Body.Bytes(), &out); err != nil {
t.Fatalf("decode: %v", err)
}
-if len(out.Nodes) == 0 {
-t.Fatalf("expected non-empty nodes")
+// Cloud — exactly one tenant per provider.
+if len(out.Cloud) != 1 {
+t.Fatalf("expected 1 cloud tenant; got %d", len(out.Cloud))
}
-if len(out.Edges) == 0 {
-t.Fatalf("expected non-empty edges")
+if out.Cloud[0].Provider != "hetzner" {
+t.Fatalf("cloud provider: got %q want hetzner", out.Cloud[0].Provider)
}
+if out.Cloud[0].ProjectID != "test-project" {
+t.Fatalf("cloud projectID: got %q want test-project", out.Cloud[0].ProjectID)
+}
-// Cloud + cluster + LB + workers must all surface. Spot-check kinds.
-kinds := map[string]int{}
-for _, n := range out.Nodes {
-kinds[n.Kind]++
+// Topology — pattern + 1 region (legacy singular path).
+if out.Topology.Pattern == "" {
+t.Fatalf("topology pattern is required")
}
-if kinds["cloud"] != 1 {
-t.Fatalf("expected 1 cloud node; got %d", kinds["cloud"])
+if len(out.Topology.Regions) != 1 {
+t.Fatalf("expected 1 region; got %d", len(out.Topology.Regions))
}
-if kinds["cluster"] != 1 {
-t.Fatalf("expected 1 cluster node; got %d", kinds["cluster"])
+rg := out.Topology.Regions[0]
+if rg.ProviderRegion != "fsn1" {
+t.Fatalf("region.providerRegion: got %q want fsn1", rg.ProviderRegion)
}
-if kinds["lb"] != 1 {
-t.Fatalf("expected 1 lb node when LoadBalancerIP set; got %d", kinds["lb"])
+if rg.SkuCP != "cpx21" {
+t.Fatalf("region.skuCP: got %q want cpx21", rg.SkuCP)
}
-// At least the workers + control plane.
-if kinds["node"] < 3 {
-t.Fatalf("expected >=3 node entries (1 cp + 2 workers); got %d", kinds["node"])
+if len(rg.Clusters) != 1 {
+t.Fatalf("expected 1 cluster per region; got %d", len(rg.Clusters))
}
+c := rg.Clusters[0]
+if c.Name != "omantel.omani.works" {
+t.Fatalf("cluster.name: got %q want omantel.omani.works", c.Name)
+}
+// 1 cp + 2 workers
+if len(c.Nodes) != 3 {
+t.Fatalf("expected 3 nodes (1 cp + 2 workers); got %d", len(c.Nodes))
+}
+if len(c.LoadBalancers) != 1 {
+t.Fatalf("expected 1 LB when LoadBalancerIP set; got %d", len(c.LoadBalancers))
+}
+if c.LoadBalancers[0].PublicIP != "203.0.113.10" {
+t.Fatalf("lb publicIP: got %q want 203.0.113.10", c.LoadBalancers[0].PublicIP)
+}
+// node pools: 1 cp pool + 1 worker pool
+if len(c.NodePools) != 2 {
+t.Fatalf("expected 2 node pools (cp + worker); got %d", len(c.NodePools))
+}
}
// TestInfrastructureTopology_NoLBWhenAbsent — pre-LB-reconcile
// deployment must not surface a synthesised LB row.
func TestInfrastructureTopology_NoLBWhenAbsent(t *testing.T) {
h := NewWithPDM(silentLogger(), &fakePDM{})
_, id := installInfraDeployment(t, h, "provisioning")
@ -114,11 +147,56 @@ func TestInfrastructureTopology_NoLBWhenAbsent(t *testing.T) {
if rec.Code != http.StatusOK {
t.Fatalf("status: got %d want 200; body=%s", rec.Code, rec.Body.String())
}
-var out infraTopologyResponse
+var out infrastructure.TopologyResponse
_ = json.Unmarshal(rec.Body.Bytes(), &out)
-for _, n := range out.Nodes {
-if n.Kind == "lb" {
-t.Fatalf("expected no lb node before LoadBalancerIP is reported; got %+v", n)
+if len(out.Topology.Regions) != 1 {
+t.Fatalf("expected 1 region; got %d", len(out.Topology.Regions))
+}
+if len(out.Topology.Regions[0].Clusters) == 0 {
+t.Fatal("expected 1 cluster")
+}
+if len(out.Topology.Regions[0].Clusters[0].LoadBalancers) != 0 {
+t.Fatalf("expected no LBs before LoadBalancerIP reported; got %+v", out.Topology.Regions[0].Clusters[0].LoadBalancers)
+}
}
+// TestInfrastructureTopology_StorageEmptyFallback — storage arrays
+// MUST serialise as `[]` (never null) so the FE can iterate them.
+func TestInfrastructureTopology_StorageEmptyFallback(t *testing.T) {
+h := NewWithPDM(silentLogger(), &fakePDM{})
+_, id := installInfraDeployment(t, h, "ready")
+rec := callInfra(t, h, http.MethodGet, "topology", id, h.GetInfrastructureTopology)
+if rec.Code != http.StatusOK {
+t.Fatalf("status: got %d want 200; body=%s", rec.Code, rec.Body.String())
+}
+body := rec.Body.String()
+if !strings.Contains(body, `"pvcs":[]`) {
+t.Fatalf("storage.pvcs must serialise as []; body=%s", body)
+}
+if !strings.Contains(body, `"buckets":[]`) {
+t.Fatalf("storage.buckets must serialise as []; body=%s", body)
+}
+if !strings.Contains(body, `"volumes":[]`) {
+t.Fatalf("storage.volumes must serialise as []; body=%s", body)
+}
+}
+// TestInfrastructureTopology_PeeringsEmptyByDefault — when no
+// PeeringClaim XRCs exist, the loader emits an empty peerings array.
+func TestInfrastructureTopology_PeeringsEmptyByDefault(t *testing.T) {
+h := NewWithPDM(silentLogger(), &fakePDM{})
+_, id := installInfraDeployment(t, h, "ready")
+rec := callInfra(t, h, http.MethodGet, "topology", id, h.GetInfrastructureTopology)
+if rec.Code != http.StatusOK {
+t.Fatalf("status: got %d", rec.Code)
+}
+var out infrastructure.TopologyResponse
+_ = json.Unmarshal(rec.Body.Bytes(), &out)
+for _, rg := range out.Topology.Regions {
+for _, n := range rg.Networks {
+if n.Peerings == nil {
+t.Fatalf("network.peerings must be [] not null")
+}
+}
+}
+}
@ -174,7 +252,6 @@ func TestInfrastructureStorage_OKEmpty(t *testing.T) {
if len(out.PVCs) != 0 || len(out.Buckets) != 0 || len(out.Volumes) != 0 {
t.Fatalf("expected empty arrays for live-cluster sourced data; got %+v", out)
}
-// JSON arrays MUST be `[]` not `null` so the UI can iterate them.
body := rec.Body.String()
if !strings.Contains(body, `"pvcs":[]`) {
t.Fatalf("pvcs field must serialise as `[]`, got body=%s", body)

View File: internal/infrastructure/topology_loader.go

@ -0,0 +1,623 @@
// topology_loader.go — composes the unified TopologyResponse from the
// three available data sources:
//
// 1. The deployment record's Phase-0 OpenTofu outputs (provisioner.
// Result + Request) — always available post-Phase-0; carries
// control-plane IP, load-balancer IP, declared region SKUs, and
// declared worker counts.
//
// 2. The live Sovereign cluster's dynamic informer cache — populated
// by the helmwatch.Watcher attached to this deployment. Reads
// vcluster.io/v1alpha1 VClusters when the operator is installed
// plus core/v1 PVCs from the live cluster.
//
// 3. The Crossplane managed-resource list — surfaces XRCs the
// catalyst-api itself wrote. Populated by the same dynamic
// client; empty when no claims exist.
//
// Per docs/INVIOLABLE-PRINCIPLES.md (no placeholder data) every
// per-source query that fails or returns empty results in an empty
// slice on the response — never a synthesised row.
package infrastructure
import (
"context"
"strings"
"time"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime/schema"
"k8s.io/client-go/dynamic"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/provisioner"
)
// LoaderInput — the deployment-shaped data the handler hands to the
// loader. The loader does not import the handler package (would
// create a cycle); the handler unwraps Deployment fields onto this
// struct and calls Load.
type LoaderInput struct {
DeploymentID string
Status string // canonical UI status
SovereignFQDN string
Provider string
Region string
Regions []provisioner.RegionSpec
WorkerCount int
WorkerSize string
CPSize string
Result *provisioner.Result
HetznerProjectID string
// DynamicClient — Sovereign cluster dynamic client, built from
// the persisted kubeconfig by the live-watcher. Nil when the
// kubeconfig hasn't been posted back yet — the loader emits empty
// arrays for live-source fields in that case.
DynamicClient dynamic.Interface
}
// Load composes the unified TopologyResponse. The function is
// allocation-light by design — every slice is built straight off the
// request shape, so the typical 1-region happy path costs roughly one
// allocation per per-region child slice.
func Load(ctx context.Context, in LoaderInput) TopologyResponse {
if ctx == nil {
ctx = context.Background()
}
cloud := buildCloud(in)
topology := buildTopology(ctx, in)
storage := buildStorage(ctx, in)
return TopologyResponse{
Cloud: cloud,
Topology: topology,
Storage: storage,
}
}
// buildCloud — one tenant per cloud provider. Today every Sovereign
// runs against exactly one Hetzner project; multi-cloud will add
// per-provider entries.
func buildCloud(in LoaderInput) []CloudTenant {
provider := in.Provider
if provider == "" {
provider = "hetzner"
}
tenant := CloudTenant{
ID: "cloud-" + provider,
Provider: provider,
Name: provider,
Status: in.Status,
ProjectID: in.HetznerProjectID,
}
return []CloudTenant{tenant}
}
// buildTopology — pattern + per-region build-out. One Region row per
// Regions[*] entry; legacy single-region path uses the singular
// Request fields.
func buildTopology(ctx context.Context, in LoaderInput) TopologyData {
pattern := derivePattern(in)
regions := []Region{}
if len(in.Regions) > 0 {
for _, rs := range in.Regions {
regions = append(regions, buildRegion(ctx, in, rs))
}
} else if in.Region != "" {
// Legacy singular path — pre-multi-region wizard payload.
legacy := provisioner.RegionSpec{
Provider: in.Provider,
CloudRegion: in.Region,
ControlPlaneSize: in.CPSize,
WorkerSize: in.WorkerSize,
WorkerCount: in.WorkerCount,
}
regions = append(regions, buildRegion(ctx, in, legacy))
}
return TopologyData{
Pattern: pattern,
Regions: regions,
}
}
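// derivePattern — multi-region when more than one region is declared;
// ha-pair when the single declared region carries >=3 workers; solo
// for one region or the legacy singular field; unknown otherwise.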
func derivePattern(in LoaderInput) string {
switch {
case len(in.Regions) > 1:
return "multi-region"
case len(in.Regions) == 1 && in.Regions[0].WorkerCount >= 3:
return "ha-pair"
case len(in.Regions) == 1:
return "solo"
case in.Region != "":
return "solo"
default:
return "unknown"
}
}
func buildRegion(ctx context.Context, in LoaderInput, rs provisioner.RegionSpec) Region {
provider := rs.Provider
if provider == "" {
provider = "hetzner"
}
regionID := "region-" + rs.CloudRegion
cluster := buildCluster(ctx, in, rs)
networks := buildNetworks(ctx, in, rs)
return Region{
ID: regionID,
Name: rs.CloudRegion,
Provider: provider,
ProviderRegion: rs.CloudRegion,
SkuCP: rs.ControlPlaneSize,
SkuWorker: rs.WorkerSize,
WorkerCount: rs.WorkerCount,
Status: in.Status,
Clusters: []Cluster{cluster},
Networks: networks,
}
}
func buildCluster(ctx context.Context, in LoaderInput, rs provisioner.RegionSpec) Cluster {
clusterName := in.SovereignFQDN
if clusterName == "" {
dep := in.DeploymentID
if len(dep) > 8 {
dep = dep[:8]
}
clusterName = "cluster-" + dep
}
clusterID := "cluster-" + in.DeploymentID + "-" + rs.CloudRegion
if rs.CloudRegion == "" {
clusterID = "cluster-" + in.DeploymentID
}
nodes := buildNodes(in, rs)
pools := buildNodePools(in, rs)
lbs := buildLBs(in, rs)
vclusters := loadVClusters(ctx, in)
return Cluster{
ID: clusterID,
Name: clusterName,
Version: "v1.30",
Status: in.Status,
NodeCount: len(nodes),
VClusters: vclusters,
LoadBalancers: lbs,
NodePools: pools,
Nodes: nodes,
}
}
func buildNodes(in LoaderInput, rs provisioner.RegionSpec) []Node {
out := []Node{}
cpIP := ""
if in.Result != nil {
cpIP = in.Result.ControlPlaneIP
}
cpID := "node-cp-" + rs.CloudRegion
if rs.CloudRegion == "" {
cpID = "node-cp-" + in.DeploymentID
}
out = append(out, Node{
ID: cpID,
Name: "control-plane-" + rs.CloudRegion,
SKU: rs.ControlPlaneSize,
Region: rs.CloudRegion,
Role: "control-plane",
IP: cpIP,
Status: in.Status,
NodePoolID: "pool-cp-" + rs.CloudRegion,
})
for i := 0; i < rs.WorkerCount; i++ {
wID := "node-w-" + itoa(i) + "-" + rs.CloudRegion
if rs.CloudRegion == "" {
wID = "node-w-" + itoa(i) + "-" + in.DeploymentID
}
out = append(out, Node{
ID: wID,
Name: "worker-" + itoa(i+1) + "-" + rs.CloudRegion,
SKU: rs.WorkerSize,
Region: rs.CloudRegion,
Role: "worker",
IP: "",
Status: in.Status,
NodePoolID: "pool-worker-" + rs.CloudRegion,
})
}
return out
}
func buildNodePools(in LoaderInput, rs provisioner.RegionSpec) []NodePool {
pools := []NodePool{
{
ID: "pool-cp-" + rs.CloudRegion,
Name: "control-plane-" + rs.CloudRegion,
Role: "control-plane",
SKU: rs.ControlPlaneSize,
Region: rs.CloudRegion,
DesiredSize: 1,
CurrentSize: 1,
Status: in.Status,
},
}
if rs.WorkerCount > 0 {
pools = append(pools, NodePool{
ID: "pool-worker-" + rs.CloudRegion,
Name: "worker-" + rs.CloudRegion,
Role: "worker",
SKU: rs.WorkerSize,
Region: rs.CloudRegion,
DesiredSize: rs.WorkerCount,
CurrentSize: rs.WorkerCount,
Status: in.Status,
})
}
return pools
}
func buildLBs(in LoaderInput, rs provisioner.RegionSpec) []LoadBalancer {
if in.Result == nil || in.Result.LoadBalancerIP == "" {
return []LoadBalancer{}
}
name := in.SovereignFQDN
if name == "" {
name = "ingress-lb"
}
return []LoadBalancer{{
ID: "lb-" + in.DeploymentID,
Name: name,
PublicIP: in.Result.LoadBalancerIP,
Ports: "80,443,6443",
TargetHealth: "—",
Region: rs.CloudRegion,
Status: in.Status,
}}
}
func buildNetworks(ctx context.Context, in LoaderInput, rs provisioner.RegionSpec) []Network {
// Per-region VPC stamped by the Phase-0 module; follow-on
// Day-2 PeeringClaim XRCs bind regions together. Today we
// surface one Network per region with empty Peerings until the
// Crossplane Composition lands and Peering objects exist.
netID := "net-" + rs.CloudRegion + "-" + in.DeploymentID
if rs.CloudRegion == "" {
netID = "net-" + in.DeploymentID
}
return []Network{{
ID: netID,
Name: "vpc-" + rs.CloudRegion,
CIDR: "",
Region: rs.CloudRegion,
Peerings: loadPeerings(ctx, in, rs),
Firewall: nil,
Status: in.Status,
}}
}
// loadVClusters — query the Sovereign cluster's vcluster.io/v1alpha1
// CRs. Returns an empty slice when the operator isn't installed
// (CRD doesn't exist) or when no vclusters have been provisioned.
//
// The recover guard tolerates fake-client panics in unit tests
// (k8s.io/client-go/dynamic/fake panics on unregistered list-kinds);
// production never hits this path because the real apiserver
// returns 404 instead of panicking.
func loadVClusters(ctx context.Context, in LoaderInput) (out []VCluster) {
out = []VCluster{}
defer func() {
if r := recover(); r != nil {
out = []VCluster{}
}
}()
if in.DynamicClient == nil {
return out
}
gvr := schema.GroupVersionResource{
Group: "vcluster.io",
Version: "v1alpha1",
Resource: "vclusters",
}
cctx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
list, err := in.DynamicClient.Resource(gvr).Namespace("").List(cctx, metav1.ListOptions{})
if err != nil || list == nil {
return out
}
for _, item := range list.Items {
role := vclusterRole(item.GetLabels())
out = append(out, VCluster{
ID: "vcluster-" + item.GetNamespace() + "-" + item.GetName(),
Name: item.GetName(),
Namespace: item.GetNamespace(),
Role: role,
Status: statusFromUnstructured(item.Object),
})
}
return out
}
func vclusterRole(labels map[string]string) string {
if v, ok := labels["catalyst.openova.io/role"]; ok && v != "" {
return v
}
if v, ok := labels["building-block"]; ok && v != "" {
return v
}
return "other"
}
// loadPeerings — query Crossplane PeeringClaim XRCs scoped to this
// deployment via the LabelDeploymentID selector.
//
// The recover guard tolerates fake-client panics in unit tests as
// described on loadVClusters.
func loadPeerings(ctx context.Context, in LoaderInput, rs provisioner.RegionSpec) (out []Peering) {
out = []Peering{}
defer func() {
if r := recover(); r != nil {
out = []Peering{}
}
}()
if in.DynamicClient == nil {
return out
}
gvr := gvrForKind(KindPeeringClaim)
cctx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
list, err := in.DynamicClient.Resource(gvr).Namespace(XRCNamespace).List(cctx, metav1.ListOptions{
LabelSelector: LabelDeploymentID + "=" + in.DeploymentID,
})
if err != nil || list == nil {
return out
}
for _, item := range list.Items {
spec, _, _ := nestedMap(item.Object, "spec")
out = append(out, Peering{
ID: string(item.GetUID()),
Name: item.GetName(),
VPCPair: stringField(spec, "vpcPair"),
Subnets: stringField(spec, "subnets"),
Status: statusFromUnstructured(item.Object),
})
}
return out
}
// buildStorage — PVCs from the live cluster + buckets/volumes from
// the Crossplane managed-resource list. Empty slices when sources
// aren't reachable.
func buildStorage(ctx context.Context, in LoaderInput) StorageData {
return StorageData{
PVCs: loadPVCs(ctx, in),
Buckets: []Bucket{},
Volumes: []Volume{},
}
}
func loadPVCs(ctx context.Context, in LoaderInput) (out []PVC) {
out = []PVC{}
defer func() {
if r := recover(); r != nil {
out = []PVC{}
}
}()
if in.DynamicClient == nil {
return out
}
gvr := schema.GroupVersionResource{
Group: "",
Version: "v1",
Resource: "persistentvolumeclaims",
}
cctx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
list, err := in.DynamicClient.Resource(gvr).Namespace("").List(cctx, metav1.ListOptions{})
if err != nil || list == nil {
return out
}
for _, item := range list.Items {
spec, _, _ := nestedMap(item.Object, "spec")
status, _, _ := nestedMap(item.Object, "status")
capacity := stringField(stringMapField(status, "capacity"), "storage")
out = append(out, PVC{
ID: string(item.GetUID()),
Name: item.GetName(),
Namespace: item.GetNamespace(),
Capacity: capacity,
Used: "",
StorageClass: stringField(spec, "storageClassName"),
Status: stringField(status, "phase"),
})
}
return out
}
// CascadeFor — given a delete target (kind + id) and the current
// topology, lists the child resources that would be reaped. Used by
// the DELETE handler to populate the 202 response's Cascade slice.
func CascadeFor(kind, id string, topology TopologyResponse) []CascadeImpact {
out := []CascadeImpact{}
switch strings.ToLower(kind) {
case "region":
for _, rg := range topology.Topology.Regions {
if rg.ID != id {
continue
}
for _, c := range rg.Clusters {
out = append(out, CascadeImpact{
Kind: "cluster", ID: c.ID, Name: c.Name,
Note: "cluster will drain + be reaped",
})
for _, np := range c.NodePools {
out = append(out, CascadeImpact{
Kind: "nodePool", ID: np.ID, Name: np.Name,
Note: "node pool will be deleted",
})
}
for _, n := range c.Nodes {
out = append(out, CascadeImpact{
Kind: "node", ID: n.ID, Name: n.Name,
Note: "workloads will be drained",
})
}
for _, lb := range c.LoadBalancers {
out = append(out, CascadeImpact{
Kind: "lb", ID: lb.ID, Name: lb.Name,
Note: "load balancer will be released",
})
}
}
for _, n := range rg.Networks {
out = append(out, CascadeImpact{
Kind: "network", ID: n.ID, Name: n.Name,
Note: "VPC will be released; peerings disconnected",
})
for _, p := range n.Peerings {
out = append(out, CascadeImpact{
Kind: "peering", ID: p.ID, Name: p.Name,
Note: "peering will be torn down",
})
}
}
}
case "cluster":
for _, rg := range topology.Topology.Regions {
for _, c := range rg.Clusters {
if c.ID != id {
continue
}
for _, np := range c.NodePools {
out = append(out, CascadeImpact{Kind: "nodePool", ID: np.ID, Name: np.Name})
}
for _, n := range c.Nodes {
out = append(out, CascadeImpact{Kind: "node", ID: n.ID, Name: n.Name})
}
for _, lb := range c.LoadBalancers {
out = append(out, CascadeImpact{Kind: "lb", ID: lb.ID, Name: lb.Name})
}
}
}
case "nodepool", "pool":
for _, rg := range topology.Topology.Regions {
for _, c := range rg.Clusters {
for _, np := range c.NodePools {
if np.ID != id {
continue
}
for _, n := range c.Nodes {
if n.NodePoolID == np.ID {
out = append(out, CascadeImpact{Kind: "node", ID: n.ID, Name: n.Name,
Note: "node will be drained + cordoned"})
}
}
}
}
}
}
// Always emit at least one descriptor so the FE confirm dialog
// can render a row even when no children are observable.
if len(out) == 0 {
out = append(out, CascadeImpact{
Kind: kind,
ID: id,
Name: id,
Note: "no observable child resources — proceeding will reap the underlying cloud resources",
})
}
return out
}
/* ─── Helpers (no client-go mutation here; reads only) ─── */
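// itoa — minimal int formatter; keeps this file's import set to the
// k8s + provisioner surface instead of pulling in strconv.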
func itoa(n int) string {
if n == 0 {
return "0"
}
neg := false
if n < 0 {
neg = true
n = -n
}
var buf [20]byte
i := len(buf)
for n > 0 {
i--
buf[i] = byte('0' + n%10)
n /= 10
}
if neg {
i--
buf[i] = '-'
}
return string(buf[i:])
}
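// nestedMap walks obj down the given key path. The shape mirrors
// unstructured.NestedMap, but it never errors: a missing or mistyped
// key returns (nil, false, nil).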
func nestedMap(obj map[string]any, path ...string) (map[string]any, bool, error) {
cur := obj
for _, p := range path {
v, ok := cur[p]
if !ok {
return nil, false, nil
}
m, ok := v.(map[string]any)
if !ok {
return nil, false, nil
}
cur = m
}
return cur, true, nil
}
func stringField(m map[string]any, key string) string {
if m == nil {
return ""
}
if v, ok := m[key]; ok {
if s, ok := v.(string); ok {
return s
}
}
return ""
}
func stringMapField(m map[string]any, key string) map[string]any {
if m == nil {
return nil
}
if v, ok := m[key]; ok {
if mm, ok := v.(map[string]any); ok {
return mm
}
}
return nil
}
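// statusFromUnstructured — best-effort status read: prefer
// .status.phase; else the Ready condition ("healthy" when True,
// the lowercased reason otherwise); else "unknown".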
func statusFromUnstructured(obj map[string]any) string {
status, _, _ := nestedMap(obj, "status")
if status == nil {
return "unknown"
}
if phase := stringField(status, "phase"); phase != "" {
return phase
}
if cs, ok := status["conditions"].([]any); ok {
for _, c := range cs {
cm, ok := c.(map[string]any)
if !ok {
continue
}
if stringField(cm, "type") == "Ready" {
if stringField(cm, "status") == "True" {
return "healthy"
}
return strings.ToLower(stringField(cm, "reason"))
}
}
}
return "unknown"
}

View File: internal/infrastructure/types.go

@ -0,0 +1,297 @@
// Package infrastructure carries the wire-contract types + Crossplane
// XRC writer helpers + topology loader the catalyst-api Sovereign
// Infrastructure surface emits.
//
// # Architectural rule (docs/INVIOLABLE-PRINCIPLES.md #3)
//
// All Day-2 mutations MUST go through Crossplane. The catalyst-api
// writes a Crossplane Composite Resource Claim (XRC) into the Sovereign
// cluster (NOT contabo-mkt) and returns 202. The Crossplane provider
// does the cloud work. The UI watches the Job that wraps the XRC
// submission via the existing Jobs/Executions surface (issue #205).
//
// catalyst-api MUST NOT call hcloud-go, NEVER `exec.Command("kubectl",
// ...)` for mutation, NEVER use client-go for direct mutation of cluster
// resources outside the XRC-write path. The dynamic client created from
// the deployment's persisted kubeconfig is the ONLY mutation seam.
//
// # Wire contract — TopologyResponse
//
// The unified GET /api/v1/deployments/{id}/infrastructure/topology
// returns the WHOLE hierarchical tree in ONE shape so the four
// frontend tabs (Topology / Compute / Storage / Network) all render
// filtered views off a single response — no per-tab fetches, no
// cross-fetch coordination state on the client side.
//
// # Empty fallback (founder principle: never placeholder data)
//
// When live data isn't available (vCluster CRs not present, peerings
// not provisioned yet), the loader returns empty arrays — never
// placeholder rows. The frontend's empty-card UX is the canonical
// surface for that state.
package infrastructure
import "time"
// TopologyResponse — unified hierarchical view of the Sovereign's
// infrastructure. The four tabs filter views off this single shape.
type TopologyResponse struct {
// Cloud — list of cloud-provider tenants behind this Sovereign.
// Today we model one Hetzner project per deployment; future
// multi-cloud Sovereigns will surface multiple tenants here.
Cloud []CloudTenant `json:"cloud"`
// Topology — pattern + per-region layout. The wizard's BYO Flow B
// + the multi-region per-provider rework make this the canonical
// shape: a Sovereign is N regions × clusters × node-pools wide.
Topology TopologyData `json:"topology"`
// Storage — Persistent Volume Claims, S3-compatible buckets, and
// raw block Volumes attached across the topology. Aggregated here
// so the Storage tab renders without a second round-trip.
Storage StorageData `json:"storage"`
}
// CloudTenant — one cloud-provider account/project this Sovereign
// runs against. The catalyst-api derives this from the deployment
// record's Request (e.g. HetznerProjectID) — credentials never flow
// into the response.
type CloudTenant struct {
ID string `json:"id"`
Provider string `json:"provider"` // hetzner | oci | aws | ...
Name string `json:"name"` // human label
// ProjectID — opaque cloud-side identifier (e.g. Hetzner project
// number). Read-only metadata; never treated as a credential.
ProjectID string `json:"projectID,omitempty"`
// Status mirrors the deployment's overall status — healthy when
// the Sovereign is ready, unknown while provisioning, failed on
// terminal failure.
Status string `json:"status"`
}
// TopologyData — pattern + regions list. Pattern derives from the
// number of regions and HA flag: solo (1 region, 1 cluster), ha-pair
// (1 region, HA), multi-region (N>1 regions), air-gap (BYO + isolated).
type TopologyData struct {
Pattern string `json:"pattern"`
Regions []Region `json:"regions"`
}
// Region — one cloud region within the Sovereign's deployment. Carries
// the per-region SKU + worker count plus the live cluster + node-pool
// + network state read from the informer cache when available.
type Region struct {
ID string `json:"id"`
Name string `json:"name"`
Provider string `json:"provider"`
ProviderRegion string `json:"providerRegion"` // e.g. fsn1, hel1, ash
// SkuCP / SkuWorker — Hetzner server type slug (cpx21, cpx41).
// Empty when the deployment hasn't reached Validate yet.
SkuCP string `json:"skuCP"`
SkuWorker string `json:"skuWorker"`
// WorkerCount — declared worker count for this region. The
// frontend renders this on the region card; the live node count
// comes from len(Clusters[*].Nodes).
WorkerCount int `json:"workerCount"`
// Status — healthy | degraded | failed | unknown. Pre-Phase-0
// deployments emit unknown.
Status string `json:"status"`
Clusters []Cluster `json:"clusters"`
Networks []Network `json:"networks"`
}
// Cluster — one Kubernetes cluster within a region. The OWNING
// Sovereign always has exactly one host cluster today; future
// multi-cluster Sovereigns will surface additional entries here.
type Cluster struct {
ID string `json:"id"`
Name string `json:"name"`
Version string `json:"version"`
Status string `json:"status"`
// NodeCount — total nodes (control-plane + workers) across all
// node-pools on this cluster. Computed from Nodes when the live
// cluster informer cache is populated; falls back to the
// declared count from the deployment record otherwise.
NodeCount int `json:"nodeCount"`
VClusters []VCluster `json:"vclusters"`
LoadBalancers []LoadBalancer `json:"loadBalancers"`
NodePools []NodePool `json:"nodePools"`
Nodes []Node `json:"nodes"`
}
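// NodeCount's fallback rule as a sketch (declaredCount is assumed to
// be read off the deployment record by the caller):
//
//	if n := len(cluster.Nodes); n > 0 {
//		cluster.NodeCount = n // live informer cache populated
//	} else {
//		cluster.NodeCount = declaredCount // deployment record fallback
//	}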
// VCluster — a vcluster.io v1alpha1 virtual cluster running on the
// host cluster. Used by Catalyst's DMZ / RTZ / MGMT building-block
// layout. Populated only when the vcluster operator is installed AND
// at least one VCluster CR exists; otherwise the slice is empty.
type VCluster struct {
ID string `json:"id"`
Name string `json:"name"`
Namespace string `json:"namespace"`
Role string `json:"role"` // dmz | rtz | mgmt | other
Status string `json:"status"`
}
// LoadBalancer — Hetzner cloud LB (or future multi-cloud equivalent)
// attached to the cluster. Today a Sovereign has exactly one LB
// fronting the ingress controller; future multi-LB topologies surface
// here.
type LoadBalancer struct {
ID string `json:"id"`
Name string `json:"name"`
PublicIP string `json:"publicIP"`
Ports string `json:"ports"`
TargetHealth string `json:"targetHealth"`
Region string `json:"region"`
Status string `json:"status"`
}
// NodePool — a logical group of identically-sized worker nodes the
// catalyst-environment-controller can scale up/down via the
// NodePoolClaim XRC. The Phase-0 OpenTofu module emits one
// control-plane pool + one worker pool per region; Day-2 mutations
// add additional pools.
type NodePool struct {
ID string `json:"id"`
Name string `json:"name"`
Role string `json:"role"` // control-plane | worker
SKU string `json:"sku"` // server type slug
Region string `json:"region"`
DesiredSize int `json:"desiredSize"` // declared size
CurrentSize int `json:"currentSize"` // observed from informer
Status string `json:"status"`
}
// Node — one Kubernetes node. Surfaced from the live cluster's
// informer cache; pre-cluster deployments derive nodes from the
// declared topology (real configuration, not placeholder rows, per
// the founder rule) so the canvas can still render a layout.
type Node struct {
ID string `json:"id"`
Name string `json:"name"`
SKU string `json:"sku"`
Region string `json:"region"`
Role string `json:"role"`
IP string `json:"ip"`
Status string `json:"status"`
NodePoolID string `json:"nodePoolID,omitempty"`
}
// Network — one cloud network / VPC plus its peerings + firewalls.
// Multi-region Sovereigns peer their per-region networks through this
// shape; the third-sibling Crossplane PeeringClaim Composition writes
// the actual peering object.
type Network struct {
ID string `json:"id"`
Name string `json:"name"`
CIDR string `json:"cidr"`
Region string `json:"region"`
Peerings []Peering `json:"peerings"`
Firewall *FirewallRules `json:"firewall,omitempty"`
Status string `json:"status"`
}
// Peering — one VPC-to-VPC peering edge. Status mirrors the state the
// cloud provider reports (active | degraded | failed | unknown).
type Peering struct {
ID string `json:"id"`
Name string `json:"name"`
VPCPair string `json:"vpcPair"`
Subnets string `json:"subnets"`
Status string `json:"status"`
}
// FirewallRules — collected ingress/egress rules a cloud firewall
// applies to its attached resources. Surfaced as a dedicated child of
// Network so the Network tab renders rule chips inline with the VPC
// card.
type FirewallRules struct {
ID string `json:"id"`
Name string `json:"name"`
Rules []FirewallRule `json:"rules"`
}
// FirewallRule — a single allow/deny rule. The IP-list field is
// rendered as a comma-separated string in the UI; a slice would tempt
// the React grid to scroll horizontally for /32 enumerations.
type FirewallRule struct {
ID string `json:"id"`
Direction string `json:"direction"` // in | out
Protocol string `json:"protocol"` // tcp | udp | icmp
Port string `json:"port"` // empty for icmp
Sources string `json:"sources"` // CIDR list (CSV)
Action string `json:"action"` // accept | drop
}
// StorageData — aggregate storage view across the whole topology.
// PVCs come from the live cluster (when reachable); buckets +
// volumes come from cloud-provider state via Crossplane managed-
// resource status.
type StorageData struct {
PVCs []PVC `json:"pvcs"`
Buckets []Bucket `json:"buckets"`
Volumes []Volume `json:"volumes"`
}
// PVC — one Kubernetes PersistentVolumeClaim. Fields mirror the
// existing infraPVCItem shape so the UI's PVC card renders unchanged.
type PVC struct {
ID string `json:"id"`
Name string `json:"name"`
Namespace string `json:"namespace"`
Capacity string `json:"capacity"`
Used string `json:"used"`
StorageClass string `json:"storageClass"`
Status string `json:"status"`
}
// Bucket — one S3-compatible object bucket (Hetzner Object Storage,
// SeaweedFS volume server, etc.).
type Bucket struct {
ID string `json:"id"`
Name string `json:"name"`
Endpoint string `json:"endpoint"`
Capacity string `json:"capacity"`
Used string `json:"used"`
RetentionDays string `json:"retentionDays"`
}
// Volume — one cloud block-storage volume (Hetzner Volume).
type Volume struct {
ID string `json:"id"`
Name string `json:"name"`
Capacity string `json:"capacity"`
Region string `json:"region"`
AttachedTo string `json:"attachedTo"`
Status string `json:"status"`
}
// MutationResponse — uniform 202 Accepted shape every CRUD endpoint
// emits. The frontend keys off jobId to deep-link to the
// GitLab-style log viewer in the Jobs surface.
type MutationResponse struct {
JobID string `json:"jobId"`
XRCKind string `json:"xrcKind"`
XRCName string `json:"xrcName"`
Status string `json:"status"`
SubmittedAt time.Time `json:"submittedAt"`
// Cascade — populated only on DELETE. Lists the child resources
// the cascade would affect so the FE confirm dialog can render
// "deleting region X will drain Y workloads, remove Z PVCs".
Cascade []CascadeImpact `json:"cascade,omitempty"`
}
// CascadeImpact — one row in a delete cascade preview.
type CascadeImpact struct {
Kind string `json:"kind"` // region | cluster | nodePool | node | pvc | volume | lb | peering
ID string `json:"id"`
Name string `json:"name"`
Note string `json:"note,omitempty"` // operator-readable detail
}
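// Illustrative 202 payload for a region DELETE (all values invented;
// ids elided):
//
//	{
//	  "jobId": "…",
//	  "xrcKind": "RegionClaim",
//	  "xrcName": "ce476aaf-region-hel1",
//	  "status": "submitted-pending-composition",
//	  "submittedAt": "2025-01-01T00:00:00Z",
//	  "cascade": [
//	    {"kind": "cluster", "id": "…", "name": "hel1-host", "note": "drains 12 workloads"},
//	    {"kind": "pvc", "id": "…", "name": "data-postgres-0", "note": "removed with cluster"}
//	  ]
//	}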

internal/infrastructure/xrc.go
@@ -0,0 +1,330 @@
// xrc.go — Crossplane Composite Resource Claim writer helpers.
//
// Per docs/INVIOLABLE-PRINCIPLES.md #3 every Day-2 mutation MUST be
// expressed as a Crossplane XRC submission. This file is the single
// seam through which the catalyst-api writes those XRCs against the
// SOVEREIGN cluster (NOT contabo-mkt).
//
// The seam takes a dynamic.Interface (the Sovereign's dynamic client,
// built from the deployment's persisted kubeconfig) and a typed
// XRCSpec; it produces the unstructured object, performs the create
// against Crossplane's Composite Resource Claim API, and returns the
// XRC's namespace + name + GVK so the handler can surface them in
// the 202 response and write the audit-trail Job entry.
//
// # Why dynamic.Interface, not typed clients
//
// Crossplane Compositions are author-time; the catalyst-api is
// consumer-time. We write claims against API groups (e.g.
// `infra.openova.io/v1alpha1`) whose typed Go schemas DON'T live
// in this repo — the third-sibling agent owns them in their own
// chart. A dynamic client lets us write claims by group/version/kind
// without compiling against generated types.
//
// # When the Composition isn't ready yet
//
// If the Composition for a given XRC kind doesn't exist on the
// Sovereign cluster yet (the third-sibling agent hasn't finished
// authoring + applying their chart), the create still succeeds —
// Crossplane stores the claim and leaves it Pending. The catalyst-
// api emits a Job log line saying "Awaiting Crossplane Composition
// for <kind>" and returns the same 202.
package infrastructure
import (
"context"
"errors"
"fmt"
"strings"
"time"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
"k8s.io/apimachinery/pkg/runtime/schema"
"k8s.io/client-go/dynamic"
)
// XRCAPIGroup — the canonical Crossplane API group catalyst-api
// writes claims under. The third-sibling chart's Compositions match
// this group + version + per-kind names. Per docs/INVIOLABLE-PRINCIPLES.md
// #4 the group name is centralised here so a future migration to a
// different group only changes one line.
const (
XRCAPIGroup = "infra.openova.io"
XRCAPIVersion = "v1alpha1"
)
// XRC kind constants. Each maps to one Composition the third-sibling
// agent authors. Keep these in lockstep with the Composition manifest
// metadata.name values; mismatches surface as Pending claims with no
// reconciliation progress.
const (
KindRegionClaim = "RegionClaim"
KindClusterClaim = "ClusterClaim"
KindVClusterClaim = "VClusterClaim"
KindNodePoolClaim = "NodePoolClaim"
KindLoadBalancerClaim = "LoadBalancerClaim"
KindPeeringClaim = "PeeringClaim"
KindFirewallRuleClaim = "FirewallRuleClaim"
KindNodeActionClaim = "NodeActionClaim"
)
// XRCNamespace — the namespace catalyst-api submits all claims into.
// Crossplane Composite Resource Claims are namespace-scoped; the
// third-sibling agent's chart provisions ServiceAccount RBAC + the
// underlying ProviderConfig in this namespace.
const XRCNamespace = "catalyst-day2"
// LabelDeploymentID + LabelOwner — every claim catalyst-api writes
// carries these labels so an operator can trace a claim back to the
// deployment and to catalyst-api as the submitting component.
const (
LabelDeploymentID = "catalyst.openova.io/deployment-id"
LabelOwner = "catalyst.openova.io/owner"
LabelOwnerValue = "catalyst-api"
AnnotationAction = "catalyst.openova.io/action"
AnnotationDiff = "catalyst.openova.io/diff"
)
// XRCSpec — the typed payload one CRUD endpoint passes to SubmitXRC.
// Spec is a free-form map matching the XRD's schema; the helper
// stamps apiVersion + kind + metadata. Keeping this loose lets the
// CRUD handlers compose the same shape the third-sibling Composition
// expects without a per-kind Go type explosion.
type XRCSpec struct {
Kind string
Name string
DeploymentID string
Action string // human label e.g. "add-region", "remove-pool"
Diff string // unified-diff or short ASCII summary
// Spec — the XRD's spec subtree as a map. The helper marshals it
// under .spec on the unstructured object.
Spec map[string]any
}
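// How a CRUD handler might compose a claim for a new worker pool. A
// sketch only: the keys under Spec are assumptions about the XRD's
// schema, which the third-sibling chart owns.
func buildNodePoolSpecSketch(depID, poolName, sku string, size int) XRCSpec {
	return XRCSpec{
		Kind:         KindNodePoolClaim,
		Name:         XRCName(depID, "pool", poolName),
		DeploymentID: depID,
		Action:       "add-pool",
		Diff:         fmt.Sprintf("+ pool %s (%s x %d)", poolName, sku, size),
		Spec: map[string]any{
			"role": "worker",
			"sku":  sku,
			"size": size,
		},
	}
}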
// ErrXRCNameConflict — surfaced when the claim's name doesn't match
// the request's expectation: Create on a name that already exists, or
// Delete on one that doesn't. The CRUD handler maps this onto HTTP
// 409 so the wizard can surface "this region is already provisioned".
var ErrXRCNameConflict = errors.New("infrastructure: xrc name conflict")
// SubmitXRC writes the XRC to the Sovereign cluster. Returns the
// gvr + the unstructured object (with the cluster-stamped UID +
// resourceVersion populated) so the handler can include them in the
// 202 response. Caller MUST hold no locks.
//
// The helper is idempotent on retry: a second call with the same
// .metadata.name within the same namespace returns ErrXRCNameConflict
// — the handler surfaces 409 and instructs the operator to use PATCH
// for in-place updates instead.
func SubmitXRC(ctx context.Context, client dynamic.Interface, spec XRCSpec) (*unstructured.Unstructured, schema.GroupVersionResource, error) {
if client == nil {
return nil, schema.GroupVersionResource{}, errors.New("infrastructure: dynamic client is required (sovereign cluster unreachable)")
}
if strings.TrimSpace(spec.Kind) == "" {
return nil, schema.GroupVersionResource{}, errors.New("infrastructure: XRC kind is required")
}
if strings.TrimSpace(spec.Name) == "" {
return nil, schema.GroupVersionResource{}, errors.New("infrastructure: XRC name is required")
}
gvr := gvrForKind(spec.Kind)
obj := &unstructured.Unstructured{}
obj.SetAPIVersion(XRCAPIGroup + "/" + XRCAPIVersion)
obj.SetKind(spec.Kind)
obj.SetName(spec.Name)
obj.SetNamespace(XRCNamespace)
obj.SetLabels(map[string]string{
LabelOwner: LabelOwnerValue,
LabelDeploymentID: spec.DeploymentID,
})
annotations := map[string]string{}
if spec.Action != "" {
annotations[AnnotationAction] = spec.Action
}
if spec.Diff != "" {
annotations[AnnotationDiff] = spec.Diff
}
if len(annotations) > 0 {
obj.SetAnnotations(annotations)
}
if spec.Spec != nil {
// k8s.io/apimachinery's DeepCopyJSONValue only accepts the
// JSON-typed scalars (string, bool, float64, int64, nil) and
// recursively-typed slices/maps. Go `int` panics — so we
// normalise the spec tree before SetNestedMap and surface any
// structural error instead of discarding it.
if err := unstructured.SetNestedMap(obj.Object, normaliseJSONMap(spec.Spec), "spec"); err != nil {
return nil, gvr, fmt.Errorf("infrastructure: set spec on %s/%s: %w", spec.Kind, spec.Name, err)
}
}
created, err := client.Resource(gvr).Namespace(XRCNamespace).Create(ctx, obj, metav1.CreateOptions{})
if err != nil {
if apierrors.IsAlreadyExists(err) {
return nil, gvr, fmt.Errorf("%w: %s/%s", ErrXRCNameConflict, spec.Kind, spec.Name)
}
return nil, gvr, fmt.Errorf("infrastructure: create %s/%s: %w", spec.Kind, spec.Name, err)
}
return created, gvr, nil
}
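// How a handler maps SubmitXRC's outcome onto HTTP. A sketch: the 409
// is documented above; the other codes are assumptions, since the
// handlers themselves aren't part of this hunk.
func statusForSubmitSketch(err error) int {
	switch {
	case err == nil:
		return 202 // Accepted: claim stored, Composition may still be pending
	case errors.Is(err, ErrXRCNameConflict):
		return 409 // Conflict: use PATCH for in-place updates
	default:
		return 502 // Bad Gateway: Sovereign cluster unreachable or rejected the claim
	}
}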
// DeleteXRC marks the named XRC for deletion. Crossplane propagates
// the delete to the composite resource and, where the composed
// managed resources carry deletionPolicy=Delete, reaps the underlying
// cloud resources. Returns ErrXRCNameConflict when the claim doesn't
// exist (the operator should refresh the topology before retrying).
func DeleteXRC(ctx context.Context, client dynamic.Interface, kind, name string) (schema.GroupVersionResource, error) {
if client == nil {
return schema.GroupVersionResource{}, errors.New("infrastructure: dynamic client is required (sovereign cluster unreachable)")
}
gvr := gvrForKind(kind)
policy := metav1.DeletePropagationForeground
err := client.Resource(gvr).Namespace(XRCNamespace).Delete(ctx, name, metav1.DeleteOptions{
PropagationPolicy: &policy,
})
if err != nil {
if apierrors.IsNotFound(err) {
return gvr, fmt.Errorf("%w: %s/%s not found", ErrXRCNameConflict, kind, name)
}
return gvr, fmt.Errorf("infrastructure: delete %s/%s: %w", kind, name, err)
}
return gvr, nil
}
// gvrForKind — derives the plural resource segment from the Kind.
// All kinds in this surface follow the same rule: lowercase the whole
// Kind, then append "s". RegionClaim → regionclaims. The helper
// centralises the rule so a future kind that violates it surfaces
// here.
func gvrForKind(kind string) schema.GroupVersionResource {
plural := strings.ToLower(kind)
if !strings.HasSuffix(plural, "s") {
plural += "s"
}
return schema.GroupVersionResource{
Group: XRCAPIGroup,
Version: XRCAPIVersion,
Resource: plural,
}
}
// XRCName composes the deterministic XRC name catalyst-api submits.
// The same shape (deployment-id-prefix + verb + slug) is used for
// every claim so the operator can grep across the cluster for a
// single deployment's claims.
//
// E.g. depID="ce476aaf80731a46", verb="region", slug="hel1" →
// "ce476aaf-region-hel1"
func XRCName(deploymentID, verb, slug string) string {
dep := strings.TrimSpace(deploymentID)
if len(dep) > 8 {
dep = dep[:8]
}
verb = strings.ToLower(strings.TrimSpace(verb))
slug = strings.ToLower(strings.TrimSpace(slug))
parts := []string{}
if dep != "" {
parts = append(parts, dep)
}
if verb != "" {
parts = append(parts, verb)
}
if slug != "" {
parts = append(parts, slug)
}
out := strings.Join(parts, "-")
// Crossplane claim names follow DNS-1123 subdomain rules — replace
// any disallowed characters with '-', clamp at 63 chars, and re-trim
// so a truncated name can't end in '-' or '.'.
out = sanitizeDNS1123(out)
if len(out) > 63 {
out = strings.Trim(out[:63], "-.")
}
return out
}
func sanitizeDNS1123(in string) string {
var b strings.Builder
for _, r := range in {
switch {
case r >= 'a' && r <= 'z':
b.WriteRune(r)
case r >= '0' && r <= '9':
b.WriteRune(r)
case r == '-' || r == '.':
b.WriteRune(r)
case r >= 'A' && r <= 'Z':
b.WriteRune(r + 32)
default:
b.WriteRune('-')
}
}
out := b.String()
out = strings.Trim(out, "-.")
if out == "" {
out = "x"
}
return out
}
// SubmittedAt — UTC instant the handlers stamp into the 202
// response's submittedAt field. Declared as a swappable package
// variable so tests can inject a fake clock.
var SubmittedAt = func() time.Time {
return time.Now().UTC()
}
// normaliseJSONMap walks a map[string]any and converts every leaf
// value to a JSON-compatible scalar (string, bool, int64, float64,
// nil) plus recursively normalised maps/slices. The k8s.io
// apimachinery's DeepCopyJSONValue panics on Go `int` (no JSON
// equivalent) — we hit that path through unstructured.SetNestedMap.
// Centralising the conversion here keeps the per-handler spec
// blocks readable (`map[string]any{"workerCount": body.WorkerCount}`)
// while the wire-level shape stays JSON-strict.
func normaliseJSONMap(in map[string]any) map[string]any {
out := make(map[string]any, len(in))
for k, v := range in {
out[k] = normaliseJSONValue(v)
}
return out
}
func normaliseJSONValue(v any) any {
switch x := v.(type) {
case nil, bool, string, int64, float64:
return x
case int:
return int64(x)
case int32:
return int64(x)
case uint:
return int64(x)
case uint32:
return int64(x)
case uint64:
return int64(x)
case float32:
return float64(x)
case map[string]any:
return normaliseJSONMap(x)
case []any:
out := make([]any, len(x))
for i := range x {
out[i] = normaliseJSONValue(x[i])
}
return out
case []string:
out := make([]any, len(x))
for i := range x {
out[i] = x[i]
}
return out
default:
// Fallback — render unknown types as their fmt.Sprint string
// so the XRC carries a valid JSON value and the operator can
// see what was attempted in the resulting Pending claim. The
// alternative (drop the field) would silently corrupt the
// audit trail.
return fmt.Sprintf("%v", x)
}
}
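// What the normaliser does to a handler-composed spec, for reference
// (shapes only; not an exhaustive mapping):
//
//	normaliseJSONMap(map[string]any{"workers": 3, "skus": []string{"cpx31"}})
//	// → map[string]any{"workers": int64(3), "skus": []any{"cpx31"}}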

internal/jobs/mutation_bridge.go
@@ -0,0 +1,267 @@
// mutation_bridge.go — Day-2 mutation Job audit trail.
//
// Every infrastructure CRUD endpoint on the catalyst-api goes through
// this surface BEFORE writing the Crossplane XRC, so an operator
// browsing /api/v1/deployments/{id}/jobs sees a complete record of
// every Day-2 action with the diff that was applied + the XRC that
// implements it.
//
// # Audit trail shape
//
// Per mutation, the bridge writes:
//
// - One Job (jobName="mutation-<verb>-<kind>", batchId="day-2-mutations")
// with deterministic id JobID(deploymentID, jobName).
// - One Execution scoped to the same Job, in `running` state.
// - One INFO LogLine "[mutation-request] action=<action> diff=...".
//
// After the XRC is submitted by the caller, AppendXRCSubmittedLog
// stamps a follow-up INFO LogLine "[xrc-submitted] kind=<kind>
// name=<name>" — and Crossplane's Composition controller, when
// reconciling the claim, appends further LogLines via the existing
// helmwatch bridge path (for component-typed claims such as
// ClusterClaim / VClusterClaim) or via per-claim status watchers
// (TBD by the third-sibling chart's audit shape).
//
// # Why a separate batch
//
// The bootstrap-kit Phase-1 install Jobs go into batchId="bootstrap-kit".
// Day-2 mutations go into batchId="day-2-mutations" so the FE Jobs
// surface can render them as a separate column. The same Bridge
// instance handles both via UpsertJob's batch field.
package jobs
import (
"errors"
"fmt"
"strings"
"time"
)
// BatchDay2Mutations — every mutation Job lands in this batch so the
// FE can group them separately from the Phase-1 install Jobs.
const BatchDay2Mutations = "day-2-mutations"
// MutationJobNamePrefix — every mutation Job's name starts with this.
// JobName format: "mutation-<verb>-<kind>[-<slug>]" (e.g.
// "mutation-add-region-hel1"). The verb is handler-supplied; the kind
// is the XRC kind minus its suffix (RegionClaim → "region").
const MutationJobNamePrefix = "mutation-"
// MutationRecord — the typed payload one CRUD endpoint passes to
// RegisterMutationJob. Free-form Diff is rendered verbatim into the
// first LogLine; a unified-diff format is the convention.
type MutationRecord struct {
// Verb — short action label (add | remove | update | scale |
// cordon | drain | replace). Used as the second segment of the
// JobName.
Verb string
// Kind — XRC kind without the "Claim" suffix, lowercase. E.g.
// RegionClaim → "region", NodePoolClaim → "node-pool".
Kind string
// Slug — short identifier of the target resource (region id,
// pool name, node id). Used to disambiguate concurrent mutations
// on the same kind.
Slug string
// Action — full operator-readable action string for the log line
// (e.g. "add-region region=hel1 sku=cpx32 workers=2"). Persisted
// verbatim into the LogLine message.
Action string
// Diff — the desired change, in unified-diff or compact ASCII
// form. Rendered into the first LogLine and onto the XRC's
// catalyst.openova.io/diff annotation by the caller.
Diff string
// XRCKind — the Crossplane Composite Resource Claim kind the
// CRUD handler will write next (e.g. "RegionClaim"). Stamped
// into a second LogLine via AppendXRCSubmittedLog after the
// create() call.
XRCKind string
// At — wall-clock instant the mutation request landed. Defaults
// to time.Now() inside the helper.
At time.Time
}
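// Illustrative only (values invented): the record an add-region
// handler might pass before writing its RegionClaim.
//
//	rec := MutationRecord{
//		Verb:    "add",
//		Kind:    "region",
//		Slug:    "hel1",
//		Action:  "add-region region=hel1 sku=cpx32 workers=2",
//		Diff:    "+ region hel1 (cpx32 x 2 workers)",
//		XRCKind: "RegionClaim",
//	}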
// MutationResult — what RegisterMutationJob returns to the caller.
// The CRUD handler funnels these into the 202 response so the FE
// can deep-link to the GitLab-style log viewer.
type MutationResult struct {
JobID string
JobName string
ExecutionID string
}
// ErrInvalidMutation — returned when the supplied MutationRecord is
// missing required fields. CRUD handlers map this onto HTTP 500
// (this would be a programmer error since the wizard shape is
// validated before the handler reaches RegisterMutationJob).
var ErrInvalidMutation = errors.New("jobs: invalid mutation record")
// RegisterMutationJob writes the audit-trail Job + Execution +
// initial LogLine for one Day-2 mutation. Caller MUST call this
// BEFORE submitting the XRC; the handler then calls
// AppendXRCSubmittedLog with the create() result so the Job's log
// trail captures both the request and the submission.
//
// The store-level writes are serialised under Store.mu; the bridge's
// own state (b.activeExecID + b.lastState maps) is taken under b.mu
// so concurrent mutation registrations on the same deployment can't
// tear the cursor.
func (b *Bridge) RegisterMutationJob(rec MutationRecord) (MutationResult, error) {
if b == nil {
return MutationResult{}, errors.New("jobs: bridge is nil")
}
if strings.TrimSpace(rec.Verb) == "" {
return MutationResult{}, fmt.Errorf("%w: verb is required", ErrInvalidMutation)
}
if strings.TrimSpace(rec.Kind) == "" {
return MutationResult{}, fmt.Errorf("%w: kind is required", ErrInvalidMutation)
}
at := rec.At
if at.IsZero() {
at = time.Now().UTC()
} else {
at = at.UTC()
}
jobName := mutationJobName(rec.Verb, rec.Kind, rec.Slug)
jobID := JobID(b.deploymentID, jobName)
// Upsert the Job in `running` so the FE table renders the row
// with a spinner the moment the API returns 202. We don't enter
// `pending` here because the catalyst-api side has already
// committed to writing the XRC — pending would be misleading.
job := Job{
DeploymentID: b.deploymentID,
JobName: jobName,
AppID: rec.Kind,
BatchID: BatchDay2Mutations,
DependsOn: []string{},
Status: StatusRunning,
}
if err := b.store.UpsertJob(job); err != nil {
return MutationResult{}, fmt.Errorf("jobs: upsert mutation job: %w", err)
}
// Allocate the Execution. StartExecution stamps the Job's
// LatestExecutionID + StartedAt; we don't have to UpsertJob
// again post-allocation.
exec, err := b.store.StartExecution(b.deploymentID, jobName, at)
if err != nil {
return MutationResult{}, fmt.Errorf("jobs: start mutation execution: %w", err)
}
// Take b.mu to record the active execution cursor.
b.mu.Lock()
b.activeExecID[jobName] = exec.ID
b.lastState[jobName] = StatusRunning
b.mu.Unlock()
// Initial LogLine — "[mutation-request] ...".
msg := "[mutation-request] " + strings.TrimSpace(rec.Action)
if rec.Diff != "" {
msg += " diff=" + strings.ReplaceAll(rec.Diff, "\n", " ")
}
if err := b.store.AppendLogLines(b.deploymentID, exec.ID, []LogLine{{
Timestamp: at,
Level: LevelInfo,
Message: strings.TrimSpace(msg),
}}); err != nil {
return MutationResult{}, fmt.Errorf("jobs: append mutation request log: %w", err)
}
return MutationResult{
JobID: jobID,
JobName: jobName,
ExecutionID: exec.ID,
}, nil
}
// AppendXRCSubmittedLog appends an INFO LogLine recording the XRC
// the catalyst-api just submitted. Called after SubmitXRC succeeds.
// The CRUD handler passes the same MutationResult RegisterMutationJob
// returned so the helper finds the right Execution.
//
// When the XRC create errors (e.g. AlreadyExists), the handler
// instead calls FinishMutationJob with status=failed; this helper
// is for the success path.
func (b *Bridge) AppendXRCSubmittedLog(res MutationResult, xrcKind, xrcName, note string) error {
if b == nil {
return errors.New("jobs: bridge is nil")
}
if strings.TrimSpace(res.ExecutionID) == "" {
return errors.New("jobs: AppendXRCSubmittedLog: ExecutionID is required")
}
msg := "[xrc-submitted] kind=" + xrcKind + " name=" + xrcName
if note != "" {
msg += " note=" + note
}
return b.store.AppendLogLines(b.deploymentID, res.ExecutionID, []LogLine{{
Timestamp: time.Now().UTC(),
Level: LevelInfo,
Message: msg,
}})
}
// FinishMutationJob flips the mutation Job into a terminal state
// AFTER the XRC submission completes. status MUST be StatusSucceeded
// or StatusFailed; the handler decides which based on the
// SubmitXRC outcome.
//
// For a successful submission the Job is technically still "running"
// from Crossplane's POV (the Composition is reconciling), but the
// API-side audit job is done — Crossplane's own status feed is the
// continuing log surface. Treating the API-side job as Succeeded on
// "submission accepted" matches the FE expectation: the row turns
// green when 202 lands, and Crossplane's downstream reconciliation
// surfaces as additional LogLines on the same Execution OR via the
// helmwatch bridge (for component claims).
func (b *Bridge) FinishMutationJob(res MutationResult, status string, errMsg string) error {
if b == nil {
return errors.New("jobs: bridge is nil")
}
if strings.TrimSpace(res.ExecutionID) == "" {
return errors.New("jobs: FinishMutationJob: ExecutionID is required")
}
if status == "" {
status = StatusSucceeded
}
if !IsTerminal(status) {
return fmt.Errorf("jobs: FinishMutationJob: status must be terminal, got %q", status)
}
if errMsg != "" {
_ = b.store.AppendLogLines(b.deploymentID, res.ExecutionID, []LogLine{{
Timestamp: time.Now().UTC(),
Level: LevelError,
Message: "[xrc-submission-failed] " + errMsg,
}})
}
if err := b.store.FinishExecution(b.deploymentID, res.ExecutionID, status, time.Now().UTC()); err != nil {
return err
}
b.mu.Lock()
delete(b.activeExecID, res.JobName)
delete(b.lastState, res.JobName)
b.mu.Unlock()
return nil
}
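// The end-to-end handler flow, as a sketch (the HTTP handlers aren't
// part of this hunk; names and status mapping are assumptions):
//
//	res, err := bridge.RegisterMutationJob(rec) // audit trail first
//	// err != nil → 500: programmer error, see ErrInvalidMutation
//	obj, _, err := infrastructure.SubmitXRC(ctx, dyn, spec)
//	if err != nil {
//		_ = bridge.FinishMutationJob(res, StatusFailed, err.Error())
//		// 409 on ErrXRCNameConflict, 5xx otherwise
//	} else {
//		_ = bridge.AppendXRCSubmittedLog(res, spec.Kind, obj.GetName(), "")
//		_ = bridge.FinishMutationJob(res, StatusSucceeded, "")
//		// 202 with res.JobID deep-linking to the Jobs surface
//	}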
// mutationJobName composes the JobName from the request fields. The
// shape is "mutation-<verb>-<kind>[-<slug>]" so concurrent mutations
// on the same kind don't collide on a single Job row.
func mutationJobName(verb, kind, slug string) string {
v := strings.ToLower(strings.TrimSpace(verb))
k := strings.ToLower(strings.TrimSpace(kind))
s := strings.ToLower(strings.TrimSpace(slug))
parts := []string{MutationJobNamePrefix + v, k}
if s != "" {
parts = append(parts, s)
}
return strings.Join(parts, "-")
}