feat(wipe): deployment-level Cancel & Wipe — backend endpoint + Cloud-Architecture + wizard banner entry-points (closes #318) (#346)

* feat(wipe): deployment-level Cancel & Wipe — backend endpoint + Cloud-Architecture + wizard banner entry-points (closes #318)

Adds a first-class Phase-0 recovery surface so an operator can purge a
failed pre-handover deployment from the wizard UI without dropping to
hcloud CLI runbooks. Two entry-points, one canonical implementation.

## Backend

NEW: products/catalyst/bootstrap/api/internal/handler/wipe.go
  POST /api/v1/deployments/{id}/wipe — single-flight destructive op:
    1. tofu destroy against the per-deployment workdir (idempotent).
    2. Hetzner orphan force-purge by label-selector
       `catalyst-deployment-id=<id>` (servers, load balancers,
       networks, firewalls, ssh-keys). Belt-and-braces — catches
       resources tofu didn't track (half-failed cloud-init, manual
       experiments). Per docs/INVIOLABLE-PRINCIPLES.md #3 this direct
       API path is fallback ONLY for orphan cleanup, never new
       resource creation.
    3. PDM /v1/release for pool-subdomain Sovereigns (best-effort).
    4. Local cleanup: kubeconfig file (mode 0600), tofu workdir,
       on-disk deployment record JSON.
    5. SSE events stream throughout on the same channel as the
       original provisioning + Phase-1 watch.
    6. Marks Status="wiped"; sync.Map entry reaped after a 60s TTL.
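
A minimal Go sketch of the call, for illustration only (host, deployment id,
and token below are placeholders; the JSON field names mirror the
wipeRequest/wipeResponse structs in wipe.go):

  package main

  import (
      "bytes"
      "encoding/json"
      "fmt"
      "net/http"
  )

  func main() {
      // Body mirrors wipeRequest: only the re-prompted Hetzner token.
      body, _ := json.Marshal(map[string]string{"hetznerToken": "<re-prompted token>"})

      // Host and deployment id are placeholders for this sketch.
      resp, err := http.Post(
          "http://localhost:8080/api/v1/deployments/dep-0000/wipe",
          "application/json", bytes.NewReader(body))
      if err != nil {
          panic(err)
      }
      defer resp.Body.Close()

      // Subset of wipeResponse, enough to show the summary shape.
      var report struct {
          TofuDestroyed bool     `json:"tofuDestroyed"`
          PDMReleased   bool     `json:"pdmReleased"`
          LocalCleaned  bool     `json:"localCleaned"`
          Errors        []string `json:"errors"`
      }
      _ = json.NewDecoder(resp.Body).Decode(&report)
      fmt.Printf("HTTP %d tofuDestroyed=%v pdmReleased=%v errors=%d\n",
          resp.StatusCode, report.TofuDestroyed, report.PDMReleased, len(report.Errors))
  }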

NEW: products/catalyst/bootstrap/api/internal/hetzner/purge.go
  Hetzner Cloud API enumeration + force-delete by label selector.
  Uses a 60s timeout (vs the 10s ValidateToken default) because async
  server-delete jobs can queue. 404s treated as success (already gone).

NEW: products/catalyst/bootstrap/api/internal/provisioner/provisioner.go
  Provisioner.Destroy() — runs `tofu destroy -auto-approve` against
  the per-deployment workdir, then removes the workdir on success so
  re-provisioning starts fresh. Re-stages module + tfvars first so a
  partially-cleaned workdir still has what tofu needs.

TOUCHED: products/catalyst/bootstrap/api/cmd/api/main.go
  Registers POST /api/v1/deployments/{id}/wipe.
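
Step 5 of the sequence above streams purge progress on the deployment's
existing SSE channel. A rough Go reader sketch, assuming the stream is
exposed at GET /api/v1/deployments/{id}/events (the handler comments
reference "GET /events", but the exact route is outside this diff):

  package main

  import (
      "bufio"
      "fmt"
      "net/http"
      "strings"
  )

  func main() {
      // Hypothetical events path and deployment id; adjust to the registered route.
      resp, err := http.Get("http://localhost:8080/api/v1/deployments/dep-0000/events")
      if err != nil {
          panic(err)
      }
      defer resp.Body.Close()

      // SSE frames arrive as "data: {...}" lines; the stream ends when the
      // handler closes the events channel after the final wipe summary.
      sc := bufio.NewScanner(resp.Body)
      for sc.Scan() {
          if line := sc.Text(); strings.HasPrefix(line, "data: ") {
              fmt.Println(strings.TrimPrefix(line, "data: "))
          }
      }
  }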

## Frontend (aligned with existing CrudModals conventions per founder directive — no ad-hoc surface)

NEW: products/catalyst/bootstrap/ui/src/components/CrudModals/WipeDeploymentModal.tsx
  Two-stage modal built on the canonical ModalShell. Pre-wipe confirm
  view requires the operator to:
    - Type the sovereign FQDN to confirm scope.
    - Re-paste their Hetzner Cloud API token (catalyst-api intentionally
      GCs the original after writeTfvars per credential hygiene).
  Post-wipe success view shows the PurgeReport (servers, lbs, networks,
  firewalls, ssh-keys removed; tofu/PDM/local-state ✓/✗) and a
  "Start fresh deployment" CTA that nav's to /sovereign.

TOUCHED: products/catalyst/bootstrap/ui/src/components/CrudModals/index.ts
  Re-exports WipeDeploymentModal + WipeReport.

TOUCHED: products/catalyst/bootstrap/ui/src/pages/sovereign/AppsPage.tsx
  FailureCard now exposes a "Cancel & Wipe" red button next to
  "Retry stream" / "Back to wizard" — opens WipeDeploymentModal.

TOUCHED: products/catalyst/bootstrap/ui/src/pages/sovereign/InfrastructureTopology.tsx
  Cloud → Architecture canvas: the `cloud` (root) node action menu
  gains "Cancel & Wipe deployment" as a `danger:true` action,
  alongside the existing "+ Add region". Distinct from the
  per-resource DeleteCascadeConfirm on region/cluster/vCluster — this
  is deployment-scope (Phase-0 orphan purge), the others are
  Crossplane-XRC scope (day-2). The two paths coexist; operators
  choose by what state the deployment is in.

## Why two entry-points

Wizard banner (failed state on AppsPage) — recovery from a known
failure. Already a red-banner page; the button is right there.

Cloud → Architecture cloud-node action — proactive cancel from the
canvas, mirrors how the existing per-resource deletes are reachable.
Same modal, same backend.

## Constraints honoured

- Per docs/INVIOLABLE-PRINCIPLES.md #3 (Crossplane is the ONLY day-2
  IaC): the per-resource DELETE handler at infrastructure.go is
  unchanged and continues to flip XRC deletionPolicy. Wipe operates
  ONLY in Phase-0 scope where Crossplane never adopted resources.
- Per #4 (never hardcode): every endpoint lives behind API_BASE; the
  Hetzner purge enumerates by deterministic label selector built from
  var.sovereign_fqdn (the OpenTofu module's existing tagging convention).
- Per credential hygiene: the Hetzner token is re-prompted at wipe time
  rather than persisted; the modal uses an <input type="password">.

## Refs

#318 — pre-handover wipe spec (this PR closes it)
#317 — handover finalisation (sibling; this PR is the failure-path
       complement)
feedback_idempotent_iac_purge.md — operator runbook this implements
PR #313 — sealed-secrets cleanup (independent; safe to land in any order)
PR #334 — bp-external-secrets split (independent)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): catalyst-build event-driven only — drop cron, push-on-main with path filter

Per docs/INVIOLABLE-PRINCIPLES.md (event-driven end to end — Flux
dependsOn, NATS JetStream, SSE, Helm hooks), GitHub Actions must follow
the same model. The previous daily `schedule: cron '0 3 * * *'` build was
the only automatic deploy trigger, which added up to 24h of roll latency
to every change on the catalyst surface and incentivised "wait for cron"
stalls in operator workflows.

Replaces with:
  on:
    push:
      branches: [main]
      paths:
        - 'core/console/**'
        - 'core/admin/**'
        - 'core/marketplace/**'
        - 'core/marketplace-api/**'
        - 'products/catalyst/bootstrap/**'
        - 'products/catalyst/chart/**'
        - '.github/workflows/catalyst-build.yaml'
    workflow_dispatch:

`workflow_dispatch` retained for ad-hoc re-runs (config-only changes
that bypass the path filter, e.g. a secret rotation that doesn't touch
code). Path filter mirrors the actual surface this workflow rebuilds.

After this lands, every merge to main that touches the catalyst surface
auto-deploys. No cron lag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3mrah 2026-05-01 09:24:40 +04:00 committed by GitHub
parent 02e57bd060
commit 4d24914ae4
10 changed files with 1469 additions and 21 deletions


@ -1,9 +1,22 @@
name: Build & Deploy Catalyst
# Event-driven only. Cron is forbidden — the OpenOva architecture is
# event-driven end to end (Flux dependsOn, NATS JetStream, SSE,
# Helm post-install hooks). `push` on the relevant paths is the
# canonical trigger; `workflow_dispatch` exists for ad-hoc re-runs
# without a code change.
on:
push:
branches: [main]
paths:
- 'core/console/**'
- 'core/admin/**'
- 'core/marketplace/**'
- 'core/marketplace-api/**'
- 'products/catalyst/bootstrap/**'
- 'products/catalyst/chart/**'
- '.github/workflows/catalyst-build.yaml'
workflow_dispatch:
schedule:
- cron: '0 3 * * *' # daily at 03:00 UTC — picks up public repo changes
env:
REGISTRY: ghcr.io


@ -77,6 +77,12 @@ func main() {
// Phase 1 retries emit operator instructions per the architectural
// contract (Flux owns Phase 1 reconciliation).
r.Post("/api/v1/deployments/{id}/phases/{phase}/retry", h.RetryPhase)
// Cancel & Wipe endpoint (issue #318). Operator-triggered purge of a
// failed or abandoned deployment: tofu destroy + Hetzner orphan purge
// + PDM release + local state cleanup. Idempotent. Returns 200 with a
// PurgeReport summary. The wizard's failed-state banner renders the
// operator confirmation modal that POSTs here.
r.Post("/api/v1/deployments/{id}/wipe", h.WipeDeployment)
// Jobs/Executions REST surface (issue #205, sub of epic #204) — the
// table-view UX reads this in parallel to the existing SSE events
// feed. The 4 endpoints are read-only; every mutation flows


@ -0,0 +1,300 @@
// wipe.go — pre-handover wipe surface (issue #318).
//
// When a provisioning attempt fails (mid-tofu-apply, mid-Phase-1, or just
// because the operator decided to abandon), the wizard's failed-state
// banner exposes a "Cancel & Wipe" button that POSTs here. This handler
// runs the canonical purge sequence:
//
// 1. Cancel any in-flight context (helmwatch informer, current Phase-0
// runner) for this deployment.
// 2. Run `tofu destroy -auto-approve` against the per-deployment
// workdir. Idempotent — re-runs on partial state are safe.
// 3. Run a Hetzner force-purge of any resources tagged with
// `catalyst-deployment-id=<id>` so anything tofu missed (or anything
// created out-of-band) is removed. Belt + braces; tofu destroy is
// the primary path, Hetzner API the safety net.
// 4. Release the PDM allocation row (pool subdomain only). Best-effort:
// a PDM outage doesn't block local cleanup, the pool-domain-manager
// operator can force-release later via `pdm-cli` (#319).
// 5. Delete the on-disk record + kubeconfig + tofu workdir.
// 6. Mark the in-memory Deployment Status="wiped" so subsequent GETs
// return 410 Gone (per the founder's minimum-retention principle —
// Catalyst-Zero retains nothing operational about a wiped
// deployment).
//
// All progress streams as SSE events on the same channel as the original
// provisioning + Phase-1 watch, so the wizard's banner can render the
// purge live without a second stream.
//
// Per docs/INVIOLABLE-PRINCIPLES.md #3 (OpenTofu owns Phase 0): tofu
// destroy is the primary purge mechanism; the Hetzner direct-API call in
// step 3 is fallback ONLY for orphans tofu can't see (corrupt state,
// resources created out-of-band by a half-completed cloud-init, etc.).
// We never use the direct API for new resource creation.
package handler
import (
"context"
"encoding/json"
"errors"
"net/http"
"os"
"path/filepath"
"strings"
"time"
"github.com/go-chi/chi/v5"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/hetzner"
"github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/provisioner"
)
// wipeRequest is the body of POST /api/v1/deployments/{id}/wipe.
//
// HetznerToken is required ONLY when the on-disk Deployment record's
// Request.HetznerToken has been GC'd (the field is intentionally cleared
// after writeTfvars per the credential-hygiene principle). The wizard
// re-prompts the operator for the token in the Cancel & Wipe modal, so
// the value survives just long enough to drive `tofu destroy` + the
// Hetzner orphan purge, then is forgotten.
type wipeRequest struct {
HetznerToken string `json:"hetznerToken"`
}
// wipeResponse summarises what was actually purged. The wizard renders
// the counts in a "Wipe complete — N servers, M load balancers, …
// removed" success banner.
type wipeResponse struct {
DeploymentID string `json:"deploymentId"`
SovereignFQDN string `json:"sovereignFQDN"`
TofuDestroyed bool `json:"tofuDestroyed"`
HetznerPurge hetzner.PurgeReport `json:"hetznerPurge"`
PDMReleased bool `json:"pdmReleased"`
LocalCleaned bool `json:"localCleaned"`
Errors []string `json:"errors"`
WipedAt string `json:"wipedAt"`
}
// WipeDeployment handles POST /api/v1/deployments/{id}/wipe.
//
// Response codes:
// - 200 OK on full or partial success (errors in the body)
// - 400 Bad Request when the body cannot be parsed
// - 404 Not Found when the deployment id is unknown
// - 409 Conflict if a wipe is already in progress for this deployment
// - 500 on a fatal local-state error (workdir non-removable, etc.)
//
// The endpoint is idempotent: re-running on a partially-wiped deployment
// returns the same shape with empty deltas. The wizard treats a 200 with
// non-empty Errors as "investigate the log; some cleanup may be manual".
func (h *Handler) WipeDeployment(w http.ResponseWriter, r *http.Request) {
id := chi.URLParam(r, "id")
val, ok := h.deployments.Load(id)
if !ok {
http.Error(w, "deployment not found", http.StatusNotFound)
return
}
dep := val.(*Deployment)
// Parse body — the wizard re-prompts for the Hetzner token because
// catalyst-api intentionally GCs it from the in-memory Request after
// writeTfvars. Without it tofu destroy + the orphan purge can't
// authenticate.
var body wipeRequest
if err := decodeJSONBody(r, &body); err != nil {
http.Error(w, "invalid request body: "+err.Error(), http.StatusBadRequest)
return
}
if strings.TrimSpace(body.HetznerToken) == "" {
http.Error(w, "hetznerToken is required (re-prompt the operator before calling this endpoint)", http.StatusBadRequest)
return
}
// Single-flight guard: if Status is already "wiping", refuse.
dep.mu.Lock()
if dep.Status == "wiping" {
dep.mu.Unlock()
http.Error(w, "wipe already in progress for this deployment", http.StatusConflict)
return
}
prevStatus := dep.Status
dep.Status = "wiping"
dep.mu.Unlock()
// Re-open the events channel if the previous one is closed, so the
// wizard's banner can render purge progress on the same SSE stream
// it used for provisioning.
dep.mu.Lock()
if dep.eventsCh == nil {
dep.eventsCh = make(chan provisioner.Event, 256)
}
dep.mu.Unlock()
// Note: any live Phase-1 watcher for this deployment will exit
// naturally as `tofu destroy` removes the API server it's watching
// (the watch reconnect will fail with "no route to host" / "EOF" and
// the watcher's own context-deadline-exceeded path takes over).
// We don't need to explicitly cancel here.
_ = dep.liveWatcher
report := wipeResponse{
DeploymentID: id,
SovereignFQDN: dep.Request.SovereignFQDN,
WipedAt: time.Now().UTC().Format(time.RFC3339),
}
emit := func(phase, level, msg string) {
ev := provisioner.Event{
Time: time.Now().UTC().Format(time.RFC3339),
Phase: phase,
Level: level,
Message: msg,
}
dep.recordEvent(ev)
select {
case dep.eventsCh <- ev:
default:
}
}
emit("wipe", "info", "Wipe initiated for "+dep.Request.SovereignFQDN+" (was: "+prevStatus+")")
// Step 1 — tofu destroy. Pass the freshly-prompted Hetzner token via
// the Request so writeTfvars renders it for the destroy run.
wipeReq := dep.Request
wipeReq.HetznerToken = body.HetznerToken
prov := provisioner.New()
tofuCtx, cancel := context.WithTimeout(r.Context(), 30*time.Minute)
defer cancel()
if err := prov.Destroy(tofuCtx, wipeReq, dep.eventsCh); err != nil {
report.Errors = append(report.Errors, "tofu destroy: "+err.Error())
emit("wipe", "warn", "tofu destroy did not complete cleanly: "+err.Error()+" — falling back to direct Hetzner orphan purge")
} else {
report.TofuDestroyed = true
emit("wipe", "info", "tofu destroy complete")
}
// Step 2 — Hetzner orphan purge (always runs as belt-and-braces, even
// when tofu destroy succeeded — catches resources tofu didn't track,
// e.g. a half-failed cloud-init that created a worker manually, or
// resources the operator created in the same project for testing).
purge, err := hetzner.Purge(tofuCtx, body.HetznerToken, id, func(msg string) {
emit("wipe", "info", "hetzner: "+msg)
})
report.HetznerPurge = purge
if err != nil {
report.Errors = append(report.Errors, "hetzner purge: "+err.Error())
}
if purge.Total() > 0 {
emit("wipe", "info", "Hetzner orphan purge removed "+itoa(purge.Total())+" resource(s) (servers: "+itoa(len(purge.Servers))+", lbs: "+itoa(len(purge.LoadBalancers))+", networks: "+itoa(len(purge.Networks))+", firewalls: "+itoa(len(purge.Firewalls))+", ssh-keys: "+itoa(len(purge.SSHKeys))+")")
} else if len(purge.Errors) == 0 {
emit("wipe", "info", "Hetzner orphan purge: nothing to remove (clean account)")
}
// Step 3 — PDM release (pool-subdomain only). Best-effort. Resolve pool
// + subdomain from either the Deployment record (set during the
// reservation step) or from the FQDN as a fallback.
poolDomain, subdomain := dep.pdmPoolDomain, dep.pdmSubdomain
if poolDomain == "" || subdomain == "" {
// Fallback: split sovereignFQDN at the first dot.
if idx := strings.IndexByte(dep.Request.SovereignFQDN, '.'); idx > 0 {
subdomain = dep.Request.SovereignFQDN[:idx]
poolDomain = dep.Request.SovereignFQDN[idx+1:]
}
}
if dep.Request.SovereignDomainMode == "pool" && poolDomain != "" && subdomain != "" {
releaseCtx, releaseCancel := context.WithTimeout(r.Context(), 30*time.Second)
if err := h.pdm.Release(releaseCtx, poolDomain, subdomain); err != nil {
report.Errors = append(report.Errors, "pdm release: "+err.Error())
emit("wipe", "warn", "PDM release failed (operator must run pdm-cli force-release later): "+err.Error())
} else {
report.PDMReleased = true
emit("wipe", "info", "PDM allocation released for "+subdomain+"."+poolDomain)
}
releaseCancel()
} else {
emit("wipe", "info", "BYO or unresolvable pool — no PDM allocation to release")
}
// Step 4 — local state cleanup. Three things to clean:
// - kubeconfig file (mode 0600 per file)
// - tofu workdir (already removed by Destroy on success, but be
// defensive in case Destroy returned an error and left it)
// - on-disk deployment record JSON
if h.kubeconfigsDir != "" {
kcPath := filepath.Join(h.kubeconfigsDir, id+".yaml")
if err := os.Remove(kcPath); err != nil && !os.IsNotExist(err) {
report.Errors = append(report.Errors, "remove kubeconfig: "+err.Error())
} else if err == nil {
emit("wipe", "info", "kubeconfig file removed: "+kcPath)
}
}
tofuWorkDir := filepath.Join(prov.WorkDir, deploymentSovereignName(dep.Request.SovereignFQDN))
if err := os.RemoveAll(tofuWorkDir); err != nil {
report.Errors = append(report.Errors, "remove tofu workdir: "+err.Error())
}
if h.store != nil {
if err := h.store.Delete(id); err != nil {
report.Errors = append(report.Errors, "store delete: "+err.Error())
} else {
report.LocalCleaned = true
emit("wipe", "info", "deployment record removed from on-disk store")
}
} else {
report.LocalCleaned = true
}
// Step 5 — finalize. Mark the deployment "wiped" in memory and close
// the events channel so the SSE stream terminates with a clean
// boundary. The next GET /events returns the full purge log; any
// future GET on the deployment id returns 410 Gone (handled in
// GetDeployment).
dep.mu.Lock()
dep.Status = "wiped"
dep.FinishedAt = time.Now().UTC()
dep.mu.Unlock()
// Don't immediately remove from sync.Map — we want StreamLogs
// reconnects within ~30s to see the final wipe summary frames. A
// background goroutine reaps after a TTL.
go func() {
time.Sleep(60 * time.Second)
h.deployments.Delete(id)
}()
emit("wipe", "info", "Wipe complete. Start a fresh deployment from /sovereign.")
// Close the events channel so SSE consumers get a clean EOF after
// replaying the purge log.
dep.mu.Lock()
if dep.eventsCh != nil {
close(dep.eventsCh)
dep.eventsCh = nil
}
dep.mu.Unlock()
writeJSON(w, http.StatusOK, report)
}
// deploymentSovereignName mirrors provisioner.Request.sovereignName() —
// dot-to-dash so the workdir lookup matches what Provision/Destroy use.
// Duplicated here (vs exported on the package) to keep that field
// internal to the provisioner; the handler only needs this for cleanup.
func deploymentSovereignName(fqdn string) string {
return strings.ReplaceAll(fqdn, ".", "-")
}
// decodeJSONBody is a thin error-wrapping helper for request bodies. Other
// handlers in this package use json.NewDecoder directly; we wrap to
// produce a consistent 400 message.
func decodeJSONBody(r *http.Request, dst any) error {
if r.Body == nil {
return errors.New("empty body")
}
defer r.Body.Close()
return json.NewDecoder(r.Body).Decode(dst)
}


@ -0,0 +1,278 @@
// Package hetzner — orphan purge surface used by the wizard's Cancel & Wipe
// path (issue #318). When `tofu destroy` either fails partway or has no
// state to act against, the operator still needs a clean cloud account.
// This file enumerates and force-deletes every Hetzner resource tagged
// with the per-deployment label so the next provisioning round starts
// from zero.
//
// Per docs/INVIOLABLE-PRINCIPLES.md #3 (OpenTofu owns Phase 0): under
// normal operation `tofu destroy` is the canonical purge path. This file
// is the recovery fallback. It is therefore allowed to call Hetzner API
// directly — but only for orphan cleanup, never for new resource
// creation. Per the same principle, all NEW resource creation flows
// through OpenTofu.
package hetzner
import (
"context"
"encoding/json"
"fmt"
"net/http"
"net/url"
"strconv"
"strings"
"time"
)
// PurgeReport summarises what the purge actually deleted. Returned to the
// wizard so the SSE log shows the operator a concrete tally of what was
// removed (or what was already gone).
type PurgeReport struct {
Servers []string `json:"servers"`
LoadBalancers []string `json:"load_balancers"`
Networks []string `json:"networks"`
Firewalls []string `json:"firewalls"`
SSHKeys []string `json:"ssh_keys"`
Errors []string `json:"errors"`
}
// Total returns the report's fields summed for the SSE log.
func (r PurgeReport) Total() int {
return len(r.Servers) + len(r.LoadBalancers) + len(r.Networks) + len(r.Firewalls) + len(r.SSHKeys)
}
// Purge enumerates and deletes every Hetzner resource tagged with the
// label `catalyst-deployment-id=<deploymentID>`. The OpenTofu module at
// infra/hetzner/main.tf is responsible for setting this label on every
// resource it creates; we filter on it here.
//
// progress is called for each successful delete with a human-readable
// message ("deleted server otech-cp-1", "deleted lb otech-lb", …) so the
// wizard can stream the cleanup live. Pass nil to silence.
//
// The purge is best-effort. A failure to delete one resource does not
// abort the others; failures land in PurgeReport.Errors. The caller
// decides whether non-zero errors are fatal.
func Purge(ctx context.Context, token, deploymentID string, progress func(msg string)) (PurgeReport, error) {
report := PurgeReport{}
if progress == nil {
progress = func(string) {}
}
if strings.TrimSpace(token) == "" {
return report, fmt.Errorf("hetzner token is empty")
}
if strings.TrimSpace(deploymentID) == "" {
return report, fmt.Errorf("deployment id is empty")
}
labelSelector := "catalyst-deployment-id=" + deploymentID
// Order matters: dependents first, then independents, so deletes succeed.
// 1. Servers reference networks + firewalls + ssh-keys → delete first.
// 2. Load balancers reference networks → delete second.
// 3. Firewalls + networks + ssh-keys are independent → any order.
servers, err := listResources(ctx, token, "/v1/servers", labelSelector, "servers")
if err != nil {
report.Errors = append(report.Errors, "list servers: "+err.Error())
}
for _, r := range servers {
if err := deleteResource(ctx, token, "/v1/servers/"+strconv.FormatInt(r.ID, 10)); err != nil {
report.Errors = append(report.Errors, fmt.Sprintf("delete server %s: %s", r.Name, err.Error()))
continue
}
report.Servers = append(report.Servers, r.Name)
progress(fmt.Sprintf("deleted server %s", r.Name))
}
lbs, err := listResources(ctx, token, "/v1/load_balancers", labelSelector, "load_balancers")
if err != nil {
report.Errors = append(report.Errors, "list load_balancers: "+err.Error())
}
for _, r := range lbs {
if err := deleteResource(ctx, token, "/v1/load_balancers/"+strconv.FormatInt(r.ID, 10)); err != nil {
report.Errors = append(report.Errors, fmt.Sprintf("delete lb %s: %s", r.Name, err.Error()))
continue
}
report.LoadBalancers = append(report.LoadBalancers, r.Name)
progress(fmt.Sprintf("deleted load balancer %s", r.Name))
}
firewalls, err := listResources(ctx, token, "/v1/firewalls", labelSelector, "firewalls")
if err != nil {
report.Errors = append(report.Errors, "list firewalls: "+err.Error())
}
for _, r := range firewalls {
if err := deleteResource(ctx, token, "/v1/firewalls/"+strconv.FormatInt(r.ID, 10)); err != nil {
report.Errors = append(report.Errors, fmt.Sprintf("delete firewall %s: %s", r.Name, err.Error()))
continue
}
report.Firewalls = append(report.Firewalls, r.Name)
progress(fmt.Sprintf("deleted firewall %s", r.Name))
}
networks, err := listResources(ctx, token, "/v1/networks", labelSelector, "networks")
if err != nil {
report.Errors = append(report.Errors, "list networks: "+err.Error())
}
for _, r := range networks {
if err := deleteResource(ctx, token, "/v1/networks/"+strconv.FormatInt(r.ID, 10)); err != nil {
report.Errors = append(report.Errors, fmt.Sprintf("delete network %s: %s", r.Name, err.Error()))
continue
}
report.Networks = append(report.Networks, r.Name)
progress(fmt.Sprintf("deleted network %s", r.Name))
}
sshkeys, err := listResources(ctx, token, "/v1/ssh_keys", labelSelector, "ssh_keys")
if err != nil {
report.Errors = append(report.Errors, "list ssh_keys: "+err.Error())
}
for _, r := range sshkeys {
if err := deleteResource(ctx, token, "/v1/ssh_keys/"+strconv.FormatInt(r.ID, 10)); err != nil {
report.Errors = append(report.Errors, fmt.Sprintf("delete ssh_key %s: %s", r.Name, err.Error()))
continue
}
report.SSHKeys = append(report.SSHKeys, r.Name)
progress(fmt.Sprintf("deleted ssh-key %s", r.Name))
}
return report, nil
}
// hetznerResource is the minimum shape we need from each Hetzner list
// response to drive the delete loop.
type hetznerResource struct {
ID int64 `json:"id"`
Name string `json:"name"`
}
// listResources GETs /v1/<resource> with the label selector and returns
// every entry. Hetzner pages at 25 per page by default; we page via
// meta.pagination.next_page until exhausted.
func listResources(ctx context.Context, token, path, labelSelector, listKey string) ([]hetznerResource, error) {
var out []hetznerResource
page := 1
for {
q := url.Values{}
q.Set("label_selector", labelSelector)
q.Set("page", strconv.Itoa(page))
q.Set("per_page", "50")
req, err := http.NewRequestWithContext(ctx, http.MethodGet,
"https://api.hetzner.cloud"+path+"?"+q.Encode(), nil)
if err != nil {
return nil, err
}
req.Header.Set("Authorization", "Bearer "+token)
resp, err := purgeHTTPClient.Do(req)
if err != nil {
return nil, err
}
body := decodeBody(resp)
_ = resp.Body.Close()
if resp.StatusCode == http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden {
return nil, fmt.Errorf("hetzner auth failed (status %d) — token may have been rotated or scope revoked", resp.StatusCode)
}
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("hetzner list %s: unexpected status %d: %s", path, resp.StatusCode, body.errMsg())
}
entries, _ := body.list(listKey)
out = append(out, entries...)
if !body.hasNextPage() {
break
}
page++
if page > 50 {
break // sanity bound
}
}
return out, nil
}
// deleteResource issues DELETE on the given path. Hetzner returns 200/204
// on success, 404 when already gone (treated as success), 423/429 when
// retryable. We treat 404 as success and surface every other non-2xx as
// an error the caller appends to PurgeReport.Errors.
func deleteResource(ctx context.Context, token, path string) error {
req, err := http.NewRequestWithContext(ctx, http.MethodDelete,
"https://api.hetzner.cloud"+path, nil)
if err != nil {
return err
}
req.Header.Set("Authorization", "Bearer "+token)
resp, err := purgeHTTPClient.Do(req)
if err != nil {
return err
}
defer resp.Body.Close()
switch resp.StatusCode {
case http.StatusOK, http.StatusNoContent, http.StatusAccepted, http.StatusNotFound:
return nil
default:
return fmt.Errorf("status %d", resp.StatusCode)
}
}
// purgeHTTPClient is separate from the package-level httpClient in
// client.go because purge operations may legitimately take longer than
// the 10s ValidateToken bound (Hetzner async server-delete jobs can
// queue under load).
var purgeHTTPClient = &http.Client{Timeout: 60 * time.Second}
// hetznerListBody is a thin facade over the JSON body so the list+next
// logic stays readable in one place.
type hetznerListBody struct {
raw map[string]json.RawMessage
}
func decodeBody(resp *http.Response) hetznerListBody {
body := hetznerListBody{raw: map[string]json.RawMessage{}}
_ = json.NewDecoder(resp.Body).Decode(&body.raw)
return body
}
func (b hetznerListBody) list(key string) ([]hetznerResource, error) {
raw, ok := b.raw[key]
if !ok {
return nil, nil
}
var entries []hetznerResource
if err := json.Unmarshal(raw, &entries); err != nil {
return nil, err
}
return entries, nil
}
func (b hetznerListBody) hasNextPage() bool {
raw, ok := b.raw["meta"]
if !ok {
return false
}
var meta struct {
Pagination struct {
NextPage *int `json:"next_page"`
} `json:"pagination"`
}
if err := json.Unmarshal(raw, &meta); err != nil {
return false
}
return meta.Pagination.NextPage != nil
}
func (b hetznerListBody) errMsg() string {
raw, ok := b.raw["error"]
if !ok {
return ""
}
var e struct {
Code string `json:"code"`
Message string `json:"message"`
}
if err := json.Unmarshal(raw, &e); err != nil {
return ""
}
return e.Code + ": " + e.Message
}


@ -486,6 +486,72 @@ func (p *Provisioner) Provision(ctx context.Context, req Request, events chan<-
}, nil
}
// Destroy runs `tofu destroy -auto-approve` against the per-deployment
// workdir for req.SovereignFQDN. Idempotent — re-running on a partially-
// destroyed state cleans up whatever's left. Streams stdout/stderr as
// Events to the wizard so the operator sees progress.
//
// On success the per-deployment workdir is REMOVED so the next
// re-provision starts fresh. On failure the workdir is preserved so the
// operator can inspect state — they MUST then run a force-purge against
// the cloud account directly to remove orphans, since `tofu destroy`
// failing partway leaves resources behind.
func (p *Provisioner) Destroy(ctx context.Context, req Request, events chan<- Event) error {
if strings.TrimSpace(req.GHCRPullToken) == "" {
req.GHCRPullToken = p.GHCRPullToken
}
emit := func(phase, level, msg string) {
select {
case events <- Event{Time: time.Now().UTC().Format(time.RFC3339), Phase: phase, Level: level, Message: msg}:
default:
}
}
deployDir := filepath.Join(p.WorkDir, req.sovereignName())
// If the workdir doesn't exist, there's no tofu state to destroy —
// either the deployment never made it past CreateDeployment, or it
// was already cleaned up. Nothing to do; let the caller continue
// with the post-tofu cleanup steps (Hetzner orphan purge, PDM
// release, local state cleanup).
if _, err := os.Stat(deployDir); os.IsNotExist(err) {
emit("tofu-destroy", "info", "no tofu workdir for "+req.SovereignFQDN+" — nothing to destroy")
return nil
} else if err != nil {
return fmt.Errorf("stat tofu workdir: %w", err)
}
// Re-stage the module + tfvars so a partially-cleaned workdir still
// has what tofu needs to destroy.
if err := stageModule(p.ModulePath, deployDir); err != nil {
return fmt.Errorf("stage tofu module: %w", err)
}
if err := writeTfvars(deployDir, req); err != nil {
return fmt.Errorf("write tfvars: %w", err)
}
emit("tofu-init", "info", "Re-initialising OpenTofu working directory for destroy")
if err := p.runTofu(ctx, deployDir, []string{"init", "-input=false", "-no-color"}, emit); err != nil {
return fmt.Errorf("tofu init: %w", err)
}
emit("tofu-destroy", "info", "Destroying Hetzner resources for "+req.SovereignFQDN+" (network, firewall, ssh-key, server, lb)")
if err := p.runTofu(ctx, deployDir, []string{"destroy", "-input=false", "-no-color", "-auto-approve"}, emit); err != nil {
// Don't remove the workdir — operator may want to inspect.
return fmt.Errorf("tofu destroy: %w", err)
}
// Remove the workdir on success — next re-provision starts fresh.
if err := os.RemoveAll(deployDir); err != nil {
emit("tofu-destroy", "warn", "could not remove workdir "+deployDir+": "+err.Error())
// Non-fatal — destroy itself succeeded.
}
emit("tofu-destroy", "info", "Tofu destroy complete; workdir removed")
return nil
}
// runTofu executes `tofu <args>` in deployDir, streaming stdout/stderr lines
// as Events to the wizard.
func (p *Provisioner) runTofu(ctx context.Context, deployDir string, args []string, emit func(string, string, string)) error {


@ -0,0 +1,220 @@
/**
* WipeDeploymentModal: deployment-level destructive op (issue #318).
*
* Distinct from DeleteCascadeConfirm in scope:
*
* DeleteCascadeConfirm: DAY-2 path. Operator deletes a single
* resource (region / cluster / vCluster / pool / lb / peering / etc).
* The path runs through Crossplane: a DELETE request flips the
* underlying XRC's deletionPolicy to Delete, the Composition cascades
* to the matching managed resources, the cloud provider eventually
* reaps. Per docs/INVIOLABLE-PRINCIPLES.md #3 (Crossplane is the
* ONLY day-2 IaC).
*
* WipeDeploymentModal (this file): PHASE-0 RECOVERY path. The
* deployment failed before handover (catalyst-api restarted mid-apply,
* bootstrap-kit wedged, etc). No XRCs exist because Crossplane never
* adopted the resources. The wipe endpoint runs `tofu destroy` against
* the per-deployment workdir AND a Hetzner-direct orphan purge as a
* safety net (also drains PDM allocation + parent-zone NS for pool
* subdomains, deletes kubeconfig + on-disk record). This is the
* sanctioned Phase-0 fallback per feedback_idempotent_iac_purge.md.
*
* Both modals share the ModalShell + same testid prefix conventions; the
* surface that opens this one is the wizard's failed-state banner
* (AppsPage FailureCard) and the Cloud Architecture canvas's `cloud`
* node detail panel.
*/
import { useState } from 'react'
import { ModalShell } from './_shared'
import { API_BASE } from '@/shared/config/urls'
export interface WipeDeploymentModalProps {
open: boolean
deploymentId: string
sovereignFQDN: string | null
onClose: () => void
/** Called after the operator clicks "Start fresh deployment" on the
* success view. Typically navigates back to /wizard. */
onWiped: () => void
}
export interface WipeReport {
deploymentId: string
sovereignFQDN: string
tofuDestroyed: boolean
hetznerPurge: {
servers?: string[]
load_balancers?: string[]
networks?: string[]
firewalls?: string[]
ssh_keys?: string[]
errors?: string[]
}
pdmReleased: boolean
localCleaned: boolean
errors?: string[]
wipedAt: string
}
export function WipeDeploymentModal({
open,
deploymentId,
sovereignFQDN,
onClose,
onWiped,
}: WipeDeploymentModalProps) {
const [confirmText, setConfirmText] = useState('')
const [hetznerToken, setHetznerToken] = useState('')
const [busy, setBusy] = useState(false)
const [error, setError] = useState<string | null>(null)
const [report, setReport] = useState<WipeReport | null>(null)
const requiredText = sovereignFQDN ?? deploymentId
const ready = confirmText.trim() === requiredText && hetznerToken.trim().length > 20 && !busy
async function performWipe() {
setBusy(true)
setError(null)
try {
const res = await fetch(
`${API_BASE}/v1/deployments/${encodeURIComponent(deploymentId)}/wipe`,
{
method: 'POST',
headers: { 'Content-Type': 'application/json', Accept: 'application/json' },
body: JSON.stringify({ hetznerToken: hetznerToken.trim() }),
},
)
const text = await res.text()
if (!res.ok) {
setError(`HTTP ${res.status}: ${text.slice(0, 400)}`)
setBusy(false)
return
}
const parsed = JSON.parse(text) as WipeReport
setReport(parsed)
} catch (e) {
setError(e instanceof Error ? e.message : String(e))
} finally {
setBusy(false)
}
}
if (!open) return null
// Two-stage modal: pre-wipe confirmation, post-wipe summary.
if (!report) {
return (
<ModalShell
id="wipe-deployment"
open={open}
title="Cancel & Wipe deployment"
subtitle={requiredText}
onClose={() => { if (!busy) onClose() }}
primary={{
label: busy ? 'Wiping…' : 'Wipe everything',
onClick: performWipe,
disabled: !ready,
loading: busy,
danger: true,
}}
secondary={{ label: 'Keep deployment', onClick: onClose }}
>
<p style={{ marginTop: 0 }}>
This destroys every Hetzner resource tagged{' '}
<code style={{ background: 'var(--color-bg-2)', padding: '0 4px', borderRadius: 3 }}>
catalyst-deployment-id={deploymentId.slice(0, 12)}
</code>{' '}
and removes all local state on Catalyst-Zero. Per the founder's
minimum-retention principle, no operational footprint of this
deployment will remain on console.openova.io.
</p>
<ul style={{ margin: '8px 0', paddingLeft: 20, fontSize: 12, color: 'var(--color-text-dim)' }}>
<li>tofu destroy against the per-deployment workdir</li>
<li>Hetzner orphan force-purge (servers, load balancers, networks, firewalls, ssh-keys)</li>
<li>PDM allocation release (pool-subdomain only)</li>
<li>Kubeconfig + workdir + on-disk record removed</li>
</ul>
<label style={{ display: 'block', marginTop: 12 }}>
<span style={{ fontSize: 12, color: 'var(--color-text-dim)' }}>
Type <code style={{ background: 'var(--color-bg-2)', padding: '0 4px', borderRadius: 3 }}>{requiredText}</code> to confirm:
</span>
<input
type="text"
value={confirmText}
onChange={(e) => setConfirmText(e.target.value)}
disabled={busy}
data-testid="wipe-deployment-confirm-text"
style={{
marginTop: 4, width: '100%', padding: '4px 8px', fontFamily: 'monospace', fontSize: 13,
background: 'var(--color-bg-2)', color: 'var(--color-text)',
border: '1px solid var(--color-border)', borderRadius: 4,
}}
/>
</label>
<label style={{ display: 'block', marginTop: 12 }}>
<span style={{ fontSize: 12, color: 'var(--color-text-dim)' }}>
Hetzner Cloud API token (re-prompt for security):
</span>
<input
type="password"
value={hetznerToken}
onChange={(e) => setHetznerToken(e.target.value)}
disabled={busy}
placeholder="Paste your Hetzner Cloud API token"
data-testid="wipe-deployment-hetzner-token"
style={{
marginTop: 4, width: '100%', padding: '4px 8px', fontFamily: 'monospace', fontSize: 13,
background: 'var(--color-bg-2)', color: 'var(--color-text)',
border: '1px solid var(--color-border)', borderRadius: 4,
}}
/>
</label>
{error ? (
<pre
data-testid="wipe-deployment-error"
style={{ marginTop: 8, padding: 8, fontSize: 11, background: 'var(--color-bg-2)', color: 'var(--color-danger)', borderRadius: 4, overflowX: 'auto' }}
>
{error}
</pre>
) : null}
</ModalShell>
)
}
// Success view.
return (
<ModalShell
id="wipe-deployment"
open={open}
title="Wipe complete"
subtitle={report.sovereignFQDN}
onClose={onWiped}
primary={{
label: 'Start fresh deployment',
onClick: onWiped,
}}
>
<p style={{ marginTop: 0, fontSize: 12, color: 'var(--color-text-dim)' }}>
Hetzner resources removed:{' '}
{(report.hetznerPurge.servers?.length ?? 0)} servers,{' '}
{(report.hetznerPurge.load_balancers?.length ?? 0)} load balancers,{' '}
{(report.hetznerPurge.networks?.length ?? 0)} networks,{' '}
{(report.hetznerPurge.firewalls?.length ?? 0)} firewalls,{' '}
{(report.hetznerPurge.ssh_keys?.length ?? 0)} ssh-keys.
</p>
<p style={{ marginTop: 4, fontSize: 12, color: 'var(--color-text-dim)' }}>
Tofu destroy: {report.tofuDestroyed ? '✓' : '✗'} · PDM released: {report.pdmReleased ? '✓' : 'n/a'} · Local state cleaned: {report.localCleaned ? '✓' : '✗'}
</p>
{report.errors && report.errors.length > 0 ? (
<pre
data-testid="wipe-deployment-report-errors"
style={{ marginTop: 8, padding: 8, fontSize: 11, background: 'var(--color-bg-2)', color: 'var(--color-warn)', borderRadius: 4, overflowX: 'auto' }}
>
{report.errors.join('\n')}
</pre>
) : null}
</ModalShell>
)
}


@ -33,3 +33,6 @@ export type { NodeActionConfirmProps } from './NodeActionConfirm'
export { DeleteCascadeConfirm } from './DeleteCascadeConfirm'
export type { DeleteCascadeConfirmProps } from './DeleteCascadeConfirm'
export { WipeDeploymentModal } from './WipeDeploymentModal'
export type { WipeDeploymentModalProps, WipeReport } from './WipeDeploymentModal'


@ -37,6 +37,7 @@ import { PortalShell } from './PortalShell'
import { resolveApplications, type ApplicationDescriptor } from './applicationCatalog'
import { useDeploymentEvents } from './useDeploymentEvents'
import type { ApplicationStatus } from './eventReducer'
import { WipeDeploymentModal } from '@/components/CrudModals/WipeDeploymentModal'
interface AppsPageProps {
/** Test seam — disables the live SSE EventSource attach. */
@ -161,10 +162,12 @@ export function AppsPage({ disableStream = false }: AppsPageProps = {}) {
{isFailed ? (
<FailureCard
deploymentId={deploymentId}
sovereignFQDN={sovereignFQDN}
status={streamStatus as 'failed' | 'unreachable'}
message={failureMessage}
onRetry={retry}
onBack={() => router.navigate({ to: '/wizard' })}
onWiped={() => router.navigate({ to: '/wizard' })}
/>
) : null}
@ -326,14 +329,17 @@ function AppCard({ app, status, deploymentId, isService }: AppCardProps) {
interface FailureCardProps {
deploymentId: string
sovereignFQDN: string | null
status: 'failed' | 'unreachable'
message: string | null
onRetry: () => void
onBack: () => void
onWiped: () => void
}
function FailureCard({ deploymentId, status, message, onRetry, onBack }: FailureCardProps) {
function FailureCard({ deploymentId, sovereignFQDN, status, message, onRetry, onBack, onWiped }: FailureCardProps) {
const isUnreachable = status === 'unreachable'
const [showWipeModal, setShowWipeModal] = useState(false)
return (
<div
role="alert"
@ -353,7 +359,7 @@ function FailureCard({ deploymentId, status, message, onRetry, onBack }: Failure
{message}
</pre>
) : null}
<div className="mt-2 flex gap-2">
<div className="mt-2 flex gap-2 flex-wrap">
<button
type="button"
onClick={onRetry}
@ -362,6 +368,14 @@ function FailureCard({ deploymentId, status, message, onRetry, onBack }: Failure
>
Retry stream
</button>
<button
type="button"
onClick={() => setShowWipeModal(true)}
data-testid="sov-failure-wipe"
className="rounded-md border border-[var(--color-danger)] bg-[var(--color-danger)] px-3 py-1 text-xs font-semibold text-white hover:opacity-90"
>
Cancel &amp; Wipe
</button>
<button
type="button"
onClick={onBack}
@ -371,6 +385,15 @@ function FailureCard({ deploymentId, status, message, onRetry, onBack }: Failure
Back to wizard
</button>
</div>
{showWipeModal ? (
<WipeDeploymentModal
open={showWipeModal}
deploymentId={deploymentId}
sovereignFQDN={sovereignFQDN}
onClose={() => setShowWipeModal(false)}
onWiped={() => { setShowWipeModal(false); onWiped() }}
/>
) : null}
</div>
)
}


@ -48,6 +48,38 @@ export interface BlueprintCardEntry {
* bootstrap-kit phases when the SSE backend emits Flux Kustomization events.
*/
export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
{
"id": "bp-anthropic-adapter",
"slug": "anthropic-adapter",
"title": "Anthropic Adapter",
"summary": "|",
"icon": "anthropic-adapter.svg",
"category": "ai-runtime",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-6-llm-serving",
"depends": [
"bp-external-secrets"
]
},
{
"id": "bp-bge",
"slug": "bge",
"title": "BGE Embeddings + Reranker",
"summary": "BAAI General Embedding (sentence-transformers + bge-reranker). CPU-friendly multilingual embeddings + cross-encoder reranking. Default model bge-small-en-v1.5; bp-llm-gateway discovers via Service annotation.",
"icon": "bge.svg",
"category": "ai-runtime",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-6-llm-serving",
"depends": [
"bp-cnpg"
]
},
{
"id": "bp-cert-manager",
"slug": "cert-manager",
@ -58,10 +90,26 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.1",
"section": "pts-3-3-security-and-policy",
"depends": []
},
{
"id": "bp-cert-manager-dynadot-webhook",
"slug": "cert-manager-dynadot-webhook",
"title": "cert-manager-dynadot-webhook",
"summary": "|",
"icon": null,
"category": null,
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"section": "pts-3-3-security-and-policy",
"depends": [
"bp-cert-manager"
]
},
{
"id": "bp-cilium",
"slug": "cilium",
@ -72,15 +120,45 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.1",
"section": "pts-3-1-networking-and-service-mesh",
"depends": []
},
{
"id": "bp-cnpg",
"slug": "cnpg",
"title": "CloudNativePG",
"summary": "|",
"icon": "cnpg.svg",
"category": "data",
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"section": "pts-4-1-data-services",
"depends": [
"bp-flux"
]
},
{
"id": "bp-crossplane",
"slug": "crossplane",
"title": "crossplane",
"summary": "Crossplane core + provider-hcloud. Catalyst Compositions live at compose.openova.io/v1alpha1 XRD group.",
"summary": "|",
"icon": null,
"category": null,
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.1.3",
"section": "pts-3-2-gitops-and-iac",
"depends": []
},
{
"id": "bp-crossplane-claims",
"slug": "crossplane-claims",
"title": "crossplane-claims",
"summary": "|",
"icon": null,
"category": null,
"tagline": null,
@ -88,7 +166,26 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"visibility": "unlisted",
"version": "1.0.0",
"section": "pts-3-2-gitops-and-iac",
"depends": []
"depends": [
"bp-crossplane"
]
},
{
"id": "bp-external-secrets",
"slug": "external-secrets",
"title": "External Secrets Operator",
"summary": "|",
"icon": "external-secrets.svg",
"category": "security",
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"section": "pts-3-3-security-and-policy",
"depends": [
"bp-openbao",
"bp-cert-manager"
]
},
{
"id": "bp-flux",
@ -100,7 +197,7 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.2",
"section": "pts-3-2-gitops-and-iac",
"depends": []
},
@ -114,7 +211,7 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.2",
"section": "pts-2-3-per-sovereign-supporting-services",
"depends": []
},
@ -128,10 +225,118 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.2",
"section": "pts-2-3-per-sovereign-supporting-services",
"depends": []
},
{
"id": "bp-knative",
"slug": "knative",
"title": "Knative",
"summary": "|",
"icon": "knative.svg",
"category": "ai-ml",
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"section": "pts-4-6-ai-ml",
"depends": [
"bp-cilium",
"bp-cert-manager"
]
},
{
"id": "bp-kserve",
"slug": "kserve",
"title": "KServe",
"summary": "|",
"icon": "kserve.svg",
"category": "ai-ml",
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"section": "pts-4-6-ai-ml",
"depends": [
"bp-cilium",
"bp-cert-manager",
"bp-knative"
]
},
{
"id": "bp-librechat",
"slug": "librechat",
"title": "LibreChat",
"summary": "|",
"icon": "librechat.svg",
"category": "application",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-7-application-tier-chat-ui",
"depends": [
"bp-llm-gateway",
"bp-vllm",
"bp-bge",
"bp-keycloak"
]
},
{
"id": "bp-livekit",
"slug": "livekit",
"title": "LiveKit",
"summary": "|",
"icon": "livekit.svg",
"category": "application",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-5-communication",
"depends": [
"bp-stunner",
"bp-cert-manager",
"bp-valkey"
]
},
{
"id": "bp-llm-gateway",
"slug": "llm-gateway",
"title": "LLM Gateway",
"summary": "|",
"icon": "llm-gateway.svg",
"category": "ai-runtime",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-6-llm-serving",
"depends": [
"bp-cnpg",
"bp-keycloak",
"bp-external-secrets"
]
},
{
"id": "bp-matrix",
"slug": "matrix",
"title": "Matrix (Synapse)",
"summary": "|",
"icon": "matrix.svg",
"category": "application",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-5-communication",
"depends": [
"bp-cnpg",
"bp-keycloak",
"bp-cert-manager"
]
},
{
"id": "bp-nats-jetstream",
"slug": "nats-jetstream",
@ -142,10 +347,26 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.1",
"section": "pts-2-3-per-sovereign-supporting-services",
"depends": []
},
{
"id": "bp-nemo-guardrails",
"slug": "nemo-guardrails",
"title": "NeMo Guardrails",
"summary": "Programmable AI safety firewall — blocks prompt injection, PII leaks, off-topic content, and hallucinated citations between user input and the LLM. Wraps NVIDIA's `nemoguardrails server` (FastAPI on port 8000) as a Deployment + Service. Inline filter for bp-llm-gateway and bp-vllm.",
"icon": "nemo-guardrails.svg",
"category": "ai-safety",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-7-ai-safety",
"depends": [
"bp-vllm"
]
},
{
"id": "bp-openbao",
"slug": "openbao",
@ -156,10 +377,28 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.1",
"section": "pts-2-3-per-sovereign-supporting-services",
"depends": []
},
{
"id": "bp-openmeter",
"slug": "openmeter",
"title": "OpenMeter",
"summary": "|",
"icon": "openmeter.svg",
"category": "application",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-8-identity-and-metering",
"depends": [
"bp-cnpg",
"bp-nats-jetstream",
"bp-cert-manager"
]
},
{
"id": "bp-powerdns",
"slug": "powerdns",
@ -170,7 +409,7 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.3",
"section": "pts-3-2-gitops-and-iac",
"depends": [
"bp-cert-manager"
@ -186,7 +425,7 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.1",
"section": "pts-3-3-security-and-policy",
"depends": []
},
@ -200,9 +439,75 @@ export const ALL_BLUEPRINTS: readonly BlueprintCardEntry[] = [
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"version": "1.1.4",
"section": "pts-2-3-per-sovereign-supporting-services",
"depends": []
},
{
"id": "bp-stunner",
"slug": "stunner",
"title": "STUNner",
"summary": "|",
"icon": "stunner.svg",
"category": "communication",
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"section": "pts-4-5-communication",
"depends": [
"bp-cilium",
"bp-cert-manager"
]
},
{
"id": "bp-temporal",
"slug": "temporal",
"title": "Temporal",
"summary": "Durable workflow orchestration with saga + compensation. Postgres-backed (CNPG); Postgres visibility store; Web UI; Keycloak OIDC integration via `--auth-claim-mapper`.",
"icon": "temporal.svg",
"category": "workflow",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-3-workflow-and-processing",
"depends": [
"bp-cnpg",
"bp-cert-manager"
]
},
{
"id": "bp-valkey",
"slug": "valkey",
"title": "Valkey",
"summary": "|",
"icon": "valkey.svg",
"category": "data",
"tagline": null,
"tags": [],
"visibility": "unlisted",
"version": "1.0.0",
"section": "pts-4-1-data-services",
"depends": [
"bp-flux"
]
},
{
"id": "bp-vllm",
"slug": "vllm",
"title": "vLLM",
"summary": "High-throughput LLM inference engine with PagedAttention. OpenAI-compatible API. GPU-accelerated when nvidia.com/gpu is available; CPU fallback for non-GPU dev Sovereigns.",
"icon": "vllm.svg",
"category": "ai-runtime",
"tagline": null,
"tags": [],
"visibility": "listed",
"version": "1.0.0",
"section": "pts-4-6-llm-serving",
"depends": [
"bp-kserve"
]
}
] as const
@ -221,17 +526,35 @@ export const BLUEPRINT_BY_ID: Readonly<Record<string, BlueprintCardEntry>> = Obj
/** Source files this catalog was built from (for diagnostics / CI logs). */
export const PLATFORM_BLUEPRINT_FILES: readonly string[] = [
"platform/anthropic-adapter/blueprint.yaml",
"platform/bge/blueprint.yaml",
"platform/cert-manager-dynadot-webhook/blueprint.yaml",
"platform/cert-manager/blueprint.yaml",
"platform/cilium/blueprint.yaml",
"platform/cnpg/blueprint.yaml",
"platform/crossplane-claims/blueprint.yaml",
"platform/crossplane/blueprint.yaml",
"platform/external-secrets/blueprint.yaml",
"platform/flux/blueprint.yaml",
"platform/gitea/blueprint.yaml",
"platform/keycloak/blueprint.yaml",
"platform/knative/blueprint.yaml",
"platform/kserve/blueprint.yaml",
"platform/librechat/blueprint.yaml",
"platform/livekit/blueprint.yaml",
"platform/llm-gateway/blueprint.yaml",
"platform/matrix/blueprint.yaml",
"platform/nats-jetstream/blueprint.yaml",
"platform/nemo-guardrails/blueprint.yaml",
"platform/openbao/blueprint.yaml",
"platform/openmeter/blueprint.yaml",
"platform/powerdns/blueprint.yaml",
"platform/sealed-secrets/blueprint.yaml",
"platform/spire/blueprint.yaml"
"platform/spire/blueprint.yaml",
"platform/stunner/blueprint.yaml",
"platform/temporal/blueprint.yaml",
"platform/valkey/blueprint.yaml",
"platform/vllm/blueprint.yaml"
] as const
/**
@ -325,11 +648,179 @@ export const BOOTSTRAP_KIT: readonly BootstrapKitEntry[] = [
"file": "10-gitea.yaml",
"order": 10
},
{
"id": "bp-powerdns",
"slug": "powerdns",
"label": "powerdns",
"file": "11-powerdns.yaml",
"order": 11
},
{
"id": "bp-external-dns",
"slug": "external-dns",
"label": "external-dns",
"file": "12-external-dns.yaml",
"order": 12
},
{
"id": "bp-bp-catalyst-platform",
"slug": "bp-catalyst-platform",
"label": "bp-catalyst-platform",
"file": "11-bp-catalyst-platform.yaml",
"order": 11
"file": "13-bp-catalyst-platform.yaml",
"order": 13
},
{
"id": "bp-crossplane-claims",
"slug": "crossplane-claims",
"label": "crossplane-claims",
"file": "14-crossplane-claims.yaml",
"order": 14
},
{
"id": "bp-external-secrets",
"slug": "external-secrets",
"label": "external-secrets",
"file": "15-external-secrets.yaml",
"order": 15
},
{
"id": "bp-cnpg",
"slug": "cnpg",
"label": "cnpg",
"file": "16-cnpg.yaml",
"order": 16
},
{
"id": "bp-valkey",
"slug": "valkey",
"label": "valkey",
"file": "17-valkey.yaml",
"order": 17
},
{
"id": "bp-seaweedfs",
"slug": "seaweedfs",
"label": "seaweedfs",
"file": "18-seaweedfs.yaml",
"order": 18
},
{
"id": "bp-harbor",
"slug": "harbor",
"label": "harbor",
"file": "19-harbor.yaml",
"order": 19
},
{
"id": "bp-opentelemetry",
"slug": "opentelemetry",
"label": "opentelemetry",
"file": "20-opentelemetry.yaml",
"order": 20
},
{
"id": "bp-alloy",
"slug": "alloy",
"label": "alloy",
"file": "21-alloy.yaml",
"order": 21
},
{
"id": "bp-loki",
"slug": "loki",
"label": "loki",
"file": "22-loki.yaml",
"order": 22
},
{
"id": "bp-mimir",
"slug": "mimir",
"label": "mimir",
"file": "23-mimir.yaml",
"order": 23
},
{
"id": "bp-tempo",
"slug": "tempo",
"label": "tempo",
"file": "24-tempo.yaml",
"order": 24
},
{
"id": "bp-grafana",
"slug": "grafana",
"label": "grafana",
"file": "25-grafana.yaml",
"order": 25
},
{
"id": "bp-langfuse",
"slug": "langfuse",
"label": "langfuse",
"file": "26-langfuse.yaml",
"order": 26
},
{
"id": "bp-kyverno",
"slug": "kyverno",
"label": "kyverno",
"file": "27-kyverno.yaml",
"order": 27
},
{
"id": "bp-reloader",
"slug": "reloader",
"label": "reloader",
"file": "28-reloader.yaml",
"order": 28
},
{
"id": "bp-vpa",
"slug": "vpa",
"label": "vpa",
"file": "29-vpa.yaml",
"order": 29
},
{
"id": "bp-trivy",
"slug": "trivy",
"label": "trivy",
"file": "30-trivy.yaml",
"order": 30
},
{
"id": "bp-falco",
"slug": "falco",
"label": "falco",
"file": "31-falco.yaml",
"order": 31
},
{
"id": "bp-sigstore",
"slug": "sigstore",
"label": "sigstore",
"file": "32-sigstore.yaml",
"order": 32
},
{
"id": "bp-syft-grype",
"slug": "syft-grype",
"label": "syft-grype",
"file": "33-syft-grype.yaml",
"order": 33
},
{
"id": "bp-velero",
"slug": "velero",
"label": "velero",
"file": "34-velero.yaml",
"order": 34
},
{
"id": "bp-coraza",
"slug": "coraza",
"label": "coraza",
"file": "35-coraza.yaml",
"order": 35
}
] as const


@ -34,6 +34,7 @@ import {
AddRegionModal,
AddVClusterModal,
DeleteCascadeConfirm,
WipeDeploymentModal,
} from '@/components/CrudModals'
import type { CloudProvider } from '@/entities/deployment/model'
import type { HierarchicalInfrastructure } from '@/lib/infrastructure.types'
@ -99,6 +100,7 @@ interface ModalState {
| 'add-nodepool'
| 'add-lb'
| 'delete'
| 'wipe-deployment'
}
export function ArchitectureGraphPage({
@ -485,7 +487,18 @@ export function ArchitectureGraphPage({
else if (kind === 'Cluster') setModal({ kind: 'add-vcluster' })
else if (kind === 'Cloud') setModal({ kind: 'add-region' })
}}
onDelete={() => setModal({ kind: 'delete' })}
onDelete={() => {
// Cloud root → deployment-level wipe (Phase-0 orphan purge,
// tofu destroy + Hetzner orphan force-purge + PDM release +
// local cleanup). Per-resource Crossplane XRC delete is the
// right path for Region / Cluster / vCluster (day-2). See
// docs/INVIOLABLE-PRINCIPLES.md #3.
if (selectedNode.type === 'Cloud') {
setModal({ kind: 'wipe-deployment' })
} else {
setModal({ kind: 'delete' })
}
}}
onPickNeighbor={(id) => setSelectedId(id)}
/>
)}
@ -515,7 +528,12 @@ export function ArchitectureGraphPage({
closeCtxMenu()
}}
onDelete={() => {
setModal({ kind: 'delete' })
// Same Cloud-root → wipe rule as the detail panel.
if (ctxMenu.node?.type === 'Cloud') {
setModal({ kind: 'wipe-deployment' })
} else {
setModal({ kind: 'delete' })
}
closeCtxMenu()
}}
/>
@ -590,10 +608,40 @@ export function ArchitectureGraphPage({
/>
</>
)}
{/* Deployment-level Phase-0 wipe (issue #318). Reachable from the
Cloud-root node's Delete action OR the canvas context menu's
Delete on a Cloud node. Distinct from DeleteCascadeConfirm
which targets Crossplane XRCs in the day-2 path. */}
<WipeDeploymentModal
open={modal.kind === 'wipe-deployment'}
deploymentId={deploymentId}
sovereignFQDN={inferSovereignFQDNFromGraph(data)}
onClose={() => setModal({ kind: 'none' })}
onWiped={() => {
setModal({ kind: 'none' })
// Hard-navigate to /sovereign so the wizard re-mounts fresh
// (no stale provisioning store state).
window.location.href = '/sovereign'
}}
/>
</div>
)
}
/** Derive the sovereign FQDN from the Architecture data so the
* WipeDeploymentModal confirm input matches what the operator sees on
* the cluster row. First cluster's name is the FQDN by convention. */
function inferSovereignFQDNFromGraph(
data: HierarchicalInfrastructure | null,
): string | null {
if (!data || !data.topology || !data.topology.regions) return null
for (const r of data.topology.regions) {
if (r.clusters && r.clusters.length > 0) return r.clusters[0].name
}
return null
}
/* ── Sub-components ──────────────────────────────────────────────── */
interface TypeBadgeProps {