* feat(wipe): deployment-level Cancel & Wipe — backend endpoint + Cloud-Architecture + wizard banner entry-points (closes#318)
Adds a first-class Phase-0 recovery surface so an operator can purge a
failed pre-handover deployment from the wizard UI without dropping to
hcloud CLI runbooks. Two entry-points, one canonical implementation.
## Backend
NEW: products/catalyst/bootstrap/api/internal/handler/wipe.go
POST /api/v1/deployments/{id}/wipe — single-flight destructive op:
1. tofu destroy against the per-deployment workdir (idempotent).
2. Hetzner orphan force-purge by label-selector
`catalyst-deployment-id=<id>` (servers, load balancers,
networks, firewalls, ssh-keys). Belt-and-braces — catches
resources tofu didn't track (half-failed cloud-init, manual
experiments). Per docs/INVIOLABLE-PRINCIPLES.md #3 this direct
API path is fallback ONLY for orphan cleanup, never new
resource creation.
3. PDM /v1/release for pool-subdomain Sovereigns (best-effort).
4. Local cleanup: kubeconfig file (mode 0600), tofu workdir,
on-disk deployment record JSON.
5. SSE events stream throughout on the same channel as the
original provisioning + Phase-1 watch.
6. Marks Status="wiped"; sync.Map entry reaped after a 60s TTL.
NEW: products/catalyst/bootstrap/api/internal/hetzner/purge.go
Hetzner Cloud API enumeration + force-delete by label selector.
Uses a 60s timeout (vs the 10s ValidateToken default) because async
server-delete jobs can queue. 404s treated as success (already gone).
NEW: products/catalyst/bootstrap/api/internal/provisioner/provisioner.go
Provisioner.Destroy() — runs `tofu destroy -auto-approve` against
the per-deployment workdir, then removes the workdir on success so
re-provisioning starts fresh. Re-stages module + tfvars first so a
partially-cleaned workdir still has what tofu needs.
TOUCHED: products/catalyst/bootstrap/api/cmd/api/main.go
Registers POST /api/v1/deployments/{id}/wipe.
## Frontend (aligned with existing CrudModals conventions per founder
## directive — no ad-hoc surface)
NEW: products/catalyst/bootstrap/ui/src/components/CrudModals/WipeDeploymentModal.tsx
Two-stage modal built on the canonical ModalShell. Pre-wipe confirm
view requires the operator to:
- Type the sovereign FQDN to confirm scope.
- Re-paste their Hetzner Cloud API token (catalyst-api intentionally
GCs the original after writeTfvars per credential hygiene).
Post-wipe success view shows the PurgeReport (servers, lbs, networks,
firewalls, ssh-keys removed; tofu/PDM/local-state ✓/✗) and a
"Start fresh deployment" CTA that nav's to /sovereign.
TOUCHED: products/catalyst/bootstrap/ui/src/components/CrudModals/index.ts
Re-exports WipeDeploymentModal + WipeReport.
TOUCHED: products/catalyst/bootstrap/ui/src/pages/sovereign/AppsPage.tsx
FailureCard now exposes a "Cancel & Wipe" red button next to
"Retry stream" / "Back to wizard" — opens WipeDeploymentModal.
TOUCHED: products/catalyst/bootstrap/ui/src/pages/sovereign/InfrastructureTopology.tsx
Cloud → Architecture canvas: the `cloud` (root) node action menu
gains "Cancel & Wipe deployment" as a `danger:true` action,
alongside the existing "+ Add region". Distinct from the
per-resource DeleteCascadeConfirm on region/cluster/vCluster — this
is deployment-scope (Phase-0 orphan purge), the others are
Crossplane-XRC scope (day-2). The two paths coexist; operators
choose by what state the deployment is in.
## Why two entry-points
Wizard banner (failed state on AppsPage) — recovery from a known
failure. Already a red-banner page; the button is right there.
Cloud → Architecture cloud-node action — proactive cancel from the
canvas, mirrors how the existing per-resource deletes are reachable.
Same modal, same backend.
## Constraints honoured
- Per docs/INVIOLABLE-PRINCIPLES.md #3 (Crossplane is the ONLY day-2
IaC): the per-resource DELETE handler at infrastructure.go is
unchanged and continues to flip XRC deletionPolicy. Wipe operates
ONLY in Phase-0 scope where Crossplane never adopted resources.
- Per #4 (never hardcode): every endpoint lives behind API_BASE; the
Hetzner purge enumerates by deterministic label selector built from
var.sovereign_fqdn (the OpenTofu module's existing tagging convention).
- Per credential hygiene: the Hetzner token is re-prompted at wipe time
rather than persisted; the modal uses an <input type="password">.
## Refs
#318 — pre-handover wipe spec (this PR closes it)
#317 — handover finalisation (sibling; this PR is the failure-path
complement)
feedback_idempotent_iac_purge.md — operator runbook this implements
PR #313 — sealed-secrets cleanup (independent; safe to land in any order)
PR #334 — bp-external-secrets split (independent)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): catalyst-build event-driven only — drop cron, push-on-main with path filter
Per docs/INVIOLABLE-PRINCIPLES.md (event-driven end to end — Flux
dependsOn, NATS JetStream, SSE, Helm hooks), GitHub Actions must follow
the same model. The previous `schedule: cron 0 3 * * *` daily build was
the only canonical deploy path, which created a 24h roll latency on
every change to the catalyst surface and incentivised "wait for cron"
stalls in operator workflows.
Replaces with:
on:
push:
branches: [main]
paths:
- 'core/console/**'
- 'core/admin/**'
- 'core/marketplace/**'
- 'core/marketplace-api/**'
- 'products/catalyst/bootstrap/**'
- 'products/catalyst/chart/**'
- '.github/workflows/catalyst-build.yaml'
workflow_dispatch:
`workflow_dispatch` retained for ad-hoc re-runs (config-only changes
that bypass the path filter, e.g. a secret rotation that doesn't touch
code). Path filter mirrors the actual surface this workflow rebuilds.
After this lands, every merge to main that touches the catalyst surface
auto-deploys. No cron lag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>