openova

History

e3mrah 4d24914ae4 feat(wipe): deployment-level Cancel & Wipe — backend endpoint + Cloud-Architecture + wizard banner entry-points (closes #318 ) (#346 ) * feat(wipe): deployment-level Cancel & Wipe — backend endpoint + Cloud-Architecture + wizard banner entry-points (closes #318) Adds a first-class Phase-0 recovery surface so an operator can purge a failed pre-handover deployment from the wizard UI without dropping to hcloud CLI runbooks. Two entry-points, one canonical implementation. ## Backend NEW: products/catalyst/bootstrap/api/internal/handler/wipe.go POST /api/v1/deployments/{id}/wipe — single-flight destructive op: 1. tofu destroy against the per-deployment workdir (idempotent). 2. Hetzner orphan force-purge by label-selector `catalyst-deployment-id=<id>` (servers, load balancers, networks, firewalls, ssh-keys). Belt-and-braces — catches resources tofu didn't track (half-failed cloud-init, manual experiments). Per docs/INVIOLABLE-PRINCIPLES.md #3 this direct API path is fallback ONLY for orphan cleanup, never new resource creation. 3. PDM /v1/release for pool-subdomain Sovereigns (best-effort). 4. Local cleanup: kubeconfig file (mode 0600), tofu workdir, on-disk deployment record JSON. 5. SSE events stream throughout on the same channel as the original provisioning + Phase-1 watch. 6. Marks Status="wiped"; sync.Map entry reaped after a 60s TTL. NEW: products/catalyst/bootstrap/api/internal/hetzner/purge.go Hetzner Cloud API enumeration + force-delete by label selector. Uses a 60s timeout (vs the 10s ValidateToken default) because async server-delete jobs can queue. 404s treated as success (already gone). NEW: products/catalyst/bootstrap/api/internal/provisioner/provisioner.go Provisioner.Destroy() — runs `tofu destroy -auto-approve` against the per-deployment workdir, then removes the workdir on success so re-provisioning starts fresh. Re-stages module + tfvars first so a partially-cleaned workdir still has what tofu needs. TOUCHED: products/catalyst/bootstrap/api/cmd/api/main.go Registers POST /api/v1/deployments/{id}/wipe. ## Frontend (aligned with existing CrudModals conventions per founder ## directive — no ad-hoc surface) NEW: products/catalyst/bootstrap/ui/src/components/CrudModals/WipeDeploymentModal.tsx Two-stage modal built on the canonical ModalShell. Pre-wipe confirm view requires the operator to: - Type the sovereign FQDN to confirm scope. - Re-paste their Hetzner Cloud API token (catalyst-api intentionally GCs the original after writeTfvars per credential hygiene). Post-wipe success view shows the PurgeReport (servers, lbs, networks, firewalls, ssh-keys removed; tofu/PDM/local-state ✓/✗) and a "Start fresh deployment" CTA that nav's to /sovereign. TOUCHED: products/catalyst/bootstrap/ui/src/components/CrudModals/index.ts Re-exports WipeDeploymentModal + WipeReport. TOUCHED: products/catalyst/bootstrap/ui/src/pages/sovereign/AppsPage.tsx FailureCard now exposes a "Cancel & Wipe" red button next to "Retry stream" / "Back to wizard" — opens WipeDeploymentModal. TOUCHED: products/catalyst/bootstrap/ui/src/pages/sovereign/InfrastructureTopology.tsx Cloud → Architecture canvas: the `cloud` (root) node action menu gains "Cancel & Wipe deployment" as a `danger:true` action, alongside the existing "+ Add region". Distinct from the per-resource DeleteCascadeConfirm on region/cluster/vCluster — this is deployment-scope (Phase-0 orphan purge), the others are Crossplane-XRC scope (day-2). The two paths coexist; operators choose by what state the deployment is in. ## Why two entry-points Wizard banner (failed state on AppsPage) — recovery from a known failure. Already a red-banner page; the button is right there. Cloud → Architecture cloud-node action — proactive cancel from the canvas, mirrors how the existing per-resource deletes are reachable. Same modal, same backend. ## Constraints honoured - Per docs/INVIOLABLE-PRINCIPLES.md #3 (Crossplane is the ONLY day-2 IaC): the per-resource DELETE handler at infrastructure.go is unchanged and continues to flip XRC deletionPolicy. Wipe operates ONLY in Phase-0 scope where Crossplane never adopted resources. - Per #4 (never hardcode): every endpoint lives behind API_BASE; the Hetzner purge enumerates by deterministic label selector built from var.sovereign_fqdn (the OpenTofu module's existing tagging convention). - Per credential hygiene: the Hetzner token is re-prompted at wipe time rather than persisted; the modal uses an <input type="password">. ## Refs #318 — pre-handover wipe spec (this PR closes it) #317 — handover finalisation (sibling; this PR is the failure-path complement) feedback_idempotent_iac_purge.md — operator runbook this implements PR #313 — sealed-secrets cleanup (independent; safe to land in any order) PR #334 — bp-external-secrets split (independent) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): catalyst-build event-driven only — drop cron, push-on-main with path filter Per docs/INVIOLABLE-PRINCIPLES.md (event-driven end to end — Flux dependsOn, NATS JetStream, SSE, Helm hooks), GitHub Actions must follow the same model. The previous `schedule: cron 0 3 * * ` daily build was the only canonical deploy path, which created a 24h roll latency on every change to the catalyst surface and incentivised "wait for cron" stalls in operator workflows. Replaces with: on: push: branches: [main] paths: - 'core/console/' - 'core/admin/' - 'core/marketplace/' - 'core/marketplace-api/' - 'products/catalyst/bootstrap/' - 'products/catalyst/chart/*' - '.github/workflows/catalyst-build.yaml' workflow_dispatch: `workflow_dispatch` retained for ad-hoc re-runs (config-only changes that bypass the path filter, e.g. a secret rotation that doesn't touch code). Path filter mirrors the actual surface this workflow rebuilds. After this lands, every merge to main that touches the catalyst surface auto-deploys. No cron lag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:24:40 +04:00
..
workflows	feat(wipe): deployment-level Cancel & Wipe — backend endpoint + Cloud-Architecture + wizard banner entry-points (closes #318 ) (#346 )	2026-05-01 09:24:40 +04:00
dependabot.yml	chore(ci): add Dependabot for npm and GitHub Actions dependency updates	2026-03-19 13:42:02 +01:00

feat(wipe): deployment-level Cancel & Wipe — backend endpoint + Cloud-Architecture + wizard banner entry-points (closes #318 ) (#346 )

* feat(wipe): deployment-level Cancel & Wipe — backend endpoint + Cloud-Architecture + wizard banner entry-points (closes #318)

Adds a first-class Phase-0 recovery surface so an operator can purge a
failed pre-handover deployment from the wizard UI without dropping to
hcloud CLI runbooks. Two entry-points, one canonical implementation.

## Backend

NEW: products/catalyst/bootstrap/api/internal/handler/wipe.go
  POST /api/v1/deployments/{id}/wipe — single-flight destructive op:
    1. tofu destroy against the per-deployment workdir (idempotent).
    2. Hetzner orphan force-purge by label-selector
       `catalyst-deployment-id=<id>` (servers, load balancers,
       networks, firewalls, ssh-keys). Belt-and-braces — catches
       resources tofu didn't track (half-failed cloud-init, manual
       experiments). Per docs/INVIOLABLE-PRINCIPLES.md #3 this direct
       API path is fallback ONLY for orphan cleanup, never new
       resource creation.
    3. PDM /v1/release for pool-subdomain Sovereigns (best-effort).
    4. Local cleanup: kubeconfig file (mode 0600), tofu workdir,
       on-disk deployment record JSON.
    5. SSE events stream throughout on the same channel as the
       original provisioning + Phase-1 watch.
    6. Marks Status="wiped"; sync.Map entry reaped after a 60s TTL.

NEW: products/catalyst/bootstrap/api/internal/hetzner/purge.go
  Hetzner Cloud API enumeration + force-delete by label selector.
  Uses a 60s timeout (vs the 10s ValidateToken default) because async
  server-delete jobs can queue. 404s treated as success (already gone).

NEW: products/catalyst/bootstrap/api/internal/provisioner/provisioner.go
  Provisioner.Destroy() — runs `tofu destroy -auto-approve` against
  the per-deployment workdir, then removes the workdir on success so
  re-provisioning starts fresh. Re-stages module + tfvars first so a
  partially-cleaned workdir still has what tofu needs.

TOUCHED: products/catalyst/bootstrap/api/cmd/api/main.go
  Registers POST /api/v1/deployments/{id}/wipe.

## Frontend (aligned with existing CrudModals conventions per founder
##           directive — no ad-hoc surface)

NEW: products/catalyst/bootstrap/ui/src/components/CrudModals/WipeDeploymentModal.tsx
  Two-stage modal built on the canonical ModalShell. Pre-wipe confirm
  view requires the operator to:
    - Type the sovereign FQDN to confirm scope.
    - Re-paste their Hetzner Cloud API token (catalyst-api intentionally
      GCs the original after writeTfvars per credential hygiene).
  Post-wipe success view shows the PurgeReport (servers, lbs, networks,
  firewalls, ssh-keys removed; tofu/PDM/local-state ✓/✗) and a
  "Start fresh deployment" CTA that nav's to /sovereign.

TOUCHED: products/catalyst/bootstrap/ui/src/components/CrudModals/index.ts
  Re-exports WipeDeploymentModal + WipeReport.

TOUCHED: products/catalyst/bootstrap/ui/src/pages/sovereign/AppsPage.tsx
  FailureCard now exposes a "Cancel & Wipe" red button next to
  "Retry stream" / "Back to wizard" — opens WipeDeploymentModal.

TOUCHED: products/catalyst/bootstrap/ui/src/pages/sovereign/InfrastructureTopology.tsx
  Cloud → Architecture canvas: the `cloud` (root) node action menu
  gains "Cancel & Wipe deployment" as a `danger:true` action,
  alongside the existing "+ Add region". Distinct from the
  per-resource DeleteCascadeConfirm on region/cluster/vCluster — this
  is deployment-scope (Phase-0 orphan purge), the others are
  Crossplane-XRC scope (day-2). The two paths coexist; operators
  choose by what state the deployment is in.

## Why two entry-points

Wizard banner (failed state on AppsPage) — recovery from a known
failure. Already a red-banner page; the button is right there.

Cloud → Architecture cloud-node action — proactive cancel from the
canvas, mirrors how the existing per-resource deletes are reachable.
Same modal, same backend.

## Constraints honoured

- Per docs/INVIOLABLE-PRINCIPLES.md #3 (Crossplane is the ONLY day-2
  IaC): the per-resource DELETE handler at infrastructure.go is
  unchanged and continues to flip XRC deletionPolicy. Wipe operates
  ONLY in Phase-0 scope where Crossplane never adopted resources.
- Per #4 (never hardcode): every endpoint lives behind API_BASE; the
  Hetzner purge enumerates by deterministic label selector built from
  var.sovereign_fqdn (the OpenTofu module's existing tagging convention).
- Per credential hygiene: the Hetzner token is re-prompted at wipe time
  rather than persisted; the modal uses an <input type="password">.

## Refs

#318 — pre-handover wipe spec (this PR closes it)
#317 — handover finalisation (sibling; this PR is the failure-path
       complement)
feedback_idempotent_iac_purge.md — operator runbook this implements
PR #313 — sealed-secrets cleanup (independent; safe to land in any order)
PR #334 — bp-external-secrets split (independent)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): catalyst-build event-driven only — drop cron, push-on-main with path filter

Per docs/INVIOLABLE-PRINCIPLES.md (event-driven end to end — Flux
dependsOn, NATS JetStream, SSE, Helm hooks), GitHub Actions must follow
the same model. The previous `schedule: cron 0 3 * * *` daily build was
the only canonical deploy path, which created a 24h roll latency on
every change to the catalyst surface and incentivised "wait for cron"
stalls in operator workflows.

Replaces with:
  on:
    push:
      branches: [main]
      paths:
        - 'core/console/**'
        - 'core/admin/**'
        - 'core/marketplace/**'
        - 'core/marketplace-api/**'
        - 'products/catalyst/bootstrap/**'
        - 'products/catalyst/chart/**'
        - '.github/workflows/catalyst-build.yaml'
    workflow_dispatch:

`workflow_dispatch` retained for ad-hoc re-runs (config-only changes
that bypass the path filter, e.g. a secret rotation that doesn't touch
code). Path filter mirrors the actual surface this workflow rebuilds.

After this lands, every merge to main that touches the catalyst surface
auto-deploys. No cron lag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 09:24:40 +04:00

workflows

feat(wipe): deployment-level Cancel & Wipe — backend endpoint + Cloud-Architecture + wizard banner entry-points (closes #318 ) (#346 )

2026-05-01 09:24:40 +04:00

dependabot.yml

chore(ci): add Dependabot for npm and GitHub Actions dependency updates

2026-03-19 13:42:02 +01:00