Commit Graph

2 Commits

Author SHA1 Message Date
e3mrah
1e17668055
feat(catalyst): Hetzner Object Storage credential pattern — Phase 0b (#371) (#409)
* feat(catalyst): Hetzner Object Storage credential pattern (Phase 0b, #371)

Adds the per-Sovereign Hetzner Object Storage credential capture + bucket
provisioning Phase 0b path described in the omantel handover WBS §5.
Hybrid Option A+B: wizard collects operator-issued S3 credentials (Hetzner
exposes no Cloud API to mint them — they're issued once in the Hetzner
Console and the secret half is shown exactly once), and OpenTofu
auto-provisions the per-Sovereign bucket via the aminueza/minio provider
+ writes a flux-system/hetzner-object-storage Secret into the new
Sovereign at cloud-init time so Harbor (#383) and Velero (#384) find
their backing-store credentials already in the cluster from Phase 1
onwards.

Extends the EXISTING canonical seam at every layer (per the founder's
anti-duplication rule for #371's session): the existing Tofu module at
infra/hetzner/, the existing handler/credentials.go validator, the
existing provisioner.Request struct, the existing store.Redact path,
and the existing wizard StepCredentials. No parallel binaries / scripts
/ operators introduced.

infra/hetzner/ (Tofu module — Phase 0):
  - versions.tf: declare aminueza/minio provider (Hetzner's official
    recommendation for S3-compatible bucket creation per
    docs.hetzner.com/storage/object-storage/getting-started/...)
  - variables.tf: 4 sensitive vars — region (validated against
    fsn1/nbg1/hel1, the European-only OS regions as of 2026-04),
    access_key, secret_key, bucket_name (RFC-compliant S3 naming)
  - main.tf: minio_s3_bucket.main resource — idempotent on re-apply,
    no force_destroy (Velero archive must survive a control-plane
    reinstall), object_locking=false (content-addressed digests are
    the immutability guarantee for Harbor; Velero uses S3 versioning)
  - cloudinit-control-plane.tftpl: write
    flux-system/hetzner-object-storage Secret with the canonical
    s3-endpoint/s3-region/s3-bucket/s3-access-key/s3-secret-key keys
    Harbor + Velero charts consume via existingSecret refs
  - outputs.tf: surface endpoint/region/bucket back to catalyst-api
    for the deployment record (credentials NEVER returned)

products/catalyst/bootstrap/api/ (Go):
  - internal/hetzner/objectstorage.go: NEW — minio-go/v7-based
    ListBuckets validator. Distinguishes auth failure ("rejected") from
    network failure ("unreachable") so the wizard renders the right
    error card. NOT a parallel cloud-resource path — the existing
    purge.go handles hcloud purge; objectstorage.go handles a separate
    API surface (S3-compatible) that has no equivalent client today.
  - internal/handler/credentials.go: extend with
    ValidateObjectStorageCredentials handler — same wire shape
    (200 valid:true / 200 valid:false / 503 unreachable / 400 bad
    input) as the existing token validator so the wizard's failure-
    card machinery handles both without per-endpoint switches.
  - cmd/api/main.go: wire POST
    /api/v1/credentials/object-storage/validate
  - internal/provisioner/provisioner.go: extend Request with
    ObjectStorageRegion/AccessKey/SecretKey/Bucket; Validate()
    rejects empty/malformed values fail-fast at /api/v1/deployments
    POST time; writeTfvars() emits the 4 new tfvars.
  - internal/handler/deployments.go: derive bucket name from FQDN slug
    pre-Validate (catalyst-<fqdn-with-dots-replaced-by-dashes>) so
    Hetzner's globally-namespaced bucket pool gets a deterministic,
    collision-resistant per-Sovereign name without operator input.
  - internal/store/store.go: redact access/secret keys; preserve
    region+bucket plain (they're public in tofu outputs anyway).

products/catalyst/bootstrap/ui/ (TypeScript / React):
  - entities/deployment/model.ts + store.ts: 4 new wizard fields
    (objectStorageRegion/AccessKey/SecretKey/Validated) with merge()
    coercion for legacy persisted state.
  - pages/wizard/steps/StepCredentials.tsx: ObjectStorageSection —
    region picker (fsn1/nbg1/hel1), masked secret-key input,
    Validate button gating Next. Same FailureCard taxonomy
    (rejected/too-short/unreachable/network/parse/http) the existing
    TokenSection uses, so the operator UX is consistent. Section
    only renders when Hetzner is among chosen providers — non-Hetzner
    Sovereigns skip Phase 0b until their own backing-store path lands.
  - pages/wizard/steps/StepReview.tsx: include
    objectStorageRegion/AccessKey/SecretKey in the
    POST /v1/deployments payload (bucket derived server-side).

Tests:
  - api: 7 new provisioner Validate tests (region/keys/bucket
    required + RFC-compliant + valid-region acceptance), 5 handler
    tests for the new endpoint (bad JSON / missing region / invalid
    region / short keys), 4 hetzner/objectstorage_test.go tests
    (endpoint composition + early input rejection), 1 handler test
    for the bucket-name derivation. Existing tests updated to supply
    the new required fields.
  - ui: StepCredentials.test.tsx pre-populates objectStorageValidated
    in beforeEach so the existing 11 SSH-section tests aren't gated
    on Object Storage validation.

DoD: a fresh Sovereign provision results in a usable S3 endpoint URL +
access/secret keys available as a K8s Secret in the Sovereign's home
cluster (flux-system/hetzner-object-storage), ready for consumption by
Harbor + Velero charts via existingSecret references.

Closes #371.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(wbs): #371 done — Hetzner Object Storage Phase 0b shipped (#409)

Marks #371 done with the architectural rationale (hybrid Option A + B —
Hetzner exposes no Cloud API to mint S3 keys, so the wizard MUST capture
them; OpenTofu auto-provisions the bucket + cloud-init writes the
flux-system/hetzner-object-storage Secret with the canonical s3-* keys
Harbor + Velero consume).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:54:22 +04:00
hatiyildiz
e668637bc9 feat(provisioner): replace bespoke Hetzner+helm-exec code with OpenTofu→Crossplane→Flux
Per docs/INVIOLABLE-PRINCIPLES.md Lesson #24 — the previous commits 915c467 + 07b4bcf shipped bespoke Go code that called Hetzner Cloud API directly + exec'd helm/kubectl, which violates principle #3 (OpenTofu provisions Phase 0, Crossplane is the ONLY day-2 IaC, Flux is the ONLY GitOps reconciler, Blueprints are the ONLY install unit). This commit reverts all of that and replaces it with the canonical architecture.

REVERTED (deleted):
- products/catalyst/bootstrap/api/internal/hetzner/resources.go (379 lines bespoke Hetzner API client)
- products/catalyst/bootstrap/api/internal/hetzner/cloudinit.go (bespoke cloud-init builder)
- products/catalyst/bootstrap/api/internal/hetzner/provisioner.go (306 lines orchestrator)
- products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go (helm-exec installer for 11 components)
- products/catalyst/bootstrap/api/internal/bootstrap/exec.go (kubectl/helm exec wrappers)

KEPT:
- products/catalyst/bootstrap/api/internal/hetzner/client.go — fast token validity probe used by StepCredentials wizard step. NOT architectural drift; just a UX pre-flight check.
- products/catalyst/bootstrap/api/internal/dynadot/dynadot.go — DNS API client. Will be invoked by the OpenTofu module via local-exec (the catalyst-dns helper binary).

NEW (canonical architecture):

infra/hetzner/ — OpenTofu module per docs/SOVEREIGN-PROVISIONING.md §3 Phase 0:
- versions.tf: hetznercloud/hcloud provider ~> 1.49
- variables.tf: 17 typed variables matching wizard inputs (sovereign_fqdn, hcloud_token, region, control_plane_size, ssh_public_key, domain_mode, gitops_repo_url, etc.) — all runtime parameters, none hardcoded per principle #4
- main.tf: hcloud_network + subnet + firewall + ssh_key + control-plane server(s) with cloud-init + worker servers + load_balancer with services + null_resource calling /usr/local/bin/catalyst-dns for pool-domain DNS writes
- outputs.tf: control_plane_ip, load_balancer_ip, sovereign_fqdn, console_url, gitops_repo_url
- cloudinit-control-plane.tftpl: installs k3s with --flannel-backend=none --disable=traefik --disable=servicelb (Cilium replaces all of these), then installs Flux core, then applies a GitRepository pointing at clusters/${sovereign_fqdn}/ in the public OpenOva monorepo. From this point Flux is the GitOps engine — it reconciles bp-cilium → bp-cert-manager → bp-crossplane → ... → bp-catalyst-platform via the Kustomization tree the cluster directory ships. NO bespoke helm install from outside the cluster. NO direct kubectl apply. Flux is the install layer.
- cloudinit-worker.tftpl: k3s agent join via private-IP control plane

products/catalyst/bootstrap/api/internal/provisioner/provisioner.go — thin OpenTofu invoker:
- Validates wizard inputs
- Stages the canonical infra/hetzner/ module into a per-deployment workdir
- Writes tofu.auto.tfvars.json from the wizard request
- Execs `tofu init`, `tofu plan -out=tfplan`, `tofu apply tfplan`, streaming stdout/stderr lines as SSE events to the wizard
- Reads tofu output -json for control_plane_ip + load_balancer_ip
- Returns Result. Flux on the new cluster takes over from here.

products/catalyst/bootstrap/api/internal/handler/deployments.go — rewritten:
- Uses provisioner.Request and provisioner.New() (no more hetzner.Provisioner)
- Same SSE/poll endpoints; same Dynadot env-var injection for pool-domain mode

What this commit DOES NOT yet include (intentionally — separate work):
- clusters/${sovereign_fqdn}/ Kustomization tree in the monorepo that Flux will reconcile (each Sovereign gets its own cluster directory). Tracked separately as part of the bp-catalyst-platform umbrella work.
- /usr/local/bin/catalyst-dns helper binary in the catalyst-api Containerfile. Tracked as ticket [G] dns Dynadot client.
- Crossplane Compositions for hcloud resources at platform/crossplane/compositions/. Tracked as part of [F] crossplane chart.

Lesson #24 closed. Architecture now matches docs/ARCHITECTURE.md §10 + SOVEREIGN-PROVISIONING.md §3-§4 exactly.
2026-04-28 13:38:56 +02:00