# Velero

Kubernetes backup/restore for disaster recovery. Per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.5): it runs on every host cluster Catalyst manages. Backups land in the `velero-backups` bucket on SeaweedFS, Catalyst's unified S3 encapsulation layer. SeaweedFS's cold-tier policy automatically transitions backup objects to the configured cloud archival backend (Cloudflare R2 / AWS S3 / Hetzner Object Storage / etc.), so backups survive cluster failure without any direct cloud-S3 call from Velero itself.

Status: Accepted | Updated: 2026-04-28


## Overview

Velero provides Kubernetes-native backup and restore. All Velero output goes to a single S3 endpoint, `seaweedfs.storage.svc:8333`, bucket `velero-backups`. SeaweedFS handles the rest: the hot tier stays in-cluster for fast restore of recent backups; the cold tier lives in cloud archival storage for backups beyond the configured warm window.

```mermaid
flowchart TB
    subgraph K8s["Kubernetes Cluster"]
        Velero[Velero]
        Apps[Applications]
        PVs[Persistent Volumes]
    end

    subgraph SW["SeaweedFS (in-cluster S3 encapsulation)"]
        Bucket[velero-backups bucket]
        TierMgr[Tier Manager]
    end

    subgraph Archival["Cloud archive backend (cold tier)"]
        R2[Cloudflare R2]
        S3[AWS S3]
        GCS[GCP GCS]
        Hetzner[Hetzner Object Storage]
        OCI[OCI Object Storage]
    end

    Apps --> Velero
    PVs --> Velero
    Velero -->|"Backup"| Bucket
    Bucket --> TierMgr
    TierMgr -->|"After warm window"| Archival
```

## Why Route Through SeaweedFS

| Property | Direct cloud-S3 calls | Through SeaweedFS encapsulation |
|---|---|---|
| Number of S3 endpoints in Catalyst components | N (one per consumer × cloud) | 1 (`seaweedfs.storage.svc:8333`) |
| Hot-restore latency for recent backups | Cloud round-trip | Near-zero (in-cluster cache) |
| Audit / lifecycle / encryption boundary | Per-component | One central boundary |
| Air-gap deployment | Requires direct cloud reachability | Works with SeaweedFS-only mode (see SRE §7) |

Backups survive cluster failure because SeaweedFS's cold tier *is* the cloud archival backend, not the in-cluster volumes. Even if the entire host cluster is destroyed, backups beyond the warm window already live in the cold backend (R2 / Glacier / etc.), and a freshly restored SeaweedFS can serve them straight from there.
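
One way SeaweedFS implements this passthrough is its remote-storage mount: a cold backend is attached once in `weed shell`, the Velero bucket path is mounted onto it, and local copies are evicted after the warm window. A minimal sketch follows; the backend name `cold1`, the endpoint, and the credentials are illustrative placeholders, and Catalyst sets the actual policy at Sovereign provisioning:

```bash
# weed shell sketch: attach a cloud archival backend and mount it behind
# the Velero bucket. All names and credentials here are placeholders.
remote.configure -name=cold1 -type=s3 -s3.access_key=<key> -s3.secret_key=<secret> -s3.endpoint=https://<account-id>.r2.cloudflarestorage.com
remote.mount -dir=/buckets/velero-backups -remote=cold1/<org>-backups

# After the warm window, drop local copies; reads fall through to cold1.
remote.uncache -dir=/buckets/velero-backups
```

A `weed filer.remote.sync` process propagates new writes from the mounted path to the cold backend, so the tier transition happens without Velero's involvement.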


## Storage Backend Options

| Provider | Availability | Egress Fees | Notes |
|---|---|---|---|
| Cloud Provider Storage | Default | Varies | Hetzner, OCI, Huawei OBS |
| Cloudflare R2 | Always available | Free | Zero egress, multi-cloud friendly |
| AWS S3 | Available | $0.09/GB | Full featured |
| GCP GCS | Available | $0.12/GB | Full featured |

Default: Cloud provider's object storage (Hetzner Object Storage, OCI Object Storage, etc.)

Alternative: Cloudflare R2 for zero egress fees, useful for multi-cloud or egress-heavy scenarios.


## Configuration
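
### SeaweedFS (Primary)

In the current design, Velero's primary BackupStorageLocation targets SeaweedFS; the per-cloud locations below are the direct cold-backend variants. A minimal sketch of the SeaweedFS location, assuming a `seaweedfs-credentials` secret holding the gateway's S3 keys:

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: seaweedfs-backup
  namespace: velero
spec:
  provider: aws            # SeaweedFS speaks the S3 API, so the AWS plugin is reused
  objectStorage:
    bucket: velero-backups
  config:
    region: us-east-1      # arbitrary; SeaweedFS does not enforce a region
    s3ForcePathStyle: "true"
    s3Url: http://seaweedfs.storage.svc:8333
  credential:
    name: seaweedfs-credentials   # assumed secret name
    key: cloud
```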

### Cloudflare R2 (Zero Egress)

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: r2-backup
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: <org>-backups
  config:
    region: auto
    s3ForcePathStyle: "true"
    s3Url: https://<account-id>.r2.cloudflarestorage.com
  credential:
    name: r2-credentials
    key: cloud
```

### AWS S3

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: s3-backup
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: <org>-backups
  config:
    region: us-east-1
  credential:
    name: aws-credentials
    key: cloud
```

### GCP GCS

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: gcs-backup
  namespace: velero
spec:
  provider: gcp
  objectStorage:
    bucket: <org>-backups
  credential:
    name: gcp-credentials
    key: cloud
```

### Backup Schedule

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  template:
    includedNamespaces:
      - "*"
    excludedNamespaces:
      - velero
      - kube-system
    includedResources:
      - "*"
    excludedResources:
      - events
      - events.events.k8s.io
    storageLocation: r2-backup
    ttl: 720h  # 30 days
```

## Backup Strategy

| Resource | Schedule | Retention |
|---|---|---|
| All namespaces | Daily 2 AM | 30 days |
| Databases (labels) | Hourly | 7 days |
| Secrets | Daily | 90 days |
| PVs (snapshots) | Daily | 14 days |
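
The database row relies on a label selector rather than namespaces. A sketch of that hourly schedule, assuming a hypothetical `backup-tier: database` label on the target resources:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: hourly-databases
  namespace: velero
spec:
  schedule: "0 * * * *"   # hourly, on the hour
  template:
    labelSelector:
      matchLabels:
        backup-tier: database   # assumed label; match whatever the DB operators set
    snapshotVolumes: true
    ttl: 168h  # 7 days, per the strategy table
```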

## Multi-Region Backup

```mermaid
flowchart TB
    subgraph Region1["Region 1"]
        V1[Velero]
        K1[Kubernetes]
    end

    subgraph Region2["Region 2"]
        V2[Velero]
        K2[Kubernetes]
    end

    subgraph Archival["Archival S3"]
        Bucket[Shared Bucket<br/>or Cross-Region Replication]
    end

    V1 -->|"Backup"| Bucket
    V2 -->|"Backup"| Bucket
    Bucket -->|"Restore"| V1
    Bucket -->|"Restore"| V2
```

Both regions can:

- Back up to the same bucket (different prefixes, as sketched below)
- Restore from either region's backups
- Use the shared bucket for cross-region disaster recovery
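
A sketch of the per-region prefixes, assuming both clusters share one cold bucket (the location name and prefix values are illustrative):

```yaml
# Region 1's location; Region 2 uses an identical spec with prefix: region-2.
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: shared-backup
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: <org>-backups
    prefix: region-1   # keeps each region's backups under its own key prefix
  config:
    region: auto
    s3ForcePathStyle: "true"
    s3Url: https://<account-id>.r2.cloudflarestorage.com
  credential:
    name: r2-credentials
    key: cloud
```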

## Restore Procedure

```mermaid
sequenceDiagram
    participant Op as Operator
    participant Velero as Velero
    participant S3 as Archival S3
    participant K8s as Kubernetes

    Op->>Velero: velero restore create
    Velero->>S3: Fetch backup
    S3->>Velero: Return backup data
    Velero->>K8s: Restore resources
    Velero->>K8s: Restore PV data
    K8s->>Op: Restoration complete
```

## Commands

```bash
# List available backups
velero backup get

# Restore entire backup
velero restore create --from-backup daily-backup-20260116

# Restore specific namespace
velero restore create --from-backup daily-backup-20260116 \
  --include-namespaces databases

# Restore to different namespace
velero restore create --from-backup daily-backup-20260116 \
  --include-namespaces databases \
  --namespace-mappings databases:databases-restored
```
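
Once a restore is kicked off, its progress is tracked with the matching `restore` subcommands (restore names default to the backup name plus a timestamp):

```bash
# List restores and inspect a specific one
velero restore get
velero restore describe <restore-name>
velero restore logs <restore-name>
```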

## Operations

### Check Backup Status

```bash
# List backups
velero backup get

# Describe specific backup
velero backup describe daily-backup-20260116

# Check backup logs
velero backup logs daily-backup-20260116
```

### Verify Backup Location

```bash
# Check backup storage locations; a healthy location reports Phase: Available
velero backup-location get

# Inspect a specific location in detail
kubectl -n velero get backupstoragelocation r2-backup -o yaml
```

### Manual Backup

```bash
# Create manual backup
velero backup create manual-backup-$(date +%Y%m%d)

# Backup specific namespace
velero backup create db-backup-$(date +%Y%m%d) \
  --include-namespaces databases
```

## Consequences

Positive:

- K8s-native backup
- Flexible storage backends
- Zero egress with Cloudflare R2
- Cross-region restore capability
- Incremental backups

Negative:

- Requires an external cold-tier S3 backend (by design; reached via SeaweedFS rather than directly)
- PV backup requires CSI snapshots
- Large restores take time

Part of OpenOva