Velero
Kubernetes backup/restore to archival S3 for disaster recovery. Per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.5) — runs on every host cluster Catalyst manages. Backups land in cloud archival storage (Cloudflare R2 / AWS S3 / etc.), not in MinIO (which is fast-tier in-cluster).
Status: Accepted | Updated: 2026-04-27
Overview
Velero provides Kubernetes-native backup with flexible storage backend options. Backups are stored in Archival S3 (external storage), not in-cluster MinIO.
flowchart TB
    subgraph K8s["Kubernetes Cluster"]
        Velero[Velero]
        Apps[Applications]
        PVs[Persistent Volumes]
    end

    subgraph Archival["Archival S3 Options"]
        R2[Cloudflare R2]
        S3[AWS S3]
        GCS[GCP GCS]
        Hetzner[Hetzner Object Storage]
        OCI[OCI Object Storage]
    end

    Apps --> Velero
    PVs --> Velero
    Velero -->|"Backup"| Archival
Why Archival S3?
| Storage | Purpose | Use for Backup? |
|---|---|---|
| MinIO | Fast in-cluster S3 | No |
| Archival S3 | External cold storage | Yes |
Velero backs up to Archival S3, not MinIO.
Reason: Backups must survive cluster failure. MinIO is inside the cluster.
Storage Backend Options
| Provider | Availability | Egress Fees | Notes |
|---|---|---|---|
| Cloud Provider Storage | Default | Varies | Hetzner, OCI, Huawei OBS |
| Cloudflare R2 | Always available | Free | Zero egress, multi-cloud friendly |
| AWS S3 | Available | ~$0.09/GB (list price; varies by region and volume) | Full featured |
| GCP GCS | Available | ~$0.12/GB (list price; varies by region and volume) | Full featured |
Default: Cloud provider's object storage (Hetzner Object Storage, OCI Object Storage, etc.)
Alternative: Cloudflare R2 for zero egress fees, useful for multi-cloud or egress-heavy scenarios.
Configuration
Cloudflare R2 (Zero Egress)
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: r2-backup
  namespace: velero
spec:
  provider: aws                # R2 is S3-compatible, so it uses the AWS plugin
  objectStorage:
    bucket: <org>-backups
  config:
    region: auto
    s3ForcePathStyle: "true"
    s3Url: https://<account-id>.r2.cloudflarestorage.com
  credential:
    name: r2-credentials       # Secret in the velero namespace
    key: cloud
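The `credential` block above points at a Secret named `r2-credentials` with a key called `cloud`. A minimal sketch of that Secret follows, assuming the AWS plugin's usual shared-credentials file layout; the access key values are placeholders, and in practice the Secret would more likely be delivered by the platform's secret management than committed as a manifest.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: r2-credentials
  namespace: velero
type: Opaque
stringData:
  # The key name must match spec.credential.key in the BackupStorageLocation
  cloud: |
    [default]
    aws_access_key_id=<r2-access-key-id>
    aws_secret_access_key=<r2-secret-access-key>
```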
AWS S3
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: s3-backup
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: <org>-backups
  config:
    region: us-east-1
  credential:
    name: aws-credentials
    key: cloud
GCP GCS
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: gcs-backup
  namespace: velero
spec:
  provider: gcp
  objectStorage:
    bucket: <org>-backups
  credential:
    name: gcp-credentials
    key: cloud
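The default option from the Storage Backend Options table, the cloud provider's own object storage, has no manifest in this section. For S3-compatible providers such as Hetzner Object Storage it would likely mirror the R2 example. The sketch below is illustrative only: the location name, Secret name, endpoint, and region are assumptions to be filled in from the provider's documentation.

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: provider-backup                     # hypothetical name
  namespace: velero
spec:
  provider: aws                             # S3-compatible API, served by the AWS plugin
  default: true                             # used when a backup names no storageLocation
  objectStorage:
    bucket: <org>-backups
  config:
    region: <provider-region>               # assumption: take from provider docs
    s3ForcePathStyle: "true"
    s3Url: https://<provider-s3-endpoint>   # assumption: take from provider docs
  credential:
    name: provider-credentials              # hypothetical Secret, same layout as the others
    key: cloud
```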
Backup Schedule
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  template:
    includedNamespaces:
      - "*"
    excludedNamespaces:
      - velero
      - kube-system
    includedResources:
      - "*"
    excludedResources:
      - events
      - events.events.k8s.io
    storageLocation: r2-backup
    ttl: 720h  # 30 days
Backup Strategy
| Resource | Schedule | Retention |
|---|---|---|
| All namespaces | Daily 2 AM | 30 days |
| Databases (labels) | Hourly | 7 days |
| Secrets | Daily | 90 days |
| PVs (snapshots) | Daily | 14 days |
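Only the daily all-namespaces schedule is spelled out above. The hourly database row and the 90-day Secrets row could be expressed as additional `Schedule` objects along these lines. This is a sketch, not the platform's actual manifests: the schedule names, the `backup-tier` label, and the 02:30 offset are assumptions.

```yaml
# Hourly backup of database workloads, kept for 7 days
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: hourly-databases           # hypothetical name
  namespace: velero
spec:
  schedule: "0 * * * *"            # top of every hour
  template:
    labelSelector:
      matchLabels:
        backup-tier: database      # hypothetical label; use the one your workloads carry
    storageLocation: r2-backup
    ttl: 168h                      # 7 days
---
# Daily backup of Secrets only, kept for 90 days
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-secrets              # hypothetical name
  namespace: velero
spec:
  schedule: "30 2 * * *"           # assumption: offset from the 2 AM full backup
  template:
    includedNamespaces:
      - "*"
    includedResources:
      - secrets
    storageLocation: r2-backup
    ttl: 2160h                     # 90 days
```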
Multi-Region Backup
flowchart TB
    subgraph Region1["Region 1"]
        V1[Velero]
        K1[Kubernetes]
    end

    subgraph Region2["Region 2"]
        V2[Velero]
        K2[Kubernetes]
    end

    subgraph Archival["Archival S3"]
        Bucket[Shared Bucket<br/>or Cross-Region Replication]
    end

    V1 -->|"Backup"| Bucket
    V2 -->|"Backup"| Bucket
    Bucket -->|"Restore"| V1
    Bucket -->|"Restore"| V2
Both regions can:
- Back up to the same bucket (under different prefixes)
- Restore from either region's backups
- Use the shared backups for cross-region disaster recovery
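One way to realize the shared-bucket layout is to give each region its own `prefix` and to register the other region's prefix as a read-only location, so it can be restored from but never written to. The manifests below sketch what Region 1 might apply; the prefixes and the second location's name are assumptions.

```yaml
# Region 1: read-write location under its own prefix
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: r2-backup
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: <org>-backups
    prefix: region-1               # hypothetical prefix
  config:
    region: auto
    s3ForcePathStyle: "true"
    s3Url: https://<account-id>.r2.cloudflarestorage.com
  credential:
    name: r2-credentials
    key: cloud
---
# Region 1: read-only view of Region 2's backups, for DR restores
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: r2-backup-region-2         # hypothetical name
  namespace: velero
spec:
  provider: aws
  accessMode: ReadOnly             # Velero will not create or delete backups here
  objectStorage:
    bucket: <org>-backups
    prefix: region-2               # hypothetical prefix
  config:
    region: auto
    s3ForcePathStyle: "true"
    s3Url: https://<account-id>.r2.cloudflarestorage.com
  credential:
    name: r2-credentials
    key: cloud
```

Region 2 would apply the mirror image, with `region-2` as its read-write prefix.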
Restore Procedure
sequenceDiagram
    participant Op as Operator
    participant Velero as Velero
    participant S3 as Archival S3
    participant K8s as Kubernetes

    Op->>Velero: velero restore create
    Velero->>S3: Fetch backup
    S3->>Velero: Return backup data
    Velero->>K8s: Restore resources
    Velero->>K8s: Restore PV data
    Velero->>Op: Restoration complete
Commands
# List available backups. Scheduled backups are named <schedule-name>-<timestamp>,
# so copy the exact name from this output.
velero backup get

# Restore entire backup
velero restore create --from-backup daily-backup-20260116

# Restore specific namespace
velero restore create --from-backup daily-backup-20260116 \
  --include-namespaces databases

# Restore to different namespace
velero restore create --from-backup daily-backup-20260116 \
  --include-namespaces databases \
  --namespace-mappings databases:databases-restored
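For GitOps-style workflows, the last command can also be written declaratively as a `Restore` object. The following is a rough equivalent under the same assumptions as the CLI example; the resource name is hypothetical and `backupName` must match an existing backup exactly.

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: databases-restored-20260116   # hypothetical name
  namespace: velero
spec:
  backupName: daily-backup-20260116   # must match a name shown by `velero backup get`
  includedNamespaces:
    - databases
  namespaceMapping:
    databases: databases-restored     # source namespace: target namespace
  restorePVs: true                    # also restore persistent volume data
```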
Operations
Check Backup Status
# List backups
velero backup get
# Describe specific backup
velero backup describe daily-backup-20260116
# Check backup logs
velero backup logs daily-backup-20260116
Verify Backup Location
# Check backup storage locations
velero backup-location get
# Verify connection: the location's phase should report Available
kubectl -n velero get backupstoragelocation r2-backup -o jsonpath='{.status.phase}'
Manual Backup
# Create manual backup
velero backup create manual-backup-$(date +%Y%m%d)
# Backup specific namespace
velero backup create db-backup-$(date +%Y%m%d) \
--include-namespaces databases
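The namespace-scoped manual backup can likewise be expressed as a `Backup` object. This is a sketch only: the name is written out by hand because a manifest cannot evaluate `$(date ...)`, and the storage location and TTL are assumptions chosen to match the daily schedule.

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: db-backup-20260116            # hypothetical; the CLI example derives it from the date
  namespace: velero
spec:
  includedNamespaces:
    - databases
  storageLocation: r2-backup          # assumption: same location as the daily schedule
  ttl: 720h                           # assumption: 30 days, matching the daily schedule
```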
Consequences
Positive:
- K8s-native backup
- Flexible storage backends
- Zero egress with Cloudflare R2
- Cross-region restore capability
- Incremental volume-data backups when using Velero's file-system backup
Negative:
- Requires external S3 (by design)
- PV backup requires CSI snapshot support (or Velero's file-system backup as a fallback)
- Large restores take time
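On the CSI snapshot requirement: Velero's CSI support selects a `VolumeSnapshotClass` carrying the `velero.io/csi-volumesnapshot-class` label. A minimal sketch is shown below, assuming the cluster's CSI driver supports snapshots; the class name is hypothetical and the driver name is left as a placeholder because it depends on the host cluster.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: velero-snapshots                         # hypothetical name
  labels:
    velero.io/csi-volumesnapshot-class: "true"   # lets Velero pick this class
driver: <csi-driver-name>                        # assumption: must match the cluster's CSI driver
deletionPolicy: Retain                           # keep snapshot content if the object is deleted
```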
Part of OpenOva