
KEDA

Event-driven horizontal autoscaling, scale-to-zero. Per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.4) — runs on every host cluster a Sovereign owns.

Status: Accepted | Updated: 2026-04-27


Overview

KEDA (Kubernetes Event-driven Autoscaling) provides horizontal pod autoscaling based on external metrics and events:

  • Queue-based scaling (Kafka via Strimzi)
  • Metric-based scaling (Prometheus, custom metrics)
  • Cron-based scaling
  • Scale-to-zero capability

Architecture

flowchart TB
    subgraph KEDA["KEDA"]
        Operator[KEDA Operator]
        Metrics[Metrics Adapter]
    end

    subgraph Sources["Event Sources"]
        Kafka[Kafka]
        Prometheus[Prometheus/Mimir]
        Cron[Cron]
    end

    subgraph Workloads["Workloads"]
        Deploy[Deployments]
        Pods[Pods]
    end

    Sources --> Operator
    Operator --> Deploy
    Deploy --> Pods
    Metrics --> Operator

Scalers

Scaler     | Use Case
-----------|----------------------
kafka      | Kafka consumer lag
prometheus | Custom metrics
cron       | Time-based scaling
cpu/memory | Resource utilization

Configuration

ScaledObject

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <org>-worker
  namespace: <org>
spec:
  scaleTargetRef:
    name: <org>-worker
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 300
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-kafka-bootstrap.databases.svc:9092
        consumerGroup: <org>-workers
        topic: <org>-jobs
        lagThreshold: "100"
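Under the hood, KEDA feeds the lag metric to a Horizontal Pod Autoscaler, which (roughly) computes replicas as ceil(currentLag / lagThreshold), clamped to the ScaledObject's bounds. A minimal sketch of that arithmetic (the function name and signature are illustrative, not KEDA's API):

```python
import math

def desired_replicas(total_lag: int, lag_threshold: int,
                     min_replicas: int, max_replicas: int) -> int:
    # HPA-style target calculation: one replica per lagThreshold
    # messages of consumer lag, clamped to the ScaledObject bounds.
    raw = math.ceil(total_lag / lag_threshold)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(850, 100, 1, 10))   # 9 — ceil(850/100)
print(desired_replicas(5000, 100, 1, 10))  # 10 — clamped to maxReplicaCount
print(desired_replicas(0, 100, 1, 10))     # 1 — never below minReplicaCount
```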

Prometheus Scaler

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <org>-api
  namespace: <org>
spec:
  scaleTargetRef:
    name: <org>-api
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://mimir.monitoring.svc:8080/prometheus
        metricName: http_requests_per_second
        query: |
          sum(rate(http_requests_total{namespace="<org>"}[1m]))          
        threshold: "100"

Cron Scaler

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <org>-batch
  namespace: <org>
spec:
  scaleTargetRef:
    name: <org>-batch
  minReplicaCount: 0
  maxReplicaCount: 5
  triggers:
    - type: cron
      metadata:
        timezone: UTC
        start: "0 8 * * 1-5"
        end: "0 18 * * 1-5"
        desiredReplicas: "3"
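The effect of the cron trigger above can be sketched as a time-window check (a toy model, not KEDA's implementation): inside the Mon–Fri 08:00–18:00 UTC window the trigger requests desiredReplicas; outside it the workload falls back to minReplicaCount (here zero).

```python
from datetime import datetime, timezone

def cron_window_replicas(now: datetime, desired: int, min_replicas: int) -> int:
    # Active Mon-Fri (weekday 0-4) between 08:00 and 18:00 UTC,
    # mirroring start "0 8 * * 1-5" / end "0 18 * * 1-5" above.
    in_window = now.weekday() < 5 and 8 <= now.hour < 18
    return desired if in_window else min_replicas

tue_morning = datetime(2026, 4, 28, 9, 30, tzinfo=timezone.utc)
saturday = datetime(2026, 5, 2, 12, 0, tzinfo=timezone.utc)
print(cron_window_replicas(tue_morning, 3, 0))  # 3 — inside the window
print(cron_window_replicas(saturday, 3, 0))     # 0 — weekend, scaled to zero
```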

VPA + KEDA Coordination

flowchart LR
    subgraph Scaling["Scaling"]
        VPA[VPA<br/>Vertical]
        KEDA[KEDA<br/>Horizontal]
    end

    subgraph Workload["Workload"]
        Deploy[Deployment]
        Pods[Pods]
    end

    VPA -->|"Right-size resources"| Pods
    KEDA -->|"Scale replicas"| Deploy
    Deploy --> Pods

  • VPA: Optimizes CPU/memory per pod
  • KEDA: Scales replica count based on events
  • Combined: Optimal resource utilization with event-driven elasticity

Scale-to-Zero

KEDA supports scaling to zero for batch workloads:

spec:
  minReplicaCount: 0  # scale to zero when no trigger is active

or, to idle at zero but keep a replica floor once traffic arrives (idleReplicaCount must be lower than minReplicaCount; KEDA currently supports only the value 0):

spec:
  idleReplicaCount: 0  # idle at zero while all triggers are inactive
  minReplicaCount: 2   # floor once a trigger activates
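The interplay between activation and the replica bounds can be sketched like this (a simplified model, not KEDA's source; names are illustrative): while no trigger is active the workload sits at idleReplicaCount, falling back to minReplicaCount when it is unset; once a trigger activates, KEDA brings the workload up and the HPA takes over within the bounds.

```python
from typing import Optional

def effective_replicas(trigger_active: bool, hpa_desired: int,
                       min_replicas: int, max_replicas: int,
                       idle_replicas: Optional[int] = None) -> int:
    if not trigger_active:
        # Inactive: idle at idleReplicaCount, else at minReplicaCount.
        return idle_replicas if idle_replicas is not None else min_replicas
    # Active: at least one replica (KEDA handles the 0 -> 1 step),
    # then the HPA-computed value clamped to the ScaledObject bounds.
    return max(min_replicas, 1, min(max_replicas, hpa_desired))

print(effective_replicas(False, 0, 0, 5))                   # 0 — scaled to zero
print(effective_replicas(True, 3, 0, 5))                    # 3 — HPA drives replicas
print(effective_replicas(False, 0, 2, 5, idle_replicas=0))  # 0 — idling below the floor
```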

Monitoring

Metric                    | Description
--------------------------|------------------------------
keda_scaler_active        | Whether the scaler is active
keda_scaler_metrics_value | Current metric value
keda_scaled_object_errors | Scaling errors
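These metrics can back a simple alert. An illustrative PrometheusRule fragment (the resource name, namespace, and thresholds are assumptions, not part of the documented stack):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-alerts        # hypothetical resource name
  namespace: monitoring
spec:
  groups:
    - name: keda
      rules:
        - alert: KedaScaledObjectErrors
          # Any scaling errors reported for a ScaledObject in the last 5m
          expr: sum by (scaledObject) (increase(keda_scaled_object_errors[5m])) > 0
          for: 10m
          labels:
            severity: warning
```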

Part of OpenOva