openova/platform/cilium/README.md
hatiyildiz 14ed84de41 docs(pass-8): role-in-Catalyst banners + dead-link fix in component READMEs
Pass 8 — line-by-line read of platform/cnpg, platform/strimzi,
platform/k8gb, platform/keycloak, platform/cert-manager, platform/cilium.

CNPG and Strimzi: read in full and confirmed clean — they correctly
position themselves as Application Blueprints and don't drift from
the canonical model. CNPG's `<org>-postgres-dr` cluster name
(Application-tier database role) is acceptable per NAMING-CONVENTION
§1.3 (which only forbids primary/dr in K8s host-cluster names, not
in Application-internal CRD names).

Four READMEs updated:

k8gb:
- Header reframed: per-host-cluster infrastructure pointer to
  PLATFORM-TECH-STACK §3.1 and SRE §2.4 split-brain protection.
- Removed dead link to ../failover-controller/docs/ADR-FAILOVER-
  CONTROLLER.md (the failover-controller folder has no docs/);
  replaced with link to that component's README + SRE §2.4.

keycloak:
- Header reframed from "FAPI Authorization Server for Open Banking"
  (narrow) to "User identity for Catalyst Sovereigns" (broad).
  Keycloak handles ALL user identity in Catalyst, not just FAPI.
- Added per-Org / per-Sovereign topology callout matching SECURITY
  §6. Clarified that "Multi-tenant TPP" refers to PSD2 Third Party
  Providers, not Catalyst's Organization-level multi-tenancy.
- FAPI features kept since Keycloak still serves Fingate as the
  FAPI Authorization Server.

cert-manager:
- Header reframed as per-host-cluster infrastructure with pointer
  to PLATFORM-TECH-STACK §3.3.

cilium:
- Header reframed as per-host-cluster infrastructure with pointer
  to PLATFORM-TECH-STACK §3.1, including the install-first note
  (CNI must come before any other workload during Phase 0).

VALIDATION-LOG: Pass 8 entry added.

Refs #37
2026-04-27 21:39:03 +02:00

521 lines
11 KiB
Markdown

# Cilium
Unified CNI + Service Mesh for Kubernetes with eBPF. Per-host-cluster infrastructure (see [`docs/PLATFORM-TECH-STACK.md`](../../docs/PLATFORM-TECH-STACK.md) §3.1) — installed on every host cluster Catalyst manages, before any other workload (CNI must come first).
**Status:** Accepted | **Updated:** 2026-04-27
---
## Overview
Cilium provides both Container Network Interface (CNI) and Service Mesh capabilities in a single, unified solution. It replaces Flannel, kube-proxy, and eliminates the need for a separate service mesh like Istio.
```mermaid
flowchart TB
subgraph Node["Kubernetes Node"]
subgraph Cilium["Cilium"]
Agent[Cilium Agent]
eBPF[eBPF Programs]
Hubble[Hubble]
Envoy[Envoy Proxy<br/>L7 only]
end
subgraph Pods["Application Pods"]
subgraph Pod1["Pod A"]
App1[Application]
OTel1[OTel Agent]
end
subgraph Pod2["Pod B"]
App2[Application]
OTel2[OTel Agent]
end
end
end
subgraph Observability["Observability"]
Collector[OTel Collector]
Grafana[Grafana Stack]
end
Agent --> eBPF
eBPF -->|"L3/L4"| Pod1
eBPF -->|"L3/L4"| Pod2
Envoy -->|"L7 when needed"| Pod1
Envoy -->|"L7 when needed"| Pod2
OTel1 --> Collector
OTel2 --> Collector
Hubble --> Grafana
Collector --> Grafana
```
---
## Why Cilium?
### CNI Comparison
| Feature | Flannel | Cilium |
|---------|---------|--------|
| Basic networking | Yes | Yes |
| Network policies | No | Yes (L3-L7) |
| Observability | No | Yes (Hubble) |
| kube-proxy replacement | No | Yes |
| eBPF performance | No | Yes |
| Service mesh | No | Yes (built-in) |
| L2 LoadBalancer | No | Yes |
| Gateway API | No | Yes |
### Service Mesh Comparison
| Feature | Istio | Cilium Service Mesh |
|---------|-------|---------------------|
| mTLS | Envoy/ztunnel | eBPF + WireGuard |
| L7 Policies | Envoy | CiliumEnvoyConfig |
| Network Policies | Via CNI | Native (eBPF) |
| Resource Overhead | Medium-High | Low |
| Observability | Envoy metrics | Hubble + OTel |
| CNI Integration | Separate | Unified |
| Complexity | High | Medium |
### Key Finding: OTel Independence
OpenTelemetry auto-instrumentation is **independent** of the service mesh:
| Component | Provides |
|-----------|----------|
| **OpenTelemetry** | SQL queries, HTTP calls, application traces |
| **Service Mesh** | mTLS encryption, L7 routing, network policies |
Cilium provides mesh features without sacrificing observability - OTel handles application tracing regardless of mesh choice.
---
## Features
### CNI Features
| Feature | Purpose |
|---------|---------|
| kubeProxyReplacement | Replace kube-proxy with eBPF |
| hubble | Network observability |
| hubble.relay | Metrics export to Grafana |
| gatewayAPI | Gateway API for ingress |
| encryption | WireGuard-based mTLS |
| envoy | L7 traffic management |
| l2announcements | L2 LoadBalancer for bare-metal |
### Service Mesh Capabilities
| Capability | How Cilium Provides It |
|------------|------------------------|
| mTLS | WireGuard encryption (transparent) |
| L7 Policies | CiliumEnvoyConfig + Network Policies |
| Traffic Management | Gateway API (HTTPRoute, GRPCRoute) |
| Observability | Hubble metrics, flows, traces |
| Circuit Breaker | CiliumEnvoyConfig outlier detection |
| Retries | HTTPRoute retry policies |
---
## Configuration
### Helm Values
```yaml
# values.yaml
kubeProxyReplacement: true
k8sServiceHost: ${API_SERVER_IP}
k8sServicePort: 6443
# Service Mesh features
authentication:
mutual:
spire:
enabled: false # Using Cilium native
encryption:
enabled: true
type: wireguard
# L7 proxy
envoy:
enabled: true
# Hubble observability
hubble:
enabled: true
relay:
enabled: true
ui:
enabled: true
metrics:
enabled:
- dns
- drop
- tcp
- flow
- icmp
- http
# Gateway API
gatewayAPI:
enabled: true
# L2 LoadBalancer (optional)
l2announcements:
enabled: true
```
---
## Network Policies
### mTLS (Mutual TLS)
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: require-mtls
spec:
endpointSelector:
matchLabels:
app: my-service
ingress:
- fromEndpoints:
- matchLabels:
app: allowed-client
authentication:
mode: required
```
### L7 HTTP Policies
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: http-policy
spec:
endpointSelector:
matchLabels:
app: api-service
ingress:
- fromEndpoints:
- matchLabels:
app: frontend
toPorts:
- ports:
- port: "8080"
protocol: TCP
rules:
http:
- method: GET
path: "/api/v1/.*"
- method: POST
path: "/api/v1/users"
```
---
## Gateway API
Cilium replaces traditional ingress controllers with Gateway API:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: cilium-gateway
spec:
gatewayClassName: cilium
listeners:
- name: https
port: 443
protocol: HTTPS
tls:
mode: Terminate
certificateRefs:
- name: tls-secret
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: app-route
spec:
parentRefs:
- name: cilium-gateway
hostnames:
- "app.example.com"
rules:
- matches:
- path:
type: PathPrefix
value: /api
backendRefs:
- name: api-service
port: 8080
timeouts:
request: 30s
retry:
attempts: 3
backoff: 100ms
```
---
## Resilience Patterns
### Circuit Breaker Tiers
| Service Tier | consecutiveErrors | baseEjectionTime | maxEjectionPercent |
|--------------|-------------------|------------------|-------------------|
| Critical | 3 | 30s | 50% |
| Standard | 5 | 30s | 50% |
| Background | 10 | 60s | 75% |
### Standard Service Circuit Breaker
```yaml
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
name: service-circuit-breaker
spec:
services:
- name: my-service
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: my-service
connect_timeout: 5s
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 100
max_pending_requests: 100
max_requests: 1000
max_retries: 3
outlier_detection:
consecutive_5xx: 5
interval: 10s
base_ejection_time: 30s
max_ejection_percent: 50
```
### Critical Service (Database Proxy)
```yaml
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
name: db-proxy-circuit-breaker
spec:
services:
- name: db-proxy
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: db-proxy
connect_timeout: 3s
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 50
max_pending_requests: 50
max_retries: 2
outlier_detection:
consecutive_5xx: 3
interval: 5s
base_ejection_time: 30s
max_ejection_percent: 50
```
### Timeout Configuration
```yaml
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
name: service-timeouts
spec:
services:
- name: my-service
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
name: my-service-routes
virtual_hosts:
- name: my-service
domains: ["*"]
routes:
- match:
prefix: "/"
route:
cluster: my-service
timeout: 30s
```
---
## Health Probes
| Probe Type | Purpose | Failure Action |
|------------|---------|----------------|
| Startup | Wait for initialization | Block traffic |
| Readiness | Can accept traffic | Remove from LB |
| Liveness | Process healthy | Restart pod |
### Standard Configuration
```yaml
startupProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 30
readinessProbe:
httpGet:
path: /health/ready
port: 8080
periodSeconds: 10
failureThreshold: 3
livenessProbe:
httpGet:
path: /health/live
port: 8080
periodSeconds: 20
failureThreshold: 3
```
---
## SLO-Based Alerting
| SLI | Target | Alert Threshold |
|-----|--------|-----------------|
| Availability | 99.9% | <99.5% for 5m |
| Latency (p95) | <500ms | >1s for 5m |
| Error Rate | <0.1% | >1% for 5m |
---
## LoadBalancer Options
```mermaid
flowchart TB
subgraph Options["LoadBalancer Options"]
subgraph CloudLB["Cloud LB (Recommended)"]
HetznerLB[Hetzner LB]
OCILB[OCI LB]
end
subgraph K8gbLB["k8gb DNS-based (Free)"]
K8gb[k8gb GSLB]
ExternalDNS[ExternalDNS]
end
subgraph CiliumLB["Cilium L2 (Single Subnet)"]
L2[L2 Announcements]
end
end
CloudLB -->|"Best for cloud"| Service[K8s Service]
K8gbLB -->|"Best for multi-region"| Service
CiliumLB -->|"Best for single node"| Service
```
| Option | Cost | Multi-Region | Use Case |
|--------|------|--------------|----------|
| Cloud LB | Paid | Via k8gb | Production |
| k8gb DNS-based | Free | Native | Cost-sensitive |
| Cilium L2 | Free | No | Single subnet/dev |
---
## OpenTelemetry Integration
OTel is configured independently of Cilium:
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
name: auto-instrumentation
namespace: opentelemetry
spec:
exporter:
endpoint: http://collector.opentelemetry.svc:4317
propagators:
- tracecontext
- baggage
sampler:
type: parentbased_traceidratio
argument: "1.0"
java:
image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
python:
image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
nodejs:
image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
```
Pods opt-in via annotation:
```yaml
metadata:
annotations:
instrumentation.opentelemetry.io/inject-java: "true"
```
---
## Observability Stack
| Source | Provides |
|--------|----------|
| **Hubble (Cilium)** | Network flows, DNS queries, HTTP metrics (L7), dropped packets |
| **OpenTelemetry** | Application traces, SQL queries, HTTP client calls, custom spans |
| **Grafana Stack** | Loki (logs), Mimir (metrics), Tempo (traces), dashboards |
---
## Migration from Istio
If migrating from Istio:
1. **Install Cilium** with service mesh features
2. **Remove Istio sidecars** (if using sidecar mode)
3. **Convert VirtualService** → HTTPRoute
4. **Convert DestinationRule** → CiliumNetworkPolicy + CiliumEnvoyConfig
5. **Verify OTel** continues working (it's independent)
---
## Consequences
**Positive:**
- Unified CNI + service mesh (single component)
- Advanced network policies (L3-L7)
- Network observability via Hubble
- eBPF performance
- kube-proxy replacement
- L2 LoadBalancer option
- Native Kubernetes Gateway API support
- Lower resource overhead than Istio
**Negative:**
- More complex than Flannel
- eBPF kernel requirements (5.10+)
- Less mature than Istio for some features
- CiliumEnvoyConfig less intuitive than VirtualService
---
*Part of [OpenOva](https://openova.io)*