cilium-envoy refuses to bind privileged ports (80/443) on Sovereigns even with all of:

- gatewayAPI.hostNetwork.enabled=true on the Cilium chart
- securityContext.privileged=true on the cilium-envoy DaemonSet
- securityContext.capabilities.add=[NET_BIND_SERVICE]
- envoy-keep-cap-netbindservice=true in cilium-config ConfigMap
- Gateway API CRDs at v1.3.0 (matching cilium 1.19.3 schema)

Repeatable error from cilium-envoy logs across otech45, otech46, otech47:

    listener 'kube-system/cilium-gateway-cilium-gateway/listener' failed to bind or apply socket options: cannot bind '0.0.0.0:80': Permission denied

The bind() syscall is intercepted by cilium-agent's BPF socket-LB program in a way that does not honour container capabilities. Even PID 1 with CapEff=0x000001ffffffffff (all caps) and uid=0 gets "Permission denied". Bumping Cilium 1.16.5 → 1.19.3 made no difference (F1, PR #684 still ships: the version bump is sound for other reasons; the listener bind is just a separate fix).

This commit moves the listeners to high ports (30080/30443) and lets the Hetzner LB do the public-facing port translation:

    HCLB :80  → CP node :30080 (cilium-gateway HTTP listener)
    HCLB :443 → CP node :30443 (cilium-gateway HTTPS listener)

External users still hit `https://console.<sov>.omani.works/auth/handover` on port 443; the high port is invisible. High-port bind succeeds without NET_BIND_SERVICE because the kernel only gates ports below `net.ipv4.ip_unprivileged_port_start` (default 1024).

Will be verified on otech48: the next fresh provision should serve console.otech48/auth/handover end-to-end without the 502/timeout chain seen on otech45–47.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
55 lines
2.1 KiB
YAML
# Cilium Gateway (Phase-8a bug #14 follow-up to #484).
# Moved out of bootstrap-kit/01-cilium.yaml because gateway.networking.k8s.io/v1
# CRDs are installed by the Cilium HelmRelease itself; Flux dry-runs the
# whole Kustomization before applying any HR, so Gateway dry-run fails on
# a fresh cluster. The sovereign-tls Kustomization dependsOn bootstrap-kit
# Ready, so by the time Gateway is applied here, Cilium has installed.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cilium-gateway
  namespace: kube-system
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
    catalyst.openova.io/component: cilium-gateway
spec:
  gatewayClassName: cilium
  # NOTE: ports 30080/30443 (not 80/443). Even with hostNetwork=true,
  # cilium-envoy refuses to bind privileged ports because cilium-agent
  # gates that bind through its `envoy-keep-cap-netbindservice` flag and
  # the resulting bind() syscall is intercepted by the agent's BPF
  # socket-LB program. Setting privileged: true on the cilium-envoy
  # DaemonSet + adding NET_BIND_SERVICE + flipping the configmap flag
  # all failed to lift the bind() rejection (verified live on otech45,
  # otech46, otech47).
  #
  # High-port (>= 1024) bind succeeds without NET_BIND_SERVICE. The
  # Hetzner LB does the public-facing port translation: HCLB listens on
  # 80 and forwards to CP node:30080; HCLB listens on 443 and forwards
  # to CP node:30443. Browsers hit the canonical URL
  # (`https://console.<fqdn>/`), so port 30443 is never visible externally.
  #
  # See infra/hetzner/main.tf hcloud_load_balancer_service.{http,https}
  # destination_port settings; they MUST match these listener ports.
  listeners:
    - name: https
      port: 30443
      protocol: HTTPS
      hostname: "*.${SOVEREIGN_FQDN}"
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: sovereign-wildcard-tls
      allowedRoutes:
        namespaces:
          from: All
    - name: http
      port: 30080
      protocol: HTTP
      hostname: "*.${SOVEREIGN_FQDN}"
      allowedRoutes:
        namespaces:
          from: All
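
A route attaching to this Gateway does not need to know about the high ports, because it can select a listener by name rather than by port number. The following is a minimal sketch of such a route, not a manifest taken from this repository: the route name `console`, the namespace `catalyst`, and the backend Service `console` on port 8080 are all hypothetical placeholders. Only the Gateway name, its namespace, the listener name `https`, and the `${SOVEREIGN_FQDN}` substitution come from the manifest above.

```yaml
# Hypothetical HTTPRoute bound to the `https` listener of cilium-gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: console          # hypothetical route name
  namespace: catalyst    # hypothetical; allowedRoutes.namespaces.from=All permits any namespace
spec:
  parentRefs:
    - name: cilium-gateway
      namespace: kube-system
      sectionName: https   # selects the listener by name, so 30443 never appears here
  hostnames:
    - "console.${SOVEREIGN_FQDN}"   # falls under the listener's wildcard hostname
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: console    # hypothetical backend Service
          port: 8080       # hypothetical Service port
```

Using `sectionName` instead of `port` in `parentRefs` means a later change to the listener ports would only touch the Gateway and the load balancer's `destination_port` settings, not every route.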