openova/clusters/_template/bootstrap-kit/01-cilium.yaml
hatiyildiz 66ea39f091 fix(infra): set envoyConfig.enabled=true so cilium-operator registers envoyconfig CRDs (Phase-8a bug #15)
Phase-8a-preflight live deployment 1bfc46347564467b confirmed cilium-agent
crash-loops forever waiting for envoyconfig CRDs that the operator never
registers:

  Still waiting for Cilium Operator to register the following CRDs:
  [crd:ciliumclusterwideenvoyconfigs.cilium.io
   crd:ciliumenvoyconfigs.cilium.io]

Root cause: upstream Cilium 1.16 chart has TWO separate envoy toggles:
- cilium.envoy.enabled — runs Envoy as a separate DaemonSet (was set)
- cilium.envoyConfig.enabled — registers CRDs + agent/operator controllers
  for CiliumEnvoyConfig (was NOT set)
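
A minimal sketch of how the two toggles sit side by side under the wrapper
chart's `cilium:` key. The nesting follows the key paths listed above; the
comments paraphrase that list and are illustrative, not copied from the chart:

```yaml
cilium:
  envoy:
    enabled: true        # runs Envoy as a separate DaemonSet
  envoyConfig:
    enabled: true        # registers the CiliumEnvoyConfig CRDs + controllers
```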

The chart values.yaml only sets envoy.enabled=true. Operator finishes CRD
registration with 11 of 13 CRDs, missing the two envoy ones, and
cilium-agent's node taint never lifts. All 37 dependent HelmReleases
block forever on the dependsOn chain.

Fix applied in the HelmRelease values (no chart rebuild needed); it lands
via Flux directly on the next Sovereign provision.
2026-05-01 21:38:33 +02:00


# bp-cilium — Catalyst bootstrap-kit Blueprint. CNI must come first; k3s started with --flannel-backend=none precisely so Cilium can take over.
#
# Wrapper chart: platform/cilium/chart/
# Catalyst-curated values: platform/cilium/chart/values.yaml
# Reconciled by: Flux on the new Sovereign's k3s control plane.
---
# kube-system is built into every Kubernetes cluster — never re-declare it.
# Earlier revisions of 01-cilium.yaml AND 05-sealed-secrets.yaml both
# declared it, which collided when kustomize tried to merge the two:
# "may not add resource with an already registered id:
# Namespace.v1.[noGrp]/kube-system.[noNs]"
# This Blueprint installs Cilium INTO kube-system; the HelmRelease's
# targetNamespace field below is sufficient.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-cilium
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-cilium
  namespace: flux-system
spec:
  interval: 15m
  releaseName: cilium
  targetNamespace: kube-system
  chart:
    spec:
      chart: bp-cilium
      version: 1.1.1
      sourceRef:
        kind: HelmRepository
        name: bp-cilium
        namespace: flux-system
  # Event-driven install: Helm completes when manifests apply, not when
  # cilium-agent reaches Ready (agent waits for envoyconfig CRDs that the
  # SAME chart installs — legitimate slow-Ready). Replaces blanket
  # spec.timeout: 15m band-aid from PR #221.
  install:
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    disableWait: true
    remediation:
      retries: 3
  values:
    cilium:
      # Phase-8a bug #15 (otech8 deployment 1bfc46347564467b 2026-05-01):
      # cilium-agent waits forever for the operator to register
      # ciliumenvoyconfigs + ciliumclusterwideenvoyconfigs CRDs.
      # Setting `envoy.enabled: true` (chart-level) runs Envoy as a separate
      # daemonset but does NOT register those CRDs — that requires
      # `envoyConfig.enabled: true`, a separate upstream chart toggle.
      # Without it, the agent's node taint `node.cilium.io/agent-not-ready`
      # never lifts and every other HelmRelease (37 of them) blocks on its
      # dependsOn chain.
      envoyConfig:
        enabled: true
      l7Proxy: true
      prometheus:
        enabled: false
        serviceMonitor:
          enabled: false
      hubble:
        metrics:
          enabled: null
          serviceMonitor:
            enabled: false
        relay:
          enabled: false
        ui:
          enabled: false
---
# ─── Per-Sovereign Gateway API resources (issue #387) ────────────────────
#
# Cilium owns the GatewayClass (`cilium`) installed by the chart above
# (gatewayAPI.enabled=true, envoy.enabled=true in platform/cilium/chart/
# values.yaml). The single per-Sovereign Gateway listening on
# *.${SOVEREIGN_FQDN}:443 lives here so it boots alongside the CNI
# without needing a new bootstrap-kit slot — every Sovereign HTTP
# blueprint (catalyst-platform, gitea, keycloak, harbor, grafana,
# openbao, powerdns) attaches its HTTPRoute to this Gateway via
# parentRefs.
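#
# A minimal sketch of such an attachment, for illustration only. The
# Gateway name and namespace, the route name, the hostname prefix, and
# the backend Service name and port are hypothetical placeholders, not
# values taken from this repository:
#
#   apiVersion: gateway.networking.k8s.io/v1
#   kind: HTTPRoute
#   metadata:
#     name: gitea                      # hypothetical route name
#     namespace: gitea
#   spec:
#     parentRefs:
#       - name: sovereign-gateway      # hypothetical Gateway name
#         namespace: kube-system       # hypothetical Gateway namespace
#     hostnames:
#       - gitea.${SOVEREIGN_FQDN}      # hypothetical hostname prefix
#     rules:
#       - backendRefs:
#           - name: gitea-http         # hypothetical Service name
#             port: 3000               # hypothetical Service port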
#
# TLS material: a wildcard Certificate is requested from
# letsencrypt-dns01-prod (cert-manager + bp-cert-manager-powerdns-webhook
# from #373). The resulting Secret `sovereign-wildcard-tls` is
# referenced by the Gateway listener.
#
# Cross-namespace HTTPRoute attachment: allowedRoutes.namespaces.from=All
# permits every blueprint namespace (catalyst-system, gitea, keycloak,
# harbor, grafana-system, openbao, powerdns-system) to bind without a
# ReferenceGrant. This matches the Catalyst single-tenant Sovereign
# model — cross-tenant isolation is enforced by per-tenant vClusters
# (bp-vcluster), not by Gateway-level RBAC.
#
# Per ADR-0001 §9.4 and docs/INVIOLABLE-PRINCIPLES.md #4: this resource
# only renders when ${SOVEREIGN_FQDN} is set by Flux envsubst at the
# Sovereign apply time — contabo's bootstrap path does NOT include this
# template, so Traefik continues to serve console.openova.io/nova
# unchanged.