openova/clusters/_template/bootstrap-kit/01a-gateway-api.yaml
e3mrah 73ae746637
fix(cloud-init): install Gateway API v1.1.0 CRDs before cilium so operator registers gateway controller (#581)
Root cause (otech22 2026-05-02): Cilium operator checks for Gateway API
CRDs at startup and disables its gateway controller if they are absent —
a static, one-shot decision. Cloud-init installs k3s+Cilium first, then
Flux reconciles bp-gateway-api minutes later, so the operator always
starts without CRDs and never recovers. All 8 HTTPRoutes orphaned.

Three-part permanent fix:

1. cloud-init: apply Gateway API v1.1.0 experimental CRDs (incl.
   TLSRoute) BEFORE the Cilium helm install. Cilium 1.16.x requires
   TLSRoute CRD to be present; without it the operator's capability
   check fails entirely and disables the gateway controller.

2. bp-cilium (1.1.2 → 1.1.3): add gatewayAPI.gatewayClass.create: "true"
   to force GatewayClass creation regardless of CRD presence at Helm
   render time. Upstream default "auto" skips GatewayClass when the
   gateway API CRDs are absent at install time (Capabilities check).

3. bp-gateway-api (1.0.0 → 1.1.0): downgrade CRDs from v1.2.0 to v1.1.0
   and ship experimental channel (TLSRoute, TCPRoute, UDPRoute,
   BackendLBPolicy, BackendTLSPolicy). Gateway API v1.2.0 changed
   status.supportedFeatures from string[] to object[]; Cilium 1.16.5
   writes the old string format and the v1.2.0 CRD rejects the status
   patch with "must be of type object: string", leaving GatewayClass
   permanently Unknown/Pending. v1.1.0 retains string schema.
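
The ordering in step 1 can be sketched as a cloud-init fragment. This is an illustration under assumptions, not the template's actual cloud-init: the kubeconfig path, the `kubectl wait` guard, and the exact helm invocation are placeholders. Only the CRD version and channel, the Cilium version, and the CRDs-before-Cilium order come from this commit.

```yaml
#cloud-config
# Hypothetical sketch: assumes k3s is already installed and running on the node.
runcmd:
  # 1. Gateway API v1.1.0 experimental-channel CRDs (includes TLSRoute) go in first.
  - k3s kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/experimental-install.yaml
  # Assumed extra guard: block until the apiserver reports the CRDs as Established.
  - k3s kubectl wait --for=condition=Established --timeout=60s crd/gatewayclasses.gateway.networking.k8s.io crd/tlsroutes.gateway.networking.k8s.io
  # 2. Only then install Cilium, so the operator finds the CRDs at startup.
  #    The flags shown are illustrative; the real install uses the project's own values.
  - KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm install cilium cilium --repo https://helm.cilium.io --version 1.16.5 --namespace kube-system --set gatewayAPI.enabled=true
```

Because the operator's CRD check is a one-shot decision at startup, the point of the sketch is the order of the entries rather than the specific commands.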

Upgrade path: bump bp-gateway-api + bp-cilium together when Cilium ≥ 1.17
adopts the v1.2.0 object schema for supportedFeatures.
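
For reference, the schema difference behind step 3 looks roughly like this on a GatewayClass status. The feature names are illustrative examples only; the point is the element type of `supportedFeatures`.

```yaml
# Shape accepted by the v1.1.0 CRD (and written by Cilium 1.16.5): list of strings.
status:
  supportedFeatures:
    - HTTPRoute
    - HTTPRouteHostRewrite
---
# Shape required by the v1.2.0 CRD: list of objects keyed by `name`.
status:
  supportedFeatures:
    - name: HTTPRoute
    - name: HTTPRouteHostRewrite
```

A controller that writes the first shape against the v1.2.0 CRD has its status patch rejected, which matches the "must be of type object: string" error quoted above.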

Closes #503

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 13:23:32 +04:00


# bp-gateway-api: Catalyst bootstrap-kit Blueprint, slot 01a (between
# bp-cilium and every chart that ships HTTPRoute templates). Installs the
# upstream Kubernetes Gateway API v1.1.0 CRDs. Chart 1.1.0 ships the
# experimental channel: the standard kinds (gatewayclasses, gateways,
# httproutes, grpcroutes, referencegrants) plus tlsroutes, tcproutes,
# udproutes, backendlbpolicies, and backendtlspolicies.
#
# Why this Blueprint exists (issue #503):
#
# Cilium 1.16's chart `gatewayAPI.enabled=true` flag (set in
# platform/cilium/chart/values.yaml) wires up the cilium gateway
# controller and creates the `cilium` GatewayClass — but it does NOT
# install the gateway.networking.k8s.io CRDs themselves. Without those
# CRDs registered on the apiserver, every chart that references
# HTTPRoute / Gateway / GatewayClass resources fails install with:
#
#     no matches for kind "HTTPRoute" in version "gateway.networking.k8s.io/v1"
#
# Phase-8a-preflight live deployment otech10 (e1a0cd6662872fcb,
# 2026-05-01) hit exactly this: bp-harbor, bp-openbao, bp-powerdns
# reconciled to InstallFailed with the message above; the fix is to
# install the upstream Gateway API CRDs ahead of any chart that uses
# them. Same pattern as bp-crossplane-claims and
# bp-external-secrets-stores — split CRD install from CR application
# so Flux dependsOn can order them.
#
# Wrapper chart: platform/gateway-api/chart/
# Reconciled by: Flux on the new Sovereign's k3s control plane.
#
# dependsOn: bp-cilium — Cilium owns the GatewayClass that the upstream
# Gateway resources reference; this Blueprint just installs the CRD
# schema. Sequencing CRDs after the CNI also ensures the apiserver has
# a working pod network when the CRD apply lands.
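#
# Illustrative sketch only: the Cilium values this Blueprint interacts
# with, as described in the commit that introduced chart 1.1.0. The key
# nesting inside platform/cilium/chart/values.yaml is an assumption; a
# wrapper chart may place these under a subchart key.
#
#   gatewayAPI:
#     enabled: true        # wires up the Cilium gateway controller
#     gatewayClass:
#       create: "true"     # force GatewayClass creation instead of "auto"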
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-gateway-api
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-gateway-api
  namespace: flux-system
spec:
  interval: 15m
  releaseName: gateway-api
  # CRDs are cluster-scoped; targetNamespace is just where the Helm
  # release marker Secret lives. Using flux-system keeps the marker
  # next to every other bootstrap-kit release.
  targetNamespace: flux-system
  dependsOn:
    - name: bp-cilium
  chart:
    spec:
      chart: bp-gateway-api
      version: 1.1.0
      sourceRef:
        kind: HelmRepository
        name: bp-gateway-api
        namespace: flux-system
  # Event-driven install: the CRDs apply in a single pass; nothing to wait
  # for beyond apiserver acceptance. Helm Ready is sufficient: every
  # downstream HelmRelease that needs the CRDs declares
  # `dependsOn: bp-gateway-api` so Flux gates them on this release's
  # Ready condition.
  install:
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    disableWait: true
    remediation:
      retries: 3
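#
# Illustrative sketch only (not part of this release): a downstream
# HelmRelease would gate on these CRDs roughly as below. `bp-example` is
# a hypothetical name; the field layout mirrors the HelmRelease above.
#
#   apiVersion: helm.toolkit.fluxcd.io/v2
#   kind: HelmRelease
#   metadata:
#     name: bp-example
#     namespace: flux-system
#   spec:
#     dependsOn:
#       - name: bp-gateway-api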