openova

Author	SHA1	Message	Date
e3mrah	b8d7a8b9cf	fix(bp-seaweedfs): disable global.enableSecurity to avoid fromToml on helm-controller v1.1.0 (#339) Upstream seaweedfs/seaweedfs templates/shared/security-configmap.yaml uses Helm template fromToml; helm-controller v1.1.0's bundled helm SDK (v3.x older than 3.13) doesn't define fromToml so the install fails: "parse error at security-configmap.yaml:21: function fromToml not defined". Setting global.seaweedfs.enableSecurity: false skips the entire template. Internal SeaweedFS API is cluster-IP only on Sovereign-1; chart-level security is acceptable to defer until helm-controller is bumped. Bumped 1.0.0 → 1.0.1. Unblocks the chain: bp-loki, bp-mimir, bp-tempo, bp-velero, bp-harbor, bp-grafana all dependsOn bp-seaweedfs. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 23:42:43 +04:00
e3mrah	9554be4a5e	fix(bp-external-secrets): gate ClusterSecretStore on CRD presence + drop delete-policy (#337) The chart's post-install hook was failing on otech.omani.works: "failed post-install: unable to build kubernetes object for deleting hook bp-external-secrets/templates/clustersecretstore-vault-region1.yaml: resource mapping not found for kind ClusterSecretStore in version external-secrets.io/v1beta1". Two corrections: 1. Capabilities-gate the entire template — don't render unless the ClusterSecretStore CRD is registered (it ships in via the upstream ESO subchart but isn't live on first install). 2. Remove 'before-hook-creation' delete-policy (was the actual trigger for the 'deleting hook' failure path). Bumped 1.0.0 → 1.0.1. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 23:31:24 +04:00
e3mrah	5502d9aa48	feat(dns): cert-manager-dynadot-webhook for DNS-01 wildcard TLS (closes #159 ) (#291 ) Activates the previously-templated `letsencrypt-dns01-prod` ClusterIssuer in bp-cert-manager by shipping the missing piece — a Go binary that satisfies cert-manager's external webhook contract (`webhook.acme.cert-manager.io/v1alpha1`) against the Dynadot api3.json. Architecture ============ * `core/pkg/dynadot-client/` — canonical Dynadot HTTP client (shared with pool-domain-manager and catalyst-dns). Encapsulates the api3.json transport, command builders, response decoding, and the safe read-modify-write semantics required to never accidentally wipe a zone (memory: feedback_dynadot_dns.md). Destructive `set_dns2` variant is unexported. * `core/cmd/cert-manager-dynadot-webhook/` — the cert-manager webhook binary. Implements `Solver.Present` via the client's append-only `AddRecord` path and `Solver.CleanUp` via the read-modify-write `RemoveSubRecord` path. Domain allowlist (`DYNADOT_MANAGED_DOMAINS`) rejects challenges for unmanaged apexes BEFORE any Dynadot call. * `platform/cert-manager-dynadot-webhook/` — Catalyst-authored Helm wrapper. Templates Deployment + Service + APIService + serving Certificate (CA chain via cert-manager Issuer self-signing) + RBAC + ServiceAccount. Mirrors the standard cert-manager external- webhook deployment shape. * `platform/cert-manager/chart/` — flips `dns01.enabled: true` so the paired ClusterIssuer activates. The interim http01 issuer remains templated as the rollback path. Test results ============ core/pkg/dynadot-client — 7 tests PASS (race-clean) core/cmd/cert-manager-dynadot-... 
— 9 tests PASS (race-clean) Test coverage includes a Present/CleanUp round-trip against an httptest fixture that models Dynadot's zone state, an explicit unmanaged-domain rejection, a regression preserving a pre-existing CNAME across the DNS-01 round-trip (the zone-wipe defence), and a typed-error propagation test that surfaces `ErrInvalidToken` to cert-manager so the controller will retry. Helm template smoke render ========================== `helm template` against the new chart with default values yields 12 resources / 424 lines (APIService, Certificate, ClusterRoleBinding, Deployment, Issuer, Role, RoleBinding, Service, ServiceAccount). The modified bp-cert-manager chart still renders both ClusterIssuers (`letsencrypt-dns01-prod` + `letsencrypt-http01-prod`) with default values; flipping `certManager.issuers.dns01.enabled=false` is the clean rollback. Smoke command (post-deploy) =========================== kubectl get apiservices.apiregistration.k8s.io \ v1alpha1.acme.dynadot.openova.io # Issue a *.<sovereign>.<pool> wildcard cert and watch the # Order/Challenge progress through cert-manager. CI == `.github/workflows/build-cert-manager-dynadot-webhook.yaml` mirrors the pool-domain-manager-build pattern (cosign keyless signing, SBOM attestation, GHCR push at `ghcr.io/openova-io/openova/cert-manager-dynadot-webhook:<sha>`). Triggered by changes to either the binary or the shared dynadot-client package. Closes #159 Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 19:37:47 +04:00
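The zone-wipe defence described in #291 can be pictured with a small in-memory model. This is a minimal Python sketch and not the Go client in `core/pkg/dynadot-client/`: the `Zone` class, the record tuples, and the function bodies are hypothetical stand-ins. Only the append-only add, the read-modify-write removal, and the managed-domain allowlist are taken from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Hypothetical in-memory stand-in for a provider-side DNS zone."""
    records: list = field(default_factory=list)  # (name, type, value) tuples


def is_managed(fqdn: str, managed_domains: set) -> bool:
    """Allowlist check: accept only names at or under a managed apex."""
    name = fqdn.rstrip(".")
    return any(name == d or name.endswith("." + d) for d in managed_domains)


def add_record(zone: Zone, record: tuple) -> None:
    """Append-only add: existing records are never rewritten."""
    if record not in zone.records:
        zone.records.append(record)


def remove_sub_record(zone: Zone, record: tuple) -> None:
    """Read-modify-write removal that leaves every other record in place."""
    current = list(zone.records)                      # read the full record set
    remaining = [r for r in current if r != record]   # drop only the target
    zone.records = remaining                          # write the rest back
```

In this model a pre-existing CNAME survives a Present/CleanUp round-trip because the removal writes back everything except the one challenge record.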
e3mrah	c09109a61a	feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265 ) (#290 ) Edge + serverless + model-serving batch (W2.5.C) — three upstream- subchart umbrella Blueprints completing the bootstrap-kit slots for WebRTC media relay (bp-relay → bp-stunner) and the AI/ML serving stack (bp-cortex → bp-kserve → bp-knative). Each chart follows the canonical umbrella pattern from docs/BLUEPRINT-AUTHORING.md §11.1: Chart.yaml declares the upstream chart under `dependencies:` so `helm dependency build` bundles the upstream payload into the OCI artifact, and Catalyst-curated overlay values + templates sit alongside in chart/values.yaml + chart/templates/. Per-chart highlights: - bp-stunner/1.0.0 — wraps stunner/stunner-gateway-operator 1.1.0. Ships a Cilium-native GatewayClass (Capabilities-gated on gateway.networking.k8s.io/v1) so bp-relay (LiveKit / SFU) can claim Gateway CRs without an operator-ordering dance. Default UDP TURN port range 30000-32767 matches the range opened at the Sovereign edge firewall (Crossplane bp-firewall composition). - bp-knative/1.0.0 — wraps knative-operator v1.21.1. Ships a KnativeServing CR pre-configured for istio-less mode (ingress.istio.enabled=false, ingress.contour.enabled=false, ingress.kourier.enabled=false; config.network.ingress-class=cilium). Sovereign FQDN sourced from values, no hardcoded fallback per inviolable principle #4 — render fails loudly if cluster overlay doesn't set knativeOverlay.knativeServing.sovereignFqdn. - bp-kserve/1.0.0 — wraps kserve/kserve v0.16.0 (latest version published on the official OCI registry as of 2026-04-30). Default deploymentMode=RawDeployment (no Knative hop on the hot path) but bp-knative is still installed (declared as a hard dep) so per-IS annotation `serving.kserve.io/deploymentMode: Serverless` opts in to scale-to-zero per tenant. Cilium native Gateway-API ingress (enableGatewayApi=true, className=cilium, disableIstioVirtualHost= true). 
Observability discipline (issue #182): every observability toggle (ServiceMonitor, HPA, GatewayClass) defaults false and is operator- tunable via per-cluster overlay once bp-kube-prometheus-stack reconciles. Each chart ships tests/observability-toggle.sh covering default-off, opt-in (with `--api-versions monitoring.coreos.com/v1` to simulate Prometheus Operator CRDs), and explicit-off cases. Per-chart kind summary (helm template default render): bp-stunner: ClusterRole, ClusterRoleBinding, ConfigMap, Dataplane, Deployment, Role, RoleBinding, Service, ServiceAccount. (+ GatewayClass when --api-versions gateway.networking.k8s.io/v1 is passed.) bp-knative: ClusterRole, ClusterRoleBinding, ConfigMap, CustomResourceDefinition, Deployment, KnativeServing, Role, RoleBinding, Secret, Service, ServiceAccount. bp-kserve: Certificate, ClusterRole, ClusterRoleBinding, ClusterServingRuntime, ClusterStorageContainer, ConfigMap, Deployment, Gateway, Issuer, MutatingWebhookConfiguration, Role, RoleBinding, Service, ServiceAccount, ValidatingWebhookConfiguration. `helm lint` clean for all three (single INFO on missing icon — icons land with marketplace card work). `bash tests/observability-toggle.sh` green for all three (3 cases each: default-off, opt-in, explicit-off). Closes #263 #264 #265 Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 19:37:38 +04:00
e3mrah	782d8015c5	feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274 ) (#289 ) W2.5.F — three Catalyst Blueprint umbrella charts at platform/{openmeter, livekit,matrix}/, each declaring its upstream chart under Chart.yaml `dependencies:` so `helm dependency build` bundles the upstream payload into the published OCI artifact (per docs/BLUEPRINT-AUTHORING.md §11.1 — hollow charts forbidden, CI-enforced by issue #181). Per-chart kind summary ====================== bp-openmeter (closes #272) default `helm template` kinds: ConfigMap, Deployment, Service, ServiceAccount upstream chart: openmeter 1.0.0-beta.213 (oci://ghcr.io/openmeterio/helm-charts) ClickHouse-less profile per docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §6.4. The upstream chart's bundled clickhouse / kafka / postgresql / redis / svix subcharts are all DISABLED — Catalyst supplies CNPG (postgres), JetStream (event bus), and Valkey (redis-compat) at the platform tier. Chart-level toggle `catalystBlueprint.backend.kind` (default `cnpg`, alt `clickhouse`) records the active profile so observability/audit pipelines can report it. The OpenMeter binary's `aggregation.clickhouse.address` is left blank — per-Sovereign overlay supplies it once a host cluster adds bp-clickhouse and the operator re-rolls with `backend.kind: clickhouse`. Catalyst overlay templates (NetworkPolicy / ServiceMonitor / HPA) all default OFF per docs/BLUEPRINT-AUTHORING.md §11.2. bp-livekit (closes #273) default `helm template` kinds: ConfigMap, Deployment, Service, ServiceAccount upstream chart: livekit-server 1.9.0 (https://helm.livekit.io) WebRTC SFU. Powers the Huawei iFlytek voice demo. Catalyst defaults pair LiveKit with bp-stunner (the upstream chart's bundled co-located TURN server is OFF; per-Sovereign overlay points the LiveKit TURN config at the stunner UDP-gateway Service). RTC UDP port range is 50000-60000 (matches the Hetzner firewall rule the per-Sovereign overlay opens). 
Catalyst overlay templates (NetworkPolicy / ServiceMonitor / HPA) all default OFF; the chart's NetworkPolicy template documents that LiveKit's hostNetwork mode means pod-level policies do NOT cover the SFU port range — the firewall rule is the load-bearing control. blueprint.yaml `depends:` declares bp-stunner + bp-cert-manager + bp-valkey. bp-matrix (closes #274) default `helm template` kinds: ConfigMap, Deployment, Ingress, Job, PersistentVolumeClaim, Pod, Role, RoleBinding, Secret, Service, ServiceAccount upstream chart: matrix-synapse 3.12.25 (https://ananace.gitlab.io/charts) Synapse (the Matrix server implementation, NOT the retired OpenOva product noun). Federation OFF by default (Catalyst per-Sovereign tenancy default — operator overlays flip it on per-Organization). Postgres backend via bp-cnpg externalPostgresql; OIDC SSO via bp-keycloak; bundled bitnami postgresql + redis subcharts both disabled. Catalyst overlay NetworkPolicy gates the federation port (8448) on `federation.enabled` — verified by Case 5 of the observability-toggle test. Catalyst-overlay ServiceMonitor (upstream chart has none) + HPA both default OFF. Lint ==== All three charts pass `helm lint` clean (only the noisy "icon is recommended" INFO message). Observability tests =================== Each chart's `tests/observability-toggle.sh` enforces the Catalyst contract from docs/BLUEPRINT-AUTHORING.md §11.2: Case 1: default render produces zero monitoring.coreos.com/v1 resources (no ServiceMonitor / PrometheusRule). Case 2: opt-in (--set serviceMonitor.enabled=true --api-versions monitoring.coreos.com/v1) renders a ServiceMonitor. Case 3: explicit-off render is clean. Case 4 (per chart): - openmeter: ClickHouse-less profile asserts no clickhouse.altinity.com / Kafka subchart resources leak into the default render. - livekit: asserts upstream livekit-server.serviceMonitor.create defaults false. 
- matrix: asserts default render carries an empty federation_domain_whitelist (the per-Sovereign tenancy default). Case 5 (matrix only): `--set federation.enabled=true networkPolicy.enabled=true` opens port 8448 in the Catalyst NetworkPolicy. All gates green for all three charts. Closes #272 #273 #274 Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 19:37:28 +04:00
e3mrah	87d9a4afa7	feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271 ) (#288 ) W2.5.E batch — three Application-tier Blueprints completing the LLM serving / workflow stack: - bp-temporal/1.0.0 — wraps temporal/temporal 1.2.0 (the new chart rewrite that removed cassandra:/mysql:/postgresql:/elasticsearch:/ prometheus:/grafana: top-level keys in favour of server.config.persistence.datastores). Postgres-only via CNPG-backed visibility store (skip Cassandra). Web UI ON. Keycloak OIDC integration via --auth-claim-mapper renders auth.yaml ConfigMap (operator wires via additionalVolumes once bp-keycloak is reconciled, default OFF). dependsOn: bp-cnpg + bp-cert-manager. Closes #271. Kinds: Cluster (CNPG) + ConfigMap + Deployment + Job + Pod + Service. - bp-llm-gateway/1.0.0 — wraps berriai/litellm-helm 0.1.572 from OCI. Subscription-aware proxy for Claude Code: routes to Anthropic (via operator OAuth/Max subscription — NEVER an ANTHROPIC_API_KEY, per memory/feedback_no_api_key.md), Bedrock, Vertex, OpenAI-compatible (via bp-anthropic-adapter), and self-hosted vLLM. CNPG-backed audit log (every prompt + response persisted for compliance). Bundled bitnami postgresql + redis subcharts DISABLED (db.useExisting=true points at the CNPG cluster). Keycloak SSO via auth.yaml ConfigMap (default OFF). ExternalSecret-backed environmentSecrets brings tokens / IAM creds in without inlining plaintext. dependsOn: bp-cnpg + bp-keycloak + bp-external-secrets. Closes #267. Kinds: Cluster (CNPG audit) + ConfigMap + Deployment + Job + Pod + Secret + Service + ServiceAccount. - bp-anthropic-adapter/1.0.0 — Catalyst-authored scratch chart for the OpenAI ↔ Anthropic translation Go service. SHA-pinned image ghcr.io/openova-io/openova/anthropic-adapter:<sha> (Inviolable Principle #4a — GitHub Actions is the only build path; empty default tag fails the render with a clear error instead of silently shipping :latest). 
OAuth/Max subscription token mounted from K8s Secret materialized by ESO from bp-openbao — ANTHROPIC_OAUTH_TOKEN env var, NEVER an ANTHROPIC_API_KEY. Includes OpenAI → Anthropic model-mapping ConfigMap (gpt-4 → claude-3-5-sonnet, gpt-4o-mini → claude-3-5-haiku, etc.). sigstore/common library subchart included to satisfy the hollow-chart gate (matches bp-vllm pattern from #283). dependsOn: bp-external-secrets. Closes #268. Kinds: ConfigMap + Deployment + Service + ServiceAccount. CRITICAL — bp-llm-gateway and bp-anthropic-adapter both consume the operator's Claude OAuth/Max subscription. Per memory/ feedback_no_api_key.md and the user's standing instruction, neither chart accepts or generates an ANTHROPIC_API_KEY. Tokens flow exclusively through ExternalSecret-managed K8s Secrets that ESO materializes from bp-openbao at install time. Per docs/BLUEPRINT-AUTHORING.md §11.2 (issue #182): every observability toggle defaults `false` (ServiceMonitor / metrics sidecar / PodMonitor) and is operator-tunable via per-cluster overlay once bp-kube-prometheus-stack reconciles. Each chart ships tests/observability-toggle.sh covering default-off, opt-in (with --api-versions monitoring.coreos.com/v1 to simulate the CRDs), and explicit-off cases. bp-anthropic-adapter additionally tests the never-:latest gate via Case 4 (empty image tag must fail render). Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every upstream version, namespace, server URL, role, secret name, model default, and toggle is exposed under values.yaml. Cluster overlays in clusters/<sovereign>/ may override without rebuilding the Blueprint OCI artifact. Per docs/BLUEPRINT-AUTHORING.md §11.1 (umbrella shape — hard contract): bp-temporal and bp-llm-gateway declare their upstream charts under Chart.yaml dependencies: so helm dependency build bundles the upstream payload into the OCI artifact. 
bp-anthropic-adapter is a scratch chart (no upstream Helm chart exists) and includes sigstore/common as the obligatory hollow-chart-gate dependency, matching the bp-vllm precedent from W2.5.D (#283). Closes #267 Closes #268 Closes #271 helm lint: 1 chart(s) linted, 0 chart(s) failed (each, INFO icon-recommended only) Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 19:37:19 +04:00
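The never-:latest gate mentioned for bp-anthropic-adapter in #288 (an empty image tag must fail the render) follows a simple rule. The Python sketch below illustrates that rule only; the real gate lives in the chart's Helm templates. The function name is hypothetical, and rejecting an explicit `latest` tag is an added assumption that the commit message does not state.

```python
def image_reference(repository: str, tag: str) -> str:
    """Build an image reference, failing loudly instead of defaulting to :latest."""
    tag = tag.strip()
    if not tag:
        # Mirrors the commit's rule: an empty default tag fails the render.
        raise ValueError("image tag is required; refusing to fall back to :latest")
    if tag == "latest":
        # Assumption for illustration: a floating tag is rejected as well.
        raise ValueError("image tag 'latest' is not allowed; pin a commit SHA")
    return f"{repository}:{tag}"
```

Failing early keeps an unpinned image from ever reaching a cluster; the caller has to supply a concrete tag.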
e3mrah	a6bf07b0ce	feat(charts): bp-librechat wrapper chart (closes #275 ) (#287 ) W2.5.G — Catalyst-authored scratch chart for LibreChat (slot 48 of the omantel-1 bootstrap-kit). LibreChat upstream does not publish a Helm chart, so this chart hand-wires the official ghcr.io/danny-avila/librechat container as Deployment + Service + Ingress + ConfigMap + ServiceAccount + NetworkPolicy + ServiceMonitor + HPA, with the sigstore/common library subchart declared to satisfy the hollow-chart gate (issue #181). Per docs/BLUEPRINT-AUTHORING.md §11.2: every observability toggle (serviceMonitor, hpa) defaults false; opt-in via per-cluster overlay once kube-prometheus-stack reconciles. The ServiceMonitor template is double-gated by .Values.serviceMonitor.enabled AND Capabilities.APIVersions.Has "monitoring.coreos.com/v1" so flipping the toggle on a too-early Sovereign cannot break the bp-librechat reconcile. Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every endpoint URL, model name, secret reference, namespace selector, and image tag is operator-tunable via values.yaml. The Sovereign FQDN, Keycloak issuer, llm-gateway URL, embeddings URL, and TLS ClusterIssuer are all operator-supplied at install time. The image tag is pinned to v0.7.5 (no :latest). Connectors: - Chat completions: bp-llm-gateway (OpenAI-compatible /v1/chat/completions) exposed as a "custom" endpoint named "Catalyst LLM" - Embeddings (RAG): bp-bge — provider=bge maps to EMBEDDINGS_PROVIDER=openai + RAG_OPENAI_BASEURL=<bge.svc> at template-render time - SSO: bp-keycloak (OpenID Connect) — issuer/clientId from values, client secret + session secret from ExternalSecret - Conversation store: FerretDB on bp-cnpg (MongoDB wire protocol over Postgres) — operator-supplied connection URI Hosted at chat-app.<sovereign-fqdn>; the chart `fail`s render if ingress.host is empty (no platform-wide default). 
helm template (default values, --set ingress.host=...): ConfigMap, Deployment, Ingress, NetworkPolicy, Service, ServiceAccount helm template (--set hpa.enabled=true serviceMonitor.enabled=true --api-versions monitoring.coreos.com/v1): ConfigMap, Deployment, HorizontalPodAutoscaler, Ingress, NetworkPolicy, Service, ServiceAccount, ServiceMonitor helm lint: 1 chart(s) linted, 0 chart(s) failed (single INFO on missing icon — icons land with the marketplace card work). tests/observability-toggle.sh: PASS on default-off, opt-in (--api-versions monitoring.coreos.com/v1 to simulate the CRDs), and explicit-off cases. Path isolation: only platform/librechat/ — no HR slot files, blueprint-release.yaml, or other charts touched. The HR slot files (clusters/.../48-librechat.yaml) and blueprint-release.yaml will land in a separate slot-wiring PR per the W2.K4 expansion plan. Closes #275 Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:56:59 +04:00
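The double gate on the ServiceMonitor template described in #287 (values toggle AND CRD presence) reduces to a two-condition check. This Python sketch models that decision under stated assumptions: `values` stands in for the chart's values and `api_versions` for the set Helm exposes through `Capabilities.APIVersions`. The function name is hypothetical.

```python
MONITORING_API = "monitoring.coreos.com/v1"


def should_render_service_monitor(values: dict, api_versions: set) -> bool:
    """Render only when the toggle is on AND the Prometheus Operator API exists."""
    enabled = bool(values.get("serviceMonitor", {}).get("enabled", False))
    has_crd = MONITORING_API in api_versions
    return enabled and has_crd
```

The three cases exercised by tests/observability-toggle.sh (default-off, opt-in with `--api-versions monitoring.coreos.com/v1`, explicit-off) map onto the inputs of this function, and a too-early Sovereign without the CRDs gets no ServiceMonitor even when the toggle is on.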
e3mrah	9dc8506dd9	feat(charts): bp-external-secrets + bp-cnpg + bp-valkey wrapper charts (#285 ) Storage-substrate batch (W2.5.A) — closes #254 by shipping the three upstream-subchart umbrella Blueprints that the Flux HRs at clusters/_template/bootstrap-kit/{15-external-secrets,16-cnpg,17-valkey} .yaml (merged via PR #262) target. Each chart follows the canonical umbrella pattern documented in docs/BLUEPRINT-AUTHORING.md §11.1: Chart.yaml declares the upstream chart under `dependencies:` so `helm dependency build` bundles the upstream payload into the OCI artifact, and Catalyst-curated overlay values + templates sit alongside in chart/values.yaml + chart/templates/. Per-chart highlights: - bp-external-secrets/1.0.0 — wraps external-secrets/external-secrets 0.10.7. Ships a default `vault-region1` ClusterSecretStore (via Helm post-install/post-upgrade hook to defer the CR application until the upstream chart's CRDs are registered) wired to the in-cluster bp-openbao service. clusterSecretStore.enabled toggle lets cluster overlays opt out and author their own multi-region CRs. - bp-cnpg/1.0.0 — wraps cnpg/cloudnative-pg 0.28.0. Operator-only surface (Cluster CRs are per-Application). CRDs ship in-chart so bp-powerdns / bp-keycloak / bp-gitea / bp-langfuse / bp-grafana / bp-temporal / bp-matrix / bp-llm-gateway / bp-bge / bp-nemo-guardrails / bp-openmeter / pool-domain-manager can `dependsOn: bp-cnpg` via Flux — closing #254 (bp-powerdns CreateContainerConfigError on pdns-pg-app secret). - bp-valkey/1.0.0 — wraps bitnami/valkey 5.5.1. BSD-3 Redis-compatible cache, replication architecture, password auth ON, NetworkPolicy ON, replicas 0 by default for solo Sovereigns (cluster overlays bump for HA). Application-tier cache only — Catalyst control plane uses NATS JetStream KV (per ARCHITECTURE.md §5). 
Per docs/BLUEPRINT-AUTHORING.md §11.2 (issue #182): every observability toggle defaults `false` (ServiceMonitor / PodMonitor / PrometheusRule / metrics sidecar) and is operator-tunable via per-cluster overlay once bp-kube-prometheus-stack reconciles. Each chart ships tests/observability-toggle.sh covering default-off, opt-in (--api-versions monitoring.coreos.com/v1 to simulate the CRDs), and explicit-off cases. Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every upstream version, namespace, server URL, role, and password toggle is exposed under values.yaml. Cluster overlays in clusters/<sovereign>/ may override without rebuilding the Blueprint OCI artifact. helm lint: 1 chart(s) linted, 0 chart(s) failed (each, INFO icon-recommended only) helm template default render kinds: bp-external-secrets: ClusterRole, ClusterRoleBinding, ClusterSecretStore, CustomResourceDefinition, Deployment, Role, RoleBinding, Secret, Service, ServiceAccount, ValidatingWebhookConfiguration bp-cnpg: ClusterRole, ClusterRoleBinding, ConfigMap, CustomResourceDefinition, Deployment, MutatingWebhookConfiguration, Service, ServiceAccount, ValidatingWebhookConfiguration bp-valkey: ConfigMap, NetworkPolicy, PodDisruptionBudget, Secret, Service, ServiceAccount, StatefulSet Closes #254 Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 18:39:29 +04:00
e3mrah	ba2ff05292	feat(charts): bp-seaweedfs + bp-harbor + bp-vpa wrapper charts (#284 ) W2.5.B — first authoring of the three Catalyst Blueprint wrapper charts that fill bootstrap-kit slots 18 (seaweedfs), 19 (harbor) and 29 (vpa). Each wraps an upstream chart as a Helm subchart and ships Catalyst- curated overlay templates (NetworkPolicy + ServiceMonitor) gated behind opt-in toggles, per docs/BLUEPRINT-AUTHORING.md §11 and docs/INVIOLABLE-PRINCIPLES.md. bp-seaweedfs (slot 18 — storage foundation) - Wraps seaweedfs/seaweedfs 4.22.0; Chart name `bp-seaweedfs`. - Catalyst defaults: 1 master + 3 volume + 1 filer + 2 s3 replicas. - S3 API on 8333 — single S3 surface every consumer talks to per docs/PLATFORM-TECH-STACK.md §3.5 (no per-app MinIO). - Overlay templates: NetworkPolicy (cross-namespace S3 reachability, cold-tier egress allowlist), ServiceMonitor (Capabilities-gated, DEFAULT FALSE per §11.2). - Default helm template kinds: ClusterRole, ClusterRoleBinding, ConfigMap, Deployment, Secret, Service, ServiceAccount, StatefulSet. bp-harbor (slot 19 — per-Sovereign OCI registry) - Wraps goharbor/harbor 1.18.3 (appVersion 2.14.3); Chart name `bp-harbor`. - Catalyst defaults: blob backend = SeaweedFS S3 (regionendpoint seaweedfs-s3.seaweedfs.svc:8333), metadata DB = bp-cnpg external Postgres, ingress class `cilium`, expose.tls.enabled true (cert- manager-issued Secret). - Overlay templates: NetworkPolicy (CNPG/SeaweedFS/Keycloak egress), ServiceMonitor (Capabilities-gated, DEFAULT FALSE). - Trivy + SSO + pull-mirror are operator-flag opt-ins per per- Sovereign overlay (default false; trivy/keycloak/cnpg deps land on later slots). - Default helm template kinds: ConfigMap, Deployment, Ingress, PersistentVolumeClaim, Secret, Service, StatefulSet. bp-vpa (slot 29 — vertical autoscaling) - Wraps cowboysysop/vertical-pod-autoscaler 11.1.1 (appVersion 1.5.0); Chart name `bp-vpa`. - Catalyst defaults: 1 replica each of recommender + updater + admission-controller. 
Default mode `Off` (recommend only). - Admission webhook self-signs via init Job (cluster-internal); per-Sovereign overlay MAY swap to cert-manager. - Overlay templates: NetworkPolicy (apiserver + metrics-server egress, admission webhook ingress). - Upstream metrics.serviceMonitor / metrics.prometheusRule defaulted false per §11.2. - Default helm template kinds: ClusterRole, ClusterRoleBinding, ConfigMap, Deployment, Job, Pod, Secret, Service, ServiceAccount. Lint + observability-toggle results helm lint: 1 chart(s) linted, 0 chart(s) failed (each) tests/observability-toggle.sh: PASS on all three (default render has zero monitoring.coreos.com/v1 references; opt-in render produces a ServiceMonitor; explicit-off render is clean). Path isolation: only platform/seaweedfs/, platform/harbor/, and platform/vpa/ — no HR slot files or other charts touched. Refs: bootstrap-kit slots 18, 19, 29 reconcile against ghcr.io/openova-io/bp-seaweedfs:1.0.0, bp-harbor:1.0.0, bp-vpa:1.0.0 which this commit produces on next blueprint-release CI run. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 18:37:50 +04:00
e3mrah	c3c9c0cf27	feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283 ) Catalyst-authored umbrella charts for the W2.5.D AI-inference stack. None of the three upstream projects publish a Helm chart, so each chart hand-wires the upstream container as Deployment + Service + ConfigMap + ServiceMonitor + NetworkPolicy + HPA, with the sigstore/common library subchart declared to satisfy the hollow-chart gate (issue #181). bp-vllm (slot 39) — wraps vllm/vllm-openai:v0.6.4. GPU-aware (nvidia.com/gpu when vllm.gpu.enabled=true; CPU fallback for dev). Default model meta-llama/Llama-3.1-8B-Instruct, port 8000, OpenAI-compatible /v1/chat/completions. All engine knobs (maxModelLen, gpuMemoryUtilization, dtype, quantization, tensorParallelSize, prefix-caching) overlay-tunable. Closes #266. bp-bge (slot 42) — wraps ghcr.io/huggingface/text-embeddings-inference:cpu-1.5. Default model BAAI/bge-small-en-v1.5 + BAAI/bge-reranker-base sidecar in same Pod. Two-port Service (8080 embed, 8081 rerank) annotated for bp-llm-gateway discovery. CPU-friendly defaults; overlay swaps in BAAI/bge-m3 on GPU Sovereigns. Closes #269. bp-nemo-guardrails (slot 43) — wraps the upstream NVIDIA/NeMo-Guardrails Dockerfile (nemoguardrails server, FastAPI, port 8000). LLM endpoint + model + engine all overlay-tunable; Colang flow bundle mounts via configMap.externalName for production rails. ConfigMap stub renders a default rail for smoke testing. Closes #270. 
All three charts: - Default observability toggles to false per BLUEPRINT-AUTHORING.md §11.2 - Pin upstream image tags (no :latest) per INVIOLABLE-PRINCIPLES.md #4 - Non-root securityContext (runAsUser 1000, drop ALL capabilities) - prometheus.io scrape annotations on the Pod for fallback discovery - Operator-tunable NetworkPolicy gating ingress to bp-llm-gateway and egress to HuggingFace / bp-vllm / bp-bge as appropriate helm template (default values) per chart: bp-vllm: ConfigMap, Deployment, Service, ServiceAccount bp-bge: ConfigMap, Deployment, Service, ServiceAccount bp-nemo-guardrails: ConfigMap, Deployment, Service, ServiceAccount helm template (--set serviceMonitor.enabled=true networkPolicy.enabled=true hpa.enabled=true): All three render ConfigMap + Deployment + Service + ServiceAccount + ServiceMonitor + NetworkPolicy + HorizontalPodAutoscaler. helm lint: 0 chart(s) failed for all three (single INFO on missing icon — icons land with the marketplace card work). Closes #266 Closes #269 Closes #270 Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:37:07 +04:00
e3mrah	0cfd0defa9	fix(bp-langfuse): drop apostrophe from description to clear GHCR 500 (resolves #215 ) (#278 ) Root cause: Helm's `helm push` collapses the chart `description` field into a single-line OCI manifest annotation `org.opencontainers.image.description`. The GHCR manifest-PUT validator returns a deterministic 500 Internal Server Error when that annotation is long AND contains an ASCII apostrophe. bp-langfuse 1.0.0 was the only chart in the observability batch (PR #214) carrying both characteristics, so it was the only one that failed to publish. Fix: reword the affected sentence from "Langfuse's persistent state" to "the Langfuse persistent state" — drops the apostrophe, preserves the meaning, and crucially preserves every byte of the actual chart payload (values, templates, all 350 entries of the upstream langfuse-1.5.28 subchart with its 4-level-deep Bitnami vendoring). No runtime behavioural change; helm template renders the exact same 6 resources across 490 lines. The narrowing was done by progressively reducing the Chart.yaml from the failing version to a passing version while pushing to a scratch GHCR namespace, with the bp-langfuse repo deleted between attempts (verified via `DELETE /orgs/openova-io/packages/container/bp-langfuse` and re-querying). The trigger is reproducible: long description + apostrophe → 500; long description without apostrophe → push succeeds; short description with apostrophe → push succeeds. Added a multi-line WARNING comment immediately above `description:` documenting the trigger so future authors do not reintroduce a possessive form. Issue #215 captures the full reproduction. Closes #215 Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 17:31:51 +04:00
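The trigger reported in #278 (long description plus ASCII apostrophe) could be caught before a push with a small pre-publish check. This Python sketch is one possible shape for such a check. The commit message does not say how long "long" is, so the `max_len` default of 200 is a placeholder assumption, and the function name is hypothetical.

```python
def risky_description(description: str, max_len: int = 200) -> bool:
    """Flag a chart description that is both long and contains an ASCII apostrophe.

    max_len is a placeholder threshold; the source does not state the real cutoff.
    """
    # Collapse whitespace into a single line, as the commit describes for the annotation.
    flattened = " ".join(description.split())
    return len(flattened) > max_len and "'" in flattened
```

Either condition alone passes, matching the reproduction in the commit: a short description with an apostrophe and a long description without one both pushed successfully.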
e3mrah	ec3821f7e1	fix(bp-): event-driven HR install -- drop blanket timeout, use disableWait (#250 ) Helm install completes when manifests apply, not when pods reach Ready. Flux dependsOn checks Ready=True on each HR independently, so spec.install.disableWait + spec.upgrade.disableWait is the correct shape for slow-Ready workloads. Blanket spec.timeout: Nm watchdogs from PR #221 were a band-aid that caused cascading HR failures and blocked downstream HRs (bp-nats-jetstream, bp-openbao depended on bp-spire). Founder direction (verbatim): "always event driven robust jobs" Per-HR audit (drop spec.timeout: 15m, add disableWait, with reason): - bp-cilium: envoyconfig CRD self-wait — agent crash-loops until its own CRDs land - bp-cert-manager: webhook readiness depends on cainjector mutating Secret — multi-minute on cold start - bp-flux: adopts cloud-init Flux objects; the helm-controller reconciling THIS HR is itself a chart target — Ready deadlock without disableWait - bp-sealed-secrets: single-replica controller + CRD — install completes on manifest apply - bp-spire: spire-controller-manager waits for CRD informer cache sync — multi-minute legitimate path; chart fix below - bp-nats-jetstream: JetStream raft quorum formation across N replicas - bp-openbao: 3-node Raft sealed-by-default; Ready=True only after operator runs `bao operator init` unseal flow - bp-keycloak: DB schema migration + 100+ Liquibase changesets on first install - bp-gitea: PostgreSQL DB init + admin user + Blueprint catalog mirror seeding - bp-external-dns: pod readiness depends on PowerDNS API + pdns-pg CNPG cascade - bp-catalyst-platform: ~10 services, inter-service NATS/OTel readiness is not Helm's concern Intentionally NOT touched (other parallel agents own these): - bp-crossplane (Agent A): chart split for intra-chart CRD-ordering - bp-powerdns (Agent D): post-install hook for intra-chart Job-ordering bp-spire chart fix (1.1.3 -> 1.1.4): Root cause investigation on otech.omani.works (live): 
spire-controller-manager has restarted 37 times with: "failed to wait for clusterstaticentry caches to sync: timed out waiting for cache to be synced for Kind v1alpha1.ClusterStaticEntry" `kubectl get crd | grep spire` returns nothing — the spire.spiffe.io v1alpha1 CRDs (ClusterSPIFFEID / ClusterStaticEntry / ClusterFederatedTrustDomain) are NOT registered. The upstream `spire` chart does not install its own CRDs; the spiffe maintainers ship them via the SEPARATE `spire-crds` chart, expected to be installed first. Fix: platform/spire/chart/Chart.yaml now declares spire-crds 0.5.0 as the FIRST dependency. Helm installs subcharts in dependency order, so listing spire-crds first guarantees CRDs are applied before the spire subchart's controller-manager Deployment starts. blueprint.yaml + both 06-spire.yaml cluster references bumped to 1.1.4. Live error this fixes (otech.omani.works, persistent ~5h): Helm upgrade failed for release spire-system/spire with chart bp-spire@1.1.3: context deadline exceeded + downstream cascade: bp-nats-jetstream / bp-openbao stuck at "dependency 'flux-system/bp-spire' is not ready" Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:55:19 +04:00
e3mrah	726af6df81	fix(bp-powerdns): self-generate api-credentials Secret + disable upstream zone-bootstrap Job (#248 ) Root cause investigation on otech.omani.works (kubectl, sanitized): $ kubectl get pods -n powerdns create-zone-if-not-exist-sh-tjtr4 0/1 CreateContainerConfigError 4h powerdns-57d7d49f99-{9hrb4,lxlgt,nkmht} 0/1 CreateContainerConfigError 4h dnsdist-594dbfc5f-wznsw 1/1 Running 4h $ kubectl get secrets -n powerdns powerdns Opaque 1 4h powerdns-api-tls-8kxpx Opaque 1 4h (NO `powerdns-api-credentials`, NO `pdns-pg-app`) $ kubectl describe pod ... powerdns-57d7d49f99-9hrb4 Environment: PDNS_API_KEY: <set to the key 'api-key' in secret 'powerdns-api-credentials'> Optional: false PDNS_DB_HOST: <set to the key 'host' in secret 'pdns-pg-app'> Optional: false State: Waiting Reason: CreateContainerConfigError The handover's chicken-egg-with-secret theory was directionally right but the cause was more fundamental: 1. Wrapper chart's api-credentials-secret.yaml (1.1.2) was a no-op unless operator set `apiKey` value out-of-band — comment said the deployment would "fail to start until the named Secret exists" as "the explicit signal we want". On a Sovereign that bootstraps from bp-* OCI artifacts, no operator is standing by, so the Secret is never created and pods sit in CreateContainerConfigError forever. 2. The upstream chart's `create-zone-if-not-exists-sh` Job is rendered whenever both `zoneName` and `api.key` are set — defaulting `zoneName: "example.de."` it ALWAYS rendered and ALWAYS failed (same missing Secret). Catalyst doesn't want this Job at all because zones are loaded later by pool-domain-manager (PDM). 3. The chart's CNPG Cluster template is gated behind Capabilities.APIVersions.Has "postgresql.cnpg.io/v1" — on a fresh Sovereign without bp-cnpg yet (bp-cnpg is on the roadmap, not in bootstrap-kit), no Cluster is rendered and `pdns-pg-app` Secret never materialises. 
With Helm `--wait`, install times out ("context deadline exceeded") even though the manifests applied cleanly. Fix: * api-credentials-secret.yaml: self-generate via Helm `lookup` + `randAlphaNum 32`. First install creates fresh randoms; every subsequent reconcile reads back the existing values from the Secret so the API key never rotates on upgrade. Operator can still pin specific values via .Values.powerdns.apiKey / .Values.powerdns.webserverPassword, or skip Secret creation entirely via .Values.powerdns.useExistingApiSecret. Same pattern as bitnami/postgresql, bitnami/keycloak. * values.yaml: set `powerdns.zoneName: ""` so upstream chart's `{{- if and .Values.powerdns.zoneName .Values.powerdns.api.key }}` gate skips the create-zone Job entirely. Catalyst's PDM creates zones via the REST API after the cluster comes up; we don't want a placeholder `example.de.` zone in production. * HelmRelease (both _template and otech.omani.works overlays): `install.disableWait: true` + `upgrade.disableWait: true` so the HelmRelease reports Ready as soon as manifests apply cleanly, rather than gating on powerdns Deployment readiness which depends on bp-cnpg landing first to synthesise `pdns-pg-app`. Runtime convergence is observed via kubectl, not gated on Helm. Live error this addresses: Helm upgrade failed for release powerdns/powerdns with chart bp-powerdns@1.1.2: context deadline exceeded Verified locally with `helm template`: - powerdns-api-credentials Secret renders with random api-key + webserver-password - create-zone-if-not-exist-sh Job no longer rendered - Deployment env continues to reference powerdns-api-credentials correctly Bumped 1.1.2 -> 1.1.3 (chart, blueprint, both bootstrap-kit overlays). Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:55:12 +04:00
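The self-generating Secret in 726af6df81 follows a "read back if present, otherwise generate" pattern. A minimal Python analogue of that decision is sketched below; it is not the chart's template. The function name `resolve_api_key` is hypothetical, and the precedence of a pinned value over an existing Secret is an assumption, since the commit does not spell out which wins.

```python
import secrets
import string

ALPHANUM = string.ascii_letters + string.digits


def resolve_api_key(existing_secret, pinned_value=None, length=32):
    """Pick the API key for a reconcile.

    existing_secret: dict of the Secret's decoded data, or None on first install
                     (stands in for what Helm `lookup` would return).
    pinned_value:    operator-supplied value (assumed to take precedence).
    """
    if pinned_value:
        return pinned_value
    if existing_secret and existing_secret.get("api-key"):
        # Reuse the stored key so it never rotates on upgrade.
        return existing_secret["api-key"]
    # First install: generate a fresh random alphanumeric key.
    return "".join(secrets.choice(ALPHANUM) for _ in range(length))
```

Calling it once with `None` and then again with the stored result returns the same key, which is the property the commit relies on to keep the API key stable across upgrades.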
e3mrah	2d1799d738	fix(bp-crossplane): split XRDs+Compositions into bp-crossplane-claims (#247) Resolves install ordering on fresh clusters where the apiserver rejects CompositeResourceDefinition CRs because the apiextensions.crossplane.io CRDs registered by the crossplane subchart aren't live yet at apply time. - bp-crossplane bumped 1.1.2 -> 1.1.3 (controller-only payload) - NEW bp-crossplane-claims@1.0.0 carries XRDs + Compositions - Flux HelmRelease for crossplane-claims uses dependsOn: [bp-crossplane] - composition-validate.sh + fixtures relocate to the new chart - blueprint-release CI: opt-out annotation catalyst.openova.io/no-upstream=true permits zero-deps charts that legitimately ship only Catalyst-authored CRs (the original hollow-chart rule remains in force for every other umbrella chart) Live error this fixes (from otech.omani.works): no matches for kind "CompositeResourceDefinition" in version "apiextensions.crossplane.io/v1" -- ensure CRDs are installed first Pattern: intra-chart CRD-ordering breaks -> split charts + Flux dependsOn. Apply universally to similar cases going forward. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:55:05 +04:00
e3mrah	f658757962	fix(bp-crossplane): resolve CHART_DIR to absolute path in composition-validate.sh (#237) CI invokes the script as `bash <script> "platform/crossplane/chart"` from the repo root. The script then `cd`s into that relative path, which works, but every later `"$CHART_DIR/<sub>"` reference (notably FIXTURE_DIR for Case 6) inherits the now-stale relative prefix and resolves under the wrong cwd. Fix: resolve CHART_DIR via `(cd ... && pwd)` to an absolute path BEFORE the chdir. Local repro before fix: $ bash platform/crossplane/chart/tests/composition-validate.sh \ platform/crossplane/chart ... Case 6: every fixture XRC kind is matched by an XRD FAIL: fixtures dir platform/crossplane/chart/tests/fixtures missing Local result after fix: $ bash platform/crossplane/chart/tests/composition-validate.sh \ platform/crossplane/chart ... Case 6: every fixture XRC kind is matched by an XRD PASS All bp-crossplane Day-2 CRUD Composition gates green. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 09:36:07 +02:00
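The stale-relative-path bug fixed in f658757962 is not specific to bash. The sketch below reproduces the same idea in Python: resolve the directory argument to an absolute path before changing the working directory, so later joins stay valid. The function name `fixtures_dir` is hypothetical; the `tests/fixtures` layout is taken from the commit's repro output.

```python
import os


def fixtures_dir(chart_dir: str) -> str:
    """Enter the chart directory and return its fixtures path.

    The argument may be relative to the caller's cwd, so it is resolved
    to an absolute path BEFORE the chdir. Joining the original relative
    value after the chdir would resolve under the wrong directory.
    """
    chart_dir = os.path.abspath(chart_dir)
    os.chdir(chart_dir)
    return os.path.join(chart_dir, "tests", "fixtures")
```

Without the `abspath` call, a caller passing `platform/crossplane/chart` from the repo root would end up looking for `platform/crossplane/chart/tests/fixtures` relative to the chart directory itself, which matches the "fixtures dir ... missing" failure quoted above.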
e3mrah	8592d20919	feat(bp-crossplane): 6 XRDs + Compositions for Day-2 CRUD (RegionClaim/ClusterClaim/NodePoolClaim/LoadBalancerClaim/PeeringClaim/NodeActionClaim) (#236 ) Adds the 6 CompositeResourceDefinitions and matching Compositions that back the catalyst-api Day-2 CRUD endpoints. catalyst-api writes XRCs of these kinds; Crossplane materialises them into provider-hcloud (and a small number of provider-kubernetes) managed resources. Per docs/INVIOLABLE-PRINCIPLES.md #3, every cloud-side op flows through provider-hcloud — never bespoke hcloud-go calls or shell-outs to the hcloud CLI. XRDs (canonical group: compose.openova.io/v1alpha1): - RegionClaim → composes the Phase-0 quartet via provider-hcloud: Network + NetworkSubnet + Firewall + Server (cp1) + LoadBalancer + LoadBalancerNetwork + LoadBalancerService×2 + LoadBalancerTarget. Mirrors infra/hetzner/main.tf 1:1 so deletion of a RegionClaim cascades the whole slice. - ClusterClaim → composes a provider-kubernetes Object that materialises a cluster-identity ConfigMap. The catalyst-environment-controller reads the CM to template per-server cloud-init. - NodePoolClaim → composes up to 100 provider-hcloud Server resources. UPDATE flow: patching replicas n→m flips the per-index Required-policy gate so Crossplane creates/deletes Server CRs. - LoadBalancerClaim → composes provider-hcloud LoadBalancer + LoadBalancerNetwork + up to 50 LoadBalancerService entries (per listener) + up to 50 LoadBalancerTarget entries. UPDATE: patch listeners[]/targets[] → composite controller adds/removes services/targets. - PeeringClaim → composes 1 or 2 provider-hcloud Route resources (bidirectional flag toggles the second one through a Required-policy gate). - NodeActionClaim → composes a provider-kubernetes Object that creates a batch/v1 Job running kubectl cordon/drain (k8s-side op, not a cloud op, per the task spec). action=replace additionally composes a provider-hcloud Server for the replacement node. 
UPDATE/DELETE summary: - UPDATE: every mutable schema field is patched onto the underlying managed resource; Crossplane's composite controller drives the diff and provider-hcloud reconciles to the new state. - DELETE: every composed resource has deletionPolicy: Delete, so a cascade delete of the composite tears down the whole resource graph in dependency-safe order (Crossplane retries until deps unblock). New tests: - tests/composition-validate.sh — 7 gates: helm renders cleanly, exactly 6 XRDs, ≥ 6 Compositions, all 6 expected claim kinds present, every rendered doc is valid YAML, every fixture references a real XRD, and (when KUBECONFIG + Crossplane CRDs available) server-side dry-run for every fixture. - tests/fixtures/<kind>-sample.yaml — one XRC fixture per kind. Version bump: - platform/crossplane/chart/Chart.yaml 1.1.1 → 1.1.2 - platform/crossplane/blueprint.yaml 1.1.1 → 1.1.2 - clusters/_template/bootstrap-kit/04-crossplane.yaml → 1.1.2 - clusters/otech.omani.works/bootstrap-kit/04-crossplane.yaml → 1.1.2 Hard rules respected: - provider-hcloud only for cloud ops (never hcloud-go, never CLI). - provider-kubernetes Object for k8s-side ops (never raw kubectl). - No bespoke kubectl manifests for cloud resources. - Frontend + catalyst-api Go code untouched (sibling-owned). - Target state, no MVP framing — all 6 Compositions ship. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 09:33:38 +02:00
e3mrah	c747fe2265	fix(bp-gitea): override postgresql to bitnamilegacy (Bitnami evacuated docker.io tags) (#231) Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 08:27:49 +02:00
e3mrah	da87fb38c4	fix(bp-spire): disable ALL default-enabled clusterSPIFFEIDs (default+oidc+test-keys) (#230) Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 08:13:41 +02:00
e3mrah	719c3bac35	fix(bp-spire): disable default ClusterSPIFFEID — CRD not observable in time on fresh install (#228) Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 07:51:03 +02:00
e3mrah	1689ffcd1a	fix(bp-coraza,bp-syft-grype): add common library subchart to satisfy hollow-chart gate (#220) Both charts are scratch (no upstream Helm chart published — Coraza project + anchore/syft+grype CLIs ship containers only). The blueprint-release.yaml hollow-chart gate (issue #181) rejects charts with zero declared dependencies. Adding sigstore/common as a tiny library subchart satisfies the gate; common is a library type so it contributes zero runtime resources to either chart's rendered output. The Catalyst-side templates (Deployment+Service for bp-coraza, CronJob+PVC for bp-syft-grype) remain entirely in templates/ — the library dep is purely a CI-gate mechanism, NOT a functional dependency. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 06:15:28 +02:00
e3mrah	3a57e287e5	feat(platform): security umbrellas (falco/kyverno/trivy/sigstore/syft-grype/reloader/coraza/litmus) (#216 ) * feat(bp-falco): umbrella chart for security layer Catalyst Blueprint umbrella chart for falco — security/policy layer. Pinned upstream + appVersion verified against the helm index on 2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Solo-Sovereign defaults; per-Sovereign overlays bump to HA later. Part of security-stack umbrellas batch 3. * feat(bp-kyverno): umbrella chart for security layer Catalyst Blueprint umbrella chart for kyverno — security/policy layer. Pinned upstream + appVersion verified against the helm index on 2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Solo-Sovereign defaults; per-Sovereign overlays bump to HA later. Part of security-stack umbrellas batch 3. * feat(bp-trivy): umbrella chart for security layer Catalyst Blueprint umbrella chart for trivy — security/policy layer. Pinned upstream + appVersion verified against the helm index on 2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Solo-Sovereign defaults; per-Sovereign overlays bump to HA later. Part of security-stack umbrellas batch 3. * feat(bp-sigstore): umbrella chart for security layer Catalyst Blueprint umbrella chart for sigstore — security/policy layer. Pinned upstream + appVersion verified against the helm index on 2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Solo-Sovereign defaults; per-Sovereign overlays bump to HA later. Part of security-stack umbrellas batch 3. * feat(bp-syft-grype): umbrella chart for security layer Catalyst Blueprint umbrella chart for syft-grype — security/policy layer. Pinned upstream + appVersion verified against the helm index on 2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Solo-Sovereign defaults; per-Sovereign overlays bump to HA later. Part of security-stack umbrellas batch 3. 
* feat(bp-reloader): umbrella chart for security layer Catalyst Blueprint umbrella chart for reloader — security/policy layer. Pinned upstream + appVersion verified against the helm index on 2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Solo-Sovereign defaults; per-Sovereign overlays bump to HA later. Part of security-stack umbrellas batch 3. * feat(bp-coraza): umbrella chart for security layer Catalyst Blueprint umbrella chart for coraza — security/policy layer. Pinned upstream + appVersion verified against the helm index on 2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Solo-Sovereign defaults; per-Sovereign overlays bump to HA later. Part of security-stack umbrellas batch 3. * feat(bp-litmus): umbrella chart for security layer Catalyst Blueprint umbrella chart for litmus — security/policy layer. Pinned upstream + appVersion verified against the helm index on 2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Solo-Sovereign defaults; per-Sovereign overlays bump to HA later. Part of security-stack umbrellas batch 3. --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-30 06:07:38 +02:00
e3mrah	75128781b3	feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214 ) * feat(bp-grafana): umbrella chart for observability stack Catalyst Blueprint umbrella for Grafana — visualization layer of the LGTM observability stack (Loki/Grafana/Tempo/Mimir). Pinned to grafana/grafana 10.5.15 (appVersion 12.3.1) — current stable on 2026-04-29. Solo-Sovereign defaults: 1 replica, 10Gi PVC, ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch. * feat(bp-loki): umbrella chart for observability stack Catalyst Blueprint umbrella for Grafana Loki — log aggregation backend of the LGTM stack. SingleBinary mode by default (solo-Sovereign min); SimpleScalable/Distributed are values toggles. Pinned to grafana/loki 7.0.0 (appVersion 3.6.7) on 2026-04-29. Filesystem storage default; SeaweedFS S3 wiring is per-Sovereign overlay when scaling out. All observability toggles default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch. * feat(bp-mimir): umbrella chart for observability stack Catalyst Blueprint umbrella for Grafana Mimir — metrics storage tier of the LGTM stack. Pinned to grafana/mimir-distributed 6.0.6 (appVersion 3.0.4) on 2026-04-29. Solo-Sovereign defaults: every component scaled to 1 replica, zoneAwareReplication disabled, Kafka ingest-storage disabled. Bundled MinIO kept enabled as a stop-gap so the chart renders; SeaweedFS S3 wiring is per-Sovereign overlay. All metaMonitoring toggles default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch. * feat(bp-tempo): umbrella chart for observability stack Catalyst Blueprint umbrella for Grafana Tempo — distributed tracing backend of the LGTM stack. Single-binary mode by default (solo-Sovereign min); microservice mode (tempo-distributed) is a chart swap toggle. Pinned to grafana/tempo 1.24.4 (appVersion 2.9.0) on 2026-04-29. 
Local PVC storage default; SeaweedFS S3 wiring is per-Sovereign overlay. Metrics generator disabled by default (depends on bp-mimir). ServiceMonitor default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch. * feat(bp-alloy): umbrella chart for observability stack Catalyst Blueprint umbrella for Grafana Alloy — unified telemetry collector for the LGTM stack (logs, metrics, traces; OTLP-native). Pinned to grafana/alloy 1.8.0 (appVersion v1.16.0) on 2026-04-29. DaemonSet controller default (one Alloy per node) so node + container telemetry work out of the box. Empty Alloy config by default; per-Sovereign overlays populate forwarders to bp-loki/bp-mimir/bp-tempo once those reconcile. ServiceMonitor + ingress + CRDs default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch. * feat(bp-opentelemetry): umbrella chart for observability stack Catalyst Blueprint umbrella for the OpenTelemetry Collector — vendor- neutral telemetry collector. Sibling to bp-alloy; per-Sovereign overlays choose one. Pinned to open-telemetry/opentelemetry-collector 0.152.0 (appVersion 0.150.1) on 2026-04-29. Uses the contrib distribution (otel/opentelemetry-collector-contrib:0.150.1) so Loki/Mimir/Tempo exporters are bundled. Deployment mode default (1 replica); DaemonSet + StatefulSet are values toggles. All presets default false; ingress + ServiceMonitor + PodMonitor + PrometheusRule + NetworkPolicy default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch. * feat(bp-langfuse): umbrella chart for observability stack Catalyst Blueprint umbrella for Langfuse — LLM observability platform. Complements bp-grafana (infrastructure metrics) with AI-specific telemetry (traces, evaluations, prompts, cost attribution). Pinned to langfuse/langfuse 1.5.28 (appVersion 3.171.0) on 2026-04-29. 
Catalyst convention: ALL bundled Bitnami subcharts are disabled — PostgreSQL via cnpg.io/Cluster (bp-cnpg), Redis via bp-valkey, ClickHouse via bp-clickhouse, S3 via bp-seaweedfs. Per-Sovereign overlays wire external endpoints + Secret references. Telemetry to Langfuse Inc. defaulted false; signUpDisabled defaulted true. Part of issue #204 observability-stack umbrellas batch. * feat(bp-velero): umbrella chart for observability stack Catalyst Blueprint umbrella for Velero — Kubernetes-native backup and disaster recovery. Per platform/velero/README.md, ALL Velero output goes to SeaweedFS (Catalyst's unified S3 encapsulation), which transitions to a cloud archival backend on the cold tier. Pinned to vmware-tanzu/velero 12.0.1 (appVersion 1.18.0) on 2026-04-29. Bundled velero-plugin-for-aws:v1.14.0 init container so SeaweedFS S3 is reachable. backupsEnabled/snapshotsEnabled defaulted false at this layer (placeholders for backupStorageLocation); per-Sovereign overlays flip on after wiring SeaweedFS endpoint + credentials. ServiceMonitor + PodMonitor + PrometheusRule default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch. --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-29 22:11:04 +02:00
e3mrah	fa0e3a494b	fix(bp-keycloak): pin to current Bitnami tag (closes #191 ) (#198 ) * fix(bp-keycloak): pin to current Bitnami Keycloak tag (closes #191) Bitnami consolidated their tag scheme around 2025-09 (see https://github.com/bitnami/charts/issues/30852). The chart was pinned to upstream bitnami/keycloak Helm chart 24.7.1, whose default image tag `bitnami/keycloak:26.2.4-debian-12-r0` now returns 404 in the Docker Hub registry — installs hit ImagePullBackOff (verified on omantel). Changes: - Upstream Bitnami chart: 24.7.1 -> 25.2.0 (latest, appVersion 26.3.3) - Override image.registry/image.repository for every Bitnami image used by the chart (keycloak app, keycloak-config-cli, postgresql, postgres-exporter, os-shell) to point at `bitnamilegacy/`, where the historic debian-12 tags are preserved - Replace deprecated `proxy: edge` with `proxyHeaders: "xforwarded"` (chart 25.x renamed the field; Catalyst fronts Keycloak with Cilium Gateway which sets X-Forwarded- headers) - bp-keycloak chart version: 1.1.1 -> 1.1.2 Verification (registry HEAD via Bearer token): bitnami/keycloak:26.2.4-debian-12-r0 -> 404 (broken pin) bitnami/keycloak:26.3.3-debian-12-r0 -> 404 (registry move) bitnamilegacy/keycloak:26.3.3-debian-12-r0 -> 200 bitnamilegacy/keycloak-config-cli:6.4.0-... -> 200 bitnamilegacy/postgresql:17.6.0-debian-12-r0 -> 200 bitnamilegacy/postgres-exporter:0.17.1-... -> 200 bitnamilegacy/os-shell:12-debian-12-r50 -> 200 `helm template platform/keycloak/chart` renders cleanly; rendered images all resolve to bitnamilegacy/* tags listed above. Long-term follow-up (not blocking): bitnamilegacy is explicitly marked "no longer updated, may be removed in the future" — Catalyst should either build its own Keycloak image or migrate to the Bitnami Secure Image (BSI/Photon) catalog when chart support catches up. Tracked in the bp-keycloak description block. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(bp-keycloak): bump blueprint.yaml version to match Chart.yaml 1.1.2 --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:10:17 +02:00
e3mrah	bcd2e7980a	fix: hide CRD-emitting resources behind Capabilities gates (closes #190 ) (#200 ) * fix(bp-external-dns): hide CRD-emitting resources behind Capabilities gates (refs #190) Wrap the Catalyst overlay's ServiceMonitor and ExternalSecret templates in `.Capabilities.APIVersions.Has` checks so a cold install on a fresh Sovereign — where bp-kube-prometheus-stack and bp-external-secrets have not yet reconciled — no longer fails with `no matches for kind X in version Y`. The values toggles (`externalDns.serviceMonitor.enabled`, `externalDns.externalSecret.enabled`) remain — Capabilities is defense in depth so an operator flipping the toggle on a Sovereign that hasn't reached Phase 2 doesn't break the bp-external-dns reconcile. Verified locally: `helm template` with toggles off renders 0 of these resources; with toggles ON and `--api-versions monitoring.coreos.com/v1 --api-versions external-secrets.io/v1beta1` both render exactly once. Bump version 1.1.0 → 1.1.2 to align with the Phase-1 architectural-fix wave from issue #190. * fix(bp-powerdns): hide CRD-emitting resources behind Capabilities gates (refs #190) Three Catalyst overlay templates emit resources whose CRDs ship in OTHER charts and were unconditionally rendered, causing a cold install of bp-powerdns to fail with `no matches for kind X` on a Sovereign that hasn't yet reconciled the upstream chart: - cnpg-cluster.yaml → postgresql.cnpg.io/v1 Cluster (CRD ships in bp-cnpg) - api-ingress.yaml → traefik.io/v1alpha1 Middleware (CRD ships with the Traefik controller; k3s ships it by default but a Sovereign overlay MAY disable Traefik in favour of cilium-only ingress) - crossplane-floatingip.yaml → compose.openova.io/v1alpha1 HetznerFloatingIP (CRD ships when the Catalyst Crossplane composition family lands — see GAP DISCLOSURE in that template) Each is wrapped in `.Capabilities.APIVersions.Has "<group>/<version>"`. 
The Traefik router-middleware annotation on the Ingress is similarly gated so the auth posture cleanly moves to the Sovereign's chosen ingress controller when Traefik is absent. Verified locally: `helm template` with default values renders 0 of these resources; with `--api-versions postgresql.cnpg.io/v1 --api-versions traefik.io/v1alpha1 --api-versions compose.openova.io/v1alpha1` plus `--set crossplane.floatingIP.enabled=true`, all three render exactly once. Existing tests/observability-toggle.sh still passes. Bump version 1.1.1 → 1.1.2. * fix(bp-powerdns): bump blueprint.yaml to match Chart.yaml 1.1.2 after Capabilities gate work --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-04-29 20:10:14 +02:00
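The Capabilities gating described in bcd2e7980a for bp-powerdns amounts to "render a template only when its API group/version is registered". A rough Python model of that rule follows. The template names and group/versions come from the commit message; the function `rendered_templates` is hypothetical, and the sketch deliberately ignores the values toggles (such as `crossplane.floatingIP.enabled`) that the real chart also checks.

```python
# Template name -> API group/version whose CRD must be registered first.
GATED_TEMPLATES = {
    "cnpg-cluster.yaml": "postgresql.cnpg.io/v1",
    "api-ingress.yaml": "traefik.io/v1alpha1",
    "crossplane-floatingip.yaml": "compose.openova.io/v1alpha1",
}


def rendered_templates(api_versions):
    """Return the gated templates that would render for the given API versions.

    api_versions plays the role of the set behind
    `.Capabilities.APIVersions.Has` (or `--api-versions` on `helm template`).
    """
    available = set(api_versions)
    return sorted(
        name for name, group_version in GATED_TEMPLATES.items()
        if group_version in available
    )
```

With an empty list nothing renders, which corresponds to the cold-install case; passing all three group/versions yields all three templates, in line with the local verification quoted in the commit.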
e3mrah	1f5c76def1	fix(platform): sync blueprint.yaml versions with Chart.yaml (#199 ) * feat(ui): Playwright cosmetic + step-flow regression guards 15 regression guards in products/catalyst/bootstrap/ui/e2e/cosmetic- guards.spec.ts that fail HARD when each user-flagged defect class returns: 1. card height drift from canonical 108px 2. reserved right padding eating description width 3. logo tile drift from per-brand LOGO_SURFACE 4. invisible glyph (white-on-white) via luminance proxy 5. wizard step order Org/Topology/Provider/Credentials/Components/ Domain/Review 6. legacy "Choose Your Stack" / "Always Included" tab labels 7. Domain step reachable before Components 8. CPX32 not the recommended Hetzner SKU 9. per-region SKU dropdown shows wrong provider catalog 10. provision page is .html (static) not SPA route 11. legacy bubble/edge DAG SVG markup on provision page 12. admin sidebar drift from canonical core/console (w-56 + 7 labels) 13. AppDetail uses tablist instead of sectioned layout 14. job rows navigate to /job/<id> instead of expand-in-place 15. Phase 0 banners (Hetzner infra / Cluster bootstrap) on AdminPage Each test prints a failure message naming the canonical reference, the source-of-truth file, and the data-testid PR needed (if any) so the implementing agent has a precise target. No .skip() — per INVIOLABLE-PRINCIPLES #2, missing components fail loud. CI: .github/workflows/cosmetic-guards.yaml runs the suite on every PR that touches products/catalyst/bootstrap/ui/ or core/console/. Docs: docs/UI-REGRESSION-GUARDS.md maps each test to the user's original complaint, the canonical reference, and the green/red semantics (5 tests intentionally RED on main today — they stay red until the companion-agent's UI work lands). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(platform): sync blueprint.yaml versions with Chart.yaml so manifest-validation passes --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 22:07:55 +04:00
hatiyildiz	b0c1c07271	fix(bp-flux): align upstream flux2 version with cloud-init's flux install (no double-install destruction) Live verified on omantel.omani.works (2026-04-29). bp-flux:1.1.1 shipped the fluxcd-community `flux2` subchart at 2.13.0 (= upstream Flux appVersion 2.3.0). Cloud-init pre-installed Flux core at v2.4.0 via `https://github.com/fluxcd/flux2/releases/download/v2.4.0/install.yaml`. helm-controller's reconcile of bp-flux ran `helm install` on top of the running v2.4.0 Flux; the chart's v2.3.0 CRD update failed apiserver admission with `status.storedVersions[0]: Invalid value: "v1": must appear in spec.versions`; Helm rolled back; the rollback DELETED every running Flux controller Deployment (helm-controller, source-controller, kustomize-controller, image-automation-controller, image-reflector-controller, notification-controller). The cluster lost its GitOps engine — no further HelmRelease could progress, and the only recovery was full `tofu destroy` + reprovision. This is OPTION C of the architectural fix proposed in the incident memo: version-align cloud-init's flux2 install with the bp-flux umbrella chart's `flux2` subchart so a single upstream Flux release is installed and helm-controller adopts it on first reconcile rather than reinstalls on top with a different version. Changes: * `infra/hetzner/cloudinit-control-plane.tftpl` — kept the install.yaml URL pinned at v2.4.0 (deliberate; this is the source of truth) and added the CRITICAL VERSION-PIN INVARIANT comment block documenting the failure mode. * `platform/flux/chart/Chart.yaml` — bumped `flux2` subchart dep from 2.13.0 to 2.14.1. The community chart 2.14.1 carries appVersion 2.4.0, matching cloud-init exactly. Bumped chart version 1.1.1 -> 1.1.2. * `platform/flux/chart/values.yaml` — `catalystBlueprint.upstream .version` mirror of the dep pin moved from 2.13.0 to 2.14.1. 
* `clusters/_template/bootstrap-kit/03-flux.yaml` and `clusters/omantel.omani.works/bootstrap-kit/03-flux.yaml` — bumped bp-flux HelmRelease to 1.1.2 + added explicit `install.disableTakeOwnership: false`, `upgrade.disableTakeOwnership: false`, and `upgrade.preserveValues: true` so helm-controller adopts the cloud-init-installed Flux objects rather than rolling back on ownership conflict. * `products/catalyst/chart/Chart.yaml` — bumped bp-catalyst-platform umbrella 1.1.1 -> 1.1.2, with bp-flux dep bumped to 1.1.2. * `clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml` and `clusters/omantel.omani.works/bootstrap-kit/13-bp-catalyst-platform.yaml` — bumped HelmRelease to 1.1.2. * `platform/flux/chart/tests/version-pin-replay.sh` — NEW. Six-case catastrophic-failure replay test: Case 1: Chart.yaml declares the flux2 subchart with explicit version. Case 2: cloud-init pins flux2 install.yaml to an explicit v-tag. Case 3: chart's flux2 subchart appVersion equals cloud-init's pinned upstream version (the load-bearing invariant). Case 4: values.yaml metadata mirrors the Chart.yaml dep pin. Case 5: helm template renders cleanly + contains the four core Flux controllers. Case 6: replay test rejects a planted mismatched fake Chart.yaml (the gate's own self-test — proves the gate works). All six cases green locally; the new test joins the existing observability-toggle test in tests/. * `docs/RUNBOOK-PROVISIONING.md` — new section "bp-flux double-install — version-pin invariant" documenting the failure mode, the four pin-sites, the safe bump procedure, and the existing-Sovereign recovery path (full reprovision). Existing Sovereigns running 1.1.1: no in-place recovery is possible once the rollback has fired. Reprovision required against 1.1.2. 
Per docs/INVIOLABLE-PRINCIPLES.md #3 (architecture as documented) + #4 (never hardcode) — the version pins remain operator-bumpable via PR, but BOTH cloud-init's URL AND the chart's subchart MUST move together in the same PR; CI gate tests/version-pin-replay.sh enforces this. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:38:17 +02:00
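The load-bearing invariant in b0c1c07271 (Case 3 of the replay test) is that the Flux version pinned in cloud-init equals the appVersion of the chart's `flux2` subchart. The sketch below shows one way such a comparison could look in Python. It is not the contents of `tests/version-pin-replay.sh`; the function names are hypothetical, and how the subchart's appVersion is obtained is left to the caller.

```python
import re

# Matches the explicit v-tag in the install.yaml URL used by cloud-init.
_PIN_RE = re.compile(
    r"fluxcd/flux2/releases/download/v(\d+\.\d+\.\d+)/install\.yaml"
)


def pinned_flux_version(cloud_init_text: str) -> str:
    """Extract the Flux version pinned in the cloud-init template text."""
    match = _PIN_RE.search(cloud_init_text)
    if match is None:
        raise ValueError("no explicit v-tag pin found for flux2 install.yaml")
    return match.group(1)


def pins_aligned(cloud_init_text: str, subchart_app_version: str) -> bool:
    """True when cloud-init and the subchart appVersion name the same release."""
    return pinned_flux_version(cloud_init_text) == subchart_app_version.lstrip("v")
```

Given the v2.4.0 URL from the commit, `pins_aligned(text, "2.4.0")` holds while `pins_aligned(text, "2.3.0")` does not, which is the mismatch that led to the double-install rollback.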
hatiyildiz	4265884d58	feat(bp-external-dns): umbrella chart + add to bootstrap-kit Kustomization Convert platform/external-dns/chart/ from a metadata-only wrapper to a proper Helm umbrella that pulls kubernetes-sigs/external-dns 1.15.2 (appVersion 0.15.1, k8s 1.31-validated) as a Helm subchart, mirroring the bp-cilium / bp-cert-manager / bp-powerdns shape. Native PowerDNS provider speaks the bp-powerdns REST API directly via the EXTERNAL_DNS_PDNS_API_KEY env var sourced from the powerdns-api-credentials Secret bp-powerdns renders. Catalyst overlay templates added (default-off where applicable per the observability-toggle rule for the bp-* family): - templates/networkpolicy.yaml (default ON; egress to powerdns + cluster DNS + apiserver only) - templates/servicemonitor.yaml (default OFF) - templates/externalsecret.yaml (default OFF; Phase-2 OpenBao path) - templates/_helpers.tpl Bootstrap-kit Kustomization gets a new 12-external-dns.yaml HelmRelease referencing bp-external-dns:1.1.0 with dependsOn bp-cert-manager + bp-powerdns, and the legacy 11-bp-catalyst-platform.yaml is renumbered 13- so the install ordering reads in canonical Phase-0 sequence. Mirrored to clusters/omantel.omani.works/bootstrap-kit/ with the SOVEREIGN_FQDN substitution applied. bp-catalyst-platform Chart.yaml drops bp-external-dns from its dependency block — install ordering for ExternalDNS is now owned by Flux dependsOn at the Kustomization layer rather than this umbrella's Helm dependency graph. Bumped 1.1.0 → 1.1.1 to reflect the dep removal, and the bootstrap-kit HelmRelease references in both clusters bumped in lockstep. Wrapper chart version bumped 1.0.0 → 1.1.1 (umbrella shape). 
Local gates pass: - helm dependency build (pulls external-dns-1.15.2.tgz) - helm lint (0 failures) - helm template smoke render (245 lines, 6 kinds rendered) - helm package + tar-tzf verifies external-dns subchart inside the packaged tgz (subchart-guard simulation passes) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:29:27 +02:00
e3mrah	31d5911221	Merge pull request #185 from openova-io/fix/bp-charts-observability-toggles-default-false fix(bp-*): observability toggles default false (v1.1.1)	2026-04-29 21:26:48 +04:00
hatiyildiz	1ddd569789	fix(bp-*): observability toggles default false — break circular CRD dependency Extends the v1.1.1 hardening that started with cilium / cert-manager / crossplane to the remaining 8 bootstrap-kit + per-Sovereign Blueprints. Every observability toggle in every Catalyst-curated Blueprint now ships `false`/`null` by default; the operator opts in via a per-cluster values overlay at clusters/<sovereign>/bootstrap-kit/ once bp-kube-prometheus-stack reconciles. Live failure mode that prompted this (omantel.omani.works 2026-04-29): bp-cilium @ 1.1.0 defaulted hubble.relay/ui + prometheus.serviceMonitor to true. The upstream Cilium 1.16.5 chart renders a monitoring.coreos.com/v1 ServiceMonitor whose CRD ships with kube-prometheus-stack — a tier-2 Application Blueprint that depends on the bootstrap-kit (cilium first). Helm install fails on a fresh Sovereign with "no matches for kind ServiceMonitor in version monitoring.coreos.com/v1 — ensure CRDs are installed first" and every downstream HelmRelease reports `dep is not ready`. The earlier trustCRDsExist=true mitigation only suppresses Helm's render-time gate; the apiserver still rejects the resource at install-time. Per-Blueprint changes: - bp-cilium: hubble.relay.enabled, hubble.ui.enabled → false; hubble.metrics.enabled → null (this is the exact value that disables the upstream metrics ServiceMonitor template branch — verified by reading cilium 1.16.5's _hubble.tpl); hubble.metrics.serviceMonitor.enabled → false. tests/observability-toggle.sh extended with Case 4 (default render produces no hubble-relay / hubble-ui Deployments). - bp-flux: flux2.prometheus.podMonitor.create → false. - bp-sealed-secrets: sealed-secrets.metrics.serviceMonitor.enabled → false (explicit lock; upstream already defaults false). - bp-spire: spire.global.spire.recommendations.enabled + recommendations.prometheus → false. - bp-nats-jetstream: nats.promExporter.enabled + promExporter.podMonitor.enabled → false. 
- bp-openbao: openbao.injector.metrics.enabled + openbao.serviceMonitor.enabled → false. - bp-keycloak: keycloak.metrics.enabled + metrics.serviceMonitor.enabled + metrics.prometheusRule.enabled → false. - bp-gitea: gitea.gitea.metrics.* and gitea.postgresql.metrics.* serviceMonitor + prometheusRule → false. - bp-powerdns: powerdns.serviceMonitor.enabled + powerdns.metrics.enabled → false (forward-compatibility guard; current upstream pschichtel/powerdns 0.10.0 has no ServiceMonitor template, but a future upstream bump cannot silently regress). Each chart ships a tests/observability-toggle.sh that asserts the rule in three cases (default off / explicit on opt-in / explicit off) — runs under blueprint-release.yaml's chart-test gate (added `bdeb0f54` + the existing wiring) before helm push. A regression that re-introduces a hardcoded enabled: true in any chart fails CI before the OCI artifact is published. Versioning: - All 11 leaf charts bumped 1.1.0 → 1.1.1. - products/catalyst/chart (bp-catalyst-platform umbrella) deps updated to 1.1.1 across the board. - clusters/_template/bootstrap-kit/03-flux through 10-gitea bumped to 1.1.1; clusters/omantel.omani.works/bootstrap-kit/* mirror. docs/BLUEPRINT-AUTHORING.md §11.2 table extended to enumerate every toggle disabled across all 11 Blueprints. References docs/INVIOLABLE-PRINCIPLES.md #4. GATES (all green): - helm dep build resolves cleanly post-change for every chart whose upstream is published (umbrella waits on per-leaf publish). - helm lint clean on all 11 leaves. - helm template . default render produces zero monitoring.coreos.com references on every leaf (verified locally). - tests/observability-toggle.sh PASS on all 11 leaves. Live verification: with v1.1.1 published the omantel.omani.works HelmRelease can roll forward without a manual values patch — Flux picks up the new chart digest automatically (semver: 1.x in OCIRepository). Refs: issue #182.	2026-04-29 19:23:52 +02:00
hatiyildiz	02b5b6c4c8	fix(bootstrap-kit): override cilium + cert-manager values to disable observability toggles Live verified on omantel: bp-cilium and bp-cert-manager v1.1.0 fail Helm install with 'no matches for kind ServiceMonitor in version monitoring.coreos.com/v1'. Manual kubectl-patch of the live HelmRelease worked but Flux's 15-min reconcile rolls back the patch because the HelmRelease CR is owned by the kustomize-controller from git. Override the values inline in the HelmRelease manifests so the patch is durable across Flux reconciles. Same pattern as the in-flight observability-toggle agent will apply to all 12 charts in the next chart bump (v1.1.1). This is the manifest-level workaround that unblocks the running omantel cluster TODAY without waiting for v1.1.1 publish. Mirrors the patches into both clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/bootstrap-kit/ so future Sovereigns inherit.	2026-04-29 19:17:08 +02:00
hatiyildiz	b1638f51ea	fix(bp-* tests): skip helm dep build when charts/ already vendored Earlier rerun failure on the CI workflow (bp-cert-manager 25120060270): Error: no repository definition for https://charts.jetstack.io. Please add the missing repos via 'helm repo add' Root cause: blueprint-release.yaml's earlier `helm dependency build` step (line 181) successfully resolves the upstream chart and populates chart/charts/ — but it does NOT `helm repo add` the upstream repo first. Helm 3.20's `helm dep build` succeeds on the first call by falling back to direct-URL fetch from Chart.yaml `dependencies[].repository`. A SECOND `helm dep build` (run by the test script) hits a different code path that requires the repo to be in the helm repo cache. Fix: tests/observability-toggle.sh now skips `helm dep build` when chart/charts/ is already populated (which is always the case in CI since the workflow's own `helm dependency build` step ran first). Local dev runs from a fresh checkout still resolve subcharts. Refs #182 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:12:21 +02:00
hatiyildiz	d34facc040	fix(bp-*): observability toggles default false — break circular CRD dependency bp-cilium@1.1.0 install fails on every fresh Sovereign with: no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1" — ensure CRDs are installed first Cascades to all 10 other bp-* HelmReleases ("dep is not ready") since bp-cilium is the root of the bootstrap dep graph. Verified live on omantel.omani.works 2026-04-29 (issue #182). Root cause: platform/cilium/chart/values.yaml and platform/cert-manager/chart/values.yaml hardcoded `serviceMonitor.enabled: true`. The monitoring.coreos.com/v1 CRDs ship with kube-prometheus-stack — an Application-tier Blueprint that itself depends on the bootstrap-kit. Hardcoding `true` creates a circular CRD ordering: bp-cilium wants the CRD bp-kube-prometheus-stack provides, but bp-kube-prometheus-stack cannot install before bp-cilium. The `trustCRDsExist=true` mitigation only suppresses Helm's render-time gate; the apiserver still rejects the resource at install-time. Violates INVIOLABLE-PRINCIPLES.md #4 (never hardcode): observability toggles MUST be operator-tunable, not chart-level constants assuming an observability tier exists. This commit: A. Defaults every observability toggle false in the affected wrappers: - platform/cilium/chart/values.yaml: cilium.prometheus.enabled: false cilium.prometheus.serviceMonitor.enabled: false (trustCRDsExist removed — no longer relevant) - platform/cert-manager/chart/values.yaml: cert-manager.prometheus.enabled: false cert-manager.prometheus.servicemonitor.enabled: false - platform/crossplane/chart/values.yaml: crossplane.metrics.enabled: false (uniformity rule — does not break install but holds the invariant) B. Bumps affected wrapper charts 1.1.0 → 1.1.1: - bp-cilium, bp-cert-manager, bp-crossplane (leaves) - bp-catalyst-platform (umbrella; deps repinned to 1.1.1 for the 3) C. 
Updates clusters/_template/bootstrap-kit/* and clusters/omantel.omani.works/bootstrap-kit/* HelmRelease versions to 1.1.1 so the live Sovereign picks up the fix on Flux reconcile. D. Adds platform/<name>/chart/tests/observability-toggle.sh under each affected chart. Each script asserts: - default render produces zero monitoring.coreos.com refs - opt-in render with --set <toggle>=true succeeds and produces a ServiceMonitor (proves the toggle is wired) - explicit-off render succeeds and produces zero refs Wired into .github/workflows/blueprint-release.yaml via a new "Run chart integration tests" step that executes every chart/tests/ .sh on every publish — a regression that re-introduces a hardcoded `true` fails the publish job before the OCI artifact is pushed. E. Documents the rule in docs/BLUEPRINT-AUTHORING.md §11.2 "Observability toggles must default false". References Principle #4 and provides the canonical pattern (default off in wrapper values, opt-in via per-cluster overlay at clusters/<sovereign>/...). 
Per-chart audit table (which toggle was hardcoded → new default):
| Chart                | Toggle                                         | Was  | Now   |
|----------------------|------------------------------------------------|------|-------|
| bp-cilium            | cilium.prometheus.enabled                      | true | false |
| bp-cilium            | cilium.prometheus.serviceMonitor.enabled       | true | false |
| bp-cert-manager      | cert-manager.prometheus.enabled                | true | false |
| bp-cert-manager      | cert-manager.prometheus.servicemonitor.enabled | true | false |
| bp-crossplane        | crossplane.metrics.enabled                     | true | false |
| bp-flux              | (no observability hardcodes)                   | n/a  | n/a   |
| bp-sealed-secrets    | (no observability hardcodes)                   | n/a  | n/a   |
| bp-spire             | (no observability hardcodes)                   | n/a  | n/a   |
| bp-nats-jetstream    | (no observability hardcodes)                   | n/a  | n/a   |
| bp-openbao           | (no observability hardcodes)                   | n/a  | n/a   |
| bp-keycloak          | (no observability hardcodes)                   | n/a  | n/a   |
| bp-gitea             | (no observability hardcodes)                   | n/a  | n/a   |
| bp-powerdns          | (no observability hardcodes)                   | n/a  | n/a   |
| bp-catalyst-platform | (umbrella, no values overlay)                  | n/a  | n/a   |
Local gates green:
- helm dep build ✓ all 3 affected charts
- helm lint ✓ all 3
- helm template ✓ all 3 — 0 monitoring.coreos.com refs in default
- tests/observability-toggle.sh ✓ all 9 sub-cases pass
Closes the install path for bp-cilium 1.1.1 on a fresh Sovereign; unblocks the full bp-* dep graph. Refs: https://github.com/openova-io/openova/issues/182 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:08:09 +02:00
hatiyildiz	43aff20254	feat(bp-*): convert all 11 bootstrap-kit charts to umbrella charts depending on upstream Each platform/<name>/chart/Chart.yaml now declares the canonical upstream chart as a dependencies: entry. helm dependency build pulls the upstream payload into the OCI artifact at publish time, so Flux helm install of bp-<name>:1.1.0 actually installs the upstream Helm release alongside the Catalyst-curated overlays (NetworkPolicy, ServiceMonitor, ClusterIssuer, ExternalSecret) under templates/. Pinned upstream chart versions per platform/<name>/blueprint.yaml: - cilium 1.16.5 https://helm.cilium.io - cert-manager v1.16.2 https://charts.jetstack.io - flux 2.4.0 https://fluxcd-community.github.io/helm-charts - crossplane 1.17.x https://charts.crossplane.io/stable - sealed-secrets 2.16.x https://bitnami-labs.github.io/sealed-secrets - spire ... https://spiffe.github.io/helm-charts-hardened - nats-jetstream ... https://nats-io.github.io/k8s/helm/charts - openbao ... https://openbao.github.io/openbao-helm - keycloak ... https://charts.bitnami.com/bitnami - gitea ... https://dl.gitea.com/charts - catalyst-platform umbrella over the 10 leaf bp-* charts via helm dependency values.yaml in each chart adopts the umbrella convention: catalystBlueprint metadata block (provenance + version) at top level, upstream subchart values namespaced under the dependency name. cert-manager specifically: clusterissuer-letsencrypt-dns01.yaml gets the helm.sh/hook: post-install,post-upgrade annotation so it applies AFTER cert-manager controllers are running and CRDs registered (the previous hollow-chart shape ran the ClusterIssuer at install time when CRDs didn't exist yet, which was the omantel cluster's exact failure mode). Wrapper chart version bumped 1.0.0 → 1.1.0 across the board (umbrella conversion is a meaningful structural revision). Cluster manifests in clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/bootstrap-kit/ updated to reference 1.1.0. 
The blueprint-release.yaml workflow's helm package step needs an explicit helm dependency build before push so the upstream subchart bytes ship inside the OCI artifact. That CI change is a follow-up commit on this same branch (separate file scope).	2026-04-29 17:21:36 +02:00
hatiyildiz	67fdecb770	merge: remove k8gb (#171)	2026-04-29 08:51:21 +02:00
hatiyildiz	f5daac52af	refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171)
PowerDNS lua-records (`ifurlup`, `pickclosest`, `ifportup`) cover everything k8gb was doing — geo-aware response selection, health-checked failover, weighted round-robin — at the authoritative DNS layer. Eliminates a separate K8s controller, CRD set, and CoreDNS plugin from every Sovereign.
Changes:
- platform/k8gb/ deleted (Chart.yaml, values.yaml, blueprint.yaml never authored — only README existed)
- products/catalyst/bootstrap/ui/public/component-logos/k8gb.svg deleted
- componentGroups.ts: remove k8gb component (PowerDNS already there)
- componentLogos.tsx: drop logo_k8gb + k8gb map entry
- model.ts DEFAULT_COMPONENT_GROUPS spine: replace k8gb with powerdns
- StepInfrastructure.tsx: copy refers to PowerDNS lua-records, not k8gb
- provision.html: replace k8gb tile and edges with powerdns
- catalog.generated.ts regenerated (now includes bp-powerdns)
- docs sweep — every k8gb reference in PLATFORM-TECH-STACK, NAMING-CONVENTION, SOVEREIGN-PROVISIONING, SRE, ARCHITECTURE, GLOSSARY, COMPONENT-LOGOS, IMPLEMENTATION-STATUS, BUSINESS-STRATEGY, TECHNOLOGY-FORECAST, README, infra/hetzner/README, platform READMEs (cilium, external-dns, failover-controller, litmus, flux, opentofu) rewritten to point at PowerDNS lua-records / MULTI-REGION-DNS.md. Historical entries in VALIDATION-LOG.md preserved as audit trail.
- New docs/MULTI-REGION-DNS.md — canonical reference for the lua-record patterns (ifurlup all/pickclosest/pickfirst, ifportup, pickwhashed), Application Placement → lua-record selector mapping, when to add a second Sovereign region, operational checks.
Closes #171. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:51:09 +02:00
hatiyildiz	f4679e2748	fix(powerdns): enable gpgsql-dnssec for DNSSEC API (1.0.6) Without `gpgsql-dnssec=yes` the gpgsql backend driver does not expose the DNSSEC API surface — `PUT /zones/<zone>` with `dnssec:true` returns 422 "no DNSSEC-capable backends are loaded". This blocks pool-domain-manager from enabling DNSSEC on every Sovereign child zone (mandatory per docs/PLATFORM-POWERDNS.md). Fix lands in additionalConfig so the directive is rendered alongside `default-soa-edit-signed=INCEPTION-EPOCH` and `direct-dnskey=yes`. No schema migration needed — the gpgsql 5.0.3 schema already includes the cryptokeys table; the missing piece was just the backend feature flag. Bumps Chart.yaml to 1.0.6. Verified: after this lands the PUT call returns 204 and POST /cryptokeys mints a usable KSK. Discovered while bringing up openova#168 (PDM per-Sovereign zones).	2026-04-29 08:42:18 +02:00
hatiyildiz	fa84cac438	fix(powerdns): plain ALTER TABLE in postInitSQL (avoid $$ escape battle, 1.0.5) The DO block in 1.0.4 rendered with $$ collapsed to $ by the time it reached CNPG's postInitApplicationSQL — "syntax error at or near $". Both Helm template processing and the YAML scalar block were chewing on the dollar signs. Replaced with explicit ALTER TABLE statements (one per gpgsql table) + GRANT — same end state, no PL/pgSQL quoting required. Verified at runtime on contabo-mkt: powerdns Pod went CrashLoopBackOff → Running 1/1 immediately after the manual ALTER ran by hand. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:17:28 +02:00
hatiyildiz	214a3e1ada	fix(powerdns): grant table ownership to pdns user in CNPG bootstrap (1.0.4) Verified at runtime on Contabo-mkt: postInitApplicationSQL runs as the postgres superuser, not the application owner, so the schema tables created by the bootstrap block were owned by postgres. PowerDNS connects as 'pdns' and got 'permission denied for table domains' on the first SELECT against the zone cache. Added a DO block at the end of the schema bootstrap that walks every table in the public schema and ALTERs OWNER TO {{ .Values.postgres.cluster.owner }} plus GRANT ALL PRIVILEGES ON SCHEMA public — same shape PDM uses (and the contabo-mkt cluster verified the fix runtime: powerdns Pod went from CrashLoopBackOff to 1/1 Ready immediately after the same DDL was run by hand). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:14:12 +02:00
hatiyildiz	db20e9d42b	fix(powerdns): dnsdist backend resolution + drop DnstapLogAction (1.0.3) dnsdist 1.9.14 runtime errors: 1. newServer{address='powerdns:5353'} → "Unable to convert presentation address" — dnsdist's address parser expects IP[:port], not a DNS name. Kubernetes auto-injects POWERDNS_SERVICE_HOST as an env var into every pod in the same namespace as the powerdns Service; using that gives us the ClusterIP at config-load time without needing an init container or runtime DNS resolution. 2. DnstapLogAction(name, bool, fn) signature changed in 1.9 — the 2nd parameter now expects a shared_ptr to a RemoteLoggerInterface, not a boolean. Rather than wire up a remote dnstap server (which adds a moving part for marginal observability gain), drop the line. Catalyst observability is the dnsdist /metrics endpoint surfaced to Prometheus + the k8s container log. Bumped chart to 1.0.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:12:27 +02:00
hatiyildiz	20c0543806	fix(powerdns): correct dnsdist image tag + drop readOnlyRootFilesystem (1.0.2) Two runtime issues caught during first contabo-mkt rollout: 1. dnsdist image tag was "1.9" (default) — that tag doesn't exist in docker.io/powerdns/dnsdist-19. The 1.9.x line publishes 1.9.0 .. 1.9.14 (no rolling "1.9" alias). Pinned to 1.9.14 (current latest). 2. PowerDNS pod crash-looped on Errno 30 (Read-only file system: /etc/powerdns/pdns.d/0-api.conf.conf). The upstream pdns_server-startup script writes rendered config files to /etc/powerdns/pdns.d/ at container start, and the upstream template doesn't expose an emptyDir we could redirect that path to. Set readOnlyRootFilesystem=false with a verbose comment explaining why; the rest of the security context (runAsNonRoot, runAsUser=953, drop ALL caps) stays in place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:06:39 +02:00
hatiyildiz	19d926bfeb	fix(powerdns): avoid recursive include in dnsdist checksum, bump to 1.0.1 Helm flagged dnsdist.yaml's checksum/config annotation as a recursive template self-reference (the file included itself). Replaced with a hash of the rendered .Values.dnsdist.config (post-tpl), which is the substantive content the annotation is supposed to track anyway. Bumped Chart.yaml to 1.0.1 so the OCIRepository semver "1.x" picks up the fix automatically on next reconcile. Blueprint API version stays at 1.0.0 (Blueprint contract is unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:02:53 +02:00
hatiyildiz	0190c60520	feat(powerdns): bp-powerdns wrapper chart + per-Sovereign zone model (#167) Introduces the bp-powerdns Catalyst Blueprint wrapper as the authoritative DNS service for every Sovereign zone. Replaces k8gb in componentGroups.ts — PowerDNS Lua records cover geo + health-checked failover natively, removing the dedicated GSLB controller. Wrapper chart (platform/powerdns/chart/): - Chart.yaml — bp-powerdns 1.0.0, depends on pschichtel/powerdns 0.10.0 upstream (verified Artifact Hub publisher, tracks docker.io/powerdns/pdns-auth-50 at appVersion 5.0.3 — surveyed Artifact Hub, no official PowerDNS chart exists) - values.yaml — 3 replicas, gpgsql backend, DNSSEC ECDSAP256SHA256, lua-records ON, dnsdist 100 qps default per source IP, REST API at pdns.openova.io/api behind Traefik basicAuth - blueprint.yaml — Catalyst metadata, visibility=unlisted (mandatory infra), section pts-3-2-gitops-and-iac - templates/cnpg-cluster.yaml — separate `pdns-pg` Postgres (1 instance, 5Gi, postgres-16) with PowerDNS auth-5.0.3 schema applied via postInitApplicationSQL - templates/dnsdist.yaml — companion Deployment + ConfigMap with rate-limiting policy (MaxQPSIPRule per source IP) - templates/api-ingress.yaml — Traefik Ingress + basicAuth Middleware - templates/anycast-endpoint.yaml — placeholder Service of type LoadBalancer (Phase-0 stand-in for the anycast Floating IP target state) - templates/crossplane-floatingip.yaml — DISCLOSED GAP: target-state XHetznerFloatingIP composite, disabled by default until the Crossplane composition is authored (the existing compositions cover Server/Network/Firewall/LoadBalancer/PoolAllocation only). The placeholder anycast Service is the operational stand-in. Per docs/INVIOLABLE-PRINCIPLES.md: - #4 (never hardcode): every value flows from values.yaml or a referenced K8s Secret. Image tags come from upstream chart appVersion, never duplicated. 
- #8 (disclose every divergence): the XHetznerFloatingIP gap is documented in the template + in docs/PLATFORM-POWERDNS.md ("Anycast deferral" section). componentGroups.ts: powerdns added to SPINE group as mandatory (depends on cnpg). external-dns now lists powerdns as a dependency. k8gb removed. docs/PLATFORM-POWERDNS.md: per-Sovereign zone model, DNSSEC posture, REST API contract, lua-records GSLB pattern, dnsdist policy, anycast deferral runbook, first-deploy procedure for Contabo-mkt. Closes #167 (Phase 1 of public-repo work; Phase 4 cluster manifest lands in openova-private feat/powerdns-deploy). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 07:49:51 +02:00
hatiyildiz	31b03ce02a	ci(pdm)+platform(crossplane): build workflow + XDynadotPoolAllocation composition (Phase 3+4 of #163)
CI workflow (.github/workflows/pool-domain-manager-build.yaml) mirrors the marketplace-api / catalyst-api shape:
- Triggers on push to core/pool-domain-manager/** + workflow_dispatch
- Runs unit tests (reserved + dynadot — the integration suite needs a real Postgres which the workflow does not provide; full integration runs in test-bootstrap-api.yaml against an ephemeral CNPG)
- Builds and pushes ghcr.io/openova-io/openova/pool-domain-manager:<sha>
- Cosign-signs the image via Sigstore keyless OIDC (id-token: write)
- Emits an SBOM attestation tied to the image digest
- Manifest deployment is intentionally NOT in this workflow — PDM manifests live in the openova-private repo per the issue body, so the Flux Kustomization there picks up the new SHA via a follow-up private-repo commit (Phase 6 of #163)
Crossplane composition (platform/crossplane/compositions/xrd-pool-allocation.yaml + composition-pool-allocation.yaml) wraps PDM as a declarative Crossplane Resource:
    apiVersion: compose.openova.io/v1alpha1
    kind: XDynadotPoolAllocation
    spec:
      parameters:
        poolDomain: omani.works
        subdomain: omantel
        sovereignFQDN: omantel.omani.works
        loadBalancerIP: 1.2.3.4
        createdBy: crossplane
The Composition uses provider-http (crossplane-contrib/provider-http) to render the XR into a Reserve → Commit sequence of HTTP calls against PDM's in-cluster service URL. Per docs/INVIOLABLE-PRINCIPLES.md #3 we use provider-http rather than bespoke Go to keep the day-2 lifecycle declarative. Operators who want to pre-allocate a name (e.g. reserve 'omantel.omani.works' for a Sovereign that hasn't been provisioned yet) commit YAML to Git and Flux+Crossplane converge. Refs: #163	2026-04-29 06:46:11 +02:00
hatiyildiz	8886eff708	Merge branch 'feat/group-g-dns-finish-v3' Group G DNS finish (v3): #110 (Dynadot multi-domain table-driven tests), #112 (catalyst-dns httptest-mocked Dynadot coverage), #113 (cert-manager LE DNS-01 + HTTP-01 ClusterIssuer templates with operator runbook for the cert-manager-dynadot-webhook gap). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:45:35 +02:00
hatiyildiz	97e942e0bc	feat(cert-manager): #113 — Let's Encrypt DNS-01 + HTTP-01 ClusterIssuers Adds platform/cert-manager/chart/templates/clusterissuer-letsencrypt-dns01.yaml with two ClusterIssuers, both Catalyst-curated, rendered conditionally from values.yaml: - letsencrypt-dns01-prod (TARGET STATE, default disabled) — ACME DNS-01 via the cert-manager webhook solver, pointing at a future `cert-manager-dynadot-webhook` Catalyst binary that will implement the webhook.acme.cert-manager.io/v1alpha1 contract against the existing internal/dynadot/ package. Shipping the issuer template ahead of the webhook so cluster overlays only need a values flip + secret ref — no template edits — once the webhook lands. - letsencrypt-http01-prod (INTERIM, default enabled) — ACME HTTP-01 via the cilium ingress class. Issues certs for the explicit hostnames (console, gitea, harbor, admin, api) but NOT for wildcards; the canonical *.<sub>.<domain> record needs DNS-01. Header comment explains the gap: the Catalyst external-dns webhook (products/catalyst/bootstrap/api/cmd/external-dns-dynadot-webhook/) implements a DIFFERENT RPC contract (records.list/add/delete) than what cert-manager DNS-01 expects (Present/CleanUp on ChallengeRequest CRD), so it cannot be reused; a dedicated cmd/cert-manager-dynadot-webhook/ must be built. Operator runbook for cutover is in the file header. values.yaml gains a `certManager.issuers.{email,acmeServer,dns01,http01}` section so all knobs are runtime-configurable per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode); cluster overlays in clusters/<sovereign>/ can flip dns01.enabled via the bp-catalyst-platform umbrella's values without rebuilding the Blueprint OCI artifact. 
blueprint.yaml gains a spec.outputs section advertising: - issuerName: letsencrypt-http01-prod (default) - wildcardIssuerName: letsencrypt-dns01-prod (target state) - issuerKind: ClusterIssuer so dependent Blueprints (cilium-gateway, harbor, gitea) can consume the issuer name without hardcoding it. Closes #113. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:44:56 +02:00
hatiyildiz	c07e0ad1ee	feat(external-dns): #109 — author bp-external-dns leaf chart for OCI publish The bp-catalyst-platform umbrella (issue #104) declares a dependency on bp-external-dns:1.0.0 — but the chart didn't exist; only README + Dynadot multi-domain policy lived under platform/external-dns/. Without this leaf the umbrella's `helm dependency build` fails (verified in run 25068433765). This commit authors the minimal target-state leaf: - Chart.yaml: name=bp-external-dns, version=1.0.0 - values.yaml: catalystBlueprint.upstream metadata (external-dns 1.15.0 from kubernetes-sigs/external-dns Helm repo) + Catalyst-curated values overlay (sources, txtOwnerId, ServiceMonitor, RBAC, resources) Per BLUEPRINT-AUTHORING.md §3, leaf charts are pure values-overlay wrappers: no templates dir, just Chart.yaml + values.yaml with the catalystBlueprint metadata block read by the bootstrap-kit installer at helm-install time. Per-Sovereign provider/zone/credential overrides are overlaid by the Crossplane Composition that materializes the HelmRelease — keeping this chart provider-agnostic (no hardcoded Cloudflare/Dynadot/Hetzner choice per INVIOLABLE-PRINCIPLES.md §4). After this lands, blueprint-release.yaml will publish ghcr.io/openova-io/bp-external-dns:1.0.0 and the next umbrella push will resolve all 11 leaf deps successfully.	2026-04-28 19:42:23 +02:00
hatiyildiz	f0fe3006ba	feat(external-dns): #109 — Catalyst-curated dynadot-multi-domain policy Adds platform/external-dns/policies/dynadot-multi-domain.yaml — the canonical external-dns + dynadot webhook deployment that ships in every Sovereign on an OpenOva pool domain. Why a webhook: external-dns has no upstream Dynadot provider; the canonical pattern is the webhook RPC contract, with a sidecar that implements the provider in our preferred language. We reuse the same internal/dynadot/ package the catalyst-api uses, so the never-wipe rule, record encoding, and managed-domain allowlist are identical on both write paths (per docs/INVIOLABLE-PRINCIPLES.md #2 — no duplicate implementations of the same concern). Multi-domain: - One --domain-filter per zone in the external-dns args; adding a third pool domain (e.g. acme.io) is a one-line edit here PLUS a one-key edit on dynadot-api-credentials' `domains` field. No webhook rebuild. - Webhook reads DYNADOT_MANAGED_DOMAINS from the same secret with optional=true, preserving backward compatibility with the legacy single-`domain` secret shape (pre-#108). TXT registry: - --txt-owner-id=$(SOVEREIGN_FQDN), --txt-prefix=_externaldns.<sub>. - Cluster overlays substitute SOVEREIGN_FQDN via the bp-catalyst-platform umbrella so two clusters sharing a parent zone (alpha.omani.works, beta.omani.works) cannot collide. Closes #109. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 14:45:53 +02:00
hatiyildiz	046e5ebc18	feat(day2-iac): Crossplane Compositions + per-Sovereign Flux cluster tree + catalyst-dns binary Group F deliverables — completes the day-2 IaC layer that takes over after OpenTofu's Phase 0 hand-off (per docs/SOVEREIGN-PROVISIONING.md §4). Three artifacts: 1. platform/crossplane/compositions/ — XRDs + Compositions for canonical Hetzner resources under the canonical compose.openova.io/v1alpha1 group (per BLUEPRINT-AUTHORING.md §8): - XHetznerNetwork + composition-network.yaml — wraps hcloud_network + subnet - XHetznerFirewall + composition-firewall.yaml - XHetznerServer + composition-server.yaml - XHetznerLoadBalancer + composition-loadbalancer.yaml (lb11, 80→31080, 443→31443) - README documenting the canonical pattern 2. clusters/_template/ — the canonical per-Sovereign Flux Kustomization tree. Copied to clusters/<sovereign-fqdn>/ at provisioning time; cloud-init's GitRepository points at the result. - kustomization.yaml (root: flux-system + infrastructure + bootstrap-kit) - flux-system/ (placeholder for Flux self-config customization) - infrastructure/ (provider-hcloud + ProviderConfig referencing hcloud-credentials secret OpenTofu writes) - bootstrap-kit/ — 11 HelmRelease manifests in dependency order: 01-cilium → 02-cert-manager → 03-flux → 04-crossplane → 05-sealed-secrets → 06-spire → 07-nats-jetstream → 08-openbao → 09-keycloak → 10-gitea → 11-bp-catalyst-platform Each pulls from oci://ghcr.io/openova-io/bp-<name>:1.0.0 — the wrapper charts published by blueprint-release CI. dependsOn declarations enforce the canonical install order at runtime. 3. clusters/omantel.omani.works/ — the first concrete Sovereign instance. Mirror of _template with SOVEREIGN_FQDN_PLACEHOLDER substituted to omantel.omani.works. This is what the wizard's first omantel.omani.works run will actually reconcile. 4. 
products/catalyst/bootstrap/api/cmd/catalyst-dns/main.go — small Go binary the OpenTofu module's null_resource.dns_pool invokes via local-exec at Phase-0 apply time. Reads DYNADOT_API_KEY/SECRET/DOMAIN/SUBDOMAIN/LB_IP env vars; calls existing dynadot.Client.AddSovereignRecords. Containerfile already builds + ships it at /usr/local/bin/catalyst-dns. Architectural compliance (Lesson #24 closed): - No bespoke Go cloud-API calls (Crossplane Compositions are the canonical day-2 IaC) - No exec.Command("helm", ...) (Flux HelmReleases are the canonical install unit) - No kubectl apply from outside (cloud-init kubectl-applies one Flux GitRepository, then Flux owns everything) After this commit, the path is end-to-end: wizard → catalyst-api → tofu apply (with infra/hetzner/) → cloud-init installs k3s + Flux + applies GitRepository pointing at clusters/omantel.omani.works/ → Flux reconciles bootstrap-kit (11 HelmReleases in dependency order) → Crossplane adopts day-2 management.	2026-04-28 14:09:29 +02:00
hatiyildiz	62d9c7d936	fix(charts): drop dependencies block — wrappers carry values overlay only The first 2 blueprint-release CI runs failed on `helm package` with containerd permission errors because the wrapper Chart.yaml's `dependencies:` block triggered helm to pull the upstream charts via OCI/containerd at package time, which the GitHub Actions runner blocks. Architectural fix: each Catalyst Blueprint wrapper carries the values overlay + metadata only. The bootstrap installer reads the upstream chart reference from the wrapper's values.yaml `catalystBlueprint.upstream.{chart,version,repo}` metadata block, points `helm install` at the upstream chart's repo, and overlays our values. This keeps: - blueprint-release CI lightweight (no upstream pulls during package; helm package now works without containerd) - the "bp-<name> wrapper does NOT drift from upstream" property (we ship the overlay, not a fork) - the single Blueprint contract from BLUEPRINT-AUTHORING §1 (a wrapper is still a Catalyst-curated Helm chart published as bp-<name>:<semver>) Changes: - 11 platform/<name>/chart/Chart.yaml: removed dependencies block. Each is now a plain Helm chart with no remote pulls during package. - 11 platform/<name>/chart/values.yaml: prepended catalystBlueprint.upstream.{chart,version,repo} metadata block at the top. Bootstrap installer parses it to know which upstream chart to install with these values. - products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go: installCilium now does `helm repo add cilium https://helm.cilium.io --force-update` then `helm install cilium cilium/cilium --version 1.16.5 --values -` (the cilium/cilium upstream chart, with our overlay values piped from values.yaml). Same pattern needs propagating to the other 10 install functions in a follow-up. After this commit, blueprint-release CI should green-build all 11 wrappers (helm package now works without containerd access since there's nothing to pull). 
The bootstrap installer's actual `helm install` calls in production reach upstream chart repos via the runtime k3s cluster's pod network, which has full network access.	2026-04-28 12:57:29 +02:00
hatiyildiz	441ebaebb8	fix(charts): pin upstream chart versions/names to ones that exist in their repos The first Blueprint Release CI run (commit `8c0f766`) failed because four chart wrappers referenced upstream chart versions/names that don't exist in their published repositories: - platform/flux/chart: name was "flux", repo was OCI; actual is name "flux2" in plain helm repo at https://fluxcd-community.github.io/helm-charts. Pinned to 2.13.0. - platform/openbao/chart: version 2.1.0 was the binary appVersion, not the chart version. Pinned to 0.16.0 chart (which packages openbao 2.1.0 internally). - platform/keycloak/chart (Bitnami): chart version 25.0.6 was the appVersion of upstream; Bitnami's chart is at 24.7.1 packaging Keycloak 26.0.x. Pinned to 24.7.1. - platform/nats-jetstream/chart: name was "nats-jetstream"; the upstream chart is named "nats" (it always was — JetStream is a feature of NATS, not a separate chart). Renamed. Cilium, cert-manager, crossplane, sealed-secrets, spire wrappers were unaffected; their version pins matched upstream availability. Containerd permission-denied errors from `helm package` on cilium/cert-manager/crossplane/gitea/sealed-secrets are a separate CI plumbing issue (helm tries to pull OCI base images during package build via containerd, but the GitHub Actions runner blocks containerd socket access). Tracked as a follow-up: switch to `helm package --skip-refresh` or use a runner with containerd permissions. After this commit lands, the next blueprint-release CI run should green-build at minimum the 4 fixed charts. Successful builds publish bp-{flux,openbao,keycloak,nats-jetstream}:1.0.0 OCI artifacts to ghcr.io/openova-io/.	2026-04-28 12:55:21 +02:00

87 Commits