dba8a80c36
87 Commits
b8d7a8b9cf
fix(bp-seaweedfs): disable global.enableSecurity to avoid fromToml on helm-controller v1.1.0 (#339)
Upstream seaweedfs/seaweedfs templates/shared/security-configmap.yaml uses the Helm template function fromToml; helm-controller v1.1.0's bundled Helm SDK (a v3.x older than 3.13) doesn't define fromToml, so the install fails:

  parse error at security-configmap.yaml:21: function "fromToml" not defined

Setting global.seaweedfs.enableSecurity: false skips the entire template. The internal SeaweedFS API is cluster-IP only on Sovereign-1, so deferring chart-level security until helm-controller is bumped is acceptable.

Bumped 1.0.0 → 1.0.1. Unblocks the chain: bp-loki, bp-mimir, bp-tempo, bp-velero, bp-harbor, bp-grafana all dependsOn bp-seaweedfs.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
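A minimal values-override sketch of the workaround, assuming the standard umbrella layout where the upstream chart picks the toggle up via Helm's shared `global:` scope (the key name is from the commit subject; the file location is an assumption):

```yaml
# platform/seaweedfs/chart/values.yaml (hypothetical excerpt)
global:
  enableSecurity: false  # skips templates/shared/security-configmap.yaml entirely,
                         # whose fromToml call breaks helm-controller v1.1.0's
                         # bundled pre-3.13 Helm SDK
```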
9554be4a5e
fix(bp-external-secrets): gate ClusterSecretStore on CRD presence + drop delete-policy (#337)
The chart's post-install hook was failing on otech.omani.works:

  failed post-install: unable to build kubernetes object for deleting hook bp-external-secrets/templates/clustersecretstore-vault-region1.yaml: resource mapping not found for kind ClusterSecretStore in version external-secrets.io/v1beta1

Two corrections:
1. Capabilities-gate the entire template — don't render unless the ClusterSecretStore CRD is registered (it ships via the upstream ESO subchart but isn't live on first install).
2. Remove the 'before-hook-creation' delete-policy (the actual trigger for the 'deleting hook' failure path).

Bumped 1.0.0 → 1.0.1.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
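A minimal sketch of correction 1's gate shape (the store spec is elided; only the gate line and the hook annotation are the point):

```yaml
# templates/clustersecretstore-vault-region1.yaml (sketch)
{{- /* Correction 1: render nothing until the CRD is actually registered */}}
{{- if .Capabilities.APIVersions.Has "external-secrets.io/v1beta1/ClusterSecretStore" }}
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-region1
  annotations:
    helm.sh/hook: post-install,post-upgrade
    # Correction 2: no helm.sh/hook-delete-policy — 'before-hook-creation'
    # was the trigger for the "deleting hook" failure path
spec: {}  # provider wiring to bp-openbao elided
{{- end }}
```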
5502d9aa48
feat(dns): cert-manager-dynadot-webhook for DNS-01 wildcard TLS (closes #159) (#291)
Activates the previously-templated `letsencrypt-dns01-prod` ClusterIssuer
in bp-cert-manager by shipping the missing piece — a Go binary that
satisfies cert-manager's external webhook contract
(`webhook.acme.cert-manager.io/v1alpha1`) against the Dynadot api3.json.
Architecture
============
* `core/pkg/dynadot-client/` — canonical Dynadot HTTP client (shared with
pool-domain-manager and catalyst-dns). Encapsulates the api3.json
transport, command builders, response decoding, and the safe
read-modify-write semantics required to never accidentally wipe a
zone (memory: feedback_dynadot_dns.md). Destructive `set_dns2`
variant is unexported.
* `core/cmd/cert-manager-dynadot-webhook/` — the cert-manager webhook
binary. Implements `Solver.Present` via the client's append-only
`AddRecord` path and `Solver.CleanUp` via the read-modify-write
`RemoveSubRecord` path. Domain allowlist (`DYNADOT_MANAGED_DOMAINS`)
rejects challenges for unmanaged apexes BEFORE any Dynadot call.
* `platform/cert-manager-dynadot-webhook/` — Catalyst-authored Helm
wrapper. Templates Deployment + Service + APIService + serving
Certificate (CA chain via cert-manager Issuer self-signing) +
RBAC + ServiceAccount. Mirrors the standard cert-manager external-
webhook deployment shape.
* `platform/cert-manager/chart/` — flips `dns01.enabled: true` so the
paired ClusterIssuer activates. The interim http01 issuer remains
templated as the rollback path.
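A hedged sketch of the wrapper chart's values surface implied by the bullets above — only the APIService group (checked by the smoke command below) and the allowlist env-var name are confirmed by this message; every key name is illustrative:

```yaml
# platform/cert-manager-dynadot-webhook/chart/values.yaml (illustrative sketch)
groupName: acme.dynadot.openova.io   # must match the registered APIService
managedDomains: []                   # rendered into DYNADOT_MANAGED_DOMAINS;
                                     # challenges for unmanaged apexes are
                                     # rejected before any Dynadot call
image:
  repository: ghcr.io/openova-io/openova/cert-manager-dynadot-webhook
  tag: ""                            # SHA-pinned by CI
```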
Test results
============
core/pkg/dynadot-client — 7 tests PASS (race-clean)
core/cmd/cert-manager-dynadot-... — 9 tests PASS (race-clean)
Test coverage includes a Present/CleanUp round-trip against an
httptest fixture that models Dynadot's zone state, an explicit
unmanaged-domain rejection, a regression preserving a pre-existing
CNAME across the DNS-01 round-trip (the zone-wipe defence), and a
typed-error propagation test that surfaces `ErrInvalidToken` to
cert-manager so the controller will retry.
Helm template smoke render
==========================
`helm template` against the new chart with default values yields 12
resources / 424 lines (APIService, Certificate, ClusterRoleBinding,
Deployment, Issuer, Role, RoleBinding, Service, ServiceAccount). The
modified bp-cert-manager chart still renders both ClusterIssuers
(`letsencrypt-dns01-prod` + `letsencrypt-http01-prod`) with default
values; flipping `certManager.issuers.dns01.enabled=false` is the
clean rollback.
Smoke command (post-deploy)
===========================
kubectl get apiservices.apiregistration.k8s.io \
v1alpha1.acme.dynadot.openova.io
# Issue a *.<sovereign>.<pool> wildcard cert and watch the
# Order/Challenge progress through cert-manager.
CI
==
`.github/workflows/build-cert-manager-dynadot-webhook.yaml` mirrors the
pool-domain-manager-build pattern (cosign keyless signing, SBOM
attestation, GHCR push at `ghcr.io/openova-io/openova/cert-manager-
dynadot-webhook:<sha>`). Triggered by changes to either the binary or
the shared dynadot-client package.
Closes #159
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c09109a61a
feat(charts): bp-stunner + bp-knative + bp-kserve wrapper charts (closes #263 #264 #265) (#290)
Edge + serverless + model-serving batch (W2.5.C) — three upstream-subchart umbrella Blueprints completing the bootstrap-kit slots for WebRTC media relay (bp-relay → bp-stunner) and the AI/ML serving stack (bp-cortex → bp-kserve → bp-knative).

Each chart follows the canonical umbrella pattern from docs/BLUEPRINT-AUTHORING.md §11.1: Chart.yaml declares the upstream chart under `dependencies:` so `helm dependency build` bundles the upstream payload into the OCI artifact, and Catalyst-curated overlay values + templates sit alongside in chart/values.yaml + chart/templates/. A sketch of this Chart.yaml shape follows the per-chart notes below.

Per-chart highlights:
- bp-stunner/1.0.0 — wraps stunner/stunner-gateway-operator 1.1.0. Ships a Cilium-native GatewayClass (Capabilities-gated on gateway.networking.k8s.io/v1) so bp-relay (LiveKit / SFU) can claim Gateway CRs without an operator-ordering dance. Default UDP TURN port range 30000-32767 matches the range opened at the Sovereign edge firewall (Crossplane bp-firewall composition).
- bp-knative/1.0.0 — wraps knative-operator v1.21.1. Ships a KnativeServing CR pre-configured for istio-less mode (ingress.istio.enabled=false, ingress.contour.enabled=false, ingress.kourier.enabled=false; config.network.ingress-class=cilium). Sovereign FQDN sourced from values, no hardcoded fallback per inviolable principle #4 — render fails loudly if the cluster overlay doesn't set knativeOverlay.knativeServing.sovereignFqdn.
- bp-kserve/1.0.0 — wraps kserve/kserve v0.16.0 (latest version published on the official OCI registry as of 2026-04-30). Default deploymentMode=RawDeployment (no Knative hop on the hot path), but bp-knative is still installed (declared as a hard dep) so the per-InferenceService annotation `serving.kserve.io/deploymentMode: Serverless` opts in to scale-to-zero per tenant. Cilium native Gateway-API ingress (enableGatewayApi=true, className=cilium, disableIstioVirtualHost=true).

Observability discipline (issue #182): every observability toggle (ServiceMonitor, HPA, GatewayClass) defaults false and is operator-tunable via per-cluster overlay once bp-kube-prometheus-stack reconciles. Each chart ships tests/observability-toggle.sh covering default-off, opt-in (with `--api-versions monitoring.coreos.com/v1` to simulate Prometheus Operator CRDs), and explicit-off cases.

Per-chart kind summary (helm template default render):
bp-stunner: ClusterRole, ClusterRoleBinding, ConfigMap, Dataplane, Deployment, Role, RoleBinding, Service, ServiceAccount. (+ GatewayClass when --api-versions gateway.networking.k8s.io/v1 is passed.)
bp-knative: ClusterRole, ClusterRoleBinding, ConfigMap, CustomResourceDefinition, Deployment, KnativeServing, Role, RoleBinding, Secret, Service, ServiceAccount.
bp-kserve: Certificate, ClusterRole, ClusterRoleBinding, ClusterServingRuntime, ClusterStorageContainer, ConfigMap, Deployment, Gateway, Issuer, MutatingWebhookConfiguration, Role, RoleBinding, Service, ServiceAccount, ValidatingWebhookConfiguration.

`helm lint` clean for all three (single INFO on missing icon — icons land with marketplace card work). `bash tests/observability-toggle.sh` green for all three (3 cases each: default-off, opt-in, explicit-off).

Closes #263 #264 #265

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
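The umbrella Chart.yaml shape referenced above, sketched for bp-stunner (name and versions are from this message; the repository URL is an assumption):

```yaml
# platform/stunner/chart/Chart.yaml (sketch)
apiVersion: v2
name: bp-stunner
version: 1.0.0
dependencies:
  - name: stunner-gateway-operator
    version: "1.1.0"
    repository: https://l7mp.io/stunner   # assumed upstream repo URL
# `helm dependency build` vendors the upstream payload into charts/ so the
# published OCI artifact is self-contained (no hollow chart)
```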
782d8015c5
feat(charts): bp-openmeter (CH-less) + bp-livekit + bp-matrix wrapper charts (closes #272 #273 #274) (#289)
W2.5.F — three Catalyst Blueprint umbrella charts at platform/{openmeter,
livekit,matrix}/, each declaring its upstream chart under Chart.yaml
`dependencies:` so `helm dependency build` bundles the upstream payload
into the published OCI artifact (per docs/BLUEPRINT-AUTHORING.md §11.1
— hollow charts forbidden, CI-enforced by issue #181).
Per-chart kind summary
======================
bp-openmeter (closes #272)
default `helm template` kinds: ConfigMap, Deployment, Service, ServiceAccount
upstream chart: openmeter 1.0.0-beta.213 (oci://ghcr.io/openmeterio/helm-charts)
ClickHouse-less profile per docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §6.4.
The upstream chart's bundled clickhouse / kafka / postgresql / redis /
svix subcharts are all DISABLED — Catalyst supplies CNPG (postgres),
JetStream (event bus), and Valkey (redis-compat) at the platform tier.
Chart-level toggle `catalystBlueprint.backend.kind` (default `cnpg`,
alt `clickhouse`) records the active profile so observability/audit
pipelines can report it. The OpenMeter binary's
`aggregation.clickhouse.address` is left blank — per-Sovereign overlay
supplies it once a host cluster adds bp-clickhouse and the operator
re-rolls with `backend.kind: clickhouse`. Catalyst overlay templates
(NetworkPolicy / ServiceMonitor / HPA) all default OFF per
docs/BLUEPRINT-AUTHORING.md §11.2.
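A hedged sketch of the ClickHouse-less value shape described above (the upstream subchart toggle keys are assumptions; the catalystBlueprint key is named in the message):

```yaml
# platform/openmeter/chart/values.yaml (sketch)
openmeter:
  clickhouse: { enabled: false }   # all five bundled subcharts off — Catalyst
  kafka: { enabled: false }        # supplies CNPG, JetStream, and Valkey at
  postgresql: { enabled: false }   # the platform tier instead
  redis: { enabled: false }
  svix: { enabled: false }
catalystBlueprint:
  backend:
    kind: cnpg   # alt: clickhouse — recorded so observability/audit can report it
```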
bp-livekit (closes #273)
default `helm template` kinds: ConfigMap, Deployment, Service, ServiceAccount
upstream chart: livekit-server 1.9.0 (https://helm.livekit.io)
WebRTC SFU. Powers the Huawei iFlytek voice demo. Catalyst defaults
pair LiveKit with bp-stunner (the upstream chart's bundled co-located
TURN server is OFF; per-Sovereign overlay points the LiveKit TURN
config at the stunner UDP-gateway Service). RTC UDP port range is
50000-60000 (matches the Hetzner firewall rule the per-Sovereign
overlay opens). Catalyst overlay templates (NetworkPolicy /
ServiceMonitor / HPA) all default OFF; the chart's NetworkPolicy
template documents that LiveKit's hostNetwork mode means pod-level
policies do NOT cover the SFU port range — the firewall rule is the
load-bearing control. blueprint.yaml `depends:` declares bp-stunner +
bp-cert-manager + bp-valkey.
bp-matrix (closes #274)
default `helm template` kinds: ConfigMap, Deployment, Ingress, Job,
PersistentVolumeClaim, Pod, Role, RoleBinding, Secret, Service,
ServiceAccount
upstream chart: matrix-synapse 3.12.25 (https://ananace.gitlab.io/charts)
Synapse (the Matrix server implementation, NOT the retired OpenOva
product noun). Federation OFF by default (Catalyst per-Sovereign
tenancy default — operator overlays flip it on per-Organization).
Postgres backend via bp-cnpg externalPostgresql; OIDC SSO via
bp-keycloak; bundled bitnami postgresql + redis subcharts both
disabled. Catalyst overlay NetworkPolicy gates the federation port
(8448) on `federation.enabled` — verified by Case 5 of the
observability-toggle test. Catalyst-overlay ServiceMonitor (upstream
chart has none) + HPA both default OFF.
Lint
====
All three charts pass `helm lint` clean (only the noisy "icon is
recommended" INFO message).
Observability tests
===================
Each chart's `tests/observability-toggle.sh` enforces the Catalyst
contract from docs/BLUEPRINT-AUTHORING.md §11.2:
Case 1: default render produces zero monitoring.coreos.com/v1
resources (no ServiceMonitor / PrometheusRule).
Case 2: opt-in (--set serviceMonitor.enabled=true --api-versions
monitoring.coreos.com/v1) renders a ServiceMonitor.
Case 3: explicit-off render is clean.
Case 4 (per chart):
- openmeter: ClickHouse-less profile asserts no
clickhouse.altinity.com / Kafka subchart resources leak into the
default render.
- livekit: asserts upstream livekit-server.serviceMonitor.create
defaults false.
- matrix: asserts default render carries an empty
federation_domain_whitelist (the per-Sovereign tenancy default).
Case 5 (matrix only): `--set federation.enabled=true networkPolicy.enabled=true` opens port 8448 in the Catalyst NetworkPolicy.
All gates green for all three charts.
Closes #272 #273 #274
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
87d9a4afa7
feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288)
W2.5.E batch — three Application-tier Blueprints completing the LLM serving / workflow stack:

- bp-temporal/1.0.0 — wraps temporal/temporal 1.2.0 (the new chart rewrite that removed the cassandra:/mysql:/postgresql:/elasticsearch:/prometheus:/grafana: top-level keys in favour of server.config.persistence.datastores). Postgres-only via CNPG-backed visibility store (skip Cassandra). Web UI ON. Keycloak OIDC integration via --auth-claim-mapper renders an auth.yaml ConfigMap (operator wires it via additionalVolumes once bp-keycloak is reconciled, default OFF). dependsOn: bp-cnpg + bp-cert-manager. Closes #271. Kinds: Cluster (CNPG) + ConfigMap + Deployment + Job + Pod + Service.

- bp-llm-gateway/1.0.0 — wraps berriai/litellm-helm 0.1.572 from OCI. Subscription-aware proxy for Claude Code: routes to Anthropic (via operator OAuth/Max subscription — NEVER an ANTHROPIC_API_KEY, per memory/feedback_no_api_key.md), Bedrock, Vertex, OpenAI-compatible (via bp-anthropic-adapter), and self-hosted vLLM. CNPG-backed audit log (every prompt + response persisted for compliance). Bundled bitnami postgresql + redis subcharts DISABLED (db.useExisting=true points at the CNPG cluster). Keycloak SSO via auth.yaml ConfigMap (default OFF). ExternalSecret-backed environmentSecrets brings tokens / IAM creds in without inlining plaintext. dependsOn: bp-cnpg + bp-keycloak + bp-external-secrets. Closes #267. Kinds: Cluster (CNPG audit) + ConfigMap + Deployment + Job + Pod + Secret + Service + ServiceAccount.

- bp-anthropic-adapter/1.0.0 — Catalyst-authored scratch chart for the OpenAI ↔ Anthropic translation Go service. SHA-pinned image ghcr.io/openova-io/openova/anthropic-adapter:<sha> (Inviolable Principle #4a — GitHub Actions is the only build path; an empty default tag fails the render with a clear error instead of silently shipping :latest — see the template sketch below). OAuth/Max subscription token mounted from a K8s Secret materialized by ESO from bp-openbao — ANTHROPIC_OAUTH_TOKEN env var, NEVER an ANTHROPIC_API_KEY. Includes an OpenAI → Anthropic model-mapping ConfigMap (gpt-4 → claude-3-5-sonnet, gpt-4o-mini → claude-3-5-haiku, etc.). sigstore/common library subchart included to satisfy the hollow-chart gate (matches the bp-vllm pattern from #283). dependsOn: bp-external-secrets. Closes #268. Kinds: ConfigMap + Deployment + Service + ServiceAccount.

CRITICAL — bp-llm-gateway and bp-anthropic-adapter both consume the operator's Claude OAuth/Max subscription. Per memory/feedback_no_api_key.md and the user's standing instruction, neither chart accepts or generates an ANTHROPIC_API_KEY. Tokens flow exclusively through ExternalSecret-managed K8s Secrets that ESO materializes from bp-openbao at install time.

Per docs/BLUEPRINT-AUTHORING.md §11.2 (issue #182): every observability toggle defaults `false` (ServiceMonitor / metrics sidecar / PodMonitor) and is operator-tunable via per-cluster overlay once bp-kube-prometheus-stack reconciles. Each chart ships tests/observability-toggle.sh covering default-off, opt-in (with --api-versions monitoring.coreos.com/v1 to simulate the CRDs), and explicit-off cases. bp-anthropic-adapter additionally tests the never-:latest gate via Case 4 (empty image tag must fail render).

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every upstream version, namespace, server URL, role, secret name, model default, and toggle is exposed under values.yaml. Cluster overlays in clusters/<sovereign>/ may override without rebuilding the Blueprint OCI artifact.

Per docs/BLUEPRINT-AUTHORING.md §11.1 (umbrella shape — hard contract): bp-temporal and bp-llm-gateway declare their upstream charts under Chart.yaml dependencies: so helm dependency build bundles the upstream payload into the OCI artifact. bp-anthropic-adapter is a scratch chart (no upstream Helm chart exists) and includes sigstore/common as the obligatory hollow-chart-gate dependency, matching the bp-vllm precedent from W2.5.D (#283).

Closes #267 Closes #268 Closes #271

helm lint: 1 chart(s) linted, 0 chart(s) failed (each, INFO icon-recommended only)

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
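The never-:latest gate mentioned above, sketched with Helm's built-in `required` (exact error wording assumed; the principle — an empty default tag must fail the render — is from this message):

```yaml
# bp-anthropic-adapter templates/deployment.yaml (excerpt, sketch)
containers:
  - name: anthropic-adapter
    # Inviolable Principle #4a: GitHub Actions is the only build path, so the
    # tag defaults to "" and an unset value aborts the render loudly
    image: "{{ .Values.image.repository }}:{{ required "image.tag must be set to the CI build SHA" .Values.image.tag }}"
```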
a6bf07b0ce
feat(charts): bp-librechat wrapper chart (closes #275) (#287)
W2.5.G — Catalyst-authored scratch chart for LibreChat (slot 48 of the omantel-1 bootstrap-kit). LibreChat upstream does not publish a Helm chart, so this chart hand-wires the official ghcr.io/danny-avila/librechat container as Deployment + Service + Ingress + ConfigMap + ServiceAccount + NetworkPolicy + ServiceMonitor + HPA, with the sigstore/common library subchart declared to satisfy the hollow-chart gate (issue #181).

Per docs/BLUEPRINT-AUTHORING.md §11.2: every observability toggle (serviceMonitor, hpa) defaults false; opt-in via per-cluster overlay once kube-prometheus-stack reconciles. The ServiceMonitor template is double-gated by .Values.serviceMonitor.enabled AND Capabilities.APIVersions.Has "monitoring.coreos.com/v1" so flipping the toggle on a too-early Sovereign cannot break the bp-librechat reconcile (sketch below).

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every endpoint URL, model name, secret reference, namespace selector, and image tag is operator-tunable via values.yaml. The Sovereign FQDN, Keycloak issuer, llm-gateway URL, embeddings URL, and TLS ClusterIssuer are all operator-supplied at install time. The image tag is pinned to v0.7.5 (no :latest).

Connectors:
- Chat completions: bp-llm-gateway (OpenAI-compatible /v1/chat/completions) exposed as a "custom" endpoint named "Catalyst LLM"
- Embeddings (RAG): bp-bge — provider=bge maps to EMBEDDINGS_PROVIDER=openai + RAG_OPENAI_BASEURL=<bge.svc> at template-render time
- SSO: bp-keycloak (OpenID Connect) — issuer/clientId from values, client secret + session secret from ExternalSecret
- Conversation store: FerretDB on bp-cnpg (MongoDB wire protocol over Postgres) — operator-supplied connection URI

Hosted at chat-app.<sovereign-fqdn>; the chart `fail`s the render if ingress.host is empty (no platform-wide default).

helm template (default values, --set ingress.host=...): ConfigMap, Deployment, Ingress, NetworkPolicy, Service, ServiceAccount
helm template (--set hpa.enabled=true serviceMonitor.enabled=true --api-versions monitoring.coreos.com/v1): ConfigMap, Deployment, HorizontalPodAutoscaler, Ingress, NetworkPolicy, Service, ServiceAccount, ServiceMonitor

helm lint: 1 chart(s) linted, 0 chart(s) failed (single INFO on missing icon — icons land with the marketplace card work).
tests/observability-toggle.sh: PASS on default-off, opt-in (--api-versions monitoring.coreos.com/v1 to simulate the CRDs), and explicit-off cases.

Path isolation: only platform/librechat/ — no HR slot files, blueprint-release.yaml, or other charts touched. The HR slot files (clusters/.../48-librechat.yaml) and blueprint-release.yaml will land in a separate slot-wiring PR per the W2.K4 expansion plan.

Closes #275

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
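The double gate described above, as a minimal sketch (naming helper and spec elided):

```yaml
# platform/librechat/chart/templates/servicemonitor.yaml (sketch)
{{- if and .Values.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: bp-librechat   # naming helper elided
spec: {}               # selector + endpoints elided
{{- end }}
# toggle ON + CRD absent → renders nothing instead of failing the reconcile
```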
9dc8506dd9
feat(charts): bp-external-secrets + bp-cnpg + bp-valkey wrapper charts (#285)
Storage-substrate batch (W2.5.A) — closes #254 by shipping the three upstream-subchart umbrella Blueprints that the Flux HRs at clusters/_template/bootstrap-kit/{15-external-secrets,16-cnpg,17-valkey}.yaml (merged via PR #262) target.

Each chart follows the canonical umbrella pattern documented in docs/BLUEPRINT-AUTHORING.md §11.1: Chart.yaml declares the upstream chart under `dependencies:` so `helm dependency build` bundles the upstream payload into the OCI artifact, and Catalyst-curated overlay values + templates sit alongside in chart/values.yaml + chart/templates/.

Per-chart highlights:
- bp-external-secrets/1.0.0 — wraps external-secrets/external-secrets 0.10.7. Ships a default `vault-region1` ClusterSecretStore (via Helm post-install/post-upgrade hook to defer the CR application until the upstream chart's CRDs are registered) wired to the in-cluster bp-openbao service. The clusterSecretStore.enabled toggle lets cluster overlays opt out and author their own multi-region CRs.
- bp-cnpg/1.0.0 — wraps cnpg/cloudnative-pg 0.28.0. Operator-only surface (Cluster CRs are per-Application). CRDs ship in-chart so bp-powerdns / bp-keycloak / bp-gitea / bp-langfuse / bp-grafana / bp-temporal / bp-matrix / bp-llm-gateway / bp-bge / bp-nemo-guardrails / bp-openmeter / pool-domain-manager can `dependsOn: bp-cnpg` via Flux — closing #254 (bp-powerdns CreateContainerConfigError on pdns-pg-app secret).
- bp-valkey/1.0.0 — wraps bitnami/valkey 5.5.1. BSD-3 Redis-compatible cache, replication architecture, password auth ON, NetworkPolicy ON, replicas 0 by default for solo Sovereigns (cluster overlays bump for HA; see the values sketch below). Application-tier cache only — the Catalyst control plane uses NATS JetStream KV (per ARCHITECTURE.md §5).

Per docs/BLUEPRINT-AUTHORING.md §11.2 (issue #182): every observability toggle defaults `false` (ServiceMonitor / PodMonitor / PrometheusRule / metrics sidecar) and is operator-tunable via per-cluster overlay once bp-kube-prometheus-stack reconciles. Each chart ships tests/observability-toggle.sh covering default-off, opt-in (--api-versions monitoring.coreos.com/v1 to simulate the CRDs), and explicit-off cases.

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every upstream version, namespace, server URL, role, and password toggle is exposed under values.yaml. Cluster overlays in clusters/<sovereign>/ may override without rebuilding the Blueprint OCI artifact.

helm lint: 1 chart(s) linted, 0 chart(s) failed (each, INFO icon-recommended only)

helm template default render kinds:
bp-external-secrets: ClusterRole, ClusterRoleBinding, ClusterSecretStore, CustomResourceDefinition, Deployment, Role, RoleBinding, Secret, Service, ServiceAccount, ValidatingWebhookConfiguration
bp-cnpg: ClusterRole, ClusterRoleBinding, ConfigMap, CustomResourceDefinition, Deployment, MutatingWebhookConfiguration, Service, ServiceAccount, ValidatingWebhookConfiguration
bp-valkey: ConfigMap, NetworkPolicy, PodDisruptionBudget, Secret, Service, ServiceAccount, StatefulSet

Closes #254

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
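The bp-valkey values sketch referenced above, using the usual bitnami/valkey key layout (key paths are my assumption; the settings themselves are from this message):

```yaml
# platform/valkey/chart/values.yaml (sketch)
valkey:
  architecture: replication
  auth:
    enabled: true        # password auth ON
  networkPolicy:
    enabled: true
  replica:
    replicaCount: 0      # solo-Sovereign default; cluster overlays bump for HA
```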
ba2ff05292
feat(charts): bp-seaweedfs + bp-harbor + bp-vpa wrapper charts (#284)
W2.5.B — first authoring of the three Catalyst Blueprint wrapper charts
that fill bootstrap-kit slots 18 (seaweedfs), 19 (harbor) and 29 (vpa).
Each wraps an upstream chart as a Helm subchart and ships Catalyst-
curated overlay templates (NetworkPolicy + ServiceMonitor) gated behind
opt-in toggles, per docs/BLUEPRINT-AUTHORING.md §11 and
docs/INVIOLABLE-PRINCIPLES.md.
bp-seaweedfs (slot 18 — storage foundation)
- Wraps seaweedfs/seaweedfs 4.22.0; Chart name `bp-seaweedfs`.
- Catalyst defaults: 1 master + 3 volume + 1 filer + 2 s3 replicas.
- S3 API on 8333 — single S3 surface every consumer talks to per
docs/PLATFORM-TECH-STACK.md §3.5 (no per-app MinIO).
- Overlay templates: NetworkPolicy (cross-namespace S3 reachability,
cold-tier egress allowlist), ServiceMonitor (Capabilities-gated,
DEFAULT FALSE per §11.2).
- Default helm template kinds: ClusterRole, ClusterRoleBinding,
ConfigMap, Deployment, Secret, Service, ServiceAccount, StatefulSet.
bp-harbor (slot 19 — per-Sovereign OCI registry)
- Wraps goharbor/harbor 1.18.3 (appVersion 2.14.3); Chart name
`bp-harbor`.
- Catalyst defaults: blob backend = SeaweedFS S3 (regionendpoint
seaweedfs-s3.seaweedfs.svc:8333), metadata DB = bp-cnpg external
Postgres, ingress class `cilium`, expose.tls.enabled true (cert-
manager-issued Secret).
- Overlay templates: NetworkPolicy (CNPG/SeaweedFS/Keycloak egress),
ServiceMonitor (Capabilities-gated, DEFAULT FALSE).
- Trivy + SSO + pull-mirror are operator-flag opt-ins per per-
Sovereign overlay (default false; trivy/keycloak/cnpg deps land on
later slots).
- Default helm template kinds: ConfigMap, Deployment, Ingress,
PersistentVolumeClaim, Secret, Service, StatefulSet.
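A hedged sketch of the bp-harbor wiring above, using goharbor/harbor's storage and expose keys (the endpoint and class values are from this message; the URL scheme and exact key paths are assumptions):

```yaml
# platform/harbor/chart/values.yaml (sketch)
harbor:
  persistence:
    imageChartStorage:
      type: s3
      s3:
        regionendpoint: http://seaweedfs-s3.seaweedfs.svc:8333  # scheme assumed
  expose:
    type: ingress
    tls:
      enabled: true      # cert-manager-issued Secret
    ingress:
      className: cilium
  database:
    type: external       # metadata DB = bp-cnpg external Postgres
```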
bp-vpa (slot 29 — vertical autoscaling)
- Wraps cowboysysop/vertical-pod-autoscaler 11.1.1 (appVersion
1.5.0); Chart name `bp-vpa`.
- Catalyst defaults: 1 replica each of recommender + updater +
admission-controller. Default mode `Off` (recommend only).
- Admission webhook self-signs via init Job (cluster-internal); per-
Sovereign overlay MAY swap to cert-manager.
- Overlay templates: NetworkPolicy (apiserver + metrics-server
egress, admission webhook ingress).
- Upstream metrics.serviceMonitor / metrics.prometheusRule defaulted
false per §11.2.
- Default helm template kinds: ClusterRole, ClusterRoleBinding,
ConfigMap, Deployment, Job, Pod, Secret, Service, ServiceAccount.
Lint + observability-toggle results
helm lint: 1 chart(s) linted, 0 chart(s) failed (each)
tests/observability-toggle.sh: PASS on all three (default render has
zero monitoring.coreos.com/v1 references; opt-in render produces a
ServiceMonitor; explicit-off render is clean).
Path isolation: only platform/seaweedfs/, platform/harbor/, and
platform/vpa/ — no HR slot files or other charts touched.
Refs: bootstrap-kit slots 18, 19, 29 reconcile against
ghcr.io/openova-io/bp-seaweedfs:1.0.0, bp-harbor:1.0.0, bp-vpa:1.0.0
which this commit produces on next blueprint-release CI run.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
c3c9c0cf27
feat(charts): bp-vllm + bp-bge + bp-nemo-guardrails wrapper charts (#283)
Catalyst-authored umbrella charts for the W2.5.D AI-inference stack. None of the three upstream projects publish a Helm chart, so each chart hand-wires the upstream container as Deployment + Service + ConfigMap + ServiceMonitor + NetworkPolicy + HPA, with the sigstore/common library subchart declared to satisfy the hollow-chart gate (issue #181).

bp-vllm (slot 39) — wraps vllm/vllm-openai:v0.6.4. GPU-aware (nvidia.com/gpu when vllm.gpu.enabled=true; CPU fallback for dev — see the resources sketch below). Default model meta-llama/Llama-3.1-8B-Instruct, port 8000, OpenAI-compatible /v1/chat/completions. All engine knobs (maxModelLen, gpuMemoryUtilization, dtype, quantization, tensorParallelSize, prefix-caching) overlay-tunable. Closes #266.

bp-bge (slot 42) — wraps ghcr.io/huggingface/text-embeddings-inference:cpu-1.5. Default model BAAI/bge-small-en-v1.5 + BAAI/bge-reranker-base sidecar in the same Pod. Two-port Service (8080 embed, 8081 rerank) annotated for bp-llm-gateway discovery. CPU-friendly defaults; overlay swaps in BAAI/bge-m3 on GPU Sovereigns. Closes #269.

bp-nemo-guardrails (slot 43) — wraps the upstream NVIDIA/NeMo-Guardrails Dockerfile (nemoguardrails server, FastAPI, port 8000). LLM endpoint + model + engine all overlay-tunable; the Colang flow bundle mounts via configMap.externalName for production rails. A ConfigMap stub renders a default rail for smoke testing. Closes #270.

All three charts:
- Default observability toggles to false per BLUEPRINT-AUTHORING.md §11.2
- Pin upstream image tags (no :latest) per INVIOLABLE-PRINCIPLES.md #4
- Non-root securityContext (runAsUser 1000, drop ALL capabilities)
- prometheus.io scrape annotations on the Pod for fallback discovery
- Operator-tunable NetworkPolicy gating ingress to bp-llm-gateway and egress to HuggingFace / bp-vllm / bp-bge as appropriate

helm template (default values) per chart:
bp-vllm: ConfigMap, Deployment, Service, ServiceAccount
bp-bge: ConfigMap, Deployment, Service, ServiceAccount
bp-nemo-guardrails: ConfigMap, Deployment, Service, ServiceAccount

helm template (--set serviceMonitor.enabled=true networkPolicy.enabled=true hpa.enabled=true): all three render ConfigMap + Deployment + Service + ServiceAccount + ServiceMonitor + NetworkPolicy + HorizontalPodAutoscaler.

helm lint: 0 chart(s) failed for all three (single INFO on missing icon — icons land with the marketplace card work).

Closes #266 Closes #269 Closes #270

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
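The GPU toggle referenced above, as a minimal sketch (the value key is from this message; the template shape and GPU count are assumptions):

```yaml
# platform/vllm/chart/templates/deployment.yaml (excerpt, sketch)
resources:
  {{- if .Values.vllm.gpu.enabled }}
  limits:
    nvidia.com/gpu: 1   # GPU path; engine knobs stay overlay-tunable
  {{- else }}
  limits: {}            # CPU fallback for dev
  {{- end }}
```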
0cfd0defa9
fix(bp-langfuse): drop apostrophe from description to clear GHCR 500 (resolves #215) (#278)
Root cause: Helm's `helm push` collapses the chart `description` field into a single-line OCI manifest annotation `org.opencontainers.image.description`. The GHCR manifest-PUT validator returns a deterministic 500 Internal Server Error when that annotation is long AND contains an ASCII apostrophe. bp-langfuse 1.0.0 was the only chart in the observability batch (PR #214) carrying both characteristics, so it was the only one that failed to publish.

Fix: reword the affected sentence from "Langfuse's persistent state" to "the Langfuse persistent state" — drops the apostrophe, preserves the meaning, and crucially preserves every byte of the actual chart payload (values, templates, all 350 entries of the upstream langfuse-1.5.28 subchart with its 4-level-deep Bitnami vendoring). No runtime behavioural change; helm template renders the exact same 6 resources across 490 lines.

The narrowing was done by progressively reducing the Chart.yaml from the failing version to a passing version while pushing to a scratch GHCR namespace, with the bp-langfuse repo deleted between attempts (verified via `DELETE /orgs/openova-io/packages/container/bp-langfuse` and re-querying). The trigger is reproducible: long description + apostrophe → 500; long description without apostrophe → push succeeds; short description with apostrophe → push succeeds.

Added a multi-line WARNING comment immediately above `description:` documenting the trigger so future authors do not reintroduce a possessive form (sketch below). Issue #215 captures the full reproduction.

Closes #215

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
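A sketch of that guard comment (the comment wording here is mine; the trigger facts are from this message):

```yaml
# platform/langfuse/chart/Chart.yaml (excerpt, sketch)
# WARNING: keep this description apostrophe-free.
# `helm push` flattens it into the org.opencontainers.image.description OCI
# annotation, and GHCR's manifest-PUT validator 500s deterministically when
# that annotation is long AND contains an ASCII apostrophe (issue #215).
# Write "the Langfuse persistent state", never "Langfuse's persistent state".
description: "..."  # long, apostrophe-free description elided
```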
ec3821f7e1
fix(bp-*): event-driven HR install -- drop blanket timeout, use disableWait (#250)
Helm install completes when manifests apply, not when pods reach Ready. Flux dependsOn checks Ready=True on each HR independently, so spec.install.disableWait + spec.upgrade.disableWait is the correct shape for slow-Ready workloads (see the HelmRelease sketch below). Blanket spec.timeout: Nm watchdogs from PR #221 were a band-aid that caused cascading HR failures and blocked downstream HRs (bp-nats-jetstream, bp-openbao depended on bp-spire).

Founder direction (verbatim): "always event driven robust jobs"

Per-HR audit (drop spec.timeout: 15m, add disableWait, with reason):
- bp-cilium: envoyconfig CRD self-wait — agent crash-loops until its own CRDs land
- bp-cert-manager: webhook readiness depends on cainjector mutating Secret — multi-minute on cold start
- bp-flux: adopts cloud-init Flux objects; the helm-controller reconciling THIS HR is itself a chart target — Ready deadlock without disableWait
- bp-sealed-secrets: single-replica controller + CRD — install completes on manifest apply
- bp-spire: spire-controller-manager waits for CRD informer cache sync — multi-minute legitimate path; chart fix below
- bp-nats-jetstream: JetStream raft quorum formation across N replicas
- bp-openbao: 3-node Raft sealed-by-default; Ready=True only after operator runs `bao operator init` unseal flow
- bp-keycloak: DB schema migration + 100+ Liquibase changesets on first install
- bp-gitea: PostgreSQL DB init + admin user + Blueprint catalog mirror seeding
- bp-external-dns: pod readiness depends on PowerDNS API + pdns-pg CNPG cascade
- bp-catalyst-platform: ~10 services, inter-service NATS/OTel readiness is not Helm's concern

Intentionally NOT touched (other parallel agents own these):
- bp-crossplane (Agent A): chart split for intra-chart CRD-ordering
- bp-powerdns (Agent D): post-install hook for intra-chart Job-ordering

bp-spire chart fix (1.1.3 -> 1.1.4):
Root cause investigation on otech.omani.works (live): spire-controller-manager has restarted 37 times with:
  "failed to wait for clusterstaticentry caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.ClusterStaticEntry"
`kubectl get crd | grep spire` returns nothing — the spire.spiffe.io v1alpha1 CRDs (ClusterSPIFFEID / ClusterStaticEntry / ClusterFederatedTrustDomain) are NOT registered. The upstream `spire` chart does not install its own CRDs; the spiffe maintainers ship them via the SEPARATE `spire-crds` chart, expected to be installed first.

Fix: platform/spire/chart/Chart.yaml now declares spire-crds 0.5.0 as the FIRST dependency. Helm installs subcharts in dependency order, so listing spire-crds first guarantees the CRDs are applied before the spire subchart's controller-manager Deployment starts. blueprint.yaml + both 06-spire.yaml cluster references bumped to 1.1.4.

Live error this fixes (otech.omani.works, persistent ~5h):
  Helm upgrade failed for release spire-system/spire with chart bp-spire@1.1.3: context deadline exceeded
  + downstream cascade: bp-nats-jetstream / bp-openbao stuck at "dependency 'flux-system/bp-spire' is not ready"

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
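The HR shape referenced above, as a sketch (the apiVersion varies with the installed Flux; both disableWait fields are named in this message):

```yaml
# bootstrap-kit HelmRelease excerpt (sketch)
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
spec:
  # no blanket spec.timeout watchdog — Ready is event-driven, not clock-driven
  install:
    disableWait: true   # HR reports Ready once manifests apply cleanly
  upgrade:
    disableWait: true
```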
726af6df81
fix(bp-powerdns): self-generate api-credentials Secret + disable upstream zone-bootstrap Job (#248)
Root cause investigation on otech.omani.works (kubectl, sanitized):
$ kubectl get pods -n powerdns
create-zone-if-not-exist-sh-tjtr4 0/1 CreateContainerConfigError 4h
powerdns-57d7d49f99-{9hrb4,lxlgt,nkmht} 0/1 CreateContainerConfigError 4h
dnsdist-594dbfc5f-wznsw 1/1 Running 4h
$ kubectl get secrets -n powerdns
powerdns Opaque 1 4h
powerdns-api-tls-8kxpx Opaque 1 4h (NO `powerdns-api-credentials`, NO `pdns-pg-app`)
$ kubectl describe pod ... powerdns-57d7d49f99-9hrb4
Environment:
PDNS_API_KEY: <set to the key 'api-key' in secret 'powerdns-api-credentials'> Optional: false
PDNS_DB_HOST: <set to the key 'host' in secret 'pdns-pg-app'> Optional: false
State: Waiting Reason: CreateContainerConfigError
The handover's chicken-egg-with-secret theory was directionally right but
the cause was more fundamental:
1. Wrapper chart's api-credentials-secret.yaml (1.1.2) was a no-op
unless operator set `apiKey` value out-of-band — comment said the
deployment would "fail to start until the named Secret exists" as
"the explicit signal we want". On a Sovereign that bootstraps from
bp-* OCI artifacts, no operator is standing by, so the Secret is
never created and pods sit in CreateContainerConfigError forever.
2. The upstream chart's `create-zone-if-not-exists-sh` Job is rendered
whenever both `zoneName` and `api.key` are set — defaulting
`zoneName: "example.de."` it ALWAYS rendered and ALWAYS failed
(same missing Secret). Catalyst doesn't want this Job at all
because zones are loaded later by pool-domain-manager (PDM).
3. The chart's CNPG Cluster template is gated behind
Capabilities.APIVersions.Has "postgresql.cnpg.io/v1" — on a fresh
Sovereign without bp-cnpg yet (bp-cnpg is on the roadmap, not in
bootstrap-kit), no Cluster is rendered and `pdns-pg-app` Secret
never materialises. With Helm `--wait`, install times out
("context deadline exceeded") even though the manifests applied
cleanly.
Fix:
* api-credentials-secret.yaml: self-generate via Helm `lookup` +
`randAlphaNum 32`. First install creates fresh randoms; every
subsequent reconcile reads back the existing values from the
Secret so the API key never rotates on upgrade. Operator can
still pin specific values via .Values.powerdns.apiKey /
.Values.powerdns.webserverPassword, or skip Secret creation
entirely via .Values.powerdns.useExistingApiSecret. Same pattern
as bitnami/postgresql, bitnami/keycloak.
* values.yaml: set `powerdns.zoneName: ""` so upstream chart's
`{{- if and .Values.powerdns.zoneName .Values.powerdns.api.key }}`
gate skips the create-zone Job entirely. Catalyst's PDM creates
zones via the REST API after the cluster comes up; we don't want
a placeholder `example.de.` zone in production.
* HelmRelease (both _template and otech.omani.works overlays):
`install.disableWait: true` + `upgrade.disableWait: true` so the
HelmRelease reports Ready as soon as manifests apply cleanly,
rather than gating on powerdns Deployment readiness which depends
on bp-cnpg landing first to synthesise `pdns-pg-app`. Runtime
convergence is observed via kubectl, not gated on Helm.
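A hedged sketch of the lookup-based self-generation from the first fix bullet (webserver-password follows the same branch; exact template wording is assumed, the pattern mirrors the bitnami charts named above):

```yaml
# api-credentials-secret.yaml (sketch)
{{- if not .Values.powerdns.useExistingApiSecret }}
{{- $existing := lookup "v1" "Secret" .Release.Namespace "powerdns-api-credentials" }}
apiVersion: v1
kind: Secret
metadata:
  name: powerdns-api-credentials
type: Opaque
data:
  {{- if $existing }}
  api-key: {{ index $existing.data "api-key" }}   # reconcile: read back, never rotate
  {{- else }}
  api-key: {{ .Values.powerdns.apiKey | default (randAlphaNum 32) | b64enc }}  # first install
  {{- end }}
{{- end }}
```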
Live error this addresses:
Helm upgrade failed for release powerdns/powerdns with chart
bp-powerdns@1.1.2: context deadline exceeded
Verified locally with `helm template`:
- powerdns-api-credentials Secret renders with random api-key + webserver-password
- create-zone-if-not-exist-sh Job no longer rendered
- Deployment env continues to reference powerdns-api-credentials correctly
Bumped 1.1.2 -> 1.1.3 (chart, blueprint, both bootstrap-kit overlays).
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2d1799d738
fix(bp-crossplane): split XRDs+Compositions into bp-crossplane-claims (#247)
Resolves install ordering on fresh clusters where the apiserver rejects CompositeResourceDefinition CRs because the apiextensions.crossplane.io CRDs registered by the crossplane subchart aren't live yet at apply time.

- bp-crossplane bumped 1.1.2 -> 1.1.3 (controller-only payload)
- NEW bp-crossplane-claims@1.0.0 carries XRDs + Compositions
- Flux HelmRelease for crossplane-claims uses dependsOn: [bp-crossplane] (sketch below)
- composition-validate.sh + fixtures relocate to the new chart
- blueprint-release CI: opt-out annotation catalyst.openova.io/no-upstream=true permits zero-deps charts that legitimately ship only Catalyst-authored CRs (the original hollow-chart rule remains in force for every other umbrella chart)

Live error this fixes (from otech.omani.works):
  no matches for kind "CompositeResourceDefinition" in version "apiextensions.crossplane.io/v1" -- ensure CRDs are installed first

Pattern: intra-chart CRD-ordering breaks -> split charts + Flux dependsOn. Apply universally to similar cases going forward.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
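The Flux ordering referenced above, sketched (the apiVersion varies with the installed Flux):

```yaml
# bootstrap-kit HR for the new claims chart (sketch)
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-crossplane-claims
spec:
  dependsOn:
    - name: bp-crossplane   # the controller chart registers the
                            # apiextensions.crossplane.io CRDs first
```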
f658757962
fix(bp-crossplane): resolve CHART_DIR to absolute path in composition-validate.sh (#237)
CI invokes the script as `bash <script> "platform/crossplane/chart"` from
the repo root. The script then `cd`s into that relative path, which works,
but every later `"$CHART_DIR/<sub>"` reference (notably FIXTURE_DIR for
Case 6) inherits the now-stale relative prefix and resolves under the
wrong cwd. Fix: resolve CHART_DIR via `(cd ... && pwd)` to an absolute
path BEFORE the chdir.
Local repro before fix:
$ bash platform/crossplane/chart/tests/composition-validate.sh \
platform/crossplane/chart
...
Case 6: every fixture XRC kind is matched by an XRD
FAIL: fixtures dir platform/crossplane/chart/tests/fixtures missing
Local result after fix:
$ bash platform/crossplane/chart/tests/composition-validate.sh \
platform/crossplane/chart
...
Case 6: every fixture XRC kind is matched by an XRD
PASS
All bp-crossplane Day-2 CRUD Composition gates green.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8592d20919
feat(bp-crossplane): 6 XRDs + Compositions for Day-2 CRUD (RegionClaim/ClusterClaim/NodePoolClaim/LoadBalancerClaim/PeeringClaim/NodeActionClaim) (#236)
Adds the 6 CompositeResourceDefinitions and matching Compositions that back the catalyst-api Day-2 CRUD endpoints. catalyst-api writes XRCs of these kinds; Crossplane materialises them into provider-hcloud (and a small number of provider-kubernetes) managed resources. Per docs/INVIOLABLE-PRINCIPLES.md #3, every cloud-side op flows through provider-hcloud — never bespoke hcloud-go calls or shell-outs to the hcloud CLI.

XRDs (canonical group: compose.openova.io/v1alpha1; shape sketched below):
- RegionClaim → composes the Phase-0 quartet via provider-hcloud: Network + NetworkSubnet + Firewall + Server (cp1) + LoadBalancer + LoadBalancerNetwork + LoadBalancerService×2 + LoadBalancerTarget. Mirrors infra/hetzner/main.tf 1:1 so deletion of a RegionClaim cascades the whole slice.
- ClusterClaim → composes a provider-kubernetes Object that materialises a cluster-identity ConfigMap. The catalyst-environment-controller reads the CM to template per-server cloud-init.
- NodePoolClaim → composes up to 100 provider-hcloud Server resources. UPDATE flow: patching replicas n→m flips the per-index Required-policy gate so Crossplane creates/deletes Server CRs.
- LoadBalancerClaim → composes provider-hcloud LoadBalancer + LoadBalancerNetwork + up to 50 LoadBalancerService entries (per listener) + up to 50 LoadBalancerTarget entries. UPDATE: patch listeners[]/targets[] → composite controller adds/removes services/targets.
- PeeringClaim → composes 1 or 2 provider-hcloud Route resources (bidirectional flag toggles the second one through a Required-policy gate).
- NodeActionClaim → composes a provider-kubernetes Object that creates a batch/v1 Job running kubectl cordon/drain (k8s-side op, not a cloud op, per the task spec). action=replace additionally composes a provider-hcloud Server for the replacement node.

UPDATE/DELETE summary:
- UPDATE: every mutable schema field is patched onto the underlying managed resource; Crossplane's composite controller drives the diff and provider-hcloud reconciles to the new state.
- DELETE: every composed resource has deletionPolicy: Delete, so a cascade delete of the composite tears down the whole resource graph in dependency-safe order (Crossplane retries until deps unblock).

New tests:
- tests/composition-validate.sh — 7 gates: helm renders cleanly, exactly 6 XRDs, ≥ 6 Compositions, all 6 expected claim kinds present, every rendered doc is valid YAML, every fixture references a real XRD, and (when KUBECONFIG + Crossplane CRDs available) server-side dry-run for every fixture.
- tests/fixtures/<kind>-sample.yaml — one XRC fixture per kind.

Version bump:
- platform/crossplane/chart/Chart.yaml 1.1.1 → 1.1.2
- platform/crossplane/blueprint.yaml 1.1.1 → 1.1.2
- clusters/_template/bootstrap-kit/04-crossplane.yaml → 1.1.2
- clusters/otech.omani.works/bootstrap-kit/04-crossplane.yaml → 1.1.2

Hard rules respected:
- provider-hcloud only for cloud ops (never hcloud-go, never CLI).
- provider-kubernetes Object for k8s-side ops (never raw kubectl).
- No bespoke kubectl manifests for cloud resources.
- Frontend + catalyst-api Go code untouched (sibling-owned).
- Target state, no MVP framing — all 6 Compositions ship.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
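The XRD shape sketched for one of the six claims (the group and claim kind are from this message; the composite naming and the elided schema are assumptions):

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xregionclaims.compose.openova.io   # composite plural assumed
spec:
  group: compose.openova.io
  names:
    kind: XRegionClaim
    plural: xregionclaims
  claimNames:
    kind: RegionClaim        # the kind catalyst-api writes
    plural: regionclaims
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema: {}  # mutable fields elided
```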
c747fe2265
fix(bp-gitea): override postgresql to bitnamilegacy (Bitnami evacuated docker.io tags) (#231)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>

da87fb38c4
fix(bp-spire): disable ALL default-enabled clusterSPIFFEIDs (default+oidc+test-keys) (#230)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>

719c3bac35
fix(bp-spire): disable default ClusterSPIFFEID — CRD not observable in time on fresh install (#228)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>

1689ffcd1a
fix(bp-coraza,bp-syft-grype): add common library subchart to satisfy hollow-chart gate (#220)
Both charts are scratch (no upstream Helm chart published — the Coraza project and the anchore syft+grype CLIs ship containers only). The blueprint-release.yaml hollow-chart gate (issue #181) rejects charts with zero declared dependencies. Adding sigstore/common as a tiny library subchart satisfies the gate; common is a library-type chart, so it contributes zero runtime resources to either chart's rendered output (sketch below).

The Catalyst-side templates (Deployment+Service for bp-coraza, CronJob+PVC for bp-syft-grype) remain entirely in templates/ — the library dep is purely a CI-gate mechanism, NOT a functional dependency.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
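The gate-satisfying dependency, sketched (the chart name is from this message; the version range and index URL are assumptions):

```yaml
# Chart.yaml excerpt (sketch) — same shape in bp-coraza and bp-syft-grype
dependencies:
  - name: common                                        # library-type chart:
    version: "1.x.x"                                    # zero rendered resources
    repository: https://sigstore.github.io/helm-charts  # assumed index URL
```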
3a57e287e5
feat(platform): security umbrellas (falco/kyverno/trivy/sigstore/syft-grype/reloader/coraza/litmus) (#216)
Eight squashed per-chart commits — bp-falco, bp-kyverno, bp-trivy, bp-sigstore, bp-syft-grype, bp-reloader, bp-coraza, bp-litmus — each carrying the same message:

  feat(bp-<name>): umbrella chart for security layer

  Catalyst Blueprint umbrella chart for <name> — security/policy layer. Pinned upstream + appVersion verified against the helm index on 2026-04-30. ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Solo-Sovereign defaults; per-Sovereign overlays bump to HA later. Part of security-stack umbrellas batch 3.

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
75128781b3
feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214)
* feat(bp-grafana): umbrella chart for observability stack
  Catalyst Blueprint umbrella for Grafana — visualization layer of the LGTM observability stack (Loki/Grafana/Tempo/Mimir). Pinned to grafana/grafana 10.5.15 (appVersion 12.3.1) — current stable on 2026-04-29. Solo-Sovereign defaults: 1 replica, 10Gi PVC, ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-loki): umbrella chart for observability stack
  Catalyst Blueprint umbrella for Grafana Loki — log aggregation backend of the LGTM stack. SingleBinary mode by default (solo-Sovereign min); SimpleScalable/Distributed are values toggles. Pinned to grafana/loki 7.0.0 (appVersion 3.6.7) on 2026-04-29. Filesystem storage default; SeaweedFS S3 wiring is per-Sovereign overlay when scaling out. All observability toggles default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-mimir): umbrella chart for observability stack
  Catalyst Blueprint umbrella for Grafana Mimir — metrics storage tier of the LGTM stack. Pinned to grafana/mimir-distributed 6.0.6 (appVersion 3.0.4) on 2026-04-29. Solo-Sovereign defaults: every component scaled to 1 replica, zoneAwareReplication disabled, Kafka ingest-storage disabled. Bundled MinIO kept enabled as a stop-gap so the chart renders; SeaweedFS S3 wiring is per-Sovereign overlay. All metaMonitoring toggles default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-tempo): umbrella chart for observability stack
  Catalyst Blueprint umbrella for Grafana Tempo — distributed tracing backend of the LGTM stack. Single-binary mode by default (solo-Sovereign min); microservice mode (tempo-distributed) is a chart swap toggle. Pinned to grafana/tempo 1.24.4 (appVersion 2.9.0) on 2026-04-29. Local PVC storage default; SeaweedFS S3 wiring is per-Sovereign overlay. Metrics generator disabled by default (depends on bp-mimir). ServiceMonitor default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-alloy): umbrella chart for observability stack
  Catalyst Blueprint umbrella for Grafana Alloy — unified telemetry collector for the LGTM stack (logs, metrics, traces; OTLP-native). Pinned to grafana/alloy 1.8.0 (appVersion v1.16.0) on 2026-04-29. DaemonSet controller default (one Alloy per node) so node + container telemetry work out of the box. Empty Alloy config by default; per-Sovereign overlays populate forwarders to bp-loki/bp-mimir/bp-tempo once those reconcile. ServiceMonitor + ingress + CRDs default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-opentelemetry): umbrella chart for observability stack
  Catalyst Blueprint umbrella for the OpenTelemetry Collector — vendor-neutral telemetry collector. Sibling to bp-alloy; per-Sovereign overlays choose one. Pinned to open-telemetry/opentelemetry-collector 0.152.0 (appVersion 0.150.1) on 2026-04-29. Uses the contrib distribution (otel/opentelemetry-collector-contrib:0.150.1) so Loki/Mimir/Tempo exporters are bundled. Deployment mode default (1 replica); DaemonSet + StatefulSet are values toggles. All presets default false; ingress + ServiceMonitor + PodMonitor + PrometheusRule + NetworkPolicy default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-langfuse): umbrella chart for observability stack
  Catalyst Blueprint umbrella for Langfuse — LLM observability platform. Complements bp-grafana (infrastructure metrics) with AI-specific telemetry (traces, evaluations, prompts, cost attribution). Pinned to langfuse/langfuse 1.5.28 (appVersion 3.171.0) on 2026-04-29. Catalyst convention: ALL bundled Bitnami subcharts are disabled — PostgreSQL via cnpg.io/Cluster (bp-cnpg), Redis via bp-valkey, ClickHouse via bp-clickhouse, S3 via bp-seaweedfs. Per-Sovereign overlays wire external endpoints + Secret references. Telemetry to Langfuse Inc. defaulted false; signUpDisabled defaulted true. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-velero): umbrella chart for observability stack
  Catalyst Blueprint umbrella for Velero — Kubernetes-native backup and disaster recovery. Per platform/velero/README.md, ALL Velero output goes to SeaweedFS (Catalyst's unified S3 encapsulation), which transitions to a cloud archival backend on the cold tier. Pinned to vmware-tanzu/velero 12.0.1 (appVersion 1.18.0) on 2026-04-29. Bundled velero-plugin-for-aws:v1.14.0 init container so SeaweedFS S3 is reachable. backupsEnabled/snapshotsEnabled defaulted false at this layer (placeholders for backupStorageLocation); per-Sovereign overlays flip on after wiring SeaweedFS endpoint + credentials. ServiceMonitor + PodMonitor + PrometheusRule default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
fa0e3a494b
fix(bp-keycloak): pin to current Bitnami tag (closes #191) (#198)
* fix(bp-keycloak): pin to current Bitnami Keycloak tag (closes #191)

Bitnami consolidated their tag scheme around 2025-09 (see https://github.com/bitnami/charts/issues/30852). The chart was pinned to upstream bitnami/keycloak Helm chart 24.7.1, whose default image tag `bitnami/keycloak:26.2.4-debian-12-r0` now returns 404 in the Docker Hub registry — installs hit ImagePullBackOff (verified on omantel).

Changes:
- Upstream Bitnami chart: 24.7.1 -> 25.2.0 (latest, appVersion 26.3.3)
- Override image.registry/image.repository for every Bitnami image used by the chart (keycloak app, keycloak-config-cli, postgresql, postgres-exporter, os-shell) to point at `bitnamilegacy/*`, where the historic debian-12 tags are preserved (values sketch below)
- Replace deprecated `proxy: edge` with `proxyHeaders: "xforwarded"` (chart 25.x renamed the field; Catalyst fronts Keycloak with Cilium Gateway, which sets X-Forwarded-* headers)
- bp-keycloak chart version: 1.1.1 -> 1.1.2

Verification (registry HEAD via Bearer token):
  bitnami/keycloak:26.2.4-debian-12-r0 -> 404 (broken pin)
  bitnami/keycloak:26.3.3-debian-12-r0 -> 404 (registry move)
  bitnamilegacy/keycloak:26.3.3-debian-12-r0 -> 200
  bitnamilegacy/keycloak-config-cli:6.4.0-... -> 200
  bitnamilegacy/postgresql:17.6.0-debian-12-r0 -> 200
  bitnamilegacy/postgres-exporter:0.17.1-... -> 200
  bitnamilegacy/os-shell:12-debian-12-r50 -> 200

`helm template platform/keycloak/chart` renders cleanly; rendered images all resolve to the bitnamilegacy/* tags listed above.

Long-term follow-up (not blocking): bitnamilegacy is explicitly marked "no longer updated, may be removed in the future" — Catalyst should either build its own Keycloak image or migrate to the Bitnami Secure Image (BSI/Photon) catalog when chart support catches up. Tracked in the bp-keycloak description block.

* fix(bp-keycloak): bump blueprint.yaml version to match Chart.yaml 1.1.2

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
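A hedged sketch of the bitnamilegacy override shape (the affected images are listed in this message; the exact value paths under the bitnami chart are assumptions):

```yaml
# platform/keycloak/chart/values.yaml (excerpt, sketch)
keycloak:
  image:
    registry: docker.io
    repository: bitnamilegacy/keycloak        # historic debian-12 tags live here
  keycloakConfigCli:
    image:
      repository: bitnamilegacy/keycloak-config-cli
  postgresql:
    image:
      repository: bitnamilegacy/postgresql
  proxyHeaders: "xforwarded"                  # replaces deprecated `proxy: edge`
```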
bcd2e7980a
fix: hide CRD-emitting resources behind Capabilities gates (closes #190) (#200)
* fix(bp-external-dns): hide CRD-emitting resources behind Capabilities gates (refs #190)

Wrap the Catalyst overlay's ServiceMonitor and ExternalSecret templates in `.Capabilities.APIVersions.Has` checks so a cold install on a fresh Sovereign — where bp-kube-prometheus-stack and bp-external-secrets have not yet reconciled — no longer fails with `no matches for kind X in version Y`. The values toggles (`externalDns.serviceMonitor.enabled`, `externalDns.externalSecret.enabled`) remain — Capabilities is defense in depth so an operator flipping the toggle on a Sovereign that hasn't reached Phase 2 doesn't break the bp-external-dns reconcile.

Verified locally: `helm template` with toggles off renders 0 of these resources; with toggles ON and `--api-versions monitoring.coreos.com/v1 --api-versions external-secrets.io/v1beta1`, both render exactly once.

Bump version 1.1.0 → 1.1.2 to align with the Phase-1 architectural-fix wave from issue #190.

* fix(bp-powerdns): hide CRD-emitting resources behind Capabilities gates (refs #190)

Three Catalyst overlay templates emit resources whose CRDs ship in OTHER charts and were unconditionally rendered, causing a cold install of bp-powerdns to fail with `no matches for kind X` on a Sovereign that hasn't yet reconciled the upstream chart:
- cnpg-cluster.yaml → postgresql.cnpg.io/v1 Cluster (CRD ships in bp-cnpg)
- api-ingress.yaml → traefik.io/v1alpha1 Middleware (CRD ships with the Traefik controller; k3s ships it by default but a Sovereign overlay MAY disable Traefik in favour of cilium-only ingress)
- crossplane-floatingip.yaml → compose.openova.io/v1alpha1 HetznerFloatingIP (CRD ships when the Catalyst Crossplane composition family lands — see GAP DISCLOSURE in that template)

Each is wrapped in `.Capabilities.APIVersions.Has "<group>/<version>"`. The Traefik router-middleware annotation on the Ingress is similarly gated so the auth posture cleanly moves to the Sovereign's chosen ingress controller when Traefik is absent.

Verified locally: `helm template` with default values renders 0 of these resources; with `--api-versions postgresql.cnpg.io/v1 --api-versions traefik.io/v1alpha1 --api-versions compose.openova.io/v1alpha1` plus `--set crossplane.floatingIP.enabled=true`, all three render exactly once. Existing tests/observability-toggle.sh still passes.

Bump version 1.1.1 → 1.1.2.

* fix(bp-powerdns): bump blueprint.yaml to match Chart.yaml 1.1.2 after Capabilities gate work

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
||
|
|
1f5c76def1
|
fix(platform): sync blueprint.yaml versions with Chart.yaml (#199)
* feat(ui): Playwright cosmetic + step-flow regression guards
15 regression guards in products/catalyst/bootstrap/ui/e2e/cosmetic-
guards.spec.ts that fail HARD when each user-flagged defect class
returns:
1. card height drift from canonical 108px
2. reserved right padding eating description width
3. logo tile drift from per-brand LOGO_SURFACE
4. invisible glyph (white-on-white) via luminance proxy
5. wizard step order Org/Topology/Provider/Credentials/Components/
Domain/Review
6. legacy "Choose Your Stack" / "Always Included" tab labels
7. Domain step reachable before Components
8. CPX32 not the recommended Hetzner SKU
9. per-region SKU dropdown shows wrong provider catalog
10. provision page is .html (static) not SPA route
11. legacy bubble/edge DAG SVG markup on provision page
12. admin sidebar drift from canonical core/console (w-56 + 7 labels)
13. AppDetail uses tablist instead of sectioned layout
14. job rows navigate to /job/<id> instead of expand-in-place
15. Phase 0 banners (Hetzner infra / Cluster bootstrap) on AdminPage
Each test prints a failure message naming the canonical reference,
the source-of-truth file, and the data-testid PR needed (if any) so
the implementing agent has a precise target. No .skip() — per
INVIOLABLE-PRINCIPLES #2, missing components fail loud.
CI: .github/workflows/cosmetic-guards.yaml runs the suite on every
PR that touches products/catalyst/bootstrap/ui/** or core/console/**.
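The trigger is presumably along these lines (a sketch; the real workflow may define more jobs and filters):

```yaml
# .github/workflows/cosmetic-guards.yaml — trigger sketch
on:
  pull_request:
    paths:
      - "products/catalyst/bootstrap/ui/**"
      - "core/console/**"
```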
Docs: docs/UI-REGRESSION-GUARDS.md maps each test to the user's
original complaint, the canonical reference, and the green/red
semantics (5 tests intentionally RED on main today — they stay red
until the companion-agent's UI work lands).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(platform): sync blueprint.yaml versions with Chart.yaml so manifest-validation passes
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b0c1c07271 |
fix(bp-flux): align upstream flux2 version with cloud-init's flux install (no double-install destruction)
Live verified on omantel.omani.works (2026-04-29). bp-flux:1.1.1 shipped the fluxcd-community `flux2` subchart at 2.13.0 (= upstream Flux appVersion 2.3.0). Cloud-init pre-installed Flux core at v2.4.0 via `https://github.com/fluxcd/flux2/releases/download/v2.4.0/install.yaml`. helm-controller's reconcile of bp-flux ran `helm install` on top of the running v2.4.0 Flux; the chart's v2.3.0 CRD update failed apiserver admission with `status.storedVersions[0]: Invalid value: "v1": must appear in spec.versions`; Helm rolled back; the rollback DELETED every running Flux controller Deployment (helm-controller, source-controller, kustomize-controller, image-automation-controller, image-reflector-controller, notification-controller). The cluster lost its GitOps engine — no further HelmRelease could progress, and the only recovery was full `tofu destroy` + reprovision.
This is OPTION C of the architectural fix proposed in the incident memo: version-align cloud-init's flux2 install with the bp-flux umbrella chart's `flux2` subchart, so a single upstream Flux release is installed and helm-controller adopts it on first reconcile rather than reinstalling on top with a different version.
Changes:
* `infra/hetzner/cloudinit-control-plane.tftpl` — kept the install.yaml URL pinned at v2.4.0 (deliberate; this is the source of truth) and added the CRITICAL VERSION-PIN INVARIANT comment block documenting the failure mode.
* `platform/flux/chart/Chart.yaml` — bumped the `flux2` subchart dep from 2.13.0 to 2.14.1. The community chart 2.14.1 carries appVersion 2.4.0, matching cloud-init exactly. Bumped chart version 1.1.1 -> 1.1.2.
* `platform/flux/chart/values.yaml` — `catalystBlueprint.upstream.version` mirror of the dep pin moved from 2.13.0 to 2.14.1.
* `clusters/_template/bootstrap-kit/03-flux.yaml` and `clusters/omantel.omani.works/bootstrap-kit/03-flux.yaml` — bumped the bp-flux HelmRelease to 1.1.2 + added explicit `install.disableTakeOwnership: false`, `upgrade.disableTakeOwnership: false`, and `upgrade.preserveValues: true` so helm-controller adopts the cloud-init-installed Flux objects rather than rolling back on ownership conflict.
* `products/catalyst/chart/Chart.yaml` — bumped the bp-catalyst-platform umbrella 1.1.1 -> 1.1.2, with the bp-flux dep bumped to 1.1.2.
* `clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml` and `clusters/omantel.omani.works/bootstrap-kit/13-bp-catalyst-platform.yaml` — bumped the HelmRelease to 1.1.2.
* `platform/flux/chart/tests/version-pin-replay.sh` — NEW. Six-case catastrophic-failure replay test:
  Case 1: Chart.yaml declares the flux2 subchart with an explicit version.
  Case 2: cloud-init pins the flux2 install.yaml to an explicit v-tag.
  Case 3: the chart's flux2 subchart appVersion equals cloud-init's pinned upstream version (the load-bearing invariant).
  Case 4: values.yaml metadata mirrors the Chart.yaml dep pin.
  Case 5: helm template renders cleanly + contains the four core Flux controllers.
  Case 6: the replay test rejects a planted mismatched fake Chart.yaml (the gate's own self-test — proves the gate works).
  All six cases green locally; the new test joins the existing observability-toggle test in tests/.
* `docs/RUNBOOK-PROVISIONING.md` — new section "bp-flux double-install — version-pin invariant" documenting the failure mode, the four pin-sites, the safe bump procedure, and the existing-Sovereign recovery path (full reprovision).
Existing Sovereigns running 1.1.1: no in-place recovery is possible once the rollback has fired. Reprovision required against 1.1.2.
Per docs/INVIOLABLE-PRINCIPLES.md #3 (architecture as documented) + #4 (never hardcode) — the version pins remain operator-bumpable via PR, but BOTH cloud-init's URL AND the chart's subchart MUST move together in the same PR; the CI gate tests/version-pin-replay.sh enforces this.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
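A sketch of the adoption-related HelmRelease fields named above; the field names come from the commit, while the surrounding chart/source wiring is trimmed and illustrative:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-flux
  namespace: flux-system
spec:
  chart:
    spec:
      chart: bp-flux
      version: "1.1.2"
      sourceRef:
        kind: HelmRepository      # illustrative; bp-* charts publish to OCI
        name: catalyst-blueprints
  install:
    disableTakeOwnership: false   # adopt cloud-init's Flux objects on first reconcile
  upgrade:
    disableTakeOwnership: false
    preserveValues: true          # don't roll back on ownership conflict
```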
||
|
|
4265884d58 |
feat(bp-external-dns): umbrella chart + add to bootstrap-kit Kustomization
Convert platform/external-dns/chart/ from a metadata-only wrapper to a
proper Helm umbrella that pulls kubernetes-sigs/external-dns 1.15.2
(appVersion 0.15.1, k8s 1.31-validated) as a Helm subchart, mirroring
the bp-cilium / bp-cert-manager / bp-powerdns shape. Native PowerDNS
provider speaks the bp-powerdns REST API directly via the
EXTERNAL_DNS_PDNS_API_KEY env var sourced from the
powerdns-api-credentials Secret bp-powerdns renders.
Catalyst overlay templates added (default-off where applicable per the
observability-toggle rule for the bp-* family):
- templates/networkpolicy.yaml (default ON; egress to powerdns +
cluster DNS + apiserver only)
- templates/servicemonitor.yaml (default OFF)
- templates/externalsecret.yaml (default OFF; Phase-2 OpenBao path)
- templates/_helpers.tpl
Bootstrap-kit Kustomization gets a new 12-external-dns.yaml HelmRelease
referencing bp-external-dns:1.1.0 with dependsOn bp-cert-manager +
bp-powerdns, and the legacy 11-bp-catalyst-platform.yaml is renumbered
13- so the install ordering reads in canonical Phase-0 sequence. Mirrored
to clusters/omantel.omani.works/bootstrap-kit/ with the SOVEREIGN_FQDN
substitution applied.
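The new HelmRelease presumably looks roughly like this (a sketch — the dependsOn entries and version come from the commit, the source wiring is illustrative):

```yaml
# clusters/_template/bootstrap-kit/12-external-dns.yaml — sketch
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-external-dns
  namespace: flux-system
spec:
  dependsOn:                  # runtime install ordering, owned by Flux
    - name: bp-cert-manager
    - name: bp-powerdns
  chart:
    spec:
      chart: bp-external-dns
      version: "1.1.0"
      sourceRef:
        kind: HelmRepository  # illustrative source wiring
        name: catalyst-blueprints
```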
bp-catalyst-platform Chart.yaml drops bp-external-dns from its
dependency block — install ordering for ExternalDNS is now owned by Flux
dependsOn at the Kustomization layer rather than this umbrella's Helm
dependency graph. Bumped 1.1.0 → 1.1.1 to reflect the dep removal, and
the bootstrap-kit HelmRelease references in both clusters bumped in
lockstep.
Wrapper chart version bumped 1.0.0 → 1.1.1 (umbrella shape).
Local gates pass:
- helm dependency build (pulls external-dns-1.15.2.tgz)
- helm lint (0 failures)
- helm template smoke render (245 lines, 6 kinds rendered)
- helm package + tar -tzf verifies the external-dns subchart inside the
packaged tgz (subchart-guard simulation passes)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
31d5911221
|
Merge pull request #185 from openova-io/fix/bp-charts-observability-toggles-default-false
fix(bp-*): observability toggles default false (v1.1.1) |
||
|
|
1ddd569789 |
fix(bp-*): observability toggles default false — break circular CRD dependency
Extends the v1.1.1 hardening that started with cilium / cert-manager /
crossplane to the remaining 8 bootstrap-kit + per-Sovereign Blueprints.
Every observability toggle in every Catalyst-curated Blueprint now ships
`false`/`null` by default; the operator opts in via a per-cluster values
overlay at clusters/<sovereign>/bootstrap-kit/* once
bp-kube-prometheus-stack reconciles.
Live failure mode that prompted this (omantel.omani.works 2026-04-29):
bp-cilium @ 1.1.0 defaulted hubble.relay/ui + prometheus.serviceMonitor
to true. The upstream Cilium 1.16.5 chart renders a
monitoring.coreos.com/v1 ServiceMonitor whose CRD ships with
kube-prometheus-stack — a tier-2 Application Blueprint that depends on
the bootstrap-kit (cilium first). Helm install fails on a fresh
Sovereign with "no matches for kind ServiceMonitor in version
monitoring.coreos.com/v1 — ensure CRDs are installed first" and every
downstream HelmRelease reports `dep is not ready`. The earlier
trustCRDsExist=true mitigation only suppresses Helm's render-time gate;
the apiserver still rejects the resource at install-time.
Per-Blueprint changes:
- bp-cilium: hubble.relay.enabled, hubble.ui.enabled → false;
hubble.metrics.enabled → null (this is the exact value that disables
the upstream metrics ServiceMonitor template branch — verified by
reading cilium 1.16.5's _hubble.tpl); hubble.metrics.serviceMonitor
.enabled → false. tests/observability-toggle.sh extended with Case 4
(default render produces no hubble-relay / hubble-ui Deployments); the
resulting defaults are sketched after this list.
- bp-flux: flux2.prometheus.podMonitor.create → false.
- bp-sealed-secrets: sealed-secrets.metrics.serviceMonitor.enabled
→ false (explicit lock; upstream already defaults false).
- bp-spire: spire.global.spire.recommendations.enabled +
recommendations.prometheus → false.
- bp-nats-jetstream: nats.promExporter.enabled +
promExporter.podMonitor.enabled → false.
- bp-openbao: openbao.injector.metrics.enabled +
openbao.serviceMonitor.enabled → false.
- bp-keycloak: keycloak.metrics.enabled + metrics.serviceMonitor.enabled
+ metrics.prometheusRule.enabled → false.
- bp-gitea: gitea.gitea.metrics.* and gitea.postgresql.metrics.*
serviceMonitor + prometheusRule → false.
- bp-powerdns: powerdns.serviceMonitor.enabled + powerdns.metrics.enabled
→ false (forward-compatibility guard; current upstream
pschichtel/powerdns 0.10.0 has no ServiceMonitor template, but a future
upstream bump cannot silently regress).
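As referenced in the bp-cilium item, a sketch of the resulting wrapper defaults — the key paths and the `null` semantics come straight from the list above:

```yaml
# platform/cilium/chart/values.yaml — default-off sketch
cilium:
  hubble:
    relay:
      enabled: false
    ui:
      enabled: false
    metrics:
      # null, not false — the value that skips the upstream metrics
      # ServiceMonitor template branch in cilium 1.16.5's _hubble.tpl
      enabled: null
      serviceMonitor:
        enabled: false
```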
Each chart ships a tests/observability-toggle.sh that asserts the rule
in three cases (default off / explicit on opt-in / explicit off) — runs
under blueprint-release.yaml's chart-test gate (added
|
||
|
|
02b5b6c4c8 |
fix(bootstrap-kit): override cilium + cert-manager values to disable observability toggles
Live verified on omantel: bp-cilium and bp-cert-manager v1.1.0 fail Helm install with 'no matches for kind ServiceMonitor in version monitoring.coreos.com/v1'. A manual kubectl patch of the live HelmRelease worked, but Flux's 15-min reconcile rolls the patch back because the HelmRelease CR is owned by the kustomize-controller from git. Override the values inline in the HelmRelease manifests so the patch is durable across Flux reconciles. Same pattern as the in-flight observability-toggle agent will apply to all 12 charts in the next chart bump (v1.1.1). This is the manifest-level workaround that unblocks the running omantel cluster TODAY, without waiting for the v1.1.1 publish. Mirrors the patches into both clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/bootstrap-kit/ so future Sovereigns inherit. |
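The durable shape is inline values in the git-owned HelmRelease, roughly as follows (a sketch; metadata and chart wiring trimmed):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-cilium
  namespace: flux-system
spec:
  # lives in git, so kustomize-controller re-applies it on every
  # reconcile — unlike a kubectl patch of the live CR, which Flux reverts
  values:
    cilium:
      prometheus:
        enabled: false
        serviceMonitor:
          enabled: false
```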
||
|
|
b1638f51ea |
fix(bp-* tests): skip helm dep build when charts/ already vendored
Earlier rerun failure on the CI workflow (bp-cert-manager 25120060270):
  Error: no repository definition for https://charts.jetstack.io. Please add the missing repos via 'helm repo add'
Root cause: blueprint-release.yaml's earlier `helm dependency build` step (line 181) successfully resolves the upstream chart and populates chart/charts/ — but it does NOT `helm repo add` the upstream repo first. Helm 3.20's `helm dep build` succeeds on the first call by falling back to direct-URL fetch from Chart.yaml `dependencies[].repository`. A SECOND `helm dep build` (run by the test script) hits a different code path that requires the repo to be in the helm repo cache.
Fix: tests/observability-toggle.sh now skips `helm dep build` when chart/charts/ is already populated (which is always the case in CI, since the workflow's own `helm dependency build` step ran first). Local dev runs from a fresh checkout still resolve subcharts.
Refs #182
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d34facc040 |
fix(bp-*): observability toggles default false — break circular CRD dependency
bp-cilium@1.1.0 install fails on every fresh Sovereign with:
no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
— ensure CRDs are installed first
Cascades to all 10 other bp-* HelmReleases ("dep is not ready") since
bp-cilium is the root of the bootstrap dep graph. Verified live on
omantel.omani.works 2026-04-29 (issue #182).
Root cause: platform/cilium/chart/values.yaml and
platform/cert-manager/chart/values.yaml hardcoded
`serviceMonitor.enabled: true`. The monitoring.coreos.com/v1 CRDs ship
with kube-prometheus-stack — an Application-tier Blueprint that itself
depends on the bootstrap-kit. Hardcoding `true` creates a circular CRD
ordering: bp-cilium wants the CRD bp-kube-prometheus-stack provides, but
bp-kube-prometheus-stack cannot install before bp-cilium.
The `trustCRDsExist=true` mitigation only suppresses Helm's render-time
gate; the apiserver still rejects the resource at install-time.
Violates INVIOLABLE-PRINCIPLES.md #4 (never hardcode): observability
toggles MUST be operator-tunable, not chart-level constants assuming an
observability tier exists.
This commit:
A. Defaults every observability toggle false in the affected wrappers:
- platform/cilium/chart/values.yaml:
cilium.prometheus.enabled: false
cilium.prometheus.serviceMonitor.enabled: false
(trustCRDsExist removed — no longer relevant)
- platform/cert-manager/chart/values.yaml:
cert-manager.prometheus.enabled: false
cert-manager.prometheus.servicemonitor.enabled: false
- platform/crossplane/chart/values.yaml:
crossplane.metrics.enabled: false
(uniformity rule — does not break install but holds the invariant)
B. Bumps affected wrapper charts 1.1.0 → 1.1.1:
- bp-cilium, bp-cert-manager, bp-crossplane (leaves)
- bp-catalyst-platform (umbrella; deps repinned to 1.1.1 for the 3)
C. Updates clusters/_template/bootstrap-kit/* and
clusters/omantel.omani.works/bootstrap-kit/* HelmRelease versions to
1.1.1 so the live Sovereign picks up the fix on Flux reconcile.
D. Adds platform/<name>/chart/tests/observability-toggle.sh under each
affected chart. Each script asserts:
- default render produces zero monitoring.coreos.com refs
- opt-in render with --set <toggle>=true succeeds and produces a
ServiceMonitor (proves the toggle is wired)
- explicit-off render succeeds and produces zero refs
Wired into .github/workflows/blueprint-release.yaml via a new
"Run chart integration tests" step that executes every chart/tests/
*.sh on every publish — a regression that re-introduces a hardcoded
`true` fails the publish job before the OCI artifact is pushed.
E. Documents the rule in docs/BLUEPRINT-AUTHORING.md §11.2
"Observability toggles must default false". References Principle #4
and provides the canonical pattern (default off in wrapper values,
opt-in via per-cluster overlay at clusters/<sovereign>/...).
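The opt-in half of that canonical pattern, per cluster, is presumably shaped like this (a sketch; path and metadata illustrative, flipped only once bp-kube-prometheus-stack has reconciled):

```yaml
# clusters/<sovereign>/bootstrap-kit/01-cilium.yaml — opt-in overlay sketch
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-cilium
  namespace: flux-system
spec:
  values:
    cilium:
      prometheus:
        enabled: true          # operator opt-in, Phase 2 onwards
        serviceMonitor:
          enabled: true
```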
Per-chart audit table (which toggle was hardcoded → new default):
| Chart | Toggle | Was | Now |
|------------------|----------------------------------------------------------|------|-------|
| bp-cilium | cilium.prometheus.enabled | true | false |
| bp-cilium | cilium.prometheus.serviceMonitor.enabled | true | false |
| bp-cert-manager | cert-manager.prometheus.enabled | true | false |
| bp-cert-manager | cert-manager.prometheus.servicemonitor.enabled | true | false |
| bp-crossplane | crossplane.metrics.enabled | true | false |
| bp-flux | (no observability hardcodes) | n/a | n/a |
| bp-sealed-secrets| (no observability hardcodes) | n/a | n/a |
| bp-spire | (no observability hardcodes) | n/a | n/a |
| bp-nats-jetstream| (no observability hardcodes) | n/a | n/a |
| bp-openbao | (no observability hardcodes) | n/a | n/a |
| bp-keycloak | (no observability hardcodes) | n/a | n/a |
| bp-gitea | (no observability hardcodes) | n/a | n/a |
| bp-powerdns | (no observability hardcodes) | n/a | n/a |
| bp-catalyst-platform | (umbrella, no values overlay) | n/a | n/a |
Local gates green:
helm dep build ✓ all 3 affected charts
helm lint ✓ all 3
helm template ✓ all 3 — 0 monitoring.coreos.com refs in default
tests/observability-toggle.sh ✓ all 9 sub-cases pass
Closes the install path for bp-cilium 1.1.1 on a fresh Sovereign;
unblocks the full bp-* dep graph.
Refs: https://github.com/openova-io/openova/issues/182
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
43aff20254 |
feat(bp-*): convert all 11 bootstrap-kit charts to umbrella charts depending on upstream
Each platform/<name>/chart/Chart.yaml now declares the canonical upstream chart as a dependencies: entry. helm dependency build pulls the upstream payload into the OCI artifact at publish time, so a Flux helm install of bp-<name>:1.1.0 actually installs the upstream Helm release alongside the Catalyst-curated overlays (NetworkPolicy, ServiceMonitor, ClusterIssuer, ExternalSecret) under templates/.
Pinned upstream chart versions per platform/<name>/blueprint.yaml:
- cilium 1.16.5 https://helm.cilium.io
- cert-manager v1.16.2 https://charts.jetstack.io
- flux 2.4.0 https://fluxcd-community.github.io/helm-charts
- crossplane 1.17.x https://charts.crossplane.io/stable
- sealed-secrets 2.16.x https://bitnami-labs.github.io/sealed-secrets
- spire ... https://spiffe.github.io/helm-charts-hardened
- nats-jetstream ... https://nats-io.github.io/k8s/helm/charts
- openbao ... https://openbao.github.io/openbao-helm
- keycloak ... https://charts.bitnami.com/bitnami
- gitea ... https://dl.gitea.com/charts
- catalyst-platform — umbrella over the 10 leaf bp-* charts via helm dependencies
values.yaml in each chart adopts the umbrella convention: the catalystBlueprint metadata block (provenance + version) at top level, upstream subchart values namespaced under the dependency name.
cert-manager specifically: clusterissuer-letsencrypt-dns01.yaml gets the helm.sh/hook: post-install,post-upgrade annotation so it applies AFTER cert-manager controllers are running and CRDs registered (the previous hollow-chart shape ran the ClusterIssuer at install time, when CRDs didn't exist yet — the omantel cluster's exact failure mode).
Wrapper chart version bumped 1.0.0 → 1.1.0 across the board (umbrella conversion is a meaningful structural revision). Cluster manifests in clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/bootstrap-kit/ updated to reference 1.1.0.
The blueprint-release.yaml workflow's helm package step needs an explicit helm dependency build before push so the upstream subchart bytes ship inside the OCI artifact. That CI change is a follow-up commit on this same branch (separate file scope). |
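The umbrella shape for one leaf, as a sketch (the pin and repo are from the list above; the file layout is illustrative):

```yaml
# platform/cilium/chart/Chart.yaml — umbrella sketch
apiVersion: v2
name: bp-cilium
version: 1.1.0
dependencies:
  - name: cilium
    version: "1.16.5"
    repository: https://helm.cilium.io
```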
||
|
|
67fdecb770 | merge: remove k8gb (#171) | ||
|
|
f5daac52af |
refactor(platform): remove k8gb — replaced by PowerDNS lua-records (#171)
PowerDNS lua-records (`ifurlup`, `pickclosest`, `ifportup`) cover everything k8gb was doing — geo-aware response selection, health-checked failover, weighted round-robin — at the authoritative DNS layer. Eliminates a separate K8s controller, CRD set, and CoreDNS plugin from every Sovereign.
Changes:
- platform/k8gb/ deleted (Chart.yaml, values.yaml, blueprint.yaml never authored — only a README existed)
- products/catalyst/bootstrap/ui/public/component-logos/k8gb.svg deleted
- componentGroups.ts: remove the k8gb component (PowerDNS already there)
- componentLogos.tsx: drop logo_k8gb + the k8gb map entry
- model.ts DEFAULT_COMPONENT_GROUPS spine: replace k8gb with powerdns
- StepInfrastructure.tsx: copy refers to PowerDNS lua-records, not k8gb
- provision.html: replace the k8gb tile and edges with powerdns
- catalog.generated.ts regenerated (now includes bp-powerdns)
- docs sweep — every k8gb reference in PLATFORM-TECH-STACK, NAMING-CONVENTION, SOVEREIGN-PROVISIONING, SRE, ARCHITECTURE, GLOSSARY, COMPONENT-LOGOS, IMPLEMENTATION-STATUS, BUSINESS-STRATEGY, TECHNOLOGY-FORECAST, README, infra/hetzner/README, and the platform READMEs (cilium, external-dns, failover-controller, litmus, flux, opentofu) rewritten to point at PowerDNS lua-records / MULTI-REGION-DNS.md. Historical entries in VALIDATION-LOG.md preserved as an audit trail.
- New docs/MULTI-REGION-DNS.md — canonical reference for the lua-record patterns (ifurlup all/pickclosest/pickfirst, ifportup, pickwhashed), the Application Placement → lua-record selector mapping, when to add a second Sovereign region, and operational checks.
Closes #171.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f4679e2748 |
fix(powerdns): enable gpgsql-dnssec for DNSSEC API (1.0.6)
Without `gpgsql-dnssec=yes` the gpgsql backend driver does not expose the DNSSEC API surface — `PUT /zones/<zone>` with `dnssec:true` returns 422 "no DNSSEC-capable backends are loaded". This blocks pool-domain-manager from enabling DNSSEC on every Sovereign child zone (mandatory per docs/PLATFORM-POWERDNS.md).
Fix lands in additionalConfig so the directive is rendered alongside `default-soa-edit-signed=INCEPTION-EPOCH` and `direct-dnskey=yes`. No schema migration needed — the gpgsql 5.0.3 schema already includes the cryptokeys table; the missing piece was just the backend feature flag.
Bumps Chart.yaml to 1.0.6. Verified: after this lands, the PUT call returns 204 and POST /cryptokeys mints a usable KSK. Discovered while bringing up openova#168 (PDM per-Sovereign zones). |
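Presumably something like the following (a sketch — the directives are named in the commit, but the assumption that additionalConfig is a flat list of pdns.conf lines under the `powerdns` values namespace is mine):

```yaml
powerdns:
  additionalConfig:
    - gpgsql-dnssec=yes                        # the missing backend feature flag
    - default-soa-edit-signed=INCEPTION-EPOCH
    - direct-dnskey=yes
```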
||
|
|
fa84cac438 |
fix(powerdns): plain ALTER TABLE in postInitSQL (avoid $$ escape battle, 1.0.5)
The DO block in 1.0.4 rendered with $$ collapsed to $ by the time it reached CNPG's postInitApplicationSQL — "syntax error at or near $". Both Helm template processing and the YAML scalar block were chewing on the dollar signs.
Replaced with explicit ALTER TABLE statements (one per gpgsql table) + a GRANT — same end state, no PL/pgSQL quoting required. Verified at runtime on contabo-mkt: the powerdns Pod went CrashLoopBackOff → Running 1/1 immediately after the manual ALTER was run by hand.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
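A sketch of the replacement statements — the values path and the subset of gpgsql tables shown are illustrative:

```yaml
postgres:
  cluster:
    postInitApplicationSQL:
      # plain DDL only — no DO $$ ... $$ blocks, so neither Helm's template
      # pass nor the YAML block scalar can eat the dollar signs
      - ALTER TABLE domains OWNER TO pdns;
      - ALTER TABLE records OWNER TO pdns;
      - ALTER TABLE cryptokeys OWNER TO pdns;
      - GRANT ALL PRIVILEGES ON SCHEMA public TO pdns;
```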
||
|
|
214a3e1ada |
fix(powerdns): grant table ownership to pdns user in CNPG bootstrap (1.0.4)
Verified at runtime on Contabo-mkt: postInitApplicationSQL runs as the
postgres superuser, not the application owner, so the schema tables
created by the bootstrap block were owned by postgres. PowerDNS connects
as 'pdns' and got 'permission denied for table domains' on the first
SELECT against the zone cache.
Added a DO block at the end of the schema bootstrap that walks every
table in the public schema and ALTERs OWNER TO {{ .Values.postgres.cluster.owner }}
plus GRANT ALL PRIVILEGES ON SCHEMA public — the same shape PDM uses (and
the contabo-mkt cluster verified the fix at runtime: the powerdns Pod went
from CrashLoopBackOff to 1/1 Ready immediately after the same DDL was
run by hand).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
db20e9d42b |
fix(powerdns): dnsdist backend resolution + drop DnstapLogAction (1.0.3)
dnsdist 1.9.14 runtime errors:
1. newServer{address='powerdns:5353'} → "Unable to convert presentation
address" — dnsdist's address parser expects IP[:port], not a DNS
name. Kubernetes auto-injects POWERDNS_SERVICE_HOST as an env var
into every pod in the same namespace as the powerdns Service; using
that gives us the ClusterIP at config-load time without needing an
init container or runtime DNS resolution.
2. DnstapLogAction(name, bool, fn) signature changed in 1.9 — the
2nd parameter now expects a shared_ptr to a RemoteLoggerInterface,
not a boolean. Rather than wire up a remote dnstap server (which
adds a moving part for marginal observability gain), drop the line.
Catalyst observability is the dnsdist /metrics endpoint surfaced
to Prometheus + the k8s container log.
Bumped chart to 1.0.3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
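A sketch of the resulting backend line from item 1 — the dnsdist config is Lua, shown here embedded in the chart values; the `dnsdist.config` key is an assumption about the chart's layout:

```yaml
dnsdist:
  config: |
    -- POWERDNS_SERVICE_HOST is the ClusterIP Kubernetes injects into every
    -- pod in the Service's namespace; dnsdist's parser wants IP[:port],
    -- so this avoids an init container or runtime DNS resolution
    newServer({address = os.getenv('POWERDNS_SERVICE_HOST') .. ':5353'})
```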
|
||
|
|
20c0543806 |
fix(powerdns): correct dnsdist image tag + drop readOnlyRootFilesystem (1.0.2)
Two runtime issues caught during the first contabo-mkt rollout:
1. The dnsdist image tag was "1.9" (default) — that tag doesn't exist in docker.io/powerdns/dnsdist-19. The 1.9.x line publishes 1.9.0 .. 1.9.14 (no rolling "1.9" alias). Pinned to 1.9.14 (current latest).
2. The PowerDNS pod crash-looped on Errno 30 (Read-only file system: /etc/powerdns/pdns.d/0-api.conf.conf). The upstream pdns_server startup script writes rendered config files to /etc/powerdns/pdns.d/ at container start, and the upstream template doesn't expose an emptyDir we could redirect that path to. Set readOnlyRootFilesystem=false with a verbose comment explaining why; the rest of the security context (runAsNonRoot, runAsUser=953, drop ALL caps) stays in place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
19d926bfeb |
fix(powerdns): avoid recursive include in dnsdist checksum, bump to 1.0.1
Helm flagged dnsdist.yaml's checksum/config annotation as a recursive template self-reference (the file included itself). Replaced with a hash of the rendered .Values.dnsdist.config (post-tpl), which is the substantive content the annotation is supposed to track anyway. Bumped Chart.yaml to 1.0.1 so the OCIRepository semver "1.x" picks up the fix automatically on next reconcile. Blueprint API version stays at 1.0.0 (Blueprint contract is unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
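The non-recursive annotation presumably looks like this (a sketch; the `dnsdist.config` values key is assumed from context):

```yaml
metadata:
  annotations:
    # hash the substantive content (the rendered dnsdist config) rather
    # than including dnsdist.yaml itself, which is what recursed
    checksum/config: {{ tpl .Values.dnsdist.config . | sha256sum }}
```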
||
|
|
0190c60520 |
feat(powerdns): bp-powerdns wrapper chart + per-Sovereign zone model (#167)
Introduces the bp-powerdns Catalyst Blueprint wrapper as the authoritative
DNS service for every Sovereign zone. Replaces k8gb in componentGroups.ts —
PowerDNS Lua records cover geo + health-checked failover natively, removing
the dedicated GSLB controller.
Wrapper chart (platform/powerdns/chart/):
- Chart.yaml — bp-powerdns 1.0.0, depends on pschichtel/powerdns 0.10.0
upstream (verified Artifact Hub publisher, tracks docker.io/powerdns/
pdns-auth-50 at appVersion 5.0.3 — surveyed Artifact Hub, no official
PowerDNS chart exists)
- values.yaml — 3 replicas, gpgsql backend, DNSSEC ECDSAP256SHA256,
lua-records ON, dnsdist 100 qps default per source IP, REST API at
pdns.openova.io/api behind Traefik basicAuth
- blueprint.yaml — Catalyst metadata, visibility=unlisted (mandatory
infra), section pts-3-2-gitops-and-iac
- templates/cnpg-cluster.yaml — separate `pdns-pg` Postgres (1 instance,
5Gi, postgres-16) with PowerDNS auth-5.0.3 schema applied via
postInitApplicationSQL
- templates/dnsdist.yaml — companion Deployment + ConfigMap with
rate-limiting policy (MaxQPSIPRule per source IP)
- templates/api-ingress.yaml — Traefik Ingress + basicAuth Middleware
- templates/anycast-endpoint.yaml — placeholder Service of type
LoadBalancer (Phase-0 stand-in for the anycast Floating IP target state)
- templates/crossplane-floatingip.yaml — DISCLOSED GAP: target-state
XHetznerFloatingIP composite, disabled by default until the
Crossplane composition is authored (the existing compositions cover
Server/Network/Firewall/LoadBalancer/PoolAllocation only). The
placeholder anycast Service is the operational stand-in.
Per docs/INVIOLABLE-PRINCIPLES.md:
- #4 (never hardcode): every value flows from values.yaml or a
referenced K8s Secret. Image tags come from upstream chart appVersion,
never duplicated.
- #8 (disclose every divergence): the XHetznerFloatingIP gap is
documented in the template + in docs/PLATFORM-POWERDNS.md ("Anycast
deferral" section).
componentGroups.ts: powerdns added to SPINE group as mandatory (depends on
cnpg). external-dns now lists powerdns as a dependency. k8gb removed.
docs/PLATFORM-POWERDNS.md: per-Sovereign zone model, DNSSEC posture, REST
API contract, lua-records GSLB pattern, dnsdist policy, anycast deferral
runbook, first-deploy procedure for Contabo-mkt.
Closes #167 (Phase 1 of public-repo work; Phase 4 cluster manifest lands
in openova-private feat/powerdns-deploy).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
31b03ce02a |
ci(pdm)+platform(crossplane): build workflow + XDynadotPoolAllocation composition (Phase 3+4 of #163)
CI workflow (.github/workflows/pool-domain-manager-build.yaml) mirrors
the marketplace-api / catalyst-api shape:
- Triggers on push to core/pool-domain-manager/** + workflow_dispatch
- Runs unit tests (reserved + dynadot — the integration suite needs a
real Postgres which the workflow does not provide; full integration
runs in test-bootstrap-api.yaml against an ephemeral CNPG)
- Builds and pushes ghcr.io/openova-io/openova/pool-domain-manager:<sha>
- Cosign-signs the image via Sigstore keyless OIDC (id-token: write)
- Emits an SBOM attestation tied to the image digest
- Manifest deployment is intentionally NOT in this workflow — PDM
manifests live in the openova-private repo per the issue body, so
the Flux Kustomization there picks up the new SHA via a follow-up
private-repo commit (Phase 6 of #163)
Crossplane composition (platform/crossplane/compositions/xrd-pool-
allocation.yaml + composition-pool-allocation.yaml) wraps PDM as a
declarative Crossplane Resource:
apiVersion: compose.openova.io/v1alpha1
kind: XDynadotPoolAllocation
spec:
parameters:
poolDomain: omani.works
subdomain: omantel
sovereignFQDN: omantel.omani.works
loadBalancerIP: 1.2.3.4
createdBy: crossplane
The Composition uses provider-http (crossplane-contrib/provider-http) to
render the XR into a Reserve → Commit sequence of HTTP calls against
PDM's in-cluster service URL. Per docs/INVIOLABLE-PRINCIPLES.md #3 we use
provider-http rather than bespoke Go to keep the day-2 lifecycle
declarative. Operators who want to pre-allocate a name (e.g. reserve
'omantel.omani.works' for a Sovereign that hasn't been provisioned yet)
commit YAML to Git and Flux+Crossplane converge.
Refs: #163
|
||
|
|
8886eff708 |
Merge branch 'feat/group-g-dns-finish-v3'
Group G DNS finish (v3): #110 (Dynadot multi-domain table-driven tests), #112 (catalyst-dns httptest-mocked Dynadot coverage), #113 (cert-manager LE DNS-01 + HTTP-01 ClusterIssuer templates with operator runbook for the cert-manager-dynadot-webhook gap). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
97e942e0bc |
feat(cert-manager): #113 — Let's Encrypt DNS-01 + HTTP-01 ClusterIssuers
Adds platform/cert-manager/chart/templates/clusterissuer-letsencrypt-dns01.yaml
with two ClusterIssuers, both Catalyst-curated, rendered conditionally
from values.yaml:
- letsencrypt-dns01-prod (TARGET STATE, default disabled) — ACME DNS-01
via the cert-manager webhook solver, pointing at a future
`cert-manager-dynadot-webhook` Catalyst binary that will implement the
webhook.acme.cert-manager.io/v1alpha1 contract against the existing
internal/dynadot/ package. Shipping the issuer template ahead of the
webhook so cluster overlays only need a values flip + secret ref —
no template edits — once the webhook lands.
- letsencrypt-http01-prod (INTERIM, default enabled) — ACME HTTP-01
via the cilium ingress class. Issues certs for the explicit hostnames
(console, gitea, harbor, admin, api) but NOT for wildcards; the
canonical *.<sub>.<domain> record needs DNS-01.
Header comment explains the gap: the Catalyst external-dns webhook
(products/catalyst/bootstrap/api/cmd/external-dns-dynadot-webhook/)
implements a DIFFERENT RPC contract (records.list/add/delete) than what
cert-manager DNS-01 expects (Present/CleanUp on ChallengeRequest CRD),
so it cannot be reused; a dedicated cmd/cert-manager-dynadot-webhook/
must be built. Operator runbook for cutover is in the file header.
values.yaml gains a `certManager.issuers.{email,acmeServer,dns01,http01}`
section so all knobs are runtime-configurable per
docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode); cluster overlays in
clusters/<sovereign>/ can flip dns01.enabled via the bp-catalyst-platform
umbrella's values without rebuilding the Blueprint OCI artifact.
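The knob section presumably has this shape (a sketch — the four keys come from the commit, the email is a placeholder and the defaults mirror the interim/target split described above):

```yaml
certManager:
  issuers:
    email: ops@example.org                                    # placeholder
    acmeServer: https://acme-v02.api.letsencrypt.org/directory
    dns01:
      enabled: false     # TARGET STATE — flip once the webhook lands
    http01:
      enabled: true      # INTERIM default
```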
blueprint.yaml gains a spec.outputs section advertising:
- issuerName: letsencrypt-http01-prod (default)
- wildcardIssuerName: letsencrypt-dns01-prod (target state)
- issuerKind: ClusterIssuer
so dependent Blueprints (cilium-gateway, harbor, gitea) can consume the
issuer name without hardcoding it.
Closes #113.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c07e0ad1ee |
feat(external-dns): #109 — author bp-external-dns leaf chart for OCI publish
The bp-catalyst-platform umbrella (issue #104) declares a dependency on bp-external-dns:1.0.0 — but the chart didn't exist; only a README + the Dynadot multi-domain policy lived under platform/external-dns/. Without this leaf, the umbrella's `helm dependency build` fails (verified in run 25068433765). This commit authors the minimal target-state leaf:
- Chart.yaml: name=bp-external-dns, version=1.0.0
- values.yaml: catalystBlueprint.upstream metadata (external-dns 1.15.0 from the kubernetes-sigs/external-dns Helm repo) + a Catalyst-curated values overlay (sources, txtOwnerId, ServiceMonitor, RBAC, resources)
Per BLUEPRINT-AUTHORING.md §3, leaf charts are pure values-overlay wrappers: no templates dir, just Chart.yaml + values.yaml with the catalystBlueprint metadata block read by the bootstrap-kit installer at helm-install time. Per-Sovereign provider/zone/credential overrides are overlaid by the Crossplane Composition that materializes the HelmRelease — keeping this chart provider-agnostic (no hardcoded Cloudflare/Dynadot/Hetzner choice per INVIOLABLE-PRINCIPLES.md §4).
After this lands, blueprint-release.yaml will publish ghcr.io/openova-io/bp-external-dns:1.0.0 and the next umbrella push will resolve all 11 leaf deps successfully. |
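The metadata block the installer reads, as a sketch (chart name and version from the commit; the repo URL is the published kubernetes-sigs external-dns Helm repo):

```yaml
# platform/external-dns/chart/values.yaml — head of file (sketch)
catalystBlueprint:
  upstream:
    chart: external-dns
    version: "1.15.0"
    repo: https://kubernetes-sigs.github.io/external-dns/
```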
||
|
|
f0fe3006ba |
feat(external-dns): #109 — Catalyst-curated dynadot-multi-domain policy
Adds platform/external-dns/policies/dynadot-multi-domain.yaml — the canonical external-dns + dynadot webhook deployment that ships in every Sovereign on an OpenOva pool domain.
Why a webhook: external-dns has no upstream Dynadot provider; the canonical pattern is the webhook RPC contract, with a sidecar that implements the provider in our preferred language. We reuse the same internal/dynadot/ package the catalyst-api uses, so the never-wipe rule, record encoding, and managed-domain allowlist are identical on both write paths (per docs/INVIOLABLE-PRINCIPLES.md #2 — no duplicate implementations of the same concern).
Multi-domain:
- One --domain-filter per zone in the external-dns args; adding a third pool domain (e.g. acme.io) is a one-line edit here PLUS a one-key edit on dynadot-api-credentials' `domains` field. No webhook rebuild.
- The webhook reads DYNADOT_MANAGED_DOMAINS from the same secret with optional=true, preserving backward compatibility with the legacy single-`domain` secret shape (pre-#108).
TXT registry:
- --txt-owner-id=$(SOVEREIGN_FQDN), --txt-prefix=_externaldns.<sub>.
- Cluster overlays substitute SOVEREIGN_FQDN via the bp-catalyst-platform umbrella, so two clusters sharing a parent zone (alpha.omani.works, beta.omani.works) cannot collide.
Closes #109.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
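A sketch of the multi-domain args shape this describes (the acme.io third zone is the commit's own hypothetical; the surrounding container spec is trimmed):

```yaml
# external-dns container args — multi-domain sketch
args:
  - --provider=webhook
  - --domain-filter=omani.works
  - --domain-filter=acme.io              # hypothetical third pool domain
  - --txt-owner-id=$(SOVEREIGN_FQDN)
  - --txt-prefix=_externaldns.omantel.   # <sub> substituted per Sovereign
```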
||
|
|
046e5ebc18 |
feat(day2-iac): Crossplane Compositions + per-Sovereign Flux cluster tree + catalyst-dns binary
Group F deliverables — completes the day-2 IaC layer that takes over after OpenTofu's Phase 0 hand-off (per docs/SOVEREIGN-PROVISIONING.md §4).
Four artifacts:
1. platform/crossplane/compositions/ — XRDs + Compositions for canonical Hetzner resources
under the canonical compose.openova.io/v1alpha1 group (per BLUEPRINT-AUTHORING.md §8):
- XHetznerNetwork + composition-network.yaml — wraps hcloud_network + subnet
- XHetznerFirewall + composition-firewall.yaml
- XHetznerServer + composition-server.yaml
- XHetznerLoadBalancer + composition-loadbalancer.yaml (lb11, 80→31080, 443→31443)
- README documenting the canonical pattern
2. clusters/_template/ — the canonical per-Sovereign Flux Kustomization tree.
Copied to clusters/<sovereign-fqdn>/ at provisioning time; cloud-init's
GitRepository points at the result.
- kustomization.yaml (root: flux-system + infrastructure + bootstrap-kit)
- flux-system/ (placeholder for Flux self-config customization)
- infrastructure/ (provider-hcloud + ProviderConfig referencing hcloud-credentials secret OpenTofu writes)
- bootstrap-kit/ — 11 HelmRelease manifests in dependency order:
01-cilium → 02-cert-manager → 03-flux → 04-crossplane → 05-sealed-secrets
→ 06-spire → 07-nats-jetstream → 08-openbao → 09-keycloak → 10-gitea → 11-bp-catalyst-platform
Each pulls from oci://ghcr.io/openova-io/bp-<name>:1.0.0 — the wrapper charts published by blueprint-release CI.
dependsOn declarations enforce the canonical install order at runtime.
3. clusters/omantel.omani.works/ — the first concrete Sovereign instance.
Mirror of _template with SOVEREIGN_FQDN_PLACEHOLDER substituted to omantel.omani.works.
This is what the wizard's first omantel.omani.works run will actually reconcile.
4. products/catalyst/bootstrap/api/cmd/catalyst-dns/main.go — small Go binary the
OpenTofu module's null_resource.dns_pool invokes via local-exec at Phase-0 apply time.
Reads DYNADOT_API_KEY/SECRET/DOMAIN/SUBDOMAIN/LB_IP env vars; calls existing dynadot.Client.AddSovereignRecords. Containerfile already builds + ships it at /usr/local/bin/catalyst-dns.
Architectural compliance (Lesson #24 closed):
- No bespoke Go cloud-API calls (Crossplane Compositions are the canonical day-2 IaC)
- No exec.Command("helm", ...) (Flux HelmReleases are the canonical install unit)
- No kubectl apply from outside (cloud-init kubectl-applies one Flux GitRepository, then Flux owns everything)
After this commit, the path is end-to-end: wizard → catalyst-api → tofu apply (with infra/hetzner/) → cloud-init installs k3s + Flux + applies GitRepository pointing at clusters/omantel.omani.works/ → Flux reconciles bootstrap-kit (11 HelmReleases in dependency order) → Crossplane adopts day-2 management.
|
||
|
|
62d9c7d936 |
fix(charts): drop dependencies block — wrappers carry values overlay only
The first 2 blueprint-release CI runs failed on `helm package` with containerd permission errors because the wrapper Chart.yaml's `dependencies:` block triggered helm to pull the upstream charts via OCI/containerd at package time, which the GitHub Actions runner blocks.
Architectural fix: each Catalyst Blueprint wrapper carries the values overlay + metadata only. The bootstrap installer reads the upstream chart reference from the wrapper's values.yaml `catalystBlueprint.upstream.{chart,version,repo}` metadata block, points `helm install` at the upstream chart's repo, and overlays our values.
This keeps:
- blueprint-release CI lightweight (no upstream pulls during package; helm package now works without containerd)
- the "bp-<name> wrapper does NOT drift from upstream" property (we ship the overlay, not a fork)
- the single Blueprint contract from BLUEPRINT-AUTHORING §1 (a wrapper is still a Catalyst-curated Helm chart published as bp-<name>:<semver>)
Changes:
- 11 platform/<name>/chart/Chart.yaml: removed dependencies block. Each is now a plain Helm chart with no remote pulls during package.
- 11 platform/<name>/chart/values.yaml: prepended catalystBlueprint.upstream.{chart,version,repo} metadata block at the top. Bootstrap installer parses it to know which upstream chart to install with these values.
- products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go: installCilium now does `helm repo add cilium https://helm.cilium.io --force-update` then `helm install cilium cilium/cilium --version 1.16.5 --values -` (the cilium/cilium upstream chart, with our overlay values piped from values.yaml). Same pattern needs propagating to the other 10 install functions in a follow-up.
After this commit, blueprint-release CI should green-build all 11 wrappers (nothing left to pull at package time). The bootstrap installer's actual `helm install` calls in production reach upstream chart repos via the runtime k3s cluster's pod network, which has full network access.
|
||
|
|
441ebaebb8 |
fix(charts): pin upstream chart versions/names to ones that exist in their repos
The first Blueprint Release CI run (commit
|