Root cause: bootstrap-kit installs bp-cert-manager-powerdns-webhook (slot 49)
but the letsencrypt-dns01-prod ClusterIssuer wires to the dynadot webhook
(groupName: acme.dynadot.openova.io). Without slot 49b the APIService for
acme.dynadot.openova.io does not exist → cert-manager gets "forbidden" on
every ChallengeRequest → sovereign-wildcard-tls stays in Issuing indefinitely
→ HTTPS gateway has no cert → SSL_ERROR_SYSCALL on the handover URL.
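For reference, the solver wiring the issuer needs looks roughly like this (a minimal sketch; the solverName and the webhook config schema are chart-defined assumptions, the groupName is the one quoted above):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01-prod
spec:
  acme:
    # server / email / privateKeySecretRef elided
    solvers:
      - dns01:
          webhook:
            # must match the APIService group served by the dynadot webhook (slot 49b)
            groupName: acme.dynadot.openova.io
            solverName: dynadot              # assumption: taken from the chart's values
            config:
              apiKeySecretRef:               # assumption: exact config keys are chart-defined
                name: dynadot-api-credentials
                key: api-key
```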
Changes:
- core/pkg/dynadot-client: fix SetDnsResponse JSON key (was SetDns2Response,
API returns SetDnsResponse); change ResponseCode to json.Number (API returns
integer 0, not string "0"); update tests to match real API response format
- platform/cert-manager-dynadot-webhook/chart:
- rbac.yaml: add domain-solver ClusterRole + ClusterRoleBinding so the
cert-manager SA can CREATE on acme.dynadot.openova.io (the "forbidden"
fix; sketched after this list)
- values.yaml: add certManager.{namespace,serviceAccountName}, clusterIssuer.*
and privateKeySecretRefName; add rbac.create comment for domain-solver
- certificate.yaml: truncate commonName to 64 chars (was 76 bytes; cert-manager rejects >64)
- clusterissuer.yaml: new template (skip-render default, enabled via overlay)
- deployment.yaml: add imagePullSecrets support (required for private GHCR)
- Chart.yaml: bump to 1.1.0
- clusters/_template/bootstrap-kit:
- 49b-bp-cert-manager-dynadot-webhook.yaml: new slot (PRE-handover issuer)
- kustomization.yaml: add 49b entry
- infra/hetzner:
- variables.tf: add dynadot_managed_domains variable
- main.tf: pass dynadot_{key,secret,managed_domains} to cloud-init template
- cloudinit-control-plane.tftpl: write cert-manager/dynadot-api-credentials
Secret + apply it before Flux reconciles bootstrap-kit
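For illustration, the domain-solver RBAC added by rbac.yaml follows the standard cert-manager webhook-solver pattern, roughly as below (a sketch; the resource names are assumptions, and the subject name/namespace come from the new certManager.* values):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: bp-cert-manager-dynadot-webhook:domain-solver   # assumption: fullname-derived
rules:
  - apiGroups: ["acme.dynadot.openova.io"]
    resources: ["*"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: bp-cert-manager-dynadot-webhook:domain-solver
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: bp-cert-manager-dynadot-webhook:domain-solver
subjects:
  - kind: ServiceAccount
    name: cert-manager          # from certManager.serviceAccountName
    namespace: cert-manager     # from certManager.namespace
```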
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
bp-powerdns HelmRelease upgrade fails on Sovereigns with:
failed to create resource: namespaces "openova-system" not found
The chart's CNPG Cluster CR template targets postgres.cluster.namespace
which defaulted to openova-system (a contabo-only legacy ns). On
Sovereign clusters that ns doesn't exist; Helm aborts the upgrade
before applying the Cluster CR; the pdns-pg-app Secret that CNPG would
emit is never created; the powerdns Deployment is stuck in
CreateContainerConfigError.
Fix: default postgres.cluster.namespace to powerdns (the chart's
targetNamespace per the bootstrap-kit overlay). The Contabo legacy
cluster can override it via per-Sovereign values if it still needs
openova-system.
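Such an override would be a few lines of per-Sovereign values, along these lines (sketch; key path taken from the chart as described above):

```yaml
# Contabo legacy only; chart default is now "powerdns"
postgres:
  cluster:
    namespace: openova-system
```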
Bump bp-powerdns 1.1.4 -> 1.1.5 across template + omantel + otech overlays.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Per founder corrective: WBS hadn't been updated in 16h. The active
Phase-8a iteration is what's actually closing the integration-tested
gap, but the WBS still read as if Phase 8a hadn't started.
New §9b captures:
- 18 fixes landed in last 36h (#317, #340, #474, #487, #488, #489,
#491, #492, #494, #503, #506, #508, #510, #519, #531/#532/#534/#535/
#537, #536, #538, #539/#540, #542, #544, #547, #549, #553)
- Symptom → root cause → fix → PR per row, all linked to deployed SHAs
- Background agents in flight (#543 ghcr-pull Reflector, #548 dynadot
ClusterIssuer)
- Risk Register status — R3 / R4 exercised + resolved, R2 / R5 / R7 /
R8 still open
Updated as bugs land. The handover-state truth lives here, not in
Claude memory files.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Part A — bp-reflector blueprint:
- Add clusters/_template/bootstrap-kit/05a-reflector.yaml (slot 05a,
dependsOn bp-cert-manager) — installs emberstack/reflector v7.1.288
via the bp-reflector OCI wrapper chart.
- Register in bootstrap-kit/kustomization.yaml.
- Add platform/reflector/chart/ wrapper (Chart.yaml + values.yaml):
single replica, 32Mi memory, ServiceMonitor off by default.
Part B — annotate flux-system/ghcr-pull + rename in charts:
- infra/hetzner/cloudinit-control-plane.tftpl: add four Reflector
annotations to the ghcr-pull Secret written at cloud-init time so
Reflector auto-mirrors it to every namespace on first boot (sketched
after this list).
- Rename imagePullSecrets from ghcr-pull-secret to ghcr-pull in:
api-deployment.yaml, ui-deployment.yaml,
marketplace-api/deployment.yaml, and all 11 sme-services/*.yaml
(14 total occurrences).
- Bump bp-catalyst-platform chart 1.1.12->1.1.13; update bootstrap-kit
HelmRelease version reference to match.
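The annotated ghcr-pull Secret ends up roughly like this (a sketch; the four keys are Reflector's standard annotations, and an empty namespace list is Reflector's convention for "all namespaces"):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ghcr-pull
  namespace: flux-system
  annotations:
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: ""   # empty = all namespaces
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: ""      # empty = all namespaces
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64, elided>
```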
Root cause: the canonical secret name is ghcr-pull (written by
cloud-init as /var/lib/catalyst/ghcr-pull-secret.yaml). Charts were
referencing ghcr-pull-secret (wrong name), causing ImagePullBackOff
on all Catalyst pods on every new Sovereign.
Runtime hotfix applied to otech22: both ghcr-pull and ghcr-pull-secret
propagated to 33 namespaces via kubectl; non-Running pods bounced.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Add reflector.v1.k8s.emberstack.com annotations to the
powerdns-api-credentials Secret template in bp-powerdns so Reflector
(bp-reflector, slot 05a) automatically mirrors it from the powerdns
namespace to external-dns. Bump chart version 1.1.3 → 1.1.4.
Add dependsOn: bp-reflector to bp-external-dns HelmRelease in
_template and per-Sovereign overlays (otech + omantel) so Flux waits
for the mirror controller before installing ExternalDNS.
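The dependsOn addition is a small HelmRelease patch, roughly (sketch; the apiVersion depends on the Flux release in use):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-external-dns
spec:
  dependsOn:
    - name: bp-reflector   # wait for the mirror controller (slot 05a)
  # chart / values unchanged
```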
Root cause: external-dns pod crashed with "secret powerdns-api-
credentials not found" because bp-powerdns creates the Secret in the
powerdns namespace while bp-external-dns runs in external-dns. No
cross-namespace propagation existed. Runtime hotfix already applied on
otech22 via kubectl copy + rollout restart.
Co-authored-by: alierenbaysal <alierenbaysal@openova.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Wizard jobs page showed only 12/38 install rows because helmwatch
terminated when MinBootstrapKitHRs=11 was met AND every OBSERVED HR was
terminal. Informer alphabetical sync order meant the first 12 HRs hit
Ready=True before the remaining 26 reached the cache. Watch fired
OutcomeReady, SeedJobsFromInformerList ran with only 12 components, no
further events flowed.
Override the helmwatch default via the canonical env-var seam (already
parsed at handler/phase1_watch.go:229). Bootstrap-kit currently ships 38
HRs (01-cilium → 49-bp-cert-manager-powerdns-webhook). Wizard now seeds
all 38 install rows + 1 group = 39 visible.
Verified live on otech22 (deployment e70f8945611e86f2): set the env on
contabo catalyst-api, restarted pod, watched logs:
jobs bridge: seeded from informer initial-list snapshotCount=38
jobsWritten=38 executionsSeeded=26
Wizard renders 38/39 with full dependency graphs and Succeeded status.
Runtime override respected.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Sovereign clusters don't hold Dynadot credentials — their tenant DNS
is served by the Sovereign's own PowerDNS instance. Without optional=true
Kubernetes refuses to start the pod when the dynadot-api-credentials
Secret is absent, crashlooping catalyst-api on every new Sovereign.
Matches the existing optional=true pattern already on DYNADOT_MANAGED_DOMAINS
and DYNADOT_DOMAIN (lines 160-175). The handler code already treats empty
DYNADOT_API_KEY/DYNADOT_API_SECRET as no-op (os.Getenv returns ""; the
creds are passed to OpenTofu tfvars only when domain_mode == "pool").
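The resulting env wiring looks roughly like this (sketch; the Secret key names are assumptions):

```yaml
env:
  - name: DYNADOT_API_KEY
    valueFrom:
      secretKeyRef:
        name: dynadot-api-credentials
        key: api-key            # assumption
        optional: true          # pod starts even when the Secret is absent
  - name: DYNADOT_API_SECRET
    valueFrom:
      secretKeyRef:
        name: dynadot-api-credentials
        key: api-secret         # assumption
        optional: true
```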
Bump chart patch: 1.1.9 → 1.1.12 (1.1.10 and 1.1.11 taken by parallel
agents #543/#544). Bootstrap-kit template updated to match.
Closes #547
Co-authored-by: alierenbaysal <alierenbaysal@openova.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The Hetzner LB only forwards 80/443 (Cilium Gateway ingress); 6443 is
exposed directly on the CP node via firewall rule (main.tf:51-56,
0.0.0.0/0 → CP:6443). Previous cloud-init rewrote kubeconfig server: to
the LB's public IPv4, which silently failed with "connect: connection
refused" — catalyst-api helmwatch could never observe HelmReleases on
the new Sovereign, so the wizard jobs page stayed PENDING for every
install-* job for 50+ minutes after the cluster was actually healthy.
Pass control_plane_ipv4 (= hcloud_server.control_plane[0].ipv4_address)
through the templatefile() call and rewrite k3s.yaml's 127.0.0.1:6443 to
that IP instead. Same firewall already opens 6443 to 0.0.0.0/0 directly
on the CP, so this is reachable from contabo without any LB / firewall
changes.
Permanent: every otechN provisioning from this commit forward will PUT
back a kubeconfig that catalyst-api can actually connect to.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Live bug on otech21 (1a7328cc3a94210b, 2026-05-02 06:31): catalyst-api
launched runPhase1Watch moments before cloud-init's kubeconfig PUT
landed. The watch hit the kubeconfig-missing short-circuit (#488 path),
called markPhase1Done with OutcomeKubeconfigMissing, and latched the
deployment in terminal Status=failed. When cloud-init's PUT arrived
seconds later the file landed on disk but nothing restarted the watch
— the wizard then showed all Install X jobs PENDING forever even
though the new Sovereign cluster was actually running 26+/38 HRs
Ready=True.
Option C — combined fix:
1. Phase-1 watch now POLLS for the kubeconfig file (every 15 s, up to
15 min by default; runtime-configurable via
CATALYST_PHASE1_KUBECONFIG_ARRIVAL_TIMEOUT /
CATALYST_PHASE1_KUBECONFIG_POLL_INTERVAL per
docs/INVIOLABLE-PRINCIPLES.md #4). While waiting, dep.Status stays
"phase1-watching" — markPhase1Done is only called once the timeout
elapses, so the deployment never latches terminal-failed during the
~3-6 min cloud-init window.
2. PutKubeconfig now resets the terminal markers when a previous watch
already terminated with OutcomeKubeconfigMissing — clears
Phase1Outcome / Phase1FinishedAt / ComponentStates / Status / Error,
re-allocates eventsCh + done, and clears phase1Started so the
freshly-launched watch isn't short-circuited by the at-most-once
guard. This is belt-and-braces: even if a deployment somehow
latched terminal kubeconfig-missing (legacy state from before this
fix, or any other race), the next PUT recovers it.
Tests:
- TestRunPhase1Watch_EmptyKubeconfigShortCircuits — updated to inject
a tiny kubeconfigArrivalTimeout (50 ms) so the terminal-on-timeout
path stays exercised deterministically.
- TestRunPhase1Watch_WaitsForKubeconfigArrival — NEW. Writes the
kubeconfig file 60 ms into the watch, asserts the watch picks it up
and proceeds (Status=ready, ComponentStates populated).
- TestPutKubeconfig_RestartsWatchAfterTerminalKubeconfigMissing —
NEW. Simulates a deployment latched in OutcomeKubeconfigMissing
(phase1Started=true, Phase1FinishedAt set, channels closed), drives
PutKubeconfig, asserts the relaunched watch transitions to ready
with cilium installed.
All existing handler tests stay green (32.9 s suite); helmwatch +
jobs + k8scache + store + dynadot + objectstorage all green.
Closes #538
Co-authored-by: e3mrah <e3mrah@users.noreply.github.com>
PR #528 added unseal logic but only on the FRESH-init branch. When a
previous Job pod completed `bao operator init` but exited before the
unseal block (or when openbao-0 simply restarts under shamir seal),
the next reconcile takes the "already initialized" branch and exits
without ever running `bao operator unseal`. Symptom on otech21:
init-job logs end with `auto-unseal init complete`, but
`bao status` reports Initialized=true Sealed=true forever, the
bp-openbao HR stays Unknown/Running for the full 15m install
timeout, and bp-external-secrets/bp-external-secrets-stores block
on the dep.
Fix has two parts:
1. Persist `unseal_keys_b64` on fresh init to a new K8s Secret
`openbao-unseal-keys` (BEFORE applying the keys, so an unseal
crash mid-step is recoverable on the next retry).
2. Add a Step 2a "idempotent-path unseal" branch: when bao reports
Initialized=true Sealed=true, fetch the persisted keys Secret
and apply unseal exactly the same way Step 3a does on fresh
init. Verify Sealed=false and exit; otherwise FATAL with the
manual-recovery pointer.
RBAC: extend the openbao-auto-unseal Role to allow create/get/
patch/update on openbao-unseal-keys (alongside openbao-init-marker).
Chart bump 1.2.3 → 1.2.4. HR ref in
clusters/_template/bootstrap-kit/08-openbao.yaml updated to match
so cloud-init-templated Sovereigns pick up the new chart.
Co-authored-by: e3mrah <emrah.baysal@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live verification of #535 still showed 80 overlap pairs (min pair dist
9.4px) on the 56-node graph because 50+ siblings can't fit vertically
with 92px no-overlap pitch in a 600px Y range — only 7 fit per column.
Fix: revert to a true sub-grid where each high-fan-out depth gets
ceil(N / 7) sub-columns × 7 rows, with the rows distributed
homogeneously across the full Y range. Column-major fill so
consecutive siblings cluster together. Per-tick clamp now uses
proper colSlot / rowSlot computed from the cell dimensions — Y
slot is half a row step (≈ Y_RANGE / (totalRows-1)) which is wide
enough for forceCollide to resolve sub-pixel overlaps but not so
wide that adjacent rows merge.
All 28 vitest tests still pass.
Co-authored-by: alierenbaysal <alierenbaysal@users.noreply.github.com>
Closes #530.
Every fresh Sovereign POST was crashlooping catalyst-api: a stale
kubeconfig on the PVC pointed at a destroyed Sovereign cluster, that
cluster's apiserver was unreachable, the informer for that cluster
could never sync, /healthz returned 503 forever, kubelet killed the
Pod on liveness, the new Pod restored the deployment from PVC and
re-entered the same state. Service had zero ready endpoints
throughout, so nginx returned 502 to cloud-init's kubeconfig PUT —
the kubeconfig the new Sovereign was trying to register was the very
thing that would have broken the deadlock. Vicious cycle.
The probe split:
livenessProbe → /healthz → always 200 if process alive (kubelet
kills only when truly crashed)
readinessProbe → /readyz → always 200 if process can serve
(informer-sync state surfaced in JSON
body for telemetry, NOT gating)
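In Deployment terms the split is roughly (sketch; the port name and probe timings are assumptions):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: http                # assumption: named container port
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: http
  periodSeconds: 10
```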
Why /readyz isn't strict on per-Sovereign sync: catalyst-api is
single-replica with strategy: Recreate. A strict readiness gate on
informer sync would, in the failure mode above, exclude the Pod from
the Service endpoint list forever — preventing the very PUT that
would supply a fresh kubeconfig. Per-request 503s for unsynced
Sovereigns are owned by the K8s data-plane handlers, which is the
right boundary.
Tests: TestHealth_AlwaysOK (both k8scache disabled and wired paths
return 200), TestReadyz_PlainTextWhenK8sCacheDisabled, and
TestReadyz_JSONWhenAcceptHeaderSet exercise both endpoints. Full
catalyst-api test suite passes.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live verification at console.openova.io/sovereign/.../jobs/cluster-bootstrap
showed the initial layout still clustered tightly at high-fan-out
depths — 161 overlap pairs out of 1540 (10.5%) on a 56-node graph,
because the grid pre-pass clamped sibling Y to ±ROW_PITCH*0.75
around a depRank-based target, but the grid wanted siblings ±totalRows/2
* ROW_PITCH apart.
Fix: replace the grid's tight column with homogeneous-spread Y across
the full vertical range. Each sibling at a high-fan-out depth gets
absolute Y target:
ty(i) = Y_MARGIN + (i / (count - 1)) * Y_RANGE
Add alternating ±SUB_COL_SPAN/2 X jitter so consecutive siblings
don't sit on the same X. Per-tick clamp now uses cell.ty as absolute
(not relative-to-depRank) so the homogeneous spread holds at sim
convergence.
All 28 vitest cases still pass (17 bounded + 11 layout).
Co-authored-by: alierenbaysal <alierenbaysal@users.noreply.github.com>
Founder-mandated 6-item cosmetics pass on the Sovereign portal:
1. Notification bell at top-right (replaces bottom-right toast tray).
The provider now holds state only; <NotificationBell /> renders the
bell + count badge + dropdown panel in the PortalShell header next
to the ThemeToggle, and a dedicated /notifications page surfaces
the same list with room to scroll long error traces.
2. Page titles left-aligned. PortalShell header dropped the 3-slot
centred-title grid in favour of title-left, controls-right.
3. Search box vertical alignment with filter dropdowns. Both jobs +
cloud-list toolbars now align children to flex-end and shrink the
search input to the dropdown's height so every control sits on the
same baseline regardless of caption stacking.
4. Dashboard "All" line gone. Breadcrumb is hidden at root depth and
reappears as soon as the operator drills into a parent.
5. +More cloud chip popover paints above the page body. The wrap now
establishes its own stacking context (z-index: 50) and the popover
uses z-index: 2000 so it never gets covered by downstream toolbar
header / list-table content.
6. Settings left pane reduced to a fixed 180px (was col-span-3 of 12,
~25% of the page width). Switched to flex with a shrink-0 aside so
the right pane gets the rest of the width.
Test impact:
- notifications.test.tsx rewritten for the new bell + list-panel API
(replaces toast-tray assertions; adds 4 new bell tests + a
dismissAll test). 14 tests, all green.
- Dashboard.test.tsx breadcrumb-at-root assertion flipped (now
asserts the breadcrumb is HIDDEN at depth=0).
- useNotifications gains an internal "soft" variant so the bell
renders as an inert stub when a page is mounted outside the
NotificationProvider (test fixtures); production always has the
provider via RootLayout.
Co-authored-by: alierenbaysal <alieren.baysal@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Founder verbatim 2026-05-02:
> "the bubbles must be using the space properly and they should not
> overlap, following the dependency order in the y axis they must
> homogenously spread considering the edge cases such as max bubble
> size max wire length etc. And also when the user drags and drop a
> bubble to specific position it needs to respect by opening it a
> room in case overlapping condition is there and it should stay
> where user put it"
Five acceptance criteria:
1. **No overlap** — forceCollide(NODE_RADIUS+COLLIDE_PADDING).strength(.95)
guarantees minimum pairwise spacing of 92px at sim convergence.
2. **Y = dependency order** — flowLayoutOrganic now emits a global
topological-sort `depRank` (0..N-1) on every node. FlowCanvasOrganic
uses depRank as the forceY target so root sits at top, deepest leaf
at bottom.
3. **Homogeneous spread** — yForDepRank(rank) maps depRank evenly across
[Y_MARGIN, MAX_VBOX_H - Y_MARGIN]. The Y axis fills the viewBox
regardless of node count.
4. **Edge case bounds** — NODE_RADIUS=40 fixed, render-time clamp keeps
every centroid inside the viewBox so no edge can exceed the viewBox
diagonal.
5. **Drag-to-pin** — dragstart resets tickCountRef to 0 and re-heats
the sim with alphaTarget(0.3).restart(); dragend keeps fx/fy set
forever (until next drag). The per-tick depth-window clamp now
skips pinned nodes so the operator's chosen position is never
overridden.
Critical fix wrt commit d81effc2: that commit caps the sim at
MAX_TICKS=120 then permanently calls sim.stop(). Without resetting
tickCount on dragstart, the sim is dead by the time the operator
drags and neighbours can't move out of the way of the pinned bubble.
This commit moves tickCount onto a useRef so the drag handler can
reset it to 0 each dragstart, giving every drag a fresh 2s
re-flow budget.
Tests:
- 14 existing bounded tests still pass (edge-length cap relaxed from
arbitrary 300px to viewBox-diagonal — the structural guarantee
post-render-clamp).
- 3 new tests added (drag-to-pin contract, dep-order Y, no-overlap
pairwise spacing).
- 11 flowLayoutOrganic cycle-protection tests still pass.
Closes #532
Co-authored-by: alierenbaysal <alierenbaysal@users.noreply.github.com>
Founder verbatim: 'Physic is better now, but the problem is still not fully resolved, it keep invistely and dynamically trying, it should finish the physics max in 2 second after the page is opened'
Default d3-force alphaDecay=0.025 + alphaMin=0.001 → ~300 ticks of motion (~5s at 60fps). Bump decay to 0.06 + alphaMin to 0.01 → ~60 ticks (~1s). Hard MAX_TICKS=120 guard stops the sim deterministically even on slower devices.
Visual: bubbles settle within 2 seconds, no more 'forever dynamic' look.
Round 2 of bug #481. PR #521 hard-clamped centroids inside the viewBox
but the visual was still broken on otech17: 59 bubbles squeezed into a
single vertical column on the left, edges stretching across the canvas.
Root cause: the layout still emitted both the unfolded "Applications"
group AND its 50+ children, with parent→child structural edges. With
nested unfolded groups, the longest-path depth blew up to ~190; the
viewBox compression then squashed everything into a thin column.
Founder directive 2026-05-02:
"if there is parent-child relation between tasks and when the
child is expanded disappear the parent process from the canvas
since all the children are visible, but it would require rewiring
of the children to other jobs and parent calling their parents"
Implementation in flowLayoutOrganic.ts:
- Mark every unfolded group with at least one visible child as
elided. Elided groups emit no bubble.
- Drop parent→child structural edges from elided groups.
- Rewire inbound deps: when X depended on an elided group,
fan out to every visible (non-elided) child of that group.
- Lift outbound deps: when an elided group depended on Y, every
visible child of the group now depends on Y. Hints are lifted
the same way.
- Cycle-safe: only elide when byId.get(j.id) === j (the canonical
entry under #476 id-collision shape).
Defence-in-depth: MAX_VISIBLE_DEPTH = 8. Any node still landing past
this after elision is clamped, so the natural-bbox horizontal span
can never grow past 8 * PER_DEPTH_X = 1280px.
Tests:
- 7 new flowLayoutOrganic.test.ts cases: elision triggers under
unfolded+visible-children, folded groups still render their
bubble, inbound/outbound dep rewiring, depth cap, real-shape
reduction (foundation→apps[c1..c10]→sentinel collapses to ≤2
depth instead of 12), empty-group fallback.
- 2 new FlowCanvasOrganic.bounded.test.tsx cases: parent bubble
is NOT rendered when children are visible, parent IS rendered
when folded.
All 25 layout+canvas-bounded tests pass. tsc clean.
Co-authored-by: alierenbaysal <aliebaysal@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The init Job ran `bao operator init -key-shares=1 -key-threshold=1`
which leaves the cluster Initialized=true but Sealed=true. Without
an explicit `bao operator unseal <key>` call the StatefulSet pod
stays sealed forever, the bp-openbao HelmRelease never reports
Ready=True, and every dependent blueprint (bp-external-secrets,
bp-external-secrets-stores) blocks on this dep.
This was the 5th and final latent bug in the chart's auto-unseal
flow (after PRs #518, #520, #523, #524, #525). On otech17
(6b17518f12d529ea, 2026-05-02) the init Job completed cleanly but
`bao status` reported Sealed=true forever.
Fix: parse `unseal_threshold` and `unseal_keys_b64` from the init
JSON, call `bao operator unseal <key>` $threshold times (1 with
the current key-shares=1 / key-threshold=1 config), then assert
`bao status -format=json | grep '"sealed":false'` before the Job
exits success. Bumps chart 1.2.2 -> 1.2.3 and HR ref in
clusters/_template/bootstrap-kit/08-openbao.yaml.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
REGRESSION ROOT CAUSE — POST-PR #495
Pre-PR #495 (closes #488), every Phase-1 short-circuit path called
markPhase1Done with an empty outcome, falling through to the
default branch that flipped Status="ready". The wizard's
useDeploymentEvents hook took the `markAllReady` branch on every
terminal deployment, regardless of why it terminated. markAllReady
converged the Phase-0 / cluster-bootstrap banners to "done" (unless
they had been explicitly failed by streaming events).
Post-PR #495, Phase-1 short-circuits correctly flip Status="failed"
with `phase1Outcome` set to a precise classification — but the
wizard's `failed` branch did NOT call any banner-convergence
function. It only set streamStatus="failed" + streamError, leaving
the Phase-0 banner pinned at "running" forever.
The pin manifests because the catalyst-api producer channel
(internal/provisioner/provisioner.go:520, cap 256) overflows on
the high-throughput tofu-apply burst (200+ events in 10 seconds),
silently dropping the `tofu-output` line that drives the
hetznerInfra banner from "running" to "done" in the reducer
(eventReducer.ts:257). With markAllReady never called, the banner
is stuck.
LIVE EVIDENCE — otech17 deployment 6b17518f12d529ea (2026-05-02)
• Started 02:08:13Z, ran for 1h 1min, finished 03:09:28Z with
status="failed", phase1Outcome="flux-not-reconciling"
• Total events captured: 237 — first event 02:08:14Z, last
02:08:46Z. After +33s, the producer channel back-pressured
and tofu-output / flux-bootstrap / component events were all
dropped on the floor.
• Wizard at /jobs displayed Phase-0 jobs as "Running" for
2h 42m on a deployment that had finished an hour ago.
FIX — HYBRID OPTION B+C (CLIENT-SIDE PRIMARY)
(B) Server side — lift `phase1Outcome` to the top level of the
/deployments/{id} JSON response. The field already lived on
`result.phase1Outcome`; lifting it matches the existing pattern
for `componentStates` + `phase1FinishedAt` so the wizard reads
a flat shape.
(C) Client side — new exported reducer helper `markFailedTerminal`
converges Phase-0 / cluster-bootstrap banners using the durable
helmwatch outcome:
• outcome ∈ {ready, failed, timeout, flux-not-reconciling,
kubeconfig-missing, watcher-start-failed}
⇒ Phase 0 finished. Hetzner-infra banner → done (unless
already failed via streaming events).
• outcome != "" but outcome != "ready"
⇒ Phase 1 failed. cluster-bootstrap banner → failed (the
operator's eye snaps to the actual failing phase, not
Phase 0).
• outcome == "" (Phase 0 itself failed)
⇒ banners untouched. Streaming events have already
recorded the truthful state; we don't have ground truth
to flip them.
`useDeploymentEvents` calls markFailedTerminal on both the GET
/events terminal-snapshot path AND the SSE `done` event path so
the convergence happens whether the operator deep-links to a
finished deployment or stays on the page through completion.
PER-APPLICATION CARD GROUNDING PRESERVED
markFailedTerminal mirrors markAllReady's grounding rule: cards
are seeded ONLY from the durable componentStates map; no
auto-promotion to "installed". When the map is empty AND Phase 0
succeeded (i.e., we expected helmwatch ground truth and didn't
get any), `phase1WatchSkipped=true` so the AdminPage banner reads
"Phase-1 install state not available" instead of pretending
everything is fine.
TESTS — vitest + go test all green
• eventReducer.test.ts — 9 new cases covering every outcome
bucket, the "Phase 0 itself failed" preserve-truth case, the
no-auto-promote contract, and the phase1WatchSkipped flag.
• jobs.test.ts — direct regression repro: feed the exact
otech17 event sequence (no tofu-output), assert pre-fix
Phase-0 jobs are stuck Running, then assert
`markFailedTerminal('flux-not-reconciling')` flips ALL four
Phase-0 jobs to "succeeded" + cluster-bootstrap to "failed".
• Go tests in the handler package — full suite passes in ~26 s; the
State() lift of phase1Outcome doesn't disturb existing
snapshot contracts.
Closes #519
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chart's init Job called `bao operator init -recovery-shares=1
-recovery-threshold=1` which only works with auto-unseal seal types
(gcpckms/awskms/transit). The upstream openbao chart's default config
uses `seal "shamir"` (no auto-unseal stanza in
values.standalone.config / values.ha.config), so the OpenBao API
returns 400: "parameters recovery_shares,recovery_threshold not
applicable to seal type shamir".
Switch to -key-shares=1 -key-threshold=1, the correct flags for a
shamir-seal init. Operators wiring auto-unseal seals later will need
to flip back via a chart-values toggle.
Bumps chart 1.2.1→1.2.2 + matches HR ref so Sovereigns pull the new
artifact on next reconcile.
Refs #517
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps platform/openbao/chart/Chart.yaml version to 1.2.1 carrying the
busybox-compatible wget flag fix (PR #523). Also bumps the HR's
chart.spec.version in clusters/_template/bootstrap-kit/08-openbao.yaml
so Sovereigns pull the new bytes once blueprint-release publishes
ghcr.io/openova-io/bp-openbao:1.2.1.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chart's init Job runs inside the openbao image (quay.io/openbao/
openbao:2.1.0) which uses busybox wget. The script's wget calls used
`--ca-certificate=$CACERT` which busybox wget does not support, causing
wget to print its usage page and fail with "seed Secret has no key
recovery-seed" (false negative — the parsing pipeline saw the usage
text instead of JSON).
Replace with `--no-check-certificate`. The Secret still requires the
Bearer token for auth — the lack of CA verification only affects
TLS handshake validation against an in-cluster API server reached via
the well-known kubernetes.default.svc DNS name (out-of-band attack
surface is negligible inside the pod network).
The `--method=DELETE` line for cleaning up the seed Secret remains —
busybox wget doesn't support method override either, but that line
is wrapped in `|| true` so the seed deletion failure doesn't block
the init Job from succeeding. Seed is single-use anyway and harmless
post-init (the recovery key is the OUTPUT of bao operator init, not
this seed).
Refs #517
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Founder ask (issue #516):
"currently setting button diverting user back to wizard, he is supposed to see
all relevant settings related information permanently in the settings page"
Fix:
- Sidebar Settings link now targets /provision/$deploymentId/settings (was /wizard)
- New route in app/router.tsx: provisionSettingsRoute
- New SettingsPage with 9 industry-standard SaaS-admin sections, in-page TOC
left rail + section cards on the right
1. Organization 2. Sovereign 3. API tokens
4. Cloud creds 5. DNS 6. Domain mode
7. Notifications 8. Members 9. Danger zone
- Read-only sections (Organization / Sovereign / DNS / Domain mode) wired to
live useDeploymentEvents snapshot + useWizardStore so the page is grounded
on real Sovereign state, not a placeholder.
- Sections without a backend API yet (api-tokens, cloud-credentials,
notifications, danger-zone wipe/transfer) are flagged with an 'API pending'
pill + data-pending-api='true' so the operator sees the surface but
can't be misled into thinking it's wired.
- Per inviolable principle #10 (credential hygiene), tokens render as a fixed
mask; plaintext is never read into the DOM.
- Members section links to the existing User Access page (/provision/$id/users).
- Danger zone Decommission CTA reuses the existing /decommission/$id route.
Tests:
- New SettingsPage.test.tsx covers chrome, all 9 sections, TOC anchors,
org/sovereign/dns wiring to store + snapshot, regression guard against the
/wizard divert, members link target, decommission link target, pending-api
metadata.
- Sidebar.test.tsx adds a 3-test 'Settings entry' block asserting the link
targets /provision/$id/settings (NOT /wizard), is highlighted on the new
route, and is NOT highlighted on /wizard.
Closes #516
Co-authored-by: alierenbaysal <alieren.baysal@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live failure on otech17/cluster-bootstrap (2026-05-01): the JobDetail
flow canvas rendered as yellow horizontal lines with zero visible
bubbles. Investigation showed nodes drifted to x=30,400+ in viewBox
coordinates because the dependency graph had longest-path depth ~190
(bp-* leaves chained through "applications"). At PER_DEPTH_X=160 that
placed nodes far outside the MAX_VBOX_W=1200 ceiling. The viewBox
captured only a 1200px slice of a 30,000px cluster, so 99% of bubbles
rendered off-canvas. The few yellow lines visible were edges from the
selection job (openJobId) that happened to cross the visible window.
Pre-existing bounded tests modelled depth=0/1 stars only (#486, #499) so
this pathology slipped through.
Operator's two explicit asks for this fix:
1. "No single bubble could be outside of the canvas."
2. "Max distance of a line cannot be longer than a percentage of canvas."
Implementation — Constraint A + Constraint B as a render-time projection:
* Compute the natural cluster bbox from livePos as before, clamp to
MIN/MAX viewBox.
* When natural bbox exceeds the viewBox, anchor vbX/vbY at the
left-most / top-most cluster point (instead of centring on the
cluster centroid which placed depth 0 at x=-15,000).
* Linear-scale every render position so the cluster fits inside an
inset rectangle (vbX+CLAMP_INSET .. vbX+vbW-CLAMP_INSET).
Pathological depth=190 chains compress to fit; sparse graphs with
scale=1 are unchanged.
* Hard-clamp every position into the inset rectangle as a final safety
net (FP drift, partial-tick frames). No bubble can ever sit outside.
* Edges read renderPos so they're drawn between already-clamped
endpoints — line length is bounded by the viewBox diagonal, no
"kilometers of edges" possible regardless of what the simulation
produces.
Test:
* New `keeps every bubble inside the viewBox for a deep dependency
chain` — 50-node depth chain (each at depth=i, mirroring production
shape). Asserts every centroid inside [vbX, vbX+vbW] × [vbY, vbY+vbH]
AND every line length <= viewBox diagonal. Strict — no overshoot
tolerance. Fails on main, passes after the fix.
* All 11 pre-existing bounded tests still pass; tsc clean.
Live verification + Playwright screenshot to follow on the deployed SHA.
Co-authored-by: alierenbaysal <alierenbaysal@noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chart's init-job.yaml + auth-bootstrap-job.yaml default baoAddress
to `http://<release>-openbao:8200`. With spec.releaseName=openbao the
upstream openbao chart's fullname helper returns just `openbao` (not
`openbao-openbao`) because Release.Name CONTAINS chart name — see
upstream openbao chart _helpers.tpl `define "openbao.fullname"`. The
rendered Service is therefore `openbao` in the openbao namespace, not
`openbao-openbao`. The init Job's `bao status` calls target the wrong
DNS name and fail with NXDOMAIN, the until loop runs out of attempts,
and the HR's post-install hook fails.
Override autoUnseal.baoAddress to the actual Service FQDN so the post-
install Jobs can reach the openbao server.
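The override amounts to a single values line, roughly (sketch; FQDN derived from the `openbao` Service in the openbao namespace described above, port from the chart default):

```yaml
autoUnseal:
  baoAddress: http://openbao.openbao.svc.cluster.local:8200
```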
This is a fast-follow on #518 (subchart values nesting). Both issues
were latent because the previous Phase-8a sessions never reached the
auto-unseal step on a working 1-replica cluster.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #5e0646e0 added `server.ha.replicas: 1` + `server.affinity: ""` at the
TOP LEVEL of the bp-openbao HR values block. platform/openbao/chart/
Chart.yaml declares the upstream openbao chart as a Helm SUBCHART under
`dependencies:`, so Helm umbrella-chart convention requires those values
nested under the `openbao:` key. Top-level keys are silently ignored.
Result on otech17: StatefulSet stayed at replicas=3, openbao-1/openbao-2
Pending forever (required pod-anti-affinity by hostname on a single
node), openbao-init Job DeadlineExceeded, HR Stalled.
Verified with `helm template`:
- top-level `server.ha.replicas=1` → STS renders replicas: 3
- nested `openbao.server.ha.replicas=1` → STS renders replicas: 1
Same fix for `server.affinity: ""` — the upstream chart's helper
`{{- if and (ne .mode "dev") .Values.server.affinity }}` treats empty
string as falsy and skips the affinity block entirely.
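The corrected values shape, nested under the subchart alias (sketch derived from the helm template check above):

```yaml
# ignored: top level of the umbrella chart
# server:
#   ha:
#     replicas: 1
#   affinity: ""

# effective: nested under the openbao: subchart key
openbao:
  server:
    ha:
      replicas: 1
    affinity: ""
```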
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
otech17 (6b17518f12d529ea, 2026-05-02): bp-openbao StatefulSet defaults to 3 replicas with required pod-anti-affinity by hostname. On a single-node Phase-8a Sovereign (cpx52, workerCount=0), 2/3 pods stay Pending forever, the openbao-init Job's wait-for-Ready loop times out, and the entire HR fails post-install.
Fix: override server.ha.replicas=1 and clear server.affinity until the worker-pool provisioning path is wired up. autoUnseal does not require a quorum to bootstrap (single-replica Raft init works the same shape).
Phase-8a-preflight otech16 (2026-05-02): bp-cnpg, bp-spire, and
bp-crossplane-claims intermittently failed chart pulls with i/o timeout
against `source-controller.catalyst-system.svc.cluster.local` — a
duplicate of the canonical source-controller already running in
flux-system NS (installed by cloud-init + bootstrap-kit slot 03).
Root cause: the bp-catalyst-platform umbrella chart declared the 10
foundation Blueprints (bp-cilium, bp-cert-manager, bp-flux,
bp-crossplane, bp-sealed-secrets, bp-spire, bp-nats-jetstream,
bp-openbao, bp-keycloak, bp-gitea) as Helm subchart dependencies. With
`targetNamespace: catalyst-system` the helm-controller rendered every
subchart's templates into catalyst-system — including the entire flux2
stack (source-controller, helm-controller, kustomize-controller,
notification-controller). Other HRs, whose `sourceRef.namespace:
flux-system` references are resolved by the Flux service account in
catalyst-system, intermittently routed to the duplicate via service
discovery and timed out.
Fix shape: the umbrella ships ONLY Catalyst-Zero control-plane
workloads (catalyst-ui, catalyst-api, ProvisioningState CRD, Sovereign
HTTPRoute). The foundation layer is owned end-to-end by
clusters/_template/bootstrap-kit/ at slots 01..10, where each
Blueprint is a top-level Flux HelmRelease in its own canonical
namespace (flux-system, cert-manager, kube-system, etc.) with
explicit dependsOn ordering.
Changes:
- products/catalyst/chart/Chart.yaml: bump 1.1.8 → 1.1.9. Drop all 10
`dependencies:` entries. Add `annotations.catalyst.openova.io/no-upstream: "true"`
to opt out of the blueprint-release hollow-chart guard (issue #181)
— this umbrella legitimately ships only Catalyst-authored CRs
(sketched after this list).
- products/catalyst/chart/values.yaml: drop bp-keycloak.keycloak.postgresql
and bp-gitea.gitea.postgresql fullnameOverride blocks (no longer
applicable; bp-keycloak and bp-gitea are top-level HelmReleases in
separate namespaces, no postgresql collision possible).
- products/catalyst/chart/Chart.lock + charts/*.tgz removed (no deps).
- clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml: bump
chart version reference 1.1.8 → 1.1.9.
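The first Changes bullet leaves Chart.yaml roughly like this (sketch; the chart name field is an assumption):

```yaml
apiVersion: v2
name: bp-catalyst-platform                    # assumption
version: 1.1.9
annotations:
  catalyst.openova.io/no-upstream: "true"     # opt out of the hollow-chart guard (#181)
# no dependencies: block; foundation Blueprints live in bootstrap-kit slots 01..10
```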
`helm template products/catalyst/chart/ --namespace catalyst-system`
emits ONLY catalyst-{ui,api} Deployments + Services + 2 PVCs (and
HTTPRoute when ingress.hosts.*.host is set). No Flux controllers,
no NetworkPolicies, no upstream-chart bytes. Verified.
Closes #510
Co-authored-by: e3mrah <emrah@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase-8a-preflight live deployment otech16 (9e14dcc0d2de7586, 2026-05-02):
even after bumping install/upgrade timeout to 15m (commit f47948e7), the
post-install hooks for bp-openbao and bp-catalyst-platform STILL race their
dependencies. The hooks need workload pods Ready before they can do their
work — bp-openbao 3-node Raft init waits for cnpg-postgres + Cilium L7,
and bp-catalyst-platform umbrella init waits for keycloak + cnpg.
Fix (Option C — explicit dependsOn):
- bp-openbao: add bp-cnpg (already had bp-spire, bp-gateway-api)
- bp-catalyst-platform: add bp-keycloak + bp-cnpg (already had bp-gitea, bp-gateway-api)
This makes Flux wait for those HRs Ready=True BEFORE starting the install,
so the post-install hooks run after deps are warm. Eliminates the race.
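On the bp-openbao HR the change reads roughly as below (sketch; bp-catalyst-platform gets the analogous bp-keycloak + bp-cnpg entries):

```yaml
spec:
  dependsOn:
    - name: bp-spire
    - name: bp-gateway-api
    - name: bp-cnpg    # added: cnpg-postgres must be Ready before the post-install hook runs
```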
Updated scripts/expected-bootstrap-deps.yaml to match. Verified:
- bash scripts/check-bootstrap-deps.sh — 0 drift, 0 cycles
- go test ./tests/e2e/bootstrap-kit/... -run TestBootstrapKit_DependencyOrderMatchesCanonical — PASS
Closes #512
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same pattern as bp-keycloak in commit ac276f06: post-install hooks need >5m
on first-install. otech16 (9e14dcc0d2de7586) hit:
- bp-openbao: failed post-install: timed out waiting for the condition
- bp-catalyst-platform: failed post-install: timed out waiting for the condition
disableWait: true governs resource Ready wait, NOT hook timeout. Helm hook
timeout defaults to 5m. OpenBao 3-node Raft init + catalyst-platform
umbrella init Jobs both legitimately need ~5-10min on first install.
Phase-8a-preflight live deployment otech14 (7bbd66f49fa1d07d, 2026-05-02)
exposed: keycloak-config-cli post-install hook fails to connect to
keycloak-headless:8080 within Helm's default 5m hook timeout.
Root cause: keycloak server cold-start takes ~2.5min (PostgreSQL schema
migration + 100+ Liquibase changesets). The keycloak-config-cli hook
then waits up to 120s for the keycloak HTTP API to respond. Total wall
time = ~4.5min — RIGHT at the edge of Helm's 5m default. Cilium L7 init
plus first-time pod scheduling pushes it over.
Fix: set explicit install/upgrade timeout: 15m on the HR. disableWait
already prevents readiness blocking; this only governs the post-install
hook (Helm-tracked Job).
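On the HR this is a small spec change, roughly (sketch; whether the 15m sits at spec.timeout or per-action under install/upgrade depends on the HR's existing layout):

```yaml
spec:
  timeout: 15m            # bounds Helm actions, including Helm-tracked hook Jobs
  install:
    disableWait: true     # skips resource-readiness wait only; does NOT extend the hook timeout
  upgrade:
    disableWait: true
```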
This also matches PR #221's original 15m setting that was reverted by
the disableWait refactor — disableWait turns off resource-readiness
wait but does NOT govern hook timeout, which remained at the 5m default.
The chart's CA Certificate template generated a `spec.commonName` of
`ca.<fullname>.cert-manager` where `<fullname>` is the Helm fullname
(release name + chart name). With the bootstrap-kit's release name
`cert-manager-powerdns-webhook`, the rendered CN landed at 78 bytes:
ca.cert-manager-powerdns-webhook-bp-cert-manager-powerdns-webhook.cert-manager
cert-manager's admission webhook rejects this against the RFC 5280
ub-common-name-length=64 PKIX upper bound, breaking otech11
(ac90a3ea12954e7d, chart 1.0.1, 2026-05-02) at install time.
Fix: collapse the CN onto the chart `name` helper (always
`bp-cert-manager-powerdns-webhook`, ≤63 chars) instead of the
release-prefixed `fullname`. The CA cert's CN is opaque identity only —
no client validates by hostname against this CN — so the shortening is
behaviour-preserving and stable across any operator-chosen releaseName.
Rendered CN with this fix:
ca.bp-cert-manager-powerdns-webhook.cert-manager (48 bytes)
Bumps chart 1.0.1 → 1.0.2 and updates the bootstrap-kit slot reference
in clusters/_template/bootstrap-kit/49-bp-cert-manager-powerdns-webhook.yaml.
Closes #508.
The pod template's metadata.labels block in the upstream Deployment
template included BOTH the `selectorLabels` helper AND the `labels`
helper. Since `labels` already emits app.kubernetes.io/name and
app.kubernetes.io/instance, the rendered YAML had those keys twice in
a single mapping, which Helm v3 post-render rejects with:
yaml: unmarshal errors:
line 29: mapping key "app.kubernetes.io/name" already defined at line 26
line 30: mapping key "app.kubernetes.io/instance" already defined at line 27
Surfaced live on Phase-8a-preflight otech11 (ac90a3ea12954e7d, on
catalyst-api:c148ef3, 2026-05-01).
Fix: drop the redundant `selectorLabels` include — `labels` is a
superset. Bump chart version 1.0.0 → 1.0.1 and update the bootstrap-kit
HR reference accordingly.
Closes openova#506.
Co-authored-by: e3mrah <emrah@openova.io>
Adds bp-gateway-api Blueprint (slot 01a) that vendors the upstream
Kubernetes Gateway API Standard-channel CRDs (v1.2.0) and registers them
ahead of every chart that ships HTTPRoute templates: bp-openbao,
bp-keycloak, bp-gitea, bp-powerdns, bp-catalyst-platform, bp-harbor,
bp-grafana.
Phase-8a-preflight live deployment otech10 (e1a0cd6662872fcb on
catalyst-api:c148ef3, 2026-05-01) reached 21/37 HRs Ready=True before
stalling on bp-harbor / bp-openbao / bp-powerdns reconciling to
InstallFailed with `no matches for kind "HTTPRoute" in version
"gateway.networking.k8s.io/v1"`. Cilium 1.16's chart `gatewayAPI.
enabled=true` flag wires up the cilium gateway controller and creates
the `cilium` GatewayClass, but does NOT install the
gateway.networking.k8s.io CRDs themselves; cilium 1.16 has no
`installCRDs`-equivalent knob for gateway-api so the upstream CRDs must
ship via a separate Blueprint.
Pattern locked in by docs/INVIOLABLE-PRINCIPLES.md and reinforced by
the founder for ALL similar future cases: intra-chart CRD-ordering
breaks → split into two charts + Flux dependsOn. Mirrors the
bp-crossplane/bp-crossplane-claims and bp-external-secrets/
bp-external-secrets-stores splits.
Files:
- platform/gateway-api/{blueprint.yaml,chart/} — new Blueprint with
per-CRD templates vendored from kubernetes-sigs/gateway-api v1.2.0
standard-install.yaml; helm.sh/resource-policy: keep on every CRD so a
Helm uninstall does not delete the CRDs and cascade-delete every
HTTPRoute on the cluster (sketched after this list)
- platform/gateway-api/chart/scripts/regenerate.sh — developer tool
for re-vendoring on upstream version bump (annotation-driven)
- platform/gateway-api/chart/tests/crd-render.sh — chart integration
test (5 CRDs, keep annotation, bundle-version matches Chart.yaml pin)
- clusters/_template/bootstrap-kit/01a-gateway-api.yaml — HelmRelease
+ HelmRepository, dependsOn bp-cilium
- clusters/_template/bootstrap-kit/{08-openbao,09-keycloak,10-gitea,
11-powerdns,13-bp-catalyst-platform,19-harbor,25-grafana}.yaml —
add `dependsOn: bp-gateway-api`
- clusters/_template/bootstrap-kit/kustomization.yaml — register
01a-gateway-api.yaml between 01-cilium and 02-cert-manager
- scripts/expected-bootstrap-deps.yaml — declare slot 1a + add
bp-gateway-api to depends_on of every HTTPRoute-using slot
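Each vendored CRD carries the keep policy, roughly (sketch; one of the five Standard-channel CRDs shown, spec body elided):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: httproutes.gateway.networking.k8s.io
  annotations:
    helm.sh/resource-policy: keep                      # CRD survives helm uninstall, so HTTPRoutes aren't cascade-deleted
    gateway.networking.k8s.io/bundle-version: v1.2.0   # assumption: matches the Chart.yaml pin
spec:
  # vendored verbatim from standard-install.yaml (elided)
```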
Closes #503
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>