Commit Graph

1215 Commits

Author SHA1 Message Date
e3mrah
0a721506d1
fix(catalyst-api): eventual-consistent Phase-1 watcher with late-poll (#910) (#913)
When the all-terminal trip fires with at least one failed HelmRelease,
keep the informer running for an additional LatePollTimeout window
(default 10 minutes) to give Flux helm-controller's remediation.retries
path room to flip the failed HR back to installing → installed. If
every component reaches StateInstalled during the late-poll window,
classify as OutcomeReady; if the deadline elapses with any HR still
failed, classify as OutcomeFailed exactly as before.

Motivated by the otech105 incident (2026-05-05): bp-catalyst-platform
1.4.17 hit the missing-sme-namespace InstallFailed on first install,
1.4.18 (chart-version bump) succeeded a few minutes later — the
Sovereign reached 40/40 HRs Ready=True but the orchestrator had
already marked the deployment FAILED at the moment of the 1.4.17
terminal observation.

Specifically:
* internal/helmwatch: new Config fields LatePollTimeout +
  LatePollInterval, new runLatePoll loop that re-reads the live
  state map until convergence-or-deadline. Per-component events
  fire via the existing dispatch path so the wizard log pane
  surfaces the recovery window. New CompileLatePollTimeout +
  CompileLatePollInterval env helpers parse
  CATALYST_PHASE1_LATE_POLL_TIMEOUT +
  CATALYST_PHASE1_LATE_POLL_INTERVAL.
* internal/handler: phase1WatchConfigForDeployment threads the
  two new knobs through. Two new test-only handler fields
  phase1LatePollTimeout / phase1LatePollInterval mirror the
  existing Phase-1 knobs.
* clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
  bump install/upgrade timeout from 15m to 25m for the
  bp-catalyst-platform umbrella specifically. The chart genuinely
  needs ~20 minutes worst-case on a fresh franchised Sovereign
  with the full SME service stack; every other bp-* chart stays
  at its previous default since they install in well under 5
  minutes empirically.

New tests cover:
* TestWatch_LatePollRecoversFailedComponentToReady — happy path
* TestWatch_LatePollExhaustsKeepsOutcomeFailed — exhaustion path
* TestWatch_LatePollMultipleFailedPartialRecovery — partial recovery
* TestWatch_LatePollDoesNotRunWhenNoFailures — happy-path regression
* TestLatePollActive_FlagToggles — accessor wiring
* TestCompileLatePoll{Timeout,Interval}_DefaultOnEmpty — env helpers
* TestRunPhase1Watch_LatePollRecoversFailedToReady — handler integration
* TestRunPhase1Watch_LatePollExhaustsFlipsToFailed — handler integration
* TestPhase1WatchConfig_LatePollEnvVarOverride — env wiring
* TestPhase1WatchConfig_LatePollFieldOverrideBeatsEnv — test injection

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:25:51 +04:00
github-actions[bot]
937491b17d deploy: update catalyst images to dd2fe1a 2026-05-05 08:16:17 +00:00
e3mrah
dd2fe1aa62
fix(bp-catalyst-platform): unblock Sovereign Console PIN-login on fresh provision (1.4.19, #910 Bugs 2+3) (#912)
Two coupled fixes that unblock Sovereign Console PIN-login on every
freshly franchised cluster (1.4.18 closed Bug 1 — the missing `sme`
namespace).

Bug 2 — CATALYST_SESSION_COOKIE_DOMAIN was hardcoded to
console.openova.io in templates/api-deployment.yaml. On a Sovereign the
request host is console.<sov-fqdn>, so the browser silently rejected
the Set-Cookie (RFC 6265 §5.3 step 6 — Domain mismatch) and every
/api/* request landed without a session, redirecting back to /login
forever. Caught live on otech105 (2026-05-05).

Fix: change the literal default to "" (empty). Per the dual-mode
contract documented in the CATALYST_POWERDNS_API_URL block of
api-deployment.yaml, this MUST stay a literal — Helm template
directives in `value:` fields break the contabo Kustomize-mode build.
Empty value is correct on BOTH paths: when CATALYST_SESSION_COOKIE_DOMAIN
is empty the auth handler omits the Domain attribute and the browser
binds the cookie to the exact request host. On contabo that is
console.openova.io (wizard + magic-link served from the same host); on
a Sovereign that is console.<sov-fqdn> (likewise). Per-Sovereign
overlays MAY override via the catalystApi.env additional-env patch in
the per-cluster HelmRelease for unusual topologies.

Bug 3 — catalyst-openova-kc-credentials-secret.yaml's smtp-user/
smtp-pass lookup used "existing target wins" persistence over the
source `sovereign-smtp-credentials` Secret seeded by A5's provisioner
(issue #883). On first install the source Secret had not yet been
seeded (race between catalyst-api's seedSovereignSMTP step and the
chart reconcile), so the chart rendered empty SMTP creds, persisted
them into the target, and operator-edited target bytes would be
overwritten on every subsequent reconcile because the source ALSO
won at that point — a footgun. Caught live on otech105 (2026-05-05):
POST /api/v1/auth/pin/issue 502'd with `email-send-failed`.

Fix: invert the SMTP-cred lookup precedence. SOURCE
(sovereign-smtp-credentials) wins over the persisted target. Every
Flux reconcile (1m cadence) re-reads the source, so as soon as A5's
seed completes the chart picks it up on the next tick. Operator
rotation: edit sovereign-smtp-credentials (the operator-facing seam);
the target is a chart-derived projection and never an operator surface.

KC fields keep the previous "existing target wins" contract because
bp-keycloak's openbao-bridge auto-rotates the client-secret on every
Helm upgrade and we want that rotation to require explicit operator
action (delete the target Secret) rather than auto-roll the
catalyst-api Pod.

Lockstep:
  - products/catalyst/chart/Chart.yaml: 1.4.18 → 1.4.19 with full
    1.4.19 changelog block.
  - clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
    pinned chart version 1.4.18 → 1.4.19 with inline rationale
    comment matching the 1.4.x changelog format.

Verification:
  - helm template (default values) clean — Kustomize-mode contabo
    build path unchanged.
  - helm template Sovereign-mode (ingress.marketplace.enabled=true,
    sovereignFQDN=otech106.omani.works) renders 62 resources;
    CATALYST_SESSION_COOKIE_DOMAIN renders as `value: ""`.
  - kubectl kustomize products/catalyst/chart/templates clean —
    contabo Kustomize-mode build emits same resource set, with
    CATALYST_SESSION_COOKIE_DOMAIN: "".

Refs: #910

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:14:20 +04:00
e3mrah
58bfdb5eb3
fix(catalyst-api): align SME tenant orchestrator emit with bp-keycloak / bp-cnpg chart contracts (#910) (#911)
The sme_tenant_gitops.go emit for per-tenant bp-keycloak HelmReleases
used a values shape (`topology`, `realm.*`, `bootstrap.*`, `ingress.*`)
that the bp-keycloak chart does NOT consume. Result: tenant Keycloak
Pod ran but the chart's templates/httproute.yaml guard rendered
nothing (`gateway.host` was unset), so tenant users could not reach
their own Keycloak and downstream WordPress / OpenClaw / Stalwart
OIDC integration broke.

Chart contract (platform/keycloak/chart/values.yaml):
  - sovereignFQDN
  - sovereignRealm.enabled
  - gateway.enabled / gateway.host / gateway.parentRef
  - smtp.{host,port,from,user,password,ssl,starttls,auth}

This change emits the canonical shape, plus a forward-looking
realmConfig.tenant.* marker for the future tenant-mode realm template
(Helm accepts unknown values silently — the marker is harmless until
the chart honours it).

Also fixes bp-cnpg emit: the chart is a pure umbrella subchart of
cloudnative-pg; per-Sovereign overrides MUST flow through the
`cloudnative-pg.*` namespace. The previous top-level `namespace` /
`operator.enabled` keys were silently ignored by Helm. Tenant install
also disables CRD creation since the mothership bp-cnpg already owns
them.

Tenant SMTP credentials are wired via spec.valuesFrom referring to a
per-tenant `sme-tenant-smtp-credentials` Secret (optional=true so the
chart still installs before the Secret is reflected — outbound mail
silently no-ops, login flows work).

Tests:
  - TestBPKeycloakEmittedYAMLParses        (every byte parses as YAML)
  - TestBPKeycloakValuesContract           (sovereignFQDN/gateway/smtp/sovereignRealm)
  - TestBPKeycloakValuesContract_NoLegacyKeys
  - TestBPCNPGSubchartKey
  - TestBPKeycloakValuesFromSMTPSecret     (optional, smtp.* targetPath)
  - TestBPKeycloakInstallTimeout

Verified WP / OpenClaw / Stalwart emit shapes already align with their
chart values.yaml (smeDomain / keycloak.realmURL / clientID /
clientSecretName / ingress.host) — no change needed in those templates.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:12:50 +04:00
github-actions[bot]
abea3af1e5 deploy: update catalyst images to 4969525 2026-05-05 07:40:42 +00:00
e3mrah
496952587e
fix(bp-catalyst-platform): create sme namespace on marketplace Sovereigns (1.4.18) (#909)
Every template under templates/sme-services/* (billing, auth, ferretdb,
valkey-cross-ns-secret, sme-secrets, provisioning-github-token,
cnpg-cluster, ...) emits resources with `namespace: sme`. On
Catalyst-Zero (contabo) the `sme` namespace is pre-provisioned by
clusters/contabo-mkt/apps/sme/* — so the chart never needed to create
it. On a fresh franchised Sovereign nothing else creates the `sme`
namespace, so chart 1.4.17 install failed 23 times with
`failed to create resource: namespaces "sme" not found`. Caught live
on otech105 (2026-05-05) — bp-catalyst-platform stuck Ready=False
for 18 minutes blocking every downstream Sovereign Console login + the
full marketplace UI.

Fix:
  - NEW templates/sme-services/sme-namespace.yaml — gated on the same
    `.Values.ingress.marketplace.enabled` flag the rest of the SME
    bundle uses. Renders a Namespace `sme` with
    `helm.sh/resource-policy: keep` so a chart uninstall never
    cascade-deletes every SME workload + tenant.
  - Same dual-mode contract as templates/marketplace-api/secret.yaml
    (#887) and templates/catalyst-openova-kc-credentials-secret.yaml
    (#901): the new file is intentionally NOT added to
    templates/sme-services/kustomization.yaml's `resources:` list, so
    the Kustomize-mode contabo build skips it entirely (contabo's
    `sme` namespace is owned by clusters/contabo-mkt/apps/sme/
    namespace.yaml).

Lockstep:
  - products/catalyst/chart/Chart.yaml: 1.4.17 -> 1.4.18 with
    full 1.4.18 changelog block.
  - clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
    pinned chart version 1.4.17 -> 1.4.18 with inline rationale
    comment matching the 1.4.x changelog format.

Verified live on otech105: after the runtime hot-fix
(`kubectl create ns sme`) bp-catalyst-platform reached
Ready=True ("Helm upgrade succeeded for release catalyst-system/
catalyst-platform.v2 with chart bp-catalyst-platform@1.4.17") and
all 40/40 bootstrap-kit HRs converged. This PR ensures future
Sovereigns provision cleanly without operator intervention.

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:38:31 +04:00
github-actions[bot]
82ade7397c deploy: update catalyst images to aec4aca 2026-05-05 07:09:37 +00:00
e3mrah
aec4aca296
fix(catalyst-api): PDM client must add basic auth for public ingress (#907) (#908)
# What

The pdm.Client (Reserve / Commit / Release / Check) never sets the
`Authorization: Basic …` header — but the Sovereign-side catalyst-api
talks to PDM via the public ingress at https://pool.openova.io which is
gated by Traefik basicAuth Middleware. Every fresh provision attempt
fails at the very first PDM hop with:

    {"detail":"pool-domain-manager is temporarily unreachable: pdm reserve status 401: 401 Unauthorized\n",
     "error":"pdm-unavailable"}

This blocks 100% of fresh otechN provisions on pool-mode Sovereigns.

# Why now

Caught live during DoD A6 verification on otech104. The
`pdm-basicauth` Secret is already provisioned on Sovereigns (per
api-deployment.yaml lines 588-625, the env vars
CATALYST_PDM_BASIC_AUTH_USER / _PASS are wired through Reflector from
contabo). The handler-side `pdmFlipNS` and `pdmCreatePowerDNSZone`
(Day-2 add-domain operations) already use these credentials — but the
core `pdm.Client` used during initial provisioning does not. This is
the asymmetry the fix corrects.

# What changes

* `internal/pdm/client.go` — add a private `do(req)` helper that
  decorates outbound requests with basic auth from Pod env. Replace
  the four direct `c.HTTP.Do(req)` callsites with `c.do(req)`.
  Read every call so a Secret rotation propagates without a Pod
  restart (Reloader handles env reload). When env is unset the
  helper is a no-op — preserving the in-cluster Service path used
  by Catalyst-Zero (contabo) where Traefik basicAuth is not in
  front of the request.
* `internal/pdm/client_test.go` — two new tests:
  - `TestClient_BasicAuth_AppliedFromEnv` — every method (Check /
    Reserve / Commit / Release) carries the expected `Basic …`
    header when env is set.
  - `TestClient_BasicAuth_OmittedWhenEnvUnset` — defensive shape
    for in-cluster Service path.

Per Inviolable Principle #10, the credentials never enter a struct
that gets logged — read-and-set inside `do()` only.

Per Inviolable Principle #4 (never hardcode), the basic-auth shape
mirrors the existing `pdmBasicAuth()` seam in
`handler/parent_domains.go` — same env-var contract, same defensive
"empty creds = skip auth" semantics.

# Verification

`go test ./internal/pdm/...` passes locally.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 11:07:25 +04:00
github-actions[bot]
300c774ff4 deploy: update catalyst images to e08d872 2026-05-05 07:03:01 +00:00
e3mrah
e08d8721e1
fix(pdm/dynadot): pre-register glue records before set_ns (#900) (#906)
Multi-domain Day-2 add-domain on a Sovereign was failing with Dynadot's
"'ns1.<sov>.omani.works' needs to be registered with an ip address
before it can be used" error. Dynadot rejects set_ns whenever the NS
hostnames aren't registered as account-level "host records" first.

This change wires the glue pre-registration into the PDM dynadot
adapter as an optional registrar.GlueRegistrar interface, threads the
Sovereign's load-balancer IPv4 from cloud-init through Flux postBuild
into the chart's `global.sovereignLBIP`, and forwards it via
catalyst-api's pdmFlipNS to PDM's /set-ns endpoint as a new `glueIP`
field. PDM's SetNS handler calls RegisterGlueRecord for each
out-of-bailiwick NS before SetNameservers, with idempotent get_ns →
register_ns / set_ns_ip semantics so retries are free.

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:00:45 +04:00
e3mrah
7658f9d937
fix(catalyst-api): seed sovereign-smtp-credentials Secret on freshly franchised Sovereigns (#883) (#905)
On a freshly franchised Sovereign the console-side magic-link / PIN
email flow fails because there's no SMTP relay reachable in the
cluster. Phase-1 architectural decision (founder-confirmed): the
Sovereign Console relays mail through the mothership Stalwart at
mail.openova.io:587 during initial provisioning. A Sovereign-local
Stalwart-relay is Phase-2 work tracked separately.

This PR teaches the catalyst-api Sovereign provisioner to seed the
catalyst-system/sovereign-smtp-credentials Secret on the new cluster
right after the cloud-init kubeconfig postback lands and BEFORE
runPhase1Watch fires. The bp-catalyst-platform chart's auto-create
step (#901) reads this Secret via Helm `lookup` when rendering the
Sovereign-local catalyst-openova-kc-credentials Secret, so the
chart-rendered bytes carry working SMTP submission credentials and
the auth service's SMTP-PLAIN dial against mail.openova.io:587
succeeds on the first send-pin.

What's seeded:
  Secret catalyst-system/sovereign-smtp-credentials
    smtp-user: <mothership CATALYST_SMTP_USER>
    smtp-pass: <mothership CATALYST_SMTP_PASS>

The mothership catalyst-api Pod already has both env vars wired via
secretKeyRef → catalyst-openova-kc-credentials in the catalyst
namespace (chart api-deployment.yaml.679-740) — no new K8s read
against the mothership API is needed.

Idempotent: an already-existing sovereign-smtp-credentials Secret
short-circuits to AlreadyExists. The helper does NOT update an
existing Secret — operator-supplied bytes take precedence over
mothership re-seed. This survives the kubeconfig PUT retry path,
the kubeconfig-missing relaunch (#538), and operator manual replay
during incident response.

Failure modes are surfaced via the SSE event bus (sovereign-smtp-seed
phase) so the wizard renders the seed outcome inline with helmwatch
events. A failure does NOT abort Phase-1 — the chart's lookup will
not find the Secret, the auth pod will log SMTP-refused on first
send-pin (exactly the pre-fix behaviour), and the operator sees a
loud warn at provision time rather than a silent "ready" with broken
email.

Per docs/INVIOLABLE-PRINCIPLES.md #10 (credential hygiene): the
catalyst-api never logs the SMTP password. Logs include the
deployment id, target namespace + secret name, and byte length —
never the plaintext.

Per #4 (never hardcode): namespace + secret name are fixed-by-chart-
contract (#901); timeout is overridable via
CATALYST_SOVEREIGN_SMTP_SEED_TIMEOUT.

Tests:
  - skipped-no-env outcome when mothership env unset
  - happy path: Secret + Namespace created, data + labels +
    annotations verified
  - already-exists pre-Create: no overwrite of operator bytes
  - race during Create: AlreadyExists treated as success
  - client-build failure: ClientFailure outcome
  - api-failure on Get (non-NotFound): APIFailure outcome
  - emit event matrix: every outcome maps to expected level + substr

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:58:49 +04:00
e3mrah
368545369b
fix(bp-stalwart-tenant): unbootable on fresh tenants — values shape, missing admin Secret, sec ctx (#898) (#904)
Three fixes that left bp-stalwart-tenant 0.1.0 unable to come up on a
freshly-franchised SME tenant. All surfaced on the otech103 alice
tenant during the Phase-1 DoD sweep.

1. Tenant-domain values shape (HelmRelease render error)

   The 0.1.0 chart referenced `.Values.domain.primary` in five
   templates. The live HR on otech103 had `values.domain:
   acme.omani.works` (a string), emitted by a pre-#897 catalyst-api
   build, so every reconcile died with:

     can't evaluate field primary in type interface {}

   Added `bp-stalwart-tenant.tenantDomain` + `tenantMode` helpers
   that resolve in priority order:

     1. `tenant.domain`        (forward-looking flat shape)
     2. `domain.primary`       (canonical post-#897 map shape)
     3. `domain` (string)      (legacy pre-#897 shape — back-compat)

   Returns "" when no shape resolves, keeping smoke renders safe;
   per-template gates skip rendering when the value is empty.

2. Missing stalwart-admin Secret

   deployment.yaml + mailbox-provision-job.yaml reference a Secret
   key `ADMIN_PASSWORD` on `.Values.admin.secretName`. The 0.1.0
   chart only emitted an ExternalSecret, and only when
   `admin.externalSecret.remoteRef.key` was non-empty (smoke-render
   concession). Fresh tenants land in CreateContainerConfigError.

   Added `templates/admin-secret.yaml` mirroring marketplace-api/
   secret.yaml (#887): random 32-char ADMIN_PASSWORD generated by
   sprig randAlphaNum, persisted across reconcile via lookup,
   helm.sh/resource-policy: keep so reinstall picks it back up.
   Auto-disabled when an authoritative ExternalSecret is wired —
   no double-bind between two controllers.

3. Pod sec ctx vs. upstream image's file capabilities

   `getcap docker.io/stalwartlabs/stalwart:v0.16.3 /usr/local/bin/stalwart`
   reports `cap_net_bind_service=ep`. The image creates
   user `stalwart` at UID 2000 and the binary IS the entrypoint
   (no demotion script). The 0.1.0 chart ran as UID 65534 with
   `drop: ALL` — kernel refuses to elevate file caps with empty
   bounding set, so exec failed with `operation not permitted`.

   Aligned to image's native UID 2000, kept `drop: ALL` and added
   `NET_BIND_SERVICE` explicitly. fsGroup 2000 ensures /opt/stalwart
   PVC is writable.

Other:
- Bumped Chart.yaml + blueprint.yaml to 0.1.1 (#817 alignment).
- configSchema in blueprint.yaml now permits the legacy + tenant
  shapes alongside the canonical map.
- mailboxProvisioner.setupJob.enabled defaults to false until the
  canonical stalwart-cli image is published (re-uses upstream
  stalwart container as fallback CLI host).

Acceptance: targeted at otech103 alice tenant
(sme-789ae512-bc0f-467c-a016-001f5496c403) where 0.1.0 reconciliation
fails with the value-shape error and the pod CrashLoops with `exec
... operation not permitted`. Verification on otech103 in #898.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:55:03 +04:00
e3mrah
cab0a30e4a
fix(catalyst): unblock Sovereign Console login on fresh provision (#901) (#903)
Three-bug chain blocked https://console.<sov-fqdn>/login PIN-issue on
every fresh Sovereign with HTTP 503 "CATALYST_OPENOVA_KC_SA_CLIENT_SECRET
not set":

1. catalyst-openova-kc-credentials Secret was hand-rolled on contabo-mkt
   and never provisioned on Sovereign by the chart. NEW
   templates/catalyst-openova-kc-credentials-secret.yaml mirrors the
   canonical KC SA Secret (keycloak/catalyst-kc-sa-credentials, created
   by bp-keycloak's openbao-bridge post-install hook) into
   catalyst-system/catalyst-openova-kc-credentials with the keys
   api-deployment.yaml's PIN-auth env block expects. Same Helm-`lookup`
   persistence + `helm.sh/resource-policy: keep` pattern as
   templates/marketplace-api/secret.yaml (#887).

   Sovereign-vs-contabo gate: render only when `lookup "v1" "Secret"
   "keycloak" "catalyst-kc-sa-credentials"` returns non-nil. On contabo
   that lookup is nil (Catalyst-Zero uses keycloak-zero in its own ns
   with its own hand-rolled Secret); template emits empty bytes, no
   ownership flap. Not added to templates/kustomization.yaml `resources:`
   so Kustomize-mode contabo build skips it entirely.

2. SMTP host default `stalwart-web.stalwart.svc.cluster.local` doesn't
   resolve on Sovereign. Chart now populates smtp-host/smtp-port/smtp-from
   from .Values.sovereign.smtp.* defaulting to mail.openova.io:587 /
   noreply@openova.io. SMTP user/pass mirrored from a SECONDARY lookup
   against catalyst-system/sovereign-smtp-credentials (#883 seam). When
   the source Secret is absent the new Secret renders with empty
   smtp-user/smtp-pass — login surface still works and PIN delivery
   surfaces as a clear "email delivery failed" log line, not as a 503.

3. CATALYST_POST_AUTH_REDIRECT default `/sovereign/wizard` is mothership-
   only. Default flips to `/sovereign/components` (the post-handover
   Sovereign Console homepage). Per-Sovereign overlays override via the
   catalystApi.env additional-env patch — the chart value is a literal
   per the dual-mode contract documented in the CATALYST_POWERDNS_API_URL
   block of api-deployment.yaml.

Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.16 → 1.4.17.

Refs: #901

Signed-off-by: hatice.yildiz <hatice.yildiz@openova.io>
Co-authored-by: hatice.yildiz <hatice.yildiz@openova.io>
2026-05-05 10:54:09 +04:00
e3mrah
93c4b700de
fix(bp-keycloak): templatize existingConfigmap reference for per-tenant installs (#899) (#902)
bp-keycloak 1.3.2 hardcoded `keycloak.keycloakConfigCli.existingConfigmap` to
the literal "keycloak-sovereign-realm-config". This worked for the Sovereign-
mothership bootstrap-kit (releaseName=keycloak emits matching ConfigMap) but
broke for every per-tenant install where releaseName=bp-keycloak emits
"bp-keycloak-sovereign-realm-config" — the post-install keycloak-config-cli
Job stuck in ContainerCreating with `MountVolume.SetUp failed for volume
"config-volume" : configmap "keycloak-sovereign-realm-config" not found`,
HelmRelease InstallFailed after 15m timeout, cascading to bp-openclaw and
bp-wordpress-tenant which dependsOn it.

The bitnami/keycloak subchart's `keycloak.keycloakConfigCli.configmapName`
helper (charts/keycloak/templates/_helpers.tpl) applies `tpl` to the
existingConfigmap value, so embedding `{{ .Release.Name }}` inside the
string resolves at chart-render time. With this single-line change:

  - Sovereign-mothership (releaseName=keycloak) → keycloak-sovereign-realm-config (unchanged)
  - Per-tenant (releaseName=bp-keycloak)        → bp-keycloak-sovereign-realm-config (matches actual emitted ConfigMap)

Verified via helm template both modes — backendRef and config-volume
configMap.name match the actual ConfigMap emitted by
templates/configmap-sovereign-realm.yaml.

Chart bumped 1.3.2 → 1.3.3 + bootstrap-kit slot 09 + blueprint.yaml.

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:49:39 +04:00
github-actions[bot]
febad0249d deploy: update catalyst images to 6b0d6c3 2026-05-05 06:00:29 +00:00
e3mrah
6b0d6c37af
fix(catalyst-api): SME tenant bp-stalwart overlay uses correct domain.{primary,mode} schema (#897)
* fix(bp-catalyst-platform): bump 1.4.15 -> 1.4.16 to republish with #893/#889 catalyst-api image (727fb2f)

* fix(catalyst-api): SME tenant bp-stalwart overlay uses correct domain.{primary,mode} schema

The bp-stalwart-tenant chart values schema is:
  domain:
    primary: <fqdn>
    mode: free-subdomain | byo

But the tenant overlay template emitted a flat scalar:
  domain: <fqdn>

Helm rendered the mailbox-provision-job template and hit:
  template: bp-stalwart-tenant/templates/mailbox-provision-job.yaml:67:
  can't evaluate field primary in type interface {}

Fix: emit the correct nested object with .DomainMode threaded through
from smeTenantTemplateData (already populated by renderSMETenantOverlay).

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:58:11 +04:00
github-actions[bot]
d084cceeba deploy: update catalyst images to 98f5543 2026-05-05 05:54:30 +00:00
e3mrah
98f5543bdc
fix(bp-catalyst-platform): bump 1.4.15 -> 1.4.16 to republish with #893/#889 catalyst-api image (727fb2f) (#896)
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:52:30 +04:00
github-actions[bot]
98fc72dfd4 deploy: update catalyst images to 727fb2f 2026-05-05 05:47:47 +00:00
e3mrah
727fb2ffdd
fix(catalyst-api): SME tenant orchestrator emits shared helmrepositories.yaml (#893 follow-up) (#895)
* fix(catalyst-api): SME-tenant orchestrator writes parent kustomization.yaml index (#889)

The Flux Kustomization rendered by bp-catalyst-platform 1.4.13+ at
clusters/<sov-fqdn>/sme-tenants/ requires a parent kustomization.yaml
that enumerates tenant subdirectories. The orchestrator only wrote
per-tenant overlays without the parent index, so on otech103 Flux
hit:

  kustomization path not found: stat /tmp/kustomization-...
  /clusters/otech103.omani.works/sme-tenants: no such file or directory

Even after a tenant signup, the parent path lacked a kustomization.yaml
so Flux couldn't enumerate subdirs.

Fix: NEW writeParentTenantsIndex helper called from both
WriteTenantOverlay and DeleteTenantOverlay. Scans the parent dir for
subdirectories that contain kustomization.yaml, sorts them lexically
for deterministic output (no spurious diffs), and writes a parent
kustomization.yaml listing them under `resources:`. Empty list (no
tenants) renders as `resources: []` — still a valid Kustomization
root, so Flux stays Ready=True after the last tenant teardown.

git add covers both the per-tenant subdir AND the parent index, so a
single commit captures the delta.

Live on otech103 post-cutover, 2026-05-05.

* fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter to include clusters/<sov-fqdn>/ (#891)

After Day-2 cutover, the GitRepository ignore filter excluded the
Sovereign's own clusters/<sov-fqdn>/ subtree. This made every
Sovereign-specific Flux Kustomization (sme-tenants, future per-Sov
overlays) hit "kustomization path not found" because source-controller
filtered the path out of the artifact tarball.

Live on otech103 (2026-05-05): sme-tenants Kustomization stuck for
20+ minutes despite the orchestrator successfully committing the
overlay to local Gitea.

Fix: Step-5 (flux-gitrepository-patch) now writes the patch as a
multi-line YAML strategic-merge file via /tmp emptyDir (since the
Pod runs readOnlyRootFilesystem), composing the new ignore filter:

  /*
  !/clusters/_template
  !/clusters/${SOVEREIGN_FQDN}
  !/platform
  !/products

The SOVEREIGN_FQDN is wired from .Values.sovereign.fqdn (already
established in the chart values).

Bumps chart 0.1.14 -> 0.1.15. Slot 06a pin bumps in lockstep.

* fix(catalyst-api): SME tenant HR templates reference correct per-blueprint HelmRepository names (#893)

Five overlay templates in sme_tenant_gitops.go hardcoded:
  sourceRef:
    name: openova-blueprints

But Sovereign clusters have NO HelmRepository named `openova-blueprints`.
Each blueprint ships its own HelmRepository named after itself:
- bp-keycloak / bp-cnpg / bp-wordpress-tenant / bp-openclaw /
  bp-stalwart-tenant

Live on otech103 (2026-05-05): all 5 tenant bp-* HRs stuck in
"HelmChart not ready: latest generation of object has not been
reconciled" because the HelmRepository didn't exist.

Fix: each template's sourceRef.name now matches the actual
HelmRepository name. Verified live patch works on otech103.

* fix(catalyst-api): SME tenant orchestrator emits shared helmrepositories.yaml at parent level (#893 follow-up)

After #893 fixed the per-tenant HR sourceRef.name to match the actual
HelmRepository name, the HelmRepositories themselves were absent on
Sovereigns: the bootstrap-kit only ships a small canonical set
(bp-cilium, bp-cnpg, bp-keycloak, bp-gitea, ...). The SME tenant
charts (bp-wordpress-tenant, bp-openclaw, bp-stalwart-tenant) and the
vcluster (loft) repo aren't on a Sovereign by default.

Fix: extend writeParentTenantsIndex to ALSO emit a shared
helmrepositories.yaml at clusters/<sov-fqdn>/sme-tenants/
helmrepositories.yaml. The parent kustomization.yaml lists it FIRST
so source-controller reconciles the HelmRepositories before any
tenant HelmChart is requested.

Six HelmRepositories total: bp-keycloak, bp-cnpg, bp-wordpress-tenant,
bp-openclaw, bp-stalwart-tenant (oci://ghcr.io/openova-io), and loft
(https://charts.loft.sh) for the vcluster chart.

Live verification on otech103: applied the four missing repos
(bp-wordpress-tenant, bp-openclaw, bp-stalwart-tenant, loft) and the
tenant HRs progress past SourceNotReady.

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:44:52 +04:00
github-actions[bot]
4a810ddcf7 deploy: update catalyst images to 3eb0cd6 2026-05-05 05:43:58 +00:00
e3mrah
3eb0cd6d0b
fix(catalyst-api): SME tenant HR templates reference correct per-blueprint HelmRepository names (#893) (#894)
* fix(catalyst-api): SME-tenant orchestrator writes parent kustomization.yaml index (#889)

The Flux Kustomization rendered by bp-catalyst-platform 1.4.13+ at
clusters/<sov-fqdn>/sme-tenants/ requires a parent kustomization.yaml
that enumerates tenant subdirectories. The orchestrator only wrote
per-tenant overlays without the parent index, so on otech103 Flux
hit:

  kustomization path not found: stat /tmp/kustomization-...
  /clusters/otech103.omani.works/sme-tenants: no such file or directory

Even after a tenant signup, the parent path lacked a kustomization.yaml
so Flux couldn't enumerate subdirs.

Fix: NEW writeParentTenantsIndex helper called from both
WriteTenantOverlay and DeleteTenantOverlay. Scans the parent dir for
subdirectories that contain kustomization.yaml, sorts them lexically
for deterministic output (no spurious diffs), and writes a parent
kustomization.yaml listing them under `resources:`. Empty list (no
tenants) renders as `resources: []` — still a valid Kustomization
root, so Flux stays Ready=True after the last tenant teardown.

git add covers both the per-tenant subdir AND the parent index, so a
single commit captures the delta.

Live on otech103 post-cutover, 2026-05-05.

* fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter to include clusters/<sov-fqdn>/ (#891)

After Day-2 cutover, the GitRepository ignore filter excluded the
Sovereign's own clusters/<sov-fqdn>/ subtree. This made every
Sovereign-specific Flux Kustomization (sme-tenants, future per-Sov
overlays) hit "kustomization path not found" because source-controller
filtered the path out of the artifact tarball.

Live on otech103 (2026-05-05): sme-tenants Kustomization stuck for
20+ minutes despite the orchestrator successfully committing the
overlay to local Gitea.

Fix: Step-5 (flux-gitrepository-patch) now writes the patch as a
multi-line YAML strategic-merge file via /tmp emptyDir (since the
Pod runs readOnlyRootFilesystem), composing the new ignore filter:

  /*
  !/clusters/_template
  !/clusters/${SOVEREIGN_FQDN}
  !/platform
  !/products

The SOVEREIGN_FQDN is wired from .Values.sovereign.fqdn (already
established in the chart values).

Bumps chart 0.1.14 -> 0.1.15. Slot 06a pin bumps in lockstep.

* fix(catalyst-api): SME tenant HR templates reference correct per-blueprint HelmRepository names (#893)

Five overlay templates in sme_tenant_gitops.go hardcoded:
  sourceRef:
    name: openova-blueprints

But Sovereign clusters have NO HelmRepository named `openova-blueprints`.
Each blueprint ships its own HelmRepository named after itself:
- bp-keycloak / bp-cnpg / bp-wordpress-tenant / bp-openclaw /
  bp-stalwart-tenant

Live on otech103 (2026-05-05): all 5 tenant bp-* HRs stuck in
"HelmChart not ready: latest generation of object has not been
reconciled" because the HelmRepository didn't exist.

Fix: each template's sourceRef.name now matches the actual
HelmRepository name. Verified via a live patch on otech103.
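After the fix, each tenant HelmRelease's chart source reference takes this shape (a sketch; the kind/namespace fields are assumed from standard Flux conventions, and bp-wordpress-tenant stands in for any of the five blueprints):

```yaml
spec:
  chart:
    spec:
      chart: bp-wordpress-tenant
      sourceRef:
        kind: HelmRepository
        name: bp-wordpress-tenant   # previously the nonexistent openova-blueprints
        namespace: flux-system
```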

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:41:47 +04:00
e3mrah
eddf0e62a4
fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter (#891) (#892)
* fix(catalyst-api): SME-tenant orchestrator writes parent kustomization.yaml index (#889)

The Flux Kustomization rendered by bp-catalyst-platform 1.4.13+ at
clusters/<sov-fqdn>/sme-tenants/ requires a parent kustomization.yaml
that enumerates tenant subdirectories. The orchestrator only wrote
per-tenant overlays without the parent index, so on otech103 Flux
hit:

  kustomization path not found: stat /tmp/kustomization-...
  /clusters/otech103.omani.works/sme-tenants: no such file or directory

Even after a tenant signup, the parent path lacked a kustomization.yaml
so Flux couldn't enumerate subdirs.

Fix: NEW writeParentTenantsIndex helper called from both
WriteTenantOverlay and DeleteTenantOverlay. Scans the parent dir for
subdirectories that contain kustomization.yaml, sorts them lexically
for deterministic output (no spurious diffs), and writes a parent
kustomization.yaml listing them under `resources:`. Empty list (no
tenants) renders as `resources: []` — still a valid Kustomization
root, so Flux stays Ready=True after the last tenant teardown.

git add covers both the per-tenant subdir AND the parent index, so a
single commit captures the delta.

Live on otech103 post-cutover, 2026-05-05.

* fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter to include clusters/<sov-fqdn>/ (#891)

After Day-2 cutover, the GitRepository ignore filter excluded the
Sovereign's own clusters/<sov-fqdn>/ subtree. This made every
Sovereign-specific Flux Kustomization (sme-tenants, future per-Sov
overlays) hit "kustomization path not found" because source-controller
filtered the path out of the artifact tarball.

Live on otech103 (2026-05-05): sme-tenants Kustomization stuck for
20+ minutes despite the orchestrator successfully committing the
overlay to local Gitea.

Fix: Step-5 (flux-gitrepository-patch) now writes the patch as a
multi-line YAML strategic-merge file via /tmp emptyDir (since the
Pod runs readOnlyRootFilesystem), composing the new ignore filter:

  /*
  !/clusters/_template
  !/clusters/${SOVEREIGN_FQDN}
  !/platform
  !/products

The SOVEREIGN_FQDN is wired from .Values.sovereign.fqdn (already
established in the chart values).

Bumps chart 0.1.14 -> 0.1.15. Slot 06a pin bumps in lockstep.
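As a strategic-merge patch file, the widened filter could be applied roughly like this (a sketch; the flux-system/openova GitRepository name follows the convention mentioned elsewhere in this log, and the FQDN is shown already substituted):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: openova
  namespace: flux-system
spec:
  ignore: |
    /*
    !/clusters/_template
    !/clusters/otech103.omani.works
    !/platform
    !/products
```

Applied with something like `kubectl patch gitrepository openova -n flux-system --type merge --patch-file /tmp/ignore-patch.yaml`.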

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:39:42 +04:00
github-actions[bot]
c2ff6da073 deploy: update catalyst images to a9f0626 2026-05-05 05:31:48 +00:00
e3mrah
a9f06265fb
fix(catalyst-api): SME-tenant orchestrator writes parent kustomization.yaml index (#889) (#890)
The Flux Kustomization rendered by bp-catalyst-platform 1.4.13+ at
clusters/<sov-fqdn>/sme-tenants/ requires a parent kustomization.yaml
that enumerates tenant subdirectories. The orchestrator only wrote
per-tenant overlays without the parent index, so on otech103 Flux
hit:

  kustomization path not found: stat /tmp/kustomization-...
  /clusters/otech103.omani.works/sme-tenants: no such file or directory

Even after a tenant signup, the parent path lacked a kustomization.yaml
so Flux couldn't enumerate subdirs.

Fix: NEW writeParentTenantsIndex helper called from both
WriteTenantOverlay and DeleteTenantOverlay. Scans the parent dir for
subdirectories that contain kustomization.yaml, sorts them lexically
for deterministic output (no spurious diffs), and writes a parent
kustomization.yaml listing them under `resources:`. Empty list (no
tenants) renders as `resources: []` — still a valid Kustomization
root, so Flux stays Ready=True after the last tenant teardown.

git add covers both the per-tenant subdir AND the parent index, so a
single commit captures the delta.

Live on otech103 post-cutover, 2026-05-05.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:29:44 +04:00
github-actions[bot]
654ac4fb5e deploy: update catalyst images to 3726176 2026-05-05 05:28:33 +00:00
e3mrah
3726176e19
fix(bp-catalyst-platform): auto-provision marketplace-api-secrets on Sovereign install (#887) (#888)
* fix(bp-catalyst-platform): bump 1.4.13 -> 1.4.14 to republish with #879 catalyst-api image (7bfd6df)

Chart 1.4.13 was published from commit 7bfd6df5 (the #879 fix) BEFORE the
deploy-bot updated values.yaml's catalystApi.tag from aa226df -> 7bfd6df,
so 1.4.13 OCI bytes still reference the OLD catalyst-api image without
the pdmFlipNS basic-auth + nameservers + lookup-primary-domain
SOVEREIGN_FQDN-fallback fixes.

Same deploy-step race already documented in 1.4.6 / 1.4.9 / 1.4.12
changelog entries — catalyst-build CI doesn't yet auto-bump chart patch
+ dispatch blueprint-release the way services-build does (per #874), so
this manual republish is required after every catalyst-api image change.

No template/code changes — pure version bump to roll a fresh OCI artifact
whose values.yaml references catalystApi.tag=7bfd6df. Lockstep slot 13
pin bumps to 1.4.14.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bp-catalyst-platform): auto-provision marketplace-api-secrets on Sovereign install (#887)

templates/marketplace-api/deployment.yaml referenced a secretKeyRef on
`marketplace-api-secrets` (key: `jwt-secret`) but the chart never rendered
the Secret. On contabo-mkt this is hand-rolled; on a freshly franchised
Sovereign with ingress.marketplace.enabled=true the marketplace-api Pod
hit CreateContainerConfigError on every reconcile.

Fix: NEW templates/marketplace-api/secret.yaml uses Helm `lookup` to
persist a 64-char randAlphaNum jwt-secret across reconciles (same
load-bearing pattern as sme-secrets, valkey-cross-ns-secret,
provisioning-github-token, gitea-admin-secret per
feedback_passwords.md). Without lookup every reconcile would invalidate
every active marketplace JWT.

helm.sh/resource-policy: keep so the Secret survives helm uninstall.
Lockstep slot 13 pin bumps 1.4.14 -> 1.4.15.

Caught live on otech103 post-cutover, 2026-05-05.
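The lookup-persist pattern described above renders roughly like this (a sketch of the pattern only, not the chart's actual template):

```yaml
{{- $existing := lookup "v1" "Secret" .Release.Namespace "marketplace-api-secrets" -}}
apiVersion: v1
kind: Secret
metadata:
  name: marketplace-api-secrets
  annotations:
    helm.sh/resource-policy: keep   # survive helm uninstall
type: Opaque
data:
  {{- if $existing }}
  jwt-secret: {{ index $existing.data "jwt-secret" }}   # preserve across reconciles
  {{- else }}
  jwt-secret: {{ randAlphaNum 64 | b64enc }}            # first install only
  {{- end }}
```

Without the `lookup` branch, every helm-controller reconcile would regenerate the value and invalidate every active marketplace JWT.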

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 09:26:23 +04:00
github-actions[bot]
87e090dd0c deploy: update catalyst images to 213039d 2026-05-05 05:12:35 +00:00
e3mrah
213039dc31
fix(bp-catalyst-platform): bump 1.4.13 -> 1.4.14 to republish with #879 catalyst-api image (7bfd6df) (#886)
Chart 1.4.13 was published from commit 7bfd6df5 (the #879 fix) BEFORE the
deploy-bot updated values.yaml's catalystApi.tag from aa226df -> 7bfd6df,
so 1.4.13 OCI bytes still reference the OLD catalyst-api image without
the pdmFlipNS basic-auth + nameservers + lookup-primary-domain
SOVEREIGN_FQDN-fallback fixes.

Same deploy-step race already documented in 1.4.6 / 1.4.9 / 1.4.12
changelog entries — catalyst-build CI doesn't yet auto-bump chart patch
+ dispatch blueprint-release the way services-build does (per #874), so
this manual republish is required after every catalyst-api image change.

No template/code changes — pure version bump to roll a fresh OCI artifact
whose values.yaml references catalystApi.tag=7bfd6df. Lockstep slot 13
pin bumps to 1.4.14.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 09:10:37 +04:00
e3mrah
4120e4ed9d
fix(bp-catalyst-platform): Flux Kustomization watching SME tenant overlays (#882) (#885)
The catalyst-api SME-tenant pipeline's GitOps writer
(sme_tenant_gitops.go::WriteTenantOverlay) commits per-tenant Kustomize
overlays to clusters/<sov-fqdn>/sme-tenants/<tenant-id>/ on every
successful POST /api/v1/sme/tenants — but no Flux Kustomization on the
Sovereign cluster watched that path.

The state machine (sme_tenant.go) advanced optimistically through every
step (vcluster -> bp_charts -> dns -> certs -> keycloak_clients ->
registry) and reported state=done, while no actual K8s resources
materialised because nothing was reconciling the orchestrator's write
target.

Verified live on otech103 (2026-05-04 23:18 Berlin): the orchestrator
successfully committed the 9-file overlay for tenant 15f1e45e-... to
the local Gitea openova/openova repo @main, but `kubectl get hr -n
sme-15f1e45e-...` returned No resources found indefinitely.

Fix:
- NEW templates/sme-services/sme-tenants-kustomization.yaml renders
  one Flux Kustomization in flux-system that sweeps the entire
  ./clusters/<global.sovereignFQDN>/sme-tenants directory tree.
- sourceRef: flux-system/openova GitRepository (the same one the
  cluster bootstraps from; cutover Step 5 flips its .spec.url to the
  local in-cluster Gitea, which is precisely where sme_tenant_gitops.go
  pushes via CATALYST_GITOPS_REPO_URL).
- interval=1m (matches the orchestrator's documented "Flux reconciles
  within ~1 min" SLA), prune=true (DELETE /api/v1/sme/tenants/<id>
  removes the overlay; Flux GCs the resources), wait=false (per-tenant
  overlays each install ~5 bp-* HRs asynchronously and have their own
  readiness watcher in the orchestrator; blocking this top-level
  Kustomization on every tenant's full readiness would let one stuck
  tenant gate every other tenant).
- Gated on .Values.ingress.marketplace.enabled — non-marketplace
  Sovereigns don't run the SME tenant pipeline.
- Per Inviolable Principle #4, every knob is operator-overridable
  via .Values.smeTenants.kustomization.* (sourceRef name/namespace,
  interval, retryInterval, timeout, prune, wait).

Lockstep slot 13 pin in clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
bumps from 1.4.12 -> 1.4.13.
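The rendered Kustomization would look roughly like this (a sketch with the defaults listed above; the object name and the already-substituted FQDN are illustrative):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sme-tenants
  namespace: flux-system
spec:
  interval: 1m          # matches the orchestrator's ~1 min reconcile SLA
  path: ./clusters/otech103.omani.works/sme-tenants
  prune: true           # tenant DELETE removes the overlay; Flux GCs resources
  wait: false           # per-tenant readiness is watched by the orchestrator
  sourceRef:
    kind: GitRepository
    name: openova
    namespace: flux-system
```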

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 09:09:00 +04:00
github-actions[bot]
be54707bfb deploy: update catalyst images to 7bfd6df 2026-05-05 05:04:30 +00:00
e3mrah
7bfd6df588
fix(catalyst-api,bp-catalyst-platform,infra): unblock multi-domain Day-2 add-domain flow on Sovereigns (#879) (#884)
5 stacked wiring bugs blocked the Day-2 add-parent-domain happy path on a
fresh post-handover Sovereign — surfaced live on otech103, 2026-05-05 — plus
a 6th gap (ghcr-pull reflector for catalyst-system). All six fixed in one PR
so a single chart bump + cloud-init re-render closes the gap end-to-end.

Bug 1 (chart, api-deployment.yaml): wire POOL_DOMAIN_MANAGER_URL=
https://pool.openova.io. The in-cluster Service default only resolves on
contabo; on Sovereigns every Day-2 POST died with NXDOMAIN.

Bug 2 (chart + code): wire CATALYST_PDM_BASIC_AUTH_USER / _PASS env from a
new pdm-basicauth Secret, and have pdmFlipNS SetBasicAuth from those envs.
The PDM public ingress at pool.openova.io is gated by Traefik basicAuth;
calls without Authorization: Basic returned 401. optional=true so contabo
+ CI + older Sovereigns degrade to a clear 401 log line. Per Inviolable
Principle #10, the credentials only ever live in Pod env + are read once
per call by pdmFlipNS — never enter a logged struct or persisted record.
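Wired as optional secretKeyRef envs, the degrade-to-401 path looks like this (a sketch; the Secret key names are assumptions):

```yaml
env:
  - name: CATALYST_PDM_BASIC_AUTH_USER
    valueFrom:
      secretKeyRef:
        name: pdm-basicauth
        key: user
        optional: true   # contabo / CI / older Sovereigns start without the Secret
  - name: CATALYST_PDM_BASIC_AUTH_PASS
    valueFrom:
      secretKeyRef:
        name: pdm-basicauth
        key: password
        optional: true
```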

Bug 3 (code, parent_domains.go): pdmFlipNS body now includes the required
nameservers field (computed from expectedNSFor). PDM's SetNSRequest schema
requires it; the previous body got 422 missing-nameservers.

Bug 4 (code, parent_domains.go): lookupPrimaryDomain falls back to
SOVEREIGN_FQDN env after CATALYST_PRIMARY_DOMAIN. On a post-handover
Sovereign no Deployment record is persisted, so without this fallback GET
/parent-domains returned {"items":[]} and the propagation panel showed
expectedNs:null. SOVEREIGN_FQDN is already wired by api-deployment.yaml
from the sovereign-fqdn ConfigMap.

Bug 5 (chart, httproute.yaml): catalyst-ui /auth/* PathPrefix narrowed to
Exact /auth/handover. The previous PathPrefix collided with OIDC PKCE
redirect_uri /auth/callback — catalyst-api 404s on that path because it
only registers /api/v1/auth/callback, breaking login post-handover-JWT-
cookie expiry. Exact match keeps /auth/handover routed to catalyst-api
while every other /auth/* path falls through to catalyst-ui's React
Router for client-side OIDC.
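The narrowed HTTPRoute rule takes this shape (a sketch; the backend Service name and port are assumptions):

```yaml
rules:
  - matches:
      - path:
          type: Exact            # was: PathPrefix, which swallowed /auth/callback
          value: /auth/handover
    backendRefs:
      - name: catalyst-api
        port: 8080
```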

Bug 6 (cloud-init): ghcr-pull + harbor-robot-token + new pdm-basicauth
Reflector annotations enumerate explicit allowed/auto-namespaces (sme,
catalyst, catalyst-system, gitea, harbor) instead of empty-string. The
ambiguous empty-string interpretation caused otech103 to require a manual
catalyst-system mirror creation; explicit list back-ports the verified
working state.

Provisioner wiring: Request.PDMBasicAuthUser/Pass + Provisioner fields
+ tfvars emission so the contabo catalyst-api can stamp the credentials
onto every Sovereign provision request. variables.tf adds matching
pdm_basic_auth_user / pdm_basic_auth_pass tofu vars (sensitive, default
empty) so older provisioner builds that pre-date this change keep
rendering valid cloud-init (the Secret renders with empty values and
Pod start is unaffected).

Chart bumped 1.4.11 -> 1.4.12, lockstep slot 13 pin updated. Closes
the architectural blockers tracked in #879; the catalyst-api image
rebuild + chart republish run via the existing CI pipelines (services-
build.yaml + blueprint-release.yaml) on this commit's SHA.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 09:02:39 +04:00
github-actions[bot]
2bcff5b43b deploy: update catalyst images to aa226df 2026-05-05 04:52:11 +00:00
e3mrah
aa226df757
fix(bp-catalyst-platform): bump 1.4.11 -> 1.4.12 to republish with current catalyst-api image (#878 follow-up) (#881)
Same deploy-step race as #871 (chart 1.4.9): chart 1.4.11 was
published from commit 7bdd14fc BEFORE the deploy-bot updated
values.yaml's catalystApi.tag from 20413ec -> 7bdd14f. The OCI
artifact for 1.4.11 still bakes in the OLD image SHA without the
git binary, so otech103 reconciles 1.4.11 and the catalyst-api Pod
runs an image that still fails the SME tenant pipeline at git clone.

Long-term fix is the catalyst-build equivalent of #874 (auto-bump
chart patch on Catalyst-API image rebuild). Short-term: this manual
bump.

No template change. Lockstep slot 13 pin bumps to 1.4.12.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:50:06 +04:00
github-actions[bot]
1d7023d7c0 deploy: update catalyst images to 7bdd14f 2026-05-05 04:47:59 +00:00
e3mrah
7bdd14fcb1
fix(catalyst-api,bp-catalyst-platform): SME tenant gitops auth + git binary (#878) (#880)
Three-part fix that unblocks the SME tenant pipeline post-Day-2-
Independence cutover. Live-reproduced on otech103 — POST /api/v1/sme/
tenants succeeds (HTTP 202) but the first reconcile fails with
"gitops token unconfigured" → after wiring the env, fails with
`exec: "git": executable file not found in $PATH` → after fixing
the URL hardcoding, would still 401 against local Gitea because
the basic-auth username is hardcoded "x-access-token".

Part A — code (marketplace_settings.go + sme_tenant_gitops.go):
- Add gitOpsConfig.User (loaded from CATALYST_GITOPS_USER env,
  default "x-access-token" for back-compat with GitHub PATs).
- New injectTokenIntoURLWithUser(rawURL, user, token) — variant of
  injectTokenIntoURL that takes a configurable basic-auth username.
- Update all 3 call sites in marketplace_settings.go +
  sme_tenant_gitops.go to use the new variant with cfg.User.
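A minimal sketch of what the new variant does (the function body here is illustrative, not the actual catalyst-api implementation):

```go
package main

import "net/url"

// injectTokenIntoURLWithUser embeds basic-auth credentials into a
// clone URL, with a configurable username instead of the hardcoded
// "x-access-token" that only local Gitea rejects.
func injectTokenIntoURLWithUser(rawURL, user, token string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	u.User = url.UserPassword(user, token)
	return u.String(), nil
}
```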

Part B — Containerfile:
- apk add git in the runtime stage. The SME tenant pipeline (#804)
  and marketplace-settings GitOps writer both shell out to git
  clone/commit/push; without the binary every first reconcile fails.

Part C — chart (api-deployment.yaml):
- Wire CATALYST_GITOPS_USER + CATALYST_GITOPS_TOKEN envs on
  catalyst-api Deployment, sourced from the local `gitea-admin-secret`
  (already mirrored into catalyst-system via bp-reflector annotation
  per #866). optional=true so Catalyst-Zero (contabo) keeps using
  its existing GitHub PAT path.

Bump bp-catalyst-platform 1.4.10 -> 1.4.11 + lockstep slot 13 pin.

Closes #878

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:45:45 +04:00
e3mrah
8e4c88fd28
fix(bp-self-sovereign-cutover): auto-sync local Gitea mirror from upstream GitHub (#870) (#875)
Step-1 gitea-mirror Job replaces the legacy one-shot create-empty-repo +
git-push pattern with a single call to Gitea's native /repos/migrate API
with mirror=true and mirror_interval=10m0s. Gitea now polls the upstream
openova-io/openova repo on a 10-minute interval and replicates branches
+ tags into the local Sovereign Gitea automatically.

Closes the "Sovereign drifts from upstream main forever after Day-2
cutover" bug — hit twice during the otech103 2026-05-04 overnight DoD
session, requiring manual `git fetch` inside the Gitea pod for every
chart rollout.

Why /repos/migrate over the previous git push approach:
- Gitea cannot convert a regular repo into a pull-mirror after creation
  (the mirror flag is set at create-time only). The migrate endpoint
  creates the repo AS a mirror in one shot.
- The migrate endpoint accepts toggles for issues / pull-requests /
  wiki / labels / milestones / releases — we set them all to false so
  Gitea only replicates branches+tags, the only refs the Sovereign's
  Flux GitRepository needs.
- Recurring sync is a Gitea-native capability; using it avoids a
  parallel CronJob (which would violate the "event-driven not cron"
  inviolable principle) or a long-poll sidecar (which would duplicate
  what Gitea already does).
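Per the description above, the migrate request body would look roughly like this (field names from Gitea's /repos/migrate schema; the clone address and owner values are illustrative):

```json
{
  "clone_addr": "https://github.com/openova-io/openova.git",
  "repo_owner": "openova",
  "repo_name": "openova",
  "mirror": true,
  "mirror_interval": "10m0s",
  "issues": false,
  "pull_requests": false,
  "wiki": false,
  "labels": false,
  "milestones": false,
  "releases": false
}
```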

Idempotency: if the repo already exists from a prior cutover attempt,
the script PATCHes mirror_interval to the desired value and POSTs to
/mirror-sync to trigger an immediate refresh. Note that PATCH alone
cannot convert a legacy non-mirror repo to a mirror — Sovereigns
seeded by chart < 0.1.14 would need an operator-driven repo delete +
re-migrate to retrofit auto-sync, but new provisions take the
migrate path automatically.

Verification on the rendered ConfigMap:
  $ helm template smoke .                   # renders 16 docs cleanly
  $ bash tests/cutover-contract.sh          # all 7 gates green
  $ sh -n <rendered-script>                 # POSIX shell syntax OK

Chart bumped 0.1.13 → 0.1.14 (Chart.yaml + blueprint.yaml spec.version
aligned per #817 invariant + slot 06a-bp-self-sovereign-cutover.yaml
pin lockstep).

Refs #870, #790.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:35:40 +04:00
e3mrah
5a8210856f
fix(bp-catalyst-platform): wire CATALYST_OTECH_FQDN env on catalyst-api Deployment (#876) (#877)
The SME tenant create handler (sme_tenant.go:481) and the parent-
domain pool seed (sovereign_parent_domains.go:45) both read the
CATALYST_OTECH_FQDN env. The chart only wired SOVEREIGN_FQDN (same
value semantically — the Sovereign's public FQDN — but a different
env name). Without CATALYST_OTECH_FQDN, POST /api/v1/sme/tenants
returns 503 {"error":"otech-fqdn-unconfigured"} on every Sovereign,
and the SME-pool fallback path returns an empty list.

Fix: add a CATALYST_OTECH_FQDN env entry on the catalyst-api
Deployment, sourced from the same `sovereign-fqdn` ConfigMap (key
`fqdn`) that feeds SOVEREIGN_FQDN. optional=true since Catalyst-Zero
(contabo) doesn't run the SME tenant pipeline. The two env names
exist for historical reasons (Phase-8b handover vs SME-tier tenant
pipeline #804); they ultimately point at the same value.
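The added env entry looks like this (a sketch; only the ConfigMap name and key are stated in the commit):

```yaml
- name: CATALYST_OTECH_FQDN
  valueFrom:
    configMapKeyRef:
      name: sovereign-fqdn
      key: fqdn
      optional: true   # absent on Catalyst-Zero (contabo)
```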

Bump bp-catalyst-platform 1.4.9 -> 1.4.10 + lockstep slot 13 pin.

Closes #876

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:35:27 +04:00
e3mrah
db332f6767
fix(ci): services-build auto-bumps chart patch + dispatches blueprint-release (#874)
* fix(bp-catalyst-platform): bump 1.4.8 -> 1.4.9 to republish with current services-auth image (#871)

Chart 1.4.8 was published from commit 95a06f56 BEFORE the deploy-bot
updated templates/sme-services/auth.yaml's image pin from
services-auth:fa4395f -> services-auth:95a06f5 (which has the
/auth/send-pin alias from PR #869). The blueprint-release workflow
fired on 95a06f56 only, so the OCI artifact for 1.4.8 was published
with the OLD image SHA in chart bytes. otech103 reconciled 1.4.8 and
rendered the auth Deployment with the OLD image -> /auth/send-pin
returns 404 -> SME marketplace signup blocked.

Same deploy-step race documented in feedback_idempotent_iac_purge.md
and the overnight DoD bookmark. Long-term fix is a double-bump
sequencing PR (file separately); short-term fix is bumping the chart
version so blueprint-release republishes the artifact with the
current image pin.

No template change. Lockstep slot 13 pin in
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml bumps
from 1.4.8 -> 1.4.9.

Closes #871

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): services-build deploy auto-bumps chart patch + dispatches blueprint-release (#872)

Eliminate the recurring race between services-build's deploy commit
and blueprint-release's path-trigger on chart-version-bumping PRs.

Before: a PR bumping `products/catalyst/chart/Chart.yaml` AND touching
`core/services/**` triggered both workflows on the same merge SHA in
parallel. blueprint-release packaged the chart at the merge commit
(which still held the OLD image SHAs) and published the bumped
chart version with stale image refs. services-build's deploy commit
landed AFTER, but per GitHub Actions design GITHUB_TOKEN-authored
pushes do NOT re-trigger workflows, so blueprint-release never fired
again on the corrected chart. A manual no-op chart bump PR was the
only way to republish (PR #865 chasing PR #864 was the live incident).

After: services-build's deploy step
  1. sed-rewrites image: lines under products/catalyst/chart/templates/sme-services/*.yaml (unchanged)
  2. Pure-bash semver patch-bumps Chart.yaml `version:` and `appVersion:` atomically
  3. Single commit captures both rewrites
  4. Explicit `gh workflow run blueprint-release.yaml -f blueprint=catalyst -f tree=products` dispatches the chart publish (matches catalyst-build's PR #720 pattern)
  5. Idempotent push retry re-reads origin/main and bumps from THAT version on conflict, so concurrent CI runs produce strictly increasing patch versions instead of clobbering each other
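Step 2's pure-bash semver patch bump can be sketched like this (illustrative, not the actual workflow step — it assumes a plain MAJOR.MINOR.PATCH version string):

```shell
# Split MAJOR.MINOR.PATCH with parameter expansion and increment the
# patch component arithmetically, with no external tools.
bump_patch() {
  ver="$1"
  major="${ver%%.*}"   # text before the first dot
  rest="${ver#*.}"     # text after the first dot
  minor="${rest%%.*}"
  patch="${rest#*.}"
  echo "${major}.${minor}.$((patch + 1))"
}
```

The result would then be sed-rewritten into Chart.yaml's `version:` and `appVersion:` lines in the same commit as the image-pin rewrite.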

Adds `actions: write` to the deploy job permissions so the
gh workflow run dispatch doesn't return HTTP 403.

The manual chart-version field in author PRs becomes a floor; CI
auto-bumps from there. PR authors should NOT bump the patch
themselves any more — the deploy step does it. Major/minor bumps
remain the author's call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:32:34 +04:00
github-actions[bot]
8e8bb642aa deploy: update catalyst images to 20413ec 2026-05-05 04:31:32 +00:00
e3mrah
20413ecc14
fix(bp-catalyst-platform): bump 1.4.8 -> 1.4.9 to republish with current services-auth image (#871) (#873)
Chart 1.4.8 was published from commit 95a06f56 BEFORE the deploy-bot
updated templates/sme-services/auth.yaml's image pin from
services-auth:fa4395f -> services-auth:95a06f5 (which has the
/auth/send-pin alias from PR #869). The blueprint-release workflow
fired on 95a06f56 only, so the OCI artifact for 1.4.8 was published
with the OLD image SHA in chart bytes. otech103 reconciled 1.4.8 and
rendered the auth Deployment with the OLD image -> /auth/send-pin
returns 404 -> SME marketplace signup blocked.

Same deploy-step race documented in feedback_idempotent_iac_purge.md
and the overnight DoD bookmark. Long-term fix is a double-bump
sequencing PR (file separately); short-term fix is bumping the chart
version so blueprint-release republishes the artifact with the
current image pin.

No template change. Lockstep slot 13 pin in
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml bumps
from 1.4.8 -> 1.4.9.

Closes #871

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:29:37 +04:00
github-actions[bot]
43a31f680c deploy: update sme service images to 95a06f5 2026-05-05 04:23:28 +00:00
e3mrah
95a06f56f8
fix(sme-marketplace): unblock PIN signin — route /api/* to sme/gateway + add send-pin alias (#868) (#869)
Two-part fix for marketplace UI signin flow which 503'd then 404'd on
otech103. Live debugging found two stacked bugs.

Part A — chart (HTTPRoute backend):
- marketplace-routes.yaml: /api/* rule now backendRefs sme/gateway:8080
  (cross-namespace) instead of catalyst-system/marketplace-api which had
  a Service selector matching zero Pods. The gateway in sme already
  fronts services-auth, catalog, tenant, billing, provisioning.
- marketplace-reference-grant.yaml: extend `to:` list with the gateway
  Service so the cross-ns hop is authorised by Gateway API.
- Bump bp-catalyst-platform 1.4.7 → 1.4.8 + lockstep slot 13 pin.

Part B — services-auth (route name):
- Add /auth/send-pin alias delegating to existing SendMagicLink handler,
  and /auth/verify-pin alias delegating to VerifyMagicLink. The
  marketplace UI surfaces a 6-digit PIN ("Send PIN" button), so the
  PIN-named routes are the canonical UX-facing names. /auth/magic-link
  and /auth/verify remain registered for backward compat.
- services-build workflow auto-rebuilds the auth image on push to
  core/services/** — no manual dispatch needed.

Refs: #868

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 08:22:17 +04:00
github-actions[bot]
b42a61f883 deploy: update catalyst images to 3bfc97d 2026-05-05 02:28:04 +00:00
e3mrah
3bfc97dcea
feat(bp-catalyst-platform): provision provisioning-github-token Secret on Sovereign install (#866) (#867)
After #859 + #861 + #863 cleared 12/13 SME pods on otech103, the
provisioning Deployment stayed in CreateContainerConfigError waiting
on `secret/provisioning-github-token` (key GITHUB_TOKEN) which exists
on contabo-mkt as a hand-rolled SealedSecret but had no Sovereign-side
equivalent. Without this Secret the Pod can't even start.

Fix (issue #866 Option C — local-Gitea target):
Post-cutover the canonical Git target on a Sovereign IS the local
Gitea instance (the GitRepository CRs already point there). New
template templates/sme-services/provisioning-github-token.yaml uses
Helm `lookup` to read the auto-generated gitea admin password from
`gitea/gitea-admin-secret` and re-emit it as
`sme/provisioning-github-token` under the GITHUB_TOKEN key. Same
lookup-and-mirror pattern as valkey-cross-ns-secret.yaml (#863) and
sme-secrets.yaml (#859). bp-gitea (slot 10) reaches Ready before
bp-catalyst-platform (slot 13) so the lookup has data by the time
this template renders.

values.yaml — new `smeServices.provisioning.gitToken.*` block
(sourceNamespace / sourceSecretName / sourcePasswordKey /
destNamespace / destSecretName / destKey) so per-Sovereign overlays
pointing the provisioning service at a non-Gitea Git host (e.g. a
GitHub PAT via OpenBao + ExternalSecret) can swap the source ref
without forking the chart (Inviolable Principle #4).

Out of scope: full Gitea REST-API target support in
core/services/provisioning/github/client.go (which hardcodes
https://api.github.com today) is a follow-up Go change.

Chart 1.4.6 → 1.4.7. Slot 13 pin bumped in lockstep.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:26:03 +04:00
github-actions[bot]
348b70a7d9 deploy: update catalyst images to b0debf9 2026-05-05 02:18:30 +00:00
e3mrah
b0debf93a6
fix(bp-catalyst-platform): bump 1.4.5 -> 1.4.6 to bundle rebuilt SME images (#863) (#865)
Chart 1.4.5 was published at commit fa4395fa BEFORE the services-build
deploy step committed 9731701c updating auth.yaml + gateway.yaml `image:`
lines to fa4395f. Result: Sovereigns pulling 1.4.5 got the OLD image
(5cdb738) without the ConnectValkeyWithAuth Go change — VALKEY_PASSWORD
env was wired but the binary ignored it and still failed with "NOAUTH
HELLO" on connect.

Same race documented in 1.1.16 changelog (catalyst-ui base:/ fix).

No template/code changes — pure version bump to roll a fresh OCI
artifact whose `helm template` output references the rebuilt image.

Slot 13 pin lockstep 1.4.5 -> 1.4.6.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:16:27 +04:00
github-actions[bot]
9731701c56 deploy: update sme service images to fa4395f 2026-05-05 02:10:45 +00:00
e3mrah
fa4395fa3a
fix(bp-catalyst-platform): wire VALKEY_PASSWORD into SME auth + gateway (#863) (#864)
After PR #862 (1.4.4) made cross-ns Valkey reachable from `sme` ns, the
auth Pod started CrashLoopBackOff with "NOAUTH HELLO must be called with
the client already authenticated". Root cause: bp-valkey 1.0.0 ships
auth.enabled=true (bitnami default) but SME service code + Deployment
templates never plumbed a password through.

Chart 1.4.4 -> 1.4.5. Slot 13 pin lockstep.

Changes:
- core/services/shared/db/valkey.go: add ConnectValkeyWithAuth overload
  taking username + password. ConnectValkey kept backwards-compatible
  for contabo-mkt's auth-less in-namespace Valkey.
- core/services/auth/main.go + gateway/main.go: read VALKEY_USERNAME +
  VALKEY_PASSWORD env, call ConnectValkeyWithAuth when password set,
  else fall through to no-auth path.
- NEW templates/sme-services/valkey-cross-ns-secret.yaml: Helm `lookup`
  reads bp-valkey's auto-generated `valkey-password` from the
  `valkey/valkey` Secret and re-emits it as `sme-valkey-auth` in `sme`
  ns. Same pattern as sme-secrets.yaml (#859) and gitea-admin-secret
  (#830 Bug 2). On first install the lookup may return nil; Flux's 15m
  reconcile picks up the mirror once bp-valkey is Ready.
- auth.yaml + gateway.yaml: add VALKEY_PASSWORD env from `sme-valkey-
  auth` Secret with optional=true so contabo-mkt's auth-less path keeps
  working when the mirror Secret is absent.
- values.yaml: add `smeServices.valkey.{sourceSecretName,
  sourcePasswordKey, destNamespace, destSecretName}` knobs (Inviolable
  Principle #4).

Live-verified the failure mode on otech103: 11/13 SME pods Running 1/1,
auth in CrashLoopBackOff with NOAUTH HELLO error. Provisioning Pod's
CreateContainerConfigError is unrelated (ghcr-pull, separate ticket).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:09:38 +04:00
github-actions[bot]
329baf0d65 deploy: update catalyst images to ee00ec0 2026-05-05 01:55:09 +00:00