fix(catalyst-api,bp-catalyst-platform,infra): unblock multi-domain Day-2 add-domain flow on Sovereigns (#879) (#884)

5 stacked wiring bugs blocked the Day-2 add-parent-domain happy path on a
fresh post-handover Sovereign — surfaced live on otech103, 2026-05-05 — plus
a 6th gap (ghcr-pull reflector for catalyst-system). All six fixed in one PR
so a single chart bump + cloud-init re-render closes the gap end-to-end.

Bug 1 (chart, api-deployment.yaml): wire POOL_DOMAIN_MANAGER_URL=
https://pool.openova.io. The in-cluster Service default only resolves on
contabo; on Sovereigns every Day-2 POST died with NXDOMAIN.

Bug 2 (chart + code): wire CATALYST_PDM_BASIC_AUTH_USER / _PASS env from a
new pdm-basicauth Secret, and have pdmFlipNS call SetBasicAuth with them.
The PDM public ingress at pool.openova.io is gated by Traefik basicAuth;
calls without Authorization: Basic returned 401. optional=true so contabo
+ CI + older Sovereigns degrade to a clear 401 log line. Per Inviolable
Principle #10, the credentials only ever live in Pod env + are read once
per call by pdmFlipNS — never enter a logged struct or persisted record.
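
A minimal Go sketch of the Bug 2 behaviour described above: attach
Authorization: Basic only when a user is configured. The helper names
(basicAuthFrom, newPDMRequest), the injected getenv function, and the
"example" registrar in the URL are assumptions for illustration, not the
code in parent_domains.go.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// basicAuthFrom reads the two env names from the commit message through an
// injected lookup function (hypothetical; the real code calls os.Getenv).
func basicAuthFrom(getenv func(string) string) (user, pass string) {
	user = strings.TrimSpace(getenv("CATALYST_PDM_BASIC_AUTH_USER"))
	pass = getenv("CATALYST_PDM_BASIC_AUTH_PASS")
	return user, pass
}

// newPDMRequest builds a POST and sets basic auth only when a user is
// present, so an unset env sends no Authorization header at all.
func newPDMRequest(url string, getenv func(string) string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, fmt.Errorf("build-request: %w", err)
	}
	if user, pass := basicAuthFrom(getenv); user != "" {
		req.SetBasicAuth(user, pass)
	}
	return req, nil
}

func main() {
	env := map[string]string{
		"CATALYST_PDM_BASIC_AUTH_USER": "demo-user",
		"CATALYST_PDM_BASIC_AUTH_PASS": "demo-pass",
	}
	req, err := newPDMRequest(
		"https://pool.openova.io/api/v1/registrar/example/set-ns",
		func(k string) string { return env[k] },
	)
	if err != nil {
		panic(err)
	}
	// Report only whether the header exists; never print the credentials.
	_, _, ok := req.BasicAuth()
	fmt.Println("authorization header set:", ok)
}
```

Passing the lookup function in keeps the credentials out of any struct and
makes the unset case easy to exercise without touching the process env.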

Bug 3 (code, parent_domains.go): pdmFlipNS body now includes the required
nameservers field (computed from expectedNSFor). PDM's SetNSRequest schema
requires it; the previous body got 422 missing-nameservers.
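
A small Go sketch of the Bug 3 body shape, assuming the nameserver pair is
[ns1.<primary>, ns2.<primary>] as the code comments in this commit state.
expectedNS and setNSBody are hypothetical stand-ins; the real expectedNSFor
is a Handler method that takes the new domain.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// expectedNS derives the nameserver pair from the Sovereign's primary FQDN.
// Hypothetical stand-in for expectedNSFor.
func expectedNS(primary string) []string {
	if primary == "" {
		return nil
	}
	return []string{"ns1." + primary, "ns2." + primary}
}

// setNSBody builds the JSON body and refuses to send a request that PDM
// would reject for a missing nameservers field.
func setNSBody(domain, token, primary string) ([]byte, error) {
	ns := expectedNS(primary)
	if len(ns) == 0 {
		return nil, errors.New("expected-ns-unavailable: primary domain unknown")
	}
	// map[string]any is needed because nameservers is an array, not a string.
	return json.Marshal(map[string]any{
		"domain":      domain,
		"token":       token,
		"nameservers": ns,
	})
}

func main() {
	body, err := setNSBody("example.org", "registrar-token", "sovereign.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Failing early when the primary domain is unknown turns a remote 422 into a
local, descriptive error.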

Bug 4 (code, parent_domains.go): lookupPrimaryDomain falls back to
SOVEREIGN_FQDN env after CATALYST_PRIMARY_DOMAIN. On a post-handover
Sovereign no Deployment record is persisted, so without this fallback GET
/parent-domains returned {"items":[]} and the propagation panel showed
expectedNs:null. SOVEREIGN_FQDN is already wired by api-deployment.yaml
from the sovereign-fqdn ConfigMap.
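
The Bug 4 fallback order can be sketched in Go as follows. This is an
illustration under assumptions: primaryDomainFallback and the injected
getenv function are hypothetical, and the real lookupPrimaryDomain only
reaches this chain when no Deployment record yields a candidate.

```go
package main

import (
	"fmt"
	"strings"
)

// primaryDomainFallback returns the first non-blank value, trying the
// explicit override before the Sovereign's own FQDN.
func primaryDomainFallback(getenv func(string) string) string {
	for _, key := range []string{"CATALYST_PRIMARY_DOMAIN", "SOVEREIGN_FQDN"} {
		if v := strings.TrimSpace(getenv(key)); v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Post-handover Sovereign: only SOVEREIGN_FQDN is wired.
	env := map[string]string{"SOVEREIGN_FQDN": "console.sovereign.example.com"}
	fmt.Println(primaryDomainFallback(func(k string) string { return env[k] }))
}
```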

Bug 5 (chart, httproute.yaml): catalyst-ui /auth/* PathPrefix narrowed to
Exact /auth/handover. The previous PathPrefix collided with OIDC PKCE
redirect_uri /auth/callback — catalyst-api 404s on that path because it
only registers /api/v1/auth/callback, breaking login post-handover-JWT-
cookie expiry. Exact match keeps /auth/handover routed to catalyst-api
while every other /auth/* path falls through to catalyst-ui's React
Router for client-side OIDC.
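
To make the Bug 5 collision concrete, here is a simplified Go model of the
two match types. It approximates the Gateway API rules (PathPrefix matches
whole path elements and ignores a trailing slash) and is not the gateway's
implementation; matchExact and matchPathPrefix are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// matchExact models an Exact path match: the whole path must be identical.
func matchExact(rule, path string) bool {
	return rule == path
}

// matchPathPrefix models a PathPrefix match on path elements, so "/auth/"
// matches "/auth" and "/auth/callback" but not "/authx".
func matchPathPrefix(rule, path string) bool {
	rule = strings.TrimSuffix(rule, "/")
	return path == rule || strings.HasPrefix(path, rule+"/")
}

func main() {
	for _, p := range []string{"/auth/handover", "/auth/callback"} {
		fmt.Printf("%-16s prefix(/auth/)=%v exact(/auth/handover)=%v\n",
			p, matchPathPrefix("/auth/", p), matchExact("/auth/handover", p))
	}
}
```

Under the old prefix rule both paths reach catalyst-api; under the exact
rule only /auth/handover does, and /auth/callback falls through to the
catalyst-ui default rule.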

Bug 6 (cloud-init): ghcr-pull + harbor-robot-token + new pdm-basicauth
Reflector annotations enumerate explicit allowed/auto-namespaces (sme,
catalyst, catalyst-system, gitea, harbor) instead of empty-string. The
ambiguous empty-string interpretation caused otech103 to require a manual
catalyst-system mirror creation; explicit list back-ports the verified
working state.

Provisioner wiring: Request.PDMBasicAuthUser/Pass + Provisioner fields
+ tfvars emission so the contabo catalyst-api can stamp the credentials
onto every Sovereign provision request. variables.tf adds matching
pdm_basic_auth_user / pdm_basic_auth_pass tofu vars (sensitive, default
empty) so older provisioner builds that pre-date this change keep
rendering valid cloud-init (the Secret renders with empty values and
Pod start is unaffected).
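
A Go sketch of the server-side stamping described above, assuming the
fields carry a json:"-" tag so a client payload cannot set them. The
request and provisioner types, the Domain field, and decodeAndStamp are
hypothetical reductions of the real Request and Provisioner.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// request keeps the credential fields out of JSON entirely via json:"-".
type request struct {
	Domain           string `json:"domain"`
	PDMBasicAuthUser string `json:"-"`
	PDMBasicAuthPass string `json:"-"`
}

// provisioner holds the values read once from the server's environment.
type provisioner struct {
	PDMBasicAuthUser string
	PDMBasicAuthPass string
}

// stamp fills empty credential fields from the provisioner defaults.
func (p provisioner) stamp(req request) request {
	if strings.TrimSpace(req.PDMBasicAuthUser) == "" {
		req.PDMBasicAuthUser = p.PDMBasicAuthUser
	}
	if req.PDMBasicAuthPass == "" {
		req.PDMBasicAuthPass = p.PDMBasicAuthPass
	}
	return req
}

// decodeAndStamp decodes a client payload, then applies the defaults.
func decodeAndStamp(p provisioner, payload []byte) (request, error) {
	var req request
	if err := json.Unmarshal(payload, &req); err != nil {
		return request{}, err
	}
	return p.stamp(req), nil
}

func main() {
	p := provisioner{PDMBasicAuthUser: "server-user", PDMBasicAuthPass: "server-pass"}
	req, err := decodeAndStamp(p, []byte(`{"domain":"example.org","PDMBasicAuthUser":"from-client"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("user came from provisioner:", req.PDMBasicAuthUser == p.PDMBasicAuthUser)
}
```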

Chart bumped 1.4.12 -> 1.4.13, lockstep slot 13 pin updated. Closes
the architectural blockers tracked in #879; the catalyst-api image
rebuild + chart republish run via the existing CI pipelines (services-
build.yaml + blueprint-release.yaml) on this commit's SHA.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3mrah 2026-05-05 09:02:39 +04:00 committed by GitHub
parent 2bcff5b43b
commit 7bfd6df588
9 changed files with 411 additions and 12 deletions

@ -104,7 +104,7 @@ spec:
# /auth/send-pin → SendMagicLink (and /auth/verify-pin →
# VerifyMagicLink) so the UI's PIN-naming reaches the existing
# backend handler.
version: 1.4.12
version: 1.4.13
sourceRef:
kind: HelmRepository
name: bp-catalyst-platform

@ -174,10 +174,26 @@ write_files:
# so all workloads can pull from ghcr.io/openova-io without
# per-namespace manual creation. reflection-auto-enabled means
# Reflector creates the copy in new namespaces as they appear.
#
# ALLOWED + AUTO namespaces — explicitly enumerated.
# Issue #879 Bonus Bug 6: the previous values left both fields
# as empty strings, which Reflector interprets ambiguously
# depending on version. On otech103 (2026-05-05) the catalyst-api
# Pod failed to pull the SHA-pinned image until an operator
# manually created the Secret in the `catalyst-system`
# namespace. The fix here lists every namespace catalyst-api
# and SME services land in: sme, catalyst, catalyst-system,
# gitea, harbor — paired with `auto-namespaces` so a
# later-created namespace (the bp-* HelmReleases land in their
# own namespaces over time) still gets the mirror automatically
# the moment it appears. The list is the SUPERSET of what
# otech103 verified live. Future namespaces added to the
# bootstrap-kit (a new bp-* slot) only need an addition here
# plus a Pod restart to pick up the new mirror.
reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: ""
reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "sme,catalyst,catalyst-system,gitea,harbor"
reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: ""
reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "sme,catalyst,catalyst-system,gitea,harbor"
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: ${base64encode(jsonencode({
@ -272,6 +288,60 @@ write_files:
data:
api-key: ${base64encode(powerdns_api_key)}
# ── flux-system/pdm-basicauth Secret (issue #879 Bug 2) ──────────────
#
# The Sovereign-side catalyst-api Pod (api-deployment.yaml) reads
# CATALYST_PDM_BASIC_AUTH_USER + CATALYST_PDM_BASIC_AUTH_PASS via
# secretKeyRef into `pdm-basicauth` (in the same namespace
# catalyst-api lives — catalyst-system). Reflector mirrors this
# Secret out of flux-system to sme,catalyst,catalyst-system,gitea,
# harbor (same canonical pattern flux-system/ghcr-pull and
# flux-system/harbor-robot-token already use).
#
# The Pod adds `Authorization: Basic …` to every PDM call so the
# Traefik basicAuth Middleware in front of pool.openova.io accepts
# the request — pdmFlipNS in parent_domains.go is the call site.
# Without this Secret + Reflector mirror, every Day-2 add-parent-
# domain POST returns 401 from PDM (caught live on otech103,
# 2026-05-05 — issue #879).
#
# optional=true on the secretKeyRef in api-deployment.yaml so:
# - Catalyst-Zero pods (contabo's catalyst-api) start cleanly
# when the Secret is absent. Contabo uses the in-cluster
# Service path which bypasses the ingress entirely.
# - CI / older Sovereigns that pre-date this provisioning seam
# start cleanly. POSTs without auth get 401 from PDM with a
# clear log line, instead of crashlooping on Pod start.
#
# Per Inviolable Principle #10: the credentials never enter a
# logged struct, a deployment record, or any committed git file.
# Plaintext only ever lives in the per-deployment OpenTofu workdir
# (mode 0600, wiped on tofu destroy) and inside the Sovereign's
# encrypted etcd.
- path: /var/lib/catalyst/pdm-basicauth-secret.yaml
permissions: '0600'
content: |
apiVersion: v1
kind: Secret
metadata:
name: pdm-basicauth
namespace: flux-system
annotations:
# bp-reflector (slot 05a) mirrors this Secret to every
# namespace listed below so catalyst-api in catalyst-system
# picks it up event-driven. List explicitly enumerates the
# known namespaces (issue #879 Bonus Bug 6 — empty-string
# ambiguity caused otech103 to require a manual mirror
# creation in catalyst-system).
reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "sme,catalyst,catalyst-system,gitea,harbor"
reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "sme,catalyst,catalyst-system,gitea,harbor"
type: Opaque
data:
username: ${base64encode(pdm_basic_auth_user)}
password: ${base64encode(pdm_basic_auth_pass)}
# ── flux-system/object-storage Secret (issue #371, vendor-agnostic since #425) ─
#
# The Sovereign's per-cluster S3 credentials, materialised as a stock
@ -1201,6 +1271,17 @@ runcmd:
# start cleanly. Same idempotency property as ghcr-pull above.
- 'kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml apply -f /var/lib/catalyst/harbor-robot-token-secret.yaml'
# ── flux-system/pdm-basicauth Secret (issue #879 Bug 2) ──────────────
#
# Apply the PDM basic-auth credentials BEFORE Flux reconciles the
# bootstrap-kit. bp-reflector (slot 05a) mirrors this Secret from
# flux-system into catalyst-system on first reconcile so the
# catalyst-api Pod can mount it via secretKeyRef (optional=true so
# Pod start is not blocked when this is absent). Same idempotency
# property as ghcr-pull above — re-running cloud-init rewrites the
# bytes; a token rotation propagates through here on the next render.
- 'kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml apply -f /var/lib/catalyst/pdm-basicauth-secret.yaml'
# ── cert-manager/powerdns-api-credentials Secret (PR #681 followup) ──
#
# Apply the contabo PowerDNS credentials BEFORE Flux reconciles

@ -256,6 +256,17 @@ locals {
# records to contabo's authoritative omani.works zone.
powerdns_api_key = var.powerdns_api_key
# PDM (Pool Domain Manager) basic-auth credentials (issue #879 Bug 2).
# Interpolated into the Sovereign's `flux-system/pdm-basicauth` Secret
# at cloud-init time so catalyst-api in catalyst-system can call PDM
# at https://pool.openova.io with `Authorization: Basic ` for the
# Day-2 multi-domain "Add another parent domain" flow. Reflector
# auto-mirrors the Secret into `catalyst-system` (same canonical
# pattern flux-system/ghcr-pull and flux-system/harbor-robot-token
# already use). Sensitive: never logged, never committed.
pdm_basic_auth_user = var.pdm_basic_auth_user
pdm_basic_auth_pass = var.pdm_basic_auth_pass
deployment_id = var.deployment_id
kubeconfig_bearer_token = var.kubeconfig_bearer_token
catalyst_api_url = var.catalyst_api_url

@ -595,6 +595,44 @@ variable "harbor_robot_token" {
default = ""
}
variable "pdm_basic_auth_user" {
type = string
description = <<-EOT
Username for the Pool Domain Manager (PDM) public ingress at
`pool.openova.io`. The Sovereign-side catalyst-api uses this
value (paired with `pdm_basic_auth_pass`) to authenticate
every PDM call (Day-2 multi-domain "Add another parent
domain" flow — issue #879). Cloud-init writes the value into
a `pdm-basicauth` Secret in the `flux-system` namespace with
Reflector annotations so the Secret mirrors into
`catalyst-system` where catalyst-api reads it via secretKeyRef.
Source on contabo: `openova-system/pool-domain-manager-basicauth`
Secret (operator-managed). The catalyst-api provisioner forwards
plaintext at provisioning time; it is never logged, never committed.
Default empty: when unset, the cloud-init still renders the
`pdm-basicauth` Secret with empty values. The Sovereign-side
pdmFlipNS skips SetBasicAuth when the env value is empty, so
older Sovereigns that pre-date this variable degrade to a
clear PDM 401 instead of a panic. Once the operator fills
this in, a re-provision (or a Secret rotation via cloud-init
re-render) supplies real credentials.
EOT
sensitive = true
default = ""
}
variable "pdm_basic_auth_pass" {
type = string
description = <<-EOT
Password for the Pool Domain Manager (PDM) public ingress.
See `pdm_basic_auth_user` for the full lifecycle. Sensitive.
EOT
sensitive = true
default = ""
}
variable "object_storage_bucket_name" {
type = string
description = <<-EOT

@ -378,10 +378,26 @@ func (h *Handler) lookupPrimaryDomain() string {
return true
})
if len(candidates) == 0 {
// Fallback: env override for tests / single-Sovereign sandboxes.
// Fallback chain (issue #879 Bug 4):
// 1. CATALYST_PRIMARY_DOMAIN — explicit override for tests /
// single-Sovereign sandboxes.
// 2. SOVEREIGN_FQDN — the Sovereign's public FQDN, already
// wired into every catalyst-api Pod via the sovereign-fqdn
// ConfigMap (api-deployment.yaml). On a post-handover
// Sovereign no Deployment record is persisted (handover is
// JWT-only — no wizard-run on the Sovereign-side that
// writes one), so without this fallback GET
// /parent-domains returns {"items":[]} and the propagation
// panel shows expectedNs:null. Caught live on otech103,
// 2026-05-05. Reading the ALREADY-wired SOVEREIGN_FQDN env
// makes the implicit primary visible without touching
// anything else.
if v := strings.TrimSpace(os.Getenv("CATALYST_PRIMARY_DOMAIN")); v != "" {
return v
}
if v := strings.TrimSpace(os.Getenv("SOVEREIGN_FQDN")); v != "" {
return v
}
return ""
}
sort.Strings(candidates)
@ -881,14 +897,46 @@ func (h *Handler) GetPropagation(w http.ResponseWriter, r *http.Request) {
// call here (rather than re-using SetNSRegistrar's HTTP handler) so the
// AddParentDomain pipeline can examine the response and update the store
// atomically. Token never enters a logged struct.
//
// Issue #879 (otech103, 2026-05-05) wired three previously-missing pieces
// to make this work end-to-end on a Sovereign:
//
// - Bug 2 — Basic auth: PDM is exposed via the public ingress
// `pool.openova.io` (clusters/contabo-mkt/apps/pool-domain-manager/
// ingress.yaml) which is gated by Traefik basicAuth. Calls without
// `Authorization: Basic …` get 401. Credentials are loaded from
// the Pod env (CATALYST_PDM_BASIC_AUTH_USER / _PASS, sourced from
// the `pdm-basicauth` Secret per api-deployment.yaml). When unset
// (Catalyst-Zero in-cluster path / older Sovereigns) we omit the
// header — the in-cluster Service URL is unauthenticated and PDM
// responds normally. Per Inviolable Principle #10 the credentials
// are only read at call time and never enter a logged struct.
//
// - Bug 3 — `nameservers` field: PDM's SetNSRequest schema
// (core/pool-domain-manager/internal/handler/registrar.go:149)
// requires a non-empty `nameservers` array. Without it, PDM
// returns 422 `missing-nameservers`. We populate it from
// `expectedNSFor(domain)` which already computes
// `[ns1.<primary>, ns2.<primary>]` from the Sovereign's primary
// FQDN — same nameserver pair PDM's gTLD-side write needs to
// point the new domain at the Sovereign's PowerDNS.
func (h *Handler) pdmFlipNS(ctx context.Context, registrarKind, domain, token string) error {
pdmBase := pdmBaseURL()
if pdmBase == "" {
return fmt.Errorf("pdm-unavailable")
}
body, _ := json.Marshal(map[string]string{
"domain": domain,
"token": token,
// PDM's SetNSRequest schema requires `nameservers` (a non-empty array).
// Compute from the Sovereign's primary FQDN — same `[ns1.<primary>,
// ns2.<primary>]` pair PDM's gTLD-side write uses for the existing
// primary domain.
nameservers := h.expectedNSFor(domain)
if len(nameservers) == 0 {
return fmt.Errorf("expected-ns-unavailable: cannot compute nameservers (primary domain unknown)")
}
body, _ := json.Marshal(map[string]any{
"domain": domain,
"token": token,
"nameservers": nameservers,
})
target := fmt.Sprintf("%s/api/v1/registrar/%s/set-ns",
strings.TrimRight(pdmBase, "/"),
@ -901,6 +949,12 @@ func (h *Handler) pdmFlipNS(ctx context.Context, registrarKind, domain, token st
return fmt.Errorf("build-request: %w", err)
}
httpReq.Header.Set("Content-Type", "application/json")
// Basic auth for the public PDM ingress. Skipped when the Pod env
// is unset — covers the in-cluster-Service path on Catalyst-Zero
// and CI / local dev, where PDM is unauthenticated.
if user, pass := pdmBasicAuth(); user != "" {
httpReq.SetBasicAuth(user, pass)
}
resp, err := (&http.Client{Timeout: 30 * time.Second}).Do(httpReq)
if err != nil {
return fmt.Errorf("pdm-unreachable: %w", err)
@ -913,6 +967,20 @@ func (h *Handler) pdmFlipNS(ctx context.Context, registrarKind, domain, token st
return nil
}
// pdmBasicAuth reads the basic-auth credentials for the PDM public
// ingress out of the Pod env. Returns ("", "") when unset — callers
// then skip SetBasicAuth and rely on the in-cluster Service path
// (unauthenticated). Read on every call. Env vars sourced from a
// secretKeyRef are fixed at container start, so a Secret rotation
// takes effect once Reloader restarts the Pod.
//
// Per Inviolable Principle #10, this is the ONE call site that touches
// the credentials — they never enter a struct that gets logged.
func pdmBasicAuth() (string, string) {
user := strings.TrimSpace(os.Getenv("CATALYST_PDM_BASIC_AUTH_USER"))
pass := os.Getenv("CATALYST_PDM_BASIC_AUTH_PASS")
return user, pass
}
// pdmCreatePowerDNSZone — runtime PowerDNS zone-create for the
// admin-console "Add another parent domain" flow.
//
@ -985,6 +1053,11 @@ func (h *Handler) pdmCreatePowerDNSZone(ctx context.Context, domain string) erro
return fmt.Errorf("build-request: %w", err)
}
httpReq.Header.Set("Content-Type", "application/json")
// Same basic-auth treatment as pdmFlipNS — the public PDM ingress
// requires the Authorization header. Issue #879 Bug 2 follow-up.
if user, pass := pdmBasicAuth(); user != "" {
httpReq.SetBasicAuth(user, pass)
}
resp, err := (&http.Client{Timeout: 15 * time.Second}).Do(httpReq)
if err != nil {
return fmt.Errorf("pdm-unreachable: %w", err)

@ -291,6 +291,21 @@ type Request struct {
// wizard payload.
PowerDNSAPIKey string `json:"-"`
// PDMBasicAuthUser / PDMBasicAuthPass — credentials for the public
// PDM ingress at pool.openova.io (issue #879 Bug 2). cloudinit-
// control-plane.tftpl writes them into the Sovereign's `flux-system/
// pdm-basicauth` Secret so the catalyst-api Pod (mounted via
// Reflector mirror into catalyst-system) can send `Authorization: Basic …`
// to the Traefik basicAuth Middleware in front of PDM.
// Stamped server-side from Provisioner.PDMBasicAuthUser /
// PDMBasicAuthPass (envs CATALYST_PDM_BASIC_AUTH_USER /
// CATALYST_PDM_BASIC_AUTH_PASS). json:"-" — never accepted from
// wizard payload. Empty falls through to a Secret with empty values;
// the Sovereign's catalyst-api skips SetBasicAuth and degrades to
// PDM 401 (clear log line) instead of crashlooping.
PDMBasicAuthUser string `json:"-"`
PDMBasicAuthPass string `json:"-"`
// DeploymentID — catalyst-api's per-deployment identifier (16-char
// hex). Stamped onto the Request by the handler before tfvars are
// emitted so the OpenTofu cloud-init template can render the URL
@ -746,6 +761,19 @@ type Provisioner struct {
// issues and the Sovereign Console TLS handshake fails (caught
// live on otech47).
PowerDNSAPIKey string
// PDMBasicAuthUser / PDMBasicAuthPass — credentials for the public
// PDM ingress at pool.openova.io (issue #879 Bug 2). Mounted from
// the Reflector-mirrored `pdm-basicauth` Secret as envs
// CATALYST_PDM_BASIC_AUTH_USER / CATALYST_PDM_BASIC_AUTH_PASS.
// cloudinit-control-plane.tftpl interpolates them into the new
// Sovereign's flux-system/pdm-basicauth Secret so its catalyst-api
// inherits the same auth posture (Reflector mirrors them into
// catalyst-system). Empty values render an empty Secret and the
// Sovereign-side pdmFlipNS skips SetBasicAuth — same degradation
// posture as the harbor-robot-token Empty-Token path.
PDMBasicAuthUser string
PDMBasicAuthPass string
}
// New returns a Provisioner with paths read from environment.
@ -764,6 +792,8 @@ func New() *Provisioner {
GHCRPullToken: os.Getenv("CATALYST_GHCR_PULL_TOKEN"),
HarborRobotToken: os.Getenv("CATALYST_HARBOR_ROBOT_TOKEN"),
PowerDNSAPIKey: os.Getenv("CATALYST_POWERDNS_API_KEY"),
PDMBasicAuthUser: os.Getenv("CATALYST_PDM_BASIC_AUTH_USER"),
PDMBasicAuthPass: os.Getenv("CATALYST_PDM_BASIC_AUTH_PASS"),
}
}
@ -785,6 +815,12 @@ func (p *Provisioner) Provision(ctx context.Context, req Request, events chan<-
if strings.TrimSpace(req.PowerDNSAPIKey) == "" {
req.PowerDNSAPIKey = p.PowerDNSAPIKey
}
if strings.TrimSpace(req.PDMBasicAuthUser) == "" {
req.PDMBasicAuthUser = p.PDMBasicAuthUser
}
if req.PDMBasicAuthPass == "" {
req.PDMBasicAuthPass = p.PDMBasicAuthPass
}
if err := req.Validate(); err != nil {
return nil, err
@ -868,6 +904,12 @@ func (p *Provisioner) Destroy(ctx context.Context, req Request, events chan<- Ev
if strings.TrimSpace(req.PowerDNSAPIKey) == "" {
req.PowerDNSAPIKey = p.PowerDNSAPIKey
}
if strings.TrimSpace(req.PDMBasicAuthUser) == "" {
req.PDMBasicAuthUser = p.PDMBasicAuthUser
}
if req.PDMBasicAuthPass == "" {
req.PDMBasicAuthPass = p.PDMBasicAuthPass
}
emit := func(phase, level, msg string) {
select {
@ -1122,6 +1164,17 @@ func writeTfvars(deployDir string, req Request) error {
// through to anonymous Harbor pulls.
"harbor_robot_token": req.HarborRobotToken,
// PDM basic-auth credentials (issue #879 Bug 2). Stamped server-
// side. cloudinit-control-plane.tftpl writes them into the
// new Sovereign's flux-system/pdm-basicauth Secret so its
// catalyst-api can call PDM via Authorization: Basic ….
// Empty falls through to a Secret with empty values; the
// Sovereign's pdmFlipNS skips SetBasicAuth and degrades to
// PDM 401 with a clear log line, matching the harbor-robot-
// token degradation posture.
"pdm_basic_auth_user": req.PDMBasicAuthUser,
"pdm_basic_auth_pass": req.PDMBasicAuthPass,
// Cloud-init kubeconfig postback (issue #183, Option D). The
// catalyst-api stamps deployment_id + kubeconfig_bearer_token
// onto the Request before writeTfvars is called: deployment_id

@ -1,7 +1,7 @@
apiVersion: v2
name: bp-catalyst-platform
version: 1.4.12
appVersion: 1.4.12
version: 1.4.13
appVersion: 1.4.13
description: |
Catalyst Platform — the unified Catalyst control plane umbrella chart for Catalyst-Zero.
Composes the catalyst-{ui,api}, console, admin, marketplace UI modules and the marketplace-api backend.
@ -643,6 +643,58 @@ description: |
catalyst-build workflow needs the equivalent. Until then this manual
bump is required after every catalyst-api image change. Lockstep
slot 13 pin bumps to 1.4.12. 2026-05-05.
1.4.13 (issue #879): unblock the multi-domain Day-2 add-domain happy
path on a fresh post-handover Sovereign. Five stacked wiring fixes,
three of which are chart-side:
Bug 1 — POOL_DOMAIN_MANAGER_URL: api-deployment.yaml now wires
`POOL_DOMAIN_MANAGER_URL=https://pool.openova.io` so the Sovereign-
side catalyst-api hits the public PDM ingress on contabo (the
in-cluster default `pool-domain-manager.openova-system.svc` only
resolves on contabo and is NXDOMAIN on franchised Sovereigns).
Caught live on otech103, 2026-05-05: every Day-2 add-domain POST
failed with `dial tcp: lookup pool-domain-manager.openova-system.
svc.cluster.local: no such host`.
Bug 2 — CATALYST_PDM_BASIC_AUTH_USER / _PASS: api-deployment.yaml
now mounts the `pdm-basicauth` Secret (keys `username`+`password`)
so pdmFlipNS can send `Authorization: Basic ...` to the Traefik
basicAuth Middleware in front of pool.openova.io. optional=true:
Catalyst-Zero pods skip the header (in-cluster Service path is
unauthenticated) and CI / older Sovereigns degrade to a clear 401
log line instead of crashlooping. The Secret is provisioned by
cloud-init at handover-time (paired infra change in
cloudinit-control-plane.tftpl).
Bug 5 — HTTPRoute /auth/handover Exact match: httproute.yaml
catalyst-ui rule changed from PathPrefix `/auth/` to Exact
`/auth/handover`. The previous PathPrefix collided with the OIDC
PKCE redirect_uri `/auth/callback` — catalyst-api 404s on that
path because it only registers `/api/v1/auth/callback`. Result
post-handover-JWT-cookie-expiry (8h TTL): the operator could not
log into the Sovereign Console at all (caught live on otech103).
Exact-match keeps /auth/handover routed to catalyst-api while
every other /auth/* path falls through to catalyst-ui's React
Router for client-side OIDC.
Three coupled code-side fixes ship in catalyst-api as part of the
same #879 PR (parent_domains.go):
Bug 2-code: pdmFlipNS now calls SetBasicAuth from the env (read on
every call; a Secret rotation takes effect once the Pod restarts,
since secretKeyRef env is fixed at container start).
Bug 3-code: pdmFlipNS body now includes `nameservers` (computed
from expectedNSFor — PDM's SetNSRequest schema requires it; the
previous body got 422 missing-nameservers).
Bug 4-code: lookupPrimaryDomain falls back to SOVEREIGN_FQDN env
after CATALYST_PRIMARY_DOMAIN. On a post-handover Sovereign no
Deployment record is persisted, so without this fallback GET
/parent-domains returned {"items":[]} and the propagation panel
showed `expectedNs: null`. The SOVEREIGN_FQDN env is already
wired by api-deployment.yaml from the sovereign-fqdn ConfigMap.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.12 → 1.4.13. 2026-05-05.
type: application
# Opt-out from the blueprint-release hollow-chart guard (issue #181 / #510).

@ -548,6 +548,82 @@ spec:
name: gitea-admin-secret
key: password
optional: true
# POOL_DOMAIN_MANAGER_URL — base URL of the central Pool Domain
# Manager (PDM) ingress on Catalyst-Zero (contabo). Sovereign-
# side catalyst-api calls PDM's /api/v1/registrar/{r}/set-ns
# endpoint for the Day-2 multi-domain "Add another parent
# domain" flow (issue #879, parent epic #825 / #829).
#
# Why a public ingress URL (not an in-cluster Service):
# the in-cluster default `pool-domain-manager.openova-system.
# svc.cluster.local` ONLY resolves on the contabo cluster
# (PDM lives in `openova-system` ns there). On a franchised
# Sovereign post-handover, that DNS name is NXDOMAIN, so
# every Day-2 add-domain call returned `dial tcp: lookup
# pool-domain-manager.openova-system.svc.cluster.local on
# 10.43.0.10:53: no such host` (caught live on otech103,
# 2026-05-05 — issue #879 verification).
#
# The default below points at the public PDM ingress on
# contabo (`pool.openova.io`). Per Inviolable Principle #4
# (never hardcode), per-Sovereign overlays may override via
# `catalystApi.poolDomainManagerURL` in values. Catalyst-Zero
# (contabo) leaves this default — its catalyst-api Pod hits
# the SAME public URL via its own loopback ingress (the proxy
# is idempotent on the source cluster).
#
# Pairs with CATALYST_PDM_BASIC_AUTH_USER / _PASS below: the
# PDM ingress at pool.openova.io is gated by Traefik basicAuth
# (clusters/contabo-mkt/apps/pool-domain-manager/ingress.yaml).
# Both halves wired together so a fresh Sovereign reaches PDM
# without a manual env-var patch.
#
# NOTE — DUAL-MODE CONTRACT: this file is consumed BOTH by
# Helm (per-Sovereign install via bp-catalyst-platform OCI)
# AND by Kustomize (contabo-mkt's clusters/contabo-mkt/apps/
# catalyst-platform). The default literal below (no Helm
# template directives) keeps both build paths clean. Per-
# Sovereign overlays override via the HelmRelease overlay's
# `catalystApi.env` additional-env patch (Helm-only, takes
# precedence over THIS default at template-render time).
- name: POOL_DOMAIN_MANAGER_URL
value: "https://pool.openova.io"
# CATALYST_PDM_BASIC_AUTH_USER / _PASS — basic-auth credentials
# for the PDM public ingress (issue #879 Bug 2). The Sovereign-
# side catalyst-api adds `Authorization: Basic …` to every PDM
# call so the Traefik basicAuth Middleware in front of
# pool.openova.io accepts the request. Without this, every
# Day-2 add-domain call returns 401 from PDM (caught live on
# otech103).
#
# Source Secret (`pdm-basicauth`, keys `username` + `password`)
# is pre-provisioned by cloud-init on every Sovereign at
# provision time, mirrored via the same Reflector seam ghcr-
# pull / harbor-robot-token already use. optional=true so:
# - Catalyst-Zero pods (contabo's catalyst-api) start cleanly
# when the Secret is absent. On contabo the in-cluster
# Service path bypasses the ingress entirely and BasicAuth
# is a no-op.
# - CI / local dev / older Sovereigns that pre-date this
# provisioning seam start cleanly. POSTs without auth get
# 401 from PDM with a clear log line, instead of the Pod
# crashlooping on start.
#
# Per Inviolable Principle #10: the credentials never enter a
# logged struct or a deployment record — loaded into the Pod
# env once at start, read per-call by pdmFlipNS only.
- name: CATALYST_PDM_BASIC_AUTH_USER
valueFrom:
secretKeyRef:
name: pdm-basicauth
key: username
optional: true
- name: CATALYST_PDM_BASIC_AUTH_PASS
valueFrom:
secretKeyRef:
name: pdm-basicauth
key: password
optional: true
# CATALYST_HANDOVER_KEY_PATH — path to the RS256 PRIVATE key
# catalyst-api uses to mint magic-link + handover JWTs. The
# signer auto-generates the keypair on first start if absent.

@ -43,17 +43,32 @@ spec:
hostnames:
- {{ $consoleHost | quote }}
rules:
# /auth/* and /api/* on the console hostname route to catalyst-api
# /auth/handover on the console hostname routes to catalyst-api
# (the Go backend), not catalyst-ui (the React shell). The handover
# JWT lands at GET /auth/handover?token=… which is implemented in
# catalyst-api. Without this rule the React app sees /auth/handover,
# has no client-side route for it, and falls through to Keycloak's
# bare login screen — defeating the Phase-8b seamless auth promise.
# Caught live on otech46.
#
# CRITICAL — Exact match, not PathPrefix. Issue #879 Bug 5:
# /auth/handover MUST be Exact (NOT PathPrefix /auth/). OIDC PKCE on
# the Sovereign-side Keycloak (catalyst-ui client) configures
# redirect_uri = https://console.<sov>/auth/callback, which the
# React SPA handles client-side. A PathPrefix /auth/ rule routes
# /auth/callback to catalyst-api too — and catalyst-api only
# registers /api/v1/auth/callback, so it returns bare "404 page not
# found". Result post-handover-JWT-cookie-expiry (8h TTL): the
# operator cannot log into the Sovereign Console at all (Keycloak
# bounces between login and a 404). Verified live on otech103,
# 2026-05-05. By using Exact /auth/handover here, every other
# /auth/* path (including /auth/callback, /auth/silent-renew, any
# future client-side OIDC routes) falls through to the catalyst-ui
# default rule below and the React Router resolves them.
- matches:
- path:
type: PathPrefix
value: "/auth/"
type: Exact
value: "/auth/handover"
backendRefs:
- name: catalyst-api
port: 8080