openova/platform/stalwart-tenant/chart/templates/dns-records-configmap.yaml
e3mrah 368545369b
fix(bp-stalwart-tenant): unbootable on fresh tenants — values shape, missing admin Secret, sec ctx (#898) (#904)
Three fixes for bugs that left bp-stalwart-tenant 0.1.0 unable to come
up on a freshly-franchised SME tenant. All surfaced on the otech103
alice tenant during the Phase-1 DoD sweep.

1. Tenant-domain values shape (HelmRelease render error)

   The 0.1.0 chart referenced `.Values.domain.primary` in five
   templates. The live HR on otech103 had `values.domain:
   acme.omani.works` (a string), emitted by a pre-#897 catalyst-api
   build, so every reconcile died with:

     can't evaluate field primary in type interface {}

   Added `bp-stalwart-tenant.tenantDomain` + `tenantMode` helpers
   that resolve in priority order:

     1. `tenant.domain`        (forward-looking flat shape)
     2. `domain.primary`       (canonical post-#897 map shape)
     3. `domain` (string)      (legacy pre-#897 shape — back-compat)

   The helpers return "" rather than erroring when nothing is set, which
   keeps smoke renders safe; per-template gates skip rendering when the
   result is empty.
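   Read as code, the priority order is a short walk over the untyped
   values map. The Go sketch below only restates that three-step lookup;
   the real helper is a Helm named template that is not shown here, and
   the function name `tenantDomain` and the map layout are assumptions
   for illustration.

   ```go
   package main

   import "fmt"

   // tenantDomain restates the resolution order listed above in plain Go.
   // values stands in for the generic map Helm exposes as .Values.
   func tenantDomain(values map[string]any) string {
   	// 1. tenant.domain (forward-looking flat shape)
   	if tenant, ok := values["tenant"].(map[string]any); ok {
   		if d, ok := tenant["domain"].(string); ok && d != "" {
   			return d
   		}
   	}
   	switch d := values["domain"].(type) {
   	case map[string]any:
   		// 2. domain.primary (canonical post-#897 map shape)
   		if p, ok := d["primary"].(string); ok {
   			return p
   		}
   	case string:
   		// 3. domain as a bare string (legacy pre-#897 shape)
   		return d
   	}
   	// Nothing usable: "" keeps smoke renders safe; callers gate on it.
   	return ""
   }

   func main() {
   	// Legacy string shape, as emitted by the pre-#897 catalyst-api build.
   	fmt.Println(tenantDomain(map[string]any{"domain": "acme.omani.works"})) // acme.omani.works
   	// Canonical map shape.
   	fmt.Println(tenantDomain(map[string]any{
   		"domain": map[string]any{"primary": "acme.omani.works"},
   	})) // acme.omani.works
   	// Nothing set.
   	fmt.Printf("%q\n", tenantDomain(map[string]any{})) // ""
   }
   ```

   The type switch is what the 0.1.0 templates lacked: `.domain.primary`
   assumed a map and failed on the string shape instead of falling back.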

2. Missing stalwart-admin Secret

   deployment.yaml + mailbox-provision-job.yaml reference the key
   `ADMIN_PASSWORD` in the Secret named by `.Values.admin.secretName`.
   The 0.1.0 chart only emitted an ExternalSecret, and only when
   `admin.externalSecret.remoteRef.key` was non-empty (a smoke-render
   concession), so fresh tenants landed in CreateContainerConfigError.

   Added `templates/admin-secret.yaml` mirroring
   marketplace-api/secret.yaml (#887): a random 32-char ADMIN_PASSWORD
   generated by sprig randAlphaNum, persisted across reconciles via
   lookup, with helm.sh/resource-policy: keep so a reinstall picks it
   back up. The Secret is auto-disabled when an authoritative
   ExternalSecret is wired, so two controllers never manage the same
   Secret.
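   The create-once behaviour can be modelled outside Helm. This Go sketch
   assumes the decision is exactly the three cases described above;
   `adminPassword`, its parameters, and the `stalwart/admin` key are
   hypothetical, and `existing` stands in for the (already
   base64-decoded) data a `lookup` of the Secret would return.

   ```go
   package main

   import (
   	"crypto/rand"
   	"fmt"
   	"math/big"
   )

   const alphaNum = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

   // randAlphaNum draws n characters uniformly from [A-Za-z0-9] via crypto/rand.
   // It plays the role sprig's randAlphaNum plays inside the template.
   func randAlphaNum(n int) (string, error) {
   	limit := big.NewInt(int64(len(alphaNum)))
   	out := make([]byte, n)
   	for i := range out {
   		idx, err := rand.Int(rand.Reader, limit)
   		if err != nil {
   			return "", err
   		}
   		out[i] = alphaNum[idx.Int64()]
   	}
   	return string(out), nil
   }

   // adminPassword models the per-render decision:
   //   - remoteRefKey != "": an ExternalSecret owns the Secret, emit nothing;
   //   - existing holds ADMIN_PASSWORD: reuse it (the lookup result);
   //   - otherwise: mint a fresh 32-character password.
   // existing is nil when lookup found no Secret (reading a nil map is safe).
   func adminPassword(remoteRefKey string, existing map[string][]byte) (pw string, emit bool, err error) {
   	if remoteRefKey != "" {
   		return "", false, nil
   	}
   	if v := existing["ADMIN_PASSWORD"]; len(v) > 0 {
   		return string(v), true, nil
   	}
   	pw, err = randAlphaNum(32)
   	if err != nil {
   		return "", false, err
   	}
   	return pw, true, nil
   }

   func main() {
   	first, _, _ := adminPassword("", nil)
   	again, _, _ := adminPassword("", map[string][]byte{"ADMIN_PASSWORD": []byte(first)})
   	_, emit, _ := adminPassword("stalwart/admin", nil) // hypothetical remoteRef key
   	fmt.Println(len(first), again == first, emit)      // 32 true false
   }
   ```

   Checking the ExternalSecret first is what avoids the double-bind: the
   chart never renders its own Secret once another controller owns it.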

3. Pod sec ctx vs. upstream image's file capabilities

   Running `getcap /usr/local/bin/stalwart` in
   docker.io/stalwartlabs/stalwart:v0.16.3 reports
   `cap_net_bind_service=ep`. The image creates user `stalwart` at
   UID 2000, and the binary IS the entrypoint (no demotion script).
   The 0.1.0 chart ran as UID 65534 with `drop: ALL`. Because the
   file's effective bit is set, the kernel refuses to exec it when a
   file-permitted capability is missing from the (empty) bounding
   set, so exec failed with `operation not permitted`.

   Aligned to the image's native UID 2000, kept `drop: ALL`, and added
   `NET_BIND_SERVICE` explicitly so it stays in the bounding set.
   fsGroup 2000 keeps the /opt/stalwart PVC writable.
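   The failure follows from one execve rule, sketched here as a toy Go
   model. It is a simplification under stated assumptions: a non-root
   process, a binary carrying only `cap_net_bind_service=ep`, and an
   abbreviated runtime default capability list; `boundingFor` and
   `execAllowed` are hypothetical names, not runtime or kernel APIs.

   ```go
   package main

   import "fmt"

   // capSet holds Kubernetes-style capability names (no CAP_ prefix).
   type capSet map[string]bool

   // boundingFor approximates how a runtime turns securityContext.capabilities
   // into the bounding set: start from defaults, apply drop ("ALL" empties
   // the set), then apply add.
   func boundingFor(defaults, drop, add []string) capSet {
   	b := capSet{}
   	for _, c := range defaults {
   		b[c] = true
   	}
   	for _, c := range drop {
   		if c == "ALL" {
   			b = capSet{}
   			break
   		}
   		delete(b, c)
   	}
   	for _, c := range add {
   		b[c] = true
   	}
   	return b
   }

   // execAllowed models execve for a non-root process and a binary with file
   // capabilities: if the file's effective bit is set, every file-permitted
   // capability must survive the bounding set, otherwise exec fails (EPERM).
   // Inheritable and ambient sets are ignored (the binary is plain =ep).
   func execAllowed(filePermitted capSet, fileEffective bool, bounding capSet) bool {
   	if !fileEffective {
   		return true
   	}
   	for c := range filePermitted {
   		if !bounding[c] {
   			return false
   		}
   	}
   	return true
   }

   func main() {
   	stalwart := capSet{"NET_BIND_SERVICE": true} // getcap: cap_net_bind_service=ep
   	// Abbreviated, illustrative default list; real runtimes ship more.
   	defaults := []string{"CHOWN", "NET_BIND_SERVICE", "SETGID", "SETUID"}

   	v010 := boundingFor(defaults, []string{"ALL"}, nil)
   	v011 := boundingFor(defaults, []string{"ALL"}, []string{"NET_BIND_SERVICE"})

   	fmt.Println(execAllowed(stalwart, true, v010)) // false: operation not permitted
   	fmt.Println(execAllowed(stalwart, true, v011)) // true
   }
   ```

   Adding the single capability back, rather than dropping `drop: ALL`,
   keeps the bounding set as small as the binary allows.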

Other:
- Bumped Chart.yaml + blueprint.yaml to 0.1.1 (#817 alignment).
- configSchema in blueprint.yaml now permits the legacy + tenant
  shapes alongside the canonical map.
- mailboxProvisioner.setupJob.enabled defaults to false until the
  canonical stalwart-cli image is published; when enabled, the Job
  re-uses the upstream stalwart container as a fallback CLI host.

Acceptance: targeted at the otech103 alice tenant
(sme-789ae512-bc0f-467c-a016-001f5496c403), where 0.1.0 reconciliation
fails with the value-shape error and the pod CrashLoops with `exec
... operation not permitted`. Verification on otech103 in #898.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:55:03 +04:00


{{- /*
DNS records the SME admin must publish for this mail server.
The unified-rbac console UI reads this ConfigMap to render the "Mail
domain DNS setup" pane to the SME admin (BYO-domain mode) or to display
"DNS setup complete" status (free-subdomain mode, where the chart's
PowerDNS Job already published the records to the otech zone).
Per docs/INVIOLABLE-PRINCIPLES.md #4, every value (selector, algorithm,
SPF policy, DMARC policy, DMARC rua) is operator-tunable. The actual
DKIM public key is written into RocksDB on first Stalwart boot — the
setup Job reads it back via `/api/dkim/<selector>` and the unified-rbac
service patches this ConfigMap at runtime once the value is known. The
template below ships the static records (MX, SPF, DMARC) and a
placeholder DKIM record line so the UI can render before DKIM is sealed.
*/}}
{{- if .Values.stalwart.enabled }}
{{- $tenantDomain := include "bp-stalwart-tenant.tenantDomain" . -}}
{{- $tenantMode := include "bp-stalwart-tenant.tenantMode" . -}}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "bp-stalwart-tenant.dnsRecordsConfigMapName" . }}
  labels:
    {{- include "bp-stalwart-tenant.labels" . | nindent 4 }}
    catalyst.openova.io/role: dns-records-required
data:
  domain: {{ $tenantDomain | quote }}
  mode: {{ $tenantMode | quote }}
  records.yaml: |
    # MX record: points the SME's domain at this Stalwart's mail host,
    # mail.<domain>. The address that host resolves to is the
    # LoadBalancer-assigned IP, filled in at runtime by the unified-rbac
    # controller once the LB IP is known.
    - kind: MX
      name: {{ $tenantDomain | quote }}
      priority: 10
      value: {{ printf "mail.%s" $tenantDomain | quote }}
    # SPF: the `mx` mechanism permits the hosts named in the MX record
    # (this Stalwart) to send; everything else is closed off per
    # .Values.dns.spf.policy.
    - kind: TXT
      name: {{ $tenantDomain | quote }}
      value: {{ printf "v=spf1 mx %s" .Values.dns.spf.policy | quote }}
    # DKIM public key: selector + algorithm rendered here; the actual
    # `p=<base64>` blob is stamped in by the unified-rbac controller
    # once Stalwart has minted the key on first boot. The placeholder
    # "<DKIM-PUBLIC-KEY>" is a sentinel the controller searches for.
    - kind: TXT
      name: {{ printf "%s._domainkey.%s" .Values.dns.dkim.selector $tenantDomain | quote }}
      value: "v=DKIM1; k=ed25519; p=<DKIM-PUBLIC-KEY>"
    # DMARC: policy from .Values.dns.dmarc.policy; rua falls back to
    # dmarc@<domain> when .Values.dns.dmarc.rua is empty.
    - kind: TXT
      name: {{ printf "_dmarc.%s" $tenantDomain | quote }}
      value: {{ printf "v=DMARC1; p=%s; rua=mailto:%s" .Values.dns.dmarc.policy (default (printf "dmarc@%s" $tenantDomain) .Values.dns.dmarc.rua) | quote }}
{{- end }}