fix(bp-powerdns): self-generate api-credentials Secret + disable upstream zone-bootstrap Job (#248)

Root cause investigation on otech.omani.works (kubectl, sanitized):

  $ kubectl get pods -n powerdns
  create-zone-if-not-exist-sh-tjtr4   0/1  CreateContainerConfigError  4h
  powerdns-57d7d49f99-{9hrb4,lxlgt,nkmht}  0/1  CreateContainerConfigError  4h
  dnsdist-594dbfc5f-wznsw                  1/1  Running  4h

  $ kubectl get secrets -n powerdns
  powerdns                Opaque  1  4h
  powerdns-api-tls-8kxpx  Opaque  1  4h     (NO `powerdns-api-credentials`, NO `pdns-pg-app`)

  $ kubectl describe pod ... powerdns-57d7d49f99-9hrb4
  Environment:
    PDNS_API_KEY:  <set to the key 'api-key' in secret 'powerdns-api-credentials'>  Optional: false
    PDNS_DB_HOST:  <set to the key 'host' in secret 'pdns-pg-app'>                  Optional: false
    State: Waiting   Reason: CreateContainerConfigError

The handover's chicken-egg-with-secret theory was directionally right, but
the causes were more fundamental:

  1. Wrapper chart's api-credentials-secret.yaml (1.1.2) was a no-op
     unless the operator set the `apiKey` value out-of-band; its comment
     said the deployment would "fail to start until the named Secret
     exists" as "the explicit signal we want". On a Sovereign that
     bootstraps from bp-* OCI artifacts, no operator is standing by, so
     the Secret is never created and pods sit in
     CreateContainerConfigError forever.

  2. The upstream chart's `create-zone-if-not-exists-sh` Job is rendered
     whenever both `zoneName` and `api.key` are set. Because `zoneName`
     defaulted to "example.de.", it ALWAYS rendered and ALWAYS failed
     (same missing Secret). Catalyst doesn't want this Job at all
     because zones are loaded later by pool-domain-manager (PDM).

  3. The chart's CNPG Cluster template is gated behind
     Capabilities.APIVersions.Has "postgresql.cnpg.io/v1" — on a fresh
     Sovereign without bp-cnpg yet (bp-cnpg is on the roadmap, not in
     bootstrap-kit), no Cluster is rendered and `pdns-pg-app` Secret
     never materialises. With Helm `--wait`, install times out
     ("context deadline exceeded") even though the manifests applied
     cleanly.
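
The render condition in item 2 can be modelled in a few lines. This is a minimal Python sketch of the gate's truthiness, not the upstream template itself. `job_is_rendered` is a hypothetical name, and the sketch assumes the Go-template `and` treats an empty string or empty map as false, which matches Python truthiness for those types.

```python
def job_is_rendered(zone_name: str, api_key: dict) -> bool:
    """Model of `{{- if and .Values.powerdns.zoneName .Values.powerdns.api.key }}`."""
    return bool(zone_name) and bool(api_key)


# The api.key value is a name+key reference, so it is always non-empty here.
api_key_ref = {"name": "powerdns-api-credentials", "key": "api-key"}

# Default zoneName: the Job renders (and fails while the Secret is missing).
print(job_is_rendered("example.de.", api_key_ref))

# Empty zoneName: the gate short-circuits and no Job is rendered.
print(job_is_rendered("", api_key_ref))
```

Because `api.key` is always populated in this chart, `zoneName` is the only input that can switch the Job off.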

Fix:

  * api-credentials-secret.yaml: self-generate via Helm `lookup` +
    `randAlphaNum 32`. First install creates fresh randoms; every
    subsequent reconcile reads back the existing values from the
    Secret so the API key never rotates on upgrade. Operator can
    still pin specific values via .Values.powerdns.apiKey /
    .Values.powerdns.webserverPassword, or skip Secret creation
    entirely via .Values.powerdns.useExistingApiSecret. Same pattern
    as bitnami/postgresql, bitnami/keycloak.

  * values.yaml: set `powerdns.zoneName: ""` so upstream chart's
    `{{- if and .Values.powerdns.zoneName .Values.powerdns.api.key }}`
    gate skips the create-zone Job entirely. Catalyst's PDM creates
    zones via the REST API after the cluster comes up; we don't want
    a placeholder `example.de.` zone in production.

  * HelmRelease (both _template and otech.omani.works overlays):
    `install.disableWait: true` + `upgrade.disableWait: true` so the
    HelmRelease reports Ready as soon as manifests apply cleanly,
    rather than gating on powerdns Deployment readiness which depends
    on bp-cnpg landing first to synthesise `pdns-pg-app`. Runtime
    convergence is observed via kubectl, not gated on Helm.
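
The credential precedence described in the first bullet (operator value, then existing Secret value, then fresh random) can be sketched as below. This is a Python model for illustration only; the real logic lives in the Helm template. `resolve_credential` and `rand_alpha_num` are hypothetical names, `rand_alpha_num` is only a rough stand-in for Sprig's `randAlphaNum`, and the `useExistingApiSecret` switch (which skips Secret creation altogether) is not modelled.

```python
import secrets
import string
from typing import Optional

ALPHANUM = string.ascii_letters + string.digits


def rand_alpha_num(length: int) -> str:
    """Rough stand-in for Sprig's randAlphaNum (assumption, not the same RNG)."""
    return "".join(secrets.choice(ALPHANUM) for _ in range(length))


def resolve_credential(operator_value: str,
                       existing_value: Optional[str],
                       length: int = 32) -> str:
    """Pick the value to write into the Secret for one key."""
    if operator_value:      # explicit value from the cluster overlay wins
        return operator_value
    if existing_value:      # value read back from the in-cluster Secret
        return existing_value
    return rand_alpha_num(length)  # genuine first install


# First install: nothing pinned, no Secret yet, so a fresh value is generated.
first = resolve_credential("", None)
# Upgrade: the value read back from the Secret is reused, so nothing rotates.
assert resolve_credential("", first) == first
# Operator pin: an explicit value wins over the stored one.
assert resolve_credential("pinned-by-operator", first) == "pinned-by-operator"
```

The order of the checks is what keeps the key stable: the random fallback is reached only when both other sources are empty.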

Live error this addresses:
  Helm upgrade failed for release powerdns/powerdns with chart
  bp-powerdns@1.1.2: context deadline exceeded

Verified locally with `helm template`:
  - powerdns-api-credentials Secret renders with random api-key + webserver-password
  - create-zone-if-not-exist-sh Job no longer rendered
  - Deployment env continues to reference powerdns-api-credentials correctly
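
A check like the one above could be scripted against the text produced by `helm template`. The following is a naive Python sketch under stated assumptions: `rendered_kinds_and_names` is a hypothetical helper, `SAMPLE` is illustrative text rather than real chart output, and the parsing only reads the top-level `kind:` line and the first two-space-indented `name:` line of each document. Note that Helm's `lookup` returns an empty result during `helm template`, so the random values differ on every local render.

```python
import re


def rendered_kinds_and_names(manifests: str) -> list[tuple[str, str]]:
    """Return (kind, metadata.name) for each document in multi-doc YAML text."""
    found = []
    for doc in re.split(r"^---[ \t]*$", manifests, flags=re.MULTILINE):
        kind = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        name = re.search(r"^  name:\s*(\S+)", doc, flags=re.MULTILINE)
        if kind and name:
            found.append((kind.group(1), name.group(1)))
    return found


# Illustrative input only; real input would be the output of `helm template`.
SAMPLE = """\
---
apiVersion: v1
kind: Secret
metadata:
  name: powerdns-api-credentials
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: powerdns
"""

resources = rendered_kinds_and_names(SAMPLE)
# The credentials Secret is rendered and no Job of any name is present.
assert ("Secret", "powerdns-api-credentials") in resources
assert not any(kind == "Job" for kind, _ in resources)
```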

Bumped 1.1.2 -> 1.1.3 (chart, blueprint, both bootstrap-kit overlays).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3mrah 2026-04-30 16:55:12 +04:00 committed by GitHub
parent 2d1799d738
commit 726af6df81
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
6 changed files with 131 additions and 28 deletions

@@ -80,14 +80,25 @@ spec:
   chart:
     spec:
       chart: bp-powerdns
-      version: 1.1.2
+      version: 1.1.3
       sourceRef:
         kind: HelmRepository
         name: bp-powerdns
         namespace: flux-system
+  # disableWait: a Sovereign without bp-cnpg yet reconciled has no
+  # `pdns-pg-app` Secret (the chart's CNPG Cluster template is gated
+  # behind the `postgresql.cnpg.io/v1` CRD via Capabilities.APIVersions
+  # check — see chart/templates/cnpg-cluster.yaml). Without disableWait,
+  # Helm's `--wait` would hold until the powerdns Deployment is Ready,
+  # which can't happen until CNPG comes up and synthesises the Secret.
+  # The HelmRelease itself reports Ready as soon as the manifests apply
+  # cleanly; runtime convergence (powerdns pods becoming Ready once
+  # CNPG lands) is observed via kubectl, not gated on Helm.
   install:
+    disableWait: true
     remediation:
       retries: 3
   upgrade:
+    disableWait: true
     remediation:
       retries: 3

@@ -80,14 +80,25 @@ spec:
   chart:
     spec:
       chart: bp-powerdns
-      version: 1.1.2
+      version: 1.1.3
       sourceRef:
         kind: HelmRepository
         name: bp-powerdns
         namespace: flux-system
+  # disableWait: a Sovereign without bp-cnpg yet reconciled has no
+  # `pdns-pg-app` Secret (the chart's CNPG Cluster template is gated
+  # behind the `postgresql.cnpg.io/v1` CRD via Capabilities.APIVersions
+  # check — see chart/templates/cnpg-cluster.yaml). Without disableWait,
+  # Helm's `--wait` would hold until the powerdns Deployment is Ready,
+  # which can't happen until CNPG comes up and synthesises the Secret.
+  # The HelmRelease itself reports Ready as soon as the manifests apply
+  # cleanly; runtime convergence (powerdns pods becoming Ready once
+  # CNPG lands) is observed via kubectl, not gated on Helm.
   install:
+    disableWait: true
     remediation:
       retries: 3
   upgrade:
+    disableWait: true
     remediation:
       retries: 3

@@ -6,7 +6,7 @@ metadata:
     catalyst.openova.io/category: per-host-cluster-infrastructure
     catalyst.openova.io/section: pts-3-2-gitops-and-iac
 spec:
-  version: 1.1.2
+  version: 1.1.3
   card:
     title: PowerDNS
     summary: |

@@ -1,6 +1,6 @@
 apiVersion: v2
 name: bp-powerdns
-version: 1.1.2
+version: 1.1.3
 description: |
   Catalyst-curated Blueprint wrapper for PowerDNS Authoritative.
   Carries Catalyst-specific values.yaml + templates (CNPG cluster, dnsdist

@@ -1,34 +1,78 @@
 {{- /*
 PowerDNS REST API + webserver credentials.
-Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode) the values flow from
-  helm install --set-string powerdns.apiKey=<random>,powerdns.webserverPassword=<random>
-or, in the production deployment, from an ExternalSecret rendered by the
-private-repo cluster manifest at clusters/contabo-mkt/apps/powerdns/.
 This template ships a Secret that the upstream chart's deployment.yaml
 reads via secretRef indirection (powerdns.api.key.secretRef and
 powerdns.webserver.password.secretRef in values.yaml).
+Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode) + #10 (credential
+hygiene) we MUST never bake plaintext credentials into the chart, BUT we
+also MUST give a freshly-installed Sovereign a way to bring its own
+PowerDNS up without out-of-band ceremony — a Sovereign that bootstraps
+itself from bp-* OCI artifacts has no operator standing by to inject a
+Secret while Helm is mid-install.
-Bootstrap rendering: the chart REQUIRES values .Values.powerdns.apiKey
-and .Values.powerdns.webserverPassword OR an existing
-`powerdns-api-credentials` Secret in-cluster (the private-repo
-ExternalSecret produces the latter at deploy time). When neither is set,
-helm install errors with a useful message via `required`.
+The pattern below mirrors bitnami/postgresql, bitnami/keycloak, and the
+upstream redis chart: on FIRST install, generate a 32-char random api-key
+and 32-char random webserver-password and persist them in the Secret. On
+every subsequent reconcile, `lookup` returns the existing Secret and we
+re-emit the SAME values — no rotation on upgrade, no drift, no chicken-
+and-egg with the deployment.
+The Secret is created BEFORE the Deployment in Helm's normal install
+order (a fixed per-kind order: Secret before Deployment), so the powerdns
+pods find their `powerdns-api-credentials` Secret on first start instead
+of sitting in CreateContainerConfigError forever (which is what 1.1.2 did
+on otech.omani.works — see PR fixing this).
+Operator override:
+  - Set `.Values.powerdns.apiKey` and `.Values.powerdns.webserverPassword`
+    in the cluster overlay to inject specific values (e.g. when a
+    sealed-secret already pins them).
+  - Set `.Values.powerdns.useExistingApiSecret: true` to skip Secret
+    creation entirely and rely on a Secret named `powerdns-api-credentials`
+    already present in the namespace (created by an out-of-band
+    ExternalSecret / SealedSecret / etc).
 */}}
-{{- if and (not .Values.powerdns.apiKey) (not .Values.powerdns.useExistingApiSecret) }}
-{{- /* Operator must provide the secret out-of-band. Skip Secret creation —
-       the deployment will fail to start until the named Secret exists,
-       which is the explicit signal we want. */ -}}
-{{- else if .Values.powerdns.apiKey }}
+{{- if not .Values.powerdns.useExistingApiSecret }}
+{{- $secretName := "powerdns-api-credentials" -}}
+{{- $existing := lookup "v1" "Secret" .Release.Namespace $secretName -}}
+{{- $apiKey := "" -}}
+{{- $webPass := "" -}}
+{{- if $existing -}}
+  {{- /* Reuse what's already there — never rotate on upgrade. */ -}}
+  {{- $apiKey = index $existing.data "api-key" | b64dec -}}
+  {{- $webPass = index $existing.data "webserver-password" | b64dec -}}
+{{- end -}}
+{{- /* Operator-supplied values win over both lookup and randAlphaNum. */ -}}
+{{- if .Values.powerdns.apiKey -}}
+  {{- $apiKey = .Values.powerdns.apiKey -}}
+{{- end -}}
+{{- if .Values.powerdns.webserverPassword -}}
+  {{- $webPass = .Values.powerdns.webserverPassword -}}
+{{- end -}}
+{{- /* Fall back to fresh randoms only when neither lookup nor operator
+       provided a value (i.e. genuine first install). 32 chars from the
+       alphanum set per INVIOLABLE-PRINCIPLES #10 (>= 24 chars, no
+       dictionary words). */ -}}
+{{- if not $apiKey -}}
+  {{- $apiKey = randAlphaNum 32 -}}
+{{- end -}}
+{{- if not $webPass -}}
+  {{- $webPass = randAlphaNum 32 -}}
+{{- end -}}
 apiVersion: v1
 kind: Secret
 metadata:
-  name: powerdns-api-credentials
+  name: {{ $secretName }}
   namespace: {{ .Release.Namespace }}
   labels:
     {{- include "bp-powerdns.labels" . | nindent 4 }}
+  annotations:
+    catalyst.openova.io/comment: |
+      Generated on first install via helm `lookup` + `randAlphaNum`. On
+      every subsequent reconcile the existing values are read back so the
+      Secret is stable across upgrades. Operator may override via
+      .Values.powerdns.apiKey / .Values.powerdns.webserverPassword in the
+      cluster overlay.
 type: Opaque
 stringData:
-  api-key: {{ required "powerdns.apiKey is required (random 32+ chars). See INVIOLABLE-PRINCIPLES #10." .Values.powerdns.apiKey | quote }}
-  webserver-password: {{ required "powerdns.webserverPassword is required." .Values.powerdns.webserverPassword | quote }}
+  api-key: {{ $apiKey | quote }}
+  webserver-password: {{ $webPass | quote }}
 {{- end }}

@@ -39,6 +39,27 @@ catalystBlueprint:
 # `helm dependency build` resolves the upstream as a subchart; values here
 # under the `powerdns:` key flow into that subchart unchanged.
 powerdns:
+  # ─── Catalyst-only credential injection knobs ───────────────────────
+  # Read by templates/api-credentials-secret.yaml (wrapper-level). The
+  # upstream subchart ignores these keys.
+  #
+  # Default behaviour: chart self-generates a 32-char api-key and 32-char
+  # webserver-password on first install (Helm `lookup` re-uses on every
+  # subsequent reconcile so the values are stable across upgrades).
+  #
+  # Operator overrides:
+  #   apiKey + webserverPassword — pin specific values (e.g. when a
+  #                                sealed-secret already encodes them
+  #                                for cross-cluster GitOps).
+  #   useExistingApiSecret: true — skip Secret creation entirely; the
+  #                                chart assumes a Secret named
+  #                                `powerdns-api-credentials` is
+  #                                provided out-of-band (ExternalSecret
+  #                                / SealedSecret / kubectl create).
+  apiKey: ""
+  webserverPassword: ""
+  useExistingApiSecret: false
   # 3 replicas across regions — anycast-fronted public NS endpoints
   # (per #167 acceptance criteria). Each replica connects to the same CNPG
   # database; PowerDNS Authoritative is stateless beyond the database.
@@ -118,10 +139,13 @@ powerdns:
   powerdns:
     # ─── REST API + webserver ───────────────────────────────────────────
     # API key + webserver password flow from a Catalyst-managed K8s Secret
-    # (`powerdns-api-credentials` — see templates/secret.yaml). Per
-    # INVIOLABLE-PRINCIPLES #4 + #10 the values are NEVER inlined here.
-    # The upstream chart's secretRef helper takes flat name+key fields
-    # (see powerdns.secretRef in upstream _helpers.tpl).
+    # (`powerdns-api-credentials` — see templates/api-credentials-secret.yaml).
+    # Per INVIOLABLE-PRINCIPLES #4 + #10 the values are NEVER inlined here;
+    # the wrapper's api-credentials-secret.yaml self-generates 32-char
+    # randoms on first install and reuses them on every subsequent
+    # reconcile via Helm `lookup`. The upstream chart's secretRef helper
+    # takes flat name+key fields (see powerdns.secretRef in upstream
+    # _helpers.tpl).
     api:
       key:
         name: powerdns-api-credentials
@@ -136,6 +160,19 @@ powerdns:
         name: powerdns-api-credentials
         key: webserver-password
+    # ─── Bootstrap zone (DISABLED — Catalyst loads zones via PDM) ──────
+    # The upstream chart renders a `create-zone-if-not-exists-sh` Job
+    # that POSTs `zoneName` to /api/v1/servers/localhost/zones at install
+    # time. Catalyst does NOT use this — the Sovereign's zones are
+    # provisioned later by pool-domain-manager (PDM) via the same REST
+    # API after the cluster comes up. Setting `zoneName: ""` short-
+    # circuits the upstream Job's `{{- if and .Values.powerdns.zoneName
+    # .Values.powerdns.api.key }}` gate so the Job is never rendered,
+    # which means the install completes the moment the powerdns
+    # Deployment is Ready instead of waiting for a Job whose only effect
+    # is creating an `example.de.` placeholder zone we don't want.
+    zoneName: ""
     # ─── DNS UPDATE (RFC 2136) ──────────────────────────────────────────
     # Off — Catalyst writes records via the REST API only (cert-manager
     # webhook + external-dns webhook + crossplane DNS XR all use the