openova/clusters/_template/bootstrap-kit/80-newapi.yaml
e3mrah 2ff50f0591
fix(bp-newapi+services-build): imagePullSecrets on Pod, sed bumps values.yaml smeTag (#955)
Two SME-blocker bugs caught live on otech113 (alice signup gate 5 fails on
fresh Sovereign):

#952 — bp-newapi 1.4.0 Pod has no imagePullSecrets, so kubelet pulls
PRIVATE ghcr.io/openova-io/openova/{newapi-mirror,services-metering-sidecar}
anonymously and gets 403 Forbidden. Fix:

- Templatize spec.imagePullSecrets on Deployment + channel-seed Job.
- Default values.yaml `imagePullSecrets: [{name: ghcr-pull}]`.
- Add `newapi` to flux-system/ghcr-pull's reflector
  reflection-{allowed,auto}-namespaces in cloudinit-control-plane.tftpl
  so bp-reflector mirrors the source Secret into the namespace
  automatically on every fresh Sovereign.
- Bump bp-newapi 1.4.0 -> 1.4.1, update _template overlay.
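
As a rough illustration of the three pieces above, the fragments below show
the chart default, the Pod spec it should render into, and the source Secret
that gets mirrored. This is a sketch, not copied from the chart: the
annotation keys assume bp-reflector wraps Emberstack Reflector, and the real
namespace lists carry more entries than `newapi`.

```yaml
# platform/newapi/chart/values.yaml (fragment): the new default.
imagePullSecrets:
  - name: ghcr-pull
---
# Rendered Deployment / channel-seed Job Pod spec (fragment).
spec:
  template:
    spec:
      imagePullSecrets:
        - name: ghcr-pull
---
# Source Secret in flux-system (assumed annotation shape; data omitted).
apiVersion: v1
kind: Secret
metadata:
  name: ghcr-pull
  namespace: flux-system
  annotations:
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "newapi"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "newapi"
type: kubernetes.io/dockerconfigjson
```

Setting `imagePullSecrets: []` in an overlay suppresses the block, which is
the empty-override case the render test covers.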

#953 — services-build.yaml's image-rewrite loop only matched the
hardcoded `image: ghcr.io/.../services-<svc>:<sha>` form. 7 of 8
sme-services templates use `image: "{{ ... }}/services-<svc>:{{
.Values.images.smeTag }}"`. Each services-build run bumped only
auth.yaml while reporting "update sme service images to ${SHA}",
leaving the live Pod on stale bytes (PR #951's #941 fix never reached
services-catalog despite the merge + chart bump chain). Fix:

- After the hardcoded loop, also bump `images.smeTag` in
  products/catalyst/chart/values.yaml with a strict regex match
  (`^  smeTag: "<sha>"$`); refuse to auto-bump if the line shape
  changes (defends against silent drift if a contributor renames the
  field).
- Mirror the change into the retry-path `rewrite()` function so a
  reset-to-origin/main retry does not recreate the original bug.
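
A minimal sketch of what such a guarded bump step could look like in a GitHub
Actions workflow. The step name, the `SHA` and `VALUES` variables, the
hex-only pattern for `<sha>`, and the error message are assumptions made for
illustration, not the contents of services-build.yaml.

```yaml
# Hypothetical workflow step; only the values.yaml path and the line shape
# (two spaces, `smeTag:`, a double-quoted value) come from the text above.
- name: Bump images.smeTag in the catalyst chart
  shell: bash
  env:
    SHA: ${{ github.sha }}
    VALUES: products/catalyst/chart/values.yaml
  run: |
    set -euo pipefail
    # Refuse to auto-bump if the line no longer has the expected shape.
    if ! grep -Eq '^  smeTag: "[0-9a-f]+"$' "$VALUES"; then
      echo "smeTag line shape changed in $VALUES; refusing to auto-bump" >&2
      exit 1
    fi
    # Rewrite only the strictly matching line.
    sed -i -E "s|^  smeTag: \"[0-9a-f]+\"\$|  smeTag: \"${SHA}\"|" "$VALUES"
    # Confirm the new tag landed as an exact whole-line match.
    grep -Fxq "  smeTag: \"${SHA}\"" "$VALUES"
```

Failing loudly on a shape mismatch trades convenience for safety: a renamed
field stops the workflow instead of silently leaving services on an old tag.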

Tests:

- platform/newapi/chart/tests/imagepullsecrets-render.sh — 4 cases
  asserting the Deployment and channel-seed Job carry the default
  ghcr-pull reference, that an empty override suppresses the block,
  and that custom secret names propagate (Inviolable Principle #4).
- tests/integration/services-build-rewrite.sh — 3 cases reproducing
  the workflow's rewrite logic on a sandboxed copy of the live
  chart, asserting both auth.yaml's hardcoded line AND values.yaml's
  smeTag get bumped, that helm-render of the catalyst chart with
  the bumped values produces all 8 SME-service Deployments at the
  new SHA, and that an idempotent re-bump to a second SHA also lands
  cleanly.

Refs: #952 #953 (umbrella #915 — alice signup gate 5).

Co-authored-by: hatiyildiz <143030955+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:47:37 +04:00

# bp-newapi — Catalyst Application Blueprint, bootstrap-kit slot 80.
# Multi-tenant LLM marketplace gateway. Ships in backend-only mode: the
# OpenAI-compatible API at api.<sovereign-fqdn>/v1/* is customer-facing,
# the upstream portal UI is disabled at ingress (Catalyst replaces it as
# the customer surface), and NewAPI's admin UI at admin.<sovereign-fqdn>
# is exposed only to ops staff (Keycloak-gated).
#
# This slot enables the SME-tenant turnkey experience (epic #795). The
# Catalyst signup hook (delivered by unified-rbac in #802 against the
# contract recorded in ADR-0003) reads the `catalyst-newapi-admin-token`
# Secret rendered by this chart's ExternalSecret to issue per-user API
# keys against NewAPI's admin API at `http://newapi.newapi.svc`.
#
# Wrapper chart: platform/newapi/chart/
# Catalyst-curated values: platform/newapi/chart/values.yaml
# Reconciled by: Flux on the new Sovereign's k3s control plane.
---
apiVersion: v1
kind: Namespace
metadata:
  name: newapi
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-newapi
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-newapi
  namespace: flux-system
spec:
  interval: 15m
  releaseName: newapi
  targetNamespace: newapi
  # bp-newapi depends on:
  #   - bp-openbao(08): the secret backend the chart's ExternalSecret
  #     pulls `ADMIN_API_TOKEN` from. Without OpenBao Ready, the
  #     ExternalSecret never resolves and the Catalyst signup hook can't
  #     reach the NewAPI admin API.
  #   - bp-keycloak(09): the OIDC issuer for the ops-staff admin UI at
  #     admin.<sovereign-fqdn>. Without Keycloak Ready, the OIDC
  #     middleware can't redirect ops-staff requests.
  #   - bp-cnpg(16): operator provisions the Postgres cluster for users,
  #     credits, channels, and audit log via a Crossplane
  #     PostgresqlInstance claim once cnpg is Ready. The DSN is mounted
  #     into NewAPI via `database.existingSecret` (operator-set).
  dependsOn:
    - name: bp-openbao
    - name: bp-keycloak
    - name: bp-cnpg
  chart:
    spec:
      chart: bp-newapi
      # 1.4.0 (issue #943, 2026-05-05): auto-provision CNPG-backed
      #   Postgres + chart-emitted SESSION_SECRET/CRYPTO_SECRET so a
      #   Sovereign install lands a real Pod without operator intervention.
      #   Pre-#943 the Deployment silently skipped render whenever
      #   database.existingSecret OR credentials.existingSecret was
      #   empty (the bootstrap-kit overlay supplies neither), so NewAPI
      #   never came up and alice signup gate 5 (LLM) timed out. Both
      #   auto-provisions are capability-gated on bp-cnpg's CRD and
      #   operator-overridable per Inviolable Principle #4.
      # 1.3.0: defaultChannels.qwenBankDhofar (channel #1 = Qwen3.6 @
      #   https://llm-api.omtd.bankdhofar.com) + post-install/post-upgrade
      #   `channel-seed` Helm hook Job that idempotently POSTs default
      #   channels into NewAPI's admin API. Issue #915 (epic SME tenant
      #   integration DoD: alice → OpenClaw → NewAPI → Qwen3.6@BankDhofar
      #   end-to-end).
      # 1.2.0: Traefik Middleware gated behind ingress.middleware.enabled.
      # 1.4.1 (issue #952, 2026-05-05): Pod imagePullSecrets templated +
      #   default to `[{name: ghcr-pull}]` so kubelet authenticates pulls
      #   of the PRIVATE newapi-mirror + metering-sidecar images. Paired
      #   with cloud-init adding `newapi` to flux-system/ghcr-pull's
      #   reflector auto-namespaces list.
      version: 1.4.1
      sourceRef:
        kind: HelmRepository
        name: bp-newapi
        namespace: flux-system
  # Event-driven install per docs/INVIOLABLE-PRINCIPLES.md #3 (Flux
  # dependsOn is the gate, not Helm timeout). NewAPI itself starts in
  # ~10 s once the Postgres DSN Secret is present; the long pole is
  # waiting for the operator's Crossplane claim to materialise the DB.
  install:
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    disableWait: true
    remediation:
      retries: 3
  # Per-Sovereign overrides — the operator MUST supply at install time:
  #   - ingress.host = api.${SOVEREIGN_FQDN}
  #   - ingress.adminHost = admin.${SOVEREIGN_FQDN}
  #   - auth.adminUI.keycloak.issuer = https://auth.${SOVEREIGN_FQDN}/realms/ops
  #   - database.existingSecret = Postgres DSN Secret (from the
  #     Crossplane PostgresqlInstance claim)
  #   - credentials.existingSecret = SESSION_SECRET + CRYPTO_SECRET
  #     (rotated via OpenBao)
  #   - catalystIntegration.externalSecret.remoteRef.key
  #     = sovereign/${SOVEREIGN_FQDN}/newapi/admin-token
  #   - defaultChannels.vllm.enabled = true (first-otech)
  #   - defaultChannels.vllm.endpoint
  #     + defaultChannels.vllm.attestation.owner
  #
  # Defaults below wire the first-otech provider channel to the same
  # upstream the OpenOva marketing site uses (Qwen via Axon →
  # `llm-api.omtd.bankdhofar.com`, model `qwen3-coder`); the operator
  # overlay overrides any of these by setting them in this HelmRelease's
  # spec.values.
  values:
    sovereignFQDN: ${SOVEREIGN_FQDN}
    ingress:
      host: api.${SOVEREIGN_FQDN}
      adminHost: admin.${SOVEREIGN_FQDN}
      tls:
        enabled: true
        issuer: letsencrypt-prod
    auth:
      adminUI:
        mode: keycloak
        keycloak:
          issuer: https://auth.${SOVEREIGN_FQDN}/realms/ops
          clientId: newapi-admin
          existingSecret: newapi-oidc
      customerAPI:
        keyIssuer: catalyst
    catalystIntegration:
      enabled: true
      existingSecret: catalyst-newapi-admin-token
      externalSecret:
        enabled: true
        refreshInterval: "1h"
        secretStoreRef:
          kind: ClusterSecretStore
          name: vault-region1
        remoteRef:
          # Canonical OpenBao path per docs/INVIOLABLE-PRINCIPLES.md #4.
          # Under the `vault-region1` store's `secret/` mount the full
          # path is `secret/sovereign/<fqdn>/newapi/admin-token`.
          key: sovereign/${SOVEREIGN_FQDN}/newapi/admin-token
          property: ADMIN_API_TOKEN
    # Default channels: chart-side composition (channel #1 first).
    #
    # `qwenBankDhofar` (issue #915) is the canonical first channel:
    # Qwen3.6 hosted at BankDhofar (https://llm-api.omtd.bankdhofar.com,
    # model `qwen3-coder` / alias `qwen3.6`), the SAME relay the
    # OpenOva marketing site's Axon helmrelease consumes
    # (openova-private/clusters/contabo-mkt/apps/axon/helmrelease.yaml).
    # Disabled in the template so a fresh Sovereign does not silently
    # wire customers to a third-party endpoint; per-Sovereign overlays
    # (clusters/<sovereign>/bootstrap-kit/80-newapi.yaml) enable this
    # block and supply:
    #   - defaultChannels.qwenBankDhofar.enabled = true
    #   - defaultChannels.qwenBankDhofar.endpoint = https://llm-api.omtd.bankdhofar.com
    #   - defaultChannels.qwenBankDhofar.attestation.accountId (legal-team-owned)
    #   - defaultChannels.qwenBankDhofar.attestation.contractRef (legal-team-owned)
    #   - the Secret `newapi-channel-qwen-bankdhofar` containing the
    #     upstream API key under key `API_KEY` (or an ExternalSecret
    #     pulling from OpenBao at
    #     `sovereign/<sovereign-fqdn>/newapi/channel-qwen-bankdhofar`)
    #   - auth.adminUI.masterKeySecret = name of a Secret carrying
    #     `MASTER_KEY` (NewAPI bootstrap admin auth), required for
    #     the channel-seed Helm hook Job to POST against the admin API
    #     ONCE at install time. Operator may rotate the master key out
    #     post-bootstrap; channels persist in Postgres.
    #
    # When the operator flips `qwenBankDhofar.enabled: true`, the
    # chart's post-install/post-upgrade `channel-seed` Job probes
    # NewAPI's admin API (`/api/channel/?keyword=<name>`) and POSTs
    # the channel definition idempotently. Re-runs after upgrades
    # are no-ops once the channel exists.
    #
    # The legacy `vllm` slot (in-cluster vLLM fallback) remains for
    # operators that run their own bp-vllm + open-weight model
    # in-cluster; it composes after `qwenBankDhofar` and any operator
    # `.Values.channels`.
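    #
    # Illustrative per-Sovereign overlay values for the list above. This
    # is a sketch, not a shipped default: the attestation values are
    # placeholders and `newapi-master-key` is a hypothetical Secret name.
    #
    #   defaultChannels:
    #     qwenBankDhofar:
    #       enabled: true
    #       endpoint: https://llm-api.omtd.bankdhofar.com
    #       attestation:
    #         accountId: "<legal-team-supplied>"
    #         contractRef: "<legal-team-supplied>"
    #   auth:
    #     adminUI:
    #       masterKeySecret: newapi-master-key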
    defaultChannels:
      qwenBankDhofar:
        enabled: false
        name: qwen3.6-bankdhofar
        endpoint: ""
        models:
          - qwen3.6
          - qwen3-coder
        existingSecret: newapi-channel-qwen-bankdhofar
        existingSecretKey: API_KEY
        attestation:
          kind: commercial-contract
          accountId: ""
          contractRef: ""
      vllm:
        enabled: false
        name: qwen
        endpoint: ""
        models:
          - qwen3-coder
        attestation:
          kind: in-cluster
          owner: ${SOVEREIGN_FQDN}