Commit Graph

33 Commits

Author SHA1 Message Date
e3mrah
95a06f56f8
fix(sme-marketplace): unblock PIN signin — route /api/* to sme/gateway + add send-pin alias (#868) (#869)
Two-part fix for marketplace UI signin flow which 503'd then 404'd on
otech103. Live debugging found two stacked bugs.

Part A — chart (HTTPRoute backend):
- marketplace-routes.yaml: /api/* rule now backendRefs sme/gateway:8080
  (cross-namespace) instead of catalyst-system/marketplace-api which had
  a Service selector matching zero Pods. The gateway in sme already
  fronts services-auth, catalog, tenant, billing, provisioning.
- marketplace-reference-grant.yaml: extend `to:` list with the gateway
  Service so the cross-ns hop is authorised by Gateway API.
- Bump bp-catalyst-platform 1.4.7 → 1.4.8 + lockstep slot 13 pin.

Part B — services-auth (route name):
- Add /auth/send-pin alias delegating to existing SendMagicLink handler,
  and /auth/verify-pin alias delegating to VerifyMagicLink. The
  marketplace UI surfaces a 6-digit PIN ("Send PIN" button), so the
  PIN-named routes are the canonical UX-facing names. /auth/magic-link
  and /auth/verify remain registered for backward compat.
- services-build workflow auto-rebuilds the auth image on push to
  core/services/** — no manual dispatch needed.
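
The alias approach in Part B can be sketched with the standard library router. This is a minimal illustration, assuming plain net/http; the handler bodies and the newAuthMux and statusFor helpers are hypothetical stand-ins, not the services-auth code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sendMagicLink and verifyMagicLink stand in for the existing handlers.
// Their bodies are placeholders, not the real services-auth logic.
func sendMagicLink(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusAccepted)
}

func verifyMagicLink(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// newAuthMux registers each PIN-named route next to its legacy name.
// Both names point at the same handler function, so behaviour cannot drift.
func newAuthMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/auth/magic-link", sendMagicLink)   // legacy name
	mux.HandleFunc("/auth/send-pin", sendMagicLink)     // PIN alias
	mux.HandleFunc("/auth/verify", verifyMagicLink)     // legacy name
	mux.HandleFunc("/auth/verify-pin", verifyMagicLink) // PIN alias
	return mux
}

// statusFor issues a POST against the mux and returns the response status.
func statusFor(path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, path, nil)
	newAuthMux().ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	paths := []string{"/auth/magic-link", "/auth/send-pin", "/auth/verify", "/auth/verify-pin"}
	for _, p := range paths {
		fmt.Println(p, statusFor(p))
	}
}
```

Registering the alias against the same function keeps the legacy and PIN-named routes identical by construction.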

Refs: #868

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 08:22:17 +04:00
e3mrah
fa4395fa3a
fix(bp-catalyst-platform): wire VALKEY_PASSWORD into SME auth + gateway (#863) (#864)
After PR #862 (1.4.4) made cross-ns Valkey reachable from `sme` ns, the
auth Pod started CrashLoopBackOff with "NOAUTH HELLO must be called with
the client already authenticated". Root cause: bp-valkey 1.0.0 ships
auth.enabled=true (bitnami default) but SME service code + Deployment
templates never plumbed a password through.

Chart 1.4.4 -> 1.4.5. Slot 13 pin lockstep.

Changes:
- core/services/shared/db/valkey.go: add ConnectValkeyWithAuth overload
  taking username + password. ConnectValkey kept backwards-compatible
  for contabo-mkt's auth-less in-namespace Valkey.
- core/services/auth/main.go + gateway/main.go: read VALKEY_USERNAME +
  VALKEY_PASSWORD env, call ConnectValkeyWithAuth when password set,
  else fall through to no-auth path.
- NEW templates/sme-services/valkey-cross-ns-secret.yaml: Helm `lookup`
  reads bp-valkey's auto-generated `valkey-password` from the
  `valkey/valkey` Secret and re-emits it as `sme-valkey-auth` in `sme`
  ns. Same pattern as sme-secrets.yaml (#859) and gitea-admin-secret
  (#830 Bug 2). On first install the lookup may return nil; Flux's 15m
  reconcile picks up the mirror once bp-valkey is Ready.
- auth.yaml + gateway.yaml: add VALKEY_PASSWORD env from `sme-valkey-
  auth` Secret with optional=true so contabo-mkt's auth-less path keeps
  working when the mirror Secret is absent.
- values.yaml: add `smeServices.valkey.{sourceSecretName,
  sourcePasswordKey, destNamespace, destSecretName}` knobs (Inviolable
  Principle #4).
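
The env-driven branch described for auth/main.go and gateway/main.go could look roughly like the sketch below. The valkeyAuth type and authFromEnv helper are hypothetical; the real ConnectValkeyWithAuth signature is not shown in this commit, so it appears only in a comment.

```go
package main

import (
	"fmt"
	"os"
)

// valkeyAuth is a hypothetical holder for credentials read from the env.
type valkeyAuth struct {
	Username string
	Password string
}

// authFromEnv returns credentials and true when VALKEY_PASSWORD is set.
// It returns false when the password is empty so the caller can fall
// through to the no-auth path. getenv is injected to keep it testable.
func authFromEnv(getenv func(string) string) (valkeyAuth, bool) {
	pw := getenv("VALKEY_PASSWORD")
	if pw == "" {
		return valkeyAuth{}, false
	}
	return valkeyAuth{Username: getenv("VALKEY_USERNAME"), Password: pw}, true
}

func main() {
	if auth, ok := authFromEnv(os.Getenv); ok {
		// The service would call ConnectValkeyWithAuth here (assumed).
		fmt.Printf("authenticated path, username=%q\n", auth.Username)
	} else {
		// The service would call the original ConnectValkey here (assumed).
		fmt.Println("no-auth path")
	}
}
```

Because the Secret reference is optional=true, an absent mirror Secret simply leaves VALKEY_PASSWORD empty and the second branch runs.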

Live verified the failure mode on otech103: 11/13 SME pods Running 1/1,
auth in CrashLoopBackOff with NOAUTH HELLO error. Provisioning Pod's
CreateContainerConfigError is unrelated (ghcr-pull, separate ticket).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:09:38 +04:00
e3mrah
5cdb738ac9
fix(services): go mod tidy across sibling services after #798 shared deps bump (#821)
#798 added github.com/nats-io/nats.go to core/services/shared/go.mod and
adjusted x/sys, x/crypto, and x/text to Go 1.22-compatible versions. The
sibling services (auth, catalog, domain, gateway, notification,
provisioning, tenant) reference the same shared module via the local
`replace` directive — their go.sum files must include the new transitive
hashes, otherwise the CI Containerfile build hits:

    go: updates to go.mod needed; to update it: go mod tidy

This commit is a pure `go mod tidy` across all 7 services; no source
changes. CI services-build is now unblocked.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:35:46 +04:00
e3mrah
9645a9044a
feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798) (#818)
* feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798)

Per #795 [Q-mine-3] (NATS not RedPanda) + [Q-mine-4] (one ledger), add
the SME-2 metering integration end-to-end. NewAPI is consumed as the
upstream image `ghcr.io/openova-io/openova/newapi-mirror` (a pinned
mirror, not a fork) — the metering envelope is produced by a Go sidecar
that observes the OpenAI-style `usage.total_tokens` field on every
2xx /v1/* response. This avoids forking the upstream binary while still
producing the canonical envelope shape on `catalyst.usage.recorded`.

A) NewAPI metering sidecar — core/services/metering-sidecar/
   - Transparent reverse proxy in front of NewAPI on its own port; the
     bp-newapi Service routes the cluster-fronting port to the sidecar,
     which forwards to NewAPI on the pod's loopback.
   - Observes successful /v1/* JSON responses, parses
     `usage.{prompt_tokens,completion_tokens,total_tokens}`, computes
     amount_micro_omr = -tokens * priceMicroOMRPerToken, and publishes
     one envelope on `catalyst.usage.recorded` per completed request.
   - Failed (non-2xx), non-JSON, and admin-path requests are NOT billed.
   - Customer-facing latency is NEVER blocked on metering: the response
     body is restored before publish; on NATS unreachable the envelope
     is persisted to disk and retried by a background drain loop.
   - 14 unit tests (proxy + publisher + safeFilename guards).
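
The amount computation in section A can be sketched as follows. This assumes the token count is `usage.total_tokens` and that a missing usage block means the request is not billed; the type and function names are illustrative, not the sidecar's own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usage mirrors the OpenAI-style usage object named in the commit.
type usage struct {
	PromptTokens     int64 `json:"prompt_tokens"`
	CompletionTokens int64 `json:"completion_tokens"`
	TotalTokens      int64 `json:"total_tokens"`
}

type completion struct {
	Usage *usage `json:"usage"`
}

// amountMicroOMR returns the signed ledger amount for a response body.
// ok is false when the body is not JSON or carries no usage block, in
// which case nothing should be billed.
func amountMicroOMR(body []byte, priceMicroOMRPerToken int64) (amount int64, ok bool) {
	var c completion
	if err := json.Unmarshal(body, &c); err != nil || c.Usage == nil {
		return 0, false
	}
	// Usage debits the customer, so the amount is negative.
	return -c.Usage.TotalTokens * priceMicroOMRPerToken, true
}

func main() {
	body := []byte(`{"usage":{"prompt_tokens":1,"completion_tokens":2,"total_tokens":3}}`)
	amount, ok := amountMicroOMR(body, 156)
	fmt.Println(amount, ok)
}
```

Keeping the arithmetic in int64 micro-OMR avoids floating-point rounding at per-token prices.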

B) sme-billing NATS subscriber — core/services/billing/handlers/
   metering_consumer.go
   - JetStream durable consumer `sme-billing-metering` on stream
     `CATALYST_USAGE` (provisioned by sme-billing on startup).
   - Idempotent on metadata.request_id via a UNIQUE partial index on
     credit_ledger.external_ref; redelivery from the broker collapses
     to a single ledger row.
   - Customer auto-create on cold start (the rbac sme.user.created
     envelope may land AFTER the first metered request; we don't strand
     usage waiting for it).
   - 11 unit tests covering happy-path, idempotency, malformed-payload
     poison-pill, missing-request-id, non-negative amount guard,
     resolver error → Nak, derive-micro-OMR-from-OMR, DB-error → Nak.
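
The idempotency contract in section B can be illustrated with an in-memory stand-in. The real subscriber relies on a UNIQUE partial index in Postgres; the ledger type below is a hypothetical model of the same behaviour, keyed on the request ID.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger is an in-memory model of credit_ledger deduplicated by request ID.
type ledger struct {
	mu      sync.Mutex
	seen    map[string]bool
	balance int64 // micro-OMR
}

func newLedger() *ledger {
	return &ledger{seen: make(map[string]bool)}
}

// record applies amountMicroOMR at most once per requestID. A redelivered
// message leaves the balance unchanged and reports duplicate=true.
func (l *ledger) record(requestID string, amountMicroOMR int64) (balance int64, duplicate bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.seen[requestID] {
		return l.balance, true
	}
	l.seen[requestID] = true
	l.balance += amountMicroOMR
	return l.balance, false
}

func main() {
	l := newLedger()
	fmt.Println(l.record("req-1", -468)) // first delivery
	fmt.Println(l.record("req-1", -468)) // broker redelivery
	fmt.Println(l.record("req-2", -156)) // distinct request
}
```

Pushing the uniqueness check into the store means the consumer can Ack a redelivered message without special-casing it.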

C) HTTP handler POST /billing/metering/record — handlers/metering.go
   - Synchronous validate → INSERT credit_ledger → return
     {ledger_entry_id, balance_after_omr, balance_after_micro_omr,
     duplicate}. Same payload + idempotency guard as the NATS path.
   - Auth: superadmin OR sovereign-admin (operator-admin model;
     end-user LLM traffic flows through the sidecar, never this URL).
   - 8 unit tests covering happy-path, idempotency, role gating,
     malformed-JSON, positive-amount rejection, customer-not-found.

D) Schema — core/services/billing/store/store.go
   - ALTER TABLE credit_ledger ADD COLUMN amount_micro_omr BIGINT
     (1 OMR = 1,000,000 micro-OMR; -0.000234 OMR = -234 micro-OMR
     exact integer — preserves precision at metering rates).
   - ADD COLUMN external_ref TEXT + UNIQUE partial index for
     idempotency dedup.
   - ADD COLUMN metadata JSONB for the raw envelope.
   - GetCreditBalance projects both amount_omr (legacy) and
     amount_micro_omr (new) into the integer-OMR view.
   - GetCreditBalanceMicroOMR returns canonical precision.
   - RecordUsage method: ON CONFLICT DO UPDATE … RETURNING (xmax<>0)
     distinguishes fresh insert from duplicate without a follow-up
     SELECT.

E) Wiring
   - core/services/shared/events/nats.go — minimal NATS JetStream
     publisher + subscriber surface; legacy RedPanda producer/consumer
     in events.go untouched per [Q-mine-3].
   - core/services/billing/main.go — NATS_URL env; subscriber wired
     in parallel with the existing RedPanda tenant-events consumer.
   - middleware/jwt.go — exported test helper WithClaims so handler
     tests can construct an authenticated context without minting a
     real signed token.
   - .github/workflows/services-build.yaml — metering-sidecar added
     to the build matrix; deploy job skips it (image consumed by the
     bp-newapi chart, not products/catalyst sme-services).

F) bp-newapi chart (1.0.0 → 1.1.0)
   - meteringSidecar block in values.yaml: image, port, NATS URL,
     priceMicroOMRPerToken (default 156 = 0.000156 OMR/token), spool
     dir, header names, resources, securityContext (read-only-rootfs).
   - deployment.yaml renders the sidecar container + emptyDir spool
     volume when meteringSidecar.enabled (default true).
   - service.yaml routes the cluster-fronting :3000 to the sidecar
     when enabled, exposes a separate :3001 → NewAPI direct port for
     bp-catalyst-platform admin-API traffic (ADR-0003 §3.2).
   - networkpolicy.yaml allows the sidecar's port + nats-system
     egress for JetStream publish.

Tests: 33 new (14 sidecar + 11 subscriber + 8 HTTP handler), all green.
Helm template renders cleanly with sidecar enabled and disabled.

Closes #798

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(billing/store): cast SUM to BIGINT so lib/pq scans into int64 (#798)

Postgres returns `SUM(int) + SUM(bigint)/integer` as `numeric`, which
lib/pq presents as a `[]uint8` decimal string ("50.000000000000000000000000")
that does NOT scan directly into Go int64 — the integration test
TestVoucherLifecycle_IssueRedeemAndCreditApplied caught this in CI on
the post-redeem balance read.

Wrap the SUM expressions in CAST(... AS BIGINT) so the column type is
unambiguously bigint and Scan target stays uniform across pre-#798 rows
(amount_omr only) and post-#798 rows (amount_micro_omr present).

Affects:
  - GetCreditBalance
  - GetCreditBalanceMicroOMR
  - RecordUsage's running-balance read

Test mocks updated to match the new SQL prefix.
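
The scan failure can be reproduced in miniature without a database. The sketch below assumes the int64 conversion behaves like an integer parse of the text the driver hands back; parseBalance is a hypothetical helper, not lib/pq code.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseBalance mimics converting a driver-supplied text value to int64.
func parseBalance(raw []byte) (int64, error) {
	return strconv.ParseInt(string(raw), 10, 64)
}

func main() {
	// A numeric column arrives as a decimal string and fails to parse.
	_, err := parseBalance([]byte("50.000000000000000000000000"))
	fmt.Println("numeric:", err)

	// After CAST(... AS BIGINT) the value is a plain integer string.
	v, err := parseBalance([]byte("50"))
	fmt.Println("bigint:", v, err)
}
```

Casting in SQL keeps the Go scan target a plain int64 rather than pushing decimal parsing into every caller.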

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:32:42 +04:00
e3mrah
2a034a0959
feat(catalog): unified catalog with Published flag — operator curates marketplace (#710 wave 2) (#724)
Single source of truth for apps; Sovereign-console operator decides which
apps marketplace customers see; marketplace storefront filters by
Published. Per founder rule 2026-05-04: unpublish is a marketplace-
visibility toggle, not a deployment-lifecycle action — existing tenant
deployments of an unpublished app keep running unaffected.

core/services/catalog/store/store.go
====================================
- App.Published bool — operator-controlled visibility
- ListPublishedApps: marketplace-storefront subset
  (Published=true AND System=false AND Deployable=true).
  System and Deployable are catalog-team-controlled; Published is the
  operator's curation knob.
- SetAppPublished(slug, bool) — hot-path one-bit write the Sovereign
  console hits per row toggle. Cheaper than UpdateApp; slug-keyed so
  the UI doesn't need the internal Mongo _id.
- UpdateApp: thread published through full-update path too.
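
The ListPublishedApps predicate can be sketched over an in-memory slice. The App struct here carries only the three flags named above plus a slug; the real store queries Mongo, so this is an illustration of the filter, not the store code.

```go
package main

import "fmt"

// App is a trimmed, hypothetical view of the catalog record.
type App struct {
	Slug       string
	Published  bool // operator-controlled visibility
	System     bool // catalog-team-controlled
	Deployable bool // catalog-team-controlled
}

// listPublished returns the marketplace-storefront subset:
// Published=true AND System=false AND Deployable=true.
func listPublished(apps []App) []App {
	var out []App
	for _, a := range apps {
		if a.Published && !a.System && a.Deployable {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	apps := []App{
		{Slug: "wordpress", Published: true, Deployable: true},
		{Slug: "hidden", Published: false, Deployable: true},
		{Slug: "internal", Published: true, System: true, Deployable: true},
		{Slug: "draft", Published: true, Deployable: false},
	}
	for _, a := range listPublished(apps) {
		fmt.Println(a.Slug)
	}
}
```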

core/services/catalog/handlers/handlers.go + routes.go
======================================================
- ListApps now honours ?published=true query param:
    GET /catalog/apps                  → operator view: every app
    GET /catalog/apps?published=true   → marketplace view: filtered
- New PATCH /catalog/admin/apps/{slug}/publish?value={true|false}
  for the Sovereign-console operator's row toggle.
- requireAdmin gating preserved on the admin endpoint.
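
Query-parameter handling for the two endpoints might look like the sketch below. It assumes strict matching on the literal strings "true" and "false"; the helper names are hypothetical and the real handlers may be more lenient.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// publishedOnly reports whether a list request asked for the marketplace
// view, i.e. GET /catalog/apps?published=true.
func publishedOnly(rawQuery string) bool {
	q, err := url.ParseQuery(rawQuery)
	return err == nil && q.Get("published") == "true"
}

// publishValue parses the ?value= parameter of the PATCH toggle and
// rejects anything other than "true" or "false".
func publishValue(rawQuery string) (bool, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return false, err
	}
	switch q.Get("value") {
	case "true":
		return true, nil
	case "false":
		return false, nil
	}
	return false, errors.New("value must be true or false")
}

func main() {
	fmt.Println(publishedOnly("published=true"), publishedOnly(""))
	fmt.Println(publishValue("value=false"))
	fmt.Println(publishValue("value=maybe"))
}
```

Rejecting unknown values explicitly avoids silently unpublishing an app on a typo.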

core/services/catalog/handlers/seed.go
======================================
- migrateAppPublished: defaults Published=true on every existing app
  on the day Catalyst 1.3.x ships. Operators opt OUT of marketplace
  visibility per app, not IN — matches how a real SaaS storefront is
  curated and prevents an empty marketplace on flag-introduction day.
  Idempotent on re-run.

core/marketplace/src/lib/api.ts
================================
- getApps() now hits /catalog/apps?published=true so the marketplace
  storefront only renders the operator-curated subset.

DoD pending wave 2.5
====================
The Sovereign-console "Catalog & publishing" admin page (per-row
toggle UI) is the next chunk and ships in a follow-up — backend +
storefront filter are the load-bearing change here. Catalog admins
can flip the flag today via the PATCH endpoint; the per-row UI is
quality-of-life on top.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 11:37:03 +04:00
e3mrah
73d68d99c1
fix(auth-ux): HTML PIN email + copyable email pill + 6-box marketplace PIN + drop UI debris (#721) (#723)
Wave 1 of #721 — what the founder actually saw on console.openova.io
and marketplace.openova.io / marketplace.<sov>.

PIN email rewrite (catalyst-api auth.go)
========================================
Was: plaintext "Your OpenOva sign-in code:\n\n    9 6 5 1 2 8\n…"
Now: multipart/alternative MIME with a polished HTML alternative —
white card on neutral background, OpenOva mark + wordmark,
"Your sign-in code" heading, big tinted code block (34px monospaced,
10px letter-spacing, one-tap copy on iOS Mail), expiration + ignore
notice, footer credit. Inline styles only — Gmail/Outlook web strip
<style>. Card pinned at 480px so narrow webmail panes render correctly.
text/plain fallback kept for clients without HTML.
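
A multipart/alternative body with a text/plain fallback can be assembled with the Go standard library roughly as follows. This is a sketch of the MIME structure only; buildAlternative and partTypes are hypothetical helpers, and the real auth.go template and headers are not reproduced.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/textproto"
	"strings"
)

// buildAlternative returns the Content-Type header value and the body of a
// multipart/alternative message: text/plain first, then text/html, so
// HTML-capable clients pick the last (richest) part.
func buildAlternative(plain, html string) (string, []byte, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	parts := []struct{ ctype, content string }{
		{"text/plain; charset=utf-8", plain},
		{"text/html; charset=utf-8", html},
	}
	for _, p := range parts {
		pw, err := w.CreatePart(textproto.MIMEHeader{"Content-Type": {p.ctype}})
		if err != nil {
			return "", nil, err
		}
		if _, err := io.WriteString(pw, p.content); err != nil {
			return "", nil, err
		}
	}
	if err := w.Close(); err != nil {
		return "", nil, err
	}
	return "multipart/alternative; boundary=" + w.Boundary(), buf.Bytes(), nil
}

// partTypes parses a message back and lists each part's media type in order.
func partTypes(contentType string, body []byte) ([]string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, err
	}
	r := multipart.NewReader(bytes.NewReader(body), params["boundary"])
	var types []string
	for {
		p, err := r.NextPart()
		if err == io.EOF {
			return types, nil
		}
		if err != nil {
			return nil, err
		}
		mt, _, err := mime.ParseMediaType(p.Header.Get("Content-Type"))
		if err != nil {
			return nil, err
		}
		types = append(types, mt)
	}
}

func main() {
	ct, body, err := buildAlternative("Your sign-in code: 965128", "<p>Your sign-in code: <b>965128</b></p>")
	if err != nil {
		panic(err)
	}
	types, err := partTypes(ct, body)
	fmt.Println(strings.Join(types, ", "), err)
}
```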

Catalyst-Zero verify page (VerifyPinPage.tsx)
=============================================
- Email shown as a copyable PILL with copy icon — click copies to
  clipboard, icon flips to a check for 1.5s. Selection-fallback for
  browsers without clipboard API.
- Centered title + subtitle (was left-aligned in 1.2.x).
- Microcopy: "Codes expire after 10 minutes — check your spam folder."

Marketplace checkout sign-in (CheckoutStep.svelte)
==================================================
- A single <input maxlength=6> → 6 separate <input maxlength=1>
  boxes with auto-advance, paste-fan-out (paste a 6-digit code anywhere
  on the row, all 6 boxes fill, autosubmits), backspace-back, ArrowLeft/
  Right navigation, autocomplete=one-time-code on first box for iOS SMS
  autofill, caret-transparent so the digit IS the caret.
- Email shown as the same copyable pill pattern (svg copy/check icons,
  hover-to-brand affordance).
- Dropped "Use a different email" link (browser back works).
- Added expire/spam microcopy below button.

Header + wayfinding cleanup
===========================
- Header.svelte: top-right "Sign in" button hidden when pathname is
  /checkout or /login. Two sign-in CTAs on the same screen were the UI
  debris caught live on 2026-05-04.
- CheckoutStep.svelte: "← Back to Review" moved from bottom-left
  (where users don't look) to top-left above the Checkout heading,
  rendered with a chevron icon.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 11:30:24 +04:00
e3mrah
4946ccd125
feat(bp-catalyst-platform): expose marketplace + tenant wildcard, bump 1.3.0 (closes #710) (#719)
Marketplace exposure for franchised Sovereigns. Otech becomes a SaaS
operator with a single overlay toggle.

Changes
=======

products/catalyst/chart:
- Chart.yaml 1.2.7 → 1.3.0
- values.yaml: ingress.marketplace.enabled toggle (default false) +
  marketplace.{brand,currency,paymentProvider,signupPolicy} surface
- templates/sme-services/marketplace-routes.yaml: HTTPRoute
  marketplace.<sov> with /api/ → marketplace-api, /back-office/ → admin,
  / → marketplace; HTTPRoute *.<sov> → console (per-tenant wildcard)
- templates/sme-services/marketplace-reference-grant.yaml: cross-
  namespace ReferenceGrant from catalyst-system HTTPRoute → sme Services
- .helmignore: stop excluding sme-services/* and marketplace-api/* (only
  *.kustomization.yaml + *.ingress.yaml remain Kustomize-only)
- All sme-services/* + marketplace-api/* manifests wrapped with
  {{ if .Values.ingress.marketplace.enabled }} so non-marketplace
  Sovereigns render the chart unchanged

clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
- chart version 1.2.7 → 1.3.0
- ingress.hosts.marketplace.host: marketplace.${SOVEREIGN_FQDN}
- ingress.marketplace.enabled: ${MARKETPLACE_ENABLED:-false}

infra/hetzner:
- variables.tf: marketplace_enabled var (string "true"/"false", default "false")
- main.tf: thread var into cloudinit-control-plane.tftpl
- cloudinit-control-plane.tftpl: postBuild.substitute.MARKETPLACE_ENABLED
  on bootstrap-kit, sovereign-tls, infrastructure-config Kustomizations

products/catalyst/bootstrap/api/internal/provisioner/provisioner.go:
- Request.MarketplaceEnabled bool (json:"marketplaceEnabled")
- writeTfvars: marketplace_enabled = "true"|"false"

core/pool-domain-manager/internal/allocator/allocator.go:
- canonicalRecordSet adds "marketplace" prefix → marketplace.<sov>
  resolves via PDM at zone-commit time (PR #710 explicit record so
  caches don't depend on the *.<sov> wildcard alone)

DoD ready
=========
- helm template with ingress.marketplace.enabled=false → identical
  manifest set to 1.2.7 (verified locally)
- helm template with ingress.marketplace.enabled=true → emits 17 extra
  resources: 13 sme-services workloads + 2 marketplace-api + 1
  HTTPRoute pair + 1 ReferenceGrant
- pdm tests: TestCanonicalRecordSet, TestCommitDNSShape green
- catalyst-api builds, provisioner cloudinit_path_test green

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 07:47:37 +04:00
e3mrah
174ca02aba
feat(marketplace): omantel.openova.io vanity host with light-theme partner branding (#633)
Adds a tenant-aware branding layer to the marketplace so the same pods can
serve marketplace.openova.io (default OpenOva, dark) and omantel.openova.io
(Omantel logo, forced light theme) — no extra deployments, no extra resources.

Tomorrow's Omantel demo lands on omantel.openova.io and gets the partner
look without disturbing the existing marketplace.openova.io experience.

Changes
- src/lib/tenant.ts: hostname → tenant config (logo, brand, force theme,
  skip-console-redirect). Easy to extend with future partner hosts.
- src/layouts/Layout.astro: pre-hydration script sets <html data-tenant>
  and forces light theme for omantel before paint (zero flash). Returning-
  user redirect to console.openova.io/nova is suppressed for tenants with
  skipConsoleRedirect=true so the demo stays on the partner host.
- src/components/Header.svelte: renders both brand spans; CSS in
  global.css hides the inactive one based on html[data-tenant]. SSR'd
  HTML stays cacheable across hostnames.
- public/logos/omantel.svg: official Omantel wordmark (Wikimedia source,
  brand colours #283d90 navy + #e27739 orange).

Ingress + chart fixes
- products/catalyst/chart/templates/sme-services/ingress.yaml: adds two
  ingresses (omantel /api/ priority 200, omantel / priority 100) pointing
  at the existing gateway/marketplace services. cert-manager issues
  omantel-tls via letsencrypt-prod (DNS already resolves via the
  *.openova.io wildcard A record).
- products/catalyst/chart/templates/sme-services/marketplace.yaml: this
  path is Kustomize-applied (contabo-mkt only — Sovereigns skip via
  .helmignore), so the image must be a concrete string. PR #580 templated
  it with Helm syntax which produced InvalidImageName on the new
  ReplicaSet — rolling forward stalled. De-templatized and pinned to the
  current deployed SHA so the marketplace-build CI sed can update it.

Backwards compatibility
- marketplace.openova.io: identical render — default tenant 'openova',
  inline OpenOva SVG, dark theme by default, console redirect intact.
- Other hosts (console.openova.io, admin.openova.io): untouched.

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:15:13 +04:00
e3mrah
5a403e66b1
fix(tls): DNS-01 wildcard TLS chain — solverName pdns, NodePort 30053, dynadot test fix (#582)
* fix(bp-harbor): CNPG database must be 'registry' not 'harbor' — matches coreDatabase

Harbor upstream always connects to a database named 'registry'
(harbor.database.external.coreDatabase default). The CNPG Cluster was
initialised with database='harbor', causing:

  FATAL: database "registry" does not exist (SQLSTATE 3D000)

Fix: change postgres.cluster.database default from 'harbor' → 'registry'
in values.yaml and cnpg-cluster.yaml template. Both the CNPG bootstrap
and Harbor's coreDatabase now use 'registry'.

Runtime fix on otech22: CREATE DATABASE registry OWNER harbor was run
against harbor-pg-1. harbor-core is now 1/1 Running.

Bump bp-harbor 1.2.1 → 1.2.2. Bootstrap-kit refs updated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tls): DNS-01 wildcard TLS chain — solverName, NodePort 30053, dynadot test fix

Five independent fixes that together complete the DNS-01 wildcard TLS chain
for per-Sovereign certificate autonomy:

1. cert-manager-powerdns-webhook solverName mismatch (root cause of #550 echo):
   - values.yaml: `webhook.solverName: powerdns` → `pdns`
   - The zachomedia binary's Name() returns "pdns" (hardcoded). cert-manager
     calls POST /apis/<groupName>/v1alpha1/<solverName>; when solverName is
     "powerdns" cert-manager gets 404 → "server could not find the resource".

2. cert-manager-dynadot-webhook solver_test.go mock format:
   - writeOK() and error injection used old ResponseHeader-wrapped format
   - Real api3.json returns ResponseCode/Status directly in SetDnsResponse
   - This caused the image build to fail at ccc38987 so the dynadot fix
     never shipped; solver tests now pass cleanly (go test ./... OK)

3. PowerDNS NodePort 30053 anycast overlay (bootstrap-kit and template):
   - _template/bootstrap-kit/11-powerdns.yaml: adds anycast NodePort values
   - omantel + otech bootstrap-kit: same NodePort 30053 overlay applied
   - anycast-endpoint.yaml: optional nodePort field rendered in port list

4. Hetzner LB + firewall for DNS port 53 (infra/hetzner/main.tf):
   - hcloud_load_balancer_service.dns: TCP:53 → NodePort 30053
   - Firewall: TCP+UDP :53 from 0.0.0.0/0,::/0

5. dynadot-client JSON parsing fix (core/pkg/dynadot-client):
   - AddRecord + SetFullDNS: struct no longer wraps respHeader in ResponseHeader
   - client_test.go: mock responses updated to real api3.json format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: alierenbaysal <alierenbaysal@openova.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 13:49:58 +04:00
hatiyildiz
7c3ff940ff
fix(ci): update solver_test.go fixtures + expected-bootstrap-deps.yaml for #550
- core/cmd/cert-manager-dynadot-webhook/solver_test.go: fix SetDns2Response →
  SetDnsResponse and ResponseCode:"0" → ResponseCode:0 in test fixtures so
  webhook command tests pass against the corrected dynadot-client JSON parsing
- scripts/expected-bootstrap-deps.yaml: declare bp-cert-manager-dynadot-webhook
  at slot 49b so the bootstrap-kit dependency-graph audit passes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 10:44:18 +02:00
e3mrah
ccc38987c2
fix(tls): bp-cert-manager-dynadot-webhook slot 49b + DNS-01 JSON bug (Closes #550) (#558)
Root cause: bootstrap-kit installs bp-cert-manager-powerdns-webhook (slot 49)
but the letsencrypt-dns01-prod ClusterIssuer wires to the dynadot webhook
(groupName: acme.dynadot.openova.io). Without slot 49b the APIService for
acme.dynadot.openova.io does not exist → cert-manager gets "forbidden" on
every ChallengeRequest → sovereign-wildcard-tls stays in Issuing indefinitely
→ HTTPS gateway has no cert → SSL_ERROR_SYSCALL on the handover URL.

Changes:
- core/pkg/dynadot-client: fix SetDnsResponse JSON key (was SetDns2Response,
  API returns SetDnsResponse); change ResponseCode to json.Number (API returns
  integer 0, not string "0"); update tests to match real API response format
- platform/cert-manager-dynadot-webhook/chart:
  - rbac.yaml: add domain-solver ClusterRole + ClusterRoleBinding so
    cert-manager SA can CREATE on acme.dynadot.openova.io (the "forbidden" fix)
  - values.yaml: add certManager.{namespace,serviceAccountName}, clusterIssuer.*
    and privateKeySecretRefName; add rbac.create comment for domain-solver
  - certificate.yaml: trunc 64 on commonName (was 76 bytes, cert-manager rejects >64)
  - clusterissuer.yaml: new template (skip-render default, enabled via overlay)
  - deployment.yaml: add imagePullSecrets support (required for private GHCR)
  - Chart.yaml: bump to 1.1.0
- clusters/_template/bootstrap-kit:
  - 49b-bp-cert-manager-dynadot-webhook.yaml: new slot (PRE-handover issuer)
  - kustomization.yaml: add 49b entry
- infra/hetzner:
  - variables.tf: add dynadot_managed_domains variable
  - main.tf: pass dynadot_{key,secret,managed_domains} to cloud-init template
  - cloudinit-control-plane.tftpl: write cert-manager/dynadot-api-credentials
    Secret + apply it before Flux reconciles bootstrap-kit
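
The dynadot-client decoding fix can be sketched as below. The envelope struct is a hypothetical reduction to the two fields named in this log (ResponseCode and Status under SetDnsResponse); the real client types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// setDNSEnvelope models only the fields mentioned in the commit.
type setDNSEnvelope struct {
	SetDnsResponse struct {
		// json.Number accepts a JSON number and also a quoted numeric
		// string, so both 0 and "0" decode.
		ResponseCode json.Number `json:"ResponseCode"`
		Status       string      `json:"Status"`
	} `json:"SetDnsResponse"`
}

// responseCode decodes the body and returns the numeric response code.
// A body using a different top-level key leaves the code empty, which
// surfaces as an error instead of a silent zero.
func responseCode(body []byte) (int64, error) {
	var env setDNSEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return 0, err
	}
	return env.SetDnsResponse.ResponseCode.Int64()
}

func main() {
	fmt.Println(responseCode([]byte(`{"SetDnsResponse":{"ResponseCode":0,"Status":"success"}}`)))
	fmt.Println(responseCode([]byte(`{"SetDnsResponse":{"ResponseCode":"0","Status":"success"}}`)))
	fmt.Println(responseCode([]byte(`{"SetDns2Response":{"ResponseCode":0}}`)))
}
```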

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:42:13 +04:00
e3mrah
3a34969a2f
feat(catalyst+pdm): Sovereign self-decommission + post-handover redirect (closes #319) (#451)
Customer-side decommission UI + PDM release endpoints + Catalyst-Zero
redirect to console.<sovereign-fqdn> once handover is finalised.

Anti-duplication map (canonical seams reused, NOT duplicated):
  - catalyst-api wipe.go: existing wipe endpoint already drives PDM
    release + Hetzner purge + tofu destroy + local cleanup. The new
    DecommissionPage POSTs to the same endpoint with an optional
    backup-destination payload.
  - PDM Allocator.Release: child zone delete + parent-zone NS revert
    + allocation row delete already idempotent. The new sovereign-side
    POST /api/v1/release is a thin FQDN-shaped wrapper that splits at
    the first dot and delegates to Allocator.Release.
  - The orphan force-release path adds gates (X-Force-Release-Confirm
    header, 30-day grace, DNS-NXDOMAIN check) on top of the same seam.
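
The "split at the first dot" step of the sovereign-side release wrapper can be sketched as follows. splitFQDN is a hypothetical helper; how the two halves are passed to Allocator.Release is not shown in this commit.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitFQDN splits a Sovereign FQDN at the first dot into the subdomain
// label and the parent pool zone. A trailing root dot is ignored.
func splitFQDN(fqdn string) (string, string, error) {
	sub, zone, ok := strings.Cut(strings.TrimSuffix(fqdn, "."), ".")
	if !ok || sub == "" || zone == "" {
		return "", "", errors.New("fqdn must contain a subdomain and a parent zone")
	}
	return sub, zone, nil
}

func main() {
	fmt.Println(splitFQDN("omantel.openova.io"))
	fmt.Println(splitFQDN("localhost"))
}
```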

Scope contract with #317 (handover finalisation): NOT touching
internal/handler/handover.go. AdoptedAt is a new contract field on
Deployment + store.Record that the redirect helper consumes; future
#317 enhancement will populate it before deletion.

Files:
  core/pool-domain-manager/internal/handler/release.go         (NEW)
  core/pool-domain-manager/internal/handler/release_test.go    (NEW)
  core/pool-domain-manager/internal/handler/handler.go         (route wiring)
  products/catalyst/bootstrap/api/internal/handler/deployments.go     (AdoptedAt field + State()/toRecord/fromRecord)
  products/catalyst/bootstrap/api/internal/handler/deployments_adopted_test.go (NEW)
  products/catalyst/bootstrap/api/internal/store/store.go      (AdoptedAt persistence)
  products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.tsx        (NEW)
  products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.test.tsx   (NEW)
  products/catalyst/bootstrap/ui/src/pages/sovereign/Dashboard.tsx    (Decommission link)
  products/catalyst/bootstrap/ui/src/app/router.tsx            (redirect + decom route)
  docs/omantel-handover-wbs.md                                 (T319 → done)

Tests: 13 new Go test cases + 5 new vitest cases all green. catalyst-
api + PDM full suites pass. Live execution against omantel deferred to
Phase 8 per ticket scope (no Dynadot/Hetzner exec here).

Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
2026-05-01 19:27:18 +04:00
e3mrah
5502d9aa48
feat(dns): cert-manager-dynadot-webhook for DNS-01 wildcard TLS (closes #159) (#291)
Activates the previously-templated `letsencrypt-dns01-prod` ClusterIssuer
in bp-cert-manager by shipping the missing piece — a Go binary that
satisfies cert-manager's external webhook contract
(`webhook.acme.cert-manager.io/v1alpha1`) against the Dynadot api3.json.

Architecture
============

* `core/pkg/dynadot-client/` — canonical Dynadot HTTP client (shared with
  pool-domain-manager and catalyst-dns). Encapsulates the api3.json
  transport, command builders, response decoding, and the safe
  read-modify-write semantics required to never accidentally wipe a
  zone (memory: feedback_dynadot_dns.md). Destructive `set_dns2`
  variant is unexported.
* `core/cmd/cert-manager-dynadot-webhook/` — the cert-manager webhook
  binary. Implements `Solver.Present` via the client's append-only
  `AddRecord` path and `Solver.CleanUp` via the read-modify-write
  `RemoveSubRecord` path. Domain allowlist (`DYNADOT_MANAGED_DOMAINS`)
  rejects challenges for unmanaged apexes BEFORE any Dynadot call.
* `platform/cert-manager-dynadot-webhook/` — Catalyst-authored Helm
  wrapper. Templates Deployment + Service + APIService + serving
  Certificate (CA chain via cert-manager Issuer self-signing) +
  RBAC + ServiceAccount. Mirrors the standard cert-manager external-
  webhook deployment shape.
* `platform/cert-manager/chart/` — flips `dns01.enabled: true` so the
  paired ClusterIssuer activates. The interim http01 issuer remains
  templated as the rollback path.
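
The domain allowlist check can be sketched as a suffix match on label boundaries. This assumes DYNADOT_MANAGED_DOMAINS is a comma-separated list of apex domains, which the commit does not state; both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseManaged splits a comma-separated apex list (format assumed) and
// normalises each entry to lower case without a trailing dot.
func parseManaged(raw string) []string {
	var out []string
	for _, d := range strings.Split(raw, ",") {
		d = strings.ToLower(strings.TrimSuffix(strings.TrimSpace(d), "."))
		if d != "" {
			out = append(out, d)
		}
	}
	return out
}

// isManaged reports whether fqdn equals a managed apex or sits beneath it.
// Matching on "."+apex stops evilopenova.io from passing for openova.io.
func isManaged(fqdn string, managed []string) bool {
	name := strings.ToLower(strings.TrimSuffix(fqdn, "."))
	for _, apex := range managed {
		if name == apex || strings.HasSuffix(name, "."+apex) {
			return true
		}
	}
	return false
}

func main() {
	managed := parseManaged("openova.io, Example.com")
	fmt.Println(isManaged("_acme-challenge.omantel.openova.io.", managed))
	fmt.Println(isManaged("evilopenova.io", managed))
}
```

Running the check before any API call means an unmanaged challenge never reaches Dynadot.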

Test results
============

  core/pkg/dynadot-client          — 7 tests PASS  (race-clean)
  core/cmd/cert-manager-dynadot-... — 9 tests PASS  (race-clean)

Test coverage includes a Present/CleanUp round-trip against an
httptest fixture that models Dynadot's zone state, an explicit
unmanaged-domain rejection, a regression preserving a pre-existing
CNAME across the DNS-01 round-trip (the zone-wipe defence), and a
typed-error propagation test that surfaces `ErrInvalidToken` to
cert-manager so the controller will retry.

Helm template smoke render
==========================

`helm template` against the new chart with default values yields 12
resources / 424 lines (APIService, Certificate, ClusterRoleBinding,
Deployment, Issuer, Role, RoleBinding, Service, ServiceAccount). The
modified bp-cert-manager chart still renders both ClusterIssuers
(`letsencrypt-dns01-prod` + `letsencrypt-http01-prod`) with default
values; flipping `certManager.issuers.dns01.enabled=false` is the
clean rollback.

Smoke command (post-deploy)
===========================

  kubectl get apiservices.apiregistration.k8s.io \
    v1alpha1.acme.dynadot.openova.io
  # Issue a *.<sovereign>.<pool> wildcard cert and watch the
  # Order/Challenge progress through cert-manager.

CI
==

`.github/workflows/build-cert-manager-dynadot-webhook.yaml` mirrors the
pool-domain-manager-build pattern (cosign keyless signing, SBOM
attestation, GHCR push at `ghcr.io/openova-io/openova/cert-manager-
dynadot-webhook:<sha>`). Triggered by changes to either the binary or
the shared dynadot-client package.

Closes #159

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:37:47 +04:00
hatiyildiz
20f5dca902
feat(wizard): #169 — StepDomain three-mode (pool / byo-manual / byo-api)
Closes openova#169.

Wizard UI:
- New StepDomain.tsx with three radio modes (pool / BYO manual NS / BYO
  registrar API). Pool flow unchanged from #163. BYO-manual surfaces the
  three OpenOva nameservers (ns1-3.openova.io) verbatim with copy buttons.
  BYO-api adds a registrar dropdown (Cloudflare, Namecheap, GoDaddy, OVH,
  Dynadot) + token field + Validate button — read-only validation hits
  /api/v1/registrar/{r}/validate before Next is enabled.
- StepOrg trimmed to org-only fields (domain capture moved to StepDomain).
- WizardPage + WizardLayout add the new "Domain" step (now 7 steps total).

Wizard store:
- DomainMode expanded to 'pool' | 'byo-manual' | 'byo-api' with legacy
  'byo' coerced to 'byo-manual' on rehydrate.
- New fields: registrarType (RegistrarType | null), registrarToken,
  registrarTokenValidated.
- partialize() strips registrarToken + registrarTokenValidated from
  localStorage (credential hygiene per docs/INVIOLABLE-PRINCIPLES.md #10).
- setSovereignDomainMode cascades a clean reset of irrelevant fields.

PDM (core/pool-domain-manager):
- New endpoint POST /api/v1/registrar/{registrar}/validate — read-only
  twin of /set-ns. Calls adapter.ValidateToken; never flips NS records.
  Maps registrar errors to canonical HTTP statuses (401/403/429/502).
  Token never enters a logged struct.

catalyst-api (products/catalyst/bootstrap/api):
- New handler/registrar.go — thin proxy that forwards
  /api/v1/registrar/{r}/{validate|set-ns} to PDM's matching endpoint,
  reading the body once and streaming PDM's response status + body
  verbatim so the wizard's error-mapping vocabulary stays consistent.

Tests:
- StepDomain.test.tsx — 18 vitest cases covering all three modes,
  mode-switch field cleanup, validate fetch happy/error paths, token
  invalidation on edit.
- store.test.ts — wizard-store mutations + persist hygiene.
- StepSuccess.test.tsx — fixture updated 'byo' -> 'byo-manual'.
- registrar_test.go (PDM) — 7 new test cases for /validate covering
  happy, invalid-token, domain-not-in-account, unsupported-registrar,
  missing-fields, bad-JSON, response-doesnt-leak-token.

103 vitest cases pass. Go tests pass for both PDM and catalyst-api.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 09:01:07 +02:00
hatiyildiz
a6fb7410f4 feat(pdm): per-Sovereign PowerDNS zones for #168
Refactor pool-domain-manager to own per-Sovereign zones in PowerDNS,
replacing the previous Dynadot-set_dns2 record-write flow.

Phase 1 — internal/pdns: REST client for PowerDNS Authoritative API
  - CreateZone / DeleteZone / EnsureZone / ZoneExists
  - PatchRRSets (atomic batch RRset writes)
  - AddARecord / AddNSDelegation / RemoveNSDelegation
  - EnableDNSSEC: PUT dnssec flag, generate KSK+ZSK (algorithm 13
    ECDSAP256SHA256 per docs/PLATFORM-POWERDNS.md), POST rectify
  - retry-once-on-5xx with exponential backoff (250ms, 1s)
  - X-API-Key header from K8s Secret, never logged
  - 22 unit tests covering every method against httptest mock

Phase 2 — allocator: DNSWriter interface + per-Sovereign lifecycle
  - /reserve: insert pdm-pg row + create child zone with apex NS
    RRset + add NS delegation into parent + enable DNSSEC on child
  - /commit: write the canonical 6-record set (apex, *, console,
    api, gitea, harbor) into child zone, TTL 300, atomic PATCH
  - /release: drop child zone (DNSSEC keys retire) + remove parent
    NS delegation, idempotent on 404
  - sweeper tears down DNS for expired reservations before deleting
    pdm-pg rows
  - rollback path on Reserve failure preserves operator UX
  - allocator_test.go: fake DNSWriter for state-machine assertions
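
For illustration, the six-record set written by /commit could be
assembled like this. The Record type and canonicalRecords function are
hypothetical, and pointing every name at one IPv4 address is an
assumption; only the six labels, the TTL of 300 and the
<subdomain>.<poolDomain> zone naming come from this commit.

```go
package main

import "fmt"

// Record is a hypothetical flattened view of one RRset entry.
type Record struct {
	Name    string
	Type    string
	Content string
	TTL     int
}

// canonicalRecords builds the six names listed for /commit inside the child
// zone <subdomain>.<poolDomain>. A single shared A-record target is assumed.
func canonicalRecords(subdomain, poolDomain, ip string) []Record {
	zone := subdomain + "." + poolDomain + "." // canonical name with trailing dot
	labels := []string{"", "*", "console", "api", "gitea", "harbor"}
	out := make([]Record, 0, len(labels))
	for _, l := range labels {
		name := zone // the empty label is the zone apex
		if l != "" {
			name = l + "." + zone
		}
		out = append(out, Record{Name: name, Type: "A", Content: ip, TTL: 300})
	}
	return out
}

func main() {
	// 203.0.113.10 is a documentation-range placeholder address.
	for _, r := range canonicalRecords("acme", "omani.works", "203.0.113.10") {
		fmt.Printf("%-28s %d IN %s %s\n", r.Name, r.TTL, r.Type, r.Content)
	}
}
```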

Phase 3 — startup parent-zone bootstrap
  - BootstrapParentZones runs at PDM startup before HTTP serves
  - EnsureZone for every entry in DYNADOT_MANAGED_DOMAINS
  - DNSSEC enabled on each parent zone (idempotent)
  - PDM exits non-zero if bootstrap fails

Phase 4 — schema unchanged
  - child zone name derived as <subdomain>.<poolDomain>, no new column
  - existing pool_allocations table works as-is

Phase 5 — dynadot package trimmed
  - removed AddSovereignRecords / DeleteSubdomainRecords / AddRecord /
    getZone / writeZone (Dynadot DNS write code)
  - kept IsManagedDomain / ManagedDomains / ResetManagedDomains /
    ErrUnmanagedDomain (config-resolution helpers)
  - registrar adapter at internal/registrar/dynadot/ untouched (handles
    BYO Flow B NS-delegation via #170)

Phase 6 — env-var contract
  PDM_PDNS_BASE_URL, PDM_PDNS_API_KEY, PDM_PDNS_SERVER_ID, PDM_NAMESERVERS
  all runtime-configurable per docs/INVIOLABLE-PRINCIPLES.md #4.

Quality bar (all met):
  - DNSSEC enabled on every child zone (mandatory per spec)
  - parent NS delegation TTL 3600, child A-record TTL 300
  - retry-once-on-5xx with exponential backoff in pdns client
  - all credentials flow from env vars sourced from K8s Secrets
  - no hardcoded URLs, regions, or NS endpoints

Closes openova#168 (DNS-side; private-repo manifest update lands separately).
2026-04-29 08:36:45 +02:00
hatiyildiz
567d7e1f60 feat(pdm): registrar adapters for Cloudflare, Namecheap, GoDaddy, OVH, Dynadot (#170)
Adds the BYO Flow B (#166) registrar-flip seam: PDM now exposes a
provider-agnostic Registrar interface and 5 adapter implementations
plus a new HTTP endpoint that dispatches to them.

Wire surface
- POST /api/v1/registrar/{registrar}/set-ns
  Body: {"domain":"...","token":"...","nameservers":["..."]}
  Reply: {"success":true,"registrar":"...","domain":"...",
          "nameservers":["..."],"propagation":"..."}
- GET /healthz now lists the wired-in registrar names.

Interface (internal/registrar/registrar.go)
- Name(), ValidateToken, SetNameservers, GetNameservers
- Typed errors: ErrInvalidToken, ErrRateLimited, ErrDomainNotInAccount,
  ErrAPIUnavailable, ErrUnsupportedRegistrar
- Registry map[string]Registrar with Lookup + Names helpers

Adapters
- internal/registrar/cloudflare/  — API v4 with Bearer token; verifies
  via /user/tokens/verify, looks up zone by name, PATCHes name_servers
- internal/registrar/namecheap/   — XML API; ApiUser+ApiKey+UserName+
  ClientIp auth; getBalances probe + getList domain check; setCustom
  for write. IP-whitelisting requirement documented in source comments
- internal/registrar/godaddy/     — v1 API with sso-key auth;
  GET /v1/domains list + PATCH /v1/domains/{d} with nameServers body
- internal/registrar/ovh/         — request signing (HMAC-SHA1 over
  appSecret+consumerKey+method+url+body+timestamp); GET /domain probe;
  POST /domain/{d}/nameServers/update for write; GET .../nameServer[/{id}]
  for read
- internal/registrar/dynadot/     — api3.json with key+secret as colon-
  separated token; uses set_ns + domain_info commands. Distinct from
  the existing internal/dynadot package which is the DNS-record writer
  for OpenOva-managed pool domains (different concern: pool DNS vs.
  customer-domain registrar NS-flip)

Token hygiene (per docs/INVIOLABLE-PRINCIPLES.md #10)
- Tokens never persisted: in-memory only for the duration of the call
- Never logged: handler uses classifyOutcome to render redacted
  outcome labels, never the raw error message or token
- Never echoed in responses
- TestSetNSResponseDoesNotEchoToken + TestSetNSHappy assert no token
  bytes appear in JSON body or zerolog/slog output

Tests
- 74 new unit tests (httptest server per adapter):
  cloudflare 11, dynadot 11, godaddy 11, namecheap 13, ovh 12,
  handler 14, registrar interface 2
- Each adapter covers: happy path, bad-token, rate-limited (429),
  bad-domain (404 / not-in-account), empty-NS guard, name+default
- OVH signature math verified deterministically via injected nowFn

Acceptance (issue #170)
- All 5 adapters pass their unit tests
- PDM /api/v1/registrar/{r}/set-ns endpoint live
- Wired into cmd/pdm/main.go: every adapter registered at startup

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), each adapter's
BaseURL is constructor-default + struct-overridable, so tests inject
httptest endpoints without environment shenanigans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 07:46:30 +02:00
hatiyildiz
585b046f5d feat(pdm): pool-domain-manager service skeleton (Phase 1 of #163)
Build a new Go service core/pool-domain-manager that becomes the SOLE
authority for OpenOva-pool subdomain allocation across the fleet.

Why this exists: today products/catalyst/bootstrap/api/internal/handler/
subdomains.go does naive net.LookupHost() to decide whether a candidate
subdomain is taken. Dynadot's wildcard parking record at the apex of
omani.works (and any future pool domain) makes EVERY subdomain resolve
to 185.53.179.128, so the check rejects everything. DNS is the wrong
source of truth for an OpenOva-managed pool — the central control plane
must own the allocation table.

What this commit adds (no integration with catalyst-api yet — that lands
in a follow-up commit):

  core/pool-domain-manager/
    cmd/pdm/main.go                     chi router, healthz, sweeper boot
    api/openapi.yaml                     wire contract for every endpoint
    Containerfile                        alpine final stage, UID 65534
    internal/store/                      pgx + CNPG; pool_allocations table
      migrations.sql                       idempotent CREATE TABLE schema
      store.go                             Reserve/Get/Commit/Release/List
      store_test.go                        integration tests (PDM_TEST_DSN)
    internal/dynadot/                    moved + extended; SOLE Dynadot caller
      dynadot.go                           AddRecord, AddSovereignRecords,
                                           DeleteSubdomainRecords (read-modify-
                                           write to honour feedback_dynadot_dns)
      dynadot_test.go                      managed-domain resolution tests
    internal/reserved/                   centralised reserved-name list
      reserved.go                          IsReserved/All; pulled out of
                                           catalyst-api's subdomains.go
    internal/handler/                    HTTP surface
      handler.go                           /api/v1/pool/{domain}/{check,reserve,
                                           commit,release,list}, /healthz,
                                           /api/v1/reserved
    internal/allocator/                  state machine + sweeper goroutine

Architecture choices and how they map to docs/INVIOLABLE-PRINCIPLES.md:

  - Principle #4 (never hardcode): every value (PORT, PDM_DATABASE_URL,
    DYNADOT_MANAGED_DOMAINS, PDM_RESERVATION_TTL, PDM_SWEEPER_INTERVAL)
    flows from env vars; the K8s ExternalSecret will populate them at
    deploy time. The reserved-subdomain list lives in ONE place
    (internal/reserved); catalyst-api will not duplicate it.

  - Principle #2 (no quality compromise): the state machine commits the
    DB row before the Dynadot side-effect, so a crash between the two
    leaves the system in a recoverable state (operator runs Release).
    The reservation_token in the row protects against stale-tab commit
    races. UPSERT semantics + a CHECK constraint mean two operators
    racing /reserve get a clean 23505 (unique_violation) → HTTP 409.

  - Principle #3 (follow architecture): PDM is a ClusterIP service in
    openova-system — it is not a Crossplane provider, not a Flux
    HelmRelease, not bespoke OpenTofu state. catalyst-api speaks to it
    via plain HTTP. The Crossplane Composition that wraps PDM as a
    declarative MR (XDynadotPoolAllocation) lands in a follow-up phase.

The DNS-wildcard problem the issue describes is fixed STRUCTURALLY here:
PDM never calls net.LookupHost. The /check path is a single SELECT
against pool_allocations. omani.works's wildcard A record at the apex
becomes architecturally irrelevant.

Tests exercised in this commit:
  - internal/reserved: full unit coverage (case-insensitive, sorted, set
    membership)
  - internal/dynadot: managed-domain runtime resolution (env-var,
    legacy single-domain fallback, built-in defaults, list parsing)
  - internal/store: integration suite gated on PDM_TEST_DSN env var,
    covers reserve happy-path, reserve race (ErrConflict), TTL expiry
    frees, commit happy-path, commit token mismatch, release removes
    row, sweeper deletes expired rows

Closes phase 1 of #163. Phase 2 (catalyst-api wiring), Phase 3 (CI +
manifests), Phase 4 (Crossplane composition), Phase 6 (deploy +
verification curl) follow in separate commits.

Refs: #163
2026-04-29 06:37:38 +02:00
Emrah Baysal
9519c1ef00 merge: Group L testing (Playwright e2e smoke tests, Hetzner provisioning test scaffold gated on HETZNER_TEST_TOKEN secret, integration tests for bootstrap installer + Dynadot + voucher) 2026-04-28 14:05:59 +02:00
hatiyildiz
7edf63ca7e docs(franchise),test(billing): voucher CRD propagation invariant
#118 verifies that the voucher shape on a franchised Sovereign is
identical to Catalyst-Zero. Two artefacts:

1. New §"Voucher shape propagates automatically" in
   docs/FRANCHISE-MODEL.md explaining WHY there is no propagation
   problem to solve: vouchers are not a CRD. They are rows in the
   per-Sovereign billing service's Postgres database, and every
   Sovereign runs the same SHA-pinned core/services/billing image.
   Same image → same migration → same schema → same handlers → same
   shape. The doc lists which file owns each part of the shape and
   includes a 4-step curl smoke test to run on any Sovereign at
   first-provisioning to confirm the invariant holds.

2. New core/services/billing/handlers/vouchers_test.go covering the
   public POST /billing/vouchers/redeem-preview endpoint added in
   #117. Four cases:
   - 404 on unknown / soft-deleted code (no tombstone leak)
   - 200 on a valid live code, asserting the public shape excludes
     times_redeemed and max_redemptions (defence-in-depth against
     enumeration)
   - 410 Gone on a code that exists but has hit its cap, with the
     credit/description still in the response so the landing page can
     show "campaign ended"
   - 400 on whitespace-only input
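
The four cases can be read as a small decision function. The sketch
below is a stand-in, not the handler in vouchers.go: the Voucher struct,
the in-memory map and the "0 means no cap" convention are assumptions
made for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Voucher is a hypothetical minimal view of a promo-code row.
type Voucher struct {
	Active         bool
	Deleted        bool // soft-delete tombstone
	TimesRedeemed  int
	MaxRedemptions int // 0 means "no cap" in this sketch
}

// previewStatus returns the status a redeem-preview call would answer with.
func previewStatus(code string, store map[string]Voucher) int {
	code = strings.TrimSpace(code)
	if code == "" {
		return http.StatusBadRequest
	}
	v, ok := store[code]
	if !ok || v.Deleted {
		// Unknown and soft-deleted codes are indistinguishable to the caller.
		return http.StatusNotFound
	}
	capped := v.MaxRedemptions > 0 && v.TimesRedeemed >= v.MaxRedemptions
	if !v.Active || capped {
		return http.StatusGone
	}
	return http.StatusOK
}

func main() {
	store := map[string]Voucher{
		"LIVE":   {Active: true, MaxRedemptions: 10, TimesRedeemed: 3},
		"CAPPED": {Active: true, MaxRedemptions: 5, TimesRedeemed: 5},
		"GONE":   {Active: true, Deleted: true},
	}
	for _, c := range []string{"LIVE", "CAPPED", "GONE", "NOPE", "   "} {
		fmt.Printf("%q -> %d\n", c, previewStatus(c, store))
	}
}
```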

The tests run on every CI build of the billing service, on every
Sovereign that builds from this repo. If a future change drifts the
preview endpoint's shape, the tests fail before the regression can
ship.

Also tidies vouchers.go imports (removed two unused stdlib imports
that were placeholders).

Closes #118.
2026-04-28 13:59:31 +02:00
hatiyildiz
9404632830 feat(marketplace): public /redeem?code=... voucher landing flow
#116 adds the public landing page that the franchise model relies on
to convert voucher distribution into Catalyst signups (per
docs/FRANCHISE-MODEL.md §3, "redemption flow end-to-end").

New page core/marketplace/src/pages/redeem.astro:

- Reads ?code=... from the URL (or accepts manual entry if absent).
- POSTs to /api/billing/vouchers/redeem-preview (added in #117) — does
  NOT consume the voucher, just validates it.
- Renders one of four states:
  * Valid (200): "X OMR credit" + description + "Sign up to redeem"
    CTA. The CTA stashes the code in localStorage under
    `sme-pending-voucher` and routes to /plans (the start of the
    existing signup wizard).
  * Campaign ended (410): inactive or capped — shows the credit that
    was offered + a path to sign up without a voucher.
  * Not valid (404): never existed or soft-deleted (#91 tombstone-leak
    protection — the two are indistinguishable on the public surface).
  * No code present: a manual input form so a redeemer who landed on
    /redeem without a query string can paste their code.

CheckoutStep wiring (core/marketplace/src/components/CheckoutStep.svelte):

- The `promoCode` $state now hydrates from `sme-pending-voucher` so a
  redeemer arriving via /redeem reaches /checkout with the field
  pre-filled. They can still edit or clear it.
- After submitting to /billing/checkout, we clear the localStorage
  stash. This prevents a second signup on the same browser from
  silently carrying over the previous voucher.

The actual redemption (insert into promo_redemptions, increment
times_redeemed, credit_ledger entry) still happens transactionally
inside POST /billing/checkout — splitting it out would risk a
partially-redeemed code with no Order to show for it (the same
class of bug #91 fixed).

Per docs/INVIOLABLE-PRINCIPLES.md §1: target-state shape, not MVP.
The page handles all four observable backend states; manual-entry
fallback is included; the "campaign ended" path keeps the user moving
into signup rather than dead-ending.

Closes #116.
2026-04-28 13:56:54 +02:00
hatiyildiz
12387a4a74 feat(billing): /billing/vouchers/{issue,list,revoke,redeem-preview} surface
#117 adds a franchise-aligned URL surface for the existing PromoCode
voucher implementation, plus one new endpoint (redeem-preview) for the
public landing flow described in docs/FRANCHISE-MODEL.md §3.

The orchestrator's hint was right — the issue/list/revoke handlers
already exist (AdminUpsertPromo / AdminListPromos / AdminDeletePromo
on the legacy /billing/admin/promos surface). This commit:

1. Adds new endpoint handlers in core/services/billing/handlers/vouchers.go:
   - POST   /billing/vouchers/issue          (superadmin or sovereign-admin)
   - GET    /billing/vouchers/list           (superadmin or sovereign-admin)
   - DELETE /billing/vouchers/revoke/{code}  (superadmin or sovereign-admin)
   - POST   /billing/vouchers/redeem-preview (unauthenticated; public)

   The first three reuse the existing store-layer methods. The last is
   new — it validates a code without consuming it, returning a safe
   shape (no times_redeemed, no max_redemptions exposure) so an
   attacker scraping the public endpoint cannot enumerate cap status.

2. Distinguishes 404 (code never existed or soft-deleted — same
   tombstone-leak protection as #91) from 410 Gone (code exists but is
   inactive or capped). The 410 body still includes the credit and
   description so the landing page can show "this campaign has ended".

3. Keeps the legacy /billing/admin/promos endpoints in place — the
   existing admin UI continues to work without any breaking change.
   New code should target /billing/vouchers/...

4. Updates docs/FRANCHISE-MODEL.md to point to the new URL surface.

The actual REDEMPTION still happens transactionally inside POST
/billing/checkout via the `promo_code` field — that path locks the
promo row, inserts the promo_redemptions edge, increments
times_redeemed, and adds the credit_ledger entry in one transaction.
Splitting it into a separate /redeem endpoint would break that
atomicity, so we deliberately do not add one. The public redeem flow
is preview → signup → checkout-with-promo_code.

Closes #117.
2026-04-28 13:54:19 +02:00
hatiyildiz
3e956b7d81 test: voucher issuance integration test — real Postgres (#147)
Closes the Group L "integration test — voucher issuance via API — issue
→ redeem → Org created path" ticket.

Per docs/INVIOLABLE-PRINCIPLES.md principle #2 (no mocks where the test
would otherwise verify real behavior), this test runs against a real
PostgreSQL — not sqlmock. The voucher mechanic lives in
store.RedeemPromoCode which runs a transaction with SELECT FOR UPDATE on
promo_codes, COUNT lookup on promo_redemptions, and inserts into
credit_ledger. Mocking SQL strings doesn't verify whether the
transactional invariants actually hold under concurrent contention; this
codebase has been bitten by exactly that gap before (#93: counter
incremented before order was committed).

The test is gated on BILLING_TEST_PG_URL — when unset, it skips (NOT
mocks). CI populates it via the new postgres service container in
.github/workflows/test-billing-integration.yaml.

Each test gets its own Postgres schema (via CREATE SCHEMA + libpq's
options=-c search_path) so parallel runs don't cross-contaminate, and so
goroutine concurrency tests reliably hit the same schema regardless of
which pooled connection they pick up.

Coverage:
  - Issue → Redeem → Credit applied (the canonical happy path)
  - Per-customer double-redemption blocked
  - Redemption cap enforced under concurrency (12 goroutines fighting
    for a 5-cap voucher → exactly 5 successful redemptions, no more)
  - Soft-deleted codes rejected as "not found" (no tombstone leak per #91)
  - Inactive codes rejected with distinct "not active" error
  - Two different customers can each redeem the same voucher
  - Org-creation prerequisites: customer.tenant_id non-empty, balance > 0
    (these are the inputs the downstream tenant.created event consumer
    feeds into CreateTenant — covered by tenant-service consumer_test.go)
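
The shape of the concurrency case can be sketched without a database.
The example below replaces the SELECT FOR UPDATE row lock with a mutex,
so it shows only the assertion pattern (12 workers, cap of 5), not the
transactional behaviour the real test verifies against Postgres. All
names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrCapReached = errors.New("redemption cap reached")

// voucher is an in-memory stand-in for a promo_codes row; the mutex plays
// the role of the row lock.
type voucher struct {
	mu       sync.Mutex
	max      int
	redeemed int
}

// redeem checks the cap and increments the counter under one lock.
func (v *voucher) redeem() error {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.redeemed >= v.max {
		return ErrCapReached
	}
	v.redeemed++
	return nil
}

// race launches workers goroutines against one voucher and counts outcomes.
func race(workers, limit int) (ok, rejected int) {
	v := &voucher{max: limit}
	results := make(chan error, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- v.redeem()
		}()
	}
	wg.Wait()
	close(results)
	for err := range results {
		if err == nil {
			ok++
		} else {
			rejected++
		}
	}
	return ok, rejected
}

func main() {
	fmt.Println(race(12, 5)) // prints "5 7"
}
```

Checking the cap and incrementing in separate critical sections would
let more than five goroutines succeed, which is the failure the real
test guards against.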

CI workflow added: .github/workflows/test-billing-integration.yaml runs
the tests against a postgres:16-alpine service container with -race.

Refs #147

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:53:43 +02:00
hatiyildiz
fabedd42c1 feat(admin,billing): per-Sovereign voucher issuance for sovereign-admin
#115 extends the existing PromoCode (voucher) admin surface so a
sovereign-admin role can issue, list, and revoke vouchers on a
franchised Sovereign. No new endpoints, no new schema, no new CRD —
all the changes are role-gating widenings on the existing surface.

Backend (core/services/billing/handlers/handlers.go):

- New `requireVoucherIssuer` helper accepts both `superadmin` and
  `sovereign-admin`. Used by AdminListPromos, AdminUpsertPromo, and
  AdminDeletePromo only. All other admin endpoints (Stripe settings,
  revenue, orders) keep the existing `requireAdmin` (superadmin-only).
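
A simplified sketch of the two gates. The real helpers in handlers.go
presumably work on the request context; here they take the role string
directly and return a status code, which is an assumption made to keep
the example self-contained.

```go
package main

import (
	"fmt"
	"net/http"
)

// requireAdmin admits superadmin only (Stripe settings, revenue, orders).
func requireAdmin(role string) int {
	if role == "superadmin" {
		return http.StatusOK
	}
	return http.StatusForbidden
}

// requireVoucherIssuer admits superadmin and sovereign-admin; it guards
// only the promo list, upsert and delete handlers.
func requireVoucherIssuer(role string) int {
	switch role {
	case "superadmin", "sovereign-admin":
		return http.StatusOK
	}
	return http.StatusForbidden
}

func main() {
	for _, role := range []string{"superadmin", "sovereign-admin", "member"} {
		fmt.Printf("%-16s admin=%d voucher=%d\n",
			role, requireAdmin(role), requireVoucherIssuer(role))
	}
}
```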

UI (core/admin/src/components/AdminShell.svelte + BillingPage.svelte):

- AdminShell now accepts both roles. Sidebar nav is filtered by role:
  superadmin sees Revenue / Catalog / Tenants / Orders / Billing;
  sovereign-admin sees only Billing. Filtering is via a
  `superadminOnly` flag on each nav item (defence-in-depth: even if
  a sovereign-admin guesses a URL, the backend's requireAdmin will
  return 403).

- BillingPage hides the Stripe Configuration section for
  sovereign-admin (it would 403 from GET /billing/admin/settings
  anyway). The Vouchers (Promo Codes) section is shown to both roles
  with a small label tweak ("Issued vouchers are scoped to this
  Sovereign" for sovereign-admin).

Per docs/INVIOLABLE-PRINCIPLES.md §1 (target-state shape, no MVP)
and §3 (follow documented architecture exactly) — this matches the
FRANCHISE-MODEL.md design where "every franchised Sovereign runs the
same admin app" with role-based gating.

Closes #115.
2026-04-28 13:52:19 +02:00
hatiyildiz
7646840ffe feat(consolidation): move 8 SME backend services + shared module to public repo
Per docs/PROVISIONING-PLAN.md and tickets [B] sme-backend group. Migrates the 8 Go backend services from openova-private/services/ to openova/core/services/, plus the shared module they all depend on, plus the services-build CI workflow.

What moved:
- services/auth → core/services/auth (Go HTTP service for SME marketplace authentication)
- services/billing → core/services/billing (Go HTTP service for billing + voucher backend)
- services/catalog → core/services/catalog (Go HTTP service for App catalog)
- services/domain → core/services/domain (Go HTTP service for tenant domain mapping)
- services/gateway → core/services/gateway (Go HTTP gateway with rate limiting)
- services/notification → core/services/notification (Go HTTP service with email templates)
- services/provisioning → core/services/provisioning (Go HTTP service that commits tenant Application manifests via Gitea/GitHub API)
- services/tenant → core/services/tenant (Go HTTP service for tenant lifecycle)
- services/shared → core/services/shared (shared Go module: db, events, health, middleware, respond)
- 9 go.mod files updated: module github.com/openova-io/openova-private/services/<X> → github.com/openova-io/openova/core/services/<X>
- 9 go.sum files and the import paths updated to match
- replace directives updated: openova-private/services/shared → openova/core/services/shared
- sme-services-build.yaml workflow → services-build.yaml in .github/workflows/, paths/context/image-base/deploy paths all repointed at core/services + ghcr.io/openova-io/openova/services-* + products/catalyst/chart/templates/sme-services
- All 8 manifests in products/catalyst/chart/templates/sme-services/ updated: image refs ghcr.io/openova-io/openova-private/sme-{X} → ghcr.io/openova-io/openova/services-{X}
- provisioning.yaml GITHUB_REPO env var: "openova-private" → "openova"

Closes [B] sme-backend (10 tickets).

After this commit, all 14 user-facing + backend Catalyst-Zero modules build from this public repo:
- 4 UIs: console, admin, marketplace, catalyst-ui
- 2 backends: marketplace-api, catalyst-api
- 8 SME services: auth, billing, catalog, domain, gateway, notification, provisioning, tenant
- 1 shared Go module

Note: 1 line in core/services/provisioning/main.go retains a literal default of "openova-private" for the GITHUB_REPO fallback when env var is unset; the K8s manifest sets GITHUB_REPO=openova explicitly so this path is never exercised in the deployed runtime, and the in-code default will be cleaned up in a follow-up.
2026-04-28 12:30:32 +02:00
hatiyildiz
3c2f7e4cda feat(consolidation): Phase 1 — move Catalyst-Zero apps + CI + manifests into public monorepo
Per docs/PROVISIONING-PLAN.md Phase 1. Catalyst-Zero (the running deployment on Contabo k3s, namespaces catalyst/sme/marketplace/website) source code now lives in this public repo. Cutover to public-repo CI builds happens in Phase 2.

What moved (from openova-private → openova):
- apps/console/ → core/console/ (Astro+Svelte UI)
- apps/admin/ → core/admin/ (Astro+Svelte UI, includes canonical voucher/billing/tenants admin surface)
- apps/marketplace/ → core/marketplace/ (Astro+Svelte UI, 5-step Plan→Apps→Addons→Checkout→Review flow)
- website/marketplace-api/ → core/marketplace-api/ (Go backend with handlers/, provisioner/, store/)
- clusters/contabo-mkt/apps/catalyst/ → products/catalyst/chart/templates/ (catalyst-{ui,api} K8s manifests)
- clusters/contabo-mkt/apps/sme/services/ → products/catalyst/chart/templates/sme-services/ (15 manifests)
- clusters/contabo-mkt/apps/marketplace-api/ → products/catalyst/chart/templates/marketplace-api/
- 5 CI workflows (catalyst-build, marketplace-api-build, sme-{admin,console,marketplace}-build) → .github/workflows/, renamed to drop "sme-" prefix

Image refs updated:
- ghcr.io/openova-io/openova-private/catalyst-{ui,api} → ghcr.io/openova-io/openova/catalyst-{ui,api}
- ghcr.io/openova-io/openova-private/sme-{admin,console,marketplace} → ghcr.io/openova-io/openova/{admin,console,marketplace}
- ghcr.io/openova-io/openova-private/marketplace-api → ghcr.io/openova-io/openova/marketplace-api

Workflow path updates:
- paths: 'apps/{X}/**' → 'core/{X}/**'
- context: apps/{X} → core/{X}
- deploy paths: clusters/contabo-mkt/apps/{X}/.../{X}.yaml → products/catalyst/chart/templates/.../{X}.yaml
- deploy commit: git add clusters/ → git add products/

Deferred to follow-up phase:
- 8 legacy SME backend services (auth, billing, catalog, domain, gateway, notification, provisioning, tenant) keep their ghcr.io/openova-io/openova-private/sme-* image refs because their source code in openova-private/services/ has not yet been migrated to public repo. Tracked via TODO in core/README.md migration history.
- sme-services-build.yaml NOT migrated (matches deferred services).

Documentation updates:
- core/README.md rewritten to describe what's actually in this directory now (4 deployed modules, not the old Go-monorepo placeholder design)
- products/catalyst/README.md created with migration status table
- products/catalyst/chart/Chart.yaml created (umbrella bp-catalyst-platform chart)
- docs/IMPLEMENTATION-STATUS.md §1 + §2.1 + §6 updated: console/admin/marketplace/marketplace-api/catalyst-{ui,api} all flipped from 📐 to 🚧 (deployed but not yet wired to unified Catalyst contract); openova Sovereign description rewritten to make Catalyst-Zero status explicit; omantel target updated to omantel.omani.works on Hetzner.

Verification:
- 99 source files copied (verified via git ls-files count)
- All image refs updated except the 8 deferred legacy SME backend services (verified via grep openova-private)
- Workflow naming reflects unified Catalyst (no more "sme-" prefix)

Phase 2 next: trigger public-repo CI builds, GHCR images published under openova/ namespace, Flux source on Catalyst-Zero repointed to this repo, rolling update of Contabo pods to new image SHAs. Catalyst-Zero becomes self-built from the public repo.
2026-04-28 12:08:09 +02:00
hatiyildiz
b00ec8f4df docs(pass-30): core/README catalyst-provisioner scope confusion + neo4j clean
core/README.md "User journeys" table had: "Sovereign bootstrap | Phase 0
done by catalyst-provisioner; this codebase contains the OpenTofu modules
under apps/provisioning/opentofu/..." — conflating two distinct services.

Per SOVEREIGN-PROVISIONING.md §2, catalyst-provisioner is a separate
Blueprint (bp-catalyst-provisioner) — explicitly "not part of any
Sovereign at runtime" — and lives outside core/. The core/apps/provisioning/
service is for runtime Application provisioning (validate configSchema,
compose manifests, commit to Environment's Gitea repo), an entirely
different concern from Phase 0 Sovereign bootstrap. Rewritten to call out
the separation.

platform/neo4j/README.md: clean.

Recurring shorthand note: ws.<env>.> JetStream subjects in core/README +
ARCHITECTURE (5 instances) treated as documented shorthand — precise form
per NAMING §11.2 is ws.{org}-{env_type}.>. Tightening deferred.

Validation log Pass 30 entry added.
2026-04-27 22:32:22 +02:00
hatiyildiz
27325edb32 docs(iter-2): glossary alignment — rename workspace-controller, fix definitions
GLOSSARY.md line-by-line audit. Nine corrections.

1. workspace-controller → environment-controller everywhere. The
   controller reconciles the Environment CRD; "workspace" is banned as
   a Catalyst scope, so it cannot be in a component name either. Fixed
   in: GLOSSARY, ARCHITECTURE, PLATFORM-TECH-STACK, NAMING-CONVENTION,
   SOVEREIGN-PROVISIONING, IMPLEMENTATION-STATUS, core/README,
   BUSINESS-STRATEGY. Banned-term entry in GLOSSARY now explicitly
   covers component names too.

2. "workspace repos" (per-Environment Gitea repos) → "Environment
   Gitea repos" in GLOSSARY, PLATFORM-TECH-STACK.

3. JWT claim {workspace, org, role} → {environment, org, role} in
   ARCHITECTURE projector diagram.

4. OpenOva definition refined: was "Never used to name a product",
   which contradicted "OpenOva Catalyst", "OpenOva Cortex". Now: brand
   prefix in product names; bare "OpenOva" = the company; bare
   "Catalyst" = the platform.

5. Catalyst definition completed: was missing provisioning, billing,
   gitea, observability — now lists all 14 control-plane components,
   pointing at the table below.

6. Catalyst components table: added `provisioning` (validates
   configSchema, commits to Environment Gitea); reordered to match
   ARCHITECTURE §3 grouping; clarified each component's source-of-truth
   (catalog-svc reads monorepo + Gitea, blueprint-controller watches
   monorepo + Gitea, etc.).

7. Environment definition: refers to NAMING §2.4 for env_type values;
   removed inline list that didn't match canonical ordering. Added
   concrete examples (acme-prod, acme-dev, bankdhofar-uat).

8. Application example: dropped "RocketChat" which appeared nowhere
   else; replaced with generic "running deployment" plus the
   established WordPress / Postgres examples.

9. sovereign-admin description: was "runs Crossplane" — Crossplane is
   platform plumbing not user-facing. Now: "manages the underlying
   clusters via Crossplane (which is platform plumbing, not a
   user-facing surface)".

Banned-term coverage:
- "Workspace" entry now covers BOTH the Catalyst scope AND component
  naming (workspace-controller → environment-controller).

Refs #37
2026-04-27 21:06:09 +02:00
hatiyildiz
2c4902b409 docs(iter-1): add IMPLEMENTATION-STATUS, fix wrong-org refs, reconcile monorepo
First validation iteration. Three concrete corrections.

1. Add docs/IMPLEMENTATION-STATUS.md as the bridge between target
   architecture and current code state. Status legend (✅ / 🚧 / 📐 / ⏸)
   applied per-component. Catalyst control plane = mostly 📐. Component
   READMEs = 🚧 (README only, no Blueprint manifests yet). products/axon
   = ✅ (only product with real code). core/ = 📐 (just .gitkeep).

2. Status banner added to ARCHITECTURE, SECURITY, SOVEREIGN-PROVISIONING,
   BLUEPRINT-AUTHORING, PERSONAS-AND-JOURNEYS, PLATFORM-TECH-STACK, SRE
   pointing readers at IMPLEMENTATION-STATUS.md before they treat any
   described feature as built. GLOSSARY also references it.

3. Architectural decision (Option A — monorepo canonical):
   - Each platform/<name>/ and products/<name>/ folder is the source of
     ONE Blueprint, published as ghcr.io/openova-io/<name>:<semver> by
     CI fan-out from the monorepo root.
   - BLUEPRINT-AUTHORING.md §1, §2, §13 rewritten to match.
   - README.md "what's in this repo" rewritten to clarify monorepo +
     OCI-fan-out shape; no longer claims every directory is a Blueprint
     in a way that contradicts BLUEPRINT-AUTHORING.
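
   The folder-to-artifact mapping in item 3 can be sketched as below.
   This is a minimal illustration, not the CI implementation: the
   function names `blueprint_ref` and `fan_out` and the version numbers
   are hypothetical, and only the `ghcr.io/openova-io/<name>:<semver>`
   shape and the two folder roots come from this commit message.

   ```python
   from pathlib import PurePosixPath

   # Registry and organisation as written in the commit message.
   REGISTRY = "ghcr.io/openova-io"
   # The two folder families that each hold one Blueprint per subfolder.
   BLUEPRINT_ROOTS = ("platform", "products")


   def blueprint_ref(folder: str, version: str) -> str:
       """Map 'platform/<name>' or 'products/<name>' to an OCI reference."""
       parts = PurePosixPath(folder).parts
       if len(parts) != 2 or parts[0] not in BLUEPRINT_ROOTS:
           raise ValueError(f"not a Blueprint folder: {folder!r}")
       return f"{REGISTRY}/{parts[1]}:{version}"


   def fan_out(versions: dict[str, str]) -> list[str]:
       """Return one reference per Blueprint folder, in a stable order."""
       return [blueprint_ref(folder, v) for folder, v in sorted(versions.items())]
   ```

   For example, `fan_out({"products/axon": "0.1.0"})` would yield a
   single reference for the axon product, while a path such as
   `docs/SRE.md` is rejected because it is not under a Blueprint root.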

Wrong-org fixes (3 places):
   - docs/PERSONAS-AND-JOURNEYS.md:13   github.com/openova → openova-io
   - docs/BLUEPRINT-AUTHORING.md:13     github.com/openova → openova-io
   - docs/BLUEPRINT-AUTHORING.md:404    github.com/openova → openova-io
   - docs/BLUEPRINT-AUTHORING.md ghcr.io/openova/* (3 refs) → openova-io

API group consistency:
   - All references unified to catalyst.openova.io/v1alpha1
     (was mixed v1 / v1alpha1; v1alpha1 is correct since the CRDs are
     design-stage with no implementation).

core/README.md updated to honestly describe the directory tree as
"target structure with .gitkeep placeholders" rather than implying
the apps/console, apps/projector, etc. binaries already exist.
The legacy apps/bootstrap and apps/manager directories are
acknowledged as transitional placeholders that will be removed when
the new apps/ layout is scaffolded.

CLAUDE.md and .claude/project-memory.md updated to put
IMPLEMENTATION-STATUS.md second in the read-first ordering.

Refs #37
2026-04-27 20:43:31 +02:00
hatiyildiz
039a724f31 docs: rewrite repository foundation around Catalyst as the platform
Repositions the public repo's identity. OpenOva is the company; Catalyst
is the platform. Sovereign is a deployed Catalyst. The historical
positioning (OpenOva = platform, Catalyst = bootstrap+IDP+lifecycle
sub-product) is retired. Catalyst now subsumes bootstrap, lifecycle, and
IDP responsibilities into one control plane.

- README.md             Catalyst-first front door. Sovereign concept,
                        repo structure, stack at a glance, cloud
                        provider matrix, getting-started paths
                        (managed via marketplace.openova.io vs
                        self-host via catalyst-provisioner).

- CLAUDE.md             Codebase guide for Claude. Banned-term table,
                        commit conventions (hatiyildiz default for
                        public repo), the no-fourth-surface rule,
                        per-component README rule of thumb.

- .claude/project-memory.md   Reduced to an index + decision log;
                        full architecture moved to docs/. Stack
                        decisions locked (NATS JetStream, OpenBao,
                        SPIFFE/SPIRE, per-Org Keycloak SME / per-
                        Sovereign corporate, Crossplane only IaC,
                        no Terraform/Pulumi user-facing surface).

- core/README.md        Catalyst control-plane Go application. Drops
                        the bootstrap-vs-manager split (both fold under
                        "Catalyst control plane"). Lists each component
                        deployable from this codebase: console,
                        marketplace, admin, projector, catalog-svc,
                        provisioning, workspace-controller, blueprint-
                        controller, billing. CRD list updated:
                        Sovereign / Organization / Environment /
                        Application / Blueprint / EnvironmentPolicy /
                        SecretPolicy / Runbook.

Refs #37
2026-04-27 20:05:58 +02:00
Emrah Baysal
54b1b4bd3d docs: add unified naming convention and align existing docs
- Add docs/NAMING-CONVENTION.md — canonical naming standard for all
  cloud resources, K8s objects, DNS, and tags across all providers.
  Covers dimension taxonomy (provider/region/building-block/environment),
  the Don't-Repeat-the-Parent principle, 4-char DNS location codes with
  full lookup table, multi-tenant scoping via namespace, and migration rules.

- Fix SRE.md: remove primary/DR region labels; clusters are named by
  building block (rtz/dmz/mgt), not failover role. Both regions run
  symmetric rtz clusters; k8gb owns traffic distribution.

- Fix PLATFORM-TECH-STACK.md: update both Mermaid diagrams and region
  table to use Region A / Region B (rtz cluster) language.

- Fix core/README.md: Platform CRD example now references cluster context
  names (hz-fsn-rtz-prod / hz-hel-rtz-prod) instead of primary/standby roles.
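
  A minimal sketch of how such cluster context names could be composed
  and parsed. The field order (provider, region, building block,
  environment) and the hyphen separator are inferred from the two
  example names above and from the dimension taxonomy this commit
  describes; `ClusterName` and `parse_cluster_name` are hypothetical
  helpers, and docs/NAMING-CONVENTION.md remains the authority.

  ```python
  from typing import NamedTuple

  # Building blocks named in this commit message (rtz/dmz/mgt).
  BUILDING_BLOCKS = {"rtz", "dmz", "mgt"}


  class ClusterName(NamedTuple):
      provider: str        # e.g. "hz"
      region: str          # e.g. "fsn" or "hel"
      building_block: str  # one of BUILDING_BLOCKS, never a failover role
      environment: str     # e.g. "prod"

      def __str__(self) -> str:
          return "-".join(self)


  def parse_cluster_name(name: str) -> ClusterName:
      """Split a context name into its four assumed dimensions."""
      parts = name.split("-")
      if len(parts) != 4:
          raise ValueError(f"expected 4 hyphen-separated fields: {name!r}")
      parsed = ClusterName(*parts)
      if parsed.building_block not in BUILDING_BLOCKS:
          raise ValueError(f"unknown building block: {parsed.building_block!r}")
      return parsed
  ```

  Under these assumptions, both regions produce names that differ only
  in the region field, which matches the symmetric rtz layout: no
  "primary" or "standby" token can appear in a valid name.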

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:22:52 +01:00
talent-mesh
435f49738d feat: restructure platform to 52 components and 9 products
Technology forecast and strategic review restructure:
- Remove 13 components (backstage, mongodb, activemq, vitess, airflow, camel, dapr, superset, searxng, langserve, trino, lago, rabbitmq)
- Add 10 components (sigstore, syft-grype, nemo-guardrails, langfuse, reloader, matrix, ferretdb, litmus, livekit, coraza)
- Rename product: Synapse → Axon (SaaS LLM Gateway)
- Merge products: Titan + Fuse → Fabric (Data & Integration)
- New product: Relay (Communication)
- Replace Backstage with Catalyst IDP
- Replace MongoDB with FerretDB (MongoDB wire protocol on CNPG)
- Add supply chain security (Sigstore/Cosign, Syft+Grype)
- Add AI safety and observability (NeMo Guardrails, LangFuse)
- Add technology forecast 2027-2030 document
- Full verification pass: zero stale references across all docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:00:19 +00:00
talent-mesh
10245dff98 feat: ecosystem expansion to 55 components with license compliance
- Replace BSL-licensed components with open-source alternatives:
  Terraform→OpenTofu (MPL 2.0), Vault→OpenBao (MPL 2.0),
  Redpanda→Strimzi/Kafka (Apache 2.0), n8n→Airflow (Apache 2.0)
- Add 14 new platform components: activemq, camel, clickhouse, dapr,
  debezium, falco, flink, iceberg, opensearch, rabbitmq, superset,
  temporal, trino, vitess
- Rename meta-platforms/ to products/ with new product names:
  Cortex (AI Hub), Fingate (Open Banking), Titan (Data Lakehouse),
  Fuse (Microservices Integration)
- Update all documentation, READMEs, and cross-references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 18:15:11 +00:00
talent-mesh
535710289c feat: create OpenOva monorepo structure
Consolidate all component repos into a single monorepo:

- core/: Bootstrap + Lifecycle Manager application
- platform/: Individual component blueprints organized by category
  - networking/ (cilium, k8gb, external-dns, stunner)
  - security/ (cert-manager, external-secrets, vault, kyverno, trivy)
  - observability/ (grafana stack)
  - storage/ (minio, harbor, velero)
  - scaling/ (keda, vpa)
  - failover/ (failover-controller)
  - gitops/ (flux, gitea)
  - idp/ (backstage)
  - data/ (cnpg, mongodb, valkey, redpanda)
  - communication/ (stalwart)
  - iac/ (terraform, crossplane)
  - identity/ (keycloak)
- meta-platforms/: Bundled vertical solutions
  - ai-hub/ (enterprise AI platform)
  - open-banking/ (PSD2/FAPI fintech sandbox)
- docs/: Platform documentation (PLATFORM-TECH-STACK.md, SRE.md)

All internal links updated to use relative paths within monorepo.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 10:53:18 +00:00