17 Commits

Author SHA1 Message Date
e3mrah
a57d05d4dd
fix(provisioning,catalog): parent-kustomization prefix collision + disable openclaw/stalwart-mail (#1043)
Two bugs surfaced live 2026-05-06 on tenant "test":

1) UpdateParentKustomization used substring match against "  - <slug>",
   which falsely "found" the slug when it was a PREFIX of an existing
   entry. Adding "test" to a file already listing "test11" or "test13"
   silently no-op'd. Result: tenant manifests committed but the
   tenants/kustomization.yaml never registered them, Flux's tenants
   Kustomization couldn't apply the new tenant, vCluster step timed
   out at 10m. Fix: exact line match on the resources entry.

2) openclaw + stalwart-mail were flagged Deployable=true in #941 but
   never had AppSpec entries in core/services/provisioning/gitops/apps.go
   KnownApps. The SME provisioning generator emits a single-Deployment
   template that requires Image + Port; for those two slugs it produced
   invalid manifests:

     Deployment.apps "openclaw" is invalid:
     containers[0].image: Required value
     containers[0].ports[0].containerPort: Required value

   The tenant-test11-apps Kustomization rejected the dry-run, so no apps
   ever landed inside the vCluster. Re-enabling these requires per-app
   overlay support beyond the single-Deployment template — separate
   work. For now: comment them out of DeployableAppSlugs so the catalog
   seed flips them back to Deployable=false on next pod restart and the
   marketplace UI shows them as COMING SOON.
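A minimal sketch of how commenting a slug out of the deployable map flips it back at seed time. Assumptions: the map shape, the `seedDeployable` helper, and the idea that the seed recomputes every app's flag from the map are illustrative. Only "bookstack" is listed because it is the one deployable slug this log confirms; the real map is larger.

```go
package main

import (
	"fmt"
	"sort"
)

// DeployableAppSlugs returns the slugs the catalog seed marks Deployable=true.
// Sketch only: the real map holds more entries than shown here.
func DeployableAppSlugs() map[string]bool {
	return map[string]bool{
		"bookstack": true,
		// "openclaw":      true, // needs per-app overlay support first
		// "stalwart-mail": true, // same: single-Deployment template can't render it
	}
}

// seedDeployable (hypothetical) recomputes each app's Deployable flag from
// the map, so a slug removed from it reads false after the next restart.
func seedDeployable(slugs []string) map[string]bool {
	allowed := DeployableAppSlugs()
	out := make(map[string]bool, len(slugs))
	for _, s := range slugs {
		out[s] = allowed[s] // missing key yields false
	}
	return out
}

func main() {
	got := seedDeployable([]string{"bookstack", "openclaw", "stalwart-mail"})
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, got[k])
	}
}
```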

Adds regression tests for both: prefix-collision in
UpdateParentKustomization, and a stability test on the deployable map
shape.
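The prefix-collision fix from bug 1 can be sketched as an exact comparison on trimmed lines. This is a simplified illustration, not the real UpdateParentKustomization: `hasResourceEntry` and `addResourceEntry` are hypothetical names, and the sketch assumes `resources:` is the last list in the file with entries written as `  - <slug>`.

```go
package main

import (
	"fmt"
	"strings"
)

// hasResourceEntry reports whether slug appears as its own "- <slug>" entry.
// Comparing whole trimmed lines (instead of strings.Contains) means "test"
// no longer matches an existing "test11" entry.
func hasResourceEntry(kustomization, slug string) bool {
	want := "- " + slug
	for _, line := range strings.Split(kustomization, "\n") {
		if strings.TrimSpace(line) == want {
			return true
		}
	}
	return false
}

// addResourceEntry appends "  - <slug>" unless an exact entry already exists.
func addResourceEntry(kustomization, slug string) string {
	if hasResourceEntry(kustomization, slug) {
		return kustomization
	}
	return strings.TrimRight(kustomization, "\n") + "\n  - " + slug + "\n"
}

func main() {
	parent := "resources:\n  - test11\n  - test13\n"
	fmt.Println(strings.Contains(parent, "  - test")) // old substring check: true
	fmt.Println(hasResourceEntry(parent, "test"))      // exact match: false
	fmt.Print(addResourceEntry(parent, "test"))
}
```

The regression test described above reduces to the second line of output: a slug that is only a prefix of an existing entry must not count as present.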

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:21:39 +04:00
e3mrah
ff0e90156d
fix(provisioning): re-read parent kustomization on commit retry — prevent slug-resurrection race (#1034)
Live race seen 2026-05-06: bookcheck teardown committed at T (removed
the slug from tenants/kustomization.yaml + pruned its directory).
Multitest provision's first commit attempt at T-2s got a ref-race
rejection, the github client's retry replayed the SAME files map (which
held the pre-teardown parent kustomization with bookcheck still in it),
and the retry's commit at T+5s overwrote the teardown's removal. Result:
the parent kustomization listed bookcheck but the directory was gone,
Flux's tenants Kustomization wedged in a build-failure loop, and EVERY
subsequent tenant change was blocked until someone intervened manually.

Add CommitFilesWithPruneAndRebuild — same as CommitFilesWithPrune but
takes a `rebuild(ctx) (files, error)` callback invoked at the start of
each attempt. Wire both consumer paths (provision + teardown) through
it; each rebuild re-reads parent kustomization.yaml against the current
HEAD and re-applies UpdateParentKustomization / RemoveTenantFromParentKustomization
fresh. Static tenant-scoped manifests still flow through unchanged.

CommitFilesWithPrune is preserved as a thin wrapper for callers that
ship truly static files (e.g. day-2 app installs scoped to a tenant
subdir, no parent merge involved).

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 03:28:35 +04:00
e3mrah
f1744c8973
fix(provisioning): BookStack — also emit DB_USERNAME/DB_PASSWORD (Laravel-native) (#1031)
PR #1028 fixed the APP_KEY halt and switched to DB_USER/DB_PASS, but
linuxserver/bookstack's init script does NOT substitute DB_USER →
DB_USERNAME in the .env file. Laravel does read env vars natively, but
only under its canonical names DB_USERNAME / DB_PASSWORD. Without
those, Laravel falls back to the .env placeholder values
(database_username / database_user_password) and the app fails with:

  SQLSTATE[HY000] [1045] Access denied for user 'database_username'@...

Caught live on tenant 'bookcheck' 2026-05-06 after PR #1028 deployed —
pod ran, app started, but every request hit the placeholder credentials.

Emit BOTH name pairs so the env works regardless of which the LSIO
upstream eventually wires up.
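Emitting both name pairs amounts to writing each credential twice. A minimal sketch, assuming a hypothetical `bookstackDBEnv` helper returning a plain map; the real emitter's signature and the sample host name are not from this log.

```go
package main

import (
	"fmt"
	"sort"
)

// bookstackDBEnv returns the DB env for a BookStack Deployment, emitting the
// linuxserver names (DB_USER/DB_PASS) AND the Laravel-native names
// (DB_USERNAME/DB_PASSWORD) with identical values.
func bookstackDBEnv(host, database, user, pass string) map[string]string {
	return map[string]string{
		"DB_HOST":     host,
		"DB_DATABASE": database,
		"DB_USER":     user, // linuxserver/bookstack name
		"DB_PASS":     pass,
		"DB_USERNAME": user, // Laravel-canonical name
		"DB_PASSWORD": pass,
	}
}

func main() {
	env := bookstackDBEnv("mysql.example.svc", "bookstack", "bookstack", "s3cret")
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order for the example
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k])
	}
}
```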

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 02:59:14 +04:00
e3mrah
b180d56926
fix(provisioning): BookStack overlay — add DB_* envs + APP_KEY + APP_URL (#1028)
linuxserver/bookstack reads DB_HOST/DB_USER/DB_PASS/DB_DATABASE
(NOT WORDPRESS_DB_*) and halts init with "The application key is
missing, halting init!" when APP_KEY isn't set. The pod stays 1/1
Running because the readiness probe doesn't catch the silent halt,
but the application never binds to port 80, so the ingress returns
502. Discovered via live E2E on tenant 'aaa' (BookStack on m plan):
all 7 provisioning steps reported done, ingress healthy, cert ready,
but https://aaa.omani.rest → 502.

Add a "bookstack" DBEnvStyle case in the mysql env-emitter that
writes DB_*, APP_URL=https://<slug>.omani.rest, and a Laravel-format
APP_KEY (base64:<32-byte>). Also add a randomAppKey() helper alongside
randomHex(). Tag the catalog AppSpec with DBEnvStyle: "bookstack".

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 02:49:35 +04:00
e3mrah
c9b8c13406
fix(tenant): JWT-bypass /tenant/internal/* — paid checkouts never provisioned (#1018) (#1019)
Billing's dispatchOrderPlaced enriches the order.placed NATS event by
calling /tenant/internal/tenants/<id>/subdomain over the in-cluster
ClusterIP. routes.go registers that path with the comment "Internal —
unauthenticated service-to-service", but main.go wraps everything
under /tenant/ in JWTAuth except /tenant/check-slug/. So billing got
401, returned "" for the subdomain, published order.placed with
subdomain="", and provisioning rejected every paid checkout with
"invalid subdomain expected=[a-z][a-z0-9-]{2,30}".

Add /tenant/internal/ to the public-paths bypass. Both gateways
already 401 the path externally, and subdomain values are public DNS
names — the documented threat model.
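The bypass is a prefix check ahead of token validation. A minimal sketch: `jwtAuth` here is a hypothetical stand-in for the service's JWTAuth that only checks for a bearer header, and it assumes the router has already cleaned the request path.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// publicPrefixes are served without JWT; /tenant/internal/ is the addition.
var publicPrefixes = []string{"/tenant/check-slug/", "/tenant/internal/"}

func isPublicPath(p string) bool {
	for _, prefix := range publicPrefixes {
		if strings.HasPrefix(p, prefix) {
			return true
		}
	}
	return false
}

// jwtAuth skips auth for public prefixes; everything else needs a bearer token.
func jwtAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !isPublicPath(r.URL.Path) &&
			!strings.HasPrefix(r.Header.Get("Authorization"), "Bearer ") {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	h := jwtAuth(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	for _, p := range []string{"/tenant/internal/tenants/42/subdomain", "/tenant/me"} {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, p, nil))
		fmt.Println(p, rec.Code)
	}
}
```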

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 02:09:55 +04:00
e3mrah
689276889c
fix(bp-catalyst-platform+bp-newapi): unblock alice signup gates 2-6 on Sovereigns (#915) (#951)
Six coupled chart + orchestrator fixes that unblock alice marketplace
signup → tenant ready → SaaS integrations → LLM → ledger on a freshly
franchised Sovereign. C5-final got Gate 1 GREEN on otech113 (2026-05-05)
but every downstream gate failed because the SME bundle hardcoded
contabo-only assumptions.

Bumps:
  - bp-catalyst-platform 1.4.21 → 1.4.22
  - bp-newapi             1.3.0 → 1.4.0
  - bootstrap-kit slot 13 + 80 pins updated in lockstep

Issues addressed (single consolidated PR — smaller PRs would race
against alice signup retries):

  - #934 (auth SMTP empty → "failed to send email"): sme-secrets.yaml
    now reads SMTP_* from `catalyst-system/sovereign-smtp-credentials`
    (the same A5-seeded source #883/#905 that the chart 1.4.20
    catalyst-openova-kc-credentials Secret already uses) with source-wins
    precedence. Both canonical (smtp-host/port/from/user/pass) AND
    legacy (host/port/from/user/password) source-Secret key shapes
    accepted. Empty source falls back to chart-level defaults so the
    contabo path stays clean.

  - #940 (provisioning service GITHUB_TOKEN placeholder + hardcoded
    upstream github.com): chart values
    .Values.smeServices.provisioning.{githubToken,git.{apiURL,owner,
    repo,branch}} make every GitHub-API coordinate operator-overridable
    with topology-aware defaults (Sovereign ⇒ in-cluster Gitea REST
    API + `openova` org; contabo ⇒ api.github.com + `openova-io` org).
    Provisioning binary's startup gate validates the GITHUB_TOKEN does
    NOT contain placeholder substrings (<placeholder>, PLACEHOLDER,
    REPLACE_ME, ...) and exits at startup if it does (the Pod crash-loops), so the
    operator sees the misconfig immediately instead of after alice
    signups have failed silently in service logs. GitHub client now
    accepts a custom API URL via NewClientWithAPIURL so Gitea's GitHub-
    compatible /api/v1 surface drops in without re-implementing the
    client.

  - #941 (catalog "27 apps COMING SOON"): added `openclaw` and
    `stalwart-mail` to migrateAppDeployable's deployable map at
    core/services/catalog/handlers/seed.go. Both blueprints (bp-openclaw,
    bp-stalwart-{sovereign,tenant}) ship with visibility=listed in the
    embedded blueprints.json AND have working SME-tenant overlay
    templates in sme_tenant_gitops.go, but the catalog handler silently
    filtered them out because they were missing here. Map extracted to
    DeployableAppSlugs() exported function so unit tests can assert
    membership without invoking a Mongo store.

  - #942 (REDPANDA_BROKERS hardcoded to talentmesh): configmap.yaml
    selects broker default at render time based on global.sovereignFQDN
    — Sovereign ⇒ NATS JetStream Service per ADR-0001 (the only local
    bus on Sovereigns); contabo ⇒ legacy Redpanda Service in talentmesh.
    Operator MAY override either default via
    .Values.smeServices.eventBus.brokers without forking the chart.
    The ConfigMap key name stays REDPANDA_BROKERS for back-compat with
    existing SME service Go env wiring; new EVENT_BUS_PROTOCOL key
    surfaces the protocol hint for services that want to switch wire
    format independently.

  - #943 (bp-newapi silently skips Deployment): NEW
    templates/cnpg-cluster.yaml auto-provisions a CNPG-backed Postgres
    Cluster + Helm-`lookup`-persistent DSN Secret when
    .Values.cnpg.enabled (DEFAULT true). NEW
    templates/credentials-secret.yaml auto-generates SESSION_SECRET +
    CRYPTO_SECRET (each
    64-char randAlphaNum, persistent across reconciles via Helm
    `lookup`) when .Values.credentials.autoProvision (DEFAULT true).
    deployment.yaml gate now resolves Secret names from the chart-
    emitted defaults when the operator hasn't supplied an override.
    Capabilities-gated on postgresql.cnpg.io/v1 so a cold install
    before bp-cnpg is Ready surfaces as "no Cluster yet" rather than
    a hard install error.

  - #944 (CRITICAL — cross-cluster pollution): provisioning.yaml
    templates GIT_BASE_PATH from
    .Values.smeServices.provisioning.gitBasePath with a topology-aware
    default `clusters/<sovereignFQDN>/sme-tenants` on Sovereigns. NEW
    `core/services/provisioning/gitguard` package validates at startup
    AND on every commit code path that the path begins with
    `clusters/<self-FQDN>/` — refusing to commit to any other cluster's
    tree. Defence in depth so a runtime env mutation (kubectl exec,
    ConfigMap update without Pod restart, hostile sidecar) cannot
    bypass the check. Pre-#944 every alice tenant overlay landed in
    upstream openova/openova `clusters/contabo-mkt/tenants/<id>/`
    which contabo Flux would then install on the contabo cluster —
    C5-final caught + reverted the alice2 incident at commit 5715db04.
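The gitguard rule from #944 can be sketched as a cleaned-path prefix check. Assumptions: `validateGitPath` is a hypothetical name and signature, and "sovereign.example.com" is a placeholder FQDN. `path.Clean` collapses `../` segments before the comparison, and the trailing `/` in the prefix stops a sibling tree such as `clusters/<fqdn>.evil/` from passing, which are the traversal and prefix-collision cases listed under Tests.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// validateGitPath requires every commit path to sit under clusters/<selfFQDN>/.
func validateGitPath(selfFQDN, p string) error {
	if selfFQDN == "" || strings.ContainsAny(selfFQDN, "/\\") || strings.Contains(selfFQDN, "..") {
		return errors.New("gitguard: invalid self FQDN")
	}
	if strings.HasPrefix(p, "/") {
		return fmt.Errorf("gitguard: absolute path %q rejected", p)
	}
	prefix := "clusters/" + selfFQDN + "/"
	if clean := path.Clean(p); !strings.HasPrefix(clean, prefix) {
		return fmt.Errorf("gitguard: %q is outside %s", p, prefix)
	}
	return nil
}

func main() {
	self := "sovereign.example.com"
	for _, p := range []string{
		"clusters/sovereign.example.com/sme-tenants/alice",
		"clusters/contabo-mkt/tenants/alice2",                     // foreign tree
		"clusters/sovereign.example.com/../contabo-mkt/tenants/x", // traversal
		"clusters/sovereign.example.com.evil/sme-tenants/x",       // prefix collision
	} {
		fmt.Println(p, "->", validateGitPath(self, p) == nil)
	}
}
```

Calling the same check on every commit code path, not only at startup, is what gives the defence in depth the commit describes.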

Tests:
  - core/services/provisioning/gitguard: 22 cases covering Sovereign
    + contabo + traversal + prefix-collision + placeholder token
  - core/services/catalog/handlers: openclaw/stalwart-mail in
    deployable map + stable-shape lock against accidental deletes
  - helm-template smoke pass: bp-newapi (default values renders
    Deployment + auto-provisioned Secrets); bp-catalyst-platform
    (Sovereign render shows GIT_BASE_PATH=clusters/otech113.../sme-tenants,
    REDPANDA_BROKERS=nats-jetstream..., GITHUB_OWNER=openova,
    GITHUB_API_URL=http://gitea-http...)
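The placeholder-token gate from #940 can be sketched as a substring scan. Only the three markers named in the commit are included; the real list is longer and is not reproduced. `validateGitHubToken` is a hypothetical name. Per the commit, the service treats a non-nil result as fatal at startup.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholderMarkers holds only the markers named in the commit message.
var placeholderMarkers = []string{"<placeholder>", "PLACEHOLDER", "REPLACE_ME"}

// validateGitHubToken rejects blank tokens and tokens that still contain a
// chart placeholder anywhere in the value.
func validateGitHubToken(tok string) error {
	if strings.TrimSpace(tok) == "" {
		return fmt.Errorf("GITHUB_TOKEN is empty")
	}
	for _, m := range placeholderMarkers {
		if strings.Contains(tok, m) {
			return fmt.Errorf("GITHUB_TOKEN contains placeholder %q", m)
		}
	}
	return nil
}

func main() {
	for _, tok := range []string{"REPLACE_ME", "", "ghp_exampleexample"} {
		fmt.Printf("%q -> %v\n", tok, validateGitHubToken(tok))
	}
}
```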

Closes #934 #940 #941 #942 #943 #944
Refs umbrella #915

Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:27:23 +04:00
e3mrah
95a06f56f8
fix(sme-marketplace): unblock PIN signin — route /api/* to sme/gateway + add send-pin alias (#868) (#869)
Two-part fix for marketplace UI signin flow which 503'd then 404'd on
otech103. Live debugging found two stacked bugs.

Part A — chart (HTTPRoute backend):
- marketplace-routes.yaml: /api/* rule now backendRefs sme/gateway:8080
  (cross-namespace) instead of catalyst-system/marketplace-api which had
  a Service selector matching zero Pods. The gateway in sme already
  fronts services-auth, catalog, tenant, billing, provisioning.
- marketplace-reference-grant.yaml: extend `to:` list with the gateway
  Service so the cross-ns hop is authorised by Gateway API.
- Bump bp-catalyst-platform 1.4.7 → 1.4.8 + lockstep slot 13 pin.

Part B — services-auth (route name):
- Add /auth/send-pin alias delegating to existing SendMagicLink handler,
  and /auth/verify-pin alias delegating to VerifyMagicLink. The
  marketplace UI surfaces a 6-digit PIN ("Send PIN" button), so the
  PIN-named routes are the canonical UX-facing names. /auth/magic-link
  and /auth/verify remain registered for backward compat.
- services-build workflow auto-rebuilds the auth image on push to
  core/services/** — no manual dispatch needed.

Refs: #868

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 08:22:17 +04:00
e3mrah
fa4395fa3a
fix(bp-catalyst-platform): wire VALKEY_PASSWORD into SME auth + gateway (#863) (#864)
After PR #862 (1.4.4) made cross-ns Valkey reachable from `sme` ns, the
auth Pod started CrashLoopBackOff with "NOAUTH HELLO must be called with
the client already authenticated". Root cause: bp-valkey 1.0.0 ships
auth.enabled=true (bitnami default) but SME service code + Deployment
templates never plumbed a password through.

Chart 1.4.4 -> 1.4.5. Slot 13 pin lockstep.

Changes:
- core/services/shared/db/valkey.go: add ConnectValkeyWithAuth overload
  taking username + password. ConnectValkey kept backwards-compatible
  for contabo-mkt's auth-less in-namespace Valkey.
- core/services/auth/main.go + gateway/main.go: read VALKEY_USERNAME +
  VALKEY_PASSWORD env, call ConnectValkeyWithAuth when password set,
  else fall through to no-auth path.
- NEW templates/sme-services/valkey-cross-ns-secret.yaml: Helm `lookup`
  reads bp-valkey's auto-generated `valkey-password` from the
  `valkey/valkey` Secret and re-emits it as `sme-valkey-auth` in `sme`
  ns. Same pattern as sme-secrets.yaml (#859) and gitea-admin-secret
  (#830 Bug 2). On first install the lookup may return nil; Flux's 15m
  reconcile picks up the mirror once bp-valkey is Ready.
- auth.yaml + gateway.yaml: add VALKEY_PASSWORD env from the
  `sme-valkey-auth` Secret with optional=true so contabo-mkt's auth-less
  path keeps working when the mirror Secret is absent.
- values.yaml: add `smeServices.valkey.{sourceSecretName,
  sourcePasswordKey, destNamespace, destSecretName}` knobs (Inviolable
  Principle #4).
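The env-driven branch in auth/gateway main.go can be sketched without a real client. Assumptions: `valkeyConn` and `connectFromEnv` are hypothetical stand-ins, not the shared db package's ConnectValkey / ConnectValkeyWithAuth signatures, and `getenv` is injected for testability.

```go
package main

import "fmt"

// valkeyConn stands in for the client options the shared db package builds.
type valkeyConn struct {
	Addr, Username, Password string
	Auth                     bool
}

// connectFromEnv takes the authenticated path only when VALKEY_PASSWORD is
// set; otherwise it keeps the auth-less path contabo-mkt relies on.
func connectFromEnv(addr string, getenv func(string) string) valkeyConn {
	if pw := getenv("VALKEY_PASSWORD"); pw != "" {
		return valkeyConn{Addr: addr, Username: getenv("VALKEY_USERNAME"), Password: pw, Auth: true}
	}
	return valkeyConn{Addr: addr}
}

func main() {
	sovereign := map[string]string{"VALKEY_PASSWORD": "from-sme-valkey-auth"}
	contabo := map[string]string{} // mirror Secret absent, optional=true leaves env unset
	fmt.Printf("%+v\n", connectFromEnv("valkey:6379", func(k string) string { return sovereign[k] }))
	fmt.Printf("%+v\n", connectFromEnv("valkey:6379", func(k string) string { return contabo[k] }))
}
```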

Live verified the failure mode on otech103: 11/13 SME pods Running 1/1,
auth in CrashLoopBackOff with NOAUTH HELLO error. Provisioning Pod's
CreateContainerConfigError is unrelated (ghcr-pull, separate ticket).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:09:38 +04:00
e3mrah
5cdb738ac9
fix(services): go mod tidy across sibling services after #798 shared deps bump (#821)
#798 added github.com/nats-io/nats.go to core/services/shared/go.mod and
adjusted x/sys, x/crypto and x/text to Go 1.22-compatible versions. The
sibling services (auth, catalog, domain, gateway, notification,
provisioning, tenant) reference the same shared module via the local
`replace` directive — their go.sum files must include the new transitive
hashes, otherwise the CI Containerfile build hits:

    go: updates to go.mod needed; to update it: go mod tidy

This commit is a pure `go mod tidy` across all 7 services; no source
changes. CI services-build is now unblocked.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:35:46 +04:00
e3mrah
9645a9044a
feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798) (#818)
* feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798)

Per #795 [Q-mine-3] (NATS not RedPanda) + [Q-mine-4] (one ledger), add
the SME-2 metering integration end-to-end. NewAPI is consumed as the
upstream image `ghcr.io/openova-io/openova/newapi-mirror` (a pinned
mirror, not a fork) — the metering envelope is produced by a Go sidecar
that observes the OpenAI-style `usage.total_tokens` field on every
2xx /v1/* response. This avoids forking the upstream binary while still
producing the canonical envelope shape on `catalyst.usage.recorded`.

A) NewAPI metering sidecar — core/services/metering-sidecar/
   - Transparent reverse proxy in front of NewAPI on its own port; the
     bp-newapi Service routes the cluster-fronting port to the sidecar,
     which forwards to NewAPI on the pod's loopback.
   - Observes successful /v1/* JSON responses, parses
     `usage.{prompt_tokens,completion_tokens,total_tokens}`, computes
     amount_micro_omr = -tokens * priceMicroOMRPerToken, and publishes
     one envelope on `catalyst.usage.recorded` per completed request.
   - Failed (non-2xx), non-JSON, and admin-path requests are NOT billed.
   - Customer-facing latency is NEVER blocked on metering: the response
     body is restored before publish; on NATS unreachable the envelope
     is persisted to disk and retried by a background drain loop.
   - 14 unit tests (proxy + publisher + safeFilename guards).
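The sidecar's billing decision for one response can be sketched as a pure function. This is an illustration, not the sidecar's code: `meteredAmount` is hypothetical, and admin-path filtering is left out. It applies the formula above, amount_micro_omr = -tokens * priceMicroOMRPerToken, and returns ok=false for non-2xx or non-JSON bodies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usageBody picks out the OpenAI-style usage block.
type usageBody struct {
	Usage *struct {
		PromptTokens     int64 `json:"prompt_tokens"`
		CompletionTokens int64 `json:"completion_tokens"`
		TotalTokens      int64 `json:"total_tokens"`
	} `json:"usage"`
}

// meteredAmount returns the (negative) micro-OMR amount to publish, or
// ok=false when the request must not be billed.
func meteredAmount(status int, body []byte, priceMicroOMRPerToken int64) (amount int64, ok bool) {
	if status < 200 || status > 299 {
		return 0, false
	}
	var b usageBody
	if err := json.Unmarshal(body, &b); err != nil || b.Usage == nil {
		return 0, false
	}
	return -b.Usage.TotalTokens * priceMicroOMRPerToken, true
}

func main() {
	body := []byte(`{"usage":{"prompt_tokens":1200,"completion_tokens":300,"total_tokens":1500}}`)
	amt, ok := meteredAmount(200, body, 156) // 156 = chart default price
	fmt.Println(amt, ok)                     // -234000 true
	_, ok = meteredAmount(500, body, 156)
	fmt.Println(ok) // false
}
```

At the default price, 1,500 tokens cost 234,000 micro-OMR, i.e. 0.234 OMR.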

B) sme-billing NATS subscriber — core/services/billing/handlers/
   metering_consumer.go
   - JetStream durable consumer `sme-billing-metering` on stream
     `CATALYST_USAGE` (provisioned by sme-billing on startup).
   - Idempotent on metadata.request_id via a UNIQUE partial index on
     credit_ledger.external_ref; redelivery from the broker collapses
     to a single ledger row.
   - Customer auto-create on cold start (the rbac sme.user.created
     envelope may land AFTER the first metered request; we don't strand
     usage waiting for it).
   - 11 unit tests covering happy-path, idempotency, malformed-payload
     poison-pill, missing-request-id, non-negative amount guard,
     resolver error → Nak, derive-micro-OMR-from-OMR, DB-error → Nak.

C) HTTP handler POST /billing/metering/record — handlers/metering.go
   - Synchronous validate → INSERT credit_ledger → return
     {ledger_entry_id, balance_after_omr, balance_after_micro_omr,
     duplicate}. Same payload + idempotency guard as the NATS path.
   - Auth: superadmin OR sovereign-admin (operator-admin model;
     end-user LLM traffic flows through the sidecar, never this URL).
   - 8 unit tests covering happy-path, idempotency, role gating,
     malformed-JSON, positive-amount rejection, customer-not-found.

D) Schema — core/services/billing/store/store.go
   - ALTER TABLE credit_ledger ADD COLUMN amount_micro_omr BIGINT
     (1 OMR = 1,000,000 micro-OMR; -0.000234 OMR = -234 micro-OMR
     exact integer — preserves precision at metering rates).
   - ADD COLUMN external_ref TEXT + UNIQUE partial index for
     idempotency dedup.
   - ADD COLUMN metadata JSONB for the raw envelope.
   - GetCreditBalance projects both amount_omr (legacy) and
     amount_micro_omr (new) into the integer-OMR view.
   - GetCreditBalanceMicroOMR returns canonical precision.
   - RecordUsage method: ON CONFLICT DO UPDATE … RETURNING (xmax<>0)
     distinguishes fresh insert from duplicate without a follow-up
     SELECT.

E) Wiring
   - core/services/shared/events/nats.go — minimal NATS JetStream
     publisher + subscriber surface; legacy RedPanda producer/consumer
     in events.go untouched per [Q-mine-3].
   - core/services/billing/main.go — NATS_URL env; subscriber wired
     in parallel with the existing RedPanda tenant-events consumer.
   - middleware/jwt.go — exported test helper WithClaims so handler
     tests can construct an authenticated context without minting a
     real signed token.
   - .github/workflows/services-build.yaml — metering-sidecar added
     to the build matrix; deploy job skips it (image consumed by the
     bp-newapi chart, not products/catalyst sme-services).

F) bp-newapi chart (1.0.0 → 1.1.0)
   - meteringSidecar block in values.yaml: image, port, NATS URL,
     priceMicroOMRPerToken (default 156 = 0.000156 OMR/token), spool
     dir, header names, resources, securityContext (read-only-rootfs).
   - deployment.yaml renders the sidecar container + emptyDir spool
     volume when meteringSidecar.enabled (default true).
   - service.yaml routes the cluster-fronting :3000 to the sidecar
     when enabled, exposes a separate :3001 → NewAPI direct port for
     bp-catalyst-platform admin-API traffic (ADR-0003 §3.2).
   - networkpolicy.yaml allows the sidecar's port + nats-system
     egress for JetStream publish.

Tests: 33 new (14 sidecar + 11 subscriber + 8 HTTP handler), all green.
Helm template renders cleanly with sidecar enabled and disabled.

Closes #798

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(billing/store): cast SUM to BIGINT so lib/pq scans into int64 (#798)

Postgres returns `SUM(int) + SUM(bigint)/integer` as `numeric`, which
lib/pq presents as a `[]uint8` decimal string ("50.000000000000000000000000")
that does NOT scan directly into Go int64 — the integration test
TestVoucherLifecycle_IssueRedeemAndCreditApplied caught this in CI on
the post-redeem balance read.

Wrap the SUM expressions in CAST(... AS BIGINT) so the column type is
unambiguously bigint and Scan target stays uniform across pre-#798 rows
(amount_omr only) and post-#798 rows (amount_micro_omr present).

Affects:
  - GetCreditBalance
  - GetCreditBalanceMicroOMR
  - RecordUsage's running-balance read

Test mocks updated to match the new SQL prefix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:32:42 +04:00
e3mrah
2a034a0959
feat(catalog): unified catalog with Published flag — operator curates marketplace (#710 wave 2) (#724)
Single source of truth for apps; Sovereign-console operator decides which
apps marketplace customers see; marketplace storefront filters by
Published. Per founder rule 2026-05-04: unpublish is a marketplace-
visibility toggle, not a deployment-lifecycle action — existing tenant
deployments of an unpublished app keep running unaffected.

core/services/catalog/store/store.go
====================================
- App.Published bool — operator-controlled visibility
- ListPublishedApps: marketplace-storefront subset
  (Published=true AND System=false AND Deployable=true).
  System and Deployable are catalog-team-controlled; Published is the
  operator's curation knob.
- SetAppPublished(slug, bool) — hot-path one-bit write the Sovereign
  console hits per row toggle. Cheaper than UpdateApp; slug-keyed so
  the UI doesn't need the internal Mongo _id.
- UpdateApp: thread published through full-update path too.
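The storefront predicate can be shown as an in-memory filter. The real ListPublishedApps queries MongoDB; this sketch only illustrates how the three flags combine. The `App` struct is trimmed to those flags, and the sample slugs are illustrative.

```go
package main

import "fmt"

// App carries only the flags the storefront filter reads.
type App struct {
	Slug                          string
	Published, System, Deployable bool
}

// listPublished keeps Published=true AND System=false AND Deployable=true.
func listPublished(apps []App) []App {
	var out []App
	for _, a := range apps {
		if a.Published && !a.System && a.Deployable {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	apps := []App{
		{Slug: "bookstack", Published: true, Deployable: true},
		{Slug: "openclaw", Published: true, Deployable: false},  // catalog-team gate
		{Slug: "wordpress", Published: false, Deployable: true}, // operator unpublished
		{Slug: "sys-example", Published: true, System: true, Deployable: true},
	}
	// prints: bookstack
	for _, a := range listPublished(apps) {
		fmt.Println(a.Slug)
	}
}
```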

core/services/catalog/handlers/handlers.go + routes.go
======================================================
- ListApps now honours ?published=true query param:
    GET /catalog/apps                  → operator view: every app
    GET /catalog/apps?published=true   → marketplace view: filtered
- New PATCH /catalog/admin/apps/{slug}/publish?value={true|false}
  for the Sovereign-console operator's row toggle.
- requireAdmin gating preserved on the admin endpoint.

core/services/catalog/handlers/seed.go
======================================
- migrateAppPublished: defaults Published=true on every existing app
  on the day Catalyst 1.3.x ships. Operators opt OUT of marketplace
  visibility per app, not IN — matches how a real SaaS storefront is
  curated and prevents an empty marketplace on flag-introduction day.
  Idempotent on re-run.

core/marketplace/src/lib/api.ts
================================
- getApps() now hits /catalog/apps?published=true so the marketplace
  storefront only renders the operator-curated subset.

DoD pending wave 2.5
====================
The Sovereign-console "Catalog & publishing" admin page (per-row
toggle UI) is the next chunk and ships in a follow-up — backend +
storefront filter are the load-bearing change here. Catalog admins
can flip the flag today via the PATCH endpoint; the per-row UI is
quality-of-life on top.

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
2026-05-04 11:37:03 +04:00
Emrah Baysal
9519c1ef00 merge: Group L testing (Playwright e2e smoke tests, Hetzner provisioning test scaffold gated on HETZNER_TEST_TOKEN secret, integration tests for bootstrap installer + Dynadot + voucher) 2026-04-28 14:05:59 +02:00
hatiyildiz
7edf63ca7e docs(franchise),test(billing): voucher CRD propagation invariant
#118 verifies that the voucher shape on a franchised Sovereign is
identical to Catalyst-Zero. Two artefacts:

1. New §"Voucher shape propagates automatically" in
   docs/FRANCHISE-MODEL.md explaining WHY there is no propagation
   problem to solve: vouchers are not a CRD. They are rows in the
   per-Sovereign billing service's Postgres database, and every
   Sovereign runs the same SHA-pinned core/services/billing image.
   Same image → same migration → same schema → same handlers → same
   shape. The doc lists which file owns each part of the shape and
   includes a 4-step curl smoke test to run on any Sovereign at
   first-provisioning to confirm the invariant holds.

2. New core/services/billing/handlers/vouchers_test.go covering the
   public POST /billing/vouchers/redeem-preview endpoint added in
   #117. Four cases:
   - 404 on unknown / soft-deleted code (no tombstone leak)
   - 200 on a valid live code, asserting the public shape excludes
     times_redeemed and max_redemptions (defence-in-depth against
     enumeration)
   - 410 Gone on a code that exists but has hit its cap, with the
     credit/description still in the response so the landing page can
     show "campaign ended"
   - 400 on whitespace-only input

The tests run on every CI build of the billing service, on every
Sovereign that builds from this repo. If a future change drifts the
preview endpoint's shape, the tests fail before the regression can
ship.

Also tidies vouchers.go imports (removed two unused stdlib imports
that were placeholders).

Closes #118.
2026-04-28 13:59:31 +02:00
hatiyildiz
12387a4a74 feat(billing): /billing/vouchers/{issue,list,revoke,redeem-preview} surface
#117 adds a franchise-aligned URL surface for the existing PromoCode
voucher implementation, plus one new endpoint (redeem-preview) for the
public landing flow described in docs/FRANCHISE-MODEL.md §3.

The orchestrator's hint was right — the issue/list/revoke handlers
already exist (AdminUpsertPromo / AdminListPromos / AdminDeletePromo
on the legacy /billing/admin/promos surface). This commit:

1. Adds new endpoint handlers in core/services/billing/handlers/vouchers.go:
   - POST   /billing/vouchers/issue          (superadmin or sovereign-admin)
   - GET    /billing/vouchers/list           (superadmin or sovereign-admin)
   - DELETE /billing/vouchers/revoke/{code}  (superadmin or sovereign-admin)
   - POST   /billing/vouchers/redeem-preview (unauthenticated; public)

   The first three reuse the existing store-layer methods. The last is
   new — it validates a code without consuming it, returning a safe
   shape (no times_redeemed, no max_redemptions exposure) so an
   attacker scraping the public endpoint cannot enumerate cap status.

2. Distinguishes 404 (code never existed or soft-deleted — same
   tombstone-leak protection as #91) from 410 Gone (code exists but is
   inactive or capped). The 410 body still includes the credit and
   description so the landing page can show "this campaign has ended".

3. Keeps the legacy /billing/admin/promos endpoints in place — the
   existing admin UI continues to work without any breaking change.
   New code should target /billing/vouchers/...

4. Updates docs/FRANCHISE-MODEL.md to point to the new URL surface.

The actual REDEMPTION still happens transactionally inside POST
/billing/checkout via the `promo_code` field — that path locks the
promo row, inserts the promo_redemptions edge, increments
times_redeemed, and adds the credit_ledger entry in one transaction.
Splitting it into a separate /redeem endpoint would break that
atomicity, so we deliberately do not add one. The public redeem flow
is preview → signup → checkout-with-promo_code.

Closes #117.
2026-04-28 13:54:19 +02:00
hatiyildiz
3e956b7d81 test: voucher issuance integration test — real Postgres (#147)
Closes the Group L "integration test — voucher issuance via API — issue
→ redeem → Org created path" ticket.

Per docs/INVIOLABLE-PRINCIPLES.md principle #2 (no mocks where the test
would otherwise verify real behavior), this test runs against a real
PostgreSQL — not sqlmock. The voucher mechanic lives in
store.RedeemPromoCode which runs a transaction with SELECT FOR UPDATE on
promo_codes, COUNT lookup on promo_redemptions, and inserts into
credit_ledger. Mocking SQL strings doesn't verify whether the
transactional invariants actually hold under concurrent contention; this
codebase has been bitten by exactly that gap before (#93: counter
incremented before order was committed).

The test is gated on BILLING_TEST_PG_URL — when unset, it skips (NOT
mocks). CI populates it via the new postgres service container in
.github/workflows/test-billing-integration.yaml.

Each test gets its own Postgres schema (via CREATE SCHEMA + libpq's
options=-c search_path) so parallel runs don't cross-contaminate, and so
goroutine concurrency tests reliably hit the same schema regardless of
which pooled connection they pick up.

Coverage:
  - Issue → Redeem → Credit applied (the canonical happy path)
  - Per-customer double-redemption blocked
  - Redemption cap enforced under concurrency (12 goroutines fighting
    for a 5-cap voucher → exactly 5 successful redemptions, no more)
  - Soft-deleted codes rejected as "not found" (no tombstone leak per #91)
  - Inactive codes rejected with distinct "not active" error
  - Two different customers can each redeem the same voucher
  - Org-creation prerequisites: customer.tenant_id non-empty, balance > 0
    (these are the inputs the downstream tenant.created event consumer
    feeds into CreateTenant — covered by tenant-service consumer_test.go)

CI workflow added: .github/workflows/test-billing-integration.yaml runs
the tests against a postgres:16-alpine service container with -race.

Refs #147

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:53:43 +02:00
hatiyildiz
fabedd42c1 feat(admin,billing): per-Sovereign voucher issuance for sovereign-admin
#115 extends the existing PromoCode (voucher) admin surface so a
sovereign-admin role can issue, list, and revoke vouchers on a
franchised Sovereign. No new endpoints, no new schema, no new CRD —
all the changes are role-gating widenings on the existing surface.

Backend (core/services/billing/handlers/handlers.go):

- New `requireVoucherIssuer` helper accepts both `superadmin` and
  `sovereign-admin`. Used by AdminListPromos, AdminUpsertPromo, and
  AdminDeletePromo only. All other admin endpoints (Stripe settings,
  revenue, orders) keep the existing `requireAdmin` (superadmin-only).

UI (core/admin/src/components/AdminShell.svelte + BillingPage.svelte):

- AdminShell now accepts both roles. Sidebar nav is filtered by role:
  superadmin sees Revenue / Catalog / Tenants / Orders / Billing;
  sovereign-admin sees only Billing. Filtering is via a
  `superadminOnly` flag on each nav item (defence-in-depth: even if
  a sovereign-admin guesses a URL, the backend's requireAdmin will
  return 403).

- BillingPage hides the Stripe Configuration section for
  sovereign-admin (it would 403 from GET /billing/admin/settings
  anyway). The Vouchers (Promo Codes) section is shown to both roles
  with a small label tweak ("Issued vouchers are scoped to this
  Sovereign" for sovereign-admin).

Per docs/INVIOLABLE-PRINCIPLES.md §1 (target-state shape, no MVP)
and §3 (follow documented architecture exactly) — this matches the
FRANCHISE-MODEL.md design where "every franchised Sovereign runs the
same admin app" with role-based gating.

Closes #115.
2026-04-28 13:52:19 +02:00
hatiyildiz
7646840ffe feat(consolidation): move 8 SME backend services + shared module to public repo
Per docs/PROVISIONING-PLAN.md and tickets [B] sme-backend group. Migrates the 8 Go backend services from openova-private/services/ to openova/core/services/, plus the shared module they all depend on, plus the services-build CI workflow.

What moved:
- services/auth → core/services/auth (Go HTTP service for SME marketplace authentication)
- services/billing → core/services/billing (Go HTTP service for billing + voucher backend)
- services/catalog → core/services/catalog (Go HTTP service for App catalog)
- services/domain → core/services/domain (Go HTTP service for tenant domain mapping)
- services/gateway → core/services/gateway (Go HTTP gateway with rate limiting)
- services/notification → core/services/notification (Go HTTP service with email templates)
- services/provisioning → core/services/provisioning (Go HTTP service that commits tenant Application manifests via Gitea/GitHub API)
- services/tenant → core/services/tenant (Go HTTP service for tenant lifecycle)
- services/shared → core/services/shared (shared Go module: db, events, health, middleware, respond)
- 9 go.mod files updated: module github.com/openova-io/openova-private/services/<X> → github.com/openova-io/openova/core/services/<X>
- 9 go.sum files and the import paths similarly updated
- replace directives updated: openova-private/services/shared → openova/core/services/shared
- sme-services-build.yaml workflow → services-build.yaml in .github/workflows/, paths/context/image-base/deploy paths all repointed at core/services + ghcr.io/openova-io/openova/services-* + products/catalyst/chart/templates/sme-services
- All 8 manifests in products/catalyst/chart/templates/sme-services/ updated: image refs ghcr.io/openova-io/openova-private/sme-{X} → ghcr.io/openova-io/openova/services-{X}
- provisioning.yaml GITHUB_REPO env var: "openova-private" → "openova"

Closes [B] sme-backend (10 tickets).

After this commit, all 14 user-facing + backend Catalyst-Zero modules build from this public repo:
- 4 UIs: console, admin, marketplace, catalyst-ui
- 2 backends: marketplace-api, catalyst-api
- 8 SME services: auth, billing, catalog, domain, gateway, notification, provisioning, tenant
- 1 shared Go module

Note: 1 line in core/services/provisioning/main.go retains a literal default of "openova-private" for the GITHUB_REPO fallback when env var is unset; the K8s manifest sets GITHUB_REPO=openova explicitly so this path is never exercised in the deployed runtime, and the in-code default will be cleaned up in a follow-up.
2026-04-28 12:30:32 +02:00