Commit Graph

18 Commits

Author SHA1 Message Date
e3mrah
88a8ecd8bb
fix(cutover): Reflector-mirror harbor-admin Secret + in-cluster trigger endpoint (#935) (#947)
Two bugs surfaced live on otech113 2026-05-05 blocking Self-Sovereignty
Cutover end-to-end. Fix both in lockstep:

Bug 1 — bp-self-sovereign-cutover Step 02 (harbor-projects) Job in
`catalyst` namespace was hitting `secret "harbor-core" not found` for
11+ retries because the upstream Harbor `harbor-core` Secret only
exists in the `harbor` namespace and Kubernetes forbids cross-namespace
secretKeyRef. Step 02 was stuck in CreateContainerConfigError forever.

  Fix: bp-harbor 1.2.13 → 1.2.14 ships a Catalyst-curated `harbor-admin`
  Secret in the `harbor` namespace with Reflector mirror annotations
  (allowed-namespaces=catalyst, auto-enabled). The same Secret name
  auto-materialises in `catalyst` so the cutover Job's secretKeyRef
  resolves natively. Password is randomly generated on first install
  (32-char alphanum, 190 bits entropy per feedback_passwords.md) and
  preserved across reconciles via `lookup`. The upstream Harbor subchart
  consumes it via `existingSecretAdminPassword: harbor-admin`.
  bp-self-sovereign-cutover 0.1.16 → 0.1.17 updates
  `harbor.adminSecretRef.name` from `harbor-core` to `harbor-admin`.
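A minimal sketch of what such a Secret template can look like — the annotation keys are Reflector's real ones and the data key matches the contract test below, but the actual bp-harbor template (labels, values wiring, helpers) is not reproduced here:

```yaml
{{- /* Reuse the password across reconciles; generate only on first install. */ -}}
{{- $existing := lookup "v1" "Secret" "harbor" "harbor-admin" }}
apiVersion: v1
kind: Secret
metadata:
  name: harbor-admin
  namespace: harbor
  annotations:
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "catalyst"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "catalyst"
type: Opaque
data:
  HARBOR_ADMIN_PASSWORD: {{ if $existing }}{{ index $existing.data "HARBOR_ADMIN_PASSWORD" }}{{ else }}{{ randAlphaNum 32 | b64enc }}{{ end }}
```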

Bug 2 — The 0.1.16 auto-trigger Helm post-install Job (#933) POSTed
/api/v1/sovereign/cutover/start which sits behind RequireSession
middleware. The Job has no human session cookie — every request 401'd
forever and cutover never started.

  Fix: new catalyst-api endpoint POST /api/v1/internal/cutover/trigger
  lives OUTSIDE RequireSession and validates the bearer token via the
  apiserver's TokenReview API + checks the resolved username matches
  the canonical `bp-self-sovereign-cutover-runner` SA. Same engine,
  same idempotency, same state machine — different auth surface.
  The auto-trigger Job now mounts its projected SA token at
  /var/run/secrets/kubernetes.io/serviceaccount/token and sends it
  as `Authorization: Bearer <token>`. SA username + accepted list are
  runtime-overridable per Inviolable Principle #4.
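The client side of the new auth surface can be sketched as follows, assuming only the projected-token mount path above; `auth_header` and the service hostname in the comment are illustrative, not the chart's actual names:

```shell
#!/bin/sh
# Standard projected ServiceAccount token path inside the Job's Pod.
SA_TOKEN_FILE="${SA_TOKEN_FILE:-/var/run/secrets/kubernetes.io/serviceaccount/token}"

# Compose the bearer header from the mounted token file.
auth_header() {
  printf 'Authorization: Bearer %s' "$(cat "$1")"
}

# In the Job this header is sent to the session-free internal endpoint, e.g.:
#   curl -fsS -X POST -H "$(auth_header "$SA_TOKEN_FILE")" \
#     "http://catalyst-api.catalyst-system.svc/api/v1/internal/cutover/trigger"
```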

Tests
  - 6 Go unit tests for HandleCutoverInternalTrigger covering happy
    path, missing bearer (401), TokenReview rejection (502), wrong SA
    (403), idempotency (no Jobs created when complete), wrong method
    (405). All pass.
  - bp-harbor admin-secret contract test (5 cases) — Secret renders,
    HARBOR_ADMIN_PASSWORD key present, Reflector annotations, keep
    policy, upstream consumes via existingSecretAdminPassword.
  - bp-self-sovereign-cutover cutover-contract test extended with 3
    new cases — auto-trigger uses /internal/cutover/trigger, sends
    SA bearer token, references harbor-admin (not harbor-core).
  - All 12 cutover-contract gates green; all 4 observability-toggle
    gates green; helm template + helm lint clean on both charts.

Bootstrap-kit slot pins
  - clusters/_template/bootstrap-kit/19-harbor.yaml: 1.2.13 → 1.2.14
  - clusters/_template/bootstrap-kit/06a-bp-self-sovereign-cutover.yaml:
    0.1.16 → 0.1.17

Closes #935

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:12:50 +04:00
e3mrah
e9a72aa00d
feat(self-sovereign-cutover): auto-trigger on install + always-defined State (#933 E1) (#936)
Closes the otech113 dashboard regression where SovereigntyCard rendered
`invalid CutoverState: <undefined>` instead of a Tethered badge, and
makes the Day-2 cutover fire automatically once the chart lands rather
than waiting for an operator click on "Achieve True Sovereignty".

Founder rule per #933: handover is not "done" until cutover has run;
the operator must NOT have to click a CTA on
console.<sov-fqdn>/console/dashboard.

Three coupled changes:

1. catalyst-api: cutoverStatusResponse now ALWAYS emits a `state` field
   ("tethered" or "sovereign"), derived from cutoverComplete. The UI's
   branded parseCutoverState rejects empty/undefined, which is what
   was rendering the user-visible error text. Tests cover the empty
   ConfigMap, missing cutoverComplete, and explicit-true cases.

2. UI parseCutoverStatus: defensive fallback when the wire payload omits
   `state` — derive from cutoverComplete (default "tethered"). Hostile/
   typo'd state values (e.g. 'pending', '') still throw via the branded
   parser. Defends against partial-rollout where a stale catalyst-api
   Pod is still serving the old shape.

3. bp-self-sovereign-cutover 0.1.16 (chart): new Helm post-install/
   post-upgrade hook (templates/10-auto-trigger-job.yaml) POSTs
   /api/v1/sovereign/cutover/start on catalyst-api after the step
   ConfigMaps + RBAC land. Idempotent via catalyst-api's durable
   status ConfigMap (200 if already complete, 409 if running, 200
   to start). Fails open: a transient catalyst-api unreachability
   exits 0 so the chart install doesn't block; operator can always
   re-fire via the manual CTA. Gated on .Values.trigger.auto (default
   true; per-Sovereign overlays can disable for soak Sovereigns).
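The fail-open contract reduces to a status-code policy, sketched here with hypothetical function names (the real hook script lives in templates/10-auto-trigger-job.yaml):

```shell
#!/bin/sh
# post_start POSTs the start endpoint and echoes the HTTP status,
# or 000 when catalyst-api is unreachable.
post_start() {
  code=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    "$1/api/v1/sovereign/cutover/start" 2>/dev/null) || code=000
  echo "$code"
}

# Map the status onto the hook's policy: 200 = started (or already
# complete), 409 = already running; unreachable fails OPEN so the
# chart install is not blocked.
fire_trigger() {
  case "$(post_start "$1")" in
    200|409) return 0 ;;
    000) echo "catalyst-api unreachable; failing open" >&2; return 0 ;;
    *)   return 1 ;;
  esac
}
```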

Hard rules honoured:
- No contabo Pods touched.
- Existing tethered Sovereigns that have not cut over stay tethered —
  the auto-trigger Job is in the chart (per-Sovereign), not in the
  mothership; only fresh Sovereign installs of bp-self-sovereign-cutover
  0.1.16+ get it.
- IaC-first: the auto-trigger uses catalyst-api's existing /start
  endpoint (no bespoke cluster mutation outside the chart).
- Event-driven: post-install hook fires on chart install (no cron).

Verification:
- Go: cutover_test.go +TestBuildCutoverStatusResponse_StateAlwaysDefined
  +TestHandleCutoverStatus_StateFieldEmittedOnFreshSovereign — both
  green.
- TS: cutover.test.ts +5 cases for parseCutoverStatus state-fallback;
  35/35 green. Sovereignty widget tests 20/20 green.
- Chart: tests/cutover-contract.sh +Case 8/9 (auto-trigger present by
  default, absent under trigger.auto=false); helm template renders
  cleanly.

Co-authored-by: Hatice Yildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:40:52 +04:00
e3mrah
eddf0e62a4
fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter (#891) (#892)
* fix(catalyst-api): SME-tenant orchestrator writes parent kustomization.yaml index (#889)

The Flux Kustomization rendered by bp-catalyst-platform 1.4.13+ at
clusters/<sov-fqdn>/sme-tenants/ requires a parent kustomization.yaml
that enumerates tenant subdirectories. The orchestrator only wrote
per-tenant overlays without the parent index, so on otech103 Flux
hit:

  kustomization path not found: stat /tmp/kustomization-...
  /clusters/otech103.omani.works/sme-tenants: no such file or directory

Even after a tenant signup, the parent path lacked a kustomization.yaml
so Flux couldn't enumerate subdirs.

Fix: NEW writeParentTenantsIndex helper called from both
WriteTenantOverlay and DeleteTenantOverlay. Scans the parent dir for
subdirectories that contain kustomization.yaml, sorts them lexically
for deterministic output (no spurious diffs), and writes a parent
kustomization.yaml listing them under `resources:`. Empty list (no
tenants) renders as `resources: []` — still a valid Kustomization
root, so Flux stays Ready=True after the last tenant teardown.
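The real helper is Go inside catalyst-api; the same logic as a shell sketch (function name hypothetical, Kustomization header minimal):

```shell
#!/bin/sh
# Regenerate <parent>/kustomization.yaml listing every subdirectory that
# itself contains a kustomization.yaml, sorted lexically so repeated runs
# produce no spurious diffs.
write_parent_index() {
  parent=$1
  tenants=$(for d in "$parent"/*/; do
    [ -f "${d}kustomization.yaml" ] && basename "$d"
  done | sort)
  {
    echo 'apiVersion: kustomize.config.k8s.io/v1beta1'
    echo 'kind: Kustomization'
    if [ -z "$tenants" ]; then
      echo 'resources: []'   # valid empty root: Flux stays Ready=True
    else
      echo 'resources:'
      for t in $tenants; do echo "  - $t"; done
    fi
  } > "$parent/kustomization.yaml"
}
```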

git add covers both the per-tenant subdir AND the parent index, so a
single commit captures the delta.

Live on otech103 post-cutover, 2026-05-05.

* fix(self-sovereign-cutover): Step-5 widens GitRepository ignore filter to include clusters/<sov-fqdn>/ (#891)

After Day-2 cutover, the GitRepository ignore filter excluded the
Sovereign's own clusters/<sov-fqdn>/ subtree. This made every
Sovereign-specific Flux Kustomization (sme-tenants, future per-Sov
overlays) hit "kustomization path not found" because source-controller
filtered the path out of the artifact tarball.

Live on otech103 (2026-05-05): sme-tenants Kustomization stuck for
20+ minutes despite the orchestrator successfully committing the
overlay to local Gitea.

Fix: Step-5 (flux-gitrepository-patch) now writes the patch as a
multi-line YAML strategic-merge file via /tmp emptyDir (since the
Pod runs readOnlyRootFilesystem), composing the new ignore filter:

  /*
  !/clusters/_template
  !/clusters/${SOVEREIGN_FQDN}
  !/platform
  !/products

The SOVEREIGN_FQDN is wired from .Values.sovereign.fqdn (already
established in the chart values).
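The patch composition can be sketched from only the details above; the FQDN default and the kubectl target in the trailing comment are illustrative:

```shell
#!/bin/sh
# The Pod runs readOnlyRootFilesystem, so compose the patch under the
# /tmp emptyDir. SOVEREIGN_FQDN is wired from .Values.sovereign.fqdn at
# render time; the default below is purely for illustration.
SOVEREIGN_FQDN="${SOVEREIGN_FQDN:-otech103.omani.works}"
PATCH=/tmp/gitrepository-ignore-patch.yaml

cat > "$PATCH" <<EOF
spec:
  ignore: |
    /*
    !/clusters/_template
    !/clusters/${SOVEREIGN_FQDN}
    !/platform
    !/products
EOF

# Applied in-cluster along the lines of (name/namespace illustrative):
#   kubectl -n flux-system patch gitrepository flux-system \
#     --type merge --patch-file "$PATCH"
```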

Bumps chart 0.1.14 → 0.1.15. Slot 06a pin bumps in lockstep.

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-05 09:39:42 +04:00
e3mrah
8e4c88fd28
fix(bp-self-sovereign-cutover): auto-sync local Gitea mirror from upstream GitHub (#870) (#875)
Step-1 gitea-mirror Job replaces the legacy one-shot create-empty-repo +
git-push pattern with a single call to Gitea's native /repos/migrate API
with mirror=true and mirror_interval=10m0s. Gitea now polls the upstream
openova-io/openova repo on a 10-minute interval and replicates branches
+ tags into the local Sovereign Gitea automatically.

Closes the "Sovereign drifts from upstream main forever after Day-2
cutover" bug — hit twice during the otech103 2026-05-04 overnight DoD
session, requiring manual `git fetch` inside the Gitea pod for every
chart rollout.

Why /repos/migrate over the previous git push approach:
- Gitea cannot convert a regular repo into a pull-mirror after creation
  (the mirror flag is set at create-time only). The migrate endpoint
  creates the repo AS a mirror in one shot.
- The migrate endpoint accepts toggles for issues / pull-requests /
  wiki / labels / milestones / releases — we set them all to false so
  Gitea only replicates branches+tags, the only refs the Sovereign's
  Flux GitRepository needs.
- Recurring sync is a Gitea-native capability; using it avoids a
  parallel CronJob (which would violate the "event-driven not cron"
  inviolable principle) or a long-poll sidecar (which would duplicate
  what Gitea already does).

Idempotency: if the repo already exists from a prior cutover attempt,
the script PATCHes mirror_interval to the desired value and POSTs to
/mirror-sync to trigger an immediate refresh. Note that PATCH alone
cannot convert a legacy non-mirror repo to a mirror — Sovereigns
seeded by chart < 0.1.14 would need an operator-driven repo delete +
re-migrate to retro-fit auto-sync, but new provisions take the
migrate path automatically.
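The migrate-or-refresh flow, sketched against Gitea's API (the `gitea_api` wrapper and error handling are illustrative; the field names follow Gitea's /repos/migrate options):

```shell
#!/bin/sh
# gitea_api METHOD PATH [JSON] — authenticated call, echoes HTTP status only.
gitea_api() {
  curl -s -o /dev/null -w '%{http_code}' -X "$1" \
    -H "Authorization: Basic $AUTH" -H 'Content-Type: application/json' \
    ${3:+-d "$3"} "$GITEA/api/v1$2"
}

mirror_repo() {
  # Create the repo AS a pull-mirror in one shot; replicate branches+tags only.
  code=$(gitea_api POST /repos/migrate '{
    "repo_owner": "openova", "repo_name": "openova",
    "clone_addr": "'"$UPSTREAM"'",
    "mirror": true, "mirror_interval": "10m0s",
    "issues": false, "pull_requests": false, "wiki": false,
    "labels": false, "milestones": false, "releases": false}')
  case "$code" in
    201) return 0 ;;   # created as mirror
    409) # already exists: re-assert the interval, then force a sync
         gitea_api PATCH /repos/openova/openova '{"mirror_interval":"10m0s"}' >/dev/null
         gitea_api POST  /repos/openova/openova/mirror-sync >/dev/null ;;
    *)   return 1 ;;
  esac
}
```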

Verification on the rendered ConfigMap:
  $ helm template smoke .                   # renders 16 docs cleanly
  $ bash tests/cutover-contract.sh          # all 7 gates green
  $ sh -n <rendered-script>                 # POSIX shell syntax OK

Chart bumped 0.1.13 → 0.1.14 (Chart.yaml + blueprint.yaml spec.version
aligned per #817 invariant + slot 06a-bp-self-sovereign-cutover.yaml
pin lockstep).

Refs #870, #790.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:35:40 +04:00
e3mrah
9b710049e3
fix(self-sovereign-cutover): Step-8 baseline-diff (only NEW regressions count) (#858)
Live otech103: Step-8 survival window failed because infrastructure-config Kustomization had been NotReady for 4h pre-cutover (Crossplane provider CRD ordering — unrelated to sovereignty). Sovereignty proof asks 'did cutover break anything', not 'is the cluster perfect'. Capture baseline NotReady set before the window, only fail on NEW additions during.
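The baseline-diff reduces to a set difference; `not_ready` below is a stub for the live query (roughly: list Kustomizations cluster-wide, keep the ones with Ready!=True):

```shell
#!/bin/sh
# Snapshot the NotReady set BEFORE the survival window; during the window
# only entries absent from the baseline count as regressions.
baseline=/tmp/notready-baseline

capture_baseline() { not_ready | sort > "$baseline"; }

new_regressions() {
  # comm -13: lines only in the current set — i.e. NEW NotReady entries.
  not_ready | sort | comm -13 "$baseline" -
}
```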

Bumps 0.1.12 → 0.1.13 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 04:20:16 +04:00
e3mrah
d5d1d9b2cd
fix(self-sovereign-cutover): Step-8 tolerate slot-managed self-ref HelmRepositories (#857)
Live otech103: Step-8 verification flagged 2 HelmRepositories (bp-newapi + bp-self-sovereign-cutover) still on ghcr.io/openova-io. Both are declared in clusters/_template/bootstrap-kit/ slot files which Flux Kustomization re-applies on every reconcile — Step-6's patch is transient for them. Data-plane impact is null because they're not pulled again until the next cutover cycle which would re-apply the patch first. The 38 leaf-bp HelmRepositories ARE patched durably (live in HelmRelease values, not separate slot files).

Bumps 0.1.11 → 0.1.12 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 04:06:41 +04:00
e3mrah
142ea21534
fix(self-sovereign-cutover): Step-8 passive architectural verification (Cilium can't egressDeny+toFQDNs) (#856)
Live otech103: Step-8 (egress-block-test) failed because Cilium 1.16's CiliumNetworkPolicy schema doesn't support 'spec.egressDeny[].toFQDNs' — strict-decoding error 'unknown field'. FQDN-based matching in Cilium is only allowed in 'egress' (allow), not 'egressDeny'.

Pivot: Step-8 now asserts the architectural pivots from Steps 5-7 are actually live (GitRepository.url + all HelmRepositories + catalyst-api env all point at local Gitea/Harbor) BEFORE entering the durationSeconds survival window during which Flux Kustomization + HelmRelease readiness is polled. Same sovereignty proof, expressed in a form Cilium can evaluate.

Bumps 0.1.10 → 0.1.11 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 03:22:30 +04:00
e3mrah
86ae235804
fix(self-sovereign-cutover): catalyst-api namespace catalyst-system not catalyst-platform (#855)
Live otech103: Step-7 (catalyst-api-env-patch) hit 'deployments.apps catalyst-api not found' in catalyst-platform ns. Actual Sovereign-side namespace is catalyst-system. Bumps 0.1.9 → 0.1.10.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 01:59:11 +04:00
e3mrah
dd84060d05
fix(self-sovereign-cutover): switch from bitnami/kubectl to alpine/k8s (#854)
Live otech103 2026-05-04: bitnami/kubectl:1.31.4 404 on Docker Hub. Bitnami deprecated public Docker Hub registry in 2025; their kubectl image stopped getting tags. alpine/k8s is the canonical alpine-based replacement — kubectl + helm + standard k8s CLI surface, actively maintained, :1.31.4 verified present.

Bumps 0.1.8 → 0.1.9 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 01:55:46 +04:00
e3mrah
887ff62200
fix(self-sovereign-cutover): bitnami/kubectl tag :1.31 → :1.31.4 (#853)
Live otech103 2026-05-04: Step-5 (flux-gitrepository-patch) Pod DeadlineExceeded after 10m of ImagePullBackOff. bitnami/kubectl on DockerHub doesn't have a floating :1.31 tag — only patch-level :1.31.X. Pin to :1.31.4 (latest of 1.31 minor as of today).

Bumps 0.1.7 → 0.1.8 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 01:42:54 +04:00
e3mrah
e9970db7b6
fix(self-sovereign-cutover): proxy-quay adapter type docker-registry (#852)
Live otech103: Harbor rejects project create with metadata.proxy_cache=true on registries with type 'quay' — HTTP 400 'unsupported registry type quay'. Quay speaks plain v2 so docker-registry is the correct adapter (4/7 projects ahead succeeded with the same shape). Bumps 0.1.6 → 0.1.7.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 01:29:26 +04:00
e3mrah
ea51642092
fix(self-sovereign-cutover): proxy-ghcr Harbor adapter type 'github-ghcr' (#851)
Live otech103 2026-05-04: Step-2 harbor-projects POST /api/v2.0/registries returns 500 'adapter factory for github not found'. Harbor 2.x's canonical GHCR proxy-cache adapter is named 'github-ghcr', not 'github'.

Bumps 0.1.5 → 0.1.6 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 01:26:51 +04:00
e3mrah
8f96daeb6f
fix(self-sovereign-cutover): harbor service is 'harbor-core' not 'harbor-harbor-core' (#849)
Live failure on otech103 2026-05-04: Step-2 (harbor-projects) Pod exits silently after first echo because curl exit 6 (CURLE_COULDNT_RESOLVE_HOST). The chart's default harborInternalURL was http://harbor-harbor-core.harbor.svc.cluster.local but the actual bitnami harbor chart's service name is harbor-core (release name doesn't double-prefix when targetNamespace == 'harbor' AND releaseName == 'harbor').

Fix: harborInternalURL → http://harbor-core.harbor.svc.cluster.local. Verified via 'kubectl get svc -n harbor' on otech103.

Bumps 0.1.4 → 0.1.5 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 01:16:41 +04:00
e3mrah
ab5681e656
fix(self-sovereign-cutover): Step-1 use bare clone + explicit refspec push (#848)
Live failure on otech103 2026-05-04 even after 0.1.3: git push --all in a mirror clone still pushes refs/pull/* because mirror clones store all upstream refs (incl. GitHub PR refs) at the same level as refs/heads/, and --all walks the whole local refstore.

Fix: use git clone --bare (not --mirror) which only fetches refs/heads/* and refs/tags/*, then push with explicit refspecs:
  git push origin 'refs/heads/*:refs/heads/*'
  git push origin 'refs/tags/*:refs/tags/*'
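The difference is easy to demonstrate with throwaway repositories (all paths and ref names here are illustrative): a bare clone never fetches refs/pull/*, so the explicit refspec push can't trip Gitea's update hook.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)

# Stand-in "GitHub" remote with one branch, one tag, and one PR ref.
git init -q --bare "$tmp/upstream"
git init -q "$tmp/seed"
git -C "$tmp/seed" -c user.name=ci -c user.email=ci@local \
  commit -q --allow-empty -m init
git -C "$tmp/seed" tag v1
git -C "$tmp/seed" push -q "$tmp/upstream" \
  'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'
sha=$(git -C "$tmp/seed" rev-parse HEAD)
git -C "$tmp/upstream" update-ref refs/pull/1/head "$sha"

# Bare clone: only refs/heads/* and refs/tags/* are fetched.
git clone -q --bare "$tmp/upstream" "$tmp/work"

# Stand-in "local Gitea" target; push branches and tags explicitly.
git init -q --bare "$tmp/gitea"
git -C "$tmp/work" push -q "$tmp/gitea" 'refs/heads/*:refs/heads/*'
git -C "$tmp/work" push -q "$tmp/gitea" 'refs/tags/*:refs/tags/*'
```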

Bumps 0.1.3 → 0.1.4 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 00:59:25 +04:00
e3mrah
6322d82775
fix(self-sovereign-cutover): Step-1 push --all + --tags (skip GitHub PR refs) (#847)
Live failure on otech103 2026-05-04: git push --mirror to local Gitea rejected by Gitea's update hook on every refs/pull/<n>/head + refs/pull/<n>/merge ref (those are GitHub-specific metadata refs Gitea doesn't accept). Branches and tags push fine.

Fix: split the push into 'git push --all' (branches) + 'git push --tags' (tags). Branches + tags are exactly what Flux GitRepository needs to reconcile from local Gitea — PR refs are upstream-only metadata not referenced by any consumer.

Bumps bp-self-sovereign-cutover 0.1.2 → 0.1.3 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 00:55:22 +04:00
e3mrah
3015033136
fix(self-sovereign-cutover): Step-1 creates Gitea org before repo (#846)
Live failure on otech103 2026-05-04: Step-1 hit 'POST /orgs/openova/repos returns 404 Not Found' because the org openova doesn't exist on a fresh Gitea install. The /user/repos fallback would have created the repo under gitea_admin/openova, but the subsequent git push targets openova/openova so it fails with 'remote: Not found'.

Fix: explicit org-create step before repo-create. POST /orgs with {username, visibility} creates the org idempotently (swallow 422 'already exists'). Then POST /orgs/<org>/repos creates the repo under it. Push URL targets openova/openova as before.
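Sketched with a status-only API wrapper (`gitea_api` and the visibility value are illustrative; the org/repo names and the swallowed status codes are from the commit, except the repo-create 409 which is an assumption):

```shell
#!/bin/sh
# gitea_api METHOD PATH JSON — authenticated call, echoes HTTP status only.
gitea_api() {
  curl -s -o /dev/null -w '%{http_code}' -X "$1" \
    -H "Authorization: Basic $AUTH" -H 'Content-Type: application/json' \
    -d "$3" "$GITEA/api/v1$2"
}

# Org create is idempotent: 201 on create, swallow 422 "already exists".
ensure_org() {
  code=$(gitea_api POST /orgs '{"username": "openova", "visibility": "public"}')
  [ "$code" = 201 ] || [ "$code" = 422 ]
}

# Repo create under the org, so the later push to openova/openova resolves.
ensure_repo() {
  code=$(gitea_api POST /orgs/openova/repos '{"name": "openova"}')
  [ "$code" = 201 ] || [ "$code" = 409 ]   # 409 assumed: repo from a prior run
}
```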

Bumps bp-self-sovereign-cutover 0.1.1 → 0.1.2 + slot 06a pin lockstep.

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 00:51:24 +04:00
e3mrah
e36089540d
fix(self-sovereign-cutover): Step-1 BusyBox-wget Basic auth header (--user not supported) (#845)
* fix(bp-gitea): mirror gitea-admin-secret to catalyst ns via reflector annotations

Live failure on otech103 2026-05-04: cutover Step-1 gitea-mirror Job in catalyst ns CrashLoops with 'secret "gitea-admin-secret" not found' because K8s forbids cross-namespace secretKeyRef. The Secret created by bp-gitea 1.2.4 lives in the gitea ns; the cutover Job runs in the catalyst ns.

Fix: add reflector.v1.k8s.emberstack.com annotations on the Secret so bp-reflector (already installed at slot 05a) mirrors it into the catalyst namespace. The Job's secretKeyRef then resolves locally. Reflector keeps the mirror in lockstep on password rotation.

Bumps bp-gitea 1.2.4 → 1.2.5 + slot 10 pin lockstep.

* fix(self-sovereign-cutover): Step-1 gitea-mirror BusyBox-wget compat (Basic auth header)

Live failure on otech103 2026-05-04: Step-1 cutover-gitea-mirror Pod exits with 'wget: unrecognized option: password=...' because the alpine/git image bundles BusyBox wget which does NOT recognise --user / --password (those are GNU wget flags).

Fix: build a base64'd Authorization: Basic header from $GITEA_USERNAME:$GITEA_PASSWORD and pass it via --header (BusyBox wget supports --header). Same Gitea API call surface, BusyBox-compatible wire.
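The workaround in miniature (the Gitea hostname in the comment is illustrative):

```shell
#!/bin/sh
# BusyBox wget has no --user/--password; build the Basic credential by
# hand and pass it with --header, which BusyBox wget does support.
# tr -d '\n' guards against base64 implementations that wrap output.
basic_auth_header() {
  printf 'Authorization: Basic %s' "$(printf '%s' "$1:$2" | base64 | tr -d '\n')"
}

# Usage inside the Job (URL illustrative):
#   wget -q -O- --header "$(basic_auth_header "$GITEA_USERNAME" "$GITEA_PASSWORD")" \
#     "http://gitea-http.gitea.svc.cluster.local:3000/api/v1/user"
```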

Bumps bp-self-sovereign-cutover 0.1.0 → 0.1.1 + slot 06a pin lockstep.

---------

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
2026-05-05 00:40:24 +04:00
e3mrah
33dc98782b
feat(bp-self-sovereign-cutover): chart + bootstrap-kit slot 06a (#791) (#808)
New platform Blueprint at `platform/self-sovereign-cutover/chart/`. Ships
DORMANT — eight step PodSpec ConfigMaps, the registry-pivot DaemonSet, the
mutable cutover-status ConfigMap, plus ServiceAccount/RBAC. The catalyst-api
cutover endpoint (#792, merged at 03828641) reads each step ConfigMap by
label selector and stamps real Jobs only on operator-driven trigger.

Step inventory:
  01 gitea-mirror             — git push --mirror upstream → local Gitea
  02 harbor-projects          — create 7 proxy-cache projects
  03 harbor-prewarm           — HEAD-pull bootstrap-kit images through cache
  04 registry-pivot           — DaemonSet rewrites registries.yaml on every node
  05 flux-gitrepository-patch — pivot GitRepository.url → local Gitea
  06 helmrepository-patches   — pivot 38 OCI URLs → local Harbor
  07 catalyst-api-env-patch   — kubectl set env CATALYST_GITOPS_REPO_URL
  08 egress-block-test        — CiliumNetworkPolicy + 10-min sovereignty proof

Plus self-sovereign-cutover-status ConfigMap with the consumer-contract keys
(cutoverComplete, currentStep, step.<name>.result, etc.) shipped at install
with helm.sh/resource-policy: keep so chart uninstall doesn't lose state.

Bootstrap-kit slot `06a-bp-self-sovereign-cutover.yaml` installs the chart
into the `catalyst` namespace (matches catalyst-api's default discovery
namespace), depends on bp-gitea + bp-harbor, uses disableWait: true.

RBAC splits `create` verbs into their own Rule WITHOUT resourceNames per
feedback_rbac_create_no_resourcenames.md — the bp-openbao loop anchor.

chart/tests/cutover-contract.sh enforces:
  - 8 step ConfigMaps render
  - required labels (part-of/component/cutover-order/cutover-mode)
  - required data keys (stepName + podSpec for job-mode)
  - step 04 mode=daemonset-wait
  - status ConfigMap retained on uninstall
  - RBAC create/resourceNames split
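Each gate follows the same render-then-grep shape; a toy version over an inline fixture (the label keys here are hypothetical — the real script greps actual `helm template` output):

```shell
#!/bin/sh
# Fixture standing in for `helm template .` output.
rendered=$(cat <<'EOF'
kind: ConfigMap
metadata:
  labels:
    cutover-order: "01"
    cutover-mode: "job"
---
kind: ConfigMap
metadata:
  labels:
    cutover-order: "02"
    cutover-mode: "job"
EOF
)

# Gate: exactly N step ConfigMaps carry the cutover-order label.
gate_step_count() {
  [ "$(printf '%s\n' "$rendered" | grep -c 'cutover-order:')" -eq "$1" ]
}
```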

helm template smoke render: 1180 lines, 19 resources (1 Namespace + 1 SA +
11 ConfigMaps + 1 DaemonSet + 1 ClusterRole + 1 ClusterRoleBinding).
helm lint: clean.
scripts/check-bootstrap-deps.sh: PASSED (slot 6a registered, depends_on
[bp-gitea, bp-harbor]).

Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 21:55:19 +04:00