openova/platform/llm-gateway
e3mrah 87d9a4afa7
feat(charts): bp-temporal + bp-llm-gateway + bp-anthropic-adapter wrapper charts (closes #267 #268 #271) (#288)
W2.5.E batch — three Application-tier Blueprints completing the LLM
serving / workflow stack:

- bp-temporal/1.0.0 — wraps temporal/temporal 1.2.0 (the new chart
  rewrite that removed cassandra:/mysql:/postgresql:/elasticsearch:/
  prometheus:/grafana: top-level keys in favour of
  server.config.persistence.datastores). Postgres-only via CNPG-backed
  visibility store (skip Cassandra). Web UI ON. Keycloak OIDC
  integration via --auth-claim-mapper renders auth.yaml ConfigMap
  (operator wires via additionalVolumes once bp-keycloak is
  reconciled, default OFF). dependsOn: bp-cnpg + bp-cert-manager.
  Closes #271.
  Kinds: Cluster (CNPG) + ConfigMap + Deployment + Job + Pod +
  Service.

- bp-llm-gateway/1.0.0 — wraps berriai/litellm-helm 0.1.572 from OCI.
  Subscription-aware proxy for Claude Code: routes to Anthropic (via
  operator OAuth/Max subscription — NEVER an ANTHROPIC_API_KEY,
  per memory/feedback_no_api_key.md), Bedrock, Vertex,
  OpenAI-compatible (via bp-anthropic-adapter), and self-hosted
  vLLM. CNPG-backed audit log (every prompt + response persisted
  for compliance). Bundled bitnami postgresql + redis subcharts
  DISABLED (db.useExisting=true points at the CNPG cluster).
  Keycloak SSO via auth.yaml ConfigMap (default OFF).
  ExternalSecret-backed environmentSecrets brings tokens / IAM
  creds in without inlining plaintext. dependsOn: bp-cnpg +
  bp-keycloak + bp-external-secrets. Closes #267.
  Kinds: Cluster (CNPG audit) + ConfigMap + Deployment + Job +
  Pod + Secret + Service + ServiceAccount.

- bp-anthropic-adapter/1.0.0 — Catalyst-authored scratch chart for
  the OpenAI ↔ Anthropic translation Go service. SHA-pinned image
  ghcr.io/openova-io/openova/anthropic-adapter:<sha> (Inviolable
  Principle #4a — GitHub Actions is the only build path; empty
  default tag fails the render with a clear error instead of
  silently shipping :latest). OAuth/Max subscription token mounted
  from K8s Secret materialized by ESO from bp-openbao —
  ANTHROPIC_OAUTH_TOKEN env var, NEVER an ANTHROPIC_API_KEY.
  Includes OpenAI → Anthropic model-mapping ConfigMap (gpt-4 →
  claude-3-5-sonnet, gpt-4o-mini → claude-3-5-haiku, etc.).
  sigstore/common library subchart included to satisfy the
  hollow-chart gate (matches bp-vllm pattern from #283).
  dependsOn: bp-external-secrets. Closes #268.
  Kinds: ConfigMap + Deployment + Service + ServiceAccount.

CRITICAL — bp-llm-gateway and bp-anthropic-adapter both consume the
operator's Claude OAuth/Max subscription. Per memory/
feedback_no_api_key.md and the user's standing instruction, neither
chart accepts or generates an ANTHROPIC_API_KEY. Tokens flow
exclusively through ExternalSecret-managed K8s Secrets that ESO
materializes from bp-openbao at install time.

Per docs/BLUEPRINT-AUTHORING.md §11.2 (issue #182): every
observability toggle defaults `false` (ServiceMonitor / metrics
sidecar / PodMonitor) and is operator-tunable via per-cluster
overlay once bp-kube-prometheus-stack reconciles. Each chart ships
tests/observability-toggle.sh covering default-off, opt-in (with
--api-versions monitoring.coreos.com/v1 to simulate the CRDs), and
explicit-off cases. bp-anthropic-adapter additionally tests the
never-:latest gate via Case 4 (empty image tag must fail render).

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every
upstream version, namespace, server URL, role, secret name, model
default, and toggle is exposed under values.yaml. Cluster overlays
in clusters/<sovereign>/ may override without rebuilding the
Blueprint OCI artifact.

Per docs/BLUEPRINT-AUTHORING.md §11.1 (umbrella shape — hard
contract): bp-temporal and bp-llm-gateway declare their upstream
charts under Chart.yaml dependencies: so helm dependency build
bundles the upstream payload into the OCI artifact. bp-anthropic-
adapter is a scratch chart (no upstream Helm chart exists) and
includes sigstore/common as the obligatory hollow-chart-gate
dependency, matching the bp-vllm precedent from W2.5.D (#283).

Closes #267
Closes #268
Closes #271

helm lint: 1 chart(s) linted, 0 chart(s) failed (each, INFO icon-recommended only)

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-04-30 19:37:19 +04:00

LLM Gateway

Subscription-based proxy for LLM access via Claude Code. Application Blueprint (see docs/PLATFORM-TECH-STACK.md §4.6). Catalyst's outbound LLM access point — routes between Claude API, GPT-4 API, self-hosted vLLM, and Axon (the SaaS gateway). Used by bp-cortex.

Status: Accepted | Updated: 2026-04-27


Overview

LLM Gateway lets users with Claude/OpenAI subscriptions use Claude Code against internal and external models without pay-as-you-go API billing.

```mermaid
flowchart LR
    subgraph Gateway["LLM Gateway"]
        Auth[Subscription Auth]
        Quota[Usage Quota]
        Router[Model Router]
    end

    subgraph Backends["LLM Backends"]
        Internal[Internal LLM<br/>vLLM/KServe]
        Claude[Claude API]
        OpenAI[OpenAI API]
    end

    User[Claude Code User] -->|Subscription Token| Gateway
    Gateway --> Auth
    Auth --> Quota
    Quota --> Router
    Router --> Backends
```

Why LLM Gateway?

| Feature | Benefit |
| --- | --- |
| Subscription-based | No API billing required |
| Quota management | Fair usage limits |
| Model routing | Internal + external models |
| Claude Code support | Native integration |

How It Works

  1. User authenticates with their subscription credentials
  2. Gateway validates subscription status
  3. Requests are routed to internal or external LLMs
  4. Usage is tracked against subscription quota
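The four steps above can be sketched end to end. This is a minimal illustration with in-memory stand-ins for Keycloak and Redis; names and tier limits are assumptions, not the gateway's actual implementation:

```python
# Illustrative request pipeline: authenticate -> validate -> quota -> route.
# The in-memory dicts stand in for Keycloak (subscriptions) and Redis (usage).
from dataclasses import dataclass, field

TIER_LIMITS = {"free": 10, "pro": 1000, "enterprise": float("inf")}

@dataclass
class Gateway:
    subscriptions: dict                      # user_id -> tier (Keycloak stand-in)
    usage: dict = field(default_factory=dict)  # user_id -> requests today

    def handle(self, user_id: str, model: str) -> str:
        tier = self.subscriptions.get(user_id)   # 1-2. authenticate + validate
        if tier is None:
            return "401 Unauthorized"
        used = self.usage.get(user_id, 0)
        if used >= TIER_LIMITS[tier]:            # 4. quota enforcement
            return "429 Rate Limited"
        self.usage[user_id] = used + 1
        # 3. route: free tier always goes to the internal model
        backend = "internal" if tier == "free" else model
        return f"routed:{backend}"
```

A free-tier user is always served internally and cut off at the daily limit, while paid tiers pass the requested model through.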

Configuration

Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-gateway
  namespace: ai-hub
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-gateway
  template:
    metadata:
      labels:
        app: llm-gateway
    spec:
      containers:
        - name: gateway
          # SHA-pinned image; never :latest (Inviolable Principle #4a)
          image: harbor.<location-code>.<sovereign-domain>/ai-hub/llm-gateway:<sha>
          ports:
            - containerPort: 8000
          env:
            - name: INTERNAL_LLM_URL
              value: "http://vllm.ai-hub.svc:8000/v1"
            # Subscription OAuth token, materialized by ESO from bp-openbao.
            # Never an ANTHROPIC_API_KEY (see CRITICAL note above).
            - name: ANTHROPIC_OAUTH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: anthropic-oauth-token
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: openai-api-key
            - name: QUOTA_REDIS_URL
              value: "redis://valkey.ai-hub.svc:6379"
            - name: AUTH_PROVIDER
              value: "keycloak"
            - name: KEYCLOAK_URL
              value: "https://keycloak.<location-code>.<sovereign-domain>/realms/<org>"
```

Subscription Tiers

| Tier | Daily Quota | Models |
| --- | --- | --- |
| Free | 10 requests | Internal only |
| Pro | 1,000 requests | Internal + Claude Haiku |
| Enterprise | Unlimited | All models |

Authentication Flow

```mermaid
sequenceDiagram
    participant User
    participant ClaudeCode
    participant Gateway
    participant Keycloak
    participant LLM

    User->>ClaudeCode: Configure gateway URL
    ClaudeCode->>Gateway: Request + Token
    Gateway->>Keycloak: Validate subscription
    Keycloak-->>Gateway: Subscription tier
    Gateway->>Gateway: Check quota
    alt Quota available
        Gateway->>LLM: Forward request
        LLM-->>Gateway: Response
        Gateway->>Gateway: Decrement quota
        Gateway-->>ClaudeCode: Response
    else Quota exceeded
        Gateway-->>ClaudeCode: 429 Rate Limited
    end
```

Model Routing

```python
# Model routing logic
def route_model(request_model: str, tier: str) -> str:
    routing = {
        "free": {
            "claude-3-opus": "qwen3-32b",  # Route to internal
            "claude-3-sonnet": "qwen3-32b",
            "gpt-4": "qwen3-32b",
        },
        "pro": {
            "claude-3-opus": "claude-3-haiku",  # Downgrade
            "claude-3-sonnet": "claude-3-haiku",
            "claude-3-haiku": "claude-3-haiku",  # Pass through
            "gpt-4": "qwen3-32b",
        },
        "enterprise": {},  # Pass through all models unchanged
    }
    return routing.get(tier, {}).get(request_model, request_model)
```

Quota Management

```python
# Redis-based quota tracking (async client, e.g. redis.asyncio)
from datetime import date

LIMITS = {"free": 10, "pro": 1000, "enterprise": float("inf")}

async def check_quota(user_id: str, tier: str) -> bool:
    key = f"quota:{user_id}:{date.today().isoformat()}"
    # INCR first, then compare: INCR is atomic, so two concurrent
    # requests cannot both read the same count and slip under the limit
    # (the naive GET-then-INCR pattern races).
    used = await redis.incr(key)
    if used == 1:
        await redis.expire(key, 86400)  # clean up yesterday's date-stamped keys
    return used <= LIMITS[tier]
```

Claude Code Setup

```bash
# Configure Claude Code to use the gateway.
# ANTHROPIC_AUTH_TOKEN carries the subscription token as a bearer
# credential — not an ANTHROPIC_API_KEY (see CRITICAL note above).
export ANTHROPIC_AUTH_TOKEN="your-subscription-token"
export ANTHROPIC_BASE_URL="https://llm-gateway.<env>.<sovereign-domain>/v1"

# Or in claude code config
claude config set api_base "https://llm-gateway.<env>.<sovereign-domain>/v1"
claude config set api_key "your-subscription-token"
```

API Endpoints

| Endpoint | Purpose |
| --- | --- |
| `/v1/messages` | Anthropic-compatible chat |
| `/v1/chat/completions` | OpenAI-compatible chat |
| `/v1/models` | List available models |
| `/quota` | Check remaining quota |
| `/health` | Health check |
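A client call against the OpenAI-compatible endpoint can be assembled as below. The helper is a hypothetical illustration (URL and token are placeholders); the body follows the standard OpenAI chat-completions schema, which LiteLLM-based proxies accept:

```python
# Build URL, headers, and JSON body for POST /v1/chat/completions.
# base_url and token are placeholders; send with any HTTP client.
def build_chat_call(base_url: str, token: str, model: str, prompt: str):
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",  # subscription token, not an API key
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return url, headers, body
```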

Monitoring

| Metric | Query |
| --- | --- |
| Requests by tier | `gateway_requests_total{tier}` |
| Quota usage | `gateway_quota_used{user}` |
| Model routing | `gateway_model_routes_total{from, to}` |
| Latency | `gateway_request_duration_seconds` |
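These metrics could be emitted with the `prometheus_client` Python library, as in the sketch below. Metric names match the table; the `record_request` helper is an assumption, and the table's `from`/`to` labels are renamed `src`/`dst` here because `from` is a Python keyword:

```python
# Emit the gateway metrics listed in the table above.
from prometheus_client import Counter, Histogram, generate_latest

REQUESTS = Counter("gateway_requests_total", "Requests by tier", ["tier"])
ROUTES = Counter("gateway_model_routes_total", "Model routing", ["src", "dst"])
LATENCY = Histogram("gateway_request_duration_seconds", "Request latency")

def record_request(tier: str, src: str, dst: str, seconds: float) -> None:
    REQUESTS.labels(tier=tier).inc()
    ROUTES.labels(src=src, dst=dst).inc()
    LATENCY.observe(seconds)
```

Per the observability contract (BLUEPRINT-AUTHORING.md §11.2), scraping stays off by default; the ServiceMonitor toggle is enabled per cluster once bp-kube-prometheus-stack reconciles.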

Consequences

Positive:

  • No API billing for users
  • Subscription-based access
  • Quota management
  • Model routing flexibility
  • Claude Code compatible

Negative:

  • Additional infrastructure
  • Quota management complexity
  • Subscription validation overhead

Part of OpenOva