W2.5.E batch — three Application-tier Blueprints completing the LLM serving / workflow stack:

- bp-temporal/1.0.0 — wraps temporal/temporal 1.2.0 (the new chart rewrite that removed the cassandra:/mysql:/postgresql:/elasticsearch:/prometheus:/grafana: top-level keys in favour of server.config.persistence.datastores). Postgres-only via a CNPG-backed visibility store (Cassandra skipped). Web UI ON. Keycloak OIDC integration via --auth-claim-mapper renders an auth.yaml ConfigMap (the operator wires it via additionalVolumes once bp-keycloak is reconciled; default OFF). dependsOn: bp-cnpg + bp-cert-manager. Closes #271. Kinds: Cluster (CNPG) + ConfigMap + Deployment + Job + Pod + Service.
- bp-llm-gateway/1.0.0 — wraps berriai/litellm-helm 0.1.572 from OCI. Subscription-aware proxy for Claude Code: routes to Anthropic (via the operator's OAuth/Max subscription — NEVER an ANTHROPIC_API_KEY, per memory/feedback_no_api_key.md), Bedrock, Vertex, OpenAI-compatible endpoints (via bp-anthropic-adapter), and self-hosted vLLM. CNPG-backed audit log (every prompt + response persisted for compliance). Bundled bitnami postgresql + redis subcharts DISABLED (db.useExisting=true points at the CNPG cluster). Keycloak SSO via auth.yaml ConfigMap (default OFF). ExternalSecret-backed environmentSecrets bring tokens / IAM creds in without inlining plaintext. dependsOn: bp-cnpg + bp-keycloak + bp-external-secrets. Closes #267. Kinds: Cluster (CNPG audit) + ConfigMap + Deployment + Job + Pod + Secret + Service + ServiceAccount.
- bp-anthropic-adapter/1.0.0 — Catalyst-authored scratch chart for the OpenAI ↔ Anthropic translation Go service. SHA-pinned image ghcr.io/openova-io/openova/anthropic-adapter:<sha> (Inviolable Principle #4a — GitHub Actions is the only build path; an empty default tag fails the render with a clear error instead of silently shipping :latest). OAuth/Max subscription token mounted from a K8s Secret materialized by ESO from bp-openbao — ANTHROPIC_OAUTH_TOKEN env var, NEVER an ANTHROPIC_API_KEY. Includes an OpenAI → Anthropic model-mapping ConfigMap (gpt-4 → claude-3-5-sonnet, gpt-4o-mini → claude-3-5-haiku, etc.). sigstore/common library subchart included to satisfy the hollow-chart gate (matches the bp-vllm pattern from #283). dependsOn: bp-external-secrets. Closes #268. Kinds: ConfigMap + Deployment + Service + ServiceAccount.

CRITICAL — bp-llm-gateway and bp-anthropic-adapter both consume the operator's Claude OAuth/Max subscription. Per memory/feedback_no_api_key.md and the user's standing instruction, neither chart accepts or generates an ANTHROPIC_API_KEY. Tokens flow exclusively through ExternalSecret-managed K8s Secrets that ESO materializes from bp-openbao at install time.

Per docs/BLUEPRINT-AUTHORING.md §11.2 (issue #182): every observability toggle defaults `false` (ServiceMonitor / metrics sidecar / PodMonitor) and is operator-tunable via a per-cluster overlay once bp-kube-prometheus-stack reconciles. Each chart ships tests/observability-toggle.sh covering the default-off, opt-in (with --api-versions monitoring.coreos.com/v1 to simulate the CRDs), and explicit-off cases. bp-anthropic-adapter additionally tests the never-:latest gate via Case 4 (an empty image tag must fail the render).

Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode): every upstream version, namespace, server URL, role, secret name, model default, and toggle is exposed under values.yaml. Cluster overlays in clusters/<sovereign>/ may override without rebuilding the Blueprint OCI artifact.

Per docs/BLUEPRINT-AUTHORING.md §11.1 (umbrella shape — hard contract): bp-temporal and bp-llm-gateway declare their upstream charts under Chart.yaml dependencies: so helm dependency build bundles the upstream payload into the OCI artifact. bp-anthropic-adapter is a scratch chart (no upstream Helm chart exists) and includes sigstore/common as the obligatory hollow-chart-gate dependency, matching the bp-vllm precedent from W2.5.D (#283).
Closes #267
Closes #268
Closes #271

helm lint: 1 chart(s) linted, 0 chart(s) failed (each chart; only an INFO-level icon-recommended notice)

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
# LLM Gateway
Subscription-based proxy for LLM access via Claude Code. Application Blueprint (see docs/PLATFORM-TECH-STACK.md §4.6). Catalyst's outbound LLM access point — routes between Claude API, GPT-4 API, self-hosted vLLM, and Axon (the SaaS gateway). Used by bp-cortex.
Status: Accepted | Updated: 2026-04-27
## Overview
LLM Gateway enables users with Claude/OpenAI subscriptions to use Claude Code with internal models without requiring API pay-as-you-go billing.
```mermaid
flowchart LR
    subgraph Gateway["LLM Gateway"]
        Auth[Subscription Auth]
        Quota[Usage Quota]
        Router[Model Router]
    end
    subgraph Backends["LLM Backends"]
        Internal[Internal LLM<br/>vLLM/KServe]
        Claude[Claude API]
        OpenAI[OpenAI API]
    end
    User[Claude Code User] -->|Subscription Token| Gateway
    Gateway --> Auth
    Auth --> Quota
    Quota --> Router
    Router --> Backends
```
## Why LLM Gateway?
| Feature | Benefit |
|---|---|
| Subscription-based | No API billing required |
| Quota management | Fair usage limits |
| Model routing | Internal + external models |
| Claude Code support | Native integration |
## How It Works

1. User authenticates with their subscription credentials
2. Gateway validates subscription status
3. Requests are routed to internal or external LLMs
4. Usage is tracked against subscription quota
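The four steps above can be sketched as a single pipeline. This is a hypothetical illustration — the function and interface names (`handle`, `auth.validate`, `quota.consume`, `router.route`) are made up for clarity, not the gateway's actual code:

```python
# Hypothetical sketch of the gateway's request pipeline.
# All names and interfaces here are illustrative assumptions.
def handle(request, auth, quota, router, backends):
    tier = auth.validate(request.token)           # 1. validate subscription, get tier
    if not quota.consume(request.user_id, tier):  # 2. enforce the daily quota
        return {"status": 429, "error": "quota exceeded"}
    backend = router.route(request.model, tier)   # 3. pick internal or external backend
    return backends[backend].complete(request)    # 4. forward; usage was counted in step 2
```

A request that exhausts its quota short-circuits at step 2 with a 429 before any backend is contacted.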
## Configuration

### Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-gateway
  namespace: ai-hub
spec:
  replicas: 2
  selector:            # required by the Deployment API; label value illustrative
    matchLabels:
      app: llm-gateway
  template:
    metadata:
      labels:
        app: llm-gateway
    spec:
      containers:
        - name: gateway
          image: harbor.<location-code>.<sovereign-domain>/ai-hub/llm-gateway:latest
          ports:
            - containerPort: 8000
          env:
            - name: INTERNAL_LLM_URL
              value: "http://vllm.ai-hub.svc:8000/v1"
            - name: CLAUDE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: claude-api-key
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: openai-api-key
            - name: QUOTA_REDIS_URL
              value: "redis://valkey.ai-hub.svc:6379"
            - name: AUTH_PROVIDER
              value: "keycloak"
            - name: KEYCLOAK_URL
              value: "https://keycloak.<location-code>.<sovereign-domain>/realms/<org>"
```
## Subscription Tiers
| Tier | Daily Quota | Models |
|---|---|---|
| Free | 10 requests | Internal only |
| Pro | 1,000 requests | Internal + Claude Haiku |
| Enterprise | Unlimited | All models |
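Per the never-hardcode principle, tier limits like these would typically be exposed under values.yaml so a cluster overlay can override them without rebuilding the OCI artifact. A hypothetical sketch — the key names are illustrative, not the chart's actual schema:

```yaml
# Hypothetical values.yaml fragment; key names are illustrative assumptions.
quotas:
  free:
    dailyRequests: 10
    models: ["internal"]
  pro:
    dailyRequests: 1000
    models: ["internal", "claude-3-haiku"]
  enterprise:
    dailyRequests: 0          # 0 = unlimited in this sketch
    models: ["*"]
```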
## Authentication Flow

```mermaid
sequenceDiagram
    participant User
    participant ClaudeCode
    participant Gateway
    participant Keycloak
    participant LLM
    User->>ClaudeCode: Configure gateway URL
    ClaudeCode->>Gateway: Request + Token
    Gateway->>Keycloak: Validate subscription
    Keycloak-->>Gateway: Subscription tier
    Gateway->>Gateway: Check quota
    alt Quota available
        Gateway->>LLM: Forward request
        LLM-->>Gateway: Response
        Gateway->>Gateway: Decrement quota
        Gateway-->>ClaudeCode: Response
    else Quota exceeded
        Gateway-->>ClaudeCode: 429 Rate Limited
    end
```
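Once Keycloak has validated the token, the gateway still needs the subscription tier out of the token's claims. A minimal sketch, assuming a hypothetical `subscription_tier` claim; this only decodes the payload — signature verification remains Keycloak's job in the flow above:

```python
import base64
import json


def tier_from_token(token: str) -> str:
    """Read a (hypothetical) subscription_tier claim from a JWT payload.

    Decode only -- the signature is verified by Keycloak during the
    "Validate subscription" step, not here.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("subscription_tier", "free")  # unknown users default to free
```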
## Model Routing

```python
# Model routing logic
def route_model(request_model: str, tier: str) -> str:
    routing = {
        "free": {
            "claude-3-opus": "qwen3-32b",  # Route to internal
            "claude-3-sonnet": "qwen3-32b",
            "gpt-4": "qwen3-32b",
        },
        "pro": {
            "claude-3-opus": "claude-3-haiku",  # Downgrade
            "claude-3-sonnet": "claude-3-haiku",
            "claude-3-haiku": "claude-3-haiku",  # Pass through
            "gpt-4": "qwen3-32b",
        },
        "enterprise": {
            # Pass through all models
        },
    }
    return routing.get(tier, {}).get(request_model, request_model)
```
## Quota Management

```python
# Redis-based quota tracking
async def check_quota(user_id: str, tier: str) -> bool:
    key = f"quota:{user_id}:{today()}"
    limits = {"free": 10, "pro": 1000, "enterprise": float("inf")}
    # INCR is atomic, so concurrent requests cannot slip past the limit
    # the way a separate GET-then-INCR would allow.
    current = await redis.incr(key)
    if current == 1:
        # Only the first request of the day sets the TTL; re-setting it on
        # every call would keep pushing the daily reset into the future.
        await redis.expire(key, 86400)  # Reset daily
    return current <= limits[tier]
```
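To illustrate the daily-window semantics, here is a self-contained harness with an in-memory Redis stub. The stub and the inlined quota logic are illustrative restatements, not the gateway's actual code:

```python
import asyncio


class FakeRedis:
    """Minimal in-memory stand-in for the two Redis calls the quota check uses."""

    def __init__(self):
        self.store = {}

    async def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]

    async def expire(self, key, seconds):
        pass  # TTL ignored in the stub; real Redis resets the counter daily


async def check_quota(redis, user_id: str, tier: str, day: str) -> bool:
    key = f"quota:{user_id}:{day}"
    limits = {"free": 10, "pro": 1000, "enterprise": float("inf")}
    count = await redis.incr(key)       # atomic increment, no check-then-set race
    if count == 1:
        await redis.expire(key, 86400)  # first request of the day starts the window
    return count <= limits[tier]


async def main():
    r = FakeRedis()
    results = [await check_quota(r, "alice", "free", "2026-04-27") for _ in range(11)]
    print(results.count(True), results.count(False))  # prints "10 1"


asyncio.run(main())
```

The free tier's 11th request in the same day is denied; a new day means a new key, so the counter starts fresh.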
## Claude Code Setup

```bash
# Configure Claude Code to use the gateway. The value below is the subscription
# token issued through the gateway -- not a pay-as-you-go Anthropic API key.
export ANTHROPIC_API_KEY="your-subscription-token"
export ANTHROPIC_BASE_URL="https://llm-gateway.<env>.<sovereign-domain>/v1"

# Or in the Claude Code config
claude config set api_base "https://llm-gateway.<env>.<sovereign-domain>/v1"
claude config set api_key "your-subscription-token"
```
## API Endpoints

| Endpoint | Purpose |
|---|---|
| `/v1/messages` | Anthropic-compatible chat |
| `/v1/chat/completions` | OpenAI-compatible chat |
| `/v1/models` | List available models |
| `/quota` | Check remaining quota |
| `/health` | Health check |
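As a client-side sketch, this builds a request against the OpenAI-compatible endpoint using only the standard library. The gateway URL and model name are placeholders, and `chat_request` is an illustrative helper, not part of any shipped SDK:

```python
import json
import urllib.request


def chat_request(base_url: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /v1/chat/completions request (not yet sent)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # subscription token, not an API key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)`; the gateway routes the named model per the tier rules above.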
## Monitoring

| Metric | Query |
|---|---|
| Requests by tier | `gateway_requests_total{tier}` |
| Quota usage | `gateway_quota_used{user}` |
| Model routing | `gateway_model_routes_total{from, to}` |
| Latency | `gateway_request_duration_seconds` |
## Consequences
Positive:
- No API billing for users
- Subscription-based access
- Quota management
- Model routing flexibility
- Claude Code compatible
Negative:
- Additional infrastructure
- Quota management complexity
- Subscription validation overhead
Part of OpenOva