History

talent-mesh c9d04a53b4 refactor: flatten platform/ structure (41 components) Remove hierarchical grouping (networking/, security/, etc.) and use flat structure for all 41 platform components. Changes: - All components now directly under platform/ (no subfolders) - AI Hub components moved from meta-platforms/ai-hub/components/ to platform/ - Open Banking components (lago, openmeter) moved to platform/ - meta-platforms/ now only contains README files that reference platform/ - Open Banking custom services remain in meta-platforms/open-banking/services/ Structure: - platform/ (41 components, flat) - meta-platforms/ai-hub/ (README only, references platform/) - meta-platforms/open-banking/ (README + 6 custom services) All documentation links updated. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:19:48 +00:00
..
README.md	refactor: flatten platform/ structure (41 components)	2026-02-08 15:19:48 +00:00

talent-mesh c9d04a53b4 refactor: flatten platform/ structure (41 components)

Remove hierarchical grouping (networking/, security/, etc.) and use flat
structure for all 41 platform components.

Changes:
- All components now directly under platform/ (no subfolders)
- AI Hub components moved from meta-platforms/ai-hub/components/ to platform/
- Open Banking components (lago, openmeter) moved to platform/
- meta-platforms/ now only contains README files that reference platform/
- Open Banking custom services remain in meta-platforms/open-banking/services/

Structure:
- platform/ (41 components, flat)
- meta-platforms/ai-hub/ (README only, references platform/)
- meta-platforms/open-banking/ (README + 6 custom services)

All documentation links updated.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-08 15:19:48 +00:00

README.md

refactor: flatten platform/ structure (41 components)

2026-02-08 15:19:48 +00:00

README.md

LLM Gateway

Subscription-based proxy for LLM access via Claude Code.

Status: Accepted | Updated: 2026-02-07

Overview

LLM Gateway enables users with Claude/OpenAI subscriptions to use Claude Code with internal models without requiring API pay-as-you-go billing.

flowchart LR
    subgraph Gateway["LLM Gateway"]
        Auth[Subscription Auth]
        Quota[Usage Quota]
        Router[Model Router]
    end

    subgraph Backends["LLM Backends"]
        Internal[Internal LLM<br/>vLLM/KServe]
        Claude[Claude API]
        OpenAI[OpenAI API]
    end

    User[Claude Code User] -->|Subscription Token| Gateway
    Gateway --> Auth
    Auth --> Quota
    Quota --> Router
    Router --> Backends

Why LLM Gateway?

Feature	Benefit
Subscription-based	No API billing required
Quota management	Fair usage limits
Model routing	Internal + external models
Claude Code support	Native integration

How It Works

User authenticates with their subscription credentials
Gateway validates subscription status
Requests are routed to internal or external LLMs
Usage is tracked against subscription quota

Configuration

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-gateway
  namespace: ai-hub
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: gateway
          image: harbor.<domain>/ai-hub/llm-gateway:latest
          ports:
            - containerPort: 8000
          env:
            - name: INTERNAL_LLM_URL
              value: "http://vllm.ai-hub.svc:8000/v1"
            - name: CLAUDE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: claude-api-key
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: openai-api-key
            - name: QUOTA_REDIS_URL
              value: "redis://valkey.ai-hub.svc:6379"
            - name: AUTH_PROVIDER
              value: "keycloak"
            - name: KEYCLOAK_URL
              value: "https://keycloak.<domain>/realms/ai-hub"

Subscription Tiers

Tier	Daily Quota	Models
Free	10 requests	Internal only
Pro	1,000 requests	Internal + Claude Haiku
Enterprise	Unlimited	All models

Authentication Flow

sequenceDiagram
    participant User
    participant ClaudeCode
    participant Gateway
    participant Keycloak
    participant LLM

    User->>ClaudeCode: Configure gateway URL
    ClaudeCode->>Gateway: Request + Token
    Gateway->>Keycloak: Validate subscription
    Keycloak-->>Gateway: Subscription tier
    Gateway->>Gateway: Check quota
    alt Quota available
        Gateway->>LLM: Forward request
        LLM-->>Gateway: Response
        Gateway->>Gateway: Decrement quota
        Gateway-->>ClaudeCode: Response
    else Quota exceeded
        Gateway-->>ClaudeCode: 429 Rate Limited
    end

Model Routing

# Model routing logic
def route_model(request_model: str, tier: str) -> str:
    routing = {
        "free": {
            "claude-3-opus": "qwen3-32b",  # Route to internal
            "claude-3-sonnet": "qwen3-32b",
            "gpt-4": "qwen3-32b"
        },
        "pro": {
            "claude-3-opus": "claude-3-haiku",  # Downgrade
            "claude-3-sonnet": "claude-3-haiku",
            "claude-3-haiku": "claude-3-haiku",  # Pass through
            "gpt-4": "qwen3-32b"
        },
        "enterprise": {
            # Pass through all models
        }
    }
    return routing.get(tier, {}).get(request_model, request_model)

Quota Management

# Redis-based quota tracking
async def check_quota(user_id: str, tier: str) -> bool:
    key = f"quota:{user_id}:{today()}"
    current = await redis.get(key) or 0

    limits = {"free": 10, "pro": 1000, "enterprise": float("inf")}

    if int(current) >= limits[tier]:
        return False

    await redis.incr(key)
    await redis.expire(key, 86400)  # Reset daily
    return True

Claude Code Setup

# Configure Claude Code to use gateway
export ANTHROPIC_API_KEY="your-subscription-token"
export ANTHROPIC_BASE_URL="https://llm-gateway.ai-hub.<domain>/v1"

# Or in claude code config
claude config set api_base "https://llm-gateway.ai-hub.<domain>/v1"
claude config set api_key "your-subscription-token"

API Endpoints

Endpoint	Purpose
`/v1/messages`	Anthropic-compatible chat
`/v1/chat/completions`	OpenAI-compatible chat
`/v1/models`	List available models
`/quota`	Check remaining quota
`/health`	Health check

Monitoring

Metric	Query
Requests by tier	`gateway_requests_total{tier}`
Quota usage	`gateway_quota_used{user}`
Model routing	`gateway_model_routes_total{from, to}`
Latency	`gateway_request_duration_seconds`

Consequences

Positive:

No API billing for users
Subscription-based access
Quota management
Model routing flexibility
Claude Code compatible

Negative:

Additional infrastructure
Quota management complexity
Subscription validation overhead

Part of OpenOva