# OpenOva Cortex

Enterprise AI platform with LLM serving, RAG, AI safety, and LLM observability.

Status: Accepted | Updated: 2026-04-28
## Overview

OpenOva Cortex bundles AI/ML infrastructure components with AI safety and observability into a single product for enterprise AI deployments.
```mermaid
flowchart TB
    subgraph UI["User Interfaces"]
        LibreChat[LibreChat<br/>Chat UI]
        ClaudeCode[Claude Code]
    end

    subgraph Safety["AI Safety"]
        Guardrails[NeMo Guardrails<br/>Safety Firewall]
    end

    subgraph Gateway["Gateway Layer"]
        LLMGateway[LLM Gateway]
        Adapter[Anthropic Adapter]
    end

    subgraph Serving["Model Serving"]
        KServe[KServe]
        vLLM[vLLM]
    end

    subgraph Knowledge["Knowledge Layer"]
        Milvus[Milvus<br/>Vectors]
        Neo4j[Neo4j<br/>Graph]
    end

    subgraph Embeddings["Embeddings"]
        BGE[BGE-M3]
        Reranker[BGE-Reranker]
    end

    subgraph Observability["AI Observability"]
        LangFuse[LangFuse]
    end

    UI --> Safety
    Safety --> Gateway
    Gateway --> Serving
    Serving --> Knowledge
    Serving --> Embeddings
    Gateway --> Observability
```
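All traffic passes through the safety and gateway layers before reaching the model servers. As a minimal sketch of that path, assuming the LLM Gateway exposes the OpenAI-compatible API that vLLM serves (the hostname, token, and model name below are illustrative placeholders, not confirmed values):

```bash
# Hypothetical chat completion through the LLM Gateway.
# Host, token, and model are placeholders, not confirmed values.
curl -s "https://llm-gateway.<env>.<sovereign-domain>/v1/chat/completions" \
  -H "Authorization: Bearer your-subscription-token" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3-32b",
        "messages": [{"role": "user", "content": "Summarize our data residency policy."}]
      }'
```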
## Components

All components are in `platform/` (flat structure):
| Component | Purpose | Location |
|-----------|---------|----------|
| `llm-gateway` | Subscription-based LLM access | `platform/llm-gateway` |
| `anthropic-adapter` | Claude API translation | `platform/anthropic-adapter` |
| `knative` | Serverless platform | `platform/knative` |
| `kserve` | Model serving | `platform/kserve` |
| `vllm` | LLM inference | `platform/vllm` |
| `milvus` | Vector database | `platform/milvus` |
| `neo4j` | Graph database | `platform/neo4j` |
| `librechat` | Chat UI | `platform/librechat` |
| `bge` | Embeddings + reranking | `platform/bge` |
| `nemo-guardrails` | AI safety firewall | `platform/nemo-guardrails` |
| `langfuse` | LLM observability | `platform/langfuse` |
## Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│                       User Interfaces                       │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│   │LibreChat │    │  Claude  │    │  Custom  │              │
│   │  (Chat)  │    │   Code   │    │   Apps   │              │
│   └────┬─────┘    └────┬─────┘    └────┬─────┘              │
└────────┼───────────────┼───────────────┼────────────────────┘
         │               │               │
         ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────┐
│                       AI Safety Layer                       │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                   NeMo Guardrails                   │    │
│  │    (Prompt injection, PII filter, topic control)    │    │
│  └──────────────────────┬──────────────────────────────┘    │
└─────────────────────────┼───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                        Gateway Layer                        │
│  ┌─────────────────────┐      ┌─────────────────────┐       │
│  │     LLM Gateway     │      │  Anthropic Adapter  │       │
│  │ (Subscription Proxy)│      │  (API Translation)  │       │
│  └──────────┬──────────┘      └──────────┬──────────┘       │
└─────────────┼────────────────────────────┼──────────────────┘
              │                            │
              ▼                            ▼
┌─────────────────────────────────────────────────────────────┐
│                        Model Serving                        │
│  ┌─────────────────────┐      ┌─────────────────────┐       │
│  │       KServe        │      │        vLLM         │       │
│  │   (Orchestration)   │      │     (Inference)     │       │
│  └─────────────────────┘      └─────────────────────┘       │
└─────────────┬────────────────────────────┬──────────────────┘
              │                            │
              ▼                            ▼
┌─────────────────────────────────────────────────────────────┐
│                       Knowledge Layer                       │
│  ┌─────────────────────┐      ┌─────────────────────┐       │
│  │       Milvus        │      │        Neo4j        │       │
│  │   (Vector Store)    │      │    (Graph Store)    │       │
│  └─────────────────────┘      └─────────────────────┘       │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                       Embedding Layer                       │
│  ┌─────────────────────┐      ┌─────────────────────┐       │
│  │       BGE-M3        │      │    BGE-Reranker     │       │
│  │    (Embeddings)     │      │   (Cross-Encoder)   │       │
│  └─────────────────────┘      └─────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
```

LangFuse sits alongside the stack and traces all LLM calls.
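Documents entering the knowledge layer are vectorized by BGE-M3 and reranked by BGE-Reranker at query time. A minimal sketch of an embedding call, assuming the BGE-M3 service exposes an OpenAI-compatible `/v1/embeddings` endpoint (the service hostname and port are assumptions):

```bash
# Hypothetical embedding request to the BGE-M3 service.
# Endpoint shape assumes an OpenAI-compatible embeddings API;
# the default VECTOR_DIM of 1024 matches BGE-M3 dense vectors.
curl -s "http://bge.ai-hub.svc:8080/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"model": "bge-m3", "input": "data residency requirements"}'
```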
## Agent Presets
| Agent | Purpose | Retrieval |
|-------|---------|-----------|
| Deep Thinker | Complex reasoning with CoT | None |
| Quick Thinker | Fast responses | None |
| Compliance Advisor | Regulatory knowledge | Vector + Graph |
| AIOps Advisor | Infrastructure docs | Vector |
| Dev Advisor | Development standards | Vector |
| CAD Advisor | Document comparison | Ephemeral Vector |
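Each preset pairs a model with a retrieval strategy. The snippet below is a purely hypothetical illustration of how a retrieval-backed preset could be declared; the schema and field names are invented for this sketch and are not taken from any component's actual configuration:

```yaml
# Hypothetical preset definition. Schema and field names are
# illustrative only, not the actual LibreChat/agent config format.
name: compliance-advisor
model: qwen3-32b                  # default LLM_MODEL
retrieval:
  vector:
    store: milvus
    collection: compliance-docs   # assumed collection name
  graph:
    store: neo4j
reranker: bge-reranker
```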
## Deployment

### Enable Cortex Product
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ai-hub
  namespace: flux-system
spec:
  interval: 10m
  path: ./ai-hub/deploy
  prune: true
  sourceRef:
    kind: GitRepository
    name: openova-blueprints
  postBuild:
    substitute:
      ORGANIZATION: ${ORGANIZATION}
      SOVEREIGN_DOMAIN: ${SOVEREIGN_DOMAIN}
      GPU_NODE_POOL: ${GPU_NODE_POOL}
```
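Once committed, reconciliation can be triggered and verified with the standard Flux CLI, using the name and namespace from the manifest above:

```bash
# Trigger an immediate reconcile of the ai-hub Kustomization
flux reconcile kustomization ai-hub -n flux-system

# Confirm it reports Ready at the expected revision
flux get kustomizations ai-hub -n flux-system
```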
## Configuration
| Parameter | Description | Default |
|-----------|-------------|---------|
| `ORGANIZATION` | Catalyst Organization identifier, the multi-tenancy unit per docs/GLOSSARY.md (replaces the banned term "tenant") | Required |
| `SOVEREIGN_DOMAIN` | Sovereign's base domain (e.g. omantel.openova.io, acme.com) | Required |
| `GPU_NODE_POOL` | GPU node label | Required |
| `LLM_MODEL` | Default LLM | `qwen3-32b` |
| `EMBEDDING_MODEL` | Embedding model | `bge-m3` |
| `VECTOR_DIM` | Vector dimensions | 1024 |
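The required variables feed Flux's post-build substitution. One way to supply them, sketched below, is a ConfigMap referenced from the Kustomization via `postBuild.substituteFrom` (the name `cluster-vars` and the values are illustrative, not a required convention):

```yaml
# Illustrative ConfigMap for post-build variable substitution.
# Name and values are assumptions, not a required convention.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-vars
  namespace: flux-system
data:
  ORGANIZATION: acme
  SOVEREIGN_DOMAIN: acme.com
  GPU_NODE_POOL: gpu-a10
```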
## Resource Requirements
| Component | Replicas | CPU | Memory | GPU |
|-----------|----------|-----|--------|-----|
| vLLM | 1 | 4 | 32Gi | 2x A10 |
| BGE-M3 | 1 | 2 | 4Gi | 1x A10 |
| BGE-Reranker | 1 | 1 | 2Gi | 1x A10 |
| Milvus | 3 | 2 | 8Gi | - |
| Neo4j | 1 | 2 | 4Gi | - |
| LibreChat | 2 | 0.5 | 1Gi | - |
| LLM Gateway | 2 | 0.25 | 512Mi | - |
| NeMo Guardrails | 2 | 1 | 2Gi | - |
| LangFuse | 2 | 0.5 | 1Gi | - |
| **Total** | - | ~16 | ~56Gi | 4x A10 |
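Observed usage can be compared against these figures once metrics-server is available in the cluster:

```bash
# Current CPU/memory consumption per pod, highest memory first
kubectl top pods -n ai-hub --sort-by=memory
```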
### GPU Requirements
| GPU Type | Minimum | Recommended |
|----------|---------|-------------|
| NVIDIA A10 | 2 | 4 |
| NVIDIA A100 | 1 | 2 |
| NVIDIA H100 | 1 | 1 |
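To confirm the cluster exposes enough GPUs before enabling the product (the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed):

```bash
# Allocatable GPUs per node; requires the NVIDIA device plugin
kubectl get nodes -o custom-columns="NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```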
## Use Cases

### Claude Code with Internal Models
```bash
# Configure Claude Code
export ANTHROPIC_BASE_URL="https://llm-gateway.<env>.<sovereign-domain>/v1"
export ANTHROPIC_API_KEY="your-subscription-token"

# Use Claude Code normally
claude "Explain this code..."
```
### RAG-Powered Chat
```text
# Access LibreChat
https://chat.<env>.<sovereign-domain>

# Select agent preset (e.g., Compliance Advisor)
# Upload documents for context
# Ask questions with citations
```
## Monitoring

### Key Metrics
| Metric | Query |
|--------|-------|
| LLM latency | `vllm_request_duration_seconds` |
| Token throughput | `vllm_generation_tokens_total` |
| GPU utilization | `DCGM_FI_DEV_GPU_UTIL` |
| Guardrail blocks | `nemo_guardrails_blocked_total` |
| LLM cost | via LangFuse dashboard |
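For example, assuming `vllm_request_duration_seconds` is exported as a Prometheus histogram, a p95 latency panel might use:

```promql
# p95 end-to-end LLM request latency over the last 5 minutes
histogram_quantile(0.95,
  sum(rate(vllm_request_duration_seconds_bucket[5m])) by (le))
```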
### Grafana Dashboards
| Dashboard | Purpose |
|-----------|---------|
| AI Hub Overview | Request rates, latencies |
| GPU Metrics | Utilization, memory |
| RAG Analytics | Retrieval quality, citations |
| AI Safety | Guardrail activations, blocked prompts |
| LLM Cost | Per-model, per-user cost tracking (LangFuse) |
## Operations

### Health Checks
```bash
# Check all components
kubectl get pods -n ai-hub

# Check vLLM
curl http://vllm.ai-hub.svc:8000/health

# Check Milvus
kubectl exec -it milvus-proxy-0 -n ai-hub -- curl localhost:9091/healthz
```
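Since models are deployed through KServe, the InferenceService status is worth checking as well (this assumes the models are registered in the same namespace):

```bash
# KServe model endpoints and their readiness
kubectl get inferenceservices -n ai-hub
```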
### Troubleshooting
| Issue | Cause | Resolution |
|-------|-------|------------|
| OOM on vLLM | Model too large | Increase GPU memory or use quantization |
| Slow retrieval | Index not optimized | Rebuild Milvus index |
| Empty responses | No relevant chunks | Check embedding quality |
| GPU not detected | Driver issue | Verify NVIDIA device plugin |
| Prompt injection | Guardrails not configured | Review NeMo Guardrails rules |
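For the "GPU not detected" case, a quick first check (the DaemonSet name varies by install method, so the grep below is a heuristic):

```bash
# Confirm the NVIDIA device plugin DaemonSet is running
kubectl get daemonsets -A | grep -i nvidia

# Verify the node advertises GPU capacity
kubectl describe node <gpu-node> | grep -A 2 "nvidia.com/gpu"
```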
Part of OpenOva