
OpenOva Cortex

Enterprise AI platform with LLM serving, RAG, AI safety, and LLM observability.

Status: Accepted | Updated: 2026-02-26


Overview

OpenOva Cortex bundles AI/ML infrastructure components with AI safety and LLM observability tooling into a single product for enterprise AI deployments.

flowchart TB
    subgraph UI["User Interfaces"]
        LibreChat[LibreChat<br/>Chat UI]
        ClaudeCode[Claude Code]
    end

    subgraph Safety["AI Safety"]
        Guardrails[NeMo Guardrails<br/>Safety Firewall]
    end

    subgraph Gateway["Gateway Layer"]
        LLMGateway[LLM Gateway]
        Adapter[Anthropic Adapter]
    end

    subgraph Serving["Model Serving"]
        KServe[KServe]
        vLLM[vLLM]
    end

    subgraph Knowledge["Knowledge Layer"]
        Milvus[Milvus<br/>Vectors]
        Neo4j[Neo4j<br/>Graph]
    end

    subgraph Embeddings["Embeddings"]
        BGE[BGE-M3]
        Reranker[BGE-Reranker]
    end

    subgraph Observability["AI Observability"]
        LangFuse[LangFuse]
    end

    UI --> Safety
    Safety --> Gateway
    Gateway --> Serving
    Serving --> Knowledge
    Serving --> Embeddings
    Gateway --> Observability

Components

All components are in platform/ (flat structure):

Component Purpose Location
llm-gateway Subscription-based LLM access platform/llm-gateway
anthropic-adapter Claude API translation platform/anthropic-adapter
knative Serverless platform platform/knative
kserve Model serving platform/kserve
vllm LLM inference platform/vllm
milvus Vector database platform/milvus
neo4j Graph database platform/neo4j
librechat Chat UI platform/librechat
bge Embeddings + reranking platform/bge
nemo-guardrails AI safety firewall platform/nemo-guardrails
langfuse LLM observability platform/langfuse

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     User Interfaces                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                  │
│  │LibreChat │  │Claude    │  │  Custom  │                  │
│  │  (Chat)  │  │  Code    │  │   Apps   │                  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘                  │
└───────┼─────────────┼─────────────┼─────────────────────────┘
        │             │             │
        ▼             ▼             ▼
┌─────────────────────────────────────────────────────────────┐
│                    AI Safety Layer                           │
│  ┌─────────────────────────────────────────────────────┐    │
│  │           NeMo Guardrails                           │    │
│  │  (Prompt injection, PII filter, topic control)      │    │
│  └──────────────────────┬──────────────────────────────┘    │
└─────────────────────────┼───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    Gateway Layer                            │
│  ┌─────────────────────┐  ┌─────────────────────┐          │
│  │    LLM Gateway      │  │  Anthropic Adapter  │          │
│  │ (Subscription Proxy)│  │  (API Translation)  │          │
│  └──────────┬──────────┘  └──────────┬──────────┘          │
└─────────────┼────────────────────────┼──────────────────────┘
              │                        │
              ▼                        ▼
┌─────────────────────────────────────────────────────────────┐
│                    Model Serving                            │
│  ┌─────────────────────┐  ┌─────────────────────┐          │
│  │       KServe        │  │        vLLM         │          │
│  │   (Orchestration)   │  │     (Inference)     │          │
│  └─────────────────────┘  └─────────────────────┘          │
└─────────────────────────────────────────────────────────────┘
         │              │
         ▼              ▼
┌─────────────────────────────────────────────────────────────┐
│                   Knowledge Layer                           │
│  ┌─────────────────────┐  ┌─────────────────────┐          │
│  │       Milvus        │  │       Neo4j         │          │
│  │   (Vector Store)    │  │   (Graph Store)     │          │
│  └─────────────────────┘  └─────────────────────┘          │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│                   Embedding Layer                           │
│  ┌─────────────────────┐  ┌─────────────────────┐          │
│  │       BGE-M3        │  │    BGE-Reranker     │          │
│  │    (Embeddings)     │  │  (Cross-Encoder)    │          │
│  └─────────────────────┘  └─────────────────────┘          │
└─────────────────────────────────────────────────────────────┘

                  LangFuse (traces all LLM calls)

Agent Presets

Agent Purpose Retrieval
Deep Thinker Complex reasoning with CoT None
Quick Thinker Fast responses None
Compliance Advisor Regulatory knowledge Vector + Graph
AIOps Advisor Infrastructure docs Vector
Dev Advisor Development standards Vector
CAD Advisor Document comparison Ephemeral Vector

Deployment

Enable Cortex Product

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ai-hub
  namespace: flux-system
spec:
  interval: 10m
  path: ./ai-hub/deploy
  prune: true
  sourceRef:
    kind: GitRepository
    name: openova-blueprints
  postBuild:
    substitute:
      TENANT: ${TENANT}
      DOMAIN: ${DOMAIN}
      GPU_NODE_POOL: ${GPU_NODE_POOL}
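
Rather than inlining values, the postBuild variables can be sourced from a ConfigMap using Flux's substituteFrom (the ConfigMap name here is illustrative):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ai-hub
  namespace: flux-system
spec:
  interval: 10m
  path: ./ai-hub/deploy
  prune: true
  sourceRef:
    kind: GitRepository
    name: openova-blueprints
  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: ai-hub-vars   # supplies TENANT, DOMAIN, GPU_NODE_POOL
```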

Configuration

Parameter Description Default
TENANT Tenant identifier Required
DOMAIN Base domain Required
GPU_NODE_POOL GPU node label Required
LLM_MODEL Default LLM qwen3-32b
EMBEDDING_MODEL Embedding model bge-m3
VECTOR_DIM Vector dimensions 1024
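
The required parameters could be captured in a ConfigMap consumed by the Kustomization's postBuild step; a sketch with illustrative values (name and values are examples, not defaults):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ai-hub-vars        # illustrative name
  namespace: flux-system
data:
  TENANT: acme
  DOMAIN: example.com
  GPU_NODE_POOL: gpu-a10
  LLM_MODEL: qwen3-32b     # optional override of the default
```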

Resource Requirements

Component Replicas CPU (per replica) Memory (per replica) GPU
vLLM 1 4 32Gi 2x A10
BGE-M3 1 2 4Gi 1x A10
BGE-Reranker 1 1 2Gi 1x A10
Milvus 3 2 8Gi -
Neo4j 1 2 4Gi -
LibreChat 2 0.5 1Gi -
LLM Gateway 2 0.25 512Mi -
NeMo Guardrails 2 1 2Gi -
LangFuse 2 0.5 1Gi -
Total (all replicas) - ~20 ~75Gi 4x A10

GPU Requirements

GPU Type Minimum Recommended
NVIDIA A10 2 4
NVIDIA A100 1 2
NVIDIA H100 1 1
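
GPU workloads are typically pinned to the GPU pool via a node selector and toleration. A hedged pod-spec sketch, assuming GPU_NODE_POOL maps to a node label (the label key and taint shown are illustrative, not taken from the blueprints):

```yaml
spec:
  nodeSelector:
    node-pool: gpu-a10              # value of GPU_NODE_POOL
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: vllm
      resources:
        limits:
          nvidia.com/gpu: 2         # 2x A10, matching the table above
```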

Use Cases

Claude Code with Internal Models

# Configure Claude Code
export ANTHROPIC_BASE_URL="https://llm-gateway.ai-hub.<domain>/v1"
export ANTHROPIC_API_KEY="your-subscription-token"

# Use Claude Code normally
claude "Explain this code..."
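
Under the hood, Claude Code talks to the gateway's Anthropic-compatible messages endpoint via the Anthropic Adapter. A smoke test sketched with curl, assuming the gateway accepts standard Anthropic-style headers (endpoint path and model name are assumptions):

```shell
# Illustrative values; replace <domain> and the token for your tenant
export ANTHROPIC_BASE_URL="https://llm-gateway.ai-hub.<domain>/v1"
export ANTHROPIC_API_KEY="your-subscription-token"

# Raw Anthropic-style messages call through the adapter
curl -s "$ANTHROPIC_BASE_URL/messages" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "qwen3-32b", "max_tokens": 64,
       "messages": [{"role": "user", "content": "ping"}]}' \
  || echo "gateway unreachable (expected outside the cluster)"
```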

RAG-Powered Chat

# Access LibreChat
https://chat.ai-hub.<domain>

# Select agent preset (e.g., Compliance Advisor)
# Upload documents for context
# Ask questions with citations

Monitoring

Key Metrics

Metric Query
LLM latency vllm_request_duration_seconds
Token throughput vllm_generation_tokens_total
GPU utilization DCGM_FI_DEV_GPU_UTIL
Guardrail blocks nemo_guardrails_blocked_total
LLM cost via LangFuse dashboard
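
Assuming the latency metric is a Prometheus histogram and the token and guardrail metrics are counters (standard exporter conventions; not verified against the actual exporters), typical dashboard queries might look like:

```promql
# p95 end-to-end LLM latency over 5m
histogram_quantile(0.95, sum by (le) (rate(vllm_request_duration_seconds_bucket[5m])))

# Generated tokens per second
sum(rate(vllm_generation_tokens_total[5m]))

# Mean GPU utilization across the pool
avg(DCGM_FI_DEV_GPU_UTIL)

# Guardrail block rate
sum(rate(nemo_guardrails_blocked_total[5m]))
```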

Grafana Dashboards

Dashboard Purpose
AI Hub Overview Request rates, latencies
GPU Metrics Utilization, memory
RAG Analytics Retrieval quality, citations
AI Safety Guardrail activations, blocked prompts
LLM Cost Per-model, per-user cost tracking (LangFuse)

Operations

Health Checks

# Check all components
kubectl get pods -n ai-hub

# Check vLLM
curl http://vllm.ai-hub.svc:8000/health

# Check Milvus
kubectl exec -it milvus-proxy-0 -n ai-hub -- curl localhost:9091/healthz

Troubleshooting

Issue Cause Resolution
OOM on vLLM Model too large Increase GPU memory or use quantization
Slow retrieval Index not optimized Rebuild Milvus index
Empty responses No relevant chunks Check embedding quality
GPU not detected Driver issue Verify NVIDIA device plugin
Prompt injection Guardrails not configured Review NeMo Guardrails rules
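
For the vLLM OOM case, quantization and memory-utilization flags are the usual levers. A sketch of container args (the model repo name is illustrative and assumes pre-quantized AWQ weights; tune values for your GPUs):

```yaml
containers:
  - name: vllm
    args:
      - --model=Qwen/Qwen3-32B-AWQ     # illustrative AWQ checkpoint
      - --quantization=awq
      - --gpu-memory-utilization=0.90  # leave headroom below the OOM ceiling
      - --tensor-parallel-size=2       # shard across the 2x A10
```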

Part of OpenOva