openova/platform/temporal/README.md
hatiyildiz 4043e1d51c docs(pass-32): registry-DNS sweep — harbor.<domain> across 9 component READMEs
Pass 25's deferred sweep, executed. Image refs of the form
harbor.<domain>/... (and one registry.<domain>/... in temporal) collapse
the location-code segment. Per NAMING §5.1, Catalyst per-host-cluster
Harbor DNS is harbor.{location-code}.{sovereign-domain} (e.g.
harbor.hfmp.openova.io).

Fixed (11 instances, 9 files):
- anthropic-adapter, bge (×2), debezium, harbor (×2 — ingress + Kyverno
  policy), knative (×2 — serving + traffic-split), llm-gateway, strimzi,
  trivy — all standardized to harbor.<location-code>.<sovereign-domain>.
- temporal had two drift items in one line: registry.<domain> (off-spec
  placeholder — Catalyst's only per-host-cluster registry is Harbor) AND
  legacy "fuse" namespace (renamed to bp-fabric per BUSINESS-STRATEGY
  §16.2 / Pass 26). Rewritten to fabric/order-worker.

Out of scope (deliberate): :latest tag hygiene, and whether Application
Blueprint READMEs should reference ghcr.io/openova-io/bp-<name>:<semver>
vs the Sovereign Harbor mirror. Stalwart customer-email-domain <domain>
placeholders preserved (correct semantics). external-dns illustrative
gslb/api/svc.<domain> preserved (upstream-doc generic).

With Pass 29 (canonical-doc DNS) + Pass 31 (carry-over fixes) + Pass 32
(image registry), the recurring DNS-placeholder collapse drift category
is addressed end-to-end.

Validation log Pass 32 entry added.
2026-04-27 22:36:39 +02:00

333 lines
10 KiB
Markdown

# Temporal
Durable workflow orchestration with saga + compensation. **Application Blueprint** (see [`docs/PLATFORM-TECH-STACK.md`](../../docs/PLATFORM-TECH-STACK.md) §4.3 — Workflow & processing). Used by `bp-fabric` (composite Data & Integration Blueprint) for long-running, compensable workflows that span multiple Application services.
**Status:** Accepted | **Updated:** 2026-04-27
---
## Overview
Temporal is a durable execution platform that makes it simple to build reliable, long-running workflows and microservice orchestrations. Unlike traditional message queues or job schedulers, Temporal provides durable execution: workflow code survives process crashes, node failures, and even entire cluster restarts without losing state. Developers write workflows as ordinary code in their language of choice, and Temporal handles retries, timeouts, and state persistence transparently.
Within OpenOva, Temporal serves as the workflow orchestration engine for the **Fabric** data and integration product. It handles saga patterns for distributed transactions, long-running business processes, scheduled jobs, and any operation that needs reliable execution across multiple services. Temporal replaces fragile combinations of message queues, cron jobs, and custom state machines with a single, battle-tested platform.
Temporal's architecture separates the server (which manages workflow state) from workers (which execute workflow and activity code). Workers are stateless and can be scaled independently. The server persists all workflow state to a database, ensuring that workflows survive any infrastructure failure. SDKs are available for Go, Java, Python, and TypeScript, making Temporal accessible to polyglot teams.
---
## Architecture
```mermaid
flowchart TB
subgraph Clients["Workflow Clients"]
API[API Services]
Scheduler[Scheduled Jobs]
Events[Event Handlers]
end
subgraph Temporal["Temporal Server"]
Frontend[Frontend Service]
History[History Service]
Matching[Matching Service]
Worker_svc[Internal Worker]
end
subgraph Persistence["Persistence"]
PG[PostgreSQL / CNPG]
ES[Elasticsearch / OpenSearch]
end
subgraph Workers["Application Workers"]
W1[Order Worker]
W2[Payment Worker]
W3[Notification Worker]
end
Clients --> Frontend
Frontend --> History
Frontend --> Matching
History --> PG
Worker_svc --> ES
Matching --> W1
Matching --> W2
Matching --> W3
```
### Saga Pattern
```mermaid
sequenceDiagram
participant C as Client
participant T as Temporal
participant O as Order Service
participant P as Payment Service
participant I as Inventory Service
participant N as Notification Service
C->>T: Start OrderSaga
T->>O: CreateOrder
O-->>T: Order Created
T->>P: ProcessPayment
P-->>T: Payment OK
T->>I: ReserveInventory
I-->>T: Inventory Reserved
T->>N: SendConfirmation
N-->>T: Notification Sent
T-->>C: Saga Complete
Note over T,I: If ReserveInventory fails:
T->>P: CompensatePayment (refund)
T->>O: CancelOrder
```
---
## Key Features
| Feature | Description |
|---------|-------------|
| Durable Execution | Workflows survive process/node/cluster failures |
| Saga Orchestration | Coordinate distributed transactions with compensation |
| Retry Policies | Configurable retry with exponential backoff per activity |
| Timeouts | Start-to-close, schedule-to-start, and heartbeat timeouts |
| Cron Workflows | Replace cron jobs with reliable scheduled workflows |
| Versioning | Deploy new workflow logic without breaking running instances |
| Signals & Queries | Send data to and read state from running workflows |
| Child Workflows | Compose complex workflows from smaller building blocks |
| Visibility | Search and filter workflows by custom attributes |
---
## Configuration
### Helm Values
```yaml
temporal:
server:
replicas: 3
config:
persistence:
default:
driver: sql
sql:
driver: postgres12
host: temporal-postgres.databases.svc
port: 5432
database: temporal
user: temporal
password: ${PG_PASSWORD} # From ESO
visibility:
driver: sql
sql:
driver: postgres12
host: temporal-postgres.databases.svc
port: 5432
database: temporal_visibility
user: temporal
password: ${PG_PASSWORD} # From ESO
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 2
memory: 4Gi
admintools:
enabled: true
web:
enabled: true
ingress:
enabled: true
hosts:
- temporal.fuse.<domain>
prometheus:
enabled: true
```
### Namespace Setup
```bash
# Create Temporal namespace for workload isolation
tctl namespace register \
--namespace orders \
--retention 30d \
--description "Order processing workflows"
```
---
## Workflow Examples
### Go SDK - Order Saga
```go
package workflows
import (
"time"
"go.temporal.io/sdk/temporal"
"go.temporal.io/sdk/workflow"
)
func OrderSagaWorkflow(ctx workflow.Context, order Order) (OrderResult, error) {
retryPolicy := &temporal.RetryPolicy{
InitialInterval: time.Second,
BackoffCoefficient: 2.0,
MaximumInterval: time.Minute,
MaximumAttempts: 5,
}
activityOpts := workflow.ActivityOptions{
StartToCloseTimeout: 30 * time.Second,
RetryPolicy: retryPolicy,
}
ctx = workflow.WithActivityOptions(ctx, activityOpts)
// Step 1: Create Order
var orderID string
err := workflow.ExecuteActivity(ctx, CreateOrder, order).Get(ctx, &orderID)
if err != nil {
return OrderResult{}, err
}
// Step 2: Process Payment (with compensation)
var paymentID string
err = workflow.ExecuteActivity(ctx, ProcessPayment, orderID, order.Amount).Get(ctx, &paymentID)
if err != nil {
// Compensate: cancel the order
_ = workflow.ExecuteActivity(ctx, CancelOrder, orderID).Get(ctx, nil)
return OrderResult{}, err
}
// Step 3: Reserve Inventory (with compensation)
err = workflow.ExecuteActivity(ctx, ReserveInventory, orderID, order.Items).Get(ctx, nil)
if err != nil {
// Compensate: refund payment, cancel order
_ = workflow.ExecuteActivity(ctx, RefundPayment, paymentID).Get(ctx, nil)
_ = workflow.ExecuteActivity(ctx, CancelOrder, orderID).Get(ctx, nil)
return OrderResult{}, err
}
// Step 4: Send Confirmation
_ = workflow.ExecuteActivity(ctx, SendConfirmation, orderID).Get(ctx, nil)
return OrderResult{OrderID: orderID, PaymentID: paymentID, Status: "completed"}, nil
}
```
### Python SDK - Data Pipeline
```python
from temporalio import workflow, activity
from datetime import timedelta
@activity.defn
async def extract_data(source: str) -> dict:
# Extract data from source system
...
@activity.defn
async def transform_data(raw_data: dict) -> dict:
# Apply business transformations
...
@activity.defn
async def load_data(transformed: dict) -> str:
# Write to destination
...
@workflow.defn
class DataPipelineWorkflow:
@workflow.run
async def run(self, source: str) -> str:
raw = await workflow.execute_activity(
extract_data, source,
start_to_close_timeout=timedelta(minutes=10),
)
transformed = await workflow.execute_activity(
transform_data, raw,
start_to_close_timeout=timedelta(minutes=30),
)
result = await workflow.execute_activity(
load_data, transformed,
start_to_close_timeout=timedelta(minutes=10),
)
return result
```
---
## Worker Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-workflow-worker
namespace: fuse
spec:
replicas: 3
template:
spec:
containers:
- name: worker
image: harbor.<location-code>.<sovereign-domain>/fabric/order-worker:latest
env:
- name: TEMPORAL_HOST
value: temporal-frontend.temporal.svc:7233
- name: TEMPORAL_NAMESPACE
value: orders
- name: TEMPORAL_TASK_QUEUE
value: order-processing
resources:
requests:
cpu: 250m
memory: 512Mi
limits:
cpu: 1
memory: 1Gi
```
---
## Monitoring
| Metric | Description |
|--------|-------------|
| `temporal_workflow_started_total` | Workflows started |
| `temporal_workflow_completed_total` | Workflows completed successfully |
| `temporal_workflow_failed_total` | Workflows failed |
| `temporal_workflow_task_queue_depth` | Pending tasks per queue |
| `temporal_activity_execution_latency` | Activity execution duration |
| `temporal_workflow_endtoend_latency` | Total workflow duration |
| `temporal_schedule_missed_catchup_window` | Missed cron schedule executions |
---
## Consequences
**Positive:**
- Durable execution eliminates custom retry/state-machine code across all services
- Saga pattern support simplifies distributed transaction management
- Multi-language SDKs (Go, Java, Python, TypeScript) suit polyglot teams
- Workflow versioning enables safe deployments without breaking running instances
- Built-in visibility and search make debugging production workflows practical
- Cron workflows replace fragile crontab/CronJob setups with reliable scheduling
**Negative:**
- Requires PostgreSQL for persistence, adding a database dependency
- Temporal server itself needs careful sizing and operational attention
- Workflow determinism constraints require developer discipline (no random, no system clock)
- Learning curve for understanding event sourcing and replay semantics
- Debugging replayed workflows requires familiarity with Temporal's execution model
- Large workflow histories can impact performance without proper archival configuration
---
*Part of [OpenOva Fabric](https://openova.io) - Data & Integration*