# Trivy

Image and IaC vulnerability scanning. Per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.3): runs in CI for Blueprint scans, in Harbor for registry scans, and at runtime via Trivy Operator on every host cluster.

Status: Accepted | Updated: 2026-04-27


## Overview

Trivy provides unified security scanning at multiple levels: CI/CD, registry, and runtime.

```mermaid
flowchart LR
    subgraph CI["CI/CD Pipeline"]
        Code[Code] --> Scan1[Trivy Scan]
        Scan1 --> Build[Build Image]
    end

    subgraph Registry
        Build --> Harbor
        Harbor --> Scan2[Trivy Scan]
    end

    subgraph Runtime["Kubernetes"]
        Harbor --> Deploy[Deploy]
        TO[Trivy Operator] --> Scan3[Continuous Scan]
    end
```

## Scanning Levels

| Level | Integration | Trigger |
|----------|-------------------|------------|
| CI/CD | Gitea Actions | On push/PR |
| Registry | Harbor (built-in) | On push |
| Runtime | Trivy Operator | Continuous |

## Scanning Capabilities

| Target | Command |
|----------------------|------------------------------|
| Container images | `trivy image` |
| Kubernetes manifests | `trivy config` |
| IaC (Terraform) | `trivy config` |
| SBOM generation | `trivy image --format cyclonedx` |
| Secrets detection | `trivy fs --scanners secret` |
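
The commands above can be exercised directly from a shell; a minimal sketch (the image reference and paths are illustrative placeholders):

```shell
# Scan a container image; non-zero exit if CRITICAL/HIGH findings exist
trivy image --severity CRITICAL,HIGH --exit-code 1 harbor.example.com/app/api:1.2.3

# Scan Kubernetes manifests and Terraform for misconfigurations
trivy config ./k8s ./terraform

# Generate a CycloneDX SBOM (note: `trivy sbom` scans an *existing* SBOM file)
trivy image --format cyclonedx --output sbom.cdx.json harbor.example.com/app/api:1.2.3

# Detect hardcoded secrets in the working tree
trivy fs --scanners secret .
```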

## Harbor Integration

Harbor ships with Trivy as its built-in scanner; images are scanned automatically on push.

```mermaid
sequenceDiagram
    participant CI as CI/CD
    participant H as Harbor
    participant T as Trivy
    participant K as Kubernetes

    CI->>H: Push image
    H->>T: Trigger scan
    T->>H: Return vulnerabilities
    alt Critical vulnerabilities
        H-->>CI: Block pull
    else Clean
        H->>K: Allow pull
    end
```
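
Scan results can also be read back over Harbor's v2 REST API; a sketch, assuming a robot account and a `jq`-equipped shell (host, project, and credentials below are placeholders):

```shell
# Fetch an artifact together with its Trivy scan overview (severity summary, scan status)
curl -s -u 'robot$ci:TOKEN' \
  "https://harbor.example.com/api/v2.0/projects/app/repositories/api/artifacts/1.2.3?with_scan_overview=true" \
  | jq '.scan_overview'
```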

## Scan Policies

| Severity | CI/CD Action | Harbor Action |
|----------|--------------|----------------------|
| Critical | Fail build | Block pull |
| High | Warn | Allow (configurable) |
| Medium | Info | Allow |
| Low | Info | Allow |

## Trivy Operator

Continuous runtime scanning in Kubernetes:

```yaml
apiVersion: aquasecurity.github.io/v1alpha1
kind: VulnerabilityReport
# Generated automatically for each workload
```
### Installation

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: trivy-operator
  namespace: trivy-system
spec:
  interval: 10m
  chart:
    spec:
      chart: trivy-operator
      version: "0.20.x"
      sourceRef:
        kind: HelmRepository
        name: aqua
        namespace: flux-system
  values:
    trivy:
      ignoreUnfixed: true
    operator:
      scanJobsConcurrentLimit: 5
```
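
To feed the Monitoring section below, metrics scraping can be enabled in the same `values` block; a sketch assuming the `serviceMonitor.enabled` and `operator.metricsVulnIdEnabled` flags found in recent upstream trivy-operator charts (verify against the pinned chart version):

```yaml
  values:
    trivy:
      ignoreUnfixed: true
    operator:
      scanJobsConcurrentLimit: 5
      metricsVulnIdEnabled: true   # emit per-CVE trivy_vulnerability_id series (high cardinality)
    serviceMonitor:
      enabled: true                # requires the Prometheus Operator CRDs
```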

## CI/CD Integration

### Gitea Actions

```yaml
name: Security Scan
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Scan filesystem
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'

      - name: Scan Kubernetes manifests
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          scan-ref: './k8s'
          severity: 'CRITICAL,HIGH'
```

## Kyverno Policy

Block deployment of vulnerable images:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-vulnerable-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-vulnerabilities
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "harbor.<location-code>.<sovereign-domain>/*"
          attestations:
            - type: https://cosign.sigstore.dev/attestation/vuln/v1
              conditions:
                - all:
                    - key: "{{ scanner }}"
                      operator: Equals
                      value: "trivy"
                    - key: "{{ criticalCount }}"
                      operator: LessThanOrEquals
                      value: "0"
```
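
This policy only admits images that carry a signed vuln attestation, so CI has to produce one. A hedged sketch of that step, using Trivy's `cosign-vuln` output format and `cosign attest` (the image reference and key path are placeholders):

```shell
# Emit the scan result as a cosign-vuln predicate
trivy image --format cosign-vuln --output vuln.json harbor.example.com/app/api:1.2.3

# Attach it as a signed attestation that Kyverno's verifyImages rule can check
cosign attest --key cosign.key --type vuln --predicate vuln.json \
  harbor.example.com/app/api:1.2.3
```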

## Monitoring

### Key Metrics

| Metric | Query |
|---------------------------------|--------------------------------------------------------|
| Vulnerability detail (per CVE) | `trivy_vulnerability_id` |
| Critical vulns | `count(trivy_vulnerability_id{severity="CRITICAL"})` |
| Vulnerability counts per image | `trivy_image_vulnerabilities` |

### Alerts

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: trivy-alerts
  namespace: monitoring
spec:
  groups:
    - name: trivy
      rules:
        - alert: CriticalVulnerabilityFound
          expr: count(trivy_vulnerability_id{severity="CRITICAL"}) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Critical vulnerability detected"
```

## Consequences

Positive:

  • Unified scanning across CI/CD, registry, and runtime
  • Integrated with Harbor (mandatory component)
  • Shift-left security with fast feedback
  • SBOM generation for compliance

Negative:

  • False positives require triage
  • Scan time adds to CI/CD pipeline
  • Operator resources in cluster

Part of OpenOva