# Trivy
Image and IaC vulnerability scanning. Per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.3) — runs in CI for Blueprint scans, in Harbor for registry scans, and at runtime via Trivy Operator on every host cluster.
Status: Accepted | Updated: 2026-04-27
## Overview
Trivy provides unified security scanning at multiple levels: CI/CD, registry, and runtime.
```mermaid
flowchart LR
    subgraph CI["CI/CD Pipeline"]
        Code[Code] --> Scan1[Trivy Scan]
        Scan1 --> Build[Build Image]
    end
    subgraph Registry
        Build --> Harbor
        Harbor --> Scan2[Trivy Scan]
    end
    subgraph Runtime["Kubernetes"]
        Harbor --> Deploy[Deploy]
        TO[Trivy Operator] --> Scan3[Continuous Scan]
    end
```
## Scanning Levels
| Level | Integration | Trigger |
|---|---|---|
| CI/CD | Gitea Actions | On push/PR |
| Registry | Harbor (built-in) | On push |
| Runtime | Trivy Operator | Continuous |
## Scanning Capabilities
| Target | Command |
|---|---|
| Container images | `trivy image` |
| Kubernetes manifests | `trivy config` |
| IaC (Terraform) | `trivy config` |
| SBOM generation | `trivy sbom` |
| Secrets detection | `trivy fs --scanners secret` |
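
The same scanner selection can also live in a repo-level `trivy.yaml` so CI and local runs stay consistent; a minimal sketch (key names assume a recent Trivy release):

```yaml
# trivy.yaml: Trivy picks this up from the working directory
severity:
  - CRITICAL
  - HIGH
scan:
  scanners:
    - vuln      # image/filesystem vulnerability scanning
    - secret    # secrets detection (trivy fs --scanners secret)
    - misconfig # IaC and manifest checks (trivy config)
```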
## Harbor Integration
Harbor includes Trivy scanning. Images are automatically scanned on push.
```mermaid
sequenceDiagram
    participant CI as CI/CD
    participant H as Harbor
    participant T as Trivy
    participant K as Kubernetes
    CI->>H: Push image
    H->>T: Trigger scan
    T->>H: Return vulnerabilities
    alt Critical vulnerabilities
        H-->>CI: Block deployment
    else Clean
        H->>K: Allow pull
    end
```
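
On the Harbor side, the bundled scanner is switched on through chart values; a minimal sketch assuming the upstream goharbor/harbor-helm chart:

```yaml
# harbor-helm values (key names from upstream chart defaults)
trivy:
  enabled: true       # deploy the bundled Trivy adapter
  ignoreUnfixed: true # skip findings with no fixed version yet
```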
## Scan Policies
| Severity | CI/CD Action | Harbor Action |
|---|---|---|
| Critical | Fail build | Block pull |
| High | Warn | Allow (configurable) |
| Medium | Info | Allow |
| Low | Info | Allow |
## Trivy Operator
Continuous runtime scanning in Kubernetes:
```yaml
apiVersion: aquasecurity.github.io/v1alpha1
kind: VulnerabilityReport
# Generated automatically for each workload
```
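
Each report carries a per-severity summary that the metrics and the Kyverno policy below key off; an abridged, illustrative sketch (names and counts are invented):

```yaml
apiVersion: aquasecurity.github.io/v1alpha1
kind: VulnerabilityReport
metadata:
  name: replicaset-app-6d4cf56db6-app  # one report per workload container
  namespace: default
report:
  scanner:
    name: Trivy
    vendor: Aqua Security
  summary:
    criticalCount: 0
    highCount: 2
    mediumCount: 5
    lowCount: 9
  vulnerabilities:
    - vulnerabilityID: CVE-2024-12345  # illustrative entry
      severity: HIGH
      installedVersion: 1.2.3
      fixedVersion: 1.2.4
```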
### Installation
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: trivy-operator
  namespace: trivy-system
spec:
  interval: 10m
  chart:
    spec:
      chart: trivy-operator
      version: "0.20.x"
      sourceRef:
        kind: HelmRepository
        name: aqua
        namespace: flux-system
  values:
    trivy:
      ignoreUnfixed: true
    operator:
      scanJobsConcurrentLimit: 5
```
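
The sourceRef above expects an `aqua` HelmRepository in `flux-system`; a minimal sketch pointing at the upstream Aqua chart repository:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: aqua
  namespace: flux-system
spec:
  interval: 1h
  url: https://aquasecurity.github.io/helm-charts/
```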
## CI/CD Integration
### Gitea Actions
```yaml
name: Security Scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Scan filesystem
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
      - name: Scan Kubernetes manifests
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          scan-ref: './k8s'
          severity: 'CRITICAL,HIGH'
```
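
The `trivy sbom` capability from the table above can run in the same pipeline; a hedged sketch of an additional step using the same action (the image reference is a placeholder):

```yaml
      - name: Generate SBOM
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'image'
          image-ref: 'harbor.example.com/project/app:latest' # placeholder
          format: 'cyclonedx'      # emit a CycloneDX SBOM
          output: 'sbom.cdx.json'  # file to attach as a build artifact
```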
### Kyverno Policy
Block deployment of vulnerable images:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-vulnerable-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-vulnerabilities
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "harbor.<location-code>.<sovereign-domain>/*"
          attestations:
            - type: https://cosign.sigstore.dev/attestation/vuln/v1
              conditions:
                - all:
                    - key: "{{ scanner }}"
                      operator: Equals
                      value: "trivy"
                    - key: "{{ criticalCount }}"
                      operator: LessThanOrEquals
                      value: "0"
```
## Monitoring
### Key Metrics
| Metric | Query |
|---|---|
| Per-vulnerability detail | `trivy_vulnerability_id` |
| Critical vulnerabilities | `count(trivy_vulnerability_id{severity="CRITICAL"})` |
| Vulnerability counts per image | `trivy_image_vulnerabilities` |
### Alerts
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: trivy-alerts
  namespace: monitoring
spec:
  groups:
    - name: trivy
      rules:
        - alert: CriticalVulnerabilityFound
          expr: count(trivy_vulnerability_id{severity="CRITICAL"}) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Critical vulnerability detected"
```
## Consequences
**Positive:**
- Unified scanning across CI/CD, registry, and runtime
- Integrated with Harbor (mandatory component)
- Shift-left security with fast feedback
- SBOM generation for compliance
**Negative:**
- False positives require triage
- Scan time adds to CI/CD pipeline
- Operator resources in cluster
Part of OpenOva