PR #755 added the `node-role.kubernetes.io/control-plane=true:NoSchedule`
taint to the CP node when worker_count > 0. Two bootstrap-kit charts
have pods that MUST land on the CP but were missing the matching
toleration:
bp-trivy
• node-collector: one pod is pinned to each node via the nodeSelector
`kubernetes.io/hostname=<node>`. The CP-bound collector reads
/var/lib/etcd, /var/lib/kubelet, /var/lib/kube-scheduler, and
/var/lib/kube-controller-manager via hostPath; these paths exist only
on the CP. Without the toleration the collector sat Pending forever
on otech93 (live evidence in #769).
• scanJobTolerations: the per-workload scan jobs the operator spawns
may target pods belonging to CP-only system DaemonSets (e.g.
kube-system kube-proxy in non-Cilium mode). The toleration is added
here as well so reports are produced for those workloads too (see
the values sketch below).
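A minimal sketch of the kind of values change this implies, assuming
the bp-trivy wrapper passes these through to the upstream
trivy-operator chart (scanJobTolerations mirrors the upstream
trivyOperator.scanJobTolerations value; the nodeCollector key is an
assumption about where the collector pods pick up tolerations, not
the exact bp-trivy layout):

    trivyOperator:
      # Rendered into the trivy-operator-config ConfigMap and applied
      # to the scan jobs the operator spawns.
      scanJobTolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
    nodeCollector:
      # Assumed knob: tolerations for the per-node collector pods
      # pinned via kubernetes.io/hostname.
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

`operator: Exists` tolerates the taint regardless of its value;
matching with Equal on value "true" would work just as well here.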
bp-alloy
• DaemonSet: one pod MUST land on every node, including the CP, so
CP-local kubelet logs and node metrics flow into the LGTM stack.
Without the toleration Alloy ran on only 3/4 nodes (Ready=N-1) on
otech93 and CP telemetry was silently lost (see the values sketch
below).
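A sketch of the corresponding bp-alloy change, assuming the wrapper
exposes the upstream grafana/alloy chart's controller block (the
nesting shown is illustrative, not the exact bp-alloy layout):

    controller:
      type: daemonset
      # Lets the Alloy pod schedule onto the tainted CP so kubelet
      # logs and node metrics are collected there too.
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule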
Both tolerations are no-ops on solo Sovereigns (worker_count=0): the CP
is untainted in solo mode per PR #755's conditional.
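For context, the taint the tolerations above match, as it appears on
the CP Node object when worker_count > 0 (in solo mode it is simply
absent, so the added tolerations match nothing and change no
scheduling decision):

    # Node spec excerpt on the CP, per the PR #755 conditional.
    spec:
      taints:
        - key: node-role.kubernetes.io/control-plane
          value: "true"
          effect: NoSchedule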
Versions bumped:
• bp-trivy 1.0.2 → 1.0.3 (Chart.yaml + 3× HelmRelease pins)
• bp-alloy 1.0.0 → 1.0.1 (Chart.yaml + 3× HelmRelease pins)
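Each pin is the usual HelmRelease chart-version bump, roughly as
below (the Flux API version, release name, interval, and sourceRef
are illustrative, not taken from the repo):

    apiVersion: helm.toolkit.fluxcd.io/v2
    kind: HelmRelease
    metadata:
      name: bp-alloy
    spec:
      interval: 10m
      chart:
        spec:
          chart: bp-alloy
          version: 1.0.1         # bumped from 1.0.0
          sourceRef:
            kind: HelmRepository
            name: bootstrap-kit  # assumed source name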
Out of scope (audited, no change needed):
• bp-cilium — upstream defaults already tolerate everything (verified
on otech93: cilium DaemonSet at 4/4 nodes).
• bp-falco — values.yaml already declares `operator: Exists`
tolerations for both NoSchedule and NoExecute (4/4 on otech93).
• cnpg/harbor — no kubelet-cert-renew Jobs in current charts.
Verified:
• `helm template` on both charts renders the expected toleration
(alloy: in the DaemonSet pod spec; trivy: in the trivy-operator-config
ConfigMap the operator consumes at scan-job spawn time; see the
rendered sketch below).
• `bash scripts/check-bootstrap-deps.sh` PASSED (no DAG drift).
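On the trivy side, the rendered artifact to look for is roughly the
following (the scanJob.tolerations key name follows upstream
trivy-operator conventions; the ConfigMap metadata and exact JSON
formatting are a sketch, not the actual rendered output):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: trivy-operator-config
    data:
      # JSON-encoded tolerations the operator reads when it spawns
      # scan jobs.
      scanJob.tolerations: '[{"key":"node-role.kubernetes.io/control-plane","operator":"Exists","effect":"NoSchedule"}]'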
Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>