Agent vulnerability resolution doesn't account for multiple workloads / agents
Summary
Agent vulnerability resolution works by submitting a list of Vulnerability UUIDs which were identified in a scan,
and resolving all the active cluster image scanning vulnerabilities which have UUIDs that do not appear in the list.
This does not behave as expected when there are multiple workloads, because each workload is scanned in a separate goroutine,
and each goroutine submits UUIDs to internal/kubernetes/modules/starboard_vulnerability/scan_result
individually. When we scan more than one workload, we encounter this conflict:
- Workload A is scanned
- Vulnerabilities for workload A are created in GitLab
- Vulnerabilities not present in the scan for workload A are resolved
- Workload B is scanned
- Vulnerabilities for workload B are created in GitLab
- Vulnerabilities not present in the scan for workload B are resolved
- This resolves all the vulnerabilities from the scan of Workload A, even if they are still present.
Attempting to use scanning with multiple agents in one project will result in a similar problem. When scanning with two agents A and B, agent B will mark all of the vulnerabilities detected by agent A as resolved upon completion.
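The conflict can be reproduced in isolation with a small model of the resolution rule. This is only a sketch: `store`, `submitScanResult` and `active` are hypothetical names invented for illustration, not agentk or GitLab code, and the UUIDs are placeholders.

```go
package main

import "fmt"

// store is a hypothetical stand-in for the GitLab side: it maps a
// vulnerability UUID to whether it is still active (true) or resolved (false).
type store map[string]bool

// submitScanResult mimics the behavior described above: record every UUID in
// the request as active, then resolve every known UUID that is not in the request.
func (s store) submitScanResult(uuids []string) {
	seen := make(map[string]bool, len(uuids))
	for _, u := range uuids {
		seen[u] = true
		s[u] = true
	}
	for u := range s {
		if !seen[u] {
			s[u] = false
		}
	}
}

// active counts the vulnerabilities that are still marked active.
func (s store) active() int {
	n := 0
	for _, isActive := range s {
		if isActive {
			n++
		}
	}
	return n
}

func main() {
	s := store{}
	s.submitScanResult([]string{"a-1", "a-2"}) // scan of workload A
	s.submitScanResult([]string{"b-1"})        // scan of workload B
	// Only b-1 is still active; a-1 and a-2 were resolved by the second request.
	fmt.Println(s.active())
}
```

The same model applies to two agents in one project if each call is read as one agent's submission.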
Steps to reproduce
- Ensure your GDK installation runs KAS from master:
  # from GDK root
  echo master > gitlab/GITLAB_KAS_VERSION
  make gitlab-k8s-agent-update-run
  gdk restart gitlab-k8s-agent
- Create a new local project.
- Connect an Agent in an existing cluster.
- Tunnel your local KAS to make it reachable from within the cluster (I used ngrok, for example). Patch the deployment's --kas-address to point to the tunneled KAS.
- Create two deployments, e.g.:
  kubectl create deployment ubuntu --image ubuntu:18.04
  kubectl create deployment nginx --image nginx:1.20.0
- Navigate to the project's "Operational vulnerabilities" tab in the Security Report. Filter for "Resolved" with the status dropdown and find that all vulnerabilities have been resolved:
Example Project
What is the current bug behavior?
What is the expected correct behavior?
Relevant logs and/or screenshots
Possible fixes
- Change how agentk does vulnerability resolution:
  - As scans are running, create a consolidated list of vulnerability UUIDs from all the different scan goroutines.
  - Once all the scans have completed, submit a single scan_result request with the UUIDs from all workloads.
- Another option would be to do a composite query on the Rails end for undetected vulnerabilities, where we also search for the location UUID. However, this would not resolve vulnerabilities for workloads which no longer exist, so I believe the consolidated-list approach is better.
- In order to support multiple agents, we also need to update StarboardVulnerabilityResolveService so that it queries for undetected findings by agent_id.
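A minimal sketch of the consolidated-list idea, assuming each scan goroutine can hand back its UUIDs before any resolution happens. `scanWorkload`, `collectUUIDs` and the canned findings are hypothetical placeholders, not the actual agentk functions.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// scanWorkload is a hypothetical placeholder for a real per-workload scan;
// here it only looks up canned UUIDs.
func scanWorkload(name string, findings map[string][]string) []string {
	return findings[name]
}

// collectUUIDs scans every workload in its own goroutine and merges the
// results into one deduplicated, sorted list.
func collectUUIDs(workloads []string, findings map[string][]string) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen = make(map[string]struct{})
	)
	for _, w := range workloads {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			uuids := scanWorkload(name, findings)
			mu.Lock()
			defer mu.Unlock()
			for _, u := range uuids {
				seen[u] = struct{}{}
			}
		}(w)
	}
	wg.Wait() // nothing is resolved until every scan has finished

	all := make([]string, 0, len(seen))
	for u := range seen {
		all = append(all, u)
	}
	sort.Strings(all)
	return all
}

func main() {
	findings := map[string][]string{
		"ubuntu": {"a-1", "a-2"},
		"nginx":  {"b-1", "a-2"},
	}
	uuids := collectUUIDs([]string{"ubuntu", "nginx"}, findings)
	// A single scan_result-style request would carry this whole list.
	fmt.Println(uuids)
}
```

Waiting on the `sync.WaitGroup` before building the list is the key point: resolution then sees the union of all workloads, so one workload's scan cannot resolve another's findings.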
Implementation plan
This MR (Fix resolving cluster image scanning vulnerabil... (!91121 - merged)) was created to verify this implementation plan; you can use it to check whether the plan works in our case.
- backend: modify the Transmit function in starboard_vulnerability/agent/reporter.go (https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/internal/module/starboard_vulnerability/agent/reporter.go#L36) to additionally return uuids, and make the resolveVulnerabilities function public.
- backend: modify the scan function in starboard_vulnerability/agent/scanner.go (https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/internal/module/starboard_vulnerability/agent/scanner.go#L102) to collect uuids from all reports and then send them to the API using the ResolveVulnerability function.
- backend: add scope :with_findings_for_agent_id to ee/app/models/ee/vulnerability.rb.
- backend: extend the undetected method in ee/app/services/vulnerabilities/starboard_vulnerability_resolve_service.rb to include only vulnerabilities with_findings_for_agent_id.