Add workaround in Container Scanning to allow us to update Trivy without first downloading java-db
Problem to solve
The version of Trivy used in Container Scanning is 0.36.1; however, the most recent version of Trivy is v0.39.0.
Unfortunately, we're currently blocked from upgrading Container Scanning to a more recent version of Trivy because trivy >= v0.37.0 includes a new feature that automatically downloads a Java DB when generating an SBOM, which causes problems in an offline environment:
- If we attempt to use Trivy in an offline environment, then an error is returned: ERROR Unable to initialize the Java DB: Java DB update failed: Java DB update error: oci error: OCI repository error: Get "https://ghcr.io/v2/": dial tcp: lookup ghcr.io on 192.168.65.5:53: write udp 172.17.0.3:41561->192.168.65.5:53: write: operation not permitted
- We don't want to include the java-db in the container-scanning image because it adds 679M to the image size.
- We can't skip updating the java-db by passing --skip-java-db-update, otherwise an error is returned: ERROR The first run cannot skip downloading Java DB
We need to solve this issue in order to upgrade to more recent versions of Trivy.
Background details
I created the following bug report in the upstream Trivy project: Can't use Trivy v0.38.0 in offline environment without first fetching java-db #3980. However, it seems that this is expected behaviour, so the bug was closed and a feature request was created instead: Add ability to disable JAR scanning #3987.
There are currently three different scenarios that trigger a download of the java-db and cause an error in an offline environment:
- When CS_DISABLE_DEPENDENCY_LIST is false (the default setting).
- When CS_DISABLE_LANGUAGE_VULNERABILITY_SCAN is false (default is true).
- When generating an SBOM.
We need to make sure that we come up with an approach that works in an offline environment for all three of the above cases.
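The three scenarios can be expressed as a small check, which may be handy when writing offline tests. This is only a sketch: the JavaDbTriggers module, its method names, and the rule that only the string "true" counts as true are assumptions for illustration and are not part of the container-scanning analyzer. The variable names and defaults come from the list above.

```ruby
# Hypothetical helper, not part of the container-scanning analyzer.
# Reports which of the three scenarios would trigger a java-db download.
module JavaDbTriggers
  # Defaults as stated in this issue.
  DEFAULTS = {
    'CS_DISABLE_DEPENDENCY_LIST' => 'false',
    'CS_DISABLE_LANGUAGE_VULNERABILITY_SCAN' => 'true'
  }.freeze

  # True when the variable, after applying its default, is the string "true".
  # Assumption: any other value is treated as false.
  def self.disabled?(env, name)
    env.fetch(name, DEFAULTS.fetch(name)).to_s.strip.downcase == 'true'
  end

  # env: a Hash-like object (ENV also works).
  # generating_sbom: whether the current run generates an SBOM.
  def self.triggers(env, generating_sbom: false)
    reasons = []
    reasons << :dependency_list unless disabled?(env, 'CS_DISABLE_DEPENDENCY_LIST')
    reasons << :language_vulnerability_scan unless disabled?(env, 'CS_DISABLE_LANGUAGE_VULNERABILITY_SCAN')
    reasons << :sbom if generating_sbom
    reasons
  end
end
```

With no variables set, JavaDbTriggers.triggers({}) reports only the dependency list scenario, because that is the one enabled by default.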
Proposal
Here are some possible solutions for this issue:
1. Complete Add ability to disable JAR scanning in the upstream Trivy project.
   - Pros
     - Allows the behaviour of downloading data to be configured.
     - Works for both offline and online instances.
   - Cons
     - Need to implement this change in the upstream trivy project, which might not be accepted.
     - High chance of unreported vulnerabilities.
     - Needs additional configuration in offline environments if a user wants to make sure all vulnerabilities are reported.
2. Add a skeleton java-db to the container scanning image.
   - Pros
     - Easy to implement.
     - User is not forced to download additional data.
     - Works in an offline environment without any additional changes.
   - Cons
     - Can't easily change the behaviour; we would need to add another environment variable to container scanning to allow this to be overridden, which increases the complexity of the implementation.
     - The default behaviour prevents JAR vulnerabilities from being detected in online instances.
     - High chance of unreported vulnerabilities.
3. Add a new CS_TRIVY_JAVA_DB environment variable and pass this to trivy using --java-db-repository.
   - Pros
     - Easy to implement.
     - Approach is flexible, since users can modify the CS_TRIVY_JAVA_DB var to point to any data source they want.
     - Vulnerabilities will be reported, as long as they're present in the CS_TRIVY_JAVA_DB.
     - Data is only fetched when scanning an image containing JAR files.
     - Works in both offline and online instances.
   - Cons
     - User is forced to download additional data.
     - Needs additional configuration in offline environments.
After discussing this here, approach 3 seems like the best option.
Workaround
The following workaround can be used to upgrade to a more recent version of trivy until we've had a chance to properly solve this issue:
Create a custom Dockerfile, using registry.gitlab.com/security-products/container-scanning:latest as the base image:
FROM registry.gitlab.com/security-products/container-scanning:latest
ENV TRIVY_VERSION=0.41.0
RUN sudo apt-get update && sudo apt-get install -y wget
RUN wget --no-verbose https://github.com/aquasecurity/trivy/releases/download/v"${TRIVY_VERSION}"/trivy_"${TRIVY_VERSION}"_Linux-64bit.tar.gz -O - | tar -zxvf - -C /home/gitlab/opt/trivy
Implementation Plan
NOTE: This issue is currently blocked by Add offline tests for Container Scanning (#404557 - closed), since we need offline tests in place to ensure that the implementation works as expected.
- Add a new variable named CS_TRIVY_JAVA_DB.
- Add a new method called trivy_java_db to environment.rb, which defaults to registry.gitlab.com/gitlab-org/security-products/dependencies/trivy-java-db ghcr.io/aquasecurity/trivy-java-db:1 (see this discussion for details).
- Update the scan_command, os_scan_command and sbom_scan_command methods to pass --java-db-repository #{Gcs::Environment.trivy_java_db}.
- Document the new CS_TRIVY_JAVA_DB variable in the Container Scanning documentation. Make sure to include details on how to use this in offline instances.
- Revert this MR Remove trivy from trigger-scanner-update job (gitlab-org/security-products/analyzers/container-scanning!2911 - merged) since we can now release new versions of container scanning.
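As a rough illustration of the second and third steps, the lookup and the extra flag could look something like the sketch below. It is not the analyzer's actual code: the SketchGcs module, the DEFAULT_TRIVY_JAVA_DB constant, and the with_java_db helper are hypothetical names. The default shown is an assumption, taken as one of the two repositories mentioned in the plan; the linked discussion should decide the real value. Only the CS_TRIVY_JAVA_DB variable, the trivy_java_db method name, and the --java-db-repository flag come from this issue.

```ruby
# Sketch only: module, constant, and helper names are illustrative.
module SketchGcs
  module Environment
    # Assumed default for illustration; take the real value from the discussion.
    DEFAULT_TRIVY_JAVA_DB = 'ghcr.io/aquasecurity/trivy-java-db:1'

    # Returns the java-db repository, preferring a non-blank CS_TRIVY_JAVA_DB.
    # env defaults to the process environment; a Hash can be passed in tests.
    def self.trivy_java_db(env = ENV)
      value = env['CS_TRIVY_JAVA_DB'].to_s.strip
      value.empty? ? DEFAULT_TRIVY_JAVA_DB : value
    end
  end

  # Hypothetical stand-in for the change to scan_command, os_scan_command and
  # sbom_scan_command: returns a new argument list with the flag appended.
  def self.with_java_db(args, env = ENV)
    args + ['--java-db-repository', Environment.trivy_java_db(env)]
  end
end
```

In an offline instance, a user would point CS_TRIVY_JAVA_DB at a registry they can reach. For example, SketchGcs.with_java_db(%w[trivy image], { 'CS_TRIVY_JAVA_DB' => 'registry.example.com/trivy-java-db' }) appends the flag with that repository, where registry.example.com is a placeholder.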
/cc @sam.white @gonzoyumo