Build Docker image with r2c-hosted ruleset instead of local rules
TL;DR: r2c would like the rules served by our systems to be the single source of truth for this analyzer.
What does this MR do?
Hi all,
I'm posting this MR early to get quick feedback before proceeding with this approach. The r2c team is still working on the relevant rulesets that this MR will download. I'll update this MR as additional development occurs.
The goal of this MR is to reconcile r2c's and GitLab's rules into rulesets that we can both use while adhering to our licensing requirements. My ask is that you review this MR, confirm that it meets your expectations for analyzers, and tell us whether the general approach, though not necessarily the current ruleset, is acceptable.
This MR is operating under the following assumptions:
- Ruleset hash pinning (e.g. `/p/semgrep-sast@123abc`) is not yet implemented, and it is unlikely to land before GitLab's 14.0 release. Until hash pinning is in place, we will manually ensure that the rulesets we provide to this analyzer do not change; after that, we can enforce automatic verification.
- We are baking the rulesets into the Docker image at build time rather than fetching them from their URLs at analyzer run time. This mitigates reliability concerns and allows the Docker images to run in air-gapped environments.
- Rule IDs will now be prefixed with `gitlab.` (e.g. `bandit.B101` -> `gitlab.bandit.B101`). This introduces some backwards-compatibility concerns. We have some flexibility here, but we feel this will give the best user experience across our two systems.
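To make the backwards-compatibility concern concrete, here is a minimal sketch of how a consumer could translate between old and new rule IDs. The helper names `to_prefixed_id` and `to_legacy_id` are hypothetical and are not part of this MR or of the analyzer; only the `gitlab.` prefix and the `bandit.B101` example come from the description above.

```python
# Hypothetical helpers, for illustration only; not part of the analyzer.
PREFIX = "gitlab."


def to_prefixed_id(rule_id: str) -> str:
    """Return the new-style ID, leaving already-prefixed IDs unchanged."""
    return rule_id if rule_id.startswith(PREFIX) else PREFIX + rule_id


def to_legacy_id(rule_id: str) -> str:
    """Return the old-style ID so existing findings can still be matched."""
    return rule_id[len(PREFIX):] if rule_id.startswith(PREFIX) else rule_id


if __name__ == "__main__":
    print(to_prefixed_id("bandit.B101"))
    print(to_legacy_id("gitlab.bandit.B101"))
```

Something along these lines could be used to match findings reported under the old IDs against the new ones, if we decide that is needed.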
From a technical standpoint, we will be making the following changes to our systems to fulfill the above requirements:
- We will ingest your custom-written rules from the `rules/` directory in this repository.
- We will store our shared rules in the `gitlab/` directory of the `semgrep-rules` repository.
- We will then combine the above rules into two rulesets: `/p/gitlab-bandit` and `/p/gitlab-eslint`.
- Finally, we will download the YAML from the above rulesets and bake it into the analyzer Docker image at build time.
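The final step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this MR: the `REGISTRY_BASE` URL, the output file naming, and the function names `fetch_url` and `bake_rulesets` are all hypothetical. Only the two ruleset paths come from the list above.

```python
from pathlib import Path
from typing import Callable, List
from urllib.request import urlopen

# Assumption: the rulesets are served as YAML under this base URL.
REGISTRY_BASE = "https://semgrep.dev/c"
RULESETS = ["/p/gitlab-bandit", "/p/gitlab-eslint"]


def fetch_url(url: str) -> bytes:
    """Download the raw bytes behind a URL (used only at image build time)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def bake_rulesets(dest_dir: Path,
                  fetch: Callable[[str], bytes] = fetch_url) -> List[Path]:
    """Download each ruleset once and store it as a local YAML file."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for ruleset in RULESETS:
        data = fetch(REGISTRY_BASE + ruleset)
        if not data.strip():
            # Fail the build instead of shipping an image without rules.
            raise ValueError("empty ruleset downloaded for " + ruleset)
        # Hypothetical naming scheme: /p/gitlab-bandit -> gitlab-bandit.yml
        target = dest_dir / (ruleset.rsplit("/", 1)[-1] + ".yml")
        target.write_bytes(data)
        written.append(target)
    return written
```

Running this during the image build, rather than when the analyzer starts, means the resulting image needs no network access at scan time, which is what allows it to run in air-gapped environments.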
The code associated with this MR gives a rough outline of what the final changes will look like. Once everything is settled with the new approach, we can remove the shared rules from this repository, which should make de-duplication easier on our end. In the future, we also plan to enable ruleset hash pinning for extra assurance.
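Until hash pinning exists, one possible interim safeguard is to compare each downloaded ruleset against a digest recorded in this repository. The sketch below is an assumption on my part, not the planned hash pinning feature and not something this MR implements. The names `sha256_hex` and `verify_ruleset` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the ruleset bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_ruleset(data: bytes, expected_sha256: str) -> None:
    """Raise ValueError if the downloaded ruleset differs from the recorded digest."""
    actual = sha256_hex(data)
    if actual != expected_sha256.strip().lower():
        raise ValueError(
            "ruleset digest mismatch: expected %s, got %s" % (expected_sha256, actual)
        )
```

A build script could call `verify_ruleset` right after downloading each YAML file, so that an unexpected ruleset change fails the build instead of silently altering scan results.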
Does this seem like a reasonable approach? Let me know if you have any questions or concerns!
What are the relevant issue numbers?
Does this MR meet the acceptance criteria?
- Changelog entry added
- Documentation created/updated for GitLab EE, if necessary
- Documentation created/updated for this project, if necessary
- Documentation reviewed by technical writer or follow-up review issue created
- Tests added for this feature/bug
- Job definition updated, if necessary
- Conforms to the code review guidelines
- Conforms to the Go guidelines
- Security reports checked/validated by reviewer