Fix Secure feature API fetching group projects with security reports
Discovered in !62092 (closed) we found ee/app/finders/ee/group_projects_finder.rb joins between CI and non-CI tables when the with_security_reports
is called. It looks like this finder uses the :with_security_reports
scope.
It seems we have a group project API where we can filter a group's projects by whether it has with_security_reports
, or not.
The query that is failing in question:
SELECT "projects".* FROM ((SELECT "projects".* FROM "projects" WHERE "projects"."namespace_id" = 8 AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 3 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (10,20)))
UNION
(SELECT "projects".* FROM "projects" INNER JOIN "project_group_links" ON "projects"."id" = "project_group_links"."project_id" WHERE "project_group_links"."group_id" = 8 AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 3 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (10,20)))) projects WHERE "projects"."pending_delete" = FALSE AND (EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" IN (5, 6, 7, 8, 21, 23, 26, 27) AND (ci_job_artifacts.project_id = projects.id))) ORDER BY "projects"."id" DESC
Failing spec
https://gitlab.com/gitlab-org/gitlab/-/jobs/1440069680
Options
- Introduce a new table to store whether or not a project has security reports (de-normalization) and UPSERT into this table whenever we create security reports. Could be called
projects_with_security_reports
and only needs to contain theproject_id
for this use case. - Change this sub-select to 3 separate queries. Firstly load all the relevant projects (this should be bounded by the fact that it's a group API filter) and then do another query to find the list of those that have
ci_job_artifacts
withfile_type IN
then use the filtered list of project ids to find the final list of projects. This is probably not a good option for very large groups with thousands of projects because our usage in https://gitlab.com/gitlab-org/gitlab/-/blob/d6f264fc4a623ce7892186e64cbc3c1a2507a993/ee/app/assets/javascripts/security_dashboard/store/modules/projects/actions.js is including subgroups and not filtering much. Since the ordering and pagination is on the projects it would not be possible to order and limit the first candidate set of project ids much so it would mean sending thousands of project ids in the intermediate query when ultimately we're only going to want 100. This would look something like:scope :with_security_reports, -> do project_id_currently_filtered = self.pluck(:id) filtered_project_ids_with_security_reports = ::Ci::JobArtifact.security_reports.where(project_id: project_id_currently_filtered).distinct.pluck(:project_id) where(id: filtered_project_ids_with_security_reports) end
- Is there some other table that already stores some metadata about security reports that isn't in the
ci_*
tables that we could join to? (eg.security_scans
orsecurity_findings
noting thatsecurity_scans
is going to be de-normalized to have aproject_id
column soon to avoid joiningci_builds
)
Sharding team recommended option
Introduce a new table to store whether or not a project has security reports (de-normalization) and UPSERT into this table whenever we create security reports. Could be called projects_with_security_reports
and only needs to contain the project_id
for this use case.
Furthermore this looks a lot like #336199 (closed) so we may want to fix both in 1 go with a combined table to solve both problems called projects_with_ci_feature_usage
with 2 columns project_id, ci_feature
and then we can just store "code_coverage"
and "security_report"
in the ci_feature
column when this is first used for the project.
Implementation plan
Expand for previous implementation plan
-
Upsert a new record into the project_ci_feature_usages
table when asecurity_report
artifact is created -
Backfill project_ci_feature_usages
table with existing project data -
Update EE::Project.with_security_reports to use project_ci_feature_usages
table
Remove with_security_reports
scope which relies on the ci_job_artifacts
table, and add with_security_scans
scope which uses the security_scans
table, removing the cross-table join: !70031 (merged)
/cc @DylanGriffith