Return a unique SBoM occurrence for each unique component/version
What does this MR do and why?
This change uses a window function to compute the occurrence and project counts and returns a unique SBoM occurrence for each unique SBoM Component/Version to reduce the amount of data that is needed to filter and sort through.
Related to:
Before:
WITH "our_occurrences" AS MATERIALIZED (
SELECT "sbom_occurrences".*
FROM "sbom_occurrences"
WHERE "sbom_occurrences"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (
SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
AND (traversal_ids @> ('{6543}'))
)
AND "projects"."marked_for_deletion_at" IS NULL
AND "projects"."pending_delete" = FALSE
)
)
SELECT sbom_occurrences.*, agg_occurrences.occurrence_count, agg_occurrences.project_count
FROM "sbom_occurrences"
INNER JOIN (
SELECT component_id,
COUNT(DISTINCT id) AS occurrence_count,
COUNT(DISTINCT project_id) AS project_count
FROM our_occurrences
GROUP BY component_id
) agg_occurrences ON sbom_occurrences.component_id = agg_occurrences.component_id
WHERE "sbom_occurrences"."id" IN (
SELECT "our_occurrences"."id"
FROM "our_occurrences"
)
ORDER BY "sbom_occurrences"."id" ASC
LIMIT 20 OFFSET 0;
Time: 496.995 ms
- planning: 11.399 ms
- execution: 485.596 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 248033 (~1.90 GiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/22603/commands/72705
After:
WITH "our_occurrences" AS MATERIALIZED (
SELECT DISTINCT ON (sbom_occurrences.component_version_id) sbom_occurrences.*,
COUNT(sbom_occurrences.id) OVER (PARTITION BY sbom_occurrences.component_version_id) AS occurrence_count,
COUNT(sbom_occurrences.project_id) OVER (PARTITION BY sbom_occurrences.component_version_id) AS project_count
FROM "sbom_occurrences"
WHERE "sbom_occurrences"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (
SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
AND (traversal_ids @> ('{6543}'))
)
AND "projects"."marked_for_deletion_at" IS NULL
AND "projects"."pending_delete" = FALSE
)
AND "sbom_occurrences"."component_version_id" IS NOT NULL
ORDER BY
"sbom_occurrences"."component_version_id" DESC,
"sbom_occurrences"."id" DESC
)
SELECT "sbom_occurrences".*
FROM "our_occurrences" AS "sbom_occurrences"
ORDER BY sbom_occurrences.id asc
LIMIT 20 OFFSET 0;
Time: 282.846 ms
- planning: 10.849 ms
- execution: 271.997 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 54593 (~426.50 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/22603/commands/72708
Screenshots or screen recordings
Before | After |
---|---|
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Edited by mo khan