Skip to content

Fix creating dependency list segment creation queries

Mehmet Emin INAC requested to merge minac_473262_improve_batching_query into master

What does this MR do and why?

PostgreSQL query planner chooses the wrong index when all the columns are selected from the sbom_occurrences table for batch queries. When we select only the indexed columns, the correct index is used. This change will cause firing an extra query for each batch but the extra one will query records by primary key.

Related to Segmented export parts can timeout due to plan ... (#473262 - closed).

Database review

Existing query
SELECT
    sbom_occurrences.*
FROM
    sbom_occurrences
WHERE
    traversal_ids >= '{4249178}' AND
    traversal_ids < '{4249179}' AND
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) >= (
        '{4249178,5030580,87255895}',
        5405902769
    ) AND
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) <= (
        '{4249178,5087837,9692938,9692949}',
        3801323122
    ) AND
    sbom_occurrences.archived = false
ORDER BY
    sbom_occurrences.traversal_ids ASC,
    sbom_occurrences.id ASC
LIMIT 1000;
Limit  (cost=5.85..291.79 rows=1000 width=355) (actual time=94.368..94.574 rows=1000 loops=1)
   Buffers: shared hit=38944
   I/O Timings: read=0.000 write=0.000
   ->  Incremental Sort  (cost=5.85..88551.81 rows=309663 width=355) (actual time=94.365..94.510 rows=1000 loops=1)
         Sort Key: sbom_occurrences.traversal_ids, sbom_occurrences.id
         Buffers: shared hit=38944
         I/O Timings: read=0.000 write=0.000
         ->  Index Scan using index_unarchived_sbom_occurrences_for_aggregations on public.sbom_occurrences  (cost=0.69..72740.94 rows=309663 width=355) (actual time=16.999..78.357 rows=20832 loops=1)
               Index Cond: ((sbom_occurrences.traversal_ids >= '{4249178}'::bigint[]) AND (sbom_occurrences.traversal_ids < '{4249179}'::bigint[]) AND (sbom_occurrences.traversal_ids >= '{4249178,5030580,87255895}'::bigint[]) AND (sbom_occurrences.traversal_ids <= '{4249178,5087837,9692938,9692949}'::bigint[]))
               Filter: ((ROW(sbom_occurrences.traversal_ids, sbom_occurrences.id) >= ROW('{4249178,5030580,87255895}'::bigint[], '5405902769'::bigint)) AND (ROW(sbom_occurrences.traversal_ids, sbom_occurrences.id) <= ROW('{4249178,5087837,9692938,9692949}'::bigint[], '3801323122'::bigint)))
               Rows Removed by Filter: 17701
               Buffers: shared hit=38933
               I/O Timings: read=0.000 write=0.000
New query
SELECT
    sbom_occurrences.id,
    sbom_occurrences.traversal_ids
FROM
    sbom_occurrences
WHERE
    traversal_ids >= '{4249178}' AND
    traversal_ids < '{4249179}' AND
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) >= (
        '{4249178,5030580,87255895}',
        5405902769
    ) AND
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) <= (
        '{4249178,5087837,9692938,9692949}',
        3801323122
    ) AND
    sbom_occurrences.archived = false
ORDER BY
    sbom_occurrences.traversal_ids ASC,
    sbom_occurrences.id ASC
LIMIT 1000;
Limit  (cost=0.56..49.92 rows=1000 width=54) (actual time=0.070..0.751 rows=1000 loops=1)
   Buffers: shared hit=144
   I/O Timings: read=0.000 write=0.000
   ->  Index Only Scan using index_sbom_occurrences_on_traversal_ids_and_id on public.sbom_occurrences  (cost=0.56..15283.55 rows=309663 width=54) (actual time=0.069..0.686 rows=1000 loops=1)
         Index Cond: ((sbom_occurrences.traversal_ids >= '{4249178}'::bigint[]) AND (sbom_occurrences.traversal_ids < '{4249179}'::bigint[]) AND (ROW(sbom_occurrences.traversal_ids, sbom_occurrences.id) >= ROW('{4249178,5030580,87255895}'::bigint[], '5405902769'::bigint)) AND (ROW(sbom_occurrences.traversal_ids, sbom_occurrences.id) <= ROW('{4249178,5087837,9692938,9692949}'::bigint[], '3801323122'::bigint)))
         Heap Fetches: 73
         Buffers: shared hit=144
         I/O Timings: read=0.000 write=0.000

Merge request reports

Loading