Optimized query to support "Don't list jobs with no artifacts"
This is the issue for backend support of #382074 (closed).
The current query times out for larger, complex data sets per #382074 (comment 1209912062):
Just coming back to this with the postgres.ai result. https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/13998/commands/48935
❗ ️ Query processes too much data to return a relatively small number of rows. – Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks. <http://momjian.us/main/writings/pgsql/hw_performance/|Show details>❗ ️ Add LIMIT – The number of rows in the result set is too big. Limit number of rows. <https://postgres.ai/#tip-add-limit|Show details>❗ ️ Specialized index needed – The index(es) currently used does not serve quite well for the needs of this query (noticeRows Removed by Filter: ...
, meaning that the index fetched many non-target rows). Consider adding more specialized index(es). <https://postgres.ai/#tip-index-inefficient-high-filtered|Show details>❗ ️ VACUUM ANALYZE needed – Visibility map(s) for the table(s) involved in the query are outdated. For better performance: 1) runVACUUM ANALYZE
on them as a one-time solution, 2) perform autovacuum tuning as a long-term permanent solution (tuning "vacuum" part of autovacuum). <https://postgres.ai/#tip-vacuum-analyze-needed|Show details>
We need to optimize the query.
Current query made by Ci::JobFinder
when called from ProjectJobsResolver
with with_artifacts: true
SELECT "ci_builds".*
FROM "ci_builds"
WHERE "ci_builds"."type" = 'Ci::Build'
AND "ci_builds"."project_id" = 278964
AND ("ci_builds"."status" NOT IN ('created'))
AND (
EXISTS (
SELECT 1 FROM "ci_job_artifacts" WHERE ("ci_builds".id = "ci_job_artifacts".job_id) AND "ci_job_artifacts"."file_type" IN (1, 2, 4, 5, 6, 7, 8, 9, 101, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
)
) ORDER BY "ci_builds"."id" DESC;
Query plan with exec
Gather Merge (cost=1001.31..205227435.50 rows=4611629 width=1247) (actual time=141.034..29501250.832 rows=5117617 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=414643740 read=69519177 dirtied=1162358 written=89560
I/O Timings: read=53284434.610 write=5532.121
-> Nested Loop Semi Join (cost=1.28..204694139.07 rows=1921512 width=1247) (actual time=71.638..18293555.500 rows=1705872 loops=3)
Buffers: shared hit=414643740 read=69519177 dirtied=1162358 written=89560
I/O Timings: read=53284434.610 write=5532.121
-> Parallel Index Scan using index_ci_builds_on_project_id_and_id on public.ci_builds (cost=0.70..120789419.81 rows=36334530 width=1247) (actual time=26.366..16972977.633 rows=31472973 loops=3)
Index Cond: (ci_builds.project_id = 278964)
Filter: (((ci_builds.status)::text <> 'created'::text) AND ((ci_builds.type)::text = 'Ci::Build'::text))
Rows Removed by Filter: 126872
Buffers: shared hit=12876157 read=65273152 dirtied=229967 written=65646
I/O Timings: read=50191630.433 write=3933.365
-> Index Only Scan using index_ci_job_artifacts_on_job_id_and_file_type on public.ci_job_artifacts (cost=0.58..2.31 rows=1 width=8) (actual time=0.040..0.040 rows=0 loops=94418919)
Index Cond: (ci_job_artifacts.job_id = ci_builds.id)
Heap Fetches: 3782693
Filter: (ci_job_artifacts.file_type = ANY ('{1,2,4,5,6,7,8,9,101,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28}'::integer[]))
Rows Removed by Filter: 1
Buffers: shared hit=401767583 read=4246025 dirtied=930258 written=23914
I/O Timings: read=3092804.177 write=1598.756
Edited by Albert