New Query Optimization For Running Stuck Jobs (!71601) · Merge requests · GitLab.org / GitLab

Allison Browne requested to merge optimize-running-stuck-6 into master Oct 01, 2021

What does this MR do and why?

Behind flag ci_new_query_for_running_stuck_jobs

Optimize the database query as described in the database heading. Please see my last MR for the existing query explain plan. Rollout of that query still saw PG::QueryCanceled: ERROR: canceling statement due to statement timeout failures. This one seems to perform a little bit better in database lab.

Database

SELECT *
FROM "ci_builds"
WHERE "ci_builds"."type" = 'Ci::Build'
    AND ("ci_builds"."status" IN ('running'))
    AND (ci_builds.created_at < '2021-10-01 16:32:41')
    AND (ci_builds.updated_at < '2021-10-01 16:32:41')
ORDER BY "ci_builds"."created_at" ASC, "ci_builds"."project_id" ASC LIMIT 100;

 Limit  (cost=0.70..144.52 rows=100 width=1286) (actual time=0.122..2.707 rows=100 loops=1)
   Buffers: shared hit=638
   I/O Timings: read=0.000 write=0.000
   ->  Index Scan using ci_builds_gitlab_monitor_metrics on public.ci_builds  (cost=0.70..555647.31 rows=386366 width=1286) (actual time=0.121..2.686 rows=100 loops=1)
         Index Cond: (((ci_builds.status)::text = 'running'::text) AND (ci_builds.created_at < '2021-10-01 16:32:41'::timestamp without time zone))
         Filter: (ci_builds.updated_at < '2021-10-01 16:32:41'::timestamp without time zone)
         Rows Removed by Filter: 0
         Buffers: shared hit=638
         I/O Timings: read=0.000 write=0.000

Time: 3.445 ms
  - planning: 0.689 ms
  - execution: 2.756 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 638 (~5.00 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/6739/commands/23784

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Oct 01, 2021 by Allison Browne

New Query Optimization For Running Stuck Jobs

What does this MR do and why?

Database

MR acceptance checklist

Merge request reports