Update the query that looks for stuck pending jobs
What does this MR do?
Related to #339237 (closed)
We have around 500 outdated stuck pending jobs in the database and this is causing a deadlock with the effort of migrating them to the ci_pending_builds
table. This MR changes the database query to start dropping of the older builds and catching up to the newer ones.
This change is behind a feature flag and can be disabled if the query proves to be problematic.
Query and execution plans:
I think the previous timeout errors were caused by the wrong index being used(we have some that overlap):
"ci_builds_gitlab_monitor_metrics" btree (status, created_at, project_id) WHERE type::text = 'Ci::Build'::text
"index_ci_builds_on_status_and_type_and_runner_id" btree (status, type, runner_id)
"index_ci_builds_runner_id_pending_covering" btree (runner_id, id) INCLUDE (project_id) WHERE status::text = 'pending'::text AND type::text = 'Ci::Build'::text
The data that is being searched is pretty small in size, at this moment there are under 10000 pending builds:
gitlabhq_production=> SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."id" IN (SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND ("ci_builds"."status" IN ('pending'))) ;
count
-------
9134
(1 row)
As the worker processes these records, the query planner might decide to use a different index, so I've added a ORDER BY "ci_builds"."created_at" ASC, "ci_builds"."project_id"
to the end of the query to force it to use ci_builds_gitlab_monitor_metrics
all the time.
Screenshots or Screencasts (strongly suggested)
How to setup and validate locally (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team