Improve PipelineScheduleWorker and check if schedule is executed
## What does this MR do and why?
- Improve the performance of `PipelineScheduleWorker` to reduce the time needed to enqueue pipelines.
- Check whether a schedule has already been executed.

Currently, `PipelineScheduleWorker` takes a long time to process, so it might not finish before the next `PipelineScheduleWorker` starts to run. This means pipeline schedules that haven't been processed by the first `PipelineScheduleWorker` might be picked up by the second one.

Merge request !101328 (merged) tried to make `RunPipelineScheduleWorker` idempotent to reduce duplicated enqueues. However, this didn't prevent duplication from happening at two different timestamps, and duplications are still being reported by a user in this thread: #338609 (comment 1200751371).

Therefore, the aim of this MR is to reduce the execution time of `PipelineScheduleWorker` so that it doesn't overlap with the next run. At the same time, a check is added to prevent a schedule from being executed when it's already scheduled for the next run.
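The idea behind that check can be sketched in plain Ruby (a hypothetical helper for illustration, not GitLab's actual implementation): if a schedule's `next_run_at` is already in the future, a previous worker run must have picked it up and rescheduled it, so the current run can skip it.

```ruby
require 'time'

# Hypothetical guard: a schedule whose next_run_at has already been pushed
# past the current time was rescheduled by an earlier worker run, so it is
# considered "already scheduled for the next run" and can be skipped.
def already_scheduled_for_next_run?(next_run_at, now: Time.now)
  next_run_at > now
end
```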
## Queries and their performance
This MR uses this scope:

```ruby
Ci::PipelineSchedule
  .select(:id, :owner_id) # Minimize the selected columns
  .runnable_schedules
  .find_in_batches do |schedules|
```
And the queries produced are:
- The first batch produces the following query:

  ```sql
  SELECT "ci_pipeline_schedules"."id", "ci_pipeline_schedules"."owner_id" FROM "ci_pipeline_schedules" WHERE "ci_pipeline_schedules"."active" = TRUE AND (next_run_at < '2023-01-05 02:19:00') ORDER BY "ci_pipeline_schedules"."id" ASC LIMIT 1000
  ```

  The postgres.ai analyze report is at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14210/commands/49856

  The explain is:

  ```
  Limit  (cost=0.42..171.29 rows=1000 width=8) (actual time=0.317..30.370 rows=1000 loops=1)
    Buffers: shared hit=4859 read=22 dirtied=36
    I/O Timings: read=1.154 write=0.000
    ->  Index Scan using ci_pipeline_schedules_pkey on public.ci_pipeline_schedules  (cost=0.42..14468.22 rows=84670 width=8) (actual time=0.316..30.249 rows=1000 loops=1)
          Filter: (ci_pipeline_schedules.active AND (ci_pipeline_schedules.next_run_at < '2023-01-05 02:19:00'::timestamp without time zone))
          Rows Removed by Filter: 4302
          Buffers: shared hit=4859 read=22 dirtied=36
          I/O Timings: read=1.154 write=0.000
  ```
- Batches after the first one produce a query similar to the above, with an additional condition `"ci_pipeline_schedules"."id" > $MAX_ID_FROM_LAST_BATCH`:

  ```sql
  SELECT "ci_pipeline_schedules"."id", "ci_pipeline_schedules"."owner_id" FROM "ci_pipeline_schedules" WHERE "ci_pipeline_schedules"."active" = TRUE AND (next_run_at < '2023-01-05 02:19:00') AND "ci_pipeline_schedules"."id" > 10000 ORDER BY "ci_pipeline_schedules"."id" ASC LIMIT 1000
  ```

  The postgres.ai analyze report is at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14210/commands/49857

  The explain is:

  ```
  Limit  (cost=0.42..178.99 rows=1000 width=8) (actual time=0.040..3.928 rows=1000 loops=1)
    Buffers: shared hit=2911
    I/O Timings: read=0.000 write=0.000
    ->  Index Scan using ci_pipeline_schedules_pkey on public.ci_pipeline_schedules  (cost=0.42..14723.73 rows=82451 width=8) (actual time=0.039..3.837 rows=1000 loops=1)
          Index Cond: (ci_pipeline_schedules.id > 10000)
          Filter: (ci_pipeline_schedules.active AND (ci_pipeline_schedules.next_run_at < '2023-01-05 02:19:00'::timestamp without time zone))
          Rows Removed by Filter: 1950
          Buffers: shared hit=2911
          I/O Timings: read=0.000 write=0.000
  ```
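The extra `id >` condition in the later batches comes from `find_in_batches`, which pages through rows by primary key. That batching pattern can be sketched in plain Ruby (hypothetical in-memory data ordered by `id`, not ActiveRecord):

```ruby
# Sketch of keyset pagination as done by find_in_batches: each batch only
# fetches rows whose id is greater than the last id seen, which is why every
# query after the first gains an `id > $MAX_ID_FROM_LAST_BATCH` condition.
# Assumes `rows` is ordered by :id, as the SQL ORDER BY guarantees.
def each_batch(rows, batch_size: 1000)
  last_id = 0
  loop do
    batch = rows.select { |r| r[:id] > last_id }.first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last[:id]
  end
end
```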
As the plans above show, an index should be used instead of the filter. The index is prepared in this MR and added in !107904 (closed).

If the index is added at postgres.ai as follows:

```sql
EXEC CREATE INDEX index_ci_pipeline_schedules_on_id_and_next_run_at_and_active ON ci_pipeline_schedules USING btree (id, next_run_at) WHERE (active = true);
```
The execution plans for the above queries become:
- The first batch produces the following query:

  ```sql
  SELECT "ci_pipeline_schedules"."id", "ci_pipeline_schedules"."owner_id" FROM "ci_pipeline_schedules" WHERE "ci_pipeline_schedules"."active" = TRUE AND (next_run_at < '2023-01-05 02:19:00') ORDER BY "ci_pipeline_schedules"."id" ASC LIMIT 1000
  ```

  The postgres.ai analyze report is at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14210/commands/49860

  The explain is:

  ```
  Limit  (cost=0.42..127.70 rows=1000 width=8) (actual time=0.922..3.279 rows=1000 loops=1)
    Buffers: shared hit=935 read=5
    I/O Timings: read=0.956 write=0.000
    ->  Index Scan using index_ci_pipeline_schedules_on_id_and_next_run_at_and_active on public.ci_pipeline_schedules  (cost=0.42..9687.35 rows=76103 width=8) (actual time=0.921..3.181 rows=1000 loops=1)
          Index Cond: (ci_pipeline_schedules.next_run_at < '2023-01-05 02:19:00'::timestamp without time zone)
          Buffers: shared hit=935 read=5
          I/O Timings: read=0.956 write=0.000
  ```
- Batches after the first one produce a query similar to the above, with an additional condition `"ci_pipeline_schedules"."id" > $MAX_ID_FROM_LAST_BATCH`:

  ```sql
  SELECT "ci_pipeline_schedules"."id", "ci_pipeline_schedules"."owner_id" FROM "ci_pipeline_schedules" WHERE "ci_pipeline_schedules"."active" = TRUE AND (next_run_at < '2023-01-05 02:19:00') AND "ci_pipeline_schedules"."id" > 10000 ORDER BY "ci_pipeline_schedules"."id" ASC LIMIT 1000
  ```

  The postgres.ai analyze report is at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14210/commands/49859

  The explain is:

  ```
  Limit  (cost=0.42..133.50 rows=1000 width=8) (actual time=0.213..2.369 rows=1000 loops=1)
    Buffers: shared hit=945 read=9
    I/O Timings: read=0.335 write=0.000
    ->  Index Scan using index_ci_pipeline_schedules_on_id_and_next_run_at_and_active on public.ci_pipeline_schedules  (cost=0.42..9862.55 rows=74108 width=8) (actual time=0.212..2.269 rows=1000 loops=1)
          Index Cond: ((ci_pipeline_schedules.id > 10000) AND (ci_pipeline_schedules.next_run_at < '2023-01-05 02:19:00'::timestamp without time zone))
          Buffers: shared hit=945 read=9
          I/O Timings: read=0.335 write=0.000
  ```
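For reference, a partial index like the one above would typically be added in a GitLab-style post-deployment migration along these lines. This is only a hedged sketch using GitLab's `add_concurrent_index` migration helper; the actual migration lives in !107904.

```ruby
# Hypothetical migration sketch mirroring the SQL above; names and version
# number are illustrative, not copied from !107904.
class AddIndexToCiPipelineSchedulesOnIdAndNextRunAt < Gitlab::Database::Migration[2.1]
  INDEX_NAME = 'index_ci_pipeline_schedules_on_id_and_next_run_at_and_active'

  # add_concurrent_index builds the index without locking the table,
  # so DDL transactions must be disabled.
  disable_ddl_transaction!

  def up
    add_concurrent_index :ci_pipeline_schedules, [:id, :next_run_at],
      where: 'active = true', name: INDEX_NAME
  end

  def down
    remove_concurrent_index_by_name :ci_pipeline_schedules, INDEX_NAME
  end
end
```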
## Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
## How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
## MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #338609 (closed)