Add worker to prune old export jobs
What does this MR do and why?
There are millions of old ProjectExportJob records on the DB that need pruning.
This change adds a worker that deletes any records older than 7 days.
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13698/commands/48086
Query
explain SELECT "project_export_jobs"."id" FROM "project_export_jobs" WHERE (updated_at < '2022-11-29 12:44:30.585313') ORDER BY "project_export_jobs"."id" ASC;
Plan with execution
Index Scan using project_export_jobs_pkey on public.project_export_jobs (cost=0.42..334614.27 rows=49495 width=8) (actual time=17.994..7923.438 rows=49501 loops=1)
Filter: (project_export_jobs.updated_at < '2022-11-29 12:44:30.585313+00'::timestamp with time zone)
Rows Removed by Filter: 110357
Buffers: shared hit=144921 read=7424 dirtied=175
I/O Timings: read=7445.582 write=0.000
Recommendations
❗️ Query processes too much data to return a relatively small number of rows. – Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks. <http://momjian.us/main/writings/pgsql/hw_performance/|Show details>
❗️ Add LIMIT – The number of rows in the result set is too big. Limit number of rows. <https://postgres.ai/#tip-add-limit|Show details>
❗️ Specialized index needed – The index(es) currently used does not serve quite well for the needs of this query (notice `Rows Removed by Filter: ...`, meaning that the index fetched many non-target rows). Consider adding more specialized index(es). <https://postgres.ai/#tip-index-inefficient-high-filtered|Show details>
Copy
Statistics
Time: 7.929 s
- planning: 0.719 ms
- execution: 7.928 s
- I/O read: 7.446 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 144921 (~1.10 GiB) from the buffer pool
- reads: 7424 (~58.00 MiB) from the OS file cache, including disk I/O
- dirtied: 175 (~1.40 MiB)
- writes: 0
A batched background migration has already been run on production to do an initial prune of ~9m records: https://gitlab.slack.com/archives/CB2S7NNDP/p1670251122239449
ID
336
Job class
PruneStaleProjectExportJobs
Table
project_export_jobs
Status
finished
Progress
100%
Created at
2022-12-02T01:22:32.760Z
Before migration:
[ gprd ] production> ProjectExportJob.last.id
=> 9628218
[ gprd ] production> ProjectExportJob.where("updated_at < ?", 7.days.ago).size
=> 9138998
After migration:
[ gprd ] production> ProjectExportJob.last.id
=> 9800821
[ gprd ] production> ProjectExportJob.where("updated_at < ?", 7.days.ago).size
=> 36302
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #351253 (closed)
Edited by Carla Drago