LooseForeignKeys::CleanupWorker is falling behind while processing CI LFK_deleted_records
At the time of writing this issue, Loose Foreign Keys Cleanup Worker still has around 25M records of deleted records to process on the ci
database. Most of them are caused by deletions on the ci_builds
gitlabhq_dblab=# select count(*), fully_qualified_table_name from loose_foreign_keys_deleted_records where status = 1 group by fully_qualified_table_name;
count | fully_qualified_table_name
----------+----------------------------
4523 | public.ci_pipelines
13 | public.ci_runner_machines
1292217 | public.ci_runners
23951466 | public.p_ci_builds
(4 rows)
The worker is always hitting the time limit which is 30 seconds. See logs (internal)
As a result, the loose_foreign_keys_deleted_records
, which is a partitioned table, has so many partitions that cannot be deleted
Click to expand
gitlabhq_dblab=# \d+ loose_foreign_keys_deleted_records;
Partitioned table "public.loose_foreign_keys_deleted_records"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------------------------+--------------------------+-----------+----------+----------------------------------------------------------------+----------+--------------+-------------
id | bigint | | not null | nextval('loose_foreign_keys_deleted_records_id_seq'::regclass) | plain | |
partition | bigint | | not null | 561 | plain | |
primary_key_value | bigint | | not null | | plain | |
status | smallint | | not null | 1 | plain | |
created_at | timestamp with time zone | | not null | now() | plain | |
fully_qualified_table_name | text | | not null | | extended | |
consume_after | timestamp with time zone | | | now() | plain | |
cleanup_attempts | smallint | | | 0 | plain | |
Partition key: LIST (partition)
Indexes:
"loose_foreign_keys_deleted_records_pkey" PRIMARY KEY, btree (partition, id)
"index_loose_foreign_keys_deleted_records_for_partitioned_query" btree (partition, fully_qualified_table_name, consume_after, id) WHERE status = 1
Check constraints:
"check_1a541f3235" CHECK (char_length(fully_qualified_table_name) <= 150)
Partitions: gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_516 FOR VALUES IN ('516'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_517 FOR VALUES IN ('517'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_518 FOR VALUES IN ('518'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_519 FOR VALUES IN ('519'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_520 FOR VALUES IN ('520'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_521 FOR VALUES IN ('521'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_522 FOR VALUES IN ('522'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_523 FOR VALUES IN ('523'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_524 FOR VALUES IN ('524'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_525 FOR VALUES IN ('525'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_526 FOR VALUES IN ('526'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_527 FOR VALUES IN ('527'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_528 FOR VALUES IN ('528'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_529 FOR VALUES IN ('529'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_530 FOR VALUES IN ('530'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_531 FOR VALUES IN ('531'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_532 FOR VALUES IN ('532'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_533 FOR VALUES IN ('533'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_534 FOR VALUES IN ('534'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_535 FOR VALUES IN ('535'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_536 FOR VALUES IN ('536'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_537 FOR VALUES IN ('537'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_538 FOR VALUES IN ('538'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_539 FOR VALUES IN ('539'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_540 FOR VALUES IN ('540'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_541 FOR VALUES IN ('541'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_542 FOR VALUES IN ('542'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_543 FOR VALUES IN ('543'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_544 FOR VALUES IN ('544'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_545 FOR VALUES IN ('545'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_546 FOR VALUES IN ('546'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_547 FOR VALUES IN ('547'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_548 FOR VALUES IN ('548'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_549 FOR VALUES IN ('549'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_550 FOR VALUES IN ('550'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_551 FOR VALUES IN ('551'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_552 FOR VALUES IN ('552'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_553 FOR VALUES IN ('553'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_554 FOR VALUES IN ('554'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_555 FOR VALUES IN ('555'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_556 FOR VALUES IN ('556'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_557 FOR VALUES IN ('557'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_558 FOR VALUES IN ('558'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_559 FOR VALUES IN ('559'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_560 FOR VALUES IN ('560'),
gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_561 FOR VALUES IN ('561')
this comment it turned out that p_ci_builds
has no LFK definitions, which is causing the problem
After some analysis, see Suggested Corrective Actions, which can be split into their own issues:
Giving higher chance for CI in its share of the Worker which runs every minute: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/workers/loose_foreign_keys/cleanup_worker.rb#L39-43- [Temporarily] tweaking the limit: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/loose_foreign_keys/modification_tracker.rb#L5-7
- Maybe we write down some Background Migrations that do extra cleanups, especially when the load on the database is low.
Parallelize the worker: #378586- Monitoring and Alerting
🚨 We have metrics with so much informationloose_foreign_key_updates
,loose_foreign_key_deletions
andloose_foreign_key_rescheduled_deleted_records
References:
- Grafana Dashboard: https://dashboards.gitlab.net/d/sidekiq-loose-foreign-keys/sidekiq-loose-foreign-keys-processing?orgId=1&from=now-6h&to=now
CC: @ahegyi
Edited by Marius Bobin