
Deactivate prune webhooklogs worker

Andy Schoenen requested to merge deactivate-prune-webhooklogs-worker into master

What does this MR do?

It deactivates the PruneWebHookLogsWorker recurring Sidekiq job. The job runs once per hour and removes old web_hook_logs entries. It was originally introduced in 75316348. @iroussos and the database group are working on partitioning the web_hook_logs table to make it easier to drop old records. However, PruneWebHookLogsWorker is currently blocking the migration, as described in &5558 (comment 542199537).

Our proposal is to disable PruneWebHookLogsWorker for both GitLab.com and self-hosted instances in %13.11:

  1. Why disable it on GitLab.com?

    The PruneWebHookLogsWorker cron job cannot keep up with the rate at which new records are added: it removes ~2.2M (= 50000 * (168 - 125)) records per week, while well over 3M new records are created per day.

    Even if we were to address all issues, we would clean up 1.2M records per day, which is only about 35-40% of the rate at which new records are added.

    We think it is better to stop cleaning up old records now and prune the old partitions once and for all in %14.0, after the partitioning migration is done, rather than keep the worker around while it sends queries that time out.

    This means roughly 50GB of additional records will go uncleaned per month until June, but that is a small fraction of the current size of web_hook_logs and, in total, less than a month's worth of data (in March we went up to 170GB, as you can see in my comment above).

  2. Why disable it for self-hosted instances as well?

    We worry that similar locking issues may occur while (large) self-hosted instances run post-deployment migrations in a zero-downtime upgrade.

    As they are not at the scale of GitLab.com, there is the additional risk that PruneWebHookLogsWorker has not fallen behind and will compete directly with the backfilling migration for the same sets of records, causing even more lock conflicts.
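The throughput figures in point 1 can be checked with a short calculation. This is only a sketch using the numbers quoted above. The variable names are illustrative, and reading the `125` in the formula as the number of hourly runs per week that do not complete is an assumption, not something the description states.

```ruby
# Figures quoted in the description; variable names are illustrative only.
batch_size           = 50_000     # records removed by one successful hourly run
scheduled_runs       = 24 * 7     # 168 hourly runs per week
failed_runs          = 125        # ASSUMPTION: weekly runs that do not complete
new_records_per_day  = 3_000_000  # lower bound ("well beyond 3M" per day)

# Current situation: only the successful runs remove a batch.
pruned_per_week = batch_size * (scheduled_runs - failed_runs)

# Best case: every hourly run succeeds.
best_case_pruned_per_day = batch_size * 24

# Share of the daily inflow removed in the best case, as an integer percentage.
# This is an upper bound, because the real inflow is above 3M per day.
best_case_share_percent = best_case_pruned_per_day * 100 / new_records_per_day

puts "Pruned per week (current):    #{pruned_per_week}"          # 2150000
puts "Pruned per day (best case):   #{best_case_pruned_per_day}" # 1200000
puts "Best-case share of inflow, %: #{best_case_share_percent}"  # 40
```

Even in the best case, the worker removes at most about 40% of what is added each day, so the table keeps growing whether or not the worker stays enabled.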

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Edited by Andy Schoenen
