Add migration to cleanup web_hook_logs backfill
What does this MR do?
Related issue: #323675 (closed)
Migration to call the partitioning migration helper that cleans up after the background migration used to backfill the web_hook_logs
table.
Summary from comments in #323675 (closed) and &5558 (comment 538648286):
On GitLab.com, 10691 background jobs were completed successfully and 338 background jobs are not marked as completed:
SELECT min(created_at), max(created_at), min(updated_at), max(updated_at)
FROM background_migration_jobs where arguments ? 'web_hook_logs';
min | max | min | max
-------------------------------+-------------------------------+------------------------------+-------------------------------
2021-03-10 19:17:40.001109+00 | 2021-03-10 19:21:40.654249+00 | 2021-03-10 19:17:41.94364+00 | 2021-03-26 03:01:39.943296+00
SELECT status, count(*) FROM background_migration_jobs WHERE arguments ? 'web_hook_logs' group by 1;
status | count
--------+-------
0 | 338
1 | 10691
The last background migration job run 9 hours ago, so we can assume the backfilling migration as completed with 338 jobs failed, not scheduled (lost?) or not marked as completed even though they run.
I can see no errors during the past 7 days in kibana other than a pgbouncer cannot connect to server
for a specific job, so there seem to be no errors related to database timeouts.
Checking the db_duration_s
of the Background Migration jobs for web_hook_logs
in kibana, there are a few jobs that go up to 1 minute of execution time (for 25 batches), but the average time is 7.4 seconds per job and 4.12 seconds of total db execution time.
I think that we can assume that 8 seconds per job is a safe maximum, which means that 338 jobs will require less than an hour to run (= 45 minutes = 2704 seconds), so I am moving forward with submitting this MR that runs finalize_backfilling_partitioned_table
in a post deployment migration.
Migration Output
up
$ bundle exec rake db:migrate
== 20210326121537 BackfillCleanupForPartitionedWebHookLogs: migrating =========
-- transaction_open?()
-> 0.0000s
-- table_exists?("web_hook_logs_part_0c5294f417")
-> 0.0007s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- execute("VACUUM FREEZE ANALYZE web_hook_logs_part_0c5294f417")
-> 0.0180s
-- execute("RESET ALL")
-> 0.0006s
== 20210326121537 BackfillCleanupForPartitionedWebHookLogs: migrated (0.0378s)
down
$ bundle exec rake db:rollback
== 20210326121537 BackfillCleanupForPartitionedWebHookLogs: reverting =========
== 20210326121537 BackfillCleanupForPartitionedWebHookLogs: reverted (0.0000s)
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. - [-] I have not included a changelog entry because _____.
-
- [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
- [-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team