Fix Web hook deletion not working when many hook logs are present
When a user attempts to destroy a Web hook, the database will attempt to delete all the associated Web hook logs. However, as we have seen in #21940 (closed), the table may be too bloated or the number of rows too large that this deletion can time out due to a 15-second statement timeout.
We rectify this situation by deleting these logs in batches of 1000 outside of a transaction. That should be acceptable since old logs get pruned periodically anyway, and when a Web hook is destroyed it's more important that the destruction makes progress and eventually removes the hook.
When there are many Web logs present, it's not practical to delete all of them in a single request. We schedule an async delete if we detect there are at least 10,000 entries.
Relates to #21940 (closed)
SQL
I used this code to destroy 30 million Web hook logs. The batch deletion looks something like this (it starts out fast but then starts to take 500 ms or so per batch):
D, [2020-09-26T01:40:17.014729 #23113] DEBUG -- : WebHookLog Destroy (468.8ms) DELETE FROM "web_hook_logs" WHERE "web_hook_logs"."id" IN (SELECT "web_hook_logs"."id" FROM "web_hook_logs" WHERE "web_hook_logs"."web_hook_id" = 333850 LIMIT 1000) /*application:console*/
D, [2020-09-26T01:40:17.511939 #23113] DEBUG -- : WebHookLog Destroy (495.7ms) DELETE FROM "web_hook_logs" WHERE "web_hook_logs"."id" IN (SELECT "web_hook_logs"."id" FROM "web_hook_logs" WHERE "web_hook_logs"."web_hook_id" = 333850 LIMIT 1000) /*application:console*/
D, [2020-09-26T01:40:17.981824 #23113] DEBUG -- : WebHookLog Destroy (468.5ms) DELETE FROM "web_hook_logs" WHERE "web_hook_logs"."id" IN (SELECT "web_hook_logs"."id" FROM "web_hook_logs" WHERE "web_hook_logs"."web_hook_id" = 333850 LIMIT 1000) /*application:console*/
D, [2020-09-26T01:40:18.454953 #23113] DEBUG -- : WebHookLog Destroy (471.7ms) DELETE FROM "web_hook_logs" WHERE "web_hook_logs"."id" IN (SELECT "web_hook_logs"."id" FROM "web_hook_logs" WHERE "web_hook_logs"."web_hook_id" = 333850 LIMIT 1000) /*application:console*/
The count query takes 5 seconds on GitLab.com for the system hook with 30 million rows:
explain analyze SELECT COUNT(*) FROM (SELECT 1 AS one FROM "web_hook_logs" WHERE "web_hook_logs"."web_hook_id" = 67407 LIMIT 10000) subquery_for_count; QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=540.40..540.41 rows=1 width=8) (actual time=5307.694..5307.695 rows=1 loops=1)
-> Limit (cost=0.57..415.40 rows=10000 width=4) (actual time=1.259..5303.695 rows=10000 loops=1)
-> Index Only Scan using index_web_hook_logs_on_web_hook_id on web_hook_logs (cost=0.57..1259872.78 rows=30370785 width=4) (actual time=1.258..5301.437 rows=10000 loops=1)
Index Cond: (web_hook_id = 67407)
Heap Fetches: 10001
Planning Time: 2.940 ms
Execution Time: 5307.832 ms
(7 rows)