Limit updates to Web Hook backoff interval

Stan Hu requested to merge sh-limit-web-hook-backoffs into master

If a Web hook times out, this is treated as an error, and WebHook#backoff! is executed. However, if the hook fires repeatedly, which is common for a system hook or group hook, this backoff can update the same row repeatedly via WebHooks::LogExecutionWorker jobs. This not only generates unnecessary table bloat, but it can also cause significant performance degradation when a long-running transaction is open.

These concurrent row updates can cause PostgreSQL to allocate multixact transaction IDs. A SELECT call will cause PostgreSQL to prune tuples opportunistically, but this pruning may be significantly slowed if the window of multixact tuples grows over time. Once the multixact XIDs no longer fit in PostgreSQL's simple in-memory LRU cache, we see slowdowns when accessing the web_hooks table.

To avoid this, we cap the number of backoffs to 100 (MAX_FAILURES) and only update the row if the disabled_until time has elapsed. This should ensure the hook only fires once every 24 hours and only updates the row once during that time.
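The guard described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not GitLab's actual implementation: the class, the backoff constants, and the exponential-growth formula are assumptions; only the MAX_FAILURES cap of 100 and the elapsed-disabled_until check come from the description.

```ruby
require 'time'

# Hypothetical stand-in for the WebHook model (names and intervals assumed).
class WebHookStub
  MAX_FAILURES = 100              # cap from the MR description
  INITIAL_BACKOFF = 10 * 60       # 10 minutes, assumed starting interval
  MAX_BACKOFF = 24 * 60 * 60      # 24 hours, assumed ceiling

  attr_reader :recent_failures, :disabled_until, :updates

  def initialize
    @recent_failures = 0
    @disabled_until = nil
    @updates = 0 # counts the row UPDATEs we would issue
  end

  def backoff!(now = Time.now)
    # Key change: skip the UPDATE entirely while a previous
    # backoff window is still in effect.
    return if disabled_until && disabled_until > now

    # Key change: stop incrementing once the failure cap is reached.
    @recent_failures += 1 if recent_failures < MAX_FAILURES

    interval = [INITIAL_BACKOFF * (2**(recent_failures - 1)), MAX_BACKOFF].min
    @disabled_until = now + interval
    @updates += 1
  end
end

hook = WebHookStub.new
t = Time.now
# Simulate many LogExecutionWorker jobs firing for the same failing hook.
100.times { hook.backoff!(t) }
# Only the first call writes the row; the rest see disabled_until in the future.
```

With the early return in place, repeated failures inside one backoff window collapse to a single row update, which is what prevents the multixact bloat described above.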

Relates to #340272 (closed)

Edited by Stan Hu
