[Feature flag] Apply rate-limiting to webhook executions
Feature
This feature uses the :web_hooks_rate_limit feature flag!
!61151 (merged) introduces the functionality, but it is disabled by default (both through the feature flag and because no threshold is defined yet).
Owners
- Team: ~"group::ecosystem"
- Most appropriate Slack channel to reach out to: #g_create_ecosystem-be
- Best individual to reach out to: @toupeira
- PM: @deuley
Stakeholders
The Rollout Plan
This issue only focuses on the rollout for the Free plan on gitlab.com.
Possible follow-up issues:
- Adding thresholds on paid plans.
- Adding a default threshold for self-managed instances.
Expectations
What are we expecting to happen?
Frequently called webhooks will get rate-limited.
What might happen if this goes wrong?
We might set the threshold too low and break users' workflows.
What can we monitor to detect problems with this?
Log source | Staging | Production |
---|---|---|
Rate limit events (Rails) | https://nonprod-log.gitlab.net/goto/51d8ebf49baf1f84ed7a6f443bfffeb5 | https://log.gprd.gitlab.net/goto/f327f3c32a524be2be2a38e43bf8cffe |
Rate limit events (Sidekiq) | https://nonprod-log.gitlab.net/goto/f81cb098d007be1ea735bf702bd1e88d | https://log.gprd.gitlab.net/goto/cd9cdcae88393e22e822cd8f37b4b46d |
(Note: The log source depends on whether the webhook was triggered from a web request or a job worker)
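To make the expected behaviour concrete (calls above the threshold are limited, the counter resets after the interval, and nothing is limited while no threshold is defined), here is a minimal sketch of a fixed-window limiter. It is not GitLab's implementation: the class name `FixedWindowLimiter`, the in-memory counter, the 60-second interval, and the example threshold of 3 are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Optional


class FixedWindowLimiter:
    """Hypothetical fixed-window limiter; not GitLab's actual implementation."""

    def __init__(self, threshold: Optional[int], interval: int = 60) -> None:
        # threshold=None models "no threshold defined yet": nothing is limited.
        self.threshold = threshold
        self.interval = interval
        self._counts = defaultdict(int)

    def throttled(self, hook_id: int, now: float) -> bool:
        """Record one call and report whether it exceeds the threshold."""
        if self.threshold is None:
            return False
        # All calls within the same interval share one counter.
        window = int(now // self.interval)
        self._counts[(hook_id, window)] += 1
        return self._counts[(hook_id, window)] > self.threshold


# Assumed example values: threshold of 3 calls per 60-second window.
limiter = FixedWindowLimiter(threshold=3)
results = [limiter.throttled(hook_id=1, now=t) for t in (0, 1, 2, 3)]
after_reset = limiter.throttled(hook_id=1, now=61)
unlimited = FixedWindowLimiter(threshold=None).throttled(hook_id=1, now=0)
```

In this sketch the fourth call inside the first window is throttled, while the call at second 61 falls into a new window and passes again.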
Rollout Timeline
Rollout Steps
Preparation Phase
- Enable on staging (/chatops run feature set web_hooks_rate_limit true --staging)
- Verify behaviour on staging
  - Set a temporary threshold for the Free plan.
  - Verify the rate limiting behaves as expected (rate-limit takes effect, resets after the interval, doesn't affect non-Free plans)
  - Reset the temporary threshold.
- Ensure that documentation has been updated (More info)
- Check that !62130 (merged) is deployed to gitlab.com.
- Enable on production (/chatops run feature set web_hooks_rate_limit true)
  - No threshold is defined yet so this won't have an effect, but as a side-effect of checking the plan limits we'll also log the subscription plan in Kibana.
- Determine a suitable threshold for the Free plan, based on usage patterns in Kibana.
- Submit an MR to:
  - Add a migration to set the threshold for the Free plan on gitlab.com.
  - Document the threshold on https://docs.gitlab.com/ee/user/gitlab_com/index.html#webhooks.
  - !62918 (merged)
- Disable on production (/chatops run feature set web_hooks_rate_limit false)
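One possible way to approach the "determine a suitable threshold" step is sketched below. It assumes the webhook execution events have been exported from Kibana as `(hook_id, unix_timestamp)` pairs; that export format, the function names, and the quantile-based choice are hypothetical and not part of the actual rollout tooling.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def peak_calls_per_minute(events: Iterable[Tuple[int, float]]) -> Dict[int, int]:
    """Return the busiest one-minute bucket observed for each hook."""
    buckets = Counter((hook_id, int(ts // 60)) for hook_id, ts in events)
    peaks: Dict[int, int] = {}
    for (hook_id, _minute), count in buckets.items():
        peaks[hook_id] = max(peaks.get(hook_id, 0), count)
    return peaks


def suggest_threshold(peaks: Dict[int, int], quantile: float = 0.99) -> int:
    """Nearest-rank pick: smallest observed peak that covers `quantile` of hooks."""
    if not peaks:
        raise ValueError("no events to analyse")
    ordered = sorted(peaks.values())
    index = max(0, math.ceil(quantile * len(ordered)) - 1)
    return ordered[index]


# Made-up sample data, purely for illustration.
events = [(1, 0), (1, 10), (1, 70), (2, 5), (2, 6), (2, 7), (3, 100)]
peaks = peak_calls_per_minute(events)
```

A threshold chosen this way only reflects the sampled period, so it would still need to be sanity-checked against known legitimate high-volume integrations.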
Full Rollout Phase
- Announce on the issue an estimated time this will be enabled on GitLab.com
- Check if the feature flag change needs to be accompanied with a change management issue. Cross link the issue here if it does.
- Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production. If a different developer will be covering, or an exception is needed, please inform the oncall SRE by using the @sre-oncall Slack alias.
- Notify about the upcoming change in #support_gitlab-com (more guidance when this is necessary in the dev docs) and in your team channel
- After the %14.0 release announcement on June 22:
  - Enable on production (/chatops run feature set web_hooks_rate_limit true)
  - Verify the behaviour on production (trigger more than 120 webhook calls per minute and check logs)
  - Announce on the issue that the flag has been enabled
- Submit an MR to make the feature flag enabled by default.
- Wait for the MR to be deployed.
- Remove the feature flag on all environments.
Rollback Steps
- This feature can be disabled by running the following Chatops command: /chatops run feature set web_hooks_rate_limit false