Limit the request and response being logged from the WebhookWorker
What does this MR do and why?
For https://gitlab.com/gitlab-org/gitlab/-/issues/340027
Whenever a webhook is executed, the execution result is stored inside the database so that customers can revisit it later. At the moment, both the request and response are logged. We haven't had any limits of both of them. We simply stores whatever a response throw back to us. That may be an issue in long-term. The data will be displayed on the UI. It does not make much sense to display a page full of texts back to the customers. In addition, when we applied Sidekiq limit, a too-large execution log will be rejected by Sidekiq. At the moment, we have a hard limit of 5MB payload after compression. This limit is more than generous for most use cases. If an execution log exceeds this limit, I highly doubt they can be useful for most customers.
This MR is to limit the data being logged by the webhooks. In details:
- Strip the response data to 8kb. 8kb is equal to 2 full pages of text on a 4K screen. I generally think that amount of data is enough for the users to inspect the errors if any.
- Store the first 50 response headers. A typical API to GitLab has around 20 headers.
- Limit the content of response header to 8kb.
- Strip request data when fails to schedule Sidekiq limit exception. Request data is tough. It is used for retry feature. So, we only strip the request data whenever we have no choice. The retry functionality will be disbled when so.
This MR affects two features: Project Hook, and Admin System Hook.
How to set up and validate locally
To validate this MR in the local environment, we'll need to setup the webhook to a project and/or system hook. The hook URL is set to a local mock server that returns desired mocking data. In this section, I use a minimal Rack server. Rack server definition is defined in test.ru
file. It is started up with bundle exec rackup test.ru
command.
The following set of testing scenarios are also repeated for admin system hooks.
Scenario 1: response data is too big
Rack Server
require 'securerandom'
require 'httparty'
seed = HTTParty.get("https://gitlab.com").response.body
test_content =
if seed.size > 8192
seed
else
seed * (seed / 8192) + 1
end
run ->(env) { [200, {}, [test_content]] }
Scenario 2: massive headers
Rack server
require 'securerandom'
headers = (1..1000).to_a.to_h do |v|
["Header-#{v}", SecureRandom.hex(v)]
end
run ->(env) { [200, headers, ['Hello world']] }
Scenario 3: request data is too huge
Prerequisite step: setup Sidekiq payload limit to a reasonable amount
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.