Limit the request and response being logged from the WebhookWorker (!88737) · Merge requests · GitLab.org / GitLab

Quang-Minh Nguyen requested to merge qmnguyen0711/limit-web-hook-logger into master May 27, 2022

What does this MR do and why?

For https://gitlab.com/gitlab-org/gitlab/-/issues/340027

Whenever a webhook is executed, the execution result is stored inside the database so that customers can revisit it later. At the moment, both the request and response are logged. We haven't had any limits of both of them. We simply stores whatever a response throw back to us. That may be an issue in long-term. The data will be displayed on the UI. It does not make much sense to display a page full of texts back to the customers. In addition, when we applied Sidekiq limit, a too-large execution log will be rejected by Sidekiq. At the moment, we have a hard limit of 5MB payload after compression. This limit is more than generous for most use cases. If an execution log exceeds this limit, I highly doubt they can be useful for most customers.

This MR is to limit the data being logged by the webhooks. In details:

Strip the response data to 8kb. 8kb is equal to 2 full pages of text on a 4K screen. I generally think that amount of data is enough for the users to inspect the errors if any.
Store the first 50 response headers. A typical API to GitLab has around 20 headers.
Limit the content of response header to 8kb.
Strip request data when fails to schedule Sidekiq limit exception. Request data is tough. It is used for retry feature. So, we only strip the request data whenever we have no choice. The retry functionality will be disbled when so.

This MR affects two features: Project Hook, and Admin System Hook.

How to set up and validate locally

To validate this MR in the local environment, we'll need to setup the webhook to a project and/or system hook. The hook URL is set to a local mock server that returns desired mocking data. In this section, I use a minimal Rack server. Rack server definition is defined in test.ru file. It is started up with bundle exec rackup test.ru command.

The following set of testing scenarios are also repeated for admin system hooks.

Scenario 1: response data is too big

Rack Server

require 'securerandom'
require 'httparty'

seed = HTTParty.get("https://gitlab.com").response.body
test_content =
  if seed.size > 8192
    seed
  else
    seed * (seed / 8192) + 1
  end

run ->(env) { [200, {}, [test_content]] }

Result (master) - pages and pages of response HTML. It makes my browser slow down a bit

Result (this MR) - only some lines of HTML are displayed

Scenario 2: massive headers

Rack server

require 'securerandom'

headers = (1..1000).to_a.to_h do |v|
  ["Header-#{v}", SecureRandom.hex(v)]
end

run ->(env) { [200, headers, ['Hello world']] }

Result (master) - multiple pages full of headers

Result (this MR) - First 50 headers only

Scenario 3: request data is too huge

Prerequisite step: setup Sidekiq payload limit to a reasonable amount

Result (master)

Result (this MR) - Notice the Resend button is disabled

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited May 31, 2022 by Quang-Minh Nguyen

Limit the request and response being logged from the WebhookWorker