Heap dumps: turn gzip errors into structured log events (!107474) · Merge requests · GitLab.org / GitLab

Matthias Käppler requested to merge 370077-log-gzip-errors into master Dec 20, 2022

What does this MR do and why?

We recently added the ability to capture Ruby object space dumps.

While rolling this out on SaaS, I noticed that sometimes gzip would fail to compress them (likely due to running out of space) so we were uploading empty files instead of bailing out and logging this as a failure. The reason was that we merely forwarded gzip's stderr to the application i.e. container stderr which just dumps a string to the container logs.

With this MR, we capture stderr of that process and map it to an exception, which further up the call stack gets mapped to a log event. This means error handling is now more consistent, regardless of where we fail (application or when shelling out.)

Screenshots or screen recordings

To test this manually, I gave gzip an unsupported command line flag. This produces a log event in application_json.log now:

{
  "severity": "ERROR",
  "time": "2022-12-20T13:40:02.292Z",
  "correlation_id": null,
  "pid": 267,
  "worker_id": "puma_1",
  "perf_report_worker_uuid": "1d0dfa39-cef9-44b6-ab4b-7d77d3c59900",
  "message": "failed",
  "perf_report": "heap_dump",
  "error": "#\u003cStandardError: gzip: unrecognized option '--blah'\u003e"
}

whereas previously, no log event was emitted.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Related to #370077 (closed)

Heap dumps: turn gzip errors into structured log events

What does this MR do and why?

Screenshots or screen recordings

How to set up and validate locally

MR acceptance checklist

Merge request reports