Heap dumps: turn gzip errors into structured log events
What does this MR do and why?
We recently added the ability to capture Ruby object space dumps.
While rolling this out on SaaS, I noticed that sometimes gzip
would fail to compress them (likely due to running out of space) so we were uploading empty files instead of bailing out and logging this as a failure. The reason was that we merely forwarded gzip
's stderr to the application i.e. container stderr which just dumps a string to the container logs.
With this MR, we capture stderr of that process and map it to an exception, which further up the call stack gets mapped to a log event. This means error handling is now more consistent, regardless of where we fail (application or when shelling out.)
Screenshots or screen recordings
To test this manually, I gave gzip an unsupported command line flag. This produces a log event in application_json.log
now:
{
"severity": "ERROR",
"time": "2022-12-20T13:40:02.292Z",
"correlation_id": null,
"pid": 267,
"worker_id": "puma_1",
"perf_report_worker_uuid": "1d0dfa39-cef9-44b6-ab4b-7d77d3c59900",
"message": "failed",
"perf_report": "heap_dump",
"error": "#\u003cStandardError: gzip: unrecognized option '--blah'\u003e"
}
whereas previously, no log event was emitted.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #370077 (closed)