
Diagnostic reports: compress data

Matthias Käppler requested to merge 370077-compressed-reports into master

What does this MR do and why?

Refs #370077 (closed)

This adds a gzip compression step when streaming diagnostic reports to disk, which dramatically reduces report size both on disk and on the wire (we upload these reports to a GCS bucket roughly every 30 minutes).
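The MR text does not show the Ruby implementation, so here is a minimal Python sketch of the streaming idea. It copies a report stream into a gzip file in fixed-size chunks, so the uncompressed report never has to sit in memory in full. The function name `stream_report_to_gzip` is hypothetical, the sketch uses Python's `gzip` module rather than the gzip CLI, and `level=1` is an assumption borrowed from the `gzip -1` benchmark row, not a setting confirmed by the MR.

```python
import gzip
import shutil
from typing import BinaryIO


def stream_report_to_gzip(source: BinaryIO, dest_path: str, level: int = 1) -> None:
    """Copy a report stream into a gzip file chunk by chunk (hypothetical helper).

    Only one chunk of uncompressed data is held in memory at a time,
    so even a ~1GB heap dump never needs to be fully materialized.
    level=1 is an assumption mirroring the `gzip -1` benchmark.
    """
    with gzip.open(dest_path, "wb", compresslevel=level) as out:
        # copyfileobj reads from source and writes to the compressor
        # in 1 MiB chunks.
        shutil.copyfileobj(source, out, length=1024 * 1024)
```

For example, `stream_report_to_gzip(io.BytesIO(b"{}\n"), "report.json.gz")` writes a compressed copy of an in-memory report; in practice `source` would be whatever stream the report is produced on.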

Performance and efficacy

Our report files are currently all JSON text files, i.e., highly compressible. Compression is especially effective for Object Space ("heap") dumps, since a typical heap dump of a hot worker process is around 1GB of uncompressed data.

I compared 3 compression tools: gzip, bzip2, and zstd. The results are summarized below. Data was collected with /usr/bin/time -v while compressing a 1GB Puma heap dump on an 8-core i9 @ 3.8GHz (Carbon X1).

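| tool | user CPU time (s) | system CPU time (s) | peak RSS | compressed size |
|---|---|---|---|---|
| zstd -1 | 1.07 | 0.20 | 12 MB | 77 MB |
| gzip -1 | 4.02 | 0.13 | 2 MB | 103 MB |
| bzip2 -1 | 66.14 | 0.47 | 2.3 MB | 89 MB |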
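To make the trade-offs in the table easier to compare, this short Python sketch derives a compression ratio and a rough throughput (MB of input per CPU-second, counting user + system time) for each tool. It only re-computes the table's own figures; treating "1GB" as 1024MB of input is an assumption, since the exact input size is not given.

```python
# Figures copied from the benchmark table (1GB Puma heap dump).
# Assumption: "1GB" is taken as 1024MB of input; the table does not say.
INPUT_MB = 1024

RESULTS = {
    # tool: (user_s, system_s, peak_rss_mb, output_mb)
    "zstd -1": (1.07, 0.20, 12.0, 77.0),
    "gzip -1": (4.02, 0.13, 2.0, 103.0),
    "bzip2 -1": (66.14, 0.47, 2.3, 89.0),
}


def summarize(input_mb: float = INPUT_MB) -> dict[str, tuple[float, float]]:
    """Return (compression ratio, MB of input per CPU-second) for each tool."""
    summary = {}
    for tool, (user_s, system_s, _rss_mb, output_mb) in RESULTS.items():
        ratio = input_mb / output_mb
        throughput = input_mb / (user_s + system_s)
        summary[tool] = (ratio, throughput)
    return summary


if __name__ == "__main__":
    for tool, (ratio, throughput) in summarize().items():
        print(f"{tool:9} {ratio:5.1f}x  {throughput:6.1f} MB/CPU-s")
```

With these figures, gzip shrinks the dump about 9.9x at roughly 247MB per CPU-second, zstd reaches about 13.3x at roughly 806MB per CPU-second, and bzip2 saves only about 14MB over gzip while using roughly 16 times its CPU time.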

zstd can use multiple CPU cores to speed up compression, but this can increase memory use, especially at higher compression levels. At its default level (3), it used about 3 times as much memory as at level 1 (39MB vs. 12MB). zstd also supports setting a memory cap, but this did not appear to have any effect during testing. I think the 12MB used at level 1 is definitely acceptable here.

bzip2 is unacceptably slow (66s of user CPU time versus about 4s for gzip), and its output is only marginally smaller than gzip's (89MB vs. 103MB).

I chose gzip as the default because it seems to offer a decent trade-off between CPU and memory use, and it is already installed in our production CNG images. I will look into swapping it for zstd in a follow-up issue.
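This MR does not specify how the follow-up swap to zstd would be wired up. One hypothetical way to keep the compressor pluggable is sketched below in Python, under stated assumptions: `COMPRESSORS`, `compressor_command` and `compress_to_file` are illustrative names rather than GitLab code, reports are assumed to be piped through the compressor's CLI (suggested by the MR making the gzip package an explicit image dependency, but not confirmed), and level 1 mirrors the benchmark. Both `gzip -c` and `zstd -c` write compressed output to stdout; zstd's `-T0` asks it to use as many threads as there are CPU cores.

```python
import shutil
import subprocess
from typing import BinaryIO

# Hypothetical mapping from compressor name to (argv, file suffix).
# -c writes compressed data to stdout; -1 is the fastest level.
# zstd's -T0 spawns one compression thread per detected core.
COMPRESSORS = {
    "gzip": (["gzip", "-c", "-1"], ".gz"),
    "zstd": (["zstd", "-c", "-1", "-T0"], ".zst"),
}


def compressor_command(name: str = "gzip") -> tuple[list[str], str]:
    """Return the argv and file suffix for a compressor, defaulting to gzip."""
    if name not in COMPRESSORS:
        raise ValueError(f"unknown compressor: {name}")
    argv, suffix = COMPRESSORS[name]
    if shutil.which(argv[0]) is None:
        # Fail fast if the binary is missing from the image.
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    return list(argv), suffix


def compress_to_file(source: BinaryIO, dest: BinaryIO, name: str = "gzip") -> None:
    """Pipe source through the chosen compressor CLI into dest (a real file)."""
    argv, _suffix = compressor_command(name)
    with subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=dest) as proc:
        shutil.copyfileobj(source, proc.stdin)
        # Closing stdin signals EOF so the compressor flushes and exits;
        # leaving the with-block then waits for the process to finish.
        proc.stdin.close()
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)
```

Checking `shutil.which` up front turns a missing binary into an immediate, descriptive error rather than a failed report upload later.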

gzip package

I verified that gzip is already installed in the gitlab-rails image, but I am nonetheless making the dependency explicit here: gitlab-org/build/CNG!1218 (merged)


How to set up and validate locally

See https://gitlab.com/gitlab-org/application-performance-team/team-tools/-/blob/master/DIAGNOSTIC_REPORTS.md
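As an extra local check (not part of the linked instructions), a compressed report can be verified by decompressing it and parsing each line. This minimal Python sketch assumes newline-delimited JSON encoded as UTF-8, the format Ruby's `ObjectSpace.dump_all` uses for heap dumps; a report stored as a single JSON document would need `json.load` instead. `check_report` is a hypothetical helper name.

```python
import gzip
import json
import sys


def check_report(path: str) -> int:
    """Decompress a .gz report and count the non-empty lines that parse as JSON.

    Raises json.JSONDecodeError on the first malformed line; gzip errors
    (e.g. gzip.BadGzipFile for non-gzip input) propagate as well.
    """
    count = 0
    # "rt" decodes the decompressed bytes as UTF-8 text and iterates line
    # by line, so the whole heap dump is never held in memory at once.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)
                count += 1
    return count


if __name__ == "__main__":
    print(check_report(sys.argv[1]), "JSON records")
```

Saved as a script, it can be run as `python check_report.py path/to/report.json.gz`; it prints the number of JSON records and raises on the first line that fails to parse.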

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #370077 (closed)

