Report Ruby process USS+PSS into Prometheus (!30374) · Merge requests · GitLab.org / GitLab

Matthias Käppler requested to merge ruby-sampler-pss into master Apr 24, 2020

What does this MR do?

We've been looking at preferring USS and PSS (unique set size and proportional set size) to gauge memory use on Linux, since they account for shared memory. This is important for pre-fork servers, where a large chunk of memory remains static and can be shared between workers.

This MR reports these metrics alongside RSS in our Ruby sampler. The change is behind a feature flag:

feature flag collect_memory_uss_pss

Implementation

In the past we sampled this data from /proc/self/smaps, which can be very slow: the kernel reports data for each virtual memory area (VMA) mapped by the process, and we then summed those values up in Ruby space.

Here we instead rely on a relatively new kernel feature that came out of Android, which sums up all relevant VMAs into a new file, /proc/self/smaps_rollup. That file has a single PSS figure for the entire process, as well as private memory entries that can be further rolled up into USS:

By using smaps_rollup instead of smaps, a caller can avoid the significant overhead of formatting, reading, and parsing each of a large process's potentially very numerous memory mappings. For sampling system_server's PSS in Android, we measured a 12x speedup, representing a savings of several hundred milliseconds.

https://patchwork.kernel.org/patch/9896795/

i.e.

pss = Pss entry
uss = sum of Private_ page entries
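As a rough illustration of the two formulas above, here is a minimal Ruby sketch that derives both figures from the text of a smaps_rollup file. The method name `uss_pss_bytes`, the returned hash shape, and the sample values are hypothetical and chosen for this example only; this is not necessarily how the sampler in this MR is written.

```ruby
# Hypothetical helper (name and return shape are illustrative only):
# derive USS and PSS in bytes from the text of /proc/<pid>/smaps_rollup.
def uss_pss_bytes(rollup_text)
  pss_kb = 0
  uss_kb = 0

  rollup_text.each_line do |line|
    # Data lines look like "Pss:              567474 kB".
    # The first line of the file (address range ... [rollup]) does not match.
    match = line.match(/\A(\w+):\s+(\d+) kB/)
    next unless match

    name = match[1]
    kb = match[2].to_i

    # Exact match only, so entries such as Pss_Anon are not mistaken for Pss.
    pss_kb = kb if name == 'Pss'
    # Every Private_* entry (e.g. Private_Clean, Private_Dirty) counts toward USS.
    uss_kb += kb if name.start_with?('Private_')
  end

  # The kernel reports kB; convert to bytes.
  { uss: uss_kb * 1024, pss: pss_kb * 1024 }
end

# Made-up sample input, not taken from a real process.
sample = <<~ROLLUP
  00400000-7ffd5a9f9000 ---p 00000000 00:00 0    [rollup]
  Rss:                9000 kB
  Pss:                5000 kB
  Shared_Clean:        100 kB
  Private_Clean:      1000 kB
  Private_Dirty:      2000 kB
ROLLUP

result = uss_pss_bytes(sample)
puts "uss=#{result[:uss]} pss=#{result[:pss]}"
```

Parsing is kept separate from file access here so the arithmetic can be exercised without a Linux /proc filesystem.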

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Test on review-app
Test on macOS, which does not have /proc
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
[-] Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
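Regarding platforms without /proc (such as macOS) or kernels that predate smaps_rollup: one way to keep sampling safe there is to treat a missing file as "no data". The sketch below is an assumption about how such a guard could look; the method name `read_smaps_rollup` is hypothetical and not taken from this MR.

```ruby
# Hypothetical helper: return the contents of smaps_rollup, or nil on
# platforms or kernels where the file is unavailable.
def read_smaps_rollup(path = '/proc/self/smaps_rollup')
  return nil unless File.readable?(path)

  File.read(path)
rescue SystemCallError
  # The file can disappear or become unreadable between the check and the read.
  nil
end

contents = read_smaps_rollup
puts(contents.nil? ? 'smaps_rollup unavailable, skipping USS/PSS' : 'smaps_rollup read OK')
```

A caller can then skip reporting USS and PSS when nil is returned, while still reporting RSS.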

Additionally, I verified that sampling works locally as follows:

USS:

curl -v localhost:3000/-/metrics | egrep '^ruby_process_unique' | grep puma_1
- => ruby_process_unique_memory_bytes{pid="puma_1"} 166150144
awk '/^Private_/ {sum += $2} END {printf "%d\n", sum * 1024}' /proc/530/smaps_rollup
- => 166150144
166150144 == 166150144

PSS:

curl -v localhost:3000/-/metrics | egrep '^ruby_process_proportional' | grep puma_1
- => ruby_process_proportional_memory_bytes{pid="puma_1"} 581093376
awk '/^Pss:/ {printf "%d\n", $2 * 1024}' /proc/530/smaps_rollup
- => 581966848
581093376 ~= 581966848

Note that the two readings are not expected to match exactly at all times, because Prometheus only samples periodically.

Edited May 31, 2022 by 🤖 GitLab Bot 🤖
