Memory and Latency optimization of SD Scan (!16) · Merge requests · GitLab.org / security-products / secret-detection / Secret Detection Service

Vishwa Bhat requested to merge vbhat/prof into main Sep 09, 2024

What does this MR do?

This MR removes several bottlenecks in the scan code to reduce overall memory consumption and latency. It covers the following optimizations:

Switch to an RE2-based matcher for finding keywords in the payload. See here for details about this approach.
Replace the Semantic Logger library, which caused significant String allocations, with Ruby's built-in Logger library.
Symbolize the ruleset's string keys so that accessing the ruleset across requests doesn't make Ruby create new copies of the key strings.
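
A keyword matcher like the one in the first item is typically used as a prefilter: the payload is searched once for every rule's keywords, and only the rules whose keywords occur are evaluated with their full regexes. Below is a minimal Ruby sketch of that idea. It assumes a rule shape with `keywords` and `regex` fields and uses Ruby's built-in `Regexp` as a stand-in for RE2. `KeywordPrefilter` and the sample rules are hypothetical, not the service's real code or ruleset.

```ruby
# Sketch only: Ruby's built-in Regexp stands in for RE2 here, and the rules are
# hypothetical examples, not the service's real ruleset.
RULES = [
  { id: :gitlab_pat, keywords: ["glpat-"], regex: /glpat-[0-9a-zA-Z_-]{20}/ },
  { id: :aws_access_key, keywords: ["AKIA"], regex: /AKIA[0-9A-Z]{16}/ }
].freeze

class KeywordPrefilter
  def initialize(rules)
    # Index rules by lowercased keyword so a keyword hit maps back to its rules.
    @rules_by_keyword = Hash.new { |hash, key| hash[key] = [] }
    rules.each do |rule|
      rule[:keywords].each { |keyword| @rules_by_keyword[keyword.downcase] << rule }
    end
    # Combine every keyword into one escaped, case-insensitive alternation
    # so the payload is searched for all keywords in a single pass.
    @keyword_regex = Regexp.new(Regexp.union(@rules_by_keyword.keys).source, Regexp::IGNORECASE)
  end

  # Rules whose keywords appear somewhere in the payload.
  def candidate_rules(payload)
    payload.scan(@keyword_regex).map(&:downcase).uniq
           .flat_map { |keyword| @rules_by_keyword.fetch(keyword, []) }.uniq
  end

  # Run the (more expensive) full rule regexes only for candidate rules.
  def scan(payload)
    candidate_rules(payload).flat_map do |rule|
      payload.scan(rule[:regex]).map { |match| { rule_id: rule[:id], match: match } }
    end
  end
end

filter = KeywordPrefilter.new(RULES)
findings = filter.scan("token = glpat-abcdefghij0123456789")
puts findings.inspect
```

RE2 guarantees matching in time linear in the input size, which Ruby's backtracking `Regexp` does not, so swapping in RE2 changes the performance characteristics rather than the prefilter logic shown here.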
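
Regarding the logger item: with Ruby's built-in `Logger`, passing the message as a block defers building the string until the logger has checked that the severity is enabled. Below is a minimal sketch, assuming INFO as the production level. The MR does not say which level or which logging calls the service uses, and `payload_sizes` is an illustrative value.

```ruby
require "logger"
require "stringio"

# Write to an in-memory buffer so the example is self-contained.
output = StringIO.new
logger = Logger.new(output, level: Logger::INFO)

payload_sizes = [20_480, 20_480, 10_240]
debug_block_ran = false

# Block form: at INFO level the block is never called, so the
# interpolated debug string is never allocated.
logger.debug do
  debug_block_ran = true
  "scanning #{payload_sizes.size} payloads: #{payload_sizes.join(', ')} bytes"
end

# Argument form: the message is built before Logger checks the level,
# which is fine here because INFO is enabled.
logger.info("scan finished for #{payload_sizes.size} payloads")

puts output.string
```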
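
For the ruleset item, one way to symbolize keys once at load time is a recursive conversion. Below is a minimal sketch, assuming the ruleset is parsed into nested Hashes and Arrays. The JSON source, `RAW_RULESET` and `deep_symbolize_keys` are hypothetical. For JSON input specifically, `JSON.parse(..., symbolize_names: true)` does the same job.

```ruby
require "json"

# Hypothetical ruleset; the real service loads its own rule file.
RAW_RULESET = '{"rules":[{"id":"gitlab_personal_access_token","keywords":["glpat-"]}]}'

# Recursively convert Hash keys to Symbols, leaving values untouched.
def deep_symbolize_keys(obj)
  case obj
  when Hash
    obj.each_with_object({}) { |(key, value), result| result[key.to_sym] = deep_symbolize_keys(value) }
  when Array
    obj.map { |element| deep_symbolize_keys(element) }
  else
    obj
  end
end

# Convert once when the service boots, then share the frozen result across requests.
RULESET = deep_symbolize_keys(JSON.parse(RAW_RULESET)).freeze

# Symbols are unique objects, so these lookups don't allocate key strings.
first_rule = RULESET[:rules].first
puts first_rule[:keywords].inspect
```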

In addition to optimizations, this MR does the following:

Remove the benchmark directory, which was originally added only for self-reference. Any benchmark-related information will be made available in the GitLab issue.
Remove the custom rules for Runway jobs and use the standard rules defined by Runway.

Relevant Issue Numbers

Proof of optimization

Test Specification:

Each request contains 3 payloads with 40KB of total size (20KB + 20KB + 10KB)
Duration: 30s
Concurrency: 1
Timeout: 5s
gzip compression: Enabled
Device: Apple MacBook Pro (M1 Max, 10 CPU cores), 32 GB LPDDR5 RAM

Input	Memory consumed	Object allocations	Latency: Avg	Latency: p90/p95/p99	Max requests/sec achieved
Payload with no secrets (Current)	731.45 MB	6,005,796	27.38 ms	27.95/28.26/28.97 ms	36.13 RPS
Payload with no secrets (Optimized)	3.39 GB	3,247,550	1.15 ms	1.24/1.29/1.47 ms	821.19 RPS
Payload with secrets (Current)	802.97 MB	6,041,502	28.50 ms	29.51/29.96/31.13 ms	34.70 RPS
Payload with secrets (Optimized)	3.92 GB	21,173,202	3.01 ms	3.42/3.59/3.97 ms	322.91 RPS
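
Because every run in this table lasted 30 s but served a very different number of requests, the totals are easier to compare per request. Below is a small Ruby sketch of that normalization. It assumes (not stated in the MR) that "Memory consumed" and "Object allocations" are totals over the whole run, that requests served equals RPS x 30 s, and that 1 GB = 1024 MB. The `Run` struct and its methods are hypothetical helpers.

```ruby
# Assumptions (not stated in the MR): memory and allocation figures are totals
# over the whole 30 s run, requests served = RPS x 30 s, and 1 GB = 1024 MB.
DURATION_S = 30
MB_PER_GB = 1024.0

# Hypothetical helper describing one row of the table.
Run = Struct.new(:name, :memory_mb, :allocations, :rps) do
  def requests
    rps * DURATION_S
  end

  def memory_per_request_mb
    memory_mb / requests
  end

  def allocations_per_request
    allocations / requests
  end
end

RUNS = [
  Run.new("no secrets (Current)", 731.45, 6_005_796, 36.13),
  Run.new("no secrets (Optimized)", 3.39 * MB_PER_GB, 3_247_550, 821.19),
  Run.new("secrets (Current)", 802.97, 6_041_502, 34.70),
  Run.new("secrets (Optimized)", 3.92 * MB_PER_GB, 21_173_202, 322.91)
].freeze

RUNS.each do |run|
  printf("%-24s %7.0f requests  %.3f MB/request  %6.0f allocations/request\n",
         run.name, run.requests, run.memory_per_request_mb, run.allocations_per_request)
end
```

Under these assumptions, the optimized code uses roughly 4.8x (no secrets) and 1.9x (secrets) less memory per request, even though its run totals are higher.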

At a glance, the optimized approach appears to trade higher memory for lower latency. That reading ignores throughput: in the same 30 s the optimized runs served far more requests (821.19 vs 36.13 RPS without secrets, 322.91 vs 34.70 RPS with secrets), so their memory and allocation totals cover many more requests. To compare like for like, the table below caps both approaches at 30 RPS, close to the maximum the current approach achieves.

Input	Requests/sec (capped)	Memory consumed	Object allocations	Latency: Avg	Latency: p90/p95/p99
Payload with no secrets (Current)	30 RPS	611.33 MB	5,028,923	28.08 ms	28.93/29.24/30.91 ms
Payload with no secrets (Optimized)	30 RPS	148.91 MB (~4x less)	375,197 (~13x fewer)	5.08 ms (~5.5x faster)	6.29/6.88/10.79 ms
Payload with secrets (Current)	30 RPS	697.18 MB	5,254,572	28.21 ms	28.95/29.24/30.24 ms
Payload with secrets (Optimized)	30 RPS	387.31 MB (~1.8x less)	2,206,716 (~2.4x fewer)	7.58 ms (~3.7x faster)	9.58/10.54/13.60 ms
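
The improvement factors in this table are ratios of the Current row to the Optimized row for each metric. Below is a short Ruby sketch that recomputes them from the numbers above. The `improvement` helper and the `CAPPED` and `FACTORS` constants are hypothetical names introduced for this check.

```ruby
# Hypothetical helper: how many times lower the Optimized value is than the
# Current one (all three metrics are "lower is better").
def improvement(current, optimized)
  (current / optimized.to_f).round(1)
end

# Rows from the 30 RPS capped table: [memory MB, object allocations, avg latency ms].
CAPPED = {
  no_secrets: { current: [611.33, 5_028_923, 28.08], optimized: [148.91, 375_197, 5.08] },
  secrets:    { current: [697.18, 5_254_572, 28.21], optimized: [387.31, 2_206_716, 7.58] }
}.freeze

FACTORS = CAPPED.transform_values do |row|
  row[:current].zip(row[:optimized]).map { |cur, opt| improvement(cur, opt) }
end

FACTORS.each do |input, (memory, allocations, latency)|
  puts "#{input}: memory #{memory}x less, allocations #{allocations}x fewer, latency #{latency}x faster"
end
```

Rounded to one decimal place, the no-secrets factors come out at 4.1x, 13.4x and 5.5x, and the secrets factors at 1.8x, 2.4x and 3.7x.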

Detailed snapshots for reference:

after_profile_clean_30_rps.txt

after_profile_leaky_30_rps.txt

before_profile_clean.txt

before_profile_clean_30_rps.txt

before_profile_leaky.txt

before_profile_leaky_30_rps.txt

Edited Sep 11, 2024 by Vishwa Bhat
