Memory and Latency optimization of SD Scan
What does this MR do?
This MR addresses several bottlenecks in the code to reduce overall memory consumption and latency. It covers the following optimizations:
- Switch to an RE2-based matcher for finding keywords in the payload. See here for the details about this approach.
- Replace the Semantic Logger library, which caused significant string allocations, with Ruby's built-in Logger library.
- Symbolize the ruleset's string keys so that Ruby does not allocate new string copies each time the ruleset is accessed across requests.
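The three optimizations above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the code from this MR: the ruleset contents, `deep_symbolize_keys`, and `scan` are hypothetical names, and Ruby's built-in `Regexp.union` stands in for the RE2-based keyword matcher so that the sketch runs with the standard library alone.

```ruby
require "logger"

# Hypothetical ruleset in the shape of a parsed rules file (illustrative values only).
RAW_RULESET = {
  "rules" => [
    { "id" => "example_token", "keywords" => ["tok_"], "regex" => "tok_[a-z0-9]{8}" },
    { "id" => "example_key", "keywords" => ["key-"], "regex" => "key-[A-Z]{6}" }
  ]
}.freeze

# Recursively convert string keys to symbols. Lookups such as rule[:id] then
# reuse the same Symbol instead of allocating a new String for every "id" literal.
def deep_symbolize_keys(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, val), out|
      out[key.is_a?(String) ? key.to_sym : key] = deep_symbolize_keys(val)
    end
  when Array
    value.map { |item| deep_symbolize_keys(item) }
  else
    value
  end
end

# Symbolize and compile once at load time, then share across requests.
RULES = deep_symbolize_keys(RAW_RULESET).fetch(:rules).map { |rule|
  rule.merge(pattern: Regexp.new(rule[:regex])).freeze
}.freeze

# A single alternation over every keyword, so the payload is checked for all
# keywords at once. Assumption: Regexp.union is a stand-in for the RE2 matcher.
KEYWORD_MATCHER = Regexp.union(RULES.flat_map { |rule| rule[:keywords] })

# Built-in Logger. The block form builds the message only if the level is enabled.
LOGGER = Logger.new($stdout, level: Logger::INFO)

# Returns the ids of the rules that match the payload.
def scan(payload, logger: LOGGER)
  found_keywords = payload.scan(KEYWORD_MATCHER).uniq
  if found_keywords.empty?
    logger.debug { "no keywords found, skipping rule evaluation" }
    return []
  end

  # Only rules whose keywords occur in the payload get their full regex evaluated.
  candidates = RULES.select { |rule| (rule[:keywords] & found_keywords).any? }
  findings = candidates.select { |rule| rule[:pattern].match?(payload) }
                       .map { |rule| rule[:id] }
  logger.info { "#{findings.size} finding(s) from #{candidates.size} candidate rule(s)" }
  findings
end
```

With these hypothetical rules, `scan("x tok_abcd1234 y")` returns `["example_token"]`, while a payload containing none of the keywords returns `[]` without evaluating any rule regex.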
In addition to optimizations, this MR does the following:
- Remove the `benchmark` directory since it was initially added for self-reference purposes. Any benchmark-related information will be made available in the GitLab issue.
- Remove custom rules for runway jobs and stick to the standard rules defined by Runway.
Relevant Issue Numbers
- Run benchmarks on RPC service (gitlab-org/gitlab#468107 - closed)
- Make Secret Detection RPC service Production-ready (gitlab-org/gitlab#467531 - closed)
Proof of optimization
Test Specification:
- Each request contains 3 payloads with 40KB of total size (20KB + 20KB + 10KB)
- Duration: 30s
- Concurrency: 1
- Timeout: 5s
- `gzip` compression: Enabled
- Device: Apple MacBook M1 Max (10 CPU cores), 32 GB DDR5 RAM
Input | Memory consumed | Object allocations | Latency: Avg | Latency: p90/p95/p99 | Max request/sec achieved |
---|---|---|---|---|---|
Payload with no secrets (Current) | 731.45 MB | 6,005,796 | 27.38 ms | 27.95/28.26/28.97 ms | 36.13 RPS |
Payload with no secrets (Optimized) | 3.39 GB | 3,247,550 | 1.15 ms | 1.24/1.29/1.47 ms | 821.19 RPS |
Payload with secrets (Current) | 802.97 MB | 6,041,502 | 28.50 ms | 29.51/29.96/31.13 ms | 34.70 RPS |
Payload with secrets (Optimized) | 3.92 GB | 21,173,202 | 3.01 ms | 3.42/3.59/3.97 ms | 322.91 RPS |
At a glance, the optimized approach appears to achieve lower latency at the cost of higher memory. However, that reading ignores the max RPS achieved during the tests: the optimized runs handled far more requests in the same 30s window. To put things in perspective, let's cap the request rate close to what the current approach achieves, say `30 RPS`, and compare it against the optimized approach.
Input | Max request/sec capped | Memory consumed | Object allocations | Latency: Avg | Latency: p90/p95/p99 |
---|---|---|---|---|---|
Payload with no secrets (Current) | 30 RPS | 611.33 MB | 5,028,923 | 28.08 ms | 28.93/29.24/30.91 ms |
Payload with no secrets (Optimized) | 30 RPS | 148.91 MB (~4x less) | 375,197 (~13x fewer) | 5.08 ms (~5.5x faster) | 6.29/6.88/10.79 ms |
Payload with secrets (Current) | 30 RPS | 697.18 MB | 5,254,572 | 28.21 ms | 28.95/29.24/30.24 ms |
Payload with secrets (Optimized) | 30 RPS | 387.31 MB (~2x less) | 2,206,716 (~2.4x fewer) | 7.58 ms (~4x faster) | 9.58/10.54/13.60 ms |
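Another way to read the uncapped results is to normalize memory by the approximate number of requests served. The sketch below does that arithmetic under two assumptions that the tables do not state: that "Memory consumed" is a total accumulated over the whole run, and that each run sustained its max RPS for the full 30s. The helper name `memory_per_request_kb` is hypothetical.

```ruby
DURATION_SECONDS = 30

# Figures copied from the uncapped results table (1 GB taken as 1024 MB).
UNCAPPED = {
  no_secrets_current:   { memory_mb: 731.45,      rps: 36.13 },
  no_secrets_optimized: { memory_mb: 3.39 * 1024, rps: 821.19 },
  secrets_current:      { memory_mb: 802.97,      rps: 34.70 },
  secrets_optimized:    { memory_mb: 3.92 * 1024, rps: 322.91 }
}.freeze

# Approximate memory per request in KB.
# Assumption: requests served = max RPS * test duration.
def memory_per_request_kb(run)
  requests = run[:rps] * DURATION_SECONDS
  run[:memory_mb] * 1024 / requests
end

UNCAPPED.each do |name, run|
  puts format("%-22s %8.1f KB/request", name, memory_per_request_kb(run))
end
```

Under these assumptions, the per-request figure for each optimized run comes out lower than for the corresponding current run, which points the same way as the capped comparison.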
Edited by Vishwa Bhat