
Memory and Latency optimization of SD Scan

Vishwa Bhat requested to merge vbhat/prof into main

What does this MR do?

This MR addresses several bottlenecks in the code that drive up memory consumption and latency. It covers the following optimizations:

  1. Switch to an RE2-based matcher for finding keywords in the payload. See here for details about this approach.
  2. Replace the Semantic Logger library, which caused significant string allocations, with Ruby's built-in Logger library.
  3. Symbolize the ruleset's string keys so that Ruby does not create new copies of strings each time the ruleset is accessed across requests.
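
The MR's actual RE2 approach is described in the linked write-up, not here. As a rough illustration of the general idea (find every keyword in one pass, then run the costlier rule regexes only for rules whose keywords occur), the sketch below uses Ruby's built-in `Regexp.union` in place of RE2, which would need a native binding. The `Rule` struct, rule ids, keywords, and patterns are all hypothetical.

```ruby
# Sketch only: Regexp.union stands in for RE2; the rules below are hypothetical.
Rule = Struct.new(:id, :keywords, :pattern)

RULES = [
  Rule.new("gitlab_pat", ["glpat-"], /glpat-[0-9A-Za-z_\-]{20}/),
  Rule.new("aws_access_key", ["AKIA"], /AKIA[0-9A-Z]{16}/)
].freeze

# Index: lowercase keyword => rules that declare it.
KEYWORD_INDEX = RULES
  .flat_map { |rule| rule.keywords.map { |kw| [kw.downcase, rule] } }
  .group_by(&:first)
  .transform_values { |pairs| pairs.map(&:last) }
  .freeze

# One case-insensitive alternation over all keywords, compiled once at load time.
KEYWORD_MATCHER = Regexp.new(Regexp.union(KEYWORD_INDEX.keys).source, Regexp::IGNORECASE)

# Returns the ids of rules whose full pattern matches the payload.
def scan(payload)
  # Pass 1: a single scan collects every keyword present in the payload.
  hits = payload.scan(KEYWORD_MATCHER).map(&:downcase).uniq
  # Pass 2: only rules whose keywords were seen run their full regex.
  candidates = hits.flat_map { |kw| KEYWORD_INDEX.fetch(kw, []) }.uniq
  candidates.select { |rule| rule.pattern.match?(payload) }.map(&:id)
end

puts scan("token=glpat-" + "a" * 20).inspect
```

A payload that contains a keyword but no full match (for example `"glpat-short"`) passes the first stage and is then rejected by the rule regex, so the keyword pass only narrows the work and never decides a finding on its own.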
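
A minimal sketch of configuring Ruby's built-in `Logger` to keep per-line allocations low, assuming log lines go to an IO such as `$stdout`. The `build_logger` helper and its formatter are hypothetical; the built-in behavior it relies on is that a message passed as a block is only evaluated when its severity is enabled, so suppressed debug lines never build their strings.

```ruby
require "logger"
require "stringio"

# Hypothetical helper: a logger on any IO with a compact, low-allocation formatter.
def build_logger(io, level: Logger::INFO)
  logger = Logger.new(io)
  logger.level = level
  # Only severity and message; no timestamp formatting per line.
  logger.formatter = proc do |severity, _time, _progname, msg|
    "#{severity} #{msg}\n"
  end
  logger
end

logger = build_logger($stdout)
payload = "x" * 1024
# Block form: the interpolated string is only built if DEBUG is enabled.
logger.debug { "payload bytes=#{payload.bytesize}" }
logger.info("scan complete")
```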
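
A sketch of symbolizing the ruleset once at load time, assuming it is parsed from JSON. The real ruleset format and field names are not shown in this MR, so `RULESET_JSON` and `deep_symbolize` are hypothetical. Freezing the result additionally lets every request share the same objects read-only.

```ruby
require "json"

# Hypothetical ruleset payload; the real format and fields may differ.
RULESET_JSON = <<~JSON
  {"rules": [{"id": "gitlab_pat", "keywords": ["glpat-"]}]}
JSON

# Recursively symbolize hash keys and freeze everything, so the ruleset is
# built once at boot and then shared read-only by all requests.
def deep_symbolize(obj)
  case obj
  when Hash  then obj.to_h { |key, value| [key.to_sym, deep_symbolize(value)] }.freeze
  when Array then obj.map { |value| deep_symbolize(value) }.freeze
  else obj.frozen? ? obj : obj.dup.freeze
  end
end

RULESET = deep_symbolize(JSON.parse(RULESET_JSON))

# Per-request lookups then use symbol keys, e.g. RULESET[:rules].first[:id].
puts RULESET[:rules].first[:id]
```

When the source is JSON, `JSON.parse(RULESET_JSON, symbolize_names: true)` performs the key conversion in one step; a recursive helper is mainly useful for formats whose parsers lack such an option.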

In addition to optimizations, this MR does the following:

  • Remove the benchmark directory, which was initially added only for self-reference. Any benchmark-related information will be made available in the GitLab issue.
  • Remove the custom rules for Runway jobs and use the standard rules defined by Runway.

Relevant Issue Numbers

Proof of optimization

Test Specification:

  • Each request contains 3 payloads with 50KB of total size (20KB + 20KB + 10KB)
  • Duration: 30s
  • Concurrency: 1
  • Timeout: 5s
  • gzip compression: Enabled
  • Device: Apple MacBook M1 Max (10 CPU cores), 32 GB DDR5 RAM

| Input | Memory consumed | Object allocations | Latency: Avg | Latency: p90/p95/p99 | Max requests/sec achieved |
|---|---|---|---|---|---|
| Payload with no secrets (Current) | 731.45 MB | 6,005,796 | 27.38 ms | 27.95/28.26/28.97 ms | 36.13 RPS |
| Payload with no secrets (Optimized) | 3.39 GB | 3,247,550 | 1.15 ms | 1.24/1.29/1.47 ms | 821.19 RPS |
| Payload with secrets (Current) | 802.97 MB | 6,041,502 | 28.50 ms | 29.51/29.96/31.13 ms | 34.70 RPS |
| Payload with secrets (Optimized) | 3.92 GB | 21,173,202 | 3.01 ms | 3.42/3.59/3.97 ms | 322.91 RPS |

At a glance, the optimized approach appears to achieve lower latency at the cost of higher memory. That reading does not hold once the max RPS achieved during the tests is taken into account: within the same 30s window, the optimized approach served roughly 9x to 23x more requests, so its memory and allocation totals cover far more work. To put things in perspective, let's cap the request rate close to the current approach's maximum, say 30 RPS, and compare the two approaches.
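
Normalizing the uncapped results by request count makes this concrete. The sketch assumes "Memory consumed" is cumulative over the 30s run and treats 1 GB as 1024 MB; the figures are copied from the uncapped results table.

```ruby
# Uncapped 30s run figures. Assumes "Memory consumed" is cumulative over the
# run and 1 GB = 1024 MB.
DURATION_S = 30

RUNS = {
  "no secrets (current)"   => { memory_mb: 731.45,      allocations: 6_005_796,  rps: 36.13 },
  "no secrets (optimized)" => { memory_mb: 3.39 * 1024, allocations: 3_247_550,  rps: 821.19 },
  "secrets (current)"      => { memory_mb: 802.97,      allocations: 6_041_502,  rps: 34.70 },
  "secrets (optimized)"    => { memory_mb: 3.92 * 1024, allocations: 21_173_202, rps: 322.91 }
}.freeze

# Divide run totals by the number of requests served during the run.
def per_request(run)
  requests = run[:rps] * DURATION_S
  { memory_mb: run[:memory_mb] / requests, allocations: run[:allocations] / requests }
end

RUNS.each do |name, run|
  r = per_request(run)
  puts format("%-24s %.3f MB/request, %.0f allocations/request", name, r[:memory_mb], r[:allocations])
end
```

With these inputs, memory per request works out to roughly 0.67 MB before and 0.14 MB after for the payload with no secrets, and roughly 0.77 MB before and 0.41 MB after for the payload with secrets.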

| Input | Request rate cap | Memory consumed | Object allocations | Latency: Avg | Latency: p90/p95/p99 |
|---|---|---|---|---|---|
| Payload with no secrets (Current) | 30 RPS | 611.33 MB | 5,028,923 | 28.08 ms | 28.93/29.24/30.91 ms |
| Payload with no secrets (Optimized) | 30 RPS | 148.91 MB (~4x less) | 375,197 (~13x fewer) | 5.08 ms (~5.5x faster) | 6.29/6.88/10.79 ms |
| Payload with secrets (Current) | 30 RPS | 697.18 MB | 5,254,572 | 28.21 ms | 28.95/29.24/30.24 ms |
| Payload with secrets (Optimized) | 30 RPS | 387.31 MB (~2x less) | 2,206,716 (~2.4x fewer) | 7.58 ms (~4x faster) | 9.58/10.54/13.60 ms |
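
The approximate factors in the capped table are current/optimized ratios. A quick way to reproduce them from the 30 RPS figures:

```ruby
# Capped-run (30 RPS) figures copied from the capped results table.
CAPPED = {
  no_secrets: { current:   { mb: 611.33, allocs: 5_028_923, avg_ms: 28.08 },
                optimized: { mb: 148.91, allocs: 375_197,   avg_ms: 5.08 } },
  secrets:    { current:   { mb: 697.18, allocs: 5_254_572, avg_ms: 28.21 },
                optimized: { mb: 387.31, allocs: 2_206_716, avg_ms: 7.58 } }
}.freeze

# Improvement factor = current / optimized for each metric, rounded to 0.1.
def improvement(pair)
  pair[:current].to_h do |metric, value|
    [metric, (value / pair[:optimized][metric].to_f).round(1)]
  end
end

CAPPED.each { |input, pair| puts "#{input}: #{improvement(pair)}" }
```

This gives about 4.1x, 13.4x, and 5.5x for the payload with no secrets, and about 1.8x, 2.4x, and 3.7x for the payload with secrets, in line with the rounded factors in the table.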

Detailed snapshots for reference:

  • after_profile_clean_30_rps.txt
  • after_profile_leaky_30_rps.txt
  • before_profile_clean.txt
  • before_profile_clean_30_rps.txt
  • before_profile_leaky.txt
  • before_profile_leaky_30_rps.txt

Edited by Vishwa Bhat
