Allocate EntryMap keys only when needed
Currently we unconditionally copy each JSON string and pid into a new EntryData struct as we process the buffer.
The current lifetime of JSON strings is:
graph TD
A[Copy file to buffer] --> B
B[Copy JSON to EntryData] --> C
C[Hash JSON and check if present in map] --> D
C --> E
D[If no, move owned copy of JSON to map]
E[If yes, free copied JSON]
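For illustration, here is a minimal Rust sketch of this copy-first pattern. The record format ("<json>\t<value>" per line), the value field, and the function name merge_copying are hypothetical stand-ins, not the real file format or code. The sketch counts how many key Strings get allocated.

```rust
use std::collections::HashMap;

/// Hypothetical owned stand-in for EntryData: the JSON key is copied out of
/// the file buffer into a fresh String for every record.
struct EntryData {
    json: String,
    value: f64,
}

/// Copy-first merge over a hypothetical "<json>\t<value>" line format.
/// Returns the merged map and the number of key Strings allocated.
fn merge_copying(buffer: &str) -> (HashMap<String, f64>, usize) {
    let mut map: HashMap<String, f64> = HashMap::new();
    let mut key_allocs = 0;
    for line in buffer.lines() {
        let Some((json, value)) = line.split_once('\t') else { continue };
        let Ok(value) = value.parse::<f64>() else { continue };
        // B: unconditionally copy the JSON into an owned EntryData.
        let entry = EntryData { json: json.to_owned(), value };
        key_allocs += 1;
        // C: hash the copy and look it up.
        // D: on a miss, the owned String moves into the map as its key.
        // E: on a hit, the freshly allocated String is dropped (freed).
        *map.entry(entry.json).or_insert(0.0) += entry.value;
    }
    (map, key_allocs)
}
```

Every record costs one allocation, whether or not its key is new.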
This is how the C implementation did things, so it was a reasonable place to start. However, with 12 Puma worker threads in production, keys that are not pid-significant are replicated many times over, and each copy is allocated separately. Each EntryData struct we allocate is consumed before we finish processing the file buffer, so we can borrow directly from that buffer instead of unconditionally copying each chunk of JSON out of it.
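A sketch of what borrowing looks like, assuming a hypothetical "<json>\t<pid>" line format: giving EntryData a lifetime parameter lets it hold slices of the buffer, so building one copies nothing, and the borrow checker guarantees the entries are consumed while the buffer is still alive. The entries function and borrows_from helper are illustrative names only.

```rust
/// Hypothetical borrowed EntryData: both fields are slices of the file
/// buffer, so constructing one performs no allocation.
#[derive(Debug, PartialEq)]
struct EntryData<'a> {
    json: &'a str,
    pid: &'a str,
}

/// Yields one borrowed entry per "<json>\t<pid>" line. The entries cannot
/// outlive `buffer`; the compiler rejects any attempt to keep them longer.
fn entries<'a>(buffer: &'a str) -> impl Iterator<Item = EntryData<'a>> + 'a {
    buffer.lines().filter_map(|line| {
        let (json, pid) = line.split_once('\t')?;
        Some(EntryData { json, pid })
    })
}

/// True if `s` points into `buffer`'s memory, i.e. it was borrowed, not copied.
fn borrows_from(buffer: &str, s: &str) -> bool {
    let start = buffer.as_ptr() as usize;
    let p = s.as_ptr() as usize;
    p >= start && p < start + buffer.len()
}
```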
Using the RawEntry API available in the hashbrown crate, we can avoid allocating a new EntryData unless the key is not yet present in the map.
The new lifetime pattern is:
graph TD
A[Copy file to buffer] --> B
B[Borrow JSON from buffer] --> C
C[Hash JSON and check if present in map] --> D
D[If no, copy it into map as key]
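As a sketch using stable std only (the PR itself uses hashbrown's raw entry API, which this does not reproduce), the same idea is to look the key up while it is still borrowed from the buffer and copy it only on a miss. The "<json>\t<value>" record format and the merge_borrowing name are hypothetical.

```rust
use std::collections::HashMap;

/// Borrow-first merge over a hypothetical "<json>\t<value>" line format.
/// Returns the merged map and the number of key Strings allocated.
fn merge_borrowing(buffer: &str) -> (HashMap<String, f64>, usize) {
    let mut map: HashMap<String, f64> = HashMap::new();
    let mut key_allocs = 0;
    for line in buffer.lines() {
        let Some((json, value)) = line.split_once('\t') else { continue };
        let Ok(value) = value.parse::<f64>() else { continue };
        // B/C: `json` borrows from `buffer`; String: Borrow<str> lets us
        // query the map with it directly.
        if let Some(total) = map.get_mut(json) {
            *total += value;
        } else {
            // D: only a key the map has never seen is copied.
            map.insert(json.to_owned(), value);
            key_allocs += 1;
        }
    }
    (map, key_allocs)
}
```

Given a buffer where one key appears in several records, this allocates one String per distinct key rather than one per record. The cost on stable std is a second hash and probe when the key is missing (get_mut, then insert).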
hashbrown is currently the underlying hash map implementation used by std::collections::HashMap, so the map implementation itself does not change, but std does not expose the RawEntry API on stable Rust. We also no longer need to depend on ahash directly, as it is the default hasher used by hashbrown.
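One way to see the trade-off the stable std API forces is to count hash computations with a custom BuildHasher (CountingState and count_hashes are hypothetical, for illustration). In this sketch, entry() hashes a missing key once but needs an owned String before the lookup, while the borrow-first get_mut + insert approach allocates only on a miss but hashes that key twice. The exact counts reflect current std behavior with a pre-sized, non-empty map, not a documented guarantee.

```rust
use std::cell::Cell;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasher;
use std::rc::Rc;

/// BuildHasher that counts how many hashers the map requests, i.e. how many
/// times a key is hashed. The counter is shared via Rc so callers can read it.
struct CountingState(Rc<Cell<usize>>);

impl BuildHasher for CountingState {
    type Hasher = DefaultHasher;
    fn build_hasher(&self) -> DefaultHasher {
        self.0.set(self.0.get() + 1);
        DefaultHasher::new()
    }
}

/// Returns (hashes for entry() on a miss,
///          hashes for get_mut + insert on a miss,
///          hashes for get_mut on a hit).
fn count_hashes() -> (usize, usize, usize) {
    let counter = Rc::new(Cell::new(0));
    // Pre-size the map so rehashing during growth does not inflate the counts.
    let mut map: HashMap<String, u64, CountingState> =
        HashMap::with_capacity_and_hasher(16, CountingState(Rc::clone(&counter)));
    // Make the map non-empty: std may skip hashing when looking up in an empty map.
    map.insert("seed".to_owned(), 0);

    counter.set(0);
    // entry() needs an owned key before the lookup, even if the key exists.
    *map.entry("a".to_owned()).or_insert(0) += 1;
    let entry_miss = counter.get();

    counter.set(0);
    // Borrow-first: no allocation unless absent, but a miss is hashed twice.
    if let Some(v) = map.get_mut("b") {
        *v += 1;
    } else {
        map.insert("b".to_owned(), 1);
    }
    let borrow_miss = counter.get();

    counter.set(0);
    if let Some(v) = map.get_mut("b") {
        *v += 1;
    }
    let borrow_hit = counter.get();

    (entry_miss, borrow_miss, borrow_hit)
}
```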
This change reduces mean runtime by ~15% (4.273 s to 3.623 s in the runs below), making us more than 5x faster than the C implementation:
Warming up --------------------------------------
C 1.000 i/100ms
rust 2.000 i/100ms
Calculating -------------------------------------
C 5.760 (± 0.0%) i/s - 29.000 in 5.035721s
rust 30.186 (± 3.3%) i/s - 152.000 in 5.039813s
Comparison:
rust: 30.2 i/s
C: 5.8 i/s - 5.24x slower
With borrowed key check:
Benchmark 1: bundle exec ./bin/benchmark
Time (mean ± σ): 3.623 s ± 0.033 s [User: 2.954 s, System: 0.643 s]
Range (min … max): 3.565 s … 3.667 s 10 runs
With unconditional copy of JSON:
Benchmark 1: bundle exec ./bin/benchmark
Time (mean ± σ): 4.273 s ± 0.081 s [User: 3.589 s, System: 0.660 s]
Range (min … max): 4.206 s … 4.458 s 10 runs
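As a quick check of the headline figures against the reported means: the runtime reduction, the equivalent throughput gain, and the ratio from the comparison against C work out as

```latex
\frac{4.273\,\mathrm{s} - 3.623\,\mathrm{s}}{4.273\,\mathrm{s}} = \frac{0.650}{4.273} \approx 0.152,
\qquad
\frac{4.273\,\mathrm{s}}{3.623\,\mathrm{s}} \approx 1.18,
\qquad
\frac{30.186\ \mathrm{i/s}}{5.760\ \mathrm{i/s}} \approx 5.24
```

so "~15%" refers to time saved per run, which corresponds to roughly 1.18x the throughput.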