Improve trace with buffered writes
What does this MR do?
- Adds buffered writes in front of trace log file
- Removes redundant allocation on trace reads
Why was this MR needed?
Performance:
- Fewer syscalls, reduces burden on CPU and improves write speed
- Avoids allocation on trace reads and reduces GC pressure
Before/After benchmark comparison.
-
Buffer10k/10k-logline
is a reasonably sized logline written 10k times. -
Buffer10k/10k-small
is a very small log line written 10k times. -
BufferReadWriteWorkload
is a reasonable size logline written 10k times, but with a read of the last 1000 bytes every 5 lines.
name old time/op new time/op delta
Buffer10k/10k-logline-16 60.5ms ± 1% 35.5ms ± 1% -41.27% (p=0.008 n=5+5)
Buffer10k/10k-small-16 30.8ms ± 1% 3.1ms ± 1% -90.02% (p=0.008 n=5+5)
BufferReadWriteWorkload-16 67.7ms ± 1% 41.3ms ± 0% -39.05% (p=0.008 n=5+5)
name old speed new speed delta
Buffer10k/10k-logline-16 57.7MB/s ± 1% 98.2MB/s ± 1% +70.26% (p=0.008 n=5+5)
Buffer10k/10k-small-16 1.94MB/s ± 1% 19.52MB/s ± 1% +904.12% (p=0.008 n=5+5)
BufferReadWriteWorkload-16 51.5MB/s ± 1% 84.5MB/s ± 0% +64.05% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Buffer10k/10k-logline-16 30.3kB ± 0% 34.4kB ± 0% +13.75% (p=0.008 n=5+5)
Buffer10k/10k-small-16 30.3kB ± 0% 34.4kB ± 0% +13.75% (p=0.008 n=5+5)
BufferReadWriteWorkload-16 3.20MB ± 0% 0.13MB ± 0% -95.91% (p=0.016 n=4+5)
name old allocs/op new allocs/op delta
Buffer10k/10k-logline-16 34.0 ± 0% 36.0 ± 0% +5.88% (p=0.008 n=5+5)
Buffer10k/10k-small-16 34.0 ± 0% 36.0 ± 0% +5.88% (p=0.008 n=5+5)
BufferReadWriteWorkload-16 6.03k ± 0% 2.04k ± 0% -66.25% (p=0.008 n=5+5)
What's the best way to test this MR?
This is more an implementation detail, existing unit and integration tests should cover these changes.
What are the relevant issue numbers?
Closes #27853 (closed)
Edited by Arran Walker