WIP: Introduce `rbtap` tool
Problem
I wanted a means of pulling Ruby- and process-related data externally, i.e. without having to write code that ships as part of the thing we're instrumenting (both approaches have pros and cons). I also wanted this to be done in a way that:
- Is easy to use
- Produces data that is easily comparable from run to run
Approach
`rbtap` is currently a POC for what that could look like. It is built around the following building blocks:
- `Tap`s: These are "data spouts" that dispense data from a connected process to a file
- `Collector`: The object that controls taps and triggers them at given intervals
The way it works is as follows:
- The target process is `fork`ed off
- The collector waits for the process to finish, while continuously polling all taps
- Taps pull data in whichever way they define
- Taps then write this data to a tap-file in JSON
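The steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this MR: `ClockTap`, the `name`/`sample(pid)` tap interface, and the `interval:` and `dir:` options are all hypothetical names chosen for the example.

```ruby
require "json"

# Hypothetical tap interface: anything with #name and #sample(pid) returning a Hash.
# This stand-in only reports a timestamp instead of real process data.
class ClockTap
  def name
    "clock"
  end

  def sample(pid)
    { "pid" => pid, "time" => Time.now.to_f }
  end
end

# Hypothetical collector: forks the command, polls every tap at a fixed
# interval until the child exits, and appends one JSON line per sample.
class Collector
  def initialize(taps, interval: 1.0, dir: Dir.pwd)
    @taps = taps
    @interval = interval
    @dir = dir
  end

  # Returns the exit status of the target process.
  def run(*cmd)
    pid = fork { exec(*cmd) }
    loop do
      # With WNOHANG, wait2 returns nil while the child is still running.
      _, status = Process.wait2(pid, Process::WNOHANG)
      return status.exitstatus if status

      @taps.each { |tap| write(tap, tap.sample(pid)) }
      sleep @interval
    end
  end

  private

  # One tap-file per tap, named after the tap, one JSON object per line.
  def write(tap, data)
    File.open(File.join(@dir, tap.name), "a") { |f| f.puts(JSON.generate(data)) }
  end
end
```

With these assumed names, `Collector.new([ClockTap.new]).run("sleep", "3")` would append roughly one line per second to a `clock` file in the current directory until `sleep` exits.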
There are currently only 2 taps:
`RbtraceExprTap`: This tap encapsulates a Ruby expression that is executed against the process via `rbtrace`. I have written two of these for GC stats and ObjectSpace stats respectively. Expression results are written to a FIFO pipe in `Hash` format, from which the tap reads them back before writing them out as JSON.
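The FIFO round trip can be illustrated in isolation. In this sketch a forked child stands in for the traced process and writes `GC.stat` in `Hash#inspect` format to the pipe; the method names are hypothetical, and using `eval` to parse the payload is an assumption of the example, acceptable only because the data comes from a process we started ourselves.

```ruby
require "json"
require "tmpdir"

# Reads one Hash#inspect payload from a FIFO and returns it as a JSON string.
# eval is not safe for untrusted input; it is used here only for brevity.
def read_hash_from_fifo(path)
  text = File.read(path) # blocks until a writer opens the pipe, then reads to EOF
  hash = eval(text)
  raise TypeError, "expected a Hash, got #{hash.class}" unless hash.is_a?(Hash)

  JSON.generate(hash)
end

# Stand-in for the traced process: a forked child writes GC.stat to the pipe
# in Hash#inspect format, and the parent reads it back as JSON.
def demo_roundtrip
  Dir.mktmpdir do |dir|
    fifo = File.join(dir, "gcstat.fifo")
    File.mkfifo(fifo)
    writer = fork { File.write(fifo, GC.stat.inspect) }
    json = read_hash_from_fifo(fifo)
    Process.wait(writer)
    json
  end
end
```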
`ProcStatTap`: This tap runs `ps` to collect CPU and memory utilization stats and writes them to JSON in a file.
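A `ps`-based sample could look something like the sketch below. The helper names, the chosen columns (`pcpu`, `pmem`, `rss`) and the JSON key names are assumptions for illustration; the actual tap may query different fields.

```ruby
require "json"

# Turns one line of `ps` output ("<cpu> <mem> <rss>") into a Hash,
# or nil if the line is empty (e.g. the process has already exited).
def parse_ps_line(line)
  cpu, mem, rss = line.split
  return nil unless rss

  { "cpu_percent" => cpu.to_f, "mem_percent" => mem.to_f, "rss_kb" => rss.to_i }
end

# Asks ps for CPU %, memory % and resident set size of one process.
# A trailing "=" on each column suppresses the header line.
def proc_stat(pid)
  cmd = ["ps", "-o", "pcpu=", "-o", "pmem=", "-o", "rss=", "-p", pid.to_s]
  parse_ps_line(IO.popen(cmd, &:read))
end
```

`JSON.generate(proc_stat(pid))` would then give one line suitable for appending to a tap-file.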
Example
$ ./rbtap <CMD>
This will write tap-files in the current directory (this should be configurable at some point). You can then `tail` or `jq` these files concurrently and watch the output as it is emitted (sampling is currently done once a second):
[16:10:18] team-tools/rbtap::rbtap ✗ tail gcstat
{"count":17,"heap_allocated_pages":86,"heap_sorted_length":86,"heap_allocatable_pages":0,"heap_available_slots":35052,"heap_live_slots":24201,"heap_free_slots":10851,"heap_final_slots":0,"heap_marked_slots":22589,"heap_eden_pages":86,"heap_tomb_pages":0,"total_allocated_pages":86,"total_freed_pages":0,"total_allocated_objects":107773,"total_freed_objects":83572,"malloc_increase_bytes":78024,"malloc_increase_bytes_limit":16777216,"minor_gc_count":13,"major_gc_count":4,"remembered_wb_unprotected_objects":223,"remembered_wb_unprotected_objects_limit":440,"old_objects":22253,"old_objects_limit":41580,"oldmalloc_increase_bytes":78024,"oldmalloc_increase_bytes_limit":16777216}
{"count":17,"heap_allocated_pages":86,"heap_sorted_length":86,"heap_allocatable_pages":0,"heap_available_slots":35052,"heap_live_slots":24410,"heap_free_slots":10642,"heap_final_slots":0,"heap_marked_slots":22589,"heap_eden_pages":86,"heap_tomb_pages":0,"total_allocated_pages":86,"total_freed_pages":0,"total_allocated_objects":107985,"total_freed_objects":83575,"malloc_increase_bytes":87272,"malloc_increase_bytes_limit":16777216,"minor_gc_count":13,"major_gc_count":4,"remembered_wb_unprotected_objects":223,"remembered_wb_unprotected_objects_limit":440,"old_objects":22253,"old_objects_limit":41580,"oldmalloc_increase_bytes":87272,"oldmalloc_increase_bytes_limit":16777216}
You can then slice and dice this further as you see fit.
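As one example of such post-processing, the change of a counter between consecutive samples can be computed in a few lines. The `deltas` helper is a hypothetical name for this sketch; it assumes each line of the tap-file is a single JSON object, as in the output above.

```ruby
require "json"

# Returns the change of one numeric field between consecutive samples.
def deltas(lines, field)
  values = lines.map { |line| JSON.parse(line).fetch(field) }
  values.each_cons(2).map { |previous, current| current - previous }
end
```

For the two samples shown above, `deltas(File.readlines("gcstat"), "total_allocated_objects")` gives `[212]`, i.e. 212 objects were allocated between the two polls.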
Benefits
- A well-defined way of capturing data that is consistent and easy to post-process
- No modifications to the target process necessary other than `require 'rbtrace'`; this even works with rake tasks, so you can observe the importer that way
- Easy to extend: we can run as many taps as we see fit, and if we add a proper CLI we can switch them on and off as desired