Trace2: Expose internal pack objects metrics with trace2
For #4699 (closed)
git-pack-objects
is a hot topic recently. It's one of the core commands for pulling stuff from GitLab. We applied plenty of optimizations and solutions for this command. It makes sense to add some metrics to increase the observability. It may help when investigating performance issues.
This MR implement trace2 hook that exports pack-object metrics. This hook uses the existing trace2 hooking foundation. It hooks into the trace2 tree for the following events:
- data:pack-objects:write_pack_file/wrote
- data:pack-objects:loosen_unused_packed_objects/loosened
- data:pack-objects:stdin_packs_found
- data:pack-objects:stdin_packs_hints
- pack-objects:enumerate-objects
- pack-objects:prepare-pack
- pack-objects:write-pack-file
In the source, I can find some more data, but I don't think they are relevant.
When run into such traces, the hook increases the corresponding Prometheus' histograms. In addition, it also dumps equivalent fields to command stats. Eventually, Gitaly outputs logs having associated fields.
Performance impact
The biggest concern is how this hook impacts the performance of git-pack-objects command. There are some data to justify its impact:
- The output events are predictable. Unlike other commands, this command has a nearly fixed number of events. The full flow includes
enumerating-objects
,prepare-pack
andwrite-pack-file
steps. Each step has a handful of events. The total size of a typical event is around 3.4kb. I don't think writing and reading such small files will become a problem. The size of trace2 events doesn't increase according to size of pack files. -
´trace2.Parser
is fast for this small amount of events. Testing on my local environment, it takes a couple of micro-seconds. - The execution rate of this command is slow in each node. In normal conditions, the execution rate is around 3-4 RPS in most busy nodes and may peak to 15 RPS in exceptional situations.
Of course, let's roll it out under feature flag control. We can always roll back if something goes wrong
Execution rate of git-pack-objects
commands on production - Source
Sample recorded events of git-pack-objects
.
{"event":"version","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","time":"2023-02-22T12:05:04.840009Z","evt":"3","exe":"2.39.1"}
{"event":"start","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","t_abs":0.002338,"argv":["git","pack-objects","toon","--compression=0"]}
{"event":"def_repo","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"worktree":"/gitaly"}
{"event":"cmd_name","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","name":"pack-objects","hierarchy":"pack-objects"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","line":4460,"repo":1,"nesting":1,"category":"pack-objects","label":"enumerate-objects"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":2,"category":"progress","label":"Enumerating objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.001083,"nesting":2,"category":"progress","label":"Enumerating objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.001435,"nesting":1,"category":"pack-objects","label":"enumerate-objects"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":1,"category":"pack-objects","label":"prepare-pack"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":2,"category":"progress","label":"Counting objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.000075,"nesting":2,"category":"progress","label":"Counting objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.000090,"nesting":1,"category":"pack-objects","label":"prepare-pack"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":1,"category":"pack-objects","label":"write-pack-file"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":2,"category":"progress","label":"Writing objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.003972,"nesting":2,"category":"progress","label":"Writing objects"}
{"event":"data","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_abs":0.008451,"t_rel":0.003996,"nesting":2,"category":"pack-objects","key":"write_pack_file/wrote","value":"1"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.004003,"nesting":1,"category":"pack-objects","label":"write-pack-file"}
{"event":"data","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_abs":0.008482,"t_rel":0.008482,"nesting":1,"category":"fsync","key":"fsync/writeout-only","value":"2"}
{"event":"exit","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","t_abs":0.008489,"code":0}
{"event":"atexit","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","t_abs":0.008495,"code":0}