Tracing: Have a deeper look into Git process with trace2
For #4816 (closed)
Gitaly offloads a lot of complicated operations to git processes. When something goes wrong, it's quite hard to debug deeper into Git processes. In local environment, it's easy to plug some performance tools to measure how git process performs. It's not the case for the production environment. We can use some exposed logs, metrics, or similar statistical data. I wish we had a full descriptive picture, step by step, of an action. Distributed tracing may solve this problem. However, it has major problem: Most spans stop at git-spawning steps.
Git bundles a tool called Trace2. This tool enables Git processes to print underlying significant events and corresponding measurements. It collects the activities of child processes as well. Obviously, it adds some overhead to git processes and maybe some risks when passing the events in a fd. It may not very wise to enable Trace2 for all Git processes. However, it's a perfect fits for distributed tracing. We are sampling data at a very low rate. The exported events are also convenient to transform into Tracing spans.
So, the initial idea is to enable Trace2 for sampled trace only (blocked by Tracing: Add span's sampling status check to la... (#4805 - closed)). Apart from being used by distributed tracing, trace2 is also useful to export some underlying metrics, such as pack-objects (#4699 (closed)). So, it makes sense to convert the events to a tree data structure. This tree provides a walking method to implement hooks that serve data exporting.
I also opened another MR to link traces of gitaly-hooks (tracing: Fix disconnected spans in gitaly-hooks (!5419 - merged)). Afterward, we'll have full visibility of git hooks (pre-preceive, post-preceive, update-ref, etc.)
Before
The trace stops at git-something command. We have no idea what's going on.
After
Trace2 hooking is also added to Gitaly gRPC logs:
Note
@igorwwwwwwwwwwwwwwwwwwww mentioned after Implement tracing sampling status checker (labkit!175 - merged) that there is a possibility the size of Gitaly binary increases after adding that sampling check. I cast the opentracing.Span
to discrete structs. This may attach unexpected codes in the binary. Here's the comparison, both versions build with make all WITH_BUNDLED_GIT=YesPlease BUILD_TAGS="tracer_static tracer_static_jaeger"
command.
- Before this MR:
- This MR:
The binary size indeed increases by around 5%. It looks like we need more work on this