Make distributed tracing useful to Gitaly
For #4762 (comment 1260932936)
While debugging some performance problems on production, we realized that distributed tracing is now enabled there. However, it has some major problems that make it not very useful. I think this is low-hanging fruit: with a few minor, safe changes, tracing can become a valuable tool for debugging issues on production. It's a great addition to the existing toolbox of metrics and logs.
This MR adds a series of changes:
- Enhance command spans, resolving the following problems:
  - Simplify span names. Previously, the full command path was used as the span's operation name. This created a lot of noise and made spans impossible to search. The new version shortens the name to `git-diff`, `git-rev-parse`, `tar`, `du`, etc.
  - Attach command results as span tags. The prior version recorded command results as detached logs. Depending on the platform, these logs are collected differently. Worse, they may be rejected. Attaching the results as tags improves readability.
- Fix orphaned spans from the catfile cache. Previously, a span was created before calling catfile.getOrCreateProcess and finished only when the process exited. The process cache is meant to share a process between requests, so the process's lifecycle spans multiple requests and may last for several seconds or minutes. As a result, the original span does not end within the request that created it. It is eventually sent to the tracing server as an orphaned span, which adds a lot of noise.
- Fix orphaned spans created while Gitaly boots.
- Add more spans to key modules, such as internal/middleware/limithandler and internal/git/housekeeping.
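
To make the span-naming and tagging changes above concrete, here is a minimal sketch of the idea. It is not the code in this MR: `spanName`, `resultTags`, and the tag keys `path` and `exit_code` are hypothetical names chosen for illustration, and the rule "base name plus the first non-flag argument for git" is an assumption about how names such as `git-diff` could be derived.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// spanName is a hypothetical helper that derives a short, searchable
// operation name from a command instead of using its full path.
// For git, the first argument that does not look like a flag is
// treated as the subcommand. This is a simplification: global git
// options that take a separate value (such as "-c key=value") are
// not handled.
func spanName(path string, args []string) string {
	name := filepath.Base(path)
	if name != "git" {
		return name
	}
	for _, arg := range args {
		if !strings.HasPrefix(arg, "-") {
			return name + "-" + arg
		}
	}
	return name
}

// resultTags is a hypothetical helper that collects command results
// as key/value pairs to be set as span tags rather than emitted as
// detached span logs. The tag keys are assumptions.
func resultTags(path string, exitCode int) map[string]interface{} {
	return map[string]interface{}{
		"path":      path, // the full path is kept as a tag, not as the span name
		"exit_code": exitCode,
	}
}

func main() {
	fmt.Println(spanName("/usr/local/bin/git", []string{"--no-pager", "diff", "HEAD~1"}))
	fmt.Println(spanName("/usr/bin/tar", []string{"-cf", "archive.tar", "."}))
	fmt.Println(resultTags("/usr/bin/du", 0))
}
```

Keeping the full path as a tag rather than in the operation name means spans can be grouped and searched by a small set of names while the exact binary remains visible on each span.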
Some screenshots to demonstrate the changes:
Edited by Quang-Minh Nguyen