Add feature flag field to Gitaly logs
When debugging Gitaly feature flags on production, one of the biggest difficulties is to determine whether a flag is on/off for a particular call. We don't put this information in the logs. Although we have time series metrics about feature flag evaluation, they are barely useful for debugging purpose.
In the code case, we implemented a pair of featureflag middlewares to log out the flags as a field of the logger. Unfortunately, the volume of logs is too huge that it pushes Elasticsearch to limit. We need to find another way. There are some rooms for improvement I observed:
- The middlewares stringify the flags using
flag_a:true flag_b:false flag_c:true
. This format is not compact. Disabled flags can be implied if the name is not there. Afterward, the log field becomesflag_a flag_c
. - The middleware attaches the feature_flag field to the logger. As a result, all subsequent logs will have this field. In most cases, adding the field after the call finished is enough. Converting the middlewares to FieldsProducer will remove redundant info.
- We may not need to attach this field to every logs. Attaching it to RPC failure is a good start. As a result, I added
AlwaysLogFeatureFlag
flag for this purpose.
Closes #4622 (closed)
Testing scenarios
All the following tests are conducted using the following feature flags:
[68] pry(main)> Feature::Gitaly.server_feature_flags
=> {"gitaly-feature-mep-mep"=>"true",
"gitaly-feature-go-find-license"=>"true",
"gitaly-feature-node-error-cancels-voter"=>"false",
"gitaly-feature-always-log-feature-flags"=>"true"}
The result is intercepted from Gitaly logs. I use jq to make them nicer for reading.
gdk tail gitaly | jq --stream --unbuffered --raw-input 'split("{")|.[1:]|join("{")|"{" + .|fromjson|{feature_flags, "grpc.request.fullMethod", "grpc.meta.client_name", "grpc.code", "msg"}'
feature_flag field is not included by default
- Unary call
- Streaming call
When a RPC fails, the logs attach this field. Other logs in the same RPC don't have this field, though
- Unary call
- Streaming call
When AlwaysLogFeatureFlag is on, this field is always attached
- Unary call
- Streaming call