Add ability for Service Ping to catch errors for singular metrics
requested to merge 378475-enhance-service-ping-error-reporting-with-ability-to-report-on-singular-metric-issues into master
What does this MR do and why?
When investigating failing metrics, we need the ability to get error descriptions.
This MR is the first integration. It adds the ability to catch exceptions raised during legacy (non-instrumented) metric generation and to report them along with other metadata.
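To illustrate the idea, here is a minimal sketch of catching an exception around a block-based metric and keeping it next to the timing metadata. It is not the code in this MR: `MeasuredValue`, `measure_legacy_metric`, and the `-1` fallback are hypothetical names and values chosen for the example. Only the `duration` and `error` readers mirror what the validation steps rely on.

```ruby
# Hypothetical illustration only; names and fallback value are assumptions,
# not the implementation added by this MR.
MeasuredValue = Struct.new(:value, :duration, :error)

# Runs the block, records how long it took, and captures any StandardError
# instead of letting it propagate.
def measure_legacy_metric(fallback: -1)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  error = nil

  value = begin
    yield
  rescue StandardError => e
    error = e
    fallback
  end

  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  MeasuredValue.new(value, duration, error)
end

ok = measure_legacy_metric { '15.7.0' }
failed = measure_legacy_metric { raise StandardError, 'Cant find version' }

puts ok.error.inspect      # no error captured for the successful block
puts failed.error.message  # message of the captured exception
```

Keeping the error on the value object, rather than logging it and discarding it, lets a caller walk the payload afterwards and see which metric failed and why.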
How to set up and validate locally
Modify one of the metrics to raise an exception, for example:
Index: lib/gitlab/usage_data.rb
<+>UTF-8
===================================================================
diff --git a/lib/gitlab/usage_data.rb b/lib/gitlab/usage_data.rb
--- a/lib/gitlab/usage_data.rb (revision c7b048a86655ac5cdfa1583d00c92fafd22d3083)
+++ b/lib/gitlab/usage_data.rb (date 1670556094588)
@@ -51,7 +51,7 @@
recorded_at: recorded_at,
uuid: add_metric('UuidMetric'),
hostname: add_metric('HostnameMetric'),
- version: alt_usage_data { Gitlab::VERSION },
+ version: alt_usage_data { raise StandardError, 'Cant find version' },
installation_type: alt_usage_data { installation_type },
active_user_count: add_metric('ActiveUserCountMetric'),
edition: 'CE'
In the Rails console, verify error reporting:
> payload = Gitlab::Usage::ServicePingReport.for(output: :all_metrics_values)
> def metrics_collection_time(payload, parents = [])
    return [] unless payload.is_a?(Hash)

    payload.flat_map do |key, metric_value|
      key_path = parents.dup.append(key)

      if metric_value.respond_to?(:duration)
        { name: key_path.join('.'), time_elapsed: metric_value.duration, error: metric_value.error }
      else
        metrics_collection_time(metric_value, key_path)
      end
    end
  end
> q = metrics_collection_time(payload.to_h)
[88] pry(main)> q[0]
=> {:name=>"version", :time_elapsed=>0.0032210350036621094, :error=>#<StandardError: Cant find version>}
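Once the flattened list is available, narrowing it to the failing metrics is a short filter. The sketch below uses a made-up `results` array shaped like the helper's output so that it runs on its own; the `hostname` entry and its timing are invented sample data. In the console, `q` would presumably take the place of `results`.

```ruby
# Sample data shaped like the output of metrics_collection_time.
# The second entry is invented for illustration.
results = [
  { name: 'version', time_elapsed: 0.0032, error: StandardError.new('Cant find version') },
  { name: 'hostname', time_elapsed: 0.0001, error: nil }
]

# Keep only the metrics that captured an exception.
failing = results.reject { |metric| metric[:error].nil? }

# Build a one-line summary per failing metric.
summary = failing.map do |metric|
  "#{metric[:name]}: #{metric[:error].class} (#{metric[:error].message})"
end

puts summary
```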
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #378475 (closed)
Edited by Niko Belokolodov