Experiment with pushing internal job metrics to InfluxDB
Context
Related to #416597. In particular, have a look at the Technical Implementation.
What does this MR do and why?
Disabled when CI_JOB_METRICS_ENABLED is not set to true. See https://gitlab.com/gitlab-org/gitlab/-/jobs/4921091933#L158:
$ tooling/bin/push_job_metrics || true
[job-metrics] Feature disabled because CI_JOB_METRICS_ENABLED is not set to true.
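The gate can be expressed as a small Ruby check. This is a minimal sketch, assuming the scripts read the variable from ENV and compare it with the exact string "true"; FEATURE_FLAG, job_metrics_enabled? and disabled_message are hypothetical names, not the actual tooling API.

```ruby
# Hypothetical sketch of the feature gate; the real tooling/bin scripts may differ.
FEATURE_FLAG = 'CI_JOB_METRICS_ENABLED'

# In this sketch, only the exact string "true" enables the feature.
def job_metrics_enabled?(env = ENV)
  env[FEATURE_FLAG] == 'true'
end

# Returns the log line to print when the feature is off, or nil when it is on.
def disabled_message(env = ENV)
  return nil if job_metrics_enabled?(env)

  "[job-metrics] Feature disabled because #{FEATURE_FLAG} is not set to true."
end
```

A script would print disabled_message and return early whenever it is non-nil.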
General
Add CI job metrics to InfluxDB. Specifically, we add the rspec_retried_in_new_process metric to track when a job triggered a new RSpec process.
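A minimal sketch of assembling such a point from GitLab predefined CI/CD variables (CI_JOB_NAME, CI_JOB_ID, and so on). The tag/field split is copied from the Ruby hash shown under Results; build_job_metrics and the default "false" value for rspec_retried_in_new_process are assumptions for illustration. job_finished_at is left out because no predefined variable provides it.

```ruby
# Hypothetical sketch: builds a point shaped like the hash shown under Results.
# The CI_* names are GitLab predefined CI/CD variables; the mapping to tags and
# fields is an assumption and may not match the real tooling.
def build_job_metrics(env = ENV, now: Time.now.utc)
  {
    name: 'job-metrics',
    time: now,
    tags: {
      job_name: env['CI_JOB_NAME'],
      job_stage: env['CI_JOB_STAGE'],
      job_started_at: env['CI_JOB_STARTED_AT'],
      job_status: env['CI_JOB_STATUS'],
      project_id: env['CI_PROJECT_ID'],
      server_host: env['CI_SERVER_HOST'],
      # Assumed default; set to "true" once RSpec is retried in a new process.
      rspec_retried_in_new_process: 'false'
    },
    fields: {
      job_id: env['CI_JOB_ID'],
      merge_request_iid: env['CI_MERGE_REQUEST_IID'],
      pipeline_id: env['CI_PIPELINE_ID']
    }
  }
end
```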
Results
In !125546 (83f7d9f3), we display the metrics instead of pushing them. This change also makes some specs fail, so that we can test whether metrics are pushed in all scenarios:
- https://gitlab.com/gitlab-org/gitlab/-/jobs/4752968967
  - ✅ Creates the job metrics file
  - ✅ "Pushes" the job metrics file. Ruby hash it was about to push:
{:name=>"job-metrics", :time=>2023-07-27 13:50:34 +0000, :tags=>{:job_finished_at=>"2023-07-27T14:07:32+00:00", :job_name=>"rspec fail-fast", :job_stage=>"test", :job_started_at=>"2023-07-27T14:02:40Z", :job_status=>"running", :project_id=>"278964", :rspec_retried_in_new_process=>"true", :server_host=>"gitlab.com"}, :fields=>{:job_id=>"4752968967", :merge_request_iid=>"125546", :pipeline_id=>"947415645"}}
Note the rspec_retried_in_new_process key, set to "true".
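InfluxDB ingests points in line protocol. As a hedged sketch, which does not claim to reproduce what the InfluxDB client library does internally, a hash like the one above could be serialized as follows: escaping per the line protocol rules, dropping empty tag values, sorting tags by key, writing every field as a string (as in the example), and using nanosecond timestamps. to_line_protocol, escape_key and escape_string_field are hypothetical helper names.

```ruby
# Hypothetical sketch: serializes a point hash (name/time/tags/fields) into
# InfluxDB line protocol. All field values are written as strings.

# Tag keys, tag values and field keys escape commas, equals signs and spaces.
def escape_key(value)
  value.to_s.gsub(/[,= ]/) { |c| "\\#{c}" }
end

# String field values escape double quotes and backslashes.
def escape_string_field(value)
  value.to_s.gsub(/["\\]/) { |c| "\\#{c}" }
end

def to_line_protocol(point)
  # Measurement names escape commas and spaces.
  measurement = point[:name].to_s.gsub(/[, ]/) { |c| "\\#{c}" }
  # Empty tag values are not allowed; sorting tags by key is recommended.
  tags = (point[:tags] || {}).reject { |_key, value| value.nil? || value.to_s.empty? }
  tags = tags.sort_by { |key, _value| key.to_s }.map { |key, value| "#{escape_key(key)}=#{escape_key(value)}" }
  fields = point[:fields].map do |key, value|
    "#{escape_key(key)}=\"#{escape_string_field(value)}\""
  end
  # Default precision: nanoseconds since the Unix epoch.
  timestamp = (point[:time].to_i * 1_000_000_000) + point[:time].nsec
  "#{([measurement] + tags).join(',')} #{fields.join(',')} #{timestamp}"
end
```

Writing job_id and pipeline_id as string fields mirrors the example hash; numeric fields would instead be written without quotes.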
How to set up and validate locally
# Should fail because `JOB_METRICS_FILE_PATH` isn't set (`echo $?` prints the non-zero exit status)
tooling/bin/create_job_metrics_file
echo $?
export JOB_METRICS_FILE_PATH=tmp/job-metrics.json
rm -f "$JOB_METRICS_FILE_PATH"
tooling/bin/create_job_metrics_file || true # Should succeed
# Check the metrics file
jq . "$JOB_METRICS_FILE_PATH"
# Update the metrics file
tooling/bin/update_job_metrics_tag rspec_retried_in_new_process 1 || true
# Check the metrics file
jq . "$JOB_METRICS_FILE_PATH"
# Push the metrics to InfluxDB: should fail since you don't have the env variables configured
tooling/bin/push_job_metrics
echo $?
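The steps above operate on a JSON file at JOB_METRICS_FILE_PATH. A minimal sketch of what creating and updating that file could look like, assuming the file holds a single point hash with string keys and that tag updates store the value as a string; the method names mirror the script names, but their bodies are hypothetical.

```ruby
require 'json'

# Hypothetical sketch: assumes the metrics file holds one point hash as JSON.
def create_job_metrics_file(path, metrics)
  raise ArgumentError, 'JOB_METRICS_FILE_PATH is not set' if path.nil? || path.empty?

  File.write(path, JSON.pretty_generate(metrics))
end

# Sets (or overwrites) one tag; values are stored as strings, like in the example hash.
def update_job_metrics_tag(path, name, value)
  metrics = JSON.parse(File.read(path))
  metrics['tags'] ||= {}
  metrics['tags'][name.to_s] = value.to_s
  File.write(path, JSON.pretty_generate(metrics))
end
```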
🆕 In case we need to push less often to InfluxDB 🆕
If needed, we could push to InfluxDB only half of the time by exiting early at random:
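# We push metrics 50% of the time.
# Kernel#rand returns a Float in [0.0, 1.0), so this skips the push in about half of the runs.
if rand < 0.5
  puts "[job-metrics] Will not push to InfluxDB (we only push in 50% of the cases)."
  exit(1) # Non-zero exit; the CI job calls the script with `|| true`
end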
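A variant of the snippet above with the sampling rate and random number generator passed in, so the decision can be unit tested with a seeded Random. push_sampled? is a hypothetical name. Here rate is the probability of pushing, so the check is rng.rand < rate, the mirror image of the snippet's skip condition and equivalent to it at 0.5.

```ruby
# Hypothetical sketch: `rate` is the probability of pushing.
def push_sampled?(rate: 0.5, rng: Random.new)
  # Random#rand with no argument returns a Float in [0.0, 1.0),
  # so this is true with probability `rate`.
  rng.rand < rate
end
```

In the push script, the if block could then become a guard that prints the skip message and exits unless push_sampled? returns true.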
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.