Pipeline duration is wrong when a child pipeline job is started after the main pipeline finished
Problem
As discovered in gitlab-org/quality/engineering-analytics/team-tasks#99 (comment 1236437768):
I believe the bug may come from https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/pipeline/duration.rb where a pipeline duration is the sum of the union of all the job_end - job_start periods. I believe the problem comes from "bridge" jobs: these jobs are actually pipelines, and since the Gitlab::Ci::Pipeline::Duration class looks at builds.finished_at, that means it would look at the child pipeline's finished_at in the case of a "bridge" job. This is not a problem when all the jobs in the child pipeline run sequentially with no queuing time, but in our case, the review-stop job starts 6 hours after the review-deploy one...
This is actually an expected behavior in terms of calculation when a pipeline has bridge jobs: https://gitlab.com/gitlab-org/gitlab/-/blob/c163d6f2583740af8b763054afaeee430a8ede74/spec/lib/gitlab/ci/pipeline/duration_spec.rb#L166-174, but I think this is super confusing.
An example of such a problematic child pipeline can be seen visually at gitlab-org/quality/engineering-productivity/team#140 (comment 1229942110). While the child pipeline has a correct duration of 53 minutes (correctly excluding the "idle" time before review-stop started), the parent pipeline has an incorrect duration of 434 minutes, since its start-review-app-pipeline bridge job's duration is equal to child_pipeline_end - child_pipeline_start.
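To make the arithmetic concrete, here is a minimal sketch of a "sum of the union of periods" calculation. It is not the actual Gitlab::Ci::Pipeline::Duration code: the Period struct, the union_duration helper, the extra 20-minute parent job, and all timestamps (minutes since the parent pipeline started) are assumptions chosen only to reproduce the 53 and 434 minute figures above.

```ruby
# Hypothetical stand-in for a job's running period, in minutes since the
# parent pipeline started. Not GitLab's actual class.
Period = Struct.new(:start, :finish)

# Sum of the union of all periods: overlapping or touching periods are
# merged first, so time covered by parallel jobs is counted only once and
# idle gaps between periods are not counted at all.
def union_duration(periods)
  merged = []
  periods.sort_by(&:start).each do |period|
    if merged.any? && period.start <= merged.last.finish
      # Overlaps the previous merged period: extend it.
      merged.last.finish = [merged.last.finish, period.finish].max
    else
      # Copy the period so the caller's objects are never mutated.
      merged << Period.new(period.start, period.finish)
    end
  end
  merged.sum { |period| period.finish - period.start }
end

# Illustrative child pipeline: review-deploy runs for 40 minutes,
# review-stop runs for 13 minutes roughly 6 hours later.
child_jobs = [Period.new(20, 60), Period.new(421, 434)]

# The bridge job spans the whole child pipeline, idle time included.
bridge_job = Period.new(child_jobs.map(&:start).min, child_jobs.map(&:finish).max)

# Illustrative parent pipeline: one regular 20-minute job plus the bridge.
parent_jobs = [Period.new(0, 20), bridge_job]

child_duration = union_duration(child_jobs)
parent_duration = union_duration(parent_jobs)

puts "child: #{child_duration} min, parent: #{parent_duration} min"
```

With these assumed timestamps the child pipeline reports 53 minutes while the parent reports 434 minutes, because the bridge job's single period covers the idle gap that the child's own calculation excludes.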
Proposed solution
I think one way to fix this could be to select all of the child pipeline's jobs for the duration calculation, instead of the bridge job, which is actually a pipeline.
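A rough sketch of that idea, under stated assumptions: the Job struct, its child_jobs attribute, and the duration_periods and union_duration helpers are hypothetical and do not reflect how GitLab models bridges and child pipelines. The "build" job and all timestamps (minutes since the parent pipeline started) are illustrative only.

```ruby
# Hypothetical job model: a bridge job carries the jobs of the child
# pipeline it triggered in child_jobs; a regular job leaves it nil.
Job = Struct.new(:name, :start, :finish, :child_jobs, keyword_init: true)

# Collect the periods used for the duration calculation. A bridge job is
# replaced by the periods of its child pipeline's jobs (recursively, in
# case a child pipeline contains bridges of its own).
def duration_periods(jobs)
  jobs.flat_map do |job|
    if job.child_jobs
      duration_periods(job.child_jobs)
    else
      [[job.start, job.finish]]
    end
  end
end

# Sum of the union of [start, finish] periods, skipping idle gaps.
def union_duration(periods)
  total = 0
  current_start = current_finish = nil
  periods.sort.each do |start, finish|
    if current_finish && start <= current_finish
      # Overlaps the running period: extend it.
      current_finish = [current_finish, finish].max
    else
      # Gap found: close the running period and open a new one.
      total += current_finish - current_start if current_finish
      current_start, current_finish = start, finish
    end
  end
  total += current_finish - current_start if current_finish
  total
end

parent_jobs = [
  Job.new(name: "build", start: 0, finish: 20),
  Job.new(
    name: "start-review-app-pipeline", start: 20, finish: 434,
    child_jobs: [
      Job.new(name: "review-deploy", start: 20, finish: 60),
      Job.new(name: "review-stop", start: 421, finish: 434)
    ]
  )
]

expanded_duration = union_duration(duration_periods(parent_jobs))
puts "parent duration with child jobs expanded: #{expanded_duration} min"
```

With these assumed timestamps, expanding the bridge yields 73 minutes (20 for the regular job plus the child's 53) instead of 434, since the idle time before review-stop no longer falls inside any period.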