GraphQL: Implement failure_reason filter in AllJobsResolver

Pedro Pombeiro requested to merge pedropombeiro/421889/2-add-graphql-query into master

What does this MR do and why?

MR | Description
!130560 (merged) | Implement redis cache of failed builds executed on instance runners
!130579 (merged) (you are here) | GraphQL: Implement failure_reason filter in AllJobsResolver
!131124 (merged) | Filter job failures in runner fleet dashboard

This MR adds a failure_reason argument to AllJobsResolver. The argument currently supports only the value RUNNER_SYSTEM_FAILURE, and only for instance runners, because the results are backed by the Redis list implemented in !130560 (merged).
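To make the supported combination concrete, here is a minimal plain-Ruby sketch of the kind of guard such an argument implies. It is not the resolver code from this MR: the constant names, the error class, and `validate_failure_reason_filter!` are all hypothetical, and the sketch assumes the runner type must be passed explicitly as exactly `[:instance_type]`, which the description does not state.

```ruby
# Hypothetical sketch: names and the exact rule are assumptions,
# not the actual AllJobsResolver implementation.
SUPPORTED_FAILURE_REASONS = [:runner_system_failure].freeze
SUPPORTED_RUNNER_TYPES = [:instance_type].freeze

class UnsupportedFilterError < ArgumentError; end

# Raises unless the failure_reason filter is used in the one supported
# combination: runner_system_failure together with instance runners.
# Returns nil when the arguments are acceptable.
def validate_failure_reason_filter!(failure_reason:, runner_types:)
  return if failure_reason.nil? # filter not requested, nothing to check

  unless SUPPORTED_FAILURE_REASONS.include?(failure_reason)
    raise UnsupportedFilterError,
          "failure_reason only supports #{SUPPORTED_FAILURE_REASONS.join(', ')}"
  end

  # Assumption: the caller must ask for instance runners explicitly.
  unless runner_types == SUPPORTED_RUNNER_TYPES
    raise UnsupportedFilterError,
          "failure_reason requires runner_types #{SUPPORTED_RUNNER_TYPES.inspect}"
  end

  nil
end
```

Rejecting unsupported combinations up front, rather than silently returning an empty list, makes it clear to API consumers that the filter is limited by the data kept in the Redis list.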

EE: true

Closes #421889 (closed)

How to set up and validate locally

  1. Go to the shell in your GDK gitlab directory and run bundle exec rake "gitlab:seed:runner_fleet". This will seed your GDK with some runners and jobs required for testing this MR.

  2. You should now have a few failed jobs run by an instance runner. Let's update a couple of those to have the failure_reason we're interested in. Run the following command on the GDK console:

    Ci::Build.with_runner_type(:instance_type).order(id: :desc).where(failure_reason: :unknown_failure).limit(2).update_all(failure_reason: :runner_system_failure)

    The command returns the number of updated rows. Make sure it is greater than 0 (ideally 2). If it is 0, restart from step 1 to seed a few more builds.

  3. Open http://gdk.test:3000/-/graphql-explorer and run the following query:

    {
      jobs(failureReason: RUNNER_SYSTEM_FAILURE, runnerTypes: [INSTANCE_TYPE]) {
        count
        nodes {
          id
          status
          failureMessage
        }
      }
    }

    You should see 1 or 2 builds listed there.
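If you prefer to run the query from step 3 outside the GraphQL explorer, the following Ruby sketch sends it to the /api/graphql endpoint with the standard library only. The helper names are hypothetical, and the sketch assumes you authenticate with a personal access token sent as a Bearer header for a user who is allowed to run the jobs query.

```ruby
require 'json'
require 'net/http'
require 'uri'

# The query from step 3, unchanged.
JOBS_QUERY = <<~GRAPHQL
  {
    jobs(failureReason: RUNNER_SYSTEM_FAILURE, runnerTypes: [INSTANCE_TYPE]) {
      count
      nodes {
        id
        status
        failureMessage
      }
    }
  }
GRAPHQL

# Builds the POST request without sending it (hypothetical helper).
def build_jobs_request(endpoint, token)
  request = Net::HTTP::Post.new(URI(endpoint))
  request['Content-Type'] = 'application/json'
  request['Authorization'] = "Bearer #{token}" # assumption: token-based auth
  request.body = JSON.generate(query: JOBS_QUERY)
  request
end

# Sends the request and returns the "jobs" part of the response, or nil
# if the response has no data (for example, on an authorization error).
def fetch_failed_jobs(endpoint, token)
  uri = URI(endpoint)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_jobs_request(endpoint, token))
  end
  JSON.parse(response.body).dig('data', 'jobs')
end
```

For example, `fetch_failed_jobs('http://gdk.test:3000/api/graphql', ENV['GITLAB_TOKEN'])` would return a hash with `count` and `nodes` keys on a GDK set up as above, where `GITLAB_TOKEN` is an environment variable name chosen for this example.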

Database query plan

The database calls simply look up builds by ID (the IDs are retrieved from a Redis list). The remaining condition ("ci_builds"."failure_reason" = 4) is already satisfied by the builds those IDs reference, so the database does not have to do any extra work:

Started POST "/api/graphql" for 127.0.0.1 at 2023-09-01 15:58:18 +0200
Processing by Gitlab::RequestForgeryProtection::Controller#index as HTML
Completed 200 OK in 1ms (ActiveRecord: 0.0ms | Elasticsearch: 0.0ms | Allocations: 133)

Processing by GraphqlController#execute as */*
  Parameters: {"query"=>"{\n  jobs(failureReason: RUNNER_SYSTEM_FAILURE) {\n    count\n    nodes {\n      id\n      status\n      failureMessage\n    }\n  }\n}\n", "variables"=>"[FILTERED]", "graphql"=>{"query"=>"{\n  jobs(failureReason: RUNNER_SYSTEM_FAILURE) {\n    count\n    nodes {\n      id\n      status\n      failureMessage\n    }\n  }\n}\n", "variables"=>"[FILTERED]"}}
  Ci::Build Count (0.5ms)  SELECT COUNT(*) FROM (SELECT 1 AS one FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."id" IN (18112166, 18112166) AND "ci_builds"."failure_reason" = 4 LIMIT 1001) subquery_for_count /*application:web,correlation_id:01H98GK73GXH790QT46JBPN2GD,endpoint_id:graphql:unknown,db_config_name:ci,line:/app/graphql/types/limited_countable_connection_type.rb:20:in `count'*/
  ↳ config/initializers/kaminari_active_record_relation_methods_with_limit.rb:31:in `total_count_with_limit'
  Ci::Build Load (0.4ms)  SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."id" IN (18112166, 18112166) AND "ci_builds"."failure_reason" = 4 ORDER BY "ci_builds"."finished_at" DESC, "ci_builds"."id" DESC LIMIT 101 /*application:web,correlation_id:01H98GK73GXH790QT46JBPN2GD,endpoint_id:graphql:unknown,db_config_name:ci,line:/lib/gitlab/graphql/pagination/keyset/connection.rb:122:in `block in limited_nodes'*/
  ↳ lib/gitlab/graphql/pagination/keyset/connection.rb:122:in `block in limited_nodes'
Completed 200 OK in 29ms (Views: 0.1ms | ActiveRecord: 1.2ms | Elasticsearch: 0.0ms | Allocations: 27867)

SELECT COUNT(*)

https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/21912/commands/70955

SELECT COUNT(*)
FROM (
  SELECT 1 AS one
  FROM "ci_builds"
  WHERE "ci_builds"."type" = 'Ci::Build'
    AND "ci_builds"."id" IN (4998268970, 4998268425, 4998268250, 4998250228, 4996875630, 4996718890, 4996692191,
      4996609876, 4996547461, 4994219957, 4994201571, 4991292879, 4991148681, 4991063794, 4990013546, 4930066358,
      4930051457, 4990013242, 4990012918, 4984165038, 4929322197, 4929322196, 4929020402, 4981665074, 4924801096,
      4924801095, 4980252015, 4973321350, 4924158055, 4923794135, 4922757967, 4973321310, 4973321197, 4967629991,
      4921077951, 4921076810, 4921076732, 4921076696, 4921056720, 4920795854, 4920795852, 4920733804, 4919967462,
      4919015591, 4918999026, 4918999025, 4918969597, 4996989149, 4966109451, 4966090281, 4966035646, 4965969273,
      4965234293, 4965233978, 4965212280, 4965094078, 4964235417, 4915490404, 4961033528, 4912567364, 4912566150,
      4912548487, 4912458410, 4911454578, 4961025426, 4961025310, 4960192424, 4960191870, 4959937756, 4959097837,
      4959087657, 4958926646, 4953869832, 4953869824, 4953796623, 4953670729, 4953029332, 4907383632, 4907383213,
      4907372086, 4907278439, 4907273909, 4907151334, 4906870887, 4953009921, 4952055501, 4905323466, 4905313773,
      4951452438, 4951252943, 4943849425, 4903353805, 4941633538, 4898833430, 4939642345, 4939642336, 4938564575,
      4938428899, 4896519712, 4938151990)
  LIMIT 1001) subquery_for_count

 Aggregate  (cost=337.03..337.04 rows=1 width=8) (actual time=1148.501..1148.503 rows=1 loops=1)
   Buffers: shared hit=223 read=274 dirtied=1
   I/O Timings: read=1138.720 write=0.000
   ->  Limit  (cost=0.58..335.86 rows=94 width=4) (actual time=39.963..1148.189 rows=100 loops=1)
         Buffers: shared hit=223 read=274 dirtied=1
         I/O Timings: read=1138.720 write=0.000
         ->  Index Scan using ci_builds_pkey on public.ci_builds  (cost=0.58..335.86 rows=94 width=4) (actual time=39.961..1147.984 rows=100 loops=1)
               Index Cond: (ci_builds.id = ANY ('{4998268970,4998268425,4998268250,4998250228,4996875630,4996718890,4996692191,4996609876,4996547461,4994219957,4994201571,4991292879,4991148681,4991063794,4990013546,4930066358,4930051457,4990013242,4990012918,4984165038,4929322197,4929322196,4929020402,4981665074,4924801096,4924801095,4980252015,4973321350,4924158055,4923794135,4922757967,4973321310,4973321197,4967629991,4921077951,4921076810,4921076732,4921076696,4921056720,4920795854,4920795852,4920733804,4919967462,4919015591,4918999026,4918999025,4918969597,4996989149,4966109451,4966090281,4966035646,4965969273,4965234293,4965233978,4965212280,4965094078,4964235417,4915490404,4961033528,4912567364,4912566150,4912548487,4912458410,4911454578,4961025426,4961025310,4960192424,4960191870,4959937756,4959097837,4959087657,4958926646,4953869832,4953869824,4953796623,4953670729,4953029332,4907383632,4907383213,4907372086,4907278439,4907273909,4907151334,4906870887,4953009921,4952055501,4905323466,4905313773,4951452438,4951252943,4943849425,4903353805,4941633538,4898833430,4939642345,4939642336,4938564575,4938428899,4896519712,4938151990}'::bigint[]))
               Filter: ((ci_builds.type)::text = 'Ci::Build'::text)
               Rows Removed by Filter: 0
               Buffers: shared hit=223 read=274 dirtied=1
               I/O Timings: read=1138.720 write=0.000

SELECT "ci_builds".*

https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/21912/commands/70958

SELECT "ci_builds".*
FROM "ci_builds"
WHERE "ci_builds"."type" = 'Ci::Build'
  AND "ci_builds"."id" IN (4998268970, 4998268425, 4998268250, 4998250228, 4996875630, 4996718890, 4996692191,
      4996609876, 4996547461, 4994219957, 4994201571, 4991292879, 4991148681, 4991063794, 4990013546, 4930066358,
      4930051457, 4990013242, 4990012918, 4984165038, 4929322197, 4929322196, 4929020402, 4981665074, 4924801096,
      4924801095, 4980252015, 4973321350, 4924158055, 4923794135, 4922757967, 4973321310, 4973321197, 4967629991,
      4921077951, 4921076810, 4921076732, 4921076696, 4921056720, 4920795854, 4920795852, 4920733804, 4919967462,
      4919015591, 4918999026, 4918999025, 4918969597, 4996989149, 4966109451, 4966090281, 4966035646, 4965969273,
      4965234293, 4965233978, 4965212280, 4965094078, 4964235417, 4915490404, 4961033528, 4912567364, 4912566150,
      4912548487, 4912458410, 4911454578, 4961025426, 4961025310, 4960192424, 4960191870, 4959937756, 4959097837,
      4959087657, 4958926646, 4953869832, 4953869824, 4953796623, 4953670729, 4953029332, 4907383632, 4907383213,
      4907372086, 4907278439, 4907273909, 4907151334, 4906870887, 4953009921, 4952055501, 4905323466, 4905313773,
      4951452438, 4951252943, 4943849425, 4903353805, 4941633538, 4898833430, 4939642345, 4939642336, 4938564575,
      4938428899, 4896519712, 4938151990)
ORDER BY "ci_builds"."finished_at" DESC, "ci_builds"."id" DESC
LIMIT 101

 Limit  (cost=338.94..339.17 rows=94 width=1251) (actual time=0.998..1.012 rows=100 loops=1)
   Buffers: shared hit=503
   I/O Timings: read=0.000 write=0.000
   ->  Sort  (cost=338.94..339.17 rows=94 width=1251) (actual time=0.996..1.002 rows=100 loops=1)
         Sort Key: ci_builds.finished_at DESC, ci_builds.id DESC
         Sort Method: quicksort  Memory: 51kB
         Buffers: shared hit=503
         I/O Timings: read=0.000 write=0.000
         ->  Index Scan using ci_builds_pkey on public.ci_builds  (cost=0.58..335.86 rows=94 width=1251) (actual time=0.042..0.850 rows=100 loops=1)
               Index Cond: (ci_builds.id = ANY ('{4998268970,4998268425,4998268250,4998250228,4996875630,4996718890,4996692191,4996609876,4996547461,4994219957,4994201571,4991292879,4991148681,4991063794,4990013546,4930066358,4930051457,4990013242,4990012918,4984165038,4929322197,4929322196,4929020402,4981665074,4924801096,4924801095,4980252015,4973321350,4924158055,4923794135,4922757967,4973321310,4973321197,4967629991,4921077951,4921076810,4921076732,4921076696,4921056720,4920795854,4920795852,4920733804,4919967462,4919015591,4918999026,4918999025,4918969597,4996989149,4966109451,4966090281,4966035646,4965969273,4965234293,4965233978,4965212280,4965094078,4964235417,4915490404,4961033528,4912567364,4912566150,4912548487,4912458410,4911454578,4961025426,4961025310,4960192424,4960191870,4959937756,4959097837,4959087657,4958926646,4953869832,4953869824,4953796623,4953670729,4953029332,4907383632,4907383213,4907372086,4907278439,4907273909,4907151334,4906870887,4953009921,4952055501,4905323466,4905313773,4951452438,4951252943,4943849425,4903353805,4941633538,4898833430,4939642345,4939642336,4938564575,4938428899,4896519712,4938151990}'::bigint[]))
               Filter: ((ci_builds.type)::text = 'Ci::Build'::text)
               Rows Removed by Filter: 0
               Buffers: shared hit=497
               I/O Timings: read=0.000 write=0.000
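
The lookup described under "Database query plan" can be modelled in a few lines of plain Ruby. This is an in-memory illustration only, not code from this MR: the `Build` struct, the `recent_failed_builds` method, the array standing in for the Redis list, and the hash standing in for the ci_builds primary-key index are all assumptions made for the example. The value 4 is the failure_reason seen in the logged SQL above.

```ruby
# In-memory stand-in for a ci_builds row (hypothetical, illustration only).
Build = Struct.new(:id, :failure_reason, :finished_at, keyword_init: true)

RUNNER_SYSTEM_FAILURE = 4 # enum value that appears in the logged SQL

# cached_ids   - stand-in for the IDs read from the Redis list
# builds_by_id - stand-in for the primary-key index on ci_builds
def recent_failed_builds(cached_ids, builds_by_id, limit: 101)
  # Primary-key lookup: IDs with no matching row are skipped.
  builds = cached_ids.uniq.filter_map { |id| builds_by_id[id] }

  # Every cached build already has this failure reason, so the filter
  # is expected to remove nothing.
  builds.select { |build| build.failure_reason == RUNNER_SYSTEM_FAILURE }
        .sort_by { |build| [build.finished_at, build.id] }
        .reverse # finished_at DESC, id DESC
        .first(limit)
end
```

Because the list of IDs is bounded, both the count and the load touch only a small, fixed number of rows through the primary key, which matches the index scans on ci_builds_pkey in the plans above.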

Edited by Pedro Pombeiro
