Introduce namespace/project hierarchy tables
What does this MR do and why?
See #336432 (closed)
Note: This MR supersedes !67237 (closed) which implemented previous attempts to solve this problem.
As part of the initiative to extract the ci_*
tables from the main database, we need to rework queries that join between the two. The RunnersFinder#group_runners
finder is currently joining ci_runners
with projects
and groups
, through the belonging_to_group_or_project
scope on Ci::Runners
.
This MR concerns preparatory work required to remove the join. It:
- adds 2 new tables
ci_namespace_hierarchies
/ci_project_hierarchies
to theci
database which shadow themain
database'snamespaces
/projects
tables so that we can quickly answer the question "What is the hierarchy of this namespace/project" in the upcomingci
database without touching themain
database. These new tables contain records only for namespaces/projects which have at least one runner registered; - keeps the new
ci_namespace_hierarchies
/ci_project_hierarchies
in sync by adding 2 other new "sync event" tables to themain
database. These tables get a new record added whenever a namespace is added or otherwise has its parent changed; - adds 2 new sidekiq jobs that consume events from
namespaces_sync_events
/projects_sync_events
in order to keepci_namespace_hierarchies
/ci_project_hierarchies
in sync on theci
database; - adds background migrations that will progressively populate
ci_namespace_hierarchies
/ci_project_hierarchies
from thenamespaces
/projects
tables.
The idea is that the background migrations will be busy at work bringing existing records to the ci_namespace_hierarchies
/ci_project_hierarchies
tables, while the "sync events" tables/sidekiq will take care of syncing any changes as they happen in the foreground.
TODO list
-
Migrations (kick off background migrations backfill_namespace_hierarchies.rb
/backfill_project_hierarchies.rb
) -
Adapt query (TBD in separate MR: !74386 (closed)) - Maybe add a feature flag to allow reverting to the previous query if needed.
-
Sync traversal IDs -
on ci_project_hierarchies
throughprojects_sync_events
when project is created -
on ci_project_hierarchies
throughprojects_sync_events
when project is deleted -
on ci_namespace_hierarchies
throughnamespaces_sync_events
when namespace is created -
on ci_namespace_hierarchies
throughnamespaces_sync_events
when namespace is deleted -
on ci_project_hierarchies
throughprojects_sync_events
when project is transferred to another group -
on ci_project_hierarchies
/ci_namespace_hierarchies
throughnamespaces_sync_events
when namespace is moved -
on ci_project_hierarchies
when parent group is deleted
-
-
Clean up code/tidy up approach -
Add tests once initial review is done -
Test that creating namespace creates a namespaces_sync_event
record and callsCi::ProcessNamespaceEventsWorker.perform_async
-
Test that creating project creates a projects_sync_event
record and callsCi::ProcessProjectEventsWorker.perform_async
-
Test that updating namespace hierarchy updates dependent ci_namespace_hierarchies
andci_project_hierarchies
records -
Test that updating project's namespace hierarchy updates respective ci_project_hierarchies
record -
Test ProcessNamespaceEventsWorker
-
Test ProcessProjectEventsWorker
-
Test BackfillNamespaceHierarchies
-
Test that migration does the right thing if run after records have been added to ci_namespace_hieararchies
-
-
Test BackfillProjectHierarchies
-
Test that migration does the right thing if run after records have been added to ci_project_hieararchies
-
-
Tested scenarios on GDK instance
-
Background migrations, ci_namespace_hierarchies
/ci_project_hierarchies
records are added✅ -
Delete a group, ci_namespace_hierarchies
record is deleted✅ -
Delete parent group of a group, ci_namespace_hierarchies
record is deleted✅ -
Delete parent group of a project, ci_project_hierarchies
record is deleted✅ -
Delete a project, ci_project_hierarchies
record is deleted✅
Questions for reviewer
-
database There are 2 indices generated for
ci_project_hierarchies
:ci_project_hierarchies_pkey
(primary key) andindex_ci_project_hierarchies_on_project_id
. Isindex_ci_project_hierarchies_on_project_id
required? I see other examples in the code base, so I assume yes, but not exactly sure why. - database The backfill queries are probably not as optimized as they could be, it might be worth spending some time there.
Migrations
db:migrate
bin/rails db:migrate RAILS_ENV=development
== 20211011140930 CreateCiNamespaceHierarchies: migrating =====================
-- create_table(:ci_namespace_hierarchies, {:id=>false})
-> 0.0210s
== 20211011140930 CreateCiNamespaceHierarchies: migrated (0.0211s) ============
== 20211011140931 CreateCiProjectHierarchies: migrating =======================
-- create_table(:ci_project_hierarchies, {:id=>false})
-> 0.0124s
== 20211011140931 CreateCiProjectHierarchies: migrated (0.0125s) ==============
== 20211011140932 CreateNamespacesSyncEvents: migrating ==========================
-- create_table(:namespaces_sync_events, {})
-> 0.0089s
== 20211011140932 CreateNamespacesSyncEvents: migrated (0.0089s) =================
== 20211011141239 CreateProjectsSyncEvents: migrating ============================
-- create_table(:projects_sync_events, {})
-> 0.0087s
== 20211011141239 CreateProjectsSyncEvents: migrated (0.0088s) ===================
== 20211011141240 AddCiNamespaceHierarchies: migrating ========================
-- Scheduled 1 BackfillNamespaceHierarchies jobs with a maximum of 1000 records per batch and an interval of 120 seconds.
The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-10-11 12:15:47 UTC."
== 20211011141240 AddCiNamespaceHierarchies: migrated (0.1486s) ===============
== 20211011141241 AddCiProjectHierarchies: migrating ==========================
-- Scheduled 1 BackfillProjectHierarchies jobs with a maximum of 1000 records per batch and an interval of 120 seconds.
The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-10-11 12:15:47 UTC."
== 20211011141241 AddCiProjectHierarchies: migrated (0.1983s) =================
db:rollback
Rolling back 6 branch migration(s)...
== 20211011141241 AddCiProjectHierarchies: reverting ==========================
-- execute("DELETE FROM ci_project_hierarchies")
-> 0.0050s
== 20211011141241 AddCiProjectHierarchies: reverted (0.0051s) =================
== 20211011141240 AddCiNamespaceHierarchies: reverting ========================
-- execute("DELETE FROM ci_namespace_hierarchies")
-> 0.0018s
== 20211011141240 AddCiNamespaceHierarchies: reverted (0.0019s) ===============
== 20211011141239 CreateProjectsSyncEvents: reverting ============================
-- drop_table(:projects_sync_events, {})
-> 0.0060s
== 20211011141239 CreateProjectsSyncEvents: reverted (0.0094s) ===================
== 20211011140932 CreateNamespacesSyncEvents: reverting ==========================
-- drop_table(:namespaces_sync_events, {})
-> 0.0033s
== 20211011140932 CreateNamespacesSyncEvents: reverted (0.0034s) =================
== 20211011140931 CreateCiProjectHierarchies: reverting =======================
-- drop_table(:ci_project_hierarchies, {:id=>false})
-> 0.0040s
== 20211011140931 CreateCiProjectHierarchies: reverted (0.0041s) ==============
== 20211011140930 CreateCiNamespaceHierarchies: reverting =====================
-- drop_table(:ci_namespace_hierarchies, {:id=>false})
-> 0.0048s
== 20211011140930 CreateCiNamespaceHierarchies: reverted (0.0050s) ============
New SQL tables after migration
gitlabhq_development> \d namespaces_sync_events
╒═══════════════════╤═══════════╤════════════════════════════════════════════════════════════════╕
│ Column │ Type │ Modifiers │
╞═══════════════════╪═══════════╪════════════════════════════════════════════════════════════════╡
│ id │ bigint │ not null default nextval('namespaces_sync_events_id_seq'::regclass) │
├───────────────────┼───────────┼────────────────────────────────────────────────────────────────┤
│ traversal_ids │ integer[] │ not null default '{}'::integer[] │
├───────────────────┼───────────┼────────────────────────────────────────────────────────────────┤
│ new_traversal_ids │ integer[] │ not null default '{}'::integer[] │
╘═══════════════════╧═══════════╧════════════════════════════════════════════════════════════════╛
Indexes:
"namespaces_sync_events_pkey" PRIMARY KEY, btree (id)
gitlabhq_development> \d projects_sync_events
╒═══════════════════╤═══════════╤══════════════════════════════════════════════════════════════╕
│ Column │ Type │ Modifiers │
╞═══════════════════╪═══════════╪══════════════════════════════════════════════════════════════╡
│ id │ bigint │ not null default nextval('projects_sync_events_id_seq'::regclass) │
├───────────────────┼───────────┼──────────────────────────────────────────────────────────────┤
│ project_id │ bigint │ not null │
├───────────────────┼───────────┼──────────────────────────────────────────────────────────────┤
│ traversal_ids │ integer[] │ not null default '{}'::integer[] │
├───────────────────┼───────────┼──────────────────────────────────────────────────────────────┤
│ new_traversal_ids │ integer[] │ not null default '{}'::integer[] │
╘═══════════════════╧═══════════╧══════════════════════════════════════════════════════════════╛
Indexes:
"projects_sync_events_pkey" PRIMARY KEY, btree (id)
"index_projects_sync_events_on_project_id" btree (project_id)
Foreign-key constraints:
"fk_rails_a57b0af680" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
gitlabhq_development> \d ci_namespace_hierarchies
╒═══════════════╤═══════════╤══════════════════════════════════════════════════════════════════════════════════╕
│ Column │ Type │ Modifiers │
╞═══════════════╪═══════════╪══════════════════════════════════════════════════════════════════════════════════╡
│ namespace_id │ bigint │ not null default nextval('ci_namespace_hierarchies_namespace_id_seq'::regclass) │
├───────────────┼───────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ traversal_ids │ integer[] │ not null default '{}'::integer[] │
╘═══════════════╧═══════════╧══════════════════════════════════════════════════════════════════════════════════╛
Indexes:
"ci_namespace_hierarchies_pkey" PRIMARY KEY, btree (namespace_id)
"index_ci_namespace_hierarchies_on_namespace_id" UNIQUE, btree (namespace_id)
"index_gin_ci_namespace_hierarchies" gin (traversal_ids)
Foreign-key constraints:
"fk_rails_de9e668a9e" FOREIGN KEY (namespace_id) REFERENCES namespaces(id) ON DELETE CASCADE
gitlabhq_development> \d ci_project_hierarchies
┌───────────────┬───────────┬───────────┬──────────┬────────────────────────────────────────────────────────────┐
│ Column │ Type │ Collation │ Nullable │ Default │
├───────────────┼───────────┼───────────┼──────────┼────────────────────────────────────────────────────────────┤
│ project_id │ bigint │ │ not null │ nextval('ci_project_hierarchies_project_id_seq'::regclass) │
│ traversal_ids │ integer[] │ │ not null │ '{}'::integer[] │
└───────────────┴───────────┴───────────┴──────────┴────────────────────────────────────────────────────────────┘
Indexes:
"ci_project_hierarchies_pkey" PRIMARY KEY, btree (project_id)
"index_gin_ci_project_hierarchies" gin (traversal_ids)
Foreign-key constraints:
"fk_rails_7d69e0670e" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
gitlabhq_dblab=# SELECT pg_size_pretty( pg_total_relation_size('ci_namespace_hierarchies') ) AS ci_namespace_hierarchies, pg_size_pretty( pg_total_relation_size('ci_project_hierarchies') ) AS ci_project_hierarchies;
ci_namespace_hierarchies | ci_project_hierarchies
--------------------------+------------------------
5808 kB | 44 MB
(1 row)
Query plans
NOTE: The queries below were executed against production data using a namespace which is one among the top 5 containing the most sub-groups (~6K), projects (~18K), and runners across groups and projects (~40K).
Backfill queries
Backfill all of ci_namespace_hierarchies in one go
INSERT INTO ci_namespace_hierarchies (namespace_id, traversal_ids)
WITH RECURSIVE cte (source_id, namespace_id, parent_id, height)
AS ((
SELECT
batch.id,
batch.id,
batch.parent_id,
1
FROM (
SELECT
"namespaces".*
FROM
"namespaces"
WHERE
"namespaces"."id" NOT IN (
SELECT
"ci_namespace_hierarchies"."namespace_id"
FROM
"ci_namespace_hierarchies")) AS batch)
UNION ALL (
SELECT
cte.source_id,
n.id,
n.parent_id,
cte.height + 1
FROM
namespaces n,
cte
WHERE
n.id = cte.parent_id))
SELECT
flat_hierarchy.source_id AS namespace_id,
array_agg(flat_hierarchy.namespace_id ORDER BY flat_hierarchy.height DESC) AS traversal_ids
FROM (
SELECT
*
FROM
cte
FOR UPDATE) flat_hierarchy
GROUP BY
flat_hierarchy.source_id;
INSERT 0 12313657
Time: 1168446.334 ms (19:28.446)
Execution plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6915/commands/24495
ModifyTable on public.ci_namespace_hierarchies (cost=184560695.53..189234310.40 rows=200 width=40) (actual time=616680.502..616680.540 rows=0 loops=1)
Buffers: shared hit=173493357 read=63059 dirtied=256847 written=257644
I/O Timings: read=2634.703 write=108.183
-> Subquery Scan on *SELECT* (cost=184560695.53..189234310.40 rows=200 width=40) (actual time=344763.403..380405.750 rows=12340446 loops=1)
Buffers: shared hit=72109108 read=63026
I/O Timings: read=2633.981 write=0.000
-> Aggregate (cost=184560695.53..189234307.90 rows=200 width=36) (actual time=344763.401..377117.503 rows=12340446 loops=1)
Group Key: cte.source_id
Buffers: shared hit=72109108 read=63026
I/O Timings: read=2633.981 write=0.000
CTE cte
-> Recursive Union (cost=38.21..60533069.52 rows=623147982 width=16) (actual time=2.670..322521.210 rows=13837913 loops=1)
Buffers: shared hit=72109100 read=63026
I/O Timings: read=2633.981 write=0.000
-> Index Only Scan using index_namespaces_on_parent_id_and_id on public.namespaces (cost=38.21..307097.46 rows=6169782 width=16) (actual time=2.669..8562.406 rows=12340446 loops=1)
Heap Fetches: 710675
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 0
Buffers: shared hit=3217529 read=63026
I/O Timings: read=2633.981 write=0.000
SubPlan 1
-> Index Only Scan using index_ci_namespace_hierarchies_on_namespace_id on public.ci_namespace_hierarchies ci_namespace_hierarchies_1 (cost=0.15..34.65 rows=1200 width=8) (actual time=0.014..0.017 rows=0 loops=1)
Heap Fetches: 0
Buffers: shared hit=1
I/O Timings: read=0.000 write=0.000
-> Hash Join (cost=623263.46..4776301.24 rows=61697820 width=16) (actual time=11693.759..14479.891 rows=71308 loops=21)
Hash Cond: (cte_1.parent_id = n.id)
Buffers: shared hit=68891571
I/O Timings: read=0.000 write=0.000
-> WorkTable Scan on cte cte_1 (cost=0.00..1233956.40 rows=61697820 width=12) (actual time=0.016..177.224 rows=658948 loops=21)
I/O Timings: read=0.000 write=0.000
-> Hash (cost=276210.90..276210.90 rows=12339565 width=8) (actual time=11678.413..11678.414 rows=12340446 loops=21)
Buckets: 4194304 Batches: 8 Memory Usage: 87466kB
Buffers: shared hit=68891571
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using index_namespaces_on_parent_id_and_id on public.namespaces n (cost=0.56..276210.90 rows=12339565 width=8) (actual time=0.040..4086.611 rows=12340446 loops=21)
Heap Fetches: 14924175
Buffers: shared hit=68891571
I/O Timings: read=0.000 write=0.000
-> Sort (cost=124027626.01..125585495.97 rows=623147982 width=12) (actual time=344763.242..349113.929 rows=13837913 loops=1)
Sort Key: cte.source_id
Sort Method: external merge Disk: 297912kB
Buffers: shared hit=72109103 read=63026
I/O Timings: read=2633.981 write=0.000
-> CTE Scan on cte (cost=0.00..12462959.64 rows=623147982 width=12) (actual time=2.673..336329.001 rows=13837913 loops=1)
Buffers: shared hit=72109100 read=63026
I/O Timings: read=2633.981 write=0.000
Backfill all of ci_project_hierarchies in one go
INSERT INTO ci_project_hierarchies (project_id, traversal_ids) ( WITH RECURSIVE cte (
source_id, namespace_id, parent_id, height
) AS ((
SELECT
batch.id,
0,
batch.parent_id,
1
FROM (
SELECT
"projects"."id",
namespaces.id AS parent_id
FROM
"projects"
INNER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id"
WHERE
"projects"."id" NOT IN (
SELECT
"ci_project_hierarchies"."project_id"
FROM
"ci_project_hierarchies")) AS batch)
UNION ALL (
SELECT
cte.source_id,
n.id,
n.parent_id,
cte.height + 1
FROM
namespaces n,
cte
WHERE
n.id = cte.parent_id))
SELECT
h.project_id,
h.traversal_ids[1:array_length(h.traversal_ids, 1) - 1]
FROM (
SELECT
flat_hierarchy.source_id AS project_id,
array_agg(flat_hierarchy.namespace_id ORDER BY flat_hierarchy.height DESC) AS traversal_ids
FROM (
SELECT
*
FROM
cte
FOR UPDATE) flat_hierarchy
GROUP BY
flat_hierarchy.source_id) AS h)
ON CONFLICT (project_id)
DO NOTHING;
INSERT 0 21266067
Time: 4497791.040 ms (01:14:57.791)
Execution plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6915/commands/24491
ModifyTable on public.ci_project_hierarchies (cost=352964722.34..361033635.49 rows=200 width=40) (actual time=1126555.089..1126555.157 rows=0 loops=1)
Buffers: shared hit=244902079 read=330668 dirtied=386080 written=335056
I/O Timings: read=19256.362 write=20.708
-> Subquery Scan on *SELECT* (cost=352964722.34..361033635.49 rows=200 width=40) (actual time=608440.105..710497.242 rows=21306142 loops=1)
Buffers: shared hit=95893597 read=330617 dirtied=46171 written=18
I/O Timings: read=19251.700 write=18.635
-> Subquery Scan on h (cost=352964722.34..361033632.99 rows=200 width=36) (actual time=608440.100..705093.510 rows=21306142 loops=1)
Buffers: shared hit=95893597 read=330617 dirtied=46171 written=18
I/O Timings: read=19251.700 write=18.635
CTE cte
-> Recursive Union (cost=45.41..99136168.69 rows=1075854020 width=16) (actual time=0.120..494405.234 rows=48292056 loops=1)
Buffers: shared hit=95893592 read=330617 dirtied=46171 written=18
I/O Timings: read=19251.700 write=18.635
-> Merge Join (cost=45.41..1085652.45 rows=10652020 width=16) (actual time=0.119..44527.935 rows=21306142 loops=1)
Merge Cond: (projects.namespace_id = namespaces.id)
Buffers: shared hit=23759112 read=282865 dirtied=45047 written=18
I/O Timings: read=17639.322 write=18.635
-> Index Only Scan using index_projects_on_namespace_id_and_id on public.projects (cost=38.09..579707.80 rows=10652020 width=8) (actual time=0.098..26624.283 rows=21306142 loops=1)
Heap Fetches: 1589215
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 0
Buffers: shared hit=20015507 read=176744 dirtied=35115 written=7
I/O Timings: read=12113.697 write=16.783
SubPlan 1
-> Index Only Scan using index_ci_project_hierarchies_on_project_id on public.ci_project_hierarchies ci_project_hierarchies_1 (cost=0.15..34.65 rows=1200 width=8) (actual time=0.010..0.011 rows=0 loops=1)
Heap Fetches: 0
Buffers: shared hit=1
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using namespaces_pkey on public.namespaces (cost=0.43..341970.12 rows=12339565 width=4) (actual time=0.018..9603.530 rows=12340435 loops=1)
Heap Fetches: 723547
Buffers: shared hit=3743605 read=106121 dirtied=9932 written=11
I/O Timings: read=5525.625 write=1.852
-> Hash Join (cost=623263.46..7653343.58 rows=106520200 width=16) (actual time=13719.658..18426.545 rows=1226632 loops=22)
Hash Cond: (cte_1.parent_id = n.id)
Buffers: shared hit=72134480 read=47752 dirtied=1124
I/O Timings: read=1612.378 write=0.000
-> WorkTable Scan on cte cte_1 (cost=0.00..2130404.00 rows=106520200 width=12) (actual time=0.060..729.858 rows=2195093 loops=22)
I/O Timings: read=0.000 write=0.000
-> Hash (cost=276210.90..276210.90 rows=12339565 width=8) (actual time=13650.188..13650.188 rows=12340446 loops=22)
Buckets: 4194304 Batches: 8 Memory Usage: 87466kB
Buffers: shared hit=72134480 read=47752 dirtied=1124
I/O Timings: read=1612.378 write=0.000
-> Index Only Scan using index_namespaces_on_parent_id_and_id on public.namespaces n (cost=0.56..276210.90 rows=12339565 width=8) (actual time=0.117..4074.879 rows=12340446 loops=22)
Heap Fetches: 15648501
Buffers: shared hit=72134480 read=47752 dirtied=1124
I/O Timings: read=1612.378 write=0.000
-> Aggregate (cost=253828553.65..261897461.30 rows=200 width=36) (actual time=608440.095..690666.308 rows=21306142 loops=1)
Group Key: cte.source_id
Buffers: shared hit=95893597 read=330617 dirtied=46171 written=18
I/O Timings: read=19251.700 write=18.635
-> Sort (cost=253828553.65..256518188.70 rows=1075854020 width=12) (actual time=608439.792..628528.696 rows=48292056 loops=1)
Sort Key: cte.source_id
Sort Method: external merge Disk: 1039672kB
Buffers: shared hit=95893592 read=330617 dirtied=46171 written=18
I/O Timings: read=19251.700 write=18.635
-> CTE Scan on cte (cost=0.00..21517080.40 rows=1075854020 width=12) (actual time=0.122..550722.760 rows=48292056 loops=1)
Buffers: shared hit=95893592 read=330617 dirtied=46171 written=18
I/O Timings: read=19251.700 write=18.635
Production queries
Existing query
SELECT
"ci_runners".*
FROM ((
SELECT
"ci_runners".*
FROM
"ci_runners"
INNER JOIN "ci_runner_namespaces" ON "ci_runner_namespaces"."runner_id" = "ci_runners"."id"
WHERE
"ci_runner_namespaces"."namespace_id" IN (
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."id" IN (
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE (traversal_ids @> ('{5892345}')))))
UNION (
SELECT
"ci_runners".*
FROM
"ci_runners"
INNER JOIN "ci_runner_projects" ON "ci_runner_projects"."runner_id" = "ci_runners"."id"
WHERE
"ci_runner_projects"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE (traversal_ids @> ('{5892345}')))))) ci_runners
ORDER BY
"ci_runners"."created_at" DESC
LIMIT 100;
pg.ai link: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6915/commands/24499
Limit (cost=60172.31..60172.56 rows=100 width=3765) (actual time=651.283..652.031 rows=100 loops=1)
Buffers: shared hit=354611
I/O Timings: read=0.000 write=0.000
-> Sort (cost=60172.31..60175.06 rows=1098 width=3765) (actual time=651.280..652.014 rows=100 loops=1)
Sort Key: ci_runners.created_at DESC
Sort Method: top-N heapsort Memory: 85kB
Buffers: shared hit=354611
I/O Timings: read=0.000 write=0.000
-> HashAggregate (cost=60108.39..60119.37 rows=1098 width=3765) (actual time=593.579..632.189 rows=41656 loops=1)
Group Key: ci_runners.id, ci_runners.token, ci_runners.created_at, ci_runners.updated_at, ci_runners.description, ci_runners.contacted_at, ci_runners.active, ci_runners.name, ci_runners.version, ci_runners.revision, ci_runners.platform, ci_runners.architecture, ci_runners.run_untagged, ci_runners.locked, ci_runners.access_level, ci_runners.ip_address, ci_runners.maximum_timeout, ci_runners.runner_type, ci_runners.token_encrypted, ci_runners.public_projects_minutes_cost_factor, ci_runners.private_projects_minutes_cost_factor, ci_runners.config
Buffers: shared hit=354608
I/O Timings: read=0.000 write=0.000
-> Append (cost=19657.27..60048.00 rows=1098 width=3765) (actual time=35.449..507.189 rows=41841 loops=1)
Buffers: shared hit=354608
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=19657.27..23856.89 rows=165 width=232) (actual time=35.448..369.226 rows=40793 loops=1)
Buffers: shared hit=251247
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=19656.84..23759.20 rows=165 width=4) (actual time=35.414..129.871 rows=40793 loops=1)
Buffers: shared hit=87996
I/O Timings: read=0.000 write=0.000
-> HashAggregate (cost=19656.41..19671.60 rows=1519 width=8) (actual time=34.984..37.560 rows=6354 loops=1)
Group Key: namespaces.id
Buffers: shared hit=29160
I/O Timings: read=0.000 write=0.000
-> Gather (cost=1131.48..19652.62 rows=1519 width=8) (actual time=4.278..32.875 rows=6354 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=29160
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=131.48..18500.72 rows=633 width=8) (actual time=1.271..27.154 rows=2118 loops=3)
Buffers: shared hit=29160
I/O Timings: read=0.000 write=0.000
-> Parallel Bitmap Heap Scan on public.namespaces namespaces_1 (cost=130.92..12706.87 rows=2845 width=4) (actual time=1.178..11.901 rows=2118 loops=3)
Buffers: shared hit=3520
I/O Timings: read=0.000 write=0.000
-> Bitmap Index Scan using index_namespaces_on_traversal_ids (cost=0.00..129.21 rows=6828 width=0) (actual time=2.854..2.855 rows=6354 loops=1)
Index Cond: (namespaces_1.traversal_ids @> '{5892345}'::integer[])
Buffers: shared hit=51
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using index_namespaces_on_type_and_id on public.namespaces (cost=0.56..2.04 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=6354)
Index Cond: ((namespaces.type = 'Group'::text) AND (namespaces.id = namespaces_1.id))
Heap Fetches: 191
Buffers: shared hit=25640
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_ci_runner_namespaces_on_namespace_id on public.ci_runner_namespaces (cost=0.43..1.87 rows=82 width=8) (actual time=0.003..0.013 rows=6 loops=6354)
Index Cond: (ci_runner_namespaces.namespace_id = namespaces.id)
Buffers: shared hit=58836
I/O Timings: read=0.000 write=0.000
-> Index Scan using ci_runners_pkey on public.ci_runners (cost=0.43..0.59 rows=1 width=232) (actual time=0.005..0.005 rows=1 loops=40793)
Index Cond: (ci_runners.id = ci_runner_namespaces.runner_id)
Buffers: shared hit=163251
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=23993.59..36174.64 rows=933 width=232) (actual time=51.564..130.350 rows=1048 loops=1)
Buffers: shared hit=103361
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=23993.16..35650.00 rows=933 width=4) (actual time=51.538..120.952 rows=1048 loops=1)
Buffers: shared hit=99168
I/O Timings: read=0.000 write=0.000
-> HashAggregate (cost=23992.73..24110.61 rows=11788 width=4) (actual time=50.982..57.572 rows=19019 loops=1)
Group Key: projects.id
Buffers: shared hit=41073
I/O Timings: read=0.000 write=0.000
-> Gather (cost=1131.35..23963.26 rows=11788 width=4) (actual time=4.366..41.972 rows=19019 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=41073
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=131.35..21784.46 rows=4912 width=4) (actual time=2.567..34.974 rows=6340 loops=3)
Buffers: shared hit=41073
I/O Timings: read=0.000 write=0.000
-> Parallel Bitmap Heap Scan on public.namespaces namespaces_2 (cost=130.92..12706.87 rows=2845 width=4) (actual time=1.343..12.884 rows=2118 loops=3)
Buffers: shared hit=3520
I/O Timings: read=0.000 write=0.000
-> Bitmap Index Scan using index_namespaces_on_traversal_ids (cost=0.00..129.21 rows=6828 width=0) (actual time=3.130..3.131 rows=6354 loops=1)
Index Cond: (namespaces_2.traversal_ids @> '{5892345}'::integer[])
Buffers: shared hit=51
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using index_projects_on_namespace_id_and_id on public.projects (cost=0.44..2.99 rows=20 width=8) (actual time=0.007..0.010 rows=3 loops=6354)
Index Cond: (projects.namespace_id = namespaces_2.id)
Heap Fetches: 1925
Buffers: shared hit=37553
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_ci_runner_projects_on_project_id on public.ci_runner_projects (cost=0.43..0.78 rows=20 width=8) (actual time=0.003..0.003 rows=0 loops=19019)
Index Cond: (ci_runner_projects.project_id = projects.id)
Buffers: shared hit=58095
I/O Timings: read=0.000 write=0.000
-> Index Scan using ci_runners_pkey on public.ci_runners ci_runners_1 (cost=0.43..0.56 rows=1 width=232) (actual time=0.008..0.008 rows=1 loops=1048)
Index Cond: (ci_runners_1.id = ci_runner_projects.runner_id)
Buffers: shared hit=4193
I/O Timings: read=0.000 write=0.000
New queries
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."id" = 5892345
pg.ai link: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6915/commands/24496
SELECT
"ci_runners".*
FROM ((
SELECT
"ci_runners".*
FROM
"ci_runners"
INNER JOIN "ci_runner_namespaces" ON "ci_runner_namespaces"."runner_id" = "ci_runners"."id"
WHERE
"ci_runner_namespaces"."namespace_id" IN (
SELECT
traversal_ids[array_length(traversal_ids, 1)]
FROM
"ci_namespace_hierarchies"
WHERE (traversal_ids && ARRAY[5892345]::int[])))
UNION (
SELECT
"ci_runners".*
FROM
"ci_runners"
INNER JOIN "ci_runner_projects" ON "ci_runner_projects"."runner_id" = "ci_runners"."id"
WHERE
"ci_runner_projects"."project_id" IN (
SELECT
"ci_project_hierarchies"."project_id"
FROM
"ci_project_hierarchies"
WHERE (traversal_ids && ARRAY[5892345]::int[])))) ci_runners
ORDER BY
"ci_runners"."created_at" DESC
LIMIT 100;
pg.ai link: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6915/commands/24497
Index Scan using namespaces_pkey on public.namespaces (cost=0.43..3.46 rows=1 width=4) (actual time=20.709..20.712 rows=1 loops=1)
Index Cond: (namespaces.id = 5892345)
Filter: ((namespaces.type)::text = 'Group'::text)
Rows Removed by Filter: 0
Buffers: shared hit=2 read=2
I/O Timings: read=20.593 write=0.000
Limit (cost=2044649.34..2044649.59 rows=100 width=3765) (actual time=18687.541..18688.618 rows=100 loops=1)
Buffers: shared hit=207025 read=42363 dirtied=2244 written=7405
I/O Timings: read=50583.836 write=1611.373
-> Sort (cost=2044649.34..2045957.70 rows=523344 width=3765) (actual time=18687.507..18688.560 rows=100 loops=1)
Sort Key: ci_runners.created_at DESC
Sort Method: top-N heapsort Memory: 92kB
Buffers: shared hit=207025 read=42363 dirtied=2244 written=7405
I/O Timings: read=50583.836 write=1611.373
-> Unique (cost=1989321.79..2019414.07 rows=523344 width=3765) (actual time=18610.906..18675.485 rows=41656 loops=1)
Buffers: shared hit=207022 read=42363 dirtied=2244 written=7405
I/O Timings: read=50583.836 write=1611.373
-> Sort (cost=1989321.79..1990630.15 rows=523344 width=3765) (actual time=18610.900..18632.824 rows=41841 loops=1)
Sort Key: ci_runners.id, ci_runners.token, ci_runners.created_at, ci_runners.updated_at, ci_runners.description, ci_runners.contacted_at, ci_runners.active, ci_runners.name, ci_runners.version, ci_runners.revision, ci_runners.platform, ci_runners.architecture, ci_runners.run_untagged, ci_runners.locked, ci_runners.access_level, ci_runners.ip_address, ci_runners.maximum_timeout, ci_runners.runner_type, ci_runners.token_encrypted, ci_runners.public_projects_minutes_cost_factor, ci_runners.private_projects_minutes_cost_factor, ci_runners.config
Sort Method: quicksort Memory: 13110kB
Buffers: shared hit=207022 read=42363 dirtied=2244 written=7405
I/O Timings: read=50583.836 write=1611.373
-> Gather (cost=1166.37..304409.91 rows=523344 width=3765) (actual time=188.337..18530.936 rows=41841 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=207000 read=42363 dirtied=2244 written=7405
I/O Timings: read=50583.836 write=1611.373
-> Parallel Append (cost=166.37..251075.51 rows=523344 width=3765) (actual time=687.425..18530.955 rows=13947 loops=3)
Buffers: shared hit=207000 read=42363 dirtied=2244 written=7405
I/O Timings: read=50583.836 write=1611.373
-> Nested Loop (cost=13360.35..181621.15 rows=217432 width=232) (actual time=619.110..12076.024 rows=13598 loops=3)
Buffers: shared hit=148195 read=28669 dirtied=2098 written=7405
I/O Timings: read=32474.895 write=1611.373
-> Hash Join (cost=13359.92..52891.77 rows=217432 width=4) (actual time=616.486..2605.074 rows=13598 loops=3)
Hash Cond: (ci_runner_namespaces.namespace_id = ci_namespace_hierarchies.traversal_ids[array_length(ci_namespace_hierarchies.traversal_ids, 1)])
Buffers: shared hit=4596 read=9016 dirtied=1774 written=3121
I/O Timings: read=5737.978 write=861.361
-> Parallel Seq Scan on public.ci_runner_namespaces (cost=0.00..35333.42 rows=558142 width=8) (actual time=0.873..1789.167 rows=439391 loops=3)
Buffers: shared hit=1 read=7437 dirtied=1774 written=3121
I/O Timings: read=4001.801 write=861.361
-> Hash (cost=13280.22..13280.22 rows=6376 width=25) (actual time=614.986..615.017 rows=6354 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 482kB
Buffers: shared hit=4595 read=1579
I/O Timings: read=1736.177 write=0.000
-> HashAggregate (cost=13216.46..13280.22 rows=6376 width=25) (actual time=610.663..612.158 rows=6354 loops=3)
Group Key: ci_namespace_hierarchies.traversal_ids[array_length(ci_namespace_hierarchies.traversal_ids, 1)]
Buffers: shared hit=4595 read=1579
I/O Timings: read=1736.177 write=0.000
-> Bitmap Heap Scan on public.ci_namespace_hierarchies (cost=61.41..13200.52 rows=6376 width=25) (actual time=4.965..605.326 rows=6354 loops=3)
Buffers: shared hit=4595 read=1579
I/O Timings: read=1736.177 write=0.000
-> Bitmap Index Scan using index_gin_ci_namespace_hierarchies (cost=0.00..59.82 rows=6376 width=0) (actual time=4.023..4.029 rows=6354 loops=3)
Index Cond: (ci_namespace_hierarchies.traversal_ids && '{5892345}'::integer[])
Buffers: shared hit=19 read=5
I/O Timings: read=8.580 write=0.000
-> Index Scan using ci_runners_pkey on public.ci_runners (cost=0.43..0.59 rows=1 width=232) (actual time=0.694..0.694 rows=1 loops=40793)
Index Cond: (ci_runners.id = ci_runner_namespaces.runner_id)
Buffers: shared hit=143599 read=19653 dirtied=324 written=4284
I/O Timings: read=26736.917 write=750.012
-> Nested Loop (cost=166.37..61604.21 rows=628 width=232) (actual time=126.921..9674.789 rows=524 loops=2)
Buffers: shared hit=58805 read=13694 dirtied=146
I/O Timings: read=18108.941 write=0.000
-> Nested Loop (cost=165.94..61251.08 rows=628 width=4) (actual time=118.973..8732.410 rows=524 loops=2)
Buffers: shared hit=55940 read=12365 dirtied=24
I/O Timings: read=16271.141 write=0.000
-> Parallel Bitmap Heap Scan on public.ci_project_hierarchies (cost=165.51..41714.37 rows=7930 width=8) (actual time=21.319..6767.750 rows=9510 loops=2)
Buffers: shared hit=1 read=10202
I/O Timings: read=13335.341 write=0.000
-> Bitmap Index Scan using index_gin_ci_project_hierarchies (cost=0.00..160.75 rows=19033 width=0) (actual time=22.535..22.545 rows=19019 loops=1)
Index Cond: (ci_project_hierarchies.traversal_ids && '{5892345}'::integer[])
Buffers: shared hit=1 read=9
I/O Timings: read=16.709 write=0.000
-> Index Scan using index_ci_runner_projects_on_project_id on public.ci_runner_projects (cost=0.43..2.26 rows=20 width=8) (actual time=0.169..0.204 rows=0 loops=19019)
Index Cond: (ci_runner_projects.project_id = ci_project_hierarchies.project_id)
Buffers: shared hit=55939 read=2163 dirtied=24
I/O Timings: read=2935.800 write=0.000
-> Index Scan using ci_runners_pkey on public.ci_runners ci_runners_1 (cost=0.43..0.56 rows=1 width=232) (actual time=1.793..1.793 rows=1 loops=1048)
Index Cond: (ci_runners_1.id = ci_runner_projects.runner_id)
Buffers: shared hit=2865 read=1329 dirtied=122
I/O Timings: read=1837.800 write=0.000
Screenshots or Screencasts (strongly suggested)
How to setup and validate locally (strongly suggested)
-
Add patch to enable eager loading of sidekiq:
diff --git a/config/environments/development.rb b/config/environments/development.rb index 92de88394c6..354494b965e 100644 --- a/config/environments/development.rb +++ b/config/environments/development.rb @@ -61,7 +61,7 @@ BetterErrors::Middleware.allow_ip!("127.0.0.1/0") # Reassign some performance related settings when we profile the app - if Gitlab::Utils.to_boolean(ENV['RAILS_PROFILE'].to_s) + if Gitlab::Utils.to_boolean(ENV['RAILS_PROFILE'].to_s) || Gitlab::Runtime.sidekiq? warn "Hot-reloading is disabled as you are running with RAILS_PROFILE enabled" config.cache_classes = true config.eager_load = true
-
Run the migrations:
$ bin/rails db:migrate
-
Run the finder that is currently leveraging the query:
group = Group.find(22) admin = User.find(1) Ci::RunnersFinder.new(current_user: admin, params: { group: group, membership: :descendants }).execute
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team