Add two partial nonunique indexes
What does this MR do?
Add two partial non-unique indexes on labels
for [project_id, title]
and [group_id, title]
.
This is the first part of the strategy for #30390 (comment 254445189). By adding the indexes, we can then run the query on production to identify how many duplicate rows have been created due to this race condition.
Using #database-lab to run (only the [title project_id]
index):
explain SELECT L1.project_id, L1.title
FROM labels L1
WHERE (SELECT COUNT(*)
FROM labels L2
WHERE L1.project_id=L2.project_id AND L1.title=L2.title)>1;
Query plan without index
Seq Scan on labels l1 (cost=0.00..76555885.70 rows=3308220 width=13)
Filter: ((SubPlan 1) > 1)
SubPlan 1
-> Aggregate (cost=7.64..7.65 rows=1 width=8)
-> Index Scan using index_labels_on_project_id on labels l2 (cost=0.43..7.64 rows=1 width=0)
Index Cond: (l1.project_id = project_id)
Filter: ((l1.title)::text = (title)::text)
Execution summary without index (3.965 min)
Time: 3.965 min
- planning: 1.421 ms
- execution: 3.965 min
- I/O read: 2.451 min
- I/O write: 2.447 s
Shared buffers:
- hits: 83687304 (~638.50 GiB) from the buffer pool
- reads: 156844 (~1.20 GiB) from the OS file cache, including disk I/O
- dirtied: 4148 (~32.40 MiB)
- writes: 26587 (~207.70 MiB)
Having created the index:
CREATE INDEX CONCURRENTLY index_labels_on_project_id_and_title
ON labels(project_id, title)
WHERE labels.group_id = null;
Query plan with index
Seq Scan on public.labels l1 (cost=0.00..76597123.74 rows=3310013 width=13) (actual time=208.064..79341.877 rows=906 loops=1)
Filter: ((SubPlan 1) > 1)
Rows Removed by Filter: 9929134
Buffers: shared hit=83843340 read=93
I/O Timings: read=7.654
SubPlan 1
-> Aggregate (cost=7.64..7.65 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=9930040)
Buffers: shared hit=83719214
-> Index Scan using index_labels_on_project_id on public.labels l2 (cost=0.43..7.64 rows=1 width=0) (actual time=0.005..0.007 rows=1 loops=9930040)
Index Cond: (l1.project_id = l2.project_id)
Filter: ((l1.title)::text = (l2.title)::text)
Rows Removed by Filter: 11
Buffers: shared hit=83719214
Execution summary with index (1.322 min)
Time: 1.322 min
- planning: 1.044 ms
- execution: 1.322 min
- I/O read: 7.654 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 83843340 (~639.70 GiB) from the buffer pool
- reads: 93 (~744.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Still has a table scan and is using 639GB
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers
Related to #30390 (closed)
Edited by Coung Ngo