Migrate merge requests mentions to DB table
What does this MR do?
This MR contains migrations for merge request mentions found within merge request title or description
I've pulled some estimate counts of the records we might be looking at from slack database-lab
channel.
Database benchmarks
-
Cleanup records with no actual mentions:
explain delete from merge_request_user_mentions where mentioned_users_ids IS NULL AND mentioned_groups_ids IS NULL AND mentioned_projects_ids IS NULL
Click to see the plan. Time: 1.061s. https://explain.depesz.com/s/D1jw
ModifyTable on public.merge_request_user_mentions (cost=0.00..17105.61 rows=104795 width=6) (actual time=572.434..572.434 rows=0 loops=1) Buffers: shared hit=104326 read=3422 dirtied=2668 I/O Timings: read=292.197 -> Seq Scan on public.merge_request_user_mentions (cost=0.00..17105.61 rows=104795 width=6) (actual time=3.280..454.209 rows=102525 loops=1) Filter: ((merge_request_user_mentions.mentioned_users_ids IS NULL) AND (merge_request_user_mentions.mentioned_groups_ids IS NULL) AND (merge_request_user_mentions.mentioned_projects_ids IS NULL)) Rows Removed by Filter: 240203 Buffers: shared read=3421 dirtied=730 I/O Timings: read=291.067 Time: 1.061 s planning: 0.097 ms execution: 1.061 s I/O read: 428.308 ms I/O write: 0.000 ms Shared buffers: hits: 104278 (~814.70 MiB) from the buffer pool reads: 4893 (~38.20 MiB) from the OS file cache, including disk I/O dirtied: 2724 (~21.30 MiB) writes: 0
-
Create index
exec CREATE INDEX merge_request_mentions_temp_index ON merge_requests (id) WHERE title ~~ '%@%'::text OR description ~~ '%@%'::text The query has been executed. Duration: 4.650 min
Number of rows affected
-
Merge requests with mentions in description or title ~991373. Important this query is not run during migration
explain SELECT "merge_requests"."id" FROM "merge_requests" LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null)
Click to see the plan. Time: 11.598 s. https://explain.depesz.com/s/jDkY
Merge Join (cost=0.85..50211.96 rows=860453 width=4) (actual time=1.240..11499.663 rows=991373 loops=1) Buffers: shared hit=505240 read=27974 dirtied=14026 I/O Timings: read=10062.394 -> Index Only Scan using merge_request_mentions_temp_index on public.merge_requests (cost=0.42..35931.87 rows=864937 width=4) (actual time=0.237..8211.461 rows=1022929 loops=1) Heap Fetches: 28901 Buffers: shared hit=445976 read=25406 dirtied=13212 I/O Timings: read=7510.676 -> Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions (cost=0.42..11193.45 rows=342161 width=4) (actual time=1.000..2927.670 rows=342728 loops=1) Heap Fetches: 64342 Buffers: shared hit=59264 read=2568 dirtied=814 I/O Timings: read=2551.718 Time: 11.598 s - planning: 10.174 ms - execution: 11.588 s - I/O read: 10.062 s - I/O write: 0.000 ms Shared buffers: - hits: 505240 (~3.90 GiB) from the buffer pool - reads: 27974 (~218.50 MiB) from the OS file cache, including disk I/O - dirtied: 14026 (~109.60 MiB) - writes: 0
Migrate merge request mentions
Query plans for sql queries to migrate mentions for merge requests based on database-lab
data.
-
Query 1: Get the start ID for the merge requests to be migrated.
explain SELECT "merge_requests"."id" FROM "merge_requests" LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null) ORDER BY "merge_requests"."id" ASC LIMIT 1
Click to see the plan. Time: 5.322 ms. https://explain.depesz.com/s/9u67
Limit (cost=0.85..0.91 rows=1 width=4) (actual time=0.029..0.030 rows=1 loops=1) Buffers: shared hit=8 -> Merge Join (cost=0.85..50211.96 rows=860453 width=4) (actual time=0.028..0.028 rows=1 loops=1) Buffers: shared hit=8 -> Index Only Scan using merge_request_mentions_temp_index on public.merge_requests (cost=0.42..35931.87 rows=864937 width=4) (actual time=0.017..0.017 rows=1 loops=1) Heap Fetches: 0 Buffers: shared hit=4 -> Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions (cost=0.42..11193.45 rows=342161 width=4) (actual time=0.009..0.009 rows=1 loops=1) Heap Fetches: 0 Buffers: shared hit=4 Time: 5.322 ms - planning: 5.267 ms - execution: 0.055 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 8 (~64.00 KiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
-
Query 2: Get max id for the 10K batch rows for merge requests to be migrated.
explain SELECT "merge_requests"."id" FROM "merge_requests" LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null) ORDER BY "merge_requests"."id" ASC LIMIT 1 OFFSET 10000
Click to see the plan. Time: 12.081 ms. https://explain.depesz.com/s/CQto
Limit (cost=584.39..584.45 rows=1 width=4) (actual time=6.726..6.726 rows=1 loops=1) Buffers: shared hit=3115 -> Merge Join (cost=0.85..50211.96 rows=860453 width=4) (actual time=0.041..6.064 rows=10001 loops=1) Buffers: shared hit=3115 -> Index Only Scan using merge_request_mentions_temp_index on public.merge_requests (cost=0.42..35931.87 rows=864937 width=4) (actual time=0.025..4.099 rows=10002 loops=1) Heap Fetches: 601 Buffers: shared hit=3111 -> Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions (cost=0.42..11193.45 rows=342161 width=4) (actual time=0.013..0.014 rows=2 loops=1) Heap Fetches: 0 Buffers: shared hit=4 Time: 12.081 ms - planning: 5.317 ms - execution: 6.764 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 3115 (~24.30 MiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
explain SELECT "merge_requests"."id" FROM "merge_requests" LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null) AND merge_requests.id > 40100000 ORDER BY "merge_requests"."id" ASC LIMIT 1 OFFSET 10000
Click to see the plan. Time: 12.321 ms. https://explain.depesz.com/s/MoIa
Limit (cost=1106.33..1106.44 rows=1 width=4) (actual time=7.312..7.312 rows=1 loops=1) Buffers: shared hit=4689 -> Merge Join (cost=0.85..21622.69 rows=195587 width=4) (actual time=2.263..6.764 rows=10001 loops=1) Buffers: shared hit=4689 -> Index Only Scan using merge_request_mentions_temp_index on public.merge_requests (cost=0.42..9069.28 rows=196606 width=4) (actual time=0.016..2.762 rows=10022 loops=1) Index Cond: (merge_requests.id > 40100000) Heap Fetches: 173 Buffers: shared hit=4283 -> Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions (cost=0.42..11193.45 rows=342161 width=4) (actual time=0.010..2.183 rows=2534 loops=1) Heap Fetches: 395 Buffers: shared hit=406 Time: 12.321 ms - planning: 4.979 ms - execution: 7.342 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 4689 (~36.60 MiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
-
Query 3: Get min and max ID for range of merge requests mentions to be migrated.
explain SELECT min("merge_requests"."id"), max("merge_requests"."id") FROM "merge_requests" LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null) AND merge_requests.id > 40100000 AND merge_requests.id < 40939508
Click to see the plan. Time: 64.459 ms. https://explain.depesz.com/s/e6rZ
Nested Loop (cost=0.85..9659.41 rows=14391 width=4) (actual time=0.026..56.184 rows=26416 loops=1) Buffers: shared hit=90756 -> Index Only Scan using merge_request_mentions_temp_index on public.merge_requests (cost=0.42..715.44 rows=14466 width=4) (actual time=0.017..8.561 rows=26454 loops=1) Index Cond: ((merge_requests.id > 40100000) AND (merge_requests.id < 40939508)) Heap Fetches: 406 Buffers: shared hit=11371 -> Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions (cost=0.42..0.81 rows=2 width=4) (actual time=0.001..0.001 rows=0 loops=26454) Index Cond: (merge_request_user_mentions.merge_request_id = merge_requests.id) Heap Fetches: 8 Buffers: shared hit=79385 Time: 64.459 ms - planning: 6.674 ms - execution: 57.785 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 90756 (~709.00 MiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
-
Query 4: Get actual merge requests data for given range of ids
explain SELECT "merge_requests"."*" FROM "merge_requests" LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null) AND merge_requests.id > 40100000 AND merge_requests.id < 40939508
Click to see the plan. Time: 5.517 s. https://explain.depesz.com/s/ce1R
Nested Loop (cost=0.85..20256.82 rows=14391 width=719) (actual time=0.029..101.731 rows=26416 loops=1) Buffers: shared hit=105735 -> Index Scan using merge_request_mentions_temp_index on public.merge_requests (cost=0.42..11312.85 rows=14466 width=719) (actual time=0.019..48.470 rows=26454 loops=1) Index Cond: ((merge_requests.id > 40100000) AND (merge_requests.id < 40939508)) Buffers: shared hit=26350 -> Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions (cost=0.42..0.81 rows=2 width=4) (actual time=0.002..0.002 rows=0 loops=26454) Index Cond: (merge_request_user_mentions.merge_request_id = merge_requests.id) Heap Fetches: 8 Buffers: shared hit=79385 Time: 5.517 s - planning: 6.349 ms - execution: 5.510 s - I/O read: 5.222 s - I/O write: 0.000 ms Shared buffers: - hits: 80777 (~631.10 MiB) from the buffer pool - reads: 20805 (~162.50 MiB) from the OS file cache, including disk I/O - dirtied: 237 (~1.90 MiB) - writes: 0
explain SELECT merge_requests.* FROM "merge_requests"
LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id
WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null)
AND merge_requests.id > 18100000 AND merge_requests.id < 18200000
Click to see the plan. Time: 2.334 s. https://explain.depesz.com/s/rxsf
Nested Loop Anti Join (cost=0.85..2702.96 rows=1042 width=731) (actual time=8.782..2326.244 rows=1584 loops=1)
Buffers: shared hit=4800 read=1465 dirtied=8
I/O Timings: read=2281.984
-> Index Scan using merge_request_mentions_temp_index on public.merge_requests (cost=0.42..1431.72 rows=1046 width=731) (actual time=8.748..2302.339 rows=1584 loops=1)
Index Cond: ((merge_requests.id > 18100000) AND (merge_requests.id < 18200000))
Buffers: shared hit=48 read=1465 dirtied=8
I/O Timings: read=2281.984
-> Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions (cost=0.42..2.01 rows=2 width=4) (actual time=0.010..0.010 rows=0 loops=1584)
Index Cond: (merge_request_user_mentions.merge_request_id = merge_requests.id)
Heap Fetches: 0
Buffers: shared hit=4752
Time: 2.334 s
planning: 6.568 ms
execution: 2.327 s
I/O read: 2.282 s
I/O write: 0.000 ms
Shared buffers:
hits: 4800 (~37.50 MiB) from the buffer pool
reads: 1465 (~11.40 MiB) from the OS file cache, including disk I/O
dirtied: 8 (~64.00 KiB)
writes: 0
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Closes #198338