Re-compute issue relative position on project import (!59175) · Merge requests · GitLab.org / GitLab

Alexandru Croitor requested to merge reset-relative-position-on-import into master Apr 12, 2021

What does this MR do?

This MR is related to #276483 (closed) performance issues we see recently with issues table. Because of high relative position we see a high number of updates happening more and more often aiming to rebalance issues relative position. One cause for that is the issues reaching end of integer rage too fast. This MR fixes one root cause of that.

When exporting project issues we can have very high relative positions that can generate big gaps in relative positions when re-importing the project. To improve that we export project issues sorted by relative position ascendingly and because we know the issues are sorted, on import we can set the new imported issues relative position to current max(relative_position) + issue index * IDEAL_DISTANCE

Context: We need to sort by relative position before inserting so that we can rely on issue positions to move the issues at the end of the position range in an hierarchy but still preserve the relative issues position within the project.

Example:

if for instance we have max relative position in the hierarchy is 100_000 but import issues that have relative positions starting at 1_000_000 we do not want to create this huge gap, but we rather want to start imported issues positions from 100_513, 101_026, ...

as a follow-up when we actually improve the rebalancing code, we may want to check if at the import time we reach a high enough relative position, or maybe even reach the end of the range(-2147483648, 2147483647) we would need to trigger a rebalancing.

Database

Batching ordered by relative position

🤔 I've added the suggested index but for some reason we are reading way more data using the index than wihout it:

with the index: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3595/commands/11976

Without new index:

First batch(cold cache and warm cache plans/times)
- cold cache: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3612/commands/12056
- warm cache: https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/3612/commands/12072

SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100

--- cold cache
Limit  (cost=112466.23..112477.90 rows=100 width=1301) (actual time=18896.732..18914.834 rows=100 loops=1)
   Buffers: shared hit=11010 read=56845 dirtied=541
   I/O Timings: read=54957.719
   ->  Gather Merge  (cost=112466.23..120832.28 rows=71704 width=1301) (actual time=18896.730..18914.816 rows=100 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=11010 read=56845 dirtied=541
         I/O Timings: read=54957.719
         ->  Sort  (cost=111466.21..111555.84 rows=35852 width=1301) (actual time=18854.675..18854.696 rows=68 loops=3)
               Sort Key: issues.relative_position, issues.id
               Sort Method: top-N heapsort  Memory: 330kB
               Buffers: shared hit=11010 read=56845 dirtied=541
               I/O Timings: read=54957.719
               ->  Parallel Index Scan using index_issues_on_project_id_and_iid on public.issues  (cost=0.57..110095.97 rows=35852 width=1301) (actual time=5.028..18758.873 rows=28430 loops=3)
                     Index Cond: (issues.project_id = 278964)
                     Buffers: shared hit=10992 read=56845 dirtied=541
                     I/O Timings: read=54957.719

Time: 18.920 s
  - planning: 5.562 ms
  - execution: 18.915 s (estimated* for prod: 0.773...17.265 s)
    - I/O read: 54.958 s
    - I/O write: N/A

Shared buffers:
  - hits: 11010 (~86.00 MiB) from the buffer pool
  - reads: 56845 (~444.10 MiB) from the OS file cache, including disk I/O
  - dirtied: 541 (~4.20 MiB)
  - writes: 0


--- warm cache
Limit  (cost=115226.03..115237.69 rows=100 width=1301) (actual time=137.848..174.599 rows=100 loops=1)
   Buffers: shared hit=67657
   ->  Gather Merge  (cost=115226.03..123813.29 rows=73600 width=1301) (actual time=137.846..174.578 rows=100 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=67657
         ->  Sort  (cost=114226.00..114318.00 rows=36800 width=1301) (actual time=131.744..131.761 rows=62 loops=3)
               Sort Key: issues.relative_position, issues.id
               Sort Method: top-N heapsort  Memory: 351kB
               Buffers: shared hit=67657
               ->  Parallel Index Scan using index_issues_on_project_id_and_iid on public.issues  (cost=0.57..112819.53 rows=36800 width=1301) (actual time=0.036..101.056 rows=28430 loops=3)
                     Index Cond: (issues.project_id = 278964)
                     Buffers: shared hit=67639

Time: 175.130 ms
  - planning: 0.447 ms
  - execution: 174.683 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 67657 (~528.60 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

consequent batches with issue relative position(these are the types of queries that batching runs the most)

without union optimization

plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3612/commands/12057

    SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND (("issues"."id" > 14011 AND "issues"."relative_position" = 26121343) OR ("issues"."relative_position" > 26121343) OR ("issues"."relative_position" IS NULL)) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100
    
    Limit  (cost=112260.44..112272.11 rows=100 width=1301) (actual time=130.979..163.186 rows=100 loops=1)
       Buffers: shared hit=67681
       ->  Gather Merge  (cost=112260.44..117728.29 rows=46864 width=1301) (actual time=130.976..163.169 rows=100 loops=1)
             Workers Planned: 2
             Workers Launched: 2
             Buffers: shared hit=67681
             ->  Sort  (cost=111260.41..111318.99 rows=23432 width=1301) (actual time=124.445..124.458 rows=64 loops=3)
                   Sort Key: issues.relative_position, issues.id
                   Sort Method: top-N heapsort  Memory: 313kB
                   Buffers: shared hit=67681
                   ->  Parallel Index Scan using index_issues_on_project_id_and_iid on public.issues  (cost=0.57..110364.86 rows=23432 width=1301) (actual time=1.573..109.286 rows=12375 loops=3)
                         Index Cond: (issues.project_id = 278964)
                         Filter: (((issues.id > 14011) AND (issues.relative_position = 26121343)) OR (issues.relative_position > 26121343) OR (issues.relative_position IS NULL))
                         Rows Removed by Filter: 16055
                         Buffers: shared hit=67663
    
    Time: 163.954 ms
      - planning: 0.688 ms
      - execution: 163.266 ms
        - I/O read: N/A
        - I/O write: N/A
    
    Shared buffers:
      - hits: 67681 (~528.80 MiB) from the buffer pool
      - reads: 0 from the OS file cache, including disk I/O
      - dirtied: 0
      - writes: 0

with union optimization, uses existing idx_issues_on_project_id_and_rel_position_and_state_id_and_id

plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3765/commands/12689

  explain SELECT "issues".* FROM ((SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND ("issues"."id" > 14011 AND "issues"."relative_position" = 26121343) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100)
  UNION ALL
  (SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND ("issues"."relative_position" > 26121343) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100)
  UNION ALL
  (SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND ("issues"."relative_position" IS NULL) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100)) issues ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100

  
   Limit  (cost=89275.51..89278.05 rows=100 width=1303) (actual time=131.483..131.566 rows=100 loops=1)
     Buffers: shared hit=32335
     ->  Merge Append  (cost=89275.51..89280.61 rows=201 width=1303) (actual time=131.481..131.545 rows=100 loops=1)
           Sort Key: issues.relative_position, issues.id
           Buffers: shared hit=32335
           ->  Sort  (cost=3.62..3.63 rows=1 width=1303) (actual time=0.078..0.081 rows=0 loops=1)
                 Sort Key: issues.relative_position, issues.id
                 Sort Method: quicksort  Memory: 25kB
                 Buffers: shared hit=4
                 ->  Limit  (cost=3.60..3.60 rows=1 width=1303) (actual time=0.073..0.076 rows=0 loops=1)
                       Buffers: shared hit=4
                       ->  Sort  (cost=3.60..3.60 rows=1 width=1303) (actual time=0.073..0.074 rows=0 loops=1)
                             Sort Key: issues.id
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=4
                             ->  Index Scan using idx_issues_on_project_id_and_rel_position_and_state_id_and_id on public.issues  (cost=0.57..3.59 rows=1 width=1303) (actual time=0.067..0.067 rows=0 loops=1)
                                   Index Cond: ((issues.project_id = 278964) AND (issues.relative_position = 26121343) AND (issues.id > 14011))
                                   Buffers: shared hit=4
           ->  Limit  (cost=36849.32..36849.57 rows=100 width=1303) (actual time=131.352..131.392 rows=100 loops=1)
                 Buffers: shared hit=32327
                 ->  Sort  (cost=36849.32..36918.30 rows=27591 width=1303) (actual time=131.351..131.372 rows=100 loops=1)
                       Sort Key: issues_1.relative_position, issues_1.id
                       Sort Method: top-N heapsort  Memory: 160kB
                       Buffers: shared hit=32327
                       ->  Index Scan using idx_issues_on_project_id_and_rel_position_and_state_id_and_id on public.issues issues_1  (cost=0.57..35794.81 rows=27591 width=1303) (actual time=0.016..93.520 rows=37621 loops=1)
                             Index Cond: ((issues_1.project_id = 278964) AND (issues_1.relative_position > 26121343))
                             Buffers: shared hit=32327
           ->  Limit  (cost=52422.54..52422.79 rows=100 width=1303) (actual time=0.049..0.049 rows=0 loops=1)
                 Buffers: shared hit=4
                 ->  Sort  (cost=52422.54..52520.75 rows=39285 width=1303) (actual time=0.047..0.047 rows=0 loops=1)
                       Sort Key: issues_2.relative_position, issues_2.id
                       Sort Method: quicksort  Memory: 25kB
                       Buffers: shared hit=4
                       ->  Index Scan using idx_issues_on_project_id_and_rel_position_and_state_id_and_id on public.issues issues_2  (cost=0.57..50921.09 rows=39285 width=1303) (actual time=0.031..0.031 rows=0 loops=1)
                             Index Cond: ((issues_2.project_id = 278964) AND (issues_2.relative_position IS NULL))
                             Buffers: shared hit=4

  
  Time: 133.144 ms
    - planning: 1.373 ms
    - execution: 131.771 ms
      - I/O read: N/A
      - I/O write: N/A
  
  Shared buffers:
    - hits: 32335 (~252.60 MiB) from the buffer pool
    - reads: 0 from the OS file cache, including disk I/O
    - dirtied: 0
    - writes: 0

batching issues with null position - uses the existing rel position index idx_issues_on_project_id_and_rel_position_and_state_id_and_id
- https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3612/commands/12058

SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND ("issues"."id" > 20932 AND "issues"."relative_position" IS NULL) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100

Limit  (cost=55145.69..55145.94 rows=100 width=1301) (actual time=13.074..13.077 rows=0 loops=1)
   Buffers: shared hit=3 read=4 dirtied=1
   I/O Timings: read=12.834
   ->  Sort  (cost=55145.69..55248.89 rows=41281 width=1301) (actual time=13.072..13.074 rows=0 loops=1)
         Sort Key: issues.relative_position, issues.id
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=3 read=4 dirtied=1
         I/O Timings: read=12.834
         ->  Index Scan using idx_issues_on_project_id_and_rel_position_and_state_id_and_id on public.issues  (cost=0.57..53567.96 rows=41281 width=1301) (actual time=13.063..13.065 rows=0 loops=1)
               Index Cond: ((issues.project_id = 278964) AND (issues.relative_position IS NULL) AND (issues.id > 20932))
               Buffers: shared hit=3 read=4 dirtied=1
               I/O Timings: read=12.834

Time: 13.633 ms
  - planning: 0.497 ms
  - execution: 13.136 ms
    - I/O read: 12.834 ms
    - I/O write: N/A

Shared buffers:
  - hits: 3 (~24.00 KiB) from the buffer pool
  - reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 1 (~8.00 KiB)
  - writes: 0

With new index:

Adding the index:

exec CREATE INDEX idx_issues_on_project_id_and_rel_asc_and_id     ON public.issues USING btree     (project_id ASC, relative_position ASC NULLS LAST, id ASC);

% time      seconds wait_event
------ ------------ -----------------------------
89.52   1609.769369 IO.DataFileRead
5.81     104.449480 Running
1.77      31.878622 IO.DataFileExtend
1.52      27.393409 IO.BufFileWrite
0.72      13.031209 LWLock.WALWriteLock
0.25       4.548676 IO.DataFileWrite
0.24       4.233695 IO.BufFileRead
0.13       2.326615 IO.WALWrite
0.03       0.459600 LWLock.WALBufMappingLock
0.01       0.167945 IO.SLRURead
0.00       0.059314 LWLock.buffer_mapping
------ ------------ -----------------------------
100.00  1798.317934
The query has been executed. Duration: 1798.318 s (estimated* for prod: 178.134...1645.562 s)
⠀* Estimated timing for production (experimental).

First batch(warm cache) - uses the first part of the index condition Index Cond: (issues.project_id = 278964)
- plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3612/commands/12067

SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100

Limit  (cost=0.57..128.83 rows=100 width=1301) (actual time=0.347..1.665 rows=100 loops=1)
   Buffers: shared hit=100 read=4
   I/O Timings: read=0.230
   ->  Index Scan using idx_issues_on_project_id_and_rel_asc_and_id on public.issues  (cost=0.57..113285.23 rows=88320 width=1301) (actual time=0.345..1.637 rows=100 loops=1)
         Index Cond: (issues.project_id = 278964)
         Buffers: shared hit=100 read=4
         I/O Timings: read=0.230

Time: 7.665 ms
  - planning: 5.949 ms
  - execution: 1.716 ms
    - I/O read: 0.230 ms
    - I/O write: N/A

Shared buffers:
  - hits: 100 (~800.00 KiB) from the buffer pool
  - reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

consequent batches with issue relative position(these are the types of queries that batching would run the most) - I do not see much difference from using the new index or the existing index_issues_on_project_id_and_iid index

without union optimization

plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3612/commands/12068

  SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND (("issues"."id" > 14011 AND "issues"."relative_position" = 26121343) OR ("issues"."relative_position" > 26121343) OR ("issues"."relative_position" IS NULL)) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100
  
   Limit  (cost=0.57..197.97 rows=100 width=1301) (actual time=175.645..175.872 rows=100 loops=1)
     Buffers: shared hit=39907 read=185
     I/O Timings: read=50.427
     ->  Index Scan using idx_issues_on_project_id_and_rel_asc_and_id on public.issues  (cost=0.57..113947.63 rows=57723 width=1301) (actual time=175.642..175.852 rows=100 loops=1)
           Index Cond: (issues.project_id = 278964)
           Filter: (((issues.id > 14011) AND (issues.relative_position = 26121343)) OR (issues.relative_position > 26121343) OR (issues.relative_position IS NULL))
           Rows Removed by Filter: 48164
           Buffers: shared hit=39907 read=185
           I/O Timings: read=50.427
  
  Time: 176.731 ms
    - planning: 0.787 ms
    - execution: 175.944 ms
      - I/O read: 50.427 ms
      - I/O write: N/A
  
  Shared buffers:
    - hits: 39907 (~311.80 MiB) from the buffer pool
    - reads: 185 (~1.40 MiB) from the OS file cache, including disk I/O
    - dirtied: 0
    - writes: 0

with union optimization, uses the new index idx_issues_on_project_id_and_rel_asc_and_id

plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3765/commands/12728

  explain SELECT "issues".* FROM ((SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND ("issues"."id" > 14011 AND "issues"."relative_position" = 26121343) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100)
  UNION ALL
  (SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND ("issues"."relative_position" > 26121343) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100)
  UNION ALL
  (SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND ("issues"."relative_position" IS NULL) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100)) issues ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100

  
  
   Limit  (cost=4.77..135.76 rows=100 width=1303) (actual time=0.116..0.544 rows=100 loops=1)
     Buffers: shared hit=97
     ->  Merge Append  (cost=4.77..268.06 rows=201 width=1303) (actual time=0.114..0.529 rows=100 loops=1)
           Sort Key: issues.relative_position, issues.id
           Buffers: shared hit=97
           ->  Sort  (cost=3.61..3.61 rows=1 width=1303) (actual time=0.064..0.064 rows=0 loops=1)
                 Sort Key: issues.relative_position, issues.id
                 Sort Method: quicksort  Memory: 25kB
                 Buffers: shared hit=4
                 ->  Limit  (cost=0.57..3.59 rows=1 width=1303) (actual time=0.059..0.059 rows=0 loops=1)
                       Buffers: shared hit=4
                       ->  Index Scan using idx_issues_on_project_id_and_rel_asc_and_id on public.issues  (cost=0.57..3.59 rows=1 width=1303) (actual time=0.058..0.058 rows=0 loops=1)
                             Index Cond: ((issues.project_id = 278964) AND (issues.relative_position = 26121343) AND (issues.id > 14011))
                             Buffers: shared hit=4
           ->  Limit  (cost=0.57..129.97 rows=100 width=1303) (actual time=0.015..0.414 rows=100 loops=1)
                 Buffers: shared hit=89
                 ->  Index Scan using idx_issues_on_project_id_and_rel_asc_and_id on public.issues issues_1  (cost=0.57..35703.31 rows=27591 width=1303) (actual time=0.014..0.400 rows=100 loops=1)
                       Index Cond: ((issues_1.project_id = 278964) AND (issues_1.relative_position > 26121343))
                       Buffers: shared hit=89
           ->  Limit  (cost=0.57..129.85 rows=100 width=1303) (actual time=0.035..0.035 rows=0 loops=1)
                 Buffers: shared hit=4
                 ->  Index Scan using idx_issues_on_project_id_and_rel_asc_and_id on public.issues issues_2  (cost=0.57..50790.59 rows=39285 width=1303) (actual time=0.035..0.035 rows=0 loops=1)
                       Index Cond: ((issues_2.project_id = 278964) AND (issues_2.relative_position IS NULL))
                       Buffers: shared hit=4
  
  Time: 7.836 ms
    - planning: 7.178 ms
    - execution: 0.658 ms
      - I/O read: N/A
      - I/O write: N/A
  
  Shared buffers:
    - hits: 97 (~776.00 KiB) from the buffer pool
    - reads: 0 from the OS file cache, including disk I/O
    - dirtied: 0
    - writes: 0

batching issues with null position - uses the new index effectively, though the timings are not super different from existing rel pos index idx_issues_on_project_id_and_rel_asc_and_id
- plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3612/commands/12069

SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND ("issues"."id" > 20932 AND "issues"."relative_position" IS NULL) ORDER BY relative_position asc NULLS LAST, "issues"."id" ASC LIMIT 100

Limit  (cost=0.57..129.78 rows=100 width=1301) (actual time=20.379..20.380 rows=0 loops=1)
   Buffers: shared hit=2 read=2
   I/O Timings: read=20.242
   ->  Index Scan using idx_issues_on_project_id_and_rel_asc_and_id on public.issues  (cost=0.57..54752.11 rows=42372 width=1301) (actual time=20.377..20.377 rows=0 loops=1)
         Index Cond: ((issues.project_id = 278964) AND (issues.relative_position IS NULL) AND (issues.id > 20932))
         Buffers: shared hit=2 read=2
         I/O Timings: read=20.242

Time: 20.861 ms
  - planning: 0.419 ms
  - execution: 20.442 ms
    - I/O read: 20.242 ms
    - I/O write: N/A

Shared buffers:
  - hits: 2 (~16.00 KiB) from the buffer pool
  - reads: 2 (~16.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

📋 Does this MR need a changelog?
- I have included a changelog entry.
- I have not included a changelog entry because _____.
Documentation (if required)
Code review guidelines
Merge request performance guidelines
Style guides
Database guides
Separation of EE specific content

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Edited Apr 28, 2021 by Alexandru Croitor

Re-compute issue relative position on project import