Backfill / update timelogs.spent_at where NULL
## What does this MR do and why?
In #347473 (closed) we identified that timelogs.spent_at
was only being populated via quick action, not via the API.
That has now been remedied, but we need to backfill/update all of the existing NULL values.
This MR does so by copying the created_at
timestamp into spent_at.
I previously ran a background migration on timelogs, so I used that as a template: !60439 (merged).
I'm not sure if this is overkill; perhaps a single UPDATE statement would be sufficient?
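For illustration, here is a minimal sketch of the sub-batched backfill logic, simulated with plain Ruby hashes instead of ActiveRecord (the method name, sub-batch size constant, and hash shape are hypothetical, not the exact MR code):

```ruby
# Sub-batch size is illustrative; the MR uses sub-batches of 100.
SUB_BATCH_SIZE = 100

# timelogs: array of { id:, spent_at:, created_at: } hashes.
def backfill_spent_at(timelogs)
  timelogs.each_slice(SUB_BATCH_SIZE) do |sub_batch|
    # The real migration would issue one statement per sub-batch:
    #   UPDATE timelogs SET spent_at = created_at
    #   WHERE id IN (<sub_batch ids>) AND spent_at IS NULL
    sub_batch.each { |row| row[:spent_at] ||= row[:created_at] }
  end
  timelogs
end
```

Rows that already have a spent_at are left untouched; only NULL values pick up created_at.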
First run (cold?):

```sql
explain UPDATE timelogs SET spent_at = created_at WHERE spent_at IS NULL
```

```
ModifyTable on public.timelogs  (cost=0.43..143519.97 rows=1217054 width=116) (actual time=177948.671..177948.697 rows=0 loops=1)
  Buffers: shared hit=33539769 read=136929 dirtied=157749 written=31390
  I/O Timings: read=135631.349 write=0.000
  ->  Index Scan using index_timelogs_on_project_id_and_spent_at on public.timelogs  (cost=0.43..143519.97 rows=1217054 width=116) (actual time=9.461..68844.401 rows=1257328 loops=1)
        Index Cond: (timelogs.spent_at IS NULL)
        Buffers: shared hit=672703 read=61163 dirtied=188
        I/O Timings: read=66067.123 write=0.000
```
Time: 2.966 min
- planning: 1.251 ms
- execution: 2.966 min
- I/O read: 2.261 min
- I/O write: 0.000 ms
Shared buffers:
- hits: 33539769 (~255.90 GiB) from the buffer pool
- reads: 136929 (~1.00 GiB) from the OS file cache, including disk I/O
- dirtied: 157749 (~1.20 GiB)
- writes: 31390 (~245.20 MiB)
Subsequent run:

```
ModifyTable on public.timelogs  (cost=0.43..72778.38 rows=1 width=116) (actual time=177.247..177.250 rows=0 loops=1)
  Buffers: shared hit=24290
  I/O Timings: read=0.000 write=0.000
  ->  Index Scan using index_timelogs_on_project_id_and_spent_at on public.timelogs  (cost=0.43..72778.38 rows=1 width=116) (actual time=177.243..177.243 rows=0 loops=1)
        Index Cond: (timelogs.spent_at IS NULL)
        Buffers: shared hit=24290
        I/O Timings: read=0.000 write=0.000
```
Time: 177.804 ms
- planning: 0.495 ms
- execution: 177.309 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 24290 (~189.80 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
## Background Migration Details
1,258,911 rows to update
batch size = 5,000
1,258,911 / 5,000 = 252 batches
Estimated times per batch:
- Records are updated in sub-batches of 100 => 5,000 / 100 = 50 total updates
- <4ms for update statement with 100 items [(see linked explain plan)](https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7650/commands/27184)
Total batch time: 50 * <4 ms = <200 ms per batch
2 mins delay per batch
252 batches * 2 min per batch = 8.4 hours to run all the scheduled jobs
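The schedule arithmetic above can be sanity-checked with a quick script (numbers taken from this MR):

```ruby
rows       = 1_258_911 # timelog rows with spent_at IS NULL
batch_size = 5_000
delay_min  = 2         # delay between scheduled batches

batches = (rows / batch_size.to_f).ceil
total_hours = batches * delay_min / 60.0

puts batches      # => 252
puts total_hours  # => 8.4
```

Note the total is dominated by the 2-minute inter-batch delay, not by the <200 ms the updates themselves take.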
Edited by Lee Tickett