Backfill project statistics with recent objects size
What does this MR do and why?
We need to backfill ProjectStatistics
records to recalculate repository storage size using the recent objects size introduced as part of Use repository recent objects size for project ... (#419903 - closed)
In this MR we achieve that by:
-
looking up
ProjectStatistics
that have arepository_size
> 0 -
if the migration is running on .com, we also filter by
updated_at
< 2023-08-08We started using recent object size in production code on 2023-08-08, so any statistics updated after that won't need to be refreshed
-
re-calculating the
storage_size
based on the component parts that make up the value, but with the newrepository_size
-
trigger the async worker that handles the
RootStorageStatistics
update (that class sums the relatedproject_statistic#storage_size
values, so the total storage size will also be updated accordingly)
We need to have an EE override for this background migration to handle a nuance with ProjectStatistics
that means we also need to exclude upload_size
when calculating the new total.
We have some previous/recent examples of backfills for project statistics:
- Backfill project stats to remove pipeline artifact (!126053 - merged)
- Backfill to exclude uploads_size from storage_s... (!103017 - merged)
Database info
Self-Managed batch query
explain SELECT "project_statistics"."project_id" FROM "project_statistics" WHERE "project_statistics"."project_id" BETWEEN 28026969 AND 28029237 AND (repository_size > 0) ORDER BY "project_statistics"."project_id"
Time: 8.001 ms
- planning: 0.499 ms
- execution: 7.502 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1150 (~9.00 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/21658/commands/70363
No index added as one already exists for this query
SaaS batch query
SELECT "project_statistics"."project_id"
FROM "project_statistics"
WHERE "project_statistics"."project_id" BETWEEN 28026969 AND 28029237 AND (repository_size > 0 and updated_at < '2023-08-08') ORDER BY "project_statistics"."project_id"
Time: 7.761 ms
- planning: 0.411 ms
- execution: 7.350 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1150 (~9.00 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/21658/commands/70364
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.