Added temporary index for uploads_size on project_statistics table
What does this MR do and why?
This MR adds a partial index on project_statistics table where uploads_size is not 0.
The index is required for the Backfill/trigger of storage_size of ProjectStatistics table, as it optimises the performance of the primary query used to filter ProjectStatistics to trigger storage_size_refresh!, as part of https://gitlab.com/gitlab-org/gitlab/-/issues/370313 issue.
Follow up MR that performs the Backfill: here
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Migration Output
Up
main: == 20221104074652 AddTempIndexForProjectStatisticsUploadSizeMigration: migrating
main: -- transaction_open?()
main: -> 0.0000s
main: -- index_exists?(:project_statistics, [:project_id], {:name=>"tmp_index_project_statistics_uploads_size", :where=>"uploads_size <> 0", :algorithm=>:concurrently})
main: -> 0.0037s
main: -- add_index(:project_statistics, [:project_id], {:name=>"tmp_index_project_statistics_uploads_size", :where=>"uploads_size <> 0", :algorithm=>:concurrently})
main: -> 0.0016s
main: == 20221104074652 AddTempIndexForProjectStatisticsUploadSizeMigration: migrated (0.0070s)
Down
main: == 20221104074652 AddTempIndexForProjectStatisticsUploadSizeMigration: reverting
main: -- transaction_open?()
main: -> 0.0000s
main: -- indexes(:project_statistics)
main: -> 0.0060s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0002s
main: -- remove_index(:project_statistics, {:algorithm=>:concurrently, :name=>"tmp_index_project_statistics_uploads_size"})
main: -> 0.0010s
main: -- execute("RESET statement_timeout")
main: -> 0.0002s
main: == 20221104074652 AddTempIndexForProjectStatisticsUploadSizeMigration: reverted (0.0118s)
Query Plans
Query:
SELECT "project_statistics".project_id FROM "project_statistics" WHERE (uploads_size <> 0)
Number of Affected Rows: 1155675
Without the index the performance of the query is very slow.
Before Index
Link: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/12979/commands/45585
Plan
Gather (cost=1000.00..2283590.20 rows=1146767 width=132) (actual time=0.797..49929.177 rows=1161472 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=27719 read=478940 dirtied=8105 written=7680
I/O Timings: read=112683.792 write=335.706
-> Parallel Seq Scan on public.project_statistics (cost=0.00..2167913.50 rows=477820 width=132) (actual time=0.402..49436.833 rows=387157 loops=3)
Filter: (project_statistics.uploads_size <> 0)
Rows Removed by Filter: 8724840
Buffers: shared hit=27719 read=478940 dirtied=8105 written=7680
I/O Timings: read=112683.792 write=335.706
Recommendations
After Index
CREATE INDEX index_project_statistics_nonzero_uploads_size ON project_statistics USING btree (project_id) WHERE (uploads_size <> 0);
Link: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/12979/commands/45587
Plan
Index Scan using index_project_statistics_nonzero_uploads_size on public.project_statistics (cost=0.43..639099.47 rows=1155675 width=132) (actual time=0.386..15179.883 rows=1161472 loops=1)
Buffers: shared hit=717388 read=446380 dirtied=12516 written=4908
I/O Timings: read=12451.101 write=149.956
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.