SPIKE:Performance and Sorting of removing ProjectStatistics#storage_size
What does this MR do and why?
- This MR is the spike validation for: https://gitlab.com/gitlab-org/gitlab/-/issues/427722
- This MR build on top of: !129625 (closed)
- In this MR, we are refactoring the scope
order_by_storage_size
to use the new methodstorage_size_components_for_sql_queries
.
Scope of Validation:
-
Performance Testing:
- Measure and compare query execution times to ensure that enabling the method does not significantly impact performance.
-
Sorting Validation:
- Ensure that the sorting order remains consistent with our requirements, and current implementation.
========================= Findings
Sorting Analysis:
All specs are passing and sorting is still working as expected:
-
ee/spec/finders/ee/namespaces/projects_finder_spec.rb
:✅ -
ee/spec/models/ee/project_spec.rb
:✅
Database performance analysis:
Current Query plan, Before (LIMIT 20):
- Cold run: Time: 22.392 s
========================= New query plan after using the method (LIMIT 20):
-
Without any indexes (upload_size included):
-
Without any indexes (upload_size is not included):
Index optimizations Option 1: (Recommended)
CREATE INDEX idx_project_statistics_composite_project_id
ON project_statistics (project_id, repository_size, wiki_size, lfs_objects_size, build_artifacts_size, packages_size, snippets_size, uploads_size);
CREATE INDEX idx_project_statistics_composite_project_id_without_uploads
ON project_statistics (project_id, repository_size, wiki_size, lfs_objects_size, build_artifacts_size, packages_size, snippets_size);
VACUUM ANALYZE namespaces;
VACUUM ANALYZE projects;
VACUUM ANALYZE project_statistics;
-
With one new index (idx_project_statistics_composite_project_id) on the sum components (upload_size included):
-
With one new index (idx_project_statistics_composite_project_id_without_uploads) on the sum components (upload_size not included):
Index optimizations 2nd approach
CREATE INDEX idx_project_statistics_composite_namespace_project_id
ON project_statistics (project_id, namespace_id, repository_size, wiki_size, lfs_objects_size, build_artifacts_size, packages_size, snippets_size, uploads_size);
CREATE INDEX idx_project_statistics_namespace_size_sum_without_uploads
ON project_statistics (project_id, namespace_id, repository_size, wiki_size, lfs_objects_size, build_artifacts_size, packages_size, snippets_size);
VACUUM ANALYZE namespaces;
VACUUM ANALYZE projects;
VACUUM ANALYZE project_statistics;
-
With one new index (idx_project_statistics_composite_namespace_project_id) on the sum components and project_id, namespace_id (upload_size included):
-
With one new index (
idx_project_statistics_namespace_size_sum_without_uploads
) on the sum components and project_id, namespace_id (upload_size is not included):
To sum up: The new query integrated into the method illustrates enhanced performance when compared to the current query, even without the incorporation of any new indexes (Cold run). In terms of the Hot run, the new approach only shows a marginal increase of 2 ms.
Additionally, the introduction of a single new index, targeting the sum components and project_id exclusively, has notably bolstered query performance during the cold run. Although the hot run experiences slight changes in performance (approximately 20 ms), they are relatively insignificant.
Moreover, the application of a composite index that combines namespace_id
and project_id
significantly elevates performance during the cold run.
Challenges:
- Validate with database if the new performance results are accepted or if they could be optimized.
========================= Suggestions:
Consider the addition of the idx_project_statistics_composite_project_id
and idx_project_statistics_composite_project_id_without_uploads
indexes which is effective for indexing the total storage size on a cold and hot run