Geo - Make registry table SSOT for Uploads
What does this MR do?
Make the file_registry
table on the tracking database the SSOT to sync uploads.
Database queries
-
Geo::AttachmentRegistryFinder#find_never_synced_registries
:SELECT "file_registry".* FROM "file_registry" WHERE "file_registry"."success" = FALSE AND "file_registry"."retry_count" IS NULL AND 1 = 1 LIMIT 1000;
Query plan: https://explain.depesz.com/s/vPtH (Staging - There is no production environment)
☝ -
Geo::AttachmentRegistryFinder#find_registry_differences
- The following queries assume !24891 (merged) is merged. So they do not have this condition which would cause the queries to be much worse:OR "lfs_objects"."file_store" IS NULL
:-
No selective sync, and object storage is not synced
SELECT "uploads"."id", "uploads"."uploader" FROM "uploads" WHERE "uploads"."store" = 1 AND "uploads"."id" BETWEEN 1 AND 1000;
Time: 0.258 ms - planning: 0.197 ms - execution: 0.061 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 3 (~24.00 KiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
Query plan: https://explain.depesz.com/s/W4J
☝ SELECT "file_registry"."file_id", "file_registry"."file_type" FROM "file_registry" WHERE "file_registry"."file_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)
☝ -
With selective sync by shard, and object storage is not synced
SELECT "uploads"."id", "uploads"."uploader" FROM "uploads" WHERE (("uploads"."model_type" = 'Namespace' AND "uploads"."model_id" IN(WITH RECURSIVE "base_and_ancestors" AS (( SELECT "namespaces"."id", "namespaces"."parent_id" FROM "namespaces" WHERE "namespaces"."id" IN( SELECT "projects"."namespace_id" FROM "projects" WHERE "projects"."repository_storage" = 'default')) UNION ( SELECT "namespaces"."id", "namespaces"."parent_id" FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT "id" FROM "base_and_ancestors" AS "namespaces") OR "uploads"."model_type" = 'Project' AND "uploads"."model_id" IN( SELECT "projects"."id" FROM "projects" WHERE "projects"."repository_storage" = 'default')) OR "uploads"."model_type" NOT IN('Namespace', 'Project') ) AND "uploads"."store" = 1 AND "uploads"."id" BETWEEN 1 AND 1000;
Time: 1.720 ms - planning: 1.430 ms - execution: 0.290 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 3 (~24.00 KiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
Query plan: https://explain.depesz.com/s/CCGD
☝ SELECT "file_registry"."file_id", "file_registry"."file_type" FROM "file_registry" WHERE "file_registry"."file_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)
☝ -
With selective sync by namespace, and object storage is not synced
SELECT "uploads"."id", "uploads"."uploader" FROM "uploads" WHERE (("uploads"."model_type" = 'Namespace' AND "uploads"."model_id" IN(WITH RECURSIVE "base_and_descendants" AS (( SELECT "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links" WHERE "geo_node_namespace_links"."geo_node_id" = 2) UNION ( SELECT "namespaces"."id" FROM "namespaces", "base_and_descendants" WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces") OR "uploads"."model_type" = 'Project' AND "uploads"."model_id" IN( SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN(WITH RECURSIVE "base_and_descendants" AS (( SELECT "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links" WHERE "geo_node_namespace_links"."geo_node_id" = 2) UNION ( SELECT "namespaces"."id" FROM "namespaces", "base_and_descendants" WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces"))) OR "uploads"."model_type" NOT IN('Namespace', 'Project') ) AND "uploads"."store" = 1 AND "uploads"."id" BETWEEN 1 AND 1000;
Time: 1.722 ms - planning: 1.378 ms - execution: 0.344 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 3 (~24.00 KiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
Query plan: https://explain.depesz.com/s/DlZg (The subquery for
geo_node_namespace_links
was replaced with a query for thegitlab-org
namespace, since we don't have anygeo_nodes
in production)☝ SELECT "file_registry"."file_id", "file_registry"."file_type" FROM "file_registry" WHERE "file_registry"."file_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)
☝ -
No selective sync, and object storage is synced
SELECT "uploads"."id", "uploads"."uploader" FROM "uploads" WHERE "uploads"."id" BETWEEN 1 AND 1000;
Time: 0.181 ms - planning: 0.154 ms - execution: 0.027 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 3 (~24.00 KiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
Query plan: https://explain.depesz.com/s/7Kmc
☝ -
With selective sync by shard, and object storage is synced
SELECT "uploads"."id", "uploads"."uploader" FROM "uploads" WHERE (("uploads"."model_type" = 'Namespace' AND "uploads"."model_id" IN(WITH RECURSIVE "base_and_ancestors" AS (( SELECT "namespaces"."id", "namespaces"."parent_id" FROM "namespaces" WHERE "namespaces"."id" IN( SELECT "projects"."namespace_id" FROM "projects" WHERE "projects"."repository_storage" = 'default')) UNION ( SELECT "namespaces"."id", "namespaces"."parent_id" FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT "id" FROM "base_and_ancestors" AS "namespaces") OR "uploads"."model_type" = 'Project' AND "uploads"."model_id" IN( SELECT "projects"."id" FROM "projects" WHERE "projects"."repository_storage" = 'default')) OR "uploads"."model_type" NOT IN('Namespace', 'Project') ) AND "uploads"."id" BETWEEN 1 AND 1000;
Time: 1.639 ms - planning: 1.326 ms - execution: 0.313 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 3 (~24.00 KiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
Query plan: https://explain.depesz.com/s/iXmD
☝ SELECT "file_registry"."file_id", "file_registry"."file_type" FROM "file_registry" WHERE "file_registry"."file_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)
☝ -
With selective sync by namespace, and object storage is synced
SELECT "uploads"."id", "uploads"."uploader" FROM "uploads" WHERE (("uploads"."model_type" = 'Namespace' AND "uploads"."model_id" IN(WITH RECURSIVE "base_and_descendants" AS (( SELECT "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links" WHERE "geo_node_namespace_links"."geo_node_id" = 2) UNION ( SELECT "namespaces"."id" FROM "namespaces", "base_and_descendants" WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces") OR "uploads"."model_type" = 'Project' AND "uploads"."model_id" IN( SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN(WITH RECURSIVE "base_and_descendants" AS (( SELECT "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links" WHERE "geo_node_namespace_links"."geo_node_id" = 2) UNION ( SELECT "namespaces"."id" FROM "namespaces", "base_and_descendants" WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces"))) OR "uploads"."model_type" NOT IN('Namespace', 'Project') ) AND "uploads"."id" BETWEEN 1 AND 1000;
Time: 1.426 ms - planning: 1.223 ms - execution: 0.203 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 3 (~24.00 KiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
Query plan: https://explain.depesz.com/s/wrPH (The subquery for geo_node_namespace_links was replaced with a query for the gitlab-org namespace, since we don't have any geo_nodes in production)
☝ SELECT "file_registry"."file_id", "file_registry"."file_type" FROM "file_registry" WHERE "file_registry"."file_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)
☝
-
Does this MR meet the acceptance criteria?
Conformity
- [-] Changelog entry
- [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers - [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team