Geo - Make registry table SSOT for Job Artifacts
What does this MR do?
Make the job_artifact_registry
table on the tracking database the SSOT to sync Job artifacts.
Database queries
-
Geo::JobArtifactRegistryFinder#find_never_synced_registries
:SELECT "job_artifact_registry".* FROM "job_artifact_registry" WHERE "job_artifact_registry"."success" = FALSE AND "job_artifact_registry"."retry_count" IS NULL AND 1 = 1 LIMIT 1000;
Query plan: https://explain.depesz.com/s/gtkX (Staging - There is no production environment)
☝ -
Geo::JobArtifactRegistryFinder#find_registry_differences
- The following queries assume !24891 (merged) is merged. So they do not have this condition which would cause the queries to be much worse:OR "lfs_objects"."file_store" IS NULL
:-
No selective sync, and object storage is not synced
SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_store" = 1 AND "ci_job_artifacts"."id" BETWEEN 1 AND 1000;
Time: 45.666 ms - planning: 0.198 ms - execution: 45.468 ms - I/O read: 42.696 ms - I/O write: 0.000 ms Shared buffers: - hits: 256 (~2.00 MiB) from the buffer pool - reads: 54 (~432.00 KiB) from the OS file cache, including disk I/O - dirtied: 6 (~48.00 KiB) - writes: 0
Query plan: https://explain.depesz.com/s/kniM
☝ SELECT "job_artifact_registry"."artifact_id" FROM "job_artifact_registry" WHERE "job_artifact_registry"."artifact_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/AhsR (Staging - There is no production environment)
☝ -
With selective sync by shard, and object storage is not synced
WITH "restricted_job_artifacts" AS ( SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."project_id" IN( SELECT "projects"."id" FROM "projects" WHERE "projects"."repository_storage" = 'default')) SELECT "ci_job_artifacts"."id" FROM "restricted_job_artifacts" INNER JOIN "ci_job_artifacts" ON "restricted_job_artifacts"."id" = "ci_job_artifacts"."id" WHERE "ci_job_artifacts"."file_store" = 1 AND "ci_job_artifacts"."id" BETWEEN 1 AND 1000;
Time: 1.278 ms - planning: 0.927 ms - execution: 0.351 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 320 (~2.50 MiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
Query plan: https://explain.depesz.com/s/SaWQ
☝ SELECT "job_artifact_registry"."artifact_id" FROM "job_artifact_registry" WHERE "job_artifact_registry"."artifact_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/AhsR (Staging - There is no production environment)
☝ -
With selective sync by namespace, and object storage is not synced
WITH "restricted_job_artifacts" AS ( SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."project_id" IN( SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN(WITH RECURSIVE "base_and_descendants" AS (( SELECT "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links" WHERE "geo_node_namespace_links"."geo_node_id" = 2) UNION ( SELECT "namespaces"."id" FROM "namespaces", "base_and_descendants" WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces")) )) SELECT "ci_job_artifacts"."id" FROM "restricted_job_artifacts" INNER JOIN "ci_job_artifacts" ON "restricted_job_artifacts"."id" = "ci_job_artifacts"."id" WHERE "ci_job_artifacts"."file_store" = 1 AND "ci_job_artifacts"."id" BETWEEN 1 AND 1000;
Time: 1.533 s - planning: 1.431 ms - execution: 1.532 s - I/O read: 1.500 s - I/O write: 0.000 ms Shared buffers: - hits: 664 (~5.20 MiB) from the buffer pool - reads: 979 (~7.60 MiB) from the OS file cache, including disk I/O - dirtied: 35 (~280.00 KiB) - writes: 0
Query plan: https://explain.depesz.com/s/MpWF (The subquery for
geo_node_namespace_links
was replaced with a query for thegitlab-org
namespace, since we don't have anygeo_nodes
in production)☝ SELECT "job_artifact_registry"."artifact_id" FROM "job_artifact_registry" WHERE "job_artifact_registry"."artifact_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/AhsR (Staging - There is no production environment)
☝ -
No selective sync, and object storage is synced
SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."id" BETWEEN 1 AND 1000;
Time: 2.606 ms - planning: 0.186 ms - execution: 2.420 ms - I/O read: 2.170 ms - I/O write: 0.000 ms Shared buffers: - hits: 57 (~456.00 KiB) from the buffer pool - reads: 1 (~8.00 KiB) from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
Query plan: https://explain.depesz.com/s/7Kmc
☝ -
With selective sync by shard, and object storage is synced
WITH "restricted_job_artifacts" AS ( SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."project_id" IN( SELECT "projects"."id" FROM "projects" WHERE "projects"."repository_storage" = 'default') ) SELECT "ci_job_artifacts"."id" FROM "restricted_job_artifacts" INNER JOIN "ci_job_artifacts" ON "restricted_job_artifacts"."id" = "ci_job_artifacts"."id" WHERE "ci_job_artifacts"."id" BETWEEN 1 AND 1000;
Time: 46.394 s - planning: 0.997 ms - execution: 46.393 s - I/O read: 45.365 s - I/O write: 0.000 ms Shared buffers: - hits: 1241 (~9.70 MiB) from the buffer pool - reads: 41828 (~326.80 MiB) from the OS file cache, including disk I/O - dirtied: 3827 (~29.90 MiB) - writes: 0
Query plan: https://explain.depesz.com/s/379G
☝ SELECT "job_artifact_registry"."artifact_id" FROM "job_artifact_registry" WHERE "job_artifact_registry"."artifact_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/AhsR (Staging - There is no production environment)
☝ -
With selective sync by namespace, and object storage is synced
WITH "restricted_job_artifacts" AS ( SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."project_id" IN( SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN(WITH RECURSIVE "base_and_descendants" AS (( SELECT "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links" WHERE "geo_node_namespace_links"."geo_node_id" = 2) UNION ( SELECT "namespaces"."id" FROM "namespaces", "base_and_descendants" WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces")) ) SELECT "ci_job_artifacts"."id" FROM "restricted_job_artifacts" INNER JOIN "ci_job_artifacts" ON "restricted_job_artifacts"."id" = "ci_job_artifacts"."id" WHERE "ci_job_artifacts"."id" BETWEEN 1 AND 1000;
Time: 13.622 ms - planning: 1.851 ms - execution: 11.771 ms - I/O read: 0.000 ms - I/O write: 0.000 ms Shared buffers: - hits: 1629 (~12.70 MiB) from the buffer pool - reads: 0 from the OS file cache, including disk I/O - dirtied: 0 - writes: 0
Query plan: https://explain.depesz.com/s/zbd (The subquery for geo_node_namespace_links was replaced with a query for the gitlab-org namespace, since we don't have any geo_nodes in production)
☝ SELECT "job_artifact_registry"."artifact_id" FROM "job_artifact_registry" WHERE "job_artifact_registry"."artifact_id" BETWEEN 1 AND 1000;
Query plan: https://explain.depesz.com/s/AhsR (Staging - There is no production environment)
☝
-
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team