Add API to bulk delete project artifacts
What does this MR do and why?
Adds a new API to delete project build artifacts in bulk.
DELETE /projects/:id/artifacts
Project maintainers can call this API to remove all existing erasable job artifacts from non-locked pipelines.
Some keywords here are:
- erasable job artifacts - all job artifacts except
trace
- non-locked pipelines - pipelines that are not the latest pipeline of each ref
This MR also adds a background migration to add index on ci_job_artifacts
on id
, project_id
and file_type
.
Screenshots or screen recordings
These are strongly recommended to assist reviewers and reduce the time to merge your change.
Dependencies
How to set up and validate locally
- Start with FF
:bulk_expire_project_artifacts
disabled - Create a new project
- Run a
main
branch pipeline that creates an artifact (pipeline1) - Add a new commit on
main
branch, run a new pipeline onmain
that creates an artifact (pipeline2) - Verify that pipeline1 artifacts are
unknown?
and expire_at is in the future based on the default (30 days) - Verify that pipeline2 artifacts are
unknown?
and expire_at is in the future based on the default (30 days) - Add a tag on
main
branch, creating a pipeline for the tag with artifacts (pipeline_tagged) - Verify that pipeline_tagged artifacts are
unknown?
and expire_at is in the future based on the default (30 days) - Call API
DELETE /projects/:id/artifacts
, verify 404 response - Enable FF
:bulk_expire_project_artifacts
- Call API
DELETE /projects/:id/artifacts
, verify 202 response, leaving some time for the worker to complete. - Verify that pipeline1 artifacts that are not
trace
areunlocked?
and expire_at is the current time - Verify that pipeline1 artifacts that are
trace
remains unchanged. - Verify that pipeline2 and pipeline_tagged artifacts are unchanged
- Run
Ci::ExpireBuildArtifactsWorker
- Wait for the next run of
Ci::DeleteObjectsWorker
- Verify that pipeline1 artifacts that have been expired are removed
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Feature flags
-
bulk_expire_project_artifacts
- roll out issue: #347405 (closed)
Database queries
Migration:
CREATE INDEX index_ci_job_artifacts_on_id_project_id_and_file_type ON ci_job_artifacts USING btree (project_id, file_type, id);
SELECT query:
SELECT "id"
FROM (WITH RECURSIVE "array_cte"
AS MATERIALIZED (SELECT "file_type" FROM (VALUES (1), (2), (4), (5), (6), (7), (8), (9), (101), (11), (12), (13), (14), (15), (16), (17), (18), (19), (20), (21), (22), (23), (24), (25), (26), (27)) AS file_types (file_type)),
"recursive_keyset_cte" AS ((SELECT NULL::bigint AS id, array_cte_file_type_array, ci_job_artifacts_id_array, 0::bigint AS count FROM (SELECT ARRAY_AGG("array_cte"."file_type") AS array_cte_file_type_array, ARRAY_AGG("ci_job_artifacts"."id") AS ci_job_artifacts_id_array FROM (SELECT "array_cte"."file_type" FROM array_cte) array_cte LEFT JOIN LATERAL (SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."project_id" = 26 AND "ci_job_artifacts"."file_type" = "array_cte"."file_type" ORDER BY "ci_job_artifacts"."id" ASC LIMIT 1) ci_job_artifacts ON TRUE WHERE "ci_job_artifacts"."id" IS NOT NULL) array_scope_lateral_query LIMIT 1)
UNION ALL
(
SELECT recursive_keyset_cte.ci_job_artifacts_id_array[position], array_cte_file_type_array, recursive_keyset_cte.ci_job_artifacts_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.ci_job_artifacts_id_array[position_query.position+1:], recursive_keyset_cte.count + 1
FROM recursive_keyset_cte,
LATERAL (SELECT id, position FROM UNNEST(ci_job_artifacts_id_array) WITH ORDINALITY AS u(id, position) WHERE id IS NOT NULL ORDER BY 1 ASC LIMIT 1) AS position_query,
LATERAL (SELECT "record"."id" FROM (VALUES (NULL)) AS nulls LEFT JOIN (SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."project_id" = 278964 AND "ci_job_artifacts"."file_type" = recursive_keyset_cte.array_cte_file_type_array[position] AND ("ci_job_artifacts"."id" > recursive_keyset_cte.ci_job_artifacts_id_array[position]) ORDER BY "ci_job_artifacts"."id" ASC LIMIT 1) record ON TRUE LIMIT 1) AS next_cursor_values
))
SELECT id
FROM "recursive_keyset_cte" AS "ci_job_artifacts"
WHERE (count <> 0)) ci_job_artifacts LIMIT 1000
Query plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/7797/commands/27802
Update query:
UPDATE "ci_job_artifacts"
SET "locked" = 0,
"expire_at" = '2021 - 12 - 28 10:48:09.406304'
WHERE "ci_job_artifacts"."id" IN (SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" INNER JOIN "ci_builds" ON "ci_builds"."id" = "ci_job_artifacts"."job_id"
AND "ci_builds"."type" = 'Ci::Build' INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "ci_builds"."commit_id" WHERE "ci_pipelines"."locked" = 0 AND "ci_job_artifacts"."id" IN (13263, 13264, 13265, 13266, 13267, 13268, 13269, 13270, 13271, 13272, 13368, 13369, 13370, 13371, 13372, 13373, 13374, 13375, 13376, 13377))
Query plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/7797/commands/27803
Related to #223793 (closed)