Add package file cleanup jobs
## What does this MR do and why?
This is MR 2 of #345755 (closed), in which we are adding a scalable way to delete package files.
In MR 1 (!76767 (merged)) we added the `PackageFile#status` enum attribute to allow us to mark package files to be destroyed asynchronously with a `pending_destruction` status. For more context please look at the comprehensive description in that MR.
This second MR builds upon that work and:
- Adds the `processing` and `error` values to the `status` enum, which are useful to track the async destruction status in the limited capacity cleanup worker (explained further below).
- Adds a cron job and a limited capacity worker to delete package files marked for destruction.
  - The `CleanupPackageRegistryWorker` cron job will run twice a day. It will check if there are any package files pending destruction, in which case it will enqueue the limited capacity worker.
  - The `CleanupPackageFileWorker` limited capacity worker will pick up and destroy package files that are pending destruction one by one until all have been deleted. A couple of important things with this worker are:
    - We are using the new `processing` and `error` package file statuses for monitoring and more robust processing and error handling: if a package file fails to be destroyed for some reason, we mark it with the `error` status and move on.
    - We are logging some statistics so that we can build a Kibana dashboard to monitor the following things at any time:
      - How many package files are pending destruction.
      - How many package files are being processed for destruction.
      - How many package files have failed to be destroyed.
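The per-file error handling described above can be sketched in plain, self-contained Ruby. This is an illustrative model, not GitLab's actual worker code: the `PackageFile` struct, its `destroy!` method, and the `broken` flag are stand-ins, while the status transitions mirror this MR's description.

```ruby
# Self-contained model of the cleanup loop described above: each pending
# file is marked "processing", then destroyed; on failure it is marked
# "error" instead of aborting the whole batch. Illustrative only.
PackageFile = Struct.new(:name, :status, :broken) do
  def destroy!
    raise "object storage hiccup" if broken

    self.status = :destroyed
  end
end

def cleanup(files)
  files.each do |file|
    next unless file.status == :pending_destruction

    file.status = :processing
    begin
      file.destroy!
    rescue StandardError
      # Mark the failure for later inspection and move on.
      file.status = :error
    end
  end
end

files = [
  PackageFile.new("destroy-1.txt", :pending_destruction, false),
  PackageFile.new("destroy-2.txt", :pending_destruction, true),
  PackageFile.new("destroy-3.txt", :default, false)
]
cleanup(files)
files.map(&:status) # => [:destroyed, :error, :default]
```

The key design point is that a single failing file cannot stall the whole cleanup run, and the `error` status leaves a visible trail for the monitoring dashboard.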
It's worth noting that these new jobs are heavily based on the Dependency Proxy cleanup workers, which are already working in Production. So, this should provide some extra confidence in this implementation.
It's also worth mentioning a couple of improvements that may be follow-up issues:

- We could re-use the existing logic in the `DependencyProxy::CleanupWorker` concern instead of duplicating most of it for the `CleanupPackageFileWorker`. I've added a `CONSIDER` code comment about this. ⚡ Update: Addressing backend review feedback, I've now done this refactoring here.
- We could add a way to handle package files left with an `error` status. Right now, this is also something pending for the Dependency Proxy cleanup workers.
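The first follow-up above (sharing the cleanup logic via a concern) could look roughly like the sketch below. The module and class names are hypothetical, not GitLab's code, and the `while` loop is a simplification: real limited-capacity workers re-enqueue themselves rather than looping in-process.

```ruby
# Hypothetical sketch of extracting a shared cleanup loop into a mixin,
# where each worker supplies its own next_item / process implementation.
module CleanupWorker
  def perform
    while (item = next_item)
      process(item)
    end
  end
end

class PackageFileCleaner
  include CleanupWorker

  attr_reader :destroyed

  def initialize(files)
    @files = files
    @destroyed = []
  end

  private

  # Pop the next file pending destruction (here: just a queue).
  def next_item
    @files.shift
  end

  # Destroy the file (here: simply record it as destroyed).
  def process(item)
    @destroyed << item
  end
end

cleaner = PackageFileCleaner.new(%w[destroy-1.txt destroy-2.txt])
cleaner.perform
cleaner.destroyed # => ["destroy-1.txt", "destroy-2.txt"]
```

With this shape, the Dependency Proxy and package file workers would only differ in their `next_item`/`process` definitions.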
## Screenshots or screen recordings

N/A
## How to set up and validate locally
- Set up a project to use its Package Registry.
- Publish a bunch of generic package files:

  ```shell
  echo 'destroy all the things' > destroy.txt
  curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-1.txt"
  curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-2.txt"
  curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-3.txt"
  curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-4.txt"
  curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-5.txt"
  ```
- In a Rails console session, mark the package files as `pending_destruction` and run the cleanup cron job. The cron job will enqueue the limited capacity job to delete all package files. When using GDK, the `rails-background-jobs` service should pick up and run the enqueued job (this can be monitored locally with `gdk tail rails-background-jobs`). After the enqueued job is run, you can confirm that the package files have been destroyed.

  ```ruby
  package_files = Packages::Package.generic.with_name('destroy').with_version('0.0.1').last.package_files
  package_files.count # => 5
  package_files.update_all(status: :pending_destruction) # => 5
  Packages::CleanupPackageRegistryWorker.new.perform
  package_files.count # => 0
  ```
## Database review

### Migration up

```
== 20211224112937 AddPackagesCleanupPackageFileWorkerCapacityToApplicationSettings: migrating
-- add_column(:application_settings, :packages_cleanup_package_file_worker_capacity, :smallint, {:default=>2, :null=>false})
   -> 0.0028s
== 20211224112937 AddPackagesCleanupPackageFileWorkerCapacityToApplicationSettings: migrated (0.0029s)

== 20211224114539 AddPackagesCleanupPackageFileWorkerCapacityCheckConstraintToAppSettings: migrating
-- transaction_open?()
   -> 0.0000s
-- current_schema()
   -> 0.0003s
-- transaction_open?()
   -> 0.0000s
-- execute("ALTER TABLE application_settings\nADD CONSTRAINT app_settings_p_cleanup_package_file_worker_capacity_positive\nCHECK ( packages_cleanup_package_file_worker_capacity >= 0 )\nNOT VALID;\n")
   -> 0.0010s
-- current_schema()
   -> 0.0003s
-- execute("SET statement_timeout TO 0")
   -> 0.0003s
-- execute("ALTER TABLE application_settings VALIDATE CONSTRAINT app_settings_p_cleanup_package_file_worker_capacity_positive;")
   -> 0.0008s
-- execute("RESET statement_timeout")
   -> 0.0003s
== 20211224114539 AddPackagesCleanupPackageFileWorkerCapacityCheckConstraintToAppSettings: migrated (0.0119s)

== 20220114131950 AddStatusOnlyIndexToPackagesPackageFiles: migrating =========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:packages_package_files, :status, {:name=>"index_packages_package_files_on_status", :algorithm=>:concurrently})
   -> 0.0065s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- add_index(:packages_package_files, :status, {:name=>"index_packages_package_files_on_status", :algorithm=>:concurrently})
   -> 0.0048s
-- execute("RESET statement_timeout")
   -> 0.0009s
== 20220114131950 AddStatusOnlyIndexToPackagesPackageFiles: migrated (0.0150s)
```
### Migration down

```
== 20211224114539 AddPackagesCleanupPackageFileWorkerCapacityCheckConstraintToAppSettings: reverting
-- transaction_open?()
   -> 0.0000s
-- transaction_open?()
   -> 0.0000s
-- execute("ALTER TABLE application_settings\nDROP CONSTRAINT IF EXISTS app_settings_p_cleanup_package_file_worker_capacity_positive\n")
   -> 0.0014s
== 20211224114539 AddPackagesCleanupPackageFileWorkerCapacityCheckConstraintToAppSettings: reverted (0.0089s)

== 20211224112937 AddPackagesCleanupPackageFileWorkerCapacityToApplicationSettings: reverting
-- remove_column(:application_settings, :packages_cleanup_package_file_worker_capacity, :smallint, {:default=>2, :null=>false})
   -> 0.0020s
== 20211224112937 AddPackagesCleanupPackageFileWorkerCapacityToApplicationSettings: reverted (0.0044s)

== 20220114131950 AddStatusOnlyIndexToPackagesPackageFiles: reverting =========
-- transaction_open?()
   -> 0.0000s
-- indexes(:packages_package_files)
   -> 0.0051s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:packages_package_files, {:algorithm=>:concurrently, :name=>"index_packages_package_files_on_status"})
   -> 0.0032s
-- execute("RESET statement_timeout")
   -> 0.0005s
== 20220114131950 AddStatusOnlyIndexToPackagesPackageFiles: reverted (0.0112s)
```
## Explain plans
Similar to !76767 (merged), for the explain plans I've used a package from gitlab.com which has ~26K+ package files.
I've also run the following to set up the existing data before each query. This will update the status of the package files for my target package to a random value of `0` (default) or `1` (pending destruction).

```
exec UPDATE packages_package_files SET status = floor(random() * 2) WHERE package_id = XXX;
exec VACUUM ANALYZE packages_package_files;
```
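For reference, the integer-to-status mapping implied by this setup can be written out as a small Ruby hash. The values `0` and `1` are stated above; the integers for `processing` and `error` are my assumption, included purely for illustration.

```ruby
# Status integers behind the SQL above: 0 (default) and 1 (pending
# destruction) are stated in this MR; the processing/error values are
# assumed here for illustration and may differ in the real enum.
STATUSES = { default: 0, pending_destruction: 1, processing: 2, error: 3 }.freeze

# floor(random() * 2) yields 0 or 1, i.e. one of these two statuses:
[0, 1].map { |i| STATUSES.key(i) } # => [:default, :pending_destruction]
```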
- `app/workers/concerns/cleanup_artifact_worker.rb` > `#remaining_work_count`
- `app/workers/concerns/cleanup_artifact_worker.rb` > `#next_item`
- `app/workers/packages/cleanup_package_registry_worker.rb` > `Packages::PackageFile.pending_destruction.exists?`
- `app/workers/packages/cleanup_package_registry_worker.rb` > `Packages::PackageFile.pending_destruction.count`
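The two `pending_destruction` queries above boil down to an existence check and a count. A plain-Ruby analogue is sketched below; the real code uses an ActiveRecord scope on `Packages::PackageFile`, so this only illustrates the semantics the cron worker relies on.

```ruby
# Plain-Ruby analogue of the two worker queries listed above:
# an exists? check (should we enqueue the limited capacity worker?)
# and a count (logged for the Kibana dashboard). Illustrative only.
files = [
  { name: "destroy-1.txt", status: :pending_destruction },
  { name: "destroy-2.txt", status: :default },
  { name: "destroy-3.txt", status: :pending_destruction }
]

pending = files.select { |f| f[:status] == :pending_destruction }
pending.any? # => true (there is work, so enqueue the worker)
pending.size # => 2    (pending-destruction count for monitoring)
```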
## MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- [x] I have evaluated the MR acceptance checklist for this MR.
Related to #345755 (closed)