Add read_at to dependency proxy objects
🔭 What does this MR do and why?
In 14.4 we added cleanup policies to the Dependency Proxy.
They work by looking at when a Dependency Proxy file (blob or manifest) was last read, then expiring and deleting them depending on the number of days (time-to-live/TTL) in the policy.
In the initial implementation we used the updated_at column to track when these files were last read. This was not optimal because we were updating the updated_at column even though the file/record was not actually being updated.
In this MR, we:
- Introduce a read_at column to both of the Dependency Proxy models:
  - dependency_proxy_manifests
  - dependency_proxy_blobs
- Update the code so the read_at column, rather than the updated_at column, gets updated when the files are read.
- Update the cleanup policy (TTL) worker to check the read_at column when determining if a file qualifies for expiration.
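The idea behind these changes can be sketched in plain Ruby. This is only an illustration, not the code in this MR: ProxyFile, read! and expired? are hypothetical names, and the struct is an in-memory stand-in for the real models. It shows a read moving read_at while leaving updated_at alone, and a TTL check based on read_at.

```ruby
# Hypothetical in-memory stand-in for a Dependency Proxy record (not the real model).
ProxyFile = Struct.new(:name, :read_at, :updated_at, keyword_init: true) do
  # Record a read: only read_at moves, updated_at is left untouched.
  def read!(now = Time.now.utc)
    self.read_at = now
    self
  end

  # A file qualifies for expiration when its last read is at least ttl_days old.
  def expired?(ttl_days, now = Time.now.utc)
    read_at <= now - (ttl_days * 24 * 60 * 60)
  end
end

created = Time.utc(2021, 11, 1)
blob = ProxyFile.new(name: "alpine-blob", read_at: created, updated_at: created)

now = Time.utc(2021, 11, 8)
puts blob.expired?(5, now)        # true: last read 7 days ago, TTL is 5 days
blob.read!(now)                   # simulate pulling the cached file
puts blob.expired?(5, now)        # false: the file was just read
puts blob.updated_at == created   # true: the read did not touch updated_at
```

Keeping the two timestamps separate means updated_at continues to reflect real changes to the record, while the TTL check only looks at reads.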
🐘 Database
Migrations
Up output
== 20211105125756 AddReadAtToDependencyProxyManifests: migrating ==============
-- add_column(:dependency_proxy_manifests, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007f93a69bd610 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125756_add_read_at_to_dependency_proxy_manifests.rb:5 (lambda)>})
-> 0.0090s
== 20211105125756 AddReadAtToDependencyProxyManifests: migrated (0.0091s) =====
== 20211105125813 AddReadAtToDependencyProxyBlobs: migrating ==================
-- add_column(:dependency_proxy_blobs, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007f93a69e6d58 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125813_add_read_at_to_dependency_proxy_blobs.rb:5 (lambda)>})
-> 0.0024s
== 20211105125813 AddReadAtToDependencyProxyBlobs: migrated (0.0025s) =========
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: migrating ===========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:dependency_proxy_blobs, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_read_at_id", :algorithm=>:concurrently})
-> 0.0058s
-- execute("SET statement_timeout TO 0")
-> 0.0008s
-- add_index(:dependency_proxy_blobs, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_read_at_id", :algorithm=>:concurrently})
-> 0.0110s
-- execute("RESET statement_timeout")
-> 0.0007s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:dependency_proxy_manifests, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_read_at_id", :algorithm=>:concurrently})
-> 0.0037s
-- add_index(:dependency_proxy_manifests, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_read_at_id", :algorithm=>:concurrently})
-> 0.0041s
-- transaction_open?()
-> 0.0000s
-- indexes(:dependency_proxy_blobs)
-> 0.0036s
-- remove_index(:dependency_proxy_blobs, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_blobs_on_group_id_status_and_id"})
-> 0.0059s
-- transaction_open?()
-> 0.0000s
-- indexes(:dependency_proxy_manifests)
-> 0.0028s
-- remove_index(:dependency_proxy_manifests, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_manifests_on_group_id_status_and_id"})
-> 0.0025s
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: migrated (0.0569s) ==
Down output
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: reverting ===========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:dependency_proxy_blobs, [:group_id, :status, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_and_id", :algorithm=>:concurrently})
-> 0.0040s
-- execute("SET statement_timeout TO 0")
-> 0.0006s
-- add_index(:dependency_proxy_blobs, [:group_id, :status, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_and_id", :algorithm=>:concurrently})
-> 0.0031s
-- execute("RESET statement_timeout")
-> 0.0007s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:dependency_proxy_manifests, [:group_id, :status, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_and_id", :algorithm=>:concurrently})
-> 0.0020s
-- add_index(:dependency_proxy_manifests, [:group_id, :status, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_and_id", :algorithm=>:concurrently})
-> 0.0024s
-- transaction_open?()
-> 0.0000s
-- indexes(:dependency_proxy_blobs)
-> 0.0027s
-- remove_index(:dependency_proxy_blobs, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_blobs_on_group_id_status_read_at_id"})
-> 0.0085s
-- transaction_open?()
-> 0.0000s
-- indexes(:dependency_proxy_manifests)
-> 0.0043s
-- remove_index(:dependency_proxy_manifests, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_manifests_on_group_id_status_read_at_id"})
-> 0.0019s
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: reverted (0.0380s) ==
== 20211105125813 AddReadAtToDependencyProxyBlobs: reverting ==================
-- remove_column(:dependency_proxy_blobs, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007f8c80b82748 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125813_add_read_at_to_dependency_proxy_blobs.rb:5 (lambda)>})
-> 0.0156s
== 20211105125813 AddReadAtToDependencyProxyBlobs: reverted (0.0215s) =========
== 20211105125756 AddReadAtToDependencyProxyManifests: reverting ==============
-- remove_column(:dependency_proxy_manifests, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007fe5287e2fb0 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125756_add_read_at_to_dependency_proxy_manifests.rb:5 (lambda)>})
-> 0.0028s
== 20211105125756 AddReadAtToDependencyProxyManifests: reverted (0.0047s) =====
Queries
We change the query in the worker to use read_at rather than updated_at. Updating the index further optimizes the queries from their original state.
Before
Blob Query:
SELECT "dependency_proxy_blobs".*
FROM "dependency_proxy_blobs"
WHERE "dependency_proxy_blobs"."group_id" = 9970
AND "dependency_proxy_blobs"."status" = 0
AND ( updated_at <= '2021-06-18 20:56:42.607245' );
Explain plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7173/commands/25368
Manifest Query:
SELECT "dependency_proxy_manifests".*
FROM "dependency_proxy_manifests"
WHERE "dependency_proxy_manifests"."group_id" = 9970
AND "dependency_proxy_manifests"."status" = 0
AND ( updated_at <= '2021-06-18 20:56:42.607245' );
Explain plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7173/commands/25369
After
Note: in the linked postgres.ai examples, I used the created_at column in the index and queries since it has realistic data that already exists. I believe this will give us the best idea of what the queries against read_at will look like once it is populated and in use.
Blob Query:
SELECT "dependency_proxy_blobs".*
FROM "dependency_proxy_blobs"
WHERE "dependency_proxy_blobs"."group_id" = 9970
AND "dependency_proxy_blobs"."status" = 0
AND ( read_at <= '2021-06-18 20:56:42.607245' );
Explain plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7198/commands/25440
Manifest Query:
SELECT "dependency_proxy_manifests".*
FROM "dependency_proxy_manifests"
WHERE "dependency_proxy_manifests"."group_id" = 9970
AND "dependency_proxy_manifests"."status" = 0
AND ( read_at <= '2021-06-18 20:56:42.607245' );
Explain plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/7198/commands/25441
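To make the filter in the "after" queries concrete, here is a small in-memory sketch in plain Ruby. It is not the worker's code: Row and expirable_rows are hypothetical names, and the array stands in for the table. It applies the same three conditions as the SQL (group_id, status, and read_at at or before a cutoff).

```ruby
# Hypothetical row shape; the real tables have more columns.
Row = Struct.new(:id, :group_id, :status, :read_at, keyword_init: true)

# Select rows for one group and status whose last read is at or before the cutoff,
# mirroring: group_id = ? AND status = ? AND read_at <= ?
def expirable_rows(rows, group_id:, cutoff:, status: 0)
  rows.select do |row|
    row.group_id == group_id && row.status == status && row.read_at <= cutoff
  end
end

cutoff = Time.utc(2021, 6, 18, 20, 56, 42)
rows = [
  Row.new(id: 1, group_id: 9970, status: 0, read_at: Time.utc(2021, 6, 1)), # matches
  Row.new(id: 2, group_id: 9970, status: 0, read_at: Time.utc(2021, 7, 1)), # read after cutoff
  Row.new(id: 3, group_id: 9970, status: 1, read_at: Time.utc(2021, 6, 1)), # different status
  Row.new(id: 4, group_id: 1,    status: 0, read_at: Time.utc(2021, 6, 1))  # different group
]

p expirable_rows(rows, group_id: 9970, cutoff: cutoff).map(&:id) # => [1]
```

The conditions follow the leading columns of the new index (group_id, status, read_at), which is why including read_at in the index helps this query.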
Screenshots or screen recordings
These are backend changes, so beyond validating locally in the rails console as described below, there is nothing to include in a screenshot.
💻 How to set up and validate locally
To check the read_at updates:
- Follow these docs to set up the Dependency Proxy on your GDK.
- Create a group and navigate to Packages & Registries -> Dependency Proxy to find the image prefix.
- Log into the Dependency Proxy using a PAT:
    docker login gdk.test:3000
    username: root
    password: <personal_access_token>
- Pull an image through the dependency proxy:
    # use your image prefix, it should look like this:
    docker pull gdk.test:3000/<group_path>/dependency_proxy/containers/alpine:latest
- In the rails console, check the read_at and updated_at values:
    DependencyProxy::Manifest.select(:read_at, :updated_at).last
    DependencyProxy::Blob.select(:read_at, :updated_at).last
- Use docker images to find the IMAGE ID of the image you pulled and then remove it from your local machine's cache:
    docker rmi -f 14119a10abf4
- Pull the image again. This time, you are pulling the cached image.
- Check the read_at and updated_at values again. The read_at values should have changed, but the updated_at values should remain the same.
To check the worker:
- In the rails console, update your group's TTL policy to have a short expiration:
    Group.last.dependency_proxy_image_ttl_policy.update(ttl: 1, enabled: true)
- Update your Dependency Proxy objects to have a read_at older than 1 day ago:
    DependencyProxy::Manifest.update_all(read_at: 5.days.ago)
    DependencyProxy::Blob.update_all(read_at: 5.days.ago)
- Run the worker:
    DependencyProxy::ImageTtlGroupPolicyWorker.perform_in(1.second)
- Wait for the worker to complete (this worker kicks off other workers, so make sure all jobs finish).
- All of the objects you updated should have been deleted:
    DependencyProxy::Manifest.all
    => []
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #341536 (closed)