Add read_at to dependency proxy objects

Steve Abrams requested to merge 341536-dp-read-at-for-ttl into master

🔭 What does this MR do and why?

In 14.4 we added cleanup policies to the Dependency Proxy.

They work by looking at when each Dependency Proxy file (blob or manifest) was last read, then expiring and deleting files that have not been read within the number of days (time-to-live/TTL) set in the policy.

In the initial implementation we used the updated_at column to track when these files were last read. This was not optimal because we were updating the updated_at column even though the file/record was not actually being updated.

In this MR, we:

  • Introduce a read_at column to both of the Dependency Proxy models:
    • dependency_proxy_manifests
    • dependency_proxy_blobs
  • Update the code so that reading a file updates its read_at column rather than its updated_at column
  • Update the cleanup policy (TTL) worker to check the read_at column when determining whether a file qualifies for expiration.
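
To illustrate the intended behavior, here is a minimal plain-Ruby sketch. It is not the model code from this MR: CachedFile and mark_read! are hypothetical names standing in for a Dependency Proxy blob or manifest record and for whatever call records a read.

```ruby
# Hypothetical stand-in for a Dependency Proxy blob or manifest record.
CachedFile = Struct.new(:name, :updated_at, :read_at) do
  # Record a read: only read_at moves; updated_at is left untouched,
  # because the file itself has not changed.
  def mark_read!(now = Time.now)
    self.read_at = now
    self
  end
end

created = Time.utc(2021, 11, 1)
file = CachedFile.new("alpine:latest", created, created)

# Pulling the cached file a week later counts as a read, not an update.
file.mark_read!(Time.utc(2021, 11, 8))

puts file.read_at    # the time of the read
puts file.updated_at # still the original timestamp
```

Keeping the two timestamps separate means updated_at keeps its usual meaning (the record changed), while read_at alone drives the TTL check.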

🐘 Database


Up output
== 20211105125756 AddReadAtToDependencyProxyManifests: migrating ==============
-- add_column(:dependency_proxy_manifests, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007f93a69bd610 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125756_add_read_at_to_dependency_proxy_manifests.rb:5 (lambda)>})
   -> 0.0090s
== 20211105125756 AddReadAtToDependencyProxyManifests: migrated (0.0091s) =====

== 20211105125813 AddReadAtToDependencyProxyBlobs: migrating ==================
-- add_column(:dependency_proxy_blobs, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007f93a69e6d58 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125813_add_read_at_to_dependency_proxy_blobs.rb:5 (lambda)>})
   -> 0.0024s
== 20211105125813 AddReadAtToDependencyProxyBlobs: migrated (0.0025s) =========

== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: migrating ===========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:dependency_proxy_blobs, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_read_at_id", :algorithm=>:concurrently})
   -> 0.0058s
-- execute("SET statement_timeout TO 0")
   -> 0.0008s
-- add_index(:dependency_proxy_blobs, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_read_at_id", :algorithm=>:concurrently})
   -> 0.0110s
-- execute("RESET statement_timeout")
   -> 0.0007s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:dependency_proxy_manifests, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_read_at_id", :algorithm=>:concurrently})
   -> 0.0037s
-- add_index(:dependency_proxy_manifests, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_read_at_id", :algorithm=>:concurrently})
   -> 0.0041s
-- transaction_open?()
   -> 0.0000s
-- indexes(:dependency_proxy_blobs)
   -> 0.0036s
-- remove_index(:dependency_proxy_blobs, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_blobs_on_group_id_status_and_id"})
   -> 0.0059s
-- transaction_open?()
   -> 0.0000s
-- indexes(:dependency_proxy_manifests)
   -> 0.0028s
-- remove_index(:dependency_proxy_manifests, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_manifests_on_group_id_status_and_id"})
   -> 0.0025s
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: migrated (0.0569s) ==

Down output
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: reverting ===========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:dependency_proxy_blobs, [:group_id, :status, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0040s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- add_index(:dependency_proxy_blobs, [:group_id, :status, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0031s
-- execute("RESET statement_timeout")
   -> 0.0007s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:dependency_proxy_manifests, [:group_id, :status, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0020s
-- add_index(:dependency_proxy_manifests, [:group_id, :status, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0024s
-- transaction_open?()
   -> 0.0000s
-- indexes(:dependency_proxy_blobs)
   -> 0.0027s
-- remove_index(:dependency_proxy_blobs, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_blobs_on_group_id_status_read_at_id"})
   -> 0.0085s
-- transaction_open?()
   -> 0.0000s
-- indexes(:dependency_proxy_manifests)
   -> 0.0043s
-- remove_index(:dependency_proxy_manifests, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_manifests_on_group_id_status_read_at_id"})
   -> 0.0019s
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: reverted (0.0380s) ==

== 20211105125813 AddReadAtToDependencyProxyBlobs: reverting ==================
-- remove_column(:dependency_proxy_blobs, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007f8c80b82748 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125813_add_read_at_to_dependency_proxy_blobs.rb:5 (lambda)>})
   -> 0.0156s
== 20211105125813 AddReadAtToDependencyProxyBlobs: reverted (0.0215s) =========

== 20211105125756 AddReadAtToDependencyProxyManifests: reverting ==============
-- remove_column(:dependency_proxy_manifests, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007fe5287e2fb0 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125756_add_read_at_to_dependency_proxy_manifests.rb:5 (lambda)>})
   -> 0.0028s
== 20211105125756 AddReadAtToDependencyProxyManifests: reverted (0.0047s) =====


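We change the queries in the worker to filter on read_at rather than updated_at. The indexes on (group_id, status, id) are replaced with indexes on (group_id, status, read_at, id), which further optimizes the queries compared to their original state.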


Blob Query:

SELECT "dependency_proxy_blobs".*
FROM   "dependency_proxy_blobs"
WHERE  "dependency_proxy_blobs"."group_id" = 9970
       AND "dependency_proxy_blobs"."status" = 0
       AND ( updated_at <= '2021-06-18 20:56:42.607245' );

Explain plan:

Manifest Query:

SELECT "dependency_proxy_manifests".*
FROM   "dependency_proxy_manifests"
WHERE  "dependency_proxy_manifests"."group_id" = 9970
       AND "dependency_proxy_manifests"."status" = 0
       AND ( updated_at <= '2021-06-18 20:56:42.607245' );

Explain plan:


Note: in the linked examples, I used the created_at column in the index and queries since it has realistic data that already exists. I believe this will give us the best idea of what the queries against read_at will look like once it is populated and in use.

SELECT "dependency_proxy_blobs".*
FROM   "dependency_proxy_blobs"
WHERE  "dependency_proxy_blobs"."group_id" = 9970
       AND "dependency_proxy_blobs"."status" = 0
       AND ( read_at <= '2021-06-18 20:56:42.607245' );

Explain plan:

SELECT "dependency_proxy_manifests".*
FROM   "dependency_proxy_manifests"
WHERE  "dependency_proxy_manifests"."group_id" = 9970
       AND "dependency_proxy_manifests"."status" = 0
       AND ( read_at <= '2021-06-18 20:56:42.607245' );

Explain plan:
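
The filter in the read_at queries above can be sketched in plain Ruby as follows. This is only an in-memory illustration of the WHERE clause, not the worker's code: Row, SELECTED_STATUS, and expired_rows are hypothetical names, and the cutoff is assumed to be the current time minus the policy's TTL in days.

```ruby
# Hypothetical in-memory row with the columns used in the WHERE clause.
Row = Struct.new(:id, :group_id, :status, :read_at)

# The queries filter on status = 0; the constant name is an assumption.
SELECTED_STATUS = 0

# Return the rows of a group that were last read at or before the TTL cutoff.
def expired_rows(rows, group_id:, ttl_days:, now: Time.now)
  cutoff = now - ttl_days * 24 * 60 * 60
  rows.select do |row|
    row.group_id == group_id &&
      row.status == SELECTED_STATUS &&
      row.read_at <= cutoff
  end
end

now = Time.utc(2021, 11, 10)
rows = [
  Row.new(1, 9970, 0, Time.utc(2021, 6, 18)),     # read long ago: expires
  Row.new(2, 9970, 0, Time.utc(2021, 11, 9, 12)), # read after the cutoff: kept
  Row.new(3, 9970, 1, Time.utc(2021, 6, 18)),     # different status: ignored
  Row.new(4, 1234, 0, Time.utc(2021, 6, 18))      # different group: ignored
]

expired = expired_rows(rows, group_id: 9970, ttl_days: 1, now: now)
puts expired.map(&:id).inspect # prints [1]
```

The equality checks on group_id and status come first and the range check on read_at comes last, which is the same column order as the new index.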

Screenshots or screen recordings

These are backend changes, so apart from validating locally in the rails console as described below, there is nothing to show in a screenshot.

💻 How to set up and validate locally

To check the read_at updates:

  1. Follow these docs to set up the Dependency Proxy on your GDK.

  2. Create a group and navigate to Packages & Registries -> Dependency Proxy to find the image prefix.

  3. Log into the Dependency Proxy using a PAT:

    docker login gdk.test:3000
    username: root
    password: <personal_access_token>
  4. Pull an image through the dependency proxy:

    # use your image prefix, it should look like
    docker pull gdk.test:3000/<group_path>/dependency_proxy/containers/alpine:latest
  5. In the rails console, check the read_at and updated_at values of the last blob and the last manifest.
  6. Use docker images to find the IMAGE ID of the image you pulled and then remove it from your local machine's cache:

    docker rmi -f 14119a10abf4
  7. Pull the image again. This time, you are pulling the cached image.

  8. Check the read_at and updated_at values again. The read_at values should have changed, but the updated_at values should remain the same.

To check the worker:

  1. In the rails console, update your group's TTL policy to have a short expiration:
    Group.last.dependency_proxy_image_ttl_policy.update(ttl: 1, enabled: true)
  2. Update your Dependency Proxy objects to have a read_at older than 1 day ago:
    DependencyProxy::Manifest.update_all(read_at: 5.days.ago)
    DependencyProxy::Blob.update_all(read_at: 5.days.ago)
  3. Run the worker
  4. Wait for the worker to complete (this worker kicks off other workers, so make sure all jobs finish).
  5. All of the objects you updated should have been deleted: DependencyProxy::Manifest.all => []

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #341536 (closed)

Edited by Steve Abrams

