
Add read_at to dependency proxy objects

Steve Abrams requested to merge 341536-dp-read-at-for-ttl into master

🔭 What does this MR do and why?

In 14.4 we added cleanup policies to the Dependency Proxy.

They work by checking when each Dependency Proxy file (blob or manifest) was last read. Files that have not been read within the number of days set in the policy (the time-to-live, or TTL) are expired and then deleted.
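Here is a minimal plain-Ruby sketch of the TTL rule. It assumes the policy's TTL is a whole number of days, and that a file expires once its last read falls at or before `now - ttl` days. That matches the `<=` comparison in the worker queries under Queries. The `expired?` helper and `SECONDS_PER_DAY` constant are hypothetical and are not part of GitLab.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: true when a file last read at `read_at` has outlived
# a TTL of `ttl_days` days as of `now`.
def expired?(read_at, ttl_days, now: Time.now)
  cutoff = now - ttl_days * SECONDS_PER_DAY # reads at or before this moment are stale
  read_at <= cutoff
end

now = Time.utc(2021, 11, 8, 12, 0, 0)
expired?(now - 5 * SECONDS_PER_DAY, 1, now: now) # => true  (read 5 days ago, TTL 1 day)
expired?(now - 3600, 1, now: now)                # => false (read an hour ago)
```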

In the initial implementation, we used the updated_at column to track when these files were last read. This was not ideal: every read bumped updated_at even though the file and its record were not actually modified.

In this MR, we:

  • Introduce a read_at column to both of the Dependency Proxy models:
    • dependency_proxy_manifests
    • dependency_proxy_blobs
  • Update the code so that reading a file updates the read_at column instead of the updated_at column
  • Update the cleanup policy (TTL) worker to check the read_at column when determining whether a file qualifies for expiration.
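The split between "last read" and "last modified" can be shown with a plain-Ruby stand-in for a Dependency Proxy record. The `CachedFile` struct and `record_read!` method are hypothetical. For reference, in ActiveRecord, `update_column(:read_at, Time.zone.now)` writes a single column without touching updated_at, while `touch(:read_at)` also bumps updated_at. This sketch does not claim which call the MR uses.

```ruby
# Hypothetical stand-in for a dependency_proxy_blobs / dependency_proxy_manifests row.
CachedFile = Struct.new(:read_at, :updated_at) do
  # Record a cache hit: only read_at moves, so updated_at still reflects
  # the last real change to the record.
  def record_read!(now = Time.now)
    self.read_at = now
    self
  end
end

created = Time.utc(2021, 11, 1)
blob = CachedFile.new(created, created)
blob.record_read!(Time.utc(2021, 11, 8))

blob.read_at    # => 2021-11-08 00:00:00 UTC
blob.updated_at # => 2021-11-01 00:00:00 UTC
```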

🐘 Database

Migrations

Up output
== 20211105125756 AddReadAtToDependencyProxyManifests: migrating ==============
-- add_column(:dependency_proxy_manifests, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007f93a69bd610 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125756_add_read_at_to_dependency_proxy_manifests.rb:5 (lambda)>})
   -> 0.0090s
== 20211105125756 AddReadAtToDependencyProxyManifests: migrated (0.0091s) =====

== 20211105125813 AddReadAtToDependencyProxyBlobs: migrating ==================
-- add_column(:dependency_proxy_blobs, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007f93a69e6d58 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125813_add_read_at_to_dependency_proxy_blobs.rb:5 (lambda)>})
   -> 0.0024s
== 20211105125813 AddReadAtToDependencyProxyBlobs: migrated (0.0025s) =========

== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: migrating ===========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:dependency_proxy_blobs, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_read_at_id", :algorithm=>:concurrently})
   -> 0.0058s
-- execute("SET statement_timeout TO 0")
   -> 0.0008s
-- add_index(:dependency_proxy_blobs, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_read_at_id", :algorithm=>:concurrently})
   -> 0.0110s
-- execute("RESET statement_timeout")
   -> 0.0007s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:dependency_proxy_manifests, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_read_at_id", :algorithm=>:concurrently})
   -> 0.0037s
-- add_index(:dependency_proxy_manifests, [:group_id, :status, :read_at, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_read_at_id", :algorithm=>:concurrently})
   -> 0.0041s
-- transaction_open?()
   -> 0.0000s
-- indexes(:dependency_proxy_blobs)
   -> 0.0036s
-- remove_index(:dependency_proxy_blobs, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_blobs_on_group_id_status_and_id"})
   -> 0.0059s
-- transaction_open?()
   -> 0.0000s
-- indexes(:dependency_proxy_manifests)
   -> 0.0028s
-- remove_index(:dependency_proxy_manifests, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_manifests_on_group_id_status_and_id"})
   -> 0.0025s
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: migrated (0.0569s) ==
Down output
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: reverting ===========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:dependency_proxy_blobs, [:group_id, :status, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0040s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- add_index(:dependency_proxy_blobs, [:group_id, :status, :id], {:name=>"index_dependency_proxy_blobs_on_group_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0031s
-- execute("RESET statement_timeout")
   -> 0.0007s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:dependency_proxy_manifests, [:group_id, :status, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0020s
-- add_index(:dependency_proxy_manifests, [:group_id, :status, :id], {:name=>"index_dependency_proxy_manifests_on_group_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0024s
-- transaction_open?()
   -> 0.0000s
-- indexes(:dependency_proxy_blobs)
   -> 0.0027s
-- remove_index(:dependency_proxy_blobs, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_blobs_on_group_id_status_read_at_id"})
   -> 0.0085s
-- transaction_open?()
   -> 0.0000s
-- indexes(:dependency_proxy_manifests)
   -> 0.0043s
-- remove_index(:dependency_proxy_manifests, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_manifests_on_group_id_status_read_at_id"})
   -> 0.0019s
== 20211108203248 UpdateDependencyProxyIndexesWithReadAt: reverted (0.0380s) ==

== 20211105125813 AddReadAtToDependencyProxyBlobs: reverting ==================
-- remove_column(:dependency_proxy_blobs, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007f8c80b82748 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125813_add_read_at_to_dependency_proxy_blobs.rb:5 (lambda)>})
   -> 0.0156s
== 20211105125813 AddReadAtToDependencyProxyBlobs: reverted (0.0215s) =========

== 20211105125756 AddReadAtToDependencyProxyManifests: reverting ==============
-- remove_column(:dependency_proxy_manifests, :read_at, :datetime_with_timezone, {:null=>false, :default=>#<Proc:0x00007fe5287e2fb0 /Users/steveabrams/workspace/gdk-ee/gitlab/db/migrate/20211105125756_add_read_at_to_dependency_proxy_manifests.rb:5 (lambda)>})
   -> 0.0028s
== 20211105125756 AddReadAtToDependencyProxyManifests: reverted (0.0047s) =====

Queries

We change the worker queries to filter on read_at rather than updated_at. Replacing the existing (group_id, status, id) indexes with (group_id, status, read_at, id) indexes also makes the queries more efficient than they were originally.
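The column order (group_id, status, read_at, id) matters for a simple reason. A B-tree index keeps its entries sorted by those columns. All rows for one group and status with read_at at or before the cutoff therefore form one contiguous run at the start of that group/status block, and that run can be range-scanned instead of filtering every row in the group. The plain-Ruby sketch below models the index as a sorted array of key tuples and finds the run with binary search. It only illustrates the access pattern, not how PostgreSQL implements B-trees, and `build_index` and `range_scan` are hypothetical names.

```ruby
# Each entry mimics an index key: [group_id, status, read_at, id].
# read_at is an Integer here (for example a Unix timestamp) to keep things small.
def build_index(rows)
  rows.map { |r| [r[:group_id], r[:status], r[:read_at], r[:id]] }.sort
end

# Ids matching: group_id = g AND status = s AND read_at <= cutoff.
def range_scan(index, group_id, status, cutoff)
  # Binary search for the first key whose (group_id, status) prefix is >= the target.
  start = index.bsearch_index { |key| (key[0..1] <=> [group_id, status]) >= 0 } || index.size
  ids = []
  index.drop(start).each do |g, s, read_at, id|
    # Keys are sorted, so the first mismatch means we have left the contiguous run.
    break if g != group_id || s != status || read_at > cutoff
    ids << id
  end
  ids
end

rows = [
  { id: 1, group_id: 9970, status: 0, read_at: 100 },
  { id: 2, group_id: 9970, status: 0, read_at: 500 },
  { id: 3, group_id: 9970, status: 1, read_at: 50 },
  { id: 4, group_id: 42,   status: 0, read_at: 10 },
  { id: 5, group_id: 9970, status: 0, read_at: 200 }
]
index = build_index(rows)
range_scan(index, 9970, 0, 300) # => [1, 5]
```

The scan stops at the first key past the cutoff rather than examining the rest of the group's rows. That is the benefit of placing read_at directly after the equality columns.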

Before

Blob Query:

SELECT "dependency_proxy_blobs".*
FROM   "dependency_proxy_blobs"
WHERE  "dependency_proxy_blobs"."group_id" = 9970
       AND "dependency_proxy_blobs"."status" = 0
       AND ( updated_at <= '2021-06-18 20:56:42.607245' );

Explain plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7173/commands/25368

Manifest Query:

SELECT "dependency_proxy_manifests".*
FROM   "dependency_proxy_manifests"
WHERE  "dependency_proxy_manifests"."group_id" = 9970
       AND "dependency_proxy_manifests"."status" = 0
       AND ( updated_at <= '2021-06-18 20:56:42.607245' );

Explain plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7173/commands/25369

After

Note: the linked postgres.ai explain plans use the created_at column in the index and queries instead of read_at, because created_at already holds realistic data. I believe this gives the best indication of how queries against read_at will perform once the column is populated and in use.

Blob Query:

SELECT "dependency_proxy_blobs".*
FROM   "dependency_proxy_blobs"
WHERE  "dependency_proxy_blobs"."group_id" = 9970
       AND "dependency_proxy_blobs"."status" = 0
       AND ( read_at <= '2021-06-18 20:56:42.607245' );

Explain plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7198/commands/25440

Manifest Query:

SELECT "dependency_proxy_manifests".*
FROM   "dependency_proxy_manifests"
WHERE  "dependency_proxy_manifests"."group_id" = 9970
       AND "dependency_proxy_manifests"."status" = 0
       AND ( read_at <= '2021-06-18 20:56:42.607245' );

Explain plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/7198/commands/25441

Screenshots or screen recordings

These are backend changes, so apart from the Rails console checks in the local validation steps below, there is nothing to show in a screenshot.

💻 How to set up and validate locally

To check the read_at updates:

  1. Follow these docs to set up the Dependency Proxy on your GDK.

  2. Create a group and navigate to Packages & Registries -> Dependency Proxy to find the image prefix.

  3. Log in to the Dependency Proxy using a personal access token (PAT):

    docker login gdk.test:3000
    username: root
    password: <personal_access_token>
  4. Pull an image through the dependency proxy:

    # Use your group's image prefix; the command should look like:
    docker pull gdk.test:3000/<group_path>/dependency_proxy/containers/alpine:latest
  5. In the Rails console, check the read_at and updated_at values:

    DependencyProxy::Manifest.select(:read_at, :updated_at).last
    DependencyProxy::Blob.select(:read_at, :updated_at).last
  6. Use docker images to find the IMAGE ID of the image you pulled and then remove it from your local machine's cache:

    # Replace 14119a10abf4 with the IMAGE ID shown by docker images
    docker rmi -f 14119a10abf4
  7. Pull the image again. This time, the Dependency Proxy serves the image from its cache.

  8. Check the read_at and updated_at values again. The read_at values should have changed, but the updated_at values should remain the same.
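Steps 5 to 8 check a single invariant: between the two pulls, read_at moves forward while updated_at stays put. Below is a small plain-Ruby helper for that check. It is hypothetical and not part of GitLab. In the Rails console you could pass it the `attributes` hash of the record returned by the step 5 select call, captured once after the first pull and again after the second.

```ruby
# Hypothetical helper: compares two snapshots of the same record. Each snapshot
# is a Hash with "read_at" and "updated_at" keys, the shape ActiveRecord's
# #attributes returns.
def read_tracked_correctly?(before, after)
  read_advanced     = after["read_at"] > before["read_at"]
  updated_unchanged = after["updated_at"] == before["updated_at"]
  read_advanced && updated_unchanged
end

t0 = Time.utc(2021, 11, 8, 10, 0, 0)
t1 = Time.utc(2021, 11, 8, 10, 5, 0)

read_tracked_correctly?({ "read_at" => t0, "updated_at" => t0 },
                        { "read_at" => t1, "updated_at" => t0 }) # => true
read_tracked_correctly?({ "read_at" => t0, "updated_at" => t0 },
                        { "read_at" => t1, "updated_at" => t1 }) # => false
```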

To check the worker:

  1. In the rails console, update your group's TTL policy to have a short expiration:
    Group.last.dependency_proxy_image_ttl_policy.update(ttl: 1, enabled: true)
  2. Update your Dependency Proxy objects to have a read_at older than 1 day ago:
    DependencyProxy::Manifest.update_all(read_at: 5.days.ago)
    DependencyProxy::Blob.update_all(read_at: 5.days.ago)
  3. Run the worker:
    DependencyProxy::ImageTtlGroupPolicyWorker.perform_in(1.second)
  4. Wait for the worker to complete (this worker kicks off other workers, so make sure all jobs finish).
  5. All of the objects you updated should have been deleted:
    DependencyProxy::Manifest.all # => []
    DependencyProxy::Blob.all # => []
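The worker steps above can be modeled in plain Ruby. As described at the top of this MR, files are first expired based on read_at and the policy's TTL, and then deleted. The `CacheEntry` struct, the `:default`/`:expired` status symbols, and the two method names are hypothetical. In GitLab, these phases run as separate background jobs rather than in one call.

```ruby
SECONDS_PER_DAY = 86_400

# Hypothetical stand-in for a Dependency Proxy blob or manifest row.
CacheEntry = Struct.new(:id, :status, :read_at)

# Phase 1: mark entries that have not been read within ttl_days as expired.
def expire_stale!(entries, ttl_days, now)
  cutoff = now - ttl_days * SECONDS_PER_DAY
  entries.each do |entry|
    entry.status = :expired if entry.status == :default && entry.read_at <= cutoff
  end
end

# Phase 2: delete everything that phase 1 marked as expired.
def delete_expired!(entries)
  entries.reject! { |entry| entry.status == :expired }
  entries
end

now = Time.utc(2021, 11, 8)
entries = [
  CacheEntry.new(1, :default, now - 5 * SECONDS_PER_DAY), # like step 2: read 5 days ago
  CacheEntry.new(2, :default, now - 3600)                  # read an hour ago
]

expire_stale!(entries, 1, now) # like step 1: TTL of 1 day
delete_expired!(entries)
entries.map(&:id) # => [2]
```

In the local validation, step 2 sets every object's read_at to 5 days ago, so every entry falls past the cutoff and the result is empty, as step 5 expects.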

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #341536 (closed)

Edited by Steve Abrams
